Why Propensity Scores Should Be Used for Matching

Ben Jann

University of Bern, ben.jann@soz.unibe.ch

2017 German Stata Users Group Meeting
Berlin, June 23, 2017



Contents

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen’s “Why Propensity Scores Should Not Be Used for Matching”

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

Counterfactual Causality (see Neyman 1923; Rubin 1974, 1990)

a.k.a. Rubin Causal Model, a.k.a. Potential Outcomes Framework

John Stuart Mill (1806–1873):

“Thus, if a person eats of a particular dish, and dies in consequence, that is, would not have died if he had not eaten of it, people would be apt to say that eating of that dish was the cause of his death.” (Mill 2002 [1843]: 214)

Treatment variable D:

$$D = \begin{cases} 1 & \text{treatment (eats of a particular dish)} \\ 0 & \text{control (does not eat of a particular dish)} \end{cases}$$

Potential outcomes $Y^1$ and $Y^0$:

- $Y^1$: potential outcome with treatment (D = 1)
  - If person i were to eat of a particular dish, would she die or would she survive?
- $Y^0$: potential outcome without treatment (D = 0)
  - If person i were not to eat of a particular dish, would she die or would she survive?

Causal effect of the treatment for individual i:

causal effect = difference between potential outcomes

$$\delta_i = Y_i^1 - Y_i^0$$

Fundamental Problem of Causal Inference

The causal effect of D on Y for individual i is defined as the difference in potential outcomes: $\delta_i = Y_i^1 - Y_i^0$.

However, the observed outcome variable is

$$Y_i = \begin{cases} Y_i^1 & \text{if } D_i = 1 \\ Y_i^0 & \text{if } D_i = 0 \end{cases}$$

That is, only one of the two potential outcomes will be realized and, hence, only $Y_i^1$ or $Y_i^0$ can be observed, but never both.

Consequence: The individual treatment effect $\delta_i$ cannot be observed!

Average Treatment Effect

Although individual causal effects cannot be observed, the average causal effect in a population (the so-called “Average Treatment Effect”) can be identified by comparing the expected values of $Y^1$ and $Y^0$:

$$ATE = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]$$

Some other quantities of interest:

- Average Treatment Effect on the Treated (ATT)

$$ATT = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1]$$

- Average Treatment Effect on the Untreated (ATC)

$$ATC = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0]$$
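The three estimands can be made concrete with a toy population in which both potential outcomes are written down for every unit, something that is never available in real data. The sketch below is purely illustrative: the five units and their values are invented, and the variable names are not taken from the slides.

```python
# Toy population in which BOTH potential outcomes are known (never the case in real data).
# Each tuple is (D, Y0, Y1); all numbers are made up for illustration.
population = [
    (1, 2.0, 5.0),
    (1, 1.0, 3.0),
    (0, 2.0, 3.0),
    (0, 1.0, 1.0),
    (0, 3.0, 5.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Individual effects delta_i = Y1_i - Y0_i, paired with the treatment status.
effects = [(d, y1 - y0) for d, y0, y1 in population]

ate = mean(delta for _, delta in effects)
att = mean(delta for d, delta in effects if d == 1)
atc = mean(delta for d, delta in effects if d == 0)

# What an analyst actually sees: D and only the realized outcome
# (Y1 if D = 1, Y0 if D = 0).
observed = [(d, y1 if d == 1 else y0) for d, y0, y1 in population]
naive = (mean(y for d, y in observed if d == 1)
         - mean(y for d, y in observed if d == 0))

print(f"ATE={ate:.2f}  ATT={att:.2f}  ATC={atc:.2f}  naive difference={naive:.2f}")
```

In this made-up population the naive group comparison (2.0) matches none of ATE (1.6), ATT (2.5), or ATC (1.0), because the treated and untreated units differ in their baseline outcomes $Y^0$.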

To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.

If the independence assumption

$$(Y^0, Y^1) \perp\!\!\!\perp D$$

applies, that is, if D is independent of $Y^0$ and $Y^1$, then

$$E[Y^0] = E[Y^0 \mid D = 0]$$

$$E[Y^1] = E[Y^1 \mid D = 1]$$

In this case the average causal effect can be measured by a simple group comparison (mean difference) between observations without treatment (D = 0) and observations with treatment (D = 1).

Randomized experiments solve the problem: If the assignment of D is randomized, D is independent of $Y^0$ and $Y^1$ by design.
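A small simulation can illustrate why randomization makes the simple mean difference work. This is only a sketch under an invented data-generating process ($Y^0 \sim N(0,1)$, constant effect of 2); the function name `simulate` and the self-selection rule are assumptions made for the example.

```python
import random


def simulate(n, randomized, seed=1):
    """Return (true ATE, naive mean difference) for one simulated sample.

    Assumed data-generating process (illustrative only):
    Y0 ~ N(0, 1) and Y1 = Y0 + 2, so every individual effect equals 2.
    """
    rng = random.Random(seed)
    treated, control, effects = [], [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)
        y1 = y0 + 2.0
        if randomized:
            d = rng.random() < 0.5   # coin flip: D is independent of (Y0, Y1)
        else:
            d = y0 > 0.0             # self-selection on Y0: independence fails
        effects.append(y1 - y0)
        # Only the realized outcome is recorded for the group comparison.
        (treated if d else control).append(y1 if d else y0)
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    return sum(effects) / n, naive


ate_r, naive_r = simulate(20000, randomized=True)
ate_s, naive_s = simulate(20000, randomized=False)
print(f"randomized:    ATE={ate_r:.2f}  naive={naive_r:.2f}")
print(f"self-selected: ATE={ate_s:.2f}  naive={naive_s:.2f}")
```

Under random assignment the naive difference should land close to the true effect of 2, whereas under self-selection on $Y^0$ it is pushed well above it.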

2 Matching

Conditional Independence / Strong Ignorability

Can causal effects also be identified from “observational” (i.e., non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, “unconfoundedness”):

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

holds, then the ATE (or ATT or ATC) can be identified by conditioning on X.

For example:

$$ATE = \sum_x \Pr[X = x] \left\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \right\}$$
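The conditioning formula for the ATE can be sketched as a stratification estimator for a single discrete X. The function name `stratified_ate` and the eight data rows are hypothetical, chosen so that the within-stratum effect is 2 in both strata while treatment is much more common in the "high" stratum.

```python
from collections import defaultdict


def stratified_ate(rows):
    """Sum over x of Pr[X = x] * (mean of Y among treated - mean of Y among controls).

    rows: iterable of (x, d, y) with a discrete x and a binary d.
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        cells[x][d].append(y)
    n = sum(len(c[0]) + len(c[1]) for c in cells.values())
    ate = 0.0
    for x, c in cells.items():
        if not c[0] or not c[1]:
            # Overlap assumption violated: no treated or no controls in this stratum.
            raise ValueError(f"no overlap in stratum X={x!r}")
        p_x = (len(c[0]) + len(c[1])) / n
        diff = sum(c[1]) / len(c[1]) - sum(c[0]) / len(c[0])
        ate += p_x * diff
    return ate


# Made-up data: (x, d, y)
data = [
    ("low", 1, 3.0), ("low", 0, 1.0), ("low", 0, 1.0), ("low", 0, 1.0),
    ("high", 1, 6.0), ("high", 1, 6.0), ("high", 1, 6.0), ("high", 0, 4.0),
]

treated = [y for _, d, y in data if d == 1]
control = [y for _, d, y in data if d == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print(f"naive={naive:.2f}  stratified ATE={stratified_ate(data):.2f}")
```

With these numbers the naive difference is 3.5, while conditioning on X recovers the within-stratum effect of 2.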

Matching

Matching is one approach to “condition on X” if strong ignorability holds.

Basic idea:

1. For each observation in the treatment group, find “statistical twins” in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the “imputed” counterfactual values over all observations.

Formally:

$$\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \right]$$

$$\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \right]$$

$$\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}$$

Different matching algorithms use different definitions of $w_{ij}$.

Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to “perfect stratification” or “subclassification” (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the “curse of dimensionality”).
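A minimal sketch of the exact-matching ATT, using the weights $w_{ij} = 1/k_i$ defined above. The function name and the small data set are invented; treated units without an exact match are simply counted and left out, which is one possible convention rather than a prescribed one.

```python
def exact_matching_att(rows):
    """ATT by exact matching; rows is a list of (x, d, y), x a hashable covariate profile.

    Returns (ATT estimate, number of treated units without any exact match).
    """
    controls = {}
    for x, d, y in rows:
        if d == 0:
            controls.setdefault(x, []).append(y)

    diffs, unmatched = [], 0
    for x, d, y in rows:
        if d != 1:
            continue
        matches = controls.get(x)
        if not matches:
            unmatched += 1          # no statistical twin: curse of dimensionality
            continue
        k_i = len(matches)
        y0_hat = sum(y_j / k_i for y_j in matches)   # weights w_ij = 1 / k_i
        diffs.append(y - y0_hat)
    if not diffs:
        raise ValueError("no treated unit has an exact match")
    return sum(diffs) / len(diffs), unmatched


# Made-up data; x = (female, college)
data = [
    ((1, 0), 1, 10.0), ((1, 0), 0, 8.0), ((1, 0), 0, 6.0),
    ((0, 1), 1, 12.0), ((0, 1), 0, 11.0),
    ((1, 1), 1, 20.0),                      # treated unit without a twin
    ((0, 0), 0, 5.0),                       # control that is never used
]

att, unmatched = exact_matching_att(data)
print(f"ATT={att:.2f}  unmatched treated={unmatched}")
```

Even with only two binary covariates, one of the three treated units has no exact twin, which hints at how quickly the problem grows with more variables.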

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are “close”, but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
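The distance metric can be sketched for the two-covariate case, where the inverse of $\Sigma$ has a closed form. The helper names and the four (education, age) points below are assumptions for illustration only.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix of a list of (x1, x2) points."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return ((s11, s12), (s12, s22))


def md(xi, xj, sigma):
    """sqrt((xi - xj)' Sigma^{-1} (xi - xj)) for two covariates."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    u1, u2 = xi[0] - xj[0], xi[1] - xj[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    q = (d * u1 * u1 - (b + c) * u1 * u2 + a * u2 * u2) / det
    return math.sqrt(q)


IDENTITY = ((1.0, 0.0), (0.0, 1.0))

# Made-up (education, age) values
points = [(12.0, 30.0), (16.0, 40.0), (14.0, 50.0), (18.0, 60.0)]
sigma = covariance_2d(points)

euclidean = md(points[0], points[1], IDENTITY)
mahalanobis = md(points[0], points[1], sigma)

# Measuring age in different units changes the Euclidean distance,
# but leaves the Mahalanobis distance unchanged.
rescaled = [(e, 10.0 * a) for e, a in points]
mahalanobis_rescaled = md(rescaled[0], rescaled[1], covariance_2d(rescaled))

print(euclidean, mahalanobis, mahalanobis_rescaled)
```

The rescaling check illustrates why the covariance matrix is a natural choice for $\Sigma$: the resulting distance does not depend on the units in which the covariates are measured.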

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find the observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.

Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as “bias adjustment” in the context of nearest-neighbor matching).
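The algorithms differ only in how they turn the distances from one treated unit to all controls into weights $w_{ij}$. The following sketch shows one way to write nearest-neighbor (with ties and an optional caliper), radius, and Epanechnikov kernel weights; the function names are hypothetical, and this is not a description of how any particular software package implements these rules.

```python
def nearest_neighbor_weights(dist, k=1, caliper=None):
    """Equal weights on the k closest controls (with replacement, keeping all ties).

    dist: distances from ONE treated unit to every control.
    caliper: if given, only controls with distance < caliper are eligible.
    """
    eligible = sorted((d, j) for j, d in enumerate(dist)
                      if caliper is None or d < caliper)
    weights = [0.0] * len(dist)
    if not eligible:
        return weights                      # treated unit stays unmatched
    cutoff = eligible[min(k, len(eligible)) - 1][0]   # distance of the k-th closest
    chosen = [j for d, j in eligible if d <= cutoff]  # keep all ties at the cutoff
    for j in chosen:
        weights[j] = 1.0 / len(chosen)
    return weights


def radius_weights(dist, c):
    """Equal weights on all controls with distance < c."""
    chosen = [j for j, d in enumerate(dist) if d < c]
    return [1.0 / len(chosen) if j in chosen else 0.0 for j in range(len(dist))]


def kernel_weights(dist, c):
    """Epanechnikov weights K(u) = 0.75 * (1 - u^2) for u = d / c < 1, normalized to sum to 1."""
    raw = [0.75 * (1.0 - (d / c) ** 2) if d < c else 0.0 for d in dist]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else raw


distances = [0.5, 0.2, 0.2, 1.5]            # made-up distances to four controls
print(nearest_neighbor_weights(distances, k=1))
print(radius_weights(distances, c=1.0))
print(kernel_weights(distances, c=1.0))
```

With these distances, nearest-neighbor matching splits the weight between the two tied controls, radius matching weights the three controls inside the radius equally, and kernel matching gives the two closest controls more weight than the third.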

3 Propensity Score Matching

The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g., using a logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\hat{\pi}(X_i) - \hat{\pi}(X_j)|$, instead of multivariate distances.

PSM is tremendously popular
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
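The two-step procedure can be sketched end to end on simulated data. Everything here is an assumption for illustration: a single covariate, a logit model fitted by a hand-written Newton-Raphson loop, one-nearest-neighbor matching with replacement, and a caliper of 0.05 on the propensity score. With only one covariate, matching on the score is of course equivalent to matching on X itself; the point is only to show the mechanics.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, d, iterations=25):
    """Step 1: Newton-Raphson for Pr(D = 1 | x) = logistic(a + b * x)."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = logistic(a + b * xi)
            w = p * (1.0 - p)
            g0 += di - p                    # score w.r.t. the intercept
            g1 += (di - p) * xi             # score w.r.t. the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det    # Newton step: H^{-1} g
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def psm_att(x, d, y, caliper=0.05):
    """Step 2: match each treated unit to the control with the closest propensity score."""
    a, b = fit_logit(x, d)
    ps = [logistic(a + b * xi) for xi in x]
    controls = [(ps[j], y[j]) for j in range(len(x)) if d[j] == 0]
    diffs = []
    for i in range(len(x)):
        if d[i] != 1:
            continue
        gap, y0_hat = min((abs(ps[i] - pj), yj) for pj, yj in controls)
        if gap < caliper:                   # treated units without a close match are dropped
            diffs.append(y[i] - y0_hat)
    return sum(diffs) / len(diffs), len(diffs)


# Simulated data (assumed model): true treatment effect = 2, X confounds D and Y.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = [1 if rng.random() < logistic(xi) else 0 for xi in x]
y = [2.0 * di + xi + rng.gauss(0.0, 1.0) for xi, di in zip(x, d)]

n1 = sum(d)
naive = (sum(yi for yi, di in zip(y, d) if di) / n1
         - sum(yi for yi, di in zip(y, d) if not di) / (len(d) - n1))
att, n_matched = psm_att(x, d, y)
print(f"naive={naive:.2f}  PSM ATT={att:.2f}  matched treated={n_matched} of {n1}")
```

In this setup the naive comparison overstates the effect because units with high X are both more likely to be treated and have higher outcomes; matching on the estimated score should bring the estimate back near 2.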

4 King and Nielsen’s “Why Propensity Scores Should Not Be Used for Matching”

King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
- http://j.mp/1sexgVw

Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several animation steps: Outcome plotted against Education (years) for treated (T) and control (C) units. From the slides by King and Nielsen, 3/23.]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slides by King and Nielsen, 6/23)

Types of Experiments

    Covariate balance    Complete randomization    Fully blocked
    Observed             On average                Exact
    Unobserved           On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller!

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps: Age plotted against Education (years) for treated (T) and control (C) units. From the slides by King and Nielsen, 9/23.]

Best Case: Propensity Score Matching

[Figure, shown in two animation steps: Age plotted against Education (years) for treated (T) and control (C) units, with an additional Propensity Score axis running from 0 to 1. From the slides by King and Nielsen, 15/23.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: Age plotted against Education (years) for the treated (T) and the remaining control (C) units. From the slides by King and Nielsen, 15/23.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better’, you do worse”)
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
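The first bullet of Argument 3 can be checked with a small simulation: when treated and control units are drawn from the same covariate distribution and observations are then deleted at random, the typical absolute mean difference in X grows as the retained sample shrinks. The setup (standard normal X, 200 units per group, 500 replications) and the function name are assumptions made for this sketch.

```python
import random


def mean_abs_imbalance(n_keep, n_full=200, reps=500, seed=7):
    """Average |mean(X | treated) - mean(X | control)| after random pruning.

    Both groups are drawn from the same N(0, 1) distribution, so any
    imbalance is due to chance alone. n_keep units per group are retained.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_full)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_full)]
        t = rng.sample(treated, n_keep)     # random pruning of treated units
        c = rng.sample(control, n_keep)     # random pruning of controls
        total += abs(sum(t) / n_keep - sum(c) / n_keep)
    return total / reps


for n_keep in (200, 100, 50, 20):
    print(n_keep, round(mean_abs_imbalance(n_keep), 3))
```

The printed imbalance should rise steadily as fewer units are kept, which is the mechanism King and Nielsen invoke; whether PSM actually prunes at random is the separate question taken up in the next section.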

PSM Increases Model Dependence & Bias

[Figure, two panels, each comparing MDM and PSM by Number of Units Pruned (0 to 160). Panel “Model Dependence”: Variance. Panel “Bias”: Maximum Coefficient across 512 Specifications, with the true effect (= 2) marked. From the slides by King and Nielsen, 20/23.]

$$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i, \qquad \epsilon_i \sim N(0, 1)$$

The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). Each panel plots Imbalance against Number of units pruned for CEM, MDM, PSM, and Random pruning, and marks the Raw data and the 1/4 SD caliper. From the slides by King and Nielsen, 21/23.]

Similar pattern for > 20 other real data sets we checked.

5 Are King and Nielsen right?

Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true: PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better’, you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X’s cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you’d need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression adjustment (bias correction)
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry:

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                      Matched                 Controls           Band-
                  Yes     No  Total     Used  Unused  Total      width
    Treated       432     25    457     1105     291   1396     1.3394

    Treatment-effects estimation

    wage          Coef.
    ATT        .6059013
    NATE       1.432913

Some balancing statistics:

    . kmatch summarize
    (refitting the model using the generate() option)

                          Raw                          Matched (ATT)
    Means         Treated  Untreated   StdDif    Treated  Untreated   StdDif
    collgrad      .321663    .224212  .219912    .319444    .319444        0
    ttl_exp       13.2685    12.7323  .117584    13.3205    13.1425  .039036
    tenure        7.89205    6.17658   .29735    7.91744    7.58347  .057888
    3.industry    .006565    .012178 -.058246     .00463     .00463        0
    4.industry    .183807    .166905  .044425    .185185    .185185        0
    5.industry    .105033    .027937  .312944    .085648    .085648        0
    6.industry    .045952    .169771 -.407129    .048611    .048611        0
    7.industry    .019694    .102436 -.350657    .020833    .020833        0
    8.industry    .017505    .035817 -.113785    .009259    .009259        0
    9.industry    .010941    .040115 -.185669    .011574    .011574        0
    10.industry   .004376    .008596 -.052551    .002315    .002315        0
    11.industry   .479212    .356734  .250073    .506944    .506944        0
    12.industry   .122538     .07235  .169707     .12037     .12037        0
    2.race        .330416    .244986  .189418      .3125      .3125        0
    3.race        .017505    .011461  .050566    .006944    .006944        0
    south         .297593    .466332 -.352408    .291667    .291667        0

                          Raw                          Matched (ATT)
    Variances     Treated  Untreated    Ratio    Treated  Untreated    Ratio
    collgrad      .218674    .174066  1.25628    .217904    .217904        1
    ttl_exp       20.5898    21.0001  .980459    19.8177    18.2323  1.08696
    tenure        37.2044    29.3629  1.26706    37.0399    34.9543  1.05966
    3.industry    .006536    .012038  .542928    .004619    .004619        1
    4.industry    .150351    .139148  1.08052    .151242    .151242        1
    5.industry    .094207    .027176  3.46656    .078494    .078494        1
    6.industry    .043936     .14105  .311496    .046355    .046355        1
    7.industry    .019348    .092008  .210287    .020447    .020447        1
    8.industry    .017237    .034559  .498769    .009195    .009195        1
    9.industry    .010845    .038533  .281445    .011467    .011467        1
    10.industry   .004367    .008528  .512039    .002315    .002315        1
    11.industry   .250115    .229639  1.08917    .250532    .250532        1
    12.industry   .107758    .067163  1.60443    .106127    .106127        1
    2.race        .221726      .1851  1.19787    .215342    .215342        1
    3.race        .017237    .011338  1.52025    .006912    .006912        1
    south          .20949    .249045  .841173    .207077    .207077        1

Make a graph of the balancing statistics:

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
    >     bylabels("Std. mean difference" "Variance ratio")
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: balancing statistics for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), Raw versus Matched. Left panel: Std. mean difference (x-axis from -.4 to .4). Right panel: Variance ratio (x-axis from 0 to 4).]

Propensity-score kernel matching:

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics

                      Matched                 Controls           Band-
                  Yes     No  Total     Used  Unused  Total      width
    Treated       431     26    457     1214     182   1396     .00188

    Treatment-effects estimation

    wage          Coef.
    ATT        .3887224
    NATE       1.432913

Kernel density balancing plot:

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure, two panels (Raw, Matched (ATT)): Density of the Propensity score (x-axis from 0 to .8) for Untreated and Treated.]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(6 2) lc(5 1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for Untreated and Treated. Panels: Raw, Matched (ATT). x-axis: Propensity score; y-axis: Cumulative probability.]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated. Panels: Raw, Matched (ATT). y-axis: Propensity score.]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total     width
Treated       432     25     457     1105     291    1396    1.3394
Untreated    1386     10    1396      455       2     457    3.3975
Combined     1818     35    1853     1560     293    1853

Treatment-effects estimation

         Observed   Bootstrap                        Normal-based
wage        Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE      .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT      .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC      .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE     1.432913   .2333282    6.14   0.000    .9755981   1.890228
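As a rough arithmetic check, the ATE reported above appears to equal the average of ATT and ATC weighted by the numbers of matched treated (432) and matched untreated (1386) observations. The sketch below is in Python rather than Stata, reads the coefficients as .6059013 and .3483797, and uses an illustrative helper name (`combine_ate`); it only reproduces the arithmetic and is not a description of kmatch internals.

```python
# Matched counts and point estimates read off the output above
# (decimal points restored by assumption).
n_treated_matched = 432
n_untreated_matched = 1386
att = 0.6059013
atc = 0.3483797


def combine_ate(att, atc, n1, n0):
    """Average of ATT and ATC, weighted by the two group sizes."""
    return (n1 * att + n0 * atc) / (n1 + n0)


ate = combine_ate(att, atc, n_treated_matched, n_untreated_matched)
print(f"ATE implied by ATT and ATC: {ate:.7f}")
```

The result agrees with the reported ATE of .4095729 up to rounding of the inputs.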

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
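The z statistic and the normal-based confidence interval in the lincom output follow from the coefficient and its standard error alone. A minimal Python sketch of that arithmetic, assuming the values read as -.8270117 and .1810415 (the function name `wald_summary` is illustrative):

```python
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """Return z, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, (coef - crit * se, coef + crit * se)


z, p, (lower, upper) = wald_summary(-0.8270117, 0.1810415)
print(f"z = {z:.2f}, p = {p:.3f}, CI = [{lower:.6f}, {upper:.6f}]")
```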

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                      Number of obs  = 1,853
                                      Neighbors: min = 1
Treatment  : union = 1                           max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     457      0     457      328    1068    1396

Treatment-effects estimation

wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min   = 1
Distance metric: Mahalanobis                            max   = 1

                                    AI Robust
wage                       Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .7246969   .2942952    2.46   0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                      Number of obs  = 1,853
                                      Neighbors: min = 5
Treatment  : union = 1                           max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                                    AI Robust
wage                       Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .5590823   .2381752    2.35   0.019    .0922675   1.025897

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                      Number of obs  = 1,853
                                      Neighbors: min = 5
Treatment  : union = 1                           max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                                    AI Robust
wage                       Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .5288023   .2420635    2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     439     18     457     1258     138    1396    .83886

Treatment-effects estimation

wage        Coef.
ATT      .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     432     25     457     1103     293    1396    1.3013

Treatment-effects estimation

wage        Coef.
ATT      .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     432     25     457     1105     291    1396    1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     448      9     457     1184     212    1396    1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by search index. x-axis: Bandwidth; y-axis: MSE.]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     453      4     457     1289     107    1396     2.433

Treatment-effects estimation

wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by search index. x-axis: Bandwidth; y-axis: MISE.]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     455      2     457     1356      40    1396    2.7626

Treatment-effects estimation

wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by search index. x-axis: Bandwidth; y-axis: Weighted MISE.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     366     91     457      701     695    1396        .5

Treatment-effects estimation

wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                      Standardized difference
      Means    Matched  Unmatc~d     Total    (1)-(3)    (2)-(3)    (1)-(2)
   collgrad    .322404   .318681   .321663    .001585   -.006376    .007962
    ttl_exp    13.3929   12.7682   13.2685    .027413   -.110253    .137666
     tenure    8.12614   6.95055   7.89205    .038378   -.154356    .192734
 3.industry    .002732   .021978   .006565   -.047404    .190657   -.238061
 4.industry    .191257   .153846   .183807    .019212   -.077269    .096481
 5.industry    .062842   .274725   .105033   -.137462    .552867   -.690329
 6.industry    .057377         0   .045952    .054507   -.219225    .273732
 7.industry    .019126   .021978   .019694   -.004083    .016423   -.020506
 8.industry    .005464   .065934   .017505   -.091714    .368871   -.460585
 9.industry    .010929   .010989   .010941   -.000115    .000462   -.000577
10.industry          0   .021978   .004376   -.066227    .266363   -.332589
11.industry    .554645   .175824   .479212     .15083   -.606636    .757467
12.industry    .092896   .241758   .122538   -.090299    .363181    -.45348
     2.race    .243169   .681319   .330416   -.185284    .745209   -.930494
     3.race    .002732   .076923   .017505   -.112525    .452572   -.565097
      south     .29235   .318681   .297593   -.011456    .046074    -.05753

Common support (treated)                      Ratio
  Variances    Matched  Unmatc~d     Total    (1)/(3)    (2)/(3)    (1)/(2)
   collgrad    .219058   .219536   .218674    1.00176    1.00394    .997824
    ttl_exp    19.4198   25.2474   20.5898    .943177    1.22621     .76918
     tenure    38.3324   31.9242   37.2044    1.03032    .858076    1.20073
 3.industry    .002732   .021734   .006536    .418045    3.32537    .125714
 4.industry    .155101   .131624   .150351    1.03159    .875443    1.17837
 5.industry    .059054   .201465   .094207    .626851    2.13854    .293122
 6.industry    .054233         0   .043936    1.23435          0          .
 7.industry    .018811   .021734   .019348    .972252     1.1233    .865531
 8.industry     .00545   .062271   .017237    .316157    3.61269    .087513
 9.industry    .010839   .010989   .010845    .999464    1.01328    .986361
10.industry          0   .021734   .004367          0    4.97709          0
11.industry    .247691    .14652   .250115    .990307    .585811    1.69049
12.industry    .084497   .185348   .107758    .784137    1.72003    .455885
     2.race    .184542   .219536   .221726    .832297    .990121    .840601
     3.race    .002732   .071795   .017237    .158513    4.16522    .038056
      south    .207448   .219536    .20949    .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total

Some Examples

. // make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized differences between matched treated and all treated for collgrad, ttl_exp, tenure, the industry and race indicators, and south. Title: Std. difference.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     432     25     457     1104     291    1395    1.3392

Treatment-effects estimation

            Coef.
wage
ATT      .6021049
NATE     1.430823
hours
ATT      1.263759
NATE     1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                 Controls            Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     432     25     457     1104     291    1395    1.3392

Treatment-effects estimation

            Coef.
wage
ATT      .5152752
NATE     1.430823
hours
ATT      1.263759
NATE     1.450303

wage:  adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                  Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total     width
0  Treated    306     15     321      625     120     745    1.3199
1  Treated    126     10     136      473     178     651    1.3398

Treatment-effects estimation

          Observed   Bootstrap                        Normal-based
wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
0  ATT    .4586332   .2763358    1.66   0.097    -.082975   1.000241
1  ATT    .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)       .4932373   .4452343    1.11   0.268    -.379406   1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for Untreated and Treated. x-axis: Propensity score (0 to 1); y-axis: Density. McFadden R2 = 0.121.]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome by propensity score for Untreated and Treated. x-axis: Propensity score; y-axis: Outcome.]

[Figure, right panel: treatment effect by propensity score. x-axis: Propensity score; y-axis: Treatment effect.]

Results: Variance

[Figure: variance of the estimates for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References I

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

References II

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Contents

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

Counterfactual Causality (see Neyman 1923, Rubin 1974, 1990)

a.k.a. Rubin Causal Model, a.k.a. Potential Outcomes Framework

John Stuart Mill (1806–1873):

"Thus, if a person eats of a particular dish, and dies in consequence, that is, would not have died if he had not eaten of it, people would be apt to say that eating of that dish was the cause of his death." (Mill 2002[1843]: 214)

Counterfactual Causality (see Neyman 1923, Rubin 1974, 1990)

a.k.a. Rubin Causal Model, a.k.a. Potential Outcomes Framework

Treatment variable $D$:

$$D = \begin{cases} 1 & \text{treatment (eats of a particular dish)} \\ 0 & \text{control (does not eat of a particular dish)} \end{cases}$$

Potential outcomes $Y^1$ and $Y^0$:

- $Y^1$: potential outcome with treatment ($D = 1$)
  - If person $i$ would eat of a particular dish, would she die or would she survive?
- $Y^0$: potential outcome without treatment ($D = 0$)
  - If person $i$ would not eat of a particular dish, would she die or would she survive?

Causal effect of the treatment for individual $i$:

causal effect = difference between potential outcomes

$$\delta_i = Y_i^1 - Y_i^0$$

Fundamental Problem of Causal Inference

The causal effect of $D$ on $Y$ for individual $i$ is defined as the difference in potential outcomes: $\delta_i = Y_i^1 - Y_i^0$.

However, the observed outcome variable is

$$Y_i = \begin{cases} Y_i^1 & \text{if } D_i = 1 \\ Y_i^0 & \text{if } D_i = 0 \end{cases}$$

That is, only one of the two potential outcomes will be realized and, hence, only $Y_i^1$ or $Y_i^0$ can be observed, but never both.

Consequence:

The individual treatment effect $\delta_i$ cannot be observed!

Average Treatment Effect

Although individual causal effects cannot be observed, the average causal effect in a population (the so-called "Average Treatment Effect") can be identified by comparing the expected values of $Y^1$ and $Y^0$:

$$\mathrm{ATE} = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]$$

Some other quantities of interest:

- Average Treatment Effect on the Treated (ATT)

$$\mathrm{ATT} = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1]$$

- Average Treatment Effect on the Untreated (ATC)

$$\mathrm{ATC} = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0]$$
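To make the three estimands concrete, here is a minimal Python sketch on a made-up table in which both potential outcomes are listed for every unit. The data are hypothetical and such a table is never available in practice; it serves only to show what ATE, ATT and ATC average over.

```python
# Hypothetical toy data: (D, Y1, Y0) for six units. In real data only one
# of the two potential outcomes is observed per unit.
units = [
    (1, 7, 4),
    (1, 6, 5),
    (1, 8, 4),
    (0, 5, 4),
    (0, 3, 3),
    (0, 4, 2),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Individual effects delta_i = Y1_i - Y0_i, averaged over different groups.
ate = mean(y1 - y0 for d, y1, y0 in units)
att = mean(y1 - y0 for d, y1, y0 in units if d == 1)
atc = mean(y1 - y0 for d, y1, y0 in units if d == 0)

print(f"ATE = {ate:.3f}, ATT = {att:.3f}, ATC = {atc:.3f}")
```

In this toy population the treated benefit more than the untreated would, so ATT exceeds ATE, which in turn exceeds ATC.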

Average Treatment Effect

To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.

If the independence assumption

$$(Y^0, Y^1) \perp\!\!\!\perp D$$

applies, that is, if $D$ is independent from $Y^0$ and $Y^1$, then

$$E[Y^0] = E[Y^0 \mid D = 0]$$

$$E[Y^1] = E[Y^1 \mid D = 1]$$

In this case the average causal effect can be measured by a simple group comparison (mean difference) of observations without treatment ($D = 0$) and observations with treatment ($D = 1$).

Randomized experiments solve the problem: if the assignment of $D$ is randomized, $D$ is independent from $Y^0$ and $Y^1$ by design.
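The claim that randomization makes the simple mean difference unbiased can be checked exactly on a tiny example. The Python sketch below uses hypothetical potential outcomes for four units and enumerates every way of assigning two of them to treatment; the average of the resulting mean differences equals the true ATE. Complete randomization with a fixed number of treated units is an assumption of this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical potential outcomes (Y1, Y0) for four units.
potential = [(7, 4), (6, 5), (5, 4), (3, 3)]
n = len(potential)
true_ate = Fraction(sum(y1 - y0 for y1, y0 in potential), n)


def mean_difference(treated):
    """Difference in observed group means for one treatment assignment."""
    y_treated = [potential[i][0] for i in treated]
    y_control = [potential[i][1] for i in range(n) if i not in treated]
    return (Fraction(sum(y_treated), len(y_treated))
            - Fraction(sum(y_control), len(y_control)))


# All ways to treat exactly two of the four units; each is equally likely
# under complete randomization.
assignments = list(combinations(range(n), 2))
estimates = [mean_difference(set(a)) for a in assignments]
expected_estimate = sum(estimates, Fraction(0)) / len(estimates)

print("true ATE:", true_ate, "expected mean difference:", expected_estimate)
```

Any single assignment can miss the true value; it is the average over the randomization distribution that coincides with the ATE.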


Conditional Independence / Strong Ignorability

Can causal effects also be identified from "observational" (i.e., non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

is given, then the ATE (or ATT or ATC) can be identified by conditioning on $X$.

For example:

$$\mathrm{ATE} = \sum_x \Pr[X = x] \left\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \right\}$$
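The stratification formula above can be sketched in a few lines of Python. The data and the function name `stratified_ate` are hypothetical; the example assumes a single discrete covariate and raises an error if a stratum violates the overlap assumption.

```python
from collections import defaultdict

# Hypothetical observed data: (x, d, y) with one discrete covariate x.
data = [
    ("a", 1, 6), ("a", 1, 8), ("a", 0, 4), ("a", 0, 6),
    ("b", 1, 3), ("b", 0, 2), ("b", 0, 2), ("b", 0, 2),
]


def stratified_ate(rows):
    """Sum over x of Pr[X=x] * (E[Y|D=1,X=x] - E[Y|D=0,X=x])."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        strata[x][d].append(y)
    total = len(rows)
    ate = 0.0
    for groups in strata.values():
        if not groups[0] or not groups[1]:
            raise ValueError("overlap violated: stratum lacks treated or controls")
        share = (len(groups[0]) + len(groups[1])) / total
        diff = (sum(groups[1]) / len(groups[1])
                - sum(groups[0]) / len(groups[0]))
        ate += share * diff
    return ate


treated_y = [y for x, d, y in data if d == 1]
control_y = [y for x, d, y in data if d == 0]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
ate = stratified_ate(data)

print(f"naive difference = {naive:.3f}, stratified ATE = {ate:.3f}")
```

Because treatment is more common in the stratum with higher outcomes, the naive difference overstates the stratified estimate in this toy data.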

Matching

Matching is one approach to "condition on X" if strong ignorability holds.

Basic idea:

1 For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).

2 The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.

3 An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.

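Matching

Formally:

$$\widehat{\mathrm{ATT}} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \right]$$

$$\widehat{\mathrm{ATC}} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \right]$$

$$\widehat{\mathrm{ATE}} = \frac{N_{D=1}}{N} \cdot \widehat{\mathrm{ATT}} + \frac{N_{D=0}}{N} \cdot \widehat{\mathrm{ATC}}$$

Different matching algorithms use different definitions of $w_{ij}$.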

Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
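A minimal Python sketch of exact matching for the ATT, using the weights $w_{ij} = 1/k_i$ defined above. The data and the function name `exact_match_att` are hypothetical; treated observations without any exact twin are dropped and counted, which is how the dimensionality problem shows up in practice.

```python
# Hypothetical data: (x, d, y); x is a tuple of discrete covariates.
data = [
    (("f", 1), 1, 10.0),
    (("f", 1), 0, 7.0),
    (("f", 1), 0, 9.0),
    (("m", 0), 1, 5.0),
    (("m", 0), 0, 4.0),
    (("m", 1), 1, 6.0),   # no control with identical x
]


def exact_match_att(rows):
    """ATT from exact matching; returns (estimate, number of unmatched treated)."""
    controls = [(x, y) for x, d, y in rows if d == 0]
    effects, unmatched = [], 0
    for x_i, d_i, y_i in rows:
        if d_i != 1:
            continue
        twins = [y_j for x_j, y_j in controls if x_j == x_i]
        if not twins:
            unmatched += 1
            continue
        k_i = len(twins)
        y0_hat = sum(y_j / k_i for y_j in twins)   # weights w_ij = 1/k_i
        effects.append(y_i - y0_hat)
    return sum(effects) / len(effects), unmatched


att, unmatched = exact_match_att(data)
print(f"ATT = {att}, unmatched treated = {unmatched}")
```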

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are "close", but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
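The role of the scaling matrix can be illustrated with a small Python sketch for two covariates. The helper names and the example covariance matrix are hypothetical; the 2x2 inverse is written out by hand so the example needs no external libraries.

```python
import math


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("scaling matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def distance(x_i, x_j, sigma):
    """sqrt((x_i - x_j)' sigma^{-1} (x_i - x_j)) for two covariates."""
    inv = inverse_2x2(sigma)
    diff = [x_i[0] - x_j[0], x_i[1] - x_j[1]]
    quad = sum(diff[r] * inv[r][c] * diff[c]
               for r in range(2) for c in range(2))
    return math.sqrt(quad)


identity = [[1.0, 0.0], [0.0, 1.0]]      # Euclidean matching
covariance = [[4.0, 0.0], [0.0, 1.0]]    # hypothetical: first covariate has variance 4

euclidean = distance((2.0, 0.0), (0.0, 0.0), identity)
mahalanobis = distance((2.0, 0.0), (0.0, 0.0), covariance)
print(f"Euclidean = {euclidean}, Mahalanobis = {mahalanobis}")
```

A raw difference of 2 on a covariate with standard deviation 2 counts as one unit of Mahalanobis distance, which is the sense in which the metric standardizes X.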

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation $i$ in the treatment group, find observation $j$ in the control group for which $MD_{ij}$ is smallest. Once observation $j$ is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation $i$ in the treatment group, find the $k$ closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. $k$ is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold $c$.

Mahalanobis Matching

Radius matching
- Use all controls for which MD is smaller than some threshold c as matches.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
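The kernel-matching idea can be sketched as follows: each control receives a weight that declines with its distance and drops to zero at the bandwidth. The distances, outcomes, and function names are hypothetical, and the normalization shown is one common choice, not necessarily what any particular software implements.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_weights(distances, bandwidth):
    """Normalized kernel weights for one treated unit's control distances."""
    raw = [epanechnikov(d / bandwidth) for d in distances]
    total = sum(raw)
    if total == 0:
        return None  # no control within the bandwidth: unit stays unmatched
    return [w / total for w in raw]

def matched_outcome(distances, outcomes, bandwidth):
    """Weighted mean of control outcomes (the estimated counterfactual)."""
    w = kernel_weights(distances, bandwidth)
    if w is None:
        return None
    return sum(wi * yi for wi, yi in zip(w, outcomes))

# Hypothetical distances from one treated unit to four controls
distances = [0.0, 0.5, 1.0, 2.0]
outcomes = [10.0, 12.0, 20.0, 30.0]

w = kernel_weights(distances, bandwidth=1.0)
y0_hat = matched_outcome(distances, outcomes, bandwidth=1.0)
```

Radius matching corresponds to replacing the kernel by a constant within the bandwidth, so that all controls inside the radius receive the same weight.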


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under "strong ignorability", the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
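A small exact example may help to see why conditioning on the score can be enough. It illustrates the related balancing property of the propensity score (X is independent of D given the score), which is a standard result from Rosenbaum and Rubin (1983) rather than something stated on the slide. The population counts below are hypothetical.

```python
from fractions import Fraction

# Hypothetical population: covariate cell -> (number of units, number treated)
cells = {"a": (100, 30), "b": (200, 60), "c": (100, 70)}

def propensity(cell):
    """Share of treated units in a covariate cell."""
    n, n_treated = cells[cell]
    return Fraction(n_treated, n)

def share_within_score(cell, score, treated):
    """Share of `cell` among treated (or control) units with the given score."""
    def count(c):
        n, n_treated = cells[c]
        return n_treated if treated else n - n_treated
    stratum = [c for c in cells if propensity(c) == score]
    return Fraction(count(cell), sum(count(c) for c in stratum))

score = propensity("a")  # cells "a" and "b" share the same score
share_treated = share_within_score("a", score, treated=True)
share_control = share_within_score("a", score, treated=False)
```

Cells "a" and "b" differ in X but have the same propensity score. Among units with that score, the mix of "a" and "b" is identical for treated and controls, so comparing them on the score alone does not introduce imbalance in X.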

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g. using a logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular.
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
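The two steps can be sketched in plain Python. The data are hypothetical, the logit model has a single covariate and is fitted by Newton-Raphson, and step 2 uses one-nearest-neighbor matching with replacement. This is an illustration of the procedure only, not of how Stata or kmatch implement it.

```python
import math

def fit_logit(x, d, iterations=25):
    """Newton-Raphson for Pr(D=1|x) = 1 / (1 + exp(-(a + b*x)))."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Information matrix (negative Hessian)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: covariate x and treatment indicator d
x = [1, 2, 3, 4, 5, 6, 7, 8]
d = [0, 0, 1, 0, 1, 0, 1, 1]

# Step 1: estimate the propensity score with a logit model
a, b = fit_logit(x, d)
score = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]

# Step 2: match each treated unit to the control with the closest score
treated = [i for i, di in enumerate(d) if di == 1]
controls = [j for j, dj in enumerate(d) if dj == 0]
match = {i: min(controls, key=lambda j: abs(score[i] - score[j]))
         for i in treated}
```

Any of the matching algorithms discussed earlier (caliper, radius, kernel) could be used in step 2 in place of the simple nearest-neighbor rule.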


King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
- http://j.mp/1sexgVw

Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, and Stuart 2007, fig. 1, Political Analysis)

[Figure, shown in several animation steps: scatter plot of Outcome (0 to 12) against Education (years, 12 to 28) for treated (T) and control (C) units. Slide 3/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance of covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps: scatter plot of Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units. Slide 9/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure, shown in two animation steps: scatter plot of Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units, with an additional propensity score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units. Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse")
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
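The first bullet of the argument can be illustrated with a small simulation. It assumes a single covariate X that is unrelated to treatment, and it represents random pruning by simply drawing smaller groups from the same population, which has the same distribution as deleting observations at random. The sample sizes, seed, and function name are arbitrary choices for this sketch.

```python
import random

def mean_abs_imbalance(n_per_group, replications, rng):
    """Average |mean(X | treated) - mean(X | control)| when X is unrelated to T."""
    total = 0.0
    for _ in range(replications):
        xt = [rng.gauss(0, 1) for _ in range(n_per_group)]
        xc = [rng.gauss(0, 1) for _ in range(n_per_group)]
        total += abs(sum(xt) / n_per_group - sum(xc) / n_per_group)
    return total / replications

rng = random.Random(12345)
full = mean_abs_imbalance(200, 500, rng)    # 200 units per group
pruned = mean_abs_imbalance(20, 500, rng)   # 20 units per group after pruning
```

The expected absolute mean difference scales with one over the square root of the group size, so pruning from 200 to 20 units per group should raise the average imbalance by roughly a factor of three.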

PSM Increases Model Dependence & Bias

[Figure with two panels, each comparing MDM and PSM by the number of units pruned (0 to 160). Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications, with the true effect = 2. Data-generating process: $Y_i = 2 T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen.]

The Propensity Score Paradox in Real Data

[Figure with two panels, "Finkel et al. (JOP 2012)" and "Nielsen et al. (AJPS 2011)": imbalance by number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and the 1/4 SD caliper. Slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations that have the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment/bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     432    25    457    1105     291   1396      1.3394

Treatment-effects estimation
wage        Coef.
ATT      .6059013
NATE     1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                       Matched (ATT)
Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad      .321663   .224212   .219912    .319444   .319444         0
ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry    .006565   .012178  -.058246     .00463    .00463         0
4.industry    .183807   .166905   .044425    .185185   .185185         0
5.industry    .105033   .027937   .312944    .085648   .085648         0
6.industry    .045952   .169771  -.407129    .048611   .048611         0
7.industry    .019694   .102436  -.350657    .020833   .020833         0
8.industry    .017505   .035817  -.113785    .009259   .009259         0
9.industry    .010941   .040115  -.185669    .011574   .011574         0
10.industry   .004376   .008596  -.052551    .002315   .002315         0
11.industry   .479212   .356734   .250073    .506944   .506944         0
12.industry   .122538    .07235   .169707     .12037    .12037         0
2.race        .330416   .244986   .189418      .3125     .3125         0
3.race        .017505   .011461   .050566    .006944   .006944         0
south         .297593   .466332  -.352408    .291667   .291667         0

                          Raw                       Matched (ATT)
Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad      .218674   .174066   1.25628    .217904   .217904         1
ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry    .006536   .012038   .542928    .004619   .004619         1
4.industry    .150351   .139148   1.08052    .151242   .151242         1
5.industry    .094207   .027176   3.46656    .078494   .078494         1
6.industry    .043936    .14105   .311496    .046355   .046355         1
7.industry    .019348   .092008   .210287    .020447   .020447         1
8.industry    .017237   .034559   .498769    .009195   .009195         1
9.industry    .010845   .038533   .281445    .011467   .011467         1
10.industry   .004367   .008528   .512039    .002315   .002315         1
11.industry   .250115   .229639   1.08917    .250532   .250532         1
12.industry   .107758   .067163   1.60443    .106127   .106127         1
2.race        .221726     .1851   1.19787    .215342   .215342         1
3.race        .017237   .011338   1.52025    .006912   .006912         1
south          .20949   .249045   .841173    .207077   .207077         1
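As a reading aid for the balancing statistics, the raw-sample values for collgrad can be reproduced by hand. The formula below (difference in means divided by the square root of the average of the two group variances) is an assumption: it reproduces the reported raw values, but it is inferred from the numbers rather than taken from the kmatch documentation. The input values are read from the tables with their decimal points restored.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Difference in means divided by the root of the average variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def var_ratio(var_t, var_c):
    """Variance of the treated divided by the variance of the untreated."""
    return var_t / var_c

# Raw-sample means and variances of collgrad (treated, untreated)
sd_collgrad = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = var_ratio(0.218674, 0.174066)

# Raw-sample means and variances of ttl_exp (treated, untreated)
sd_ttl_exp = std_diff(13.2685, 12.7323, 20.5898, 21.0001)
```

A standardized difference of 0 and a variance ratio of 1 indicate perfect balance on the first two moments, which is what the matched columns show for the categorical covariates.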

Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" (axis from -.4 to .4) and "Variance ratio" (axis from 0 to 4), showing raw and matched values for collgrad, ttl_exp, tenure, the industry and race indicators, and south.]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     431    26    457    1214     182   1396      .00188

Treatment-effects estimation
wage        Coef.
ATT      .3887224
NATE     1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density estimates of the propensity score (0 to .8) for untreated and treated units, in two panels, "Raw" and "Matched (ATT)".]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability (0 to 1) of the propensity score (0 to .8) for untreated and treated units, in two panels, "Raw" and "Matched (ATT)".]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score (0 to .8) for untreated and treated units, in two panels, "Raw" and "Matched (ATT)".]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     432    25    457    1105     291   1396      1.3394
Untreated  1386    10   1396     455       2    457      3.3975
Combined   1818    35   1853    1560     293   1853

Treatment-effects estimation
         Observed  Bootstrap                      Normal-based
wage        Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE      .4095729   .1920853   2.13   0.033   .0330928   .7860531
ATT      .6059013   .2472069   2.45   0.014   .1213846   1.090418
ATC      .3483797   .1893653   1.84   0.066  -.0227695   .7195289
NATE     1.432913   .2333282   6.14   0.000   .9755981   1.890228
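The z statistic, p-value, and normal-based confidence interval in the bootstrap table follow directly from the coefficient and its standard error. The sketch below recomputes them for the ATT row. The function name `wald` is hypothetical, and the inputs are the ATT coefficient and bootstrap standard error as read from the output.

```python
from statistics import NormalDist

def wald(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p, coef - crit * se, coef + crit * se

# ATT and its bootstrap standard error from the table
z, p, lower, upper = wald(0.6059013, 0.2472069)
```

Because the interval is normal-based, it is symmetric around the point estimate. Percentile intervals from the bootstrap distribution would in general differ.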

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415  -4.57   0.000  -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment  : union = 1                                max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     457     0    457     328    1068   1396

Treatment-effects estimation
wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                                     AI Robust
wage                       Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .7246969   .2942952   2.46   0.014    .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     457     0    457     870     526   1396

Treatment-effects estimation
wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                     AI Robust
wage                       Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .5590823   .2381752   2.35   0.019   .0922675   1.025897

Some Examples

Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     457     0    457     870     526   1396

Treatment-effects estimation
wage        Coef.
ATT      .5288023
(adjusted for collgrad ttl_exp tenure i.industry i.race south)

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                     AI Robust
wage                       Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .5288023   .2420635   2.18   0.029   .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     439    18    457    1258     138   1396      .83886

Treatment-effects estimation
wage        Coef.
ATT      .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     432    25    457    1103     293   1396      1.3013

Treatment-effects estimation
wage        Coef.
ATT      .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     432    25    457    1105     291   1396      1.3394

Treatment-effects estimation
wage        Coef.
ATT      .6059013

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     448     9    457    1184     212   1396      1.8888

Treatment-effects estimation
wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (1.5 to 3), with the evaluation points labeled by their index.]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls
            Yes    No  Total    Used  Unused  Total   Bandwidth
Treated     453     4    457    1289     107   1396       2.433

Treatment-effects estimation
wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (1.5 to 3), with the evaluation points labeled by their index.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching    Number of obs = 1853
                                             Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |      Matched       |       Controls       | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  455     2     457 |  1356      40   1396 | 2.7626

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE (y-axis) against bandwidth (x-axis, 1 to 5); points are labeled with the index of the evaluation step]

Some Examples

Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(0.5)

    Multivariate-distance kernel matching    Number of obs = 1853
                                             Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |      Matched       |       Controls       | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  366    91     457 |   701     695   1396 |    .5

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

    Common support (treated)
                |            Means             |   Standardized difference
                |  Matched  Unmatc~d     Total |  (1)-(3)   (2)-(3)   (1)-(2)
       collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
        ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
         tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
     3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
     4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
     5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
     6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
     7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
     8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
     9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
    10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
    11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
    12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
         2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
         3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
          south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

    Common support (treated)
                |          Variances           |            Ratio
                |  Matched  Unmatc~d     Total |  (1)/(3)   (2)/(3)   (1)/(2)
       collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
        ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
         tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
     3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
     4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
     5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
     6.industry |  .054233         0   .043936 |  1.23435         0         .
     7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
     8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
     9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
    10.industry |        0   .021734   .004367 |        0   4.97709         0
    11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
    12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
         2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
         3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
          south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support statistics:

    . mat M = r(M)
    . coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing one point per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); x-axis from -.2 to .2 with a reference line at 0]

Some Examples

Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching    Number of obs = 1852
                                             Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |      Matched       |       Controls       | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  432    25     457 |  1104     291   1395 | 1.3392

    Treatment-effects estimation

               |     Coef.
    wage       |
           ATT |  .6021049
          NATE |  1.430823
    hours      |
           ATT |  1.263759
          NATE |  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure) ///
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching    Number of obs = 1852
                                             Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |      Matched       |       Controls       | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  432    25     457 |  1104     291   1395 | 1.3392

    Treatment-effects estimation

               |     Coef.
    wage       |
           ATT |  .5152752
          NATE |  1.430823
    hours      |
           ATT |  1.263759
          NATE |  1.450303

    wage: adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50) ... 50

    Multivariate-distance kernel matching    Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics

               |      Matched       |       Controls       | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
    0  Treated |  306    15     321 |   625     120    745 | 1.3199
    1  Treated |  126    10     136 |   473     178    651 | 1.3398

    Treatment-effects estimation

               |  Observed   Bootstrap                     Normal-based
          wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    0      ATT |  .4586332    .2763358   1.66    0.097   -.082975   1.000241
    1      ATT |  .9518705     .406903   2.34    0.019   .1543553   1.749386

    . test [0]ATT = [1]ATT
     ( 1)  [0]ATT - [1]ATT = 0
               chi2(  1) =    1.23
             Prob > chi2 =    0.2679

    . lincom [1]ATT - [0]ATT
     ( 1)  - [0]ATT + [1]ATT = 0

          wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
           (1) |  .4932373    .4452343   1.11    0.268   -.379406   1.365881
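The test and lincom results for the subgroup comparison can be checked by hand. The following Python sketch is not part of the original slides; it assumes the decimal points of the extracted numbers are placed as shown in the comments (ATT of .4586332 and .9518705, standard error of the difference .4452343) and uses a normal approximation, as the "Normal-based" interval suggests.

```python
import math

# Subgroup ATT estimates and the bootstrap standard error of their
# difference, read off the output above (decimal placement is assumed).
att_south0 = 0.4586332
att_south1 = 0.9518705
se_diff = 0.4452343

diff = att_south1 - att_south0               # point estimate of the difference
z = diff / se_diff                           # z statistic reported by lincom
chi2 = z ** 2                                # Wald chi-squared with 1 df (test)
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
z_crit = 1.959964                            # 97.5th percentile of N(0, 1)
ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

print(f"diff = {diff:.7f}, z = {z:.2f}, chi2 = {chi2:.2f}, p = {p_value:.4f}")
print(f"95% CI = [{ci[0]:.6f}, {ci[1]:.6f}]")
```

The chi-squared statistic of the Wald test is simply the square of the z statistic, which is why test and lincom lead to the same p-value.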

Simulation

• Population data from the Swiss census of 2000.
• Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
• Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
• Control variables: gender, age, and highest educational degree.
• Population restricted to working people between 24 and 60 years old.
• 2'308'006 individuals, of which 17.5% belong to the treatment group.
• Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

• Substantial differences between resident aliens and Swiss nationals on all three covariates.
• Propensity score in the population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis, 0 to 1) for the untreated and the treated; McFadden R2 = 0.121]

Simulation

• Raw mean difference in occupational prestige (NATE): -4.79
• Population ATT (computed from fully stratified data): -3.96
• There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels plotted against the propensity score (0 to .8): left, mean outcome (30 to 55) for the untreated and the treated; right, treatment effect (-6 to -1)]
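The population quantities on this slide are tied together by the decomposition ATE = (N_{D=1}/N) · ATT + (N_{D=0}/N) · ATC. A small Python check, not part of the original slides, assuming the treatment-group share is 17.5% and the decimal points sit as in -4.79, -3.96, and -3.41:

```python
# Population quantities from the simulation setup (decimal placement and
# the 17.5% treatment share are assumptions about the garbled extraction).
share_treated = 0.175
nate = -4.79
att = -3.96
atc = -3.41

# ATE as the share-weighted average of ATT and ATC.
ate = share_treated * att + (1 - share_treated) * atc

# Bias of the raw group comparison when the ATT is the estimand.
selection_bias = nate - att

print(round(ate, 2), round(selection_bias, 2))
```

The result rounds to the ATE of -3.51 reported on the slide, and the raw comparison overstates the ATT by about 0.83 prestige points in absolute terms.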

Results: Variance

[Figure: variance of the estimators, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Notes:

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Notes:

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Results: Relative standard error

[Figure: relative standard error, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels for N = 500 and N = 5000, with the same rows and series as in the relative standard error figure.]


Conclusions

• The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
• Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
• MDM leaves less scope for post-matching modeling decision biases.
• Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
• Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
• The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
• Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
• For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
• Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
• Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Counterfactual Causality (see Neyman 1923; Rubin 1974, 1990)

a.k.a. Rubin Causal Model, a.k.a. Potential Outcomes Framework

John Stuart Mill (1806–1873):

"Thus, if a person eats of a particular dish, and dies in consequence, that is, would not have died if he had not eaten of it, people would be apt to say that eating of that dish was the cause of his death." (Mill 2002[1843]: 214)

Treatment variable D:

\[ D = \begin{cases} 1 & \text{treatment (eats of a particular dish)} \\ 0 & \text{control (does not eat of a particular dish)} \end{cases} \]

Potential outcomes $Y^1$ and $Y^0$:
• $Y^1$: potential outcome with treatment (D = 1)
    • If person i would eat of a particular dish, would she die or would she survive?
• $Y^0$: potential outcome without treatment (D = 0)
    • If person i would not eat of a particular dish, would she die or would she survive?

Causal effect of the treatment for individual i (causal effect = difference between potential outcomes):

\[ \delta_i = Y_i^1 - Y_i^0 \]

Fundamental Problem of Causal Inference

The causal effect of D on Y for individual i is defined as the difference in potential outcomes, $\delta_i = Y_i^1 - Y_i^0$.

However, the observed outcome variable is

\[ Y_i = \begin{cases} Y_i^1 & \text{if } D_i = 1 \\ Y_i^0 & \text{if } D_i = 0 \end{cases} \]

That is, only one of the two potential outcomes will be realized and, hence, only $Y_i^1$ or $Y_i^0$ can be observed, but never both.

Consequence: The individual treatment effect $\delta_i$ cannot be observed!

Average Treatment Effect

Although individual causal effects cannot be observed, the average causal effect in a population (the so-called "Average Treatment Effect") can be identified by comparing the expected values of $Y^1$ and $Y^0$:

\[ ATE = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0] \]

Some other quantities of interest:

• Average Treatment Effect on the Treated (ATT)

\[ ATT = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1] \]

• Average Treatment Effect on the Untreated (ATC)

\[ ATC = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0] \]
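To make these definitions concrete, here is a minimal Python sketch on a hypothetical four-unit population in which both potential outcomes are known, something that is only possible in a constructed example. All names and numbers are illustrative assumptions and do not come from the slides.

```python
# Hypothetical toy population: each tuple is (D, Y1, Y0).
population = [
    (1, 5.0, 3.0),
    (1, 6.0, 3.0),
    (0, 4.0, 3.0),
    (0, 2.0, 2.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def average_effect(units):
    """Mean of the individual effects delta_i = Y1_i - Y0_i."""
    return mean(y1 - y0 for _, y1, y0 in units)


ate = average_effect(population)
att = average_effect([u for u in population if u[0] == 1])
atc = average_effect([u for u in population if u[0] == 0])

# What a researcher actually sees: only the realized outcome per unit.
observed = [(d, y1 if d == 1 else y0) for d, y1, y0 in population]

# Raw group comparison based on observed outcomes only.
naive = (mean(y for d, y in observed if d == 1)
         - mean(y for d, y in observed if d == 0))

print(ate, att, atc, naive)
```

In this toy population the raw comparison (3.0) matches none of the three estimands (1.5, 2.5, 0.5), because the units selected into treatment differ systematically in their potential outcomes.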

Average Treatment Effect

To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.

If the independence assumption

\[ (Y^0, Y^1) \perp\!\!\!\perp D \]

applies, that is, if D is independent of $Y^0$ and $Y^1$, then

\[ E[Y^0] = E[Y^0 \mid D = 0] \]
\[ E[Y^1] = E[Y^1 \mid D = 1] \]

In this case the average causal effect can be measured by a simple group comparison (mean difference) of observations without treatment (D = 0) and observations with treatment (D = 1).

Randomized experiments solve the problem: If the assignment of D is randomized, D is independent of $Y^0$ and $Y^1$ by design.
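The claim that randomization makes the simple group comparison unbiased can be verified exactly on a tiny example by enumerating every possible assignment. The sketch below is an illustration under assumed, hypothetical potential outcomes; it is not taken from the slides.

```python
from itertools import combinations

# Hypothetical potential outcomes (Y1, Y0) for four units.
units = [(5.0, 3.0), (6.0, 3.0), (4.0, 3.0), (2.0, 2.0)]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


true_ate = mean(y1 - y0 for y1, y0 in units)


def mean_difference(treated_ids):
    """Observed group comparison for one particular assignment."""
    treated = [units[i][0] for i in range(len(units)) if i in treated_ids]
    control = [units[i][1] for i in range(len(units)) if i not in treated_ids]
    return mean(treated) - mean(control)


# Under randomization, every way of assigning exactly two of the four
# units to treatment is equally likely.
assignments = list(combinations(range(len(units)), 2))
estimates = [mean_difference(set(a)) for a in assignments]
expected_estimate = mean(estimates)

print(true_ate, expected_estimate, min(estimates), max(estimates))
```

Individual assignments give estimates between 0.0 and 3.0, but their average over all equally likely assignments equals the true ATE of 1.5, which is what unbiasedness means here.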


Conditional Independence / Strong Ignorability

Can causal effects also be identified from "observational" (i.e., non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):

\[ (Y^0, Y^1) \perp\!\!\!\perp D \mid X \]

If, in addition, the overlap assumption

\[ 0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x \]

is given, then the ATE (or ATT or ATC) can be identified by conditioning on X.

For example:

\[ ATE = \sum_x \Pr[X = x] \, \bigl\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \bigr\} \]
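The conditioning formula for the ATE can be sketched in a few lines of Python. The data set below is hypothetical (one binary covariate x, chosen so that treatment is much more common where outcomes are higher), and the function name is an assumption made for this illustration.

```python
from collections import defaultdict

# Hypothetical observational data: (x, d, y).
data = [
    (0, 1, 3.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 6.0), (1, 1, 6.0), (1, 1, 6.0), (1, 0, 5.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def stratified_ate(rows):
    """Sum over x of Pr[X = x] * (E[Y | D=1, X=x] - E[Y | D=0, X=x])."""
    strata = defaultdict(list)
    for x, d, y in rows:
        strata[x].append((d, y))
    ate = 0.0
    for x, cell in strata.items():
        treated = [y for d, y in cell if d == 1]
        control = [y for d, y in cell if d == 0]
        # Overlap assumption: every stratum must contain both groups.
        if not treated or not control:
            raise ValueError(f"no overlap in stratum X = {x}")
        ate += len(cell) / len(rows) * (mean(treated) - mean(control))
    return ate


naive = (mean(y for _, d, y in data if d == 1)
         - mean(y for _, d, y in data if d == 0))
ate = stratified_ate(data)

print(naive, ate)
```

Here the raw comparison gives 3.25, while conditioning on x gives 1.5; the difference is due entirely to the unequal distribution of x across the two groups.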

Matching

Matching is one approach to "condition on X" if strong ignorability holds.

Basic idea:
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.

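Matching

Formally:

\[ \widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \Bigl[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \Bigr] \]

\[ \widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \Bigl[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \Bigr] \]

\[ \widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC} \]

Different matching algorithms use different definitions of $w_{ij}$.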

Exact Matching

Exact matching:

\[ w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases} \]

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
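As a minimal sketch of the exact-matching weights, the following Python code estimates the ATT on hypothetical data with a single discrete covariate. It assumes that k_i counts the controls with the same X as treated unit i; names and data are illustrative only.

```python
# Hypothetical data: (x, d, y), where x is a single discrete covariate.
data = [
    (0, 1, 3.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 6.0), (1, 1, 6.0), (1, 1, 6.0), (1, 0, 5.0),
    (2, 1, 9.0),  # treated unit without any exact match among controls
]


def exact_matching_att(rows):
    """Return (ATT over matched treated units, number of unmatched treated)."""
    treated = [(x, y) for x, d, y in rows if d == 1]
    controls = [(x, y) for x, d, y in rows if d == 0]
    effects = []
    unmatched = 0
    for x_i, y_i in treated:
        matches = [y_j for x_j, y_j in controls if x_j == x_i]
        if not matches:
            unmatched += 1  # no statistical twin: unit i drops out
            continue
        k_i = len(matches)
        y0_hat = sum(y_j / k_i for y_j in matches)  # weights w_ij = 1 / k_i
        effects.append(y_i - y0_hat)
    return sum(effects) / len(effects), unmatched


att, n_unmatched = exact_matching_att(data)
print(att, n_unmatched)
```

With more covariates the number of distinct X cells grows quickly, so the count of unmatched treated units returned here would rise as well, which is the practical face of the curse of dimensionality.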

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are "close", but not necessarily equal, as matches.

A common approach is to use

\[ MD(X_i, X_j) = \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)} \]

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

• Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
• Euclidean matching: $\Sigma$ is the identity matrix.
• Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
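One practical consequence of scaling by the covariance matrix is that the Mahalanobis distance does not depend on the units of measurement, whereas the Euclidean distance does. The Python sketch below illustrates this for two covariates using hypothetical data (education in years, age in years or months); it is a hand-rolled illustration, not the routine used by kmatch.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of two-column data."""
    n = len(rows)
    m0 = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    s00 = sum((r[0] - m0) ** 2 for r in rows) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in rows) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in rows) / (n - 1)
    return [[s00, s01], [s01, s11]]


def mahalanobis(a, b, cov):
    """sqrt((a - b)' cov^{-1} (a - b)) for two-dimensional points."""
    s00, s01, s11 = cov[0][0], cov[0][1], cov[1][1]
    det = s00 * s11 - s01 * s01
    d0, d1 = a[0] - b[0], a[1] - b[1]
    # Quadratic form with the explicit inverse of the 2 x 2 matrix.
    q = (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det
    return math.sqrt(q)


def euclidean(a, b):
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


# Hypothetical covariates: (education in years, age in years).
X_years = [(12, 25), (16, 40), (14, 31), (18, 52), (13, 45)]
# Same data with age expressed in months.
X_months = [(edu, age * 12) for edu, age in X_years]

md_years = mahalanobis(X_years[0], X_years[1], covariance_matrix(X_years))
md_months = mahalanobis(X_months[0], X_months[1], covariance_matrix(X_months))
eu_years = euclidean(X_years[0], X_years[1])
eu_months = euclidean(X_months[0], X_months[1])

print(md_years, md_months, eu_years, eu_months)
```

The two Mahalanobis distances agree up to rounding error, while the Euclidean distance is dominated by age once it is measured in months.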

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights $w_{ij}$.

• Pair matching (one-to-one matching without replacement)
    • For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.
• Nearest-neighbor matching
    • For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.
• Caliper matching
    • Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
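The nearest-neighbor and caliper rules can be expressed as a small weight function. This Python sketch takes the distances from one treated unit to all controls and returns the weights w_ij; giving all selected matches equal weight is an assumption of this illustration, and pair matching without replacement is not covered.

```python
def nn_weights(distances, k=1, caliper=None):
    """Matching weights of one treated unit for each control (by index).

    distances: distance from the treated unit to each control.
    k:         number of nearest neighbors; ties at the cutoff are all kept.
    caliper:   if given, only controls with distance < caliper are eligible.
    """
    if caliper is not None:
        eligible = {j: d for j, d in enumerate(distances) if d < caliper}
    else:
        eligible = dict(enumerate(distances))
    if not eligible:
        return {}  # no control within the caliper: unit stays unmatched
    ranked = sorted(eligible.values())
    cutoff = ranked[min(k, len(ranked)) - 1]  # distance of the k-th closest
    matches = [j for j, d in eligible.items() if d <= cutoff]
    return {j: 1 / len(matches) for j in matches}


distances = [0.2, 0.5, 0.2, 0.9]
print(nn_weights(distances))                # two tied nearest neighbors
print(nn_weights(distances, k=3))           # three closest controls
print(nn_weights(distances, caliper=0.1))   # nothing within the caliper
```

Because ties are kept, a request for one neighbor can return more than one control, as in the first call above.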

Mahalanobis Matching

• Radius matching
    • Use all controls as matches for which MD is smaller than some threshold c.
• Kernel matching
    • Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
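Kernel matching weights can be sketched as follows in Python, using the Epanechnikov kernel with the threshold c acting as bandwidth. The distances and outcomes are hypothetical, and normalizing the weights to sum to one for each treated unit is an assumption of this sketch rather than a statement about kmatch internals.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive inside |u| < 1, zero outside."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def kernel_weights(distances, bandwidth):
    """Normalized kernel weights of one treated unit for each control."""
    raw = [epanechnikov(d / bandwidth) for d in distances]
    total = sum(raw)
    if total == 0:
        return None  # no control within the bandwidth: unit stays unmatched
    return [r / total for r in raw]


# Hypothetical distances of four controls to one treated unit, and the
# controls' observed outcomes.
distances = [0.0, 0.5, 1.0, 2.0]
outcomes = [10.0, 12.0, 20.0, 30.0]

weights = kernel_weights(distances, bandwidth=1.0)
y0_hat = sum(w * y for w, y in zip(weights, outcomes))  # imputed counterfactual

print(weights, y0_hat)
```

Radius matching corresponds to replacing the kernel by a uniform weight inside the bandwidth; the kernel version instead lets closer controls count more.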


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

\[ \Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i) \]

where $\pi(X)$ is called the propensity score.

That is,

\[ (Y^0, Y^1) \perp\!\!\!\perp D \mid X \]

implies

\[ (Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X) \]

so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure:
• Step 1: Estimate the propensity score, e.g., using a Logit model.
• Step 2: Apply a matching algorithm using differences in the propensity score, $|\hat{\pi}(X_i) - \hat{\pi}(X_j)|$, instead of multivariate distances.

PSM is tremendously popular:
• https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
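The two-step procedure can be sketched in plain Python. The example below fits a logit model with one covariate by Newton-Raphson and then applies one-nearest-neighbor matching with replacement on the estimated propensity score. The data are hypothetical, and the hand-written fitting routine is only an illustration; it is not the estimator used by kmatch or any other package.

```python
import math

# Hypothetical data: covariate x, treatment indicator d, outcome y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
d = [0, 0, 1, 0, 1, 0, 1, 1]
y = [2.0, 2.5, 4.0, 3.5, 5.0, 4.5, 6.0, 6.5]


def fit_logit(x, d, iterations=25):
    """Newton-Raphson for Pr(D = 1 | x) = 1 / (1 + exp(-(a + b * x)))."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Negative Hessian (a symmetric 2 x 2 matrix).
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Step 1: estimate the propensity score.
a, b = fit_logit(x, d)
pscore = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]

# Step 2: match each treated unit to the control with the closest score.
treated = [i for i, di in enumerate(d) if di == 1]
controls = [j for j, dj in enumerate(d) if dj == 0]

effects = []
for i in treated:
    j = min(controls, key=lambda c: abs(pscore[i] - pscore[c]))
    effects.append(y[i] - y[j])
att_psm = sum(effects) / len(effects)

print(round(b, 3), [round(p, 3) for p in pscore], att_psm)
```

Replacing the single nearest neighbor in step 2 by kernel weights on the score differences gives kernel PSM; the first step stays the same.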


King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
• http://j.mp/1sexgVw

Slides:
• https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
• https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1:
• Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
• Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)

[Figure, shown in several overlays: scatter plot of the outcome (0 to 12) against education in years (12 to 28); treated units are marked T and control units are marked C. Slide 3/23 from the slides by King and Nielsen.]

King and Nielsen

Argument 2:
• PSM approximates complete randomization.
• Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 from the slides by King and Nielsen)

Types of Experiments

    Balance on covariates    Complete randomization    Fully blocked
    Observed                 On average                Exact
    Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two overlays: scatter plot of age (20 to 80) against education in years (12 to 28); treated units are marked T and control units are marked C, first with all controls and then with the matched controls only. Slide 9/23 from the slides by King and Nielsen.]

Best Case Propensity Score Matching

[Figure: scatter plot of Age (20 to 80) against Education in years (12 to 28) with treated (T) and control (C) units, together with the position of the units on the propensity score, which runs from 0 to 1.]

15/23 (slides by King and Nielsen)


Best Case Propensity Score Matching is Suboptimal

[Figure: scatter plot of Age (20 to 80) against Education in years (12 to 28) with treated (T) and control (C) units.]

15/23 (slides by King and Nielsen)

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better’ you do worse”):
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
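The first point of Argument 3 can be illustrated with a small simulation. The sketch below is not taken from King and Nielsen or from the talk; the function name, the standard-normal covariate, and the sample sizes are assumptions chosen for illustration. Treated and control units are drawn from the same distribution, so any imbalance is pure sampling noise, and pruning at random makes that noise larger.

```python
import random
import statistics


def mean_abs_imbalance(n_per_group, n_keep, reps=1000, seed=1):
    """Average |mean(X_treated) - mean(X_control)| after keeping
    n_keep randomly chosen units per group (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # Both groups come from the same N(0, 1) distribution.
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        # Random pruning: keep a random subset, ignoring X entirely.
        treated_kept = rng.sample(treated, n_keep)
        control_kept = rng.sample(control, n_keep)
        total += abs(statistics.fmean(treated_kept) - statistics.fmean(control_kept))
    return total / reps


full = mean_abs_imbalance(200, 200)    # no pruning
pruned = mean_abs_imbalance(200, 25)   # keep 25 of 200 units per group
print(f"no pruning:      {full:.3f}")
print(f"random pruning:  {pruned:.3f}")
```

The average absolute mean difference grows as more units are discarded at random, which is the sense in which random pruning increases imbalance.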

PSM Increases Model Dependence & Bias

[Figure, two panels, each plotted against the number of units pruned (0 to 160) with one line for MDM and one for PSM. Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications; true effect = 2.]

$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i, \quad \epsilon_i \sim N(0, 1)$

20/23 (slides by King and Nielsen)

The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). Each panel plots imbalance against the number of units pruned (0 to 3000 and 0 to 2500, respectively) for CEM, MDM, PSM, and random pruning, and marks the raw data and the 1/4 SD caliper.]

Similar pattern for > 20 other real data sets we checked.

21/23 (slides by King and Nielsen)


Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
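The dependence of the efficiency gain on the X–Y relation can be checked with a toy re-randomization experiment. This is a minimal sketch under assumed settings (a binary covariate, a true effect of 2, unit-variance noise, and a hypothetical coefficient `beta` for the effect of X on Y); it is not part of the talk.

```python
import random
import statistics


def estimator_variance(blocked, beta, n=40, reps=2000, seed=7):
    """Variance of the difference-in-means estimate across re-randomizations.

    blocked=True  : treatment is randomized separately within each level of X
    blocked=False : complete randomization that ignores X
    beta          : assumed strength of the relation between X and Y
    """
    rng = random.Random(seed)
    x = [i % 2 for i in range(n)]  # binary covariate, two equally sized blocks
    estimates = []
    for _ in range(reps):
        treat = [0] * n
        if blocked:
            for level in (0, 1):
                idx = [i for i in range(n) if x[i] == level]
                for i in rng.sample(idx, len(idx) // 2):
                    treat[i] = 1
        else:
            for i in rng.sample(range(n), n // 2):
                treat[i] = 1
        y = [2.0 * treat[i] + beta * x[i] + rng.gauss(0.0, 1.0) for i in range(n)]
        y1 = [y[i] for i in range(n) if treat[i] == 1]
        y0 = [y[i] for i in range(n) if treat[i] == 0]
        estimates.append(statistics.fmean(y1) - statistics.fmean(y0))
    return statistics.variance(estimates)


var_complete_strong = estimator_variance(blocked=False, beta=3.0)
var_blocked_strong = estimator_variance(blocked=True, beta=3.0)
var_complete_none = estimator_variance(blocked=False, beta=0.0)
var_blocked_none = estimator_variance(blocked=True, beta=0.0)
print(f"X related to Y:   complete {var_complete_strong:.3f}, blocked {var_blocked_strong:.3f}")
print(f"X unrelated to Y: complete {var_complete_none:.3f}, blocked {var_blocked_none:.3f}")
```

With a strong X–Y relation, blocking removes the variance that chance imbalance in X would otherwise add; with no relation, the two designs are about equally efficient. The sample size is held fixed here, so the sketch does not address the case where blocking discards observations.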

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization among observations that share the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
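The two cases can be made concrete with a tiny made-up data set. The sketch below computes a fully stratified propensity score, that is, the share of treated units within each covariate cell; the data, the cell labels, and the helper name are assumptions for illustration only.

```python
from collections import defaultdict


def stratified_propensity(x, t):
    """Share of treated units within each covariate cell, P(T = 1 | X = x)."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_treated, n_total]
    for xi, ti in zip(x, t):
        counts[xi][0] += ti
        counts[xi][1] += 1
    return {cell: n_treated / n_total for cell, (n_treated, n_total) in counts.items()}


# Three covariate cells with four units each (made-up data).
x = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

# X unrelated to T: half of every cell is treated.
t_unrelated = [1, 1, 0, 0] * 3
# X strongly related to T: the treated share differs across cells.
t_related = [1, 1, 1, 0] + [1, 1, 0, 0] + [1, 0, 0, 0]

ps_unrelated = stratified_propensity(x, t_unrelated)
ps_related = stratified_propensity(x, t_related)

# Number of distinct propensity-score values, i.e. the number of "blocks".
blocks_unrelated = len(set(ps_unrelated.values()))
blocks_related = len(set(ps_related.values()))
print(ps_unrelated, blocks_unrelated)
print(ps_related, blocks_related)
```

When X is unrelated to T, every cell has the same score and matching on it cannot distinguish the cells (one block, as under complete randomization). When X is related to T, each cell has its own score, so matching on the score blocks on X.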

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better’ you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X’s cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
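To show what "average where pruning is not necessary" means, here is a minimal sketch of kernel matching on a scalar score with an Epanechnikov kernel. It is a simplified illustration, not the kmatch implementation; the function names, the toy data, and the fixed bandwidth are assumptions.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_match_att(ps_treated, y_treated, ps_control, y_control, bandwidth):
    """ATT from kernel matching on a scalar score (illustrative sketch).

    Returns (att, number of matched treated units, number of controls used).
    """
    effects = []
    used = set()
    for p, y in zip(ps_treated, y_treated):
        weights = [epanechnikov((p - q) / bandwidth) for q in ps_control]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unit stays unmatched
        # Counterfactual: kernel-weighted average over ALL controls in range.
        counterfactual = sum(w * yc for w, yc in zip(weights, y_control)) / total
        effects.append(y - counterfactual)
        used.update(j for j, w in enumerate(weights) if w > 0.0)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects), len(used)


# Toy data: two treated units have close controls, one has none.
ps_t, y_t = [0.30, 0.50, 0.90], [5.0, 6.0, 9.0]
ps_c, y_c = [0.28, 0.32, 0.50, 0.52, 0.10], [3.0, 3.0, 4.0, 4.0, 1.0]

att, n_matched, n_used = kernel_match_att(ps_t, y_t, ps_c, y_c, bandwidth=0.05)
print(att, n_matched, n_used)
```

Each matched treated unit is compared with the average of both nearby controls rather than with a single one, so no good match is thrown away. Only the treated unit without any control inside the bandwidth and the control far from all treated units are pruned.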

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you’d need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  432    25      457  |  1105     291   1396 | 1.3394

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6059013
  NATE |  1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |              Raw               |          Matched(ATT)
       Means |  Treated  Untrea~d    StdDif   |  Treated  Untrea~d    StdDif
    collgrad |  .321663   .224212   .219912   |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584   |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735   |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246   |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425   |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944   |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129   |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657   |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785   |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669   |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551   |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073   |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707   |   .12037    .12037         0
      2.race |  .330416   .244986   .189418   |    .3125     .3125         0
      3.race |  .017505   .011461   .050566   |  .006944   .006944         0
       south |  .297593   .466332  -.352408   |  .291667   .291667         0

             |              Raw               |          Matched(ATT)
   Variances |  Treated  Untrea~d     Ratio   |  Treated  Untrea~d     Ratio
    collgrad |  .218674   .174066   1.25628   |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459   |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706   |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928   |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052   |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656   |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496   |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287   |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769   |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445   |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039   |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917   |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443   |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787   |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025   |  .006912   .006912         1
       south |   .20949   .249045   .841173   |  .207077   .207077         1
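The raw balancing statistics can be reproduced by hand. The sketch below assumes that the standardized difference divides the mean difference by the square root of the average of the two raw group variances; this definition is inferred from the reported numbers, not taken from the kmatch documentation, and the function names are hypothetical. The inputs are the raw collgrad means (.321663, .224212) and variances (.218674, .174066) from the output above.

```python
import math


def std_mean_difference(mean_t, mean_c, var_t, var_c):
    """Mean difference divided by the square root of the average group variance
    (assumed definition, consistent with the reported raw values)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Variance among the treated relative to the variance among the untreated."""
    return var_t / var_c


# Raw (unmatched) values for collgrad from the kmatch summarize output.
std_dif = std_mean_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(f"StdDif = {std_dif:.5f}, Ratio = {ratio:.5f}")
```

Up to rounding of the displayed inputs, the results agree with the reported StdDif of .219912 and Ratio of 1.25628. A standardized difference of 0 and a variance ratio of 1 indicate perfect balance on the first two moments.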

Some Examples

Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" (-.4 to .4) and "Variance ratio" (0 to 4), showing raw and matched values for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south).]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching          Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  431    26      457  |  1214     182   1396 | .00188

Treatment-effects estimation

  wage |     Coef.
   ATT |  .3887224
  NATE |  1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure, two panels, "Raw" and "Matched (ATT)": density (0 to 3) of the propensity score (0 to .8) for untreated and treated units.]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure, two panels, "Raw" and "Matched (ATT)": cumulative probability (0 to 1) of the propensity score (0 to .8) for untreated and treated units.]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure, two panels, "Raw" and "Matched (ATT)": box plots of the propensity score (0 to .8) for untreated and treated units.]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Replications  = 50
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  432    25      457  |  1105     291   1396 | 1.3394
Untreated | 1386    10     1396  |   455       2    457 | 3.3975
 Combined | 1818    35     1853  |  1560     293   1853 |

Treatment-effects estimation

       |   Observed   Bootstrap                      Normal-based
  wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
   ATE |   .4095729    .1920853   2.13    0.033    .0330928   .7860531
   ATT |   .6059013    .2472069   2.45    0.014    .1213846   1.090418
   ATC |   .3483797    .1893653   1.84    0.066   -.0227695   .7195289
  NATE |   1.432913    .2333282   6.14    0.000    .9755981   1.890228

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

  wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
   (1) |  -.8270117    .1810415  -4.57    0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

       chi2(  1) =    2.42
     Prob > chi2 =  0.1200
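The lincom line can be verified with plain arithmetic. The sketch below assumes the usual normal-based inference (z = coefficient / standard error, two-sided p-value from the standard normal, and a 95% interval of ±1.959964 standard errors); the function name is hypothetical, and the inputs are the coefficient (-.8270117) and bootstrap standard error (.1810415) reported above.

```python
import math


def normal_inference(coef, se, crit=1.959964):
    """z statistic, two-sided normal p-value, and normal-based 95% interval."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under N(0, 1)
    return z, p, (coef - crit * se, coef + crit * se)


# ATT - NATE and its bootstrap standard error from the lincom output.
z, p, (lo, hi) = normal_inference(-0.8270117, 0.1810415)
print(f"z = {z:.2f}, p = {p:.4f}, CI = [{lo:.6f}, {hi:.6f}]")
```

The computed z statistic and interval bounds agree with the reported values (-4.57, -1.181847, -.4721768) up to rounding. The test of ATT = ATC cannot be reproduced in the same way, because it also requires the bootstrap covariance between the two estimates, which is not shown.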

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                          Number of obs = 1853
                                          Neighbors: min = 1
Treatment   : union = 1                              max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  457     0      457  |   328    1068   1396 |

Treatment-effects estimation

  wage |     Coef.
   ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation              Number of obs = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 1
Outcome model  : matching                                  min = 1
Distance metric: Mahalanobis                               max = 1

                      |             AI Robust
                 wage |     Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
ATET                  |
union                 |
 (union vs nonunion)  |  .7246969    .2942952  2.46    0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                          Number of obs = 1853
                                          Neighbors: min = 5
Treatment   : union = 1                              max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  457     0      457  |   870     526   1396 |

Treatment-effects estimation

  wage |     Coef.
   ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation              Number of obs = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                  min = 5
Distance metric: Mahalanobis                               max = 6

                      |             AI Robust
                 wage |     Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
ATET                  |
union                 |
 (union vs nonunion)  |  .5590823    .2381752  2.35    0.019    .0922675   1.025897

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                          Number of obs = 1853
                                          Neighbors: min = 5
Treatment   : union = 1                              max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  457     0      457  |   870     526   1396 |

Treatment-effects estimation

  wage |     Coef.
   ATT |  .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation              Number of obs = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                  min = 5
Distance metric: Mahalanobis                               max = 6

                      |             AI Robust
                 wage |     Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
ATET                  |
union                 |
 (union vs nonunion)  |  .5288023    .2420635  2.18    0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  439    18      457  |  1258     138   1396 | .83886

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  432    25      457  |  1103     293   1396 | 1.3013

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  432    25      457  |  1105     291   1396 | 1.3394

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  448     9      457  |  1184     212   1396 | 1.8888

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (1.5 to 3); the evaluation points are labeled with their index.]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  453     4      457  |  1289     107   1396 |  2.433

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (1.5 to 3); the evaluation points are labeled with their index.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  455     2      457  |  1356      40   1396 | 2.7626

Treatment-effects estimation

  wage |     Coef.
   ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5); the evaluation points are labeled with their index.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  366    91      457  |   701     695   1396 |     .5

Treatment-effects estimation

  wage |     Coef.
   ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |    Common support (treated)    |    Standardized difference
       Means |  Matched  Unmatc~d     Total   |  (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |  .322404   .318681   .321663   |  .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682   13.2685   |  .027413  -.110253   .137666
      tenure |  8.12614   6.95055   7.89205   |  .038378  -.154356   .192734
  3.industry |  .002732   .021978   .006565   | -.047404   .190657  -.238061
  4.industry |  .191257   .153846   .183807   |  .019212  -.077269   .096481
  5.industry |  .062842   .274725   .105033   | -.137462   .552867  -.690329
  6.industry |  .057377         0   .045952   |  .054507  -.219225   .273732
  7.industry |  .019126   .021978   .019694   | -.004083   .016423  -.020506
  8.industry |  .005464   .065934   .017505   | -.091714   .368871  -.460585
  9.industry |  .010929   .010989   .010941   | -.000115   .000462  -.000577
 10.industry |        0   .021978   .004376   | -.066227   .266363  -.332589
 11.industry |  .554645   .175824   .479212   |   .15083  -.606636   .757467
 12.industry |  .092896   .241758   .122538   | -.090299   .363181   -.45348
      2.race |  .243169   .681319   .330416   | -.185284   .745209  -.930494
      3.race |  .002732   .076923   .017505   | -.112525   .452572  -.565097
       south |   .29235   .318681   .297593   | -.011456   .046074   -.05753

             |    Common support (treated)    |             Ratio
   Variances |  Matched  Unmatc~d     Total   |  (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |  .219058   .219536   .218674   |  1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474   20.5898   |  .943177   1.22621    .76918
      tenure |  38.3324   31.9242   37.2044   |  1.03032   .858076   1.20073
  3.industry |  .002732   .021734   .006536   |  .418045   3.32537   .125714
  4.industry |  .155101   .131624   .150351   |  1.03159   .875443   1.17837
  5.industry |  .059054   .201465   .094207   |  .626851   2.13854   .293122
  6.industry |  .054233         0   .043936   |  1.23435         0         .
  7.industry |  .018811   .021734   .019348   |  .972252    1.1233   .865531
  8.industry |   .00545   .062271   .017237   |  .316157   3.61269   .087513
  9.industry |  .010839   .010989   .010845   |  .999464   1.01328   .986361
 10.industry |        0   .021734   .004367   |        0   4.97709         0
 11.industry |  .247691    .14652   .250115   |  .990307   .585811   1.69049
 12.industry |  .084497   .185348   .107758   |  .784137   1.72003   .455885
      2.race |  .184542   .219536   .221726   |  .832297   .990121   .840601
      3.race |  .002732   .071795   .017237   |  .158513   4.16522   .038056
       south |  .207448   .219536    .20949   |  .990254   1.04796   .944939

(1) matched (2) unmatched (3) total

Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" (-.2 to .2) with one value for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south).]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1852
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  432    25      457  |  1104     291   1395 | 1.3392

Treatment-effects estimation

        |     Coef.
wage    |
    ATT |  .6021049
   NATE |  1.430823
hours   |
    ATT |  1.263759
   NATE |  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1852
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |       Matched        |       Controls       |  Band-
          |  Yes    No    Total  |  Used  Unused  Total |  width
  Treated |  432    25      457  |  1104     291   1395 | 1.3392

Treatment-effects estimation

        |     Coef.
wage    |
    ATT |  .5152752
   NATE |  1.430823
hours   |
    ATT |  1.263759
   NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Replications  = 50
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched        |       Controls       |  Band-
           |  Yes    No    Total  |  Used  Unused  Total |  width
0: Treated |  306    15      321  |   625     120    745 | 1.3199
1: Treated |  126    10      136  |   473     178    651 | 1.3398

Treatment-effects estimation

        |   Observed   Bootstrap                      Normal-based
   wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
 0: ATT |   .4586332    .2763358   1.66    0.097    -.082975   1.000241
 1: ATT |   .9518705     .406903   2.34    0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =    1.23
     Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

   wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1) |   .4932373    .4452343   1.11    0.268    -.379406   1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2’308’006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density (0 to 7) of the propensity score (0 to 1) for untreated and treated units.]

McFadden R2 = 0.121

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels plotted against the propensity score (0 to .8): the outcome (30 to 55) for untreated and treated units, and the treatment effect (-6 to -1).]

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 59

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017-06-24

Propensity Scores MatchingIllustration using kmatch

In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]


Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows and series as in the relative standard error figure]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement)

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases
- Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small)
- Fewer restrictions in terms of possible post-matching analyses

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated)
- Computational complexity

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching (but don't use pair matching anyhow)

Some conclusions from the simulation
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased

To do
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295-313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77-90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279-290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465-472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41-55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688-701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472-480.


Counterfactual Causality (see Neyman 1923, Rubin 1974, 1990)

a.k.a. Rubin Causal Model, a.k.a. Potential Outcomes Framework

Treatment variable D

$$D = \begin{cases} 1 & \text{treatment (eats of a particular dish)} \\ 0 & \text{control (does not eat of a particular dish)} \end{cases}$$

Potential outcomes $Y^1$ and $Y^0$
- $Y^1$: potential outcome with treatment (D = 1)
  - If person i would eat of a particular dish, would she die or would she survive?
- $Y^0$: potential outcome without treatment (D = 0)
  - If person i would not eat of a particular dish, would she die or would she survive?

Causal effect of the treatment for individual i

causal effect = difference between potential outcomes

$$\delta_i = Y_i^1 - Y_i^0$$

Fundamental Problem of Causal Inference

The causal effect of D on Y for individual i is defined as the difference in potential outcomes, $\delta_i = Y_i^1 - Y_i^0$

However, the observed outcome variable is

$$Y_i = \begin{cases} Y_i^1 & \text{if } D_i = 1 \\ Y_i^0 & \text{if } D_i = 0 \end{cases}$$

That is, only one of the two potential outcomes will be realized and, hence, only $Y_i^1$ or $Y_i^0$ can be observed, but never both

Consequence

The individual treatment effect $\delta_i$ cannot be observed

Average Treatment Effect

Although individual causal effects cannot be observed, the average causal effect in a population (the so-called "Average Treatment Effect") can be identified by comparing the expected values of $Y^1$ and $Y^0$

$$ATE = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]$$

Some other quantities of interest
- Average Treatment Effect on the Treated (ATT)

  $$ATT = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1]$$

- Average Treatment Effect on the Untreated (ATC)

  $$ATC = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0]$$
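
As a small numeric illustration of these three quantities, the sketch below computes them for a made-up table of potential outcomes. The four units and all of their values are hypothetical, and the calculation is only possible in a thought experiment, because in real data one of the two potential outcomes is never observed.

```python
# Hypothetical units: (D, Y1, Y0). Both potential outcomes are listed only
# for illustration; in practice one of them is always unobserved.
units = [
    (1, 10.0, 6.0),
    (1, 8.0, 5.0),
    (0, 7.0, 6.0),
    (0, 5.0, 4.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Individual effects delta_i = Y1_i - Y0_i, averaged over different groups.
ate = mean(y1 - y0 for d, y1, y0 in units)
att = mean(y1 - y0 for d, y1, y0 in units if d == 1)
atc = mean(y1 - y0 for d, y1, y0 in units if d == 0)

# Naive comparison of observed outcomes: E[Y | D = 1] - E[Y | D = 0].
naive = mean(y1 for d, y1, y0 in units if d == 1) - mean(
    y0 for d, y1, y0 in units if d == 0
)

print(ate, att, atc, naive)
```

In this made-up example the naive group comparison matches none of the three quantities, because the treated units would have had higher outcomes than the untreated units even without treatment.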

Average Treatment Effect

To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required

If the independence assumption

$$(Y^0, Y^1) \perp\!\!\!\perp D$$

applies, that is, if D is independent from $Y^0$ and $Y^1$, then

$$E[Y^0] = E[Y^0 \mid D = 0]$$

$$E[Y^1] = E[Y^1 \mid D = 1]$$

In this case, the average causal effect can be measured by a simple group comparison (mean difference) of observations without treatment (D = 0) and observations with treatment (D = 1)

Randomized experiments solve the problem: if the assignment of D is randomized, D is independent from $Y^0$ and $Y^1$ by design


Conditional Independence / Strong Ignorability

Can causal effects also be identified from "observational" (i.e. non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness")

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

is given, then the ATE (or ATT or ATC) can be identified by conditioning on X

For example:

$$ATE = \sum_x \Pr[X = x] \left\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \right\}$$
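
The conditioning formula for the ATE can be sketched in a few lines for a single discrete X. This is only an illustration under stated assumptions: the function name `stratified_ate` and the eight data rows are hypothetical, and sample means stand in for the conditional expectations.

```python
from collections import defaultdict


def stratified_ate(data):
    """ATE = sum_x Pr[X=x] * (E[Y|D=1,X=x] - E[Y|D=0,X=x]).

    data: list of (x, d, y) with discrete x and d in {0, 1}.
    Raises ValueError if a stratum lacks treated or untreated
    observations (overlap is violated in the sample).
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in data:
        strata[x][d].append(y)
    n = len(data)
    ate = 0.0
    for x, groups in strata.items():
        if not groups[0] or not groups[1]:
            raise ValueError(f"no overlap in stratum X={x!r}")
        share = (len(groups[0]) + len(groups[1])) / n      # Pr[X = x]
        diff = (sum(groups[1]) / len(groups[1])
                - sum(groups[0]) / len(groups[0]))         # within-stratum contrast
        ate += share * diff
    return ate


# Hypothetical data rows: (x, d, y)
data = [
    ("a", 1, 5.0), ("a", 0, 3.0), ("a", 0, 3.0), ("a", 0, 3.0),
    ("b", 1, 9.0), ("b", 1, 9.0), ("b", 1, 9.0), ("b", 0, 8.0),
]
print(stratified_ate(data))
```

In these made-up data treatment is much more common in stratum "b", where outcomes are higher anyway, so the raw mean difference between treated and untreated is larger than the stratified estimate.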

Matching

Matching is one approach to "condition on X" if strong ignorability holds

Basic idea
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa)
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations

Matching

Formally:

$$\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \right]$$

$$\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \right]$$

$$\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}$$

Different matching algorithms use different definitions of $w_{ij}$

Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies

The result is equivalent to "perfect stratification" or "subclassification" (see e.g. Cochran 1968)

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality")
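
A minimal sketch of exact matching for the ATT, using the weights 1/k_i defined above. The function name `exact_matching_att`, the covariate profiles, and the outcome values are hypothetical; treated units without an exact match are simply dropped and counted, which is one possible convention, not necessarily what any particular software does.

```python
def exact_matching_att(treated, controls):
    """ATT by exact matching with w_ij = 1/k_i for controls with X_j == X_i.

    treated, controls: lists of (x, y), where x is a hashable covariate
    profile. Returns (att, number of treated units without a match).
    """
    effects = []
    unmatched = 0
    for x_i, y_i in treated:
        matches = [y_j for x_j, y_j in controls if x_j == x_i]
        if not matches:                 # k_i = 0: no statistical twin
            unmatched += 1
            continue
        y0_hat = sum(matches) / len(matches)    # sum_j w_ij * Y_j
        effects.append(y_i - y0_hat)
    if not effects:
        raise ValueError("no treated unit has an exact match")
    return sum(effects) / len(effects), unmatched


# Hypothetical profiles (gender, age) and outcomes
treated = [(("f", 30), 12.0), (("m", 40), 15.0), (("m", 55), 20.0)]
controls = [(("f", 30), 10.0), (("f", 30), 8.0), (("m", 40), 14.0)]
att, unmatched = exact_matching_att(treated, controls)
print(att, unmatched)
```

With only two covariates, one of three treated units already lacks a twin; dropping it changes the group the estimate refers to, which is the practical face of the dimensionality problem.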

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X

The idea then is to use observations that are "close", but not necessarily equal, as matches

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where $\Sigma$ is an appropriate scaling matrix
- Mahalanobis matching: $\Sigma$ is the covariance matrix of X
- Euclidean matching: $\Sigma$ is the identity matrix
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X
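
The distance metric can be sketched for the special case of two covariates, where the inverse of the scaling matrix has a closed form. The helper names (`covariance_2d`, `inverse_2d`, `md`) and the example points are hypothetical, and the sample covariance with an n - 1 denominator is an assumption.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix (n - 1 denominator) of 2-d points."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return [[s00, s01], [s01, s11]]


def inverse_2d(s):
    """Inverse of a 2x2 matrix; fails if the matrix is singular."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("scaling matrix is singular")
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]


def md(xi, xj, sigma_inv):
    """sqrt((xi - xj)' Sigma^{-1} (xi - xj)) for 2-d vectors."""
    d0, d1 = xi[0] - xj[0], xi[1] - xj[1]
    q = (d0 * (sigma_inv[0][0] * d0 + sigma_inv[0][1] * d1)
         + d1 * (sigma_inv[1][0] * d0 + sigma_inv[1][1] * d1))
    return math.sqrt(q)


# Euclidean matching: Sigma is the identity matrix
identity = [[1.0, 0.0], [0.0, 1.0]]
print(md((0.0, 0.0), (3.0, 4.0), identity))

# Mahalanobis matching: Sigma is the covariance matrix of X
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (2.0, 4.0)]
sigma_inv = inverse_2d(covariance_2d(points))
print(md(points[0], points[3], sigma_inv))
```

In the second call the two covariates are uncorrelated but have different variances, so the Mahalanobis version rescales each difference by its standard deviation before combining them.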

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c
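
A rough sketch of nearest-neighbor matching with replacement as described above, including the rule that all ties are kept. The function name `nn_matching_att`, its arguments, and the data are hypothetical; a scalar covariate with absolute differences is assumed by default, and any other distance function (such as a Mahalanobis distance) can be passed in.

```python
def nn_matching_att(treated, controls, k=1, distance=None):
    """ATT from k-nearest-neighbor matching with replacement.

    treated, controls: lists of (x, y). distance(x_i, x_j) defaults to
    the absolute difference of scalar x. Controls tied with the k-th
    closest control are all kept; matches get equal weights.
    """
    if distance is None:
        distance = lambda a, b: abs(a - b)
    if k < 1 or len(controls) < k:
        raise ValueError("need k >= 1 and at least k controls")
    effects = []
    for x_i, y_i in treated:
        ranked = sorted((distance(x_i, x_j), y_j) for x_j, y_j in controls)
        cutoff = ranked[k - 1][0]           # distance of the k-th neighbor
        matches = [y for d, y in ranked if d <= cutoff]    # keeps ties
        effects.append(y_i - sum(matches) / len(matches))
    return sum(effects) / len(effects)


# Hypothetical data: (x, y)
treated = [(1.0, 10.0), (5.0, 20.0)]
controls = [(0.0, 7.0), (2.0, 9.0), (6.0, 16.0)]
print(nn_matching_att(treated, controls, k=1))
print(nn_matching_att(treated, controls, k=2))
```

Because controls are reused, no treated unit is forced onto a poor match just because a better one was already taken; pair matching without replacement would require an additional bookkeeping step that is not shown here.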

Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel)

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching)
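
Kernel matching can be sketched as follows. The function names, the bandwidth value, and the data are hypothetical; the threshold c plays the role of the bandwidth, and treated units with no control inside the bandwidth are dropped and counted, which is one possible convention.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(treated, controls, bandwidth, distance=None):
    """ATT from kernel matching; returns (att, n_unmatched).

    Weights w_ij are proportional to K(distance / bandwidth) and are
    normalized to sum to one for each treated unit.
    """
    if distance is None:
        distance = lambda a, b: abs(a - b)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    effects, unmatched = [], 0
    for x_i, y_i in treated:
        k = [epanechnikov(distance(x_i, x_j) / bandwidth)
             for x_j, _ in controls]
        total = sum(k)
        if total == 0.0:                # no control within the bandwidth
            unmatched += 1
            continue
        y0_hat = sum(w * y_j for w, (_, y_j) in zip(k, controls)) / total
        effects.append(y_i - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), unmatched


# Hypothetical data: (x, y)
treated = [(0.0, 10.0), (10.0, 30.0)]
controls = [(0.0, 6.0), (1.0, 8.0), (3.0, 100.0)]
att, unmatched = kernel_matching_att(treated, controls, bandwidth=2.0)
print(att, unmatched)
```

Replacing `epanechnikov` with a function that returns 1 inside the bandwidth and 0 outside would give radius matching with equal weights.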


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score

Procedure
- Step 1: Estimate the propensity score, e.g. using a Logit model
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances

PSM is tremendously popular
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
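
The two steps can be sketched with the standard library only. Everything here is an illustrative assumption: a single covariate, a hand-written Newton-Raphson logit (`fit_logit`), 1-nearest-neighbor matching with replacement on the estimated score, and made-up data. It is not how kmatch or any other package implements PSM.

```python
import math


def fit_logit(x, d, iterations=25):
    """Step 1: logit of d on a single covariate x via Newton-Raphson.

    Returns (intercept, slope). Assumes the data are not perfectly
    separated; otherwise the maximum likelihood estimate does not exist.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Information matrix (negative Hessian)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def propensity_scores(x, d):
    a, b = fit_logit(x, d)
    return [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]


def psm_att(x, d, y):
    """Step 2: 1-nearest-neighbor matching (with replacement) on the score."""
    ps = propensity_scores(x, d)
    controls = [(p, yi) for p, di, yi in zip(ps, d, y) if di == 0]
    effects = []
    for p_i, d_i, y_i in zip(ps, d, y):
        if d_i == 1:
            _, y_match = min(controls, key=lambda c: abs(c[0] - p_i))
            effects.append(y_i - y_match)
    return sum(effects) / len(effects)


# Hypothetical data: treatment becomes more likely as x increases
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = [0, 0, 1, 0, 1, 0, 1, 1]
y = [xi + 2.0 * di for xi, di in zip(x, d)]
print(propensity_scores(x, d))
print(psm_att(x, d, y))
```

With several covariates, only Step 1 changes (the logit has more terms); Step 2 still works on a single number per observation, which is the simplification the propensity score theorem provides.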


King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers

The basic message of the paper is that PSM is really, really bad and should be discarded

The paper
- http://j.mp/1sexgVw

Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be)
- Matching is good because it reduces model dependence

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)

[Figure, shown in several build-up steps: scatter plot of the outcome against education (years) for treated (T) and control (C) units, before and after matching; slide 3/23 of the slides by King and Nielsen]

King and Nielsen

Argument 2
- PSM approximates complete randomization
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance of covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in build-up steps: age against education (years) for treated (T) and control (C) units, full sample and matched sample; slide 9/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching

[Figure, shown in build-up steps: age against education (years) for treated (T) and control (C) units, with an additional propensity score axis running from 0 to 1; slide 15/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching is Suboptimal

[Figure: age against education (years) for the treated (T) and matched control (C) units after propensity score matching; slide 15/23 of the slides by King and Nielsen]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion
- Because PSM approximates complete randomization, it engages in random pruning
- PSM Paradox ("when you do 'better', you do worse")
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately

PSM Increases Model Dependence & Bias

[Figure, two panels; slide 20/23 of the slides by King and Nielsen. Panel "Model Dependence": variance against number of units pruned, for MDM and PSM. Panel "Bias": maximum coefficient across 512 specifications against number of units pruned, for MDM and PSM, with the true effect = 2 marked]

$$Y_i = 2 T_i + X_{1i} + X_{2i} + \epsilon_i, \quad \epsilon_i \sim N(0, 1)$$

The Propensity Score Paradox in Real Data

[Figure, two panels; slide 21/23 of the slides by King and Nielsen. Panels "Finkel et al. (JOP 2012)" and "Nielsen et al. (AJPS 2011)": imbalance against number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and the 1/4 SD caliper]

Similar pattern for > 20 other real data sets we checked


Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be)
- Matching is good because it reduces model dependence

I fully agree!

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y)

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be)

That PSM approximates complete randomization is only partially true: PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking

Are King and Nielsen right

Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo

That random pruning makes things worse is of course true becauseit unnecessarily reduces the sample size (without changing anythingelse)

As argued above that PSM applies random pruning is only true forX variables unrelated to T (so that we are in a ldquolocalrdquo completerandomization situation although something similar can probablyalso happen if effects from several X rsquos cancel each other out)

Furthermore it is only true if you employ a matching algorithm thatthrows away good matches King and Nielsenrsquos results seem to bebased on the worst possible algorithm one-to-one matching withoutreplacement

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 30

Are King and Nielsen right

Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo

If you use a matching algorithm that does not throw away goodmatches such as radius or kernel matching (or also nearest-neighbormatching as long as all ties are kept and observations are matchedwith replacement) random pruning can be avoidedI Such algorithms block (and hence prune) where it is necessary toprevent bias but they average where such pruning is not necessary

I Hence efficiency differences between PSM and multivariate matchingshould only be minor for such algorithms

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       432     25     457    1105     291    1396  1.3394

Treatment-effects estimation

wage          Coef.
ATT        .6059013
NATE       1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                       Matched (ATT)
Means          Treated  Untrea~d    StdDif   Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912   .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584   13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735   7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246    .00463    .00463         0
4.industry     .183807   .166905   .044425   .185185   .185185         0
5.industry     .105033   .027937   .312944   .085648   .085648         0
6.industry     .045952   .169771  -.407129   .048611   .048611         0
7.industry     .019694   .102436  -.350657   .020833   .020833         0
8.industry     .017505   .035817  -.113785   .009259   .009259         0
9.industry     .010941   .040115  -.185669   .011574   .011574         0
10.industry    .004376   .008596  -.052551   .002315   .002315         0
11.industry    .479212   .356734   .250073   .506944   .506944         0
12.industry    .122538    .07235   .169707    .12037    .12037         0
2.race         .330416   .244986   .189418     .3125     .3125         0
3.race         .017505   .011461   .050566   .006944   .006944         0
south          .297593   .466332  -.352408   .291667   .291667         0

                          Raw                       Matched (ATT)
Variances      Treated  Untrea~d     Ratio   Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628   .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459   19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706   37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928   .004619   .004619         1
4.industry     .150351   .139148   1.08052   .151242   .151242         1
5.industry     .094207   .027176   3.46656   .078494   .078494         1
6.industry     .043936    .14105   .311496   .046355   .046355         1
7.industry     .019348   .092008   .210287   .020447   .020447         1
8.industry     .017237   .034559   .498769   .009195   .009195         1
9.industry     .010845   .038533   .281445   .011467   .011467         1
10.industry    .004367   .008528   .512039   .002315   .002315         1
11.industry    .250115   .229639   1.08917   .250532   .250532         1
12.industry    .107758   .067163   1.60443   .106127   .106127         1
2.race         .221726     .1851   1.19787   .215342   .215342         1
3.race         .017237   .011338   1.52025   .006912   .006912         1
south           .20949   .249045   .841173   .207077   .207077         1

Some Examples

Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: balancing statistics by covariate, Raw vs. Matched; left panel: Std. mean difference, right panel: Variance ratio]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       431     26     457    1214     182    1396  .00188

Treatment-effects estimation

wage          Coef.
ATT        .3887224
NATE       1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(6 2) lc(5 1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: density of the propensity score, Untreated vs. Treated; panels: Raw, Matched (ATT)]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(6 2) lc(5 1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score, Untreated vs. Treated; panels: Raw, Matched (ATT)]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score, Untreated vs. Treated; panels: Raw, Matched (ATT)]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       432     25     457    1105     291    1396  1.3394
Untreated    1386     10    1396     455       2     457  3.3975
Combined     1818     35    1853    1560     293    1853

Treatment-effects estimation

           Observed   Bootstrap                      Normal-based
wage          Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE        .4095729    .1920853   2.13   0.033    .0330928   .7860531
ATT        .6059013    .2472069   2.45   0.014    .1213846   1.090418
ATC        .3483797    .1893653   1.84   0.066   -.0227695   .7195289
NATE       1.432913    .2333282   6.14   0.000    .9755981   1.890228
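The z statistic, p-value, and normal-based confidence interval in the bootstrap table follow directly from the coefficient and its standard error. As a small worked check, the sketch below recomputes them for the ATT row (coefficient .6059013, bootstrap standard error .2472069). The helper name `normal_based_inference` is made up for this example; it only mirrors the usual normal-approximation formulas and is not Stata's code.

```python
from statistics import NormalDist


def normal_based_inference(coef, se, level=0.95):
    """Return z, two-sided p-value, and normal-based CI bounds."""
    std_normal = NormalDist()
    z = coef / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return z, p, coef - crit * se, coef + crit * se


# ATT and its bootstrap standard error as reported in the table
z, p, lower, upper = normal_based_inference(0.6059013, 0.2472069)
print(f"z = {z:.2f}, p = {p:.3f}, CI = [{lower:.7f}, {upper:.6f}]")
```

The recomputed values agree with the reported row to the printed precision.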

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage          Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)       -.8270117    .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
                                           Neighbors: min = 1
Treatment  : union = 1                                max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       457      0     457     328    1068    1396

Treatment-effects estimation

wage          Coef.
ATT        .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min = 1
Distance metric: Mahalanobis                            max = 1

                                  AI Robust
wage                     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969    .2942952   2.46   0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       457      0     457     870     526    1396

Treatment-effects estimation

wage          Coef.
ATT        .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min = 5
Distance metric: Mahalanobis                            max = 6

                                  AI Robust
wage                     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823    .2381752   2.35   0.019    .0922675   1.025897

Some Examples

Bias correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       457      0     457     870     526    1396

Treatment-effects estimation

wage          Coef.
ATT        .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min = 5
Distance metric: Mahalanobis                            max = 6

                                  AI Robust
wage                     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023    .2420635   2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       439     18     457    1258     138    1396  .83886

Treatment-effects estimation

wage          Coef.
ATT        .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       432     25     457    1103     293    1396  1.3013

Treatment-effects estimation

wage          Coef.
ATT        .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       432     25     457    1105     291    1396  1.3394

Treatment-effects estimation

wage          Coef.
ATT        .6059013

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       448      9     457    1184     212    1396  1.8888

Treatment-effects estimation

wage          Coef.
ATT        .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: MSE, x-axis: Bandwidth (about 1.5 to 3); points labeled by search index]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       453      4     457    1289     107    1396   2.433

Treatment-effects estimation

wage          Coef.
ATT        .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: MISE, x-axis: Bandwidth (about 1.5 to 3); points labeled by search index]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       455      2     457    1356      40    1396  2.7626

Treatment-effects estimation

wage          Coef.
ATT        .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: Weighted MISE, x-axis: Bandwidth (1 to 5); points labeled by search index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       366     91     457     701     695    1396      .5

Treatment-effects estimation

wage          Coef.
ATT        .3303161

. kmatch csummarize
(refitting the model using the generate() option)

               Common support (treated)      Standardized difference
Means          Matched  Unmatc~d     Total   (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663   .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685   .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205   .038378  -.154356   .192734
3.industry     .002732   .021978   .006565  -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807   .019212  -.077269   .096481
5.industry     .062842   .274725   .105033  -.137462   .552867  -.690329
6.industry     .057377         0   .045952   .054507  -.219225   .273732
7.industry     .019126   .021978   .019694  -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505  -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941  -.000115   .000462  -.000577
10.industry          0   .021978   .004376  -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212    .15083  -.606636   .757467
12.industry    .092896   .241758   .122538  -.090299   .363181   -.45348
2.race         .243169   .681319   .330416  -.185284   .745209  -.930494
3.race         .002732   .076923   .017505  -.112525   .452572  -.565097
south           .29235   .318681   .297593  -.011456   .046074   -.05753

               Common support (treated)               Ratio
Variances      Matched  Unmatc~d     Total   (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674   1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898   .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044   1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536   .418045   3.32537   .125714
4.industry     .155101   .131624   .150351   1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207   .626851   2.13854   .293122
6.industry     .054233         0   .043936   1.23435         0         .
7.industry     .018811   .021734   .019348   .972252    1.1233   .865531
8.industry      .00545   .062271   .017237   .316157   3.61269   .087513
9.industry     .010839   .010989   .010845   .999464   1.01328   .986361
10.industry          0   .021734   .004367         0   4.97709         0
11.industry    .247691    .14652   .250115   .990307   .585811   1.69049
12.industry    .084497   .185348   .107758   .784137   1.72003   .455885
2.race         .184542   .219536   .221726   .832297   .990121   .840601
3.race         .002732   .071795   .017237   .158513   4.16522   .038056
south          .207448   .219536    .20949   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: standardized difference between matched treated and all treated, by covariate; x-axis from -.2 to .2]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       432     25     457    1104     291    1395  1.3392

Treatment-effects estimation

              Coef.
wage
ATT        .6021049
NATE       1.430823
hours
ATT        1.263759
NATE       1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
Treated       432     25     457    1104     291    1395  1.3392

Treatment-effects estimation

              Coef.
wage
ATT        .5152752
NATE       1.430823
hours
ATT        1.263759
NATE       1.450303

wage adjusted for: collgrad ttl_exp tenure
hours adjusted for: i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

              Matched                  Controls           Band-
              Yes     No   Total    Used  Unused   Total  width
0 Treated     306     15     321     625     120     745  1.3199
1 Treated     126     10     136     473     178     651  1.3398

Treatment-effects estimation

           Observed   Bootstrap                      Normal-based
wage          Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
0 ATT      .4586332    .2763358   1.66   0.097    -.082975   1.000241
1 ATT      .9518705     .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage          Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)        .4932373    .4452343   1.11   0.268    -.379406   1.365881

Simulation

Population data from Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from population and compute various matching estimators.
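The simulation logic (draw many random samples from a known population, apply an estimator to each, and summarize the estimates against the known population value) can be sketched as follows. This is a generic stand-in, not the simulation code behind the slides: the population is a hypothetical normal distribution with mean 44 and an assumed standard deviation of 10, and the "estimator" is simply the sample mean rather than a matching estimator.

```python
import random
from statistics import mean, pvariance


def simulate(estimator, draw_sample, truth, reps=200, seed=1):
    """Monte Carlo summary of an estimator: bias, variance, and MSE."""
    rng = random.Random(seed)
    estimates = [estimator(draw_sample(rng)) for _ in range(reps)]
    bias = mean(estimates) - truth
    variance = pvariance(estimates)  # variance across replications
    mse = mean((e - truth) ** 2 for e in estimates)
    return bias, variance, mse


def draw_sample(rng, n=500):
    """Hypothetical population: normal with mean 44 and SD 10 (assumed)."""
    return [rng.gauss(44, 10) for _ in range(n)]


bias, variance, mse = simulate(mean, draw_sample, truth=44.0)
print(f"bias = {bias:.4f}, variance = {variance:.4f}, MSE = {mse:.4f}")
```

The three summaries are linked by MSE = variance + bias², which is why the results below report variance, bias reduction, and mean squared error side by side.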

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score, Untreated vs. Treated]

McFadden R2 = 0.121

Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: outcome by propensity score, Untreated vs. Treated]

[Figure, right panel: treatment effect by propensity score]

Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows and series as in the relative standard error figure]

7 Conclusions

Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Fundamental Problem of Causal Inference

The causal effect of D on Y for individual i is defined as the difference in potential outcomes: $\delta_i = Y_i^1 - Y_i^0$.

However, the observed outcome variable is

\[
Y_i = \begin{cases} Y_i^1 & \text{if } D_i = 1 \\ Y_i^0 & \text{if } D_i = 0 \end{cases}
\]

That is, only one of the two potential outcomes will be realized and, hence, only $Y_i^1$ or $Y_i^0$ can be observed, but never both.

Consequence

The individual treatment effect $\delta_i$ cannot be observed.

Average Treatment Effect

Although individual causal effects cannot be observed, the average causal effect in a population (the so-called “Average Treatment Effect”) can be identified by comparing the expected values of $Y^1$ and $Y^0$:

\[
ATE = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]
\]

Some other quantities of interest
- Average Treatment Effect on the Treated (ATT)

\[
ATT = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1]
\]

- Average Treatment Effect on the Untreated (ATC)

\[
ATC = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0]
\]
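To make the three estimands concrete, here is a minimal sketch on a made-up table in which both potential outcomes are known for every unit (something that is never the case in real data). All values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from statistics import mean

# Hypothetical units as (D, Y0, Y1); in practice only one of Y0 and Y1 is observed
units = [
    (1, 10.0, 14.0),
    (1, 12.0, 15.0),
    (0, 8.0, 9.0),
    (0, 9.0, 11.0),
    (0, 7.0, 7.0),
]


def average_effect(rows):
    """Mean of the individual effects Y1 - Y0 over the given rows."""
    return mean(y1 - y0 for _, y0, y1 in rows)


ate = average_effect(units)
att = average_effect([u for u in units if u[0] == 1])
atc = average_effect([u for u in units if u[0] == 0])

# The ATE is the share-weighted average of ATT and ATC
share_treated = mean(d for d, _, _ in units)
ate_from_parts = share_treated * att + (1 - share_treated) * atc

print(ate, att, atc, ate_from_parts)
```

The last line illustrates that the ATE is the weighted average of ATT and ATC, with the treated and untreated shares as weights.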

Average Treatment Effect

To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.

If the independence assumption

\[
(Y^0, Y^1) \perp\!\!\!\perp D
\]

applies, that is, if D is independent from $Y^0$ and $Y^1$, then

\[
E[Y^0] = E[Y^0 \mid D = 0]
\]
\[
E[Y^1] = E[Y^1 \mid D = 1]
\]

In this case the average causal effect can be measured by a simple group comparison (mean difference) of observations without treatment (D = 0) and observations with treatment (D = 1).

Randomized experiments solve the problem: if the assignment of D is randomized, D is independent from $Y^0$ and $Y^1$ by design.


Conditional Independence / Strong Ignorability

Can causal effects also be identified from "observational" (i.e., non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

is given, then the ATE (or ATT or ATC) can be identified by conditioning on X.

For example:

$$\mathrm{ATE} = \sum_x \Pr[X = x] \left\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \right\}$$
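The conditioning formula for the ATE can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: X is a single discrete covariate, the function names `stratified_ate` and `naive_difference` are hypothetical, and the toy data are made up for this example (they are not from the talk).

```python
from collections import defaultdict


def stratified_ate(rows):
    """Estimate the ATE by conditioning on a discrete covariate X.

    rows: iterable of (x, d, y) tuples with d in {0, 1}.
    """
    # Collect outcomes per stratum of X and per treatment status.
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        cells[x][d].append(y)

    n = sum(len(c[0]) + len(c[1]) for c in cells.values())
    ate = 0.0
    for x, c in cells.items():
        if not c[0] or not c[1]:
            # Overlap fails: no treated or no controls at this X value.
            raise ValueError(f"no overlap in stratum X={x!r}")
        share = (len(c[0]) + len(c[1])) / n  # Pr[X = x]
        diff = sum(c[1]) / len(c[1]) - sum(c[0]) / len(c[0])
        ate += share * diff
    return ate


def naive_difference(rows):
    """Raw mean difference between treated and controls, ignoring X."""
    y1 = [y for _, d, y in rows if d == 1]
    y0 = [y for _, d, y in rows if d == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


# Hypothetical toy data: (x, d, y). Treated units cluster in the "high" stratum.
toy = [
    ("low", 1, 5.0), ("low", 0, 3.0), ("low", 0, 3.0), ("low", 0, 3.0),
    ("high", 1, 10.0), ("high", 1, 10.0), ("high", 1, 10.0), ("high", 0, 9.0),
]
ate = stratified_ate(toy)
naive = naive_difference(toy)
print(ate, naive)
```

In the toy data the treated units are concentrated in the stratum with higher outcomes, so the raw group comparison is larger than the stratified estimate. The function raises an error when a stratum lacks treated or control units, which is the overlap assumption at work.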

Matching

Matching is one approach to "condition on X" if strong ignorability holds.

Basic idea:

  1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).
  2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
  3. An estimate of the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.

Matching

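Formally (hats mark estimates and imputed counterfactual outcomes):

$$\widehat{\mathrm{ATT}} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \right]$$

$$\widehat{\mathrm{ATC}} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \right]$$

$$\widehat{\mathrm{ATE}} = \frac{N_{D=1}}{N} \cdot \widehat{\mathrm{ATT}} + \frac{N_{D=0}}{N} \cdot \widehat{\mathrm{ATC}}$$

Different matching algorithms use different definitions of $w_{ij}$.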

Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
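The matching formula for the ATT with exact-matching weights $w_{ij} = 1/k_i$ can be sketched in Python as follows. This is a minimal illustration, not the implementation used in any software: the function name `att_exact_matching` and the toy data are hypothetical, and treated units without an exact twin are simply dropped.

```python
def att_exact_matching(treated, controls):
    """ATT from exact matching.

    treated, controls: lists of (x, y), where x is a hashable covariate profile.
    Returns (att, n_matched); treated units without an exact twin are dropped.
    """
    effects = []
    for x_i, y_i in treated:
        # Controls with exactly the same covariate values as unit i.
        twins = [y_j for x_j, y_j in controls if x_j == x_i]
        if not twins:
            continue  # curse of dimensionality: no exact match available
        # Imputed counterfactual: sum_j w_ij * Y_j with w_ij = 1 / k_i.
        y0_hat = sum(twins) / len(twins)
        effects.append(y_i - y0_hat)
    if not effects:
        raise ValueError("no treated unit has an exact match")
    return sum(effects) / len(effects), len(effects)


# Hypothetical toy data: covariate profile (sex, years of education) and outcome.
treated = [(("f", 12), 8.0), (("m", 16), 12.0), (("m", 20), 15.0)]
controls = [(("f", 12), 6.0), (("f", 12), 7.0), (("m", 16), 9.0), (("f", 16), 8.0)]

att, n_matched = att_exact_matching(treated, controls)
print(att, n_matched)
```

With only two covariates, one of the three treated units in the toy data already has no exact twin, which is the problem the slide calls the curse of dimensionality.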

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are "close", but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)}$$

as the distance metric, where $\Sigma$ is an appropriate scaling matrix.

  • Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
  • Euclidean matching: $\Sigma$ is the identity matrix.
  • Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
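To make the role of the scaling matrix concrete, here is a small Python sketch for the special case of two covariates, where the inverse of the covariance matrix can be written out by hand. The helper names `covariance_2d` and `mahalanobis_2d` and the sample points are assumptions made for this illustration.

```python
import math


def covariance_2d(points):
    """Sample covariance (s11, s12, s22) of a list of (x1, x2) points."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return s11, s12, s22


def mahalanobis_2d(a, b, cov):
    """sqrt((a - b)' S^-1 (a - b)) for a 2x2 covariance matrix S."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = a[0] - b[0], a[1] - b[1]
    # Inverse of a 2x2 matrix: (1 / det) * [[s22, -s12], [-s12, s11]].
    q = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(q)


# Hypothetical sample: the second covariate has a ten times larger scale.
sample = [(0.0, 0.0), (2.0, 0.0), (0.0, 20.0), (2.0, 20.0)]
cov = covariance_2d(sample)

md_first = mahalanobis_2d((0.0, 0.0), (2.0, 0.0), cov)    # differs in x1 only
md_second = mahalanobis_2d((0.0, 0.0), (0.0, 20.0), cov)  # differs in x2 only
eu_first = math.dist((0.0, 0.0), (2.0, 0.0))
eu_second = math.dist((0.0, 0.0), (0.0, 20.0))
print(md_first, md_second, eu_first, eu_second)
```

The two pairs are equally far apart in Mahalanobis terms, although their Euclidean distances differ by a factor of ten. This is the sense in which Mahalanobis matching works on standardized covariates.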

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
  • For each observation i in the treatment group, find the observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
  • For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
  • Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.

Mahalanobis Matching

Radius matching
  • Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
  • Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
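The algorithms differ only in how they turn the distances between one treated observation and the controls into weights $w_{ij}$. The following Python sketch shows this for nearest-neighbor, radius, and kernel matching. The function names are hypothetical, weights are normalized to sum to one per treated unit, and the Epanechnikov kernel is used without its constant factor because the constant cancels in the normalization.

```python
def nn_weights(dist, k=1):
    """k nearest controls, with replacement, keeping all ties."""
    ranked = sorted(dist)
    cutoff = ranked[min(k, len(ranked)) - 1]
    chosen = [d <= cutoff for d in dist]
    m = sum(chosen)
    return [1.0 / m if c else 0.0 for c in chosen]


def radius_weights(dist, c):
    """Equal weight for every control closer than the threshold c."""
    chosen = [d < c for d in dist]
    m = sum(chosen)
    if m == 0:
        return [0.0] * len(dist)  # treated unit stays unmatched
    return [1.0 / m if ch else 0.0 for ch in chosen]


def kernel_weights(dist, h):
    """Epanechnikov-type weights with bandwidth h: closer controls count more."""
    raw = [max(0.0, 1.0 - (d / h) ** 2) for d in dist]
    total = sum(raw)
    if total == 0.0:
        return [0.0] * len(dist)  # treated unit stays unmatched
    return [r / total for r in raw]


# Hypothetical distances from one treated unit to four controls.
dist = [0.1, 0.5, 0.5, 2.0]
w_nn1 = nn_weights(dist, k=1)
w_nn2 = nn_weights(dist, k=2)      # the tie at 0.5 brings in a third control
w_rad = radius_weights(dist, 1.0)
w_ker = kernel_weights(dist, 1.0)
print(w_nn1, w_nn2, w_rad, w_ker)
```

Pair matching is not shown because it needs bookkeeping across treated units (a control that has been used once is no longer available), whereas the three functions above treat each treated unit separately.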


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
  • Step 1: Estimate the propensity score, e.g., using a logit model.
  • Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular
  • https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
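The two-step procedure can be sketched in plain Python. This is an illustration under simplifying assumptions, not how Stata or kmatch implement it: there is a single covariate, the logit model is fitted by a hand-written Newton-Raphson loop (`fit_logit` is a hypothetical helper), matching is nearest-neighbor with replacement and ties kept, and the toy data are made up with a true effect of 2.

```python
import math


def fit_logit(xs, ds, iters=25):
    """Step 1: fit Pr(D=1|x) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(d - pi for d, pi in zip(ds, p))
        g1 = sum((d - pi) * x for d, pi, x in zip(ds, p, xs))
        # Negative Hessian X'WX with W = diag(p * (1 - p)).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, xs))
        h11 = sum(wi * x * x for wi, x in zip(w, xs))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def propensity(x, a, b):
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def att_psm(xs, ds, ys, a, b):
    """Step 2: nearest-neighbor matching on |pi(X_i) - pi(X_j)|, ties kept."""
    ps = [propensity(x, a, b) for x in xs]
    controls = [(ps[j], ys[j]) for j in range(len(xs)) if ds[j] == 0]
    effects = []
    for i in range(len(xs)):
        if ds[i] != 1:
            continue
        best = min(abs(ps[i] - pc) for pc, _ in controls)
        matched = [yc for pc, yc in controls if abs(ps[i] - pc) == best]
        effects.append(ys[i] - sum(matched) / len(matched))
    return sum(effects) / len(effects)


# Hypothetical toy data: Y0 = x, and treatment adds 2 for every unit.
xs = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
ds = [0, 0, 0, 1, 0, 1, 1, 1]
ys = [x + 2.0 * d for x, d in zip(xs, ds)]

a, b = fit_logit(xs, ds)
att = att_psm(xs, ds, ys, a, b)
print(a, b, att)
```

In the toy data there is no control at x = 3, so the two treated units at x = 3 are matched to the control at x = 2 and the estimate exceeds the true effect of 2. A caliper on the propensity-score difference would leave these two units unmatched instead.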


King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
  • http://j.mp/1sexgVw

Slides
  • https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
  • https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows:

Argument 1
  • Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
  • Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several animation steps: scatter plot of Outcome (y-axis, 0 to 12) against Education in years (x-axis, 12 to 28) for treated (T) and control (C) units, first with the treated units only, then with all controls added, and finally with the matched controls only. Slide 3/23 by King and Nielsen.]

King and Nielsen

Argument 2
  • PSM approximates complete randomization.
  • Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of Experiments

  Balance on covariates    Complete Randomization    Fully Blocked
  Observed                 On average                Exact
  Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
  • PSM: complete randomization
  • Other methods: fully blocked
  • Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps: treated (T) and control (C) units plotted by Age (y-axis, 20 to 80) against Education in years (x-axis, 12 to 28), first with all controls, then with the matched controls only. Slide 9/23 by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure, shown in two animation steps: treated (T) and control (C) units plotted by Age (y-axis, 20 to 80) against Education in years (x-axis, 12 to 28), with an additional Propensity Score axis running from 0 to 1. Slide 15/23 by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: the matched treated (T) and control (C) units plotted by Age (y-axis, 20 to 80) against Education in years (x-axis, 12 to 28). Slide 15/23 by King and Nielsen.]

King and Nielsen

Argument 3
  • Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases so that variance increases (large differences become more likely).
  • More imbalance/variance means more model dependence and researcher discretion.
  • Because PSM approximates complete randomization, it engages in random pruning.
  • PSM Paradox ("when you do 'better', you do worse")
      • When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
      • If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.

PSM Increases Model Dependence & Bias

[Figure with two panels, each comparing MDM and PSM. Left panel, "Model Dependence": Variance (y-axis) against Number of Units Pruned (x-axis, 0 to 160). Right panel, "Bias": Maximum Coefficient across 512 Specifications (y-axis) against Number of Units Pruned (x-axis, 0 to 160), with the true effect of 2 marked. Slide 20/23 by King and Nielsen.]

$$Y_i = 2 T_i + X_{1i} + X_{2i} + \epsilon_i, \qquad \epsilon_i \sim N(0, 1)$$

The Propensity Score Paradox in Real Data

[Figure with two panels, left: Finkel et al. (JOP 2012), right: Nielsen et al. (AJPS 2011). Each panel plots Imbalance (y-axis) against Number of units pruned (x-axis) for CEM, MDM, PSM, and random pruning, with the raw data and a 1/4 SD caliper marked. Slide 21/23 by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
  • Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
  • Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
  • PSM approximates complete randomization.
  • Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
  • If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
  • If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
  • Random pruning ⇒ imbalance ⇒ more model dependence
  • PSM ⇒ complete randomization ⇒ lots of random pruning
  • PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
  • Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
  • Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata, written over the last few months; available from SSC (ssc install kmatch).

Some key features:
  • Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
  • Optional exact matching
  • Optional regression adjustment / bias correction
  • Kernel matching, ridge matching, or nearest-neighbor matching
  • Automatic bandwidth selection for kernel/ridge matching
  • Flexible specification of scaling matrix for MDM
  • Joint analysis of multiple subgroups and multiple outcome variables
  • Various post-estimation commands for balancing and common-support diagnostics
  • Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |       Matched         |        Controls        |  Band-
             |   Yes     No   Total  |   Used  Unused  Total  |  width
-------------+-----------------------+------------------------+--------
     Treated |   432     25     457  |   1105     291   1396  | 1.3394

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6059013
        NATE |   1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |              Raw              |         Matched (ATT)
       Means |  Treated  Untrea~d    StdDif  |  Treated  Untrea~d    StdDif
-------------+-------------------------------+------------------------------
    collgrad |  .321663   .224212   .219912  |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584  |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735  |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246  |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425  |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944  |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129  |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657  |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785  |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669  |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551  |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073  |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707  |   .12037    .12037         0
      2.race |  .330416   .244986   .189418  |    .3125     .3125         0
      3.race |  .017505   .011461   .050566  |  .006944   .006944         0
       south |  .297593   .466332  -.352408  |  .291667   .291667         0

             |              Raw              |         Matched (ATT)
   Variances |  Treated  Untrea~d     Ratio  |  Treated  Untrea~d     Ratio
-------------+-------------------------------+------------------------------
    collgrad |  .218674   .174066   1.25628  |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459  |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706  |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928  |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052  |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656  |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496  |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287  |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769  |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445  |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039  |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917  |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443  |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787  |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025  |  .006912   .006912         1
       south |   .20949   .249045   .841173  |  .207077   .207077         1
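The two balance measures in the table can be recomputed by hand. The Python sketch below assumes that the standardized difference is the mean difference divided by the square root of the average of the two raw variances, and that the ratio is the treated variance over the untreated variance. These definitions are assumptions made for this illustration, not taken from the kmatch documentation, although they reproduce the raw ttl_exp row when it is read as means of 13.2685 and 12.7323 and variances of 20.5898 and 21.0001.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, scaled by the average raw variance (assumed)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Variance of the treated divided by variance of the untreated (assumed)."""
    return var_t / var_c


# Raw ttl_exp row of the balance table (treated vs. untreated).
sd_raw = std_diff(13.2685, 12.7323, 20.5898, 21.0001)
vr_raw = variance_ratio(20.5898, 21.0001)
print(round(sd_raw, 4), round(vr_raw, 4))
```

Values of the standardized difference near 0 and of the variance ratio near 1 indicate good balance, which is what the matched columns show for most covariates.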

Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" (x-axis from -.4 to .4) and "Variance ratio" (x-axis from 0 to 4), showing Raw and Matched values for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching               Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

             |       Matched         |        Controls        |  Band-
             |   Yes     No   Total  |   Used  Unused  Total  |  width
-------------+-----------------------+------------------------+--------
     Treated |   431     26     457  |   1214     182   1396  | .00188

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .3887224
        NATE |   1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel Density (y-axis, 0 to 3) of the Propensity score (x-axis, 0 to .8) for Untreated and Treated, in two panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: Cumulative probability (y-axis, 0 to 1) of the Propensity score (x-axis, 0 to .8) for Untreated and Treated, in two panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the Propensity score (y-axis, 0 to .8) for Untreated and Treated, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |       Matched         |        Controls        |  Band-
             |   Yes     No   Total  |   Used  Unused  Total  |  width
-------------+-----------------------+------------------------+--------
     Treated |   432     25     457  |   1105     291   1396  | 1.3394
   Untreated |  1386     10    1396  |    455       2    457  | 3.3975
    Combined |  1818     35    1853  |   1560     293   1853  |

Treatment-effects estimation

             |   Observed   Bootstrap                         Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ATE |   .4095729   .1920853     2.13   0.033     .0330928    .7860531
         ATT |   .6059013   .2472069     2.45   0.014     .1213846    1.090418
         ATC |   .3483797   .1893653     1.84   0.066    -.0227695    .7195289
        NATE |   1.432913   .2333282     6.14   0.000     .9755981    1.890228

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.8270117   .1810415    -4.57   0.000    -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1,853
                                               Neighbors: min = 1
Treatment   : union = 1                                   max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |       Matched         |        Controls        |  Band-
             |   Yes     No   Total  |   Used  Unused  Total  |  width
-------------+-----------------------+------------------------+--------
     Treated |   457      0     457  |    328    1068   1396  |

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |   .7246969   .2942952     2.46   0.014      .147889    1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1,853
                                               Neighbors: min = 5
Treatment   : union = 1                                   max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |       Matched         |        Controls        |  Band-
             |   Yes     No   Total  |   Used  Unused  Total  |  width
-------------+-----------------------+------------------------+--------
     Treated |   457      0     457  |    870     526   1396  |

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |   .5590823   .2381752     2.35   0.019     .0922675    1.025897

Some Examples: Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853
Neighbors:  min = 5, max = 5
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                     Controls
            Yes     No   Total      Used  Unused   Total   Bandwidth
Treated     457      0     457       870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                  Number of obs      = 1853
Estimator:       nearest-neighbor matching    Matches: requested = 5
Outcome model:   matching                              min       = 5
Distance metric: Mahalanobis                           max       = 6

                                     AI Robust
wage                        Coef.    Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
union (union vs nonunion) .5288023   .2420635    2.18    0.029    .0543666    1.003238

Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model:   logit (pr)
PS covars:  i.industry i.race south

Matching statistics

              Matched                     Controls
            Yes     No   Total      Used  Unused   Total   Bandwidth
Treated     439     18     457      1258     138    1396      .83886

Treatment-effects estimation

wage        Coef.
ATT      .6408443

Some Examples: Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure
Exact:      industry race south

Matching statistics

              Matched                     Controls
            Yes     No   Total      Used  Unused   Total   Bandwidth
Treated     432     25     457      1103     293    1396      1.3013

Treatment-effects estimation

wage        Coef.
ATT      .6047374

Some Examples: Bandwidth selection, the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                     Controls
            Yes     No   Total      Used  Unused   Total   Bandwidth
Treated     432     25     457      1105     291    1396      1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013

Some Examples: Bandwidth selection, cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                     Controls
            Yes     No   Total      Used  Unused   Total   Bandwidth
Treated     448      9     457      1184     212    1396      1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE (y-axis) against bandwidth (x-axis, 1.5 to 3); evaluation points labeled by index]

Some Examples: Bandwidth selection, cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                     Controls
            Yes     No   Total      Used  Unused   Total   Bandwidth
Treated     453      4     457      1289     107    1396       2.433

Treatment-effects estimation

wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE (y-axis) against bandwidth (x-axis, 1.5 to 3); evaluation points labeled by index]

Some Examples: Bandwidth selection, weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                     Controls
            Yes     No   Total      Used  Unused   Total   Bandwidth
Treated     455      2     457      1356      40    1396      2.7626

Treatment-effects estimation

wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE (y-axis) against bandwidth (x-axis, 1 to 5); evaluation points labeled by index]

Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                     Controls
            Yes     No   Total      Used  Unused   Total   Bandwidth
Treated     366     91     457       701     695    1396          .5

Treatment-effects estimation

wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)
                            Means                  Standardized difference
Means          Matched  Unmatc~d     Total     (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663     .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685     .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205     .038378  -.154356   .192734
3.industry     .002732   .021978   .006565    -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807     .019212  -.077269   .096481
5.industry     .062842   .274725   .105033    -.137462   .552867  -.690329
6.industry     .057377         0   .045952     .054507  -.219225   .273732
7.industry     .019126   .021978   .019694    -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505    -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941    -.000115   .000462  -.000577
10.industry          0   .021978   .004376    -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212      .15083  -.606636   .757467
12.industry    .092896   .241758   .122538    -.090299   .363181   -.45348
2.race         .243169   .681319   .330416    -.185284   .745209  -.930494
3.race         .002732   .076923   .017505    -.112525   .452572  -.565097
south           .29235   .318681   .297593    -.011456   .046074   -.05753

Common support (treated)
                          Variances                         Ratio
Variances      Matched  Unmatc~d     Total     (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674     1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898     .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044     1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536     .418045   3.32537   .125714
4.industry     .155101   .131624   .150351     1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207     .626851   2.13854   .293122
6.industry     .054233         0   .043936     1.23435         0         .
7.industry     .018811   .021734   .019348     .972252    1.1233   .865531
8.industry      .00545   .062271   .017237     .316157   3.61269   .087513
9.industry     .010839   .010989   .010845     .999464   1.01328   .986361
10.industry          0   .021734   .004367           0   4.97709         0
11.industry    .247691    .14652   .250115     .990307   .585811   1.69049
12.industry    .084497   .185348   .107758     .784137   1.72003   .455885
2.race         .184542   .219536   .221726     .832297   .990121   .840601
3.race         .002732   .071795   .017237     .158513   4.16522   .038056
south          .207448   .219536    .20949     .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples: Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" (x-axis from -.2 to .2) with one marker each for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]

Some Examples: Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                     Controls
            Yes     No   Total      Used  Unused   Total   Bandwidth
Treated     432     25     457      1104     291    1395      1.3392

Treatment-effects estimation

               Coef.
wage
  ATT       .6021049
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                     Controls
            Yes     No   Total      Used  Unused   Total   Bandwidth
Treated     432     25     457      1104     291    1395      1.3392

Treatment-effects estimation

               Coef.
wage
  ATT       .5152752
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching

Number of obs = 1853
Replications  = 50
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                 Matched                     Controls
               Yes     No   Total      Used  Unused   Total   Bandwidth
0: Treated     306     15     321       625     120     745      1.3199
1: Treated     126     10     136       473     178     651      1.3398

Treatment-effects estimation

            Observed   Bootstrap                        Normal-based
wage           Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0: ATT      .4586332    .2763358   1.66    0.097    -.082975    1.000241
1: ATT      .9518705     .406903   2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2( 1)    = 1.23
       Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage           Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)         .4932373    .4452343   1.11    0.268    -.379406    1.365881
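The normal-based inference reported by lincom can be retraced by hand. The following is only a sketch in Python (not part of the Stata session): it takes the difference and its bootstrap standard error as read from the output above, with decimal points as assumed there, and recomputes z, the two-sided p-value, and the 95% confidence interval. The function name normal_based_inference is made up for this illustration.

```python
from statistics import NormalDist

# Values read off the lincom output (decimal placement is an assumption).
coef = 0.4932373      # [1]ATT - [0]ATT
std_err = 0.4452343   # bootstrap standard error of the difference


def normal_based_inference(coef, std_err, level=0.95):
    """Return z, the two-sided p-value, and the normal-based confidence interval."""
    std_normal = NormalDist()
    z = coef / std_err
    p = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2)
    return z, p, (coef - crit * std_err, coef + crit * std_err)


z, p, (lower, upper) = normal_based_inference(coef, std_err)
print(f"z = {z:.2f}, P>|z| = {p:.3f}, CI = [{lower:.6f}, {upper:.6f}]")
# With one restriction, the Wald chi2 statistic of the test is the squared z.
print(f"chi2(1) = {z ** 2:.2f}")
```

The squared z statistic agrees with the chi2(1) value of the preceding test command up to rounding, which is why both commands lead to the same p-value.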

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people aged 24 to 60 who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score (x-axis from 0 to 1) for untreated and treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: outcome (y-axis, 30 to 55) by propensity score (x-axis, 0 to .8) for untreated and treated; right panel: treatment effect (y-axis, -6 to -1) by propensity score (x-axis, 0 to .8)]
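The three population effects are tied together by the identity ATE = (N_{D=1}/N) · ATT + (N_{D=0}/N) · ATC from the matching section. As a quick plausibility check in Python, under the assumption that the treated share on the simulation slide reads 17.5% and the effects read -3.96 and -3.41:

```python
# Figures as read from the simulation slides (decimal placement is an assumption).
share_treated = 0.175   # resident aliens among all individuals
att = -3.96             # population ATT
atc = -3.41             # population ATC

# ATE as the share-weighted average of ATT and ATC.
ate = share_treated * att + (1 - share_treated) * atc
print(round(ate, 2))  # prints -3.51
```

The result reproduces the ATE reported on the slide, so the three numbers are mutually consistent.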

Results: Variance

[Figure: variance of the estimators, one panel each for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent, one panel each for N = 500 (x-axis 70 to 130) and N = 5000 (x-axis 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators, one panel each for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error, one panel each for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, one panel each for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Average Treatment Effect

Although individual causal effects cannot be observed, the average causal effect in a population (the so-called “Average Treatment Effect”) can be identified by comparing the expected values of $Y^1$ and $Y^0$:

$$ATE = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]$$

Some other quantities of interest:

- Average Treatment Effect on the Treated (ATT)

$$ATT = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1]$$

- Average Treatment Effect on the Untreated (ATC)

$$ATC = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0]$$
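To make the three estimands concrete, here is a small Python sketch with made-up numbers. It assumes, purely for illustration, that both potential outcomes of every unit are known; in real data only one of them is observed per unit, which is exactly why the later slides need identifying assumptions.

```python
# Hypothetical units: (D, Y1, Y0). Knowing both Y1 and Y0 is an illustrative
# assumption; in practice only the outcome under the realized D is observed.
units = [
    (1, 5.0, 3.0),
    (1, 6.0, 5.0),
    (1, 4.0, 1.0),
    (0, 3.0, 2.0),
    (0, 2.0, 2.0),
    (0, 4.0, 3.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Individual effects delta = Y1 - Y0, averaged over different groups.
ate = mean(y1 - y0 for d, y1, y0 in units)            # all units
att = mean(y1 - y0 for d, y1, y0 in units if d == 1)  # treated only
atc = mean(y1 - y0 for d, y1, y0 in units if d == 0)  # untreated only

print(f"ATE = {ate:.3f}, ATT = {att:.3f}, ATC = {atc:.3f}")
```

In this toy population the treated benefit more than the untreated, so ATT, ATE, and ATC differ; with homogeneous effects all three would coincide.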

Average Treatment Effect

To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.

If the independence assumption

$$(Y^0, Y^1) \perp\!\!\!\perp D$$

applies, that is, if D is independent of $Y^0$ and $Y^1$, then

$$E[Y^0] = E[Y^0 \mid D = 0]$$

$$E[Y^1] = E[Y^1 \mid D = 1]$$

In this case the average causal effect can be measured by a simple group comparison (mean difference) of observations without treatment (D = 0) and observations with treatment (D = 1).

Randomized experiments solve the problem: if the assignment of D is randomized, D is independent of $Y^0$ and $Y^1$ by design.


Conditional Independence / Strong Ignorability

Can causal effects also be identified from “observational” (i.e., non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, “unconfoundedness”):

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

is given, then the ATE (or ATT or ATC) can be identified by conditioning on X.

For example:

$$ATE = \sum_x \Pr[X = x] \left\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \right\}$$
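The conditioning formula for the ATE translates directly into a stratification estimator. The Python sketch below uses a made-up data set with a single discrete covariate; the function name stratified_ate and all values are assumptions for illustration. It also raises an error if a stratum lacks treated or untreated units, which is the sample analogue of the overlap assumption.

```python
from collections import defaultdict

# Hypothetical observations: (x, d, y) with one discrete covariate x.
rows = [
    ("low", 1, 4.0), ("low", 0, 2.0), ("low", 0, 3.0), ("low", 0, 1.0),
    ("high", 1, 9.0), ("high", 1, 7.0), ("high", 1, 8.0), ("high", 0, 6.0),
]


def stratified_ate(rows):
    """Sum over x of Pr[X=x] * (E[Y|D=1,X=x] - E[Y|D=0,X=x])."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        cells[x][d].append(y)
    n = len(rows)
    ate = 0.0
    for x, groups in cells.items():
        if not groups[0] or not groups[1]:
            raise ValueError(f"no overlap in stratum {x!r}")
        share = (len(groups[0]) + len(groups[1])) / n
        diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
        ate += share * diff
    return ate


treated = [y for _, d, y in rows if d == 1]
controls = [y for _, d, y in rows if d == 0]
naive = sum(treated) / len(treated) - sum(controls) / len(controls)

print(f"naive difference = {naive:.2f}, stratified ATE = {stratified_ate(rows):.2f}")
```

In this toy example treatment is more common in the stratum with higher outcomes, so the naive mean difference overstates the effect that remains within strata.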

Matching

Matching is one approach to “condition on X” if strong ignorability holds.

Basic idea:
1. For each observation in the treatment group, find “statistical twins” in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the “imputed” counterfactual values over all observations.

Matching

Formally:

$$\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \right]$$

$$\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \right]$$

$$\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}$$

Different matching algorithms use different definitions of $w_{ij}$.

Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to “perfect stratification” or “subclassification” (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the “curse of dimensionality”).
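A minimal Python sketch of the ATT under exact matching, with weights 1/k_i for controls that share the covariate values of treated unit i. The data and the function name exact_matching_att are invented for illustration. Treated units without any exact match are counted and left out of the average, which hints at the curse-of-dimensionality problem.

```python
def exact_matching_att(treated, controls):
    """treated, controls: lists of (x, y), where x is a hashable covariate profile.

    Returns the ATT over matched treated units and the number of treated
    units for which no exact match exists.
    """
    diffs = []
    unmatched = 0
    for x_i, y_i in treated:
        matches = [y_j for x_j, y_j in controls if x_j == x_i]
        k_i = len(matches)
        if k_i == 0:
            unmatched += 1
            continue
        # Imputed counterfactual: sum_j w_ij * Y_j with w_ij = 1/k_i.
        y0_hat = sum(matches) / k_i
        diffs.append(y_i - y0_hat)
    if not diffs:
        raise ValueError("no treated unit has an exact match")
    return sum(diffs) / len(diffs), unmatched


# Hypothetical data: covariate profile (gender, age group) and outcome.
treated = [(("f", 30), 10.0), (("m", 40), 12.0), (("m", 50), 9.0)]
controls = [(("f", 30), 8.0), (("f", 30), 6.0), (("m", 40), 11.0), (("f", 50), 5.0)]

att, unmatched = exact_matching_att(treated, controls)
print(f"ATT = {att:.2f}, treated without exact match: {unmatched}")
```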

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are “close”, but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
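The distance metric can be sketched in a few lines of Python for the two-covariate case. The helper names and data points are assumptions for illustration, and the 2x2 inverse is written out by hand to stay within the standard library. The example shows one practical consequence of using the covariance matrix as scaling matrix: rescaling a covariate (say, years to months) leaves the Mahalanobis distance unchanged, whereas the Euclidean distance would change.

```python
import math


def covariance_matrix_2d(points):
    """Sample covariance matrix of a list of (x0, x1) points."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return [[s00, s01], [s01, s11]]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("scaling matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def md(x_i, x_j, sigma_inv):
    """sqrt((x_i - x_j)' Sigma^{-1} (x_i - x_j)) for two covariates."""
    d0, d1 = x_i[0] - x_j[0], x_i[1] - x_j[1]
    q = (d0 * (sigma_inv[0][0] * d0 + sigma_inv[0][1] * d1)
         + d1 * (sigma_inv[1][0] * d0 + sigma_inv[1][1] * d1))
    return math.sqrt(q)


points = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (6.0, 4.0)]
rescaled = [(a, 10.0 * b) for a, b in points]  # second covariate in other units

identity = [[1.0, 0.0], [0.0, 1.0]]
euclid = md((0.0, 0.0), (3.0, 4.0), identity)  # Euclidean matching metric

maha = md(points[0], points[4], inverse_2x2(covariance_matrix_2d(points)))
maha_rescaled = md(rescaled[0], rescaled[4],
                   inverse_2x2(covariance_matrix_2d(rescaled)))
print(euclid, maha, maha_rescaled)
```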

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.

Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as “bias adjustment” in the context of nearest-neighbor matching).
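How the algorithms differ in their matching weights can be illustrated for a single treated unit. The Python sketch below takes the distances from that unit to each control as given and returns the weights under nearest-neighbor matching (keeping ties) and under kernel matching with an Epanechnikov kernel. The function names and distances are assumptions for illustration; this is not how kmatch is implemented internally.

```python
def nn_weights(distances, k=1):
    """k-nearest-neighbor weights; controls tied at the k-th distance are all kept."""
    if k < 1 or not distances:
        raise ValueError("need k >= 1 and at least one control")
    cutoff = sorted(distances)[min(k, len(distances)) - 1]
    chosen = [d <= cutoff for d in distances]
    n_chosen = sum(chosen)
    return [1.0 / n_chosen if c else 0.0 for c in chosen]


def epanechnikov_weights(distances, bandwidth):
    """Kernel weights proportional to 1 - (d/h)^2 for d < h, normalized to sum to 1.

    The kernel's constant 3/4 cancels in the normalization. Returns None if
    no control lies within the bandwidth (the treated unit stays unmatched).
    """
    raw = [max(0.0, 1.0 - (d / bandwidth) ** 2) for d in distances]
    total = sum(raw)
    if total == 0:
        return None
    return [r / total for r in raw]


# Hypothetical distances from one treated unit to four controls.
distances = [0.5, 2.0, 1.0, 0.5]

print(nn_weights(distances, k=1))                      # the two tied controls share the weight
print(nn_weights(distances, k=3))                      # three closest controls, equal weights
print(epanechnikov_weights(distances, bandwidth=2.0))  # closer controls weigh more
print(epanechnikov_weights(distances, bandwidth=0.25)) # nobody within the bandwidth
```

Radius matching corresponds to equal weights for all controls within the threshold, and caliper matching to the nearest-neighbor weights restricted to controls within the threshold.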


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure:
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
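The two-step procedure can be sketched in plain Python. The example below is a toy illustration under several assumptions: a single made-up covariate, a logit model fitted by simple gradient ascent (statistical software would typically use Newton-type maximum likelihood), and one-nearest-neighbor matching with replacement on the estimated score. The names fit_logit and psm_att are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ds, steps=5000, lr=0.5):
    """Step 1: logit of D on one covariate x, fitted by gradient ascent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, d in zip(xs, ds):
            resid = d - sigmoid(a + b * x)
            grad_a += resid
            grad_b += resid * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def psm_att(xs, ds, ys):
    """Step 2: match each treated unit to the control with the closest score."""
    a, b = fit_logit(xs, ds)
    ps = [sigmoid(a + b * x) for x in xs]
    controls = [(p, y) for p, d, y in zip(ps, ds, ys) if d == 0]
    diffs = []
    for p_i, d_i, y_i in zip(ps, ds, ys):
        if d_i == 1:
            _, y_match = min(controls, key=lambda c: abs(c[0] - p_i))
            diffs.append(y_i - y_match)
    return sum(diffs) / len(diffs), ps


# Hypothetical data: six controls followed by three treated units.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 0.3, 0.62, 0.9]
ds = [0, 0, 0, 0, 0, 0, 1, 1, 1]
ys = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 3.0, 4.5, 5.0]

att, ps = psm_att(xs, ds, ys)
print(f"ATT = {att:.4f}")
```

With a single covariate the propensity score is a monotone transformation of x, so matching on the score and matching on x select the same neighbors here; the dimension reduction only pays off when X contains several variables.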


King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several animation steps: outcome (y-axis, 0 to 12) by education in years (x-axis, 12 to 28) for treated (T) and control (C) units; slide 3/23 of the slides by King and Nielsen]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments

Types of Experiments

                            Balance
Covariates     Complete randomization     Fully blocked
Observed       On average                 Exact
Unobserved     On average                 On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

(Slides by King and Nielsen)
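The efficiency claim for blocked designs can be checked with a small simulation. The following Stata sketch is not from the slides: the toy data-generating process (one covariate x, a true effect of 2, 100 units per draw) and the program name blocksim are assumptions made purely for illustration.

```stata
* Sketch: complete vs. pair-blocked randomization (assumed toy setup)
clear all
set seed 20170623

capture program drop blocksim
program define blocksim, rclass
    drop _all
    set obs 100
    generate double x = rnormal()
    generate double e = rnormal()
    // complete randomization: 50 of 100 units treated at random
    generate double u = runiform()
    sort u
    generate byte tc = _n <= 50
    // blocked randomization: pair neighbors on x, treat one unit per pair
    sort x
    generate int pair = ceil(_n/2)
    generate double v = runiform()
    bysort pair (v): generate byte tb = _n == 1
    // outcomes under each assignment (true effect = 2)
    generate double yc = 2*tc + x + e
    generate double yb = 2*tb + x + e
    quietly regress yc tc
    return scalar complete = _b[tc]
    quietly regress yb tb
    return scalar blocked = _b[tb]
end

simulate complete = r(complete) blocked = r(blocked), reps(500) nodots: blocksim
summarize complete blocked
```

Both estimators should be centered on 2; the standard deviation reported by summarize should be smaller for blocked, and the gap shrinks as the relation between x and the outcome weakens.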

Best Case Mahalanobis Distance Matching

[Figure: scatter plot of Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units, shown in two builds; the second build displays only the treated units and a reduced set of control units. Slides by King and Nielsen.]

Best Case Propensity Score Matching

[Figure: scatter plot of Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units, with an additional propensity-score axis running from 1 to 0. Slides by King and Nielsen.]

Best Case Propensity Score Matching is Suboptimal

[Figure: scatter plot of Age (20 to 80) against Education (years, 12 to 28) showing the treated units (T) and a reduced set of control units (C). Slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
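The first step of this argument (random pruning raises expected imbalance) is easy to reproduce. The Stata sketch below is an illustration only, not code from the slides; the setup (200 units, treatment assigned independently of a single covariate x, a random 25% of units kept) and the program name prunesim are assumptions.

```stata
* Sketch: random pruning and covariate imbalance (assumed toy setup)
clear all
set seed 12345

capture program drop prunesim
program define prunesim, rclass
    drop _all
    set obs 200
    // treatment is unrelated to x, so all imbalance is chance imbalance
    generate byte t = runiform() < 0.5
    generate double x = rnormal()
    tempname m1
    // absolute difference in means of x, full sample
    quietly summarize x if t == 1
    scalar `m1' = r(mean)
    quietly summarize x if t == 0
    return scalar full = abs(`m1' - r(mean))
    // prune at random: keep roughly a quarter of the units
    generate double u = runiform()
    quietly keep if u < 0.25
    quietly summarize x if t == 1
    scalar `m1' = r(mean)
    quietly summarize x if t == 0
    return scalar pruned = abs(`m1' - r(mean))
end

simulate full = r(full) pruned = r(pruned), reps(500) nodots: prunesim
summarize full pruned
```

Because the standard error of a mean difference scales with one over the square root of the group sizes, the average of pruned should come out at roughly twice the average of full when three quarters of the sample are discarded.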

PSM Increases Model Dependence & Bias

[Figure with two panels, each comparing MDM and PSM against the number of units pruned (0 to 160). Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications; true effect = 2.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

(Slides by King and Nielsen)

The Propensity Score Paradox in Real Data

[Figure with two panels, each plotting imbalance against the number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and the 1/4 SD caliper. Panel 1: Finkel et al. (JOP, 2012), 0 to 3000 units pruned. Panel 2: Nielsen et al. (AJPS, 2011), 0 to 2500 units pruned.]

Similar pattern for > 20 other real data sets we checked.

(Slides by King and Nielsen)

5 Are King and Nielsen right?

Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
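The two limiting cases can be made concrete by looking at how much the estimated propensity score varies. The following Stata sketch is illustrative and not part of the original slides; the simulated covariate x, the two treatment indicators t0 and t1, and the coefficients used to generate them are assumptions.

```stata
* Sketch: spread of the propensity score with weak vs. strong selection
clear all
set seed 2017
set obs 2000
generate double x = rnormal()

* case 1: treatment unrelated to x
generate byte t0 = runiform() < 0.3
* case 2: treatment strongly related to x
generate byte t1 = runiform() < invlogit(-1 + 2*x)

quietly logit t0 x
predict double ps0, pr
quietly logit t1 x
predict double ps1, pr

* ps0 is nearly constant (little blocking); ps1 varies widely (much blocking)
summarize ps0 ps1
```

When ps0 is nearly constant, matching on it cannot distinguish units, which corresponds to the complete-randomization case; the wide spread of ps1 means that matching on it separates units with different x, which is the blocking case.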

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
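As a rough illustration of nearest-neighbor propensity-score matching that keeps matches rather than discarding them, the sketch below uses official Stata's teffects psmatch, which (to my understanding) matches with replacement and retains tied neighbors. It is not the algorithm analyzed by King and Nielsen and not a kmatch call; the reduced covariate list is an assumption made to keep the example small.

```stata
* Sketch: propensity-score matching with 1 and with 5 neighbors
sysuse nlsw88, clear
drop if industry == 2

* one nearest neighbor on the estimated propensity score
teffects psmatch (wage) (union collgrad ttl_exp tenure south), ///
    atet nneighbor(1)

* five nearest neighbors: more controls are averaged per treated unit
teffects psmatch (wage) (union collgrad ttl_exp tenure south), ///
    atet nneighbor(5)
```

Comparing the reported standard errors of the two runs gives a feel for how much averaging over additional good matches changes precision in this particular data set.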

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     432     25    457     1105     291   1396  1.3394

Treatment-effects estimation

wage           Coef.
ATT         .6059013
NATE        1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                             Raw                        Matched (ATT)
Means          Treated  Untrea~d    StdDif     Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912     .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584     13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735     7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246      .00463    .00463         0
4.industry     .183807   .166905   .044425     .185185   .185185         0
5.industry     .105033   .027937   .312944     .085648   .085648         0
6.industry     .045952   .169771  -.407129     .048611   .048611         0
7.industry     .019694   .102436  -.350657     .020833   .020833         0
8.industry     .017505   .035817  -.113785     .009259   .009259         0
9.industry     .010941   .040115  -.185669     .011574   .011574         0
10.industry    .004376   .008596  -.052551     .002315   .002315         0
11.industry    .479212   .356734   .250073     .506944   .506944         0
12.industry    .122538    .07235   .169707      .12037    .12037         0
2.race         .330416   .244986   .189418       .3125     .3125         0
3.race         .017505   .011461   .050566     .006944   .006944         0
south          .297593   .466332  -.352408     .291667   .291667         0

                             Raw                        Matched (ATT)
Variances      Treated  Untrea~d     Ratio     Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628     .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459     19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706     37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928     .004619   .004619         1
4.industry     .150351   .139148   1.08052     .151242   .151242         1
5.industry     .094207   .027176   3.46656     .078494   .078494         1
6.industry     .043936    .14105   .311496     .046355   .046355         1
7.industry     .019348   .092008   .210287     .020447   .020447         1
8.industry     .017237   .034559   .498769     .009195   .009195         1
9.industry     .010845   .038533   .281445     .011467   .011467         1
10.industry    .004367   .008528   .512039     .002315   .002315         1
11.industry    .250115   .229639   1.08917     .250532   .250532         1
12.industry    .107758   .067163   1.60443     .106127   .106127         1
2.race         .221726     .1851   1.19787     .215342   .215342         1
3.race         .017237   .011338   1.52025     .006912   .006912         1
south           .20949   .249045   .841173     .207077   .207077         1
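The raw StdDif entries can be reproduced by hand. The worked check below assumes that the standardized difference is the difference in means divided by the square root of the average of the two raw group variances; this definition is an assumption, adopted here because it reproduces the reported figure. It uses the raw collgrad entries, read as means .321663 and .224212 and variances .218674 and .174066.

```latex
\[
d = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\left(s_T^2 + s_C^2\right)/2}}
  = \frac{0.321663 - 0.224212}{\sqrt{(0.218674 + 0.174066)/2}}
  = \frac{0.097451}{0.44314}
  \approx 0.2199
\]
```

This agrees with the value of .219912 reported for collgrad in the raw sample.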

Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" and "Variance ratio", showing Raw and Matched values for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching               Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     431     26    457     1214     182   1396  .00188

Treatment-effects estimation

wage           Coef.
ATT         .3887224
NATE        1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     432     25    457     1105     291   1396  1.3394
Untreated  1386     10   1396      455       2    457  3.3975
Combined   1818     35   1853     1560     293   1853

Treatment-effects estimation

           Observed   Bootstrap                       Normal-based
wage          Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE        .4095729    .1920853   2.13    0.033    .0330928   .7860531
ATT        .6059013    .2472069   2.45    0.014    .1213846   1.090418
ATC        .3483797    .1893653   1.84    0.066   -.0227695   .7195289
NATE       1.432913    .2333282   6.14    0.000    .9755981   1.890228

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage          Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)       -.8270117    .1810415  -4.57    0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1853
                                               Neighbors: min = 1
Treatment   : union = 1                                   max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     457      0    457      328    1068   1396

Treatment-effects estimation

wage           Coef.
ATT         .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                                    AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .7246969    .2942952   2.46    0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1853
                                               Neighbors: min = 5
Treatment   : union = 1                                   max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     457      0    457      870     526   1396

Treatment-effects estimation

wage           Coef.
ATT         .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                    AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5590823    .2381752   2.35    0.019    .0922675   1.025897

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1853
                                               Neighbors: min = 5
Treatment   : union = 1                                   max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     457      0    457      870     526   1396

Treatment-effects estimation

wage           Coef.
ATT         .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                    AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5288023    .2420635   2.18    0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     439     18    457     1258     138   1396  .83886

Treatment-effects estimation

wage           Coef.
ATT         .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     432     25    457     1103     293   1396  1.3013

Treatment-effects estimation

wage           Coef.
ATT         .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     432     25    457     1105     291   1396  1.3394

Treatment-effects estimation

wage           Coef.
ATT         .6059013

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     448      9    457     1184     212   1396  1.8888

Treatment-effects estimation

wage           Coef.
ATT         .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (about 1.5 to 3), with points labeled by evaluation index (1 to 15).]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     453      4    457     1289     107   1396   2.433

Treatment-effects estimation

wage           Coef.
ATT         .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (about 1.5 to 3), with points labeled by evaluation index (1 to 15).]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     455      2    457     1356      40   1396  2.7626

Treatment-effects estimation

wage           Coef.
ATT         .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5), with points labeled by evaluation index (1 to 15).]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     366     91    457      701     695   1396      .5

Treatment-effects estimation

wage           Coef.
ATT         .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)       Standardized difference
Means          Matched  Unmatc~d     Total     (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663     .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685     .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205     .038378  -.154356   .192734
3.industry     .002732   .021978   .006565    -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807     .019212  -.077269   .096481
5.industry     .062842   .274725   .105033    -.137462   .552867  -.690329
6.industry     .057377         0   .045952     .054507  -.219225   .273732
7.industry     .019126   .021978   .019694    -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505    -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941    -.000115   .000462  -.000577
10.industry          0   .021978   .004376    -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212      .15083  -.606636   .757467
12.industry    .092896   .241758   .122538    -.090299   .363181   -.45348
2.race         .243169   .681319   .330416    -.185284   .745209  -.930494
3.race         .002732   .076923   .017505    -.112525   .452572  -.565097
south           .29235   .318681   .297593    -.011456   .046074   -.05753

                 Common support (treated)                Ratio
Variances      Matched  Unmatc~d     Total     (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674     1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898     .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044     1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536     .418045   3.32537   .125714
4.industry     .155101   .131624   .150351     1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207     .626851   2.13854   .293122
6.industry     .054233         0   .043936     1.23435         0         .
7.industry     .018811   .021734   .019348     .972252    1.1233   .865531
8.industry      .00545   .062271   .017237     .316157   3.61269   .087513
9.industry     .010839   .010989   .010845     .999464   1.01328   .986361
10.industry          0   .021734   .004367           0   4.97709         0
11.industry    .247691    .14652   .250115     .990307   .585811   1.69049
12.industry    .084497   .185348   .107758     .784137   1.72003   .455885
2.race         .184542   .219536   .221726     .832297   .990121   .840601
3.race         .002732   .071795   .017237     .158513   4.16522   .038056
south          .207448   .219536    .20949     .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1852
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     432     25    457     1104     291   1395  1.3392

Treatment-effects estimation

               Coef.
wage
  ATT       .6021049
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1852
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls             Band-
            Yes     No  Total     Used  Unused  Total   width
Treated     432     25    457     1104     291   1395  1.3392

Treatment-effects estimation

               Coef.
wage
  ATT       .5152752
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
0  Treated |   306     15     321 |   625     120    745 | 1.3199
1  Treated |   126     10     136 |   473     178    651 | 1.3398

Treatment-effects estimation

           |  Observed  Bootstrap                      Normal-based
      wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
0      ATT |  .4586332   .2763358  1.66   0.097   -.082975    1.000241
1      ATT |  .9518705    .406903  2.34   0.019   .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
       (1) |  .4932373   .4452343  1.11   0.268   -.379406    1.365881
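The z statistic, p-value and normal-based confidence interval in the lincom output follow directly from the coefficient and its bootstrap standard error. The following is a minimal Python sketch of that arithmetic, assuming the extracted digits read as Coef. = .4932373 and Std. Err. = .4452343; the helper name `normal_ci` is hypothetical and is not part of kmatch or Stata.

```python
from statistics import NormalDist


def normal_ci(coef, se, level=0.95):
    """Return (z, two-sided p-value, lower bound, upper bound)."""
    std_normal = NormalDist()
    z = coef / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return z, p, coef - z_crit * se, coef + z_crit * se


# Values as read from the lincom table (assumed decimal placement).
z, p, lower, upper = normal_ci(0.4932373, 0.4452343)
print(round(z, 2), round(p, 3), round(lower, 4), round(upper, 4))
```

With one restriction, the chi-squared statistic reported by test is the square of this z value, which is why both commands report the same p-value up to rounding.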

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

[Figure: Propensity score in population (computed from fully stratified data). Density of the propensity score for untreated and treated. McFadden R2 = 0.121]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41)

[Figure, left panel: outcome against propensity score for untreated and treated. Right panel: treatment effect against propensity score.]
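The population figures can be cross-checked against the identity ATE = (N_{D=1}/N) ATT + (N_{D=0}/N) ATC. A small Python sketch of that check, assuming the slide's numbers read as a treated share of 17.5%, ATT = -3.96 and ATC = -3.41 (the function name is hypothetical):

```python
def ate_from_att_atc(share_treated, att, atc):
    """Combine ATT and ATC into the ATE using the group shares as weights."""
    return share_treated * att + (1.0 - share_treated) * atc


ate = ate_from_att_atc(0.175, -3.96, -3.41)
print(round(ate, 2))
```

Under these assumed readings the weighted average agrees with the reported ATE of -3.51 after rounding.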

Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Same rows and series as in the relative standard error figure.]


Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
- Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
  - Advantages of MDM:
    - MDM leaves less scope for post-matching modeling decision biases.
    - Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
    - Fewer restrictions in terms of possible post-matching analyses.
  - Disadvantages of MDM:
    - Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
    - Computational complexity.
- One clear conclusion we can draw, however, is: Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Average Treatment Effect

To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.

If the independence assumption

$$(Y^0, Y^1) \perp\!\!\!\perp D$$

applies, that is, if $D$ is independent of $Y^0$ and $Y^1$, then

$$E[Y^0] = E[Y^0 \mid D = 0]$$

$$E[Y^1] = E[Y^1 \mid D = 1]$$

In this case the average causal effect can be measured by a simple group comparison (mean difference) of observations without treatment ($D = 0$) and observations with treatment ($D = 1$).

Randomized experiments solve the problem: if the assignment of $D$ is randomized, $D$ is independent of $Y^0$ and $Y^1$ by design.
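A small simulation can make the role of the independence assumption concrete. The sketch below is illustrative only: the data-generating process (normal potential outcomes, an average effect of 1.5) and the selection rule are assumptions made up for this example, and the function name `simulate` is hypothetical.

```python
import random
from statistics import mean


def simulate(randomized, n=20000, seed=1):
    """Return (true average effect in the sample, naive mean difference)."""
    rng = random.Random(seed)
    effects, y_treated, y_control = [], [], []
    for _ in range(n):
        y0 = rng.gauss(10, 2)                # potential outcome without treatment
        y1 = y0 + 1.5 + rng.gauss(0, 1)      # potential outcome with treatment
        if randomized:
            p = 0.5                          # D independent of (Y0, Y1) by design
        else:
            p = 0.8 if y0 > 10 else 0.2      # selection on Y0 violates independence
        d = rng.random() < p
        effects.append(y1 - y0)
        if d:
            y_treated.append(y1)             # only Y1 is observed for the treated
        else:
            y_control.append(y0)             # only Y0 is observed for the controls
    return mean(effects), mean(y_treated) - mean(y_control)


for randomized in (True, False):
    ate, naive = simulate(randomized)
    print(randomized, round(ate, 2), round(naive, 2))
```

With randomized assignment the simple group comparison should land close to the true average effect; with selection on $Y^0$ it is biased upward in this setup.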


Conditional Independence / Strong Ignorability

Can causal effects also be identified from "observational" (i.e., non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

is given, then the ATE (or ATT or ATC) can be identified by conditioning on $X$.

For example:

$$ATE = \sum_x \Pr[X = x] \bigl\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \bigr\}$$
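The stratification formula for the ATE can be sketched in a few lines of Python. This is a minimal illustration for a single discrete X; the function name `stratified_ate` and the toy data are assumptions made for this example.

```python
from collections import defaultdict


def stratified_ate(rows):
    """rows: iterable of (x, d, y) with discrete x and d in {0, 1}.

    Computes sum_x Pr[X = x] * (E[Y | D=1, X=x] - E[Y | D=0, X=x]).
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        cells[x][d].append(y)
    n = sum(len(c[0]) + len(c[1]) for c in cells.values())
    ate = 0.0
    for x, c in cells.items():
        if not c[0] or not c[1]:
            # The overlap assumption fails in this stratum.
            raise ValueError(f"no overlap in stratum X={x!r}")
        p_x = (len(c[0]) + len(c[1])) / n
        diff = sum(c[1]) / len(c[1]) - sum(c[0]) / len(c[0])
        ate += p_x * diff
    return ate


# Toy data: treatment is rare in stratum "a" and common in stratum "b".
rows = [
    ("a", 1, 5.0), ("a", 0, 3.0), ("a", 0, 3.0), ("a", 0, 3.0),
    ("b", 1, 9.0), ("b", 1, 9.0), ("b", 1, 9.0), ("b", 0, 8.0),
]
print(stratified_ate(rows))
```

In the toy data the within-stratum differences are 2 and 1, so the stratified estimate is their equally weighted average, whereas the raw difference between treated and controls is much larger because treatment is concentrated in the stratum with higher outcomes.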

Matching

Matching is one approach to "condition on X" if strong ignorability holds.

Basic idea:
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.

Matching

Formally:

$$\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \Bigl[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \Bigr]$$

$$\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \Bigl[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \Bigr]$$

$$\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}$$

Different matching algorithms use different definitions of $w_{ij}$.

Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
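As a sketch of exact matching for the ATT, the following Python function imputes each treated observation's counterfactual as the mean outcome of the controls with identical X (weights 1/k_i) and counts the treated observations left without a match. The function name and the toy data are hypothetical.

```python
def exact_matching_att(treated, controls):
    """treated, controls: lists of (x, y), where x is a hashable covariate tuple.

    Returns (ATT over matched treated, number matched, number unmatched).
    """
    outcomes_by_x = {}
    for x, y in controls:
        outcomes_by_x.setdefault(x, []).append(y)
    diffs, unmatched = [], 0
    for x, y in treated:
        twins = outcomes_by_x.get(x)
        if twins is None:
            unmatched += 1                   # no exact match for this observation
            continue
        y0_hat = sum(twins) / len(twins)     # w_ij = 1/k_i for each exact match
        diffs.append(y - y0_hat)
    if not diffs:
        raise ValueError("no treated observation has an exact match")
    return sum(diffs) / len(diffs), len(diffs), unmatched


treated = [((1, 0), 10.0), ((1, 1), 12.0), ((0, 1), 7.0)]
controls = [((1, 0), 8.0), ((1, 0), 9.0), ((1, 1), 11.0), ((0, 0), 5.0)]
print(exact_matching_att(treated, controls))
```

Even with two binary covariates, one of the three treated observations in the toy data has no exact match, which hints at how quickly the problem grows with the dimension of X.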

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are "close", but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
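To make the distance formula concrete, here is a small Python sketch for the two-covariate case, where the scaling matrix can be inverted by hand. The function name `mahalanobis_2d` and the example matrices are assumptions for illustration.

```python
import math


def mahalanobis_2d(xi, xj, sigma):
    """Distance sqrt((xi - xj)' Sigma^{-1} (xi - xj)) for two covariates.

    sigma is a 2x2 scaling matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det == 0:
        raise ValueError("scaling matrix is singular")
    inv = ((d / det, -b / det), (-c / det, a / det))
    u, v = xi[0] - xj[0], xi[1] - xj[1]
    quad = u * (inv[0][0] * u + inv[0][1] * v) + v * (inv[1][0] * u + inv[1][1] * v)
    return math.sqrt(quad)


identity = ((1.0, 0.0), (0.0, 1.0))
covariance = ((4.0, 0.0), (0.0, 1.0))        # first covariate has variance 4

print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), identity))    # Euclidean distance
print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), covariance))  # rescaled by the variance
```

With the identity matrix the function returns the Euclidean distance; with the covariance matrix, a difference of two units on a covariate with standard deviation two counts as one unit.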

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
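The difference between these algorithms is easiest to see on a tiny example. The sketch below uses the absolute difference on a single covariate as the distance; the greedy processing order in `pair_matches` and both function names are assumptions made for illustration, not a description of any particular software.

```python
def nn_matches(treated, controls, k=1, caliper=None):
    """Nearest-neighbor matching with replacement; ties are all kept.

    Returns {treated index: list of matched control indices}.
    """
    result = {}
    for i, xi in enumerate(treated):
        dists = sorted((abs(xi - xj), j) for j, xj in enumerate(controls))
        if caliper is not None:
            dists = [t for t in dists if t[0] < caliper]   # caliper matching
        if not dists:
            result[i] = []
            continue
        cutoff = dists[min(k, len(dists)) - 1][0]          # distance of k-th neighbor
        result[i] = [j for dist, j in dists if dist <= cutoff]
    return result


def pair_matches(treated, controls):
    """Greedy one-to-one matching without replacement, in the order given."""
    free = set(range(len(controls)))
    result = {}
    for i, xi in enumerate(treated):
        if not free:
            result[i] = None
            continue
        j = min(free, key=lambda c: (abs(xi - controls[c]), c))
        result[i] = j
        free.remove(j)                                     # never reuse a control
    return result


treated = [1.0, 1.1]
controls = [1.05, 3.0]
print(nn_matches(treated, controls))     # both treated share the close control
print(pair_matches(treated, controls))   # second treated is forced onto a poor match
```

In this example nearest-neighbor matching reuses the one good control for both treated observations, while pair matching assigns the second treated observation to a distant control because the good one is already taken.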

Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
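Kernel matching can be sketched as a weighted average of control outcomes, with weights that decline in the distance and drop to zero at the bandwidth. The following Python example uses a one-dimensional distance for simplicity; the function names and the toy data are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, zero otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_counterfactual(xi, controls, bandwidth):
    """Impute the counterfactual outcome for a treated unit located at xi.

    controls: list of (x, y). Returns None if no control lies within the bandwidth.
    """
    weights = [epanechnikov((xi - xj) / bandwidth) for xj, _ in controls]
    total = sum(weights)
    if total == 0.0:
        return None                          # treated unit remains unmatched
    return sum(w * y for w, (_, y) in zip(weights, controls)) / total


controls = [(0.0, 10.0), (0.5, 20.0), (2.0, 100.0)]
print(kernel_counterfactual(0.0, controls, bandwidth=1.0))
```

The control at distance 2 lies outside the bandwidth and gets zero weight, while the control at distance 0.5 contributes less than the one at distance 0. Radius matching corresponds to giving all controls inside the threshold the same weight.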


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.

This is remarkable, because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
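The implication can be sketched with the law of iterated expectations. The derivation below is a short version of the standard argument, assuming conditional independence given $X$ and using that $\pi(X)$ is a function of $X$:

```latex
\begin{align*}
\Pr(D = 1 \mid Y^0, Y^1, \pi(X))
  &= E\bigl[ D \mid Y^0, Y^1, \pi(X) \bigr] \\
  &= E\bigl[ E[ D \mid Y^0, Y^1, X ] \mid Y^0, Y^1, \pi(X) \bigr] \\
  &= E\bigl[ \pi(X) \mid Y^0, Y^1, \pi(X) \bigr]
     && \text{(conditional independence given } X\text{)} \\
  &= \pi(X)
\end{align*}
```

The same steps without $Y^0, Y^1$ give $\Pr(D = 1 \mid \pi(X)) = \pi(X)$, so the treatment probability given the propensity score does not depend on the potential outcomes.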

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g., using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
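The two steps can be sketched in plain Python. This is an illustrative toy implementation, not the procedure of any particular package: the logit model has a single covariate and is fitted by Newton-Raphson, matching is nearest-neighbor with replacement keeping ties, and all names and data are made up for the example.

```python
import math


def fit_logit(x, d, iterations=25):
    """Step 1: fit Pr(D=1|x) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        g0 = sum(di - pi for di, pi in zip(d, p))              # score for a
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))  # score for b
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def psm_att(x, d, y):
    """Step 2: match each treated unit to the nearest controls on the score."""
    a, b = fit_logit(x, d)
    ps = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
    controls = [(ps[j], y[j]) for j in range(len(x)) if d[j] == 0]
    diffs = []
    for i in range(len(x)):
        if d[i] != 1:
            continue
        nearest = min(abs(ps[i] - pj) for pj, _ in controls)
        matched = [yj for pj, yj in controls if abs(ps[i] - pj) == nearest]
        diffs.append(y[i] - sum(matched) / len(matched))
    return sum(diffs) / len(diffs), ps


x = [0, 0, 0, 0, 1, 1, 1, 1]
d = [1, 0, 0, 0, 1, 1, 1, 0]
y = [6.0, 4.0, 5.0, 3.0, 9.0, 10.0, 11.0, 8.0]
att, ps = psm_att(x, d, y)
print(round(att, 3), round(ps[0], 3), round(ps[4], 3))
```

Because the single covariate is binary, the fitted scores equal the treated shares within each value of x, and matching on the score coincides with exact matching here. With several covariates, different X values can share the same score.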


King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)

[Figure, shown in several animation steps: scatter plot of Outcome against Education (years) for treated (T) and control (C) units. Slide 3/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

                 Balance
Covariates   Complete randomization   Fully blocked
Observed     On average               Exact
Unobserved   On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps: scatter plot of Age against Education (years) for treated (T) and control (C) units, before and after matching. Slide 9/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure: scatter plot of Age against Education (years) for treated (T) and control (C) units, with the units projected onto a propensity score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of Age against Education (years) showing the treated (T) units and the control (C) units retained after matching. Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
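The first point, that random pruning increases imbalance, can be illustrated with a small simulation. The sketch below is an assumption-laden toy: treated and control groups are drawn from the same standard normal covariate distribution (so they are balanced in expectation), and randomly pruning such a sample is represented by simply drawing smaller groups. The function name is hypothetical.

```python
import random


def mean_abs_imbalance(n_per_group, reps=500, seed=42):
    """Average absolute difference in covariate means between two random groups."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        controls = [rng.gauss(0, 1) for _ in range(n_per_group)]
        total += abs(sum(treated) / n_per_group - sum(controls) / n_per_group)
    return total / reps


# Imbalance in the full sample and after randomly pruning 90% of each group.
print(round(mean_abs_imbalance(200), 3))
print(round(mean_abs_imbalance(20), 3))
```

Nothing about the groups changes except their size, yet the typical mean difference grows as observations are removed, which is the mechanism behind the argument.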

PSM Increases Model Dependence & Bias

[Figure: two panels plotted against the number of units pruned (0 to 160), each comparing MDM and PSM. Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications, true effect = 2. Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen.]

The Propensity Score Paradox in Real Data

[Figure: two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011), showing imbalance against the number of units pruned for CEM, MDM, and PSM, with reference marks for the raw data, random pruning, and a 1/4 SD caliper. Slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
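The efficiency gain from blocking at a given sample size can be illustrated by simulation. The following sketch compares the variance of the difference-in-means estimator under complete randomization and under pairwise blocking on X. The data-generating process (Y = effect * D + beta * X + noise), the parameter values, and the function names are assumptions chosen for illustration.

```python
import random
from statistics import mean, pvariance


def diff_in_means(y, d):
    treated = [yi for yi, di in zip(y, d) if di]
    control = [yi for yi, di in zip(y, d) if not di]
    return mean(treated) - mean(control)


def simulate(n=40, reps=400, beta=3.0, effect=2.0, seed=7):
    """Return (variance under complete randomization, variance under blocking)."""
    rng = random.Random(seed)
    complete, blocked = [], []
    for _ in range(reps):
        x = sorted(rng.gauss(0, 1) for _ in range(n))
        noise = [rng.gauss(0, 0.5) for _ in range(n)]
        # Complete randomization: any n/2 units receive treatment.
        d_complete = [1] * (n // 2) + [0] * (n // 2)
        rng.shuffle(d_complete)
        # Blocked randomization: one treated unit within each pair of neighbors on x.
        d_blocked = []
        for _ in range(n // 2):
            d_blocked += [1, 0] if rng.random() < 0.5 else [0, 1]
        for d, store in ((d_complete, complete), (d_blocked, blocked)):
            y = [effect * di + beta * xi + ei for di, xi, ei in zip(d, x, noise)]
            store.append(diff_in_means(y, d))
    return pvariance(complete), pvariance(blocked)


var_complete, var_blocked = simulate()
print(round(var_complete, 3), round(var_blocked, 3))
```

Setting beta to zero removes the relation between X and Y, in which case blocking on X should bring little or no gain, in line with the remark that the size of the gain depends on the strength of that relation.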

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.


If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
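To make the contrast between pruning and averaging concrete, here is a minimal sketch in Python (used purely for illustration; it is not how kmatch is implemented). It computes the matching weights for a single treated observation under pair matching and under Epanechnikov kernel matching on the propensity score. The data, the bandwidth, and all function names are hypothetical.

```python
# Illustrative sketch only: weights for ONE treated unit under two algorithms.
# Data, bandwidth, and function names are hypothetical assumptions.

def epanechnikov(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def pair_weights(ps_treated, ps_controls):
    """Pair matching: all weight goes to the single closest control."""
    distances = [abs(ps_treated - p) for p in ps_controls]
    best = distances.index(min(distances))
    return [1.0 if j == best else 0.0 for j in range(len(ps_controls))]

def kernel_weights(ps_treated, ps_controls, bandwidth):
    """Kernel matching: every control within the bandwidth gets a normalized weight."""
    raw = [epanechnikov((ps_treated - p) / bandwidth) for p in ps_controls]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else raw

# Three controls have (almost) the same propensity score as the treated unit.
ps_controls = [0.30, 0.31, 0.29, 0.80]
w_pair = pair_weights(0.30, ps_controls)
w_kernel = kernel_weights(0.30, ps_controls, bandwidth=0.05)

used_pair = sum(w > 0 for w in w_pair)      # controls actually used by pair matching
used_kernel = sum(w > 0 for w in w_kernel)  # controls actually used by kernel matching
```

In this toy example pair matching keeps one of three equally good controls and discards the other two, whereas kernel matching averages over all three and still excludes the distant control.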


It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      432   25    457   1105     291   1396  1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
NATE     1.432913

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                         Raw                        Matched (ATT)
Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad      .321663   .224212   .219912    .319444   .319444         0
ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry    .006565   .012178  -.058246     .00463    .00463         0
4.industry    .183807   .166905   .044425    .185185   .185185         0
5.industry    .105033   .027937   .312944    .085648   .085648         0
6.industry    .045952   .169771  -.407129    .048611   .048611         0
7.industry    .019694   .102436  -.350657    .020833   .020833         0
8.industry    .017505   .035817  -.113785    .009259   .009259         0
9.industry    .010941   .040115  -.185669    .011574   .011574         0
10.industry   .004376   .008596  -.052551    .002315   .002315         0
11.industry   .479212   .356734   .250073    .506944   .506944         0
12.industry   .122538    .07235   .169707     .12037    .12037         0
2.race        .330416   .244986   .189418      .3125     .3125         0
3.race        .017505   .011461   .050566    .006944   .006944         0
south         .297593   .466332  -.352408    .291667   .291667         0

                         Raw                        Matched (ATT)
Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad      .218674   .174066   1.25628    .217904   .217904         1
ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry    .006536   .012038   .542928    .004619   .004619         1
4.industry    .150351   .139148   1.08052    .151242   .151242         1
5.industry    .094207   .027176   3.46656    .078494   .078494         1
6.industry    .043936    .14105   .311496    .046355   .046355         1
7.industry    .019348   .092008   .210287    .020447   .020447         1
8.industry    .017237   .034559   .498769    .009195   .009195         1
9.industry    .010845   .038533   .281445    .011467   .011467         1
10.industry   .004367   .008528   .512039    .002315   .002315         1
11.industry   .250115   .229639   1.08917    .250532   .250532         1
12.industry   .107758   .067163   1.60443    .106127   .106127         1
2.race        .221726     .1851   1.19787    .215342   .215342         1
3.race        .017237   .011338   1.52025    .006912   .006912         1
south          .20949   .249045   .841173    .207077   .207077         1
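The two balance measures in these tables can be recomputed by hand. The Python sketch below (for illustration only, not kmatch code) assumes the standardized difference is the mean difference divided by the square root of the average of the two group variances. That assumption is consistent with the raw collgrad row (means 0.321663 and 0.224212, variances 0.218674 and 0.174066), but it is inferred from the numbers, not taken from the kmatch documentation.

```python
import math

def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Mean difference scaled by the pooled standard deviation (assumed formula)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def variance_ratio(var_t, var_c):
    """Ratio of the treated to the untreated variance; 1 means equal spread."""
    return var_t / var_c

# Raw (unmatched) values for collgrad from the tables above.
std_dif_collgrad = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio_collgrad = variance_ratio(0.218674, 0.174066)
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is what the matched columns show for most covariates.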


Some Examples

Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: standardized mean difference (left panel) and variance ratio (right panel) for each covariate, Raw vs. Matched]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching         Number of obs = 1853
                                         Kernel        = epan

Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      431   26    457   1214     182   1396  .00188

Treatment-effects estimation

wage        Coef.
ATT      .3887224
NATE     1.432913

Some Examples

Kernel density balancing plot:

. kmatch density, lw(6 2) lc(5 1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT)]

Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(6 2) lc(5 1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT)]

Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT)]

Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Replications  = 50
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      432   25    457   1105     291   1396  1.3394
Untreated   1386   10   1396    455       2    457  3.3975
Combined    1818   35   1853   1560     293   1853

Treatment-effects estimation

         Observed   Bootstrap                       Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE      .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT      .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC      .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE     1.432913   .2333282    6.14   0.000    .9755981   1.890228
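The "Normal-based" columns follow directly from the coefficient and its bootstrap standard error. As a small check, the Python sketch below (illustration only; the function names are hypothetical) recomputes the z statistic and the 95% interval for the ATT row, using 0.6059013 and 0.2472069 from the table.

```python
from statistics import NormalDist

def normal_based_ci(coef, std_err, level=0.95):
    """Symmetric confidence interval based on the normal approximation."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return coef - z_crit * std_err, coef + z_crit * std_err

def z_statistic(coef, std_err):
    """Ratio of the estimate to its standard error."""
    return coef / std_err

# ATT row of the bootstrap table.
att_low, att_high = normal_based_ci(0.6059013, 0.2472069)
att_z = z_statistic(0.6059013, 0.2472069)
```

The result agrees with the reported interval up to rounding, which confirms that only the standard error comes from the bootstrap while the interval itself is a plain normal approximation.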


Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

       chi2(  1) =   2.42
     Prob > chi2 = 0.1200
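Both tests are Wald tests on a single linear combination, so their p-values can be recovered from the normal distribution: a chi-squared statistic with one degree of freedom is the square of a z statistic. The Python sketch below is an illustration with hypothetical function names; because it starts from the rounded chi2 value 2.42, it reproduces the reported p-value only approximately.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def p_from_chi2_1df(chi2):
    """p-value of a chi-squared statistic with 1 degree of freedom (chi2 = z squared)."""
    return two_sided_p_from_z(math.sqrt(chi2))

# lincom ATT-NATE: coefficient and standard error from the output above.
z_lincom = -0.8270117 / 0.1810415
p_lincom = two_sided_p_from_z(z_lincom)

# test ATT = ATC: rounded chi2(1) statistic from the output above.
p_test = p_from_chi2_1df(2.42)
```

Read this way, the difference between ATT and NATE is clearly significant, while the difference between ATT and ATC is not at conventional levels.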


Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                         Number of obs  = 1853
                                         Neighbors: min = 1
                                                    max = 1
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      457    0    457    328    1068   1396

Treatment-effects estimation

wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation             Number of obs      = 1853
Estimator       : nearest-neighbor matching   Matches: requested = 1
Outcome model   : matching                                  min = 1
Distance metric : Mahalanobis                               max = 1

                                  AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969   .2942952    2.46   0.014     .147889   1.301505
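For readers who want to see the mechanics, the following Python sketch implements nearest-neighbor matching for the ATT on a single covariate, with replacement and with all ties kept. It is a simplified illustration under those assumptions (absolute distance instead of the Mahalanobis metric, hypothetical data and names), not the algorithm used by kmatch or teffects.

```python
def nn_match_att(treated, controls, k=1):
    """ATT from k-nearest-neighbor matching with replacement, keeping ties.

    treated, controls: lists of (x, y) pairs with a single covariate x.
    """
    effects = []
    for x_t, y_t in treated:
        distances = sorted(abs(x_t - x_c) for x_c, _ in controls)
        # Distance of the k-th closest control; every control at or below it is used.
        cutoff = distances[min(k, len(distances)) - 1]
        matches = [y_c for x_c, y_c in controls if abs(x_t - x_c) <= cutoff]
        effects.append(y_t - sum(matches) / len(matches))
    return sum(effects) / len(effects)

# Hypothetical data: the first treated unit has two equally close controls.
treated = [(1.0, 10.0), (3.0, 14.0)]
controls = [(0.5, 8.0), (1.5, 9.0), (3.0, 11.0), (9.0, 20.0)]
att_1nn = nn_match_att(treated, controls, k=1)
```

Because controls are reused and ties are averaged, no good match is thrown away; the distant control at x = 9 is never used.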


Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                         Number of obs  = 1853
                                         Neighbors: min = 5
                                                    max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      457    0    457    870     526   1396

Treatment-effects estimation

wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation             Number of obs      = 1853
Estimator       : nearest-neighbor matching   Matches: requested = 5
Outcome model   : matching                                  min = 5
Distance metric : Mahalanobis                               max = 6

                                  AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823   .2381752    2.35   0.019    .0922675   1.025897

Some Examples

Bias-correction (regression adjustment):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                         Number of obs  = 1853
                                         Neighbors: min = 5
                                                    max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      457    0    457    870     526   1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation             Number of obs      = 1853
Estimator       : nearest-neighbor matching   Matches: requested = 5
Outcome model   : matching                                  min = 5
Distance metric : Mahalanobis                               max = 6

                                  AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023   .2420635    2.18   0.029    .0543666   1.003238
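The idea behind bias correction can be sketched in a few lines of Python. The example below is a deliberately simplified illustration in the spirit of regression-adjusted matching: one covariate, one nearest neighbor, and a linear outcome model fitted to the controls. It is an assumption-laden sketch with hypothetical data and names, not the procedure implemented in kmatch or teffects.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def matched_att(treated, controls, adjust):
    """1-nearest-neighbor ATT; if adjust is True, correct for the remaining X gap."""
    slope = ols_slope([x for x, _ in controls], [y for _, y in controls])
    effects = []
    for x_t, y_t in treated:
        x_c, y_c = min(controls, key=lambda c: abs(x_t - c[0]))
        # Shift the control outcome to the treated unit's X using the fitted slope.
        y0_hat = y_c + slope * (x_t - x_c) if adjust else y_c
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects)

# Hypothetical data: controls follow y = 1 + 2x, treated units y = 1 + 2x + 3.
controls = [(0.0, 1.0), (2.0, 5.0), (4.0, 9.0)]
treated = [(1.0, 6.0), (3.0, 10.0)]
att_raw = matched_att(treated, controls, adjust=False)
att_adjusted = matched_att(treated, controls, adjust=True)
```

In this constructed example the matches are never exact, so the unadjusted estimate absorbs the outcome difference caused by the remaining gap in X; the adjustment removes it and recovers the built-in effect of 3.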


Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      439   18    457   1258     138   1396  .83886

Treatment-effects estimation

wage        Coef.
ATT      .6408443

Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      432   25    457   1103     293   1396  1.3013

Treatment-effects estimation

wage        Coef.
ATT      .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      432   25    457   1105     291   1396  1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013

Some Examples

Bandwidth selection: cross validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      448    9    457   1184     212   1396  1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (y-axis: MSE) by bandwidth (x-axis: Bandwidth); markers labeled by evaluation index]

Some Examples

Bandwidth selection: cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      453    4    457   1289     107   1396   2.433

Treatment-effects estimation

wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (y-axis: MISE) by bandwidth (x-axis: Bandwidth); markers labeled by evaluation index]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      455    2    457   1356      40   1396  2.7626

Treatment-effects estimation

wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (y-axis: Weighted MISE) by bandwidth (x-axis: Bandwidth); markers labeled by evaluation index]

Some Examples

Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      366   91    457    701     695   1396      .5

Treatment-effects estimation

wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

               Common support (treated)        Standardized difference
Means         Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad      .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp       13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure        8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry    .057377         0   .045952    .054507  -.219225   .273732
7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry         0   .021978   .004376   -.066227   .266363  -.332589
11.industry   .554645   .175824   .479212     .15083  -.606636   .757467
12.industry   .092896   .241758   .122538   -.090299   .363181   -.45348
2.race        .243169   .681319   .330416   -.185284   .745209  -.930494
3.race        .002732   .076923   .017505   -.112525   .452572  -.565097
south          .29235   .318681   .297593   -.011456   .046074   -.05753

               Common support (treated)                Ratio
Variances     Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad      .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp       19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure        38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
6.industry    .054233         0   .043936    1.23435         0         .
7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
10.industry         0   .021734   .004367          0   4.97709         0
11.industry   .247691    .14652   .250115    .990307   .585811   1.69049
12.industry   .084497   .185348   .107758    .784137   1.72003   .455885
2.race        .184542   .219536   .221726    .832297   .990121   .840601
3.race        .002732   .071795   .017237    .158513   4.16522   .038056
south         .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference between matched treated observations and all treated observations for each covariate; x-axis: Std. difference]

Some Examples

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1852
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      432   25    457   1104     291   1395  1.3392

Treatment-effects estimation

            Coef.
wage
  ATT    .6021049
  NATE   1.430823
hours
  ATT    1.263759
  NATE   1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1852
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
Treated      432   25    457   1104     291   1395  1.3392

Treatment-effects estimation

            Coef.
wage
  ATT    .5152752
  NATE   1.430823
hours
  ATT    1.263759
  NATE   1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Replications  = 50
                                         Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

              Matched            Controls            Band-
             Yes   No  Total   Used  Unused  Total   width
0
  Treated    306   15    321    625     120    745  1.3199
1
  Treated    126   10    136    473     178    651  1.3398

Treatment-effects estimation

         Observed   Bootstrap                       Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
  ATT    .4586332   .2763358    1.66   0.097    -.082975   1.000241
1
  ATT    .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =   1.23
     Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      .4932373   .4452343    1.11   0.268    -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
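Simulation results of this kind are usually summarized by bias, variance, and mean squared error across replications. The Python sketch below shows one way to compute these summaries; it is an illustration with hypothetical numbers and names, and the definition of "bias reduction in percent" used here (relative to the bias of the raw mean difference) is an assumption, since the slides do not spell it out.

```python
from statistics import mean, pvariance

def summarize_estimator(estimates, true_value):
    """Bias, variance, and MSE of an estimator across simulation replications."""
    bias = mean(estimates) - true_value
    variance = pvariance(estimates)
    mse = mean((e - true_value) ** 2 for e in estimates)
    return {"bias": bias, "variance": variance, "mse": mse}

def bias_reduction_percent(estimator_bias, naive_bias):
    """Share of the naive estimator's bias that is removed (assumed definition)."""
    return 100.0 * (1.0 - estimator_bias / naive_bias)

# Hypothetical estimates from four replications and a hypothetical true ATT.
summary = summarize_estimator([-4.2, -3.8, -4.0, -3.6], true_value=-3.96)
reduction = bias_reduction_percent(summary["bias"], naive_bias=-0.83)
```

The mean squared error decomposes into variance plus squared bias, which is why an estimator with a small remaining bias can still win on MSE if its variance is low. Under the assumed definition, a bias reduction above 100 percent means the estimator overshoots the true value.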


Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated observations; x-axis: Propensity score (0 to 1); McFadden R2 = 0.121]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure: left panel: mean outcome by propensity score for untreated and treated observations; right panel: treatment effect by propensity score]

Results: Variance

[Figure: variance of the ATT estimate for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction; panels: N = 500, N = 5000]


In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction; panels: N = 500, N = 5000]


Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction; panels: N = 500, N = 5000]

Results: Relative standard error

[Figure: relative standard error for nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction; panels: N = 500, N = 5000]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction; panels: N = 500, N = 5000]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Conditional Independence / Strong Ignorability

Can causal effects also be identified from "observational" (i.e. non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

is given, then the ATE (or ATT or ATC) can be identified by conditioning on X.

For example:

$$ATE = \sum_x \Pr[X = x] \, \bigl\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \bigr\}$$

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 9
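The identification formula for the ATE under conditional independence and overlap can be illustrated with a small sketch. This is not part of the original slides: the function name `stratified_ate` and the toy data are hypothetical, and X is assumed to be a single discrete variable so that conditioning amounts to averaging within strata.

```python
from collections import defaultdict


def stratified_ate(rows):
    """Estimate the ATE by conditioning on a discrete X.

    rows: iterable of (x, d, y) tuples with d in {0, 1}.
    Implements sum_x Pr[X=x] * (E[Y|D=1,X=x] - E[Y|D=0,X=x]).
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        cells[x][d].append(y)
    n = sum(len(c[0]) + len(c[1]) for c in cells.values())
    ate = 0.0
    for x, c in cells.items():
        if not c[0] or not c[1]:
            # The overlap assumption fails in this stratum.
            raise ValueError(f"no overlap in stratum X={x!r}")
        p_x = (len(c[0]) + len(c[1])) / n
        diff = sum(c[1]) / len(c[1]) - sum(c[0]) / len(c[0])
        ate += p_x * diff
    return ate


# Hypothetical toy data: treatment is more likely in the "high" stratum,
# where outcomes are also higher, so the naive comparison is confounded.
rows = [
    ("low", 1, 5.0), ("low", 0, 3.0), ("low", 0, 3.0), ("low", 0, 3.0),
    ("high", 1, 9.0), ("high", 1, 9.0), ("high", 1, 9.0), ("high", 0, 8.0),
]
ate = stratified_ate(rows)

treated = [y for _, d, y in rows if d == 1]
control = [y for _, d, y in rows if d == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)
print(ate, naive)
```

In this toy example the naive difference in means is larger than the stratified estimate because treatment and outcome both depend on X.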

Matching

Matching is one approach to “condition on X” if strong ignorability holds.

Basic idea:

1. For each observation in the treatment group, find “statistical twins” in the control group with the same (or at least very similar) X values (and vice versa).

2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.

3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the “imputed” counterfactual values over all observations.

Matching

Formally:

$$\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \Bigl[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \Bigr]$$

$$\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \Bigl[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \Bigr]$$

$$\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}$$

Different matching algorithms use different definitions of $w_{ij}$.
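The matching estimators above can be sketched as a few lines of code. This is an illustration only, assuming the weights $w_{ij}$ are already given and sum to one for each treated unit; the function names, the weight layout (a dict per treated unit), and the numbers are hypothetical.

```python
def matching_att(y, d, w):
    """ATT estimate: mean over treated i of Y_i - sum_j w_ij * Y_j.

    y: outcomes, d: treatment indicators (0/1),
    w: dict mapping treated index i to {control index j: weight w_ij}.
    """
    treated = [i for i in range(len(y)) if d[i] == 1]
    total = 0.0
    for i in treated:
        # Imputed counterfactual outcome for treated unit i.
        y0_hat = sum(w_ij * y[j] for j, w_ij in w[i].items())
        total += y[i] - y0_hat
    return total / len(treated)


def combine_ate(att, atc, n_treated, n_untreated):
    """ATE as the group-size weighted average of ATT and ATC."""
    n = n_treated + n_untreated
    return n_treated / n * att + n_untreated / n * atc


# Hypothetical data: two treated units (index 0, 1), three controls.
y = [10.0, 12.0, 7.0, 8.0, 9.0]
d = [1, 1, 0, 0, 0]
w = {0: {2: 0.5, 3: 0.5}, 1: {4: 1.0}}

att = matching_att(y, d, w)
# The ATC value of 2.0 is made up for the sake of the example.
ate = combine_ate(att, atc=2.0, n_treated=2, n_untreated=3)
print(att, ate)
```

The ATC is computed in the same way with the roles of treated and controls swapped, which is why only the ATT function is spelled out here.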

Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to “perfect stratification” or “subclassification” (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the “curse of dimensionality”).
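A minimal sketch of the exact-matching weights, assuming covariate profiles are stored as tuples. The function name and the toy profiles are hypothetical; the third treated unit shows how a profile without an identical control ends up unmatched.

```python
def exact_match_weights(x_treated, x_controls):
    """For each treated unit, give weight 1/k_i to each of the k_i
    controls with an identical covariate profile, and 0 to all others
    (controls with weight 0 are simply left out of the dict)."""
    weights = []
    for x_i in x_treated:
        matches = [j for j, x_j in enumerate(x_controls) if x_j == x_i]
        k_i = len(matches)
        # An empty dict means that no exact match exists for this unit.
        weights.append({j: 1.0 / k_i for j in matches})
    return weights


# Hypothetical covariate profiles (education, region).
x_treated = [("college", "south"), ("no college", "north"), ("college", "north")]
x_controls = [("college", "south"), ("college", "south"), ("no college", "north")]

weights = exact_match_weights(x_treated, x_controls)
print(weights)
```

With more covariates, the share of treated units that end up with an empty weight set tends to grow quickly.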

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are “close”, but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
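The distance metric can be sketched for the two-covariate case, where the inverse of the scaling matrix has a simple closed form. The helper names and the example scaling matrices are assumptions made for illustration; a general implementation would use a linear-algebra library.

```python
import math


def inverse_2x2(s):
    """Inverse of a 2x2 matrix given as ((s11, s12), (s21, s22))."""
    (s11, s12), (s21, s22) = s
    det = s11 * s22 - s12 * s21
    return ((s22 / det, -s12 / det), (-s21 / det, s11 / det))


def md(x_i, x_j, sigma):
    """Distance sqrt((x_i - x_j)' Sigma^{-1} (x_i - x_j)) for two covariates."""
    inv = inverse_2x2(sigma)
    d0, d1 = x_i[0] - x_j[0], x_i[1] - x_j[1]
    q = d0 * (inv[0][0] * d0 + inv[0][1] * d1) \
        + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return math.sqrt(q)


# Euclidean matching: Sigma is the identity matrix.
identity = ((1.0, 0.0), (0.0, 1.0))
d_euclid = md((0.0, 0.0), (3.0, 4.0), identity)

# Hypothetical covariance matrix: first covariate has variance 4,
# second has variance 1, no covariance. A difference of 2 units in the
# first covariate then counts as one standard deviation.
sigma = ((4.0, 0.0), (0.0, 1.0))
d_scaled = md((0.0, 0.0), (2.0, 0.0), sigma)
print(d_euclid, d_scaled)
```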

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find the observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
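Nearest-neighbor matching with replacement, tie handling, and an optional caliper can be sketched as follows. The function operates on the distances from one treated unit to all controls; the function name, the equal weighting of the selected matches, and the distances are assumptions for illustration, not the behavior of any particular software.

```python
def nn_match_weights(dist_row, k=1, caliper=None):
    """Weights for one treated unit from its distances to all controls.

    Keeps the k closest controls, plus all controls tied with the k-th
    closest one. With a caliper, only controls with a distance strictly
    below the caliper are eligible. Selected controls get equal weights.
    """
    candidates = sorted(
        (dist, j) for j, dist in enumerate(dist_row)
        if caliper is None or dist < caliper
    )
    if not candidates:
        return {}  # no eligible control: the treated unit stays unmatched
    cutoff = candidates[min(k, len(candidates)) - 1][0]
    matches = [j for dist, j in candidates if dist <= cutoff]
    return {j: 1.0 / len(matches) for j in matches}


# Hypothetical distances from one treated unit to four controls.
dist_row = [0.5, 0.2, 0.2, 0.9]

w_nn1 = nn_match_weights(dist_row, k=1)          # two tied nearest neighbors
w_nn3 = nn_match_weights(dist_row, k=3)          # three closest controls
w_cal = nn_match_weights(dist_row, caliper=0.1)  # nobody within the caliper
print(w_nn1, w_nn3, w_cal)
```

Because the function is applied to each treated unit independently, a control can be reused as a match, which is what distinguishes this from pair matching without replacement.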

Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as “bias adjustment” in the context of nearest-neighbor matching).
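The difference between radius and kernel matching is only in how the controls within the threshold are weighted. The sketch below assumes an Epanechnikov kernel evaluated at distance divided by bandwidth and normalizes the weights to sum to one; function names and distances are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1 else 0.0


def radius_match_weights(dist_row, c):
    """Equal weights for all controls with distance below threshold c."""
    matches = [j for j, dist in enumerate(dist_row) if dist < c]
    return {j: 1.0 / len(matches) for j in matches}


def kernel_match_weights(dist_row, bandwidth):
    """Kernel weights, normalized to sum to one; closer controls get more."""
    raw = {j: epanechnikov(dist / bandwidth) for j, dist in enumerate(dist_row)}
    total = sum(raw.values())
    if total == 0:
        return {}  # no control within the bandwidth
    return {j: k / total for j, k in raw.items() if k > 0}


# Hypothetical distances from one treated unit to four controls.
dist_row = [0.0, 0.5, 1.0, 2.0]

w_radius = radius_match_weights(dist_row, c=1.0)
w_kernel = kernel_match_weights(dist_row, bandwidth=1.0)
print(w_radius, w_kernel)
```

Both methods use the same two controls here, but kernel matching puts more weight on the control at distance zero.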

3 Propensity Score Matching

The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g., using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
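The two-step procedure can be sketched end to end on toy data. Everything here is an assumption made for illustration: a single covariate, a Logit model fitted by plain gradient ascent (real software would use a proper maximum-likelihood routine), one-nearest-neighbor matching with replacement, and made-up data.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logit(xs, ds, steps=5000, lr=0.1):
    """Step 1: fit Pr(D=1|x) = sigmoid(a + b*x) by gradient ascent
    on the average log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = sum(d - sigmoid(a + b * x) for x, d in zip(xs, ds))
        grad_b = sum((d - sigmoid(a + b * x)) * x for x, d in zip(xs, ds))
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


# Hypothetical data: one covariate, treatment more likely for large x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ds = [0, 0, 1, 0, 1, 0, 1, 1]
ys = [2.0, 3.0, 6.0, 5.0, 8.0, 7.0, 10.0, 11.0]

a, b = fit_logit(xs, ds)
ps = [sigmoid(a + b * x) for x in xs]  # estimated propensity scores

# Step 2: match each treated unit to the control with the closest score.
controls = [j for j, d in enumerate(ds) if d == 0]
treated = [i for i, d in enumerate(ds) if d == 1]
match = {i: min(controls, key=lambda j: abs(ps[i] - ps[j])) for i in treated}

att = sum(ys[i] - ys[match[i]] for i in treated) / len(treated)
print(match, att)
```

Any of the matching algorithms discussed earlier could replace the one-nearest-neighbor rule in step 2; only the distance changes from MD to the absolute difference in propensity scores.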

4 King and Nielsen’s “Why Propensity Scores Should Not Be Used for Matching”

King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)

[Figure: scatter plot of Outcome against Education (years) for treated (T) and control (C) units, shown in several builds; slide 3/23 by King and Nielsen]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of Experiments

    Balance (covariates)   Complete randomization   Fully blocked
    Observed               On average               Exact
    Unobserved             On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure: scatter plot of Age against Education (years) for treated (T) and control (C) units, shown first with all controls and then with the matched controls only; slide 9/23 by King and Nielsen]

Best Case: Propensity Score Matching

[Figure: scatter plot of Age against Education (years) for treated (T) and control (C) units, with a propensity score axis running from 0 to 1; slide 15/23 by King and Nielsen]

Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of Age against Education (years) for treated (T) units and their matched controls (C); slide 15/23 by King and Nielsen]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better’, you do worse”):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.

PSM Increases Model Dependence & Bias

[Figure, two panels, MDM vs. PSM; slide 20/23 by King and Nielsen. Panel “Model Dependence”: Variance against Number of Units Pruned. Panel “Bias”: Maximum Coefficient across 512 Specifications against Number of Units Pruned; true effect = 2. Data-generating process: $Y_i = 2 T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$]

The Propensity Score Paradox in Real Data

[Figure, two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011): Imbalance against Number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and the 1/4 SD caliper; slide 21/23 by King and Nielsen]

Similar pattern for > 20 other real data sets we checked.

5 Are King and Nielsen right?

Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better’, you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X’s cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you’d need to apply PSM stratified by subgroups in this case).
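Why random pruning increases expected imbalance can be made concrete with a line of arithmetic. The sketch assumes a covariate that is unrelated to treatment and has the same variance in both groups, so that the difference in group means of X has variance $\sigma^2 (1/n_1 + 1/n_0)$; the function name and the group sizes are hypothetical.

```python
def mean_diff_variance(n_treated, n_control, sigma2=1.0):
    """Variance of the difference in group means of a covariate that is
    unrelated to treatment: sigma^2 * (1/n_treated + 1/n_control)."""
    return sigma2 * (1.0 / n_treated + 1.0 / n_control)


# Full sample versus a sample in which half of each group was pruned
# at random (hypothetical group sizes).
full = mean_diff_variance(100, 100)
pruned = mean_diff_variance(50, 50)
print(full, pruned)
```

Dropping half of each group at random doubles the variance of the mean difference, so large chance imbalances become more likely even though nothing else has changed.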

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression adjustment / bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

    . // Use the NLSW data to estimate the effect of union membership on wages,
    . // controlling for some covariates such as education, labor market
    . // experience, or industry
    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls           Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      432    25    457     1105     291   1396    1.3394

    Treatment-effects estimation

    wage        Coef.
    ATT      .6059013
    NATE     1.432913

Some Examples

    . // some balancing statistics
    . kmatch summarize
    (refitting the model using the generate() option)

                            Raw                      Matched(ATT)
    Means        Treated  Untrea~d    StdDif   Treated  Untrea~d    StdDif
    collgrad     .321663   .224212   .219912   .319444   .319444         0
    ttl_exp      13.2685   12.7323   .117584   13.3205   13.1425   .039036
    tenure       7.89205   6.17658    .29735   7.91744   7.58347   .057888
    3.industry   .006565   .012178  -.058246    .00463    .00463         0
    4.industry   .183807   .166905   .044425   .185185   .185185         0
    5.industry   .105033   .027937   .312944   .085648   .085648         0
    6.industry   .045952   .169771  -.407129   .048611   .048611         0
    7.industry   .019694   .102436  -.350657   .020833   .020833         0
    8.industry   .017505   .035817  -.113785   .009259   .009259         0
    9.industry   .010941   .040115  -.185669   .011574   .011574         0
    10.industry  .004376   .008596  -.052551   .002315   .002315         0
    11.industry  .479212   .356734   .250073   .506944   .506944         0
    12.industry  .122538    .07235   .169707    .12037    .12037         0
    2.race       .330416   .244986   .189418     .3125     .3125         0
    3.race       .017505   .011461   .050566   .006944   .006944         0
    south        .297593   .466332  -.352408   .291667   .291667         0

                            Raw                      Matched(ATT)
    Variances    Treated  Untrea~d     Ratio   Treated  Untrea~d     Ratio
    collgrad     .218674   .174066   1.25628   .217904   .217904         1
    ttl_exp      20.5898   21.0001   .980459   19.8177   18.2323   1.08696
    tenure       37.2044   29.3629   1.26706   37.0399   34.9543   1.05966
    3.industry   .006536   .012038   .542928   .004619   .004619         1
    4.industry   .150351   .139148   1.08052   .151242   .151242         1
    5.industry   .094207   .027176   3.46656   .078494   .078494         1
    6.industry   .043936    .14105   .311496   .046355   .046355         1
    7.industry   .019348   .092008   .210287   .020447   .020447         1
    8.industry   .017237   .034559   .498769   .009195   .009195         1
    9.industry   .010845   .038533   .281445   .011467   .011467         1
    10.industry  .004367   .008528   .512039   .002315   .002315         1
    11.industry  .250115   .229639   1.08917   .250532   .250532         1
    12.industry  .107758   .067163   1.60443   .106127   .106127         1
    2.race       .221726     .1851   1.19787   .215342   .215342         1
    3.race       .017237   .011338   1.52025   .006912   .006912         1
    south         .20949   .249045   .841173   .207077   .207077         1

Some Examples

    . // make a graph of the balancing stats
    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
    >     bylabels("Std. mean difference" "Variance ratio") ///
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); left panel “Std. mean difference” with reference line at 0, right panel “Variance ratio” with reference line at 1; series Raw and Matched]

Some Examples

Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics

                     Matched               Controls           Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      431    26    457     1214     182   1396    .00188

    Treatment-effects estimation

    wage        Coef.
    ATT      .3887224
    NATE     1.432913

Some Examples

    . // Kernel density balancing plot
    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: Density against Propensity score for Untreated and Treated; panels Raw and Matched (ATT)]

Some Examples

    . // Cumulative distribution balancing plot
    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: Cumulative probability against Propensity score for Untreated and Treated; panels Raw and Matched (ATT)]

Some Examples

    . // Balancing box plot
    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the Propensity score for Untreated and Treated; panels Raw and Matched (ATT)]

Some Examples

    . // Standard errors
    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls           Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      432    25    457     1105     291   1396    1.3394
    Untreated   1386    10   1396      455       2    457    3.3975
    Combined    1818    35   1853     1560     293   1853

    Treatment-effects estimation

             Observed   Bootstrap                       Normal-based
    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATE      .4095729   .1920853    2.13   0.033    .0330928   .7860531
    ATT      .6059013   .2472069    2.45   0.014    .1213846   1.090418
    ATC      .3483797   .1893653    1.84   0.066   -.0227695   .7195289
    NATE     1.432913   .2333282    6.14   0.000    .9755981   1.890228
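The reported ATE can be related to the formula ATE = (N_{D=1}/N) · ATT + (N_{D=0}/N) · ATC by plugging in numbers from the output. The following arithmetic check is an observation based on the displayed figures only, not a statement about how kmatch is implemented: with the matched counts (432 treated, 1386 untreated) the reported ATE is reproduced to the displayed precision, whereas the full group sizes (457 and 1396) give a slightly different value.

```python
# Figures read off the bootstrap output above (decimal points as in
# standard Stata output).
att = 0.6059013
atc = 0.3483797
reported_ate = 0.4095729

# Weights based on matched observations (column "Matched: Yes").
n1_matched, n0_matched = 432, 1386
n_matched = n1_matched + n0_matched
ate_matched = n1_matched / n_matched * att + n0_matched / n_matched * atc

# Weights based on all observations (column "Matched: Total").
n1_all, n0_all = 457, 1396
n_all = n1_all + n0_all
ate_all = n1_all / n_all * att + n0_all / n_all * atc

print(round(ate_matched, 7), round(ate_all, 7))
```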

Some Examples

Do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1)     -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =    0.1200
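The z statistic, p-value, and normal-based confidence interval in the lincom output follow directly from the coefficient and its standard error. The sketch below reproduces them with the Python standard library; the helper name `wald_summary` is hypothetical, and the inputs are the values shown in the output.

```python
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence
    interval for a coefficient with a given standard error."""
    std_normal = NormalDist()
    z = coef / se
    p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2.0)
    return z, p, coef - crit * se, coef + crit * se


# ATT - NATE from the lincom output.
z, p, lo, hi = wald_summary(-0.8270117, 0.1810415)
print(round(z, 2), round(lo, 6), round(hi, 6))

# For a test with one degree of freedom, the chi-squared statistic is
# the square of a standard normal variate, so its p-value can be
# obtained from the normal distribution as well.
chi2 = 2.42
p_chi2 = 2.0 * (1.0 - NormalDist().cdf(chi2 ** 0.5))
print(round(p_chi2, 3))
```

The p-value for the chi-squared test is computed from the rounded statistic 2.42, so it agrees with the reported value only up to rounding.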

Some Examples

    . // Nearest-neighbor matching (1 neighbor)
    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                               Number of obs = 1,853
                                               Neighbors: min = 1
    Treatment   : union = 1                               max = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls           Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      457     0    457      328    1068   1396

    Treatment-effects estimation

    wage        Coef.
    ATT      .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 1
    Outcome model  : matching                               min = 1
    Distance metric: Mahalanobis                            max = 1

                                      AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion) .7246969   .2942952    2.46   0.014     .147889   1.301505

Some Examples

    . // Nearest-neighbor matching (5 neighbors)
    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs = 1,853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls           Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      457     0    457      870     526   1396

    Treatment-effects estimation

    wage        Coef.
    ATT      .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                               min = 5
    Distance metric: Mahalanobis                            max = 6

                                      AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion) .5590823   .2381752    2.35   0.019    .0922675   1.025897

Some Examples

    . // Bias-correction / regression adjustment
    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs = 1,853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls           Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      457     0    457      870     526   1396

    Treatment-effects estimation

    wage        Coef.
    ATT      .5288023
    adjusted for collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                               min = 5
    Distance metric: Mahalanobis                            max = 6

                                      AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion) .5288023   .2420635    2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model  : logit (pr)
PS covars : i.industry i.race south

Matching statistics
              Matched               Controls            Band-
          Yes    No  Total     Used  Unused  Total      width
Treated   439    18    457     1258     138   1396     .83886

Treatment-effects estimation
wage       Coef.
ATT     .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure
Exact     : industry race south

Matching statistics
              Matched               Controls            Band-
          Yes    No  Total     Used  Unused  Total      width
Treated   432    25    457     1103     293   1396     1.3013

Treatment-effects estimation
wage       Coef.
ATT     .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
          Yes    No  Total     Used  Unused  Total      width
Treated   432    25    457     1105     291   1396     1.3394

Treatment-effects estimation
wage       Coef.
ATT     .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
          Yes    No  Total     Used  Unused  Total      width
Treated   448     9    457     1184     212   1396     1.8888

Treatment-effects estimation
wage       Coef.
ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; each point is labeled with its search-step index]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
          Yes    No  Total     Used  Unused  Total      width
Treated   453     4    457     1289     107   1396      2.433

Treatment-effects estimation
wage       Coef.
ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; each point is labeled with its search-step index]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
          Yes    No  Total     Used  Unused  Total      width
Treated   455     2    457     1356      40   1396     2.7626

Treatment-effects estimation
wage       Coef.
ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; each point is labeled with its search-step index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
          Yes    No  Total     Used  Unused  Total      width
Treated   366    91    457      701     695   1396         .5

Treatment-effects estimation
wage       Coef.
ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)
                           Means                   Standardized difference
               Matched  Unmatched     Total     (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404    .318681   .321663     .001585   -.006376    .007962
ttl_exp        13.3929    12.7682   13.2685     .027413   -.110253    .137666
tenure         8.12614    6.95055   7.89205     .038378   -.154356    .192734
3.industry     .002732    .021978   .006565    -.047404    .190657   -.238061
4.industry     .191257    .153846   .183807     .019212   -.077269    .096481
5.industry     .062842    .274725   .105033    -.137462    .552867   -.690329
6.industry     .057377          0   .045952     .054507   -.219225    .273732
7.industry     .019126    .021978   .019694    -.004083    .016423   -.020506
8.industry     .005464    .065934   .017505    -.091714    .368871   -.460585
9.industry     .010929    .010989   .010941    -.000115    .000462   -.000577
10.industry          0    .021978   .004376    -.066227    .266363   -.332589
11.industry    .554645    .175824   .479212      .15083   -.606636    .757467
12.industry    .092896    .241758   .122538    -.090299    .363181    -.45348
2.race         .243169    .681319   .330416    -.185284    .745209   -.930494
3.race         .002732    .076923   .017505    -.112525    .452572   -.565097
south           .29235    .318681   .297593    -.011456    .046074    -.05753

Common support (treated)
                         Variances                          Ratio
               Matched  Unmatched     Total     (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058    .219536   .218674     1.00176    1.00394    .997824
ttl_exp        19.4198    25.2474   20.5898     .943177    1.22621     .76918
tenure         38.3324    31.9242   37.2044     1.03032    .858076    1.20073
3.industry     .002732    .021734   .006536     .418045    3.32537    .125714
4.industry     .155101    .131624   .150351     1.03159    .875443    1.17837
5.industry     .059054    .201465   .094207     .626851    2.13854    .293122
6.industry     .054233          0   .043936     1.23435          0          .
7.industry     .018811    .021734   .019348     .972252     1.1233    .865531
8.industry      .00545    .062271   .017237     .316157    3.61269    .087513
9.industry     .010839    .010989   .010845     .999464    1.01328    .986361
10.industry          0    .021734   .004367           0    4.97709          0
11.industry    .247691     .14652   .250115     .990307    .585811    1.69049
12.industry    .084497    .185348   .107758     .784137    1.72003    .455885
2.race         .184542    .219536   .221726     .832297    .990121    .840601
3.race         .002732    .071795   .017237     .158513    4.16522    .038056
south          .207448    .219536    .20949     .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
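To make the two diagnostics concrete, the following minimal sketch recomputes the ttl_exp row from the reported means and variances. The scaling of the mean difference by the standard deviation of the total treated group is an assumption inferred from the reported numbers (it reproduces them), not something the slide states; the function names are hypothetical.

```python
import math

def standardized_difference(mean_a, mean_b, var_ref):
    """Difference in means, scaled by the standard deviation of a reference group."""
    return (mean_a - mean_b) / math.sqrt(var_ref)

def variance_ratio(var_a, var_b):
    """Ratio of two group variances; values near 1 indicate similar spread."""
    return var_a / var_b

# ttl_exp: means of matched (1) and total (3) treated, variance of total (3)
std_diff = standardized_difference(13.3929, 13.2685, 20.5898)
# ttl_exp: variance of matched (1) over variance of total (3)
ratio = variance_ratio(19.4198, 20.5898)

print(round(std_diff, 4), round(ratio, 4))
```

Values of the standardized difference close to 0 and of the variance ratio close to 1 suggest that the matched treated observations resemble the full treated group, so that little is lost on the common support.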


Some Examples

Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing the standardized differences (matched vs. total) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south; x axis from -.2 to .2 with a reference line at 0]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1852
                                          Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
          Yes    No  Total     Used  Unused  Total      width
Treated   432    25    457     1104     291   1395     1.3392

Treatment-effects estimation
              Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1852
                                          Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
          Yes    No  Total     Used  Unused  Total      width
Treated   432    25    457     1104     291   1395     1.3392

Treatment-effects estimation
              Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303
wage:  adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Replications  = 50
                                          Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
                Matched               Controls            Band-
            Yes    No  Total     Used  Unused  Total      width
0 Treated   306    15    321      625     120    745     1.3199
1 Treated   126    10    136      473     178    651     1.3398

Treatment-effects estimation
            Observed   Bootstrap                        Normal-based
wage           Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
0 ATT       .4586332    .2763358   1.66   0.097    -.082975   1.000241
1 ATT       .9518705     .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) = 1.23
     Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage           Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
(1)         .4932373    .4452343   1.11   0.268    -.379406   1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x axis from 0 to 1) for untreated and treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
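As a plausibility check on these population values, the ATE can be written as a weighted average of ATT and ATC, with the group shares as weights. The sketch below assumes the treated share of 17.5% reported for the population and reads the slide values as -3.96 (ATT) and -3.41 (ATC); the symbol p for the treated share is introduced here for illustration only.

```latex
\begin{align*}
ATE &= p \cdot ATT + (1 - p) \cdot ATC \\
    &= 0.175 \cdot (-3.96) + 0.825 \cdot (-3.41) \\
    &= -0.693 - 2.813 \\
    &\approx -3.51
\end{align*}
```

The result agrees with the reported ATE, so the three population effects are mutually consistent under these readings.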

[Figure, left panel: mean outcome against propensity score for untreated and treated; right panel: treatment effect against propensity score]

Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the "random pruning" problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References I

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

References II

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Conditional Independence / Strong Ignorability

Can causal effects also be identified from "observational" (i.e. non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

is given, then the ATE (or ATT or ATC) can be identified by conditioning on X.

For example:

$$ATE = \sum_x \Pr[X = x] \, \bigl\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \bigr\}$$
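The stratification formula for the ATE can be illustrated with a few lines of Python. This is only a sketch under simplifying assumptions: X is discrete, the data are a small made-up list of (x, d, y) records, and the function name is hypothetical.

```python
from collections import defaultdict

def ate_by_stratification(rows):
    """ATE = sum over x of Pr[X = x] * (E[Y|D=1,X=x] - E[Y|D=0,X=x]).

    rows: iterable of (x, d, y) with discrete x and binary d.
    """
    # For each stratum x, keep [sum of y, count] separately for d = 0 and d = 1.
    cells = defaultdict(lambda: {0: [0.0, 0], 1: [0.0, 0]})
    n = 0
    for x, d, y in rows:
        cells[x][d][0] += y
        cells[x][d][1] += 1
        n += 1
    ate = 0.0
    for x, cell in cells.items():
        (s0, n0), (s1, n1) = cell[0], cell[1]
        if n0 == 0 or n1 == 0:
            # Overlap assumption fails: no treated or no control units at this x.
            raise ValueError(f"no overlap in stratum {x!r}")
        ate += (n0 + n1) / n * (s1 / n1 - s0 / n0)
    return ate

# Made-up data: two strata with within-stratum effects of 2 and 3.
data = [
    ("a", 1, 5.0), ("a", 1, 7.0), ("a", 0, 4.0), ("a", 0, 4.0),
    ("b", 1, 11.0), ("b", 0, 7.0), ("b", 0, 9.0), ("b", 0, 8.0),
]
ate = ate_by_stratification(data)
print(ate)
```

Each stratum contributes its treated-minus-control mean difference, weighted by the share of observations in that stratum; a stratum without both groups violates the overlap assumption and raises an error.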


Matching

Matching is one approach to "condition on X" if strong ignorability holds.

Basic idea:
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.

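Matching

Formally:

$$\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \right]$$

$$\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \right]$$

$$\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}$$

Different matching algorithms use different definitions of $w_{ij}$.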
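A minimal sketch of the ATT formula may help: once the weights w_ij are given, the estimator is just a mean of differences between observed and imputed outcomes. The function name, the outcome values, and the weight matrix below are made up for illustration.

```python
def att_from_weights(y_treated, y_control, w):
    """ATT = mean over treated i of (Y_i - sum_j w[i][j] * Y_j).

    w[i][j] is the weight of control j in the counterfactual for treated i;
    each row of w is assumed to sum to 1.
    """
    diffs = []
    for i, y_i in enumerate(y_treated):
        counterfactual = sum(w_ij * y_j for w_ij, y_j in zip(w[i], y_control))
        diffs.append(y_i - counterfactual)
    return sum(diffs) / len(diffs)

y_t = [10.0, 12.0]            # outcomes of two treated units
y_c = [8.0, 9.0, 11.0]        # outcomes of three controls
w = [
    [0.5, 0.5, 0.0],          # treated 0 is matched to controls 0 and 1
    [0.0, 0.0, 1.0],          # treated 1 is matched to control 2 only
]
att = att_from_weights(y_t, y_c, w)
print(att)
```

All matching algorithms discussed in the slides fit this template; they differ only in how the rows of the weight matrix are filled.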


Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to "perfect stratification" or "subclassification" (see e.g. Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
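The exact-matching weights, and the way they break down when no twin exists, can be sketched as follows. The covariate tuples are made up, and returning None for an unmatched observation is a choice made for this illustration only.

```python
def exact_matching_weights(x_treated, x_control):
    """For each treated i, weight 1/k_i on every control with identical X.

    Returns one weight list per treated unit, or None if no exact match exists.
    """
    weights = []
    for x_i in x_treated:
        hits = [1.0 if x_j == x_i else 0.0 for x_j in x_control]
        k_i = sum(hits)
        weights.append([h / k_i for h in hits] if k_i > 0 else None)
    return weights

# X = (gender, age); the third treated unit has no exact twin among the controls.
x_t = [("f", 30), ("m", 40), ("m", 55)]
x_c = [("f", 30), ("f", 30), ("m", 40), ("f", 41)]
w_exact = exact_matching_weights(x_t, x_c)
print(w_exact)
```

With only two covariates one of three treated units already remains unmatched; adding further variables to X makes such failures more frequent, which is the dimensionality problem named above.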


Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are "close", but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
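The distance metric can be computed without a matrix library by solving a small linear system instead of inverting the scaling matrix. The sketch below is an illustration in plain Python with hypothetical helper names and made-up data, not the implementation used by kmatch.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def multivariate_distance(x_i, x_j, sigma):
    """sqrt((x_i - x_j)' sigma^{-1} (x_i - x_j)) for a scaling matrix sigma."""
    d = [u - v for u, v in zip(x_i, x_j)]
    z = solve(sigma, d)  # z = sigma^{-1} d
    return math.sqrt(sum(dk * zk for dk, zk in zip(d, z)))

identity = [[1.0, 0.0], [0.0, 1.0]]
euclid = multivariate_distance((0.0, 0.0), (3.0, 4.0), identity)

sample = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
sigma = covariance_matrix(sample)      # variances 4/3, covariance 0
mahal = multivariate_distance(sample[0], sample[1], sigma)
print(euclid, mahal)
```

With the identity matrix the function returns the ordinary Euclidean distance; with the sample covariance matrix the same difference of 2 units is expressed relative to the spread of the data.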


Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
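The difference between pair matching and nearest-neighbor matching can be made concrete with a small sketch. It assumes a precomputed distance matrix and, for pair matching, a simple greedy pass over the treated observations in their given order; the slides do not specify the processing order, so that part is an assumption, as are the function names and distances.

```python
def pair_match(dist):
    """Greedy one-to-one matching without replacement.

    dist[i][j] is the distance between treated i and control j.
    Returns a dict mapping each matched treated index to a control index.
    """
    used, pairs = set(), {}
    for i, row in enumerate(dist):
        free = [j for j in range(len(row)) if j not in used]
        if not free:
            break  # more treated than controls: the rest stays unmatched
        j = min(free, key=lambda c: row[c])
        used.add(j)
        pairs[i] = j
    return pairs

def nn_match(row, k, caliper=None):
    """Indices of the k nearest controls, keeping all ties at the k-th distance.

    If a caliper is given, only controls with distance below it are kept.
    """
    cutoff = sorted(row)[k - 1]
    return [j for j, d in enumerate(row)
            if d <= cutoff and (caliper is None or d < caliper)]

dist = [
    [0.10, 0.40, 0.90],
    [0.20, 0.30, 0.30],
    [0.15, 0.80, 0.70],
]
pairs = pair_match(dist)                         # control 0 is taken by treated 0
neighbors = nn_match(dist[1], k=2)               # tie at distance 0.30 keeps both
with_caliper = nn_match(dist[0], k=2, caliper=0.3)
print(pairs, neighbors, with_caliper)
```

In this example treated unit 2 would prefer control 0 (distance 0.15) but is left with control 2 (distance 0.70) because control 0 is already used, whereas nearest-neighbor matching with replacement lets several treated units share a good control.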


Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
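Radius and kernel matching differ only in how the controls inside the threshold are weighted, which the following sketch shows for one treated observation. The distances, the threshold, and the function names are made up; the normalizing constant of the Epanechnikov kernel is omitted because it cancels when the weights are rescaled to sum to 1.

```python
def radius_weights(dist, c):
    """Equal weights for all controls with distance below the threshold c."""
    hits = [1.0 if d < c else 0.0 for d in dist]
    total = sum(hits)
    return [h / total for h in hits] if total else None

def epanechnikov_weights(dist, h):
    """Weights proportional to 1 - (d/h)^2 for d < h, and 0 otherwise."""
    raw = [max(0.0, 1.0 - (d / h) ** 2) for d in dist]
    total = sum(raw)
    return [r / total for r in raw] if total else None

dist = [0.2, 0.5, 0.5, 1.4]          # distances from one treated unit to four controls
w_radius = radius_weights(dist, c=1.0)
w_kernel = epanechnikov_weights(dist, h=1.0)
print(w_radius, w_kernel)
```

Both schemes ignore the control at distance 1.4, but the kernel weights favor the closest control while radius matching treats all three remaining controls alike. A treated unit without any control inside the threshold gets no weights, which corresponds to the unmatched treated observations in the matching statistics.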


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
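For readers who want to see why the implication holds, the usual argument can be sketched with the law of iterated expectations. This is a compact restatement of the standard reasoning, not taken from the slides:

```latex
\begin{align*}
\Pr\bigl(D = 1 \mid Y^0, Y^1, \pi(X)\bigr)
  &= E\bigl[\, E[D \mid Y^0, Y^1, X] \;\big|\; Y^0, Y^1, \pi(X) \bigr]
     && \text{($\pi(X)$ is a function of $X$)} \\
  &= E\bigl[\, \pi(X) \;\big|\; Y^0, Y^1, \pi(X) \bigr]
     && \text{(conditional independence given $X$)} \\
  &= \pi(X)
\end{align*}
```

The same steps without the potential outcomes give $\Pr(D = 1 \mid \pi(X)) = \pi(X)$, so the treatment probability does not depend on $(Y^0, Y^1)$ once $\pi(X)$ is held fixed.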


Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
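The two-step procedure can be sketched in plain Python. The sketch makes several simplifying assumptions that are not from the slides: a single covariate, a logit model fitted by basic gradient ascent rather than a proper maximum-likelihood routine, one-nearest-neighbor matching with replacement, and made-up data with a true effect of 3.

```python
import math

def fit_logit(xs, ds, steps=5000, lr=0.1):
    """Step 1: fit Pr(D=1|x) = 1/(1+exp(-(a+b*x))) by gradient ascent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, d in zip(xs, ds):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += d - p
            grad_b += (d - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def psm_att(xs, ds, ys):
    """Step 2: match each treated unit to the control with the closest score."""
    a, b = fit_logit(xs, ds)
    ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
    treated = [i for i, d in enumerate(ds) if d == 1]
    controls = [j for j, d in enumerate(ds) if d == 0]
    diffs = []
    for i in treated:
        j = min(controls, key=lambda c: abs(ps[i] - ps[c]))
        diffs.append(ys[i] - ys[j])
    return sum(diffs) / len(diffs)

# Made-up data: y = 2*x + 3*d, so the true treatment effect is 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0]
ds = [0, 0, 0, 0, 0, 1, 1, 1, 1]
ys = [0.0, 2.0, 4.0, 6.0, 8.0, 7.0, 9.0, 11.0, 13.0]

a_hat, b_hat = fit_logit(xs, ds)
att = psm_att(xs, ds, ys)
print(b_hat > 0, att)
```

The estimate exceeds the true effect of 3 because the treated unit with x = 5 has no control with the same propensity score and is matched to the control with x = 4; this is the kind of inexact match that post-matching regression adjustment is meant to correct.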


King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
- http://j.mp/1sexgVw

Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, built up over several overlays: scatter plot of Outcome (0 to 12) against Education in years (12 to 28) for treated (T) and control (C) units; slide 3/23 of the slides by King and Nielsen]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance of covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, built up over two overlays: scatter plot of Age (20 to 80) against Education in years (12 to 28) for treated (T) and control (C) units; slide 9/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching

[Figure: scatter plot of Age (20 to 80) against Education in years (12 to 28) for treated (T) and control (C) units, with an additional propensity score axis from 0 to 1; slide 15/23 of the slides by King and Nielsen]

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slid

esby

Kin

gan

dN

ielsen

)

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 24

Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of the treated (T) units and their matched control (C) units with axes Education (years), 12 to 28, and Age, 20 to 80. Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
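The claim that random pruning increases imbalance can be checked with a small simulation. The following Python sketch is not part of the original slides; the N(0, 1) covariate, the group sizes, and the number of replications are illustrative assumptions. Both groups are drawn from the same population, so any mean difference is pure chance imbalance, and shrinking the groups at random makes that chance imbalance larger on average.

```python
import random
import statistics

def mean_abs_imbalance(n_per_group, reps=2000, seed=1):
    """Average |mean(X_treated) - mean(X_control)| when both groups are
    random draws of size n_per_group from the same N(0, 1) population
    (equivalent to randomly pruning a balanced sample down to that size)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diffs.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
    return statistics.fmean(diffs)

# Hypothetical group sizes: the smaller the groups, the larger the imbalance.
imbalance_by_n = {n: mean_abs_imbalance(n) for n in (200, 50, 10)}
for n, imbalance in imbalance_by_n.items():
    print(f"n per group = {n:3d}: mean absolute imbalance = {imbalance:.3f}")
```

The expected absolute difference scales with the square root of 1/n, which is why pruning without improving match quality only adds noise.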

PSM Increases Model Dependence & Bias

[Figure, two panels, both plotted against Number of Units Pruned (0 to 160) for MDM and PSM. Panel "Model Dependence": y-axis Variance. Panel "Bias": y-axis Maximum Coefficient across 512 Specifications; true effect = 2. Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen.]

The Propensity Score Paradox in Real Data

[Figure, two panels: Imbalance plotted against Number of units pruned for CEM, MDM, and PSM, with reference marks for Raw, Random, and the 1/4 SD caliper. Panel "Finkel et al. (JOP 2012)": 0 to 3000 units pruned. Panel "Nielsen et al. (AJPS 2011)": 0 to 2500 units pruned. Slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
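The two limiting cases can be made concrete with a logistic propensity score whose coefficients are known. The Python sketch below is an illustration only; the logistic form, the intercept of -1, the slopes 0 and 2, and the simulated covariate are all assumptions chosen for the example, not values from the slides. With a zero slope every unit gets the same score (no blocking at all), while a strong slope spreads the scores out, so matching on the score separates units with different X.

```python
import math
import random

def propensity(x, intercept, slope):
    """Logistic propensity score Pr(T = 1 | X = x) with known coefficients."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

rng = random.Random(42)
x_values = [rng.gauss(0, 1) for _ in range(1000)]

# Case 1: X is unrelated to T (slope 0), so the score is a constant.
ps_unrelated = [propensity(x, -1.0, 0.0) for x in x_values]
# Case 2: X has a strong effect on T (slope 2), so the score varies a lot.
ps_strong = [propensity(x, -1.0, 2.0) for x in x_values]

range_unrelated = max(ps_unrelated) - min(ps_unrelated)
range_strong = max(ps_strong) - min(ps_strong)
print(f"range of scores, X unrelated to T: {range_unrelated:.3f}")
print(f"range of scores, strong effect:    {range_strong:.3f}")
```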

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
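To make "averaging instead of pruning" concrete, here is a minimal Python sketch of kernel matching on a one-dimensional score with an Epanechnikov kernel. It is a simplified illustration and not the kmatch implementation; the function names, the toy data, and the bandwidth of 0.05 are assumptions made for the example. Every control within the bandwidth of a treated unit contributes to that unit's counterfactual, so good matches are averaged rather than discarded, and a treated unit is dropped only if no control is close enough.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight; zero outside the interval (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_matching_att(treated, controls, bandwidth):
    """treated, controls: lists of (score, outcome) pairs.
    Returns (ATT, number of matched treated, number of controls used)."""
    effects = []
    used_controls = set()
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unit is off support
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
        used_controls.update(j for j, w in enumerate(weights) if w > 0.0)
    if not effects:
        raise ValueError("no treated unit could be matched")
    return sum(effects) / len(effects), len(effects), len(used_controls)

# Hypothetical toy data: (propensity score, outcome)
treated = [(0.30, 12.0), (0.50, 15.0), (0.90, 20.0)]
controls = [(0.28, 10.0), (0.32, 11.0), (0.31, 10.5), (0.52, 13.0), (0.10, 8.0)]

att, n_matched, n_used = kernel_matching_att(treated, controls, bandwidth=0.05)
print(f"ATT = {att:.2f}, matched treated = {n_matched}, controls used = {n_used}")
```

In this toy example the first treated unit is compared with a weighted average of three nearby controls, whereas one-to-one matching without replacement would keep one of them and throw the other two away.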

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
    (computing bandwidth done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      432    25    457     1105     291   1396         1.3394

    Treatment-effects estimation
    wage         Coef.
    ATT       .6059013
    NATE      1.432913

Some Examples

Some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                            Raw                        Matched (ATT)
    Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
    collgrad      .321663   .224212   .219912    .319444   .319444         0
    ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
    tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
    3.industry    .006565   .012178  -.058246     .00463    .00463         0
    4.industry    .183807   .166905   .044425    .185185   .185185         0
    5.industry    .105033   .027937   .312944    .085648   .085648         0
    6.industry    .045952   .169771  -.407129    .048611   .048611         0
    7.industry    .019694   .102436  -.350657    .020833   .020833         0
    8.industry    .017505   .035817  -.113785    .009259   .009259         0
    9.industry    .010941   .040115  -.185669    .011574   .011574         0
    10.industry   .004376   .008596  -.052551    .002315   .002315         0
    11.industry   .479212   .356734   .250073    .506944   .506944         0
    12.industry   .122538    .07235   .169707     .12037    .12037         0
    2.race        .330416   .244986   .189418      .3125     .3125         0
    3.race        .017505   .011461   .050566    .006944   .006944         0
    south         .297593   .466332  -.352408    .291667   .291667         0

                            Raw                        Matched (ATT)
    Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
    collgrad      .218674   .174066   1.25628    .217904   .217904         1
    ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
    tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
    3.industry    .006536   .012038   .542928    .004619   .004619         1
    4.industry    .150351   .139148   1.08052    .151242   .151242         1
    5.industry    .094207   .027176   3.46656    .078494   .078494         1
    6.industry    .043936    .14105   .311496    .046355   .046355         1
    7.industry    .019348   .092008   .210287    .020447   .020447         1
    8.industry    .017237   .034559   .498769    .009195   .009195         1
    9.industry    .010845   .038533   .281445    .011467   .011467         1
    10.industry   .004367   .008528   .512039    .002315   .002315         1
    11.industry   .250115   .229639   1.08917    .250532   .250532         1
    12.industry   .107758   .067163   1.60443    .106127   .106127         1
    2.race        .221726     .1851   1.19787    .215342   .215342         1
    3.race        .017237   .011338   1.52025    .006912   .006912         1
    south          .20949   .249045   .841173    .207077   .207077         1
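The two balance measures in this output can be recomputed by hand. The Python sketch below uses one common definition of the standardized mean difference (difference in means divided by the square root of the average of the two group variances); that this is the definition behind the StdDif column is an assumption, although it does reproduce the raw collgrad row when that row is read as means .321663 and .224212 with variances .218674 and .174066 (the decimal points are lost in the transcript and restored here as an assumption).

```python
import math

def std_mean_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means, scaled by the square root of the average variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def variance_ratio(var_t, var_c):
    """Ratio of the treated-group variance to the control-group variance."""
    return var_t / var_c

# Raw (unmatched) collgrad values as read from the balancing output
smd_collgrad = std_mean_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio_collgrad = variance_ratio(0.218674, 0.174066)
print(f"StdDif = {smd_collgrad:.4f}, variance ratio = {ratio_collgrad:.4f}")
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is the pattern in the matched columns.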

Some Examples

Make a graph of the balancing stats

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" and "Variance ratio", showing Raw and Matched values for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]

Some Examples

Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
    (computing bandwidth done)

    Propensity-score kernel matching          Number of obs = 1853
                                              Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      431    26    457     1214     182   1396         .00188

    Treatment-effects estimation
    wage         Coef.
    ATT       .3887224
    NATE      1.432913

Some Examples

Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure, two panels "Raw" and "Matched (ATT)": Density against Propensity score for Untreated and Treated.]

Some Examples

Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure, two panels "Raw" and "Matched (ATT)": Cumulative probability against Propensity score for Untreated and Treated.]

Some Examples

Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure, two panels "Raw" and "Matched (ATT)": box plots of the Propensity score for Untreated and Treated.]

Some Examples

Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated done)
    (computing bandwidth for untreated done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Replications  = 50
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      432    25    457     1105     291   1396         1.3394
    Untreated   1386    10   1396      455       2    457         3.3975
    Combined    1818    35   1853     1560     293   1853

    Treatment-effects estimation
             Observed   Bootstrap                      Normal-based
    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATE      .4095729    .1920853   2.13    0.033    .0330928   .7860531
    ATT      .6059013    .2472069   2.45    0.014    .1213846   1.090418
    ATC      .3483797    .1893653   1.84    0.066   -.0227695   .7195289
    NATE     1.432913    .2333282   6.14    0.000    .9755981   1.890228
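The z statistic, p-value, and normal-based confidence interval in this table follow mechanically from the coefficient and its bootstrap standard error. The short Python sketch below recomputes them for the ATT row, read as a coefficient of .6059013 with standard error .2472069 (decimal points restored as an assumption, since the transcript lost them). The function name is hypothetical, and 1.959964 is the usual 97.5% quantile of the standard normal distribution.

```python
import math

def normal_based_inference(coef, se, z_crit=1.959964):
    """Return the z statistic, two-sided p-value, and 95% confidence interval."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value, (coef - z_crit * se, coef + z_crit * se)

# ATT row from the bootstrap output
z, p_value, (ci_low, ci_high) = normal_based_inference(0.6059013, 0.2472069)
print(f"z = {z:.2f}, p = {p_value:.3f}, 95% CI = [{ci_low:.7f}, {ci_high:.6f}]")
```

The interval is symmetric around the point estimate by construction; percentile-based bootstrap intervals would generally differ.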

Some Examples

Do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1)     -.8270117    .1810415  -4.57    0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                              Number of obs  = 1853
                                              Neighbors: min = 1
    Treatment   : union = 1                              max = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      457     0    457      328    1068   1396

    Treatment-effects estimation
    wage         Coef.
    ATT       .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation              Number of obs     = 1853
    Estimator      : nearest-neighbor matching  Matches: requested = 1
    Outcome model  : matching                              min = 1
    Distance metric: Mahalanobis                           max = 1

                                     AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)   .7246969    .2942952   2.46    0.014     .147889   1.301505
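For readers who want to see the mechanics behind one-neighbor Mahalanobis matching, here is a deliberately small Python sketch for two covariates. It is an illustration under simplifying assumptions, not a reproduction of kmatch or teffects nnmatch: the covariance matrix and the toy data are made up, and ties are resolved by taking the first nearest control, whereas the slides stress that all ties should be kept.

```python
import math

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between 2-d points a and b for a 2x2 covariance."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    d1, d2 = a[0] - b[0], a[1] - b[1]
    # Quadratic form d' S^{-1} d, using the closed-form inverse of a 2x2 matrix
    q = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(q)

def nearest_neighbor_att(treated, controls, cov):
    """1-nearest-neighbor matching with replacement.
    treated, controls: lists of ((x1, x2), outcome) pairs."""
    effects = []
    for x_t, y_t in treated:
        _, y_match = min(controls, key=lambda c: mahalanobis_2d(x_t, c[0], cov))
        effects.append(y_t - y_match)
    return sum(effects) / len(effects)

# Hypothetical toy data: ((education, tenure), wage)
cov = ((4.0, 0.0), (0.0, 1.0))
treated = [((12.0, 5.0), 10.0), ((16.0, 2.0), 14.0)]
controls = [((12.0, 4.0), 9.0), ((15.0, 5.0), 9.5), ((16.0, 3.0), 12.0)]

att_nn = nearest_neighbor_att(treated, controls, cov)
print(f"ATT (1 nearest neighbor) = {att_nn:.2f}")
```

Because matching is done with replacement, the same control may serve several treated units, which is why the output above reports far fewer used controls than treated units matched.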

Some Examples

Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                              Number of obs  = 1853
                                              Neighbors: min = 5
    Treatment   : union = 1                              max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      457     0    457      870     526   1396

    Treatment-effects estimation
    wage         Coef.
    ATT       .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation              Number of obs     = 1853
    Estimator      : nearest-neighbor matching  Matches: requested = 5
    Outcome model  : matching                              min = 5
    Distance metric: Mahalanobis                           max = 6

                                     AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)   .5590823    .2381752   2.35    0.019    .0922675   1.025897

Some Examples

Bias-correction (regression adjustment)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                              Number of obs  = 1853
                                              Neighbors: min = 5
    Treatment   : union = 1                              max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      457     0    457      870     526   1396

    Treatment-effects estimation
    wage         Coef.
    ATT       .5288023
    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation              Number of obs     = 1853
    Estimator      : nearest-neighbor matching  Matches: requested = 5
    Outcome model  : matching                              min = 5
    Distance metric: Mahalanobis                           max = 6

                                     AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)   .5288023    .2420635   2.18    0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
    (computing bandwidth done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      439    18    457     1258     138   1396         .83886

    Treatment-effects estimation
    wage         Coef.
    ATT       .6408443

Some Examples

Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      432    25    457     1103     293   1396         1.3013

    Treatment-effects estimation
    wage         Coef.
    ATT       .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      432    25    457     1105     291   1396         1.3394

    Treatment-effects estimation
    wage         Coef.
    ATT       .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
    (computing bandwidth done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      448     9    457     1184     212   1396         1.8888

    Treatment-effects estimation
    wage         Coef.
    ATT       .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against Bandwidth, with the evaluated bandwidths labeled by search index.]

Some Examples

Bandwidth selection: cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
    (computing bandwidth done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      453     4    457     1289     107   1396          2.433

    Treatment-effects estimation
    wage         Coef.
    ATT       .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against Bandwidth, with the evaluated bandwidths labeled by search index.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
    (computing bandwidth done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      455     2    457     1356      40   1396         2.7626

    Treatment-effects estimation
    wage         Coef.
    ATT       .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of Weighted MISE against Bandwidth, with the evaluated bandwidths labeled by search index.]

Some Examples

Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      366    91    457      701     695   1396             .5

    Treatment-effects estimation
    wage         Coef.
    ATT       .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                   Common support (treated)      Standardized difference
    Means         Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
    collgrad      .322404   .318681   .321663    .001585  -.006376   .007962
    ttl_exp       13.3929   12.7682   13.2685    .027413  -.110253   .137666
    tenure        8.12614   6.95055   7.89205    .038378  -.154356   .192734
    3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
    4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
    5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
    6.industry    .057377         0   .045952    .054507  -.219225   .273732
    7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
    8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
    9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
    10.industry         0   .021978   .004376   -.066227   .266363  -.332589
    11.industry   .554645   .175824   .479212     .15083  -.606636   .757467
    12.industry   .092896   .241758   .122538   -.090299   .363181   -.45348
    2.race        .243169   .681319   .330416   -.185284   .745209  -.930494
    3.race        .002732   .076923   .017505   -.112525   .452572  -.565097
    south          .29235   .318681   .297593   -.011456   .046074   -.05753

                   Common support (treated)               Ratio
    Variances     Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
    collgrad      .219058   .219536   .218674    1.00176   1.00394   .997824
    ttl_exp       19.4198   25.2474   20.5898    .943177   1.22621    .76918
    tenure        38.3324   31.9242   37.2044    1.03032   .858076   1.20073
    3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
    4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
    5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
    6.industry    .054233         0   .043936    1.23435         0         .
    7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
    8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
    9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
    10.industry         0   .021734   .004367          0   4.97709         0
    11.industry   .247691    .14652   .250115    .990307   .585811   1.69049
    12.industry   .084497   .185348   .107758    .784137   1.72003   .455885
    2.race        .184542   .219536   .221726    .832297   .990121   .840601
    3.race        .002732   .071795   .017237    .158513   4.16522   .038056
    south         .207448   .219536    .20949    .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats

    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference", showing the standardized difference between matched and total treated observations for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]

Some Examples

Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
    (computing bandwidth done)

    Multivariate-distance kernel matching     Number of obs = 1852
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      432    25    457     1104     291   1395         1.3392

    Treatment-effects estimation
                 Coef.
    wage
    ATT       .6021049
    NATE      1.430823
    hours
    ATT       1.263759
    NATE      1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
    (computing bandwidth done)

    Multivariate-distance kernel matching     Number of obs = 1852
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    Treated      432    25    457     1104     291   1395         1.3392

    Treatment-effects estimation
                 Coef.
    wage
    ATT       .5152752
    NATE      1.430823
    hours
    ATT       1.263759
    NATE      1.450303
    wage: adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
    (south=0: computing bandwidth done)
    (south=1: computing bandwidth done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Replications  = 50
                                              Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics
                     Matched               Controls            Bandwidth
                 Yes    No  Total     Used  Unused  Total
    0
    Treated      306    15    321      625     120    745         1.3199
    1
    Treated      126    10    136      473     178    651         1.3398

    Treatment-effects estimation
             Observed   Bootstrap                      Normal-based
    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    0
    ATT      .4586332    .2763358   1.66    0.097    -.082975   1.000241
    1
    ATT      .9518705     .406903   2.34    0.019    .1543553   1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

               chi2(  1) =    1.23
             Prob > chi2 =  0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1)      .4932373    .4452343   1.11    0.268    -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: Density against Propensity score (0 to 1) for Untreated and Treated. McFadden R2 = 0.121.]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels plotted against Propensity score: Outcome for Untreated and Treated, and Treatment effect.]
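The reported ATE is consistent with the ATT and ATC, because the ATE is the average of the two weighted by the share of treated and untreated units. The Python sketch below checks this, reading the garbled figures as ATT = -3.96, ATC = -3.41, and a treated share of 17.5% (the decimal points and the percent sign are lost in the transcript, so these readings are assumptions). The function name is hypothetical.

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the share-weighted average of the ATT and the ATC."""
    return share_treated * att + (1.0 - share_treated) * atc

ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)
print(f"implied ATE = {ate:.2f}")
```

The implied value agrees with the reported ATE of -3.51 up to rounding, which supports this reading of the numbers.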

Results: Variance

[Figure, two panels (N = 500 and N = 5000): variance of the estimators for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
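The slides do not spell out how the performance measures in the results figures are computed. As a reading aid, the following are common definitions for a simulation with R replications, estimates \hat\theta_r, estimated standard errors \widehat{se}_r, and population value ATT. They are assumptions for illustration and may differ in detail from what the author used:

```latex
\begin{align*}
\text{Bias} &= \bar{\hat\theta} - \mathrm{ATT},
  \qquad \bar{\hat\theta} = \frac{1}{R}\sum_{r=1}^{R} \hat\theta_r \\
\text{Bias reduction (\%)} &= 100 \cdot
  \left(1 - \frac{\bar{\hat\theta} - \mathrm{ATT}}{\mathrm{NATE} - \mathrm{ATT}}\right) \\
\text{MSE} &= \frac{1}{R}\sum_{r=1}^{R} (\hat\theta_r - \mathrm{ATT})^2
  = \text{Variance} + \text{Bias}^2 \\
\text{Relative standard error} &=
  \frac{\frac{1}{R}\sum_{r} \widehat{se}_r}{\mathrm{SD}(\hat\theta_r)} \\
\text{Coverage} &= \frac{1}{R}\sum_{r=1}^{R}
  \mathbf{1}\left\{ |\hat\theta_r - \mathrm{ATT}| \le 1.96\,\widehat{se}_r \right\}
\end{align*}
```

Under these definitions, a bias reduction of 100 means that the raw bias is fully removed, values above 100 indicate overcorrection, and a relative standard error of 1 means that the estimated standard errors are on target on average. Here the variance in the MSE decomposition is taken with divisor R.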


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression-adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression-adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Matching

Matching is one approach to “condition on X” if strong ignorability holds.

Basic idea:
1. For each observation in the treatment group, find “statistical twins” in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the “imputed” counterfactual values over all observations.

Matching

Formally:

\[
\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i|D=1} \left[ Y_i - \hat{Y}_i^0 \right]
              = \frac{1}{N_{D=1}} \sum_{i|D=1} \Big[ Y_i - \sum_{j|D=0} w_{ij} Y_j \Big]
\]

\[
\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i|D=0} \left[ \hat{Y}_i^1 - Y_i \right]
              = \frac{1}{N_{D=0}} \sum_{i|D=0} \Big[ \sum_{j|D=1} w_{ij} Y_j - Y_i \Big]
\]

\[
\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}
\]

Different matching algorithms use different definitions of w_ij.

Exact Matching

Exact matching:

\[
w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}
\]

with k_i as the number of observations for which X_i = X_j applies.

The result is equivalent to “perfect stratification” or “subclassification” (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the “curse of dimensionality”).
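To make the exact-matching weights concrete, here is a minimal Mata sketch with made-up toy data (two treated and three control observations, one discrete covariate). All numbers and names are hypothetical; the loop simply applies w_ij = 1/k_i to the controls with the same X and averages the differences over the treated.

```stata
mata:
// toy data (hypothetical): outcome Y, treatment D, covariate X
Y = (10 \ 12 \ 7 \ 9 \ 8)
D = (1 \ 1 \ 0 \ 0 \ 0)
X = (1 \ 2 \ 1 \ 1 \ 2)
t = selectindex(D :== 1)      // row indices of treated
c = selectindex(D :== 0)      // row indices of controls
diff = J(rows(t), 1, .)
for (i = 1; i <= rows(t); i++) {
    w = (X[c] :== X[t[i]])    // 1 for controls with identical X, else 0
    w = w / sum(w)            // w_ij = 1/k_i (missing if no exact match)
    diff[i] = Y[t[i]] - sum(w :* Y[c])
}
mean(diff)                    // ATT estimate; equals 3 for this toy data
end
```

For the first treated observation the two controls with X = 1 are averaged (difference 10 - 8 = 2); for the second, the single control with X = 2 is used (difference 12 - 8 = 4).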

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are “close”, but not necessarily equal, as matches.

A common approach is to use

\[
MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}
\]

as distance metric, where Σ is an appropriate scaling matrix.
- Mahalanobis matching: Σ is the covariance matrix of X.
- Euclidean matching: Σ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
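The distance metric above can be computed directly in Mata. The sketch below uses a small hypothetical data matrix (columns: years of education and age) and compares the Mahalanobis and Euclidean distances between the first two observations; it is meant only to illustrate the formula.

```stata
mata:
// hypothetical data: rows are observations, columns are education and age
X = (12, 30 \ 16, 45 \ 14, 33 \ 18, 60)
S = variance(X)                    // covariance matrix of X (scaling matrix)
d = X[1, .] - X[2, .]              // difference between observations 1 and 2
md = sqrt(d * invsym(S) * d')      // Mahalanobis distance
ed = sqrt(d * d')                  // Euclidean distance (identity scaling)
md, ed
end
```

Because the Mahalanobis version rescales by the covariance matrix, it does not depend on the units in which education and age are measured, whereas the Euclidean distance is dominated by the variable with the larger scale.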

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights w_ij.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which MD_ij is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
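Nearest-neighbor matching as described above (with replacement, ties kept) is what Stata's official teffects nnmatch command implements. The following sketch uses the built-in nlsw88 data with an arbitrarily chosen covariate set, so the numbers carry no substantive meaning; pair matching without replacement is not what this command does.

```stata
sysuse nlsw88, clear
* 1 nearest neighbor on the Mahalanobis distance, ATT
teffects nnmatch (wage collgrad ttl_exp tenure) (union), ///
    atet nneighbor(1) metric(mahalanobis)
* 5 nearest neighbors
teffects nnmatch (wage collgrad ttl_exp tenure) (union), ///
    atet nneighbor(5) metric(mahalanobis)
```

Comparing the standard errors of the two runs gives a first impression of the bias-variance trade-off involved in the choice of k.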

Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression-adjustment to the matched data (also known as “bias-adjustment” in the context of nearest-neighbor matching).
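How kernel matching turns distances into weights can be sketched in Mata. The distances and the bandwidth h below are hypothetical, and the kernel is the textbook Epanechnikov form K(u) = 0.75(1 - u^2) for |u| <= 1; software implementations may scale the kernel differently.

```stata
mata:
// hypothetical distances of four controls to one treated observation
md = (0.2 \ 0.5 \ 0.9 \ 1.4)
h  = 1                                    // bandwidth (assumed)
u  = md / h
k  = 0.75 * (1 :- u:^2) :* (u :<= 1)      // kernel value, zero outside the bandwidth
w  = k / sum(k)                           // matching weights w_ij, summing to one
w'
end
```

Radius matching corresponds to replacing the kernel values by a constant inside the bandwidth, so that all controls within the threshold receive equal weight.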


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

\[
\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)
\]

where π(X) is called the propensity score.

That is,

\[
(Y^0, Y^1) \perp\!\!\!\perp D \mid X
\]

implies

\[
(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)
\]

so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score π(X) instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g., using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, |π(X_i) − π(X_j)|, instead of multivariate distances.

PSM is tremendously popular
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
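The two-step procedure can be sketched with official Stata commands. The example assumes the built-in nlsw88 data and an arbitrary covariate set; the explicit logit step is shown only to make Step 1 visible, since teffects psmatch fits the treatment model internally.

```stata
sysuse nlsw88, clear
* Step 1: estimate the propensity score with a logit model
logit union collgrad ttl_exp tenure south
predict double ps, pr
summarize ps if union == 1
summarize ps if union == 0
* Step 2: nearest-neighbor matching on the estimated propensity score (ATT)
teffects psmatch (wage) (union collgrad ttl_exp tenure south, logit), ///
    atet nneighbor(1)
```

Comparing the two summaries of ps gives a rough impression of how far apart the treated and untreated are on the single matching dimension.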


King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
- http://j.mp/1sexgVw

Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several builds: outcome (0 to 12) against education in years (12 to 28) for treated (T) and control (C) units. Slide 3/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

    Balance of covariates   Complete randomization   Fully blocked
    Observed                On average               Exact
    Unobserved              On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two builds: age (20 to 80) against education in years (12 to 28) for treated (T) and control (C) units, before and after matching. Slide 9/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure, shown in two builds: age (20 to 80) against education in years (12 to 28) for treated (T) and control (C) units, with a propensity score scale from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: age (20 to 80) against education in years (12 to 28) for the treated (T) and matched control (C) units. Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better’, you do worse”)
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
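The first step of this argument (random pruning increases imbalance) is easy to check with a small simulation. The sketch below is not King and Nielsen's simulation; it assumes a single covariate x that is unrelated to treatment t, prunes observations at random, and records the absolute difference in means of x. The program name prunesim and all settings are hypothetical.

```stata
capture program drop prunesim
program define prunesim, rclass
    syntax , keep(integer)
    drop _all
    set obs 200
    * treatment unrelated to x, as under complete randomization
    generate byte t = runiform() < 0.5
    generate double x = rnormal()
    * random pruning: keep a random subset of `keep' observations
    generate double u = runiform()
    sort u
    keep in 1/`keep'
    quietly summarize x if t == 1
    local m1 = r(mean)
    quietly summarize x if t == 0
    return scalar imb = abs(`m1' - r(mean))
end

* no pruning (all 200 observations kept)
simulate imb = r(imb), reps(500) nodots seed(1): prunesim, keep(200)
summarize imb
* heavy random pruning (40 observations kept)
simulate imb = r(imb), reps(500) nodots seed(1): prunesim, keep(40)
summarize imb
```

The average imbalance in the second run should be clearly larger than in the first, simply because means computed from fewer observations vary more.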

PSM Increases Model Dependence & Bias

[Figure, two panels, MDM versus PSM. Left panel “Model Dependence”: variance against number of units pruned (0 to 160). Right panel “Bias”: maximum coefficient across 512 specifications against number of units pruned (0 to 160); true effect = 2. Data-generating model: Y_i = 2T_i + X_1i + X_2i + ε_i, ε_i ~ N(0, 1). Slide 20/23 of the slides by King and Nielsen.]

The Propensity Score Paradox in Real Data

[Figure, two panels: imbalance against number of units pruned for CEM, MDM, and PSM, with reference markers for the raw data, random pruning, and the 1/4 SD caliper. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011). Slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better’, you do worse”

That random pruning makes things worse is, of course, true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation; although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
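The remark on subgroup analyses can be illustrated with official Stata commands: instead of splitting an overall PSM result by subgroup, the propensity score model is refitted and the matching redone within each subgroup. The sketch assumes the built-in nlsw88 data, an arbitrary covariate set, and south as the (hypothetical) subgroup variable.

```stata
sysuse nlsw88, clear
* PSM stratified by subgroup: refit the treatment model and rematch
* within south = 0 and south = 1
forvalues s = 0/1 {
    display as text _n "Subgroup: south = `s'"
    teffects psmatch (wage) (union collgrad ttl_exp tenure) if south == `s', atet
}
```

Each run balances the covariates within its own subgroup, which is the property that a single overall propensity score model does not guarantee.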


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch)

Some key features
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
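Installation can be sketched as follows. The first command is the one given on the slide; the two additional packages are an assumption about kmatch's dependencies (to the best of my knowledge it relies on moremata and kdens), so check the help file if the installation reports something different.

```stata
* install kmatch from SSC (as stated on the slide)
ssc install kmatch, replace
* assumed dependencies; verify against the kmatch help file
ssc install moremata, replace
ssc install kdens, replace
* syntax, options, and post-estimation commands
help kmatch
```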

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched       |       Controls        |  Band-
               |  Yes    No   Total  |  Used  Unused  Total  |  width
       Treated |  432    25     457  |  1105     291   1396  | 1.3394

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .6059013
          NATE |  1.432913

Some Examples: some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                       Raw                        Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                       Raw                        Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
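The StdDif and Ratio columns can be checked by hand. The sketch below assumes (this is an inference from the reported numbers, not something stated on the slide) that the standardized difference divides the mean difference by the square root of the average of the two raw group variances, and that the ratio is treated variance over untreated variance. The function names are illustrative; the inputs are the raw collgrad means (.321663, .224212) and variances (.218674, .174066).

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Mean difference divided by the pooled standard deviation
    (square root of the average of the two group variances)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)

def var_ratio(var_t, var_c):
    """Variance of the treated divided by variance of the untreated."""
    return var_t / var_c

# collgrad, raw sample (values taken from the balancing tables)
sd = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr = var_ratio(0.218674, 0.174066)
print(round(sd, 4), round(vr, 4))
```

Under these assumptions the results agree with the reported .219912 and 1.25628 up to rounding of the inputs.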

Some Examples: make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot by covariate (collgrad, ttl_exp, tenure, industry, race, south) with two panels, "Std. mean difference" and "Variance ratio"; series: Raw, Matched]

Some Examples: Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching             Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      431   26    457    1214     182   1396    .00188

Treatment-effects estimation
wage        Coef.
ATT      .3887224
NATE     1.432913

Some Examples: Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated; panels: Raw, Matched (ATT)]

Some Examples: Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for Untreated and Treated; panels: Raw, Matched (ATT)]

Some Examples: Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated; panels: Raw, Matched (ATT)]

Some Examples: Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
               Matched               Controls            Band-
              Yes   No  Total    Used  Unused  Total     width
Treated       432   25    457    1105     291   1396    1.3394
Untreated    1386   10   1396     455       2    457    3.3975
Combined     1818   35   1853    1560     293   1853

Treatment-effects estimation
         Observed   Bootstrap                       Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE      .4095729    .1920853   2.13    0.033    .0330928   .7860531
ATT      .6059013    .2472069   2.45    0.014    .1213846   1.090418
ATC      .3483797    .1893653   1.84    0.066   -.0227695   .7195289
NATE     1.432913    .2333282   6.14    0.000    .9755981   1.890228
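The z, P>|z| and interval columns follow from the coefficient and the bootstrap standard error under a normal approximation. A minimal sketch, using only the Python standard library and the ATT row (coefficient .6059013, standard error .2472069); the function name normal_ci is illustrative and not part of kmatch:

```python
from statistics import NormalDist

def normal_ci(coef, se, level=0.95):
    """z statistic, two-sided p-value and normal-based confidence interval."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    z = coef / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p, coef - z_crit * se, coef + z_crit * se

# ATT row of the bootstrap table
z, p, lo, hi = normal_ci(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), round(lo, 4), round(hi, 4))
```

The same arithmetic applies to the ATE, ATC and NATE rows.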

Some Examples: Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)     -.8270117    .1810415  -4.57    0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

       chi2(  1) =    2.42
     Prob > chi2 =  0.1200

Some Examples: Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1853
                                             Neighbors: min = 1
Treatment   : union = 1                                 max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      457    0    457     328    1068   1396

Treatment-effects estimation
wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                                   AI Robust
wage                     Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969     .2942952   2.46    0.014     .147889   1.301505

Some Examples: Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1853
                                             Neighbors: min = 5
Treatment   : union = 1                                 max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      457    0    457     870     526   1396

Treatment-effects estimation
wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                   AI Robust
wage                     Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823     .2381752   2.35    0.019    .0922675   1.025897

Some Examples: Bias correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1853
                                             Neighbors: min = 5
Treatment   : union = 1                                 max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      457    0    457     870     526   1396

Treatment-effects estimation
wage        Coef.
ATT      .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                   AI Robust
wage                     Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023     .2420635   2.18    0.029    .0543666   1.003238

Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      439   18    457    1258     138   1396    .83886

Treatment-effects estimation
wage        Coef.
ATT      .6408443

Some Examples: Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      432   25    457    1103     293   1396    1.3013

Treatment-effects estimation
wage        Coef.
ATT      .6047374

Some Examples: Bandwidth selection, the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      432   25    457    1105     291   1396    1.3394

Treatment-effects estimation
wage        Coef.
ATT      .6059013

Some Examples: Bandwidth selection, cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      448    9    457    1184     212   1396    1.8888

Treatment-effects estimation
wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against Bandwidth, evaluation points labeled by index]

Some Examples: Bandwidth selection, cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      453    4    457    1289     107   1396     2.433

Treatment-effects estimation
wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against Bandwidth, evaluation points labeled by index]

Some Examples: Bandwidth selection, weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      455    2    457    1356      40   1396    2.7626

Treatment-effects estimation
wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of Weighted MISE against Bandwidth, evaluation points labeled by index]

Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      366   91    457     701     695   1396        .5

Treatment-effects estimation
wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

               Common support (treated)         Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377         0   .045952    .054507  -.219225   .273732
7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry          0   .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
south           .29235   .318681   .297593   -.011456   .046074   -.05753

               Common support (treated)                  Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
6.industry     .054233         0   .043936    1.23435         0         .
7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
10.industry          0   .021734   .004367          0   4.97709         0
11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
2.race         .184542   .219536   .221726    .832297   .990121   .840601
3.race         .002732   .071795   .017237    .158513   4.16522   .038056
south          .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples: make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference by covariate (collgrad, ttl_exp, tenure, industry, race, south)]

Some Examples: Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      432   25    457    1104     291   1395    1.3392

Treatment-effects estimation
            Coef.
wage
  ATT    .6021049
  NATE   1.430823
hours
  ATT    1.263759
  NATE   1.450303

Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls            Band-
             Yes   No  Total    Used  Unused  Total     width
Treated      432   25    457    1104     291   1395    1.3392

Treatment-effects estimation
            Coef.
wage
  ATT    .5152752
  NATE   1.430823
hours
  ATT    1.263759
  NATE   1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
                 Matched               Controls            Band-
                Yes   No  Total    Used  Unused  Total     width
0  Treated      306   15    321     625     120    745    1.3199
1  Treated      126   10    136     473     178    651    1.3398

Treatment-effects estimation
           Observed   Bootstrap                       Normal-based
wage          Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0  ATT     .4586332    .2763358   1.66    0.097    -.082975   1.000241
1  ATT     .9518705     .406903   2.34    0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =    1.23
     Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage          Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)        .4932373    .4452343   1.11    0.268    -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data); McFadden R2 = 0.121

[Figure: density of the propensity score for Untreated and Treated]

Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure: two panels plotted against the propensity score: Outcome for Untreated and Treated (left) and Treatment effect (right)]
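As a consistency check on these population values, the ATE can be recovered as the share-weighted average of ATT and ATC, using the decomposition from the matching section of the talk and reading the treatment-group share as 17.5% (so 82.5% controls). This is a worked illustration, not a computation shown on the slide:

```latex
\mathrm{ATE}
  = \frac{N_{D=1}}{N}\,\mathrm{ATT} + \frac{N_{D=0}}{N}\,\mathrm{ATC}
  \approx 0.175 \times (-3.96) + 0.825 \times (-3.41)
  = -0.693 - 2.813
  \approx -3.51
```

The result agrees with the reported ATE of −3.51.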

Results: Variance

[Figure: variance of the estimators, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Notes (Propensity Scores Matching, Illustration using kmatch, 2017-06-24):

In this slide we can see that for the same algorithm PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Notes (Propensity Scores Matching, Illustration using kmatch, 2017-06-24):

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Relative standard error

[Figure: relative standard error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000, same rows and series as for the relative standard error]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Matching

Formally:

\[
\mathrm{ATT} = \frac{1}{N_{D=1}} \sum_{i|D=1} \left[ Y_i - Y_i^0 \right]
             = \frac{1}{N_{D=1}} \sum_{i|D=1} \left[ Y_i - \sum_{j|D=0} w_{ij} Y_j \right]
\]

\[
\mathrm{ATC} = \frac{1}{N_{D=0}} \sum_{i|D=0} \left[ Y_i^1 - Y_i \right]
             = \frac{1}{N_{D=0}} \sum_{i|D=0} \left[ \sum_{j|D=1} w_{ij} Y_j - Y_i \right]
\]

\[
\mathrm{ATE} = \frac{N_{D=1}}{N} \cdot \mathrm{ATT} + \frac{N_{D=0}}{N} \cdot \mathrm{ATC}
\]

Different matching algorithms use different definitions of $w_{ij}$.
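To make the role of the weights concrete, here is a minimal sketch of the ATT formula in Python. It assumes that the weights for each treated observation sum to one across controls; the function name and the toy data are illustrative only.

```python
def att(y_treated, y_control, weights):
    """ATT as the mean over treated units of Y_i minus the weighted
    average of control outcomes; weights[i][j] is w_ij."""
    total = 0.0
    for y_i, w_i in zip(y_treated, weights):
        counterfactual = sum(w * y for w, y in zip(w_i, y_control))
        total += y_i - counterfactual
    return total / len(y_treated)

# Toy data: two treated units, three controls (hypothetical values)
y_t = [10.0, 12.0]
y_c = [8.0, 9.0, 11.0]
w = [[0.5, 0.5, 0.0],   # treated unit 1 is matched to controls 1 and 2
     [0.0, 0.0, 1.0]]   # treated unit 2 is matched to control 3
est = att(y_t, y_c, w)
print(est)
```

The ATC is obtained the same way with the roles of the two groups swapped and the sign reversed.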

Exact Matching

Exact matching:

\[
w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}
\]

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
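A short sketch of these weights, assuming covariate profiles are stored as tuples and that $k_i$ counts the control observations with the same profile as treated unit i. Returning None for a unit without any exact match is a choice made for this illustration.

```python
def exact_weights(x_treated, x_control):
    """w_ij = 1/k_i if X_i == X_j, else 0; None if no exact match exists."""
    weights = []
    for x_i in x_treated:
        hits = [1.0 if x_j == x_i else 0.0 for x_j in x_control]
        k_i = sum(hits)
        weights.append([h / k_i for h in hits] if k_i > 0 else None)
    return weights

# Hypothetical profiles: (gender, education level)
x_t = [("m", 1), ("f", 2)]
x_c = [("m", 1), ("m", 1), ("f", 1)]
w = exact_weights(x_t, x_c)
print(w)
```

The second treated unit has no exact match, which is the situation that becomes common as the number of covariates grows.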

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are "close", but not necessarily equal, as matches.

A common approach is to use

\[
MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}
\]

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
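The distance can be sketched in a few lines of plain Python. The function below takes the inverse scaling matrix as given (computing and inverting a covariance matrix is left out to keep the example short); the points and matrices are made up for illustration.

```python
import math

def md(x_i, x_j, sigma_inv):
    """sqrt((x_i - x_j)' * sigma_inv * (x_i - x_j)) for a given inverse scaling matrix."""
    diff = [a - b for a, b in zip(x_i, x_j)]
    n = len(diff)
    quad = sum(diff[r] * sigma_inv[r][c] * diff[c]
               for r in range(n) for c in range(n))
    return math.sqrt(quad)

identity = [[1.0, 0.0], [0.0, 1.0]]        # Euclidean matching
scaled = [[1 / 9, 0.0], [0.0, 1 / 16]]     # inverse of diag(9, 16): uncorrelated X with variances 9 and 16

d_euclid = md((1.0, 2.0), (4.0, 6.0), identity)
d_scaled = md((1.0, 2.0), (4.0, 6.0), scaled)
print(d_euclid, d_scaled)
```

With the identity matrix the two points are 5 units apart; after scaling each difference by its standard deviation the distance shrinks to the square root of 2.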

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find the observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
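Nearest-neighbor and caliper matching can be sketched for a single treated observation as follows. The sketch assumes equal weights among the selected controls (the slide does not state how matches are weighted) and returns None when no control lies within the caliper; names and distances are illustrative.

```python
def nn_weights(dist_row, k=1, caliper=None):
    """Matching weights over controls for one treated unit.

    dist_row: distances from the treated unit to each control.
    The k closest controls are used; controls tied with the k-th closest
    are all kept. With a caliper, only distances below it are eligible.
    """
    eligible = sorted(d for d in dist_row if caliper is None or d < caliper)
    if not eligible:
        return None  # no match within the caliper
    cutoff = eligible[min(k, len(eligible)) - 1]
    hits = [1.0 if d <= cutoff else 0.0 for d in dist_row]
    n = sum(hits)
    return [h / n for h in hits]

dists = [0.2, 0.5, 0.2, 0.9]
w1 = nn_weights(dists)               # two controls tie for the nearest neighbor
w3 = nn_weights(dists, k=3)          # three closest controls
w_cal = nn_weights(dists, caliper=0.1)
print(w1, w3, w_cal)
```

Pair matching differs in that each control can be used only once, so the order in which treated units are processed affects the result.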

Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
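Kernel weights for one treated observation can be sketched as below. The kernel is the textbook Epanechnikov form with support on |u| < 1; the exact scaling used by a particular software implementation may differ, and the bandwidth plays the role of the radius threshold c.

```python
def epan(u):
    """Textbook Epanechnikov kernel, zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_weights(dist_row, bandwidth):
    """Normalized kernel weights over controls; None if no control is within the bandwidth."""
    raw = [epan(d / bandwidth) for d in dist_row]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else None

w = kernel_weights([0.0, 0.5, 1.0, 2.0], bandwidth=1.0)
unmatched = kernel_weights([2.0, 3.0], bandwidth=1.0)
print(w, unmatched)
```

Replacing epan with a function that returns 1 inside the bandwidth and 0 outside gives radius matching with equal weights.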


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

\[
\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)
\]

where $\pi(X)$ is called the propensity score.

That is,

\[
(Y^0, Y^1) \perp\!\!\!\perp D \mid X
\]

implies

\[
(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)
\]

so that under "strong ignorability" the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable, because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
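The two steps can be sketched in plain Python. The logit fit below uses simple gradient ascent on one covariate purely for illustration (real applications would use a statistics package), the data are made up, and step 2 is shown as one-nearest-neighbor matching with replacement.

```python
import math

def fit_logit(xs, ds, steps=5000, lr=0.1):
    """Step 1: one-covariate logit by plain gradient ascent (illustrative, not efficient)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, d in zip(xs, ds):
            p = 1 / (1 + math.exp(-(a + b * x)))
            grad_a += d - p
            grad_b += (d - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def pscore(x, a, b):
    """Predicted probability of treatment given x."""
    return 1 / (1 + math.exp(-(a + b * x)))

# Hypothetical data: covariate x and treatment indicator d
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
d = [0, 0, 1, 0, 1, 1]
a, b = fit_logit(x, d)
ps = [pscore(v, a, b) for v in x]

# Step 2: for each treated unit, pick the control with the closest propensity score
treated = [i for i, t in enumerate(d) if t == 1]
controls = [j for j, t in enumerate(d) if t == 0]
match = {i: min(controls, key=lambda j: abs(ps[i] - ps[j])) for i in treated}
print(match)
```

Any of the matching algorithms described earlier (caliper, radius, kernel) can be applied in step 2 by feeding them the absolute propensity-score differences as distances.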


King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
- http://j.mp/1sexgVw

Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

[Figure: Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis). Scatter plot of Outcome against Education (years) for treated (T) and control (C) units, shown before and after matching. Slide 3/23 by King and Nielsen.]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of Experiments

Balance         Complete          Fully
Covariates:     Randomization     Blocked
Observed        On average        Exact
Unobserved      On average        On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

[Figure: Best Case: Mahalanobis Distance Matching. Scatter plot of Age against Education (years) for treated (T) and control (C) units, before and after pruning unmatched controls. Slide 9/23 by King and Nielsen.]

[Figure: Best Case: Propensity Score Matching. Scatter plot of Age against Education (years) for treated (T) and control (C) units, with the units projected onto a propensity score axis running from 0 to 1. Slide 15/23 by King and Nielsen.]

[Figure: Best Case: Propensity Score Matching is Suboptimal. Scatter plot of Age against Education (years) showing the treated (T) units and the controls (C) retained by PSM. Slide 15/23 by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse")
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
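The first point, that random pruning increases imbalance simply because the sample gets smaller, can be checked with a small simulation. This is a sketch under simplifying assumptions: treated and controls are drawn from the same standard normal population (so there is no systematic imbalance), and random pruning is mimicked by drawing smaller random samples. Function and variable names are hypothetical.

```python
import random


def mean_abs_imbalance(n_per_group, reps, rng):
    """Average |mean(X_treated) - mean(X_control)| when both groups are
    random samples of size n_per_group from the same N(0, 1) population."""
    total = 0.0
    for _ in range(reps):
        xt = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        xc = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        total += abs(sum(xt) / n_per_group - sum(xc) / n_per_group)
    return total / reps


rng = random.Random(12345)
imbalance_full = mean_abs_imbalance(200, 500, rng)    # before pruning
imbalance_pruned = mean_abs_imbalance(20, 500, rng)   # after random pruning to 10%
```

Because the difference in means has standard deviation proportional to 1/sqrt(n), cutting the groups to a tenth of their size should raise the average imbalance by a factor of roughly sqrt(10), about 3.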

[Figure: PSM Increases Model Dependence & Bias. Left panel "Model Dependence": Variance against Number of Units Pruned, for MDM and PSM. Right panel "Bias": Maximum Coefficient across 512 Specifications against Number of Units Pruned, for MDM and PSM, with the true effect of 2 marked. Simulated data: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 by King and Nielsen.]

[Figure: The Propensity Score Paradox in Real Data. Two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011): Imbalance against Number of units pruned for CEM, MDM, PSM, Random, and Raw, with the 1/4 SD caliper marked. Similar pattern for > 20 other real data sets we checked. Slide 21/23 by King and Nielsen.]

5 Are King and Nielsen right?

Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization – given the sample size – is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
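How much blocking the propensity score induces can be illustrated by grouping covariate patterns by their score. The sketch below assumes a logit propensity model with known, made-up coefficients and two binary covariates; the function names are hypothetical. It also shows a third case, in which the effects of two covariates cancel so that different covariate patterns share a score.

```python
import math
from collections import defaultdict


def score(x, coefs):
    """Logit propensity score with assumed coefficients and no intercept."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(coefs, x))))


def blocks(units, coefs):
    """Group covariate vectors by (rounded) propensity score."""
    out = defaultdict(list)
    for x in units:
        out[round(score(x, coefs), 10)].append(x)
    return dict(out)


units = [(0, 0), (0, 1), (1, 0), (1, 1)]

no_relation = blocks(units, (0.0, 0.0))   # X unrelated to T: one block
strong = blocks(units, (2.0, 1.0))        # X related to T: one block per pattern
cancelling = blocks(units, (1.0, -1.0))   # effects cancel: (0,0) and (1,1) share a block
```

With zero coefficients every unit has a score of 0.5 and matching on the score amounts to complete randomization; with distinct scores for every covariate pattern, matching on the score blocks as finely as matching on X itself.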

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
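The difference between averaging and pruning can be made concrete with a minimal kernel-matching sketch. It is an illustration under stated assumptions, not kmatch's implementation: propensity scores are taken as given, the kernel is Epanechnikov with a fixed bandwidth, and all names and data are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(ps, treat, y, bandwidth):
    """ATT by kernel matching on the propensity score.

    Each treated unit is compared with a kernel-weighted average of all
    controls within the bandwidth; treated units without any such control
    are dropped (no common support). Returns (att, matched, controls_used).
    """
    controls = [j for j, d in enumerate(treat) if d == 0]
    effects, used = [], set()
    for i, d in enumerate(treat):
        if d != 1:
            continue
        w = {j: epan((ps[i] - ps[j]) / bandwidth) for j in controls}
        total = sum(w.values())
        if total == 0.0:
            continue  # pruned: no control close enough
        effects.append(y[i] - sum(w[j] * y[j] for j in controls) / total)
        used.update(j for j in controls if w[j] > 0.0)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, len(effects), len(used)


# Hypothetical propensity scores, treatment indicators, and outcomes.
ps = [0.30, 0.31, 0.29, 0.30, 0.32, 0.90, 0.10]
treat = [1, 0, 0, 0, 0, 1, 0]
y = [4.0, 3.0, 3.0, 3.0, 3.0, 9.0, 1.0]

att, n_matched, n_controls = kernel_att(ps, treat, y, bandwidth=0.05)
```

In this example the first treated unit is compared with all four nearby controls, whereas one-to-one matching without replacement would keep one of them and discard three equally good matches. The second treated unit has no control within the bandwidth and is pruned, which is the kind of pruning needed to prevent bias.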

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry:

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      432    25    457     1105     291   1396    1.3394

Treatment-effects estimation

wage          Coef.
ATT        .6059013
NATE       1.432913

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                          Matched (ATT)
Means          Treated  Untreated    StdDif    Treated  Untreated    StdDif
collgrad       .321663    .224212   .219912    .319444    .319444         0
ttl_exp        13.2685    12.7323   .117584    13.3205    13.1425   .039036
tenure         7.89205    6.17658    .29735    7.91744    7.58347   .057888
3.industry     .006565    .012178  -.058246     .00463     .00463         0
4.industry     .183807    .166905   .044425    .185185    .185185         0
5.industry     .105033    .027937   .312944    .085648    .085648         0
6.industry     .045952    .169771  -.407129    .048611    .048611         0
7.industry     .019694    .102436  -.350657    .020833    .020833         0
8.industry     .017505    .035817  -.113785    .009259    .009259         0
9.industry     .010941    .040115  -.185669    .011574    .011574         0
10.industry    .004376    .008596  -.052551    .002315    .002315         0
11.industry    .479212    .356734   .250073    .506944    .506944         0
12.industry    .122538     .07235   .169707     .12037     .12037         0
2.race         .330416    .244986   .189418      .3125      .3125         0
3.race         .017505    .011461   .050566    .006944    .006944         0
south          .297593    .466332  -.352408    .291667    .291667         0

                          Raw                          Matched (ATT)
Variances      Treated  Untreated     Ratio    Treated  Untreated     Ratio
collgrad       .218674    .174066   1.25628    .217904    .217904         1
ttl_exp        20.5898    21.0001   .980459    19.8177    18.2323   1.08696
tenure         37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
3.industry     .006536    .012038   .542928    .004619    .004619         1
4.industry     .150351    .139148   1.08052    .151242    .151242         1
5.industry     .094207    .027176   3.46656    .078494    .078494         1
6.industry     .043936     .14105   .311496    .046355    .046355         1
7.industry     .019348    .092008   .210287    .020447    .020447         1
8.industry     .017237    .034559   .498769    .009195    .009195         1
9.industry     .010845    .038533   .281445    .011467    .011467         1
10.industry    .004367    .008528   .512039    .002315    .002315         1
11.industry    .250115    .229639   1.08917    .250532    .250532         1
12.industry    .107758    .067163   1.60443    .106127    .106127         1
2.race         .221726      .1851   1.19787    .215342    .215342         1
3.race         .017237    .011338   1.52025    .006912    .006912         1
south           .20949    .249045   .841173    .207077    .207077         1
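The two balance measures in these tables can be computed as sketched below. The formula for the standardized difference (mean difference divided by the square root of the average of the two raw variances) is an assumption inferred from the printed numbers, not taken from the kmatch documentation; it reproduces the raw collgrad row (means .321663 and .224212, variances .218674 and .174066, StdDif .219912). Function names and the small data set are hypothetical.

```python
from statistics import mean, variance


def std_diff(x_treated, x_control):
    """Standardized mean difference using the average of the two sample variances."""
    pooled_sd = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    return (mean(x_treated) - mean(x_control)) / pooled_sd


def variance_ratio(x_treated, x_control):
    """Ratio of the treated to the control sample variance."""
    return variance(x_treated) / variance(x_control)


# Hypothetical covariate values.
x_t = [1.0, 2.0, 3.0, 4.0]
x_c = [0.0, 1.0, 2.0, 3.0]
sd = std_diff(x_t, x_c)
vr = variance_ratio(x_t, x_c)

# Check against the raw collgrad row of the table (means and variances as printed).
collgrad_check = (0.321663 - 0.224212) / ((0.218674 + 0.174066) / 2) ** 0.5
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance on that covariate.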

Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" (reference line at 0) and "Variance ratio" (reference line at 1), showing Raw and Matched values for all covariates.]

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      431    26    457     1214     182   1396    .00188

Treatment-effects estimation

wage          Coef.
ATT        .3887224
NATE       1.432913

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated, in two panels: Raw and Matched (ATT).]

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for Untreated and Treated, in two panels: Raw and Matched (ATT).]

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated, in two panels: Raw and Matched (ATT).]

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      432    25    457     1105     291   1396    1.3394
Untreated   1386    10   1396      455       2    457    3.3975
Combined    1818    35   1853     1560     293   1853

Treatment-effects estimation

            Observed   Bootstrap                       Normal-based
wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE         .4095729    .1920853   2.13    0.033    .0330928   .7860531
ATT         .6059013    .2472069   2.45    0.014    .1213846   1.090418
ATC         .3483797    .1893653   1.84    0.066   -.0227695   .7195289
NATE        1.432913    .2333282   6.14    0.000    .9755981   1.890228
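The z statistics, p-values, and normal-based confidence intervals in this table follow directly from the coefficients and bootstrap standard errors. The sketch below recomputes the ATT row as a check; the function name is hypothetical, and the inputs are the printed ATT and its standard error.

```python
from statistics import NormalDist


def normal_inference(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p, (coef - crit * se, coef + crit * se)


# ATT and its bootstrap standard error as printed in the table above.
z, p, (lo, hi) = normal_inference(0.6059013, 0.2472069)
```

The recomputed values agree with the printed row up to rounding. Only the standard error comes from the bootstrap; the interval itself relies on a normal approximation.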

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)        -.8270117    .1810415  -4.57    0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      457     0    457      328    1068   1396

Treatment-effects estimation

wage          Coef.
ATT        .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                                   AI Robust
wage                     Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969     .2942952    2.46    0.014     .147889   1.301505

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      457     0    457      870     526   1396

Treatment-effects estimation

wage          Coef.
ATT        .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                   AI Robust
wage                     Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823     .2381752    2.35    0.019    .0922675   1.025897

Bias-correction / regression adjustment:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      457     0    457      870     526   1396

Treatment-effects estimation

wage          Coef.
ATT        .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                   AI Robust
wage                     Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023     .2420635    2.18    0.029    .0543666   1.003238

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      439    18    457     1258     138   1396    .83886

Treatment-effects estimation

wage          Coef.
ATT        .6408443

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      432    25    457     1103     293   1396    1.3013

Treatment-effects estimation

wage          Coef.
ATT        .6047374

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      432    25    457     1105     291   1396    1.3394

Treatment-effects estimation

wage          Coef.
ATT        .6059013

Bandwidth selection: cross validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      448     9    457     1184     212   1396    1.8888

Treatment-effects estimation

wage          Coef.
ATT        .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against Bandwidth, with points labeled by evaluation index.]

Bandwidth selection: cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      453     4    457     1289     107   1396     2.433

Treatment-effects estimation

wage          Coef.
ATT        .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against Bandwidth, with points labeled by evaluation index.]

Bandwidth selection: weighted cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total     width
Treated      455     2    457     1356      40   1396    2.7626

Treatment-effects estimation

wage          Coef.
ATT        .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of Weighted MISE against Bandwidth, with points labeled by evaluation index.]

Some Examples Common-support diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): means and standardized differences

                Matched  Unmatched      Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad        .322404    .318681    .321663    .001585   -.006376    .007962
ttl_exp         13.3929    12.7682    13.2685    .027413   -.110253    .137666
tenure          8.12614    6.95055    7.89205    .038378   -.154356    .192734
3.industry      .002732    .021978    .006565   -.047404    .190657   -.238061
4.industry      .191257    .153846    .183807    .019212   -.077269    .096481
5.industry      .062842    .274725    .105033   -.137462    .552867   -.690329
6.industry      .057377          0    .045952    .054507   -.219225    .273732
7.industry      .019126    .021978    .019694   -.004083    .016423   -.020506
8.industry      .005464    .065934    .017505   -.091714    .368871   -.460585
9.industry      .010929    .010989    .010941   -.000115    .000462   -.000577
10.industry           0    .021978    .004376   -.066227    .266363   -.332589
11.industry     .554645    .175824    .479212     .15083   -.606636    .757467
12.industry     .092896    .241758    .122538   -.090299    .363181    -.45348
2.race          .243169    .681319    .330416   -.185284    .745209   -.930494
3.race          .002732    .076923    .017505   -.112525    .452572   -.565097
south            .29235    .318681    .297593   -.011456    .046074    -.05753

Common support (treated): variances and variance ratios

                Matched  Unmatched      Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad        .219058    .219536    .218674    1.00176    1.00394    .997824
ttl_exp         19.4198    25.2474    20.5898    .943177    1.22621     .76918
tenure          38.3324    31.9242    37.2044    1.03032    .858076    1.20073
3.industry      .002732    .021734    .006536    .418045    3.32537    .125714
4.industry      .155101    .131624    .150351    1.03159    .875443    1.17837
5.industry      .059054    .201465    .094207    .626851    2.13854    .293122
6.industry      .054233          0    .043936    1.23435          0          .
7.industry      .018811    .021734    .019348    .972252     1.1233    .865531
8.industry       .00545    .062271    .017237    .316157    3.61269    .087513
9.industry      .010839    .010989    .010845    .999464    1.01328    .986361
10.industry           0    .021734    .004367          0    4.97709          0
11.industry     .247691     .14652    .250115    .990307    .585811    1.69049
12.industry     .084497    .185348    .107758    .784137    1.72003    .455885
2.race          .184542    .219536    .221726    .832297    .990121    .840601
3.race          .002732    .071795    .017237    .158513    4.16522    .038056
south           .207448    .219536     .20949    .990254    1.04796    .944939

(1) matched (2) unmatched (3) total
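As a plausibility check on the common-support table, the sketch below recomputes the ttl_exp row. It assumes the row reads 13.3929 (matched), 12.7682 (unmatched), and 13.2685 (total) for the means and 20.5898 for the total variance, and it assumes that the standardized difference is the raw mean difference divided by the standard deviation in the total treated group. That definition is inferred from the numbers, not taken from the kmatch documentation.

```python
import math

# Assumed reading of the ttl_exp row (decimal points are not visible in the transcript).
m_matched, m_unmatched, m_total = 13.3929, 12.7682, 13.2685
var_total = 20.5898
n_matched, n_unmatched = 366, 91  # matched / unmatched treated from the output above

sd_total = math.sqrt(var_total)

# Standardized differences, assuming division by the SD of the total treated group.
d13 = (m_matched - m_total) / sd_total
d23 = (m_unmatched - m_total) / sd_total
d12 = (m_matched - m_unmatched) / sd_total

# The total mean should be the size-weighted average of the two subgroup means.
pooled = (n_matched * m_matched + n_unmatched * m_unmatched) / (n_matched + n_unmatched)

print(round(d13, 4), round(d23, 4), round(d12, 4), round(pooled, 3))
```

Under these assumptions the recomputed values agree with the reported columns (1)-(3), (2)-(3), and (1)-(2) up to rounding of the displayed inputs.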

Some Examples

Make a graph of the common-support statistics:

mat M = r(M)
coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference (x-axis) for each covariate: collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south]

Some Examples

Multiple outcome variables

kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched               Controls              Bandwidth
           Yes    No   Total     Used  Unused   Total
Treated    432    25     457     1104     291    1395     1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched               Controls              Bandwidth
           Yes    No   Total     Used  Unused   Total
Treated    432    25     457     1104     291    1395     1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

              Matched               Controls              Bandwidth
              Yes    No   Total     Used  Unused   Total
0  Treated    306    15     321      625     120     745     1.3199
1  Treated    126    10     136      473     178     651     1.3398

Treatment-effects estimation

            Observed   Bootstrap                      Normal-based
wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0  ATT      .4586332    .2763358   1.66    0.097    -.082975   1.000241
1  ATT      .9518705     .406903   2.34    0.019    .1543553   1.749386

test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =  1.23
     Prob > chi2 =  0.2679

lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)         .4932373    .4452343   1.11    0.268    -.379406   1.365881
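The `test` and `lincom` results above can be reproduced by hand from the reported difference and its standard error. The sketch below assumes the `lincom` row reads 0.4932373 (coefficient) and 0.4452343 (standard error), and uses a normal approximation for the p-value and confidence interval, as the "Normal-based" column header suggests.

```python
import math

# Assumed reading of the lincom output (difference between the two ATTs and its SE).
diff = 0.4932373
se = 0.4452343

z = diff / se                       # Wald z statistic
chi2 = z * z                        # with one restriction, chi2(1) = z^2
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value

z_crit = 1.959964                   # 97.5% quantile of the standard normal
ci_low = diff - z_crit * se
ci_high = diff + z_crit * se

print(round(z, 2), round(chi2, 2), round(p_value, 4), round(ci_low, 6), round(ci_high, 6))
```

Because `test` and `lincom` evaluate the same single restriction, the chi-squared statistic is simply the square of the z statistic, and both commands yield the same p-value.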

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density (y-axis) of the propensity score (x-axis, 0 to 1) for untreated and treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): −4.79
Population ATT (computed from fully stratified data): −3.96
There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, two panels: outcome (y-axis) by propensity score (x-axis) for untreated and treated; treatment effect (y-axis) by propensity score (x-axis)]

Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor; 5 neighbors) and kernel matching (fixed bandwidth; pair-matching bandwidth; cross-validation with respect to X; cross-validation with respect to Y; weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor; 5 neighbors) and kernel matching (fixed bandwidth; pair-matching bandwidth; cross-validation with respect to X; cross-validation with respect to Y; weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor; 5 neighbors) and kernel matching (fixed bandwidth; pair-matching bandwidth; cross-validation with respect to X; cross-validation with respect to Y; weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000; same rows and series as the relative standard error figure.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
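The exact-matching weights can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `exact_matching_weights` and the toy covariate profiles are hypothetical, and $k_i$ is taken to be the number of control observations whose covariate values equal those of treated observation i.

```python
from collections import defaultdict

def exact_matching_weights(x_treated, x_control):
    """Return rows w[i][j] = 1/k_i if control j has the same X as treated i, else 0."""
    # Index control observations by their covariate profile.
    by_profile = defaultdict(list)
    for j, xj in enumerate(x_control):
        by_profile[tuple(xj)].append(j)

    weights = []
    for xi in x_treated:
        matches = by_profile.get(tuple(xi), [])
        row = [0.0] * len(x_control)
        for j in matches:
            row[j] = 1.0 / len(matches)  # k_i = number of exact matches for i
        weights.append(row)
    return weights

# Hypothetical toy data: two covariates per observation.
treated = [(1, 2), (0, 3)]
control = [(1, 2), (1, 2), (0, 1)]
w = exact_matching_weights(treated, control)
print(w)
```

The second treated observation has no control with identical covariate values, so its row of weights is all zero. With more covariates this happens more and more often, which is the curse of dimensionality mentioned above.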

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea, then, is to use observations that are "close", but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where $\Sigma$ is an appropriate scaling matrix.
- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
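The distance formula can be illustrated with a minimal Python sketch for the special case of two covariates, where the scaling matrix can be inverted by hand. The helper names `covariance_2d` and `distance` and the example data are hypothetical; a real implementation would use a linear-algebra library and handle more than two covariates.

```python
import math

def covariance_2d(points):
    """Sample covariance matrix of a list of (x1, x2) pairs."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return [[s00, s01], [s01, s11]]

def distance(xi, xj, sigma):
    """sqrt((xi - xj)' sigma^{-1} (xi - xj)) for two covariates."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of a 2x2 matrix
    u, v = xi[0] - xj[0], xi[1] - xj[1]
    q = u * (inv[0][0] * u + inv[0][1] * v) + v * (inv[1][0] * u + inv[1][1] * v)
    return math.sqrt(q)

identity = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical data: (education in years, age).
data = [(12, 30), (16, 40), (20, 35), (14, 55)]
sigma = covariance_2d(data)

euclidean = distance(data[0], data[1], identity)   # Euclidean matching
mahalanobis = distance(data[0], data[1], sigma)    # Mahalanobis matching
print(euclidean, mahalanobis)
```

Passing the identity matrix gives the ordinary Euclidean distance, while passing the covariance matrix rescales each covariate by its spread and accounts for correlation between covariates.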

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find the observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
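The three algorithms above differ only in how they pick controls from a matrix of distances. The following Python sketch works on a precomputed distance matrix (one row per treated observation, one column per control). The function names are hypothetical, and the pair-matching routine is a simple greedy version that processes treated observations in the given order, which is one of several possible variants.

```python
def pair_match(dist):
    """Greedy one-to-one matching without replacement."""
    used = set()
    pairs = {}
    for i, row in enumerate(dist):
        free = [(d, j) for j, d in enumerate(row) if j not in used]
        if free:
            d, j = min(free)
            pairs[i] = j
            used.add(j)  # control j cannot be used again
    return pairs

def nearest_neighbors(dist_row, k=1, caliper=None):
    """Indices of the k closest controls, keeping all ties at the k-th distance.

    If caliper is given, only controls with distance < caliper are eligible
    (caliper matching).
    """
    candidates = sorted(
        (d, j) for j, d in enumerate(dist_row) if caliper is None or d < caliper
    )
    if not candidates:
        return []
    cutoff = candidates[min(k, len(candidates)) - 1][0]
    return [j for d, j in candidates if d <= cutoff]

# Hypothetical distances for two treated and two controls.
dist = [[0.1, 0.5],
        [0.2, 0.9]]
print(pair_match(dist))                                # second treated gets the worse control
print(nearest_neighbors([0.4, 0.1, 0.1, 0.9], k=1))    # both tied controls are kept
print(nearest_neighbors([0.4, 0.1, 0.1, 0.9], k=1, caliper=0.05))
```

In the toy example, pair matching assigns the second treated observation to a control at distance 0.9 because the closer control is already taken, whereas matching with replacement would have reused the closer control.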

Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
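Kernel matching weights can be sketched as follows. The sketch assumes the Epanechnikov kernel $K(u) = 0.75\,(1 - u^2)$ for $|u| < 1$ and normalizes the weights of each treated observation to sum to one; the function names and the normalization are illustrative choices, not a description of how kmatch computes its weights internally.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive inside (-1, 1), zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_weights(dist_row, bandwidth):
    """Normalized kernel weights for the controls of one treated observation."""
    raw = [epanechnikov(d / bandwidth) for d in dist_row]
    total = sum(raw)
    if total == 0.0:
        return [0.0] * len(raw)  # no control within the bandwidth: unmatched
    return [r / total for r in raw]

# Hypothetical distances from one treated observation to three controls.
weights = kernel_weights([0.0, 0.5, 2.0], bandwidth=1.0)
print(weights)
```

Controls beyond the bandwidth receive weight zero (as in radius matching), while controls inside the bandwidth are averaged with weights that decline with distance.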


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is, $(Y^0, Y^1) \perp\!\!\!\perp D \mid X$ implies $(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$, so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
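A short sketch of why conditioning on $\pi(X)$ suffices, written in the notation of the slide. It uses only the law of iterated expectations and the conditional independence assumption; this is the standard textbook argument, not a reproduction of the original proof.

```latex
\begin{align*}
\Pr(D = 1 \mid Y^0, Y^1, \pi(X))
  &= E\bigl[\, \Pr(D = 1 \mid Y^0, Y^1, X) \bigm| Y^0, Y^1, \pi(X) \bigr]
     && \text{(iterated expectations)} \\
  &= E\bigl[\, \pi(X) \bigm| Y^0, Y^1, \pi(X) \bigr]
     && \text{(conditional independence given } X\text{)} \\
  &= \pi(X)
\end{align*}
```

Since the result does not depend on $(Y^0, Y^1)$ and D is binary, D is independent of the potential outcomes once $\pi(X)$ is held fixed.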

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g., using a logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
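The two-step procedure can be sketched in plain Python. The sketch below makes several simplifying assumptions: a single covariate, a logit model fitted by a hand-written Newton-Raphson routine (`fit_logit`, a hypothetical helper), toy data, and one-nearest-neighbor matching with replacement on the estimated score. In practice one would use a statistics package for Step 1.

```python
import math

def fit_logit(x, d, iterations=25):
    """Fit Pr(D=1|x) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Information matrix (negative Hessian).
        h00 = sum(pi * (1.0 - pi) for pi in p)
        h01 = sum(pi * (1.0 - pi) * xi for pi, xi in zip(p, x))
        h11 = sum(pi * (1.0 - pi) * xi * xi for pi, xi in zip(p, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def match_on_score(scores, d):
    """For each treated unit, the control with the closest propensity score."""
    controls = [j for j, dj in enumerate(d) if dj == 0]
    return {
        i: min(controls, key=lambda j: abs(scores[i] - scores[j]))
        for i, di in enumerate(d) if di == 1
    }

# Hypothetical toy data: one covariate, treatment more likely for larger x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
d = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_logit(x, d)                                             # Step 1
scores = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
matches = match_on_score(scores, d)                                # Step 2
print(matches)
```

Because the score is one-dimensional, Step 2 reduces to comparing absolute differences between scalars, regardless of how many covariates entered the model in Step 1.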


King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure (slide 3/23 by King and Nielsen, shown in several animation steps): outcome (y-axis) against education in years (x-axis) for treated (T) and control (C) observations]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of experiments

Balance on covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of each matching method (in observational data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure (slide 9/23 by King and Nielsen, shown in two animation steps): age (y-axis) against education in years (x-axis) for treated (T) and control (C) observations]

Best Case: Propensity Score Matching

[Figure (slide 15/23 by King and Nielsen, shown in two animation steps): age (y-axis) against education in years (x-axis) for treated (T) and control (C) observations, with an additional propensity-score axis running from 0 to 1]

Best Case: Propensity Score Matching is Suboptimal

[Figure (slide 15/23 by King and Nielsen): age (y-axis) against education in years (x-axis) for treated (T) and control (C) observations]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
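The first point of Argument 3 can be illustrated with a small simulation. The sketch below is a hypothetical setup, not King and Nielsen's: treated and control groups are drawn from the same standard normal distribution (so they are balanced in expectation), and imbalance is measured as the absolute difference in covariate means before and after deleting observations at random.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate_pruning(n=200, keep=20, reps=500, seed=12345):
    """Average absolute mean difference in the full and in the randomly pruned data."""
    rng = random.Random(seed)
    full = pruned = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        full += abs(mean(treated) - mean(control))
        # Random pruning: keep only a random subset of each group.
        pruned += abs(mean(rng.sample(treated, keep)) - mean(rng.sample(control, keep)))
    return full / reps, pruned / reps

imbalance_full, imbalance_pruned = simulate_pruning()
print(imbalance_full, imbalance_pruned)
```

With 200 observations per group pruned down to 20, the expected imbalance grows by a factor of roughly $\sqrt{10}$, purely because the group means become noisier in smaller samples.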

PSM Increases Model Dependence & Bias

[Figure, two panels (slide 20/23 by King and Nielsen), MDM vs. PSM by number of units pruned (0 to 160). Model dependence: variance. Bias: maximum coefficient across 512 specifications; true effect = 2. Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$.]

The Propensity Score Paradox in Real Data

[Figure, two panels (slide 21/23 by King and Nielsen): imbalance (y-axis) against number of units pruned (x-axis) for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked. Panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011).]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
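The efficiency claim for a fixed sample size can be checked with a small simulation. The setup is hypothetical: the outcome depends strongly on a single covariate X, the true treatment effect is zero by construction, and "fully blocked" means that units are paired on adjacent values of X with one unit per pair treated at random.

```python
import random

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def diff_in_means(y, treated):
    t = [yi for i, yi in enumerate(y) if i in treated]
    c = [yi for i, yi in enumerate(y) if i not in treated]
    return sum(t) / len(t) - sum(c) / len(c)

def simulate(reps=2000, n_pairs=20, seed=2017):
    """Sampling variance of the difference in means under both designs."""
    rng = random.Random(seed)
    complete, blocked = [], []
    for _ in range(reps):
        x = sorted(rng.gauss(0.0, 1.0) for _ in range(2 * n_pairs))
        y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # no treatment effect
        # Complete randomization: a random half of all units is treated.
        units = list(range(2 * n_pairs))
        rng.shuffle(units)
        complete.append(diff_in_means(y, set(units[:n_pairs])))
        # Fully blocked: pair adjacent units on x, treat one unit per pair.
        in_blocks = {2 * p + rng.randint(0, 1) for p in range(n_pairs)}
        blocked.append(diff_in_means(y, in_blocks))
    return variance(complete), variance(blocked)

var_complete, var_blocked = simulate()
print(var_complete, var_blocked)
```

Both designs use all 40 units here, so the comparison holds the sample size fixed. The caveat in the text concerns the case where blocking discards observations, which this sketch does not cover.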

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Argument 3

- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.


If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.

- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.


It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:

- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   432    25    457    1105     291   1396  1.3394

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6059013
            NATE |  1.432913

Some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                 |            Raw               |        Matched (ATT)
           Means |  Treated  Untrea~d    StdDif |  Treated  Untrea~d   StdDif
        collgrad |  .321663   .224212   .219912 |  .319444   .319444        0
         ttl_exp |  13.2685   12.7323   .117584 |  13.3205   13.1425  .039036
          tenure |  7.89205   6.17658    .29735 |  7.91744   7.58347  .057888
      3.industry |  .006565   .012178  -.058246 |   .00463    .00463        0
      4.industry |  .183807   .166905   .044425 |  .185185   .185185        0
      5.industry |  .105033   .027937   .312944 |  .085648   .085648        0
      6.industry |  .045952   .169771  -.407129 |  .048611   .048611        0
      7.industry |  .019694   .102436  -.350657 |  .020833   .020833        0
      8.industry |  .017505   .035817  -.113785 |  .009259   .009259        0
      9.industry |  .010941   .040115  -.185669 |  .011574   .011574        0
     10.industry |  .004376   .008596  -.052551 |  .002315   .002315        0
     11.industry |  .479212   .356734   .250073 |  .506944   .506944        0
     12.industry |  .122538    .07235   .169707 |   .12037    .12037        0
          2.race |  .330416   .244986   .189418 |    .3125     .3125        0
          3.race |  .017505   .011461   .050566 |  .006944   .006944        0
           south |  .297593   .466332  -.352408 |  .291667   .291667        0

                 |            Raw               |        Matched (ATT)
       Variances |  Treated  Untrea~d     Ratio |  Treated  Untrea~d    Ratio
        collgrad |  .218674   .174066   1.25628 |  .217904   .217904        1
         ttl_exp |  20.5898   21.0001   .980459 |  19.8177   18.2323  1.08696
          tenure |  37.2044   29.3629   1.26706 |  37.0399   34.9543  1.05966
      3.industry |  .006536   .012038   .542928 |  .004619   .004619        1
      4.industry |  .150351   .139148   1.08052 |  .151242   .151242        1
      5.industry |  .094207   .027176   3.46656 |  .078494   .078494        1
      6.industry |  .043936    .14105   .311496 |  .046355   .046355        1
      7.industry |  .019348   .092008   .210287 |  .020447   .020447        1
      8.industry |  .017237   .034559   .498769 |  .009195   .009195        1
      9.industry |  .010845   .038533   .281445 |  .011467   .011467        1
     10.industry |  .004367   .008528   .512039 |  .002315   .002315        1
     11.industry |  .250115   .229639   1.08917 |  .250532   .250532        1
     12.industry |  .107758   .067163   1.60443 |  .106127   .106127        1
          2.race |  .221726     .1851   1.19787 |  .215342   .215342        1
          3.race |  .017237   .011338   1.52025 |  .006912   .006912        1
           south |   .20949   .249045   .841173 |  .207077   .207077        1
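
The balancing statistics in the kmatch summarize output can be checked by hand. The following is a minimal Python sketch, assuming the common definition of the standardized difference (difference in means divided by the square root of the average of the two group variances). Whether kmatch uses exactly this definition is an assumption, although it reproduces the raw collgrad values reported in the output. The function names are illustrative only.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference: gap in means scaled by the square
    root of the simple average of the two group variances (assumed definition)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)

def var_ratio(var_t, var_c):
    """Variance ratio, treated over untreated."""
    return var_t / var_c

# Raw (unmatched) means and variances of collgrad from the summarize output.
raw_sd = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
raw_vr = var_ratio(0.218674, 0.174066)
print(round(raw_sd, 4), round(raw_vr, 4))
```

After matching, the same two statistics should move toward 0 and 1, respectively, which is what the Matched (ATT) columns show.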

Make a graph of the balancing stats

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
    >     bylabels("Std. mean difference" "Variance ratio") ///
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, industry and race indicators, south). Left panel: Std. mean difference (reference line at 0). Right panel: Variance ratio (reference line at 1). Series: Raw, Matched.]

Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   431    26    457    1214     182   1396  .00188

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .3887224
            NATE |  1.432913

Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated. Panels: Raw, Matched (ATT). x-axis: Propensity score; y-axis: Density.]

Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for Untreated and Treated. Panels: Raw, Matched (ATT). x-axis: Propensity score; y-axis: Cumulative probability.]

Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated. Panels: Raw, Matched (ATT). y-axis: Propensity score.]

Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   432    25    457    1105     291   1396  1.3394
     Untreated |  1386    10   1396     455       2    457  3.3975
      Combined |  1818    35   1853    1560     293   1853

    Treatment-effects estimation
                 |  Observed   Bootstrap                      Normal-based
            wage |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
             ATE |  .4095729   .1920853    2.13   0.033    .0330928   .7860531
             ATT |  .6059013   .2472069    2.45   0.014    .1213846   1.090418
             ATC |  .3483797   .1893653    1.84   0.066   -.0227695   .7195289
            NATE |  1.432913   .2333282    6.14   0.000    .9755981   1.890228
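
The z statistics, p-values, and normal-based confidence intervals in the bootstrap table follow directly from each coefficient and its standard error. A minimal Python sketch of that arithmetic, using the ATE row as input (the helper name `normal_ci` is hypothetical, and small discrepancies in the last digit are expected because the displayed standard error is rounded):

```python
from statistics import NormalDist

def normal_ci(coef, se, level=0.95):
    """Return (z, two-sided p-value, lower bound, upper bound) of a
    normal-based confidence interval for a coefficient."""
    std_normal = NormalDist()
    z = coef / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return z, p, coef - crit * se, coef + crit * se

# ATE row of the bootstrap table: Coef. = .4095729, Std. Err. = .1920853
z, p, lo, hi = normal_ci(0.4095729, 0.1920853)
print(f"z = {z:.2f}, p = {p:.3f}, CI = [{lo:.4f}, {hi:.4f}]")
```

The same function applied to the ATT, ATC, and NATE rows should reproduce the remaining intervals up to rounding.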

Do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

            wage |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
             (1) | -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =    0.1200

Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                               Number of obs = 1,853
                                               Neighbors: min = 1
                                                          max = 1
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   457     0    457     328    1068   1396

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 1
    Outcome model  : matching                               min   = 1
    Distance metric: Mahalanobis                            max   = 1

                          |             AI Robust
                     wage |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATET                  |
    union                 |
      (union vs nonunion) |  .7246969   .2942952    2.46   0.014     .147889   1.301505

Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs = 1,853
                                               Neighbors: min = 5
                                                          max = 5
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   457     0    457     870     526   1396

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                               min   = 5
    Distance metric: Mahalanobis                            max   = 6

                          |             AI Robust
                     wage |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATET                  |
    union                 |
      (union vs nonunion) |  .5590823   .2381752    2.35   0.019    .0922675   1.025897

Bias-correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs = 1,853
                                               Neighbors: min = 5
                                                          max = 5
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   457     0    457     870     526   1396

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .5288023
    wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                               min   = 5
    Distance metric: Mahalanobis                            max   = 6

                          |             AI Robust
                     wage |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATET                  |
    union                 |
      (union vs nonunion) |  .5288023   .2420635    2.18   0.029    .0543666   1.003238

Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att ///
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   439    18    457    1258     138   1396  .83886

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6408443

Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   432    25    457    1103     293   1396  1.3013

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6047374

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   432    25    457    1105     291   1396  1.3394

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6059013

Bandwidth selection: cross validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   448     9    457    1184     212   1396  1.8888

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by evaluation index. x-axis: Bandwidth (1.5 to 3); y-axis: MSE.]

Bandwidth selection: cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   453     4    457    1289     107   1396   2.433

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by evaluation index. x-axis: Bandwidth (1.5 to 3); y-axis: MISE.]

Bandwidth selection: weighted cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   455     2    457    1356      40   1396  2.7626

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by evaluation index. x-axis: Bandwidth (1 to 5); y-axis: Weighted MISE.]

Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(0.5)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   366    91    457     701     695   1396      .5

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                 |    Common support (treated)  |    Standardized difference
           Means |  Matched  Unmatc~d     Total |  (1)-(3)   (2)-(3)   (1)-(2)
        collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
         ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
          tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
      3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
      4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
      5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
      6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
      7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
      8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
      9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
     10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
     11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
     12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
          2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
          3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
           south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

                 |    Common support (treated)  |            Ratio
       Variances |  Matched  Unmatc~d     Total |  (1)/(3)   (2)/(3)   (1)/(2)
        collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
         ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
          tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
      3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
      4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
      5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
      6.industry |  .054233         0   .043936 |  1.23435         0         .
      7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
      8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
      9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
     10.industry |        0   .021734   .004367 |        0   4.97709         0
     11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
     12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
          2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
          3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
           south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total

Make a graph of the common-support stats

    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" with one row per covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); reference line at 0.]

Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   432    25    457    1104     291   1395  1.3392

    Treatment-effects estimation
                 |     Coef.
    wage         |
             ATT |  .6021049
            NATE |  1.430823
    hours        |
             ATT |  1.263759
            NATE |  1.450303

Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure) ///
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
       Treated |   432    25    457    1104     291   1395  1.3392

    Treatment-effects estimation
                 |     Coef.
    wage         |
             ATT |  .5152752
            NATE |  1.430823
    hours        |
             ATT |  1.263759
            NATE |  1.450303
    wage: adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race

Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics
               |      Matched             Controls          Band-
               |   Yes    No  Total    Used  Unused  Total  width
    0  Treated |   306    15    321     625     120    745  1.3199
    1  Treated |   126    10    136     473     178    651  1.3398

    Treatment-effects estimation
                 |  Observed   Bootstrap                      Normal-based
            wage |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
    0        ATT |  .4586332   .2763358    1.66   0.097    -.082975   1.000241
    1        ATT |  .9518705    .406903    2.34   0.019    .1543553   1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

               chi2(  1) =    1.23
             Prob > chi2 =    0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

            wage |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
             (1) |  .4932373   .4452343    1.11   0.268    -.379406   1.365881
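
The test and lincom results for the subgroup comparison are two views of the same Wald statistic: the chi-squared value is the square of the z value of the difference. A minimal Python sketch of that relation, taking the bootstrap standard error of the difference (.4452343) as given from the lincom output rather than recomputing it (the helper name `wald_difference` is hypothetical):

```python
import math

def wald_difference(diff, se):
    """Wald test of H0: diff = 0. Returns (z, chi2 with 1 df, two-sided p)."""
    z = diff / se
    chi2 = z * z
    # Two-sided normal p-value; identical to the chi2(1) upper-tail probability.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, chi2, p

# ATT for south = 1 minus ATT for south = 0, from the table of subgroup effects.
diff = 0.9518705 - 0.4586332
z, chi2, p = wald_difference(diff, 0.4452343)
print(f"diff = {diff:.7f}, z = {z:.2f}, chi2 = {chi2:.2f}, p = {p:.4f}")
```

With a p-value of about 0.27, the difference between the two subgroup ATTs is not statistically significant at conventional levels.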

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
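
The simulation results are summarized in terms of variance, bias, and mean squared error across repeated samples. A minimal Python sketch of how such summaries can be computed from a list of replicate estimates and a known population value; the five estimates below are made-up illustrative numbers, not results from the simulation, and the function name is hypothetical:

```python
from statistics import fmean, pvariance

def summarize(estimates, truth):
    """Bias, variance, and mean squared error of replicate estimates
    relative to a known population value."""
    mean_est = fmean(estimates)
    bias = mean_est - truth
    variance = pvariance(estimates, mu=mean_est)
    mse = fmean([(e - truth) ** 2 for e in estimates])
    return {"bias": bias, "variance": variance, "mse": mse}

# Hypothetical ATT estimates from five random samples; the population ATT
# is taken to be -3.96 purely for illustration.
estimates = [-4.4, -3.6, -4.1, -3.5, -4.4]
res = summarize(estimates, truth=-3.96)
print(res)
```

With the population variance used here, the decomposition MSE = variance + bias² holds exactly, which is why an estimator with larger variance can still have the smaller MSE if it removes more bias.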

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for Untreated and Treated. x-axis: Propensity score (0 to 1); y-axis: Density. McFadden R2 = 0.121]

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41)

[Figure: two panels by propensity score (x-axis: Propensity score, 0 to .8). Left: mean Outcome (30 to 55) for Untreated and Treated. Right: Treatment effect (-6 to -1).]

[Figure: Results: Variance. Panels: N = 500, N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

[Figure: Results: Bias reduction (in percent). Panels: N = 500, N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

[Figure: Results: Mean squared error. Panels: N = 500, N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

[Figure: Results: Relative standard error. Panels: N = 500, N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]

[Figure: Results: Coverage of 95% CIs. Panels: N = 500, N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]

7 Conclusions

Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:

- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:

- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Some conclusions from the simulation:

- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:

- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802. Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are "close", but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where Σ is an appropriate scaling matrix.

- Mahalanobis matching: Σ is the covariance matrix of X.
- Euclidean matching: Σ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
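The distance metric can be made concrete with a small sketch. The Python code below is only an illustration for the two-covariate case, not how any matching package computes it; the function names and the toy (education, age) values are made up for this example.

```python
import math

def covariance_2d(rows):
    """Sample covariance matrix (2x2) of a list of (x1, x2) pairs."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in rows) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows) / (n - 1)
    return [[s11, s12], [s12, s22]]

def distance(xi, xj, sigma):
    """sqrt((xi - xj)' Sigma^{-1} (xi - xj)) for two covariates."""
    (a, b), (c, d) = sigma
    det = a * d - b * c  # must be non-zero (Sigma invertible)
    inv = [[d / det, -b / det], [-c / det, a / det]]
    u, v = xi[0] - xj[0], xi[1] - xj[1]
    q = u * (inv[0][0] * u + inv[0][1] * v) + v * (inv[1][0] * u + inv[1][1] * v)
    return math.sqrt(q)

# Hypothetical toy data: (education in years, age)
X = [(12, 25), (16, 40), (14, 31), (18, 52), (20, 47)]
identity = [[1.0, 0.0], [0.0, 1.0]]

print("Euclidean:  ", distance(X[0], X[1], identity))
print("Mahalanobis:", distance(X[0], X[1], covariance_2d(X)))
```

With the identity matrix the age difference dominates simply because age has the larger scale; scaling by the covariance matrix removes this dependence on units.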

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD, and determine the matching weights w_ij.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group find observation j in the control group for which MD_ij is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
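As a rough sketch of the nearest-neighbor and caliper rules described above (with replacement, keeping all ties), the following Python function selects the matched controls for one treated observation. The function name and the toy distance matrix are hypothetical; this is not the algorithm of any particular package.

```python
def nearest_neighbors(dist_row, k=1, caliper=None):
    """Indices of the matched controls for one treated observation.

    dist_row[j] is the distance to control j. Controls may be reused by
    other treated observations (matching with replacement), all ties at
    the k-th smallest distance are kept, and an optional caliper drops
    controls whose distance is not smaller than the threshold.
    """
    candidates = [(d, j) for j, d in enumerate(dist_row)
                  if caliper is None or d < caliper]
    if not candidates:
        return []  # no control within the caliper: observation stays unmatched
    candidates.sort()
    cutoff = candidates[min(k, len(candidates)) - 1][0]
    return [j for d, j in candidates if d <= cutoff]

# Hypothetical distances: rows are treated units, columns are controls
dist = [[0.5, 0.2, 0.2, 1.4],
        [0.9, 0.3, 1.1, 0.1]]

for i, row in enumerate(dist):
    print(i, nearest_neighbors(row, k=1), nearest_neighbors(row, k=2, caliper=0.4))
```

Pair matching without replacement would additionally have to remove a control from the pool once it has been used, which makes the result depend on the order in which treated observations are processed.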

Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression-adjustment to the matched data (also known as "bias-adjustment" in the context of nearest-neighbor matching).
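The difference between radius and kernel matching lies only in the weights. A minimal Python sketch of Epanechnikov kernel weights for one treated observation follows; the function names, the bandwidth value, and the normalization of the weights to sum to one are assumptions made for this illustration.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive inside |u| < 1, zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_weights(dist_row, bandwidth):
    """Matching weights for one treated observation.

    Controls at distance >= bandwidth get weight zero (as in radius
    matching); closer controls get larger weights. Weights are scaled
    to sum to one whenever at least one control is within the bandwidth.
    """
    raw = [epanechnikov(d / bandwidth) for d in dist_row]
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else raw

# Hypothetical distances from one treated observation to four controls
weights = kernel_weights([0.0, 0.5, 1.0, 2.0], bandwidth=1.0)
print(weights)
```

Radius matching corresponds to replacing the kernel by a constant inside the bandwidth, so that all controls within the threshold receive the same weight.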

3 Propensity Score Matching

The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where π(X) is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that under "strong ignorability" the average causal effect can be estimated by conditioning on the propensity score π(X) instead of X.

This is remarkable, because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, |π(X_i) − π(X_j)|, instead of multivariate distances.

PSM is tremendously popular
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
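The two-step procedure can be sketched in a few lines of Python. This is a toy illustration under strong simplifications: a single covariate, a logit model fitted by plain gradient ascent (real software uses Newton-type maximum likelihood), and one-nearest-neighbor matching with replacement. The data and all names are hypothetical.

```python
import math

def fit_logit(x, d, steps=5000, rate=0.05):
    """Fit Pr(D=1|x) = 1/(1+exp(-(a+b*x))) by gradient ascent
    on the mean log-likelihood (a simple stand-in for ML estimation)."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        a += rate * sum(di - pi for di, pi in zip(d, p)) / n
        b += rate * sum((di - pi) * xi for di, pi, xi in zip(d, p, x)) / n
    return a, b

# Hypothetical data: one covariate x and a treatment indicator d
x = [0, 1, 2, 3, 4, 5, 6, 7]
d = [0, 0, 1, 0, 1, 0, 1, 1]

# Step 1: estimate the propensity score
a, b = fit_logit(x, d)
ps = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]

# Step 2: match each treated unit to the control with the closest score
treated = [i for i, di in enumerate(d) if di == 1]
controls = [i for i, di in enumerate(d) if di == 0]
matches = {i: min(controls, key=lambda j: abs(ps[i] - ps[j])) for i in treated}
print(matches)
```

With several covariates the only change is in step 1; step 2 still operates on a single number per observation, which is the simplification the propensity score theorem buys.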

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
- http://j.mp/1sexgVw

Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, built up over several overlays: scatter plot of Outcome (0 to 12) against Education (years, 12 to 28) for treated (T) and control (C) units. Source: slide 3/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

    Balance        Complete         Fully
    Covariates:    Randomization    Blocked
    Observed       On average       Exact
    Unobserved     On average       On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, two overlays: Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units. Source: slide 9/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure, two overlays: Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units, together with a propensity score axis running from 0 to 1. Source: slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units. Source: slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
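The first step of this argument (random pruning increases imbalance) is easy to check by simulation. The Python sketch below is a hypothetical illustration of that mechanism only, not a replication of King and Nielsen's simulations: treated and controls are drawn from the same distribution, observations are deleted at random, and the average absolute difference in covariate means is recorded.

```python
import random
import statistics

def mean_abs_imbalance(n_keep, reps=2000, seed=1):
    """Average absolute difference in covariate means between treated and
    controls after keeping n_keep randomly chosen units from each group."""
    rng = random.Random(seed)
    # Same covariate distribution in both groups: any imbalance is chance
    treated = [rng.gauss(0, 1) for _ in range(200)]
    controls = [rng.gauss(0, 1) for _ in range(200)]
    diffs = []
    for _ in range(reps):
        t = rng.sample(treated, n_keep)
        c = rng.sample(controls, n_keep)
        diffs.append(abs(statistics.fmean(t) - statistics.fmean(c)))
    return statistics.fmean(diffs)

# n_keep = 200 keeps everyone; smaller values mean more random pruning
for n_keep in (200, 50, 10):
    print(n_keep, round(mean_abs_imbalance(n_keep), 3))
```

Whether a given PSM algorithm actually prunes at random in this way is the contested part of the argument; the simulation only shows what happens if it does.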

PSM Increases Model Dependence & Bias

[Figure, two panels. Panel "Model Dependence": variance against number of units pruned (0 to 160) for MDM and PSM. Panel "Bias": maximum coefficient across 512 specifications against number of units pruned (0 to 160) for MDM and PSM, with the true effect of 2 marked. Source: slide 20/23 of the slides by King and Nielsen.]

$$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i, \quad \epsilon_i \sim N(0, 1)$$

The Propensity Score Paradox in Real Data

[Figure, two panels: imbalance against number of units pruned for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked. First panel: Finkel et al. (JOP, 2012), 0 to 3000 units pruned. Second panel: Nielsen et al. (AJPS, 2011), 0 to 2500 units pruned. Source: slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.

5 Are King and Nielsen right?

Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                      Matched                Controls           Band-
                 Yes     No  Total     Used  Unused  Total      width
    Treated      432     25    457     1105     291   1396     1.3394

    Treatment-effects estimation

    wage         Coef.
    ATT       .6059013
    NATE      1.432913

Some Examples

Some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                            Raw                        Matched(ATT)
    Means        Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
    collgrad     .321663   .224212   .219912    .319444   .319444         0
    ttl_exp      13.2685   12.7323   .117584    13.3205   13.1425   .039036
    tenure       7.89205   6.17658    .29735    7.91744   7.58347   .057888
    3.industry   .006565   .012178  -.058246     .00463    .00463         0
    4.industry   .183807   .166905   .044425    .185185   .185185         0
    5.industry   .105033   .027937   .312944    .085648   .085648         0
    6.industry   .045952   .169771  -.407129    .048611   .048611         0
    7.industry   .019694   .102436  -.350657    .020833   .020833         0
    8.industry   .017505   .035817  -.113785    .009259   .009259         0
    9.industry   .010941   .040115  -.185669    .011574   .011574         0
    10.industry  .004376   .008596  -.052551    .002315   .002315         0
    11.industry  .479212   .356734   .250073    .506944   .506944         0
    12.industry  .122538    .07235   .169707     .12037    .12037         0
    2.race       .330416   .244986   .189418      .3125     .3125         0
    3.race       .017505   .011461   .050566    .006944   .006944         0
    south        .297593   .466332  -.352408    .291667   .291667         0

                            Raw                        Matched(ATT)
    Variances    Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
    collgrad     .218674   .174066   1.25628    .217904   .217904         1
    ttl_exp      20.5898   21.0001   .980459    19.8177   18.2323   1.08696
    tenure       37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
    3.industry   .006536   .012038   .542928    .004619   .004619         1
    4.industry   .150351   .139148   1.08052    .151242   .151242         1
    5.industry   .094207   .027176   3.46656    .078494   .078494         1
    6.industry   .043936    .14105   .311496    .046355   .046355         1
    7.industry   .019348   .092008   .210287    .020447   .020447         1
    8.industry   .017237   .034559   .498769    .009195   .009195         1
    9.industry   .010845   .038533   .281445    .011467   .011467         1
    10.industry  .004367   .008528   .512039    .002315   .002315         1
    11.industry  .250115   .229639   1.08917    .250532   .250532         1
    12.industry  .107758   .067163   1.60443    .106127   .106127         1
    2.race       .221726     .1851   1.19787    .215342   .215342         1
    3.race       .017237   .011338   1.52025    .006912   .006912         1
    south         .20949   .249045   .841173    .207077   .207077         1
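The StdDif and Ratio columns can be reproduced by hand. The Python sketch below assumes the common definition of the standardized difference (difference in means divided by the square root of the average of the two group variances); that this is the definition used by kmatch is an assumption, although it does reproduce the reported raw values for collgrad when the means are read as .321663 and .224212 and the variances as .218674 and .174066.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference: difference in means divided by the
    square root of the average of the two group variances (assumed definition)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

# Raw (unmatched) values for collgrad, treated vs. untreated
raw_std_diff = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
raw_var_ratio = 0.218674 / 0.174066  # variance of treated over untreated

print(round(raw_std_diff, 4), round(raw_var_ratio, 4))
```

After matching, a standardized difference of 0 and a variance ratio of 1 indicate that the matched controls reproduce the treated group's mean and variance for that covariate.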

Some Examples

Make a graph of the balancing stats

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
    >     bylabels("Std. mean difference" "Variance ratio")
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure, two panels: "Std. mean difference" and "Variance ratio" for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), Raw versus Matched.]

Some Examples

Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics

                      Matched                Controls           Band-
                 Yes     No  Total     Used  Unused  Total      width
    Treated      431     26    457     1214     182   1396     .00188

    Treatment-effects estimation

    wage         Coef.
    ATT       .3887224
    NATE      1.432913

Some Examples

Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure, two panels (Raw; Matched (ATT)): kernel density of the propensity score for untreated and treated observations.]

Some Examples

Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure, two panels (Raw; Matched (ATT)): cumulative probability against propensity score for untreated and treated observations.]

Some Examples

Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure, two panels (Raw; Matched (ATT)): box plots of the propensity score for untreated and treated observations.]

Some Examples

Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                      Matched                Controls           Band-
                 Yes     No  Total     Used  Unused  Total      width
    Treated      432     25    457     1105     291   1396     1.3394
    Untreated   1386     10   1396      455       2    457     3.3975
    Combined    1818     35   1853     1560     293   1853

    Treatment-effects estimation

              Observed   Bootstrap                        Normal-based
    wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATE       .4095729    .1920853   2.13    0.033    .0330928    .7860531
    ATT       .6059013    .2472069   2.45    0.014    .1213846    1.090418
    ATC       .3483797    .1893653   1.84    0.066   -.0227695    .7195289
    NATE      1.432913    .2333282   6.14    0.000    .9755981    1.890228
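The z statistic, p-value, and normal-based confidence interval follow mechanically from the coefficient and its bootstrap standard error. As a quick check, the Python sketch below recomputes them for the ATT row, reading the coefficient as .6059013 and the standard error as .2472069; the function name is made up for this example.

```python
from statistics import NormalDist

def normal_based(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence
    interval for an estimate with a given standard error."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return z, p, coef - crit * se, coef + crit * se

# ATT row of the bootstrap table
z, p, lower, upper = normal_based(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), round(lower, 4), round(upper, 4))
```

The interval is only as good as the bootstrap standard error behind it, which is why the distinction between kernel/ridge matching and nearest-neighbor matching matters for inference.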

Some Examples

Do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

    wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    (1)      -.8270117    .1810415  -4.57    0.000   -1.181847   -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching

                                               Number of obs = 1853
                                               Neighbors: min = 1
    Treatment   : union = 1                               max = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                      Matched                Controls           Band-
                 Yes     No  Total     Used  Unused  Total      width
    Treated      457      0    457      328    1068   1396

    Treatment-effects estimation

    wage         Coef.
    ATT       .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation                 Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 1
    Outcome model  : matching                                   min = 1
    Distance metric: Mahalanobis                                max = 1

                                       AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)   .7246969    .2942952   2.46    0.014     .147889    1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                               Number of obs = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                      Matched                Controls           Band-
                 Yes     No  Total     Used  Unused  Total      width
    Treated      457      0    457      870     526   1396

    Treatment-effects estimation

    wage         Coef.
    ATT       .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation                 Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 5
    Outcome model  : matching                                   min = 5
    Distance metric: Mahalanobis                                max = 6

                                       AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)   .5590823    .2381752   2.35    0.019    .0922675    1.025897

Some Examples

Bias-correction/regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                               Number of obs = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                      Matched                Controls           Band-
                 Yes     No  Total     Used  Unused  Total      width
    Treated      457      0    457      870     526   1396

    Treatment-effects estimation

    wage         Coef.
    ATT       .5288023
    adjusted for collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation                 Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 5
    Outcome model  : matching                                   min = 5
    Distance metric: Mahalanobis                                max = 6

                                       AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)   .5288023    .2420635   2.18    0.029    .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics

                      Matched                Controls           Band-
                 Yes     No  Total     Used  Unused  Total      width
    Treated      439     18    457     1258     138   1396     .83886

    Treatment-effects estimation

    wage         Coef.
    ATT       .6408443

Some Examples

Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics

                      Matched                Controls           Band-
                 Yes     No  Total     Used  Unused  Total      width
    Treated      432     25    457     1103     293   1396     1.3013

    Treatment-effects estimation

    wage         Coef.
    ATT       .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                      Matched                Controls           Band-
                 Yes     No  Total     Used  Unused  Total      width
    Treated      432     25    457     1105     291   1396     1.3394

    Treatment-effects estimation

    wage         Coef.
    ATT       .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |   Used  Unused   Total |  width
   Treated |   448      9      457  |   1184     212    1396 | 1.8888

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6651578

kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (1.5 to 3); points are labeled with their evaluation index]

Some Examples

Bandwidth selection: cross-validation with respect to Y

kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |   Used  Unused   Total |  width
   Treated |   453      4      457  |   1289     107    1396 |  2.433

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6928956

kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (1.5 to 3); points are labeled with their evaluation index]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |   Used  Unused   Total |  width
   Treated |   455      2      457  |   1356      40    1396 | 2.7626

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7308166

kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5); points are labeled with their evaluation index]

Some Examples

Common-support diagnostics

kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |   Used  Unused   Total |  width
   Treated |   366     91      457  |    701     695    1396 |     .5

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3303161

kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

             |          Means               |   Standardized difference
             |  Matched  Unmatched    Total |  (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |  .322404    .318681  .321663 |  .001585  -.006376   .007962
     ttl_exp |  13.3929    12.7682  13.2685 |  .027413  -.110253   .137666
      tenure |  8.12614    6.95055  7.89205 |  .038378  -.154356   .192734
  3.industry |  .002732    .021978  .006565 | -.047404   .190657  -.238061
  4.industry |  .191257    .153846  .183807 |  .019212  -.077269   .096481
  5.industry |  .062842    .274725  .105033 | -.137462   .552867  -.690329
  6.industry |  .057377          0  .045952 |  .054507  -.219225   .273732
  7.industry |  .019126    .021978  .019694 | -.004083   .016423  -.020506
  8.industry |  .005464    .065934  .017505 | -.091714   .368871  -.460585
  9.industry |  .010929    .010989  .010941 | -.000115   .000462  -.000577
 10.industry |        0    .021978  .004376 | -.066227   .266363  -.332589
 11.industry |  .554645    .175824  .479212 |   .15083  -.606636   .757467
 12.industry |  .092896    .241758  .122538 | -.090299   .363181   -.45348
      2.race |  .243169    .681319  .330416 | -.185284   .745209  -.930494
      3.race |  .002732    .076923  .017505 | -.112525   .452572  -.565097
       south |   .29235    .318681  .297593 | -.011456   .046074   -.05753

             |        Variances             |           Ratio
             |  Matched  Unmatched    Total |  (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |  .219058    .219536  .218674 |  1.00176   1.00394   .997824
     ttl_exp |  19.4198    25.2474  20.5898 |  .943177   1.22621    .76918
      tenure |  38.3324    31.9242  37.2044 |  1.03032   .858076   1.20073
  3.industry |  .002732    .021734  .006536 |  .418045   3.32537   .125714
  4.industry |  .155101    .131624  .150351 |  1.03159   .875443   1.17837
  5.industry |  .059054    .201465  .094207 |  .626851   2.13854   .293122
  6.industry |  .054233          0  .043936 |  1.23435         0         .
  7.industry |  .018811    .021734  .019348 |  .972252    1.1233   .865531
  8.industry |   .00545    .062271  .017237 |  .316157   3.61269   .087513
  9.industry |  .010839    .010989  .010845 |  .999464   1.01328   .986361
 10.industry |        0    .021734  .004367 |        0   4.97709         0
 11.industry |  .247691     .14652  .250115 |  .990307   .585811   1.69049
 12.industry |  .084497    .185348  .107758 |  .784137   1.72003   .455885
      2.race |  .184542    .219536  .221726 |  .832297   .990121   .840601
      3.race |  .002732    .071795  .017237 |  .158513   4.16522   .038056
       south |  .207448    .219536   .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support statistics:

mat M = r(M)
coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing the standardized differences (x-axis from -.2 to .2) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]

Some Examples

Multiple outcome variables

kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |   Used  Unused   Total |  width
   Treated |   432     25      457  |   1104     291    1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .6021049
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |   Used  Unused   Total |  width
   Treated |   432     25      457  |   1104     291    1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .5152752
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |   Used  Unused   Total |  width
0          |
   Treated |   306     15      321  |    625     120     745 | 1.3199
1          |
   Treated |   126     10      136  |    473     178     651 | 1.3398

Treatment-effects estimation

           |  Observed  Bootstrap                    Normal-based
      wage |     Coef.  Std. Err.     z    P>|z|   [95% Conf. Interval]
0          |
       ATT |  .4586332  .2763358    1.66   0.097   -.082975   1.000241
1          |
       ATT |  .9518705   .406903    2.34   0.019   .1543553   1.749386

test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage |     Coef.  Std. Err.     z    P>|z|   [95% Conf. Interval]
       (1) |  .4932373  .4452343    1.11   0.268   -.379406   1.365881
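As a quick arithmetic check of the test and lincom results, the following Python sketch recomputes them from the two subgroup ATTs. It assumes that the extracted figures read with decimal points as .4586332, .9518705, and .4452343; the variable names are illustrative only and not part of kmatch.

```python
import math

# Values read from the output above; the decimal points are an assumption
att_south0 = 0.4586332
att_south1 = 0.9518705
se_diff = 0.4452343   # bootstrap standard error of the difference (lincom)

diff = att_south1 - att_south0          # point estimate reported by lincom
z = diff / se_diff                      # z statistic
chi2 = z ** 2                           # Wald chi2 with 1 df equals z squared
p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal p-value
ci = (diff - 1.959964 * se_diff, diff + 1.959964 * se_diff)

print(round(diff, 7), round(z, 2), round(chi2, 2), round(p, 4))
print(tuple(round(bound, 6) for bound in ci))
```

The difference, the z statistic, the chi2 statistic, and the normal-based confidence interval obtained this way can be compared directly with the lincom and test output.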


Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to working people between 24 and 60 years old.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score (0 to 1) for the untreated and the treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): −4.79

Population ATT (computed from fully stratified data): −3.96

There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, two panels: mean outcome (30 to 55) against the propensity score (0 to .8) for the untreated and the treated; treatment effect (−6 to −1) against the propensity score (0 to .8)]
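The population quantities on this slide can be cross-checked with the identity that the ATE is the average of ATT and ATC, weighted by the shares of treated and untreated. The sketch below assumes that the extracted figures read as a treated share of 17.5%, ATT = −3.96, and ATC = −3.41 (decimal points and the percent sign are restored by assumption):

```latex
\begin{align*}
\mathrm{ATE} &= \Pr(D = 1)\,\mathrm{ATT} + \Pr(D = 0)\,\mathrm{ATC} \\
             &= 0.175 \times (-3.96) + 0.825 \times (-3.41) \\
             &\approx -0.693 - 2.813 \approx -3.51
\end{align*}
```

Under these readings the result agrees with the ATE reported on the slide.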

Results: Variance

[Figure: variance of the estimates by matching algorithm, for MDM and PSM, each with and without bias correction; panels for N = 500 and N = 5000 (x-axes from 1.5 to 4.5). Algorithms shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm, for MDM and PSM, each with and without bias correction; panels for N = 500 (x-axis from 70 to 130) and N = 5000 (x-axis from 95 to 135). Algorithms shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error by matching algorithm, for MDM and PSM, each with and without bias correction; panels for N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Results: Relative standard error

[Figure: relative standard error, for MDM and PSM, each with and without bias correction; panels for N = 500 and N = 5000. Estimators shown: nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, for MDM and PSM, each with and without bias correction; panels for N = 500 and N = 5000. Estimators shown: nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
  - MDM leaves less scope for post-matching modeling decision biases.
  - Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  - Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
  - Choice of scaling matrix is largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
  - Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation
  - For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
  - Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do
  - Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
  - For each observation i in the treatment group, find the observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
  - For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
  - Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
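The three algorithms above can be sketched in a few lines of Python operating on a precomputed distance matrix (rows are treated units, columns are controls). This is a minimal illustration under stated assumptions, not the implementation used by kmatch or teffects: the function names and the toy distances are made up, and pair matching is done greedily in the given order of the treated units, which the slide does not specify.

```python
def nearest_neighbors(dist_row, k=1, caliper=None):
    """Indices of the k closest controls for one treated unit.

    All ties at the k-th smallest distance are kept; if a caliper is
    given, controls with a distance >= caliper are dropped.
    """
    order = sorted(range(len(dist_row)), key=lambda j: dist_row[j])
    if not order:
        return []
    cutoff = dist_row[order[min(k, len(order)) - 1]]
    matches = [j for j in order if dist_row[j] <= cutoff]
    if caliper is not None:
        matches = [j for j in matches if dist_row[j] < caliper]
    return matches


def pair_matching(dist):
    """Greedy one-to-one matching without replacement.

    Each treated unit takes the closest control that is still unused
    (assumption: treated units are processed in the given order).
    """
    used, pairs = set(), {}
    for i, row in enumerate(dist):
        free = [j for j in range(len(row)) if j not in used]
        if not free:
            break
        best = min(free, key=lambda j: row[j])
        used.add(best)
        pairs[i] = best
    return pairs


# Toy distance matrix: 2 treated units (rows), 4 controls (columns)
dist = [
    [0.1, 0.4, 0.4, 2.0],
    [0.2, 0.3, 0.9, 1.5],
]
print(nearest_neighbors(dist[0], k=2))               # ties at 0.4 are kept
print(nearest_neighbors(dist[1], k=3, caliper=0.5))  # caliper drops 0.9
print(pair_matching(dist))
```

Note that in pair matching the second treated unit cannot reuse control 0 even though it is its closest control, which is exactly what distinguishes it from nearest-neighbor matching with replacement.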

Mahalanobis Matching

Radius matching
  - Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
  - Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
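To make the difference between radius and kernel matching concrete, the following Python sketch computes normalized matching weights for a single treated unit. It is an illustration only: the function names, the toy distances, and the toy outcomes are assumptions, and the threshold c is used as the kernel bandwidth.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, otherwise 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(dist_row, bandwidth):
    """Normalized kernel-matching weights for one treated unit."""
    raw = [epanechnikov(d / bandwidth) for d in dist_row]
    total = sum(raw)
    if total == 0.0:  # no control within the bandwidth: unit stays unmatched
        return None
    return [w / total for w in raw]


def radius_weights(dist_row, threshold):
    """Radius matching: equal weight for every control with MD < threshold."""
    inside = [1.0 if d < threshold else 0.0 for d in dist_row]
    total = sum(inside)
    return None if total == 0.0 else [w / total for w in inside]


# Toy example: one treated unit and four controls
distances = [0.0, 0.5, 0.9, 1.2]
outcomes = [10.0, 12.0, 20.0, 40.0]

w_kernel = kernel_weights(distances, bandwidth=1.0)
w_radius = radius_weights(distances, threshold=1.0)

# Estimated counterfactual outcome = weighted mean of the control outcomes
y0_kernel = sum(w * y for w, y in zip(w_kernel, outcomes))
y0_radius = sum(w * y for w, y in zip(w_radius, outcomes))
print([round(w, 3) for w in w_kernel], round(y0_kernel, 3))
print([round(w, 3) for w in w_radius], round(y0_radius, 3))
```

Both methods ignore the fourth control (distance 1.2), but kernel matching puts most weight on the closest controls, whereas radius matching treats all controls within the threshold alike.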


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.

This is remarkable because the information in $X$, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
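The step from conditioning on X to conditioning on π(X) can be sketched with the law of iterated expectations. This follows the standard argument along the lines of Rosenbaum and Rubin (1983), written in the notation of the slide; it is a sketch rather than a complete proof.

```latex
\begin{align*}
\Pr\bigl(D = 1 \mid Y^0, Y^1, \pi(X)\bigr)
  &= E\bigl[\, E[D \mid Y^0, Y^1, X] \bigm| Y^0, Y^1, \pi(X) \bigr] \\
  &= E\bigl[\, \pi(X) \bigm| Y^0, Y^1, \pi(X) \bigr]
     \qquad \text{(conditional independence given } X\text{)} \\
  &= \pi(X)
\end{align*}
```

Because the result does not depend on the potential outcomes, the treatment is independent of them once π(X) is held fixed.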

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
  - Step 1: Estimate the propensity score, e.g. using a logit model.
  - Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular
  - https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
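The two-step procedure can be sketched in plain Python as follows. This is a toy illustration under stated assumptions: the data are made up, the logit model has a single covariate and is fitted by Newton-Raphson, and step 2 uses one-nearest-neighbor matching with replacement. None of the names below belong to kmatch or teffects.

```python
import math


def fit_logit(x, d, iterations=25):
    """Fit Pr(D = 1 | x) = 1 / (1 + exp(-(a + b * x))) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Negative Hessian (symmetric 2 x 2 matrix)
        h00 = sum(pi * (1.0 - pi) for pi in p)
        h01 = sum(pi * (1.0 - pi) * xi for pi, xi in zip(p, x))
        h11 = sum(pi * (1.0 - pi) * xi * xi for pi, xi in zip(p, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Toy data (hypothetical): covariate x, treatment d, outcome y
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = [0, 0, 1, 0, 1, 0, 1, 1]
y = [3.0, 4.0, 6.0, 5.0, 8.0, 6.5, 9.0, 10.0]

# Step 1: estimate the propensity score
a, b = fit_logit(x, d)
ps = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]

# Step 2: match each treated unit to the control with the closest score
treated = [i for i, di in enumerate(d) if di == 1]
controls = [j for j, dj in enumerate(d) if dj == 0]
match = {i: min(controls, key=lambda j: abs(ps[i] - ps[j])) for i in treated}

# ATT estimate: mean difference between treated and matched controls
att = sum(y[i] - y[match[i]] for i in treated) / len(treated)
print(round(b, 3), match, round(att, 3))
```

Any of the matching algorithms discussed earlier (caliper, radius, kernel) could replace the one-nearest-neighbor rule in step 2; only the distance changes from MD to the absolute difference in propensity scores.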


King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
  - http://j.mp/1sexgVw

Slides
  - https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
  - https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
  - Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
  - Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)

[Figure, shown in several builds: scatter plot of the outcome (0 to 12) against education in years (12 to 28) for treated (T) and control (C) units; slide 3/23 of the slides by King and Nielsen]

King and Nielsen

Argument 2
  - PSM approximates complete randomization.
  - Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

  Balance on covariates | Complete randomization | Fully blocked
  Observed              | On average             | Exact
  Unobserved            | On average             | On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
  - PSM: complete randomization
  - Other methods: fully blocked
  - Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two builds: age (20 to 80) against education in years (12 to 28) for treated (T) and control (C) units; slide 9/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching

[Figure, shown in two builds: age (20 to 80) against education in years (12 to 28) for treated (T) and control (C) units, with a propensity score scale from 0 to 1; slide 15/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching is Suboptimal

[Figure: age (20 to 80) against education in years (12 to 28) for treated (T) and control (C) units; slide 15/23 of the slides by King and Nielsen]

King and Nielsen

Argument 3
  - Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
  - More imbalance/variance means more model dependence and researcher discretion.
  - Because PSM approximates complete randomization, it engages in random pruning.
  - PSM Paradox ("when you do 'better', you do worse")
      - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
      - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
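The first point of Argument 3 can be illustrated with a small simulation. The sketch below is a hypothetical example, not taken from King and Nielsen: treated and control units are drawn from the same N(0, 1) covariate distribution (so there is balance in expectation), and the average absolute difference in means is computed for shrinking group sizes, which is what pruning at random amounts to.

```python
import random
import statistics


def mean_abs_imbalance(n_per_group, replications=500, seed=12345):
    """Average absolute difference in covariate means between two groups
    of size n_per_group drawn from the same N(0, 1) distribution."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(replications):
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diffs.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
    return statistics.fmean(diffs)


# Fewer observations per group -> larger expected imbalance
for n in (200, 100, 50, 25):
    print(n, round(mean_abs_imbalance(n), 3))
```

In this setup the expected absolute difference is 2 / sqrt(pi * n), so it roughly doubles each time the group size is cut to a quarter; pruning at random buys no balance and costs precision.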

PSM Increases Model Dependence & Bias

[Figure with two panels, each comparing MDM and PSM against the number of units pruned (0 to 160). Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications; true effect = 2. Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Source: slide 20/23 of the slides by King and Nielsen.]

The Propensity Score Paradox in Real Data

[Figure with two panels, "Finkel et al. (JOP 2012)" and "Nielsen et al. (AJPS 2011)". x-axis: number of units pruned; y-axis: imbalance. Shown: CEM, MDM, PSM, random pruning, the raw data, and the 1/4 SD caliper. Source: slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
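The two limiting cases can be made concrete with a fully stratified propensity score, i.e., the share of treated units within each covariate cell. The Python sketch below uses made-up toy data and a hypothetical helper name; it only illustrates that the propensity score is constant when X is unrelated to T and separates the cells when X affects T.

```python
from collections import defaultdict

def stratified_propensity(rows):
    """rows: iterable of (x, t), with x a hashable covariate cell and t in {0, 1}.
    Returns {cell: share of treated units in that cell}."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n, n_treated]
    for x, t in rows:
        counts[x][0] += 1
        counts[x][1] += t
    return {x: n_t / n for x, (n, n_t) in counts.items()}

# X unrelated to T: the same share is treated in every cell,
# so all observations share one propensity score (complete randomization).
unrelated = [("low", 1), ("low", 0), ("high", 1), ("high", 0)]

# X strongly related to T: the cells get different propensity scores,
# so matching on the score blocks on X.
related = [("low", 0), ("low", 0), ("low", 0), ("low", 1),
           ("high", 1), ("high", 1), ("high", 1), ("high", 0)]

ps_unrelated = stratified_propensity(unrelated)
ps_related = stratified_propensity(related)
print(ps_unrelated)
print(ps_related)
```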

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

What is true is that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
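The difference between averaging and pruning can be sketched in code. The following Python example is a simplified illustration of kernel matching on a one-dimensional score with an Epanechnikov kernel; it is not kmatch's implementation, and the function names and toy data are assumptions. Every control within the bandwidth contributes to the counterfactual (no good match is thrown away), whereas treated units without any control within the bandwidth are dropped.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_att(treated, controls, bandwidth):
    """treated, controls: lists of (score, outcome).
    Returns (ATT estimate, number of treated units with at least one match)."""
    diffs = []
    for s_t, y_t in treated:
        weights = [epan((s_c - s_t) / bandwidth) for s_c, _ in controls]
        w_sum = sum(weights)
        if w_sum == 0.0:
            continue  # no control within the bandwidth: unit is off support
        # Weighted average of all close controls instead of a single match.
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / w_sum
        diffs.append(y_t - y0_hat)
    if not diffs:
        raise ValueError("no treated unit could be matched")
    return sum(diffs) / len(diffs), len(diffs)

treated = [(0.50, 10.0), (0.90, 12.0)]
controls = [(0.48, 8.0), (0.52, 9.0), (0.50, 8.5), (0.10, 3.0)]
att, n_matched = kernel_att(treated, controls, bandwidth=0.05)
print(att, n_matched)
```

In this toy data the first treated unit is compared with the average of three equally good controls, while pair matching without replacement would use only one of them and discard the other two.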


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
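To make the role of the scaling matrix in MDM concrete, here is a minimal Python sketch of the Mahalanobis distance for two covariates. It is not kmatch code; the function name and the example matrices are assumptions chosen for illustration.

```python
import math

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between points a and b (each a pair (x1, x2)),
    given a 2x2 scaling (covariance) matrix cov = ((s11, s12), (s21, s22))."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    d1, d2 = a[0] - b[0], a[1] - b[1]
    # Quadratic form d' * inverse(cov) * d, with the 2x2 inverse written out.
    q = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(q)

# With the identity matrix the distance is Euclidean.
d_identity = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), ((1.0, 0.0), (0.0, 1.0)))
# A variance of 4 on the first covariate halves differences on that covariate.
d_scaled = mahalanobis_2d((0.0, 0.0), (2.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))
print(d_identity, d_scaled)
```

The choice of scaling matrix determines how differences on each covariate are weighted against each other when matches are selected.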

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs =       1853
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      432      25     457     1105     291    1396     1.3394

Treatment-effects estimation

wage         Coef.
ATT       .6059013
NATE      1.432913

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                         Raw                        Matched (ATT)
Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad      .321663   .224212   .219912    .319444   .319444         0
ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry    .006565   .012178  -.058246     .00463    .00463         0
4.industry    .183807   .166905   .044425    .185185   .185185         0
5.industry    .105033   .027937   .312944    .085648   .085648         0
6.industry    .045952   .169771  -.407129    .048611   .048611         0
7.industry    .019694   .102436  -.350657    .020833   .020833         0
8.industry    .017505   .035817  -.113785    .009259   .009259         0
9.industry    .010941   .040115  -.185669    .011574   .011574         0
10.industry   .004376   .008596  -.052551    .002315   .002315         0
11.industry   .479212   .356734   .250073    .506944   .506944         0
12.industry   .122538    .07235   .169707     .12037    .12037         0
2.race        .330416   .244986   .189418      .3125     .3125         0
3.race        .017505   .011461   .050566    .006944   .006944         0
south         .297593   .466332  -.352408    .291667   .291667         0

                         Raw                        Matched (ATT)
Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad      .218674   .174066   1.25628    .217904   .217904         1
ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry    .006536   .012038   .542928    .004619   .004619         1
4.industry    .150351   .139148   1.08052    .151242   .151242         1
5.industry    .094207   .027176   3.46656    .078494   .078494         1
6.industry    .043936    .14105   .311496    .046355   .046355         1
7.industry    .019348   .092008   .210287    .020447   .020447         1
8.industry    .017237   .034559   .498769    .009195   .009195         1
9.industry    .010845   .038533   .281445    .011467   .011467         1
10.industry   .004367   .008528   .512039    .002315   .002315         1
11.industry   .250115   .229639   1.08917    .250532   .250532         1
12.industry   .107758   .067163   1.60443    .106127   .106127         1
2.race        .221726     .1851   1.19787    .215342   .215342         1
3.race        .017237   .011338   1.52025    .006912   .006912         1
south          .20949   .249045   .841173    .207077   .207077         1
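The two balance measures in the tables can be checked by hand. The Python sketch below recomputes the raw standardized difference and variance ratio for collgrad from the reported means (.321663, .224212) and variances (.218674, .174066). The formula for the standardized difference, the mean difference divided by the square root of the average of the two group variances, is an assumption here; it reproduces the reported figure.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, scaled by the root of the average variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def var_ratio(var_t, var_c):
    """Ratio of the treated variance to the untreated variance."""
    return var_t / var_c

# Raw-sample values for collgrad as reported by kmatch summarize.
sd = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr = var_ratio(0.218674, 0.174066)
print(f"StdDif = {sd:.6f}, Ratio = {vr:.5f}")
```

A standardized difference close to 0 and a variance ratio close to 1 indicate good balance on the covariate.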

Some Examples

Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad through south) and two panels, "Std. mean difference" and "Variance ratio", each comparing raw and matched values.]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching               Number of obs =       1853
                                               Kernel        =       epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      431      26     457     1214     182    1396     .00188

Treatment-effects estimation

wage         Coef.
ATT       .3887224
NATE      1.432913

Some Examples

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel densities of the propensity score for untreated and treated units, in two panels: "Raw" and "Matched (ATT)". x-axis: propensity score; y-axis: density.]

Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distributions of the propensity score for untreated and treated units, in two panels: "Raw" and "Matched (ATT)". x-axis: propensity score; y-axis: cumulative probability.]

Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units, in two panels: "Raw" and "Matched (ATT)".]

Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs =       1853
                                               Replications  =         50
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      432      25     457     1105     291    1396     1.3394
Untreated   1386      10    1396      455       2     457     3.3975
Combined    1818      35    1853     1560     293    1853          .

Treatment-effects estimation

          Observed   Bootstrap                      Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE       .4095729    .1920853   2.13    0.033    .0330928   .7860531
ATT       .6059013    .2472069   2.45    0.014    .1213846   1.090418
ATC       .3483797    .1893653   1.84    0.066   -.0227695   .7195289
NATE      1.432913    .2333282   6.14    0.000    .9755981   1.890228

Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      -.8270117    .1810415  -4.57    0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
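The lincom line can be reproduced from its coefficient and standard error alone. The Python sketch below is an illustration, not Stata's implementation; the helper name is hypothetical. It computes the z statistic, the two-sided p-value, and the normal-based 95% confidence interval for ATT - NATE.

```python
from statistics import NormalDist

def wald(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p, (coef - crit * se, coef + crit * se)

# Coefficient and bootstrap standard error of ATT - NATE from the lincom output.
z, p, (lo, hi) = wald(-0.8270117, 0.1810415)
print(f"z = {z:.2f}, p = {p:.3f}, CI = [{lo:.6f}, {hi:.6f}]")
```

The standard error itself cannot be recomputed this way, since it depends on the bootstrap covariance between ATT and NATE.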

Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                               Number of obs  =      1853
                                               Neighbors: min =         1
Treatment   : union = 1                                   max =         1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      457       0     457      328    1068    1396          .

Treatment-effects estimation

wage         Coef.
ATT       .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      =      1853
Estimator      : nearest-neighbor matching     Matches: requested =         1
Outcome model  : matching                                     min =         1
Distance metric: Mahalanobis                                  max =         1

                                   AI Robust
wage                      Coef.    Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .7246969    .2942952   2.46    0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                               Number of obs  =      1853
                                               Neighbors: min =         5
Treatment   : union = 1                                   max =         5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      457       0     457      870     526    1396          .

Treatment-effects estimation

wage         Coef.
ATT       .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      =      1853
Estimator      : nearest-neighbor matching     Matches: requested =         5
Outcome model  : matching                                     min =         5
Distance metric: Mahalanobis                                  max =         6

                                   AI Robust
wage                      Coef.    Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5590823    .2381752   2.35    0.019    .0922675   1.025897

Some Examples

Bias-correction / regression adjustment:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                               Number of obs  =      1853
                                               Neighbors: min =         5
Treatment   : union = 1                                   max =         5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      457       0     457      870     526    1396          .

Treatment-effects estimation

wage         Coef.
ATT       .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      =      1853
Estimator      : nearest-neighbor matching     Matches: requested =         5
Outcome model  : matching                                     min =         5
Distance metric: Mahalanobis                                  max =         6

                                   AI Robust
wage                      Coef.    Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5288023    .2420635   2.18    0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs =       1853
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      439      18     457     1258     138    1396     .83886

Treatment-effects estimation

wage         Coef.
ATT       .6408443

Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs =       1853
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      432      25     457     1103     293    1396     1.3013

Treatment-effects estimation

wage         Coef.
ATT       .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs =       1853
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      432      25     457     1105     291    1396     1.3394

Treatment-effects estimation

wage         Coef.
ATT       .6059013

Some Examples

Bandwidth selection: cross validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs =       1853
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      448       9     457     1184     212    1396     1.8888

Treatment-effects estimation

wage         Coef.
ATT       .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth, with points labeled by evaluation index.]

Some Examples

Bandwidth selection: cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs =       1853
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      453       4     457     1289     107    1396      2.433

Treatment-effects estimation

wage         Coef.
ATT       .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth, with points labeled by evaluation index.]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs =       1853
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      455       2     457     1356      40    1396     2.7626

Treatment-effects estimation

wage         Coef.
ATT       .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth, with points labeled by evaluation index.]

Some Examples

Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs =       1853
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      366      91     457      701     695    1396         .5

Treatment-effects estimation

wage         Coef.
ATT       .3303161

. kmatch csummarize
(refitting the model using the generate() option)

               Common support (treated)       Standardized difference
Means         Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad      .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp       13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure        8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry    .057377         0   .045952    .054507  -.219225   .273732
7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry         0   .021978   .004376   -.066227   .266363  -.332589
11.industry   .554645   .175824   .479212     .15083  -.606636   .757467
12.industry   .092896   .241758   .122538   -.090299   .363181   -.45348
2.race        .243169   .681319   .330416   -.185284   .745209  -.930494
3.race        .002732   .076923   .017505   -.112525   .452572  -.565097
south          .29235   .318681   .297593   -.011456   .046074   -.05753

               Common support (treated)                Ratio
Variances     Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad      .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp       19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure        38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
6.industry    .054233         0   .043936    1.23435         0         .
7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
10.industry         0   .021734   .004367          0   4.97709         0
11.industry   .247691    .14652   .250115    .990307   .585811   1.69049
12.industry   .084497   .185348   .107758    .784137   1.72003   .455885
2.race        .184542   .219536   .221726    .832297   .990121   .840601
3.race        .002732   .071795   .017237    .158513   4.16522   .038056
south         .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference for each covariate (collgrad through south), with a reference line at 0.]

Some Examples

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs =       1852
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      432      25     457     1104     291    1395     1.3392

Treatment-effects estimation

             Coef.
wage
  ATT     .6021049
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs =       1852
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
Treated      432      25     457     1104     291    1395     1.3392

Treatment-effects estimation

             Coef.
wage
  ATT     .5152752
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs =       1853
                                               Replications  =         50
                                               Kernel        =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                   Matched                 Controls            Band-
             Yes      No   Total     Used  Unused   Total      width
0
Treated      306      15     321      625     120     745     1.3199
1
Treated      126      10     136      473     178     651     1.3398

Treatment-effects estimation

          Observed   Bootstrap                      Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
ATT       .4586332    .2763358   1.66    0.097    -.082975   1.000241
1
ATT       .9518705     .406903   2.34    0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)       .4932373    .4452343   1.11    0.268    -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated units. x-axis: propensity score, 0 to 1; y-axis: density. McFadden R2 = 0.121.]

Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure with two panels, both plotted against the propensity score. First panel: mean outcome for untreated and treated units. Second panel: treatment effect.]
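The three population effects are linked by a simple identity: the ATE is the average of the ATT and the ATC, weighted by the shares of treated and untreated units. The Python sketch below checks this for the reported values, assuming a treated share of 17.5% (the share of resident aliens in the population); the function name is hypothetical.

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the share-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc

# Population values from the simulation setup.
ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)
raw_bias = -4.79 - (-3.96)  # NATE minus ATT: bias of the naive estimate

print(f"ATE = {ate:.2f}, bias of NATE relative to ATT = {raw_bias:.2f}")
```

The result agrees with the reported ATE of −3.51 up to rounding, and the difference between NATE and ATT is the bias that the matching estimators are supposed to remove.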

Results: Variance

[Figure: variance of the ATT estimates for N = 500 and N = 5000, by algorithm: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, by algorithm: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the ATT estimates for N = 500 and N = 5000, by algorithm: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000, by algorithm: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, by algorithm: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]


Conclusions

The arguments brought forward by King and Nielsen againstPropensity Score Matching are valid but they mostly apply to onespecific form of PSM pair matching (one-to-one matching withoutreplacement)

Other PSM matching algorithms perform much better because theyare less affected by the random pruning problem

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 65

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation
- For PSM, application of regression-adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression-adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
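The first simulation conclusion can be tried out on example data. The following is a minimal sketch, not part of the original talk: it assumes kmatch is installed (ssc install kmatch) and that the "(outcome = adjustment variables)" syntax shown for kmatch md in the examples also applies to kmatch ps. The covariate list is simply the one used in the examples.

```stata
* Sketch only: PSM kernel matching without and with regression adjustment.
* Assumption: kmatch ps accepts the same (outcome = adjustment vars) syntax as kmatch md.
sysuse nlsw88, clear
drop if industry==2

* plain propensity-score kernel matching
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att

* same matching, plus regression adjustment using the same covariates
kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    (wage = collgrad ttl_exp tenure i.industry i.race south), att
```

The `///` line continuation works in a do-file; when typing interactively, the command has to be entered on one line.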

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression-adjustment to the matched data (also known as "bias-adjustment" in the context of nearest-neighbor matching).
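The two weighting schemes can be written out compactly. This is a sketch in notation introduced here for illustration (the weights w_ij, the bandwidth h, and the scaling matrix Σ are assumptions, not symbols from the slides); i indexes a treated unit and j a control.

```latex
\begin{align*}
MD(X_i, X_j) &= \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)} \\
\text{radius matching:}\quad w_{ij} &\propto \mathbf{1}\{ MD(X_i, X_j) < c \} \\
\text{kernel matching:}\quad w_{ij} &\propto K\!\left( \frac{MD(X_i, X_j)}{h} \right),
  \qquad K(u) = \tfrac{3}{4}\,(1 - u^2)\,\mathbf{1}\{ |u| \le 1 \} \\
\hat{Y}^0_i &= \frac{\sum_{j:\, D_j = 0} w_{ij}\, Y_j}{\sum_{j:\, D_j = 0} w_{ij}}
\end{align*}
```

Here K is the Epanechnikov kernel, and the last line is the weighted average of control outcomes that serves as the estimated counterfactual for treated unit i. Radius matching is the special case of a uniform kernel with h = c.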

3 Propensity Score Matching

The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$

where $\pi(X)$ is called the propensity score.

That is, $(Y^0, Y^1) \perp\!\!\!\perp D \mid X$ implies $(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$,

so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.

This is remarkable because the information in $X$, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
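Why conditioning on π(X) suffices can be seen from a short iterated-expectations argument. The lines below are a sketch in the notation of the slide, not a reproduction of Rosenbaum and Rubin's full proof.

```latex
\begin{align*}
\Pr\big(D = 1 \mid Y^0, Y^1, \pi(X)\big)
  &= E\big[\, \Pr(D = 1 \mid Y^0, Y^1, X) \,\big|\, Y^0, Y^1, \pi(X) \big] \\
  &= E\big[\, \pi(X) \,\big|\, Y^0, Y^1, \pi(X) \big] \\
  &= \pi(X)
\end{align*}
```

The first step holds because π(X) is a function of X, and the second uses conditional independence given X. Since the result does not depend on the potential outcomes, D is independent of (Y^0, Y^1) given π(X).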

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
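The two steps of the procedure can be sketched with official Stata commands. This is an illustration only: the covariate list is an arbitrary choice, and pscore is a hypothetical variable name.

```stata
* Sketch of the two-step PSM procedure (illustrative covariates).
sysuse nlsw88, clear

* Step 1: estimate the propensity score with a logit model
logit union collgrad ttl_exp tenure south
predict double pscore if e(sample), pr    // pscore: hypothetical variable name

* Step 2: nearest-neighbor matching on the propensity score
* (teffects psmatch fits the logit model itself, so Step 1 is shown only to
*  make the estimated score visible)
teffects psmatch (wage) (union collgrad ttl_exp tenure south), atet nneighbor(5)
```

Inspecting `summarize pscore if union==1` and `summarize pscore if union==0` after Step 1 gives a first impression of how much the two groups overlap.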

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
- http://j.mp/1sexgVw

Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs

The story goes about as follows.

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)

[Figure, shown in several animation steps: outcome plotted against education (years) for treated (T) and control (C) units. Slide 3/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments (balance on covariates)

    Covariates    Complete randomization    Fully blocked
    Observed      On average                Exact
    Unobserved    On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps: age plotted against education (years) for treated (T) and control (C) units. Slide 9/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure, shown in two animation steps: age plotted against education (years) for treated (T) and control (C) units, with the propensity score (0 to 1) shown on a separate axis. Slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: age plotted against education (years) for treated (T) units and a reduced set of control (C) units. Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse")
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
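The first step of this argument can be made explicit with a stylized calculation. Assume, purely for illustration, a single covariate X with variance σ² that is unrelated to treatment, independent observations, and n_T treated and n_C control units left after pruning (these symbols are introduced here and are not taken from King and Nielsen).

```latex
\begin{align*}
E\big[ (\bar{X}_T - \bar{X}_C)^2 \big]
  = \operatorname{Var}(\bar{X}_T - \bar{X}_C)
  = \sigma^2 \left( \frac{1}{n_T} + \frac{1}{n_C} \right)
\end{align*}
```

Under these assumptions the two group means have the same expectation, so the expected squared imbalance equals the variance of their difference. Deleting observations at random lowers n_T and n_C and therefore raises the expected imbalance, without removing any systematic difference.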

PSM Increases Model Dependence & Bias

[Figure with two panels comparing MDM and PSM, both plotted against the number of units pruned. Left panel "Model Dependence": variance. Right panel "Bias": maximum coefficient across 512 specifications; true effect = 2. Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen.]

The Propensity Score Paradox in Real Data

[Figure with two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011), each plotting imbalance against the number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and the 1/4 SD caliper. Slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.

5 Are King and Nielsen right?

Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation; although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
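Matching algorithms that keep good matches can be compared side by side on example data. The sketch below reuses commands and options that appear in the kmatch examples of this talk; that nn(5) can be combined with kmatch ps in the same way as with kmatch md is an assumption here.

```stata
* Sketch only: two PSM algorithms that do not discard good matches.
* Assumes kmatch is installed (ssc install kmatch).
sysuse nlsw88, clear
drop if industry==2

* kernel matching on the propensity score: all controls within the bandwidth
* contribute, weighted by closeness
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att

* nearest-neighbor matching on the propensity score with 5 neighbors
* (assumption: nn() works with kmatch ps as it does with kmatch md)
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)
```

The "Used" and "Unused" columns of the matching statistics show how many controls each algorithm retains.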

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment/bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry:

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                 Controls           Band-
                  Yes    No  Total     Used  Unused  Total      width
        Treated   432    25    457     1105     291   1396     1.3394

    Treatment-effects estimation

        wage        Coef.
         ATT     .6059013
        NATE     1.432913

Some Examples

Some balancing statistics:

    . kmatch summarize
    (refitting the model using the generate() option)

                           Raw                       Matched (ATT)
    Means         Treated  Untreated    StdDif    Treated  Untreated    StdDif
    collgrad      .321663    .224212   .219912    .319444    .319444         0
    ttl_exp       13.2685    12.7323   .117584    13.3205    13.1425   .039036
    tenure        7.89205    6.17658    .29735    7.91744    7.58347   .057888
    3.industry    .006565    .012178  -.058246     .00463     .00463         0
    4.industry    .183807    .166905   .044425    .185185    .185185         0
    5.industry    .105033    .027937   .312944    .085648    .085648         0
    6.industry    .045952    .169771  -.407129    .048611    .048611         0
    7.industry    .019694    .102436  -.350657    .020833    .020833         0
    8.industry    .017505    .035817  -.113785    .009259    .009259         0
    9.industry    .010941    .040115  -.185669    .011574    .011574         0
    10.industry   .004376    .008596  -.052551    .002315    .002315         0
    11.industry   .479212    .356734   .250073    .506944    .506944         0
    12.industry   .122538     .07235   .169707     .12037     .12037         0
    2.race        .330416    .244986   .189418      .3125      .3125         0
    3.race        .017505    .011461   .050566    .006944    .006944         0
    south         .297593    .466332  -.352408    .291667    .291667         0

                           Raw                       Matched (ATT)
    Variances     Treated  Untreated     Ratio    Treated  Untreated     Ratio
    collgrad      .218674    .174066   1.25628    .217904    .217904         1
    ttl_exp       20.5898    21.0001   .980459    19.8177    18.2323   1.08696
    tenure        37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
    3.industry    .006536    .012038   .542928    .004619    .004619         1
    4.industry    .150351    .139148   1.08052    .151242    .151242         1
    5.industry    .094207    .027176   3.46656    .078494    .078494         1
    6.industry    .043936     .14105   .311496    .046355    .046355         1
    7.industry    .019348    .092008   .210287    .020447    .020447         1
    8.industry    .017237    .034559   .498769    .009195    .009195         1
    9.industry    .010845    .038533   .281445    .011467    .011467         1
    10.industry   .004367    .008528   .512039    .002315    .002315         1
    11.industry   .250115    .229639   1.08917    .250532    .250532         1
    12.industry   .107758    .067163   1.60443    .106127    .106127         1
    2.race        .221726      .1851   1.19787    .215342    .215342         1
    3.race        .017237    .011338   1.52025    .006912    .006912         1
    south          .20949    .249045   .841173    .207077    .207077         1

Some Examples

Make a graph of the balancing statistics:

    . mat M = r(M)

    . mat V = r(V)

    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)

    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

    . addplot 2: , xline(1) norescaling

[Figure: two panels, "Std. mean difference" and "Variance ratio", showing raw and matched values for each covariate (collgrad, ttl_exp, tenure, the industry and race indicators, south).]

Some Examples

Propensity-score kernel matching:

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1853
                                               Kernel        = epan
    Treatment  : union = 1
    Covariates : collgrad ttl_exp tenure i.industry i.race south
    PS model   : logit (pr)

    Matching statistics

                     Matched                 Controls           Band-
                  Yes    No  Total     Used  Unused  Total      width
        Treated   431    26    457     1214     182   1396     .00188

    Treatment-effects estimation

        wage        Coef.
         ATT     .3887224
        NATE     1.432913

Some Examples

Kernel density balancing plot:

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated units; panels "Raw" and "Matched (ATT)".]

Cumulative distribution balancing plot:

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated units; panels "Raw" and "Matched (ATT)".]

Balancing box plot:

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units; panels "Raw" and "Matched (ATT)".]

Some Examples

Standard errors:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                 Controls           Band-
                  Yes    No  Total     Used  Unused  Total      width
        Treated   432    25    457     1105     291   1396     1.3394
      Untreated  1386    10   1396      455       2    457     3.3975
       Combined  1818    35   1853     1560     293   1853

    Treatment-effects estimation

               Observed   Bootstrap                      Normal-based
        wage      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
         ATE   .4095729    .1920853   2.13    0.033    .0330928   .7860531
         ATT   .6059013    .2472069   2.45    0.014    .1213846   1.090418
         ATC   .3483797    .1893653   1.84    0.066   -.0227695   .7195289
        NATE   1.432913    .2333282   6.14    0.000    .9755981   1.890228

Do some tests:

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

        wage      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
         (1)  -.8270117    .1810415  -4.57    0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 1
                                                          max = 1
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                 Controls           Band-
                  Yes    No  Total     Used  Unused  Total      width
        Treated   457     0    457      328    1068   1396

    Treatment-effects estimation

        wage        Coef.
         ATT     .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation                 Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 1
    Outcome model  : matching                                   min = 1
    Distance metric: Mahalanobis                                max = 1

                                      AI Robust
                    wage      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion)  .7246969    .2942952   2.46    0.014     .147889   1.301505

Nearest-neighbor matching (5 neighbors):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
                                                          max = 5
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                 Controls           Band-
                  Yes    No  Total     Used  Unused  Total      width
        Treated   457     0    457      870     526   1396

    Treatment-effects estimation

        wage        Coef.
         ATT     .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation                 Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 5
    Outcome model  : matching                                   min = 5
    Distance metric: Mahalanobis                                max = 6

                                      AI Robust
                    wage      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion)  .5590823    .2381752   2.35    0.019    .0922675   1.025897

Bias-correction (regression adjustment):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
                                                          max = 5
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                 Controls           Band-
                  Yes    No  Total     Used  Unused  Total      width
        Treated   457     0    457      870     526   1396

    Treatment-effects estimation

        wage        Coef.
         ATT     .5288023
    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation                 Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 5
    Outcome model  : matching                                   min = 5
    Distance metric: Mahalanobis                                max = 6

                                      AI Robust
                    wage      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion)  .5288023    .2420635   2.18    0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
> psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model  : logit (pr)
PS covars : i.industry i.race south

Matching statistics

          |        Matched        |       Controls        | Band-
          |   Yes     No   Total  |  Used  Unused  Total  | width
  Treated |   439     18     457  |  1258     138   1396  | .83886

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure
Exact     : industry race south

Matching statistics

          |        Matched        |       Controls        | Band-
          |   Yes     No   Total  |  Used  Unused  Total  | width
  Treated |   432     25     457  |  1103     293   1396  | 1.3013

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |        Matched        |       Controls        | Band-
          |   Yes     No   Total  |  Used  Unused  Total  | width
  Treated |   432     25     457  |  1105     291   1396  | 1.3394

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6059013
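The idea behind the kernel-matching output can be sketched in a few lines: each treated unit is compared with a kernel-weighted average of the controls that lie within the bandwidth, and treated units with no control inside the bandwidth stay unmatched (the "Matched: No" column). The sketch below is only an illustration under simplifying assumptions: it uses a scalar distance |x_i - x_j| instead of the Mahalanobis distance, made-up data, and the hypothetical helpers `epan` and `att_kernel`. It is not how kmatch is implemented.

```python
def epan(u):
    """Epanechnikov kernel: positive only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def att_kernel(treated, controls, h):
    """Kernel-matching ATT and the number of unmatched treated units.

    `treated` and `controls` are lists of (x, y) pairs; `h` is the bandwidth.
    """
    effects, unmatched = [], 0
    for x_i, y_i in treated:
        weights = [epan((x_j - x_i) / h) for x_j, _ in controls]
        total = sum(weights)
        if total == 0.0:
            unmatched += 1  # no control lies within the bandwidth
            continue
        # Counterfactual outcome: kernel-weighted mean of control outcomes.
        y0_hat = sum(w * y_j for w, (_, y_j) in zip(weights, controls)) / total
        effects.append(y_i - y0_hat)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, unmatched

# Made-up (x, y) data for illustration only.
treated = [(1.0, 5.0), (2.0, 7.0), (9.0, 20.0)]
controls = [(0.5, 3.0), (1.5, 4.0), (2.5, 6.0)]
print(att_kernel(treated, controls, h=1.0))
```

A larger bandwidth matches more treated units and uses more controls, at the price of comparing less similar units, which is the trade-off the bandwidth-selection options address.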

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |        Matched        |       Controls        | Band-
          |   Yes     No   Total  |  Used  Unused  Total  | width
  Treated |   448      9     457  |  1184     212   1396  | 1.8888

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; markers are labeled with the index of the evaluated bandwidth]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |        Matched        |       Controls        | Band-
          |   Yes     No   Total  |  Used  Unused  Total  | width
  Treated |   453      4     457  |  1289     107   1396  | 2.433

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; markers are labeled with the index of the evaluated bandwidth]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |        Matched        |       Controls        | Band-
          |   Yes     No   Total  |  Used  Unused  Total  | width
  Treated |   455      2     457  |  1356      40   1396  | 2.7626

Treatment-effects estimation

  wage |     Coef.
   ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; markers are labeled with the index of the evaluated bandwidth]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(0.5)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |        Matched        |       Controls        | Band-
          |   Yes     No   Total  |  Used  Unused  Total  | width
  Treated |   366     91     457  |   701     695   1396  |    .5

Treatment-effects estimation

  wage |     Coef.
   ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

      Means |  Matched  Unmatc~d     Total |  Standardized difference
            |      (1)       (2)       (3) |  (1)-(3)   (2)-(3)   (1)-(2)
   collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
    ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
     tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
 3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
 4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
 5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
 6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
 7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
 8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
 9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
     2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
     3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
      south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

Common support (treated)

  Variances |  Matched  Unmatc~d     Total |  Ratio
            |      (1)       (2)       (3) |  (1)/(3)   (2)/(3)   (1)/(2)
   collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
    ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
     tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
 3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
 4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
 5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
 6.industry |  .054233         0   .043936 |  1.23435         0         .
 7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
 8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
 9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
10.industry |        0   .021734   .004367 |        0   4.97709         0
11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
     2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
     3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
      south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences (1)-(3) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south, with a reference line at 0]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
> (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |        Matched        |       Controls        | Band-
          |   Yes     No   Total  |  Used  Unused  Total  | width
  Treated |   432     25     457  |  1104     291   1395  | 1.3392

Treatment-effects estimation

        |     Coef.
 wage   |
    ATT |  .6021049
   NATE |  1.430823
 hours  |
    ATT |  1.263759
   NATE |  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
> (wage = collgrad ttl_exp tenure)
> (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |        Matched        |       Controls        | Band-
          |   Yes     No   Total  |  Used  Unused  Total  | width
  Treated |   432     25     457  |  1104     291   1395  | 1.3392

Treatment-effects estimation

        |     Coef.
 wage   |
    ATT |  .5152752
   NATE |  1.430823
 hours  |
    ATT |  1.263759
   NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
> att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching

Number of obs = 1853
Replications  = 50
Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

          |        Matched        |       Controls        | Band-
          |   Yes     No   Total  |  Used  Unused  Total  | width
 0        |                       |                       |
  Treated |   306     15     321  |   625     120    745  | 1.3199
 1        |                       |                       |
  Treated |   126     10     136  |   473     178    651  | 1.3398

Treatment-effects estimation

        |  Observed  Bootstrap                   Normal-based
   wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
 0      |
    ATT |  .4586332  .2763358   1.66   0.097   -.082975   1.000241
 1      |
    ATT |  .9518705   .406903   2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =    1.23
     Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

   wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
    (1) |  .4932373  .4452343   1.11   0.268   -.379406   1.365881
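The test and lincom results are two views of the same Wald statistic, which can be verified by hand. The following sketch recomputes z, chi2(1), the two-sided p-value, and the normal-based 95% confidence interval from the coefficient and bootstrap standard error reported by lincom (0.4932373 and 0.4452343). The helper name `wald_summary` and the rounded critical value 1.959964 are choices made for this illustration, not part of kmatch or Stata.

```python
import math

def wald_summary(coef, se, z_crit=1.959964):
    """z statistic, chi-squared(1) statistic, two-sided p-value, and 95% CI."""
    z = coef / se
    chi2 = z * z
    # Two-sided p-value from the standard normal CDF, written via erf.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    ci = (coef - z_crit * se, coef + z_crit * se)
    return z, chi2, p, ci

# Difference [1]ATT - [0]ATT and its bootstrap standard error from lincom.
z, chi2, p, (lo, hi) = wald_summary(0.4932373, 0.4452343)
print(f"z = {z:.2f}, chi2(1) = {chi2:.2f}, p = {p:.3f}, CI = [{lo:.4f}, {hi:.4f}]")
```

Because chi2(1) is simply z squared, the two commands necessarily lead to the same p-value.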

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated]

McFadden R2 = 0.121

Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41)

[Figure, left panel: outcome against propensity score for untreated and treated]
[Figure, right panel: treatment effect against propensity score]

Results: Variance

[Figure: variance of the estimates for N = 500 and N = 5000, by algorithm (nearest-neighbor matching with 1 or 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, by algorithm (nearest-neighbor matching with 1 or 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000, by algorithm (nearest-neighbor matching with 1 or 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000, for nearest-neighbor matching (teffects) and nearest-neighbor matching (bootstrap), each with 1 or 5 neighbors, and for kernel matching (bootstrap) with the five bandwidth choices, for MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, for the same estimators as in the relative standard error figure]

7 Conclusions

Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


3 Propensity Score Matching

The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under "strong ignorability", the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.

This is remarkable because the information in $X$, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
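The dimension reduction rests on the balancing property of the propensity score: among units with the same π(X), the distribution of X is the same for treated and untreated. This can be checked by hand on a toy population. The sketch below assumes a hypothetical discrete covariate with four values and made-up population shares and propensity scores (none of these numbers come from the slides), and uses exact fractions so that the comparison involves no rounding.

```python
from fractions import Fraction as F

# Hypothetical population: share of each covariate value x and its
# propensity score pi(x) = Pr(D = 1 | X = x). Illustrative numbers only.
share = {"a": F(1, 10), "b": F(3, 10), "c": F(2, 10), "d": F(4, 10)}
pscore = {"a": F(1, 5), "b": F(1, 5), "c": F(3, 5), "d": F(3, 5)}

def dist_given_score(p, treated):
    """Distribution of X among treated (or control) units with pi(X) = p."""
    # Joint mass of (X = x, D = d) is share[x] * pi(x) or share[x] * (1 - pi(x)).
    mass = {
        x: share[x] * (pscore[x] if treated else 1 - pscore[x])
        for x in share if pscore[x] == p
    }
    total = sum(mass.values())
    return {x: m / total for x, m in mass.items()}

for p in sorted(set(pscore.values())):
    d1 = dist_given_score(p, treated=True)
    d0 = dist_given_score(p, treated=False)
    print(p, d1 == d0, d1)
```

Within each propensity-score stratum the two distributions coincide, even though the covariate values inside a stratum differ.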

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure:
- Step 1: Estimate the propensity score, e.g., using a logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
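Step 2 of the procedure can be sketched as follows. The example assumes that the propensity scores have already been estimated in Step 1 and simply takes them as given; the data are made up, the helper `att_psm_nn` is hypothetical, and the algorithm shown is one-nearest-neighbor matching with replacement, which is only one of many possible matching algorithms. It does not reproduce the implementation in kmatch or teffects.

```python
def att_psm_nn(units):
    """ATT from one-nearest-neighbor matching (with replacement) on the score.

    `units` is a list of (treated, pscore, outcome) tuples; the scores are
    assumed to come from a previously fitted model such as a logit.
    """
    treated = [(p, y) for d, p, y in units if d == 1]
    controls = [(p, y) for d, p, y in units if d == 0]
    effects = []
    for p_i, y_i in treated:
        # Closest control in terms of |pi(X_i) - pi(X_j)|.
        _, y_j = min(controls, key=lambda c: abs(c[0] - p_i))
        effects.append(y_i - y_j)
    return sum(effects) / len(effects)

# Made-up data: (treated, propensity score, outcome).
sample = [
    (1, 0.80, 12.0), (1, 0.55, 9.0), (1, 0.30, 7.0),
    (0, 0.75, 10.0), (0, 0.50, 8.5), (0, 0.35, 6.0), (0, 0.10, 4.0),
]
print(att_psm_nn(sample))
```

Because matching is done with replacement, the same control may serve as the match for several treated units, and controls far from any treated unit (here the one with score 0.10) are never used.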

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several build steps: scatter plot of the outcome against education (years) for treated (T) and control (C) units, before and after pruning unmatched controls; slide 3/23 by King and Nielsen]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of Experiments

Balance on covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two build steps: scatter plot of age against education (years) for treated (T) and control (C) units, first with all controls and then with the matched controls only; slide 9/23 by King and Nielsen]

Best Case: Propensity Score Matching

[Figure, shown in two build steps: scatter plot of age against education (years) for treated (T) and control (C) units, with the propensity score (0 to 1) shown as an additional axis; slide 15/23 by King and Nielsen]

Best Case: Propensity Score Matching is Suboptimal

[Figure: the same scatter plot of age against education (years) after matching on the propensity score, showing the treated units and the matched controls only; slide 15/23 by King and Nielsen]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
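The first step of Argument 3, that deleting observations at random raises the expected imbalance, can be illustrated with a small simulation. The sketch below assumes a single standard-normal covariate with the same distribution in both groups, a fixed random seed, and the hypothetical helper `mean_abs_imbalance`; it is not a replication of the simulations by King and Nielsen.

```python
import random
import statistics

def mean_abs_imbalance(n_keep, n_full=200, reps=300, seed=1):
    """Average |mean(X | treated) - mean(X | control)| after random pruning.

    Both groups start with `n_full` draws from N(0, 1); each group is then
    pruned at random down to `n_keep` units.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_full)]
        controls = [rng.gauss(0, 1) for _ in range(n_full)]
        t = rng.sample(treated, n_keep)
        c = rng.sample(controls, n_keep)
        gaps.append(abs(statistics.fmean(t) - statistics.fmean(c)))
    return statistics.fmean(gaps)

for n_keep in (200, 100, 50, 25):
    print(n_keep, round(mean_abs_imbalance(n_keep), 3))
```

The groups are balanced in expectation at every sample size, yet the typical size of the chance difference in means grows as more units are pruned, which is the mechanism the argument relies on.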

PSM Increases Model Dependence & Bias

[Figure, two panels comparing MDM and PSM by number of units pruned (0 to 160). Left panel, "Model Dependence": variance. Right panel, "Bias": maximum coefficient across 512 specifications; true effect = 2. Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Source: slide 20/23 of the slides by King and Nielsen.]

The Propensity Score Paradox in Real Data

[Figure, two panels: imbalance by number of units pruned for CEM, MDM, and PSM, with markers for the raw data, random pruning, and a 1/4 SD caliper. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011). Source: slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
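The first limiting case can be written out in one line. The notation $e(X)$ for the propensity score is an assumption here (it is not used on this slide); $T$ and $X$ are as above.

```latex
e(X) \equiv \Pr(T = 1 \mid X), \qquad
T \perp X \;\Longrightarrow\; e(X) = \Pr(T = 1) \quad \text{for all } X .
```

With a constant score, every control is an equally good match for every treated unit, so matching on the score alone imposes no blocking on X. The more strongly T depends on X, the more the score varies and the more blocking it induces.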

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better,' you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
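The idea of "averaging where pruning is not necessary" can be sketched as follows. This is a minimal Python illustration of kernel matching on a scalar score with an Epanechnikov kernel, written under stated assumptions; it is not the kmatch implementation, and the function names and toy data are hypothetical. Every control within the bandwidth contributes to the counterfactual of a treated unit, so good matches are not thrown away; only treated units without any nearby control are pruned.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(treated, controls, bandwidth):
    """ATT by kernel matching on a scalar score (e.g., a propensity score).

    treated, controls: lists of (score, outcome) pairs.
    Returns (att, number of treated units that could be matched).
    """
    effects = []
    for score_t, y_t in treated:
        weights = [epan((score_c - score_t) / bandwidth) for score_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: prune this treated unit
        # Counterfactual outcome: kernel-weighted average over all nearby controls
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


# Hypothetical toy data: (score, outcome)
treated = [(0.30, 12.0), (0.60, 15.0), (0.95, 20.0)]
controls = [(0.28, 10.0), (0.32, 11.0), (0.58, 13.0), (0.62, 12.0)]

att, n_matched = kernel_att(treated, controls, bandwidth=0.05)
print(att, n_matched)
```

In the toy data, the first two treated units each use both of their nearby controls, while the third (score 0.95) has no control within the bandwidth and is dropped. One-to-one matching without replacement would instead discard one of the two equally good controls in each pair.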

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   432     25      457  |  1105     291    1396  | 1.3394

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6059013
        NATE |   1.432913

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw              |         Matched(ATT)
       Means |  Treated  Untrea~d   StdDif  |  Treated  Untrea~d   StdDif
-------------+------------------------------+-----------------------------
    collgrad |  .321663   .224212  .219912  |  .319444   .319444        0
     ttl_exp |  13.2685   12.7323  .117584  |  13.3205   13.1425  .039036
      tenure |  7.89205   6.17658   .29735  |  7.91744   7.58347  .057888
  3.industry |  .006565   .012178 -.058246  |   .00463    .00463        0
  4.industry |  .183807   .166905  .044425  |  .185185   .185185        0
  5.industry |  .105033   .027937  .312944  |  .085648   .085648        0
  6.industry |  .045952   .169771 -.407129  |  .048611   .048611        0
  7.industry |  .019694   .102436 -.350657  |  .020833   .020833        0
  8.industry |  .017505   .035817 -.113785  |  .009259   .009259        0
  9.industry |  .010941   .040115 -.185669  |  .011574   .011574        0
 10.industry |  .004376   .008596 -.052551  |  .002315   .002315        0
 11.industry |  .479212   .356734  .250073  |  .506944   .506944        0
 12.industry |  .122538    .07235  .169707  |   .12037    .12037        0
      2.race |  .330416   .244986  .189418  |    .3125     .3125        0
      3.race |  .017505   .011461  .050566  |  .006944   .006944        0
       south |  .297593   .466332 -.352408  |  .291667   .291667        0

             |             Raw              |         Matched(ATT)
   Variances |  Treated  Untrea~d    Ratio  |  Treated  Untrea~d    Ratio
-------------+------------------------------+-----------------------------
    collgrad |  .218674   .174066  1.25628  |  .217904   .217904        1
     ttl_exp |  20.5898   21.0001  .980459  |  19.8177   18.2323  1.08696
      tenure |  37.2044   29.3629  1.26706  |  37.0399   34.9543  1.05966
  3.industry |  .006536   .012038  .542928  |  .004619   .004619        1
  4.industry |  .150351   .139148  1.08052  |  .151242   .151242        1
  5.industry |  .094207   .027176  3.46656  |  .078494   .078494        1
  6.industry |  .043936    .14105  .311496  |  .046355   .046355        1
  7.industry |  .019348   .092008  .210287  |  .020447   .020447        1
  8.industry |  .017237   .034559  .498769  |  .009195   .009195        1
  9.industry |  .010845   .038533  .281445  |  .011467   .011467        1
 10.industry |  .004367   .008528  .512039  |  .002315   .002315        1
 11.industry |  .250115   .229639  1.08917  |  .250532   .250532        1
 12.industry |  .107758   .067163  1.60443  |  .106127   .106127        1
      2.race |  .221726     .1851  1.19787  |  .215342   .215342        1
      3.race |  .017237   .011338  1.52025  |  .006912   .006912        1
       south |   .20949   .249045  .841173  |  .207077   .207077        1
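How the two balance measures relate to the group means and variances can be sketched in a few lines. This is a minimal Python sketch, assuming that the standardized difference divides the difference in means by the square root of the average of the two raw group variances; that definition is an assumption checked only against the raw collgrad row (means 0.321663 and 0.224212, variances 0.218674 and 0.174066), not a statement about how kmatch computes it internally. The function names are hypothetical.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference: difference in means divided by the
    square root of the average of the two group variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def var_ratio(var_t, var_c):
    """Variance ratio, treated over untreated."""
    return var_t / var_c


# Raw (unmatched) figures for collgrad
d = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
r = var_ratio(0.218674, 0.174066)
print(round(d, 4), round(r, 4))
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is the pattern shown in the matched columns.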

Some Examples

Make a graph of the balancing stats:

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), raw vs. matched. Left panel: standardized mean difference. Right panel: variance ratio.]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   431     26      457  |  1214     182    1396  | .00188

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .3887224
        NATE |   1.432913

Some Examples

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability by propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   432     25      457  |  1105     291    1396  | 1.3394
 Untreated |  1386     10     1396  |   455       2     457  | 3.3975
  Combined |  1818     35     1853  |  1560     293    1853  |

Treatment-effects estimation

             |   Observed   Bootstrap                       Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
         ATE |   .4095729   .1920853     2.13   0.033    .0330928   .7860531
         ATT |   .6059013   .2472069     2.45   0.014    .1213846   1.090418
         ATC |   .3483797   .1893653     1.84   0.066   -.0227695   .7195289
        NATE |   1.432913   .2333282     6.14   0.000    .9755981   1.890228
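The z statistic, p-value, and "normal-based" confidence interval in such a table follow directly from the coefficient and its bootstrap standard error. The following is a minimal Python sketch of that arithmetic; the function name is hypothetical, and the only inputs taken from the output are the ATT coefficient (.6059013) and its bootstrap standard error (.2472069).

```python
from statistics import NormalDist


def normal_based_summary(coef, std_err, level=0.95):
    """Return (z, two-sided p-value, lower bound, upper bound) using the
    standard normal distribution."""
    z = coef / std_err
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return z, p, coef - crit * std_err, coef + crit * std_err


z, p, lower, upper = normal_based_summary(0.6059013, 0.2472069)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The interval is only as good as the bootstrap standard error that feeds it, and 50 replications is a small number for that purpose.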

Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
         (1) |  -.8270117   .1810415    -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   457      0      457  |   328    1068    1396  |

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
---------------------+--------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |   .7246969   .2942952     2.46   0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   457      0      457  |   870     526    1396  |

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
---------------------+--------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |   .5590823   .2381752     2.35   0.019    .0922675   1.025897

Some Examples

Bias-correction/regression adjustment:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   457      0      457  |   870     526    1396  |

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
---------------------+--------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |   .5288023   .2420635     2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   439     18      457  |  1258     138    1396  | .83886

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6408443

Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   432     25      457  |  1103     293    1396  | 1.3013

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   432     25      457  |  1105     291    1396  | 1.3394

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6059013

Some Examples

Bandwidth selection: cross validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   448      9      457  |  1184     212    1396  | 1.8888

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (1.5 to 3), with points labeled by search step.]

Some Examples

Bandwidth selection: cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   453      4      457  |  1289     107    1396  |  2.433

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (1.5 to 3), with points labeled by search step.]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   455      2      457  |  1356      40    1396  | 2.7626

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5), with points labeled by search step.]

Some Examples

Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   366     91      457  |   701     695    1396  |     .5

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |  Common support (treated)    |   Standardized difference
       Means |  Matched  Unmatc~d    Total  |  (1)-(3)   (2)-(3)   (1)-(2)
-------------+------------------------------+------------------------------
    collgrad |  .322404   .318681  .321663  |  .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682  13.2685  |  .027413  -.110253   .137666
      tenure |  8.12614   6.95055  7.89205  |  .038378  -.154356   .192734
  3.industry |  .002732   .021978  .006565  | -.047404   .190657  -.238061
  4.industry |  .191257   .153846  .183807  |  .019212  -.077269   .096481
  5.industry |  .062842   .274725  .105033  | -.137462   .552867  -.690329
  6.industry |  .057377         0  .045952  |  .054507  -.219225   .273732
  7.industry |  .019126   .021978  .019694  | -.004083   .016423  -.020506
  8.industry |  .005464   .065934  .017505  | -.091714   .368871  -.460585
  9.industry |  .010929   .010989  .010941  | -.000115   .000462  -.000577
 10.industry |        0   .021978  .004376  | -.066227   .266363  -.332589
 11.industry |  .554645   .175824  .479212  |   .15083  -.606636   .757467
 12.industry |  .092896   .241758  .122538  | -.090299   .363181   -.45348
      2.race |  .243169   .681319  .330416  | -.185284   .745209  -.930494
      3.race |  .002732   .076923  .017505  | -.112525   .452572  -.565097
       south |   .29235   .318681  .297593  | -.011456   .046074   -.05753

             |  Common support (treated)    |            Ratio
   Variances |  Matched  Unmatc~d    Total  |  (1)/(3)   (2)/(3)   (1)/(2)
-------------+------------------------------+------------------------------
    collgrad |  .219058   .219536  .218674  |  1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474  20.5898  |  .943177   1.22621    .76918
      tenure |  38.3324   31.9242  37.2044  |  1.03032   .858076   1.20073
  3.industry |  .002732   .021734  .006536  |  .418045   3.32537   .125714
  4.industry |  .155101   .131624  .150351  |  1.03159   .875443   1.17837
  5.industry |  .059054   .201465  .094207  |  .626851   2.13854   .293122
  6.industry |  .054233         0  .043936  |  1.23435         0         .
  7.industry |  .018811   .021734  .019348  |  .972252    1.1233   .865531
  8.industry |   .00545   .062271  .017237  |  .316157   3.61269   .087513
  9.industry |  .010839   .010989  .010845  |  .999464   1.01328   .986361
 10.industry |        0   .021734  .004367  |        0   4.97709         0
 11.industry |  .247691    .14652  .250115  |  .990307   .585811   1.69049
 12.industry |  .084497   .185348  .107758  |  .784137   1.72003   .455885
      2.race |  .184542   .219536  .221726  |  .832297   .990121   .840601
      3.race |  .002732   .071795  .017237  |  .158513   4.16522   .038056
       south |  .207448   .219536   .20949  |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats:

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference (about -2 to 2), one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south).]

Some Examples

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   432     25      457  |  1104     291    1395  | 1.3392

Treatment-effects estimation

             |      Coef.
-------------+-----------
wage         |
         ATT |   .6021049
        NATE |   1.430823
-------------+-----------
hours        |
         ATT |   1.263759
        NATE |   1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
   Treated |   432     25      457  |  1104     291    1395  | 1.3392

Treatment-effects estimation

             |      Coef.
-------------+-----------
wage         |
         ATT |   .5152752
        NATE |   1.430823
-------------+-----------
hours        |
         ATT |   1.263759
        NATE |   1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
-----------+------------------------+------------------------+--------
0          |                        |                        |
   Treated |   306     15      321  |   625     120     745  | 1.3199
-----------+------------------------+------------------------+--------
1          |                        |                        |
   Treated |   126     10      136  |   473     178     651  | 1.3398

Treatment-effects estimation

             |   Observed   Bootstrap                       Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
0            |
         ATT |   .4586332   .2763358     1.66   0.097    -.082975   1.000241
-------------+--------------------------------------------------------------
1            |
         ATT |   .9518705    .406903     2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
         (1) |   .4932373   .4452343     1.11   0.268    -.379406   1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (0 to 1) for untreated and treated. McFadden R2 = 0.121.]

Simulation

Raw mean difference in occupational prestige (NATE): −4.79

Population ATT (computed from fully stratified data): −3.96

There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, two panels by propensity score (0 to 0.8). Left panel: outcome for untreated and treated. Right panel: treatment effect.]
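The three population effects are tied together by the share of treated units: the ATE is the average of ATT and ATC, weighted by the probabilities of being treated and untreated. As a quick check, assuming a treated share of 17.5% (the share of the treatment group in the population) and writing D for the treatment indicator:

```latex
\begin{aligned}
\mathrm{ATE} &= \Pr(D = 1)\,\mathrm{ATT} + \Pr(D = 0)\,\mathrm{ATC} \\
             &\approx 0.175 \times (-3.96) + 0.825 \times (-3.41) \\
             &\approx -3.51
\end{aligned}
```

This reproduces the reported ATE up to rounding.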

Results: Variance

[Figure, two panels (N = 500 and N = 5000): variance of the estimates for MDM, MDM with bias correction, PSM, and PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure, two panels (N = 500 and N = 5000): bias reduction in percent for MDM, MDM with bias correction, PSM, and PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure, two panels (N = 500 and N = 5000): mean squared error for MDM, MDM with bias correction, PSM, and PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
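The three result measures (variance, bias, mean squared error) are linked: the mean squared error of an estimator equals its variance plus its squared bias. The following minimal Python sketch shows how they could be computed from replicated estimates. It is not the code used for the simulation; the function name and the four toy estimates are hypothetical, and only the target value of −3.96 (the population ATT) is taken from the simulation description.

```python
from statistics import fmean, pvariance


def summarize_estimates(estimates, truth):
    """Return (bias, variance, mse) of replicated estimates of a known target.
    The population variance is used so that mse = variance + bias**2 holds exactly."""
    bias = fmean(estimates) - truth
    variance = pvariance(estimates)
    mse = fmean((e - truth) ** 2 for e in estimates)
    return bias, variance, mse


# Hypothetical ATT estimates from four simulated samples
estimates = [-4.0, -3.8, -4.2, -3.6]
bias, variance, mse = summarize_estimates(estimates, truth=-3.96)
print(bias, variance, mse)
```

This decomposition is why an estimator with some bias but a much smaller variance can still have the lower mean squared error.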

Results: Relative standard error

[Figure, two panels (N = 500 and N = 5000): relative standard error for MDM, MDM with bias correction, PSM, and PSM with bias correction. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Results: Coverage of 95% CIs

[Figure, two panels (N = 500 and N = 5000): coverage of 95% confidence intervals for MDM, MDM with bias correction, PSM, and PSM with bias correction. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some simulations comparable to the ones by King and Nielsenusing various matching algorithms

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 67

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.

This is remarkable because the information in $X$, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.

Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure:
- Step 1: Estimate the propensity score, e.g., using a logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
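The two-step procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the talk: the data are simulated with a single covariate and a true effect of 2, the logit model is fitted by plain gradient ascent, and the matching step is one-nearest-neighbor matching with replacement. All function names (`fit_logit`, `propensity`, `nn_match_att`) are hypothetical.

```python
import math
import random


def fit_logit(xs, ds, steps=500, lr=0.5):
    """Step 1: fit Pr(D=1|x) = 1/(1+exp(-(a+b*x))) by gradient ascent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, d in zip(xs, ds):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += d - p          # score of the log-likelihood w.r.t. a
            grad_b += (d - p) * x    # score of the log-likelihood w.r.t. b
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def propensity(a, b, x):
    """Estimated propensity score pi(x)."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def nn_match_att(ps, ds, ys):
    """Step 2: ATT from 1-nearest-neighbor matching on |pi_i - pi_j|,
    with replacement (a control may serve several treated units)."""
    controls = [j for j, d in enumerate(ds) if d == 0]
    diffs = []
    for i, d in enumerate(ds):
        if d == 1:
            j = min(controls, key=lambda c: abs(ps[i] - ps[c]))
            diffs.append(ys[i] - ys[j])
    return sum(diffs) / len(diffs)


# Simulated data (assumption): treatment depends on x, true effect is 2.
random.seed(1)
n = 400
xs = [random.gauss(0, 1) for _ in range(n)]
ds = [1 if random.random() < 1 / (1 + math.exp(-x)) else 0 for x in xs]
ys = [2 * d + x + random.gauss(0, 1) for x, d in zip(xs, ds)]

a, b = fit_logit(xs, ds)
ps = [propensity(a, b, x) for x in xs]
att = nn_match_att(ps, ds, ys)
print(f"logit slope = {b:.2f}, matched ATT estimate = {att:.2f}")
```

With more than one covariate, only Step 1 changes; Step 2 still compares one-dimensional scores, which is the simplification the theorem buys.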


King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis; slide 3/23 by King and Nielsen)

[Figure, repeated over six animation steps of the slide: scatter plot of Outcome (0 to 12) against Education (years, 12 to 28) for treated (T) and control (C) units.]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of Experiments

  Balance of covariates   Complete randomization   Fully blocked
  Observed                On average               Exact
  Unobserved              On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
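The efficiency claim behind Argument 2 can be checked with a small simulation. The sketch below is an illustration under assumptions that are not in the slides: one covariate, an outcome model y = 2d + x + noise, 40 units per experiment, and blocking implemented as pairing units that are adjacent after sorting on x. The function names are hypothetical.

```python
import random
import statistics


def complete_randomization(x, rng):
    """Treat a random half of the units, ignoring the covariate."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    treated = set(idx[: len(x) // 2])
    return [1 if i in treated else 0 for i in range(len(x))]


def blocked_randomization(x, rng):
    """Sort on x, pair adjacent units, treat one unit per pair at random."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    d = [0] * len(x)
    for k in range(0, len(order) - 1, 2):
        d[rng.choice((order[k], order[k + 1]))] = 1
    return d


def diff_in_means(y, d):
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def simulate(design, reps=2000, n=40, seed=7):
    """Mean and standard deviation of the estimate over repeated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        d = design(x, rng)
        y = [2 * di + xi + rng.gauss(0, 0.5) for xi, di in zip(x, d)]
        estimates.append(diff_in_means(y, d))
    return statistics.mean(estimates), statistics.stdev(estimates)


mean_c, sd_c = simulate(complete_randomization)
mean_b, sd_b = simulate(blocked_randomization)
print(f"complete: mean = {mean_c:.3f}, sd = {sd_c:.3f}")
print(f"blocked:  mean = {mean_b:.3f}, sd = {sd_b:.3f}")
```

Both designs are unbiased for the effect of 2; the blocked design has the smaller spread because x is balanced almost exactly within pairs. How large the gain is depends on how strongly x is related to the outcome.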

Best Case: Mahalanobis Distance Matching (slide 9/23 by King and Nielsen)

[Figure: scatter plot of Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units.]

Best Case: Propensity Score Matching (slide 15/23 by King and Nielsen)

[Figure: the same Age by Education scatter plot of treated (T) and control (C) units, with a propensity score axis running from 0 to 1.]

Best Case: Propensity Score Matching is Suboptimal (slide 15/23 by King and Nielsen)

[Figure: the same scatter plot, showing the treated (T) units and a reduced set of control (C) units.]
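For readers who have not worked with Mahalanobis distance matching, the following sketch shows the distance computation for two covariates (education and age, as in the figures) and a nearest-control lookup. The four data points are made up for illustration, the covariance matrix is estimated from all of them, and the function names are hypothetical.

```python
import math


def covariance_2d(points):
    """Sample variances and covariance of two covariates."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis(u, v, cov):
    """sqrt((u - v)' S^-1 (u - v)) with S = [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = u[0] - v[0], u[1] - v[1]
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


# Made-up (education, age) values: one treated unit and three controls.
treated = (16, 40)
controls = [(12, 41), (16, 60), (17, 38)]
cov = covariance_2d([treated] + controls)

distances = [mahalanobis(treated, c, cov) for c in controls]
nearest = controls[distances.index(min(distances))]
print("distances:", [round(d, 3) for d in distances])
print("nearest control:", nearest)
```

Scaling by the covariance matrix is what makes a 20-year age gap and a 4-year education gap comparable; with an identity matrix the measure reduces to Euclidean distance.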

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
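The first step of Argument 3 (random pruning increases imbalance) is easy to reproduce. The sketch below assumes, purely for illustration, that treated and control units are drawn from the same covariate distribution, so that any imbalance is due to chance, and then deletes units at random. The function name and sample sizes are hypothetical.

```python
import random
import statistics


def mean_abs_imbalance(n_keep, n_full=200, reps=500, seed=3):
    """Average |mean(x | treated) - mean(x | control)| after keeping
    n_keep randomly chosen units out of n_full in each group."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        x_treated = [rng.gauss(0, 1) for _ in range(n_full)]
        x_control = [rng.gauss(0, 1) for _ in range(n_full)]
        kept_t = rng.sample(x_treated, n_keep)   # random pruning
        kept_c = rng.sample(x_control, n_keep)
        gaps.append(abs(statistics.mean(kept_t) - statistics.mean(kept_c)))
    return statistics.mean(gaps)


imbalance = {n_keep: mean_abs_imbalance(n_keep) for n_keep in (200, 50, 10)}
for n_keep, gap in imbalance.items():
    print(f"units kept per group = {n_keep:3d}: mean |imbalance| = {gap:.3f}")
```

The expected gap grows roughly with the inverse square root of the number of units kept, which is the sense in which pruning that is unrelated to the covariates makes things worse.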

PSM Increases Model Dependence & Bias (slide 20/23 by King and Nielsen)

[Figure, two panels, each plotted against the number of units pruned (0 to 160) for MDM and PSM. Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications; true effect = 2. Data-generating model: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$.]

The Propensity Score Paradox in Real Data (slide 21/23 by King and Nielsen)

[Figure, two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). Each plots imbalance against the number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and for a 1/4 SD caliper.]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
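The difference between an algorithm that throws away good matches and one that averages over them can be made concrete with a toy example. The sketch below contrasts greedy one-to-one matching without replacement with kernel matching using an Epanechnikov kernel. The propensity scores, outcomes, and bandwidth are made up, and the functions are simplified illustrations, not the algorithms implemented in kmatch.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def kernel_match_att(ps_t, y_t, ps_c, y_c, bw):
    """ATT from kernel matching: each treated unit is compared with a
    weighted average of all controls within the bandwidth."""
    effects, used = [], set()
    for p, y in zip(ps_t, y_t):
        w = [epanechnikov((p - q) / bw) for q in ps_c]
        total = sum(w)
        if total == 0:
            continue  # treated unit without controls in range stays unmatched
        effects.append(y - sum(wi * yc for wi, yc in zip(w, y_c)) / total)
        used.update(j for j, wi in enumerate(w) if wi > 0)
    return sum(effects) / len(effects), len(used)


def pair_match_att(ps_t, y_t, ps_c, y_c):
    """ATT from greedy one-to-one matching without replacement."""
    free = set(range(len(ps_c)))
    effects = []
    for p, y in zip(ps_t, y_t):
        if not free:
            break
        j = min(free, key=lambda c: abs(p - ps_c[c]))
        free.remove(j)  # a used control is no longer available
        effects.append(y - y_c[j])
    return sum(effects) / len(effects), len(ps_c) - len(free)


# Made-up data: outcome = 10 * score, plus a treatment effect of 1.
ps_t = [0.30, 0.50]
ps_c = [0.28, 0.31, 0.33, 0.49, 0.52, 0.90]
y_t = [10 * p + 1 for p in ps_t]
y_c = [10 * p for p in ps_c]

att_kernel, used_kernel = kernel_match_att(ps_t, y_t, ps_c, y_c, bw=0.05)
att_pair, used_pair = pair_match_att(ps_t, y_t, ps_c, y_c)
print(f"kernel matching: ATT = {att_kernel:.3f}, controls used = {used_kernel}")
print(f"pair matching:   ATT = {att_pair:.3f}, controls used = {used_pair}")
```

Pair matching keeps one control per treated unit and discards the other close controls, whereas kernel matching uses every control within the bandwidth and drops only the control that is far from all treated units.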


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                 Controls            Band-
              Yes    No  Total     Used  Unused  Total      width
  Treated     432    25    457     1105     291   1396     1.3394

Treatment-effects estimation

  wage        Coef.
  ATT      .6059013
  NATE     1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched (ATT)
  Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
  collgrad      .321663   .224212   .219912    .319444   .319444         0
  ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
  tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
  3.industry    .006565   .012178  -.058246     .00463    .00463         0
  4.industry    .183807   .166905   .044425    .185185   .185185         0
  5.industry    .105033   .027937   .312944    .085648   .085648         0
  6.industry    .045952   .169771  -.407129    .048611   .048611         0
  7.industry    .019694   .102436  -.350657    .020833   .020833         0
  8.industry    .017505   .035817  -.113785    .009259   .009259         0
  9.industry    .010941   .040115  -.185669    .011574   .011574         0
  10.industry   .004376   .008596  -.052551    .002315   .002315         0
  11.industry   .479212   .356734   .250073    .506944   .506944         0
  12.industry   .122538    .07235   .169707     .12037    .12037         0
  2.race        .330416   .244986   .189418      .3125     .3125         0
  3.race        .017505   .011461   .050566    .006944   .006944         0
  south         .297593   .466332  -.352408    .291667   .291667         0

                          Raw                        Matched (ATT)
  Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
  collgrad      .218674   .174066   1.25628    .217904   .217904         1
  ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
  tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
  3.industry    .006536   .012038   .542928    .004619   .004619         1
  4.industry    .150351   .139148   1.08052    .151242   .151242         1
  5.industry    .094207   .027176   3.46656    .078494   .078494         1
  6.industry    .043936    .14105   .311496    .046355   .046355         1
  7.industry    .019348   .092008   .210287    .020447   .020447         1
  8.industry    .017237   .034559   .498769    .009195   .009195         1
  9.industry    .010845   .038533   .281445    .011467   .011467         1
  10.industry   .004367   .008528   .512039    .002315   .002315         1
  11.industry   .250115   .229639   1.08917    .250532   .250532         1
  12.industry   .107758   .067163   1.60443    .106127   .106127         1
  2.race        .221726     .1851   1.19787    .215342   .215342         1
  3.race        .017237   .011338   1.52025    .006912   .006912         1
  south          .20949   .249045   .841173    .207077   .207077         1
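The StdDif and Ratio columns can be reproduced from the means and variances. The slides do not state the formulas; the sketch below assumes the common definitions (mean difference divided by the square root of the average of the two group variances, and treated variance divided by untreated variance), which match the raw collgrad and ttl_exp rows. The function names are hypothetical.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference (assumed definition)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)


def var_ratio(var_t, var_c):
    """Variance ratio, treated over untreated."""
    return var_t / var_c


# Raw collgrad row: means .321663 and .224212, variances .218674 and .174066.
sd_collgrad = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = var_ratio(0.218674, 0.174066)

# Raw ttl_exp row: means 13.2685 and 12.7323, variances 20.5898 and 21.0001.
sd_ttl_exp = std_diff(13.2685, 12.7323, 20.5898, 21.0001)
vr_ttl_exp = var_ratio(20.5898, 21.0001)

print(f"collgrad: StdDif = {sd_collgrad:.6f}, Ratio = {vr_collgrad:.5f}")
print(f"ttl_exp:  StdDif = {sd_ttl_exp:.6f}, Ratio = {vr_ttl_exp:.6f}")
```

After matching, a standardized difference near 0 and a variance ratio near 1 indicate good balance, which is what the Matched (ATT) columns show for the categorical covariates.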

Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" and "Variance ratio", showing raw and matched values for collgrad, ttl_exp, tenure, the industry and race indicators, and south.]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                 Matched                 Controls            Band-
              Yes    No  Total     Used  Unused  Total      width
  Treated     431    26    457     1214     182   1396     .00188

Treatment-effects estimation

  wage        Coef.
  ATT      .3887224
  NATE     1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                 Controls            Band-
              Yes    No  Total     Used  Unused  Total      width
  Treated     432    25    457     1105     291   1396     1.3394
  Untreated  1386    10   1396      455       2    457     3.3975
  Combined   1818    35   1853     1560     293   1853

Treatment-effects estimation

              Observed   Bootstrap                       Normal-based
  wage           Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATE         .4095729    .1920853   2.13    0.033    .0330928    .7860531
  ATT         .6059013    .2472069   2.45    0.014    .1213846    1.090418
  ATC         .3483797    .1893653   1.84    0.066   -.0227695    .7195289
  NATE        1.432913    .2333282   6.14    0.000    .9755981    1.890228
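The z statistics, p-values, and normal-based confidence intervals in the table follow directly from the coefficients and bootstrap standard errors. As a check, the sketch below recomputes the ATT row with Python's standard library; the function names are hypothetical.

```python
from statistics import NormalDist


def z_test(coef, se):
    """z statistic and two-sided p-value under a standard normal."""
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def normal_ci(coef, se, level=0.95):
    """Normal-based confidence interval: coef +/- z_crit * se."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return coef - z_crit * se, coef + z_crit * se


# ATT row of the table: coefficient .6059013, bootstrap SE .2472069.
z_att, p_att = z_test(0.6059013, 0.2472069)
lo_att, hi_att = normal_ci(0.6059013, 0.2472069)
print(f"z = {z_att:.2f}, p = {p_att:.3f}, 95% CI = [{lo_att:.7f}, {hi_att:.6f}]")
```

The same two functions reproduce the other rows. The interval is only as good as the bootstrap standard error that feeds it.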

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

  wage           Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  (1)        -.8270117    .1810415  -4.57    0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                 Controls            Band-
              Yes    No  Total     Used  Unused  Total      width
  Treated     457     0    457      328    1068   1396

Treatment-effects estimation

  wage        Coef.
  ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs     = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min = 1
Distance metric: Mahalanobis                            max = 1

                                     AI Robust
  wage                     Coef.     Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATET
  union
  (union vs nonunion)   .7246969      .2942952   2.46    0.014     .147889    1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                 Controls            Band-
              Yes    No  Total     Used  Unused  Total      width
  Treated     457     0    457      870     526   1396

Treatment-effects estimation

  wage        Coef.
  ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs     = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min = 5
Distance metric: Mahalanobis                            max = 6

                                     AI Robust
  wage                     Coef.     Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATET
  union
  (union vs nonunion)   .5590823      .2381752   2.35    0.019    .0922675    1.025897

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                 Controls            Band-
              Yes    No  Total     Used  Unused  Total      width
  Treated     457     0    457      870     526   1396

Treatment-effects estimation

  wage        Coef.
  ATT      .5288023

  adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs     = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min = 5
Distance metric: Mahalanobis                            max = 6

                                     AI Robust
  wage                     Coef.     Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATET
  union
  (union vs nonunion)   .5288023      .2420635   2.18    0.029    .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                 Matched                 Controls            Band-
              Yes    No  Total     Used  Unused  Total      width
  Treated     439    18    457     1258     138   1396     .83886

Treatment-effects estimation

  wage        Coef.
  ATT      .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                 Matched                 Controls            Band-
              Yes    No  Total     Used  Unused  Total      width
  Treated     432    25    457     1103     293   1396     1.3013

Treatment-effects estimation

  wage        Coef.
  ATT      .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls                 Band-
            Yes    No    Total    Used    Unused   Total   width
Treated     432    25    457      1105    291      1396    1.3394

Treatment-effects estimation

wage    Coef.
ATT     .6059013
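The idea of deriving the bandwidth from pair-matching distances and then applying kernel matching can be sketched in a few lines. This is only an illustration on simulated one-dimensional data, not kmatch's implementation: the rule "factor times a quantile of the nearest-neighbor distances" and the values factor = 1.5 and quantile = 0.9 are assumptions made for this sketch, as are all function names.

```python
import random


def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def pair_matching_bandwidth(treated_x, control_x, factor=1.5, quantile=0.9):
    """Hypothetical rule: factor * quantile of the one-nearest-neighbor distances."""
    nearest = sorted(min(abs(xt - xc) for xc in control_x) for xt in treated_x)
    idx = min(len(nearest) - 1, int(quantile * len(nearest)))
    return factor * nearest[idx]


def kernel_att(treated, controls, bandwidth):
    """Kernel-matching ATT; treated and controls are lists of (x, y) pairs."""
    effects, unmatched = [], 0
    for xt, yt in treated:
        weights = [epan((xt - xc) / bandwidth) for xc, _ in controls]
        total = sum(weights)
        if total == 0.0:
            unmatched += 1  # no control within the bandwidth
            continue
        y0_hat = sum(w * yc for w, (_, yc) in zip(weights, controls)) / total
        effects.append(yt - y0_hat)
    return sum(effects) / len(effects), len(effects), unmatched


# Simulated data with a true ATT of 1 (assumption for illustration only)
rng = random.Random(1)
controls = [(x, 2.0 * x + rng.gauss(0.0, 0.5))
            for x in (rng.random() for _ in range(400))]
treated = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5))
           for x in (rng.uniform(0.2, 1.0) for _ in range(200))]

bw = pair_matching_bandwidth([x for x, _ in treated], [x for x, _ in controls])
att, n_matched, n_unmatched = kernel_att(treated, controls, bw)
print(f"bandwidth={bw:.4f} ATT={att:.3f} matched={n_matched} unmatched={n_unmatched}")
```

Treated units without any control inside the bandwidth are reported as unmatched, which mirrors the "Matched: Yes / No" columns in the output above.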

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls                 Band-
            Yes    No    Total    Used    Unused   Total   width
Treated     448    9     457      1184    212      1396    1.8888

Treatment-effects estimation

wage    Coef.
ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; MSE (y-axis) against bandwidth (x-axis, 1.5 to 3), points labeled with the evaluation index]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls                 Band-
            Yes    No    Total    Used    Unused   Total   width
Treated     453    4     457      1289    107      1396    2.433

Treatment-effects estimation

wage    Coef.
ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; MISE (y-axis) against bandwidth (x-axis, 1.5 to 3), points labeled with the evaluation index]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls                 Band-
            Yes    No    Total    Used    Unused   Total   width
Treated     455    2     457      1356    40       1396    2.7626

Treatment-effects estimation

wage    Coef.
ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; weighted MISE (y-axis) against bandwidth (x-axis, 1 to 5), points labeled with the evaluation index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls                 Band-
            Yes    No    Total    Used    Unused   Total   width
Treated     366    91    457      701     695      1396    .5

Treatment-effects estimation

wage    Coef.
ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): Means

                Matched    Unmatched  Total      Standardized difference
                (1)        (2)        (3)        (1)-(3)    (2)-(3)    (1)-(2)
collgrad        .322404    .318681    .321663    .001585    -.006376   .007962
ttl_exp         13.3929    12.7682    13.2685    .027413    -.110253   .137666
tenure          8.12614    6.95055    7.89205    .038378    -.154356   .192734
3.industry      .002732    .021978    .006565    -.047404   .190657    -.238061
4.industry      .191257    .153846    .183807    .019212    -.077269   .096481
5.industry      .062842    .274725    .105033    -.137462   .552867    -.690329
6.industry      .057377    0          .045952    .054507    -.219225   .273732
7.industry      .019126    .021978    .019694    -.004083   .016423    -.020506
8.industry      .005464    .065934    .017505    -.091714   .368871    -.460585
9.industry      .010929    .010989    .010941    -.000115   .000462    -.000577
10.industry     0          .021978    .004376    -.066227   .266363    -.332589
11.industry     .554645    .175824    .479212    .15083     -.606636   .757467
12.industry     .092896    .241758    .122538    -.090299   .363181    -.45348
2.race          .243169    .681319    .330416    -.185284   .745209    -.930494
3.race          .002732    .076923    .017505    -.112525   .452572    -.565097
south           .29235     .318681    .297593    -.011456   .046074    -.05753

Common support (treated): Variances

                Matched    Unmatched  Total      Ratio
                (1)        (2)        (3)        (1)/(3)    (2)/(3)    (1)/(2)
collgrad        .219058    .219536    .218674    1.00176    1.00394    .997824
ttl_exp         19.4198    25.2474    20.5898    .943177    1.22621    .76918
tenure          38.3324    31.9242    37.2044    1.03032    .858076    1.20073
3.industry      .002732    .021734    .006536    .418045    3.32537    .125714
4.industry      .155101    .131624    .150351    1.03159    .875443    1.17837
5.industry      .059054    .201465    .094207    .626851    2.13854    .293122
6.industry      .054233    0          .043936    1.23435    0          .
7.industry      .018811    .021734    .019348    .972252    1.1233     .865531
8.industry      .00545     .062271    .017237    .316157    3.61269    .087513
9.industry      .010839    .010989    .010845    .999464    1.01328    .986361
10.industry     0          .021734    .004367    0          4.97709    0
11.industry     .247691    .14652     .250115    .990307    .585811    1.69049
12.industry     .084497    .185348    .107758    .784137    1.72003    .455885
2.race          .184542    .219536    .221726    .832297    .990121    .840601
3.race          .002732    .071795    .017237    .158513    4.16522    .038056
south           .207448    .219536    .20949     .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
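The standardized differences and variance ratios in the common-support table can be reproduced by hand. The sketch below uses the ttl_exp row, read with decimal points restored (means 13.3929, 12.7682, 13.2685; variances 19.4198, 25.2474, 20.5898). That every difference is scaled by the standard deviation in the total treated group is an assumption inferred from these numbers, not something stated on the slide.

```python
import math


def std_diff(mean_a, mean_b, var_ref):
    """Difference in means, scaled by the standard deviation of a reference group."""
    return (mean_a - mean_b) / math.sqrt(var_ref)


def var_ratio(var_a, var_b):
    """Ratio of two variances."""
    return var_a / var_b


# ttl_exp among the treated: (1) matched, (2) unmatched, (3) total
means = {"matched": 13.3929, "unmatched": 12.7682, "total": 13.2685}
variances = {"matched": 19.4198, "unmatched": 25.2474, "total": 20.5898}

d13 = std_diff(means["matched"], means["total"], variances["total"])
d23 = std_diff(means["unmatched"], means["total"], variances["total"])
d12 = std_diff(means["matched"], means["unmatched"], variances["total"])
r13 = var_ratio(variances["matched"], variances["total"])

print(f"(1)-(3)={d13:.4f} (2)-(3)={d23:.4f} (1)-(2)={d12:.4f} (1)/(3)={r13:.4f}")
```

Small deviations from the table in the last digits are expected because the inputs are already rounded.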

Some Examples

Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" (x-axis from -.2 to .2) with one point each for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls                 Band-
            Yes    No    Total    Used    Unused   Total   width
Treated     432    25    457      1104    291      1395    1.3392

Treatment-effects estimation

          Coef.
wage
  ATT     .6021049
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched               Controls                 Band-
            Yes    No    Total    Used    Unused   Total   width
Treated     432    25    457      1104    291      1395    1.3392

Treatment-effects estimation

          Coef.
wage
  ATT     .5152752
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303

wage adjusted for: collgrad ttl_exp tenure
hours adjusted for: i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching

Number of obs = 1853
Replications  = 50
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

               Matched               Controls                 Band-
               Yes    No    Total    Used    Unused   Total   width
0  Treated     306    15    321      625     120      745     1.3199
1  Treated     126    10    136      473     178      651     1.3398

Treatment-effects estimation

            Observed    Bootstrap                      Normal-based
wage        Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
0  ATT      .4586332    .2763358    1.66   0.097   -.082975   1.000241
1  ATT      .9518705    .406903     2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) = 1.23
     Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage        Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
(1)         .4932373    .4452343    1.11   0.268   -.379406   1.365881
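The test and lincom results above are two views of the same Wald statistic. As a quick check, the following sketch recomputes z, chi2(1), and the p-value from the reported difference and its bootstrap standard error, read with decimal points restored (.4932373 and .4452343); the function name is made up for this illustration.

```python
import math


def wald_test(diff, se):
    """z statistic, chi2(1) statistic, and two-sided p-value for H0: diff = 0."""
    z = diff / se
    chi2 = z * z
    # The two-sided normal p-value equals the upper-tail chi2(1) p-value
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, chi2, p


z, chi2, p = wald_test(0.4932373, 0.4452343)
print(f"z={z:.2f} chi2(1)={chi2:.2f} p={p:.4f}")
```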

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis from 0 to 1) for untreated and treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79.
Population ATT (computed from fully stratified data): -3.96.
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome (y-axis, 30 to 55) by propensity score (x-axis, 0 to .8) for untreated and treated. Right panel: treatment effect (y-axis, -6 to -1) by propensity score (x-axis, 0 to .8).]

Results: Variance

[Figure: variance of the estimates (x-axis, 1.5 to 4.5) for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent (x-axis, 70 to 130 for N = 500 and 95 to 135 for N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
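The performance measures reported in these result slides (variance, bias reduction, mean squared error, relative standard error, coverage) can be computed from a set of simulation replications roughly as follows. The slides do not spell out the formulas, so the definitions below are assumptions: bias reduction is taken as 100 × (NATE − mean estimate) / (NATE − ATT), and relative standard error as the mean estimated standard error divided by the standard deviation of the estimates. The replications themselves are artificial draws, not results from the talk; the population values are those of the simulation slide, read as NATE = -4.79 and ATT = -3.96.

```python
import math
import random
import statistics


def performance(estimates, std_errors, true_att, nate):
    """Summarize simulation replications (all definitions are assumptions)."""
    mean_est = statistics.fmean(estimates)
    variance = statistics.pvariance(estimates)
    bias = mean_est - true_att
    covered = sum(1 for est, se in zip(estimates, std_errors)
                  if abs(est - true_att) <= 1.96 * se)
    return {
        "variance": variance,
        "bias_reduction_pct": 100.0 * (nate - mean_est) / (nate - true_att),
        "mse": variance + bias ** 2,
        "relative_se": statistics.fmean(std_errors) / math.sqrt(variance),
        "coverage": covered / len(estimates),
    }


TRUE_ATT, NATE = -3.96, -4.79

# Artificial replications: a slightly biased estimator with standard deviation 0.5
rng = random.Random(1)
estimates = [rng.gauss(TRUE_ATT + 0.1, 0.5) for _ in range(2000)]
std_errors = [0.5] * len(estimates)

result = performance(estimates, std_errors, TRUE_ATT, NATE)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under this definition a bias reduction above 100 percent means the estimator overshoots, i.e. it removes more than the raw difference between NATE and ATT.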

Results: Mean squared error

[Figure: mean squared error (x-axis, 1.5 to 4.5 for N = 500 and 1.5 to 5 for N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

Results: Relative standard error

[Figure: relative standard error (x-axis, .9 to 1.2 for N = 500 and .95 to 1.25 for N = 5000). Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals (x-axis, .92 to .98 for N = 500 and .9 to .98 for N = 5000). Rows and series as in the relative standard error figure.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
+ MDM leaves less scope for post-matching modeling decision biases.
+ Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
+ Fewer restrictions in terms of possible post-matching analyses.
− Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
− Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure:
- Step 1: Estimate the propensity score, e.g. using a logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, |π(X_i) − π(X_j)|, instead of multivariate distances.

PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
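The two-step procedure can be sketched as follows. This is a minimal illustration on simulated data with a single covariate, not the kmatch implementation: the logit model is fitted with a hand-written Newton-Raphson routine, matching is one-nearest-neighbor with replacement on the estimated propensity score, and the data-generating values (true ATT of 1, logit coefficients -0.5 and 1) are assumptions chosen for the example.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ds, iterations=25):
    """Step 1: fit P(D=1|x) = logistic(a + b*x) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d in zip(xs, ds):
            p = logistic(a + b * x)
            w = p * (1.0 - p)
            g0 += d - p          # gradient of the log-likelihood
            g1 += (d - p) * x
            h00 += w             # negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def psm_att(xs, ds, ys, a, b):
    """Step 2: match each treated unit to the control with the closest score."""
    ps = [logistic(a + b * x) for x in xs]
    treated = [i for i, d in enumerate(ds) if d == 1]
    controls = [i for i, d in enumerate(ds) if d == 0]
    effects = []
    for i in treated:
        j = min(controls, key=lambda c: abs(ps[i] - ps[c]))
        effects.append(ys[i] - ys[j])
    return sum(effects) / len(effects)


# Simulated data (assumption for illustration): true ATT = 1
rng = random.Random(2017)
n = 1000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ds = [1 if rng.random() < logistic(-0.5 + x) else 0 for x in xs]
ys = [1.0 + 2.0 * x + 1.0 * d + rng.gauss(0.0, 1.0) for x, d in zip(xs, ds)]

a_hat, b_hat = fit_logit(xs, ds)
att_hat = psm_att(xs, ds, ys, a_hat, b_hat)
print(f"logit: a={a_hat:.3f} b={b_hat:.3f}; PSM estimate of the ATT: {att_hat:.3f}")
```

With several covariates the only change in Step 2 is that the distance is still the absolute difference of two scalars, which is what makes PSM computationally attractive.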


King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several animation steps (slide 3/23 by King and Nielsen): scatter plot of the outcome (y-axis, 0 to 12) against education in years (x-axis, 12 to 28), with treated units marked T and control units marked C]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of Experiments

Balance on covariates    Complete Randomization    Fully Blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps (slide 9/23 by King and Nielsen): scatter plot of age (y-axis, 20 to 80) against education in years (x-axis, 12 to 28), with treated units marked T and control units marked C, before and after pruning the unmatched controls]

Best Case: Propensity Score Matching

[Figure, shown in two animation steps (slide 15/23 by King and Nielsen): the same scatter plot of age against education, with treated (T) and control (C) units projected onto a propensity score axis running from 0 to 1]

Best Case: Propensity Score Matching is Suboptimal

[Figure (slide 15/23 by King and Nielsen): scatter plot of age against education showing the treated units (T) and the controls (C) retained by propensity score matching]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
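The first step of Argument 3, that random pruning increases imbalance, is easy to check with a small simulation. The sketch below is an illustration under simple assumptions (one standard-normal covariate, identically distributed in both groups, 100 units per group); it is not a replication of King and Nielsen's analysis.

```python
import random
import statistics


def mean_abs_imbalance(n_per_group, n_pruned, replications, rng):
    """Average absolute difference in covariate means after random pruning."""
    keep = n_per_group - n_pruned
    total = 0.0
    for _ in range(replications):
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        # Delete n_pruned units at random from each group
        kept_t = rng.sample(treated, keep)
        kept_c = rng.sample(controls, keep)
        total += abs(statistics.fmean(kept_t) - statistics.fmean(kept_c))
    return total / replications


rng = random.Random(42)
imbalance = {k: mean_abs_imbalance(100, k, 1000, rng) for k in (0, 40, 80)}
for pruned, value in imbalance.items():
    print(f"pruned per group: {pruned:3d}  mean |difference in means|: {value:.3f}")
```

The expected imbalance grows with the number of pruned units simply because the means are computed from fewer observations.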

PSM Increases Model Dependence & Bias

[Figure (slide 20/23 by King and Nielsen). Left panel "Model Dependence": variance against number of units pruned (0 to 160) for MDM and PSM. Right panel "Bias": maximum coefficient across 512 specifications against number of units pruned (0 to 160) for MDM and PSM; true effect = 2. Simulated data: Y_i = 2T_i + X_1i + X_2i + ε_i, ε_i ~ N(0, 1).]

The Propensity Score Paradox in Real Data

[Figure (slide 21/23 by King and Nielsen): imbalance against number of units pruned for CEM, MDM, and PSM, with markers for the 1/4 SD caliper, the raw data, and random pruning. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011).]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
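The trade-off between blocking and sample size can be made explicit with a stylized calculation. This is only a sketch under assumptions that are not in the slides: equal-sized treatment and control groups, a constant treatment effect, homoskedastic outcomes with variance $\sigma^2$, and $R^2$ denoting the share of outcome variance explained by the blocking variables X. Here $n$ is the full sample size and $n'$ the (hypothetical) sample size that remains after blocking.

```latex
\begin{align*}
\operatorname{Var}(\hat\tau_{\mathrm{complete}}) &\approx \frac{4\sigma^2}{n} \\
\operatorname{Var}(\hat\tau_{\mathrm{blocked}}) &\approx \frac{4\sigma^2\,(1 - R^2)}{n'} \\
\operatorname{Var}(\hat\tau_{\mathrm{blocked}}) < \operatorname{Var}(\hat\tau_{\mathrm{complete}})
  &\iff \frac{n'}{n} > 1 - R^2
\end{align*}
```

Under these assumptions, with $R^2 = 0.3$ the blocked estimate is more efficient only if blocking retains more than 70% of the sample; the weaker the relation between X and Y, the less pruning can be afforded.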

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
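The two limiting cases can be illustrated with a small simulation. The following is a minimal sketch on made-up data, not part of the original slides; the variable names (x1, x2, t_a, t_b, ps_a, ps_b), the seed, and the coefficients are all hypothetical choices for illustration.

```stata
* Hypothetical simulated data; all names and parameter values are illustrative
clear
set seed 20170623
set obs 5000
generate x1 = rnormal()
generate x2 = rnormal()

* Case A: treatment is assigned independently of X
generate byte t_a = runiform() < 0.5

* Case B: treatment depends strongly on X
generate byte t_b = runiform() < invlogit(2*x1 + 2*x2)

* Estimate the propensity score with a logit model in each case
quietly logit t_a x1 x2
predict double ps_a, pr
quietly logit t_b x1 x2
predict double ps_b, pr

* Compare the spread of the estimated propensity scores
summarize ps_a ps_b
```

In case A the estimated propensity score should be close to constant, so matching on it behaves much like complete randomization; in case B it should vary widely, so matching on it blocks on the combination of X values that drives treatment.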

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better’, you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
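How much information an algorithm keeps can be inspected directly in the matching statistics. The sketch below reuses the NLSW example and the kmatch syntax shown elsewhere in these slides; it assumes that kmatch and its dependencies are installed, and it is meant as an illustration rather than as a replication of King and Nielsen's setup (in particular, it does not implement pair matching without replacement).

```stata
* Assumes kmatch is installed (ssc install kmatch)
sysuse nlsw88, clear
drop if industry==2

* Kernel matching on the propensity score: averages over all controls
* within the bandwidth
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att

* One-nearest-neighbor matching on the same propensity score:
* uses far fewer controls per treated observation
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att nn
```

Comparing the "Used" and "Unused" columns of the two matching-statistics tables shows how many control observations each algorithm discards.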

What is true is that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   432     25     457   |   1105     291    1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013
      NATE |  1.432913

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

             |            Raw              |        Matched (ATT)
       Means |  Treated  Untreated  StdDif |  Treated  Untreated  StdDif
    collgrad |  .321663   .224212  .219912 |  .319444   .319444        0
     ttl_exp |  13.2685   12.7323  .117584 |  13.3205   13.1425  .039036
      tenure |  7.89205   6.17658   .29735 |  7.91744   7.58347  .057888
  3.industry |  .006565   .012178 -.058246 |   .00463    .00463        0
  4.industry |  .183807   .166905  .044425 |  .185185   .185185        0
  5.industry |  .105033   .027937  .312944 |  .085648   .085648        0
  6.industry |  .045952   .169771 -.407129 |  .048611   .048611        0
  7.industry |  .019694   .102436 -.350657 |  .020833   .020833        0
  8.industry |  .017505   .035817 -.113785 |  .009259   .009259        0
  9.industry |  .010941   .040115 -.185669 |  .011574   .011574        0
 10.industry |  .004376   .008596 -.052551 |  .002315   .002315        0
 11.industry |  .479212   .356734  .250073 |  .506944   .506944        0
 12.industry |  .122538    .07235  .169707 |   .12037    .12037        0
      2.race |  .330416   .244986  .189418 |    .3125     .3125        0
      3.race |  .017505   .011461  .050566 |  .006944   .006944        0
       south |  .297593   .466332 -.352408 |  .291667   .291667        0

             |            Raw              |        Matched (ATT)
   Variances |  Treated  Untreated   Ratio |  Treated  Untreated   Ratio
    collgrad |  .218674   .174066  1.25628 |  .217904   .217904        1
     ttl_exp |  20.5898   21.0001  .980459 |  19.8177   18.2323  1.08696
      tenure |  37.2044   29.3629  1.26706 |  37.0399   34.9543  1.05966
  3.industry |  .006536   .012038  .542928 |  .004619   .004619        1
  4.industry |  .150351   .139148  1.08052 |  .151242   .151242        1
  5.industry |  .094207   .027176  3.46656 |  .078494   .078494        1
  6.industry |  .043936    .14105  .311496 |  .046355   .046355        1
  7.industry |  .019348   .092008  .210287 |  .020447   .020447        1
  8.industry |  .017237   .034559  .498769 |  .009195   .009195        1
  9.industry |  .010845   .038533  .281445 |  .011467   .011467        1
 10.industry |  .004367   .008528  .512039 |  .002315   .002315        1
 11.industry |  .250115   .229639  1.08917 |  .250532   .250532        1
 12.industry |  .107758   .067163  1.60443 |  .106127   .106127        1
      2.race |  .221726     .1851  1.19787 |  .215342   .215342        1
      3.race |  .017237   .011338  1.52025 |  .006912   .006912        1
       south |   .20949   .249045  .841173 |  .207077   .207077        1
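The balancing statistics can be checked by hand. The formula below is inferred from the reported numbers rather than stated in the slides, so treat it as an assumption: the standardized difference divides the difference in means by the square root of the average of the two raw group variances, and the variance ratio divides the treated by the untreated variance. The worked numbers read the raw collgrad row as means .321663 and .224212 and variances .218674 and .174066.

```latex
\begin{align*}
\text{StdDif} &= \frac{\bar{x}_T - \bar{x}_C}{\sqrt{(s_T^2 + s_C^2)/2}}
  = \frac{0.321663 - 0.224212}{\sqrt{(0.218674 + 0.174066)/2}}
  \approx \frac{0.097451}{0.443137} \approx 0.2199 \\
\text{Ratio} &= \frac{s_T^2}{s_C^2} = \frac{0.218674}{0.174066} \approx 1.256
\end{align*}
```

Both results agree with the values reported for collgrad in the raw sample. For the matched ttl_exp row, the reported standardized difference is reproduced only when the raw (not the matched) variances are used in the denominator, which suggests that the scaling is held fixed across the raw and matched comparisons.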

Some Examples

Make a graph of the balancing statistics:

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)

. addplot 1: xline(0), norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: xline(1), norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south) and two panels, "Std. mean difference" and "Variance ratio"; series: Raw and Matched.]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   431     26     457   |   1214     182    1396 | .00188

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3887224
      NATE |  1.432913

Some Examples

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated; panels: Raw and Matched (ATT); x-axis: propensity score; y-axis: density.]

Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for Untreated and Treated; panels: Raw and Matched (ATT); x-axis: propensity score; y-axis: cumulative probability.]

Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated; panels: Raw and Matched (ATT).]

Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   432     25     457   |   1105     291    1396 | 1.3394
 Untreated |  1386     10    1396   |    455       2     457 | 3.3975
  Combined |  1818     35    1853   |   1560     293    1853 |

Treatment-effects estimation

           |  Observed  Bootstrap                      Normal-based
      wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
       ATE |  .4095729   .1920853   2.13    0.033   .0330928   .7860531
       ATT |  .6059013   .2472069   2.45    0.014   .1213846   1.090418
       ATC |  .3483797   .1893653   1.84    0.066  -.0227695   .7195289
      NATE |  1.432913   .2333282   6.14    0.000   .9755981   1.890228

Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

      wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1) | -.8270117   .1810415  -4.57    0.000  -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   457      0     457   |    328    1068    1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                      |            AI Robust
                 wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET                  |
union                 |
 (union vs nonunion)  |  .7246969   .2942952   2.46    0.014    .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   457      0     457   |    870     526    1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                      |            AI Robust
                 wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET                  |
union                 |
 (union vs nonunion)  |  .5590823   .2381752   2.35    0.019   .0922675   1.025897

Some Examples

Bias-correction (regression adjustment):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   457      0     457   |    870     526    1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                      |            AI Robust
                 wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET                  |
union                 |
 (union vs nonunion)  |  .5288023   .2420635   2.18    0.029   .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   439     18     457   |   1258     138    1396 | .83886

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6408443

Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   432     25     457   |   1103     293    1396 | 1.3013

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   432     25     457   |   1105     291    1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   448      9     457   |   1184     212    1396 | 1.8888

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with indexed search steps; x-axis: bandwidth (1.5 to 3); y-axis: MSE.]

Some Examples

Bandwidth selection: cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   453      4     457   |   1289     107    1396 |  2.433

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with indexed search steps; x-axis: bandwidth (1.5 to 3); y-axis: MISE.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   455      2     457   |   1356      40    1396 | 2.7626

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with indexed search steps; x-axis: bandwidth (1 to 5); y-axis: weighted MISE.]

Some Examples

Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   366     91     457   |    701     695    1396 |     .5

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |   Common support (treated)   |    Standardized difference
       Means |  Matched  Unmatched    Total |  (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
      tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
  3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
  4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
  5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
  6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
  7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
  8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
  9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
 10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
 11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
 12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
      2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
      3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
       south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

             |   Common support (treated)   |            Ratio
   Variances |  Matched  Unmatched    Total |  (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
      tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
  3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
  4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
  5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
  6.industry |  .054233         0   .043936 |  1.23435         0         .
  7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
  8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
  9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
 10.industry |        0   .021734   .004367 |        0   4.97709         0
 11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
 12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
      2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
      3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
       south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference (x-axis from -.2 to .2), one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south).]

Some Examples

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   432     25     457   |   1104     291    1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .6021049
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
   Treated |   432     25     457   |   1104     291    1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .5152752
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

wage adjusted for: collgrad ttl_exp tenure
hours adjusted for: i.industry i.race

Some Examples

Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |     Matched            |      Controls          |  Band-
           |   Yes     No   Total   |   Used  Unused   Total |  width
0          |
   Treated |   306     15     321   |    625     120     745 | 1.3199
1          |
   Treated |   126     10     136   |    473     178     651 | 1.3398

Treatment-effects estimation

           |  Observed  Bootstrap                      Normal-based
      wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
0          |
       ATT |  .4586332   .2763358   1.66    0.097   -.082975   1.000241
1          |
       ATT |  .9518705    .406903   2.34    0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1) |  .4932373   .4452343   1.11    0.268   -.379406   1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for Untreated and Treated; x-axis: propensity score (0 to 1); y-axis: density. McFadden R2 = 0.121.]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure: two panels plotted against the propensity score; left panel: outcome for Untreated and Treated; right panel: treatment effect.]
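The results that follow are reported as variance, bias reduction, and mean squared error. The slides do not spell out the definitions, so the formulas below are an assumption about the conventional ones, reading the reported population values as NATE = -4.79 and ATT = -3.96. The estimator mean of -4.00 in the worked line is purely hypothetical.

```latex
\begin{align*}
\mathrm{Bias}(\hat\tau) &= \mathrm{E}[\hat\tau] - \mathrm{ATT} \\
\mathrm{MSE}(\hat\tau) &= \operatorname{Var}(\hat\tau) + \mathrm{Bias}(\hat\tau)^2 \\
\text{Bias reduction (in percent)} &= 100 \times
  \left(1 - \frac{\mathrm{Bias}(\hat\tau)}{\mathrm{NATE} - \mathrm{ATT}}\right) \\[6pt]
\mathrm{NATE} - \mathrm{ATT} &= -4.79 - (-3.96) = -0.83 \\
\mathrm{E}[\hat\tau] = -4.00 \;\Rightarrow\; \mathrm{Bias}(\hat\tau) &= -0.04,
  \quad \text{bias reduction} = 100 \times \left(1 - \frac{-0.04}{-0.83}\right) \approx 95.2
\end{align*}
```

Under this definition, a value of 100 means that the raw bias is fully removed, and values above 100 indicate that the estimator overcorrects.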

Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Notes (2017-06-24, Illustration using kmatch):

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Notes:

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


4 King and Nielsen's “Why Propensity Scores Should Not Be Used for Matching”

King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
• http://j.mp/1sexgVw

Slides
• https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
• https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
• Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
• Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several animation steps: scatter plot of Outcome (0 to 12) against Education (years, 12 to 28) for treated (T) and control (C) units, before and after matching. Slide 3/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 2
• PSM approximates complete randomization.
• Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

                 Complete          Fully
Balance          Randomization     Blocked
Covariates:
  Observed       On average        Exact
  Unobserved     On average        On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps: scatter plot of Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units, before and after Mahalanobis distance matching. Slide 9/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure, shown in two animation steps: the same Age by Education (years) scatter plot, with treated (T) and control (C) units additionally projected onto a propensity score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: the Age by Education (years) scatter plot showing only the treated (T) and control (C) units retained by propensity score matching. Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
• Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
• More imbalance/variance means more model dependence and researcher discretion.
• Because PSM approximates complete randomization, it engages in random pruning.
• PSM Paradox (“when you do ‘better’, you do worse”):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
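The first step of Argument 3 (random pruning increases imbalance) can be checked with a small simulation. The following is a minimal sketch in Python rather than Stata, and it is not taken from King and Nielsen's materials; the function name, the sample sizes, and the standard-normal covariate are illustrative assumptions. The covariate X is unrelated to treatment, so any difference in group means is pure chance imbalance.

```python
import random
from statistics import fmean

def imbalance_before_after_pruning(n_full, n_keep, reps=1000, seed=1):
    """Average |mean(X|treated) - mean(X|control)| before and after random pruning.

    Hypothetical helper for illustration: X ~ N(0, 1) in both groups.
    """
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_full)]
        control = [rng.gauss(0, 1) for _ in range(n_full)]
        before.append(abs(fmean(treated) - fmean(control)))
        # Random pruning: keep a random subset of each group.
        treated_kept = rng.sample(treated, n_keep)
        control_kept = rng.sample(control, n_keep)
        after.append(abs(fmean(treated_kept) - fmean(control_kept)))
    return fmean(before), fmean(after)

before, after = imbalance_before_after_pruning(n_full=200, n_keep=50)
print(f"mean imbalance, full sample:   {before:.3f}")
print(f"mean imbalance, pruned sample: {after:.3f}")
```

Because the standard error of a mean scales with one over the square root of the sample size, keeping a quarter of the observations should roughly double the expected imbalance in this setup.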

PSM Increases Model Dependence & Bias

[Figure with two panels, each plotted against Number of Units Pruned (0 to 160) with one line for MDM and one for PSM. Left panel “Model Dependence”: Variance. Right panel “Bias”: Maximum Coefficient across 512 Specifications, with the true effect = 2 marked. Simulated data: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen.]

The Propensity Score Paradox in Real Data

[Figure with two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011), each showing Imbalance against Number of units pruned for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked. Slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.

5 Are King and Nielsen right?

Are King and Nielsen right?

Argument 1
• Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
• Matching is good because it reduces model dependence.

I fully agree!

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated (so that they know what they are doing) and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
• PSM approximates complete randomization.
• Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
• If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
• If the X variables have a strong effect on T, there is lots of blocking.
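The efficiency claim for a fixed sample size can be illustrated with a small randomization simulation. This is a minimal sketch in Python (not Stata), under assumptions that are not in the slides: 40 units, one covariate X, an outcome that depends strongly on X, a constant treatment effect of 2, and blocks formed as pairs of neighbors in X. All names are hypothetical.

```python
import random
from statistics import fmean, variance

def diff_in_means(y0, treated, effect):
    """Difference in mean outcomes; treated units get y0 + effect."""
    y_treated = [y0[i] + effect for i in treated]
    y_control = [y0[i] for i in range(len(y0)) if i not in treated]
    return fmean(y_treated) - fmean(y_control)

def compare_designs(n=40, reps=2000, effect=2.0, seed=7):
    """Sampling variance of the estimate under complete vs. pair-blocked randomization."""
    rng = random.Random(seed)
    x = sorted(rng.gauss(0, 1) for _ in range(n))      # units sorted by X
    y0 = [xi + rng.gauss(0, 0.5) for xi in x]          # outcome strongly related to X
    est_complete, est_blocked = [], []
    for _ in range(reps):
        # Complete randomization: any half of the units is treated.
        treated = set(rng.sample(range(n), n // 2))
        est_complete.append(diff_in_means(y0, treated, effect))
        # Blocked randomization: one unit of each neighboring pair is treated.
        treated = {i + rng.randrange(2) for i in range(0, n, 2)}
        est_blocked.append(diff_in_means(y0, treated, effect))
    return variance(est_complete), variance(est_blocked)

var_complete, var_blocked = compare_designs()
print(f"variance, complete randomization: {var_complete:.4f}")
print(f"variance, blocked randomization:  {var_blocked:.4f}")
```

The sketch holds the sample size fixed in both designs, which is exactly the condition under which the slide calls the efficiency advantage of blocking uncontroversial; it says nothing about the case where blocking discards observations.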

Are King and Nielsen right?

Argument 3
• Random pruning ⇒ imbalance ⇒ more model dependence
• PSM ⇒ complete randomization ⇒ lots of random pruning
• PSM Paradox: “when you do ‘better’, you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
• Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
• Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
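How a kernel matching estimator “averages where pruning is not necessary” can be sketched in a few lines. The following Python code (not Stata) is a simplified illustration with an Epanechnikov kernel on a one-dimensional score; it is not kmatch's implementation, and the function names, the toy data, and the bandwidth of 0.1 are assumptions made for the example.

```python
from statistics import fmean

def epanechnikov(u):
    """Epanechnikov kernel: positive weight only for |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_matching_att(treated, controls, bandwidth):
    """ATT from kernel matching on a scalar score.

    treated, controls: lists of (score, outcome) pairs.
    Returns (ATT estimate, number of treated units that found a match).
    """
    effects = []
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        total = sum(weights)
        if total == 0:
            continue  # no control within the bandwidth: unit is left unmatched
        # Counterfactual outcome: weighted average over ALL controls in range.
        y_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y_hat)
    return fmean(effects), len(effects)

# Toy data (hypothetical): (propensity score, outcome)
treated = [(0.5, 3.0), (0.9, 4.0)]
controls = [(0.45, 1.0), (0.55, 2.0), (0.52, 1.5), (0.1, 0.0)]
att, n_matched = kernel_matching_att(treated, controls, bandwidth=0.1)
print(att, n_matched)
```

In this toy example the first treated unit is compared with a weighted average of three nearby controls rather than a single one, and the second treated unit is dropped because no control lies within the bandwidth. One-to-one matching without replacement would instead use one control per treated unit and discard the other close ones.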

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
• Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
• Optional exact matching
• Optional regression-adjustment bias-correction
• Kernel matching, ridge matching, or nearest-neighbor matching
• Automatic bandwidth selection for kernel/ridge matching
• Flexible specification of scaling matrix for MDM
• Joint analysis of multiple subgroups and multiple outcome variables
• Various post-estimation commands for balancing and common-support diagnostics
• Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      432     25     457     1105     291    1396     1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
NATE     1.432913

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched (ATT)
Means          Treated  Untreated    StdDif    Treated  Untreated    StdDif
collgrad       .321663    .224212   .219912    .319444    .319444         0
ttl_exp        13.2685    12.7323   .117584    13.3205    13.1425   .039036
tenure         7.89205    6.17658    .29735    7.91744    7.58347   .057888
3.industry     .006565    .012178  -.058246     .00463     .00463         0
4.industry     .183807    .166905   .044425    .185185    .185185         0
5.industry     .105033    .027937   .312944    .085648    .085648         0
6.industry     .045952    .169771  -.407129    .048611    .048611         0
7.industry     .019694    .102436  -.350657    .020833    .020833         0
8.industry     .017505    .035817  -.113785    .009259    .009259         0
9.industry     .010941    .040115  -.185669    .011574    .011574         0
10.industry    .004376    .008596  -.052551    .002315    .002315         0
11.industry    .479212    .356734   .250073    .506944    .506944         0
12.industry    .122538     .07235   .169707     .12037     .12037         0
2.race         .330416    .244986   .189418      .3125      .3125         0
3.race         .017505    .011461   .050566    .006944    .006944         0
south          .297593    .466332  -.352408    .291667    .291667         0

                          Raw                        Matched (ATT)
Variances      Treated  Untreated     Ratio    Treated  Untreated     Ratio
collgrad       .218674    .174066   1.25628    .217904    .217904         1
ttl_exp        20.5898    21.0001   .980459    19.8177    18.2323   1.08696
tenure         37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
3.industry     .006536    .012038   .542928    .004619    .004619         1
4.industry     .150351    .139148   1.08052    .151242    .151242         1
5.industry     .094207    .027176   3.46656    .078494    .078494         1
6.industry     .043936     .14105   .311496    .046355    .046355         1
7.industry     .019348    .092008   .210287    .020447    .020447         1
8.industry     .017237    .034559   .498769    .009195    .009195         1
9.industry     .010845    .038533   .281445    .011467    .011467         1
10.industry    .004367    .008528   .512039    .002315    .002315         1
11.industry    .250115    .229639   1.08917    .250532    .250532         1
12.industry    .107758    .067163   1.60443    .106127    .106127         1
2.race         .221726      .1851   1.19787    .215342    .215342         1
3.race         .017237    .011338   1.52025    .006912    .006912         1
south           .20949    .249045   .841173    .207077    .207077         1
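The two balance measures in these tables can be recomputed by hand. The sketch below, in Python rather than Stata, assumes that the standardized difference is the difference in means divided by the square root of the average of the two group variances, and that the ratio is the treated variance over the untreated variance. That definition is an assumption here, not something stated on the slides, but it reproduces the reported raw values for ttl_exp (means 13.2685 and 12.7323, variances 20.5898 and 21.0001) up to rounding.

```python
import math

def std_diff(mean_treated, mean_untreated, var_treated, var_untreated):
    """Standardized mean difference, scaled by the average of the two variances."""
    pooled_sd = math.sqrt((var_treated + var_untreated) / 2)
    return (mean_treated - mean_untreated) / pooled_sd

def variance_ratio(var_treated, var_untreated):
    """Ratio of treated to untreated variance; 1 means equal spread."""
    return var_treated / var_untreated

# Raw (unmatched) ttl_exp values from the balancing tables
d = std_diff(13.2685, 12.7323, 20.5898, 21.0001)
r = variance_ratio(20.5898, 21.0001)
print(round(d, 4), round(r, 4))
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is why the matched columns for the dummy variables (StdDif 0, Ratio 1) reflect essentially exact matching on those variables.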

Some Examples

Make a graph of the balancing stats:

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, “Std. mean difference” (reference line at 0) and “Variance ratio” (reference line at 1), showing raw and matched values for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      431     26     457     1214     182    1396     .00188

Treatment-effects estimation

wage        Coef.
ATT      .3887224
NATE     1.432913

Some Examples

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated observations, in two panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated observations, in two panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated observations, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      432     25     457     1105     291    1396     1.3394
Untreated   1386     10    1396      455       2     457     3.3975
Combined    1818     35    1853     1560     293    1853

Treatment-effects estimation

         Observed   Bootstrap                        Normal-based
wage        Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATE      .4095729   .1920853    2.13   0.033    .0330928    .7860531
ATT      .6059013   .2472069    2.45   0.014    .1213846    1.090418
ATC      .3483797   .1893653    1.84   0.066   -.0227695    .7195289
NATE     1.432913   .2333282    6.14   0.000    .9755981    1.890228
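The z statistic, p-value, and normal-based confidence interval in this table follow directly from the coefficient and its bootstrap standard error. As a minimal sketch in Python (not Stata; the function name is hypothetical), the ATT row with coefficient .6059013 and standard error .2472069 can be reproduced as follows.

```python
from statistics import NormalDist

def normal_based_inference(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    std_normal = NormalDist()
    z = coef / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    return z, p, coef - crit * se, coef + crit * se

# ATT row of the bootstrap table
z, p, lower, upper = normal_based_inference(0.6059013, 0.2472069)
print(f"z = {z:.2f}, p = {p:.3f}, CI = [{lower:.7f}, {upper:.6f}]")
```

The interval is only as trustworthy as the standard error that goes into it; the sketch takes the bootstrap standard error as given and does not assess whether bootstrapping is appropriate for the matching estimator used.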

Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)     -.8270117   .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
Treatment   : union = 1                    Neighbors: min = 1
Metric      : mahalanobis                             max = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      457      0     457      328    1068    1396

Treatment-effects estimation

wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min   = 1
Distance metric: Mahalanobis                            max   = 1

                                AI Robust
wage                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
union
 (union vs nonunion) .7246969   .2942952    2.46   0.014     .147889    1.301505
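The logic of one-nearest-neighbor matching on the Mahalanobis distance can be sketched for two covariates. The Python code below (not Stata) is an illustration only and is neither kmatch's nor teffects' implementation: it assumes the scaling matrix is the sample covariance of the covariates in the pooled sample, matches with replacement, and ignores ties; the helper names and the toy data are hypothetical.

```python
from statistics import fmean

def inverse_covariance_2d(points):
    """Inverse of the 2x2 sample covariance matrix, returned as (ixx, ixy, iyy)."""
    n = len(points)
    mx = fmean(p[0] for p in points)
    my = fmean(p[1] for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    return syy / det, -sxy / det, sxx / det

def mahalanobis_sq(a, b, inv):
    """Squared Mahalanobis distance between units a and b (first two fields)."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    ixx, ixy, iyy = inv
    return ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy

def nn_matching_att(treated, controls):
    """ATT from 1-nearest-neighbor matching with replacement.

    Units are (x1, x2, outcome) tuples.
    """
    inv = inverse_covariance_2d([(u[0], u[1]) for u in treated + controls])
    effects = []
    for t in treated:
        nearest = min(controls, key=lambda c: mahalanobis_sq(t, c, inv))
        effects.append(t[2] - nearest[2])
    return fmean(effects)

# Toy data (hypothetical): (education, tenure, wage)
treated = [(12, 5, 7.0), (16, 10, 9.0)]
controls = [(12, 6, 6.0), (16, 9, 8.5), (20, 2, 5.0)]
att = nn_matching_att(treated, controls)
print(att)
```

Because matching is done with replacement, a control can serve as the match for several treated units, which is why the number of controls used in the output above (328) can be smaller than the number of treated units (457).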

Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
Treatment   : union = 1                    Neighbors: min = 5
Metric      : mahalanobis                             max = 5
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                                AI Robust
wage                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
union
 (union vs nonunion) .5590823   .2381752    2.35   0.019    .0922675    1.025897

Some Examples

Bias-correction (regression adjustment):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
Treatment   : union = 1                    Neighbors: min = 5
Metric      : mahalanobis                             max = 5
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

(adjusted for collgrad ttl_exp tenure i.industry i.race south)

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                                AI Robust
wage                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
union
 (union vs nonunion) .5288023   .2420635    2.18   0.029    .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      439     18     457     1258     138    1396     .83886

Treatment-effects estimation

wage        Coef.
ATT      .6408443

Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      432     25     457     1103     293    1396     1.3013

Treatment-effects estimation

wage        Coef.
ATT      .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      432     25     457     1105     291    1396     1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013

Some Examples

Bandwidth selection: cross validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      448      9     457     1184     212    1396     1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against Bandwidth (1.5 to 3); markers are labeled with the index of the search step.]

Some Examples

Bandwidth selection: cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      453      4     457     1289     107    1396      2.433

Treatment-effects estimation

wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against Bandwidth (1.5 to 3); markers are labeled with the index of the search step.]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls           Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      455      2     457     1356      40    1396     2.7626

Treatment-effects estimation

wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of Weighted MISE against Bandwidth (1 to 5); markers are labeled with the index of the search step.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched                Controls                Band-
            Yes     No   Total     Used   Unused   Total   width
Treated     366     91     457      701      695    1396       .5

Treatment-effects estimation

wage         Coef.
ATT       .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): means and standardized differences

                          Means                  Standardized difference
               Matched  Unmatched     Total     (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404    .318681   .321663     .001585   -.006376    .007962
ttl_exp        13.3929    12.7682   13.2685     .027413   -.110253    .137666
tenure         8.12614    6.95055   7.89205     .038378   -.154356    .192734
3.industry     .002732    .021978   .006565    -.047404    .190657   -.238061
4.industry     .191257    .153846   .183807     .019212   -.077269    .096481
5.industry     .062842    .274725   .105033    -.137462    .552867   -.690329
6.industry     .057377          0   .045952     .054507   -.219225    .273732
7.industry     .019126    .021978   .019694    -.004083    .016423   -.020506
8.industry     .005464    .065934   .017505    -.091714    .368871   -.460585
9.industry     .010929    .010989   .010941    -.000115    .000462   -.000577
10.industry          0    .021978   .004376    -.066227    .266363   -.332589
11.industry    .554645    .175824   .479212      .15083   -.606636    .757467
12.industry    .092896    .241758   .122538    -.090299    .363181    -.45348
2.race         .243169    .681319   .330416    -.185284    .745209   -.930494
3.race         .002732    .076923   .017505    -.112525    .452572   -.565097
south           .29235    .318681   .297593    -.011456    .046074    -.05753

Common support (treated): variances and variance ratios

                        Variances                        Ratio
               Matched  Unmatched     Total     (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058    .219536   .218674     1.00176    1.00394    .997824
ttl_exp        19.4198    25.2474   20.5898     .943177    1.22621     .76918
tenure         38.3324    31.9242   37.2044     1.03032    .858076    1.20073
3.industry     .002732    .021734   .006536     .418045    3.32537    .125714
4.industry     .155101    .131624   .150351     1.03159    .875443    1.17837
5.industry     .059054    .201465   .094207     .626851    2.13854    .293122
6.industry     .054233          0   .043936     1.23435          0          .
7.industry     .018811    .021734   .019348     .972252     1.1233    .865531
8.industry      .00545    .062271   .017237     .316157    3.61269    .087513
9.industry     .010839    .010989   .010845     .999464    1.01328    .986361
10.industry          0    .021734   .004367           0    4.97709          0
11.industry    .247691     .14652   .250115     .990307    .585811    1.69049
12.industry    .084497    .185348   .107758     .784137    1.72003    .455885
2.race         .184542    .219536   .221726     .832297    .990121    .840601
3.race         .002732    .071795   .017237     .158513    4.16522    .038056
south          .207448    .219536    .20949     .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
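The standardized differences in the common-support table can be checked by hand. The figures reported for ttl_exp are consistent with dividing each difference in means by the standard deviation of all treated units (the square root of the "Total" variance). This is an inference from the numbers, not a statement about how kmatch computes them. The sketch below is in Python rather than Stata, and it assumes the extraction-damaged values read as means of 13.3929, 12.7682, and 13.2685 and a total variance of 20.5898; the function name std_diff is made up for the illustration.

```python
import math

def std_diff(mean_a, mean_b, var_ref):
    """Difference in means scaled by a reference standard deviation."""
    return (mean_a - mean_b) / math.sqrt(var_ref)

# ttl_exp among the treated (assumed decimal placement, see text)
mean_matched = 13.3929     # (1) matched treated units
mean_unmatched = 12.7682   # (2) unmatched treated units
mean_total = 13.2685       # (3) all treated units
var_total = 20.5898        # variance among all treated units

d13 = std_diff(mean_matched, mean_total, var_total)
d23 = std_diff(mean_unmatched, mean_total, var_total)
d12 = std_diff(mean_matched, mean_unmatched, var_total)

print(f"(1)-(3): {d13:.4f}  (2)-(3): {d23:.4f}  (1)-(2): {d12:.4f}")
```

Up to rounding of the inputs, these agree with the reported .027413, -.110253, and .137666. Small values in the (1)-(3) column indicate that the matched treated units resemble the treated group as a whole on that covariate.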

Some Examples

Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing, for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), the standardized difference on an axis from -.2 to .2 with a reference line at 0.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched                Controls                Band-
            Yes     No   Total     Used   Unused   Total   width
Treated     432     25     457     1104      291    1395   1.3392

Treatment-effects estimation

             Coef.
wage
  ATT     .6021049
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched                Controls                Band-
            Yes     No   Total     Used   Unused   Total   width
Treated     432     25     457     1104      291    1395   1.3392

Treatment-effects estimation

             Coef.
wage
  ATT     .5152752
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

              Matched                Controls                Band-
              Yes     No   Total     Used   Unused   Total   width
0: Treated    306     15     321      625      120     745   1.3199
1: Treated    126     10     136      473      178     651   1.3398

Treatment-effects estimation

            Observed   Bootstrap                       Normal-based
wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0: ATT      .4586332   .2763358    1.66    0.097   -.082975   1.000241
1: ATT      .9518705    .406903    2.34    0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)         .4932373   .4452343    1.11    0.268   -.379406   1.365881
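The test and lincom results are two views of the same Wald test: with a single restriction, the chi-squared statistic equals the squared z statistic of the difference. A minimal Python sketch of that arithmetic follows. It assumes the lincom row reads as a difference of .4932373 with a bootstrap standard error of .4452343 (the decimal points were lost in extraction), and it uses a plain normal approximation for the p-value.

```python
import math

diff = 0.4932373   # [1]ATT - [0]ATT, as reported by lincom
se = 0.4452343     # bootstrap standard error of the difference

z = diff / se
chi2 = z ** 2                           # Wald statistic with 1 degree of freedom
p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value

print(f"z = {z:.2f}, chi2(1) = {chi2:.2f}, p = {p:.4f}")
```

Up to rounding, this agrees with the reported chi2(1) of 1.23 and p-value of 0.2679, so the difference in the union wage effect between the South and the rest of the country is not statistically significant in this sample.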

Simulation

- Population: data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis, 0 to 1) for untreated and treated units. McFadden R2 = 0.121]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: outcome (y-axis, 30 to 55) against propensity score (x-axis) for untreated and treated units.]
[Figure, right panel: treatment effect (y-axis, -6 to -1) against propensity score (x-axis).]

Results: Variance

[Figure: variance of the ATT estimates by matching algorithm, for MDM and PSM, each with and without bias correction; panels for N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm, for MDM and PSM, each with and without bias correction; panels for N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the ATT estimates by matching algorithm, for MDM and PSM, each with and without bias correction; panels for N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Results: Relative standard error

[Figure: relative standard error by matching algorithm, for MDM and PSM, each with and without bias correction; panels for N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by matching algorithm, for MDM and PSM, each with and without bias correction; panels for N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]


Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


King and Nielsen

- In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
- The basic message of the paper is that PSM is really, really bad and should be discarded.
- The paper: http://j.mp/1sexgVw
- Slides: https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
- Watch it: https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several successive builds: scatter plot of Outcome (y-axis, 0 to 12) against Education in years (x-axis, 12 to 28) for treated (T) and control (C) units. Source: slide 3/23 by King and Nielsen.]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of Experiments

Balance on covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two builds: scatter plot of Age (y-axis, 20 to 80) against Education in years (x-axis, 12 to 28) for treated (T) and control (C) units, before and after Mahalanobis distance matching. Source: slide 9/23 by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure, shown in two builds: the same scatter plot of Age against Education, with all units projected onto a propensity-score axis running from 0 to 1. Source: slide 15/23 by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of Age against Education showing the treated (T) units and the control (C) units retained by propensity score matching. Source: slide 15/23 by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
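The first step of Argument 3, that random pruning increases imbalance, can be illustrated with a small simulation. The following Python sketch is not taken from King and Nielsen; the sample sizes, the seed, and the function names are assumptions chosen for the illustration. Both groups are drawn from the same covariate distribution, so any imbalance is pure sampling noise, and deleting units at random makes that noise larger.

```python
import random
import statistics

def imbalance(treated, controls):
    """Absolute difference in covariate means between the two groups."""
    return abs(statistics.fmean(treated) - statistics.fmean(controls))

def simulate_random_pruning(n=200, keep=50, reps=1000, seed=1):
    """Average imbalance before and after deleting units at random."""
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(reps):
        # Same covariate distribution in both groups: no systematic imbalance.
        treated = [rng.gauss(0, 1) for _ in range(n)]
        controls = [rng.gauss(0, 1) for _ in range(n)]
        before.append(imbalance(treated, controls))
        # Random pruning: keep a random subset of each group.
        pruned_t = rng.sample(treated, keep)
        pruned_c = rng.sample(controls, keep)
        after.append(imbalance(pruned_t, pruned_c))
    return statistics.fmean(before), statistics.fmean(after)

mean_before, mean_after = simulate_random_pruning()
print(f"mean imbalance, full sample:   {mean_before:.3f}")
print(f"mean imbalance, after pruning: {mean_after:.3f}")
```

Keeping a quarter of the units should roughly double the average imbalance, since the standard error of a mean difference scales with one over the square root of the group size.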

PSM Increases Model Dependence & Bias

[Figure, left panel "Model Dependence": variance (y-axis) against number of units pruned (x-axis, 0 to 160) for MDM and PSM.]
[Figure, right panel "Bias": maximum coefficient across 512 specifications (y-axis, 2.0 to 4.0) against number of units pruned (x-axis, 0 to 160) for MDM and PSM; true effect = 2.]

Data-generating process: Y_i = 2 T_i + X_{1i} + X_{2i} + \epsilon_i, with \epsilon_i \sim N(0, 1)

(Source: slide 20/23 by King and Nielsen)

The Propensity Score Paradox in Real Data

[Figure, left panel "Finkel et al. (JOP, 2012)": imbalance (y-axis) against number of units pruned (x-axis, 0 to 3000) for CEM, MDM, and PSM; reference markers for the raw data, random pruning, and a 1/4 SD caliper.]
[Figure, right panel "Nielsen et al. (AJPS, 2011)": imbalance (y-axis) against number of units pruned (x-axis, 0 to 2500) for CEM, MDM, and PSM; reference markers for the raw data, random pruning, and a 1/4 SD caliper.]

Similar pattern for > 20 other real data sets we checked.

(Source: slide 21/23 by King and Nielsen)


Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization (given the sample size) is of course true. How large the efficiency gains are depends on the strength of the relation between X and Y.

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
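The claim that the efficiency gain from blocking depends on the strength of the relation between X and Y can be made concrete with a small simulation. The sketch below is a Python illustration under assumed settings (a single normal covariate, a linear outcome model, pairs of neighbors in X as blocks); none of these choices come from the slides. It compares the variance of the difference-in-means estimator under complete and under fully blocked randomization, holding the sample size fixed.

```python
import random
import statistics

def estimator_variance(blocked, beta, n=100, tau=1.0, reps=1000, seed=42):
    """Variance of the difference-in-means estimate across repeated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = sorted(rng.gauss(0, 1) for _ in range(n))
        if blocked:
            # Fully blocked: pair neighbors in X, treat one unit per pair at random.
            treat = []
            for _ in range(n // 2):
                first = rng.random() < 0.5
                treat.extend([first, not first])
        else:
            # Complete randomization: treat a random half of the sample.
            treat = [True] * (n // 2) + [False] * (n // 2)
            rng.shuffle(treat)
        # Outcome depends on treatment (effect tau) and on X (strength beta).
        y = [tau * t + beta * xi + rng.gauss(0, 1) for xi, t in zip(x, treat)]
        y_treated = [yi for yi, t in zip(y, treat) if t]
        y_control = [yi for yi, t in zip(y, treat) if not t]
        estimates.append(statistics.fmean(y_treated) - statistics.fmean(y_control))
    return statistics.variance(estimates)

var_complete_strong = estimator_variance(blocked=False, beta=2.0)
var_blocked_strong = estimator_variance(blocked=True, beta=2.0)
var_complete_none = estimator_variance(blocked=False, beta=0.0)
var_blocked_none = estimator_variance(blocked=True, beta=0.0)

print(f"strong X-Y relation: complete {var_complete_strong:.3f}, blocked {var_blocked_strong:.3f}")
print(f"no X-Y relation:     complete {var_complete_none:.3f}, blocked {var_blocked_none:.3f}")
```

With a strong relation between X and Y, blocking should remove most of the variance; with no relation, the two designs should perform about the same. The sketch deliberately keeps the sample size equal across designs, so it does not address the second point above, that blocking in observational data may come at the cost of a smaller sample.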

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
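To see why PSM sits between the two designs, it helps to count how many distinct blocks the propensity score creates. The following Python sketch uses a made-up linear propensity score over two binary covariates (the function name, the baseline of 0.2, and the effect sizes are all assumptions for the illustration) and groups the four covariate cells by their score.

```python
from collections import defaultdict
from itertools import product

def propensity_blocks(effect_x1, effect_x2, base=0.2):
    """Group the four (x1, x2) cells by a hypothetical linear propensity score."""
    blocks = defaultdict(list)
    for x1, x2 in product([0, 1], repeat=2):
        score = round(base + effect_x1 * x1 + effect_x2 * x2, 6)
        blocks[score].append((x1, x2))
    return dict(blocks)

no_relation = propensity_blocks(0.0, 0.0)   # X unrelated to T
one_matters = propensity_blocks(0.5, 0.0)   # only x1 affects T
both_matter = propensity_blocks(0.5, 0.2)   # both covariates affect T

for label, blocks in [("no relation", no_relation),
                      ("only x1 matters", one_matters),
                      ("both matter", both_matter)]:
    print(f"{label}: {len(blocks)} block(s) -> {blocks}")
```

When neither covariate affects treatment, all cells share one score and matching on it amounts to complete randomization. When only x1 matters, the score blocks on x1 and leaves x2 balanced on average only. When both matter with different strengths, every cell has its own score and matching on the score is equivalent to full blocking in this toy setting.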

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
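The difference between throwing away good matches and averaging over them can be sketched for the extreme case in which all units have the same propensity score, so that every control is an equally good match for every treated unit. The Python simulation below is an illustration under assumed settings (group sizes, a true effect of 1, standard normal noise); it is not a reimplementation of any of the estimators discussed here. Pair matching without replacement then keeps an arbitrary subset of the controls, while an estimator that averages uses all of them.

```python
import random
import statistics

def att_variance(use_all_controls, n_treated=50, n_controls=500, reps=1000, seed=7):
    """Variance of the ATT estimate when all units share one propensity score."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        y_treated = [rng.gauss(1.0, 1.0) for _ in range(n_treated)]   # true effect = 1
        y_controls = [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
        if use_all_controls:
            # Averaging (e.g. kernel matching with everyone in range): use every control.
            matched = y_controls
        else:
            # One-to-one matching without replacement on a constant score:
            # the selected controls are effectively a random subset.
            matched = rng.sample(y_controls, n_treated)
        estimates.append(statistics.fmean(y_treated) - statistics.fmean(matched))
    return statistics.variance(estimates)

var_pair = att_variance(use_all_controls=False)
var_all = att_variance(use_all_controls=True)
print(f"pair matching without replacement: {var_pair:.4f}")
print(f"averaging over all controls:       {var_all:.4f}")
```

In this setup the pair-matching variance should be close to 1/50 + 1/50 and the averaging variance close to 1/50 + 1/500, which is the efficiency loss from random pruning that averaging algorithms avoid.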

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
  • Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
  • Optional exact matching
  • Optional regression-adjustment bias correction
  • Kernel matching, ridge matching, or nearest-neighbor matching
  • Automatic bandwidth selection for kernel/ridge matching
  • Flexible specification of scaling matrix for MDM
  • Joint analysis of multiple subgroups and multiple outcome variables
  • Various post-estimation commands for balancing and common-support diagnostics
  • Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    432   25    457     1105     291   1396   1.3394

    Treatment-effects estimation

        wage        Coef.
         ATT     .6059013
        NATE     1.432913
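For readers unfamiliar with the metric reported as `mahalanobis` above, the following Python sketch computes a Mahalanobis distance between two covariate vectors. It is a generic textbook version under stated assumptions: the scaling matrix is taken to be a sample covariance matrix, and which sample kmatch actually uses for scaling is not shown in the output above. All function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("scaling matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x, y, cov):
    """sqrt((x - y)' cov^{-1} (x - y)) without forming the inverse explicitly."""
    d = [xi - yi for xi, yi in zip(x, y)]
    z = solve(cov, d)
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


# A variable with variance 4 counts half as much per unit as one with variance 1.
dist = mahalanobis([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
```

The point of the scaling matrix is that differences are measured in standard-deviation units and correlated covariates are not double-counted.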

Some Examples

Some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                            Raw                        Matched (ATT)
    Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
    collgrad      .321663   .224212   .219912    .319444   .319444         0
    ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
    tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
    3.industry    .006565   .012178  -.058246     .00463    .00463         0
    4.industry    .183807   .166905   .044425    .185185   .185185         0
    5.industry    .105033   .027937   .312944    .085648   .085648         0
    6.industry    .045952   .169771  -.407129    .048611   .048611         0
    7.industry    .019694   .102436  -.350657    .020833   .020833         0
    8.industry    .017505   .035817  -.113785    .009259   .009259         0
    9.industry    .010941   .040115  -.185669    .011574   .011574         0
    10.industry   .004376   .008596  -.052551    .002315   .002315         0
    11.industry   .479212   .356734   .250073    .506944   .506944         0
    12.industry   .122538    .07235   .169707     .12037    .12037         0
    2.race        .330416   .244986   .189418      .3125     .3125         0
    3.race        .017505   .011461   .050566    .006944   .006944         0
    south         .297593   .466332  -.352408    .291667   .291667         0

                            Raw                        Matched (ATT)
    Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
    collgrad      .218674   .174066   1.25628    .217904   .217904         1
    ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
    tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
    3.industry    .006536   .012038   .542928    .004619   .004619         1
    4.industry    .150351   .139148   1.08052    .151242   .151242         1
    5.industry    .094207   .027176   3.46656    .078494   .078494         1
    6.industry    .043936    .14105   .311496    .046355   .046355         1
    7.industry    .019348   .092008   .210287    .020447   .020447         1
    8.industry    .017237   .034559   .498769    .009195   .009195         1
    9.industry    .010845   .038533   .281445    .011467   .011467         1
    10.industry   .004367   .008528   .512039    .002315   .002315         1
    11.industry   .250115   .229639   1.08917    .250532   .250532         1
    12.industry   .107758   .067163   1.60443    .106127   .106127         1
    2.race        .221726     .1851   1.19787    .215342   .215342         1
    3.race        .017237   .011338   1.52025    .006912   .006912         1
    south          .20949   .249045   .841173    .207077   .207077         1
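The two balance measures in the tables above can be reproduced by hand. The Python sketch below assumes the common definition of the standardized difference (difference in means divided by the square root of the average of the two raw group variances); this definition is an assumption, but it reproduces the raw collgrad and tenure rows shown above. The function names are hypothetical.

```python
import math


def std_mean_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, scaled by the average of the two variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Ratio of the treated-group variance to the untreated-group variance."""
    return var_t / var_c


# Raw collgrad row: means .321663 / .224212, variances .218674 / .174066
collgrad_stddif = std_mean_diff(0.321663, 0.224212, 0.218674, 0.174066)
collgrad_ratio = variance_ratio(0.218674, 0.174066)

# Raw tenure row: means 7.89205 / 6.17658, variances 37.2044 / 29.3629
tenure_stddif = std_mean_diff(7.89205, 6.17658, 37.2044, 29.3629)
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is what the matched columns show.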

Some Examples

Make a graph of the balancing stats

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
    >     bylabels("Std. mean difference" "Variance ratio") ///
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot by covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); panels: Std. mean difference, Variance ratio; series: Raw, Matched]

Some Examples

Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Covariates : collgrad ttl_exp tenure i.industry i.race south
    PS model   : logit (pr)

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    431   26    457     1214     182   1396   .00188

    Treatment-effects estimation

        wage        Coef.
         ATT     .3887224
        NATE     1.432913

Some Examples

Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel densities of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT); x-axis: Propensity score; y-axis: Density]

Some Examples

Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative distributions of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT); x-axis: Propensity score; y-axis: Cumulative probability]

Some Examples

Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT); y-axis: Propensity score]

Some Examples

Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    432   25    457     1105     291   1396   1.3394
      Untreated   1386   10   1396      455       2    457   3.3975
       Combined   1818   35   1853     1560     293   1853

    Treatment-effects estimation

              Observed   Bootstrap                       Normal-based
        wage     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
         ATE  .4095729   .1920853    2.13   0.033    .0330928   .7860531
         ATT  .6059013   .2472069    2.45   0.014    .1213846   1.090418
         ATC  .3483797   .1893653    1.84   0.066   -.0227695   .7195289
        NATE  1.432913   .2333282    6.14   0.000    .9755981   1.890228
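The logic behind `vce(bootstrap)` and the "Normal-based" interval can be sketched in a few lines of Python. This is a generic nonparametric bootstrap under stated assumptions, not kmatch's code: `bootstrap_se`, `normal_ci`, and the `estimator` callable are hypothetical names, and in a real application the callable would rerun the complete matching procedure on each resample.

```python
import random
from statistics import NormalDist, mean, stdev


def bootstrap_se(data, estimator, reps=50, seed=1):
    """Standard deviation of the estimator over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(reps):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return stdev(estimates)


def normal_ci(coef, se, level=0.95):
    """Normal-based confidence interval: coef +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return coef - z * se, coef + z * se


# Reproduce the ATT row above from its coefficient and bootstrap standard error.
att, se = 0.6059013, 0.2472069
z_stat = att / se
lower, upper = normal_ci(att, se)

# Toy use of the bootstrap itself (the sample mean stands in for a matching estimator).
toy_se = bootstrap_se([1.0, 2.0, 4.0, 7.0, 11.0], mean, reps=50, seed=1)
```

With only 50 replications, as in the output above, the standard error itself is still fairly noisy; more replications would typically be used in applied work.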

Some Examples

Do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

        wage      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
         (1)  -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1,853
                                               Neighbors: min =     1
    Treatment  : union = 1                                max =     1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    457    0    457      328    1068   1396

    Treatment-effects estimation

        wage        Coef.
         ATT     .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation                  Number of obs      = 1,853
    Estimator      : nearest-neighbor matching    Matches: requested =     1
    Outcome model  : matching                                    min =     1
    Distance metric: Mahalanobis                                 max =     1

                                       AI Robust
                    wage      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
                   union
    (union vs nonunion)    .7246969   .2942952    2.46   0.014     .147889   1.301505
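The point made earlier, that nearest-neighbor matching avoids random pruning only if ties are kept and controls are reused, can be illustrated with a small Python sketch. It matches on a single covariate for brevity (the examples above use a multivariate distance); the function name and the toy data are hypothetical.

```python
def nn_match_att(treated, controls, k=1, tol=1e-12):
    """ATT from k-nearest-neighbor matching with replacement, keeping all ties.

    treated, controls: lists of (x, outcome) pairs.
    Every control at least as close as the k-th nearest one is used, so a
    treated unit can end up with more than k matches when distances tie.
    """
    if not controls:
        raise ValueError("need at least one control")
    effects = []
    for x_t, y_t in treated:
        distances = sorted(abs(x_c - x_t) for x_c, _ in controls)
        cutoff = distances[min(k, len(distances)) - 1]
        matched = [y_c for x_c, y_c in controls if abs(x_c - x_t) <= cutoff + tol]
        effects.append(y_t - sum(matched) / len(matched))
    return sum(effects) / len(effects)


# Two controls are equally close to the treated unit; both are used.
att_ties = nn_match_att([(1.0, 5.0)], [(0.0, 1.0), (2.0, 3.0), (5.0, 100.0)], k=1)
```

Dropping one of the two tied controls at random would be exactly the kind of random pruning that the argument above warns against.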

Some Examples

Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1,853
                                               Neighbors: min =     5
    Treatment  : union = 1                                max =     5
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    457    0    457      870     526   1396

    Treatment-effects estimation

        wage        Coef.
         ATT     .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation                  Number of obs      = 1,853
    Estimator      : nearest-neighbor matching    Matches: requested =     5
    Outcome model  : matching                                    min =     5
    Distance metric: Mahalanobis                                 max =     6

                                       AI Robust
                    wage      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
                   union
    (union vs nonunion)    .5590823   .2381752    2.35   0.019    .0922675   1.025897

Some Examples

Bias correction (regression adjustment)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1,853
                                               Neighbors: min =     5
    Treatment  : union = 1                                max =     5
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    457    0    457      870     526   1396

    Treatment-effects estimation

        wage        Coef.
         ATT     .5288023

    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation                  Number of obs      = 1,853
    Estimator      : nearest-neighbor matching    Matches: requested =     5
    Outcome model  : matching                                    min =     5
    Distance metric: Mahalanobis                                 max =     6

                                       AI Robust
                    wage      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
                   union
    (union vs nonunion)    .5288023   .2420635    2.18   0.029    .0543666   1.003238
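The idea of regression-adjustment bias correction can be sketched as follows: a matched control's outcome is shifted by the difference that an outcome regression predicts between the treated unit's and the control's covariate values. The Python sketch below uses a single covariate and a linear regression fitted to the controls. This is an assumed textbook form of the correction; the exact regressions fitted by kmatch or teffects may differ, and all names are hypothetical.

```python
def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("x has no variation")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def bias_corrected_att(matches, controls):
    """ATT with regression adjustment for remaining covariate differences.

    matches:  list of (x_treated, y_treated, x_control, y_control) tuples.
    controls: list of (x, y) pairs used to fit the outcome regression.
    """
    _, slope = ols_line([x for x, _ in controls], [y for _, y in controls])
    effects = []
    for x_t, y_t, x_c, y_c in matches:
        y0_hat = y_c + slope * (x_t - x_c)  # move the control outcome to x_t
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects)


# Controls follow y = 2 + 3x exactly; the matched control sits at x = 1.5, not 1.0.
controls = [(0.0, 2.0), (1.0, 5.0), (1.5, 6.5), (2.0, 8.0)]
att_adjusted = bias_corrected_att([(1.0, 10.0, 1.5, 6.5)], controls)
att_unadjusted = 10.0 - 6.5
```

In this toy example the unadjusted difference (3.5) is off because the match is imperfect, while the adjusted estimate recovers the difference at the treated unit's own covariate value (5.0).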

Some Examples

Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att ///
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis (modified)
    Covariates : collgrad ttl_exp tenure
    PS model   : logit (pr)
    PS covars  : i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    439   18    457     1258     138   1396   .83886

    Treatment-effects estimation

        wage        Coef.
         ATT     .6408443

Some Examples

Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure
    Exact      : industry race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    432   25    457     1103     293   1396   1.3013

    Treatment-effects estimation

        wage        Coef.
         ATT     .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    432   25    457     1105     291   1396   1.3394

    Treatment-effects estimation

        wage        Coef.
         ATT     .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    448    9    457     1184     212   1396   1.8888

    Treatment-effects estimation

        wage        Coef.
         ATT     .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with points labeled by evaluation index; x-axis: Bandwidth; y-axis: MSE]

Some Examples

Bandwidth selection: cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    453    4    457     1289     107   1396    2.433

    Treatment-effects estimation

        wage        Coef.
         ATT     .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with points labeled by evaluation index; x-axis: Bandwidth; y-axis: MISE]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    455    2    457     1356      40   1396   2.7626

    Treatment-effects estimation

        wage        Coef.
         ATT     .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with points labeled by evaluation index; x-axis: Bandwidth; y-axis: Weighted MISE]

Some Examples

Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(0.5)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    366   91    457      701     695   1396       .5

    Treatment-effects estimation

        wage        Coef.
         ATT     .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                    Common support (treated)        Standardized difference
    Means         Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
    collgrad      .322404   .318681   .321663    .001585  -.006376   .007962
    ttl_exp       13.3929   12.7682   13.2685    .027413  -.110253   .137666
    tenure        8.12614   6.95055   7.89205    .038378  -.154356   .192734
    3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
    4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
    5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
    6.industry    .057377         0   .045952    .054507  -.219225   .273732
    7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
    8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
    9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
    10.industry         0   .021978   .004376   -.066227   .266363  -.332589
    11.industry   .554645   .175824   .479212     .15083  -.606636   .757467
    12.industry   .092896   .241758   .122538   -.090299   .363181   -.45348
    2.race        .243169   .681319   .330416   -.185284   .745209  -.930494
    3.race        .002732   .076923   .017505   -.112525   .452572  -.565097
    south          .29235   .318681   .297593   -.011456   .046074   -.05753

                    Common support (treated)                Ratio
    Variances     Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
    collgrad      .219058   .219536   .218674    1.00176   1.00394   .997824
    ttl_exp       19.4198   25.2474   20.5898    .943177   1.22621    .76918
    tenure        38.3324   31.9242   37.2044    1.03032   .858076   1.20073
    3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
    4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
    5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
    6.industry    .054233         0   .043936    1.23435         0         .
    7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
    8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
    9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
    10.industry         0   .021734   .004367          0   4.97709         0
    11.industry   .247691    .14652   .250115    .990307   .585811   1.69049
    12.industry   .084497   .185348   .107758    .784137   1.72003   .455885
    2.race        .184542   .219536   .221726    .832297   .990121   .840601
    3.race        .002732   .071795   .017237    .158513   4.16522   .038056
    south         .207448   .219536    .20949    .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats

    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of standardized differences by covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); title: Std. difference]

Some Examples

Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,852
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    432   25    457     1104     291   1395   1.3392

    Treatment-effects estimation

                    Coef.
    wage
         ATT     .6021049
        NATE     1.430823
    hours
         ATT     1.263759
        NATE     1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure) ///
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,852
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
        Treated    432   25    457     1104     291   1395   1.3392

    Treatment-effects estimation

                    Coef.
    wage
         ATT     .5152752
        NATE     1.430823
    hours
         ATT     1.263759
        NATE     1.450303

    wage:  adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics

                     Matched                Controls         Band-
                   Yes   No  Total     Used  Unused  Total   width
    0
        Treated    306   15    321      625     120    745   1.3199
    1
        Treated    126   10    136      473     178    651   1.3398

    Treatment-effects estimation

              Observed   Bootstrap                       Normal-based
        wage     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    0
         ATT  .4586332   .2763358    1.66   0.097    -.082975   1.000241
    1
         ATT  .9518705    .406903    2.34   0.019    .1543553   1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

               chi2(  1) =    1.23
             Prob > chi2 =    0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

        wage      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
         (1)   .4932373   .4452343    1.11   0.268    -.379406   1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: Propensity score; y-axis: Density]

McFadden R2 = 0.121

Simulation

Raw mean difference in occupational prestige (NATE): −4.79

Population ATT (computed from fully stratified data): −3.96

There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure: two panels plotted against the propensity score; left: Outcome for untreated and treated; right: Treatment effect]
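The performance measures reported on the following results slides (variance, bias reduction, mean squared error) can be computed from the simulated estimates as in the Python sketch below. The definition of "bias reduction in percent" used here, namely the share of the raw bias NATE − ATT that an estimator removes, is an assumption, since the slides do not spell it out; the function name and the three example estimates are hypothetical.

```python
from statistics import mean, pvariance


def summarize(estimates, true_att, nate):
    """Monte Carlo summary of ATT estimates from repeated samples."""
    bias = mean(estimates) - true_att
    variance = pvariance(estimates)
    mse = mean([(e - true_att) ** 2 for e in estimates])
    raw_bias = nate - true_att  # bias of the naive (unadjusted) comparison
    return {
        "bias": bias,
        "variance": variance,
        "mse": mse,  # equals variance + bias**2
        "bias_reduction_pct": 100.0 * (1.0 - bias / raw_bias),
    }


# Population values from the slide above; the estimates are made-up numbers.
result = summarize([-4.0, -3.9, -4.1], true_att=-3.96, nate=-4.79)
```

Under this definition, values above 100 percent mean that an estimator overshoots, which is why the bias-reduction plots are centered around 100.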

Results: Variance

[Figure: variance of the estimators; panels: N = 500, N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent; panels: N = 500, N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error; panels: N = 500, N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]

Results: Relative standard error

[Figure: relative standard error; panels: N = 500, N = 5000; rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals; panels: N = 500, N = 5000; rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
  • MDM leaves less scope for post-matching modeling decision biases.
  • Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  • Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
  • Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
  • Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
  • For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
  • Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
  • Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


King and Nielsen

The story goes about as follows.

Argument 1
  • Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
  • Matching is good because it reduces model dependence.

Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure: scatter plot of treated (T) and control (C) observations, shown in successive overlays; x-axis: Education (years); y-axis: Outcome (slides by King and Nielsen)]

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide by King and Nielsen)

Types of experiments

    Balance on covariates    Complete randomization    Fully blocked
    Observed                 On average                Exact
    Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of each matching method (in observational data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, two overlays: Age (20 to 80) against Education (years, 12 to 28), with treated units marked T and control units marked C. Slides by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure, two overlays: Age (20 to 80) against Education (years, 12 to 28), with treated units marked T and control units marked C, plus a propensity score axis running from 1 to 0. Slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: Age (20 to 80) against Education (years, 12 to 28), with treated units marked T and the remaining control units marked C. Slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
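The claim that random pruning increases imbalance can be checked with a small simulation. The Python sketch below is not taken from King and Nielsen's slides. It assumes a single covariate drawn from N(0, 1) in both groups (so the groups are balanced in expectation) and models pruning simply as keeping fewer units per group. The function name `mean_abs_imbalance` and all parameter values are illustrative assumptions.

```python
import random
import statistics

def mean_abs_imbalance(n_per_group, reps=2000, seed=1):
    """Average |mean(X | treated) - mean(X | control)| across replications.

    Both groups are random draws from the same N(0, 1) population,
    so any difference in means is pure sampling noise.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diffs.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
    return statistics.fmean(diffs)

# "Pruning at random" from 200 units per group down to 25 units per group.
imbalance_by_n = {n: mean_abs_imbalance(n) for n in (200, 100, 50, 25)}
for n, imbalance in imbalance_by_n.items():
    print(f"n per group = {n:3d}: expected |mean difference| = {imbalance:.3f}")
```

Under these assumptions the expected absolute mean difference grows roughly with one over the square root of the group size, which is the mechanism behind the first bullet above.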

PSM Increases Model Dependence & Bias

[Figure, two panels, each comparing MDM and PSM by number of units pruned (0 to 160). Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications; true effect = 2. Slides by King and Nielsen.]

$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). Each panel plots imbalance against the number of units pruned for CEM, MDM, and PSM, with reference marks for the raw data, random pruning, and a 1/4 SD caliper. Slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.

Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
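The efficiency statement about blocking can be illustrated with a small simulation. The Python sketch below is an illustration under stated assumptions, not part of the talk: outcomes follow Y = tau*T + beta*X + noise with a constant effect tau, blocking is implemented as pairing adjacent units after sorting on X, and the estimator is the simple difference in means. All names and parameter values are hypothetical.

```python
import random
import statistics

def diff_in_means(y, t):
    """Difference in mean outcomes between treated (t == 1) and control (t == 0)."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

def estimator_variance(beta, blocked, n=40, reps=2000, tau=2.0, seed=7):
    """Sampling variance of the difference in means at fixed sample size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = sorted(rng.gauss(0, 1) for _ in range(n))
        if blocked:
            # Fully blocked: randomize within pairs of neighbors on X.
            t = []
            for _ in range(n // 2):
                pair = [0, 1]
                rng.shuffle(pair)
                t.extend(pair)
        else:
            # Complete randomization: half of all units treated, ignoring X.
            t = [1] * (n // 2) + [0] * (n // 2)
            rng.shuffle(t)
        y = [tau * ti + beta * xi + rng.gauss(0, 1) for xi, ti in zip(x, t)]
        estimates.append(diff_in_means(y, t))
    return statistics.variance(estimates)

var_complete_strong = estimator_variance(beta=2.0, blocked=False)
var_blocked_strong = estimator_variance(beta=2.0, blocked=True)
var_complete_none = estimator_variance(beta=0.0, blocked=False)
var_blocked_none = estimator_variance(beta=0.0, blocked=True)
print(f"strong X-Y relation: complete {var_complete_strong:.3f}, blocked {var_blocked_strong:.3f}")
print(f"no X-Y relation:     complete {var_complete_none:.3f}, blocked {var_blocked_none:.3f}")
```

In this setup the gain from blocking is large when X strongly predicts Y and essentially vanishes when X is unrelated to Y. The sample size is held fixed throughout, so the sketch does not address the case where blocking shrinks the sample.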

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
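The difference between an algorithm that throws away good matches and one that averages over them can be sketched in a few lines of Python. This is a stylized illustration, not the setup used by King and Nielsen or by kmatch: the propensity score is assumed known and takes only two values, treatment is random within each score value, and the true effect is constant. All function names and numbers are hypothetical.

```python
import random
import statistics

def draw_sample(rng, n=300, tau=1.0):
    """Units in two propensity-score strata; treatment is random within a stratum."""
    units = []
    for _ in range(n):
        ps = rng.choice((0.1, 0.2))
        t = 1 if rng.random() < ps else 0
        y = tau * t + 4 * ps + rng.gauss(0, 1)
        units.append((ps, t, y))
    return units

def att_one_to_one(units, rng):
    """1:1 matching without replacement: one random same-score control per treated unit."""
    pools = {}
    for ps, t, y in units:
        if t == 0:
            pools.setdefault(ps, []).append(y)
    for pool in pools.values():
        rng.shuffle(pool)
    diffs = [y - pools[ps].pop() for ps, t, y in units if t == 1 and pools.get(ps)]
    return statistics.fmean(diffs)

def att_radius(units, caliper=0.01):
    """Radius matching: average over all controls whose score lies within the caliper."""
    controls = [(ps, y) for ps, t, y in units if t == 0]
    diffs = []
    for ps, t, y in units:
        if t == 1:
            matches = [yc for pc, yc in controls if abs(pc - ps) <= caliper]
            if matches:
                diffs.append(y - statistics.fmean(matches))
    return statistics.fmean(diffs)

rng = random.Random(42)
one_to_one, radius = [], []
for _ in range(200):
    sample = draw_sample(rng)
    one_to_one.append(att_one_to_one(sample, rng))
    radius.append(att_radius(sample))

var_one_to_one = statistics.variance(one_to_one)
var_radius = statistics.variance(radius)
print(f"1:1 without replacement: mean {statistics.fmean(one_to_one):.3f}, variance {var_one_to_one:.4f}")
print(f"radius matching:         mean {statistics.fmean(radius):.3f}, variance {var_radius:.4f}")
```

Both estimators are centered on the true effect in this setup, but the one-to-one estimator discards most of the equally good controls and therefore has the larger sampling variance.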


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry:

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    432    25     457    1105     291    1396   1.3394

    Treatment-effects estimation

    wage        Coef.
    ATT      .6059013
    NATE     1.432913
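To make the idea of kernel matching concrete, here is a minimal Python sketch of an ATT estimate with an Epanechnikov kernel on a single covariate. It is a simplified illustration under stated assumptions and does not reproduce kmatch's implementation (no Mahalanobis scaling, no automatic bandwidth selection); the function names and the toy data are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, zero otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_matching_att(treated, controls, bandwidth):
    """ATT from kernel matching on one covariate x.

    treated, controls: lists of (x, y) pairs.
    Treated units without any control inside the bandwidth stay unmatched.
    """
    effects = []
    for x_t, y_t in treated:
        weights = [epanechnikov((x_c - x_t) / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unit is not matched
        counterfactual = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - counterfactual)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects)

treated = [(0.0, 5.0), (10.0, 7.0)]          # second unit has no nearby control
controls = [(0.0, 3.0), (0.5, 1.0), (2.0, 100.0)]
att = kernel_matching_att(treated, controls, bandwidth=1.0)
print(round(att, 4))  # 2.8571
```

The second treated unit has no control within the bandwidth and is dropped, which corresponds to the "Matched: No" column in the matching statistics; the distant control at x = 2 receives zero weight and corresponds to an unused control.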

Some Examples

Some balancing statistics:

    . kmatch summarize
    (refitting the model using the generate() option)

                              Raw                        Matched(ATT)
    Means        Treated  Untreated    StdDif    Treated  Untreated    StdDif
    collgrad     .321663    .224212   .219912    .319444    .319444         0
    ttl_exp      13.2685    12.7323   .117584    13.3205    13.1425   .039036
    tenure       7.89205    6.17658    .29735    7.91744    7.58347   .057888
    3.industry   .006565    .012178  -.058246     .00463     .00463         0
    4.industry   .183807    .166905   .044425    .185185    .185185         0
    5.industry   .105033    .027937   .312944    .085648    .085648         0
    6.industry   .045952    .169771  -.407129    .048611    .048611         0
    7.industry   .019694    .102436  -.350657    .020833    .020833         0
    8.industry   .017505    .035817  -.113785    .009259    .009259         0
    9.industry   .010941    .040115  -.185669    .011574    .011574         0
    10.industry  .004376    .008596  -.052551    .002315    .002315         0
    11.industry  .479212    .356734   .250073    .506944    .506944         0
    12.industry  .122538     .07235   .169707     .12037     .12037         0
    2.race       .330416    .244986   .189418      .3125      .3125         0
    3.race       .017505    .011461   .050566    .006944    .006944         0
    south        .297593    .466332  -.352408    .291667    .291667         0

                              Raw                        Matched(ATT)
    Variances    Treated  Untreated     Ratio    Treated  Untreated     Ratio
    collgrad     .218674    .174066   1.25628    .217904    .217904         1
    ttl_exp      20.5898    21.0001   .980459    19.8177    18.2323   1.08696
    tenure       37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
    3.industry   .006536    .012038   .542928    .004619    .004619         1
    4.industry   .150351    .139148   1.08052    .151242    .151242         1
    5.industry   .094207    .027176   3.46656    .078494    .078494         1
    6.industry   .043936     .14105   .311496    .046355    .046355         1
    7.industry   .019348    .092008   .210287    .020447    .020447         1
    8.industry   .017237    .034559   .498769    .009195    .009195         1
    9.industry   .010845    .038533   .281445    .011467    .011467         1
    10.industry  .004367    .008528   .512039    .002315    .002315         1
    11.industry  .250115    .229639   1.08917    .250532    .250532         1
    12.industry  .107758    .067163   1.60443    .106127    .106127         1
    2.race       .221726      .1851   1.19787    .215342    .215342         1
    3.race       .017237    .011338   1.52025    .006912    .006912         1
    south         .20949    .249045   .841173    .207077    .207077         1
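The two balance statistics in the table can be recomputed by hand. The Python sketch below assumes the common definition of the standardized difference (difference in means divided by the square root of the average of the two group variances); that this is the definition used is an assumption, although it reproduces the raw `collgrad` row (means .321663 and .224212, variances .218674 and .174066). The function names are hypothetical.

```python
import math

def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means scaled by the pooled standard deviation
    sqrt((var_t + var_c) / 2)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def variance_ratio(var_t, var_c):
    """Variance of the treated divided by variance of the untreated."""
    return var_t / var_c

# Raw (unmatched) values for collgrad from the kmatch summarize output.
std_dif = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(f"StdDif = {std_dif:.4f}, variance ratio = {ratio:.4f}")
```

The results are about 0.22 and 1.26, in line with the StdDif and Ratio columns for `collgrad` in the raw data. After matching, a standardized difference near 0 and a variance ratio near 1 indicate good balance.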

Some Examples

Make a graph of the balancing stats:

    . mat M = r(M)

    . mat V = r(V)

    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
    >     bylabels("Std. mean difference" "Variance ratio") ///
    >     noci nolabels byopts(xrescale)

    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

    . addplot 2: , xline(1) norescaling

[Figure, two panels: standardized mean difference (x-axis about -.4 to .4) and variance ratio (x-axis 0 to 4) for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), raw versus matched.]

Some Examples

Propensity-score kernel matching:

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1,853
                                               Kernel        = epan
    Treatment : union = 1
    Covariates: collgrad ttl_exp tenure i.industry i.race south
    PS model  : logit (pr)

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    431    26     457    1214     182    1396   .00188

    Treatment-effects estimation

    wage        Coef.
    ATT      .3887224
    NATE     1.432913

Some Examples

Kernel density balancing plot:

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure, two panels (Raw, Matched (ATT)): density of the propensity score (x-axis 0 to .8) for untreated and treated units.]

Cumulative distribution balancing plot:

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): cumulative probability (0 to 1) of the propensity score (x-axis 0 to .8) for untreated and treated units.]

Balancing box plot:

    . kmatch box
    (refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): box plots of the propensity score (0 to .8) for untreated and treated units.]

Some Examples

Standard errors:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched             Controls          Band-
                 Yes    No   Total    Used  Unused   Total   width
    Treated      432    25     457    1105     291    1396   1.3394
    Untreated   1386    10    1396     455       2     457   3.3975
    Combined    1818    35    1853    1560     293    1853

    Treatment-effects estimation

             Observed   Bootstrap                      Normal-based
    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATE      .4095729    .1920853   2.13    0.033    .0330928   .7860531
    ATT      .6059013    .2472069   2.45    0.014    .1213846   1.090418
    ATC      .3483797    .1893653   1.84    0.066   -.0227695   .7195289
    NATE     1.432913    .2333282   6.14    0.000    .9755981   1.890228

Some Examples

Do some tests:

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1)     -.8270117    .1810415  -4.57    0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1,853
                                               Neighbors: min = 1
    Treatment : union = 1                                 max = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    457     0     457     328    1068    1396

    Treatment-effects estimation

    wage        Coef.
    ATT      .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 1
    Outcome model  : matching                               min   = 1
    Distance metric: Mahalanobis                            max   = 1

                                        AI Robust
    wage                      Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)    .7246969      .2942952   2.46    0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1,853
                                               Neighbors: min = 5
    Treatment : union = 1                                 max = 5
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    457     0     457     870     526    1396

    Treatment-effects estimation

    wage        Coef.
    ATT      .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                               min   = 5
    Distance metric: Mahalanobis                            max   = 6

                                        AI Robust
    wage                      Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)    .5590823      .2381752   2.35    0.019    .0922675   1.025897

Some Examples

Bias correction / regression adjustment:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1,853
                                               Neighbors: min = 5
    Treatment : union = 1                                 max = 5
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    457     0     457     870     526    1396

    Treatment-effects estimation

    wage        Coef.
    ATT      .5288023

    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                               min   = 5
    Distance metric: Mahalanobis                            max   = 6

                                        AI Robust
    wage                      Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)    .5288023      .2420635   2.18    0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined:

    . kmatch md union collgrad ttl_exp tenure (wage), att ///
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis (modified)
    Covariates: collgrad ttl_exp tenure
    PS model  : logit (pr)
    PS covars : i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    439    18     457    1258     138    1396   .83886

    Treatment-effects estimation

    wage        Coef.
    ATT      .6408443

Some Examples

Exact matching:

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure
    Exact     : industry race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    432    25     457    1103     293    1396   1.3013

    Treatment-effects estimation

    wage        Coef.
    ATT      .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    432    25     457    1105     291    1396   1.3394

    Treatment-effects estimation

    wage        Coef.
    ATT      .6059013

Some Examples

Bandwidth selection: cross validation with respect to X:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    448     9     457    1184     212    1396   1.8888

    Treatment-effects estimation

    wage        Coef.
    ATT      .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: MSE against bandwidth (about 1.5 to 3) for the evaluated candidate bandwidths, labeled by search index 1 to 15.]

Some Examples

Bandwidth selection: cross validation with respect to Y:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    453     4     457    1289     107    1396    2.433

    Treatment-effects estimation

    wage        Coef.
    ATT      .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: MISE against bandwidth (about 1.5 to 3) for the evaluated candidate bandwidths, labeled by search index 1 to 15.]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    455     2     457    1356      40    1396   2.7626

    Treatment-effects estimation

    wage        Coef.
    ATT      .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: weighted MISE against bandwidth (1 to 5) for the evaluated candidate bandwidths, labeled by search index 1 to 15.]

Some Examples

Common-support diagnostics:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(0.5)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    366    91     457     701     695    1396       .5

    Treatment-effects estimation

    wage        Coef.
    ATT      .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                     Common support (treated)         Standardized difference
    Means        Matched  Unmatched     Total    (1)-(3)    (2)-(3)    (1)-(2)
    collgrad     .322404    .318681   .321663    .001585   -.006376    .007962
    ttl_exp      13.3929    12.7682   13.2685    .027413   -.110253    .137666
    tenure       8.12614    6.95055   7.89205    .038378   -.154356    .192734
    3.industry   .002732    .021978   .006565   -.047404    .190657   -.238061
    4.industry   .191257    .153846   .183807    .019212   -.077269    .096481
    5.industry   .062842    .274725   .105033   -.137462    .552867   -.690329
    6.industry   .057377          0   .045952    .054507   -.219225    .273732
    7.industry   .019126    .021978   .019694   -.004083    .016423   -.020506
    8.industry   .005464    .065934   .017505   -.091714    .368871   -.460585
    9.industry   .010929    .010989   .010941   -.000115    .000462   -.000577
    10.industry        0    .021978   .004376   -.066227    .266363   -.332589
    11.industry  .554645    .175824   .479212     .15083   -.606636    .757467
    12.industry  .092896    .241758   .122538   -.090299    .363181    -.45348
    2.race       .243169    .681319   .330416   -.185284    .745209   -.930494
    3.race       .002732    .076923   .017505   -.112525    .452572   -.565097
    south         .29235    .318681   .297593   -.011456    .046074    -.05753

                     Common support (treated)                  Ratio
    Variances    Matched  Unmatched     Total    (1)/(3)    (2)/(3)    (1)/(2)
    collgrad     .219058    .219536   .218674    1.00176    1.00394    .997824
    ttl_exp      19.4198    25.2474   20.5898    .943177    1.22621     .76918
    tenure       38.3324    31.9242   37.2044    1.03032    .858076    1.20073
    3.industry   .002732    .021734   .006536    .418045    3.32537    .125714
    4.industry   .155101    .131624   .150351    1.03159    .875443    1.17837
    5.industry   .059054    .201465   .094207    .626851    2.13854    .293122
    6.industry   .054233          0   .043936    1.23435          0          .
    7.industry   .018811    .021734   .019348    .972252     1.1233    .865531
    8.industry    .00545    .062271   .017237    .316157    3.61269    .087513
    9.industry   .010839    .010989   .010845    .999464    1.01328    .986361
    10.industry        0    .021734   .004367          0    4.97709          0
    11.industry  .247691     .14652   .250115    .990307    .585811    1.69049
    12.industry  .084497    .185348   .107758    .784137    1.72003    .455885
    2.race       .184542    .219536   .221726    .832297    .990121    .840601
    3.race       .002732    .071795   .017237    .158513    4.16522    .038056
    south        .207448    .219536    .20949    .990254    1.04796    .944939

    (1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats:

    . mat M = r(M)

    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference (x-axis about -.2 to .2) for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south).]

Some Examples

Multiple outcome variables:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,852
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    432    25     457    1104     291    1395   1.3392

    Treatment-effects estimation

                Coef.
    wage
    ATT      .6021049
    NATE     1.430823
    hours
    ATT      1.263759
    NATE     1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure) ///
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,852
                                               Kernel        = epan
    Treatment : union = 1
    Metric    : mahalanobis
    Covariates: collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched             Controls          Band-
               Yes    No   Total    Used  Unused   Total   width
    Treated    432    25     457    1104     291    1395   1.3392

    Treatment-effects estimation

                Coef.
    wage
    ATT      .5152752
    NATE     1.430823
    hours
    ATT      1.263759
    NATE     1.450303

    wage : adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched      |       Controls       |  Band-
           |  Yes    No   Total |  Used  Unused  Total |  width
0  Treated |  306    15     321 |   625     120    745 | 1.3199
1  Treated |  126    10     136 |   473     178    651 | 1.3398

Treatment-effects estimation

             |  Observed   Bootstrap                         Normal-based
        wage |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
0        ATT |  .4586332   .2763358    1.66   0.097     -.082975    1.000241
1        ATT |  .9518705    .406903    2.34   0.019     .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         (1) |  .4932373   .4452343    1.11   0.268     -.379406    1.365881
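
The test and lincom results for the difference between the two subgroup ATTs are two views of the same Wald statistic. As a minimal sketch (plain Python, not part of kmatch or Stata), the following recomputes z, chi2(1), the two-sided p-value, and the normal-based 95% interval from the reported coefficient and bootstrap standard error; the variable names are illustrative assumptions.

```python
import math

# Difference [1]ATT - [0]ATT and its bootstrap standard error, as reported
# in the lincom output (decimal points restored).
coef, se = 0.4932373, 0.4452343

z = coef / se
chi2 = z * z  # Wald chi-squared statistic with 1 degree of freedom
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under normality
z_crit = 1.959964  # 97.5% quantile of the standard normal distribution
ci = (coef - z_crit * se, coef + z_crit * se)

print(f"z = {z:.2f}, chi2(1) = {chi2:.2f}, p = {p_value:.4f}")
print(f"95% CI: [{ci[0]:.6f}, {ci[1]:.6f}]")
```

Squaring the z statistic from lincom gives the chi2(1) statistic from test, which is why both commands report the same p-value up to rounding.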

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels: mean outcome by propensity score for untreated and treated; treatment effect by propensity score]
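
The reported population values can be cross-checked against each other, because the ATE is the average of ATT and ATC weighted by the group shares. The following is only a consistency check under the assumption that the treatment-group share is 17.5% (written here as \pi, a symbol not used in the slides):

```latex
\mathrm{ATE} = \pi\,\mathrm{ATT} + (1-\pi)\,\mathrm{ATC}
\approx 0.175 \times (-3.96) + 0.825 \times (-3.41)
\approx -3.51
```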

Results: Variance

[Figure: variance of the estimates for N = 500 and N = 5000. Legend: MDM, with bias correction; PSM, with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

This slide shows that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Legend: MDM, with bias correction; PSM, with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimates for N = 500 and N = 5000. Legend: MDM, with bias correction; PSM, with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Legend: MDM, with bias correction; PSM, with bias correction. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Legend: MDM, with bias correction; PSM, with bias correction. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]


Conclusions

The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
  + MDM leaves less scope for post-matching modeling decision biases.
  + Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  + Fewer restrictions in terms of possible post-matching analyses.
  - The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
  - Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Some conclusions from the simulation:
  - For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
  - Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
  - Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)

[Figure, shown in several animation steps: scatter plot of Outcome against Education (years) for treated (T) and control (C) units; slide 3/23 of the slides by King and Nielsen]

King and Nielsen

Argument 2
  - PSM approximates complete randomization.
  - Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance       |  Complete        |  Fully
Covariates    |  Randomization   |  Blocked
Observed      |  On average      |  Exact
Unobserved    |  On average      |  On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
  • PSM: complete randomization
  • Other methods: fully blocked
  • Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps: Age against Education (years) for treated (T) and control (C) units; slide 9/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching

[Figure, shown in two animation steps: Age against Education (years) for treated (T) and control (C) units, with an additional propensity score axis from 0 to 1; slide 15/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching is Suboptimal

[Figure: Age against Education (years) for the matched treated (T) and control (C) units; slide 15/23 of the slides by King and Nielsen]

King and Nielsen

Argument 3
  - Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
  - More imbalance/variance means more model dependence and researcher discretion.
  - Because PSM approximates complete randomization, it engages in random pruning.
  - PSM Paradox ("when you do 'better', you do worse"):
      - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
      - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
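
The claim that random pruning increases imbalance can be illustrated with a small simulation. This is a sketch under assumed conditions (one covariate X ~ N(0, 1) with identical distributions in both groups, so there is nothing to gain from matching), not a replication of King and Nielsen's analysis; the function name and all parameter values are hypothetical.

```python
import random
import statistics

def mean_abs_imbalance(n_keep, n_total=200, reps=500, seed=1):
    """Average |mean(X_treated) - mean(X_control)| after randomly keeping
    n_keep of n_total units per group, with X ~ N(0, 1) in both groups."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_total)]
        control = [rng.gauss(0, 1) for _ in range(n_total)]
        # Random pruning: units are dropped without regard to X.
        t_kept = rng.sample(treated, n_keep)
        c_kept = rng.sample(control, n_keep)
        total += abs(statistics.fmean(t_kept) - statistics.fmean(c_kept))
    return total / reps

imbalance_by_size = {n: mean_abs_imbalance(n) for n in (200, 100, 50, 25)}
for n, value in imbalance_by_size.items():
    print(f"units kept per group: {n:3d}  mean |imbalance|: {value:.3f}")
```

Because the groups are balanced in expectation, pruning cannot remove bias here; it only shrinks the sample, so the typical size of the chance difference in means grows as fewer units are kept.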

PSM Increases Model Dependence & Bias

[Figure, two panels, each plotted against the number of units pruned for MDM and PSM. Model Dependence: variance. Bias: maximum coefficient across 512 specifications; true effect = 2. Slide 20/23 of the slides by King and Nielsen]

$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

The Propensity Score Paradox in Real Data

[Figure, two panels: imbalance against the number of units pruned for CEM, MDM, and PSM, with reference markers for the raw data, random pruning, and the 1/4 SD caliper. Left panel: Finkel et al. (JOP, 2012). Right panel: Nielsen et al. (AJPS, 2011). Slide 21/23 of the slides by King and Nielsen]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
  - Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
  - Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
  - PSM approximates complete randomization.
  - Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
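
The efficiency statement for a fixed sample size can be checked with a short simulation. The sketch below is an illustration under assumptions that are not in the slides: a single covariate X, an outcome Y = X + noise with a true effect of 0, and blocks formed as pairs of neighbors in X (a matched-pair design standing in for full blocking).

```python
import random
import statistics

def simulate(blocked, n=100, reps=2000, seed=7):
    """Variance of the difference-in-means estimator over repeated
    randomizations, for Y = X + noise and a true treatment effect of 0."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = sorted(rng.gauss(0, 1) for _ in range(n))
        y = [xi + rng.gauss(0, 0.5) for xi in x]
        if blocked:
            # Blocks are pairs of neighbors in X; one unit per pair is treated.
            treated = {i + rng.randint(0, 1) for i in range(0, n, 2)}
        else:
            # Complete randomization: any n/2 units are treated.
            treated = set(rng.sample(range(n), n // 2))
        y1 = [y[i] for i in range(n) if i in treated]
        y0 = [y[i] for i in range(n) if i not in treated]
        estimates.append(statistics.fmean(y1) - statistics.fmean(y0))
    return statistics.variance(estimates)

var_complete = simulate(blocked=False)
var_blocked = simulate(blocked=True)
print(f"complete: {var_complete:.4f}  blocked: {var_blocked:.4f}")
```

Both designs use all n units, so the comparison holds the sample size fixed; the gain from blocking comes entirely from the relation between X and Y and would shrink if that relation were weaker.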

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
  - If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
  - If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
  - Random pruning ⇒ imbalance ⇒ more model dependence
  - PSM ⇒ complete randomization ⇒ lots of random pruning
  - PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
  - Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
  - Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
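
How kernel matching averages instead of pruning can be sketched in a few lines. This is a simplified illustration, not kmatch's implementation: it uses the Epanechnikov kernel with a fixed bandwidth, matches on the true propensity score of an assumed data-generating process (logistic in a single covariate, constant treatment effect of 2), and all names and parameter values are hypothetical.

```python
import math
import random

def epan(u):
    """Epanechnikov kernel, zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_att(ps_t, y_t, ps_c, y_c, bandwidth):
    """ATT by kernel matching on a scalar score.

    Each treated unit is compared with a kernel-weighted average of all
    controls within `bandwidth`; treated units without any control in
    range are left unmatched (off common support)."""
    effects = []
    for p, y in zip(ps_t, y_t):
        weights = [epan((p - q) / bandwidth) for q in ps_c]
        wsum = sum(weights)
        if wsum == 0.0:
            continue  # no control within the bandwidth
        y0_hat = sum(w * yc for w, yc in zip(weights, y_c)) / wsum
        effects.append(y - y0_hat)
    return sum(effects) / len(effects), len(effects)

# Toy data (assumed): X ~ N(0, 1), propensity score logistic(X),
# outcome Y = 2*D + X + noise, so the true effect is 2 for everyone.
rng = random.Random(42)
ps_t, y_t, ps_c, y_c = [], [], [], []
for _ in range(1000):
    x = rng.gauss(0, 1)
    p = 1.0 / (1.0 + math.exp(-x))
    d = 1 if rng.random() < p else 0
    y = 2.0 * d + x + rng.gauss(0, 1)
    if d:
        ps_t.append(p)
        y_t.append(y)
    else:
        ps_c.append(p)
        y_c.append(y)

naive = sum(y_t) / len(y_t) - sum(y_c) / len(y_c)
att, n_matched = kernel_att(ps_t, y_t, ps_c, y_c, bandwidth=0.05)
print(f"naive difference: {naive:.3f}")
print(f"kernel-matching ATT: {att:.3f} ({n_matched} of {len(ps_t)} treated matched)")
```

Every control inside the bandwidth contributes to the counterfactual average, so good matches are never discarded at random; controls are dropped only where no treated unit is nearby.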

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you would need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
  - Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
  - Optional exact matching
  - Optional regression-adjustment bias correction
  - Kernel matching, ridge matching, or nearest-neighbor matching
  - Automatic bandwidth selection for kernel/ridge matching
  - Flexible specification of scaling matrix for MDM
  - Joint analysis of multiple subgroups and multiple outcome variables
  - Various post-estimation commands for balancing and common-support diagnostics
  - Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched      |       Controls       |  Band-
           |  Yes    No   Total |  Used  Unused  Total |  width
   Treated |  432    25     457 |  1105     291   1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013
      NATE |  1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw              |         Matched(ATT)
       Means |  Treated  Untrea~d    StdDif |  Treated  Untrea~d    StdDif
    collgrad |  .321663   .224212   .219912 |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584 |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735 |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246 |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425 |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944 |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129 |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657 |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785 |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669 |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551 |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073 |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707 |   .12037    .12037         0
      2.race |  .330416   .244986   .189418 |    .3125     .3125         0
      3.race |  .017505   .011461   .050566 |  .006944   .006944         0
       south |  .297593   .466332  -.352408 |  .291667   .291667         0

             |             Raw              |         Matched(ATT)
   Variances |  Treated  Untrea~d     Ratio |  Treated  Untrea~d     Ratio
    collgrad |  .218674   .174066   1.25628 |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459 |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706 |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928 |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052 |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656 |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496 |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287 |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769 |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445 |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039 |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917 |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443 |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787 |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025 |  .006912   .006912         1
       south |   .20949   .249045   .841173 |  .207077   .207077         1
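
The StdDif and Ratio columns of the kmatch summarize output can be reproduced by hand. The definitions below are inferred from the printed numbers (for example, a raw treated mean of .321663 for collgrad), not taken from the kmatch documentation, so treat them as an assumption: the standardized difference is the difference in means divided by the square root of the average of the two raw variances, and the ratio is treated variance over untreated variance.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Difference in means divided by the pooled standard deviation,
    sqrt((var_t + var_c) / 2)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

# Raw sample: (treated mean, untreated mean, treated variance, untreated variance),
# copied from the output with decimal points restored.
raw = {
    "collgrad": (0.321663, 0.224212, 0.218674, 0.174066),
    "ttl_exp": (13.2685, 12.7323, 20.5898, 21.0001),
    "tenure": (7.89205, 6.17658, 37.2044, 29.3629),
}
for name, (m1, m0, v1, v0) in raw.items():
    print(f"{name:9s} StdDif = {std_diff(m1, m0, v1, v0):.6f}  Ratio = {v1 / v0:.5f}")

# Matched sample: the matched means of ttl_exp, scaled by the RAW variances,
# come closest to the printed matched StdDif.
matched_ttl_exp = std_diff(13.3205, 13.1425, 20.5898, 21.0001)
print(f"ttl_exp (matched) StdDif = {matched_ttl_exp:.6f}")
```

That the matched StdDif appears to use the raw variances means the raw and matched columns share the same scale, so a change in StdDif reflects a change in the mean difference only.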

Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure, two panels: standardized mean difference and variance ratio, raw versus matched, for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |       Matched      |       Controls       |  Band-
           |  Yes    No   Total |  Used  Unused  Total |  width
   Treated |  431    26     457 |  1214     182   1396 | .00188

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3887224
      NATE |  1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(6 2) lc(5 1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure, two panels (Raw, Matched (ATT)): density of the propensity score for untreated and treated]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(6 2) lc(5 1)
(refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): cumulative probability of the propensity score for untreated and treated]

Some Examples Balancing box plot kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 39

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching
Number of obs = 1853
Replications = 50
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      432     25    457     1105    291     1396   1.3394
Untreated    1386    10    1396    455     2       457    3.3975
Combined     1818    35    1853    1560    293     1853

Treatment-effects estimation
        Observed    Bootstrap                       Normal-based
wage    Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
ATE     .4095729    .1920853    2.13   0.033    .0330928   .7860531
ATT     .6059013    .2472069    2.45   0.014    .1213846   1.090418
ATC     .3483797    .1893653    1.84   0.066   -.0227695   .7195289
NATE    1.432913    .2333282    6.14   0.000    .9755981   1.890228
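The "Normal-based" columns follow directly from the coefficient and its bootstrap standard error. The short Python sketch below recomputes the ATT row from the table; the function name is hypothetical, and the only inputs are the two numbers reported in the output.

```python
from statistics import NormalDist


def normal_based_inference(coef, se, level=0.95):
    """z statistic, two-sided p-value and normal-based confidence interval."""
    z = coef / se
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, coef - crit * se, coef + crit * se


# ATT row of the bootstrap table: coefficient and bootstrap standard error
z, p, lower, upper = normal_based_inference(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), round(lower, 4), round(upper, 4))
```

Small discrepancies in the last digit are to be expected because the reported standard error is itself rounded.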

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage    Coef.       Std. Err.   z       P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415    -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

        chi2(1) = 2.42
    Prob > chi2 = 0.1200
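The standard error that lincom reports for ATT - NATE depends on the bootstrap covariance between the two estimates, which the output does not display. As a rough check, the following Python sketch backs out the covariance implied by the three reported standard errors, using Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b). The variable names are hypothetical; the numbers are taken from the two tables.

```python
from math import sqrt

se_att = 0.2472069    # bootstrap SE of ATT
se_nate = 0.2333282   # bootstrap SE of NATE
se_diff = 0.1810415   # SE of ATT - NATE as reported by lincom

# Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b), solved for the covariance
cov = (se_att**2 + se_nate**2 - se_diff**2) / 2
corr = cov / (se_att * se_nate)

# SE of the difference if the two estimates were (wrongly) treated as independent
se_if_independent = sqrt(se_att**2 + se_nate**2)

print(round(cov, 4), round(corr, 2), round(se_if_independent, 3))
```

The implied correlation is clearly positive, which is why the difference is estimated more precisely than either component would suggest under independence.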

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
Number of obs = 1853
Neighbors: min = 1, max = 1
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      457     0     457     328     1068    1396

Treatment-effects estimation
wage    Coef.
ATT     .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation
Number of obs = 1853
Estimator: nearest-neighbor matching
Matches: requested = 1, min = 1, max = 1
Outcome model: matching
Distance metric: Mahalanobis

                                          AI Robust
wage                          Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
ATET
union (union vs nonunion)     .7246969    .2942952    2.46   0.014   .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
Number of obs = 1853
Neighbors: min = 5, max = 5
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      457     0     457     870     526     1396

Treatment-effects estimation
wage    Coef.
ATT     .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation
Number of obs = 1853
Estimator: nearest-neighbor matching
Matches: requested = 5, min = 5, max = 6
Outcome model: matching
Distance metric: Mahalanobis

                                          AI Robust
wage                          Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
ATET
union (union vs nonunion)     .5590823    .2381752    2.35   0.019   .0922675   1.025897
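For readers who want to see the mechanics, here is a minimal Python sketch of nearest-neighbor matching with replacement for the ATT. It is a simplification under stated assumptions: it matches on a single covariate using absolute distance (the slides use Mahalanobis distance on several covariates), it ignores ties, and the function name and toy data are hypothetical.

```python
def nn_att(x_treated, y_treated, x_control, y_control, k=1):
    """ATT from k-nearest-neighbor matching with replacement on one covariate."""
    effects = []
    for x, y in zip(x_treated, y_treated):
        # Rank all controls by distance to this treated unit (ties not handled)
        order = sorted(range(len(x_control)), key=lambda j: abs(x - x_control[j]))
        nearest = order[:k]
        y0_hat = sum(y_control[j] for j in nearest) / len(nearest)
        effects.append(y - y0_hat)
    return sum(effects) / len(effects)


# Toy data (hypothetical, not taken from the slides)
x_t, y_t = [1.0, 4.0], [5.0, 9.0]
x_c, y_c = [0.9, 1.5, 3.8, 7.0], [4.0, 5.0, 8.0, 10.0]

att_1 = nn_att(x_t, y_t, x_c, y_c, k=1)
att_2 = nn_att(x_t, y_t, x_c, y_c, k=2)
print(att_1, att_2)
```

Using more neighbors averages over more distant controls, so the estimate can change noticeably, as it does between the 1-neighbor and 5-neighbor results in the output.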

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
Number of obs = 1853
Neighbors: min = 5, max = 5
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      457     0     457     870     526     1396

Treatment-effects estimation
wage    Coef.
ATT     .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation
Number of obs = 1853
Estimator: nearest-neighbor matching
Matches: requested = 5, min = 5, max = 6
Outcome model: matching
Distance metric: Mahalanobis

                                          AI Robust
wage                          Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
ATET
union (union vs nonunion)     .5288023    .2420635    2.18   0.029   .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model: logit (pr)
PS covars: i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      439     18    457     1258    138     1396   .83886

Treatment-effects estimation
wage    Coef.
ATT     .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure
Exact: industry race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      432     25    457     1103    293     1396   1.3013

Treatment-effects estimation
wage    Coef.
ATT     .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      432     25    457     1105    291     1396   1.3394

Treatment-effects estimation
wage    Coef.
ATT     .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      448     9     457     1184    212     1396   1.8888

Treatment-effects estimation
wage    Coef.
ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MSE) against bandwidth; points labeled by index]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      453     4     457     1289    107     1396   2.433

Treatment-effects estimation
wage    Coef.
ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MISE) against bandwidth; points labeled by index]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      455     2     457     1356    40      1396   2.7626

Treatment-effects estimation
wage    Coef.
ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (weighted MISE) against bandwidth; points labeled by index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching
Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      366     91    457     701     695     1396   .5

Treatment-effects estimation
wage    Coef.
ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                  Common support (treated)           Standardized difference
Means             Matched    Unmatched   Total       (1)-(3)     (2)-(3)     (1)-(2)
collgrad          .322404    .318681     .321663     .001585     -.006376    .007962
ttl_exp           13.3929    12.7682     13.2685     .027413     -.110253    .137666
tenure            8.12614    6.95055     7.89205     .038378     -.154356    .192734
3.industry        .002732    .021978     .006565     -.047404    .190657     -.238061
4.industry        .191257    .153846     .183807     .019212     -.077269    .096481
5.industry        .062842    .274725     .105033     -.137462    .552867     -.690329
6.industry        .057377    0           .045952     .054507     -.219225    .273732
7.industry        .019126    .021978     .019694     -.004083    .016423     -.020506
8.industry        .005464    .065934     .017505     -.091714    .368871     -.460585
9.industry        .010929    .010989     .010941     -.000115    .000462     -.000577
10.industry       0          .021978     .004376     -.066227    .266363     -.332589
11.industry       .554645    .175824     .479212     .15083      -.606636    .757467
12.industry       .092896    .241758     .122538     -.090299    .363181     -.45348
2.race            .243169    .681319     .330416     -.185284    .745209     -.930494
3.race            .002732    .076923     .017505     -.112525    .452572     -.565097
south             .29235     .318681     .297593     -.011456    .046074     -.05753

                  Common support (treated)           Ratio
Variances         Matched    Unmatched   Total       (1)/(3)     (2)/(3)     (1)/(2)
collgrad          .219058    .219536     .218674     1.00176     1.00394     .997824
ttl_exp           19.4198    25.2474     20.5898     .943177     1.22621     .76918
tenure            38.3324    31.9242     37.2044     1.03032     .858076     1.20073
3.industry        .002732    .021734     .006536     .418045     3.32537     .125714
4.industry        .155101    .131624     .150351     1.03159     .875443     1.17837
5.industry        .059054    .201465     .094207     .626851     2.13854     .293122
6.industry        .054233    0           .043936     1.23435     0           .
7.industry        .018811    .021734     .019348     .972252     1.1233      .865531
8.industry        .00545     .062271     .017237     .316157     3.61269     .087513
9.industry        .010839    .010989     .010845     .999464     1.01328     .986361
10.industry       0          .021734     .004367     0           4.97709     0
11.industry       .247691    .14652      .250115     .990307     .585811     1.69049
12.industry       .084497    .185348     .107758     .784137     1.72003     .455885
2.race            .184542    .219536     .221726     .832297     .990121     .840601
3.race            .002732    .071795     .017237     .158513     4.16522     .038056
south             .207448    .219536     .20949      .990254     1.04796     .944939

(1) matched, (2) unmatched, (3) total
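The two diagnostic columns can be reproduced from the reported means and variances. The Python sketch below does this for the ttl_exp row. It assumes that differences are standardized by the standard deviation of the total treated group; this is an inference from the printed numbers, not something the slide states, and the function names are hypothetical.

```python
from math import sqrt


def standardized_difference(mean_a, mean_b, var_reference):
    """Difference in means divided by a reference standard deviation."""
    return (mean_a - mean_b) / sqrt(var_reference)


def variance_ratio(var_a, var_b):
    return var_a / var_b


# ttl_exp row: matched vs. total treated observations
std_diff = standardized_difference(13.3929, 13.2685, 20.5898)
ratio = variance_ratio(19.4198, 20.5898)
print(round(std_diff, 4), round(ratio, 4))
```

Values close to 0 (standardized difference) and 1 (variance ratio) indicate that the treated units remaining on the common support resemble the full treated group on that covariate.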

Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference by covariate (collgrad, ttl_exp, tenure, industry, race, south); title "Std. difference"]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1852
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      432     25    457     1104    291     1395   1.3392

Treatment-effects estimation
            Coef.
wage
  ATT       .6021049
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1852
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
Treated      432     25    457     1104    291     1395   1.3392

Treatment-effects estimation
            Coef.
wage
  ATT       .5152752
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching
Number of obs = 1853
Replications = 50
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
             Matched               Controls               Bandwidth
             Yes     No    Total   Used    Unused  Total
0 Treated    306     15    321     625     120     745    1.3199
1 Treated    126     10    136     473     178     651    1.3398

Treatment-effects estimation
          Observed    Bootstrap                       Normal-based
wage      Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
0 ATT     .4586332    .2763358    1.66   0.097   -.082975   1.000241
1 ATT     .9518705    .406903     2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

        chi2(1) = 1.23
    Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage    Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
(1)     .4932373    .4452343    1.11   0.268   -.379406   1.365881
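The test and lincom results above are two views of the same comparison: for a single linear restriction, the Wald chi-squared statistic is the square of the z statistic, and both yield the same p-value. A short Python check using only the numbers printed in the output (variable names are hypothetical):

```python
from statistics import NormalDist

att_south0 = 0.4586332
att_south1 = 0.9518705
se_diff = 0.4452343  # standard error of the difference, as reported by lincom

diff = att_south1 - att_south0
z = diff / se_diff
chi2 = z ** 2                            # Wald statistic with 1 degree of freedom
p = 2 * (1 - NormalDist().cdf(abs(z)))   # same p-value for z and chi2(1)

print(round(diff, 7), round(z, 2), round(chi2, 2), round(p, 4))
```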

Simulation

Population data from Swiss census of 2000

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group

Control variables: gender, age, and highest educational degree

Population restricted to people between 24 and 60 years old who are working

2'308'006 individuals, of which 17.5% belong to the treatment group

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score for untreated and treated units; x-axis: propensity score (0 to 1); McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41)

[Figure: left panel, outcome by propensity score for untreated and treated units; right panel, treatment effect by propensity score]
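The three population effects are linked: the ATE is the average of ATT and ATC, weighted by the shares of treated and untreated units. A quick Python check with the figures from the simulation setup (17.5% treated, ATT = -3.96, ATC = -3.41); the variable names are hypothetical.

```python
share_treated = 0.175
att = -3.96
atc = -3.41
nate = -4.79

# ATE as the share-weighted average of ATT and ATC
ate = share_treated * att + (1 - share_treated) * atc

# Bias of the raw mean difference as an estimate of the ATT
raw_bias = nate - att

print(round(ate, 2), round(raw_bias, 2))
```

The result agrees with the reported ATE up to rounding, and the raw mean difference overstates the ATT by about 0.83 prestige points in absolute terms.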

Results: Variance

[Figure: variance of the estimators; panels N = 500 and N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent; panels N = 500 and N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
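The slides do not spell out how "bias reduction (in percent)" is computed. One common definition, used here purely as an assumption, compares an estimate's distance from the raw mean difference (NATE) with the distance between NATE and the true ATT, so that 100 means the raw bias is fully removed and values above 100 mean overcorrection. The function name and the example estimates below are hypothetical; only NATE = -4.79 and ATT = -3.96 come from the simulation setup.

```python
def bias_reduction_percent(estimate, nate=-4.79, att=-3.96):
    """Share of the raw bias (NATE - ATT) removed by an estimate, in percent.

    Assumed definition, not confirmed by the slides.
    """
    return 100.0 * (nate - estimate) / (nate - att)


print(bias_reduction_percent(-3.96))            # estimate equals the true ATT
print(bias_reduction_percent(-4.79))            # estimate equals the raw difference
print(round(bias_reduction_percent(-3.80), 1))  # hypothetical overcorrecting estimate
```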

Results: Mean squared error

[Figure: mean squared error of the estimators; panels N = 500 and N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]

Results: Relative standard error

[Figure: relative standard error; panels N = 500 and N = 5000; rows: nearest-neighbor matching (teffects; 1 and 5 neighbors), nearest-neighbor matching (bootstrap; 1 and 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals; panels N = 500 and N = 5000; same rows and series as the relative standard error figure]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure: scatter plot of outcome against education (years) for treated (T) and control (C) units, shown in several animation steps; slides by King and Nielsen]

King and Nielsen

Argument 2
• PSM approximates complete randomization.
• Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments

Types of Experiments

Balance on      Complete          Fully
covariates      randomization     blocked
Observed        On average        Exact
Unobserved      On average        On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)

• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)

(slides by King and Nielsen)
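The efficiency argument can be illustrated with a small simulation. The Python sketch below compares the sampling variance of the difference in means under complete randomization and under fully blocked (pair) randomization when the outcome depends strongly on an observed covariate. The data-generating process, sample size, and function names are assumptions of this sketch, not taken from King and Nielsen's slides; there is no treatment effect, so all variation in the estimate is noise.

```python
import random
from statistics import mean, pvariance


def diff_in_means(y, treated):
    """Mean outcome of treated units minus mean outcome of the rest."""
    t = [y[i] for i in range(len(y)) if i in treated]
    c = [y[i] for i in range(len(y)) if i not in treated]
    return mean(t) - mean(c)


def simulate(reps=2000, n=20, seed=1):
    rng = random.Random(seed)
    complete, blocked = [], []
    for _ in range(reps):
        x = list(range(n))                      # observed covariate, sorted
        y = [xi + rng.gauss(0, 1) for xi in x]  # outcome; true effect is zero
        # Complete randomization: any n/2 units are treated
        treated = set(rng.sample(range(n), n // 2))
        complete.append(diff_in_means(y, treated))
        # Fully blocked: one treated unit within each pair of neighbors on x
        treated = {2 * b + rng.randrange(2) for b in range(n // 2)}
        blocked.append(diff_in_means(y, treated))
    return pvariance(complete), pvariance(blocked)


var_complete, var_blocked = simulate()
print(var_complete, var_blocked)
```

In this setup blocking removes almost all covariate-driven variation, so the blocked design yields a much smaller variance; how large the gain is in practice depends on how strongly the covariates predict the outcome.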

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

CC CCC CCCCC CCCCCC CCC CC C C CCCCC CCC CC CC CC CC CC CC CC CC C CCCCC CC CC C CCCC C CCC CC CC C CCC CC CC CC C CCC CC CC CCC C CC CCCCC CC CCCC C CCC CCC CC C C CCCC CC CCCC C CCCC C CC CCC CC CCC CC CC CC CC C CCCC C CC C C CC CCC CC CCC CCCCC CCCC CCC CC CCC C C CC CC CC CCCC CC CC CCC C C CCC C CC CCC CC C CCC C C CCC CC CC C CC C CC CCCCC CCCC C C CC C CCCC CC CCC CCC C CCC CC CC CC CC CC C CC C CC CC CCC CC C C CCC CCC C CC CC CCC CCC CC CCC CC C CCC CC C CC CCCC C CC C CC CC C CC C CCC C C CCC CC C CCC CCC CC CCCC CC CC C CC C CC CC C CC CCC CC C CCC CCCC C CC CC C CCC CC CC CC C CCCCC CCC C C CC C CC CCC CCC CC CCC CC CCCC C CCC CC CCC

923 (slid

esby

Kin

gan

dN

ielsen

)

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 24

Best Case Mahalanobis Distance Matching

Education (years)

Age

12 14 16 18 20 22 24 26 28

20

30

40

50

60

70

80

T TTTT T T TTT TT T T TT TTTTT TTTT TT T TT TTTT TTT TT TTT TTTT TTTT

C CCCC C C CCC CC C C CC CCCCC CCCC CC C CC CCCC CCC CC CCC CCCC CCCC

923 (slid

esby

Kin

gan

dN

ielsen

)

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 24

Best Case Propensity Score Matching

Education (years)

Age

12 16 20 24 28

20

30

40

50

60

70

80

C

CCC

CC

C

C

CC

C

C

C

CC

C

C

C

C

C

CCCC

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C

CC

CC

CCC

CC

C

C

C

CC

CC

C

C

CC

C

CC

C

C

C C

CC

C

CC

CC

C

CC

C

C

CCC

C

C

C

CC

CC

C

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

CC

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

C CC

C

C

C

CC

C

C

C

C

C

C

CC

C

C

C

CC

C

C

C

C

C

CCC

C

C

C

CC

CC

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

C

C

C

C

C

C

C C

CC C

CC

CC

C

C

C

C

CC

C

C

C C

CC

CC

C

C

C

C C

CC

C

CC

C

C

C

C C

CC

C

CC

C

C

C

C

C

C

C

C

CC

CC

C

C

CC

CC

C

CC

C

C

C

C

CC

C

CC

C

C

C

CC

C

C

CC

CC

CC

C

C

C C

C

CC C

C

CC

C CC

C

C

C

C

CC

CC

C

C

C

C

CC

C

C

C

C

C

C

C

C

C

C

C

CC

CC C

C

C

C

C

C

C C

C

C

C

C C

C

C

C

CC

CCC

CC

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

C

C

C

C

CCC

CC

C

C

C

C

C

C

C

C

C

C

C

C

CC

C

CC

C

C

CC

C C

C

CC

C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C C

CC C

CCC

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C

CC

C

C C

C

C

C

C

C

C CC

C

C

C

C CC

C

C

C

CC

C

C

C

C

C C

C

C

C

C

C

C

C

C

CC

C

C

CC

C

C

C

C

C C

CC

C

C

T

TTT

TT

T

T

TT

T

T

T

TT

T

T

T

T

T

TT TT

T

T

T

T

T

T

T

TT

T

T

T

T

T

T

T

T

TT

TT

TTT

TT

1

0

PropensityScore

1523 (slid

esby

Kin

gan

dN

ielsen

)

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 24

Best Case: Propensity Score Matching

[Figure: scatter plot of control (C) and treated (T) units. x-axis: Education (years), 12 to 28; y-axis: Age, 20 to 80; legend: Propensity Score (0 to 1). Slide 15/23 by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of the remaining control (C) and treated (T) units. x-axis: Education (years), 12 to 28; y-axis: Age, 20 to 80. Slide 15/23 by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
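The variance step in Argument 3 can be made concrete with a standard textbook result, given here as an illustration and not taken from King and Nielsen's slides. It assumes a covariate X with the same variance \sigma^2 in both groups and independent observations; n_T and n_C (symbols introduced here) are the numbers of treated and control units left after pruning.

```latex
\operatorname{Var}\left(\bar{X}_T - \bar{X}_C\right)
  = \sigma^2 \left( \frac{1}{n_T} + \frac{1}{n_C} \right)
```

Pruning at random lowers n_T or n_C without changing \sigma^2, so the spread of the mean difference grows and large chance imbalances become more likely.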

PSM Increases Model Dependence & Bias

[Figure, two panels, each plotted against Number of Units Pruned (0 to 160), with one line for MDM and one for PSM. Panel "Model Dependence": y-axis Variance. Panel "Bias": y-axis Maximum Coefficient across 512 Specifications; true effect = 2. Slide 20/23 by King and Nielsen.]

$$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i, \qquad \epsilon_i \sim N(0, 1)$$
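The outcome equation on this slide can be simulated in a few lines of Stata. This is only a sketch: the sample size, the standard normal covariates, and the logit treatment assignment are assumptions made for illustration, and it is not King and Nielsen's actual simulation design.

```stata
* Outcome model from the slide: Y = 2*T + X1 + X2 + e, with e ~ N(0,1)
* Assumptions (not from the slides): N = 200, standard normal covariates,
* treatment assigned through a logit in X1 and X2
clear
set seed 20170623
set obs 200
generate x1 = rnormal()
generate x2 = rnormal()
generate t  = runiform() < invlogit(x1 + x2)
generate y  = 2*t + x1 + x2 + rnormal()

* Correctly specified model: the coefficient on t should be close to 2
regress y t x1 x2

* Dropping a covariate shows how the estimate depends on the specification
regress y t x1
```

Comparing the coefficient on t across the two regressions gives a small-scale impression of what "model dependence" refers to.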

The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP, 2012) and Nielsen et al. (AJPS, 2011). x-axis: Number of units pruned; y-axis: Imbalance; lines for CEM, MDM, PSM, and Random pruning; markers for Raw and for the 1/4 SD caliper. Slide 21/23 by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.

5 Are King and Nielsen right?

Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
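The first of these two cases follows directly from the definition of the propensity score. As a short worked step (e(X) is notation introduced here for the propensity score, it does not appear on the slide):

```latex
e(X) = \Pr(T = 1 \mid X),
\qquad
T \perp X \;\Rightarrow\; e(X) = \Pr(T = 1) \quad \text{for all } X
```

Every treated unit then has the same score as every control unit, so matching on e(X) cannot tell controls apart and the choice among them is effectively random.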


Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
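As a rough illustration of the two kinds of algorithm, the following sketch runs propensity-score matching once with kernel matching and once with nearest-neighbor matching. It borrows the data and variable list from the kmatch examples in this talk; that the nn option combines with kmatch ps in the same way as with kmatch md is an assumption here.

```stata
* Data preparation as in the kmatch examples of this talk
sysuse nlsw88, clear
drop if industry==2

* Propensity-score kernel matching: averages over the controls within the bandwidth
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att

* Propensity-score nearest-neighbor matching with one neighbor
* (assumption: nn works with kmatch ps as it does with kmatch md)
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att nn
```

The "Used" and "Unused" columns of the matching statistics show how many controls each algorithm keeps.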

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
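One way to stratify the matching by subgroup is the over() option that kmatch provides. The sketch below is hedged: over() is shown with kmatch md later in this talk, and it is assumed here that it can be combined with kmatch ps in the same way and that the matching is then redone within each group.

```stata
* Data preparation as in the kmatch examples of this talk
sysuse nlsw88, clear
drop if industry==2

* Propensity-score matching carried out by level of south
* (assumption: over() reruns the matching within each subgroup)
kmatch ps union collgrad ttl_exp tenure i.industry i.race (wage), att over(south)
```

Note that south is removed from the covariate list because it now defines the subgroups.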

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
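A minimal setup sketch for following the examples. The ssc install kmatch line is from the slide; that kmatch additionally needs the moremata and kdens packages is an assumption that should be checked against the kmatch help file.

```stata
* Install kmatch from SSC (as stated on the slide)
ssc install kmatch, replace

* Assumption: kmatch depends on these two SSC packages
ssc install moremata, replace
ssc install kdens, replace

* Open the documentation
help kmatch
```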

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   432    25    457  |  1105    291   1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013
      NATE |  1.432913

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                           Raw                       Matched (ATT)
Means        |  Treated  Untrea~d    StdDif |  Treated  Untrea~d    StdDif
collgrad     |  .321663   .224212   .219912 |  .319444   .319444         0
ttl_exp      |  13.2685   12.7323   .117584 |  13.3205   13.1425   .039036
tenure       |  7.89205   6.17658    .29735 |  7.91744   7.58347   .057888
3.industry   |  .006565   .012178  -.058246 |   .00463    .00463         0
4.industry   |  .183807   .166905   .044425 |  .185185   .185185         0
5.industry   |  .105033   .027937   .312944 |  .085648   .085648         0
6.industry   |  .045952   .169771  -.407129 |  .048611   .048611         0
7.industry   |  .019694   .102436  -.350657 |  .020833   .020833         0
8.industry   |  .017505   .035817  -.113785 |  .009259   .009259         0
9.industry   |  .010941   .040115  -.185669 |  .011574   .011574         0
10.industry  |  .004376   .008596  -.052551 |  .002315   .002315         0
11.industry  |  .479212   .356734   .250073 |  .506944   .506944         0
12.industry  |  .122538    .07235   .169707 |   .12037    .12037         0
2.race       |  .330416   .244986   .189418 |    .3125     .3125         0
3.race       |  .017505   .011461   .050566 |  .006944   .006944         0
south        |  .297593   .466332  -.352408 |  .291667   .291667         0

                           Raw                       Matched (ATT)
Variances    |  Treated  Untrea~d     Ratio |  Treated  Untrea~d     Ratio
collgrad     |  .218674   .174066   1.25628 |  .217904   .217904         1
ttl_exp      |  20.5898   21.0001   .980459 |  19.8177   18.2323   1.08696
tenure       |  37.2044   29.3629   1.26706 |  37.0399   34.9543   1.05966
3.industry   |  .006536   .012038   .542928 |  .004619   .004619         1
4.industry   |  .150351   .139148   1.08052 |  .151242   .151242         1
5.industry   |  .094207   .027176   3.46656 |  .078494   .078494         1
6.industry   |  .043936    .14105   .311496 |  .046355   .046355         1
7.industry   |  .019348   .092008   .210287 |  .020447   .020447         1
8.industry   |  .017237   .034559   .498769 |  .009195   .009195         1
9.industry   |  .010845   .038533   .281445 |  .011467   .011467         1
10.industry  |  .004367   .008528   .512039 |  .002315   .002315         1
11.industry  |  .250115   .229639   1.08917 |  .250532   .250532         1
12.industry  |  .107758   .067163   1.60443 |  .106127   .106127         1
2.race       |  .221726     .1851   1.19787 |  .215342   .215342         1
3.race       |  .017237   .011338   1.52025 |  .006912   .006912         1
south        |   .20949   .249045   .841173 |  .207077   .207077         1

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure, two panels with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south). Panel "Std. mean difference": x-axis from -.4 to .4. Panel "Variance ratio": x-axis from 0 to 4. Legend: Raw, Matched.]

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment     : union = 1
Covariates    : collgrad ttl_exp tenure i.industry i.race south
PS model      : logit (pr)

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   431    26    457  |  1214    182   1396 | .00188

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3887224
      NATE |  1.432913
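To see what enters the matching in the propensity-score case, the score can be estimated by hand with official Stata commands. This is a sketch of the general idea (a logit for treatment followed by predicted probabilities); it is not claimed to reproduce kmatch's internal computations, and the variable name pscore is made up for the example.

```stata
* Data preparation as in the kmatch examples of this talk
sysuse nlsw88, clear
drop if industry==2

* Logit model for treatment (union) on the matching covariates
logit union collgrad ttl_exp tenure i.industry i.race south

* Predicted probability of treatment = estimated propensity score
predict double pscore if e(sample), pr

* Compare the score distribution of treated and untreated
tabstat pscore, by(union) statistics(n mean sd min max)
```

Units are then compared on this single number instead of on the full covariate vector.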

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure, two panels: Raw and Matched (ATT). x-axis: Propensity score (0 to .8); y-axis: Density (0 to 3); lines for Untreated and Treated.]

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure, two panels: Raw and Matched (ATT). x-axis: Propensity score (0 to .8); y-axis: Cumulative probability (0 to 1); lines for Untreated and Treated.]

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure, two panels: Raw and Matched (ATT). Box plots of the Propensity score (0 to .8) for Untreated and Treated.]

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching
Number of obs = 1,853
Replications  = 50
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   432    25    457  |  1105    291   1396 | 1.3394
 Untreated |  1386    10   1396  |   455      2    457 | 3.3975
  Combined |  1818    35   1853  |  1560    293   1853 |

Treatment-effects estimation

           |  Observed  Bootstrap                     Normal-based
      wage |     Coef.  Std. Err.      z   P>|z|  [95% Conf. Interval]
       ATE |  .4095729   .1920853   2.13   0.033   .0330928   .7860531
       ATT |  .6059013   .2472069   2.45   0.014   .1213846   1.090418
       ATC |  .3483797   .1893653   1.84   0.066  -.0227695   .7195289
      NATE |  1.432913   .2333282   6.14   0.000   .9755981   1.890228

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

      wage |     Coef.  Std. Err.      z   P>|z|  [95% Conf. Interval]
       (1) | -.8270117   .1810415  -4.57   0.000  -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

       chi2(  1) =    2.42
     Prob > chi2 =  0.1200
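The lincom result can be checked by hand from the coefficients reported above, reading them as ATT = .6059013 and NATE = 1.432913 with a standard error of .1810415 for the difference. A short worked computation, using the normal approximation that the output labels "Normal-based":

```latex
\widehat{ATT} - \widehat{NATE} = 0.6059013 - 1.432913 = -0.8270117
\\[4pt]
z = \frac{-0.8270117}{0.1810415} \approx -4.57
\\[4pt]
-0.8270117 \pm 1.96 \times 0.1810415 \approx (-1.1818,\; -0.4722)
```

The difference is the part of the raw wage gap that the matching attributes to covariate differences between union members and non-members.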

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
Number of obs = 1,853
Neighbors     : min = 1, max = 1
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   457     0    457  |   328   1068   1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation
Number of obs   = 1,853
Estimator       : nearest-neighbor matching
Outcome model   : matching
Distance metric : Mahalanobis
Matches         : requested = 1, min = 1, max = 1

                       |            AI Robust
                  wage |     Coef.  Std. Err.      z   P>|z|  [95% Conf. Interval]
ATET                   |
  union                |
  (union vs nonunion)  |  .7246969   .2942952   2.46   0.014    .147889   1.301505

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
Number of obs = 1,853
Neighbors     : min = 5, max = 5
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   457     0    457  |   870    526   1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation
Number of obs   = 1,853
Estimator       : nearest-neighbor matching
Outcome model   : matching
Distance metric : Mahalanobis
Matches         : requested = 5, min = 5, max = 6

                       |            AI Robust
                  wage |     Coef.  Std. Err.      z   P>|z|  [95% Conf. Interval]
ATET                   |
  union                |
  (union vs nonunion)  |  .5590823   .2381752   2.35   0.019   .0922675   1.025897

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
Number of obs = 1,853
Neighbors     : min = 5, max = 5
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   457     0    457  |   870    526   1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .5288023

wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation
Number of obs   = 1,853
Estimator       : nearest-neighbor matching
Outcome model   : matching
Distance metric : Mahalanobis
Matches         : requested = 5, min = 5, max = 6

                       |            AI Robust
                  wage |     Coef.  Std. Err.      z   P>|z|  [95% Conf. Interval]
ATET                   |
  union                |
  (union vs nonunion)  |  .5288023   .2420635   2.18   0.029   .0543666   1.003238

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis (modified)
Covariates    : collgrad ttl_exp tenure
PS model      : logit (pr)
PS covars     : i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   439    18    457  |  1258    138   1396 | .83886

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6408443

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure
Exact         : industry race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   432    25    457  |  1103    293   1396 | 1.3013

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6047374

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   432    25    457  |  1105    291   1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   448     9    457  |  1184    212   1396 | 1.8888

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation search path, points labeled by evaluation index (1 to 15). x-axis: Bandwidth (1.5 to 3); y-axis: MSE (.02 to .1).]

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   453     4    457  |  1289    107   1396 |  2.433

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation search path, points labeled by evaluation index (1 to 15). x-axis: Bandwidth (1.5 to 3); y-axis: MISE (11.8 to 12.6).]

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   455     2    457  |  1356     40   1396 | 2.7626

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation search path, points labeled by evaluation index (1 to 15). x-axis: Bandwidth (1 to 5); y-axis: Weighted MISE (11 to 14).]

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   366    91    457  |   701    695   1396 |     .5

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

               Common support (treated)        Standardized difference
Means        |  Matched  Unmatc~d     Total |  (1)-(3)   (2)-(3)   (1)-(2)
collgrad     |  .322404   .318681   .321663 |  .001585  -.006376   .007962
ttl_exp      |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
tenure       |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
3.industry   |  .002732   .021978   .006565 | -.047404   .190657  -.238061
4.industry   |  .191257   .153846   .183807 |  .019212  -.077269   .096481
5.industry   |  .062842   .274725   .105033 | -.137462   .552867  -.690329
6.industry   |  .057377         0   .045952 |  .054507  -.219225   .273732
7.industry   |  .019126   .021978   .019694 | -.004083   .016423  -.020506
8.industry   |  .005464   .065934   .017505 | -.091714   .368871  -.460585
9.industry   |  .010929   .010989   .010941 | -.000115   .000462  -.000577
10.industry  |        0   .021978   .004376 | -.066227   .266363  -.332589
11.industry  |  .554645   .175824   .479212 |   .15083  -.606636   .757467
12.industry  |  .092896   .241758   .122538 | -.090299   .363181   -.45348
2.race       |  .243169   .681319   .330416 | -.185284   .745209  -.930494
3.race       |  .002732   .076923   .017505 | -.112525   .452572  -.565097
south        |   .29235   .318681   .297593 | -.011456   .046074   -.05753

               Common support (treated)                 Ratio
Variances    |  Matched  Unmatc~d     Total |  (1)/(3)   (2)/(3)   (1)/(2)
collgrad     |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
ttl_exp      |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
tenure       |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
3.industry   |  .002732   .021734   .006536 |  .418045   3.32537   .125714
4.industry   |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
5.industry   |  .059054   .201465   .094207 |  .626851   2.13854   .293122
6.industry   |  .054233         0   .043936 |  1.23435         0         .
7.industry   |  .018811   .021734   .019348 |  .972252    1.1233   .865531
8.industry   |   .00545   .062271   .017237 |  .316157   3.61269   .087513
9.industry   |  .010839   .010989   .010845 |  .999464   1.01328   .986361
10.industry  |        0   .021734   .004367 |        0   4.97709         0
11.industry  |  .247691    .14652   .250115 |  .990307   .585811   1.69049
12.industry  |  .084497   .185348   .107758 |  .784137   1.72003   .455885
2.race       |  .184542   .219536   .221726 |  .832297   .990121   .840601
3.race       |  .002732   .071795   .017237 |  .158513   4.16522   .038056
south        |  .207448   .219536    .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure "Std. difference": one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); x-axis from -.2 to .2.]

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,852
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   432    25    457  |  1104    291   1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .6021049
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,852
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
   Treated |   432    25    457  |  1104    291   1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .5152752
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching
Number of obs = 1,853
Replications  = 50
Kernel        = epan
Treatment     : union = 1
Metric        : mahalanobis
Covariates    : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |      Matched        |      Controls       |  Band-
           |   Yes    No  Total  |  Used Unused  Total |  width
0  Treated |   306    15    321  |   625    120    745 | 1.3199
1  Treated |   126    10    136  |   473    178    651 | 1.3398

Treatment-effects estimation

           |  Observed  Bootstrap                     Normal-based
      wage |     Coef.  Std. Err.      z   P>|z|  [95% Conf. Interval]
0      ATT |  .4586332   .2763358   1.66   0.097   -.082975   1.000241
1      ATT |  .9518705    .406903   2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =    1.23
     Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage |     Coef.  Std. Err.      z   P>|z|  [95% Conf. Interval]
       (1) |  .4932373   .4452343   1.11   0.268   -.379406   1.365881

Simulation

- Population data from Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for Untreated and Treated. x-axis: Propensity score (0 to 1); y-axis: Density (0 to 7). McFadden R2 = 0.121.]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41)

[Figure, two panels plotted against the Propensity score (0 to .8). First panel: Outcome (30 to 55) for Untreated and Treated. Second panel: Treatment effect (-6 to -1).]
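The population figures on this slide are consistent with each other, which can be verified with the usual decomposition of the ATE into ATT and ATC. The computation below takes the treated share to be p = 0.175 and reads the reported values as -4.79, -3.96, -3.51, and -3.41; the symbol p is introduced here for illustration.

```latex
ATE = p \cdot ATT + (1 - p) \cdot ATC
    = 0.175 \times (-3.96) + 0.825 \times (-3.41)
    \approx -3.51
\\[4pt]
NATE - ATT = -4.79 - (-3.96) = -0.83
```

The second line is the raw bias that a matching estimator of the ATT has to remove in this population.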

Results: Variance

[Figure, two panels: N = 500 and N = 5000. Rows: Nearest-neighbor matching (1 neighbor, 5 neighbors) and Kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure, two panels: N = 500 and N = 5000. Rows: Nearest-neighbor matching (1 neighbor, 5 neighbors) and Kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each with and without bias correction. x-axis: 70 to 130 (N = 500) and 95 to 135 (N = 5000).]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure, two panels: N = 500 and N = 5000. Rows: Nearest-neighbor matching (1 neighbor, 5 neighbors) and Kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each with and without bias correction.]
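The three result slides (variance, bias reduction, mean squared error) are linked by the standard decomposition of the MSE. The first line below is a general identity; the second line is only an assumed definition of "bias reduction in percent", chosen because it is consistent with the plotted range around 100, and the slides do not state the formula.

```latex
MSE(\hat{\tau}) = \operatorname{Var}(\hat{\tau}) + \operatorname{Bias}(\hat{\tau})^2,
\qquad \operatorname{Bias}(\hat{\tau}) = E[\hat{\tau}] - ATT
\\[4pt]
\text{Bias reduction (assumed)} = 100 \times
  \left( 1 - \frac{E[\hat{\tau}] - ATT}{NATE - ATT} \right)
```

Under this reading, a value of 100 means that the raw bias is removed completely, and values above 100 indicate overcorrection.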

Results: Relative standard error

[Figure, two panels: N = 500 and N = 5000. Rows: Nearest-neighbor matching (teffects) with 1 and 5 neighbors; Nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; Kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Markers: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, with bias correction.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure: scatter plot of the outcome against education (years) for treated (T) and control (C) units. Slide 3/23 of the slides by King and Nielsen.]


King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance on covariates    Complete Randomization    Fully Blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure: scatter plot of age against education (years) for treated (T) and control (C) units. Slide 9/23 of the slides by King and Nielsen.]


Best Case: Propensity Score Matching

[Figure: scatter plot of age against education (years) for treated (T) and control (C) units, with the units projected onto a propensity score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]


Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of age against education (years) for the treated (T) units and their matched control (C) units. Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better’ you do worse”):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
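The first point of Argument 3 (random pruning increases imbalance) can be illustrated with a small simulation. This is only a sketch and not part of the slides: the program name prunesim, the sample sizes, the number of replications, and the seed are hypothetical choices. Treatment is assigned independently of x, so dropping units at random changes nothing except the sample size.

```stata
* Hypothetical illustration: random pruning when x is unrelated to treatment
clear all
set seed 12345

program define prunesim, rclass
    args nkeep
    drop _all
    set obs 200
    generate byte t = _n <= 100          // 100 treated, 100 controls
    generate double x = rnormal()        // covariate unrelated to treatment
    generate double u = runiform()
    sort t u                             // random order within each group
    by t: keep if _n <= `nkeep'          // randomly keep nkeep units per group
    summarize x if t == 1, meanonly
    local m1 = r(mean)
    summarize x if t == 0, meanonly
    return scalar absdiff = abs(`m1' - r(mean))
end

* Average absolute difference in means of x for decreasing sample sizes
foreach k in 100 50 25 10 {
    simulate absdiff = r(absdiff), reps(500) nodots: prunesim `k'
    summarize absdiff, meanonly
    display "units kept per group = " %3.0f `k' ///
        "   mean |difference in means of x| = " %6.4f r(mean)
}
```

The average absolute difference in means should grow as fewer units are kept, roughly in proportion to one over the square root of the number of units per group.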

PSM Increases Model Dependence & Bias

[Figure, two panels, slide 20/23 of the slides by King and Nielsen. Left panel, "Model Dependence": variance against number of units pruned (0 to 160), for MDM and PSM. Right panel, "Bias": maximum coefficient across 512 specifications against number of units pruned (0 to 160), for MDM and PSM; true effect = 2.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

The Propensity Score Paradox in Real Data

[Figure, two panels, slide 21/23 of the slides by King and Nielsen: imbalance against number of units pruned for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011).]

Similar pattern for > 20 other real data sets we checked


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
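The trade-off between blocking and sample size can be made concrete with a stylized calculation. This is a sketch under assumptions that are not part of the slides: a linear model $Y_i = \tau T_i + \beta X_i + \epsilon_i$ with a constant effect $\tau$; $X$ and $\epsilon$ independent with variances $\sigma_X^2$ and $\sigma_\epsilon^2$; $n$ treated and $n$ control units under complete randomization; and $m \le n$ treated-control pairs matched exactly on $X$ under blocking. Both estimators are simple differences in means.

```latex
\begin{align*}
\operatorname{Var}(\hat\tau_{\text{complete}})
  &= \frac{2\,(\beta^2\sigma_X^2 + \sigma_\epsilon^2)}{n}, \\
\operatorname{Var}(\hat\tau_{\text{blocked}})
  &= \frac{2\,\sigma_\epsilon^2}{m}, \\
\operatorname{Var}(\hat\tau_{\text{blocked}}) < \operatorname{Var}(\hat\tau_{\text{complete}})
  &\iff \frac{m}{n} > \frac{\sigma_\epsilon^2}{\beta^2\sigma_X^2 + \sigma_\epsilon^2} = 1 - R^2 ,
\end{align*}
```

where $R^2$ denotes the share of the within-group variance of $Y$ that is explained by $X$. With $m = n$, blocking is never less efficient, and the gain grows with the strength of the relation between X and Y. If blocking discards observations, it remains more efficient only as long as the retained share $m/n$ exceeds $1 - R^2$.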

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better’ you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
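How strongly a matching algorithm prunes can be checked in the "Matching statistics" block that kmatch prints, which reports how many controls are used and unused. The following sketch is not from the slides; it assumes that kmatch and its dependencies are installed and simply combines the ps subcommand with the nn option, both of which appear in the examples of this talk. It contrasts one-nearest-neighbor matching with kernel matching on the propensity score.

```stata
* Sketch: compare how many controls are used by two PSM algorithms
sysuse nlsw88, clear
drop if industry == 2

* Propensity-score matching with one nearest neighbor
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

* Propensity-score kernel matching (default bandwidth)
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att
```

Comparing the "Used" and "Unused" columns of the two outputs gives a rough impression of how much information each algorithm discards.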


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls            Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      432     25     457     1105     291    1396     1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
NATE     1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                           Raw                        Matched (ATT)
       Means    Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif

    collgrad    .321663   .224212   .219912    .319444   .319444         0
     ttl_exp    13.2685   12.7323   .117584    13.3205   13.1425   .039036
      tenure    7.89205   6.17658    .29735    7.91744   7.58347   .057888
  3.industry    .006565   .012178  -.058246     .00463    .00463         0
  4.industry    .183807   .166905   .044425    .185185   .185185         0
  5.industry    .105033   .027937   .312944    .085648   .085648         0
  6.industry    .045952   .169771  -.407129    .048611   .048611         0
  7.industry    .019694   .102436  -.350657    .020833   .020833         0
  8.industry    .017505   .035817  -.113785    .009259   .009259         0
  9.industry    .010941   .040115  -.185669    .011574   .011574         0
 10.industry    .004376   .008596  -.052551    .002315   .002315         0
 11.industry    .479212   .356734   .250073    .506944   .506944         0
 12.industry    .122538    .07235   .169707     .12037    .12037         0
      2.race    .330416   .244986   .189418      .3125     .3125         0
      3.race    .017505   .011461   .050566    .006944   .006944         0
       south    .297593   .466332  -.352408    .291667   .291667         0

                           Raw                        Matched (ATT)
   Variances    Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio

    collgrad    .218674   .174066   1.25628    .217904   .217904         1
     ttl_exp    20.5898   21.0001   .980459    19.8177   18.2323   1.08696
      tenure    37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
  3.industry    .006536   .012038   .542928    .004619   .004619         1
  4.industry    .150351   .139148   1.08052    .151242   .151242         1
  5.industry    .094207   .027176   3.46656    .078494   .078494         1
  6.industry    .043936    .14105   .311496    .046355   .046355         1
  7.industry    .019348   .092008   .210287    .020447   .020447         1
  8.industry    .017237   .034559   .498769    .009195   .009195         1
  9.industry    .010845   .038533   .281445    .011467   .011467         1
 10.industry    .004367   .008528   .512039    .002315   .002315         1
 11.industry    .250115   .229639   1.08917    .250532   .250532         1
 12.industry    .107758   .067163   1.60443    .106127   .106127         1
      2.race    .221726     .1851   1.19787    .215342   .215342         1
      3.race    .017237   .011338   1.52025    .006912   .006912         1
       south     .20949   .249045   .841173    .207077   .207077         1
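The StdDif and Ratio columns can be reproduced by hand, which makes their definitions explicit. The sketch below is not from the slides. It assumes that the estimation sample consists of the complete cases on all model variables, and it uses the common definition of the standardized difference (difference in means divided by the square root of the average of the two group variances), which is consistent with the raw values reported above for collgrad.

```stata
* Sketch: raw standardized difference and variance ratio for collgrad
sysuse nlsw88, clear
drop if industry == 2
* assumed estimation sample: complete cases on all model variables
keep if !missing(union, wage, collgrad, ttl_exp, tenure, industry, race, south)

quietly summarize collgrad if union == 1
local m1 = r(mean)
local v1 = r(Var)
quietly summarize collgrad if union == 0
local m0 = r(mean)
local v0 = r(Var)

display "StdDif         = " (`m1' - `m0') / sqrt((`v1' + `v0') / 2)
display "Variance ratio = " `v1' / `v0'
```

With the rounded values from the table, (.321663 - .224212) / sqrt((.218674 + .174066) / 2) is approximately .2199 and .218674 / .174066 is approximately 1.2563, in line with the first row of each panel.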

Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south). Left panel: standardized mean difference; right panel: variance ratio. Series: Raw and Matched.]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                 Matched                  Controls            Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      431     26     457     1214     182    1396     .00188

Treatment-effects estimation

wage        Coef.
ATT      .3887224
NATE     1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated units. Panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability against the propensity score for untreated and treated units. Panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units. Panels: Raw and Matched (ATT).]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls            Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      432     25     457     1105     291    1396     1.3394
Untreated   1386     10    1396      455       2     457     3.3975
Combined    1818     35    1853     1560     293    1853

Treatment-effects estimation

         Observed   Bootstrap                       Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE      .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT      .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC      .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE     1.432913   .2333282    6.14   0.000    .9755981   1.890228

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1,853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls            Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      457      0     457      328    1068    1396

Treatment-effects estimation

wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min   = 1
Distance metric: Mahalanobis                            max   = 1

                                    AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969   .2942952    2.46   0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls            Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                                    AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823   .2381752    2.35   0.019    .0922675   1.025897

Some Examples

Bias correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls            Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                                    AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023   .2420635    2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                 Matched                  Controls            Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      439     18     457     1258     138    1396     .83886

Treatment-effects estimation

wage        Coef.
ATT      .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                 Matched                  Controls            Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      432     25     457     1103     293    1396     1.3013

Treatment-effects estimation

wage        Coef.
ATT      .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls            Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      432     25     457     1105     291    1396     1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls            Band-
             Yes     No   Total     Used  Unused   Total      width
Treated      448      9     457     1184     212    1396     1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; evaluation points are labeled with their index.]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total    width
Treated      453     4    457     1289     107   1396    2.433

Treatment-effects estimation
wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: Bandwidth, y-axis: MISE; markers labeled by search index]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total    width
Treated      455     2    457     1356      40   1396   2.7626

Treatment-effects estimation
wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: Bandwidth, y-axis: Weighted MISE; markers labeled by search index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total    width
Treated      366    91    457      701     695   1396       .5

Treatment-effects estimation
wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                          Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377         0   .045952    .054507  -.219225   .273732
7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry          0   .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
south           .29235   .318681   .297593   -.011456   .046074   -.05753

Common support (treated)                          Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
6.industry     .054233         0   .043936    1.23435         0         .
7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
10.industry          0   .021734   .004367          0   4.97709         0
11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
2.race         .184542   .219536   .221726    .832297   .990121   .840601
3.race         .002732   .071795   .017237    .158513   4.16522   .038056
south          .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
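The common-support table can be checked by hand. The sketch below assumes that the standardized differences are scaled by the standard deviation of the total treated group (column 3), which is what the ttl_exp row suggests when read as means 13.3929 (matched) and 13.2685 (total) with total variance 20.5898. The function names are hypothetical and this is not the kmatch implementation.

```python
import math

def std_diff(mean_a, mean_b, var_ref):
    """Difference in means in units of a reference standard deviation."""
    return (mean_a - mean_b) / math.sqrt(var_ref)

def var_ratio(var_a, var_b):
    """Ratio of two variances; values near 1 indicate similar spread."""
    return var_a / var_b

# ttl_exp row: matched vs. total treated
d = std_diff(13.3929, 13.2685, 20.5898)   # compare with (1)-(3) in the table
r = var_ratio(19.4198, 20.5898)           # compare with (1)/(3) in the table
print(round(d, 4), round(r, 4))
```

Both values agree with the table up to rounding of the displayed inputs, which supports reading the columns as matched, unmatched, and total treated observations.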

Some Examples

Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south; x-axis: Std. difference (-.2 to .2), reference line at 0]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total    width
Treated      432    25    457     1104     291   1395   1.3392

Treatment-effects estimation
               Coef.
wage
  ATT       .6021049
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total    width
Treated      432    25    457     1104     291   1395   1.3392

Treatment-effects estimation
               Coef.
wage
  ATT       .5152752
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
                    Matched               Controls          Band-
                Yes    No  Total     Used  Unused  Total    width
0  Treated      306    15    321      625     120    745   1.3199
1  Treated      126    10    136      473     178    651   1.3398

Treatment-effects estimation
           Observed   Bootstrap                       Normal-based
wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0
  ATT      .4586332   .2763358    1.66   0.097    -.082975    1.000241
1
  ATT      .9518705    .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
       chi2(  1) =    1.23
     Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)        .4932373   .4452343    1.11   0.268    -.379406    1.365881
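The `test` and `lincom` results on this slide report the same comparison in two forms. As a rough check, assuming the normal approximation used in the output and reading the `lincom` row as a coefficient of 0.4932373 with standard error 0.4452343, the Wald chi-squared statistic is the square of the z statistic. The helper name below is hypothetical.

```python
import math

def wald_from_lincom(coef, se):
    """z statistic, chi2(1) statistic, and two-sided p-value for H0: coef = 0."""
    z = coef / se
    chi2 = z * z
    # Two-sided normal p-value; identical to the chi2(1) upper-tail probability
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, chi2, p

z, chi2, p = wald_from_lincom(0.4932373, 0.4452343)
print(round(z, 2), round(chi2, 2), round(p, 3))
```

With 50 bootstrap replications the standard error itself is noisy, so the p-value should be read as approximate.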

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to working people between 24 and 60 years old.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

[Figure: density of the propensity score in the population (computed from fully stratified data), shown separately for untreated and treated; x-axis: Propensity score (0 to 1), y-axis: Density]

McFadden R² = 0.121

Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41)

[Figure, two panels by propensity score (0 to 0.8). Left: mean outcome for untreated and treated; y-axis: Outcome. Right: treatment effect; y-axis: Treatment effect]
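The three population estimands on this slide are linked by a simple identity: the ATE is the average of ATT and ATC weighted by group shares. A quick check, assuming a treated share of 17.5% as stated for the population and reading the effects as −3.96 (ATT) and −3.41 (ATC):

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the share-weighted average of ATT and ATC."""
    return share_treated * att + (1 - share_treated) * atc

ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)
print(round(ate, 2))
```

The result matches the reported ATE up to rounding of the inputs, which is a useful consistency check on the population values.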

Results: Variance

[Figure: variance of the estimates by algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent by algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error by algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]
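The variance, bias, and mean-squared-error results are connected by the usual decomposition MSE = variance + bias². A small sketch of how such simulation summaries could be computed is shown below; the estimates in the example are made-up numbers for illustration, not results from the talk.

```python
def summarize(estimates, truth):
    """Bias, variance, and MSE of a list of simulated estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - truth
    variance = sum((e - mean) ** 2 for e in estimates) / n
    mse = sum((e - truth) ** 2 for e in estimates) / n
    return bias, variance, mse

# Hypothetical ATT estimates from four simulated samples; truth = population ATT
bias, variance, mse = summarize([-4.2, -3.9, -3.5, -4.4], truth=-3.96)
print(bias, variance, mse)
```

An estimator with a small bias but large variance can therefore have a worse MSE than a slightly biased but stable one, which is why the three result slides need to be read together.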

Results: Relative standard error

[Figure: relative standard error by algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]


Conclusions

The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don’t use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, applying regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error and confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Matching to Reduce Model Dependence (Ho, Imai, King, and Stuart 2007, fig. 1, Political Analysis)

[Figure: scatter plot of treated (T) and control (C) units; x-axis: Education (years), y-axis: Outcome]

3/23 (slides by King and Nielsen)

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, are better because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments

Types of Experiments
Balance on covariates   Complete randomization   Fully blocked
Observed                On average               Exact
Unobserved              On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

6/23 (slides by King and Nielsen)

Best Case: Mahalanobis Distance Matching

[Figure: scatter plot of treated (T) and control (C) units before and after matching; x-axis: Education (years), y-axis: Age]

9/23 (slides by King and Nielsen)

Best Case: Propensity Score Matching

[Figure: scatter plot of treated (T) and control (C) units, with the units projected onto a propensity score axis running from 0 to 1; x-axis: Education (years), y-axis: Age]

15/23 (slides by King and Nielsen)

Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of the matched treated (T) and control (C) units; x-axis: Education (years), y-axis: Age]

15/23 (slides by King and Nielsen)

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance, because the sample size decreases and the variance therefore increases (large differences become more likely).
- More imbalance and variance mean more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better’, you do worse”):
  - When matching is made stricter (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance at first. But soon the PSM Paradox kicks in, and further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
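The first step of Argument 3, that pruning at random increases expected imbalance, can be illustrated with a small simulation. The sketch below assumes one standard-normal covariate that is unrelated to treatment (so the groups are balanced in expectation) and measures imbalance as the absolute difference in means. All names and sample sizes are illustrative choices.

```python
import random
import statistics

def imbalance(treated, control):
    """Absolute difference in covariate means between the two groups."""
    return abs(statistics.fmean(treated) - statistics.fmean(control))

def mean_imbalance_after_pruning(n_keep, n=200, reps=2000, seed=1):
    """Average imbalance when only n_keep randomly chosen units per group are kept."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        t = [rng.gauss(0, 1) for _ in range(n)]
        c = [rng.gauss(0, 1) for _ in range(n)]
        total += imbalance(rng.sample(t, n_keep), rng.sample(c, n_keep))
    return total / reps

# Keeping all 200 units per group vs. randomly pruning down to 20 per group
print(mean_imbalance_after_pruning(200), mean_imbalance_after_pruning(20))
```

Nothing about the units changes except how many are kept, so the larger average imbalance after pruning is purely a sample-size effect.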

PSM Increases Model Dependence & Bias

[Figure, two panels comparing MDM and PSM; x-axis in both: Number of Units Pruned (0 to 160). Left panel, “Model Dependence”: y-axis Variance. Right panel, “Bias”: y-axis Maximum Coefficient across 512 Specifications; true effect = 2]

$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

20/23 (slides by King and Nielsen)

The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). x-axis: Number of units pruned; y-axis: Imbalance. Lines: CEM, MDM, PSM, Raw, Random; the 1/4 SD caliper is marked]

Similar pattern for > 20 other real data sets we checked

21/23 (slides by King and Nielsen)


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic: I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, are better because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true: PSM approximates complete randomization among observations with the same propensity score. Hence, PSM lies somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score, and we end up with complete randomization.
- If the X variables have a strong effect on T, there is a lot of blocking.
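The efficiency claim in Argument 2 (at a given sample size, fully blocked randomization beats complete randomization when X predicts Y) can be illustrated with a small simulation. The sketch assumes one covariate, an outcome Y = beta·X + noise with a true treatment effect of zero, and blocking by pairing units that are adjacent on X. These design choices are assumptions for illustration, not part of the talk.

```python
import random
import statistics

def estimator_variance(blocked, n_pairs=50, reps=1000, beta=2.0, seed=1):
    """Variance of the difference-in-means estimate across simulated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = sorted(rng.gauss(0, 1) for _ in range(2 * n_pairs))
        if blocked:
            # Pair neighbors on X and treat exactly one unit per pair at random
            treat = []
            for _ in range(n_pairs):
                first = rng.random() < 0.5
                treat += [first, not first]
        else:
            # Complete randomization: half of all units treated, ignoring X
            treat = [True] * n_pairs + [False] * n_pairs
            rng.shuffle(treat)
        y = [beta * xi + rng.gauss(0, 1) for xi in x]  # true effect is zero
        y1 = [yi for yi, t in zip(y, treat) if t]
        y0 = [yi for yi, t in zip(y, treat) if not t]
        estimates.append(statistics.fmean(y1) - statistics.fmean(y0))
    return statistics.pvariance(estimates)

print(estimator_variance(blocked=False), estimator_variance(blocked=True))
```

Setting `beta=0` removes the relation between X and Y, and with it most of the gain from blocking, in line with the remark that the size of the gain depends on the strength of that relation.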

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better’, you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if the effects of several X variables cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should be only minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you would need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
• Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
• Optional exact matching
• Optional regression-adjustment bias-correction
• Kernel matching, ridge matching, or nearest-neighbor matching
• Automatic bandwidth selection for kernel/ridge matching
• Flexible specification of scaling matrix for MDM
• Joint analysis of multiple subgroups and multiple outcome variables
• Various post-estimation commands for balancing and common-support diagnostics
• Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     432     25     457     1105     291    1396    1.3394

Treatment-effects estimation

    wage       Coef.
     ATT    .6059013
    NATE    1.432913
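
The logic of Mahalanobis-distance kernel matching for the ATT can be illustrated outside Stata. What follows is a minimal Python sketch, not kmatch's actual implementation: the function names (kernel_att, mahalanobis, epan), the use of the pooled sample covariance as scaling matrix, and the toy data are all assumptions made for illustration.

```python
import math

def covariance(rows):
    """Sample covariance matrix (divisor n - 1) of a list of covariate rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]

def invert(m):
    """Gauss-Jordan inverse of a small, non-singular square matrix."""
    k = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def mahalanobis(x, z, vinv):
    """Mahalanobis distance between two covariate vectors."""
    d = [a - b for a, b in zip(x, z)]
    q = sum(d[i] * vinv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(0.0, q))

def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_att(x, d, y, bandwidth):
    """ATT: treated outcome minus kernel-weighted mean outcome of nearby controls.
    Treated units without any control inside the bandwidth are left unmatched."""
    vinv = invert(covariance(x))          # assumption: pooled covariance as scaling
    treated = [i for i, di in enumerate(d) if di == 1]
    controls = [i for i, di in enumerate(d) if di == 0]
    effects = []
    for i in treated:
        w = [epan(mahalanobis(x[i], x[j], vinv) / bandwidth) for j in controls]
        if sum(w) > 0:
            y0_hat = sum(wj * y[j] for wj, j in zip(w, controls)) / sum(w)
            effects.append(y[i] - y0_hat)
    return sum(effects) / len(effects), len(effects), len(treated)

# Toy data (hypothetical): every treated unit has a control twin; true effect is 2.
x = [[0, 0], [1, 0], [0, 1], [1, 1]] * 2
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [xi[0] + 2 * xi[1] + 2 * di for xi, di in zip(x, d)]
att, n_matched, n_treated = kernel_att(x, d, y, bandwidth=0.5)
print(att, n_matched, n_treated)
```

Controls are averaged with kernel weights rather than picked one at a time, which is why no good match is thrown away; the bandwidth alone decides which treated units remain unmatched.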

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched (ATT)
Means         Treated  Untrea~d    StdDif     Treated  Untrea~d    StdDif
collgrad      .321663   .224212   .219912     .319444   .319444         0
ttl_exp       13.2685   12.7323   .117584     13.3205   13.1425   .039036
tenure        7.89205   6.17658    .29735     7.91744   7.58347   .057888
3.industry    .006565   .012178  -.058246      .00463    .00463         0
4.industry    .183807   .166905   .044425     .185185   .185185         0
5.industry    .105033   .027937   .312944     .085648   .085648         0
6.industry    .045952   .169771  -.407129     .048611   .048611         0
7.industry    .019694   .102436  -.350657     .020833   .020833         0
8.industry    .017505   .035817  -.113785     .009259   .009259         0
9.industry    .010941   .040115  -.185669     .011574   .011574         0
10.industry   .004376   .008596  -.052551     .002315   .002315         0
11.industry   .479212   .356734   .250073     .506944   .506944         0
12.industry   .122538    .07235   .169707      .12037    .12037         0
2.race        .330416   .244986   .189418       .3125     .3125         0
3.race        .017505   .011461   .050566     .006944   .006944         0
south         .297593   .466332  -.352408     .291667   .291667         0

                          Raw                        Matched (ATT)
Variances     Treated  Untrea~d     Ratio     Treated  Untrea~d     Ratio
collgrad      .218674   .174066   1.25628     .217904   .217904         1
ttl_exp       20.5898   21.0001   .980459     19.8177   18.2323   1.08696
tenure        37.2044   29.3629   1.26706     37.0399   34.9543   1.05966
3.industry    .006536   .012038   .542928     .004619   .004619         1
4.industry    .150351   .139148   1.08052     .151242   .151242         1
5.industry    .094207   .027176   3.46656     .078494   .078494         1
6.industry    .043936    .14105   .311496     .046355   .046355         1
7.industry    .019348   .092008   .210287     .020447   .020447         1
8.industry    .017237   .034559   .498769     .009195   .009195         1
9.industry    .010845   .038533   .281445     .011467   .011467         1
10.industry   .004367   .008528   .512039     .002315   .002315         1
11.industry   .250115   .229639   1.08917     .250532   .250532         1
12.industry   .107758   .067163   1.60443     .106127   .106127         1
2.race        .221726     .1851   1.19787     .215342   .215342         1
3.race        .017237   .011338   1.52025     .006912   .006912         1
south          .20949   .249045   .841173     .207077   .207077         1
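
The StdDif and Ratio columns are consistent with the usual definitions: the mean difference divided by the square root of the average of the two group variances, and the treated variance divided by the untreated variance. The following Python sketch recomputes the raw collgrad entries from the means and variances in the table; the function names are hypothetical, and the formula is inferred from the reported numbers rather than taken from the kmatch source.

```python
import math
from statistics import mean, variance

def std_diff(m_t, m_c, v_t, v_c):
    """Standardized mean difference using the average of the two group variances."""
    return (m_t - m_c) / math.sqrt((v_t + v_c) / 2)

def variance_ratio(v_t, v_c):
    """Ratio of treated to untreated variance; 1 indicates equal spread."""
    return v_t / v_c

def balance(treated, controls):
    """Both statistics computed from raw covariate values of the two groups."""
    m_t, m_c = mean(treated), mean(controls)
    v_t, v_c = variance(treated), variance(controls)
    return std_diff(m_t, m_c, v_t, v_c), variance_ratio(v_t, v_c)

# Raw collgrad row: means .321663 / .224212, variances .218674 / .174066
sd_collgrad = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = variance_ratio(0.218674, 0.174066)
print(round(sd_collgrad, 4), round(vr_collgrad, 4))
```

Up to rounding of the inputs, the two results agree with the table entries .219912 and 1.25628.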

Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot by covariate (collgrad through south), raw vs. matched; left panel "Std. mean difference", right panel "Variance ratio"]

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     431     26     457     1214     182    1396    .00188

Treatment-effects estimation

    wage       Coef.
     ATT    .3887224
    NATE    1.432913

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated; panels "Raw" and "Matched (ATT)"; x-axis: propensity score; y-axis: density]

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated; panels "Raw" and "Matched (ATT)"; x-axis: propensity score; y-axis: cumulative probability]

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated; panels "Raw" and "Matched (ATT)"; y-axis: propensity score]

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     432     25     457     1105     291    1396    1.3394
Untreated  1386     10    1396      455       2     457    3.3975
Combined   1818     35    1853     1560     293    1853

Treatment-effects estimation

            Observed   Bootstrap                        Normal-based
    wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
     ATE    .4095729    .1920853    2.13   0.033    .0330928    .7860531
     ATT    .6059013    .2472069    2.45   0.014    .1213846    1.090418
     ATC    .3483797    .1893653    1.84   0.066   -.0227695    .7195289
    NATE    1.432913    .2333282    6.14   0.000    .9755981    1.890228
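
The bootstrap standard errors above come from rerunning the estimator on resampled data. As a minimal Python sketch of that idea (the helper names bootstrap_se and naive_difference, the seed, and the toy data are assumptions; a raw mean difference stands in for the full matching estimator to keep the example short):

```python
import random
import statistics

def bootstrap_se(data, estimator, reps=50, seed=12345):
    """Nonparametric bootstrap: resample units with replacement, re-estimate
    on each resample, and report the standard deviation of the estimates."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(reps):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.stdev(estimates)

def naive_difference(sample):
    """Stand-in estimator: raw mean difference between treated and untreated.
    In a real application the whole matching procedure, including bandwidth
    selection, would be repeated here on every resample."""
    treated = [y for d, y in sample if d == 1]
    controls = [y for d, y in sample if d == 0]
    return statistics.mean(treated) - statistics.mean(controls)

# Toy data (hypothetical): 20 treated and 20 untreated (treatment, outcome) pairs.
data = [(1, 3.0 + 0.1 * i) for i in range(20)] + [(0, 1.0 + 0.1 * i) for i in range(20)]
se = bootstrap_se(data, naive_difference, reps=200)
print(se)
```

With the 50 replications shown in the output the standard error is itself fairly noisy; more replications reduce that noise but do not repair an estimator for which the bootstrap is not valid.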

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

    wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
     (1)   -.8270117    .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1,853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     457      0     457      328    1068    1396

Treatment-effects estimation

    wage       Coef.
     ATT    .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                Number of obs      = 1,853
Estimator      : nearest-neighbor matching  Matches: requested = 1
Outcome model  : matching                                  min = 1
Distance metric: Mahalanobis                               max = 1

                                    AI Robust
    wage                   Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
  union
  (union vs nonunion)   .7246969    .2942952    2.46   0.014     .147889    1.301505

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     457      0     457      870     526    1396

Treatment-effects estimation

    wage       Coef.
     ATT    .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                Number of obs      = 1,853
Estimator      : nearest-neighbor matching  Matches: requested = 5
Outcome model  : matching                                  min = 5
Distance metric: Mahalanobis                               max = 6

                                    AI Robust
    wage                   Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
  union
  (union vs nonunion)   .5590823    .2381752    2.35   0.019    .0922675    1.025897

Bias-correction (regression adjustment):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     457      0     457      870     526    1396

Treatment-effects estimation

    wage       Coef.
     ATT    .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                Number of obs      = 1,853
Estimator      : nearest-neighbor matching  Matches: requested = 5
Outcome model  : matching                                  min = 5
Distance metric: Mahalanobis                               max = 6

                                    AI Robust
    wage                   Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
  union
  (union vs nonunion)   .5288023    .2420635    2.18   0.029    .0543666    1.003238

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

    wage       Coef.
     ATT    .6408443

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     432     25     457     1103     293    1396    1.3013

Treatment-effects estimation

    wage       Coef.
     ATT    .6047374

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     432     25     457     1105     291    1396    1.3394

Treatment-effects estimation

    wage       Coef.
     ATT    .6059013
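
The slide says only that the default bandwidth is based on the distribution of distances in one-nearest-neighbor matching. One way such a rule could look is sketched below in Python: find each treated unit's distance to its closest control, take an upper quantile of those distances, and scale it by a constant. The function name, the 90% quantile, the factor of 1.5, and the one-dimensional toy data are illustrative assumptions, not a statement of what kmatch does internally.

```python
import math

def pair_matching_bandwidth(treated, controls, quantile=0.90, factor=1.5):
    """Bandwidth rule of thumb from nearest-neighbor distances (1-D sketch).

    treated, controls: scalar matching variables (e.g. propensity scores).
    quantile, factor:  illustrative tuning constants (assumptions).
    """
    # Distance from each treated unit to its closest control (with replacement).
    nearest = sorted(min(abs(t - c) for c in controls) for t in treated)
    # Empirical quantile: smallest distance with at least `quantile` of the mass below it.
    idx = min(len(nearest) - 1, max(0, math.ceil(quantile * len(nearest)) - 1))
    return factor * nearest[idx]

# Toy data (hypothetical values)
treated = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
controls = [1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 11.0]
bw = pair_matching_bandwidth(treated, controls)
print(bw)
```

A rule of this kind makes the bandwidth wide enough that most treated units find at least one control, while treated units in sparse regions can still end up unmatched.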

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     448      9     457     1184     212    1396    1.8888

Treatment-effects estimation

    wage       Coef.
     ATT    .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: MSE; points labeled by search step]

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     453      4     457     1289     107    1396     2.433

Treatment-effects estimation

    wage       Coef.
     ATT    .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: MISE; points labeled by search step]

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     455      2     457     1356      40    1396    2.7626

Treatment-effects estimation

    wage       Coef.
     ATT    .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: weighted MISE; points labeled by search step]

Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     366     91     457      701     695    1396        .5

Treatment-effects estimation

    wage       Coef.
     ATT    .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)        Standardized difference
Means         Matched  Unmatc~d     Total     (1)-(3)   (2)-(3)   (1)-(2)
collgrad      .322404   .318681   .321663     .001585  -.006376   .007962
ttl_exp       13.3929   12.7682   13.2685     .027413  -.110253   .137666
tenure        8.12614   6.95055   7.89205     .038378  -.154356   .192734
3.industry    .002732   .021978   .006565    -.047404   .190657  -.238061
4.industry    .191257   .153846   .183807     .019212  -.077269   .096481
5.industry    .062842   .274725   .105033    -.137462   .552867  -.690329
6.industry    .057377         0   .045952     .054507  -.219225   .273732
7.industry    .019126   .021978   .019694    -.004083   .016423  -.020506
8.industry    .005464   .065934   .017505    -.091714   .368871  -.460585
9.industry    .010929   .010989   .010941    -.000115   .000462  -.000577
10.industry         0   .021978   .004376    -.066227   .266363  -.332589
11.industry   .554645   .175824   .479212      .15083  -.606636   .757467
12.industry   .092896   .241758   .122538    -.090299   .363181   -.45348
2.race        .243169   .681319   .330416    -.185284   .745209  -.930494
3.race        .002732   .076923   .017505    -.112525   .452572  -.565097
south          .29235   .318681   .297593    -.011456   .046074   -.05753

                 Common support (treated)                Ratio
Variances     Matched  Unmatc~d     Total     (1)/(3)   (2)/(3)   (1)/(2)
collgrad      .219058   .219536   .218674     1.00176   1.00394   .997824
ttl_exp       19.4198   25.2474   20.5898     .943177   1.22621    .76918
tenure        38.3324   31.9242   37.2044     1.03032   .858076   1.20073
3.industry    .002732   .021734   .006536     .418045   3.32537   .125714
4.industry    .155101   .131624   .150351     1.03159   .875443   1.17837
5.industry    .059054   .201465   .094207     .626851   2.13854   .293122
6.industry    .054233         0   .043936     1.23435         0         .
7.industry    .018811   .021734   .019348     .972252    1.1233   .865531
8.industry     .00545   .062271   .017237     .316157   3.61269   .087513
9.industry    .010839   .010989   .010845     .999464   1.01328   .986361
10.industry         0   .021734   .004367           0   4.97709         0
11.industry   .247691    .14652   .250115     .990307   .585811   1.69049
12.industry   .084497   .185348   .107758     .784137   1.72003   .455885
2.race        .184542   .219536   .221726     .832297   .990121   .840601
3.race        .002732   .071795   .017237     .158513   4.16522   .038056
south         .207448   .219536    .20949     .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched treated and all treated, by covariate (collgrad through south); title "Std. difference"]

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     432     25     457     1104     291    1395    1.3392

Treatment-effects estimation

               Coef.
wage
     ATT    .6021049
    NATE    1.430823

hoursATT 1263759

NATE 1450303

Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
Treated     432     25     457     1104     291    1395    1.3392

Treatment-effects estimation

               Coef.
wage
     ATT    .5152752
    NATE    1.430823

hoursATT 1263759

NATE 1450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

              Matched                    Controls           Band-
            Yes     No   Total     Used  Unused   Total     width
0
Treated     306     15     321      625     120     745    1.3199
1
Treated     126     10     136      473     178     651    1.3398

Treatment-effects estimation

            Observed   Bootstrap                        Normal-based
    wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0
     ATT    .4586332    .2763358    1.66   0.097    -.082975    1.000241
1
     ATT    .9518705     .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

    wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
     (1)    .4932373    .4452343    1.11   0.268    -.379406    1.365881

Simulation

• Population data from the Swiss census of 2000.
• Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).
• Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
• Control variables: gender, age, and highest educational degree.
• Population restricted to people between 24 and 60 years old who are working.
• 2'308'006 individuals, of which 17.5% belong to the treatment group.
• Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

• Substantial differences between resident aliens and Swiss nationals on all three covariates.
• Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: propensity score (0 to 1); y-axis: density; McFadden R² = 0.121]

• Raw mean difference in occupational prestige (NATE): -4.79
• Population ATT (computed from fully stratified data): -3.96
• There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure: left panel, mean outcome by propensity score for untreated and treated; right panel, treatment effect by propensity score]
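
The three population effects are tied together by the identity ATE = p × ATT + (1 − p) × ATC, where p is the share of treated units. The short Python check below plugs in the reported values; the treated share of 17.5% and the decimal placement (-3.96, -3.41, -3.51) are read from the slides and should be treated as assumptions of this sketch.

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the population-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc

# Values as reported for the simulation population (assumed decimal placement)
p_treated = 0.175   # share of resident aliens (treatment group)
att = -3.96         # effect on the treated
atc = -3.41         # effect on the untreated

ate = ate_from_att_atc(att, atc, p_treated)
print(round(ate, 2))
```

The result rounds to the reported ATE of -3.51, so the three figures are mutually consistent. Because the untreated make up most of the population, the ATE lies much closer to the ATC than to the ATT.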

Results: Variance

[Figure: variance of the MDM and PSM estimators, each with and without bias correction, for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels N = 500 and N = 5000]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for the MDM and PSM estimators, each with and without bias correction, for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels N = 500 and N = 5000]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the MDM and PSM estimators, each with and without bias correction, for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels N = 500 and N = 5000]

Results: Relative standard error

[Figure: relative standard error of the MDM and PSM estimators, each with and without bias correction, for nearest-neighbor matching (teffects), nearest-neighbor matching (bootstrap), and kernel matching (bootstrap); panels N = 500 and N = 5000]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for the MDM and PSM estimators, each with and without bias correction, for nearest-neighbor matching (teffects), nearest-neighbor matching (bootstrap), and kernel matching (bootstrap); panels N = 500 and N = 5000]

Conclusions

• The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
• Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

• Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
• MDM leaves less scope for post-matching modeling decision biases.
• Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
• Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
• Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
• Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Some conclusions from the simulation:
• For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
• Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
• Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure: scatter plot of outcome against education (years) for treated (T) and control (C) units; slide 3/23 of the slides by King and Nielsen]

King and Nielsen

Argument 2
• PSM approximates complete randomization.
• Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments

Types of Experiments

Balance         Complete          Fully
Covariates:     Randomization     Blocked
Observed        On average        Exact
Unobserved      On average        On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)

(slide 6/23 of the slides by King and Nielsen)

Best Case: Mahalanobis Distance Matching

[Figure: scatter plot of age against education (years) for treated (T) and control (C) units, before and after matching; slide 9/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching

[Figure (slides by King and Nielsen): treated (T) and control (C) units plotted by Education (years), 12 to 28, and Age, 20 to 80, together with a propensity score axis running from 0 to 1.]

Best Case: Propensity Score Matching is Suboptimal

[Figure (slides by King and Nielsen): the treated (T) units and their matched controls (C) after propensity score matching; x-axis: Education (years), 12 to 28; y-axis: Age, 20 to 80.]

King and Nielsen

Argument 3
• Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
• More imbalance/variance means more model dependence and researcher discretion.
• Because PSM approximates complete randomization, it engages in random pruning.
• PSM Paradox ("when you do 'better', you do worse")
    • When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
    • If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
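The first step of Argument 3 (random pruning increases imbalance) can be illustrated with a small simulation. This is only a sketch under assumed settings, not King and Nielsen's own code: one standard-normal covariate, two groups of 200 units that are balanced in expectation, and random deletion down to 50 units per group. The function names and all numbers are hypothetical.

```python
import random
import statistics


def imbalance(treated, control):
    """Absolute difference in covariate means between the two groups."""
    return abs(statistics.fmean(treated) - statistics.fmean(control))


def simulate_random_pruning(n_full=200, n_kept=50, reps=1000, seed=1):
    """Average imbalance before and after deleting units at random."""
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(reps):
        # Both groups come from the same distribution (balanced in expectation).
        treated = [rng.gauss(0, 1) for _ in range(n_full)]
        control = [rng.gauss(0, 1) for _ in range(n_full)]
        before.append(imbalance(treated, control))
        # Random pruning: keep a random subset of each group.
        after.append(imbalance(rng.sample(treated, n_kept),
                               rng.sample(control, n_kept)))
    return statistics.fmean(before), statistics.fmean(after)


avg_before, avg_after = simulate_random_pruning()
print(f"average imbalance, full sample:   {avg_before:.3f}")
print(f"average imbalance, pruned sample: {avg_after:.3f}")
```

Because nothing but the sample size changes, the average imbalance grows roughly with the square root of the pruning factor (here about twofold).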

PSM Increases Model Dependence & Bias

[Figure (slides by King and Nielsen), two panels comparing MDM and PSM by number of units pruned (0 to 160). Left panel, "Model Dependence": variance. Right panel, "Bias": maximum coefficient across 512 specifications; true effect = 2.]

Simulated data: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

The Propensity Score Paradox in Real Data

[Figure (slides by King and Nielsen), two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). x-axis: number of units pruned; y-axis: imbalance; curves for CEM, MDM, PSM, and random pruning, with markers for the raw data and the 1/4 SD caliper.]

Similar pattern for > 20 other real data sets we checked.

5 Are King and Nielsen right?

Argument 1
• Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
• Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
• PSM approximates complete randomization.
• Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
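The dependence of the efficiency gain on the X-Y relation can be made concrete with a small simulation. It is a sketch under assumptions that are not in the slides: one covariate, a linear outcome model with a hypothetical treatment effect of 2, and pair blocking on the sorted covariate as a stand-in for fully blocked randomization.

```python
import random
import statistics


def estimator_variance(beta, blocked, n=100, reps=1000, seed=7):
    """Variance of the difference-in-means estimator across random assignments.

    beta    : strength of the relation between X and Y (assumed linear)
    blocked : True  -> randomize within pairs of neighbours on X
              False -> complete randomization (random half treated)
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = sorted(rng.gauss(0, 1) for _ in range(n))
        if blocked:
            treat = []
            for _ in range(n // 2):
                t = rng.random() < 0.5
                treat += [t, not t]  # exactly one treated unit per pair
        else:
            chosen = set(rng.sample(range(n), n // 2))
            treat = [i in chosen for i in range(n)]
        y = [2.0 * t + beta * xi + rng.gauss(0, 1) for xi, t in zip(x, treat)]
        y1 = [yi for yi, t in zip(y, treat) if t]
        y0 = [yi for yi, t in zip(y, treat) if not t]
        estimates.append(statistics.fmean(y1) - statistics.fmean(y0))
    return statistics.variance(estimates)


for beta in (0.0, 2.0):
    v_complete = estimator_variance(beta, blocked=False)
    v_blocked = estimator_variance(beta, blocked=True)
    print(f"beta = {beta}: complete {v_complete:.3f}, blocked {v_blocked:.3f}")
```

With beta = 0 the two designs are about equally efficient; with beta = 2 blocking removes most of the variance. The sample size is held fixed here, so the sketch says nothing about the case where blocking reduces the sample.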

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
• If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
• If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
• Random pruning ⇒ imbalance ⇒ more model dependence
• PSM ⇒ complete randomization ⇒ lots of random pruning
• PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
• Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
• Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
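The idea of "averaging where pruning is not necessary" can be sketched as follows. This is a minimal illustration of kernel matching on a scalar score with an Epanechnikov kernel; it is not kmatch's implementation, and the function names and the toy data are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel; zero outside (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(treated, controls, bandwidth):
    """Kernel-matching ATT on a scalar score.

    treated, controls : lists of (score, outcome) pairs
    Returns (att, number of matched treated, number of controls used).
    """
    effects = []
    used = set()
    for s_t, y_t in treated:
        weights = [epan((s_c - s_t) / bandwidth) for s_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unit stays unmatched
        # Counterfactual = kernel-weighted average over ALL nearby controls.
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
        used.update(j for j, w in enumerate(weights) if w > 0.0)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects), len(used)


treated = [(0.50, 10.0), (0.90, 12.0)]
controls = [(0.45, 8.0), (0.55, 9.0), (0.50, 8.5), (0.10, 5.0)]
att, n_matched, n_used = kernel_att(treated, controls, bandwidth=0.1)
print(att, n_matched, n_used)
```

In the toy data the first treated unit is compared with the weighted average of its three close controls, whereas one-to-one matching without replacement would keep one of them and discard the other two. The second treated unit has no control within the bandwidth and is dropped, which is the kind of pruning that is needed to prevent bias.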

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
• Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
• Optional exact matching
• Optional regression-adjustment bias-correction
• Kernel matching, ridge matching, or nearest-neighbor matching
• Automatic bandwidth selection for kernel/ridge matching
• Flexible specification of scaling matrix for MDM
• Joint analysis of multiple subgroups and multiple outcome variables
• Various post-estimation commands for balancing and common-support diagnostics
• Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1105     291    1396    1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
NATE     1.432913

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                         Raw                        Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                         Raw                        Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
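The StdDif and Ratio columns can be reproduced by hand. The sketch below assumes the common definitions (difference in means divided by the square root of the average of the two group variances, and treated variance divided by untreated variance). That these are the formulas behind the output is an inference from the reported numbers, not a statement taken from the kmatch documentation. The inputs are the raw collgrad values, read as .321663, .224212, .218674 and .174066.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference using the average of the two variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def var_ratio(var_t, var_c):
    """Variance ratio, treated over untreated."""
    return var_t / var_c


# Raw collgrad row of the balancing table (values rounded as printed).
collgrad_std_diff = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
collgrad_var_ratio = var_ratio(0.218674, 0.174066)
print(round(collgrad_std_diff, 4), round(collgrad_var_ratio, 4))
```

Up to rounding of the printed inputs, this gives the reported values of about .2199 and 1.2563.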

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot of the balancing statistics for each covariate (collgrad through south), raw versus matched. Left panel: standardized mean difference, with a reference line at 0. Right panel: variance ratio, with a reference line at 1.]

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching             Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      431     26     457     1214     182    1396    .00188

Treatment-effects estimation

wage        Coef.
ATT      .3887224
NATE     1.432913

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units, in two panels: Raw and Matched (ATT).]

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1105     291    1396    1.3394
Untreated   1386     10    1396      455       2     457    3.3975
Combined    1818     35    1853     1560     293    1853

Treatment-effects estimation

          Observed   Bootstrap                       Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE       .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT       .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC       .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE      1.432913   .2333282    6.14   0.000    .9755981   1.890228

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                             Number of obs   = 1853
                                             Neighbors:  min = 1
Treatment   : union = 1                                  max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      457      0     457      328    1068    1396

Treatment-effects estimation

wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 1
Outcome model  : matching                                   min = 1
Distance metric: Mahalanobis                                max = 1

                                 AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
 union
 (union vs nonunion)  .7246969   .2942952    2.46   0.014     .147889   1.301505

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                             Number of obs   = 1853
                                             Neighbors:  min = 5
Treatment   : union = 1                                  max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6

                                 AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
 union
 (union vs nonunion)  .5590823   .2381752    2.35   0.019    .0922675   1.025897

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                             Number of obs   = 1853
                                             Neighbors:  min = 5
Treatment   : union = 1                                  max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6

                                 AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
 union
 (union vs nonunion)  .5288023   .2420635    2.18   0.029    .0543666   1.003238

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      439     18     457     1258     138    1396    .83886

Treatment-effects estimation

wage        Coef.
ATT      .6408443

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1103     293    1396    1.3013

Treatment-effects estimation

wage        Coef.
ATT      .6047374

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1105     291    1396    1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      448      9     457     1184     212    1396    1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth (1.5 to 3); y-axis: MSE; evaluation points labeled by index 1 to 15.]

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      453      4     457     1289     107    1396     2.433

Treatment-effects estimation

wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth (1.5 to 3); y-axis: MISE; evaluation points labeled by index 1 to 15.]

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      455      2     457     1356      40    1396    2.7626

Treatment-effects estimation

wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth (1 to 5); y-axis: weighted MISE; evaluation points labeled by index 1 to 15.]

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      366     91     457      701     695    1396        .5

Treatment-effects estimation

wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)      Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377         0   .045952    .054507  -.219225   .273732
7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry          0   .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
south           .29235   .318681   .297593   -.011456   .046074   -.05753

                 Common support (treated)               Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
6.industry     .054233         0   .043936    1.23435         0         .
7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
10.industry          0   .021734   .004367          0   4.97709         0
11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
2.race         .184542   .219536   .221726    .832297   .990121   .840601
3.race         .002732   .071795   .017237    .158513   4.16522   .038056
south          .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference (1)-(3) for each covariate (collgrad through south), with a reference line at 0.]

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1104     291    1395    1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1104     291    1395    1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
0
 Treated     306     15     321      625     120     745    1.3199
1
 Treated     126     10     136      473     178     651    1.3398

Treatment-effects estimation

          Observed   Bootstrap                       Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
 ATT      .4586332   .2763358    1.66   0.097    -.082975   1.000241
1
 ATT      .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)       .4932373   .4452343    1.11   0.268    -.379406   1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis, 0 to 1) for untreated and treated. McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: outcome (y-axis) by propensity score (x-axis) for untreated and treated units. Right panel: treatment effect (y-axis) by propensity score (x-axis).]
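The population values on this slide can be cross-checked, because the ATE is the average of ATT and ATC weighted by the share of treated units. The following Python lines are only a sanity check on the rounded numbers, assuming the treated share of 17.5% given in the simulation setup; the variable names are illustrative.

```python
# ATE = p * ATT + (1 - p) * ATC, where p is the share of treated units.
p_treated = 0.175          # share of resident aliens (from the simulation setup)
att, atc = -3.96, -3.41    # rounded population values reported on the slide

ate = p_treated * att + (1 - p_treated) * atc
print(round(ate, 2))       # compare with the ATE reported on the slide
```

The result agrees with the reported ATE up to rounding, which also confirms that the treatment group is the smaller one and that the ATE is dominated by the ATC.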

Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each with and without bias correction.]


In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small, and additional post-matching regression adjustment reduces them further.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each with and without bias correction.]


Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each with and without bias correction.]
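The three performance measures shown on the result slides (variance, bias reduction, mean squared error) can be computed from replicated estimates roughly as follows. This is a minimal Python sketch, not the code used for the talk: the function name, the example estimates, and in particular the definition of bias reduction (share of the raw bias NATE - ATT that is removed) are assumptions made for illustration.

```python
from statistics import mean, pvariance

def summarize(estimates, att, nate):
    """Summarize replicated ATT estimates from a simulation (illustrative helper)."""
    bias = mean(estimates) - att
    variance = pvariance(estimates)
    # Mean squared error around the true ATT; equals variance + bias**2.
    mse = mean([(e - att) ** 2 for e in estimates])
    # Assumed definition: 100 means the raw bias (NATE - ATT) is fully removed.
    bias_reduction = 100 * (1 - bias / (nate - att))
    return {"bias": bias, "variance": variance, "mse": mse,
            "bias_reduction": bias_reduction}

# Hypothetical estimates from four replications (made-up numbers).
stats = summarize([-4.2, -3.8, -4.0, -3.6], att=-3.96, nate=-4.79)
print(stats)
```

Under this definition, values above 100 indicate that an estimator overcorrects, that is, the remaining bias has the opposite sign of the raw bias.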

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each with and without bias correction.]
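The two inference measures on the last result slides can be sketched in the same way. The definitions below are assumptions for illustration (relative standard error as the average estimated standard error divided by the empirical standard deviation of the estimates, coverage as the share of normal-based 95% intervals that contain the true ATT), and the input numbers are made up.

```python
from statistics import mean, stdev

def relative_se(estimates, std_errors):
    """Average estimated SE divided by the empirical SD of the estimates."""
    return mean(std_errors) / stdev(estimates)

def coverage(estimates, std_errors, truth, z=1.96):
    """Share of normal-based 95% CIs that contain the true value."""
    hits = [abs(e - truth) <= z * se for e, se in zip(estimates, std_errors)]
    return sum(hits) / len(hits)

# Hypothetical replications: estimates with too-large and too-small SEs.
estimates = [-4.2, -3.8, -4.0, -3.6]
rel = relative_se(estimates, [0.3] * 4)
cov_wide = coverage(estimates, [0.3] * 4, truth=-3.96)
cov_narrow = coverage(estimates, [0.1] * 4, truth=-3.96)
print(rel, cov_wide, cov_narrow)
```

A relative standard error above 1 means the standard errors are too large on average (conservative intervals); a value below 1 typically goes together with coverage below the nominal 95%.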

7 Conclusions

Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching. This is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure: scatter plot of the outcome (y-axis) against education in years (x-axis) for treated (T) and control (C) units.]

(Slide 3/23 of the slides by King and Nielsen)

King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments

Types of Experiments

Balance of covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

(Slide 6/23 of the slides by King and Nielsen)
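The efficiency claim behind Argument 2 can be illustrated with a small simulation. The sketch below is not taken from King and Nielsen; the data-generating process, sample size, and function names are assumptions chosen for illustration. It compares the sampling variance of a difference in means under complete randomization and under randomization blocked on one binary covariate that strongly affects the outcome.

```python
import random
from statistics import mean, pvariance

def diff_in_means(x, treated, effect=2.0):
    # Outcome depends on treatment, on the covariate x, and on noise.
    y = [effect * t + 5.0 * xi + random.gauss(0, 1) for xi, t in zip(x, treated)]
    y1 = [yi for yi, t in zip(y, treated) if t]
    y0 = [yi for yi, t in zip(y, treated) if not t]
    return mean(y1) - mean(y0)

def complete(x):
    # Treat a random half of all units, ignoring x.
    chosen = set(random.sample(range(len(x)), len(x) // 2))
    return [i in chosen for i in range(len(x))]

def blocked(x):
    # Treat a random half of the units within each level of x.
    treated = [False] * len(x)
    for level in set(x):
        block = [i for i, xi in enumerate(x) if xi == level]
        for i in random.sample(block, len(block) // 2):
            treated[i] = True
    return treated

random.seed(1)
x = [0] * 20 + [1] * 20
var_complete = pvariance([diff_in_means(x, complete(x)) for _ in range(2000)])
var_blocked = pvariance([diff_in_means(x, blocked(x)) for _ in range(2000)])
print(var_complete, var_blocked)
```

In this setup blocking removes the chance imbalance in x between the groups, so the blocked design has a clearly smaller variance; how large the gain is depends on how strongly x is related to the outcome.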

Best Case: Mahalanobis Distance Matching

[Figure, two builds: age (y-axis) against education in years (x-axis) for treated units (T) and control units (C). The first build shows all control units, the second build only the treated units and one row of control units.]

(Slide 9/23 of the slides by King and Nielsen)
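For readers unfamiliar with Mahalanobis distance matching, the following Python sketch shows the idea for two covariates such as education and age. It is a simplified illustration with hypothetical data and helper names, not the algorithm used by King and Nielsen or by kmatch: each treated unit is paired with the control unit that is closest in the metric d(a, b) = sqrt((a - b)' S^-1 (a - b)), where S is the sample covariance matrix of the covariates.

```python
from statistics import mean

def cov2(points):
    """Sample covariance matrix (s11, s12, s22) of a list of (x1, x2) points."""
    m1 = mean(p[0] for p in points)
    m2 = mean(p[1] for p in points)
    n = len(points) - 1
    s11 = sum((p[0] - m1) ** 2 for p in points) / n
    s22 = sum((p[1] - m2) ** 2 for p in points) / n
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / n
    return s11, s12, s22

def mahalanobis(a, b, cov):
    s11, s12, s22 = cov
    det = s11 * s22 - s12 ** 2
    d1, d2 = a[0] - b[0], a[1] - b[1]
    # d' S^-1 d, using the closed-form inverse of a 2x2 matrix
    q = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return q ** 0.5

def nearest_control(unit, controls, cov):
    return min(controls, key=lambda c: mahalanobis(unit, c, cov))

# Hypothetical (education, age) pairs.
treated = [(16, 30), (20, 45)]
controls = [(12, 60), (16, 32), (21, 44), (26, 25)]
cov = cov2(treated + controls)
matches = [nearest_control(t, controls, cov) for t in treated]
print(matches)
```

Because the distance is computed on the covariates themselves, matched pairs are close on education and on age, which is what the "fully blocked" analogy refers to.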

Best Case: Propensity Score Matching

[Figure: age (y-axis) against education in years (x-axis) for treated units (T) and control units (C), with an additional propensity score axis running from 0 to 1.]

(Slide 15/23 of the slides by King and Nielsen)


Best Case: Propensity Score Matching is Suboptimal

[Figure: age (y-axis) against education in years (x-axis) for treated units (T) and the control units (C) that remain after matching.]

(Slide 15/23 of the slides by King and Nielsen)

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
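The first step of Argument 3, that random pruning increases imbalance, can be checked with a few lines of Python. This is an illustrative sketch under simple assumptions (one standard normal covariate, two groups drawn from the same population, hypothetical function name): randomly deleting units is equivalent in distribution to drawing smaller random groups, so the expected absolute difference in covariate means grows as the groups shrink.

```python
import random
from statistics import mean

def expected_imbalance(n_per_group, reps=1000):
    """Average absolute difference in covariate means between two random groups."""
    diffs = []
    for _ in range(reps):
        g1 = [random.gauss(0, 1) for _ in range(n_per_group)]
        g0 = [random.gauss(0, 1) for _ in range(n_per_group)]
        diffs.append(abs(mean(g1) - mean(g0)))
    return mean(diffs)

random.seed(42)
full = expected_imbalance(200)
pruned = expected_imbalance(50)   # as if 75% of the units were deleted at random
print(full, pruned)
```

With a quarter of the sample left, the expected imbalance roughly doubles, in line with the 1/sqrt(n) behavior of a difference in means.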

PSM Increases Model Dependence & Bias

[Figure, left panel "Model Dependence": variance (y-axis) against number of units pruned (x-axis, 0 to 160) for MDM and PSM. Right panel "Bias": maximum coefficient across 512 specifications (y-axis) against number of units pruned (x-axis, 0 to 160) for MDM and PSM, with the label "True effect = 2".]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

(Slide 20/23 of the slides by King and Nielsen)

The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP, 2012) and Nielsen et al. (AJPS, 2011). Each panel plots imbalance (y-axis) against the number of units pruned (x-axis) for CEM, MDM, PSM, and random pruning, with markers for the raw data and for a 1/4 SD caliper.]

Similar pattern for > 20 other real data sets we checked.

(Slide 21/23 of the slides by King and Nielsen)

5 Are King and Nielsen right?

Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization among observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
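The point that the propensity score blocks only as far as X predicts T can be made concrete with a toy example. The Python sketch below uses hypothetical data and a hypothetical helper; it computes the propensity score from fully stratified data, that is, as the share of treated units within each covariate cell.

```python
from collections import defaultdict

def stratified_propensity(x, t):
    """Share of treated units within each covariate cell."""
    counts = defaultdict(lambda: [0, 0])      # cell -> [n, n_treated]
    for xi, ti in zip(x, t):
        counts[xi][0] += 1
        counts[xi][1] += ti
    return {cell: n_t / n for cell, (n, n_t) in counts.items()}

x = ["a"] * 4 + ["b"] * 4
# T unrelated to X: both cells get the same score (one big block).
unrelated = stratified_propensity(x, [1, 0, 1, 0, 1, 0, 1, 0])
# T related to X: the cells get different scores (two blocks).
related = stratified_propensity(x, [1, 1, 1, 0, 1, 0, 0, 0])
print(unrelated, related)
```

When all cells share one score, matching on the score cannot distinguish them and behaves like complete randomization; when the scores differ, matching on the score separates the cells just as blocking on X would.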

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
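How kernel matching "averages where pruning is not necessary" can be seen in a minimal Python sketch. The code is an illustration with hypothetical data and function names, not the kmatch implementation: for each treated unit, the counterfactual outcome is a kernel-weighted average of all controls whose propensity score lies within the bandwidth, so several good matches are all used, and only treated units without any control in range are dropped.

```python
def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_match_att(treated, controls, bandwidth):
    """ATT from kernel matching; inputs are lists of (score, outcome) pairs."""
    effects = []
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        total = sum(weights)
        if total == 0:      # no control within the bandwidth: unit stays unmatched
            continue
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects), len(effects)

# Hypothetical (propensity score, outcome) pairs.
treated = [(0.30, 10.0), (0.90, 12.0)]
controls = [(0.28, 8.0), (0.30, 9.0), (0.32, 10.0), (0.50, 20.0)]
att, n_matched = kernel_match_att(treated, controls, bandwidth=0.05)
print(att, n_matched)
```

In the example, the first treated unit is compared with the weighted average of three nearby controls instead of a single arbitrarily chosen one, and the second treated unit is left unmatched because no control is close enough.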

What is true is that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                   Controls           Band-
            Yes      No   Total     Used  Unused   Total       width
Treated     432      25     457     1105     291    1396       13394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
NATE     1.432913

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                           Raw                          Matched(ATT)
Means          Treated  Untrea~d     StdDif    Treated  Untrea~d     StdDif
collgrad       .321663   .224212    .219912    .319444   .319444          0
ttl_exp        13.2685   12.7323    .117584    13.3205   13.1425    .039036
tenure         7.89205   6.17658     .29735    7.91744   7.58347    .057888
3.industry     .006565   .012178   -.058246     .00463    .00463          0
4.industry     .183807   .166905    .044425    .185185   .185185          0
5.industry     .105033   .027937    .312944    .085648   .085648          0
6.industry     .045952   .169771   -.407129    .048611   .048611          0
7.industry     .019694   .102436   -.350657    .020833   .020833          0
8.industry     .017505   .035817   -.113785    .009259   .009259          0
9.industry     .010941   .040115   -.185669    .011574   .011574          0
10.industry    .004376   .008596   -.052551    .002315   .002315          0
11.industry    .479212   .356734    .250073    .506944   .506944          0
12.industry    .122538    .07235    .169707     .12037    .12037          0
2.race         .330416   .244986    .189418      .3125     .3125          0
3.race         .017505   .011461    .050566    .006944   .006944          0
south          .297593   .466332   -.352408    .291667   .291667          0

                           Raw                          Matched(ATT)
Variances      Treated  Untrea~d      Ratio    Treated  Untrea~d      Ratio
collgrad       .218674   .174066    1.25628    .217904   .217904          1
ttl_exp        20.5898   21.0001    .980459    19.8177   18.2323    1.08696
tenure         37.2044   29.3629    1.26706    37.0399   34.9543    1.05966
3.industry     .006536   .012038    .542928    .004619   .004619          1
4.industry     .150351   .139148    1.08052    .151242   .151242          1
5.industry     .094207   .027176    3.46656    .078494   .078494          1
6.industry     .043936    .14105    .311496    .046355   .046355          1
7.industry     .019348   .092008    .210287    .020447   .020447          1
8.industry     .017237   .034559    .498769    .009195   .009195          1
9.industry     .010845   .038533    .281445    .011467   .011467          1
10.industry    .004367   .008528    .512039    .002315   .002315          1
11.industry    .250115   .229639    1.08917    .250532   .250532          1
12.industry    .107758   .067163    1.60443    .106127   .106127          1
2.race         .221726     .1851    1.19787    .215342   .215342          1
3.race         .017237   .011338    1.52025    .006912   .006912          1
south           .20949   .249045    .841173    .207077   .207077          1
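The StdDif and Ratio columns can be reproduced from the means and variances in the same table. The Python sketch below uses the common definition of the standardized difference (difference in means divided by the square root of the average of the two group variances). That this is the formula behind the output is an inference from the numbers, not something stated on the slides; the function names are illustrative.

```python
from math import sqrt

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference with the average of the two variances."""
    return (mean_t - mean_c) / sqrt((var_t + var_c) / 2)

def var_ratio(var_t, var_c):
    """Variance of the treated divided by variance of the untreated."""
    return var_t / var_c

# Raw collgrad row: means .321663 / .224212, variances .218674 / .174066.
d = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
r = var_ratio(0.218674, 0.174066)
print(round(d, 4), round(r, 4))
```

Both values agree with the raw collgrad entries of the table up to rounding of the displayed inputs. Good balance corresponds to standardized differences near 0 and variance ratios near 1.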

Some Examples

Make a graph of the balancing stats:

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: standardized mean differences (left panel, reference line at 0) and variance ratios (right panel, reference line at 1) for collgrad, ttl_exp, tenure, the industry and race indicators, and south; raw vs. matched.]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                  Matched                   Controls           Band-
            Yes      No   Total     Used  Unused   Total       width
Treated     431      26     457     1214     182    1396      .00188

Treatment-effects estimation

wage        Coef.
ATT      .3887224
NATE     1.432913

Some Examples

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated units; panels: Raw and Matched (ATT); x-axis: propensity score, y-axis: density.]

Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated units; panels: Raw and Matched (ATT); x-axis: propensity score, y-axis: cumulative probability.]

Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units; panels: Raw and Matched (ATT); y-axis: propensity score.]

Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                   Controls           Band-
              Yes      No   Total     Used  Unused   Total       width
Treated       432      25     457     1105     291    1396       13394
Untreated    1386      10    1396      455       2     457       33975
Combined     1818      35    1853     1560     293    1853

Treatment-effects estimation

            Observed   Bootstrap                     Normal-based
wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE         .4095729    .1920853   2.13    0.033   .0330928    .7860531
ATT         .6059013    .2472069   2.45    0.014   .1213846    1.090418
ATC         .3483797    .1893653   1.84    0.066  -.0227695    .7195289
NATE        1.432913    .2333282   6.14    0.000   .9755981    1.890228

Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)        -.8270117    .1810415  -4.57    0.000  -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
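The columns of the lincom table follow from the coefficient and its bootstrap standard error under the normal approximation, and the chi-squared test with one degree of freedom is the square of such a z statistic. The Python sketch below (Python 3.8 or later; the function name is illustrative) reproduces the reported values from the printed, rounded inputs.

```python
from statistics import NormalDist

def wald_summary(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, (coef - crit * se, coef + crit * se)

# Row of the lincom ATT - NATE table.
z, p, (lo, hi) = wald_summary(-0.8270117, 0.1810415)

# p-value of a chi-squared statistic with 1 df, here the ATT = ATC test.
p_chi2 = 2 * (1 - NormalDist().cdf(2.42 ** 0.5))
print(round(z, 2), round(lo, 6), round(hi, 6), round(p_chi2, 2))
```

Small deviations in the last digits can occur because the inputs are the rounded values printed in the output.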

Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                   Controls           Band-
            Yes      No   Total     Used  Unused   Total       width
Treated     457       0     457      328    1068    1396

Treatment-effects estimation

wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                                       AI Robust
wage                        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)    .7246969    .2942952   2.46    0.014    .147889    1.301505

Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                   Controls           Band-
            Yes      No   Total     Used  Unused   Total       width
Treated     457       0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                       AI Robust
wage                        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)    .5590823    .2381752   2.35    0.019   .0922675    1.025897

Some Examples

Bias correction (regression adjustment):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                   Controls           Band-
            Yes      No   Total     Used  Unused   Total       width
Treated     457       0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                       AI Robust
wage                        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)    .5288023    .2420635   2.18    0.029   .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model:   logit (pr)
PS covars:  i.industry i.race south

Matching statistics

                Matched                  Controls
            Yes    No   Total      Used   Unused   Total   Bandwidth
Treated     439    18     457      1258      138    1396      .83886

Treatment-effects estimation

wage       Coef.
ATT     .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure
Exact:      industry race south

Matching statistics

                Matched                  Controls
            Yes    No   Total      Used   Unused   Total   Bandwidth
Treated     432    25     457      1103      293    1396      1.3013

Treatment-effects estimation

wage       Coef.
ATT     .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                  Controls
            Yes    No   Total      Used   Unused   Total   Bandwidth
Treated     432    25     457      1105      291    1396      1.3394

Treatment-effects estimation

wage       Coef.
ATT     .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                  Controls
            Yes    No   Total      Used   Unused   Total   Bandwidth
Treated     448     9     457      1184      212    1396      1.8888

Treatment-effects estimation

wage       Coef.
ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation results, MSE plotted against bandwidth (1.5 to 3), with points labeled by evaluation index]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                  Controls
            Yes    No   Total      Used   Unused   Total   Bandwidth
Treated     453     4     457      1289      107    1396       2.433

Treatment-effects estimation

wage       Coef.
ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation results, MISE plotted against bandwidth (1.5 to 3), with points labeled by evaluation index]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                  Controls
            Yes    No   Total      Used   Unused   Total   Bandwidth
Treated     455     2     457      1356       40    1396      2.7626

Treatment-effects estimation

wage       Coef.
ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation results, weighted MISE plotted against bandwidth (1 to 5), with points labeled by evaluation index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                  Controls
            Yes    No   Total      Used   Unused   Total   Bandwidth
Treated     366    91     457       701      695    1396          .5

Treatment-effects estimation

wage       Coef.
ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

                           Means                     Standardized difference
Means          Matched  Unmatched     Total     (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404    .318681   .321663     .001585   -.006376    .007962
ttl_exp        13.3929    12.7682   13.2685     .027413   -.110253    .137666
tenure         8.12614    6.95055   7.89205     .038378   -.154356    .192734
3.industry     .002732    .021978   .006565    -.047404    .190657   -.238061
4.industry     .191257    .153846   .183807     .019212   -.077269    .096481
5.industry     .062842    .274725   .105033    -.137462    .552867   -.690329
6.industry     .057377          0   .045952     .054507   -.219225    .273732
7.industry     .019126    .021978   .019694    -.004083    .016423   -.020506
8.industry     .005464    .065934   .017505    -.091714    .368871   -.460585
9.industry     .010929    .010989   .010941    -.000115    .000462   -.000577
10.industry          0    .021978   .004376    -.066227    .266363   -.332589
11.industry    .554645    .175824   .479212      .15083   -.606636    .757467
12.industry    .092896    .241758   .122538    -.090299    .363181    -.45348
2.race         .243169    .681319   .330416    -.185284    .745209   -.930494
3.race         .002732    .076923   .017505    -.112525    .452572   -.565097
south           .29235    .318681   .297593    -.011456    .046074    -.05753

                          Variances                          Ratio
Variances      Matched  Unmatched     Total     (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058    .219536   .218674     1.00176    1.00394    .997824
ttl_exp        19.4198    25.2474   20.5898     .943177    1.22621     .76918
tenure         38.3324    31.9242   37.2044     1.03032    .858076    1.20073
3.industry     .002732    .021734   .006536     .418045    3.32537    .125714
4.industry     .155101    .131624   .150351     1.03159    .875443    1.17837
5.industry     .059054    .201465   .094207     .626851    2.13854    .293122
6.industry     .054233          0   .043936     1.23435          0          .
7.industry     .018811    .021734   .019348     .972252     1.1233    .865531
8.industry      .00545    .062271   .017237     .316157    3.61269    .087513
9.industry     .010839    .010989   .010845     .999464    1.01328    .986361
10.industry          0    .021734   .004367           0    4.97709          0
11.industry    .247691     .14652   .250115     .990307    .585811    1.69049
12.industry    .084497    .185348   .107758     .784137    1.72003    .455885
2.race         .184542    .219536   .221726     .832297    .990121    .840601
3.race         .002732    .071795   .017237     .158513    4.16522    .038056
south          .207448    .219536    .20949     .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
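As a reading aid for the standardized differences above: the reported figures are consistent with dividing the difference in means by the standard deviation of the full treated sample, column (3). This is inferred from the numbers shown, not taken from the kmatch documentation, so it should be treated as an assumption. For ttl_exp, whose matched and total means are 13.3929 and 13.2685 and whose total variance is 20.5898:

```latex
\frac{\bar{x}_{(1)} - \bar{x}_{(3)}}{\sqrt{s^2_{(3)}}}
  = \frac{13.3929 - 13.2685}{\sqrt{20.5898}}
  \approx \frac{0.1244}{4.5376}
  \approx 0.0274
```

which agrees with the (1)-(3) entry of .027413 up to rounding of the displayed means.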

Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)

. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: dot plot of the standardized difference (axis from -2 to 2) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                  Controls
            Yes    No   Total      Used   Unused   Total   Bandwidth
Treated     432    25     457      1104      291    1395      1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched                  Controls
            Yes    No   Total      Used   Unused   Total   Bandwidth
Treated     432    25     457      1104      291    1395      1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching

Number of obs = 1853
Replications = 50
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                   Matched                  Controls
               Yes    No   Total      Used   Unused   Total   Bandwidth
0: Treated     306    15     321       625      120     745      1.3199
1: Treated     126    10     136       473      178     651      1.3398

Treatment-effects estimation

             Observed    Bootstrap                         Normal-based
wage            Coef.    Std. Err.      z     P>|z|    [95% Conf. Interval]
0: ATT       .4586332     .2763358    1.66     0.097    -.082975    1.000241
1: ATT       .9518705      .406903    2.34     0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(1)     = 1.23
       Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage            Coef.    Std. Err.      z     P>|z|    [95% Conf. Interval]
(1)          .4932373     .4452343    1.11     0.268    -.379406    1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (0 to 1) for untreated and treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): −4.79

Population ATT (computed from fully stratified data): −3.96

There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, two panels plotted against the propensity score (0 to .8). Left: outcome (30 to 55) for untreated and treated. Right: treatment effect (−6 to −1).]
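The four quantities reported above can be written in the potential-outcomes notation of the opening section ($Y^1$, $Y^0$, $D$). The decomposition of the ATE in the last two lines is a standard identity; plugging in the figures from the simulation setup (ATT = −3.96, ATC = −3.41, and a treated share read as 17.5 percent) reproduces the reported ATE up to rounding, which serves as a quick consistency check rather than as a result from the talk.

```latex
\begin{align*}
\text{NATE} &= E[Y \mid D = 1] - E[Y \mid D = 0] \\
\text{ATT}  &= E[Y^1 - Y^0 \mid D = 1] \\
\text{ATC}  &= E[Y^1 - Y^0 \mid D = 0] \\
\text{ATE}  &= E[Y^1 - Y^0]
             = \Pr(D = 1)\,\text{ATT} + \Pr(D = 0)\,\text{ATC} \\
            &\approx 0.175 \times (-3.96) + 0.825 \times (-3.41)
             \approx -3.51
\end{align*}
```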

Results: Variance

[Figure: variance of the estimators (axis from 1.5 to 4.5), panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent (axis from 70 to 130 for N = 500 and from 95 to 135 for N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators (axis from 1.5 to about 5), panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Results: Relative standard error

[Figure: relative standard error (axis from .9 to about 1.25), panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals (axis from about .9 to .98), panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
+ MDM leaves less scope for post-matching modeling decision biases.
+ Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
+ Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide by King and Nielsen)

Types of Experiments

Balance           Complete           Fully
Covariates:       Randomization      Blocked
Observed          On average         Exact
Unobserved        On average         On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller!

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching (slide by King and Nielsen)

[Figure: scatter plot of treated (T) and control (C) units by education (years, 12 to 28) and age (20 to 80)]

Best Case: Propensity Score Matching (slide by King and Nielsen)

[Figure: scatter plot of treated (T) and control (C) units by education (years, 12 to 28) and age (20 to 80), with a propensity score axis running from 0 to 1]

Best Case: Propensity Score Matching is Suboptimal (slide by King and Nielsen)

[Figure: scatter plot of treated (T) and control (C) units by education (years, 12 to 28) and age (20 to 80)]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
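The variance mechanism behind Argument 3 can be made explicit with a textbook result. This is a sketch under stated assumptions: a single covariate $X$ with variances $\sigma_T^2$ and $\sigma_C^2$ in the treated and control groups, and simple random pruning down to $n_T$ treated and $n_C$ control units, ignoring finite-population corrections. The symbols are introduced here and do not appear on the slides.

```latex
\operatorname{Var}\!\left(\bar{X}_T - \bar{X}_C\right)
  = \frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}
```

Random pruning leaves the expected difference in means unchanged but lowers $n_T$ and $n_C$, so the realized difference scatters more widely around it; halving both groups doubles this variance, which is why large imbalances become more likely.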

PSM Increases Model Dependence & Bias (slide by King and Nielsen)

[Figure, two panels, each plotted against the number of units pruned (0 to 160) for MDM and PSM. Left panel, "Model Dependence": variance. Right panel, "Bias": maximum coefficient across 512 specifications; true effect = 2.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, with $\epsilon_i \sim N(0, 1)$

The Propensity Score Paradox in Real Data (slide by King and Nielsen)

[Figure, two panels: imbalance plotted against the number of units pruned for CEM, MDM, PSM, and random pruning, with the raw data and a 1/4 SD caliper marked. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011).]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is a lot of blocking.
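In symbols, writing $e(X)$ for the propensity score (notation following Rosenbaum and Rubin 1983; it is not used on the slide itself), the point about blocking can be sketched as:

```latex
e(X) = \Pr(T = 1 \mid X), \qquad T \perp X \mid e(X)
```

Within a stratum of equal $e(X)$, treatment is unrelated to $X$, which is what complete randomization within that stratum amounts to. If $X$ has no relation to $T$, then $e(X) = \Pr(T = 1)$ for every observation and the whole sample forms a single stratum; the more strongly $X$ predicts $T$, the more distinct strata there are and the closer PSM comes to blocking.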

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation; something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
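A minimal sketch of the contrast drawn above, using the NLSW data from the examples in this talk. It assumes that kmatch is installed and that `kmatch ps` accepts the same covariate and option syntax as the `kmatch md` calls shown in the examples. It also assumes that kmatch's nearest-neighbor matching is done with replacement, so the second call is not King and Nielsen's one-to-one matching without replacement; it only illustrates an algorithm that uses far fewer controls per treated unit than kernel matching.

```stata
* Load the example data used in the talk
sysuse nlsw88, clear
drop if industry == 2

* Propensity-score kernel matching: averages over all controls
* that fall within the bandwidth around each treated unit
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att

* Propensity-score nearest-neighbor matching with a single neighbor:
* uses only the closest control(s) for each treated unit
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(1)
```

Comparing the "Used" and "Unused" control counts in the two matching-statistics tables shows how many controls each algorithm discards.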

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       432      25     457      1105     291    1396      1.3394

Treatment-effects estimation

wage           Coef.
ATT         .6059013
NATE        1.432913
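The idea behind a kernel matching estimate of the ATT can be sketched in a few lines of Python. This is a simplified illustration, not kmatch's implementation: it assumes a single covariate with absolute distance in place of the Mahalanobis metric, an Epanechnikov kernel, and a bandwidth supplied by the user. All function and variable names are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive weight only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """Kernel matching estimate of the ATT for one covariate.

    treated, controls: lists of (x, y) pairs.
    Returns (att, number of matched treated, number of unmatched treated).
    """
    differences = []
    for x_t, y_t in treated:
        weights = [epanechnikov(abs(x_t - x_c) / bandwidth)
                   for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            # No control within the bandwidth: the treated unit is unmatched.
            continue
        # Counterfactual outcome: kernel-weighted average of control outcomes.
        y_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        differences.append(y_t - y_hat)
    if not differences:
        raise ValueError("no treated unit has a control within the bandwidth")
    att = sum(differences) / len(differences)
    return att, len(differences), len(treated) - len(differences)


# Toy data (hypothetical): the second treated unit has no nearby control.
controls = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
treated = [(1.0, 5.0), (10.0, 9.0)]
print(kernel_matching_att(treated, controls, bandwidth=1.5))
```

Treated units without any control inside the bandwidth drop out of the estimate, which corresponds to the "Matched: No" column in the matching statistics.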

Some Examples: some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                          Matched (ATT)
Means          Treated  Untrea~d     StdDif    Treated  Untrea~d     StdDif
collgrad       .321663   .224212    .219912    .319444   .319444          0
ttl_exp        13.2685   12.7323    .117584    13.3205   13.1425    .039036
tenure         7.89205   6.17658     .29735    7.91744   7.58347    .057888
3.industry     .006565   .012178   -.058246     .00463    .00463          0
4.industry     .183807   .166905    .044425    .185185   .185185          0
5.industry     .105033   .027937    .312944    .085648   .085648          0
6.industry     .045952   .169771   -.407129    .048611   .048611          0
7.industry     .019694   .102436   -.350657    .020833   .020833          0
8.industry     .017505   .035817   -.113785    .009259   .009259          0
9.industry     .010941   .040115   -.185669    .011574   .011574          0
10.industry    .004376   .008596   -.052551    .002315   .002315          0
11.industry    .479212   .356734    .250073    .506944   .506944          0
12.industry    .122538    .07235    .169707     .12037    .12037          0
2.race         .330416   .244986    .189418      .3125     .3125          0
3.race         .017505   .011461    .050566    .006944   .006944          0
south          .297593   .466332   -.352408    .291667   .291667          0

                          Raw                          Matched (ATT)
Variances      Treated  Untrea~d      Ratio    Treated  Untrea~d      Ratio
collgrad       .218674   .174066    1.25628    .217904   .217904          1
ttl_exp        20.5898   21.0001    .980459    19.8177   18.2323    1.08696
tenure         37.2044   29.3629    1.26706    37.0399   34.9543    1.05966
3.industry     .006536   .012038    .542928    .004619   .004619          1
4.industry     .150351   .139148    1.08052    .151242   .151242          1
5.industry     .094207   .027176    3.46656    .078494   .078494          1
6.industry     .043936    .14105    .311496    .046355   .046355          1
7.industry     .019348   .092008    .210287    .020447   .020447          1
8.industry     .017237   .034559    .498769    .009195   .009195          1
9.industry     .010845   .038533    .281445    .011467   .011467          1
10.industry    .004367   .008528    .512039    .002315   .002315          1
11.industry    .250115   .229639    1.08917    .250532   .250532          1
12.industry    .107758   .067163    1.60443    .106127   .106127          1
2.race         .221726     .1851    1.19787    .215342   .215342          1
3.race         .017237   .011338    1.52025    .006912   .006912          1
south           .20949   .249045    .841173    .207077   .207077          1
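The two balancing statistics can be recomputed by hand from the group means and variances. The Python sketch below uses a common definition of the standardized difference (difference in means divided by the square root of the average of the two group variances); that this is the formula kmatch uses is an assumption, although it does reproduce the raw collgrad entries when they are read as means .321663 and .224212 and variances .218674 and .174066 (the decimal points are themselves an assumption about the extracted output). Function names are hypothetical.

```python
import math


def std_mean_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference: difference in means divided by the
    square root of the average of the two group variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Ratio of the treated-group variance to the untreated-group variance."""
    return var_t / var_c


# Raw-sample values for collgrad (treated vs. untreated)
smd = std_mean_diff(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(f"StdDif = {smd:.4f}, Ratio = {ratio:.4f}")
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is what the matched columns show.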

Some Examples: make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: balancing statistics by covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); panels "Std. mean difference" and "Variance ratio"; series: Raw, Matched]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching             Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       431      26     457      1214     182    1396      .00188

Treatment-effects estimation

wage           Coef.
ATT         .3887224
NATE        1.432913

Some Examples: kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel densities of the propensity score for untreated and treated units; panels "Raw" and "Matched (ATT)"; x-axis: propensity score; y-axis: density]

Some Examples: cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distributions of the propensity score for untreated and treated units; panels "Raw" and "Matched (ATT)"; x-axis: propensity score; y-axis: cumulative probability]

Some Examples: balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units; panels "Raw" and "Matched (ATT)"; y-axis: propensity score]

Some Examples: standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       432      25     457      1105     291    1396      1.3394
Untreated    1386      10    1396       455       2     457      3.3975
Combined     1818      35    1853      1560     293    1853

Treatment-effects estimation

            Observed   Bootstrap                        Normal-based
wage           Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
ATE         .4095729    .1920853    2.13   0.033    .0330928    .7860531
ATT         .6059013    .2472069    2.45   0.014    .1213846    1.090418
ATC         .3483797    .1893653    1.84   0.066   -.0227695    .7195289
NATE        1.432913    .2333282    6.14   0.000    .9755981    1.890228
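The z statistic, p-value, and normal-based confidence interval in such a table follow directly from the coefficient and its bootstrap standard error. The Python sketch below recomputes them for the ATT row, reading the printed values as a coefficient of .6059013 and a standard error of .2472069 (the decimal points are an assumption about the extracted output). The function name is hypothetical.

```python
from statistics import NormalDist


def normal_based_summary(coef, se, level=0.95):
    """Return (z, two-sided p-value, lower bound, upper bound) for a
    normal-based test and confidence interval."""
    std_normal = NormalDist()
    z = coef / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    critical = std_normal.inv_cdf(1.0 - (1.0 - level) / 2.0)
    return z, p_value, coef - critical * se, coef + critical * se


# ATT row: observed coefficient and bootstrap standard error
z, p, lower, upper = normal_based_summary(0.6059013, 0.2472069)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The interval is only as good as the standard error that feeds it; with 50 bootstrap replications the standard error itself is still fairly noisy.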

Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage           Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
(1)        -.8270117    .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples: nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1853
                                             Neighbors: min = 1
                                                        max = 1
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       457       0     457       328    1068    1396

Treatment-effects estimation

wage           Coef.
ATT         .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 1
Outcome model  : matching                                   min = 1
Distance metric: Mahalanobis                                max = 1

                                   AI Robust
wage                      Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .7246969    .2942952    2.46   0.014     .147889    1.301505
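Nearest-neighbor matching with replacement, keeping all ties, can be sketched as follows. This Python example is a simplified illustration, not the algorithm of kmatch or teffects nnmatch: it assumes a single covariate with absolute distance in place of the Mahalanobis metric, uses one neighbor, and applies no bias correction. All names and the toy data are hypothetical.

```python
def nn_matching_att(treated, controls):
    """1-nearest-neighbor matching with replacement, keeping all ties.

    treated, controls: lists of (x, y) pairs.
    Returns (att, number of controls used, number of controls unused).
    """
    differences = []
    used = set()
    for x_t, y_t in treated:
        distances = [abs(x_t - x_c) for x_c, _ in controls]
        d_min = min(distances)
        # Keep every control at the minimum distance (ties), not just one.
        ties = [j for j, d in enumerate(distances) if d == d_min]
        used.update(ties)
        y_hat = sum(controls[j][1] for j in ties) / len(ties)
        differences.append(y_t - y_hat)
    att = sum(differences) / len(differences)
    return att, len(used), len(controls) - len(used)


# Toy data (hypothetical): two controls tie as the nearest neighbor of the
# first treated unit; the control at x = 0 is never used.
controls = [(0.0, 1.0), (1.0, 2.0), (1.0, 4.0), (5.0, 10.0)]
treated = [(1.0, 5.0), (4.6, 12.0)]
print(nn_matching_att(treated, controls))
```

Because controls are reused and ties are averaged, no good match is thrown away, but many controls can still end up unused, as the "Unused" column of the matching statistics shows.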

Some Examples: nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1853
                                             Neighbors: min = 5
                                                        max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       457       0     457       870     526    1396

Treatment-effects estimation

wage           Coef.
ATT         .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6

                                   AI Robust
wage                      Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .5590823    .2381752    2.35   0.019    .0922675    1.025897

Some Examples: bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1853
                                             Neighbors: min = 5
                                                        max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       457       0     457       870     526    1396

Treatment-effects estimation

wage           Coef.
ATT         .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6

                                   AI Robust
wage                      Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .5288023    .2420635    2.18   0.029    .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       439      18     457      1258     138    1396      .83886

Treatment-effects estimation

wage           Coef.
ATT         .6408443

Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       432      25     457      1103     293    1396      1.3013

Treatment-effects estimation

wage           Coef.
ATT         .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       432      25     457      1105     291    1396      1.3394

Treatment-effects estimation

wage           Coef.
ATT         .6059013

Some Examples: bandwidth selection by cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       448       9     457      1184     212    1396      1.8888

Treatment-effects estimation

wage           Coef.
ATT         .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: MSE; points labeled with their evaluation index]

Some Examples: bandwidth selection by cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       453       4     457      1289     107    1396       2.433

Treatment-effects estimation

wage           Coef.
ATT         .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: MISE; points labeled with their evaluation index]

Some Examples: bandwidth selection by weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       455       2     457      1356      40    1396      2.7626

Treatment-effects estimation

wage           Coef.
ATT         .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: weighted MISE; points labeled with their evaluation index]

Some Examples: common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       366      91     457       701     695    1396          .5

Treatment-effects estimation

wage           Coef.
ATT         .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)        Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404   .318681   .321663    .001585   -.006376    .007962
ttl_exp        13.3929   12.7682   13.2685    .027413   -.110253    .137666
tenure         8.12614   6.95055   7.89205    .038378   -.154356    .192734
3.industry     .002732   .021978   .006565   -.047404    .190657   -.238061
4.industry     .191257   .153846   .183807    .019212   -.077269    .096481
5.industry     .062842   .274725   .105033   -.137462    .552867   -.690329
6.industry     .057377         0   .045952    .054507   -.219225    .273732
7.industry     .019126   .021978   .019694   -.004083    .016423   -.020506
8.industry     .005464   .065934   .017505   -.091714    .368871   -.460585
9.industry     .010929   .010989   .010941   -.000115    .000462   -.000577
10.industry          0   .021978   .004376   -.066227    .266363   -.332589
11.industry    .554645   .175824   .479212     .15083   -.606636    .757467
12.industry    .092896   .241758   .122538   -.090299    .363181    -.45348
2.race         .243169   .681319   .330416   -.185284    .745209   -.930494
3.race         .002732   .076923   .017505   -.112525    .452572   -.565097
south           .29235   .318681   .297593   -.011456    .046074    -.05753

                 Common support (treated)                 Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058   .219536   .218674    1.00176    1.00394    .997824
ttl_exp        19.4198   25.2474   20.5898    .943177    1.22621     .76918
tenure         38.3324   31.9242   37.2044    1.03032    .858076    1.20073
3.industry     .002732   .021734   .006536    .418045    3.32537    .125714
4.industry     .155101   .131624   .150351    1.03159    .875443    1.17837
5.industry     .059054   .201465   .094207    .626851    2.13854    .293122
6.industry     .054233         0   .043936    1.23435          0          .
7.industry     .018811   .021734   .019348    .972252     1.1233    .865531
8.industry      .00545   .062271   .017237    .316157    3.61269    .087513
9.industry     .010839   .010989   .010845    .999464    1.01328    .986361
10.industry          0   .021734   .004367          0    4.97709          0
11.industry    .247691    .14652   .250115    .990307    .585811    1.69049
12.industry    .084497   .185348   .107758    .784137    1.72003    .455885
2.race         .184542   .219536   .221726    .832297    .990121    .840601
3.race         .002732   .071795   .017237    .158513    4.16522    .038056
south          .207448   .219536    .20949    .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total

Some Examples: make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference between matched treated and all treated, by covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); title: "Std. difference"]

Some Examples

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       432      25     457      1104     291    1395      1.3392

Treatment-effects estimation

               Coef.
wage
  ATT       .6021049
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

Some Examples: multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
Treated       432      25     457      1104     291    1395      1.3392

Treatment-effects estimation

               Coef.
wage
  ATT       .5152752
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples: treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                   Matched                    Controls            Band-
              Yes      No   Total      Used  Unused   Total       width
0
  Treated     306      15     321       625     120     745      1.3199
1
  Treated     126      10     136       473     178     651      1.3398

Treatment-effects estimation

            Observed   Bootstrap                        Normal-based
wage           Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
0
  ATT       .4586332    .2763358    1.66   0.097    -.082975    1.000241
1
  ATT       .9518705     .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage           Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
(1)         .4932373    .4452343    1.11   0.268    -.379406    1.365881
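The test and lincom results for the subgroup difference are two views of the same Wald statistic: with one restriction, the chi-squared statistic is the square of the z statistic, and the p-values coincide. The Python sketch below checks this, reading the printed difference as .4932373 with standard error .4452343 (the decimal points are an assumption about the extracted output). The function name is hypothetical.

```python
from statistics import NormalDist


def wald_one_restriction(diff, se):
    """Return (z, chi2 with 1 df, two-sided p-value) for H0: diff = 0."""
    z = diff / se
    chi2 = z * z
    # For one degree of freedom, P(chi2 > z^2) equals the two-sided normal p-value.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, chi2, p_value


# Difference [1]ATT - [0]ATT and its bootstrap standard error
z, chi2, p = wald_one_restriction(0.4932373, 0.4452343)
print(f"z = {z:.2f}, chi2(1) = {chi2:.2f}, p = {p:.4f}")
```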

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated units; x-axis: propensity score (0 to 1); y-axis: density; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): −4.79

Population ATT (computed from fully stratified data): −3.96

There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: outcome by propensity score for untreated and treated units. Right panel: treatment effect by propensity score]

[Figure: Results: Variance. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

[Figure: Results: Bias reduction (in percent). Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

[Figure: Results: Mean squared error. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]
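The three simulation summaries (variance, bias, mean squared error) are linked by the identity MSE = bias² + variance. The Python sketch below computes them from a set of replicated estimates. It is a generic illustration; the simulation code of the talk is not shown in the source. The estimates in the example are hypothetical, and the true value is the population ATT read as −3.96 (the decimal point is an assumption about the extracted text).

```python
from statistics import fmean, pvariance


def monte_carlo_summary(estimates, truth):
    """Return (bias, variance, mse) of replicated estimates around `truth`.

    The variance uses the population formula (divisor n), so that
    mse == bias**2 + variance holds exactly up to rounding error.
    """
    bias = fmean(estimates) - truth
    variance = pvariance(estimates)
    mse = fmean([(e - truth) ** 2 for e in estimates])
    return bias, variance, mse


# Hypothetical estimates from four replications
bias, variance, mse = monte_carlo_summary([-4.2, -3.9, -4.0, -3.7], truth=-3.96)
print(f"bias = {bias:.4f}, variance = {variance:.4f}, MSE = {mse:.4f}")
```

An estimator with a slightly larger bias can therefore still have the smaller mean squared error if its variance is sufficiently lower.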

[Figure: Results: Relative standard error. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]

[Figure: Results: Coverage of 95% CIs. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
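For readers unfamiliar with bootstrap standard errors, the basic procedure can be sketched as follows. This Python example is a generic illustration with a simple difference-in-means estimator standing in for a matching estimator. It is not how kmatch or Stata's bootstrap prefix is implemented. Resampling separately within the treated and control groups is an assumption made here for simplicity, and the data, function names, and seed are hypothetical.

```python
import random
from statistics import fmean, stdev


def diff_in_means(treated, control):
    """Stand-in estimator; a matching estimator would be plugged in here."""
    return fmean(treated) - fmean(control)


def bootstrap_se(treated, control, estimator=diff_in_means, reps=500, seed=42):
    """Bootstrap standard error: the standard deviation of the estimator
    across resamples drawn with replacement from the observed data."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        treated_star = rng.choices(treated, k=len(treated))
        control_star = rng.choices(control, k=len(control))
        draws.append(estimator(treated_star, control_star))
    return stdev(draws)


# Hypothetical outcome data
treated = [5 + (i % 7) for i in range(40)]
control = [3 + (i % 5) for i in range(60)]
print(f"bootstrap SE = {bootstrap_se(treated, control):.3f}")
```

The bootstrap relies on the estimator varying smoothly with the data. An estimator that picks a fixed number of nearest neighbors does not, which is one common explanation for why bootstrap standard errors can be off for nearest-neighbor matching.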

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


[Figure (slides by King and Nielsen): Best Case: Mahalanobis Distance Matching. Scatter plot of treated (T) and control (C) units, shown with all controls and with only the matched controls; x-axis: Education (years); y-axis: Age]

Best Case: Propensity Score Matching

[Figure: scatter plot of treated (T) and control (C) units; x-axis: Education (years), 12 to 28; y-axis: Age, 20 to 80; with an additional propensity-score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of the treated (T) units and the remaining control (C) units; x-axis: Education (years), 12 to 28; y-axis: Age, 20 to 80. Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
  - Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
  - More imbalance/variance means more model dependence and researcher discretion.
  - Because PSM approximates complete randomization, it engages in random pruning.
  - PSM Paradox (“when you do ‘better’, you do worse”):
      * When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
      * If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
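The first step of Argument 3 (random pruning increases imbalance) can be illustrated with a small simulation. This is only a sketch under assumed settings that are not from the slides: one standard-normal covariate, two groups of 200 units drawn from the same distribution, and random pruning down to 20 units per group.

```python
import random
import statistics


def imbalance_before_and_after(n_full=200, n_keep=20, n_reps=2000, seed=1):
    """Mean absolute difference in covariate means between two groups,
    before and after randomly pruning each group from n_full to n_keep units."""
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(n_reps):
        # Both groups come from the same N(0, 1) distribution, so any
        # difference in means is pure sampling noise.
        treated = [rng.gauss(0, 1) for _ in range(n_full)]
        control = [rng.gauss(0, 1) for _ in range(n_full)]
        before.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
        # Random pruning: keep a random subset of each group.
        treated_kept = rng.sample(treated, n_keep)
        control_kept = rng.sample(control, n_keep)
        after.append(abs(statistics.fmean(treated_kept) - statistics.fmean(control_kept)))
    return statistics.fmean(before), statistics.fmean(after)


imbalance_full, imbalance_pruned = imbalance_before_and_after()
print(f"mean |difference| with 200 units per group: {imbalance_full:.3f}")
print(f"mean |difference| with  20 units per group: {imbalance_pruned:.3f}")
```

Nothing systematic changes between the two groups here; the larger average difference after pruning comes only from the smaller sample size.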

PSM Increases Model Dependence & Bias

[Figure, two panels, each comparing MDM and PSM. Left panel, "Model Dependence": variance (y-axis) against number of units pruned (x-axis, 0 to 160). Right panel, "Bias": maximum coefficient across 512 specifications (y-axis) against number of units pruned (x-axis, 0 to 160); true effect = 2.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, with $\epsilon_i \sim N(0, 1)$

(Slide 20/23 of the slides by King and Nielsen.)

The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP 2012), x-axis 0 to 3000, and Nielsen et al. (AJPS 2011), x-axis 0 to 2500. Each panel plots imbalance (y-axis) against number of units pruned (x-axis) for CEM, MDM, PSM, and random pruning, with the raw data and a 1/4 SD caliper marked.]

Similar pattern for > 20 other real data sets we checked.

(Slide 21/23 of the slides by King and Nielsen.)

5 Are King and Nielsen right?

Are King and Nielsen right?

Argument 1
  - Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
  - Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
  - PSM approximates complete randomization.
  - Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

Are King and Nielsen right? (Argument 2, continued)

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
  - If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
  - If the X variables have a strong effect on T, there is lots of blocking.
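The two limiting cases can be made concrete with a toy calculation. The sketch below uses made-up data (three strata of a single discrete covariate; none of the numbers come from the slides) and computes the propensity score as the treated share within each stratum.

```python
from collections import defaultdict


def stratified_propensity(rows):
    """rows: iterable of (x, t) pairs with discrete x and binary t.
    Returns {x: share of treated units among all units with that x}."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, t in rows:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}


# Hypothetical data, case 1: X unrelated to T (same treated share everywhere).
unrelated = [(x, t) for x in ("low", "mid", "high") for t in (1, 0, 0, 0)]

# Hypothetical data, case 2: X strongly related to T.
related = ([("low", 0)] * 9 + [("low", 1)] * 1 +
           [("mid", 0)] * 5 + [("mid", 1)] * 5 +
           [("high", 0)] * 1 + [("high", 1)] * 9)

ps_unrelated = stratified_propensity(unrelated)
ps_related = stratified_propensity(related)
print(ps_unrelated)  # a single common score: matching on it cannot block on X
print(ps_related)    # distinct scores per stratum: matching on it blocks on X
```

In the first case every unit has the same score, so matching on the score is equivalent to complete randomization; in the second case matching on the score separates the strata just as blocking on X would.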

Are King and Nielsen right?

Argument 3
  - Random pruning ⇒ imbalance ⇒ more model dependence
  - PSM ⇒ complete randomization ⇒ lots of random pruning
  - PSM Paradox: “when you do ‘better’, you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

Are King and Nielsen right? (Argument 3, continued)

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
  - Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
  - Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
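The difference between an algorithm that throws away good matches and one that averages over them can be sketched as follows. This is an illustrative toy implementation, not kmatch's code: the data, the bandwidth of 0.05, and the function names are all assumptions, and the pair matching shown is a simple greedy variant.

```python
def epanechnikov(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def kernel_att(treated, controls, bandwidth):
    """ATT by kernel matching on a scalar score.
    treated, controls: lists of (score, outcome) pairs.
    Returns (att, number of matched treated units, number of controls used)."""
    effects, used = [], set()
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        total = sum(weights)
        if total == 0:
            continue  # no control within the bandwidth: unit stays unmatched
        # Counterfactual outcome: kernel-weighted average over nearby controls.
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
        used.update(j for j, w in enumerate(weights) if w > 0)
    return sum(effects) / len(effects), len(effects), len(used)


def pair_att(treated, controls):
    """ATT by greedy one-to-one nearest-neighbor matching without replacement.
    Assumes there are at least as many controls as treated units."""
    available = list(range(len(controls)))
    effects = []
    for score_t, y_t in treated:
        j = min(available, key=lambda k: abs(controls[k][0] - score_t))
        available.remove(j)  # each control can be used only once
        effects.append(y_t - controls[j][1])
    return sum(effects) / len(effects), len(effects), len(effects)


# Hypothetical (propensity score, outcome) pairs.
treated = [(0.30, 5.0), (0.50, 6.0)]
controls = [(0.28, 4.0), (0.31, 4.4), (0.33, 3.6),
            (0.49, 5.0), (0.52, 5.2), (0.90, 9.0)]

att_kernel, matched_kernel, used_kernel = kernel_att(treated, controls, bandwidth=0.05)
att_pair, matched_pair, used_pair = pair_att(treated, controls)
print(f"kernel matching: ATT = {att_kernel:.3f}, controls used = {used_kernel}")
print(f"pair matching:   ATT = {att_pair:.3f}, controls used = {used_pair}")
```

With these numbers the kernel estimator draws on five of the six controls, whereas pair matching uses only two and discards three equally close ones; the distant control (score 0.90) is pruned by both.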

Are King and Nielsen right? (Argument 3, continued)

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

6 Illustration using kmatch

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
  - Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
  - Optional exact matching
  - Optional regression-adjustment bias correction
  - Kernel matching, ridge matching, or nearest-neighbor matching
  - Automatic bandwidth selection for kernel/ridge matching
  - Flexible specification of scaling matrix for MDM
  - Joint analysis of multiple subgroups and multiple outcome variables
  - Various post-estimation commands for balancing and common-support diagnostics
  - Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry:

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   432     25     457 |  1105     291   1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .6059013
        NATE |  1.432913

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw              |        Matched (ATT)
       Means |  Treated  Untrea~d    StdDif |  Treated  Untrea~d    StdDif
-------------+------------------------------+------------------------------
    collgrad |  .321663   .224212   .219912 |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584 |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735 |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246 |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425 |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944 |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129 |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657 |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785 |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669 |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551 |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073 |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707 |   .12037    .12037         0
      2.race |  .330416   .244986   .189418 |    .3125     .3125         0
      3.race |  .017505   .011461   .050566 |  .006944   .006944         0
       south |  .297593   .466332  -.352408 |  .291667   .291667         0

             |             Raw              |        Matched (ATT)
   Variances |  Treated  Untrea~d     Ratio |  Treated  Untrea~d     Ratio
-------------+------------------------------+------------------------------
    collgrad |  .218674   .174066   1.25628 |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459 |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706 |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928 |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052 |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656 |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496 |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287 |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769 |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445 |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039 |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917 |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443 |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787 |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025 |  .006912   .006912         1
       south |   .20949   .249045   .841173 |  .207077   .207077         1
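As a worked example of how to read the balancing table, the sketch below recomputes the raw standardized difference and variance ratio for collgrad from the reported means and variances. The formula used for the standardized difference (mean difference divided by the square root of the average of the two raw variances) is an assumption inferred from the fact that it reproduces the reported values; it is not quoted from the kmatch documentation.

```python
import math


def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Mean difference scaled by the square root of the average of the
    treated and untreated variances (assumed definition, see lead-in)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)


def variance_ratio(var_t, var_c):
    """Variance among the treated divided by variance among the untreated."""
    return var_t / var_c


# Raw (unmatched) means and variances of collgrad from the table above.
std_dif = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(f"StdDif = {std_dif:.4f}, Ratio = {ratio:.4f}")
```

Up to rounding of the inputs, this agrees with the StdDif of .219912 and the Ratio of 1.25628 shown in the Raw columns.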

Some Examples

Make a graph of the balancing statistics:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south) and two panels: "Std. mean difference" (x-axis from -.4 to .4, reference line at 0) and "Variance ratio" (x-axis from 0 to 4, reference line at 1). Series: Raw, Matched.]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   431     26     457 |  1214     182   1396 | .00188

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .3887224
        NATE |  1.432913

Some Examples

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel densities of the propensity score (x-axis, 0 to .8; y-axis: Density, 0 to 3) for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distributions of the propensity score (x-axis, 0 to .8; y-axis: Cumulative probability, 0 to 1) for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score (y-axis, 0 to .8) for untreated and treated units, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   432     25     457 |  1105     291   1396 | 1.3394
 Untreated |  1386     10    1396 |   455       2    457 | 3.3975
  Combined |  1818     35    1853 |  1560     293   1853 |

Treatment-effects estimation

             |  Observed   Bootstrap                        Normal-based
        wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         ATE |  .4095729   .1920853     2.13   0.033    .0330928   .7860531
         ATT |  .6059013   .2472069     2.45   0.014    .1213846   1.090418
         ATC |  .3483797   .1893653     1.84   0.066   -.0227695   .7195289
        NATE |  1.432913   .2333282     6.14   0.000    .9755981   1.890228

Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         (1) | -.8270117   .1810415    -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
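For readers who want to see where the z statistic and the confidence interval in the lincom output come from, here is a short worked computation. It assumes a standard normal reference distribution, as the label "Normal-based" in the bootstrap output suggests; the function name is hypothetical.

```python
import math
import statistics


def wald_summary(coef, se, level=0.95):
    """z statistic, two-sided normal p-value, and normal-based confidence
    interval for an estimate with a given standard error."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    crit = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return z, p_value, (coef - crit * se, coef + crit * se)


# Coefficient and bootstrap standard error of ATT - NATE from the output above.
z, p_value, (ci_low, ci_high) = wald_summary(-0.8270117, 0.1810415)
print(f"z = {z:.2f}, p = {p_value:.4f}, 95% CI = [{ci_low:.6f}, {ci_high:.6f}]")
```

The result agrees with the reported z of -4.57 and the interval from -1.181847 to -.4721768.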

Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   457      0     457 |   328    1068   1396 |

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                     |            AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
---------------------+---------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |  .7246969   .2942952     2.46   0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   457      0     457 |   870     526   1396 |

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |            AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
---------------------+---------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |  .5590823   .2381752     2.35   0.019    .0922675   1.025897

Some Examples

Bias correction (regression adjustment):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   457      0     457 |   870     526   1396 |

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |            AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
---------------------+---------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |  .5288023   .2420635     2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .6408443

Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   432     25     457 |  1103     293   1396 | 1.3013

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   432     25     457 |  1105     291   1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   448      9     457 |  1184     212   1396 | 1.8888

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE (y-axis) against bandwidth (x-axis, 1.5 to 3); each evaluated bandwidth is labeled with its search index.]

Some Examples

Bandwidth selection: cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   453      4     457 |  1289     107   1396 |  2.433

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE (y-axis) against bandwidth (x-axis, 1.5 to 3); each evaluated bandwidth is labeled with its search index.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   455      2     457 |  1356      40   1396 | 2.7626

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE (y-axis) against bandwidth (x-axis, 1 to 5); each evaluated bandwidth is labeled with its search index.]

Some Examples

Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   366     91     457 |   701     695   1396 |     .5

Treatment-effects estimation

        wage |     Coef.
-------------+-----------
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |   Common support (treated)   |   Standardized difference
       Means |  Matched  Unmatc~d     Total |  (1)-(3)   (2)-(3)   (1)-(2)
-------------+------------------------------+------------------------------
    collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
      tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
  3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
  4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
  5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
  6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
  7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
  8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
  9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
 10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
 11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
 12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
      2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
      3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
       south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

             |   Common support (treated)   |            Ratio
   Variances |  Matched  Unmatc~d     Total |  (1)/(3)   (2)/(3)   (1)/(2)
-------------+------------------------------+------------------------------
    collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
      tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
  3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
  4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
  5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
  6.industry |  .054233         0   .043936 |  1.23435         0         .
  7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
  8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
  9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
 10.industry |        0   .021734   .004367 |        0   4.97709         0
 11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
 12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
      2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
      3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
       south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); x-axis from -.2 to .2, reference line at 0.]

Some Examples

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   432     25     457 |  1104     291   1395 | 1.3392

Treatment-effects estimation

             |     Coef.
-------------+-----------
wage         |
         ATT |  .6021049
        NATE |  1.430823
-------------+-----------
hours        |
         ATT |  1.263759
        NATE |  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   432     25     457 |  1104     291   1395 | 1.3392

Treatment-effects estimation

             |     Coef.
-------------+-----------
wage         |
         ATT |  .5152752
        NATE |  1.430823
-------------+-----------
hours        |
         ATT |  1.263759
        NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
0  Treated |   306     15     321 |   625     120    745 | 1.3199
1  Treated |   126     10     136 |   473     178    651 | 1.3398

Treatment-effects estimation

             |  Observed   Bootstrap                        Normal-based
        wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
0        ATT |  .4586332   .2763358     1.66   0.097    -.082975   1.000241
1        ATT |  .9518705    .406903     2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         (1) |  .4932373   .4452343     1.11   0.268    -.379406   1.365881

Simulation

  • Population data from the Swiss census of 2000.
  • Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).
  • Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
  • Control variables: gender, age, and highest educational degree.
  • Population restricted to people between 24 and 60 years old who are working.
  • 2'308'006 individuals, of which 17.5% belong to the treatment group.
  • Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

  • Substantial differences between resident aliens and Swiss nationals on all three covariates.
  • Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis, 0 to 1; y-axis: Density, 0 to 7) for untreated and treated units. McFadden R² = 0.121.]

Simulation

  • Raw mean difference in occupational prestige (NATE): -4.79
  • Population ATT (computed from fully stratified data): -3.96
  • There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels plotted against the propensity score (x-axis, 0 to .8). Left panel: outcome (y-axis, 30 to 55) for untreated and treated units. Right panel: treatment effect (y-axis, -6 to -1).]
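The reported population effects can be cross-checked with the identity ATE = P(D = 1) · ATT + P(D = 0) · ATC. The short computation below assumes a treated share of 17.5% (the share of resident aliens in the population) and uses the rounded ATT and ATC from the slide, so it can only match the reported ATE up to rounding.

```python
# Population quantities from the simulation setup (rounded values).
share_treated = 0.175   # assumed share of the treatment group
att = -3.96             # average treatment effect on the treated
atc = -3.41             # average treatment effect on the untreated

# The ATE is the share-weighted average of ATT and ATC.
ate = share_treated * att + (1 - share_treated) * atc
print(f"implied ATE = {ate:.2f}")
```

The implied value is consistent with the ATE of -3.51 given on the slide.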

Results: Variance

[Figure: variance of the estimators in two panels (N = 500 and N = 5000), with one row per matching algorithm. Nearest-neighbor matching: 1 neighbor; 5 neighbors. Kernel matching: fixed bandwidth; pair-matching bandwidth; cross-validation with respect to X; cross-validation with respect to Y; weighted CV with respect to Y. Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]


In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, with bias correction.]


Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, with bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, with bias correction.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Best Case: Mahalanobis Distance Matching

[Figure: scatter plot of treated (T) and control (C) units by education (years, x-axis, 12 to 28) and age (y-axis, 20 to 80). Slide 9/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure: scatter plot of treated (T) and control (C) units by education (years, x-axis, 12 to 28) and age (y-axis, 20 to 80), together with a propensity score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]


Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of treated (T) and matched control (C) units by education (years, x-axis, 12 to 28) and age (y-axis, 20 to 80). Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
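The first point of Argument 3 can be illustrated with a small simulation. The Python sketch below is an illustration under simple assumptions, not a replication of King and Nielsen's analysis: treated and control units share the same covariate distribution, so the data are balanced in expectation, and random pruning is represented by drawing smaller samples. The function name, the sample sizes, and the seed are hypothetical choices.

```python
import random
import statistics


def mean_abs_imbalance(n_per_group: int, reps: int, rng: random.Random) -> float:
    """Average absolute difference in covariate means between two groups.

    Both groups are drawn from the same N(0, 1) distribution, so any
    difference in means is pure sampling noise.
    """
    diffs = []
    for _ in range(reps):
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diffs.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
    return statistics.fmean(diffs)


rng = random.Random(12345)
# A random subsample of a random sample is itself a random sample, so random
# pruning is represented here by simply drawing fewer units per group.
imbalance_full = mean_abs_imbalance(n_per_group=200, reps=500, rng=rng)
imbalance_pruned = mean_abs_imbalance(n_per_group=20, reps=500, rng=rng)
print(imbalance_full, imbalance_pruned)
```

With fewer units per group, the typical difference in means is larger, which is the sense in which random pruning increases imbalance.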

PSM Increases Model Dependence & Bias

[Figure, left panel "Model Dependence": variance by number of units pruned (0 to 160) for MDM and PSM. Right panel "Bias": maximum coefficient across 512 specifications by number of units pruned (0 to 160) for MDM and PSM; true effect = 2.]

$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

(Slide 20/23 of the slides by King and Nielsen.)

The Propensity Score Paradox in Real Data

[Figure, left panel: Finkel et al. (JOP 2012); right panel: Nielsen et al. (AJPS 2011). Each panel shows imbalance by number of units pruned for CEM, MDM, and PSM, with markers for Raw, Random, and the 1/4 SD caliper.]

Similar pattern for > 20 other real data sets we checked.

(Slide 21/23 of the slides by King and Nielsen.)


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization is, given the sample size, of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is, of course, true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
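To make "averaging instead of pruning" concrete, here is a minimal Python sketch of kernel matching on a scalar score with an Epanechnikov kernel. It is an illustration under simplifying assumptions (one-dimensional score, fixed bandwidth, toy data) and not the implementation used by kmatch; all names and numbers are hypothetical.

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """ATT by kernel matching on a scalar score.

    treated, controls: lists of (score, outcome) pairs.
    Each treated unit is compared with the kernel-weighted mean outcome of all
    controls within the bandwidth; treated units without any control in range
    are dropped (no common support).
    Returns (att, number_of_matched_treated).
    """
    effects = []
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # unmatched treated unit
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


# Toy data: (propensity score, outcome)
treated = [(0.30, 12.0), (0.50, 15.0), (0.95, 20.0)]
controls = [(0.28, 10.0), (0.32, 10.0), (0.48, 12.0), (0.52, 14.0)]
att, n_matched = kernel_matching_att(treated, controls, bandwidth=0.05)
print(att, n_matched)
```

In the toy data, every control within the bandwidth contributes to the counterfactual of a treated unit, so no good match is discarded; only the treated unit at 0.95, which has no comparable control, is left unmatched.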

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      432     25    457    1105     291   1396    1.3394

Treatment-effects estimation

wage          Coef.
ATT        .6059013
NATE       1.432913
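The output above lists "mahalanobis" as the metric. As a reminder of what this distance does, the following Python sketch computes it for two covariates: differences are scaled by the inverse of a covariance (scaling) matrix, so variables with large variance do not dominate. This is a hypothetical two-dimensional illustration with made-up numbers, not code from kmatch.

```python
import math


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between two 2-dimensional points.

    cov is the 2x2 covariance (scaling) matrix ((s11, s12), (s21, s22)).
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0.0:
        raise ValueError("covariance matrix must have a positive determinant")
    d1, d2 = x[0] - y[0], x[1] - y[1]
    # Quadratic form d' * inverse(cov) * d, written out for the 2x2 case.
    q = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(q)


# Hypothetical example: the first covariate has variance 4 (sd 2), so a raw
# difference of 2 on that covariate counts as one standard deviation.
distance = mahalanobis_2d((2.0, 0.0), (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))
print(distance)
```

With an identity scaling matrix the function reduces to the ordinary Euclidean distance.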

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                        Raw                        Matched(ATT)
Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad      .321663   .224212   .219912    .319444   .319444         0
ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry    .006565   .012178  -.058246     .00463    .00463         0
4.industry    .183807   .166905   .044425    .185185   .185185         0
5.industry    .105033   .027937   .312944    .085648   .085648         0
6.industry    .045952   .169771  -.407129    .048611   .048611         0
7.industry    .019694   .102436  -.350657    .020833   .020833         0
8.industry    .017505   .035817  -.113785    .009259   .009259         0
9.industry    .010941   .040115  -.185669    .011574   .011574         0
10.industry   .004376   .008596  -.052551    .002315   .002315         0
11.industry   .479212   .356734   .250073    .506944   .506944         0
12.industry   .122538    .07235   .169707     .12037    .12037         0
2.race        .330416   .244986   .189418      .3125     .3125         0
3.race        .017505   .011461   .050566    .006944   .006944         0
south         .297593   .466332  -.352408    .291667   .291667         0

                        Raw                        Matched(ATT)
Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad      .218674   .174066   1.25628    .217904   .217904         1
ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry    .006536   .012038   .542928    .004619   .004619         1
4.industry    .150351   .139148   1.08052    .151242   .151242         1
5.industry    .094207   .027176   3.46656    .078494   .078494         1
6.industry    .043936    .14105   .311496    .046355   .046355         1
7.industry    .019348   .092008   .210287    .020447   .020447         1
8.industry    .017237   .034559   .498769    .009195   .009195         1
9.industry    .010845   .038533   .281445    .011467   .011467         1
10.industry   .004367   .008528   .512039    .002315   .002315         1
11.industry   .250115   .229639   1.08917    .250532   .250532         1
12.industry   .107758   .067163   1.60443    .106127   .106127         1
2.race        .221726     .1851   1.19787    .215342   .215342         1
3.race        .017237   .011338   1.52025    .006912   .006912         1
south          .20949   .249045   .841173    .207077   .207077         1
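The StdDif and Ratio columns appear to follow the common definitions of the standardized mean difference (difference in means divided by the square root of the average of the two group variances) and the variance ratio. This is an assumption inferred from the reported numbers, not something stated on the slide; the Python sketch below checks it for the raw collgrad row, with decimal points restored.

```python
import math


def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means divided by a pooled standard deviation,
    using the simple average of the two group variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Ratio of the treated-group variance to the control-group variance."""
    return var_t / var_c


# Raw (unmatched) values for collgrad: means .321663 and .224212,
# variances .218674 and .174066.
std_dif = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(std_dif, ratio)
```

Both results agree with the reported values (.219912 and 1.25628) up to rounding of the inputs.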

Some Examples

Make a graph of the balancing stats:

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south). Left panel: standardized mean difference, reference line at 0. Right panel: variance ratio, reference line at 1. Legend: Raw, Matched.]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      431     26    457    1214     182   1396    .00188

Treatment-effects estimation

wage          Coef.
ATT        .3887224
NATE       1.432913

Some Examples

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability by propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  =    50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      432     25    457    1105     291   1396    1.3394
Untreated   1386     10   1396     455       2    457    3.3975
Combined    1818     35   1853    1560     293   1853

Treatment-effects estimation

          Observed   Bootstrap                     Normal-based
wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE       .4095729    .1920853   2.13   0.033    .0330928   .7860531
ATT       .6059013    .2472069   2.45   0.014    .1213846   1.090418
ATC       .3483797    .1893653   1.84   0.066   -.0227695   .7195289
NATE      1.432913    .2333282   6.14   0.000    .9755981   1.890228
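The z statistics, p-values, and normal-based confidence intervals in this table follow directly from each coefficient and its bootstrap standard error. A short Python sketch (illustration only, using the standard library; the function name is hypothetical) reproduces the ATT row, with decimal points restored.

```python
from statistics import NormalDist


def normal_based_inference(coef, std_err, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / std_err
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p_value, (coef - crit * std_err, coef + crit * std_err)


# ATT (.6059013) and its bootstrap standard error (.2472069).
z, p, (lo, hi) = normal_based_inference(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), lo, hi)
```

The bootstrap only supplies the standard error here; the interval itself relies on the normal approximation.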

Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)      -.8270117    .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     1
Treatment   : union = 1                               max =     1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      457      0    457     328    1068   1396

Treatment-effects estimation

wage          Coef.
ATT        .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested =     1
Outcome model  : matching                                 min =     1
Distance metric: Mahalanobis                              max =     1

                                   AI Robust
wage                      Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .7246969     .2942952   2.46   0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     5
Treatment   : union = 1                               max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      457      0    457     870     526   1396

Treatment-effects estimation

wage          Coef.
ATT        .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested =     5
Outcome model  : matching                                 min =     5
Distance metric: Mahalanobis                              max =     6

                                   AI Robust
wage                      Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5590823     .2381752   2.35   0.019    .0922675   1.025897
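The logic of nearest-neighbor matching with replacement can be sketched in a few lines of Python. The version below keeps all controls tied at the k-th smallest distance, so the number of neighbors actually used can exceed k. It is a simplified illustration on a scalar score with hypothetical toy data; it makes no claim about how kmatch or teffects nnmatch handle distances or ties internally.

```python
def nn_matching_att(treated, controls, k=1):
    """ATT by k-nearest-neighbor matching with replacement on a scalar score.

    treated, controls: lists of (score, outcome) pairs.
    All controls tied at the k-th smallest distance are kept, so the number of
    neighbors used for a treated unit can exceed k.
    """
    effects = []
    for score_t, y_t in treated:
        dists = sorted(abs(score_c - score_t) for score_c, _ in controls)
        cutoff = dists[min(k, len(dists)) - 1]
        matched = [y_c for score_c, y_c in controls
                   if abs(score_c - score_t) <= cutoff]
        effects.append(y_t - sum(matched) / len(matched))
    return sum(effects) / len(effects)


# Toy data: the two closest controls are equally distant from the treated unit,
# so both are used even though a single neighbor was requested.
toy_treated = [(5, 10.0)]
toy_controls = [(4, 7.0), (6, 9.0), (9, 0.0)]
att_one_neighbor = nn_matching_att(toy_treated, toy_controls, k=1)
print(att_one_neighbor)
```

Because controls are reused across treated units and ties are averaged rather than broken at random, no equally good match is thrown away.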

Some Examples

Bias-correction (regression adjustment):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     5
Treatment   : union = 1                               max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      457      0    457     870     526   1396

Treatment-effects estimation

wage          Coef.
ATT        .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested =     5
Outcome model  : matching                                 min =     5
Distance metric: Mahalanobis                              max =     6

                                   AI Robust
wage                      Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5288023     .2420635   2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      439     18    457    1258     138   1396    .83886

Treatment-effects estimation

wage          Coef.
ATT        .6408443

Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      432     25    457    1103     293   1396    1.3013

Treatment-effects estimation

wage          Coef.
ATT        .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      432     25    457    1105     291   1396    1.3394

Treatment-effects estimation

wage          Coef.
ATT        .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      448      9    457    1184     212   1396    1.8888

Treatment-effects estimation

wage          Coef.
ATT        .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE by bandwidth (about 1.5 to 3), with points labeled by index.]

Some Examples

Bandwidth selection: cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls           Band-
             Yes     No  Total    Used  Unused  Total     width
Treated      453      4    457    1289     107   1396     2.433

Treatment-effects estimation

wage          Coef.
ATT        .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE by bandwidth (about 1.5 to 3), with points labeled by index.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |     Controls          |  Band-
           |   Yes    No   Total  |   Used  Unused  Total |  width
   Treated |   455     2     457  |   1356      40   1396 | 2.7626

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5); markers are labeled with the index of the evaluation step]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |     Controls          |  Band-
           |   Yes    No   Total  |   Used  Unused  Total |  width
   Treated |   366    91     457  |    701     695   1396 |     .5

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                         Standardized difference
       Means |  Matched  Unmatc~d     Total |  (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
      tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
  3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
  4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
  5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
  6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
  7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
  8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
  9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
 10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
 11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
 12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
      2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
      3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
       south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

Common support (treated)                         Ratio
   Variances |  Matched  Unmatc~d     Total |  (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
      tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
  3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
  4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
  5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
  6.industry |  .054233         0   .043936 |  1.23435         0         .
  7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
  8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
  9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
 10.industry |        0   .021734   .004367 |        0   4.97709         0
 11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
 12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
      2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
      3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
       south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences (x-axis from -2 to 2, reference line at 0) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |     Controls          |  Band-
           |   Yes    No   Total  |   Used  Unused  Total |  width
   Treated |   432    25     457  |   1104     291   1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .6021049
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |     Controls          |  Band-
           |   Yes    No   Total  |   Used  Unused  Total |  width
   Treated |   432    25     457  |   1104     291   1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .5152752
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |     Matched          |     Controls          |  Band-
           |   Yes    No   Total  |   Used  Unused  Total |  width
0  Treated |   306    15     321  |    625     120    745 | 1.3199
1  Treated |   126    10     136  |    473     178    651 | 1.3398

Treatment-effects estimation

           |  Observed  Bootstrap                      Normal-based
      wage |     Coef.  Std. Err.     z    P>|z|   [95% Conf. Interval]
0      ATT |  .4586332   .2763358   1.66   0.097    -.082975   1.000241
1      ATT |  .9518705    .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =    1.23
     Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage |     Coef.  Std. Err.     z    P>|z|   [95% Conf. Interval]
       (1) |  .4932373   .4452343   1.11   0.268    -.379406   1.365881
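The `test` and `lincom` results describe the same comparison in two ways. As a rough check, the following Python sketch recomputes the z statistic, the Wald chi-squared (which equals z squared for a single restriction), the two-sided p-value, and the normal-based confidence interval from the reported coefficient and bootstrap standard error. The function name `wald_summary` is hypothetical, and the inputs are the printed values read with their decimal points (0.4932373 and 0.4452343).

```python
from math import erfc, sqrt
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """z, chi2 (= z squared), two-sided p-value, and normal-based CI."""
    z = coef / se
    p = erfc(abs(z) / sqrt(2))                    # two-sided p-value under N(0, 1)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return {"z": z, "chi2": z * z, "p": p,
            "ci": (coef - crit * se, coef + crit * se)}


# [1]ATT - [0]ATT and its bootstrap standard error, taken from the lincom output
res = wald_summary(0.4932373, 0.4452343)
print(res)
```

Because both commands rely on the same bootstrap covariance matrix, the chi-squared test and the z test lead to the same p-value.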

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; McFadden R2 = 0.121]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels, both against the propensity score: mean outcome for untreated and treated; treatment effect]

Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent (roughly 70 to 135) for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
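The quantities shown in the results slides (bias, variance, mean squared error, coverage) can be computed from simulation draws roughly as follows. This is a minimal sketch with made-up numbers: the function name `summarize_simulation` and the four toy estimates are hypothetical, and only the "true" value of -3.96 is taken from the population ATT reported for the simulation.

```python
from statistics import mean, pvariance


def summarize_simulation(estimates, std_errors, truth, crit=1.96):
    """Bias, variance, MSE, and CI coverage across simulation replications."""
    bias = mean(estimates) - truth
    variance = pvariance(estimates)                      # variance across replications
    mse = mean([(e - truth) ** 2 for e in estimates])    # equals bias**2 + variance
    covered = [abs(e - truth) <= crit * s for e, s in zip(estimates, std_errors)]
    return {"bias": bias, "variance": variance, "mse": mse,
            "coverage": sum(covered) / len(covered)}


# Toy replications (hypothetical values), evaluated against the population ATT
out = summarize_simulation([-4.0, -3.8, -4.2, -3.9], [0.1, 0.1, 0.1, 0.1], truth=-3.96)
print(out)
```

The identity MSE = bias squared + variance is why an estimator with a small bias but a much smaller variance can still come out ahead on the mean-squared-error slide.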


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.

Best Case: Propensity Score Matching

[Figure (slide 15/23 by King and Nielsen): scatter plot of control (C) and treated (T) units by education (years, 12 to 28) and age (20 to 80), with the units also shown along a propensity score axis from 0 to 1]

Best Case: Propensity Score Matching is Suboptimal

[Figure (slide 15/23 by King and Nielsen): the same axes, showing the treated (T) units and a reduced set of control (C) units]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
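The first point of the argument (random pruning increases imbalance because the sample shrinks) can be illustrated with a small simulation. The sketch below is an illustration under simple assumptions, not a replication of King and Nielsen's analysis: one standard-normal covariate, no true difference between the groups, and hypothetical names and sample sizes.

```python
import random


def mean_abs_imbalance(n_keep, n_full=200, reps=500, seed=12345):
    """Average absolute difference in covariate means after random pruning.

    Both groups are drawn from the same N(0, 1) distribution, so any
    imbalance is pure sampling noise; n_keep units per group are retained.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_full)]
        control = [rng.gauss(0, 1) for _ in range(n_full)]
        kept_t = rng.sample(treated, n_keep)   # random pruning of treated units
        kept_c = rng.sample(control, n_keep)   # random pruning of control units
        total += abs(sum(kept_t) / n_keep - sum(kept_c) / n_keep)
    return total / reps


full = mean_abs_imbalance(200)    # nothing pruned
pruned = mean_abs_imbalance(20)   # 90 percent of each group pruned at random
print(full, pruned)
```

With fewer units kept, the difference in means fluctuates more from sample to sample, so large imbalances become more likely even though nothing systematic has changed.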

PSM Increases Model Dependence & Bias

[Figure (slide 20/23 by King and Nielsen), two panels comparing MDM and PSM against the number of units pruned (0 to 160). Left panel, "Model Dependence": variance. Right panel, "Bias": maximum coefficient across 512 specifications, with the true effect = 2 marked.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

The Propensity Score Paradox in Real Data

[Figure (slide 21/23 by King and Nielsen), two panels showing imbalance against the number of units pruned for CEM, MDM, and PSM, with markers for the raw data, random pruning, and the 1/4 SD caliper. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011).]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
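How kernel matching "averages where pruning is not necessary" can be sketched as follows. This is a simplified illustration on a one-dimensional score and not kmatch's implementation; the function names, the toy data, and the bandwidth are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive weight only for |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """ATT from kernel matching on a scalar score.

    treated, controls: lists of (score, outcome) pairs.
    Returns the ATT over matched treated units and the number of
    treated units without any control inside the bandwidth.
    """
    effects, unmatched = [], 0
    for s_t, y_t in treated:
        weights = [epanechnikov((s_c - s_t) / bandwidth) for s_c, _ in controls]
        total = sum(weights)
        if total == 0:            # no control within the bandwidth: unit is dropped
            unmatched += 1
            continue
        # counterfactual outcome: kernel-weighted average of all nearby controls
        y0_hat = sum(w * y for w, (_, y) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, unmatched


att, unmatched = kernel_matching_att(
    treated=[(0.5, 10.0), (0.9, 12.0)],
    controls=[(0.4, 8.0), (0.6, 9.0), (0.1, 5.0)],
    bandwidth=0.2,
)
print(att, unmatched)
```

In contrast to one-to-one matching without replacement, every control inside the bandwidth contributes to the counterfactual, so equally good matches are not discarded at random; only treated units with no nearby control are left unmatched.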

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |     Controls          |  Band-
           |   Yes    No   Total  |   Used  Unused  Total |  width
   Treated |   432    25     457  |   1105     291   1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013
      NATE |  1.432913
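The Mahalanobis metric used here scales differences by the covariance of the covariates, so that variables with large variance do not dominate the distance. A minimal sketch for two covariates is given below; it is written out by hand for the 2x2 case to stay dependency-free, and the function names and toy numbers are hypothetical rather than taken from kmatch.

```python
from math import sqrt


def covariance_2d(points):
    """Sample variances and covariance (sxx, sxy, syy) of 2-d points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance sqrt((a-b)' S^-1 (a-b)) for a 2x2 covariance S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = a[0] - b[0], a[1] - b[1]
    # S^-1 = [[syy, -sxy], [-sxy, sxx]] / det
    return sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


cov = (4.0, 0.0, 1.0)   # variance 4 for the first covariate, 1 for the second
d_first = mahalanobis_2d((2.0, 0.0), (0.0, 0.0), cov)
d_second = mahalanobis_2d((0.0, 1.0), (0.0, 0.0), cov)
print(d_first, d_second)
```

In this toy example a difference of 2 units on the high-variance covariate counts exactly as much as a difference of 1 unit on the low-variance covariate.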

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw               |        Matched (ATT)
       Means |  Treated  Untrea~d    StdDif  |  Treated  Untrea~d    StdDif
    collgrad |  .321663   .224212   .219912  |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584  |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735  |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246  |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425  |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944  |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129  |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657  |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785  |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669  |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551  |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073  |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707  |   .12037    .12037         0
      2.race |  .330416   .244986   .189418  |    .3125     .3125         0
      3.race |  .017505   .011461   .050566  |  .006944   .006944         0
       south |  .297593   .466332  -.352408  |  .291667   .291667         0

             |             Raw               |        Matched (ATT)
   Variances |  Treated  Untrea~d     Ratio  |  Treated  Untrea~d     Ratio
    collgrad |  .218674   .174066   1.25628  |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459  |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706  |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928  |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052  |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656  |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496  |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287  |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769  |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445  |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039  |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917  |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443  |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787  |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025  |  .006912   .006912         1
       south |   .20949   .249045   .841173  |  .207077   .207077         1
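The balancing statistics can be recomputed by hand. The sketch below assumes that the standardized difference divides the difference in means by the square root of the average of the two raw group variances; this formula is an assumption that happens to reproduce the raw collgrad row when its entries are read with decimal points (means 0.321663 and 0.224212, variances 0.218674 and 0.174066), and the function names are hypothetical.

```python
from math import sqrt


def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means divided by the pooled standard deviation."""
    return (mean_t - mean_c) / sqrt((var_t + var_c) / 2)


def variance_ratio(var_t, var_c):
    """Ratio of the treated variance to the untreated variance."""
    return var_t / var_c


# Raw (unmatched) values for collgrad, as read from the summarize output
std_dif = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(round(std_dif, 6), round(ratio, 5))
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is what the matched columns show for most covariates.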

Some Examples

Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure, two panels comparing raw and matched samples for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south. Left panel: standardized mean difference (x-axis from -.4 to .4, reference line at 0). Right panel: variance ratio (x-axis from 0 to 4, reference line at 1).]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics

           |     Matched          |     Controls          |  Band-
           |   Yes    No   Total  |   Used  Unused  Total |  width
   Treated |   431    26     457  |   1214     182   1396 | .00188

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3887224
      NATE |  1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated, in two panels: Raw and Matched (ATT). x-axis: Propensity score; y-axis: Density.]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for Untreated and Treated, in two panels: Raw and Matched (ATT). x-axis: Propensity score; y-axis: Cumulative probability.]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated, in two panels: Raw and Matched (ATT). y-axis: Propensity score.]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching     Number of obs = 1853
                                          Replications  = 50
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    432    25    457     1105    291   1396     1.3394
  Untreated   1386    10   1396      455      2    457     3.3975
   Combined   1818    35   1853     1560    293   1853

Treatment-effects estimation

             Observed   Bootstrap                       Normal-based
       wage     Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
        ATE  .4095729   .1920853     2.13    0.033    .0330928   .7860531
        ATT  .6059013   .2472069     2.45    0.014    .1213846   1.090418
        ATC  .3483797   .1893653     1.84    0.066   -.0227695   .7195289
       NATE  1.432913   .2333282     6.14    0.000    .9755981   1.890228

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

       wage      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
        (1)  -.8270117   .1810415    -4.57    0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
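The lincom row can be checked from its own coefficient and standard error. The sketch below assumes the usual normal-based Wald statistics (z = coefficient / standard error, symmetric 95% interval); the function name wald_summary is invented for this illustration and is not part of Stata.

```python
from statistics import NormalDist

def wald_summary(coef, se, level=0.95):
    """Return z, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, (coef - crit * se, coef + crit * se)

# ATT and NATE from the bootstrap output; the SE is the one reported by lincom
difference = 0.6059013 - 1.432913
z, p, (lower, upper) = wald_summary(difference, 0.1810415)
print(f"ATT - NATE = {difference:.7f}")
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{lower:.6f}, {upper:.6f}]")
```

The standard error of the difference cannot be derived from the two separate standard errors alone, because ATT and NATE are estimated from the same bootstrap samples and are therefore correlated; that is why it is taken from the lincom output here.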

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                          Number of obs  = 1853
                                          Neighbors: min = 1
Treatment   : union = 1                              max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    457     0    457      328   1068   1396

Treatment-effects estimation

       wage       Coef.
        ATT    .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation              Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 1
Outcome model  : matching                                min = 1
Distance metric: Mahalanobis                             max = 1

                                  AI Robust
                wage     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
               union
 (union vs nonunion)  .7246969   .2942952    2.46    0.014    .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                          Number of obs  = 1853
                                          Neighbors: min = 5
Treatment   : union = 1                              max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    457     0    457      870    526   1396

Treatment-effects estimation

       wage       Coef.
        ATT    .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation              Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                min = 5
Distance metric: Mahalanobis                             max = 6

                                  AI Robust
                wage     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
               union
 (union vs nonunion)  .5590823   .2381752    2.35    0.019   .0922675   1.025897

Some Examples

Bias correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                          Number of obs  = 1853
                                          Neighbors: min = 5
Treatment   : union = 1                              max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    457     0    457      870    526   1396

Treatment-effects estimation

       wage       Coef.
        ATT    .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation              Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                min = 5
Distance metric: Mahalanobis                             max = 6

                                  AI Robust
                wage     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
               union
 (union vs nonunion)  .5288023   .2420635    2.18    0.029   .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    439    18    457     1258    138   1396     .83886

Treatment-effects estimation

       wage       Coef.
        ATT    .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    432    25    457     1103    293   1396     1.3013

Treatment-effects estimation

       wage       Coef.
        ATT    .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    432    25    457     1105    291   1396     1.3394

Treatment-effects estimation

       wage       Coef.
        ATT    .6059013

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    448     9    457     1184    212   1396     1.8888

Treatment-effects estimation

       wage       Coef.
        ATT    .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against Bandwidth (about 1.5 to 3); the evaluated bandwidths are labeled with their search index.]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    453     4    457     1289    107   1396      2.433

Treatment-effects estimation

       wage       Coef.
        ATT    .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against Bandwidth (about 1.5 to 3); the evaluated bandwidths are labeled with their search index.]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    455     2    457     1356     40   1396     2.7626

Treatment-effects estimation

       wage       Coef.
        ATT    .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of Weighted MISE against Bandwidth (about 1 to 5); the evaluated bandwidths are labeled with their search index.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    366    91    457      701    695   1396         .5

Treatment-effects estimation

       wage       Coef.
        ATT    .3303161

. kmatch csummarize
(refitting the model using the generate() option)

               Common support (treated)       Standardized difference
      Means    Matched  Unmatc~d     Total   (1)-(3)   (2)-(3)   (1)-(2)

   collgrad    .322404   .318681   .321663   .001585  -.006376   .007962
    ttl_exp    13.3929   12.7682   13.2685   .027413  -.110253   .137666
     tenure    8.12614   6.95055   7.89205   .038378  -.154356   .192734

 3.industry    .002732   .021978   .006565  -.047404   .190657  -.238061
 4.industry    .191257   .153846   .183807   .019212  -.077269   .096481
 5.industry    .062842   .274725   .105033  -.137462   .552867  -.690329
 6.industry    .057377         0   .045952   .054507  -.219225   .273732
 7.industry    .019126   .021978   .019694  -.004083   .016423  -.020506
 8.industry    .005464   .065934   .017505  -.091714   .368871  -.460585
 9.industry    .010929   .010989   .010941  -.000115   .000462  -.000577
10.industry          0   .021978   .004376  -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212    .15083  -.606636   .757467
12.industry    .092896   .241758   .122538  -.090299   .363181   -.45348

     2.race    .243169   .681319   .330416  -.185284   .745209  -.930494
     3.race    .002732   .076923   .017505  -.112525   .452572  -.565097
      south     .29235   .318681   .297593  -.011456   .046074   -.05753

               Common support (treated)                Ratio
  Variances    Matched  Unmatc~d     Total   (1)/(3)   (2)/(3)   (1)/(2)

   collgrad    .219058   .219536   .218674   1.00176   1.00394   .997824
    ttl_exp    19.4198   25.2474   20.5898   .943177   1.22621    .76918
     tenure    38.3324   31.9242   37.2044   1.03032   .858076   1.20073

 3.industry    .002732   .021734   .006536   .418045   3.32537   .125714
 4.industry    .155101   .131624   .150351   1.03159   .875443   1.17837
 5.industry    .059054   .201465   .094207   .626851   2.13854   .293122
 6.industry    .054233         0   .043936   1.23435         0         .
 7.industry    .018811   .021734   .019348   .972252    1.1233   .865531
 8.industry     .00545   .062271   .017237   .316157   3.61269   .087513
 9.industry    .010839   .010989   .010845   .999464   1.01328   .986361
10.industry          0   .021734   .004367         0   4.97709         0
11.industry    .247691    .14652   .250115   .990307   .585811   1.69049
12.industry    .084497   .185348   .107758   .784137   1.72003   .455885

     2.race    .184542   .219536   .221726   .832297   .990121   .840601
     3.race    .002732   .071795   .017237   .158513   4.16522   .038056
      south    .207448   .219536    .20949   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference", showing the standardized difference between matched treated observations and all treated observations for collgrad, ttl_exp, tenure, the industry and race indicators, and south; x-axis from about -.2 to .2.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1852
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    432    25    457     1104    291   1395     1.3392

Treatment-effects estimation

                  Coef.
wage
        ATT    .6021049
       NATE    1.430823
hours
        ATT    1.263759
       NATE    1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1852
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
    Treated    432    25    457     1104    291   1395     1.3392

Treatment-effects estimation

                  Coef.
wage
        ATT    .5152752
       NATE    1.430823
hours
        ATT    1.263759
       NATE    1.450303

wage adjusted for: collgrad ttl_exp tenure
hours adjusted for: i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching     Number of obs = 1853
                                          Replications  = 50
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                 Matched                  Controls          Band-
               Yes    No  Total     Used Unused  Total      width
0
    Treated    306    15    321      625    120    745     1.3199
1
    Treated    126    10    136      473    178    651     1.3398

Treatment-effects estimation

             Observed   Bootstrap                       Normal-based
       wage     Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
0
        ATT  .4586332   .2763358     1.66    0.097    -.082975   1.000241
1
        ATT  .9518705    .406903     2.34    0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

       wage     Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
        (1)  .4932373   .4452343     1.11    0.268    -.379406   1.365881

Simulation

- Population: data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis from 0 to 1) for Untreated and Treated. McFadden R2 = 0.121.]

Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, two panels, both plotted against the propensity score (0 to .8). Left panel: mean outcome (about 30 to 55) for Untreated and Treated. Right panel: treatment effect (about −6 to −1).]

Results: Variance

[Figure: simulated variance of each estimator, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Notes on "Results: Variance":

In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for each estimator, in two panels (N = 500, x-axis about 70 to 130; N = 5000, x-axis about 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Notes on "Results: Bias reduction (in percent)":

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
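The slides do not spell out how "bias reduction (in percent)" is computed. One common definition, assumed here, compares the bias that remains in an estimate with the bias of the raw mean difference (NATE). The sketch below uses the population NATE and ATT from the simulation setup; the function name and the example estimates are hypothetical and are not simulation results.

```python
def bias_reduction_percent(nate, att, estimate):
    """Share of the raw bias (NATE - ATT) removed by an estimator, in percent.

    Assumed definition: 100 means the estimate equals the true ATT,
    0 means it is no better than the raw difference, and values above
    100 mean the estimator overcorrects.
    """
    raw_bias = nate - att
    remaining_bias = estimate - att
    return 100 * (1 - remaining_bias / raw_bias)

nate, att = -4.79, -3.96   # population values from the simulation setup

# Hypothetical average estimates, chosen only to illustrate the scale
for estimate in (-4.79, -4.10, -3.96, -3.80):
    value = bias_reduction_percent(nate, att, estimate)
    print(f"estimate {estimate:6.2f}: bias reduction = {value:6.1f}%")
```

Read this way, the axis range of roughly 70 to 135 in the figure would mean that all estimators remove most of the raw bias and that some of them overshoot the population ATT.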

Results: Mean squared error

[Figure: mean squared error of each estimator, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error of each estimator, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for each estimator, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Best Case: Propensity Score Matching

[Figure: scatter plot of control units (C) and treated units (T) by Education (years), 12 to 28, and Age, 20 to 80, together with a propensity score scale running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: the same scatter plot of control units (C) and treated units (T) by Education (years), 12 to 28, and Age, 20 to 80, showing the units retained after matching. Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
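The first point of this argument, that random pruning increases imbalance, can be illustrated with a small simulation. The sketch below is not taken from King and Nielsen's slides. It assumes a single standard-normal covariate X that has the same distribution among treated and controls, so that all imbalance is sampling noise; the function name and all settings are made up for illustration.

```python
import random

def imbalance_after_pruning(keep_sizes, n_start=200, reps=500, seed=12345):
    """Average absolute difference in mean X between treated and controls
    after randomly pruning both groups down to each size in keep_sizes."""
    rng = random.Random(seed)
    totals = {k: 0.0 for k in keep_sizes}
    for _ in range(reps):
        # Hypothetical data: X is balanced in expectation in both groups
        treated = [rng.gauss(0, 1) for _ in range(n_start)]
        controls = [rng.gauss(0, 1) for _ in range(n_start)]
        rng.shuffle(treated)
        rng.shuffle(controls)
        for k in keep_sizes:
            # Random pruning: keep only the first k units of each shuffled group
            diff = sum(treated[:k]) / k - sum(controls[:k]) / k
            totals[k] += abs(diff)
    return {k: totals[k] / reps for k in keep_sizes}

result = imbalance_after_pruning([200, 100, 50, 20])
for k, value in result.items():
    print(f"kept {k:3d} per group: mean |difference in means| = {value:.3f}")
```

Nothing about the units changes as they are pruned; the imbalance grows only because fewer observations remain, which is the mechanism the argument relies on.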

PSM Increases Model Dependence & Bias

[Figure, two panels (slide 20/23 of the slides by King and Nielsen). Left panel, "Model Dependence": variance (about 0.00 to 0.05) plotted against the number of units pruned (0 to 160) for MDM and PSM. Right panel, "Bias": maximum coefficient across 512 specifications (about 2.0 to 4.0) plotted against the number of units pruned (0 to 160) for MDM and PSM; true effect = 2.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

The Propensity Score Paradox in Real Data

[Figure, two panels (slide 21/23 of the slides by King and Nielsen). Left panel: Finkel et al. (JOP, 2012), imbalance (0 to 10) plotted against the number of units pruned (0 to 3000). Right panel: Nielsen et al. (AJPS, 2011), imbalance (0 to 30) plotted against the number of units pruned (0 to 2500). Each panel shows CEM, MDM, PSM, and random pruning, and marks the raw data and the 1/4 SD caliper.]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic: I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
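The two cases above can be made concrete with a toy example. The sketch below assumes two binary covariates and computes the propensity score from fully stratified data, that is, as the share of treated units in each covariate cell. All counts and the function name are invented for illustration.

```python
from fractions import Fraction

def propensity_blocks(cells):
    """Group covariate cells by their propensity score.

    cells maps a covariate pattern (x1, x2) to (n_treated, n_control).
    Cells that share a score form one block: matching on the score
    cannot tell them apart.
    """
    blocks = {}
    for pattern, (n_treated, n_control) in cells.items():
        score = Fraction(n_treated, n_treated + n_control)  # exact share treated
        blocks.setdefault(score, []).append(pattern)
    return blocks

# Hypothetical cell counts (treated, control)
unrelated = {(0, 0): (10, 30), (0, 1): (10, 30), (1, 0): (10, 30), (1, 1): (10, 30)}
only_x1 = {(0, 0): (10, 30), (0, 1): (10, 30), (1, 0): (30, 10), (1, 1): (30, 10)}
both = {(0, 0): (10, 30), (0, 1): (20, 20), (1, 0): (30, 10), (1, 1): (36, 4)}

scenarios = [("X unrelated to T", unrelated),
             ("only x1 affects T", only_x1),
             ("x1 and x2 affect T", both)]
for label, cells in scenarios:
    blocks = propensity_blocks(cells)
    print(f"{label}: {len(blocks)} propensity-score block(s)")
```

With one block, matching on the score is equivalent to matching at random (complete randomization); with as many blocks as covariate cells, it coincides with exact matching on X (full blocking). The middle scenario blocks on x1 but leaves x2 to chance.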

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.


If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
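The difference between averaging and pruning can be sketched as follows. This is a minimal illustration on a one-dimensional score with made-up numbers; the function names are hypothetical and the code is not kmatch's implementation.

```python
def epan(u):
    """Epanechnikov kernel weight; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_counterfactual(score, controls, bandwidth):
    """Kernel-weighted mean outcome of the controls near `score`.

    `controls` is a list of (score, outcome) pairs. Returns None when no
    control lies within the bandwidth (the treated unit is off support).
    """
    weighted = [(epan((s - score) / bandwidth), y) for s, y in controls]
    total = sum(w for w, _ in weighted)
    if total == 0.0:
        return None
    return sum(w * y for w, y in weighted) / total


def pair_counterfactual(score, controls):
    """Outcome of the single closest control (the first one wins ties)."""
    return min(controls, key=lambda c: abs(c[0] - score))[1]


# Hypothetical controls: three share the treated unit's score exactly,
# one is far away.
controls = [(0.30, 10.0), (0.30, 12.0), (0.30, 14.0), (0.90, 40.0)]
kernel_y0 = kernel_counterfactual(0.30, controls, bandwidth=0.1)
pair_y0 = pair_counterfactual(0.30, controls)
print(kernel_y0, pair_y0)
```

Kernel matching averages the three equally good controls and ignores the distant one, whereas pair matching keeps one of the three and discards the other two, which is the kind of random pruning discussed above.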


It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


6 Illustration using kmatch


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    432     25     457     1105     291    1396  1.3394

Treatment-effects estimation

      wage       Coef.
       ATT    .6059013
      NATE    1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                          Raw                        Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
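The two balancing statistics can be recomputed by hand. The sketch below assumes that the standardized difference is the difference in means divided by the square root of the average of the two group variances; this definition is an assumption (it is not stated on the slide), but it appears to reproduce the raw collgrad entries of the table. The function names are hypothetical.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference using the average of both variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def var_ratio(var_t, var_c):
    """Ratio of the treated variance to the untreated variance."""
    return var_t / var_c


# Raw means and variances of collgrad as reported in the balancing table.
collgrad_stddif = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
collgrad_ratio = var_ratio(0.218674, 0.174066)
print(round(collgrad_stddif, 4), round(collgrad_ratio, 4))
```

Values near 0 for the standardized difference and near 1 for the variance ratio indicate good balance, which is what the matched columns show.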

Some Examples

Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: standardized mean difference (left panel) and variance ratio (right panel) for each covariate, raw vs. matched]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    431     26     457     1214     182    1396  .00188

Treatment-effects estimation

      wage       Coef.
       ATT    .3887224
      NATE    1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated, raw (left panel) vs. matched (ATT) (right panel)]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated, raw (left panel) vs. matched (ATT) (right panel)]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, raw vs. matched (ATT)]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    432     25     457     1105     291    1396  1.3394
 Untreated   1386     10    1396      455       2     457  3.3975
  Combined   1818     35    1853     1560     293    1853

Treatment-effects estimation

              Observed   Bootstrap                       Normal-based
      wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
       ATE    .4095729   .1920853    2.13   0.033    .0330928    .7860531
       ATT    .6059013   .2472069    2.45   0.014    .1213846    1.090418
       ATC    .3483797   .1893653    1.84   0.066   -.0227695    .7195289
      NATE    1.432913   .2333282    6.14   0.000    .9755981    1.890228
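The z statistic, p-value, and normal-based confidence interval in such a table follow directly from the coefficient and its bootstrap standard error. The sketch below recomputes the ATT row; the function name `normal_summary` is hypothetical and only the Python standard library is used.

```python
from statistics import NormalDist


def normal_summary(coef, se, level=0.95):
    """Return z, two-sided p-value, and normal-based confidence limits."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p, coef - crit * se, coef + crit * se


# ATT and bootstrap standard error as reported in the output.
z, p, lower, upper = normal_summary(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), round(lower, 7), round(upper, 6))
```

The interval is only as good as the normal approximation and the bootstrap standard error it is built on.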

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

      wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1)   -.8270117   .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment  : union = 1                                max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    457      0     457      328    1068    1396

Treatment-effects estimation

      wage       Coef.
       ATT    .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                                    AI Robust
               wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
              union
(union vs nonunion)    .7246969   .2942952    2.46   0.014     .147889    1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    457      0     457      870     526    1396

Treatment-effects estimation

      wage       Coef.
       ATT    .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                    AI Robust
               wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
              union
(union vs nonunion)    .5590823   .2381752    2.35   0.019    .0922675    1.025897

Some Examples

Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    457      0     457      870     526    1396

Treatment-effects estimation

      wage       Coef.
       ATT    .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                    AI Robust
               wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
              union
(union vs nonunion)    .5288023   .2420635    2.18   0.029    .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    439     18     457     1258     138    1396  .83886

Treatment-effects estimation

      wage       Coef.
       ATT    .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    432     25     457     1103     293    1396  1.3013

Treatment-effects estimation

      wage       Coef.
       ATT    .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    432     25     457     1105     291    1396  1.3394

Treatment-effects estimation

      wage       Coef.
       ATT    .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    448      9     457     1184     212    1396  1.8888

Treatment-effects estimation

      wage       Coef.
       ATT    .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (1.5 to 3), points labeled by search index]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    453      4     457     1289     107    1396   2.433

Treatment-effects estimation

      wage       Coef.
       ATT    .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (1.5 to 3), points labeled by search index]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    455      2     457     1356      40    1396  2.7626

Treatment-effects estimation

      wage       Coef.
       ATT    .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5), points labeled by search index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    366     91     457      701     695    1396      .5

Treatment-effects estimation

      wage       Coef.
       ATT    .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)        Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377         0   .045952    .054507  -.219225   .273732
7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry          0   .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
south           .29235   .318681   .297593   -.011456   .046074   -.05753

                 Common support (treated)                Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
6.industry     .054233         0   .043936    1.23435         0         .
7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
10.industry          0   .021734   .004367          0   4.97709         0
11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
2.race         .184542   .219536   .221726    .832297   .990121   .840601
3.race         .002732   .071795   .017237    .158513   4.16522   .038056
south          .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference between matched treated and all treated for each covariate]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    432     25     457     1104     291    1395  1.3392

Treatment-effects estimation

                 Coef.
wage
       ATT    .6021049
      NATE    1.430823
hours
       ATT    1.263759
      NATE    1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
   Treated    432     25     457     1104     291    1395  1.3392

Treatment-effects estimation

                 Coef.
wage
       ATT    .5152752
      NATE    1.430823
hours
       ATT    1.263759
      NATE    1.450303

wage adjusted for: collgrad ttl_exp tenure
hours adjusted for: i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                  Matched                Controls          Band-
              Yes     No   Total     Used  Unused   Total  width
0
   Treated    306     15     321      625     120     745  1.3199
1
   Treated    126     10     136      473     178     651  1.3398

Treatment-effects estimation

              Observed   Bootstrap                       Normal-based
      wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
       ATT    .4586332   .2763358    1.66   0.097    -.082975    1.000241
1
       ATT    .9518705    .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1)    .4932373   .4452343    1.11   0.268    -.379406    1.365881

Simulation

- Population data from Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from population and compute various matching estimators.
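The logic of such a simulation can be sketched in a few lines. Everything below is a toy stand-in: the population is synthetic (the census data are not available here), the estimator is the raw mean difference rather than a matching estimator, and the names `evaluate` and `naive_difference` are hypothetical.

```python
import random
import statistics


def evaluate(population, estimator, n, reps, truth, seed=1):
    """Draw `reps` samples of size `n` and summarize the estimator."""
    rng = random.Random(seed)
    estimates = [estimator(rng.sample(population, n)) for _ in range(reps)]
    variance = statistics.pvariance(estimates)
    bias = statistics.fmean(estimates) - truth
    return {"variance": variance, "bias": bias, "mse": variance + bias ** 2}


def naive_difference(sample):
    """Raw difference in mean outcomes (placeholder for a matching estimator)."""
    treated = [y for d, y in sample if d == 1]
    control = [y for d, y in sample if d == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


# Synthetic population of (treatment, outcome) pairs with a true effect of -4
# and a treated share of roughly 17.5 percent (both values are assumptions).
toy_rng = random.Random(0)
population = []
for _ in range(20000):
    d = 1 if toy_rng.random() < 0.175 else 0
    population.append((d, 44.0 - 4.0 * d + toy_rng.gauss(0.0, 10.0)))

result = evaluate(population, naive_difference, n=500, reps=200, truth=-4.0)
print(result)
```

Replacing `naive_difference` with different matching estimators and repeating the call for several sample sizes would give variance, bias, and mean squared error for each combination.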

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (0 to 1) for untreated and treated; McFadden R2 = 0.121]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome (30 to 55) by propensity score (0 to .8) for untreated and treated]
[Figure, right panel: treatment effect (-6 to -1) by propensity score (0 to .8)]
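The reported population quantities are linked by the identity ATE = p · ATT + (1 − p) · ATC, where p is the share of treated. A quick check, assuming p = 0.175 from the simulation setup (the function name is hypothetical):

```python
def ate_from_parts(att, atc, share_treated):
    """Average treatment effect as a weighted mean of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc


ate = ate_from_parts(att=-3.96, atc=-3.41, share_treated=0.175)
raw_bias = -4.79 - (-3.96)  # NATE minus population ATT
print(round(ate, 2), round(raw_bias, 2))
```

The result agrees with the reported ATE of -3.51 up to rounding, and the raw mean difference overstates the ATT by about 0.83 prestige points.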

Results: Variance

[Figure: variance of the ATT estimate (1.5 to 4.5) by matching algorithm, for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent (70 to 130 for N = 500; 95 to 135 for N = 5000) by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the ATT estimate (1.5 to 5) by matching algorithm, for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Relative standard error

[Figure: relative standard error (.9 to 1.25) by matching algorithm, for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals (.9 to .98) by matching algorithm, for N = 500 and N = 5000. Rows and series as in the relative standard error figure]


7 Conclusions


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of treated (T) and control (C) units by education in years (12 to 28) and age (20 to 80)]

(Slide 15/23 by King and Nielsen)

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
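
The random-pruning claim can be checked with a small simulation. The following is a minimal sketch, not part of the original slides: the program name `prunesim`, the sample size, and the shares kept are all illustrative assumptions. It draws a covariate that is unrelated to treatment, deletes observations at random, and records the treated-minus-control mean difference; the spread of that difference across replications should grow as more units are pruned.

```stata
* Sketch only: prunesim, n(), and keep() are hypothetical names chosen for illustration
clear all
set seed 20170623

program define prunesim, rclass
    syntax [, n(integer 400) keep(real 1)]
    drop _all
    set obs `n'
    generate byte   t = mod(_n, 2)      // half treated, half control
    generate double x = rnormal()       // covariate unrelated to treatment
    keep if runiform() <= `keep'        // random pruning: keep each unit with prob. keep()
    summarize x if t == 1, meanonly
    local m1 = r(mean)
    summarize x if t == 0, meanonly
    return scalar diff = `m1' - r(mean) // imbalance in x for this draw
end

foreach share in 1 0.5 0.1 {
    quietly simulate diff = r(diff), reps(1000) nodots: prunesim, keep(`share')
    quietly summarize diff
    display "share kept = `share'   SD of mean difference in x = " %6.4f r(sd)
}
```

The expected imbalance is zero at every share; only its variability changes, which is the sense in which random pruning makes large differences more likely.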

PSM Increases Model Dependence & Bias

[Figure, two panels, each comparing MDM and PSM against the Number of Units Pruned (0 to 160). Left panel "Model Dependence": y-axis Variance. Right panel "Bias": y-axis Maximum Coefficient across 512 Specifications; true effect = 2. Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23, slides by King and Nielsen.]

The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP, 2012) and Nielsen et al. (AJPS, 2011); x-axis: Number of units pruned, y-axis: Imbalance; series: CEM, MDM, PSM, Raw, Random; the 1/4 SD caliper is marked. Slide 21/23, slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization – given the sample size – is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
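
The efficiency claim for a fixed sample size can be illustrated by simulation. This is a minimal sketch under assumed settings (program name `randsim`, N = 100, one covariate, no treatment effect), not something taken from the slides. It assigns treatment once by complete randomization and once by randomizing within pairs of units adjacent in X, and compares the spread of the resulting mean differences in Y.

```stata
* Sketch only: randsim and all settings are illustrative assumptions
clear all
set seed 20170623

program define randsim, rclass
    drop _all
    set obs 100
    generate double x = rnormal()
    generate double y = x + rnormal()        // X predicts Y; true treatment effect is 0
    * complete randomization: 50 units treated at random
    generate double u = runiform()
    sort u
    generate byte t_c = _n <= 50
    * blocked randomization: pair neighbors in x, treat one unit per pair
    sort x
    generate long pair = ceil(_n/2)
    generate double v = runiform()
    bysort pair (v): generate byte t_b = _n == 1
    foreach d in c b {
        summarize y if t_`d' == 1, meanonly
        local m1 = r(mean)
        summarize y if t_`d' == 0, meanonly
        return scalar diff_`d' = `m1' - r(mean)
    }
end

simulate diff_c = r(diff_c) diff_b = r(diff_b), reps(1000) nodots: randsim
summarize diff_c diff_b
```

Both designs use all 100 units here, so the comparison isolates the gain from blocking. It does not address the case raised above, where blocking also reduces the sample size.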

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
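
The two limiting cases can be made concrete with simulated data. This is a minimal sketch with hypothetical variable names and coefficients: one treatment indicator is generated independently of the covariates, the other depends strongly on them, and a logit propensity score is estimated for each.

```stata
* Sketch only: variable names and coefficients are illustrative assumptions
clear
set seed 2017
set obs 1000
generate double x1 = rnormal()
generate double x2 = rnormal()
generate byte t_unrelated = runiform() < 0.3                      // T independent of X
generate byte t_related   = runiform() < invlogit(-1 + 2*x1 + x2) // T depends on X

logit t_unrelated x1 x2
predict double ps_unrelated, pr
logit t_related x1 x2
predict double ps_related, pr

* compare how much the estimated propensity scores vary
summarize ps_unrelated ps_related
```

In the first case the estimated scores should vary only through sampling noise, so matching on them is close to matching at random; in the second case the scores are widely spread and matching on them blocks on X.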

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
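
As a rough illustration of the difference between algorithms, the following sketch runs propensity-score matching twice on the NLSW extract, once with kernel matching and once with one-nearest-neighbor matching. It assumes `kmatch` has been installed from SSC and that the `nn` and `att` options behave for `kmatch ps` as they do in the `kmatch md` examples in these slides. One-to-one matching without replacement is not shown.

```stata
* Sketch only; requires: ssc install kmatch
sysuse nlsw88, clear
drop if industry == 2

* kernel matching on the propensity score: averages over all controls within the bandwidth
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att

* one-nearest-neighbor matching on the propensity score: uses far fewer controls
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att nn
```

The "Used" and "Unused" columns of the matching statistics show how many controls each algorithm discards. Adding `vce(bootstrap)` to both calls would allow a comparison of the standard errors as well.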

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
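
One way to obtain subgroup estimates is to run the matching by subgroup rather than splitting a matched sample afterwards. The sketch below reuses the `over()` option that appears in the `kmatch md` examples of these slides and applies it to `kmatch ps`. Whether the propensity-score model is re-estimated within each group is an assumption here and should be checked in `help kmatch`.

```stata
* Sketch only; requires: ssc install kmatch
sysuse nlsw88, clear
drop if industry == 2

* propensity-score kernel matching carried out separately for south == 0 and south == 1
kmatch ps union collgrad ttl_exp tenure i.industry i.race (wage), att over(south)
```

The subgroup variable `south` is left out of the covariate list because it is constant within each group.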


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       432     25    457    1105    291   1396  1.3394

    Treatment-effects estimation
    wage          Coef.
    ATT        .6059013
    NATE       1.432913

Some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                             Raw                       Matched (ATT)
    Means         Treated  Untrea~d    StdDif   Treated  Untrea~d    StdDif
    collgrad      .321663   .224212   .219912   .319444   .319444         0
    ttl_exp       13.2685   12.7323   .117584   13.3205   13.1425   .039036
    tenure        7.89205   6.17658    .29735   7.91744   7.58347   .057888
    3.industry    .006565   .012178  -.058246    .00463    .00463         0
    4.industry    .183807   .166905   .044425   .185185   .185185         0
    5.industry    .105033   .027937   .312944   .085648   .085648         0
    6.industry    .045952   .169771  -.407129   .048611   .048611         0
    7.industry    .019694   .102436  -.350657   .020833   .020833         0
    8.industry    .017505   .035817  -.113785   .009259   .009259         0
    9.industry    .010941   .040115  -.185669   .011574   .011574         0
    10.industry   .004376   .008596  -.052551   .002315   .002315         0
    11.industry   .479212   .356734   .250073   .506944   .506944         0
    12.industry   .122538    .07235   .169707    .12037    .12037         0
    2.race        .330416   .244986   .189418     .3125     .3125         0
    3.race        .017505   .011461   .050566   .006944   .006944         0
    south         .297593   .466332  -.352408   .291667   .291667         0

                             Raw                       Matched (ATT)
    Variances     Treated  Untrea~d     Ratio   Treated  Untrea~d     Ratio
    collgrad      .218674   .174066   1.25628   .217904   .217904         1
    ttl_exp       20.5898   21.0001   .980459   19.8177   18.2323   1.08696
    tenure        37.2044   29.3629   1.26706   37.0399   34.9543   1.05966
    3.industry    .006536   .012038   .542928   .004619   .004619         1
    4.industry    .150351   .139148   1.08052   .151242   .151242         1
    5.industry    .094207   .027176   3.46656   .078494   .078494         1
    6.industry    .043936    .14105   .311496   .046355   .046355         1
    7.industry    .019348   .092008   .210287   .020447   .020447         1
    8.industry    .017237   .034559   .498769   .009195   .009195         1
    9.industry    .010845   .038533   .281445   .011467   .011467         1
    10.industry   .004367   .008528   .512039   .002315   .002315         1
    11.industry   .250115   .229639   1.08917   .250532   .250532         1
    12.industry   .107758   .067163   1.60443   .106127   .106127         1
    2.race        .221726     .1851   1.19787   .215342   .215342         1
    3.race        .017237   .011338   1.52025   .006912   .006912         1
    south          .20949   .249045   .841173   .207077   .207077         1

Make a graph of the balancing stats

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
    >     bylabels("Std. mean difference" "Variance ratio") ///
    >     noci nolabels byopts(xrescale)
    . addplot 1: xline(0), norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: xline(1), norescaling

[Figure: coefficient plot of the balancing statistics for each covariate, Raw vs. Matched; left panel: Std. mean difference, right panel: Variance ratio.]

Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       431     26    457    1214    182   1396  .00188

    Treatment-effects estimation
    wage          Coef.
    ATT        .3887224
    NATE       1.432913

Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure, two panels: Raw and Matched (ATT); x-axis: Propensity score, y-axis: Density; series: Untreated, Treated.]

Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure, two panels: Raw and Matched (ATT); x-axis: Propensity score, y-axis: Cumulative probability; series: Untreated, Treated.]

Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure, two panels: Raw and Matched (ATT); box plots of the Propensity score for Untreated and Treated.]

Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       432     25    457    1105    291   1396  1.3394
    Untreated    1386     10   1396     455      2    457  3.3975
    Combined     1818     35   1853    1560    293   1853

    Treatment-effects estimation
             Observed   Bootstrap                        Normal-based
    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATE      .4095729   .1920853    2.13   0.033    .0330928   .7860531
    ATT      .6059013   .2472069    2.45   0.014    .1213846   1.090418
    ATC      .3483797   .1893653    1.84   0.066   -.0227695   .7195289
    NATE     1.432913   .2333282    6.14   0.000    .9755981   1.890228

Do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1)     -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =    0.1200

Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 1
    Treatment   : union = 1                               max = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       457      0    457     328   1068   1396

    Treatment-effects estimation
    wage          Coef.
    ATT        .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 1
    Outcome model  : matching                                 min = 1
    Distance metric: Mahalanobis                              max = 1

                                         AI Robust
    wage                       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion)   .7246969   .2942952    2.46   0.014     .147889   1.301505

Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       457      0    457     870    526   1396

    Treatment-effects estimation
    wage          Coef.
    ATT        .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                                 min = 5
    Distance metric: Mahalanobis                              max = 6

                                         AI Robust
    wage                       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion)   .5590823   .2381752    2.35   0.019    .0922675   1.025897

Bias-correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       457      0    457     870    526   1396

    Treatment-effects estimation
    wage          Coef.
    ATT        .5288023
    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                                 min = 5
    Distance metric: Mahalanobis                              max = 6

                                         AI Robust
    wage                       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion)   .5288023   .2420635    2.18   0.029    .0543666   1.003238

Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att ///
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       439     18    457    1258    138   1396  .83886

    Treatment-effects estimation
    wage          Coef.
    ATT        .6408443

Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       432     25    457    1103    293   1396  1.3013

    Treatment-effects estimation
    wage          Coef.
    ATT        .6047374

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       432     25    457    1105    291   1396  1.3394

    Treatment-effects estimation
    wage          Coef.
    ATT        .6059013

Bandwidth selection: cross validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       448      9    457    1184    212   1396  1.8888

    Treatment-effects estimation
    wage          Coef.
    ATT        .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MSE) against Bandwidth (about 1.5 to 3); points labeled with the index of the evaluation step.]

Bandwidth selection: cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       453      4    457    1289    107   1396   2.433

    Treatment-effects estimation
    wage          Coef.
    ATT        .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MISE) against Bandwidth (about 1.5 to 3); points labeled with the index of the evaluation step.]

Bandwidth selection: weighted cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       455      2    457    1356     40   1396  2.7626

    Treatment-effects estimation
    wage          Coef.
    ATT        .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (Weighted MISE) against Bandwidth (1 to 5); points labeled with the index of the evaluation step.]

Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(0.5)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       366     91    457     701    695   1396      .5

    Treatment-effects estimation
    wage          Coef.
    ATT        .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                    Common support (treated)      Standardized difference
    Means         Matched  Unmatc~d     Total   (1)-(3)   (2)-(3)   (1)-(2)
    collgrad      .322404   .318681   .321663   .001585  -.006376   .007962
    ttl_exp       13.3929   12.7682   13.2685   .027413  -.110253   .137666
    tenure        8.12614   6.95055   7.89205   .038378  -.154356   .192734
    3.industry    .002732   .021978   .006565  -.047404   .190657  -.238061
    4.industry    .191257   .153846   .183807   .019212  -.077269   .096481
    5.industry    .062842   .274725   .105033  -.137462   .552867  -.690329
    6.industry    .057377         0   .045952   .054507  -.219225   .273732
    7.industry    .019126   .021978   .019694  -.004083   .016423  -.020506
    8.industry    .005464   .065934   .017505  -.091714   .368871  -.460585
    9.industry    .010929   .010989   .010941  -.000115   .000462  -.000577
    10.industry         0   .021978   .004376  -.066227   .266363  -.332589
    11.industry   .554645   .175824   .479212    .15083  -.606636   .757467
    12.industry   .092896   .241758   .122538  -.090299   .363181   -.45348
    2.race        .243169   .681319   .330416  -.185284   .745209  -.930494
    3.race        .002732   .076923   .017505  -.112525   .452572  -.565097
    south          .29235   .318681   .297593  -.011456   .046074   -.05753

                    Common support (treated)               Ratio
    Variances     Matched  Unmatc~d     Total   (1)/(3)   (2)/(3)   (1)/(2)
    collgrad      .219058   .219536   .218674   1.00176   1.00394   .997824
    ttl_exp       19.4198   25.2474   20.5898   .943177   1.22621    .76918
    tenure        38.3324   31.9242   37.2044   1.03032   .858076   1.20073
    3.industry    .002732   .021734   .006536   .418045   3.32537   .125714
    4.industry    .155101   .131624   .150351   1.03159   .875443   1.17837
    5.industry    .059054   .201465   .094207   .626851   2.13854   .293122
    6.industry    .054233         0   .043936   1.23435         0         .
    7.industry    .018811   .021734   .019348   .972252    1.1233   .865531
    8.industry     .00545   .062271   .017237   .316157   3.61269   .087513
    9.industry    .010839   .010989   .010845   .999464   1.01328   .986361
    10.industry         0   .021734   .004367         0   4.97709         0
    11.industry   .247691    .14652   .250115   .990307   .585811   1.69049
    12.industry   .084497   .185348   .107758   .784137   1.72003   .455885
    2.race        .184542   .219536   .221726   .832297   .990121   .840601
    3.race        .002732   .071795   .017237   .158513   4.16522   .038056
    south         .207448   .219536    .20949   .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total

Make a graph of the common-support stats

    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched treated and all treated, (1)-(3), for each covariate; x-axis from -.2 to .2.]

Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       432     25    457    1104    291   1395  1.3392

    Treatment-effects estimation
                  Coef.
    wage
      ATT      .6021049
      NATE     1.430823
    hours
      ATT      1.263759
      NATE     1.450303

Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure) ///
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    Treated       432     25    457    1104    291   1395  1.3392

    Treatment-effects estimation
                  Coef.
    wage
      ATT      .5152752
      NATE     1.430823
    hours
      ATT      1.263759
      NATE     1.450303
    wage:  adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race

Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics
                       Matched              Controls        Band-
                  Yes     No  Total    Used Unused  Total   width
    0
      Treated     306     15    321     625    120    745  1.3199
    1
      Treated     126     10    136     473    178    651  1.3398

    Treatment-effects estimation
             Observed   Bootstrap                        Normal-based
    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    0
      ATT    .4586332   .2763358    1.66   0.097    -.082975   1.000241
    1
      ATT    .9518705    .406903    2.34   0.019    .1543553   1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

               chi2(  1) =    1.23
             Prob > chi2 =    0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1)      .4932373   .4452343    1.11   0.268    -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
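
The structure of such a simulation can be sketched in a few lines. The census data are not available here, so the sketch below treats the NLSW extract as a stand-in "population"; the program name `onedraw`, the covariate list, the sample size, and the number of replications are all illustrative assumptions, and the results have nothing to do with the figures reported in these slides. It assumes `kmatch` is installed and that the ATT can be read from `_b[ATT]`, as the `lincom ATT-NATE` example suggests.

```stata
* Sketch only; requires: ssc install kmatch
clear all
set seed 20170623

* stand-in population (NOT the Swiss census used in the slides)
sysuse nlsw88, clear
drop if industry == 2
keep if !missing(union, wage, collgrad, ttl_exp, tenure)
tempfile pop
save `pop'

program define onedraw, rclass
    args popfile n
    use "`popfile'", clear
    sample `n', count                  // draw a random sample of size n
    quietly kmatch ps union collgrad ttl_exp tenure (wage), att
    return scalar att_ps = _b[ATT]     // PSM kernel-matching estimate
    quietly kmatch md union collgrad ttl_exp tenure (wage), att
    return scalar att_md = _b[ATT]     // MDM kernel-matching estimate
end

simulate att_ps = r(att_ps) att_md = r(att_md), reps(100) nodots: ///
    onedraw "`pop'" 500
summarize att_ps att_md                // SD across draws = sampling variability
```

With a known population ATT, the mean of each estimate minus that value would give the bias, and bias squared plus variance the mean squared error.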

Substantial differences between resident aliens and Swiss nationals on all three covariates.

[Figure: propensity score in the population (computed from fully stratified data); x-axis: Propensity score (0 to 1), y-axis: Density; series: Untreated, Treated. McFadden R2 = 0.121.]

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41)

[Figure, two panels, both against the Propensity score (0 to 0.8). Left panel: Outcome, by Untreated and Treated. Right panel: Treatment effect.]

Results: Variance

[Figure, two panels (N = 500 and N = 5000): variance of the ATT estimates by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure, two panels (N = 500 and N = 5000): bias reduction in percent by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 61

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 62

[Figure: Results: Coverage of 95% CIs, for N = 500 and N = 5000, for nearest-neighbor matching (teffects), nearest-neighbor matching (bootstrap), and kernel matching (bootstrap), for MDM and PSM, with and without bias correction]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.


Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don’t use pair matching anyhow.)


Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


King and Nielsen

Argument 3:
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better’, you do worse”):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
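The claim that random pruning increases imbalance can be checked with a small simulation. The following is a minimal Python sketch, not King and Nielsen's code: it assumes a single standard-normal covariate with no true difference between groups, and the function names, the group sizes (200 pruned at random to 50), and the number of replications are illustrative choices.

```python
import random
from statistics import fmean

def imbalance(treated, control):
    """Absolute difference in covariate means between the two groups."""
    return abs(fmean(treated) - fmean(control))

def simulate(n_full=200, n_kept=50, n_reps=1000, seed=1):
    """Average imbalance before and after deleting observations at random."""
    rng = random.Random(seed)
    full, pruned = [], []
    for _ in range(n_reps):
        # Both groups come from the same distribution: balanced in expectation.
        treated = [rng.gauss(0, 1) for _ in range(n_full)]
        control = [rng.gauss(0, 1) for _ in range(n_full)]
        full.append(imbalance(treated, control))
        # Random pruning: keep a random subset of each group.
        pruned.append(imbalance(rng.sample(treated, n_kept),
                                rng.sample(control, n_kept)))
    return fmean(full), fmean(pruned)

avg_full, avg_pruned = simulate()
print(f"average imbalance, full sample:   {avg_full:.3f}")
print(f"average imbalance, pruned sample: {avg_pruned:.3f}")
```

Because the standard deviation of a mean difference scales with one over the square root of the group size, cutting each group to a quarter of its size should roughly double the average imbalance in this setup.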


PSM Increases Model Dependence & Bias

[Figure, slide 20/23 by King and Nielsen: two panels plotting MDM and PSM against the number of units pruned (0 to 160). Left panel “Model Dependence”: variance. Right panel “Bias”: maximum coefficient across 512 specifications, with true effect = 2. Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$.]

The Propensity Score Paradox in Real Data

[Figure, slide 21/23 by King and Nielsen: imbalance plotted against the number of units pruned for CEM, MDM, and PSM, with reference marks for the raw data, random pruning, and a 1/4 SD caliper. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011).]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

Argument 1:
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic: I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Argument 2:
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Argument 3:
- Random pruning ⇒ imbalance ⇒ more model dependence.
- PSM ⇒ complete randomization ⇒ lots of random pruning.
- PSM Paradox: “when you do ‘better’, you do worse”.

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X’s cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
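The idea of averaging over all good matches instead of discarding them can be sketched as follows. This is a simplified Python illustration of kernel matching on a scalar score with an Epanechnikov kernel, not kmatch's implementation; the function names, the toy data, and the bandwidth of 0.05 are hypothetical.

```python
from statistics import fmean

def epanechnikov(u):
    """Epanechnikov kernel: positive weight only for |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_match_att(treated, controls, bandwidth):
    """treated, controls: lists of (score, outcome) pairs.
    Returns the ATT estimate and the number of matched treated units."""
    effects = []
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        total = sum(weights)
        if total == 0:
            continue  # no control within the bandwidth: unit stays unmatched
        # Weighted average over ALL controls within the bandwidth.
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    return fmean(effects), len(effects)

# Hypothetical toy data: (propensity score, outcome)
treated = [(0.30, 5.0), (0.60, 7.0)]
controls = [(0.28, 4.0), (0.32, 4.4), (0.31, 4.2), (0.90, 9.0)]

att, n_matched = kernel_match_att(treated, controls, bandwidth=0.05)
print(att, n_matched)
```

In this toy example the first treated unit is compared with a weighted average of its three nearby controls, so none of them is thrown away, while the second treated unit has no control within the bandwidth and is dropped. Pair matching without replacement would instead use exactly one control per treated unit.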


It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you’d need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   432    25    457     1105     291   1396   1.3394

Treatment-effects estimation

wage      Coef.
ATT    .6059013
NATE   1.432913

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched (ATT)
Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad      .321663   .224212   .219912    .319444   .319444         0
ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry    .006565   .012178  -.058246     .00463    .00463         0
4.industry    .183807   .166905   .044425    .185185   .185185         0
5.industry    .105033   .027937   .312944    .085648   .085648         0
6.industry    .045952   .169771  -.407129    .048611   .048611         0
7.industry    .019694   .102436  -.350657    .020833   .020833         0
8.industry    .017505   .035817  -.113785    .009259   .009259         0
9.industry    .010941   .040115  -.185669    .011574   .011574         0
10.industry   .004376   .008596  -.052551    .002315   .002315         0
11.industry   .479212   .356734   .250073    .506944   .506944         0
12.industry   .122538    .07235   .169707     .12037    .12037         0
2.race        .330416   .244986   .189418      .3125     .3125         0
3.race        .017505   .011461   .050566    .006944   .006944         0
south         .297593   .466332  -.352408    .291667   .291667         0

                          Raw                        Matched (ATT)
Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad      .218674   .174066   1.25628    .217904   .217904         1
ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry    .006536   .012038   .542928    .004619   .004619         1
4.industry    .150351   .139148   1.08052    .151242   .151242         1
5.industry    .094207   .027176   3.46656    .078494   .078494         1
6.industry    .043936    .14105   .311496    .046355   .046355         1
7.industry    .019348   .092008   .210287    .020447   .020447         1
8.industry    .017237   .034559   .498769    .009195   .009195         1
9.industry    .010845   .038533   .281445    .011467   .011467         1
10.industry   .004367   .008528   .512039    .002315   .002315         1
11.industry   .250115   .229639   1.08917    .250532   .250532         1
12.industry   .107758   .067163   1.60443    .106127   .106127         1
2.race        .221726     .1851   1.19787    .215342   .215342         1
3.race        .017237   .011338   1.52025    .006912   .006912         1
south          .20949   .249045   .841173    .207077   .207077         1
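To make the two balance measures concrete, the following Python sketch recomputes them for collgrad in the raw sample, reading the reported means as 0.321663 and 0.224212 and the variances as 0.218674 and 0.174066. The pooled-variance formula used here reproduces the reported standardized difference, but that kmatch computes it in exactly this way is an assumption; the function names are illustrative.

```python
from math import sqrt

def std_mean_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means divided by the square root of the average
    of the two group variances (assumed definition)."""
    return (mean_t - mean_c) / sqrt((var_t + var_c) / 2)

def variance_ratio(var_t, var_c):
    """Variance among the treated divided by variance among the untreated."""
    return var_t / var_c

# Raw-sample values for collgrad (treated, untreated)
std_dif = std_mean_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(round(std_dif, 4), round(ratio, 4))
```

A standardized difference close to 0 and a variance ratio close to 1 indicate good balance, which is the pattern in the matched columns.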


Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: xline(0), norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: xline(1), norescaling

[Figure: coefficient plot by covariate, raw versus matched. Left panel: standardized mean difference (-.4 to .4). Right panel: variance ratio (0 to 4).]

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   431    26    457     1214     182   1396   .00188

Treatment-effects estimation

wage      Coef.
ATT    .3887224
NATE   1.432913

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score (0 to .8) for untreated and treated. Panels: Raw and Matched (ATT).]

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability (0 to 1) of the propensity score (0 to .8) for untreated and treated. Panels: Raw and Matched (ATT).]

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score (0 to .8) for untreated and treated. Panels: Raw and Matched (ATT).]

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched              Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     432    25    457     1105     291   1396   1.3394
Untreated  1386    10   1396      455       2    457   3.3975
Combined   1818    35   1853     1560     293   1853

Treatment-effects estimation

        Observed   Bootstrap                       Normal-based
wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE     .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT     .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC     .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE    1.432913   .2333282    6.14   0.000    .9755981   1.890228

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)    -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
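The lincom line can be reproduced by hand from its coefficient and bootstrap standard error. The following Python sketch reads the reported values as -0.8270117 and 0.1810415 and assumes the usual normal-based z statistic and 95% interval; the variable names are illustrative.

```python
from statistics import NormalDist

coef = -0.8270117     # ATT - NATE, as reported by lincom
std_err = 0.1810415   # bootstrap standard error of the difference

z = coef / std_err
crit = NormalDist().inv_cdf(0.975)            # two-sided 95% critical value
ci_low = coef - crit * std_err
ci_high = coef + crit * std_err
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(round(z, 2), round(ci_low, 6), round(ci_high, 6), round(p_value, 4))
```

The interval excludes zero, which matches the reading of the output: the matching estimate of the ATT is clearly smaller than the naive mean difference (NATE).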


Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   457     0    457      328    1068   1396

Treatment-effects estimation

wage      Coef.
ATT    .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                                  AI Robust
wage                     Coef.    Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
 union
 (union vs nonunion)  .7246969    .2942952    2.46   0.014     .147889   1.301505

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   457     0    457      870     526   1396

Treatment-effects estimation

wage      Coef.
ATT    .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                  AI Robust
wage                     Coef.    Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
 union
 (union vs nonunion)  .5590823    .2381752    2.35   0.019    .0922675   1.025897

Bias correction (regression adjustment):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   457     0    457      870     526   1396

Treatment-effects estimation

wage      Coef.
ATT    .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                  AI Robust
wage                     Coef.    Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
 union
 (union vs nonunion)  .5288023    .2420635    2.18   0.029    .0543666   1.003238

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   439    18    457     1258     138   1396   .83886

Treatment-effects estimation

wage      Coef.
ATT    .6408443

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   432    25    457     1103     293   1396   1.3013

Treatment-effects estimation

wage      Coef.
ATT    .6047374

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   432    25    457     1105     291   1396   1.3394

Treatment-effects estimation

wage      Coef.
ATT    .6059013

Bandwidth selection: cross-validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   448     9    457     1184     212   1396   1.8888

Treatment-effects estimation

wage      Coef.
ATT    .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (1.5 to 3), with markers labeled by search index]

Bandwidth selection: cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   453     4    457     1289     107   1396    2.433

Treatment-effects estimation

wage      Coef.
ATT    .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (1.5 to 3), with markers labeled by search index]

Bandwidth selection: weighted cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   455     2    457     1356      40   1396   2.7626

Treatment-effects estimation

wage      Coef.
ATT    .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5), with markers labeled by search index]

Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   366    91    457      701     695   1396       .5

Treatment-effects estimation

wage      Coef.
ATT    .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)        Standardized difference
Means         Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad      .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp       13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure        8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry    .057377         0   .045952    .054507  -.219225   .273732
7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry         0   .021978   .004376   -.066227   .266363  -.332589
11.industry   .554645   .175824   .479212     .15083  -.606636   .757467
12.industry   .092896   .241758   .122538   -.090299   .363181   -.45348
2.race        .243169   .681319   .330416   -.185284   .745209  -.930494
3.race        .002732   .076923   .017505   -.112525   .452572  -.565097
south          .29235   .318681   .297593   -.011456   .046074   -.05753

                 Common support (treated)                 Ratio
Variances     Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad      .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp       19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure        38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
6.industry    .054233         0   .043936    1.23435         0         .
7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
10.industry         0   .021734   .004367          0   4.97709         0
11.industry   .247691    .14652   .250115    .990307   .585811   1.69049
12.industry   .084497   .185348   .107758    .784137   1.72003   .455885
2.race        .184542   .219536   .221726    .832297   .990121   .840601
3.race        .002732   .071795   .017237    .158513   4.16522   .038056
south         .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference (-.2 to .2) between matched treated and all treated, by covariate]

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   432    25    457     1104     291   1395   1.3392

Treatment-effects estimation

             Coef.
wage
  ATT     .6021049
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303

Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls           Band-
          Yes    No  Total     Used  Unused  Total    width
Treated   432    25    457     1104     291   1395   1.3392

Treatment-effects estimation

             Coef.
wage
  ATT     .5152752
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                Matched              Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
0
  Treated   306    15    321      625     120    745   1.3199
1
  Treated   126    10    136      473     178    651   1.3398

Treatment-effects estimation

        Observed   Bootstrap                       Normal-based
wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
  ATT   .4586332   .2763358    1.66   0.097    -.082975   1.000241
1
  ATT   .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)     .4932373   .4452343    1.11   0.268    -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2’308’006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
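The performance measures reported in the results figures (bias, variance, mean squared error, relative standard error, coverage of 95% CIs) can be computed from the replicated estimates along the following lines. This is a minimal Python sketch under stated assumptions: the function name and the toy inputs are hypothetical, and the relative standard error is taken here to be the average estimated standard error divided by the standard deviation of the estimates across replications, which is an assumption about the definition used in the talk.

```python
from statistics import NormalDist, fmean, pvariance

def summarize(estimates, std_errors, truth, level=0.95):
    """Monte Carlo summary of an estimator across simulation replications."""
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    bias = fmean(estimates) - truth
    variance = pvariance(estimates)
    mse = fmean([(e - truth) ** 2 for e in estimates])
    # Assumed definition: mean estimated SE relative to the true sampling SD.
    relative_se = fmean(std_errors) / variance ** 0.5
    # Share of normal-based intervals that contain the true value.
    coverage = fmean([abs(e - truth) <= crit * s
                      for e, s in zip(estimates, std_errors)])
    return {"bias": bias, "variance": variance, "mse": mse,
            "relative_se": relative_se, "coverage": coverage}

# Made-up replication results; the true value is the population ATT of -3.96.
result = summarize(estimates=[-4.2, -3.8, -4.0, -3.6],
                   std_errors=[0.2, 0.2, 0.2, 0.2],
                   truth=-3.96)
print(result)
```

With variance defined as the population variance of the estimates, the mean squared error equals the variance plus the squared bias, which is why an estimator with some bias but much smaller variance can still have the lower MSE.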


Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (0 to 1) for untreated and treated. McFadden R2 = 0.121.]

Simulation

• Raw mean difference in occupational prestige (NATE): -4.79

• Population ATT (computed from fully stratified data): -3.96

• There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels against the propensity score (0 to 0.8): mean outcome for untreated and treated (left); treatment effect (right)]
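The three population effects on this slide are tied together by an identity: the ATE is the average of ATT and ATC, weighted by the shares of treated and untreated units. A minimal Python check (not from the original slides; the function name is illustrative), assuming the treatment-group share given earlier is read as 17.5%:

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the treated-share-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc

# Population values from the slide; share_treated = 0.175 is the assumed reading
ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)
print(round(ate, 2))  # -3.51
```

The result reproduces the ATE reported on the slide, which also supports reading the treated share as 17.5%.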

Results: Variance

[Figure: variance of the estimates by matching algorithm (nearest-neighbor matching with 1 or 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, or weighted CV with respect to Y); panels for N = 500 and N = 5000; series for MDM and PSM, each without and with bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm (nearest-neighbor matching with 1 or 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, or weighted CV with respect to Y); panels for N = 500 and N = 5000; series for MDM and PSM, each without and with bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimates by matching algorithm (nearest-neighbor matching with 1 or 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, or weighted CV with respect to Y); panels for N = 500 and N = 5000; series for MDM and PSM, each without and with bias correction]

Results: Relative standard error

[Figure: relative standard error for nearest-neighbor matching (teffects), nearest-neighbor matching (bootstrap), and kernel matching (bootstrap), with the same neighbor and bandwidth variants; panels for N = 500 and N = 5000; series for MDM and PSM, each without and with bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for nearest-neighbor matching (teffects), nearest-neighbor matching (bootstrap), and kernel matching (bootstrap), with the same neighbor and bandwidth variants; panels for N = 500 and N = 5000; series for MDM and PSM, each without and with bias correction]
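The results slides report variance, bias reduction, mean squared error, relative standard error, and coverage, but the slides do not spell out how these were computed. The Python sketch below shows one conventional set of definitions, all of which are assumptions: bias reduction relative to the raw bias NATE - ATT, relative standard error as the mean estimated standard error divided by the standard deviation of the estimates, and coverage from normal-based 95% intervals. The replications are synthetic and only illustrate the calculation.

```python
import math
import random
import statistics

def performance(estimates, std_errors, truth, raw_bias):
    """Summarize Monte Carlo replications of one estimator (assumed definitions)."""
    variance = statistics.pvariance(estimates)       # population variance across replications
    bias = statistics.fmean(estimates) - truth
    covered = sum(abs(e - truth) <= 1.96 * s for e, s in zip(estimates, std_errors))
    return {
        "variance": variance,
        "bias_reduction_pct": 100.0 * (1.0 - bias / raw_bias),
        "mse": variance + bias ** 2,                 # equals the mean of (estimate - truth)^2
        "relative_se": statistics.fmean(std_errors) / math.sqrt(variance),
        "coverage": covered / len(estimates),
    }

random.seed(1)
truth = -3.96                      # population ATT from the slides
raw_bias = -4.79 - truth           # NATE - ATT from the slides
est = [random.gauss(truth - 0.05, 0.5) for _ in range(2000)]  # synthetic estimates
ses = [0.5] * len(est)                                        # synthetic standard errors
print(performance(est, ses, truth, raw_bias))
```

A relative standard error near 1 and coverage near 0.95 indicate that the standard-error estimator is roughly unbiased, which is how the last two figures are usually read.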


Conclusions

• The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

• Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

• Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

  Advantages of MDM:
  - MDM leaves less scope for post-matching modeling decision biases.
  - Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  - Fewer restrictions in terms of possible post-matching analyses.

  Disadvantages of MDM:
  - The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
  - Computational complexity.

• One clear conclusion we can draw, however, is: do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

• Some conclusions from the simulation:
  - For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
  - Bootstrap standard error and confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

• To do:
  - Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


PSM Increases Model Dependence & Bias

[Figure (slide 20/23 by King and Nielsen), two panels against the number of units pruned (0 to 160), each comparing MDM and PSM. Left panel, "Model Dependence": variance. Right panel, "Bias": maximum coefficient across 512 specifications, with the true effect (= 2) marked.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

The Propensity Score Paradox in Real Data

[Figure (slide 21/23 by King and Nielsen), two panels showing imbalance against the number of units pruned for CEM, MDM, and PSM, with reference markers for the raw data, random pruning, and a 1/4 SD caliper. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011).]

Similar pattern for > 20 other real data sets we checked.


Are King and Nielsen right?

• Argument 1:
  - Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
  - Matching is good because it reduces model dependence.

• I fully agree.

• My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

• Argument 2:
  - PSM approximates complete randomization.
  - Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

• That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

• However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

• That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
  - If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
  - If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

• Argument 3:
  - Random pruning ⇒ imbalance ⇒ more model dependence
  - PSM ⇒ complete randomization ⇒ lots of random pruning
  - PSM paradox: "when you do 'better', you do worse"

• That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

• As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

• Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

• If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
  - Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
  - Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
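The contrast between pair matching and kernel matching can be made concrete with a toy case in which all units share the same propensity score, so every control is an equally good match. The Python sketch below is an illustration added here, not code from the talk or from kmatch; the function names and the data are hypothetical. Pair matching without replacement keeps only as many controls as there are treated units and discards the rest arbitrarily, while kernel matching averages over all of them.

```python
def pair_match(ps_treated, ps_control):
    """Greedy one-to-one nearest-neighbor matching without replacement.
    Returns the set of control indices that end up being used."""
    available = set(range(len(ps_control)))
    used = set()
    for p in ps_treated:
        if not available:
            break
        # closest remaining control; ties broken by index (an arbitrary choice)
        j = min(available, key=lambda k: (abs(ps_control[k] - p), k))
        available.remove(j)
        used.add(j)
    return used

def kernel_weights(ps_treated, ps_control, bandwidth):
    """Total weight each control receives when every treated unit spreads a
    weight of 1 over controls using an Epanechnikov kernel (constant dropped)."""
    totals = [0.0] * len(ps_control)
    for p in ps_treated:
        k = [max(0.0, 1.0 - ((c - p) / bandwidth) ** 2) for c in ps_control]
        s = sum(k)
        if s > 0:
            for j, kj in enumerate(k):
                totals[j] += kj / s
    return totals

ps_t = [0.3, 0.3, 0.3]   # 3 treated units, identical propensity scores
ps_c = [0.3] * 9         # 9 controls, all equally good matches
used_pair = pair_match(ps_t, ps_c)
w = kernel_weights(ps_t, ps_c, bandwidth=0.05)
print(len(used_pair), sum(1 for x in w if x > 0))  # 3 9
```

Six of the nine equally good controls are pruned by pair matching, which is the random pruning that the argument above refers to; kernel matching uses all nine with equal weight.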

• It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

• In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

• kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

• Some key features:
  - Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
  - Optional exact matching
  - Optional regression-adjustment bias correction
  - Kernel matching, ridge matching, or nearest-neighbor matching
  - Automatic bandwidth selection for kernel/ridge matching
  - Flexible specification of scaling matrix for MDM
  - Joint analysis of multiple subgroups and multiple outcome variables
  - Various post-estimation commands for balancing and common-support diagnostics
  - Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
> (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     432    25    457    1105     291   1396  1.3394

Treatment-effects estimation

wage         Coef.
ATT       .6059013
NATE      1.432913
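To make the idea behind multivariate-distance kernel matching concrete, the following Python sketch computes an ATT from Mahalanobis distances and Epanechnikov kernel weights. It is a conceptual illustration only and not kmatch's implementation: the function names, the toy data, and the inverse covariance matrix are all assumptions, and kmatch's scaling and bandwidth conventions may differ.

```python
def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance between two covariate vectors."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n)) ** 0.5

def kernel_att(treated, controls, inv_cov, bandwidth):
    """treated/controls are lists of (covariates, outcome) pairs.
    Returns (ATT over the matched treated, number of matched treated)."""
    effects = []
    for x_t, y_t in treated:
        # Epanechnikov kernel weights (constant dropped; it cancels in the average)
        w = [max(0.0, 1.0 - (mahalanobis(x_t, x_c, inv_cov) / bandwidth) ** 2)
             for x_c, _ in controls]
        if sum(w) > 0:  # no control within the bandwidth: the unit stays unmatched
            y0 = sum(wi * y_c for wi, (_, y_c) in zip(w, controls)) / sum(w)
            effects.append(y_t - y0)
    if not effects:
        return float("nan"), 0
    return sum(effects) / len(effects), len(effects)

inv_cov = [[1.0, 0.0], [0.0, 0.25]]   # toy inverse covariance matrix (assumed)
treated = [((0.0, 0.0), 5.0), ((10.0, 10.0), 9.0)]
controls = [((0.0, 0.0), 3.0), ((0.5, 0.0), 4.0), ((3.0, 0.0), 100.0)]
att, n_matched = kernel_att(treated, controls, inv_cov, bandwidth=1.0)
print(att, n_matched)
```

In this toy example the second treated unit has no control within the bandwidth and is left unmatched, which corresponds to the "Matched: No" column in the matching statistics; the distant control with outcome 100 receives zero weight.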

Some Examples: Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                         Raw                          Matched(ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                         Raw                          Matched(ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
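The StdDif and Ratio columns can be reproduced from the reported means and variances. The Python sketch below is an added illustration with illustrative function names; the formulas are inferred from the numbers in the tables rather than taken from the kmatch documentation. The standardized difference is the mean difference divided by the square root of the average of the two raw variances, and the matched-sample values appear to use the same raw-sample denominator. The inputs assume the collgrad and ttl_exp rows are read as shown in the comments.

```python
import math

def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Mean difference divided by the square root of the average raw variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def variance_ratio(var_t, var_c):
    """Ratio of the treated variance to the untreated variance."""
    return var_t / var_c

# collgrad, raw sample: means .321663 / .224212, variances .218674 / .174066
sd_raw = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
vr_raw = variance_ratio(0.218674, 0.174066)

# ttl_exp, matched means 13.3205 / 13.1425 with the raw variances 20.5898 / 21.0001
sd_matched = standardized_difference(13.3205, 13.1425, 20.5898, 21.0001)

print(f"{sd_raw:.4f} {vr_raw:.4f} {sd_matched:.4f}")
```

Up to rounding of the inputs, the three values agree with the table entries for collgrad (raw StdDif and Ratio) and ttl_exp (matched StdDif).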

Some Examples: Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
> bylabels("Std. mean difference" "Variance ratio")
> noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure, two panels with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), raw versus matched: standardized mean difference (left, reference line at 0) and variance ratio (right, reference line at 1)]

Some Examples: Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
> (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     431    26    457    1214     182   1396  .00188

Treatment-effects estimation

wage         Coef.
ATT       .3887224
NATE      1.432913

Some Examples: Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated; panels Raw and Matched (ATT)]

Some Examples: Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated; panels Raw and Matched (ATT)]

Some Examples: Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated; panels Raw and Matched (ATT)]

Some Examples: Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
> (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  =    50
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     432    25    457    1105     291   1396  1.3394
Untreated  1386    10   1396     455       2    457  3.3975
Combined   1818    35   1853    1560     293   1853

Treatment-effects estimation

           Observed  Bootstrap                    Normal-based
wage          Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE        .4095729   .1920853   2.13   0.033   .0330928    .7860531
ATT        .6059013   .2472069   2.45   0.014   .1213846    1.090418
ATC        .3483797   .1893653   1.84   0.066  -.0227695    .7195289
NATE       1.432913   .2333282   6.14   0.000   .9755981    1.890228

Some Examples: Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage          Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)       -.8270117   .1810415  -4.57   0.000  -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

       chi2(  1) =  2.42
     Prob > chi2 =  0.1200

Some Examples: Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     1
Treatment   : union = 1                               max =     1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     457     0    457     328    1068   1396

Treatment-effects estimation

wage         Coef.
ATT       .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested =     1
Outcome model  : matching                                 min =     1
Distance metric: Mahalanobis                              max =     1

                                  AI Robust
wage                     Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969     .2942952   2.46   0.014    .147889    1.301505

Some Examples: Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     5
Treatment   : union = 1                               max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     457     0    457     870     526   1396

Treatment-effects estimation

wage         Coef.
ATT       .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested =     5
Outcome model  : matching                                 min =     5
Distance metric: Mahalanobis                              max =     6

                                  AI Robust
wage                     Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823     .2381752   2.35   0.019   .0922675    1.025897

Some Examples: Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
> (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     5
Treatment   : union = 1                               max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     457     0    457     870     526   1396

Treatment-effects estimation

wage         Coef.
ATT       .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
> (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested =     5
Outcome model  : matching                                 min =     5
Distance metric: Mahalanobis                              max =     6

                                  AI Robust
wage                     Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023     .2420635   2.18   0.029   .0543666    1.003238

Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
> psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     439    18    457    1258     138   1396  .83886

Treatment-effects estimation

wage         Coef.
ATT       .6408443

Some Examples: Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     432    25    457    1103     293   1396  1.3013

Treatment-effects estimation

wage         Coef.
ATT       .6047374

Some Examples: Bandwidth selection, the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     432    25    457    1105     291   1396  1.3394

Treatment-effects estimation

wage         Coef.
ATT       .6059013

Some Examples: Bandwidth selection, cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     448     9    457    1184     212   1396  1.8888

Treatment-effects estimation

wage         Coef.
ATT       .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MSE) against bandwidth (1.5 to 3), with evaluation points labeled by index]

Some Examples: Bandwidth selection, cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     453     4    457    1289     107   1396   2.433

Treatment-effects estimation

wage         Coef.
ATT       .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MISE) against bandwidth (1.5 to 3), with evaluation points labeled by index]

Some Examples: Bandwidth selection, weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     455     2    457    1356      40   1396  2.7626

Treatment-effects estimation

wage         Coef.
ATT       .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (weighted MISE) against bandwidth (1 to 5), with evaluation points labeled by index]

Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     366    91    457     701     695   1396      .5

Treatment-effects estimation

wage         Coef.
ATT       .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)      Standardized difference
Means          Matched  Unmatc~d     Total   (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663   .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685   .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205   .038378  -.154356   .192734
3.industry     .002732   .021978   .006565  -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807   .019212  -.077269   .096481
5.industry     .062842   .274725   .105033  -.137462   .552867  -.690329
6.industry     .057377         0   .045952   .054507  -.219225   .273732
7.industry     .019126   .021978   .019694  -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505  -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941  -.000115   .000462  -.000577
10.industry          0   .021978   .004376  -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212    .15083  -.606636   .757467
12.industry    .092896   .241758   .122538  -.090299   .363181   -.45348
2.race         .243169   .681319   .330416  -.185284   .745209  -.930494
3.race         .002732   .076923   .017505  -.112525   .452572  -.565097
south           .29235   .318681   .297593  -.011456   .046074   -.05753

                 Common support (treated)               Ratio
Variances      Matched  Unmatc~d     Total   (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674   1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898   .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044   1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536   .418045   3.32537   .125714
4.industry     .155101   .131624   .150351   1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207   .626851   2.13854   .293122
6.industry     .054233         0   .043936   1.23435         0         .
7.industry     .018811   .021734   .019348   .972252    1.1233   .865531
8.industry      .00545   .062271   .017237   .316157   3.61269   .087513
9.industry     .010839   .010989   .010845   .999464   1.01328   .986361
10.industry          0   .021734   .004367         0   4.97709         0
11.industry    .247691    .14652   .250115   .990307   .585811   1.69049
12.industry    .084497   .185348   .107758   .784137   1.72003   .455885
2.race         .184542   .219536   .221726   .832297   .990121   .840601
3.race         .002732   .071795   .017237   .158513   4.16522   .038056
south          .207448   .219536    .20949   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples: Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference (-2 to 2, reference line at 0), one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south)]

Some Examples

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     432     25     457     1104     291    1395      1.3392

Treatment-effects estimation

                 Coef.
wage
   ATT        .6021049
   NATE       1.430823
hours
   ATT        1.263759
   NATE       1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     432     25     457     1104     291    1395      1.3392

Treatment-effects estimation

                 Coef.
wage
   ATT        .5152752
   NATE       1.430823
hours
   ATT        1.263759
   NATE       1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
0 Treated   306     15     321      625     120     745      1.3199
1 Treated   126     10     136      473     178     651      1.3398

Treatment-effects estimation

            Observed   Bootstrap                       Normal-based
wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0 ATT       .4586332    .2763358   1.66    0.097   -.082975    1.000241
1 ATT       .9518705     .406903   2.34    0.019   .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)         .4932373    .4452343   1.11    0.268   -.379406    1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis, 0 to 1) for untreated and treated. McFadden R2 = 0.121.]

Simulation

Raw mean difference in occupational prestige (NATE): −4.79.

Population ATT (computed from fully stratified data): −3.96.

There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: mean outcome (y-axis, 30 to 55) by propensity score for untreated and treated. Right panel: treatment effect (y-axis, −6 to −1) by propensity score.]
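The three population effects reported on this slide are linked by a standard identity: the ATE is the average of the ATT and the ATC, weighted by the shares of treated and untreated units. A minimal Python sketch of that arithmetic, using the figures from the simulation slides (the variable names are illustrative, not taken from the talk):

```python
# Share of the population in the treatment group (17.5%, from the simulation setup).
p_treated = 0.175

# Population effects reported on the slide.
att = -3.96  # average treatment effect on the treated
atc = -3.41  # average treatment effect on the untreated

# ATE as the share-weighted average of ATT and ATC.
ate = p_treated * att + (1 - p_treated) * atc

print(round(ate, 2))
```

The result agrees with the reported ATE up to rounding, which is a useful consistency check on the slide's numbers.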

Results: Variance

[Figure: variance of the estimators by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Panels: N = 500 and N = 5000. Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Panels: N = 500 and N = 5000. Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Panels: N = 500 and N = 5000. Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error by matching algorithm. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Panels: N = 500 and N = 5000. Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by matching algorithm. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Panels: N = 500 and N = 5000. Series: MDM and PSM, each with and without bias correction.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression-adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression-adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). Each panel plots imbalance (y-axis) against the number of units pruned (x-axis) for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked.]

Similar pattern for > 20 other real data sets we checked.

(Slide 21/23 from the slides by King and Nielsen.)


Are King and Nielsen right?

Argument 1:
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2:
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3:
- Random pruning ⇒ imbalance ⇒ more model dependence.
- PSM ⇒ complete randomization ⇒ lots of random pruning.
- PSM Paradox: "when you do 'better', you do worse".

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
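To make the "average instead of prune" idea concrete, here is a minimal Python sketch of kernel matching on the propensity score for the ATT. It is an illustration under stated assumptions, not kmatch's implementation: the function names `epan` and `kernel_att`, the toy data, and the fixed bandwidth are all hypothetical. Every control within the bandwidth of a treated unit contributes to that unit's counterfactual with a kernel weight, so good matches are averaged rather than thrown away; a treated unit is dropped only if no control lies within the bandwidth.

```python
import numpy as np

def epan(u):
    """Epanechnikov kernel; zero for |u| >= 1."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)

def kernel_att(ps, d, y, bw):
    """ATT by kernel matching on the propensity score (illustrative sketch).

    ps: propensity scores, d: treatment indicator (0/1), y: outcome,
    bw: bandwidth on the propensity-score scale.
    Returns the ATT over matched treated units and the number of
    treated units without any control inside the bandwidth.
    """
    ps, d, y = (np.asarray(a, dtype=float) for a in (ps, d, y))
    controls = np.flatnonzero(d == 0)
    effects, unmatched = [], 0
    for i in np.flatnonzero(d == 1):
        w = epan((ps[controls] - ps[i]) / bw)
        if w.sum() == 0:
            unmatched += 1  # off support: prune only where necessary
            continue
        y0_hat = np.dot(w, y[controls]) / w.sum()  # weighted counterfactual
        effects.append(y[i] - y0_hat)
    att = float(np.mean(effects)) if effects else float("nan")
    return att, unmatched

# Toy data (hypothetical): two treated units, three controls.
ps = [0.20, 0.20, 0.25, 0.80, 0.21]
d = [1, 0, 0, 1, 0]
y = [5, 3, 4, 9, 3]
att, unmatched = kernel_att(ps, d, y, bw=0.1)
print(att, unmatched)
```

In the toy data the first treated unit is compared with a weighted average of all three nearby controls, while the second treated unit (propensity score 0.80) has no control within the bandwidth and is the only observation pruned.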

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     432     25     457     1105     291    1396      1.3394

Treatment-effects estimation

wage           Coef.
ATT         .6059013
NATE        1.432913
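The output above reports "Metric: mahalanobis". As a reminder of what that distance is, here is a minimal Python sketch of the textbook Mahalanobis distance between two covariate vectors. The function name and the toy scaling matrix are hypothetical; in practice the scaling matrix is usually the sample covariance matrix of the covariates, and how kmatch builds it internally is not shown in the slides.

```python
import numpy as np

def mahalanobis(x_i, x_j, cov):
    """Mahalanobis distance between two covariate vectors.

    cov is the scaling matrix (typically the covariance matrix of X).
    """
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Toy example (hypothetical values): two covariates with variances 4 and 1.
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])
dist = mahalanobis([2.0, 1.0], [0.0, 0.0], cov)
print(dist)
```

With a diagonal scaling matrix the distance reduces to a Euclidean distance on standardized covariates: a gap of 2 on the first covariate (standard deviation 2) counts the same as a gap of 1 on the second (standard deviation 1).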

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                        Raw                        Matched (ATT)
Means          Treated  Untrea~d   StdDif    Treated  Untrea~d   StdDif

collgrad       .321663   .224212   .219912   .319444   .319444        0
ttl_exp        13.2685   12.7323   .117584   13.3205   13.1425  .039036
tenure         7.89205   6.17658    .29735   7.91744   7.58347  .057888
3.industry     .006565   .012178  -.058246    .00463    .00463        0
4.industry     .183807   .166905   .044425   .185185   .185185        0
5.industry     .105033   .027937   .312944   .085648   .085648        0
6.industry     .045952   .169771  -.407129   .048611   .048611        0
7.industry     .019694   .102436  -.350657   .020833   .020833        0
8.industry     .017505   .035817  -.113785   .009259   .009259        0
9.industry     .010941   .040115  -.185669   .011574   .011574        0
10.industry    .004376   .008596  -.052551   .002315   .002315        0
11.industry    .479212   .356734   .250073   .506944   .506944        0
12.industry    .122538    .07235   .169707    .12037    .12037        0
2.race         .330416   .244986   .189418     .3125     .3125        0
3.race         .017505   .011461   .050566   .006944   .006944        0
south          .297593   .466332  -.352408   .291667   .291667        0

                        Raw                        Matched (ATT)
Variances      Treated  Untrea~d    Ratio    Treated  Untrea~d    Ratio

collgrad       .218674   .174066   1.25628   .217904   .217904        1
ttl_exp        20.5898   21.0001   .980459   19.8177   18.2323  1.08696
tenure         37.2044   29.3629   1.26706   37.0399   34.9543  1.05966
3.industry     .006536   .012038   .542928   .004619   .004619        1
4.industry     .150351   .139148   1.08052   .151242   .151242        1
5.industry     .094207   .027176   3.46656   .078494   .078494        1
6.industry     .043936    .14105   .311496   .046355   .046355        1
7.industry     .019348   .092008   .210287   .020447   .020447        1
8.industry     .017237   .034559   .498769   .009195   .009195        1
9.industry     .010845   .038533   .281445   .011467   .011467        1
10.industry    .004367   .008528   .512039   .002315   .002315        1
11.industry    .250115   .229639   1.08917   .250532   .250532        1
12.industry    .107758   .067163   1.60443   .106127   .106127        1
2.race         .221726     .1851   1.19787   .215342   .215342        1
3.race         .017237   .011338   1.52025   .006912   .006912        1
south           .20949   .249045   .841173   .207077   .207077        1
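The StdDif and Ratio columns in the raw-sample part of these tables can be reproduced by hand. The Python sketch below assumes the standardized difference is the difference in means divided by the square root of the average of the two group variances; this formula is inferred from the reported numbers, not taken from the kmatch documentation, and the function names are hypothetical.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, scaled by the average group variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def variance_ratio(var_t, var_c):
    """Ratio of the treated-group variance to the untreated-group variance."""
    return var_t / var_c

# Raw-sample values for collgrad, copied from the tables above.
collgrad_std_diff = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
collgrad_var_ratio = variance_ratio(0.218674, 0.174066)

print(round(collgrad_std_diff, 4), round(collgrad_var_ratio, 4))
```

Both values agree with the raw collgrad row of the tables up to rounding. A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is what the matched columns show.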

Some Examples

Make a graph of the balancing stats:

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure, two panels: standardized mean difference (reference line at 0) and variance ratio (reference line at 1) for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), raw vs. matched.]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     431     26     457     1214     182    1396      .00188

Treatment-effects estimation

wage           Coef.
ATT         .3887224
NATE        1.432913

Some Examples

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure, two panels (Raw, Matched (ATT)): kernel density of the propensity score for untreated and treated.]

Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): cumulative probability of the propensity score for untreated and treated.]

Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): box plots of the propensity score for untreated and treated.]

Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     432     25     457     1105     291    1396      1.3394
Untreated  1386     10    1396      455       2     457      3.3975
Combined   1818     35    1853     1560     293    1853

Treatment-effects estimation

            Observed   Bootstrap                       Normal-based
wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE         .4095729    .1920853   2.13    0.033   .0330928    .7860531
ATT         .6059013    .2472069   2.45    0.014   .1213846    1.090418
ATC         .3483797    .1893653   1.84    0.066  -.0227695    .7195289
NATE        1.432913    .2333282   6.14    0.000   .9755981    1.890228
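The z statistic, p-value, and normal-based confidence interval in this table follow directly from the coefficient and its bootstrap standard error. A minimal Python sketch of that calculation for the ATT row (the function name `wald_summary` is hypothetical; the inputs are copied from the output above):

```python
from statistics import NormalDist

def wald_summary(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return z, p, coef - crit * se, coef + crit * se

# ATT row of the table above: coefficient and bootstrap standard error.
z, p, ci_lo, ci_hi = wald_summary(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), round(ci_lo, 4), round(ci_hi, 4))
```

The values match the ATT row up to rounding. The interval is only as good as the bootstrap standard error that feeds it, which matters for the later point that bootstrap standard errors are biased for nearest-neighbor matching.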

Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)        -.8270117    .1810415  -4.57    0.000  -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 1
                                                      max = 1
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     457      0     457      328    1068    1396

Treatment-effects estimation

wage           Coef.
ATT         .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                                   AI Robust
wage                     Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .7246969    .2942952   2.46    0.014    .147889    1.301505

Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
                                                      max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     457      0     457      870     526    1396

Treatment-effects estimation

wage           Coef.
ATT         .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                   AI Robust
wage                     Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .5590823    .2381752   2.35    0.019   .0922675    1.025897

Some Examples

Bias-correction (regression adjustment):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
                                                      max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     457      0     457      870     526    1396

Treatment-effects estimation

wage           Coef.
ATT         .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                   AI Robust
wage                     Coef.     Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .5288023    .2420635   2.18    0.029   .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage           Coef.
ATT         .6408443

Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     432     25     457     1103     293    1396      1.3013

Treatment-effects estimation

wage           Coef.
ATT         .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     432     25     457     1105     291    1396      1.3394

Treatment-effects estimation

wage           Coef.
ATT         .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     448      9     457     1184     212    1396      1.8888

Treatment-effects estimation

wage           Coef.
ATT         .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE (y-axis) against bandwidth (x-axis); points are labeled with their evaluation index.]

Some Examples

Bandwidth selection: cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     453      4     457     1289     107    1396       2.433

Treatment-effects estimation

wage           Coef.
ATT         .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE (y-axis) against bandwidth (x-axis); points are labeled with their evaluation index.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls
            Yes     No   Total     Used  Unused   Total   Bandwidth
Treated     455      2     457     1356      40    1396      2.7626

Treatment-effects estimation

wage           Coef.
ATT         .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE (y-axis) against bandwidth (x-axis); points are labeled with their evaluation index.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     366    91    457      701     695   1396        .5

Treatment-effects estimation

wage       Coef.
ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): means
(1) matched, (2) unmatched, (3) total

                          Means                 Standardized difference
               Matched  Unmatched     Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404    .318681   .321663    .001585   -.006376    .007962
ttl_exp        13.3929    12.7682   13.2685    .027413   -.110253    .137666
tenure         8.12614    6.95055   7.89205    .038378   -.154356    .192734
3.industry     .002732    .021978   .006565   -.047404    .190657   -.238061
4.industry     .191257    .153846   .183807    .019212   -.077269    .096481
5.industry     .062842    .274725   .105033   -.137462    .552867   -.690329
6.industry     .057377          0   .045952    .054507   -.219225    .273732
7.industry     .019126    .021978   .019694   -.004083    .016423   -.020506
8.industry     .005464    .065934   .017505   -.091714    .368871   -.460585
9.industry     .010929    .010989   .010941   -.000115    .000462   -.000577
10.industry          0    .021978   .004376   -.066227    .266363   -.332589
11.industry    .554645    .175824   .479212     .15083   -.606636    .757467
12.industry    .092896    .241758   .122538   -.090299    .363181    -.45348
2.race         .243169    .681319   .330416   -.185284    .745209   -.930494
3.race         .002732    .076923   .017505   -.112525    .452572   -.565097
south           .29235    .318681   .297593   -.011456    .046074    -.05753

Common support (treated): variances
(1) matched, (2) unmatched, (3) total

                        Variances                       Ratio
               Matched  Unmatched     Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058    .219536   .218674    1.00176    1.00394    .997824
ttl_exp        19.4198    25.2474   20.5898    .943177    1.22621     .76918
tenure         38.3324    31.9242   37.2044    1.03032    .858076    1.20073
3.industry     .002732    .021734   .006536    .418045    3.32537    .125714
4.industry     .155101    .131624   .150351    1.03159    .875443    1.17837
5.industry     .059054    .201465   .094207    .626851    2.13854    .293122
6.industry     .054233          0   .043936    1.23435          0          .
7.industry     .018811    .021734   .019348    .972252     1.1233    .865531
8.industry      .00545    .062271   .017237    .316157    3.61269    .087513
9.industry     .010839    .010989   .010845    .999464    1.01328    .986361
10.industry          0    .021734   .004367          0    4.97709          0
11.industry    .247691     .14652   .250115    .990307    .585811    1.69049
12.industry    .084497    .185348   .107758    .784137    1.72003    .455885
2.race         .184542    .219536   .221726    .832297    .990121    .840601
3.race         .002732    .071795   .017237    .158513    4.16522    .038056
south          .207448    .219536    .20949    .990254    1.04796    .944939

Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences in column 4 of M, one row per covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); x-axis: Std. difference, with a reference line at 0]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     432    25    457     1104     291   1395    1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     432    25    457     1104     291   1395    1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                    Matched               Controls          Band-
               Yes    No  Total     Used  Unused  Total     width
0  Treated     306    15    321      625     120    745    1.3199
1  Treated     126    10    136      473     178    651    1.3398

Treatment-effects estimation

            Observed   Bootstrap                     Normal-based
wage           Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
0  ATT      .4586332    .2763358   1.66   0.097    -.082975   1.000241
1  ATT      .9518705     .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage           Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)         .4932373    .4452343   1.11   0.268    -.379406   1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values range from 6 to 78, with a mean of 44.

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to working people between 24 and 60 years old.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
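The performance measures reported for such a simulation can be sketched as follows. This is a minimal Python illustration, not code from the talk: the helper name summarize is hypothetical, the estimates and standard errors are made up, and the definition of "bias reduction" (percentage of the raw bias NATE - ATT that is removed) is an assumption about how that measure is computed. The values for true_att and nate are the population ATT and NATE stated for this simulation.

```python
from statistics import NormalDist, fmean, pvariance

def summarize(estimates, std_errors, true_att, nate):
    """Monte Carlo performance measures for one estimator (hypothetical helper)."""
    bias = fmean(estimates) - true_att
    variance = pvariance(estimates)
    # Assumed definition: percentage of the raw bias (NATE - ATT) that is removed.
    bias_reduction = 100 * (1 - bias / (nate - true_att))
    # Average estimated standard error relative to the true sampling SD.
    relative_se = fmean(std_errors) / variance ** 0.5
    # Share of normal-based 95% intervals that contain the true ATT.
    crit = NormalDist().inv_cdf(0.975)
    hits = [abs(e - true_att) <= crit * s for e, s in zip(estimates, std_errors)]
    return {
        "variance": variance,
        "bias_reduction": bias_reduction,
        "mse": variance + bias ** 2,
        "relative_se": relative_se,
        "coverage": sum(hits) / len(hits),
    }

# Made-up estimates and standard errors; only ATT and NATE come from the slides.
result = summarize([-4.2, -3.8, -4.0, -3.9], [0.2] * 4, true_att=-3.96, nate=-4.79)
```

A bias reduction of 100 would mean that the average estimate equals the population ATT; a relative standard error of 1 would mean that the estimated standard errors match the true sampling variability on average.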

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: outcome by propensity score for untreated and treated. Right panel: treatment effect by propensity score.]

Results: Variance

[Figure: variance by estimator, in panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, with bias correction.]

This slide shows that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small, and additional post-matching regression adjustment reduces them further.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent by estimator, in panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, with bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error by estimator, in panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, with bias correction.]

Results: Relative standard error

[Figure: relative standard error by estimator, in panels for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by estimator, in panels for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, with bias correction.]


Conclusions

The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error and confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
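The efficiency claim for a given sample size can be checked by brute force on a tiny example. The following Python sketch enumerates every possible treatment assignment for four hypothetical units whose outcome depends strongly on X and whose treatment effect is zero; all data are invented for illustration.

```python
from itertools import combinations, product
from statistics import fmean, pvariance

# Toy population: X defines two blocks, the outcome depends strongly on X,
# and the treatment effect is zero for every unit.
x = [0, 0, 1, 1]
y = [1.0, 2.0, 11.0, 12.0]

def diff_in_means(treated):
    """Difference in mean outcomes between treated and control units."""
    controls = [i for i in range(len(y)) if i not in treated]
    return fmean(y[i] for i in treated) - fmean(y[i] for i in controls)

# Complete randomization: any 2 of the 4 units may be treated.
complete = [diff_in_means(t) for t in combinations(range(4), 2)]

# Fully blocked randomization: exactly one treated unit per level of X.
blocks = [[i for i in range(4) if x[i] == level] for level in (0, 1)]
blocked = [diff_in_means(t) for t in product(*blocks)]

var_complete = pvariance(complete)  # variance over all 6 assignments
var_blocked = pvariance(blocked)    # variance over all 4 assignments
```

Both designs are unbiased here (the estimates average to zero), but the estimator varies far less under blocking because assignments that put both high-X units in the same group are ruled out.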

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
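The difference between an algorithm that throws away good matches and one that does not can be illustrated on invented propensity scores. This Python sketch compares greedy one-to-one matching without replacement to radius matching with replacement; the function names, the scores, and the bandwidth of 0.025 are all hypothetical and are not taken from kmatch or from King and Nielsen.

```python
treated_ps = [0.30, 0.31, 0.70]
control_ps = [0.29, 0.30, 0.30, 0.31, 0.32, 0.69, 0.95]

def pair_match(treated, controls):
    """Greedy one-to-one matching without replacement; returns used control indices."""
    available = set(range(len(controls)))
    used = []
    for p in treated:
        # Closest control still available; ties are broken by index.
        best = min(available, key=lambda j: (abs(controls[j] - p), j))
        available.remove(best)
        used.append(best)
    return used

def radius_match(treated, controls, bandwidth):
    """For each treated unit, keep every control within the bandwidth."""
    return [[j for j, c in enumerate(controls) if abs(c - p) <= bandwidth]
            for p in treated]

pairs = pair_match(treated_ps, control_ps)
radius = radius_match(treated_ps, control_ps, bandwidth=0.025)
used_by_radius = sorted({j for matches in radius for j in matches})
```

Pair matching uses only three of the seven controls and discards several that are as close as the ones it keeps, whereas radius matching averages over all six nearby controls and drops only the control at 0.95, which lies outside the range of the treated.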

What is true is that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     432    25    457     1105     291   1396    1.3394

Treatment-effects estimation

wage       Coef.
ATT     .6059013
NATE    1.432913
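The idea behind Mahalanobis-distance kernel matching with an Epanechnikov kernel can be sketched in a few lines of Python. This is a conceptual illustration on invented data with two covariates, not the kmatch implementation: the function names, the identity scaling matrix, and the bandwidth of 1.0 are assumptions made for the example.

```python
def epanechnikov(u):
    """Epanechnikov kernel, up to a constant that cancels in the weights."""
    return max(0.0, 1.0 - u * u)

def mahalanobis(a, b, inv_cov):
    """Mahalanobis distance between two points with two covariates."""
    d0, d1 = a[0] - b[0], a[1] - b[1]
    q = (d0 * (inv_cov[0][0] * d0 + inv_cov[0][1] * d1)
         + d1 * (inv_cov[1][0] * d0 + inv_cov[1][1] * d1))
    return q ** 0.5

def kernel_att(treated, controls, inv_cov, bandwidth):
    """ATT over treated units with at least one control inside the bandwidth."""
    effects = []
    for x_t, y_t in treated:
        weights = [epanechnikov(mahalanobis(x_t, x_c, inv_cov) / bandwidth)
                   for x_c, _ in controls]
        total = sum(weights)
        if total == 0:
            continue  # no control within the bandwidth: unit is left unmatched
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects), len(effects)

# Invented data: ((x1, x2), outcome). The third treated unit has no nearby control.
treated = [((0.0, 0.0), 5.0), ((1.0, 1.0), 7.0), ((9.0, 9.0), 20.0)]
controls = [((0.0, 0.0), 4.0), ((0.5, 0.0), 6.0), ((1.0, 1.0), 6.0)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed inverse covariance matrix

att, n_matched = kernel_att(treated, controls, identity, bandwidth=1.0)
```

Each matched treated unit is compared with a kernel-weighted average of the controls inside the bandwidth, and treated units without any such control are counted as unmatched, which corresponds to the "Matched: No" column in the output above.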

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

Means
                          Raw                         Matched (ATT)
               Treated  Untreated    StdDif    Treated  Untreated    StdDif
collgrad       .321663    .224212   .219912    .319444    .319444         0
ttl_exp        13.2685    12.7323   .117584    13.3205    13.1425   .039036
tenure         7.89205    6.17658    .29735    7.91744    7.58347   .057888
3.industry     .006565    .012178  -.058246     .00463     .00463         0
4.industry     .183807    .166905   .044425    .185185    .185185         0
5.industry     .105033    .027937   .312944    .085648    .085648         0
6.industry     .045952    .169771  -.407129    .048611    .048611         0
7.industry     .019694    .102436  -.350657    .020833    .020833         0
8.industry     .017505    .035817  -.113785    .009259    .009259         0
9.industry     .010941    .040115  -.185669    .011574    .011574         0
10.industry    .004376    .008596  -.052551    .002315    .002315         0
11.industry    .479212    .356734   .250073    .506944    .506944         0
12.industry    .122538     .07235   .169707     .12037     .12037         0
2.race         .330416    .244986   .189418      .3125      .3125         0
3.race         .017505    .011461   .050566    .006944    .006944         0
south          .297593    .466332  -.352408    .291667    .291667         0

Variances
                          Raw                         Matched (ATT)
               Treated  Untreated     Ratio    Treated  Untreated     Ratio
collgrad       .218674    .174066   1.25628    .217904    .217904         1
ttl_exp        20.5898    21.0001   .980459    19.8177    18.2323   1.08696
tenure         37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
3.industry     .006536    .012038   .542928    .004619    .004619         1
4.industry     .150351    .139148   1.08052    .151242    .151242         1
5.industry     .094207    .027176   3.46656    .078494    .078494         1
6.industry     .043936     .14105   .311496    .046355    .046355         1
7.industry     .019348    .092008   .210287    .020447    .020447         1
8.industry     .017237    .034559   .498769    .009195    .009195         1
9.industry     .010845    .038533   .281445    .011467    .011467         1
10.industry    .004367    .008528   .512039    .002315    .002315         1
11.industry    .250115    .229639   1.08917    .250532    .250532         1
12.industry    .107758    .067163   1.60443    .106127    .106127         1
2.race         .221726      .1851   1.19787    .215342    .215342         1
3.race         .017237    .011338   1.52025    .006912    .006912         1
south           .20949    .249045   .841173    .207077    .207077         1
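The raw balancing statistics in the table are consistent with a common definition of the standardized difference (mean difference divided by the square root of the average of the two group variances) and with a plain ratio of variances. The Python sketch below recomputes the raw collgrad and ttl_exp entries from the reported means and variances; that kmatch uses exactly this definition is an assumption, supported here only by the agreement of the numbers.

```python
def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized difference in means, scaled by the average group variance."""
    return (mean_t - mean_c) / ((var_t + var_c) / 2) ** 0.5

def var_ratio(var_t, var_c):
    """Ratio of the treated variance to the untreated variance."""
    return var_t / var_c

# Raw means and variances as reported in the table (treated, untreated).
sd_collgrad = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = var_ratio(0.218674, 0.174066)
sd_ttl_exp = std_diff(13.2685, 12.7323, 20.5898, 21.0001)
```

Good balance corresponds to standardized differences close to 0 and variance ratios close to 1, which is what the Matched (ATT) columns show for most covariates.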

Some Examples

Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: two-panel coefficient plot, one row per covariate (collgrad, ttl_exp, tenure, industry and race indicators, south). Left panel: Std. mean difference, with a reference line at 0. Right panel: Variance ratio, with a reference line at 1. Legend: Raw, Matched.]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan

Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics

                 Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     431    26    457     1214     182   1396    .00188

Treatment-effects estimation

wage       Coef.
ATT     .3887224
NATE    1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel densities of the propensity score for untreated and treated, in panels Raw and Matched (ATT)]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated, in panels Raw and Matched (ATT)]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, in panels Raw and Matched (ATT)]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched               Controls          Band-
              Yes    No  Total     Used  Unused  Total     width
Treated       432    25    457     1105     291   1396    1.3394
Untreated    1386    10   1396      455       2    457    3.3975
Combined     1818    35   1853     1560     293   1853

Treatment-effects estimation

            Observed   Bootstrap                     Normal-based
wage           Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE         .4095729    .1920853   2.13   0.033    .0330928   .7860531
ATT         .6059013    .2472069   2.45   0.014    .1213846   1.090418
ATC         .3483797    .1893653   1.84   0.066   -.0227695   .7195289
NATE        1.432913    .2333282   6.14   0.000    .9755981   1.890228
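The three estimands in this table are linked: the reported ATE is reproduced by averaging ATT and ATC with weights equal to the shares of matched treated and matched untreated observations. The short Python check below uses only numbers from the output above; that this is how the ATE is assembled internally is an inference from the arithmetic, not a documented statement.

```python
# Point estimates and matched counts as reported in the output above.
att, atc = 0.6059013, 0.3483797
n_treated_matched, n_untreated_matched = 432, 1386

# Share of matched treated observations among all matched observations.
share_treated = n_treated_matched / (n_treated_matched + n_untreated_matched)

# Weighted average of the two conditional effects.
ate = share_treated * att + (1 - share_treated) * atc
```

The result agrees with the reported ATE of .4095729 to the printed precision, and it also shows why the ATE sits much closer to the ATC: about three quarters of the matched observations are untreated.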

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage           Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)        -.8270117    .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment  : union = 1                                max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     457     0    457      328    1068   1396

Treatment-effects estimation

wage       Coef.
ATT     .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                                    AI Robust
wage                       Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)   .7246969     .2942952   2.46   0.014     .147889   1.301505
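The z statistic, p-value, and normal-based confidence interval in such a table follow directly from the coefficient and its standard error. The Python sketch below recomputes them for the teffects estimate above; the helper name normal_inference is hypothetical.

```python
from statistics import NormalDist

def normal_inference(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return z, p, (coef - crit * se, coef + crit * se)

# Coefficient and AI robust standard error from the teffects output above.
z, p, (lower, upper) = normal_inference(0.7246969, 0.2942952)
```

The same arithmetic applies to the bootstrap tables produced with vce(bootstrap), where the standard error is the standard deviation of the estimates across replications.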

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     457     0    457      870     526   1396

Treatment-effects estimation

wage       Coef.
ATT     .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                    AI Robust
wage                       Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)   .5590823     .2381752   2.35   0.019    .0922675   1.025897

Some Examples

Bias correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     457     0    457      870     526   1396

Treatment-effects estimation

wage       Coef.
ATT     .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                    AI Robust
wage                       Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)   .5288023     .2420635   2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan

Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

                 Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     439    18    457     1258     138   1396    .83886

Treatment-effects estimation

wage       Coef.
ATT     .6408443

Some Examples

Exact matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching
  Number of obs = 1,853
  Kernel        = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   432    25     457      1105     291    1396   1.3394

Treatment-effects estimation
  wage    Coef.
  ATT     .6059013
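
The slide only states that the default bandwidth is based on the distribution of distances in one-nearest-neighbor matching. One way such a rule could work is sketched below in Python on a scalar matching score. The function name, the 90% quantile and the factor 1.5 are assumptions made for illustration; they are not taken from the slides and should be checked against the kmatch help file.

```python
import math

def pair_matching_bandwidth(treated, controls, quantile=0.90, factor=1.5):
    """Bandwidth from the distribution of nearest-neighbor distances.

    quantile and factor are illustrative assumptions, not documented defaults.
    """
    # Distance from each treated unit to its single nearest control
    # (matching with replacement, so a control can be reused).
    nearest = sorted(min(abs(c - t) for c in controls) for t in treated)
    # Empirical quantile: smallest distance that covers `quantile` of the units.
    k = max(0, math.ceil(quantile * len(nearest)) - 1)
    return factor * nearest[k]

# Hypothetical scores for four treated and four control units.
treated_scores = [0.20, 0.40, 0.60, 0.80]
control_scores = [0.25, 0.38, 0.50, 0.90]
bandwidth = pair_matching_bandwidth(treated_scores, control_scores)
print(bandwidth)
```

The idea is that a bandwidth slightly wider than most nearest-neighbor distances lets almost every treated unit find at least one control, while still excluding distant controls.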

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching
  Number of obs = 1,853
  Kernel        = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   448     9     457      1184     212    1396   1.8888

Treatment-effects estimation
  wage    Coef.
  ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; markers are labeled with the index of the evaluated bandwidth.]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching
  Number of obs = 1,853
  Kernel        = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   453     4     457      1289     107    1396    2.433

Treatment-effects estimation
  wage    Coef.
  ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; markers are labeled with the index of the evaluated bandwidth.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching
  Number of obs = 1,853
  Kernel        = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   455     2     457      1356      40    1396   2.7626

Treatment-effects estimation
  wage    Coef.
  ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; markers are labeled with the index of the evaluated bandwidth.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching
  Number of obs = 1,853
  Kernel        = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   366    91     457       701     695    1396       .5

Treatment-effects estimation
  wage    Coef.
  ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                Common support (treated)           Standardized difference
Means           Matched   Unmatched   Total       (1)-(3)    (2)-(3)    (1)-(2)
collgrad        .322404   .318681     .321663     .001585    -.006376   .007962
ttl_exp         13.3929   12.7682     13.2685     .027413    -.110253   .137666
tenure          8.12614   6.95055     7.89205     .038378    -.154356   .192734
3.industry      .002732   .021978     .006565     -.047404   .190657    -.238061
4.industry      .191257   .153846     .183807     .019212    -.077269   .096481
5.industry      .062842   .274725     .105033     -.137462   .552867    -.690329
6.industry      .057377   0           .045952     .054507    -.219225   .273732
7.industry      .019126   .021978     .019694     -.004083   .016423    -.020506
8.industry      .005464   .065934     .017505     -.091714   .368871    -.460585
9.industry      .010929   .010989     .010941     -.000115   .000462    -.000577
10.industry     0         .021978     .004376     -.066227   .266363    -.332589
11.industry     .554645   .175824     .479212     .15083     -.606636   .757467
12.industry     .092896   .241758     .122538     -.090299   .363181    -.45348
2.race          .243169   .681319     .330416     -.185284   .745209    -.930494
3.race          .002732   .076923     .017505     -.112525   .452572    -.565097
south           .29235    .318681     .297593     -.011456   .046074    -.05753

                Common support (treated)           Ratio
Variances       Matched   Unmatched   Total       (1)/(3)    (2)/(3)    (1)/(2)
collgrad        .219058   .219536     .218674     1.00176    1.00394    .997824
ttl_exp         19.4198   25.2474     20.5898     .943177    1.22621    .76918
tenure          38.3324   31.9242     37.2044     1.03032    .858076    1.20073
3.industry      .002732   .021734     .006536     .418045    3.32537    .125714
4.industry      .155101   .131624     .150351     1.03159    .875443    1.17837
5.industry      .059054   .201465     .094207     .626851    2.13854    .293122
6.industry      .054233   0           .043936     1.23435    0          .
7.industry      .018811   .021734     .019348     .972252    1.1233     .865531
8.industry      .00545    .062271     .017237     .316157    3.61269    .087513
9.industry      .010839   .010989     .010845     .999464    1.01328    .986361
10.industry     0         .021734     .004367     0          4.97709    0
11.industry     .247691   .14652      .250115     .990307    .585811    1.69049
12.industry     .084497   .185348     .107758     .784137    1.72003    .455885
2.race          .184542   .219536     .221726     .832297    .990121    .840601
3.race          .002732   .071795     .017237     .158513    4.16522    .038056
south           .207448   .219536     .20949      .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total

Some Examples

. // make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference", showing the standardized difference between matched and all treated observations, column (1)-(3), for collgrad, ttl_exp, tenure, the industry and race indicators, and south, with a reference line at 0.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching
  Number of obs = 1,852
  Kernel        = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   432    25     457      1104     291    1395   1.3392

Treatment-effects estimation
            Coef.
  wage
    ATT     .6021049
    NATE    1.430823
  hours
    ATT     1.263759
    NATE    1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching
  Number of obs = 1,852
  Kernel        = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   432    25     457      1104     291    1395   1.3392

Treatment-effects estimation
            Coef.
  wage
    ATT     .5152752
    NATE    1.430823
  hours
    ATT     1.263759
    NATE    1.450303
  wage:  adjusted for collgrad ttl_exp tenure
  hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)

Multivariate-distance kernel matching
  Number of obs = 1,853
  Replications  = 50
  Kernel        = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race
  0: south = 0
  1: south = 1

Matching statistics
              Matched                 Controls               Band-
              Yes    No   Total      Used  Unused   Total    width
  0 Treated   306    15     321       625     120     745   1.3199
  1 Treated   126    10     136       473     178     651   1.3398

Treatment-effects estimation
            Observed   Bootstrap                   Normal-based
  wage      Coef.      Std. Err.   z      P>|z|    [95% Conf. Interval]
  0 ATT     .4586332   .2763358    1.66   0.097    -.082975   1.000241
  1 ATT     .9518705   .406903     2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
       chi2(  1) = 1.23
     Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0

  wage      Coef.      Std. Err.   z      P>|z|    [95% Conf. Interval]
  (1)       .4932373   .4452343    1.11   0.268    -.379406   1.365881
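
The test and lincom results above carry the same information: for a single linear restriction, the Wald chi-squared statistic is the square of the z statistic. The Python sketch below recomputes both from the reported estimates. It assumes that the extraction dropped the decimal points (so that 4586332 stands for .4586332, and so on); the variable names are illustrative.

```python
import math

# Reported estimates, with decimal points restored (an assumption).
att_south0 = 0.4586332   # [0]ATT
att_south1 = 0.9518705   # [1]ATT
se_diff = 0.4452343      # bootstrap standard error of the difference (lincom)

diff = att_south1 - att_south0
z = diff / se_diff
chi2 = z * z             # Wald statistic with 1 degree of freedom
# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2.0))

print(round(diff, 7), round(z, 2), round(chi2, 2), round(p_value, 3))
```

Under that reading, the recomputed difference, z statistic and chi-squared statistic agree with the logged values, which supports the restored decimal points.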

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated observations; McFadden R2 = 0.121.]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79.
Population ATT (computed from fully stratified data): -3.96.
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
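
The three population effects are linked: the ATE is the average of the ATT and the ATC, weighted by the shares of treated and untreated units. The short Python check below reproduces the reported ATE from the other two numbers. It assumes that the treated share is 17.5% and that the effects read -3.96, -3.41 and -3.51 once the dropped decimal points are restored.

```python
# Population quantities from the slides, decimal points restored (an assumption).
share_treated = 0.175   # resident aliens as a share of the population
att = -3.96             # average effect on the treated
atc = -3.41             # average effect on the untreated

# ATE as the share-weighted average of ATT and ATC.
ate = share_treated * att + (1.0 - share_treated) * atc
print(round(ate, 2))
```

The weighted average rounds to the ATE stated on the slide, so the three figures are mutually consistent under this reading.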

[Figures: (left) mean outcome by propensity score for untreated and treated observations; (right) treatment effect by propensity score.]

Results: Variance

[Figure: variance of the estimators, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]


In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]


Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References I

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

References II

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
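
The efficiency claim for a fixed sample size can be checked exactly on a toy population. The Python sketch below enumerates every possible assignment under complete randomization and under fully blocked randomization and compares the variance of the difference-in-means estimator. The four units, their outcomes and the block labels are hypothetical; there is no treatment effect, and X (the block) is strongly related to Y.

```python
from itertools import combinations, product
from statistics import mean, pvariance

# Hypothetical units: (block label X, outcome Y); Y does not depend on treatment.
blocks = {"low": [1.0, 2.0], "high": [10.0, 11.0]}
units = [(x, y) for x, ys in blocks.items() for y in ys]

def diff_in_means(treated_idx):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [units[i][1] for i in treated_idx]
    control = [units[i][1] for i in range(len(units)) if i not in treated_idx]
    return mean(treated) - mean(control)

# Complete randomization: any 2 of the 4 units may be treated.
complete = [diff_in_means(set(c)) for c in combinations(range(len(units)), 2)]

# Fully blocked randomization: exactly one treated unit per block.
index_by_block = {x: [i for i, u in enumerate(units) if u[0] == x] for x in blocks}
blocked = [diff_in_means(set(c)) for c in product(*index_by_block.values())]

var_complete = pvariance(complete)
var_blocked = pvariance(blocked)
print(var_complete, var_blocked)
```

Both designs are unbiased here, but blocking removes the assignments in which both "high" units end up in the same group, so the variance drops from about 27.3 to 0.5. Making Y less dependent on the block shrinks that gap, which is the point about the strength of the relation between X and Y.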

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
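
The two bullet points can be made concrete with a propensity score computed from fully stratified data, that is, the share of treated units within each covariate cell. The Python sketch below uses hypothetical data and an illustrative function name; it only counts how many distinct score values (blocks) result.

```python
from collections import defaultdict

def stratified_propensity(rows):
    """Share of treated units per covariate cell; rows are (cell, treated) pairs."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for cell, treated in rows:
        n[cell] += 1
        n_treated[cell] += int(treated)
    return {cell: n_treated[cell] / n[cell] for cell in n}

# Hypothetical data: X unrelated to T (same treated share in both cells).
unrelated = [("a", 1), ("a", 0), ("a", 1), ("a", 0), ("b", 1), ("b", 0)]
# Hypothetical data: X strongly related to T.
related = [("a", 1), ("a", 1), ("a", 1), ("a", 0),
           ("b", 1), ("b", 0), ("b", 0), ("b", 0)]

ps_unrelated = stratified_propensity(unrelated)
ps_related = stratified_propensity(related)
n_blocks_unrelated = len(set(ps_unrelated.values()))
n_blocks_related = len(set(ps_related.values()))
print(ps_unrelated, ps_related)
```

In the first data set every unit has the same score, so matching on the score cannot distinguish the cells (complete randomization). In the second, the score separates the cells, so matching on it blocks on X.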

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence.
- PSM ⇒ complete randomization ⇒ lots of random pruning.
- PSM Paradox: "when you do 'better,' you do worse."

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation; although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
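
The contrast between an algorithm that throws away good matches and one that averages over them can be sketched in a few lines of Python. The code below compares greedy one-to-one matching without replacement with Epanechnikov kernel matching on a scalar score. The data, the bandwidth and the function names are hypothetical, and the implementation is a simplified illustration, not the algorithm used by kmatch.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive for |u| < 1, zero otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def pair_match(treated, controls):
    """Greedy one-to-one nearest-neighbor matching without replacement."""
    available = set(range(len(controls)))
    effects, used = [], set()
    for score_t, y_t in treated:
        if not available:
            break
        j = min(available, key=lambda i: abs(controls[i][0] - score_t))
        available.remove(j)
        used.add(j)
        effects.append(y_t - controls[j][1])
    return sum(effects) / len(effects), len(effects), used

def kernel_match(treated, controls, bandwidth):
    """Kernel matching: weighted average of all controls within the bandwidth."""
    effects, used = [], set()
    for score_t, y_t in treated:
        w = [epanechnikov((s - score_t) / bandwidth) for s, _ in controls]
        if sum(w) == 0.0:
            continue  # no control within the bandwidth: off common support
        y0_hat = sum(wi * y for wi, (_, y) in zip(w, controls)) / sum(w)
        used.update(i for i, wi in enumerate(w) if wi > 0.0)
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects), len(effects), used

# Hypothetical (score, outcome) pairs.
treated = [(0.50, 3.0), (0.60, 3.2)]
controls = [(0.47, 2.0), (0.51, 2.2), (0.58, 1.8), (0.63, 2.4), (0.95, 9.0)]

pair_att, pair_n, pair_used = pair_match(treated, controls)
kernel_att, kernel_n, kernel_used = kernel_match(treated, controls, bandwidth=0.1)
print(len(pair_used), len(kernel_used))
```

Pair matching keeps only two controls and discards two others that are almost as close, whereas kernel matching averages over all four nearby controls and drops only the distant one. That is the sense in which kernel matching prunes where necessary and averages elsewhere.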

True is that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching
  Number of obs = 1,853
  Kernel        = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   432    25     457      1105     291    1396   1.3394

Treatment-effects estimation
  wage    Coef.
  ATT     .6059013
  NATE    1.432913

Some Examples

. // some balancing statistics
. kmatch summarize
(refitting the model using the generate() option)

                Raw                                Matched (ATT)
Means           Treated   Untreated   StdDif      Treated   Untreated   StdDif
collgrad        .321663   .224212     .219912     .319444   .319444     0
ttl_exp         13.2685   12.7323     .117584     13.3205   13.1425     .039036
tenure          7.89205   6.17658     .29735      7.91744   7.58347     .057888
3.industry      .006565   .012178     -.058246    .00463    .00463      0
4.industry      .183807   .166905     .044425     .185185   .185185     0
5.industry      .105033   .027937     .312944     .085648   .085648     0
6.industry      .045952   .169771     -.407129    .048611   .048611     0
7.industry      .019694   .102436     -.350657    .020833   .020833     0
8.industry      .017505   .035817     -.113785    .009259   .009259     0
9.industry      .010941   .040115     -.185669    .011574   .011574     0
10.industry     .004376   .008596     -.052551    .002315   .002315     0
11.industry     .479212   .356734     .250073     .506944   .506944     0
12.industry     .122538   .07235      .169707     .12037    .12037      0
2.race          .330416   .244986     .189418     .3125     .3125       0
3.race          .017505   .011461     .050566     .006944   .006944     0
south           .297593   .466332     -.352408    .291667   .291667     0

                Raw                                Matched (ATT)
Variances       Treated   Untreated   Ratio       Treated   Untreated   Ratio
collgrad        .218674   .174066     1.25628     .217904   .217904     1
ttl_exp         20.5898   21.0001     .980459     19.8177   18.2323     1.08696
tenure          37.2044   29.3629     1.26706     37.0399   34.9543     1.05966
3.industry      .006536   .012038     .542928     .004619   .004619     1
4.industry      .150351   .139148     1.08052     .151242   .151242     1
5.industry      .094207   .027176     3.46656     .078494   .078494     1
6.industry      .043936   .14105      .311496     .046355   .046355     1
7.industry      .019348   .092008     .210287     .020447   .020447     1
8.industry      .017237   .034559     .498769     .009195   .009195     1
9.industry      .010845   .038533     .281445     .011467   .011467     1
10.industry     .004367   .008528     .512039     .002315   .002315     1
11.industry     .250115   .229639     1.08917     .250532   .250532     1
12.industry     .107758   .067163     1.60443     .106127   .106127     1
2.race          .221726   .1851       1.19787     .215342   .215342     1
3.race          .017237   .011338     1.52025     .006912   .006912     1
south           .20949    .249045     .841173     .207077   .207077     1

Some Examples

. // make a graph of the balancing stats
. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: two-panel coefficient plot ("Std. mean difference" with a reference line at 0, "Variance ratio" with a reference line at 1) comparing raw and matched samples for collgrad, ttl_exp, tenure, the industry and race indicators, and south.]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching
  Number of obs = 1,853
  Kernel        = epan
  Treatment:  union = 1
  Covariates: collgrad ttl_exp tenure i.industry i.race south
  PS model:   logit (pr)

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   431    26     457      1214     182    1396   .00188

Treatment-effects estimation
  wage    Coef.
  ATT     .3887224
  NATE    1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated observations, in two panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated observations, in two panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated observations, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)

Multivariate-distance kernel matching
  Number of obs = 1,853
  Replications  = 50
  Kernel        = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched                 Controls               Band-
              Yes    No   Total      Used  Unused   Total    width
  Treated     432    25     457      1105     291    1396   1.3394
  Untreated  1386    10    1396       455       2     457   3.3975
  Combined   1818    35    1853      1560     293    1853        .

Treatment-effects estimation
            Observed   Bootstrap                   Normal-based
  wage      Coef.      Std. Err.   z      P>|z|    [95% Conf. Interval]
  ATE       .4095729   .1920853    2.13   0.033    .0330928    .7860531
  ATT       .6059013   .2472069    2.45   0.014    .1213846    1.090418
  ATC       .3483797   .1893653    1.84   0.066    -.0227695   .7195289
  NATE      1.432913   .2333282    6.14   0.000    .9755981    1.890228

Some Examples

Do some tests

. lincom ATT-NATE
 ( 1)  ATT - NATE = 0

  wage      Coef.       Std. Err.   z       P>|z|    [95% Conf. Interval]
  (1)       -.8270117   .1810415    -4.57   0.000    -1.181847   -.4721768

. test ATT = ATC
 ( 1)  ATT - ATC = 0
       chi2(  1) = 2.42
     Prob > chi2 = 0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
  Number of obs  = 1,853
  Neighbors: min = 1
             max = 1
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   457     0     457       328    1068    1396        .

Treatment-effects estimation
  wage    Coef.
  ATT     .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation
  Number of obs      = 1,853
  Estimator:           nearest-neighbor matching
  Outcome model:       matching
  Distance metric:     Mahalanobis
  Matches: requested = 1, min = 1, max = 1

                         AI Robust
  wage                   Coef.      Std. Err.   z      P>|z|   [95% Conf. Interval]
  ATET union
   (union vs nonunion)   .7246969   .2942952    2.46   0.014   .147889    1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
  Number of obs  = 1,853
  Neighbors: min = 5
             max = 5
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
            Matched                 Controls               Band-
            Yes    No   Total      Used  Unused   Total    width
  Treated   457     0     457       870     526    1396        .

Treatment-effects estimation
  wage    Coef.
  ATT     .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation
  Number of obs      = 1,853
  Estimator:           nearest-neighbor matching
  Outcome model:       matching
  Distance metric:     Mahalanobis
  Matches: requested = 5, min = 5, max = 6

                         AI Robust
  wage                   Coef.      Std. Err.   z      P>|z|   [95% Conf. Interval]
  ATET union
   (union vs nonunion)   .5590823   .2381752    2.35   0.019   .0922675   1.025897

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 43

Some Examples

Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853
Neighbors: min = 5, max = 5
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    457     0    457     870     526   1396

Treatment-effects estimation

wage       Coef.
ATT     .5288023

wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation
Number of obs = 1853
Estimator: nearest-neighbor matching
Outcome model: matching
Distance metric: Mahalanobis
Matches: requested = 5, min = 5, max = 6

                                 AI Robust
wage                     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023    .2420635   2.18   0.029   .0543666   1.003238
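The idea behind the regression-adjustment bias correction shown above is to remove the part of the matched outcome difference that is due to remaining covariate differences between a treated unit and its match. The Python sketch below is a simplified illustration under stated assumptions, not the estimator implemented in kmatch or teffects: it uses one scalar covariate, one nearest neighbor, a linear outcome model fitted on all controls, and invented data in which the true effect is 1. All names are hypothetical.

```python
def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def att_1nn(treated, controls, bias_adjust=False):
    """ATT from 1-nearest-neighbor matching (with replacement) on scalar x."""
    slope = 0.0
    if bias_adjust:
        _, slope = ols_line([x for x, _ in controls], [y for _, y in controls])
    effects = []
    for x_t, y_t in treated:
        x_c, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        # Shift the matched control outcome to the treated unit's x value.
        y_c_adj = y_c + slope * (x_t - x_c)
        effects.append(y_t - y_c_adj)
    return sum(effects) / len(effects)

# Invented data: control outcomes follow y = 2x, treated outcomes y = 2x + 1.
controls = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
treated = [(1.25, 3.5), (2.25, 5.5)]

raw = att_1nn(treated, controls)                        # contaminated by x gaps
adjusted = att_1nn(treated, controls, bias_adjust=True)  # x gaps removed
```

In this toy setup the unadjusted estimate overstates the effect because every treated unit has a slightly larger x than its match; the adjustment removes exactly that discrepancy.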

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model: logit (pr)
PS covars: i.industry i.race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    439    18    457    1258     138   1396   83886

Treatment-effects estimation

wage       Coef.
ATT     .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure
Exact: industry race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    432    25    457    1103     293   1396  1.3013

Treatment-effects estimation

wage       Coef.
ATT     .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    432    25    457    1105     291   1396  1.3394

Treatment-effects estimation

wage       Coef.
ATT     .6059013
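A bandwidth rule "based on the distribution of distances in one-nearest-neighbor matching" can be sketched as follows. This is a hedged illustration of the general idea only: the quantile `q`, the multiplier `f`, the simple quantile definition, and the function name are assumptions made for this example, not values or code taken from the slides or from kmatch.

```python
import math

def pair_matching_bandwidth(treated_x, control_x, q=0.90, f=1.5):
    """f times the q-quantile of nearest-neighbor distances (scalar x).

    q and f are illustrative assumptions; zero distances are ignored
    unless all distances are zero.
    """
    nn_dists = sorted(min(abs(t - c) for c in control_x) for t in treated_x)
    nonzero = [d for d in nn_dists if d > 0] or nn_dists
    # Simple empirical quantile: smallest value with cumulative share >= q.
    idx = max(0, math.ceil(q * len(nonzero)) - 1)
    return f * nonzero[idx]

# Invented toy data; the last treated unit is far from every control.
treated_x = [1.0, 2.0, 3.0, 4.0, 10.0]
control_x = [1.5, 2.25, 3.0, 4.5, 6.0]

bw_default = pair_matching_bandwidth(treated_x, control_x)
bw_median = pair_matching_bandwidth(treated_x, control_x, q=0.5)
```

A rule of this kind adapts the bandwidth to how densely the controls surround the treated units, so that most treated units find at least some controls within the kernel window.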

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    448     9    457    1184     212   1396  1.8888

Treatment-effects estimation

wage       Coef.
ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth, y-axis: MSE; points labeled by evaluation index]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    453     4    457    1289     107   1396   2.433

Treatment-effects estimation

wage       Coef.
ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth, y-axis: MISE; points labeled by evaluation index]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    455     2    457    1356      40   1396  2.7626

Treatment-effects estimation

wage       Coef.
ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth, y-axis: weighted MISE; points labeled by evaluation index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    366    91    457     701     695   1396      .5

Treatment-effects estimation

wage       Coef.
ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): means and standardized differences
(1) matched, (2) unmatched, (3) total

Means         Matched  Unmatched     Total   (1)-(3)   (2)-(3)   (1)-(2)
collgrad      .322404    .318681   .321663   .001585  -.006376   .007962
ttl_exp       13.3929    12.7682   13.2685   .027413  -.110253   .137666
tenure        8.12614    6.95055   7.89205   .038378  -.154356   .192734
3.industry    .002732    .021978   .006565  -.047404   .190657  -.238061
4.industry    .191257    .153846   .183807   .019212  -.077269   .096481
5.industry    .062842    .274725   .105033  -.137462   .552867  -.690329
6.industry    .057377          0   .045952   .054507  -.219225   .273732
7.industry    .019126    .021978   .019694  -.004083   .016423  -.020506
8.industry    .005464    .065934   .017505  -.091714   .368871  -.460585
9.industry    .010929    .010989   .010941  -.000115   .000462  -.000577
10.industry         0    .021978   .004376  -.066227   .266363  -.332589
11.industry   .554645    .175824   .479212    .15083  -.606636   .757467
12.industry   .092896    .241758   .122538  -.090299   .363181   -.45348
2.race        .243169    .681319   .330416  -.185284   .745209  -.930494
3.race        .002732    .076923   .017505  -.112525   .452572  -.565097
south          .29235    .318681   .297593  -.011456   .046074   -.05753

Common support (treated): variances and variance ratios
(1) matched, (2) unmatched, (3) total

Variances     Matched  Unmatched     Total   (1)/(3)   (2)/(3)   (1)/(2)
collgrad      .219058    .219536   .218674   1.00176   1.00394   .997824
ttl_exp       19.4198    25.2474   20.5898   .943177   1.22621    .76918
tenure        38.3324    31.9242   37.2044   1.03032   .858076   1.20073
3.industry    .002732    .021734   .006536   .418045   3.32537   .125714
4.industry    .155101    .131624   .150351   1.03159   .875443   1.17837
5.industry    .059054    .201465   .094207   .626851   2.13854   .293122
6.industry    .054233          0   .043936   1.23435         0         .
7.industry    .018811    .021734   .019348   .972252    1.1233   .865531
8.industry     .00545    .062271   .017237   .316157   3.61269   .087513
9.industry    .010839    .010989   .010845   .999464   1.01328   .986361
10.industry         0    .021734   .004367         0   4.97709         0
11.industry   .247691     .14652   .250115   .990307   .585811   1.69049
12.industry   .084497    .185348   .107758   .784137   1.72003   .455885
2.race        .184542    .219536   .221726   .832297   .990121   .840601
3.race        .002732    .071795   .017237   .158513   4.16522   .038056
south         .207448    .219536    .20949   .990254   1.04796   .944939

Some Examples

Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference between matched and all treated observations, by covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); x-axis: Std. difference]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    432    25    457    1104     291   1395  1.3392

Treatment-effects estimation

            Coef.
wage
  ATT    .6021049
  NATE   1.430823
hours
  ATT    1.263759
  NATE   1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    432    25    457    1104     291   1395  1.3392

Treatment-effects estimation

            Coef.
wage
  ATT    .5152752
  NATE   1.430823
hours
  ATT    1.263759
  NATE   1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching

Number of obs = 1853
Replications = 50
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                Matched               Controls          Band-
              Yes    No  Total    Used  Unused  Total   width
0: Treated    306    15    321     625     120    745  1.3199
1: Treated    126    10    136     473     178    651  1.3398

Treatment-effects estimation

            Observed   Bootstrap                     Normal-based
wage           Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
0: ATT      .4586332    .2763358   1.66   0.097   -.082975   1.000241
1: ATT      .9518705     .406903   2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

chi2(1) = 1.23
Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage       Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)     .4932373    .4452343   1.11   0.268   -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: propensity score, y-axis: density. McFadden R2 = 0.121]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels: mean outcome by propensity score for untreated and treated; treatment effect by propensity score]

Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000, by algorithm (nearest-neighbor matching with 1 and 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, by algorithm (nearest-neighbor matching with 1 and 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000, by algorithm (nearest-neighbor matching with 1 and 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000, for nearest-neighbor matching (teffects), nearest-neighbor matching (bootstrap), and kernel matching (bootstrap), for MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, for nearest-neighbor matching (teffects), nearest-neighbor matching (bootstrap), and kernel matching (bootstrap), for MDM and PSM, each with and without bias correction]


Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Are King and Nielsen right?

Argument 2:
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
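The point that the amount of blocking induced by the propensity score depends on how strongly X is related to T can be made concrete with a fully stratified propensity score, that is, the share of treated observations within each covariate cell. The Python sketch below uses invented toy data and a hypothetical function name; it only illustrates the argument and is not taken from the slides.

```python
from collections import defaultdict

def stratified_pscores(rows):
    """Share treated within each covariate cell; rows are (x_cell, d) pairs."""
    n, n1 = defaultdict(int), defaultdict(int)
    for x, d in rows:
        n[x] += 1
        n1[x] += d
    return {x: n1[x] / n[x] for x in n}

# X unrelated to T: same treated share in every cell, so one common score
# and hence no blocking (complete randomization).
unrelated = [("a", 1), ("a", 0), ("b", 1), ("b", 0), ("c", 1), ("c", 0)]

# X strongly related to T: distinct scores, so every cell is its own block.
related = [("a", 1), ("a", 1), ("a", 1), ("a", 0),
           ("b", 1), ("b", 0),
           ("c", 1), ("c", 0), ("c", 0), ("c", 0)]

blocks_unrelated = len(set(stratified_pscores(unrelated).values()))
blocks_related = len(set(stratified_pscores(related).values()))
```

Counting distinct score values is a crude proxy for the number of blocks: one block corresponds to complete randomization, while as many blocks as covariate cells corresponds to fully blocked randomization.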

Are King and Nielsen right?

Argument 3:
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
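The contrast between pruning and averaging can be illustrated with a small kernel-matching sketch in Python. It is a simplified illustration under stated assumptions, not the kmatch implementation: matching is on a scalar score, the kernel is the Epanechnikov kernel, the bandwidth and data are invented, and the function names are hypothetical. Controls with the same score as a treated unit are averaged rather than one of them being dropped at random, while treated units without any control inside the bandwidth are left unmatched.

```python
def epan(u):
    """Epanechnikov kernel weight for a scaled distance u."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_match_att(treated, controls, h):
    """ATT by kernel matching on a scalar score; items are (score, y) pairs.

    Returns (att, number of treated units with at least one control
    inside the bandwidth h).
    """
    effects = []
    for s_t, y_t in treated:
        w = [epan((s_c - s_t) / h) for s_c, _ in controls]
        total = sum(w)
        if total > 0:  # treated units without support stay unmatched
            y0 = sum(wi * y_c for wi, (_, y_c) in zip(w, controls)) / total
            effects.append(y_t - y0)
    return sum(effects) / len(effects), len(effects)

# Invented toy data: two controls share the score 0.25 and are both used.
controls = [(0.25, 1.0), (0.25, 3.0), (0.5, 4.0), (0.75, 6.0)]
treated = [(0.25, 3.0), (0.5, 5.0), (2.0, 9.0)]

att, n_matched = kernel_match_att(treated, controls, h=0.125)
```

Pair matching without replacement would have to discard one of the two equally good controls at score 0.25; the kernel estimator keeps both, which is the sense in which it averages where pruning is not necessary.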


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    432    25    457    1105     291   1396  1.3394

Treatment-effects estimation

wage       Coef.
ATT     .6059013
NATE    1.432913
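The Mahalanobis metric used in the example above scales covariate differences by the covariance matrix of the covariates, so that the distance does not depend on the units of measurement. A minimal Python sketch for two covariates follows; it is an illustration with invented data and hypothetical function names, and it assumes the sample covariance of all points as the scaling matrix, which may differ from what kmatch uses by default.

```python
import math

def covariance_2d(points):
    """Sample variances and covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between points a and b given (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Quadratic form d' S^{-1} d with the 2x2 inverse written out.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

# Invented data; the second data set rescales the second covariate by 10.
pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
pts_scaled = [(x, 10.0 * y) for x, y in pts]

d = mahalanobis_2d(pts[0], pts[2], covariance_2d(pts))
d_scaled = mahalanobis_2d(pts_scaled[0], pts_scaled[2], covariance_2d(pts_scaled))
```

Rescaling a covariate leaves the distance unchanged, which is the main reason to prefer this metric over a plain Euclidean distance when covariates such as tenure (years) and collgrad (0/1) are on different scales.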

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched (ATT)
Means         Treated  Untreated    StdDif    Treated  Untreated    StdDif
collgrad      .321663    .224212   .219912    .319444    .319444         0
ttl_exp       13.2685    12.7323   .117584    13.3205    13.1425   .039036
tenure        7.89205    6.17658    .29735    7.91744    7.58347   .057888
3.industry    .006565    .012178  -.058246     .00463     .00463         0
4.industry    .183807    .166905   .044425    .185185    .185185         0
5.industry    .105033    .027937   .312944    .085648    .085648         0
6.industry    .045952    .169771  -.407129    .048611    .048611         0
7.industry    .019694    .102436  -.350657    .020833    .020833         0
8.industry    .017505    .035817  -.113785    .009259    .009259         0
9.industry    .010941    .040115  -.185669    .011574    .011574         0
10.industry   .004376    .008596  -.052551    .002315    .002315         0
11.industry   .479212    .356734   .250073    .506944    .506944         0
12.industry   .122538     .07235   .169707     .12037     .12037         0
2.race        .330416    .244986   .189418      .3125      .3125         0
3.race        .017505    .011461   .050566    .006944    .006944         0
south         .297593    .466332  -.352408    .291667    .291667         0

                          Raw                        Matched (ATT)
Variances     Treated  Untreated     Ratio    Treated  Untreated     Ratio
collgrad      .218674    .174066   1.25628    .217904    .217904         1
ttl_exp       20.5898    21.0001   .980459    19.8177    18.2323   1.08696
tenure        37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
3.industry    .006536    .012038   .542928    .004619    .004619         1
4.industry    .150351    .139148   1.08052    .151242    .151242         1
5.industry    .094207    .027176   3.46656    .078494    .078494         1
6.industry    .043936     .14105   .311496    .046355    .046355         1
7.industry    .019348    .092008   .210287    .020447    .020447         1
8.industry    .017237    .034559   .498769    .009195    .009195         1
9.industry    .010845    .038533   .281445    .011467    .011467         1
10.industry   .004367    .008528   .512039    .002315    .002315         1
11.industry   .250115    .229639   1.08917    .250532    .250532         1
12.industry   .107758    .067163   1.60443    .106127    .106127         1
2.race        .221726      .1851   1.19787    .215342    .215342         1
3.race        .017237    .011338   1.52025    .006912    .006912         1
south          .20949    .249045   .841173    .207077    .207077         1
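The StdDif and Ratio columns of the balancing table can be reproduced from the group means and variances. The Python sketch below assumes that the standardized difference divides the difference in means by the square root of the average of the two raw group variances, and that the flattened table entries for collgrad (321663, 224212, 218674, 174066) stand for .321663, .224212, .218674, and .174066. The function names are hypothetical, and the formula is an assumption checked against the displayed numbers, not taken from the kmatch documentation.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Difference in means divided by the root of the average variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def var_ratio(var_t, var_c):
    """Ratio of the treated to the untreated variance."""
    return var_t / var_c

# Raw collgrad values (decimal placement is an assumption, see the lead-in)
sd_collgrad = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = var_ratio(0.218674, 0.174066)
```

Under these assumptions the two values agree with the raw StdDif and Ratio entries reported for collgrad, which supports the reading of the table used here.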

Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure, two panels: standardized mean difference and variance ratio by covariate (collgrad, ttl_exp, tenure, industry and race indicators, south), raw vs. matched]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching

Number of obs = 1853
Kernel = epan
Treatment: union = 1
Covariates: collgrad ttl_exp tenure i.industry i.race south
PS model: logit (pr)

Matching statistics

             Matched               Controls          Band-
           Yes    No  Total    Used  Unused  Total   width
Treated    431    26    457    1214     182   1396  .00188

Treatment-effects estimation

wage       Coef.
ATT     .3887224
NATE    1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated; panels: Raw, Matched (ATT)]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated; panels: Raw, Matched (ATT)]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated; panels: Raw, Matched (ATT)]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching

Number of obs = 1853
Replications = 50
Kernel = epan
Treatment: union = 1
Metric: mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      432    25    457    1105     291   1396  1.3394
Untreated   1386    10   1396     455       2    457  3.3975
Combined    1818    35   1853    1560     293   1853

Treatment-effects estimation

         Observed   Bootstrap                     Normal-based
wage        Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE      .4095729    .1920853   2.13   0.033    .0330928   .7860531
ATT      .6059013    .2472069   2.45   0.014    .1213846   1.090418
ATC      .3483797    .1893653   1.84   0.066   -.0227695   .7195289
NATE     1.432913    .2333282   6.14   0.000    .9755981   1.890228

Some Examples

Do some tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 41

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     1
Treatment   : union = 1                               max =     1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       457     0    457      328    1068   1396

Treatment-effects estimation
wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                 Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested =     1
Outcome model  : matching                                   min =     1
Distance metric: Mahalanobis                                max =     1

                                  AI Robust
wage                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
union
(union vs nonunion)  .7246969   .2942952    2.46   0.014      .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     5
Treatment   : union = 1                               max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       457     0    457      870     526   1396

Treatment-effects estimation
wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                 Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested =     5
Outcome model  : matching                                   min =     5
Distance metric: Mahalanobis                                max =     6

                                  AI Robust
wage                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
union
(union vs nonunion)  .5590823   .2381752    2.35   0.019     .0922675   1.025897

Some Examples

Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     5
Treatment   : union = 1                               max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       457     0    457      870     526   1396

Treatment-effects estimation
wage        Coef.
ATT      .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                 Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested =     5
Outcome model  : matching                                   min =     5
Distance metric: Mahalanobis                                max =     6

                                  AI Robust
wage                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
union
(union vs nonunion)  .5288023   .2420635    2.18   0.029     .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       439    18    457     1258     138   1396      .83886

Treatment-effects estimation
wage        Coef.
ATT      .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       432    25    457     1103     293   1396      1.3013

Treatment-effects estimation
wage        Coef.
ATT      .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       432    25    457     1105     291   1396      1.3394

Treatment-effects estimation
wage        Coef.
ATT      .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       448     9    457     1184     212   1396      1.8888

Treatment-effects estimation
wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; evaluation points labeled by index. x-axis: Bandwidth (1.5 to 3); y-axis: MSE.]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       453     4    457     1289     107   1396       2.433

Treatment-effects estimation
wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; evaluation points labeled by index. x-axis: Bandwidth (1.5 to 3); y-axis: MISE.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       455     2    457     1356      40   1396      2.7626

Treatment-effects estimation
wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; evaluation points labeled by index. x-axis: Bandwidth (1 to 5); y-axis: Weighted MISE.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       366    91    457      701     695   1396          .5

Treatment-effects estimation
wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                    Common support (treated)      Standardized difference
Means           Matched   Unmatched      Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad        .322404     .318681    .321663    .001585   -.006376    .007962
ttl_exp         13.3929     12.7682    13.2685    .027413   -.110253    .137666
tenure          8.12614     6.95055    7.89205    .038378   -.154356    .192734
3.industry      .002732     .021978    .006565   -.047404    .190657   -.238061
4.industry      .191257     .153846    .183807    .019212   -.077269    .096481
5.industry      .062842     .274725    .105033   -.137462    .552867   -.690329
6.industry      .057377           0    .045952    .054507   -.219225    .273732
7.industry      .019126     .021978    .019694   -.004083    .016423   -.020506
8.industry      .005464     .065934    .017505   -.091714    .368871   -.460585
9.industry      .010929     .010989    .010941   -.000115    .000462   -.000577
10.industry           0     .021978    .004376   -.066227    .266363   -.332589
11.industry     .554645     .175824    .479212     .15083   -.606636    .757467
12.industry     .092896     .241758    .122538   -.090299    .363181    -.45348
2.race          .243169     .681319    .330416   -.185284    .745209   -.930494
3.race          .002732     .076923    .017505   -.112525    .452572   -.565097
south            .29235     .318681    .297593   -.011456    .046074    -.05753

                    Common support (treated)               Ratio
Variances       Matched   Unmatched      Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad        .219058     .219536    .218674    1.00176    1.00394    .997824
ttl_exp         19.4198     25.2474    20.5898    .943177    1.22621     .76918
tenure          38.3324     31.9242    37.2044    1.03032    .858076    1.20073
3.industry      .002732     .021734    .006536    .418045    3.32537    .125714
4.industry      .155101     .131624    .150351    1.03159    .875443    1.17837
5.industry      .059054     .201465    .094207    .626851    2.13854    .293122
6.industry      .054233           0    .043936    1.23435          0          .
7.industry      .018811     .021734    .019348    .972252     1.1233    .865531
8.industry       .00545     .062271    .017237    .316157    3.61269    .087513
9.industry      .010839     .010989    .010845    .999464    1.01328    .986361
10.industry           0     .021734    .004367          0    4.97709          0
11.industry     .247691      .14652    .250115    .990307    .585811    1.69049
12.industry     .084497     .185348    .107758    .784137    1.72003    .455885
2.race          .184542     .219536    .221726    .832297    .990121    .840601
3.race          .002732     .071795    .017237    .158513    4.16522    .038056
south           .207448     .219536     .20949    .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total

Some Examples

. // make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences between matched and all treated observations, one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), with a reference line at 0. Title: Std. difference.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       432    25    457     1104     291   1395      1.3392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       432    25    457     1104     291   1395      1.3392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth done)
(south=1: computing bandwidth done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  =    50
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
0
  Treated     306    15    321      625     120    745      1.3199
1
  Treated     126    10    136      473     178    651      1.3398

Treatment-effects estimation
         Observed   Bootstrap                        Normal-based
wage        Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0
  ATT    .4586332   .2763358    1.66   0.097     -.082975   1.000241
1
  ATT    .9518705    .406903    2.34   0.019     .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage        Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)      .4932373   .4452343    1.11   0.268     -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
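The sampling scheme described for the simulation can be sketched as a small Monte Carlo loop. This is only an illustration under stated assumptions, not the code used for the talk: the file population.dta and the variables prestige, alien, female, age, and educ are hypothetical stand-ins for the census data, the program name onedraw is made up, and kmatch is assumed to be installed and to store the ATT as `_b[ATT]` (as the `lincom ATT-NATE` example suggests).

```stata
* Hypothetical data: population.dta with prestige (outcome), alien (treatment),
* and female, age, educ (controls). None of these names come from the talk.
capture program drop onedraw
program define onedraw, rclass
    syntax [, n(integer 500)]
    use population, clear
    sample `n', count                  // simple random sample of n observations
    kmatch ps alien female age i.educ (prestige), att
    return scalar att = _b[ATT]        // ATT estimate from this sample
end

* 200 replications with N = 500; the seed is arbitrary
simulate att = r(att), reps(200) seed(12345): onedraw, n(500)
summarize att                          // mean and spread of the ATT estimates
```

Comparing the mean of `att` with the population ATT gives the bias, and its variance gives the sampling variance of the estimator; repeating the loop with `kmatch md` or with `nn` would add the other estimators.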

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated. x-axis: Propensity score (0 to 1); y-axis: Density. McFadden R2 = 0.121.]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated. x-axis: Propensity score (0 to 0.8); y-axis: Outcome (30 to 55).]
[Figure, right panel: treatment effect by propensity score. x-axis: Propensity score (0 to 0.8); y-axis: Treatment effect (-6 to -1).]

Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
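The three performance measures reported in the results (variance, bias reduction, mean squared error) are linked by the usual decomposition. The block below states it for a generic ATT estimator over simulation replications. The definition of "bias reduction in percent" is an assumption, since the slides do not spell it out; it is written here as the share of the raw bias (NATE minus ATT) that the estimator removes.

```latex
% Standard decomposition; \hat\tau is an ATT estimator, \tau the population ATT
\[
\operatorname{MSE}(\hat\tau)
  = \operatorname{E}\bigl[(\hat\tau - \tau)^2\bigr]
  = \operatorname{Var}(\hat\tau) + \bigl(\operatorname{E}[\hat\tau] - \tau\bigr)^2
\]

% Assumed definition of bias reduction (not given on the slides)
\[
\text{bias reduction (\%)}
  = 100 \times \frac{\mathrm{NATE} - \operatorname{E}[\hat\tau]}{\mathrm{NATE} - \tau},
\qquad
\mathrm{NATE} - \tau = -4.79 - (-3.96) = -0.83
\]
```

Under this reading, a value of 100 means the estimator is unbiased on average, and values above 100 mean it overcorrects.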

Results: Mean squared error

[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows and series as in the relative standard error figure.]


Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
- Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
  + MDM leaves less scope for post-matching modeling decision biases.
  + Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  + Fewer restrictions in terms of possible post-matching analyses.
  - Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
  - Computational complexity.
- One clear conclusion we can draw, however, is:

  Do not use propensity scores for pair matching!
  (But don't use pair matching anyhow.)

- Some conclusions from the simulation:
  - For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
  - Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
- To do:
  - Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That PSM approximates complete randomization is only partially true: PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
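The two limiting cases in Argument 2 can be made concrete with simulated data. The sketch below is an illustration only: the variable names, sample size, seed, and coefficients are arbitrary choices, not taken from the talk. It fits a logit propensity-score model once for a treatment that is unrelated to x and once for a treatment that depends strongly on x.

```stata
clear
set seed 2017
set obs 2000
generate x  = rnormal()
generate t0 = runiform() < 0.3                  // treatment unrelated to x
generate t1 = runiform() < invlogit(-1 + 2*x)   // treatment strongly driven by x

logit t0 x
predict ps0, pr                                 // estimated PS, nearly constant
logit t1 x
predict ps1, pr                                 // estimated PS, widely spread

summarize ps0 ps1
```

When the treatment is unrelated to x, the estimated propensity scores cluster tightly around the treatment share, so matching on them is close to complete randomization. When x strongly affects treatment, the scores spread out and matching on them blocks on x.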

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better,' you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation; although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
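The "block where necessary, average otherwise" behaviour of kernel matching is easiest to see from the estimator itself. The following is the standard textbook form of a kernel matching estimator for the ATT, given as a sketch; the notation is an assumption and may differ from the definitions used earlier in the talk or inside kmatch (for example, in how the kernel is scaled).

```latex
% N_1: number of matched treated observations; h: bandwidth;
% d_{ij}: distance between treated i and control j
% (absolute propensity-score difference for PSM, Mahalanobis distance for MDM)
\[
\widehat{\mathrm{ATT}}
  = \frac{1}{N_1} \sum_{i:\,D_i = 1}
    \Bigl( Y_i - \sum_{j:\,D_j = 0} w_{ij}\, Y_j \Bigr),
\qquad
w_{ij} = \frac{K(d_{ij}/h)}{\sum_{k:\,D_k = 0} K(d_{ik}/h)}
\]

% Epanechnikov kernel (up to scaling): zero weight outside the bandwidth
\[
K(u) = \tfrac{3}{4}\,(1 - u^2) \quad \text{for } |u| < 1,
\qquad K(u) = 0 \quad \text{otherwise}
\]
```

Controls farther than h from a treated observation get weight zero (pruning), while all controls within h contribute to a weighted average instead of one of them being picked at random.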

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
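Stratifying PSM by subgroup means fitting the propensity-score model and matching separately within each subgroup. A minimal sketch with the NLSW data used in the examples follows, assuming kmatch is installed. Variant (a) does the stratification explicitly with `if`. Variant (b) uses the `over()` option that the talk demonstrates with `kmatch md`; whether `over()` also re-estimates the propensity score within each group is an assumption here and worth checking in `help kmatch`.

```stata
* Assumes kmatch is installed (ssc install kmatch)
sysuse nlsw88, clear
drop if industry==2

* (a) explicit stratification: separate PS model and matching per subgroup
kmatch ps union collgrad ttl_exp tenure i.industry i.race (wage) if south==0, att
kmatch ps union collgrad ttl_exp tenure i.industry i.race (wage) if south==1, att

* (b) subgroup results in one call via over()
kmatch ps union collgrad ttl_exp tenure i.industry i.race (wage), att over(south)
```

If the two variants disagree, the explicit `if` version is the one that certainly implements stratified PSM.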


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
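A minimal setup sketch for trying the command follows. The first line is the install command given on the slide; the two additional packages are an assumption about kmatch's dependencies and can be skipped if kmatch runs without them.

```stata
* Install kmatch from SSC (as stated on the slide)
ssc install kmatch, replace

* Assumption: kmatch relies on these helper packages
ssc install moremata, replace
ssc install kdens, replace

* Built-in documentation with the full syntax and option list
help kmatch
```

The examples in the talk follow the pattern `kmatch md|ps treatment covariates (outcomes), options`, where `md` requests multivariate-distance matching and `ps` requests propensity-score matching.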

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       432    25    457     1105     291   1396      1.3394

Treatment-effects estimation
wage        Coef.
ATT      .6059013
NATE     1.432913

Some Examples

. // some balancing statistics
. kmatch summarize
(refitting the model using the generate() option)

                         Raw                          Matched (ATT)
Means          Treated  Untreated    StdDif    Treated  Untreated    StdDif
collgrad       .321663    .224212   .219912    .319444    .319444         0
ttl_exp        13.2685    12.7323   .117584    13.3205    13.1425   .039036
tenure         7.89205    6.17658    .29735    7.91744    7.58347   .057888
3.industry     .006565    .012178  -.058246     .00463     .00463         0
4.industry     .183807    .166905   .044425    .185185    .185185         0
5.industry     .105033    .027937   .312944    .085648    .085648         0
6.industry     .045952    .169771  -.407129    .048611    .048611         0
7.industry     .019694    .102436  -.350657    .020833    .020833         0
8.industry     .017505    .035817  -.113785    .009259    .009259         0
9.industry     .010941    .040115  -.185669    .011574    .011574         0
10.industry    .004376    .008596  -.052551    .002315    .002315         0
11.industry    .479212    .356734   .250073    .506944    .506944         0
12.industry    .122538     .07235   .169707     .12037     .12037         0
2.race         .330416    .244986   .189418      .3125      .3125         0
3.race         .017505    .011461   .050566    .006944    .006944         0
south          .297593    .466332  -.352408    .291667    .291667         0

                         Raw                          Matched (ATT)
Variances      Treated  Untreated     Ratio    Treated  Untreated     Ratio
collgrad       .218674    .174066   1.25628    .217904    .217904         1
ttl_exp        20.5898    21.0001   .980459    19.8177    18.2323   1.08696
tenure         37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
3.industry     .006536    .012038   .542928    .004619    .004619         1
4.industry     .150351    .139148   1.08052    .151242    .151242         1
5.industry     .094207    .027176   3.46656    .078494    .078494         1
6.industry     .043936     .14105   .311496    .046355    .046355         1
7.industry     .019348    .092008   .210287    .020447    .020447         1
8.industry     .017237    .034559   .498769    .009195    .009195         1
9.industry     .010845    .038533   .281445    .011467    .011467         1
10.industry    .004367    .008528   .512039    .002315    .002315         1
11.industry    .250115    .229639   1.08917    .250532    .250532         1
12.industry    .107758    .067163   1.60443    .106127    .106127         1
2.race         .221726      .1851   1.19787    .215342    .215342         1
3.race         .017237    .011338   1.52025    .006912    .006912         1
south           .20949    .249045   .841173    .207077    .207077         1

Some Examples

. // make a graph of the balancing stats
. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south). Panels: Std. mean difference (reference line at 0) and Variance ratio (reference line at 1). Series: Raw, Matched.]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        =  epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
                  Matched                 Controls
              Yes    No  Total     Used  Unused  Total   Bandwidth
Treated       431    26    457     1214     182   1396      .00188

Treatment-effects estimation
wage        Coef.
ATT      .3887224
NATE     1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(6 2) lc(5 1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated observations. Panels: Raw, Matched (ATT). x-axis: Propensity score; y-axis: Density.]

Some Examples Cumulative distribution balancing plot kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 38

Some Examples Balancing box plot kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 39

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Replications  = 50
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     432   25    457    1105     291   1396  1.3394
Untreated  1386   10   1396     455       2    457  3.3975
Combined   1818   35   1853    1560     293   1853

Treatment-effects estimation

         Observed   Bootstrap                       Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE      .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT      .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC      .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE     1.432913   .2333282    6.14   0.000    .9755981   1.890228
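The z statistic, p-value, and normal-based confidence interval in the table above follow mechanically from each coefficient and its bootstrap standard error. The following is a minimal Python sketch of that arithmetic, independent of kmatch; the function name is hypothetical, and the input values assume the extracted digits for the ATT row read as .6059013 and .2472069.

```python
from statistics import NormalDist


def normal_based_summary(coef, se, level=0.95):
    """Return (z, two-sided p-value, (lower, upper)) for a point estimate.

    Hypothetical helper for illustration; it only reproduces the
    normal-approximation arithmetic behind a Coef./Std. Err. table.
    """
    std_normal = NormalDist()
    z = coef / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    return z, p, (coef - crit * se, coef + crit * se)


# ATT row of the bootstrap table (decimal placement is an assumption)
z, p, (lower, upper) = normal_based_summary(0.6059013, 0.2472069)
print(f"z = {z:.2f}, p = {p:.3f}, CI = [{lower:.7f}, {upper:.6f}]")
```

Applied to the other rows, the same computation gives a quick consistency check on any reported interval.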


Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                          Number of obs  = 1,853
                                          Neighbors: min = 1
Treatment   : union = 1                              max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     457    0    457     328    1068   1396

Treatment-effects estimation

wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Distance metric: Mahalanobis                                  max =          1

                                   AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969   .2942952    2.46   0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                          Number of obs  = 1,853
                                          Neighbors: min = 5
Treatment   : union = 1                              max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     457    0    457     870     526   1396

Treatment-effects estimation

wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          5
Outcome model  : matching                                     min =          5
Distance metric: Mahalanobis                                  max =          6

                                   AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823   .2381752    2.35   0.019    .0922675   1.025897

Some Examples

Bias correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                          Number of obs  = 1,853
                                          Neighbors: min = 5
Treatment   : union = 1                              max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     457    0    457     870     526   1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          5
Outcome model  : matching                                     min =          5
Distance metric: Mahalanobis                                  max =          6

                                   AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023   .2420635    2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     439   18    457    1258     138   1396   83886

Treatment-effects estimation

wage        Coef.
ATT      .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     432   25    457    1103     293   1396  1.3013

Treatment-effects estimation

wage        Coef.
ATT      .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     432   25    457    1105     291   1396  1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
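To make the idea of a bandwidth "based on the distribution of distances in one-nearest-neighbor matching" concrete, here is a minimal Python sketch. It is not kmatch's code: the specific rule used here (a multiple of an upper quantile of the nonzero nearest-neighbor distances), the default values of `q` and `factor`, and the function name are all assumptions chosen for illustration, and distances are taken on a single scalar score rather than a Mahalanobis metric.

```python
import math


def pair_matching_bandwidth(treated, controls, q=0.90, factor=1.5):
    """Bandwidth derived from 1-nearest-neighbor distances (illustrative).

    treated, controls: lists of scalar scores (e.g. propensity scores).
    q and factor are hypothetical tuning constants for this sketch.
    """
    # distance from each treated unit to its closest control (with replacement)
    nearest = sorted(
        d for d in (min(abs(t - c) for c in controls) for t in treated) if d > 0
    )
    if not nearest:
        return 0.0
    # simple empirical quantile: smallest distance covering a share q of units
    idx = min(len(nearest) - 1, max(0, math.ceil(q * len(nearest)) - 1))
    return factor * nearest[idx]


bw = pair_matching_bandwidth([0.10, 0.20, 0.30, 0.40], [0.11, 0.23, 0.35, 0.50])
print(bw)
```

The design intent of such a rule is that most treated units find at least one control inside the bandwidth, while units that are far from every control remain unmatched.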


Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     448    9    457    1184     212   1396  1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: MSE; points labeled by evaluation index.]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     453    4    457    1289     107   1396   2.433

Treatment-effects estimation

wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: MISE; points labeled by evaluation index.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     455    2    457    1356      40   1396  2.7626

Treatment-effects estimation

wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: weighted MISE; points labeled by evaluation index.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     366   91    457     701     695   1396      .5

Treatment-effects estimation

wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)
                      Means                    Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404   .318681   .321663    .001585   -.006376    .007962
ttl_exp        13.3929   12.7682   13.2685    .027413   -.110253    .137666
tenure         8.12614   6.95055   7.89205    .038378   -.154356    .192734
3.industry     .002732   .021978   .006565   -.047404    .190657   -.238061
4.industry     .191257   .153846   .183807    .019212   -.077269    .096481
5.industry     .062842   .274725   .105033   -.137462    .552867   -.690329
6.industry     .057377         0   .045952    .054507   -.219225    .273732
7.industry     .019126   .021978   .019694   -.004083    .016423   -.020506
8.industry     .005464   .065934   .017505   -.091714    .368871   -.460585
9.industry     .010929   .010989   .010941   -.000115    .000462   -.000577
10.industry          0   .021978   .004376   -.066227    .266363   -.332589
11.industry    .554645   .175824   .479212     .15083   -.606636    .757467
12.industry    .092896   .241758   .122538   -.090299    .363181    -.45348
2.race         .243169   .681319   .330416   -.185284    .745209   -.930494
3.race         .002732   .076923   .017505   -.112525    .452572   -.565097
south           .29235   .318681   .297593   -.011456    .046074    -.05753

Common support (treated)
                      Variances                        Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058   .219536   .218674    1.00176    1.00394    .997824
ttl_exp        19.4198   25.2474   20.5898    .943177    1.22621     .76918
tenure         38.3324   31.9242   37.2044    1.03032    .858076    1.20073
3.industry     .002732   .021734   .006536    .418045    3.32537    .125714
4.industry     .155101   .131624   .150351    1.03159    .875443    1.17837
5.industry     .059054   .201465   .094207    .626851    2.13854    .293122
6.industry     .054233         0   .043936    1.23435          0          .
7.industry     .018811   .021734   .019348    .972252     1.1233    .865531
8.industry      .00545   .062271   .017237    .316157    3.61269    .087513
9.industry     .010839   .010989   .010845    .999464    1.01328    .986361
10.industry          0   .021734   .004367          0    4.97709          0
11.industry    .247691    .14652   .250115    .990307    .585811    1.69049
12.industry    .084497   .185348   .107758    .784137    1.72003    .455885
2.race         .184542   .219536   .221726    .832297    .990121    .840601
3.race         .002732   .071795   .017237    .158513    4.16522    .038056
south          .207448   .219536    .20949    .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences for all covariates (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); x-axis: Std. difference, with a reference line at 0.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,852
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     432   25    457    1104     291   1395  1.3392

Treatment-effects estimation

            Coef.
wage
  ATT    .6021049
  NATE   1.430823
hours
  ATT     1263759
  NATE    1450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,852
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     432   25    457    1104     291   1395  1.3392

Treatment-effects estimation

            Coef.
wage
  ATT    .5152752
  NATE   1.430823
hours
  ATT     1263759
  NATE    1450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Replications  = 50
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
0
Treated     306   15    321     625     120    745  1.3199
1
Treated     126   10    136     473     178    651  1.3398

Treatment-effects estimation

         Observed   Bootstrap                       Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
  ATT    .4586332   .2763358    1.66   0.097    -.082975   1.000241
1
  ATT    .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      .4932373   .4452343    1.11   0.268    -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
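The design above (repeated random samples from a fixed population, one estimate per sample, then summary measures) can be sketched in a few lines of Python. This is a toy skeleton, not the simulation code behind the slides: the synthetic population, the naive mean-difference estimator standing in for a matching estimator, and all function names are assumptions made for illustration.

```python
import random


def naive_difference(sample):
    """Raw treated-minus-control mean difference; sample holds (d, y) pairs."""
    treated = [y for d, y in sample if d == 1]
    control = [y for d, y in sample if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def simulate(population, estimator, true_value, n, reps, seed=1):
    """Draw `reps` samples of size n and summarize the estimator's performance."""
    rng = random.Random(seed)
    estimates = [estimator(rng.sample(population, n)) for _ in range(reps)]
    mean_est = sum(estimates) / reps
    bias = mean_est - true_value
    variance = sum((e - mean_est) ** 2 for e in estimates) / reps
    mse = sum((e - true_value) ** 2 for e in estimates) / reps
    return {"bias": bias, "variance": variance, "mse": mse}


# Toy population (assumption): 17.5% treated, constant effect of -4 on the outcome
pop_rng = random.Random(0)
population = []
for _ in range(10_000):
    d = 1 if pop_rng.random() < 0.175 else 0
    population.append((d, 44 - 4 * d + pop_rng.gauss(0, 10)))

target = naive_difference(population)  # population value of the toy estimand
result = simulate(population, naive_difference, target, n=500, reps=200)
print(result)
```

With the variance computed around the mean of the estimates and divided by the number of replications, the mean squared error equals squared bias plus variance, which is the decomposition behind the variance, bias, and MSE results reported for the simulation.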


Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated units; x-axis: propensity score (0 to 1); y-axis: density. McFadden R2 = 0.121.]

Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: outcome by propensity score for untreated and treated units. Right panel: treatment effect by propensity score.]

Results: Variance

[Figure: variance of each estimator, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for each estimator, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
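A percent bias reduction of the kind plotted above can be read as the share of the raw bias (NATE minus the true ATT) that an estimator removes. The slides do not state the formula, so the definition in the following Python sketch is an assumption, as are the function name and the example estimate; the NATE and ATT inputs assume the population values read as −4.79 and −3.96.

```python
def bias_reduction_percent(mean_estimate, nate, att):
    """Share of the raw bias (nate - att) removed by an estimator, in percent.

    Assumed definition for illustration: 100 means the average estimate
    equals the true ATT, 0 means no improvement over the raw difference,
    and values above 100 indicate over-correction.
    """
    raw_bias = nate - att
    remaining_bias = mean_estimate - att
    return 100 * (1 - remaining_bias / raw_bias)


NATE, ATT = -4.79, -3.96  # population values (decimal placement assumed)
print(bias_reduction_percent(-3.96, NATE, ATT))   # estimator centered on the ATT
print(bias_reduction_percent(-4.79, NATE, ATT))   # no better than the raw difference
print(bias_reduction_percent(-3.877, NATE, ATT))  # hypothetical over-correcting estimator
```

Under this reading, values persistently away from 100 in the large-sample panel correspond to the non-vanishing bias described in the notes.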

Results: Mean squared error

[Figure: mean squared error of each estimator, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Results: Relative standard error

[Figure: relative standard error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000. Rows and series as in the relative standard error figure.]


Conclusions

- The arguments brought forward by King and Nielsen against propensity score matching (PSM) are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
- Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
  + MDM leaves less scope for post-matching modeling decision biases.
  + Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  + Fewer restrictions in terms of possible post-matching analyses.
  - Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
  - Computational complexity.
- One clear conclusion we can draw, however, is: Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
- Some conclusions from the simulation:
  - For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
  - Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
- To do:
  - Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is, of course, true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
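The point that kernel matching averages over all sufficiently close controls instead of discarding them can be illustrated with a small Python sketch. It is a bare-bones version of kernel matching on a scalar score with an Epanechnikov kernel, not kmatch's implementation; the function names and the toy data are assumptions for illustration.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight for a scaled distance u."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """ATT by kernel matching; treated and controls are (score, outcome) pairs.

    Returns (att, number of matched treated units, number of controls used).
    """
    effects, used = [], set()
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        total = sum(weights)
        if total == 0:
            continue  # no control within the bandwidth: unit stays unmatched
        # counterfactual outcome: weighted average over ALL controls in range
        y0_hat = sum(w * y for w, (_, y) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
        used.update(i for i, w in enumerate(weights) if w > 0)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, len(effects), len(used)


treated = [(0.30, 10.0), (0.90, 12.0)]
controls = [(0.28, 8.0), (0.32, 9.0), (0.50, 7.0)]
att, n_matched, n_used = kernel_matching_att(treated, controls, bandwidth=0.05)
print(att, n_matched, n_used)
```

In this toy data both controls near the first treated unit contribute to its counterfactual, whereas one-to-one matching without replacement would keep only one of them; the second treated unit has no control within the bandwidth and is pruned, which is the pruning that protects against bias.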


It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched              Controls          Band-
            Yes   No  Total    Used  Unused  Total   width
Treated     432   25    457    1105     291   1396  1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
NATE     1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                        Raw                         Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                        Raw                         Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
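The StdDif and Ratio columns can be reproduced from the reported means and variances. The Python sketch below assumes that the standardized difference divides the mean difference by the square root of the average of the two raw group variances; this is inferred from the numbers in the table (it reproduces the collgrad row), not taken from the kmatch documentation, and the input values assume the extracted digits carry a leading decimal point where shown.

```python
import math


def standardized_difference(mean_t, mean_c, raw_var_t, raw_var_c):
    """Mean difference scaled by the pooled raw standard deviation (assumed rule)."""
    return (mean_t - mean_c) / math.sqrt((raw_var_t + raw_var_c) / 2)


def variance_ratio(var_t, var_c):
    """Ratio of treated to untreated variance; 1 indicates balance."""
    return var_t / var_c


# collgrad, raw sample
print(standardized_difference(0.321663, 0.224212, 0.218674, 0.174066))
print(variance_ratio(0.218674, 0.174066))

# ttl_exp, matched sample: matched means, but still scaled by the raw variances
print(standardized_difference(13.3205, 13.1425, 20.5898, 21.0001))
```

Keeping the raw-sample scale for the matched comparison makes the raw and matched columns directly comparable, since a change in the standardized difference then reflects only a change in the mean difference.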


Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot for all covariates (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); left panel: Std. mean difference with a reference line at 0; right panel: Variance ratio with a reference line at 1; series: Raw and Matched.]

Some Examples

Propensity-score kernel matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 36

Some Examples Kernel density balancing plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 37

Some Examples Cumulative distribution balancing plot kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 38

Some Examples Balancing box plot kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 39

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Replications  =   50
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      432     25     457     1105     291    1396   1.3394
Untreated   1386     10    1396      455       2     457   3.3975
Combined    1818     35    1853     1560     293    1853

Treatment-effects estimation

         Observed   Bootstrap                      Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE      .4095729    .1920853    2.13   0.033    .0330928   .7860531
ATT      .6059013    .2472069    2.45   0.014    .1213846   1.090418
ATC      .3483797    .1893653    1.84   0.066   -.0227695   .7195289
NATE     1.432913    .2333282    6.14   0.000    .9755981   1.890228
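The bootstrap standard errors reported above follow a generic recipe: resample the observations with replacement, re-run the estimator on each resample, and take the standard deviation of the replicate estimates. The sketch below illustrates only that recipe. The function names, the seed, and the stand-in estimator (a naive difference in means rather than a matching estimator) are assumptions made for illustration; any estimator that maps a sample to a number could be plugged in.

```python
import random
import statistics


def difference_in_means(sample):
    """Naive stand-in estimator: mean(y | d = 1) - mean(y | d = 0)."""
    treated = [y for d, y in sample if d == 1]
    control = [y for d, y in sample if d == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def bootstrap_se(sample, estimator, replications=50, seed=12345):
    """Standard deviation of the estimator across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    while len(estimates) < replications:
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        if {d for d, _ in resample} != {0, 1}:
            continue  # redraw if one treatment group is missing entirely
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Hypothetical data: (treatment indicator, outcome) pairs.
sample = [(1, 5.0), (1, 7.0), (1, 6.0), (0, 2.0), (0, 4.0), (0, 3.0)]
se = bootstrap_se(sample, difference_in_means)
print(se)
```

The default of 50 replications mirrors the log above; in practice more replications would usually be chosen.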

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)     -.8270117    .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                          Number of obs  = 1853
                                          Neighbors: min =    1
Treatment  : union = 1                               max =    1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      457      0     457      328    1068    1396

Treatment-effects estimation

wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                 Number of obs      = 1853
Estimator       : nearest-neighbor matching  Matches: requested =    1
Outcome model   : matching                                  min =    1
Distance metric : Mahalanobis                               max =    1

                                  AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
 (union vs nonunion)   .7246969    .2942952    2.46   0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                          Number of obs  = 1853
                                          Neighbors: min =    5
Treatment  : union = 1                               max =    5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                 Number of obs      = 1853
Estimator       : nearest-neighbor matching  Matches: requested =    5
Outcome model   : matching                                  min =    5
Distance metric : Mahalanobis                               max =    6

                                  AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
 (union vs nonunion)   .5590823    .2381752    2.35   0.019    .0922675   1.025897
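Nearest-neighbor matching with replacement, keeping ties, can be sketched as follows. This is a simplified illustration with a single covariate and absolute distance; the commands above use the Mahalanobis distance over several covariates, and the function name and toy data here are hypothetical. Keeping ties means a treated unit can receive more matches than requested, which is one way a "max" above the requested number of matches can arise.

```python
def nn_matching_att(x_treated, y_treated, x_control, y_control, k=1):
    """ATT from k-nearest-neighbor matching with replacement, keeping ties."""
    effects = []
    for x_i, y_i in zip(x_treated, y_treated):
        distances = sorted(abs(x_j - x_i) for x_j in x_control)
        # Distance of the k-th closest control; everything at or below it is kept.
        cutoff = distances[min(k, len(distances)) - 1]
        matches = [y_j for x_j, y_j in zip(x_control, y_control)
                   if abs(x_j - x_i) <= cutoff]
        effects.append(y_i - sum(matches) / len(matches))
    return sum(effects) / len(effects)


# Hypothetical toy data: two controls are tied at distance 1 from the treated unit,
# so both are used even though only one neighbor is requested.
att_1nn = nn_matching_att(
    x_treated=[1.0], y_treated=[5.0],
    x_control=[0.0, 2.0, 5.0], y_control=[1.0, 3.0, 100.0],
    k=1,
)
print(att_1nn)
```

Controls are reused across treated units because each treated unit searches the full control list, which is what matching with replacement means here.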

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                          Number of obs  = 1853
                                          Neighbors: min =    5
Treatment  : union = 1                               max =    5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                 Number of obs      = 1853
Estimator       : nearest-neighbor matching  Matches: requested =    5
Outcome model   : matching                                  min =    5
Distance metric : Mahalanobis                               max =    6

                                  AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
 (union vs nonunion)   .5288023    .2420635    2.18   0.029    .0543666   1.003238
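The logic of regression-adjustment bias correction can be sketched with one covariate and one nearest neighbor. The assumptions here are loud ones: a single covariate, a simple linear outcome model fitted on all controls, and hypothetical function names and data. The commands above adjust for several covariates and may weight the regression differently. The point of the sketch is only the correction step: the matched control outcome is shifted by the fitted slope times the remaining covariate gap between the treated unit and its match.

```python
def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bias_corrected_att(x_treated, y_treated, x_control, y_control):
    """1-nearest-neighbor ATT with a linear regression adjustment."""
    slope = ols_slope(x_control, y_control)  # outcome model fitted on controls
    effects = []
    for x_i, y_i in zip(x_treated, y_treated):
        j = min(range(len(x_control)), key=lambda c: abs(x_control[c] - x_i))
        # Shift the matched control outcome to the treated unit's covariate value.
        y0_hat = y_control[j] + slope * (x_i - x_control[j])
        effects.append(y_i - y0_hat)
    return sum(effects) / len(effects)


# Hypothetical toy data: controls follow y = 2x exactly; the treated unit sits at x = 3,
# one unit away from its closest control, so the uncorrected effect (6) is too large.
att_adjusted = bias_corrected_att(
    x_treated=[3.0], y_treated=[10.0],
    x_control=[0.0, 1.0, 2.0], y_control=[0.0, 2.0, 4.0],
)
print(att_adjusted)
```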

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      439     18     457     1258     138    1396   .83886

Treatment-effects estimation

wage        Coef.
ATT      .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      432     25     457     1103     293    1396   1.3013

Treatment-effects estimation

wage        Coef.
ATT      .6047374
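Exact matching restricts each treated unit's potential matches to controls that share the same values on the exact-matching variables. The sketch below shows only that restriction and is deliberately simplified: within a cell it averages all controls, whereas the command above additionally applies Mahalanobis-distance kernel matching on the remaining covariates inside each cell. The function name and the data are hypothetical.

```python
from collections import defaultdict


def exact_cell_att(treated, control):
    """treated/control: lists of (cell, y); cell is a tuple of exact-match values.

    Returns (ATT estimate, number of treated without any control in their cell).
    """
    control_by_cell = defaultdict(list)
    for cell, y in control:
        control_by_cell[cell].append(y)
    effects, unmatched = [], 0
    for cell, y in treated:
        pool = control_by_cell.get(cell)
        if not pool:
            unmatched += 1  # no control shares this combination of values
            continue
        effects.append(y - sum(pool) / len(pool))
    if not effects:
        raise ValueError("no treated observation has a control in its cell")
    return sum(effects) / len(effects), unmatched


# Hypothetical cells defined by (industry, south).
treated = [((4, 0), 10.0), ((11, 1), 8.0)]
control = [((4, 0), 6.0), ((4, 0), 8.0), ((5, 1), 0.0)]
att_exact, n_unmatched = exact_cell_att(treated, control)
print(att_exact, n_unmatched)
```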

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      432     25     457     1105     291    1396   1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
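A bandwidth rule "based on the distribution of distances in one-nearest-neighbor matching" can be sketched as a multiple of a quantile of those distances. The slide does not state the quantile or the multiplier; the values 0.90 and 1.5 used below, the restriction to non-zero distances, the interpolation rule for the quantile, and the function names are all assumptions made for illustration, and the single-covariate absolute distance stands in for the Mahalanobis distance.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def pair_matching_bandwidth(x_treated, x_control, q=0.90, factor=1.5):
    """factor times the q-quantile of the non-zero nearest-control distances."""
    nearest = [min(abs(x_j - x_i) for x_j in x_control) for x_i in x_treated]
    nonzero = [d for d in nearest if d > 0]
    if not nonzero:
        raise ValueError("all treated units have an exact match; no distances to use")
    return factor * quantile(nonzero, q)


# Hypothetical toy data: nearest-control distances are 0.5, 0.5 and 1.5.
bw = pair_matching_bandwidth([1.0, 2.0, 3.0], [1.5, 10.0], q=0.5)
print(bw)
```

A rule of this kind scales the bandwidth to how far apart treated units and their closest controls typically are, so most treated units find at least one control inside the bandwidth.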

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      448      9     457     1184     212    1396   1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation trace, markers labeled by evaluation index; x-axis: bandwidth; y-axis: MSE.]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      453      4     457     1289     107    1396    2.433

Treatment-effects estimation

wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation trace, markers labeled by evaluation index; x-axis: bandwidth; y-axis: MISE.]
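Cross-validation with respect to the outcome can be sketched as a leave-one-out exercise: for each candidate bandwidth, every observation's outcome is predicted from the kernel-weighted outcomes of the other observations, and the bandwidth with the smallest mean squared prediction error wins. The sketch below assumes a single covariate, an Epanechnikov kernel, a fixed grid of candidates, and that the prediction is run within one group (for an ATT, the controls). The exact criterion and search used by the command above are not given on the slide, so this is an illustration of the principle only; function names and data are hypothetical.

```python
import math


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def loo_cv_score(x, y, bandwidth):
    """Mean squared leave-one-out prediction error of a kernel-weighted mean."""
    squared_errors = []
    for i, (x_i, y_i) in enumerate(zip(x, y)):
        num = den = 0.0
        for j, (x_j, y_j) in enumerate(zip(x, y)):
            if j == i:
                continue  # leave observation i out
            w = epanechnikov((x_j - x_i) / bandwidth)
            num += w * y_j
            den += w
        if den == 0.0:
            return math.inf  # bandwidth too small: some unit has no neighbor
        squared_errors.append((y_i - num / den) ** 2)
    return sum(squared_errors) / len(squared_errors)


def select_bandwidth(x, y, candidates):
    """Candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda h: loo_cv_score(x, y, h))


# Hypothetical control-group data on an evenly spaced grid.
x_control = [0.0, 1.0, 2.0, 3.0]
y_control = [0.0, 1.0, 2.0, 3.0]
best = select_bandwidth(x_control, y_control, candidates=[0.5, 1.5, 3.0])
print(best)
```

A weighted variant would give more weight to prediction errors in regions where treated observations are concentrated; that refinement is not shown here.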

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      455      2     457     1356      40    1396   2.7626

Treatment-effects estimation

wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation trace, markers labeled by evaluation index; x-axis: bandwidth; y-axis: weighted MISE.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      366     91     457      701     695    1396       .5

Treatment-effects estimation

wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

               Common support (treated)          Standardized difference
Means          Matched  Unmatc~d     Total     (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663     .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685     .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205     .038378  -.154356   .192734
3.industry     .002732   .021978   .006565    -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807     .019212  -.077269   .096481
5.industry     .062842   .274725   .105033    -.137462   .552867  -.690329
6.industry     .057377         0   .045952     .054507  -.219225   .273732
7.industry     .019126   .021978   .019694    -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505    -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941    -.000115   .000462  -.000577
10.industry          0   .021978   .004376    -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212      .15083  -.606636   .757467
12.industry    .092896   .241758   .122538    -.090299   .363181   -.45348
2.race         .243169   .681319   .330416    -.185284   .745209  -.930494
3.race         .002732   .076923   .017505    -.112525   .452572  -.565097
south           .29235   .318681   .297593    -.011456   .046074   -.05753

               Common support (treated)                  Ratio
Variances      Matched  Unmatc~d     Total     (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674     1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898     .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044     1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536     .418045   3.32537   .125714
4.industry     .155101   .131624   .150351     1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207     .626851   2.13854   .293122
6.industry     .054233         0   .043936     1.23435         0         .
7.industry     .018811   .021734   .019348     .972252    1.1233   .865531
8.industry      .00545   .062271   .017237     .316157   3.61269   .087513
9.industry     .010839   .010989   .010845     .999464   1.01328   .986361
10.industry          0   .021734   .004367           0   4.97709         0
11.industry    .247691    .14652   .250115     .990307   .585811   1.69049
12.industry    .084497   .185348   .107758     .784137   1.72003   .455885
2.race         .184542   .219536   .221726     .832297   .990121   .840601
3.race         .002732   .071795   .017237     .158513   4.16522   .038056
south          .207448   .219536    .20949     .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference between matched treated and all treated, one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); x-axis: Std. difference.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1852
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      432     25     457     1104     291    1395   1.3392

Treatment-effects estimation

            Coef.
wage
ATT      .6021049
NATE     1.430823
hours
ATT      1.263759
NATE     1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1852
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      432     25     457     1104     291    1395   1.3392

Treatment-effects estimation

            Coef.
wage
ATT      .5152752
NATE     1.430823
hours
ATT      1.263759
NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Replications  =   50
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                      Matched                 Controls         Band-
                Yes     No   Total     Used  Unused   Total    width
0  Treated      306     15     321      625     120     745   1.3199
1  Treated      126     10     136      473     178     651   1.3398

Treatment-effects estimation

         Observed   Bootstrap                      Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
ATT      .4586332    .2763358    1.66   0.097    -.082975   1.000241
1
ATT      .9518705     .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      .4932373    .4452343    1.11   0.268    -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to working people aged 24 to 60.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
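The performance measures used to evaluate such a simulation can be sketched as follows. Given the estimates from repeated samples and the known population ATT, bias, variance, and mean squared error follow directly, with MSE equal to squared bias plus variance. The bias-reduction formula (share of the raw bias in the naive difference that the estimator removes, in percent) is an assumption about how that measure is defined; the slides do not spell it out. The three replicate estimates are hypothetical, while the population ATT (-3.96) and the raw difference (-4.79) are the values reported for this simulation.

```python
import statistics


def summarize_simulation(estimates, true_att, naive_difference):
    """Bias, variance, MSE and percent bias reduction across replications."""
    bias = statistics.fmean(estimates) - true_att
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - true_att) ** 2 for e in estimates])
    raw_bias = naive_difference - true_att  # bias of the unadjusted comparison
    bias_reduction = 100.0 * (1.0 - bias / raw_bias)
    return {
        "bias": bias,
        "variance": variance,
        "mse": mse,
        "bias_reduction_pct": bias_reduction,
    }


# Hypothetical replicate estimates; true ATT and raw difference as reported.
result = summarize_simulation(
    estimates=[-4.0, -3.9, -4.1],
    true_att=-3.96,
    naive_difference=-4.79,
)
print(result)
```

Under this definition, a value above 100 percent would indicate that the estimator overcorrects, ending up on the other side of the true ATT.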

Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: propensity score (0 to 1); y-axis: density. McFadden R2 = 0.121.]

Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated; x-axis: propensity score; y-axis: outcome. Right panel: treatment effect by propensity score; x-axis: propensity score; y-axis: treatment effect.]

Results: Variance

[Figure: variance of the estimates by matching algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error by matching algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Results: Relative standard error

[Figure: relative standard error by matching algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by matching algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]


Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
+ MDM leaves less scope for post-matching modeling decision biases.
+ Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
+ Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better' you do worse"

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata, written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls         Band-
             Yes     No   Total     Used  Unused   Total    width
Treated      432     25     457     1105     291    1396   1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
NATE     1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched (ATT)
Means          Treated  Untrea~d    StdDif     Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912     .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584     13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735     7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246      .00463    .00463         0
4.industry     .183807   .166905   .044425     .185185   .185185         0
5.industry     .105033   .027937   .312944     .085648   .085648         0
6.industry     .045952   .169771  -.407129     .048611   .048611         0
7.industry     .019694   .102436  -.350657     .020833   .020833         0
8.industry     .017505   .035817  -.113785     .009259   .009259         0
9.industry     .010941   .040115  -.185669     .011574   .011574         0
10.industry    .004376   .008596  -.052551     .002315   .002315         0
11.industry    .479212   .356734   .250073     .506944   .506944         0
12.industry    .122538    .07235   .169707      .12037    .12037         0
2.race         .330416   .244986   .189418       .3125     .3125         0
3.race         .017505   .011461   .050566     .006944   .006944         0
south          .297593   .466332  -.352408     .291667   .291667         0

                          Raw                        Matched (ATT)
Variances      Treated  Untrea~d     Ratio     Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628     .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459     19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706     37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928     .004619   .004619         1
4.industry     .150351   .139148   1.08052     .151242   .151242         1
5.industry     .094207   .027176   3.46656     .078494   .078494         1
6.industry     .043936    .14105   .311496     .046355   .046355         1
7.industry     .019348   .092008   .210287     .020447   .020447         1
8.industry     .017237   .034559   .498769     .009195   .009195         1
9.industry     .010845   .038533   .281445     .011467   .011467         1
10.industry    .004367   .008528   .512039     .002315   .002315         1
11.industry    .250115   .229639   1.08917     .250532   .250532         1
12.industry    .107758   .067163   1.60443     .106127   .106127         1
2.race         .221726     .1851   1.19787     .215342   .215342         1
3.race         .017237   .011338   1.52025     .006912   .006912         1
south           .20949   .249045   .841173     .207077   .207077         1
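The two balancing statistics in these tables can be sketched in a few lines. The standardized difference is taken here as the difference in means divided by the square root of the average of the two group variances, and the variance ratio as treated over untreated variance. That definition is an assumption, although it reproduces the raw ttl_exp row when the extracted digits are read as means 13.2685 and 12.7323 and variances 20.5898 and 21.0001 (standardized difference of about .1176). The function names and the small example lists are hypothetical, and matching weights, which a matched-sample version would need, are not handled.

```python
import math
import statistics


def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means scaled by the pooled (averaged) standard deviation."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def balance(x_treated, x_control):
    """Return (standardized mean difference, variance ratio) for one covariate."""
    var_t = statistics.variance(x_treated)
    var_c = statistics.variance(x_control)
    std_diff = standardized_difference(
        statistics.fmean(x_treated), statistics.fmean(x_control), var_t, var_c
    )
    return std_diff, var_t / var_c


# Hypothetical covariate values for treated and untreated units.
std_diff, var_ratio = balance([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(std_diff, var_ratio)

# Summary statistics read from the raw ttl_exp row of the table.
print(standardized_difference(13.2685, 12.7323, 20.5898, 21.0001))
```

Good balance corresponds to standardized differences near 0 and variance ratios near 1, which is what the matched columns show for most covariates.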

Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: raw and matched balancing statistics, one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); left panel: Std. mean difference; right panel: Variance ratio.]

Some Examples

Propensity-score kernel matching

kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)

Propensity-score kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 431 26 457 1214 182 1396 00188

Treatment-effects estimation

wage Coef

ATT 3887224NATE 1432913

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 36

Some Examples Kernel density balancing plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 37

Some Examples Cumulative distribution balancing plot kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 38

Some Examples Balancing box plot kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 39

Some Examples Standard errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 40

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)      -.8270117   .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
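Both tests are Wald tests on a single linear restriction, so the reported statistics can be checked by hand. The sketch below (Python, standard library only; the helper names are hypothetical) recomputes the z statistic of the lincom result and converts the reported chi2(1) value into its p-value, using the fact that a chi-squared variable with one degree of freedom is a squared standard normal.

```python
import math


def z_test(estimate, se):
    """z statistic and two-sided normal p-value for a single estimate."""
    z = estimate / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def chi2_1df_pvalue(chi2):
    """Upper-tail probability of a chi-squared statistic with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


# lincom ATT - NATE: coefficient and standard error as shown above
z, p_z = z_test(-0.8270117, 0.1810415)

# test ATT = ATC: reported chi2(1) statistic (rounded to two decimals)
p_chi2 = chi2_1df_pvalue(2.42)

print(f"z = {z:.2f}, p = {p_z:.4f}")
print(f"Prob > chi2 = {p_chi2:.4f}")
```

Because the chi2 statistic is only available to two decimals here, the recomputed p-value can differ from the log in the fourth decimal.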

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs   =      1,853
                                           Neighbors:  min =          1
Treatment   : union = 1                                max =          1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       457      0     457      328    1068    1396          .

Treatment-effects estimation

wage         Coef.
ATT       .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Distance metric: Mahalanobis                                  max =          1

                                  AI Robust
wage                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
  union
  (union vs nonunion) .7246969   .2942952    2.46   0.014     .147889    1.301505
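To make the mechanics of one-nearest-neighbor Mahalanobis-distance matching concrete, here is a minimal Python sketch for two covariates. It is not the kmatch or teffects implementation: the toy data, the function names, and the choice to compute the covariance matrix from the pooled sample are all assumptions made for illustration. Matching is done with replacement, and the ATT is the mean difference between each treated outcome and the outcome of its closest control.

```python
def covariance_2d(points):
    """Sample variances and covariance of two covariates."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return s11, s12, s22


def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance d' S^-1 d with the 2x2 inverse written out."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 ** 2
    d1, d2 = a[0] - b[0], a[1] - b[1]
    return (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det


def att_nearest_neighbor(treated, controls):
    """ATT from 1-nearest-neighbor matching with replacement."""
    # Assumption: scale distances by the covariance of the pooled sample
    cov = covariance_2d([x for x, _ in treated + controls])
    effects = []
    for x_t, y_t in treated:
        _, y_match = min(controls, key=lambda c: mahalanobis_sq(x_t, c[0], cov))
        effects.append(y_t - y_match)
    return sum(effects) / len(effects)


# Toy data: ((covariate 1, covariate 2), outcome)
treated = [((1.0, 2.0), 5.0), ((3.0, 1.0), 9.0)]
controls = [((1.0, 2.0), 3.0), ((3.0, 1.0), 6.0), ((5.0, 5.0), 0.0)]

att = att_nearest_neighbor(treated, controls)
print(att)
```

In this toy sample each treated unit has an exact twin among the controls, so the third control is never used, which mirrors the "Unused" column of the matching statistics.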

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs   =      1,853
                                           Neighbors:  min =          5
Treatment   : union = 1                                max =          5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       457      0     457      870     526    1396          .

Treatment-effects estimation

wage         Coef.
ATT       .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          5
Outcome model  : matching                                     min =          5
Distance metric: Mahalanobis                                  max =          6

                                  AI Robust
wage                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
  union
  (union vs nonunion) .5590823   .2381752    2.35   0.019    .0922675    1.025897

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs   =      1,853
                                           Neighbors:  min =          5
Treatment   : union = 1                                max =          5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       457      0     457      870     526    1396          .

Treatment-effects estimation

wage         Coef.
ATT       .5288023

wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          5
Outcome model  : matching                                     min =          5
Distance metric: Mahalanobis                                  max =          6

                                  AI Robust
wage                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
  union
  (union vs nonunion) .5288023   .2420635    2.18   0.029    .0543666    1.003238
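The idea behind the bias correction is that a matched control rarely has exactly the same covariate values as its treated unit, so the control outcome is shifted by a regression prediction of what that covariate gap is worth. The Python sketch below shows this for one covariate and one neighbor. It is a simplified illustration under stated assumptions (toy data, slope estimated by OLS on all controls, hypothetical function names), not the algorithm used by kmatch or teffects.

```python
def ols_slope(xs, ys):
    """Slope of a simple linear regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def att_matching(treated, controls, bias_correct=False):
    """1-nearest-neighbor ATT on a single covariate, optionally bias-corrected."""
    # Assumption: the outcome model is fitted on the controls only
    beta = ols_slope([x for x, _ in controls], [y for _, y in controls])
    effects = []
    for x_t, y_t in treated:
        x_c, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        if bias_correct:
            # Move the control outcome to the treated unit's covariate value
            y_c = y_c + beta * (x_t - x_c)
        effects.append(y_t - y_c)
    return sum(effects) / len(effects)


# Toy data (x, y): controls follow y = 2x, treated follow y = 2x + 3
controls = [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)]
treated = [(1.2, 5.4), (2.2, 7.4)]

att_raw = att_matching(treated, controls)
att_adjusted = att_matching(treated, controls, bias_correct=True)
print(att_raw, att_adjusted)
```

In the toy data the true effect is 3; plain matching is off because each matched control sits 0.3 units away from its treated unit, and the regression adjustment removes that gap.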

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       439     18     457     1258     138    1396     .83886

Treatment-effects estimation

wage         Coef.
ATT       .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       432     25     457     1103     293    1396     1.3013

Treatment-effects estimation

wage         Coef.
ATT       .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       432     25     457     1105     291    1396     1.3394

Treatment-effects estimation

wage         Coef.
ATT       .6059013
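The role of the bandwidth in kernel matching can be seen in a few lines of code. In the Python sketch below, each treated unit is compared with a kernel-weighted average of the controls that lie within the bandwidth, and treated units without any control inside the bandwidth are counted as unmatched (the "No" column of the matching statistics). The Epanechnikov kernel matches the "Kernel = epan" line of the output; everything else (scalar distances, toy data, function names) is an assumption for illustration and not the kmatch implementation.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive weight only for |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def att_kernel_matching(treated, controls, bandwidth):
    """ATT by kernel matching on a scalar score; returns (ATT, unmatched count)."""
    effects, unmatched = [], 0
    for x_t, y_t in treated:
        weights = [epanechnikov((x_c - x_t) / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0:
            unmatched += 1      # no control within the bandwidth
            continue
        # Counterfactual outcome: kernel-weighted mean of control outcomes
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects), unmatched


# Toy data: (score, outcome)
controls = [(0.0, 1.0), (0.2, 2.0), (0.4, 3.0)]
treated = [(0.2, 5.0), (2.0, 9.0)]

att, n_unmatched = att_kernel_matching(treated, controls, bandwidth=0.3)
print(att, n_unmatched)
```

A larger bandwidth would match the second treated unit as well, at the price of comparing it with controls that are far away; this is the trade-off that the bandwidth selection methods address.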

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       448      9     457     1184     212    1396     1.8888

Treatment-effects estimation

wage         Coef.
ATT       .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; points are labeled with their search index; x-axis: Bandwidth; y-axis: MSE]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       453      4     457     1289     107    1396      2.433

Treatment-effects estimation

wage         Coef.
ATT       .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; points are labeled with their search index; x-axis: Bandwidth; y-axis: MISE]
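Cross-validation with respect to Y chooses the bandwidth that best predicts held-out outcomes. The following Python sketch shows the generic leave-one-out version of that idea on a one-dimensional toy sample: for each candidate bandwidth, every observation is predicted from the kernel-weighted mean of the others, and the bandwidth with the smallest mean squared prediction error wins. The data, the grid, and the function names are assumptions, and the search procedure used by kmatch may well differ in its details.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive weight only for |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def loo_cv_mse(points, bandwidth):
    """Leave-one-out mean squared error of a kernel-weighted mean predictor."""
    squared_errors = []
    for i, (x_i, y_i) in enumerate(points):
        num = den = 0.0
        for j, (x_j, y_j) in enumerate(points):
            if j == i:
                continue                      # leave observation i out
            w = epanechnikov((x_j - x_i) / bandwidth)
            num += w * y_j
            den += w
        if den == 0:
            return float("inf")               # bandwidth too small to predict i
        squared_errors.append((y_i - num / den) ** 2)
    return sum(squared_errors) / len(squared_errors)


def select_bandwidth(points, grid):
    """Pick the grid value with the smallest leave-one-out MSE."""
    return min(grid, key=lambda h: loo_cv_mse(points, h))


# Toy control sample on an even grid with a curved outcome, y = x^2
points = [(i / 10, (i / 10) ** 2) for i in range(11)]
grid = [0.15, 0.3, 0.6]

best = select_bandwidth(points, grid)
print(best, [round(loo_cv_mse(points, h), 5) for h in grid])
```

Because the outcome is curved, wide bandwidths average over dissimilar observations and predict poorly, so the smallest bandwidth that still finds neighbors for every point is selected here.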

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       455      2     457     1356      40    1396     2.7626

Treatment-effects estimation

wage         Coef.
ATT       .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; points are labeled with their search index; x-axis: Bandwidth; y-axis: Weighted MISE]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       366     91     457      701     695    1396         .5

Treatment-effects estimation

wage         Coef.
ATT       .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)       Standardized difference
Means           Matched  Unmatched     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad        .322404    .318681   .321663    .001585  -.006376   .007962
ttl_exp         13.3929    12.7682   13.2685    .027413  -.110253   .137666
tenure          8.12614    6.95055   7.89205    .038378  -.154356   .192734
3.industry      .002732    .021978   .006565   -.047404   .190657  -.238061
4.industry      .191257    .153846   .183807    .019212  -.077269   .096481
5.industry      .062842    .274725   .105033   -.137462   .552867  -.690329
6.industry      .057377          0   .045952    .054507  -.219225   .273732
7.industry      .019126    .021978   .019694   -.004083   .016423  -.020506
8.industry      .005464    .065934   .017505   -.091714   .368871  -.460585
9.industry      .010929    .010989   .010941   -.000115   .000462  -.000577
10.industry           0    .021978   .004376   -.066227   .266363  -.332589
11.industry     .554645    .175824   .479212     .15083  -.606636   .757467
12.industry     .092896    .241758   .122538   -.090299   .363181   -.45348
2.race          .243169    .681319   .330416   -.185284   .745209  -.930494
3.race          .002732    .076923   .017505   -.112525   .452572  -.565097
south            .29235    .318681   .297593   -.011456   .046074   -.05753

                 Common support (treated)                Ratio
Variances       Matched  Unmatched     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad        .219058    .219536   .218674    1.00176   1.00394   .997824
ttl_exp         19.4198    25.2474   20.5898    .943177   1.22621    .76918
tenure          38.3324    31.9242   37.2044    1.03032   .858076   1.20073
3.industry      .002732    .021734   .006536    .418045   3.32537   .125714
4.industry      .155101    .131624   .150351    1.03159   .875443   1.17837
5.industry      .059054    .201465   .094207    .626851   2.13854   .293122
6.industry      .054233          0   .043936    1.23435         0         .
7.industry      .018811    .021734   .019348    .972252    1.1233   .865531
8.industry       .00545    .062271   .017237    .316157   3.61269   .087513
9.industry      .010839    .010989   .010845    .999464   1.01328   .986361
10.industry           0    .021734   .004367          0   4.97709         0
11.industry     .247691     .14652   .250115    .990307   .585811   1.69049
12.industry     .084497    .185348   .107758    .784137   1.72003   .455885
2.race          .184542    .219536   .221726    .832297   .990121   .840601
3.race          .002732    .071795   .017237    .158513   4.16522   .038056
south           .207448    .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

. // make a graph of the common-support stats
. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched treated and all treated, one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); x-axis: Std. difference]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,852
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       432     25     457     1104     291    1395     1.3392

Treatment-effects estimation

                Coef.
wage
  ATT        .6021049
  NATE       1.430823
hours
  ATT        1.263759
  NATE       1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,852
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       432     25     457     1104     291    1395     1.3392

Treatment-effects estimation

                Coef.
wage
  ATT        .5152752
  NATE       1.430823
hours
  ATT        1.263759
  NATE       1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Replications    =         50
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
0
  Treated     306     15     321      625     120     745     1.3199
1
  Treated     126     10     136      473     178     651     1.3398

Treatment-effects estimation

          Observed   Bootstrap                        Normal-based
wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0
  ATT     .4586332   .2763358    1.66   0.097    -.082975    1.000241
1
  ATT     .9518705    .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)       .4932373   .4452343    1.11   0.268    -.379406    1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score in the population for untreated and treated; x-axis: Propensity score (0 to 1); y-axis: Density; McFadden R2 = 0.121]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated; x-axis: Propensity score; y-axis: Outcome]
[Figure, right panel: treatment effect by propensity score; x-axis: Propensity score; y-axis: Treatment effect]
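The population quantities on this slide are linked by two simple identities: the ATE is the average of ATT and ATC weighted by the treated and untreated shares, and the gap between the raw difference (NATE) and the ATT is the bias that matching is supposed to remove. The Python sketch below checks both, reading the slide's values as NATE = -4.79, ATT = -3.96, ATC = -3.41, and a treated share of 17.5%; the function name is hypothetical.

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the share-weighted average of ATT and ATC."""
    return share_treated * att + (1 - share_treated) * atc


nate, att, atc = -4.79, -3.96, -3.41
share_treated = 0.175            # resident aliens in the population

ate = ate_from_att_atc(att, atc, share_treated)
selection_bias = nate - att      # part of the raw difference not due to treatment

print(f"ATE = {ate:.2f}")                        # compare with the slide's ATE
print(f"NATE - ATT = {selection_bias:.2f}")
```

The recomputed ATE agrees with the reported value up to rounding of the inputs, which supports reading the garbled figures as two-decimal numbers.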

Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Notes:

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Notes:

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
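The result slides summarize each estimator by its variance, bias, mean squared error, and the coverage of its 95% confidence intervals across simulation replications. As a reminder of how such summaries are computed, here is a small Python Monte Carlo sketch. The estimator (a sample mean), the data-generating process, and all names are stand-ins chosen for illustration; they are not the simulation design of the talk.

```python
import random
from statistics import NormalDist, fmean, pvariance, stdev


def simulate(true_value=1.0, n=50, replications=2000, seed=12345):
    """Monte Carlo summary of a sample-mean estimator (stand-in for a matching estimator)."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(0.975)
    estimates, covered = [], 0
    for _ in range(replications):
        sample = [rng.gauss(true_value, 2.0) for _ in range(n)]
        est = fmean(sample)
        se = stdev(sample) / n ** 0.5
        estimates.append(est)
        # Does the normal-based 95% CI contain the true value?
        covered += (est - crit * se <= true_value <= est + crit * se)
    bias = fmean(estimates) - true_value
    variance = pvariance(estimates)
    mse = fmean((e - true_value) ** 2 for e in estimates)
    return {"bias": bias, "variance": variance, "mse": mse,
            "coverage": covered / replications}


results = simulate()
print(results)
```

The decomposition MSE = bias squared + variance holds exactly in such a summary, which is why an estimator with a small non-vanishing bias can still win on MSE when its variance is much lower.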

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Are King and Nielsen right?

Argument 3:
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better,' you do worse"

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       432     25     457     1105     291    1396     1.3394

Treatment-effects estimation

wage         Coef.
ATT       .6059013
NATE      1.432913

Some Examples

. // some balancing statistics
. kmatch summarize
(refitting the model using the generate() option)

                         Raw                          Matched (ATT)
Means          Treated  Untreated    StdDif    Treated  Untreated    StdDif
collgrad       .321663    .224212   .219912    .319444    .319444         0
ttl_exp        13.2685    12.7323   .117584    13.3205    13.1425   .039036
tenure         7.89205    6.17658    .29735    7.91744    7.58347   .057888
3.industry     .006565    .012178  -.058246     .00463     .00463         0
4.industry     .183807    .166905   .044425    .185185    .185185         0
5.industry     .105033    .027937   .312944    .085648    .085648         0
6.industry     .045952    .169771  -.407129    .048611    .048611         0
7.industry     .019694    .102436  -.350657    .020833    .020833         0
8.industry     .017505    .035817  -.113785    .009259    .009259         0
9.industry     .010941    .040115  -.185669    .011574    .011574         0
10.industry    .004376    .008596  -.052551    .002315    .002315         0
11.industry    .479212    .356734   .250073    .506944    .506944         0
12.industry    .122538     .07235   .169707     .12037     .12037         0
2.race         .330416    .244986   .189418      .3125      .3125         0
3.race         .017505    .011461   .050566    .006944    .006944         0
south          .297593    .466332  -.352408    .291667    .291667         0

                         Raw                          Matched (ATT)
Variances      Treated  Untreated     Ratio    Treated  Untreated     Ratio
collgrad       .218674    .174066   1.25628    .217904    .217904         1
ttl_exp        20.5898    21.0001   .980459    19.8177    18.2323   1.08696
tenure         37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
3.industry     .006536    .012038   .542928    .004619    .004619         1
4.industry     .150351    .139148   1.08052    .151242    .151242         1
5.industry     .094207    .027176   3.46656    .078494    .078494         1
6.industry     .043936     .14105   .311496    .046355    .046355         1
7.industry     .019348    .092008   .210287    .020447    .020447         1
8.industry     .017237    .034559   .498769    .009195    .009195         1
9.industry     .010845    .038533   .281445    .011467    .011467         1
10.industry    .004367    .008528   .512039    .002315    .002315         1
11.industry    .250115    .229639   1.08917    .250532    .250532         1
12.industry    .107758    .067163   1.60443    .106127    .106127         1
2.race         .221726      .1851   1.19787    .215342    .215342         1
3.race         .017237    .011338   1.52025    .006912    .006912         1
south           .20949    .249045   .841173    .207077    .207077         1
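The two balance measures in these tables can be recomputed from the reported means and variances. The Python sketch below uses a common definition of the standardized difference (difference in means divided by the square root of the average of the two group variances) and the plain ratio of variances. That the slide's raw-sample figures follow these definitions is an inference from the numbers, reading the collgrad row as means .321663 and .224212 and variances .218674 and .174066; the function names are hypothetical.

```python
import math


def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means scaled by the square root of the average variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)


def variance_ratio(var_t, var_c):
    """Variance of the treated divided by variance of the untreated."""
    return var_t / var_c


# collgrad, raw sample: treated vs. untreated
std_dif = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)

print(f"StdDif = {std_dif:.6f}")   # compare with the StdDif column (raw)
print(f"Ratio  = {ratio:.5f}")     # compare with the Ratio column (raw)
```

Good balance corresponds to standardized differences near 0 and variance ratios near 1, which is what the matched columns show for most covariates.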

Some Examples

. // make a graph of the balancing stats
. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" and "Variance ratio", one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); series: Raw, Matched]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                   Matched                 Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       431     26     457     1214     182    1396     .00188

Treatment-effects estimation

wage         Coef.
ATT       .3887224
NATE      1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT); x-axis: Propensity score; y-axis: Density]

Some Examples Cumulative distribution balancing plot kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 38

Some Examples Balancing box plot kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 39

Some Examples Standard errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 40

Some Examples

Do some tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 41

Some Examples Nearest-neighbor matching (1 neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 42

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853
Neighbors:  min = 5, max = 5
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    457    0    457     870   526     1396

Treatment-effects estimation

wage    Coef.
ATT     .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs = 1853
Estimator:       nearest-neighbor matching Matches: requested = 5
Outcome model:   matching                           min = 5
Distance metric: Mahalanobis                        max = 6

                                 AI Robust
wage                   Coef.      Std. Err.   z      P>|z|   [95% Conf. Interval]
ATET union
 (union vs nonunion)   .5590823   .2381752    2.35   0.019   .0922675   1.025897

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853
Neighbors:  min = 5, max = 5
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    457    0    457     870   526     1396

Treatment-effects estimation

wage    Coef.
ATT     .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs = 1853
Estimator:       nearest-neighbor matching Matches: requested = 5
Outcome model:   matching                           min = 5
Distance metric: Mahalanobis                        max = 6

                                 AI Robust
wage                   Coef.      Std. Err.   z      P>|z|   [95% Conf. Interval]
ATET union
 (union vs nonunion)   .5288023   .2420635    2.18   0.029   .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model:   logit (pr)
PS covars:  i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    439    18   457     1258  138     1396    .83886

Treatment-effects estimation

wage    Coef.
ATT     .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure
Exact:      industry race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    432    25   457     1103  293     1396    1.3013

Treatment-effects estimation

wage    Coef.
ATT     .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    432    25   457     1105  291     1396    1.3394

Treatment-effects estimation

wage    Coef.
ATT     .6059013
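To see how the bandwidth determines which treated units count as matched, consider a minimal Python sketch of kernel matching with the Epanechnikov kernel. It is a simplification for illustration, not the kmatch algorithm: the distance is the absolute difference on a single made-up covariate instead of the Mahalanobis distance, the bandwidth is supplied by hand rather than selected from the data, and the function names are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel; zero weight for |u| >= 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """Return (ATT, number of matched treated); units are (x, y) pairs."""
    effects = []
    for x_t, y_t in treated:
        weights = [epanechnikov((x_c - x_t) / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            # No control lies within the bandwidth: the unit stays unmatched
            continue
        # Counterfactual outcome: kernel-weighted mean of control outcomes
        y0 = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


# Toy data (made up): (covariate, outcome)
controls = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
treated = [(0.5, 4.0), (10.0, 9.0)]
att, n_matched = kernel_matching_att(treated, controls, bandwidth=1.0)
print(att, n_matched)
```

The second treated unit has no control within the bandwidth and drops out, which corresponds to the "Matched: No" column of the matching statistics above.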

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    448    9    457     1184  212     1396    1.8888

Treatment-effects estimation

wage    Coef.
ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; evaluation points labeled by index]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    453    4    457     1289  107     1396    2.433

Treatment-effects estimation

wage    Coef.
ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; evaluation points labeled by index]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    455    2    457     1356  40      1396    2.7626

Treatment-effects estimation

wage    Coef.
ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; evaluation points labeled by index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    366    91   457     701   695     1396    .5

Treatment-effects estimation

wage    Coef.
ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                      Standardized difference
Means         Matched   Unmatc~d  Total     (1)-(3)    (2)-(3)    (1)-(2)
collgrad      .322404   .318681   .321663   .001585    -.006376   .007962
ttl_exp       13.3929   12.7682   13.2685   .027413    -.110253   .137666
tenure        8.12614   6.95055   7.89205   .038378    -.154356   .192734
3.industry    .002732   .021978   .006565   -.047404   .190657    -.238061
4.industry    .191257   .153846   .183807   .019212    -.077269   .096481
5.industry    .062842   .274725   .105033   -.137462   .552867    -.690329
6.industry    .057377   0         .045952   .054507    -.219225   .273732
7.industry    .019126   .021978   .019694   -.004083   .016423    -.020506
8.industry    .005464   .065934   .017505   -.091714   .368871    -.460585
9.industry    .010929   .010989   .010941   -.000115   .000462    -.000577
10.industry   0         .021978   .004376   -.066227   .266363    -.332589
11.industry   .554645   .175824   .479212   .15083     -.606636   .757467
12.industry   .092896   .241758   .122538   -.090299   .363181    -.45348
2.race        .243169   .681319   .330416   -.185284   .745209    -.930494
3.race        .002732   .076923   .017505   -.112525   .452572    -.565097
south         .29235    .318681   .297593   -.011456   .046074    -.05753

Common support (treated)                      Ratio
Variances     Matched   Unmatc~d  Total     (1)/(3)    (2)/(3)    (1)/(2)
collgrad      .219058   .219536   .218674   1.00176    1.00394    .997824
ttl_exp       19.4198   25.2474   20.5898   .943177    1.22621    .76918
tenure        38.3324   31.9242   37.2044   1.03032    .858076    1.20073
3.industry    .002732   .021734   .006536   .418045    3.32537    .125714
4.industry    .155101   .131624   .150351   1.03159    .875443    1.17837
5.industry    .059054   .201465   .094207   .626851    2.13854    .293122
6.industry    .054233   0         .043936   1.23435    0          .
7.industry    .018811   .021734   .019348   .972252    1.1233     .865531
8.industry    .00545    .062271   .017237   .316157    3.61269    .087513
9.industry    .010839   .010989   .010845   .999464    1.01328    .986361
10.industry   0         .021734   .004367   0          4.97709    0
11.industry   .247691   .14652    .250115   .990307    .585811    1.69049
12.industry   .084497   .185348   .107758   .784137    1.72003    .455885
2.race        .184542   .219536   .221726   .832297    .990121    .840601
3.race        .002732   .071795   .017237   .158513    4.16522    .038056
south         .207448   .219536   .20949    .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched and total treated (column (1)-(3)) for each covariate, with a reference line at 0]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1852
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    432    25   457     1104  291     1395    1.3392

Treatment-effects estimation

         Coef.
wage
  ATT    .6021049
  NATE   1.430823
hours
  ATT    1.263759
  NATE   1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1852
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    432    25   457     1104  291     1395    1.3392

Treatment-effects estimation

         Coef.
wage
  ATT    .5152752
  NATE   1.430823
hours
  ATT    1.263759
  NATE   1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching    Number of obs = 1853
                                         Replications  = 50
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

             Matched             Controls              Band-
             Yes    No   Total   Used  Unused  Total   width
0  Treated   306    15   321     625   120     745     1.3199
1  Treated   126    10   136     473   178     651     1.3398

Treatment-effects estimation

          Observed    Bootstrap                     Normal-based
wage      Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
0  ATT    .4586332    .2763358    1.66   0.097   -.082975   1.000241
1  ATT    .9518705    .406903     2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

chi2(1)     = 1.23
Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage    Coef.      Std. Err.   z      P>|z|   [95% Conf. Interval]
(1)     .4932373   .4452343    1.11   0.268   -.379406   1.365881

Simulation

Population data from Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
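The three population effects are linked by a simple identity: the ATE is the average of ATT and ATC, weighted by the shares of treated and untreated. A minimal Python check, assuming a treated share of 17.5% in the population; the function name is hypothetical.

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the share-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc


# Population values from the simulation setup
ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)
print(round(ate, 2))
```

The result agrees with the reported ATE up to rounding of the inputs.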

[Figure, left panel: outcome by propensity score for untreated and treated. Right panel: treatment effect by propensity score.]

Results: Variance

[Figure: variance of the estimates for N = 500 and N = 5000, by algorithm. Nearest-neighbor matching: 1 neighbor, 5 neighbors. Kernel matching: fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each without and with bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, by algorithm. Nearest-neighbor matching: 1 neighbor, 5 neighbors. Kernel matching: fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each without and with bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000, by algorithm. Nearest-neighbor matching: 1 neighbor, 5 neighbors. Kernel matching: fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each without and with bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000, by algorithm. Nearest-neighbor matching (teffects): 1 neighbor, 5 neighbors. Nearest-neighbor matching (bootstrap): 1 neighbor, 5 neighbors. Kernel matching (bootstrap): fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each without and with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, by algorithm. Nearest-neighbor matching (teffects): 1 neighbor, 5 neighbors. Nearest-neighbor matching (bootstrap): 1 neighbor, 5 neighbors. Kernel matching (bootstrap): fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each without and with bias correction.]

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References I

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

References II

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.

Page 44: Why Propensity Scores Should Be Used for Matching

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    432    25   457     1105  291     1396    1.3394

Treatment-effects estimation

wage    Coef.
ATT     .6059013
NATE    1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

              Raw                              Matched (ATT)
Means         Treated   Untrea~d  StdDif     Treated   Untrea~d  StdDif
collgrad      .321663   .224212   .219912    .319444   .319444   0
ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure        7.89205   6.17658   .29735     7.91744   7.58347   .057888
3.industry    .006565   .012178   -.058246   .00463    .00463    0
4.industry    .183807   .166905   .044425    .185185   .185185   0
5.industry    .105033   .027937   .312944    .085648   .085648   0
6.industry    .045952   .169771   -.407129   .048611   .048611   0
7.industry    .019694   .102436   -.350657   .020833   .020833   0
8.industry    .017505   .035817   -.113785   .009259   .009259   0
9.industry    .010941   .040115   -.185669   .011574   .011574   0
10.industry   .004376   .008596   -.052551   .002315   .002315   0
11.industry   .479212   .356734   .250073    .506944   .506944   0
12.industry   .122538   .07235    .169707    .12037    .12037    0
2.race        .330416   .244986   .189418    .3125     .3125     0
3.race        .017505   .011461   .050566    .006944   .006944   0
south         .297593   .466332   -.352408   .291667   .291667   0

              Raw                              Matched (ATT)
Variances     Treated   Untrea~d  Ratio      Treated   Untrea~d  Ratio
collgrad      .218674   .174066   1.25628    .217904   .217904   1
ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry    .006536   .012038   .542928    .004619   .004619   1
4.industry    .150351   .139148   1.08052    .151242   .151242   1
5.industry    .094207   .027176   3.46656    .078494   .078494   1
6.industry    .043936   .14105    .311496    .046355   .046355   1
7.industry    .019348   .092008   .210287    .020447   .020447   1
8.industry    .017237   .034559   .498769    .009195   .009195   1
9.industry    .010845   .038533   .281445    .011467   .011467   1
10.industry   .004367   .008528   .512039    .002315   .002315   1
11.industry   .250115   .229639   1.08917    .250532   .250532   1
12.industry   .107758   .067163   1.60443    .106127   .106127   1
2.race        .221726   .1851     1.19787    .215342   .215342   1
3.race        .017237   .011338   1.52025    .006912   .006912   1
south         .20949    .249045   .841173    .207077   .207077   1
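The StdDif and Ratio columns can be reproduced from the reported means and variances. The sketch below is in Python and assumes the standardized difference is the mean difference divided by the square root of the average of the two group variances; the slides do not state this definition, but it reproduces the raw collgrad row. Function names are hypothetical.

```python
import math


def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Mean difference scaled by the root of the average group variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Ratio of the treated to the untreated variance."""
    return var_t / var_c


# Raw (unmatched) collgrad values from the balancing table
std_dif = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(round(std_dif, 4), round(ratio, 4))
```

Good balance shows up as a standardized difference near 0 and a variance ratio near 1, which is what the matched columns display for most covariates.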

Some Examples

Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: two-panel coefficient plot by covariate, raw vs. matched. Left panel: standardized mean difference, with a reference line at 0. Right panel: variance ratio, with a reference line at 1.]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching         Number of obs = 1853
                                         Kernel        = epan
Treatment:  union = 1
Covariates: collgrad ttl_exp tenure i.industry i.race south
PS model:   logit (pr)

Matching statistics

           Matched             Controls              Band-
           Yes    No   Total   Used  Unused  Total   width
Treated    431    26   457     1214  182     1396    .00188

Treatment-effects estimation

wage    Coef.
ATT     .3887224
NATE    1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(6 2) lc(5 1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated; panels Raw and Matched (ATT)]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(6 2) lc(5 1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated; panels Raw and Matched (ATT)]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated; panels Raw and Matched (ATT)]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching    Number of obs = 1853
                                         Replications  = 50
                                         Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls              Band-
            Yes    No   Total    Used  Unused  Total   width
Treated     432    25   457      1105  291     1396    1.3394
Untreated   1386   10   1396     455   2       457     3.3975
Combined    1818   35   1853     1560  293     1853

Treatment-effects estimation

        Observed    Bootstrap                     Normal-based
wage    Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
ATE     .4095729    .1920853    2.13   0.033    .0330928   .7860531
ATT     .6059013    .2472069    2.45   0.014    .1213846   1.090418
ATC     .3483797    .1893653    1.84   0.066   -.0227695   .7195289

NATE    1.432913    .2333282    6.14   0.000    .9755981   1.890228
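The idea behind vce(bootstrap) can be sketched in Python: resample the data with replacement, recompute the estimator on each resample, and take the standard deviation of the replicates as the standard error. This is an illustrative sketch on made-up data, not what kmatch does internally; the estimator here is the raw mean difference (the NATE) as a stand-in for the matching estimator, and the function names and seed are assumptions.

```python
import random
import statistics


def mean_difference(sample):
    """Raw mean difference between treated (d == 1) and untreated (d == 0)."""
    treated = [y for d, y in sample if d == 1]
    controls = [y for d, y in sample if d == 0]
    return statistics.mean(treated) - statistics.mean(controls)


def bootstrap_se(data, estimator, reps=50, seed=12345):
    """Standard deviation of the estimator across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(reps):
        # Draw n observations with replacement from the full sample;
        # the sketch assumes every resample contains both groups
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)


# Toy data (made up): (treatment indicator, outcome)
data = [(1, 5.0 + (i % 5)) for i in range(20)] + [(0, 3.0 + (i % 4)) for i in range(20)]
se = bootstrap_se(data, mean_difference)
print(round(mean_difference(data), 3), round(se, 3))
```

With a matching estimator, each replicate would rerun the complete matching procedure on the resample, which is why bootstrapped runs take noticeably longer.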

Some Examples

Do some tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 41

Some Examples Nearest-neighbor matching (1 neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 42

Some Examples Nearest-neighbor matching (5 neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 43

Some Examples Bias-correction regression adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 44

Some Examples

Mahalanobis-distance and propensity-score matching combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure
Exact:        industry race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   432   25    457    1103     293   1396   1.3013

Treatment-effects estimation

wage       Coef.
ATT     .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   432   25    457    1105     291   1396   1.3394

Treatment-effects estimation

wage       Coef.
ATT     .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   448    9    457    1184     212   1396   1.8888

Treatment-effects estimation

wage       Coef.
ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: MSE, x-axis: Bandwidth (about 1.5 to 3); markers labeled with their index (1 to 15).]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   453    4    457    1289     107   1396    2.433

Treatment-effects estimation

wage       Coef.
ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: MISE, x-axis: Bandwidth (about 1.5 to 3); markers labeled with their index (1 to 15).]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   455    2    457    1356      40   1396   2.7626

Treatment-effects estimation

wage       Coef.
ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: Weighted MISE, x-axis: Bandwidth (1 to 5); markers labeled with their index (1 to 15).]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   366   91    457     701     695   1396       .5

Treatment-effects estimation

wage       Coef.
ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): means and standardized differences
(1) matched, (2) unmatched, (3) total

Means         Matched  Unmatched     Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad      .322404    .318681   .321663    .001585   -.006376    .007962
ttl_exp       13.3929    12.7682   13.2685    .027413   -.110253    .137666
tenure        8.12614    6.95055   7.89205    .038378   -.154356    .192734
3.industry    .002732    .021978   .006565   -.047404    .190657   -.238061
4.industry    .191257    .153846   .183807    .019212   -.077269    .096481
5.industry    .062842    .274725   .105033   -.137462    .552867   -.690329
6.industry    .057377          0   .045952    .054507   -.219225    .273732
7.industry    .019126    .021978   .019694   -.004083    .016423   -.020506
8.industry    .005464    .065934   .017505   -.091714    .368871   -.460585
9.industry    .010929    .010989   .010941   -.000115    .000462   -.000577
10.industry         0    .021978   .004376   -.066227    .266363   -.332589
11.industry   .554645    .175824   .479212     .15083   -.606636    .757467
12.industry   .092896    .241758   .122538   -.090299    .363181    -.45348
2.race        .243169    .681319   .330416   -.185284    .745209   -.930494
3.race        .002732    .076923   .017505   -.112525    .452572   -.565097
south          .29235    .318681   .297593   -.011456    .046074    -.05753

Common support (treated): variances and variance ratios
(1) matched, (2) unmatched, (3) total

Variances     Matched  Unmatched     Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad      .219058    .219536   .218674    1.00176    1.00394    .997824
ttl_exp       19.4198    25.2474   20.5898    .943177    1.22621     .76918
tenure        38.3324    31.9242   37.2044    1.03032    .858076    1.20073
3.industry    .002732    .021734   .006536    .418045    3.32537    .125714
4.industry    .155101    .131624   .150351    1.03159    .875443    1.17837
5.industry    .059054    .201465   .094207    .626851    2.13854    .293122
6.industry    .054233          0   .043936    1.23435          0          .
7.industry    .018811    .021734   .019348    .972252     1.1233    .865531
8.industry     .00545    .062271   .017237    .316157    3.61269    .087513
9.industry    .010839    .010989   .010845    .999464    1.01328    .986361
10.industry         0    .021734   .004367          0    4.97709          0
11.industry   .247691     .14652   .250115    .990307    .585811    1.69049
12.industry   .084497    .185348   .107758    .784137    1.72003    .455885
2.race        .184542    .219536   .221726    .832297    .990121    .840601
3.race        .002732    .071795   .017237    .158513    4.16522    .038056
south         .207448    .219536    .20949    .990254    1.04796    .944939

Some Examples

Make a graph of the common-support statistics

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing the standardized difference (x-axis, about -.2 to .2) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   432   25    457    1104     291   1395   1.3392

Treatment-effects estimation

               Coef.
wage    ATT   .6021049
        NATE  1.430823
hours   ATT   1.263759
        NATE  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   432   25    457    1104     291   1395   1.3392

Treatment-effects estimation

               Coef.
wage    ATT   .5152752
        NATE  1.430823
hours   ATT   1.263759
        NATE  1.450303

wage:  adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching

Number of obs = 1853
Replications = 50
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

               Matched              Controls           Band-
             Yes   No  Total    Used  Unused  Total    width
0: Treated   306   15    321     625     120    745   1.3199
1: Treated   126   10    136     473     178    651   1.3398

Treatment-effects estimation

           Observed   Bootstrap                        Normal-based
wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0: ATT     .4586332    .2763358    1.66    0.097    -.082975    1.000241
1: ATT     .9518705     .406903    2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(1)     =  1.23
       Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)        .4932373    .4452343    1.11    0.268    -.379406    1.365881
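The test and lincom results on this slide can be cross-checked by hand. The sketch below is not part of the talk; it assumes the reported difference reads as 0.4932373 with a standard error of 0.4452343 (decimal points restored from the extracted text) and uses the normal approximation that the output labels "Normal-based". The function names are illustrative.

```python
import math


def wald_z(coef, std_err):
    """z statistic for H0: coefficient = 0."""
    return coef / std_err


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values read from the lincom output (assumed decimal placement).
diff, se = 0.4932373, 0.4452343

z = wald_z(diff, se)
chi2 = z ** 2          # Wald chi-squared with 1 degree of freedom
p = two_sided_p(z)

print(round(z, 2), round(chi2, 2), round(p, 3))
```

With one restriction, the chi-squared statistic from test is the square of the z statistic from lincom, which is why both commands report the same p-value.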

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to working people aged 24 to 60.

2’308’006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
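The result slides report variance, bias, mean squared error, relative standard error, and coverage of 95% confidence intervals across the simulation draws. The following is a minimal sketch of how such summaries can be computed from a list of estimates and their estimated standard errors. It is not the author's simulation code; in particular, defining the relative standard error as the mean estimated standard error divided by the standard deviation of the estimates is an assumption made here for illustration.

```python
def summarize_simulation(estimates, std_errors, truth, z=1.96):
    """Summarize Monte Carlo results for one estimator.

    estimates  : point estimates, one per simulated sample
    std_errors : estimated standard errors, one per simulated sample
    truth      : the known population value of the estimand
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - truth
    variance = sum((e - mean_est) ** 2 for e in estimates) / n
    mse = sum((e - truth) ** 2 for e in estimates) / n
    # Share of normal-based intervals that contain the true value.
    covered = sum(
        1 for e, s in zip(estimates, std_errors)
        if e - z * s <= truth <= e + z * s
    )
    # Assumed definition: average estimated SE over the empirical SD.
    rel_se = (sum(std_errors) / n) / variance ** 0.5
    return {
        "bias": bias,
        "variance": variance,
        "mse": mse,
        "coverage": covered / n,
        "relative_se": rel_se,
    }


result = summarize_simulation([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], truth=2.0)
print(result)
```

The mean squared error decomposes into squared bias plus variance, which is a convenient consistency check on any simulation summary.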

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis, 0 to 1) for untreated and treated; McFadden R2 = 0.121.]

Simulation

Raw mean difference in occupational prestige (NATE): −4.79

Population ATT (computed from fully stratified data): −3.96

There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: outcome (y-axis, 30 to 55) by propensity score (x-axis, 0 to .8) for untreated and treated.]

[Figure, right panel: treatment effect (y-axis, −6 to −1) by propensity score (x-axis, 0 to .8).]
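The three population effects on this slide are linked by the identity ATE = p × ATT + (1 − p) × ATC, where p is the share of treated units. A short check, assuming the treated share reads as 17.5% and the effects as −3.96 and −3.41 (decimal points and signs restored from the extracted text):

```python
# Values as read from the slides (assumed decimal placement).
share_treated = 0.175
att = -3.96    # average effect on the treated
atc = -3.41    # average effect on the untreated
nate = -4.79   # raw mean difference

# Law of total expectation over the treatment groups.
ate = share_treated * att + (1.0 - share_treated) * atc

# Part of the raw difference that is not a treatment effect on the treated.
selection_bias = nate - att

print(round(ate, 2), round(selection_bias, 2))
```

The result agrees with the reported ATE of −3.51 up to rounding, which also supports reading the treated share as 17.5%.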

Results: Variance

[Figure: variance of the estimators (x-axis, about 1.5 to 4.5) in two panels, N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent (x-axis, about 70 to 130 for N = 500 and 95 to 135 for N = 5000) in two panels, N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
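The slides do not spell out how "bias reduction (in percent)" is computed. One common definition, assumed here purely for illustration, compares the bias remaining after matching with the bias of the raw mean difference (NATE) relative to the population ATT. The function name and the example estimates are hypothetical; the NATE and ATT values are read from the simulation slides.

```python
def bias_reduction_percent(estimate, naive, truth):
    """Percent of the raw bias removed by an estimator (assumed definition).

    100 means the estimate equals the true value, 0 means it is no better
    than the naive difference, and values above 100 indicate overcorrection.
    """
    raw_bias = naive - truth
    remaining_bias = estimate - truth
    return 100.0 * (1.0 - remaining_bias / raw_bias)


NATE, ATT = -4.79, -3.96   # values read from the simulation slides

perfect = bias_reduction_percent(ATT, NATE, ATT)
none = bias_reduction_percent(NATE, NATE, ATT)
overshoot = bias_reduction_percent(-3.80, NATE, ATT)  # hypothetical estimate

print(perfect, none, round(overshoot, 1))
```

Under this definition, values above 100 on the x-axis of the figure correspond to estimators that move past the true ATT.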

Results: Mean squared error

[Figure: mean squared error (x-axis, about 1.5 to 5) in two panels, N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error (x-axis, about .9 to 1.25) in two panels, N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals (x-axis, about .9 to .98) in two panels, N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen’s “Why Propensity Scores Should Not Be Used for Matching”

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don’t use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment/bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   432   25    457    1105     291   1396   1.3394

Treatment-effects estimation

wage       Coef.
ATT     .6059013
NATE    1.432913
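The "mahalanobis" metric in the output above scales covariate differences by the inverse of a covariance matrix. As a rough illustration of the textbook formula only (not of kmatch's internals), the following sketch computes the Mahalanobis distance for two covariates with an explicit 2×2 inverse; the function name and inputs are hypothetical.

```python
import math


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between two points with two covariates.

    cov is the 2x2 covariance (scaling) matrix ((s11, s12), (s21, s22)).
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Inverse of a 2x2 matrix.
    inv = ((s22 / det, -s12 / det), (-s21 / det, s11 / det))
    d0, d1 = x[0] - y[0], x[1] - y[1]
    quad = (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))
    return math.sqrt(quad)


# With an identity scaling matrix the distance is Euclidean.
euclid = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), ((1.0, 0.0), (0.0, 1.0)))
# A larger variance on the first covariate shrinks distances along it.
scaled = mahalanobis_2d((0.0, 0.0), (2.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))
print(euclid, scaled)
```

The second call shows why the choice of scaling matrix matters: the same raw difference of 2 counts as a distance of 1 once the covariate's variance is 4.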

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                         Raw                          Matched (ATT)
Means         Treated  Untreated  Std.Dif.    Treated  Untreated  Std.Dif.
collgrad      .321663    .224212   .219912    .319444    .319444         0
ttl_exp       13.2685    12.7323   .117584    13.3205    13.1425   .039036
tenure        7.89205    6.17658    .29735    7.91744    7.58347   .057888
3.industry    .006565    .012178  -.058246     .00463     .00463         0
4.industry    .183807    .166905   .044425    .185185    .185185         0
5.industry    .105033    .027937   .312944    .085648    .085648         0
6.industry    .045952    .169771  -.407129    .048611    .048611         0
7.industry    .019694    .102436  -.350657    .020833    .020833         0
8.industry    .017505    .035817  -.113785    .009259    .009259         0
9.industry    .010941    .040115  -.185669    .011574    .011574         0
10.industry   .004376    .008596  -.052551    .002315    .002315         0
11.industry   .479212    .356734   .250073    .506944    .506944         0
12.industry   .122538     .07235   .169707     .12037     .12037         0
2.race        .330416    .244986   .189418      .3125      .3125         0
3.race        .017505    .011461   .050566    .006944    .006944         0
south         .297593    .466332  -.352408    .291667    .291667         0

                         Raw                          Matched (ATT)
Variances     Treated  Untreated     Ratio    Treated  Untreated     Ratio
collgrad      .218674    .174066   1.25628    .217904    .217904         1
ttl_exp       20.5898    21.0001   .980459    19.8177    18.2323   1.08696
tenure        37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
3.industry    .006536    .012038   .542928    .004619    .004619         1
4.industry    .150351    .139148   1.08052    .151242    .151242         1
5.industry    .094207    .027176   3.46656    .078494    .078494         1
6.industry    .043936     .14105   .311496    .046355    .046355         1
7.industry    .019348    .092008   .210287    .020447    .020447         1
8.industry    .017237    .034559   .498769    .009195    .009195         1
9.industry    .010845    .038533   .281445    .011467    .011467         1
10.industry   .004367    .008528   .512039    .002315    .002315         1
11.industry   .250115    .229639   1.08917    .250532    .250532         1
12.industry   .107758    .067163   1.60443    .106127    .106127         1
2.race        .221726      .1851   1.19787    .215342    .215342         1
3.race        .017237    .011338   1.52025    .006912    .006912         1
south          .20949    .249045   .841173    .207077    .207077         1
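The two balance measures in these tables can be reproduced from the group means and variances. The sketch below assumes the standardized difference divides the mean difference by the square root of the average of the two raw group variances; that definition is inferred from the reported collgrad row rather than taken from the kmatch documentation, and the decimal placement of the inputs is restored from the extracted text.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference using the average of the two variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Ratio of the treated-group variance to the untreated-group variance."""
    return var_t / var_c


# Raw (unmatched) values for collgrad as read from the slide.
sd_collgrad = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = variance_ratio(0.218674, 0.174066)

print(round(sd_collgrad, 4), round(vr_collgrad, 4))
```

Both results agree with the reported raw values for collgrad (.219912 and 1.25628) up to rounding of the inputs.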

Some Examples

Make a graph of the balancing statistics

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: two-panel coefficient plot for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south. Left panel: Std. mean difference (x-axis, about -.4 to .4). Right panel: Variance ratio (x-axis, 0 to 4). Series: Raw and Matched.]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching

Number of obs = 1853
Kernel:       epan
Treatment:    union = 1
Covariates:   collgrad ttl_exp tenure i.industry i.race south
PS model:     logit (pr)

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   431   26    457    1214     182   1396   .00188

Treatment-effects estimation

wage       Coef.
ATT     .3887224
NATE    1.432913
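To make the logic of propensity-score kernel matching concrete, here is a minimal sketch of an ATT estimator with an Epanechnikov kernel. It illustrates the general technique under simplifying assumptions (propensity scores already estimated, a fixed bandwidth, no regression adjustment) and is not a description of how kmatch is implemented; all names and the toy data are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_match_att(ps_treated, y_treated, ps_control, y_control, bandwidth):
    """ATT by kernel matching on the propensity score.

    For each treated unit, the counterfactual outcome is the kernel-weighted
    mean of control outcomes within the bandwidth. Treated units without any
    control inside the bandwidth are left unmatched and dropped.
    Returns (att, number matched, number unmatched).
    """
    effects = []
    unmatched = 0
    for p_i, y_i in zip(ps_treated, y_treated):
        weights = [epanechnikov((p_j - p_i) / bandwidth) for p_j in ps_control]
        total = sum(weights)
        if total == 0.0:
            unmatched += 1
            continue
        y0_hat = sum(w * y for w, y in zip(weights, y_control)) / total
        effects.append(y_i - y0_hat)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, len(effects), unmatched


# Toy data: the second treated unit has no control within the bandwidth.
att, n_matched, n_unmatched = kernel_match_att(
    ps_treated=[0.5, 0.9], y_treated=[10.0, 20.0],
    ps_control=[0.4, 0.6], y_control=[6.0, 8.0],
    bandwidth=0.2,
)
print(att, n_matched, n_unmatched)
```

The matched and unmatched counts play the same role as the "Matched: Yes / No" columns in the output above: a smaller bandwidth leaves more treated units without usable controls.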

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density (y-axis, 0 to 3) of the propensity score (x-axis, 0 to .8) for untreated and treated, in two panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability (y-axis, 0 to 1) of the propensity score (x-axis, 0 to .8) for untreated and treated, in two panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score (y-axis, 0 to .8) for untreated and treated, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching

Number of obs = 1853
Replications = 50
Kernel:       epan
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
             Yes   No  Total     Used  Unused  Total    width
Treated      432   25    457     1105     291   1396   1.3394
Untreated   1386   10   1396      455       2    457   3.3975
Combined    1818   35   1853     1560     293   1853

Treatment-effects estimation

         Observed   Bootstrap                        Normal-based
wage        Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATE      .4095729    .1920853    2.13    0.033    .0330928    .7860531
ATT      .6059013    .2472069    2.45    0.014    .1213846    1.090418
ATC      .3483797    .1893653    1.84    0.066   -.0227695    .7195289
NATE     1.432913    .2333282    6.14    0.000    .9755981    1.890228

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
(1)      -.8270117    .1810415   -4.57    0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

       chi2(1)     =  2.42
       Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853
Neighbors:    min = 1, max = 1
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   457    0    457     328    1068   1396

Treatment-effects estimation

wage       Coef.
ATT     .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation
Number of obs:    1853
Estimator:        nearest-neighbor matching
Matches:          requested = 1, min = 1, max = 1
Outcome model:    matching
Distance metric:  Mahalanobis

                                  AI Robust
wage                     Coef.    Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET union
(union vs nonunion)   .7246969    .2942952    2.46    0.014     .147889    1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853
Neighbors:    min = 5, max = 5
Treatment:    union = 1
Metric:       mahalanobis
Covariates:   collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            Matched              Controls           Band-
          Yes   No  Total    Used  Unused  Total    width
Treated   457    0    457     870     526   1396

Treatment-effects estimation

wage       Coef.
ATT     .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation
Number of obs:    1853
Estimator:        nearest-neighbor matching
Matches:          requested = 5, min = 5, max = 6
Outcome model:    matching
Distance metric:  Mahalanobis

                                  AI Robust
wage                     Coef.    Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET union
(union vs nonunion)   .5590823    .2381752    2.35    0.019    .0922675    1.025897

Some Examples Bias-correction regression adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 44

Some Examples

Mahalanobis-distance and propensity-score matching combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45

Some Examples

Exact matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46

Some Examples

Bandwidth selection the default (based on distribution of distances in

one-nearest-neighbor matching)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 47

Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

3

02

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48

Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   455      2    457  |  1356      40   1396 | 2.7626

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: Bandwidth, y-axis: Weighted MISE; points labeled by search index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   366     91    457  |   701     695   1396 |     .5

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): means

             |  Matched  Unmatched     Total |    Standardized difference
             |      (1)        (2)       (3) |  (1)-(3)    (2)-(3)    (1)-(2)
    collgrad |  .322404    .318681   .321663 |  .001585   -.006376    .007962
     ttl_exp |  13.3929    12.7682   13.2685 |  .027413   -.110253    .137666
      tenure |  8.12614    6.95055   7.89205 |  .038378   -.154356    .192734
  3.industry |  .002732    .021978   .006565 | -.047404    .190657   -.238061
  4.industry |  .191257    .153846   .183807 |  .019212   -.077269    .096481
  5.industry |  .062842    .274725   .105033 | -.137462    .552867   -.690329
  6.industry |  .057377          0   .045952 |  .054507   -.219225    .273732
  7.industry |  .019126    .021978   .019694 | -.004083    .016423   -.020506
  8.industry |  .005464    .065934   .017505 | -.091714    .368871   -.460585
  9.industry |  .010929    .010989   .010941 | -.000115    .000462   -.000577
 10.industry |        0    .021978   .004376 | -.066227    .266363   -.332589
 11.industry |  .554645    .175824   .479212 |   .15083   -.606636    .757467
 12.industry |  .092896    .241758   .122538 | -.090299    .363181    -.45348
      2.race |  .243169    .681319   .330416 | -.185284    .745209   -.930494
      3.race |  .002732    .076923   .017505 | -.112525    .452572   -.565097
       south |   .29235    .318681   .297593 | -.011456    .046074    -.05753

Common support (treated): variances

             |  Matched  Unmatched     Total |             Ratio
             |      (1)        (2)       (3) |  (1)/(3)    (2)/(3)    (1)/(2)
    collgrad |  .219058    .219536   .218674 |  1.00176    1.00394    .997824
     ttl_exp |  19.4198    25.2474   20.5898 |  .943177    1.22621     .76918
      tenure |  38.3324    31.9242   37.2044 |  1.03032    .858076    1.20073
  3.industry |  .002732    .021734   .006536 |  .418045    3.32537    .125714
  4.industry |  .155101    .131624   .150351 |  1.03159    .875443    1.17837
  5.industry |  .059054    .201465   .094207 |  .626851    2.13854    .293122
  6.industry |  .054233          0   .043936 |  1.23435          0          .
  7.industry |  .018811    .021734   .019348 |  .972252     1.1233    .865531
  8.industry |   .00545    .062271   .017237 |  .316157    3.61269    .087513
  9.industry |  .010839    .010989   .010845 |  .999464    1.01328    .986361
 10.industry |        0    .021734   .004367 |        0    4.97709          0
 11.industry |  .247691     .14652   .250115 |  .990307    .585811    1.69049
 12.industry |  .084497    .185348   .107758 |  .784137    1.72003    .455885
      2.race |  .184542    .219536   .221726 |  .832297    .990121    .840601
      3.race |  .002732    .071795   .017237 |  .158513    4.16522    .038056
       south |  .207448    .219536    .20949 |  .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference (matched vs. total) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south; x-axis: Std. difference, from -.2 to .2]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   432     25    457  |  1104     291   1395 | 1.3392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303


Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   432     25    457  |  1104     291   1395 | 1.3392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
0  Treated |   306     15    321  |   625     120    745 | 1.3199
1  Treated |   126     10    136  |   473     178    651 | 1.3398

Treatment-effects estimation

           |   Observed   Bootstrap                        Normal-based
      wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0      ATT |   .4586332   .2763358    1.66    0.097    -.082975    1.000241
1      ATT |   .9518705    .406903    2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
       (1) |   .4932373   .4452343    1.11    0.268    -.379406    1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to working people between 24 and 60 years old.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in the population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: Propensity score (0 to 1), y-axis: Density. McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated; x-axis: Propensity score, y-axis: Outcome]

[Figure, right panel: treatment effect by propensity score; x-axis: Propensity score, y-axis: Treatment effect]
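The population quantities on this slide can be cross-checked with a little arithmetic: the ATE should be the average of ATT and ATC, weighted by the treated share. The Python sketch below assumes the decimal points lost in extraction (NATE = -4.79, ATT = -3.96, ATC = -3.41, treated share = 17.5%); the function name `ate_from_att_atc` is hypothetical.

```python
def ate_from_att_atc(att, atc, share_treated):
    """Average treatment effect as the share-weighted mean of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc


# Values assumed from the slide (decimal points restored by assumption).
NATE = -4.79
ATT = -3.96
ATC = -3.41
SHARE_TREATED = 0.175

ate = ate_from_att_atc(ATT, ATC, SHARE_TREATED)

# Part of the raw difference that is not a treatment effect on the treated.
selection_bias = NATE - ATT

print(f"implied ATE    = {ate:.2f}")
print(f"selection bias = {selection_bias:.2f}")
```

Under these assumptions the implied ATE rounds to the -3.51 reported on the slide, which supports the reading of the garbled figures.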

Results: Variance

[Figure: variance of the ATT estimators; panels: N = 500, N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction]


In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent; panels: N = 500, N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction]


Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the ATT estimators; panels: N = 500, N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction]

Results: Relative standard error

[Figure: relative standard error; panels: N = 500, N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals; panels: N = 500, N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction]
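The result slides report variance, bias reduction, mean squared error, relative standard error, and coverage, but the transcript does not define them. The Python sketch below shows one common way to compute such summaries from simulation replications. The definitions are assumptions, not taken from the slides: bias reduction is measured relative to the raw difference (NATE), and the relative standard error is the mean estimated standard error divided by the standard deviation of the estimates. The function name and the toy numbers are hypothetical.

```python
from statistics import mean, pvariance


def summarize_simulation(estimates, std_errors, true_att, nate, crit=1.96):
    """Summarize Monte Carlo replications of an ATT estimator.

    All definitions here are assumptions made for illustration.
    """
    avg = mean(estimates)
    variance = pvariance(estimates)          # sampling variance across replications
    bias = avg - true_att
    mse = mean([(e - true_att) ** 2 for e in estimates])
    # Share of the raw bias (NATE - ATT) removed by the estimator, in percent.
    bias_reduction = 100.0 * (nate - avg) / (nate - true_att)
    # Mean estimated SE relative to the observed sampling SD.
    relative_se = mean(std_errors) / variance ** 0.5
    # Share of replications whose normal-based CI covers the true ATT.
    coverage = mean(
        [1.0 if abs(e - true_att) <= crit * s else 0.0
         for e, s in zip(estimates, std_errors)]
    )
    return {
        "variance": variance,
        "bias": bias,
        "mse": mse,
        "bias_reduction": bias_reduction,
        "relative_se": relative_se,
        "coverage": coverage,
    }


# Toy replications (hypothetical), with the population values assumed from the slides.
result = summarize_simulation(
    estimates=[-4.0, -3.8, -4.2, -3.6],
    std_errors=[0.2, 0.2, 0.2, 0.2],
    true_att=-3.96,
    nate=-4.79,
)
for name, value in result.items():
    print(f"{name:15s} {value:.4f}")
```

With these definitions the mean squared error decomposes into variance plus squared bias, and a bias reduction above 100 percent means the estimator overshoots the true ATT.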


Conclusions

The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.

Page 46: Why Propensity Scores Should Be Used for Matching

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   432     25    457  |  1105     291   1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013
      NATE |  1.432913
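The logic of kernel matching for the ATT can be sketched in a few lines. The Python code below is not kmatch's implementation; it is a minimal illustration that assumes a single scalar covariate (so the absolute difference stands in for the Mahalanobis distance), an Epanechnikov kernel, and a fixed bandwidth. The function names and the toy data are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """Kernel matching estimate of the ATT.

    treated, controls: lists of (x, y) pairs with a scalar covariate x.
    Returns (att, n_matched, n_unmatched), counted over treated units.
    """
    effects = []
    n_unmatched = 0
    for x_t, y_t in treated:
        weights = [epan((x_c - x_t) / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            # No control within the bandwidth: the unit is off common support.
            n_unmatched += 1
            continue
        # Kernel-weighted mean of control outcomes as the counterfactual.
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects), n_unmatched


# Toy data (hypothetical): the third treated unit has no nearby control.
treated = [(1.0, 5.0), (2.0, 6.0), (10.0, 9.0)]
controls = [(1.0, 4.0), (2.0, 4.0), (3.0, 5.0)]

att, n_matched, n_unmatched = kernel_matching_att(treated, controls, bandwidth=1.5)
print(f"ATT = {att:.4f}, matched = {n_matched}, unmatched = {n_unmatched}")
```

Treated units without any control inside the bandwidth are dropped, which corresponds to the "Matched: No" count in the matching statistics.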

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |              Raw               |         Matched (ATT)
       Means |  Treated  Untreated    StdDif  |  Treated  Untreated    StdDif
    collgrad |  .321663    .224212   .219912  |  .319444    .319444         0
     ttl_exp |  13.2685    12.7323   .117584  |  13.3205    13.1425   .039036
      tenure |  7.89205    6.17658    .29735  |  7.91744    7.58347   .057888
  3.industry |  .006565    .012178  -.058246  |   .00463     .00463         0
  4.industry |  .183807    .166905   .044425  |  .185185    .185185         0
  5.industry |  .105033    .027937   .312944  |  .085648    .085648         0
  6.industry |  .045952    .169771  -.407129  |  .048611    .048611         0
  7.industry |  .019694    .102436  -.350657  |  .020833    .020833         0
  8.industry |  .017505    .035817  -.113785  |  .009259    .009259         0
  9.industry |  .010941    .040115  -.185669  |  .011574    .011574         0
 10.industry |  .004376    .008596  -.052551  |  .002315    .002315         0
 11.industry |  .479212    .356734   .250073  |  .506944    .506944         0
 12.industry |  .122538     .07235   .169707  |   .12037     .12037         0
      2.race |  .330416    .244986   .189418  |    .3125      .3125         0
      3.race |  .017505    .011461   .050566  |  .006944    .006944         0
       south |  .297593    .466332  -.352408  |  .291667    .291667         0

             |              Raw               |         Matched (ATT)
   Variances |  Treated  Untreated     Ratio  |  Treated  Untreated     Ratio
    collgrad |  .218674    .174066   1.25628  |  .217904    .217904         1
     ttl_exp |  20.5898    21.0001   .980459  |  19.8177    18.2323   1.08696
      tenure |  37.2044    29.3629   1.26706  |  37.0399    34.9543   1.05966
  3.industry |  .006536    .012038   .542928  |  .004619    .004619         1
  4.industry |  .150351    .139148   1.08052  |  .151242    .151242         1
  5.industry |  .094207    .027176   3.46656  |  .078494    .078494         1
  6.industry |  .043936     .14105   .311496  |  .046355    .046355         1
  7.industry |  .019348    .092008   .210287  |  .020447    .020447         1
  8.industry |  .017237    .034559   .498769  |  .009195    .009195         1
  9.industry |  .010845    .038533   .281445  |  .011467    .011467         1
 10.industry |  .004367    .008528   .512039  |  .002315    .002315         1
 11.industry |  .250115    .229639   1.08917  |  .250532    .250532         1
 12.industry |  .107758    .067163   1.60443  |  .106127    .106127         1
      2.race |  .221726      .1851   1.19787  |  .215342    .215342         1
      3.race |  .017237    .011338   1.52025  |  .006912    .006912         1
       south |   .20949    .249045   .841173  |  .207077    .207077         1
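The balancing statistics can be reproduced by hand from the reported means and variances. The Python sketch below assumes the usual definition of the standardized difference (difference in means divided by the square root of the average of the two raw variances) and reads the collgrad row with decimal points restored (means .321663 and .224212, variances .218674 and .174066). Both the definition and the restored decimals are assumptions; the function names are hypothetical.

```python
import math


def std_mean_difference(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference using the average of the two variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Ratio of treated to untreated variance; 1 indicates equal spread."""
    return var_t / var_c


# collgrad, raw sample (values assumed from the balancing table).
smd_collgrad = std_mean_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio_collgrad = variance_ratio(0.218674, 0.174066)

# ttl_exp, matched means scaled by the raw variances.
smd_ttl_exp_matched = std_mean_difference(13.3205, 13.1425, 20.5898, 21.0001)

print(f"collgrad StdDif (raw)     = {smd_collgrad:.6f}")
print(f"collgrad variance ratio   = {ratio_collgrad:.5f}")
print(f"ttl_exp StdDif (matched)  = {smd_ttl_exp_matched:.6f}")
```

The matched standardized difference for ttl_exp only agrees with the table when the raw variances are kept in the denominator, which suggests that the same scaling is used before and after matching.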

Some Examples

Make a graph of the balancing statistics:

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" and "Variance ratio", for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south; series: Raw, Matched]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching             Number of obs = 1,853
                                             Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   431     26    457  |  1214     182   1396 | .00188

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3887224
      NATE |  1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(6 2) lc(5 1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated; panels: Raw, Matched (ATT); x-axis: Propensity score, y-axis: Density]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(6 2) lc(5 1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated; panels: Raw, Matched (ATT); x-axis: Propensity score, y-axis: Cumulative probability]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated; panels: Raw, Matched (ATT); y-axis: Propensity score]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   432     25    457  |  1105     291   1396 | 1.3394
 Untreated |  1386     10   1396  |   455       2    457 | 3.3975
  Combined |  1818     35   1853  |  1560     293   1853 |

Treatment-effects estimation

           |   Observed   Bootstrap                        Normal-based
      wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
       ATE |   .4095729   .1920853    2.13    0.033    .0330928    .7860531
       ATT |   .6059013   .2472069    2.45    0.014    .1213846    1.090418
       ATC |   .3483797   .1893653    1.84    0.066   -.0227695    .7195289
      NATE |   1.432913   .2333282    6.14    0.000    .9755981    1.890228
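The z statistics, p-values, and normal-based confidence intervals in this table follow directly from the coefficient and its bootstrap standard error. The Python sketch below reproduces the ATE row, assuming the decimal points lost in extraction (coefficient .4095729, standard error .1920853); the function name is hypothetical.

```python
from statistics import NormalDist


def normal_based_inference(coef, std_err, level=0.95):
    """Return z, two-sided p-value, and normal-based CI bounds."""
    std_normal = NormalDist()
    z = coef / std_err
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    return z, p_value, coef - crit * std_err, coef + crit * std_err


# ATE row (values assumed from the bootstrap table).
z, p_value, ci_low, ci_high = normal_based_inference(0.4095729, 0.1920853)
print(f"z = {z:.2f}, p = {p_value:.3f}, 95% CI = [{ci_low:.7f}, {ci_high:.7f}]")
```

Note that the interval relies on the normal approximation; the standard error itself comes from only 50 bootstrap replications in the example.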

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

      wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
       (1) |  -.8270117   .1810415   -4.57    0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching    Number of obs  = 1,853
                                                   Neighbors: min = 1
                                                              max = 1
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   457      0    457  |   328    1068   1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                     |             AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |   .7246969   .2942952    2.46    0.014     .147889    1.301505
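In the output above, only 328 of the 1396 controls serve as matches for the 457 treated, so controls are reused (matching with replacement). The Python sketch below illustrates this with a scalar covariate in place of the Mahalanobis distance. It is a simplified illustration, not the kmatch or teffects algorithm, and it ignores ties; the function name and the toy data are hypothetical.

```python
def nn_matching_att(treated, controls, k=1):
    """k-nearest-neighbor matching with replacement on a scalar covariate.

    treated, controls: lists of (x, y) pairs.
    Returns (att, number of distinct controls used).
    """
    used = set()
    effects = []
    for x_t, y_t in treated:
        # Rank controls by distance to the treated unit; ties keep list order.
        order = sorted(range(len(controls)), key=lambda j: abs(controls[j][0] - x_t))
        neighbors = order[:k]
        used.update(neighbors)               # a control may be reused
        y0_hat = sum(controls[j][1] for j in neighbors) / len(neighbors)
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects), len(used)


# Toy data (hypothetical): the first control is the nearest neighbor twice.
treated = [(1.0, 5.0), (1.2, 6.0), (4.0, 7.0)]
controls = [(1.1, 4.0), (3.0, 5.0), (9.0, 1.0)]

att, n_used = nn_matching_att(treated, controls, k=1)
print(f"ATT = {att:.4f}, controls used = {n_used}, unused = {len(controls) - n_used}")
```

Every treated unit always finds a match here, however distant, which is why the nearest-neighbor output reports no unmatched treated observations and no bandwidth.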

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching    Number of obs  = 1,853
                                                   Neighbors: min = 5
                                                              max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   457      0    457  |   870     526   1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                     |             AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |   .5590823   .2381752    2.35    0.019    .0922675    1.025897

Some Examples

Bias correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching    Number of obs  = 1,853
                                                   Neighbors: min = 5
                                                              max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   457      0    457  |   870     526   1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                     |             AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |   .5288023   .2420635    2.18    0.029    .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   439     18    457  |  1258     138   1396 |  83886

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   432     25    457  |  1103     293   1396 | 1.3013

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used  Unused  Total |  width
   Treated |   432     25    457  |  1105     291   1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013

Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

3

02

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48

Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49

Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

4

1112

1314

Wei

ghte

d M

ISE

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50

Some Examples Common-support diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 51

Some Examples make a graph of the common-support stats mat M = r(M)

coefplot matrix(M[4]) title(Std difference) noci nolabels xline(0)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 52

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    432    25     457    1104     291   1395  1.3392

Treatment-effects estimation

                  Coef.
wage    ATT    .6021049
        NATE   1.430823
hours   ATT     1263759
        NATE    1450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    432    25     457    1104     291   1395  1.3392

Treatment-effects estimation

                  Coef.
wage    ATT    .5152752
        NATE   1.430823
hours   ATT     1263759
        NATE    1450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                   Matched             Controls          Band-
              Yes    No   Total    Used  Unused  Total   width
0  Treated    306    15     321     625     120    745  1.3199
1  Treated    126    10     136     473     178    651  1.3398

Treatment-effects estimation

             Observed   Bootstrap                     Normal-based
wage            Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
0  ATT       .4586332    .2763358   1.66   0.097   -.082975   1.000241
1  ATT       .9518705     .406903   2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage            Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)          .4932373    .4452343   1.11   0.268   -.379406   1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
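The result slides of this simulation report variance, bias, mean squared error, and coverage of confidence intervals. A minimal Python sketch of how such summaries could be computed from a set of replications is given here; the function name, the input values, and the use of a fixed critical value of 1.96 are assumptions for illustration and are not taken from the talk.

```python
import statistics

def simulation_summary(estimates, std_errors, truth, z_crit=1.96):
    """Bias, variance, MSE and CI coverage across simulation replications."""
    bias = statistics.mean(estimates) - truth
    variance = statistics.pvariance(estimates)  # variance across replications
    mse = statistics.mean([(e - truth) ** 2 for e in estimates])
    # A replication "covers" if the truth lies inside estimate +/- z_crit * SE
    covered = [abs(e - truth) <= z_crit * s
               for e, s in zip(estimates, std_errors)]
    return {"bias": bias, "variance": variance, "mse": mse,
            "coverage": sum(covered) / len(covered)}

# Hypothetical replication results, for illustration only
est = [-4.2, -3.8, -4.1, -3.9]
ses = [0.2, 0.2, 0.05, 0.2]
out = simulation_summary(est, ses, truth=-3.96)
```

With the variance taken across replications, the mean squared error equals the squared bias plus the variance, which is why an estimator with some bias can still do well on MSE if its variance is small.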

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

[Figure: density of the propensity score in the population (computed from fully stratified data), for untreated and treated; x-axis "Propensity score" from 0 to 1, y-axis "Density" from 0 to 7. McFadden R2 = 0.121.]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels plotted against the propensity score (x-axis from 0 to 0.8): "Outcome" (30 to 55) for untreated and treated, and "Treatment effect" (-6 to -1).]
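The later result slides report bias reduction in percent. The talk does not spell out the formula, so the sketch below assumes the common definition: the share of the raw estimator's bias (NATE minus population ATT) that an estimator removes. The function name is hypothetical; only the two constants come from the slide above.

```python
def bias_reduction_pct(estimate, naive, truth):
    """Percent of the naive estimator's bias that an estimate removes."""
    naive_bias = naive - truth          # bias of the raw mean difference
    remaining_bias = estimate - truth   # bias left after matching
    return 100.0 * (1.0 - remaining_bias / naive_bias)

NATE = -4.79  # raw mean difference reported above
ATT = -3.96   # population ATT reported above

perfect = bias_reduction_pct(ATT, NATE, ATT)     # estimate equals the ATT
useless = bias_reduction_pct(NATE, NATE, ATT)    # estimate equals the NATE
overshoot = bias_reduction_pct(-3.80, NATE, ATT) # hypothetical over-correction
```

Under this definition an estimator that overshoots the true ATT shows a bias reduction above 100 percent.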

Results: Variance

[Figure: variance of each estimator in two panels (N = 500 and N = 5000), x-axis from 1.5 to 4.5. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for each estimator in two panels (N = 500, x-axis from 70 to 130; N = 5000, x-axis from 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of each estimator in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

Results: Relative standard error

[Figure: relative standard error in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each also with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals in two panels (N = 500 and N = 5000), with the same rows and series as the relative standard error figure.]

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched (ATT)
Means          Treated  Untreated    StdDif    Treated  Untreated    StdDif
collgrad       .321663    .224212   .219912    .319444    .319444         0
ttl_exp        13.2685    12.7323   .117584    13.3205    13.1425   .039036
tenure         7.89205    6.17658    .29735    7.91744    7.58347   .057888
3.industry     .006565    .012178  -.058246     .00463     .00463         0
4.industry     .183807    .166905   .044425    .185185    .185185         0
5.industry     .105033    .027937   .312944    .085648    .085648         0
6.industry     .045952    .169771  -.407129    .048611    .048611         0
7.industry     .019694    .102436  -.350657    .020833    .020833         0
8.industry     .017505    .035817  -.113785    .009259    .009259         0
9.industry     .010941    .040115  -.185669    .011574    .011574         0
10.industry    .004376    .008596  -.052551    .002315    .002315         0
11.industry    .479212    .356734   .250073    .506944    .506944         0
12.industry    .122538     .07235   .169707     .12037     .12037         0
2.race         .330416    .244986   .189418      .3125      .3125         0
3.race         .017505    .011461   .050566    .006944    .006944         0
south          .297593    .466332  -.352408    .291667    .291667         0

                          Raw                        Matched (ATT)
Variances      Treated  Untreated     Ratio    Treated  Untreated     Ratio
collgrad       .218674    .174066   1.25628    .217904    .217904         1
ttl_exp        20.5898    21.0001   .980459    19.8177    18.2323   1.08696
tenure         37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
3.industry     .006536    .012038   .542928    .004619    .004619         1
4.industry     .150351    .139148   1.08052    .151242    .151242         1
5.industry     .094207    .027176   3.46656    .078494    .078494         1
6.industry     .043936     .14105   .311496    .046355    .046355         1
7.industry     .019348    .092008   .210287    .020447    .020447         1
8.industry     .017237    .034559   .498769    .009195    .009195         1
9.industry     .010845    .038533   .281445    .011467    .011467         1
10.industry    .004367    .008528   .512039    .002315    .002315         1
11.industry    .250115    .229639   1.08917    .250532    .250532         1
12.industry    .107758    .067163   1.60443    .106127    .106127         1
2.race         .221726      .1851   1.19787    .215342    .215342         1
3.race         .017237    .011338   1.52025    .006912    .006912         1
south           .20949    .249045   .841173    .207077    .207077         1
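The StdDif and Ratio columns can be checked by hand. The following Python sketch assumes that the standardized difference is the difference in means divided by the square root of the average of the two raw group variances; that definition is an assumption, though it does reproduce the raw tenure row once the decimal points lost in the transcript are restored. The function names are illustrative and not part of kmatch.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference (assumed pooled-variance definition)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)

def var_ratio(var_t, var_c):
    """Ratio of the treated-group variance to the untreated-group variance."""
    return var_t / var_c

# Raw (unmatched) tenure row: means 7.89205 and 6.17658,
# variances 37.2044 and 29.3629
sd = std_diff(7.89205, 6.17658, 37.2044, 29.3629)
vr = var_ratio(37.2044, 29.3629)
```

The two results should land close to the reported raw values for tenure (.29735 and 1.26706), up to rounding of the inputs.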

Some Examples

Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: two-panel dot plot by covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), Raw versus Matched: "Std. mean difference" (x-axis from -.4 to .4) and "Variance ratio" (x-axis from 0 to 4).]

Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    431    26     457    1214     182   1396  .00188

Treatment-effects estimation

wage          Coef.
ATT        .3887224
NATE       1.432913
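To make the roles of "Kernel = epan" and the bandwidth concrete, here is a small Python sketch of Epanechnikov kernel weights for one treated unit. It is a generic textbook version written for illustration; the function names and the propensity scores are hypothetical, and the sketch does not claim to reproduce kmatch's internal computations.

```python
def epan(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, otherwise 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_weights(ps_treated, ps_controls, bandwidth):
    """Normalized kernel weights of the controls for one treated unit.

    Returns None when no control lies within the bandwidth, in which
    case the treated unit would remain unmatched.
    """
    raw = [epan((p - ps_treated) / bandwidth) for p in ps_controls]
    total = sum(raw)
    if total == 0.0:
        return None
    return [w / total for w in raw]

# Hypothetical propensity scores, for illustration only
controls = [0.20, 0.21, 0.23, 0.40]
weights = kernel_weights(0.22, controls, bandwidth=0.05)
unmatched = kernel_weights(0.90, controls, bandwidth=0.05)
```

Controls closer to the treated unit receive larger weights, controls outside the bandwidth receive none, and a treated unit with no control inside the bandwidth counts as unmatched, as in the "Matched: No" column above.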

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel densities of the propensity score for untreated and treated, in two panels (Raw, Matched (ATT)); x-axis "Propensity score", y-axis "Density".]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distributions of the propensity score for untreated and treated, in two panels (Raw, Matched (ATT)); x-axis "Propensity score", y-axis "Cumulative probability" from 0 to 1.]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, in two panels (Raw, Matched (ATT)); y-axis "Propensity score".]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched             Controls          Band-
             Yes    No   Total    Used  Unused  Total   width
Treated      432    25     457    1105     291   1396  1.3394
Untreated   1386    10    1396     455       2    457  3.3975
Combined    1818    35    1853    1560     293   1853

Treatment-effects estimation

          Observed   Bootstrap                     Normal-based
wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE       .4095729    .1920853   2.13   0.033    .0330928   .7860531
ATT       .6059013    .2472069   2.45   0.014    .1213846   1.090418
ATC       .3483797    .1893653   1.84   0.066   -.0227695   .7195289
NATE      1.432913    .2333282   6.14   0.000    .9755981   1.890228
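The idea behind the bootstrap standard errors shown above can be sketched in a few lines of Python: resample the data with replacement, recompute the estimator each time, and take the standard deviation of the results. The function name, the seed, and the data are hypothetical; in an actual application the estimator would be the complete matching procedure rather than a simple mean.

```python
import random
import statistics

def bootstrap_se(data, estimator, reps=50, seed=12345):
    """Standard deviation of the estimator across bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so that results are reproducible
    n = len(data)
    estimates = []
    for _ in range(reps):
        # Draw n observations with replacement from the original sample
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

# Hypothetical outcome values, for illustration only
sample = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5]
se = bootstrap_se(sample, statistics.mean, reps=50)
```

The default of 50 replications mirrors the number shown in the log above; more replications would normally be used for final results.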

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)      -.8270117    .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
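The z statistic, p-value, and normal-based confidence interval in the lincom output follow directly from the coefficient and its standard error. A short Python sketch using only the standard library (the function name is illustrative, and the inputs are the lincom values with decimal points restored):

```python
from statistics import NormalDist

def wald_summary(coef, se, level=0.95):
    """z statistic, two-sided p-value and normal-based confidence interval."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return z, p, (coef - crit * se, coef + crit * se)

# ATT - NATE from the lincom output above
z, p, (lo, hi) = wald_summary(-0.8270117, 0.1810415)
```

The standard error of the difference itself cannot be recovered from the separate ATT and NATE rows, because it also depends on the bootstrap covariance between the two estimates.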

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 1
                                                      max = 1
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    457     0     457     328    1068   1396

Treatment-effects estimation

wage          Coef.
ATT        .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min   = 1
Distance metric: Mahalanobis                            max   = 1

                                  AI Robust
wage                     Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969     .2942952   2.46   0.014    .147889   1.301505
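As a rough illustration of what nearest-neighbor matching with replacement does, the Python sketch below matches each treated unit to its k closest controls on a single covariate and averages the outcome differences. This is a deliberate simplification: the examples above use the Mahalanobis distance over many covariates, and the handling of ties is ignored here. The function name and data are hypothetical.

```python
def nn_att(treated, controls, k=1):
    """ATT from k-nearest-neighbor matching with replacement on one covariate.

    treated, controls: lists of (x, y) pairs (covariate, outcome).
    """
    effects = []
    for x_t, y_t in treated:
        # The k controls closest to this treated unit (controls can be reused)
        nearest = sorted(controls, key=lambda c: abs(c[0] - x_t))[:k]
        y_hat = sum(y for _, y in nearest) / len(nearest)
        effects.append(y_t - y_hat)
    return sum(effects) / len(effects)

# Hypothetical (covariate, outcome) data, for illustration only
treated = [(1.0, 5.0), (2.0, 7.0)]
controls = [(0.9, 4.0), (2.2, 6.0), (1.4, 5.5)]
att_1 = nn_att(treated, controls, k=1)
att_2 = nn_att(treated, controls, k=2)
```

Increasing k brings more controls into the estimate, which is the same pattern as in the matching statistics of the 1-neighbor and 5-neighbor examples (328 versus 870 controls used).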

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 5
                                                      max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    457     0     457     870     526   1396

Treatment-effects estimation

wage          Coef.
ATT        .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                                  AI Robust
wage                     Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823     .2381752   2.35   0.019   .0922675   1.025897

Some Examples

Bias correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 5
                                                      max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    457     0     457     870     526   1396

Treatment-effects estimation

wage          Coef.
ATT        .5288023

wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                                  AI Robust
wage                     Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023     .2420635   2.18   0.029   .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    439    18     457    1258     138   1396   83886

Treatment-effects estimation

wage          Coef.
ATT        .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    432    25     457    1103     293   1396  1.3013

Treatment-effects estimation

wage          Coef.
ATT        .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    432    25     457    1105     291   1396  1.3394

Treatment-effects estimation

wage          Coef.
ATT        .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    448     9     457    1184     212   1396  1.8888

Treatment-effects estimation

wage          Coef.
ATT        .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of "MSE" against "Bandwidth" (x-axis from 1.5 to 3); each evaluated bandwidth is labelled with its search index.]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    453     4     457    1289     107   1396   2.433

Treatment-effects estimation

wage          Coef.
ATT        .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of "MISE" against "Bandwidth" (x-axis from 1.5 to 3); each evaluated bandwidth is labelled with its search index.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    455     2     457    1356      40   1396  2.7626

Treatment-effects estimation

wage          Coef.
ATT        .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of "Weighted MISE" against "Bandwidth" (x-axis from 1 to 5); each evaluated bandwidth is labelled with its search index.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched             Controls          Band-
           Yes    No   Total    Used  Unused  Total   width
Treated    366    91     457     701     695   1396      .5

Treatment-effects estimation

wage          Coef.
ATT        .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                Common support (treated)          Standardized difference
Means          Matched  Unmatched     Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404    .318681   .321663    .001585   -.006376    .007962
ttl_exp        13.3929    12.7682   13.2685    .027413   -.110253    .137666
tenure         8.12614    6.95055   7.89205    .038378   -.154356    .192734
3.industry     .002732    .021978   .006565   -.047404    .190657   -.238061
4.industry     .191257    .153846   .183807    .019212   -.077269    .096481
5.industry     .062842    .274725   .105033   -.137462    .552867   -.690329
6.industry     .057377          0   .045952    .054507   -.219225    .273732
7.industry     .019126    .021978   .019694   -.004083    .016423   -.020506
8.industry     .005464    .065934   .017505   -.091714    .368871   -.460585
9.industry     .010929    .010989   .010941   -.000115    .000462   -.000577
10.industry          0    .021978   .004376   -.066227    .266363   -.332589
11.industry    .554645    .175824   .479212     .15083   -.606636    .757467
12.industry    .092896    .241758   .122538   -.090299    .363181    -.45348
2.race         .243169    .681319   .330416   -.185284    .745209   -.930494
3.race         .002732    .076923   .017505   -.112525    .452572   -.565097
south           .29235    .318681   .297593   -.011456    .046074    -.05753

                Common support (treated)                   Ratio
Variances      Matched  Unmatched     Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058    .219536   .218674    1.00176    1.00394    .997824
ttl_exp        19.4198    25.2474   20.5898    .943177    1.22621     .76918
tenure         38.3324    31.9242   37.2044    1.03032    .858076    1.20073
3.industry     .002732    .021734   .006536    .418045    3.32537    .125714
4.industry     .155101    .131624   .150351    1.03159    .875443    1.17837
5.industry     .059054    .201465   .094207    .626851    2.13854    .293122
6.industry     .054233          0   .043936    1.23435          0          .
7.industry     .018811    .021734   .019348    .972252     1.1233    .865531
8.industry      .00545    .062271   .017237    .316157    3.61269    .087513
9.industry     .010839    .010989   .010845    .999464    1.01328    .986361
10.industry          0    .021734   .004367          0    4.97709          0
11.industry    .247691     .14652   .250115    .990307    .585811    1.69049
12.industry    .084497    .185348   .107758    .784137    1.72003    .455885
2.race         .184542    .219536   .221726    .832297    .990121    .840601
3.race         .002732    .071795   .017237    .158513    4.16522    .038056
south          .207448    .219536    .20949    .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
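The common-support idea behind these diagnostics can be sketched in Python under a simplifying assumption: a treated unit counts as matched if at least one control lies within the bandwidth. The sketch uses a single covariate and absolute distance instead of the Mahalanobis distance of the example; function names and data are hypothetical.

```python
def on_support(x_treated, x_controls, bandwidth):
    """True if at least one control lies strictly within the bandwidth."""
    return any(abs(x_c - x_treated) < bandwidth for x_c in x_controls)

def support_counts(xs_treated, x_controls, bandwidth):
    """Number of matched and unmatched treated units for a given bandwidth."""
    matched = sum(on_support(x, x_controls, bandwidth) for x in xs_treated)
    return matched, len(xs_treated) - matched

# Hypothetical covariate values, for illustration only
treated_x = [0.0, 1.0, 2.0, 6.0]
control_x = [0.3, 1.8, 3.0]
narrow = support_counts(treated_x, control_x, bandwidth=0.5)
wide = support_counts(treated_x, control_x, bandwidth=1.5)
```

A smaller bandwidth leaves more treated units unmatched, which is the pattern in the example above: with bwidth(0.5), 91 treated observations are off support, compared with 25 under the default bandwidth.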

Some Examples make a graph of the common-support stats mat M = r(M)

coefplot matrix(M[4]) title(Std difference) noci nolabels xline(0)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 52

Some Examples

Multiple outcome variables

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 53

Some Examples Multiple outcome variables with different regression-adjustment equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 54

Some Examples Treatment effects by subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2763358 166 0097 -082975 1000241

1ATT 9518705 406903 234 0019 1543553 1749386

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 123Prob gt chi2 = 02679

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4452343 111 0268 -379406 1365881

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 55

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 1000 or 5000) from populationand compute various matching estimators

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 56

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 57

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 58

Results: Variance

[Figure: variance of the ATT estimates for N = 500 and N = 5000, by matching algorithm (nearest-neighbor matching with 1 and 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, by matching algorithm (nearest-neighbor matching with 1 and 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the ATT estimates for N = 500 and N = 5000, by matching algorithm (nearest-neighbor matching with 1 and 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction.]
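The mean squared error combines the two preceding criteria. The slides do not spell out the definition; the standard decomposition is given here as a reminder, with \tau denoting the population ATT and \hat\tau a matching estimator (both symbols are introduced for illustration and are not taken from the slides):

```latex
\mathrm{MSE}(\hat\tau)
  = \mathrm{E}\!\left[(\hat\tau - \tau)^2\right]
  = \mathrm{Var}(\hat\tau) + \left(\mathrm{E}[\hat\tau] - \tau\right)^2
```

Under this reading, an estimator with a small variance can still have a large MSE if its bias does not shrink with the sample size.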

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000, for nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y; MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, for nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y; MDM and PSM, each with and without bias correction.]

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:

- MDM leaves less scope for post-matching modeling decision biases.

- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).

- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:

- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).

- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:

- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.

- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:

- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Some Examples

    . // make a graph of the balancing stats
    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
    >     bylabels("Std. mean difference" "Variance ratio")
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" and "Variance ratio", showing raw and matched values for collgrad, ttl_exp, tenure, the industry indicators (3 to 12), the race indicators (2 and 3), and south.]

Some Examples

Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching          Number of obs =  1,853
                                              Kernel        =   epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   431     26     457 |   1214     182    1396  |  .00188

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .3887224
            NATE |   1.432913

Some Examples

Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure, two panels (Raw; Matched (ATT)): kernel density of the propensity score for untreated and treated.]

Some Examples

Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure, two panels (Raw; Matched (ATT)): cumulative probability of the propensity score for untreated and treated.]

Some Examples

Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure, two panels (Raw; Matched (ATT)): box plots of the propensity score for untreated and treated.]

Some Examples

Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching     Number of obs =  1,853
                                              Replications  =     50
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   432     25     457 |   1105     291    1396  |  1.3394
     Untreated |  1386     10    1396 |    455       2     457  |  3.3975
      Combined |  1818     35    1853 |   1560     293    1853  |       .

    Treatment-effects estimation

                 |   Observed   Bootstrap                        Normal-based
            wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    -------------+---------------------------------------------------------------
             ATE |   .4095729   .1920853    2.13   0.033    .0330928    .7860531
             ATT |   .6059013   .2472069    2.45   0.014    .1213846    1.090418
             ATC |   .3483797   .1893653    1.84   0.066   -.0227695    .7195289
            NATE |   1.432913   .2333282    6.14   0.000    .9755981    1.890228

Some Examples

Do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

            wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    -------------+---------------------------------------------------------------
             (1) |  -.8270117   .1810415   -4.57   0.000   -1.181847   -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching

                                              Number of obs  =  1,853
                                              Neighbors: min =      1
    Treatment   : union = 1                              max =      1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   457      0     457 |    328    1068    1396  |       .

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation                 Number of obs      =  1,853
    Estimator      : nearest-neighbor matching   Matches: requested =      1
    Outcome model  : matching                                   min =      1
    Distance metric: Mahalanobis                                max =      1

                         |             AI Robust
                    wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ---------------------+---------------------------------------------------------------
    ATET                 |
                   union |
     (union vs nonunion) |   .7246969   .2942952    2.46   0.014     .147889    1.301505
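The two commands above return the same point estimate, which suggests that kmatch with the nn option and teffects nnmatch implement the same one-nearest-neighbor Mahalanobis estimator here. A minimal sketch of how one might check this programmatically. It assumes that the example data are Stata's nlsw88 (the slides shown here do not name the file) and that the community-contributed kmatch package is installed; the matrix names are hypothetical.

```stata
* Assumption: the examples use Stata's shipped nlsw88 data
sysuse nlsw88, clear

* Mahalanobis-distance 1-nearest-neighbor matching with kmatch (user-written)
kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn
matrix b_kmatch = e(b)

* Same specification with official Stata's teffects
teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet
matrix b_teffects = e(b)

* Difference between the two ATT point estimates
display b_kmatch[1,1] - b_teffects[1,1]
```

Only the point estimates are compared; teffects additionally reports Abadie-Imbens robust standard errors, which kmatch does not display unless a vce() option is given.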

Some Examples

Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                              Number of obs  =  1,853
                                              Neighbors: min =      5
    Treatment   : union = 1                              max =      5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   457      0     457 |    870     526    1396  |       .

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation                 Number of obs      =  1,853
    Estimator      : nearest-neighbor matching   Matches: requested =      5
    Outcome model  : matching                                   min =      5
    Distance metric: Mahalanobis                                max =      6

                         |             AI Robust
                    wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ---------------------+---------------------------------------------------------------
    ATET                 |
                   union |
     (union vs nonunion) |   .5590823   .2381752    2.35   0.019    .0922675    1.025897

Some Examples

Bias-correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                              Number of obs  =  1,853
                                              Neighbors: min =      5
    Treatment   : union = 1                              max =      5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   457      0     457 |    870     526    1396  |       .

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .5288023

    adjusted for collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation                 Number of obs      =  1,853
    Estimator      : nearest-neighbor matching   Matches: requested =      5
    Outcome model  : matching                                   min =      5
    Distance metric: Mahalanobis                                max =      6

                         |             AI Robust
                    wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ---------------------+---------------------------------------------------------------
    ATET                 |
                   union |
     (union vs nonunion) |   .5288023   .2420635    2.18   0.029    .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs =  1,853
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   439     18     457 |   1258     138    1396  |  .83886

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .6408443

Some Examples

Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs =  1,853
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   432     25     457 |   1103     293    1396  |  1.3013

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs =  1,853
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   432     25     457 |   1105     291    1396  |  1.3394

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs =  1,853
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   448      9     457 |   1184     212    1396  |  1.8888

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; the evaluated points are labeled by index.]

Some Examples

Bandwidth selection: cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs =  1,853
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   453      4     457 |   1289     107    1396  |   2.433

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; the evaluated points are labeled by index.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs =  1,853
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   455      2     457 |   1356      40    1396  |  2.7626

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; the evaluated points are labeled by index.]

Some Examples

Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(0.5)

    Multivariate-distance kernel matching     Number of obs =  1,853
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   366     91     457 |    701     695    1396  |      .5

    Treatment-effects estimation

            wage |      Coef.
    -------------+-----------
             ATT |   .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

    Common support (treated)

                 |   Matched  Unmatched      Total |   Standardized difference
           Means |       (1)        (2)        (3) |  (1)-(3)   (2)-(3)   (1)-(2)
    -------------+---------------------------------+-----------------------------
        collgrad |   .322404    .318681    .321663 |  .001585  -.006376   .007962
         ttl_exp |   13.3929    12.7682    13.2685 |  .027413  -.110253   .137666
          tenure |   8.12614    6.95055    7.89205 |  .038378  -.154356   .192734
      3.industry |   .002732    .021978    .006565 | -.047404   .190657  -.238061
      4.industry |   .191257    .153846    .183807 |  .019212  -.077269   .096481
      5.industry |   .062842    .274725    .105033 | -.137462   .552867  -.690329
      6.industry |   .057377          0    .045952 |  .054507  -.219225   .273732
      7.industry |   .019126    .021978    .019694 | -.004083   .016423  -.020506
      8.industry |   .005464    .065934    .017505 | -.091714   .368871  -.460585
      9.industry |   .010929    .010989    .010941 | -.000115   .000462  -.000577
     10.industry |         0    .021978    .004376 | -.066227   .266363  -.332589
     11.industry |   .554645    .175824    .479212 |   .15083  -.606636   .757467
     12.industry |   .092896    .241758    .122538 | -.090299   .363181   -.45348
          2.race |   .243169    .681319    .330416 | -.185284   .745209  -.930494
          3.race |   .002732    .076923    .017505 | -.112525   .452572  -.565097
           south |    .29235    .318681    .297593 | -.011456   .046074   -.05753

    Common support (treated)

                 |   Matched  Unmatched      Total |            Ratio
       Variances |       (1)        (2)        (3) |  (1)/(3)   (2)/(3)   (1)/(2)
    -------------+---------------------------------+-----------------------------
        collgrad |   .219058    .219536    .218674 |  1.00176   1.00394   .997824
         ttl_exp |   19.4198    25.2474    20.5898 |  .943177   1.22621    .76918
          tenure |   38.3324    31.9242    37.2044 |  1.03032   .858076   1.20073
      3.industry |   .002732    .021734    .006536 |  .418045   3.32537   .125714
      4.industry |   .155101    .131624    .150351 |  1.03159   .875443   1.17837
      5.industry |   .059054    .201465    .094207 |  .626851   2.13854   .293122
      6.industry |   .054233          0    .043936 |  1.23435         0         .
      7.industry |   .018811    .021734    .019348 |  .972252    1.1233   .865531
      8.industry |    .00545    .062271    .017237 |  .316157   3.61269   .087513
      9.industry |   .010839    .010989    .010845 |  .999464   1.01328   .986361
     10.industry |         0    .021734    .004367 |        0   4.97709         0
     11.industry |   .247691     .14652    .250115 |  .990307   .585811   1.69049
     12.industry |   .084497    .185348    .107758 |  .784137   1.72003   .455885
          2.race |   .184542    .219536    .221726 |  .832297   .990121   .840601
          3.race |   .002732    .071795    .017237 |  .158513   4.16522   .038056
           south |   .207448    .219536     .20949 |  .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total
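The two csummarize tables are connected: judging from the printed numbers, each standardized difference is a difference in means divided by the standard deviation of the total treated group, and each ratio is a ratio of the corresponding variances. A small arithmetic check in Stata, reading the printed digits for collgrad and ttl_exp with their decimal points restored (this reading of the standardization is inferred from the output, not taken from the kmatch documentation):

```stata
* collgrad: (mean matched - mean total) / sd total, with sd = sqrt(variance total)
display (.322404 - .321663) / sqrt(.218674)

* ttl_exp: same standardized difference, matched vs. total
display (13.3929 - 13.2685) / sqrt(20.5898)

* ttl_exp: variance ratio matched / total
display 19.4198 / 20.5898
```

Up to rounding of the printed inputs, the results correspond to the table entries .001585, .027413, and .943177.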

Some Examples

    . // make a graph of the common-support stats
    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" for collgrad, ttl_exp, tenure, the industry indicators (3 to 12), the race indicators (2 and 3), and south.]

Some Examples

Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs =  1,852
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   432     25     457 |   1104     291    1395  |  1.3392

    Treatment-effects estimation

                 |      Coef.
    -------------+-----------
    wage         |
             ATT |   .6021049
            NATE |   1.430823
    -------------+-----------
    hours        |
             ATT |   1.263759
            NATE |   1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure)
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs =  1,852
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
       Treated |   432     25     457 |   1104     291    1395  |  1.3392

    Treatment-effects estimation

                 |      Coef.
    -------------+-----------
    wage         |
             ATT |   .5152752
            NATE |   1.430823
    -------------+-----------
    hours        |
             ATT |   1.263759
            NATE |   1.450303

    wage: adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching     Number of obs =  1,853
                                              Replications  =     50
                                              Kernel        =   epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics

               |       Matched        |        Controls         |   Band-
               |   Yes     No   Total |   Used  Unused   Total  |   width
    -----------+----------------------+-------------------------+---------
    0  Treated |   306     15     321 |    625     120     745  |  1.3199
    1  Treated |   126     10     136 |    473     178     651  |  1.3398

    Treatment-effects estimation

                 |   Observed   Bootstrap                        Normal-based
            wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    -------------+---------------------------------------------------------------
    0        ATT |   .4586332   .2763358    1.66   0.097    -.082975    1.000241
    1        ATT |   .9518705    .406903    2.34   0.019    .1543553    1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

               chi2(  1) =    1.23
             Prob > chi2 =    0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

            wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    -------------+---------------------------------------------------------------
             (1) |   .4932373   .4452343    1.11   0.268    -.379406    1.365881
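The test and lincom results above are two views of the same contrast: for a single linear restriction, the chi2(1) statistic is the square of the z statistic, and both give the same p-value. A quick arithmetic check for a do-file, using only Stata's built-in display and chi2tail() and reading the printed digits as .9518705, .4586332, .4932373, and .4452343:

```stata
* Difference between the subgroup ATTs (south = 1 minus south = 0)
display .9518705 - .4586332

* z statistic of the difference: estimate divided by its bootstrap standard error
display .4932373 / .4452343

* Squared z statistic, i.e. the chi2(1) statistic of the Wald test
display (.4932373 / .4452343)^2

* p-value of the chi2(1) test
display chi2tail(1, (.4932373 / .4452343)^2)
```

After rounding, these values correspond to the reported coefficient .4932373, z = 1.11, chi2(1) = 1.23, and a p-value of about 0.268.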


  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 49: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Some Examples: Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching        Number of obs = 1,853
                                        Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      431    26    457     1214     182   1396  .00188

Treatment-effects estimation

wage        Coef.
ATT      .3887224
NATE     1.432913

Ben Jann (University of Bern), Propensity Scores Matching, Berlin, 23.06.2017
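To make the estimator behind this output concrete, here is a minimal sketch of kernel matching on a propensity score, written in Python with NumPy rather than Stata. It assumes the propensity scores have already been estimated (kmatch fits a logit for this) and takes the bandwidth as given. The function names `epanechnikov` and `kernel_att` and the toy data are illustrative assumptions, not part of kmatch, and the sketch does not reproduce kmatch's bandwidth selection.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_att(y, d, ps, bw):
    """ATT by kernel matching on a given propensity score (hypothetical helper).

    Returns (att, n_matched, n_unmatched). Treated units without any control
    inside the bandwidth stay unmatched and are excluded from the ATT.
    """
    y, d, ps = (np.asarray(a, dtype=float) for a in (y, d, ps))
    treated, control = d == 1, d == 0
    # weights[i, j]: kernel weight of control j for treated unit i
    weights = epanechnikov((ps[treated][:, None] - ps[control][None, :]) / bw)
    total = weights.sum(axis=1)
    matched = total > 0
    # Counterfactual outcome: kernel-weighted mean of control outcomes
    y0_hat = weights[matched] @ y[control] / total[matched]
    att = float(np.mean(y[treated][matched] - y0_hat))
    return att, int(matched.sum()), int((~matched).sum())

# Toy data (hypothetical): two treated units, three controls.
y = [5.0, 7.0, 1.0, 2.0, 3.0]
d = [1, 1, 0, 0, 0]
ps = [0.50, 0.90, 0.50, 0.52, 0.10]
att, n_matched, n_unmatched = kernel_att(y, d, ps, bw=0.05)
print(att, n_matched, n_unmatched)
```

The second treated unit has no control within the bandwidth, which is the situation the "Matched: No" column of the matching statistics counts.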

Some Examples: Kernel density balancing plot

. kmatch density, lw(6 2) lc(5 1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated observations. Panels: Raw, Matched (ATT). x-axis: Propensity score. y-axis: Density.]

Some Examples: Cumulative distribution balancing plot

. kmatch cumul, lw(6 2) lc(5 1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated observations. Panels: Raw, Matched (ATT). x-axis: Propensity score. y-axis: Cumulative probability.]

Some Examples: Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated observations. Panels: Raw, Matched (ATT). y-axis: Propensity score.]

Some Examples: Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching   Number of obs = 1,853
                                        Replications  = 50
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      432    25    457     1105     291   1396  1.3394
Untreated   1386    10   1396      455       2    457  3.3975
Combined    1818    35   1853     1560     293   1853

Treatment-effects estimation

         Observed  Bootstrap                    Normal-based
wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE      .4095729   .1920853   2.13   0.033    .0330928   .7860531
ATT      .6059013   .2472069   2.45   0.014    .1213846   1.090418
ATC      .3483797   .1893653   1.84   0.066   -.0227695   .7195289
NATE     1.432913   .2333282   6.14   0.000    .9755981   1.890228
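The idea behind vce(bootstrap) can be sketched in a few lines. The Python code below (not Stata, and not kmatch's implementation) resamples observations with replacement, recomputes an estimator on each resample, and takes the standard deviation of the replicates as the standard error. The names `bootstrap_se` and `mean_difference`, the synthetic data, and the use of the naive mean difference as the estimator are assumptions made for illustration; the default of 50 replications mirrors the output above.

```python
import numpy as np

def bootstrap_se(estimator, data, reps=50, seed=0):
    """Bootstrap standard error of estimator(data); rows of data are observations."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    draws = np.empty(reps)
    for r in range(reps):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        draws[r] = estimator(data[idx])   # re-estimate on the resample
    return float(draws.std(ddof=1))

def mean_difference(data):
    """Naive treated-minus-control difference in mean outcomes (column 0 = y, column 1 = d)."""
    y, d = data[:, 0], data[:, 1]
    return y[d == 1].mean() - y[d == 0].mean()

# Synthetic example (hypothetical data): 200 treated, 200 controls, true difference 1.
rng = np.random.default_rng(1)
d = np.tile([0, 1], 200)
y = 1.0 * d + rng.normal(size=400)
data = np.column_stack([y, d])
se = bootstrap_se(mean_difference, data, reps=50)
print(se)
```

For a matching estimator, the whole procedure (including any propensity-score model) would be rerun inside `estimator` on every resample.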

Some Examples: Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
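The lincom line can be checked by hand. The Python sketch below (standard library only) takes the estimate as -.8270117 and its standard error as .1810415, which is how the printed digits are read here, and recomputes the z statistic, the two-sided p-value, and the normal-based 95% confidence interval. The helper name `wald_summary` is an assumption for illustration.

```python
from statistics import NormalDist

def wald_summary(coef, se, level=0.95):
    """z statistic, two-sided p-value, normal-based CI and Wald chi2 for one contrast."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    return {
        "z": z,
        "p": p,
        "ci": (coef - crit * se, coef + crit * se),
        "chi2": z * z,  # a one-restriction Wald test is the squared z statistic
    }

# Values read from the lincom output for ATT - NATE
res = wald_summary(-0.8270117, 0.1810415)
print(res)
```

The same relation links `test` and `lincom` for a single restriction: the chi2 statistic is the square of the z statistic.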

Some Examples: Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                        Number of obs  = 1,853
                                        Neighbors: min = 1
                                                   max = 1
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      457     0    457      328    1068   1396

Treatment-effects estimation

wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation            Number of obs      = 1,853
Estimator      : nearest-neighbor matching    Matches: requested = 1
Outcome model  : matching                              min = 1
Distance metric: Mahalanobis                           max = 1

                                  AI Robust
wage                     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969   .2942952   2.46   0.014     .147889   1.301505
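A minimal Python sketch of nearest-neighbor matching on the Mahalanobis distance may help to see what both commands compute. It matches with replacement, uses the covariance matrix of the pooled sample as the scaling matrix (an assumption), and ignores ties, which teffects nnmatch and kmatch handle explicitly. The function name `mahalanobis_nn_att` and the toy data are hypothetical.

```python
import numpy as np

def mahalanobis_nn_att(y, d, X, k=1):
    """ATT from k-nearest-neighbor matching (with replacement) on Mahalanobis distance."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    # Scaling matrix: inverse covariance of the covariates in the pooled sample
    S_inv = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    Xt, Xc = X[d == 1], X[d == 0]
    yt, yc = y[d == 1], y[d == 0]
    diff = Xt[:, None, :] - Xc[None, :, :]
    # dist2[i, j]: squared Mahalanobis distance between treated i and control j
    dist2 = np.einsum("ijk,kl,ijl->ij", diff, S_inv, diff)
    nearest = np.argsort(dist2, axis=1)[:, :k]  # indices of the k closest controls
    y0_hat = yc[nearest].mean(axis=1)           # imputed untreated outcome
    return float(np.mean(yt - y0_hat))

# Toy data (hypothetical): one covariate, two treated units, four controls.
y = [10.0, 20.0, 7.0, 0.0, 15.0, 0.0]
d = [1, 1, 0, 0, 0, 0]
X = [1.0, 5.0, 0.9, 2.0, 5.2, 9.0]
att_1 = mahalanobis_nn_att(y, d, X, k=1)
att_2 = mahalanobis_nn_att(y, d, X, k=2)
print(att_1, att_2)
```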

Some Examples: Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                        Number of obs  = 1,853
                                        Neighbors: min = 5
                                                   max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      457     0    457      870     526   1396

Treatment-effects estimation

wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation            Number of obs      = 1,853
Estimator      : nearest-neighbor matching    Matches: requested = 5
Outcome model  : matching                              min = 5
Distance metric: Mahalanobis                           max = 6

                                  AI Robust
wage                     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823   .2381752   2.35   0.019    .0922675   1.025897

Some Examples: Bias correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                        Number of obs  = 1,853
                                        Neighbors: min = 5
                                                   max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      457     0    457      870     526   1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation            Number of obs      = 1,853
Estimator      : nearest-neighbor matching    Matches: requested = 5
Outcome model  : matching                              min = 5
Distance metric: Mahalanobis                           max = 6

                                  AI Robust
wage                     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023   .2420635   2.18   0.029    .0543666   1.003238
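The logic of the bias correction can be sketched as follows, again in Python rather than Stata. Each matched control outcome is shifted by the difference in regression predictions between the treated unit and its match, so that remaining covariate discrepancies are adjusted away. The sketch fits one unweighted OLS regression on all controls, which is a simplifying assumption and not necessarily what kmatch or teffects do internally; `bias_corrected_att` and the toy data are hypothetical.

```python
import numpy as np

def bias_corrected_att(y, d, X, k=1):
    """k-NN Mahalanobis matching ATT with a linear regression bias correction."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    Xt, Xc, yt, yc = X[d == 1], X[d == 0], y[d == 1], y[d == 0]
    S_inv = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    diff = Xt[:, None, :] - Xc[None, :, :]
    dist2 = np.einsum("ijk,kl,ijl->ij", diff, S_inv, diff)
    nearest = np.argsort(dist2, axis=1)[:, :k]
    # Outcome regression among controls (assumption: plain OLS with intercept)
    A_c = np.column_stack([np.ones(len(yc)), Xc])
    beta, *_ = np.linalg.lstsq(A_c, yc, rcond=None)
    mu0_c = A_c @ beta
    mu0_t = np.column_stack([np.ones(len(yt)), Xt]) @ beta
    # Shift each matched control outcome by the predicted covariate gap
    y0_hat = (yc[nearest] + mu0_t[:, None] - mu0_c[nearest]).mean(axis=1)
    return float(np.mean(yt - y0_hat))

# Toy data (hypothetical): untreated outcome is exactly 2 + 3x, treatment adds 5.
x_c = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x_t = np.array([0.5, 3.7])
y = np.concatenate([2 + 3 * x_t + 5, 2 + 3 * x_c])
d = np.array([1, 1, 0, 0, 0, 0, 0])
X = np.concatenate([x_t, x_c])
att_bc = bias_corrected_att(y, d, X, k=1)
print(att_bc)
```

In this toy case the matches are inexact, yet the corrected estimate recovers the treatment effect because the outcome model is exactly linear.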

Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching   Number of obs = 1,853
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      439    18    457     1258     138   1396  .83886

Treatment-effects estimation

wage        Coef.
ATT      .6408443

Some Examples: Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching   Number of obs = 1,853
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      432    25    457     1103     293   1396  1.3013

Treatment-effects estimation

wage        Coef.
ATT      .6047374

Some Examples: Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching   Number of obs = 1,853
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      432    25    457     1105     291   1396  1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013

Some Examples: Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching   Number of obs = 1,853
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      448     9    457     1184     212   1396  1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation trace, with points labeled by search step. x-axis: Bandwidth (about 1.5 to 3). y-axis: MSE.]

Some Examples: Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching   Number of obs = 1,853
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      453     4    457     1289     107   1396   2.433

Treatment-effects estimation

wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation trace, with points labeled by search step. x-axis: Bandwidth (about 1.5 to 3). y-axis: MISE.]
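Cross-validation with respect to Y picks the bandwidth that best predicts outcomes out of sample. The exact criterion kmatch minimizes is not spelled out on the slide, so the Python sketch below shows only the generic idea under stated assumptions: a leave-one-out Nadaraya-Watson prediction with an Epanechnikov kernel, a single covariate, and a fixed grid of candidate bandwidths instead of an iterative search. The name `loo_cv_bandwidth` and the synthetic data are hypothetical.

```python
import numpy as np

def loo_cv_bandwidth(x, y, grid):
    """Pick the bandwidth minimizing leave-one-out squared prediction error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.asarray(grid, dtype=float)
    scores = np.empty(len(grid))
    for g, h in enumerate(grid):
        u = (x[:, None] - x[None, :]) / h
        w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
        np.fill_diagonal(w, 0.0)  # leave the observation itself out
        total = w.sum(axis=1)
        if np.any(total <= 0):
            # Some observation has no neighbor inside the bandwidth: reject h
            scores[g] = np.inf
            continue
        pred = (w @ y) / total
        scores[g] = np.mean((y - pred) ** 2)
    return float(grid[int(np.argmin(scores))]), scores

# Synthetic example (hypothetical): smooth noiseless outcome on an even grid.
x = np.linspace(0.0, 1.0, 101)
y = np.sin(2.0 * np.pi * x)
best, scores = loo_cv_bandwidth(x, y, [0.005, 0.02, 0.1, 0.5])
print(best, scores)
```

With noisy outcomes the criterion trades off smoothing bias against variance, which is why the selected bandwidth usually lies in the interior of the search range.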

Some Examples: Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching   Number of obs = 1,853
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      455     2    457     1356      40   1396  2.7626

Treatment-effects estimation

wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation trace, with points labeled by search step. x-axis: Bandwidth (about 1 to 5). y-axis: Weighted MISE.]

Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching   Number of obs = 1,853
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      366    91    457      701     695   1396      .5

Treatment-effects estimation

wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                          Standardized difference
Means          Matched  Unmatc~d     Total     (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404   .318681   .321663     .001585   -.006376    .007962
ttl_exp        13.3929   12.7682   13.2685     .027413   -.110253    .137666
tenure         8.12614   6.95055   7.89205     .038378   -.154356    .192734
3.industry     .002732   .021978   .006565    -.047404    .190657   -.238061
4.industry     .191257   .153846   .183807     .019212   -.077269    .096481
5.industry     .062842   .274725   .105033    -.137462    .552867   -.690329
6.industry     .057377         0   .045952     .054507   -.219225    .273732
7.industry     .019126   .021978   .019694    -.004083    .016423   -.020506
8.industry     .005464   .065934   .017505    -.091714    .368871   -.460585
9.industry     .010929   .010989   .010941    -.000115    .000462   -.000577
10.industry          0   .021978   .004376    -.066227    .266363   -.332589
11.industry    .554645   .175824   .479212      .15083   -.606636    .757467
12.industry    .092896   .241758   .122538    -.090299    .363181    -.45348
2.race         .243169   .681319   .330416    -.185284    .745209   -.930494
3.race         .002732   .076923   .017505    -.112525    .452572   -.565097
south           .29235   .318681   .297593    -.011456    .046074    -.05753

Common support (treated)                          Ratio
Variances      Matched  Unmatc~d     Total     (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058   .219536   .218674     1.00176    1.00394    .997824
ttl_exp        19.4198   25.2474   20.5898     .943177    1.22621     .76918
tenure         38.3324   31.9242   37.2044     1.03032    .858076    1.20073
3.industry     .002732   .021734   .006536     .418045    3.32537    .125714
4.industry     .155101   .131624   .150351     1.03159    .875443    1.17837
5.industry     .059054   .201465   .094207     .626851    2.13854    .293122
6.industry     .054233         0   .043936     1.23435          0          .
7.industry     .018811   .021734   .019348     .972252     1.1233    .865531
8.industry      .00545   .062271   .017237     .316157    3.61269    .087513
9.industry     .010839   .010989   .010845     .999464    1.01328    .986361
10.industry          0   .021734   .004367           0    4.97709          0
11.industry    .247691    .14652   .250115     .990307    .585811    1.69049
12.industry    .084497   .185348   .107758     .784137    1.72003    .455885
2.race         .184542   .219536   .221726     .832297    .990121    .840601
3.race         .002732   .071795   .017237     .158513    4.16522    .038056
south          .207448   .219536    .20949     .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
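The two diagnostics in these tables are simple functions of the group means and variances. The Python sketch below recomputes the collgrad row, reading the printed digits as means .322404 (matched) and .321663 (total) and variances .219058 (matched) and .218674 (total). That the standardized difference divides by the standard deviation of the total column is an inference from the printed numbers, not something stated on the slide; the helper names are hypothetical.

```python
import math

def std_diff(mean_a, mean_b, var_ref):
    """Difference in means, scaled by the standard deviation of a reference group."""
    return (mean_a - mean_b) / math.sqrt(var_ref)

def var_ratio(var_a, var_b):
    """Ratio of two variances; values far from 1 signal different spreads."""
    return var_a / var_b

# collgrad: matched (1) versus total (3)
sd_13 = std_diff(0.322404, 0.321663, 0.218674)
vr_13 = var_ratio(0.219058, 0.218674)
print(sd_13, vr_13)
```

Large standardized differences between the matched and unmatched treated units indicate that the estimate restricted to the common support applies to a population that differs from the full treatment group.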

Some Examples: Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: dot plot of the standardized differences for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south, with a reference line at 0. Title: Std. difference.]

Some Examples: Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching   Number of obs = 1,852
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      432    25    457     1104     291   1395  1.3392

Treatment-effects estimation

            Coef.
wage
ATT      .6021049
NATE     1.430823
hours
ATT      1.263759
NATE     1.450303

Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching   Number of obs = 1,852
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      432    25    457     1104     291   1395  1.3392

Treatment-effects estimation

            Coef.
wage
ATT      .5152752
NATE     1.430823
hours
ATT      1.263759
NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching   Number of obs = 1,853
                                        Replications  = 50
                                        Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                Matched               Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
0
Treated      306    15    321      625     120    745  1.3199
1
Treated      126    10    136      473     178    651  1.3398

Treatment-effects estimation

         Observed  Bootstrap                    Normal-based
wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
0
ATT      .4586332   .2763358   1.66   0.097    -.082975   1.000241
1
ATT      .9518705    .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)      .4932373   .4452343   1.11   0.268    -.379406   1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to working people aged 24 to 60.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
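The simulation design (repeated random samples from a fixed population, one estimate per sample) can be sketched as below. This is a Python illustration under loud assumptions: the population is synthetic rather than the Swiss census, the estimator is the naive mean difference rather than a matching estimator, and the names `simulate` and `mean_difference` are hypothetical. It only shows how bias, variance, and mean squared error are obtained from the replicates.

```python
import numpy as np

def mean_difference(data):
    """Naive treated-minus-control difference in means (column 0 = y, column 1 = d)."""
    y, d = data[:, 0], data[:, 1]
    return y[d == 1].mean() - y[d == 0].mean()

def simulate(estimator, population, truth, n, reps=200, seed=0):
    """Draw reps samples of size n without replacement and summarize the estimates."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        idx = rng.choice(population.shape[0], size=n, replace=False)
        est[r] = estimator(population[idx])
    bias = est.mean() - truth
    variance = est.var()                  # variance across replications
    mse = np.mean((est - truth) ** 2)     # equals bias**2 + variance
    return {"bias": float(bias), "variance": float(variance), "mse": float(mse)}

# Synthetic population (hypothetical): 17.5% treated, outcome centered near 44.
rng = np.random.default_rng(42)
d = (rng.random(20000) < 0.175).astype(float)
y = 44.0 - 4.0 * d + rng.normal(scale=10.0, size=20000)
population = np.column_stack([y, d])
truth = mean_difference(population)
res = simulate(mean_difference, population, truth, n=500, reps=200)
print(res)
```

Swapping `mean_difference` for a matching estimator and `truth` for the population ATT gives the kind of comparison reported in the results figures.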

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated. x-axis: Propensity score (0 to 1). y-axis: Density. McFadden R2 = 0.121.]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated. x-axis: Propensity score. y-axis: Outcome.]

[Figure, right panel: treatment effect by propensity score. x-axis: Propensity score. y-axis: Treatment effect.]
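The three population effects are linked by the identity ATE = P(D=1) * ATT + P(D=0) * ATC. The short Python check below reads the slide values as ATT = -3.96, ATC = -3.41, and a treated share of 17.5% (these decimal placements are how the printed digits are interpreted here) and confirms that they reproduce the reported ATE up to rounding.

```python
# Share of the population in the treatment group (resident aliens)
p_treated = 0.175
att, atc = -3.96, -3.41

# The ATE is the share-weighted average of the ATT and the ATC
ate = p_treated * att + (1.0 - p_treated) * atc
print(round(ate, 2))
```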

Results: Variance

[Figure: variance of the estimators. Panels: N = 500, N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent. Panels: N = 500, N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators. Panels: N = 500, N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error. Panels: N = 500, N = 5000. Rows: nearest-neighbor matching (teffects) and nearest-neighbor matching (bootstrap), each with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals. Panels: N = 500, N = 5000. Rows and series as in the relative standard error figure.]

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:

- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:

- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.

  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 50: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Some Examples Kernel density balancing plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)

01

23

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Den

sity

Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 37

Some Examples Cumulative distribution balancing plot kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)

05

1

0 2 4 6 8 0 2 4 6 8

Raw Matched (ATT)

Untreated Treated

Cum

ulat

ive

prob

abilit

y

Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 38

Some Examples Balancing box plot kmatch box(refitting the model using the generate() option)

02

46

8Raw Matched (ATT)

Untreated Treated

Prop

ensi

ty s

core

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 39

Some Examples Standard errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289

NATE 1432913 2333282 614 0000 9755981 1890228

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 40

Some Examples

Do some tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 41

Some Examples Nearest-neighbor matching (1 neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 42

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs  = 1,853
Neighbors: min = 5
           max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   457      0     457  |   870     526   1396  |

Treatment-effects estimation

        wage |      Coef.
         ATT |   .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation          Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                             min = 5
Distance metric: Mahalanobis                          max = 6

                     |            AI Robust
                wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET: union          |
 (union vs nonunion) |  .5590823  .2381752    2.35   0.019   .0922675   1.025897

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs  = 1,853
Neighbors: min = 5
           max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   457      0     457  |   870     526   1396  |

Treatment-effects estimation

        wage |      Coef.
         ATT |   .5288023

wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation          Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                             min = 5
Distance metric: Mahalanobis                          max = 6

                     |            AI Robust
                wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET: union          |
 (union vs nonunion) |  .5288023  .2420635    2.18   0.029   .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   439     18     457  |  1258     138   1396  | .83886

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   432     25     457  |  1103     293   1396  | 1.3013

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   432     25     457  |  1105     291   1396  | 1.3394

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6059013
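To make the role of the bandwidth concrete, the sketch below computes kernel matching weights for one treated observation. It is an illustration only: the four distances are made up, the kernel is the textbook Epanechnikov form K(u) = 0.75(1 - u^2) for |u| < 1 (the exact scaling used by kmatch is an assumption here, and any constant factor cancels once the weights are normalized), and the bandwidth is the default value reported above.

```stata
clear
* hypothetical distances between one treated observation and four controls
input double d
0.2
0.9
1.2
2.0
end

* bandwidth taken from the output above
scalar h = 1.3394

* scaled distance and textbook Epanechnikov kernel (zero outside the bandwidth)
generate double u = d / h
generate double k = cond(abs(u) < 1, 0.75 * (1 - u^2), 0)

* normalize so that the weights of the matched controls sum to one
quietly summarize k
generate double w = k / r(sum)
list d u k w
```

Controls farther away than the bandwidth (here the one with d = 2.0) receive zero weight, which is why the bandwidth also acts as a caliper and why some treated observations remain unmatched.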

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   448      9     457  |  1184     212   1396  | 1.8888

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, MSE (y-axis) against bandwidth (x-axis), points labeled by search index]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   453      4     457  |  1289     107   1396  |  2.433

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, MISE (y-axis) against bandwidth (x-axis), points labeled by search index]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   455      2     457  |  1356      40   1396  | 2.7626

Treatment-effects estimation

        wage |      Coef.
         ATT |   .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, weighted MISE (y-axis) against bandwidth (x-axis), points labeled by search index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   366     91     457  |   701     695   1396  |     .5

Treatment-effects estimation

        wage |      Coef.
         ATT |   .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

              |           Means             |    Standardized difference
              |  Matched  Unmatched   Total |  (1)-(3)   (2)-(3)   (1)-(2)
     collgrad |  .322404   .318681  .321663 |  .001585  -.006376   .007962
      ttl_exp |  13.3929   12.7682  13.2685 |  .027413  -.110253   .137666
       tenure |  8.12614   6.95055  7.89205 |  .038378  -.154356   .192734
   3.industry |  .002732   .021978  .006565 | -.047404   .190657  -.238061
   4.industry |  .191257   .153846  .183807 |  .019212  -.077269   .096481
   5.industry |  .062842   .274725  .105033 | -.137462   .552867  -.690329
   6.industry |  .057377         0  .045952 |  .054507  -.219225   .273732
   7.industry |  .019126   .021978  .019694 | -.004083   .016423  -.020506
   8.industry |  .005464   .065934  .017505 | -.091714   .368871  -.460585
   9.industry |  .010929   .010989  .010941 | -.000115   .000462  -.000577
  10.industry |        0   .021978  .004376 | -.066227   .266363  -.332589
  11.industry |  .554645   .175824  .479212 |   .15083  -.606636   .757467
  12.industry |  .092896   .241758  .122538 | -.090299   .363181   -.45348
       2.race |  .243169   .681319  .330416 | -.185284   .745209  -.930494
       3.race |  .002732   .076923  .017505 | -.112525   .452572  -.565097
        south |   .29235   .318681  .297593 | -.011456   .046074   -.05753

              |          Variances          |            Ratio
              |  Matched  Unmatched   Total |  (1)/(3)   (2)/(3)   (1)/(2)
     collgrad |  .219058   .219536  .218674 |  1.00176   1.00394   .997824
      ttl_exp |  19.4198   25.2474  20.5898 |  .943177   1.22621    .76918
       tenure |  38.3324   31.9242  37.2044 |  1.03032   .858076   1.20073
   3.industry |  .002732   .021734  .006536 |  .418045   3.32537   .125714
   4.industry |  .155101   .131624  .150351 |  1.03159   .875443   1.17837
   5.industry |  .059054   .201465  .094207 |  .626851   2.13854   .293122
   6.industry |  .054233         0  .043936 |  1.23435         0         .
   7.industry |  .018811   .021734  .019348 |  .972252    1.1233   .865531
   8.industry |   .00545   .062271  .017237 |  .316157   3.61269   .087513
   9.industry |  .010839   .010989  .010845 |  .999464   1.01328   .986361
  10.industry |        0   .021734  .004367 |        0   4.97709         0
  11.industry |  .247691    .14652  .250115 |  .990307   .585811   1.69049
  12.industry |  .084497   .185348  .107758 |  .784137   1.72003   .455885
       2.race |  .184542   .219536  .221726 |  .832297   .990121   .840601
       3.race |  .002732   .071795  .017237 |  .158513   4.16522   .038056
        south |  .207448   .219536   .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats:

. mat M = r(M)

. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: dot plot of the standardized differences, one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), with a reference line at 0; x-axis: Std. difference]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,852
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   432     25     457  |  1104     291   1395  | 1.3392

Treatment-effects estimation

             |      Coef.
wage         |
         ATT |   .6021049
        NATE |   1.430823
hours        |
         ATT |   1.263759
        NATE |   1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1,852
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   432     25     457  |  1104     291   1395  | 1.3392

Treatment-effects estimation

             |      Coef.
wage         |
         ATT |   .5152752
        NATE |   1.430823
hours        |
         ATT |   1.263759
        NATE |   1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching     Number of obs = 1,853
                                          Replications  = 50
                                          Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
0  Treated |   306     15     321  |   625     120    745  | 1.3199
1  Treated |   126     10     136  |   473     178    651  | 1.3398

Treatment-effects estimation

             |  Observed  Bootstrap                       Normal-based
        wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
0        ATT |  .4586332  .2763358    1.66   0.097   -.082975   1.000241
1        ATT |  .9518705   .406903    2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
         (1) |  .4932373  .4452343    1.11   0.268   -.379406   1.365881
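The test and lincom results above are two views of the same Wald test: for a single linear restriction, the chi-squared statistic is the square of the z statistic, and both give the same p-value. A minimal sketch using the lincom coefficient and standard error from this slide (decimal points restored; scalar names are illustrative):

```stata
* difference [1]ATT - [0]ATT and its bootstrap standard error, from lincom
scalar b_diff  = .4932373
scalar se_b    = .4452343

* z statistic and the corresponding Wald chi-squared statistic (1 df)
scalar z_b     = b_diff / se_b
scalar wald    = z_b^2

* two-sided normal p-value and chi-squared(1) upper-tail p-value coincide
scalar p_norm  = 2 * normal(-abs(z_b))
scalar p_chi2  = chi2tail(1, wald)

display "z = " %5.2f z_b "   chi2(1) = " %5.2f wald
display "p (normal) = " %6.4f p_norm "   p (chi2) = " %6.4f p_chi2
```

Because the two approaches are equivalent here, lincom is mainly useful for the extra information it reports, namely the size of the difference and its confidence interval.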

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: outcome by propensity score for untreated and treated]

[Figure, right panel: treatment effect by propensity score]

Results: Variance

[Figure: variance of the estimates by matching algorithm, in two panels (N = 500 and N = 5000), for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm, in two panels (N = 500 and N = 5000), for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
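One common way to express bias reduction in percent is the share of the raw bias (NATE minus ATT) that an estimator removes, so that 100 means no remaining bias and values above 100 mean overcorrection. Whether the slides use exactly this definition is an assumption. In the sketch below, the population NATE and ATT are the values stated in the simulation setup (decimal points restored), and the estimate of -4.00 is made up for illustration.

```stata
* population values from the simulation setup
scalar nate_pop = -4.79
scalar att_pop  = -3.96
* hypothetical average estimate of some matching estimator
scalar est_avg  = -4.00

* raw bias of the naive comparison and remaining bias of the estimator
scalar raw_bias = nate_pop - att_pop
scalar rem_bias = est_avg - att_pop

* percentage of the raw bias that has been removed
scalar bias_red = 100 * (1 - rem_bias / raw_bias)
display "bias reduction = " %5.1f bias_red " percent"
```

Under this definition, an estimator whose average equals the population ATT scores exactly 100, and the naive mean difference scores 0.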

Results: Mean squared error

[Figure: mean squared error of the estimates by matching algorithm, in two panels (N = 500 and N = 5000), for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Results: Relative standard error

[Figure: relative standard error by matching algorithm, in two panels (N = 500 and N = 5000), for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by matching algorithm, in two panels (N = 500 and N = 5000), for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.



Some Examples

Do some tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 41

Some Examples Nearest-neighbor matching (1 neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 42

Some Examples Nearest-neighbor matching (5 neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 43

Some Examples Bias-correction regression adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 44

Some Examples

Mahalanobis-distance and propensity-score matching combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45

Some Examples

Exact matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46

Some Examples

Bandwidth selection the default (based on distribution of distances in

one-nearest-neighbor matching)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 47

Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

3

02

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48

Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49

Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

4

1112

1314

Wei

ghte

d M

ISE

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50

Some Examples Common-support diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      366     91    457       701     695   1396        .5

Treatment-effects estimation

wage          Coef.
ATT        .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): means and standardized differences
(1) matched, (2) unmatched, (3) total

                       Means                  Standardized difference
Variable       Matched  Unmatc~d     Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404   .318681   .321663    .001585   -.006376    .007962
ttl_exp        13.3929   12.7682   13.2685    .027413   -.110253    .137666
tenure         8.12614   6.95055   7.89205    .038378   -.154356    .192734
3.industry     .002732   .021978   .006565   -.047404    .190657   -.238061
4.industry     .191257   .153846   .183807    .019212   -.077269    .096481
5.industry     .062842   .274725   .105033   -.137462    .552867   -.690329
6.industry     .057377         0   .045952    .054507   -.219225    .273732
7.industry     .019126   .021978   .019694   -.004083    .016423   -.020506
8.industry     .005464   .065934   .017505   -.091714    .368871   -.460585
9.industry     .010929   .010989   .010941   -.000115    .000462   -.000577
10.industry          0   .021978   .004376   -.066227    .266363   -.332589
11.industry    .554645   .175824   .479212     .15083   -.606636    .757467
12.industry    .092896   .241758   .122538   -.090299    .363181    -.45348
2.race         .243169   .681319   .330416   -.185284    .745209   -.930494
3.race         .002732   .076923   .017505   -.112525    .452572   -.565097
south           .29235   .318681   .297593   -.011456    .046074    -.05753

Common support (treated): variances and variance ratios
(1) matched, (2) unmatched, (3) total

                     Variances                         Ratio
Variable       Matched  Unmatc~d     Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058   .219536   .218674    1.00176    1.00394    .997824
ttl_exp        19.4198   25.2474   20.5898    .943177    1.22621     .76918
tenure         38.3324   31.9242   37.2044    1.03032    .858076    1.20073
3.industry     .002732   .021734   .006536    .418045    3.32537    .125714
4.industry     .155101   .131624   .150351    1.03159    .875443    1.17837
5.industry     .059054   .201465   .094207    .626851    2.13854    .293122
6.industry     .054233         0   .043936    1.23435          0          .
7.industry     .018811   .021734   .019348    .972252     1.1233    .865531
8.industry      .00545   .062271   .017237    .316157    3.61269    .087513
9.industry     .010839   .010989   .010845    .999464    1.01328    .986361
10.industry          0   .021734   .004367          0    4.97709          0
11.industry    .247691    .14652   .250115    .990307    .585811    1.69049
12.industry    .084497   .185348   .107758    .784137    1.72003    .455885
2.race         .184542   .219536   .221726    .832297    .990121    .840601
3.race         .002732   .071795   .017237    .158513    4.16522    .038056
south          .207448   .219536    .20949    .990254    1.04796    .944939
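The balance statistics reported by kmatch csummarize can be checked by hand. The following is a minimal Python sketch, not kmatch's own code. It assumes, as the reported numbers suggest, that each standardized difference is a difference in means divided by the standard deviation of the total treated group. It uses the collgrad row with decimal points restored; the function names are illustrative.

```python
import math

def std_difference(mean_a, mean_b, var_total):
    """Difference in means scaled by the SD of the reference (total) group."""
    return (mean_a - mean_b) / math.sqrt(var_total)

def variance_ratio(var_a, var_b):
    """Ratio of two variances; a value of 1 indicates equal spread."""
    return var_a / var_b

# collgrad row: means and variances for (1) matched, (2) unmatched, (3) total
mean_matched, mean_unmatched, mean_total = 0.322404, 0.318681, 0.321663
var_matched, var_unmatched, var_total = 0.219058, 0.219536, 0.218674

d13 = std_difference(mean_matched, mean_total, var_total)
d23 = std_difference(mean_unmatched, mean_total, var_total)
r13 = variance_ratio(var_matched, var_total)

print(f"(1)-(3): {d13:.6f}  (2)-(3): {d23:.6f}  (1)/(3): {r13:.5f}")
```

Up to rounding of the displayed means, the results agree with the first row of both tables, which supports the assumed scaling by the total-group standard deviation.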

Some Examples

Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: dot plot of the standardized differences for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south; x-axis "Std. difference" from -.2 to .2 with a reference line at 0.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      432     25    457      1104     291   1395    1.3392

Treatment-effects estimation

                 Coef.
wage
  ATT         .6021049
  NATE        1.430823
hours
  ATT         1.263759
  NATE        1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      432     25    457      1104     291   1395    1.3392

Treatment-effects estimation

                 Coef.
wage
  ATT         .5152752
  NATE        1.430823
hours
  ATT         1.263759
  NATE        1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
0 Treated    306     15    321       625     120    745    1.3199
1 Treated    126     10    136       473     178    651    1.3398

Treatment-effects estimation

          Observed  Bootstrap                     Normal-based
wage         Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
0 ATT     .4586332   .2763358   1.66   0.097    -.082975   1.000241
1 ATT     .9518705    .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage         Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)       .4932373   .4452343   1.11   0.268    -.379406   1.365881
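The test and lincom results for the two subpopulations are two views of the same Wald statistic: with one restriction, the chi-squared statistic is the square of the z statistic, and both give the same p-value. The sketch below checks this in Python using the reported point estimates and the reported bootstrap standard error of the difference (decimal points restored). The standard error is taken from the output as given; it cannot be reconstructed from the two separate standard errors without the bootstrap covariance.

```python
from statistics import NormalDist

def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return 2.0 * (1.0 - NormalDist().cdf(chi2 ** 0.5))

att_south0, att_south1 = 0.4586332, 0.9518705
se_diff = 0.4452343  # bootstrap SE of the difference, as reported by lincom

diff = att_south1 - att_south0
z = diff / se_diff          # z statistic shown by lincom
chi2 = z ** 2               # Wald chi2(1) shown by test
p_value = chi2_1df_pvalue(chi2)

print(f"diff={diff:.7f}  z={z:.2f}  chi2={chi2:.2f}  p={p_value:.4f}")
```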

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

[Figure: Propensity score in population (computed from fully stratified data). Density (y-axis) of the propensity score (x-axis, 0 to 1) for untreated and treated. McFadden R2 = 0.121.]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79.

Population ATT (computed from fully stratified data): -3.96.

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome (y-axis, about 30 to 55) by propensity score (x-axis, 0 to .8) for untreated and treated. Right panel: treatment effect (y-axis, -6 to -1) by propensity score (x-axis, 0 to .8).]
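The population quantities on this slide are linked by a simple identity: the ATE is the average of the ATT and the ATC, weighted by the shares of treated and untreated. The short Python check below uses the slide's values with decimal points restored (treated share 17.5%, ATT = -3.96, ATC = -3.41, NATE = -4.79). The variable names are illustrative.

```python
share_treated = 0.175      # share of resident aliens in the population
att, atc = -3.96, -3.41    # population effects reported on the slide
nate = -4.79               # raw mean difference (naive estimate)

# ATE as the share-weighted average of ATT and ATC
ate = share_treated * att + (1.0 - share_treated) * atc

# Bias of the naive estimate when the ATT is the target
raw_bias = nate - att

print(f"ATE = {ate:.2f}, raw bias of NATE relative to ATT = {raw_bias:.2f}")
```

The computed ATE rounds to the reported -3.51, which also confirms the reading of the treated share as 17.5%.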

Results: Variance

[Figure: variance of the estimators, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
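The slides do not define "bias reduction (in percent)". One plausible reading, used here purely as an assumption, is the share of the naive estimator's bias that a matching estimator removes: 0% for the raw mean difference, 100% for an estimate equal to the population ATT, and more than 100% when the estimate overshoots. The Python sketch below uses the population NATE and ATT from the simulation setup; the two intermediate estimates are hypothetical.

```python
def bias_reduction_pct(estimate, naive, target):
    """Percent of the naive estimator's bias removed by `estimate` (assumed definition)."""
    raw_bias = naive - target
    remaining_bias = estimate - target
    return 100.0 * (1.0 - remaining_bias / raw_bias)

NATE, ATT = -4.79, -3.96  # population values from the simulation setup

# -4.20 and -3.80 are hypothetical estimates, included only for illustration
for est in (-4.79, -4.20, -3.96, -3.80):
    pct = bias_reduction_pct(est, NATE, ATT)
    print(f"estimate {est:5.2f}: {pct:6.1f}% bias reduction")
```

Under this reading, values above 100 on the figure's axis would correspond to estimates that overcorrect the raw difference.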

Results: Mean squared error

[Figure: mean squared error of the estimators, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
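The mean squared error combines the two quantities shown on the preceding result slides: across simulation draws, MSE equals variance plus squared bias. The Python sketch below illustrates the decomposition. The five estimates are hypothetical and not taken from the simulation; only the target value (the population ATT of -3.96) comes from the slides.

```python
from statistics import fmean, pvariance

def mse_decomposition(estimates, truth):
    """Return (mse, variance, bias) of simulated estimates around `truth`."""
    mse = fmean((e - truth) ** 2 for e in estimates)
    bias = fmean(estimates) - truth
    return mse, pvariance(estimates), bias

# Hypothetical estimates from five simulation draws
estimates = [-4.4, -3.6, -4.1, -3.9, -4.5]
mse, var, bias = mse_decomposition(estimates, truth=-3.96)

print(f"MSE={mse:.4f}  variance={var:.4f}  bias^2={bias ** 2:.4f}")
```

The identity holds exactly when the variance is computed with divisor n (the population variance), which is why pvariance is used here.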

Results: Relative standard error

[Figure: relative standard error, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
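The two standard-error diagnostics can be computed from simulation output as sketched below in Python. The definitions are assumptions, since the slides do not spell them out: coverage is taken as the share of normal-based 95% intervals that contain the true value, and the relative standard error as the average estimated standard error divided by the standard deviation of the estimates across draws (values below 1 would indicate standard errors that are too small). All numbers except the population ATT are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def coverage(estimates, std_errors, truth, level=0.95):
    """Share of normal-based confidence intervals that contain `truth`."""
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = [abs(est - truth) <= crit * se for est, se in zip(estimates, std_errors)]
    return sum(hits) / len(hits)

def relative_se(estimates, std_errors):
    """Average estimated SE divided by the SD of the estimates across draws."""
    return fmean(std_errors) / stdev(estimates)

# Hypothetical results from five simulation draws (not taken from the slides)
estimates = [-4.4, -3.6, -4.1, -3.9, -4.5]
std_errors = [0.20, 0.40, 0.35, 0.40, 0.25]
truth = -3.96  # population ATT

cov = coverage(estimates, std_errors, truth)
rel = relative_se(estimates, std_errors)
print(f"coverage={cov:.2f}  relative SE={rel:.3f}")
```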


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295-313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77-90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279-290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465-472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41-55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688-701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472-480.


Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score (y-axis, 0 to .8) for untreated and treated, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      432     25    457      1105     291   1396    1.3394
Untreated   1386     10   1396       455       2    457    3.3975
Combined    1818     35   1853      1560     293   1853

Treatment-effects estimation

          Observed  Bootstrap                     Normal-based
wage         Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE       .4095729   .1920853   2.13   0.033    .0330928   .7860531
ATT       .6059013   .2472069   2.45   0.014    .1213846   1.090418
ATC       .3483797   .1893653   1.84   0.066   -.0227695   .7195289
NATE      1.432913   .2333282   6.14   0.000    .9755981   1.890228

Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
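The lincom line can be retraced from the point estimates and the reported standard error. The Python sketch below (an illustration, not Stata's implementation) computes the difference ATT - NATE, its z statistic, and the normal-based 95% confidence interval, using the values from the bootstrap run with decimal points restored. The standard error of the difference is taken from the output as given, because the two estimates are correlated and it cannot be derived from their separate standard errors.

```python
from statistics import NormalDist

def wald_summary(coef, se, level=0.95):
    """Return z statistic, two-sided p-value, and normal-based CI."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p, (coef - crit * se, coef + crit * se)

att, nate = 0.6059013, 1.432913   # point estimates from the bootstrap run
se_diff = 0.1810415               # bootstrap SE of ATT - NATE reported by lincom

diff = att - nate
z, p, (lower, upper) = wald_summary(diff, se_diff)

print(f"diff={diff:.7f}  z={z:.2f}  p={p:.4f}  CI=[{lower:.6f}, {upper:.6f}]")
```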

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment : union = 1                                 max = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      457      0    457       328    1068   1396         .

Treatment-effects estimation

wage          Coef.
ATT        .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                                       AI Robust
wage                           Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
ATET
union (union vs nonunion)   .7246969   .2942952   2.46   0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment : union = 1                                 max = 5
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      457      0    457       870     526   1396         .

Treatment-effects estimation

wage          Coef.
ATT        .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                       AI Robust
wage                           Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
ATET
union (union vs nonunion)   .5590823   .2381752   2.35   0.019    .0922675   1.025897

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment : union = 1                                 max = 5
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      457      0    457       870     526   1396         .

Treatment-effects estimation

wage          Coef.
ATT        .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                       AI Robust
wage                           Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
ATET
union (union vs nonunion)   .5288023   .2420635   2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model  : logit (pr)
PS covars : i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      439     18    457      1258     138   1396    .83886

Treatment-effects estimation

wage          Coef.
ATT        .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure
Exact     : industry race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      432     25    457      1103     293   1396    1.3013

Treatment-effects estimation

wage          Coef.
ATT        .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      432     25    457      1105     291   1396    1.3394

Treatment-effects estimation

wage          Coef.
ATT        .6059013
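The role of the kernel and the bandwidth in these examples can be illustrated with a deliberately simplified Python sketch. It is not kmatch's implementation: it uses a single covariate instead of a Mahalanobis distance, toy data, and illustrative function names. It shows the core idea of kernel matching for the ATT: each treated unit is compared with a weighted average of the controls within the bandwidth, with Epanechnikov weights, and treated units without any control inside the bandwidth remain unmatched.

```python
def epan(u):
    """Epanechnikov kernel: positive weight only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_att(treated, controls, bandwidth):
    """Kernel-matching ATT for lists of (x, y) pairs.

    Returns the ATT over matched treated units and the number matched.
    """
    effects = []
    for x_t, y_t in treated:
        weights = [epan((x_c - x_t) / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unit stays unmatched
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit could be matched")
    return sum(effects) / len(effects), len(effects)

# Toy data (hypothetical): (covariate, outcome)
treated = [(1.0, 5.0), (2.0, 7.0), (10.0, 9.0)]
controls = [(0.8, 4.0), (1.2, 4.0), (2.0, 5.0)]

att, n_matched = kernel_att(treated, controls, bandwidth=0.5)
print(f"ATT={att:.2f} based on {n_matched} of {len(treated)} treated units")
```

In the toy data the third treated unit has no control within the bandwidth, which mirrors the "Matched: No" column of the matching statistics; a larger bandwidth matches more units at the cost of less similar comparisons.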

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      448      9    457      1184     212   1396    1.8888

Treatment-effects estimation

wage          Coef.
ATT        .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE (y-axis) against bandwidth (x-axis, 1.5 to 3); points are labeled with the index of the search step.]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      453      4    457      1289     107   1396     2.433

Treatment-effects estimation

wage          Coef.
ATT        .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE (y-axis) against bandwidth (x-axis, 1.5 to 3); points are labeled with the index of the search step.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls          Band-
             Yes     No  Total      Used  Unused  Total     width
Treated      455      2    457      1356      40   1396    2.7626

Treatment-effects estimation

wage          Coef.
ATT        .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE (y-axis) against bandwidth (x-axis, 1 to 5); points are labeled with the index of the search step.]


Conclusions

Overall I agree that MDM has advantages over PSM but it alsohas some disadvantages In applied research the choice may not bethat clear- MDM leaves less scope for post-matching modeling decision biases- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary various suggestions in the

literature (somewhat unclear eg how categorical variables should betreated)

Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 66

Conclusions

Some conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some simulations comparable to the ones by King and Nielsenusing various matching algorithms

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 67

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Mill JS 2002 A System of Logic Reprinted from the 1981 edition (firstpublished 1843) Honolulu Hawaii University Press of the Pacific

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 68

References II

Neyman J 1990[1923] On the Application of Probability Theory toAgricultural Experiments Essay on Principles Section 9 (Translated andedited by DM Dabrowska and TP Speed from the Polish original)Statistical Science 5(4)465ndash472

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Rubin DB 1974 Estimating causal effects of treatments in randomizedand nonrandomized studies Journal of Educational Psychology66(5)688ndash701

Rubin DB 1990 Comment Neyman (1923) and Causal Inference inExperiments and Observational Studies Statistical Science 5(4)472ndash480

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 69

  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 53: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Some Examples: Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Replications  = 50
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     432   25    457     1105    291   1396   1.3394
      Untreated  1386   10   1396      455      2    457   3.3975
      Combined   1818   35   1853     1560    293   1853

    Treatment-effects estimation
              Observed   Bootstrap                     Normal-based
      wage       Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
      ATE     .4095729   .1920853    2.13   0.033    .0330928   .7860531
      ATT     .6059013   .2472069    2.45   0.014    .1213846   1.090418
      ATC     .3483797   .1893653    1.84   0.066   -.0227695   .7195289
      NATE    1.432913   .2333282    6.14   0.000    .9755981   1.890228

Some Examples: Do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

      wage       Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
      (1)    -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
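As a check on the lincom output, the z statistic and the normal-based confidence interval can be recomputed from the coefficient and its bootstrap standard error. The following is a minimal Python sketch, not part of the original slides; the helper name wald_summary is hypothetical, and the inputs are the lincom values read with their decimal points restored (-.8270117 and .1810415).

```python
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """Return z, two-sided p-value, and normal-based CI bounds for coef/se."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # Critical value of the standard normal for the requested level
    crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return z, p, coef - crit * se, coef + crit * se


# Values taken from the "lincom ATT-NATE" output on the slide
z, p, lo, hi = wald_summary(-0.8270117, 0.1810415)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = [{lo:.6f}, {hi:.6f}]")
```

The same arithmetic applies to every row of the bootstrap tables, since they report normal-based intervals.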

Some Examples: Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                              Number of obs   = 1853
                                              Neighbors:  min = 1
                                                          max = 1
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     457    0    457      328   1068   1396

    Treatment-effects estimation
      wage       Coef.
      ATT     .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation              Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 1
    Outcome model  : matching                              min = 1
    Distance metric: Mahalanobis                           max = 1

                                       AI Robust
      wage                  Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
      ATET
        union
        (union vs nonunion)
                         .7246969   .2942952    2.46   0.014     .147889   1.301505

Some Examples: Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                              Number of obs   = 1853
                                              Neighbors:  min = 5
                                                          max = 5
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     457    0    457      870    526   1396

    Treatment-effects estimation
      wage       Coef.
      ATT     .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation              Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 5
    Outcome model  : matching                              min = 5
    Distance metric: Mahalanobis                           max = 6

                                       AI Robust
      wage                  Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
      ATET
        union
        (union vs nonunion)
                         .5590823   .2381752    2.35   0.019    .0922675   1.025897
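The logic of nearest-neighbor matching with replacement can be sketched in a few lines. This is an illustrative Python sketch under simplifying assumptions, not the kmatch or teffects implementation: it uses a single covariate (so the distance is an absolute difference rather than a Mahalanobis distance), breaks ties by input order, and the function name nn_match_att is hypothetical.

```python
def nn_match_att(treated, controls, k=1):
    """ATT from k-nearest-neighbor matching with replacement.

    treated, controls: lists of (x, y) tuples with one covariate x
    and outcome y. Ties in distance are broken by input order.
    """
    if k < 1 or k > len(controls):
        raise ValueError("k must be between 1 and the number of controls")
    effects = []
    for x_t, y_t in treated:
        # The k controls closest to this treated unit (controls can be reused)
        nearest = sorted(controls, key=lambda c: abs(c[0] - x_t))[:k]
        y0_hat = sum(y for _, y in nearest) / k  # estimated counterfactual
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects)


controls = [(0.9, 7.0), (1.2, 8.0), (3.0, 0.0)]
print(nn_match_att([(1.0, 10.0)], controls, k=1))
print(nn_match_att([(1.0, 10.0)], controls, k=2))
```

Because every treated unit always receives k neighbors, no treated observation is left unmatched, which corresponds to the "No = 0" column in the matching statistics.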

Some Examples: Bias correction (regression adjustment)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                              Number of obs   = 1853
                                              Neighbors:  min = 5
                                                          max = 5
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     457    0    457      870    526   1396

    Treatment-effects estimation
      wage       Coef.
      ATT     .5288023
    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation              Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 5
    Outcome model  : matching                              min = 5
    Distance metric: Mahalanobis                           max = 6

                                       AI Robust
      wage                  Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
      ATET
        union
        (union vs nonunion)
                         .5288023   .2420635    2.18   0.029    .0543666   1.003238

Some Examples: Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis (modified)
    Covariates : collgrad ttl_exp tenure
    PS model   : logit (pr)
    PS covars  : i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     439   18    457     1258    138   1396   .83886

    Treatment-effects estimation
      wage       Coef.
      ATT     .6408443

Some Examples: Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure
    Exact      : industry race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     432   25    457     1103    293   1396   1.3013

    Treatment-effects estimation
      wage       Coef.
      ATT     .6047374

Some Examples: Bandwidth selection, the default (based on the distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     432   25    457     1105    291   1396   1.3394

    Treatment-effects estimation
      wage       Coef.
      ATT     .6059013
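To make the role of the bandwidth concrete, kernel matching can be sketched as follows. This is a simplified Python illustration, not the kmatch implementation: it assumes a single covariate (distance is the absolute difference divided by a user-supplied scale, standing in for the Mahalanobis metric), uses the Epanechnikov kernel named in the output, and the function names epan and kernel_match_att are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel: positive weight only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_match_att(treated, controls, bandwidth, scale=1.0):
    """ATT from kernel matching on one covariate.

    treated, controls: lists of (x, y) tuples.
    Returns (att, number of matched treated units).
    """
    effects = []
    for x_t, y_t in treated:
        weights = [epan(abs(x_c - x_t) / scale / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unit stays unmatched
        # Kernel-weighted average of control outcomes as the counterfactual
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


treated = [(0.0, 5.0), (10.0, 9.0)]
controls = [(0.0, 3.0), (0.5, 4.0), (3.0, 100.0)]
att, n_matched = kernel_match_att(treated, controls, bandwidth=1.0)
print(att, n_matched)
```

In this toy input the second treated unit has no control within the bandwidth and is dropped, the analogue of the "Matched: No" column; a larger bandwidth matches more treated units at the cost of using more distant controls.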

Some Examples: Bandwidth selection, cross-validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     448    9    457     1184    212   1396   1.8888

    Treatment-effects estimation
      wage       Coef.
      ATT     .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

    [Figure: cross-validation plot of MSE against bandwidth (about 1.5 to 3), points labeled by evaluation index]

Some Examples: Bandwidth selection, cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     453    4    457     1289    107   1396    2.433

    Treatment-effects estimation
      wage       Coef.
      ATT     .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

    [Figure: cross-validation plot of MISE against bandwidth (about 1.5 to 3), points labeled by evaluation index]

Some Examples: Bandwidth selection, weighted cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     455    2    457     1356     40   1396   2.7626

    Treatment-effects estimation
      wage       Coef.
      ATT     .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

    [Figure: cross-validation plot of weighted MISE against bandwidth (about 1 to 5), points labeled by evaluation index]

Some Examples: Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(0.5)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     366   91    457      701    695   1396       .5

    Treatment-effects estimation
      wage       Coef.
      ATT     .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

    Common support (treated)
                           Means                    Standardized difference
                   Matched  Unmatched     Total    (1)-(3)   (2)-(3)   (1)-(2)
      collgrad     .322404    .318681   .321663    .001585  -.006376   .007962
      ttl_exp      13.3929    12.7682   13.2685    .027413  -.110253   .137666
      tenure       8.12614    6.95055   7.89205    .038378  -.154356   .192734
      3.industry   .002732    .021978   .006565   -.047404   .190657  -.238061
      4.industry   .191257    .153846   .183807    .019212  -.077269   .096481
      5.industry   .062842    .274725   .105033   -.137462   .552867  -.690329
      6.industry   .057377          0   .045952    .054507  -.219225   .273732
      7.industry   .019126    .021978   .019694   -.004083   .016423  -.020506
      8.industry   .005464    .065934   .017505   -.091714   .368871  -.460585
      9.industry   .010929    .010989   .010941   -.000115   .000462  -.000577
      10.industry        0    .021978   .004376   -.066227   .266363  -.332589
      11.industry  .554645    .175824   .479212     .15083  -.606636   .757467
      12.industry  .092896    .241758   .122538   -.090299   .363181   -.45348
      2.race       .243169    .681319   .330416   -.185284   .745209  -.930494
      3.race       .002732    .076923   .017505   -.112525   .452572  -.565097
      south         .29235    .318681   .297593   -.011456   .046074   -.05753

    Common support (treated)
                         Variances                           Ratio
                   Matched  Unmatched     Total    (1)/(3)   (2)/(3)   (1)/(2)
      collgrad     .219058    .219536   .218674    1.00176   1.00394   .997824
      ttl_exp      19.4198    25.2474   20.5898    .943177   1.22621    .76918
      tenure       38.3324    31.9242   37.2044    1.03032   .858076   1.20073
      3.industry   .002732    .021734   .006536    .418045   3.32537   .125714
      4.industry   .155101    .131624   .150351    1.03159   .875443   1.17837
      5.industry   .059054    .201465   .094207    .626851   2.13854   .293122
      6.industry   .054233          0   .043936    1.23435         0         .
      7.industry   .018811    .021734   .019348    .972252    1.1233   .865531
      8.industry    .00545    .062271   .017237    .316157   3.61269   .087513
      9.industry   .010839    .010989   .010845    .999464   1.01328   .986361
      10.industry        0    .021734   .004367          0   4.97709         0
      11.industry  .247691     .14652   .250115    .990307   .585811   1.69049
      12.industry  .084497    .185348   .107758    .784137   1.72003   .455885
      2.race       .184542    .219536   .221726    .832297   .990121   .840601
      3.race       .002732    .071795   .017237    .158513   4.16522   .038056
      south        .207448    .219536    .20949    .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total
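The standardized differences in the table can be reproduced from the reported means and variances. The sketch below is Python written for illustration; the scaling by the standard deviation of the total treated group is inferred from the tabulated numbers rather than taken from the kmatch documentation, and the function names std_diff and var_ratio are hypothetical.

```python
from math import sqrt


def std_diff(mean_a, mean_b, var_ref):
    """Difference in means divided by the SD of a reference group."""
    return (mean_a - mean_b) / sqrt(var_ref)


def var_ratio(var_a, var_b):
    """Ratio of two variances."""
    return var_a / var_b


# ttl_exp row: means (matched, unmatched, total) and variances (matched, total)
m_matched, m_unmatched, m_total = 13.3929, 12.7682, 13.2685
v_matched, v_total = 19.4198, 20.5898

print(std_diff(m_matched, m_total, v_total))      # compare with column (1)-(3)
print(std_diff(m_matched, m_unmatched, v_total))  # compare with column (1)-(2)
print(var_ratio(v_matched, v_total))              # compare with ratio (1)/(3)
```

Small discrepancies in the last digits are expected because the inputs are the rounded values printed in the table.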

Some Examples: Make a graph of the common-support statistics

    . mat M = r(M)
    . coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

    [Figure: coefficient plot of the standardized differences (x-axis from -.2 to .2) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]

Some Examples: Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1852
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     432   25    457     1104    291   1395   1.3392

    Treatment-effects estimation
                  Coef.
      wage
        ATT    .6021049
        NATE   1.430823
      hours
        ATT    1.263759
        NATE   1.450303

Some Examples: Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure)
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1852
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      Treated     432   25    457     1104    291   1395   1.3392

    Treatment-effects estimation
                  Coef.
      wage
        ATT    .5152752
        NATE   1.430823
      hours
        ATT    1.263759
        NATE   1.450303
    wage: adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race

Some Examples: Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching     Number of obs = 1853
                                              Replications  = 50
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics
                    Matched              Controls           Band-
                  Yes   No  Total     Used Unused  Total    width
      0
        Treated   306   15    321      625    120    745   1.3199
      1
        Treated   126   10    136      473    178    651   1.3398

    Treatment-effects estimation
              Observed   Bootstrap                     Normal-based
      wage       Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
      0
        ATT   .4586332   .2763358    1.66   0.097    -.082975   1.000241
      1
        ATT   .9518705    .406903    2.34   0.019    .1543553   1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

      wage       Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
      (1)     .4932373   .4452343    1.11   0.268    -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values range from 6 to 78, with a mean of 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to working people aged 24 to 60.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in the population (computed from fully stratified data):

    [Figure: density of the propensity score (0 to 1) for untreated and treated; McFadden R2 = 0.121]

Simulation

- Raw mean difference in occupational prestige (NATE): −4.79.
- Population ATT (computed from fully stratified data): −3.96.
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

    [Figure, left panel: mean outcome (30 to 55) against propensity score (0 to .8) for untreated and treated]
    [Figure, right panel: treatment effect (−6 to −1) against propensity score (0 to .8)]

Results: Variance

    [Figure: variance of the estimates, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

    [Figure: bias reduction in percent, panels N = 500 (axis 70 to 130) and N = 5000 (axis 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

    [Figure: mean squared error of the estimates, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Results: Relative standard error

    [Figure: relative standard error (axis about .9 to 1.25), panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Results: Coverage of 95% CIs

    [Figure: coverage of 95% confidence intervals (axis about .92 to .98), panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
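The performance measures shown in the result slides (variance, bias reduction, mean squared error, relative standard error, coverage) can be computed from a set of simulation replications roughly as follows. This Python sketch is an assumption about the definitions, which the slides do not spell out: bias reduction is taken as the share of the raw bias (NATE minus ATT) removed by the estimator, and the relative standard error as the mean estimated standard error divided by the standard deviation of the estimates. The function name performance and the toy inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def performance(estimates, std_errors, att, nate, crit=1.959964):
    """Summarize simulation replications of an ATT estimator."""
    m = mean(estimates)
    var = pvariance(estimates)            # variance across replications
    bias = m - att
    covered = [abs(b - att) <= crit * s for b, s in zip(estimates, std_errors)]
    return {
        "variance": var,
        "bias_reduction_pct": 100.0 * (nate - m) / (nate - att),
        "mse": var + bias ** 2,           # MSE = variance + squared bias
        "relative_se": mean(std_errors) / sqrt(var),
        "coverage": sum(covered) / len(covered),
    }


# Toy replications; ATT and NATE are the population values from the slides
res = performance(
    estimates=[-4.0, -3.8, -4.1, -3.9],
    std_errors=[0.11, 0.11, 0.11, 0.11],
    att=-3.96,
    nate=-4.79,
)
for name, value in res.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, a bias reduction of 100 percent means the estimator is unbiased on average, and a relative standard error of 1 means the estimated standard errors match the sampling variability.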


Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.

  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 54: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Some Examples

Do some tests

lincom ATT-NATE

( 1) ATT - NATE = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) -8270117 1810415 -457 0000 -1181847 -4721768

test ATT = ATC

( 1) ATT - ATC = 0

chi2( 1) = 242Prob gt chi2 = 01200

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 41

Some Examples Nearest-neighbor matching (1 neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 42

Some Examples Nearest-neighbor matching (5 neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 43

Some Examples

Bias correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs  = 1,853
Neighbors: min = 5
           max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls               Band-
             Yes     No   Total     Used  Unused   Total   width
  Treated    457      0     457      870     526    1396

Treatment-effects estimation

wage        Coef.
ATT      .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                  Number of obs      =  1,853
Estimator      : nearest-neighbor matching    Matches: requested =      5
Outcome model  : matching                                    min =      5
Distance metric: Mahalanobis                                 max =      6

                                   AI Robust
wage                      Coef.    Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .5288023    .2420635    2.18    0.029    .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

             Matched                Controls               Band-
             Yes     No   Total     Used  Unused   Total   width
  Treated    439     18     457     1258     138    1396  .83886

Treatment-effects estimation

wage        Coef.
ATT      .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

             Matched                Controls               Band-
             Yes     No   Total     Used  Unused   Total   width
  Treated    432     25     457     1103     293    1396  1.3013

Treatment-effects estimation

wage        Coef.
ATT      .6047374

Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls               Band-
             Yes     No   Total     Used  Unused   Total   width
  Treated    432     25     457     1105     291    1396  1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
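Kernel matching replaces the fixed number of neighbors with kernel weights inside a bandwidth, so treated units without any control within the bandwidth stay unmatched (the "Matched: No" column). The sketch below illustrates the idea with a single scalar covariate. Two things are assumptions rather than facts from the slides: the Epanechnikov kernel is written in its standard form with support |u| < 1, and the bandwidth rule (a factor `f` times the `q`-quantile of the nonzero nearest-neighbor distances) only mimics the slide's description of a default based on one-nearest-neighbor distances, with placeholder values for `q` and `f`.

```python
import math
from statistics import mean

def epan(u):
    """Epanechnikov kernel with support |u| < 1 (assumed form)."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def pair_matching_bandwidth(nn_distances, q=0.90, f=1.5):
    """f times the q-quantile of the nonzero nearest-neighbor distances.

    q and f are illustrative placeholders, not documented defaults.
    """
    d = sorted(x for x in nn_distances if x > 0)
    idx = max(math.ceil(q * len(d)) - 1, 0)
    return f * d[idx]

def kernel_att(treated, controls, h):
    """ATT by kernel matching on a scalar covariate.

    Units are (x, y) pairs; returns (ATT over matched treated, number unmatched).
    """
    effects, unmatched = [], 0
    for xt, yt in treated:
        w = [epan(abs(xt - xc) / h) for xc, _ in controls]
        if sum(w) == 0:          # no control within the bandwidth
            unmatched += 1
            continue
        y0 = sum(wi * yc for wi, (_, yc) in zip(w, controls)) / sum(w)
        effects.append(yt - y0)
    return mean(effects), unmatched

# Toy data (hypothetical): the second treated unit has no control within h = 1
treated = [(0.0, 5.0), (10.0, 7.0)]
controls = [(0.5, 3.0), (-0.5, 4.0), (3.0, 100.0)]
result = kernel_att(treated, controls, h=1.0)
print(result)
```

A larger bandwidth matches more treated units and uses more controls, at the cost of comparing less similar observations; the bandwidth-selection slides that follow vary exactly this choice.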

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls               Band-
             Yes     No   Total     Used  Unused   Total   width
  Treated    448      9     457     1184     212    1396  1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with indexed search points; x axis: Bandwidth; y axis: MSE]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls               Band-
             Yes     No   Total     Used  Unused   Total   width
  Treated    453      4     457     1289     107    1396   2.433

Treatment-effects estimation

wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with indexed search points; x axis: Bandwidth; y axis: MISE]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls               Band-
             Yes     No   Total     Used  Unused   Total   width
  Treated    455      2     457     1356      40    1396  2.7626

Treatment-effects estimation

wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with indexed search points; x axis: Bandwidth; y axis: Weighted MISE]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching

Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls               Band-
             Yes     No   Total     Used  Unused   Total   width
  Treated    366     91     457      701     695    1396      .5

Treatment-effects estimation

wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                          Standardized difference
Means          Matched  Unmatc~d     Total     (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404   .318681   .321663     .001585   -.006376    .007962
ttl_exp        13.3929   12.7682   13.2685     .027413   -.110253    .137666
tenure         8.12614   6.95055   7.89205     .038378   -.154356    .192734
3.industry     .002732   .021978   .006565    -.047404    .190657   -.238061
4.industry     .191257   .153846   .183807     .019212   -.077269    .096481
5.industry     .062842   .274725   .105033    -.137462    .552867   -.690329
6.industry     .057377         0   .045952     .054507   -.219225    .273732
7.industry     .019126   .021978   .019694    -.004083    .016423   -.020506
8.industry     .005464   .065934   .017505    -.091714    .368871   -.460585
9.industry     .010929   .010989   .010941    -.000115    .000462   -.000577
10.industry          0   .021978   .004376    -.066227    .266363   -.332589
11.industry    .554645   .175824   .479212      .15083   -.606636    .757467
12.industry    .092896   .241758   .122538    -.090299    .363181    -.45348
2.race         .243169   .681319   .330416    -.185284    .745209   -.930494
3.race         .002732   .076923   .017505    -.112525    .452572   -.565097
south           .29235   .318681   .297593    -.011456    .046074    -.05753

Common support (treated)                          Ratio
Variances      Matched  Unmatc~d     Total     (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058   .219536   .218674     1.00176    1.00394    .997824
ttl_exp        19.4198   25.2474   20.5898     .943177    1.22621     .76918
tenure         38.3324   31.9242   37.2044     1.03032    .858076    1.20073
3.industry     .002732   .021734   .006536     .418045    3.32537    .125714
4.industry     .155101   .131624   .150351     1.03159    .875443    1.17837
5.industry     .059054   .201465   .094207     .626851    2.13854    .293122
6.industry     .054233         0   .043936     1.23435          0          .
7.industry     .018811   .021734   .019348     .972252     1.1233    .865531
8.industry      .00545   .062271   .017237     .316157    3.61269    .087513
9.industry     .010839   .010989   .010845     .999464    1.01328    .986361
10.industry          0   .021734   .004367           0    4.97709          0
11.industry    .247691    .14652   .250115     .990307    .585811    1.69049
12.industry    .084497   .185348   .107758     .784137    1.72003    .455885
2.race         .184542   .219536   .221726     .832297    .990121    .840601
3.race         .002732   .071795   .017237     .158513    4.16522    .038056
south          .207448   .219536    .20949     .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
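The diagnostic columns can be checked by hand. The collgrad row (means of about 0.3224, 0.3187, and 0.3217, with total variance of about 0.2187) is consistent with each standardized difference being the difference in means divided by the standard deviation in the total treated group. That scaling is inferred from the numbers on the slide, not taken from the kmatch documentation, so treat it as an assumption; the function name `std_diff` is illustrative.

```python
import math

def std_diff(mean_a, mean_b, var_ref):
    """Difference in means scaled by the reference standard deviation."""
    return (mean_a - mean_b) / math.sqrt(var_ref)

# collgrad row: means and variances for (1) matched, (2) unmatched, (3) total
m_matched, m_unmatched, m_total = 0.322404, 0.318681, 0.321663
v_matched, v_unmatched, v_total = 0.219058, 0.219536, 0.218674

d13 = std_diff(m_matched, m_total, v_total)      # column (1)-(3)
d23 = std_diff(m_unmatched, m_total, v_total)    # column (2)-(3)
d12 = std_diff(m_matched, m_unmatched, v_total)  # column (1)-(2)
ratio13 = v_matched / v_total                    # variance ratio (1)/(3)

print(f"{d13:.6f} {d23:.6f} {d12:.6f} {ratio13:.5f}")
```

Because all three differences share the same scaling, column (1)-(2) equals column (1)-(3) minus column (2)-(3), which is a quick way to sanity-check any row of the table.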

Some Examples

Make a graph of the common-support stats:

. mat M = r(M)

. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences, one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), with a reference line at 0; title: Std. difference]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1,852
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls               Band-
             Yes     No   Total     Used  Unused   Total   width
  Treated    432     25     457     1104     291    1395  1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1,852
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls               Band-
             Yes     No   Total     Used  Unused   Total   width
  Treated    432     25     457     1104     291    1395  1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching

Number of obs = 1,853
Replications  = 50
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

               Matched                Controls               Band-
               Yes     No   Total     Used  Unused   Total   width
  0  Treated   306     15     321      625     120     745  1.3199
  1  Treated   126     10     136      473     178     651  1.3398

Treatment-effects estimation

           Observed   Bootstrap                        Normal-based
wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0  ATT     .4586332    .2763358   1.66    0.097    -.082975    1.000241
1  ATT     .9518705     .406903   2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)        .4932373    .4452343   1.11    0.268    -.379406    1.365881
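The `test` and `lincom` results describe the same comparison: the difference between the two subgroup ATTs divided by its standard error gives z, and z squared is the chi-squared statistic with one degree of freedom. The sketch below redoes that arithmetic from the reported numbers. It takes the standard error of the difference as given (Stata derives it from the bootstrap covariance matrix), and the function name `wald_diff` is made up for illustration.

```python
from statistics import NormalDist

def wald_diff(b0, b1, se_diff):
    """Difference b1 - b0 with its z, chi2(1), and two-sided p-value."""
    diff = b1 - b0
    z = diff / se_diff
    chi2 = z * z                      # Wald chi2 with 1 degree of freedom
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, chi2, p

# Subgroup ATTs (south = 0 and south = 1) and the lincom standard error
diff, z, chi2, p = wald_diff(0.4586332, 0.9518705, 0.4452343)
print(f"diff = {diff:.7f}, z = {z:.2f}, chi2 = {chi2:.2f}, p = {p:.4f}")
```

With a p-value of about 0.27, the data give no clear evidence that the union effect differs between the South and the rest of the country, even though only one of the two subgroup estimates is individually significant.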

Simulation

Population data from Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score for untreated and treated; x axis: Propensity score; y axis: Density]

McFadden R2 = 0.121

Simulation

Raw mean difference in occupational prestige (NATE): −4.79

Population ATT (computed from fully stratified data): −3.96

There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: outcome by propensity score for untreated and treated; x axis: Propensity score; y axis: Outcome]

[Figure, right panel: treatment effect by propensity score; x axis: Propensity score; y axis: Treatment effect]
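The population quantities on this slide are tied together by a simple identity: the ATE is the average of the ATT and the ATC, weighted by the treated and untreated shares. The short check below uses the reported values with a treated share of 17.5%; the variable names are illustrative only.

```python
p_treated = 0.175          # share of resident aliens in the population
att, atc = -3.96, -3.41    # population ATT and ATC
nate = -4.79               # raw mean difference

# ATE as the share-weighted average of ATT and ATC
ate = p_treated * att + (1 - p_treated) * atc

# Part of the raw difference that matching on the covariates has to remove
selection_bias = nate - att

print(round(ate, 2), round(selection_bias, 2))
```

The weighted average reproduces the reported ATE of about −3.51, and the gap between NATE and ATT (about −0.83) is the bias that the matching estimators in the simulation are meant to remove.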

Results: Variance

[Figure: variance of the ATT estimates by matching algorithm (nearest-neighbor matching with 1 neighbor and 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction; panels: N = 500 and N = 5000]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm (nearest-neighbor matching with 1 neighbor and 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction; panels: N = 500 and N = 5000]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error by matching algorithm (nearest-neighbor matching with 1 neighbor and 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction; panels: N = 500 and N = 5000]

Results: Relative standard error

[Figure: relative standard error by estimator (nearest-neighbor matching with teffects standard errors, 1 and 5 neighbors; nearest-neighbor matching with bootstrap standard errors, 1 and 5 neighbors; kernel matching with bootstrap standard errors for the five bandwidth choices); series: MDM and PSM, each with and without bias correction; panels: N = 500 and N = 5000]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for the same estimators and series as in the relative standard error figure; panels: N = 500 and N = 5000]
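The result figures summarize Monte Carlo replications by variance, bias reduction, mean squared error, relative standard error, and coverage. The slides do not spell out the formulas, so the sketch below uses common definitions as assumptions: bias reduction is the share of the raw bias (NATE minus ATT) removed by the estimator, relative standard error is the mean estimated standard error divided by the standard deviation of the estimates, and coverage uses normal-based 95% intervals. The function name and the toy inputs are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def summarize(estimates, std_errs, att, nate, crit=1.96):
    """Monte Carlo summary of an ATT estimator (assumed definitions)."""
    m = mean(estimates)
    bias = m - att
    return {
        "variance": pvariance(estimates),
        "bias": bias,
        # 100 means the raw bias (nate - att) is removed completely
        "bias_reduction": 100 * (nate - m) / (nate - att),
        "mse": mean((e - att) ** 2 for e in estimates),
        # 1 means the estimated SEs match the sampling variability on average
        "rel_se": mean(std_errs) / pstdev(estimates),
        "coverage": mean(
            1.0 if abs(e - att) <= crit * s else 0.0
            for e, s in zip(estimates, std_errs)
        ),
    }

# Toy replications (hypothetical numbers), with the population ATT and NATE
r = summarize([-4.0, -3.9, -4.1, -3.8], [0.1, 0.1, 0.1, 0.1], att=-3.96, nate=-4.79)
print(r)
```

With these definitions the mean squared error decomposes into variance plus squared bias, which is why an estimator with a small persistent bias can still win on MSE at N = 500 and lose at N = 5000.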

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
  MDM leaves less scope for post-matching modeling decision biases.
  Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
  Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
  Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Some conclusions from the simulation:
  For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
  Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
  Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.

Some Examples Nearest-neighbor matching (1 neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 1

Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 328 1068 1396

Treatment-effects estimation

wage Coef

ATT 7246969

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 7246969 2942952 246 0014 147889 1301505

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 42

Some Examples Nearest-neighbor matching (5 neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5590823

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 43

Some Examples Bias-correction regression adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853Neighbors min = 5

Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 457 0 457 870 526 1396

Treatment-effects estimation

wage Coef

ATT 5288023

adjusted for collgrad ttl_exp tenure iindustry irace south

teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)

Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6

AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]

ATETunion

(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 44

Some Examples

Mahalanobis-distance and propensity-score matching combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45

Some Examples

Exact matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46

Some Examples

Bandwidth selection the default (based on distribution of distances in

one-nearest-neighbor matching)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 47

Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

3

02

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48

Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49

Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

4

1112

1314

Wei

ghte

d M

ISE

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50

Some Examples Common-support diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 51

Some Examples make a graph of the common-support stats mat M = r(M)

coefplot matrix(M[4]) title(Std difference) noci nolabels xline(0)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 52

Some Examples

Multiple outcome variables

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 53

Some Examples Multiple outcome variables with different regression-adjustment equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 54

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |      Matched                 Controls            Band-
           |  Yes     No    Total     Used   Unused    Total   width
0  Treated |  306     15      321      625      120      745  1.3199
1  Treated |  126     10      136      473      178      651  1.3398

Treatment-effects estimation

        wage |  Observed   Bootstrap                     Normal-based
             |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0        ATT |  .4586332   .2763358    1.66   0.097    -.082975   1.000241
1        ATT |  .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
         (1) |  .4932373   .4452343    1.11   0.268    -.379406   1.365881
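The lincom row for the difference between the two subpopulation ATTs can be checked by hand. The following is a minimal Python sketch (not part of kmatch or Stata), assuming the decimal points lost in the printed output belong where the arithmetic is consistent: coefficient .4932373 and bootstrap standard error .4452343. The function name wald_summary is hypothetical.

```python
import math

def wald_summary(coef, se, crit=1.959964):
    """Return the z statistic, two-sided normal p-value, and 95% CI bounds."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return z, p, (coef - crit * se, coef + crit * se)

# Values read off the lincom output (decimal placement is an assumption).
z, p, (lower, upper) = wald_summary(0.4932373, 0.4452343)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{lower:.6f}, {upper:.6f}]")
```

The squared z statistic should also agree with the chi2(1) value reported by test, since both test the same single restriction.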

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to working people between 24 and 60 years old.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis: propensity score, 0 to 1; y-axis: density) for untreated and treated. McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels, both with the propensity score (0 to 0.8) on the x-axis. First panel: outcome (about 30 to 55) for untreated and treated. Second panel: treatment effect (about -6 to -1).]
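The three population effects are linked by a simple identity: the ATE is the average of ATT and ATC, weighted by the share of treated and untreated units. A minimal Python sketch of that check follows; it assumes the treated share is 17.5% and that the effects are -3.96 (ATT) and -3.41 (ATC), which is how the digits on the slides read once decimal points are restored. The function name is hypothetical.

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the treated-share-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc

# Population values from the simulation slides (decimal placement assumed).
ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)
print(f"implied ATE = {ate:.2f}")
```

The implied value can be compared with the ATE reported on the slide; agreement supports the reading of "175" as 17.5%.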

Results: Variance

[Figure: variance of the estimates, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels N = 500 (x-axis 70 to 130) and N = 5000 (x-axis 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
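The slides do not spell out how "bias reduction (in percent)" is computed. One common definition, used here purely as an assumption, is the share of the raw bias (NATE minus ATT) that an estimate removes, so that 100 means the estimate equals the population ATT and values above 100 mean overcorrection. A minimal Python sketch with the population values from the simulation (NATE = -4.79, ATT = -3.96, decimal placement assumed); the estimate of -3.80 is a made-up illustration, not a simulation result.

```python
def bias_reduction_pct(estimate, nate, att):
    """Percent of the raw bias (nate - att) removed by an estimate."""
    return 100.0 * (nate - estimate) / (nate - att)

NATE, ATT = -4.79, -3.96  # population values from the simulation slides

# Hypothetical estimates: no adjustment, exact recovery, slight overcorrection.
for est in (NATE, ATT, -3.80):
    print(f"estimate {est:6.2f} -> {bias_reduction_pct(est, NATE, ATT):6.1f}%")
```

Under this definition, values scattered around 100 in the small sample and a stable offset from 100 in the large sample would correspond to the non-vanishing PSM bias described in the note.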

Results: Mean squared error

[Figure: mean squared error of the estimates, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000. Rows and series as in the relative standard error figure.]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.


Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1853
                                               Neighbors: min = 5
                                                          max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched                 Controls            Band-
           |  Yes     No    Total     Used   Unused    Total   width
   Treated |  457      0      457      870      526     1396       .

Treatment-effects estimation

        wage |     Coef.
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                     |             AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET           union |
 (union vs nonunion) |  .5590823   .2381752    2.35   0.019    .0922675   1.025897
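To make explicit what nearest-neighbor matching on the Mahalanobis distance computes, here is a minimal pure-Python sketch on synthetic data. It is an illustration under stated assumptions, not the kmatch or teffects implementation: it handles exactly two covariates, scales distances by the covariance matrix of the pooled sample (the scaling actually used by either command may differ), matches with replacement, and ignores ties. All function names and the simulated data are hypothetical.

```python
import random
import statistics

def inverse_cov_2d(points):
    """Inverse of the 2x2 sample covariance matrix of (x1, x2) pairs."""
    x1 = [p[0] for p in points]
    x2 = [p[1] for p in points]
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    n = len(points)
    v11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
    v22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
    v12 = sum((a - m1) * (b - m2) for a, b in points) / (n - 1)
    det = v11 * v22 - v12 ** 2
    return ((v22 / det, -v12 / det), (-v12 / det, v11 / det))

def mahalanobis_sq(u, v, s_inv):
    """Squared Mahalanobis distance between two 2-D points."""
    d1, d2 = u[0] - v[0], u[1] - v[1]
    return (d1 * (s_inv[0][0] * d1 + s_inv[0][1] * d2)
            + d2 * (s_inv[1][0] * d1 + s_inv[1][1] * d2))

def nn_att(x, d, y, k=5):
    """ATT from k-nearest-neighbor matching with replacement."""
    s_inv = inverse_cov_2d(x)  # assumption: pooled-sample covariance as scaling matrix
    controls = [j for j, dj in enumerate(d) if dj == 0]
    effects = []
    for i, di in enumerate(d):
        if di != 1:
            continue
        # The k controls closest to treated unit i form its counterfactual.
        nearest = sorted(controls, key=lambda j: mahalanobis_sq(x[i], x[j], s_inv))[:k]
        counterfactual = statistics.fmean(y[j] for j in nearest)
        effects.append(y[i] - counterfactual)
    return statistics.fmean(effects)

# Synthetic data with a constant treatment effect of 2 (illustration only).
rng = random.Random(1)
x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
d = [1 if rng.random() < (0.5 if xi[0] > 0 else 0.2) else 0 for xi in x]
y = [1 + xi[0] + 0.5 * xi[1] + 2 * di + rng.gauss(0, 1) for xi, di in zip(x, d)]
print(f"5-NN Mahalanobis ATT estimate: {nn_att(x, d, y, k=5):.3f}")
```

Because treatment depends on the first covariate, the matched controls tend to differ slightly from the treated units on it, which is the residual imbalance that the bias-correction (regression-adjustment) option addresses.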

Some Examples

Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1853
                                               Neighbors: min = 5
                                                          max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched                 Controls            Band-
           |  Yes     No    Total     Used   Unused    Total   width
   Treated |  457      0      457      870      526     1396       .

Treatment-effects estimation

        wage |     Coef.
         ATT |  .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                     |             AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET           union |
 (union vs nonunion) |  .5288023   .2420635    2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |      Matched                 Controls            Band-
           |  Yes     No    Total     Used   Unused    Total   width
   Treated |  439     18      457     1258      138     1396  .83886

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |      Matched                 Controls            Band-
           |  Yes     No    Total     Used   Unused    Total   width
   Treated |  432     25      457     1103      293     1396  1.3013

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched                 Controls            Band-
           |  Yes     No    Total     Used   Unused    Total   width
   Treated |  432     25      457     1105      291     1396  1.3394

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6059013
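The "Matched: Yes/No" columns in the kernel-matching output follow directly from the bandwidth: a treated unit with no control inside the bandwidth gets zero total weight and drops out. A minimal Python sketch of Epanechnikov kernel matching for the ATT, assuming a generic distance function (the distance between two scores here stands in for the Mahalanobis distance); it illustrates the principle and is not the kmatch implementation. All names and the toy data are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel, zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_att(dist, d, y, bandwidth):
    """ATT by kernel matching; returns (att, n_matched, n_unmatched)."""
    treated = [i for i, di in enumerate(d) if di == 1]
    controls = [j for j, dj in enumerate(d) if dj == 0]
    effects, unmatched = [], 0
    for i in treated:
        weights = [epanechnikov(dist(i, j) / bandwidth) for j in controls]
        total = sum(weights)
        if total == 0.0:
            unmatched += 1  # no control within the bandwidth: off support
            continue
        counterfactual = sum(w * y[j] for w, j in zip(weights, controls)) / total
        effects.append(y[i] - counterfactual)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, len(effects), unmatched

# Toy data: two treated units (first two entries) and three controls.
score = [0.0, 10.0, -0.5, 0.5, 3.0]
d = [1, 1, 0, 0, 0]
y = [4.0, 50.0, 1.0, 3.0, 100.0]
att, n_matched, n_unmatched = kernel_att(
    lambda i, j: abs(score[i] - score[j]), d, y, bandwidth=1.0
)
print(att, n_matched, n_unmatched)
```

A larger bandwidth matches more treated units and uses more controls, at the price of comparing less similar observations; this is the trade-off that the bandwidth-selection options address.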

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched                 Controls            Band-
           |  Yes     No    Total     Used   Unused    Total   width
   Treated |  448      9      457     1184      212     1396  1.8888

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (1.5 to 3); evaluation points labeled by index.]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched                 Controls            Band-
           |  Yes     No    Total     Used   Unused    Total   width
   Treated |  453      4      457     1289      107     1396   2.433

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (1.5 to 3); evaluation points labeled by index.]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched                 Controls            Band-
           |  Yes     No    Total     Used   Unused    Total   width
   Treated |  455      2      457     1356       40     1396  2.7626

Treatment-effects estimation

        wage |     Coef.
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5); evaluation points labeled by index.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched                 Controls            Band-
           |  Yes     No    Total     Used   Unused    Total   width
   Treated |  366     91      457      701      695     1396      .5

Treatment-effects estimation

        wage |     Coef.
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)
             |             Means              |    Standardized difference
             |  Matched  Unmatched     Total  |   (1)-(3)    (2)-(3)    (1)-(2)
    collgrad |  .322404    .318681   .321663  |   .001585   -.006376    .007962
     ttl_exp |  13.3929    12.7682   13.2685  |   .027413   -.110253    .137666
      tenure |  8.12614    6.95055   7.89205  |   .038378   -.154356    .192734
  3.industry |  .002732    .021978   .006565  |  -.047404    .190657   -.238061
  4.industry |  .191257    .153846   .183807  |   .019212   -.077269    .096481
  5.industry |  .062842    .274725   .105033  |  -.137462    .552867   -.690329
  6.industry |  .057377          0   .045952  |   .054507   -.219225    .273732
  7.industry |  .019126    .021978   .019694  |  -.004083    .016423   -.020506
  8.industry |  .005464    .065934   .017505  |  -.091714    .368871   -.460585
  9.industry |  .010929    .010989   .010941  |  -.000115    .000462   -.000577
 10.industry |        0    .021978   .004376  |  -.066227    .266363   -.332589
 11.industry |  .554645    .175824   .479212  |    .15083   -.606636    .757467
 12.industry |  .092896    .241758   .122538  |  -.090299    .363181    -.45348
      2.race |  .243169    .681319   .330416  |  -.185284    .745209   -.930494
      3.race |  .002732    .076923   .017505  |  -.112525    .452572   -.565097
       south |   .29235    .318681   .297593  |  -.011456    .046074    -.05753

Common support (treated)
             |           Variances            |             Ratio
             |  Matched  Unmatched     Total  |   (1)/(3)    (2)/(3)    (1)/(2)
    collgrad |  .219058    .219536   .218674  |   1.00176    1.00394    .997824
     ttl_exp |  19.4198    25.2474   20.5898  |   .943177    1.22621     .76918
      tenure |  38.3324    31.9242   37.2044  |   1.03032    .858076    1.20073
  3.industry |  .002732    .021734   .006536  |   .418045    3.32537    .125714
  4.industry |  .155101    .131624   .150351  |   1.03159    .875443    1.17837
  5.industry |  .059054    .201465   .094207  |   .626851    2.13854    .293122
  6.industry |  .054233          0   .043936  |   1.23435          0          .
  7.industry |  .018811    .021734   .019348  |   .972252     1.1233    .865531
  8.industry |   .00545    .062271   .017237  |   .316157    3.61269    .087513
  9.industry |  .010839    .010989   .010845  |   .999464    1.01328    .986361
 10.industry |        0    .021734   .004367  |         0    4.97709          0
 11.industry |  .247691     .14652   .250115  |   .990307    .585811    1.69049
 12.industry |  .084497    .185348   .107758  |   .784137    1.72003    .455885
      2.race |  .184542    .219536   .221726  |   .832297    .990121    .840601
      3.race |  .002732    .071795   .017237  |   .158513    4.16522    .038056
       south |  .207448    .219536    .20949  |   .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
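The standardized differences in the common-support table can be reproduced from the means and variances it reports. The sketch below assumes, based only on the numbers in the table and not on the kmatch documentation, that each difference in means is divided by the standard deviation of the total treated group. It uses the collgrad row with decimal points restored; the function name is hypothetical.

```python
import math

def std_diff(mean_a, mean_b, var_ref):
    """Difference in means scaled by the standard deviation of a reference group."""
    return (mean_a - mean_b) / math.sqrt(var_ref)

# collgrad row: means of matched, unmatched, and all treated; variance of all treated.
matched, unmatched, total, var_total = 0.322404, 0.318681, 0.321663, 0.218674

d13 = std_diff(matched, total, var_total)      # column (1)-(3)
d23 = std_diff(unmatched, total, var_total)    # column (2)-(3)
d12 = std_diff(matched, unmatched, var_total)  # column (1)-(2)
print(f"{d13:.6f} {d23:.6f} {d12:.6f}")
```

Large values in the (2)-(3) column flag covariates on which the unmatched treated units differ from the treated group as a whole, which is where restricting to the common support changes the population the ATT refers to.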

Some Examples

. // make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" (x-axis from -.2 to .2, reference line at 0) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]

  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 57: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Some Examples

Bias correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs    = 1853
Neighbors:   min = 5
             max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
   Treated |  457     0     457   |   870     526   1396 |

Treatment-effects estimation

        wage |      Coef.
         ATT |   .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation
Number of obs      = 1853
Estimator          : nearest-neighbor matching
Outcome model      : matching
Distance metric    : Mahalanobis
Matches: requested = 5
               min = 5
               max = 6

                      |             AI Robust
                 wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
ATET  union           |
  (union vs nonunion) |   .5288023   .2420635    2.18   0.029    .0543666    1.003238

Ben Jann (University of Bern) Propensity Scores Matching Berlin, 23.06.2017 44

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
   Treated |  439    18     457   |  1258     138   1396 | .83886

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
   Treated |  432    25     457   |  1103     293   1396 | 1.3013

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
   Treated |  432    25     457   |  1105     291   1396 | 1.3394

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6059013
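To make the idea behind kernel matching with a bandwidth derived from nearest-neighbor distances concrete, here is a minimal Python sketch. It is not kmatch's implementation: the function names are hypothetical, and the rule "factor times a quantile of the nonzero nearest-control distances" with q = 0.90 and factor = 1.5 is an assumption made for illustration only. The Epanechnikov kernel corresponds to the "Kernel = epan" line in the output; treated units with no control inside the bandwidth are counted as unmatched.

```python
import numpy as np

def mahalanobis_distances(X_t, X_c, cov):
    """Pairwise Mahalanobis distances: rows = treated, columns = controls."""
    VI = np.linalg.inv(np.atleast_2d(cov))
    diff = X_t[:, None, :] - X_c[None, :, :]            # shape (n_t, n_c, k)
    return np.sqrt(np.einsum("ijk,kl,ijl->ij", diff, VI, diff))

def pair_matching_bandwidth(D, q=0.90, factor=1.5):
    """Bandwidth from the distances to each treated unit's nearest control.

    q and factor are illustrative assumptions, not values taken from the slides.
    """
    nearest = D.min(axis=1)
    nonzero = nearest[nearest > 0]                      # ignore exact matches
    return factor * np.quantile(nonzero, q)

def kernel_att(y_t, y_c, D, h):
    """ATT from Epanechnikov kernel matching with bandwidth h.

    Returns the estimate and the number of matched treated units.
    """
    u = D / h
    W = np.where(u < 1, 0.75 * (1 - u**2), 0.0)         # zero weight outside h
    matched = W.sum(axis=1) > 0                         # treated with >= 1 control
    y0_hat = (W[matched] @ y_c) / W[matched].sum(axis=1)
    return np.mean(y_t[matched] - y0_hat), int(matched.sum())
```

A smaller bandwidth leaves more treated units unmatched, and a larger one uses more distant controls, which is the trade-off the different bandwidth selectors on the following slides address.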

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
   Treated |  448     9     457   |  1184     212   1396 | 1.8888

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; vertical axis: MSE, horizontal axis: Bandwidth]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
   Treated |  453     4     457   |  1289     107   1396 |  2.433

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; vertical axis: MISE, horizontal axis: Bandwidth]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
   Treated |  455     2     457   |  1356      40   1396 | 2.7626

Treatment-effects estimation

        wage |      Coef.
         ATT |   .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; vertical axis: Weighted MISE, horizontal axis: Bandwidth]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
   Treated |  366    91     457   |   701     695   1396 |     .5

Treatment-effects estimation

        wage |      Coef.
         ATT |   .3303161

. kmatch csummarize
(refitting the model using the generate() option)

 Common support |              Means               |    Standardized difference
      (treated) |   Matched   Unmatc~d      Total  |   (1)-(3)    (2)-(3)    (1)-(2)
----------------+----------------------------------+---------------------------------
       collgrad |   .322404    .318681    .321663  |   .001585   -.006376    .007962
        ttl_exp |   13.3929    12.7682    13.2685  |   .027413   -.110253    .137666
         tenure |   8.12614    6.95055    7.89205  |   .038378   -.154356    .192734
     3.industry |   .002732    .021978    .006565  |  -.047404    .190657   -.238061
     4.industry |   .191257    .153846    .183807  |   .019212   -.077269    .096481
     5.industry |   .062842    .274725    .105033  |  -.137462    .552867   -.690329
     6.industry |   .057377          0    .045952  |   .054507   -.219225    .273732
     7.industry |   .019126    .021978    .019694  |  -.004083    .016423   -.020506
     8.industry |   .005464    .065934    .017505  |  -.091714    .368871   -.460585
     9.industry |   .010929    .010989    .010941  |  -.000115    .000462   -.000577
    10.industry |         0    .021978    .004376  |  -.066227    .266363   -.332589
    11.industry |   .554645    .175824    .479212  |    .15083   -.606636    .757467
    12.industry |   .092896    .241758    .122538  |  -.090299    .363181    -.45348
         2.race |   .243169    .681319    .330416  |  -.185284    .745209   -.930494
         3.race |   .002732    .076923    .017505  |  -.112525    .452572   -.565097
          south |    .29235    .318681    .297593  |  -.011456    .046074    -.05753

 Common support |            Variances             |              Ratio
      (treated) |   Matched   Unmatc~d      Total  |   (1)/(3)    (2)/(3)    (1)/(2)
----------------+----------------------------------+---------------------------------
       collgrad |   .219058    .219536    .218674  |   1.00176    1.00394    .997824
        ttl_exp |   19.4198    25.2474    20.5898  |   .943177    1.22621     .76918
         tenure |   38.3324    31.9242    37.2044  |   1.03032    .858076    1.20073
     3.industry |   .002732    .021734    .006536  |   .418045    3.32537    .125714
     4.industry |   .155101    .131624    .150351  |   1.03159    .875443    1.17837
     5.industry |   .059054    .201465    .094207  |   .626851    2.13854    .293122
     6.industry |   .054233          0    .043936  |   1.23435          0          .
     7.industry |   .018811    .021734    .019348  |   .972252     1.1233    .865531
     8.industry |    .00545    .062271    .017237  |   .316157    3.61269    .087513
     9.industry |   .010839    .010989    .010845  |   .999464    1.01328    .986361
    10.industry |         0    .021734    .004367  |         0    4.97709          0
    11.industry |   .247691     .14652    .250115  |   .990307    .585811    1.69049
    12.industry |   .084497    .185348    .107758  |   .784137    1.72003    .455885
         2.race |   .184542    .219536    .221726  |   .832297    .990121    .840601
         3.race |   .002732    .071795    .017237  |   .158513    4.16522    .038056
          south |   .207448    .219536     .20949  |   .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
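The two diagnostics in the csummarize output can be recomputed by hand. The following Python sketch uses hypothetical function names and is not kmatch's code. It assumes, as the printed numbers suggest, that each mean difference is divided by the standard deviation of the total treated group, column (3) of the variance table.

```python
import math

def std_diff(mean_a, mean_b, var_ref):
    """Difference in means scaled by the standard deviation of a reference group.

    var_ref is the variance of the reference group; here the total treated
    group is used, which is an inference from the printed table.
    """
    return (mean_a - mean_b) / math.sqrt(var_ref)

def var_ratio(var_a, var_b):
    """Ratio of two group variances; values near 1 indicate similar spread."""
    return var_a / var_b

# ttl_exp row: matched vs. total treated group
d_ttl_exp = std_diff(13.3929, 13.2685, 20.5898)
r_ttl_exp = var_ratio(19.4198, 20.5898)
```

Under this assumption the ttl_exp row is reproduced up to rounding of the displayed means: a standardized difference of about .0274 and a variance ratio of about .9432 for matched versus total.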

Some Examples

Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" (horizontal axis from -.2 to .2) with one row each for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
   Treated |  432    25     457   |  1104     291   1395 | 1.3392

Treatment-effects estimation

             |      Coef.
wage         |
         ATT |   .6021049
        NATE |   1.430823
hours        |
         ATT |   1.263759
        NATE |   1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
   Treated |  432    25     457   |  1104     291   1395 | 1.3392

Treatment-effects estimation

             |      Coef.
wage         |
         ATT |   .5152752
        NATE |   1.430823
hours        |
         ATT |   1.263759
        NATE |   1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching

Number of obs = 1853
Replications  = 50
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |     Matched          |      Controls        |  Band-
           |  Yes    No   Total   |  Used  Unused  Total |  width
0  Treated |  306    15     321   |   625     120    745 | 1.3199
1  Treated |  126    10     136   |   473     178    651 | 1.3398

Treatment-effects estimation

             |   Observed   Bootstrap                         Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
0        ATT |   .4586332   .2763358    1.66   0.097    -.082975    1.000241
1        ATT |   .9518705    .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         (1) |   .4932373   .4452343    1.11   0.268    -.379406    1.365881
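The test and lincom results can be retraced from the printed numbers. The worked step below takes the standard error of the difference directly from the lincom output, since it reflects the bootstrap covariance between the two subgroup estimates and cannot be derived from the two separate standard errors alone.

```latex
\begin{align*}
\widehat{\Delta} &= \widehat{\mathrm{ATT}}_{\text{south}=1} - \widehat{\mathrm{ATT}}_{\text{south}=0}
                  = 0.9518705 - 0.4586332 = 0.4932373, \\
z &= \frac{\widehat{\Delta}}{\widehat{\mathrm{se}}(\widehat{\Delta})}
   = \frac{0.4932373}{0.4452343} \approx 1.11, \\
\chi^2(1) &= z^2 \approx 1.23 .
\end{align*}
```

Both commands therefore carry the same information: the difference between the two subpopulation effects is not statistically significant at conventional levels (p of about 0.27).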

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to working people between 24 and 60 years old.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
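The result slides report variance, bias reduction, and mean squared error for each estimator across the simulated samples. As a rough sketch of how such summaries can be computed from a vector of replicate estimates, consider the Python function below. The function name is hypothetical, and the definition of bias reduction (share of the raw bias NATE minus ATT that the estimator removes, in percent) is an assumption, because the slides do not spell it out.

```python
import numpy as np

def summarize_estimator(estimates, att, nate):
    """Monte Carlo summaries of one estimator over repeated samples.

    estimates : ATT estimates, one per simulated sample
    att       : true population ATT
    nate      : raw (naive) mean difference in the population
    """
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - att
    variance = est.var()                      # divisor n, so mse = variance + bias**2
    mse = np.mean((est - att) ** 2)
    raw_bias = nate - att                     # bias of the unadjusted comparison
    bias_reduction = 100.0 * (1.0 - bias / raw_bias)   # assumed definition
    return {"bias": bias, "variance": variance, "mse": mse,
            "bias_reduction": bias_reduction}

# Made-up replicate estimates, evaluated against the population values on the slides
summary = summarize_estimator([-4.0, -3.9, -4.1, -3.8], att=-3.96, nate=-4.79)
```

With this definition, a value above 100 percent means the estimator overcorrects, which fits the scale of the bias-reduction figure extending beyond 100.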

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (horizontal axis from 0 to 1) for untreated and treated]

McFadden R2 = 0.121

Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, two panels: outcome by propensity score for untreated and treated; treatment effect by propensity score]
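The three population effects are linked by a standard identity: the ATE is the average of the ATT and the ATC, weighted by the shares of treated and untreated units. Taking the treated share as p = 0.175 (the 17.5% reported for the population), the figures on this slide are mutually consistent:

```latex
\begin{align*}
\mathrm{ATE} &= p \cdot \mathrm{ATT} + (1 - p) \cdot \mathrm{ATC} \\
             &= 0.175 \times (-3.96) + 0.825 \times (-3.41) \\
             &\approx -0.693 - 2.813 \approx -3.51 .
\end{align*}
```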

Results: Variance

[Figure, two panels (N = 500 and N = 5000): variance of each estimator. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure, two panels (N = 500 and N = 5000): bias reduction in percent for each estimator. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure, two panels (N = 500 and N = 5000): mean squared error of each estimator. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Relative standard error

[Figure, two panels (N = 500 and N = 5000): relative standard error of each estimator. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure, two panels (N = 500 and N = 5000): coverage of 95% confidence intervals for each estimator. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"

5 Are King and Nielsen right?

6 Illustration using kmatch

7 Conclusions

Conclusions

The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error and confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.

  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 58: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Some Examples

Mahalanobis-distance and propensity-score matching combined

kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 439 18 457 1258 138 1396 83886

Treatment-effects estimation

wage Coef

ATT 6408443

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45

Some Examples

Exact matching

kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1103 293 1396 13013

Treatment-effects estimation

wage Coef

ATT 6047374

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46

Some Examples

Bandwidth selection the default (based on distribution of distances in

one-nearest-neighbor matching)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 47

Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

3

02

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48

Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49

Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

4

1112

1314

Wei

ghte

d M

ISE

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50

Some Examples Common-support diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 51

Some Examples make a graph of the common-support stats mat M = r(M)

coefplot matrix(M[4]) title(Std difference) noci nolabels xline(0)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 52

Some Examples

Multiple outcome variables

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 53

Some Examples Multiple outcome variables with different regression-adjustment equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 54

Some Examples Treatment effects by subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2763358 166 0097 -082975 1000241

1ATT 9518705 406903 234 0019 1543553 1749386

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 123Prob gt chi2 = 02679

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4452343 111 0268 -379406 1365881

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 55

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 1000 or 5000) from populationand compute various matching estimators

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 56

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 57

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 58

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 59


In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM and PSM, each with and without bias correction]


Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
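The two remedies mentioned in this note (a richer propensity-score specification and post-matching regression adjustment) might look as follows in kmatch. This is a sketch, not the specification used in the simulation: the variable names alien, female, age, educ, and prestige are hypothetical stand-ins for the census variables, and the particular refinement (quadratic age, gender-by-education interaction) is an assumption chosen for illustration.

```stata
* Sketch only: variable names are hypothetical stand-ins for the census data

* simple specification as described in the note: linear age, no interactions
kmatch ps alien i.female age i.educ (prestige), att

* one possible refinement: quadratic age, gender-by-education interaction
kmatch ps alien i.female##i.educ c.age##c.age (prestige), att

* simple specification plus post-matching regression adjustment
kmatch ps alien i.female age i.educ (prestige = i.female age i.educ), att
```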

Results: Mean squared error

[Figure: mean squared error of the estimators, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM and PSM, each with and without bias correction]

Results: Relative standard error

[Figure: relative standard error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM and PSM, each with and without bias correction]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement)

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small)
- Fewer restrictions in terms of possible post-matching analyses

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated)
- Computational complexity

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching (But don't use pair matching anyhow)
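To make the choice between matching algorithms concrete, the calls below contrast nearest-neighbor and kernel matching on the propensity score, with Mahalanobis-distance kernel matching for comparison. This is a sketch under assumptions: it presumes the examples in this talk use Stata's shipped nlsw88 data (the variable names match, but the slides shown here do not say so), and it uses a reduced covariate list for brevity.

```stata
* Sketch only: assumes the examples are based on Stata's nlsw88 data
sysuse nlsw88, clear

* propensity-score matching with three algorithms
kmatch ps union collgrad ttl_exp tenure south (wage), att nn(1)   // 1 nearest neighbor
kmatch ps union collgrad ttl_exp tenure south (wage), att nn(5)   // 5 nearest neighbors
kmatch ps union collgrad ttl_exp tenure south (wage), att         // kernel matching

* Mahalanobis-distance kernel matching for comparison
kmatch md union collgrad ttl_exp tenure south (wage), att
```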

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms
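The two recommendations from the simulation can be illustrated side by side: analytic standard errors for nearest-neighbor matching via Stata's official teffects psmatch, and bootstrap standard errors for regression-adjusted kernel matching via kmatch. This is a sketch under assumptions: the nlsw88 data and the reduced covariate list are illustrative choices, not the simulation setup.

```stata
* Sketch only: nlsw88 and the covariate list are illustrative assumptions
sysuse nlsw88, clear

* nearest-neighbor PSM with analytic standard errors (official teffects)
teffects psmatch (wage) (union collgrad ttl_exp tenure south), atet nneighbor(5)

* kernel PSM with regression adjustment and bootstrap standard errors
kmatch ps union collgrad ttl_exp tenure south ///
    (wage = collgrad ttl_exp tenure south), att vce(boot)
```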

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   432     25     457  |  1103     293    1396 | 1.3013

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   432     25     457  |  1105     291    1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013

Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   448      9     457  |  1184     212    1396 | 1.8888

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth, with points labeled by evaluation index]

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   453      4     457  |  1289     107    1396 |  2.433

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth, with points labeled by evaluation index]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   455      2     457  |  1356      40    1396 | 2.7626

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth, with points labeled by evaluation index]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   366     91     457  |   701     695    1396 |     .5

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

             |                               |    Standardized difference
       Means |   Matched  Unmatc~d     Total |   (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |   .322404   .318681   .321663 |   .001585  -.006376   .007962
     ttl_exp |   13.3929   12.7682   13.2685 |   .027413  -.110253   .137666
      tenure |   8.12614   6.95055   7.89205 |   .038378  -.154356   .192734
  3.industry |   .002732   .021978   .006565 |  -.047404   .190657  -.238061
  4.industry |   .191257   .153846   .183807 |   .019212  -.077269   .096481
  5.industry |   .062842   .274725   .105033 |  -.137462   .552867  -.690329
  6.industry |   .057377         0   .045952 |   .054507  -.219225   .273732
  7.industry |   .019126   .021978   .019694 |  -.004083   .016423  -.020506
  8.industry |   .005464   .065934   .017505 |  -.091714   .368871  -.460585
  9.industry |   .010929   .010989   .010941 |  -.000115   .000462  -.000577
 10.industry |         0   .021978   .004376 |  -.066227   .266363  -.332589
 11.industry |   .554645   .175824   .479212 |    .15083  -.606636   .757467
 12.industry |   .092896   .241758   .122538 |  -.090299   .363181   -.45348
      2.race |   .243169   .681319   .330416 |  -.185284   .745209  -.930494
      3.race |   .002732   .076923   .017505 |  -.112525   .452572  -.565097
       south |    .29235   .318681   .297593 |  -.011456   .046074   -.05753

             |                               |             Ratio
   Variances |   Matched  Unmatc~d     Total |   (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |   .219058   .219536   .218674 |   1.00176   1.00394   .997824
     ttl_exp |   19.4198   25.2474   20.5898 |   .943177   1.22621    .76918
      tenure |   38.3324   31.9242   37.2044 |   1.03032   .858076   1.20073
  3.industry |   .002732   .021734   .006536 |   .418045   3.32537   .125714
  4.industry |   .155101   .131624   .150351 |   1.03159   .875443   1.17837
  5.industry |   .059054   .201465   .094207 |   .626851   2.13854   .293122
  6.industry |   .054233         0   .043936 |   1.23435         0         .
  7.industry |   .018811   .021734   .019348 |   .972252    1.1233   .865531
  8.industry |    .00545   .062271   .017237 |   .316157   3.61269   .087513
  9.industry |   .010839   .010989   .010845 |   .999464   1.01328   .986361
 10.industry |         0   .021734   .004367 |         0   4.97709         0
 11.industry |   .247691    .14652   .250115 |   .990307   .585811   1.69049
 12.industry |   .084497   .185348   .107758 |   .784137   1.72003   .455885
      2.race |   .184542   .219536   .221726 |   .832297   .990121   .840601
      3.race |   .002732   .071795   .017237 |   .158513   4.16522   .038056
       south |   .207448   .219536    .20949 |   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
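The common-support table can be read with a worked example. The reported values appear consistent with a standardized difference equal to the difference in means divided by the standard deviation among all treated, and with plain variance ratios; this reading is inferred from the numbers themselves (with the stripped decimal points restored), not taken from the kmatch documentation. For the tenure row:

```latex
\[
d_{(1)-(3)}
  = \frac{\bar{x}_{\text{matched}} - \bar{x}_{\text{total}}}{\sqrt{s^2_{\text{total}}}}
  = \frac{8.12614 - 7.89205}{\sqrt{37.2044}}
  \approx \frac{0.23409}{6.0995}
  \approx 0.0384,
\qquad
\frac{s^2_{\text{matched}}}{s^2_{\text{total}}}
  = \frac{38.3324}{37.2044}
  \approx 1.030
\]
```

Both results match the table entries for tenure (.038378 and 1.03032), so matched treated observations differ little from all treated on this covariate, while the unmatched ones differ more.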

Some Examples

Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences (x-axis from -.2 to .2) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   432     25     457  |  1104     291    1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .6021049
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   432     25     457  |  1104     291    1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .5152752
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
0          |
   Treated |   306     15     321  |   625     120     745 | 1.3199
1          |
   Treated |   126     10     136  |   473     178     651 | 1.3398

Treatment-effects estimation

           |  Observed  Bootstrap                       Normal-based
      wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
0          |
       ATT |  .4586332   .2763358   1.66   0.097    -.082975   1.000241
1          |
       ATT |  .9518705    .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1) |  .4932373   .4452343   1.11   0.268    -.379406   1.365881
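The test and lincom results on this slide describe the same contrast between the two subpopulation effects. Assuming the stripped decimal points are read as above (difference .4932373 with bootstrap standard error .4452343), the z statistic from lincom squares to the chi-squared statistic from test:

```latex
\[
z = \frac{0.9518705 - 0.4586332}{0.4452343}
  = \frac{0.4932373}{0.4452343} \approx 1.108,
\qquad
z^2 \approx 1.23 = \chi^2(1)
\]
```

With p = 0.268, the difference in the union effect between south = 0 and south = 1 is not statistically distinguishable from zero in this example.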


Some Examples

Bandwidth selection the default (based on distribution of distances in

one-nearest-neighbor matching)

kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1105 291 1396 13394

Treatment-effects estimation

wage Coef

ATT 6059013

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 47

Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 448 9 457 1184 212 1396 18888

Treatment-effects estimation

wage Coef

ATT 6651578

kmatch cvplot ms(o) index mlabposition(1) sort

1

57 915131411128 102 6

4

3

02

04

06

08

1MSE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48

Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49

Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

4

1112

1314

Wei

ghte

d M

ISE

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50

Some Examples Common-support diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 51

Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south; x-axis "Std. difference", from -.2 to .2]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls
             Yes    No  Total      Used  Unused  Total   Bandwidth
Treated      432    25    457     1,104     291  1,395      1.3392

Treatment-effects estimation

                Coef.
wage
   ATT       .6021049
   NATE      1.430823
hours
   ATT       1.263759
   NATE      1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls
             Yes    No  Total      Used  Unused  Total   Bandwidth
Treated      432    25    457     1,104     291  1,395      1.3392

Treatment-effects estimation

                Coef.
wage
   ATT       .5152752
   NATE      1.430823
hours
   ATT       1.263759
   NATE      1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                    Matched               Controls
                Yes    No  Total      Used  Unused  Total   Bandwidth
0  Treated      306    15    321       625     120    745      1.3199
1  Treated      126    10    136       473     178    651      1.3398

Treatment-effects estimation

              Observed   Bootstrap                       Normal-based
wage             Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0  ATT        .4586332    .2763358   1.66    0.097    -.082975    1.000241
1  ATT        .9518705     .406903   2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage             Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)           .4932373    .4452343   1.11    0.268    -.379406    1.365881
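The test and lincom results above are two views of the same comparison, which a few lines of Python make explicit. The sketch takes the two ATT estimates and the bootstrap standard error of their difference from the output (decimal points restored) and recomputes the z statistic, the chi-squared statistic, the p-value, and the normal-based confidence interval. The standard error of the difference is copied from the lincom output; it is not derived from the two separate standard errors, because it also depends on the covariance of the bootstrap estimates. Variable and function names are illustrative.

```python
import math

def normal_two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

att_south0 = 0.4586332   # ATT for south = 0
att_south1 = 0.9518705   # ATT for south = 1
se_diff = 0.4452343      # bootstrap SE of the difference, from the lincom output

diff = att_south1 - att_south0
z = diff / se_diff
chi2 = z ** 2            # Wald statistic with 1 degree of freedom
p = normal_two_sided_p(z)

z_crit = 1.959964        # 97.5th percentile of the standard normal
lower = diff - z_crit * se_diff
upper = diff + z_crit * se_diff

print(round(diff, 7), round(z, 2), round(chi2, 2), round(p, 4))
print(round(lower, 6), round(upper, 6))
```

With a single restriction the chi-squared statistic is the square of z, which is why test and lincom report the same p-value.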

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
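The result slides that follow summarize the simulated estimates by their variance, bias, and mean squared error. As a minimal sketch of how such summaries are obtained, the Python code below draws repeated random samples from a synthetic population and summarizes an estimator across the draws. Everything in it is a hypothetical stand-in: the population is simulated rather than taken from the census, the sample mean takes the place of a matching estimator, and the number of replications is arbitrary.

```python
import random
import statistics

def summarize(estimates, true_value):
    """Return bias, variance, and mean squared error of simulated estimates."""
    bias = statistics.fmean(estimates) - true_value
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse

random.seed(1)
# Hypothetical stand-in population (not the census data used on the slides)
population = [random.gauss(44, 12) for _ in range(20000)]
true_value = statistics.fmean(population)

estimates = []
for _ in range(200):                             # arbitrary number of replications
    sample = random.sample(population, 500)      # one random sample of N = 500
    estimates.append(statistics.fmean(sample))   # placeholder for a matching estimator

bias, variance, mse = summarize(estimates, true_value)
print(bias, variance, mse)
```

Because the variance is computed with the population formula, the mean squared error equals the variance plus the squared bias, which is how the three result slides relate to each other.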

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis from 0 to 1) for untreated and treated; McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels plotted against the propensity score: left, outcome for untreated and treated; right, treatment effect]

Results: Variance

[Figure: variance of the estimates, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, each also shown with bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, each also shown with bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
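The slides do not spell out how "bias reduction (in percent)" is defined. One common reading, used here as an assumption, is the share of the raw bias (NATE minus ATT) that an estimate removes: 100 percent means the estimate equals the population ATT, values below 100 mean some of the raw bias remains, and values above 100 mean the estimate overshoots. The Python sketch below applies that reading to the population values from the simulation slides, NATE = -4.79 and ATT = -3.96 (decimal points restored); the example estimate of -4.2 is hypothetical.

```python
def bias_reduction(estimate, nate, att):
    """Percent of the raw bias (NATE - ATT) removed by an estimate (assumed definition)."""
    return 100 * (nate - estimate) / (nate - att)

NATE = -4.79   # raw mean difference in the population
ATT = -3.96    # population ATT

example = bias_reduction(-4.2, NATE, ATT)   # hypothetical estimate between NATE and ATT
print(round(example, 1))
```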

Results: Mean squared error

[Figure: mean squared error of the estimates, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, each also shown with bias correction]

Results: Relative standard error

[Figure: relative standard error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Legend: MDM, PSM, each also shown with bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Legend: MDM, PSM, each also shown with bias correction]


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls
             Yes    No  Total      Used  Unused  Total   Bandwidth
Treated      448     9    457     1,184     212  1,396      1.8888

Treatment-effects estimation

wage            Coef.
ATT          .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, MSE against bandwidth (x-axis from 1.5 to 3), points labeled by their index in the search]
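To make the "Matched" and "Bandwidth" columns of the output more concrete, here is a minimal Python sketch of kernel matching for the ATT with an Epanechnikov kernel. It is an illustration under simplifying assumptions and not the kmatch implementation: it matches on a single covariate using the absolute distance, whereas the examples on the slides use the Mahalanobis distance over several covariates. The function names and the toy data are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_matching_att(treated, controls, bandwidth):
    """ATT by kernel matching on a single covariate.

    treated, controls: lists of (x, y) pairs.
    Returns (att, number of matched treated, number of unmatched treated).
    """
    effects = []
    for x_t, y_t in treated:
        weights = [epan((x_c - x_t) / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0:
            continue  # no control within the bandwidth: treated unit stays unmatched
        # Kernel-weighted mean of control outcomes as the counterfactual outcome
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated observation has a control within the bandwidth")
    att = sum(effects) / len(effects)
    return att, len(effects), len(treated) - len(effects)

# Toy data (hypothetical): the third treated unit has no control nearby
treated = [(1.0, 5.0), (2.0, 7.0), (10.0, 9.0)]
controls = [(0.5, 3.0), (1.5, 4.0), (2.5, 5.0)]

att, n_matched, n_unmatched = kernel_matching_att(treated, controls, bandwidth=1.0)
print(att, n_matched, n_unmatched)
```

In this sketch a larger bandwidth leaves fewer treated observations unmatched but averages over more distant controls, which is the trade-off that the cross-validation in the examples tries to balance.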

Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls
             Yes    No  Total      Used  Unused  Total   Bandwidth
Treated      453     4    457     1,289     107  1,396       2.433

Treatment-effects estimation

wage            Coef.
ATT          .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, MISE against bandwidth (x-axis from 1.5 to 3), points labeled by their index in the search]

Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls
             Yes    No  Total      Used  Unused  Total   Bandwidth
Treated      455     2    457     1,356      40  1,396      2.7626

Treatment-effects estimation

wage            Coef.
ATT          .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, weighted MISE against bandwidth (x-axis from 1 to 5), points labeled by their index in the search]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls
             Yes    No  Total      Used  Unused  Total   Bandwidth
Treated      366    91    457       701     695  1,396          .5

Treatment-effects estimation

wage            Coef.
ATT          .3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 51

Some Examples make a graph of the common-support stats mat M = r(M)

coefplot matrix(M[4]) title(Std difference) noci nolabels xline(0)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 52

Some Examples

Multiple outcome variables

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 53

Some Examples Multiple outcome variables with different regression-adjustment equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 54

Some Examples Treatment effects by subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2763358 166 0097 -082975 1000241

1ATT 9518705 406903 234 0019 1543553 1749386

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 123Prob gt chi2 = 02679

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4452343 111 0268 -379406 1365881

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 55

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 1000 or 5000) from populationand compute various matching estimators

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 56

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 57

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 58

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 59

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017-06-24

Propensity Scores MatchingIllustration using kmatch

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 60

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017-06-24

Propensity Scores MatchingIllustration using kmatch

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 61

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 62

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 63

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo

5 Are King and Nielsen right

6 Illustration using kmatch

7 Conclusions

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 64

Conclusions

The arguments brought forward by King and Nielsen againstPropensity Score Matching are valid but they mostly apply to onespecific form of PSM pair matching (one-to-one matching withoutreplacement)

Other PSM matching algorithms perform much better because theyare less affected by the random pruning problem

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 65

Conclusions

Overall I agree that MDM has advantages over PSM but it alsohas some disadvantages In applied research the choice may not bethat clear- MDM leaves less scope for post-matching modeling decision biases- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary various suggestions in the

literature (somewhat unclear eg how categorical variables should betreated)

Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 66

Conclusions

Some conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some simulations comparable to the ones by King and Nielsenusing various matching algorithms

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 67

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Mill JS 2002 A System of Logic Reprinted from the 1981 edition (firstpublished 1843) Honolulu Hawaii University Press of the Pacific

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 68

References II

Neyman J 1990[1923] On the Application of Probability Theory toAgricultural Experiments Essay on Principles Section 9 (Translated andedited by DM Dabrowska and TP Speed from the Polish original)Statistical Science 5(4)465ndash472

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Rubin DB 1974 Estimating causal effects of treatments in randomizedand nonrandomized studies Journal of Educational Psychology66(5)688ndash701

Rubin DB 1990 Comment Neyman (1923) and Causal Inference inExperiments and Observational Studies Statistical Science 5(4)472ndash480

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 69

  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 62: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 453 4 457 1289 107 1396 2433

Treatment-effects estimation

wage Coef

ATT 6928956

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

579 61110131512

148

4

3

118

12122

124

126

MISE

15 2 25 3Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49

Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

4

1112

1314

Wei

ghte

d M

ISE

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50

Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching       Number of obs = 1853
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls            Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     366    91    457      701     695   1396        .5

Treatment-effects estimation

wage          Coef.
ATT        .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

                                                Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404   .318681   .321663    .001585   -.006376    .007962
ttl_exp        13.3929   12.7682   13.2685    .027413   -.110253    .137666
tenure         8.12614   6.95055   7.89205    .038378   -.154356    .192734
3.industry     .002732   .021978   .006565   -.047404    .190657   -.238061
4.industry     .191257   .153846   .183807    .019212   -.077269    .096481
5.industry     .062842   .274725   .105033   -.137462    .552867   -.690329
6.industry     .057377         0   .045952    .054507   -.219225    .273732
7.industry     .019126   .021978   .019694   -.004083    .016423   -.020506
8.industry     .005464   .065934   .017505   -.091714    .368871   -.460585
9.industry     .010929   .010989   .010941   -.000115    .000462   -.000577
10.industry          0   .021978   .004376   -.066227    .266363   -.332589
11.industry    .554645   .175824   .479212     .15083   -.606636    .757467
12.industry    .092896   .241758   .122538   -.090299    .363181    -.45348
2.race         .243169   .681319   .330416   -.185284    .745209   -.930494
3.race         .002732   .076923   .017505   -.112525    .452572   -.565097
south           .29235   .318681   .297593   -.011456    .046074    -.05753

                                                         Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058   .219536   .218674    1.00176    1.00394    .997824
ttl_exp        19.4198   25.2474   20.5898    .943177    1.22621     .76918
tenure         38.3324   31.9242   37.2044    1.03032    .858076    1.20073
3.industry     .002732   .021734   .006536    .418045    3.32537    .125714
4.industry     .155101   .131624   .150351    1.03159    .875443    1.17837
5.industry     .059054   .201465   .094207    .626851    2.13854    .293122
6.industry     .054233         0   .043936    1.23435          0          .
7.industry     .018811   .021734   .019348    .972252     1.1233    .865531
8.industry      .00545   .062271   .017237    .316157    3.61269    .087513
9.industry     .010839   .010989   .010845    .999464    1.01328    .986361
10.industry          0   .021734   .004367          0    4.97709          0
11.industry    .247691    .14652   .250115    .990307    .585811    1.69049
12.industry    .084497   .185348   .107758    .784137    1.72003    .455885
2.race         .184542   .219536   .221726    .832297    .990121    .840601
3.race         .002732   .071795   .017237    .158513    4.16522    .038056
south          .207448   .219536    .20949    .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
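How the two diagnostic columns relate to the means and variances can be checked by hand. The sketch below is a hedged Python illustration and not kmatch code. It reads the tenure row as matched mean 8.12614, total mean 7.89205, matched variance 38.3324, and total variance 37.2044 (the decimal placement is an assumption). It also assumes that the standardized difference divides by the standard deviation of all treated observations. That assumption is inferred from the printed numbers and is not taken from the kmatch documentation.

```python
from math import sqrt


def std_difference(mean_a: float, mean_b: float, var_ref: float) -> float:
    """Difference in means, scaled by the standard deviation of a reference group."""
    return (mean_a - mean_b) / sqrt(var_ref)


def variance_ratio(var_a: float, var_b: float) -> float:
    """Ratio of two variances; values near 1 indicate similar spread."""
    return var_a / var_b


# tenure: matched treated (1) compared with all treated (3)
d_tenure = std_difference(8.12614, 7.89205, 37.2044)
r_tenure = variance_ratio(38.3324, 37.2044)
print(round(d_tenure, 6), round(r_tenure, 5))
```

Under these assumptions the sketch reproduces the tenure entries of the (1)-(3) and (1)/(3) columns to the printed precision. A standardized difference close to zero means that dropping the unmatched treated observations barely changes the composition of the treatment group on that covariate.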

Some Examples: Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference, column (1)-(3), for each covariate from collgrad to south; horizontal axis from about -.2 to .2 with a reference line at 0.]

Some Examples: Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching       Number of obs = 1852
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls            Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     432    25    457     1104     291   1395    1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT       1263759
  NATE      1450303

(The decimal points of the two hours estimates were lost in extraction; their digits are shown as extracted.)
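The difference between the NATE (raw mean difference) and a kernel-matching ATT can be illustrated with a toy example. The sketch below is a simplified Python illustration under stated assumptions. It uses one covariate with absolute distance instead of the Mahalanobis metric, an Epanechnikov kernel, made-up data, and the hypothetical function names `kernel_att` and `nate`. It is not kmatch's algorithm.

```python
from statistics import fmean


def epan(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, zero otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(xt, yt, xc, yc, h):
    """Kernel-matching ATT for a single covariate.

    For each treated unit, the counterfactual outcome is the kernel-weighted
    mean outcome of the controls within bandwidth h. Treated units without
    any control in range are left unmatched and excluded.
    Returns (ATT, number of matched treated).
    """
    effects = []
    for x_i, y_i in zip(xt, yt):
        weights = [epan((x_j - x_i) / h) for x_j in xc]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: off common support
        counterfactual = sum(w * y for w, y in zip(weights, yc)) / total
        effects.append(y_i - counterfactual)
    if not effects:
        raise ValueError("no treated unit could be matched")
    return fmean(effects), len(effects)


def nate(yt, yc):
    """Naive average treatment effect: raw difference in mean outcomes."""
    return fmean(yt) - fmean(yc)


# Made-up data: the third treated unit has no comparable control.
xt, yt = [1.0, 2.0, 9.0], [5.0, 7.0, 8.0]
xc, yc = [0.5, 1.5, 2.5, 4.0], [3.0, 4.0, 5.0, 9.0]
att, n_matched = kernel_att(xt, yt, xc, yc, h=1.0)
raw = nate(yt, yc)
print(att, n_matched, raw)
```

In this toy case two of the three treated units are matched, which corresponds to the "Matched: Yes/No" columns of the matching statistics. The ATT is computed over the matched treated units only, whereas the NATE uses everyone.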

Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching       Number of obs = 1852
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls            Band-
            Yes    No  Total     Used  Unused  Total     width
Treated     432    25    457     1104     291   1395    1.3392

Treatment-effects estimation

              Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT       1263759
  NATE      1450303

wage:  adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

(The decimal points of the two hours estimates were lost in extraction; their digits are shown as extracted.)

Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching       Number of obs = 1853
                                            Replications  = 50
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                 Matched                Controls            Band-
               Yes    No  Total     Used  Unused  Total     width
0  Treated     306    15    321      625     120    745    1.3199
1  Treated     126    10    136      473     178    651    1.3398

Treatment-effects estimation

             Observed   Bootstrap                     Normal-based
wage            Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0  ATT       .4586332    .2763358   1.66    0.097   -.082975    1.000241
1  ATT       .9518705     .406903   2.34    0.019   .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage            Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)          .4932373    .4452343   1.11    0.268   -.379406    1.365881
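The test and lincom results above are two views of the same Wald statistic. The following Python sketch is an illustration only. It takes the difference and its bootstrap standard error as given, reading them as .4932373 and .4452343 (the decimal placement is an assumption), and the function name `wald_summary` is hypothetical. From these two numbers it recomputes z, chi2, the p-value, and the normal-based confidence interval.

```python
from statistics import NormalDist


def wald_summary(diff: float, se: float, level: float = 0.95) -> dict:
    """Normal-based test and confidence interval for a single linear combination."""
    std_normal = NormalDist()
    z = diff / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2.0)
    return {
        "z": z,
        "chi2": z * z,  # a one-degree-of-freedom Wald chi2 is z squared
        "p": p_value,
        "ci": (diff - crit * se, diff + crit * se),
    }


res = wald_summary(0.4932373, 0.4452343)
print(res)
```

The recomputed values agree with the printed output (z of about 1.11, chi2 of about 1.23, p of about 0.268). Note that the reported standard error of the difference is smaller than the value of about 0.49 that would follow from treating the two subgroup estimates as independent. That suggests the bootstrap covariance between them enters the calculation.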

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in the population (computed from fully stratified data): McFadden R2 = 0.121.

[Figure: density of the propensity score (horizontal axis, 0 to 1) for untreated and treated.]

Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: outcome (about 30 to 55) against propensity score (0 to .8) for untreated and treated. Right panel: treatment effect (about -6 to -1) against propensity score (0 to .8).]
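The population effects reported for the simulation are linked by a simple identity: the ATE is the average of ATT and ATC, weighted by the shares of treated and untreated. The short Python sketch below checks this. It assumes the values are read as ATT = -3.96, ATC = -3.41, NATE = -4.79 and that the treated share is 17.5%. The function name is hypothetical.

```python
def ate_from_att_atc(att: float, atc: float, share_treated: float) -> float:
    """ATE as the share-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc


ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)

# Selection bias of the raw comparison: NATE minus ATT.
selection_bias = -4.79 - (-3.96)
print(round(ate, 2), round(selection_bias, 2))
```

Under these assumptions the weighted average rounds to the reported ATE of -3.51, and the raw mean difference overstates the ATT by about 0.83 prestige points. This is the bias that the matching estimators in the following results are meant to remove.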

Results: Variance

[Figure: variance of the estimators in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent in two panels (N = 500 and N = 5000; horizontal axes about 70 to 130 and 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the estimators in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
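The three result measures (variance, bias reduction, mean squared error) are connected: across simulation replications, the MSE equals the squared bias plus the variance. The Python sketch below illustrates this with made-up replication estimates. The definition of bias reduction in percent is an assumption (the share of the raw bias NATE - ATT removed by the estimator); the slides do not spell it out. The function names are hypothetical.

```python
from statistics import fmean


def summarize(estimates, truth):
    """Bias, variance, and MSE of an estimator across replications."""
    mean_est = fmean(estimates)
    bias = mean_est - truth
    variance = fmean([(e - mean_est) ** 2 for e in estimates])
    mse = fmean([(e - truth) ** 2 for e in estimates])
    return bias, variance, mse


def bias_reduction_percent(mean_estimate, nate, att):
    """Assumed definition: 100 means the raw bias (nate - att) is fully removed."""
    return 100.0 * (nate - mean_estimate) / (nate - att)


# Made-up ATT estimates from four replications; true ATT read as -3.96.
estimates = [-4.2, -3.9, -3.7, -4.0]
bias, variance, mse = summarize(estimates, truth=-3.96)
full = bias_reduction_percent(-3.96, nate=-4.79, att=-3.96)
none = bias_reduction_percent(-4.79, nate=-4.79, att=-3.96)
print(bias, variance, mse, full, none)
```

Under this definition, values above 100 would indicate overcorrection and values below 100 remaining bias. That reading matches horizontal axes centered around 100 in the bias-reduction figure.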

Results: Relative standard error

[Figure: relative standard error in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (teffects) with 1 neighbor and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 neighbor and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (teffects) with 1 neighbor and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 neighbor and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]
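Coverage and relative standard error can both be computed from the replication results of a simulation. The Python sketch below uses made-up numbers. Coverage is the share of normal-based intervals that contain the true value. The definition of the relative standard error (mean estimated standard error divided by the standard deviation of the estimates across replications) is an assumption, since the slides do not define it. The function names are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def coverage(estimates, std_errors, truth, level=0.95):
    """Share of normal-based confidence intervals that contain the true value."""
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = [abs(e - truth) <= crit * se for e, se in zip(estimates, std_errors)]
    return sum(hits) / len(hits)


def relative_se(estimates, std_errors):
    """Assumed definition: average estimated SE over the SD of the estimates."""
    return fmean(std_errors) / pstdev(estimates)


# Made-up replication results; true ATT read as -3.96.
estimates = [-4.2, -3.9, -3.7, -4.0]
std_errors = [0.1, 0.2, 0.2, 0.2]
cov = coverage(estimates, std_errors, truth=-3.96)
rel = relative_se(estimates, std_errors)
print(cov, rel)
```

With this reading, a relative standard error below 1 means that the estimated standard errors are too small on average, which in turn tends to push coverage below the nominal 95%.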


Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.

  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 63: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 455 2 457 1356 40 1396 27626

Treatment-effects estimation

wage Coef

ATT 7308166

kmatch cvplot ms(o) index mlabposition(1) sort

1

2

6

10121481513119

3

7

5

4

1112

1314

Wei

ghte

d M

ISE

1 2 3 4 5Bandwidth

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50

Some Examples Common-support diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)

Multivariate-distance kernel matching Number of obs = 1853Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 366 91 457 701 695 1396 5

Treatment-effects estimation

wage Coef

ATT 3303161

kmatch csummarize(refitting the model using the generate() option)

Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)

collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734

3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577

10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348

2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753

Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)

collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073

3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361

10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885

2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939

(1) matched (2) unmatched (3) total

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 51

Some Examples make a graph of the common-support stats mat M = r(M)

coefplot matrix(M[4]) title(Std difference) noci nolabels xline(0)

collgradttl_exptenure

3industry4industry5industry6industry7industry8industry9industry

10industry11industry12industry

2race3racesouth

-2 -1 0 1 2

Std difference

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 52

Some Examples

Multiple outcome variables

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 53

Some Examples Multiple outcome variables with different regression-adjustment equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 54

Some Examples Treatment effects by subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2763358 166 0097 -082975 1000241

1ATT 9518705 406903 234 0019 1543553 1749386

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 123Prob gt chi2 = 02679

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4452343 111 0268 -379406 1365881

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 55

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 1000 or 5000) from populationand compute various matching estimators

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 56

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 57

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 58

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 59

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017-06-24

Propensity Scores MatchingIllustration using kmatch

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 60

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017-06-24

Propensity Scores MatchingIllustration using kmatch

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 61

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 62

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 63

1 Potential Outcomes and Causal Inference

2 Matching

3 Propensity Score Matching

4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo

5 Are King and Nielsen right

6 Illustration using kmatch

7 Conclusions

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 64

Conclusions

The arguments brought forward by King and Nielsen againstPropensity Score Matching are valid but they mostly apply to onespecific form of PSM pair matching (one-to-one matching withoutreplacement)

Other PSM matching algorithms perform much better because theyare less affected by the random pruning problem

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 65

Conclusions

Overall I agree that MDM has advantages over PSM but it alsohas some disadvantages In applied research the choice may not bethat clear- MDM leaves less scope for post-matching modeling decision biases- Theoretical results (see eg Froumllich 2007) suggest that MDM will

generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)

- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary various suggestions in the

literature (somewhat unclear eg how categorical variables should betreated)

Computational complexity

One clear conclusion we can draw however is

Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 66

Conclusions

Some conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear

I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased

To doI Run some simulations comparable to the ones by King and Nielsenusing various matching algorithms

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 67

References I

Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313

Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90

Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290

Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics

King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw

Mill JS 2002 A System of Logic Reprinted from the 1981 edition (firstpublished 1843) Honolulu Hawaii University Press of the Pacific

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 68

References II

Neyman J 1990[1923] On the Application of Probability Theory toAgricultural Experiments Essay on Principles Section 9 (Translated andedited by DM Dabrowska and TP Speed from the Polish original)Statistical Science 5(4)465ndash472

Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55

Rubin DB 1974 Estimating causal effects of treatments in randomizedand nonrandomized studies Journal of Educational Psychology66(5)688ndash701

Rubin DB 1990 Comment Neyman (1923) and Causal Inference inExperiments and Observational Studies Statistical Science 5(4)472ndash480

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 69

  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 64: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching    Number of obs = 1,853
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    366    91    457       701    695     1396    .5

Treatment-effects estimation

wage     Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): means and standardized differences
(1) matched, (2) unmatched, (3) total

              Means                           Standardized difference
              Matched   Unmatched  Total      (1)-(3)    (2)-(3)    (1)-(2)
collgrad      .322404   .318681    .321663    .001585    -.006376   .007962
ttl_exp       13.3929   12.7682    13.2685    .027413    -.110253   .137666
tenure        8.12614   6.95055    7.89205    .038378    -.154356   .192734
3.industry    .002732   .021978    .006565    -.047404   .190657    -.238061
4.industry    .191257   .153846    .183807    .019212    -.077269   .096481
5.industry    .062842   .274725    .105033    -.137462   .552867    -.690329
6.industry    .057377   0          .045952    .054507    -.219225   .273732
7.industry    .019126   .021978    .019694    -.004083   .016423    -.020506
8.industry    .005464   .065934    .017505    -.091714   .368871    -.460585
9.industry    .010929   .010989    .010941    -.000115   .000462    -.000577
10.industry   0         .021978    .004376    -.066227   .266363    -.332589
11.industry   .554645   .175824    .479212    .15083     -.606636   .757467
12.industry   .092896   .241758    .122538    -.090299   .363181    -.45348
2.race        .243169   .681319    .330416    -.185284   .745209    -.930494
3.race        .002732   .076923    .017505    -.112525   .452572    -.565097
south         .29235    .318681    .297593    -.011456   .046074    -.05753

Common support (treated): variances and variance ratios
(1) matched, (2) unmatched, (3) total

              Variances                       Ratio
              Matched   Unmatched  Total      (1)/(3)    (2)/(3)    (1)/(2)
collgrad      .219058   .219536    .218674    1.00176    1.00394    .997824
ttl_exp       19.4198   25.2474    20.5898    .943177    1.22621    .76918
tenure        38.3324   31.9242    37.2044    1.03032    .858076    1.20073
3.industry    .002732   .021734    .006536    .418045    3.32537    .125714
4.industry    .155101   .131624    .150351    1.03159    .875443    1.17837
5.industry    .059054   .201465    .094207    .626851    2.13854    .293122
6.industry    .054233   0          .043936    1.23435    0          .
7.industry    .018811   .021734    .019348    .972252    1.1233     .865531
8.industry    .00545    .062271    .017237    .316157    3.61269    .087513
9.industry    .010839   .010989    .010845    .999464    1.01328    .986361
10.industry   0         .021734    .004367    0          4.97709    0
11.industry   .247691   .14652     .250115    .990307    .585811    1.69049
12.industry   .084497   .185348    .107758    .784137    1.72003    .455885
2.race        .184542   .219536    .221726    .832297    .990121    .840601
3.race        .002732   .071795    .017237    .158513    4.16522    .038056
south         .207448   .219536    .20949     .990254    1.04796    .944939
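The balance statistics reported by kmatch csummarize can be checked by hand. The following is a minimal Python sketch, not kmatch's own code: it assumes that each standardized difference is the difference in means divided by the standard deviation of the total treated group, and that each ratio is a plain ratio of variances. Both assumptions reproduce the reported figures for 11.industry (means .554645, .175824, .479212; variances .247691 and .250115). The function names are illustrative.

```python
import math


def std_diff(mean_a, mean_b, var_ref):
    """Difference in means, scaled by the SD of a reference group (assumed: total treated)."""
    return (mean_a - mean_b) / math.sqrt(var_ref)


def var_ratio(var_a, var_b):
    """Ratio of two variances."""
    return var_a / var_b


# Reported values for 11.industry among the treated.
mean_matched, mean_unmatched, mean_total = 0.554645, 0.175824, 0.479212
var_matched, var_total = 0.247691, 0.250115

sd_matched_vs_total = std_diff(mean_matched, mean_total, var_total)
sd_matched_vs_unmatched = std_diff(mean_matched, mean_unmatched, var_total)
vr_matched_vs_total = var_ratio(var_matched, var_total)

print(round(sd_matched_vs_total, 5))      # compare with column (1)-(3)
print(round(sd_matched_vs_unmatched, 5))  # compare with column (1)-(2)
print(round(vr_matched_vs_total, 5))      # compare with ratio (1)/(3)
```

Values of the standardized difference close to zero and variance ratios close to one indicate that the matched treated observations resemble the full treated group.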

Some Examples

Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences (matched vs. total) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south; x-axis from -.2 to .2 with a reference line at 0; title "Std. difference"]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1,852
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    432    25    457       1104   291     1395    1.3392

Treatment-effects estimation

           Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303
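To make the "Matched Yes/No" counts and the bandwidth in this output more concrete, here is a simplified Python sketch of kernel matching for the ATT. It is an illustration under stated assumptions, not kmatch's implementation: it matches on a single scalar score instead of a Mahalanobis distance, uses an Epanechnikov kernel, and treats a treated unit as unmatched when no control lies within the bandwidth. All names and the toy data are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight; zero outside the bandwidth (|u| >= 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_match_att(score_t, y_t, score_c, y_c, bandwidth):
    """Return (ATT, number of matched treated, number of unmatched treated)."""
    effects = []
    unmatched = 0
    for s, y in zip(score_t, y_t):
        weights = [epanechnikov((s - sc) / bandwidth) for sc in score_c]
        total = sum(weights)
        if total == 0.0:
            # No control within the bandwidth: off common support.
            unmatched += 1
            continue
        # Counterfactual outcome: kernel-weighted mean of control outcomes.
        y0_hat = sum(w * yc for w, yc in zip(weights, y_c)) / total
        effects.append(y - y0_hat)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, len(effects), unmatched


# Toy data (hypothetical): three treated units, four controls.
score_t, y_t = [0.30, 0.60, 0.95], [5.0, 6.0, 9.0]
score_c, y_c = [0.25, 0.35, 0.55, 0.65], [4.0, 4.0, 5.0, 5.0]

att, n_matched, n_unmatched = kernel_match_att(score_t, y_t, score_c, y_c, bandwidth=0.1)
print(att, n_matched, n_unmatched)
```

A larger bandwidth uses more controls per treated unit and leaves fewer treated units unmatched, which is the pattern visible when comparing the bwidth(0.5) run with the automatically selected bandwidth.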

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1,852
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    432    25    457       1104   291     1395    1.3392

Treatment-effects estimation

           Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching    Number of obs = 1,853
                                         Replications  = 50
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

             Matched                Controls               Band-
             Yes    No    Total     Used   Unused  Total   width
0  Treated   306    15    321       625    120     745     1.3199
1  Treated   126    10    136       473    178     651     1.3398

Treatment-effects estimation

           Observed    Bootstrap                       Normal-based
wage       Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
0  ATT     .4586332    .2763358    1.66   0.097   -.082975    1.000241
1  ATT     .9518705    .406903     2.34   0.019   .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2( 1)    = 1.23
       Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage       Coef.       Std. Err.   z      P>|z|   [95% Conf. Interval]
(1)        .4932373    .4452343    1.11   0.268   -.379406    1.365881
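The test and lincom results can be reproduced from the reported coefficients. This Python sketch takes the two subgroup ATTs (.4586332 and .9518705) and the bootstrap standard error of their difference reported by lincom (.4452343) as given, and applies the usual normal-based formulas. It only illustrates the arithmetic; it does not reproduce the bootstrap itself.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


att_south0 = 0.4586332
att_south1 = 0.9518705
se_diff = 0.4452343  # bootstrap SE of the difference, as reported by lincom

diff = att_south1 - att_south0
z = diff / se_diff
p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
chi2 = z ** 2  # Wald test with 1 degree of freedom

z_crit = 1.959964  # 97.5% quantile of the standard normal
ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

print(round(diff, 7), round(z, 2), round(p_value, 3), round(chi2, 2))
print(tuple(round(b, 6) for b in ci))
```

The squared z statistic equals the chi-squared statistic of the Wald test, which is why test and lincom report the same p-value up to rounding.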

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to working people between 24 and 60 years old.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: propensity score (0 to 1); y-axis: density (0 to 7). McFadden R2 = 0.121]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79

Population ATT (computed from fully stratified data): -3.96

There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels. Left: outcome (30 to 55) by propensity score (0 to .8) for untreated and treated. Right: treatment effect (-6 to -1) by propensity score (0 to .8).]
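The population quantities on this slide are linked by simple identities, sketched here in Python. The sketch reads the garbled figures as a treated share of 17.5%, NATE = -4.79, ATT = -3.96, and ATC = -3.41. With these values the ATE is the share-weighted average of ATT and ATC, and the gap between NATE and ATT is the selection bias that matching is meant to remove.

```python
share_treated = 0.175  # share of resident aliens in the population
nate = -4.79           # raw mean difference
att = -3.96            # average treatment effect on the treated
atc = -3.41            # average treatment effect on the controls

# ATE is the population-share-weighted average of ATT and ATC.
ate = share_treated * att + (1.0 - share_treated) * atc

# Bias of the naive estimator with respect to the ATT.
selection_bias = nate - att

print(round(ate, 2), round(selection_bias, 2))
```

That the computed ATE rounds to the reported -3.51 supports this reading of the figures.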

Results: Variance

[Figure: variance of the ATT estimates, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error of the ATT estimates, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
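The mean squared error combines the two preceding result slides: it equals the squared bias plus the variance of the estimates across simulation replications. The Python sketch below illustrates this identity on hypothetical draws; the numbers are made up for illustration and are not the simulation results from the talk. Only the true ATT of -3.96 is taken from the slides.

```python
import random
import statistics


def simulation_summary(estimates, true_value):
    """Return (bias, variance, mse) of a list of estimates of true_value."""
    bias = statistics.fmean(estimates) - true_value
    variance = statistics.pvariance(estimates)  # divides by the number of replications
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse


# Hypothetical replications: an estimator with bias 0.3 and SD 1.5.
rng = random.Random(1)
true_att = -3.96
estimates = [true_att + 0.3 + rng.gauss(0.0, 1.5) for _ in range(1000)]

bias, variance, mse = simulation_summary(estimates, true_att)
print(round(bias, 3), round(variance, 3), round(mse, 3))
```

Because of this decomposition, an estimator with a small persistent bias can still have the lower mean squared error if its variance is sufficiently smaller.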

Results: Relative standard error

[Figure: relative standard error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
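The two standard-error diagnostics can be computed from simulation output along the following lines. This Python sketch rests on assumptions: it takes the relative standard error to be the average estimated standard error divided by the standard deviation of the estimates across replications, and coverage to be the share of normal-based 95% intervals that contain the true value. The slides do not spell out these definitions, and the data below are hypothetical.

```python
import statistics


def relative_se(estimates, std_errors):
    """Mean estimated SE divided by the empirical SD of the estimates (assumed definition)."""
    return statistics.fmean(std_errors) / statistics.stdev(estimates)


def coverage(estimates, std_errors, true_value, z_crit=1.959964):
    """Share of normal-based confidence intervals that contain true_value."""
    hits = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z_crit * se <= true_value <= est + z_crit * se
    )
    return hits / len(estimates)


# Hypothetical replications around a true ATT of -3.96.
estimates = [-4.5, -4.0, -3.5, -2.5, -6.0]
std_errors = [0.5, 0.5, 0.5, 0.5, 0.5]

rel_se = relative_se(estimates, std_errors)
cov = coverage(estimates, std_errors, true_value=-3.96)
print(round(rel_se, 4), cov)
```

Under these definitions, a relative standard error below one means the standard errors understate the sampling variability, which typically shows up as coverage below the nominal 95%.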


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for "post-matching modeling decision biases".
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 66: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Some Examples

Multiple outcome variables

kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 6021049

NATE 1430823

hoursATT 1263759

NATE 1450303

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 53

Some Examples Multiple outcome variables with different regression-adjustment equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)

Multivariate-distance kernel matching Number of obs = 1852Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

Treated 432 25 457 1104 291 1395 13392

Treatment-effects estimation

Coef

wageATT 5152752

NATE 1430823

hoursATT 1263759

NATE 1450303

wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 54

Some Examples Treatment effects by subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)

(running kmatch on estimation sample)

Bootstrap replications (50)1 2 3 4 5

50

Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan

Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace

0 south = 01 south = 1

Matching statistics

Matched Controls Band-Yes No Total Used Unused Total width

0Treated 306 15 321 625 120 745 13199

1Treated 126 10 136 473 178 651 13398

Treatment-effects estimation

Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]

0ATT 4586332 2763358 166 0097 -082975 1000241

1ATT 9518705 406903 234 0019 1543553 1749386

test [0]ATT = [1]ATT

( 1) [0]ATT - [1]ATT = 0

chi2( 1) = 123Prob gt chi2 = 02679

lincom [1]ATT - [0]ATT

( 1) - [0]ATT + [1]ATT = 0

wage Coef Std Err z Pgt|z| [95 Conf Interval]

(1) 4932373 4452343 111 0268 -379406 1365881

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 55

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 1000 or 5000) from populationand compute various matching estimators

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 56

SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates

Propensity score in population (computed from fully stratified data)

0

1

2

3

4

5

6

7

Den

sity

0 1 2 3 4 5 6 7 8 9 1Propensity score

UntreatedTreated

McFadden R2 = 0121Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 57

Simulation

Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)

30

35

40

45

50

55

Out

com

e

0 1 2 3 4 5 6 7 8Propensity score

UntreatedTreated

-6

-5

-4

-3

-2

-1

Trea

tmen

t effe

ct

0 1 2 3 4 5 6 7 8Propensity score

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 58

Results: Variance

[Figure: variance of the ATT estimates for N = 500 and N = 5000, by algorithm. Nearest-neighbor matching: 1 neighbor, 5 neighbors. Kernel matching: fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Legend: MDM, PSM, each with and without bias correction]

In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
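One way to see why the algorithm matters so much for variance is to look at how many controls contribute to each treated unit's counterfactual. The sketch below is an illustration under simplifying assumptions, not kmatch's implementation: it matches on a scalar score, uses an Epanechnikov kernel, and the function names and example scores are hypothetical. If control outcomes are independent with common variance, the variance of the weighted counterfactual is proportional to the sum of squared weights, so a larger effective number of controls means a smaller variance.

```python
def nn_weights(p_treated, p_controls, k=1):
    """Weights for k-nearest-neighbor matching (with replacement) on a scalar score."""
    order = sorted(range(len(p_controls)),
                   key=lambda j: abs(p_controls[j] - p_treated))
    weights = [0.0] * len(p_controls)
    for j in order[:k]:
        weights[j] = 1.0 / k
    return weights


def kernel_weights(p_treated, p_controls, bandwidth):
    """Epanechnikov kernel weights, normalized to sum to one."""
    raw = []
    for p in p_controls:
        u = (p - p_treated) / bandwidth
        raw.append(0.75 * (1.0 - u * u) if abs(u) < 1 else 0.0)
    total = sum(raw)
    if total == 0:
        return None  # no control within the bandwidth: the treated unit stays unmatched
    return [r / total for r in raw]


def effective_controls(weights):
    """Effective number of controls: 1 / sum of squared normalized weights."""
    return 1.0 / sum(w * w for w in weights)


# Hypothetical propensity scores of six controls and one treated unit at 0.22
controls = [0.10, 0.18, 0.21, 0.24, 0.30, 0.55]
w_nn = nn_weights(0.22, controls, k=1)
w_kernel = kernel_weights(0.22, controls, bandwidth=0.1)
```

Here the single nearest neighbor carries all the weight, whereas the kernel spreads the weight over the four controls that fall within the bandwidth.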

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, by algorithm. Nearest-neighbor matching: 1 neighbor, 5 neighbors. Kernel matching: fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Legend: MDM, PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after a propensity score of 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
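The slides do not spell out how "bias reduction (in percent)" is computed. One common definition, assumed here, relates the bias removed by an estimator to the bias of the raw mean difference. The worked example uses the population values from the simulation (read as NATE = -4.79, ATT = -3.96) and a purely hypothetical average estimate of -4.04:

```latex
\begin{align*}
\mathrm{BR} &= 100 \times
  \frac{\mathrm{NATE} - \mathrm{E}[\widehat{\mathrm{ATT}}]}{\mathrm{NATE} - \mathrm{ATT}} \\
            &= 100 \times \frac{-4.79 - (-4.04)}{-4.79 - (-3.96)}
             = 100 \times \frac{-0.75}{-0.83} \approx 90.4
\end{align*}
```

Under this definition, 100 means that the bias is removed completely, values below 100 indicate remaining bias in the direction of the raw difference, and values above 100 indicate overcorrection.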

Results: Mean squared error

[Figure: mean squared error of the ATT estimates for N = 500 and N = 5000, by algorithm. Nearest-neighbor matching: 1 neighbor, 5 neighbors. Kernel matching: fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Legend: MDM, PSM, each with and without bias correction]
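The mean squared error combines the two preceding criteria. The standard decomposition (a textbook identity, added here for reference) is:

```latex
\begin{align*}
\mathrm{MSE}(\widehat{\mathrm{ATT}})
  &= \mathrm{E}\big[(\widehat{\mathrm{ATT}} - \mathrm{ATT})^2\big] \\
  &= \mathrm{Var}(\widehat{\mathrm{ATT}})
   + \big(\mathrm{E}[\widehat{\mathrm{ATT}}] - \mathrm{ATT}\big)^2
\end{align*}
```

An estimator with a small remaining bias can therefore still have the lower MSE if its variance is sufficiently small.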

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000, by algorithm. Nearest-neighbor matching (teffects): 1 neighbor, 5 neighbors. Nearest-neighbor matching (bootstrap): 1 neighbor, 5 neighbors. Kernel matching (bootstrap): fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Legend: MDM, PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, by algorithm. Nearest-neighbor matching (teffects): 1 neighbor, 5 neighbors. Nearest-neighbor matching (bootstrap): 1 neighbor, 5 neighbors. Kernel matching (bootstrap): fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Legend: MDM, PSM, each with and without bias correction]
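The criteria shown on the results slides can be computed from the replications of a single estimator roughly as follows. This is a sketch under assumptions: the slides do not give formulas, so the definitions used here (relative standard error as the mean estimated standard error divided by the standard deviation of the estimates across replications, and coverage based on normal-approximation intervals) are plausible readings rather than the author's documented method, and the function name performance is hypothetical.

```python
import math
import statistics


def performance(estimates, std_errors, true_att, z=1.96):
    """Summarize Monte Carlo replications of one estimator (assumed definitions)."""
    mean_est = statistics.fmean(estimates)
    variance = statistics.pvariance(estimates)   # variance across replications
    bias = mean_est - true_att
    sd = math.sqrt(variance)
    # Count replications whose normal-based interval contains the true ATT
    covered = sum(
        1 for est, se in zip(estimates, std_errors)
        if est - z * se <= true_att <= est + z * se
    )
    return {
        "variance": variance,
        "bias": bias,
        "mse": variance + bias ** 2,
        "relative_se": statistics.fmean(std_errors) / sd,
        "coverage": covered / len(estimates),
    }


# Tiny made-up example: five replications, true ATT of -3.96
result = performance(
    estimates=[-4.5, -3.5, -4.0, -3.8, -4.2],
    std_errors=[0.25] * 5,
    true_att=-3.96,
)
```

A relative standard error below 1 means that the estimated standard errors understate the true sampling variability, which in turn tends to push coverage below the nominal 95%.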


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original). Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.



Page 69: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

Simulation

Population data from Swiss census of 2000

Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)

Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group

Control variables gender age and highest educational degree

Population restricted to people between 24 to 60 years old who areworking

2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup

Draw random samples (N = 500 1000 or 5000) from populationand compute various matching estimators

Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score (x-axis: propensity score, 0 to 1; y-axis: density) for untreated and treated]

McFadden R2 = 0.121

Simulation

Raw mean difference in occupational prestige (NATE): −4.79

Population ATT (computed from fully stratified data): −3.96

There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41)

[Figure, two panels, both plotted against the propensity score (0 to 0.8). Left: outcome for untreated and treated. Right: treatment effect.]
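For readers who want to see how the quantities on this slide relate to each other, the sketch below computes NATE, ATT, ATC, and ATE by exact stratification on a toy data set. The strata, counts, and outcome means are invented for illustration. Only the logic is the point: stratum-specific mean differences are weighted by the distribution of the treated, the untreated, or everyone. The sketch assumes that every stratum contains both treated and untreated units.

```python
# Each stratum: (n_treated, n_untreated, mean_outcome_treated, mean_outcome_untreated)
# All numbers are hypothetical.
strata = [
    (60, 40, 38.0, 40.0),   # effect within stratum: -2
    (30, 70, 42.0, 46.0),   # effect within stratum: -4
    (10, 90, 44.0, 50.0),   # effect within stratum: -6
]

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

effects = [y1 - y0 for _, _, y1, y0 in strata]
n_treated = [n1 for n1, _, _, _ in strata]
n_untreated = [n0 for _, n0, _, _ in strata]
n_total = [n1 + n0 for n1, n0, _, _ in strata]

# Naive estimate: raw difference between the two group means
nate = (weighted_mean([s[2] for s in strata], n_treated)
        - weighted_mean([s[3] for s in strata], n_untreated))

att = weighted_mean(effects, n_treated)    # weights: distribution of the treated
atc = weighted_mean(effects, n_untreated)  # weights: distribution of the untreated
ate = weighted_mean(effects, n_total)      # weights: overall distribution

print(f"NATE={nate:.3f} ATT={att:.3f} ATC={atc:.3f} ATE={ate:.3f}")
```

In this toy example the treated are concentrated in strata with low outcomes, so the naive difference overstates the effect. The ATE is the average of ATT and ATC weighted by the group shares.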

Results: Variance

[Figure: variance of the estimators, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
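To make the difference between the two algorithm families concrete, here is a minimal sketch of ATT estimation by 1-nearest-neighbor matching (with replacement) and by kernel matching with an Epanechnikov kernel, both on a given propensity score. The data-generating process, the bandwidth of 0.1, and all function names are assumptions for illustration. The propensity score is taken as known rather than estimated, and this is not kmatch's implementation.

```python
import random
import statistics

def nn_match_att(treated, controls):
    """ATT via 1-nearest-neighbor matching on the score, with replacement."""
    diffs = []
    for p, y in treated:
        _, y0 = min(controls, key=lambda c: abs(c[0] - p))
        diffs.append(y - y0)
    return statistics.fmean(diffs)

def kernel_match_att(treated, controls, bandwidth):
    """ATT via kernel matching: Epanechnikov-weighted mean of controls within the bandwidth."""
    diffs = []
    for p, y in treated:
        weights = [(max(0.0, 1 - ((p0 - p) / bandwidth) ** 2), y0) for p0, y0 in controls]
        total = sum(w for w, _ in weights)
        if total > 0:  # treated units without any control in range are dropped
            diffs.append(y - sum(w * y0 for w, y0 in weights) / total)
    return statistics.fmean(diffs)

def draw_sample(n, rng):
    """Hypothetical data: true score p, treatment D ~ Bernoulli(p), constant effect of -4."""
    treated, controls = [], []
    for _ in range(n):
        p = rng.uniform(0.05, 0.6)
        d = rng.random() < p
        y = 50 - 20 * p - 4 * d + rng.gauss(0, 8)
        (treated if d else controls).append((p, y))
    return treated, controls

rng = random.Random(42)
nn_estimates, kernel_estimates = [], []
for _ in range(200):
    treated, controls = draw_sample(300, rng)
    nn_estimates.append(nn_match_att(treated, controls))
    kernel_estimates.append(kernel_match_att(treated, controls, bandwidth=0.1))

print("variance, 1-NN:  ", round(statistics.variance(nn_estimates), 3))
print("variance, kernel:", round(statistics.variance(kernel_estimates), 3))
```

Kernel matching averages over many controls per treated unit, while 1-nearest-neighbor matching uses a single control each time, which is why the former tends to be less noisy in setups like this one.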

Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
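The slides do not spell out how "bias reduction (in percent)" is computed. One common definition, assumed here, is the share of the raw bias (NATE minus ATT) that an estimator removes; values above 100 then indicate overcorrection. The function name and the example estimates are hypothetical. NATE and ATT are the population values from the simulation slide, read as −4.79 and −3.96.

```python
def bias_reduction_percent(estimate, nate, att):
    """Share of the raw bias (nate - att) removed by the estimator, in percent."""
    raw_bias = nate - att
    remaining_bias = estimate - att
    return 100 * (1 - remaining_bias / raw_bias)

NATE, ATT = -4.79, -3.96  # population values from the simulation slide (assumed reading)

for estimate in (-4.79, -4.10, -3.96, -3.80):  # hypothetical estimator means
    print(estimate, round(bias_reduction_percent(estimate, NATE, ATT), 1))
```

Under this definition an estimator that reproduces the naive difference scores 0, one that hits the population ATT scores 100, and one that lands closer to zero than the ATT scores more than 100.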

Results: Mean squared error

[Figure: mean squared error of the estimators, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

Results: Relative standard error

[Figure: relative standard error, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels for N = 500 and N = 5000. Rows and series as in the relative standard error figure.]
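The last two performance measures can be computed from simulation output along the following lines. The definitions are assumptions, since the slides do not state them. Relative standard error is taken as the mean estimated standard error divided by the standard deviation of the point estimates across replications (a value of 1 means unbiased standard errors). Coverage is the share of normal-approximation 95% intervals that contain the true ATT. The toy replications at the bottom are simulated noise, not results from the talk.

```python
import random
import statistics

def relative_standard_error(estimates, standard_errors):
    """Mean estimated SE relative to the empirical SD of the estimates."""
    return statistics.fmean(standard_errors) / statistics.stdev(estimates)

def coverage(estimates, standard_errors, truth, z=1.96):
    """Share of intervals estimate +/- z * SE that contain the true value."""
    hits = [abs(b - truth) <= z * se for b, se in zip(estimates, standard_errors)]
    return sum(hits) / len(hits)

# Toy replications: estimates scatter around the truth with SD 0.5,
# while the reported standard errors are 20% too small (hypothetical).
rng = random.Random(7)
truth = -3.96
estimates = [rng.gauss(truth, 0.5) for _ in range(2000)]
standard_errors = [0.4] * len(estimates)

print("relative SE:", round(relative_standard_error(estimates, standard_errors), 2))
print("coverage:   ", round(coverage(estimates, standard_errors, truth), 3))
```

Standard errors that are too small show up as a relative standard error below 1 and as coverage below the nominal 95%.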


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small)
- Fewer restrictions in terms of possible post-matching analyses

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated)
- Computational complexity
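To illustrate what the scaling matrix does, the sketch below computes the distance sqrt((a − b)' S⁻¹ (a − b)) for two covariates, once with S set to the sample covariance matrix (the classic Mahalanobis choice) and once with S set to the identity (plain Euclidean distance). The data, the restriction to two covariates, and the function names are assumptions for illustration.

```python
import math
import statistics

def covariance_matrix(points):
    """Sample covariance matrix of 2-dimensional points."""
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return ((sxx, sxy), (sxy, syy))

def scaled_distance(a, b, s):
    """sqrt((a - b)' S^-1 (a - b)) for a symmetric 2x2 scaling matrix S."""
    (sxx, sxy), (_, syy) = s
    det = sxx * syy - sxy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Quadratic form written out with the explicit inverse of a 2x2 matrix
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)

def nearest(unit, candidates, s):
    """Candidate with the smallest scaled distance to the unit."""
    return min(candidates, key=lambda c: scaled_distance(unit, c, s))

# Hypothetical covariates: (age in years, years of education)
controls = [(25, 9), (30, 12), (41, 12), (48, 16), (58, 10), (35, 18)]
treated_unit = (40, 15)

identity = ((1.0, 0.0), (0.0, 1.0))
mahalanobis = covariance_matrix(controls)

print("Euclidean match:  ", nearest(treated_unit, controls, identity))
print("Mahalanobis match:", nearest(treated_unit, controls, mahalanobis))
```

With these made-up values the two choices of S select different controls for the same treated unit, because age has a much larger variance than education and is therefore down-weighted by the covariance-based scaling.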

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original). Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.

  • Potential Outcomes and Causal Inference
  • Matching
  • Propensity Score Matching
  • King and Nielsens ldquoWhy Propensity Scores Should Not Be Used for Matchingrdquo
  • Are King and Nielsen right
  • Illustration using kmatch
  • Conclusions
Page 72: Why Propensity Scores Should Be Used for Matching · c c cccc cccc cc cccccc c cc c c cccc cc c c cc c cccccc cccc cc cccccccccccc cc ccccccccc c cccccc ccc c ccc c cc cc ccccccc

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 59

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Variance

2017-06-24

Propensity Scores MatchingIllustration using kmatch

In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM

For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 60

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Bias reduction (in percent)

2017-06-24

Propensity Scores MatchingIllustration using kmatch

Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias

The bias also vanishes once post-matching regression adjustment isapplied

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to X

cross-validationwith respect to Y

weighted CVwith respect to Y

Nearest-neighbormatching

Kernel matching

15 2 25 3 35 4 45 15 2 25 3 35 4 45 5

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Mean squared error

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 61

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 62

1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

92 93 94 95 96 97 98 9 92 94 96 98

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Coverage of 95 CIs

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 63


Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
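The difference between pair matching and other algorithms can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the procedure used by kmatch or by King and Nielsen: pair matching is implemented as greedy one-to-one matching without replacement in the order the treated units arrive, and kernel matching uses an Epanechnikov kernel with a user-supplied bandwidth. The function names are hypothetical.

```python
def pair_match(ps_treated, ps_control):
    """Greedy one-to-one matching without replacement on the propensity score.

    Returns a dict mapping treated index -> matched control index.
    Each control can be used at most once.
    """
    available = set(range(len(ps_control)))
    pairs = {}
    for i, p in enumerate(ps_treated):
        if not available:
            break  # no controls left; remaining treated units stay unmatched
        # closest still-available control (ties broken by index)
        j = min(available, key=lambda k: (abs(ps_control[k] - p), k))
        pairs[i] = j
        available.remove(j)
    return pairs


def kernel_weights(p, ps_control, bandwidth):
    """Normalized Epanechnikov weights of all controls for a treated score p.

    Controls outside the bandwidth get weight 0. If no control falls inside
    the bandwidth, all weights are 0 (the treated unit is off support).
    """
    raw = []
    for q in ps_control:
        u = (q - p) / bandwidth
        raw.append(0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0)
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else raw
```

With treated scores 0.30 and 0.32 and control scores 0.31, 0.60, and 0.28, pair matching assigns the control at 0.31 to the first treated unit, so the second treated unit must take the control at 0.28 although 0.31 is closer. Kernel matching with bandwidth 0.05 instead lets the second treated unit use both nearby controls, giving more weight to the closer one, and every control can contribute to several treated units.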

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
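For readers unfamiliar with the bootstrap standard errors mentioned above, the following Python sketch shows the general idea. It is a generic illustration, not the routine used in the simulation: the names `bootstrap_se` and `mean_diff` are hypothetical, resampling is done separately within the treatment and control groups (a design choice of this sketch), and `mean_diff` is only a stand-in estimator. In a real application, the full procedure (propensity-score estimation, matching, and any regression adjustment) would have to be repeated inside each replication.

```python
import random


def mean_diff(treated, control):
    """Stand-in estimator: difference in mean outcomes."""
    return sum(treated) / len(treated) - sum(control) / len(control)


def bootstrap_se(treated, control, estimator, reps=500, seed=12345):
    """Bootstrap standard error of estimator(treated, control).

    Draws `reps` resamples with replacement within each group, re-applies
    the estimator, and returns the standard deviation of the estimates.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    draws = []
    for _ in range(reps):
        t = [rng.choice(treated) for _ in treated]
        c = [rng.choice(control) for _ in control]
        draws.append(estimator(t, c))
    mean = sum(draws) / reps
    variance = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return variance ** 0.5
```

A call such as `bootstrap_se(y_treated, y_control, mean_diff)` returns one standard error; passing a different estimator function with the same two-argument signature reuses the same resampling loop.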

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.


[Figure: Results: Variance. Panels: N = 500 and N = 5000. Estimators: nearest-neighbor matching (1 neighbor, 5 neighbors); kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM and PSM, each also shown with bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.

[Figure: Results: Bias reduction (in percent). Panels: N = 500 and N = 5000. Estimators: nearest-neighbor matching (1 neighbor, 5 neighbors); kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM and PSM, each also shown with bias correction.]





1 neighbor

5 neighbors

1 neighbor

5 neighbors

fixed bandwidth

pair-matchingbandwidth

cross-validationwith respect to Xcross-validation

with respect to Yweighted CV

with respect to Y

Nearest-neighbormatching (teffects)

Nearest-neighbormatching (bootstrap)

Kernel matching(bootstrap)

9 95 1 105 11 115 12 95 1 105 11 115 12 125

N = 500 N = 5000

MDMwith biascorrection

PSMwith biascorrection

Results Relative standard error

Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 62

Results: Coverage of 95% CIs

[Figure omitted. Panels: N = 500 and N = 5000. Series: MDM and PSM, each also shown with bias correction. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y.]


Conclusions

The arguments brought forward by King and Nielsen against propensity score matching (PSM) are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
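The mechanical difference between pair matching and nearest-neighbor matching with replacement can be sketched in a few lines of Python. This is a toy illustration only: the function names (`pair_match`, `nn_match`, `att`) and the data are hypothetical, the greedy matching order is an assumption, and the sketch does not reproduce King and Nielsen's argument or the kmatch implementation.

```python
import numpy as np

def pair_match(ps_t, ps_c):
    """Greedy one-to-one matching without replacement on the propensity score.

    Assumes at least as many controls as treated units. Returns, for each
    treated unit (in input order), the index of its matched control.
    """
    available = list(range(len(ps_c)))
    matches = []
    for p in ps_t:
        # closest control that has not been used yet
        j = min(available, key=lambda k: abs(ps_c[k] - p))
        matches.append(j)
        available.remove(j)
    return np.array(matches)

def nn_match(ps_t, ps_c, k=1):
    """k-nearest-neighbor matching with replacement on the propensity score."""
    dist = np.abs(np.asarray(ps_t)[:, None] - np.asarray(ps_c)[None, :])
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def att(y_t, y_c, matches):
    """Average treated-minus-matched-control difference."""
    y_c = np.asarray(y_c, dtype=float)
    counterfactual = y_c[matches].reshape(len(y_t), -1).mean(axis=1)
    return float(np.mean(np.asarray(y_t, dtype=float) - counterfactual))

# Hypothetical data: two treated units close to a single good control.
ps_t, y_t = [0.50, 0.52], [2.0, 2.0]
ps_c, y_c = [0.51, 0.90, 0.10], [1.0, 5.0, 0.0]

att_pair = att(y_t, y_c, pair_match(ps_t, ps_c))
att_nn = att(y_t, y_c, nn_match(ps_t, ps_c, k=1))
print(att_pair, att_nn)
```

In this constructed example, pair matching has to pair the second treated unit with a distant control because the only close control is already used, whereas matching with replacement reuses it.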


Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)


Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
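The regression-adjustment point in the simulation conclusions above can be made concrete with a minimal sketch. It assumes a single covariate, 1-nearest-neighbor matching with replacement on that covariate, and a linear outcome model fitted to the controls; the function name `matched_att` and the data are hypothetical, and the adjustment used by kmatch or teffects may differ in detail.

```python
import numpy as np

def matched_att(x_t, y_t, x_c, y_c, adjust=False):
    """ATT from 1-nearest-neighbor matching on x, optionally regression-adjusted."""
    x_t, y_t, x_c, y_c = (np.asarray(a, dtype=float) for a in (x_t, y_t, x_c, y_c))
    # index of the closest control for each treated unit (with replacement)
    nearest = np.abs(x_t[:, None] - x_c[None, :]).argmin(axis=1)
    diff = y_t - y_c[nearest]
    if adjust:
        # fit y = a + b * x on the controls by least squares
        design = np.column_stack([np.ones_like(x_c), x_c])
        coef, *_ = np.linalg.lstsq(design, y_c, rcond=None)
        slope = coef[1]
        # remove the part of the difference explained by the remaining gap in x
        diff = diff - slope * (x_t - x_c[nearest])
    return float(diff.mean())

# Hypothetical data: control outcome is 1 + 3x, true treatment effect is 2.
x_c = np.array([0.0, 1.0, 2.0])
y_c = 1.0 + 3.0 * x_c
x_t = np.array([0.4, 1.7])
y_t = 1.0 + 3.0 * x_t + 2.0

raw = matched_att(x_t, y_t, x_c, y_c)
adjusted = matched_att(x_t, y_t, x_c, y_c, adjust=True)
print(raw, adjusted)
```

Because the matches are not exact, the raw matched difference is off by the slope times the average covariate gap; in this constructed linear case the adjustment recovers the effect of 2.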


References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
