endogeneity

ENDOGENEITY Development Workshop

Upload: tave

Post on 06-Jan-2016


Page 1: ENDOGENEITY

ENDOGENEITY

Development

Workshop

Page 2: ENDOGENEITY

What is endogeneity and why we do not like it

Three causes:
– X influences Y, but Y reinforces X too
– Z causes both X and Y fairly contemporaneously
– X causes Y, but we cannot observe X, and Z (which we do observe) is influenced by X but also by Y

Consequences:
– No matter how many observations we have, the estimators stay biased (this is called inconsistency)
– Ergo: whatever point estimates we find, we cannot even tell whether they are positive, negative or significant, because we do not know the size of the bias and have no way to estimate it
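A small simulation can make the second cause and its consequence concrete. The sketch below is not from the workshop: it assumes a hypothetical data-generating process in which an unobserved Z drives both X and Y and the true effect of X on Y is 1. Under these assumptions the OLS slope should settle near 1.5 rather than 1, and a larger sample does not help.

```python
import numpy as np

TRUE_EFFECT = 1.0  # assumed true effect of X on Y (illustrative value)

def ols_slope(n, rng):
    """OLS slope of Y on X when an unobserved Z drives both."""
    z = rng.standard_normal(n)                         # unobserved common cause
    x = z + rng.standard_normal(n)                     # X depends on Z
    y = TRUE_EFFECT * x + z + rng.standard_normal(n)   # Y depends on X and on Z
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

rng = np.random.default_rng(0)
slopes = {n: ols_slope(n, rng) for n in (100, 10_000, 1_000_000)}
for n, slope in slopes.items():
    print(f"n={n:>9,}  slope={slope:.3f}  (true effect {TRUE_EFFECT})")
```

In this setup the gap between the estimate and the true effect equals cov(X, Z)/var(X) = 0.5, which we can compute only because we wrote the simulation ourselves; with real data Z is unobserved and the size of the bias is unknown.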

Page 3: ENDOGENEITY

Some more on what endogeneity may actually be

Page 4: ENDOGENEITY

How can difference-in-differences be helpful?

Suppose your problem is measurement of treatment (effect of change in policy/choice)

– Some individuals are more likely to be treated/make some choices

– These very same individuals may be more likely to exhibit better/worse performance

– As a result, a systematic relationship that in the real world is attributable to individual characteristics will, in our model, be attributed to the effects of the policy/choice

What can be done?
– Instruments are not of much help here…
– Neither is panel analysis, unless…

Page 5: ENDOGENEITY

What is diff-in-diff exactly?

Example:
– Algeria (LG=1) does "sth" in year t+1
– Angola (LG=0) does not, neither in t nor in t+1

We want to know the effect of this "sth". If we estimated

Y = β_0 + β_1*T + e

we could not tell the effect of "sth" apart from the Algeria/Angola difference and from time.

But we can also estimate

Y = β_0 + β_1*T + β_2*LG + β_3*(T*LG) + e

with T=0 in year t and T=1 in year t+1. Then we distinguish between individual and time effects, as well as their interaction:

Page 6: ENDOGENEITY

What is diff-in-diff exactly?

                Angola (LG=0)    Algeria (LG=1)
Year 1 (T=0)    a                b
Year 2 (T=1)    c                d

Coefficient     Calculation
β_0             a
β_1             c-a
β_2             b-a
β_3             (d-b)-(c-a)
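The coefficient formulas can be checked numerically. The sketch below uses made-up cell means (not from the slides) and takes a and c as the year-1 and year-2 means of the LG=0 group and b and d as those of the LG=1 group, which is the assignment the formulas imply. It fits the interaction regression on the four cells and compares the result with the table.

```python
import numpy as np

# Hypothetical cell means (illustrative numbers only)
a, c = 10.0, 12.0   # LG=0 (no policy): year 1, year 2
b, d = 11.0, 16.0   # LG=1 (policy in year 2): year 1, year 2

T = np.array([0.0, 1.0, 0.0, 1.0])    # 1 = second year
LG = np.array([0.0, 0.0, 1.0, 1.0])   # 1 = group that does "sth"
Y = np.array([a, c, b, d])

# Y = b0 + b1*T + b2*LG + b3*(T*LG): four cells, four parameters, exact fit
X = np.column_stack([np.ones(4), T, LG, T * LG])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

by_hand = np.array([a, c - a, b - a, (d - b) - (c - a)])
print("regression:", np.round(beta, 6))
print("by hand:   ", by_hand)
```

With only four cell means the regression is just a re-labelling of the table; β_3 is the change in the LG=1 group minus the change in the LG=0 group.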

Page 7: ENDOGENEITY

How does diff-in-diff work in practice?

– Y_ist: outcome of interest for individual i in group s at time t
– T_st: dummy for whether the intervention affected group s at time t
– A_s and B_t: fixed effects for the states and years
– X_ist: relevant individual covariates
– β: estimated impact of the intervention (OLS with fixed time and state effects)
– Standard errors around that estimate are OLS standard errors after accounting for the correlation of shocks within each state-year (s,t) cell
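Written out, these definitions appear to describe the regression below. This is a reconstruction that assumes the slide follows the specification in Bertrand et al. (2004); the covariate coefficient c and the error term ε_ist are not named in the transcript and are labelled here by assumption.

```latex
Y_{ist} = A_s + B_t + c\,X_{ist} + \beta\,T_{st} + \varepsilon_{ist}
```

The two-group, two-period model with T, LG and T*LG is the special case with one treated group, one control group and no covariates.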

Page 8: ENDOGENEITY

A nice quote from Joshua Angrist (MIT)

Four steps:

1. What is the causal relationship of interest?

2. What experiment could ideally be used to capture that causal effect of interest?

3. What is the identification strategy?

4. What is your mode of statistical inference?

Problem?

Page 9: ENDOGENEITY

A nice quote from Joshua Angrist (MIT)

Although inference issues are rarely very exciting, and often quite technical, the ultimate success of even a well-conceived and conceptually exciting project turns on the details of statistical inference. This sometimes-dispiriting fact inspired the following econometrics haiku, penned by then-econometrics-Ph.D.-student Keisuke Hirano on the occasion of completing his thesis:

T-stat looks too good;

Use robust standard errors;

Significance gone.

Page 10: ENDOGENEITY

What is the problem in the case of the diff-in-diff estimator?

Serial correlation as an enemy:
– We take the time dimension seriously (it is the core of the identification strategy)
– Our LHS variable may be serially correlated

Page 11: ENDOGENEITY

What is the problem in the case of the diff-in-diff estimator?

As T→∞, the ratio of the true to the estimated variance of the estimated parameter approaches

(1 + ρλ) / (1 − ρλ)

with ρ the serial correlation in the error term and λ the serial correlation in the independent variable (both AR(1)).

– If the correlation is negative (ρ<0, with λ>0), standard errors are overstated (we reject the null too rarely)
– If the correlation is positive (ρ>0, with λ>0), standard errors are understated (we reject the null too frequently)
– If λ=0, there is no problem with standard errors, but this is highly unrealistic…
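To get a feel for the magnitudes, the limit can be evaluated for a few values. The sketch assumes the standard AR(1) result that the ratio of the true to the conventionally estimated OLS variance tends to (1 + ρλ)/(1 − ρλ); the function name and the example values are illustrative.

```python
import math

def variance_ratio(rho, lam):
    """Limit of true / estimated OLS variance with AR(1) errors (rho)
    and an AR(1) regressor (lam)."""
    if abs(rho * lam) >= 1:
        raise ValueError("need |rho * lam| < 1")
    return (1 + rho * lam) / (1 - rho * lam)

for rho, lam in [(0.8, 0.8), (0.5, 0.0), (-0.5, 0.8)]:
    ratio = variance_ratio(rho, lam)
    print(f"rho={rho:+.1f}  lam={lam:.1f}  "
          f"variance ratio={ratio:.2f}  SE ratio={math.sqrt(ratio):.2f}")
```

A ratio above 1 means the reported standard errors are too small. A policy dummy that switches on and stays on is highly persistent (large λ), which is why diff-in-diff is particularly exposed.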

Page 12: ENDOGENEITY

Paper by Bertrand et al. (2004, QJE)

– Take all papers that use diff-in-diff in top journals (N=92)
– Discuss the faults in how diff-in-diff is used in them
– Propose a "placebo" exercise:
   – Randomly assign a fake policy to some US states at some point in time
   – Run the method as employed in these 92 papers to see whether the results show a statistically significant effect of this "fake" policy

Conclusions
– DD estimation may grossly understate the standard errors => we find an effect of a policy/change where there should be nothing
– This can be corrected, but GLS is not a solution => collapse the data into pre and post periods and cluster standard errors
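The logic of the placebo exercise can be sketched on simulated data. This is a simplified illustration, not the procedure or the data used in the paper: the panel size, the AR(1) error process, the placebo assignment rule and the function names are all assumptions. The outcome contains no policy effect at all, so at the 5% level a well-behaved test should reject in about 5% of the runs.

```python
import numpy as np

def two_way_demean(a):
    """Remove state (row) and year (column) fixed effects from a balanced panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

def placebo_rejection_rate(rho, n_states=50, n_years=20, n_sims=200, seed=0):
    """Share of placebo 'laws' found significant with conventional OLS standard errors."""
    rng = np.random.default_rng(seed)
    years = np.arange(n_years)
    rejections = 0
    for _ in range(n_sims):
        # Outcome: stationary AR(1) noise within each state, no true policy effect
        shocks = rng.standard_normal((n_states, n_years))
        y = np.empty_like(shocks)
        y[:, 0] = shocks[:, 0] / np.sqrt(1 - rho**2)
        for t in range(1, n_years):
            y[:, t] = rho * y[:, t - 1] + shocks[:, t]
        # Placebo law: half of the states "pass" it in a random year and keep it
        treated = rng.permutation(n_states) < n_states // 2
        start = rng.integers(5, 15, size=n_states)
        d = (treated[:, None] & (years[None, :] >= start[:, None])).astype(float)
        # OLS with state and year fixed effects via double demeaning
        d_dd, y_dd = two_way_demean(d), two_way_demean(y)
        beta = (d_dd * y_dd).sum() / (d_dd**2).sum()
        resid = y_dd - beta * d_dd
        dof = n_states * n_years - n_states - n_years
        se = np.sqrt((resid**2).sum() / dof / (d_dd**2).sum())
        rejections += abs(beta / se) > 1.96
    return rejections / n_sims

for rho in (0.0, 0.8):
    print(f"rho={rho}: rejection rate {placebo_rejection_rate(rho):.2f}")
```

Under these assumptions the run without serial correlation should stay close to the nominal 5%, while the serially correlated run should reject far more often, which is the pattern the paper warns about.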

Page 13: ENDOGENEITY

How to do diff-in-diff?

– Need to have a control group
   – Sometimes it is enough that someone does things later than others
– Need to have at least two periods (before and after)
   – For robustness of your findings, it is good to collapse the data into before and after
   – For interpretation of your findings, keep in mind what effect such data adaptation has
– Need to have a good (theoretical!) reason why there should be any change at all
   – What is it exactly that has actually changed?
   – Why was the change implemented?
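Collapsing into before and after can be sketched as follows. The function below is a minimal illustration that assumes a balanced panel and a single policy date common to all treated units; its name, arguments and the noise-free example data are hypothetical.

```python
import numpy as np

def collapsed_did(y, treated, policy_start):
    """Diff-in-diff on data collapsed to one 'before' and one 'after' mean per unit.

    y            : (n_units, n_periods) array of outcomes
    treated      : boolean array of length n_units
    policy_start : index of the first period in which the policy is in force
    """
    before = y[:, :policy_start].mean(axis=1)
    after = y[:, policy_start:].mean(axis=1)
    change = after - before
    return change[treated].mean() - change[~treated].mean()

# Noise-free example: unit effects, a common time trend and a policy effect of 2
n_units, n_periods, policy_start = 6, 8, 5
treated = np.array([True, True, True, False, False, False])
unit_effect = np.arange(n_units, dtype=float)
period_effect = 0.5 * np.arange(n_periods)
post = (np.arange(n_periods) >= policy_start).astype(float)
y = unit_effect[:, None] + period_effect[None, :] + 2.0 * treated[:, None] * post[None, :]

estimate = collapsed_did(y, treated, policy_start)
print(f"collapsed diff-in-diff estimate: {estimate:.3f}")
```

Unit effects drop out of each unit's change and the common trend drops out of the comparison across groups, so the example recovers the built-in effect of 2. After collapsing, each unit contributes a single change, so serial correlation within a unit can no longer inflate the apparent sample size.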

Page 14: ENDOGENEITY

Next week – practical exercise

Read the papers posted on the web, but we will replicate one in particular:
– Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania, by David Card and Alan Krueger (1994)