econometrics amayoral.iae-csic.org › econometrics_insead20 › handout6.pdf · 6.obtain point...

Econometrics AHandout 6: IV Estimation and its Limitations

Weak Instruments and Weakly Endogenous Regressors

Laura Mayoral

IAE and BGSE

February 2020

Roadmap

1. Deviations from the standard framework:

Irrelevant and weak instruments

Endogeneous and weakly endogeneous regressors

2. Testing for deviations from the standard framework

Testing the exclusion restrictions: J-test of overidentification

Detection of weak instruments

Inference robust to weak instruments

General advice for a good implementation ofIV methods

The following is a list of the steps you should follow for acorrect implementation of IV techniques

The literature is still under construction– in a few years thislist can become obsolete!

Each of the steps in this list is explained in this handout,

see also Murray 2006 and references throughout this handout.

General advice for a good implementation ofIV methods

(See Murray, 2006)

1. Check the significance and estimated signs of the instrumentsin the troublesome variables reduced form and in the dependentvariables reduced form for consistency with the instruments ratio-nale.

2. Avoid bad instruments. Build a case for the validity of the in-strument, using the nine strategies described and illustrated below.

3. Test for weak instruments using Stock-Yogo critical values(Stock and Yogo (2004)). If weakness is not rejected, proceedto (4). If weakness is rejected, either turn to 2SLS or, preferably,proceed to (4).

4. Conduct hypothesis tests using the conditional likelihood ratio(CLR) test (Moreira (2003)) or its robust variants.

5. Build confidence intervals based on the CLR test or its robustvariants.

6. Obtain point estimates using Fullers estimators with a = 1 or4 (Fuller (1977)) rather than using 2SLS.

7. You can also check how robust your results are to violations ofthe exclusion restriction by using Conley (2012) approach

8. Interpret the IV results with care.

Most of the tests/estimation procedures above can be imple-mented in STATA using ivreg2. See also additional commandsmentioned below.

1. The standard IV framework

Consider the following model

y = xβ + ε; (1)

x = zΠ + v; (2)

corr(ε, v) 6= 0 (3)

The IV setup:

• x is endogenous since corr(ε, v) 6= 0

• Unless otherwise stated it’s assumed throughout that z isexogenous (E(z′ε) = 0)

• Without loss of generality, we omit exogenous regressors (ifthey exist, they can be partialled out)

1.1. Violations to the IV set up: Irrelevant instruments

Recall that if there is just one IV then

β̂iv = β +̂cov(z, y)̂cov(z,x)

If cov(z,x) = 0 −→ z is irrelevant.

General case (more instruments): if E(z′x) = 0 −→ z is irrele-vant.

Using the notation above: Π = 0 −→ z is irrelevant

Irrelevant instruments, II

What happens when z is irrelevant?

β̂2sls is not identified.

β̂2sls is inconsistent (we knew this already!)

The distribution of (β̂2sls − β) is Cauchy-like

The bias of β̂2sls tends to that of β̂ols: The distribution of β̂2slsis centered around the plim(β̂ols).

1.2. Weak instruments

Recall that if there is just one IV then

β̂2sls = β +̂cov(z, y)̂cov(z,x)

If cov(z,x) > 0 but close to zero: z is weak.

Why? if ̂cov(z,x) is close to zero −→, the bias of β̂2sls will bevery large!

Weak instruments, consequences

Although strictly speaking the conditions for consistency ofβ̂2sls are met (since E(x′z) 6= 0) standard asymptotic theory yieldvery poor approximation to the finite-sample distributions wheninstruments are weak.

The main problem derives from the fact that the finite-sampledistribution is very different from the asymptotic one (and remem-ber that 2SLS’s justification is asymptotic!).

As a result β̂2sls is weakly identified, (i.e., its distribution is notwell approximated by their standard asymptotic distribution).

Consequences, II

The source of the problem is not small-sample problems in aconventional sense, but rather, limited information, (see Bound etal. 1995).

To evaluate how severe is the weak instrument problem, weneed to have analytical expressions that approximate better thefinite sample properties of the estimators when instruments areweak.

Analytical approximations to the finite sample distribu-tions of IV estimators when instruments are weak

Different approaches:

Approach 1: Assume errors are normal and z is fix. Then wecan derive the exact distribution

Define:

µ2 = Π′z′zΠ/σ2v

µ2: concentration parameter, is a measure of the strength ofthe instruments.

(Recall that σ2v is the variance of v, defined in the linear projectionof x on z.)

(more specifically µ2: it measures the share of the variance of xexplained by z –normalised by the variance of v–).

Nelson and Startz (1990) show that

µ(β̂2sls − β) = Λ

µ plays the role that is usually played by√N (sample size).

If µp−→ ∞: instruments are strong, Λ is the usual normal dis-

tribution

If µ is small: then Λ is not standard and estimates will be verybiased.

How big is the bias? VERY big if µ is small.

The following graph plots Λ for different values of µ

In this example: β = 0, ρ = .95 is the degree of endogeneity(correlation between the endogeneous regressor and the innova-tion).

Approach 2: Weak instrument asymptotics

Approach 1 is developed under very demanding assumptions:normality, fixed z.

Staiger and Stock (1997): showed that very similar resultscan be obtained (i.e., same distribution of β, Λ) under generalconditions (z not fixed, errors non normal).

Approach 2: Weak instrument asymptotics, II

Their approach: assume Π = ΠN = C/√

(N), where N is thenumber of obs., such that

y = xβ0 + ε; (4)

x = zΠ + v; (5)

Π = ΠN = C/√

(N) (6)

corr(ε, v) 6= 0 (7)

estimate β and compute the asymptotic distribution of β underΠ = ΠN = C/

√(N)

−→ weak instrument asymptotics.

Weak instrument asymptotics

As mentioned before, weak instruments should not be thoughtof a finite sample problem!

Staiger and Stock show that:

for each sample size (even for a very large one) there will existsome values of correlation between the instrument and the regres-sor such that the quality of normal approximation is poor

What’s the meaning Π = ΠN = C/√

This is a just ’trick’ to obtain analytical expressions that ap-proximate the distribution in finite samples in a better way.

That is, we don’t really believe that Π = ΠN = C/√

(N), but itis useful to assume this setup.

We saw before that for any Π arbitrarily close to zero (but > 0),standard asymptotic theory will not be informative as the limitedinformation problem will be eventually overcome by an infinite sam-ple size.

This solution is not satisfactory as we never have such a thing!

By choosing Π = ΠN = C/√

(N), µ2 remains constant as thesample size increases (rather than going to ∞ as it would for afix Π > 0 as the sample size increases) (so the weak instrumentsproblem remains even if N −→∞)

In this case

µ2 = Π′z′zΠ/σ2v = C′(z′z/N)C/σ2v

which will converge to a constant as N increases.

Summary of weak IV asymptotic results (Staiger andStock, 1997).

β̂2sls is not consistent and non-normal

The analytical expressions obtained under weak IV asymptotics:provide very good approximations of the finite sample distributionswhen the correlation between the instruments and the endogenousregressor is small.

Test statistics (including J-test of overindentifying restrictions)do not have standard distributions.

Summary of weak IV asymptotic results, II.

Bias of β̂12sls :

If z is irrelevant (E(Z’X)=0) −→ β̂2sls is centered around theplim(β̂ols)

Note. Remember [plimβ̂ols = β +E(X ′X)−1E(X ′ε)]

If z is weak: the bias of β̂2sls tends to the bias of β̂ols

1.3. Violations to the exclusion condition: Setup

Let’s now consider violations to the exclusion restriction

y = xβ + ε; (8)

x = zΠ + v; (9)

ε = zγ + ε∗ (10)

corr(ε∗, v) 6= 0 (11)

Unless otherwise stated, assume z is strong i.e. Π 6= 0 and islarge.

If γ 6= 0 −→ z is endogenous, (exclusion restriction is violated)

Endogeneous instruments, consequences

β is not consistent.

The bias (1 endogenous variable, 1 instrument):

β̂2sls − βp−→

cov(z, ε)

cov(z,x)=γ

Very important:

Any correlation between z an ε will be magnified if the correla-tion between z and x is small (i.e., if Π is small)

An example: Estimating the causal effect of yearsof education on lifetime earnings, Angrist and Krueger(1991).

Education is likely to be endogenous. Why?

Omitted variable: innate ability.

More talented people will tend to remain in school longer.

Also, more talented people will tend to earn more money.

Angrist and Krueger (1991)’s approach: use “quarter of birth”as IV.

Their argument: US compulsory schooling laws are in terms ofage, not number of years of schooling completed. If compulsoryschooling age is 16, you can drop out on your 16th birthday (evenif in the middle of the school year).

School entry is once a year, and cutoffs are based on birthdays.Beginning school: kids that are 6 years olds by Sep. 1st

The combination of these two generates variation in schoolingfor those who drop out as soon as they can.

Consider two students, one born on August 31st and anotherborn on Sep. 1st: by the time they turn 16, the second has had 1additional year of education.

Instrument validity

Do birthdays satisfy the exclusion restriction, or could birthdaysbe correlated with earnings for other reasons than their effect onschooling?

Birthday affects, e.g., age rank in class?.

Do birthdays indeed affect schooling? −→ Check the first stage.

Angrist and Krueger data

Huge dataset.

Data are from the 1980 US Census. 329,509 men born 1930 to1939 (i.e. in their 40s when observed). For these men we haveyear of birth, quarter of birth, years of schooling, and earnings in1979.

Do birthdays satisfy the exclusion restriction?,

Bound, Jaeger and Baker (1995): argue that z might not beexogenous:

Some evidence that quarter of birth is related to school atten-dance rates, performance in reading, maths, etc.

Differences in physical and mental health of individuals born indifferent times of the year.

Key point:

Although the correlation between z and ε is likely to be small,it gets magnified by the very small correlation between quarter ofbirth and education!!.

As a result, the bias can be very large.

Bound et al: used Angrist and Krueger data but instead ofusing the actual quarter of birth, they randomly assigned a quarterof birth to each observation.

This ’random’ quarter of birth is exogenous but also, totallyuncorrelated with education.

However, the results using this instrument were very similarthan those obtained with the true quarter of birth!

Nothing in the second stage regression could suggest that thenew Z was totally irrelevant!

Another important point: the weak instrument problem wasimportant even if the sample size was huge!!

Many weak instruments

Angrist and Krueger used as instruments the interactions be-tween quarter of birth and year of birth and quarter of birth andstate of birth.

Adding many weak instruments makes the problem even worse:the F statistic of the first stage gets smaller −→the bias gets worse.

The next three tables are from Bound et al..

The first table shows that by increasing the number of (weak)instruments, the F statistic that tests the joint significance of theinstruments in the the first stage decreases.

Table 2: many weak instruments –contains instruments basedon quarter of birth interacted with state of birth and year of birth.

Table 3: uses the randomly generated quarters of birth (i.e.,all instruments are irrelevant).

Compare Tables 1/2 with 3: just by looking at the secondstage regression you won’t be able to detect that instruments areirrelevant: both tables look similar!

but the first stage shows that the regressions are problematic–look at the F’s!

Notice that the OLS estimates and those obtained with irrele-vant instruments are very similar, as the theory predicts.

Weakly endogeneous regressors

In practice, it is very likely that z and ε have some correlation.

Thus, as before, it is useful to study local violations to theexclusion restriction

Set up

y = xβ + ε; (12)

x = zΠ + v; (13)

ε = zγ + ε∗ (14)

γ = γn = B/√

(N) (15)

corr(ε∗, v) 6= 0 (16)

Interpretation:

We don’t believe that γ is as described,

this is just a trick to find analytical expressions to the finitesample distributions under mild violations of the endogeneity re-striction (and it works very well!).

Summary of asymptotic results under endogeneity of Z

1. Recall that γ 6= 0 −→ β̂2sls is not consistent

2. But if γ = γn = B/√

(N), β̂2sls is consistent!

3. Asymptotic distribution of β̂2sls is normal with the same variance-covariance matrix but centered on a ’wrong’ value.

T 1/2(β̂2sls − β0)d→ N(

σ2εΠ2E(Z′Z)

) (17)

Some implications

Under mild violations of the exclusion restriction: point esti-mates might still be ok! (IF sample size is large, estimates arestrong and violation is mild!!)

Inference is wrong even under mild violations: we’ll tend toreject the null hypothesis of no significance too often when it istrue (size is wrong).

If B is small relative to Π: the bias will be small.

If we knew B or (it could be consistently estimated), we couldcorrect the distribution and use it to obtain valid inference.

But B cannot be consistently estimated!

An alternative approach: Conley et al. (2012)

Plausibly exogeneous, Conley et al. (2012)

Main idea: Relax the exclusion restriction by adopting a Bayesianapproach

Set up

y = xβ + zγ + ε; (18)

x = zΠ + v; (19)

corr(ε, v) 6= 0 (20)

where x can contain various endogenous regressors (s) and z con-tains r instruments (r≥ s).

Plausibly exogeneous, II

Bayesian approach: incorporate beliefs about γ

The IV exclusion restriction is equivalent to the dogmatic beliefthat γ = 0.

New belief: γ is close to zero, but maybe not identical to zero.

To implement this technique in STATA: ssc install plausexog

Plausibly exogeneous, III

4 inference strategies that use prior information about γ

1. The research specifies the support of γ.

2 and 3: The research specifies some prior information aboutthe distribution of γ

and 4: Full Bayesian approach, with priors over all parameters.

Their strategy allows to compute confidence intervals for βconditional on any potential value of γ

This allows to evaluate the robustness to deviations from theexclusion restriction.

Example (from Conley et al.,)

Price elasticity of Demand for margarine.

log(share) = βlog(retailprice) + controls+ vt

Instrument: log wholesale prices. Why? they should primar-ily vary in response to cost shocks and be less sensitive to retaildemand shocks than retail prices.

The following picture shows that there can be substantial vi-olations of the exclusion restriction and the estimates won’t varymuch.

δ measures how large is the deviation from the exclusion re-striction (i.e., δ = 0 −→ exclusion restriction holds). Its precisedefinition varies depending on the method employed.

Example 2: Returns to schooling

Data from Angrist and Krueger (1991), 300.000+ observations

log(wagei) = β1educationi + controls+ ui

Conley use some of Bound’s arguments/calculations to set uppriors for the parameters.

Last remark:

The stronger the instruments, the more robust your estimateswill be to deviations from the exclusion restriction

Since the strength of the instruments is something that can beevaluated (as opposed to the “exogeneity”) is important to haveinstruments as strong as possible!

What you should do in practice?

[Advice from Mostly Harmless...]

• Report the first stage and think about whether it makes sense.Are the magnitude and sign as you would expect?

• Report the F-statistics on the excluded instruments. Thebigger it is the better. Use a proper test

• Pick your best single instrument and report just-identified es-timates using this one only. Just-identified IV is approximatelymedian-unbiased

• Check over-identified 2SLS estimates with LIML. If the LIMLestimates are very different, or standard errors are much big-ger, worry.

See also Murray (2006) for more practical advice!

2. Testing for deviations from the standardframework

Inference robust to weak instruments

Software: ivreg2 (STATA)

Examples

Testing for deviations from the standardframework

Testing for the failure of the exclusion restriction: notpossible to test, unless the model is overidentified (i.e., thereare more instruments that endogenous variables), but strongassumptions are needed (basically you need to assume whatyou want to test!)

Testing for the failure of the relevance condition/weakinstruments: different tests are available, most of them basedon the F statistic of the null hypothesis that Π = 0 in the firststage.

Two types of test

Tests of underidentification: test whether instruments areirrelevant (Π=0)

Tests of weak identification: test whether instrument areweak.

Always interpret tests of underidentification with caution:

if you reject underidentification, it can still be the casethat your model is only weakly identified since instrumentsare weak.

Detection of weak instruments, II

Goal: determine when instruments are irrelevant (test ofunder identification) or weak (tests of weak identification).

For the 1 endogenous variable case, consider the reduced-form regression

X = ZΠ +Wδ + v

where W are exogenous regressors.

Using F to test for weak instruments

The concentration parameter µ2 = Π′Z′ZΠ/σ2v is closelyrelated to the F statistic associated to testing H0 : Π = 0.

More specifically (under conditional homoscedasticity) F '1+ µ2/K, where K is the number of instruments.

weak instruments −→ a low value of µ2 −→ a low value ofF.

But: this relation relies heavily on the assumption of con-ditional homoscedasticity of the error term.

Staiger and Stock’s rule of thumb

Staiger and Stock (1997) showed that the weak instru-ment problem can arise even if the hypothesis of Π = 0 isrejected at standard t-tests at conventional significance lev-els.

Staiger and Stock’s Rule of thumb (1 endogenous vari-able):

Reject that your instruments are weak if F≥10,

where F is the F-statistic testing Π=0 in the regression of yon Z and W (where W are the exogenous regressors includedin the equation of interest).

Stock and Yogo (2005)’s bias and size methods

Stock and Yogo (SY, 2005) formalise Staiger and Stock’sprocedure.

They show that their rule of thumb is not always enough!(see below)

SY’s tests can be used with multiple endogenous regres-sors and multiple instruments.

How weak is weak?

We need a cutoff value for µ2 such that for values smallerthan µ2, instruments are deemed to be weak.

Since (under conditional homoscedasticity) the F statisticof the first stage regression and µ2 are related, the cutoff valueis usually compared with it.

There are different ways in which we can define this cutoffvalue, depending what we want to control (bias of β, size oft-test), etc.

The value of the cutoff depends on the method employed.

This means that the same instrument can be weak for oneestimation method but not for other!

2SLS is one of the least robust

How weak is weak, II

Stock and Yogo propose two methods to select the cut-offs:

The bias method: find a cutoff value for µ2 such that therelative bias of the 2SLS with respect to OLS doesn’t exceedcertain quantity.

The size method: find a cutoff value for µ2 such that themaximal size of a Wald test of all the elements of β doesn’texceed a certain amount.

Stock and Yogo (2005)’s bias method

Let µ210%bias

be the value of µ2 such that if µ2 ≥ µ210%bias

,the maximum bias of the 2SLS estimator will be no morethan 10% of the bias of OLS.

Decision rule is (5% significance level )

Reject that your instruments are weak if F > J10(k),

where J10(k) is chosen such that P (F > J10(k);µ2 = µ210%bias

) =0.05

Stock and Yogo (2005)’s bias method, II

Test statistic:

1 endogenous regressor: computed using F from the firststage regression.

more than 1 endogenous regressor: Cragg- Donald (1993)statistic (multivariate version of the F statistic).

Stock and Yogo (2005) provide critical values that dependon

the number of endogenous regressors,

the number of instruments,

the maximum bias.

The estimation procedure (2SLS, LIML, ...)

ivreg2 (STATA): gives you the critical values.

Stock and Yogo (2005)’s size method

Similar logic but instead of controlling bias, control thesize of a Wald test of β=β0

Use F statistic first stage if 1 endogenous regressor orCragg-Donald if more than 1

Decision rule:

Reject weak instruments if statistic is larger than the crit-ical value

Have a look at the critical values (and compare to Staigerand Stock’s rule of thumb!)

Detecting weak instruments, final commentsAs mentioned before, the logic of using the first-stage

using the F statistics relies heavily on the assumption of con-ditional homoscedasticity.

Solution: ongoing research

Kleibergen-Paap rk Wald statistic: ivreg2 reports this testas a test for weak instruments when robust options are calledfor.

However, this test is not formally justified in the contextof weak instruments.

It is justified in the case of under identification and if errorsare i.i.d., it becomes the Cragg-Donald test (but not underweak instruments!).

Olea Montiel and Pflueger (2013) and (2014): tests validunder heteroskedasticity, autocorrelation, and clus- tering ro-bust weak instrument tests, for 2SLS and LIML.

But only applicable if there is only 1 endogenous regressor.

Stata: weakivtest (you need to install the package first).

econometrics amayoral.iae-csic.org › econometrics_insead20 › handout6.pdf · 6.obtain point...

Documents

time series analysis -...

tema 2: juegos estáticos con información...

2017 fullers fours head crew list 4x- band 1 fullers fours...

sda09 handout6 full

introduction to time series and forecasting handout 1...

case study- fullers fine jewelry, dallas, texas accolade...

1....

introduction to univariate nonstationary time series...

love your laundry webinar with shannon lush info sheet ·...

chores1 -...

kontrol...

the governance of perpetual financial...

queues, not just mediocrity: inefficiency in decentralized...

jack fullers oxley green | brightling | robertsbridge...

fullers bleed

6 fullers earth mb filter line - mundo...

instrumental variables estimation in...

fullers earth business opportunites - primaryinfo · 2019....

presentation by cliff fullers

¿qué caracteriza a los juegos dinámicos con información...