
Chapter 19: Measurement Error and the Instrumental Variables Estimation Procedure

Chapter 19 Outline

• Introduction to Measurement Error
  o What Is Measurement Error?
  o Modeling Measurement Error
• The Ordinary Least Squares (OLS) Estimation Procedure and Dependent Variable Measurement Error
• The Ordinary Least Squares (OLS) Estimation Procedure and Explanatory Variable Measurement Error
  o Summary: Explanatory Variable Measurement Error Bias
  o Explanatory Variable Measurement Error: Attenuation (Dilution) Bias
  o Might the Ordinary Least Squares (OLS) Estimation Procedure Be Consistent?
• Instrumental Variable (IV) Estimation Procedure: A Two Regression Procedure
  o Mechanics
  o The “Good” Instrument Conditions
• Measurement Error Example: Annual, Permanent, and Transitory Income
  o Definitions and Theory
  o Might the Ordinary Least Squares (OLS) Estimation Procedure Suffer from a Serious Econometric Problem?
• Instrumental Variable (IV) Approach
  o The Mechanics
  o Comparison of the Ordinary Least Squares (OLS) and the Instrumental Variables (IV) Approaches
  o “Good” Instrument Conditions Revisited
• Justifying the Instrumental Variable (IV) Estimation Procedure

Chapter 19 Prep Questions

1. Suppose that a physics assignment requires you to measure the amount of time it takes a one pound weight to fall six feet. You conduct twenty trials in which you use a very accurate stop watch to measure how long it takes the weight to fall.

a. Even though you are very careful and conscientious would you expect the stop watch to report precisely the same amount of time on each trial? Explain.


Suppose that the following equation describes the relationship between the measured elapsed time and the actual elapsed time:

$$\mathit{yMeasured}_t = \mathit{yActual}_t + v_t$$

where yMeasuredt = Measured elapsed time
      yActualt = Actual elapsed time

vt is a random variable. vt represents the random influences that cause your measurement of the elapsed time to deviate from the actual elapsed time. The random influences cause you to click the stop watch a little early or a little late.

b. Recall that you are careful and conscientious in attempting to measure the elapsed time.

1) In approximately what portion of the trials would you overestimate the elapsed time; that is, in approximately what portion of the trials would you expect vt to be positive?

2) In approximately what portion of the trials would you underestimate the elapsed time; that is, in approximately what portion of the trials would you expect vt to be negative?

3) Approximately what would the mean (average) of vt equal?

2. Economists distinguish between permanent income and annual income. Loosely speaking, permanent income equals what a household earns per year “on average;” that is, permanent income can be thought of as the “average” of annual income over an entire lifetime. In some years, the household’s annual income is more than its permanent income, but in other years it is less. The difference between the household’s annual income and permanent income is called transitory income:

$$\mathit{IncTrans}_t = \mathit{IncAnn}_t - \mathit{IncPerm}_t$$

where IncAnnt = Household's Annual Income
      IncPermt = Household's Permanent Income
      IncTranst = Household's Transitory Income

or equivalently,

$$\mathit{IncAnn}_t = \mathit{IncPerm}_t + \mathit{IncTrans}_t$$

Since permanent income equals what a household earns “on average,” the mean of transitory income equals 0. Microeconomic theory teaches that households base their consumption decisions on their “permanent” income.

Theory: Additional permanent income increases consumption.

Consider the following model to assess the theory:

$$\text{Model: } \mathit{Cons}_t = \beta_{Const} + \beta_{IncPerm}\,\mathit{IncPerm}_t + e_t \qquad \text{Theory: } \beta_{IncPerm} > 0$$


When we attempt to gather data to assess this theory, we immediately encounter a difficulty. Permanent income cannot be observed. Only annual income data are available to assess the theory. So, while we would like to specify permanent income as the explanatory variable, we have no choice. We must use annual disposable income.

a. Can you interpret transitory income as measurement error? Hint: What is the mean (average) of transitory income?

b. Now, represent transitory income, IncTranst, by ut:

$$\mathit{IncAnn}_t = \mathit{IncPerm}_t + u_t$$

Express the model in terms of annual income.

c. What is the equation for the new error term?

d. What are the ramifications of using the ordinary least squares (OLS) estimation procedure to estimate the permanent income coefficient, βIncPerm, using annual income as the explanatory variable?

Introduction to Measurement Error

Two types of measurement error can be present:

• Dependent variable
• Explanatory variable

We shall argue that dependent variable measurement error does not lead to bias. On the other hand, whenever explanatory variable measurement error exists, the explanatory variable and error term will be correlated, resulting in bias. We consider dependent variable measurement error first. Before doing so, however, we shall describe precisely what we mean by measurement error.

What Is Measurement Error?

Suppose that a physics assignment requires you to measure the amount of time it takes a one pound weight to fall six feet. You conduct twenty trials in which you use a very accurate stop watch to measure how long it takes the weight to fall.

Question: Will your stop watch report the same amount of time on each trial? Answer: No. Sometimes reported times will be lower than other reported times. Sometimes you will be a little premature in clicking the stop watch button. Other times you will be a little late.

It is humanly impossible to measure the actual elapsed time perfectly. No matter how careful you are, sometimes the measured value will be a little low and other times a little high.


Modeling Measurement Error

We can model measurement error with the following equation:

yMeasuredt = yActualt + vt

yActualt equals the actual amount of time elapsed and yMeasuredt equals the measured amount of time. vt represents measurement error. Sometimes vt will be positive when you are a little too slow in clicking the stop watch button; other times vt will be negative when you click the button a little too quickly. vt is a random variable; we cannot predict the numerical value of vt beforehand. What can we say about vt? We can describe its distribution. Since you are conscientious in measuring the elapsed time, the mean of vt’s probability distribution equals 0:

Mean[vt] = 0

Measurement error does not systematically increase or decrease the measured value of yt. The measured value of yt will not systematically overestimate or underestimate the actual value.

The Ordinary Least Squares (OLS) Estimation Procedure and Dependent Variable Measurement Error

We begin with the equation specifying the actual relationship between the dependent and explanatory variables:

Actual Relationship: yActualt = βConst + βxActualxActualt + et

But now suppose that as a consequence of measurement error, the actual value of the dependent variable, yActualt, is not observable. You have no choice but to use the measured value, yMeasuredt. Recall that the measured value equals the actual value plus the measurement error random variable, vt:

$$\mathit{yMeasured}_t = \mathit{yActual}_t + v_t \qquad v_t \text{ is a random variable with mean 0: } \text{Mean}[v_t] = 0$$

Solving for yActualt:

yActualt = yMeasuredt − vt


Let us apply this to the actual relationship:

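$$
\begin{aligned}
\mathit{yActual}_t &= \beta_{Const} + \beta_{xActual}\,\mathit{xActual}_t + e_t \\
\mathit{yMeasured}_t - v_t &= \beta_{Const} + \beta_{xActual}\,\mathit{xActual}_t + e_t \qquad \text{(substituting for } \mathit{yActual}_t\text{)} \\
\mathit{yMeasured}_t &= \beta_{Const} + \beta_{xActual}\,\mathit{xActual}_t + e_t + v_t \qquad \text{(rearranging terms)} \\
\mathit{yMeasured}_t &= \beta_{Const} + \beta_{xActual}\,\mathit{xActual}_t + \varepsilon_t \qquad \text{(letting } \varepsilon_t = e_t + v_t\text{)}
\end{aligned}
$$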

εt represents the error term in the regression that you will actually be running. Will this result in bias? To address this issue consider the following question:

Question: Are the explanatory variable, xActualt, and the error term, εt, correlated?

To answer the question, suppose that the measurement error term, vt, were to increase:

vt up → xActualt unaffected (vt does not appear in xActualt)
vt up → εt up (since εt = et + vt)

The value of the explanatory variable, xActualt, is unchanged while the error term, εt, increases. Hence, the explanatory variable and error term εt are independent; consequently, no bias should result.


Econometrics Lab 19.1: Dependent Measurement Error

Figure 19.1: Dependent Variable Measurement Error Simulation

[Link to MIT-Lab 19.1 goes here.]

We use a simulation to confirm our logic. First, we consider our base case, the no measurement error case. The YMeas Err checkbox is cleared indicating that no dependent variable measurement error is present. Consequently, no bias should result. Be certain that the Pause checkbox is cleared and click Start. After many, many repetitions, click Stop. The ordinary least squares (OLS) estimation procedure is unbiased in this case; the average of the estimated coefficient values and the actual coefficient value both equal 2.0. When no measurement error is present, all is well.

Now, we shall introduce dependent variable measurement error by checking the YMeas Err checkbox. The YMeas Var list now appears with 20.0 selected; the variance of the measurement error’s probability distribution, Var[vt], equals 20.0. Click Start and then after many, many repetitions, click Stop. Again, the average of the estimated coefficient values and the actual coefficient value both equal 2.0. Next, select 50.0 and then 80.0 from the YMeas Var list and repeat the process each time.

http://www3.amherst.edu/~fwesthoff/MITLinks/MIT-Lab-19-01.html

Sample Size = 10

Type of                  Actual   Mean (Average)     Variance of
Measurement   YMeas      Coef     of the Estimated   Estimated
Error         Var        Value    Coef Values        Coef Values
None                     2.0      ≈2.0               ≈1.7
Dep Vbl       20.0       2.0      ≈2.0               ≈1.8
Dep Vbl       50.0       2.0      ≈2.0               ≈2.0
Dep Vbl       80.0       2.0      ≈2.0               ≈2.2

Table 19.1: Dependent Variable Measurement Error Simulation Results

The simulation confirms our logic. Even when dependent variable measurement error is present, the average of the estimated coefficient values equals the actual coefficient value. Dependent variable measurement error does not lead to bias.
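The lab's experiment can be imitated in a few lines of code. The following is only a sketch under assumed settings: the constant (10.0), the variance of et (50.0), the fixed x values, and the number of repetitions are hypothetical choices, not the settings used in Lab 19.1, so the variances it reports will not match Table 19.1. What it illustrates is the pattern: the mean of the slope estimates stays near the actual coefficient, 2.0, for every measurement error variance.

```python
import numpy as np

def simulate_dep_var_error(v_var, beta_x=2.0, beta_const=10.0, e_var=50.0,
                           sample_size=10, reps=20000, seed=0):
    """Return (mean, variance) of the OLS slope estimates across repetitions.

    All numerical settings are illustrative assumptions, not the lab's.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 30.0, sample_size)        # fixed explanatory values
    x_dev = x - x.mean()
    # One row per repetition: e_t is the model's error term,
    # v_t is the dependent variable measurement error.
    e = rng.normal(0.0, np.sqrt(e_var), (reps, sample_size))
    v = rng.normal(0.0, np.sqrt(v_var), (reps, sample_size))
    y_measured = beta_const + beta_x * x + e + v
    y_dev = y_measured - y_measured.mean(axis=1, keepdims=True)
    slopes = (y_dev @ x_dev) / (x_dev @ x_dev)     # OLS slope per repetition
    return slopes.mean(), slopes.var()

results = {v_var: simulate_dep_var_error(v_var) for v_var in (0.0, 20.0, 50.0, 80.0)}
for v_var, (mean_b, var_b) in results.items():
    print(f"Var[v] = {v_var:5.1f}: mean of estimates = {mean_b:.3f}, "
          f"variance of estimates = {var_b:.4f}")
```

The function also returns the variance of the estimates, so the same run can be used to see how the spread of the estimates changes with Var[vt].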

What are the ramifications of dependent variable measurement error? The last column of Table 19.1 reveals the answer. As measurement error variance increases, the variance of the estimated coefficient values and hence the variance of the coefficient estimate’s probability distribution increases. As the variance of the dependent variable measurement error term increases, we introduce “more uncertainty” into the process and hence, the ordinary least squares (OLS) estimates become less reliable.

The Ordinary Least Squares (OLS) Estimation Procedure and Explanatory Variable Measurement Error

To investigate explanatory variable measurement error we again begin with the equation that describes the actual relationship between the dependent and explanatory variables:

Actual Relationship: yActualt = βConst + βxActualxActualt + et

Now, suppose that we cannot observe the actual value of the explanatory variable; we can only observe the measured value. The measured value equals the actual value plus the measurement error random variable, ut:

$$\mathit{xMeasured}_t = \mathit{xActual}_t + u_t \qquad u_t \text{ is a random variable with mean 0: } \text{Mean}[u_t] = 0$$

Solving for xActualt:

xActualt = xMeasuredt − ut


Now, we apply this to the actual relationship:

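$$
\begin{aligned}
\mathit{yActual}_t &= \beta_{Const} + \beta_{xActual}\,\mathit{xActual}_t + e_t \\
&= \beta_{Const} + \beta_{xActual}\,(\mathit{xMeasured}_t - u_t) + e_t \qquad \text{(substituting for } \mathit{xActual}_t\text{)} \\
&= \beta_{Const} + \beta_{xActual}\,\mathit{xMeasured}_t - \beta_{xActual}\,u_t + e_t \qquad \text{(multiplying)} \\
&= \beta_{Const} + \beta_{xActual}\,\mathit{xMeasured}_t + e_t - \beta_{xActual}\,u_t \qquad \text{(rearranging terms)} \\
\mathit{yActual}_t &= \beta_{Const} + \beta_{xActual}\,\mathit{xMeasured}_t + \varepsilon_t \qquad \text{(letting } \varepsilon_t = e_t - \beta_{xActual}\,u_t\text{)}
\end{aligned}
$$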

εt is the error term in the regression that we will actually be running.

Recall what we learned about correlation between the explanatory variable and error term:

Explanatory variable      Explanatory variable      Explanatory variable
and error term            and error term            and error term
positively correlated     uncorrelated              negatively correlated
         ↓                         ↓                         ↓
OLS estimation            OLS estimation            OLS estimation
procedure for the         procedure for the         procedure for the
coefficient value         coefficient value         coefficient value
is biased upward          is unbiased               is biased downward


Are the explanatory variable, xMeasuredt, and the error term, εt, correlated? The answer to the question depends on the actual coefficient. Consider the three possibilities:

• βxActual > 0: When the actual coefficient is positive, negative correlation exists; consequently, the ordinary least squares (OLS) estimation procedure for the coefficient value would be biased downward. To understand why, suppose that ut increases:

ut up → xMeasuredt up (since xMeasuredt = xActualt + ut)
ut up → εt down (since εt = et − βxActualut and βxActual > 0)
xMeasuredt up ↔ εt down: negative explanatory variable/error term correlation
↓
OLS biased downward

• βxActual < 0: When the actual coefficient is negative, positive correlation exists; consequently, the ordinary least squares (OLS) estimation procedure for the coefficient value would be biased upward. To understand why, suppose that ut increases:

ut up → xMeasuredt up (since xMeasuredt = xActualt + ut)
ut up → εt up (since εt = et − βxActualut and βxActual < 0)
xMeasuredt up ↔ εt up: positive explanatory variable/error term correlation
↓
OLS biased upward

• βxActual = 0: When the actual coefficient equals 0, no correlation exists; consequently, no bias results. To understand why, suppose that ut increases:

ut up → xMeasuredt up (since xMeasuredt = xActualt + ut)
ut up → εt unaffected (since εt = et − βxActualut and βxActual = 0)
xMeasuredt up ↔ εt unaffected: no explanatory variable/error term correlation
↓
OLS unbiased


Summary: Explanatory Variable Measurement Error Bias

βxActual < 0               βxActual = 0               βxActual > 0
       ↓                          ↓                          ↓
xMeasuredt and εt are      xMeasuredt and εt are      xMeasuredt and εt are
positively correlated      uncorrelated               negatively correlated
       ↓                          ↓                          ↓
OLS estimation procedure   OLS estimation procedure   OLS estimation procedure
is biased upward           is unbiased                is biased downward
       ↓                                                     ↓
Biased toward 0                                       Biased toward 0


Econometrics Lab 19.2: Explanatory Variable Measurement Error

Figure 19.2: Explanatory Variable Measurement Error Simulation


We shall use a simulation to check our logic. This time we check the XMeas Err checkbox. The XMeas Var list now appears with 20.0 selected; the variance of the measurement error’s probability distribution, Var[ut], equals 20.0. Then, we select various values for the actual coefficient. In each case, click Start and then after many, many repetitions click Stop. The simulation results are reported in Table 19.2:



Sample Size = 40

Type of                  Actual   Mean (Average)
Measurement   XMeas      Coef     of the Estimated   Magnitude
Error         Var        Value    Coef Values        of Bias
Exp Vbl       20.0       2.0      ≈1.11              ≈.89
Exp Vbl       20.0       1.0      ≈.56               ≈.44
Exp Vbl       20.0       −1.0     ≈−.56              ≈.44
Exp Vbl       20.0       0.0      ≈.00               ≈.00

Table 19.2: Explanatory Variable Measurement Error Simulation Results

The simulation results confirm our logic. When the actual coefficient is positive and explanatory variable measurement error is present, the ordinary least squares (OLS) estimation procedure for the coefficient value is biased downward. When the actual coefficient is negative and explanatory variable measurement error is present, upward bias results. Lastly, when the actual coefficient is zero, no bias results even in the presence of explanatory variable measurement error.

Explanatory Variable Measurement Error: Attenuation (Dilution) Bias

The simulations reveal an interesting pattern. While explanatory variable measurement error leads to bias, the bias never appears to be strong enough to change the sign of the mean of the coefficient estimates. In other words, explanatory variable measurement error biases the ordinary least squares (OLS) estimation procedure for the coefficient value toward 0. This type of bias is called attenuation or dilution bias.

[Figure: a βxActual number line centered at 0; for both βxActual < 0 and βxActual > 0 the OLS estimation procedure is pulled toward 0.]

Figure 19.3: Effect of Explanatory Variable Measurement Error
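The pattern in Table 19.2 can be imitated with a short simulation. This is a sketch under assumed settings, not the lab's code: the constant (10.0), the mean (50.0) and variance (25.0) of the actual explanatory variable, and the variance of et (50.0) are hypothetical values, with the variance of xActualt picked so that the attenuation is of a magnitude similar to the table's. Only Var[ut] = 20.0 and the sample size of 40 come from the text.

```python
import numpy as np

def mean_ols_slope(beta_x, u_var=20.0, x_var=25.0, e_var=50.0,
                   sample_size=40, reps=20000, seed=1):
    """Mean of the OLS slope estimates when x is observed with error.

    x_var, e_var, the constant and the mean of x are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    shape = (reps, sample_size)                     # one row per repetition
    x_actual = rng.normal(50.0, np.sqrt(x_var), shape)
    e = rng.normal(0.0, np.sqrt(e_var), shape)
    u = rng.normal(0.0, np.sqrt(u_var), shape)      # measurement error
    y_actual = 10.0 + beta_x * x_actual + e
    x_measured = x_actual + u                       # what we actually observe
    x_dev = x_measured - x_measured.mean(axis=1, keepdims=True)
    y_dev = y_actual - y_actual.mean(axis=1, keepdims=True)
    slopes = (x_dev * y_dev).sum(axis=1) / (x_dev ** 2).sum(axis=1)
    return slopes.mean()

means = {beta: mean_ols_slope(beta) for beta in (2.0, 1.0, -1.0, 0.0)}
for beta, mean_b in means.items():
    print(f"actual coef = {beta:5.1f}: mean of estimates = {mean_b:6.3f}, "
          f"magnitude of bias = {abs(beta - mean_b):.3f}")
```

In every nonzero case the mean of the estimates keeps the sign of the actual coefficient but lies closer to 0, which is the attenuation pattern described above.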


Why does explanatory variable measurement error cause attenuation bias? Even more basic, why does explanatory variable measurement error cause bias at all? After all, the chances that the measured value of the explanatory variable will be too high equal the chances it will be too low. Why should this lead to bias? To appreciate why, suppose that the actual value of the coefficient, βxActual, is positive. When the measured value of the explanatory variable, xMeasuredt, rises it can do so for two reasons:

• the actual value of explanatory variable, xActualt, rises or

• the value of the measurement error term, ut, rises.

Consider what happens to yActualt in each case:

$$\mathit{xMeasured}_t = \mathit{xActual}_t + u_t \quad\text{and}\quad \mathit{yActual}_t = \beta_{Const} + \beta_{xActual}\,\mathit{xActual}_t + e_t \qquad \text{Assume that } \beta_{xActual} > 0$$

xMeasuredt up because xActualt up → yActualt up
or
xMeasuredt up because ut up → yActualt unchanged

So, we have two possibilities:

• First case: The actual value of the dependent variable rises since the actual value of the explanatory variable has risen. In this case, the estimation procedure will estimate the value of the coefficient “correctly.”

• Second case: The actual value of the dependent variable remains unchanged since the actual value of the explanatory variable is unchanged. In this case, the estimation procedure would estimate the value of the coefficient to be 0.

Taking into account both cases, the estimation procedure will understate the effect that the actual value of the explanatory variable has on the dependent variable. Overall, the estimation procedure will understate the actual value of the coefficient.
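The "mixing" of the two cases can be made precise with a standard large-sample calculation. It is offered here as a supplementary sketch and rests on an added assumption not stated above: the measurement error ut is uncorrelated with both xActualt and et. Under that assumption, the OLS coefficient estimate, call it b_xMeasured, settles down to

$$
\begin{aligned}
b_{xMeasured} \;\longrightarrow\; \frac{\text{Cov}[\mathit{xMeasured}_t,\ \mathit{yActual}_t]}{\text{Var}[\mathit{xMeasured}_t]}
&= \frac{\beta_{xActual}\,\text{Var}[\mathit{xActual}_t]}{\text{Var}[\mathit{xActual}_t] + \text{Var}[u_t]} \\
&= \beta_{xActual} \times \frac{\text{Var}[\mathit{xActual}_t]}{\text{Var}[\mathit{xActual}_t] + \text{Var}[u_t]}
\end{aligned}
$$

The fraction multiplying βxActual lies between 0 and 1 whenever Var[ut] > 0, so the estimate keeps the sign of βxActual but is pulled toward 0; it equals 1 only when there is no measurement error. This is consistent with Table 19.2, where the ratio of the mean estimate to the actual coefficient is about the same (≈1.11/2.0 and ≈.56/1.0) in each nonzero case.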


Might the Ordinary Least Squares (OLS) Estimation Procedure Be Consistent?

Econometrics Lab 19.3: Consistency and Explanatory Variable Measurement Error

We have already shown that when explanatory variable measurement error is present and the actual coefficient is nonzero, the ordinary least squares (OLS) estimation procedure for the coefficient value is biased. But perhaps it is consistent. Let us see, by increasing the sample size:

Estimation   XMeas   Sample   Actual   Mean of     Magnitude   Variance of
Procedure    Var     Size     Coef     Coef Ests   of Bias     Coef Ests
OLS          20      40       2.0      ≈1.11       ≈0.89       ≈0.2
OLS          20      50       2.0      ≈1.11       ≈0.89       ≈0.2
OLS          20      60       2.0      ≈1.11       ≈0.89       ≈0.1

Table 19.3: OLS Estimation Procedure, Measurement Error, and Consistency

The bias does not lessen as the sample size is increased. Unfortunately, when explanatory variable measurement error is present and the actual coefficient is nonzero, the ordinary least squares (OLS) estimation procedure for the coefficient value provides only bad news:

• Bad news: The ordinary least squares (OLS) estimation procedure is biased.

• Bad news: The ordinary least squares (OLS) estimation procedure is not consistent.
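A quick numerical check of the consistency question can push the sample size well beyond what the lab allows. The sketch below uses hypothetical settings (constant 10.0, xActualt with mean 50.0 and variance 25.0, Var[et] = 50.0, 400 repetitions); only the actual coefficient, 2.0, and Var[ut] = 20.0 are taken from the text. If the procedure were consistent, the mean of the estimates would move toward 2.0 as the sample size grows.

```python
import numpy as np

def ols_slope(x, y):
    """OLS slope estimate from a regression of y on x and a constant."""
    x_dev = x - x.mean()
    return (x_dev @ (y - y.mean())) / (x_dev @ x_dev)

def summarize(sample_size, reps=400, seed=2):
    """Return (mean, variance) of OLS slope estimates for one sample size."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    for i in range(reps):
        x_actual = rng.normal(50.0, 5.0, sample_size)        # variance 25 (assumed)
        u = rng.normal(0.0, np.sqrt(20.0), sample_size)      # Var[u] = 20
        e = rng.normal(0.0, np.sqrt(50.0), sample_size)      # Var[e] = 50 (assumed)
        y_actual = 10.0 + 2.0 * x_actual + e
        slopes[i] = ols_slope(x_actual + u, y_actual)        # regress on measured x
    return slopes.mean(), slopes.var()

summary = {n: summarize(n) for n in (40, 400, 4000)}
for n, (mean_b, var_b) in summary.items():
    print(f"sample size = {n:5d}: mean = {mean_b:.3f}, "
          f"bias = {2.0 - mean_b:.3f}, variance = {var_b:.5f}")
```

In a run of this kind the variance of the estimates shrinks as the sample size increases, but the mean stays well below 2.0: the estimates become more tightly clustered around the wrong value.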

Instrumental Variable (IV) Estimation Procedure: A Two Regression Procedure

Recall that the instrumental variable estimation procedure addresses situations in which the explanatory variable and the error term are correlated:

Original Model: yt = βConst + βxxt + εt
where yt = Dependent variable
      xt = Explanatory variable
      εt = Error term
      t = 1, 2, …, T
      T = Sample size

When xt and εt are correlated
↓
xt is the “problem” explanatory variable

Figure 19.4: The “Problem” Explanatory Variable



When an explanatory variable, xt, is correlated with the error term, εt, we shall refer to the explanatory variable as the “problem” explanatory variable. The correlation of the explanatory variable and the error term creates the bias problem for the ordinary least squares (OLS) estimation procedure. The instrumental variable estimation procedure can mitigate, but not completely remedy, the problem. Let us briefly review the procedure and motivate it.

Mechanics

• Choose a “Good” Instrument: A “good” instrument, zt, must have two properties:

o Correlated with the “problem” explanatory variable, xt.

o Uncorrelated with the error term, εt.

• Instrumental Variables (IV) Regression 1: Use the instrument, zt, to provide an “estimate” of the problem explanatory variable, xt.

o Dependent variable: “Problem” explanatory variable, xt.

o Explanatory variable: Instrument, zt.

o Estimate of the “problem” explanatory variable: Estxt = aConst + azzt where aConst and az are the estimates of the constant and coefficient in this regression, IV Regression 1.

• Instrumental Variables (IV) Regression 2: In the original model, replace the “problem” explanatory variable, xt, with its surrogate, Estxt, the estimate of the “problem” explanatory variable provided by the instrument, zt, from IV Regression 1.

o Dependent variable: Original dependent variable, yt.

o Explanatory variable: Estimate of the “problem” explanatory variable based on the results from IV Regression 1, Estxt.
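The two regressions can be sketched in code on simulated data. Everything numerical here is a hypothetical illustration: the way the instrument z drives the actual explanatory variable, the variances, and the sample size are assumptions made only so that the example has a known actual coefficient (2.0) to compare against. The helper `ols` is likewise an illustrative name, not part of any package.

```python
import numpy as np

def ols(x, y):
    """Return (constant, slope) from a one-variable OLS regression of y on x."""
    x_dev = x - x.mean()
    slope = (x_dev @ (y - y.mean())) / (x_dev @ x_dev)
    return y.mean() - slope * x.mean(), slope

rng = np.random.default_rng(3)
T = 5000                                             # sample size (assumed)
z = rng.normal(0.0, 3.0, T)                          # instrument
x_actual = 50.0 + 1.5 * z + rng.normal(0.0, 4.0, T)  # actual value depends on z
u = rng.normal(0.0, np.sqrt(20.0), T)                # measurement error
e = rng.normal(0.0, np.sqrt(50.0), T)                # model error term
x = x_actual + u                                     # "problem" explanatory variable
y = 10.0 + 2.0 * x_actual + e                        # actual coefficient is 2.0

# Ordinary least squares on the "problem" variable, for comparison.
_, b_ols = ols(x, y)

# IV Regression 1: regress the "problem" variable on the instrument.
a_const, a_z = ols(z, x)
est_x = a_const + a_z * z                            # surrogate, Estx

# IV Regression 2: regress y on the surrogate.
_, b_iv = ols(est_x, y)

print(f"OLS estimate: {b_ols:.3f}")
print(f"IV estimate:  {b_iv:.3f}")
```

With a single instrument, the coefficient from IV Regression 2 is algebraically the same as the ratio of the sample covariance of z and y to the sample covariance of z and x, which offers a handy cross-check on the two-step calculation.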


The “Good” Instrument Conditions

Let us now provide the intuition behind why a “good” instrument, zt, must satisfy the two conditions:

• Instrument/“Problem” Explanatory Variable Correlation: The instrument, zt, must be correlated with the “problem” explanatory variable, xt. To understand why, focus on IV Regression 1. We are using the instrument to create a surrogate for the “problem” explanatory variable in IV Regression 1:

Estxt = aConst + azzt

The estimate, Estxt, will be a good surrogate only if it is a good predictor of the “problem” explanatory variable, xt. This will occur only if the instrument, zt, is correlated with the “problem” explanatory variable, xt.

• Instrument/Error Term Independence: The instrument, zt, must be independent of the error term, εt. Focus on IV Regression 2. We begin with the original model and then replace the “problem” explanatory variable, xt, with its surrogate, Estxt:

yt = βConst + βxxt + εt
↓ Replace “problem” with surrogate
yt = βConst + βxEstxt + εt
where Estxt = aConst + azzt from IV Regression 1

To avoid violating the explanatory variable/error term independence premise in IV Regression 2, the surrogate for the “problem” explanatory variable, Estxt, must be independent of the error term, εt. The surrogate, Estxt, is derived from the instrument, zt, in IV Regression 1:

Estxt = aConst + azzt

Consequently, to avoid violating the explanatory variable/error term independence premise the instrument, zt, and the error term, εt, must be independent.

Estxt and εt must be independent in yt = βConst + βxEstxt + εt
↓ since Estxt = aConst + azzt
zt and εt must be independent
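The second condition can be illustrated by comparing an instrument that satisfies it with one that does not. The sketch below is built on hypothetical simulated data (all coefficients, variances, and the sample size are assumptions), and it uses the covariance-ratio form of the single-instrument IV estimate, which gives the same number as the two-regression procedure. The "bad" instrument is constructed to be correlated with the error term by adding a multiple of et to a valid instrument.

```python
import numpy as np

def iv_slope(z, x, y):
    """Single-instrument IV estimate: Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

rng = np.random.default_rng(4)
T = 20000                                            # sample size (assumed)
z_good = rng.normal(0.0, 3.0, T)                     # independent of e and u
e = rng.normal(0.0, np.sqrt(50.0), T)                # model error term
u = rng.normal(0.0, np.sqrt(20.0), T)                # measurement error
x_actual = 50.0 + 1.5 * z_good + rng.normal(0.0, 4.0, T)
x = x_actual + u                                     # "problem" explanatory variable
y = 10.0 + 2.0 * x_actual + e                        # actual coefficient is 2.0

# A hypothetical "bad" instrument: still correlated with x, but also
# correlated with the error term because it contains part of e.
z_bad = z_good + 0.5 * e

b_good = iv_slope(z_good, x, y)
b_bad = iv_slope(z_bad, x, y)
print(f"IV estimate with the good instrument: {b_good:.3f}")
print(f"IV estimate with the bad instrument:  {b_bad:.3f}")
```

Both instruments are correlated with the “problem” explanatory variable, so both pass the first condition; only the one that is independent of the error term produces an estimate close to the actual coefficient.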


Measurement Error Example: Annual, Permanent, and Transitory Income

Definitions and Theory

Economists distinguish between permanent income and annual income. Loosely speaking, permanent income equals what a household earns per year “on average;” that is, permanent income can be thought of as the “average” of annual income over an entire lifetime. In some years, the household’s annual income is more than its permanent income, but in other years it is less. The difference between the household’s annual income and permanent income is called transitory income:

$$\mathit{IncTrans}_t = \mathit{IncAnn}_t - \mathit{IncPerm}_t$$

where IncAnnt = Household's Annual Income
      IncPermt = Household's Permanent Income
      IncTranst = Household's Transitory Income

or equivalently,

$$\mathit{IncAnn}_t = \mathit{IncPerm}_t + \mathit{IncTrans}_t$$

Since permanent income equals what the household earns “on average,” the mean of transitory income equals 0.

Microeconomic theory teaches that households base their consumption decisions on their “permanent” income. We are going to apply the permanent income consumption theory to health insurance coverage:

Theory: Additional permanent per capita disposable income within a state increases health insurance coverage within the state.

Project: Assess the effect of permanent income on health insurance coverage.

We consider a straightforward linear model:

$$\text{Model: } \mathit{Covered}_t = \beta_{Const} + \beta_{IncPerm}\,\mathit{IncPermPC}_t + e_t \qquad \text{Theory: } \beta_{IncPerm} > 0$$

where Coveredt = Percent of individuals with health insurance in state t
      IncPermPCt = Per capita permanent disposable income in state t

When we attempt to gather data to assess this theory, we immediately encounter a difficulty. Permanent income cannot be observed. Only annual income data are available to assess the theory.


Health Insurance Data: Cross section data of health insurance coverage, education, and income statistics from the 50 states and the District of Columbia in 2007.

Coveredt Adults (25 and older) covered by health insurance in state t (percent)

IncAnnPCt Per capita annual disposable income in state t (thousands of dollars)

HSt Adults (25 and older) who completed high school in state t (percent)

Collt Adults (25 and older) who completed a four year college in state t (percent)

AdvDegt Adults (25 and older) who have an advanced degree in state t (percent)

[Link to MIT-HealthInsur-2007.wf1 goes here.]

While we would like to specify permanent income as the explanatory variable, we have no choice. We must use annual disposable income as the explanatory variable. Using the ordinary least squares (OLS) estimation procedure to estimate the parameters:

Ordinary Least Squares (OLS)
Dependent Variable: Covered

Explanatory Variable(s):   Estimate    SE          t-Statistic   Prob
IncAnnPC                   0.226905    0.104784    2.165464      0.0352
Const                      78.56242    3.605818    21.78768      0.0000
Number of Observations     51

Estimated Equation: EstCovered = 78.6 + .23IncAnnPC

Interpretation of Estimates:
bIncAnnPC = .23: A $1,000 increase in annual per capita disposable income increases the state’s health insurance coverage by .23 percentage points.

Critical Result: The IncAnnPC coefficient estimate equals .23. The positive sign of the coefficient estimate suggests that increases in disposable income increase health insurance coverage. This evidence supports the theory.

Table 19.4: Health Insurance OLS Regression Results

http://www3.amherst.edu/~fwesthoff/MITLinks/MIT-HealthInsur-2007.wf1


Now, construct the null and alternative hypotheses:

H0: βIncPerm = 0   Disposable income has no effect on health insurance coverage
H1: βIncPerm > 0   Additional disposable income increases health insurance coverage

Since the null hypothesis is based on the premise that the actual value of the coefficient equals 0, we can calculate the Prob[Results IF H0 True] using the tails probability reported in the regression printout:

$$\text{Prob}[\text{Results IF } H_0 \text{ True}] = \frac{.0352}{2} = .0176$$

Might the Ordinary Least Squares (OLS) Estimation Procedure Suffer from a Serious Econometric Problem?

Might this regression suffer from a serious econometric problem, however? Yes. Annual income equals permanent income plus transitory income; transitory income can be viewed as measurement error. Sometimes transitory income is positive and sometimes it is negative; on average it is 0:

IncAnnPCt = IncPermPCt + IncTransPCt
↓ IncTransPCt is the measurement error
IncAnnPCt = IncPermPCt + ut where Mean[ut] = 0

or equivalently,

IncPermPCt = IncAnnPCt − ut

As a consequence of explanatory variable measurement error, the ordinary least squares (OLS) estimation procedure for the coefficient will be biased downward. To understand why, we begin with our model and then do a little algebra:


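$$
\begin{aligned}
\mathit{Covered}_t &= \beta_{Const} + \beta_{IncPerm}\,\mathit{IncPermPC}_t + e_t \qquad \text{where } \beta_{IncPerm} > 0 \\
&= \beta_{Const} + \beta_{IncPerm}\,(\mathit{IncAnnPC}_t - u_t) + e_t \qquad \text{(substituting for } \mathit{IncPermPC}_t\text{)} \\
&= \beta_{Const} + \beta_{IncPerm}\,\mathit{IncAnnPC}_t - \beta_{IncPerm}\,u_t + e_t \qquad \text{(multiplying)} \\
&= \beta_{Const} + \beta_{IncPerm}\,\mathit{IncAnnPC}_t + e_t - \beta_{IncPerm}\,u_t \qquad \text{(rearranging terms)} \\
\mathit{Covered}_t &= \beta_{Const} + \beta_{IncPerm}\,\mathit{IncAnnPC}_t + \varepsilon_t \qquad \text{(letting } \varepsilon_t = e_t - \beta_{IncPerm}\,u_t\text{)}
\end{aligned}
$$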

Theory suggests that βIncPerm is positive; consequently, we expect the new error term, εt, and the explanatory variable, IncAnnPCt, to be negatively correlated. To see why, suppose that ut increases:

ut up → IncAnnPCt up (since IncAnnPCt = IncPermPCt + ut)
ut up → εt down (since εt = et − βIncPermut and βIncPerm > 0)
IncAnnPCt up ↔ εt down: negative explanatory variable/error term correlation
↓
OLS biased downward

IncAnnPCt is the “problem” explanatory variable because it is correlated with the error term, εt. The ordinary least squares (OLS) estimation procedure for the coefficient value is biased toward 0. We shall now show how we can use the instrumental variable (IV) estimation procedure to mitigate the problem.

Instrumental Variable (IV) Approach

The Mechanics

Choose an Instrument: In this example, we use percent of adults who completed high school, HSt, as our instrument. In doing so, we believe that it satisfies the two “good” instrument conditions. We believe that high school education, HSt,

• is positively correlated with the “problem” explanatory variable, IncAnnPCt, and

• is uncorrelated with the error term, εt.


Instrumental Variables (IV) Regression 1

• Dependent variable: “Problem” explanatory variable, IncAnnPC.
• Explanatory variable: Instrument, the correlated variable, HS.

We can motivate IV Regression 1 by devising a theory to explain permanent income. Our theory is very straightforward: state per capita permanent income depends on the percent of state residents who are high school graduates:

$$\mathit{IncPermPC}_t = \alpha_{Const} + \alpha_{HS}\,\mathit{HS}_t + e_t$$

where HSt = Percent of adults (25 and over) who completed high school in state t

Theory: As a state has a greater percent of high school graduates, its per capita permanent income increases; hence, αHS > 0.

But again we note that permanent income is not observable, only annual income is. Consequently, we have no choice but to use annual per capita income as the dependent variable.

[Link to MIT-HealthInsur-2007.wf1 goes here.]

Ordinary Least Squares (OLS)
Dependent Variable: IncAnnPC

Explanatory Variable(s):   Estimate     SE          t-Statistic   Prob
HS                         0.456948     0.194711    2.346797      0.0230
Const                      −5.274762    16.75975    −0.314728     0.7543
Number of Observations     51

Estimated Equation: EstIncAnnPC = −5.27 + .457HS

Table 19.5: Health Insurance IV Regression 1 Results

What are the ramifications of using annual per capita income as the dependent variable? We can view annual per capita income as permanent per capita income with measurement error. What do we know about dependent variable measurement error? Dependent variable measurement error does not lead to bias; only explanatory variable measurement error creates bias. Since annual income is the dependent variable in IV Regression 1, the ordinary least squares (OLS) estimation procedure for the regression parameters will not be biased.



Instrumental Variables (IV) Regression 2

• Dependent variable: Original dependent variable, Covered.
• Explanatory variable: Estimate of the “problem” explanatory variable based on the results from IV Regression 1, EstIncAnnPC.

Use the estimates of IV Regression 1 to create a new variable, the estimated value of per capita disposable income based on the completion of high school:

EstIncAnnPC = −5.27 + .457HS

Ordinary Least Squares (OLS)
Dependent Variable: Covered

Explanatory Variable(s):   Estimate    SE          t-Statistic   Prob
EstIncAnnPC                1.387791    0.282369    4.914822      0.0000
Const                      39.05305    9.620730    4.059260      0.0002
Number of Observations     51

Estimated Equation: EstCovered = 39.05 + 1.39EstIncAnnPC

Interpretation of Estimates:
bEstIncAnnPC = 1.39: A $1,000 increase in annual per capita disposable income increases the state’s health insurance coverage by 1.39 percentage points.

Critical Result: The EstIncAnnPC coefficient estimate equals 1.39. The positive sign of the coefficient estimate suggests that increases in permanent disposable income increase health insurance coverage. This evidence supports the theory.

Table 19.6: Health Insurance IV Regression 2 Results

Comparison of the Ordinary Least Squares (OLS) and the Instrumental Variables (IV) Approaches

Now review the two approaches that we used to estimate the effect of permanent income on health insurance coverage: the ordinary least squares (OLS) estimation procedure and the instrumental variable (IV) estimation procedure.

• First, we used annual disposable income as the explanatory variable and applied the ordinary least squares (OLS) estimation procedure. We estimated that a $1,000 increase in per capita disposable income increases health insurance coverage by .23 percentage points. But we believe that an explanatory variable measurement error problem is present here.

• Second, we used an instrumental variable (IV) approach which resulted in a higher estimate for the impact of permanent income. We estimated that a $1,000 increase in per capita disposable income increases health insurance coverage by 1.39 percentage points.


                               βIncPerm    Standard                  Tails
                               Estimate    Error      t-Statistic    Probability
Ordinary Least Squares (OLS)   .23         .105       2.17           .0352
Instrumental Variable (IV)     1.39        .282       4.91           <.0001

Table 19.7: Comparison of OLS and IV Regression Results

These results are consistent with the notion that the ordinary least squares (OLS) estimation procedure for the coefficient value is biased downward whenever explanatory variable measurement error is present: the OLS estimate, .23, lies well below the IV estimate, 1.39.

“Good” Instrument Conditions Revisited

IV Regression 1 allows us to assess the first “good” instrument condition.

• Instrument/“Problem” Explanatory Variable Correlation: The instrument, HSt, must be correlated with the “problem” explanatory variable, IncAnnPCt. We are using the instrument to create a surrogate for the “problem” explanatory variable in IV Regression 1:

EstIncAnnPCt = −5.27 + .457HSt

The estimate, EstIncAnnPCt, will be a “good” surrogate only if the instrument, HSt, is correlated with the “problem” explanatory variable, IncAnnPCt; that is, only if the estimate is a good predictor of the “problem” explanatory variable.

The sign of the HSt coefficient is positive, supporting our view that annual income and high school education are positively correlated. Furthermore, the coefficient is significant at the 5 percent level and nearly significant at the 1 percent level. So it is reasonable to judge that the instrument meets the first condition.
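The check on the first condition amounts to inspecting the slope and t-statistic from IV Regression 1. The sketch below illustrates that check on synthetic data; the 51 observations are made up, the parameter values are assumptions, and the function name first_stage_check is our own. It does not reproduce the IV Regression 1 results reported above.

```python
import numpy as np

def first_stage_check(x_problem, instrument):
    """Return (slope, t_stat, corr) for IV Regression 1."""
    n = len(instrument)
    X = np.column_stack([np.ones(n), instrument])
    coef, *_ = np.linalg.lstsq(X, x_problem, rcond=None)
    resid = x_problem - X @ coef
    sigma2 = resid @ resid / (n - 2)  # estimate of the error term variance
    se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    corr = np.corrcoef(instrument, x_problem)[0, 1]
    return float(coef[1]), float(coef[1] / se_slope), float(corr)

# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(19)
n = 51
hs = rng.normal(85.0, 4.0, n)                         # instrument
inc_ann = -5.0 + 0.45 * hs + rng.normal(0.0, 3.5, n)  # "problem" variable

slope, t_stat, corr = first_stage_check(inc_ann, hs)
print(f"slope = {slope:.3f}, t = {t_stat:.2f}, corr = {corr:.2f}")
```

In a regression with one explanatory variable, the t-statistic is a function of the sample correlation and the sample size alone, so a weak instrument/“problem” variable correlation shows up directly as a small t-statistic.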

Next, focus on the second “good” instrument condition:

• Instrument/Error Term Independence: The instrument, HSt, and the error term, εt, must be independent. Otherwise, the explanatory variable/error term independence premise would be violated in IV Regression 2.

Recall the model that IV Regression 2 estimates:

Coveredt = βConst + βIncPerm EstIncAnnPCt + εt

Question: Are EstIncAnnPCt and εt independent?

where EstIncAnnPCt = −5.27 + .457HSt and εt = et − βIncPerm ut

Answer: Only if HSt and εt are independent.


The explanatory variable/error term independence premise will be satisfied only if the instrument, HSt, and the new error term, εt, are independent. If they are correlated, then we have gone “from the frying pan into the fire”: it was the violation of this premise that created the problem in the first place. There is no obvious reason to believe that they are correlated. Unfortunately, there is no way to confirm this empirically. This can be the “Achilles heel” of the instrumental variable (IV) estimation procedure; finding a good instrument can be very tricky.

Justifying the Instrumental Variable (IV) Estimation Procedure

Claim: While the instrumental variable (IV) estimation procedure for the coefficient value in the presence of measurement error is biased, it is consistent.

Econometrics Lab 19.4: Consistency and the Instrumental Variable (IV) Estimation Procedure

While this claim can be justified rigorously, we shall avoid the mathematics by using a simulation.




Figure 19.5: Instrumental Variable Measurement Error Simulation

Focus your attention on Figure 19.5. Since we wish to investigate the properties of the instrumental variable (IV) estimation procedure, IV is selected in the estimation procedure box. Next, note the XMeas Var List. Explanatory variable measurement error is present. By default, the variance of the probability distribution for the measurement error term, Var[ut], equals 20.0. In the Corr X&Z list .50 is selected; the correlation coefficient between the explanatory variable and the instrument is .50.


Initially, the sample size is 40. Click Start and then after many, many repetitions click Stop. The average of the estimated coefficient values equals 2.24. Next, increase the sample size from 40 to 50 and repeat the process. Do the same for a sample size of 60. As Table 19.8 reports, the average of the estimated coefficient values never equals the actual value, 2.0; consequently, the instrumental variable (IV) estimation procedure for the coefficient value is biased. But also note that the magnitude of the bias decreases as the sample size increases. Also, the variance of the estimates declines as the sample size increases.

Estimation   XMeas   Sample   Actual   Mean of      Magnitude   Variance of
Procedure    Var     Size     Coef     Coef Ests    of Bias     Coef Ests
IV           20      40       2.0      ≈2.24        ≈0.24       ≈8.8
IV           20      50       2.0      ≈2.17        ≈0.17       ≈3.4
IV           20      60       2.0      ≈2.12        ≈0.12       ≈1.7

Table 19.8: Measurement Error, IV Estimation Procedure, and Consistency

Table 19.8 suggests that when explanatory variable measurement error is present, the instrumental variable (IV) estimation procedure for the coefficient value provides both good news and bad news:

• Bad news: The instrumental variable (IV) estimation procedure for the coefficient value is still biased; the average of the estimated coefficient values does not equal the actual value.

• Good news: The instrumental variable (IV) estimation procedure for the coefficient value is consistent. As the sample size is increased,

  o the magnitude of the bias diminishes.
  o the variance of the estimated coefficient values decreases.
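Readers without access to the lab can run a similar experiment in Python. The sketch below is our own approximation, not the lab's code. Only the actual coefficient value (2.0), Var[ut] = 20, and the .50 correlation come from the discussion above; the variance of the actual explanatory variable (50), the variance of the original error term (50), the constant (10), and the choice to apply the .50 correlation to the actual (error-free) explanatory variable are assumptions, so the numbers will not match Table 19.8. Because a denominator close to zero occasionally produces an extreme IV estimate in small samples, the sketch reports the median alongside the mean and variance. It also reports the mean of the OLS estimates for comparison.

```python
import numpy as np

def simulate_slopes(sample_size, reps=1000, beta=2.0, var_x=50.0,
                    var_u=20.0, var_e=50.0, corr_xz=0.5, seed=0):
    """Return (iv_slopes, ols_slopes), one estimate per repetition."""
    rng = np.random.default_rng(seed)
    shape = (reps, sample_size)
    x_actual = rng.normal(0.0, np.sqrt(var_x), shape)
    # Instrument: unit variance, correlation corr_xz with the actual x,
    # and independent of both u and e by construction.
    z = (corr_xz * x_actual / np.sqrt(var_x)
         + np.sqrt(1.0 - corr_xz**2) * rng.normal(size=shape))
    u = rng.normal(0.0, np.sqrt(var_u), shape)  # measurement error
    e = rng.normal(0.0, np.sqrt(var_e), shape)  # original error term
    x_meas = x_actual + u                       # what we observe
    y = 10.0 + beta * x_actual + e              # assumed constant of 10

    def dev(a):
        # Deviations from each repetition's sample mean.
        return a - a.mean(axis=1, keepdims=True)

    zd, xd, yd = dev(z), dev(x_meas), dev(y)
    iv = (zd * yd).sum(axis=1) / (zd * xd).sum(axis=1)
    ols = (xd * yd).sum(axis=1) / (xd * xd).sum(axis=1)
    return iv, ols

for n in (40, 50, 60, 500):
    iv, ols = simulate_slopes(n)
    print(f"n={n:4d}  IV mean={iv.mean():.2f}  IV var={iv.var():.2f}  "
          f"IV median={np.median(iv):.2f}  OLS mean={ols.mean():.2f}")
```

Under these assumptions the IV estimates should tighten around 2.0 as the sample size grows, while the OLS estimates should settle near a value below 2.0 regardless of sample size.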
