Lecture 9. Today: Ch. 3, Multiple Regression Analysis: example with two independent variables; the Frisch-Waugh-Lovell theorem


Page 1

Lecture 9

Today: Ch. 3: Multiple Regression Analysis
• Example with two independent variables
• Frisch-Waugh-Lovell theorem

Page 2

© Christopher Dougherty 1999–2006

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

[Figure: three-dimensional diagram with axes EARNINGS, S, and EXP; the intercept b1 is marked on the EARNINGS axis]

We’ll look at the geometrical interpretation of a multiple regression model with two explanatory variables.

Specifically, we will look at an earnings function model where hourly earnings, EARNINGS, depend on years of schooling (highest grade completed), S, and years of work experience, EXP.

The model has three dimensions, one each for EARNINGS, S, and EXP. The starting point for investigating the determination of EARNINGS is the intercept, b1.

Literally the intercept gives EARNINGS for those respondents who have no schooling and no work experience. However, there were no respondents with less than 6 years of schooling. Hence a literal interpretation of b1 would be unwise.

EARNINGS = b1 + b2S + b3EXP + u

Page 3

The next term on the right side of the equation gives the effect of variations in S. A one year increase in S causes EARNINGS to increase by b2 dollars, holding EXP constant.

[Figure: the pure S effect, b1 + b2S, shown in the EARNINGS–S–EXP diagram]

EARNINGS = b1 + b2S + b3EXP + u


Page 4

[Figure: the pure EXP effect, b1 + b3EXP, shown in the EARNINGS–S–EXP diagram]

EARNINGS = b1 + b2S + b3EXP + u

Similarly, the third term gives the effect of variations in EXP. A one year increase in EXP causes earnings to increase by b3 dollars, holding S constant.


Page 5

[Figure: the plane EARNINGS = b1 + b2S + b3EXP, showing the pure S effect, the pure EXP effect, and their combined effect]

Different combinations of S and EXP give rise to values of EARNINGS which lie on the plane shown in the diagram, defined by EARNINGS = b1 + b2S + b3EXP.

This is the nonstochastic/deterministic (nonrandom) component of the model.


Page 6

EARNINGS = b1 + b2S + b3EXP + u

[Figure: an observation lying off the plane; the disturbance term u is the vertical distance between the plane, at height b1 + b2S + b3EXP, and the observed value b1 + b2S + b3EXP + u]

The final element of the model is the disturbance term, u. This causes the actual values of EARNINGS to deviate from the plane. In this observation, u happens to have a positive value.
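To make the data-generating process concrete, here is a minimal simulation sketch in Stata, run in a fresh session. The parameter values, sample size, and ranges of S and EXP are purely hypothetical, chosen only for illustration; they are not estimates.

* Minimal sketch: generating data from the model (all numbers hypothetical)
clear
set obs 540
set seed 123
gen S   = 6 + floor(15*runiform())          // hypothetical years of schooling, 6 to 20
gen EXP = floor(31*runiform())              // hypothetical years of work experience, 0 to 30
gen u   = rnormal(0, 13)                    // disturbance term
gen EARNINGS = -26.5 + 2.7*S + 0.56*EXP + u // observations scattered about the plane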


Page 7

[Figure: the same diagram as on the previous slide]

A sample consists of a number of observations generated in this way. Note that the interpretation of the model does not depend on whether S and EXP are correlated or not.

However, we do assume that the effects of S and EXP on EARNINGS are additive: the impact of a difference in S on EARNINGS is not affected by the value of EXP, and vice versa.

EARNINGS = b1 + b2S + b3EXP + u
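If additivity were in doubt, one common way to relax it is to add an interaction term. This is not part of the model used in this lecture; the variable name SEXP below is invented purely for illustration.

* Illustrative sketch only: relaxing additivity with an interaction term
gen SEXP = S*EXP
reg EARNINGS S EXP SEXP
* If the coefficient on SEXP were nonzero, the effect of S would depend on the level of EXP.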


Page 8

True model:   Yi = β1 + β2X2i + β3X3i + ui

Fitted model: Ŷi = b1 + b2X2i + b3X3i

Residual:     ei = Yi − Ŷi = Yi − b1 − b2X2i − b3X3i

The regression coefficients are derived using the same least squares principle used in simple regression analysis. The fitted value of Y in observation i depends on our choice of b1, b2, and b3.

The residual ei in observation i is the difference between the actual and fitted values of Y.
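In Stata, the fitted values and residuals can be recovered after the regression with predict. A short sketch, assuming the earnings data are in memory; the variable names EARNHAT and E are just labels chosen here:

* Sketch: fitted values and residuals from the earnings regression
reg EARNINGS S EXP
predict EARNHAT, xb      // fitted values Y-hat
predict E, resid         // residuals e = EARNINGS - EARNHAT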


Page 9

RSS = Σ ei² = Σ (Yi − b1 − b2X2i − b3X3i)²

We define RSS, the sum of the squares of the residuals, and choose b1, b2, and b3 so as to minimize it.
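A quick way to see RSS numerically, continuing the sketch above (E is the residual variable saved with predict):

* Sketch: computing RSS as the sum of the squared residuals
gen E2 = E^2
quietly summarize E2
display "RSS = " r(sum)   // should reproduce the Residual SS reported by regress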


Page 10

RSS = Σ ei² = Σ (Yi − b1 − b2X2i − b3X3i)²

Expanding the square and summing term by term:

RSS = ΣYi² + n·b1² + b2²·ΣX2i² + b3²·ΣX3i²
      − 2b1·ΣYi − 2b2·ΣX2iYi − 2b3·ΣX3iYi
      + 2b1b2·ΣX2i + 2b1b3·ΣX3i + 2b2b3·ΣX2iX3i

The first-order conditions for a minimum are

∂RSS/∂b1 = 0,   ∂RSS/∂b2 = 0,   ∂RSS/∂b3 = 0

First we expand RSS as shown, and then we use the first order conditions for minimizing it.


Page 11

b1 = Ȳ − b2·X̄2 − b3·X̄3

b2 = [ Σ(X2i − X̄2)(Yi − Ȳ) · Σ(X3i − X̄3)² − Σ(X3i − X̄3)(Yi − Ȳ) · Σ(X2i − X̄2)(X3i − X̄3) ]
     / [ Σ(X2i − X̄2)² · Σ(X3i − X̄3)² − ( Σ(X2i − X̄2)(X3i − X̄3) )² ]

We thus obtain three equations in three unknowns. Solving for b1, b2, and b3, we obtain the expressions shown above. (The expression for b3 is the same as that for b2, with the subscripts 2 and 3 interchanged everywhere.)


Page 12

The expression for b1 is a straightforward extension of the expression for it in simple regression analysis.

However, the expressions for the slope coefficients are considerably more complex than that for the slope coefficient in simple regression analysis.
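As a check, the expression for b2 can be evaluated directly from the data and compared with the regress output. A sketch in Stata, using the sample covariance matrix: dividing each sum of deviation products by (n − 1) turns it into a covariance, and the (n − 1) factors cancel between numerator and denominator, so covariances can be used in place of the sums.

* Sketch: the slope coefficient on S computed from the formula
quietly correlate EARNINGS S EXP, covariance
matrix C = r(C)     // rows/columns in the order EARNINGS, S, EXP
scalar b2 = (el(C,2,1)*el(C,3,3) - el(C,3,1)*el(C,3,2)) / (el(C,2,2)*el(C,3,3) - el(C,3,2)^2)
display b2          // should match the coefficient on S from regress (2.678125)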

For the general case when there are many explanatory variables, ordinary algebra is inadequate. It is necessary to switch to matrix algebra.
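For reference, here is a sketch of the matrix formulation in Stata's Mata, again assuming the earnings data are loaded. The general OLS estimator is b = (X'X)⁻¹X'y, where X contains a column of ones for the intercept.

* Sketch: OLS by matrix algebra in Mata
mata:
y = st_data(., "EARNINGS")
X = J(rows(y), 1, 1), st_data(., ("S", "EXP"))   // intercept column, then S and EXP
b = invsym(X'X) * (X'y)                          // b = (X'X)^(-1) X'y
b
end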


Page 13

. reg EARNINGS S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   67.54
       Model |  22513.6473     2  11256.8237           Prob > F      =  0.0000
    Residual |  89496.5838   537  166.660305           R-squared     =  0.2010
-------------+------------------------------           Adj R-squared =  0.1980
       Total |  112010.231   539  207.811189           Root MSE      =   12.91

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.678125   .2336497    11.46   0.000     2.219146    3.137105
         EXP |   .5624326   .1285136     4.38   0.000     .3099816    .8148837
       _cons |  -26.48501    4.27251    -6.20   0.000    -34.87789   -18.09213
------------------------------------------------------------------------------

Here is the regression output for the earnings function using Data Set 21.

EARNINGS-hat = -26.49 + 2.68 S + 0.56 EXP


Page 14

It indicates that earnings increase by $2.68 for every extra year of schooling and by $0.56 for every extra year of work experience.
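As an illustration (the particular values S = 12 and EXP = 5 are chosen arbitrarily here), the fitted equation predicts hourly earnings of

-26.49 + 2.68 × 12 + 0.56 × 5 = -26.49 + 32.16 + 2.80 = $8.47

(approximately; using the unrounded coefficients from the output gives $8.46).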


Page 15

Literally, the intercept indicates that an individual who had no schooling or work experience would have hourly earnings of –$26.49.

Obviously, this is impossible. The lowest value of S in the sample was 6. We have obtained a nonsense estimate because we have extrapolated too far from the data range.


Page 16

GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

Suppose that you were particularly interested in the relationship between EARNINGS and S and wished to represent it graphically, using the sample data.

A simple plot would be misleading.
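A sketch of how such a plot might be produced in Stata (the axis titles are just illustrative):

* Sketch: the simple (misleading) scatter of earnings against schooling, with a fitted line
twoway (scatter EARNINGS S) (lfit EARNINGS S), xtitle("Years of schooling") ytitle("Hourly earnings ($)")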

[Figure: scatter diagram of hourly earnings ($) against years of schooling, with a regression line]

Page 17

[Figure: the same scatter diagram as on the previous slide]

Schooling is negatively correlated with work experience. The plot fails to take account of this, and as a consequence the regression line underestimates the impact of schooling on earnings. This is an instance of omitted variable bias. (Later, we will discuss the mathematical details of this distortion.)

To eliminate the distortion, you purge both EARNINGS and S of their components related to EXP and then draw a scatter diagram using the purged variables.

. cor S EXP
(obs=540)

        |      S    EXP
--------+------------------
      S |  1.0000
    EXP | -0.2179  1.0000


Page 18

. reg EARNINGS EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =    2.98
       Model |  617.717488     1  617.717488           Prob > F      =  0.0847
    Residual |  111392.514   538  207.049282           R-squared     =  0.0055
-------------+------------------------------           Adj R-squared =  0.0037
       Total |  112010.231   539  207.811189           Root MSE      =  14.389

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |   .2414715   .1398002     1.73   0.085    -.0331497    .5160927
       _cons |   15.55527   2.442468     6.37   0.000     10.75732    20.35321
------------------------------------------------------------------------------

. predict EEARN, resid

We start by regressing EARNINGS on EXP, as shown above. The residuals are the part of EARNINGS that is not related to EXP. predict with the resid option is the Stata command for saving the residuals from the most recent regression. We name them EEARN.


Page 19

. reg S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   26.82
       Model |  152.160205     1  152.160205           Prob > F      =  0.0000
    Residual |  3052.82313   538  5.67439243           R-squared     =  0.0475
-------------+------------------------------           Adj R-squared =  0.0457
       Total |  3204.98333   539  5.94616574           Root MSE      =  2.3821

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |  -.1198454   .0231436    -5.18   0.000    -.1653083   -.0743826
       _cons |   15.69765   .4043447    38.82   0.000     14.90337    16.49194
------------------------------------------------------------------------------

. predict ES, resid

We do the same with S. We regress it on EXP and save the residuals as ES.


Page 20

Now we plot EEARN on ES and the scatter is a faithful representation of the relationship, both in terms of the slope of the trend line (the black line) and in terms of the variation about that line.

As you would expect, the trend line is steeper than in the scatter diagram which did not control for EXP (reproduced here as the red line).
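A sketch of how the purged-variable plot might be produced in Stata (axis titles are illustrative):

* Sketch: scatter of the purged variables with the fitted trend line
twoway (scatter EEARN ES) (lfit EEARN ES), xtitle("ES (schooling residuals)") ytitle("EEARN (earnings residuals)")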

[Figure: scatter diagram of EEARN (earnings residuals) against ES (schooling residuals), with the fitted trend line]


Page 21

. reg EEARN ES

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  131.63
       Model |  21895.9298     1  21895.9298           Prob > F      =  0.0000
    Residual |  89496.5833   538  166.350527           R-squared     =  0.1966
-------------+------------------------------           Adj R-squared =  0.1951
       Total |  111392.513   539  206.665145           Root MSE      =  12.898

------------------------------------------------------------------------------
       EEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ES |   2.678125   .2334325    11.47   0.000     2.219574    3.136676
       _cons |   8.10e-09   .5550284     0.00   1.000    -1.090288    1.090288
------------------------------------------------------------------------------

From multiple regression:

. reg EARNINGS S EXP

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.678125   .2336497    11.46   0.000     2.219146    3.137105
         EXP |   .5624326   .1285136     4.38   0.000     .3099816    .8148837
       _cons |  -26.48501    4.27251    -6.20   0.000    -34.87789   -18.09213
------------------------------------------------------------------------------

Here is the regression of EEARN on ES.

We will content ourselves with verifying that the estimate of the slope coefficient is the same as that from the multiple regression. A mathematical proof that the technique works requires matrix algebra.

This result is also called the Frisch-Waugh-Lovell theorem.
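The whole comparison can be scripted. A compact sketch, with the coefficients retrieved through _b[]; the residual variable names eEARN and eS are invented here to avoid clashing with the variables created above:

* Sketch: verifying the Frisch-Waugh-Lovell result numerically
quietly reg EARNINGS S EXP
scalar b_full = _b[S]
quietly reg EARNINGS EXP
predict eEARN, resid
quietly reg S EXP
predict eS, resid
quietly reg eEARN eS
scalar b_fwl = _b[eS]
display b_full "  " b_fwl    // both should print 2.678125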


Page 22

Finally, a small and not very important technical point. You may have noticed that the standard error and t statistic do not quite match. The reason for this is that the number of degrees of freedom is overstated by 1 in the residuals regression. That regression has not made allowance for the fact that we have already used up 1 degree of freedom in removing EXP from the model.
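A quick check with the numbers above: both regressions have the same residual sum of squares (89496.58), but the residuals regression divides it by 540 − 2 = 538 degrees of freedom while the multiple regression divides it by 540 − 3 = 537. Rescaling the standard error accordingly gives .2334325 × √(538/537) ≈ .2336497, which reproduces the standard error from the multiple regression exactly.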


Page 23

PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

Y = β1 + β2X2 + ... + βkXk + u

A.1: The model is linear in parameters and correctly specified.
A.2: There does not exist an exact linear relationship among the regressors in the sample.
A.3: The disturbance term has zero expectation.
A.4: The disturbance term is homoscedastic.
A.5: The values of the disturbance term have independent distributions.
A.6: The disturbance term has a normal distribution.

Moving from the simple to the multiple regression model, we start by restating the regression model assumptions. Only A.2 is different. Previously it was stated that there must be some variation in the X variable. We will explain the difference in one of the following lectures.
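To see what an exact linear relationship among regressors means in practice, here is a deliberately broken sketch; the variable SDOUBLE is invented purely for illustration. Stata detects the exact collinearity and omits one of the offending variables.

* Sketch: violating A.2 with an exactly collinear regressor
gen SDOUBLE = 2*S              // an exact linear function of S
reg EARNINGS S SDOUBLE EXP     // Stata notes that one variable is omitted because of collinearity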

Provided that the regression model assumptions are valid, the OLS estimators in the multiple regression model are unbiased and efficient, as in the simple regression model.