
Part III

Multiple Regression Analysis

Seppo Pynnönen, Econometrics I
As of Sep 29, 2020


1 Multiple Regression Analysis

    Estimation
        Matrix form
    Goodness-of-Fit
        R-square
        Adjusted R-square
    Expected values of the OLS estimators
        Irrelevant variables in a regression
        Omitted variable bias
    Variances of OLS estimators
        Multicollinearity
        Variances of misspecified models
        Estimating error variance $\sigma_u^2$
        Standard error of $\hat\beta_k$
    The Gauss-Markov Theorem


The general linear regression with $k$ explanatory variables is just an extension of the simple regression:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u. \tag{1}$$

Because
$$\frac{\partial y}{\partial x_j} = \beta_j, \qquad j = 1, \ldots, k, \tag{2}$$
the coefficient $\beta_j$ is the marginal effect of variable $x_j$: the amount by which $y$ is expected to change as $x_j$ changes by one unit while the other variables are kept constant (ceteris paribus).


Multiple regression opens up several additional options to enrich the analysis and make modeling more realistic compared to simple regression.


Example 1

Consider the consumption function $C = f(Y)$, where $Y$ is income. Suppose the hypothesis is that as income grows, the marginal propensity to consume decreases.

In simple regression we could try to fit a level-log model or a log-log model.

One possibility also could be
$$\beta_1 = \beta_{1l} + \beta_{1q} Y,$$
where according to our hypothesis $\beta_{1q} < 0$. Thus the consumption function becomes
$$C = \beta_0 + (\beta_{1l} + \beta_{1q} Y)\,Y + u = \beta_0 + \beta_{1l} Y + \beta_{1q} Y^2 + u.$$

This is a multiple regression model with $x_1 = Y$ and $x_2 = Y^2$.


Example 1 continues . . .

This simple example demonstrates that we can meaningfully enrich simple regression analysis (even though we have essentially only two variables, $C$ and $Y$) and at the same time obtain a meaningful interpretation of the above polynomial model.

Technically, considering the simple regression
$$C = \beta_0 + \beta_1 Y + v,$$
the extension
$$C = \beta_0 + \beta_{1l} Y + \beta_{1q} Y^2 + u$$
means that we have extracted the quadratic term $Y^2$ out of the error term $v$ of the simple regression.
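To illustrate, here is a minimal R sketch (all data simulated and parameter values hypothetical) of fitting the quadratic consumption function as a multiple regression:

# Hypothetical illustration: consumption with a declining marginal
# propensity to consume (beta_1q < 0), fitted as a multiple regression.
set.seed(1)
n <- 200
Y <- runif(n, 10, 100)                             # income
C <- 5 + 0.8 * Y - 0.003 * Y^2 + rnorm(n, sd = 2)  # true model, beta_1q < 0

fit <- lm(C ~ Y + I(Y^2))   # x1 = Y, x2 = Y^2; I() protects the square
summary(fit)

# Estimated marginal propensity to consume at income level Y = 50:
coef(fit)["Y"] + 2 * coef(fit)["I(Y^2)"] * 50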


Estimation


In order to estimate the model we replace classical assumption 4 by:

4. The explanatory variables are linearly independent in the mathematical sense. That is, no column of the data matrix $\mathbf{X}$ (see below) is a linear combination of the other columns.

The key assumption (Assumption 5) in terms of the conditional expectation becomes
$$\mathrm{E}[u \mid x_1, \ldots, x_k] = 0. \tag{3}$$


Given observations $(y_i, x_{i1}, \ldots, x_{ik})$, $i = 1, \ldots, n$ ($n$ = the number of observations), the estimation method again is OLS, which produces estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ by minimizing
$$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik})^2 \tag{4}$$
with respect to the parameters.

Again, the first-order solution is to set the $k+1$ partial derivatives equal to zero. The solution is straightforward, although the explicit form of the estimators becomes complicated.


Matrix form

Using matrix algebra considerably simplifies the notation in multiple regression.

Denote the observation vector on $y$ as
$$\mathbf{y} = (y_1, \ldots, y_n)', \tag{5}$$
where the prime denotes transposition.

In the same manner, denote the data matrix on the $x$-variables, augmented with ones in the first column, as the $n \times (k+1)$ matrix
$$\mathbf{X} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \tag{6}$$
where $k < n$ (the number of observations, $n$, is larger than the number of $x$-variables, $k$).


Then we can present the whole set of regression equations for the sample,
$$\begin{aligned} y_1 &= \beta_0 + \beta_1 x_{11} + \cdots + \beta_k x_{1k} + u_1 \\ y_2 &= \beta_0 + \beta_1 x_{21} + \cdots + \beta_k x_{2k} + u_2 \\ &\;\;\vdots \\ y_n &= \beta_0 + \beta_1 x_{n1} + \cdots + \beta_k x_{nk} + u_n \end{aligned} \tag{7}$$
in matrix form as
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} \tag{8}$$
or shortly
$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{u}, \tag{9}$$
where
$$\boldsymbol\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$$
is the parameter vector and
$$\mathbf{u} = (u_1, u_2, \ldots, u_n)'$$
is the error vector.


The normal equations for the first-order conditions of the minimization of (4) in matrix form are simply
$$\mathbf{X}'\mathbf{X}\boldsymbol\beta = \mathbf{X}'\mathbf{y}, \tag{10}$$
which gives the explicit solution for the OLS estimator of $\boldsymbol\beta$ as
$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}, \tag{11}$$
where $\hat{\boldsymbol\beta} = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k)'$.
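A minimal R sketch of equation (11) on simulated data (all numbers hypothetical), checking the matrix solution against lm():

# OLS via the normal equations (10)-(11) on simulated data.
set.seed(2)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                      # data matrix with a column of ones
beta_hat <- solve(t(X) %*% X, t(X) %*% y)  # solves X'X b = X'y
drop(beta_hat)

coef(lm(y ~ x1 + x2))                      # same estimates from lm()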


The fitted model is
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}, \tag{12}$$
$i = 1, \ldots, n$.

The residual for observation $i$ is again defined as
$$\hat u_i = y_i - \hat y_i. \tag{13}$$

Again it holds that
$$\bar y = \bar{\hat y} \tag{14}$$
and
$$\bar y = \hat\beta_0 + \hat\beta_1 \bar x_1 + \cdots + \hat\beta_k \bar x_k, \tag{15}$$
where $\bar y$, $\bar{\hat y}$, and $\bar x_j$, $j = 1, \ldots, k$, are sample averages of the variables.


Exploring further (Wooldridge 5e, p. 74)

The OLS fitted regression explaining college GPA in terms of high school GPA and ACT score is
$$\widehat{\text{colGPA}} = 1.29 + 0.454\,\text{hsGPA} + 0.0094\,\text{ACT}.$$
If in the sample the average high school GPA is 3.4 and the average ACT is 24.2, what is the average college GPA in the sample?
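By property (15) the sample average of colGPA is the fitted line evaluated at the sample averages of the regressors; a one-line check in R:

# Property (15): the regression line passes through the point of means.
1.29 + 0.454 * 3.4 + 0.0094 * 24.2   # average colGPA is about 3.06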


Remark 1

For example, if you fit the simple regression $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$, where $\tilde\beta_0$ and $\tilde\beta_1$ are the OLS estimators, and fit a multiple regression $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$, then generally $\tilde\beta_1 \ne \hat\beta_1$ unless $\hat\beta_2 = 0$, or $x_1$ and $x_2$ are uncorrelated.


Example 2

Consider the hourly wage example and enhance the model as
$$\log(w) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3, \tag{16}$$
where $w$ = hourly wage, $x_1$ = years of education (educ), $x_2$ = years of labor market experience (exper), and $x_3$ = years with the current employer (tenure).

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.284360 0.104190 2.729 0.00656 **

educ 0.092029 0.007330 12.555 < 2e-16 ***

exper 0.004121 0.001723 2.391 0.01714 *

tenure 0.022067 0.003094 7.133 3.29e-12 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4409 on 522 degrees of freedom

Multiple R-squared: 0.316, Adjusted R-squared: 0.3121

F-statistic: 80.39 on 3 and 522 DF, p-value: < 2.2e-16
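For reference, output of this form comes from an lm() call along the following lines; the sketch assumes Wooldridge's wage1 data, available e.g. in the R package wooldridge (the package as data source is an assumption, not stated on the slide):

# Sketch of the fit behind the output above; assumes the wage1 data
# of Wooldridge (e.g. install.packages("wooldridge")).
library(wooldridge)
data("wage1")
fit <- lm(log(wage) ~ educ + exper + tenure, data = wage1)
summary(fit)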


Goodness-of-Fit


Again, in the same manner as with the simple regression, we have
$$\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2 \tag{17}$$
or
$$\text{SST} = \text{SSE} + \text{SSR}, \tag{18}$$
where
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}. \tag{19}$$


R-square

$$R^2 = \frac{\text{SSE}}{\text{SST}} = 1 - \frac{\text{SSR}}{\text{SST}}. \tag{20}$$

Again, as in the case of the simple regression, $R^2$ can be shown to be the squared correlation coefficient between the actual $y_i$ and the fitted $\hat y_i$. This correlation is called the multiple correlation,
$$R = \frac{\sum (y_i - \bar y)(\hat y_i - \bar{\hat y})}{\sqrt{\sum (y_i - \bar y)^2}\,\sqrt{\sum (\hat y_i - \bar{\hat y})^2}}. \tag{21}$$
Recall, $\bar{\hat y} = \bar y$.

Remark 2

$R^2$ never decreases when an explanatory variable is added to the model!
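A quick numerical check of (20)-(21) on simulated data (hypothetical values):

# R-squared equals the squared correlation between actual and fitted y.
set.seed(3)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
summary(fit)$r.squared   # R^2 from the regression
cor(y, fitted(fit))^2    # squared multiple correlation, the same number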


Adjusted R-square

$$\bar R^2 = 1 - \frac{\hat s_u^2}{s_y^2}, \tag{22}$$

where

$$\hat s_u^2 = \frac{1}{n-k-1} \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \frac{1}{n-k-1} \sum_{i=1}^{n} \hat u_i^2 \tag{23}$$

is an estimator of the error variance $\sigma_u^2 = \mathrm{var}[u_i]$, $s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar y)^2$ is the sample variance of $y$, and $\hat u_i = y_i - \hat y_i$ are the residuals with $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}$ the fitted values.
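A corresponding check of (22)-(23) in R (simulated data, hypothetical values):

# Adjusted R-squared via the ratio of variance estimators, equation (22).
set.seed(4)
n <- 50; k <- 2
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x2 + rnorm(n)

fit  <- lm(y ~ x1 + x2)
s2_u <- sum(resid(fit)^2) / (n - k - 1)  # equation (23)
s2_y <- var(y)                           # sample variance of y
1 - s2_u / s2_y                          # equation (22)
summary(fit)$adj.r.squared               # same value from lm()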


Expected values of the OLS estimators


Under the classical assumptions 1–5 (with assumption 5 updated to multiple regression) we can show that the estimators of the regression coefficients are unbiased.

Theorem 1

Under the classical assumptions the OLS estimators of the regression coefficients, $\hat\beta_j$, $j = 0, 1, \ldots, k$, are unbiased, i.e.,
$$\mathrm{E}\big[\hat\beta_j\big] = \beta_j, \quad j = 0, 1, \ldots, k. \tag{24}$$


Using matrix notation, the proof of Theorem 1 is pretty straightforward. To do this, write
$$\begin{aligned} \hat{\boldsymbol\beta} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \\ &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol\beta + \mathbf{u}) \\ &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol\beta + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} \\ &= \boldsymbol\beta + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}. \end{aligned} \tag{25}$$

The expected value of $\hat{\boldsymbol\beta}$ is
$$\mathrm{E}\big[\hat{\boldsymbol\beta}\big] = \boldsymbol\beta + \mathrm{E}\big[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}\big] = \boldsymbol\beta \tag{26}$$
because Assumption 5, i.e., $\mathrm{E}[\mathbf{u} \mid \mathbf{X}] = \mathbf{0}$, implies $\mathrm{cov}[\mathbf{u}, \mathbf{X}] = \mathbf{0}$ and therefore $\mathrm{E}\big[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}\big] = \mathbf{0}$. Thus, the OLS estimators of the regression coefficients are unbiased.
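A small Monte Carlo sketch in R illustrating Theorem 1 (fixed design, hypothetical true coefficients):

# Average the OLS estimates over many samples from a known model;
# the averages should be close to the true coefficients.
set.seed(5)
beta <- c(1, 2, -0.5)              # true (beta0, beta1, beta2)
n <- 100; reps <- 5000
X <- cbind(1, rnorm(n), rnorm(n))  # design kept fixed across replications

est <- replicate(reps, {
  y <- X %*% beta + rnorm(n)       # new error draw each replication
  drop(solve(t(X) %*% X, t(X) %*% y))
})
rowMeans(est)                      # approx. (1, 2, -0.5)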


Remark 3

If $\mathbf{z} = (z_1, \ldots, z_k)'$ is a random vector, then $\mathrm{E}[\mathbf{z}] = (\mathrm{E}[z_1], \ldots, \mathrm{E}[z_k])'$. That is, the expected value of a vector is a vector whose components are the individual expected values.


Irrelevant variables in a regression

Suppose the correct model is
$$y = \beta_0 + \beta_1 x_1 + u \tag{27}$$
but we estimate the model as
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u. \tag{28}$$
Thus, $\beta_2 = 0$ in reality. The OLS estimation yields
$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2. \tag{29}$$

By Theorem 1, $\mathrm{E}\big[\hat\beta_j\big] = \beta_j$; thus in particular $\mathrm{E}\big[\hat\beta_2\big] = \beta_2 = 0$, implying that the inclusion of extra variables in a regression does not bias the results.

However, as will be seen later, they decrease the accuracy of estimation by increasing the variance of the OLS estimates.


Omitted variable bias

Suppose now, as an example, that the correct model is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u, \tag{30}$$
but we misspecify the model as
$$y = \beta_0 + \beta_1 x_1 + v, \tag{31}$$
where the omitted variable is embedded into the error term, $v = \beta_2 x_2 + u$.

The OLS estimator of $\beta_1$ for specification (31) is
$$\tilde\beta_1 = \frac{\sum (x_{i1} - \bar x_1)\, y_i}{\sum (x_{i1} - \bar x_1)^2}. \tag{32}$$


From equation (32) we have
$$\tilde\beta_1 = \beta_1 + \sum_{i=1}^{n} a_i v_i, \tag{33}$$
where
$$a_i = \frac{x_{i1} - \bar x_1}{\sum (x_{i1} - \bar x_1)^2}. \tag{34}$$


Thus, because $\mathrm{E}[v_i] = \mathrm{E}[\beta_2 x_{i2} + u_i] = \beta_2 x_{i2}$,
$$\mathrm{E}\big[\tilde\beta_1\big] = \beta_1 + \sum a_i \mathrm{E}[v_i] = \beta_1 + \sum a_i \beta_2 x_{i2} = \beta_1 + \beta_2 \frac{\sum (x_{i1} - \bar x_1) x_{i2}}{\sum (x_{i1} - \bar x_1)^2}, \tag{35}$$
i.e.,
$$\mathrm{E}\big[\tilde\beta_1\big] = \beta_1 + \beta_2 \frac{\sum (x_{i1} - \bar x_1) x_{i2}}{\sum (x_{i1} - \bar x_1)^2}, \tag{36}$$
implying that $\tilde\beta_1$ is biased for $\beta_1$ unless $x_1$ and $x_2$ are uncorrelated (or $\beta_2 = 0$). This is called the omitted variable bias.


The direction of the omitted variable bias is as follows:

              cor[x1, x2] > 0    cor[x1, x2] < 0
    β2 > 0    positive bias      negative bias
    β2 < 0    negative bias      positive bias
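A short R simulation sketch of formula (36) (hypothetical values; here $\beta_2 > 0$ and $\mathrm{cor}[x_1, x_2] > 0$, so the bias should be positive):

# Omitted variable bias: beta2 > 0 and cor(x1, x2) > 0 => upward bias.
set.seed(6)
n <- 200; reps <- 2000
beta1 <- 1; beta2 <- 2

est <- replicate(reps, {
  x1 <- rnorm(n)
  x2 <- 0.6 * x1 + rnorm(n)  # x2 positively correlated with x1
  y  <- beta1 * x1 + beta2 * x2 + rnorm(n)
  coef(lm(y ~ x1))["x1"]     # misspecified model omits x2
})
mean(est)                    # approx. beta1 + beta2 * 0.6 = 2.2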


Remark 4

The omitted variable bias is due to the fact that the error term of the misspecified model is correlated with the explanatory variables. That is, if the true model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$ but we estimate the model $y = \beta_0 + \beta_1 x_1 + v$ (so that $v = \beta_2 x_2 + u$), then
$$\mathrm{E}[v \mid x_1] = \mathrm{E}[\beta_2 x_2 + u \mid x_1] = \beta_2 \mathrm{E}[x_2 \mid x_1] + \mathrm{E}[u \mid x_1] = \beta_2 \mathrm{E}[x_2 \mid x_1] \ne 0$$
if $x_2$ and $x_1$ are correlated, i.e., the crucial assumption $\mathrm{E}[v \mid x_1] = 0$ becomes violated in the misspecified model. (Note: we assume that in the true model $\mathrm{E}[u \mid x_1, x_2] = 0$, which also implies $\mathrm{E}[u \mid x_1] = 0$.)


Exploring further (Wooldridge 5e, p. 67)

A simple model to explain city murder rates (murdrate) in terms of the probability of conviction (prbconv) and average sentence length (avgsen) is
$$\text{murdrate} = \beta_0 + \beta_1 \text{prbconv} + \beta_2 \text{avgsen} + u.$$
What are some factors contained in $u$? Do you think the key assumption $\mathrm{E}[u \mid \text{prbconv}, \text{avgsen}] = 0$ is likely to hold?


Variances of OLS estimators


Write the regression model
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i, \tag{37}$$
$i = 1, \ldots, n$, in the matrix form
$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{u}. \tag{38}$$
Then we can write the OLS estimators compactly as
$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}. \tag{39}$$


Under the classical assumptions 1–5, and assuming $\mathbf{X}$ fixed, we can show that the variance-covariance matrix of $\hat{\boldsymbol\beta}$ is
$$\mathrm{cov}\big[\hat{\boldsymbol\beta}\big] = (\mathbf{X}'\mathbf{X})^{-1}\sigma_u^2. \tag{40}$$

The variances of the individual coefficients are obtained from the main diagonal of the matrix and can be shown to be of the form
$$\mathrm{var}\big[\hat\beta_j\big] = \frac{\sigma_u^2}{(1 - R_j^2)\sum_{i=1}^{n}(x_{ij} - \bar x_j)^2}, \tag{41}$$
$j = 1, \ldots, k$, where $R_j^2$ is the R-square when regressing $x_j$ on the other explanatory variables and the constant term.
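A numerical R sketch checking that the main diagonal of (40) agrees with formula (41) (simulated design, known error variance):

# Diagonal of (X'X)^{-1} sigma2_u versus the R_j^2 formula (41).
set.seed(7)
n <- 100
x1 <- rnorm(n); x2 <- 0.5 * x1 + rnorm(n)  # deliberately correlated
sigma2_u <- 1
X <- cbind(1, x1, x2)

V <- solve(t(X) %*% X) * sigma2_u          # covariance matrix, equation (40)
V[2, 2]                                    # var of beta1-hat from the diagonal

R2_1 <- summary(lm(x1 ~ x2))$r.squared     # R_1^2: regress x1 on the others
sigma2_u / ((1 - R2_1) * sum((x1 - mean(x1))^2))  # formula (41), same value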


Multicollinearity

In terms of linear algebra, we say that vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$ are linearly independent if
$$a_1 \mathbf{x}_1 + \cdots + a_k \mathbf{x}_k = \mathbf{0}$$
holds only if
$$a_1 = \cdots = a_k = 0.$$

Otherwise $\mathbf{x}_1, \ldots, \mathbf{x}_k$ are linearly dependent. In such a case some $a_\ell \ne 0$, and we can write
$$\mathbf{x}_\ell = c_1 \mathbf{x}_1 + \cdots + c_{\ell-1} \mathbf{x}_{\ell-1} + c_{\ell+1} \mathbf{x}_{\ell+1} + \cdots + c_k \mathbf{x}_k,$$
where $c_j = -a_j / a_\ell$; that is, $\mathbf{x}_\ell$ can be represented as a linear combination of the other variables.


In statistics, the multiple correlation measures the degree of linear dependence. If the variables are perfectly linearly dependent, that is, if for example $x_j$ is a linear combination of the other variables, the multiple correlation is $R_j = 1$.

Perfect linear dependence is rare between random variables. However, particularly between macroeconomic variables, dependencies are often high.


From the variance equation (41) we see that $\mathrm{var}\big[\hat\beta_j\big] \to \infty$ as $R_j^2 \to 1$.

That is, the more linearly dependent the explanatory variables are, the larger the variance becomes. This implies that the coefficient estimates become increasingly unstable.

High (but not perfect) correlation between two or more explanatory variables is called multicollinearity.


Symptoms of multicollinearity:

1. High correlations between explanatory variables.
2. $R^2$ is relatively high, but the coefficient estimates tend to be insignificant (see the section on hypothesis testing).
3. Some of the coefficients are of the wrong sign, and some coefficients are at the same time unreasonably large.
4. Coefficient estimates change substantially from one model alternative to another.


Example 3

Variable $E_t$ denotes expenditure costs in a sample of Toyota Mark II cars at time point $t$, $M_t$ denotes the mileage, and $A_t$ the age.

Consider the model alternatives:

Model A: $E_t = \alpha_0 + \alpha_1 A_t + u_{1t}$
Model B: $E_t = \beta_0 + \beta_1 M_t + u_{2t}$
Model C: $E_t = \gamma_0 + \gamma_1 M_t + \gamma_2 A_t + u_{3t}$


Example 3 continues . . .

Estimation results (t-values in parentheses):

    Variable    Model A     Model B     Model C
    Constant    -626.24     -796.07     7.29
                (-5.98)     (-5.91)     (0.06)
    Age         7.35                    27.58
                (22.16)                 (9.58)
    Miles                   53.45       -151.15
                            (18.27)     (-7.06)
    df          55          55          54
    R²          0.897       0.856       0.946
    σ           368.6       437.0       268.3

Findings:

A priori, the coefficients $\alpha_1$, $\beta_1$, $\gamma_1$, and $\gamma_2$ should all be positive. However, the Miles coefficient in Model C is $\hat\gamma_1 = -151.15$ (!!?), while in Model B it is $\hat\beta_1 = 53.45$. The correlation $r_{A,M} = 0.996$!


Remedies:

In the collinearity problem the issue is that there is not enough information to reliably identify each variable's contribution as an explanatory variable in the model. Thus, in order to alleviate the problem:

1. Use non-sample information, if available, to impose restrictions between coefficients.
2. Increase the sample size if possible.
3. Drop the most collinear (on the basis of $R_j^2$) variables.
4. If a linear combination (usually a sum) of the most collinear variables is meaningful, replace the collinear variables by the linear combination.


Remark 5

Multicollinearity is not always harmful.

If an explanatory variable is not correlated with the rest of the explanatory variables, multicollinearity among those variables (provided that it is not perfect) does not harm the variance of the slope coefficient estimator of the uncorrelated explanatory variable. If $x_j$ is not correlated with any of the other predictors, then $R_j^2 = 0$ and hence, as is seen from equation (41), the factor $(1 - R_j^2)$ drops out of its slope coefficient estimator variance $\mathrm{var}\big[\hat\beta_j\big]$.

Another case where multicollinearity (again excluding the perfect case) is not a problem is when we are purely interested in the predictions of the model. Predictions are affected essentially only by the usefulness of the available information.


Exploring further (Wooldridge 5e, p. 93)

Consider a model explaining final exam score by class attendance (number of classes attended). To control for student abilities and efforts outside the classroom, additional explanatory variables like cumulative GPA, SAT score, and a measure of high school performance are included. Someone says, "You cannot learn much from this exercise because cumulative GPA, SAT score, and high school performance are likely to be highly collinear." What should be your response?


Sometimes collinearity arises for purely technical reasons.

Example 4 (Wooldridge 5e, Exercise 3.5)

In a study relating college grade point average (gpa) to time spent in various activities, you distribute a survey to several students. The students are asked how many hours they spend each week in four activities: studying, sleeping, working, and leisure. Any activity is put into one of these four categories, so that for each student the hours in the four activities must sum to 168.

(i) In the model
$$\text{gpa} = \beta_0 + \beta_1 \text{study} + \beta_2 \text{sleep} + \beta_3 \text{work} + \beta_4 \text{leisure} + u,$$
does it make sense to hold sleep, work, and leisure fixed while changing study?

(ii) Explain why this model violates Assumption 4.

(iii) How would you reformulate the model so that its parameters have a useful interpretation and it satisfies Assumption 4?


Variances of misspecified models

Consider again, as in (30), the regression model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u. \tag{42}$$
Suppose the following models are estimated by OLS:
$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 \tag{43}$$
and
$$\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1. \tag{44}$$
Then
$$\mathrm{var}\big[\hat\beta_1\big] = \frac{\sigma_u^2}{(1 - r_{12}^2)\sum (x_{i1} - \bar x_1)^2} \tag{45}$$
and
$$\mathrm{var}\big[\tilde\beta_1\big] = \frac{\sigma_u^2}{\sum (x_{i1} - \bar x_1)^2}, \tag{46}$$
where $r_{12}$ is the sample correlation between $x_1$ and $x_2$.


Thus $\mathrm{var}\big[\tilde\beta_1\big] \le \mathrm{var}\big[\hat\beta_1\big]$, and the inequality is strict if $r_{12} \ne 0$.

In summary (assuming $r_{12} \ne 0$):

1. If $\beta_2 \ne 0$, then $\tilde\beta_1$ is biased, $\hat\beta_1$ is unbiased, and $\mathrm{var}\big[\tilde\beta_1\big] < \mathrm{var}\big[\hat\beta_1\big]$.
2. If $\beta_2 = 0$, then both $\tilde\beta_1$ and $\hat\beta_1$ are unbiased, but $\mathrm{var}\big[\tilde\beta_1\big] < \mathrm{var}\big[\hat\beta_1\big]$.


Estimating error variance $\sigma_u^2$

An unbiased estimator of the error variance $\mathrm{var}[u] = \sigma_u^2$ is
$$\hat\sigma_u^2 = \frac{1}{n-k-1} \sum_{i=1}^{n} \hat u_i^2, \tag{47}$$
where
$$\hat u_i = y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik}. \tag{48}$$
The term $n - k - 1$ in (47) is the degrees of freedom (df).

It can be shown that
$$\mathrm{E}\big[\hat\sigma_u^2\big] = \sigma_u^2, \tag{49}$$
i.e., $\hat\sigma_u^2$ is an unbiased estimator of $\sigma_u^2$.

$\hat\sigma_u$ is called the standard error of the regression.
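In R's regression output, $\hat\sigma_u$ is reported as the "Residual standard error" (cf. Example 2); a minimal check on simulated data (hypothetical values):

# Standard error of the regression, equations (47)-(48), versus lm.
set.seed(8)
n <- 60; k <- 2
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x2 + rnorm(n, sd = 2)

fit <- lm(y ~ x1 + x2)
sqrt(sum(resid(fit)^2) / (n - k - 1))  # sigma-hat_u with df = n - k - 1
sigma(fit)                             # "Residual standard error" from lm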


Standard error of $\hat\beta_k$

The standard deviation of $\hat\beta_j$ is the square root of (41), i.e.,
$$\sigma_{\hat\beta_j} = \sqrt{\mathrm{var}\big[\hat\beta_j\big]} = \frac{\sigma_u}{\sqrt{(1 - R_j^2)\sum (x_{ij} - \bar x_j)^2}}. \tag{50}$$

Substituting $\sigma_u$ by its estimate $\hat\sigma_u = \sqrt{\hat\sigma_u^2}$ gives the standard error of $\hat\beta_j$,
$$\mathrm{se}(\hat\beta_j) = \frac{\hat\sigma_u}{\sqrt{(1 - R_j^2)\sum (x_{ij} - \bar x_j)^2}}. \tag{51}$$
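A check of (51) against lm's reported standard errors (simulated data, hypothetical values):

# Standard error of a slope estimate via equation (51), compared with lm.
set.seed(9)
n <- 80
x1 <- rnorm(n); x2 <- 0.4 * x1 + rnorm(n)
y  <- 1 + x1 - x2 + rnorm(n)

fit  <- lm(y ~ x1 + x2)
s_u  <- sigma(fit)                        # sigma-hat_u
R2_1 <- summary(lm(x1 ~ x2))$r.squared    # R_1^2
s_u / sqrt((1 - R2_1) * sum((x1 - mean(x1))^2))  # se(beta1-hat), eq. (51)
summary(fit)$coefficients["x1", "Std. Error"]    # same value from lm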


The Gauss-Markov Theorem


Theorem 2 (Gauss-Markov)

Under the classical assumptions 1–5, $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ are the best linear unbiased estimators (BLUEs) of $\beta_0, \beta_1, \ldots, \beta_k$, respectively.

BLUE:

Best: the variance of the OLS estimator is smallest among all linear unbiased estimators of $\beta_j$.
Linear: $\hat\beta_j = \sum_{i=1}^{n} w_{ij} y_i$.
Unbiased: $\mathrm{E}\big[\hat\beta_j\big] = \beta_j$.


Example 5 (Wooldridge 5e Exercise 3.13 (our Exercise 2.1))

(i) Consider the simple regression model $y = \beta_0 + \beta_1 x + u$ under the classical (Gauss-Markov) assumptions 1–5. For some function $g(x)$, for example $g(x) = x^2$ or $g(x) = \log(1 + x^2)$, define $z_i = g(x_i)$. Define a slope estimator as
$$\tilde\beta_1 = \frac{\sum_{i=1}^{n} (z_i - \bar z)\, y_i}{\sum_{i=1}^{n} (z_i - \bar z)\, x_i}.$$
Show that $\tilde\beta_1$ is linear and unbiased. (Remember, because $\mathrm{E}[u \mid x] = 0$, you can treat both $x_i$ and $z_i$ as nonrandom in your derivation.)

(ii) Show that
$$\mathrm{var}\big[\tilde\beta_1\big] = \frac{\sigma^2 \sum_{i=1}^{n} (z_i - \bar z)^2}{\big(\sum_{i=1}^{n} (z_i - \bar z)\, x_i\big)^2}.$$

(iii) Show that under the Gauss-Markov assumptions, $\mathrm{var}\big[\hat\beta_1\big] \le \mathrm{var}\big[\tilde\beta_1\big]$, where $\hat\beta_1$ is the OLS estimator. (Hint: the Cauchy-Schwarz inequality $\big(\sum a_i b_i\big)^2 \le \big(\sum a_i^2\big)\big(\sum b_i^2\big)$.)