The Multiple Regression Model under the Classical Assumptions

Introductory Econometrics: Topic 4


The Multiple Regression Model

The multiple regression model with k explanatory variables is:

Yi = α + β1 X1i + β2 X2i + ... + βk Xki + εi

where the i subscript denotes individual observations and i = 1, ..., N.

This set of slides discusses how the statistical derivations for simple regression (Topic 3) extend to multiple regression.

It also derives some results (e.g. about omitted variables bias and multicollinearity) that were intuitively motivated in Topic 1.

This set of slides is based on Chapter 4 of the textbook


Basic Theoretical Results

In Topic 3 we derived theoretical results using the simple regression model with the classical assumptions.

In Topic 4 we retain the classical assumptions, which are:

1. E(εi) = 0 (mean zero errors).
2. var(εi) = E(εi²) = σ² (constant variance errors, i.e. homoskedasticity).
3. cov(εi, εj) = 0 for i ≠ j.
4. εi is Normally distributed.
5. The explanatory variables are fixed; they are not random variables.


Statistical results for the multiple regression model are basically the same as for simple regression.

For instance, OLS is still unbiased, and confidence intervals and hypothesis tests are derived in the same way.

The Gauss-Markov theorem still says that, under the classical assumptions, OLS is BLUE.

Formulae do get messier (which is why, in more advanced courses,matrix algebra is used with the multiple regression model).

But the basic ideas are the same.


Example: multiple regression with an intercept and two explanatory variables:

Yi = α + β1 X1i + β2 X2i + εi

The OLS estimator is still the one which minimizes the sum of squared residuals.

However, the formulae turn out to be a bit messier:


β̂1 = [(Σ x1i yi)(Σ x2i²) − (Σ x2i yi)(Σ x1i x2i)] / [(Σ x1i²)(Σ x2i²) − (Σ x1i x2i)²]

β̂2 = [(Σ x2i yi)(Σ x1i²) − (Σ x1i yi)(Σ x1i x2i)] / [(Σ x1i²)(Σ x2i²) − (Σ x1i x2i)²]

α̂ = Ȳ − β̂1 X̄1 − β̂2 X̄2

where variables with bars over them are means (e.g. X̄2 = Σ X2i / N).

The previous formulae use small letters to indicate deviations from means. That is,

yi = Yi − Ȳ
x1i = X1i − X̄1
x2i = X2i − X̄2
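These formulae can be checked numerically. Below is a short NumPy sketch (the data-generating process, seed, and variable names are invented for illustration): it computes β̂1, β̂2 and α̂ from the deviation-from-means formulae above and confirms they match a generic least-squares solve.

```python
import numpy as np

# Illustrative simulated data (not from the slides): true alpha=1, beta1=2, beta2=-0.5
rng = np.random.default_rng(0)
N = 200
X1 = rng.normal(size=N)
X2 = rng.normal(size=N)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(size=N)

# "Small letter" variables: deviations from means
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

denom = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
b1 = ((x1 @ y) * (x2 @ x2) - (x2 @ y) * (x1 @ x2)) / denom
b2 = ((x2 @ y) * (x1 @ x1) - (x1 @ y) * (x1 @ x2)) / denom
a = Y.mean() - b1 * X1.mean() - b2 * X2.mean()

# Cross-check: solve the same regression with a generic least-squares routine
X = np.column_stack([np.ones(N), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

The two routes give identical answers up to floating-point error, which is a useful sanity check whenever you implement textbook formulae by hand.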


It can be shown that OLS is unbiased (e.g. E(β̂j) = βj for j = 1, 2).

An unbiased estimator of σ² is:

s² = Σ ε̂i² / (N − k − 1)

where ε̂i = Yi − α̂ − β̂1 X1i − β̂2 X2i.

Estimates of the variances of the OLS estimators are:

var(β̂1) = s² / [(1 − r²) Σ x1i²]

var(β̂2) = s² / [(1 − r²) Σ x2i²]

where r is the correlation between X1 and X2.

var(α̂) also has a formula, but it is not provided here.
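These variance formulae can also be verified numerically. The sketch below (simulated data; names are mine) computes s² and the slide's var(β̂1) and var(β̂2), and checks them against the diagonal of s²(X′X)⁻¹, the general matrix expression used in more advanced treatments.

```python
import numpy as np

# Illustrative simulated data with sigma^2 = 1 and mildly correlated regressors
rng = np.random.default_rng(1)
N, k = 500, 2
X1 = rng.normal(size=N)
X2 = 0.5 * X1 + rng.normal(size=N)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(size=N)

X = np.column_stack([np.ones(N), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ coef

s2 = (resid @ resid) / (N - k - 1)   # unbiased estimator of sigma^2

x1, x2 = X1 - X1.mean(), X2 - X2.mean()
r = (x1 @ x2) / np.sqrt((x1 @ x1) * (x2 @ x2))   # correlation of X1 and X2
var_b1 = s2 / ((1 - r**2) * (x1 @ x1))
var_b2 = s2 / ((1 - r**2) * (x2 @ x2))
```

The two-regressor formulae agree exactly with the corresponding diagonal entries of the matrix version.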


Do not memorize these formulae. I provide them here mostly to show you how messy the formulae get (and to motivate why we have been deriving key theoretical results in the simple regression case).

Don't forget the interpretation of coefficients in the multiple regression model:

"βj is the marginal effect of the jth explanatory variable on Y,

holding all the other explanatory variables constant"


Measures of Fit

R² has the same definition as in Topic 1:

R² = 1 − SSR/TSS = 1 − Σ ε̂i² / Σ (Yi − Ȳ)²

Interpretation: R² is the proportion of the variability in the dependent variable which can be explained by the explanatory variables.

Note: when new explanatory variables are added to a model, R² will always rise (even if the new variables are insignificant).

Why? By adding a new variable, Σ ε̂i² will always get at least a little bit smaller. Adding an explanatory variable cannot make fit worse.

So R² should not be used to decide whether to add a new variable to a regression. What you should do is a hypothesis test (test whether the new variable is significant and, if not, drop it).
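The claim that R² can never fall when a regressor is added is easy to demonstrate. A minimal sketch (simulated data; the `junk` regressor is pure noise by construction):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
X1 = rng.normal(size=N)
Y = 1.0 + 2.0 * X1 + rng.normal(size=N)
junk = rng.normal(size=N)   # unrelated to Y by construction

def r2(Y, *regressors):
    """R^2 = 1 - SSR/TSS from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(Y))] + list(regressors))
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    tss = ((Y - Y.mean()) ** 2).sum()
    return 1 - (resid @ resid) / tss
```

Even though `junk` has nothing to do with Y, r2(Y, X1, junk) is at least as large as r2(Y, X1).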


Alternatively, use R̄², which is similar to R² but does not always rise when new variables are added.

If you add a new variable and R̄² increases, you can be confident this new variable should be included.

If you have two regressions (with the same dependent variable but different explanatory variables), then the one with the higher R̄² is the better one.

Definition:

R̄² = 1 − var(ε)/var(Y) = 1 − s² / [ (1/(N−1)) Σ (Yi − Ȳ)² ]

The only problem with R̄² is that it CANNOT be interpreted simply as reflecting the proportion of the variability in the dependent variable which can be explained by the explanatory variables.

Summary: R² has a nice interpretation as a measure of fit, but should not be used for choosing between models. R̄² does not have as nice an interpretation as a measure of fit, but can be used for choosing between models.
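The adjusted R̄² can be sketched in code alongside plain R². In this illustration (simulated data, names mine), a pure-noise regressor is penalized by R̄² but not by R²:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
X1 = rng.normal(size=N)
Y = 1.0 + 2.0 * X1 + rng.normal(size=N)
junk = rng.normal(size=N)   # pure noise, unrelated to Y

def r2_and_rbar2(Y, *regressors):
    """Return (R^2, adjusted R-bar^2) from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(Y))] + list(regressors))
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    ssr = resid @ resid
    tss = ((Y - Y.mean()) ** 2).sum()
    k = len(regressors)
    s2 = ssr / (len(Y) - k - 1)    # estimated var(eps)
    var_y = tss / (len(Y) - 1)     # estimated var(Y)
    return 1 - ssr / tss, 1 - s2 / var_y

r2_small, rbar_small = r2_and_rbar2(Y, X1)
r2_big, rbar_big = r2_and_rbar2(Y, X1, junk)
```

Because R̄² divides the SSR by N − k − 1, each extra regressor must reduce the SSR by enough to "pay for" the lost degree of freedom, so R̄² can fall when an irrelevant variable is added.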


Hypothesis Testing in the Multiple Regression Model

Yi = α + β1 X1i + β2 X2i + ... + βk Xki + εi

We know how to do t-tests of H0: βj = 0.

In Topic 1 we presented a test for whether R² = 0.

This is equivalent to a test of the hypothesis:

H0: β1 = ... = βk = 0

Remember: testing the joint hypothesis H0: β1 = ... = βk = 0 is not the same as testing the k individual hypotheses H0: β1 = 0 and H0: β2 = 0 through H0: βk = 0.


But there are other hypotheses that you may want to test. E.g.

H0 : β1 = β3 = 0

or

H0: β1 − β3 = 0, β2 = 5

etc. etc.

F-tests are suitable for testing hypotheses involving any number of linear combinations of regression coefficients.

Likelihood ratio tests can do the same, but can also be used for nonlinear restrictions and with models other than the regression model.

In this course, we only have time to do F-tests.


F-tests

We will not provide proofs relating to F-tests.

We distinguish between the unrestricted and restricted models.

The unrestricted model is the multiple regression model as above.

The restricted model is the multiple regression model with the restrictions in H0 imposed.


Examples using k = 3:

The unrestricted model:

Yi = α+ β1X1i + β2X2i + β3X3i + εi

Consider testing H0: β1 = β2 = 0.

Then the restricted model is:

Yi = α+ β3X3i + εi


Now consider testing

H0 : β1 = 0, β2 + β3 = 0

Imposing this restriction yields a restricted model:

Yi = α+ β2 (X2i − X3i ) + εi

This restricted model is a regression with (X2 − X3) as the explanatory variable.


Now consider testing

H0 : β1 = 1, β2 = 0

Imposing this restriction yields a restricted model:

Yi − X1i = α+ β3X3i + εi

This restricted model is a regression with (Yi − X1i) as the dependent variable and X3 as the explanatory variable.

In general: for any set of linear restrictions on an unrestricted model, you can write the restricted model as a linear regression model (but with the dependent and explanatory variables defined appropriately).


For testing such hypotheses (involving more than one linear restriction on the regression coefficients), the following test statistic is used:

F = [(SSR_R − SSR_UR) / q] / [SSR_UR / (N − k − 1)]

where SSR is the familiar sum of squared residuals and the subscripts UR and R distinguish between the SSR from the "unrestricted" and "restricted" regression models.

The number of restrictions being tested is q (note: q = 2 in the examples above).

Since SSR_R ≥ SSR_UR (i.e. the model with fewer restrictions can always achieve the lower SSR), F is non-negative. Large values of F indicate the null hypothesis is incorrect.
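The F statistic is simple to compute once you can run the two regressions. A sketch for the first example above (H0: β1 = β2 = 0, so q = 2; data simulated so that H0 is in fact true):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 120
X1, X2, X3 = rng.normal(size=(3, N))
Y = 1.0 + 0.8 * X3 + rng.normal(size=N)   # beta1 = beta2 = 0 in truth

def ssr(Y, *regressors):
    """Sum of squared residuals from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(Y))] + list(regressors))
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    return resid @ resid

k, q = 3, 2
ssr_ur = ssr(Y, X1, X2, X3)   # unrestricted model
ssr_r = ssr(Y, X3)            # restricted model: beta1 = beta2 = 0 imposed
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (N - k - 1))
```

F would then be compared with a critical value from the F(q, N − k − 1) tables, as discussed next.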


As with any hypothesis test, you calculate the test statistic (here F) and compare it to a critical value. If F is greater than the critical value you reject H0 (else you accept H0).

To get the critical value you need to specify a level of significance (usually .05) and, using this, obtain a critical value from statistical tables.

F is distributed as F(q, N−k−1); in words, critical values should be obtained from the F-distribution with q degrees of freedom in the numerator and N − k − 1 degrees of freedom in the denominator.

Statistical tables for the F-distribution are available in many places (including the textbook). Appendix B gives an example of how to use these tables.

In practice, more sophisticated econometrics packages will provide you with a P-value for any test you do.

Remember the useful (but not quite correct) intuition: "the P-value is the probability that H0 is true."

A correct interpretation: "the P-value equals the smallest level of significance at which you can reject H0."


Multicollinearity

This is a fairly common practical problem in empirical work

We discussed it in Topic 1; here we will add a few technical details.

Intuition: if the explanatory variables are very highly correlated with one another, you run into problems.

Informally speaking, if two variables are highly correlated they contain roughly the same information. The OLS estimator has trouble estimating two separate marginal effects for two such highly correlated variables.


Perfect Multicollinearity

Occurs if an exact linear relationship exists between the explanatoryvariables.

This implies the correlation between two explanatory variables equals 1 (or −1).

OLS estimates cannot be calculated (i.e. Gretl will not be able to finda solution and will give an error message).


Example: a regression relating to the effect of studying on student performance.

Y = student grade on test

X1 = family income

X2 = hours studied per day

X3 = hours studied per week.

But X3 = 7X2: an exact linear relationship between two explanatory variables (they are perfectly correlated).

This is a case of perfect multicollinearity.

Intuition: β2 should measure the marginal effect of hours studied per day on student grade, holding all other explanatory variables constant.

With perfect multicollinearity there is no way of "holding all other explanatory variables constant": when X2 changes, X3 changes as well (it cannot be held constant).


Regular Multicollinearity

In practice you will never get perfect multicollinearity unless you do something that does not make sense (like putting in two explanatory variables which measure the exact same thing).

And if you ever do try to estimate a model with perfect multicollinearity you will find out quickly: Gretl will not run properly and will give you an error message.

But in empirical work you can easily get very highly correlated explanatory variables.


Example from Topic 1 slides:

Macroeconomic regression involving the interest rate.

X1 = interest rate set by Bank of England

X2 = interest rate charged by banks on mortgages.

X1 and X2 will not be exactly the same, but will be very highly correlated (e.g. r = .99).

Multicollinearity of this form can cause problems too.


I will not provide full proofs, but outline the basic idea.

The basic idea of the technical proofs rests on the variances of the OLS estimators. Previously, we wrote:

var(β̂1) = s² / [(1 − r²) Σ x1i²]

var(β̂2) = s² / [(1 − r²) Σ x2i²]

If r is near 1 (or near −1), then (1 − r²) will be near zero.

The variances of β̂1 and β̂2 become very large.

This feeds through into very wide confidence intervals (i.e. inaccurate estimates) and very small t-statistics (i.e. hypothesis tests indicate β1 and β2 are insignificant).
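The (1 − r²) term does the damage, and a small experiment makes that vivid. In this sketch (simulated data; σ² is treated as known to isolate the correlation effect), the variance formula blows up as the correlation between the regressors approaches 1:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200
X1 = rng.normal(size=N)
noise = rng.normal(size=N)
s2 = 1.0   # take sigma^2 as known here

def var_beta1(r_target):
    """var(beta1-hat) from the slide's formula, with X2 constructed to have
    (approximately) correlation r_target with X1."""
    X2 = r_target * X1 + np.sqrt(1 - r_target**2) * noise
    x1, x2 = X1 - X1.mean(), X2 - X2.mean()
    r = (x1 @ x2) / np.sqrt((x1 @ x1) * (x2 @ x2))
    return s2 / ((1 - r**2) * (x1 @ x1))
```

With r near 0.99 the variance is dozens of times larger than with roughly uncorrelated regressors, which is exactly the wide-confidence-interval symptom described above.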


Common symptom of multicollinearity problem:

Some or all explanatory variables appear insignificant, even though the model might be fitting well (has a high R²).

Common way to investigate if multicollinearity is a problem:

Calculate a correlation matrix for your explanatory variables. See if any correlations are very high.

Note: what do we mean by a correlation being "high"?

As a rough guideline, if you find correlations between your explanatory variables with |r| > .9, then you probably have a multicollinearity problem.


Solutions to multicollinearity problem:

1. Get more data (often not possible).

2. Drop out one of the highly correlated variables.

Example: Macroeconomic regression involving the interest rate(continued)

If you include both X1 and X2, you will run into a multicollinearity problem. So include one or the other (not both).


Omitted Variables Bias

We introduced this non-technically in Topic 1 slides

Now we provide a proof (in the case of a regression with 2 explanatory variables).

Assume true model is:

Yi = α+ β1X1i + β2X2i + εi

and the classical assumptions hold.

The correct OLS estimate of β1 is:

β̂1 = [(Σ x1i yi)(Σ x2i²) − (Σ x2i yi)(Σ x1i x2i)] / [(Σ x1i²)(Σ x2i²) − (Σ x1i x2i)²]


What if you mistakenly omit X2 and work with the following regression?

Yi = α + β1 X1i + εi

A simple extension (to include the intercept) of the derivations made in the Topic 3 slides shows the OLS estimator for β1 is then:

β̃1 = Σ x1i yi / Σ x1i²

Note: we use the notation β̃1 to distinguish it from the correct OLS estimator β̂1.

β̃1 is biased.

Hence, omitted variables bias.


Proof that β̃1 is biased

Step 1:

Ȳ = Σ Yi / N = Σ (α + β1 X1i + β2 X2i + εi) / N = α + β1 X̄1 + β2 X̄2 + ε̄

Step 2:

yi = Yi − Ȳ = (α + β1 X1i + β2 X2i + εi) − (α + β1 X̄1 + β2 X̄2 + ε̄) = β1 x1i + β2 x2i + (εi − ε̄)


Step 3: using the Step 2 result, replace yi in the formula for β̃1:

β̃1 = Σ x1i (β1 x1i + β2 x2i + εi − ε̄) / Σ x1i²
    = β1 Σ x1i² / Σ x1i² + β2 Σ x1i x2i / Σ x1i² + Σ x1i (εi − ε̄) / Σ x1i²
    = β1 + β2 Σ x1i x2i / Σ x1i² + Σ x1i (εi − ε̄) / Σ x1i²


Step 4: take the expected value of both sides of this equation:

E(β̃1) = E[ β1 + β2 Σ x1i x2i / Σ x1i² + Σ x1i (εi − ε̄) / Σ x1i² ]
       = β1 + β2 Σ x1i x2i / Σ x1i²

since the X's are fixed and E(εi) = 0.


The result at the end of Step 4 shows that E(β̃1) ≠ β1.

This proves β̃1 is biased.

In words: if we omit an explanatory variable which should be included in the regression, we obtain a biased estimate of the coefficient on the included explanatory variable.


Note that the size of the bias is given by:

β2 Σ x1i x2i / Σ x1i²

Hence, omitted variables bias does not exist if β2 = 0. But if β2 = 0, then X2 does not belong in the regression, so it is okay to omit it.

Omitted variables bias also does not exist if Σ x1i x2i / Σ x1i² = 0. It can be shown (see Chapter 1) that this implies the correlation between X1 and X2 is zero.

Hence, if the omitted variable is uncorrelated with the included one, you are okay (no omitted variables bias).
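The bias formula can be checked by simulation. The sketch below (invented data-generating process) holds the X's fixed, draws new errors many times, and compares the average short-regression slope with β1 plus the slide's bias term:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100
X1 = rng.normal(size=N)
X2 = 0.6 * X1 + rng.normal(size=N)   # fixed regressors, correlated with X1
beta1, beta2 = 2.0, 1.5

x1, x2 = X1 - X1.mean(), X2 - X2.mean()
bias_formula = beta2 * (x1 @ x2) / (x1 @ x1)   # the slide's bias term

# Monte Carlo: average the X2-omitting OLS slope over many error draws
reps = 5000
est = np.empty(reps)
for j in range(reps):
    Y = 1.0 + beta1 * X1 + beta2 * X2 + rng.normal(size=N)
    y = Y - Y.mean()
    est[j] = (x1 @ y) / (x1 @ x1)   # OLS slope with X2 wrongly omitted

observed_bias = est.mean() - beta1
```

The observed bias settles on the formula's value rather than on zero: omitting a relevant correlated regressor shifts the estimator's whole sampling distribution.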


Inclusion of Irrelevant Explanatory Variables

Now reverse the roles of the two models discussed in the omitted variables bias section.

True model:

Yi = α + β1 X1i + εi

and the classical assumptions hold.

The incorrect model adds an irrelevant variable:

Yi = α + β1 X1i + β2 X2i + εi

Using the incorrect model, the OLS estimate is:

β̂1 = [(Σ x1i yi)(Σ x2i²) − (Σ x2i yi)(Σ x1i x2i)] / [(Σ x1i²)(Σ x2i²) − (Σ x1i x2i)²]


But the correct OLS estimate is:

β̃1 = Σ x1i yi / Σ x1i²

The Gauss-Markov theorem tells us that β̃1 is the best linear unbiased estimator.

Thus, β̃1 has a smaller variance than any other linear unbiased estimator.

If we can show that β̂1 is unbiased, then the Gauss-Markov theorem tells us var(β̂1) > var(β̃1).

This proves that including irrelevant explanatory variables leads to less precise estimates.

A proof that β̂1 is unbiased is given in the textbook, page 100.
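The variance comparison can also be seen by simulation. In the sketch below (invented setup), the true model contains only X1; both estimators are unbiased for β1, but the one that needlessly includes the correlated X2 is noisier:

```python
import numpy as np

rng = np.random.default_rng(9)
N = 100
X1 = rng.normal(size=N)
X2 = 0.6 * X1 + rng.normal(size=N)   # irrelevant, but correlated with X1
x1, x2 = X1 - X1.mean(), X2 - X2.mean()

reps = 5000
simple = np.empty(reps)   # correct model: Y on X1 only
multi = np.empty(reps)    # over-fitted model: Y on X1 and X2
for j in range(reps):
    Y = 1.0 + 2.0 * X1 + rng.normal(size=N)   # true model has no X2
    y = Y - Y.mean()
    simple[j] = (x1 @ y) / (x1 @ x1)
    denom = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
    multi[j] = ((x1 @ y) * (x2 @ x2) - (x2 @ y) * (x1 @ x2)) / denom
```

Both sampling distributions are centred on β1 = 2, but the over-fitted estimator has the larger spread, as the Gauss-Markov argument predicts.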


Important Message for Empirical Practice

Omitted variables bias says:

"you should always try to include all those explanatory variables that could affect the dependent variable"

The inclusion of irrelevant explanatory variables section says:

"you should always try not to include irrelevant variables, since this will decrease the accuracy of the estimation of all the coefficients (even the ones that are not irrelevant)"

How do you trade off these considerations?

Begin with as many explanatory variables as possible, then use hypothesis testing procedures to discard those that are irrelevant (and then re-run the regression with the new set of explanatory variables).


Nonlinear Regression

The linear regression model is:

Yi = α + β1 X1i + ... + βk Xki + εi

Sometimes the relationship between your explanatory and dependent variables is nonlinear:

Yi = f(X1i, ..., Xki) + εi

where f() is some nonlinear function.

In some cases, nonlinear regression is a bit more complicated (using techniques beyond those covered in this textbook). However, in many cases a nonlinear function can be transformed into a linear one, and then linear regression techniques can be used.


Example:

Yi = β0 · X1i^β1 · X2i^β2 ⋯ Xki^βk

Taking logs, this becomes:

ln(Yi) = α + β1 ln(X1i) + ... + βk ln(Xki)

where α = ln(β0).

So you can run a regression of the logged dependent variable on the logs of the explanatory variables.
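A quick numerical illustration (data simulated from a Cobb-Douglas-style form; the parameter values and the multiplicative error term are my additions): regressing ln(Y) on the logged regressors recovers the exponents.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 300
X1 = np.exp(rng.normal(size=N))   # positive regressors, as logs require
X2 = np.exp(rng.normal(size=N))
beta0, beta1, beta2 = 2.0, 0.7, 0.3
Y = beta0 * X1**beta1 * X2**beta2 * np.exp(0.1 * rng.normal(size=N))

# Linear regression on the logged variables
X = np.column_stack([np.ones(N), np.log(X1), np.log(X2)])
coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
# coef lands close to [ln(2), 0.7, 0.3] here
```

Note the multiplicative error exp(0.1·ε) becomes the additive error 0.1·ε after taking logs, which is what makes the transformed model a standard linear regression.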


How to decide which nonlinear form?

It can be hard to decide which nonlinear form is appropriate. Here area few pieces of advice.

Sometimes economic theory suggests a particular functional form.

For instance, the previous example arises when one is using aCobb-Douglas production function.

Experiment with different functional forms and use hypothesis testing procedures or R̄² to decide between them.


Example: Is there a quadratic pattern?

Run OLS regressions on two models:

Yi = α+ β1X1i + εi

and

Yi = α + β1 X1i + β2 X1i² + εi

and choose the quadratic model if its R̄² is higher than the linear model's.

Alternatively, run OLS on quadratic model and test whether β2 = 0.


Warning: you can only use R̄² to compare models involving nonlinear transformations of the explanatory variables. You cannot use it to compare models which transform the dependent variable in different ways.

Remember,

R̄² = 1 − var(ε)/var(Y)

In order to use it for choosing a model, all models must have the same Y.


Example: Comparing two models:

Yi = α+ β1X1i + εi

and

ln(Yi) = α + β1 X1i + εi

You CANNOT use R̄² to decide which of these models to use.

The next slides describe a test for whether the dependent variable should be logged or not.


Should you use a linear or log-linear regression?

Suppose we wish to choose between:

Yi = α + β1 X1i + ... + βk Xki + εi

and

ln(Yi) = α + β1 ln(X1i) + ... + βk ln(Xki) + εi.

Terminology: the first is the linear regression model, the second is the log-linear regression model.

Note: it does not matter whether all the explanatory variables in the second equation are logged or not.


A test of the hypothesis that the linear and log-linear models fit the data equally well can be done by first constructing a new dependent variable:

Y*i = Yi / Ỹ

where Ỹ is the geometric mean: Ỹ = (Y1 · Y2 ⋯ YN)^(1/N)


Steps of the test:

1. Run the linear and log-linear regressions, except using Y* and ln(Y*) as the dependent variables. Let SSR_LIN and SSR_LOG be the sums of squared residuals for these two models.
2. If SSR_LIN > SSR_LOG, use the test statistic: LL1 = (N/2) ln(SSR_LIN / SSR_LOG)
3. Take the critical value from χ²(1) statistical tables. The 5% critical value is 3.841.
4. If LL1 > 3.841, use the log-linear model.
5. If LL1 < 3.841, accept the hypothesis that there is no difference between the linear and log-linear regressions.


The previous steps assumed SSR_LIN > SSR_LOG.

If SSR_LOG > SSR_LIN, the test statistic becomes:

LL2 = (N/2) ln(SSR_LOG / SSR_LIN)

If LL2 > 3.841, then use the linear model.

If LL2 < 3.841, conclude there is no difference between the linear and log-linear regressions.
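The whole procedure fits in a few lines. A sketch (data simulated from a log-linear truth; names are mine) implementing both branches of the test:

```python
import numpy as np

rng = np.random.default_rng(8)
N = 200
X1 = rng.uniform(1, 5, size=N)
Y = np.exp(0.5 + 0.8 * np.log(X1) + 0.2 * rng.normal(size=N))   # log-linear truth

def ssr(dep, reg):
    """Sum of squared residuals from an OLS fit of dep on a constant and reg."""
    X = np.column_stack([np.ones(len(dep)), reg])
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ coef
    return resid @ resid

geo_mean = np.exp(np.log(Y).mean())   # geometric mean of Y
Ystar = Y / geo_mean

ssr_lin = ssr(Ystar, X1)                  # linear model in Y*
ssr_log = ssr(np.log(Ystar), np.log(X1))  # log-linear model in ln(Y*)

if ssr_lin > ssr_log:
    LL = 0.5 * N * np.log(ssr_lin / ssr_log)
    verdict = "log-linear" if LL > 3.841 else "no difference"
else:
    LL = 0.5 * N * np.log(ssr_log / ssr_lin)
    verdict = "linear" if LL > 3.841 else "no difference"
```

Dividing by the geometric mean is what makes the two SSRs comparable across the levels and logs scales.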


Interpretation of Coefficients when Variables are Logged

Consider the log-linear regression:

ln(Yi) = α + β1 ln(X1i) + ... + βk ln(Xki) + εi.

The interpretation of the coefficients is as elasticities: if Xj increases by one percent, Y tends to increase by βj percent (ceteris paribus).

Now consider a regression where some variables are not logged, such as:

ln(Yi) = α + β1 ln(X1i) + β2 X2i + εi

β1 has the elasticity interpretation, but β2 is a semi-elasticity: if X2 increases by one unit, Y tends to increase by 100·β2 percent (ceteris paribus).


Chapter Summary

Much of the Topic 4 slides builds on Topic 1 (the non-technical introduction to regression) and Topic 3 (deriving statistical results for simple regression).

Issues from Topic 1 (i.e. omitted variables bias, the impact of including irrelevant variables, and multicollinearity) were treated more formally in a statistical sense.

Most derivations (e.g. of confidence intervals, t-tests, etc.) for simple regression extend in a conceptually straightforward manner to multiple regression.

This topic introduced a new framework for hypothesis testing: F-tests. These are useful for testing multiple hypotheses about regression coefficients.


This topic discussed selection of the appropriate functional form of aregression.

Many nonlinear relationships can be made linear through an appropriate transformation (nothing new is needed: simply run a regression with the transformed variables).

Care must be taken with the interpretation of coefficients in nonlinear regression, and when choosing between models which have different nonlinear transformations of the dependent variable.
