
TESTING FOR MULTICOLLINEARITY USING MICROSOFT EXCEL: USING THE VARIANCE INFLATION FACTOR?

ADESETE AHMED ADEFEMI

COMPUTING AND INTERPRETING THE VARIANCE INFLATION FACTOR

MARCH, 2019

RESEARCH SOLUTION GROUP

PUBLISHED ARTICLES BY THE AUTHOR


MULTICOLLINEARITY:

Multicollinearity is one of the factors researchers often overlook when using the ordinary least squares (OLS) technique. It can be described as a condition in which the independent (explanatory) variables in a multiple regression model are associated with one another. Multicollinearity does not reduce the predictive power of the model as a whole, but it can distort the estimates and inferences for the individual predictors.

If the correlation between two explanatory variables is 1 (or -1), there is perfect multicollinearity between them: one variable is an exact linear function of the other. If, on the other hand, the correlation between the two explanatory variables is zero (0), there is no evidence of a linear association between that pair. In practice, both extremes are rare, because most variables share at least some degree of association.
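To make the perfect case concrete, here is a minimal sketch in Python with NumPy (neither is used in the article). The variables x1 and x2 and their values are hypothetical, chosen so that x2 is an exact multiple of x1.

```python
import numpy as np

# Hypothetical data: x2 is exactly 2 * x1, so the two are perfectly collinear.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2.0 * x1

# Correlation between the two explanatory variables.
corr = np.corrcoef(x1, x2)[0, 1]

# Design matrix with an intercept column; OLS needs full column rank.
X = np.column_stack([np.ones_like(x1), x1, x2])
rank = np.linalg.matrix_rank(X)

print(f"correlation = {corr:.3f}")
print(f"columns = {X.shape[1]}, rank = {rank}")
```

The rank of the design matrix is smaller than its number of columns, so the separate coefficients on x1 and x2 cannot be determined uniquely.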

CAUSES/REASONS FOR MULTICOLLINEARITY

(1) Most economic variables tend to move together over time. In periods of economic boom, variables such as employment, economic growth and investment all increase together, so multicollinearity between them is likely.

(2) The use of lagged values of the dependent variable as explanatory variables is another common source of multicollinearity in a multiple regression model. For example, investment one period ago and investment two periods ago may both affect investment in the current period, and the two lags are themselves likely to be correlated.

CONSEQUENCES OF MULTICOLLINEARITY

(1) The coefficient estimates of the individual predictors become unstable and difficult to interpret.

(2) The standard errors of the affected predictors are inflated.


(3) In the case of perfect multicollinearity, the estimates of the individual predictors are indeterminate.
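Consequences (2) and (3) follow from the standard expression for the sampling variance of an OLS slope coefficient. As a sketch, using notation that is not in the article ($\sigma^2$ for the error variance, $SST_j$ for the total variation of the $j$-th explanatory variable around its mean, and $R_j^2$ for the R2 obtained by regressing that variable on the other explanatory variables), and assuming the usual homoskedastic OLS setting:

```latex
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j \,(1 - R_j^2)},
\qquad
SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2
```

As $R_j^2$ approaches 1 the denominator shrinks and the standard error grows. At $R_j^2 = 1$ (perfect multicollinearity) the expression is undefined, which is the indeterminate case in (3).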

INDICATORS FOR DETECTING MULTICOLLINEARITY

(1) Large changes in the estimated regression coefficients when an explanatory variable is added or

deleted.

(2) The affected variables are statistically insignificant but the F-test indicates the model is

statistically significant.

(3) An explanatory variable is insignificant in a multiple regression but significant in a simple

regression.

TEST FOR DETECTING MULTICOLLINEARITY

(1) Correlation test: construct a correlation matrix of the explanatory variables. The major drawback of this technique is that correlation measures a bivariate relationship, whereas multicollinearity is a multivariate phenomenon, so the matrix can miss collinearity that involves three or more variables in a multiple regression model.

(2) Variance Inflation Factor (VIF): the VIF is mainly used to measure the severity of multicollinearity in a multiple regression model. For a given coefficient, it is the variance of its estimate in the model with multiple terms divided by the variance of its estimate in a model with that term alone.

Decision criterion:

VIF value            Multicollinearity severity
Less than 5          Low
Between 5 and 10     Medium
Greater than 10      High


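VIF Formula: $\mathrm{VIF} = \dfrac{1}{1 - R^2}$

where $R^2$ (written R2 in the text) is the coefficient of determination of the estimated auxiliary regression, in which one explanatory variable is regressed on the other explanatory variables.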

(3) Farrar-Glauber test: According to this test, the explanatory variables in a multiple regression are orthogonal if there is no multicollinearity between them. If the variables are not orthogonal, some degree of multicollinearity is present.
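The drawback of the correlation test noted under (1) can be illustrated with simulated data. This is a minimal sketch in Python with NumPy (not used in the article). The ten variables are hypothetical: nine independent random series and a tenth equal to their sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: nine independent variables plus a tenth that is their sum.
base = rng.normal(size=(2000, 9))
total = base.sum(axis=1)
X = np.column_stack([base, total])

# Largest pairwise correlation in absolute value, ignoring the diagonal of ones.
corr = np.corrcoef(X, rowvar=False)
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
max_pairwise = np.abs(off_diag).max()

# Auxiliary regression of the tenth variable on the other nine, with an intercept.
A = np.column_stack([np.ones(len(total)), base])
coef, *_ = np.linalg.lstsq(A, total, rcond=None)
resid = total - A @ coef
centered = total - total.mean()
r2 = 1.0 - (resid @ resid) / (centered @ centered)

print(f"largest pairwise correlation: {max_pairwise:.2f}")
print(f"auxiliary R2: {r2:.6f}")
```

No single pair of variables looks strongly correlated, yet the auxiliary R2 is essentially 1. That is the situation a VIF-style check is designed to catch.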

SOLUTION TO MULTICOLLINEARITY: If multicollinearity does not affect the predictor coefficients much, the variables can be retained. If a high degree of multicollinearity results from lagged values of the explanatory variables or of the dependent variable, a distributed lag model can be adopted. If the high degree of multicollinearity results from other factors:

(1) Obtain more data by increasing the sample size.

(2) Drop the variables causing multicollinearity, taking care to avoid mis-specification bias.

(3) Adopt the partial least squares method.

In this article, the emphasis is on using the Variance Inflation Factor to detect multicollinearity in a multiple regression with the aid of Microsoft Excel.


COMPUTING AND INTERPRETING THE VARIANCE INFLATION FACTOR IN MICROSOFT EXCEL

STEP ONE: Open the Microsoft Excel file where the data is stored.

Note: MGDP is the dependent variable while INFR, UNEMP, EXR and FDI are the explanatory

variables.

STEP TWO: If the Data Analysis tool is not available in your copy of Microsoft Excel, navigate to the highlighted Excel icon, click it, and then click Excel Options.

STEP THREE: Click Add-Ins, navigate to Analysis ToolPak, and click Go.


STEP FOUR: Tick Analysis ToolPak, Analysis ToolPak-VBA and Solver Add-in, then click OK.


Data Analysis now appears on the Data tab of the Microsoft Excel window.

Click on the Data tab and this appears:

STEP FIVE: Arrange the data in the order in which the variables will be regressed, with each arrangement on a separate Excel sheet.

INFR, UNEMP, EXR and FDI are the explanatory variables in this study.

INFR = f(UNEMP, EXR, FDI) ................(1)

UNEMP = f(EXR, FDI, INFR) ................(2)

FDI = f(INFR, UNEMP, EXR) ................(3)

EXR = f(INFR, UNEMP, FDI) ................(4)

NOTE: When arranging the variables in Microsoft Excel, the explanatory variable to be used as the dependent variable should come first, followed by the other explanatory variables.
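The four arrangements can also be listed programmatically. This is a minimal sketch in Python (not used in the article); the list names are illustrative.

```python
# Explanatory variables of the study, as named in the article.
explanatory = ["INFR", "UNEMP", "EXR", "FDI"]

# One auxiliary regression per explanatory variable: that variable becomes
# the response and the rest stay as regressors. The response is listed first,
# mirroring the column order used in the worksheets.
auxiliary = [
    [response] + [v for v in explanatory if v != response]
    for response in explanatory
]

for columns in auxiliary:
    print(f"{columns[0]} = f({', '.join(columns[1:])})")
```

The regressors may come out in a different order from equations (1) to (4). The order of the right-hand-side variables does not change the R2 of a regression.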


EQUATION 1:

Equation 1 examines the degree of multicollinearity between INFR and the other explanatory variables. INFR is used as the dependent variable while the other variables are retained as explanatory variables.

INFR comes first in the Microsoft Excel file, followed by the other explanatory variables, as below:


EQUATION 2:

Equation 2 examines the degree of multicollinearity between UNEMP and the other explanatory variables. UNEMP is used as the dependent variable while the other variables are retained as explanatory variables.

UNEMP comes first in the Microsoft Excel file, followed by the other explanatory variables, as below:

EQUATION 3:

Equation 3 examines the degree of multicollinearity between FDI and the other explanatory variables. FDI is used as the dependent variable while the other variables are retained as explanatory variables.

FDI comes first in the Microsoft Excel file, followed by the other explanatory variables, as below:


EQUATION 4:

Equation 4 examines the degree of multicollinearity between EXR and the other explanatory variables. EXR is used as the dependent variable while the other variables are retained as explanatory variables.

EXR comes first in the Microsoft Excel file, followed by the other explanatory variables, as below:


STEP SIX: Regress each explanatory variable on the other explanatory variables. That is, one explanatory variable is used as the dependent variable while the remaining explanatory variables are retained as explanatory variables.

INFR, UNEMP, EXR and FDI are the explanatory variables in this study.

INFR = f(UNEMP, EXR, FDI) ................(1)

UNEMP = f(EXR, FDI, INFR) ................(2)

FDI = f(INFR, UNEMP, EXR) ................(3)

EXR = f(INFR, UNEMP, FDI) ................(4)

Navigate: Data >> Data Analysis >> Regression

Click OK and this window is displayed:


STEP SEVEN:

EQUATION 1:

INFR = f(UNEMP, EXR, FDI)

Place the cursor in the Input Y Range box and highlight the Y variable, which is the INFR column. Then do the same in the Input X Range box for the X variables, which are UNEMP, EXR and FDI.

FOR Y


FOR X

Click OK and the regression result is displayed:

The same procedure is followed for the other variables.
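Outside Excel, the R2 of such an auxiliary regression can be obtained with ordinary least squares. This is a minimal sketch in Python with NumPy (not used in the article). The function name auxiliary_r2 and the toy data are hypothetical and are not the article's series.

```python
import numpy as np

def auxiliary_r2(y, X):
    """R2 from an OLS regression of y on the columns of X plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    A = np.column_stack([np.ones(len(y)), X])      # add the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ coef
    ss_res = float(resid @ resid)                  # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data for illustration only.
x_other = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_exact = 3.0 + 2.0 * x_other                              # exact linear function
y_unrelated = np.array([1.0, -1.0, 0.0, 0.0, -1.0, 1.0])   # uncorrelated with x_other

print(auxiliary_r2(y_exact, x_other))
print(auxiliary_r2(y_unrelated, x_other))
```

The first call gives an R2 of essentially 1 and the second an R2 of essentially 0. To mimic Equation 1, y would be the INFR column and X the UNEMP, EXR and FDI columns.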


EQUATION 2

EQUATION 3


EQUATION 4

STEP EIGHT: Collect the R2 of each equation/variable and put it in a table.

EQUATION  VARIABLE  R2
1         INFR      0.363227848
2         UNEMP     0.930691032
3         FDI       0.208836755
4         EXR       0.937053465

STEP NINE: Compute 1 - R2.

EQUATION  VARIABLE  R2           1 - R2
1         INFR      0.363227848  0.636772
2         UNEMP     0.930691032  0.069309
3         FDI       0.208836755  0.791163
4         EXR       0.937053465  0.062947


STEP TEN: Compute the Variance Inflation Factor for each variable and interpret it.

VIF Formula: $\mathrm{VIF} = \dfrac{1}{1 - R^2}$

EQUATION  VARIABLE  R2           1 - R2    VIF
1         INFR      0.363227848  0.636772  1.57042
2         UNEMP     0.930691032  0.069309  14.42815
3         FDI       0.208836755  0.791163  1.263962
4         EXR       0.937053465  0.062947  15.8865

Decision:

EQUATION  VARIABLE  R2           1 - R2    VIF       Decision
1         INFR      0.363227848  0.636772  1.57042   VIF < 5: there is little or no evidence of multicollinearity of INFR with the other explanatory variables.
2         UNEMP     0.930691032  0.069309  14.42815  VIF > 10: there is evidence of high multicollinearity of UNEMP with the other explanatory variables.
3         FDI       0.208836755  0.791163  1.263962  VIF < 5: there is little or no evidence of multicollinearity of FDI with the other explanatory variables.
4         EXR       0.937053465  0.062947  15.8865   VIF > 10: there is evidence of high multicollinearity of EXR with the other explanatory variables.
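Steps eight to ten can be reproduced in a few lines. This is a minimal sketch in Python (not used in the article). It takes the reported R2 values as given and applies the article's decision criterion; the function names vif and severity are illustrative.

```python
# R2 values reported in the article for the four auxiliary regressions.
r2_by_variable = {
    "INFR": 0.363227848,
    "UNEMP": 0.930691032,
    "FDI": 0.208836755,
    "EXR": 0.937053465,
}

def vif(r2):
    """Variance inflation factor, 1 / (1 - R2)."""
    return 1.0 / (1.0 - r2)

def severity(value):
    """Article's criterion: below 5 low, 5 to 10 medium, above 10 high."""
    if value < 5:
        return "Low"
    if value <= 10:
        return "Medium"
    return "High"

results = {name: vif(r2) for name, r2 in r2_by_variable.items()}

for name, value in results.items():
    r2 = r2_by_variable[name]
    print(f"{name:<6} R2={r2:.9f}  1-R2={1 - r2:.6f}  VIF={value:.5f}  {severity(value)}")
```

The printed VIF values agree with the table to the precision shown there. UNEMP and EXR fall in the high band, INFR and FDI in the low band.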


STEP ELEVEN: Investigate the source of the multicollinearity by computing a correlation matrix.

Navigate: Data >> Data Analysis >> Correlation

Then highlight all the explanatory variables:

Click OK

The correlation matrix shows a high correlation (above 0.5) between inflation rate and exchange rate, and between unemployment rate and exchange rate. These two pairs are the source of the multicollinearity of exchange rate and unemployment rate with the other explanatory variables. Every other pair of explanatory variables shows little evidence of multicollinearity, so the multicollinearity is confined to inflation rate and exchange rate, and to unemployment rate and exchange rate.
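A rough equivalent of the Excel Correlation tool is sketched below in Python with NumPy (not used in the article). The four explanatory series are copied from the DATA USED table at the end of the article. The printed figures are meant as a cross-check, not as the article's own output.

```python
import numpy as np

names = ["INFR", "UNEMP", "EXR", "FDI"]

# Explanatory variables for 1981 to 2013, copied from the DATA USED table.
data = np.array([
    [20.81282, 5.2, 1.648659, 5.813235],
    [7.697747, 4.3, 1.458615, 5.669881],
    [23.21233, 6.4, 1.856298, 5.577085],
    [17.82053, 6.2, 1.824549, 5.887215],
    [7.435345, 6.1, 1.808289, 6.073275],
    [5.717151, 5.3, 1.667707, 6.600958],
    [11.29032, 7.0, 1.94591, 7.804986],
    [54.51122, 5.1, 1.629241, 7.449033],
    [50.46669, 4.5, 1.504077, 9.538017],
    [7.3644, 3.5, 1.252763, 8.45242],
    [13.00697, 3.1, 1.131402, 8.841607],
    [44.58884, 3.5, 1.252763, 9.579356],
    [57.16525, 3.4, 1.223775, 10.29756],
    [57.03171, 3.2, 1.163151, 3.100092],
    [72.8355, 1.9, 0.641854, 4.329417],
    [29.26829, 2.8, 1.029619, 4.712229],
    [8.529874, 3.4, 1.223775, 4.705016],
    [9.996378, 3.5, 1.252763, 4.390739],
    [6.618373, 17.5, 2.862201, 4.530447],
    [6.933292, 18.5, 2.917771, 4.75359],
    [18.87365, 13.7, 2.617396, 4.885828],
    [12.87658, 12.2, 2.501436, 5.416989],
    [14.03178, 14.8, 2.694627, 5.554509],
    [14.99803, 11.8, 2.4681, 5.514235],
    [17.86349, 11.9, 2.476538, 7.560705],
    [8.239527, 12.3, 2.509599, 10.62424],
    [5.382224, 12.3, 2.509599, 11.60058],
    [11.57798, 12.7, 2.541602, 11.73322],
    [11.53767, 14.7, 2.687847, 12.33312],
    [13.7202, 14.7, 2.687847, 11.82795],
    [10.84079, 21.1, 3.049273, 11.7414],
    [12.21701, 23.9, 3.173878, 12.39253],
    [8.475827, 23.0, 3.135494, 12.17091],
])

# Pairwise correlation matrix; each column is one variable.
corr = np.corrcoef(data, rowvar=False)

print(" " * 6 + "".join(f"{n:>8}" for n in names))
for name, row in zip(names, corr):
    print(f"{name:<6}" + "".join(f"{v:8.3f}" for v in row))
```

Entries whose absolute value exceeds 0.5 correspond to the pairs singled out in the interpretation above.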

STEP TWELVE: Either increase the sample size, or add and remove the multicollinear variables from the final regression while observing the R2 and the significance, sign and size of the coefficients, in order to choose the best model. Alternatively, remove the multicollinear variables altogether. In this study, however, both multicollinear variables are economic indicators, so it is better to increase the sample size or to try adding and removing the variables to find the more important of the two.

MGDP = f(UNEMP, INFR, EXR, FDI)

Removing both variables

Removing both variables (INFR and EXR) gives an adjusted R2 of 0.409 and a significance F of 0.000142105. Unemployment is the only significant factor, with a coefficient estimate of 0.119243, and the signs of UNEMP and FDI are positive and negative respectively.


Adding INFR

Adding INFR gives an adjusted R2 of 0.432 and a significance F of 0.000209395. Unemployment is again the only significant factor, with a coefficient estimate of 0.139497, and the signs of UNEMP and FDI are positive and negative respectively, as in the regression with both variables removed.

Adding EXR


Adding EXR and removing INFR gives an adjusted R2 of 0.446549 and a significance F of 0.000144929. Unemployment is significant at both the 5% and 10% significance levels, while EXR is significant at the 10% level; their coefficient estimates are 0.260698 and -1.301745 respectively. The signs of UNEMP and FDI are positive and negative respectively, as in the previous regression results.

Adding both EXR and INFR

Adding both EXR and INFR gives an adjusted R2 of 0.441376 and a significance F of 0.000363. Unemployment is significant at the 5% significance level, with a coefficient estimate of -1.301745. The signs of UNEMP, EXR, INFR and FDI are the same as in the previous regression results.

This study therefore concludes that the regression in which only EXR is added and INFR is removed is the most appropriate, because adding INFR decreases the adjusted R2, increases the significance F and reduces the number of significant variables from two to one. INFR is therefore removed from the final equation to correct for the effect of high multicollinearity.
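The comparison behind this conclusion can be laid out side by side. This is a minimal sketch in Python (not used in the article). It ranks the four candidate models by the adjusted R2 values reported above; the dictionary keys are illustrative labels for the regressors in each model.

```python
# Adjusted R2 and significance F reported in the article for each candidate model.
models = {
    "UNEMP, FDI": {"adj_r2": 0.409, "sig_f": 0.000142105},
    "UNEMP, FDI, INFR": {"adj_r2": 0.432, "sig_f": 0.000209395},
    "UNEMP, FDI, EXR": {"adj_r2": 0.446549, "sig_f": 0.000144929},
    "UNEMP, FDI, EXR, INFR": {"adj_r2": 0.441376, "sig_f": 0.000363},
}

# Rank the candidates by adjusted R2, highest first.
ranked = sorted(models.items(), key=lambda item: item[1]["adj_r2"], reverse=True)
best_name, best_stats = ranked[0]

for name, stats in ranked:
    print(f"{name:<24} adj R2={stats['adj_r2']:.6f}  sig F={stats['sig_f']:.9f}")
```

On adjusted R2 alone the model with UNEMP, FDI and EXR ranks first. Adjusted R2 is only one of the criteria the article weighs, alongside significance F and the number of significant coefficients.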


For a more rigorous analysis, a mis-specification test is necessary to determine whether there is specification bias after removing or adding a variable.

FINAL EQUATION

Any further research questions or questions regarding this article should be forwarded to

[email protected].

Use of the information in this publication/website is at your own risk. No part of this publication may be reproduced, downloaded, transmitted or published elsewhere, in any form or by any means, without the author's permission. All publications are copyrighted by the author and are to be used for research and understanding purposes.


DATA USED

YEAR  MGDP      INFR      UNEMP  EXR       FDI
1981  3.455099  20.81282  5.2    1.648659  5.813235
1982  3.603718  7.697747  4.3    1.458615  5.669881
1983  3.745081  23.21233  6.4    1.856298  5.577085
1984  3.647969  17.82053  6.2    1.824549  5.887215
1985  3.853885  7.435345  6.1    1.808289  6.073275
1986  3.466818  5.717151  5.3    1.667707  6.600958
1987  3.837841  11.29032  7      1.94591   7.804986
1988  4.113863  54.51122  5.1    1.629241  7.449033
1989  4.346632  50.46669  4.5    1.504077  9.538017
1990  3.709202  7.3644    3.5    1.252763  8.45242
1991  4.591557  13.00697  3.1    1.131402  8.841607
1992  4.972358  44.58884  3.5    1.252763  9.579356
1993  5.111336  57.16525  3.4    1.223775  10.29756
1994  5.392955  57.03171  3.2    1.163151  3.100092
1995  5.689688  72.8355   1.9    0.641854  4.329417
1996  5.859654  29.26829  2.8    1.029619  4.712229
1997  5.947055  8.529874  3.4    1.223775  4.705016
1998  5.980828  9.996378  3.5    1.252763  4.390739
1999  6.054937  6.618373  17.5   2.862201  4.530447
2000  6.148523  6.933292  18.5   2.917771  4.75359
2001  6.283754  18.87365  13.7   2.617396  4.885828
2002  6.23016   12.87658  12.2   2.501436  5.416989
2003  6.143781  14.03178  14.8   2.694627  5.554509
2004  5.855978  14.99803  11.8   2.4681    5.514235
2005  6.012168  17.86349  11.9   2.476538  7.560705
2006  6.170707  8.239527  12.3   2.509599  10.62424
2007  6.255526  5.382224  12.3   2.509599  11.60058
2008  6.372591  11.57798  12.7   2.541602  11.73322
2009  6.417237  11.53767  14.7   2.687847  12.33312
2010  6.466254  13.7202   14.7   2.687847  11.82795
2011  6.543644  10.84079  21.1   3.049273  11.7414
2012  6.635247  12.21701  23.9   3.173878  12.39253
2013  6.714001  8.475827  23     3.135494  12.17091