multiple regression analysis & multicolinearity by humayun yousaf - hassaan wasti
INSTITUTE OF BUSINESS ADMINISTRATION KARACHI
MULTIPLE REGRESSION ANALYSIS & MULTICOLLINEARITY
QMDM TERM PAPER
HUMAYUN YOUSAF ERP: 06670
SYED HASSAN MAHMOOD WASTI ERP :06668
Submitted to:
Dr. Abdus Salam
MULTIPLE REGRESSION ANALYSIS & MULTICOLINEARITY
Institute of Business Administration Karachi, December 28th, 2014 Page 1
Table of Contents
1. ABSTRACT
2. MULTIPLE REGRESSION FRAMEWORK
3. APPLICATION OF SPSS
4. MULTICOLLINEARITY IN REGRESSION ANALYSIS
5. APPLICATION OF TECHNIQUE USING E-VIEWS
6. APPLICATION OF TECHNIQUE USING EXCEL
7. MAIN FINDINGS
8. CONCLUSION AND IMPLICATIONS
9. REFERENCES
1. ABSTRACT
The aim of this study is to strengthen our knowledge of the statistical technique
of multiple regression analysis and of the impact of multicollinearity on such analysis. This
term paper uses a real-life example of wheat prices and the different factors affecting the
price of wheat in Pakistan. Results have been obtained using the SPSS, E-VIEWS, and
MS-EXCEL software packages.
2. MULTIPLE REGRESSION FRAMEWORK
Multiple regression is used to explore the relationship between one dependent variable
and several independent (predictor) variables.
Assumptions:
The following assumptions are made when running a linear regression analysis.
Normality
The residuals (error terms) of the model follow a Normal Distribution
Homoscedasticity
The variance of the error term is constant across all values of the predictor
variables
Linearity
The relationship between the dependent variable and each independent variable is linear
Independent predictor variables
The predictor variables are not strongly correlated with one another, i.e. no
predictor can be closely approximated by a linear combination of the others
Limitations:
The major conceptual limitation of all regression techniques is that we can
only ascertain relationships, but never be sure about the underlying causal
mechanism.
Objective (Based on Problem):
To estimate how the price of wheat in Pakistan responds to variation in the Price of Rice,
the Price of Fertilizer and the Wheat Price Index.
Hypothesis:
Null Hypothesis:
The independent variables do not have any effect on the dependent
variable,
i.e. β1=0, β2=0, β3=0
Alternate Hypothesis:
The independent variables have an effect on the dependent variable,
i.e. β1≠0, β2≠0, β3≠0
POW = f(POR, POF, WPI)
So, the estimation equation will become,
POW = β0 + β1*(POR) + β2*(POF) + β3*(WPI) + ε
Where,
POW = Price of Wheat (Rs/40 kg), dependent
POR = Price of Rice (Rs/40 kg), independent
POF = Price of Fertilizer (Rs/40 kg), independent
WPI = Wheat Price Index (IGC and FAO Asian Wheat Price Indicator), independent
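The estimation equation above can be fitted by ordinary least squares. The following is a minimal sketch in plain Python that solves the normal equations (X'X)b = X'y. The numbers in it are hypothetical illustration values generated from assumed coefficients; they are not the wheat data set used in this paper, and the function names are our own.

```python
# Minimal OLS sketch for POW = b0 + b1*POR + b2*POF + b3*WPI.
# The data below are HYPOTHETICAL illustration values, not the paper's data.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations apply to both sides.
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Return [b0, b1, ..., bk] that minimise the sum of squared residuals."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical predictors: each row is (POR, POF, WPI).
rows = [(1, 2, 1), (2, 1, 3), (3, 4, 2), (4, 3, 5), (5, 6, 4), (6, 5, 7)]
# Hypothetical "true" coefficients (b0, b1, b2, b3) used to generate y.
TRUE_BETAS = (10.0, 2.0, 0.5, 3.0)
y = [TRUE_BETAS[0] + TRUE_BETAS[1] * r + TRUE_BETAS[2] * f + TRUE_BETAS[3] * w
     for r, f, w in rows]

betas = ols(rows, y)
print([round(b, 6) for b in betas])
```

Because the hypothetical y values contain no noise, the sketch recovers the assumed coefficients; with real data the estimates would carry standard errors, as in the software output reported later.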
Expected Signs of coefficients are as follows:
β1 > 0
β2 > 0
β3 > 0
3. APPLICATION OF SPSS
SPSS was used to run the analysis on 10-year scale data obtained from the
sources listed in the references, and the following observations were made.
R square
It summarizes the proportion of variance in the dependent variable
that is explained by the independent variables. Ideally it should be close to
1. In our case the value is 0.998.
Durbin Watson
The Durbin–Watson statistic is a test statistic used to detect the
presence of autocorrelation in the residuals. It ranges from 0 to 4, and a
value near 2 indicates no first-order autocorrelation. Our value is 2.9,
which is well above 2 and suggests that negative autocorrelation exists.
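As an illustration of how the Durbin–Watson statistic behaves, the sketch below computes it as the sum of squared successive differences of the residuals divided by the sum of squared residuals. The two residual series are hypothetical and chosen only to show a value above 2 (sign-alternating residuals) and a value below 2 (slowly drifting residuals); they are not the residuals of our model.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of (e_t - e_{t-1})^2 over sum of e_t^2."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# HYPOTHETICAL residual series, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # signs flip every period
drifting = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]      # neighbours are similar

dw_alternating = durbin_watson(alternating)  # above 2: negative autocorrelation
dw_drifting = durbin_watson(drifting)        # below 2: positive autocorrelation
print(dw_alternating, dw_drifting)
```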
Std Error of Estimate
This represents the average distance by which the observed values fall from
the regression line. Lower values are better.
T Statistic
In multiple regression analysis the t statistic tests the hypothesis that a
population regression coefficient is 0. It is the estimated coefficient divided
by its standard error.
Significance (P Value)
A coefficient is treated as significant at the 5% level when its p-value is less than 0.05.
4. MULTICOLLINEARITY IN REGRESSION ANALYSIS
When high correlations among the explanatory variables lead to erratic point
estimates of the coefficients, large standard errors and unsatisfactorily low t
statistics, the regression is said to be suffering from multicollinearity.
Checks of Multicollinearity in SPSS:
Tolerance
It indicates the proportion of variance in an independent variable
that cannot be accounted for by the other independent variables. A
value less than 0.1 requires further investigation.
VIF (Variance Inflation Factor)
VIF is 1 / Tolerance, and a value greater than 10 requires further
investigation.
Eigenvalue
Several eigenvalues close to 0 indicate that the independent
variables are highly intercorrelated.
Condition Index
The condition indices are computed as the square roots of the ratios of
the largest eigenvalue to each successive eigenvalue. Values greater
than 15 indicate a possible problem with collinearity.
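The tolerance and VIF checks can be illustrated with a short sketch. It relies on a standard identity: the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, and tolerance is its reciprocal. The pairwise correlations used here are the POR, POF and WPI values from the correlation matrix reported in the E-VIEWS section of this paper; the helper name `invert` is our own, and the sketch is not a reproduction of the SPSS collinearity output.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment with the identity matrix: [matrix | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

names = ["POR", "POF", "WPI"]
# Pairwise correlations among the predictors, taken from the paper's
# correlation matrix (dependent variable POW excluded).
predictor_corr = [
    [1.000000, 0.855768, 0.883460],
    [0.855768, 1.000000, 0.839871],
    [0.883460, 0.839871, 1.000000],
]

inverse = invert(predictor_corr)
vif = {name: inverse[i][i] for i, name in enumerate(names)}
tolerance = {name: 1.0 / value for name, value in vif.items()}

for name in names:
    print(name, "VIF =", round(vif[name], 3),
          "Tolerance =", round(tolerance[name], 3))
```

Under this calculation the three VIF values come out between roughly 4 and 6, below the threshold of 10, and the tolerances stay above 0.1.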
Measures of reducing Multicollinearity:
Bring more relevant variables into the model, which reduces the population variance
of the disturbance term.
Increase the number of observations
Combine the correlated variables
Drop some of the correlated variables
Correlation Matrix:
Box Plot of Variables used in Multiple Regression Analysis:
5. Application of Technique Using E-VIEWS
Dependent Variable: POW
Method: Least Squares
Date: 15/12/14 Time: 20:13
Sample (adjusted): 2000 2010
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          26.33039      16.79167     1.568063      0.1609
POR        0.290108      0.018394     15.77183      0.0000
POF        0.002785      0.000696     4.004490      0.0052
WPI        1.398779      0.226640     6.171805      0.0005
R-squared            0.997910    Mean dependent var      542.2727
Adjusted R-squared   0.997014    S.D. dependent var      277.2757
S.E. of regression   15.15175    Akaike info criterion   8.549397
Sum squared resid    1607.030    Schwarz criterion       8.694087
Log likelihood       -43.02169   Hannan-Quinn criter.    8.458191
F-statistic          1113.955    Durbin-Watson stat      2.901714
Prob (F-statistic)   0.000000
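The relationships among the reported statistics can be checked by hand. The sketch below, using only figures from the E-VIEWS output above, recomputes each t-statistic as coefficient divided by standard error, the adjusted R-squared as 1 - (1 - R²)(n - 1)/(n - k - 1), and the F-statistic as (R²/k) / ((1 - R²)/(n - k - 1)), with n = 11 observations and k = 3 predictors. Small differences from the printed values are expected because the inputs are rounded.

```python
n = 11  # observations (sample 2000 to 2010)
k = 3   # predictors: POR, POF, WPI

# (coefficient, standard error) pairs as printed in the E-VIEWS output.
estimates = {
    "C":   (26.33039, 16.79167),
    "POR": (0.290108, 0.018394),
    "POF": (0.002785, 0.000696),
    "WPI": (1.398779, 0.226640),
}

# t-statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

r_squared = 0.997910
adj_r_squared = 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)
f_stat = (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))

for name, t in t_stats.items():
    print(name, "t =", round(t, 3))
print("Adjusted R-squared =", round(adj_r_squared, 6))
print("F-statistic =", round(f_stat, 1))
```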
DESCRIPTIVE STATISTICS
CORRELATION MATRIX:
       POW        POR        POF        WPI
POW    1.000000   0.986789   0.902957   0.936992
POR    0.986789   1.000000   0.855768   0.883460
POF    0.902957   0.855768   1.000000   0.839871
WPI    0.936992   0.883460   0.839871   1.000000
CORRELATION MATRIX
The correlation matrix shows how strongly each variable is correlated with the dependent
variable and with every other variable.
CONCLUSION FROM DATA:
It can be seen from the correlation matrix that all the
independent variables, i.e. Price of Rice, Price of Fertilizer and Wheat Price Index,
have a strong correlation with the dependent variable and among themselves.
The highest correlation exists between Price of Wheat and Price of Rice, which is 98.68%.
The correlation between Price of Wheat and Price of Fertilizer is 90.30%.
The correlation between Price of Wheat and Wheat Price Index is 93.70%.
The weakest correlation in the matrix (83.99%) exists between Price of Fertilizer and Wheat Price Index. This can be explained by the difference between the supply-side variables affecting the price of fertilizer and those affecting the international price of wheat.
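Each entry of such a matrix is a Pearson correlation coefficient. As a minimal sketch of the calculation, the code below computes pairwise correlations for a small set of series. The series are hypothetical illustration values and not the data behind the matrix above; the function names are our own.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

# HYPOTHETICAL series, for illustration only.
demo = {
    "POW": [300, 320, 400, 520, 700],
    "POR": [420, 450, 600, 900, 1300],
    "WPI": [90, 86, 100, 120, 150],
}

matrix = correlation_matrix(demo)
for row_name, row in matrix.items():
    print(row_name, {col: round(value, 4) for col, value in row.items()})
```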
ESTIMATION MODEL:
               POW        POR        POF         WPI
Mean           542.2727   942.7273   856.36      130.7000
Median         415.0000   664.0000   617.600     107.1000
Maximum        950.0000   2039.200   2037.00     232.1000
Minimum        300.0000   424.0000   364.000     85.80000
Std. Dev.      277.2757   621.6864   567.43      48.09239
Skewness       0.726116   0.864522   0.886659    0.901452
Kurtosis       1.777608   2.046407   2.506446    2.602117
Jarque-Bera    1.651476   1.787012   1.552949    1.562356
Probability    0.437912   0.409219   0.460025    0.457866
Sum            5965.000   10370.00   9420.0      1437.700
Sum Sq. Dev.   768818.2   3864939.   5 8453238   23128.78
Observations   11         11         11          11
ESTIMATION MODEL OF THE OUTPUT
A total of 11 observations were recorded as the data set for the study. The estimation model output shows the mean, median, maximum and minimum of each variable.
CONCLUSION FROM DATA:
Mean of POW is 542 while the minimum value is 300 and the maximum is 950; the median of the POW data set is 415
Mean of POR is 943 while the minimum value is 424 and the maximum is 2039
Mean of POF is 856.36 while the minimum value is 364 and the maximum is 2037
Mean of WPI is 130.7 while the minimum value is 85.8 and the maximum is 232.1
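The Jarque-Bera rows of the table relate to the normality assumption and can be reproduced from the reported skewness and kurtosis. The sketch below uses JB = (n/6)(S² + (K - 3)²/4) and, since the statistic is compared against a chi-square distribution with 2 degrees of freedom, the probability exp(-JB/2). All inputs are the figures printed in the table; only the function name is our own.

```python
import math

def jarque_bera(n, skewness, kurtosis):
    """Return the Jarque-Bera statistic and its chi-square(2) p-value."""
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # For a chi-square distribution with 2 degrees of freedom the upper-tail
    # probability has the closed form exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

n_obs = 11
# variable: (skewness, kurtosis) as reported in the table
moments = {
    "POW": (0.726116, 1.777608),
    "POR": (0.864522, 2.046407),
    "POF": (0.886659, 2.506446),
    "WPI": (0.901452, 2.602117),
}

results = {name: jarque_bera(n_obs, s, k) for name, (s, k) in moments.items()}
for name, (jb, p) in results.items():
    print(name, "JB =", round(jb, 6), "p =", round(p, 6))
```

All four probabilities are well above 0.05, so normality of the individual series is not rejected at the 5% level, although with only 11 observations the test has little power.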
Graphical Analysis:
Overlaying the fitted graph, obtained from the estimated equation, on the actual
graph, obtained by plotting the real values directly, can give us valuable
insight into the strength of the resulting equation. It is also a useful tool
for analyzing the behavior of the values where significant differences occur (spikes
and dips between the curves). After spotting these anomalies one can look in detail
at that particular data entry and find the justification for this behavior. The residual graph
gives a graphical view of the fit of our curve.
GRAPH: ACTUAL, FITTED, RESIDUAL GRAPH
This graph indicates that the estimated model fits the data well. It is also evident
from the graph that the actual and fitted lines are highly correlated and follow a similar
trend. The residuals are also small, ranging from about -25 to +25.
[Figure: Residual series (axis from -30 to 30) with Actual and Fitted series (axis from 200 to 1,000), years 2000 to 2010]
6. Application of Technique Using EXCEL
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.998702
R Square            0.997406
Adjusted R Square   0.996295
Standard Error      16.87827
Observations        11
ANOVA
             df   SS         MS        F          Significance F
Regression   3    766824     255608    897.2605   2.07E-09
Residual     7    1994.132   284.876
Total        10   768818.2
            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   48.12886       15.70599         3.064363   0.018211   10.99009    85.26763    10.99009      85.26763
POR         0.210432       0.031696         6.639085   0.000293   0.135483    0.285381    0.135483      0.285381
POF         0.004812       0.0007           6.55710    0.00031    0.003076    0.00654     0.00307       0.0065
WPI         0.411274       0.0759           5.41646    0.00099    0.231727    0.59082     0.23172       0.59082
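The entries of the Excel output are linked by simple arithmetic, which the sketch below retraces from the ANOVA figures: each mean square is SS/df, F is the ratio of the two mean squares, R Square is the regression SS over the total SS, and the Standard Error is the square root of the residual mean square. The 95% interval for the intercept uses the two-sided critical t value for 7 degrees of freedom, about 2.364624, which is entered as a constant because it does not appear in the output itself.

```python
import math

# Figures as printed in the Excel ANOVA table.
ss_regression, ss_residual, ss_total = 766824.0, 1994.132, 768818.2
df_regression, df_residual = 3, 7

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
r_square = ss_regression / ss_total
standard_error = math.sqrt(ms_residual)

# Two-sided 95% critical value of Student's t with 7 degrees of freedom.
# This constant is supplied by us; it is not part of the Excel output.
T_CRIT_95_DF7 = 2.364624

intercept, intercept_se = 48.12886, 15.70599
lower_95 = intercept - T_CRIT_95_DF7 * intercept_se
upper_95 = intercept + T_CRIT_95_DF7 * intercept_se

print("MS regression =", round(ms_regression, 1))
print("MS residual   =", round(ms_residual, 3))
print("F             =", round(f_stat, 2))
print("R Square      =", round(r_square, 6))
print("Std. Error    =", round(standard_error, 4))
print("Intercept 95% interval =", round(lower_95, 4), "to", round(upper_95, 4))
```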
7. Main Findings:
Price of Wheat is directly related to Price of Rice, Price of Fertilizer and Wheat
Price Index
A fitted line equation was obtained to predict the change in Wheat Price due to
changes in the independent variables
Multicollinearity was not found to be a significant problem: although the independent
variables are correlated with one another, all three slope coefficients are statistically significant
Negative autocorrelation was observed (Durbin-Watson statistic of about 2.9)
8. CONCLUSION AND IMPLICATIONS:
A multiple regression model was used to simulate a real-life problem, and different
tools were used to obtain the important statistical results. The following conclusions can be
drawn from the study:
Rice, being the principal substitute for wheat in Pakistan and in most of the world, plays a vital role in determining the price of wheat. The higher the price of rice, the higher the demand for wheat.
An increased Price of Fertilizer raises the price of Wheat (the supply curve shifts because of the higher cost of production)
Another important observation made during this analysis is that regional wheat prices have a significant impact on wheat prices in Pakistan. This can be explained by the indirect impact of world wheat prices on the Pakistani government’s wheat procurement policy, and by the direct impact arising because Pakistan has both imported and exported wheat in different years.
9. REFERENCES
1. Paul Dorosh and Abdul Salam 2008: “Wheat Markets and Price Stabilization in Pakistan: An Analysis of Policy Options”
2. Salman Azam Joiya and Adnan Ali Shahzad 2013: “Determinants of High Food Prices”
3. www.fao.org/statistics/en/
4. www.pbs.gov.pk
5. www.finance.gov.pk
6. www.blog.minitab.com/
7. en.wikipedia.org