An Elementary Investigation into the Regression Model Using EViews 5.1


An Investigation into the Regression Model Using EViews

Module One

Prepared by: Sayed Hossain

For more articles and videos, visit: www.sayedhossain.com


Seven assumptions about a good regression model

1. The regression line must fit the data well.

2. Most of the independent variables should be individually significant in explaining the dependent variable.

3. The independent variables should be jointly significant in explaining the dependent variable.

4. The signs of the coefficients should follow economic theory, expectations, experience, or intuition.

5. There should be no serial correlation (autocorrelation) in the residuals (u).

6. The variance of the residuals (u) should be constant (homoscedasticity).

7. The residuals (u) should be normally distributed.

(Assumption no. 1)

The regression line must fit the data well

(Goodness of Fit)

Guideline: R-squared of 60 percent (0.60) or more is preferable.

Goodness of Fit

• The model should fit the data reasonably well.

• That is, the value of R-squared should be reasonably high, above 60 percent.

• The higher the R-squared, the better the fit (see the sketch below).
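The slides work in EViews; as a rough illustration only, here is a minimal Python sketch of fitting a regression and reading off R-squared. The data and variable names are made up for the example:

```python
# Hypothetical illustration: fit OLS and inspect R-squared.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(35, 3)))   # constant + 3 regressors
y = X @ [1.0, 0.5, -0.3, 0.2] + rng.normal(scale=0.5, size=35)

res = sm.OLS(y, X).fit()
print(f"R-squared = {res.rsquared:.3f}")        # slides' rule of thumb: >= 0.60
```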


(Assumption no. 2)

Most of the independent variables should be individually significant

Test: t-test

A t-test is performed to determine whether each independent variable (here X1, X2, and X3) is individually significant in influencing the dependent variable, Y.

Individual significance of the variables

• Most of the independent variables should be individually significant.

• This can be checked with a t-test.

• If the p-value of the t-statistic is less than 5 percent (0.05), we reject the null hypothesis and accept the alternative.

• If we can reject the null hypothesis, it means that the particular independent variable is significant in influencing the dependent variable in the population (see the sketch below).
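As a hedged illustration (not the slides' EViews workflow), a fitted statsmodels result exposes each coefficient's t-statistic and two-tailed p-value directly; the names C, X1, X2, X3 and the data here are hypothetical:

```python
# Hypothetical illustration: per-coefficient t-tests after OLS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(35, 3)))   # constant + X1, X2, X3
y = X @ [1.0, 0.5, -0.3, 0.2] + rng.normal(scale=0.5, size=35)
res = sm.OLS(y, X).fit()

for name, t, p in zip(["C", "X1", "X2", "X3"], res.tvalues, res.pvalues):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: t = {t:.2f}, p = {p:.4f} -> {verdict} at 5%")
```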

An example

Variables: We have four variables, Y, X1, X2, and X3. Here Y is dependent and X1, X2, and X3 are independent.

Population regression model:

Y = B0 + B1X1 + B2X2 + B3X3 + u

Sample regression model:

Y = b0 + b1X1 + b2X2 + b3X3 + e

The sample regression line is an estimator of the population regression line. Our target is to estimate the population regression line (which is almost impossible, or too costly and time-consuming, to observe directly) from the sample regression line. For example, the small b1, b2, and b3 are estimators of the capital B1, B2, and B3.

Here u is the residual of the population regression line, while e is the residual of the sample regression line; e is the estimator of u. We want to learn the nature of u from e.

Tip

If the sample is collected according to statistical guidelines (proper random sampling procedures), then the sample regression line can be representative of the population regression line. Our target is to estimate the population regression line from a sample regression line.

Setting hypotheses for the t-test: an example

Null hypothesis: B0 = 0
Alternative hypothesis: B0 ≠ 0

Null hypothesis: B1 = 0
Alternative hypothesis: B1 ≠ 0

Null hypothesis: B2 = 0
Alternative hypothesis: B2 ≠ 0

Null hypothesis: B3 = 0
Alternative hypothesis: B3 ≠ 0

Hypotheses are always set for the population, not for the sample. That is why we use the capital B's (from the population regression line), not the small b's from the sample regression line.

Hypothesis setting

Null hypothesis: B1 = 0
Alternative hypothesis: B1 ≠ 0

• Since the direction of the alternative hypothesis is ≠, we assume that a relationship exists between the independent variable (here X1) and the dependent variable (Y) in the population, but we cannot say whether the relationship is negative or positive. The ≠ direction calls for a two-tailed test.

Null hypothesis: B1 = 0
Alternative hypothesis: B1 < 0

• If we instead set the hypotheses as above, we assume that a negative relationship exists between X1 and Y in the population, since the direction in the alternative hypothesis is <. This requires a one-tailed test (see the sketch below).
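A small hedged sketch of the distinction, using hypothetical data: statsmodels reports two-tailed p-values, and a one-tailed p-value for H1: B1 < 0 can be derived from them.

```python
# Hypothetical illustration: two-tailed vs one-tailed p-value for B1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(35, 1)))   # constant + X1
y = X @ [1.0, -0.4] + rng.normal(scale=0.5, size=35)
res = sm.OLS(y, X).fit()

t, p_two = res.tvalues[1], res.pvalues[1]       # two-tailed test of B1 = 0
# One-tailed p for H1: B1 < 0 -- halve p_two if the estimate is negative.
p_one = p_two / 2 if t < 0 else 1 - p_two / 2
print(f"two-tailed p = {p_two:.4f}, one-tailed p (B1 < 0) = {p_one:.4f}")
```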

(Assumption no. 3)

Joint significance

The independent variables should be jointly significant in explaining the dependent variable

Test: F-test, ANOVA (Analysis of Variance)

Joint significance

• The independent variables should be jointly significant in explaining Y. This can be checked with an F-test.

• If the p-value of the F-statistic is less than 5 percent (0.05), we reject the null hypothesis and accept the alternative.

• If we can reject the null hypothesis, it means that the independent variables (X1, X2, and X3) can jointly influence the dependent variable, Y (see the sketch below).
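A minimal sketch of the joint F-test in Python (an assumption of this rewrite, not the slides' EViews workflow): the regression F-statistic and its p-value live on the fitted result.

```python
# Hypothetical illustration: joint significance via the regression F-test.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(35, 3)))   # constant + X1, X2, X3
y = X @ [1.0, 0.5, -0.3, 0.2] + rng.normal(scale=0.5, size=35)
res = sm.OLS(y, X).fit()

print(f"F = {res.fvalue:.3f}, p = {res.f_pvalue:.4f}")
if res.f_pvalue < 0.05:
    print("Reject H0: the regressors are jointly significant.")
else:
    print("Cannot reject H0: no evidence of joint significance.")
```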

Joint hypothesis setting

Null hypothesis H0: B1 = B2 = B3 = 0

Alternative H1: not all B's are simultaneously equal to zero

B0 is dropped here because it is not associated with any variable.

Again we use the capital B's, since hypotheses refer to the population.


A few points

• Residual (u or e) = actual Y - estimated (fitted) Y (see the sketch below).

• "Residual", "error term", and "disturbance term" all mean the same thing.

• "Serial correlation" and "autocorrelation" mean the same thing.
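A tiny sketch of the first point, on hypothetical data: the residual is literally actual minus fitted.

```python
# Hypothetical illustration: residual = actual Y - fitted Y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(35, 1)))
y = X @ [1.0, 0.5] + rng.normal(scale=0.5, size=35)
res = sm.OLS(y, X).fit()

e = y - res.fittedvalues                 # actual minus fitted
assert np.allclose(e, res.resid)         # identical to the stored residuals
```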

(Assumption no. 4)

The signs of the coefficients should follow economic theory, expectations, the experience of others (the literature), or intuition.


Residual Analysis

(Assumption no. 5)

No serial correlation (autocorrelation) in the residuals (u)

Test: Breusch-Godfrey serial correlation LM test (BG test)

Serial correlation

• Serial correlation is the statistical term for the situation in which the residual is correlated with lagged values of itself.

• In other words, if the residuals are correlated with each other, we call the situation serial correlation, which is not desirable.

How can serial correlation arise in the model?

• Incorrect model specification,

• omitted variables,

• incorrect functional form,

• incorrectly transformed data.

Detection of serial correlation

• There are many ways to detect the existence of serial correlation in a model.

• One approach is the Breusch-Godfrey serial correlation LM test (BG test), sketched below.
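For readers working outside EViews, statsmodels offers a Breusch-Godfrey test. This hedged sketch on hypothetical data mirrors the Obs*R-squared (LM) version reported later in the slides:

```python
# Hypothetical illustration: Breusch-Godfrey LM test for serial correlation.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(5)
X = sm.add_constant(rng.normal(size=(35, 3)))
y = X @ [1.0, 0.5, -0.3, 0.2] + rng.normal(scale=0.5, size=35)
res = sm.OLS(y, X).fit()

lm, lm_pvalue, fvalue, f_pvalue = acorr_breusch_godfrey(res, nlags=2)
print(f"Obs*R-squared = {lm:.3f}, Prob. Chi-Square(2) = {lm_pvalue:.4f}")
# p >= 0.05 -> cannot reject H0 of no serial correlation.
```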

Hypothesis setting

Null hypothesis H0: no serial correlation (no correlation between residuals ui and uj)

Alternative hypothesis H1: serial correlation (correlation between residuals ui and uj)

(Assumption no. 6)

The variance of the residuals (u) is constant (homoscedasticity)

Test: Breusch-Pagan-Godfrey test

• Heteroscedasticity is the term for the situation in which the variance of the residuals from a model is not constant.

• When the variance of the residuals is constant, we call it homoscedasticity. Homoscedasticity is desirable.

• If the residuals do not have constant variance, we call it heteroscedasticity, which is not desirable.

How can heteroscedasticity arise?

• Incorrect model specification,

• incorrectly transformed data.

Hypothesis setting for heteroscedasticity

• Null hypothesis H0: homoscedasticity (the variance of the residuals (u) is constant)

• Alternative hypothesis H1: heteroscedasticity (the variance of the residuals (u) is not constant)

Detection of heteroscedasticity

• There are many tests available to detect heteroscedasticity.

• One of them is the Breusch-Pagan-Godfrey test, which we employ here (see the sketch below).
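statsmodels ships a Breusch-Pagan test that plays the same role; note its auxiliary regression may differ slightly from the Godfrey variant EViews uses. A minimal sketch on hypothetical data:

```python
# Hypothetical illustration: Breusch-Pagan test for heteroscedasticity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(35, 3)))
y = X @ [1.0, 0.5, -0.3, 0.2] + rng.normal(scale=0.5, size=35)
res = sm.OLS(y, X).fit()

lm, lm_pvalue, fvalue, f_pvalue = het_breuschpagan(res.resid, res.model.exog)
print(f"Obs*R-squared = {lm:.3f}, p = {lm_pvalue:.4f}")
# p >= 0.05 -> cannot reject H0 of homoscedasticity (constant variance).
```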

(Assumption no. 7)

The residuals (u) should be normally distributed

Test: Jarque-Bera statistic

Setting the hypotheses:

• Null hypothesis H0: normal distribution (the residuals (u) follow a normal distribution)

• Alternative hypothesis H1: not a normal distribution (the residuals (u) do not follow a normal distribution)

Detecting residual normality:

• Histogram normality test (based on the Jarque-Bera statistic); a sketch follows below.

• If the p-value of the Jarque-Bera statistic is less than 5 percent (0.05), we reject the null and accept the alternative, i.e. the residuals (u) are not normally distributed.
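Outside EViews, the same statistic is available in statsmodels; a hedged sketch on hypothetical data:

```python
# Hypothetical illustration: Jarque-Bera normality test on the residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(35, 3)))
y = X @ [1.0, 0.5, -0.3, 0.2] + rng.normal(scale=0.5, size=35)
res = sm.OLS(y, X).fit()

jb, jb_pvalue, skew, kurtosis = jarque_bera(res.resid)
print(f"Jarque-Bera = {jb:.3f}, p = {jb_pvalue:.4f}")
# p >= 0.05 -> cannot reject H0 that the residuals are normally distributed.
```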

An Empirical Model Development

Our hypothetical model

Variables: We have four variables, Y, X1, X2, and X3. Here Y is dependent and X1, X2, and X3 are independent.

Population regression model:

Y = B0 + B1X1 + B2X2 + B3X3 + u

Sample regression line:

Y = b0 + b1X1 + b2X2 + b3X3 + e

Data: the sample size is 35, taken from the population.

DATA

obs   RESID          X1       X2     X3      Y      YF
1      0.417167      1700     1.2    20000   1.2    0.782833
2     -0.27926       1200     1.03   18000   0.65   0.929257
3     -0.17833       2100     1.2    19000   0.6    0.778327
4      0.231419      937.5    1      15163   1.2    0.968581
5     -0.33278       7343.3   0.97   21000   0.5    0.832781
6      0.139639      837.9    0.88   15329   1.2    1.060361
7     -0.01746       1648     0.91   16141   1      1.017457
8     -0.14573       739.1    1.2    21876   0.65   0.795733
9      0.480882      2100     0.89   17115   1.5    1.019118
10    -0.0297        274.6    0.23   23400   1.5    1.529701
11    -0.32756       231      0.87   16127   0.75   1.077562
12     0.016113      1879.1   0.94   17688   1      0.983887
13    -0.34631       1941     0.99   17340   0.6    0.946315
14     0.485755      2317.6   0.87   21000   1.5    1.014245
15     0.972181      471.4    0.93   16000   2      1.027819
16    -0.22757       678      0.79   16321   0.9    1.127572
17    -0.2685        7632.9   0.93   18027   0.6    0.868503
18    -0.41902       510.1    0.93   18023   0.6    1.019018
19    -0.4259        630.6    0.93   15634   0.6    1.0259
20     0.076632      1500     1.03   17886   1      0.923368
21    -0.37349949    1618.3   1.1    16537   0.5    0.873499
22     0.183799347   2009.8   0.96   17655   1.15   0.966201
23     0.195832507   1562.4   0.96   23100   1.15   0.954167
24    -0.46138707    1200     0.88   13130   0.6    1.061387
25     0.309577968   13103    1      20513   1      0.690422
26    -0.21073204    3739.6   0.92   17409   0.75   0.960732
27    -0.08351157    324      1.2    14525   0.75   0.833512
28    -0.02060854    2385.8   0.89   15207   1      1.020609
29     0.14577644    1698.5   0.93   15409   1.15   1.004224
30    -0.06000649    544      0.87   18900   1      1.060006
31    -0.50510204    1769.1   0.45   17677   0.85   1.355102
32     0.870370225   1065     0.65   15092   2.1    1.22963
33     0.274774344   803.1    0.98   18014   1.25   0.975226
34    -0.1496757     1616.7   1      28988   0.75   0.899676
35     0.062732149   210      1.2    21786   0.87   0.807268

Y, X1, X2, and X3 are the actual sample data collected from the population.

YF = estimated, forecasted, or predicted Y.

RESID (e) = residuals of the sample regression line, i.e. e = actual Y - predicted (fitted) Y.

Regression Output


Dependent Variable: Y
Method: Least Squares
Included observations: 35

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C            1.800        0.4836        3.72         0.0008
X1          -2.11E-05     2.58E-05     -0.820        0.4183
X2          -0.7527       0.3319       -2.267        0.0305
X3          -3.95E-06     2.08E-05     -0.189        0.8509

R-squared            0.1684    Mean dependent var      0.9834
Adjusted R-squared   0.087     S.D. dependent var      0.3912
S.E. of regression   0.3736    Akaike info criterion   0.9762
Sum squared resid    4.328     Schwarz criterion       1.15
Log likelihood      -13.08     F-statistic             2.093
Durbin-Watson stat   2.184     Prob(F-statistic)       0.1213
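For readers who want to try this outside EViews, here is a hedged sketch using Python's statsmodels formula API. Only the first eight of the 35 observations from the DATA table are typed in, so the numbers will not match the output above until the remaining rows are added:

```python
# Sketch: re-running the slides' regression of Y on X1, X2, X3 with a constant.
import pandas as pd
import statsmodels.formula.api as smf

# First eight of the 35 observations from the DATA table; extend these
# lists with the remaining rows to approach the full EViews output.
df = pd.DataFrame({
    "Y":  [1.2, 0.65, 0.6, 1.2, 0.5, 1.2, 1.0, 0.65],
    "X1": [1700, 1200, 2100, 937.5, 7343.3, 837.9, 1648, 739.1],
    "X2": [1.2, 1.03, 1.2, 1.0, 0.97, 0.88, 0.91, 1.2],
    "X3": [20000, 18000, 19000, 15163, 21000, 15329, 16141, 21876],
})

res = smf.ols("Y ~ X1 + X2 + X3", data=df).fit()
print(res.summary())   # coefficients, t-statistics, R-squared, F-statistic
```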


A few points

• t-statistic = coefficient / standard error

• The absolute value of the t-statistic and the p-value always move in opposite directions.
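For example, for X2 in the output above: t = -0.7527 / 0.3319 ≈ -2.27, matching the reported t-statistic; among the three slopes it has the largest |t| and, correspondingly, the smallest p-value (0.0305).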

Output: Actual Y, Fitted Y, and Residuals (the ASCII residual-plot column of the EViews output is omitted here; the residual figures below show the same information)

obs   Actual   Fitted         Residual
1     1.2      0.782832991    0.417167009
2     0.65     0.92925722    -0.27925722
3     0.6      0.778327375   -0.178327375
4     1.2      0.96858115     0.23141885
5     0.5      0.8327808     -0.3327808
6     1.2      1.060360549    0.139639451
7     1        1.017457055   -0.017457055
8     0.65     0.79573323    -0.14573323
9     1.5      1.019118163    0.480881837
10    1.5      1.529701243   -0.029701243
11    0.75     1.077562408   -0.327562408
12    1        0.983887019    0.016112981
13    0.6      0.946314864   -0.346314864
14    1.5      1.014244939    0.485755061
15    2        1.027819105    0.972180895
16    0.9      1.127572088   -0.227572088
17    0.6      0.868503447   -0.268503447
18    0.6      1.019018495   -0.419018495
19    0.6      1.025899595   -0.425899595
20    1        0.923368304    0.076631696
21    0.5      0.873499486   -0.373499486
22    1.15     0.966200653    0.183799347
23    1.15     0.954167493    0.195832507
24    0.6      1.061387074   -0.461387074
25    1        0.690422032    0.309577968
26    0.75     0.960732042   -0.210732042
27    0.75     0.833511567   -0.083511567
28    1        1.020608541   -0.020608541
29    1.15     1.00422356     0.14577644
30    1        1.060006494   -0.060006494
31    0.85     1.355102042   -0.505102042
32    2.1      1.229629775    0.870370225
33    1.25     0.975225656    0.274774344
34    0.75     0.899675696   -0.149675696
35    0.87     0.807267851    0.062732149

[Figure: Residual, Actual, and Fitted Y plotted across the 35 observations.]

[Figure: plot of the sample residuals (Y residuals) across the 35 observations.]

(Assumption no. 1)

Goodness of fit

R-squared: 0.1684

• This means that only 16.84 percent of the variation in Y can be explained jointly by the three independent variables X1, X2, and X3. The remaining 83.16 percent of the variation in Y is explained by the residual, that is, by variables other than X1, X2, and X3.

(Assumption no. 3)

Joint hypothesis: F-statistic

F-statistic: 2.093, Prob. 0.1213

Null hypothesis H0: B1 = B2 = B3 = 0
Alternative H1: not all B's are simultaneously equal to zero

Since the p-value is more than 5 percent (here 12.13 percent), we cannot reject the null. In other words, the independent variables (X1, X2, and X3) cannot jointly explain or influence Y in the population.

(Assumption no. 2)

Individual significance of the variables

• For X1, p-value: 0.4183

Null hypothesis: B1 = 0
Alternative hypothesis: B1 ≠ 0

Since the p-value is more than 5 percent (0.05), we cannot reject the null, i.e. we accept B1 = 0. In other words, X1 cannot influence Y in the population.

• For X2, p-value: 0.0305 (3.05 percent)

Null hypothesis: B2 = 0
Alternative hypothesis: B2 ≠ 0

Since the p-value (0.0305) is less than 5 percent, we reject the null and accept the alternative hypothesis. This means that X2 can influence Y in the population, but we cannot say in which direction, because the alternative hypothesis is ≠.

• For X3, p-value: 0.8509. So X3 is not significant in explaining Y.

(Assumption no. 4)

Signs of the coefficients

Our sample model: Y = b0 + b1X1 + b2X2 + b3X3 + e

The signs we expected after estimation: Y = b0 - b1X1 + b2X2 - b3X3 (that is, b1 negative, b2 positive, b3 negative).

Decision: the outcome did not match our expectation (the estimated coefficient on X2 is negative). So assumption 4 is violated.

(Assumption no. 5)

Serial correlation (autocorrelation)

Breusch-Godfrey Serial Correlation LM Test:

F-statistic      1.01     Prob. F(2,29)         0.3751
Obs*R-squared    2.288    Prob. Chi-Square(2)   0.3185

Null hypothesis: no serial correlation in the residuals (u)
Alternative: there is serial correlation in the residuals (u)

Since the p-value (0.3185) of Obs*R-squared is more than 5 percent (p > 0.05), we cannot reject the null hypothesis, meaning that the residuals (u) are not serially correlated, which is desirable.

(Assumption no. 6)

Heteroscedasticity test

Breusch-Pagan-Godfrey test (B-P-G test):

F-statistic      1.84     Probability   0.3316
Obs*R-squared    3.600    Probability   0.3080

Null hypothesis: the residuals (u) are homoscedastic
Alternative: the residuals (u) are heteroscedastic

The p-value of Obs*R-squared shows that we cannot reject the null. So the residuals do have constant variance, which is desirable, meaning that the residuals are homoscedastic.

Note: the B-P-G test is normally used for large samples.

(Assumption no. 7)

Residual (u) normality test

[Figure: histogram of the residuals, with the summary statistics reproduced below.]

Series: Residuals
Sample: 1 35
Observations: 35

Mean          1.15e-16
Median       -0.029701
Maximum       0.972181
Minimum      -0.505102
Std. Dev.     0.356788
Skewness      0.880996
Kurtosis      3.508042

Jarque-Bera   4.903965
Probability   0.086123

Null hypothesis: the residuals (u) are normally distributed
Alternative: not normally distributed

The Jarque-Bera statistic is 4.904 and the corresponding p-value is 0.0861. Since the p-value is more than 5 percent, we cannot reject the null, meaning that the population residual (u) is normally distributed, which fulfills this assumption of a good regression line.

Evaluation of our model on the basis of the assumptions

1. R-squared is very low (bad sign).

2. There is no serial correlation (good sign).

3. The independent variables cannot jointly influence Y (bad sign).

4. The signs are not as expected (bad sign).

5. Only one of the three independent variables, X2, is significant (bad sign).

6. There is no heteroscedasticity problem (good sign).

7. The residuals are normally distributed (good sign).

Use the information on this website at your own risk. This website shall not be responsible for any loss or expense suffered in connection with the use of this website.