problems in regression analysis
1Spring 02
Problems in Regression Analysis
• Heteroscedasticity: violation of the constancy of the variance of the errors. Found mainly in cross-sectional data.
• Serial correlation: violation of the assumption of uncorrelated error terms. Found mainly in time-series data.
Heteroscedasticity
The OLS model assumes homoscedasticity, i.e., the variance of the errors is constant. In some regressions, especially in cross-sectional studies, this assumption may be violated.
When heteroscedasticity is present, OLS estimation puts more weight on the observations which have large error variances than on those with small error variances.
The OLS estimates remain unbiased, but they are inefficient: they have larger than minimum variance.
Tests of Heteroscedasticity
Lagrange Multiplier Tests
Goldfeld-Quandt Test
White’s Test
Goldfeld-Quandt Test
Order the data by the magnitude of the independent variable, X, which is thought to be related to the error variance.
Omit the middle d observations. (d might be 1/5 of the total sample size)
Fit two separate regressions; one for the low values, another for the high values
Calculate ESS1 and ESS2
Calculate
$$F_{\frac{N-d-2k}{2},\,\frac{N-d-2k}{2}} = \frac{ESS_2}{ESS_1}$$
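The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `ols_ess` and `goldfeld_quandt` are hypothetical, the data are synthetic (the error amplitude is made to grow with X on purpose), and k is taken to be the number of estimated parameters (2 in the two-variable case).

```python
def ols_ess(x, y):
    """Fit y = a + b*x by OLS and return the error sum of squares (ESS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, d, k=2):
    """Return (F, df) for the two-variable Goldfeld-Quandt test."""
    pairs = sorted(zip(x, y))              # order the data by X
    m = (len(pairs) - d) // 2              # size of each sub-sample
    low, high = pairs[:m], pairs[-m:]      # omit the middle d observations
    ess1 = ols_ess(*zip(*low))             # regression on the low values
    ess2 = ols_ess(*zip(*high))            # regression on the high values
    df = (len(pairs) - d - 2 * k) // 2     # degrees of freedom (numerator and denominator)
    return ess2 / ess1, df


# Synthetic data (an assumption, not from the slides): error size proportional to X.
x = list(range(1, 21))
y = [2 + 0.5 * xi + ((-1) ** xi) * 0.05 * xi for xi in x]

f_stat, df = goldfeld_quandt(x, y, d=4)
print(f"F = {f_stat:.2f} with ({df}, {df}) degrees of freedom")
```

The resulting F would then be compared with the tabulated critical value for the same degrees of freedom.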
Problem
Salvatore: data on income (Y) and consumption, with three consumption observations at each income level.

 Y    Consumption
12    10.6   10.8   11.1
13    11.4   11.7   12.1
14    12.3   12.6   13.2
15    13.0   13.3   13.6
16    13.8   14.0   14.2
17    14.4   14.9   15.3
18    15.0   15.7   16.4
19    15.9   16.5   16.9
20    16.9   17.5   18.1
21    17.2   17.8   18.5
Problem
[Figure: plot of the income and consumption data. Vertical axis from 10.0 to 19.0; horizontal axis from 10 to 22.]
Problem
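Regression on the whole sample:
$$\hat{C} = 1.48^{*} + .788^{*}\,Y_d$$
Regressions on the first twelve and last twelve observations:
$$\hat{C} = .85 + .837\,Y_d, \quad R^2 = 0.91, \quad ESS_1 = 1.069$$
$$\hat{C} = 2.31 + .747\,Y_d, \quad R^2 = 0.71, \quad ESS_2 = 3.344$$
$$F_{10,10} = \frac{ESS_2}{ESS_1} = \frac{3.344}{1.069} = 3.13 > F_{crit} = 2.97$$
at the 5% level. Since F exceeds the critical value, the hypothesis of homoscedasticity is rejected.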
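As a check on the figures above, the three regressions can be recomputed from the income and consumption data of the problem. The sketch below uses plain Python; the helper name `fit` is hypothetical, and the critical value 2.97 is the one quoted in the slide rather than something computed here.

```python
def fit(x, y):
    """Two-variable OLS: return (intercept, slope, error sum of squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ess = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, ess


# Income level -> three consumption observations (data from the problem).
consumption = {
    12: [10.6, 10.8, 11.1], 13: [11.4, 11.7, 12.1], 14: [12.3, 12.6, 13.2],
    15: [13.0, 13.3, 13.6], 16: [13.8, 14.0, 14.2], 17: [14.4, 14.9, 15.3],
    18: [15.0, 15.7, 16.4], 19: [15.9, 16.5, 16.9], 20: [16.9, 17.5, 18.1],
    21: [17.2, 17.8, 18.5],
}
income, cons = [], []
for level in sorted(consumption):          # data are already ordered by income
    for c in consumption[level]:
        income.append(level)
        cons.append(c)

a, b, _ = fit(income, cons)                          # whole sample (30 observations)
a1, b1, ess1 = fit(income[:12], cons[:12])           # first twelve observations
a2, b2, ess2 = fit(income[-12:], cons[-12:])         # last twelve (middle 6 omitted)
f_stat = ess2 / ess1

print(f"whole sample: C = {a:.2f} + {b:.3f} Yd")
print(f"low:  C = {a1:.2f} + {b1:.3f} Yd, ESS1 = {ess1:.3f}")
print(f"high: C = {a2:.2f} + {b2:.3f} Yd, ESS2 = {ess2:.3f}")
print(f"F = {f_stat:.2f} (critical value quoted in the slide: 2.97)")
```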
To Correct for Heteroscedasticity
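To correct for heteroscedasticity of the form $Var(\varepsilon_i) = C X_i^2$, where C is a nonzero constant, transform the variables by dividing through by the problematic variable.
In the two variable case,
$$\frac{Y_i}{X_i} = \beta_1 \frac{1}{X_i} + \beta_2 + \frac{\varepsilon_i}{X_i}$$
The transformed error term is now homoscedastic, since $Var(\varepsilon_i / X_i) = C X_i^2 / X_i^2 = C$.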
Problem
$$\frac{C}{Y_d} = \beta_1 \frac{1}{Y_d} + \beta_2 + u_i$$
$$\frac{\hat{C}}{Y_d} = .792 + 1.421\,\frac{1}{Y_d}$$
$$\hat{C} = 1.421 + .792\,Y_d$$
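The transformed regression can be reproduced from the same income and consumption data by regressing C/Yd on 1/Yd. The sketch below is a plain-Python illustration; the helper name `ols` is hypothetical. Note how the roles swap: the constant of the transformed regression is the slope on Yd in the original equation, and the coefficient on 1/Yd is the original constant.

```python
def ols(x, y):
    """Two-variable OLS: return (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Income level -> three consumption observations (data from the problem).
consumption = {
    12: [10.6, 10.8, 11.1], 13: [11.4, 11.7, 12.1], 14: [12.3, 12.6, 13.2],
    15: [13.0, 13.3, 13.6], 16: [13.8, 14.0, 14.2], 17: [14.4, 14.9, 15.3],
    18: [15.0, 15.7, 16.4], 19: [15.9, 16.5, 16.9], 20: [16.9, 17.5, 18.1],
    21: [17.2, 17.8, 18.5],
}
income = [level for level in sorted(consumption) for _ in consumption[level]]
cons = [c for level in sorted(consumption) for c in consumption[level]]

# Divide through by the problematic variable Yd.
inv_income = [1 / yd for yd in income]                 # regressor: 1 / Yd
cons_ratio = [c / yd for c, yd in zip(cons, income)]   # dependent variable: C / Yd

const_t, slope_t = ols(inv_income, cons_ratio)

# Back in the original variables: C = slope_t + const_t * Yd
print(f"C/Yd = {const_t:.3f} + {slope_t:.3f} (1/Yd)")
print(f"C    = {slope_t:.3f} + {const_t:.3f} Yd")
```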
Serial Correlation
This is the problem which arises in OLS estimation when the errors are not independent. The error term in one period is correlated with error
terms in previous periods.
If $\varepsilon_i$ is correlated with $\varepsilon_{i-1}$, then we say there is first order serial correlation. Serial correlation may be positive, $E(\varepsilon_i \varepsilon_{i-1}) > 0$, or negative, $E(\varepsilon_i \varepsilon_{i-1}) < 0$.
Serial Correlation
If serial correlation is present, the OLS estimates are still unbiased and consistent, but the standard errors are biased, leading to incorrect statistical tests and biased confidence intervals.
• With positive serial correlation, the standard errors of $\hat{\beta}$ are biased downward, leading to higher t statistics.
• With negative serial correlation, the standard errors of $\hat{\beta}$ are biased upward, leading to lower t statistics.
Durbin-Watson Statistic
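$$d = \frac{\sum_{t=2}^{n} (\varepsilon_t - \varepsilon_{t-1})^2}{\sum_{t=1}^{n} \varepsilon_t^2}, \qquad 0 \le d \le 4$$

Range of d        Conclusion
0 to dL           positive serial correlation (+SC)
dL to dU          inconclusive
dU to 4-dU        no serial correlation
4-dU to 4-dL      inconclusive
4-dL to 4         negative serial correlation (-SC)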
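The statistic itself is simple to compute from a list of regression residuals. The sketch below is a minimal illustration; the function name `durbin_watson` and the two residual series are made up for the example and do not come from the slides.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals: runs of the same sign suggest positive serial correlation.
d_positive = durbin_watson([1, 1, 1, -1, -1, -1])

# Hypothetical residuals: alternating signs suggest negative serial correlation.
d_negative = durbin_watson([1, -1, 1, -1, 1, -1])

print(f"d with same-sign runs:    {d_positive:.3f}")   # well below 2
print(f"d with alternating signs: {d_negative:.3f}")   # well above 2
```

Values of d near 0 point toward positive serial correlation, values near 4 toward negative serial correlation, and values near 2 toward none.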
Problem
Data 9-4 shows corporate profits and sales in billions of dollars for the manufacturing sector of the U.S. from 1974 to 1994.
Estimate the equation
$$Profits = \beta_1 + \beta_2\,Sales + e$$
Test for first-order serial correlation.
Problem
Coefficients (dependent variable: PROFITS)

Model 1        B            Std. Error   Beta    t        Sig.
(Constant)     34.014       24.041               1.415    .173
SALES          2.654E-02    .011         .496    2.492    .022

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

OLS estimate of Profit ($\pi$) as a function of Sales:
$$\hat{\pi}_t = 34.01 + .027^{*}\,Sales_t$$
Problem
Test for serial correlation (SPSS output)
Model Summary (predictors: (Constant), SALES; dependent variable: PROFITS)

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .496   .246       .207                31.251                       1.080
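To read the reported Durbin-Watson value, it has to be placed in one of the five regions of the decision scale. The sketch below does that with a hypothetical helper, `dw_decision`. The bounds dL = 1.22 and dU = 1.42 are an assumption: they are the 5% values commonly tabulated for 21 observations and one regressor, and should be confirmed against a Durbin-Watson table before being relied on.

```python
def dw_decision(d, d_lower, d_upper):
    """Place a Durbin-Watson statistic in one of the five decision regions."""
    if d < d_lower:
        return "positive serial correlation"
    if d < d_upper:
        return "inconclusive"
    if d <= 4 - d_upper:
        return "no serial correlation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative serial correlation"


# d comes from the SPSS output; the bounds are assumed table values (see above).
d = 1.080
d_lower, d_upper = 1.22, 1.42

decision = dw_decision(d, d_lower, d_upper)
print(f"d = {d}: {decision}")
```

Under these assumed bounds, d = 1.080 falls below dL, which is why the following slides turn to a correction for serial correlation.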
Correcting for Serial Correlation
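We assume:
$$\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t$$
$$Cov(\varepsilon_t, \varepsilon_{t-1}) = \rho\,\sigma^2$$
where $u_t$ is distributed normally with a zero mean and constant variance, and $\sigma^2$ is the variance of $\varepsilon_t$.
Follow a Durbin procedure.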
Correcting for Serial Correlation
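$$Y_t = \beta_1 + \beta_2 X_{2t} + \dots + \beta_k X_{kt} + \varepsilon_t$$
$$Y_{t-1} = \beta_1 + \beta_2 X_{2,t-1} + \dots + \beta_k X_{k,t-1} + \varepsilon_{t-1}$$
$$\rho Y_{t-1} = \rho\beta_1 + \rho\beta_2 X_{2,t-1} + \dots + \rho\beta_k X_{k,t-1} + \rho\,\varepsilon_{t-1}$$
$$Y_t - \rho Y_{t-1} = \beta_1(1-\rho) + \beta_2(X_{2t} - \rho X_{2,t-1}) + \dots + \beta_k(X_{kt} - \rho X_{k,t-1}) + (\varepsilon_t - \rho\,\varepsilon_{t-1})$$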
Correcting for Serial Correlation
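$$Y_t - \rho Y_{t-1} = \beta_1(1-\rho) + \beta_2(X_{2t} - \rho X_{2,t-1}) + \dots + \beta_k(X_{kt} - \rho X_{k,t-1}) + (\varepsilon_t - \rho\,\varepsilon_{t-1})$$
• Move the lagged dependent variable term to the right-hand side and estimate the equation using OLS. The estimated coefficient on the lagged dependent variable is $\hat{\rho}$, the estimate of $\rho$.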
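The first step can be illustrated with one regressor: regress Y on a constant, lagged Y, X, and lagged X, and read the estimate of ρ off the lagged-Y coefficient. The sketch below is an assumption-laden toy: the helper names `solve` and `ols` are hypothetical, the data are synthetic, and the disturbance u is set to zero so that the regression recovers ρ exactly. With real data the estimate would only approximate ρ.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic example (assumed values): errors follow e_t = rho * e_{t-1} with u_t = 0.
beta1, beta2, rho = 2.0, 0.5, 0.6
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
eps = [4.0 * rho ** t for t in range(len(x))]
y = [beta1 + beta2 * xt + et for xt, et in zip(x, eps)]

# Regressors: constant, Y lagged, X, X lagged (the first observation is lost).
rows = [[1.0, y[t - 1], x[t], x[t - 1]] for t in range(1, len(x))]
coef = ols(rows, y[1:])
rho_hat = coef[1]   # coefficient on the lagged dependent variable

print(f"estimated rho = {rho_hat:.4f}")
```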
Correcting for Serial Correlation
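$$Y_t - \rho Y_{t-1} = \beta_1(1-\rho) + \beta_2(X_{2t} - \rho X_{2,t-1}) + \dots + \beta_k(X_{kt} - \rho X_{k,t-1}) + (\varepsilon_t - \rho\,\varepsilon_{t-1})$$
Create new independent and dependent variables by the following process:
$$Y_t^* = Y_t - \hat{\rho}\,Y_{t-1}$$
$$X_t^* = X_t - \hat{\rho}\,X_{t-1}$$
Estimate the following equation:
$$Y_t^* = \beta_1(1-\hat{\rho}) + \beta_2 X_2^* + \dots + \beta_k X_k^* + u_t$$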
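Generating the starred variables is a one-line transformation once an estimate of ρ is available. The sketch below uses a hypothetical helper, `quasi_difference`, and made-up numbers. One observation is lost because the first value has no predecessor.

```python
def quasi_difference(series, rho_hat):
    """Return the starred series: series[t] - rho_hat * series[t-1] for t >= 1."""
    return [series[t] - rho_hat * series[t - 1] for t in range(1, len(series))]


# Made-up values, only to show the arithmetic.
y = [10.0, 12.0, 15.0, 19.0]
y_star = quasi_difference(y, rho_hat=0.5)

print(y_star)   # 12 - 5, 15 - 6, 19 - 7.5
```

The same function would be applied to the dependent variable and to each independent variable before running the second regression.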
Correcting for Serial Correlation
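The slope coefficients of the regression on the transformed variables estimate the same slopes as in the original equation, but corrected for serial correlation.
$$Y_t^* = \beta_1(1-\hat{\rho}) + \beta_2 X_2^* + \dots + \beta_k X_k^* + u_t$$
The constant of the regression on the transformed variables is
$$\beta_1^* = \beta_1(1-\hat{\rho}) \quad \text{or} \quad \beta_1 = \frac{\beta_1^*}{1-\hat{\rho}}$$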
Problem
Begin by regressing Profit ($\pi$) on Profit lagged one period, Sales ($S$), and Sales lagged one period.
$$\pi_t = \beta_1(1-\rho) + \rho\,\pi_{t-1} + \beta_2 S_t - \rho\beta_2 S_{t-1} + u_t$$
The estimated coefficient on the lagged dependent variable is $\hat{\rho}$.
Problem
Coefficients (dependent variable: PROFITS)

Model 1        B         Std. Error   Beta      t         Sig.
(Constant)     -1.419    24.387                 -.058     .954
PROFITSL       .492      .209         .419      2.358     .031
SALES          .176      .052         3.106     3.355     .004
SALESL         -.161     .053         -2.840    -3.046    .008

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient. PROFITSL and SALESL are Profits and Sales lagged one period.

$\hat{\rho} = .49$
Problem
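Then generate the transformed (starred) variables and run the regression on the transformed variables:
Profit* = .167 + .042 Sales*
The slope carries over unchanged, and the constant is recovered by dividing by $1 - \hat{\rho}$: .167 / (1 - .49) = .327. With no serial correlation, the equation in the original variables is
Profit = .327 + .042 Sales

Coefficients (dependent variable: PROFITSS)

Model 1        B            Std. Error   Beta    t        Sig.
(Constant)     .167         24.855               .007     .995
SALESS         4.234E-02    .020         .442    2.091    .051

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient. PROFITSS and SALESS are the transformed (starred) Profits and Sales.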