econometrics multiple regression analysis: heteroskedasticity docentes.fe.unl.pt/~azevedoj/web...
TRANSCRIPT
MLR.5 Variance LM Statistic Testing Heteroskedasticity WLS and GLS
Econometrics
Multiple Regression Analysis: Heteroskedasticity
Joao Valle e Azevedo
Faculdade de Economia, Universidade Nova de Lisboa
Spring Semester
Joao Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19
Heteroskedasticity
Properties of OLS: Variance
Assumption MLR.5 (Homoskedasticity): The error u has the same variance given any value of the explanatory variables
$$\mathrm{Var}(u \mid x_1, \dots, x_k) = \sigma^2, \quad \text{leading to} \quad \mathrm{Var}(\hat\beta \mid X) = \sigma^2 (X'X)^{-1}$$
With MLR.1 through MLR.5 we have derived the variance of the OLS estimators and further concluded that OLS is asymptotically Normal: enough to conduct inference "as usual"
If MLR.5 does not hold, that is, if the conditional variance of u is allowed to vary given the x's, then the errors are heteroskedastic and the results above are NOT valid. We cannot make inference "as usual" (t tests, F tests, LM tests)
Heteroskedastic Case
Suppose y is wage and x is education
[Figure: densities f(y|x) drawn at x1, x2, x3 along the line E(y|x) = β0 + β1x. Caption: How spread out is the distribution of the estimator]
Properties of OLS: Variance (Cont.)
Theorem
Under assumptions MLR.1 through MLR.5,
$$\mathrm{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}, \quad j = 1, \dots, k$$
where
$$SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$$
and $R_j^2$ is the coefficient of determination from regressing $x_j$ on all the other regressors.
It tells us how much the other regressors "explain" $x_j$
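As a numerical check of the theorem, the sketch below compares the slope variances from the formula above with the diagonal of $\sigma^2 (X'X)^{-1}$. The simulated regressors, the value of `sigma2` and the helper name `var_from_formula` are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 200, 4.0                      # sigma2: assumed (known) error variance
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # x2 is correlated with x1 on purpose
X = np.column_stack([np.ones(n), x1, x2])

# Matrix form: Var(beta_hat | X) = sigma^2 (X'X)^{-1}
var_matrix = sigma2 * np.linalg.inv(X.T @ X)

def var_from_formula(X, j, sigma2):
    """sigma^2 / (SST_j * (1 - R_j^2)) for slope j (column 0 is the intercept)."""
    xj = X[:, j]
    others = np.delete(X, j, axis=1)      # intercept plus the other regressors
    coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
    ssr_j = np.sum((xj - others @ coef) ** 2)
    sst_j = np.sum((xj - xj.mean()) ** 2)
    r2_j = 1.0 - ssr_j / sst_j
    return sigma2 / (sst_j * (1.0 - r2_j))

var_slopes = [var_from_formula(X, j, sigma2) for j in (1, 2)]
print(var_slopes, np.diag(var_matrix)[1:])
```

The two sets of numbers agree because $SST_j(1 - R_j^2)$ is exactly the sum of squared residuals from regressing $x_j$ on the other regressors.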
Variance with Heteroskedasticity
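Now assume $\mathrm{Var}(u_i \mid x_{i1}, \dots, x_{ik}) = \sigma_i^2$
For the simple regression case:
$$\hat\beta_1 = \beta_1 + \frac{\sum (x_i - \bar{x})\, u_i}{\sum (x_i - \bar{x})^2}$$
So, conditional on the x's:
$$\mathrm{Var}(\hat\beta_1) = \frac{\sum (x_i - \bar{x})^2 \sigma_i^2}{\left[\sum (x_i - \bar{x})^2\right]^2}$$
A valid estimator when $\sigma_i^2 \neq \sigma^2$ is:
$$\widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\sum (x_i - \bar{x})^2 \hat{u}_i^2}{\left[\sum (x_i - \bar{x})^2\right]^2}, \quad \text{where } \hat{u}_i \text{ are the OLS residuals}$$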
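The estimator for the simple regression case can be sketched in a few lines. The data-generating process (error standard deviation proportional to x) and the function name `robust_var_slope` are assumptions made only for illustration.

```python
import numpy as np

def robust_var_slope(x, y):
    """Slope estimate and its heteroskedasticity-robust variance in y = b0 + b1*x + u."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    sst_x = np.sum(dx ** 2)
    b1 = np.sum(dx * y) / sst_x
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x                      # OLS residuals u_hat
    return b1, np.sum(dx ** 2 * resid ** 2) / sst_x ** 2

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=500)
u = rng.normal(0, 1, size=500) * x               # heteroskedastic by construction
y = 1.0 + 2.0 * x + u
b1_hat, var_robust = robust_var_slope(x, y)
print(b1_hat, np.sqrt(var_robust))               # slope and its robust standard error
```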
Variance with Heteroskedasticity
For the multiple regression model, a valid (consistent) estimator of $\mathrm{Var}(\hat\beta_j)$ with heteroskedasticity is:
$$\widehat{\mathrm{Var}}(\hat\beta_j) = \frac{\sum_i \hat{r}_{ij}^2\, \hat{u}_i^2}{SSR_j^2}$$
$\hat{r}_{ij}$ is the i-th residual from regressing $x_j$ on all other independent variables
$SSR_j$ is the sum of squared residuals from this regression
$\hat{u}_i$ are the OLS residuals
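A minimal sketch of this estimator, computed regressor by regressor, is given below. For comparison it also computes the matrix "sandwich" form $(X'X)^{-1} X' \mathrm{diag}(\hat u_i^2) X (X'X)^{-1}$, which is not stated on the slide but has the same diagonal. The simulated data and function names are assumptions.

```python
import numpy as np

def robust_var_j(X, y, j):
    """sum_i r_ij^2 * u_i^2 / SSR_j^2 for column j of X (X includes a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta                              # OLS residuals
    others = np.delete(X, j, axis=1)
    gamma, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    r_j = X[:, j] - others @ gamma                    # residuals of x_j on the other regressors
    ssr_j = np.sum(r_j ** 2)
    return np.sum(r_j ** 2 * u_hat ** 2) / ssr_j ** 2

def sandwich(X, y):
    """(X'X)^{-1} X' diag(u_hat^2) X (X'X)^{-1}."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta
    xtx_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (u_hat ** 2)[:, None])
    return xtx_inv @ meat @ xtx_inv

rng = np.random.default_rng(1)
n = 300
x1 = rng.uniform(0, 5, n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1 + 0.5 * x1 - x2 + rng.normal(size=n) * (1 + x1)   # error spread grows with x1

v_formula = np.array([robust_var_j(X, y, j) for j in range(X.shape[1])])
v_sandwich = np.diag(sandwich(X, y))
robust_se = np.sqrt(v_formula)                           # robust standard errors
```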
Robust Standard Errors
The square root of this variance can be used as a standard error for inference (robust standard error). With these standard errors it turns out that:
$$t = \frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \overset{a}{\sim} \mathrm{Normal}(0, 1)$$
- This is a heteroskedasticity-robust t statistic
Often, the estimated variance is corrected for degrees of freedom by multiplying by n/(n − k − 1) (irrelevant for large n)
Why not always use robust standard errors?
- In small samples, t statistics using robust standard errors will not have a distribution close to the Normal (or t), and inferences will not be correct
We will not deal with heteroskedasticity-robust F statistics
Instead, use heteroskedasticity-robust LM tests
A Robust LM Statistic
Suppose we have a standard model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$
and our null hypothesis is $H_0: \beta_{k-q+1} = \beta_{k-q+2} = \dots = \beta_k = 0$ (the number of restrictions is q)
First, run OLS on the restricted model and save the residuals $\tilde{u}$
Regress each of the excluded variables on all of the included variables (q different regressions) and save each set of residuals $\tilde{r}_1, \tilde{r}_2, \dots, \tilde{r}_q$
Form the products $\tilde{r}_1\tilde{u}, \tilde{r}_2\tilde{u}, \dots, \tilde{r}_q\tilde{u}$ and regress a variable defined to be = 1 on them, with no intercept
The LM statistic is $n - SSR_1$, where $SSR_1$ is the sum of squared residuals from this final regression. Under the null, it has an approximate chi-square distribution with q degrees of freedom
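The steps can be sketched as follows. This follows the standard textbook version of the procedure, in which each set of residuals from the auxiliary regressions is multiplied by the restricted-model residuals before the final regression of 1 on the products. The simulated data and the names `ols_resid` and `robust_lm` are assumptions.

```python
import numpy as np

def ols_resid(X, y):
    """Residuals from an OLS regression of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def robust_lm(y, X_incl, X_excl):
    """Heteroskedasticity-robust LM statistic for H0: coefficients on X_excl are all zero.

    X_incl: included regressors (with constant column); X_excl: the q excluded regressors.
    """
    n = len(y)
    u_tilde = ols_resid(X_incl, y)                     # step 1: restricted residuals
    r_tilde = np.column_stack(
        [ols_resid(X_incl, X_excl[:, j]) for j in range(X_excl.shape[1])]
    )                                                  # step 2: q auxiliary regressions
    products = r_tilde * u_tilde[:, None]              # step 3: r_j * u for each j
    ssr1 = np.sum(ols_resid(products, np.ones(n)) ** 2)  # step 4: regress 1, no intercept
    return n - ssr1

rng = np.random.default_rng(2)
n = 400
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + x1 + x2 + x3 + rng.normal(size=n) * np.exp(0.3 * x1)
lm = robust_lm(y, np.column_stack([np.ones(n), x1]), np.column_stack([x2, x3]))
print(lm)   # compare with a chi-square(2) critical value, e.g. 5.99 at the 5% level
```

Here the excluded variables truly belong in the model, so the statistic should lie far above the critical value.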
Testing for Heteroskedasticity
We want to test $H_0: \mathrm{Var}(u \mid x_1, \dots, x_k) = \sigma^2$, which is equivalent to $H_0: E(u^2 \mid x_1, \dots, x_k) = E(u^2) = \sigma^2$
If we assume the relationship between $u^2$ and the $x_j$ is linear, we can test this as a linear restriction
- Thus, for $u^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + \nu$, this means testing $H_0: \delta_1 = \delta_2 = \dots = \delta_k = 0$
- We don't observe the error, but we can use the residuals from the OLS regression
The Breusch-Pagan Test
Estimate $\hat{u}^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + \nu$ by OLS, where $\hat{u}$ are the OLS residuals
We want to test $H_0: \delta_1 = \delta_2 = \dots = \delta_k = 0$
- Take the $R^2$ of this regression. With assumptions MLR.1 through MLR.4 still in place we can use an F test or an LM-type test
- The F statistic is just the reported F statistic for overall significance of this regression:
$$F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} \sim F_{k,\,n-k-1}$$
Alternatively, we can form the LM statistic $LM = nR^2$, which is approximately distributed as $\chi^2_k$ under the null (this uses the $R^2$ of the regression above; it is not the typical LM test!)
These tests are usually called the Breusch-Pagan tests for heteroskedasticity
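A minimal sketch of both Breusch-Pagan statistics is shown below, using only NumPy. The simulated data (error standard deviation proportional to x1) and the function names are assumptions; p-values are left out and the statistics are compared with tabulated critical values instead.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS regression of y on X (X includes a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def breusch_pagan(X, y):
    """Return (F, LM, k) for the regression of squared OLS residuals on the x's."""
    n, p = X.shape
    k = p - 1                                      # number of slope coefficients
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ coef) ** 2                       # squared OLS residuals
    r2 = r_squared(X, u2)
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    lm = n * r2
    return f_stat, lm, k

rng = np.random.default_rng(3)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n) * x1   # Var(u|x) = x1^2
f_stat, lm, k = breusch_pagan(X, y)
print(f_stat, lm)   # chi-square(2) 5% critical value is about 5.99
```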
The White Test
The Breusch-Pagan tests will detect linear forms of heteroskedasticity
The White test allows for nonlinearities by using squares and cross-products of all the x's
Estimate
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + \delta_{k+1} x_1^2 + \dots + \delta_{2k} x_k^2 + \delta_{2k+1} x_1 x_2 + \dots + \delta_{k + k(k+1)/2}\, x_{k-1} x_k + \text{error}$$
by OLS
We want to test $H_0: \delta_1 = \delta_2 = \dots = \delta_{k + k(k+1)/2} = 0$
- Take the $R^2$ of this regression and still use the F or LM statistics to test whether all the $x_j$, $x_j^2$, and $x_j x_h$ are jointly significant:
$$F = \frac{R^2/q}{(1 - R^2)/(n - q - 1)} \sim F_{q,\,n-q-1} \quad \text{(approx.) under the null}$$
- and $LM = nR^2 \sim \chi^2_q$ (approx.) under the null, where $q = k + k(k+1)/2$
- If k is large and n small these approximations are poor
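The White regression can be sketched as below: build the levels, squares and cross-products, then form the F and LM statistics from the resulting $R^2$. The data are simulated with $\mathrm{Var}(u \mid x) = x_1^2$ and a symmetric $x_1$, a nonlinear form chosen for illustration; all names are assumptions.

```python
import numpy as np
from itertools import combinations

def white_regressors(X):
    """Constant, levels, squares and pairwise cross-products of the columns of X (no constant in X)."""
    n, k = X.shape
    levels = [X[:, j] for j in range(k)]
    squares = [X[:, j] ** 2 for j in range(k)]
    cross = [X[:, a] * X[:, b] for a, b in combinations(range(k), 2)]
    return np.column_stack([np.ones(n)] + levels + squares + cross)

def white_test(X, y):
    """Return (F, LM, q) for the White test; X holds the k regressors without a constant."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    u2 = (y - Xc @ beta) ** 2                        # squared OLS residuals
    Z = white_regressors(X)
    q = Z.shape[1] - 1                               # q = k + k(k+1)/2 restrictions
    delta, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    r2 = 1.0 - np.sum((u2 - Z @ delta) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    f_stat = (r2 / q) / ((1.0 - r2) / (n - q - 1))
    return f_stat, n * r2, q

rng = np.random.default_rng(4)
n = 500
X = rng.normal(size=(n, 2))
y = 1.0 + X[:, 0] - X[:, 1] + rng.normal(size=n) * X[:, 0]   # Var(u|x) = x1^2
f_stat, lm, q = white_test(X, y)
print(f_stat, lm, q)   # chi-square(5) 5% critical value is about 11.07
```

Note that squares of dummy variables duplicate the levels; the least-squares routine used here tolerates the resulting collinearity, but q would then overstate the number of restrictions.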
Alternate form of the White Test
Now, the fitted values from OLS, $\hat{y}$, are a function of all the x's
Thus, $\hat{y}^2$ will be a function of the squares and cross-products, and $\hat{y}$ and $\hat{y}^2$ can "substitute" for all of the $x_j$, $x_j^2$, and $x_j x_h$, so:
- Regress the squared residuals on $\hat{y}$ and $\hat{y}^2$ (as well as a constant) and use the $R^2$ to form an F or LM statistic (as for the BP or White tests)
- Only testing 2 restrictions now
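This alternate form needs only the fitted values, as the following sketch shows. The simulated data and the name `white_special` are assumptions.

```python
import numpy as np

def white_special(X, y):
    """Regress squared OLS residuals on 1, y_hat, y_hat^2; return (F, LM) with 2 restrictions.

    X includes a constant column.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    u2 = (y - y_hat) ** 2
    Z = np.column_stack([np.ones(n), y_hat, y_hat ** 2])
    delta, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    r2 = 1.0 - np.sum((u2 - Z @ delta) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    f_stat = (r2 / 2) / ((1.0 - r2) / (n - 3))
    return f_stat, n * r2

rng = np.random.default_rng(7)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n) * x1
f_stat, lm = white_special(X, y)
print(f_stat, lm)   # chi-square(2) 5% critical value is about 5.99
```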
WLS
Weighted Least Squares
We can always estimate robust standard errors for OLS
However, if we know something about the specific form of the heteroskedasticity, we can obtain estimators that have a smaller variance than OLS
If we do in fact know the form, we can transform the model into one that has homoskedastic errors
Case of known form up to a multiplicative constant
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + u$$
Suppose we know that $\mathrm{Var}(u \mid x) = \sigma^2 h(x)$, or $\mathrm{Var}(u_i \mid x_i) = \sigma^2 h(x_i) = \sigma^2 h_i$
Example:
wage = β0 + β1 Education + β2 Experience + β3 Tenure + u
We know that $E(u_i/\sqrt{h_i} \mid x) = 0$, because $h_i$ depends only on x, and $\mathrm{Var}(u_i/\sqrt{h_i} \mid x) = \sigma^2$, because $\mathrm{Var}(u_i \mid x) = \sigma^2 h_i$
So, if we divide the regression equation by $\sqrt{h_i}$ we get a model where the error is homoskedastic (MLR.1 to MLR.5 hold again)
Generalized Least Squares
Estimating the transformed equation by OLS is an example of generalized least squares (GLS)
GLS will be BLUE (Best Linear Unbiased Estimator) in this case
The GLS estimator for the particular case where we divide the regression equation by $\sqrt{h_i}$ is called a weighted least squares (WLS) estimator. Why? OLS on the transformed equation minimizes
$$\sum_{i=1}^{n} \left( y_i^* - \beta_0 \frac{1}{\sqrt{h_i}} - \beta_1 x_{i1}^* - \dots - \beta_k x_{ik}^* \right)^2$$
where $y_i^* = y_i/\sqrt{h_i}$, $x_{i1}^* = x_{i1}/\sqrt{h_i}$, and so on, which is the same as minimizing
$$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_k x_{ik})^2 / h_i$$
Individuals with larger variance are given a smaller weight.
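The equivalence of the two criteria can be checked numerically. The sketch below assumes a known variance function $h(x) = x^2$, chosen purely for illustration, and compares OLS on the transformed data with the solution of the weighted normal equations.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 250
x = rng.uniform(1, 5, n)
h = x ** 2                                   # assumed known h(x) = x^2 (illustrative)
y = 2.0 + 1.5 * x + rng.normal(size=n) * np.sqrt(h)
X = np.column_stack([np.ones(n), x])

# (1) OLS on the transformed data: every variable, including the constant,
#     is divided by sqrt(h_i)
s = 1.0 / np.sqrt(h)
beta_transformed, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)

# (2) Minimise sum_i (y_i - x_i'b)^2 / h_i via the weighted normal equations
#     (X' W X) b = X' W y with W = diag(1 / h_i)
w = 1.0 / h
beta_weighted = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))

print(beta_transformed, beta_weighted)
```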
More on WLS
We interpret WLS estimates in the original (not transformed) model but get variances of the WLS estimators in the transformed model
WLS is optimal if we know the form of $\mathrm{Var}(u_i \mid x_i)$
In most cases, we won't know the form of heteroskedasticity
We can often estimate the form of heteroskedasticity
Example:
wage = β0 + β1 Education + β2 Experience + β3 Tenure + u
$$\mathrm{Var}(u \mid \text{Education}, \text{Experience}, \text{Tenure}) = \sigma^2 \exp(\delta_0 + \delta_1 \text{Education})$$
- where $\delta_0$ and $\delta_1$ are unknown
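Must estimate the form of Heteroskedasticity: Feasible GLS
First, we assume a model for heteroskedasticity
Example: $\mathrm{Var}(u \mid x) = E(u^2 \mid x) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k) > 0$
Since we don't know the δ's, we must estimate them
We can write the above model as:
$$u^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)\,\nu, \quad \text{where } E(\nu \mid x) = 1$$
Assume further that ν is independent of x
Then $\ln(u^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e$
where $E(e) = 0$ and e is independent of x (the intercept $\alpha_0$ absorbs $\ln \sigma^2$, $\delta_0$ and $E[\ln \nu]$)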
Feasible GLS (continued)
$\ln(u^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e$, where $E(e) = 0$ and e is independent of x
We can use $\hat{u}$ (from OLS) instead of u to estimate this equation by OLS
Then, obtain an estimate of $h_i$ by $\hat{h}_i = \exp(\hat{g}_i)$, where $\hat{g}_i$ are the fitted values from this regression
Finally, use $1/\hat{h}_i$ as the weights in WLS
Summary:
- Run OLS in the original model, save the residuals $\hat{u}$, square them and take logs
- Regress $\ln(\hat{u}^2)$ on all of the independent variables (plus a constant) and get the fitted values $\hat{g}$
- Do WLS using $1/\exp(\hat{g})$ as the weight
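The summary can be sketched directly in code. The data are simulated with $\mathrm{Var}(u \mid x) = \exp(0.5x)$, so the assumed exponential form is correctly specified here; the function names and parameter values are illustrative assumptions.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fgls(X, y):
    """Feasible GLS under Var(u|x) = sigma^2 * exp(x'delta); X includes a constant column."""
    u_hat = y - X @ ols(X, y)                 # step 1: OLS residuals
    log_u2 = np.log(u_hat ** 2)               #         squared and logged
    g_hat = X @ ols(X, log_u2)                # step 2: fitted values from the log regression
    h_hat = np.exp(g_hat)                     #         estimated variance function
    s = 1.0 / np.sqrt(h_hat)                  # step 3: WLS with weights 1 / h_hat,
    beta = ols(X * s[:, None], y * s)         #         i.e. OLS after dividing by sqrt(h_hat)
    return beta, h_hat

rng = np.random.default_rng(6)
n = 2000
x = rng.uniform(0, 4, n)
X = np.column_stack([np.ones(n), x])
u = rng.normal(size=n) * np.sqrt(np.exp(0.5 * x))
y = 1.0 + 2.0 * x + u
beta_fgls, h_hat = fgls(X, y)
print(beta_fgls)   # estimates of (beta0, beta1) in the original model
```

Exponentiating the fitted values guarantees positive weights, which is the reason for modelling the log of the squared residuals.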
Notes on GLS
OLS is still unbiased and consistent with heteroskedasticity (as long as MLR.1 through MLR.4 hold)
We use GLS just for efficiency (smaller variance of the estimators)
If we know the weights to use in WLS, then GLS is unbiased. Otherwise, and assuming that we estimate a correctly specified form of heteroskedasticity, FGLS (Feasible GLS) is not unbiased but is consistent and asymptotically efficient
Remember, with FGLS we are estimating the parameters of the original model. Standard errors in the transformed model are also the standard errors for the original model
We can use the t and F tests for inference
When doing F tests with WLS, form the weights from the unrestricted model and use those weights to do WLS on the restricted model as well as on the unrestricted model