Robust and Clustered Standard Errors
Molly Roberts
March 8, 2012
Molly Roberts () Robust and Clustered Standard Errors March 8, 2012 1 / 43
An Introduction to Robust and Clustered Standard Errors
Outline
1 An Introduction to Robust and Clustered Standard Errors
    Linear Regression with Non-constant Variance
    GLM’s and Non-constant Variance
    Cluster-Robust Standard Errors
2 Replicating in R
3 Why be skeptical of robust standard errors?
My Main References
- "On the So-Called ’Huber Sandwich Estimator’ and ’Robust Standard Errors’" by David Freedman
- Adam’s slides from Gov 2000
- Wooldridge, Econometric Analysis of Cross Section and Panel Data
- The Gov 2001 Code library!
An Introduction to Robust and Clustered Standard Errors Linear Regression with Non-constant Variance
Remember, from Adam’s slides...
Review: Errors and Residuals
Errors are the vertical distances between observations and the unknown Conditional Expectation Function. Therefore, they are unknown.
Residuals are the vertical distances between observations and the estimated regression function. Therefore, they are known.
Notation
Errors represent the difference between the outcome and the true mean.
\[ y = X\beta + u, \qquad u = y - X\beta \]
Residuals represent the difference between the outcome and the estimated mean.
\[ y = X\hat{\beta} + \hat{u}, \qquad \hat{u} = y - X\hat{\beta} \]
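To make the distinction concrete, here is a minimal sketch on simulated data. The coefficients, sample size, and variable names are all assumptions chosen for illustration. Because we simulate, we get to see the errors, which we never could with real data.

```r
set.seed(1)
n <- 200
x <- runif(n)
X <- cbind(1, x)              # design matrix with an intercept
beta <- c(1, 2)               # "true" coefficients, known only because we simulate
u <- rnorm(n)                 # errors: unobservable in real data
y <- drop(X %*% beta + u)

beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # least squares estimate
u_hat <- drop(y - X %*% beta_hat)           # residuals: computable from the data

# Residuals track the errors closely but are not identical to them
cor_eu <- cor(u, u_hat)
max_gap <- max(abs(u - u_hat))
```

The residuals sum to zero by construction (the model has an intercept), while the errors generally do not.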
Variance of β̂ depends on the errors
\[
\begin{aligned}
\hat{\beta} &= (X'X)^{-1} X'y \\
&= (X'X)^{-1} X'(X\beta + u) \\
&= \beta + (X'X)^{-1} X'u
\end{aligned}
\]
Variance of β̂ depends on the errors (continued)
\[
\begin{aligned}
V[\hat{\beta}] &= V[\beta] + V[(X'X)^{-1} X'u] \\
&= 0 + V[(X'X)^{-1} X'u] \\
&= E[(X'X)^{-1} X'uu'X (X'X)^{-1}] - E[(X'X)^{-1} X'u] \, E[(X'X)^{-1} X'u]' \\
&= E[(X'X)^{-1} X'uu'X (X'X)^{-1}] - 0 \\
&= (X'X)^{-1} X' E[uu'] X (X'X)^{-1} \\
&= (X'X)^{-1} X' \Sigma X (X'X)^{-1}
\end{aligned}
\]
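One way to convince yourself of the last line is a small Monte Carlo check. This is only a sketch: the design, the coefficients, and the particular error variances are assumptions made up for the example, and X is held fixed across draws.

```r
set.seed(2)
n <- 100
x <- runif(n)
X <- cbind(1, x)
beta <- c(1, 2)
sigma2 <- (0.5 + 2 * x)^2          # assumed error variances, one per observation
Sigma <- diag(sigma2)

XtX_inv <- solve(t(X) %*% X)
V_theory <- XtX_inv %*% t(X) %*% Sigma %*% X %*% XtX_inv

# Redraw u many times with X held fixed and re-estimate beta each time
draws <- replicate(5000, {
  u <- rnorm(n, sd = sqrt(sigma2))
  y <- X %*% beta + u
  drop(XtX_inv %*% t(X) %*% y)
})
V_sim <- cov(t(draws))             # empirical variance of beta-hat across draws
```

Comparing `V_sim` with `V_theory` entry by entry should show close agreement, up to simulation noise.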
Constant Error Variance and Dependence
Under the four assumptions,
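\[ u \sim N_n(0, \Sigma) \]
\[ \Sigma = \mathrm{Var}(u) = E[uu'] = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 \end{pmatrix} \]
What does this mean graphically for a CEF with one explanatory variable?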
Evidence of Non-constant Error Variance (4 examples)
[Figure: four scatterplots of y against x, with x ranging from 0.0 to 1.0.]
Notation
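The constant error variance assumption, sometimes called homoskedasticity, states that
\[ \mathrm{Var}(u) = E[uu'] = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 \end{pmatrix} \]
In this section we will allow violations of this assumption in the following heteroskedastic form.
\[ \mathrm{Var}(u) = E[uu'] = \begin{pmatrix} \sigma_1^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix} \]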
Consequences of non-constant error variance
- The estimator σ̂² will not be unbiased for σ².
- For “α” level tests, the probability of Type I error will not be α.
- “1 − α” confidence intervals will not have 1 − α coverage probability.
- The LS estimator is no longer BLUE.
However,
- The degree of the problem depends on the amount of heteroskedasticity.
- β̂ is still unbiased for β.
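A quick simulation illustrates the Type I error point. This is a sketch under assumed conditions: the true slope is zero, the error standard deviation is set to |x| (an arbitrary choice that makes the heteroskedasticity strong), and we test at the nominal 5% level using the conventional standard error.

```r
set.seed(3)
n <- 200
reps <- 2000
x <- rnorm(n)
X <- cbind(1, x)
XtX_inv <- solve(crossprod(X))

reject <- replicate(reps, {
  u <- abs(x) * rnorm(n)            # error sd grows with |x| (an assumed pattern)
  y <- 1 + 0 * x + u                # true slope is zero, so H0 holds
  b <- XtX_inv %*% crossprod(X, y)
  res <- y - X %*% b
  s2 <- sum(res^2) / (n - 2)        # conventional estimate of sigma^2
  se <- sqrt(s2 * XtX_inv[2, 2])    # conventional standard error of the slope
  abs(b[2] / se) > qt(0.975, n - 2)
})
type1 <- mean(reject)               # empirical Type I error rate at nominal 5%
```

In this setup `type1` should come out noticeably above 0.05, even though the slope estimate itself remains unbiased.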
Heteroskedasticity Consistent Estimator
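Suppose
\[ V[u] = \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix} \]
then \( \mathrm{Var}(\hat{\beta}) = (X'X)^{-1} X' \Sigma X (X'X)^{-1} \) and Huber (and then White) showed that
\[ (X'X)^{-1} X' \begin{pmatrix} \hat{u}_1^2 & 0 & 0 & \cdots & 0 \\ 0 & \hat{u}_2^2 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & \hat{u}_n^2 \end{pmatrix} X (X'X)^{-1} \]
is a consistent estimator of \( V[\hat{\beta}] \).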
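The estimator above can be computed by hand in a few lines. The sketch below uses simulated data with an assumed variance pattern and applies no small-sample correction, so it corresponds to what is often labelled HC0. All object names are illustrative.

```r
set.seed(4)
n <- 500
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.2 + 2 * x)   # error sd rises with x (assumed)
fit <- lm(y ~ x)

X <- model.matrix(fit)
u_hat <- residuals(fit)
XtX_inv <- solve(crossprod(X))

# X' diag(u_hat^2) X without forming the n x n diagonal matrix
meat <- crossprod(X * u_hat)
V_robust <- XtX_inv %*% meat %*% XtX_inv      # Huber-White (HC0) estimate
V_classic <- vcov(fit)                        # assumes constant error variance

se_table <- cbind(classic = sqrt(diag(V_classic)),
                  robust  = sqrt(diag(V_robust)))
```

Multiplying each row of `X` by its residual and taking the cross product gives the same matrix as the explicit diagonal form, but avoids building an n × n matrix.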
Things to note about this approach
1 Requires larger sample size
    - large enough for each estimate (e.g., large enough in both treatment and baseline groups or large enough in both runoff and non-runoff groups)
    - large enough for consistent estimates (e.g., need n ≥ 250 for Stata default when highly heteroskedastic (Long and Ervin 2000)).
2 Doesn’t make β̂ BLUE
3 What are you going to do with predicted probabilities?
An Introduction to Robust and Clustered Standard Errors GLM’s and Non-constant Variance
What happens when the model is not linear?
Huber (1967) developed a general way to find the standard errors for models that are specified in the wrong way.
Under certain conditions, you can get the standard errors, even if your model is misspecified.
These are the robust standard errors that scholars now use for other GLMs, and that happen to coincide with the linear case.
What does it mean for a non-linear model to have heteroskedasticity?
- Think about the probit model in the latent variable formulation.
- Pretend that there is heteroskedasticity on the linear model for y∗.
- Heteroskedasticity in the latent variable formulation will completely change the functional form of P(y = 1|x).
- What does this mean? P(y = 1|x) ≠ Φ(xβ). Your model is wrong.
- In most cases, this means your betas are no longer consistent.
- You can use a robust estimator, but if your robust estimator is very different from your regular estimator, this probably indicates there is heteroskedasticity and therefore your betas are probably off.
→ Maybe robust standard error estimation should be a diagnostic rather than a fix.
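The inconsistency is easy to see in a simulation. This is a sketch under assumed conditions: the latent coefficients are (0, 1), and in the heteroskedastic case the latent error standard deviation is set to 1 + 2|x|, which is just one convenient choice. The same probit is fit to both outcomes.

```r
set.seed(5)
n <- 20000
x <- rnorm(n)
beta <- c(0, 1)                                  # true latent-scale coefficients

# Latent variable with constant error variance: the probit model is correct
ystar_hom <- beta[1] + beta[2] * x + rnorm(n, sd = 1)
# Latent variable whose error sd depends on x (an assumed form of heteroskedasticity)
ystar_het <- beta[1] + beta[2] * x + rnorm(n, sd = 1 + 2 * abs(x))

fit_hom <- glm(as.integer(ystar_hom > 0) ~ x, family = binomial(link = "probit"))
fit_het <- glm(as.integer(ystar_het > 0) ~ x, family = binomial(link = "probit"))

slope_hom <- unname(coef(fit_hom)["x"])   # should be close to 1
slope_het <- unname(coef(fit_het)["x"])   # pulled well away from 1
```

With this much data, sampling noise is small, so the gap between `slope_het` and 1 reflects the wrong functional form rather than bad luck. No choice of standard errors repairs that.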
But first, the math
To derive robust standard errors in the general case, we assume that
\[ y \sim f_i(y \mid \theta) \]
Then our likelihood function is given by
\[ \prod_{i=1}^{n} f_i(Y_i \mid \theta) \]
and thus the log-likelihood is
\[ L(\theta) = \sum_{i=1}^{n} \log f_i(Y_i \mid \theta) \]
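We will denote the first and second partial derivatives of L to be:
\[ L'(\theta) = \sum_{i=1}^{n} g_i(Y_i \mid \theta), \qquad L''(\theta) = \sum_{i=1}^{n} h_i(Y_i \mid \theta) \]
where
\[ g_i(Y_i \mid \theta) = [\log f_i(y \mid \theta)]' = \frac{\partial}{\partial \theta} \log f_i(y \mid \theta) \]
and
\[ h_i(Y_i \mid \theta) = [\log f_i(y \mid \theta)]'' = \frac{\partial^2}{\partial \theta^2} \log f_i(y \mid \theta) \]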
This shouldn’t be too unfamiliar.
Remember, the Fisher information matrix is −Eθ[hi(Yi |θ)].
- Let’s assume the model is correct, so there is a true value θ0 for θ.
- Then we can use the second-order Taylor approximation for the log-likelihood function to estimate what the likelihood function looks like around θ0:
\[ L(\theta) \approx L(\theta_0) + L'(\theta_0)(\theta - \theta_0) + \frac{1}{2} (\theta - \theta_0)^T L''(\theta_0)(\theta - \theta_0) \]
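- We can use the Taylor approximation to approximate θ̂ − θ0 and therefore the variance covariance matrix.
- We want to find the maximum of the log-likelihood function, so we differentiate the approximation with respect to θ and set the derivative to zero at θ = θ̂:
\[ L'(\theta_0) + (\hat{\theta} - \theta_0)^T L''(\theta_0) = 0 \]
\[ \hat{\theta} - \theta_0 = [-L''(\theta_0)]^{-1} L'(\theta_0)^T \]
\[ \mathrm{Avar}(\hat{\theta}) = [-L''(\theta_0)]^{-1} [\mathrm{Cov}(L'(\theta_0))] [-L''(\theta_0)]^{-1} \]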
It’s the sandwich estimator.
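\[
\begin{aligned}
\mathrm{Avar}(\hat{\theta}) &= [-L''(\theta_0)]^{-1} [\mathrm{Cov}(L'(\theta_0))] [-L''(\theta_0)]^{-1} \\
&= \left[ -\sum_{i=1}^{n} h_i(Y_i \mid \theta) \right]^{-1} \left[ \sum_{i=1}^{n} g_i(Y_i \mid \theta)^T g_i(Y_i \mid \theta) \right] \left[ -\sum_{i=1}^{n} h_i(Y_i \mid \theta) \right]^{-1}
\end{aligned}
\]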
Bread: \( \left[ -\sum_{i=1}^{n} h_i(Y_i \mid \theta) \right]^{-1} \)
Meat: \( \left[ \sum_{i=1}^{n} g_i(Y_i \mid \theta)^T g_i(Y_i \mid \theta) \right] \)
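As a check on the claim that this coincides with the linear case, here is a short worked example. It assumes a normal linear model with θ = β, a fixed working variance σ², and x_i written as a 1 × k row vector. These conventions are chosen for the illustration and are not taken from the slides.

```latex
\[
\begin{aligned}
\log f_i(Y_i \mid \beta) &= -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(Y_i - x_i\beta)^2}{2\sigma^2} \\
g_i(Y_i \mid \beta) &= \frac{u_i \, x_i}{\sigma^2}, \qquad u_i = Y_i - x_i\beta \\
h_i(Y_i \mid \beta) &= -\frac{x_i^T x_i}{\sigma^2} \\
\text{Bread} &= \left[ \sum_{i=1}^{n} \frac{x_i^T x_i}{\sigma^2} \right]^{-1} = \sigma^2 (X'X)^{-1} \\
\text{Meat} &= \sum_{i=1}^{n} \frac{u_i^2 \, x_i^T x_i}{\sigma^4} = \frac{1}{\sigma^4} X' \,\mathrm{diag}(u_1^2, \ldots, u_n^2)\, X \\
\text{Bread} \cdot \text{Meat} \cdot \text{Bread} &= (X'X)^{-1} X' \,\mathrm{diag}(u_1^2, \ldots, u_n^2)\, X (X'X)^{-1}
\end{aligned}
\]
```

The working variance σ² cancels, and replacing each u_i with the residual û_i gives the heteroskedasticity consistent estimator from the linear regression section.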
So what does this mean for my model?
Remember we assumed the model is right?
Well, if the model is right we won’t have heteroskedasticity and therefore the robust standard errors will be very close to our non-robust standard errors.
- What if the model is wrong?
- Say we were estimating a model with the functional form fi(y |θ0) but we really should have been using a different stochastic component, ψi.
- Huber shows that if we assume there is a θ0 such that our fi(y |θ0) is the closest in terms of entropy to ψi, then our estimate will converge to θ0.
- But in the end, θ0 is really not our quantity of interest.
An Introduction to Robust and Clustered Standard Errors Cluster-Robust Standard Errors
Cluster-Robust Standard Errors
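Using cluster-robust standard errors, the meat changes.
Instead of summing over each individual, we first sum the scores within each group (cluster) c_j, for j = 1, ..., m, and then take the outer products of those group sums:
\[ \sum_{j=1}^{m} \left( \sum_{i \in c_j} g_i(Y_i \mid \theta) \right)^T \left( \sum_{i \in c_j} g_i(Y_i \mid \theta) \right) \]
Does this solve a clustering problem?
Maybe in some cases where the clusters are actually independent and the quantity of interest is constant across clusters.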
But in general, we are just fixing the variance for a model that is wrong. If our model wasn’t wrong, we wouldn’t need clustered standard errors.
Replicating in R
- There are lots of different ways to replicate these standard errors in R.
- Sometimes it’s difficult to figure out what is going on in Stata.
- But by really understanding what is going on in R, you will be able to replicate once you know the equation for Stata.
Some Data
I’m going to use data from the Gov 2001 Code library.
load("Gov2001CodeLibrary.RData")
This particular dataset is from the paper "When preferences and commitments collide: The effect of relative partisan shifts on International treaty compliance." in International Organization by Joseph Grieco, Christopher Gelpi, and Camber Warren. Thanks to Michele Margolis and Dan Altman for their contributions to the library!
The Model
First, let’s run their model:

fml <- as.formula(restrict ~ art8 + shift_left + flexible + gnpcap +
    regnorm + gdpgrow + resgdp + bopgdp + useimfcr +
    surveil + univers + resvol + totvol + tradedep + military +
    termlim + parli + lastrest + lastrest2 + lastrest3)
fit <- glm(fml, data = treaty1, family = binomial(link = "logit"))
The Meat and Bread
First recognize that the bread of the sandwich estimator is just the variance covariance matrix.

library(sandwich)
bread <- vcov(fit)

For the meat, we are going to use the estimating functions, which give the matrix of first derivatives of the log-likelihood with one row per observation:

est.fun <- estfun(fit)
The Sandwich
So we can create the sandwich
meat <- t(est.fun) %*% est.fun
sandwich <- bread %*% meat %*% bread

And put them back in our table

library(lmtest)  # coeftest() comes from the lmtest package
coeftest(fit, sandwich)
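If you don’t have the treaty data handy, the same steps can be tried on simulated data using only base R. This is a sketch, not the code from the library: the data-generating process and the object names are assumptions, and the scores are computed by hand from the logit formula (y_i − p_i) x_i instead of calling estfun.

```r
set.seed(6)
n <- 1000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-0.5 + x1 + 0.5 * x2))
fit_sim <- glm(y ~ x1 + x2, family = binomial(link = "logit"))

X <- model.matrix(fit_sim)
p_hat <- fitted(fit_sim)

# Per-observation scores for a logit: (y_i - p_i) * x_i, one row per observation
scores <- X * (y - p_hat)
bread_sim <- vcov(fit_sim)                 # inverse of the negative Hessian
meat_sim <- crossprod(scores)              # sum of outer products of the scores
sandwich_sim <- bread_sim %*% meat_sim %*% bread_sim

robust_se <- sqrt(diag(sandwich_sim))
```

Since the simulated model is correctly specified, `robust_se` should land close to the usual standard errors from `summary(fit_sim)`.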
Clustered Standard Errors
First, we have to identify our clusters:
fc <- treaty1$imf_ccode
m <- length(unique(fc))
k <- length(coef(fit))

Then, we sum the u’s by cluster

u <- estfun(fit)
u.clust <- matrix(NA, nrow = m, ncol = k)
for (j in 1:k) {
  u.clust[, j] <- tapply(u[, j], fc, sum)
}
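Last, we can make our cluster robust matrix:

# vcov(fit) is the bread; m / (m - 1) is a small-sample correction for the number of clusters
cl.vcov <- vcov(fit) %*% ((m / (m - 1)) * t(u.clust) %*% u.clust) %*% vcov(fit)

And test our coefficients

coeftest(fit, cl.vcov)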
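Here is the same calculation as a self-contained sketch on simulated clustered data, using base R only. The cluster structure, the shared cluster-level shock, and all names are assumptions for the example. `rowsum()` replaces the `tapply` loop, and the scores are again computed by hand for a logit.

```r
set.seed(7)
m <- 50                                   # number of clusters
n_per <- 20
cl <- rep(seq_len(m), each = n_per)       # hypothetical cluster identifier
a <- rnorm(m)[cl]                         # shared cluster-level shock
x <- rnorm(m)[cl] + rnorm(m * n_per)      # regressor that also varies by cluster
y <- rbinom(m * n_per, 1, plogis(0.5 * x + a))
fit_cl <- glm(y ~ x, family = binomial(link = "logit"))

X <- model.matrix(fit_cl)
scores <- X * (y - fitted(fit_cl))        # logit scores, one row per observation
scores_cl <- rowsum(scores, group = cl)   # sum the scores within each cluster

V <- vcov(fit_cl)
cl_vcov <- V %*% ((m / (m - 1)) * crossprod(scores_cl)) %*% V
hc_vcov <- V %*% crossprod(scores) %*% V  # unclustered sandwich, for comparison

se_compare <- cbind(robust = sqrt(diag(hc_vcov)), clustered = sqrt(diag(cl_vcov)))
```

Note that the only difference between `hc_vcov` and `cl_vcov` is whether the scores are summed within clusters before the cross product, plus the m / (m − 1) correction.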
A couple notes
- There are easier ways to do this in R (see, for example, hccm).
- But it’s good to know what is going on, especially when you are replicating.
- Beware: degrees of freedom corrections.
Why be skeptical of robust standard errors?
Quantities of Interest
- Robust standard errors constrain your quantities of interest.
- Say you fit a linear model, but you think there is heteroskedasticity.
- You want to simulate a quantity of interest, which means you have to simulate the y’s.
- You *could* use the robust variance matrix to simulate the betas.
- But how would you simulate fundamental uncertainty?
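The problem shows up as soon as you try to write the simulation. The sketch below uses simulated data with an assumed variance pattern and illustrative names. It draws betas from a normal distribution centred on the estimates with the robust variance matrix, which is enough for an expected value but not for a predicted value.

```r
set.seed(8)
n <- 500
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.2 + 2 * x)   # heteroskedastic errors (assumed form)
fit <- lm(y ~ x)

X <- model.matrix(fit)
XtX_inv <- solve(crossprod(X))
V_robust <- XtX_inv %*% crossprod(X * residuals(fit)) %*% XtX_inv

# Estimation uncertainty: draw betas from N(beta_hat, V_robust)
n_sims <- 1000
R <- chol(V_robust)                            # V_robust = t(R) %*% R
z <- matrix(rnorm(n_sims * 2), nrow = n_sims)
beta_sims <- sweep(z %*% R, 2, coef(fit), "+")

# The expected value of y at x = 0.9 is available from the simulated betas ...
ev_sims <- beta_sims %*% c(1, 0.9)
ev_interval <- quantile(ev_sims, c(0.025, 0.975))
# ... but a predicted value needs a draw of the error at x = 0.9, and the robust
# variance matrix says nothing about what Var(u | x = 0.9) is.
```

To go further you would have to model the error variance as a function of x, at which point you are specifying the model the robust standard errors were meant to let you avoid.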
Coefficients vs. Standard Errors
- We care more about the coefficients than the standard errors.
- If you think in theory you need robust standard errors, then your coefficients are probably off.
- In our field, we generally care about coefficients (especially their sign!) more than standard errors.
- Even more than coefficients, we care about other quantities of interest we can’t get to if we use robust SE’s.