introduction to robust and clustered standard...

Introduction to Robust and Clustered Standard Errors

Miguel Sarzosa

Department of Economics

University of Maryland

Econ626: Empirical Microeconomics, 2012

1 Standard Errors, why should you worry about them

2 Obtaining the Correct SE

3 Consequences

4 Now we go to Stata!

Regressions and what we estimateA regression does not calculate the value of a relation between twovariables. In fact, it calculates a distribution of values of suchrelation. That is why, when you calculate a regression the two mostimportant outputs you get are:

I The conditional mean of the coe�cientI The standard deviation of the distribution of that coe�cient

Are we in the presence of a star?

The standard errors determine how accurate is your estimation.Therefore, it a�ects the hypothesis testing.That is why the standard errors are so important: they are crucial indetermining how many stars your table gets.And like in any business, in economics, the stars matter a lot.Hence, obtaining the correct SE, is critical

SE and the data

The correct SE estimation procedure is given by the underlying structureof the data.

It is very unlikely that all observations in a data set are unrelated, butdrawn from identical distributionsFor instance, the variance of income is often grater in familiesbelonging to top deciles than among poorer familiesSome phenomena do not a�ect observations individually, but theya�ect groups of observations uniformly within each group.

SE and the data

The correct SE estimation procedure is given by the underlying structureof the data.

It is very unlikely that all observations in a data set are unrelated, butdrawn from identical distributionsFor instance, the variance of income is often grater in familiesbelonging to top deciles than among poorer families(heteroskedasticity)

Some phenomena do not a�ect observations individually, but theya�ect groups of observations uniformly within each group. (clustered

data)

The Basic Case (i.i.d.)

y

i

= x

0i

b + ei

ei

⇠�0,s2

�

This means that E

hee 0 |X

i= ⌦= s2

I

We know b =hX

0X

i�1

X

0y = b +

hX

0X

i�1

X

0e and

Var (b ) = E

hX

0X

i�1

X

0ee 0X

hX

0X

i�1

�=

hX

0X

i�1

E

hX

0ee 0X

ihX

0X

i�1

= s2

hX

0X

i�1

Heteroskedasticity (i.n.i.d)

y

i

= x

0i

b + ei

ei

⇠�0,s2

i

�

This means that E

hee 0 |X

i= ⌦= diag

�s2

i

�(big di�erence)

Heteroskedasticity (i.n.i.d)

Now

Var (b )=E

hX

0X

i�1

X

0ee 0X

hX

0X

i�1

�=hX

0X

i�1

E

hX

0ee 0X

ihX

0X

i�1

No further simplification is possibleNeed to estimate E

hX

0ee 0X

i= ÂN

i=1

e2

i

x

i

x

0i

Be aware that ÂN

i=1

e2

i

x

i

x

0i

6= X

0^e^eX

Then the Huber-Eicker-White (HEW) VC estimator is:

Var

⇣b⌘=hX

0X

i�1

"N

Âi=1

e2

i

x

i

x

0i

#hX

0X

i�1

(1)

Two-step procedure: 1. Get ei

, 2. Estimate (1). In Stata:vce(robust)

Clustered Data

Observations are related with each other within certain groups

ExampleYou want to regress kids’ grades on class size.The unobservables of kids belonging to the same classroom will becorrelated (e.g., teachers’ quality, recess routines) while will not becorrelated with kids in far away classrooms.

This means, we are assuming independence across clusters butcorrelation within clusters. That is:

y

ig

= x

0ig

b + eig

where g = 1, . . . ,G

E

he

ig

ejg

0

i=

(0 if g = g

0

s(ij)g if g 6= g

0 (2)

Clustered Data

Let us stack observations by cluster

y

g

= x

0g

b + eg

where g = 1, . . . ,G

The OLS estimator of b is still b =hX

0X

i�1

X

0y. (We just stacked

the data)The variance is given by

Var (b ) = E

hX

0X

i�1

X

0⌦X

hX

0X

i�1

�

Clustered Data

By (2), we know that

⌦= diag (⌃g

)

=

2

666666666666666666666664

s2

(11)1 . . . s(1N

1

)1 0 . . . 0 . . . 0 . . . 0

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

s2

(N1

1)1 s(N1

N

1

)1 0 0 0 0

0 . . . 0 s2

(11)2 . . . s(1N

2

)2

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

0 . . . 0 s(N2

1)2 . . . s2

(N2

N

2

)2

.

.

.

.

.

.

.

.

.

s2

(11)g . . . s(1N

g

)g

.

.

.

.

.

.

.

.

.

s(Ng

1)g . . . s2

(Ng

N

g

)g

3

777777777777777777777775

Clustered Data

With this in mind, we can now write the VC matrix for clustered data

Var

⇣b⌘=

hX

0X

i�1

"G

Âi=1

x

0g

eg

e 0g

x

g

#hX

0X

i�1

In Stata: vce(cluster clustvar ). Where clustvar is a variablethat identifies the groups in which onobservables are allowed tocorrelate.

The importance of knowing your data

In real world you should never go with the i.i.d. case. Life is not thatsimpleYou need to know your data in order to choose the correct errorstructure and then infer the required SE calculationIf you have aggregate variables (like class size), clustering at that levelis required.You might think your data correlates in more than one way

I If nested (e.g., classroom and school district), you should cluster at thehighest level of aggregation

I If not nested (e.g., time and space), you can:1

Include fixed-e�ects in one dimension and cluster in the other one.

2

Multi-way clustering extension (see Cameron, Gelbach and Miller, 2006)

How much will my SE increase?

Clustered SE will increase your confidence intervals because you areallowing for correlation between observations. Hence, less stars in yourtables. The higher the clustering level, the larger the resulting SE.For one regressor the clustered SE inflate the default (i.i.d.) SE by

q1+r

x

re�N �1

�

were rx

is the within-cluster correlation of the regressor, re is thewithin-cluster error correlation and N is the average cluster size.Here it is easy to see the importance of clustering when you haveaggregate regressors (i.e., r

x

= 1).

Now we go to Stata!

introduction to robust and clustered standard...

Documents