
Lecture Series 1: Linear Random and Fixed Effect Models and Their (Less) Recent Extensions

Stefanie [email protected]

RMIT University, School of Economics, Finance, and Marketing

January 21, 2014


Overview

1 Recap: Linear model set-up, random effects estimation and fixed effects estimation;

2 Relationship between random and fixed (and between) effects estimators;

3 Is fixed effects estimation always preferable to random effects estimation?;

4 Hausman-Taylor (1981) approach to estimating coefficients on both time-varying and time-invariant variables;

5 Correlated random effects (CRE): a flexible extension to random effect models to relax the orthogonality condition;

6 Pluemper and Troeger's Fixed Effects Vector Decomposition Approach and Rule of Thumb;

7 Application: Estimating the effects of health on wages.

References for Lecture 1

1 Greene, W.H. (2011). Econometric Analysis. Pearson Education Limited. 399-438.

2 Wooldridge, J. (2009). Econometric Analysis of Cross Section and Panel Data. The MIT Press. 285-382, 345-361.

3 Hsiao, C. (2003). Analysis of Panel Data. Econometric Society Monographs. CUP: New York. 27-44.

4 Mundlak, Y. (1978). On the Pooling of Time Series and Cross-section Data. Econometrica 46: 69-85.

5 Hausman, J., Taylor, W.E. (1981). Panel Data and Unobservable Individual Effects. Econometrica 49: 1377-1398.

6 Pluemper, T., Troeger, V. (2007). Efficient estimation of time-invariant and rarely changing variables in finite sample panel analyses with unit fixed effects. Political Analysis 15: 124-139.

7 Contoyannis, P., Rice, N. (2001). The impact of health on wages. Evidence from the BHPS. Empirical Economics 26: 599-622.

1. Recap: Linear model set-up


Heterogeneous intercept models

Consider the following linear regression model, which allows for individual-specific heterogeneity $\alpha_i$:

$$Y_{it} = X_{it}'\beta + \varepsilon_{it}, \qquad (1)$$

for all $i = 1, \dots, N$ and $t = 1, \dots, T$, with

$$\varepsilon_{it} = \alpha_i + u_{it}. \qquad (2)$$

• $Y_{it}$ is some outcome of interest;

• $X_{it}$ is a vector of covariates $(X_{it1}, \dots, X_{itK})'$ and generally includes a constant term, i.e. $X_{it1} = 1$ for all $i$ and $t$. It may also include time-invariant variables such as $X_i$.

• The unobserved errors consist of two components: $\alpha_i$ (constant across time) and $u_{it}$, an idiosyncratic error term that varies across individuals and time, with $u_{it} \sim iid(0, \sigma_u^2)$ and $E(u_{it} | \alpha_i, X_{i1}, \dots, X_{iT}) = 0$.

The model in matrix notation

The $NT$ observations are ordered first by unit $i$ and then by period $t$, such that:

$$Y = X\beta + \varepsilon. \qquad (3)$$

The dimensions are:

• $Y$: $NT \times 1$ vector of the $Y_{it}$'s;

• $X$: $NT \times K$ matrix whose rows are the $X_{it}'$ (with elements $X_{itk}$);

• $\varepsilon$: $NT \times 1$ vector of the $\varepsilon_{it}$'s.

OLS estimator in matrix form

The OLS estimator for $\beta$ is:

$$\hat\beta_{OLS} = (X'X)^{-1}X'Y \qquad (4)$$

Our focus here is how to estimate this model under different assumptions about the individual-specific heterogeneity $\alpha_i$. Early discussions in the literature were concerned with whether $\alpha_i$ should be treated as a random variable (which would add an error term) or as a fixed parameter to be estimated for each cross-sectional group.

More modern approaches to panel data econometrics are more concerned with the question of whether $\alpha_i$ is correlated with the explanatory variables of interest (e.g. Wooldridge, 2009, p. 285-286).
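The stacking convention and Eq. 4 can be illustrated with a minimal simulation sketch. The sample sizes, coefficient values and normal distributions below are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 500, 4                        # hypothetical panel dimensions
beta = np.array([1.0, 2.0])          # hypothetical (constant, slope)

alpha = rng.normal(size=N)           # individual effect alpha_i
x = rng.normal(size=(N, T))          # one time-varying covariate
u = rng.normal(size=(N, T))          # idiosyncratic error u_it

# Stack unit by unit: the first T rows belong to i = 1, the next T to i = 2, ...
X = np.column_stack([np.ones(N * T), x.reshape(-1)])   # NT x K, constant first
eps = (alpha[:, None] + u).reshape(-1)                 # eps_it = alpha_i + u_it
Y = X @ beta + eps                                     # NT x 1

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)           # (X'X)^{-1} X'Y
print(beta_ols)
```

Here $\alpha_i$ is drawn independently of the covariate, so OLS should land close to the assumed coefficients.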


Random versus fixed effect models

We will examine the implications for OLS estimation under the alternative assumptions that:

1 $\alpha_i$ is uncorrelated with $X_{it}$ for all $t = 1, \dots, T$ (referred to as the "random effects model");

2 $\alpha_i$ is allowed to correlate arbitrarily with $X_{it}$ for all $t = 1, \dots, T$ (referred to as the "fixed effects model");

3 $\alpha_i$ is assumed to depend linearly on $X_{it}$ (referred to as the "correlated random effects model").

We will consider the suitable estimators in each case.

Random effect models

In the random effect model we assume $Cov(X_{it}, \alpha_i) = 0$ for $t = 1, \dots, T$, or the stronger assumption of zero conditional expectation, i.e. $E(\alpha_i | X_{i1}, \dots, X_{iT}) = 0$. In this scenario, using OLS will yield unbiased parameter estimates, but wrong standard errors and thus unreliable statistical inference. Let's take a look at why.

Consider the properties of the OLS estimator:

$$E(\hat\beta_{OLS} | X) = E\{(X'X)^{-1}X'Y | X\} \qquad (5)$$

$$= E\{(X'X)^{-1}X'(X\beta + \varepsilon) | X\} \qquad (6)$$

$$= \beta + (X'X)^{-1}X'E\{\varepsilon | X\} \qquad (7)$$

$$= \beta \qquad (8)$$

Random effect models

Now think of the sampling properties of the OLS estimator:

$$Var(\hat\beta_{OLS} | X) = Var\{(X'X)^{-1}X'Y | X\} \qquad (9)$$

$$= Var\{(X'X)^{-1}X'(X\beta + \varepsilon) | X\} \qquad (10)$$

$$= Var\{\beta + (X'X)^{-1}X'\varepsilon | X\} \qquad (11)$$

$$= (X'X)^{-1}X'\,Var\{\varepsilon | X\}\,X(X'X)^{-1} \qquad (12)$$

Recall that the OLS assumption about $\varepsilon$ is that $\varepsilon_{it} \sim iid(0, \sigma^2)$, and so:

$$Var(\hat\beta_{OLS} | X) = \sigma^2(X'X)^{-1}, \qquad (13)$$

where $\sigma^2$ is replaced by an estimate, typically the sample variance of the regression residuals:

$$s^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} e_{it}^2. \qquad (14)$$


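But what's wrong with the variance when we allow for unobserved heterogeneity? Because $\varepsilon_{it} = \alpha_i + u_{it}$, the assumption of independent errors across observations fails. In particular, if $\alpha_i \sim N(0, \sigma_\alpha^2)$ and $u_{it} \sim iid(0, \sigma_u^2)$, where $\sigma_\varepsilon^2 = \sigma_u^2 + \sigma_\alpha^2$, then the variance-covariance matrix of $\varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2}, \dots, \varepsilon_{iT})'$ is:

$$Var(\varepsilon_i | X_i) = \begin{pmatrix} \sigma_u^2 + \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_u^2 + \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_u^2 + \sigma_\alpha^2 \end{pmatrix} = \sigma_u^2 I_{T \times T} + \sigma_\alpha^2\, i_{T \times 1} i_{1 \times T}' = \Sigma$$

($i$ is a vector of ones).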


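Since the observations on different units $i$ and $j$ are independent, the disturbance covariance matrix for the full $NT$ observations is:

$$\Omega = \begin{pmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{pmatrix}_{NT \times NT} = I_{N \times N} \otimes \Sigma_{T \times T}$$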
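As a small numerical sketch of this structure, the block $\Sigma$ and the full matrix $\Omega$ can be built with a Kronecker product. The variance components and dimensions below are arbitrary illustrative values.

```python
import numpy as np

N, T = 2, 3
sigma_u2, sigma_a2 = 1.0, 0.5        # hypothetical variance components

ones = np.ones((T, 1))               # the T x 1 vector i
Sigma = sigma_u2 * np.eye(T) + sigma_a2 * (ones @ ones.T)   # T x T block
Omega = np.kron(np.eye(N), Sigma)                           # NT x NT, block diagonal

print(Sigma)
print(Omega.shape)
```

The off-diagonal blocks of `Omega` are zero because errors of different units are assumed independent.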



There are two ways to fix the wrong standard errors implied by cross-sectional unobserved heterogeneity when using OLS:

1 Correcting the OLS standard errors: robust covariance matrix estimation; estimate the model with OLS, then adjust the standard errors ex post.

2 Random effects estimation: obtain a more efficient estimator of $\beta$ using generalised least squares. Transform the data first, then use OLS on the transformed data - this approach is similar to (feasible) GLS when controlling for e.g. heteroskedasticity.

1. Correcting OLS standard errors ex post

• Note that $Var(\hat\beta_{OLS} | X) = (X'X)^{-1}X'\,Var\{\varepsilon | X\}\,X(X'X)^{-1}$ depends on $Var(\varepsilon | X)$, which is an $NT \times NT$ matrix with a block-diagonal structure;

• Each of the $N$ cross-sectional groups contributes one $T \times T$ diagonal block, corresponding to $Var(\varepsilon_i | X)$;

• Off these diagonal blocks the matrix has zeros, due to the assumed independence of the cross-sectional sample;

• Thus we can correct the OLS standard errors by replacing $Var(\varepsilon | X)$ with a suitable estimate from the sample data.

1. Correcting OLS standard errors ex post

Suitable estimators are:

• Estimate $\sigma_\alpha^2$ by:

$$s_\alpha^2 = \left[\frac{NT(T-1)}{2}\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T-1}\sum_{s=t+1}^{T} e_{it}e_{is} \qquad (15)$$

• Estimate $\sigma_\varepsilon^2$ by:

$$s^2 = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} e_{it}^2 \qquad (16)$$

• This approach is nothing other than robust covariance matrix estimation (see p. 390 in Greene, 2012).
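A minimal sketch of this ex-post correction, under assumed simulation settings (true variance components equal to one, a single covariate): Eqs. 15 and 16 are computed from OLS residuals and plugged into the sandwich formula of Eq. 12. The names `V_corrected` and `V_naive` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 400, 5
alpha = rng.normal(size=N)                       # sigma_alpha^2 = 1 (assumed)
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))   # sigma_u^2 = 1

X = np.stack([np.ones((N, T)), x], axis=2)       # N x T x K
Xs, Ys = X.reshape(N * T, -1), y.reshape(-1)
b = np.linalg.solve(Xs.T @ Xs, Xs.T @ Ys)
e = (Ys - Xs @ b).reshape(N, T)                  # OLS residuals e_it

s2 = (e ** 2).sum() / (N * T)                    # Eq. 16
# sum over t < s of e_it e_is = ((sum_t e_it)^2 - sum_t e_it^2) / 2
cross = ((e.sum(axis=1) ** 2 - (e ** 2).sum(axis=1)) / 2).sum()
s2_alpha = cross / (N * T * (T - 1) / 2)         # Eq. 15

# Plug the estimated T x T block into the sandwich formula (Eq. 12)
Sigma_hat = (s2 - s2_alpha) * np.eye(T) + s2_alpha * np.ones((T, T))
bread = np.linalg.inv(Xs.T @ Xs)
meat = sum(X[i].T @ Sigma_hat @ X[i] for i in range(N))
V_corrected = bread @ meat @ bread
V_naive = s2 * bread                             # Eq. 13, ignores alpha_i
```

In this design the naive variance of the intercept is too small, because the constant is perfectly correlated within each unit.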


Random effects (or GLS) estimation

• We want to transform the data in such a way that the variance of the transformed errors is equal to the identity matrix, i.e.

$$Var(\tilde\varepsilon) = \Gamma\,Var(\varepsilon)\,\Gamma' = I_{NT} \qquad (17)$$

• A good candidate for transforming the data for each individual is $\Sigma^{-1/2}$ - hence, if we find this term, we can pre-multiply $Y_i$, $X_i$ and $\varepsilon_i$ by $\Sigma^{-1/2}$ (or, in terms of matrix notation, pre-multiply $Y$, $X$ and $\varepsilon$ by $\Omega^{-1/2}$).

• See the derivations of $\Sigma^{-1/2}$ on the blackboard;

• The final result is:

$$\Sigma^{-1/2} = \frac{1}{\sigma_u}\left[I - \frac{\theta}{T}\, i_{T \times 1} i_{1 \times T}'\right], \qquad (18)$$

where

$$\theta = 1 - \frac{\sigma_u}{\sqrt{\sigma_u^2 + T\sigma_\alpha^2}}. \qquad (19)$$
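The result in Eqs. 18 and 19 can be checked numerically. The sketch below uses arbitrary illustrative variance components and verifies that the transformed covariance is the identity.

```python
import numpy as np

T = 4
sigma_u2, sigma_a2 = 1.0, 2.0                    # hypothetical variance components
J = np.ones((T, T))                              # i i'
Sigma = sigma_u2 * np.eye(T) + sigma_a2 * J

theta = 1 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))        # Eq. 19
S_inv_half = (np.eye(T) - theta / T * J) / np.sqrt(sigma_u2)     # Eq. 18

# Gamma Var(eps_i) Gamma' with Gamma = Sigma^{-1/2}
check = S_inv_half @ Sigma @ S_inv_half.T
print(np.allclose(check, np.eye(T)))
```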


Random effects estimation

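Consider the following transformation of our benchmark linear regression model:

$$\Omega^{-1/2}Y = \Omega^{-1/2}X\beta + \Omega^{-1/2}\varepsilon, \qquad (20)$$

or

$$\tilde Y = \tilde X\beta + \tilde\varepsilon, \qquad (21)$$

where, for instance (up to the scalar factor $1/\sigma_u$, which does not affect the estimator):

$$\Sigma^{-1/2}Y_i = \begin{pmatrix} Y_{i1} - \theta\bar Y_i \\ Y_{i2} - \theta\bar Y_i \\ \vdots \\ Y_{iT} - \theta\bar Y_i \end{pmatrix}, \qquad \Sigma^{-1/2}X_i = \begin{pmatrix} X_{i1} - \theta\bar X_i \\ X_{i2} - \theta\bar X_i \\ \vdots \\ X_{iT} - \theta\bar X_i \end{pmatrix}.$$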

Random effects estimation

We can show that the transformed errors $\tilde\varepsilon$ have the property that $Var(\tilde\varepsilon) = \Gamma\,Var(\varepsilon)\,\Gamma' = I_{NT}$ (try to do this at home). Thus, feasible GLS regression based on this transformation satisfies the necessary assumptions for efficient estimation of $\beta$, and is referred to as the random effects estimator:

$$\hat\beta_{RE} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde Y = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y. \qquad (22)$$

The variance of this estimator is (Homework: check whether you can derive all steps by yourself - we will talk about it in class next week):

$$Var(\hat\beta_{RE} | X) = Var\{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y | X\} \qquad (23)$$

$$= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}X(X'\Omega^{-1}X)^{-1} \qquad (24)$$

$$= (X'\Omega^{-1}X)^{-1} \qquad (25)$$
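The two expressions in Eq. 22 can be compared in a short sketch. As a simplifying assumption, the variance components are treated as known here, whereas feasible GLS would estimate them first. The helper `quasi_demean` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 50, 4
sigma_u2, sigma_a2 = 1.0, 1.5                    # treated as known (assumption)
alpha = rng.normal(scale=np.sqrt(sigma_a2), size=N)
x = rng.normal(size=(N, T))
u = rng.normal(scale=np.sqrt(sigma_u2), size=(N, T))
y = 1.0 + 2.0 * x + alpha[:, None] + u

theta = 1 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))

def quasi_demean(a):
    """Subtract theta times the unit mean from each unit's T observations."""
    return (a - theta * a.mean(axis=1, keepdims=True)).reshape(-1)

# OLS on the transformed data (the constant becomes 1 - theta)
X_t = np.column_stack([quasi_demean(np.ones((N, T))), quasi_demean(x)])
Y_t = quasi_demean(y)
beta_re = np.linalg.solve(X_t.T @ X_t, X_t.T @ Y_t)

# Direct GLS with the full NT x NT covariance matrix, for comparison
Sigma = sigma_u2 * np.eye(T) + sigma_a2 * np.ones((T, T))
Omega_inv = np.kron(np.eye(N), np.linalg.inv(Sigma))
X = np.column_stack([np.ones(N * T), x.reshape(-1)])
Y = y.reshape(-1)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ Y)
```

The quasi-demeaning route avoids forming the $NT \times NT$ matrix, which matters once $N$ is large.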


Fixed effect models

In the fixed effect model we allow for the possibility that $Cov(X_{it}, \alpha_i) \neq 0$. In this case, the OLS estimator $\hat\beta_{OLS}$ will be biased and inconsistent. This is so because:

$$E(\hat\beta_{OLS} | X) = E\{(X'X)^{-1}X'Y | X\} \qquad (26)$$

$$= E\{(X'X)^{-1}X'(X\beta + \varepsilon) | X\} \qquad (27)$$

$$= \beta + (X'X)^{-1}X'E\{\varepsilon | X\} \qquad (28)$$

$$\neq \beta, \qquad (29)$$

where the last inequality stems from the fact that $E(\alpha_i | X_{it}) \neq 0$.

Fixed effect models

There are two solutions to the problem. Re-consider the original model:

$$Y_{it} = X_{it}'\beta + \varepsilon_{it}, \qquad (30)$$

for all $i = 1, \dots, N$ and $t = 1, \dots, T$, with

$$\varepsilon_{it} = \alpha_i + u_{it}. \qquad (31)$$

1 Within-group fixed effects: subtract the within-group means from the original regression equation that combines Eqs. 30 and 31.

2 First-differences between two adjacent time periods.

Within-group fixed effects

Construct the within-group average of the benchmark linear regression model:

$$\bar Y_i = \bar X_i'\beta + \alpha_i + \bar u_i, \qquad (32)$$

where $\bar Y_i = T^{-1}\sum_{t=1}^{T} Y_{it}$, $\bar X_i = T^{-1}\sum_{t=1}^{T} X_{it}$, and $\bar u_i = T^{-1}\sum_{t=1}^{T} u_{it}$. Then subtract Eq. 32 from the combined Eqs. 30 and 31:

$$Y_{it} - \bar Y_i = (X_{it} - \bar X_i)'\beta + (u_{it} - \bar u_i) + (\alpha_i - \alpha_i), \qquad (33)$$

where the last term is zero. And so the within-group fixed effects estimator is:

$$\hat\beta_{FE} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar X_i)(X_{it} - \bar X_i)'\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar X_i)(Y_{it} - \bar Y_i) \qquad (34)$$
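A minimal sketch of Eq. 34 with a single covariate, under an assumed design in which the covariate is correlated with $\alpha_i$. The pooled OLS slope is computed only for contrast.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 300, 5
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T)) + alpha[:, None]     # covariate correlated with alpha_i
y = 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))   # true slope 2 (assumed)

xd = (x - x.mean(axis=1, keepdims=True)).reshape(-1)     # X_it - Xbar_i
yd = (y - y.mean(axis=1, keepdims=True)).reshape(-1)     # Y_it - Ybar_i
beta_fe = (xd @ yd) / (xd @ xd)                          # Eq. 34 with K = 1

xs, ys = x.reshape(-1), y.reshape(-1)
xc, yc = xs - xs.mean(), ys - ys.mean()
beta_pooled = (xc @ yc) / (xc @ xc)                      # pooled OLS slope
print(beta_fe, beta_pooled)
```

In this design the pooled slope absorbs part of $\alpha_i$ and is pushed above the within estimate.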



Within-group fixed effects

• In contrast to the random effects or GLS procedure, which uses both within-group (across time) and between-group (across cross-sectional units) variation to estimate $\beta$, the within-group fixed effect approach uses only the within-group variation.

• Any time-invariant observable characteristics will also difference out, so that their coefficients cannot be identified (unless they are interacted with time-varying variables).

• $N$ degrees of freedom will be lost, since this approach estimates the group sample means (one for each group).

• Even though the transformed errors in Eq. 33, $(u_{it} - \bar u_i)$, are non-classical (which means what?), the OLS standard errors from the fixed effects regression are correct, provided the degrees of freedom are adjusted for the $N$ estimated group means.

Pros and cons

• The first differences approach is easy to implement manually, and keeping track of the correct number of degrees of freedom is more straightforward.

• If the model is correctly specified and if there is no serial correlation, then within-group fixed effect estimation is more efficient than first differences.

• The relative efficiency of the two estimators depends on the degree of serial correlation in the idiosyncratic errors ($Cov(u_{it}, u_{is})$, for $t \neq s$). (Why?)
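The two estimators can be put side by side in a short sketch with one covariate. The simulation design and the helper names `fe_and_fd` and `simulate` are illustrative assumptions. With $T = 2$ the two estimates coincide exactly; with more periods they differ, but both remain close to the assumed slope.

```python
import numpy as np

def fe_and_fd(x, y):
    """Slope from within-group demeaning and from first differences (K = 1)."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)      # adjacent-period differences
    return (xd * yd).sum() / (xd ** 2).sum(), (dx * dy).sum() / (dx ** 2).sum()

rng = np.random.default_rng(4)
N = 500
alpha = rng.normal(size=N)

def simulate(T):
    """Panel with covariate correlated with alpha_i and true slope 2 (assumed)."""
    x = rng.normal(size=(N, T)) + alpha[:, None]
    y = 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))
    return x, y

fe2, fd2 = fe_and_fd(*simulate(2))
fe6, fd6 = fe_and_fd(*simulate(6))
print(fe2, fd2, fe6, fd6)
```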


2. Relationship between random and fixed (and between) effect estimators

Some transformations

Consider the following transformations:

• Group-means transformation: $P = I_{N \times N} \otimes T^{-1} i_{T \times 1} i_{1 \times T}'$, where $I_{N \times N}$ is the identity matrix of dimension $N \times N$, $\otimes$ is the Kronecker product, and $i_{T \times 1}$ is a $T \times 1$ vector of 1's.

• Deviations from group means: $Q = I_{NT \times NT} - P$

Some transformations, cont.

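$P$ and $Q$ have the effect of transforming the data to group means and to deviations from group means, respectively:

$$PY = \begin{pmatrix} \bar Y_1 \\ \vdots \\ \bar Y_1 \\ \bar Y_2 \\ \vdots \\ \bar Y_2 \\ \vdots \\ \bar Y_N \\ \vdots \\ \bar Y_N \end{pmatrix}, \qquad QY = \begin{pmatrix} Y_{11} - \bar Y_1 \\ \vdots \\ Y_{1T} - \bar Y_1 \\ Y_{21} - \bar Y_2 \\ \vdots \\ Y_{2T} - \bar Y_2 \\ \vdots \\ Y_{N1} - \bar Y_N \\ \vdots \\ Y_{NT} - \bar Y_N \end{pmatrix},$$

and so on.

Note that $P$ and $Q$ are idempotent ($P^2 = P$, $Q^2 = Q$) and orthogonal ($PQ = 0$).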
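These operators are easy to construct for a toy panel. The sketch below uses small, arbitrary dimensions.

```python
import numpy as np

N, T = 3, 4
i_T = np.ones((T, 1))
P = np.kron(np.eye(N), i_T @ i_T.T / T)     # group-means operator
Q = np.eye(N * T) - P                       # deviations-from-group-means operator

Y = np.arange(N * T, dtype=float)           # toy data, ordered unit by unit
print(P @ Y)                                # each unit's mean, repeated T times
print(Q @ Y)                                # deviations from the unit mean
```

Forming $P$ and $Q$ explicitly is only practical for small $NT$. With real data, group means are usually computed directly.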


Fixed and random effects similarity

Hence, the fixed effects estimator of $\beta$ can be expressed in more compact notation:

$$\hat\beta_{FE} = (X'QX)^{-1}X'QY \qquad (36)$$

The random effect transformation described above is a partial deviation from group means:

$$\tilde Y_{it} = Y_{it} - \theta\bar Y_i, \qquad (37)$$

and

$$\tilde X_{it} = X_{it} - \theta\bar X_i, \qquad (38)$$

where $\theta = 1 - \left(\dfrac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}\right)^{1/2}$.

Fixed and random effects similarity

The partial deviations framework provides an optimal use of the within-group and the between-group variation. Note that the larger the between-group fraction of total variation (i.e. $\sigma_\alpha^2$ relative to $\sigma_u^2$) and/or the larger $T$, the greater $\theta$ will be (closer to 1), and the more weight is given to within-group, compared to between-group, variation.

• Suppose $T = 3$ and $\sigma_\alpha^2 = 0$, then $\theta = 0$ and the full variation in the data is used, compared to $\theta = 0.5$ if $\sigma_\alpha^2 = \sigma_u^2$;

• Alternatively, suppose $\sigma_\alpha^2 = \sigma_u^2$, then if $T = 3$, $\theta = 0.5$, compared to $\theta = 0.75$ if $T = 15$.
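The numbers in these two examples follow directly from the formula for $\theta$. A small sketch, with a hypothetical helper name `theta`:

```python
import math

def theta(T, sigma_u2, sigma_a2):
    """Partial-demeaning weight: 1 - sqrt(sigma_u^2 / (sigma_u^2 + T sigma_alpha^2))."""
    return 1 - math.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))

print(theta(3, 1.0, 0.0))     # no unobserved heterogeneity
print(theta(3, 1.0, 1.0))     # equal variance components, T = 3
print(theta(15, 1.0, 1.0))    # equal variance components, T = 15
```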


Random effects as weighted average

The random effect estimator can be thought of as a weighted average of the within-group estimator $\hat\beta_{FE}$ and the between-group estimator $\hat\beta_{BE}$, which is based on the group-means data:

$$\hat\beta_{RE} = \delta_{k \times k}\hat\beta_{FE} + (I_{k \times k} - \delta_{k \times k})\hat\beta_{BE}, \qquad (39)$$

$$\hat\beta_{BE} = \left[\sum_{i=1}^{N} T(\bar X_i - \bar X)(\bar X_i - \bar X)'\right]^{-1}\sum_{i=1}^{N} T(\bar X_i - \bar X)(\bar Y_i - \bar Y) = (X'PX)^{-1}X'PY$$

Random effects as weighted average

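$$\delta = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar X_i)(X_{it} - \bar X_i)' + \lambda\sum_{i=1}^{N} T(\bar X_i - \bar X)(\bar X_i - \bar X)'\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar X_i)(X_{it} - \bar X_i)'$$

$$\lambda = \frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2} = (1 - \theta)^2. \qquad (40)$$

• If $\lambda = 0$, then FE and RE are equivalent (a lot of weight is given to within-group variation).

• If $\lambda = 1$, a lot of weight is given to between-group variation.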

• However, 0 < λ
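The weighted-average representation in Eqs. 39 and 40 can be checked numerically for a single covariate. The sketch below treats the variance components as known (an assumption) and compares the weighted average with the slope obtained from OLS on quasi-demeaned data.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 100, 4
sigma_u2, sigma_a2 = 1.0, 1.0                    # treated as known (assumption)
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T)) + rng.normal(size=(N, 1))    # within and between variation
y = 1.0 + 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))

xb, yb = x.mean(axis=1), y.mean(axis=1)
W_xx = ((x - xb[:, None]) ** 2).sum()                    # within sums
W_xy = ((x - xb[:, None]) * (y - yb[:, None])).sum()
B_xx = T * ((xb - x.mean()) ** 2).sum()                  # between sums
B_xy = T * ((xb - x.mean()) * (yb - y.mean())).sum()

beta_fe, beta_be = W_xy / W_xx, B_xy / B_xx
lam = sigma_u2 / (sigma_u2 + T * sigma_a2)               # Eq. 40
delta = W_xx / (W_xx + lam * B_xx)
beta_weighted = delta * beta_fe + (1 - delta) * beta_be  # Eq. 39

# RE slope from OLS on quasi-demeaned data, for comparison
theta = 1 - np.sqrt(lam)
xt = (x - theta * xb[:, None]).reshape(-1)
yt = (y - theta * yb[:, None]).reshape(-1)
Xt = np.column_stack([np.full(N * T, 1 - theta), xt])
beta_re = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)[1]
print(beta_weighted, beta_re)
```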



Summary

• If $E(\alpha_i | X_{it}) = 0$:

• Both $\hat\beta_{RE}$ and $\hat\beta_{FE}$ are consistent for $\beta$ (and so would be OLS).

• $\hat\beta_{RE}$ is efficient; $\hat\beta_{OLS}$ has biased standard errors.

• If $E(\alpha_i | X_{it}) \neq 0$:

• $\hat\beta_{RE}$ is inconsistent for $\beta$.

• $\hat\beta_{FE}$ is consistent for $\beta$.

Testing

The efficiency/consistency trade-off between $\hat\beta_{RE}$ and $\hat\beta_{FE}$ suggests a method to test the random effects restriction. One of these tests is the Hausman test. Under the null hypothesis, $H_0: E(\alpha_i | X_{it}) = 0$, $\hat\beta_{RE}$ is efficient, but it is inconsistent under the alternative hypothesis ($H_a: E(\alpha_i | X_{it}) \neq 0$). In contrast, $\hat\beta_{FE}$ is consistent under both $H_0$ and $H_a$.

The Hausman test statistic for this test is:

$$H = (\hat\beta_{FE} - \hat\beta_{RE})'\{Var(\hat\beta_{FE} - \hat\beta_{RE})\}^{-1}(\hat\beta_{FE} - \hat\beta_{RE}), \qquad (41)$$

where $Var(\hat\beta_{FE} - \hat\beta_{RE}) = Var(\hat\beta_{FE}) - Var(\hat\beta_{RE})$ is the variance-covariance matrix of the difference between the fixed effects and random effects estimators.

Testing

Under the null hypothesis, the Hausman test statistic has a $\chi^2$ distribution with degrees of freedom equal to the dimension of $\beta$, i.e.:

$$H \sim \chi^2_k \qquad (42)$$

Note: Since the fixed effects estimation method can only identify coefficients on time-variant variables, the relevant dimension of $\beta$ is the number of time-varying variable coefficients.
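A minimal sketch of the test for a single time-varying regressor ($k = 1$). As simplifying assumptions, the variance components are taken as known and the RE slope is computed via the weighted-average form of Eqs. 39 and 40. The function name `hausman_scalar` is hypothetical. For one degree of freedom the $\chi^2$ upper tail can be written with the complementary error function.

```python
import math
import numpy as np

def hausman_scalar(x, y, sigma_u2, sigma_a2):
    """Hausman statistic and p-value for one regressor, variance components known."""
    N, T = x.shape
    xb, yb = x.mean(axis=1), y.mean(axis=1)
    W_xx = ((x - xb[:, None]) ** 2).sum()
    W_xy = ((x - xb[:, None]) * (y - yb[:, None])).sum()
    B_xx = T * ((xb - x.mean()) ** 2).sum()
    B_xy = T * ((xb - x.mean()) * (yb - y.mean())).sum()
    lam = sigma_u2 / (sigma_u2 + T * sigma_a2)
    b_fe = W_xy / W_xx
    b_re = (W_xy + lam * B_xy) / (W_xx + lam * B_xx)
    v_fe = sigma_u2 / W_xx                       # Var(beta_FE)
    v_re = sigma_u2 / (W_xx + lam * B_xx)        # Var(beta_RE)
    H = (b_fe - b_re) ** 2 / (v_fe - v_re)       # Eq. 41 with k = 1
    p_value = math.erfc(math.sqrt(H / 2))        # P(chi2(1) > H)
    return H, p_value

rng = np.random.default_rng(6)
N, T = 300, 5
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T)) + alpha[:, None]     # violates H0: X correlated with alpha
y = 1.0 + 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))
H, p = hausman_scalar(x, y, sigma_u2=1.0, sigma_a2=1.0)
print(H, p)
```

Because the simulated covariate is correlated with $\alpha_i$, the statistic should be far above the 5% critical value of $\chi^2_1$.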


3. Is fixed effects estimation always preferable to random effects estimation?

Is FE always better than RE?

Recall: The fixed effects estimator uses only the within-group (= difference from group mean) variation and ignores the between-group variation. This method is used because of a concern that the between-group variation is contaminated with unobserved heterogeneity.

In some cases, the cross-sectional variation may be more reliable than the within-group time variation, in which case fixed effects estimation may be worse than the OLS or RE alternatives.

Is FE always better than RE?

Examples are:

• Measurement error in $X_{it}$: If $X_{it}$ is measured with classical, i.e. purely random, error, then taking either differences-from-mean or first-differences will exacerbate the noise-to-signal ratio in the resulting data → serious attenuation bias in $\hat\beta_{FE}$.

• Endogenous changes in $X_{it}$: If $X$ is endogenous, i.e. changes in $X_{it}$ over time are not exogenous to changes in $Y_{it}$, then fixed effects estimation may be worse than random effects or OLS. In this case, $(X_{it} - \bar X_i)$ may be strongly correlated with $(\varepsilon_{it} - \bar\varepsilon_i)$.

• There may not be enough variation in the $X$ variables, although FE can estimate the coefficient even if $X$ rarely changes (Pluemper and Troeger, 2007).

4. Hausman-Taylor (1981) approach to estimating coefficients on both time-varying and time-invariant variables

Hausman-Taylor (1981) approach

If we have both time-variant and time-invariant variables of interest, Hausman and Taylor show that consistent estimation of the coefficients of interest is possible if not all of the time-varying variables are correlated with the unobserved heterogeneity.

The basic idea is to use the group means of the time-varying variables that are uncorrelated with the unobserved heterogeneity as instruments for the time-invariant variables to obtain consistent estimates of their coefficients, while consistent estimates of the time-varying variable coefficients can be obtained using standard fixed effects estimation.

This requires that there are at least as many uncorrelated time-varying variables as correlated time-invariant variables, and also that there is suitable correlation between these.


Consider the linear regression of $Y_{it}$ on $k$ time-varying covariates ($X_{it}$) and $g$ time-invariant covariates ($Z_i$):

$$Y_{it} = X_{it}'\beta + Z_i'\gamma + \varepsilon_{it}, \qquad (43)$$

where $i = 1, \dots, N$, $t = 1, \dots, T$, and

$$\varepsilon_{it} = \alpha_i + u_{it}. \qquad (44)$$


Sub-divide each of $X_{it} = (X_{1it}\ X_{2it})'$ and $Z_i = (Z_{1i}\ Z_{2i})'$:

• $X_{1it}$ and $X_{2it}$ consist of $k_1$ and $k_2$ variables, respectively ($k_1 + k_2 = k$);

• $Z_{1i}$ and $Z_{2i}$ consist of $g_1$ and $g_2$ variables, respectively ($g_1 + g_2 = g$);

• $E(\alpha_i | X_{1it}) = 0$ and $E(\alpha_i | Z_{1i}) = 0$; and

• $E(\alpha_i | X_{2it}) \neq 0$ and $E(\alpha_i | Z_{2i}) \neq 0$.

Hausman-Taylor (1981) approach

The intuition for the Hausman-Taylor approach is as follows:

• STEP 1: Fixed effects provides consistent estimation of the coefficients on the time-varying variables:

$$\hat\beta_{FE} = (X'QX)^{-1}X'QY \qquad (45)$$

Remember that $Q = I - P$, where $P = I \otimes T^{-1}ii'$. The residual variance obtained in this step is a consistent estimator of $\sigma_u^2$.

• STEP 2: Use $\hat\beta_{FE}$ to construct the group means of the within-group residuals:

$$\hat d_i = \bar Y_i - \bar X_i'\hat\beta_{FE} = Z_i'\gamma + \alpha_i + \bar u_i, \qquad (46)$$

where $\bar u_i$ is the group mean of the idiosyncratic errors $u_{it}$.

• If (46) were estimated with OLS or GLS, then $\hat\gamma$ would likely be biased, due to the correlation of $Z_{2i}$ with $\alpha_i$.


Where does the expression for $\hat d$ in (46) come from? The group means of the within-group residuals are derived as follows:

$$\hat d = P(Y - X\hat\beta_{FE}) = P\{I - X(X'QX)^{-1}X'Q\}Y$$

$$= P\{I - X(X'QX)^{-1}X'Q\}(X\beta + Z\gamma + \alpha + u)$$

$$= P(X\beta + Z\gamma + \alpha + u - X\beta)$$

$$= P(Z\gamma + \alpha + u)$$

$$= Z\gamma + \alpha + Pu$$

This is a regression of the group-mean residuals from the fixed effects regression on the $Z_i$'s, with $\alpha_i + \bar u_i$ being the group-mean residuals.


• STEP 3: Use $\bar X_{1i}$ as instruments for $Z_{2i}$. This will provide consistent estimation of $\gamma$ if there are sufficient $X_1$'s (i.e. order condition: $k_1 \geq g_2$), and the $X_{1i}$'s are correlated with the $Z_{2i}$'s (rank condition).

• Then estimate (46) with a 2SLS approach, where (stacking over $i$):

$$\hat\gamma = (Z'P_AZ)^{-1}Z'P_A\hat d \qquad (47)$$

where $A = [X_{1it}\ Z_{1i}]$, and $P_A$ is the projection matrix:

$$P_A = A(A'A)^{-1}A' \qquad (48)$$

and

$$\hat Z_2 = A(A'A)^{-1}A'Z_2 \qquad (49)$$
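Steps 1 to 3 can be sketched for the simplest case of one exogenous time-varying variable and one endogenous time-invariant variable ($k_1 = g_2 = 1$, no $X_2$ or $Z_1$). The simulation design is an illustrative assumption, and the 2SLS step is run on the $N$ group-level observations, since every variable in Eq. 46 is constant within a unit.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, beta, gamma = 2000, 5, 2.0, 1.0            # hypothetical design
alpha = rng.normal(size=N)
w = rng.normal(size=N)                           # exogenous unit-level component
x1 = w[:, None] + rng.normal(size=(N, T))        # X1: uncorrelated with alpha
z2 = w + 0.5 * alpha + rng.normal(size=N)        # Z2: correlated with alpha
y = beta * x1 + gamma * z2[:, None] + alpha[:, None] + rng.normal(size=(N, T))

# STEP 1: within estimator of beta (Eq. 45)
xd = x1 - x1.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_fe = (xd * yd).sum() / (xd ** 2).sum()

# STEP 2: group means of the within residuals (Eq. 46)
d = y.mean(axis=1) - beta_fe * x1.mean(axis=1)

# STEP 3: 2SLS of d on Z = [1, Z2] with instruments A = [1, Xbar1] (Eq. 47)
Z = np.column_stack([np.ones(N), z2])
A = np.column_stack([np.ones(N), x1.mean(axis=1)])
PA_Z = A @ np.linalg.solve(A.T @ A, A.T @ Z)     # P_A Z without forming P_A
gamma_2sls = np.linalg.solve(PA_Z.T @ Z, PA_Z.T @ d)[1]

gamma_ols = np.linalg.lstsq(Z, d, rcond=None)[0][1]   # ignores endogeneity of Z2
print(gamma_2sls, gamma_ols)
```

In this design OLS on Eq. 46 overstates $\gamma$, while the instrumented estimate stays close to the assumed value.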



• NOTE: Both $\hat\beta_{FE}$ and $\hat\gamma_{2SLS}$ are consistent. However, since $\hat\beta_{FE}$ is likely to be inefficient, $\hat\gamma_{2SLS}$, which stems from the FE approach, is likely to be inefficient too. Therefore, Hausman and Taylor suggest an extension to estimate $\beta$ and $\gamma$ in a more efficient way.

• STEP 4: The residual variance in the step above is a consistent estimator of $\sigma^{*2} = \sigma_u^2/T + \sigma_\alpha^2$. Using the consistent estimator of $\sigma_u^2$ from the first step, we deduce an estimator for $\sigma_\alpha^2 = \sigma^{*2} - \sigma_u^2/T$. The weight for feasible GLS is:

$$\theta = 1 - \frac{\sigma_u}{\sqrt{\sigma_u^2 + T\sigma_\alpha^2}}. \qquad (50)$$


• STEP 5: Construct a weighted instrumental variable estimator. The full set of variables is:

$$w_{it}' = (X_{1it}'\ X_{2it}'\ Z_{1i}'\ Z_{2i}') \Longrightarrow W_{NT \times (k_1 + k_2 + g_1 + g_2)}, \qquad (51)$$

so the transformed variables of GLS are:

$$w_{it}^{*\prime} = w_{it}' - \hat\theta\bar w_i'$$

$$Y_{it}^* = Y_{it} - \hat\theta\bar Y_i.$$

The instruments used are:

$$v_{it}' = [(X_{1it} - \bar X_{1i})'\ (X_{2it} - \bar X_{2i})'\ Z_{1i}'\ \bar X_{1i}'] \qquad (52)$$

Hausman-Taylor (1981) approach

1 Instrumental variable estimator (efficient):

$$(\hat\beta\ \hat\gamma)'_{IV} = [(W^{*\prime}V)(V'V)^{-1}(V'W^*)]^{-1}[(W^{*\prime}V)(V'V)^{-1}(V'Y^*)] \qquad (53)$$

2 Instrumental variable estimator using un-weighted variables (inefficient):

$$(\hat\beta\ \hat\gamma)'_{IV} = [(W'V)(V'V)^{-1}(V'W)]^{-1}[(W'V)(V'V)^{-1}(V'Y)] \qquad (54)$$

3 Feasible GLS estimator:

$$(\hat\beta\ \hat\gamma)'_{GLS} = [W^{*\prime}W^*]^{-1}[W^{*\prime}Y^*] \qquad (55)$$

5. Correlated random effects (CRE): a flexible extension to random effect models

Intuition of CRE

Recall that the random effects estimator is biased if $\alpha_i$ is correlated with $X_{it}$. Chamberlain (1984) and Mundlak (1978) observed that if $\alpha_i$ is correlated with $X_{it}$ in period $t$, then it will also be correlated with $X_{is}$ in period $s$, where $t \neq s$. One interpretation of this observation is that $X_{it}$ should be included in the period $s$ regression. More generally, all the realisations of the $X$'s should be included in each period's regression.

That is, if $\alpha_i$ is correlated with $X_{it}$ in the structural form, then all leads and lags of $X_{it}$ should be included in the regression.

Formalisation of CRE

Specify the linear projection of $\alpha_i$ on the set of $X_{it}$'s:

$$\alpha_i = X_{i1}'\lambda_1 + X_{i2}'\lambda_2 + \dots + X_{iT}'\lambda_T + \eta_i. \qquad (56)$$

Eq. 56 provides a way to decompose $\alpha_i$ into two components:

1 A component ($X_{i1}'\lambda_1 + X_{i2}'\lambda_2 + \dots + X_{iT}'\lambda_T$) that is correlated with the observable covariates; and

2 A component ($\eta_i$) that is uncorrelated with the covariates.

The $\lambda$'s are the projection coefficients that reflect the extent of the correlation between $\alpha_i$ and $X_{it}$, and $\eta_i$ is, by construction, a true random effect - i.e. uncorrelated with $X_{it}$ for all $t$.

Formalisation of CRE

Note:

• $E(\alpha_i | X_{it})$ does not have to be linear in the $X_{it}$'s. It is only the linear correlation that causes bias/inconsistency in the OLS and random effects (GLS) estimators. Hence, only the linear projection is required for CRE to be unbiased/consistent.

• Mundlak (1978) adopted the more restricted specification that $\lambda_1 = \lambda_2 = \dots = \lambda_T = \lambda$. This restriction implies that Eq. 56 reduces to:

$$\alpha_i = (T\bar X_i)'\lambda + \eta_i \qquad (57)$$

Mundlak's assumption and consequences

The assumption that the individual-specific effect is equally correlated with the $X_{it}$'s of all time periods implies a very easy implementation of the correction. All you need to do is to replace $\alpha_i$ in Eq. 58:

$$Y_{it} = X_{it}'\beta + \alpha_i + u_{it}, \qquad (58)$$

with (ignoring the scaling factor of $T$):

$$\alpha_i = (T\bar X_i)'\lambda + \eta_i \qquad (59)$$

to get:

$$Y_{it} = X_{it}'\beta + \bar X_i'\lambda + \eta_i + u_{it}, \qquad (60)$$

where $\eta_i$ is a true random effect.
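A minimal sketch of Eq. 60 with one covariate. For simplicity it uses pooled OLS rather than random effects GLS (an assumption of this sketch) and compares the coefficient on $X_{it}$ with the within-group estimate.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 200, 4
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T)) + alpha[:, None]     # X_it correlated with alpha_i
y = 1.0 + 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))

# Eq. 60: add the unit mean Xbar_i as an extra regressor
xbar = np.repeat(x.mean(axis=1), T)              # Xbar_i repeated for each t
X_cre = np.column_stack([np.ones(N * T), x.reshape(-1), xbar])
coef = np.linalg.lstsq(X_cre, y.reshape(-1), rcond=None)[0]
beta_cre, lambda_hat = coef[1], coef[2]

# Within-group estimator, for comparison
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_fe = (xd * yd).sum() / (xd ** 2).sum()
print(beta_cre, beta_fe, lambda_hat)
```

In a balanced panel the coefficient on $X_{it}$ reproduces the fixed effects estimate, while the coefficient on $\bar X_i$ picks up the correlation with $\alpha_i$.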


Chamberlain’s approach



If you do not want to make the strong assumptions made by Mundlak, then implementation of this correction is slightly more difficult. Using Eq. 56 to substitute for $\alpha_i$ in the combined Eqs. 30 and 31, we get:

$$Y_{it} = X_{it}'\beta + X_{i1}'\lambda_1 + X_{i2}'\lambda_2 + \dots + X_{iT}'\lambda_T + \eta_i + u_{it} \qquad (61)$$

$$= X_{it}'(\beta + \lambda_t) + \sum_{s \neq t} X_{is}'\lambda_s + \eta_i + u_{it}, \qquad (62)$$

or, in more compact form:

$$Y_{it} = X_{i1}'\pi_{t1} + X_{i2}'\pi_{t2} + \dots + X_{iT}'\pi_{tT} + \eta_i + u_{it}, \qquad (63)$$

where

$$\pi_{ts} = \begin{cases} \lambda_s & s \neq t \\ \beta + \lambda_t & s = t. \end{cases}$$

Some explanations

Eq. 62 is the reduced form equation for the model. The errors ($\eta_i + u_{it}$) are uncorrelated with the regressors. This expression shows that one way to view the problem of ignoring the correlation between the covariates and the unobserved heterogeneity is as an omitted variables problem, which can be solved by including all the out-of-period realisations of $X_{is}$ in the period $t$ equation.

In Eq. 63, the coefficient on $X_{it}$, i.e. $\pi_{tt}$, consists of two components:

1 The structural effect of interest $\beta$;

2 The component $\lambda_t$, which reflects the correlation of $X_{it}$ with the unobserved heterogeneity.

Estimation of CRE

The parameters of interest ($\beta$ and the $\lambda_t$'s) can be estimated by the minimum distance approach, which requires two steps:

1 Estimate the unrestricted reduced form equations as outlined in Eq. 63 by OLS. Include all the leads and lags of the $X_{it}$'s in the period $t$ regression, and estimate this regression separately for each time period.

2 Estimate the parameters of interest by imposing the implied restrictions (see below) on the first-stage reduced form coefficients using a minimum distance estimation method. This means using a quadratic form criterion as the basis for estimating the parameters of interest in the second stage.

The implied cross-equation restrictions are:

1 $\pi_{ts} = \lambda_s\ \forall\ t \neq s$;

2 $\pi_{tt} - \pi_{st} = \beta\ \forall\ t \neq s$.

The details of minimum distance are explained on the white-board.

Evaluation of CRE

• This approach is called random effects because it parameterises the distribution of $\alpha_i$ (i.e. by projecting $\alpha_i$ onto the set of sample realisations of $X_{it}$);

• It requires estimating $1 + TK + K$ parameters (risk of proliferation of parameters);

• It relies on the measured $X_{it}$'s being time-varying. Time-invariant variables will be absorbed into the $\alpha_i$ in this specification;

• A test of the (zero) correlation between the covariates and the unobserved heterogeneity is given by testing $H_0: \lambda_1 = \lambda_2 = \dots = \lambda_T = 0$ vs $H_a$: not all are zero;

• An important caveat to the CRE discussion is that $X_{is}$ enters the period $t$ equation only via its correlation with $\alpha_i$. In some situations, out-of-period regressors may have independent, structural reasons for being included (this approach may then fail).

6. Pluemper and Troeger (2007) approach to modelling (nearly) time-invariant variables

Three-stage procedure

1 Run a fixed effects model - predict the individual fixed effects;

2 Decompose the individual fixed effects into the part explained by time-invariant and/or rarely changing variables and an error term ($h_i$);

3 Re-estimate the first stage by pooled OLS, including the time-invariant variables plus the error term of stage 2.

Three-stage procedure

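$$y_{it} - \bar y_i = \sum_{k=1}^{K}\beta_k(x_{kit} - \bar x_{ki}) + \sum_{m=1}^{M}\gamma_m(z_{mi} - z_{mi}) + (e_{it} - \bar e_i) + (u_i - u_i) \qquad (64)$$

Let:

$$\hat u_i = \bar y_i - \sum_{k=1}^{K}\hat\beta_k\bar x_{ki} - \bar e_i \qquad (65)$$

$$\hat u_i = \sum_{m=1}^{M}\gamma_m z_{mi} + h_i \qquad (66)$$

and

$$h_i = \hat u_i - \sum_{m=1}^{M}\gamma_m z_{mi} \qquad (67)$$

$$y_{it} = \alpha + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \delta h_i + \varepsilon_{it} \qquad (68)$$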

Monte Carlo Simulations



1 Compare finite sample properties of the FEVD estimator against those of the pooled OLS, RE, and Hausman-Taylor IV estimators (using RMSE as the criterion);

2 If both time-invariant and time-varying variables correlate strongly with the individual FE, then FEVD outperforms all other estimators;

3 When considering the estimates of coefficients on rarely changing variables, FEVD outperforms FE if:

• the ratio of between to within variation is high (threshold is 1.7), and;

• the overall $R^2$ is low, and;

• the correlation between rarely changing variables and the individual FE is low.

7. Application: Effect of Health on Hourly Wages (Contoyannis and Rice, 2001) using six waves of BHPS

Assumptions

1 Remember: Use the mean values of the exogenous time-varying variables to instrument the time-invariant endogenous variables;

2 Time-invariant endogenous variables: Higher Degree;

3 Time-variant endogenous variables: Health (Psychological and Physiological), workforce sector, occupation;

4 Test for the validity of the instruments in the Hausman and Taylor approach using a Hausman test (comparing the estimated coefficients with those of a FE model): they should be sufficiently close.

5 The approach is valid only if health is correlated with the individual, time-invariant effect of wages, but not with the period-specific effects of wages.