
Chapter 4 Endogenous Regressors and Correlated Effects

Rachid Boumahdi and Alban Thomas

4.1 Introduction

There are several situations in econometric modeling where consistency of parameter estimates is questionable because some explanatory variables may be correlated with the model disturbances. Hence the fundamental exogeneity assumption for the regressors may not be supported by the data, with two implications. First, the source of this correlation might be investigated in order to propose possible corrections. Second, alternative but consistent estimators may be proposed.

One of the best-known sources of endogenous regressors is the case of simultaneous equations models, in which some of the regressors in a given equation are the dependent variables in others and are consequently correlated with the disturbances of the equation under consideration. Another cause of correlation between explanatory variables and model disturbances is when the former are subject to measurement errors. Chapter 9 provides a detailed treatment of simultaneity and measurement error issues in the case of panel data.

There is, however, an important reason why regressors may be endogenous in the context of panel data. As discussed in the preceding chapters, individual unobserved heterogeneity is usually accounted for by adding random individual-specific effects to the usual idiosyncratic disturbances of the model. Consequently, regressors must also be uncorrelated with these individual effects for consistent estimates to be obtained. This no-correlation assumption has been widely criticized, among others by Mundlak (1978).

Rachid Boumahdi
Toulouse School of Economics, GREMAQ and LIHRE; Universite des Sciences Sociales de Toulouse, 21 Allee de Brienne, 31000 Toulouse, France

Alban Thomas
Toulouse School of Economics, INRA; Universite des Sciences Sociales de Toulouse, 21 Allee de Brienne, 31000 Toulouse, France


Consider for example an agricultural production model (crop yield response function) where output depends on a set of inputs (labor, fertilizer, etc.). It is likely that variables outside the scope of the farmer's decisions also affect the final crop output: soil characteristics (slope, water reserve, etc.) and climatic conditions. Land marginal productivity, as represented by soil characteristics, is often very difficult to observe with precision and is often assumed to be part of the farm-specific effect. But because the farmer's input choice is likely to depend on land productivity, observed input levels are likely to be correlated with the farmer-specific effect. This is especially true for fertilizer and water inputs, whose application levels are likely to be negatively correlated with systematic soil fertility and permanent water reserve, respectively.

Another popular example is the individual earnings function (wage equation), where the logarithm of the wage rate is explained by variables related to occupation, experience, and education. However, the expected marginal productivity of a worker depends on individual ability, which is partly unobserved. In particular, individual ability may positively influence wages as well as the education level of the individual. If education is an explanatory variable in the wage equation while being partly correlated with unobserved ability, the individual effects (unobserved ability) may then be correlated with the regressors.

This chapter addresses the issue of correlated effects and endogenous regressors in the case of panel data. We present the main estimation and testing procedures employed in a single-equation, linear panel data context. Starting with a brief overview of error structures and model transformations (fixed effects, first differences and quasi-differences), we present Instrumental Variable (IV) and Generalized Method of Moments (GMM) procedures for consistent and efficient estimation of static models. We devote a section to augmented linear models with time-invariant regressors and show how to identify model parameters. Estimation of this kind of model with IV or GMM is discussed, and we compare in particular the efficiency of these estimators, depending on the validity of a no-conditional-heteroskedasticity assumption. A way to measure instrument relevance in panel data models estimated by instrumental-variables procedures is presented, based on single-parameter information. Instrumental Variable estimation of models including time-varying regressors only is also the subject of a section, where endogenous regressors can be of any nature (time-varying only or not). As dynamic panel data models are the subject of Chap. 8, we do not deal with the vast literature on the subject that has emerged since the seminal work of Anderson and Hsiao (1982) and Arellano and Bond (1991). We conclude this chapter with a brief presentation of unbalanced panel data models with correlated effects and endogenous regressors, including nested error component models.

4.2 Estimation of Transformed Linear Panel Data Models

Consider the linear panel data model:

$$y_{it} = x_{it}\beta + u_{it}, \quad i = 1,\ldots,N;\; t = 1,\ldots,T, \tag{4.1}$$


where $x'_{it}$ is a $K\times 1$ vector of regressors depending on individual $i$ and time $t$, except for the first element of $x_{it}$, which is a constant equal to one. The error term $u_{it}$ may contain unobserved individual heterogeneity components, as in the one-way error component specification $u_{it} = \alpha_i + \varepsilon_{it}$. For most of the chapter we assume that the sample is balanced, i.e., each cross-sectional unit has the same number $T$ of non-missing observations. The case of unbalanced panels is briefly discussed in Sect. 4.7.

As discussed in Chaps. 2 and 3, a conditional (fixed effects) or a random effects approach will lead to similar results asymptotically under standard assumptions, including exogeneity of the $x_{it}$'s. On the other hand, when the correlation between $u_{it}$ and some of the $x_{it}$'s in (4.1) is not accounted for, Ordinary or Generalized Least Squares estimators are not consistent. In this case, when the correlation comes from the individual component of $u_{it}$, an easy way to cope with such endogeneity is simply to filter out that component. Such a strategy is applicable to a variety of error structures, as we now see.

4.2.1 Error Structures and Filtering Procedures

We present here basic transformations for eliminating the unobserved individual heterogeneity component in linear models. The motivation for such filtering in most cases comes from endogeneity issues, and in particular the fact that regressors are correlated with individual effects.

In most applications, the error component structure can be specified as a particular case of the following representation:

$$u_{it} = \alpha_i + \lambda_t v_i + \varepsilon_{it}, \tag{4.2}$$

where $\alpha_i$ and $v_i$ are unobserved heterogeneity terms, $\lambda_t$ is a time effect, and $\varepsilon_{it}$ is i.i.d. across individuals and time periods. Let $\sigma^2_\alpha$, $\sigma^2_v$ and $\sigma^2_\varepsilon$ respectively denote the variances of $\alpha_i$, $v_i$ and $\varepsilon_{it}$. The most important special cases are:

Case 1. (One-way error component model) $\lambda_t = \lambda\ \forall t$.
Case 2. (Two-way error component model) $v_i = v\ \forall i$.
Case 3. (Cross-sectional dependence Type I) $\alpha_i = \alpha\ \forall i$.

Case 1 is by far the most widely used specification. When $\lambda_t$ is constant across time periods, the error component structure reduces to $\alpha_i + \lambda v_i + \varepsilon_{it} \equiv \alpha^*_i + \varepsilon_{it}$ (the one-way specification).

In case 2, $\lambda_t$ can represent a trend function or simply consist of (non-monotonic) time effects that affect all units in the same way in a given time period. It may however be of interest in applications to consider heterogeneous trends, where the marginal impact of the common time shock $\lambda_t$ is individual-specific; this is obtained in case 3. In the general case of (4.2), where both $\alpha_i$ and $v_i$ vary across units, we have both heterogeneous intercepts and heterogeneous slopes on the time effects.

Let us examine model transformations that eliminate the individual heterogeneity terms in each of the cases presented above.


For case 1, the most common practice is to wipe out $\alpha_i$ with the Within-group (fixed effects) transformation, $\varepsilon_{it} - \bar\varepsilon_i = (y_{it} - \bar y_i) - (x_{it} - \bar x_i)\beta$, where $\bar y_i$ denotes the individual mean of variable $y_{it}$ for unit $i$. This equation provides a simple way of obtaining a consistent least squares estimator of $\beta$ under the assumption of strict exogeneity: $E[(x_{it} - \bar x_i) \mid \varepsilon_{is}] = 0\ \forall s, \forall t$.

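Alternatively, we may use the first-difference transformation $\Delta u_{it} = \Delta\varepsilon_{it} = \Delta y_{it} - \Delta x_{it}\beta = (y_{it} - y_{i,t-1}) - (x_{it} - x_{i,t-1})\beta$, and consistent estimation of $\beta$ then obtains under the assumption that $E[\Delta x_{it} \mid \varepsilon_{it}, \varepsilon_{i,t-1}] = 0$, a somewhat weaker assumption than above. In vector form, first differences are obtained with the $T\times(T-1)$ matrix $L_T$ (premultiplying a $T$-vector by $L'_T$ yields its $T-1$ first differences, up to sign):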

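$$L_T = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
-1 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -1 & 1 \\
0 & 0 & 0 & \cdots & 0 & -1
\end{bmatrix}.$$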

Using first differences introduces a moving-average serial correlation in the transformed residuals (Arellano and Bover, 1995). To remove this correlation, one can use the orthogonal deviations procedure:

$$y^*_{it} = \frac{\sqrt{T-t}}{\sqrt{T-t+1}}\left[y_{it} - \frac{1}{T-t}\sum_{s=t+1}^{T} y_{is}\right], \qquad i = 1,\ldots,N;\; t = 1,\ldots,T-1. \tag{4.3}$$

Whatever the transformation considered, be it within-group (fixed effects), first differences or orthogonal deviations, identification of parameter $\beta$ is possible (except for the constant term) because $x_{it}$ is assumed to be time-varying. First differences and deviations from individual means convey the same information because the operators $Q_T$ (for fixed effects) and $L_T$ (for first differences) span the same column space, with $Q_T = L_T(L'_T L_T)^{-1}L'_T$.
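The claims about $L_T$, $Q_T$ and the orthogonal deviations in (4.3) can be checked numerically. The following is a minimal sketch in Python with NumPy (a language choice of ours, not of the chapter); the helper names are hypothetical. It builds the three operators for a single individual and verifies that $Q_T$ is the projection onto the column space of $L_T$, and that the rows of the orthogonal-deviation matrix are orthonormal and span the same space.

```python
import numpy as np

def first_difference_matrix(T):
    """T x (T-1) matrix L_T: column t has +1 in row t and -1 in row t+1."""
    L = np.zeros((T, T - 1))
    for t in range(T - 1):
        L[t, t] = 1.0
        L[t + 1, t] = -1.0
    return L

def within_matrix(T):
    """Q_T = I_T - (1/T) e_T e_T'."""
    return np.eye(T) - np.ones((T, T)) / T

def orthogonal_deviation_matrix(T):
    """(T-1) x T matrix F whose row t implements (4.3); t is 0-based here."""
    F = np.zeros((T - 1, T))
    for t in range(T - 1):
        remaining = T - t - 1                   # T - t future periods in 1-based notation
        scale = np.sqrt(remaining / (remaining + 1.0))
        F[t, t] = scale
        F[t, t + 1:] = -scale / remaining       # minus the mean of the future observations
    return F

T = 5
L, Q, F = first_difference_matrix(T), within_matrix(T), orthogonal_deviation_matrix(T)

proj_L = L @ np.linalg.inv(L.T @ L) @ L.T       # projection onto the column space of L_T
print(np.allclose(Q, proj_L))                   # True
print(np.allclose(F @ F.T, np.eye(T - 1)))      # True: orthonormal rows
print(np.allclose(F.T @ F, Q))                  # True: same column space as Q_T
```

Because $FF' = I_{T-1}$, i.i.d. errors remain i.i.d. after the orthogonal-deviation transformation, which is why it avoids the moving-average correlation created by first differences.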

On efficiency grounds, the choice between fixed effects and first differences depends on the assumptions made about the second moments of the errors, as follows. Maintaining the strict exogeneity assumption $E(\varepsilon_{it} \mid x_i, \alpha_i) = 0$, $t = 1,\ldots,T$, where $x_i = (x_{i1},\ldots,x_{iT})$, if we further assume that $E(\varepsilon_i\varepsilon'_i \mid x_i, \alpha_i) = \sigma^2_\varepsilon I_T$ (neither heteroskedasticity nor serial correlation), then fixed effects is the most efficient estimator among those relying on these conditions. On the other hand, if we replace the latter assumption by

$$E(\Delta\varepsilon_i\Delta\varepsilon'_i \mid x_i, \alpha_i) = \sigma^2_{\Delta\varepsilon} I_{T-1}, \quad t = 2,\ldots,T,$$

then it can be shown that the first-difference estimator is more efficient. This is the case when $\varepsilon_{it}$ follows a random walk.
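The efficiency ranking between fixed effects and first differences can be illustrated by computing the exact conditional variances of both estimators for a single regressor, treating the regressor values as fixed. This sketch assumes the same error covariance matrix for every individual, unit innovation variance, and a hypothetical regressor design; the random-walk case uses $\operatorname{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \min(t, s)$, and the function name is ours.

```python
import numpy as np

def fe_fd_variances(x, omega):
    """Exact variances (given x) of the scalar FE and FD slope estimators.

    x: (N, T) array holding one time-varying regressor.
    omega: (T, T) covariance matrix of (eps_i1, ..., eps_iT), common to all i.
    """
    T = x.shape[1]
    Q = np.eye(T) - np.ones((T, T)) / T              # Within operator
    D = np.diff(np.eye(T), axis=0)                   # (T-1) x T first-difference operator
    qx = x @ Q                                       # row i is (Q x_i)'
    fe = np.sum((qx @ omega) * qx) / np.sum(qx * x) ** 2
    dx = x @ D.T                                     # row i is (D x_i)'
    fd = np.sum((dx @ (D @ omega @ D.T)) * dx) / np.sum(dx * dx) ** 2
    return fe, fd

rng = np.random.default_rng(0)
N, T = 200, 6
x = rng.normal(size=(N, T)).cumsum(axis=1)           # hypothetical regressor values
t = np.arange(1, T + 1)
omega_iid = np.eye(T)                                # no serial correlation
omega_rw = np.minimum.outer(t, t).astype(float)      # random walk: Cov = min(t, s)

fe_iid, fd_iid = fe_fd_variances(x, omega_iid)
fe_rw, fd_rw = fe_fd_variances(x, omega_rw)
print(fe_iid <= fd_iid, fd_rw <= fe_rw)              # True True
```

With i.i.d. errors, fixed effects is GLS on the differenced data, whereas with random-walk errors the differenced errors are i.i.d., so that first differences is GLS; the two inequalities reflect this.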

In case 2, both individual and time effects can be filtered out by means of a modified Within operator, which simultaneously removes time-invariant and time-varying only components. We discuss this transformation in detail in Sect. 4.5 on time-varying regressors.

The model corresponding to case 3 was suggested by Holtz-Eakin, Newey and Rosen (1988), Ahn, Lee and Schmidt (2001), and Lillard and Weiss (1979). Unless $\lambda_t$ is constant across time periods, Within-group or first-difference transformations fail to filter out the unobserved individual heterogeneity component $\lambda_t v_i$.

Define a new variable $r_t = \lambda_t/\lambda_{t-1}$. Subtracting from the equation at time $t$ its one-period lag premultiplied by $r_t$, we have

$$y_{it} - r_t y_{i,t-1} = (x_{it} - r_t x_{i,t-1})\beta + \varepsilon_{it} - r_t\varepsilon_{i,t-1}. \tag{4.4}$$

The transformed model obtained with this quasi-differencing technique is now a nonlinear equation with additional parameters to be estimated: $r_t$, $t = 2,3,\ldots,T$. Interestingly, parameters associated with time-invariant regressors become identified with a nonlinear regression of (4.4), because a term $z_i\gamma$ becomes $(1 - r_t)z_i\gamma$ after quasi-differencing. This is the only case of such identification for those parameters in transformed models of the kind presented here.
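The mechanics of quasi-differencing in (4.4) can be illustrated with the $r_t$ treated as known, an assumption made only for illustration (in practice they are additional parameters to be estimated). The sketch checks that the heterogeneity term $\lambda_t v_i$ is removed while a time-invariant term $z_i\gamma$ survives as $(1 - r_t)z_i\gamma$; all numerical values and the helper name are hypothetical.

```python
import numpy as np

def quasi_difference(a, r):
    """Return a_t - r_t * a_{t-1} for t = 2..T.

    a: (N, T) array; r: length T-1 array of r_t = lambda_t / lambda_{t-1}.
    """
    return a[:, 1:] - r[None, :] * a[:, :-1]

rng = np.random.default_rng(1)
N, T = 4, 5
lam = np.array([1.0, 1.3, 0.8, 1.5, 2.0])       # hypothetical time effects lambda_t
v = rng.normal(size=N)                          # individual heterogeneity v_i
z = rng.normal(size=N)                          # a time-invariant regressor z_i
gamma = 0.7

het = lam[None, :] * v[:, None]                 # lambda_t * v_i
zterm = np.repeat((gamma * z)[:, None], T, axis=1)   # z_i * gamma, constant over t

r = lam[1:] / lam[:-1]
print(np.allclose(quasi_difference(het, r), 0.0))    # True: lambda_t v_i is removed
# z_i * gamma survives as (1 - r_t) * z_i * gamma, so gamma remains identified
print(np.allclose(quasi_difference(zterm, r),
                  (1 - r)[None, :] * (gamma * z)[:, None]))   # True
```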

Consider now the general case (4.2). To eliminate both effects $\alpha_i$ and $v_i$, a double transformation is necessary: first differences, and then quasi-differences:

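$$\Delta y_{it} - r_t\Delta y_{i,t-1} = (\Delta x_{it} - r_t\Delta x_{i,t-1})\beta + \Delta\varepsilon_{it} - r_t\Delta\varepsilon_{i,t-1}, \tag{4.5}$$

$i = 1,2,\ldots,N$, $t = 3,4,\ldots,T$, where

$$r_t = \Delta\lambda_t/\Delta\lambda_{t-1} = (\lambda_t - \lambda_{t-1})/(\lambda_{t-1} - \lambda_{t-2}).$$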

This double transformation of the model has been suggested by Nauges and Thomas (2003) in the dynamic panel data context. Wansbeek and Knaap (1999) use a double first-difference transformation in the special case of a dynamic panel data model with a random trend $\lambda_t = t$ (the random growth model, see Heckman and Holtz (1989)).
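A similar check applies to the double transformation (4.5), again with hypothetical, known $\lambda_t$ chosen so that no $\Delta\lambda_t$ is zero: first differences remove $\alpha_i$, and quasi-differencing the differences with $r_t = \Delta\lambda_t/\Delta\lambda_{t-1}$ removes $\Delta\lambda_t v_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 4, 6
lam = np.array([0.0, 1.0, 2.5, 3.0, 5.0, 5.5])   # hypothetical lambda_t
alpha = rng.normal(size=N)
v = rng.normal(size=N)
u_het = alpha[:, None] + lam[None, :] * v[:, None]   # alpha_i + lambda_t v_i

d_het = np.diff(u_het, axis=1)        # t = 2..T: (lambda_t - lambda_{t-1}) v_i, alpha_i gone
d_lam = np.diff(lam)                  # Delta lambda_t for t = 2..T
r = d_lam[1:] / d_lam[:-1]            # r_t for t = 3..T
dd_het = d_het[:, 1:] - r[None, :] * d_het[:, :-1]
print(np.allclose(dd_het, 0.0))       # True
```

With $\lambda_t = t$, every $r_t$ equals one and the last step reduces to a second first difference.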

In what follows, we mostly work with the one-way error component model $u_{it} = \alpha_i + \varepsilon_{it}$.

4.2.2 An IV Representation of the Transformed Linear Model

Most estimators for linear panel data models can be shown to derive from the following orthogonality condition in matrix form:

$$E\left[A'(\mathcal{T}U)\right] = 0 \iff \frac{1}{N}A'\mathcal{T}Y = \frac{1}{N}A'\mathcal{T}X\beta, \tag{4.6}$$

where $A$ is an $NT\times L$ matrix of instruments and $\mathcal{T}$ is an $NT\times NT$ transformation operator. Let $Q = I_{NT} - B$ and $B = I_N\otimes(1/T)e_Te'_T$ denote the Within and Between matrix operators respectively, where $e_T$ is a $T$-vector of ones. The fixed effects estimator obtains with $A = X$ and $\mathcal{T} = Q$, so that $\hat\beta_W = (X'QX)^{-1}X'QY$ because $Q$ is idempotent. In the one-way model, the GLS estimator obtains with $A = \Omega^{-1/2}X$ and $\mathcal{T} = \Omega^{-1/2}$, so that $\hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$, where the covariance matrix of $U = \alpha + \varepsilon$ is:

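$$\Omega = \theta_1^2 Q + \theta_2^2 B, \qquad \Omega^{-1} = (1/\theta_1^2)Q + (1/\theta_2^2)B, \qquad \Omega^{-1/2} = (1/\theta_1)Q + (1/\theta_2)B, \tag{4.7}$$

where $\theta_1^2 = \sigma^2_\varepsilon$ and $\theta_2^2 = \sigma^2_\varepsilon + T\sigma^2_\alpha$.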
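For a single individual, (4.7) can be verified numerically against the direct covariance matrix $\sigma^2_\varepsilon I_T + \sigma^2_\alpha e_Te'_T$. The sketch below uses hypothetical variance components; since observations are stacked by individual, the full $NT\times NT$ matrix is $I_N$ Kronecker this block.

```python
import numpy as np

T, sigma2_eps, sigma2_alpha = 4, 1.5, 0.8        # hypothetical values
e = np.ones((T, 1))
B = e @ e.T / T                                  # Between operator for one unit
Q = np.eye(T) - B                                # Within operator for one unit
omega = sigma2_eps * np.eye(T) + sigma2_alpha * (e @ e.T)   # Var(alpha_i e_T + eps_i)

theta1_sq = sigma2_eps
theta2_sq = sigma2_eps + T * sigma2_alpha
omega_spec = theta1_sq * Q + theta2_sq * B
omega_inv = Q / theta1_sq + B / theta2_sq
omega_inv_half = Q / np.sqrt(theta1_sq) + B / np.sqrt(theta2_sq)

print(np.allclose(omega, omega_spec))                            # True
print(np.allclose(omega_inv, np.linalg.inv(omega)))              # True
print(np.allclose(omega_inv_half @ omega_inv_half, omega_inv))   # True
```

Premultiplying by $\Omega^{-1/2}$ therefore amounts to dividing deviations from individual means by $\theta_1$ and individual means by $\theta_2$.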

Under the strict exogeneity assumption, and assuming that the error structure is correctly specified, consistent estimates are obtained from moment conditions such as (4.6). Therefore, most popular estimators for linear panel data models can be represented in IV form.

Depending on the assumptions made on the error structure and on the choice of the instrument matrix, estimators can be either inconsistent or inefficient, and it is therefore important to test the validity of the conditions underlying the construction of the estimator. To disentangle model misspecification due to an invalid set of instruments from misspecification due to an invalid transformation matrix, different specifications should be tested. Estimates constructed either from the same $A$ but a different $\mathcal{T}$, or from the same $\mathcal{T}$ but a different $A$, can be used to form a series of specification tests.

As presented in Chap. 3, the Generalized Least Squares (GLS) estimator may be selected on efficiency grounds in the one-way linear panel data model if the assumptions underlying the random-effects specification are valid (in particular, strict exogeneity of the $x_{it}$'s). If, however, $E(\alpha_i x_{it}) \neq 0$, then GLS is not consistent, and fixed effects (or any transformation filtering out unobserved individual effects) should be used instead.

A very simple specification test is the Hausman exogeneity test, constructed as follows (Hausman, 1978). The null hypothesis is $H_0: E(x'_{it}\alpha_i) = 0\ \forall i, \forall t$, and two estimators are available. $\hat\beta_1$ (e.g., GLS) is consistent and efficient under the null, and inconsistent otherwise, while the fixed effects estimator $\hat\beta_W$ is consistent under both the null and the alternative, but is not efficient under the null.

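The Hausman test for linear panel data is based on the fact that, under $H_0$, both estimators should be asymptotically equivalent, $\hat\beta_1$ being more efficient. The test statistic is

$$HT = \left(\hat\beta_W - \hat\beta_1\right)'\left[\operatorname{Var}(\hat\beta_W) - \operatorname{Var}(\hat\beta_1)\right]^{-1}\left(\hat\beta_W - \hat\beta_1\right) \sim \chi^2(K),$$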

where $K$ is the dimension of $\hat\beta_W$. Note that $\hat\beta_1$ and $\hat\beta_W$ must have the same dimension, i.e., the comparison is restricted to the parameters identified by the fixed effects procedure. Also, the weighting matrix $\left[\operatorname{Var}(\hat\beta_W) - \operatorname{Var}(\hat\beta_1)\right]$ is positive semidefinite because $\hat\beta_1$ is more efficient than Within under the null.

Finally, concerning the interpretation of the number of degrees of freedom of the test, the Within estimator is based on the condition $E(X'QU) = 0$, whereas $\hat\beta_1$ is based on a larger set of moment conditions. This is in fact the origin of the difference in efficiency between the two estimators. In the case of GLS, the conditions are $E(X'QU) = 0$ and $E(X'BU) = 0$, which together imply $E(X'\Omega^{-1}U) = 0$; we therefore add $K$ additional conditions (in terms of $B$), $K$ being the rank of $X$.
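The statistic is straightforward to compute once both estimates and their estimated covariance matrices are available. A minimal sketch follows; the function name and the numbers in the example are hypothetical.

```python
import numpy as np

def hausman_statistic(b_w, b_1, v_w, v_1):
    """Hausman statistic comparing the Within estimate b_w with an efficient b_1.

    Both vectors must refer to the same coefficients (those identified by Within).
    Returns the statistic and its degrees of freedom.
    """
    diff = np.asarray(b_w, dtype=float) - np.asarray(b_1, dtype=float)
    v_diff = np.asarray(v_w, dtype=float) - np.asarray(v_1, dtype=float)
    stat = float(diff @ np.linalg.solve(v_diff, diff))
    return stat, diff.size

# Hypothetical estimates of two Within-identified coefficients
stat, dof = hausman_statistic([1.0, 0.2], [0.5, 0.2],
                              np.diag([0.10, 0.04]), np.diag([0.05, 0.02]))
print(stat, dof)   # 5.0 2, below the 5% chi-square(2) critical value of about 5.99
```

In finite samples the estimated difference of covariance matrices need not be positive definite, in which case the solve step may fail or the statistic may be negative.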


It is important to note at this stage that both cases considered up to now are rather polar (extreme) cases: either all of the explanatory variables are endogenous, or none of them is.

If we do not wish to maintain the assumption that all regressors are correlated with individual effects, an alternative estimation method may be considered: Two-Stage Least Squares (2SLS) or Instrumental Variable (IV) estimation. Recall that in a cross-section context with $N$ observations, the model would be:

$$Y = X\beta + \varepsilon, \qquad E(X'\varepsilon) \neq 0, \qquad E(A'\varepsilon) = 0, \tag{4.8}$$

where $A$ is an $N\times L$ matrix of instruments. If $K = L$, the orthogonality condition is

$$E\left[A'(Y - X\beta)\right] = 0 \iff (A'Y) = (A'X)\beta, \tag{4.9}$$

and the IV estimator is $\hat\beta = (A'X)^{-1}A'Y$. If $L > K$, the model is over-identified ($L$ conditions on $K$ parameters). For any matrix $A$, let $P[A] = A(A'A)^{-1}A'$ be the projection onto the column space of $A$. We can then minimize the quadratic form $(Y - X\beta)'P[A](Y - X\beta)$, and the IV estimator is $\hat\beta = (X'P[A]X)^{-1}(X'P[A]Y)$.
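The cross-section IV estimator can be sketched as follows, on simulated data where a common shock makes $x$ correlated with $\varepsilon$ while the instruments are not; the data-generating process and helper names are ours. $P[A]X$ is obtained by least squares rather than by forming the $N\times N$ projection matrix, and in the just-identified case the formula reduces to $(A'X)^{-1}A'Y$.

```python
import numpy as np

def project(A, V):
    """P[A] V, computed by least squares (avoids forming the N x N matrix P[A])."""
    return A @ np.linalg.lstsq(A, V, rcond=None)[0]

def iv_estimator(y, X, A):
    """IV/2SLS estimator (X'P[A]X)^{-1} X'P[A]y."""
    PX = project(A, X)
    return np.linalg.solve(PX.T @ X, PX.T @ y)

rng = np.random.default_rng(3)
n = 2000
a = rng.normal(size=(n, 2))                        # excluded instruments
shock = rng.normal(size=n)                         # common shock: source of endogeneity
x = a @ np.array([1.0, 0.5]) + shock + rng.normal(size=n)
eps = shock + rng.normal(size=n)                   # E(x eps) != 0, E(a eps) = 0
y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(n), x])               # K = 2
A_over = np.column_stack([np.ones(n), a])          # L = 3 > K: over-identified
A_just = A_over[:, :2]                             # L = K: just identified

b_over = iv_estimator(y, X, A_over)
b_just = iv_estimator(y, X, A_just)
print(np.allclose(b_just, np.linalg.solve(A_just.T @ X, A_just.T @ y)))   # True
print(np.round(b_over, 2))                         # close to [1, 2]
```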

In the cross-section context, instruments $A$ originate outside the structural equation. In panel data models, however, as we will see below, instruments that are not correlated with the individual effect can be obtained directly from the variables of the model. Another important difference in practice is that, with panel data, spherical disturbances can no longer be assumed.

4.3 Estimation with Time-Invariant Regressors

4.3.1 Introduction

When considering estimation of a model with correlated effects, two arguments favor an estimation procedure other than fixed effects. First, one can sometimes obtain more efficient parameter estimates than with the Within estimator. Second, the Within estimator does not allow us to estimate parameters associated with time-invariant explanatory variables. Indeed, as the estimator is built upon deviations of all variables from their individual means, all variables that are individual-specific (time-invariant) drop out of the equation to be estimated.

For these reasons, an estimation method based on instrumental variables is called for. As we will show, Instrumental-Variables (IV) estimators can be more efficient than the Within procedure, while allowing identification of all parameters in the model. To motivate their use, we present in this section an augmented model, in which some of the explanatory variables may be endogenous and some regressors are not time-varying but only individual-specific. Including individual-specific variables $z_i$ is indeed important from an empirical perspective, as many samples contain important information on individuals that does not vary over time (e.g., sex, education completed, place of residence if individuals have not moved during the whole sample period).

Hausman and Taylor (1981) – hereafter HT – consider the following model:

$$y_{it} = x_{it}\beta + z_i\gamma + \alpha_i + \varepsilon_{it}, \quad i = 1,\ldots,N;\; t = 1,\ldots,T, \tag{4.10}$$

where $\varepsilon_{it}$ is assumed to be uncorrelated with $x_{it}$, $z_i$ and $\alpha_i$, while the effects $\alpha_i$ may be correlated with some explanatory variables in $x_{it}$ and/or $z_i$.

Stacking all $NT$ observations, we can write (4.10) as $Y = X\beta + Z\gamma + \alpha + \varepsilon$, where $Y$ is $NT\times 1$, $X$ is $NT\times K$, $Z$ is $NT\times G$, and $\varepsilon$ and $\alpha$ are both $NT\times 1$. If $X$ and $Z$ are uncorrelated with $\alpha$, the Generalized Least Squares (GLS) estimator yields consistent and efficient parameter estimates:

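$$\hat\mu_{GLS} = \left[\frac{1}{\theta_1^2}\Phi'Q\Phi + \frac{1}{\theta_2^2}\Phi'B\Phi\right]^{-1}\left[\frac{1}{\theta_1^2}\Phi'QY + \frac{1}{\theta_2^2}\Phi'BY\right], \tag{4.11}$$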

where $\Phi = [X, Z]$ and $\mu' = [\beta', \gamma']$. This estimator is generally simpler to compute by first transforming $Y$, $X$ and $Z$ into $Y^* = \Omega^{-1/2}Y$, $X^* = \Omega^{-1/2}X$ and $Z^* = \Omega^{-1/2}Z$, and then estimating $\beta$ and $\gamma$ from the Ordinary Least Squares (OLS) regression of $Y^*$ on $X^*$ and $Z^*$. The estimated variance–covariance matrix of the GLS estimator $\hat\mu_{GLS}$ is:

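$$V(\hat\mu_{GLS}) = \left[\frac{1}{\theta_1^2}\Phi'Q\Phi + \frac{1}{\theta_2^2}\Phi'B\Phi\right]^{-1} = \sigma^2_\varepsilon\left[\Phi'Q\Phi + \frac{\theta_1^2}{\theta_2^2}\Phi'B\Phi\right]^{-1}, \tag{4.12}$$

where the variance components are estimated by $\hat\sigma^2_\varepsilon = \hat\theta_1^2 = \hat u'_W\hat u_W/(NT - K - G)$ and $\hat\theta_2^2 = \hat u'_B\hat u_B/(N - K)$, $\hat u_W$ and $\hat u_B$ being the within and the between residuals respectively.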
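The equivalence between (4.11) and OLS on $\Omega^{-1/2}$-transformed data can be checked numerically. This sketch treats $\theta_1^2$ and $\theta_2^2$ as known (in practice they are replaced by estimates) and uses arbitrary simulated data, since only the algebra is being compared.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 50, 4
theta1_sq, theta2_sq = 1.0, 3.0                    # hypothetical, treated as known
n = N * T
B = np.kron(np.eye(N), np.ones((T, T)) / T)        # Between: individual means
Q = np.eye(n) - B                                  # Within

X = rng.normal(size=(n, 2))                        # time-varying regressors
Z = np.repeat(np.column_stack([np.ones(N), rng.normal(size=N)]), T, axis=0)  # constant, z_i
Phi = np.column_stack([X, Z])
y = rng.normal(size=n)                             # any y: only the formulas are compared

# (4.11) directly
lhs = Phi.T @ Q @ Phi / theta1_sq + Phi.T @ B @ Phi / theta2_sq
rhs = Phi.T @ Q @ y / theta1_sq + Phi.T @ B @ y / theta2_sq
mu_formula = np.linalg.solve(lhs, rhs)

# OLS on data premultiplied by Omega^{-1/2} from (4.7)
W = Q / np.sqrt(theta1_sq) + B / np.sqrt(theta2_sq)
mu_ols, *_ = np.linalg.lstsq(W @ Phi, W @ y, rcond=None)
print(np.allclose(mu_formula, mu_ols))             # True
```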

4.3.2 Instrumental Variable Estimation

Following HT, we partition X and Z as follows:

$$X = [X_1, X_2] \quad\text{and}\quad Z = [Z_1, Z_2],$$

where $X_1$ is $NT\times k_1$, $X_2$ is $NT\times k_2$, $Z_1$ is $NT\times g_1$ and $Z_2$ is $NT\times g_2$, so that the model in matrix form is

$$Y = X_1\beta_1 + X_2\beta_2 + Z_1\gamma_1 + Z_2\gamma_2 + \alpha + \varepsilon. \tag{4.13}$$

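HT distinguish columns of $X$ and $Z$ that are asymptotically uncorrelated with $\alpha$ from those that are not. They assume, for fixed $T$ and $N\to\infty$, that

$$\operatorname{plim}\frac{1}{N}(BX_1)'\alpha = 0, \quad \operatorname{plim}\frac{1}{N}(BX_2)'\alpha \neq 0, \quad \operatorname{plim}\frac{1}{N}Z'_1\alpha = 0, \quad \operatorname{plim}\frac{1}{N}Z'_2\alpha \neq 0.$$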


The way to estimate model (4.13) using an IV procedure is to rely on the exogeneity conditions above to construct a matrix of instruments. However, the method differs from the standard one in the simultaneous-equations literature. In the latter, a single equation is often estimated which incorporates some endogenous variables among the regressors, and all exogenous variables in the system are used as instruments, including exogenous variables that do not enter the equation of interest. In our case, however, all the information is already contained in the single equation, meaning that we are able to construct instruments from variables in (4.13) alone. To see this, note that we are looking for instrumental variables not correlated with the individual effect $\alpha$. There are three ways such instruments may be found. First, the exogenous variables $X_1$ and $Z_1$ are readily available because of the exogeneity conditions given above. Second, we may also obtain additional instruments through transformations of the original exogenous variables, because such transformations are also exogenous. Third, we may also consider transformations of endogenous variables, provided these transformations are not correlated with $\alpha$.

An important aspect of panel data methods is that the required transformations are very easily obtained through the matrices $Q$ and $B$ defined before. Matrix $B$ computes individual means of variables across all time periods, leaving the individual component unchanged. Therefore $BX_1$ is clearly applicable as an instrument, whereas $BX_2$ would not be, because endogeneity in $X_2$ comes through the individual component, which is correlated with $\alpha$. The $Q$ matrix takes deviations from individual means, filtering out the individual component. Therefore, $QX_1$ and $QX_2$ are also valid instruments, although the original $X_2$ variable is endogenous.

These considerations led HT to propose an IV estimator for a model corresponding to our (4.13). Their instrument matrix $A^{HT}$ is the following:

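$$A^{HT} = (A^{HT}_1, A^{HT}_2),$$

where $A^{HT}_1 = (QX_1, QX_2)$ and $A^{HT}_2 = (BX_1, Z_1)$. Because $QB = 0$ and $BZ_1 = Z_1$, the two blocks are orthogonal ($A^{HT\prime}_1 A^{HT}_2 = 0$), and we can show that:

$$P[A^{HT}] = A^{HT}(A^{HT\prime}A^{HT})^{-1}A^{HT\prime} = P[A^{HT}_1] + P[A^{HT}_2]. \tag{4.14}$$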
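The decomposition (4.14) rests on the orthogonality of the two instrument blocks. The sketch below builds $A^{HT}_1$ and $A^{HT}_2$ from simulated $X_1$, $X_2$ and $Z_1$ with hypothetical dimensions and checks both properties; the helper name is ours.

```python
import numpy as np

def projection(A):
    """P[A] = A (A'A)^{-1} A' for a full-column-rank matrix A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(5)
N, T = 30, 4
n = N * T
B = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(n) - B

X1 = rng.normal(size=(n, 2))                         # exogenous, time-varying (k1 = 2)
X2 = rng.normal(size=(n, 1))                         # endogenous, time-varying (k2 = 1)
Z1 = np.repeat(rng.normal(size=(N, 1)), T, axis=0)   # exogenous, time-invariant (g1 = 1)

A1 = np.column_stack([Q @ X1, Q @ X2])
A2 = np.column_stack([B @ X1, Z1])
A = np.column_stack([A1, A2])
print(np.allclose(A1.T @ A2, 0.0))                                    # True
print(np.allclose(projection(A), projection(A1) + projection(A2)))    # True
```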

To compute the efficient HT estimator, we transform (4.13) by premultiplying it by $\Omega^{-1/2}$, so that the transformed error term has a scalar covariance matrix. Using HT instruments $A^{HT}_1 = (QX_1, QX_2)$ and $A^{HT}_2 = (BX_1, Z_1)$, the IV estimator can be written as:

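$$\hat\mu_{IV} = \left[\frac{1}{\theta_1^2}\Phi'P[A^{HT}_1]\Phi + \frac{1}{\theta_2^2}\Phi'P[A^{HT}_2]\Phi\right]^{-1}\left[\frac{1}{\theta_1^2}\Phi'P[A^{HT}_1]Y + \frac{1}{\theta_2^2}\Phi'P[A^{HT}_2]Y\right], \tag{4.15}$$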

and its variance–covariance matrix is

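$$\operatorname{Var}(\hat\mu_{IV}) = \left[\frac{1}{\theta_1^2}\Phi'P[A^{HT}_1]\Phi + \frac{1}{\theta_2^2}\Phi'P[A^{HT}_2]\Phi\right]^{-1} = \sigma^2_\varepsilon\left[\Phi'P[A^{HT}_1]\Phi + \frac{\theta_1^2}{\theta_2^2}\Phi'P[A^{HT}_2]\Phi\right]^{-1}. \tag{4.16}$$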
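A compact implementation of (4.15) is sketched below under the assumption that $\theta_1^2$ and $\theta_2^2$ are known; the data-generating process and the helper `ht_estimator` are hypothetical. It also confirms that (4.15) coincides with two-stage least squares on the $\Omega^{-1/2}$-transformed model using the instruments $(A^{HT}_1, A^{HT}_2)$.

```python
import numpy as np

def projection(A):
    """P[A] = A (A'A)^{-1} A' for a full-column-rank matrix A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def ht_estimator(y, Phi, A1, A2, theta1_sq, theta2_sq):
    """IV estimator (4.15), with the theta's taken as known."""
    P1, P2 = projection(A1), projection(A2)
    lhs = Phi.T @ P1 @ Phi / theta1_sq + Phi.T @ P2 @ Phi / theta2_sq
    rhs = Phi.T @ P1 @ y / theta1_sq + Phi.T @ P2 @ y / theta2_sq
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(6)
N, T = 100, 4
n = N * T
B = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(n) - B

alpha = np.repeat(rng.normal(size=N), T)                    # individual effect, stacked
X1 = rng.normal(size=(n, 2))                                # exogenous, time-varying
X2 = (0.5 * alpha + rng.normal(size=n))[:, None]            # endogenous, time-varying
Z1 = np.repeat(np.column_stack([np.ones(N), rng.normal(size=N)]), T, axis=0)
Z2 = B @ X1[:, :1] + 0.5 * alpha[:, None] + np.repeat(rng.normal(size=(N, 1)), T, axis=0)
Phi = np.column_stack([X1, X2, Z1, Z2])
theta1_sq, theta2_sq = 1.0, 1.0 + T * 1.0                   # sigma_eps^2 = sigma_alpha^2 = 1
y = Phi @ np.array([1.0, -1.0, 0.5, 2.0, 1.0, -0.5]) + alpha + rng.normal(size=n)

A1 = np.column_stack([Q @ X1, Q @ X2])
A2 = np.column_stack([B @ X1, Z1])
mu_ht = ht_estimator(y, Phi, A1, A2, theta1_sq, theta2_sq)

# 2SLS on the Omega^{-1/2}-transformed model with instruments (A1, A2)
W = Q / np.sqrt(theta1_sq) + B / np.sqrt(theta2_sq)
PA = projection(np.column_stack([A1, A2]))
ys, Phis = W @ y, W @ Phi
mu_2sls = np.linalg.solve(Phis.T @ PA @ Phis, Phis.T @ PA @ ys)
print(np.allclose(mu_ht, mu_2sls))                          # True
```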


Breusch, Mizon and Schmidt (1989), hereafter BMS, show that the same estimator obtains with any of the instrument matrices $A^{HT}$, $C^{HT}$ or $D^{HT}$ defined as follows:

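$$A^{HT} = (A^{HT}_1, A^{HT}_2), \quad A^{HT}_1 = (QX_1, QX_2), \quad A^{HT}_2 = (BX_1, Z_1),$$

or

$$C^{HT} = (C^{HT}_1, C^{HT}_2), \quad C^{HT}_1 = Q, \quad C^{HT}_2 = (X_1, Z_1),$$

or

$$D^{HT} = (D^{HT}_1, D^{HT}_2), \quad D^{HT}_1 = (QX_1, QX_2), \quad D^{HT}_2 = (X_1, Z_1).$$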

We do not go into the details of these equivalences here (see BMS, 1989). Note, however, that the superiority of IV over Within estimators is easily seen as far as the estimation of parameters $\beta$ is concerned. The fixed effects procedure amounts to using $Q$ alone as the instrument matrix (since $P[Q] = Q$). As an IV estimator is (weakly) more efficient when valid instruments are added, the Hausman–Taylor estimator is more efficient than the Within estimator, since it uses $(BX_1, Z_1)$ as additional instruments.

A final difficulty with IV estimators concerns the estimation of variance components, because endogeneity of some regressors yields inconsistent estimates of $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$ if the standard feasible GLS procedure is used. Hausman and Taylor (1981) describe a method for obtaining consistent estimates. Let $\hat\eta$ denote the Within residual averaged over time periods:

$$\hat\eta = BY - BX\hat\beta_W = \left(B - BX(X'QX)^{-1}X'Q\right)Y = Z\gamma + \alpha + B\varepsilon - BX(X'QX)^{-1}X'Q\varepsilon. \tag{4.17}$$

If the last three terms in the equation above are treated as zero-mean residuals, then OLS and GLS estimates of $\gamma$ will be inconsistent, because $\alpha$ is correlated with $Z_2$. However, consistent estimation is possible if the columns of $X_1$ provide sufficient instruments for the columns of $Z_2$. A necessary condition is that $k_1 \geq g_2$. The IV estimator of $\gamma$ is

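$$\hat\gamma_B = \left[Z'P[R]Z\right]^{-1}\left[Z'P[R]\hat\eta\right], \tag{4.18}$$

where $R = (X_1, Z_1)$. Now, using the parameter estimates $\hat\beta_W$ and $\hat\gamma_B$, one forms the residuals

$$\hat u_W = QY - QX\hat\beta_W \quad\text{and}\quad \hat u_B = BY - BX\hat\beta_W - Z\hat\gamma_B. \tag{4.19}$$

These two vectors of residuals are finally used to compute the variance components as follows.¹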

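$$\hat\sigma^2_\varepsilon = \frac{\hat u'_W\hat u_W}{NT - N} \quad\text{and}\quad \hat\sigma^2_\alpha = \frac{\hat u'_B\hat u_B}{N} - \frac{1}{T}\hat\sigma^2_\varepsilon.$$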
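The steps (4.17) to (4.19) can be sketched on simulated data where $X_2$ and $Z_2$ are correlated with $\alpha_i$, and $Z_2$ is related to $X_1$ through a shared component (our design, with $\sigma^2_\alpha = \sigma^2_\varepsilon = 1$). One assumption should be flagged: we read $\hat u'_B\hat u_B$ in the formula for $\hat\sigma^2_\alpha$ as a sum over the $N$ distinct individual values of $\hat u_B$, since the stacked $NT$ vector repeats each of them $T$ times.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 2000, 5
alpha = rng.normal(size=N)                              # sigma_alpha^2 = 1
c = rng.normal(size=N)                                  # shared component of x1 and z2
x1 = c[:, None] + rng.normal(size=(N, T))               # X1: exogenous, time-varying
x2 = 0.5 * alpha[:, None] + rng.normal(size=(N, T))     # X2: endogenous, time-varying
z1 = rng.normal(size=N)                                 # Z1: exogenous, time-invariant
z2 = c + 0.5 * alpha + rng.normal(size=N)               # Z2: endogenous, time-invariant
eps = rng.normal(size=(N, T))                           # sigma_eps^2 = 1
y = x1 + 0.5 * x2 + (1.0 + 2.0 * z1 - z2 + alpha)[:, None] + eps

# Within estimator of beta
X = np.stack([x1, x2], axis=2)                          # (N, T, K)
Xw = X - X.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
K = X.shape[2]
beta_w = np.linalg.lstsq(Xw.reshape(-1, K), yw.ravel(), rcond=None)[0]

# (4.17): time-averaged Within residual, stacked as an NT vector
eta_i = y.mean(axis=1) - X.mean(axis=1) @ beta_w
eta = np.repeat(eta_i, T)
Z = np.repeat(np.column_stack([np.ones(N), z1, z2]), T, axis=0)
R = np.column_stack([x1.ravel(), np.ones(N * T), np.repeat(z1, T)])   # R = (X1, Z1)

# (4.18): IV estimator of gamma, with P[R] Z obtained by least squares
PR_Z = R @ np.linalg.lstsq(R, Z, rcond=None)[0]
gamma_b = np.linalg.solve(PR_Z.T @ Z, PR_Z.T @ eta)

# (4.19) and the variance components
u_w = (yw - Xw @ beta_w).ravel()
u_b = eta_i - Z[::T] @ gamma_b            # one value per individual (constant over t)
s2_eps = u_w @ u_w / (N * T - N)
s2_alpha = u_b @ u_b / N - s2_eps / T
print(np.round(gamma_b, 2), round(s2_eps, 2), round(s2_alpha, 2))
```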

4.3.3 More Efficient IV Procedures

The Hausman–Taylor IV procedure has proved very popular because of its relative computational simplicity and intuitive appeal. Since then, however, there have been several improvements along its lines, which have led to more efficient estimation procedures.

¹ For details, see Hausman and Taylor (1981), p. 1384.

The instruments used by Hausman and Taylor require only minimal exogeneity assumptions on the variables, i.e., that $BX_1$ and $Z_1$ are not correlated with the individual effect. As a consequence, this estimator may not be the most efficient if more restrictive exogeneity conditions can be imposed. Amemiya and MaCurdy (1986), hereafter AM, suggested a potentially more efficient estimator by assuming that realizations of $X_1$ are not correlated with $\alpha$ in each time period, i.e., for all $t = 1,\ldots,T$ and $N\to\infty$, $\operatorname{plim}(1/N)\sum_{i=1}^N x'_{1,it}\alpha_i = 0$. Consequently, we may use as instruments for individual $i$ at time $t$ not only $BX_1$ but also the whole series $(x_{1,i1}, x_{1,i2},\ldots,x_{1,iT})$. AM define the following $NT\times Tk_1$ matrix:

$$X^*_1 = \begin{bmatrix} e_T\otimes x'_{1,1} \\ \vdots \\ e_T\otimes x'_{1,N} \end{bmatrix}, \quad\text{where } x_{1,i} = (x_{1,i1},\ldots,x_{1,iT})',$$

which is such that $QX^*_1 = 0$ and $BX^*_1 = X^*_1$. Their instrument matrix is $A^{AM} = (A^{AM}_1, A^{AM}_2)$, where $A^{AM}_1 = (QX_1, QX_2)$ and $A^{AM}_2 = (X^*_1, Z_1)$. An equivalent estimator obtains by using the matrix $C^{AM} = (C^{AM}_1, C^{AM}_2)$, where $C^{AM}_1 = (QX_1, QX_2)$ and $C^{AM}_2 = [(QX_1)^*, BX_1, Z_1]$, $(QX_1)^*$ being constructed the same way as $X^*_1$ above.

These authors suggest that their estimator is at least as efficient as Hausman–Taylor if individual effects are not correlated with regressors X1 for each time period.

Note that the AM estimator differs from the HT estimator only in its treatment of $X_1$. In fact, $A^{HT}_1 = A^{AM}_1$, and $C^{AM}_2 = ((QX_1)^*, BX_1, Z_1)$ differs from $A^{HT}_2 = (BX_1, Z_1)$ only through $(QX_1)^*$. In other words, HT use each variable in $X_1$ as two instruments, namely $QX_1$ and $BX_1$, whereas AM use each such variable as $T+1$ instruments: $(QX_1)^*$ and $BX_1$.

Finally, a third IV method was described by BMS. Following these authors, if the variables in $X_2$ are correlated with the effects only through a time-invariant component, then $QX_2$ does not contain this component and $(QX_2)^*$ is a valid instrument. Their estimator is thus based on the following instrument matrix: $A^{BMS} = (A^{BMS}_1, A^{BMS}_2)$, where $A^{BMS}_1 = (QX_1, QX_2)$ and $A^{BMS}_2 = [(QX_1)^*, (QX_2)^*, BX_1, Z_1]$.
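The starred instruments can be built by repeating each individual's full history on all of that individual's rows. The sketch below uses a hypothetical helper `star` and simulated data to construct $X^*_1$, $(QX_1)^*$ and $(QX_2)^*$, checks that $QX^*_1 = 0$ and $BX^*_1 = X^*_1$, and assembles the second block of $A^{BMS}$.

```python
import numpy as np

def star(v):
    """Hypothetical helper: (N, T) array -> (N*T, T) matrix whose T rows for
    individual i all equal (v_i1, ..., v_iT), i.e. e_T kron v_i'."""
    T = v.shape[1]
    return np.repeat(v, T, axis=0)

rng = np.random.default_rng(8)
N, T = 10, 3
n = N * T
B = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(n) - B

x1 = rng.normal(size=(N, T))          # one exogenous time-varying variable (k1 = 1)
x2 = rng.normal(size=(N, T))          # one endogenous time-varying variable (k2 = 1)
z1 = rng.normal(size=N)               # one exogenous time-invariant variable (g1 = 1)

X1_star = star(x1)                                       # AM: X1*
QX1_star = star(x1 - x1.mean(axis=1, keepdims=True))     # (QX1)*
QX2_star = star(x2 - x2.mean(axis=1, keepdims=True))     # (QX2)*, used by BMS
BX1 = np.repeat(x1.mean(axis=1), T)[:, None]
Z1 = np.repeat(z1, T)[:, None]
A_bms2 = np.column_stack([QX1_star, QX2_star, BX1, Z1])  # second block of A^BMS

print(np.allclose(Q @ X1_star, 0.0), np.allclose(B @ X1_star, X1_star))  # True True
print(np.linalg.matrix_rank(QX1_star), np.linalg.matrix_rank(A_bms2))   # 2 6
```

Since within deviations sum to zero over time, only $T-1$ columns of each $(QX)^*$ block are linearly independent, so a generalized inverse (for instance `np.linalg.pinv`) or dropping one column per variable is needed when forming $P[A]$.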

The estimated variance–covariance matrix of the IV estimator $\hat\mu_{IV}$ has the same form as in (4.16), where $\sigma^2_\varepsilon$ is estimated by $\hat\sigma^2 = \hat u'_{IV}\hat u_{IV}/(NT - K - G)$ and $\hat u_{IV}$ is the IV residual.

The Hausman test statistic can be used to check the validity of the alternative IV estimators described above. The HT-IV estimator can first be compared with fixed effects, to check that the exogeneity assumptions on $X_1$ and $Z_1$ are valid. If this is the case, the more efficient AM-IV and BMS-IV procedures can then be compared with HT-IV to check that the additional assumptions described above are supported by the data. See Cornwell and Rupert (1988) for an illustration of these test procedures.

4.4 A Measure of Instrument Relevance

It may be interesting in practice to investigate the performance of instruments in terms of the efficiency of IV estimators on an individual-regressor basis. Cornwell and Rupert (1988) and Baltagi and Khanti-Akom (1990) have investigated efficiency gains of instrumental variable estimators by fitting a wage equation on panel data and applying the methods proposed by HT, AM and BMS. Cornwell and Rupert (1988) found that efficiency gains are limited to the coefficients of the time-invariant endogenous variables $Z_2$.

However, Baltagi and Khanti-Akom (1990), using canonical correlation coefficients to compare different sets of instrumental variables, found that efficiency gains are not limited to the time-invariant variables. They also show that the geometric average of canonical correlations increases as one moves from HT to AM, and then from AM to BMS. Canonical correlations, however, only measure instrument relevance for the group of endogenous regressors taken as a whole; they cannot be used to measure how a particular group of instruments affects relevance for one endogenous regressor as opposed to another.

More recently, Boumahdi and Thomas (2006) have extended the method proposed by Shea (1997) and Godfrey (1999) to the case of panel data. This method allows one to measure instrument relevance for separate endogenous regressors. Following Shea (1997) and Godfrey (1999), we consider estimation of a single parameter by rewriting the augmented model $Y = X\beta + Z\gamma + \alpha + \varepsilon$ as

$$Y = M\delta + \alpha + \varepsilon = M_1\delta_1 + M_2\delta_2 + \alpha + \varepsilon, \tag{4.20}$$

where $M = [X, Z]$ and $\delta' = [\beta', \gamma']$, $M_1$ is $NT\times 1$ and $M_2$ is $NT\times(K + G - 1)$. Define $\tilde M_1 = (I_{NT} - P[M_2])M_1$, $\tilde{\hat M}_1 = (I_{NT} - P[\hat M_2])\hat M_1$ and $\hat M_j = P[A]M_j$, $j = 1, 2$, where $A$ is the matrix of instruments. In our panel data model, $\delta_1$ would for example correspond to the first variable in $\Omega^{-1/2}X_2$. These definitions imply that $\tilde M'_1\tilde{\hat M}_1 = \tilde{\hat M}'_1\tilde{\hat M}_1$. Using the same idea as Shea (1997) and Godfrey (1999) in the case of a linear multiple regression model, we can use as a measure of instrumental variable relevance the population squared correlation between $\tilde M_1$ and $\tilde{\hat M}_1$:

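$$\rho^2_p = \operatorname{plim}\frac{\left(\tilde M'_1\tilde{\hat M}_1\right)^2}{\left(\tilde M'_1\tilde M_1\right)\left(\tilde{\hat M}'_1\tilde{\hat M}_1\right)} = \operatorname{plim}\frac{\tilde{\hat M}'_1\tilde{\hat M}_1}{\tilde M'_1\tilde M_1}. \tag{4.21}$$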

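In applied work, provided $N$ is large, we can approximate $\operatorname{plim}\tilde{\hat M}'_1\tilde{\hat M}_1/\tilde M'_1\tilde M_1$ by the coefficient

$$R^2_p = \frac{\tilde{\hat M}'_1\tilde{\hat M}_1}{\tilde M'_1\tilde M_1}. \tag{4.22}$$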

It is not necessary in practice to compute the above expression, because the coefficient $R^2_p$ is directly related to the estimated parameter standard errors. To see this, consider the estimated variance of the first component of $\hat\delta_{GLS}$ and the corresponding component of $\hat\delta_{IV}$:

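$$\hat V\left(\hat\delta^{GLS}_1\right) = \hat\sigma^2_{\eta,GLS}\left(\tilde M'_1\tilde M_1\right)^{-1}, \qquad \hat V\left(\hat\delta^{IV}_1\right) = \hat\sigma^2_{\eta,IV}\left(\tilde{\hat M}'_1\tilde{\hat M}_1\right)^{-1}.$$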


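Then, $R^2_p$ can be written as

$$R^2_p = \frac{\hat\sigma^2_{\eta,IV}\,\hat V\left(\hat\delta^{GLS}_1\right)}{\hat\sigma^2_{\eta,GLS}\,\hat V\left(\hat\delta^{IV}_1\right)} = \frac{\tilde{\hat M}'_1\tilde{\hat M}_1}{\tilde M'_1\tilde M_1}. \tag{4.23}$$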

Consequently, the measure of instrumental variable relevance can be obtained directly by inspecting individual parameter (squared) standard errors.
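The equality between (4.22) and the variance ratio in (4.23) can be checked numerically. This sketch assumes the data have already been premultiplied by $\Omega^{-1/2}$, so that GLS reduces to OLS, and uses simulated regressors and instruments; the helper `proj` is ours.

```python
import numpy as np

def proj(A, V):
    """P[A] V, computed by least squares."""
    return A @ np.linalg.lstsq(A, V, rcond=None)[0]

rng = np.random.default_rng(9)
n = 500
A = rng.normal(size=(n, 4))                                  # instruments (hypothetical)
M = np.column_stack([A[:, 0] + rng.normal(size=n),           # M1: regressor of interest
                     A[:, 1] + 0.5 * A[:, 0] + rng.normal(size=n),
                     A[:, 2] + rng.normal(size=n)])          # M2: the other regressors
M_hat = proj(A, M)                                           # M_hat = P[A] M
M1, M2 = M[:, 0], M[:, 1:]
M1_hat, M2_hat = M_hat[:, 0], M_hat[:, 1:]

M1_tilde = M1 - proj(M2, M1)                                 # (I - P[M2]) M1
M1_hat_tilde = M1_hat - proj(M2_hat, M1_hat)                 # (I - P[M2_hat]) M1_hat
r2_direct = (M1_hat_tilde @ M1_hat_tilde) / (M1_tilde @ M1_tilde)   # (4.22)

# (4.23): ratio of the (1,1) elements of (M'M)^{-1} and (M_hat'M_hat)^{-1};
# the residual variances cancel as in the text (Frisch-Waugh-Lovell)
v_ls = np.linalg.inv(M.T @ M)[0, 0]
v_iv = np.linalg.inv(M_hat.T @ M_hat)[0, 0]
r2_from_var = v_ls / v_iv
print(np.isclose(r2_direct, r2_from_var))                    # True
```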

4.5 Incorporating Time-Varying Regressors

Wyhowski (1994) and Boumahdi and Thomas (1997) have extended the augmented model by incorporating time-varying regressors, i.e., variables that are not individual-specific but only time-period-specific. Think for example of a wage equation depending on individual-specific variables such as sex and education, and on time-varying regressors such as the unemployment rate, the economy-wide growth rate, etc. The intuition behind such a model is that all individuals are, on average, affected in the same way by macro-economic variables. Consider the two-way error component model (case 2 above):

$$u_{it} = \alpha_i + \lambda_t + \varepsilon_{it}. \tag{4.24}$$

The extended model we are considering is now the following:

$$y_{it} = x_{it}\beta + z_i\gamma + w_t\delta + \alpha_i + \lambda_t + \varepsilon_{it}, \quad i = 1,\ldots,N;\; t = 1,\ldots,T, \tag{4.25}$$

where x′it is a K×1 vector of time-varying explanatory variables, z′i is a G×1 vec-tor of time-invariant explanatory variables, and w′t is a H× 1 vector of individual-invariant explanatory variables. Unobserved effects αi and λt are assumed to havezero mean and variances σ2

α and σ2λ respectively. We assume further that E(εit) =

0, E(εitεis) = σ2ε for t = s, E(εit εis) = 0 otherwise and E(αiεit) = E(λtεit ) = 0∀i,∀t.

Stacking all NT observations we can write the model in a compact form as:

Y = Xβ + Zγ +Wδ + α + λ + ε. (4.26)

Let us introduce some notation for this model. As before, $B$ is the Between matrix transforming variables into their means across periods (individual means); we now define $\bar B$ as a matrix transforming a variable into its mean across individuals (time mean). Hence, $BY$ is time-invariant and individual-specific, whereas $\bar BY$ is time-varying and the same for all individuals. Let

$$B = I_N \otimes \frac{1}{T}e_Te_T', \qquad \bar B = \frac{1}{N}e_Ne_N' \otimes I_T,$$

$$Q = I_{NT} - B - \bar B + J, \qquad J = \frac{1}{NT}e_{NT}e_{NT}' = B\bar B.$$


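The new matrix $Q$ removes from a given variable both its time mean and its individual mean. The operator $J$ computes the overall mean of a variable, i.e., $JX$ is an $NT\times 1$ vector whose elements all equal $\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T X_{it}$. With this notation, the variance–covariance matrix of the error term $U$ reads:

$$\Omega = \theta_1^2S_1 + \theta_2^2S_2 + \theta_3^2S_3 + \theta_4^2J, \qquad (4.27)$$

where $\theta_1^2 = \sigma_\varepsilon^2$, $\theta_2^2 = \sigma_\varepsilon^2 + T\sigma_\alpha^2$, $\theta_3^2 = \sigma_\varepsilon^2 + N\sigma_\lambda^2$, $\theta_4^2 = \sigma_\varepsilon^2 + N\sigma_\lambda^2 + T\sigma_\alpha^2$, $S_1 = I_{NT} - S_2 - S_3 - J$ (so that $S_1 = Q$), $S_2 = B - J$ and $S_3 = \bar B - J$, with $S_kS_l = 0$ for $l \neq k$, $S_kS_k = S_k$, $S_kJ = 0$ and $JJ' = J$, $k, l = 1, 2, 3$.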

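It is easy to show that²

$$\Omega^{-1} = \frac{1}{\theta_1^2}S_1 + \frac{1}{\theta_2^2}S_2 + \frac{1}{\theta_3^2}S_3 + \frac{1}{\theta_4^2}J, \qquad (4.28)$$

and

$$\Omega^{-1/2} = \frac{1}{\theta_1}S_1 + \frac{1}{\theta_2}S_2 + \frac{1}{\theta_3}S_3 + \frac{1}{\theta_4}J. \qquad (4.29)$$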
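The spectral decomposition (4.27)–(4.29) can be checked numerically. The NumPy sketch below, with hypothetical values of $N$, $T$ and the variance components, builds $B$, $\bar B$, $J$ and $S_1, S_2, S_3$ as Kronecker products (observations stacked with $t$ running fastest within $i$), verifies that (4.27) reproduces $\Omega = \sigma_\varepsilon^2I_{NT} + T\sigma_\alpha^2B + N\sigma_\lambda^2\bar B$, and checks (4.28) and (4.29). Substituting the definitions of $S_1, S_2, S_3$ into (4.29) gives $\theta_1\Omega^{-1/2} = I - (1-\theta_1/\theta_2)B - (1-\theta_1/\theta_3)\bar B + (1-\theta_1/\theta_2-\theta_1/\theta_3+\theta_1/\theta_4)J$, so $\Omega^{-1/2}y$ can also be computed from means alone; the helper `omega_inv_half_apply` is a hypothetical name for that shortcut.

```python
import numpy as np

N, T = 4, 3
s2_eps, s2_alpha, s2_lambda = 1.0, 0.5, 0.3   # hypothetical variance components

e_N, e_T = np.ones((N, 1)), np.ones((T, 1))
I = np.eye(N * T)
B = np.kron(np.eye(N), e_T @ e_T.T / T)        # individual means
B_bar = np.kron(e_N @ e_N.T / N, np.eye(T))    # time means
J = np.ones((N * T, N * T)) / (N * T)          # overall mean

S2 = B - J
S3 = B_bar - J
S1 = I - S2 - S3 - J                           # equals Q = I - B - B_bar + J

th_sq = np.array([s2_eps,
                  s2_eps + T * s2_alpha,
                  s2_eps + N * s2_lambda,
                  s2_eps + T * s2_alpha + N * s2_lambda])
th = np.sqrt(th_sq)
ops = [S1, S2, S3, J]

Omega = s2_eps * I + T * s2_alpha * B + N * s2_lambda * B_bar   # footnote 2
Omega_spec = sum(c * Sk for c, Sk in zip(th_sq, ops))           # (4.27)
Omega_inv = sum(Sk / c for c, Sk in zip(th_sq, ops))            # (4.28)
Omega_inv_half = sum(Sk / c for c, Sk in zip(th, ops))          # (4.29)

def omega_inv_half_apply(y, th):
    """Apply Omega^{-1/2} to an (N, T) array using only individual, time and overall means."""
    t1, t2, t3, t4 = th
    yi = y.mean(axis=1, keepdims=True)
    yt = y.mean(axis=0, keepdims=True)
    yg = y.mean()
    return (y - (1 - t1 / t2) * yi - (1 - t1 / t3) * yt
            + (1 - t1 / t2 - t1 / t3 + t1 / t4) * yg) / t1

y = np.random.default_rng(1).normal(size=(N, T))
fast = omega_inv_half_apply(y, th).ravel()     # means only, no NT x NT matrix
slow = Omega_inv_half @ y.ravel()              # explicit matrix product
```

In larger panels only the mean-based version is practical, since the dense operators have $(NT)^2$ entries.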

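If we assume that $X$, $Z$ and $W$ are uncorrelated with $\alpha$ and $\lambda$, then the model parameters can be estimated by GLS as follows:

$$\hat\nu_{GLS} = \left[\sum_{k=1}^{3}\frac{1}{\theta_k^2}\Psi'S_k\Psi\right]^{-1}\left[\sum_{k=1}^{3}\frac{1}{\theta_k^2}\Psi'S_kY\right], \qquad (4.30)$$

where $\Psi = [X, Z, W]$ and $\nu' = [\beta', \gamma', \delta']$.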
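A minimal NumPy sketch of (4.30), assuming the $\theta_k^2$ are known and using hypothetical simulated data in which $X$, $Z$ and $W$ are uncorrelated with $\alpha$ and $\lambda$. Since $\Psi'S_k\Psi = (S_k\Psi)'(S_k\Psi)$, each term only requires demeaned data. The sum runs over $k = 1, 2, 3$ exactly as printed in (4.30); because $S_1$, $S_2$ and $S_3$ annihilate constants, $\Psi$ should not contain an intercept in this sketch.

```python
import numpy as np

def s_transforms(v):
    """Return (S1 v, S2 v, S3 v) for v of shape (N, T, p), each reshaped to (N*T, p)."""
    N, T, p = v.shape
    vi = v.mean(axis=1, keepdims=True)          # individual means
    vt = v.mean(axis=0, keepdims=True)          # time means
    vg = v.mean(axis=(0, 1), keepdims=True)     # overall mean
    s1 = v - vi - vt + vg
    s2 = np.broadcast_to(vi - vg, v.shape)
    s3 = np.broadcast_to(vt - vg, v.shape)
    return [s.reshape(N * T, p) for s in (s1, s2, s3)]

def gls_two_way(Psi, Y, th_sq):
    """Eq. (4.30). Psi: (N, T, p), Y: (N, T), th_sq: (theta_1^2, theta_2^2, theta_3^2)."""
    p = Psi.shape[2]
    lhs, rhs = np.zeros((p, p)), np.zeros(p)
    for Sk_Psi, Sk_Y, c in zip(s_transforms(Psi), s_transforms(Y[..., None]), th_sq):
        lhs += Sk_Psi.T @ Sk_Psi / c
        rhs += Sk_Psi.T @ Sk_Y[:, 0] / c
    return np.linalg.solve(lhs, rhs)

# Hypothetical data: x_it (K = 2), z_i (G = 1), w_t (H = 1)
rng = np.random.default_rng(2)
N, T = 30, 10
s2_eps, s2_alpha, s2_lambda = 1.0, 0.5, 0.3
x = rng.normal(size=(N, T, 2))
z = np.repeat(rng.normal(size=(N, 1, 1)), T, axis=1)
w = np.repeat(rng.normal(size=(1, T, 1)), N, axis=0)
Psi = np.concatenate([x, z, w], axis=2)
nu = np.array([1.0, -0.5, 2.0, 0.7])            # (beta', gamma', delta')

signal = Psi @ nu
Y = (signal + np.sqrt(s2_alpha) * rng.normal(size=(N, 1))
     + np.sqrt(s2_lambda) * rng.normal(size=(1, T))
     + np.sqrt(s2_eps) * rng.normal(size=(N, T)))
th_sq = (s2_eps, s2_eps + T * s2_alpha, s2_eps + N * s2_lambda)

nu_noiseless = gls_two_way(Psi, signal, th_sq)  # exact recovery without disturbances
nu_hat = gls_two_way(Psi, Y, th_sq)
print(np.round(nu_hat, 3))
```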

4.5.1 Instrumental Variables Estimation

Following HT and Wyhowski (1994), we allow for correlation between a subset of $(X, Z, W)$ and $(\alpha, \lambda)$, and we partition $X$, $Z$ and $W$ as follows:

$$X = (X_1, X_2, X_3, X_4), \qquad Z = (Z_1, Z_2), \qquad W = (W_1, W_2).$$

Their dimensions are denoted $k_1, k_2, k_3, k_4, g_1, g_2, h_1$ and $h_2$ for $X_1, X_2, X_3, X_4, Z_1, Z_2, W_1$ and $W_2$, respectively. Furthermore, we assume that $X_1$ is correlated with neither $\alpha$ nor $\lambda$, $X_2$ is correlated with $\alpha$ but not with $\lambda$, $X_3$ is correlated with $\lambda$ but not with $\alpha$, and $X_4$ is correlated with both $\lambda$ and $\alpha$.³

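In addition, $Z_1$ is assumed uncorrelated with $\alpha$ and $W_1$ uncorrelated with $\lambda$. In other words, and following Wyhowski (1994), we assume that, for $T$ fixed and $N \to \infty$:

$$\operatorname{plim}(S_2X_1)'\alpha = 0, \qquad \operatorname{plim}(S_2X_3)'\alpha = 0, \qquad \operatorname{plim}(S_2Z_1)'\alpha = 0,$$

and, for $N$ fixed and $T \to \infty$:

$$\operatorname{plim}(S_3X_1)'\lambda = 0, \qquad \operatorname{plim}(S_3X_2)'\lambda = 0, \qquad \operatorname{plim}(S_3W_1)'\lambda = 0.$$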

² We can also show that $\Omega = \sigma_\varepsilon^2I_{NT} + T\sigma_\alpha^2B + N\sigma_\lambda^2\bar B$.

³ Boumahdi and Thomas (1997) have considered another partition of $X$, $Z$ and $W$.


Under these assumptions, we can use the following instrument set:

$$A^{HT} = (A_1^{HT}, A_2^{HT}, A_3^{HT}),$$

where $A_1^{HT} = (S_1X)$, $A_2^{HT} = (S_2X_1, S_2X_3, S_2Z_1)$ and $A_3^{HT} = (S_3X_1, S_3X_2, S_3W_1)$.

Then the HT estimator can be written as:

$$\hat\nu_{IV} = \left[\sum_{k=1}^{3}\frac{1}{\theta_k^2}\Psi'P[A_k^{HT}]\Psi\right]^{-1}\left[\sum_{k=1}^{3}\frac{1}{\theta_k^2}\Psi'P[A_k^{HT}]Y\right], \qquad (4.31)$$

where $A_k^{HT}$, $k = 1, 2, 3$, are the instrument matrices. The order condition for existence of the estimator is obtained by counting instruments and parameters to be estimated. For the parameters $\gamma$ we must have

$$K + k_1 + k_3 + g_1 \geq K + G \quad\text{or}\quad k_1 + k_3 \geq g_2,$$

and for the parameters $\delta$ we must have

$$K + k_1 + k_2 + h_1 \geq K + H \quad\text{or}\quad k_1 + k_2 \geq h_2,$$

where $K = k_1 + k_2 + k_3 + k_4$, $G = g_1 + g_2$ and $H = h_1 + h_2$.

Now, if we assume that $\operatorname{plim}(S_2X_1)'\alpha = 0$ and $\operatorname{plim}(S_2X_3)'\alpha = 0$ for each $t = 1, \ldots, T$, then, following AM, $X_1$ and $X_3$ can be used as instruments in two forms: $(S_1X_1, S_1X_3)$ and $(X_1^*, X_3^*)$. $X_1^*$ is the $NT\times Tk_1$ matrix defined as in the one-way AM case presented above, and

$$X_3^* = \begin{pmatrix} e_T \otimes x_{3,1}' \\ \vdots \\ e_T \otimes x_{3,N}' \end{pmatrix}, \qquad \text{where } x_{3,i} = (x_{3,i1}, \ldots, x_{3,iT})'.$$
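The stacking behind $X^*$ can be made concrete with a short NumPy sketch. It assumes the data are held in an array of shape $(N, T, k)$, with observations stacked with $t$ running fastest within $i$ as in the definitions of $B$ and $\bar B$, so that every row of individual $i$ contains the whole history $x_i' = (x_{i1}', \ldots, x_{iT}')$. The helper names `am_star` and `two_way_ops` are hypothetical. Because $X^*$ is constant within individuals, the sketch also checks that $S_1X^* = 0$ and $S_3X^* = 0$, i.e., these instruments only carry between-individual variation.

```python
import numpy as np

def am_star(x):
    """Build the AM matrix X* (NT x Tk) from x of shape (N, T, k):
    block i is e_T kron x_i', i.e. individual i's full history repeated T times."""
    N, T, k = x.shape
    return np.repeat(x.reshape(N, T * k), T, axis=0)

def two_way_ops(N, T):
    """Return (S1, S2, S3) as dense NT x NT matrices (small N, T only)."""
    I = np.eye(N * T)
    B = np.kron(np.eye(N), np.ones((T, T)) / T)
    B_bar = np.kron(np.ones((N, N)) / N, np.eye(T))
    J = np.ones((N * T, N * T)) / (N * T)
    return I - B - B_bar + J, B - J, B_bar - J

N, T, k = 3, 4, 2
x3 = np.random.default_rng(4).normal(size=(N, T, k))
X3_star = am_star(x3)
S1, S2, S3 = two_way_ops(N, T)
```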

Furthermore, if we assume that $\operatorname{plim}(S_3X_1)'\lambda = 0$ and $\operatorname{plim}(S_3X_2)'\lambda = 0$ for each $i = 1, \ldots, N$, then $X_1$ and $X_2$ can be used as instruments in two forms, $(S_1X_1, S_1X_2)$ and $(X_1^0, X_2^0)$, where

$$X_1^0 = \begin{pmatrix}
x_{1,11} & x_{1,21} & \cdots & x_{1,N1} \\
\vdots & \vdots & & \vdots \\
x_{1,1T} & x_{1,2T} & \cdots & x_{1,NT} \\
\vdots & \vdots & & \vdots \\
x_{1,11} & x_{1,21} & \cdots & x_{1,N1} \\
\vdots & \vdots & & \vdots \\
x_{1,1T} & x_{1,2T} & \cdots & x_{1,NT}
\end{pmatrix}
\quad\text{and}\quad
X_2^0 = \begin{pmatrix}
x_{2,11} & x_{2,21} & \cdots & x_{2,N1} \\
\vdots & \vdots & & \vdots \\
x_{2,1T} & x_{2,2T} & \cdots & x_{2,NT} \\
\vdots & \vdots & & \vdots \\
x_{2,11} & x_{2,21} & \cdots & x_{2,N1} \\
\vdots & \vdots & & \vdots \\
x_{2,1T} & x_{2,2T} & \cdots & x_{2,NT}
\end{pmatrix},$$

i.e., the $T$-row block listing, for each period, the values of all $N$ individuals is repeated once for every individual.
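Analogously, a hypothetical helper `am_zero` can build $X^0$ from an $(N, T, k)$ array: the $T\times Nk$ block whose row $t$ lists $x_{1t}, \ldots, x_{Nt}$ is repeated once per individual. Because $X^0$ does not vary across individuals, $S_1X^0 = 0$ and $S_2X^0 = 0$, so these instruments only carry between-period variation. A minimal NumPy sketch under the same stacking assumption ($t$ fastest within $i$):

```python
import numpy as np

def am_zero(x):
    """Build X^0 (NT x Nk) from x of shape (N, T, k): row (i, t) lists x_{1t}, ..., x_{Nt}."""
    N, T, k = x.shape
    block = x.transpose(1, 0, 2).reshape(T, N * k)   # row t: values of all individuals at t
    return np.tile(block, (N, 1))                    # one copy of the block per individual

N, T, k = 3, 4, 2
x2 = np.random.default_rng(5).normal(size=(N, T, k))
X2_zero = am_zero(x2)

# Dense two-way operators, only to verify the properties on a small example
I = np.eye(N * T)
B = np.kron(np.eye(N), np.ones((T, T)) / T)
B_bar = np.kron(np.ones((N, N)) / N, np.eye(T))
J = np.ones((N * T, N * T)) / (N * T)
S1, S2 = I - B - B_bar + J, B - J
```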


In this case, the AM instruments can be defined as follows:

$$A^{AM} = (A_1^{AM}, A_2^{AM}, A_3^{AM}),$$

where $A_1^{AM} = (S_1X) = A_1^{HT}$, $A_2^{AM} = [A_2^{HT}, (SX_1, SX_3)^*]$ and $A_3^{AM} = [A_3^{HT}, (\bar SX_1, \bar SX_2)^0]$, with $S = I_{NT} - S_2 - J$ and $\bar S = I_{NT} - S_3 - J$. The order condition for $\gamma$ becomes

$$K + k_1 + k_3 + g_1 + (T-1)(k_1 + k_3) \geq K + G \quad\text{or}\quad T(k_1 + k_3) \geq g_2,$$

and for the parameters $\delta$ we must have

$$K + k_1 + k_2 + h_1 + (N-1)(k_1 + k_2) \geq K + H \quad\text{or}\quad N(k_1 + k_2) \geq h_2.$$

Now, following BMS, if $X_2$ and $X_4$ are correlated with the individual effect $\alpha$ only through a time-invariant component, and if $X_3$ and $X_4$ are correlated with the time effect $\lambda$ only through an individual-invariant component, the BMS-like instruments are equivalent to the expanded instrument sets

$$A^{BMS} = (A_1^{BMS}, A_2^{BMS}, A_3^{BMS}),$$

where, with $S$ and $\bar S$ as defined above, $A_1^{BMS} = (S_1X) = A_1^{HT} = A_1^{AM}$, $A_2^{BMS} = [A_2^{AM}, (SX_2, SX_4)^*]$ and $A_3^{BMS} = [A_3^{AM}, (\bar SX_3, \bar SX_4)^0]$. The order condition for these instruments is $T(k_1 + k_3) + (T-1)(k_2 + k_4) \geq g_2$ for $\gamma$ and $N(k_1 + k_2) + (N-1)(k_3 + k_4) \geq h_2$ for $\delta$.
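The three order conditions can be compared with a small bookkeeping function. The plain-Python sketch below (the function name and the example dimensions are hypothetical) returns the excess of instruments over parameters for $\gamma$ and for $\delta$ under HT, AM and BMS instruments; a negative value signals that the order condition fails.

```python
def order_condition_excess(k, g2, h2, N, T):
    """Excess of instruments over parameters for (gamma, delta).

    k = (k1, k2, k3, k4); g2 and h2 are the numbers of time-invariant and
    individual-invariant regressors correlated with the effects.
    A value >= 0 means the corresponding order condition holds.
    """
    k1, k2, k3, k4 = k
    return {
        "HT": (k1 + k3 - g2, k1 + k2 - h2),
        "AM": (T * (k1 + k3) - g2, N * (k1 + k2) - h2),
        "BMS": (T * (k1 + k3) + (T - 1) * (k2 + k4) - g2,
                N * (k1 + k2) + (N - 1) * (k3 + k4) - h2),
    }

# Hypothetical dimensions: one regressor in each of X1, X2, X4; none in X3
excess = order_condition_excess(k=(1, 1, 0, 1), g2=2, h2=2, N=100, T=5)
for name, (eg, ed) in excess.items():
    print(f"{name}: gamma {'ok' if eg >= 0 else 'fails'} ({eg}), "
          f"delta {'ok' if ed >= 0 else 'fails'} ({ed})")
```

With these dimensions the HT condition for $\gamma$ fails, while the AM and BMS sets identify both $\gamma$ and $\delta$, illustrating how the period-by-period instruments relax the order condition.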

In order to compute $\hat\nu_{IV}$, we must first estimate the parameters $\theta_1^2$, $\theta_2^2$ and $\theta_3^2$. To do this, we can use consistent estimates of $\beta$, $\gamma$ and $\delta$; the variance component estimates derived from them are then used to compute $\hat\nu_{IV}$. The complete procedure can be summarized as follows:

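• Compute the within estimator $\hat\beta_W = (X'S_1X)^{-1}(X'S_1Y)$ and form the vector of residuals $\hat u_W = S_1Y - S_1X\hat\beta_W$ to compute

$$\hat\theta_1^2 = \hat\sigma_\varepsilon^2 = \frac{\hat u_W'\hat u_W}{(N-1)(T-1) - K}. \qquad (4.32)$$

• Regress $S_2Y - S_2X\hat\beta_W$ on $P_{A_2}Z$ to get a consistent estimate $\hat\gamma_{IV}$, and form the residual vector $\hat u_2 = S_2Y - S_2X\hat\beta_W - S_2Z\hat\gamma_{IV}$. It can be shown that, for fixed $T$ and $N \to \infty$, $\operatorname{plim}(\hat u_2'\hat u_2/N) = \theta_2^2$.

• Regress $S_3Y - S_3X\hat\beta_W$ on $P_{A_3}W$ to get a consistent estimate $\hat\delta_{IV}$, and form the residual vector $\hat u_3 = S_3Y - S_3X\hat\beta_W - S_3W\hat\delta_{IV}$. It can be shown that, for fixed $N$ and $T \to \infty$, $\operatorname{plim}(\hat u_3'\hat u_3/T) = \theta_3^2$.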
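The three steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: data are stored as $(N, T, \cdot)$ arrays, $S_1, S_2, S_3$ are applied through means rather than as $NT\times NT$ matrices, and the simulated example treats all regressors as exogenous, so that the hypothetical instrument sets are $A_2 = S_2(X, Z)$ and $A_3 = S_3(X, W)$. In applications, $A_2$ and $A_3$ would be the HT, AM or BMS instruments defined above.

```python
import numpy as np

def S(v, k):
    """Apply S1 (two-way within), S2 (= B - J) or S3 (= B_bar - J) to v of shape (N, T, p)."""
    vi = v.mean(axis=1, keepdims=True)
    vt = v.mean(axis=0, keepdims=True)
    vg = v.mean(axis=(0, 1), keepdims=True)
    out = {1: v - vi - vt + vg, 2: vi - vg + 0 * v, 3: vt - vg + 0 * v}[k]
    return out.reshape(-1, v.shape[2])

def project(A, M):
    """Fitted values P_A M from a least-squares regression of M on A."""
    return A @ np.linalg.lstsq(A, M, rcond=None)[0]

def variance_components(Y, X, Z, W, A2, A3):
    N, T, K = X.shape
    y = Y[..., None]
    # Step 1: two-way within estimator and theta_1^2, eq. (4.32)
    beta_w = np.linalg.lstsq(S(X, 1), S(y, 1), rcond=None)[0]
    u_w = S(y, 1) - S(X, 1) @ beta_w
    th1 = (u_w.T @ u_w).item() / ((N - 1) * (T - 1) - K)
    # Step 2: regress S2 Y - S2 X beta_w on P_{A2} Z; theta_2^2 estimated by u2'u2 / N
    r2 = S(y, 2) - S(X, 2) @ beta_w
    gamma = np.linalg.lstsq(project(A2, Z.reshape(N * T, -1)), r2, rcond=None)[0]
    u2 = r2 - S(Z, 2) @ gamma
    th2 = (u2.T @ u2).item() / N
    # Step 3: regress S3 Y - S3 X beta_w on P_{A3} W; theta_3^2 estimated by u3'u3 / T
    r3 = S(y, 3) - S(X, 3) @ beta_w
    delta = np.linalg.lstsq(project(A3, W.reshape(N * T, -1)), r3, rcond=None)[0]
    u3 = r3 - S(W, 3) @ delta
    th3 = (u3.T @ u3).item() / T
    return th1, th2, th3

# Hypothetical simulated panel with exogenous regressors
rng = np.random.default_rng(6)
N, T = 60, 60
s2_eps, s2_alpha, s2_lambda = 1.0, 0.5, 0.2
X = rng.normal(size=(N, T, 2))
Z = np.repeat(rng.normal(size=(N, 1, 1)), T, axis=1)
W = np.repeat(rng.normal(size=(1, T, 1)), N, axis=0)
Y = (X @ np.array([1.0, -1.0]) + 0.5 * Z[..., 0] + 2.0 * W[..., 0]
     + np.sqrt(s2_alpha) * rng.normal(size=(N, 1))
     + np.sqrt(s2_lambda) * rng.normal(size=(1, T))
     + np.sqrt(s2_eps) * rng.normal(size=(N, T)))

A2 = np.hstack([S(X, 2), S(Z, 2)])
A3 = np.hstack([S(X, 3), S(W, 3)])
th1, th2, th3 = variance_components(Y, X, Z, W, A2, A3)
print(th1, th2, th3)   # population values: 1, 1 + 60 * 0.5 = 31, 1 + 60 * 0.2 = 13
```

Note that $\hat\theta_2^2$ and $\hat\theta_3^2$ are based on only about $N$ and $T$ effective observations, respectively, so they are far noisier than $\hat\theta_1^2$.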

4.6 GMM Estimation of Static Panel Data Models

One way to deal with correlated effects using an IV procedure is to construct orthogonality conditions from the model residuals and instruments such as those presented above (HT, AM, BMS), these instruments being assumed uncorrelated with the disturbances (at least asymptotically) and asymptotically correlated with the explanatory variables. Consistent parameter estimates are then obtained, under the assumption that the model is correctly specified (i.e., that the orthogonality conditions are valid), by minimizing a quadratic form in the orthogonality conditions (moment restrictions). Depending on how this criterion is constructed, we obtain either the Instrumental Variables (IV) estimator in its various forms or the Generalized Method of Moments estimator (GMM, see Hansen (1982)). We now turn to the application of GMM estimation to linear panel data models.

4.6.1 Static Model Estimation

We consider here the general form of Instrumental Variable and GMM estimators for the static model introduced above, $Y = X\beta + Z\gamma + U$, or in compact form, $Y = \Phi\mu + U$, where $\Phi = (X, Z)$ and $\mu' = (\beta', \gamma')$. Let $E(A_i'U_i) = 0$ denote a set of $L$ orthogonality conditions in vector form, where $A_i$, $i = 1, 2, \ldots, N$, is a $T\times L$ matrix of instruments. For fixed $T$ and $N \to \infty$, the empirical counterpart of the orthogonality conditions is $(1/N)\sum_{i=1}^N A_i'U_i$. Consider estimating the following equation by Generalized Least Squares (GLS):

$$A'Y = A'X\beta + A'Z\gamma + A'U = A'\Phi\mu + A'U, \qquad (4.33)$$

i.e., by solving

$$\min_\mu\ \frac{1}{N}U'A\left[\operatorname{Var}\left(\frac{1}{N}A'U\right)\right]^{-1}\frac{1}{N}A'U. \qquad (4.34)$$

Letting $V$ denote the variance–covariance matrix of $(1/N)A'U$, the resulting estimator can be written as

$$\hat\mu = (\Phi'AV^{-1}A'\Phi)^{-1}\Phi'AV^{-1}A'Y.$$

Suppose we do not wish to make assumptions on the structure of the variance matrix $V$; e.g., the disturbances may exhibit serial correlation (in the time dimension) and/or heteroskedasticity. The estimator above can then be computed using an initial estimate of $V$, $\hat V = (1/N)\sum_{i=1}^N A_i'\hat U_i\hat U_i'A_i$, where $A_i$ is the $(T, L)$ matrix of observations on the instrumental variables for the $i$-th individual and $\hat U_i$ is a $(T, 1)$ initial consistent estimate of $U_i$, $i = 1, \ldots, N$. This is the Generalized Method of Moments (GMM) estimator in its optimal form, and it exploits the fact that the variance–covariance matrix of $U$ is block-diagonal (no correlation across individuals).
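A minimal NumPy sketch of this two-step procedure, assuming the data are stored as $(N, T, \cdot)$ arrays: a first step with weight $(A'A)^{-1}$ (the simple IV weighting discussed next) provides $\hat U_i$, and the second step uses $\hat V = (1/N)\sum_i A_i'\hat U_i\hat U_i'A_i$, which allows for arbitrary heteroskedasticity and serial correlation within individuals. The data-generating process (a regressor correlated with $\alpha_i$, a time-invariant exogenous variable $z_i$ and an extra exogenous instrument $w_{it}$) and the instrument list $(Q_Tx, z, w)$ are hypothetical choices for illustration.

```python
import numpy as np

def gmm_two_step(Y, Phi, A):
    """Two-step GMM. Y: (N, T), Phi: (N, T, p) regressors, A: (N, T, L) instruments, L >= p."""
    N, T, p = Phi.shape
    y = Y.reshape(N * T)
    F = Phi.reshape(N * T, p)
    Am = A.reshape(N * T, -1)
    AF, Ay = Am.T @ F, Am.T @ y                 # A'Phi and A'Y

    def estimate(V_inv):
        # mu = (Phi'A V^-1 A'Phi)^-1 Phi'A V^-1 A'Y
        return np.linalg.solve(AF.T @ V_inv @ AF, AF.T @ V_inv @ Ay)

    mu_1 = estimate(np.linalg.inv(Am.T @ Am))   # step 1: (A'A)^-1 weighting
    U = (y - F @ mu_1).reshape(N, T)            # initial residuals U_i
    g = np.einsum("itl,it->il", A, U)           # row i holds A_i' U_i
    V_hat = g.T @ g / N                         # (1/N) sum_i A_i'U_i U_i'A_i
    return estimate(np.linalg.inv(V_hat)), mu_1

# Hypothetical panel: y_it = beta x_it + gamma z_i + alpha_i + eps_it
rng = np.random.default_rng(7)
N, T = 500, 5
alpha = rng.normal(size=(N, 1))
z = rng.normal(size=(N, 1))
w = rng.normal(size=(N, T))
x = 0.8 * alpha + 0.5 * w + rng.normal(size=(N, T))        # correlated with alpha
eps = rng.normal(size=(N, T)) * (0.5 + np.abs(z))           # heteroskedastic across i
Y = 1.0 * x - 0.5 * z + alpha + eps

x_within = x - x.mean(axis=1, keepdims=True)                # Q_T x removes alpha
Z_full = np.repeat(z, T, axis=1)
Phi = np.stack([x, Z_full], axis=2)
A = np.stack([x_within, Z_full, w], axis=2)
mu_gmm, mu_first_step = gmm_two_step(Y, Phi, A)
print(np.round(mu_gmm, 3), np.round(mu_first_step, 3))
```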

It is well known that if the disturbances are both homoskedastic and serially uncorrelated, so that

$$\operatorname{Var}(A'U) = E[A'\operatorname{Var}(U|A)A] + \operatorname{Var}[A'E(U|A)] = \sigma_U^2E(A'A)$$

since $E(U|A) = 0$ and $\operatorname{Var}(U|A) = \sigma_U^2I$, then the “best” instrumental variables estimator (i.e., GLS applied to model (4.33)) is given by (see Gourieroux and Monfort, 1989):

$$\hat\mu_{IV} = \left[\Phi'A(A'A)^{-1}A'\Phi\right]^{-1}\Phi'A(A'A)^{-1}A'Y.$$

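For panel data, however, this is unlikely to be the case because of individual unobserved heterogeneity, and the variance–covariance matrix of the error terms is $\Omega = I_N \otimes \Sigma$, where $\Sigma = \sigma_\alpha^2e_Te_T' + \sigma_\varepsilon^2I_T$ is a $T\times T$ matrix. Suppose a preliminary estimate of $\Omega$ is available, $\hat\Omega = I_N \otimes \hat\Sigma$ with $\hat\Sigma = \Sigma(\hat\sigma_\alpha^2, \hat\sigma_\varepsilon^2)$, a simple version being $\hat\Omega = I_N \otimes \frac{1}{N}\sum_{i=1}^N\hat U_i\hat U_i'$. Replacing $\Omega$ by $\hat\Omega$ so that

$$\operatorname*{plim}_{N\to\infty}\frac{1}{N}A'\hat\Omega A = \operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^N A_i'\hat\Sigma A_i = V,$$

we obtain the Three-Stage Least Squares estimator:

$$\hat\mu_{3SLS} = \left[\Phi'A(A'\hat\Omega A)^{-1}A'\Phi\right]^{-1}\Phi'A(A'\hat\Omega A)^{-1}A'Y.$$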

It is easy to see that GMM and 3SLS are equivalent under the condition of no conditional heteroskedasticity (NCH), see Ahn and Schmidt (1999):

$$E(A_i'U_iU_i'A_i) = E(A_i'\Sigma A_i) \quad \forall i = 1, \ldots, N.$$

Note that this condition is weaker than the condition $E(U_iU_i'|A_i) = \Sigma$. If the NCH condition is not satisfied, then GMM is more efficient than 3SLS.

Assuming this condition holds, 2SLS estimators can also be proposed. A first version of the Two-Stage Least Squares (2SLS) estimator is obtained by premultiplying the model by $\Omega^{-1/2}$ and then applying instruments $A$. This is the form used by HT, AM and BMS:

$$\hat\mu_{IV1} = \left[\Phi'\Omega^{-1/2}A(A'A)^{-1}A'\Omega^{-1/2}\Phi\right]^{-1}\Phi'\Omega^{-1/2}A(A'A)^{-1}A'\Omega^{-1/2}Y.$$

This estimator is based on the two conditions

$$E\left(A_i'\Sigma^{-1/2}U_i\right) = 0, \qquad E\left(A_i'\Sigma^{-1/2}U_iU_i'\Sigma^{-1/2}A_i\right) = E(A_i'A_i).$$

Ahn and Schmidt (1999) show that the 3SLS estimator and the 2SLS estimator above are asymptotically equivalent if a consistent estimate of $\Sigma$ is used⁴ and if there exists a nonsingular, non-random matrix $B$ such that $\Omega^{-1/2}A = AB$ (or, equivalently, $\Sigma^{-1/2}A_i = A_iB$ for all $i = 1, 2, \ldots, N$).

A second version of the 2SLS estimator is the Generalized Instrumental Variables (GIV) estimator (see White (1984)), which directly uses $\Omega^{-1/2}A$ as instruments:

4 They are numerically equivalent if the same, consistent estimate is used.


$$\hat\mu_{GIV} = \left[\Phi'\Omega^{-1}A(A'\Omega^{-1}A)^{-1}A'\Omega^{-1}\Phi\right]^{-1}\Phi'\Omega^{-1}A(A'\Omega^{-1}A)^{-1}A'\Omega^{-1}Y.$$

Although the two 2SLS estimators look different, they are equivalent in a panel data context when the error component structure is of the form $\Omega$ above. Again, a preliminary estimate of $\Sigma$ is required to implement these 2SLS estimation procedures.
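The equivalence can be checked numerically in the one-way case. The NumPy sketch below uses hypothetical data and variance components, treats $\Sigma$ as known, and takes an HT-type instrument set $A_i = (Q_Tx_i, P_Tx_i, z_ie_T)$ with $P_T = e_Te_T'/T$ and $Q_T = I_T - P_T$. Since $\Sigma^{-1/2} = Q_T/\sigma_\varepsilon + P_T/\sqrt{\sigma_\varepsilon^2 + T\sigma_\alpha^2}$, each instrument column is merely rescaled by $\Sigma^{-1/2}$, so the condition $\Sigma^{-1/2}A_i = A_iB$ holds with a diagonal $B$, and the two estimators coincide. The function names `iv1` and `giv` are hypothetical.

```python
import numpy as np

def iv1(Phi, y, A, Om_mh):
    """2SLS version 1: premultiply by Omega^{-1/2}, then use instruments A."""
    Xs, ys = Om_mh @ Phi, Om_mh @ y
    PA = A @ np.linalg.pinv(A.T @ A) @ A.T
    return np.linalg.solve(Xs.T @ PA @ Xs, Xs.T @ PA @ ys)

def giv(Phi, y, A, Om_mh):
    """GIV: instruments Omega^{-1/2} A in the transformed model."""
    Xs, ys, As = Om_mh @ Phi, Om_mh @ y, Om_mh @ A
    PA = As @ np.linalg.pinv(As.T @ As) @ As.T
    return np.linalg.solve(Xs.T @ PA @ Xs, Xs.T @ PA @ ys)

rng = np.random.default_rng(8)
N, T = 50, 4
s2_eps, s2_alpha = 1.0, 1.0
P_T = np.ones((T, T)) / T
Q_T = np.eye(T) - P_T
Sigma_mh = Q_T / np.sqrt(s2_eps) + P_T / np.sqrt(s2_eps + T * s2_alpha)   # Sigma^{-1/2}
Om_mh = np.kron(np.eye(N), Sigma_mh)                                      # Omega^{-1/2}

x = rng.normal(size=N * T)
z = np.repeat(rng.normal(size=N), T)
alpha = np.repeat(rng.normal(size=N), T)
y = 1.0 * x - 0.5 * z + alpha + rng.normal(size=N * T)
Phi = np.column_stack([x, z])

Q, P = np.kron(np.eye(N), Q_T), np.kron(np.eye(N), P_T)
A_ht = np.column_stack([Q @ x, P @ x, z])          # HT-type instruments
mu_iv1 = iv1(Phi, y, A_ht, Om_mh)
mu_giv = giv(Phi, y, A_ht, Om_mh)
print(mu_iv1, mu_giv)
```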

4.6.2 GMM Estimation with HT, AM and BMS Instruments

In the Instrumental Variable context with Hausman–Taylor, Amemiya–MaCurdy or Breusch–Mizon–Schmidt instruments described above, we assume an error-component structure and also that endogeneity is caused by correlated effects, either $E(X'\alpha) \neq 0$ or $E(Z'\alpha) \neq 0$. In any case, it is maintained that $E(X'\varepsilon) = E(Z'\varepsilon) = 0$. With GMM, we can consider different exogeneity assumptions related to $\alpha$ or $\varepsilon$, producing different orthogonality conditions. Apart from the difference between random and fixed effects specifications (instruments uncorrelated or correlated with $\alpha$), we can also consider strictly or weakly exogenous instruments if explanatory variables are correlated with $\varepsilon$. These different cases are described by Ahn and Schmidt (1999), to which we refer the reader for more information.

Consider the case of strict exogeneity: $E(X_{is}\varepsilon_{it}) = E(Z_i\varepsilon_{it}) = 0$ for all $i, s, t$. The questions we address are the following: is it possible to obtain a more efficient estimator than IV with HT, AM or BMS instruments by exploiting more moment conditions? And does this efficiency depend on the assumed variance–covariance structure?

The first result is that, under the no conditional heteroskedasticity (NCH) assumption, the HT, AM and BMS 2SLS estimators are equivalent to the GMM estimator. From the discussion above, this implies that, if the NCH condition is not valid, GMM with the same instrument set (and a consistent variance–covariance matrix) is more efficient than the original versions of the HT, AM and BMS 2SLS estimators.

Ahn and Schmidt (1995) and Arellano and Bover (1995) note that, under the strict exogeneity assumption, more moment conditions can be used to improve the efficiency of the estimator. The strict exogeneity assumption is $E(d_i \otimes \varepsilon_i) = 0$, where $d_i = (x_{i1}, \ldots, x_{iT}, z_i)$, implying $E[(L_T \otimes d_i)'u_i] = E(L_T'\varepsilon_i \otimes d_i) = 0$.

Arellano and Bover (1995) therefore propose a GMM estimator obtained by replacing (in vector form) $Q_T\Phi_i$ by $L_T \otimes d_i$ in the HT, AM or BMS list of instruments. This leads to $(T-1)(kT+g) - k$ additional instruments, which may cause computational difficulties if $T$ is large. They also show, however, that under the error-component structure $\Sigma$, both sets of instruments yield the same 3SLS (or 2SLS version 1, $\hat\mu_{IV1}$) estimator.⁵ Consequently, if in addition the NCH assumption is valid, then the 3SLS (or 2SLS version 1, $\hat\mu_{IV1}$) estimator with HT, AM or BMS instruments is asymptotically equivalent to GMM with the augmented set of instruments.

5 Asymptotically only, if different estimates of Σ are used.


Im, Ahn, Schmidt and Wooldridge (1999) consider cases where the NCH assumption holds when the Arellano–Bover set of instruments is used and $\Sigma$ is left unrestricted. They show that the 2SLS estimator version 2 ($\hat\mu_{IV2}$) using BMS instruments is equivalent to the 3SLS estimator using Arellano–Bover instruments, but that this equivalence does not hold for HT or AM instruments. To solve this problem, Im, Ahn, Schmidt and Wooldridge (1999) propose to replace the fixed effects operator $Q_T$ by $Q_\Sigma = \Sigma^{-1} - \Sigma^{-1}e_T(e_T'\Sigma^{-1}e_T)^{-1}e_T'\Sigma^{-1}$, which satisfies $Q_\Sigma e_T = 0$, and to modify the matrix of instruments appropriately. This modified 3SLS estimator is asymptotically equivalent to an efficient GMM estimator if the NCH condition holds.
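A quick numerical check of the properties of $Q_\Sigma$, as a NumPy sketch with hypothetical matrices: for an unrestricted positive definite $\Sigma$, $Q_\Sigma e_T = 0$ and $Q_\Sigma$ is symmetric; and under the error-component structure $\Sigma = \sigma_\alpha^2e_Te_T' + \sigma_\varepsilon^2I_T$, a short calculation gives $Q_\Sigma = Q_T/\sigma_\varepsilon^2$, so in that special case replacing $Q_T$ by $Q_\Sigma$ only rescales the within operator.

```python
import numpy as np

def q_sigma(Sigma):
    """Q_Sigma = Sigma^-1 - Sigma^-1 e (e' Sigma^-1 e)^-1 e' Sigma^-1, with e a vector of ones."""
    T = Sigma.shape[0]
    e = np.ones((T, 1))
    Si = np.linalg.inv(Sigma)
    return Si - Si @ e @ np.linalg.inv(e.T @ Si @ e) @ e.T @ Si

T = 5
e = np.ones((T, 1))
rng = np.random.default_rng(9)
M = rng.normal(size=(T, T))
Sigma_free = M @ M.T + T * np.eye(T)          # unrestricted positive definite Sigma
Q_free = q_sigma(Sigma_free)

s2_alpha, s2_eps = 0.7, 1.3                   # hypothetical variance components
Sigma_ec = s2_alpha * (e @ e.T) + s2_eps * np.eye(T)
Q_T = np.eye(T) - e @ e.T / T                 # usual within operator
```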

4.7 Unbalanced Panels

In the preceding sections we have discussed estimation methods for panel data models when all cross-sectional units are observed for all time periods. In practice, observations are often missing for some cross-sectional units and time periods. In this case we have what is called an incomplete panel, and the standard estimation methods are not applicable. Fuller and Battese (1974) suggest adding to the list of regressors a set of dummy variables, one for each missing observation. However, as noted by Wansbeek and Kapteyn (1989), this often implies that the number of regressors increases dramatically (possibly with the sample size), and in many empirical studies this becomes computationally impractical.

Wansbeek and Kapteyn (1989) consider a two-way unbalanced error component specification for the fixed and random effects models. In the first case (fixed effects), they derive a new expression for the within operator, which generalizes the operator $Q$ given in Sect. 4.2.⁶ In the second case (random effects), they propose to use quadratic unbiased and Maximum Likelihood estimators.

More recently, Baltagi and Chang (1994) have considered a one-way error component model with unbalanced data. Using a Monte Carlo simulation experiment, they compare several estimation methods, including Analysis Of Variance (ANOVA), Maximum Likelihood (ML), Restricted Maximum Likelihood (REML), Minimum Norm Quadratic Unbiased Estimation (MINQUE) and Minimum Variance Quadratic Unbiased Estimation (MIVQUE).

In their simulations and in the empirical illustration they propose, they show that, in general, better estimates of the variance components do not necessarily imply better estimates of the regression coefficients. Furthermore, MLE and MIVQUE perform better than the ANOVA methods in estimating the individual variance component. Finally, for the regression coefficients, the computationally simple ANOVA methods perform reasonably well when compared with the computationally involved MLE and MIVQUE methods.

When the data have a sufficient degree of disaggregation, more than two dimensions of data variation are generally available. One can think, for instance, of a sample of observations on firms (level 1) belonging to a particular industry (level 2), within a region (level 3). In this case, several time-invariant heterogeneity components can be introduced in the linear panel data model, giving rise to multi-way error components models. In the nested specification, each successive component in the error term is nested within the preceding component. In the non-nested case, error components are independent of each other, and transformation techniques similar to those employed in the two-way error component model are applicable.

⁶ See Wansbeek and Kapteyn (1989), p. 344.

As operator matrices for performing Between and Within transformations under any hierarchical structure are straightforward to construct, fixed effects and GLS estimators are generally available for such models (see, e.g., Antweiler (2001)). In the unbalanced panel data case, however, the algebra required to obtain expressions for the Feasible GLS estimator, in particular, is more difficult to handle.

Baltagi, Song and Jung (2001) propose a fixed-effects representation and a spectral decomposition of a three-way unbalanced error component model, leading to a Fuller–Battese scalar transformation for this model. They then investigate the performance of ANOVA, Maximum Likelihood and MINQUE estimators of the variance components in the unbalanced nested error component model. ANOVA estimators of the variance components are BQU (Best Quadratic Unbiased) in the balanced case only, and are merely unbiased in the unbalanced case. Monte Carlo experiments reveal that ANOVA methods perform well in estimating the regression coefficients, but ML and MINQUE estimators are recommended for the variance components and the standard errors of the regression coefficients. The authors do not deal with the case of endogenous regressors or correlated effects, beyond the obvious possibility of obtaining consistent estimates using fixed effects. The fact that exogenous variables may be available at different levels of the hierarchical structure of the data leads to a wide variety of possible instruments. For example, if firm-specific individual effects are correlated with the firms' decision variables, price variables at an upper level (county, region) may be used as instruments.

Davis (2002) proposes a unifying approach to the estimation of unbalanced multi-way error components models, as well as useful matrix algebra results for constructing (Between, Within) transformation matrices. The recurrence relations proposed in the paper allow a direct extension to any number of error components. There are but few empirical applications in the literature using multi-way unbalanced panels; see Davis (2002) and Boumahdi, Chaaban and Thomas (2006) for examples.

For example, the three-way unbalanced error component model is

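$$Y = X\beta + u, \qquad u = \Delta_1\alpha + \Delta_2\gamma + \Delta_3\lambda + \varepsilon, \qquad (4.35)$$

where $\alpha = (\alpha_1, \ldots, \alpha_L)'$, $\gamma = (\gamma_1, \ldots, \gamma_H)'$ and $\lambda = (\lambda_1, \ldots, \lambda_T)'$. The matrices $\Delta_k$, $k = 1, 2, 3$, are constructed by collecting dummy variables indicating whether a given observation belongs to a particular group $(l, h, t)$, and have dimension $N\times L$, $N\times H$ and $N\times T$, respectively (here $N$ denotes the total number of observations).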

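Letting $P_A = A(A'A)^+A'$ and $Q_A = I - P_A$ for a matrix $A$, where $^+$ denotes a generalized inverse, the fixed-effects transformation matrix is shown to be $Q_\Delta = Q_A - P_B - P_C$, where $\Delta = [\Delta_1, \Delta_2, \Delta_3]$ and

$$P_A = \Delta_3(\Delta_3'\Delta_3)^+\Delta_3', \qquad Q_A = I - P_A,$$

$$P_B = Q_A\Delta_2(\Delta_2'Q_A\Delta_2)^+\Delta_2'Q_A, \qquad Q_B = I - P_B,$$

$$P_C = Q_AQ_B\Delta_1\left[\Delta_1'(Q_AQ_B)\Delta_1\right]^+\Delta_1'Q_AQ_B, \qquad Q_C = I - P_C.$$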

Under the exogeneity assumption $E(X'Q_\Delta\varepsilon) = 0$, the fixed-effects estimator is consistent:

$$\hat\beta = (X'Q_\Delta X)^{-1}X'Q_\Delta Y. \qquad (4.36)$$
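The construction of $Q_\Delta$ can be sketched in NumPy for a small, hypothetical unbalanced sample in which each observation is assigned at random to one of $L$ groups, one of $H$ groups and one of $T$ periods. The sketch computes $P_A$, $P_B$ and $P_C$ as above with a Moore–Penrose inverse (using an explicit cutoff, because cross-products of transformed dummies such as $\Delta_2'Q_A\Delta_2$ are singular), checks that $Q_\Delta$ equals $I - P_{[\Delta_1, \Delta_2, \Delta_3]}$, and shows that (4.36) recovers $\beta$ exactly in noise-free data even when $X$ is correlated with the effects.

```python
import numpy as np

def proj(M, cutoff=1e-10):
    """P_M = M (M'M)^+ M', with a Moore-Penrose inverse that zeroes small singular values."""
    return M @ np.linalg.pinv(M.T @ M, rcond=cutoff) @ M.T

def dummies(codes, n_groups):
    """(number of observations) x n_groups matrix of group indicators."""
    D = np.zeros((codes.size, n_groups))
    D[np.arange(codes.size), codes] = 1.0
    return D

rng = np.random.default_rng(10)
n_obs, L, H, T = 60, 6, 3, 4
l = rng.integers(0, L, n_obs)          # hypothetical group memberships
h = rng.integers(0, H, n_obs)
t = rng.integers(0, T, n_obs)
D1, D2, D3 = dummies(l, L), dummies(h, H), dummies(t, T)

I = np.eye(n_obs)
P_A = proj(D3)
Q_A = I - P_A
P_B = proj(Q_A @ D2)
Q_B = I - P_B
P_C = proj(Q_A @ Q_B @ D1)             # projector onto the columns of Q_A Q_B D1
Q_Delta = Q_A - P_B - P_C

# Effects correlated with X; noise-free outcome so that (4.36) should be exact
alpha, gamma, lam = rng.normal(size=L), rng.normal(size=H), rng.normal(size=T)
X = rng.normal(size=(n_obs, 2)) + alpha[l][:, None]
beta = np.array([1.5, -0.5])
Y = X @ beta + D1 @ alpha + D2 @ gamma + D3 @ lam
beta_fe = np.linalg.solve(X.T @ Q_Delta @ X, X.T @ Q_Delta @ Y)
```

For realistic sample sizes one would avoid forming dense $N\times N$ matrices and apply the projections through group-wise regressions instead; the dense version is only meant to make the algebra visible.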

Assume instruments $W$ are available such that $E(W'Q_\Delta\varepsilon) = 0$; then a consistent IV estimator can be constructed as

$$\hat\beta = (X'P_{QW}X)^{-1}X'P_{QW}Y, \qquad (4.37)$$

where $P_{QW} = Q_\Delta W(W'Q_\Delta W)^{-1}W'Q_\Delta$.

As mentioned above in the one-way unbalanced case, IV procedures require a consistent estimate of the variance–covariance matrix, as well as an instrument matrix consistent with the unbalanced nature of the sample. Formulae for estimating variance components can be found in Baltagi, Song and Jung (2001) and Davis (2002), although estimation should be adapted along the lines of Hausman and Taylor (1981) because of endogenous regressors. For instrument matrices, the HT specification is directly applicable because it only contains Within transformations and variables in levels. However, the AM and BMS IV estimators suffer from the same difficulty as in the one-way unbalanced case: they are more problematic to adapt because of missing observations in the $X_1^*$ and $(QX)^*$ matrices. It is not clear whether the usual procedure of replacing missing values by zeroes in those matrices produces severe distortions (bias, inefficiency) or not.

References

Ahn, S.C., Y.H. Lee and P. Schmidt [2001]. GMM Estimation of Linear Panel Data Models with Time-Varying Individual Effects. Journal of Econometrics 101, 219–255.

Ahn, S.C. and P. Schmidt [1999]. Estimation of Linear Panel Data Models Using GMM. In Generalized Method of Moments Estimation (L. Matyas, ed.), pp. 211–247, Cambridge University Press, Cambridge.

Alvarez, J. and M. Arellano [2003]. The Time Series and Cross Section Asymptotics of Dynamic Panel Data Estimators. Econometrica 71, 1121–1159.

Amemiya, T. and T.E. MaCurdy [1986]. Instrumental Variable Estimation of an Error Components Model. Econometrica 54, 869–880.

Anderson, T. and C. Hsiao [1982]. Formulation and Estimation of Dynamic Models Using Panel Data. Journal of Econometrics 18, 47–82.

Antweiler, W. [2001]. Nested Random Effects Estimation in Unbalanced Panel Data. Journal of Econometrics 101, 295–313.

Arellano, M. [1993]. On the Testing of Correlated Effects with Panel Data. Journal of Econometrics 59, 87–97.

Arellano, M. and S. Bond [1991]. Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. Review of Economic Studies 58, 277–297.

Arellano, M. and O. Bover [1995]. Another Look at the Instrumental Variable Estimation of Error-Components Models. Journal of Econometrics 68, 29–51.

Baltagi, B.H. and Y.-J. Chang [1994]. A Comparative Study of Alternative Estimators for the Unbalanced One-Way Error Component Regression Model. Journal of Econometrics 62, 67–89.

Baltagi, B. and S. Khanti-Akom [1990]. On Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators. Journal of Applied Econometrics 5, 401–406.

Baltagi, B.H., S.H. Song and B.C. Jung [2001]. The Unbalanced Nested Error Component Regression Model. Journal of Econometrics 101, 357–381.

Biørn, E. [1981]. Estimating Economic Relations from Incomplete Cross-Section/Time-Series Data. Journal of Econometrics 16, 221–236.

Boumahdi, R., J. Chaaban and A. Thomas [2006]. Import Demand Estimation with Country and Product Effects: Application of Multi-Way Unbalanced Panel Data Models to Lebanese Imports. In Panel Data Econometrics: Theoretical Contributions and Empirical Applications (B.H. Baltagi, ed.), Chap. 8, pp. 193–228, Elsevier, Amsterdam.

Boumahdi, R. and A. Thomas [1997]. Estimation des modeles de donnees de panel avec regresseurs temporels. Annales d'Economie et de Statistique 46, 23–48.

Boumahdi, R. and A. Thomas [2006]. Instrument Relevance and Efficient Estimation with Panel Data. Economics Letters 93, 305–310.

Breusch, T.S., G.E. Mizon and P. Schmidt [1989]. Efficient Estimation Using Panel Data. Econometrica 57, 695–700.

Cornwell, C. and P. Rupert [1988]. Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators. Journal of Applied Econometrics 3, 149–155.

Crepon, B., F. Kramarz and A. Trognon [1997]. Parameters of Interest, Nuisance Parameters and Orthogonality Conditions: An Application to Autoregressive Error Component Models. Journal of Econometrics 82, 135–156.

Davis, P. [2002]. Estimating Multi-Way Error Components Models with Unbalanced Data Structures. Journal of Econometrics 106, 67–95.

Fuller, W.A. and G.E. Battese [1974]. Estimation of Linear Models with Crossed-Error Structure. Journal of Econometrics 2, 67–78.

Godfrey, L.G. [1999]. Instrument Relevance in Multivariate Linear Models. Review of Economics and Statistics 81, 550–552.

Hansen, L.P. [1982]. Large Sample Properties of Generalized Method of Moments Estimators. Econometrica 50, 1029–1054.

Hausman, J. [1978]. Specification Tests in Econometrics. Econometrica 46(6), 1251–1271.

Hausman, J. and W.E. Taylor [1981]. Panel Data and Unobservable Individual Effects. Econometrica 49, 1377–1398.

Heckman, J.J. and V.J. Hotz [1989]. Choosing Among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training. Journal of the American Statistical Association 84, 862–875.

Holtz-Eakin, D., W. Newey and H. Rosen [1988]. Estimating Vector Autoregressions with Panel Data. Econometrica 56, 1371–1395.

Hsiao, C. [1986]. Analysis of Panel Data. Cambridge University Press, Cambridge.

Im, K.S., S.C. Ahn, P. Schmidt and J.M. Wooldridge [1999]. Efficient Estimation of Panel Data Models with Strictly Exogenous Explanatory Variables. Journal of Econometrics 93, 177–201.

Lillard, L. and Y. Weiss [1979]. Components of Variation in Panel Earnings Data: American Scientists 1960–1970. Econometrica 47, 437–454.

Mundlak, Y. [1961]. Empirical Production Function Free of Management Bias. Journal of Farm Economics 43, 44–56.

Mundlak, Y. [1978]. On the Pooling of Time Series and Cross Section Data. Econometrica 46, 69–85.

Nauges, C. and A. Thomas [2003]. Consistent Estimation of Dynamic Panel Data Models with Time-Varying Individual Effects. Annales d'Economie et de Statistique 70, 53–75.

Schmidt, P., S.C. Ahn and D. Wyhowski [1992]. Comment: Sequential Moment Restrictions in Panel Data. Journal of Business and Economic Statistics 10, 10–14.

Shea, J. [1997]. Instrument Relevance in Multivariate Linear Models: A Simple Measure. Review of Economics and Statistics 79, 348–352.

Wansbeek, T. and A. Kapteyn [1989]. Estimation of the Error-Components Model with Incomplete Panels. Journal of Econometrics 41, 341–361.

Wansbeek, T.J. and T. Knaap [1999]. Estimating a Dynamic Panel Data Model with Heterogenous Trends. Annales d'Economie et de Statistique 55–56, 331–349.

Wyhowski, D.J. [1994]. Estimation of a Panel Data Model in the Presence of Correlation Between Regressors and a Two-Way Error Component. Econometric Theory 10, 130–139.
