Download - Simultaneous Variables - Gabriel Montesgabrielmontes.com.ar/GMRS_87.pdf · ditional variables are de ned as simultaneous variables. We derive identi cation of the parameters, develop

Simultaneous Variables∗

Antonio F. Galvao†

Gabriel Montes-Rojas‡

Suyong Song§

June 17, 2015

Abstract

This paper proposes an alternative solution to the endogeneity problem by explic-itly modeling the joint interaction of the endogenous variables and the unobservedcauses of the dependent variable as a function of additional observables. These ad-ditional variables are defined as simultaneous variables. We derive identification ofthe parameters, develop an estimator, and establish its consistency and asymptoticnormality. We illustrate the methods with two examples that examine the investmentequation model and returns to education. Our empirical findings support the idea thatthe proposed estimator is a useful alternative to the instrumental and proxy variablesestimator for economic applications in which endogeneity is a relevant concern.

Keywords: Endogeneity, instrumental variables, simultaneous variables, investment equa-tion, returns to education.

JEL Classification: C10, C26, G31

∗The authors would like to express their appreciation to Carolina Caetano, Karim Chalak, Juan CarlosEscanciano, Sergio Firpo, Amit Gandhi, Zhengyuan Gao, Chuan Goh, Fabio Gomes, Susumu Imai, Kyoo ilKim, Roger Koenker, Joris Pinkse, Alex Poirier, Gene Savin, Liang Wang, and seminar participants at theUniversity of Minnesota, University of Illinois at Urbana-Champaign, Universidad de Buenos Aires, Univer-sity of Iowa, University of Wisconsin-Milwaukee, Central Bank of Brazil, 23rd Midwest Econometrics Group,Universidad de San Andres, University of Pittsburgh, and Royal Economic Society for helpful discussionsand comments regarding this paper. All the remaining errors are ours.†Department of Economics, University of Iowa, W284 Pappajohn Business Building, 21 E. Market Street,

Iowa City, IA 52242. E-mail: [email protected]‡CONICET-Universidad de San Andres, Vito Dumas 275, Victoria, Pcia. de Buenos Aires, Argentina

and Department of Economics, City University London, D316 Social Sciences Bldg, Northampton Square,London EC1V 0HB, UK. Email: [email protected]§Department of Economics, University of Iowa, W360 Pappajohn Business Building, 21 E. Market Street,

Iowa City, IA 52242. E-mail: [email protected]

1

1 Introduction

The problem of endogeneity occupies a substantial amount of research in theoretical and

applied econometrics. The most popular approaches are instrumental variables (IV) (see

e.g. Hausman, 1983; Angrist and Krueger, 2001, for surveys) and control function approach

(see e.g. Olley and Pakes (1996), Levinsohn and Petrin (2003), Wooldridge (2002), and

Wooldridge (2009)). These solutions rely on exogenous information derived from an ad-

ditional exclusion restriction or equation. In applications, the type of restriction chosen

determines the nature of the model to be used, i.e. the instrument or the proxy variable.

However, in many empirical applications, there is frequently disagreement and concern about

the exclusion restrictions imposed, and instruments and proxies selections. The potential

IV are often argued to be invalid since they are still correlated with the error term (see,

e.g., Bound, Jaeger, and Baker, 1995; Hahn and Hausman, 2002) while the conditions for

identification using proxy variables are many times implausible.

This paper contributes to the literature by proposing an alternative solution to the prob-

lem of endogeneity. In doing so, we propose a new identification condition which explicitly

models the endogeneity bias. More specifically, the conditional expectation of the joint

interaction of the endogenous variables and unobserved causes of the dependent variable

is assumed to be a function of additional observable variables. We define such additional

variables as simultaneous variables because they are simultaneously related to both the en-

dogenous variable and the unobservables. Thus, we develop simple and tractable methods

for conducting estimation and inference in econometric models with endogeneity, and allow

the additional variables (the simultaneous variables) to still be correlated with the innova-

tion term. Given the structural model and the simultaneous variables, we establish point

identification of the parameters of interest. Motivated by the identification result we de-

velop an estimator that is simple to implement in practice, and establish its consistency and

2

asymptotic normality. In addition, we propose practical inference procedures based on the

Wald statistic, as well as a test for endogeneity.

The proposed framework can be framed within the control function approach, although

it builds on specific assumptions for identification based on the nature of the simultaneous

variables. Our framework allows for situations in which there are no valid standard IV avail-

able, but there exist additional variables that happen to be related to the joint interaction of

the endogenous variable and the unobserved causes of the dependent variable. The intuition

on the main identification condition of the new procedure is that, by using the proposed

restriction, the econometrician is able to approximate the endogeneity bias using the simul-

taneous variables, such that after controlling for the simultaneous variables, the endogenous

variables are not related to the interaction term (between the endogenous variable and the

innovation). Thus, the endogeneity bias implied by the non-zero conditional expectation of

the interaction term can be effectively specified as a function of the simultaneous variables

only. Strictly speaking, neither the IV or simultaneous method is more general than the

other because the underlying assumptions are non-nested. They differ in the characteristics

of the additional variables; in IV case these cause the endogenous variable only, while in our

proposed method they are allowed to affect the joint interaction of the endogenous variable

and error term.

Monte Carlo studies are conducted to evaluate the finite sample properties of the proposed

estimator. The experiments suggest that when simultaneous variables are available, the

proposed method produces consistent estimates under the required assumptions. Moreover,

the simulations highlight the differences between the simultaneous variables approach and

the IV method. The results suggest that when only simultaneous variables exist, our method

is able to eliminate the endogeneity bias while the IV returns significantly biased estimates.

The new procedures provide intuitive and practical ways of handling the problem of

3

endogeneity in empirical settings. To motivate and illustrate the applicability of the identifi-

cation and estimator, we apply the methods to two empirical examples. First, an investment

equation model where measurement errors are a relevant concern. In particular, we consider

the Fazzari, Hubbard, and Petersen’s (1988) investment equation model, where a firm’s in-

vestment is regressed on observed investment demand (Tobin’s q) and cash flows. Concerns

about measurement errors in Tobin’s q have been emphasized in the empirical investment

equation models context (see e.g., Hayashi, 1982; Poterba, 1988) because researchers only

observe average q instead of marginal q. Hence, the measurement errors induce endogeneity

in the independent variables, and consequently bias in their estimates. We show that our

proposed methods are well suited to solve the existing measurement error problem in this

literature. It has been common in the literature to employ IV methods using the lags of q as

instruments to resolve the endogeneity problem (see e.g., Almeida, Campello, and Galvao,

2010; Lewellen and Lewellen, 2014). However, the approach of using lagged q as IV fails when

the measurement errors on the marginal q are systematic. In practice, it is highly likely that

current-period measurement error is correlated with the first-order or higher-order lags of the

measurement error. This phenomenon of autocorrelation in the measurement errors seems

to be more realistic in practice because systematic errors on a way of evaluating average q

would be highly likely to be persistent. Contrary to the IV, we show that even under the

assumption that the marginal q and measurement error are correlated, the lagged Tobin’s q

and its square are valid simultaneous variables because they are related to the interaction

of the endogenous variable q and the error term. Thus, we can use these simultaneous vari-

ables to remove the bias from the measurement error and obtain consistent estimates for the

parameters of interest. For comparison, we estimate the model using OLS, IV and simulta-

neous variables estimators. Our empirical findings support the idea that the simultaneous

variable estimator is a useful alternative to IV models.

4

In the second example, we estimate returns to education, a topic that has been exten-

sively discussed in the empirical literature. In general, the literature is concerned with the

endogeneity in education because of unobserved causes of wage, e.g. omitted ability. It is

argued that finding valid IV or proxy variables is not an easy task (see e.g. Card, 1999). Our

model proposes an alternative way to control for endogeneity in education by modeling the

dependence of education and the unobserved causes of wage as a function of simultaneous

variables and an innovation term. We argue that IQ test scores can be used as a simul-

taneous variable as it affects both ability and education. The empirical results show that

our proposed approach estimates smaller returns to education than OLS, IV or proxy ap-

proaches. Moreover, it delivers more reasonable estimates than other regression coefficients

such as age and ethnicity. As a result, our method would be a useful alternative to control

for upward bias in the estimation of returns to education.

The proposed methods relate to recent advances in the literature on various alternative

estimation and inference procedures that attempt to solve endogeneity without standard

instrumental variables. In a recent paper, Conley, Hansen, and Rossi (2012) present practical

methods for performing inference while relaxing the IV exclusion restriction. They use prior

information regarding the extent of deviations from the exact exclusion restriction. Chalak

and White (2011) define a new class of extended IV, and introduce notions of conditioning

and conditional extended IV which allow use of non-traditional instruments, as they may be

endogenous. Chalak (2012) achieves identification of parameters by employing restrictions

on the magnitude and sign of confounding instead of using traditional IV. Nevo and Rosen

(2012) provide bounds for the parameters when the standard exogeneity assumption on IV

fails, by assuming the correlation between the instruments and the error term has the same

sign as the correlation between the endogenous regressor and the error term and that the

instruments are less correlated with the error term than is the endogenous regressor. Gandhi,

5

Kim, and Petrin (2013) propose a generalized control function approach for models where the

endogenous variables interact with the error term. Caetano (2015) develops an interesting

test procedure for exogeneity of explanatory variables without relying on IV. The test rests

on an assumption that the structural function needs to be continuous in the explanatory

variable of interest, but it does not require the structural function to be identified under

either the null or the alternative hypotheses. Klein and Vella (2010) propose a control

function estimator to adjust for the endogeneity in the triangular simultaneous equations

model where there are no available exclusion restrictions to generate suitable instruments.

They exploit the dependence of the errors on exogenous variables (e.g. heteroscedasticity) to

adjust the conventional control function estimator. Lewbel (2012) also uses a heteroscedastic

covariance restriction to control for the endogeneity in the absence of instrumental variables.

Bias-corrected OLS estimators are proposed to speculate on the size and direction of the

bias by imposing different assumptions. For instance, this approach is used by Kiviet (2012)

where different values of the correlation between the endogenous variable and the innovations

are used to analyze the properties of different estimators, and by Altonji, Elder, and Taber

(2005, 2008) where the degree of selection on observables is used to assess the bias. Nevo

and Rosen (2012) analyze the case where the instruments may themselves be endogenous,

but the correlation of the instrument with the error term can be used to derive bounds in

which the true parameter lies.

The paper is organized as follows. Section 2 presents the econometric model, the identi-

fication results, and comparison with IV method. Section 3 proposes a consistent estimator,

establishes its asymptotic properties, and develops inference procedures. Section 4 presents

extensions to nonparametric model and a case of general form of endogeneity. Section 5 stud-

ies the finite sample performance of the proposed estimator. Section 6 applies the estimator

to the investment equation model and returns to education. Finally, Section 7 concludes the

6

paper. Technical proofs and extension to multiple endogenous variables are included in the

Appendix.

2 Modeling endogeneity using observables

In this section, we propose a new methodology to solve the endogeneity bias problem. Iden-

tification of the parameters of interest is achieved by modeling the endogeneity bias through

a novel additional equation. We also compare the proposed method with the IV framework.

2.1 The model

Consider the following structural model

E[yi|x1i,x2i] = f(x1i,x2i), i = 1, ..., n, (1)

where yi is a scalar dependent variable, x1i is a p1-vector of exogenous explanatory variables,

x2i is a p2-vector of endogenous regressors, and f(·, ·) is an unknown measurable function.

Define xi = [x1i,x2i] as a (p1 + p2)-vector with the sample covariates. The main focus of

this paper is endogeneity and its solution.

For motivation and exposition purposes, we primarily investigate the endogeneity problem

in linear regression models. Nevertheless, the results generalize to nonparametric models (see

Section 4.1 below for details). Equation (1) can be written as

yi = x1iβ1 + x2iβ2 + εi, i = 1, ..., n, (2)

where β1 is a p1-vector, β2 is a p2-vector, and εi is a scalar innovation term. Define β =

[β>1 ,β>2 ]>. We assume that x2i is endogenous, and correlated with the innovation term εi in

(2), such that E[x>2iεi] 6= 0. In addition, x1i is exogenous with E[x>1iεi] = 0. Thus, we also

assume that

1

n

n∑i=1

x>i εip→ E[x>ε] ≡ B = [0>,B>2 ]>,

7

withB a (p1+p2)-vector (finite), and 1n

∑ni=1 x

>i xi

p→ E[x>x] ≡ C (finite and non-singular).

The endogeneity in x2 produces an endogeneity bias, the term B, in the standard OLS es-

timator, βOLS = ( 1n

∑ni=1 x

>i xi)

−1 1n

∑ni=1 x

>i yi. Simple algebra and asymptotic calculations

show that βOLS1

p→ β1 −C−111 C12δ2, and βOLS2

p→ β2 + δ2, where δ2 = V −12·1 B2. Note that if

the endogeneity term B were known, then the correct estimating equation would be

1

n

n∑i=1

x>i (yi − xiβ) = B, (3)

and its solution is β = ( 1n

∑ni=1 x

>i xi)

−1 1n

∑ni=1(x>i yi − B) = βOLS − ( 1

n

∑ni=1 x

>i xi)

−1B.

However, since theB is unknown, to solve the endogeneity problem we will instead model the

interaction of the endogenous variable and the error term, x2iεi and establish identification

of β under some mild conditions. For simplicity, throughout we consider the case where

p2 = 1, i.e., there is only one endogenous variable, x2i. The extension to multiple endogenous

variables is derived in Appendix B.

2.2 Identification

Identification of the parameters of interest is achieved by explicitly modeling the interaction

of the endogenous variable and the unobserved causes of the dependent variable as a function

of additional observable variables. In particular, motivated by equation (3), we consider the

case where the variable x2ε can be modeled using additional variables. The following equation

formalizes modeling endogeneity

E(x2ε | z,x) = g(z), (4)

where g(·) is an unknown smooth function and z a k-vector of additional observable variables.

This is a general formulation to model the endogeneity in the linear parametric model. A

more general form of endogeneity which takes into account nonlinear dependence between

these two variables is considered in Section 4.2 below. For simplicity, assume that g(·) is a

8

known function of z with unknown parameters φ such as g(z;φ), but the analysis can be

easily extended to the case of unknown functional form of g(·) which is discussed in Section

4.1. For notational simplicity, we suppress the subscript i whenever there is no confusion.

A simple example of (4) that is convenient for exposition and estimation purposes is the

following polynomial of z,

E(x2ε | z,x) = Zφ, (5)

where Z = [1, z, z2, . . . ,zm] and φ = [φ0,φ>1 , . . . ,φ

>m]> which is a nonzero vector.1 Equation

(5) is explicitly modeling the endogeneity of x2. In this case, by modeling endogeneity

we mean to model the term x2ε. When φ 6= 0, we can interpret the exogenous variable

z as a noisy measure of the common cause(s) of x2 and ε, which is related to the joint

interaction of the endogenous variable and the unobservables. Our identification strategy

requires observable variables, z. We define such variables as simultaneous variables (SV)

because they are simultaneously related to x2 and ε.

The proposed identification is a special case of a control function approach. When the

correlation between x2 and ε is modeled, equation 5 can be rewritten as x2E(ε | z,x) = Zφ,

hence, we have E(ε | z,x) = Zx2φ. Therefore, the conditional expected value of the unob-

served error term is a function of the “normalized” simultaneous variables by the endogenous

variable , i.e., Zx2

. The emphasis is however on the nature of the SV, which provide infor-

mation about the joint interaction of the endogenous variable and the error term. More

importantly, the required assumptions for identification are specific to the SV model. The

empirical applications developed in this paper are illustrative about the nature of the SV.

In the investment equation model, we argue that the measurement errors model determines

that lag values are jointly related to the endogeneous variable and the error term. In the

education example, IQ is more likely a determinant of both education and ability, rather

1One might want to approximate the unknown function g(·) with one of the sieve bases (e.g., power series,Fourier series, splines, etc.). See, e.g., Chen (2007) for more details on the method of sieve.

9

than only ability.

We are interested in identifying and estimating the parameters β in equation (2). In

practice, φ is unknown, and it is important to note that this parameter cannot be directly

estimated from equation (5) because ε is unobservable. Hence, we consider the joint iden-

tification and estimation of both β and φ. To this end, we impose restrictions on the

relationship between the endogenous regressor and the simultaneous variables.

Define θ ≡ [β>1 ,α>]> with α ≡ [β>2 ,φ

>]>. To ease the notation, define y and x2 after

netting out the exogenous regressor x1 and multiplying the resulting objects by x2. Thus,

y = x2(y−x1E(x>1 x1)−1E(x>1 y)) and x = [x2,Z], with x2 = x2(x2−x1E(x>1 x1)−1E(x>1 x2)).

Let z be a set of variables induced by conditioning variables [z,x]. Consider the following

assumptions.

Assumption 1

(i) E(x>1 ε) = 0;

(ii) E(x2ε | z,x) = Zφ.

Assumption 2 E(x>1 x1) and E(z>x) are non-singular.

Assumptions 1 and 2 allow us to identify the parameters of interest. Assumption 1 (i)

simply states that x1 are exogenous regressors. Assumption 1 (ii) is the main identification

condition. It is new in the literature and deserves further discussion. Condition 1 (ii) ex-

plicitly models the interaction between the endogenous variable and the unobserved causes

of the dependent variable using a parametric model specification. It states that the simulta-

neous variables are able to capture the information on the endogeneity term. The intuition

behind this assumption is that once one controls for the simultaneous variables (z), x is

not related to the interaction term x2ε. In other words, the endogeneity bias implied by the

10

non-zero conditional expectation of the interaction term can be specified as a function of the

simultaneous variables only.

It is important to notice the restriction this assumption imposes relative to the litera-

ture. The simultaneous variables model allows for dependence between the error term and

simultaneous variables at the expense of restricting the interaction between the endogenous

regressor and the error term being a function of the simultaneous variables only. In contrast,

the IV model requires dependence between the endogenous regressor and the instrumental

variables, which are restricted to be uncorrelated with the error term. Therefore, the new

method is able to allow the additional variables to still be correlated with the error term

at the cost of requiring the interaction not to be related to the endogenous variable. As

a result, the difference between our proposed model and traditional IV approach rests on

different model specifications; researchers fail to identify parameters if an incorrect method

is employed to control for the endogeneity in each case.

Assumption 1 (ii) can be interpreted in an alternative form. In particular, it can be

restated as

x2ε = Zφ+ u, with E(u | z,x) = 0. (6)

This format provides an auxiliary equation which models endogeneity bias using simultaneous

variables, and the condition E(u | z,x) = 0 states that the innovation u should have

conditional mean zero given z and x. This requirement relates to standard exogeneity

condition in the IV literature, which requires that the innovation term in the first-stage

equation needs to be uncorrelated with instrumental variables.

Although the condition φ 6= 0 is not a formal requirement, it implicitly arises because of

the fact that if φ = 0, there is no endogeneity problem and the whole exercise is unnecessary.

In addition, Assumption 1 (ii) states that E(x>u) = 0. The intuition behind this condition is

that u must be uncorrelated with x2 as well as Z where x2 is the product of x2 and x2 with x1

11

netted out. One can notice that E(u|z,x1, x2) = 0 is a sufficient condition for E(x>u) = 0.

Although the former condition is stronger than the latter, it might be easier to interpret

in applications. The square seems non-intuitive at first glance but it can be explained as

follows. The identification fails if the unobserved component u is still correlated with x22

(after x1 is netted out), which means that ε is still linearly related to x2 in a way that

cannot be captured by Z. Assumption 2 is a standard rank condition. Non-singularity of

E(x>1 x1) is necessary for the identification of β1 and non-singularity of E(z>x) is required

for the identification of α. The later states that the simultaneous variables are not linearly

related to the endogenous regressor.

We now return to the general structural equation (2) and general identification. For

the sake of clarity, we first focus on exactly identified model motivated by the conditional

moment restriction of equation (5).2 We consider the over-identified case in Section 4.2

below. The following theorem formalizes the identification results of θ, with θ ≡ [β>1 ,α>]>

and α ≡ [β>2 ,φ>]>.

Theorem 1 Suppose Assumption 1 holds. Then, θ is fully identified with

α = E(z>x)−1E(z>y), β1 = E(x>1 x1)−1E(x>1 y)− E(x>1 x1)−1E(x>1 x2)β2,

if and only if Assumption 2 holds.

Proof. In Appendix A.

2.3 Comparison with IV approach

This section discusses the differences between the proposed simultaneous variables and the

IV approaches. Suppose that x2 is endogenous in (2). For the standard IV case, in the

2Based on E(ρ(x2ε,Z;θ) | z,x) = 0, where ρ(·, ·;θ) = x2(y − xβ) − Zφ, we obtain the orthogonalitycondition E(zρ(x2ε,Z;θ)) = 0. A consistent estimator of θ can be achieved by imposing the sample analogueof the population orthogonality condition.

12

first-stage we have

x2 = x1π1 + zπ2 + w,

where w is an unobserved component. The instrumental variable, z, is a valid instru-

ment when it satisfies two conditions. First, it is correlated with the endogenous variable,

Cov(x2, z) 6= 0, after partialling out x1. Second, the instrument is required to be uncorre-

lated with the unobserved causes of the dependent variable, E(ε | z) = 0, after partialling

out x1.3 Assumption 1 (ii) plays a similar role to the first IV condition above such that

an instrumental variable should be related to the endogenous variable, i.e. π2 6= 0. The

proposed method, however, requires the simultaneous variables to capture information on

the interaction term. In addition, the IV requires that E(w | x1, z) = 0. This is similar to

the simultaneous variables condition presented in equation (6), E(u | x, z) = 0.

Regarding the second IV restriction, E(ε | z) = 0, when it is violated, the IV approach

fails to solve the endogeneity problem. However, under the assumptions for our methodology

z could still be used as a simultaneous variable to control for endogeneity. Assumption 1 (ii)

allows z to be correlated with the unobserved causes of the dependent variable, ε, by explicitly

modeling the endogeneity. The main trade-off is that the simultaneous variables method

requires the interaction between the endogenous regressor and the error term to be a function

of the simultaneous variables only. Our approach is also different from the conditioning

instrumental variables in the model of Chalak and White (2011), where in this case x2 and ε

would be uncorrelated once we condition on this type of extended instrument. Furthermore,

our method relies on different model and assumptions than the proxy variable approach. In

the proxy variables approach, condition E[ε | x, z] = g(z) controls for endogeneity of x2

through the equation y = x1β1 + x2β2 + g(z) +w, where E[w | x, z] = 0. It is worth noting

3Strictly speaking, E(εz) = 0 is sufficient for the linear IV model. Similarly, Assumption 1 (ii) for theproposed model can be relaxed to weaker condition which is written as unconditional expectation. But theconditional version is adopted for better motivation and coherence.

13

that our condition E[x2ε | x, z] = g(z) is instead based on the interaction between x2 and ε.

3 Estimation and inference

In this section, we develop a simultaneous variables estimator and present the details on its

practical implementation. We then study its asymptotic properties by establishing consis-

tency and asymptotic normality. We also develop inference procedures based on the Wald

statistic. Finally, we propose a simple test for endogeneity.

3.1 Estimation

Given the identification result in Theorem 1, we are able to estimate the parameters of

interest. We construct an estimator which is simple to implement in practice. Recall that

θ ≡ [β1,α]> with α ≡ [β2,φ]>. An estimator of θ is as following

α =

(1

n

n∑i=1

z>i xi

)−1(1

n

n∑i=1

z>i yi

), (7)

β1 =

(1

n

n∑i=1

x>1ix1i

)−1(1

n

n∑i=1

x>1iyi

)−

(1

n

n∑i=1

x>1ix1i

)−1(1

n

n∑i=1

x>1ix2i

)β2, (8)

where z is the set of variables generated by conditioning variables, x and y are sample

analogs of x and y, which are obtained by replacing the expectations with sample means,

and where β2 is the first element of α.

The implementation of the proposed estimator in practice is simple and can be carried

through a sequence of OLS estimations as follows. First, compute the variables x and y. To

calculate y, one first partials out the exogenous regressors by computing the errors from a

OLS regression of y on x1, then multiply those by x2. Computation of x is analogous. Second,

estimate α using equation (7) and z, the set of variables generated by the conditioning

variables. Finally, given α and consequently β2, β1 can be estimated from OLS as in equation

14

(8), by using the coefficients of the OLS regression of y on x1 and also the coefficients of the

regression of x2 on x1. These generated variables affect the asymptotic variance-covariance

matrix (see e.g. Pagan, 1984), as shown in the derivation of the asymptotic normality below.

3.2 Asymptotic theory

Now we establish the asymptotic properties of the estimator described above. In particular,

we show its consistency and asymptotic normality. Denote Q ≡ E(z>i xi), C1 ≡ E(x>1ix1i),

and C2 ≡ E(x>1ix2i). The limiting behavior of the estimator is summarized in the following

result.

Theorem 2 Let assumptions of Theorem 1 hold and the observations (yi,xi,Zi); i = 1, 2, ..., n

be i.i.d. across i and their fourth moments exist, i.e., E(‖yi‖4) < ∞, E(‖xi‖4) < ∞, and

E(‖Zi‖4) <∞. Then, as n→∞,

αp→ α.

In addition, we have that

√n(α−α)

d→ N(0, Q−1MQ−1),

with M = V ar(z>u − Gr(δ) + Hs(γ)), where G, r(δ), H, and s(γ) are defined in the proof.

Moreover,

β1p→ β1,

and

√n(β1 − β1)

d→ N(0,C−11 C2Vβ2C

>2 C

−11 ),

where Vβ2 is the variance of β2.

Proof. In Appendix A.

15

The results given in Theorem 2 show that the limiting distribution of the proposed

estimator is standard. This result is the foundation for construction of inference procedures.

3.3 Inference

Inference is very important in practice. Given the results in Theorem 2, general hypotheses

on the vector θ can be accommodated by Wald-type tests. We focus on linear restrictions.

The Wald statistic and associated limiting theory provide a natural foundation for testing

hypotheses as Rθ = r, where R is a full-rank matrix imposing q restrictions on the pa-

rameters, and r is a column vector of q elements. The Wald statistic can be constructed

as

Wn = n(Rθ − r)>[RΩR>]−1(Rθ − r), (9)

where Ω is a consistent estimator of the variance-covariance matrix of Ω.

Under the null hypothesis H0 : Rθ = r, the statistic Wn is asymptotically χ2q with q-

degrees of freedom, where q is the rank of the matrix R. The limiting distribution of the

test is summarized in the following result.

Corollary 1 Under H0 : Rθ(τ) = r, and the conditions of Theorem 2,

Wna∼ χ2

q.

Proof. Given a consistent estimator Ω, the proof is a direct application of Theorem 2.

In practice, to carry inference and apply a Wald test in (9) one needs a consistent

estimator of the asymptotic variance-covariance matrix. As described in the above re-

sult, to estimate the asymptotic variance-covariance matrix, we need to estimate both

V ar(α) = Q−1MQ−1/n and V ar(β1) = C−11 C2Vβ2C

>2 C

−11 /n. The later is easily recovered

from its sample counterparts, that is, C1 = n−1∑n

i=1 x>1ix1i and C2 = n−1

∑ni=1 x

>1ix2i. Now,

16

notice that Vβ2 is the first element of the variance-covariance matrix V ar(α). Finally, for the

estimation of the variance-covariance matrix of α one can consider its sample counterpart

such as V ar(α) = Q−1MQ−1/n with

Q =1

n

n∑i=1

z>i xi,

M =1

n

n∑i=1

(z>i ui − Gri(δ) + Hsi(γ)

)(z>i ui − Gri(δ) + Hsi(γ)

)>where

ui = yi − xiα,

G =1

n

n∑i=1

(z>i ∇δxiα) =1

n

n∑i=1

(z>i [−x2ix1i, 0, 0]α),

H =1

n

n∑i=1

(z>i ∇γ yi) =1

n

n∑i=1

(z>i (−x2ix1i)),

ri(δ) =

(1

n

n∑i=1

x>1ix1i

)−1 (x>1i(x2i − x1iδ)

),

si(γ) =

(1

n

n∑i=1

x>1ix1i

)−1 (x>1i(yi − x1iγ)

),

δ =

(1

n

n∑i=1

x>1ix1i

)−1(1

n

n∑i=1

x>1ix2i

),

and

γ =

(1

n

n∑i=1

x>1ix1i

)−1(1

n

n∑i=1

x>1iyi

).

The practical estimation of these quantities is simple. As for the point estimates, it only

relies on simple OLS regressions.

17

4 Extensions

4.1 Nonparametric model

We have focused on a simple linear parametric model for the sake of clear exposition and

motivation. Nevertheless, the model we consider can be extended to more general additively

separable nonparametric models. The structural model (1) is now rewritten as

yi = f(x1i, x2i) + εi, i = 1, ..., n, (10)

where x2 is endogenous and ε is an innovation term. Then, from the equations (4) and (10)

we obtain

E(x2(y − f(x1, x2)) | z,x) = g(z),

or

E(x2y | z,x) = f(x1, x2) + g(z) (11)

= h(w),

where f(x1, x2) ≡ x2f(x1, x2) and w ≡ [x1, x2, z]. This implies that the function f(x1x2),

can be obtained from an additive regression of x2y on (x1, x2) and z. Thus, identification of

the function of interest, f(x1, x2), rests on the identification of f(x1, x2) through f(x1, x2) =

f(x1, x2)/x2.

Identification of f(x1, x2) can be analyzed by a similar argument of Theorem 2.1 in Newey,

Powell, and Vella (1999). To be specific, uniqueness of a conditional expectation is equivalent

to the statement that any additive function f ∗(x1, x2) + g∗(z) satisfying equation (11) must

have Prob(f ∗(x1, x2) + g∗(z) = f(x1, x2) + g(z)) = 1. Thus, identification implies equality

of the additive components up to a constant. Intuitively speaking, failure of identification

implies a functional relationship between the random vector (x1, x2) and z. Therefore, a

18

sufficient condition for identification is the absence of an exact relationship between these

random variables. We summarize the identification result in the following Lemma.

Lemma 1 f(x1, x2) is identified, up to an additive constant, if and only if Prob(δ(x1, x2) +

λ(z) = 0) = 1 implies there is a constant cf with Prob(δ(x1, x2) = cf ) = 1.

Proof. The result follows from a similar argument to Theorem 2.1 in Newey, Powell, and

Vella (1999).

4.2 General forms of endogeneity

We have characterized the endogeneity based on the linear dependence (i.e., expectation of

the product of x2 and ε). More generally, modeling endogeneity can be defined in terms

of any nonlinear dependence between x2 and ε. To be precise, instead of equation (4), we

assume

E(Λ(x2, ε;β) | z,x) = g(z;φ), (12)

where Λ(x2, ε;β) is a known measurable function of (x2, ε) up to an unknown vector of

parameters β and where g(z;φ) is a known measurable function of z up to an unknown

vector of parameters φ. We focus on a nonlinear parametric model as follows:

yi = f(x1i, x2i;β) + εi, i = 1, ..., n,

where f(x1, x2;β) is a known measurable function of (x1, x2) up to an unknown vector of

parameters β.

Let Ψ(y,x, z;θ) ≡ Λ(x2, y−f(x1, x2;β);β)−g(z;φ) with θ ≡ [β>,φ>]>. Then equation

(12) can be rewritten as the following moment condition,

E(Ψ(y,x, z;θ) | z,x) = 0.

19

Therefore, the orthogonality conditions can be rewritten as

E(zΨ(y,x, z;θ)) = 0, (13)

where z is a set of variables generated by the conditioning variables (z,x). It is important to

notice that the model in equation (13) allows for over-identified models where the dimension

of z is larger than the dimension of regressors. Assuming global identification of parameters

and full rank condition for matrices associated with Assumption 2, a minimum-distance

estimator of the parameter θ based on equation (13) is obtained by solving the following

minimization problem

θ ≡ argminθnhn(vi;θ)>Whn(vi;θ),

where W is a consistent estimator of a positive definite matrix W , v ≡ [y,x, z], and

hn(vi;θ) =1

n

n∑i=1

ziΨ(yi,xi, zi).

Asymptotic properties of the above estimator can be established by using standard econo-

metric results (see, e.g., Newey, 1990).

5 Monte Carlo simulations

We use simulation experiments to assess the finite sample performance of the proposed si-

multaneous variables (SV) estimator. For comparison, we also report results for the standard

ordinary least squares (OLS) and the instrumental variables (IV) estimators. The result-

ing estimates are compared in terms of bias, median absolute deviation (MAD) and root

mean square error (RMSE). The simulation experiments are based on 5,000 replications and

sample sizes of n ∈ 100, 200, 500, 1000.

We consider a simple data generating process (DGP) as

yi = β0 + β1x1i + β2x2i + εi, i = 1, 2, ..., n,

20

with β0 = β1 = β2 = 1, with covariates generated as x1 = v1 + v2 and x2 = v2 + v3, where

vj ∼ i.i.d.N(0, 1), j = 1, 2, 3.

First, we generate a model to evaluate the relative performance of the SV estimator when

both x1 and x2 variables are exogenous. Thus, there is no need to correct for endogeneity.

In this case we generate ε = v4, v4 ∼ i.i.d.N(0, 1), such that there is no correlation between

ε and x1 or x2. In this scenario, we expect the standard OLS estimator, which does not

make use of any additional variables for estimation, of both β1 and β2 to be approximately

unbiased. Both the SV and IV use additional variables. So, we construct different sets

of variables to study these estimators. In all the simulation experiments, the structural

equation is kept the same and the SV and IV models differ only in the DGP of z. In this

first set of simulations, two different cases are considered. We construct a simultaneous

variable zsv = v2v4 + v5 (Model SV), and an instrumental variable ziv = v2 + v5 (Model IV),

where v5 ∼ i.i.d.N(0, 1). Then, we use the two variables, zsv, ziv, alternatively for the

different estimators, that is, in the role of SV or IV for both estimators. It is important to

notice that zsv contains v2 and v4 and hence is correlated with x2ε. The rationale for such an

experiment is to evaluate what would be the effect of using a variable for the corresponding

estimator and (wrongly) for another estimator, when there is no endogeneity.

Table 1 reports the simulation results for β2 for this case. Note that all estimators are

unbiased and consistent when the correct variable is used. As expected OLS is consistent,

but also is the SV estimator when zsv is used as simultaneous variable and IV when ziv is

used as instrument. Also note that the SV estimator is also consistent when any variable is

used, i.e. zsv, ziv. However, the IV estimator only works when a proper instrument is used,

and produces biased estimates with significant variance when an incorrect instrument is used

instead. That is, if we use zsv as instrument, then by construction, it is not correlated with

x2 (that is, Cov(zsv, x2) = Cov(v2v4 + v5, v2 + v3) = 0), and then the IV estimator variance

21

is large with a consequent effect on MAD and RMSE. On the other hand, SV works with

both zsv, ziv because it does not affect the β estimators when φ = 0.

[TABLE 1 ABOUT HERE]

Second, consider a DGP model with endogeneity. In this case, we generate the innovation

term as ε = v3 + v4, where v4 ∼ i.i.d.N(0, 1). This model induces endogeneity because

corr[x1, x2] = corr[x2, ε] = 0.5 while corr[x1, ε] = 0. Given the correlation between the

x2 regressor and the error term in the DGP, the standard OLS estimators of both β1 and

β2 should be biased, and in particular, E[βOLS2 ] = 53> 1 so that the OLS bias is 2

3. As

in the previous case, in all the simulation experiments, the structural equation is kept the

same and the models differ only in the DGP of z. Three different cases are considered. Let

v5 ∼ i.i.d.N(0, 1). In Model SV1, we generate a DGP following a simultaneous variable

framework. From the additional equation x2ε = v23 + v2v3 + v2v4 + v3v4, we set z = v2

3 + v2v3.

Note that corr[z, x2ε − z] = 0 and corr[x22, x2ε − z] = 0. Thus, Model SV1 satisfies the

simultaneous variables requirements, and the proposed estimator is expected to be consistent.

In Model SV2, we consider z = 1 + x2ε + v5. In this case, z is a noisy measure of x2ε.

Finally, Model IV is designed to satisfy the IV method requirements. We construct z as

an instrumental variable such as z = v2 + v5. In this case, the IV estimator is expected to

perform the best. These simulation results (for β2) are presented in Table 2.


The results for models SV1 and SV2 show that, as expected, the proposed SV performs

very well in terms of bias, MAD and RMSE. The proposed estimator is approximately

unbiased, even for small sample sizes, and the bias, MAD and RMSE decrease as the sample

size increases. In contrast, OLS and IV estimators are largely biased, and their biases do

22

not disappear for large n. Finally, Model IV is favorable to the IV estimator. In this case,

the IV estimator is approximately unbiased while OLS and SV are not.

Overall, the results show evidence that the SV performs very well in finite sample sim-

ulations when a proper simultaneous variable is used. The SV estimator is able to reduce

the endogeneity bias when a simultaneous variable exists that is related to the joint inter-

action of the endogenous variable and the innovations. The last two experiments also show

that there are fundamental differences between our estimator and other methods for solving

endogeneity such as IV. In fact, the IV estimator works only for Model IV.

6 Empirical applications

6.1 Investment model

Fazzari, Hubbard, and Petersen (1988) consider an investment equation model, where a firm’s

investment is regressed on an observed measure of investment demand (Tobin’s q) and cash

flow. Following Fazzari, Hubbard, and Petersen (1988), investment–cash-flow sensitivities

became a standard metric that examines the impact of financing imperfections on corporate

investment (Stein, 2003). These empirical sensitivities are also used for drawing inferences

about efficiency in internal capital markets (Lamont, 1999; Shin and Stulz, 1998), the effect of

agency on corporate spending (Hadlock, 1998; Bertrand and Mullainathan, 2005), the role of

business groups in capital allocation (Hoshi, Kashyap, and Scharfstein, 1991), and the effect

of managerial characteristics on corporate policies (Bertrand and Schoar, 2003; Malmendier

and Tate, 2005). Therefore, the econometric model is represented by the following equation

yit = γ + αcit + βq∗it + ηit, (14)

where yit ≡ Iit/Kit, where I denotes investment and K capital stock, q∗ is marginal Tobin’s

q, cit ≡ CFit/Kit, where CF is cash flow, and η is the unobserved causes of the investment,

23

which can be interpreted as containing, among other factors, a firm specific economic shock

to investment. This shock may affect firms differently because of different perceptions on

the aggregate conjecture. As a result, η is a pure idiosyncratic shock to the investment.

Concerns about measurement errors (ME) have been emphasized in the context of the

investment equation model. Theory suggests that the correct measure for a firm’s invest-

ment demand is captured by marginal q, but this quantity is unobservable and researchers

use instead its measurable counterpart, average q. Because average q measures marginal q

imperfectly, a measurement problem naturally arises (Hayashi, 1982; Poterba, 1988).

Now we consider the structure of the ME on q∗ more carefully. We use a classical ME

framework to formalize the discussion. Let eit denote the Tobin’s q measurement error.

Assume that eit has zero mean, finite variance σ2e and is independent of q∗it. Thus, the

average q is given by

qit = q∗it + eit. (15)

From plugging equation (15) into the investment model in equation (14), we obtain

yit = γ + αcit + β(qit − eit) + ηit

= γ + αcit + βqit + εit, (16)

where εit ≡ ηit − βeit. Therefore, when q∗ is unobserved and a mismeasured q is used as a

regressor, ME induce endogeneity, i.e., q is correlated with ε due to ME.

Therefore, in the standard investment model in (14), if q∗ is measured with error, OLS

estimates of β will be biased. In addition, given that average q and cash flow are likely to

be correlated, the coefficient α is likely to be biased as well. These biases are expected to

be reduced by the use of estimators that solve the ME problem. It has been common in

the literature to employ instrumental variables estimators to resolve the problem (see, e.g.,

Almeida, Campello, and Galvao, 2010; Lewellen and Lewellen, 2014).

24

In order to better evaluate the performance of the alternative estimators in practice, we

describe some hypotheses about the effects of measurement error correction on the estimated

coefficients β and α from equation (14). Theory does not pin down the exact values that

these coefficients should take. Nevertheless, one could argue that the two following conditions

should be reasonable.

First, an estimator that solves ME in q∗ in a standard investment equation should return

a higher estimate for β and a lower estimate for α when compared with standard OLS

estimates. ME cause an attenuation bias on the estimate for the coefficient β. In addition,

since q and cash flow are likely to be positively correlated, ME should cause an upward bias

on the empirical estimate of α under the standard OLS estimation. Thus, denoting the OLS

and the measurement error corrected (MEC) estimates, respectively, by (βOLS, αOLS) and

(βMEC , αMEC), one should expect:

Condition 1: βOLS < βMEC and αOLS > αMEC .

Second, one would expect the coefficient for q to be positive and the cash flow coefficient

to be non-negative after controlling for ME. The q-theory of investment predicts a positive

correlation between investment and q (e.g. Hayashi, 1982). Under this theory, the cash

flow coefficient should be zero (“neoclassical view”), after controlling for ME. However, in

practice the cash flow coefficient could be positive because of either the presence of financing

frictions (e.g. Fazzari, Hubbard, and Petersen, 1988), or the fact that cash flow captures

variation in investment opportunities even after we apply a correction for mismeasurement

in q.4 Therefore, one should observe:

Condition 2: βMEC > 0 and αMEC ≥ 0.

4However, financial constraints are not sufficient to generate a strictly positive cash flow coefficient becausethe effect of financial constraints is capitalized in stock prices and may thus be captured by variations in q(Chirinko, 1993; Gomes, 2001).

25

Notice that these conditions are fairly intuitive. If a particular ME corrected estimator

does not deliver these basic results, one should have reasons to question the usefulness of

that estimator in applied work.

One potential solution to the endogeneity problem is the IV method. Conventional IV

approaches to solve the measurement error in investment equation models employ lags of qit

as instruments for qit.

In order to make the instruments valid, such that the IV approach removes the bias,

assumptions on the dynamics of measurement errors need to be imposed. Almeida, Campello,

and Galvao (2010) discuss such condition in details. In the simplest case, if the measurement

error eit in equation (15) is i.i.d. across firms and time, and qit is serially correlated, then, for

example, qit−2, qit−3, or (qit−2−qit−3) are valid as instruments for qit, because, for instance, in

equation (16) qit−2 is correlated with qit but uncorrelated with εit. The resulting instrumental

variables estimator is consistent.

There are two simple ways to relax the standard assumption of i.i.d. measurement errors.

First, one can consider an assumption of time-invariant autocorrelation. Then twice-lagged

levels of the observable variables can be used as instruments. Second, under the first-order

moving average structure (i.e., MA(1)) for ME, the consistency of the IV estimator requires

the use of instruments that are lagged two periods or longer. Additionally, the identification

requires the latent regressor to have some degree of autocorrelation (since lagged values are

used as instruments). See Biorn (2000) for more details.

However, the approach of using lagged qit as IV’s fails when ME on the marginal q are

systematic. In practice, it is highly likely that current-period measurement error is correlated

with the first-order or higher-order lags of the measurement error. This phenomenon of

autocorrelation in ME seems to be more realistic in practice because systematic errors on a

way of evaluating average q would be highly likely to be persistent. Then, in general eit could

26

depend on eit−1, eit−2, eit−3, .... Under the presence of autocorrelation in ME (e.g., AR(1)),

the IV approach will not work since the estimates are biased due to the dependence of the

instrument (qit−1) and the current-period measurement error (eit) through the first-order lag

of measurement error (eit−1).

In this paper we solve the ME endogeneity problem by explicitly modeling the conditional

expectation of the interaction qitεit in (16) given a set of simultaneous variables zit. In

other words, we model E(qitεit|zit, cit, qit) as a function of observables, zit, and estimate the

parameters of interest consistently. In this section, we show that, contrary to IV, even under

the assumption that the marginal q and the ME are correlated, lagged Tobin’s q and its

square are valid simultaneous variables, zit. This result holds because we can show that

these simultaneous variables are simultaneously related to the endogenous variable q and

the error term ε. Therefore, we use the proposed methods with the following specification

for Assumption 1 (ii) and a set of observable variables zit such that

E(qitεit|zit, cit, qit) = Zitφ, (17)

where Zit = [qit−1, q2it−1].

Now we discuss the choice of these the simultaneous variables Z and show they are

related to the interaction term qε. The ME bias in the OLS estimator arises from the fact

that E[qitεit] 6= 0. Note that

qitεit = (q∗it + eit)(ηit − βeit) = (q∗it + eit)ηit − β(q∗iteit + e2it). (18)

From the maximization of firm’s profit, we can derive the linear investment model (14) by

assuming linearly homogenous cost function. As a result, η is a pure idiosyncratic shock to

the investment and the model implies that there is no correlation between (q∗, e) and η.

Furthermore, η can be assumed to be i.i.d. and with zero mean in equation (14). Thus, our

main concern is the second term in equation (18), (q∗iteit + e2it).

27

We need to show that the simultaneous variables, lagged Tobin’s q and its square, are

valid variables, that is, they are related to the interaction term qitεit. To establish the

result, let both q∗ and e be autocorrelated such that lags of q can be used to eliminate the

endogeneity. Because both the endogenous variable q and the error term ε are functions of

q∗ and e, a polynomial function of the lags of q may be able to model (q∗iteit + e2it). Thus,

the main idea is to model this explicitly.

The data are taken from COMPUSTAT and cover 1970 to 2005, and the data collection

process follows that of Almeida and Campello (2007). The sample consists of manufacturing

firms with fixed capital of more than $ 5 million (with 1976 as the base year for the cpi), and

the sample firms have growth of less than 100% in both assets and sales. Summary statistics

for investment, q, and cash flow are presented in Table 3. These statistics are similar to

those reported by Almeida and Campello (2007), among other papers. To save space, we

omit the discussion of these descriptive statistics.


We estimate the above investment equation using OLS, instrumental variables (IV), and

simultaneous variables models. The results for OLS and IV are summarized in Table 4 and

for the simultaneous variables in Table 5. For the IV we use different combinations of lags of

Tobin’s q as instruments. In the same fashion, we use lags of the regressors as simultaneous

variables.

Regarding the OLS estimates, we obtain the standard result in the literature that both

q and CF attract positive coefficients which are statistically significant (see Table 4). In the

OLS specification, we obtain a q coefficient of 0.0626, and a cash flow coefficient of 0.1307,

which are likely to be biased. These results are consistent with those in the literature. In

particular, Erickson and Whited (2000) and Almeida, Campello, and Galvao (2010) report

28

very similar results for OLS estimates.

Now we consider the IV estimates. We first test the hypothesis of no first-order autocorre-

lation in the residuals of the simple OLS regression using a simple Durbin-Watson test (DW).

We find that the coefficient for the AR(1) model (of the residuals) is 0.474 with standard er-

ror of 0.007, and the DW statistic is 1.051, which rejects the null hypothesis of no first-order

autocorrelation. This result evidences that lags of q are likely invalid instruments for the IV

case. In spite of the testing results, the point estimates are in Table 4. As expected, following

the IV procedure, we obtain estimates for q and cash flow that do not satisfy the conditions

discussed above, although they are statistically significant. In particular, IV estimators do

not reduce the cash flow coefficient relative to that obtained by the standard OLS, while the

q coefficients even decrease. For example, in the third column of Table 4 (IV2), which uses

qt−1 and q2t−1 as instruments, the q coefficient is 0.0496 and CF coefficient is 0.1333. We

also examine the robustness of the estimators to variations in the set of instruments used

in the estimation, including sets that feature longer lags of the variables and the squares of

them in the model. Columns (IV3)-(IV8) of Table 4 present these results and show that the

q and CF coefficients from IV estimators do not improve substantially with the selection of

instruments used. When the first-order of q is not included as instrument, the q coefficients

are even worse. Therefore, given the concern about measurement error, these results show

that the IV estimates are inconsistent with Conditions 1 and 2 above, and there is strong

evidence that the instrumental variables approach is not able to solve the ME problem in

this investment equation example.

The results for the simultaneous variables case are presented in Table 5. All coefficients

for q and CF are statistically different from zero. By comparison, the simultaneous variables

approach delivers results that are consistent with Conditions 1 and 2. The q coefficient is

almost twice as large as the standard OLS estimate, depending on the set of variables used.

29

At the same time, the cash flow coefficient goes down by about 10% from the standard OLS

value. In particular, the results in column S2, using qt−1 and q2t−1 as simultaneous variables,

show that the q coefficient increases from 0.0626 for OLS to 0.1107 for the simultaneous

variables approach, while the cash flow coefficient drops from 0.1307 for OLS to 0.1208 for

the simultaneous variables approach. These results suggest that the proposed estimator

does a fairly reasonable job at addressing the ME problem. It is worth noting that choosing

proper simultaneous variables is important to obtain consistent estimates. When qt−1 and

q2t−1 are chosen (column S2), the proposed estimator obtains the most reasonable results.

When other further lags are added (column S8), the results do not change substantially.

When qt−1 and q2t−1 are omitted, however, the q coefficients reduce. This issue is analogous

to the instrumental variables approach of which consistency heavily relies on the selection of

valid instruments. As a result, it would be interesting and important to study the optimal

selection of simultaneous variables. We leave it as a future research. Nevertheless, the

estimates from our proposed approach are always consistent with Conditions 1 and 2, under

different sets of simultaneous variables.5



In all, our empirical tests support the idea that simultaneous variables estimators are

likely to outperform the IV estimators in economics and finance applications in which auto-

correlated measurement error is a relevant concern.

5For the robustness check of the performances of the proposed estimator, higher-order moments such asthe cubes of the lags of q have been included, but they do not change the results.

30

6.2 Returns to education

This section illustrates another use of the new proposed estimator. One of the most com-

monly studied topics in labor economics is the impact of education on earnings. The problem

of measuring returns to education is an important research area in economics with a very

large literature on the subject. For examples of comprehensive studies, see Card (1995),

Card (1999), Harmon and Oosterbeek (2000), and Harmon, Oosterbeek, and Walker (2003).

The large volume of research in this area has been explained by both the interest in the

causal effect of education on earnings and the inherent difficulty in measuring this effect.

The difficulty arises due to the potential endogenous relationship between education and

earnings. In particular, unobserved individual ability is correlated with both individual’s ed-

ucation and wage, thus inducing bias in standard OLS estimates of the impact of education

on earnings.

There are several different procedures to estimate returns to education. The most com-

mon strategies to solve the endogeneity of education are the IV and the proxy estimators.

The former tries to find a variable that is correlated with education and uncorrelated with

the unobserved characteristics of the individual (see e.g. Angrist and Krueger, 1991; Harmon

and Walker, 1995; Blundell, Dearden, and Sianesi, 2005). The latter assumes the availability

of a proxy variable for ability and includes it as a regressor in the structural equation (see e.g.

Griliches, 1976; Blackburn and Neumark, 1992; Ashenfelter and Krueger, 1994; Blackburn

and Neumark, 1995; Ashenfelter and Rouse, 1998).

To start our analysis, we first specify the functional form of the wage equation we aim

to estimate. Consider the following structural Mincerian model6

log(wagei) = xiβ1 + β2edui + εi, (19)

6See e.g. Mincer (1970) and Mincer (1974).

31

where edu is years of schooling, x denotes a vector of other covariates, and ε unobserved

causes of wage. In particular, x contains 1, exper, age, tenure, tenure-squared, married,

south, urban, black, where exper is labor market experience, age years of age, tenure job

tenure, married a dummy variable for married individuals, south a dummy variable for

southern region, urban a dummy variable for living in a SMSA, and black a race indicator.

In the Mincer specification the unobserved causes of wage, ε, capture unobservable in-

dividual characteristics. However, those individual factors may also influence the schooling

decision, and hence induce a correlation between schooling and the unobserved causes of

wage in equation (19). A common example is unobserved ability. As mentioned above, this

problem has concerned the empirical literature since the earliest contributions. The variable

years of schooling is endogenous because of the omitted ability. Since the ability is positively

correlated with the level of education, the simple OLS is biased upwards if the ability is

omitted.

The inclusion of direct measures of ability should reduce the estimated education coeffi-

cient if the included proxy for ability is valid. In this case, since ability is used as a control,

the coefficient on education captures the effect of education alone. For example, if we assume

that ε = γabil + e, where e is independent of abil, x, and edu, and IQ is a valid proxy for

abil, then IQ solves the endogeneity problem.

However, these assumptions are controversial. For example, these conditions are in fact

implicitly assuming that wage and education are not affected by another common cause such

as personality. Latent personality traits could include Openness, Conscientiousness, Ex-

traversion, Agreeableness, and Neuroticism (OCEAN). Gensowski, Heckman, and Savelyev

(2011), for instance, quantify the personality traits using a three-step estimation procedure

and show that they have significant and meaningful impacts on earnings. In the case wage

and education are indeed affected by another common cause, IQ would not be an effective

32

proxy for both ability and personality, and might deliver biased results. Of course, IQ is

not a good instrumental variable either because it is correlated with the unobserved causes

of wage which include ability, according to psychological theory and empirical findings.

Moreover, as noted in Hansen, Heckman, and Mullen (2004), schooling have a significant

effect on achievement test scores. Thus past IQ results might be correlated with both abil

and edu.

Our model proposes an alternative way to describe the relationship between education

and unobserved causes of wage using Z, the simultaneous variable. From our model we

assume that

eduiεi = φ0 + φ1IQi + φ2IQ2i + ui, (20)

where (IQ, IQ2) are the simultaneous variables Z, and u is uncorrelated with Z and edu

which is the product of edu and edu with x netted out. Note that, given these conditions,

we are able to solve the endogeneity problem.

We now discuss the plausibility of the conditions imposed for the identification of the

parameters. Assumption 1 (i) states that a set of covariates, x, are exogenous in equation

(19), which is commonly imposed in the literature. Assumption 2 is easily satisfied if IQ

and IQ2 are not linearly related to the squared education. Assumption 1 (ii) implies that

E([IQ, IQ2] u) = 0 and E(edu u) = 0. For these conditions to be satisfied IQ should capture

the correlation between education and ability in a way such that u is a pure idiosyncratic

shock to the wage.

The data set is from National Longitudinal Survey (NLS) on the wages of working men

of the first round of Young Men Cohort of the NLS using its 1980 survey. The data is the

same as in Blackburn and Neumark (1992). This cohort was first surveyed in 1966 with 5,225

respondents at ages 14-24, and the same set of young men was resurveyed at one-year intervals

thereafter. The reason to use this data set is its inclusion of two measures of ability: IQ and

33

KWW , which are the scores from two intelligence tests. IQ refers to the intelligence quotient

which is the result of a standardized test intended to measure intelligence in individuals. The

IQ scores were collected as part of a survey of the respondents’ schools conducted in 1968.

The KWW test examines respondents’ knowledge about the labor market, covering the

duties, educational attainment, and relative earnings of ten occupations. The KWW tests

were also administered as part of the initial survey. See Griliches (1976) for a discussion of

these two measures of ability and how they are used to estimate returns to education.

The variables used are summarized in Table 6 (the interpretation of the variables is similar

to Blackburn and Neumark, 1992). The summary statistics is also presented in Table 6 and

contains 935 observations of individuals aged 28-38 and years of schooling 9-18.

[Table 6 ABOUT HERE]

The econometric results appear in Tables 7, 8, and 9. The results for the OLS and IV

approach are presented in Table 7, the first column describes OLS and columns IV1 to IV6

display results from different selection of instrumental variables for the IV approach. In Ta-

bles 8 and 9, columns P1 to P6 and those S1 to S6 present results for the proxy variables and

simultaneous variables models, based on different selection of proxy variables and simultane-

ous variables, respectively. For OLS, IV, and proxy, we use White heteroskedasticity-robust

standard errors in parentheses. For simultaneous variables estimator, we use the variance

estimator described above.

The standard OLS estimate of the returns to education is 0.0611, which is upward biased

according to the theory on the omitted-variable bias. However, all estimates of the returns

to education from IV approach are larger than 0.1. Thus, as expected, IV approach does not

solve the omitted variable bias. The proxy variable model estimates the returns to education

smaller than OLS, which reduces to 0.0487 when IQ is included (column P1) in the structural

34

equation as a proxy variable, 0.0550 when KWW is added (column P3), 0.0461 when both

are added together (column P5), and 0.0437 when the squares of this variables are also

considered (column P6). Thus, the inclusion of the proxy variables in level reduces the returns

of education by about 24%. When we include all the polynomials of these variables up to

the second-order power as proxy variables, the coefficients reduce even further. Nevertheless,

estimated coefficients of the proxy variables in most of cases are not statistically significant.

For instance, although the estimate of the returns to education reaches the minimum, 0.0437,

when the levels and the squares of both IQ and KWW are included (column P6), estimated

coefficients of the level and the square of IQ are not statistically significant. Meanwhile,

when the levels and the squares of KWW are included, estimates of their coefficients are

statistically significant, but the estimated returns to education is 0.0538 (column P4).


Applying the estimator proposed in this paper, estimation of the returns to education

produces smaller estimates than OLS. The estimates range from 0.0438 to 0.0569, depending

on the specification used. For example, when we compare the results using the model con-

taining both IQ and KWW , together with its squares (column S6), the estimated coefficient

of the education is 0.0438, which represents a decrease of about 28% from the simple OLS.

As expected, these estimates from the proposed estimator support the theory suggesting that

education is positively correlated to ability, and therefore, OLS estimates are upward biased.

It is worth to note that the estimates of the returns to education from simultaneous variables

and the proxy variables are both smaller than the OLS estimate, but slightly different to

each other. For instance, when the specification in which estimated coefficients of either

simultaneous or proxy variables are statistically significant is considered, simultaneous vari-

ables and the proxy variables models produce different estimates of the returns to education:

0.0467 from simultaneous variables estimator (column S2), 0.0538 from the proxy variable

35

model (column P4). In this case, the estimate from S2 is much smaller than that from

P2. One possible explanation for the difference could be the case that personality would be

other unobserved cause of wage which is correlated with education. If this is true, the proxy

variable model cannot control for the omitted variable bias from both ability and person-

ality, by only using one observed proxy variable for ability. On the contrary, simultaneous

variables could still control for the endogeneity under some mild conditions. However, since

comparison of the two approaches should be conducted based on model selection associated

with proxy and simultaneous variables, in general, one need to be careful to conclude that

they are significantly different in this example.



There are other important points regarding the estimates of the simultaneous variables.

Note that the estimation of the coefficients of the exogenous regressors, x, is affected. For

instance, most of estimated coefficients of age from the proxy variable model are not sta-

tistically significant. But those from simultaneous variables estimator are all statistically

significant and magnitudes are bigger than OLS or the proxy variable model. Thus, our

approach captures positive effects of age on wage, which coincide with human capital theory

(e.g. Lillard, 1977; Huggett, Ventura, and Yaron, 2006). Furthermore, there are large gaps

between estimated coefficient of black from different approaches. The black coefficient is

estimated as -0.1909 in OLS (column OLS), and its estimate is reduced to -0.1763 in the

proxy variable model (column P4). The difference among these can be interpreted as the fact

that controlling for unobserved ability reduces the race gap. However, column S2 of Table

9 produces a coefficient estimate of -0.2037, and therefore it means that race discrimination

was in fact larger than predicted from the proxy variables model.

36

7 Conclusion

This paper proposes a new methodology for solving endogeneity bias in econometric models.

The main identification condition explicitly models endogeneity bias in a direct way as a

function of additional variables. These, however, play a different role from the instrumental

variables, and hence the new proposal can be viewed as a complement to the IV. Under

the specified conditions, we establish identification of the parameters of interest, propose an

estimator and derive its consistency and asymptotic normality. We also develop inference

procedures based on the Wald statistics. Furthermore, we illustrate the usefulness of the

proposed methods by revisiting the investment equation model and returns to education.

The scope of potential applications using the new techniques is large because it relies on

mild conditions for identification of parameters.

Many issues remain to be investigated in future research. There are many variants of the

model that would extend the structure for the simultaneous variables. A natural extension

would be to panel data. The analysis of the performance of the methods and more general

testing procedures is also important direction. Applications to program evaluation studies

would appear to be a natural laboratory for further development of simultaneous variables

methodology.

37

Appendix

A. Proof of the Theorems

Proof of Theorem 1

Note that from Assumption 1, E(x2ε − Zφ | z,x) = E(x2(y − x1β1 − x2β2) − Zφ |

z,x) = E(x2y − x2x1β1 − x2x2β2 − Zφ | z,x) = E(x2y − x2x1(E(x>1 x1)−1E(x>1 y) −

E(x>1 x1)−1E(x>1 x2)β2)− x2x2β2 −Zφ | z,x) = E(x2(y − x1E(x>1 x1)−1E(x>1 y))− x2(x2 −

x1E(x>1 x1)−1E(x>1 x2))β2 − Zφ | z,x) = E(y − xα | z,x) = 0. We then have E(z>(y −

xα)) = 0 or E(z>y) = E(z>x)α. Also note that from E(x>1 ε) = 0, E(x>1 (y − x1β1 −

x2β2)) = 0, E(x>1 y)− E(x>1 x1)β1 − E(x>1 x2)β2 = 0. This system admits a unique solution

θ if and only if E(z>x) and E(x>1 x1) are non-singular (Assumption 2). Q.E.D.

Proof of Theorem 2

Let x ≡ [x2(x2 − x1δ),Z] and x ≡ [x2(x2 − x1δ),Z] where δ ≡ E(x>1 x1)−1E(x>1 x2) and

δ ≡(

1n

∑ni=1 x

>1ix1i

)−1 ( 1n

∑ni=1 x

>1ix2i

). Also let y ≡ x2(y−x1γ) and y ≡ x2(y−x1γ) where

γ ≡ E(x>1 x1)−1E(x>1 y) and γ ≡(

1n

∑ni=1 x

>1ix1i

)−1 ( 1n

∑ni=1 x

>1iyi).

From yi = xiα+ ui, where ui is i.i.d. innovation, we have

yi + (yi − yi) = (xi + xi − xi)α+ ui,

yi = xiα + ui − (xi − xi)α+ (yi − yi). (21)

Also we have

α =

(1

n

n∑i=1

z>i xi

)−1(1

n

n∑i=1

z>i yi

)

= α+

(1

n

n∑i=1

z>i xi

)−1(1

n

n∑i=1

z>i (ui − (xi − xi)α+ (yi − yi))

).

38

Then

√n(α−α) = Q−1n−1/2

n∑i=1

z>i (ui − (xi − xi)α+ (yi − yi)), (22)

where Q ≡ 1n

∑ni=1 z

>i xi. By Chebychev’s LLN and Slutsky’s theorem,

Q ≡ 1

n

n∑i=1

z>i xip→ E(z>i xi) ≡ Q.

As considered in Pagan (1984), equation (21) contains generated regressors and generated

dependent variables. So we need to consider errors from these approximations in equation

(22).

First, since E(z>i ui) = 0, we have

n−1/2

n∑i=1

z>i ui = op(1).

Second, by a mean value expansion,

n−1/2

n∑i=1

z>i (xi − xi)α =

[n−1

n∑i=1

z>i ∇δxiα

]√n(δ − δ) + op(1),

= G√n(δ − δ) + op(1),

where G = E[z>i ∇δxiα].

Third, a similar argument gives us

n−1/2

n∑i=1

z>i (yi − yi) =

[n−1

n∑i=1

z>i ∇γ yi

]√n(γ − γ) + op(1),

= H√n(γ − γ) + op(1),

where ∇γ yi = −x2x1 and H = E[z>i ∇γ yi].

Note that from the definition δ, we can write the following Bahadur representation

√n(δ − δ) =

√n

n∑i=1

ri(δ) + op(1),

39

where ri(δ) =(

1n

∑ni=1 x

>1ix1i

)−1 (x>1i(x2i − x1iδ)

), and E[ri(δ)] = 0 by the Law of Iterated

Expectations (LIE). In the same way, given the definition of γ, we can write the following

representation

√n(γ − γ) =

√n

n∑i=1

si(γ) + op(1),

where si(γ) =(

1n

∑ni=1 x

>1ix1i

)−1 (x>1i(yi − x1iγ)

), and E[si(γ)] = 0 by LIE.

By combining all terms together, we have

√n(α−α) = Q−1

n−1/2

n∑i=1

[z>i ui −Gri(δ) +Hsi(γ)]

+ op(1).

For the consistency of α, we have

αp→ α+Q−1 · 0 = α.

For the asymptotic normality, we have that by the Lindeberg-Levy Central Limit Theorem,

√n(α−α)

d→ Q−1N(0,M) ≡ N(0, Q−1MQ−1),

where M = V ar(zi>ui −Gri(δ) +Hsi(γ)).

Similarly,

β1 =

(1

n

n∑i=1

x>1ix1i

)−1(1

n

n∑i=1

x1i(x1iβ1 + x2iβ2 + ε)

)−

(1

n

n∑i=1

x>1ix1i

)−1(1

n

n∑i=1

x1ix2i

)β2

= β1 +

(1

n

n∑i=1

x>1ix1i

)−1(1

n

n∑i=1

x>1ix2i

)β2 −

(1

n

n∑i=1

x>1ix1i

)−1(1

n

n∑i=1

x>1ix2i

)β2

= β1 −

(1

n

n∑i=1

x>1ix1i

)−1(1

n

n∑i=1

x>1ix2i

)(β2 − β2).

By Chebychev’s LLN,

C1 ≡1

n

n∑i=1

x>1ix1ip→ E(x>1ix1i) ≡ C1,

C2 ≡1

n

n∑i=1

x>1ix2ip→ E(x>1ix2i) ≡ C2,

40

we have

β1p→ β1 −C−1

1 C2Q−1β2· 0 = β1,

where Qβ2 is the element in the Q matrix that corresponds to the estimation of β2.

Note that

√n(β1 − β1) = −

(1

n

n∑i=1

x>1ix1i

)−1(1

n

n∑i=1

x>1ix2i

)√n(β2 − β2).

Thus, we have

√n(β1 − β1)

d→ C−11 C2N(0, Vβ2) ≡ N(0,C−1

1 C2Vβ2C>2 C

−11 ).

Q.E.D.

B. Extension to Multiple Endogenous Variables

The proposed methodology can be easily extended to the case of multiple endogenous

variables. Suppose that there are multiple endogenous causes, x2. We then have the following

model for each individual

y = x1β1 + x2β2 + ε, (23)

where x2 ≡ [x21, x22, ..., x2p2 ] is a p2-dimensional vector of endogenous regressors and β2 ≡

[β21, β22, · · · , β2p2 ]>

is a p2-dimensional vector of corresponding coefficients. Suppose that

the following equations hold. x21εx22ε

...x2p2ε

=

Z1φ1 + u1

Z2φ2 + u2...

Zp2φp2 + up2

, (24)

where (Z1,Z2, · · · ,Zp2) are simultaneous variables for each equation and where E(uj |

zj,x1, x2j) = 0 for j = 1, 2, ..., p2. Let each Zj be of dimension kj and φj of dimension

kj for j = 1, 2, ..., p2, and K = k1 + k2 + ... + kp2 . This give us a total of p1 + p2 + K

parameters to be estimated.

41

These parameters will be estimated by the same number of moment conditions. First,

using exogeneity of x1, p1 moment conditions

E(x>1 ε) = 0.

Then plugging equation (23) into (24) givesx21(y − x1β1 − x2β2)x22(y − x1β1 − x2β2)

...x2p2(y − x1β1 − x2β2)

=

Z1φ1 + u1

Z2φ2 + u2...

Zp2φp2 + up2

.Using E(x>1 ε) = 0, we have β1 = E(x>1 x1)−1E(x>1 y)−E(x>1 x1)−1E(x>1 x2)β2. Thus we

obtainx21(y − x1(E(x>1 x1)−1E(x>1 y)− E(x>1 x1)−1E(x>1 x2)β2)− x2β2)x22(y − x1(E(x>1 x1)−1E(x>1 y)− E(x>1 x1)−1E(x>1 x2)β2)− x2β2)

...x2p2(y − x1(E(x>1 x1)−1E(x>1 y)− E(x>1 x1)−1E(x>1 x2)β2)− x2β2)

=

Z1φ1 + u1

Z2φ2 + u2...

Zp2φp2 + up2

,orx21(y − x1E(x>1 x1)−1E(x>1 y))x22(y − x1E(x>1 x1)−1E(x>1 y))

...x2p2(y − x1E(x>1 x1)−1E(x>1 y))

=

x21(x2 − x1E(x>1 x1)−1E(x>1 x2))β2 +Z1φ1 + u1

x22(x2 − x1E(x>1 x1)−1E(x>1 x2))β2 +Z2φ2 + u2...

x2p2(x2 − x1E(x>1 x1)−1E(x>1 x2))β2 +Zp2φp2 + up2

.By rearranging the equations we have

y1

y2...yp2

=

x1α1 + u1

x2α2 + u2...

xp2αp2 + up2

,where yj ≡ x2j(y − x1E(x>1 x1)−1E(x>1 y)) and xj ≡ [x2j(x2 − x1E(x>1 x1)−1E(x>1 x2)),Zj],

and αj ≡ [β>2 ,φ>j ]> for j = 1, 2, ..., p2.

As a result, we have the following system of equations

Y = Xα+ u,

42

where

Y =

y1

y2...yp2

, X =

x1 0 · · · 00 x2 · · · 0...

.... . .

...0 0 · · · xp2

,u =

u1

u2...up2

with α ≡ [α>1 ,α

>1 , ...,α

>p2

]>.

Finally, consider the following p2 +K moment conditionsE(x>1 u1) = 0E(x>2 u2) = 0

...E(x>p2up2) = 0

,or equivalently

E(X>u) = 0.

As seen in the case for a single endogenous regressor, conditions similar to Assumptions

1 and 2 for each endogeneous cause are required to identify the parameters β1 and α.

43

References

Almeida, H., and M. Campello (2007): “Financial Constraints, Asset Tangibility and

Corporate Investment,” Review of Financial Studies, 20, 1429–1460.

Almeida, H., M. Campello, and A. Galvao (2010): “Measurement Errors in Invest-

ment Equations,” Review of Financial Studies, 23, 3279–3328.

Altonji, J. G., T. E. Elder, and C. R. Taber (2005): “Selection on Observed and Un-

observed Variables: Assessing the Effectiveness of Catholic Schools,” Journal of Political

Economy, 113, 151–184.

(2008): “Using Selection on Observed Variables to Assess Bias from Unobservables

When Evaluating Swan-Ganz Catheterization,” American Economic Review Papers and

Proceedings, 98, 345–350.

Angrist, J., and A. Krueger (1991): “Does Compulsory School Attendance Affect

Schooling and Earnings,” Quarterly Journal of Economics, 106, 979–1014.

(2001): “Instrumental Variables and the Search for Identification: From Supply

and Demand to Natural Experiments,” Journal of Economic Perspectives, 15, 69–85.

Ashenfelter, O., and A. Krueger (1994): “Estimates of the Economic Return to

Schooling from a New Sample of Twins,” American Economic Review, 84, 1157–1173.

Ashenfelter, O., and C. Rouse (1998): “Income, Schooling, and Ability: Evidence

From a New Sample of Identical Twins,” Quarterly Journal of Economics, 113, 253–284.

Bertrand, M., and S. Mullainathan (2005): “Bidding for Oil and Gas Leases in the

Gulf of Mexico: A Test of the Free Cash Flow Model?,” mimeo, The University of Chicago

and MIT.

44

Bertrand, M., and A. Schoar (2003): “Managing with Style: The Effect of Managers

on Firm Policies,” Quarterly Journal of Economics, 118, 1169–1208.

Biorn, E. (2000): “Panel Data with Measurement Errors: Instrumental Variables and

GMM Procedures Combining Levels and Difference,” Econometric Reviews, 19, 391–424.

Blackburn, M., and D. Neumark (1992): “Unobserved Ability, Efficiency Wages, and

Interindustry Wage Differentials,” Quarterly Journal of Economics, 107, 1421–1436.

(1995): “Are OLS Estimates of the Return to Schooling Biased Downward? Another

Look,” Review of Economics and Statistics, 77, 217–230.

Blundell, R., L. Dearden, and B. Sianesi (2005): “Evaluating the Effect of Education

on Earnings: Models, Methods and Results from the National Child Development Survey,”

Journal of the Royal Statistical Society: Series A, 168, 473–512.

Bound, J., D. A. Jaeger, and R. M. Baker (1995): “Problems With Instrumental

Variables Estimation When the Correlation Between the Instruments and the Endogenous

Explanatory Variables is Weak,” Journal of the American Statistical Association, 90, 443–

450.

Caetano, C. (2015): “A Test of Exogeneity without Instrumental Variables,” mimeo,

University of Rochester.

Card, D. (1995): “Earnings, Schooling, and Ability Revisited,” in Research in Labor Eco-

nomics, ed. by S. Polachek, vol. 14. JAI Press.

(1999): “The Causal Effect of Education on Earnings,” in Handbook of Labor

Economics, ed. by O. Ashenfelter, and D. Card, vol. 3. Amsterdam: Elsevier.

Chalak, K. (2012): “Identification of Average Random Coefficients under Magnitude and

Sign Restrictions on Confounding,” mimeo, University of Virginia.

45

Chalak, K., and H. White (2011): “An Extended Class of Instrumental Variables for

the Estimation of Causal Effects,” Canadian Journal of Economics, 44, 1–51.

Chen, X. (2007): “Large Sample Sieve Estimation of Semi-nonparametric Models,” in

Handbook of Econometrics, ed. by J. J. Heckman, and E. E. Leamer, vol. 6B. North-

Holland.

Chirinko, R. (1993): “Business Fixed Investment Spending: Modeling Strategies, Empir-

ical Results, and Policy Implications,” Journal of Economic Literature, 31, 1875–1911.

Conley, T. G., C. B. Hansen, and P. E. Rossi (2012): “Plausibly Exogenous,” Review

of Economics and Statistics, 94, 260–272.

Erickson, T., and T. Whited (2000): “Measurement Error and the Relationship between

Investment and Q,” Journal of Political Economy, 108, 1027–1057.

Fazzari, S., R. G. Hubbard, and B. Petersen (1988): “Financing Constraints and

Corporate Investment,” Brooking Papers on Economic Activity, 1, 141–195.

Gandhi, A., K. Kim, and A. Petrin (2013): “Identification and Estimation in Discrete

Choice Demand Models when Endogenous Variables Interact with the Error,” University

of Minnesota, mimeo.

Gensowski, M., J. Heckman, and P. Savelyev (2011): “The Effects of Education,

Personality, and IQ on Earnings of High-Ability Men,” mimeo, The University of Chicago.

Gomes, J. (2001): “Financing Investment,” American Economic Review, 91, 1263–1285.

Griliches, Z. (1976): “Wages of Very Young Men,” Journal of Political Economy, 84,

69–86.

46

Hadlock, C. (1998): “Ownership, Liquidity, and Investment,” RAND Journal of Eco-

nomics, 29, 487–508.

Hahn, J., and J. Hausman (2002): “A New Specification Test for the Validity of Instru-

mental Variables,” Econometrica, 70, 163–189.

Hansen, K. T., J. J. Heckman, and K. J. Mullen (2004): “The Effect of Schooling

and Ability on Achievement Test Scores,” Journal of Econometrics, 121, 39–98.

Harmon, C. P., and H. Oosterbeek (2000): “The Returns to Education: A Review

of the Evidence, Issues and Deficiencies in the Literature,” Centre for the Economics of

Education, LSE.

Harmon, C. P., H. Oosterbeek, and I. Walker (2003): “The Returns to Education:

Microeconomics,” Journal of Economic Surveys, 17, 115–156.

Harmon, C. P., and I. Walker (1995): “Estimates of the Economic Return to Schooling

for the United Kingdom,” American Economic Review, 85, 1278–1286.

Hausman, J. A. (1983): “Specification and Estimation of Simultaneous Equation Models,”

in Handbook of Econometrics, ed. by Z. Griliches, and M. D. Intriligator, vol. II, pp.

391–448. North-Holland Press.

Hayashi, F. (1982): “Tobin’s Marginal q and Average q: A Neoclassical Interpretation,”

Econometrica, 50, 213–224.

Hoshi, T., A. Kashyap, and D. Scharfstein (1991): “Corporate Structure, Liquid-

ity, and Investment: Evidence from Japanese Industrial Groups,” Quarterly Journal of

Economics, 106, 33–60.

Huggett, M., G. Ventura, and A. Yaron (2006): “Human Capital and Earnings

Distribution Dynamics,” Journal of Monetary Economics, 53, 265–290.

47

Kiviet, J. F. (2012): “Identification and Inference in a Simultaneous Equation under Alter-

native Information Sets and Sampling Schemes,” The Econometrics Journal, forthcoming.

Klein, R., and F. Vella (2010): “Estimating a Class of Triangular Simultaneous Equa-

tions Models without Exclusion Restrictions,” Journal of Econometrics, 154, 154–164.

Lamont, O. (1999): “Cash Flow and Investment: Evidence from Internal Capital Markets,”

The Journal of Finance, 52, 83–110.

Levinsohn, J., and A. Petrin (2003): “Estimating Production Functions Using Inputs

to Control for Unobservables,” Review of Economic Studies, 70, 317–342.

Lewbel, A. (2012): “Using Heteroskedasticity to Identify and Estimate Mismeasured and

Endogenous Regressor Models,” Journal of Business and Economic Statistics, 30, 67–80.

Lewellen, J., and K. Lewellen (2014): “Investment and Cashflow: New Evidence,”

Journal of Financial and Quantitative Analysis, Forthcoming.

Lillard, L. (1977): “Inequality: Earnings vs. Human Wealth,” American Economic Re-

view, 67, 42–53.

Malmendier, U., and G. Tate (2005): “CEO Overconfidence and Corporate Invest-

ment,” The Journal of Finance, 60, 2661–2700.

Mincer, J. (1970): “The Distribution of Labor Incomes: A Survey with Special Reference

to the Human Capital Approach,” Journal of Economic Literature, 8, 1–26.

(1974): Schooling, Experience, and Earnings. Columbia University Press, New

York, NY.

Nevo, A., and A. M. Rosen (2012): “Identification with Imperfect Instruments,” Review

of Economics and Statistics, 94, 659–671.

48

Newey, W. K. (1990): “Efficient Instrumental Variables Estimation of Nonlinear Models,”

Econometrica, 58, 809–837.

Newey, W. K., J. L. Powell, and F. Vella (1999): “Nonparametric Estimation of

Triangular Simultaneous Equations Models,” Econometrica, 67, 565–603.

Olley, S., and A. Pakes (1996): “The Dynamics of Productivity in the Telecommunica-

tions Equipment Industry,” Econometrica, 64, 1263–1298.

Pagan, A. (1984): “Econometric Issues in the Analysis of Regressions with Generated

Regressors,” International Economic Review, 25, 221–247.

Poterba, J. (1988): “Financing Constraints and Corporate Investment: Comment,”

Brookings Papers on Economic Activity, 1, 200–204.

Shin, H., and R. Stulz (1998): “Are Internal Capital Markets Efficient?,” Quarterly

Journal of Economics, 113, 531–552.

Stein, J. (2003): “Agency Information and Corporate Investment,” in Handbook of the Eco-

nomics of Finance, ed. by M. Harris, and R. Stulz. Elsevier/North-Holland, Amsterdam.

Wooldridge, J. (2002): Econometric Analysis of Cross Section and Panel Data. MIT

Press, London.

(2009): “On Estimating Firm-Level Production Functions Using Proxy Variables

to Control for Unobservables,” Economics Letters, 104, 112–114.

49

Table 1: DGP with no endogeneity. Monte Carlo re-sults: Bias, Median Absolute Deviation, and Root MeanSquared Error

EstimatorModel OLS IV SV OLS IV SV

n = 100 n = 200Bias

zsv -0.001 3.641 0.002 0.001 -1.104 0.002ziv -0.001 0.002 -0.003 0.001 0.000 0.000

MADzsv 0.056 1.056 0.070 0.041 1.091 0.050ziv 0.055 0.166 0.085 0.040 0.115 0.060

RMSEzsv 0.083 253.644 0.106 0.059 119.844 0.075ziv 0.082 0.416 0.126 0.059 0.186 0.090

n = 500 n = 1000Bias

zsv 0.000 9.609 0.000 0.000 -2.191 0.001ziv 0.000 0.000 0.000 0.000 0.002 0.000

MADzsv 0.025 1.041 0.032 0.018 1.048 0.022ziv 0.024 0.075 0.040 0.018 0.052 0.027

RMSEzsv 0.037 797.869 0.048 0.026 101.951 0.034ziv 0.036 0.115 0.058 0.026 0.080 0.042

Notes: Monte Carlo experiments are based on 5,000 repetitions. Estimates of β2. zsv and ziv are,

respectively, the used simultaneous variables and instrumental variables employed in the estimations.

50

Table 2: DGP with endogeneity. Monte Carlo re-sults: Bias, Median Absolute Deviation, and Root MeanSquared Error

EstimatorModel OLS IV SV OLS IV SV

n = 100 n = 200Bias

zsv1 0.667 0.851 -0.138 0.668 0.644 -0.097zsv2 0.666 -21.8 -0.019 0.667 0.755 -0.017ziv 0.668 -0.078 0.808 0.667 -0.037 0.805

MADzsv1 0.668 0.943 0.205 0.669 0.947 0.143zsv2 0.664 1.202 0.045 0.666 1.199 0.027ziv 0.665 0.236 0.806 0.668 0.170 0.804

RMSEzsv1 0.674 18.1 0.307 0.671 63.462 0.225zsv2 0.673 1557.8 0.085 0.670 25.6 0.057ziv 0.675 2.737 0.825 0.670 0.281 0.814

n = 500 n = 1000Bias

zsv1 0.666 -0.161 -0.056 0.667 0.570 -0.031zsv2 0.668 0.510 -0.009 0.666 1.140 -0.005ziv 0.667 -0.015 0.801 0.667 -0.003 0.802

MADzsv1 0.666 0.895 0.089 0.667 0.921 0.060zsv2 0.668 1.179 0.014 0.667 1.185 0.009ziv 0.668 0.168 0.841 0.667 0.073 0.802

RMSEzsv1 0.668 58.2 0.141 0.667 20.719 0.094zsv2 0.669 51.1 0.030 0.667 24.214 0.017ziv 0.669 0.168 0.805 0.667 0.110 0.804

Notes: Monte Carlo experiments are based on 5000 repetitions. Estimates of β2. zsv1, zsv2, and ziv

are, respectively, the used simultaneous variables 1 and 2, and instrumental variables employed in the

estimations.

51

Table 3: Investment example. Descriptive statistics

Variable Mean Std. Dev. Min Max ObsInvestment 0.2026 0.1348 0.0000 2.5349 24676

q 0.8755 0.4900 0.1307 22.7939 24676

Cash flow 0.3193 0.6212 -80.3571 14.8021 24676

Table 4: Investment example. OLS and Instrumental Variables approaches (using qt−1, q2t−1

qt−2, q2t−2, qt−3, and q2

t−3 as IV)

OLS IV1 IV2 IV3 IV4 IV5 IV6 IV7 IV8q 0.0626*** 0.0478*** 0.0496*** 0.0325*** 0.0373*** 0.0241*** 0.0288*** 0.0470*** 0.0488***

( 0.003 ) ( 0.004 ) ( 0.004 ) ( 0.005 ) ( 0.005 ) ( 0.006 ) ( 0.006 ) ( 0.004 ) ( 0.004 )CF 0.1307*** 0.1337*** 0.1333*** 0.1368*** 0.1358*** 0.1385*** 0.1375*** 0.1338*** 0.1335*

( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 )Const. 0.1053*** 0.1165*** 0.1151*** 0.1282*** 0.1245*** 0.1345*** 0.1310*** 0.1171*** 0.1157

( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.004 ) ( 0.004 ) ( 0.005 ) ( 0.005 ) ( 0.003 ) ( 0.003 )qt−1 0.8348*** 1.0091*** 0.8133*** 1.0149***

( 0.004 ) ( 0.012 ) ( 0.007 ) ( 0.016 )q2t−1 -0.0699*** -0.0818***

( 0.004 ) ( 0.005 )qt−2 0.5393*** 0.8266*** -0.0140* -0.0461***

( 0.004 ) ( 0.009 ) ( 0.008 ) ( 0.013 )q2t−2 -0.0806*** 0.0070**

( 0.002 ) ( 0.003 )qt−3 0.2829*** 0.4717*** 0.0304*** 0.0526***

( 0.003 ) ( 0.006 ) ( 0.004 ) ( 0.006 )q2t−3 -0.0308*** -0.0023***

( 0.001 ) ( 0.001 )

Notes: Robust standard errors in parentheses. Coefficients and standard errors for the IV are from the first stage regression.*** p<0.01, ** p<0.05, * p<0.1.

52

Table 5: Investment example. Simultaneous variable approach (using qt−1, q2t−1, qt−2, q2

t−2,qt−3, and q2

t−3 as simultaneous variables)

S1 S2 S3 S4 S5 S6 S7 S8q 0.0938*** 0.1107*** 0.0896*** 0.0898*** 0.0804*** 0.0802*** 0.0969*** 0.1129***

( 0.026 ) ( 0.028 ) ( 0.022 ) ( 0.022 ) ( 0.019 ) ( 0.019 ) ( 0.026 ) ( 0.028 )CF 0.1243*** 0.1208*** 0.1252*** 0.1251*** 0.1270*** 0.1271*** 0.1237*** 0.1204***

( 0.005 ) ( 0.006 ) ( 0.004 ) ( 0.004 ) ( 0.004 ) ( 0.004 ) ( 0.005 ) ( 0.006 )Const. 0.0816*** 0.0687*** 0.0848*** 0.0846*** 0.0918*** 0.0920*** 0.0792*** 0.0671***

( 0.020 ) ( 0.021 ) ( 0.016 ) ( 0.016 ) ( 0.014 ) ( 0.015 ) ( 0.020 ) ( 0.021 )qt−1 -0.0684** 0.0766*** -0.0350 0.0915***

( 0.029 ) ( 0.022 ) ( 0.028 ) ( 0.030 )q2t−1 -0.0664*** -0.0606***

( 0.017 ) ( 0.017 )qt−2 -0.0546*** -0.0248 -0.0260* -0.0142

( 0.017 ) ( 0.019 ) ( 0.015 ) ( 0.026 )q2t−2 -0.0085 -0.0042

( 0.005 ) ( 0.008 )qt−3 -0.0282*** -0.0253** -0.0093 -0.0071

( 0.009 ) ( 0.011 ) ( 0.009 ) 0.014 )q2t−3 -0.0005 0.0007

( 0.001 ) ( 0.003 )

Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.

Table 6: Education example. Descriptive statistics

Abbreviation Variable Mean Std. Dev. Min Max Obswage monthly earnings 957.95 404.36 115 3078 935IQ IQ score 101.28 15.05 50 145 935

KWW knowledge of world work score 35.74 7.64 12 56 935edu years of education 13.47 2.20 9 18 935exper years of work experience 11.56 4.37 1 23 935age age in years 33.08 3.11 28 38 935

tenure years with current employer 7.23 5.08 0 22 935married =1 if married 0.89 0.31 0 1 935south =1 if live in south 0.34 0.47 0 1 935urban =1 if live in SMSA 0.72 0.45 0 1 935black =1 if black 0.13 0.33 0 1 935lwage log(wage) 6.78 0.42 4.74 8.03 935

53

Table 7: Education example. OLS and Instrumental Variables approaches (using IQ, IQ2,KWW , and KWW 2 as IV variables)

OLS IV1 IV2 IV3 IV4 IV5 IV6

edu 0.0611*** 0.1104*** 0.1103*** 0.1055*** 0.1122*** 0.1089*** 0.1121***( 0.007 ) ( 0.015 ) ( 0.015 ) ( 0.019 ) ( 0.019 ) ( 0.014 ) ( 0.013 )

exper 0.0112*** 0.0253*** 0.0253*** 0.0239*** 0.0259*** 0.0249*** 0.0258***( 0.004 ) ( 0.005 ) ( 0.005 ) ( 0.006 ) ( 0.006 ) ( 0.005 ) ( 0.005 )

age 0.0085* -0.0016 -0.0016 -0.0006 -0.0020 -0.0013 -0.0019( 0.005 ) ( 0.006 ) ( 0.005 ) ( 0.006 ) ( 0.006 ) ( 0.005 ) ( 0.005 )

tenure 0.0272*** 0.0181** 0.0181** 0.0190** 0.0177** 0.0184** 0.0178**( 0.008 ) ( 0.009 ) ( 0.009 ) ( 0.009 ) ( 0.009 ) ( 0.009 ) ( 0.009 )

tenure2 -0.0010** -0.0005 -0.0005 -0.0005 -0.0004 -0.0005 -0.0004( 0.000 ) ( 0.001 ) ( 0.001 ) ( 0.001 ) ( 0.001 ) ( 0.001 ) ( 0.001 )

married 0.1934*** 0.2055*** 0.2055*** 0.2043 *** 0.2060 *** 0.2052 *** 0.2059 ***( 0.039 ) ( 0.040 ) ( 0.040 ) ( 0.040 ) ( 0.041 ) ( 0.040 ) ( 0.040 )

south -0.0909*** -0.0823*** -0.0823*** -0.0832*** -0.0820*** -0.0826*** -0.0820***( 0.026 ) ( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 )

urban 0.1852*** 0.1722*** 0.1722*** 0.1735*** 0.1717*** 0.1726*** 0.1718***( 0.027 ) ( 0.028 ) ( 0.028 ) ( 0.028 ) ( 0.028 ) ( 0.028 ) ( 0.028 )

black -0.1909*** -0.1475*** -0.1475*** -0.1518*** -0.1459*** -0.1488*** -0.1460 ***( 0.038 ) ( 0.041 ) ( 0.040 ) ( 0.042 ) ( 0.042 ) ( 0.040 ) ( 0.040 )

Cons. 5.1730*** 4.6962*** 4.6965*** 4.7436*** 4.6784*** 4.7108*** 4.6796***( 0.158 ) ( 0.208 ) ( 0.206 ) ( 0.238 ) ( 0.237 ) ( 0.200 ) ( 0.198 )

IQ 0.0614*** -0.0674* 0.0506*** -0.0731**(0.004) (0.036) (0.004) (0.035)

IQ2 0.0006*** 0.0006***(0.000) (0.000)

KWW 0.0983*** -0.0080 0.0595*** -0.0435(0.009) (0.054) (0.009) (0.050)

KWW 2 0.0015** 0.0015**(0.001) (0.001)

Notes: Robust standard errors in parentheses. Coefficients and standard errors for the IVvariables are from the first stage regression. *** p<0.01, ** p<0.05, * p<0.1.

54

Table 8: Education example. Proxy approach (using IQ, IQ2, KWW , and KWW 2 as proxyvariables)

P1 P2 P3 P4 P5 P6

edu 0.0487*** 0.0479*** 0.0550*** 0.0538*** 0.0461*** 0.0437***( 0.007 ) ( 0.007 ) ( 0.007 ) ( 0.007 ) ( 0.007 ) ( 0.007 )

exper 0.0107*** 0.0107*** 0.0113*** 0.0115*** 0.0108*** 0.0110***( 0.004 ) ( 0.004 ) ( 0.004 ) ( 0.004 ) ( 0.004 ) ( 0.004 )

age 0.0102** 0.0102** 0.0040 0.0023 0.0070 0.0053( 0.005 ) ( 0.005 ) ( 0.005 ) ( 0.005 ) ( 0.005 ) ( 0.005 )

tenure 0.0278*** 0.0276*** 0.0280*** 0.0270*** 0.0283*** 0.0270***( 0.008 ) ( 0.008 ) ( 0.008 ) ( 0.008 ) ( 0.008 ) ( 0.008 )

tenure2 -0.0010** -0.0010** -0.0010** -0.0010** -0.0011** -0.0010**( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 )

married 0.1928*** 0.1950*** 0.1857*** 0.1902*** 0.1878*** 0.1944***( 0.039 ) ( 0.039 ) ( 0.039 ) ( 0.039 ) ( 0.039 ) ( 0.039 )

south -0.0795*** -0.0784*** -0.0918*** -0.0914*** -0.0814*** -0.0797***( 0.026 ) ( 0.026 ) ( 0.026 ) ( 0.026 ) ( 0.026 ) ( 0.026 )

urban 0.1831*** 0.1846*** 0.1774*** 0.1758*** 0.1782*** 0.1777***( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 )

black -0.1429*** -0.1518*** -0.1677*** -0.1763*** -0.1331*** -0.1484***( 0.039 ) ( 0.040 ) ( 0.039 ) ( 0.039 ) ( 0.040 ) ( 0.041 )

Cons. 4.9007*** 5.2920*** 5.2354*** 5.7889*** 4.9728*** 5.8760***( 0.173 ) ( 0.426 ) ( 0.160 ) ( 0.264 ) ( 0.178 ) ( 0.465 )

IQ 0.0038*** -0.0041 0.0034*** -0.0034( 0.001 ) ( 0.008 ) ( 0.001 ) ( 0.008 )

IQ2 0.0000 0.0000( 0.000 ) ( 0.000 )

KWW 0.0050** -0.0246** 0.0033 -0.0273**( 0.002 ) ( 0.011 ) ( 0.002 ) ( 0.011 )

KWW 2 0.0004*** 0.0004***( 0.000 ) ( 0.000 )


55

Table 9: Education example. Simultaneous variable approach (using IQ, IQ2, KWW , andKWW 2 as simultaneous variables)

S1 S2 S3 S4 S5 S6

edu 0.0569*** 0.0467*** 0.0569*** 0.0509*** 0.0569*** 0.0438***( 0.010 ) ( 0.009 ) ( 0.009 ) ( 0.012 ) ( 0.009 ) ( 0.009 )

exper 0.0100*** 0.0071*** 0.0100*** 0.0083** 0.0100*** 0.0063**( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 )

age 0.0094*** 0.0115*** 0.0094*** 0.0106*** 0.0094*** 0.0121***( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 )

tenure 0.0279*** 0.0298*** 0.0279*** 0.0291*** 0.0279*** 0.0303***( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 )

tenure2 -0.0010*** -0.0011*** -0.0010*** -0.0011*** -0.0010*** -0.0012***( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 )

married 0.1923*** 0.1898*** 0.1923*** 0.1908*** 0.1923*** 0.1891***( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.003 ) ( 0.002 ) ( 0.002 )

south -0.0917*** -0.0935*** -0.0917*** -0.0927*** -0.0917*** -0.0939***( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 )

urban 0.1863*** 0.1890*** 0.1863 *** 0.1879 *** 0.1863 *** 0.1898 ***( 0.003 ) ( 0.002 ) ( 0.002 ) ( 0.003 ) ( 0.002 ) ( 0.002 )

black -0.1946*** -0.2037*** -0.1947 *** -0.2000 *** -0.1947*** -0.2061***( 0.009 ) ( 0.008 ) ( 0.008 ) ( 0.010 ) ( 0.008 ) ( 0.008 )

Cons. 5.2137*** 5.3132*** 5.2143*** 5.2724*** 5.2143*** 5.3405***( 0.095 ) ( 0.091 ) ( 0.085 ) ( 0.113 ) ( 0.085 ) ( 0.086 )

IQ 0.0009 -0.0483*** -0.0013 0.0509( 0.002 ) ( 0.018 ) ( 0.056 ) ( 0.044 )

IQ2 0.0005*** -0.3268***( 0.000 ) ( 0.015 )

KWW 0.0027 -0.0845 0.0064 0.0000( 0.008 ) ( 0.197 ) ( 0.159 ) ( 0.000 )

KWW 2 0.0023 0.0053*( 0.005 ) ( 0.003 )


56

Download - Simultaneous Variables - Gabriel Montesgabrielmontes.com.ar/GMRS_87.pdf · ditional variables are de ned as simultaneous variables. We derive identi cation of the parameters, develop

Top Related