Simultaneous Variables∗
Antonio F. Galvao†
Gabriel Montes-Rojas‡
Suyong Song§
June 17, 2015
Abstract
This paper proposes an alternative solution to the endogeneity problem by explic-itly modeling the joint interaction of the endogenous variables and the unobservedcauses of the dependent variable as a function of additional observables. These ad-ditional variables are defined as simultaneous variables. We derive identification ofthe parameters, develop an estimator, and establish its consistency and asymptoticnormality. We illustrate the methods with two examples that examine the investmentequation model and returns to education. Our empirical findings support the idea thatthe proposed estimator is a useful alternative to the instrumental and proxy variablesestimator for economic applications in which endogeneity is a relevant concern.
Keywords: Endogeneity, instrumental variables, simultaneous variables, investment equa-tion, returns to education.
JEL Classification: C10, C26, G31
∗The authors would like to express their appreciation to Carolina Caetano, Karim Chalak, Juan CarlosEscanciano, Sergio Firpo, Amit Gandhi, Zhengyuan Gao, Chuan Goh, Fabio Gomes, Susumu Imai, Kyoo ilKim, Roger Koenker, Joris Pinkse, Alex Poirier, Gene Savin, Liang Wang, and seminar participants at theUniversity of Minnesota, University of Illinois at Urbana-Champaign, Universidad de Buenos Aires, Univer-sity of Iowa, University of Wisconsin-Milwaukee, Central Bank of Brazil, 23rd Midwest Econometrics Group,Universidad de San Andres, University of Pittsburgh, and Royal Economic Society for helpful discussionsand comments regarding this paper. All the remaining errors are ours.†Department of Economics, University of Iowa, W284 Pappajohn Business Building, 21 E. Market Street,
Iowa City, IA 52242. E-mail: [email protected]‡CONICET-Universidad de San Andres, Vito Dumas 275, Victoria, Pcia. de Buenos Aires, Argentina
and Department of Economics, City University London, D316 Social Sciences Bldg, Northampton Square,London EC1V 0HB, UK. Email: [email protected]§Department of Economics, University of Iowa, W360 Pappajohn Business Building, 21 E. Market Street,
Iowa City, IA 52242. E-mail: [email protected]
1
1 Introduction
The problem of endogeneity occupies a substantial amount of research in theoretical and
applied econometrics. The most popular approaches are instrumental variables (IV) (see
e.g. Hausman, 1983; Angrist and Krueger, 2001, for surveys) and control function approach
(see e.g. Olley and Pakes (1996), Levinsohn and Petrin (2003), Wooldridge (2002), and
Wooldridge (2009)). These solutions rely on exogenous information derived from an ad-
ditional exclusion restriction or equation. In applications, the type of restriction chosen
determines the nature of the model to be used, i.e. the instrument or the proxy variable.
However, in many empirical applications, there is frequently disagreement and concern about
the exclusion restrictions imposed, and instruments and proxies selections. The potential
IV are often argued to be invalid since they are still correlated with the error term (see,
e.g., Bound, Jaeger, and Baker, 1995; Hahn and Hausman, 2002) while the conditions for
identification using proxy variables are many times implausible.
This paper contributes to the literature by proposing an alternative solution to the prob-
lem of endogeneity. In doing so, we propose a new identification condition which explicitly
models the endogeneity bias. More specifically, the conditional expectation of the joint
interaction of the endogenous variables and unobserved causes of the dependent variable
is assumed to be a function of additional observable variables. We define such additional
variables as simultaneous variables because they are simultaneously related to both the en-
dogenous variable and the unobservables. Thus, we develop simple and tractable methods
for conducting estimation and inference in econometric models with endogeneity, and allow
the additional variables (the simultaneous variables) to still be correlated with the innova-
tion term. Given the structural model and the simultaneous variables, we establish point
identification of the parameters of interest. Motivated by the identification result we de-
velop an estimator that is simple to implement in practice, and establish its consistency and
2
asymptotic normality. In addition, we propose practical inference procedures based on the
Wald statistic, as well as a test for endogeneity.
The proposed framework can be framed within the control function approach, although
it builds on specific assumptions for identification based on the nature of the simultaneous
variables. Our framework allows for situations in which there are no valid standard IV avail-
able, but there exist additional variables that happen to be related to the joint interaction of
the endogenous variable and the unobserved causes of the dependent variable. The intuition
on the main identification condition of the new procedure is that, by using the proposed
restriction, the econometrician is able to approximate the endogeneity bias using the simul-
taneous variables, such that after controlling for the simultaneous variables, the endogenous
variables are not related to the interaction term (between the endogenous variable and the
innovation). Thus, the endogeneity bias implied by the non-zero conditional expectation of
the interaction term can be effectively specified as a function of the simultaneous variables
only. Strictly speaking, neither the IV or simultaneous method is more general than the
other because the underlying assumptions are non-nested. They differ in the characteristics
of the additional variables; in IV case these cause the endogenous variable only, while in our
proposed method they are allowed to affect the joint interaction of the endogenous variable
and error term.
Monte Carlo studies are conducted to evaluate the finite sample properties of the proposed
estimator. The experiments suggest that when simultaneous variables are available, the
proposed method produces consistent estimates under the required assumptions. Moreover,
the simulations highlight the differences between the simultaneous variables approach and
the IV method. The results suggest that when only simultaneous variables exist, our method
is able to eliminate the endogeneity bias while the IV returns significantly biased estimates.
The new procedures provide intuitive and practical ways of handling the problem of
3
endogeneity in empirical settings. To motivate and illustrate the applicability of the identifi-
cation and estimator, we apply the methods to two empirical examples. First, an investment
equation model where measurement errors are a relevant concern. In particular, we consider
the Fazzari, Hubbard, and Petersen’s (1988) investment equation model, where a firm’s in-
vestment is regressed on observed investment demand (Tobin’s q) and cash flows. Concerns
about measurement errors in Tobin’s q have been emphasized in the empirical investment
equation models context (see e.g., Hayashi, 1982; Poterba, 1988) because researchers only
observe average q instead of marginal q. Hence, the measurement errors induce endogeneity
in the independent variables, and consequently bias in their estimates. We show that our
proposed methods are well suited to solve the existing measurement error problem in this
literature. It has been common in the literature to employ IV methods using the lags of q as
instruments to resolve the endogeneity problem (see e.g., Almeida, Campello, and Galvao,
2010; Lewellen and Lewellen, 2014). However, the approach of using lagged q as IV fails when
the measurement errors on the marginal q are systematic. In practice, it is highly likely that
current-period measurement error is correlated with the first-order or higher-order lags of the
measurement error. This phenomenon of autocorrelation in the measurement errors seems
to be more realistic in practice because systematic errors on a way of evaluating average q
would be highly likely to be persistent. Contrary to the IV, we show that even under the
assumption that the marginal q and measurement error are correlated, the lagged Tobin’s q
and its square are valid simultaneous variables because they are related to the interaction
of the endogenous variable q and the error term. Thus, we can use these simultaneous vari-
ables to remove the bias from the measurement error and obtain consistent estimates for the
parameters of interest. For comparison, we estimate the model using OLS, IV and simulta-
neous variables estimators. Our empirical findings support the idea that the simultaneous
variable estimator is a useful alternative to IV models.
4
In the second example, we estimate returns to education, a topic that has been exten-
sively discussed in the empirical literature. In general, the literature is concerned with the
endogeneity in education because of unobserved causes of wage, e.g. omitted ability. It is
argued that finding valid IV or proxy variables is not an easy task (see e.g. Card, 1999). Our
model proposes an alternative way to control for endogeneity in education by modeling the
dependence of education and the unobserved causes of wage as a function of simultaneous
variables and an innovation term. We argue that IQ test scores can be used as a simul-
taneous variable as it affects both ability and education. The empirical results show that
our proposed approach estimates smaller returns to education than OLS, IV or proxy ap-
proaches. Moreover, it delivers more reasonable estimates than other regression coefficients
such as age and ethnicity. As a result, our method would be a useful alternative to control
for upward bias in the estimation of returns to education.
The proposed methods relate to recent advances in the literature on various alternative
estimation and inference procedures that attempt to solve endogeneity without standard
instrumental variables. In a recent paper, Conley, Hansen, and Rossi (2012) present practical
methods for performing inference while relaxing the IV exclusion restriction. They use prior
information regarding the extent of deviations from the exact exclusion restriction. Chalak
and White (2011) define a new class of extended IV, and introduce notions of conditioning
and conditional extended IV which allow use of non-traditional instruments, as they may be
endogenous. Chalak (2012) achieves identification of parameters by employing restrictions
on the magnitude and sign of confounding instead of using traditional IV. Nevo and Rosen
(2012) provide bounds for the parameters when the standard exogeneity assumption on IV
fails, by assuming the correlation between the instruments and the error term has the same
sign as the correlation between the endogenous regressor and the error term and that the
instruments are less correlated with the error term than is the endogenous regressor. Gandhi,
5
Kim, and Petrin (2013) propose a generalized control function approach for models where the
endogenous variables interact with the error term. Caetano (2015) develops an interesting
test procedure for exogeneity of explanatory variables without relying on IV. The test rests
on an assumption that the structural function needs to be continuous in the explanatory
variable of interest, but it does not require the structural function to be identified under
either the null or the alternative hypotheses. Klein and Vella (2010) propose a control
function estimator to adjust for the endogeneity in the triangular simultaneous equations
model where there are no available exclusion restrictions to generate suitable instruments.
They exploit the dependence of the errors on exogenous variables (e.g. heteroscedasticity) to
adjust the conventional control function estimator. Lewbel (2012) also uses a heteroscedastic
covariance restriction to control for the endogeneity in the absence of instrumental variables.
Bias-corrected OLS estimators are proposed to speculate on the size and direction of the
bias by imposing different assumptions. For instance, this approach is used by Kiviet (2012)
where different values of the correlation between the endogenous variable and the innovations
are used to analyze the properties of different estimators, and by Altonji, Elder, and Taber
(2005, 2008) where the degree of selection on observables is used to assess the bias. Nevo
and Rosen (2012) analyze the case where the instruments may themselves be endogenous,
but the correlation of the instrument with the error term can be used to derive bounds in
which the true parameter lies.
The paper is organized as follows. Section 2 presents the econometric model, the identi-
fication results, and comparison with IV method. Section 3 proposes a consistent estimator,
establishes its asymptotic properties, and develops inference procedures. Section 4 presents
extensions to nonparametric model and a case of general form of endogeneity. Section 5 stud-
ies the finite sample performance of the proposed estimator. Section 6 applies the estimator
to the investment equation model and returns to education. Finally, Section 7 concludes the
6
paper. Technical proofs and extension to multiple endogenous variables are included in the
Appendix.
2 Modeling endogeneity using observables
In this section, we propose a new methodology to solve the endogeneity bias problem. Iden-
tification of the parameters of interest is achieved by modeling the endogeneity bias through
a novel additional equation. We also compare the proposed method with the IV framework.
2.1 The model
Consider the following structural model
E[yi|x1i,x2i] = f(x1i,x2i), i = 1, ..., n, (1)
where yi is a scalar dependent variable, x1i is a p1-vector of exogenous explanatory variables,
x2i is a p2-vector of endogenous regressors, and f(·, ·) is an unknown measurable function.
Define xi = [x1i,x2i] as a (p1 + p2)-vector with the sample covariates. The main focus of
this paper is endogeneity and its solution.
For motivation and exposition purposes, we primarily investigate the endogeneity problem
in linear regression models. Nevertheless, the results generalize to nonparametric models (see
Section 4.1 below for details). Equation (1) can be written as
yi = x1iβ1 + x2iβ2 + εi, i = 1, ..., n, (2)
where β1 is a p1-vector, β2 is a p2-vector, and εi is a scalar innovation term. Define β =
[β>1 ,β>2 ]>. We assume that x2i is endogenous, and correlated with the innovation term εi in
(2), such that E[x>2iεi] 6= 0. In addition, x1i is exogenous with E[x>1iεi] = 0. Thus, we also
assume that
1
n
n∑i=1
x>i εip→ E[x>ε] ≡ B = [0>,B>2 ]>,
7
withB a (p1+p2)-vector (finite), and 1n
∑ni=1 x
>i xi
p→ E[x>x] ≡ C (finite and non-singular).
The endogeneity in x2 produces an endogeneity bias, the term B, in the standard OLS es-
timator, βOLS = ( 1n
∑ni=1 x
>i xi)
−1 1n
∑ni=1 x
>i yi. Simple algebra and asymptotic calculations
show that βOLS1
p→ β1 −C−111 C12δ2, and βOLS2
p→ β2 + δ2, where δ2 = V −12·1 B2. Note that if
the endogeneity term B were known, then the correct estimating equation would be
1
n
n∑i=1
x>i (yi − xiβ) = B, (3)
and its solution is β = ( 1n
∑ni=1 x
>i xi)
−1 1n
∑ni=1(x>i yi − B) = βOLS − ( 1
n
∑ni=1 x
>i xi)
−1B.
However, since theB is unknown, to solve the endogeneity problem we will instead model the
interaction of the endogenous variable and the error term, x2iεi and establish identification
of β under some mild conditions. For simplicity, throughout we consider the case where
p2 = 1, i.e., there is only one endogenous variable, x2i. The extension to multiple endogenous
variables is derived in Appendix B.
2.2 Identification
Identification of the parameters of interest is achieved by explicitly modeling the interaction
of the endogenous variable and the unobserved causes of the dependent variable as a function
of additional observable variables. In particular, motivated by equation (3), we consider the
case where the variable x2ε can be modeled using additional variables. The following equation
formalizes modeling endogeneity
E(x2ε | z,x) = g(z), (4)
where g(·) is an unknown smooth function and z a k-vector of additional observable variables.
This is a general formulation to model the endogeneity in the linear parametric model. A
more general form of endogeneity which takes into account nonlinear dependence between
these two variables is considered in Section 4.2 below. For simplicity, assume that g(·) is a
8
known function of z with unknown parameters φ such as g(z;φ), but the analysis can be
easily extended to the case of unknown functional form of g(·) which is discussed in Section
4.1. For notational simplicity, we suppress the subscript i whenever there is no confusion.
A simple example of (4) that is convenient for exposition and estimation purposes is the
following polynomial of z,
E(x2ε | z,x) = Zφ, (5)
where Z = [1, z, z2, . . . ,zm] and φ = [φ0,φ>1 , . . . ,φ
>m]> which is a nonzero vector.1 Equation
(5) is explicitly modeling the endogeneity of x2. In this case, by modeling endogeneity
we mean to model the term x2ε. When φ 6= 0, we can interpret the exogenous variable
z as a noisy measure of the common cause(s) of x2 and ε, which is related to the joint
interaction of the endogenous variable and the unobservables. Our identification strategy
requires observable variables, z. We define such variables as simultaneous variables (SV)
because they are simultaneously related to x2 and ε.
The proposed identification is a special case of a control function approach. When the
correlation between x2 and ε is modeled, equation 5 can be rewritten as x2E(ε | z,x) = Zφ,
hence, we have E(ε | z,x) = Zx2φ. Therefore, the conditional expected value of the unob-
served error term is a function of the “normalized” simultaneous variables by the endogenous
variable , i.e., Zx2
. The emphasis is however on the nature of the SV, which provide infor-
mation about the joint interaction of the endogenous variable and the error term. More
importantly, the required assumptions for identification are specific to the SV model. The
empirical applications developed in this paper are illustrative about the nature of the SV.
In the investment equation model, we argue that the measurement errors model determines
that lag values are jointly related to the endogeneous variable and the error term. In the
education example, IQ is more likely a determinant of both education and ability, rather
1One might want to approximate the unknown function g(·) with one of the sieve bases (e.g., power series,Fourier series, splines, etc.). See, e.g., Chen (2007) for more details on the method of sieve.
9
than only ability.
We are interested in identifying and estimating the parameters β in equation (2). In
practice, φ is unknown, and it is important to note that this parameter cannot be directly
estimated from equation (5) because ε is unobservable. Hence, we consider the joint iden-
tification and estimation of both β and φ. To this end, we impose restrictions on the
relationship between the endogenous regressor and the simultaneous variables.
Define θ ≡ [β>1 ,α>]> with α ≡ [β>2 ,φ
>]>. To ease the notation, define y and x2 after
netting out the exogenous regressor x1 and multiplying the resulting objects by x2. Thus,
y = x2(y−x1E(x>1 x1)−1E(x>1 y)) and x = [x2,Z], with x2 = x2(x2−x1E(x>1 x1)−1E(x>1 x2)).
Let z be a set of variables induced by conditioning variables [z,x]. Consider the following
assumptions.
Assumption 1
(i) E(x>1 ε) = 0;
(ii) E(x2ε | z,x) = Zφ.
Assumption 2 E(x>1 x1) and E(z>x) are non-singular.
Assumptions 1 and 2 allow us to identify the parameters of interest. Assumption 1 (i)
simply states that x1 are exogenous regressors. Assumption 1 (ii) is the main identification
condition. It is new in the literature and deserves further discussion. Condition 1 (ii) ex-
plicitly models the interaction between the endogenous variable and the unobserved causes
of the dependent variable using a parametric model specification. It states that the simulta-
neous variables are able to capture the information on the endogeneity term. The intuition
behind this assumption is that once one controls for the simultaneous variables (z), x is
not related to the interaction term x2ε. In other words, the endogeneity bias implied by the
10
non-zero conditional expectation of the interaction term can be specified as a function of the
simultaneous variables only.
It is important to notice the restriction this assumption imposes relative to the litera-
ture. The simultaneous variables model allows for dependence between the error term and
simultaneous variables at the expense of restricting the interaction between the endogenous
regressor and the error term being a function of the simultaneous variables only. In contrast,
the IV model requires dependence between the endogenous regressor and the instrumental
variables, which are restricted to be uncorrelated with the error term. Therefore, the new
method is able to allow the additional variables to still be correlated with the error term
at the cost of requiring the interaction not to be related to the endogenous variable. As
a result, the difference between our proposed model and traditional IV approach rests on
different model specifications; researchers fail to identify parameters if an incorrect method
is employed to control for the endogeneity in each case.
Assumption 1 (ii) can be interpreted in an alternative form. In particular, it can be
restated as
x2ε = Zφ+ u, with E(u | z,x) = 0. (6)
This format provides an auxiliary equation which models endogeneity bias using simultaneous
variables, and the condition E(u | z,x) = 0 states that the innovation u should have
conditional mean zero given z and x. This requirement relates to standard exogeneity
condition in the IV literature, which requires that the innovation term in the first-stage
equation needs to be uncorrelated with instrumental variables.
Although the condition φ 6= 0 is not a formal requirement, it implicitly arises because of
the fact that if φ = 0, there is no endogeneity problem and the whole exercise is unnecessary.
In addition, Assumption 1 (ii) states that E(x>u) = 0. The intuition behind this condition is
that u must be uncorrelated with x2 as well as Z where x2 is the product of x2 and x2 with x1
11
netted out. One can notice that E(u|z,x1, x2) = 0 is a sufficient condition for E(x>u) = 0.
Although the former condition is stronger than the latter, it might be easier to interpret
in applications. The square seems non-intuitive at first glance but it can be explained as
follows. The identification fails if the unobserved component u is still correlated with x22
(after x1 is netted out), which means that ε is still linearly related to x2 in a way that
cannot be captured by Z. Assumption 2 is a standard rank condition. Non-singularity of
E(x>1 x1) is necessary for the identification of β1 and non-singularity of E(z>x) is required
for the identification of α. The later states that the simultaneous variables are not linearly
related to the endogenous regressor.
We now return to the general structural equation (2) and general identification. For
the sake of clarity, we first focus on exactly identified model motivated by the conditional
moment restriction of equation (5).2 We consider the over-identified case in Section 4.2
below. The following theorem formalizes the identification results of θ, with θ ≡ [β>1 ,α>]>
and α ≡ [β>2 ,φ>]>.
Theorem 1 Suppose Assumption 1 holds. Then, θ is fully identified with
α = E(z>x)−1E(z>y), β1 = E(x>1 x1)−1E(x>1 y)− E(x>1 x1)−1E(x>1 x2)β2,
if and only if Assumption 2 holds.
Proof. In Appendix A.
2.3 Comparison with IV approach
This section discusses the differences between the proposed simultaneous variables and the
IV approaches. Suppose that x2 is endogenous in (2). For the standard IV case, in the
2Based on E(ρ(x2ε,Z;θ) | z,x) = 0, where ρ(·, ·;θ) = x2(y − xβ) − Zφ, we obtain the orthogonalitycondition E(zρ(x2ε,Z;θ)) = 0. A consistent estimator of θ can be achieved by imposing the sample analogueof the population orthogonality condition.
12
first-stage we have
x2 = x1π1 + zπ2 + w,
where w is an unobserved component. The instrumental variable, z, is a valid instru-
ment when it satisfies two conditions. First, it is correlated with the endogenous variable,
Cov(x2, z) 6= 0, after partialling out x1. Second, the instrument is required to be uncorre-
lated with the unobserved causes of the dependent variable, E(ε | z) = 0, after partialling
out x1.3 Assumption 1 (ii) plays a similar role to the first IV condition above such that
an instrumental variable should be related to the endogenous variable, i.e. π2 6= 0. The
proposed method, however, requires the simultaneous variables to capture information on
the interaction term. In addition, the IV requires that E(w | x1, z) = 0. This is similar to
the simultaneous variables condition presented in equation (6), E(u | x, z) = 0.
Regarding the second IV restriction, E(ε | z) = 0, when it is violated, the IV approach
fails to solve the endogeneity problem. However, under the assumptions for our methodology
z could still be used as a simultaneous variable to control for endogeneity. Assumption 1 (ii)
allows z to be correlated with the unobserved causes of the dependent variable, ε, by explicitly
modeling the endogeneity. The main trade-off is that the simultaneous variables method
requires the interaction between the endogenous regressor and the error term to be a function
of the simultaneous variables only. Our approach is also different from the conditioning
instrumental variables in the model of Chalak and White (2011), where in this case x2 and ε
would be uncorrelated once we condition on this type of extended instrument. Furthermore,
our method relies on different model and assumptions than the proxy variable approach. In
the proxy variables approach, condition E[ε | x, z] = g(z) controls for endogeneity of x2
through the equation y = x1β1 + x2β2 + g(z) +w, where E[w | x, z] = 0. It is worth noting
3Strictly speaking, E(εz) = 0 is sufficient for the linear IV model. Similarly, Assumption 1 (ii) for theproposed model can be relaxed to weaker condition which is written as unconditional expectation. But theconditional version is adopted for better motivation and coherence.
13
that our condition E[x2ε | x, z] = g(z) is instead based on the interaction between x2 and ε.
3 Estimation and inference
In this section, we develop a simultaneous variables estimator and present the details on its
practical implementation. We then study its asymptotic properties by establishing consis-
tency and asymptotic normality. We also develop inference procedures based on the Wald
statistic. Finally, we propose a simple test for endogeneity.
3.1 Estimation
Given the identification result in Theorem 1, we are able to estimate the parameters of
interest. We construct an estimator which is simple to implement in practice. Recall that
θ ≡ [β1,α]> with α ≡ [β2,φ]>. An estimator of θ is as following
α =
(1
n
n∑i=1
z>i xi
)−1(1
n
n∑i=1
z>i yi
), (7)
β1 =
(1
n
n∑i=1
x>1ix1i
)−1(1
n
n∑i=1
x>1iyi
)−
(1
n
n∑i=1
x>1ix1i
)−1(1
n
n∑i=1
x>1ix2i
)β2, (8)
where z is the set of variables generated by conditioning variables, x and y are sample
analogs of x and y, which are obtained by replacing the expectations with sample means,
and where β2 is the first element of α.
The implementation of the proposed estimator in practice is simple and can be carried
through a sequence of OLS estimations as follows. First, compute the variables x and y. To
calculate y, one first partials out the exogenous regressors by computing the errors from a
OLS regression of y on x1, then multiply those by x2. Computation of x is analogous. Second,
estimate α using equation (7) and z, the set of variables generated by the conditioning
variables. Finally, given α and consequently β2, β1 can be estimated from OLS as in equation
14
(8), by using the coefficients of the OLS regression of y on x1 and also the coefficients of the
regression of x2 on x1. These generated variables affect the asymptotic variance-covariance
matrix (see e.g. Pagan, 1984), as shown in the derivation of the asymptotic normality below.
3.2 Asymptotic theory
Now we establish the asymptotic properties of the estimator described above. In particular,
we show its consistency and asymptotic normality. Denote Q ≡ E(z>i xi), C1 ≡ E(x>1ix1i),
and C2 ≡ E(x>1ix2i). The limiting behavior of the estimator is summarized in the following
result.
Theorem 2 Let assumptions of Theorem 1 hold and the observations (yi,xi,Zi); i = 1, 2, ..., n
be i.i.d. across i and their fourth moments exist, i.e., E(‖yi‖4) < ∞, E(‖xi‖4) < ∞, and
E(‖Zi‖4) <∞. Then, as n→∞,
αp→ α.
In addition, we have that
√n(α−α)
d→ N(0, Q−1MQ−1),
with M = V ar(z>u − Gr(δ) + Hs(γ)), where G, r(δ), H, and s(γ) are defined in the proof.
Moreover,
β1p→ β1,
and
√n(β1 − β1)
d→ N(0,C−11 C2Vβ2C
>2 C
−11 ),
where Vβ2 is the variance of β2.
Proof. In Appendix A.
15
The results given in Theorem 2 show that the limiting distribution of the proposed
estimator is standard. This result is the foundation for construction of inference procedures.
3.3 Inference
Inference is very important in practice. Given the results in Theorem 2, general hypotheses
on the vector θ can be accommodated by Wald-type tests. We focus on linear restrictions.
The Wald statistic and associated limiting theory provide a natural foundation for testing
hypotheses as Rθ = r, where R is a full-rank matrix imposing q restrictions on the pa-
rameters, and r is a column vector of q elements. The Wald statistic can be constructed
as
Wn = n(Rθ − r)>[RΩR>]−1(Rθ − r), (9)
where Ω is a consistent estimator of the variance-covariance matrix of Ω.
Under the null hypothesis H0 : Rθ = r, the statistic Wn is asymptotically χ2q with q-
degrees of freedom, where q is the rank of the matrix R. The limiting distribution of the
test is summarized in the following result.
Corollary 1 Under H0 : Rθ(τ) = r, and the conditions of Theorem 2,
Wna∼ χ2
q.
Proof. Given a consistent estimator Ω, the proof is a direct application of Theorem 2.
In practice, to carry inference and apply a Wald test in (9) one needs a consistent
estimator of the asymptotic variance-covariance matrix. As described in the above re-
sult, to estimate the asymptotic variance-covariance matrix, we need to estimate both
V ar(α) = Q−1MQ−1/n and V ar(β1) = C−11 C2Vβ2C
>2 C
−11 /n. The later is easily recovered
from its sample counterparts, that is, C1 = n−1∑n
i=1 x>1ix1i and C2 = n−1
∑ni=1 x
>1ix2i. Now,
16
notice that Vβ2 is the first element of the variance-covariance matrix V ar(α). Finally, for the
estimation of the variance-covariance matrix of α one can consider its sample counterpart
such as V ar(α) = Q−1MQ−1/n with
Q =1
n
n∑i=1
z>i xi,
M =1
n
n∑i=1
(z>i ui − Gri(δ) + Hsi(γ)
)(z>i ui − Gri(δ) + Hsi(γ)
)>where
ui = yi − xiα,
G =1
n
n∑i=1
(z>i ∇δxiα) =1
n
n∑i=1
(z>i [−x2ix1i, 0, 0]α),
H =1
n
n∑i=1
(z>i ∇γ yi) =1
n
n∑i=1
(z>i (−x2ix1i)),
ri(δ) =
(1
n
n∑i=1
x>1ix1i
)−1 (x>1i(x2i − x1iδ)
),
si(γ) =
(1
n
n∑i=1
x>1ix1i
)−1 (x>1i(yi − x1iγ)
),
δ =
(1
n
n∑i=1
x>1ix1i
)−1(1
n
n∑i=1
x>1ix2i
),
and
γ =
(1
n
n∑i=1
x>1ix1i
)−1(1
n
n∑i=1
x>1iyi
).
The practical estimation of these quantities is simple. As for the point estimates, it only
relies on simple OLS regressions.
17
4 Extensions
4.1 Nonparametric model
We have focused on a simple linear parametric model for the sake of clear exposition and
motivation. Nevertheless, the model we consider can be extended to more general additively
separable nonparametric models. The structural model (1) is now rewritten as
yi = f(x1i, x2i) + εi, i = 1, ..., n, (10)
where x2 is endogenous and ε is an innovation term. Then, from the equations (4) and (10)
we obtain
E(x2(y − f(x1, x2)) | z,x) = g(z),
or
E(x2y | z,x) = f(x1, x2) + g(z) (11)
= h(w),
where f(x1, x2) ≡ x2f(x1, x2) and w ≡ [x1, x2, z]. This implies that the function f(x1x2),
can be obtained from an additive regression of x2y on (x1, x2) and z. Thus, identification of
the function of interest, f(x1, x2), rests on the identification of f(x1, x2) through f(x1, x2) =
f(x1, x2)/x2.
Identification of f(x1, x2) can be analyzed by a similar argument of Theorem 2.1 in Newey,
Powell, and Vella (1999). To be specific, uniqueness of a conditional expectation is equivalent
to the statement that any additive function f ∗(x1, x2) + g∗(z) satisfying equation (11) must
have Prob(f ∗(x1, x2) + g∗(z) = f(x1, x2) + g(z)) = 1. Thus, identification implies equality
of the additive components up to a constant. Intuitively speaking, failure of identification
implies a functional relationship between the random vector (x1, x2) and z. Therefore, a
18
sufficient condition for identification is the absence of an exact relationship between these
random variables. We summarize the identification result in the following Lemma.
Lemma 1 f(x1, x2) is identified, up to an additive constant, if and only if Prob(δ(x1, x2) +
λ(z) = 0) = 1 implies there is a constant cf with Prob(δ(x1, x2) = cf ) = 1.
Proof. The result follows from a similar argument to Theorem 2.1 in Newey, Powell, and
Vella (1999).
4.2 General forms of endogeneity
We have characterized the endogeneity based on the linear dependence (i.e., expectation of
the product of x2 and ε). More generally, modeling endogeneity can be defined in terms
of any nonlinear dependence between x2 and ε. To be precise, instead of equation (4), we
assume
E(Λ(x2, ε;β) | z,x) = g(z;φ), (12)
where Λ(x2, ε;β) is a known measurable function of (x2, ε) up to an unknown vector of
parameters β and where g(z;φ) is a known measurable function of z up to an unknown
vector of parameters φ. We focus on a nonlinear parametric model as follows:
yi = f(x1i, x2i;β) + εi, i = 1, ..., n,
where f(x1, x2;β) is a known measurable function of (x1, x2) up to an unknown vector of
parameters β.
Let Ψ(y,x, z;θ) ≡ Λ(x2, y−f(x1, x2;β);β)−g(z;φ) with θ ≡ [β>,φ>]>. Then equation
(12) can be rewritten as the following moment condition,
E(Ψ(y,x, z;θ) | z,x) = 0.
19
Therefore, the orthogonality conditions can be rewritten as
E(zΨ(y,x, z;θ)) = 0, (13)
where z is a set of variables generated by the conditioning variables (z,x). It is important to
notice that the model in equation (13) allows for over-identified models where the dimension
of z is larger than the dimension of regressors. Assuming global identification of parameters
and full rank condition for matrices associated with Assumption 2, a minimum-distance
estimator of the parameter θ based on equation (13) is obtained by solving the following
minimization problem
θ ≡ argminθnhn(vi;θ)>Whn(vi;θ),
where W is a consistent estimator of a positive definite matrix W , v ≡ [y,x, z], and
hn(vi;θ) =1
n
n∑i=1
ziΨ(yi,xi, zi).
Asymptotic properties of the above estimator can be established by using standard econo-
metric results (see, e.g., Newey, 1990).
5 Monte Carlo simulations
We use simulation experiments to assess the finite sample performance of the proposed si-
multaneous variables (SV) estimator. For comparison, we also report results for the standard
ordinary least squares (OLS) and the instrumental variables (IV) estimators. The result-
ing estimates are compared in terms of bias, median absolute deviation (MAD) and root
mean square error (RMSE). The simulation experiments are based on 5,000 replications and
sample sizes of n ∈ 100, 200, 500, 1000.
We consider a simple data generating process (DGP) as
yi = β0 + β1x1i + β2x2i + εi, i = 1, 2, ..., n,
20
with β0 = β1 = β2 = 1, with covariates generated as x1 = v1 + v2 and x2 = v2 + v3, where
vj ∼ i.i.d.N(0, 1), j = 1, 2, 3.
First, we generate a model to evaluate the relative performance of the SV estimator when
both x1 and x2 variables are exogenous. Thus, there is no need to correct for endogeneity.
In this case we generate ε = v4, v4 ∼ i.i.d.N(0, 1), such that there is no correlation between
ε and x1 or x2. In this scenario, we expect the standard OLS estimator, which does not
make use of any additional variables for estimation, of both β1 and β2 to be approximately
unbiased. Both the SV and IV use additional variables. So, we construct different sets
of variables to study these estimators. In all the simulation experiments, the structural
equation is kept the same and the SV and IV models differ only in the DGP of z. In this
first set of simulations, two different cases are considered. We construct a simultaneous
variable zsv = v2v4 + v5 (Model SV), and an instrumental variable ziv = v2 + v5 (Model IV),
where v5 ∼ i.i.d.N(0, 1). Then, we use the two variables, zsv, ziv, alternatively for the
different estimators, that is, in the role of SV or IV for both estimators. It is important to
notice that zsv contains v2 and v4 and hence is correlated with x2ε. The rationale for such an
experiment is to evaluate what would be the effect of using a variable for the corresponding
estimator and (wrongly) for another estimator, when there is no endogeneity.
Table 1 reports the simulation results for β2 for this case. Note that all estimators are
unbiased and consistent when the correct variable is used. As expected OLS is consistent,
but also is the SV estimator when zsv is used as simultaneous variable and IV when ziv is
used as instrument. Also note that the SV estimator is also consistent when any variable is
used, i.e. zsv, ziv. However, the IV estimator only works when a proper instrument is used,
and produces biased estimates with significant variance when an incorrect instrument is used
instead. That is, if we use zsv as instrument, then by construction, it is not correlated with
x2 (that is, Cov(zsv, x2) = Cov(v2v4 + v5, v2 + v3) = 0), and then the IV estimator variance
21
is large with a consequent effect on MAD and RMSE. On the other hand, SV works with
both zsv, ziv because it does not affect the β estimators when φ = 0.
[TABLE 1 ABOUT HERE]
Second, consider a DGP model with endogeneity. In this case, we generate the innovation
term as ε = v3 + v4, where v4 ∼ i.i.d.N(0, 1). This model induces endogeneity because
corr[x1, x2] = corr[x2, ε] = 0.5 while corr[x1, ε] = 0. Given the correlation between the
x2 regressor and the error term in the DGP, the standard OLS estimators of both β1 and
β2 should be biased, and in particular, E[βOLS2 ] = 53> 1 so that the OLS bias is 2
3. As
in the previous case, in all the simulation experiments, the structural equation is kept the
same and the models differ only in the DGP of z. Three different cases are considered. Let
v5 ∼ i.i.d.N(0, 1). In Model SV1, we generate a DGP following a simultaneous variable
framework. From the additional equation x2ε = v23 + v2v3 + v2v4 + v3v4, we set z = v2
3 + v2v3.
Note that corr[z, x2ε − z] = 0 and corr[x22, x2ε − z] = 0. Thus, Model SV1 satisfies the
simultaneous variables requirements, and the proposed estimator is expected to be consistent.
In Model SV2, we consider z = 1 + x2ε + v5. In this case, z is a noisy measure of x2ε.
Finally, Model IV is designed to satisfy the IV method requirements. We construct z as
an instrumental variable such as z = v2 + v5. In this case, the IV estimator is expected to
perform the best. These simulation results (for β2) are presented in Table 2.
[TABLE 2 ABOUT HERE]
The results for models SV1 and SV2 show that, as expected, the proposed SV performs
very well in terms of bias, MAD and RMSE. The proposed estimator is approximately
unbiased, even for small sample sizes, and the bias, MAD and RMSE decrease as the sample
size increases. In contrast, OLS and IV estimators are largely biased, and their biases do
22
not disappear for large n. Finally, Model IV is favorable to the IV estimator. In this case,
the IV estimator is approximately unbiased while OLS and SV are not.
Overall, the results show evidence that the SV performs very well in finite sample sim-
ulations when a proper simultaneous variable is used. The SV estimator is able to reduce
the endogeneity bias when a simultaneous variable exists that is related to the joint inter-
action of the endogenous variable and the innovations. The last two experiments also show
that there are fundamental differences between our estimator and other methods for solving
endogeneity such as IV. In fact, the IV estimator works only for Model IV.
6 Empirical applications
6.1 Investment model
Fazzari, Hubbard, and Petersen (1988) consider an investment equation model, where a firm’s
investment is regressed on an observed measure of investment demand (Tobin’s q) and cash
flow. Following Fazzari, Hubbard, and Petersen (1988), investment–cash-flow sensitivities
became a standard metric that examines the impact of financing imperfections on corporate
investment (Stein, 2003). These empirical sensitivities are also used for drawing inferences
about efficiency in internal capital markets (Lamont, 1999; Shin and Stulz, 1998), the effect of
agency on corporate spending (Hadlock, 1998; Bertrand and Mullainathan, 2005), the role of
business groups in capital allocation (Hoshi, Kashyap, and Scharfstein, 1991), and the effect
of managerial characteristics on corporate policies (Bertrand and Schoar, 2003; Malmendier
and Tate, 2005). Therefore, the econometric model is represented by the following equation
yit = γ + αcit + βq∗it + ηit, (14)
where yit ≡ Iit/Kit, where I denotes investment and K capital stock, q∗ is marginal Tobin’s
q, cit ≡ CFit/Kit, where CF is cash flow, and η is the unobserved causes of the investment,
23
which can be interpreted as containing, among other factors, a firm specific economic shock
to investment. This shock may affect firms differently because of different perceptions on
the aggregate conjecture. As a result, η is a pure idiosyncratic shock to the investment.
Concerns about measurement errors (ME) have been emphasized in the context of the
investment equation model. Theory suggests that the correct measure for a firm’s invest-
ment demand is captured by marginal q, but this quantity is unobservable and researchers
use instead its measurable counterpart, average q. Because average q measures marginal q
imperfectly, a measurement problem naturally arises (Hayashi, 1982; Poterba, 1988).
Now we consider the structure of the ME on q∗ more carefully. We use a classical ME
framework to formalize the discussion. Let eit denote the Tobin’s q measurement error.
Assume that eit has zero mean, finite variance σ2e and is independent of q∗it. Thus, the
average q is given by
qit = q∗it + eit. (15)
From plugging equation (15) into the investment model in equation (14), we obtain
yit = γ + αcit + β(qit − eit) + ηit
= γ + αcit + βqit + εit, (16)
where εit ≡ ηit − βeit. Therefore, when q∗ is unobserved and a mismeasured q is used as a
regressor, ME induce endogeneity, i.e., q is correlated with ε due to ME.
Therefore, in the standard investment model in (14), if q∗ is measured with error, OLS
estimates of β will be biased. In addition, given that average q and cash flow are likely to
be correlated, the coefficient α is likely to be biased as well. These biases are expected to
be reduced by the use of estimators that solve the ME problem. It has been common in
the literature to employ instrumental variables estimators to resolve the problem (see, e.g.,
Almeida, Campello, and Galvao, 2010; Lewellen and Lewellen, 2014).
24
In order to better evaluate the performance of the alternative estimators in practice, we
describe some hypotheses about the effects of measurement error correction on the estimated
coefficients β and α from equation (14). Theory does not pin down the exact values that
these coefficients should take. Nevertheless, one could argue that the two following conditions
should be reasonable.
First, an estimator that solves ME in q∗ in a standard investment equation should return
a higher estimate for β and a lower estimate for α when compared with standard OLS
estimates. ME cause an attenuation bias on the estimate for the coefficient β. In addition,
since q and cash flow are likely to be positively correlated, ME should cause an upward bias
on the empirical estimate of α under the standard OLS estimation. Thus, denoting the OLS
and the measurement error corrected (MEC) estimates, respectively, by (βOLS, αOLS) and
(βMEC , αMEC), one should expect:
Condition 1: βOLS < βMEC and αOLS > αMEC .
Second, one would expect the coefficient for q to be positive and the cash flow coefficient
to be non-negative after controlling for ME. The q-theory of investment predicts a positive
correlation between investment and q (e.g. Hayashi, 1982). Under this theory, the cash
flow coefficient should be zero (“neoclassical view”), after controlling for ME. However, in
practice the cash flow coefficient could be positive because of either the presence of financing
frictions (e.g. Fazzari, Hubbard, and Petersen, 1988), or the fact that cash flow captures
variation in investment opportunities even after we apply a correction for mismeasurement
in q.4 Therefore, one should observe:
Condition 2: βMEC > 0 and αMEC ≥ 0.
4However, financial constraints are not sufficient to generate a strictly positive cash flow coefficient becausethe effect of financial constraints is capitalized in stock prices and may thus be captured by variations in q(Chirinko, 1993; Gomes, 2001).
25
Notice that these conditions are fairly intuitive. If a particular ME corrected estimator
does not deliver these basic results, one should have reasons to question the usefulness of
that estimator in applied work.
One potential solution to the endogeneity problem is the IV method. Conventional IV
approaches to solve the measurement error in investment equation models employ lags of qit
as instruments for qit.
In order to make the instruments valid, such that the IV approach removes the bias,
assumptions on the dynamics of measurement errors need to be imposed. Almeida, Campello,
and Galvao (2010) discuss such condition in details. In the simplest case, if the measurement
error eit in equation (15) is i.i.d. across firms and time, and qit is serially correlated, then, for
example, qit−2, qit−3, or (qit−2−qit−3) are valid as instruments for qit, because, for instance, in
equation (16) qit−2 is correlated with qit but uncorrelated with εit. The resulting instrumental
variables estimator is consistent.
There are two simple ways to relax the standard assumption of i.i.d. measurement errors.
First, one can consider an assumption of time-invariant autocorrelation. Then twice-lagged
levels of the observable variables can be used as instruments. Second, under the first-order
moving average structure (i.e., MA(1)) for ME, the consistency of the IV estimator requires
the use of instruments that are lagged two periods or longer. Additionally, the identification
requires the latent regressor to have some degree of autocorrelation (since lagged values are
used as instruments). See Biorn (2000) for more details.
However, the approach of using lagged qit as IV’s fails when ME on the marginal q are
systematic. In practice, it is highly likely that current-period measurement error is correlated
with the first-order or higher-order lags of the measurement error. This phenomenon of
autocorrelation in ME seems to be more realistic in practice because systematic errors on a
way of evaluating average q would be highly likely to be persistent. Then, in general eit could
26
depend on eit−1, eit−2, eit−3, .... Under the presence of autocorrelation in ME (e.g., AR(1)),
the IV approach will not work since the estimates are biased due to the dependence of the
instrument (qit−1) and the current-period measurement error (eit) through the first-order lag
of measurement error (eit−1).
In this paper we solve the ME endogeneity problem by explicitly modeling the conditional
expectation of the interaction qitεit in (16) given a set of simultaneous variables zit. In
other words, we model E(qitεit|zit, cit, qit) as a function of observables, zit, and estimate the
parameters of interest consistently. In this section, we show that, contrary to IV, even under
the assumption that the marginal q and the ME are correlated, lagged Tobin’s q and its
square are valid simultaneous variables, zit. This result holds because we can show that
these simultaneous variables are simultaneously related to the endogenous variable q and
the error term ε. Therefore, we use the proposed methods with the following specification
for Assumption 1 (ii) and a set of observable variables zit such that
E(qitεit|zit, cit, qit) = Zitφ, (17)
where Zit = [qit−1, q2it−1].
Now we discuss the choice of these the simultaneous variables Z and show they are
related to the interaction term qε. The ME bias in the OLS estimator arises from the fact
that E[qitεit] 6= 0. Note that
qitεit = (q∗it + eit)(ηit − βeit) = (q∗it + eit)ηit − β(q∗iteit + e2it). (18)
From the maximization of firm’s profit, we can derive the linear investment model (14) by
assuming linearly homogenous cost function. As a result, η is a pure idiosyncratic shock to
the investment and the model implies that there is no correlation between (q∗, e) and η.
Furthermore, η can be assumed to be i.i.d. and with zero mean in equation (14). Thus, our
main concern is the second term in equation (18), (q∗iteit + e2it).
27
We need to show that the simultaneous variables, lagged Tobin’s q and its square, are
valid variables, that is, they are related to the interaction term qitεit. To establish the
result, let both q∗ and e be autocorrelated such that lags of q can be used to eliminate the
endogeneity. Because both the endogenous variable q and the error term ε are functions of
q∗ and e, a polynomial function of the lags of q may be able to model (q∗iteit + e2it). Thus,
the main idea is to model this explicitly.
The data are taken from COMPUSTAT and cover 1970 to 2005, and the data collection
process follows that of Almeida and Campello (2007). The sample consists of manufacturing
firms with fixed capital of more than $ 5 million (with 1976 as the base year for the cpi), and
the sample firms have growth of less than 100% in both assets and sales. Summary statistics
for investment, q, and cash flow are presented in Table 3. These statistics are similar to
those reported by Almeida and Campello (2007), among other papers. To save space, we
omit the discussion of these descriptive statistics.
[TABLE 3 ABOUT HERE]
We estimate the above investment equation using OLS, instrumental variables (IV), and
simultaneous variables models. The results for OLS and IV are summarized in Table 4 and
for the simultaneous variables in Table 5. For the IV we use different combinations of lags of
Tobin’s q as instruments. In the same fashion, we use lags of the regressors as simultaneous
variables.
Regarding the OLS estimates, we obtain the standard result in the literature that both
q and CF attract positive coefficients which are statistically significant (see Table 4). In the
OLS specification, we obtain a q coefficient of 0.0626, and a cash flow coefficient of 0.1307,
which are likely to be biased. These results are consistent with those in the literature. In
particular, Erickson and Whited (2000) and Almeida, Campello, and Galvao (2010) report
28
very similar results for OLS estimates.
Now we consider the IV estimates. We first test the hypothesis of no first-order autocorre-
lation in the residuals of the simple OLS regression using a simple Durbin-Watson test (DW).
We find that the coefficient for the AR(1) model (of the residuals) is 0.474 with standard er-
ror of 0.007, and the DW statistic is 1.051, which rejects the null hypothesis of no first-order
autocorrelation. This result evidences that lags of q are likely invalid instruments for the IV
case. In spite of the testing results, the point estimates are in Table 4. As expected, following
the IV procedure, we obtain estimates for q and cash flow that do not satisfy the conditions
discussed above, although they are statistically significant. In particular, IV estimators do
not reduce the cash flow coefficient relative to that obtained by the standard OLS, while the
q coefficients even decrease. For example, in the third column of Table 4 (IV2), which uses
qt−1 and q2t−1 as instruments, the q coefficient is 0.0496 and CF coefficient is 0.1333. We
also examine the robustness of the estimators to variations in the set of instruments used
in the estimation, including sets that feature longer lags of the variables and the squares of
them in the model. Columns (IV3)-(IV8) of Table 4 present these results and show that the
q and CF coefficients from IV estimators do not improve substantially with the selection of
instruments used. When the first-order of q is not included as instrument, the q coefficients
are even worse. Therefore, given the concern about measurement error, these results show
that the IV estimates are inconsistent with Conditions 1 and 2 above, and there is strong
evidence that the instrumental variables approach is not able to solve the ME problem in
this investment equation example.
The results for the simultaneous variables case are presented in Table 5. All coefficients
for q and CF are statistically different from zero. By comparison, the simultaneous variables
approach delivers results that are consistent with Conditions 1 and 2. The q coefficient is
almost twice as large as the standard OLS estimate, depending on the set of variables used.
29
At the same time, the cash flow coefficient goes down by about 10% from the standard OLS
value. In particular, the results in column S2, using qt−1 and q2t−1 as simultaneous variables,
show that the q coefficient increases from 0.0626 for OLS to 0.1107 for the simultaneous
variables approach, while the cash flow coefficient drops from 0.1307 for OLS to 0.1208 for
the simultaneous variables approach. These results suggest that the proposed estimator
does a fairly reasonable job at addressing the ME problem. It is worth noting that choosing
proper simultaneous variables is important to obtain consistent estimates. When qt−1 and
q2t−1 are chosen (column S2), the proposed estimator obtains the most reasonable results.
When other further lags are added (column S8), the results do not change substantially.
When qt−1 and q2t−1 are omitted, however, the q coefficients reduce. This issue is analogous
to the instrumental variables approach of which consistency heavily relies on the selection of
valid instruments. As a result, it would be interesting and important to study the optimal
selection of simultaneous variables. We leave it as a future research. Nevertheless, the
estimates from our proposed approach are always consistent with Conditions 1 and 2, under
different sets of simultaneous variables.5
[TABLE 4 ABOUT HERE]
[TABLE 5 ABOUT HERE]
In all, our empirical tests support the idea that simultaneous variables estimators are
likely to outperform the IV estimators in economics and finance applications in which auto-
correlated measurement error is a relevant concern.
5For the robustness check of the performances of the proposed estimator, higher-order moments such asthe cubes of the lags of q have been included, but they do not change the results.
30
6.2 Returns to education
This section illustrates another use of the new proposed estimator. One of the most com-
monly studied topics in labor economics is the impact of education on earnings. The problem
of measuring returns to education is an important research area in economics with a very
large literature on the subject. For examples of comprehensive studies, see Card (1995),
Card (1999), Harmon and Oosterbeek (2000), and Harmon, Oosterbeek, and Walker (2003).
The large volume of research in this area has been explained by both the interest in the
causal effect of education on earnings and the inherent difficulty in measuring this effect.
The difficulty arises due to the potential endogenous relationship between education and
earnings. In particular, unobserved individual ability is correlated with both individual’s ed-
ucation and wage, thus inducing bias in standard OLS estimates of the impact of education
on earnings.
There are several different procedures to estimate returns to education. The most com-
mon strategies to solve the endogeneity of education are the IV and the proxy estimators.
The former tries to find a variable that is correlated with education and uncorrelated with
the unobserved characteristics of the individual (see e.g. Angrist and Krueger, 1991; Harmon
and Walker, 1995; Blundell, Dearden, and Sianesi, 2005). The latter assumes the availability
of a proxy variable for ability and includes it as a regressor in the structural equation (see e.g.
Griliches, 1976; Blackburn and Neumark, 1992; Ashenfelter and Krueger, 1994; Blackburn
and Neumark, 1995; Ashenfelter and Rouse, 1998).
To start our analysis, we first specify the functional form of the wage equation we aim
to estimate. Consider the following structural Mincerian model6
log(wagei) = xiβ1 + β2edui + εi, (19)
6See e.g. Mincer (1970) and Mincer (1974).
31
where edu is years of schooling, x denotes a vector of other covariates, and ε unobserved
causes of wage. In particular, x contains 1, exper, age, tenure, tenure-squared, married,
south, urban, black, where exper is labor market experience, age years of age, tenure job
tenure, married a dummy variable for married individuals, south a dummy variable for
southern region, urban a dummy variable for living in a SMSA, and black a race indicator.
In the Mincer specification the unobserved causes of wage, ε, capture unobservable in-
dividual characteristics. However, those individual factors may also influence the schooling
decision, and hence induce a correlation between schooling and the unobserved causes of
wage in equation (19). A common example is unobserved ability. As mentioned above, this
problem has concerned the empirical literature since the earliest contributions. The variable
years of schooling is endogenous because of the omitted ability. Since the ability is positively
correlated with the level of education, the simple OLS is biased upwards if the ability is
omitted.
The inclusion of direct measures of ability should reduce the estimated education coeffi-
cient if the included proxy for ability is valid. In this case, since ability is used as a control,
the coefficient on education captures the effect of education alone. For example, if we assume
that ε = γabil + e, where e is independent of abil, x, and edu, and IQ is a valid proxy for
abil, then IQ solves the endogeneity problem.
However, these assumptions are controversial. For example, these conditions are in fact
implicitly assuming that wage and education are not affected by another common cause such
as personality. Latent personality traits could include Openness, Conscientiousness, Ex-
traversion, Agreeableness, and Neuroticism (OCEAN). Gensowski, Heckman, and Savelyev
(2011), for instance, quantify the personality traits using a three-step estimation procedure
and show that they have significant and meaningful impacts on earnings. In the case wage
and education are indeed affected by another common cause, IQ would not be an effective
32
proxy for both ability and personality, and might deliver biased results. Of course, IQ is
not a good instrumental variable either because it is correlated with the unobserved causes
of wage which include ability, according to psychological theory and empirical findings.
Moreover, as noted in Hansen, Heckman, and Mullen (2004), schooling have a significant
effect on achievement test scores. Thus past IQ results might be correlated with both abil
and edu.
Our model proposes an alternative way to describe the relationship between education
and unobserved causes of wage using Z, the simultaneous variable. From our model we
assume that
eduiεi = φ0 + φ1IQi + φ2IQ2i + ui, (20)
where (IQ, IQ2) are the simultaneous variables Z, and u is uncorrelated with Z and edu
which is the product of edu and edu with x netted out. Note that, given these conditions,
we are able to solve the endogeneity problem.
We now discuss the plausibility of the conditions imposed for the identification of the
parameters. Assumption 1 (i) states that a set of covariates, x, are exogenous in equation
(19), which is commonly imposed in the literature. Assumption 2 is easily satisfied if IQ
and IQ2 are not linearly related to the squared education. Assumption 1 (ii) implies that
E([IQ, IQ2] u) = 0 and E(edu u) = 0. For these conditions to be satisfied IQ should capture
the correlation between education and ability in a way such that u is a pure idiosyncratic
shock to the wage.
The data set is from National Longitudinal Survey (NLS) on the wages of working men
of the first round of Young Men Cohort of the NLS using its 1980 survey. The data is the
same as in Blackburn and Neumark (1992). This cohort was first surveyed in 1966 with 5,225
respondents at ages 14-24, and the same set of young men was resurveyed at one-year intervals
thereafter. The reason to use this data set is its inclusion of two measures of ability: IQ and
33
KWW , which are the scores from two intelligence tests. IQ refers to the intelligence quotient
which is the result of a standardized test intended to measure intelligence in individuals. The
IQ scores were collected as part of a survey of the respondents’ schools conducted in 1968.
The KWW test examines respondents’ knowledge about the labor market, covering the
duties, educational attainment, and relative earnings of ten occupations. The KWW tests
were also administered as part of the initial survey. See Griliches (1976) for a discussion of
these two measures of ability and how they are used to estimate returns to education.
The variables used are summarized in Table 6 (the interpretation of the variables is similar
to Blackburn and Neumark, 1992). The summary statistics is also presented in Table 6 and
contains 935 observations of individuals aged 28-38 and years of schooling 9-18.
[Table 6 ABOUT HERE]
The econometric results appear in Tables 7, 8, and 9. The results for the OLS and IV
approach are presented in Table 7, the first column describes OLS and columns IV1 to IV6
display results from different selection of instrumental variables for the IV approach. In Ta-
bles 8 and 9, columns P1 to P6 and those S1 to S6 present results for the proxy variables and
simultaneous variables models, based on different selection of proxy variables and simultane-
ous variables, respectively. For OLS, IV, and proxy, we use White heteroskedasticity-robust
standard errors in parentheses. For simultaneous variables estimator, we use the variance
estimator described above.
The standard OLS estimate of the returns to education is 0.0611, which is upward biased
according to the theory on the omitted-variable bias. However, all estimates of the returns
to education from IV approach are larger than 0.1. Thus, as expected, IV approach does not
solve the omitted variable bias. The proxy variable model estimates the returns to education
smaller than OLS, which reduces to 0.0487 when IQ is included (column P1) in the structural
34
equation as a proxy variable, 0.0550 when KWW is added (column P3), 0.0461 when both
are added together (column P5), and 0.0437 when the squares of this variables are also
considered (column P6). Thus, the inclusion of the proxy variables in level reduces the returns
of education by about 24%. When we include all the polynomials of these variables up to
the second-order power as proxy variables, the coefficients reduce even further. Nevertheless,
estimated coefficients of the proxy variables in most of cases are not statistically significant.
For instance, although the estimate of the returns to education reaches the minimum, 0.0437,
when the levels and the squares of both IQ and KWW are included (column P6), estimated
coefficients of the level and the square of IQ are not statistically significant. Meanwhile,
when the levels and the squares of KWW are included, estimates of their coefficients are
statistically significant, but the estimated returns to education is 0.0538 (column P4).
[TABLE 7 ABOUT HERE]
Applying the estimator proposed in this paper, estimation of the returns to education
produces smaller estimates than OLS. The estimates range from 0.0438 to 0.0569, depending
on the specification used. For example, when we compare the results using the model con-
taining both IQ and KWW , together with its squares (column S6), the estimated coefficient
of the education is 0.0438, which represents a decrease of about 28% from the simple OLS.
As expected, these estimates from the proposed estimator support the theory suggesting that
education is positively correlated to ability, and therefore, OLS estimates are upward biased.
It is worth to note that the estimates of the returns to education from simultaneous variables
and the proxy variables are both smaller than the OLS estimate, but slightly different to
each other. For instance, when the specification in which estimated coefficients of either
simultaneous or proxy variables are statistically significant is considered, simultaneous vari-
ables and the proxy variables models produce different estimates of the returns to education:
0.0467 from simultaneous variables estimator (column S2), 0.0538 from the proxy variable
35
model (column P4). In this case, the estimate from S2 is much smaller than that from
P2. One possible explanation for the difference could be the case that personality would be
other unobserved cause of wage which is correlated with education. If this is true, the proxy
variable model cannot control for the omitted variable bias from both ability and person-
ality, by only using one observed proxy variable for ability. On the contrary, simultaneous
variables could still control for the endogeneity under some mild conditions. However, since
comparison of the two approaches should be conducted based on model selection associated
with proxy and simultaneous variables, in general, one need to be careful to conclude that
they are significantly different in this example.
[TABLE 8 ABOUT HERE]
[TABLE 9 ABOUT HERE]
There are other important points regarding the estimates of the simultaneous variables.
Note that the estimation of the coefficients of the exogenous regressors, x, is affected. For
instance, most of estimated coefficients of age from the proxy variable model are not sta-
tistically significant. But those from simultaneous variables estimator are all statistically
significant and magnitudes are bigger than OLS or the proxy variable model. Thus, our
approach captures positive effects of age on wage, which coincide with human capital theory
(e.g. Lillard, 1977; Huggett, Ventura, and Yaron, 2006). Furthermore, there are large gaps
between estimated coefficient of black from different approaches. The black coefficient is
estimated as -0.1909 in OLS (column OLS), and its estimate is reduced to -0.1763 in the
proxy variable model (column P4). The difference among these can be interpreted as the fact
that controlling for unobserved ability reduces the race gap. However, column S2 of Table
9 produces a coefficient estimate of -0.2037, and therefore it means that race discrimination
was in fact larger than predicted from the proxy variables model.
36
7 Conclusion
This paper proposes a new methodology for solving endogeneity bias in econometric models.
The main identification condition explicitly models endogeneity bias in a direct way as a
function of additional variables. These, however, play a different role from the instrumental
variables, and hence the new proposal can be viewed as a complement to the IV. Under
the specified conditions, we establish identification of the parameters of interest, propose an
estimator and derive its consistency and asymptotic normality. We also develop inference
procedures based on the Wald statistics. Furthermore, we illustrate the usefulness of the
proposed methods by revisiting the investment equation model and returns to education.
The scope of potential applications using the new techniques is large because it relies on
mild conditions for identification of parameters.
Many issues remain to be investigated in future research. There are many variants of the
model that would extend the structure for the simultaneous variables. A natural extension
would be to panel data. The analysis of the performance of the methods and more general
testing procedures is also important direction. Applications to program evaluation studies
would appear to be a natural laboratory for further development of simultaneous variables
methodology.
37
Appendix
A. Proof of the Theorems
Proof of Theorem 1
Note that from Assumption 1, E(x2ε − Zφ | z,x) = E(x2(y − x1β1 − x2β2) − Zφ |
z,x) = E(x2y − x2x1β1 − x2x2β2 − Zφ | z,x) = E(x2y − x2x1(E(x>1 x1)−1E(x>1 y) −
E(x>1 x1)−1E(x>1 x2)β2)− x2x2β2 −Zφ | z,x) = E(x2(y − x1E(x>1 x1)−1E(x>1 y))− x2(x2 −
x1E(x>1 x1)−1E(x>1 x2))β2 − Zφ | z,x) = E(y − xα | z,x) = 0. We then have E(z>(y −
xα)) = 0 or E(z>y) = E(z>x)α. Also note that from E(x>1 ε) = 0, E(x>1 (y − x1β1 −
x2β2)) = 0, E(x>1 y)− E(x>1 x1)β1 − E(x>1 x2)β2 = 0. This system admits a unique solution
θ if and only if E(z>x) and E(x>1 x1) are non-singular (Assumption 2). Q.E.D.
Proof of Theorem 2
Let x ≡ [x2(x2 − x1δ),Z] and x ≡ [x2(x2 − x1δ),Z] where δ ≡ E(x>1 x1)−1E(x>1 x2) and
δ ≡(
1n
∑ni=1 x
>1ix1i
)−1 ( 1n
∑ni=1 x
>1ix2i
). Also let y ≡ x2(y−x1γ) and y ≡ x2(y−x1γ) where
γ ≡ E(x>1 x1)−1E(x>1 y) and γ ≡(
1n
∑ni=1 x
>1ix1i
)−1 ( 1n
∑ni=1 x
>1iyi).
From yi = xiα+ ui, where ui is i.i.d. innovation, we have
yi + (yi − yi) = (xi + xi − xi)α+ ui,
yi = xiα + ui − (xi − xi)α+ (yi − yi). (21)
Also we have
α =
(1
n
n∑i=1
z>i xi
)−1(1
n
n∑i=1
z>i yi
)
= α+
(1
n
n∑i=1
z>i xi
)−1(1
n
n∑i=1
z>i (ui − (xi − xi)α+ (yi − yi))
).
38
Then
√n(α−α) = Q−1n−1/2
n∑i=1
z>i (ui − (xi − xi)α+ (yi − yi)), (22)
where Q ≡ 1n
∑ni=1 z
>i xi. By Chebychev’s LLN and Slutsky’s theorem,
Q ≡ 1
n
n∑i=1
z>i xip→ E(z>i xi) ≡ Q.
As considered in Pagan (1984), equation (21) contains generated regressors and generated
dependent variables. So we need to consider errors from these approximations in equation
(22).
First, since E(z>i ui) = 0, we have
n−1/2
n∑i=1
z>i ui = op(1).
Second, by a mean value expansion,
n−1/2
n∑i=1
z>i (xi − xi)α =
[n−1
n∑i=1
z>i ∇δxiα
]√n(δ − δ) + op(1),
= G√n(δ − δ) + op(1),
where G = E[z>i ∇δxiα].
Third, a similar argument gives us
n−1/2
n∑i=1
z>i (yi − yi) =
[n−1
n∑i=1
z>i ∇γ yi
]√n(γ − γ) + op(1),
= H√n(γ − γ) + op(1),
where ∇γ yi = −x2x1 and H = E[z>i ∇γ yi].
Note that from the definition δ, we can write the following Bahadur representation
√n(δ − δ) =
√n
n∑i=1
ri(δ) + op(1),
39
where ri(δ) =(
1n
∑ni=1 x
>1ix1i
)−1 (x>1i(x2i − x1iδ)
), and E[ri(δ)] = 0 by the Law of Iterated
Expectations (LIE). In the same way, given the definition of γ, we can write the following
representation
√n(γ − γ) =
√n
n∑i=1
si(γ) + op(1),
where si(γ) =(
1n
∑ni=1 x
>1ix1i
)−1 (x>1i(yi − x1iγ)
), and E[si(γ)] = 0 by LIE.
By combining all terms together, we have
√n(α−α) = Q−1
n−1/2
n∑i=1
[z>i ui −Gri(δ) +Hsi(γ)]
+ op(1).
For the consistency of α, we have
αp→ α+Q−1 · 0 = α.
For the asymptotic normality, we have that by the Lindeberg-Levy Central Limit Theorem,
√n(α−α)
d→ Q−1N(0,M) ≡ N(0, Q−1MQ−1),
where M = V ar(zi>ui −Gri(δ) +Hsi(γ)).
Similarly,
β1 =
(1
n
n∑i=1
x>1ix1i
)−1(1
n
n∑i=1
x1i(x1iβ1 + x2iβ2 + ε)
)−
(1
n
n∑i=1
x>1ix1i
)−1(1
n
n∑i=1
x1ix2i
)β2
= β1 +
(1
n
n∑i=1
x>1ix1i
)−1(1
n
n∑i=1
x>1ix2i
)β2 −
(1
n
n∑i=1
x>1ix1i
)−1(1
n
n∑i=1
x>1ix2i
)β2
= β1 −
(1
n
n∑i=1
x>1ix1i
)−1(1
n
n∑i=1
x>1ix2i
)(β2 − β2).
By Chebychev’s LLN,
C1 ≡1
n
n∑i=1
x>1ix1ip→ E(x>1ix1i) ≡ C1,
C2 ≡1
n
n∑i=1
x>1ix2ip→ E(x>1ix2i) ≡ C2,
40
we have
β1p→ β1 −C−1
1 C2Q−1β2· 0 = β1,
where Qβ2 is the element in the Q matrix that corresponds to the estimation of β2.
Note that
√n(β1 − β1) = −
(1
n
n∑i=1
x>1ix1i
)−1(1
n
n∑i=1
x>1ix2i
)√n(β2 − β2).
Thus, we have
√n(β1 − β1)
d→ C−11 C2N(0, Vβ2) ≡ N(0,C−1
1 C2Vβ2C>2 C
−11 ).
Q.E.D.
B. Extension to Multiple Endogenous Variables
The proposed methodology can be easily extended to the case of multiple endogenous
variables. Suppose that there are multiple endogenous causes, x2. We then have the following
model for each individual
y = x1β1 + x2β2 + ε, (23)
where x2 ≡ [x21, x22, ..., x2p2 ] is a p2-dimensional vector of endogenous regressors and β2 ≡
[β21, β22, · · · , β2p2 ]>
is a p2-dimensional vector of corresponding coefficients. Suppose that
the following equations hold. x21εx22ε
...x2p2ε
=
Z1φ1 + u1
Z2φ2 + u2...
Zp2φp2 + up2
, (24)
where (Z1,Z2, · · · ,Zp2) are simultaneous variables for each equation and where E(uj |
zj,x1, x2j) = 0 for j = 1, 2, ..., p2. Let each Zj be of dimension kj and φj of dimension
kj for j = 1, 2, ..., p2, and K = k1 + k2 + ... + kp2 . This give us a total of p1 + p2 + K
parameters to be estimated.
41
These parameters will be estimated by the same number of moment conditions. First,
using exogeneity of x1, p1 moment conditions
E(x>1 ε) = 0.
Then plugging equation (23) into (24) givesx21(y − x1β1 − x2β2)x22(y − x1β1 − x2β2)
...x2p2(y − x1β1 − x2β2)
=
Z1φ1 + u1
Z2φ2 + u2...
Zp2φp2 + up2
.Using E(x>1 ε) = 0, we have β1 = E(x>1 x1)−1E(x>1 y)−E(x>1 x1)−1E(x>1 x2)β2. Thus we
obtainx21(y − x1(E(x>1 x1)−1E(x>1 y)− E(x>1 x1)−1E(x>1 x2)β2)− x2β2)x22(y − x1(E(x>1 x1)−1E(x>1 y)− E(x>1 x1)−1E(x>1 x2)β2)− x2β2)
...x2p2(y − x1(E(x>1 x1)−1E(x>1 y)− E(x>1 x1)−1E(x>1 x2)β2)− x2β2)
=
Z1φ1 + u1
Z2φ2 + u2...
Zp2φp2 + up2
,orx21(y − x1E(x>1 x1)−1E(x>1 y))x22(y − x1E(x>1 x1)−1E(x>1 y))
...x2p2(y − x1E(x>1 x1)−1E(x>1 y))
=
x21(x2 − x1E(x>1 x1)−1E(x>1 x2))β2 +Z1φ1 + u1
x22(x2 − x1E(x>1 x1)−1E(x>1 x2))β2 +Z2φ2 + u2...
x2p2(x2 − x1E(x>1 x1)−1E(x>1 x2))β2 +Zp2φp2 + up2
.By rearranging the equations we have
y1
y2...yp2
=
x1α1 + u1
x2α2 + u2...
xp2αp2 + up2
,where yj ≡ x2j(y − x1E(x>1 x1)−1E(x>1 y)) and xj ≡ [x2j(x2 − x1E(x>1 x1)−1E(x>1 x2)),Zj],
and αj ≡ [β>2 ,φ>j ]> for j = 1, 2, ..., p2.
As a result, we have the following system of equations
Y = Xα+ u,
42
where
Y =
y1
y2...yp2
, X =
x1 0 · · · 00 x2 · · · 0...
.... . .
...0 0 · · · xp2
,u =
u1
u2...up2
with α ≡ [α>1 ,α
>1 , ...,α
>p2
]>.
Finally, consider the following p2 +K moment conditionsE(x>1 u1) = 0E(x>2 u2) = 0
...E(x>p2up2) = 0
,or equivalently
E(X>u) = 0.
As seen in the case for a single endogenous regressor, conditions similar to Assumptions
1 and 2 for each endogeneous cause are required to identify the parameters β1 and α.
43
References
Almeida, H., and M. Campello (2007): “Financial Constraints, Asset Tangibility and
Corporate Investment,” Review of Financial Studies, 20, 1429–1460.
Almeida, H., M. Campello, and A. Galvao (2010): “Measurement Errors in Invest-
ment Equations,” Review of Financial Studies, 23, 3279–3328.
Altonji, J. G., T. E. Elder, and C. R. Taber (2005): “Selection on Observed and Un-
observed Variables: Assessing the Effectiveness of Catholic Schools,” Journal of Political
Economy, 113, 151–184.
(2008): “Using Selection on Observed Variables to Assess Bias from Unobservables
When Evaluating Swan-Ganz Catheterization,” American Economic Review Papers and
Proceedings, 98, 345–350.
Angrist, J., and A. Krueger (1991): “Does Compulsory School Attendance Affect
Schooling and Earnings,” Quarterly Journal of Economics, 106, 979–1014.
(2001): “Instrumental Variables and the Search for Identification: From Supply
and Demand to Natural Experiments,” Journal of Economic Perspectives, 15, 69–85.
Ashenfelter, O., and A. Krueger (1994): “Estimates of the Economic Return to
Schooling from a New Sample of Twins,” American Economic Review, 84, 1157–1173.
Ashenfelter, O., and C. Rouse (1998): “Income, Schooling, and Ability: Evidence
From a New Sample of Identical Twins,” Quarterly Journal of Economics, 113, 253–284.
Bertrand, M., and S. Mullainathan (2005): “Bidding for Oil and Gas Leases in the
Gulf of Mexico: A Test of the Free Cash Flow Model?,” mimeo, The University of Chicago
and MIT.
44
Bertrand, M., and A. Schoar (2003): “Managing with Style: The Effect of Managers
on Firm Policies,” Quarterly Journal of Economics, 118, 1169–1208.
Biorn, E. (2000): “Panel Data with Measurement Errors: Instrumental Variables and
GMM Procedures Combining Levels and Difference,” Econometric Reviews, 19, 391–424.
Blackburn, M., and D. Neumark (1992): “Unobserved Ability, Efficiency Wages, and
Interindustry Wage Differentials,” Quarterly Journal of Economics, 107, 1421–1436.
(1995): “Are OLS Estimates of the Return to Schooling Biased Downward? Another
Look,” Review of Economics and Statistics, 77, 217–230.
Blundell, R., L. Dearden, and B. Sianesi (2005): “Evaluating the Effect of Education
on Earnings: Models, Methods and Results from the National Child Development Survey,”
Journal of the Royal Statistical Society: Series A, 168, 473–512.
Bound, J., D. A. Jaeger, and R. M. Baker (1995): “Problems With Instrumental
Variables Estimation When the Correlation Between the Instruments and the Endogenous
Explanatory Variables is Weak,” Journal of the American Statistical Association, 90, 443–
450.
Caetano, C. (2015): “A Test of Exogeneity without Instrumental Variables,” mimeo,
University of Rochester.
Card, D. (1995): “Earnings, Schooling, and Ability Revisited,” in Research in Labor Eco-
nomics, ed. by S. Polachek, vol. 14. JAI Press.
(1999): “The Causal Effect of Education on Earnings,” in Handbook of Labor
Economics, ed. by O. Ashenfelter, and D. Card, vol. 3. Amsterdam: Elsevier.
Chalak, K. (2012): “Identification of Average Random Coefficients under Magnitude and
Sign Restrictions on Confounding,” mimeo, University of Virginia.
45
Chalak, K., and H. White (2011): “An Extended Class of Instrumental Variables for
the Estimation of Causal Effects,” Canadian Journal of Economics, 44, 1–51.
Chen, X. (2007): “Large Sample Sieve Estimation of Semi-nonparametric Models,” in
Handbook of Econometrics, ed. by J. J. Heckman, and E. E. Leamer, vol. 6B. North-
Holland.
Chirinko, R. (1993): “Business Fixed Investment Spending: Modeling Strategies, Empir-
ical Results, and Policy Implications,” Journal of Economic Literature, 31, 1875–1911.
Conley, T. G., C. B. Hansen, and P. E. Rossi (2012): “Plausibly Exogenous,” Review
of Economics and Statistics, 94, 260–272.
Erickson, T., and T. Whited (2000): “Measurement Error and the Relationship between
Investment and Q,” Journal of Political Economy, 108, 1027–1057.
Fazzari, S., R. G. Hubbard, and B. Petersen (1988): “Financing Constraints and
Corporate Investment,” Brooking Papers on Economic Activity, 1, 141–195.
Gandhi, A., K. Kim, and A. Petrin (2013): “Identification and Estimation in Discrete
Choice Demand Models when Endogenous Variables Interact with the Error,” University
of Minnesota, mimeo.
Gensowski, M., J. Heckman, and P. Savelyev (2011): “The Effects of Education,
Personality, and IQ on Earnings of High-Ability Men,” mimeo, The University of Chicago.
Gomes, J. (2001): “Financing Investment,” American Economic Review, 91, 1263–1285.
Griliches, Z. (1976): “Wages of Very Young Men,” Journal of Political Economy, 84,
69–86.
46
Hadlock, C. (1998): “Ownership, Liquidity, and Investment,” RAND Journal of Eco-
nomics, 29, 487–508.
Hahn, J., and J. Hausman (2002): “A New Specification Test for the Validity of Instru-
mental Variables,” Econometrica, 70, 163–189.
Hansen, K. T., J. J. Heckman, and K. J. Mullen (2004): “The Effect of Schooling
and Ability on Achievement Test Scores,” Journal of Econometrics, 121, 39–98.
Harmon, C. P., and H. Oosterbeek (2000): “The Returns to Education: A Review
of the Evidence, Issues and Deficiencies in the Literature,” Centre for the Economics of
Education, LSE.
Harmon, C. P., H. Oosterbeek, and I. Walker (2003): “The Returns to Education:
Microeconomics,” Journal of Economic Surveys, 17, 115–156.
Harmon, C. P., and I. Walker (1995): “Estimates of the Economic Return to Schooling
for the United Kingdom,” American Economic Review, 85, 1278–1286.
Hausman, J. A. (1983): “Specification and Estimation of Simultaneous Equation Models,”
in Handbook of Econometrics, ed. by Z. Griliches, and M. D. Intriligator, vol. II, pp.
391–448. North-Holland Press.
Hayashi, F. (1982): “Tobin’s Marginal q and Average q: A Neoclassical Interpretation,”
Econometrica, 50, 213–224.
Hoshi, T., A. Kashyap, and D. Scharfstein (1991): “Corporate Structure, Liquid-
ity, and Investment: Evidence from Japanese Industrial Groups,” Quarterly Journal of
Economics, 106, 33–60.
Huggett, M., G. Ventura, and A. Yaron (2006): “Human Capital and Earnings
Distribution Dynamics,” Journal of Monetary Economics, 53, 265–290.
47
Kiviet, J. F. (2012): “Identification and Inference in a Simultaneous Equation under Alter-
native Information Sets and Sampling Schemes,” The Econometrics Journal, forthcoming.
Klein, R., and F. Vella (2010): “Estimating a Class of Triangular Simultaneous Equa-
tions Models without Exclusion Restrictions,” Journal of Econometrics, 154, 154–164.
Lamont, O. (1999): “Cash Flow and Investment: Evidence from Internal Capital Markets,”
The Journal of Finance, 52, 83–110.
Levinsohn, J., and A. Petrin (2003): “Estimating Production Functions Using Inputs
to Control for Unobservables,” Review of Economic Studies, 70, 317–342.
Lewbel, A. (2012): “Using Heteroskedasticity to Identify and Estimate Mismeasured and
Endogenous Regressor Models,” Journal of Business and Economic Statistics, 30, 67–80.
Lewellen, J., and K. Lewellen (2014): “Investment and Cashflow: New Evidence,”
Journal of Financial and Quantitative Analysis, Forthcoming.
Lillard, L. (1977): “Inequality: Earnings vs. Human Wealth,” American Economic Re-
view, 67, 42–53.
Malmendier, U., and G. Tate (2005): “CEO Overconfidence and Corporate Invest-
ment,” The Journal of Finance, 60, 2661–2700.
Mincer, J. (1970): “The Distribution of Labor Incomes: A Survey with Special Reference
to the Human Capital Approach,” Journal of Economic Literature, 8, 1–26.
(1974): Schooling, Experience, and Earnings. Columbia University Press, New
York, NY.
Nevo, A., and A. M. Rosen (2012): “Identification with Imperfect Instruments,” Review
of Economics and Statistics, 94, 659–671.
48
Newey, W. K. (1990): “Efficient Instrumental Variables Estimation of Nonlinear Models,”
Econometrica, 58, 809–837.
Newey, W. K., J. L. Powell, and F. Vella (1999): “Nonparametric Estimation of
Triangular Simultaneous Equations Models,” Econometrica, 67, 565–603.
Olley, S., and A. Pakes (1996): “The Dynamics of Productivity in the Telecommunica-
tions Equipment Industry,” Econometrica, 64, 1263–1298.
Pagan, A. (1984): “Econometric Issues in the Analysis of Regressions with Generated
Regressors,” International Economic Review, 25, 221–247.
Poterba, J. (1988): “Financing Constraints and Corporate Investment: Comment,”
Brookings Papers on Economic Activity, 1, 200–204.
Shin, H., and R. Stulz (1998): “Are Internal Capital Markets Efficient?,” Quarterly
Journal of Economics, 113, 531–552.
Stein, J. (2003): “Agency Information and Corporate Investment,” in Handbook of the Eco-
nomics of Finance, ed. by M. Harris, and R. Stulz. Elsevier/North-Holland, Amsterdam.
Wooldridge, J. (2002): Econometric Analysis of Cross Section and Panel Data. MIT
Press, London.
(2009): “On Estimating Firm-Level Production Functions Using Proxy Variables
to Control for Unobservables,” Economics Letters, 104, 112–114.
49
Table 1: DGP with no endogeneity. Monte Carlo re-sults: Bias, Median Absolute Deviation, and Root MeanSquared Error
EstimatorModel OLS IV SV OLS IV SV
n = 100 n = 200Bias
zsv -0.001 3.641 0.002 0.001 -1.104 0.002ziv -0.001 0.002 -0.003 0.001 0.000 0.000
MADzsv 0.056 1.056 0.070 0.041 1.091 0.050ziv 0.055 0.166 0.085 0.040 0.115 0.060
RMSEzsv 0.083 253.644 0.106 0.059 119.844 0.075ziv 0.082 0.416 0.126 0.059 0.186 0.090
n = 500 n = 1000Bias
zsv 0.000 9.609 0.000 0.000 -2.191 0.001ziv 0.000 0.000 0.000 0.000 0.002 0.000
MADzsv 0.025 1.041 0.032 0.018 1.048 0.022ziv 0.024 0.075 0.040 0.018 0.052 0.027
RMSEzsv 0.037 797.869 0.048 0.026 101.951 0.034ziv 0.036 0.115 0.058 0.026 0.080 0.042
Notes: Monte Carlo experiments are based on 5,000 repetitions. Estimates of β2. zsv and ziv are,
respectively, the used simultaneous variables and instrumental variables employed in the estimations.
50
Table 2: DGP with endogeneity. Monte Carlo re-sults: Bias, Median Absolute Deviation, and Root MeanSquared Error
EstimatorModel OLS IV SV OLS IV SV
n = 100 n = 200Bias
zsv1 0.667 0.851 -0.138 0.668 0.644 -0.097zsv2 0.666 -21.8 -0.019 0.667 0.755 -0.017ziv 0.668 -0.078 0.808 0.667 -0.037 0.805
MADzsv1 0.668 0.943 0.205 0.669 0.947 0.143zsv2 0.664 1.202 0.045 0.666 1.199 0.027ziv 0.665 0.236 0.806 0.668 0.170 0.804
RMSEzsv1 0.674 18.1 0.307 0.671 63.462 0.225zsv2 0.673 1557.8 0.085 0.670 25.6 0.057ziv 0.675 2.737 0.825 0.670 0.281 0.814
n = 500 n = 1000Bias
zsv1 0.666 -0.161 -0.056 0.667 0.570 -0.031zsv2 0.668 0.510 -0.009 0.666 1.140 -0.005ziv 0.667 -0.015 0.801 0.667 -0.003 0.802
MADzsv1 0.666 0.895 0.089 0.667 0.921 0.060zsv2 0.668 1.179 0.014 0.667 1.185 0.009ziv 0.668 0.168 0.841 0.667 0.073 0.802
RMSEzsv1 0.668 58.2 0.141 0.667 20.719 0.094zsv2 0.669 51.1 0.030 0.667 24.214 0.017ziv 0.669 0.168 0.805 0.667 0.110 0.804
Notes: Monte Carlo experiments are based on 5000 repetitions. Estimates of β2. zsv1, zsv2, and ziv
are, respectively, the used simultaneous variables 1 and 2, and instrumental variables employed in the
estimations.
51
Table 3: Investment example. Descriptive statistics
Variable Mean Std. Dev. Min Max ObsInvestment 0.2026 0.1348 0.0000 2.5349 24676
q 0.8755 0.4900 0.1307 22.7939 24676
Cash flow 0.3193 0.6212 -80.3571 14.8021 24676
Table 4: Investment example. OLS and Instrumental Variables approaches (using qt−1, q2t−1
qt−2, q2t−2, qt−3, and q2
t−3 as IV)
OLS IV1 IV2 IV3 IV4 IV5 IV6 IV7 IV8q 0.0626*** 0.0478*** 0.0496*** 0.0325*** 0.0373*** 0.0241*** 0.0288*** 0.0470*** 0.0488***
( 0.003 ) ( 0.004 ) ( 0.004 ) ( 0.005 ) ( 0.005 ) ( 0.006 ) ( 0.006 ) ( 0.004 ) ( 0.004 )CF 0.1307*** 0.1337*** 0.1333*** 0.1368*** 0.1358*** 0.1385*** 0.1375*** 0.1338*** 0.1335*
( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 )Const. 0.1053*** 0.1165*** 0.1151*** 0.1282*** 0.1245*** 0.1345*** 0.1310*** 0.1171*** 0.1157
( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.004 ) ( 0.004 ) ( 0.005 ) ( 0.005 ) ( 0.003 ) ( 0.003 )qt−1 0.8348*** 1.0091*** 0.8133*** 1.0149***
( 0.004 ) ( 0.012 ) ( 0.007 ) ( 0.016 )q2t−1 -0.0699*** -0.0818***
( 0.004 ) ( 0.005 )qt−2 0.5393*** 0.8266*** -0.0140* -0.0461***
( 0.004 ) ( 0.009 ) ( 0.008 ) ( 0.013 )q2t−2 -0.0806*** 0.0070**
( 0.002 ) ( 0.003 )qt−3 0.2829*** 0.4717*** 0.0304*** 0.0526***
( 0.003 ) ( 0.006 ) ( 0.004 ) ( 0.006 )q2t−3 -0.0308*** -0.0023***
( 0.001 ) ( 0.001 )
Notes: Robust standard errors in parentheses. Coefficients and standard errors for the IV are from the first stage regression.*** p<0.01, ** p<0.05, * p<0.1.
52
Table 5: Investment example. Simultaneous variable approach (using qt−1, q2t−1, qt−2, q2
t−2,qt−3, and q2
t−3 as simultaneous variables)
S1 S2 S3 S4 S5 S6 S7 S8q 0.0938*** 0.1107*** 0.0896*** 0.0898*** 0.0804*** 0.0802*** 0.0969*** 0.1129***
( 0.026 ) ( 0.028 ) ( 0.022 ) ( 0.022 ) ( 0.019 ) ( 0.019 ) ( 0.026 ) ( 0.028 )CF 0.1243*** 0.1208*** 0.1252*** 0.1251*** 0.1270*** 0.1271*** 0.1237*** 0.1204***
( 0.005 ) ( 0.006 ) ( 0.004 ) ( 0.004 ) ( 0.004 ) ( 0.004 ) ( 0.005 ) ( 0.006 )Const. 0.0816*** 0.0687*** 0.0848*** 0.0846*** 0.0918*** 0.0920*** 0.0792*** 0.0671***
( 0.020 ) ( 0.021 ) ( 0.016 ) ( 0.016 ) ( 0.014 ) ( 0.015 ) ( 0.020 ) ( 0.021 )qt−1 -0.0684** 0.0766*** -0.0350 0.0915***
( 0.029 ) ( 0.022 ) ( 0.028 ) ( 0.030 )q2t−1 -0.0664*** -0.0606***
( 0.017 ) ( 0.017 )qt−2 -0.0546*** -0.0248 -0.0260* -0.0142
( 0.017 ) ( 0.019 ) ( 0.015 ) ( 0.026 )q2t−2 -0.0085 -0.0042
( 0.005 ) ( 0.008 )qt−3 -0.0282*** -0.0253** -0.0093 -0.0071
( 0.009 ) ( 0.011 ) ( 0.009 ) 0.014 )q2t−3 -0.0005 0.0007
( 0.001 ) ( 0.003 )
Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
Table 6: Education example. Descriptive statistics
Abbreviation Variable Mean Std. Dev. Min Max Obswage monthly earnings 957.95 404.36 115 3078 935IQ IQ score 101.28 15.05 50 145 935
KWW knowledge of world work score 35.74 7.64 12 56 935edu years of education 13.47 2.20 9 18 935exper years of work experience 11.56 4.37 1 23 935age age in years 33.08 3.11 28 38 935
tenure years with current employer 7.23 5.08 0 22 935married =1 if married 0.89 0.31 0 1 935south =1 if live in south 0.34 0.47 0 1 935urban =1 if live in SMSA 0.72 0.45 0 1 935black =1 if black 0.13 0.33 0 1 935lwage log(wage) 6.78 0.42 4.74 8.03 935
53
Table 7: Education example. OLS and Instrumental Variables approaches (using IQ, IQ2,KWW , and KWW 2 as IV variables)
OLS IV1 IV2 IV3 IV4 IV5 IV6
edu 0.0611*** 0.1104*** 0.1103*** 0.1055*** 0.1122*** 0.1089*** 0.1121***( 0.007 ) ( 0.015 ) ( 0.015 ) ( 0.019 ) ( 0.019 ) ( 0.014 ) ( 0.013 )
exper 0.0112*** 0.0253*** 0.0253*** 0.0239*** 0.0259*** 0.0249*** 0.0258***( 0.004 ) ( 0.005 ) ( 0.005 ) ( 0.006 ) ( 0.006 ) ( 0.005 ) ( 0.005 )
age 0.0085* -0.0016 -0.0016 -0.0006 -0.0020 -0.0013 -0.0019( 0.005 ) ( 0.006 ) ( 0.005 ) ( 0.006 ) ( 0.006 ) ( 0.005 ) ( 0.005 )
tenure 0.0272*** 0.0181** 0.0181** 0.0190** 0.0177** 0.0184** 0.0178**( 0.008 ) ( 0.009 ) ( 0.009 ) ( 0.009 ) ( 0.009 ) ( 0.009 ) ( 0.009 )
tenure2 -0.0010** -0.0005 -0.0005 -0.0005 -0.0004 -0.0005 -0.0004( 0.000 ) ( 0.001 ) ( 0.001 ) ( 0.001 ) ( 0.001 ) ( 0.001 ) ( 0.001 )
married 0.1934*** 0.2055*** 0.2055*** 0.2043 *** 0.2060 *** 0.2052 *** 0.2059 ***( 0.039 ) ( 0.040 ) ( 0.040 ) ( 0.040 ) ( 0.041 ) ( 0.040 ) ( 0.040 )
south -0.0909*** -0.0823*** -0.0823*** -0.0832*** -0.0820*** -0.0826*** -0.0820***( 0.026 ) ( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 )
urban 0.1852*** 0.1722*** 0.1722*** 0.1735*** 0.1717*** 0.1726*** 0.1718***( 0.027 ) ( 0.028 ) ( 0.028 ) ( 0.028 ) ( 0.028 ) ( 0.028 ) ( 0.028 )
black -0.1909*** -0.1475*** -0.1475*** -0.1518*** -0.1459*** -0.1488*** -0.1460 ***( 0.038 ) ( 0.041 ) ( 0.040 ) ( 0.042 ) ( 0.042 ) ( 0.040 ) ( 0.040 )
Cons. 5.1730*** 4.6962*** 4.6965*** 4.7436*** 4.6784*** 4.7108*** 4.6796***( 0.158 ) ( 0.208 ) ( 0.206 ) ( 0.238 ) ( 0.237 ) ( 0.200 ) ( 0.198 )
IQ 0.0614*** -0.0674* 0.0506*** -0.0731**(0.004) (0.036) (0.004) (0.035)
IQ2 0.0006*** 0.0006***(0.000) (0.000)
KWW 0.0983*** -0.0080 0.0595*** -0.0435(0.009) (0.054) (0.009) (0.050)
KWW 2 0.0015** 0.0015**(0.001) (0.001)
Notes: Robust standard errors in parentheses. Coefficients and standard errors for the IVvariables are from the first stage regression. *** p<0.01, ** p<0.05, * p<0.1.
54
Table 8: Education example. Proxy approach (using IQ, IQ2, KWW , and KWW 2 as proxyvariables)
P1 P2 P3 P4 P5 P6
edu 0.0487*** 0.0479*** 0.0550*** 0.0538*** 0.0461*** 0.0437***( 0.007 ) ( 0.007 ) ( 0.007 ) ( 0.007 ) ( 0.007 ) ( 0.007 )
exper 0.0107*** 0.0107*** 0.0113*** 0.0115*** 0.0108*** 0.0110***( 0.004 ) ( 0.004 ) ( 0.004 ) ( 0.004 ) ( 0.004 ) ( 0.004 )
age 0.0102** 0.0102** 0.0040 0.0023 0.0070 0.0053( 0.005 ) ( 0.005 ) ( 0.005 ) ( 0.005 ) ( 0.005 ) ( 0.005 )
tenure 0.0278*** 0.0276*** 0.0280*** 0.0270*** 0.0283*** 0.0270***( 0.008 ) ( 0.008 ) ( 0.008 ) ( 0.008 ) ( 0.008 ) ( 0.008 )
tenure2 -0.0010** -0.0010** -0.0010** -0.0010** -0.0011** -0.0010**( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 )
married 0.1928*** 0.1950*** 0.1857*** 0.1902*** 0.1878*** 0.1944***( 0.039 ) ( 0.039 ) ( 0.039 ) ( 0.039 ) ( 0.039 ) ( 0.039 )
south -0.0795*** -0.0784*** -0.0918*** -0.0914*** -0.0814*** -0.0797***( 0.026 ) ( 0.026 ) ( 0.026 ) ( 0.026 ) ( 0.026 ) ( 0.026 )
urban 0.1831*** 0.1846*** 0.1774*** 0.1758*** 0.1782*** 0.1777***( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 ) ( 0.027 )
black -0.1429*** -0.1518*** -0.1677*** -0.1763*** -0.1331*** -0.1484***( 0.039 ) ( 0.040 ) ( 0.039 ) ( 0.039 ) ( 0.040 ) ( 0.041 )
Cons. 4.9007*** 5.2920*** 5.2354*** 5.7889*** 4.9728*** 5.8760***( 0.173 ) ( 0.426 ) ( 0.160 ) ( 0.264 ) ( 0.178 ) ( 0.465 )
IQ 0.0038*** -0.0041 0.0034*** -0.0034( 0.001 ) ( 0.008 ) ( 0.001 ) ( 0.008 )
IQ2 0.0000 0.0000( 0.000 ) ( 0.000 )
KWW 0.0050** -0.0246** 0.0033 -0.0273**( 0.002 ) ( 0.011 ) ( 0.002 ) ( 0.011 )
KWW 2 0.0004*** 0.0004***( 0.000 ) ( 0.000 )
Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
55
Table 9: Education example. Simultaneous variable approach (using IQ, IQ2, KWW , andKWW 2 as simultaneous variables)
S1 S2 S3 S4 S5 S6
edu 0.0569*** 0.0467*** 0.0569*** 0.0509*** 0.0569*** 0.0438***( 0.010 ) ( 0.009 ) ( 0.009 ) ( 0.012 ) ( 0.009 ) ( 0.009 )
exper 0.0100*** 0.0071*** 0.0100*** 0.0083** 0.0100*** 0.0063**( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 ) ( 0.003 )
age 0.0094*** 0.0115*** 0.0094*** 0.0106*** 0.0094*** 0.0121***( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 )
tenure 0.0279*** 0.0298*** 0.0279*** 0.0291*** 0.0279*** 0.0303***( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 )
tenure2 -0.0010*** -0.0011*** -0.0010*** -0.0011*** -0.0010*** -0.0012***( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 ) ( 0.000 )
married 0.1923*** 0.1898*** 0.1923*** 0.1908*** 0.1923*** 0.1891***( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.003 ) ( 0.002 ) ( 0.002 )
south -0.0917*** -0.0935*** -0.0917*** -0.0927*** -0.0917*** -0.0939***( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 ) ( 0.002 )
urban 0.1863*** 0.1890*** 0.1863 *** 0.1879 *** 0.1863 *** 0.1898 ***( 0.003 ) ( 0.002 ) ( 0.002 ) ( 0.003 ) ( 0.002 ) ( 0.002 )
black -0.1946*** -0.2037*** -0.1947 *** -0.2000 *** -0.1947*** -0.2061***( 0.009 ) ( 0.008 ) ( 0.008 ) ( 0.010 ) ( 0.008 ) ( 0.008 )
Cons. 5.2137*** 5.3132*** 5.2143*** 5.2724*** 5.2143*** 5.3405***( 0.095 ) ( 0.091 ) ( 0.085 ) ( 0.113 ) ( 0.085 ) ( 0.086 )
IQ 0.0009 -0.0483*** -0.0013 0.0509( 0.002 ) ( 0.018 ) ( 0.056 ) ( 0.044 )
IQ2 0.0005*** -0.3268***( 0.000 ) ( 0.015 )
KWW 0.0027 -0.0845 0.0064 0.0000( 0.008 ) ( 0.197 ) ( 0.159 ) ( 0.000 )
KWW 2 0.0023 0.0053*( 0.005 ) ( 0.003 )
Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
56