nonlinear regressions with nonstationary time series · 2014-03-21 · nonlinear regressions with...
TRANSCRIPT
Nonlinear regressions with nonstationary time series
Nigel Chan and Qiying Wang∗
The University of Sydney
March 20, 2014
Abstract
This paper develops asymptotic theory for a nonlinear parametric cointegratingregression model. We establish a general framework for weak consistency that iseasy to apply for various nonstationary time series, including partial sums of linearprocesses and Harris recurrent Markov chains. We provide limit distributions fornonlinear least squares estimators, extending the previous works. We also introduceendogeneity to the model by allowing the error to be serially dependent on and crosscorrelated with the regressors.
JEL Classification: C13, C22.
Key words and phrases: Cointegration, nonlinear regressions, consistency, limit distribu-
tion, nonstationarity, nonlinearity, endogeneity.
1 Introduction
The past few decades have witnessed significant developments in cointegration analy-
sis. In particular, extensive researches have focused on cointegration models with linear
structure. Although it gives data researchers convenience in implementation, the linear
structure is often too restrictive. In particular, nonlinear responses with some unknown
parameters often arise in the context of economics. For empirical examples, we refer to
Granger and Terasvirta (1993) as well as Terasvirta et al. (2011). In this situation, it is
expected that nonlinear cointegration captures the features of many long-run relationships
in a more realistic manner.
∗The author thanks to the Editor, Associate Editor and three referees for their very helpful commentson the original version. The authors also acknowledge research support from Australian Research Council.Address correspondence to Qiying Wang: School of Mathematics and Statistics, University of Sydney,NSW 2006, Australia. E-mail: [email protected]
1
A typical nonlinear parametric cointegrating regression model has the form
yt = f(xt, θ0) + ut, t = 1, ..., n, (1)
where f : R×Rm → R is a known nonlinear function, xt and ut are regressor and regression
errors, and θ0 is an m-dimensional true parameter vector that lies in the parameter set
Θ. With the observed nonstationary data {yt, xt}nt=1, this paper is concerned with the
nonlinear least squares (NLS) estimation of the unknown parameters θ0 ∈ Θ. In this
regard, Park and Phillips (2001) (PP henceforth) considered xt to be an integrated, I(1),
process. Based on PP framework, Chang et al. (2001) introduced an additional linear time
trend term and stationary regressors into model (1). Chang and Park (2003) extended to
nonlinear index models driven by integrated processes. More recently, Choi and Saikkonen
(2010), Gao et al. (2009) and Wang and Phillips (2012) developed statistical tests for
the existence of a nonlinear cointegrating relation. Park and Chang (2011) allowed the
regressors xt to be contemporaneously correlated with the regression errors ut and Shi
and Phillips (2012) extended the model (1) by incorporating a loading coefficient.
The present paper has a similar goal to the previously mentioned papers but offers
more general results, which have some advantages for empirical studies. First of all, we
establish a general framework for weak consistency of the NLS estimator θn, allowing for
the xt to be a wider class of nonstationary time series. The set of sufficient conditions
is easy to apply for various nonstationary regressors, including partial sums of linear
processes and recurrent Markov chains. Furthermore, we provide limit distributions for
the NLS estimator θn. It deserves to mention that the routine employed in this paper
to establish the limit distributions of θn is different from those used in the previous
works, e.g. Park and Phillips (2001). Roughly speaking, our routine is related to joint
distributional convergence of a martingale under target and its conditional variance, rather
than using classical martingale limit theorem which requires establishing the convergence
in probability for the conditional variance. In nonlinear cointegrating regressions, there
are some advantages for our methodology since it is usually difficult to establish the
convergence in probability for the conditional variance, in particular, in the situation that
the regressor xt is a nonstationary time series. Second, in addition to the commonly used
martingale innovation structure, our model allows for serial dependence in the equilibrium
errors ut and the innovations driving xt. It is important as our model permits joint
determination of xt and yt, and hence the system is a time series structural model. Under
such situation, the weak consistency and limit distribution of the NLS estimator θn are
2
also established.
This paper is organized as follow. Section 2 presents our main results on weak con-
sistency of the NLS estimator θn. Theorem 2.1 provides a general framework. Its ap-
plications to integrable and non-integrable f are given in Theorems 2.2–2.5, respectively.
Section 3 investigates the limit distributions of θn in which the model (1) has a martingale
structure. Extension to endogeneity is presented in Section 4. As mentioned above, our
routine establishing the limit distribution of θn is different from previous works. Section
5 performs simulation, reporting and discussing the numerical values of means and stan-
dard errors which provide the evidence of accuracies of our NLS estimator. Section 6
presents an empirical example, providing a link between our theory and real applications.
The model of interest is the carbon Kuznets curve relating the per capita CO2 emissions
and per capita GDP. Endogeneity occurs in this example due to potential misreporting of
GDP, omitted variable bias and reverse causality. Section 7 concludes the paper. Section
8 provides partial technical proofs. Full details of the technical proofs can be found in
the Supplemental Material of this paper, where we also provide a unit root test for our
empirical example, and other details for simulation.
Throughout the paper, we denote constants by C,C1, C2, ... which may be different at
each appearance. For a vector x = (x1, ..., xm), assume that ‖x‖ = (x21 + ...+ x2
m)1/2, and
for a matrix A, the norm operator ‖·‖ is defined by ‖A‖ = supx:‖x‖=1 ‖xA‖. Furthermore,
the parameter set Θ ⊂ Rm is assumed to be compact and convex, and the true parameter
vector θ0 is an interior point of Θ.
2 Weak consistency
This section considers the estimation of the unknown parameters θ0 in model (1) by
NLS. Let Qn(θ) =∑n
t=1(yt − f(xt, θ))2. The NLS estimator θn of θ0 is defined to be the
minimizer of Qn(θ) over θ ∈ Θ, that is,
θn = arg minθ∈ΘQn(θ), (2)
and the error estimator is defined by σ2n = n−1
∑nt=1 u
2t , where ut = yt − f(xt, θn). To
investigate the weak consistency for the NLS estimator θn, this section assumes the regres-
sion model (1) to have a martingale structure. In this situation, our sufficient conditions
are closely related to those of Wu (1981), Lai (1994) and Skouras (2000), intending to
provide a general framework. In comparison to the papers mentioned, our assumptions
3
are easy to apply, particularly in nonlinear cointegrating regression situation as stated in
the two examples below. Extension to endogeneity between xt and ut is investigated in
Section 4.
2.1 A framework
We make use of the following assumptions for the development of weak consistency.
Assumption 2.1. For each π, π0 ∈ Θ, there exists a real function T : R→ R such that
|f(x, π)− f(x, π0)| ≤ h(||π − π0||)T (x), (3)
where h(x) is a bounded real function such that h(x) ↓ h(0) = 0, as x ↓ 0.
Assumption 2.2. (i) {ut,Ft, 1 ≤ t ≤ n} is a martingale difference sequence satisfying
E(|ut|2|Ft−1) = σ2 and sup1≤t≤nE(|ut|2q|Ft−1) <∞ a.s., where q > 1; and
(ii) xt is adapted to Ft−1, t = 1, ..., n.
Assumption 2.3. There exists an increasing sequence 0 < κn →∞ such that
κ−2n
n∑t=1
[T (xt) + T 2(xt)] = OP (1), (4)
and for any 0 < η < 1 and θ 6= θ0, where θ, θ0 ∈ Θ, there exist n0 > 0 and M1 > 0 such
that
P( n∑t=1
(f(xt, θ)− f(xt, θ0))2 ≥ κ2n /M1
)≥ 1− η, (5)
for all n > n0.
Theorem 2.1. Under Assumptions 2.1–2.3, the NLS estimator θn is a consistent esti-
mator of θ0, i.e. θn →P θ0. If in addition κ2nn−1 = O(1), then σ2
n →P σ2, as n→∞.
Assumptions 2.1 and 2.2 are the same as those used in Skouras (2000), which are
standard in the NLS estimation theory. Also see Wu (1981) and Lai (1994). Assumption
2.3 is used to replace (3.8), (3.9) and (3.11) in Skouras (2000), in which some uniform
conditions are used. In comparison to Skouras (2000), our Assumption 2.3 is related to
the conditions on the regressor xt and is more natural and easy to apply. In particular,
it is directly applicable in the situation that T is integrable and the regressor xt is a
nonstationary time series, as stated in the following sub-section.
4
2.2 Assumption 2.3: integrable functions
Due to Assumption 2.1, f(x, θ)− f(x, θ0) is integrable in x if T is an integrable function.
This class of functions includes f(x, θ1, θ2) = θ1|x|θ2I(x ∈ [a, b]), where a and b are finite
constants, the Gaussian function f(x, θ) = e−θx2, the Laplacian function f(x, θ) = e−θ|x|,
the logistic regression function f(x, θ) = eθ|x|/(1 + eθ|x|), etc. In this sub-section, two
commonly used non-stationary regressors xt are shown to satisfy Assumption 2.3 if T is
integrable.
Example 1 (Partial sums of linear processes). Let xt =∑t
j=1 ξj, where {ξj, j ≥ 1}is a linear process defined by
ξj =∞∑k=0
φk εj−k, (6)
where {εj,−∞ < j <∞} is a sequence of i.i.d. random variables with Eε0 = 0, Eε20 = 1
and the characteristic function ϕ(t) of ε0 satisfying∫∞−∞ |ϕ(t)|dt <∞. The coefficients φk
are assumed to satisfy one of the following conditions:
C1. φk ∼ k−µρ(k), where 1/2 < µ < 1 and ρ(k) is a function slowly varying at ∞.
C2.∑∞
k=0 |φk| <∞ and φ ≡∑∞
k=0 φk 6= 0.
Put d2n = Ex2
n. As in Wang, Lin and Gulati (2003), we have
d2n = Ex2
n ∼
{cµn
3−2µρ2(n), under C1,
φ2n, under C2,(7)
where cµ = 1/((1− µ)(3− 2µ))∫∞
0x−µ(x+ 1)−µdx. We have the following result.
Theorem 2.2. Suppose xt is defined as in Example 1 and Assumption 2.1 holds. Assume:
(i) T is bounded and integrable, and
(ii)∫∞−∞(f(s, θ)− f(s, θ0))2ds > 0 for all θ 6= θ0.
Then (4) and (5) hold with κ2n = n/dn. Consequently, if in addition Assumption 2.2, then
θn →P θ0.
Theorem 2.2 improves Theorem 4.1 of PP in two folds. Firstly, we allow for more
general regressor. The result under C1 is new, which allows xt to be long memory
process, including the fractionally integrated process as an example. PP only allow xt to
satisfy C2 with additional conditions on φk, that is, they require xt to be a partial sums
of a short memory process. Secondly, we remove the part (b) required in the definition
of an I-regular function given in their Definition 3.3. Furthermore, we allow for certain
5
non-integrable f such as the logistic regression function f(x, θ) = eθ|x|/(1 + eθ|x|), θ ≥ θ0
for some θ0 > 0, although we assume the integrability of T .
Example 2 (Recurrent Markov Chain). Let {xk}k≥0 be a Harris recurrent Markov
chain with state space (E, E), transition probability P (x,A) and invariant measure π. We
denote Pµ for the Markovian probability with the initial distribution µ, Eµ for correspon-
dent expectation and P k(x,A) for the k-step transition of {xk}k≥0. A subset D of E with
0 < π(D) <∞ is called D-set of {xk}k≥0 if for any A ∈ E+,
supx∈E
Ex( τA∑k=1
ID(xk))<∞,
where E+ = {A ∈ E : π(A) > 0} and τA = inf{n ≥ 1 : xn ∈ A}. By Theorem 6.2 of Orey
(1971), D-sets not only exist, but generate the entire sigma E , and for any D-sets C,D
and any probability measure ν, µ on (E, E),
limn→∞
n∑k=1
νP k(C)/n∑k=1
µP k(D) =π(C)
π(D), (8)
where νP k(D) =∫∞−∞ P
k(x,D)ν(dx). See Nummelin (1984) for instance.
Let a D-set D and a probability measure ν on (E, E) be fixed. Define
a(t) = π−1(D)
[t]∑k=1
νP k(D), t ≥ 0.
By recurrence, a(t) → ∞. Here and below, we set the state space to be the real space,
that is (E, E) = (R,R). We have the following result.
Theorem 2.3. Suppose xt is defined as in Example 2 and Assumption 2.1 holds. Assume:
(i) T is bounded and∫∞−∞ |T (x)|π(dx) <∞, and
(ii)∫∞−∞(f(s, θ)− f(s, θ0))2π(ds) > 0 for all θ 6= θ0.
Then (4) and (5) hold with κ2n = a(n). Consequently, if in addition Assumption 2.2, then
θn →P θ0.
Theorem 2.3 seems to be new to the literature. By virtue of (8), the asymptotic order
of a(t) depends only on {xk}k≥0. It is interesting to notice that Theorem 2.3 does not
impose the β-regular condition as commonly used in the literature. The Harris recurrent
Markov chain {xk}k≥0 is called β-regular if
limλ→∞
a(λt)/a(λ) = tβ, ∀t > 0, (9)
where 0 < β ≤ 1. See Chen (1999) for instance.
6
2.3 Assumption 2.3: beyond integrable functions
As noticed in Section 2.2, Assumption 2.3 holds for certain non-integrable f although we
require the integrability of T . Assumption 2.3 is more involved for general non-integrable
f when T is also non-integrable. In the latter situation, we require certain relationships
between f and T .
Assumption 2.4. (i) There exists a regular function1 g(x, θ, θ0) on Θ satisfying∫|s|≤δ
g2(s, θ, θ0)ds > 0
for all θ 6= θ0 and δ > 0, a real function T : R→ R and a positive real function v(λ)
which is bounded away from zero as λ→∞, such that for any bounded x,
supθ,θ0∈Θ
|f(λx, θ)− f(λx, θ0)− v(λ) g(x, θ, θ0)|/T (λx) = o(1), (10)
as λ→∞.
(ii) There exists a real function T1 : R → R such that T (λx) ≤ v(λ)T1(x) as |λx| → ∞and T1 + T 2
1 is locally integrable (i.e. integrable on any compact set).
Theorem 2.4. Suppose Assumption 2.4 holds. Suppose that there exists a continuous
Gaussian process G(t) such that x[nt],n ⇒ G(t), on D[0, 1], where xi,n = xi/dn and
0 < dn → ∞ is a sequence of real numbers. Then (4) and (5) hold with κ2n = nv2(dn).
Consequently, if in addition Assumptions 2.1–2.2 and assumption that the T s in Assump-
tions 2.1 and 2.4 coincide, then θn →P θ0.
Assumptions 2.1 and 2.4 are quite general, including many commonly used regression
functions. Typical examples include f(x, θ) = (x + θ)2, θex/(1 + ex), θ log |x|, θ|x|α
(α is fixed) and θ0 + θ1|x| + ... + θk|x|k. The class of functions satisfying Assumptions
2.1 and 2.4 are similar to, but wider than those imposed in Theorem 4.2 of PP. For
instance, Assumptions 2.1 and 2.4 (hence Theorem 2.4) are applicable for the function
f(x, θ) = (x + θ)2 [with T (x) = T1(x) = |x|, v(λ) = λ and g(x, θ, θ0) = 2(θ − θ0)x], but
Theorem 4.2 of PP is not directly applicable for this function. See, e.g., Example 4.1 (c)
of PP. We also mention that our condition on the regressor xt is much more general than
1Function H is called regular on Θ if (a) for all θ ∈ Θ, there exist for each ε > 0 continuous functionsHε, Hε, and a constant δε > 0 such that Hε(x, θ) ≤ H(y, θ) ≤ Hε(x, θ) for all |x − y| < δε on K, acompact set of R, and such that
∫K
(Hε − Hε)(x, θ)dx → 0 as ε → 0, and (b) for all x ∈ R, H(x, ·) isequicontinuous in a neighborhood of x.
7
that of PP, as we only require that x[nt]/dn converges weakly to a continuous Gaussian
process. This kind of weak convergence condition is very likely close to be necessary.
We next consider a class of asymptotically homogeneous functions. In this regard, we
follow PP. Let f : R×Θ→ R have the structure:
f(λx, θ) = v(λ, θ)h(x, θ) + b(λ, θ)A(x, θ)B(λx, θ), (11)
where supθ∈Θ |b(λ, θ) v−1(λ, θ)| → 0, as λ → ∞; supθ∈Θ |A(x, θ)| is locally bounded, that
is, bounded on bounded intervals; supθ∈Θ |B(λx, θ)| is bounded on R; h(x, θ) is regular on
Θ satisfying∫|s|≤δ h
2(s, θ)ds > 0 for all θ 6= θ0 and δ > 0, and v(λ, θ) satisfy: there exist
ε > 0 and a neighborhood N of θ such that as λ→∞
inf|p−p|<ε|q−q|<ε
infθ∈N|pv(λ, θ)− qv(λ, θ0)| → ∞, (12)
for any θ 6= θ0 and p, q > 0.
Theorem 2.5. Suppose that f in model (1) has the structure (11), and in addition to
Assumption 2.2, there exists a continuous Gaussian process G(t) such that x[nt],n ⇒ G(t)
on D[0, 1], where xi,n = xi/dn and 0 < dn →∞ is a sequence of real numbers. Then, the
NLS estimator θn defined by (2) is a consistent estimator of θ0, i.e. θn →P θ0.
The conditions on f given in Theorem 2.5 are the same as those used in Theorem 4.3 of
PP, which are satisfied by the functions such as the Box-Cox transformation (|x|θ − 1)/θ.
The difference between current Theorem 2.5 and Theorem 4.3 of PP is that we only
require that x[nt]/dn converges weakly to a continuous Gaussian process, which is close to
be necessary.
To investigate the weak consistency of θn, this section makes use of three different
sets of conditions on f : Assumption 2.1 with T being integrable (Theorems 2.2–2.3);
Assumptions 2.1 and 2.4 (Theorem 2.4) and the condition (11) (Theorem 2.5). We remark
that these condition sets are mutually exclusive. There are examples of the function f
such that one of these assumptions holds, but not other two. For instance, the function
f(x, θ) = θ/(1 + x2) satisfies Assumption 2.1 with h(x) = x and T (x) = 1/(1 + x2),
but Assumption 2.4 and the condition (11) fail; the function f(x, θ) = (x + θ)2 satisfies
Assumptions 2.1 and 2.4, but it does not satisfy condition (11).
8
3 Limit distribution
This section considers limit distribution of θn. In what follows, let Qn and Qn be the
first and second derivatives of Qn(θ) in the usual way, that is, Qn = ∂Qn/∂θ and Qn =
∂2Qn/∂θ∂θ′. Similarly we define f and f . We assume these quantities exist whenever they
are introduced. The following result comes from an application of Lemma 1 in Andrews
and Sun (2004), which provides a framework and plays a key part in our development on
limit distribution of θn.
Theorem 3.1. There exists a sequence of m ×m nonrandom nonsingular matrices Dn
with ‖ D−1n ‖→ 0, as n→∞, such that
(i) supθ:‖Dn(θ−θ0)‖≤kn ‖ (D−1n )′
∑nt=1
[f(xt, θ)f(xt, θ)
′−f(xt, θ0)f(xt, θ0)′]D−1n ‖= oP (1),
(ii) supθ:‖Dn(θ−θ0)‖≤kn ‖ (D−1n )′
∑nt=1 f(xt, θ)
[f(xt, θ)− f(xt, θ0)
]D−1n ‖= oP (1),
(iii) supθ:‖Dn(θ−θ0)‖≤kn ‖ (D−1n )′
∑nt=1 f(xt, θ)utD
−1n ‖= oP (1),
(iv) Yn := (D−1n )′
∑nt=1 f(xt, θ0)f(xt, θ0)′D−1
n →D M , where M > 0, a.s., and
Zn := (D−1n )′
n∑t=1
f(xt, θ0)ut = OP (1),
for some sequence of constants {kn, n ≥ 1} for which kn → ∞, as n → ∞. Then, there
exists a sequence of estimators {θn, n ≥ 1} satisfying Qn(θn) = 0 with probability that
goes to one and
Dn(θn − θ0) = Y −1n Zn + oP (1). (13)
If we replace (iv) by the following (iv)’, then Dn(θn − θ0)→D M−1 Z.
(iv)’ for any α′i = (αi1, ..., αim) ∈ Rm, i = 1, 2, 3,
(α′1 Ynα2, α′3Zn) →D (α′1M α2, α′3 Z),
where M > 0, a.s. and P (Z <∞) = 1.
Our routine in establishing the limit distribution of θn is essentially different from that
of PP, and is particularly convenient in the situation that f , f and f all are integrable in
which Dn = κn I for some 0 < κn → ∞, where I is an identity matrix. See next section
for examples. The condition AD3 of PP requires to show Yn →P M instead of (iv). It is
equivalent to say that, under the PP’s routine, one requires to show (at least under an
enlarged probability space)
1√n
n∑t=1
g(xt)→P
∫ ∞−∞
g(s)dsLW (1, 0), (14)
9
where xt is an integrated process, g is a real integrable function and LW (t, s) is the
local time of the standard Brownian Motion W (t)2. The convergence in probability is
usually hard or impossible to establish without enlarging the probability space. Our
routine essentially reduces the convergence in probability to less restrictive convergence
in distribution. Explicitly, in comparison to (14), we only need to show that
1√n
n∑t=1
g(xt)→D
∫ ∞−∞
g(s)dsLW (1, 0). (15)
This allows to extend the nonlinear regressions to a much wider class of nonstationary
regressor data series, and enables our methodology and proofs straightforward and neat.
3.1 Limit distribution: integrable functions
We now establish our results on convergence in distribution for the θn, which mainly
involves settling the conditions (i)–(iii) and (iv)’ in Theorem 3.1. First consider the
situation that f is an integrable function, together with some additional conditions on xt
and ut.
Assumption 3.1. (i) xt is defined as in Example 1, that is, xt =∑t
j=1 ξj, where ξj
satisfies (6);
(ii) Fk is a sequence of increasing σ-fields such that εk ∈ Fk and εk+1 is independent of
Fk for all k ≥ 1, and εk ∈ F1 for all k ≤ 0;
(iii) {uk,Fk}k≥1 forms a martingale difference satisfying maxk≥m |E(u2k+1 | Fk) − 1| → 0
a.s., and for some δ > 0, maxk≥1E(|uk+1|2+δ | Fk) <∞.
Assumption 3.2. Let p(x, θ) be one of f , fi and fij, 1 ≤ i, j ≤ m.
(i) p(x, θ0) is a bounded and integrable real function;
(ii) Σ =∫∞−∞ f(s, θ0)f(s, θ0)′ds > 0 and
∫∞−∞(f(s, θ)− f(s, θ0))2ds > 0 for all θ 6= θ0;
(iii) There exists a bounded and integrable function Tp : R → R such that |p(x, θ) −p(x, θ0)| ≤ hp(||θ − θ0||)Tp(x), for each θ, θ0 ∈ Θ, where hp(x) is a bounded real
function such that hp(x) ↓ hp(0) = 0, as x ↓ 0.
2Here and below, the process {Lζ(t, s), t ≥ 0, s ∈ R} is said to be the local time of a measurableprocess {ζ(t), t ≥ 0} if, for any locally integrable function T (x),∫ t
0
T [ζ(s)]ds =
∫ ∞−∞
T (s)Lζ(t, s)ds, all t ∈ R,
with probability one.
10
Theorem 3.2. Under Assumptions 3.1 and 3.2, we have√n/dn(θn − θ0)→D Σ−1/2 NL
−1/2G (1, 0), (16)
where N is a standard normal random vector, which is independent of G(t) defined by
G(t) =
{W3/2−µ(t), under C1,
W (t), under C2.(17)
Here and below, Wβ(t) denotes the fractional Brownian motion with 0 < β < 1 on
D[0, 1], defined as follows:
Wβ(t) =1
A(β)
[ ∫ 0
−∞
[(t− s)β−1/2 − (−s)β−1/2
]dW (s) +
∫ t
0
(t− s)β−1/2dW (s)],
where W (s) is a standard Brownian motion and
A(β) =( 1
2β+
∫ ∞0
[(1 + s)β−1/2 − sβ−1/2
]2
ds)1/2
.
Remark 3.1. Theorem 3.2 improves Theorem 5.1 of PP by allowing xt to be a long
memory process, which includes fractional integrated processes as an example.
Using the same routine and slight modifications of assumptions, we have the following
limit distribution when xt is a β-regular Harris recurrent Markov chain.
Assumption 3.1*. (i) xt is defined as in Example 2 satisfying (9), that is, xt is a β-
regular Harris recurrent Markov chain;
(ii) E(uk | Fnk) = 0 for any 1 ≤ k ≤ n, n ≥ 1, maxk≥m |E(u2k+1 | Fnk)− 1| → 0 a.s. and
for some δ > 0, maxk≥1E(|uk+1|2+δ | Fnk) <∞, where Fnk = σ(Fk, x1, ..., xn).
Assumption 3.2*. Let p(x, θ) be one of f , fi and fij, 1 ≤ i, j ≤ m.
(i) p(x, θ0) is a bounded real function with∫∞∞|p(x, θ0)|π(dx) <∞;
(ii) Σπ =∫∞−∞ f(s, θ0)f(s, θ0)′π(ds) > 0 and
∫∞−∞(f(s, θ) − f(s, θ0))2π(ds) > 0 for all
θ 6= θ0;
(iii) There exists a bounded function Tp : R → R with∫∞−∞ Tp(x)π(dx) < ∞ such that
|p(x, θ)− p(x, θ0)| ≤ hp(||θ− θ0||)Tp(x), for each θ, θ0 ∈ Θ, where hp(x) is a bounded
real function such that hp(x) ↓ hp(0) = 0, as x ↓ 0.
11
Theorem 3.3. Under Assumptions 3.1* and 3.2*, we have√a(n)(θn − θ0)→D Σ−1/2
π NΠ−1/2β , (18)
where N is a standard normal random vector, which is independent of Πβ, and for β = 1,
Πβ = 1, and for 0 < β < 1, Π−ββ is a stable random variable with Laplace transform
E exp{−tΠ−ββ } = exp{− tβ
Γ(β + 1)
}, t ≥ 0. (19)
Remark 3.2. Theorem 3.3 is new, even for the stationary case (β = 1) in which a(n) = n
and the NLS estimator θn converges to a normal variate. The random variable Πβ in
the limit distribution, after scaled by a factor of Γ(β + 1)−1, is a Mittag-Leffler random
variable with parameter β, which is closely related to a stable random variable. For details
regarding the properties of this distribution, see page 453 of Feller (1971) or Theorem 3.2
of Karlsen and Tjøstheim (2001).
Remark 3.3. Assumption 3.1* (ii) imposes a strong orthogonal property between the
regressor xt and the error sequence ut. It is not clear at the moment whether such
condition can be relaxed to a less restrictive one where xt is adapted to Fnt and xt+1 is
independent of Fnt for all 1 ≤ t ≤ n. We leave this for future research.
Remark 3.4. Unlike linear cointegration, integrable regression functions do not propor-
tionately transfer the effect of outliers of the nonstationary regressor to the response.
This is useful when practitioners want to lessen the impact of extreme values of xt on the
response. In addition, such functions arise in macro-economics when the dependent vari-
able has an uneven response to the regressor. Empirical examples include central banks’
market intervention and the currency exchange target zones, described in Phillips (2001).
Remark 3.5. The convergence rates in (16) and (18) are reduced by the nonstationarity
property of the regressor xt when f is an integrable function. For the partial sums of linear
processes in Theorem 3.2, comparing to the stationary situation where the convergence
rate is n1/2, the rate is reduced to n3/4−µ/2ρ1/2(n) for the long memory case, and is further
reduced to n1/4 for the short memory case. For the β-regular Harris recurrent Markov
chain, note that by (9), the asymptotic order a(n) is regularly varying at infinity, i.e., there
exists a slowly varying function ρ(n) such that a(n) ∼ nβρ(n). Therefore, the convergence
rate of the NLS estimator decreases as the regularity index β of the chain decreases from
one (stationary situation) to zero.
12
3.2 Limit distribution: beyond integrable functions
We next consider the limit distribution of θn when the regression function f is non-
integrable. Unlike PP and Chang et al. (2001), we use a more general regressor xt in the
development of our asymptotic distribution.
Assumption 3.3. (i) {ut,Ft, 1 ≤ t ≤ n} is a martingale difference sequence satisfying
E(|ut|2|Ft−1) = σ2 and sup1≤t≤nE(|ut|2q|Ft−1) <∞ a.s., where q > 1;
(ii) xt is adapted to Ft−1, t = 1, ..., n, max1≤t≤n |xt|/dn = OP (1), where d2n = var(xn) →
∞; and
(iii) There exists a vector of continuous Gaussian processes (G,U) such that
(x[nt],n, n−1/2
[nt]∑i=1
ui)→D (G(t), U(t)), (20)
on DR2 [0, 1], where xt,n = xt/dn.
Assumption 3.4. Let p(x, θ) be one of f , fi and fij, 1 ≤ i, j ≤ m. There exists a real
function Tp : R→ R such that
(i) |p(x, θ)− p(x, θ0)| ≤ Ap(||θ− θ0||)Tp(x), for each θ, θ0 ∈ Θ, where Ap(x) is a bounded
real function satisfying Ap(x) ↓ Ap(0) = 0, as x ↓ 0;
(ii) For any bounded x, supθ∈Θ |p(λx, θ) − vp(λ)hp(x, θ)|/Tp(λx) = o(1), as λ → ∞,
where hp(x, θ) on Θ is a regular function and vp(λ) is a positive real function which
is bounded away from zero as λ→∞; and
(iii) Tp(λx) ≤ vp(λ)T1p(x) as |λx| → ∞, where T1p : R → R is a real function such that
T1p + T 21p is locally integrable (i.e. integrable on any compact set).
For notation convenience, let v(t) = vf (t), h(x, θ) = hf (x, θ), vi(t) = vfi(t) and
v(t) = (v1(t), ..., vm(t)). Similarly, we define v(t), h(x, θ) and h(x, θ). We have the
following main result.
Theorem 3.4. Suppose Assumptions 3.3 and 3.4 hold. Further assume that
sup1≤i,j≤m |v(dn) vij(dn)
vi(dn) vj(dn)| <∞ and
∫|s|≤δ h(s, θ0)h(s, θ0)′ds > 0 for all δ > 0. Then we have
Dn (θn − θ0)→D
(∫ 1
0
Ψ(t)Ψ(t)′dt)−1
∫ 1
0
Ψ(t) dU(t), (21)
on D[0, 1], as n→∞, where Ψ(t) = h(G(t), θ0) and Dn = diag(√nv1(dn), ...,
√nvm(dn)).
13
Remark 3.6. Except the joint convergence, other conditions in establishing Theorem
3.4 are similar to Theorem 5.2 of PP. PP made use of the concept of H0-regular. Our
conditions on f , fi and fij, 1 ≤ i, j ≤ m are more straightforward and easy to identify.
Furthermore, the joint convergence assumption under present paper is quite natural when
f, f and f all satisfy Assumption 3.4 (ii). Indeed, under fi(λx, θ) ∼ vi(λ)hi(x, θ), one can
easily obtain the following asymptotics under our joint convergence.
D−1n
n∑t=1
f(xt, θ0)ut →D
∫ 1
0
h(G(t), θ0)dU(t), (22)
on D[0, 1], which is required in the proof of (21). See, e.g., Kurtz and Protter (1991) and
Hansen (1992).
Remark 3.7. If we further assume that U(t) and G(t) are asymptotic independent, that
is, the long run relationship between the regressor sequence xt and the innovative sequence
ut vanishes asymptotically, the limiting distribution in (21) will become mixed normal.
In particular, when U(t) is a standard Wiener process, we have
Dn (θn − θ0)→D
(∫ 1
0
Ψ(t)Ψ(t)′dt)−1/2
N, (23)
where N is a standard normal random vector.
Remark 3.8. Nonlinear cointegrating regressions with structure described in Theorem
3.4 are useful to modelling money demand functions. In such cases, yt is the logarithm of
the real money balance, xt is the nominal interest rate, and f can either be f(x, α, β) =
α + β log |x| or f(x, α, β) = α + β log(1+|x||x| ). See Bae and de Jong (2007) and Bae et
al. (2006) for empirical studies investigating the estimation of money demand functions
in USA and Japan respectively. Also, see Bae et al. (2004) for the derivation of these
functional forms from the underlying money demand theories studied in macro-economics.
Remark 3.9. Another example is the Michaelis–Menton model, which was considered
by Lai (1994). In such case, the regression function is f(x, θ1, θ2) = θ1/(θ2 + x)I{x ≥ 0}.Bates and Watts (1988) employs such model to investigate the dependence of rate of
enzymatic reactions on the concentration of substrate.
Remark 3.10. Unlike in the situation where f is integrable, there is super convergence
when f has the structure given in Assumption 3.4 and xt is nonstationary. The conver-
gence rate is given by Dn, whose elements√nvi(dn) are all faster than the standard rate
√n in the stationary situation.
14
We finally remark that, if f is an asymptotically homogeneous function having the
structure (11), it is also possible to extend PP’s result from a short memory linear process
to general nonstationary time series as described in Assumption 3.3. The idea of proof
for such result can be easily generalized from Theorem 5.3 of PP. Also see Chang et al.
(2001). As it only involves slight modification of notation, we omit the details.
4 Extension to endogeneity
Assumption 2.2 ensures the model (1) having a martingale structure. The result in this
regard is now well known. However, there is little work on allowing for contemporaneous
correlation between the regressors and regression errors. Using a nonparametric approach,
Wang and Phillips (2009b) considered kernel estimate of f and allowed the equation error
ut to be serially dependent and cross correlated with xs for |t− s| < m0, thereby inducing
endogeneity in the regressor, i.e., cov(ut, xt) 6= 0. In relation to present paper de Jong
(2002) considered model (1) without exogeneity, and assuming certain mixing conditions.
Chang and Park (2011) considered a simple prototypical model, where the regressor and
regression error are driven by i.i.d. innovations. This section provides some extensions
to these works. Our main results show that there is an essential effect of endogeneity on
the asymptotics, introducing the bias in the related limit distribution. Full investigation
in this regard requires new asymptotic results, which will be left for future work.
4.1 Asymptotics: integrable functions
We first consider the situation that f satisfies Assumption 2.1 with T being integrable. Let
ηi ≡ (εi, νi), i ∈ Z be a sequence of i.i.d. random vectors satisfying Eη0 = 0 and E‖η0‖2q <
∞, where q ≥ 1. Let the characteristic function ϕ(t) of ε0 satisfy∫∞−∞ |ϕ(t)|2dt <∞ and∫∞
−∞ |t|3|ϕ(t)|mdt < ∞ for some m > 0. As noticed in Remark 4 of Jeganathan (2008),
the conditions on the characteristics function ϕ(t) are not very restrictive.
Assumption 4.1. (i) xt =∑t
j=1 ξj, where ξj =∑∞
k=0 φkεj−k and the coefficients φk
satisfies (i)∑∞
k=0 k|φk| <∞ and∑∞
k=0 φk 6= 0 or (ii) φk ∼ k−µρ(k), where 1/2 < µ < 1
and ρ(k) is a function slowly varying at ∞.
(ii) ut =∑∞
k=0 ψk νt−k, where the coefficients ψk satisfies that∑∞
k=0 k2|ψk|2 < ∞, ψ ≡∑∞
k=0 ψk 6= 0 and∑∞
k=0 |ψk|max{1, |φk|} <∞ where φk =∑k
i=0 φi.
As we do not impose the independence between εk and νk, Assumption 4.1 provides
15
the endogeneity in the model (1), and it is much general than the conditions set given in
Chang and Park (2011). We have the following result.
Theorem 4.1. Suppose Assumptions 2.1 and 4.1 hold. Further assume:
(i) T is bounded and integrable, and
(ii)∫∞−∞(f(s, θ)− f(s, θ0))2ds > 0 for all θ 6= θ0.
Then the NLS estimator θn defined by (2) is a consistent estimator of θ0, i.e. θn →P θ0.
If in addition Assumption 3.2, then, as n→∞,√n/dn(θn − θ0)→D Σ−1 Λ1/2 NL
−1/2G (1, 0), (24)
where f(µ) =∫eiµxf(x, θ0)dx,
Λ = (2π)−1∫ f(µ)f(µ)′[Eu2
0 + 2∑∞
r=1E(u0ure−iµxr)] dµ, (25)
and other notations are given as in Theorem 3.1.
4.2 Asymptotics: beyond integrable functions
This section considers the situation that f satisfies Assumption 2.1 with T being non-
integrable. Let again ηi ≡ (εi, νi), i ∈ Z be a sequence of i.i.d. random vectors with
Eη0 = 0 and E‖η0‖2q < ∞, where q > 2. We make use of the following assumption in
related to the model (1).
Assumption 4.2. (i) xt =∑t
j=1 ξj, where ξj =∑∞
k=0 φkεj−k with φ =∑∞
k=0 φk 6= 0 and∑∞k=0 |φk| <∞;
(ii) ut =∑∞
k=0 ψk νt−k, where the coefficients ψk are assumed to satisfy∑∞
k=0 k|ψk| <∞,
ψ ≡∑∞
k=0 ψk 6= 0 and∑∞
k=0 k|ψk||φk| <∞.
Theorem 4.2. Suppose that Assumption 4.2 holds and in addition to Assumption 2.4, for
any θ 6= θ0, g(x, θ, θ0) is twice differentiable with that g′(x, θ, θ0) is locally bounded (i.e.,
bounded on any compact set). Then the NLS estimator θn defined by (2) is a consistent
estimator of θ0, i.e. θn →P θ0.
Define hx(x, θ) = ∂h(x,θ)∂x
.
Theorem 4.3. In addition to the conditions of Theorem 4.2, Assumptions 3.4 holds.
Further assume that:
16
(i) sup1≤i,j≤m |v(√n) vij(
√n)
vi(√n) vj(
√n)| <∞ and
∫|s|≤δ h(s, θ0)h(s, θ0)′ds > 0 for all δ > 0, and
(ii) hx(x, θ) is locally bounded (i.e., bounded on any compact set).
Then the limit distribution of θn is given by
Dn(θn − θ0)→D
(∫ 1
0
Ψ(t)Ψ(t)′dt)−1[
σξu
∫ 1
0
Ψ(t) dt+
∫ 1
0
Ψ(t) dU(t)], (26)
as n → ∞, where Ψ(t) = h(W (t), θ0), Ψ(t) = hx(W (t), θ0), σξu =∑∞
j=0E(ξ0uj), Dn =
diag(√nv1(√n), ...,
√nvm(
√n)) and (W (t), U(t)) is a bivariate Brownian motion with
covariance matrix
∆ =
(φ2Eε20 φψEε0ν0
φψEε0ν0 ψ2Eν20
).
5 Simulation
In this section, we investigate the finite sample performance of the NLS estimator θn of
nonlinear regressions with endogeneity. Chang and Park (2011) performed simulation of
similar model, but only considered the error structure ut to be i.i.d. innovation. We intend
to investigate the sampling behavior of θn under different degree of serially dependence
of ut on itself. To this end, we generate our data in the following way:
xt = xt−1 + εt,
vt =√
1− ρ2wt + ρεt,
f1(x, θ) = exp{−θ|x|} and f2(x, α, β1, β2) = α + β1x+ β2x2, (27)
where {wt} and {εt} are i.i.d. N(0, 32) variables, and ρ is the correlation coefficient that
controls the degree of endogeneity. The true value of θ is set as 0.1 and that of (α, β1, β2)
is set as (−20, 10, 0.1). The error structure is generated according to the following three
scenarios:
S1: ut = vt,
S2: ut =∑∞
j=1 j−10 vt−j+1, and
S3: ut =∑∞
j=1 j−4 vt−j+1.
Scenario S1 is considered by Chang and Park (2011), which eliminates the case that
ut is serially correlated. Scenarios S2 and S3 introduce self-dependence to the error
sequence. The decay rate of S2 is faster than that of S3, which implies that the level of
self dependence of ut increases from S1 to S3.
17
It is obvious that f1 is an integrable function and f2 satisfies the conditions in Theorem
4.3, with
v(λ, α, β1, β2) = λ2 and h(x, α, β1, β2) = β2x2.
In addition, we have
h(x, α, β1, β2) = (1, x, x2)′ and hx(x, α, β1, β2) = (0, 1, 2x)′.
It would be convenient for later discussion to write out the limit distributions of the
estimators for the f2 case. By Theorem 4.3, we have n1/2(αn − α0)
n(β1n − β10)
n3/2(β2n − β20)
→D Σ−1
U(1)
σξu +∫ 1
0W (t)dU(t)
2σξu∫ 1
0W (t)dt+
∫ 1
0W 2(t)dU(t)
, (28)
where σξu =∑∞
j=0E(ε0uj) and
Σ =
1∫ 1
0W (t)dt
∫ 1
0W 2(t)dt∫ 1
0W (t)dt
∫ 1
0W 2(t)dt
∫ 1
0W 3(t)dt∫ 1
0W 2(t)dt
∫ 1
0W 3(t)dt
∫ 1
0W 4(t)dt
.
In our simulations, we draw samples of sizes n = 200, 500 to estimate the NLS es-
timators and their t-ratios. Each simulation is run for 10,000 times, and their densities
are estimated using kernel method with normal kernel function. It is expected that as ρ
increases, the degree of endogeneity increases, and hence the variance of the distribution
will increase due to (24) and (26). It is also expected that the more serial correlation of
ut, the higher the variance of the limit distribution due to the cross terms appeared in
(25) and (26). Our simulation results largely corroborate with our theoretical results in
Section 4.
Firstly, the means of the estimators are close to the true values, and the deviations
from true values decrease as the sample sizes increase. The reductions of standard errors
from n = 200 to 500 are close to the theoretical values. For example, the ratio of standard
errors for αn between the sample size 200 and 500 is 1.564 in scenario S1, which is close
to theoretical value√
500200
= 1.581. This confirms that our estimators are accurate in all
scenarios. We present all these numerical values in Tables 2 and 3, which are given in the
Supplemental Materials of this paper (Section 12).
Secondly, comparing the S1–S3 curves in each plot, we can see that the S3 curves
have fat tails and lower peak than those of S1–S2. This verifies that high dependence
18
of ut on its own past will increase the variance of limit distribution. On the other hand,
comparing the shapes of the curves for different values of ρ, the curves with ρ = 0
have highest peaks, and the peakedness decrease as ρ increases. This matches with our
expectation that, if the cross dependence between ut and xt increases, the variance of
limit distribution will increase. The following Figure 1 provides the density estimates of
θn with n = 500. Additional figures for density estimates of θn and (αn, β1n, β2n) are given
in the Supplemental Materials of this paper (Figures 6-9).
Figure 1: Density estimates of θn.
Finally, the sampling results for t-ratios are also much expected from our limit theories.
In particular, we have interesting results for the t-ratios of β1n in Figure 2. In scenario S1,
Figure 2: Density of β1n t-ratios.
there is no endogeneity and the limiting t-ratios overlap with the normal curve. However,
as we introduce endogeneity to the model in scenarios S2 and S3, the t-ratios is shifted
away from the standard normal. Such behavior can be explained by looking at the limit
distribution of β1n in (28). When ρ 6= 0, there exists an extra term of Σ−1σξu, and such
term will introduce bias to the limit distribution. More figures for t-ratios are given in
the Supplemental Materials of this paper (Figures 10-13).
19
6 An empirical example: Carbon Kuznets Curve
In this section, we consider a simple example of the model with endogeneity. The equation
of interest is the Carbon Kuznets Curve (CKC) relating the per capita CO2 emission of
a country to its per capita GDP. Basically, the CKC hypothesis states that there is an
inverted-U shaped relationship between economic activities and per capita CO2 emissions.
As it is well accepted that the GDP variable displays some wandering nonstationary be-
haviors over time, such problem is relevant to nonlinear transformations of nonstationary
regressors. Refer to Wagner (2008) and Muller-Furstenberger and Wagner (2007) for
detailed expositions of how such problem relates to the works of PP and Chang et al.
(2001).
The formula that we currently consider has a quadratic formulation and is given by
ln(et) = α + β1 ln(xt) + β2(ln(xt))2 + ut, 1 ≤ t ≤ n, (29)
where et and xt denote the per capital emissions of CO2 and GDP in period t, and ut is a
stochastic error term. Figure 3 presents the plots of logarithm of CO2 versus logarithm of
GDP for Belgium, Denmark and France. The quadratic polynomial specification of CO2
in GDP has been extensively analysed in the literature. See the introduction of Piaggio
and Padilla (2012) for an overiew. Roughly speaking, the upward slope of the curve can
be explained by the increase in natural resources depletion as the economic activities of
grow. As the country continues to develop, technological advance and stricter regulatory
policies will start contributing to a reduction in the emission of air pollutants, hence,
result in an inverted-U shape.
In the usual CKC formulation, the logarithm of per capita CO2 emissions and GDP
are assumed to be integrated processes, see Muller-Furstenberger and Wagner (2007).
Also, it is clear that the nonlinear link function f(x, α, β1, β2) = α + β1x + β2x2 satisfies
the conditions in Theorem 4.3, and is considered in our simulations. There are studies
which employ more complicated models, by including time trend, stochastic terms and
additional explanatory variables such as the energy structure. To avoid complicating our
discussion, we only consider GDP as the only explanatory variable.
In the literature most studies assumed that the parameters (β1, β2) are homogeneous
across different countries. Several works investigated the appropriateness of restricting all
countries adhering to the same values of (β1, β2). For example, List and Gallet (1999), Di-
jkgraaf et al. (2005), Piaggio and Padilla (2012). Particularly, Piaggio and Padilla (2012)
20
tested this assumption by checking whether the confidence intervals of (β1, β2) across dif-
ferent countries overlap. They rejected the homogeneity assumption of parameters across
different countries, based on the result that they could not find any possible group with
more than 4 countries whose confidence intervals overlap with each other.
As an example of employing our NLS estimators θn = (αn, β1n, β2n) and calculating
their confidence intervals, we carry out a similar study and examine whether the 95% con-
fidence intervals of the parameters (β1n, β2n) across different countries overlap with each
other. We extend previous studies by incorporating the endogeneity in the model. The
assumption that xt is an exogenous variable does not usually hold in practise as there are
many potential sources of endogeneity. Firstly, endogeneity occurs due to measurement
error of the explanatory variables xt, including the misreporting of GDP of each country.
Secondly, there might be omitted variable bias, which arises if there exists a relevant
explanatory variable that is correlated with the regressor xt (per capita GDP), but is
excluded from the model. Finally, potential reverse causality will give rise to endogeneity.
That is, the CO2 emi ssions yt might at the same time has a feedback effect on the GDP
xt (e.g., due to stricter government regulation).
We consider 14 countries, which are shown in Piaggio and Padilla (2012) that their
response variables have a quadratic relationship described in (29) with the regressors. We
use annual data in this example and for each country, there are 58 observations of the
CO2 emission data from 1951–2008 published by the Carbon Dioxide Information Analysis
Center (Boden et al. (2009)). For per capita GDP, we adopt the GDP data from Maddison
(2003), which is transformed to 1990 Geary-Khamis dollars. We performed a unit root
test to ensure the data are nonstationary, and we observed that all Dickey-Fuller test
statistics were greater than the critical value. Therefore, we do not reject the hypothesis
that there is a unit root. The results are presented in the Supplementary Materials of
Figure 3: Plots of log(CO2) against log(GDP) for Belgium, Denmark and France.
21
this paper (Section 11).
We estimate the parameters by minimizing the error function given in (2) using the
nls() function in the statistical software R. Next, we calculate the confidence interval
by finding the critical values of the limit distribution of (αn, β1n, β2n) given in (29). Note
that most of the quantities in (29) involve only the standard Brownian motion or integral
of Brownian motions, which can be easily obtained by direct simulations. However, there
exist two nuisance parameters, including the covariance matrix ∆ of Brownian motions
(W,U) and the autocovariances σξu of linear processes {ξt, ut}. Thus we have to replace
the unknown parameters with their consistent estimates and adopt the empirical critical
values instead of the real ones. Using the linear process estimation procedures described
in Chang et al. (2001), we have φ, ψ as the estimators of φ, ψ, and εt and νt as the
artificial error sequences. The parameters ∆ and σξu can then be estimated by
∆ =
(n−1φ2
∑nt=1 ε
2t n−1φψ
∑nt=1 εtνt
n−1φψ∑n
t=1 εtνt n−1ψ2∑n
t=1 ν2t
), and σξu =
l∑j=1
n−1
n−j∑t=1
ξtut+j,
for any l = o(n1/4). See Ibragimov and Phillips (2008), Phillips and Solo (1992) and
Phillips and Perron (1988).
Figure 4: Estimates and 95% Confidence Intervals of β1n.
Tables 5 and 6 in the Supplementary Materials of this paper report the estimates
and 95% confidence intervals of parameters β1 and β2, and Figures 4 and 5 here present
the plots of the results. For the parameter β1, we can not find any group with more
than four groups of countries whose confidence intervals overlap. Such result is consistent
with that of Piaggio and Padilla (2012). However, for the parameter β2n, among the 13
22
countries, we can find group of 11 countries whose confidence intervals overlap. Hence,
the homogeneity of parameters can not be rejected as in Piaggio and Padilla (2012).
There are two main reasons for the differences between our results and theirs. Firstly,
they assume the innovation is i.i.d. normally distributed while we allow the error to be
serially correlated with itself. Secondly, we incorporate the effect of endogeneity in our
limit distribution of the estimators. Based on (29), both reasons will make the critical
values and confidence intervals significantly larger than that of Piaggio and Padilla (2012).
Similarly, we may construct the estimate and confidence interval for α. See Table 4 and
Figure 14 in Supplementary Materials of this paper for this kind of result.
7 Conclusion and discussion
In this paper, we establish an asymptotic theory for a nonlinear parametric cointegrating
regression model. A general framework is developed for establishing the weak consis-
tency of the NLS estimator θn. The framework can easily be applied to a wide class of
nonstationary regressors, including partial sums of linear processes and Harris recurrent
Markov chains. Limit distribution of θn is also established, which extends previous works.
Furthermore, we introduce endogeneity to our model by allowing the error term to be
serially dependent on itself, and cross-dependent on the regressor. We show that the limit
distribution of θn under the endogeneity situation is different from that with martingale
error structure. This result is of interest in the applied econometric research area.
Asymptotics in this paper are limited to univariate regressors and nonlinear para-
metric cointegrating regression models. Specification of the nonlinear regression function
f depends usually on the underlying theories of the subject. Illustration can be found
in Bae et al. (2004), where the concept of the money in the utility function with the
constant elasticity of substitution (MUFCES) is used to relate the money balance with
nominal interest rate by the link function f(x, α, β) = α + β log( x1+x
). More currently,
nonparametric method has been developed to test the specification of f . See, e.g., Wang
and Phillips (2012), for instance. Extension to multivariate regressors require substantial
different techniques. This is because local time theory can not in general be extended to
multivariate data, in which the techniques play a key rule in the asymptotics of the NLS
estimator with nonstationary regressor when f is an integrable function. Possible exten-
sion to multivariate regressors is the index model discussed in Chang and Park (2003).
This again requires some new techniques, and hence we leave it for future work.
23
8 Proofs of main results
This section provides proofs of the main results. We start with some preliminaries, which
list the limit theorems that are commonly used in the proofs of the main results.
8.1 Preliminaries
Denote Nδ(θ0) = {θ : ‖θ − θ0‖ < δ}, where θ0 ∈ Θ is fixed.
Lemma 8.1. Let Assumption 2.1 hold. Then, for any xt satisfying (4) with T being given
as in Assumption 2.1,
supθ∈Nδ(θ0)
κ−2n
n∑t=1
(|f(xt, θ)− f(xt, θ0)|+ |f(xt, θ)− f(xt, θ0)|2
)→P 0, (30)
as n→∞ first and then δ → 0. If in addition Assumption 2.2, then
κ−2n
n∑t=1
[f(xt, θ0)− f(xt, π0)]ut →P 0, (31)
for any θ0, π0 ∈ Θ, and
supθ∈Nδ(θ0)
κ−2n
n∑t=1
|f(xt, θ)− f(xt, θ0)| |ut| →P 0, (32)
as n→∞ first and then δ → 0.
Proof. (30) is simple. As κ−2n
∑nt=1[f(xt, θ0)− f(xt, π0)]2 ≤ Cκ−2
n
∑nt=1 T
2(xt) = OP (1),
(31) follows from Lemma 2 of Lai and Wei (1982). As for (32), the result follows from
n∑t=1
|f(xt, θ)− f(xt, θ0)| |ut| ≤ h(‖θ − θ0‖)n∑t=1
T (xt) |ut|,
and∑n
t=1 T (xt) |ut| ≤ C∑n
t=1 T (xt) +∑n
t=1 T (xt)[|ut| − E(|ut| | Ft−1)
]= OP (κ2
n).
Lemma 8.2. Let Assumption 3.1 (i) hold. For any bounded g(x) satisfying∫∞−∞ |g(x)|dx <
∞, we haven∑t=1
g(xt) = OP (n/dn). (33)
If in addition Assumption 3.1 (ii)–(iii), then, for any bounded gi(x), i = 1, 2, satisfying∫∞−∞ |gi(x)|dx <∞ and
∫∞−∞ gi(x)dx 6= 0,{(dn
n
)1/2n∑t=1
g1(xt)ut,dnn
n∑t=1
g2(xt)}→D
{τ1N L
1/2G (1, 0), τ2 LG(1, 0)
}, (34)
where τ 21 =
∫∞−∞ g
21(s)ds, τ2 =
∫∞−∞ g2(s)ds and N is a standard normal variate indepen-
dent of G(t).
24
Proof. The result (33) sees Lemma 3.2 of Wang and Phillips (2009b). Theorem 2.2 of
Wang (2013) provides (34) with g2(x) = g21(x). It is not difficult to see that (34) still
holds for general g2(x). We omit the details.
Lemma 8.3. Let xt be defined as in Example 2. For any g such that∫∞−∞ |g(x)|π(dx) <∞
and∫∞−∞ g(x)π(dx) 6= 0, we have
n∑t=1
g(xt) = OP [a(n)], (35)
[ n∑t=1
g(xt)]−1
= OP [a(n)−1]. (36)
If Assumption 3.1* holds, then, for any bounded gi(x), i = 1, 2, satisfying∫∞−∞ |gi(x)|π(dx) <
∞ and∫∞−∞ gi(x)π(dx) 6= 0,
{a(n)−1/2
n∑t=1
g1(xt)ut, a(n)−1
n∑t=1
g2(xt)}→D
{τ5N Π
1/2β , τ6 Πβ
}, (37)
where τ 25 =
∫∞−∞ g
21(s)π(ds), τ6 =
∫∞−∞ g2(s)π(ds), Πβ is defined as in Theorem 3.3 and N
is a standard normal variate independent of Πβ.
Proof. See the Supplemental Material of this paper.
Lemma 8.4. Under Assumption 4.1, for any bounded gi(x), i = 1, 2, satisfying∫∞−∞ |gi(x)|dx <
∞ and∫∞−∞ gi(x)dx 6= 0, we have
n∑t=1
g1(xt)|ut| = OP (n/dn), (38)
and {(n/dn)−1/2
∑nt=1 g1(xt)ut, (n/dn)−1
∑nt=1 g2(xt)
}→D
{τ3N L
1/2G (1, 0), τ4LG(1, 0)
}, (39)
where τ 23 = (2π)−1
∫∞−∞ g1(µ)2[Eu2
0 + 2∑∞
r=1E(u0ure−iµxr)] dµ, g1(µ) =
∫∞−∞ e
iµxg1(x)dx
and τ4 =∫∞−∞ g2(s)ds.
Proof. See Theorem 3 and 5 of Jeganathan (2008).
25
Lemma 8.5. Under Assumption 4.2, for any twice continuous differentiable function
gi(x), i = 1, 2, with that |g′i(x)| is locally bounded (i.e., bounded on any compact set), we
have { 1√n
n∑t=1
g1
( xt√n
)ut,
1
n
n∑t=1
g2
( xt√n
)}→D
{σξu
∫ 1
0
g′1(W (t))dt+
∫ 1
0
g1(W (t))dU(t),
∫ 1
0
g2(W (t))dt}, (40)
as n→∞, where σξu =∑∞
j=0E(ξ0uj), and (W (t), U(t)) is a bivariate Brownian motion
with covariance matrix
∆ =
(φ2Eε20 φψEε0ν0
φψEε0ν0 ψ2Eν20
).
Consequently, under the conditions of Theorem 4.2, we have
n∑t=1
(f(xt, π0)− f(xt, θ0))ut = oP[n v(√n)]. (41)
Proof. The result (40) sees Theorem 4.3 of Ibragimov and Phillips (2008) with minor
improvements. To see (41), recall max1≤t≤n |xt|/√n = OP (1). Without loss of generality,
we assume max1≤t≤n |xt|/√n ≤ K0 for some K0 > 0. It follows from Assumption 2.4 that
n∑t=1
(f(xt, π0)− f(xt, θ0))ut
= v(√n)
n∑t=1
g(xt/√n, π0, θ0)ut + oP (v(
√n))
n∑t=1
T1(xt/√n)|ut|
= oP [nv(√n)],
as required.
Lemma 8.6. Under Assumptions 3.1 (i) and 3.2, we have
dnn
supθ∈Nδ(θ0)
n∑t=1
∣∣fi(xt, θ) fj(xt, θ)− fi(xt, θ0) fj(xt, θ0)∣∣ →P 0, (42)
dnn
supθ∈Nδ(θ0)
n∑t=1
∣∣fij(xt, θ) [f(xt, θ)− f(xt, θ0)]∣∣ →P 0, (43)
as n → ∞ first and then δ → 0, for any 1 ≤ i, j ≤ m. If in addition Assumption 3.1
(ii)–(iii), then
dnn
supθ∈Nδ(θ0)
∣∣ n∑t=1
fij(xt, θ)ut∣∣→P 0, (44)
26
as n→∞ first and then δ → 0, for any 1 ≤ i, j ≤ m.
Similarly, under Assumptions 3.1* (i) and 3.2*, (42) and (43) are true if we replace
dn/n by a(n)−1. If in addition Assumption 3.1* (ii), then (44) holds if we replace dn/n
by a(n)−1.
Proof. See the Supplemental Material of this paper.
Lemma 8.7. Under Assumptions 3.3 and 3.4, we have
1
nvi(dn)vj(dn)sup
θ∈Nδ(θ0)
n∑t=1
∣∣fi(xt, θ) fj(xt, θ)− fi(xt, θ0) fj(xt, θ0)∣∣ →P 0, (45)
1
nv(dn)vij(dn)sup
θ∈Nδ(θ0)
n∑t=1
∣∣fij(xt, θ) [f(xt, θ)− f(xt, θ0)]∣∣ →P 0, (46)
1
nvij(dn)sup
θ∈Nδ(θ0)
∣∣ n∑t=1
fij(xt, θ)ut∣∣ →P 0, (47)
as n→∞ first and then δ → 0, for any 1 ≤ i, j ≤ m.
Proof. See the Supplemental Material of this paper.
Lemma 8.8. Let Assumption 3.3 hold. Then, for any regular functions g(x, θ) and
g1(x, θ) on Θ, we have{ 1√n
n∑t=1
g(xtdn, θ)ut,
1
n
n∑t=1
g1
(xtdn, θ)}
→D
{∫ 1
0
g(G(t), θ
)dU(t),
∫ 1
0
g1
(G(t), θ
)dt}. (48)
Proof. If g(x, θ) and g1(x, θ) are continuous functions, the result follows from (20) and the
continuous mapping theorem. See, Kurtz and Protter (1991) for instance. The extension
from continuous function to regular function is standard in literature. The details can be
found in PP or Park and Phillips (1999).
Lemma 8.9. Let Dn(θ, θ0) = Qn(θ)−Qn(θ0). Suppose that, for any δ > 0,
lim infn→∞
inf|θ−θ0|≥δ
Dn(θ, θ0) > 0 in probability, (49)
then θn →P θ0.
Proof. See Lemma 1 of Wu (1981).
27
8.2 Proofs of theorems
Proof of Theorem 2.1. Let N be any open subset of Θ containing θ0. Since θn is
the minimizer of Qn(θ) over θ ∈ Θ, by Lemma 8.9, proving consistency is equivalent to
showing that, for any 0 < η < 1/3 and θ 6= θ0, where θ, θ0 ∈ Θ, there exist n0 > 0 and
M1 > 0 such that
P(
infθ∈Θ∩N c
Dn(θ, θ0) ≥ κ2n/M1
)≥ 1− 3η, (50)
for all n > n0.
Denote Nδ(π0) = {θ : ‖θ − π0‖ < δ}. Since Θ ∩ N c is compact, by the finite covering
property of compact set, (50) will follow if we prove that, for any fixed π0 ∈ Θ ∩N c,
In(δ, π0) := supθ∈Nδ(π0)
κ−2n
∣∣∣Dn(θ, θ0)−Dn(π0, θ0)∣∣∣→P 0, (51)
as n→∞ first and then δ → 0, and for ∀η > 0, there exist M0 > 0 and n0 > 0 such that
for all n ≥ n0 and M ≥M0,
P(Dn(π0, θ0) ≥ κ2
n/M)≥ 1− 2η. (52)
Indeed, due to (51), for any 0 < η < 1/3 and M1 > 0, there exist n0 > 0 and δ0 > 0 such
that
P ( max1≤j≤m0
In(δ0, πj) ≥ 1/2M1) ≤ η,
where m0 and πj, 1 ≤ j ≤ m0, are chosen so that Θ ∩ N c ⊂⋃m0
j=1Nδ0(πj). Consequently,
by taking M1 ≥M0/(2m0), it follows from (52) that
P(
infθ∈Θ∩N c
Dn(θ, θ0) ≥ κ2n/M1
)≥ P
(inf
1≤j≤m0
Dn(πj, θ0) ≥ κ2n/(2M1)
)− η
≥ inf1≤j≤m0
P(Dn(πj, θ0) ≥ κ2
n/(2m0M1))− η ≥ 1− 3η,
which yields (50).
We next prove (51) and (52). The result (51) is simple, which follows immediately
from Lemma 8.1 and the fact that, for each fixed π0 ∈ Θ ∩N c,
In(δ, π0) ≤ supθ∈Nδ(π0)
κ−2n
n∑t=1
(f(xt, θ)− f(xt, π0))2
+ supθ∈Nδ(π0)
κ−2n
n∑t=1
|f(xt, θ)− f(xt, π0)||ut|.
28
To prove (52), we recall
Dn(π0, θ0) =n∑t=1
(f(xt, π0)− f(xt, θ0))2 −n∑t=1
(f(xt, π0)− f(xt, θ0))ut.
This, together with (5) and (31), implies that, for any η > 0, there exists n0 > 0 and
M0 > 0 such that for all n > n0 and M ≥M0,
P(Dn(π0, θ0) ≥ κ2
nM−1)≥ P
( n∑t=1
(f(xt, π0)− f(xt, θ0))2 ≥ κ2nM
−1/2)− η
≥ 1− 2η,
as required.
Finally, by noting Qn(θ0) = n−1∑n
t=1 u2t →P σ2 due to Assumption 2.2 and strong
law of large number, it follows from the consistency of θn and (51) that
|σ2n − σ2| ≤ Cκ−2
n |Qn(θn)−Qn(θ0)|+ oP (1) = oP (1).
Proof of Theorem 2.2. (4) follows from (33) of Lemma 8.2. (5) follows from (34) of
Lemma 8.2 with g2(x) = (f(x, θ)− f(x, θ0))2 and the facts that P (LG(1, 0) > 0) = 1 and∫∞−∞(f(s, θ)− f(s, θ0))2ds > 0, for any θ 6= θ0.
Proof of Theorem 2.3. (4) follows from (35) of Lemma 8.3 with g(x) = T (x) + T 2(x).
(5) follows from (36) of Lemma 8.3 with g(x) = (f(x, θ) − f(x, θ0))2 and the facts that∫∞−∞(f(s, θ)− f(s, θ0))2π(ds) > 0, for any θ 6= θ0.
Proof of Theorem 2.4. Recalling T (λx) ≤ v(λ)T1(x), we have
1
nv2(dn)
n∑t=1
[T (xt) + T 2(xt)] ≤ 1
n
n∑t=1
[T1
(xtdn
)+ T 2
1
(xtdn
)]→D
∫ 1
0
[T1(G(t)) + T 21 (G(t))]dt, (53)
where the convergence in distribution comes from Berkes and Horvath (2006). This proves
(4) with κn =√nv(dn) due to the local integrability of T1 + T 2
1 . To prove (5), by letting
G(λ, x) := f(λx, θ)− f(λx, θ0)− v(λ)g(x, θ, θ0), we have
1
nv2(dn)
n∑t=1
(f(xt, θ)− f(xt, θ0))2
=1
nv2(dn)
n∑t=1
[G(dn,
xtdn
)+ v(dn)g
(xtdn, θ, θ0
)]2
≥ 1
n
n∑t=1
g2(xtdn, θ, θ0
)− |Λn|, (54)
29
where Λn = 2nv(dn)
∑nt=1G
(dn,
xtdn
)g(xtdn, θ, θ0
). The similar arguments as in the proof of
(53) yield that
1
n
n∑t=1
g2(xtdn, θ, θ0
)→D Z :=
∫ 1
0
g2(G(s), θ, θ0)ds, (55)
where P (0 < Z < ∞) = 1 due to Assumption 2.4. On the other hand, it follows from
(10) that
1
nv2(dn)
n∑t=1
G2(dn,
xtdn
)≤ o(1)
nv2(dn)
n∑t=1
T 2(xt) ≤o(1)
n
n∑t=1
T 21 (xt/dn) = oP (1).
Now (5) follows from (54), (55) and the fact that
Λn ≤ 2∣∣∣ 1
nv2(dn)
n∑t=1
G2(dn,
xtdn
)∣∣∣1/2 ∣∣∣ 1n
n∑t=1
g2(xtdn, θ, θ0
)∣∣∣1/2 →P 0.
Proof of Theorem 2.5. See the Supplemental Material of this paper.
Proof of Theorem 3.1. It is readily seen that
Qn(θ0) = −n∑t=1
f(xt, θ0)(yt − f(xt, θ0)) = −n∑t=1
f(xt, θ0)ut,
Qn(θ) = −n∑t=1
f(xt, θ)f(xt, θ)′ −
n∑t=1
f(xt, θ)(yt − f(xt, θ)),
for any θ ∈ Θ. It follows from the conditions (i)–(iii) that
supθ:‖Dn(θ−θ0)‖≤kn
‖ (D−1n )′
[Qn(θ)− Qn(θ0)
]D−1n ‖= oP (1).
By using (iii) and (iv), we have (D−1n )′ Qn(θ0) = OP (1) and (D−1
n )′ Qn(θ0)D−1n →D M.
As M > 0, a.s., λmin[(D−1n )′ Qn(θ0)D−1
n ] ≥ ηn with probability that goes to one for some
ηn > 0, where λmin(A) denotes the smallest eigenvalue of A and ηn → 0 can be chosen
as slowly as required. Due to these facts, a modification of Lemma 1 in Andrews and
Sun (2004) implies (13). Note that (iv)’ implies (iv). The result Dn(θn − θ0)→D M−1Z
follows from (13) and the continuing mapping theorem.
Proof of Theorem 3.2. It suffices to verify the conditions (i)–(iii) and (iv)’ of Theorem
3.1 with
Dn =√n/dn I, Z = Σ1/2 NL
1/2G (1, 0), M = ΣLG(1, 0),
30
where I is an identity matrix, under Assumptions 3.1 and 3.2. In fact, as n/dn →∞, we
may take kn → ∞ such that θ : ||Dn(θ − θ0)|| ≤ kn falls in Nδ(θ0) = {θ : ||θ − θ0|| < δ}.This, together with Lemma 8.6, yields (i)–(iii).
On the other hand, it follows from Lemma 8.2 with g1(x) = α′3f(x, θ0) and g2(x) =
α′1f(x, θ0) f(x, θ0)′α2 that, for any αi = (αi1, ..., αim) ∈ Rm, i = 1, 2, 3,{α′3 Zn, α
′1Yn α2
}={(dn
n
)1/2n∑t=1
α′3 f(xt, θ0)ut,dnn
n∑t=1
α′1f(xt, θ0) f(xt, θ0)′α2
}→D
{τ N L
1/2G (1, 0), α′1M α2
}=D
{α′3 Z, α
′1M α2
}, (56)
where τ 2 =∫∞−∞[α′3f(s, θ0)]2ds, N is a standard normal random variable independent of
G(t) and we have used the fact that
τ N =(∫ ∞−∞
[α′3f(s, θ0)]2ds)1/2
N
=D α′3
(∫ ∞−∞
f(s, θ0)f(s, θ0)′ds)1/2
N = α′3 Σ1/2 N.
This proves (iv)’.
Proof of Theorem 3.3. As in the proof of Theorem 3.2, it suffices to verify the condi-
tions (i)–(iii) and (iv)’ of Theorem 3.1 with
Dn = a(n)I, Z = Σ1/2π NΠβ, M = Σπ Πβ,
under Assumptions 3.1* and 3.2*. The details are similar to that of Theorem 3.2, with
Lemma 8.2 replaced by Lemma 8.3, and hence are omitted.
Proof of Theorem 3.4. It suffices to verify the conditions (i)–(iii) and (iv)’ of Theorem
3.1 with Dn = diag(√nv1(dn), ...,
√nvm(dn)),
Z =
∫ 1
0
h(G(t), θ0)dU(t), M =
∫ 1
0
Ψ(t)Ψ(t)′dt,
under Assumptions 3.3 and 3.4. Recalling sup1≤i,j≤m |v(dn) vij(dn)
vi(dn) vj(dn)| < ∞, it follows from
(46) that
1
nvi(dn)vj(dn)sup
θ∈Nδ(θ0)
n∑t=1
∣∣fij(xt, θ)[f(xt, θ)− f(xt, θ0)]∣∣
≤ C
nvij(dn)v(dn)sup
θ∈Nδ(θ0)
n∑t=1
∣∣fij(xt, θ)[f(xt, θ)− f(xt, θ0)]∣∣ = oP (1),
31
for any 1 ≤ i, j ≤ m. On the other hand, as√nvi(dn) → ∞ for all 1 ≤ i ≤ m, we may
take kn →∞ such that θ : ||Dn(θ− θ0)|| ≤ kn falls in Nδ(θ0) = {θ : ||θ− θ0|| < δ}. These
facts imply that
supθ:‖Dn(θ−θ0)‖≤kn
‖ (D−1n )′
n∑t=1
f(xt, θ)[f(xt, θ)− f(xt, θ0)
]D−1n ‖= oP (1).
This proves the required (ii). The proofs of (i) and (iii) are similar, we omit the details.
Finally (iv)’ follows from Lemma 8.8 with
g(x, θ0) = α′3h(x, θ0) and g1(x, θ0) = α′1h(x, θ0)h(x, θ0)′α2.
Proofs of Theorems 4.1–4.3. See the Supplemental Material of this paper.
REFERENCES
Andrews, D. W. K. and Sun, Y., 2004. Adaptive local polynomial Whittle estimation of
long-range dependence. Econometrica, 72, 569-614.
Boden, T. A., Marland, G. and Andres, R. J., 2009. Global, Regional, and National
Fossil-Fuel CO2 Emissions. Carbon Dioxide Information Analysis Center, Oak Ridge
National Laboratory, U.S. Department of Energy, Oak Ridge, Tenn., U.S.A..
Bae, Y. and de Jong, R., 2007. Money demand function estimation by nonlinear cointe-
gration. Journal of Applied Econometrics, 22(4), 767–793.
Bae, Y. Kakkar, V. and Ogaki, M., 2004. Money demand in Japan and the liquidity trap.
ohio State University Department of Economics Working Paper #04–06.
Bae, Y., Kakkar, V., and Ogaki, M., 2006. Money demand in Japan and nonlinear
cointegration. Journal of Money, Credit, and Banking, 38(6), 1659–1667.
Bates, D. M. and Watts, D. G., 1988. Nonlinear Regression Analysis and Its Applications.
analysis, 60(4), 856865.
Berkes, I. and Horvath, L., 2006. Convergence of integral functionals of stochastic pro-
cesses. Econometric Theory, 22(2), 304–322.
Chang, Y. and Park, J. Y., 2011. Endogeneity in nonlinear regressions with integrated
time series, Econometric Review, 30, 51–87.
Chang, Y. and Park, J. Y., 2003. Index models with integrated time series. Journal of
Econometrics, 114, 73–106.
32
Chang, Y., Park, J. Y. and Phillips, P. C. B., 2001. Nonlinear econometric models with
cointegrated and deterministically trending regressors. Econometric Journal, 4, 1–36.
Chen, X., 1999. How Often Does a Harris Recurrent Markov Chain Recur? The Annals
of Probability, 27, 1324–1346.
Choi, I. and Saikkonen, P., 2010. Tests for nonlinear cointegration. Econometric Theory,
26, 682–709.
Dijkgraaf, E. and Vollebergh, H. R., 2005. A test for parameter homogeneity in CO2
panel EKC estimations. Environmental and Resource Economics, 32(2), 229–239.
de Jong, R., 2002. Nonlinear Regression with Integrated Regressors but Without Exo-
geneity, Mimeograph, Department of Economics, Michigan State University.
Feller, W., 1971. Introduction to Probability Theory and Its Applications, vol. II, 2nd
ed. Wiley.
Gao, J. K., Maxwell, Lu, Z. and Tjøstheim, D., 2009. Specification testing in nonlinear
and nonstationary time series autoregression. The Annals of Statistics, 37, 3893–3928.
Granger, C. W. J. and Terasvirta, T., 1993. Modelling Nonlinear Economic Relationships.
Oxford University Press, New York.
Hall, P. and Heyde, C. C., 1980. Martingale limit theory and its application. Probability
and Mathematical Statistics. Academic Press, Inc.
Hansen, B. E., 1992. Covergence to stochastic integrals for dependent heterogeneous
processes. Econometric Theory, 8, 489–500.
Ibragimov, R. and Phillips, P. C. B., 2008. Regression asymptotics using martingale
convergence methods Econometric Theory, 24, 888–947.
Jeganathan, P., 2008. Limit theorems for functional sums that converge to fractional
Brownian and stable motions. Cowles Foundation Discussion Paper No. 1649, Cowles
Foundation for Research in Economics, Yale University.
Karlsen, H. A. and Tjøstheim, D., 2001. Nonparametric estimation in null recurrent time
series. The Annals of Statistics, 29, 372–416.
Kurtz, T. G. and Protter, P., 1991. Weak limit theorems for stochastic integrals and
stochastic differential equations. The Annals of Probability, 19, 1035–1070.
Lai, T. L., 1994. Asymptotic theory of nonlinear least squares estimations. The Annals
of Statistics, 22, 1917–1930.
Lai, T. L. and Wei, C. Z., 1982. Least squares estimates in stochastic regression models
with applications to identification and control of dynamic systems, The Annals of
33
Statistics, 10, 154–166.
List, J. and Gallet, C., 1999. The environmental Kuznets curve: does one size fits all?
Ecological Economics 31, 409–423.
Maddison, A., 2003. The World Economy: Historical Statistics, OECD.
Muller-Furstenberger, G. and Wagner, M., 2007. Exploring the environmental Kuznets
hypothesis: Theoretical and econometric problems. Ecological Economics, 62(3), 648–
660.
Nummelin, E., 1984. General Irreducible Markov Chains and Non-negative Operators.
Cambridge University Press.
Orey, S., 1971. Limit Theorems for Markov Chain Transition Probabilities. Van Nostrand
Reinhold, London.
Piaggio, M. and Padilla, E., 2012. CO2 emissions and economic activity: Heterogeneity
across countries and non-stationary series. Energy policy, 46, 370–381.
Park, J. Y. and Phillips, P. C. B., 1999. Asymptotics for nonlinear transformation of
integrated time series. Econometric Theory, 15, 269–298.
Park, J. Y. and Phillips, P. C. B., 2001. Nonlinear regressions with integrated time series.
Econometrica, 69, 117–161.
Phillips, P. C. B., 2001. Descriptive econometrics for non-stationary time series with
empirical illustrations. Cowles Foundation discuss paper No. 1023.
Phillips, P. C. B. and Perron, P., 1988. Testing for a unit root in time series regression.
Biometrika, 75(2), 335–346.
Phillips, P. C. B. and Solo, V., 1992. Asymptotics for linear processes. The Annals of
Statistics, 971–1001.
Terasvirta, T., Tjøstheim, D. and Granger, C. W. J., 2011. Modelling Nonlinear Economic
Time Series, Advanced Texts in Econometrics.
Shi, X. and Phillips, P. C. B., 2012. Nonlinear cointegrating regression under weak
identification. Econometric Theory, 28, 509–547.
Skouras, K., 2000. Strong consistency in nonlinear stochastic regression models. The
Annals of Statistics, 28, 871–879.
Wagner, M., 2008. The carbon Kuznets curve: A cloudy picture emitted by bad econo-
metrics? Resource and Energy Economics, 30(3), 388–408.
Wang, Q., 2013. Martingale limit theorems revisited and non-linear cointegrating regres-
sion. Econometric Theory, forthcoming.
34
Wang, Q., Lin, Y. X. and Gulati, C. M., 2003. Asymptotics for general fractionally
integrated processes with applications to unit root tests. Econometric Theory, 19,
143–164.
Wang, Q. and Phillips, P. C. B., 2009. Asymptotic theory for local time density estimation
and nonparametric cointegrating regression. Econometric Theory, 25, 710–738.
Wang, Q. and Phillips, P. C. B., 2009. Structural nonparametric cointegrating regression.
Econometrica, 77, 1901–1948.
Wang, Q. and Phillips, P. C. B., 2012. A specification test for nonlinear nonstationary
models. The Annals of Statistics, 40, 727-758.
Wu, C. F., 1981. Asymptotic theory of nonlinear least squares estimations. The Annals
of Statistics, 9, 501–513.
35
Figure 5: Estimates and 95% Confidence Intervals of β2n.
36