nonlinear regressions with nonstationary time series · 2014-03-21 · nonlinear regressions with...

Nonlinear regressions with nonstationary time series

Nigel Chan and Qiying Wang∗

The University of Sydney

March 20, 2014

Abstract

This paper develops asymptotic theory for a nonlinear parametric cointegratingregression model. We establish a general framework for weak consistency that iseasy to apply for various nonstationary time series, including partial sums of linearprocesses and Harris recurrent Markov chains. We provide limit distributions fornonlinear least squares estimators, extending the previous works. We also introduceendogeneity to the model by allowing the error to be serially dependent on and crosscorrelated with the regressors.

JEL Classification: C13, C22.

Key words and phrases: Cointegration, nonlinear regressions, consistency, limit distribu-

tion, nonstationarity, nonlinearity, endogeneity.

1 Introduction

The past few decades have witnessed significant developments in cointegration analy-

sis. In particular, extensive researches have focused on cointegration models with linear

structure. Although it gives data researchers convenience in implementation, the linear

structure is often too restrictive. In particular, nonlinear responses with some unknown

parameters often arise in the context of economics. For empirical examples, we refer to

Granger and Terasvirta (1993) as well as Terasvirta et al. (2011). In this situation, it is

expected that nonlinear cointegration captures the features of many long-run relationships

in a more realistic manner.

∗The author thanks to the Editor, Associate Editor and three referees for their very helpful commentson the original version. The authors also acknowledge research support from Australian Research Council.Address correspondence to Qiying Wang: School of Mathematics and Statistics, University of Sydney,NSW 2006, Australia. E-mail: [email protected]

1

A typical nonlinear parametric cointegrating regression model has the form

yt = f(xt, θ0) + ut, t = 1, ..., n, (1)

where f : R×Rm → R is a known nonlinear function, xt and ut are regressor and regression

errors, and θ0 is an m-dimensional true parameter vector that lies in the parameter set

Θ. With the observed nonstationary data {yt, xt}nt=1, this paper is concerned with the

nonlinear least squares (NLS) estimation of the unknown parameters θ0 ∈ Θ. In this

regard, Park and Phillips (2001) (PP henceforth) considered xt to be an integrated, I(1),

process. Based on PP framework, Chang et al. (2001) introduced an additional linear time

trend term and stationary regressors into model (1). Chang and Park (2003) extended to

nonlinear index models driven by integrated processes. More recently, Choi and Saikkonen

(2010), Gao et al. (2009) and Wang and Phillips (2012) developed statistical tests for

the existence of a nonlinear cointegrating relation. Park and Chang (2011) allowed the

regressors xt to be contemporaneously correlated with the regression errors ut and Shi

and Phillips (2012) extended the model (1) by incorporating a loading coefficient.

The present paper has a similar goal to the previously mentioned papers but offers

more general results, which have some advantages for empirical studies. First of all, we

establish a general framework for weak consistency of the NLS estimator θn, allowing for

the xt to be a wider class of nonstationary time series. The set of sufficient conditions

is easy to apply for various nonstationary regressors, including partial sums of linear

processes and recurrent Markov chains. Furthermore, we provide limit distributions for

the NLS estimator θn. It deserves to mention that the routine employed in this paper

to establish the limit distributions of θn is different from those used in the previous

works, e.g. Park and Phillips (2001). Roughly speaking, our routine is related to joint

distributional convergence of a martingale under target and its conditional variance, rather

than using classical martingale limit theorem which requires establishing the convergence

in probability for the conditional variance. In nonlinear cointegrating regressions, there

are some advantages for our methodology since it is usually difficult to establish the

convergence in probability for the conditional variance, in particular, in the situation that

the regressor xt is a nonstationary time series. Second, in addition to the commonly used

martingale innovation structure, our model allows for serial dependence in the equilibrium

errors ut and the innovations driving xt. It is important as our model permits joint

determination of xt and yt, and hence the system is a time series structural model. Under

such situation, the weak consistency and limit distribution of the NLS estimator θn are

2

also established.

This paper is organized as follow. Section 2 presents our main results on weak con-

sistency of the NLS estimator θn. Theorem 2.1 provides a general framework. Its ap-

plications to integrable and non-integrable f are given in Theorems 2.2–2.5, respectively.

Section 3 investigates the limit distributions of θn in which the model (1) has a martingale

structure. Extension to endogeneity is presented in Section 4. As mentioned above, our

routine establishing the limit distribution of θn is different from previous works. Section

5 performs simulation, reporting and discussing the numerical values of means and stan-

dard errors which provide the evidence of accuracies of our NLS estimator. Section 6

presents an empirical example, providing a link between our theory and real applications.

The model of interest is the carbon Kuznets curve relating the per capita CO2 emissions

and per capita GDP. Endogeneity occurs in this example due to potential misreporting of

GDP, omitted variable bias and reverse causality. Section 7 concludes the paper. Section

8 provides partial technical proofs. Full details of the technical proofs can be found in

the Supplemental Material of this paper, where we also provide a unit root test for our

empirical example, and other details for simulation.

Throughout the paper, we denote constants by C,C1, C2, ... which may be different at

each appearance. For a vector x = (x1, ..., xm), assume that ‖x‖ = (x21 + ...+ x2

m)1/2, and

for a matrix A, the norm operator ‖·‖ is defined by ‖A‖ = supx:‖x‖=1 ‖xA‖. Furthermore,

the parameter set Θ ⊂ Rm is assumed to be compact and convex, and the true parameter

vector θ0 is an interior point of Θ.

2 Weak consistency

This section considers the estimation of the unknown parameters θ0 in model (1) by

NLS. Let Qn(θ) =∑n

t=1(yt − f(xt, θ))2. The NLS estimator θn of θ0 is defined to be the

minimizer of Qn(θ) over θ ∈ Θ, that is,

θn = arg minθ∈ΘQn(θ), (2)

and the error estimator is defined by σ2n = n−1

∑nt=1 u

2t , where ut = yt − f(xt, θn). To

investigate the weak consistency for the NLS estimator θn, this section assumes the regres-

sion model (1) to have a martingale structure. In this situation, our sufficient conditions

are closely related to those of Wu (1981), Lai (1994) and Skouras (2000), intending to

provide a general framework. In comparison to the papers mentioned, our assumptions

3

are easy to apply, particularly in nonlinear cointegrating regression situation as stated in

the two examples below. Extension to endogeneity between xt and ut is investigated in

Section 4.

2.1 A framework

We make use of the following assumptions for the development of weak consistency.

Assumption 2.1. For each π, π0 ∈ Θ, there exists a real function T : R→ R such that

|f(x, π)− f(x, π0)| ≤ h(||π − π0||)T (x), (3)

where h(x) is a bounded real function such that h(x) ↓ h(0) = 0, as x ↓ 0.

Assumption 2.2. (i) {ut,Ft, 1 ≤ t ≤ n} is a martingale difference sequence satisfying

E(|ut|2|Ft−1) = σ2 and sup1≤t≤nE(|ut|2q|Ft−1) <∞ a.s., where q > 1; and

(ii) xt is adapted to Ft−1, t = 1, ..., n.

Assumption 2.3. There exists an increasing sequence 0 < κn →∞ such that

κ−2n

n∑t=1

[T (xt) + T 2(xt)] = OP (1), (4)

and for any 0 < η < 1 and θ 6= θ0, where θ, θ0 ∈ Θ, there exist n0 > 0 and M1 > 0 such

that

P( n∑t=1

(f(xt, θ)− f(xt, θ0))2 ≥ κ2n /M1

)≥ 1− η, (5)

for all n > n0.

Theorem 2.1. Under Assumptions 2.1–2.3, the NLS estimator θn is a consistent esti-

mator of θ0, i.e. θn →P θ0. If in addition κ2nn−1 = O(1), then σ2

n →P σ2, as n→∞.

Assumptions 2.1 and 2.2 are the same as those used in Skouras (2000), which are

standard in the NLS estimation theory. Also see Wu (1981) and Lai (1994). Assumption

2.3 is used to replace (3.8), (3.9) and (3.11) in Skouras (2000), in which some uniform

conditions are used. In comparison to Skouras (2000), our Assumption 2.3 is related to

the conditions on the regressor xt and is more natural and easy to apply. In particular,

it is directly applicable in the situation that T is integrable and the regressor xt is a

nonstationary time series, as stated in the following sub-section.

4

2.2 Assumption 2.3: integrable functions

Due to Assumption 2.1, f(x, θ)− f(x, θ0) is integrable in x if T is an integrable function.

This class of functions includes f(x, θ1, θ2) = θ1|x|θ2I(x ∈ [a, b]), where a and b are finite

constants, the Gaussian function f(x, θ) = e−θx2, the Laplacian function f(x, θ) = e−θ|x|,

the logistic regression function f(x, θ) = eθ|x|/(1 + eθ|x|), etc. In this sub-section, two

commonly used non-stationary regressors xt are shown to satisfy Assumption 2.3 if T is

integrable.

Example 1 (Partial sums of linear processes). Let xt =∑t

j=1 ξj, where {ξj, j ≥ 1}is a linear process defined by

ξj =∞∑k=0

φk εj−k, (6)

where {εj,−∞ < j <∞} is a sequence of i.i.d. random variables with Eε0 = 0, Eε20 = 1

and the characteristic function ϕ(t) of ε0 satisfying∫∞−∞ |ϕ(t)|dt <∞. The coefficients φk

are assumed to satisfy one of the following conditions:

C1. φk ∼ k−µρ(k), where 1/2 < µ < 1 and ρ(k) is a function slowly varying at ∞.

C2.∑∞

k=0 |φk| <∞ and φ ≡∑∞

k=0 φk 6= 0.

Put d2n = Ex2

n. As in Wang, Lin and Gulati (2003), we have

d2n = Ex2

n ∼

{cµn

3−2µρ2(n), under C1,

φ2n, under C2,(7)

where cµ = 1/((1− µ)(3− 2µ))∫∞

0x−µ(x+ 1)−µdx. We have the following result.

Theorem 2.2. Suppose xt is defined as in Example 1 and Assumption 2.1 holds. Assume:

(i) T is bounded and integrable, and

(ii)∫∞−∞(f(s, θ)− f(s, θ0))2ds > 0 for all θ 6= θ0.

Then (4) and (5) hold with κ2n = n/dn. Consequently, if in addition Assumption 2.2, then

θn →P θ0.

Theorem 2.2 improves Theorem 4.1 of PP in two folds. Firstly, we allow for more

general regressor. The result under C1 is new, which allows xt to be long memory

process, including the fractionally integrated process as an example. PP only allow xt to

satisfy C2 with additional conditions on φk, that is, they require xt to be a partial sums

of a short memory process. Secondly, we remove the part (b) required in the definition

of an I-regular function given in their Definition 3.3. Furthermore, we allow for certain

5

non-integrable f such as the logistic regression function f(x, θ) = eθ|x|/(1 + eθ|x|), θ ≥ θ0

for some θ0 > 0, although we assume the integrability of T .

Example 2 (Recurrent Markov Chain). Let {xk}k≥0 be a Harris recurrent Markov

chain with state space (E, E), transition probability P (x,A) and invariant measure π. We

denote Pµ for the Markovian probability with the initial distribution µ, Eµ for correspon-

dent expectation and P k(x,A) for the k-step transition of {xk}k≥0. A subset D of E with

0 < π(D) <∞ is called D-set of {xk}k≥0 if for any A ∈ E+,

supx∈E

Ex( τA∑k=1

ID(xk))<∞,

where E+ = {A ∈ E : π(A) > 0} and τA = inf{n ≥ 1 : xn ∈ A}. By Theorem 6.2 of Orey

(1971), D-sets not only exist, but generate the entire sigma E , and for any D-sets C,D

and any probability measure ν, µ on (E, E),

limn→∞

n∑k=1

νP k(C)/n∑k=1

µP k(D) =π(C)

π(D), (8)

where νP k(D) =∫∞−∞ P

k(x,D)ν(dx). See Nummelin (1984) for instance.

Let a D-set D and a probability measure ν on (E, E) be fixed. Define

a(t) = π−1(D)

[t]∑k=1

νP k(D), t ≥ 0.

By recurrence, a(t) → ∞. Here and below, we set the state space to be the real space,

that is (E, E) = (R,R). We have the following result.

Theorem 2.3. Suppose xt is defined as in Example 2 and Assumption 2.1 holds. Assume:

(i) T is bounded and∫∞−∞ |T (x)|π(dx) <∞, and

(ii)∫∞−∞(f(s, θ)− f(s, θ0))2π(ds) > 0 for all θ 6= θ0.

Then (4) and (5) hold with κ2n = a(n). Consequently, if in addition Assumption 2.2, then

θn →P θ0.

Theorem 2.3 seems to be new to the literature. By virtue of (8), the asymptotic order

of a(t) depends only on {xk}k≥0. It is interesting to notice that Theorem 2.3 does not

impose the β-regular condition as commonly used in the literature. The Harris recurrent

Markov chain {xk}k≥0 is called β-regular if

limλ→∞

a(λt)/a(λ) = tβ, ∀t > 0, (9)

where 0 < β ≤ 1. See Chen (1999) for instance.

6

2.3 Assumption 2.3: beyond integrable functions

As noticed in Section 2.2, Assumption 2.3 holds for certain non-integrable f although we

require the integrability of T . Assumption 2.3 is more involved for general non-integrable

f when T is also non-integrable. In the latter situation, we require certain relationships

between f and T .

Assumption 2.4. (i) There exists a regular function1 g(x, θ, θ0) on Θ satisfying∫|s|≤δ

g2(s, θ, θ0)ds > 0

for all θ 6= θ0 and δ > 0, a real function T : R→ R and a positive real function v(λ)

which is bounded away from zero as λ→∞, such that for any bounded x,

supθ,θ0∈Θ

|f(λx, θ)− f(λx, θ0)− v(λ) g(x, θ, θ0)|/T (λx) = o(1), (10)

as λ→∞.

(ii) There exists a real function T1 : R → R such that T (λx) ≤ v(λ)T1(x) as |λx| → ∞and T1 + T 2

1 is locally integrable (i.e. integrable on any compact set).

Theorem 2.4. Suppose Assumption 2.4 holds. Suppose that there exists a continuous

Gaussian process G(t) such that x[nt],n ⇒ G(t), on D[0, 1], where xi,n = xi/dn and

0 < dn → ∞ is a sequence of real numbers. Then (4) and (5) hold with κ2n = nv2(dn).

Consequently, if in addition Assumptions 2.1–2.2 and assumption that the T s in Assump-

tions 2.1 and 2.4 coincide, then θn →P θ0.

Assumptions 2.1 and 2.4 are quite general, including many commonly used regression

functions. Typical examples include f(x, θ) = (x + θ)2, θex/(1 + ex), θ log |x|, θ|x|α

(α is fixed) and θ0 + θ1|x| + ... + θk|x|k. The class of functions satisfying Assumptions

2.1 and 2.4 are similar to, but wider than those imposed in Theorem 4.2 of PP. For

instance, Assumptions 2.1 and 2.4 (hence Theorem 2.4) are applicable for the function

f(x, θ) = (x + θ)2 [with T (x) = T1(x) = |x|, v(λ) = λ and g(x, θ, θ0) = 2(θ − θ0)x], but

Theorem 4.2 of PP is not directly applicable for this function. See, e.g., Example 4.1 (c)

of PP. We also mention that our condition on the regressor xt is much more general than

1Function H is called regular on Θ if (a) for all θ ∈ Θ, there exist for each ε > 0 continuous functionsHε, Hε, and a constant δε > 0 such that Hε(x, θ) ≤ H(y, θ) ≤ Hε(x, θ) for all |x − y| < δε on K, acompact set of R, and such that

∫K

(Hε − Hε)(x, θ)dx → 0 as ε → 0, and (b) for all x ∈ R, H(x, ·) isequicontinuous in a neighborhood of x.

7

that of PP, as we only require that x[nt]/dn converges weakly to a continuous Gaussian

process. This kind of weak convergence condition is very likely close to be necessary.

We next consider a class of asymptotically homogeneous functions. In this regard, we

follow PP. Let f : R×Θ→ R have the structure:

f(λx, θ) = v(λ, θ)h(x, θ) + b(λ, θ)A(x, θ)B(λx, θ), (11)

where supθ∈Θ |b(λ, θ) v−1(λ, θ)| → 0, as λ → ∞; supθ∈Θ |A(x, θ)| is locally bounded, that

is, bounded on bounded intervals; supθ∈Θ |B(λx, θ)| is bounded on R; h(x, θ) is regular on

Θ satisfying∫|s|≤δ h

2(s, θ)ds > 0 for all θ 6= θ0 and δ > 0, and v(λ, θ) satisfy: there exist

ε > 0 and a neighborhood N of θ such that as λ→∞

inf|p−p|<ε|q−q|<ε

infθ∈N|pv(λ, θ)− qv(λ, θ0)| → ∞, (12)

for any θ 6= θ0 and p, q > 0.

Theorem 2.5. Suppose that f in model (1) has the structure (11), and in addition to

Assumption 2.2, there exists a continuous Gaussian process G(t) such that x[nt],n ⇒ G(t)

on D[0, 1], where xi,n = xi/dn and 0 < dn →∞ is a sequence of real numbers. Then, the

NLS estimator θn defined by (2) is a consistent estimator of θ0, i.e. θn →P θ0.

The conditions on f given in Theorem 2.5 are the same as those used in Theorem 4.3 of

PP, which are satisfied by the functions such as the Box-Cox transformation (|x|θ − 1)/θ.

The difference between current Theorem 2.5 and Theorem 4.3 of PP is that we only

require that x[nt]/dn converges weakly to a continuous Gaussian process, which is close to

be necessary.

To investigate the weak consistency of θn, this section makes use of three different

sets of conditions on f : Assumption 2.1 with T being integrable (Theorems 2.2–2.3);

Assumptions 2.1 and 2.4 (Theorem 2.4) and the condition (11) (Theorem 2.5). We remark

that these condition sets are mutually exclusive. There are examples of the function f

such that one of these assumptions holds, but not other two. For instance, the function

f(x, θ) = θ/(1 + x2) satisfies Assumption 2.1 with h(x) = x and T (x) = 1/(1 + x2),

but Assumption 2.4 and the condition (11) fail; the function f(x, θ) = (x + θ)2 satisfies

Assumptions 2.1 and 2.4, but it does not satisfy condition (11).

8

3 Limit distribution

This section considers limit distribution of θn. In what follows, let Qn and Qn be the

first and second derivatives of Qn(θ) in the usual way, that is, Qn = ∂Qn/∂θ and Qn =

∂2Qn/∂θ∂θ′. Similarly we define f and f . We assume these quantities exist whenever they

are introduced. The following result comes from an application of Lemma 1 in Andrews

and Sun (2004), which provides a framework and plays a key part in our development on

limit distribution of θn.

Theorem 3.1. There exists a sequence of m ×m nonrandom nonsingular matrices Dn

with ‖ D−1n ‖→ 0, as n→∞, such that

(i) supθ:‖Dn(θ−θ0)‖≤kn ‖ (D−1n )′

∑nt=1

[f(xt, θ)f(xt, θ)

′−f(xt, θ0)f(xt, θ0)′]D−1n ‖= oP (1),

(ii) supθ:‖Dn(θ−θ0)‖≤kn ‖ (D−1n )′

∑nt=1 f(xt, θ)

[f(xt, θ)− f(xt, θ0)

]D−1n ‖= oP (1),

(iii) supθ:‖Dn(θ−θ0)‖≤kn ‖ (D−1n )′

∑nt=1 f(xt, θ)utD

−1n ‖= oP (1),

(iv) Yn := (D−1n )′

∑nt=1 f(xt, θ0)f(xt, θ0)′D−1

n →D M , where M > 0, a.s., and

Zn := (D−1n )′

n∑t=1

f(xt, θ0)ut = OP (1),

for some sequence of constants {kn, n ≥ 1} for which kn → ∞, as n → ∞. Then, there

exists a sequence of estimators {θn, n ≥ 1} satisfying Qn(θn) = 0 with probability that

goes to one and

Dn(θn − θ0) = Y −1n Zn + oP (1). (13)

If we replace (iv) by the following (iv)’, then Dn(θn − θ0)→D M−1 Z.

(iv)’ for any α′i = (αi1, ..., αim) ∈ Rm, i = 1, 2, 3,

(α′1 Ynα2, α′3Zn) →D (α′1M α2, α′3 Z),

where M > 0, a.s. and P (Z <∞) = 1.

Our routine in establishing the limit distribution of θn is essentially different from that

of PP, and is particularly convenient in the situation that f , f and f all are integrable in

which Dn = κn I for some 0 < κn → ∞, where I is an identity matrix. See next section

for examples. The condition AD3 of PP requires to show Yn →P M instead of (iv). It is

equivalent to say that, under the PP’s routine, one requires to show (at least under an

enlarged probability space)

1√n

n∑t=1

g(xt)→P

∫ ∞−∞

g(s)dsLW (1, 0), (14)

9

where xt is an integrated process, g is a real integrable function and LW (t, s) is the

local time of the standard Brownian Motion W (t)2. The convergence in probability is

usually hard or impossible to establish without enlarging the probability space. Our

routine essentially reduces the convergence in probability to less restrictive convergence

in distribution. Explicitly, in comparison to (14), we only need to show that

1√n

n∑t=1

g(xt)→D

∫ ∞−∞

g(s)dsLW (1, 0). (15)

This allows to extend the nonlinear regressions to a much wider class of nonstationary

regressor data series, and enables our methodology and proofs straightforward and neat.

3.1 Limit distribution: integrable functions

We now establish our results on convergence in distribution for the θn, which mainly

involves settling the conditions (i)–(iii) and (iv)’ in Theorem 3.1. First consider the

situation that f is an integrable function, together with some additional conditions on xt

and ut.

Assumption 3.1. (i) xt is defined as in Example 1, that is, xt =∑t

j=1 ξj, where ξj

satisfies (6);

(ii) Fk is a sequence of increasing σ-fields such that εk ∈ Fk and εk+1 is independent of

Fk for all k ≥ 1, and εk ∈ F1 for all k ≤ 0;

(iii) {uk,Fk}k≥1 forms a martingale difference satisfying maxk≥m |E(u2k+1 | Fk) − 1| → 0

a.s., and for some δ > 0, maxk≥1E(|uk+1|2+δ | Fk) <∞.

Assumption 3.2. Let p(x, θ) be one of f , fi and fij, 1 ≤ i, j ≤ m.

(i) p(x, θ0) is a bounded and integrable real function;

(ii) Σ =∫∞−∞ f(s, θ0)f(s, θ0)′ds > 0 and

∫∞−∞(f(s, θ)− f(s, θ0))2ds > 0 for all θ 6= θ0;

(iii) There exists a bounded and integrable function Tp : R → R such that |p(x, θ) −p(x, θ0)| ≤ hp(||θ − θ0||)Tp(x), for each θ, θ0 ∈ Θ, where hp(x) is a bounded real

function such that hp(x) ↓ hp(0) = 0, as x ↓ 0.

2Here and below, the process {Lζ(t, s), t ≥ 0, s ∈ R} is said to be the local time of a measurableprocess {ζ(t), t ≥ 0} if, for any locally integrable function T (x),∫ t

0

T [ζ(s)]ds =

∫ ∞−∞

T (s)Lζ(t, s)ds, all t ∈ R,

with probability one.

10

Theorem 3.2. Under Assumptions 3.1 and 3.2, we have√n/dn(θn − θ0)→D Σ−1/2 NL

−1/2G (1, 0), (16)

where N is a standard normal random vector, which is independent of G(t) defined by

G(t) =

{W3/2−µ(t), under C1,

W (t), under C2.(17)

Here and below, Wβ(t) denotes the fractional Brownian motion with 0 < β < 1 on

D[0, 1], defined as follows:

Wβ(t) =1

A(β)

[ ∫ 0

−∞

[(t− s)β−1/2 − (−s)β−1/2

]dW (s) +

∫ t

0

(t− s)β−1/2dW (s)],

where W (s) is a standard Brownian motion and

A(β) =( 1

2β+

∫ ∞0

[(1 + s)β−1/2 − sβ−1/2

]2

ds)1/2

.

Remark 3.1. Theorem 3.2 improves Theorem 5.1 of PP by allowing xt to be a long

memory process, which includes fractional integrated processes as an example.

Using the same routine and slight modifications of assumptions, we have the following

limit distribution when xt is a β-regular Harris recurrent Markov chain.

Assumption 3.1*. (i) xt is defined as in Example 2 satisfying (9), that is, xt is a β-

regular Harris recurrent Markov chain;

(ii) E(uk | Fnk) = 0 for any 1 ≤ k ≤ n, n ≥ 1, maxk≥m |E(u2k+1 | Fnk)− 1| → 0 a.s. and

for some δ > 0, maxk≥1E(|uk+1|2+δ | Fnk) <∞, where Fnk = σ(Fk, x1, ..., xn).

Assumption 3.2*. Let p(x, θ) be one of f , fi and fij, 1 ≤ i, j ≤ m.

(i) p(x, θ0) is a bounded real function with∫∞∞|p(x, θ0)|π(dx) <∞;

(ii) Σπ =∫∞−∞ f(s, θ0)f(s, θ0)′π(ds) > 0 and

∫∞−∞(f(s, θ) − f(s, θ0))2π(ds) > 0 for all

θ 6= θ0;

(iii) There exists a bounded function Tp : R → R with∫∞−∞ Tp(x)π(dx) < ∞ such that

|p(x, θ)− p(x, θ0)| ≤ hp(||θ− θ0||)Tp(x), for each θ, θ0 ∈ Θ, where hp(x) is a bounded

real function such that hp(x) ↓ hp(0) = 0, as x ↓ 0.

11

Theorem 3.3. Under Assumptions 3.1* and 3.2*, we have√a(n)(θn − θ0)→D Σ−1/2

π NΠ−1/2β , (18)

where N is a standard normal random vector, which is independent of Πβ, and for β = 1,

Πβ = 1, and for 0 < β < 1, Π−ββ is a stable random variable with Laplace transform

E exp{−tΠ−ββ } = exp{− tβ

Γ(β + 1)

}, t ≥ 0. (19)

Remark 3.2. Theorem 3.3 is new, even for the stationary case (β = 1) in which a(n) = n

and the NLS estimator θn converges to a normal variate. The random variable Πβ in

the limit distribution, after scaled by a factor of Γ(β + 1)−1, is a Mittag-Leffler random

variable with parameter β, which is closely related to a stable random variable. For details

regarding the properties of this distribution, see page 453 of Feller (1971) or Theorem 3.2

of Karlsen and Tjøstheim (2001).

Remark 3.3. Assumption 3.1* (ii) imposes a strong orthogonal property between the

regressor xt and the error sequence ut. It is not clear at the moment whether such

condition can be relaxed to a less restrictive one where xt is adapted to Fnt and xt+1 is

independent of Fnt for all 1 ≤ t ≤ n. We leave this for future research.

Remark 3.4. Unlike linear cointegration, integrable regression functions do not propor-

tionately transfer the effect of outliers of the nonstationary regressor to the response.

This is useful when practitioners want to lessen the impact of extreme values of xt on the

response. In addition, such functions arise in macro-economics when the dependent vari-

able has an uneven response to the regressor. Empirical examples include central banks’

market intervention and the currency exchange target zones, described in Phillips (2001).

Remark 3.5. The convergence rates in (16) and (18) are reduced by the nonstationarity

property of the regressor xt when f is an integrable function. For the partial sums of linear

processes in Theorem 3.2, comparing to the stationary situation where the convergence

rate is n1/2, the rate is reduced to n3/4−µ/2ρ1/2(n) for the long memory case, and is further

reduced to n1/4 for the short memory case. For the β-regular Harris recurrent Markov

chain, note that by (9), the asymptotic order a(n) is regularly varying at infinity, i.e., there

exists a slowly varying function ρ(n) such that a(n) ∼ nβρ(n). Therefore, the convergence

rate of the NLS estimator decreases as the regularity index β of the chain decreases from

one (stationary situation) to zero.

12

3.2 Limit distribution: beyond integrable functions

We next consider the limit distribution of θn when the regression function f is non-

integrable. Unlike PP and Chang et al. (2001), we use a more general regressor xt in the

development of our asymptotic distribution.

Assumption 3.3. (i) {ut,Ft, 1 ≤ t ≤ n} is a martingale difference sequence satisfying

E(|ut|2|Ft−1) = σ2 and sup1≤t≤nE(|ut|2q|Ft−1) <∞ a.s., where q > 1;

(ii) xt is adapted to Ft−1, t = 1, ..., n, max1≤t≤n |xt|/dn = OP (1), where d2n = var(xn) →

∞; and

(iii) There exists a vector of continuous Gaussian processes (G,U) such that

(x[nt],n, n−1/2

[nt]∑i=1

ui)→D (G(t), U(t)), (20)

on DR2 [0, 1], where xt,n = xt/dn.

Assumption 3.4. Let p(x, θ) be one of f , fi and fij, 1 ≤ i, j ≤ m. There exists a real

function Tp : R→ R such that

(i) |p(x, θ)− p(x, θ0)| ≤ Ap(||θ− θ0||)Tp(x), for each θ, θ0 ∈ Θ, where Ap(x) is a bounded

real function satisfying Ap(x) ↓ Ap(0) = 0, as x ↓ 0;

(ii) For any bounded x, supθ∈Θ |p(λx, θ) − vp(λ)hp(x, θ)|/Tp(λx) = o(1), as λ → ∞,

where hp(x, θ) on Θ is a regular function and vp(λ) is a positive real function which

is bounded away from zero as λ→∞; and

(iii) Tp(λx) ≤ vp(λ)T1p(x) as |λx| → ∞, where T1p : R → R is a real function such that

T1p + T 21p is locally integrable (i.e. integrable on any compact set).

For notation convenience, let v(t) = vf (t), h(x, θ) = hf (x, θ), vi(t) = vfi(t) and

v(t) = (v1(t), ..., vm(t)). Similarly, we define v(t), h(x, θ) and h(x, θ). We have the

following main result.

Theorem 3.4. Suppose Assumptions 3.3 and 3.4 hold. Further assume that

sup1≤i,j≤m |v(dn) vij(dn)

vi(dn) vj(dn)| <∞ and

∫|s|≤δ h(s, θ0)h(s, θ0)′ds > 0 for all δ > 0. Then we have

Dn (θn − θ0)→D

(∫ 1

0

Ψ(t)Ψ(t)′dt)−1

∫ 1

0

Ψ(t) dU(t), (21)

on D[0, 1], as n→∞, where Ψ(t) = h(G(t), θ0) and Dn = diag(√nv1(dn), ...,

√nvm(dn)).

13

Remark 3.6. Except the joint convergence, other conditions in establishing Theorem

3.4 are similar to Theorem 5.2 of PP. PP made use of the concept of H0-regular. Our

conditions on f , fi and fij, 1 ≤ i, j ≤ m are more straightforward and easy to identify.

Furthermore, the joint convergence assumption under present paper is quite natural when

f, f and f all satisfy Assumption 3.4 (ii). Indeed, under fi(λx, θ) ∼ vi(λ)hi(x, θ), one can

easily obtain the following asymptotics under our joint convergence.

D−1n

n∑t=1

f(xt, θ0)ut →D

∫ 1

0

h(G(t), θ0)dU(t), (22)

on D[0, 1], which is required in the proof of (21). See, e.g., Kurtz and Protter (1991) and

Hansen (1992).

Remark 3.7. If we further assume that U(t) and G(t) are asymptotic independent, that

is, the long run relationship between the regressor sequence xt and the innovative sequence

ut vanishes asymptotically, the limiting distribution in (21) will become mixed normal.

In particular, when U(t) is a standard Wiener process, we have

Dn (θn − θ0)→D

(∫ 1

0

Ψ(t)Ψ(t)′dt)−1/2

N, (23)

where N is a standard normal random vector.

Remark 3.8. Nonlinear cointegrating regressions with structure described in Theorem

3.4 are useful to modelling money demand functions. In such cases, yt is the logarithm of

the real money balance, xt is the nominal interest rate, and f can either be f(x, α, β) =

α + β log |x| or f(x, α, β) = α + β log(1+|x||x| ). See Bae and de Jong (2007) and Bae et

al. (2006) for empirical studies investigating the estimation of money demand functions

in USA and Japan respectively. Also, see Bae et al. (2004) for the derivation of these

functional forms from the underlying money demand theories studied in macro-economics.

Remark 3.9. Another example is the Michaelis–Menton model, which was considered

by Lai (1994). In such case, the regression function is f(x, θ1, θ2) = θ1/(θ2 + x)I{x ≥ 0}.Bates and Watts (1988) employs such model to investigate the dependence of rate of

enzymatic reactions on the concentration of substrate.

Remark 3.10. Unlike in the situation where f is integrable, there is super convergence

when f has the structure given in Assumption 3.4 and xt is nonstationary. The conver-

gence rate is given by Dn, whose elements√nvi(dn) are all faster than the standard rate

√n in the stationary situation.

14

We finally remark that, if f is an asymptotically homogeneous function having the

structure (11), it is also possible to extend PP’s result from a short memory linear process

to general nonstationary time series as described in Assumption 3.3. The idea of proof

for such result can be easily generalized from Theorem 5.3 of PP. Also see Chang et al.

(2001). As it only involves slight modification of notation, we omit the details.

4 Extension to endogeneity

Assumption 2.2 ensures the model (1) having a martingale structure. The result in this

regard is now well known. However, there is little work on allowing for contemporaneous

correlation between the regressors and regression errors. Using a nonparametric approach,

Wang and Phillips (2009b) considered kernel estimate of f and allowed the equation error

ut to be serially dependent and cross correlated with xs for |t− s| < m0, thereby inducing

endogeneity in the regressor, i.e., cov(ut, xt) 6= 0. In relation to present paper de Jong

(2002) considered model (1) without exogeneity, and assuming certain mixing conditions.

Chang and Park (2011) considered a simple prototypical model, where the regressor and

regression error are driven by i.i.d. innovations. This section provides some extensions

to these works. Our main results show that there is an essential effect of endogeneity on

the asymptotics, introducing the bias in the related limit distribution. Full investigation

in this regard requires new asymptotic results, which will be left for future work.

4.1 Asymptotics: integrable functions

We first consider the situation that f satisfies Assumption 2.1 with T being integrable. Let

ηi ≡ (εi, νi), i ∈ Z be a sequence of i.i.d. random vectors satisfying Eη0 = 0 and E‖η0‖2q <

∞, where q ≥ 1. Let the characteristic function ϕ(t) of ε0 satisfy∫∞−∞ |ϕ(t)|2dt <∞ and∫∞

−∞ |t|3|ϕ(t)|mdt < ∞ for some m > 0. As noticed in Remark 4 of Jeganathan (2008),

the conditions on the characteristics function ϕ(t) are not very restrictive.

Assumption 4.1. (i) xt =∑t

j=1 ξj, where ξj =∑∞

k=0 φkεj−k and the coefficients φk

satisfies (i)∑∞

k=0 k|φk| <∞ and∑∞

k=0 φk 6= 0 or (ii) φk ∼ k−µρ(k), where 1/2 < µ < 1

and ρ(k) is a function slowly varying at ∞.

(ii) ut =∑∞

k=0 ψk νt−k, where the coefficients ψk satisfies that∑∞

k=0 k2|ψk|2 < ∞, ψ ≡∑∞

k=0 ψk 6= 0 and∑∞

k=0 |ψk|max{1, |φk|} <∞ where φk =∑k

i=0 φi.

As we do not impose the independence between εk and νk, Assumption 4.1 provides

15

the endogeneity in the model (1), and it is much general than the conditions set given in

Chang and Park (2011). We have the following result.

Theorem 4.1. Suppose Assumptions 2.1 and 4.1 hold. Further assume:

(i) T is bounded and integrable, and

(ii)∫∞−∞(f(s, θ)− f(s, θ0))2ds > 0 for all θ 6= θ0.

Then the NLS estimator θn defined by (2) is a consistent estimator of θ0, i.e. θn →P θ0.

If in addition Assumption 3.2, then, as n→∞,√n/dn(θn − θ0)→D Σ−1 Λ1/2 NL

−1/2G (1, 0), (24)

where f(µ) =∫eiµxf(x, θ0)dx,

Λ = (2π)−1∫ f(µ)f(µ)′[Eu2

0 + 2∑∞

r=1E(u0ure−iµxr)] dµ, (25)

and other notations are given as in Theorem 3.1.

4.2 Asymptotics: beyond integrable functions

This section considers the situation that f satisfies Assumption 2.1 with T being non-

integrable. Let again ηi ≡ (εi, νi), i ∈ Z be a sequence of i.i.d. random vectors with

Eη0 = 0 and E‖η0‖2q < ∞, where q > 2. We make use of the following assumption in

related to the model (1).

Assumption 4.2. (i) xt =∑t

j=1 ξj, where ξj =∑∞

k=0 φkεj−k with φ =∑∞

k=0 φk 6= 0 and∑∞k=0 |φk| <∞;

(ii) ut =∑∞

k=0 ψk νt−k, where the coefficients ψk are assumed to satisfy∑∞

k=0 k|ψk| <∞,

ψ ≡∑∞

k=0 ψk 6= 0 and∑∞

k=0 k|ψk||φk| <∞.

Theorem 4.2. Suppose that Assumption 4.2 holds and in addition to Assumption 2.4, for

any θ 6= θ0, g(x, θ, θ0) is twice differentiable with that g′(x, θ, θ0) is locally bounded (i.e.,

bounded on any compact set). Then the NLS estimator θn defined by (2) is a consistent

estimator of θ0, i.e. θn →P θ0.

Define hx(x, θ) = ∂h(x,θ)∂x

.

Theorem 4.3. In addition to the conditions of Theorem 4.2, Assumptions 3.4 holds.

Further assume that:

16

(i) sup1≤i,j≤m |v(√n) vij(

√n)

vi(√n) vj(

√n)| <∞ and

∫|s|≤δ h(s, θ0)h(s, θ0)′ds > 0 for all δ > 0, and

(ii) hx(x, θ) is locally bounded (i.e., bounded on any compact set).

Then the limit distribution of θn is given by

Dn(θn − θ0)→D

(∫ 1

0

Ψ(t)Ψ(t)′dt)−1[

σξu

∫ 1

0

Ψ(t) dt+

∫ 1

0

Ψ(t) dU(t)], (26)

as n → ∞, where Ψ(t) = h(W (t), θ0), Ψ(t) = hx(W (t), θ0), σξu =∑∞

j=0E(ξ0uj), Dn =

diag(√nv1(√n), ...,

√nvm(

√n)) and (W (t), U(t)) is a bivariate Brownian motion with

covariance matrix

∆ =

(φ2Eε20 φψEε0ν0

φψEε0ν0 ψ2Eν20

).

5 Simulation

In this section, we investigate the finite sample performance of the NLS estimator θn of

nonlinear regressions with endogeneity. Chang and Park (2011) performed simulation of

similar model, but only considered the error structure ut to be i.i.d. innovation. We intend

to investigate the sampling behavior of θn under different degree of serially dependence

of ut on itself. To this end, we generate our data in the following way:

xt = xt−1 + εt,

vt =√

1− ρ2wt + ρεt,

f1(x, θ) = exp{−θ|x|} and f2(x, α, β1, β2) = α + β1x+ β2x2, (27)

where {wt} and {εt} are i.i.d. N(0, 32) variables, and ρ is the correlation coefficient that

controls the degree of endogeneity. The true value of θ is set as 0.1 and that of (α, β1, β2)

is set as (−20, 10, 0.1). The error structure is generated according to the following three

scenarios:

S1: ut = vt,

S2: ut =∑∞

j=1 j−10 vt−j+1, and

S3: ut =∑∞

j=1 j−4 vt−j+1.

Scenario S1 is considered by Chang and Park (2011), which eliminates the case that

ut is serially correlated. Scenarios S2 and S3 introduce self-dependence to the error

sequence. The decay rate of S2 is faster than that of S3, which implies that the level of

self dependence of ut increases from S1 to S3.

17

It is obvious that f1 is an integrable function and f2 satisfies the conditions in Theorem

4.3, with

v(λ, α, β1, β2) = λ2 and h(x, α, β1, β2) = β2x2.

In addition, we have

h(x, α, β1, β2) = (1, x, x2)′ and hx(x, α, β1, β2) = (0, 1, 2x)′.

It would be convenient for later discussion to write out the limit distributions of the

estimators for the f2 case. By Theorem 4.3, we have n1/2(αn − α0)

n(β1n − β10)

n3/2(β2n − β20)

→D Σ−1

U(1)

σξu +∫ 1

0W (t)dU(t)

2σξu∫ 1

0W (t)dt+

∫ 1

0W 2(t)dU(t)

, (28)

where σξu =∑∞

j=0E(ε0uj) and

Σ =

1∫ 1

0W (t)dt

∫ 1

0W 2(t)dt∫ 1

0W (t)dt

∫ 1

0W 2(t)dt

∫ 1

0W 3(t)dt∫ 1

0W 2(t)dt

∫ 1

0W 3(t)dt

∫ 1

0W 4(t)dt

.

In our simulations, we draw samples of sizes n = 200, 500 to estimate the NLS es-

timators and their t-ratios. Each simulation is run for 10,000 times, and their densities

are estimated using kernel method with normal kernel function. It is expected that as ρ

increases, the degree of endogeneity increases, and hence the variance of the distribution

will increase due to (24) and (26). It is also expected that the more serial correlation of

ut, the higher the variance of the limit distribution due to the cross terms appeared in

(25) and (26). Our simulation results largely corroborate with our theoretical results in

Section 4.

Firstly, the means of the estimators are close to the true values, and the deviations

from true values decrease as the sample sizes increase. The reductions of standard errors

from n = 200 to 500 are close to the theoretical values. For example, the ratio of standard

errors for αn between the sample size 200 and 500 is 1.564 in scenario S1, which is close

to theoretical value√

500200

= 1.581. This confirms that our estimators are accurate in all

scenarios. We present all these numerical values in Tables 2 and 3, which are given in the

Supplemental Materials of this paper (Section 12).

Secondly, comparing the S1–S3 curves in each plot, we can see that the S3 curves

have fat tails and lower peak than those of S1–S2. This verifies that high dependence

18

of ut on its own past will increase the variance of limit distribution. On the other hand,

comparing the shapes of the curves for different values of ρ, the curves with ρ = 0

have highest peaks, and the peakedness decrease as ρ increases. This matches with our

expectation that, if the cross dependence between ut and xt increases, the variance of

limit distribution will increase. The following Figure 1 provides the density estimates of

θn with n = 500. Additional figures for density estimates of θn and (αn, β1n, β2n) are given

in the Supplemental Materials of this paper (Figures 6-9).

Figure 1: Density estimates of θn.

Finally, the sampling results for t-ratios are also much expected from our limit theories.

In particular, we have interesting results for the t-ratios of β1n in Figure 2. In scenario S1,

Figure 2: Density of β1n t-ratios.

there is no endogeneity and the limiting t-ratios overlap with the normal curve. However,

as we introduce endogeneity to the model in scenarios S2 and S3, the t-ratios is shifted

away from the standard normal. Such behavior can be explained by looking at the limit

distribution of β1n in (28). When ρ 6= 0, there exists an extra term of Σ−1σξu, and such

term will introduce bias to the limit distribution. More figures for t-ratios are given in

the Supplemental Materials of this paper (Figures 10-13).

19

6 An empirical example: Carbon Kuznets Curve

In this section, we consider a simple example of the model with endogeneity. The equation

of interest is the Carbon Kuznets Curve (CKC) relating the per capita CO2 emission of

a country to its per capita GDP. Basically, the CKC hypothesis states that there is an

inverted-U shaped relationship between economic activities and per capita CO2 emissions.

As it is well accepted that the GDP variable displays some wandering nonstationary be-

haviors over time, such problem is relevant to nonlinear transformations of nonstationary

regressors. Refer to Wagner (2008) and Muller-Furstenberger and Wagner (2007) for

detailed expositions of how such problem relates to the works of PP and Chang et al.

(2001).

The formula that we currently consider has a quadratic formulation and is given by

ln(et) = α + β1 ln(xt) + β2(ln(xt))2 + ut, 1 ≤ t ≤ n, (29)

where et and xt denote the per capital emissions of CO2 and GDP in period t, and ut is a

stochastic error term. Figure 3 presents the plots of logarithm of CO2 versus logarithm of

GDP for Belgium, Denmark and France. The quadratic polynomial specification of CO2

in GDP has been extensively analysed in the literature. See the introduction of Piaggio

and Padilla (2012) for an overiew. Roughly speaking, the upward slope of the curve can

be explained by the increase in natural resources depletion as the economic activities of

grow. As the country continues to develop, technological advance and stricter regulatory

policies will start contributing to a reduction in the emission of air pollutants, hence,

result in an inverted-U shape.

In the usual CKC formulation, the logarithm of per capita CO2 emissions and GDP

are assumed to be integrated processes, see Muller-Furstenberger and Wagner (2007).

Also, it is clear that the nonlinear link function f(x, α, β1, β2) = α + β1x + β2x2 satisfies

the conditions in Theorem 4.3, and is considered in our simulations. There are studies

which employ more complicated models, by including time trend, stochastic terms and

additional explanatory variables such as the energy structure. To avoid complicating our

discussion, we only consider GDP as the only explanatory variable.

In the literature most studies assumed that the parameters (β1, β2) are homogeneous

across different countries. Several works investigated the appropriateness of restricting all

countries adhering to the same values of (β1, β2). For example, List and Gallet (1999), Di-

jkgraaf et al. (2005), Piaggio and Padilla (2012). Particularly, Piaggio and Padilla (2012)

20

tested this assumption by checking whether the confidence intervals of (β1, β2) across dif-

ferent countries overlap. They rejected the homogeneity assumption of parameters across

different countries, based on the result that they could not find any possible group with

more than 4 countries whose confidence intervals overlap with each other.

As an example of employing our NLS estimators θn = (αn, β1n, β2n) and calculating

their confidence intervals, we carry out a similar study and examine whether the 95% con-

fidence intervals of the parameters (β1n, β2n) across different countries overlap with each

other. We extend previous studies by incorporating the endogeneity in the model. The

assumption that xt is an exogenous variable does not usually hold in practise as there are

many potential sources of endogeneity. Firstly, endogeneity occurs due to measurement

error of the explanatory variables xt, including the misreporting of GDP of each country.

Secondly, there might be omitted variable bias, which arises if there exists a relevant

explanatory variable that is correlated with the regressor xt (per capita GDP), but is

excluded from the model. Finally, potential reverse causality will give rise to endogeneity.

That is, the CO2 emi ssions yt might at the same time has a feedback effect on the GDP

xt (e.g., due to stricter government regulation).

We consider 14 countries, which are shown in Piaggio and Padilla (2012) that their

response variables have a quadratic relationship described in (29) with the regressors. We

use annual data in this example and for each country, there are 58 observations of the

CO2 emission data from 1951–2008 published by the Carbon Dioxide Information Analysis

Center (Boden et al. (2009)). For per capita GDP, we adopt the GDP data from Maddison

(2003), which is transformed to 1990 Geary-Khamis dollars. We performed a unit root

test to ensure the data are nonstationary, and we observed that all Dickey-Fuller test

statistics were greater than the critical value. Therefore, we do not reject the hypothesis

that there is a unit root. The results are presented in the Supplementary Materials of

Figure 3: Plots of log(CO2) against log(GDP) for Belgium, Denmark and France.

21

this paper (Section 11).

We estimate the parameters by minimizing the error function given in (2) using the

nls() function in the statistical software R. Next, we calculate the confidence interval

by finding the critical values of the limit distribution of (αn, β1n, β2n) given in (29). Note

that most of the quantities in (29) involve only the standard Brownian motion or integral

of Brownian motions, which can be easily obtained by direct simulations. However, there

exist two nuisance parameters, including the covariance matrix ∆ of Brownian motions

(W,U) and the autocovariances σξu of linear processes {ξt, ut}. Thus we have to replace

the unknown parameters with their consistent estimates and adopt the empirical critical

values instead of the real ones. Using the linear process estimation procedures described

in Chang et al. (2001), we have φ, ψ as the estimators of φ, ψ, and εt and νt as the

artificial error sequences. The parameters ∆ and σξu can then be estimated by

∆ =

(n−1φ2

∑nt=1 ε

2t n−1φψ

∑nt=1 εtνt

n−1φψ∑n

t=1 εtνt n−1ψ2∑n

t=1 ν2t

), and σξu =

l∑j=1

n−1

n−j∑t=1

ξtut+j,

for any l = o(n1/4). See Ibragimov and Phillips (2008), Phillips and Solo (1992) and

Phillips and Perron (1988).

Figure 4: Estimates and 95% Confidence Intervals of β1n.

Tables 5 and 6 in the Supplementary Materials of this paper report the estimates

and 95% confidence intervals of parameters β1 and β2, and Figures 4 and 5 here present

the plots of the results. For the parameter β1, we can not find any group with more

than four groups of countries whose confidence intervals overlap. Such result is consistent

with that of Piaggio and Padilla (2012). However, for the parameter β2n, among the 13

22

countries, we can find group of 11 countries whose confidence intervals overlap. Hence,

the homogeneity of parameters can not be rejected as in Piaggio and Padilla (2012).

There are two main reasons for the differences between our results and theirs. Firstly,

they assume the innovation is i.i.d. normally distributed while we allow the error to be

serially correlated with itself. Secondly, we incorporate the effect of endogeneity in our

limit distribution of the estimators. Based on (29), both reasons will make the critical

values and confidence intervals significantly larger than that of Piaggio and Padilla (2012).

Similarly, we may construct the estimate and confidence interval for α. See Table 4 and

Figure 14 in Supplementary Materials of this paper for this kind of result.

7 Conclusion and discussion

In this paper, we establish an asymptotic theory for a nonlinear parametric cointegrating

regression model. A general framework is developed for establishing the weak consis-

tency of the NLS estimator θn. The framework can easily be applied to a wide class of

nonstationary regressors, including partial sums of linear processes and Harris recurrent

Markov chains. Limit distribution of θn is also established, which extends previous works.

Furthermore, we introduce endogeneity to our model by allowing the error term to be

serially dependent on itself, and cross-dependent on the regressor. We show that the limit

distribution of θn under the endogeneity situation is different from that with martingale

error structure. This result is of interest in the applied econometric research area.

Asymptotics in this paper are limited to univariate regressors and nonlinear para-

metric cointegrating regression models. Specification of the nonlinear regression function

f depends usually on the underlying theories of the subject. Illustration can be found

in Bae et al. (2004), where the concept of the money in the utility function with the

constant elasticity of substitution (MUFCES) is used to relate the money balance with

nominal interest rate by the link function f(x, α, β) = α + β log( x1+x

). More currently,

nonparametric method has been developed to test the specification of f . See, e.g., Wang

and Phillips (2012), for instance. Extension to multivariate regressors require substantial

different techniques. This is because local time theory can not in general be extended to

multivariate data, in which the techniques play a key rule in the asymptotics of the NLS

estimator with nonstationary regressor when f is an integrable function. Possible exten-

sion to multivariate regressors is the index model discussed in Chang and Park (2003).

This again requires some new techniques, and hence we leave it for future work.

23

8 Proofs of main results

This section provides proofs of the main results. We start with some preliminaries, which

list the limit theorems that are commonly used in the proofs of the main results.

8.1 Preliminaries

Denote Nδ(θ0) = {θ : ‖θ − θ0‖ < δ}, where θ0 ∈ Θ is fixed.

Lemma 8.1. Let Assumption 2.1 hold. Then, for any xt satisfying (4) with T being given

as in Assumption 2.1,

supθ∈Nδ(θ0)

κ−2n

n∑t=1

(|f(xt, θ)− f(xt, θ0)|+ |f(xt, θ)− f(xt, θ0)|2

)→P 0, (30)

as n→∞ first and then δ → 0. If in addition Assumption 2.2, then

κ−2n

n∑t=1

[f(xt, θ0)− f(xt, π0)]ut →P 0, (31)

for any θ0, π0 ∈ Θ, and

supθ∈Nδ(θ0)

κ−2n

n∑t=1

|f(xt, θ)− f(xt, θ0)| |ut| →P 0, (32)

as n→∞ first and then δ → 0.

Proof. (30) is simple. As κ−2n

∑nt=1[f(xt, θ0)− f(xt, π0)]2 ≤ Cκ−2

n

∑nt=1 T

2(xt) = OP (1),

(31) follows from Lemma 2 of Lai and Wei (1982). As for (32), the result follows from

n∑t=1

|f(xt, θ)− f(xt, θ0)| |ut| ≤ h(‖θ − θ0‖)n∑t=1

T (xt) |ut|,

and∑n

t=1 T (xt) |ut| ≤ C∑n

t=1 T (xt) +∑n

t=1 T (xt)[|ut| − E(|ut| | Ft−1)

]= OP (κ2

n).

Lemma 8.2. Let Assumption 3.1 (i) hold. For any bounded g(x) satisfying∫∞−∞ |g(x)|dx <

∞, we haven∑t=1

g(xt) = OP (n/dn). (33)

If in addition Assumption 3.1 (ii)–(iii), then, for any bounded gi(x), i = 1, 2, satisfying∫∞−∞ |gi(x)|dx <∞ and

∫∞−∞ gi(x)dx 6= 0,{(dn

n

)1/2n∑t=1

g1(xt)ut,dnn

n∑t=1

g2(xt)}→D

{τ1N L

1/2G (1, 0), τ2 LG(1, 0)

}, (34)

where τ 21 =

∫∞−∞ g

21(s)ds, τ2 =

∫∞−∞ g2(s)ds and N is a standard normal variate indepen-

dent of G(t).

24

Proof. The result (33) sees Lemma 3.2 of Wang and Phillips (2009b). Theorem 2.2 of

Wang (2013) provides (34) with g2(x) = g21(x). It is not difficult to see that (34) still

holds for general g2(x). We omit the details.

Lemma 8.3. Let xt be defined as in Example 2. For any g such that∫∞−∞ |g(x)|π(dx) <∞

and∫∞−∞ g(x)π(dx) 6= 0, we have

n∑t=1

g(xt) = OP [a(n)], (35)

[ n∑t=1

g(xt)]−1

= OP [a(n)−1]. (36)

If Assumption 3.1* holds, then, for any bounded gi(x), i = 1, 2, satisfying∫∞−∞ |gi(x)|π(dx) <

∞ and∫∞−∞ gi(x)π(dx) 6= 0,

{a(n)−1/2

n∑t=1

g1(xt)ut, a(n)−1

n∑t=1

g2(xt)}→D

{τ5N Π

1/2β , τ6 Πβ

}, (37)

where τ 25 =

∫∞−∞ g

21(s)π(ds), τ6 =

∫∞−∞ g2(s)π(ds), Πβ is defined as in Theorem 3.3 and N

is a standard normal variate independent of Πβ.

Proof. See the Supplemental Material of this paper.

Lemma 8.4. Under Assumption 4.1, for any bounded gi(x), i = 1, 2, satisfying∫∞−∞ |gi(x)|dx <

∞ and∫∞−∞ gi(x)dx 6= 0, we have

n∑t=1

g1(xt)|ut| = OP (n/dn), (38)

and {(n/dn)−1/2

∑nt=1 g1(xt)ut, (n/dn)−1

∑nt=1 g2(xt)

}→D

{τ3N L

1/2G (1, 0), τ4LG(1, 0)

}, (39)

where τ 23 = (2π)−1

∫∞−∞ g1(µ)2[Eu2

0 + 2∑∞

r=1E(u0ure−iµxr)] dµ, g1(µ) =

∫∞−∞ e

iµxg1(x)dx

and τ4 =∫∞−∞ g2(s)ds.

Proof. See Theorem 3 and 5 of Jeganathan (2008).

25

Lemma 8.5. Under Assumption 4.2, for any twice continuous differentiable function

gi(x), i = 1, 2, with that |g′i(x)| is locally bounded (i.e., bounded on any compact set), we

have { 1√n

n∑t=1

g1

( xt√n

)ut,

1

n

n∑t=1

g2

( xt√n

)}→D

{σξu

∫ 1

0

g′1(W (t))dt+

∫ 1

0

g1(W (t))dU(t),

∫ 1

0

g2(W (t))dt}, (40)

as n→∞, where σξu =∑∞

j=0E(ξ0uj), and (W (t), U(t)) is a bivariate Brownian motion

with covariance matrix

∆ =

(φ2Eε20 φψEε0ν0

φψEε0ν0 ψ2Eν20

).

Consequently, under the conditions of Theorem 4.2, we have

n∑t=1

(f(xt, π0)− f(xt, θ0))ut = oP[n v(√n)]. (41)

Proof. The result (40) sees Theorem 4.3 of Ibragimov and Phillips (2008) with minor

improvements. To see (41), recall max1≤t≤n |xt|/√n = OP (1). Without loss of generality,

we assume max1≤t≤n |xt|/√n ≤ K0 for some K0 > 0. It follows from Assumption 2.4 that

n∑t=1

(f(xt, π0)− f(xt, θ0))ut

= v(√n)

n∑t=1

g(xt/√n, π0, θ0)ut + oP (v(

√n))

n∑t=1

T1(xt/√n)|ut|

= oP [nv(√n)],

as required.

Lemma 8.6. Under Assumptions 3.1 (i) and 3.2, we have

dnn

supθ∈Nδ(θ0)

n∑t=1

∣∣fi(xt, θ) fj(xt, θ)− fi(xt, θ0) fj(xt, θ0)∣∣ →P 0, (42)

dnn

supθ∈Nδ(θ0)

n∑t=1

∣∣fij(xt, θ) [f(xt, θ)− f(xt, θ0)]∣∣ →P 0, (43)

as n → ∞ first and then δ → 0, for any 1 ≤ i, j ≤ m. If in addition Assumption 3.1

(ii)–(iii), then

dnn

supθ∈Nδ(θ0)

∣∣ n∑t=1

fij(xt, θ)ut∣∣→P 0, (44)

26

as n→∞ first and then δ → 0, for any 1 ≤ i, j ≤ m.

Similarly, under Assumptions 3.1* (i) and 3.2*, (42) and (43) are true if we replace

dn/n by a(n)−1. If in addition Assumption 3.1* (ii), then (44) holds if we replace dn/n

by a(n)−1.


Lemma 8.7. Under Assumptions 3.3 and 3.4, we have

1

nvi(dn)vj(dn)sup

θ∈Nδ(θ0)

n∑t=1

∣∣fi(xt, θ) fj(xt, θ)− fi(xt, θ0) fj(xt, θ0)∣∣ →P 0, (45)

1

nv(dn)vij(dn)sup

θ∈Nδ(θ0)

n∑t=1

∣∣fij(xt, θ) [f(xt, θ)− f(xt, θ0)]∣∣ →P 0, (46)

1

nvij(dn)sup

θ∈Nδ(θ0)

∣∣ n∑t=1

fij(xt, θ)ut∣∣ →P 0, (47)

as n→∞ first and then δ → 0, for any 1 ≤ i, j ≤ m.


Lemma 8.8. Let Assumption 3.3 hold. Then, for any regular functions g(x, θ) and

g1(x, θ) on Θ, we have{ 1√n

n∑t=1

g(xtdn, θ)ut,

1

n

n∑t=1

g1

(xtdn, θ)}

→D

{∫ 1

0

g(G(t), θ

)dU(t),

∫ 1

0

g1

(G(t), θ

)dt}. (48)

Proof. If g(x, θ) and g1(x, θ) are continuous functions, the result follows from (20) and the

continuous mapping theorem. See, Kurtz and Protter (1991) for instance. The extension

from continuous function to regular function is standard in literature. The details can be

found in PP or Park and Phillips (1999).

Lemma 8.9. Let Dn(θ, θ0) = Qn(θ)−Qn(θ0). Suppose that, for any δ > 0,

lim infn→∞

inf|θ−θ0|≥δ

Dn(θ, θ0) > 0 in probability, (49)

then θn →P θ0.

Proof. See Lemma 1 of Wu (1981).

27

8.2 Proofs of theorems

Proof of Theorem 2.1. Let N be any open subset of Θ containing θ0. Since θn is

the minimizer of Qn(θ) over θ ∈ Θ, by Lemma 8.9, proving consistency is equivalent to

showing that, for any 0 < η < 1/3 and θ 6= θ0, where θ, θ0 ∈ Θ, there exist n0 > 0 and

M1 > 0 such that

P(

infθ∈Θ∩N c

Dn(θ, θ0) ≥ κ2n/M1

)≥ 1− 3η, (50)

for all n > n0.

Denote Nδ(π0) = {θ : ‖θ − π0‖ < δ}. Since Θ ∩ N c is compact, by the finite covering

property of compact set, (50) will follow if we prove that, for any fixed π0 ∈ Θ ∩N c,

In(δ, π0) := supθ∈Nδ(π0)

κ−2n

∣∣∣Dn(θ, θ0)−Dn(π0, θ0)∣∣∣→P 0, (51)

as n→∞ first and then δ → 0, and for ∀η > 0, there exist M0 > 0 and n0 > 0 such that

for all n ≥ n0 and M ≥M0,

P(Dn(π0, θ0) ≥ κ2

n/M)≥ 1− 2η. (52)

Indeed, due to (51), for any 0 < η < 1/3 and M1 > 0, there exist n0 > 0 and δ0 > 0 such

that

P ( max1≤j≤m0

In(δ0, πj) ≥ 1/2M1) ≤ η,

where m0 and πj, 1 ≤ j ≤ m0, are chosen so that Θ ∩ N c ⊂⋃m0

j=1Nδ0(πj). Consequently,

by taking M1 ≥M0/(2m0), it follows from (52) that

P(

infθ∈Θ∩N c

Dn(θ, θ0) ≥ κ2n/M1

)≥ P

(inf

1≤j≤m0

Dn(πj, θ0) ≥ κ2n/(2M1)

)− η

≥ inf1≤j≤m0

P(Dn(πj, θ0) ≥ κ2

n/(2m0M1))− η ≥ 1− 3η,

which yields (50).

We next prove (51) and (52). The result (51) is simple, which follows immediately

from Lemma 8.1 and the fact that, for each fixed π0 ∈ Θ ∩N c,

In(δ, π0) ≤ supθ∈Nδ(π0)

κ−2n

n∑t=1

(f(xt, θ)− f(xt, π0))2

+ supθ∈Nδ(π0)

κ−2n

n∑t=1

|f(xt, θ)− f(xt, π0)||ut|.

28

To prove (52), we recall

Dn(π0, θ0) =n∑t=1

(f(xt, π0)− f(xt, θ0))2 −n∑t=1

(f(xt, π0)− f(xt, θ0))ut.

This, together with (5) and (31), implies that, for any η > 0, there exists n0 > 0 and

M0 > 0 such that for all n > n0 and M ≥M0,

P(Dn(π0, θ0) ≥ κ2

nM−1)≥ P

( n∑t=1

(f(xt, π0)− f(xt, θ0))2 ≥ κ2nM

−1/2)− η

≥ 1− 2η,

as required.

Finally, by noting Qn(θ0) = n−1∑n

t=1 u2t →P σ2 due to Assumption 2.2 and strong

law of large number, it follows from the consistency of θn and (51) that

|σ2n − σ2| ≤ Cκ−2

n |Qn(θn)−Qn(θ0)|+ oP (1) = oP (1).

Proof of Theorem 2.2. (4) follows from (33) of Lemma 8.2. (5) follows from (34) of

Lemma 8.2 with g2(x) = (f(x, θ)− f(x, θ0))2 and the facts that P (LG(1, 0) > 0) = 1 and∫∞−∞(f(s, θ)− f(s, θ0))2ds > 0, for any θ 6= θ0.

Proof of Theorem 2.3. (4) follows from (35) of Lemma 8.3 with g(x) = T (x) + T 2(x).

(5) follows from (36) of Lemma 8.3 with g(x) = (f(x, θ) − f(x, θ0))2 and the facts that∫∞−∞(f(s, θ)− f(s, θ0))2π(ds) > 0, for any θ 6= θ0.

Proof of Theorem 2.4. Recalling T (λx) ≤ v(λ)T1(x), we have

1

nv2(dn)

n∑t=1

[T (xt) + T 2(xt)] ≤ 1

n

n∑t=1

[T1

(xtdn

)+ T 2

1

(xtdn

)]→D

∫ 1

0

[T1(G(t)) + T 21 (G(t))]dt, (53)

where the convergence in distribution comes from Berkes and Horvath (2006). This proves

(4) with κn =√nv(dn) due to the local integrability of T1 + T 2

1 . To prove (5), by letting

G(λ, x) := f(λx, θ)− f(λx, θ0)− v(λ)g(x, θ, θ0), we have

1

nv2(dn)

n∑t=1

(f(xt, θ)− f(xt, θ0))2

=1

nv2(dn)

n∑t=1

[G(dn,

xtdn

)+ v(dn)g

(xtdn, θ, θ0

)]2

≥ 1

n

n∑t=1

g2(xtdn, θ, θ0

)− |Λn|, (54)

29

where Λn = 2nv(dn)

∑nt=1G

(dn,

xtdn

)g(xtdn, θ, θ0

). The similar arguments as in the proof of

(53) yield that

1

n

n∑t=1

g2(xtdn, θ, θ0

)→D Z :=

∫ 1

0

g2(G(s), θ, θ0)ds, (55)

where P (0 < Z < ∞) = 1 due to Assumption 2.4. On the other hand, it follows from

(10) that

1

nv2(dn)

n∑t=1

G2(dn,

xtdn

)≤ o(1)

nv2(dn)

n∑t=1

T 2(xt) ≤o(1)

n

n∑t=1

T 21 (xt/dn) = oP (1).

Now (5) follows from (54), (55) and the fact that

Λn ≤ 2∣∣∣ 1

nv2(dn)

n∑t=1

G2(dn,

xtdn

)∣∣∣1/2 ∣∣∣ 1n

n∑t=1

g2(xtdn, θ, θ0

)∣∣∣1/2 →P 0.

Proof of Theorem 2.5. See the Supplemental Material of this paper.

Proof of Theorem 3.1. It is readily seen that

Qn(θ0) = −n∑t=1

f(xt, θ0)(yt − f(xt, θ0)) = −n∑t=1

f(xt, θ0)ut,

Qn(θ) = −n∑t=1

f(xt, θ)f(xt, θ)′ −

n∑t=1

f(xt, θ)(yt − f(xt, θ)),

for any θ ∈ Θ. It follows from the conditions (i)–(iii) that

supθ:‖Dn(θ−θ0)‖≤kn

‖ (D−1n )′

[Qn(θ)− Qn(θ0)

]D−1n ‖= oP (1).

By using (iii) and (iv), we have (D−1n )′ Qn(θ0) = OP (1) and (D−1

n )′ Qn(θ0)D−1n →D M.

As M > 0, a.s., λmin[(D−1n )′ Qn(θ0)D−1

n ] ≥ ηn with probability that goes to one for some

ηn > 0, where λmin(A) denotes the smallest eigenvalue of A and ηn → 0 can be chosen

as slowly as required. Due to these facts, a modification of Lemma 1 in Andrews and

Sun (2004) implies (13). Note that (iv)’ implies (iv). The result Dn(θn − θ0)→D M−1Z

follows from (13) and the continuing mapping theorem.

Proof of Theorem 3.2. It suffices to verify the conditions (i)–(iii) and (iv)’ of Theorem

3.1 with

Dn =√n/dn I, Z = Σ1/2 NL

1/2G (1, 0), M = ΣLG(1, 0),

30

where I is an identity matrix, under Assumptions 3.1 and 3.2. In fact, as n/dn →∞, we

may take kn → ∞ such that θ : ||Dn(θ − θ0)|| ≤ kn falls in Nδ(θ0) = {θ : ||θ − θ0|| < δ}.This, together with Lemma 8.6, yields (i)–(iii).

On the other hand, it follows from Lemma 8.2 with g1(x) = α′3f(x, θ0) and g2(x) =

α′1f(x, θ0) f(x, θ0)′α2 that, for any αi = (αi1, ..., αim) ∈ Rm, i = 1, 2, 3,{α′3 Zn, α

′1Yn α2

}={(dn

n

)1/2n∑t=1

α′3 f(xt, θ0)ut,dnn

n∑t=1

α′1f(xt, θ0) f(xt, θ0)′α2

}→D

{τ N L

1/2G (1, 0), α′1M α2

}=D

{α′3 Z, α

′1M α2

}, (56)

where τ 2 =∫∞−∞[α′3f(s, θ0)]2ds, N is a standard normal random variable independent of

G(t) and we have used the fact that

τ N =(∫ ∞−∞

[α′3f(s, θ0)]2ds)1/2

N

=D α′3

(∫ ∞−∞

f(s, θ0)f(s, θ0)′ds)1/2

N = α′3 Σ1/2 N.

This proves (iv)’.

Proof of Theorem 3.3. As in the proof of Theorem 3.2, it suffices to verify the condi-

tions (i)–(iii) and (iv)’ of Theorem 3.1 with

Dn = a(n)I, Z = Σ1/2π NΠβ, M = Σπ Πβ,

under Assumptions 3.1* and 3.2*. The details are similar to that of Theorem 3.2, with

Lemma 8.2 replaced by Lemma 8.3, and hence are omitted.

Proof of Theorem 3.4. It suffices to verify the conditions (i)–(iii) and (iv)’ of Theorem

3.1 with Dn = diag(√nv1(dn), ...,

√nvm(dn)),

Z =

∫ 1

0

h(G(t), θ0)dU(t), M =

∫ 1

0

Ψ(t)Ψ(t)′dt,

under Assumptions 3.3 and 3.4. Recalling sup1≤i,j≤m |v(dn) vij(dn)

vi(dn) vj(dn)| < ∞, it follows from

(46) that

1

nvi(dn)vj(dn)sup

θ∈Nδ(θ0)

n∑t=1

∣∣fij(xt, θ)[f(xt, θ)− f(xt, θ0)]∣∣

≤ C

nvij(dn)v(dn)sup

θ∈Nδ(θ0)

n∑t=1

∣∣fij(xt, θ)[f(xt, θ)− f(xt, θ0)]∣∣ = oP (1),

31

for any 1 ≤ i, j ≤ m. On the other hand, as√nvi(dn) → ∞ for all 1 ≤ i ≤ m, we may

take kn →∞ such that θ : ||Dn(θ− θ0)|| ≤ kn falls in Nδ(θ0) = {θ : ||θ− θ0|| < δ}. These

facts imply that

supθ:‖Dn(θ−θ0)‖≤kn

‖ (D−1n )′

n∑t=1

f(xt, θ)[f(xt, θ)− f(xt, θ0)

]D−1n ‖= oP (1).

This proves the required (ii). The proofs of (i) and (iii) are similar, we omit the details.

Finally (iv)’ follows from Lemma 8.8 with

g(x, θ0) = α′3h(x, θ0) and g1(x, θ0) = α′1h(x, θ0)h(x, θ0)′α2.

Proofs of Theorems 4.1–4.3. See the Supplemental Material of this paper.

REFERENCES

Andrews, D. W. K. and Sun, Y., 2004. Adaptive local polynomial Whittle estimation of

long-range dependence. Econometrica, 72, 569-614.

Boden, T. A., Marland, G. and Andres, R. J., 2009. Global, Regional, and National

Fossil-Fuel CO2 Emissions. Carbon Dioxide Information Analysis Center, Oak Ridge

National Laboratory, U.S. Department of Energy, Oak Ridge, Tenn., U.S.A..

Bae, Y. and de Jong, R., 2007. Money demand function estimation by nonlinear cointe-

gration. Journal of Applied Econometrics, 22(4), 767–793.

Bae, Y. Kakkar, V. and Ogaki, M., 2004. Money demand in Japan and the liquidity trap.

ohio State University Department of Economics Working Paper #04–06.

Bae, Y., Kakkar, V., and Ogaki, M., 2006. Money demand in Japan and nonlinear

cointegration. Journal of Money, Credit, and Banking, 38(6), 1659–1667.

Bates, D. M. and Watts, D. G., 1988. Nonlinear Regression Analysis and Its Applications.

analysis, 60(4), 856865.

Berkes, I. and Horvath, L., 2006. Convergence of integral functionals of stochastic pro-

cesses. Econometric Theory, 22(2), 304–322.

Chang, Y. and Park, J. Y., 2011. Endogeneity in nonlinear regressions with integrated

time series, Econometric Review, 30, 51–87.

Chang, Y. and Park, J. Y., 2003. Index models with integrated time series. Journal of

Econometrics, 114, 73–106.

32

Chang, Y., Park, J. Y. and Phillips, P. C. B., 2001. Nonlinear econometric models with

cointegrated and deterministically trending regressors. Econometric Journal, 4, 1–36.

Chen, X., 1999. How Often Does a Harris Recurrent Markov Chain Recur? The Annals

of Probability, 27, 1324–1346.

Choi, I. and Saikkonen, P., 2010. Tests for nonlinear cointegration. Econometric Theory,

26, 682–709.

Dijkgraaf, E. and Vollebergh, H. R., 2005. A test for parameter homogeneity in CO2

panel EKC estimations. Environmental and Resource Economics, 32(2), 229–239.

de Jong, R., 2002. Nonlinear Regression with Integrated Regressors but Without Exo-

geneity, Mimeograph, Department of Economics, Michigan State University.

Feller, W., 1971. Introduction to Probability Theory and Its Applications, vol. II, 2nd

ed. Wiley.

Gao, J. K., Maxwell, Lu, Z. and Tjøstheim, D., 2009. Specification testing in nonlinear

and nonstationary time series autoregression. The Annals of Statistics, 37, 3893–3928.

Granger, C. W. J. and Terasvirta, T., 1993. Modelling Nonlinear Economic Relationships.

Oxford University Press, New York.

Hall, P. and Heyde, C. C., 1980. Martingale limit theory and its application. Probability

and Mathematical Statistics. Academic Press, Inc.

Hansen, B. E., 1992. Covergence to stochastic integrals for dependent heterogeneous

processes. Econometric Theory, 8, 489–500.

Ibragimov, R. and Phillips, P. C. B., 2008. Regression asymptotics using martingale

convergence methods Econometric Theory, 24, 888–947.

Jeganathan, P., 2008. Limit theorems for functional sums that converge to fractional

Brownian and stable motions. Cowles Foundation Discussion Paper No. 1649, Cowles

Foundation for Research in Economics, Yale University.

Karlsen, H. A. and Tjøstheim, D., 2001. Nonparametric estimation in null recurrent time

series. The Annals of Statistics, 29, 372–416.

Kurtz, T. G. and Protter, P., 1991. Weak limit theorems for stochastic integrals and

stochastic differential equations. The Annals of Probability, 19, 1035–1070.

Lai, T. L., 1994. Asymptotic theory of nonlinear least squares estimations. The Annals

of Statistics, 22, 1917–1930.

Lai, T. L. and Wei, C. Z., 1982. Least squares estimates in stochastic regression models

with applications to identification and control of dynamic systems, The Annals of

33

Statistics, 10, 154–166.

List, J. and Gallet, C., 1999. The environmental Kuznets curve: does one size fits all?

Ecological Economics 31, 409–423.

Maddison, A., 2003. The World Economy: Historical Statistics, OECD.

Muller-Furstenberger, G. and Wagner, M., 2007. Exploring the environmental Kuznets

hypothesis: Theoretical and econometric problems. Ecological Economics, 62(3), 648–

660.

Nummelin, E., 1984. General Irreducible Markov Chains and Non-negative Operators.

Cambridge University Press.

Orey, S., 1971. Limit Theorems for Markov Chain Transition Probabilities. Van Nostrand

Reinhold, London.

Piaggio, M. and Padilla, E., 2012. CO2 emissions and economic activity: Heterogeneity

across countries and non-stationary series. Energy policy, 46, 370–381.

Park, J. Y. and Phillips, P. C. B., 1999. Asymptotics for nonlinear transformation of

integrated time series. Econometric Theory, 15, 269–298.

Park, J. Y. and Phillips, P. C. B., 2001. Nonlinear regressions with integrated time series.

Econometrica, 69, 117–161.

Phillips, P. C. B., 2001. Descriptive econometrics for non-stationary time series with

empirical illustrations. Cowles Foundation discuss paper No. 1023.

Phillips, P. C. B. and Perron, P., 1988. Testing for a unit root in time series regression.

Biometrika, 75(2), 335–346.

Phillips, P. C. B. and Solo, V., 1992. Asymptotics for linear processes. The Annals of

Statistics, 971–1001.

Terasvirta, T., Tjøstheim, D. and Granger, C. W. J., 2011. Modelling Nonlinear Economic

Time Series, Advanced Texts in Econometrics.

Shi, X. and Phillips, P. C. B., 2012. Nonlinear cointegrating regression under weak

identification. Econometric Theory, 28, 509–547.

Skouras, K., 2000. Strong consistency in nonlinear stochastic regression models. The

Annals of Statistics, 28, 871–879.

Wagner, M., 2008. The carbon Kuznets curve: A cloudy picture emitted by bad econo-

metrics? Resource and Energy Economics, 30(3), 388–408.

Wang, Q., 2013. Martingale limit theorems revisited and non-linear cointegrating regres-

sion. Econometric Theory, forthcoming.

34

Wang, Q., Lin, Y. X. and Gulati, C. M., 2003. Asymptotics for general fractionally

integrated processes with applications to unit root tests. Econometric Theory, 19,

143–164.

Wang, Q. and Phillips, P. C. B., 2009. Asymptotic theory for local time density estimation

and nonparametric cointegrating regression. Econometric Theory, 25, 710–738.

Wang, Q. and Phillips, P. C. B., 2009. Structural nonparametric cointegrating regression.

Econometrica, 77, 1901–1948.

Wang, Q. and Phillips, P. C. B., 2012. A specification test for nonlinear nonstationary

models. The Annals of Statistics, 40, 727-758.

Wu, C. F., 1981. Asymptotic theory of nonlinear least squares estimations. The Annals

of Statistics, 9, 501–513.

35

Figure 5: Estimates and 95% Confidence Intervals of β2n.

36

nonlinear regressions with nonstationary time series · 2014-03-21 · nonlinear regressions with...

Documents