root--consistent estimation of weak fractional cointegration

$: Root--consistent estimation of weak fractional cointegration$
ARTICLE IN PRESS

Journal of Econometrics 140 (2007) 450–484

0304-4076/$ -

doi:10.1016/j

�CorrespoE-mail ad

www.elsevier.com/locate/jeconom

Root-n-consistent estimation of weakfractional cointegration

J. Hualdea, P.M. Robinsonb,�

aSchool of Economics and Business Administration, Universidad de Navarra, Pamplona 31080, SpainbDepartment of Economics, London School of Economics, Houghton Street, London WC2A 2AE, UK

Available online 23 August 2006

Abstract

Empirical evidence has emerged of the possibility of fractional cointegration such that the gap, b,between the integration order d of observable time series and the integration order g of cointegratingerrors is less than 0.5. This includes circumstances when observables are stationary or asymptotically

stationary with long memory so do12

� �and when they are nonstationary so dX1

2

� �. This ‘‘weak

cointegration’’ contrasts strongly with the traditional econometric prescription of unit-root

observables and short memory cointegrating errors, where b ¼ 1. Asymptotic inferential theory

also differs from this case and from other members of the class b412, in particular

ffiffiffinp

-consistent and

asymptotically normal estimation of the cointegrating vector n is possible when bo12, as we explore in

a simple bivariate model. The estimate depends on g and d or, more realistically, on estimates of

unknown g and d. These latter estimates need to beffiffiffinp

-consistent, and the asymptotic distribution of

the estimate of n is sensitive to their precise form. We propose estimates of g and d that are

computationally relatively convenient, relying on only univariate nonlinear optimization. Finite

sample performance of the methods is examined by means of Monte Carlo simulations, and several

applications to empirical data included.

r 2006 Elsevier B.V. All rights reserved.

JEL classification: C32

Keywords: Fractional cointegration; Parametric estimation; Asymptotic normality

see front matter r 2006 Elsevier B.V. All rights reserved.

.jeconom.2006.07.004

nding author. Tel.: +44 20 7955 7516; fax: +44 20 7955 6592.

dress: [email protected] (P.M. Robinson).

www.elsevier.com/locate/jeconom

dx.doi.org/10.1016/j.jeconom.2006.07.004

mailto:[email protected]

ARTICLE IN PRESSJ. Hualde, P.M. Robinson / Journal of Econometrics 140 (2007) 450–484 451

1. Introduction

Cointegration analysis has usually proceeded under the assumption of unit-root ðIð1ÞÞobservable series and short-memory stationary ðIð0ÞÞ cointegrating errors. Here, theleast squares estimate (LSE) of the cointegrating vector is not only consistent, butsuper-consistent (with convergence rate equal to sample size, n) (Stock, 1987) despitecontemporaneous correlation between regressors and cointegrating errors; optimalestimates, which account for this correlation, enjoy no better rate of convergence (Phillips,1991).

It is also possible to consider cointegration in a fractional context. To be specific, weconsider the model

Dgðyt � nxtÞ ¼ u#1t; tX1; yt ¼ 0; tp0;

Ddxt ¼ u#2t; tX1; xt ¼ 0; tp0

)(1)

for the bivariate observable sequence fyt; xtg. Here D ¼ 1� L, where L is the lag operator;

ð1� LÞ�a ¼X1j¼0

ajðaÞLj ; ajðaÞ ¼Gðj þ aÞ

GðaÞGðj þ 1Þ, (2)

taking GðaÞ ¼ 1 for a ¼ 0;�1;�2; . . . ; and Gð0Þ=Gð0Þ ¼ 1; the # superscript attached to ascalar or vector sequence vt has the meaning

v#t ¼ vt1ðt40Þ, (3)

where 1ð�Þ is the indicator function; fðu1t; u2tÞ; t ¼ 0;�1; . . .g is an unobservable covariancestationary bivariate sequence having zero mean and spectral density matrix that isnonsingular and bounded at all frequencies; and the real numbers g and d satisfy

0pgod. (4)

On this basis, we refer to ut ¼ ðu1t; u2tÞ0 as Ið0Þ, xt as IðdÞ and yt � nxt as IðgÞ, while for

na0 (5)

(4) implies that yt is also IðdÞ; under (1), (4) and (5), yt and xt are said to be cointegratedCIðd; gÞ (Engle and Granger, 1987), for which it is necessary that yt and xt share the sameintegration order (the argument of Ið�ÞÞ. The truncations on the right-hand side in (1)ensure that the model is well-defined in the mean square sense, whereas, for example,D�du2t does not have finite variance when dX1

2.

We anticipate

Covðu1t; u2tÞa0, (6)

when, rewriting the first equation of (1) as the regression

yt ¼ nxt þ v1t; v1t ¼ D�gu#1t, (7)

the xt and v1t are contemporaneously correlated. When

do12 (8)

(6) leads to inconsistency of the LSE due to the fact that xt is asymptotically stationary andso its sum of squares does not asymptotically dominate that of v1t. To overcome thisproblem, Robinson (1994a) showed that a narrow-band frequency-domain least squares

ARTICLE IN PRESSJ. Hualde, P.M. Robinson / Journal of Econometrics 140 (2007) 450–484452

estimate (NBLSE) is consistent, due to dominance near zero frequency of an IðgÞ spectraldensity by an IðdÞ one. (He considered the purely stationary situation, where there is notruncation in (1), but our modification does not affect such basic asymptotic properties.)Robinson and Marinucci (2003) gave a rate of convergence for this latter estimate,conjecturing its sharpness. Assuming (4), (8) and gþ do1

2, Christensen and Nielsen (2006)

obtained the asymptotic distribution of the NBLSE when u1t and u2t are incoherent atfrequency 0 (cf. (6)).Properties of the LSE and NBLSE were also studied by Robinson and Marinucci (2001,

2003) in case

d412, (9)

where there is trending nonstationarity. Here, the LSE is consistent, with convergence ratedepending on the location of g and d in the nonnegative quadrant, but the NBLSE stillsometimes converges faster, and never converges slower, despite dropping high-frequencyinformation. Referring to a sequence m used in the NBLSE such that m�1 þm=n! 0 asn!1, the respective rates are: for gþ do1, n2d�1 (LSE) and n2d�1ðn=mÞ1�g�d (NBLSE);for gþ d ¼ 1 but do1, n2d�1= log n (LSE) and n2d�1= logm (NBLSE); for g ¼ 0, d ¼ 1 bothestimates have rate n but the NBLSE enjoys less ‘‘second-order bias’’; and for gþ d41,both have rate nd�g.The question which then arises is whether these rates are optimal, by which we mean

whether they match the rates achieved by the Gaussian maximum likelihood estimate(MLE) under suitable regularity conditions. They are optimal for the combinationgþ d41, d� g41

2, but otherwise not. In particular, the nd�g rate is optimal for d� g41

2

without the restriction gþ d41, and Robinson and Hualde (2003) have established it forestimates asymptotically equivalent to the MLE, allowing for consistent estimation ofunknown g and d and a vector y of unknown parameters describing the autocovariancestructure of ut; these estimates of n have mixed normal asymptotics, and a Wald teststatistic with an asymptotic null w2 distribution, as established earlier in the CIð1; 0Þ case byPhillips (1991) and Johansen (1991). Indeed, Robinson and Hualde (2003) found the limitdistribution unaffected by the question of whether y, g and d are known or unknown.The present paper focuses on the case of ‘‘weak fractional cointegration’’

b ¼def d� go12, (10)

where substantially different asymptotics prevail, impacting also on the question of how dand g should be estimated. Under (10), since Dgyt and Dgxt are IðbÞ, they are asymptoticallystationary, and one anticipates the existence of

ffiffiffinp

-consistent and asymptotically normalestimates of n; the LSE and NBLSE converge slower than this owing to the dominance ofbias due to (6). Under (10), the gain of a cointegration analysis is clearly less than whenbX1

2, for example, in the CIð1; 0Þ case. Nevertheless, the identification of such structure is

useful, and a variety of empirical evidence appears to support (10).When cointegrated observables are stationary and cointegrating errors are not

antipersistent (so (4), (8) hold), (10) is inevitable. Andersen et al. (2001) detectedstationary long memory and comovement in statistics derived from high-frequencytransaction prices. Christensen and Prabhala (1998) and Christensen and Nielsen (2006)found integration orders between 0.35 and 0.4 in implied and realized volatilities and Ið0Þcointegrating errors. In Robinson and Yajima’s (2002) cointegration analysis of spot


closing prices of crude oil, most estimated integration orders were less than 0.5. Moregenerally, interest in the possibility of cointegration between stationary financial series isdeveloping, and Robinson and Marinucci (2003) argued that it can be difficult todistinguish between a unit-root process and some stationary long-memory ones.

In other cases of (10), observables are nonstationary. When they have a unit root, sod ¼ 1, it is implied that cointegrating errors are also nonstationary, albeit mean-reverting,in which case the cointegrating relation does not have the usual kind of ‘‘equilibrium’’interpretation. Nevertheless, a dimensionality reduction still occurs, empirical evidence forthe phenomenon can be found, and the case g41

2has been stressed by Marmol and Velasco

(2004). Diebold et al. (1991) represented real exchange rates as errors in cointegrated, andapparently unit-root, nominal exchange rates and prices, and found them in some cases tobe nonstationary. Similar mixed outcomes can be found in work of Cheung and Lai (1993)(investigating the PPP hypothesis), Baillie and Bollerslev (1994a) and Kim and Phillips(2000) (cointegration between spot exchange rates), Baillie and Bollerslev (1994b)(analysing the forward premium) and Crato and Rothman (1994) (cointegration betweenexchange rates). On the other hand, there may be no strong reason to focus on d ¼ 1 in afractional context; autoregression-based unit-root tests, such as those of Dickey and Fuller(1979), do not have good power against fractional alternatives, and though fractional-based tests have been developed (see e.g. Robinson, 1994b; Dolado et al., 2002) one cantreat d as unknown. In this case, empirical evidence of d4 1

2 with bo 12 was found by

Dueker and Startz (1998) (cointegration between US and Canadian bond rates) andRobinson and Marinucci (2003) (cointegration between stock prices and dividends, andbetween monetary aggregates), with estimates of d variously less than and greater than 1.

Here, we are principally concerned with estimating n, under (10). Most of the empiricalstudies reported above employ semiparametric estimates of integration orders, withconvergence rates slower than

ffiffiffinp

, so estimates of n depending on them will, like the LSEand NBLSE, be less than

ffiffiffinp

-consistent. Achievingffiffiffinp

-consistency requires a parametricapproach. Under both bo1

2 and 4 12 the Gaussian MLE appears to have optimality

properties and to provide Wald test statistics with null w2 limit distributions, and so shouldhandle multivariate systems containing more than one cointegrating relation, where bothbo 1

2and 4 1

2might occur. However, asymptotic properties of the MLE have yet to be

developed, in case of autocorrelated ut, and in (1) they can be achieved by acomputationally simpler approach when b4 1

2, as described by Robinson and Hualde

(2003), whereas this is not the case when bo 12.

To describe the theoretical background to inference when bo 12, note first that if g and d

are known, while ut is known to be white noise with unknown covariance matrix O, thenthe MLE of n is given in closed form, and may be computed as an added-variable LSE, aspursued in the following section. When g and/or d are unknown and ut has parametricautocorrelation (such as following a vector autoregression (VAR)), then the GaussianMLE of all the unknowns is again

ffiffiffinp

-consistent and asymptotically normal, but with limitcovariance matrix that is not block-diagonal, so the asymptotic variance of the estimate ofn differs from that when g and d are known. If do 1

2, a priori, conveying the implication

that d and g are both estimated by optimizing over subsets of the intersection of (4) and (8),asymptotic theory would largely follow the lines of authors such as Fox and Taqqu (1986)and Hosoya (1997), who were the first to develop such theory for standard scalar andvector long-memory time series models, respectively, the most notable difference being thefact that in our setting xt and yt would be only asymptotically stationary. If the possibility


that dX 12is admitted, and possibly gX 1

2also, then the situation is more delicate, as

discussed in Section 4.The preceding discussion makes it apparent that when g and d are unknown the issue of

how they are estimated is of greater significance when bo 12 than when b4 1

2. It is essentialhere (due to correlation between xt and u1t) that they be estimated

ffiffiffinp

-consistently. Closed-form

ffiffiffinp

-consistent estimates of integration orders are available (see Kashyap and Eom,1988; Moulines and Soulier, 1999), but these do not cover our bivariate situation, and alsoentail logging the periodogram, which raises technical difficulties not present in estimatesbased on quadratic forms, such as the MLE. In our setting some degree of numericaloptimization seems inevitable. Since this is likely to entail an initial search of the parameterspace to locate the vicinity of a global optimum, it is desirable if the computations can bearranged so that only univariate optimizations are involved. Even after concentrating outparameters, when both g and d are unknown the Gaussian MLE requires a bivariateoptimization under white noise ut, and at least a trivariate optimization when ut is a VAR,which we allow for. We propose

ffiffiffinp

-consistent and asymptotically normal estimates thatrequire only univariate optimizations.We mention finally other work on developing asymptotic inference on fractional

cointegration, which employs a different definition of IðdÞ processes for da0: for vt�Ið0Þ,we have D�dvt�IðdÞ for jdjo 1

2and

Pts¼1D

1�dvs�IðdÞ for 12odo 3

2. This kind of fractional

process has been called ‘‘Type I’’, and ours ‘‘Type II’’. Jeganathan (1999, 2001) consideredsuch a ‘‘Type I’’ version of (1), for jgjo 1

2and � 1

2odo 3

2, in a purely fractional context,

such that vt in the above definition is a white noise sequence. Assuming the distribution ofthe white noise inputs is of completely known (not necessarily Gaussian) form, Jeganathan(1999, 2001) considered limit distribution theory of optimal estimates of n, including themixed normal limit finding when b4 1

2, and Jeganathan (2001), using rates suggested by

Robinson (2000), derivedffiffiffinp

-consistency and asymptotic normality when bo 12;

Jeganathan (2001) also covered the case b ¼ 12. Though including some discussion of

estimation of g and d, Jeganathan (1999, 2001) assumed them known in his theory. Asidefrom this, the Gaussian version of his estimates is the same as ours in case our ut is whitenoise, though we do not assume Gaussianity. For the same, ‘‘Type I’’, definition offractional processes, but with Ið0Þ inputs having nonparametric autocorrelation (implyinga semiparametric model), Dolado and Marmol (1996) and Kim and Phillips (2000)developed methods and theory in cases when b4 1

2.

The basic structure of our estimates of n is described in the following section. Section 3provides asymptotic theory in case g and d are known, with proofs and some technicaldetails left to appendices. Section 4 considers estimation of g and d and the effect onestimating n. Section 5 contains Monte Carlo evidence of finite sample behaviour, andSection 6 several empirical applications.

2. Estimation of n

Write (1) as

ztðg; dÞ ¼ zxtðgÞnþ u#t , (11)

where we introduce the notation

vtðcÞ ¼ Dcv#t (12)


for a generic sequence vt, and define

ztðc; dÞ ¼ ðytðcÞ;xtðdÞÞ0; z ¼ ð1; 0Þ0. (13)

We take ut to be generated by the VAR

ut ¼Xp

j¼1

Bjut�j þ et, (14)

where all zeroes of detfI2 �Pp

j¼1Bjzjg lie outside the unit circle, the Bj being 2� 2

matrices, and Ir the r� r identity matrix, while et is a bivariate sequence, uncorrelated andhomoscedastic over t, with mean zero and covariance matrix O. We take (14) to meanwhite noise ut when p ¼ 0.

From (11) and (14) we have

ztðg; dÞ �Xp

j¼1

Bjzt�jðg; dÞ ¼ n zxtðgÞ �Xp

j¼1

Bjzxt�jðgÞ

( )þ eþt ; tX1, (15)

where

eþt ¼ u11ðt ¼ 1Þ þ ut �Xt�1j¼1

Bjut�j

( )1ðt ¼ 2; . . . ; pÞ þ et1ðt4pÞ. (16)

Denote by Bij the ith row of Bj. Writing eit for the ith element of et, for t4p the secondequation of (15) can be written as

xtðdÞ �Xp

j¼1

B2jzt�jðg; dÞ ¼ �nXp

j¼1

B2jzxt�jðgÞ þ e2t, (17)

whence the first equation can be written as

ytðgÞ ¼ nxtðgÞ þ rxtðdÞ þXp

j¼1

ðB1j � rB2jÞzt�jðg; dÞ

� nXp

j¼1

ðB1j � rB2jÞzxt�jðgÞ þ e1:2;t, ð18Þ

where e1:2;t ¼ e1t � re2t, r ¼ Eðe1te2tÞ=Eðe22tÞ; (18) is a form of error–correction representa-tion.

We wish to cater for the possibility of prior zero restrictions on the Bj which serve toeliminate some yt�jðgÞ, xt�jðgÞ, xt�jðdÞ, as this will improve efficiency. Thus, we introduce aq� ð3pþ 2Þ matrix Q, which is I3pþ2 when there are no such restrictions, but forqo3pþ 2, Q is formed by dropping rows corresponding to the restrictions. Thus, we canwrite (18) as

ytðgÞ ¼ y0QZtðg; dÞ þ e1:2;t, (19)

where

Ztðc; dÞ ¼ ðxtðcÞ;xtðdÞ;w0t�1ðc; dÞ; . . . ;w

0t�pðc; dÞÞ

0, (20)

wtðc; dÞ ¼ ðxtðcÞ;xtðdÞ; ytðcÞÞ0, (21)


and the q� 1 vector y consists of coefficients that are not a priori zero, being (in some casesnonlinear) functions of n, r and the Bij.Since Eðe1:2;tZtðg; dÞÞ ¼ 0, we consider the (possibly constrained) LSE

byðc; dÞ ¼ Gðc; dÞ�1gðc; dÞ, (22)

taking ðc; dÞ ¼ ðg; dÞ, ðg;edÞ, ðeg; dÞ or ðeg;edÞ, depending on whether g and/or d are known or

estimated by eg;ed, andGðc; dÞ ¼ Q

1

n

Xn

t¼pþ1

Ztðc; dÞZ0tðc; dÞQ

0; gðc; dÞ ¼ Q1

n

Xn

t¼pþ1

Ztðc; dÞytðcÞ. (23)

For example, in case p ¼ 1, if u1t is white noise while u2t is AR(1), then q ¼ 3 and (18)becomes

ytðgÞ ¼ nxtðgÞ þ rxtðdÞ � rB221xt�1ðdÞ þ e1:2;t, (24)

where B22j is the second element of B2j. Notice that n, r and B221 are all identified in (24),but it is apparent from comparison of (18) with (19) that in general, while n and r areexpected to be identified, only some elements of the Bj are. However, we are treating the Bj

as nuisance parameters, indeed it is principally n that is of interest, so we stress

bnðc; dÞ ¼ 10Gðc; dÞ�1gðc; dÞ, (25)

where 1 ¼ ð1; 0; . . . ; 0Þ0.In case p ¼ 0, bnðg; dÞ actually provides the Gaussian MLE of n, given knowledge of g; d

but lack of knowledge of O. For pX1, it is less efficient than the MLE for this case, but stillffiffiffinp

-consistent and computationally considerably simpler. Notice that over-specification ofp results in a further efficiency loss, but under-specification produces inconsistency. Inmoderate sample sizes, a modest choice of p, even p ¼ 1, might thus be a wise precaution.On the other hand, one could also regard (14) as approximating a more general infinite ARprocess with nonparametric Ið0Þ autocorrelation.

3. Asymptotic theory with known g; d

The present section establishes theffiffiffinp

-consistency and asymptotic normality of byðg; dÞ,and hence of bnðg; dÞ. We assume in addition to the description of (14) that the et arestationary and ergodic with finite fourth moment, satisfying also

EðetjFt�1Þ ¼ 0; Eðete0tjFt�1Þ ¼ O (26)

almost surely, where Ft is the s-field of events generated by es, spt, and also assume thatconditional (on Ft�1) third and fourth moments and cross-moments of elements of et

equal the corresponding unconditional moments. Thus, the et essentially behave like an iidsequence up to fourth moments. Noting from (1) that

xtðgÞ ¼Xt�1j¼0

ajðbÞu2;t�j ; t40; ¼ 0; tp0, (27)


define

xtðgÞ ¼X1

j¼maxðt;0Þ

ajðbÞu2;t�j ; extðgÞ ¼ xtðgÞ þ xtðgÞ, (28)

so that because of (10), extðgÞ, t ¼ 0;�1; . . . ; is a covariance stationary sequence. Likewise,so is

eytðgÞ ¼ nextðgÞ þ u1t, (29)

as is u2t. Now define

ewt ¼ ðextðgÞ; u2t; eytðgÞÞ0; eZt ¼ ðextðgÞ; u2t; ew0t�1; . . . ; ew0t�pÞ

0, (30)

F ¼ Eð eZteZ0tÞ; C ¼ Eðe21:2;t eZt

eZ0tÞ. (31)

The proof of the following theorem is left to Appendix A.

Theorem 3.1. Under (1), (4), (5), (10) and the conditions in the sentence containing (26), if gand d are known

n1=2fbyðg; dÞ � yg!dNð0; ðQFQ0Þ�1QCQ0ðQFQ0Þ�1Þ, (32)

as n!1, and the covariance matrix on the right-hand side is consistently estimated by

Gðg; dÞ�1Kðg; dÞGðg; dÞ�1, (33)

where

Kðc; dÞ ¼ Q1

n

Xn

t¼pþ1

be21:2;tðc; dÞZtðc; dÞZ0tðc; dÞQ

0, (34)

in which

be1:2;tðc; dÞ ¼ ytðcÞ �byðc; dÞ0QZtðc; dÞ. (35)

Remark 3.1. As anticipated, for pX1, bnðg; dÞ is inefficient relative to the Gaussian MLE,because it ignores the nonlinear restrictions on y.

Remark 3.2. Over-parameterization in the Bj results in further loss of efficiency. Considerthe case where, in the estimation, the Bj are taken to be diagonal, with also u1t white noiseand u2t ARðpÞ, to extend (24). Then if in fact u2t is also white noise the limiting variance ofn1=2fbnðg; dÞ � ng is

o21:2 o2

2

X1j¼pþ1

a2j ðbÞ

( ),, (36)

where o22 ¼ Eðe22tÞ, o

21:2 ¼ Eðe21:2;tÞ; (36) is increasing in p. As a simpler alternative to (33),

(34), we can consistently estimate (36) bybo21:2ðg; dÞð1

0Gðg; dÞ1Þ�1, (37)


where

bo21:2ðg; dÞ ¼

1

n

Xn

t¼pþ1

be21:2;tðg; dÞ. (38)

Note that (36) and (37) also apply in case p ¼ 0 is correctly taken in the estimation, whenbnðg; dÞ is equivalent to the Gaussian MLE, and (36) becomes

o21:2

2�4bo22

pB

1

2� b;

1

2� b

� �� 1

� �. (39)

Note also that (36) and (39) do not depend on fourth cumulants of et. If ut is not whitenoise, the limiting variance of n1=2fbnðg; dÞ � ng, namely 10ðQFQ0Þ�1QCQ0ðQFQ0Þ�11 (see(32)), in general depends on the fourth cumulant of e1:2;t, e1:2;t, e2t and e2t, though of coursethe latter is zero under Gaussianity.

Remark 3.3. On the other hand, under-parameterization of the Bj produces inconsistencyof bnðg; dÞ, as when ut is actually ARðpþ 1Þ. Our AR approach is computationallyconvenient, and is in a long tradition of macroeconometric estimation of linearsimultaneous equations systems, as well as relating to Johansen’s (1991) approach toCIð1; 0Þ cointegration. With an ARMA approach, over-parameterization of bothAR and MA orders would have more serious consequences than those discussed inRemark 3.2.

Remark 3.4. So long as pX1 and some Bj are nondiagonal, the endogeneity property (6)holds even when O is diagonal, i.e.r ¼ 0.

Remark 3.5. Though we assume (10) throughout, when in fact b4 12, nðg; dÞ is as efficient as

the Gaussian MLE. In particular, it can be shown to approximate the estimate ofRobinson and Hualde (2003), which for b4 1

2 has a limiting mixed normal distributionwhen the estimates of the parameters describing the short-memory process ut convergesuitably fast, but need not themselves be asymptotically efficient.

4. The case of unknown g; d

The main practical interest in fractional cointegration centres on the realistic situation inwhich g and/or d are unknown. We shall focus on the case where both g and d areunknown, as being the most difficult both computationally and theoretically.First, suppose that ut is correctly taken to be white noise, with unknown covariance

matrix O satisfying (6). Considering the Gaussian log-likelihood, both O and n can beconcentrated out to leave an objective function of g and d. The resulting estimates of g andd might then be plugged into (25). Instead, we propose estimates of g and d that requiretwo univariate nonlinear optimizations, in place of one bivariate one. The computationaladvantage in this would be intensified in extensions to systems involving a greater numberof integration orders.Write the second likelihood equation of (1) as

xtðdÞ ¼ e2t; tX1. (40)


We estimate d by

ed0 ¼ arg mind2D

S0ðdÞ, (41)

for a closed interval D and (cf. Beran, 1995)

S0ðdÞ ¼Xn

t¼1

x2t ðdÞ. (42)

We then estimate g by

eg0 ¼ arg minc2C

T0ðcÞ, (43)

for a closed interval C (presumably a subset of ½0;ed0�) andT0ðcÞ ¼

Xn

t¼1

fytðcÞ � bnðc;ed0ÞxtðcÞ � brðc;ed0Þxtðed0Þg2, (44)

where bnðc; dÞ is given by (25), taking p ¼ 0; and brðc; dÞ is the second element of byðc; dÞ inthis case. The presence of c as argument in ytðcÞ, and indeed of d in xtðdÞ of (42), presentsno barrier to consistent estimation because, for example, ytðcÞ involves c only in thecoefficients of lagged values yt�1; yt�2; . . . ; not yt.

In case of VAR ut, we develop further the triangular structure of (1) by assuming

Bj is upper-triangular; j ¼ 1; . . . ; p. (45)

This corresponds to a kind of causal structure, with yt formed from yt�1; yt�2; . . . andxt;xt�1; . . ., but xt being determined by

xtðdÞ � f0RX tðdÞ ¼ e2t, (46)

with

X tðdÞ ¼ ðxt�1ðdÞ; . . . ;xt�pðdÞÞ0, (47)

and R an r� p matrix with R ¼ Ip in case r ¼ p, but for rop, R is formed by droppingspecified rows from Ip, in case B22j ¼ 0 for some j, the r� 1 vector f collecting the B22j thatare not a priori zero. Prescription (46) includes the case of diagonal Bj, does not seem anexcessive requirement given the allowance for nondiagonal O, and introduces an elementof parsimony.

DefinebfðdÞ ¼ HðdÞ�1hðdÞ, (48)

where

HðdÞ ¼ R1

n

Xn

t¼pþ1

X tðdÞX0tðdÞR

0; hðdÞ ¼ R1

n

Xn

t¼pþ1

X tðdÞxtðdÞ. (49)

First estimate d byedp ¼ arg mind2D

SpðdÞ, (50)


where

SpðdÞ ¼Xn

t¼pþ1

fxtðdÞ � bfðdÞ0RX tðdÞg2. (51)

Then estimate g byegp ¼ arg minc2C

TpðcÞ, (52)

where

TpðcÞ ¼Xn

t¼pþ1

fytðcÞ �byðc;edpÞ

0QZtðc;edpÞg2. (53)

As abbreviating notation, we denote throughout, for any pX0, ed ¼ edp, eg ¼ egp. The proofof the following theorem is omitted for the following reasons. When do 1

2(and D � 0; 1

2

� �)

the proof of limit behaviour of ed does not greatly differ from proofs of Fox and Taqqu(1986) and Giraitis and Surgailis (1990); their xt is actually stationary, not justasymptotically, and their objective functions differ from (42) and (51), though with p ¼

0 their estimates have equal asymptotic efficiency to our ed0. When the possibility that d4 12

is allowed in the choice of D, there is a difficulty in proving consistency of ed if D includesdpd� 1

2, due to a lack of uniform convergence of SpðdÞ on D. Since d is unknown, there is

no guarantee of avoiding this problem. Velasco and Robinson (2000) establishedconsistency, and thence asymptotic normality with

ffiffiffinp

rate, of an alternative estimate ofd allowing D to be arbitrarily large, but for ‘‘Type I’’ processes and employing tapering(which tends to inflate variance). Hualde and Robinson (2004) have recently done the samefor ed in our setting, with the unimportant difference that their linear process for xt hasscalar innovation, and is not nested in a nondiagonal bivariate system. In our setting, andwhether or notD � ð0; 1

2Þ, the proof of Theorem 4.1 proceeds by establishing consistency ofed, following Hualde and Robinson (2004), then consistency of eg, allowing for the extra

complexity involved in working with residuals, and then employing the Cramer–Wolddevice and relatively straightforward and tedious arguments.

Theorem 4.1. Under (1), (4), (5), (10), (45), the conditions in the sentence containing (26) and

g 2 C, d 2 D,

n1=2

bnðeg;edÞ � neg� ged� d

264375!dNð0;ABA0Þ, (54)

as n!1, where A is a 3� ðqþ 2Þ matrix and B is a ðqþ 2Þ � ðqþ 2Þ matrix, for which

consistent estimates bA and bB are presented in Appendix B.

Remark 4.1. Analytic formulae, in either the time or frequency domain, for A and B are

excessively complicated, and thus omitted. The estimate bAbB bA0 provided by Appendix B isguaranteed nonnegative definite.

Remark 4.2. As well as being useful in inference on n, the theorem could also be applied ininference on g and d, for example, to set a confidence interval for b which could be useful injudging the suitability of the weak cointegration specification (10).


Remark 4.3. On the other hand, our estimation procedure, though not our asymptotictheory, can also be used when b4 1

2, though alternative, possibly computationally more

convenient, methods, are available here. In fact, Robinson and Hualde (2003) showed thatin this case the asymptotic distributions of bnðg; dÞ and bnðeg;edÞ are the same, due to eg, ed stillbeing

ffiffiffinp

-consistent.

Remark 4.4. Robinson and Hualde (2003) suggest use of residuals from the LSE orNBLSE of n in the estimation of g when b4 1

2. However, the LSE and NBLSE are less-

than-ffiffiffinp

-consistent under (10), and so it appears that the resulting estimates of g will notachieve the

ffiffiffinp

-consistency needed to provide affiffiffinp

-consistent estimate of n.

Remark 4.5. Even when ut is white noise, bnðeg;edÞ, ed and eg are inefficient relative to theGaussian MLE; intuitively, this is due to the estimation of d from only the second equationof (1) (i.e. (41)), whereas the first equation also contains relevant information. However,the estimates can be updated to efficiency by a single Newton step.

Remark 4.6. The paper has taken existence of cointegration, and bo 12, for granted.

In practice these properties will have to be established, and our estimation of nwill form the final step. Some discussion of methodology has already appeared inMarinucci and Robinson (2001), Robinson and Yajima (2002) and Robinson andMarinucci (2003). This has stressed a semiparametric approach, recognizing that aparametric model for ut (i.e. knowledge of p in our VAR case) is unlikely to beknown a priori. A natural starting point is to test the necessary requirement of equalityof integration orders of xt and yt. The literature on asymptotic inference for multi-variate fractional models is rather small, and some of it assumes lack of cointegration,but the approaches of Robinson and Yajima (2002) and Hualde (2002) areavailable. Given a positive outcome, a test for existence of cointegration, suchas those of Marinucci and Robinson (2001), Robinson and Yajima (2002) and Marmoland Velasco (2004) can be conducted. Given a positive outcome, one can rejectb ¼ 1

2against the alternative bo 1

2if a suitably standardized d� g� 1

2is significantly

negative relative to the standard normal distribution, d and g being semiparametricestimates of d and g, employing residuals yt � nxt based on an initial consistent n, such asthe NBLSE. Then, using proxies ytðgÞ � nxtðgÞ and xtðgÞ for u1t and u2t, respectively, theAR order of ut can be identified, for example as described in the empirical examples ofSection 6.

5. Monte Carlo evidence

With the main aim of investigating the finite sample performance of our estimates of nand associated rules of inference, a Monte Carlo experiment was carried out. There are twoparts to our investigation, the first comparing our proposed estimates (with known andunknown integration orders) with the simplest one, the LSE, and the second evaluating ina simple framework the inefficiency of bnðg; dÞ mentioned in Remark 3.1 with respect to twoasymptotically efficient estimates of n. In data generation from (1), (14), we took p ¼ 1throughout, with

B1 ¼ diagfb1; b2g, (55)


where each bi took values 0, 0.5, 0.9. The case b1 ¼ b2 ¼ 0 actually correspond to p ¼ 0 in(14), where ut is a white noise vector. Likewise, b1 ¼ 0, b2a0 correspond to (24). We haveemployed in (55) abbreviating notation compared to (24), so b2 ¼ B221. The et in (14) weregenerated as Gaussian with Eðe21tÞ ¼ Eðe22tÞ ¼ 1 and Eðe1te2tÞ ¼ r, taking values �0.5, 0, 0.5,0.75, via the g05ezf routine of the Fortran NAG library. We varied r in order to assesspossible ‘‘simultaneous equation bias’’, xt and u1t being orthogonal only when r ¼ 0. Weemployed four ðg; dÞ combinations:

ðg; dÞ ¼ ð0; 0:4Þ; ð0:2; 0:4Þ; ð0:4; 0:8Þ; ð0:7; 1Þ, (56)

for all of which bo 12. Notice that variances of all estimates, both in finite samples and

asymptotically, will inevitably vary across parameter values. For example, because theEðe2itÞ are fixed throughout, Eðe21:2;tÞ will decrease in jrj, while Eðu

2itÞ will increase in bi. Finite

sample biases of our estimates will doubtless also be affected by such variation, though in amore subtle manner. We took n ¼ 1.For each combination of parameter values, 1000 series of fyt;xtg of lengths n ¼

64; 128; 256 were generated. Fractional series were generated as in (27), using a0ðaÞ ¼ 1,ajþ1ðaÞ ¼ ððj þ aÞ=ðj þ 1ÞÞajðaÞ, jX1, for a40. For each series, in the first part of theexperiment we computed estimates of the following three types:

(i)

Tabl

Conv

ðg; dÞ

Optim

LSE,

LSE,

The LSE,

n0 ¼Xn

t¼1

xtyt

Xn

t¼1

x2t

,. (57)

(ii)
The infeasible estimate nI ¼ bnðg; dÞ based on correct specification and mis-specificationand/or over-specification.
(iii)
The feasible estimate nF ¼ bnðeg;edÞ based on correct specification and mis-specificationand/or over-specification.
By ‘‘correct specification’’ we mean that all prior zero restrictions on B1 in (55),including the nondiagonal ones and any diagonal ones, are incorporated in the estimation,but not equality restrictions. By ‘‘mis-specification’’ we mean that for b1a0 and b2a0 wetook Ztðc; dÞ ¼ ðxtðcÞ; xtðdÞÞ

0. By ‘‘over-specification’’ we mean that for b1 ¼ b2 ¼ 0 wetook Ztðc; dÞ ¼ ðxtðcÞ;xtðdÞ;w0t�1ðc; dÞÞ

0. Knowledge of r ¼ 0 was never used. Table 1records convergence rates of the LSE and, under the heading ‘‘optimal’’, of nI, nF.We now describe how ed and eg in nF were computed. In estimating d, we fixed D ¼½d� 1; dþ 1� in (50). A D of length 2 may often be adequate. In estimating g, we fixed

e 1

ergence rates

ð0; 0:4Þ ð0:2; 0:4Þ ð0:4; 0:8Þ ð0:7; 1Þ

al n0:5 n0:5 n0:5 n0:5

ra0 Inconsistent Inconsistent n0:4 n0:3

r ¼ 0 n0:5 n0:5 n0:4 n0:3


C ¼ ½ed� 2:05;ed� 0:05� in (52). The upper bound seems reasonable since a very small b isunlikely to be detectable, indeed there is then near loss of identifiability and very poorbehaviour of estimates of n.

Table 2

Monte Carlo bias, b1 ¼ b2 ¼ 0, correct specification

r g n 64 128 256

d nI nF n0 nI nF n0 nI nF n0

�0.5 0 0.4 0.000 0.061 �0.338 �0.002 0.017 �0.320 �0.003 0.008 �0.307

0.2 0.4 0.000 0.124 �0.401 �0.005 0.077 �0.387 �0.010 0.048 �0.377

0.4 0.8 0.000 0.061 �0.193 �0.002 0.017 �0.151 �0.003 0.008 �0.120

0.7 1 0.000 0.101 �0.220 �0.003 0.040 �0.176 �0.006 0.020 �0.142

0 0 0.4 �0.006 �0.006 �0.007 �0.001 0.000 �0.003 �0.001 �0.001 0.000

0.2 0.4 �0.014 �0.038 �0.011 0.000 �0.001 �0.005 �0.003 �0.007 0.000

0.4 0.8 �0.006 �0.006 �0.015 �0.001 0.000 �0.009 �0.001 �0.001 �0.002

0.7 1 �0.009 �0.020 �0.031 0.000 0.000 �0.023 �0.002 �0.003 �0.005

0.5 0 0.4 0.001 �0.089 0.337 0.005 �0.016 0.320 0.003 �0.009 0.308

0.2 0.4 �0.001 �0.179 0.394 0.009 �0.081 0.384 0.006 �0.056 0.376

0.4 0.8 0.001 �0.089 0.192 0.005 �0.016 0.155 0.003 �0.009 0.120

0.7 1 0.000 �0.142 0.214 0.006 �0.043 0.182 0.004 �0.025 0.143

0.75 0 0.4 0.002 �0.123 0.511 0.003 �0.029 0.481 0.002 �0.010 0.460

0.2 0.4 0.003 �0.212 0.599 0.007 �0.125 0.578 0.006 �0.077 0.562

0.4 0.8 0.002 �0.123 0.287 0.003 �0.029 0.226 0.002 �0.010 0.176

0.7 1 0.003 �0.194 0.315 0.005 �0.073 0.258 0.004 �0.028 0.206

Table 3

Monte Carlo bias, b1 ¼ b2 ¼ 0:9, correct specification

r g n 64 128 256


�0.5 0 0.4 �0.015 0.165 �0.161 �0.003 0.102 �0.136 �0.005 0.078 �0.120

0.2 0.4 �0.041 0.121 �0.293 �0.008 0.096 �0.266 �0.006 0.120 �0.248

0.4 0.8 �0.015 0.165 �0.147 �0.003 0.102 �0.113 �0.005 0.078 �0.088

0.7 1 �0.024 0.190 �0.207 �0.005 0.080 �0.166 �0.006 0.134 �0.131

0 0 0.4 �0.026 �0.086 �0.014 �0.016 �0.039 �0.005 �0.008 �0.005 0.000

0.2 0.4 �0.057 �0.155 �0.027 �0.033 �0.098 �0.012 �0.009 �0.010 �0.001

0.4 0.8 �0.026 �0.086 �0.025 �0.016 �0.039 �0.014 �0.008 �0.005 �0.003

0.7 1 �0.036 0.036 �0.043 �0.022 �0.093 �0.030 �0.008 0.002 �0.006

0.5 0 0.4 0.016 �0.208 0.158 0.004 �0.145 0.137 0.005 �0.073 0.120

0.2 0.4 0.028 �0.118 0.281 0.010 �0.168 0.267 0.008 �0.122 0.247

0.4 0.8 0.016 �0.208 0.140 0.004 �0.145 0.116 0.005 �0.073 0.090

0.7 1 0.019 �0.269 0.195 0.006 �0.144 0.170 0.006 �0.081 0.134

0.75 0 0.4 0.027 �0.278 0.237 0.010 �0.149 0.202 0.007 �0.068 0.176

0.2 0.4 0.047 �0.092 0.421 0.020 �0.143 0.390 0.010 �0.158 0.364

0.4 0.8 0.027 �0.278 0.206 0.010 �0.149 0.165 0.007 �0.068 0.129

0.7 1 0.034 �0.278 0.283 0.013 �0.215 0.236 0.008 �0.139 0.192


The estimates nI, nF (but not n0) are invariant to ðg; dÞ combinations with the same b,provided the fractionally integrated processes are generated from the same ut sequence.Thus, we do not report results for ðg; dÞ ¼ ð0:4; 0:8Þ in tables where only nI and nF are

Table 4

Monte Carlo bias, b1 ¼ 0, b2 ¼ 0:5, correct specification

r g n 64 128 256


�0.5 0 0.4 �0.001 0.036 �0.142 0.000 0.044 �0.128 0.000 0.032 �0.119

0.2 0.4 �0.002 0.058 �0.203 0.001 0.065 �0.189 �0.001 0.058 �0.181

0.4 0.8 �0.001 0.036 �0.083 0.000 0.044 �0.065 0.000 0.032 �0.052

0.7 1 �0.001 0.057 �0.106 0.000 0.065 �0.085 0.000 0.050 �0.069

0 0 0.4 �0.001 �0.003 �0.004 0.001 0.001 �0.001 0.001 0.000 0.000

0.2 0.4 0.001 �0.018 �0.008 0.004 �0.007 �0.003 0.003 0.006 0.000

0.4 0.8 �0.001 �0.003 �0.008 0.001 0.001 �0.005 0.001 0.000 �0.001

0.7 1 0.000 �0.012 �0.017 0.002 0.000 �0.012 0.002 0.004 �0.002

0.5 0 0.4 0.006 �0.034 0.142 0.004 �0.030 0.129 0.001 �0.029 0.119

0.2 0.4 0.016 �0.037 0.201 0.010 �0.045 0.189 0.004 �0.044 0.180

0.4 0.8 0.006 �0.034 0.082 0.004 �0.030 0.067 0.001 �0.029 0.052

0.7 1 0.009 �0.048 0.102 0.006 �0.053 0.088 0.002 �0.041 0.069

0.75 0 0.4 0.004 �0.061 0.216 0.002 �0.059 0.192 0.000 �0.047 0.178

0.2 0.4 0.011 �0.089 0.305 0.006 �0.094 0.283 0.001 �0.085 0.269

0.4 0.8 0.004 �0.061 0.123 0.002 �0.059 0.097 0.000 �0.047 0.076

0.7 1 0.006 �0.093 0.151 0.003 �0.091 0.124 0.001 �0.073 0.100

Table 5

Monte Carlo bias, b1 ¼ 0:9, b2 ¼ 0, correct specification

r g n 64 128 256


�0.5 0 0.4 �0.118 �0.189 �0.758 �0.040 �0.099 �0.755 �0.014 �0.046 �0.746

0.2 0.4 �0.264 �0.323 �1.05 �0.110 �0.155 �1.11 �0.054 �0.084 �1.14

0.4 0.8 �0.118 �0.189 �1.05 �0.040 �0.099 �0.965 �0.014 �0.046 �0.852

0.7 1 �0.172 �0.329 �1.51 �0.066 �0.142 �1.41 �0.029 �0.064 �1.26

0 0 0.4 0.006 0.006 �0.039 0.005 0.042 �0.015 0.005 0.011 �0.002

0.2 0.4 �0.002 �0.063 �0.065 0.009 0.019 �0.030 0.005 0.000 �0.005

0.4 0.8 0.006 0.006 �0.119 0.005 0.042 �0.082 0.005 0.011 �0.013

0.7 1 0.003 �0.073 �0.251 0.006 0.029 �0.210 0.005 0.008 �0.038

0.5 0 0.4 0.129 0.111 0.714 0.052 0.124 0.740 0.018 0.038 0.741

0.2 0.4 0.258 0.189 0.970 0.126 0.167 1.07 0.056 0.067 1.12

0.4 0.8 0.129 0.111 0.994 0.052 0.124 0.981 0.018 0.038 0.854

0.7 1 0.177 0.173 1.42 0.079 0.147 1.46 0.032 0.052 1.27

0.75 0 0.4 0.167 0.190 1.09 0.065 0.128 1.11 0.022 0.039 1.11

0.2 0.4 0.363 0.300 1.48 0.172 0.237 1.61 0.079 0.074 1.68

0.4 0.8 0.167 0.190 1.48 0.065 0.128 1.42 0.022 0.039 1.25

0.7 1 0.242 0.291 2.08 0.106 0.197 2.05 0.043 0.058 1.83


involved. Similarly, ed� d is invariant to d, so the reported results apply to any d, whereaseg� g is invariant to ðg; dÞ combinations with the same b, so we again omit results forðg; dÞ ¼ ð0:4; 0:8Þ.

Table 6

Monte Carlo bias, b1 ¼ b2 ¼ 0:5, mis-specification

r g n 64 128 256


�0.5 0 0.4 0.000 1.13 �0.242 �0.001 0.981 �0.221 �0.003 0.851 �0.208

0.2 0.4 0.000 1.95 �0.346 �0.003 1.84 �0.328 �0.009 1.64 �0.316

0.4 0.8 0.000 1.13 �0.167 �0.001 0.981 �0.132 �0.003 0.851 �0.105

0.7 1 0.000 1.51 �0.212 �0.002 1.30 �0.170 �0.005 1.10 �0.138

0 0 0.4 �0.005 1.79 �0.008 0.000 1.59 �0.003 0.000 1.40 0.000

0.2 0.4 �0.010 3.40 �0.016 0.002 3.30 �0.006 �0.001 3.06 �0.001

0.4 0.8 �0.005 1.79 �0.017 0.000 1.59 �0.010 0.000 1.40 �0.002

0.7 1 �0.007 2.50 �0.033 0.000 2.20 �0.024 0.000 1.91 �0.005

0.5 0 0.4 0.004 2.45 0.240 0.006 2.10 0.222 0.003 1.98 0.208

0.2 0.4 0.008 5.04 0.337 0.013 4.73 0.326 0.007 4.45 0.314

0.4 0.8 0.004 2.45 0.164 0.006 2.10 0.135 0.003 1.98 0.105

0.7 1 0.005 3.54 0.204 0.008 3.03 0.177 0.004 2.78 0.140

0.75 0 0.4 0.004 2.71 0.365 0.003 2.33 0.332 0.002 2.21 0.310

0.2 0.4 0.009 5.88 0.513 0.008 5.46 0.487 0.006 5.04 0.469

0.4 0.8 0.004 2.71 0.244 0.003 2.33 0.196 0.002 2.21 0.154

0.7 1 0.006 3.97 0.300 0.005 3.40 0.250 0.003 3.12 0.201

Table 7

Monte Carlo bias, b1 ¼ b2 ¼ 0, over-specification

r g n 64 128 256


�0.5 0 0.4 0.020 0.207 �0.338 0.013 0.248 �0.320 0.021 0.149 �0.307

0.2 0.4 0.065 0.012 �0.401 0.042 0.042 �0.387 0.035 �0.017 �0.377

0.4 0.8 0.020 0.207 �0.193 0.013 0.248 �0.151 0.021 0.149 �0.120

0.7 1 0.035 0.179 �0.220 0.022 0.205 �0.176 0.026 0.124 �0.142

0 0 0.4 �0.032 �0.056 �0.007 �0.006 �0.059 �0.003 0.007 �0.140 0.000

0.2 0.4 �0.036 �0.058 �0.011 0.023 �0.083 �0.005 0.027 �0.108 0.000

0.4 0.8 �0.032 �0.056 �0.015 �0.006 �0.059 �0.009 0.007 �0.140 �0.002

0.7 1 �0.034 �0.043 �0.031 0.003 �0.101 �0.023 0.014 �0.115 �0.005

0.5 0 0.4 0.006 �0.291 0.337 0.017 �0.323 0.320 �0.005 �0.290 0.308

0.2 0.4 0.021 �0.092 0.394 0.061 �0.151 0.384 0.004 �0.155 0.376

0.4 0.8 0.006 �0.291 0.192 0.017 �0.323 0.155 �0.005 �0.290 0.120

0.7 1 0.012 �0.259 0.214 0.032 �0.288 0.182 �0.001 �0.263 0.143

0.75 0 0.4 �0.018 �0.288 0.511 0.002 �0.319 0.481 �0.016 �0.187 0.460

0.2 0.4 �0.034 �0.103 0.599 0.016 �0.102 0.578 �0.021 �0.178 0.562

0.4 0.8 �0.018 �0.288 0.287 0.002 �0.319 0.226 �0.016 �0.187 0.176

0.7 1 �0.023 �0.191 0.315 0.007 �0.319 0.258 �0.017 �0.210 0.206


Tables 2–7 report Monte Carlo bias (defined as the estimate minus the true value) of n0,nI and nF, each table referring to a particular ðb1; b2Þ combination with either correctspecification, mis-specification or over-specification. Only some of the ðb1; b2Þ combina-tions covered in the experiment are included, in order to conserve on space. Generally, nIperforms best, followed by nF, with n0 worst.We discuss first the cases of correct specification (Tables 2–5). The relative performance

of n0, nI and nF mentioned above is maintained in the full white noise case b1 ¼ b2 ¼ 0(Table 2), and in the AR case (Tables 3–5) when ra0, but not when r ¼ 0 with b1 ¼ b2a0,where n0 is best. For b1 ¼ b2 ¼ 0:9, b ¼ 0:4 and small n, n0 usually beats nF even when ra0(Table 3). For b1 ¼ 0, b2a0 (Table 4), we are close to the white noise outcome, but whenb1a0, b2 ¼ 0 the bias of n0 decays very slowly, and is unacceptably large when b1 ¼ 0:9(Table 5). Focussing now more on variation across ðg; dÞ, the bias of nI decreasesin b, as is the case for nF when b1 ¼ b2 ¼ 0. With AR structure, the worst performance ofnF is generally found for ðg; dÞ ¼ ð0:2; 0:4Þ or ð0:7; 1Þ. As for n0, bias varies withcollective memory gþ d when r ¼ 0, but when ra0, ð0; 0:4Þ and ð0:2; 0:4Þ are the worstcases, unsurprisingly in view of the LSE’s inconsistency here. Generally, nF works bestunder b ¼ 0:4. With respect to variation in r, overall, the bias shares the sign of r in case ofn0, nI, but is opposite in case of nF, except for the case b1 ¼ 0:9, b2 ¼ 0. nI isrelatively insensitive to r, though for b1 ¼ 0:9, b2 ¼ 0 (Table 5), bias increases in jrj, asis the case for n0, but no clear pattern can be found in the results for nF, though there isevidence of increase in bias with jrj. Looking at variation across ðb1; b2Þ, AR structuretends to reduce bias in n0 but increase it, and possibly change its sign, in nI. For nF, theworst performances occur when b1a0, but even here bias decays rapidly as n increases, asit does also for nI.

Table 8

Monte Carlo SD, b1 ¼ b2 ¼ 0, correct specification

r g n 64 128 256


�0.5 0 0.4 0.178 0.429 0.109 0.112 0.169 0.084 0.076 0.104 0.065

0.2 0.4 0.419 0.788 0.131 0.274 0.470 0.102 0.193 0.318 0.077

0.4 0.8 0.178 0.429 0.154 0.112 0.169 0.122 0.076 0.104 0.092

0.7 1 0.259 0.583 0.270 0.167 0.286 0.237 0.116 0.183 0.188

0 0 0.4 0.212 0.321 0.107 0.128 0.151 0.073 0.086 0.093 0.049

0.2 0.4 0.489 0.817 0.141 0.310 0.469 0.105 0.217 0.284 0.076

0.4 0.8 0.212 0.321 0.171 0.128 0.151 0.128 0.086 0.093 0.093

0.7 1 0.305 0.518 0.322 0.189 0.269 0.278 0.130 0.150 0.214

0.5 0 0.4 0.184 0.484 0.112 0.113 0.175 0.084 0.073 0.099 0.063

0.2 0.4 0.426 0.892 0.136 0.276 0.514 0.104 0.187 0.313 0.078

0.4 0.8 0.184 0.484 0.160 0.113 0.175 0.127 0.073 0.099 0.092

0.7 1 0.266 0.673 0.283 0.168 0.307 0.247 0.112 0.176 0.192

0.75 0 0.4 0.140 0.593 0.114 0.087 0.196 0.091 0.058 0.101 0.075

0.2 0.4 0.328 0.811 0.116 0.213 0.535 0.092 0.146 0.354 0.073

0.4 0.8 0.140 0.593 0.140 0.087 0.196 0.111 0.058 0.101 0.086

0.7 1 0.203 0.733 0.226 0.129 0.401 0.188 0.088 0.193 0.152


Mis-specification (Table 6) has surprisingly little effect on nI, but seriously damages nF,especially when b is small, ð0:2; 0:4Þ being clearly the worst case, though again biasdecreases with n. As anticipated, over-specification (Table 7) makes little difference to nI,

Table 9

Monte Carlo SD, b1 ¼ b2 ¼ 0:9, correct specification

r g n 64 128 256


�0.5 0 0.4 0.918 2.88 0.164 0.480 1.62 0.112 0.271 0.784 0.075

0.2 0.4 1.78 4.15 0.300 0.961 1.85 0.225 0.557 1.03 0.159

0.4 0.8 0.918 2.88 0.225 0.480 1.62 0.161 0.271 0.784 0.108

0.7 1 1.19 3.46 0.374 0.633 1.70 0.296 0.363 0.886 0.216

0 0 0.4 1.06 2.98 0.192 0.553 1.54 0.122 0.306 0.878 0.079

0.2 0.4 2.04 3.67 0.354 1.10 2.12 0.253 0.634 1.26 0.177

0.4 0.8 1.06 2.98 0.282 0.553 1.54 0.191 0.306 0.878 0.120

0.7 1 1.37 3.21 0.483 0.729 1.78 0.370 0.411 0.847 0.249

0.5 0 0.4 0.901 3.81 0.172 0.472 1.55 0.115 0.266 0.877 0.075

0.2 0.4 1.76 3.59 0.319 0.953 2.04 0.233 0.553 1.02 0.161

0.4 0.8 0.901 3.81 0.241 0.472 1.55 0.170 0.266 0.877 0.109

0.7 1 1.17 3.44 0.405 0.625 1.75 0.313 0.358 0.862 0.219

0.75 0 0.4 0.717 2.67 0.138 0.372 1.30 0.093 0.212 0.914 0.066

0.2 0.4 1.39 2.94 0.248 0.747 1.56 0.179 0.441 0.986 0.131

0.4 0.8 0.717 2.67 0.195 0.372 1.30 0.128 0.212 0.914 0.088

0.7 1 0.930 2.79 0.331 0.491 1.50 0.232 0.286 1.12 0.169

Table 10

Monte Carlo SD, b1 ¼ 0:9, b2 ¼ 0, correct specification

r g n 64 128 256


�0.5 0 0.4 0.608 1.72 0.434 0.346 0.899 0.338 0.210 0.431 0.249

0.2 0.4 1.09 2.12 0.831 0.642 1.23 0.714 0.403 0.749 0.555

0.4 0.8 0.608 1.72 1.18 0.346 0.899 1.04 0.210 0.431 0.818

0.7 1 0.761 1.98 2.23 0.438 1.03 2.16 0.270 0.564 1.81

0 0 0.4 0.666 2.34 0.466 0.399 0.927 0.373 0.239 0.479 0.280

0.2 0.4 1.14 2.18 0.864 0.711 1.33 0.764 0.443 0.750 0.615

0.4 0.8 0.666 2.34 1.36 0.399 0.927 1.18 0.239 0.479 0.907

0.7 1 0.813 1.86 2.70 0.496 1.10 2.60 0.301 0.555 2.09

0.5 0 0.4 0.615 1.84 0.450 0.358 0.864 0.353 0.205 0.384 0.262

0.2 0.4 1.09 2.00 0.849 0.657 1.25 0.729 0.408 0.691 0.585

0.4 0.8 0.615 1.84 1.24 0.358 0.864 1.08 0.205 0.384 0.816

0.7 1 0.768 1.74 2.38 0.451 1.03 2.27 0.268 0.504 1.85

0.75 0 0.4 0.529 1.70 0.383 0.295 0.774 0.297 0.166 0.325 0.217

0.2 0.4 0.986 1.95 0.769 0.590 1.27 0.652 0.362 0.651 0.508

0.4 0.8 0.529 1.70 0.974 0.295 0.774 0.835 0.166 0.325 0.678

0.7 1 0.681 1.72 1.89 0.391 0.966 1.72 0.228 0.445 1.44

ARTICLE IN PRESS

Table 11

Monte Carlo SD, b1 ¼ b2 ¼ 0, over-specification

r g n 64 128 256


�0.5 0 0.4 1.78 2.41 0.109 1.07 1.53 0.084 0.670 1.13 0.065

0.2 0.4 3.46 2.92 0.131 2.14 1.79 0.102 1.36 1.13 0.077

0.4 0.8 1.78 2.41 0.154 1.07 1.53 0.122 0.670 1.13 0.092

0.7 1 2.30 2.65 0.270 1.41 1.50 0.237 0.887 0.999 0.188

0 0 0.4 2.04 3.09 0.107 1.19 1.75 0.073 0.748 1.23 0.049

0.2 0.4 4.03 3.57 0.141 2.37 2.20 0.105 1.52 1.53 0.076

0.4 0.8 2.04 3.09 0.171 1.19 1.75 0.128 0.748 1.23 0.093

0.7 1 2.66 3.51 0.322 1.56 1.89 0.278 0.988 1.25 0.214

0.5 0 0.4 1.74 2.73 0.112 1.06 1.82 0.084 0.668 1.35 0.063

0.2 0.4 3.39 3.13 0.136 2.12 1.90 0.104 1.35 1.45 0.078

0.4 0.8 1.74 2.73 0.160 1.06 1.82 0.127 0.668 1.35 0.092

0.7 1 2.26 3.10 0.283 1.40 1.73 0.247 0.881 1.38 0.192

0.75 0 0.4 1.42 2.05 0.114 0.831 1.60 0.091 0.519 1.33 0.075

0.2 0.4 2.74 2.32 0.116 1.67 1.47 0.092 1.05 1.22 0.073

0.4 0.8 1.42 2.05 0.140 0.831 1.60 0.111 0.519 1.33 0.086

0.7 1 1.83 2.17 0.226 1.09 1.45 0.188 0.686 1.20 0.152

J. Hualde, P.M. Robinson / Journal of Econometrics 140 (2007) 450–484468

which does much better than n0, but nF is damaged (especially for b ¼ 0:4) by poorestimates of the integration orders. However, small reductions in the optimizing intervalsC, D cause very significant improvements in nF (and in fact in the estimates of g, d).Tables 8–11 contain Monte Carlo standard deviations (SD) for only a subset of the

combinations for which bias results were displayed. As noted before, variability isconsiderably affected by parameter values, and the relative performance of n0, nI and nFcan be illustrated by focussing on only few cases. In fact, n0 was superior to nI for mostcombinations, including those not displayed, with nF a poor third. With correctspecification, this was most notably evident for small n and b1 ¼ b2a0 (Table 9), inpart due to the proliferation in regressors, five in nI and nF versus one in n0, with variabilityin ed and eg considerably inflating SD of nF relative to nI. Precision also increases withincreasing n, and when one or both of the bi is zero (see Tables 8 and 10), the performanceof nI and nF improves relative to that of n0. On the other hand, with over-specification(Table 11), nI and nF unsurprisingly deteriorate further, and generally larger sample sizeswill be required in order for their faster convergence rate to consistently deliver smaller SDthan n0. Nevertheless, it must be borne in mind that the paper’s motivation is not tominimize variance but rather to achieve

ffiffiffinp

-consistency and asymptotic normality in afairly general context, which the LSE n0 does not provide.We now examine the usefulness of the limit distributional properties of nI and nF by

examining the size of Wald tests. We computed

W I ¼ðnI � nÞ2n

½Gðg; dÞ�1Kðg; dÞGðg; dÞ�1�ð1Þ; WF ¼

ðnF � nÞ2n

½ bAbB bA0�ð1Þ , (58)


where ½��ðiÞ denotes ith diagonal element. Empirical sizes, with respect to nominal sizesa ¼ 0:05 and 0.1, again across 1000 replications, are reported in Tables 12–17, for each ofthe ðb1; b2Þ for which biases were given.

With correct specification, even for b1 ¼ b2 ¼ 0 (Table 12), sizes of the infeasible statisticW I are somewhat too large, and autocorrelation in ut exacerbates this, with the caseb1a0, b2 ¼ 0 again worse than b1 ¼ 0, b2a0, but not necessarily worse than b1 ¼ b2a0(Tables 13–15). Results for a ¼ 0:1 are clearly better than for a ¼ 0:05. Overall, there is

Table 13

Empirical sizes of W I and WF, b1 ¼ b2 ¼ 0:9, correct specification

r g a 0.05 0.10

n 64 64 128 128 256 256 64 64 128 128 256 256

d W I WF W I WF W I WF W I WF W I WF W I WF

�0.5 0 0.4 0.114 0.020 0.092 0.027 0.084 0.037 0.184 0.035 0.161 0.060 0.132 0.069

0.2 0.4 0.109 0.029 0.098 0.037 0.074 0.049 0.180 0.053 0.158 0.066 0.138 0.082

0.7 1 0.112 0.029 0.097 0.032 0.082 0.044 0.182 0.047 0.161 0.065 0.136 0.078

0 0 0.4 0.122 0.024 0.080 0.022 0.077 0.028 0.187 0.053 0.150 0.044 0.129 0.066

0.2 0.4 0.125 0.025 0.092 0.017 0.063 0.009 0.191 0.036 0.146 0.030 0.130 0.025

0.7 1 0.125 0.037 0.079 0.022 0.075 0.016 0.192 0.051 0.146 0.046 0.122 0.044

0.5 0 0.4 0.112 0.024 0.097 0.030 0.067 0.033 0.177 0.049 0.160 0.055 0.145 0.052

0.2 0.4 0.118 0.020 0.094 0.031 0.071 0.055 0.182 0.053 0.161 0.060 0.139 0.076

0.7 1 0.121 0.025 0.090 0.033 0.073 0.046 0.179 0.046 0.165 0.059 0.133 0.072

0.75 0 0.4 0.115 0.018 0.100 0.023 0.079 0.022 0.185 0.041 0.161 0.041 0.151 0.048

0.2 0.4 0.107 0.038 0.096 0.066 0.081 0.107 0.188 0.074 0.162 0.098 0.146 0.149

0.7 1 0.112 0.034 0.101 0.049 0.079 0.053 0.181 0.066 0.159 0.078 0.141 0.092

Table 12

Empirical sizes of W I and WF, b1 ¼ b2 ¼ 0, correct specification

r g a 0.05 0.10

n 64 64 128 128 256 256 64 64 128 128 256 256


�0.5 0 0.4 0.076 0.137 0.072 0.091 0.068 0.059 0.124 0.174 0.124 0.142 0.122 0.102

0.2 0.4 0.076 0.130 0.059 0.099 0.058 0.075 0.134 0.164 0.117 0.141 0.130 0.115

0.7 1 0.073 0.137 0.066 0.102 0.060 0.077 0.129 0.175 0.118 0.139 0.128 0.116

0 0 0.4 0.078 0.093 0.053 0.080 0.057 0.055 0.136 0.152 0.112 0.122 0.125 0.117

0.2 0.4 0.077 0.055 0.054 0.033 0.062 0.034 0.133 0.082 0.104 0.073 0.114 0.066

0.7 1 0.076 0.074 0.058 0.060 0.053 0.055 0.134 0.128 0.105 0.102 0.120 0.099

0.5 0 0.4 0.074 0.131 0.055 0.080 0.055 0.066 0.136 0.164 0.119 0.122 0.117 0.097

0.2 0.4 0.073 0.114 0.055 0.094 0.054 0.081 0.141 0.146 0.120 0.135 0.111 0.108

0.7 1 0.068 0.115 0.055 0.083 0.050 0.080 0.140 0.154 0.121 0.127 0.116 0.116

0.75 0 0.4 0.075 0.124 0.059 0.076 0.063 0.037 0.136 0.153 0.112 0.104 0.116 0.067

0.2 0.4 0.073 0.170 0.058 0.146 0.069 0.093 0.143 0.207 0.113 0.183 0.116 0.137

0.7 1 0.076 0.145 0.060 0.111 0.064 0.075 0.143 0.178 0.113 0.148 0.110 0.116

ARTICLE IN PRESS

Table 15

Empirical sizes of W I and WF, b1 ¼ 0:9, b2 ¼ 0, correct specification

r g a 0.05 0.10

n 64 64 128 128 256 256 64 64 128 128 256 256


�0.5 0 0.4 0.101 0.036 0.081 0.030 0.068 0.030 0.171 0.059 0.140 0.055 0.115 0.070

0.2 0.4 0.105 0.030 0.087 0.025 0.060 0.017 0.178 0.055 0.139 0.040 0.123 0.031

0.7 1 0.101 0.033 0.086 0.031 0.061 0.033 0.175 0.065 0.140 0.055 0.119 0.062

0 0 0.4 0.097 0.032 0.086 0.037 0.071 0.042 0.162 0.062 0.157 0.080 0.125 0.074

0.2 0.4 0.090 0.031 0.091 0.025 0.077 0.020 0.166 0.057 0.150 0.044 0.127 0.039

0.7 1 0.092 0.034 0.089 0.034 0.070 0.030 0.155 0.056 0.150 0.066 0.124 0.056

0.5 0 0.4 0.112 0.037 0.073 0.028 0.053 0.038 0.165 0.054 0.141 0.057 0.101 0.071

0.2 0.4 0.097 0.019 0.078 0.029 0.064 0.018 0.161 0.044 0.139 0.051 0.120 0.045

0.7 1 0.109 0.026 0.082 0.030 0.060 0.034 0.164 0.050 0.147 0.062 0.110 0.059

0.75 0 0.4 0.117 0.025 0.082 0.022 0.051 0.026 0.185 0.047 0.133 0.048 0.104 0.060

0.2 0.4 0.107 0.022 0.078 0.026 0.065 0.024 0.173 0.044 0.133 0.038 0.114 0.037

0.7 1 0.111 0.019 0.081 0.029 0.058 0.031 0.184 0.047 0.143 0.058 0.106 0.053

Table 14

Empirical sizes of W I and WF, b1 ¼ 0, b2 ¼ 0:5, correct specification

r g a 0.05 0.10

n 64 64 128 128 256 256 64 64 128 128 256 256


�0.5 0 0.4 0.067 0.076 0.067 0.049 0.055 0.057 0.125 0.103 0.117 0.081 0.100 0.088

0.2 0.4 0.067 0.051 0.063 0.036 0.055 0.047 0.119 0.068 0.119 0.056 0.094 0.072

0.7 1 0.067 0.063 0.066 0.044 0.058 0.052 0.122 0.082 0.120 0.067 0.103 0.073

0 0 0.4 0.069 0.041 0.067 0.046 0.059 0.038 0.113 0.065 0.122 0.072 0.106 0.078

0.2 0.4 0.066 0.022 0.064 0.022 0.065 0.017 0.114 0.030 0.120 0.035 0.112 0.035

0.7 1 0.070 0.035 0.067 0.035 0.065 0.025 0.114 0.048 0.125 0.056 0.107 0.056

0.5 0 0.4 0.062 0.070 0.054 0.056 0.049 0.053 0.124 0.098 0.115 0.078 0.105 0.075

0.2 0.4 0.061 0.046 0.053 0.039 0.049 0.041 0.127 0.065 0.110 0.057 0.103 0.060

0.7 1 0.066 0.062 0.051 0.051 0.047 0.044 0.127 0.087 0.118 0.063 0.102 0.067

0.75 0 0.4 0.073 0.092 0.055 0.059 0.054 0.063 0.145 0.116 0.107 0.082 0.096 0.083

0.2 0.4 0.069 0.079 0.054 0.066 0.057 0.064 0.131 0.102 0.104 0.091 0.099 0.087

0.7 1 0.067 0.088 0.058 0.055 0.051 0.049 0.137 0.113 0.106 0.078 0.103 0.068


improvement as n increases, and even for small n the performance of W I seems quitesatisfactory. Predictably, mis-specification (Table 16) plays havoc, producing sizes that areunacceptably high, especially for a ¼ 0:05. With over-specification, performance is againgood, though we would not expect high power.For the feasible statistic WF, with correct specification and no autocorrelation in ut

(Table 12), sizes are worse than for W I, with less evidence of settling down as n increases

ARTICLE IN PRESS

Table 17

Empirical sizes of W I and WF, b1 ¼ b2 ¼ 0, over-specification

r g a 0.05 0.10

n 64 64 128 128 256 256 64 64 128 128 256 256


�0.5 0 0.4 0.091 0.013 0.072 0.034 0.066 0.031 0.143 0.035 0.109 0.065 0.112 0.067

0.2 0.4 0.084 0.014 0.065 0.046 0.053 0.074 0.139 0.034 0.115 0.083 0.099 0.124

0.7 1 0.088 0.018 0.067 0.047 0.058 0.051 0.137 0.039 0.112 0.084 0.105 0.097

0 0 0.4 0.078 0.027 0.061 0.022 0.050 0.036 0.127 0.046 0.115 0.041 0.100 0.064

0.2 0.4 0.072 0.021 0.054 0.040 0.047 0.054 0.135 0.048 0.107 0.068 0.086 0.102

0.7 1 0.075 0.022 0.052 0.029 0.049 0.050 0.132 0.040 0.104 0.064 0.094 0.079

0.5 0 0.4 0.068 0.027 0.063 0.026 0.056 0.028 0.124 0.047 0.118 0.051 0.105 0.061

0.2 0.4 0.071 0.028 0.064 0.032 0.061 0.046 0.113 0.042 0.116 0.057 0.110 0.087

0.7 1 0.065 0.024 0.056 0.035 0.060 0.039 0.120 0.046 0.110 0.061 0.110 0.074

0.75 0 0.4 0.085 0.031 0.072 0.025 0.060 0.018 0.144 0.051 0.129 0.054 0.113 0.042

0.2 0.4 0.074 0.026 0.073 0.045 0.057 0.048 0.138 0.052 0.126 0.076 0.114 0.087

0.7 1 0.080 0.026 0.080 0.039 0.058 0.033 0.143 0.060 0.125 0.065 0.112 0.063

Table 16

Empirical sizes of W I and WF, b1 ¼ b2 ¼ 0:5, mis-specification

r g a 0.05 0.10

n 64 64 128 128 256 256 64 64 128 128 256 256


�0.5 0 0.4 0.274 0.072 0.250 0.244 0.255 0.813 0.349 0.179 0.333 0.498 0.341 0.957

0.2 0.4 0.258 0.001 0.228 0.009 0.228 0.085 0.331 0.017 0.317 0.053 0.317 0.239

0.7 1 0.270 0.025 0.243 0.068 0.233 0.392 0.343 0.056 0.331 0.215 0.334 0.682

0 0 0.4 0.258 0.477 0.245 0.807 0.248 0.988 0.344 0.621 0.319 0.882 0.325 0.995

0.2 0.4 0.242 0.160 0.214 0.277 0.229 0.459 0.327 0.258 0.296 0.404 0.310 0.679

0.7 1 0.255 0.302 0.229 0.565 0.241 0.842 0.339 0.435 0.308 0.701 0.322 0.956

0.5 0 0.4 0.264 0.702 0.246 0.904 0.248 0.988 0.356 0.767 0.324 0.938 0.324 0.992

0.2 0.4 0.245 0.295 0.230 0.399 0.224 0.631 0.341 0.371 0.303 0.467 0.317 0.733

0.7 1 0.253 0.498 0.239 0.726 0.239 0.944 0.347 0.598 0.306 0.778 0.325 0.962

0.75 0 0.4 0.274 0.767 0.244 0.941 0.251 0.997 0.360 0.820 0.329 0.965 0.333 0.997

0.2 0.4 0.249 0.320 0.221 0.407 0.218 0.661 0.336 0.403 0.310 0.495 0.313 0.768

0.7 1 0.262 0.544 0.240 0.734 0.238 0.963 0.350 0.623 0.318 0.815 0.318 0.978

J. Hualde, P.M. Robinson / Journal of Econometrics 140 (2007) 450–484 471

and more variation across parameter values, and they are sometimes actually less thannominal values. With autocorrelation (Tables 13–15), sizes are emphatically too small andmostly further from the nominal values than the corresponding W I are in the oppositedirection, though this is by no means always the case, and sometimes the results areextraordinarily good. As expected, the effect of mis-specification is more dramatic than for


W I. With over-specification (Table 17), sizes are mainly less than nominal values, but ingeneral approximate them as n increases. Our overall experience with WF is quiteencouraging.While we have stressed estimation of n, estimates of d and g would also be of interest in

an empirical analysis of fractional cointegration, and so we also give some space to theperformance of ed and eg, and to Wald tests for d and g based on Theorem 4.1.Tables 18 and 19 report Monte Carlo bias and SD of ed for the same values of b2 (0, 0.5,

0.9) and n (64, 128, 256) as before, again based on 1000 replications. However, we fixr ¼ 0:5 here, using the same estimates of ed computed in this case for the feasible estimatesnF and Wald statistics WF discussed previously. We report results for minimization of bothS0ðdÞ and S1ðdÞ (see (42), (51)), so that S0ðdÞ with b2 ¼ 0 and S1ðdÞ with b2a0 bothcorrespond to correct specification, S1ðdÞ with b2 ¼ 0 to over-specification and S0ðdÞ withb2a0 to mis-specification.Biases from S0ðdÞ with b2 ¼ 0 look satisfactory even for n ¼ 64, and decrease in n. For

S1ðdÞ with b2 ¼ 0:5, 0:9, there is some deterioration, but performance is still acceptable.For S1ðdÞ with b2 ¼ 0 results are worse, but small reductions in D have a large positiveimpact on ed. In this case the negative bias of ed is somehow expected, as the estimated(nonexistent) AR component in u2t accounts for some of the autocorrelation structure.

Table 19

Monte Carlo SD of ed, r ¼ 0:5

n 64 128 256

Estimationnb2 0 0.5 0.9 0 0.5 0.9 0 0.5 0.9

S0ðdÞ 0.110 0.149 0.107 0.072 0.102 0.075 0.046 0.070 0.057

S1ðdÞ 0.419 0.258 0.286 0.409 0.216 0.237 0.373 0.182 0.165

Table 20

Empirical sizes ða ¼ 0:05Þ of W d, r ¼ 0:5

n 64 128 256

Estimationnb2 0 0.5 0.9 0 0.5 0.9 0 0.5 0.9

S0ðdÞ 0.087 0.806 1.00 0.061 0.941 1.00 0.085 1.00 1.00

S1ðdÞ 0.367 0.170 0.158 0.382 0.149 0.159 0.276 0.135 0.106

Table 18

Monte Carlo bias of ed, r ¼ 0:5

n 64 128 256

Estimationnb2 0 0.5 0.9 0 0.5 0.9 0 0.5 0.9

S0ðdÞ �0.016 0.403 0.852 �0.001 0.415 0.868 �0.003 0.417 0.875

S1ðdÞ �0.325 �0.170 0.161 �0.286 �0.121 0.114 �0.207 �0.072 0.059


Unsurprisingly, there is severe bias, increasing with b2, when S0ðdÞ is used with b2a0. SDin the correctly specified and over-specified cases is, as expected, worse for AR ut.

Tables 20 and 21 report Monte Carlo sizes of Wald statistics for d,

W d ¼ðed� dÞ2n

½ bAbB bA0�ð3Þ , (59)

based on Theorem 4.1, with respect to nominal sizes a ¼ 0:05, 0.1, respectively. Asexpected, under mis-specification they are far too large. Otherwise, while still too large(especially for over-specification) in some cases they are not bad, and decrease in n, onesfor a ¼ 0:1 being best.

Tables 22–25 give corresponding results for eg, with b1 ¼ b2 ¼ b taking values 0, 0.5, 0.9.Our estimation procedure being sequential, we consider two categories, S0ðdÞ followed byT0ðcÞ (44), and S1ðdÞ followed by T1ðcÞ (53), so that in the former case there is correctspecification for b ¼ 0 and mis-specification for ba0, and in the latter, over-specificationfor b ¼ 0 and correct specification for ba0. The bias and SD results of Tables 22 and 23exhibit some variation across ðg; dÞ, and surprisingly biases are much less for b ¼ 0:9 thanfor b ¼ 0:5, possibly due to cancellation. For the Wald statistic

W g ¼ðeg� gÞ2n

½ bAbB bA0�ð2Þ , (60)

more size variation is also found, in Tables 24 and 25, than for W d, with results forb ¼ 0:9 being substantially better than for other cases under correct specification.

For the second part of the study, we focus on a situation where it is straightforward toderive asymptotically efficient estimates of n, and we compare their Monte Carlo variance

Table 21

Empirical sizes ða ¼ 0:10Þ of W d, r ¼ 0:5

n 64 128 256

Estimationnb2 0 0.5 0.9 0 0.5 0.9 0 0.5 0.9

S0ðdÞ 0.130 0.837 1.00 0.117 0.955 1.00 0.104 1.00 1.00

S1ðdÞ 0.444 0.201 0.205 0.419 0.190 0.195 0.304 0.162 0.122

Table 22

Monte Carlo bias of eg, r ¼ 0:5, b1 ¼ b2 ¼ b

Estimation g n 64 128 256

dnb 0 0.5 0.9 0 0.5 0.9 0 0.5 0.9

S0ðdÞ, T0ðcÞ 0 0.4 �0.038 0.396 0.842 �0.012 0.416 0.868 �0.003 0.421 0.874

0.2 0.4 �0.048 0.368 0.825 �0.018 0.400 0.857 �0.005 0.413 0.870

0.7 1 �0.040 0.387 0.839 �0.012 0.410 0.865 �0.005 0.418 0.872

S1ðdÞ, T1ðcÞ 0 0.4 �0.250 �0.309 0.044 �0.192 �0.218 0.066 �0.105 �0.153 0.042

0.2 0.4 �0.422 �0.345 0.001 �0.361 �0.255 0.025 �0.256 �0.177 0.012

0.7 1 �0.336 �0.325 0.026 �0.279 �0.233 0.049 �0.176 �0.164 0.030

ARTICLE IN PRESS

Table 24

Empirical sizes ða ¼ 0:05Þ of W g, r ¼ 0:5, b1 ¼ b2 ¼ b


dnb 0 0.5 0.9 0 0.5 0.9 0 0.5 0.9

S0ðdÞ, T0ðcÞ 0 0.4 0.114 0.951 1.00 0.096 1.00 1.00 0.075 1.00 1.00

0.2 0.4 0.105 0.946 1.00 0.091 1.00 1.00 0.084 1.00 1.00

0.7 1 0.110 0.945 1.00 0.075 0.999 1.00 0.092 1.00 1.00

S1ðdÞ, T1ðcÞ 0 0.4 0.243 0.273 0.115 0.352 0.263 0.094 0.322 0.249 0.080

0.2 0.4 0.388 0.301 0.108 0.471 0.297 0.074 0.455 0.271 0.068

0.7 1 0.329 0.274 0.104 0.400 0.268 0.089 0.362 0.254 0.073

Table 25

Empirical sizes ða ¼ 0:10Þ of W g, r ¼ 0:5, b1 ¼ b2 ¼ b


dnb 0 0.5 0.9 0 0.5 0.9 0 0.5 0.9

S0ðdÞ, T0ðcÞ 0 0.4 0.182 0.970 1.00 0.136 1.00 1.00 0.148 1.00 1.00

0.2 0.4 0.174 0.963 1.00 0.144 1.00 1.00 0.129 1.00 1.00

0.7 1 0.163 0.958 1.00 0.139 1.00 1.00 0.182 1.00 1.00

S1ðdÞ, T1ðcÞ 0 0.4 0.330 0.329 0.152 0.421 0.324 0.126 0.375 0.304 0.110

0.2 0.4 0.450 0.349 0.136 0.533 0.357 0.104 0.501 0.331 0.099

0.7 1 0.380 0.329 0.140 0.453 0.328 0.118 0.416 0.312 0.099

Table 23

Monte Carlo SD of eg, r ¼ 0:5, b1 ¼ b2 ¼ b


dnb 0 0.5 0.9 0 0.5 0.9 0 0.5 0.9

S0ðdÞ, T0ðcÞ 0 0.4 0.123 0.126 0.128 0.081 0.087 0.087 0.053 0.062 0.062

0.2 0.4 0.112 0.117 0.114 0.077 0.086 0.083 0.055 0.063 0.060

0.7 1 0.118 0.122 0.123 0.079 0.087 0.086 0.058 0.065 0.064

S1ðdÞ, T1ðcÞ 0 0.4 0.330 0.260 0.252 0.309 0.223 0.183 0.279 0.195 0.136

0.2 0.4 0.378 0.239 0.225 0.370 0.214 0.157 0.347 0.195 0.113

0.7 1 0.355 0.248 0.240 0.340 0.220 0.170 0.316 0.195 0.126


with that of nI. We consider only the case where in (55) b1 ¼ 0:5; 0:9, b2 ¼ 0. The firstefficient estimate we calculate is the Gaussian MLE with known b1, which in view of (18) isidentical to the LSE of n in the equation

ytðgÞ � b1yt�1ðgÞ ¼ nðxtðgÞ � b1xt�1ðgÞÞ þ rxtðdÞ þ e1:2t. (61)

ARTICLE IN PRESS

Table 26

Efficiency ratios

r g b 0.5 0.9

n 64 128 256 64 128 256 64 128 256 64 128 256

d r1 r1 r1 r2 r2 r2 r1 r1 r1 r2 r2 r2

�0.5 0 0.4 1.18 1.12 1.15 1.06 1.12 1.10 7.46 5.12 3.63 1.38 3.40 3.14

0.2 0.4 5.97 5.00 4.23 1.00 1.43 2.37 45.9 33.6 25.8 1.09 16.3 19.8

0.7 1 1.86 1.54 1.40 0.988 1.09 1.15 16.6 11.6 8.52 1.35 7.47 7.30

0 0 0.4 1.22 1.13 1.12 1.00 1.07 1.05 6.77 5.14 3.60 1.67 3.20 3.26

0.2 0.4 6.11 5.06 4.27 1.07 1.29 2.32 37.9 31.3 23.8 1.91 11.5 20.3

0.7 1 1.97 1.57 1.38 0.931 1.02 1.13 14.3 11.3 8.16 2.12 6.82 7.41

0.5 0 0.4 1.13 1.14 1.16 1.01 1.11 1.13 7.64 5.10 3.63 1.44 3.69 3.03

0.2 0.4 6.19 4.61 4.19 1.04 1.35 2.12 45.7 32.8 27.4 0.900 4.71 18.8

0.7 1 1.83 1.50 1.39 0.961 1.07 1.12 16.9 11.5 8.76 1.26 7.01 7.11

0.75 0 0.4 1.16 1.18 1.23 1.16 1.21 1.23 9.14 5.77 3.91 1.53 3.67 2.72

0.2 0.4 6.11 4.49 4.15 0.985 1.39 1.66 61.1 44.2 35.3 0.669 15.1 18.5

0.7 1 1.79 1.42 1.38 1.09 1.07 1.01 21.6 14.4 10.4 1.21 7.99 6.80


We also consider a two-stage approach where in the first step we estimate b1 by

bb1 ¼

Pnt¼2 bu1tbu1;t�1Pn

t¼2 bu21;t�1

; bu1t ¼ ytðgÞ � nIxtðgÞ, (62)

and in the second compute the estimate of n as in the infeasible situation but replacing b1

by bb1. We report in Table 26 the ‘‘efficiency ratios’’ r1 and r2, which are the Monte Carlovariance of nI divided by either that of the Gaussian MLE with known b1 (r1) or that of thefeasible estimate ðr2Þ. Note that r1 and r2 are invariant to the value of Eðe22tÞ, provided theestimates are computed from the same ut sequence. In general, results are little affected bychanges in r, and the loss of efficiency of nI is larger for smaller b and larger b1. Asexpected, nI is more inefficient relative to the infeasible MLE, and this is accentuated thelarger and smaller b1 and b are, respectively. In the comparison with the infeasible MLE,the efficiency loss is reduced as n increases, the reverse happening for the feasible estimate.On the limited evidence provided by our simple experiment, it seems worth improvingefficiency by incorporating restrictions on y. Undoubtedly more iterations could furtherimprove matters.

6. Empirical examples

Using a methodology involving the LSE and NBLSE of n, and semiparametric estimatesof n, Robinson and Marinucci (2003) found evidence that bo 1

2in some of the bivariate

macroeconomic series originally examined by Engle and Granger (1987) and Campbell andShiller (1987), who investigated only the possibility of CIð1; 0Þ cointegration. Thisexperience motivates application of our present approach to the same data.

The main departure from the methodology of the previous section was an attempt atgreater realism by determining p in (14) from the data, rather than assuming its value


a priori. For this purpose, we need proxies for the uit, which can only be obtained byoperating on the observed yt, xt series with preliminary estimates of n, g and d. To estimaten here we used the LSE n0, given by (57) (and computed by Robinson and Marinucci,2003). To estimate g and d, we used semiparametric estimates (already computed byMarinucci and Robinson, 2001; Robinson and Marinucci, 2003) in order to providerobustness against a range of short-memory specifications for ut. Specifically, the estimatesof g and d computed by these authors were of log periodogram (LP) and semiparametricGaussian (SG) type (of the precise form considered by Robinson, 1995a, b), using variousbandwidths and based either on raw data/residuals or on first differenced ones followed byadding back 1. For asymptotic theory under stationarity we appeal to Robinson (1995a,b), and under nonstationarity, to Velasco (1999a,b). Using preliminary estimates of g, d, n,sample correlograms and partial correlograms were computed (to lag length 36) in order toidentify, in the spirit of Box and Jenkins (1971), the AR orders of the uit. For each data set,this was done for both the smallest and largest of the various univariate estimates ofmemory parameters based on the xt/residuals provided by Marinucci and Robinson (2001)and Robinson and Marinucci (2003). When this led to contradictory models for the uit theanalysis was continued with both.We also took this opportunity to examine the matter of truncation, which in one

form or another always arises with fractional models, and perhaps most acutely whennonstationary data are involved. When estimated innovations from a stationary fractionalmodel are computed, the (infinite) AR representation has to be truncated because the databegin at time ‘‘1’’, not at time ‘‘�1’’. In our model (1) for nonstationary data, thetruncation is actually inherent in the model, so strictly speaking there is no ‘‘error’’associated with it. However, the model reflects the time when the data begin, and if we wereto drop the first observation, say, and start at the next one, the degree of filtering applied toall subsequent observations would change, and this could have a marked effect, especiallywith nonstationary data, even though filtering is here applied after de-meaning. To checkfor stability with respect to this phenomenon, we thus report computations based on thelast n0 ¼ n� j observations, for j ¼ 0; 1; . . . ; 10.We look first at Engle and Granger’s (1987) quarterly consumption and income data,

1947Q1–1981Q2 ðn ¼ 138Þ. They found evidence of CIð1; 0Þ cointegration, but did notinvestigate fractional possibilities. Marinucci and Robinson’s (2001) analysis tends tosupport the notion of d ¼ 1, but not of g ¼ 0, with positive estimates of g that sometimesfall in the nonstationary region, thereby hinting that bo 1

2is possible.

Taking y ¼ consumption, x ¼ income, the LSE of n, from Robinson and Marinucci(2003), is 0.229. The two preliminary estimates of d taken from Marinucci and Robinson(2001) were 0.89 (LP applied to first differences of x and adding back 1, with bandwidth22) and 1.08 (SG applied to first differences of x and adding back 1, with bandwidth 40). Ineach case, the corresponding correlograms and partial correlograms suggested modellingu2t as white noise. The preliminary estimates of g were 0.19 (LP applied to raw residualswith bandwidth 22) and 0.87 (SG applied to first differenced residuals and adding back 1,with bandwidth 40). This large gap results in identification of an AR(1) u1t in the first case,and white noise u1t in the second. In view of these investigations, we carried out twodistinct cointegration analyses, one with p ¼ 0 in (14), the other with p ¼ 1 in (14) withB1 ¼ diagðb1; 0Þ.In case u1t and u2t are both white noise, Table 27 reports values of the following statistics

with n replaced by n0 ¼ n� j, j ¼ 0; . . . ; 10: bn ¼ bnðeg;edÞ, ed, eg, and their estimated standard

ARTICLE IN PRESS

Table 27

Consumption and income: ut white noise

n0 138 137 136 135 134 133 132 131 130 129 128

bn 0.223 0.222 0.251 0.252 0.251 0.248 0.247 0.242 0.243 0.245 0.246

SEðbnÞ 0.027 0.031 0.024 0.022 0.023 0.022 0.023 0.021 0.022 0.023 0.023ed 1.07 1.07 1.09 1.15 1.15 1.17 1.18 1.18 1.18 1.18 1.18

SEðedÞ 0.028 0.028 0.059 0.068 0.073 0.080 0.083 0.082 0.083 0.082 0.084eg 0.714 0.745 0.715 0.692 0.694 0.696 0.696 0.685 0.692 0.694 0.693

SEðegÞ 0.084 0.092 0.087 0.087 0.089 0.090 0.090 0.089 0.093 0.093 0.093br �0.024 �0.055 �0.085 �0.090 �0.090 �0.086 �0.085 �0.072 �0.073 �0.073 �0.074

r �0.195 �0.189 �0.297 �0.311 �0.310 �0.294 �0.285 �0.247 �0.251 �0.250 �0.253


errors SEðbnÞ, SEðedÞ, SEðegÞ from Theorem 4.1, br ¼ brðeg;edÞ, which is the estimated coefficientof xtð

edÞ in (18) for p ¼ 0 with eg, ed, replacing g, d, and the correlation Corrðe1t; e2tÞ isestimated by

r ¼ brðeg;edÞðbs22=bs11Þ1=2, (63)

where

bs11 ¼ n�1X

t

0ðytðegÞ �bnðeg;edÞxtðegÞÞ2; bs22 ¼ n�1

Xt

0x2

t ðedÞ, (64)

withP0

t meaning summation over the last n0 observations.As n0 falls, bn and ed tend to increase, and eg to decrease, but there is high stability for

n0p133, and generally the changes are insignificant relative to standard errors, bn for n0 ¼

128 being one standard error larger than bn for n0 ¼ 138 (and also somewhat larger than theLSE). The estimates of d and g are certainly consistent with bo 1

2. More especially,

exploiting the standard errors provided by our approach, the hypothesis that d ¼ 1 seemsrejectable against d41, but (though we do not report standard errors of eb ¼ ed� eg, whichcould be computed using Theorem 4.1) there is no evidence against bo 1

2. Substantial

negative contemporaneous correlation between u1t and u2t is suggested. Dropping the firstobservation does not affect ed, since x1ðdÞ ¼ x1 for any d.

The analysis with AR(1) u1t in Table 28 presents a very different picture. Here, we alsoreport bb1 and cnb1, which are estimated coefficients of yt�1ðegÞ and �xt�1ðegÞ in the regression(cf. (18)) used to compute bn and br, and bs11 in r is now the sample average of the squaredresiduals from the regression of ytðegÞ � bnðeg;edÞxtðegÞ on yt�1ðegÞ �bnðeg;edÞxt�1ðegÞ. In view of theAR(1) component, we effectively lose one observation, so n0 goes from 127 to 137, theeffect of then dropping the first observation being very striking, but the estimatessubsequently exhibiting little variation across n0. As u2t is still supposed to be white noise,the estimates of d are identical to those in Table 27, but those of g are all now less thanzero, although not significantly, Engle and Granger’s (1987) CIð1; 0Þ conclusion now beingsupported. The AR component in u1t clearly accounts for the bulk of the autocorrelationin cointegrating errors, resulting in the small estimates of g, which are based on AR-transformed data. The MLE, which estimates g simultaneously with b1 and the otherparameters, would allow AR and fractional features to compete more favourably,though, as discussed in Section 1, it would require much heavier computation. Notice thatcnb1 looks quite consistent with bn and bb1, possibly providing some support for the present

ARTICLE IN PRESS

Table 29

logM1 and log GNP: ut white noise

n0 90 89 88 87 86 85 84 83 82 81 80

bn 0.704 0.740 0.578 0.564 0.608 0.640 0.638 0.644 0.643 0.649 0.658

SEðbnÞ 0.077 0.145 0.040 0.058 0.058 0.054 0.054 0.061 0.061 0.061 0.061ed 1.06 1.06 1.91 1.88 1.74 1.63 1.64 1.63 1.63 1.61 1.59

SEðedÞ 0.057 0.057 0.025 0.121 0.117 0.068 0.083 0.082 0.086 0.084 0.076eg 0.884 0.928 1.12 1.16 1.11 1.09 1.09 1.11 1.10 1.10 1.09

SEðegÞ 0.108 0.122 0.121 0.121 0.131 0.136 0.138 0.140 0.140 0.139 0.139br �0.134 �0.222 �0.261 �0.268 �0.315 �0.352 �0.350 �0.379 �0.376 �0.391 �0.408

r �0.839 �0.543 �0.402 �0.413 �0.455 �0.475 �0.473 �0.507 �0.504 �0.515 �0.522

Table 28

Consumption and income: u1t AR(1), u2t white noise

n0 137 136 135 134 133 132 131 130 129 128 127

bn 0.163 0.257 0.264 0.267 0.263 0.265 0.258 0.261 0.262 0.263 0.262

SEðbnÞ 0.179 0.055 0.054 0.057 0.053 0.056 0.051 0.056 0.055 0.055 0.054ed 1.07 1.09 1.15 1.15 1.17 1.18 1.18 1.18 1.18 1.18 1.18

SEðedÞ 0.028 0.059 0.068 0.073 0.080 0.083 0.082 0.083 0.082 0.084 0.084eg �0.101 �0.167 �0.183 �0.184 �0.184 �0.179 �0.193 �0.180 �0.184 �0.189 �0.186

SEðegÞ 0.234 0.187 0.181 0.183 0.185 0.193 0.180 0.193 0.192 0.191 0.192bb1 0.798 0.843 0.842 0.839 0.837 0.832 0.845 0.842 0.842 0.842 0.843cnb1 0.116 0.221 0.228 0.230 0.226 0.226 0.223 0.225 0.226 0.227 0.226br 0.009 �0.088 �0.102 �0.104 �0.102 �0.105 �0.093 �0.096 �0.094 �0.095 �0.094

r 0.009 �0.128 �0.122 �0.119 �0.126 �0.127 �0.128 �0.128 �0.119 �0.117 �0.121


specification. Note also that the various bn are larger than before, but that, if indeed b4 12,

their standard errors have to be interpreted with caution, as bn is then no longerasymptotically normal.Engle and Granger (1987) found no evidence of CIð1; 0Þ cointegration between logM1

ðyÞ and logGNP ðxÞ, on the basis of 90 quarterly observations, 1959Q1–1981Q2. Marinucciand Robinson (2001) fractional analysis admitted the possibility of cointegration, withbo 1

2. In our preliminary analysis of autocorrelation in ut, we took from their estimates of d

the values 1.22 (SG applied to first differences of x and adding back 1, using bandwidth 30)and 1.36 (LP applied to first differences of x and adding back 1, using bandwidth 22), andfrom their estimates of g the values 0.76, 1.2, both LP estimates but applied, respectively, toraw residuals using bandwidth 22 and first differences of residuals and adding back 1,using bandwidth 16. Employing also the LSE of n, 0.643, we found no evidence ofautocorrelation in ut, so proceeded to a cointegration analysis on the basis of p ¼ 0 in (14).The results are reported in Table 29. We found large variation across the largest n0, but agood degree of stability is then achieved, with substantially larger values of ed and eg (and oftheir standard errors). Clearly, ed significantly exceeds 1, while eg does not, and the resultingeb ¼ ed� eg are extremely close to the threshold value of 1

2. There is considerable negative

correlation between u1t and u2t, and for the smaller n0, bn is close to the LSE.Finally, we looked at the n ¼ 116 annual observations, 1871–1986, on stock prices ðyÞ

and dividends ðxÞ, analysed by Campbell and Shiller (1987). Their findings with respect to

ARTICLE IN PRESS

Table 30

Stock prices and dividends: ut white noise

n0 116 115 114 113 112 111 110 109 108 107 106

bn 32.7 32.7 32.2 31.9 31.7 31.8 31.7 32.0 32.1 32.1 32.1

SEðbnÞ 7.56 7.64 7.80 7.83 7.81 7.93 7.91 7.99 8.02 7.99 8.01ed 1.04 1.04 1.08 1.09 1.09 1.09 1.09 1.09 1.10 1.10 1.10

SEðedÞ 0.077 0.077 0.090 0.092 0.092 0.092 0.093 0.093 0.095 0.095 0.095eg 0.749 0.751 0.751 0.752 0.751 0.752 0.752 0.751 0.749 0.749 0.749

SEðegÞ 0.114 0.116 0.116 0.117 0.116 0.117 0.117 0.116 0.116 0.116 0.116br �8.97 �9.52 �9.13 �8.82 �8.56 �8.67 �8.54 �8.52 �8.64 �8.59 �8.69

r �0.299 �0.283 �0.272 �0.263 �0.256 �0.259 �0.255 �0.252 �0.255 �0.253 �0.256


CIð1; 0Þ cointegration were inconclusive, but Marinucci and Robinson’s (2001) andRobinson and Marinucci’s (2003) analyses again suggested the possibility of cointegrationwith bo 1

2. The preliminary estimates of d taken fromMarinucci and Robinson (2001) were

0.86 and 0.95, being SG based on first differences of x and adding back 1, withbandwidths, respectively, 30 and 40. The preliminary estimates of g were 0.57, 0.77, beingLP based on first differences of residuals and adding back 1, with bandwidth 30 and SG onraw residuals with bandwidth 22, respectively. We also used the LSE of n, 31. In this case,both g estimates suggested white noise u1t, while the d estimates variously suggested whitenoise and AR(1) u2t, but our subsequent fractional cointegration analysis produced eg and edthat were too close to admit the likelihood of any cointegration. Thus, we report, in Table30, only the results with both u1t and u2t white noise. There is little variation with n0, andstrong support for the unit-root hypothesis on d, and, since eg is significantly larger than 1

2at

the 5% level, cointegration with bo 12is certainly a possibility. We find that bn is somewhat

larger than the LSE value, though not significantly so.

Acknowledgements

This research was supported by ESRC Grants R000238212 and R000239936. The firstauthor’s research was also supported by the Fundacion Ramon Areces (Spain) and theSpanish Ministerio de Educacion y Ciencia through a ‘‘Juan de la Cierva’’ contract. Thesecond author’s research was also supported by a Leverhulme Trust Personal ResearchProfessorship. We are grateful for the comments of a referee which have led toimprovements in the paper.

Appendix A. Proof of Theorem 3.1

We prove first that F is nonsingular, which ensures existence of the inverses in (32).Define

Fþ ¼ Eð eZþt eZþ0t Þ; eZþt ¼ ðew0t; ew0t�1; . . . ; ew0t�pÞ0. (A.1)

It clearly suffices to show that Fþ is positive definite. Defining

Fþ¼ EðZtZ

0

tÞ; Zt ¼ ðw0t;w0t�1; . . . ;w

0t�pÞ0, (A.2)


for wt ¼ ðextðgÞ; u2t; u1tÞ0, from (29) it suffices to show that Fþ is positive definite, and

similarly, defining

Fþþ ¼ EðRZtZ0tR0Þ, (A.3)

where R is a full rank 3ðpþ 1Þ � 3ðpþ 1Þ matrix whose columns are orthonormal vectorssuch that

RZt ¼ ½xðgÞ0; u02; u

01�0, (A.4)

where xðgÞ ¼ ðextðgÞ; . . . ; ext�pðgÞÞ0, u2 ¼ ðu2t; . . . ; u2;t�pÞ

0, u1 ¼ ðu1t; . . . ; u1;t�pÞ0, it suffices to

show that Fþþ

is positive definite. Define the vectors

eðlÞ ¼ ð1; eil; . . . ; eiplÞ0; dðlÞ ¼ ð1� eilÞ�beðlÞ, (A.5)

and the 3ðpþ 1Þ � 2 matrix

EðlÞ ¼00 00 eðlÞ0

dðlÞ0 eðlÞ0 00

" #0, (A.6)

where 00 is a 1� ðpþ 1Þ vector of zeroes. Define by f ðlÞ the spectral density matrix of ut,and note from positive definiteness of O and finiteness of the Bj that the smallest eigenvalueof the Hermitian matrix f ðlÞ is bounded from below by a positive constant c, uniformly inl. Then we can write

Fþþ¼

Z p

�pEðlÞf ðlÞEð�lÞ0 dl, (A.7)

which for some c40 exceeds

c

Z p

�pEðlÞEð�lÞ0 dl ¼ c

C D 0

D0 Ipþ1 0

0 0 Ipþ1

264375 (A.8)

by a nonnegative definite matrix, where 0, C and D are ðpþ 1Þ � ðpþ 1Þ matrices,having ði; jÞth elements 0,

P1‘¼0 a‘a‘þji�jj and aj�i1ðjXiÞ, respectively, with aj ¼ ajðbÞ.

It thus suffices to show that C �DD0 is positive definite. But for a ðpþ 1Þ � 1 vectorz ¼ ðziÞ,

z0ðC �DD0Þz ¼X1‘¼1

ða‘zpþ1 þ � � � þ a‘þpz1Þ2, (A.9)

which is positive unless z ¼ 0 because a‘=a‘�1 ¼ ð‘ þ b� 1Þ=‘ is strictly increasing in ‘X1for bo1.We now have to show that

1

n

X0Ztðg; dÞZ0tðg; dÞ!pF, (A.10)

n�1=2X0

Ztðg; dÞe1:2;t!dNð0;CÞ, (A.11)

writingP0¼Pn

t¼pþ1. To prove (A.11), note first that it suffices to show

n�1=2X0 eZte1:2;t!dNð0;CÞ, (A.12)


because

E n�1=2X0fZtðg; dÞ � eZtge1:2;t

2pK

n

X0EkZtðg; dÞ � eZtk

2

pK

n

X0Xp

j¼1

Ex2t�jðgÞ

pK

n

X0Xp

j¼1

Z p

�p

X1s¼t�j

ase�isl

��2

kf ðlÞkdl

pK

n

Xn

t¼1

X1s¼t

a2s ! 0, ðA:13Þ

as n!1; by the Toeplitz lemma, the last inequality following because f ðlÞ is boundeddue to the assumption on the B‘. Write eZt ¼ Zat þ Zbt, where the first two elements of Zat,and the last 3p elements of Zbt, equal corresponding ones of eZt. Thus, Zbt is Ft�1-measurable and

Eðe1:2;t eZtjFt�1Þ ¼ Eðe1:2;tZatÞ þ ZbtEðe1:2;tjFt�1Þ ¼ 0 a:s. (A.14)

Further,

Eðe21:2;t eZteZ0tjFt�1Þ ¼ Eðe21:2;tZatZ

0atÞ þ Eðe21:2;tZatÞZ

0bt

þ ZbtEðe21:2;tZ0atÞ þ Eðe21:2;tÞZbtZ

0bt a:s. ðA:15Þ

and so

1

n

X0½Efe21:2;t eZt

eZ0tjFt�1g � Efe21:2;t eZteZ0tg�!p0, (A.16)

because Zbt and ZbtZ0bt � EðZbtZ

0btÞ are stationary and ergodic with zero means. Since

(A.15) has expectation C, (A.12) then follows from the Cramer–Wold device and Theorem1 of Brown (1971), noting that the Lindeberg condition in the latter reference is triviallysatisfied because e1:2;t eZt is stationary with finite variance. Thus (A.11) is proved. The proofof (A.10) follows from (A.13) and elementary inequalities. This concludes the proof of (32).The proof of the final statement of the theorem is omitted as it is standard given (32) andits proof.

Appendix B. Definition of A and B

For brevity we write ~G ¼ Gð~g; ~dÞ, ~y ¼ yð~g; ~dÞ, ~H ¼ Hð~dÞ, ~f ¼ fð~dÞ.We have

bA ¼ a01 a2 a3

00 a4 a5

00 0 a6

264375, (B.1)

where

a01 ¼ 10 ~G�1; a2 ¼ �10 ~yc ~s�1cc , (B.2)


a3 ¼ 10 ~yc ~s�1cc ~scd ~s

�1dd � 10 ~yd ~s

�1dd ; a4 ¼ �~s

�1cc , (B.3)

a5 ¼ ~s�1cc ~scd ~s�1dd ; a6 ¼ �~s

�1dd , (B.4)

in which

~yc ¼ ~G�1ð ~gc �~Gc~yÞ; ~yd ¼ ~G

�1ð ~gd �

~Gd~yÞ, (B.5)

~gc ¼ Q1

n

X0fZtcð~gÞytð~gÞ þ Ztð~g; ~dÞytcð~gÞg, (B.6)

~Gc ¼ Q1

n

X0fZtcð~gÞZ0tð~g; ~dÞ þ Ztð~g; ~dÞZ0tcð~gÞgQ

0, (B.7)

~gd ¼ Q1

n

X0Ztdð

~dÞytð~gÞ, (B.8)

~Gd ¼ Q1

n

X0fZtdð

~dÞZ0tð~g; ~dÞ þ Ztð~g; ~dÞZ0td ð~dÞgQ0, (B.9)

with

ytcð~gÞ ¼ logð1� LÞytð~gÞ, (B.10)

Ztcð~gÞ ¼ logð1� LÞfxtð~gÞ; 0;xt�1ð~gÞ; 0; yt�1ð~gÞ; . . . ;xt�pð~gÞ; 0; yt�pð~gÞg0, (B.11)

ZtdðedÞ ¼ logð1� LÞf0;xtð

edÞ; 0;xt�1ðedÞ; 0; . . . ; 0; xt�pð

edÞ; 0g0, (B.12)

and where

escc ¼1

n

X0~v2tc; escd ¼

1

n

X0~vtc ~vtd ; esdd ¼

1

n

X0~w2

td , (B.13)

with

~vtc ¼ ytcð~gÞ � ~y0cQZtð~g; ~dÞ � ~y

0QZtcð~gÞ, (B.14)

~vtd ¼ �~y0dQZtð~g; ~dÞ � ~y

0QZtd ð~dÞ, (B.15)

~wtd ¼ xtd ð~dÞ � ~f

0

dRX tð~dÞ � ~f

0RX tdð

~dÞ, (B.16)

xtdð~dÞ ¼ logð1� LÞxtð

~dÞ, (B.17)

X tdð~dÞ ¼ logð1� LÞX tð

~dÞ, (B.18)

~fd ¼~H�1ð ~hd � ~Hd

~fÞ, (B.19)

~hd ¼ R1

n

X0fX td ð

~dÞxtð~dÞ þ X tð

~dÞxtdð~dÞg, (B.20)

~Hd ¼ R1

n

X0fX tdð

~dÞX 0tð~dÞ þ X tð~dÞX 0tdð~dÞgR

0. (B.21)


We also have

bB ¼ 1

n

X0

e1:2;tð~g; ~dÞQZtð~g; ~dÞ

e1:2;tð~g; ~dÞ~vtc

e2tð~dÞ ~wtd

264375 e1:2;tð~g; ~dÞQZtð~g; ~dÞ

e1:2;tð~g; ~dÞ~vtc

e2tð~dÞ ~wtd

2643750

, (B.22)

where

e2tðdÞ ¼ xtðdÞ � ~f0RX tðdÞ. (B.23)

References

Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H., 2001. The distribution of realized stock return volatility.

Journal of Financial Economics 61, 43–76.

Baillie, R.T., Bollerslev, T., 1994a. Cointegration, fractional cointegration and exchange rate dynamics. The

Journal of Finance 49 (2), 737–745.

Baillie, R.T., Bollerslev, T., 1994b. The long memory of the forward premium. Journal of International Money

and Finance 13 (5), 565–571.

Beran, J., 1995. Maximum likelihood estimation of the differencing parameter for invertible short and long

memory autoregressive integrated moving average models. Journal of the Royal Statistical Society, Series B

57, 659–672.

Box, G.E.P., Jenkins, G.M., 1971. Time Series Analysis. Forecasting and Control. Holden-Day, San Francisco.

Brown, B.M., 1971. Martingale central limit theorems. Annals of Mathematical Statistics 42, 59–66.

Campbell, J.Y., Shiller, R.J., 1987. Cointegration and tests of present value models. Journal of Political Economy

95, 1062–1088.

Cheung, Y.W., Lai, K.S., 1993. A fractional cointegration analysis of purchasing power parity. Journal of

Business and Economic Statistics 11, 103–112.

Christensen, B.J., Nielsen, M.O., 2006. Asymptotic normality of narrow-band least squares in the stationary

fractional cointegration model and volatility forecasting. Journal of Econometrics 133, 343–371.

Christensen, B.J., Prabhala, N.R., 1998. The relation between implied and realized volatility. Journal of Financial

Economics 50, 125–150.

Crato, N., Rothman, P., 1994. A reappraisal of parity reversion for UK real exchange rates. Applied Economics

Letters 1, 139–141.

Dickey, D.A., Fuller, W.A., 1979. Distribution of estimators of autoregressive time series with a unit root. Journal

of the American Statistical Association 74, 427–431.

Diebold, F.X., Husted, S., Rush, M., 1991. Real exchange rates under the gold standard. Journal of Political

Economy 99, 1252–1271.

Dolado, J., Marmol, F., 1996. Efficient estimation of cointegrating relationships among higher order and

fractionally integrated processes. Banco de Espana-Servicio de Estudios, Documento de Trabajo 9617.

Dolado, J., Gonzalo, J., Mayoral, L., 2002. A fractional Dickey–Fuller test. Econometrica 70, 1963–2006.

Dueker, M., Startz, R., 1998. Maximum-likelihood estimation of fractional cointegration with an application to

US and Canadian bond rates. The Review of Economics and Statistics 80, 420–426.

Engle, R.F., Granger, C.W.J., 1987. Co-integration and error correction: representation, estimation and testing.

Econometrica 55, 251–276.

Fox, R., Taqqu, M.S., 1986. Large-sample properties of parameter estimates for strongly dependent processes.

Annals of Statistics 14, 517–532.

Giraitis, L., Surgailis, D., 1990. A central limit theorem for quadratic forms in strongly dependent linear variables

and its application to asymptotic normality of Whittle’s estimate. Probability Theory and Related Fields 86,

87–104.

Hosoya, Y., 1997. Limit theory with long-range dependence and statistical inference of related models. Annals of

Statistics 25, 105–137.

Hualde, J., 2002. Testing for the equality of orders of integration. Preprint.


Hualde, J., Robinson, P.M., 2004. Gaussian pseudo-maximum likelihood estimation of fractional time series

models. Preprint.

Jeganathan, P., 1999. On asymptotic inference in cointegrated time series with fractionally integrated errors.

Econometric Theory 15, 583–621.

Jeganathan, P., 2001. Correction to ‘On asymptotic inference in cointegrated time series with fractionally

integrated errors’. Preprint.

Johansen, S., 1991. Estimation and hypothesis testing of cointegrating vectors in Gaussian vector autoregressive

models. Econometrica 59, 1551–1580.

Kashyap, R., Eom, K., 1988. Estimation in long memory time series model. Journal of Time Series Analysis 9,

35–41.

Kim, C.S., Phillips, P.C.B., 2000. Fully modified estimation of fractional cointegration models. Preprint.

Marinucci, D., Robinson, P.M., 2001. Semiparametric fractional cointegration analysis. Journal of Econometrics

105, 225–247.

Marmol, F., Velasco, C., 2004. Consistent testing of cointegrating relationships. Econometrica 72, 1809–1844.

Moulines, E., Soulier, P., 1999. Broadband log periodogram regression of time series with long range dependence.

Annals of Statistics 27, 1415–1439.

Phillips, P.C.B., 1991. Optimal inference in cointegrated systems. Econometrica 59, 283–306.

Robinson, P.M., 1994a. Semiparametric analysis of long-memory time series. Annals of Statistics 22, 515–539.

Robinson, P.M., 1994b. Efficient tests of nonstationary hypotheses. Journal of the American Statistical

Association 89, 1420–1437.

Robinson, P.M., 1995a. Log-periodogram regression of time series with long range dependence. Annals of

Statistics 23, 1048–1072.

Robinson, P.M., 1995b. Gaussian semiparametric estimation of long-range dependence. Annals of Statistics 23,

1630–1661.

Robinson, P.M., 2000. Private communication.

Robinson, P.M., Hualde, J., 2003. Cointegration in fractional systems with unknown integration orders.

Econometrica 71, 1727–1766.

Robinson, P.M., Marinucci, D., 2001. Narrow-band analysis of nonstationary processes. Annals of Statistics 29,

947–986.

Robinson, P.M., Marinucci, D., 2003. Semiparametric frequency-domain analysis of fractional cointegration. In:

Robinson, P.M. (Ed.), Time Series with Long Memory. Oxford University Press, Oxford, pp. 334–373.

Robinson, P.M., Yajima, Y., 2002. Determination of cointegrating rank in fractional systems. Journal of

Econometrics 106, 217–241.

Stock, J.H., 1987. Asymptotic properties of least squares estimators of cointegrating vectors. Econometrica 55,

1035–1056.

Velasco, C., 1999a. Non-stationary log-periodogram regression. Journal of Econometrics 91, 325–371.

Velasco, C., 1999b. Gaussian semiparametric estimation of non-stationary time series. Journal of Time Series

Analysis 20, 87–127.

Velasco, C., Robinson, P.M., 2000. Whittle pseudo-maximum likelihood estimation for nonstationary time series.

Journal of the American Statistical Association 95, 1229–1243.

root--consistent estimation of weak fractional cointegration

Documents