Essays on Testing for Structural Changes

Peiyun Jiang

Submitted in partial fulfillment of the requirements

for the degree of Doctor of Philosophy in Economics

Graduate School of Economics

Hitotsubashi University

November, 2020

Acknowledgements

This dissertation is composed of three testing procedures related to structural changes. Chapter 2 is concerned with testing for parameter constancy in time series models, which is based on Jiang and Kurozumi (2019), “Power properties of the modified CUSUM tests”, Communications in Statistics - Theory and Methods, 48, 2962-2981. Chapter 3 proposes a sequential test for structural changes in models with a trend, which is based on Jiang and Kurozumi (2020), “Monitoring parameter changes in models with a trend”, Journal of Statistical Planning and Inference, 207, 288-319. Chapter 4 introduces a test for common breaks in panels.

I would like to express my deepest appreciation to my advisor, Professor Eiji Kurozumi, whose insight and knowledge steered me through this dissertation. He inspired my interest in time series analysis and has always lit up my career path. He did the best he could to support me and has always been patient in mapping out my Ph.D. journey, providing suggestions on research topics, and connecting me with the resources I needed. Without his patient guidance and persistent help, this dissertation would not have been possible. I am also thankful for the thoughtful comments and constructive suggestions from Professor Yohei Yamamoto. His enthusiasm for research has greatly encouraged me to complete my dissertation. I would like to thank Professor Toshiaki Watanabe for providing valuable suggestions on empirical analysis and encouraging me to further explore applications in economics. I also thank Professor Yukitoshi Matsushita and Professor Toshio Honda for their helpful comments, which improved this dissertation.

I am deeply grateful to my wonderful parents for their support and encouragement. They provided me with the opportunity to study abroad and helped me achieve all my goals. I thank them for always being there through all my hardships. I am also thankful to my friends, who support me, uplift me, and bring light to my life. Special thanks go to Hitotsubashi University, Nomura Foundation, and Mitsubishi UFJ for financial support.

Peiyun Jiang

November, 2020

Contents

Acknowledgements

1 Overview
  1.1 Introduction
  1.2 Overview: Chapter 2

2 Modified Tests for Orthogonal Structural Changes in Time Series Models
  2.1 Introduction
  2.2 Models and Assumptions
  2.3 Modified CUSUM Tests
    2.3.1 Motivation
    2.3.2 Modified CUSUM tests
    2.3.3 Serially correlated errors
  2.4 Finite Sample Properties
  2.5 Conclusion
  Appendix A
  Tables and Figures

3 A Sequential Test for Structural Changes in Models with a Trend
  3.1 Introduction
  3.2 Model and Assumptions
  3.3 Monitoring Procedure for a Change in the Trend
    3.3.1 CUSUM-based monitoring procedure
    3.3.2 Extension to higher order polynomials
  3.4 Asymptotic Distributions of the Stopping Times
  3.6 Empirical Example
  3.7 Conclusion
  Appendix B. Proofs of Theorems 3.1 and 3.2
  Appendix C. Proofs of Theorems 3.3 and 3.4
  Tables and Figures

4 A New Test for Common Breaks in Heterogeneous Panel Data Models
  4.1 Introduction
  4.2 Model and Assumptions
  4.3 Test Statistic
  4.4 Asymptotic Theory
  4.6 Conclusion
  Appendix D
  Tables

Chapter 1

Overview

1.1 Introduction

Structural change has long been an important issue in the statistics and econometrics literature, since ignoring parameter instability may lead to inaccurate forecasts and misleading inferences. Over the last fifty years or so, several important testing procedures have been proposed in the econometrics literature and extensively investigated to cover a broad range of models.

One statistic that has played an important role in theory and applications related to structural changes is the cumulative sum (CUSUM) test proposed by Brown et al. (1975). This test is based on the CUSUM of recursive residuals and has become especially popular because it is designed to test the null hypothesis of parameter stability against a variety of alternatives. With regard to its power properties, Ploberger and Kramer (1990) and Deng and Perron (2008) showed that the power of the CUSUM test depends crucially on the angle between the mean of the regressors and the direction of the shift. A major drawback is that the test loses power when the mean of the regressors is perpendicular to the shift, which is referred to as an orthogonal structural change. Several modified versions of the test have been proposed, explicitly or implicitly, to overcome the problem; see Luger (2001) and Huskova and Koubkova (2005), among others. However, these modified versions do not show clear advantages over the original CUSUM tests. To avoid losing power when the mean of the regressors is orthogonal to the shift in a linear regression model, Chapter 2 considers two versions of the modified CUSUM tests, which are satisfactory in terms of both size and power.

Most of the existing methodologies related to structural changes examine parameter instability based on a historical data set of fixed size; these are the so-called retrospective tests. Because new data now arrive steadily, two questions arise: Is the current model still valid for explaining today’s data? How can breaks be detected sequentially, as soon as possible, as new data arrive? Repeatedly applying a retrospective test with a fixed critical value each time new data arrive, however, results in an uncontrolled empirical size. Starting with the pioneering work of Chu et al. (1996) in the econometrics literature, sequential tests have emerged to check whether incoming data are consistent with the current relationship, and they have been widely advocated in the literature. Sequential tests have been investigated extensively in various models, including Aue and Horvath (2004) in a mean-shift model, Aue et al. (2009) in a linear model, Carsoule and Franses (2003) and Lee et al. (2009) in autoregressive models, Na et al. (2011) in GARCH models, and Xia et al. (2011) and Kurozumi (2017) in linear models with endogenous regressors. As noted by Perron (1989) and others, macroeconomic time series, for instance commodity retail sales and the consumer price index (CPI), are sometimes better characterized as trend-stationary series with possible change(s) in the deterministic components. While Qi et al. (2016) extended the fluctuation test to detect parameter instability in a model with a trend, Chapter 3 takes a different perspective and develops a CUSUM-type monitoring scheme based on ordinary least squares (OLS) residuals. Since sequential tests generally reject the null hypothesis of no change with some delay after the break, the speed of detection is an important performance indicator for sequential tests. In Chapter 3, we further investigate the properties of the delay times of the CUSUM test as well as the fluctuation test.

In Chapter 4, the analysis of structural changes shifts from pure time series to panel data models, in which there are N series and each series has T observations. One of the main reasons is that using cross-sectional data improves the accuracy of the break point estimators. In panel frameworks, the inconsistency of the break point estimator in time series models has been overcome, owing to the common break assumption that the break occurs in each series at the same location. In practice, however, the common break assumption is restrictive, and some evidence suggests that the break points are likely to vary significantly across individuals (see Claeys and Vasıcek 2014; Adesanya 2020). The validity of the common break assumption, on which we focus in Chapter 4, is an issue of interest for most applications, but no satisfactory methodology has been proposed to cope with this problem. To fill this gap, we introduce a test to determine whether the break dates are common or can vary across series in panel data models.

This dissertation consists of three testing procedures related to structural changes. It

includes modified tests to detect orthogonal breaks in a univariate time series in Chapter 2, a

sequential test to continuously detect parameter instability in a model with polynomial trend

in Chapter 3, and a new test for common breaks in panel data models in Chapter 4.

1.2 Overview: Chapter 2

In Chapter 2, we consider a linear regression model given by

\[ y_t = x_t'\beta_t + u_t \quad (t = 1, 2, \cdots, T), \]

where $x_t = [x_{1t}, x_{2t}, \cdots, x_{kt}]'$ is a k-dimensional regressor, $u_t$ is an unobservable stochastic disturbance, and $\beta_t$ is a k-dimensional vector of coefficients. We are interested in testing for parameter instability in regressions, which is given by

\[ H_0: \beta_t = \beta \ \ \forall t \quad \text{vs.} \quad H_1: \beta_t = \beta + \delta 1_{(t>[T\lambda])}, \]

where $1_{(t>[T\lambda])}$ is an indicator function, taking the value one if $t > [T\lambda]$ and zero otherwise.

The original CUSUM tests are based on the recursive residuals or OLS residuals. However, Deng and Perron (2008) verified that the tests lose power when the mean of the regressors is orthogonal to the direction of the shift. To overcome this shortcoming, we propose two versions of the modified tests based on the product of the regressors and the residuals. We show that the power of the proposed tests under the fixed alternative depends on a quadratic form of the shift and the second moment of the regressors. This power property ensures that the modified versions avoid the loss of power in the case of $E(x_t'\delta) = 0$. We further extend the modified tests to models allowing for serial correlation in the disturbances. The modified tests are superior to the original ones in terms of power performance when the mean of the regressors is orthogonal to the shift.

The tests in Chapter 2 are often used to examine what happens in historical data sets. Since new data arrive steadily and quickly, Chapter 3 focuses on a sequential test to detect parameter changes on-line in a model with a trend, which is specified by

\[ y_t = x_t'\beta_t + \epsilon_t \quad (t = 1, 2, \cdots, m, m+1, \cdots), \]

where $x_t = [1, t/m]'$ is a regressor including a constant term and a trend, $\epsilon_t$ is an unobservable stochastic disturbance, and $\beta_t = [\beta_{0t}, \beta_{1t}]'$ is a vector of the coefficients.

It is assumed that the parameters are stable in a training period of size m, that is,

\[ \beta_t = \beta_0, \quad t = 1, 2, \cdots, m. \]

When the data arrive sequentially one by one after the training period, we are interested in detecting a change in parameters as soon as it occurs. Then, the null hypothesis is specified as

\[ H_0: \beta_t = \beta_0, \quad t = m+1, m+2, \cdots, \]

while the alternative hypothesis is

\[ H_1: \text{There is } k^* \ge 1 \text{ such that } \beta_t = \beta_0, \ t = m+1, m+2, \cdots, m+k^*-1, \text{ but } \beta_t = \beta_0 + \Delta, \ t = m+k^*, m+k^*+1, \cdots. \]

We introduce a stopping time defined by

\[ \begin{cases} \inf\{k \ge 1 : |\Gamma(m,k)| \ge g(m,k)\}, & \\ \infty, & \text{if } |\Gamma(m,k)| < g(m,k) \text{ for all } k = 1, 2, \cdots. \end{cases} \]

The null hypothesis is rejected as soon as the detecting statistic $\Gamma(m,k)$ exceeds a suitably constructed threshold function $g(m,k)$; otherwise, the procedure never terminates.

Our detecting statistic is constructed from the cumulative sum of the future residuals after time m, and we can show that the limiting distribution of the detecting statistic has a growing variance. This implies that a constant boundary cannot be used for the monitoring procedure, because the detecting statistic would eventually exceed any given critical value and the null hypothesis would be rejected with a probability approaching one even if the parameters are stable. To ensure proper empirical size of the test, we design the boundary function with the same growth rate as that of the detecting statistic.
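The stopping rule described above, the first k at which the detecting statistic reaches the boundary, can be sketched in a few lines. This is only an illustration: `stopping_time`, `toy_detector`, and `toy_boundary` are hypothetical names, the toy functions are not the statistic or boundary of Chapter 3, and a finite `horizon` stands in for the open-ended monitoring period.

```python
import math
from typing import Callable


def stopping_time(detector: Callable[[int, int], float],
                  boundary: Callable[[int, int], float],
                  m: int, horizon: int) -> float:
    """Return the first k in 1..horizon with |detector(m, k)| >= boundary(m, k).

    Returns math.inf when the boundary is never reached within the horizon,
    which mimics a monitoring procedure that does not terminate.
    """
    for k in range(1, horizon + 1):
        if abs(detector(m, k)) >= boundary(m, k):
            return k
    return math.inf


# Hypothetical toy inputs, chosen only so the arithmetic is easy to follow.
def toy_detector(m: int, k: int) -> float:
    return 0.75 * k            # detector drifting upward after a change


def toy_boundary(m: int, k: int) -> float:
    return 2.0 + 0.25 * k      # boundary that also grows with k


first_hit = stopping_time(toy_detector, toy_boundary, m=100, horizon=50)
never_hit = stopping_time(lambda m, k: 0.0, toy_boundary, m=100, horizon=50)
```

With these toy inputs the detector reaches the boundary at k = 4, while a detector that stays at zero never triggers a rejection.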

We derive the limiting null distribution and the consistency of the procedure under the

alternative. Then, we extend the CUSUM monitoring procedure to models with higher order

polynomial trends. We further find that the limit distributions of the stopping time for

the CUSUM test as well as the fluctuation one proposed by Qi et al. (2016) are normal

distributions. Moreover, the stopping time of the CUSUM test grows at a slower rate than

that of the fluctuation test, which implies that the delay time based on the CUSUM procedure

tends to be shorter than that based on the fluctuation one. This advantage has been confirmed

in simulations and an empirical study.

In Chapter 4, we consider a panel data model in which there are N series and each series has T observations:

\[ y_{it} = x_{it}'\beta_i + x_{it}'\delta_i 1_{(t>k_i^0)} + u_{it}, \quad 1 \le i \le N \text{ and } 1 \le t \le T, \]

where $x_{it} = [x_{it}(1), \cdots, x_{it}(p)]'$ is a p-dimensional vector of explanatory variables including a constant term. The coefficient $\delta_i$ is the shift for individual i at an unknown date $k_i^0$, and $u_{it}$ is an unobservable stochastic disturbance.

We are interested in testing the validity of the common break assumption. Under the null hypothesis, the break date in each series is assumed to be common, that is,

\[ H_0: k_i^0 = k^0, \quad \text{for all } i = 1, 2, \cdots, N. \]

Under the alternative, the break dates are common for individuals in the same group but allowed to differ across groups. Supposing that there exist G groups, the alternative hypothesis is defined by

\[ H_A: k_{g_1}^0 \neq k_{g_2}^0, \quad \text{for some } g_1, g_2 \in \{1, 2, \cdots, G\}. \]

The numerator of the proposed statistic is the square of the cumulative sum of the OLS residuals, while the denominator is constructed from a normalization factor instead of a long-run variance estimator, so that we can avoid the loss of power that occurs as the shift grows under the alternative (the so-called non-monotonic power problem). We find that the proposed statistic has a non-degenerate distribution under the null, which is a functional of Brownian bridges. When the common break assumption fails, the statistic diverges to infinity, so that the null hypothesis is rejected. Simulations show that the size of the test is controlled when N and T are large and that the test is powerful under various types of alternatives.

Chapter 2

Modified Tests for Orthogonal Structural Changes in Time Series Models

The CUSUM test has played an important role in theory and applications related to structural

change, but its drawback is that it loses power when the break is orthogonal to the mean of the

regressors. In this chapter, we consider two modified CUSUM tests that have been proposed,

implicitly or explicitly, in the literature to detect such structural changes and investigate the

limiting power properties of these tests under a fixed alternative. We demonstrate that the

modified tests are superior to the classic tests in terms of both asymptotic theory and finite

samples when detecting an orthogonal structural shift.1

2.1 Introduction

The original CUSUM test introduced by Brown et al. (1975) has been widely used to test for parameter stability in practical analyses. It has also been investigated extensively and extended in various ways in the literature. With regard to its power properties, for example, Garbade (1977) studied the finite sample performance of the CUSUM test under three patterns of coefficient variations. The results of the Monte Carlo experiments showed that the CUSUM test is weak at detecting parameter instability under the simulation settings. Because this property has been observed repeatedly in the literature, Ploberger and Kramer (1990) theoretically investigated the power of the test, where changes in the parameters are local to zero under the alternative. They showed that the limiting distribution of the test under the local alternative is expressed as a Brownian motion, plus an additional term related to the interaction between the mean of the regressors and direction of the structural change. This result implies that in the case of a simple shift in parameters, the power of the CUSUM test depends on the angle between the mean of the regressors and direction of the shift. Furthermore, the test loses power when this angle is perpendicular. Their result theoretically explains the poor performance of the CUSUM test in the study of Garbade (1977), in which the mean of the regressors is set as orthogonal to the shift (the mean is equal to zero). While the original CUSUM test was proposed by using recursive residuals, Ploberger and Kramer (1992) developed a CUSUM test based on ordinary least squares residuals and compared the local power of this test with that of the original test. By contrast, Deng and Perron (2008) investigated the power properties of both versions of the CUSUM test from a non-local perspective. They derived the limiting properties of the test statistic under the fixed alternative and confirmed that even in this case, the power of the test depends on the angle between the mean of the regressors and direction of the change.

1 The published version is Jiang and Kurozumi (2019). Power properties of the modified CUSUM tests. Communications in Statistics - Theory and Methods 48, 2962-2981. https://doi.org/10.1080/03610926.2018.1473598

Because these undesirable power properties of the CUSUM test have been noted in the

literature, several modified versions of the test have been proposed, explicitly or implicitly,

to overcome the problem. For example, Luger (2001) introduced a test statistic based on the

symmetrization of the absolute value of the recursive residuals. This modified test performs

better than the original test does in terms of power when the angle between the mean of the

regressor and shift is perpendicular, although the original CUSUM test performs better when

this angle decreases. Huskova and Koubkova (2005) considered a quadratic form of the product

of the regressors and the residuals for monitoring tests, while Xia, Guo, and Zhao (2011)

proposed a CUSUM test based on the weighted residuals from the GMM estimation. These

studies concentrate on the reactions of the tests on the location change and the magnitude

of the break, but do not analyze the impact of the angle on the power of the tests. Further

studies are thus necessary to investigate the performance of these methodologies in terms of

detecting an orthogonal structural change.

Therefore, in this chapter, we investigate two versions of the CUSUM test that are modified

to avoid losing power when the mean of the regressor is orthogonal to the shift. The

asymptotic distributions of the test statistics are investigated under the null hypothesis of

parameter stability as well as under the fixed alternative, and then we investigate the power

of each of the modified tests. We confirm that the modified tests are superior to the classic

test in terms of both asymptotic theory and finite samples when detecting an orthogonal

structural shift.

The remainder of Chapter 2 is organized as follows. Section 2.2 introduces the model

and assumptions. Section 2.3 presents the asymptotic behaviors of the two modified tests.

Then, we extend the modified tests further to models with serially correlated errors. The

finite sample properties are investigated by using Monte Carlo simulations in Section 2.4.

Concluding remarks are given in Section 2.5. The mathematical proofs are relegated to

Appendix A.

2.2 Models and Assumptions

We consider the standard linear regression model given by

\[ y_t = x_t'\beta_t + u_t \quad (t = 1, 2, \cdots, T), \tag{2.1} \]

where $x_t = [x_{1t}, x_{2t}, \cdots, x_{kt}]'$ is a k-dimensional regressor, $u_t$ is an unobservable stochastic disturbance, and $\beta_t$ is a k-dimensional vector of coefficients. Because a constant term is typically included in a model, the first element of the regressor, $x_{1t}$, is unity for all t. We consider the testing problem given by

\[ H_0: \beta_t = \beta \ \ \forall t \quad \text{vs.} \quad H_1: \beta_t = \beta + \delta 1_{(t>[T\lambda])}, \]

where $\lambda \in (0, 1)$ represents the break fraction and $1_{(t>[T\lambda])}$ is an indicator function, taking the value one if $t > [T\lambda]$ and zero otherwise. Then, the parameters in (2.1) are stable under the null hypothesis, whereas we allow for a one-time change in the parameters under the alternative.

To investigate the asymptotic properties of the CUSUM test, we make the following

assumptions.

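Assumption 2.1 The regressor $x_t$ and error term $u_t$ are defined on a common probability space, and the following condition holds:

\[ \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} \|x_t\|^{2+\delta} < \infty, \quad \text{a.s. for some } \delta > 0. \]

Assumption 2.2 The following probability limits exist:

\[ \operatorname*{p\,lim}_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x_t = E[x_t] = c_1, \]

\[ \operatorname*{p\,lim}_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x_t x_t' = E[x_t x_t'] = C, \]

\[ \operatorname*{p\,lim}_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x_t x_t' \otimes x_t x_t' = E[x_t x_t' \otimes x_t x_t'] = \Lambda, \]

where $c_1$ is a $k \times 1$ vector, and $C$ and $\Lambda$ are $k \times k$ and $k^2 \times k^2$ non-singular and non-stochastic matrices, respectively.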

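These assumptions are satisfied, for example, if $x_t$ is a weakly dependent stationary process with more than fourth-order moments. We need Assumption 2.2 to investigate the power of the modified CUSUM tests. We denote the rows of $C$ by $c_j'$ for $j = 1, \cdots, k$. That is,

\[ C = E[x_t x_t'] = E\begin{bmatrix} x_{1t}x_t' \\ x_{2t}x_t' \\ \vdots \\ x_{kt}x_t' \end{bmatrix} = \begin{bmatrix} c_1' \\ c_2' \\ \vdots \\ c_k' \end{bmatrix}. \]

Because $x_{1t} = 1$ for all t, the first row is $E[x_t'] = c_1'$. The vector $c_1$ is called the mean regressor.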

Assumption 2.3 The disturbances $u_t$ are stationary and ergodic with

\[ E[u_t \mid U_{t-1}] = 0, \quad E[u_t^2 \mid U_{t-1}] = \sigma^2, \quad E[u_t^4] < \infty, \]

where $U_{t-1}$ is the σ-field generated by $\{x_t, u_{t-1}, x_{t-1}, u_{t-2}, \cdots\}$.

Assumption 2.3 implies that the error term is a martingale difference sequence. The uncorrelatedness of the errors can be checked by testing for serial correlation using the regression residuals. In the following, we first proceed with this assumption. However, it is relaxed in a later section to investigate the effect of serial correlation on the power of the tests.

2.3 Modified CUSUM Tests

2.3.1 Motivation

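We first consider the standard CUSUM test to motivate the modification. The test statistic based on the OLS residuals is given by

\[ CUSUM^{ols} = \sup_{0\le r\le 1} \left| \frac{1}{\hat\sigma\sqrt{T}} \sum_{t=1}^{[Tr]} \hat u_t \right|, \]

where $\hat u_t$ are the OLS residuals and $\hat\sigma^2 = T^{-1}\sum_{t=1}^{T} \hat u_t^2$. The recursive residuals-based test statistic is

\[ CUSUM^{rec} = \sup_{0\le r\le 1} \left| \frac{\sum_{t=k+1}^{[Tr]} \tilde u_t}{\tilde\sigma\sqrt{T-k}} \right| \Big/ \left( 1 + 2\,\frac{[Tr]-k}{T-k} \right), \]

where $\tilde u_t = (y_t - x_t'\hat\beta_{t-1})/f_t$, for $t = k+1, \cdots, T$, are the recursive residuals; $\hat\beta_t = (X_t'X_t)^{-1}X_t'Y_t$, with $X_t = [x_1, x_2, \cdots, x_t]'$ and $Y_t = [y_1, y_2, \cdots, y_t]'$; $f_t = (1 + x_t'(X_{t-1}'X_{t-1})^{-1}x_t)^{1/2}$; and $\tilde\sigma^2 = (T-k)^{-1}\sum_{t=k+1}^{T}(\tilde u_t - \bar{\tilde u})^2$, with $\bar{\tilde u} = (T-k)^{-1}\sum_{t=k+1}^{T} \tilde u_t$.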
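As a rough illustration of the recursive residuals and the recursive CUSUM statistic defined above, the following sketch computes both with NumPy. The function names, the simulated design (a constant plus a N(1, 1) regressor with coefficients [1, 1]), and the seed are assumptions made for the example; the supremum over r is evaluated on the grid [Tr] = k+1, ..., T.

```python
import numpy as np


def recursive_residuals(y, X):
    """One-step-ahead standardized prediction errors for t = k+1, ..., T."""
    T, k = X.shape
    w = np.empty(T - k)
    for t in range(k, T):                    # 0-based row t is observation t+1
        X_prev, y_prev = X[:t], y[:t]        # data up to the previous period
        XtX_inv = np.linalg.inv(X_prev.T @ X_prev)
        beta_prev = XtX_inv @ X_prev.T @ y_prev
        f_t = np.sqrt(1.0 + X[t] @ XtX_inv @ X[t])
        w[t - k] = (y[t] - X[t] @ beta_prev) / f_t
    return w


def cusum_rec(y, X):
    """Recursive-residual CUSUM statistic with the 1 + 2r-type boundary weight."""
    T, k = X.shape
    w = recursive_residuals(y, X)
    sigma = np.sqrt(np.mean((w - w.mean()) ** 2))
    partial = np.cumsum(w) / (sigma * np.sqrt(T - k))
    frac = np.arange(1, T - k + 1) / (T - k)   # ([Tr] - k) / (T - k)
    return np.max(np.abs(partial) / (1.0 + 2.0 * frac))


# Hypothetical stable regression used only to exercise the functions.
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(1.0, 1.0, T)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=T)

w = recursive_residuals(y, X)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
rss = np.sum((y - X @ beta_ols) ** 2)
stat_rec = cusum_rec(y, X)
```

A convenient sanity check is the classical identity that the squared recursive residuals sum to the full-sample OLS residual sum of squares, so `np.sum(w**2)` and `rss` should agree up to rounding. The explicit matrix inverse in the loop keeps the code close to the formulas; an updating formula would be faster for long samples.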

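Ploberger and Kramer (1990, 1992) derived the limiting distributions of these test statistics under the local alternative and Deng and Perron (2008) investigated the asymptotic properties of these statistics under the fixed alternative. Their results imply that the power of these tests depends on the angle between the mean regressor and direction of the break. To explain this dependence, we demonstrate that the power of the OLS-based test depends on $c_1'\delta$ by focusing on the fixed alternative. Given that the OLS estimator of $\beta$ can be expressed as

\[ \hat\beta = \beta + \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \sum_{t=1}^{T} \left( x_t x_t'\delta 1_{(t>[T\lambda])} + x_t u_t \right), \tag{2.2} \]

the OLS residuals are given by

\[ \hat u_t = u_t + x_t'\delta 1_{(t>[T\lambda])} - x_t'(\hat\beta - \beta), \tag{2.3} \]

and, thus,

\[ \frac{1}{T}\sum_{t=1}^{[Tr]} \hat u_t = \frac{1}{T}\sum_{t=1}^{[Tr]} u_t + \frac{1}{T}\sum_{t=1}^{[Tr]} x_t'\delta 1_{(t>[T\lambda])} - \left( \frac{1}{T}\sum_{t=1}^{[Tr]} x_t' \right) \left( \frac{1}{T}\sum_{s=1}^{T} x_s x_s' \right)^{-1} \left[ \frac{1}{T}\sum_{s=1}^{T} x_s x_s'\delta 1_{(s>[T\lambda])} + \frac{1}{T}\sum_{s=1}^{T} x_s u_s \right]. \tag{2.4} \]

It can be shown that, for $r > \lambda$, the second and third terms on the right-hand side of (2.4) converge in probability to $(r-\lambda)c_1'\delta$ and $-r(1-\lambda)c_1'\delta$, respectively, so that their sum is $-\lambda(1-r)c_1'\delta$. Therefore, the OLS-based CUSUM test loses power when $c_1'\delta = 0$.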
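The loss of power under an orthogonal shift can be seen in a small simulation. The sketch below is illustrative only: the design (a constant plus a N(1, 1) regressor, so that the mean regressor is [1, 1]'), the pre-break coefficients [1, 1], the break fraction 0.5, and the function names are assumptions chosen for the example. The shift [1, -1]' is orthogonal to the mean regressor, whereas [1, 1]' is not.

```python
import numpy as np


def cusum_ols(y, X):
    """OLS-residual CUSUM: sup over partial sums of residuals, scaled by sigma*sqrt(T)."""
    T = len(y)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(np.mean(resid ** 2))
    return np.max(np.abs(np.cumsum(resid))) / (sigma_hat * np.sqrt(T))


def simulate_break(delta, T=2000, lam=0.5, seed=0):
    """Regression with a one-time coefficient shift delta after [T*lam]."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(T), rng.normal(1.0, 1.0, T)])
    beta = np.array([1.0, 1.0])                       # hypothetical pre-break values
    after = (np.arange(1, T + 1) > int(T * lam)).astype(float)
    y = X @ beta + (X @ np.asarray(delta, dtype=float)) * after + rng.normal(size=T)
    return y, X


stat_orth = cusum_ols(*simulate_break([1.0, -1.0]))    # c1' delta = 0
stat_nonorth = cusum_ols(*simulate_break([1.0, 1.0]))  # c1' delta = 2
```

In this design the statistic under the non-orthogonal shift grows with the square root of T, while under the orthogonal shift it stays of the same order as under the null, which is the behaviour the derivation above predicts.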

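To avoid the dependence of the power on $c_1'\delta$, we modify the CUSUM test such that it is based not on the residuals themselves, but on the product of $x_{jt}$ (for $j \neq 1$) and the residuals. Let $\hat w_{jt} = x_{jt}\hat u_t$ and $\tilde w_{jt} = x_{jt}\tilde u_t$. Then, the modified CUSUM test statistics are defined as

\[ CUSUM^{ols}_m = \sup_{0\le r\le 1} \left| \frac{1}{\hat\sigma_j\sqrt{T}} \sum_{t=1}^{[Tr]} \hat w_{jt} \right|, \qquad CUSUM^{rec}_m = \sup_{0\le r\le 1} \left| \frac{\sum_{t=k+1}^{[Tr]} \tilde w_{jt}}{\tilde\sigma_j\sqrt{T-k}} \right| \Big/ \left( 1 + 2\,\frac{[Tr]-k}{T-k} \right), \]

where

\[ \hat\sigma_j^2 = \frac{1}{T}\sum_{t=1}^{T} \hat w_{jt}^2 \quad \text{and} \quad \tilde\sigma_j^2 = \frac{1}{T-k}\sum_{t=k+1}^{T} (\tilde w_{jt} - \bar{\tilde w}_j)^2 \quad \text{with} \quad \bar{\tilde w}_j = \frac{1}{T-k}\sum_{t=k+1}^{T} \tilde w_{jt}. \]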

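Proposition 2.1 Suppose Assumptions 2.1–2.3 hold.

(a) Under the null hypothesis,

\[ CUSUM^{ols}_m \Rightarrow \sup_{0\le r\le 1} |BB_j(r)|, \tag{2.5} \]

\[ CUSUM^{rec}_m \Rightarrow \sup_{0\le r\le 1} \left| \frac{W_j(r)}{1+2r} \right|, \tag{2.6} \]

where $BB_j(r)$ and $W_j(r)$ are the one-dimensional standard Brownian bridge and Brownian motion, respectively, and $\Rightarrow$ denotes the weak convergence of the associated probability measures.

(b) Under the alternative hypothesis,

\[ \frac{1}{\sqrt{T}} CUSUM^{ols}_m \xrightarrow{p} \frac{|c_j'\delta|\lambda(1-\lambda)}{\sqrt{\sigma^2 c_{jj} + \lambda(1-\lambda)\delta'\Lambda_{jj,0}\delta}}, \tag{2.7} \]

\[ \frac{1}{\sqrt{T}} CUSUM^{rec}_m \xrightarrow{p} \frac{|c_j'\delta|\, q}{\sqrt{\sigma^2 c_{jj} + \lambda(1-\lambda)\delta'\Lambda_{jj,0}\delta - (c_j'\delta\lambda\log(\lambda))^2}}, \tag{2.8} \]

where $c_{jj}$ and $\Lambda_{jj,0}$ are the $(j, j)$ element of $C$ and $(j, j)$ block of $\Lambda$, respectively; that is,

\[ \frac{1}{T}\sum_{t=1}^{T} x_{jt}^2 \xrightarrow{p} c_{jj}, \quad \text{and} \quad \frac{1}{T}\sum_{t=1}^{T} x_{jt}^2 x_t x_t' \xrightarrow{p} \Lambda_{jj,0}, \]

and

\[ q = \sup_{0\le r\le 1} \frac{\lambda \log(r/\lambda)\, 1_{(r>\lambda)}}{1+2r} = \begin{cases} \dfrac{\lambda\log(\lambda^*/\lambda)}{1+2\lambda^*}, & 0 \le \lambda < e^{-3/2}, \\[2ex] \dfrac{-\lambda\log\lambda}{3}, & e^{-3/2} \le \lambda \le 1, \end{cases} \]

where $\lambda^* = \{\lambda^* : 0 \le \lambda^* \le 1 \text{ and } \log\lambda^* = 1 + \log\lambda + \frac{1}{2\lambda^*}\}$.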

Proposition 2.1 shows that our modification could work well, even in the case of $c_1'\delta = 0$, for $c_j'\delta \neq 0$ and $j \neq 1$. Thus, we can avoid the loss of power caused by the orthogonal change. However, the modified test loses power if $c_j'\delta = 0$, and we do not know in advance whether $c_j'\delta = 0$ for a given $j = 1, \cdots, k$. We discuss how to overcome this problem in the following subsection.
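The constant q in Proposition 2.1 can be checked numerically. The sketch below assumes the reading that q is the supremum over r in (λ, 1] of λ log(r/λ)/(1 + 2r): the first-order condition of this function is log r = 1 + log λ + 1/(2r), which has a root in (λ, 1) exactly when λ < e^{-3/2}; otherwise the supremum is attained at r = 1 and equals -λ log λ / 3. The function names are hypothetical.

```python
import math

import numpy as np


def q_closed_form(lam):
    """q via the first-order condition (interior root) or the endpoint r = 1."""
    if lam >= math.exp(-1.5):
        return -lam * math.log(lam) / 3.0

    # h is increasing in r, negative at r = lam and positive at r = 1 here,
    # so bisection finds the unique root lambda* of the first-order condition.
    def h(r):
        return math.log(r) - 1.0 - math.log(lam) - 1.0 / (2.0 * r)

    lo, hi = lam, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    r_star = 0.5 * (lo + hi)
    return lam * math.log(r_star / lam) / (1.0 + 2.0 * r_star)


def q_grid(lam, n=200_001):
    """Brute-force supremum over a fine grid of r in [lam, 1]."""
    r = np.linspace(lam, 1.0, n)
    return float(np.max(lam * np.log(r / lam) / (1.0 + 2.0 * r)))


checks = {lam: (q_closed_form(lam), q_grid(lam)) for lam in (0.05, 0.1, 0.3, 0.7)}
```

The two columns of `checks` should agree to several decimal places for break fractions on both sides of e^{-3/2} (about 0.223).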

2.3.2 Modified CUSUM tests

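Note that $C = E[x_t x_t']$ is positive definite according to Assumption 2.2. Therefore, we can easily see that $c_j'\delta \neq 0$ for at least one of $j = 1, \cdots, k$ if $\delta \neq 0$. Thus, it is natural to construct the test statistics based on all of $\hat w_{1t}, \cdots, \hat w_{kt}$ or $\tilde w_{1t}, \cdots, \tilde w_{kt}$ to avoid the potential loss of power caused by $c_j'\delta = 0$ for some j. One of the possible transformations used in the literature is to construct a quadratic form based on $x_t\hat u_t = [\hat w_{1t}, \cdots, \hat w_{kt}]'$ or $x_t\tilde u_t = [\tilde w_{1t}, \cdots, \tilde w_{kt}]'$, given by

\[ Q^{ols} = \sup_{0\le r\le 1} Q^{ols}(r) \quad \text{where} \quad Q^{ols}(r) = \frac{1}{\hat\sigma^2} \left( \sum_{t=1}^{[Tr]} x_t\hat u_t \right)' \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \left( \sum_{t=1}^{[Tr]} x_t\hat u_t \right), \]

\[ Q^{rec} = \sup_{0\le r\le 1} Q^{rec}(r) \quad \text{where} \quad Q^{rec}(r) = \frac{1}{\tilde\sigma^2} \left( \sum_{t=k+1}^{[Tr]} x_t\tilde u_t \right)' \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \left( \sum_{t=k+1}^{[Tr]} x_t\tilde u_t \right), \]

where $\hat\sigma^2$ and $\tilde\sigma^2$ are defined as before. The following theorem provides the asymptotic properties of these test statistics.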

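Theorem 2.1 Suppose Assumptions 2.1–2.3 hold.

(a) Under the null hypothesis,

\[ Q^{ols} \Rightarrow \sup_{0\le r\le 1} \|BB(r)\|^2, \tag{2.10} \]

\[ Q^{rec} \Rightarrow \sup_{0\le r\le 1} \|W(r)\|^2, \tag{2.11} \]

where $BB(r)$ and $W(r)$ are the k-dimensional standard Brownian bridge and Brownian motion, respectively.

(b) Under the alternative hypothesis,

\[ \frac{1}{T} Q^{ols} \xrightarrow{p} \frac{\delta'C\delta\,\lambda^2(1-\lambda)^2}{\sigma^2 + \lambda(1-\lambda)\delta'C\delta}, \tag{2.12} \]

\[ \frac{1}{T} Q^{rec} \xrightarrow{p} \frac{\delta'C\delta\,(\lambda\log\lambda)^2}{\sigma^2 + \lambda(1-\lambda)\delta'C\delta - (c_1'\delta\lambda\log(\lambda))^2}. \tag{2.13} \]

Theorem 2.1 clearly shows that the modified test statistics $Q^{ols}$ and $Q^{rec}$ can avoid the loss of power caused by $c_j'\delta = 0$, for some j, and are consistent because $\delta'C\delta \neq 0$.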
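To make the quadratic-type statistic concrete, the following sketch computes the OLS-based version and compares it, after division by T, with the probability limit in (2.12), read here as δ'Cδ λ²(1-λ)² divided by σ² + λ(1-λ)δ'Cδ. The design is an assumption for illustration: a constant plus a N(1, 1) regressor (so C = [[1, 1], [1, 2]]), unit error variance, pre-break coefficients [1, 1], break fraction 0.5, and the orthogonal shift [1, -1]'.

```python
import numpy as np


def q_ols(y, X):
    """sup over r of the quadratic form in partial sums of x_t times OLS residuals."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    sigma2_hat = np.mean(resid ** 2)
    S = np.cumsum(X * resid[:, None], axis=0)        # row i: sum_{t <= i+1} x_t u_hat_t
    A_inv = np.linalg.inv(X.T @ X)
    quad = np.einsum("ij,jk,ik->i", S, A_inv, S)     # S_i' A_inv S_i for every i
    return quad.max() / sigma2_hat


# Hypothetical design with a shift orthogonal to the mean regressor.
T, lam = 50_000, 0.5
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(T), rng.normal(1.0, 1.0, T)])
delta = np.array([1.0, -1.0])
after = (np.arange(1, T + 1) > int(T * lam)).astype(float)
y = X @ np.array([1.0, 1.0]) + (X @ delta) * after + rng.normal(size=T)

C = np.array([[1.0, 1.0], [1.0, 2.0]])               # E[x_t x_t'] in this design
dCd = delta @ C @ delta
limit_212 = dCd * lam**2 * (1 - lam) ** 2 / (1.0 + lam * (1 - lam) * dCd)
ratio = q_ols(y, X) / T
```

Here `limit_212` equals 0.05, and `ratio` should be close to it for a sample this large, even though the shift is orthogonal to the mean regressor.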

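The other possible transformation is to take the maximum of the absolute values of the elements of $x_t\hat u_t$ or $x_t\tilde u_t$. Let

\[ [M^{ols}_1(r), \cdots, M^{ols}_k(r)]' = \frac{1}{\hat\sigma} \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1/2} \sum_{t=1}^{[Tr]} x_t\hat u_t, \]

\[ [M^{rec}_1(r), \cdots, M^{rec}_k(r)]' = \frac{1}{\tilde\sigma} \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1/2} \sum_{t=k+1}^{[Tr]} x_t\tilde u_t, \]

and consider the following test statistics:

\[ M^{ols} = \max\left\{ \sup_{0\le r\le 1} \left| M^{ols}_1(r) \right|, \cdots, \sup_{0\le r\le 1} \left| M^{ols}_k(r) \right| \right\}, \]

\[ M^{rec} = \max\left\{ \sup_{0\le r\le 1} \left| M^{rec}_1(r) \right|, \cdots, \sup_{0\le r\le 1} \left| M^{rec}_k(r) \right| \right\}. \]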

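Theorem 2.2 Suppose Assumptions 2.1–2.3 hold.

(a) Under the null hypothesis,

\[ M^{ols} \Rightarrow \max\left\{ \sup_{0\le r\le 1} |BB_1(r)|, \cdots, \sup_{0\le r\le 1} |BB_k(r)| \right\}, \tag{2.14} \]

\[ M^{rec} \Rightarrow \max\left\{ \sup_{0\le r\le 1} |W_1(r)|, \cdots, \sup_{0\le r\le 1} |W_k(r)| \right\}, \tag{2.15} \]

where $BB_j(r)$ and $W_j(r)$ (for $j = 1, \cdots, k$) are the independent one-dimensional standard Brownian bridge and Brownian motion, respectively.

(b) Under the alternative hypothesis,

\[ \frac{1}{\sqrt{T}} M^{ols} \xrightarrow{p} \frac{\max\{|v_1|, \cdots, |v_k|\}\,\lambda(1-\lambda)}{\sqrt{\sigma^2 + \lambda(1-\lambda)\delta'C\delta}}, \tag{2.16} \]

\[ \frac{1}{\sqrt{T}} M^{rec} \xrightarrow{p} \frac{\max\{|v_1|, \cdots, |v_k|\}\,(-\lambda\log\lambda)}{\sqrt{\sigma^2 + \lambda(1-\lambda)\delta'C\delta - (c_1'\delta\lambda\log(\lambda))^2}}, \tag{2.17} \]

where $v_j$ is the jth element of $C^{1/2}\delta$, for $j = 1, \cdots, k$; that is, $[v_1, \cdots, v_k]' = C^{1/2}\delta$.

Again, we can see from Theorem 2.2 that the maximum-type tests, $M^{ols}$ and $M^{rec}$, are consistent irrespective of whether $c_j'\delta = 0$ for some j.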
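The maximum-type statistic can be sketched in the same way. The code below takes the symmetric inverse square root of the sum of x_t x_t' through an eigendecomposition and compares the OLS-based statistic, divided by the square root of T, with the limit in (2.16), read as max|v_j| λ(1-λ) over the square root of σ² + λ(1-λ)δ'Cδ. As before, the design (constant plus N(1, 1) regressor, unit error variance, break fraction 0.5, shift [1, -1]') and all names are assumptions for illustration.

```python
import numpy as np


def inv_sqrt_sym(A):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def m_ols(y, X):
    """max over components j and over r of |M_j(r)| built from OLS residuals."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(np.mean(resid ** 2))
    S = np.cumsum(X * resid[:, None], axis=0)
    # The inverse square root is symmetric, so right-multiplying the rows of S
    # gives the same vectors as left-multiplying each partial sum.
    M = S @ inv_sqrt_sym(X.T @ X) / sigma_hat
    return np.abs(M).max()


T, lam = 50_000, 0.5
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(T), rng.normal(1.0, 1.0, T)])
delta = np.array([1.0, -1.0])
after = (np.arange(1, T + 1) > int(T * lam)).astype(float)
y = X @ np.array([1.0, 1.0]) + (X @ delta) * after + rng.normal(size=T)

C = np.array([[1.0, 1.0], [1.0, 2.0]])
vals, vecs = np.linalg.eigh(C)
v = (vecs @ np.diag(np.sqrt(vals)) @ vecs.T) @ delta       # C^{1/2} delta
dCd = delta @ C @ delta
limit_216 = np.max(np.abs(v)) * lam * (1 - lam) / np.sqrt(1.0 + lam * (1 - lam) * dCd)
ratio = m_ols(y, X) / np.sqrt(T)
```

In this design the limit equals 0.2, and `ratio` should be close to that value, again despite the orthogonality of the shift to the mean regressor.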

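The critical values of the null-limiting distributions of the quadratic-type and maximum-type tests are obtained by approximating a standard Brownian motion using 2,000 independent normal random variables with 1,000,000 replications (see Panel (a) of Table 2.1). In addition, it is sometimes the case that the test statistics are constructed by removing the first and last 100ε% observations. In this case, we have

\[ \sup_{\varepsilon\le r\le 1-\varepsilon} Q^{ols}(r) \Rightarrow \sup_{\varepsilon\le r\le 1-\varepsilon} \|BB(r)\|^2, \qquad \sup_{\varepsilon\le r\le 1-\varepsilon} Q^{rec}(r) \Rightarrow \sup_{\varepsilon\le r\le 1-\varepsilon} \|W(r)\|^2, \tag{2.18} \]

\[ \max_{1\le j\le k}\left\{ \sup_{\varepsilon\le r\le 1-\varepsilon} \left| M^{ols}_j(r) \right|, \ j = 1, \cdots, k \right\} \Rightarrow \max_{1\le j\le k}\left\{ \sup_{\varepsilon\le r\le 1-\varepsilon} |BB_j(r)|, \ j = 1, \cdots, k \right\}, \tag{2.19} \]

\[ \max_{1\le j\le k}\left\{ \sup_{\varepsilon\le r\le 1-\varepsilon} \left| M^{rec}_j(r) \right|, \ j = 1, \cdots, k \right\} \Rightarrow \max_{1\le j\le k}\left\{ \sup_{\varepsilon\le r\le 1-\varepsilon} |W_j(r)|, \ j = 1, \cdots, k \right\}. \tag{2.20} \]

Panel (b) of Table 2.1 presents the critical values for these distributions with ε = 0.15.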
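The simulation of critical values described above can be sketched as follows for the Brownian bridge functionals. This is a scaled-down illustration, not the procedure behind Table 2.1: it uses 500 steps and 2,000 replications instead of 2,000 steps and 1,000,000 replications, and the function name, seed, and defaults are assumptions.

```python
import numpy as np


def simulate_sup_bb(k=1, n_steps=500, n_reps=2000, trim=0.0, seed=0):
    """Simulate sup ||BB(r)||^2 and max_j sup |BB_j(r)| over trim <= r <= 1 - trim."""
    rng = np.random.default_rng(seed)
    incr = rng.normal(size=(n_reps, n_steps, k)) / np.sqrt(n_steps)
    W = np.cumsum(incr, axis=1)                       # discretized Brownian motion
    r = np.arange(1, n_steps + 1) / n_steps
    BB = W - r[None, :, None] * W[:, -1:, :]          # BB(r) = W(r) - r W(1)
    keep = (r >= trim) & (r <= 1.0 - trim)
    quad = np.max(np.sum(BB[:, keep, :] ** 2, axis=2), axis=1)
    maxi = np.max(np.abs(BB[:, keep, :]), axis=(1, 2))
    return quad, maxi


quad, maxi = simulate_sup_bb(k=1)
cv_quad = np.quantile(quad, 0.95)    # 5% critical value for the quadratic type
cv_max = np.quantile(maxi, 0.95)     # 5% critical value for the maximum type

quad_trim, maxi_trim = simulate_sup_bb(k=2, trim=0.15, seed=1)
cv_quad_trim = np.quantile(quad_trim, 0.95)
```

For k = 1 the maximum-type limit is the supremum of the absolute value of a Brownian bridge, whose 95% quantile is about 1.36, so `cv_max` should land near that value, slightly below it because of the coarse grid. Trimming can only shrink the supremum, so trimmed critical values are no larger than untrimmed ones for the same k.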

We next investigate the limiting properties (not the finite sample properties) of the tests under the alternative, based on Theorems 2.1 and 2.2. Because the probability limits under the alternative depend on several parameters in the model, we focus on a simple case where $\sigma^2 = 1$ and $x_t = [1, z_t]'$, with $z_t \sim i.i.d.\ N(1, 1)$. Furthermore, the change in the coefficients is specified by $\delta = [1, -1]'$ and $\delta = [1, 1]'$, which correspond to $c_1'\delta = 0$ and $c_1'\delta \neq 0$, respectively. Given that the power properties depend not only on the probability limits under the alternative, but also on the critical values used in the tests, we compare these limits divided by the asymptotic 5% critical values.
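For this simple case the ingredients of the probability limits can be worked out directly: with $z_t \sim N(1, 1)$ we have $E[z_t] = 1$ and $E[z_t^2] = 2$, so $c_1 = [1, 1]'$ and $C$ has rows $[1, 1]$ and $[1, 2]$. The sketch below evaluates the limits in (2.12) and (2.13), read as ratios with denominators σ² + λ(1-λ)δ'Cδ and σ² + λ(1-λ)δ'Cδ - (c_1'δ λ log λ)², on a grid of break fractions. The function names are hypothetical, and the limits are not divided by critical values here.

```python
import numpy as np

c1 = np.array([1.0, 1.0])                  # E[x_t] for x_t = [1, z_t]', z_t ~ N(1, 1)
C = np.array([[1.0, 1.0], [1.0, 2.0]])     # E[x_t x_t'] with E[z] = 1, E[z^2] = 2
sigma2 = 1.0


def limit_q_ols(delta, lam):
    """Probability limit of Q^ols / T under the fixed alternative."""
    dCd = delta @ C @ delta
    return dCd * lam**2 * (1 - lam) ** 2 / (sigma2 + lam * (1 - lam) * dCd)


def limit_q_rec(delta, lam):
    """Probability limit of Q^rec / T under the fixed alternative."""
    dCd = delta @ C @ delta
    l_log = lam * np.log(lam)
    return dCd * l_log**2 / (sigma2 + lam * (1 - lam) * dCd - (c1 @ delta * l_log) ** 2)


delta_orth = np.array([1.0, -1.0])         # c1' delta = 0
delta_non = np.array([1.0, 1.0])           # c1' delta = 2

lam_grid = np.linspace(0.05, 0.95, 91)
ols_curve = np.array([limit_q_ols(delta_orth, lam) for lam in lam_grid])
rec_curve = np.array([limit_q_rec(delta_orth, lam) for lam in lam_grid])
lam_at_ols_max = lam_grid[np.argmax(ols_curve)]
```

For the orthogonal shift, δ'Cδ = 1 and the OLS-based limit at λ = 0.5 is 0.0625/1.25 = 0.05; for δ = [1, 1]', δ'Cδ = 5 and the same limit is 0.3125/2.25, about 0.139. Both limits are strictly positive, which is the sense in which the quadratic-type tests retain power under an orthogonal shift.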

Figures 2.1(a) and (b) show the probability limits of the quadratic-type tests given by

(2.12) and (2.13), respectively divided by the corresponding critical values. We can see

that the limit of the OLS-based version is maximized at the midpoint, whereas that of the

recursive-based version is skewed to the right. As expected from the power analyses of

Ploberger and Kramer (1990) and Deng and Perron (2008), Qrec is more powerful than Qols

when the break occurs early in the sample, whereas the reversed relation is observed when λ is

closer to one. A similar tendency is observed for the maximum-type test, as shown in Figures

2.1(c) and (d). Figures 2.1(e) and (f) compare the two types of tests, the quadratic-type

and maximum-type tests based on the OLS method. However, neither version is uniformly

superior to the other because the powers depend on many factors such as the number of

regressors k and δ′Cδ, among others. For instance, in the case of k = 2, the quadratic-type

test outperforms the maximum-type test in our setting, as shown in Figure 2.1(e). However,

Figure 2.1(f) implies that the latter performs better in the case of k = 3.

2.3.3 Serially correlated errors

In practice, it is sometimes the case that the error term is not a martingale difference sequence,

but instead is serially correlated. As shown by, for example, Tang and MacNeill (1993), serial

correlation in the error term can produce striking effects on the distribution. Therefore, when

the error term is possibly serially correlated, we need to construct the test statistics by taking

serial correlation into account. In this case, we replace Assumption 2.3 with the following

assumption.

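Assumption 2.4 The following functional central limit theorem holds:

\[ \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} x_t u_t \Rightarrow \Omega^{1/2} W(r) \]

uniformly for 0 ≤ r ≤ 1, where $\Omega = \sum_{p=-\infty}^{\infty} \Gamma_p$ with $\Gamma_p = \mathrm{Cov}(x_t u_t, x_{t-p} u_{t-p})$.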

The assumptions made in this chapter are standard in the investigation of linear regression

models with possible structural changes. The conditions necessary for Assumption 2.4 to hold

are discussed in econometric and statistical textbooks (e.g., Davidson, 1994).

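The test statistics, Qols, Qrec, Mols, and M rec, are defined as before, with

\[ Q^{ols}(r) = \frac{1}{T} \left( \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right)' \hat{\Omega}^{-1} \left( \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right), \tag{2.21} \]

\[ Q^{rec}(r) = \frac{1}{T} \left( \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t \right)' \tilde{\Omega}^{-1} \left( \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t \right), \tag{2.22} \]

\[ \left[ M^{ols}_1(r), \cdots, M^{ols}_k(r) \right]' = \frac{1}{\sqrt{T}} \hat{\Omega}^{-1/2} \sum_{t=1}^{[Tr]} x_t \hat{u}_t, \tag{2.23} \]

\[ \left[ M^{rec}_1(r), \cdots, M^{rec}_k(r) \right]' = \frac{1}{\sqrt{T}} \tilde{\Omega}^{-1/2} \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t, \tag{2.24} \]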

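where $\hat{\Omega}$ and $\tilde{\Omega}$ are consistent estimators of Ω, based on $x_t \hat{u}_t$ and $x_t \tilde{u}_t$, respectively. In practice, it is often the case that Ω is estimated non-parametrically, such that

\[ \hat{\Omega} = \hat{\Gamma}_0 + \sum_{p=1}^{m} k(p,m) \left( \hat{\Gamma}_p + \hat{\Gamma}_p' \right) \quad \text{where} \quad \hat{\Gamma}_p = \frac{1}{T} \sum_{t=p+1}^{T} x_t x_{t-p}' \hat{u}_t \hat{u}_{t-p}, \]

k(p,m) = 1 − p/(m + 1) is the Bartlett kernel, and the bandwidth m is selected based on Andrews (1991), such that

\[ m = \left[ 1.1447 \times (a(\delta) T)^{1/3} \right] \quad \text{where} \quad a(\delta) = \frac{\sum_{j=1}^{k} 4 \hat{\rho}_j^2 \hat{\sigma}_j^4 / \left[ (1-\hat{\rho}_j)^6 (1+\hat{\rho}_j)^2 \right]}{\sum_{j=1}^{k} \hat{\sigma}_j^4 / (1-\hat{\rho}_j)^4} \]

with $\hat{\rho}_j$ obtained by regressing $w_{jt}$ on $w_{jt-1}$ and $\hat{\sigma}_j^2$ defined as before. Then, $\tilde{\Omega}$ is defined similarly by using the recursive residuals.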
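The Bartlett-kernel estimator and the bandwidth rule described above can be sketched as follows. This is an illustrative implementation, not the code used in the dissertation (which was written in GAUSS): the function names are hypothetical, the input `w` is assumed to hold the rows $w_t = x_t \hat{u}_t$, and $\hat{\sigma}_j^2$ is assumed to be the sample second moment of $w_{jt}$.

```python
import numpy as np

def andrews_bandwidth(w):
    """Bandwidth m = [1.1447 * (a * T)^(1/3)] from AR(1) fits to each column of w."""
    T, k = w.shape
    num, den = 0.0, 0.0
    for j in range(k):
        wj = w[:, j]
        # AR(1) coefficient from regressing w_jt on w_{jt-1} (no intercept)
        rho = (wj[1:] @ wj[:-1]) / (wj[:-1] @ wj[:-1])
        sig2 = np.mean(wj ** 2)  # assumption: sigma_j^2 is the sample second moment
        num += 4.0 * rho**2 * sig2**2 / ((1.0 - rho)**6 * (1.0 + rho)**2)
        den += sig2**2 / (1.0 - rho)**4
    a = num / den
    return int(np.floor(1.1447 * (a * T) ** (1.0 / 3.0)))

def bartlett_lrv(w, m=None):
    """Long-run variance estimate: Gamma_0 + sum_p k(p, m) (Gamma_p + Gamma_p')."""
    T, k = w.shape
    if m is None:
        m = andrews_bandwidth(w)
    m = min(m, T - 1)
    omega = w.T @ w / T                      # Gamma_0
    for p in range(1, m + 1):
        gamma_p = w[p:].T @ w[:-p] / T       # (1/T) sum_{t=p+1}^T w_t w_{t-p}'
        omega += (1.0 - p / (m + 1.0)) * (gamma_p + gamma_p.T)
    return omega, m
```

For example, with residuals `u_hat` and regressors `x` (both hypothetical arrays), `bartlett_lrv(x * u_hat[:, None])` returns the estimate and the bandwidth that was used.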

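Let us define γjj,p and Λjj,p as the probability limits of

\[ \frac{1}{T} \sum_{t=p+1}^{T} x_{jt} x_{jt-p} u_t u_{t-p} \xrightarrow{p} \gamma_{jj,p} \quad \text{and} \quad \frac{1}{T} \sum_{t=p+1}^{T} x_{jt} x_{jt-p} x_t x_{t-p}' \xrightarrow{p} \Lambda_{jj,p}. \]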

Theorem 2.3 Suppose Assumptions 2.1, 2.2, and 2.4 hold and that the quadratic-type and

maximum-type test statistics are constructed by using (2.21)–(2.24).

(a) Under the null hypothesis, Qols, Qrec, Mols, and M rec have the same limiting distributions

as those in Theorems 2.1(a) and 2.2(a).

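(b) Under the alternative hypothesis, if $\delta'(\Lambda_{jj,1} - \Lambda_{jj,0})\delta \to 0$ as $\|\delta\| \to \infty$ for some j, then

\[ T^{-2/3} Q^{ols} = O_p(\|\delta\|^{-4/3}), \quad T^{-2/3} Q^{rec} = O_p(\|\delta\|^{-4/3}), \]

\[ T^{-1/3} M^{ols} = O_p(\|\delta\|^{-2/3}), \quad T^{-1/3} M^{rec} = O_p(\|\delta\|^{-2/3}), \]

whereas if $\delta'(\Lambda_{jj,1} - \Lambda_{jj,0})\delta \not\to 0$ as $\|\delta\| \to \infty$ for all j, then

\[ T^{-2/3} Q^{ols} = O_p(1), \quad T^{-2/3} Q^{rec} = O_p(1), \]

\[ T^{-1/3} M^{ols} = O_p(1), \quad T^{-1/3} M^{rec} = O_p(1). \]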

We can see that δ′(Λjj,1−Λjj,0)δ → 0 if xt consists only of a constant (xt = 1). In this case,

the tests suffer from the so-called non-monotonic power problem, as investigated by Vogelsang

(1999). Several methods have been proposed to overcome this problem, including those of

Crainiceanu and Vogelsang (2007), Kejriwal (2009), Juhl and Xiao (2009), Shao and Zhang

(2010), Yang and Vogelsang (2011), and Yamazaki and Kurozumi (2015). Furthermore, even

if the above condition does not hold, note that the divergence rates of the test statistics are

reduced in the case of serially correlated errors compared with the case in Section 2.3.2. That

is, the modifications robust to serial correlation result in the reduction of power, as is often

observed in the literature.

2.4 Finite Sample Properties

In this section, we investigate the finite sample performance of the tests considered in this

chapter. The data-generating process (DGP) we consider is given by

yt = x′t(β + δ1(t>[Tλ])) + ut, ut = ρut−1 + ϵt,

where xt = [1, zt]′, β = [1, 1]′, and ϵt ∼ i.i.d.N(0, (1 − ρ)2). The settings for δ and λ are

explained later. The stochastic regressor zt is an AR(1) process with mean 1 and variance 1,

given by

zt = 0.5 + 0.5zt−1 + et, et ∼ i.i.d.N(0, 0.75),

where et is independent of ϵt. We set ρ = 0 to investigate the performance of the

truncated versions of the tests in Section 2.3.2 given in (2.18)–(2.20), while ρ = 0.4 and 0.8

are used for the tests robust to serial correlation developed in Section 2.3.3. The sample size

T is 100 and 200, the number of replications is 5,000, and all computations are conducted by

using the GAUSS matrix language.
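The data-generating process above can be sketched in Python as follows. This is only an illustration (the original computations were done in GAUSS); the function name, the burn-in length, and the initial values of z_t and u_t are assumptions, since the text does not specify how the AR(1) processes are initialized.

```python
import numpy as np

def simulate_dgp(T, delta, lam, rho, rng, burn=100):
    """Simulate y_t = x_t'(beta + delta 1(t > [T lam])) + u_t with AR(1) errors."""
    n = T + burn                                     # burn-in is an assumption
    e = rng.normal(0.0, np.sqrt(0.75), size=n)       # e_t ~ N(0, 0.75)
    eps = rng.normal(0.0, abs(1.0 - rho), size=n)    # eps_t ~ N(0, (1 - rho)^2)
    z = np.empty(n)
    u = np.empty(n)
    z_prev, u_prev = 1.0, 0.0                        # assumed starting values
    for t in range(n):
        z_prev = 0.5 + 0.5 * z_prev + e[t]           # z_t = 0.5 + 0.5 z_{t-1} + e_t
        u_prev = rho * u_prev + eps[t]               # u_t = rho u_{t-1} + eps_t
        z[t], u[t] = z_prev, u_prev
    z, u = z[burn:], u[burn:]
    x = np.column_stack([np.ones(T), z])             # x_t = [1, z_t]'
    beta = np.array([1.0, 1.0])
    t_idx = np.arange(1, T + 1)
    after_break = (t_idx > int(np.floor(T * lam)))[:, None]
    coef = beta + after_break * np.asarray(delta, dtype=float)[None, :]
    y = np.sum(x * coef, axis=1) + u
    return y, x
```

A call such as `simulate_dgp(100, [1.0, 1.0], 0.5, 0.4, np.random.default_rng(0))` would generate one sample with a non-orthogonal break at the midpoint.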

We first investigate the finite sample performance of the tests in Section 2.3.2 with ρ = 0.

Panel (a) of Table 2.2 shows that the sizes of all the tests are relatively well controlled,

although they tend to be slightly conservative. Because the empirical sizes of the tests are

different, we investigate the finite sample properties of the tests under the alternative using the

size-adjusted powers. We set a one-time shift in the coefficient to δ = b[1, 1]′ (non-orthogonal

change with c′1δ ≠ 0) and δ = b[−1, 1]′ (orthogonal change with c′1δ = 0). Here, the magnitude

of the change is controlled by b = 0, 0.5, 1.0, 1.5, and 2.0, and the break fraction λ is set

to 0.5. Figures 2.2(a) and (b) show that the difference in power is relatively small among

the three tests based on the same (OLS or recursive) residuals when c′1δ ≠ 0. However, as

shown in Figures 2.2(c) and (d), when c′1δ = 0, the modified tests are more powerful than the

original tests. When we focus on either the quadratic-type or the maximum-type tests, the

OLS-based test is more powerful than the recursive-based test. We also investigate the effect

of the location of the break on the performance of the quadratic-type and maximum-type

tests by changing λ from 0.2 to 0.8. Figures 2.2(e)–(h) show that the effect of the location

of the change in finite samples is consistent with the theoretical result given in Section 2.3.

For example, the modified tests using the OLS residuals are maximized at λ = 0.5. For an

early break, the tests using recursive residuals outperform those using OLS residuals, and

vice versa, for a late break.

In the case where the error term may be serially correlated, we should use the tests

proposed in Section 2.3, the empirical sizes of which are summarized in Panels (b) and (c) of

Table 2.2. We can see that the tests based on the OLS residuals tend to suffer from under-size

distortion, particularly when serial correlation is strong (ρ = 0.8). With regard to power in

the case of ρ = 0.4, the relative performance of the tests seems to be preserved, although

Figures 2.3(a) and (b) show that the tests suffer from the so-called non-monotonic power.2

Figures 2.3(g) and (h) show that the effect of the location of a change on the tests is similar

to the case of serially uncorrelated errors. We obtain a similar tendency in the case of ρ = 0.8

and, thus, omit the details.

When detecting non-orthogonal structural changes, the classic CUSUM test is slightly

more powerful than the modified ones in some cases. In applications, when the classic CUSUM test

rejects the null hypothesis but the modified one does not, we can

estimate the value of c′1δ (the angle between the mean regressor and the direction of the break)

by using real data. If the estimated value of c′1δ is far from zero, then we may rely on the

result based on the classic CUSUM test.

2Because the main purpose of this study is to investigate the modified CUSUM tests, developed to overcome the loss of power caused by an orthogonal shift of the parameters, we do not pursue this problem further here.

2.5 Conclusion

When a structural change is orthogonal to the mean of the regressors, standard CUSUM tests

lose power. As a result, several modified tests have been proposed, explicitly or implicitly, in

the literature. We investigated the asymptotic properties of such modified tests and found

that they can successfully reject the null hypothesis, even in the case of an orthogonal struc-

tural change. In this sense, the modified tests complement the standard test in theoretical

analyses. In this study, we focused on a fixed alternative hypothesis. We can investigate the

power properties of the modified CUSUM tests from a local perspective in future works.

Appendix A

Proof of Proposition 2.1. (a) In the case of the OLS residuals, from the functional central limit theorem, weak law of large numbers, and continuous mapping theorem, we have

\[ \sqrt{T} (\hat{\beta} - \beta) \Rightarrow C^{-1} B(1), \tag{A.1} \]

where B(r) = [B1(r), · · · , Bk(r)]′ for 0 ≤ r ≤ 1 is a k-dimensional Brownian motion with variance σ2C. Since $\hat{u}_t = u_t - x_t'(\hat{\beta} - \beta)$, we have

\[ \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} w_{jt} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} x_{jt} u_t - \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} x_{jt} x_t' (\hat{\beta} - \beta) \Rightarrow B_j(r) - r c_j' C^{-1} B(1) = B_j(r) - r B_j(1), \]

where the last equality holds because $c_j' C^{-1} = [0, \cdots, 0, 1, 0, \cdots, 0]$. Similarly, we have

\[ \hat{\sigma}_j^2 = \frac{1}{T} \sum_{t=1}^{T} x_{jt}^2 u_t^2 + o_p(1) \xrightarrow{p} \sigma^2 c_{jj}. \]

Since $B_j(r) - r B_j(1) \stackrel{d}{=} \sigma c_{jj}^{1/2} BB_j(r)$, where $BB_j(r)$ is a standard Brownian bridge, we obtain (2.5).

For the CUSUM test based on the recursive residuals, following Kramer, Ploberger, and Alt (1988), we have

\[ \frac{1}{\sqrt{T-k}} \sum_{t=k+1}^{[Tr]} w_{jt} = \frac{1}{\sqrt{T-k}} \sum_{t=k+1}^{[Tr]} x_{jt} \tilde{u}_t \Rightarrow \sigma c_{jj}^{1/2} W_j(r), \]

and $T^{-1} \sum_{t=1}^{T} w_{jt}^2 \xrightarrow{p} \sigma^2 c_{jj}$, where $W_j(r)$ for 0 ≤ r ≤ 1 is a standard Brownian motion. We then obtain (2.6).

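(b) Because $\hat{\beta}$ is expressed as (2.2) under the alternative, we have

\[ \hat{\beta} - \beta \xrightarrow{p} (1 - \lambda) \delta. \]

Then, by using expression (2.3), the numerator of the test statistic becomes

\[ \frac{1}{T} \sum_{t=1}^{[Tr]} w_{jt} = \frac{1}{T} \sum_{t=1}^{[Tr]} x_{jt} u_t + \frac{1}{T} \sum_{t=1}^{[Tr]} x_{jt} x_t' \delta 1_{(t > [T\lambda])} - \frac{1}{T} \sum_{t=1}^{[Tr]} x_{jt} x_t' (\hat{\beta} - \beta) \]

\[ \xrightarrow{p} 0 + c_j' \delta (r - \lambda) 1_{(r > \lambda)} - r c_j' (1 - \lambda) \delta = c_j' \delta \left[ (r - \lambda) 1_{(r > \lambda)} - r (1 - \lambda) \right] \tag{A.2} \]

uniformly over 0 ≤ r ≤ 1, the absolute value of which is maximized at r = λ, where it is equal to $|c_j' \delta| \lambda (1 - \lambda)$. Similarly, we have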

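\[ \frac{1}{T} \sum_{t=1}^{T} w_{jt}^2 = \frac{1}{T} \sum_{t=1}^{T} x_{jt}^2 u_t^2 + \frac{1}{T} \sum_{t=1}^{T} x_{jt}^2 \left( x_t' \delta 1_{(t > [T\lambda])} \right)^2 + \frac{1}{T} \sum_{t=1}^{T} x_{jt}^2 \left( x_t' (\hat{\beta} - \beta) \right)^2 \]

\[ + \frac{2}{T} \sum_{t=1}^{T} x_{jt}^2 u_t x_t' \delta 1_{(t > [T\lambda])} - \frac{2}{T} \sum_{t=1}^{T} x_{jt}^2 u_t x_t' (\hat{\beta} - \beta) - \frac{2}{T} \sum_{t=1}^{T} x_{jt}^2 1_{(t > [T\lambda])} \delta' x_t x_t' (\hat{\beta} - \beta) \]

\[ \xrightarrow{p} \sigma^2 c_{jj} + (1 - \lambda) \delta' \Lambda_{jj,0} \delta + (1 - \lambda)^2 \delta' \Lambda_{jj,0} \delta + 0 - 0 - 2 (1 - \lambda)^2 \delta' \Lambda_{jj,0} \delta \]

\[ = \sigma^2 c_{jj} + \lambda (1 - \lambda) \delta' \Lambda_{jj,0} \delta. \tag{A.3} \]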

From (A.2) and (A.3), we obtain (2.7).
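The two elementary facts used in (A.2) and (A.3) can be checked numerically. The sketch below (with an arbitrary illustrative value λ = 0.3 and hypothetical names) evaluates the bracketed term of (A.2) on a grid to locate its maximizer and verifies that the coefficients in (A.3) collapse to λ(1 − λ).

```python
import numpy as np

def ols_drift(r, lam):
    """Bracketed term in (A.2): (r - lam) 1(r > lam) - r (1 - lam)."""
    return (r - lam) * (r > lam) - r * (1.0 - lam)

lam = 0.3                                  # illustrative break fraction
r_grid = np.linspace(0.0, 1.0, 1001)
abs_drift = np.abs(ols_drift(r_grid, lam))

r_star = r_grid[np.argmax(abs_drift)]      # grid maximizer of |drift|
max_value = abs_drift.max()                # compare with lam * (1 - lam)

# Coefficient on delta' Lambda_{jj,0} delta in (A.3):
# (1 - lam) + (1 - lam)^2 - 2 (1 - lam)^2
coef_A3 = (1 - lam) + (1 - lam) ** 2 - 2 * (1 - lam) ** 2
```

The drift is piecewise linear in r, decreasing on [0, λ] and increasing on [λ, 1], which is why its absolute value peaks at the break fraction.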

The proof in the case with the recursive residuals is analogous to the OLS case. We first note that because $f_t \xrightarrow{p} 1$, $T^{-1} \sum_{t=1}^{T} (1/f_t - 1) x_t x_t' \xrightarrow{p} 0$, which implies that $T^{-1} \sum_{t=1}^{T} x_t x_t' / f_t \xrightarrow{p} C$. This result is used repeatedly in our proofs below.

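Since the recursive residuals are written under the alternative as $\tilde{u}_t = [u_t + x_t' \delta 1_{(t > [T\lambda])} - x_t' (\hat{\beta}_{t-1} - \beta)] / f_t$ and $\hat{\beta}_{t-1} - \beta = \left( \sum_{s=1}^{t-1} x_s x_s' \right)^{-1} \left( \sum_{s=1}^{t-1} x_s x_s' \delta 1_{(s > [T\lambda])} + \sum_{s=1}^{t-1} x_s u_s \right)$, $w_{jt}$ can be expressed as

\[ w_{jt} = \frac{1}{f_t} x_{jt} u_t + \frac{1}{f_t} x_{jt} x_t' \delta 1_{(t > [T\lambda])} - \frac{1}{f_t} x_{jt} x_t' \left( \sum_{s=1}^{t-1} x_s x_s' \right)^{-1} \left( \sum_{s=1}^{t-1} x_s x_s' \delta 1_{(s > [T\lambda])} + \sum_{s=1}^{t-1} x_s u_s \right). \]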

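Then, we have

\[ \frac{1}{T} \sum_{t=1}^{[Tr]} w_{jt} \xrightarrow{p} 0 + c_j' \int_0^r \delta 1_{(v > \lambda)} dv - c_j' \int_0^r \left[ (vC)^{-1} \int_0^v C \delta 1_{(w > \lambda)} dw \right] dv = c_j' \delta \lambda [\log(r) - \log(\lambda)] 1_{(r > \lambda)} \tag{A.4} \]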

uniformly over 0 ≤ r ≤ 1. By using standard calculus, we can show that |c′jδ|λ[log(r) −

log(λ)]1(r>λ)/(1 + 2r) takes the maximum value |c′jδ|q, where q is defined as (2.9). Further-

more, similarly to (A.3), we have

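\[ \frac{1}{T} \sum_{t=1}^{T} w_{jt}^2 \xrightarrow{p} \sigma^2 c_{jj} + (1 - \lambda) \delta' \Lambda_{jj,0} \delta + \int_0^1 \mathrm{tr} \left[ \frac{1}{v} C^{-1} \left( \int_0^v C \delta 1_{(w > \lambda)} dw \right) \left( \int_0^v C \delta 1_{(w > \lambda)} dw \right)' \frac{1}{v} C^{-1} \Lambda_{jj,0} \right] dv \]

\[ + 0 - 0 - 2 \int_0^1 \mathrm{tr} \left[ \delta 1_{(v > \lambda)} \int_0^v \delta' 1_{(w > \lambda)} dw \, \frac{1}{v} \Lambda_{jj,0} \right] dv \]

\[ = \sigma^2 c_{jj} + \lambda (1 - \lambda) \delta' \Lambda_{jj,0} \delta. \tag{A.5} \]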

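From (A.4) and (A.5), we can see that

\[ \tilde{\sigma}_j^2 \xrightarrow{p} \sigma^2 c_{jj} + \lambda (1 - \lambda) \delta' \Lambda_{jj,0} \delta - \left( c_j' \delta \lambda \log(\lambda) \right)^2. \]

We then obtain (2.8).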
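The closed form in (A.4) follows because $(vC)^{-1} \int_0^v C \delta 1_{(w > \lambda)} dw = \delta (v - \lambda)/v$ for v > λ, so that the scalar factor multiplying $c_j' \delta$ is $(r - \lambda) - \int_\lambda^r (v - \lambda)/v \, dv$. A minimal numerical check of this step, with hypothetical function names and a simple trapezoid rule, is:

```python
import numpy as np

def recursive_drift_numeric(r, lam, n=20001):
    """(r - lam) minus the integral over [lam, r] of (v - lam)/v, for r > lam."""
    if r <= lam:
        return 0.0
    v = np.linspace(lam, r, n)
    f = (v - lam) / v
    integral = np.sum((f[1:] + f[:-1]) * np.diff(v)) / 2.0  # trapezoid rule
    return (r - lam) - integral

def recursive_drift_closed(r, lam):
    """Closed form in (A.4): lam [log(r) - log(lam)] 1(r > lam)."""
    return lam * (np.log(r) - np.log(lam)) if r > lam else 0.0
```

Both functions return the scalar factor only; multiplying by $c_j' \delta$ recovers the limit in (A.4).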

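Proof of Theorem 2.1. (a) From (A.1), we have

\[ \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \Rightarrow B(r) - r B(1), \]

and thus

\[ \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right)' \left( \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \right)^{-1} \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right) \Rightarrow \sigma^2 \|BB(r)\|^2. \]

Since $\hat{\sigma}^2 \xrightarrow{p} \sigma^2$ under the null hypothesis, (2.10) is obtained.

The null limiting distribution of Qrec can be derived similarly.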

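(b) In the same way as (A.2), we have

\[ \frac{1}{T} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \xrightarrow{p} C \delta \left[ (r - \lambda) 1_{(r > \lambda)} - r (1 - \lambda) \right] \tag{A.6} \]

uniformly over 0 ≤ r ≤ 1, and, thus,

\[ \left( \frac{1}{T} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right)' \left( \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \right)^{-1} \left( \frac{1}{T} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right) \xrightarrow{p} \delta' C \delta \left[ (r - \lambda) 1_{(r > \lambda)} - r (1 - \lambda) \right]^2 \]

uniformly over 0 ≤ r ≤ 1, which achieves a maximum at r = λ, while $\hat{\sigma}^2 \xrightarrow{p} \sigma^2 + \lambda (1 - \lambda) \delta' C \delta$, as proved by Deng and Perron (2008). We then obtain (2.12).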

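In the case of the recursive residuals, in the same way as in (A.4), we have

\[ \frac{1}{T} \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t \xrightarrow{p} C \delta \lambda \left( \log(r) - \log(\lambda) \right) 1_{(r > \lambda)} \]

uniformly over 0 ≤ r ≤ 1, and, thus,

\[ \left( \frac{1}{T} \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t \right)' \left( \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \right)^{-1} \left( \frac{1}{T} \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t \right) \xrightarrow{p} \delta' C \delta \left[ \lambda (\log(r) - \log(\lambda)) 1_{(r > \lambda)} \right]^2, \]

while $\tilde{\sigma}^2 \xrightarrow{p} \sigma^2 + \lambda (1 - \lambda) \delta' C \delta - (c_1' \delta \lambda \log(\lambda))^2$ is shown by Deng and Perron (2008), which implies (2.13).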

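Proof of Theorem 2.2. (a) Because

\[ \left[ M^{ols}_1, \cdots, M^{ols}_k \right]' = \frac{1}{\hat{\sigma}} \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1/2} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \xrightarrow{d} BB(r), \]

(2.14) is obtained. (2.15) can be proved similarly to (2.11).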

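(b) Under the alternative, we have

\[ \frac{1}{\sqrt{T}} \left[ M^{ols}_1, \cdots, M^{ols}_k \right]' \xrightarrow{p} \frac{C^{1/2} \delta \left[ (r - \lambda) 1_{(r > \lambda)} - r (1 - \lambda) \right]}{\sqrt{\sigma^2 + \lambda (1 - \lambda) \delta' C \delta}} \]

and, therefore,

\[ \frac{1}{\sqrt{T}} \left[ \sup_{0 \le r \le 1} |M^{ols}_1|, \cdots, \sup_{0 \le r \le 1} |M^{ols}_k| \right]' \xrightarrow{p} \frac{1}{\sqrt{\sigma^2 + \lambda (1 - \lambda) \delta' C \delta}} \left[ |v_1| \lambda (1 - \lambda), \cdots, |v_k| \lambda (1 - \lambda) \right]', \]

where vj is defined as in Theorem 2.2. We then have

\[ \frac{1}{\sqrt{T}} \max_{1 \le j \le k} \sup_{0 \le r \le 1} \left| M^{ols}_j(r) \right| \xrightarrow{p} \frac{\max_{1 \le j \le k} \{ |v_1|, \cdots, |v_k| \} \lambda (1 - \lambda)}{\sqrt{\sigma^2 + \lambda (1 - \lambda) \delta' C \delta}}. \]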

For the recursive residuals, we can prove (2.17) similarly to the proof of Theorem 2.1(b).

Proof of Theorem 2.3. (a) The null limiting distributions can be obtained in the same

way as in Theorems 2.1(a) and 2.2(a).

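(b) We first derive the divergence rate of the bandwidth m. Because $\hat{u}_t$ is expressed as (2.3), in the same way as (A.3), we have

\[ \frac{1}{T} \sum_{t=p+1}^{T} w_{jt} w_{jt-p} \xrightarrow{p} \gamma_{jj,p} + \lambda (1 - \lambda) (\delta' \Lambda_{jj,p} \delta) \tag{A.7} \]

for a given p. Then, we have

\[ \hat{\rho}_j \xrightarrow{p} \frac{\gamma_{jj,1} + \lambda (1 - \lambda) \delta' \Lambda_{jj,1} \delta}{\gamma_{jj,0} + \lambda (1 - \lambda) \delta' \Lambda_{jj,0} \delta}, \]

and, hence,

\[ 1 - \hat{\rho}_j \xrightarrow{p} \frac{\gamma_{jj,0} - \gamma_{jj,1} + \lambda (1 - \lambda) \delta' (\Lambda_{jj,0} - \Lambda_{jj,1}) \delta}{\gamma_{jj,0} + \lambda (1 - \lambda) \delta' \Lambda_{jj,0} \delta} = \begin{cases} O_p(\|\delta\|^{-2}) & : \delta' (\Lambda_{jj,0} - \Lambda_{jj,1}) \delta \to 0 \\ O_p(1) & : \delta' (\Lambda_{jj,0} - \Lambda_{jj,1}) \delta \not\to 0. \end{cases} \]

Because $\hat{\sigma}_j^2 = O_p(\|\delta\|^2)$ by (A.3), we can see that $a(\delta) = O_p(\|\delta\|^4)$ if $\delta' (\Lambda_{jj,0} - \Lambda_{jj,1}) \delta \to 0$ for some j, and is $O_p(1)$ otherwise. Hence,

\[ m = \begin{cases} O_p(\|\delta\|^{4/3} T^{1/3}) & : \delta' (\Lambda_{jj,0} - \Lambda_{jj,1}) \delta \to 0 \ \exists j \\ O_p(T^{1/3}) & : \delta' (\Lambda_{jj,0} - \Lambda_{jj,1}) \delta \not\to 0 \ \forall j. \end{cases} \]

By using this result, we next derive the divergence order of $\hat{\Omega}$. In the same way as (A.7),

\[ \hat{\Gamma}_p \xrightarrow{p} \Gamma_p + \lambda (1 - \lambda) \, \mathrm{plim} \frac{1}{T} \sum_{t=p+1}^{T} x_t x_t' \delta \delta' x_{t-p} x_{t-p}' = O_p(\|\delta\|^2). \]

Then, because $\sum_{p=1}^{m} k(p,m) = O(m)$,

\[ \|\hat{\Omega}\| = \left\| \hat{\Gamma}_0 + \sum_{p=1}^{m} k(p,m) (\hat{\Gamma}_p + \hat{\Gamma}_p') \right\| \le O(m) O_p(\|\delta\|^2) = \begin{cases} O_p(\|\delta\|^{10/3} T^{1/3}) & : \delta' (\Lambda_{jj,0} - \Lambda_{jj,1}) \delta \to 0 \ \exists j \\ O_p(\|\delta\|^2 T^{1/3}) & : \delta' (\Lambda_{jj,0} - \Lambda_{jj,1}) \delta \not\to 0 \ \forall j. \end{cases} \tag{A.8} \]

Because it can be shown that (A.6) holds under Assumption 2.4, we have that

\[ Q^{ols} = \begin{cases} O_p(\|\delta\|^{-4/3} T^{2/3}) & : \delta' (\Lambda_{jj,0} - \Lambda_{jj,1}) \delta \to 0 \ \exists j \\ O_p(T^{2/3}) & : \delta' (\Lambda_{jj,0} - \Lambda_{jj,1}) \delta \not\to 0 \ \forall j. \end{cases} \]

Thus, we obtain the result.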

The order of Mols is obtained similarly by using (A.6) and (A.8).

Because we can obtain the results of Qrec and M rec in the same way as in the case of the

OLS residuals, we omit the proof here.

Table 2.1: Asymptotic critical values

Quadratic

(a) r ∈ [0, 1]: 10.406, 12.553, 10.738, 14.486, 10.522, 12.340, 16.312

(b) r ∈ [0.15, 0.85]: 10.680, 12.298, 10.489, 13.886

Table 2.2: Empirical sizes under H0

                     CUSUM          Quadratic           Max
                   ols    rec      ols    rec      ols    rec
(a) ρ = 0
    T = 100      0.040  0.041    0.033  0.039    0.032  0.040
    T = 200      0.042  0.045    0.037  0.039    0.040  0.040
(b) ρ = 0.4
    T = 100      0.037  0.056    0.019  0.059    0.020  0.058
    T = 200      0.052  0.056    0.033  0.057    0.034  0.054
(c) ρ = 0.8
    T = 100      0.011  0.050    0.002  0.072    0.002  0.072
    T = 200      0.029  0.047    0.006  0.057    0.009  0.062

Figure 2.1: Asymptotic power properties. Panels: (a) and (b) Quadratic of ols and rec, for the two choices of δ; (c) and (d) Max of ols and rec, for the two choices of δ; (e) Quadratic and Max of ols (k = 2); (f) Quadratic and Max of ols (k = 3). The horizontal axis ranges from 0 to 1.

Figure 2.2: Size-adjusted powers (ρ = 0). Panels (a)–(d) are plotted against the magnitude of the change (0 to 2): (a) T = 100, c′1δ = 1; (b) T = 200, c′1δ = 1; (c) T = 100, c′1δ = 0; (d) T = 200, c′1δ = 0. Panels (e)–(h) are plotted against the location of the change (0.2 to 0.8): (e) T = 100, c′1δ = 1; (f) T = 200, c′1δ = 1; (g) T = 100, c′1δ = 0; (h) T = 200, c′1δ = 0. Legend: CUSUM-ols, CUSUM-rec, Q-ols, Q-rec, M-ols, M-rec.

Figure 2.3: Size-adjusted powers (ρ = 0.4). Panels (a)–(d) are plotted against the magnitude of the change (0 to 2): (a) T = 100, c′1δ = 1; (b) T = 200, c′1δ = 1; (c) T = 100, c′1δ = 0; (d) T = 200, c′1δ = 0. Panels (e)–(h) are plotted against the location of the change (0.2 to 0.8): (e) T = 100, c′1δ = 1; (f) T = 200, c′1δ = 1; (g) T = 100, c′1δ = 0; (h) T = 200, c′1δ = 0. Legend: CUSUM-ols, CUSUM-rec.

Chapter 3

A Sequential Test for Structural Changes in Models with a Trend

We develop a CUSUM-type monitoring procedure based on the ordinary least squares resid-

uals for detecting structural changes in models with a trend. A proper boundary function is

designed to control the size. We derive the limiting null distribution and the consistency of

the procedure under the alternative. In addition, we derive the asymptotic distribution of

the delay time for the CUSUM procedure as well as the fluctuation procedure proposed by Qi

et al. (2016). The simulation and empirical results indicate that although neither procedure

is uniformly superior to the other, the CUSUM test is more suitable for an early break.1

3.1 Introduction

The first contribution to continuously monitoring parameter changes in the econometrics

literature was by Chu et al. (1996). They introduced a monitoring scheme by setting a

training period of size m in which the parameters are known to be stable as a reference

for comparison with new data and argued that the key feature of the sequential tests is to

construct a nondecreasing boundary function such that the tests can maintain a proper size.

This approach has been developed in many directions. Leisch et al. (2000) extended the

fluctuation test of Chu et al. (1996) based on moving estimates, with the boundary function

having a slower growth rate to improve the sensitivity to a late break in a monitoring period.

The MOSUM (moving sum) procedure was further investigated by Horvath et al. (2008), who

indicated that prior information on the moment structure of innovations is required to choose

a suitable boundary function. Horvath et al. (2004) discussed two classes of the residual-based

1The published version is Jiang and Kurozumi (2020). Monitoring parameter changes in models with a trend. Journal of Statistical Planning and Inference 207, 288-319. https://doi.org/10.1016/j.jspi.2020.01.004

cumulative sum monitoring procedure with an infinite monitoring horizon and introduced an

appropriate boundary function with the parameter γ ∈ [0, 1/2) to deal with different timings

of changes. Since the speed of detection is a crucial measure, Aue and Horvath (2004) derived

the limit distribution of the stopping time for a changing mean model, which is asymptotically

normal, while Aue et al. (2009) extended a local-level model to a linear regression model. They

found that γ close to 1/2 implies a shorter detection delay for an early break. Horvath et al.

(2007), Aue et al. (2008b), and Aue and Kuhn (2008) further investigated the behaviors of

the delay time in the case of γ = 1/2. Following the work of Aue and Horvath (2004), Fremdt

(2014, 2015) derived the asymptotic distribution of Page’s sequential CUSUM procedure and

compared the asymptotic normality of the stopping time with that of the ordinary CUSUM

version under a weaker condition on the change. Furthermore, the monitoring procedure

for sequentially detecting parameter instability has been investigated extensively in various

models. For example, Carsoule and Franses (2003) and Lee et al. (2009) developed sequential

tests in autoregressive models, while Na et al. (2011) applied the monitoring procedure to

detect changes in the autocorrelation function, parameter instability in GARCH models, and

distributional changes. Xia et al. (2011) and Kurozumi (2017) considered a monitoring scheme

for linear models with endogenous regressors.

All the aforementioned sequential tests focus on models with nontrending regressors.

However, as noted by Perron (1989) and others, macroeconomic time series are sometimes

better characterized by trend-stationary series with possible change(s) in deterministics. Such

evidence with an upward or downward trend has also been found in the fields of tourism,

marketing, and environmental studies. For models with trending regressors, Chu and White

(1992), Kuan (1998), and Aue et al. (2008a) among others proposed tests of parameter in-

stability based on a given historical sample, while Qi et al. (2016) extended the generalized

fluctuation test to monitor structural changes in polynomial regressions.

In this chapter, we develop a CUSUM-type monitoring scheme based on ordinary least

squares residuals to detect parameter instability in a model with a trend. A new boundary

function is introduced to maintain a proper size. We derive the limit distribution of the

CUSUM detecting statistic under the null hypothesis, while proving that the test is consistent

under the alternative. We also extend the CUSUM monitoring procedure to models with

higher order polynomial trends. In addition, we derive the asymptotic distribution of the

delay time for the CUSUM procedure as well as the fluctuation procedure proposed by Qi

et al. (2016). We find that the delay time of the CUSUM test grows at a slower rate than

that of the fluctuation test, which implies that the latter requires a longer time to detect an

early change than the former. Then, we compare the CUSUM and fluctuation tests in a small

simulation study and apply them to macroeconomic time series. The results confirm that

the performance of the tests strongly depends on the timing of changes. The CUSUM test is

good at detecting an early change soon after the training period and has a shorter detection

time than the fluctuation test, while the fluctuation test is suitable for a late break.

The remainder of this chapter is as follows. In Section 3.2, we introduce the model and

our assumptions. The asymptotic properties of the test are investigated in Section 3.3 and

we extend the CUSUM monitoring procedure to models with higher order polynomial trends.

Section 3.4 investigates the asymptotic distribution of delay times. Then, we compare the

CUSUM and fluctuation tests in finite samples via Monte Carlo simulations in Section 3.5.

Section 3.6 provides an empirical example and concluding remarks are given in Section 3.7.

The mathematical proofs are relegated to Appendices B and C.

3.2 Model and Assumptions

We consider the following model:

yt = x′tβt + ϵt (t = 1, 2, · · · ,m,m+ 1, · · · ), (3.1)

where xt = [1, t/m]′ is a regressor including a constant term and a trend, ϵt is an unobservable

stochastic disturbance, and βt = [β0t, β1t]′ is a vector of the coefficients. Our asymptotic

results do not change if the regressor is replaced by xt = [1, t].

The “noncontamination assumption”, as noted by Chu et al. (1996), is particularly im-

portant, and we suppose that there is no change in the training period of size m, that is,

βt = β0, t = 1, 2, · · · ,m.

The historical data are set as a reference to compare with the new data.

We are interested in testing the null hypothesis that βt is stable against the alternative of a one-time

change in the parameters. Thus, we consider the testing problem given by

H0 : βt = β0, t = m+ 1,m+ 2, · · ·

against the alternative hypothesis

H1 : There is k∗ ≥ 1 such that βt = β0, t = m+ 1,m+ 2, · · · ,m+ k∗ − 1,

but βt = β0 +∆, t = m+ k∗,m+ k∗ + 1, · · · with ∆ = [∆1,∆2]′.

We reject the null hypothesis if the detecting statistic (detector) Γ(m, k) exceeds a bound-

ary function g(m, k) for some k ≥ 1. The detector and boundary function must be designed

such that

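\[ \lim_{m \to \infty} P(\tau_m < \infty) = \alpha, \quad \text{under } H_0, \tag{3.2} \]

\[ \lim_{m \to \infty} P(\tau_m < \infty) = 1, \quad \text{under } H_1, \tag{3.3} \]

where the stopping time τm is defined by

\[ \tau_m = \begin{cases} \inf \{ k \ge 1 : |\Gamma(m,k)| \ge g(m,k) \}, \\ \infty, \quad \text{if } |\Gamma(m,k)| < g(m,k) \text{ for all } k = 1, 2, \cdots. \end{cases} \]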

Condition (3.2) ensures that the probability of a false alarm is given by α, while Condition

(3.3) means that we reject the hypothesis of no change with a probability approaching one

under the alternative.

To investigate the asymptotic properties of the monitoring test, we impose the following

assumption.

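Assumption 3.1 For every m, there are two independent sequences of Wiener processes $\{W_{1,m}(t), t \ge 0\}$, $\{W_{2,m}(t), t \ge 0\}$ and a constant σ > 0 such that, for some ν > 2,

\[ \sup_{1 \le k < \infty} \frac{1}{k^{1/\nu}} \left| \sum_{t=m+1}^{m+k} \epsilon_t - \sigma W_{1,m}(k) \right| = O_p(1), \tag{3.5} \]

\[ \sum_{t=1}^{m} \epsilon_t - \sigma W_{2,m}(m) = o_p(m^{1/\nu}). \tag{3.6} \]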

The sequence ϵt satisfying Conditions (3.5) and (3.6) includes not only an i.i.d. sequence but also a dependent sequence with some regularity conditions, as discussed by Aue and Horvath (2004). For example, if a sequence ϵt is generated by $\epsilon_t = \sum_{j=0}^{\infty} c_j \delta_{t-j}$, where δj are i.i.d. random variables with mean 0, variance σ2, and E|δt|ν < ∞ for some ν > 2 and if δt has a smooth density and cj satisfies some regularity conditions given by Horvath (1997), then Assumption 3.1 holds as given by Example 2.2 of Aue and Horvath (2004). From Conditions (3.5) and (3.6), we can derive the following approximation.

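\[ \sup_{1 \le m < \infty} \frac{1}{m^{1/\nu}} \left| \sum_{t=1}^{m} \left( \frac{t}{m} \right)^i \epsilon_t - \sigma \int_0^m \left( \frac{x}{m} \right)^i dW_{2,m}(x) \right| = O_p(1), \quad i = 1, 2, \ldots, p. \tag{3.7} \]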

See, for example, Aue et al. (2008a).

3.3 Monitoring Procedure for a Change in the Trend

3.3.1 CUSUM-based monitoring procedure

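The CUSUM procedure is based on the future residuals of the model given by

\[ \hat{\epsilon}_t = y_t - x_t' \hat{\beta}_m, \tag{3.8} \]

where $\hat{\beta}_m$ is an OLS estimator from the historical data given by

\[ \hat{\beta}_m = \left( \sum_{t=1}^{m} x_t x_t' \right)^{-1} \sum_{t=1}^{m} x_t y_t. \]

The CUSUM detector is defined by

\[ \Gamma(m,k) = \frac{1}{\hat{\sigma}_m} \tilde{\Gamma}(m,k) \quad \text{where} \quad \tilde{\Gamma}(m,k) = \sum_{t=m+1}^{m+k} \hat{\epsilon}_t \]

for k = 1, 2, . . ., where $\hat{\sigma}_m^2$ is a consistent estimator of σ2 obtained from the training period.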

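Letting k/m = λ, (B.9) in the proof of Theorem 3.1(i) provides the asymptotic distribution of the detector as follows:

\[ \frac{1}{\hat{\sigma}_m \sqrt{m}} \sum_{t=m+1}^{m+k} \hat{\epsilon}_t \Rightarrow (\lambda + 1) W_1 \left( \frac{\lambda}{\lambda + 1} \right) + \sqrt{3} \lambda (\lambda + 1) W_2(1) =: G(\lambda), \tag{3.9} \]

where W1(·) and W2(·) are independent Wiener processes. This process has zero mean and a growing variance

\[ 3 \lambda^2 (\lambda + 1)^2 + \lambda (\lambda + 1), \tag{3.10} \]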

which implies that a constant boundary cannot be used for the monitoring procedure because

the detecting statistic will eventually exceed a constant boundary and the null hypothesis

will be rejected with a probability approaching one even if the parameter is stable.

As noted by Chu et al. (1996), the growing variance of the asymptotic distribution of the

detector induces an increasing monitoring boundary. The boundary function cannot grow at

a too slow rate because the monitoring procedure will have a high probability of type one

error, while a boundary function with a too fast growth rate will result in the low power of

the test. The distribution of G(λ) shows that the first term is dominated by the second one

as λ increases, which determines the growth rate of the limiting process, and this enables us

to find a suitable form of the boundary function. Based on the boundary function proposed

by Horvath et al. (2004), we also allow for a flexible adjustment of the test to deal with

an early break in the monitoring period in terms of the parameter γ ∈ (0, 1/2). Thus, we design the boundary function such that it grows at the rate $\sqrt{3} \lambda^{\gamma} (\lambda + 1)^{2-\gamma}$. This means that the probability of the excess over the boundary can be controlled to maintain a proper size. Then, we propose the boundary function given by

\[ g(m,k) = c \sqrt{3m} \left( \frac{k}{m} \right)^{\gamma} \left( 1 + \frac{k}{m} \right)^{2-\gamma}, \quad \text{for some } 0 < \gamma < \frac{1}{2}. \]

This boundary function over √m grows approximately at the rate $\lambda^{\gamma} (1 + \lambda)^{2-\gamma}$ to ensure that $P(|G(\lambda)| \ge c \sqrt{3} \lambda^{\gamma} (1 + \lambda)^{2-\gamma})$ equals α for some c. The parameter γ is a tuning parameter,

which must be determined by a researcher. Because the shape of the boundary function affects

the detection property for structural change, γ would be chosen based on that property. The

other possible choice of γ may be based on Anatolyev and Kosenok (2018), which introduced

a reasonable criterion for the shape of boundaries which requires that the size of the test be

uniformly distributed over the testing period. From our preliminary simulations, we found

that the test using the boundary function with γ = 0.35 has such a property. Therefore, in

the simulations and the empirical analysis in later sections, we use g(m, k) with γ = 0.35.
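The monitoring rule described in this subsection can be sketched as follows. This is a minimal illustration rather than the implementation used in the chapter: the function names are hypothetical, the critical value `c` must be supplied by the user, and $\hat{\sigma}_m^2$ is assumed here to be the degrees-of-freedom-corrected residual variance from the training period (valid for serially uncorrelated errors only).

```python
import numpy as np

def boundary(m, k, c, gamma=0.35):
    """Boundary g(m, k) = c sqrt(3m) (k/m)^gamma (1 + k/m)^(2 - gamma)."""
    lam = k / m
    return c * np.sqrt(3.0 * m) * lam**gamma * (1.0 + lam)**(2.0 - gamma)

def cusum_monitor(y, m, c, gamma=0.35):
    """Return the first k >= 1 with |Gamma(m, k)| >= g(m, k), or None if no crossing."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    t = np.arange(1, n + 1)
    x = np.column_stack([np.ones(n), t / m])          # x_t = [1, t/m]'
    # OLS on the training period t = 1, ..., m
    beta_m, *_ = np.linalg.lstsq(x[:m], y[:m], rcond=None)
    resid_hist = y[:m] - x[:m] @ beta_m
    sigma_m = np.sqrt(resid_hist @ resid_hist / (m - 2))  # assumed variance estimator
    # Future residuals and the CUSUM detector for k = 1, ..., n - m
    resid_new = y[m:] - x[m:] @ beta_m
    detector = np.cumsum(resid_new) / sigma_m
    k = np.arange(1, n - m + 1)
    crossed = np.abs(detector) >= boundary(m, k, c, gamma)
    return int(k[np.argmax(crossed)]) if crossed.any() else None
```

Given a series `y` whose first `m` observations form the training period, `cusum_monitor(y, m, c)` returns the stopping time in units of observations after the training period.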

We next derive the limiting properties of the procedure in Theorem 3.1.

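Theorem 3.1 Suppose that Assumption 3.1 holds.

(i) Under the null hypothesis, we have

\[ \lim_{m \to \infty} P \left( \sup_{1 \le k < \infty} \frac{|\Gamma(m,k)|}{g(m,k)} \le 1 \right) = P \left( \sup_{0 \le t \le 1} \left| \frac{1 - t}{\sqrt{3} t^{\gamma}} W_1(t) + t^{1-\gamma} W_2(1) \right| \le c \right), \]

where $\{W_1(t), 0 \le t < \infty\}$ and $\{W_2(t), 0 \le t < \infty\}$ are independent Wiener processes.

(ii) Suppose that ∆2 ≠ 0. Then, under the alternative, we have

\[ \sup_{1 \le k < \infty} \frac{|\Gamma(m,k)|}{g(m,k)} \to \infty, \quad \text{as } m \to \infty. \]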

In Theorem 3.1(i), the critical value c = c(α) determined by the significance level α can

be obtained from the asymptotic distribution of the detector. In practice, the monitoring

period of the procedure cannot go to infinity; instead, a researcher determines how long

he/she would like to monitor a change (the length of the monitoring horizon). Therefore, we

suppose that k ranges from 1 to κm for some κ > 0. This means that we start testing at

time m+ 1 and stop at time m+ κm. Then, the critical values are obtained from

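\[ \lim_{m \to \infty} P \left( \sup_{1 \le k \le \kappa m} \frac{|\Gamma(m,k)|}{g(m,k)} \ge 1 \right) = P \left( \sup_{0 \le t \le \frac{\kappa}{\kappa+1}} \left| \frac{1 - t}{\sqrt{3} t^{\gamma}} W_1(t) + t^{1-\gamma} W_2(1) \right| \ge c(\alpha) \right) = \alpha, \tag{3.11} \]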

which depends on the various selections of κ and γ in the boundary function. We choose

κ = 1, 2, . . . , 8, and γ = 0.05, 0.15, . . . , 0.45, and approximate Brownian motions using 1,000

independent normal random variables with 100,000 replications to obtain the critical values

in Table 3.1.
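The simulation of critical values from (3.11) can be sketched as below. The function name and defaults are illustrative assumptions; in particular, the default number of replications is far smaller than the 100,000 used for Table 3.1, so the resulting quantiles are only rough approximations.

```python
import numpy as np

def simulate_critical_value(alpha, kappa, gamma, n_grid=1000, n_rep=2000, seed=0):
    """Approximate c(alpha) in (3.11) by simulating the limiting functional."""
    rng = np.random.default_rng(seed)
    t_end = kappa / (kappa + 1.0)
    dt = t_end / n_grid
    t = np.linspace(dt, t_end, n_grid)            # grid on (0, kappa/(kappa+1)]
    sup_stats = np.empty(n_rep)
    for i in range(n_rep):
        # W_1 on the grid from n_grid independent normal increments
        w1 = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n_grid))
        w2_at_1 = rng.normal()                    # W_2(1) ~ N(0, 1), independent of W_1
        path = (1.0 - t) / (np.sqrt(3.0) * t**gamma) * w1 + t**(1.0 - gamma) * w2_at_1
        sup_stats[i] = np.max(np.abs(path))
    return float(np.quantile(sup_stats, 1.0 - alpha))
```

For instance, `simulate_critical_value(0.05, kappa=1, gamma=0.35, n_rep=100_000)` would mimic the design described above for one (κ, γ) pair.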

Theorem 3.1(i) shows that the asymptotic distribution is composed of two independent

Wiener processes. In the case of t → 0, because of the law of the iterated logarithm, the condi-

tion γ < 1/2 ensures that the term |W1(t)|/tγ converges to zero and consequently the asymp-

totic distribution tends to zero. As t → 1, the first component associated with W1(t) con-

verges to zero and thus the distribution of the procedure is determined by the second Wiener

process. As a result, the proposed boundary function enables the CUSUM monitoring proce-

dure to maintain a nondegenerate and finite limit. Theorem 3.1(ii) implies that the CUSUM

monitoring test is consistent and that (3.3) is satisfied. We can see that the diverging order

of the detecting statistic crucially depends on the term $\sup_{1 \le k < \infty} \left| \sum_{t=m+k^*}^{m+k} x_t' \Delta \right| / g(m,k)$

under the alternative from the proof of Theorem 3.1(ii). This term is guaranteed to diverge

to infinity as long as ∆2 ≠ 0.

3.3.2 Extension to higher order polynomials

We extend the CUSUM test to sequentially detect changes in models with a higher order polynomial trend. In this case, βt is a (p+ 1)× 1 parameter vector and the regressor xt is a (p+ 1)-dimensional deterministic vector of the form

\[ x_t = \left[ 1, \frac{t}{m}, \ldots, \left( \frac{t}{m} \right)^p \right]'. \]

The detecting statistic of the CUSUM procedure is defined as in the previous subsection and, as given in the proof of Theorem 3.2, the asymptotic distribution of Γ(m, k)/√m is given by, for k/m = λ,

\[ G(\lambda) = (\lambda + 1) W_1 \left( \frac{\lambda}{\lambda + 1} \right) + c'(\lambda) W(1), \]

where W1(·) and W (·) are 1- and (p+ 1)-dimensional Wiener processes independent of each other, $c(\lambda) = D^{-1/2} L^{-1} \{ c_{1+\lambda} - (1 + \lambda) c_1 \}$, $c_{1+\lambda}$ and C are a (p+1)×1 vector and a (p+1)×(p+1) matrix with the ith and (i, j)th elements given by $c_{1+\lambda}(i) = (1 + \lambda)^i / i$ and $C(i,j) = 1/(i + j - 1)$, respectively, and D and L are obtained from the Cholesky decomposition of C given by C = LDL′ (see the proof of Theorem 3.2). For example, G(λ) becomes equal to (B.9) for p = 1, while it is expressed as, for p = 2,

\[ G(\lambda) = (\lambda + 1) W_1 \left( \frac{\lambda}{\lambda + 1} \right) + \sqrt{3} \lambda (\lambda + 1) W_3(1) + \sqrt{5} \lambda (\lambda + 1) (2\lambda + 1) W_4(1), \]

where W1(·), W3(·), and W4(·) are independent 1-dimensional Wiener processes.

The process G(·) has mean zero and a growing variance given by

\[ \lambda(\lambda+1) + c'(\lambda)c(\lambda), \]

where $c'(\lambda)c(\lambda) = \left(c_{1+\lambda} - (1+\lambda)c_1\right)'C^{-1}\left(c_{1+\lambda} - (1+\lambda)c_1\right)$ is a polynomial in λ of order 2(p + 1) from (B.33). There is no common rule for the exact expression of $c'(\lambda)c(\lambda)$. For example, we can show that it is $3\lambda^2(1+\lambda)^2$ and $4\lambda^2(1+\lambda)^2(5\lambda^2+5\lambda+2)$ for p = 1 and 2, respectively.
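The two closed forms quoted for p = 1 and p = 2 can be checked with exact rational arithmetic. The sketch below evaluates the quadratic form (c_{1+λ} − (1 + λ)c_1)′C⁻¹(c_{1+λ} − (1 + λ)c_1) at a few values of λ, using the standard closed-form inverse of the Hilbert matrix; the function names are hypothetical and the check is numerical rather than a proof.

```python
from fractions import Fraction
from math import comb

def inverse_hilbert(n):
    """Closed-form inverse of the n x n Hilbert matrix C(i, j) = 1/(i + j - 1)."""
    return [[Fraction((-1) ** (i + j) * (i + j - 1)
                      * comb(n + i - 1, n - j) * comb(n + j - 1, n - i)
                      * comb(i + j - 2, i - 1) ** 2)
             for j in range(1, n + 1)]
            for i in range(1, n + 1)]

def variance_term(p, lam):
    """Evaluate c'(lam) c(lam) exactly for a polynomial trend of order p."""
    n = p + 1
    lam = Fraction(lam)
    # v = c_{1+lam} - (1 + lam) c_1, where c_x(i) = x^i / i and c_1(i) = 1 / i.
    v = [(1 + lam) ** i / i - (1 + lam) * Fraction(1, i) for i in range(1, n + 1)]
    c_inv = inverse_hilbert(n)
    return sum(v[i] * c_inv[i][j] * v[j] for i in range(n) for j in range(n))
```

For instance, `variance_term(1, 1)` equals 12 = 3·1²·2² and `variance_term(2, 1)` equals 192 = 4·1²·2²·(5 + 5 + 2).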

As in the case of a linear trend, we need to take the diverging rate into account to determine the boundary function. From the structure of $c'(\lambda)c(\lambda)$, the highest order term $\lambda^{2(p+1)}$ comes from the (p + 1)th element of $c_{1+\lambda}$, which is $(1+\lambda)^{p+1}/(p+1)$, and the (p + 1, p + 1)th element of $C^{-1}$, and we need to obtain its coefficient. In general, the (i, j)th element of $C^{-1}$ is given by Choi (1983) as

\[ C^{-1}(i,j) = (-1)^{i+j}(i+j-1)\binom{p+i}{p+1-j}\binom{p+j}{p+1-i}\binom{i+j-2}{i-1}^{2}, \]

and thus the coefficient associated with $\lambda^{2(p+1)}$ becomes

\[ f(p) = \frac{2p+1}{(p+1)^2}\binom{2p}{p}^{2}. \]

For example, f(1) = 3 and f(2) = 20. We then propose the boundary function given by

\[ g(m,k) = c\sqrt{f(p)}\sqrt{m}\left(\frac{k}{m}\right)^{\gamma}\left(1+\frac{k}{m}\right)^{p+1-\gamma} \quad \text{for some } 0 < \gamma < \frac{1}{2}, \]

which enables the monitoring procedure to maintain a finite limit.² We summarize the limiting properties of the procedure in Theorem 3.2.
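The constant f(p) can be checked numerically. The sketch below builds the Hilbert matrix C, verifies that the closed-form inverse really is its inverse, and reads f(p) off as the (p + 1, p + 1)th element of C⁻¹ divided by (p + 1)². The helper names are hypothetical and the closed-form comparison is only a consistency check.

```python
from fractions import Fraction
from math import comb

def hilbert(n):
    """Hilbert matrix with entries C(i, j) = 1/(i + j - 1), 1-based indices."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)] for i in range(1, n + 1)]

def inverse_hilbert(n):
    """Closed-form inverse of the n x n Hilbert matrix."""
    return [[Fraction((-1) ** (i + j) * (i + j - 1)
                      * comb(n + i - 1, n - j) * comb(n + j - 1, n - i)
                      * comb(i + j - 2, i - 1) ** 2)
             for j in range(1, n + 1)]
            for i in range(1, n + 1)]

def is_identity_product(a, b):
    """Return True when the matrix product a @ b is exactly the identity."""
    n = len(a)
    return all(sum(a[i][k] * b[k][j] for k in range(n)) == (1 if i == j else 0)
               for i in range(n) for j in range(n))

def f(p):
    """Leading coefficient of c'(lam) c(lam): C^{-1}(p+1, p+1) / (p+1)^2."""
    n = p + 1
    return inverse_hilbert(n)[n - 1][n - 1] / n**2
```

Here `f(1)` returns 3 and `f(2)` returns 20, in line with the values stated in the text.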

Theorem 3.2 Suppose that Assumption 3.1 holds.

(i) Under the null hypothesis, we have

\[ \lim_{m\to\infty} P\left\{ |\Gamma(m,k)| \le g(m,k) \ \text{for all } 1 \le k < \infty \right\} = P\left\{ \sup_{0\le t\le 1}\left| \frac{(1-t)^{p}\,W_1(t)}{\sqrt{f(p)}\,t^{\gamma}} + \frac{(1-t)^{p+1}\,c'\!\left(t/(1-t)\right)W(1)}{\sqrt{f(p)}\,t^{\gamma}} \right| \le c \right\}, \]

where $W(\cdot) = [W_2(\cdot), W_3(\cdot), \cdots, W_{p+2}(\cdot)]'$ is a (p + 1)-dimensional Wiener process independent of $W_1(\cdot)$.

(ii) Suppose that $\Delta_{p+1} \neq 0$, where $\Delta_{p+1}$ is the last element of $\Delta = [\Delta_1, \Delta_2, \cdots, \Delta_{p+1}]'$. Then, under the alternative, we have

\[ \sup_{1\le k<\infty} \frac{|\Gamma(m,k)|}{g(m,k)} \to \infty, \quad \text{as } m \to \infty. \]

²In the case of a quadratic trend, the critical values for γ = 0.35 and κ = 1 are 1.2800, 0.9742, and 0.8215 at significance levels 1%, 5%, and 10%, respectively. We conducted simulations with m = 250 and the end of the monitoring period given by 2m (κ = 1), and found that the size of the test is well controlled (for example, the empirical size at the 5% nominal level is 0.053).

The null limiting distribution depends on the polynomial order. For example, it is given in Theorem 3.1(i) for a model with a linear trend, while in the case of a quadratic trend, it can be expressed as

\[ \sup_{0\le t\le 1}\left| \frac{(1-t)^{2}\,W_1(t)}{\sqrt{20}\,t^{\gamma}} + \frac{\sqrt{3}\,t^{1-\gamma}(1-t)\,W_3(1)}{\sqrt{20}} + \frac{t^{1-\gamma}(1+t)\,W_4(1)}{2} \right| \le c. \]

The constant term c can be calculated by simulations using these expressions. When the monitoring horizon is terminated at some point, we can modify the expression as in (3.11).
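To make the procedure concrete, the following is a minimal sketch of the CUSUM monitoring rule for a polynomial trend of order p, assuming the detector is the cumulative sum of post-training OLS residuals divided by an estimate of σ and that the boundary is g(m, k) = c √f(p) √m (k/m)^γ (1 + k/m)^(p+1−γ). The function names, the argument `sigma` (supplied by the user rather than estimated), and the value of `c` are illustrative assumptions; in practice c would come from simulated critical values.

```python
import numpy as np
from math import comb

def boundary(m, k, c, gamma, p=1):
    """Boundary g(m, k) for trend order p; k may be a scalar or an array."""
    f_p = (2 * p + 1) * comb(2 * p, p) ** 2 / (p + 1) ** 2
    lam = k / m
    return c * np.sqrt(f_p * m) * lam**gamma * (1.0 + lam) ** (p + 1 - gamma)

def cusum_monitor(y, m, c, gamma, sigma, p=1):
    """Return the first k >= 1 with |Gamma(m, k)| >= g(m, k), or None."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(1, n + 1) / m
    x = np.vander(t, p + 1, increasing=True)  # columns 1, t/m, ..., (t/m)^p
    # OLS on the training sample t = 1, ..., m only.
    beta_hat, *_ = np.linalg.lstsq(x[:m], y[:m], rcond=None)
    # Detector: cumulative sum of standardized monitoring-period residuals.
    detector = np.cumsum((y[m:] - x[m:] @ beta_hat) / sigma)
    k = np.arange(1, n - m + 1)
    crossed = np.abs(detector) >= boundary(m, k, c, gamma, p)
    return int(k[np.argmax(crossed)]) if crossed.any() else None
```

On a noise-free linear series the residuals are numerically zero and no alarm is raised, whereas adding a level and slope shift after the training period triggers a stop a few observations after the break.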

3.4 Asymptotic Distributions of the Stopping Times

The monitoring procedure generally rejects the null hypothesis of no change possibly with

a delay after the break. Since a shorter detection delay implies a more reliable conclusion

and lower cost, the speed of detection is an important measure for the sequential test. Thus,

we expect that the procedure should reject the null hypothesis as soon as possible under the

alternative. In this section, we derive the limiting distribution of the stopping time based

on the CUSUM monitoring test as well as that based on the test presented by Qi et al.

(2016), who also proposed a monitoring test for a change in a trend. We thus investigate the

theoretical difference in their limiting behaviors in a linear trend model.

To investigate the asymptotic property of the stopping time based on the CUSUM detec-

tor, we make the following assumption related to k∗ and ∆.

Assumption 3.2 (a) There exists a θ such that $k^* = O(m^{\theta})$ with $0 \le \theta < \frac{1-2\gamma}{4(1-\gamma)}$.

(b) Let $\delta = d'\Delta = \Delta_1 + \Delta_2$, where $d = [1, 1]'$. There are positive constants $C_1$ and $C_2$ such that

\[ C_1 \le |\delta| \le C_2. \]

Assumption 3.2(a) implies that the order of the change-point k∗ is related to the historical sample size m. We focus on the same case as Aue et al. (2009), in which a break occurs shortly after the end of the training period.³ Assumption 3.2(b) assumes that the magnitude of the change is bounded and, for a technical reason, excludes the case where the change in the trend coefficient has the same magnitude as, but the opposite direction to, the change in the constant (∆2 = −∆1).

Remark 3.1 We focus on the limiting properties of the delay time under early change scenarios, while some papers relaxed the assumptions on the time of the change in linear regression models and showed that Page's CUSUM and MOSUM procedures are able to detect late changes faster than the ordinary CUSUM procedure. See, for example, Aue et al. (2012), Fremdt (2014, 2015), and Stohr (2019).

Under this assumption, we derive the asymptotic distribution of the stopping time based

on the CUSUM detector.

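Theorem 3.3 Suppose that Assumptions 3.1 and 3.2 hold. Then, we have

\[ \lim_{m\to\infty} P\left(\tau_m \le a_m + b_m z\right) = \Phi(z), \]

where Φ(·) is the cumulative distribution function of a standard normal distribution,

\[ a_m = \left( c_m^{1-\gamma} - \frac{1}{c_m^{\gamma}\,\delta}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta \right)^{\frac{1}{1-\gamma}}, \quad b_m = \frac{\sqrt{c_m}\,\sigma}{(1-\gamma)|\delta|}, \quad \text{and} \quad c_m = \left(\frac{\sqrt{3}\,c\,\sigma\,m^{1/2-\gamma}}{|\delta|}\right)^{\frac{1}{1-\gamma}}. \]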
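For a sense of the magnitudes involved, the standardizing sequences can be evaluated numerically. The sketch below assumes the linear trend regressor x_t = [1, t/m]′ with d = [1, 1]′, so that the sum of (x_t − d)′∆ over t = m + a, …, m + b has the closed form (b + a)(b − a + 1)∆2/(2m) used later in (C.8). Truncating the upper summation limit c_m to an integer is an assumption made here, as are the function and argument names.

```python
import math

def cusum_delay_constants(m, gamma, c, sigma, delta1, delta2, k_star):
    """Evaluate (c_m, a_m, b_m) for the linear trend case (illustrative sketch)."""
    delta = delta1 + delta2
    c_m = (math.sqrt(3.0) * c * sigma * m ** (0.5 - gamma) / abs(delta)) ** (1.0 / (1.0 - gamma))
    upper = math.floor(c_m)  # assumption: integer part of c_m as summation limit
    # Closed form of sum_{t=m+k*}^{m+upper} (x_t - d)' Delta; empty sum if upper < k*.
    if upper >= k_star:
        drift = (upper + k_star) * (upper - k_star + 1) / (2.0 * m) * delta2
    else:
        drift = 0.0
    a_m = (c_m ** (1.0 - gamma) - drift / (c_m**gamma * delta)) ** (1.0 / (1.0 - gamma))
    b_m = math.sqrt(c_m) * sigma / ((1.0 - gamma) * abs(delta))
    return c_m, a_m, b_m
```

When ∆2/δ is positive, the drift correction is positive and a_m lies slightly below c_m.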

We next derive the asymptotic distribution of the delay time based on the maximal-type fluctuation procedure of Qi et al. (2016). The detector of Qi et al. (2016) is defined by $\Gamma^{FL}_m(k,\ell) = \hat{\sigma}_m^{-1}\,\tilde{\Gamma}^{FL}_m(k,\ell)$, where ℓ is proportional to k as supposed in Assumption 3.3(a),

\[ \tilde{\Gamma}^{FL}_m(k,\ell) = \sum_{t=1}^{m+k} y_t - \frac{m+k}{m}\sum_{t=1}^{m} y_t - \frac{k(m+k)}{\ell(m+\ell)}\left( \sum_{t=1}^{m+\ell} y_t - \frac{m+\ell}{m}\sum_{t=1}^{m} y_t \right), \]

and the boundary function is given by

gFL(m, k) = cFL√m

³The order of k∗ can be slightly relaxed to O(m^{3/8}) for the fluctuation test, as is seen in the proof of Lemma C.9(i) and (C.30).

Then, the corresponding stopping time is defined as

\[ \tau^{FL}_m = \inf\left\{ k \ge 1 : |\Gamma^{FL}_m(k,\ell)| \ge g^{FL}(m,k) \right\}. \]

The critical value $c^{FL} = c^{FL}(\alpha)$ is determined by the asymptotic distribution of the detector under the null hypothesis. See Qi et al. (2016) for more details.⁴
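The unnormalized fluctuation detector is built from partial sums of y_t only, and a useful property to keep in mind is that it is exactly zero for a series that follows a linear trend without noise. The sketch below computes the detector for given k and ℓ and can be used to confirm this invariance numerically; the function name is hypothetical, and the boundary function and the normalization by the long-run standard deviation are deliberately left out.

```python
import numpy as np

def fluctuation_detector(y, m, k, ell):
    """Unnormalized fluctuation detector for monitoring step k and window ell."""
    y = np.asarray(y, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(y)))  # s[n] = y_1 + ... + y_n
    part_k = s[m + k] - (m + k) / m * s[m]
    part_ell = s[m + ell] - (m + ell) / m * s[m]
    return part_k - k * (m + k) / (ell * (m + ell)) * part_ell
```

For a pure level shift of size 1 starting right after the training sample, the detector equals k(ℓ − k)/(m + ℓ), so with m = 100, k = 40, and ℓ = 20 it is −20/3.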

To derive the limiting distribution of the stopping time based on the fluctuation test, we

need to make an additional assumption.

Assumption 3.3 (a) Let ℓ = [(k + 1)/η] with η > 1, where the bracket means the integer part of the term.

(b) Suppose that ∆1 + ∆2/2 ≠ 0.

Assumption 3.3(a) defines the relationship between the parameters k and ℓ, where k is required to be greater than ℓ. Qi et al. (2016) used ℓ = k/2 in their simulations but it can be relaxed as in (a). Assumption 3.3(b) is a necessary technical condition to ensure that the limit results in Lemmas C.10, C.11, and C.14 can be derived.

The asymptotic distribution of the stopping time based on the fluctuation test is given in

the following theorem.

Theorem 3.4 Suppose that Assumptions 3.1, 3.2(a), and 3.3 hold. Then, we have

\[ \lim_{m\to\infty} P\left(\tau^{FL}_m \le a^{FL}_m + b^{FL}_m z\right) = \Phi(z), \]

where

\[ a^{FL}_m = \left( \frac{c^{FL}\sigma m^{3/2}}{\left|\left(\frac{1}{\eta}-1\right)\left(\delta-\frac{\Delta_2}{2}\right)\right|} \right)^{1/2} \quad \text{and} \quad b^{FL}_m = \frac{\sqrt{\eta-1}\,\sigma m}{2\sqrt{a^{FL}_m}\left|\left(\frac{1}{\eta}-1\right)\left(\delta-\frac{\Delta_2}{2}\right)\right|}. \]

Theorems 3.3 and 3.4 show that the limit distributions of the stopping times are normal. The sequences $a_m$, $b_m$, $a^{FL}_m$, and $b^{FL}_m$ are used to standardize the variables to obtain the limiting distributions. We can show that $\tau_m/a_m \xrightarrow{p} 1$ and $\tau^{FL}_m/a^{FL}_m \xrightarrow{p} 1$. Since both $a_m$ and $a^{FL}_m$ diverge to infinity, both stopping times also go to infinity. Of importance is the difference in the diverging rates. In the case of the CUSUM detector, $a_m$ is of order $m^{(1/2-\gamma)/(1-\gamma)}$, which lies between $m^{0}$ and $m^{1/2}$ depending on γ, whereas $a^{FL}_m$ is of order $m^{3/4}$. This implies that the stopping time based on the fluctuation test grows at a faster rate than that based on the CUSUM detector. Thus, we expect that the delay time based on the CUSUM procedure tends to be shorter than that based on the fluctuation one. In other words, the monitoring test based on the CUSUM detector has a theoretical advantage over that based on the fluctuation test as long as the break occurs early in the monitoring period. This is confirmed by the Monte Carlo simulations in Section 3.5.

⁴Qi et al. (2016) also considered the range-type test. However, its finite sample property is inferior to the maximum-type test considered in this article according to their simulations and thus we focus on the latter test.
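The rate comparison above reduces to comparing two exponents of m. A small sketch, with hypothetical names, evaluates the CUSUM exponent (1/2 − γ)/(1 − γ) for a given γ and sets it against the exponent 3/4 of the fluctuation test:

```python
FLUCTUATION_DELAY_EXPONENT = 0.75  # a_m^FL is of order m^(3/4)

def cusum_delay_exponent(gamma):
    """Exponent of m in the order of a_m for the CUSUM detector."""
    if not 0.0 < gamma < 0.5:
        raise ValueError("gamma must lie strictly between 0 and 1/2")
    return (0.5 - gamma) / (1.0 - gamma)
```

For γ = 0.35, the value used in the simulations, the CUSUM exponent is 3/13, roughly 0.23, well below 0.75.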

In this section, we investigate the finite sample performance of the tests considered in the

previous section. The data-generating process we consider is given by

\[ y_t = x_t'\left(\beta_0 + \Delta\,1_{\{t \ge m+k^*\}}\right) + \epsilon_t, \quad \epsilon_t = \rho\,\epsilon_{t-1} + e_t, \quad t = 1, \ldots, m, m+1, \ldots, m+\kappa m, \]

where $x_t = [1, t/m]'$, $\beta_0 = [1, 1]'$, and $e_t \sim i.i.d.\ N(0, (1-\rho)^2)$, meaning that the long-run variance of $\epsilon_t$ is 1. The settings for ∆ and k∗ are explained later. In finite samples, we consider that the monitoring period stops at 4m (κ = 3), while the training period m is 50, 100, and 250. The parameter γ of the boundary function is set to 0.35. We allow for serial correlation in the errors and the coefficient ρ is 0.4 and 0.8. To obtain a consistent estimate $\hat{\sigma}^2_m$ of the variance of the errors based on the historical data, we use the prewhitened kernel estimator proposed by Andrews and Monahan (1992), which is defined by

\[ \hat{\sigma}^2_m = (1-\hat{\rho})^{-1}\,\hat{\Omega}\,(1-\hat{\rho})^{-1}, \]

where $\hat{\Omega}$ is a standard kernel heteroskedasticity and autocorrelation consistent estimator given by

\[ \hat{\Omega} = \frac{m}{m-2}\left[ \hat{\Gamma}_0 + \sum_{j=1}^{m-1} k\!\left(\frac{j}{S_m}\right)\left(\hat{\Gamma}_j + \hat{\Gamma}_j'\right) \right], \quad \text{with } \hat{\Gamma}_j = \frac{1}{m}\sum_{t=j+1}^{m} \hat{e}_t\hat{e}_{t-j}. \]

The coefficient estimate $\hat{\rho}$ and residuals $\hat{e}_t$ are obtained by regressing $\hat{\epsilon}_t$ on $\hat{\epsilon}_{t-1}$, where the OLS residuals $\hat{\epsilon}_t$ are calculated from regressing $y_t$ on $x_t$. In this simulation, we use the quadratic spectral kernel as k(·), which is defined by

\[ k(x) = \frac{25}{12\pi^2x^2}\left( \frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5) \right), \]

while the bandwidth $S_m$ is selected based on Andrews (1991) given by

\[ S_m = 1.3221\left(\hat{\alpha}(2)\,m\right)^{1/5} \quad \text{where} \quad \hat{\alpha}(2) = \frac{4\hat{\rho}^2\hat{\sigma}_e^4}{(1-\hat{\rho})^8} \Big/ \frac{\hat{\sigma}_e^4}{(1-\hat{\rho})^4}, \]

and $\hat{\sigma}^2_e$ is the estimated variance of the residuals $\hat{e}_t$. The significance level is set to 0.05, the number of replications is 5,000, and all computations are conducted using the GAUSS matrix language.
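The simulation design can be sketched in a few lines of Python (the original computations were run in GAUSS, so this is an illustration rather than the code behind the tables). The first function generates the data with AR(1) errors whose long-run variance is 1, and the last one is a simplified prewhitened quadratic spectral estimator of the long-run variance. Several details are assumptions made here: the initial condition ε_0 = 0, the use of a second AR(1) fit to the prewhitened residuals for the plug-in bandwidth, the small floor on the bandwidth, and all function names.

```python
import numpy as np

def simulate_dgp(m, kappa, rho, delta, k_star, rng):
    """Return (x, y) for t = 1, ..., m + kappa*m with a break at t = m + k_star."""
    n = m + kappa * m
    t = np.arange(1, n + 1)
    x = np.column_stack((np.ones(n), t / m))
    e = rng.normal(0.0, 1.0 - rho, size=n)  # sd (1 - rho): long-run variance of eps is 1
    eps = np.zeros(n)
    eps[0] = e[0]  # assumption: eps_0 = 0
    for i in range(1, n):
        eps[i] = rho * eps[i - 1] + e[i]
    beta = np.array([1.0, 1.0]) + np.outer(t >= m + k_star, delta)
    return x, np.sum(x * beta, axis=1) + eps

def qs_kernel(x):
    """Quadratic spectral kernel, with k(0) = 1 by continuity."""
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)
    nz = x != 0
    z = 6.0 * np.pi * x[nz] / 5.0
    out[nz] = 25.0 / (12.0 * np.pi**2 * x[nz] ** 2) * (np.sin(z) / z - np.cos(z))
    return out

def prewhitened_lrv(resid):
    """Prewhitened QS estimate of the long-run variance of OLS residuals."""
    u = np.asarray(resid, dtype=float)
    m = u.size
    rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])  # AR(1) prewhitening coefficient
    e = u[1:] - rho * u[:-1]
    a = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])    # assumption: AR(1) fit to e for bandwidth
    alpha2 = 4.0 * a**2 / (1.0 - a) ** 4
    s_m = max(1.3221 * (alpha2 * m) ** 0.2, 1e-8)  # floor avoids division by zero
    n = e.size
    gam = np.array([e[j:] @ e[: n - j] for j in range(n)]) / n
    omega = m / (m - 2) * (gam[0] + 2.0 * np.sum(qs_kernel(np.arange(1, n) / s_m) * gam[1:]))
    return omega / (1.0 - rho) ** 2  # recolor
```

A typical use is to call `simulate_dgp`, regress the first m observations of y on x, and pass the training residuals to `prewhitened_lrv` to obtain the variance estimate used to standardize the detector.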

Table 3.2 summarizes the empirical sizes of the monitoring procedures. The sizes of both

the CUSUM and fluctuation tests in the cases of m = 100, ρ = 0.4 and m = 250, ρ = 0.4 are

controlled well, whereas for the other cases, the sizes are relatively distorted, especially when ρ = 0.8.⁵

For a comparison of the power performance, the change in the coefficient is specified by

∆ = bd, where the magnitude b is set to 0, 0.5, 1.0, 1.5, 2.0 and d = [1, 1]′. Table 3.3 reports

the powers of the monitoring tests corresponding to an early break (m+ k∗ = m+ 1), a late

break (m + k∗ = 1.8m), and a very late break (m + k∗ = 2.5m), respectively. The results

imply that neither the CUSUM test nor the fluctuation test dominates the other in small

samples. The CUSUM test is more powerful than the fluctuation test when the break occurs

soon after the historical data. For the later breaks in the monitoring period, the powers of the

fluctuation test are higher than those of the CUSUM test. Additionally, all the monitoring

tests are more powerful for a larger magnitude of change.

We further investigate the effect of the time of the change on the power performance. The

break date is controlled by m+ k∗ = 1.1m, 1.2m, ..., 2.5m (k∗ = 0.1m, 0.2m, ..., 1.5m); Table

3.4 reports the results. The CUSUM-based monitoring procedure performs better under

early-change settings and the earlier the change occurs, the better the test performs. On the

contrary, the power of the fluctuation test increases as k∗ changes from 0.1m to 1.5m.

Next, we compare the delay times of the two procedures since the detection speed is

regarded as an important indicator of the performance of the monitoring tests. We set

d = [1, 1]′ and consider the breaks occurring at m + k∗ = m + 1, 1.8m, and 2.5m. Table

3.5 summarizes the minimum value, quartiles, and maximum value of the delay time. If a

change occurs rapidly after the end of the training period, the CUSUM version rejects the

null hypothesis earlier than the fluctuation version and the minimum value, quartiles, and

maximum value of the delay time for the CUSUM test are much smaller. This is consistent

with the theoretical result that the stopping time based on the CUSUM version grows at a

slower rate than for the fluctuation version. For the later breaks in the monitoring period, there is only a slight difference between the two procedures and the delay time of the fluctuation test is shorter in some cases.

⁵We also conducted simulations for the range-type test by Qi et al. (2016), but the maximum-type test outperforms the range-type one in many cases and thus we report only the former result.

In summary, as far as power is concerned, the CUSUM procedure works significantly better than the fluctuation test for an early change, while the ranking is reversed for later changes. Furthermore, the much faster detection by the CUSUM test for an early change and the similarity of the two procedures in detection time under later-change scenarios indicate that neither version is uniformly superior to the other in every scenario considered. Therefore, we recommend using both monitoring procedures in practical analyses.

3.6 Empirical Example

In this section, we apply the monitoring tests to sequentially detect parameter instability in

macroeconomic time series. The following simple linear trend model is considered:

yt = α+ βt+ ϵt,

where yt is the logarithm of real GDP measured in the domestic currency. Three countries,

namely Denmark, Japan, and New Zealand, are selected and quarterly data are taken from

the International Financial Statistics database. The sample periods are different for each

country and Figure 3.1 describes the logarithm of the real GDP series of the three countries.

We first apply the historical test proposed by Perron and Yabu (2009) to detect structural

changes in the whole sample and find that the null hypothesis of no change in the parameters

is rejected. Then, we estimate the break date by minimizing the sum of the squared residuals

and test for parameter constancy in the period before the estimated break. Because GDP

series can have a unit root, we also investigate the presence of a unit root. The results in Panel

(a) of Table 3.6 indicate that the parameters are stable and that the unit root hypothesis can

be rejected for the three series, which implies that we can set the period before the estimated

break as the training period.

We next investigate whether the two procedures can successfully detect the parameter

changes in the three GDP series and compare their speed of detection. We set different

training periods corresponding to the different timings of the break in the monitoring period

and the results are summarized in Panel (b) of Table 3.6. In the case of Japan, the end points

of the training period are set to 1998Q4, 2006Q2, and 2007Q4, which correspond to the late,

moderate, and early breaks in the monitoring period. We can see that all the tests reject

the null hypothesis of no change in the parameters, except the fluctuation test when the

break occurs early. Moreover, the fluctuation test has a much longer detection delay than the

CUSUM procedure in the case of the late break, while for a moderate change, the fluctuation

test performs better than the other. For Denmark, both procedures can detect an early

change and the CUSUM method rejects the null hypothesis of no change much earlier than

the fluctuation method, which is consistent with our theoretical analysis that the CUSUM

test is expected to have a shorter detection delay than the fluctuation one if the change

occurs early in the monitoring period.6 We also find evidence of the better performance of

the CUSUM-based test for an early change in the case of New Zealand, while the fluctuation

test is good at detecting a relatively late break in the monitoring period, as shown in the

simulations.

3.7 Conclusion

In this chapter, we applied the CUSUM test based on OLS residuals to sequentially detect structural change in models with a trend. The asymptotic property of the CUSUM monitoring procedure was investigated and the results indicated that it can successfully reject the null

hypothesis of no change. We further derived the asymptotic distributions of the stopping

times based on the CUSUM and fluctuation procedures and found that the delay time based

on the CUSUM procedure is shorter than that based on the fluctuation one in the case of

an early break. This tendency is confirmed in finite samples, although the fluctuation test

works better in some cases. Because the location of the break point is unknown in practice,

it would be desirable to consider the monitoring procedure robust to the break location. One

of the possible strategies may be to construct the hybrid procedure using both the CUSUM

and fluctuation tests, which is our future work.

Appendix B. Proofs of Theorems 3.1 and 3.2

Proof of Theorem 3.1

⁶Apart from the first break date 2001Q4, we also find that the GDP of Denmark exhibited several break points (2003Q3, 2006Q1, and 2008Q3) in the period 2002Q1–2018Q2. The monitoring procedures often require a long stable period with enough observations as the training period. If structural changes occur frequently, as in the period 2002Q1–2018Q2 for Denmark, we cannot find a suitable training period to apply the monitoring procedures.

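In this appendix, we replace $\hat{\sigma}^2_m$ with $\sigma^2$ because it is consistent under both the null and alternative hypotheses. Let

\[ C_m = \sum_{t=1}^{m} x_t x_t', \quad C = \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{bmatrix}. \]

Then, we have

\[ \left\| \frac{1}{m}C_m - C \right\| = O\left(\frac{1}{m}\right), \tag{B.1} \]

\[ \left\| \left(\frac{1}{m}C_m\right)^{-1} - C^{-1} \right\| = O\left(\frac{1}{m}\right). \tag{B.2} \]

We rewrite Γ(m, k) as follows:

\[ \sum_{t=m+1}^{m+k} \hat{\epsilon}_t = \sum_{t=m+1}^{m+k} \epsilon_t - \sum_{t=m+1}^{m+k} x_t'\left( \sum_{t=1}^{m} x_t x_t' \right)^{-1} \sum_{t=1}^{m} x_t \epsilon_t. \tag{B.3} \]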

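Lemma B.1 Under Assumption 3.1,

\[ \sup_{1\le k<\infty} \frac{\left| \sum_{t=m+1}^{m+k} x_t' C_m^{-1} \sum_{t=1}^{m} x_t\epsilon_t - \frac{1}{m}\sum_{t=m+1}^{m+k} x_t' C^{-1} \sum_{t=1}^{m} x_t\epsilon_t \right|}{h(m,k)} = o_p(1), \quad \text{as } m \to \infty, \]

where h(m, k) = g(m, k)/c with c = c(α) determined by the given significance level α.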

Proof of Lemma B.1. Relations (3.6) and (3.7) imply that∥∥∥∥∥m∑t=1

∥∥∥∥∥ = Op(√m), as m → ∞. (B.4)

Putting together (B.1), (B.2), and (B.4), we have

sup1≤k<∞

∣∣∣∣ m+k∑t=m+1

x′tC−1m

m∑t=1

xtϵt − 1m

m+k∑t=m+1

x′tC−1

m∑t=1

∣∣∣∣h(m, k)

≤ sup1≤k<∞

∥∥∥∥ 1m

m+k∑t=m+1

∥∥∥∥ ∥∥∥∥( 1mCm)−1 − C−1

∥∥∥∥ ∥∥∥∥ m∑t=1

∥∥∥∥h(m, k)

= sup1≤k<∞

∥∥∥[ km , k

m + k2

2m2 + k2m2

]∥∥∥h(m, k)

√m).

Since k/m+ k2/(2m2) is the dominating term ofm+k∑

t=m+1xt/m, and

sup1≤k<∞

∣∣∣ km + k2

∣∣∣(km

)γ (1 + k

)2−γ ≤ sup1≤k<∞

)1−γ (1 +

)γ−1

= O(1),

then the proof is complete.

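Lemma B.2 Under Assumption 3.1,

\[ \sup_{1\le k<\infty} \frac{\left| \sum_{t=m+1}^{m+k} \epsilon_t - \frac{1}{m}\sum_{t=m+1}^{m+k} x_t' C^{-1} \sum_{t=1}^{m} x_t\epsilon_t - G(m,k) \right|}{h(m,k)} = o_p(1), \tag{B.5} \]

where

\[ G(m,k) = \sigma W_{1,m}(k) + \frac{k}{m}G_1(m) + \left( \frac{k^2}{m^2} + \frac{k}{m^2} \right)G_2(m), \]

\[ G_1(m) = 2\sigma W_{2,m}(m) - 6\sigma\int_0^m \frac{x}{m}\,dW_{2,m}(x), \]

\[ G_2(m) = 3\sigma W_{2,m}(m) - 6\sigma\int_0^m \frac{x}{m}\,dW_{2,m}(x). \]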

Proof of Lemma B.2. The left-hand side of (B.5) becomes

sup1≤k<∞

∣∣∣∣ m+k∑t=m+1

ϵt − 1m

m+k∑t=m+1

x′tC−1

m∑t=1

xtϵt − G(m, k)

∣∣∣∣h(m, k)

≤ sup1≤k<∞

∣∣∣∣ m+k∑t=m+1

ϵt − σW1,m(k)

∣∣∣∣h(m, k)

+ sup1≤k<∞

∣∣∣∣2 m∑t=1

ϵt − 6m∑t=1

tmϵt − G1(m)

∣∣∣∣h(m, k)

+ sup1≤k<∞

m2 + km2 )

∣∣∣∣3 m∑t=1

ϵt − 6m∑t=1

tmϵt − G2(m)

∣∣∣∣h(m, k)

= A1 +A2 +A3.

Under Assumption 3.1, 1/ν < 1/2, and γ < 1/2, we can see that A1 = op(1) because

sup1≤k≤m

∣∣∣∣ m+k∑t=m+1

ϵt − σW1,m(k)

∣∣∣∣h(m, k)

= sup1≤k≤m

Op(k1/ν)mγ−1/2

kγ(1 + k/m)2−γ≤

Op(1)mγ−1/2 if 1/ν < γ

Op(1)m1/ν−1/2 if 1/ν ≥ γ

= op(1),

and in the case of m < k < ∞,

supm<k<∞

Op(k1/ν)mγ−1/2

kγ(1 + k/m)2−γ

Op(1)m

1/ν−1/2 if 1/ν < γ

Op(1) supm<k<∞

k1/ν−γmγ−1/2(mk )2−γ < Op(1)m

1/ν−1/2 if 1/ν ≥ γ

= op(1).

For A2, we have

A2 ≤ sup1≤k<∞

∣∣∣∣ m∑t=1

ϵt − σW2,m(m)

∣∣∣∣h(m, k)

+ sup1≤k<∞

∣∣∣∣ m∑t=1

tmϵt − σ

xmdW2,m(x)

∣∣∣∣h(m, k)

= sup1≤k<∞

k/m√3(k/m)γ(1 + k/m)2−γ

Op(m1/ν−1/2).

sup1≤k≤m

(k/m)γ(1 + k/m)2−γ= sup

1≤k≤m

(k/m)1−γ

(1 + k/m)2−γ= O(1),

supm<k<∞

(k/m)γ(1 + k/m)2−γ< sup

m<k<∞

)1−γ (mk

)2−γ= O(1),

we obtain A2 = op(1). Similarly, for the term A3, we can see that

A3 = O

1≤k<∞

m2 + km2√

3(k/m)γ(1 + k/m)2−γ

1/ν−1/2) = op(1).

Thus, the proof is complete.

Proof of Theorem 3.1. (i) Since the distribution of W1,m(t),W2,m(t), 0 ≤ t < ∞ does

not depend on m, we omit the subscript m in the following. We first establish that

1≤k<∞

|G(m, k)|h(m, k)

D= sup

1≤k<∞

∣∣∣∣ W1(k) +km(2W2(m)− 6

xmdW2(x))

m2 (3W2(m)− 6∫m0

xmdW2(x))

∣∣∣∣/√m

√3(k/m)γ(1 + k/m)2−γ

+op(1)

D= sup

1≤k<∞

|( km + 1)W1(

kk+m) +

√3 km( k

m + 1)W2(1)|√3(k/m)γ(1 + k/m)2−γ

+ op(1). (B.7)

The first equality in distribution holds because

sup1≤k<∞

∣∣∣∣ k

1√mW2(m)− 6

mdW2(x)

)∣∣∣∣/√3(k/m)γ(1 + k/m)2−γ

= op(1),

which can be shown by noting that the process (3W2(m)− 6∫m0

xmdW2(x))/

√m has zero ex-

pectation and a finite variance independent ofm, which implies that∣∣(3W2(m)− 6

xmdW2(x))/

√m∣∣ =

Op(1), and

sup1≤k<∞

|k/m2|√3(k/m)γ(1 + k/m)2−γ

= o(1).

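For the second equality in distribution given by (B.7), let k/m = λ, and the numerator of (B.6) can be written as

\[ G(\lambda) = W_1(\lambda) + \lambda\left( \frac{2}{\sqrt{m}}W_2(m) - \frac{6}{\sqrt{m}}\int_0^m \frac{x}{m}\,dW_2(x) \right) + \lambda^2\left( \frac{3}{\sqrt{m}}W_2(m) - \frac{6}{\sqrt{m}}\int_0^m \frac{x}{m}\,dW_2(x) \right). \]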

This is a Gaussian process with zero mean and covariance function given by

E[G(s)G(t)] = s(t+ 1) + 3st(s+ 1)(t+ 1) for s < t, (B.8)

which can be decomposed into the two independent processes X1(λ) and X2(λ) with the

covariance functions s(t+1) and 3st(s+1)(t+1), respectively. Since the covariance function

of X1(λ) can be written as E[X1(s)X1(t)] = u(s)v(t) for s < t, where u(s) = s and v(t) =

t+ 1, we can use the technique of Doob (1949) to transform X1(λ) into a Brownian motion.

Let a(λ) = u(λ)/v(λ) = λ/(λ + 1), which is continuous and monotonically increasing with

inverse b(λ) = λ/(1 − λ). Then, X1(b(λ))/v(b(λ)) is a standard Brownian motion because E[X1(b(λ))/v(b(λ))] = 0 and the covariance function is min(s, t). This implies that $X_1(\lambda) \overset{D}{=} v(\lambda)W(a(\lambda))$. We can also see that $X_2(\lambda) \overset{D}{=} \sqrt{3}\lambda(\lambda+1)W(1)$. Hence, we transform the Gaussian process G(λ) into a functional of the two independent Brownian motions W1(·) and W2(·) as follows:

\[ G(\lambda) \overset{D}{=} (\lambda+1)\,W_1\!\left(\frac{\lambda}{\lambda+1}\right) + \sqrt{3}\,\lambda(\lambda+1)W_2(1), \tag{B.9} \]

and we thus obtain the second equality in distribution in (B.7).
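As a quick check on (B.9), the covariance of its right-hand side can be computed directly for s < t, using only Cov(W1(a), W1(b)) = min(a, b), Var(W2(1)) = 1, and the independence of W1 and W2. The two pieces add up to the covariance function in (B.8):

```latex
\begin{aligned}
\operatorname{Cov}\Big[(s+1)W_1\big(\tfrac{s}{s+1}\big),\,(t+1)W_1\big(\tfrac{t}{t+1}\big)\Big]
  &= (s+1)(t+1)\min\Big(\tfrac{s}{s+1},\tfrac{t}{t+1}\Big) = s(t+1),\\
\operatorname{Cov}\Big[\sqrt{3}\,s(s+1)W_2(1),\,\sqrt{3}\,t(t+1)W_2(1)\Big]
  &= 3st(s+1)(t+1).
\end{aligned}
```

The minimum in the first line is attained at s/(s + 1) because λ/(λ + 1) is increasing in λ.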

We next derive the limiting distribution of (B.7). The continuity of (t + 1)W1(t

t+1) +√3t(t+ 1)W2(1)/

√3tγ(1 + t)2−γ on [0, T ] for a given T > 0 yields that,

sup1≤k≤mT

∣∣∣( km + 1)W1(

kk+m) +

√3 km( k

m + 1)W2(1)∣∣∣

√3(k/m)γ(1 + k/m)2−γ

→ sup0<t≤T

∣∣∣(t+ 1)W1(t

t+1) +√3t(t+ 1)W2(1)

∣∣∣√3tγ(t+ 1)2−γ

For k ≥ mT , we have

supmT≤k<∞

∣∣∣( km + 1)W1(

∣∣∣√3(k/m)γ(1 + k/m)2−γ

≤ supT≤t<∞

∣∣∣(t+ 1)W1(t

t+1)∣∣∣

√3tγ(t+ 1)2−γ

= supT≤t<∞

∣∣∣W1(t

t+1)∣∣∣

√3tγ(t+ 1)1−γ

(B.10)

and for any δ > 0,

limT→∞

supT≤t<∞

∣∣∣W1(t

t+1)∣∣∣

√3tγ(t+ 1)1−γ

= 0. (B.11)

We also have

supmT≤k<∞

∣∣∣∣∣√3 km( k

m + 1)√3(k/m)γ(1 + k/m)2−γ

∣∣∣∣∣ ≤ supT≤t<∞

∣∣∣∣∣√3t(t+ 1)√

3tγ(1 + t)2−γ− 1

∣∣∣∣∣→ 0, as T → ∞.

(B.12)

Putting together (B.10), (B.11), and (B.12), we can see that as m → ∞ and T → ∞,∣∣∣∣∣∣ supmT≤k<∞

∣∣∣( km + 1)W1(

kk+m) +

√3 km( k

m + 1)W2(1)∣∣∣

√3(k/m)γ(1 + k/m)2−γ

− supT≤t<∞

∣∣∣(t+ 1)W1(t

t+1) +√3t(t+ 1)W2(1)

∣∣∣√3tγ(t+ 1)2−γ

∣∣∣∣∣∣ = op(1).

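Hence,

\[ \sup_{1\le k<\infty} \frac{\left| G\!\left(\frac{k}{m}\right) \right|}{h(m,k)} \xrightarrow{d} \sup_{0\le t<\infty} \frac{\left| (t+1)W_1\!\left(\frac{t}{t+1}\right) + \sqrt{3}\,t(t+1)W_2(1) \right|}{\sqrt{3}\,t^{\gamma}(t+1)^{2-\gamma}}. \tag{B.13} \]

From the scalar transformation, we have

\[ \left\{ \frac{(t+1)W_1\!\left(\frac{t}{t+1}\right) + \sqrt{3}\,t(t+1)W_2(1)}{\sqrt{3}\,t^{\gamma}(t+1)^{2-\gamma}},\ 0 \le t < \infty \right\} \overset{D}{=} \left\{ \frac{1-t}{\sqrt{3}\,t^{\gamma}}W_1(t) + t^{1-\gamma}W_2(1),\ 0 \le t \le 1 \right\}. \]

Therefore, we obtain

\[ \sup_{1\le k<\infty} \frac{\left| G\!\left(\frac{k}{m}\right) \right|}{h(m,k)} \xrightarrow{d} \sup_{0\le t\le 1} \left| \frac{1-t}{\sqrt{3}\,t^{\gamma}}W_1(t) + t^{1-\gamma}W_2(1) \right|. \tag{B.14} \]

Theorem 3.1(i) is obtained from Lemmas B.1 and B.2, (B.3), and (B.14).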
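The scalar transformation used to pass from the half-line to the unit interval can be spelled out by the change of variable t = u/(1 − u) with 0 ≤ u < 1 (the symbol u is introduced here only for this illustration):

```latex
\begin{gathered}
t+1=\frac{1}{1-u},\qquad \frac{t}{t+1}=u,\qquad
\sqrt{3}\,t^{\gamma}(t+1)^{2-\gamma}=\frac{\sqrt{3}\,u^{\gamma}}{(1-u)^{2}},\\[4pt]
\frac{(t+1)W_1\big(\tfrac{t}{t+1}\big)+\sqrt{3}\,t(t+1)W_2(1)}{\sqrt{3}\,t^{\gamma}(t+1)^{2-\gamma}}
=\frac{(1-u)^{2}}{\sqrt{3}\,u^{\gamma}}\left[\frac{W_1(u)}{1-u}+\frac{\sqrt{3}\,u}{(1-u)^{2}}W_2(1)\right]
=\frac{1-u}{\sqrt{3}\,u^{\gamma}}W_1(u)+u^{1-\gamma}W_2(1).
\end{gathered}
```

The endpoint u = 1 corresponds to the limit t → ∞, where the first component vanishes and only the W2(1) term remains.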

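(ii) Let k > k∗ and ∆ = [∆1, ∆2]′. Then, the detector is expressed as

\[ \Gamma(m,k) = \sum_{t=m+1}^{m+k} \epsilon_t - \sum_{t=m+1}^{m+k} x_t'(\hat{\beta}_m - \beta_0) + \sum_{t=m+k^*}^{m+k} x_t'\Delta. \]

From Theorem 3.1(i), we have

\[ \sup_{1\le k<\infty} \left| \sum_{t=m+1}^{m+k} \epsilon_t - \sum_{t=m+1}^{m+k} x_t'(\hat{\beta}_m - \beta_0) \right| \Big/ h(m,k) = O_p(1). \]

We then focus on the last term and will show that

\[ \left| \sum_{t=m+k^*}^{m+k} x_t'\Delta \right| \Big/ h(m,k) = \left| \Delta_1 + \Delta_2 + \frac{k+k^*}{2m}\Delta_2 \right| (k-k^*+1) \big/ h(m,k) \to \infty. \tag{B.15} \]

Suppose that $k^* = O(m^{\theta})$ with 0 ≤ θ < 1 and ∆1 + ∆2 ≠ 0. Let $k = m^{\tilde{\theta}} + k^* - 1$ with $\tilde{\theta}$ satisfying $\max\left(\theta, (1-2\gamma)/(2(1-\gamma))\right) < \tilde{\theta} < 1$. Then, we have

\[ \left| \Delta_1 + \Delta_2 + \frac{k+k^*}{2m}\Delta_2 \right| \to |\Delta_1 + \Delta_2| > 0 \tag{B.16} \]

and

\[ \frac{k-k^*+1}{m^{1/2}(k/m)^{\gamma}(1+k/m)^{2-\gamma}} = \frac{O(m^{\tilde{\theta}(1-\gamma)})}{O(m^{1/2-\gamma})} = O\left(m^{\tilde{\theta}(1-\gamma)-(1/2-\gamma)}\right) \to \infty. \tag{B.17} \]

Thus, (B.16) and (B.17) imply (B.15).

When ∆1 + ∆2 = 0, let k = m + k∗ − 1 ($\tilde{\theta} = 1$). Since ∆2 ≠ 0, we have

\[ \left| \Delta_1 + \Delta_2 + \frac{k+k^*}{2m}\Delta_2 \right| \to \frac{|\Delta_2|}{2} > 0. \]

Since (B.17) holds with $\tilde{\theta} = 1$, we have (B.15).

Similarly, we can prove (B.15) for θ ≥ 1 and thus the test is consistent.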

Defining x(s) = [1, s, s2, · · · , sp]′, let

c1+k/m =

∫ 1+k/m

0x(s)ds, C =

0x(s)x(s)′ds, ak =

m+k∑t=m+1

xt, and Cm =m∑t=1

x′t.

In the following, we denote the ith and (i, j)th elements of c1+k/m and C as c1+k/m(i) and

C(i, j), respectively. For example, c1+k/m(i) = (1 + k/m)i/i and C(i, j) = 1/(i+ j − 1). We

also note that 1mCm → C and c1 is the first column of C.

Lemma B.3 As m → ∞, we have∥∥∥∥ 1

mak − (c1+k/m − c1)

∥∥∥∥ = O

), (B.18)∥∥∥∥ 1

mCm − C

∥∥∥∥ = O

), (B.19)∥∥∥∥∥

− C−1

∥∥∥∥∥ = O

). (B.20)

Proof of Lemma B.3. We first note that for positive integers a and b (a ≤ b),∫ bm

a−1m

sids =b∑

∫ tm

t−1m

sids ≥b∑

∫ tm

t−1m

(t− 1

b∑t=a

(t− 1

. (B.21)

Using this relation with a = m+ 1 and b = m+ k, we have, for i = 1, · · · , p+ 1,

0 ≤ 1

mak(i)− (c1+k/m(i)− c1(i)) (B.22)

m+k∑t=m+1

)i−1

−∫ 1+ k

1si−1ds ≤ 1

)i−1

. (B.23)

Because i = p + 1 is the highest order, we obtain (B.18). Similarly, we can see that (B.19)

and (B.20) hold using (B.21) with a = 1 and b = m.

From (B.3), we can express Γ(m, k) as follows:

Γ(m, k) =

m+k∑t=m+1

ϵt − a′kC−1m

m∑t=1

xtϵt. (B.24)

sup1≤k<∞

∣∣∣∣a′kC−1m

m∑t=1

xtϵt − (c1+k/m − c1)′C−1

m∑t=1

∣∣∣∣h(m, k)

= op(1), as m → ∞, (B.25)

where h(m, k) = g(m, k)/c with c = c(α) determined by the given significance level α.

Proof of Lemma B.4. Putting together (B.4), (B.18), and (B.20), the left-hand side of

(B.25) is bounded by

sup1≤k<∞

∥∥ 1mak

∥∥ ∥∥∥∥( 1mCm)−1 − C−1

∥∥∥∥ ∥∥∥∥ m∑t=1

∥∥∥∥h(m, k)

+ sup1≤k<∞

∥∥ 1mak − (c1+k/m − c1)

∥∥ ∥∥∥∥C−1m∑t=1

∥∥∥∥h(m, k)

= sup1≤k<∞

∥ 1mak∥

h(m, k)O

√m) +

(1 + k

)p − 1)

h(m, k)Op(

√m). (B.26)

The first element of 1mak is k/m, while for i = 2, · · · , p+ 1, we have, from (B.23),

m+k∑t=m+1

)i−1

≤∫ 1+k/m

1si−1ds+

)i−1

+i− 1

)i−2

where the last inequality holds by applying the mean-value theorem to each term of the right

hand side of the equality. Noting that for all 1 ≤ i ≤ p+ 1,

sup1≤k<∞

(1 + k

)i−1(km

)γ (1 + k

)p+1−γ ≤ sup1≤k<∞

)1−γ(1 + k

)1−γ = O(1),

we can see that sup ∥ 1mak∥/h(m, k) = O(m−1/2), which implies that the first term on the

right hand side of (B.26) is op(1). In the same way, the second term is shown to be op(1) and

the proof is complete.

sup1≤k<∞

∣∣∣∣ m+k∑t=m+1

ϵt − (c1+k/m − c1)′C−1

m∑t=1

xtϵt − G(m, k)

∣∣∣∣h(m, k)

= op(1), (B.27)

G(m, k) = σ(W1,m(k)− (c1+k/m − c1)

′C−1G(m)),

G(m) =

[W2,m(m),

mdW2,m(x), . . . ,

)pdW2,m(m)

Proof of Lemma B.5. The proof is similar to Lemma B.2 and thus we omit details.

Proof of Theorem 3.2. (i) Since the distribution of {W1,m(t), W2,m(t), 0 ≤ t < ∞} does not depend on m, we omit the subscript m in the following. We first establish that

\[ \sup_{1\le k<\infty} \frac{|G(m,k)|}{h(m,k)} \overset{D}{=} \sup_{1\le k<\infty} \frac{\left| \left(\frac{k}{m}+1\right)W_1\!\left(\frac{k}{k+m}\right) + c'\!\left(\frac{k}{m}\right)W(1) \right|}{\sqrt{f(p)}\,(k/m)^{\gamma}(1+k/m)^{p+1-\gamma}}, \tag{B.28} \]

where

\[ c(x) = D^{-1/2}L^{-1}\left( c_{1+x} - (1+x)c_1 \right), \tag{B.29} \]

L is a lower triangular matrix defined as, for i ≥ j,

\[ L(i,j) = (2j-1)\binom{2j-2}{j-1}\binom{2i-1}{i-j} \Big/ \left[ (2i-1)\binom{2i-2}{i-1} \right], \tag{B.30} \]

and D is a diagonal matrix defined as, for i = j,

\[ D(i,j) = 1 \Big/ \left[ (2i-1)\binom{2i-2}{i-1}^{2} \right]. \tag{B.31} \]

To see this, let k/m = λ and define $G(\lambda) = G(m,\lambda m)/(\sigma\sqrt{m}) = \left( W_{1,m}(\lambda m) - (c_{1+\lambda} - c_1)'C^{-1}G(m) \right)/\sqrt{m}$. This is a Gaussian process with mean zero and covariance function given by

\[ \begin{aligned} E[G(s)G(t)] &= s + (c_{1+s} - c_1)'C^{-1}(c_{1+t} - c_1) \\ &= s + c_{1+s}'C^{-1}c_{1+t} - (1+s) - (1+t) + 1 \\ &= c_{1+s}'C^{-1}c_{1+t} - (1+t) \end{aligned} \]

for s ≤ t, where we used the fact that $c_1'C^{-1} = [1, 0, \cdots, 0]$ because $c_1$ is the first column of the symmetric matrix C. We further decompose the first term into the term (1 + s)(1 + t) and the higher order polynomial. Since $c_{1+s} = (1+s)c_1 + \left( c_{1+s} - (1+s)c_1 \right)$, we have

\[ c_{1+s}'C^{-1}c_{1+t} = (1+s)(1+t) + \left( c_{1+s} - (1+s)c_1 \right)'C^{-1}\left( c_{1+t} - (1+t)c_1 \right), \tag{B.32} \]

because $c_1'C^{-1}\left( c_{1+s} - (1+s)c_1 \right) = 0$. By using the Cholesky decomposition in Hitotumatu (1988), the Hilbert matrix C can be decomposed as C = LDL′, where L and D are defined in (B.30) and (B.31), respectively. Then, the second term of the right-hand side of (B.32) becomes

\[ \begin{aligned} &\left( c_{1+s} - (1+s)c_1 \right)'C^{-1}\left( c_{1+t} - (1+t)c_1 \right) \\ &\quad = \left( c_{1+s} - (1+s)c_1 \right)'(L')^{-1}D^{-1/2}D^{-1/2}L^{-1}\left( c_{1+t} - (1+t)c_1 \right) = c'(s)c(t), \end{aligned} \tag{B.33} \]

where c(·) is defined in (B.29). Therefore, the covariance function can be expressed as

\[ E[G(s)G(t)] = s(1+t) + c'(s)c(t). \]

In exactly the same way as the derivation of (B.9), we have, using Doob's transformation,

\[ G(\lambda) \overset{D}{=} (\lambda+1)\,W_1\!\left(\frac{\lambda}{\lambda+1}\right) + c'(\lambda)W(1), \tag{B.34} \]

where $W(1) = [W_2(1), W_3(1), \cdots, W_{p+2}(1)]'$ is a (p + 1)-dimensional Wiener process evaluated at 1 and $W_1(\cdot), W_2(\cdot), \cdots$, and $W_{p+2}(\cdot)$ are independent Wiener processes.

We next derive the limiting distribution of (B.28). For given T > 0, we can see that

sup1≤k≤mT

∣∣∣( km + 1)W1(

kk+m) + c′

)W (1)

∣∣∣√f(p)(k/m)γ(1 + k/m)p+1−γ

→ sup0<t≤T

∣∣∣(t+ 1)W1(t

t+1) + c′ (t)W (1)∣∣∣√

f(p)tγ(t+ 1)p+1−γa.s.

On the other hand, as in the proof of Theorem 3.1, we have

supT≤k<∞

∣∣∣( km + 1)W1(

∣∣∣√f(p)(k/m)γ(1 + k/m)p+1−γ

p−→ 0,

while we will also show that∥∥∥∥∥ supT≤k<∞

c′ (k/m)√f(p)(k/m)γ(1 + k/m)p+1−γ

− ℓ

∥∥∥∥∥→ 0, where ℓ = [0, 0, · · · , 1]′. (B.35)

To see this, we note that the first element of c1+t − (1 + t)c1 is zero, while the ith element

for i = 2, · · · , p+ 1 is given by

(1 + t)i

i− 1 + t

i[(1 + t)i−1 − 1] =

i× (polynomial of t of order i− 1).

Thus, the highest order in c(t) is p + 1 (corresponding to tp+1) from the (p + 1)th element

and its coefficient is given by 1/(p+1) times (p+1, p+1)th element of the upper triangular

matrix (L′)−1D−1/2, which is equivalent to√f(p). Then,∣∣∣∣∣ sup

T≤k<∞

√f(p)(k/m)p+1√

f(p)(k/m)γ(1 + k/m)p+1−γ− 1

∣∣∣∣∣→ 0,

while for i = 1, · · · , p, ∣∣∣∣∣ supT≤k<∞

(k/m)i√f(p)(k/m)γ(1 + k/m)p+1−γ

∣∣∣∣∣→ 0,

and thus we obtain (B.35). These results imply that

sup1≤k<∞

∣∣∣( km + 1)W1(

kk+m) + c′ (k/m)W (1)

∣∣∣√f(p)(k/m)γ(1 + k/m)p+1−γ

d−→ sup0<t<∞

∣∣∣(t+ 1)W1(t

t+1) + c′ (t)W (1)∣∣∣√

f(p)tγ(t+ 1)p+1−γ.

Finally, by changing t as in the proof of Theorem 3.1, we obtain the null limiting distribution.

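(ii) Let k > k∗ and ∆ = [∆1, ∆2, · · · , ∆p+1]′. As demonstrated in the proof of Theorem 3.1, it is sufficient to show that

\[ \left| \sum_{t=m+k^*}^{m+k} x_t'\Delta \right| \Big/ h(m,k) \to \infty. \tag{B.36} \]

The left-hand side of (B.36) can be decomposed into the term

\[ \left| \sum_{i=1}^{p+1} \frac{m}{i}\left[ \left(1+\frac{k}{m}\right)^{i} - \left(1+\frac{k^*-1}{m}\right)^{i} \right]\Delta_i \right| \Big/ h(m,k) \tag{B.37} \]

and a negligible term. To see this, using (B.21) with a = m + k∗ and b = m + k, the ith element of $\sum_{t=m+k^*}^{m+k} x_t$ can be rewritten as

\[ \begin{aligned} \sum_{t=m+k^*}^{m+k}\left(\frac{t}{m}\right)^{i-1} &= m\int_{1+\frac{k^*-1}{m}}^{1+\frac{k}{m}} x^{i-1}\,dx + O\left( \left(1+\frac{k}{m}\right)^{i-1} - \left(1+\frac{k^*-1}{m}\right)^{i-1} \right) \\ &= \frac{m}{i}\left[ \left(1+\frac{k}{m}\right)^{i} - \left(1+\frac{k^*-1}{m}\right)^{i} \right] + O\left( \frac{(i-1)(k-k^*+1)}{m}\left(1+\frac{k_i}{m}\right)^{i-2} \right), \end{aligned} \tag{B.38} \]

where the last equality holds by applying the mean-value theorem with $k_i \in [k^*-1, k]$. Note that the second term of (B.38) does not appear for i = 1, while for i = 2, 3, · · · , p + 1, the second term of (B.38) over h(m, k) is negligible because

\[ \left| \frac{(k-k^*+1)(1+k_i/m)^{i-2}}{m\,h(m,k)} \right| < \left| \frac{k(1+k/m)^{i-2}}{m\,m^{1/2}(k/m)^{\gamma}(1+k/m)^{p+1-\gamma}} \right| = \left| \frac{(k/m)^{1-\gamma}}{m^{1/2}(1+k/m)^{p+3-\gamma-i}} \right| \to 0 \]

for any value of k.

We next show that the term (B.37) diverges to infinity. Suppose that $k^* = O(m^{\theta})$ with 0 ≤ θ < 1 and ∆1 + ∆2 + · · · + ∆p+1 ≠ 0. Let $k = m^{\tilde{\theta}} + k^* - 1$ with $\tilde{\theta}$ satisfying $\max\left(\theta, (1-2\gamma)/(2(1-\gamma))\right) < \tilde{\theta} < 1$. Then, because k/m → 0, we have, using the binomial expansion,

\[ \left| \sum_{i=1}^{p+1} \frac{m}{i}\left[ \left(1+\frac{k}{m}\right)^{i} - \left(1+\frac{k^*-1}{m}\right)^{i} \right]\Delta_i \right| = (k-k^*+1)\,|\Delta_1 + \Delta_2 + \cdots + \Delta_{p+1}|\,(1+o(1)). \]

By combining this with (B.17), we can see that (B.37) goes to infinity and thus (B.36) holds.

When ∆1 + ∆2 + · · · + ∆p+1 = 0, let k = ξm + k∗ − 1 ($\tilde{\theta} = 1$), where a positive real value ξ can be chosen such that

\[ \sum_{i=1}^{p+1} \frac{1}{i}\left[ \left(1+\frac{k}{m}\right)^{i} - \left(1+\frac{k^*-1}{m}\right)^{i} \right]\Delta_i \to \sum_{i=1}^{p+1} \frac{(1+\xi)^{i}-1}{i}\Delta_i \neq 0. \]

Since m/h(m, k) → ∞ by (B.17) with $\tilde{\theta} = 1$, we have (B.36).

Similarly, we can prove (B.36) for θ ≥ 1 and thus the test is consistent.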

Appendix C. Proofs of Theorems 3.3 and 3.4

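The proof of Theorem 3.3 is based on the framework of Aue et al. (2009) through a series of lemmas. The basic idea is to find a sequence N = N(m, x) such that

\[
P\{\tau_m \ge N\} = P\left\{\max_{1\le k\le N}\frac{|\Gamma(m,k)|}{g(m,k)} \le 1\right\} \to \Phi(x), \quad \text{for all real } x.
\]

Now, we define N as

\begin{align*}
N^{1-\gamma}
&= \frac{\sqrt{3}\,c\sigma m^{1/2-\gamma}}{|\delta|}
 - \frac{1}{c_m^{\gamma}\delta}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta
 - \sigma x\left(\frac{(\sqrt{3}\,c\sigma)^{1/2-\gamma}m^{(1/2-\gamma)^2}}{|\delta|^{3/2-2\gamma}}\right)^{\frac{1}{1-\gamma}} \\
&= c_m^{1-\gamma}
 - \frac{1}{c_m^{\gamma}\delta}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta
 - \frac{\sigma x}{|\delta|}\,c_m^{1/2-\gamma} \\
&= a_m^{1-\gamma} - x\,\frac{(1-\gamma)b_m}{c_m^{\gamma}}. \tag{C.1}
\end{align*}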

The following proof assumes that δ is positive and the same result can be derived under the

condition δ < 0. We first derive the order of the maximum of the partial sums of xt − d in

Lemmas C.1 and C.2, where d = [1, 1]′. Note that d is not the mean of regressor xt.

Lemma C.1 For j = 1 and 2, as m → ∞,

\[
\max_{k^*\le k\le c_m}\left|\sum_{t=m+k^*}^{m+k}(x_{jt}-d_j)\right| = o(1), \tag{C.2}
\]

\[
\max_{1\le k\le k^*}\left|\sum_{t=m+1}^{m+k}(x_{jt}-d_j)\right| = o(1). \tag{C.3}
\]

Proof of Lemma C.1. The result for j = 1 is obvious because the first element of xt is unity. In the case of j = 2,

\[
\max_{k^*\le k\le c_m}\left|\sum_{t=m+k^*}^{m+k}\left(\frac{t}{m}-1\right)\right|
= \max_{k^*\le k\le c_m}\left|\frac{(k+k^*)(k-k^*+1)}{2m}\right|
\le \frac{(c_m+k^*)(c_m-k^*+1)}{2m} = o(1),
\]

since $k^{*2}/m = o(1)$ and $c_m^2/m = o(1)$ from Lemmas C.3(i) and (iii).

We can also see that

\[
\max_{1\le k\le k^*}\left|\sum_{t=m+1}^{m+k}\left(\frac{t}{m}-1\right)\right|
= \max_{1\le k\le k^*}\left|\frac{k(k+1)}{2m}\right|
\le \frac{k^*(k^*+1)}{2m} = o(1),
\]

and we finish the proof.
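The closed form used in this proof can be checked numerically. The sketch below is only an illustration: it assumes that the second regressor is the scaled trend t/m (which is what the closed form implies), and the function names and test triples are hypothetical. It compares the term-by-term sum of t/m − 1 over t = m + a, ..., m + b with (b + a)(b − a + 1)/(2m), using exact rational arithmetic.

```python
from fractions import Fraction


def centered_trend_sum(m, a, b):
    """Sum of (t/m - 1) for t = m+a, ..., m+b, computed term by term."""
    return sum(Fraction(t, m) - 1 for t in range(m + a, m + b + 1))


def closed_form(m, a, b):
    """Closed form (b + a)(b - a + 1) / (2m) for the same sum."""
    return Fraction((b + a) * (b - a + 1), 2 * m)


# Hypothetical (m, a, b) triples; a plays the role of k* and b the role of k.
checks = [(50, 1, 10), (100, 5, 40), (250, 3, 3)]
all_match = all(
    centered_trend_sum(m, a, b) == closed_form(m, a, b) for m, a, b in checks
)
print(all_match)
```

Because the arithmetic is exact, agreement on these triples is an identity check rather than a floating-point approximation.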

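Lemma C.2 Under Assumption 3.2, as m → ∞,

\[
\frac{a_m}{c_m} \to 1 \quad\text{and}\quad \frac{N}{c_m} \to 1, \tag{C.4}
\]

\[
\max_{k^*\le k\le N}\frac{1}{k^{\gamma}}\left|\sum_{t=m+k^*}^{m+k}(x_t-d)'\Delta\right| = O\left(\frac{N^{2-\gamma}}{m}\right), \tag{C.5}
\]

\[
\max_{k^*\le k\le N}\frac{1}{k^{\gamma}}\left\|\sum_{t=m+k^*}^{m+k}(x_t-d)\right\| = O\left(\frac{N^{2-\gamma}}{m}\right). \tag{C.6}
\]

Proof of Lemma C.2. From the definition of N, we find that

\[
\left(\frac{N}{c_m}\right)^{1-\gamma} = 1 - \frac{1}{c_m\delta}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta - \frac{\sigma x}{\sqrt{c_m}\,|\delta|}. \tag{C.7}
\]

Since

\[
\sum_{t=m+a}^{m+b}(x_t-d)'\Delta = \frac{(b+a)(b-a+1)}{2m}\,\Delta_2, \tag{C.8}
\]

Lemma C.3 implies that the second term of the right-hand side in (C.7) is o(1) because

\[
\frac{1}{c_m\delta}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta = \frac{(c_m^2-k^{*2}+c_m+k^*)\Delta_2}{2mc_m\delta} = o(1),
\]

which implies $(a_m/c_m)^{1-\gamma} \to 1$, while the third term also goes to 0 according to the definition of $c_m$. Thus, we have $N/c_m \to 1$.

We next observe that

\begin{align*}
\max_{k^*\le k\le N}\frac{1}{k^{\gamma}}\left|\sum_{t=m+k^*}^{m+k}(x_t-d)'\Delta\right|
&< \max_{k^*\le k\le N}\frac{(k^2+k+k^*)|\Delta_2|}{2mk^{\gamma}} \\
&\le \left(\frac{N^{2-\gamma}}{2m}+\frac{N^{1-\gamma}}{2m}+\frac{k^{*(1-\gamma)}}{2m}\right)|\Delta_2| \tag{C.9} \\
&= O\left(\frac{N^{2-\gamma}}{m}\right),
\end{align*}

since for the third term of (C.9), according to Assumption 3.2(a) and (C.4), we have

\[
\frac{k^{*(1-\gamma)}}{m} = \frac{N^{2-\gamma}}{m}\,O\left(m^{\theta(1-\gamma)-\frac{(1/2-\gamma)(2-\gamma)}{1-\gamma}}\right) = o\left(\frac{N^{2-\gamma}}{m}\right).
\]

Similarly, we can derive (C.6) and we omit the proof.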

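Lemma C.3 Under Assumption 3.2, as m → ∞,

(i) $k^{*2}/m \to 0$.

(ii) $N/m \to 0$.

(iii) $c_m^2/m \to 0$.

(iv) $k^*/\sqrt{c_m} \to 0$ and $k^*/\sqrt{N} \to 0$.

Proof of Lemma C.3. (i) This is obvious by noting that k∗ = O(m^θ) with 0 ≤ θ < 1/2.

(ii) From the definition of $c_m$,

\[
\left(\frac{c_m}{m}\right)^{1-\gamma} = \frac{\sqrt{3}\,c\sigma}{|\delta|\sqrt{m}} = o(1).
\]

Applying (C.4), we obtain N/m = o(1).

(iii) From the definition of $c_m$, if γ ≠ 0 holds, we have

\[
\left(\frac{c_m^2}{m}\right)^{1-\gamma} = \frac{3c^2\sigma^2}{\delta^2 m^{\gamma}} = o(1).
\]

(iv) It can be verified by the assumption on k∗ and the definition of $c_m$ that

\[
\frac{k^*}{\sqrt{c_m}} = O\left(m^{\theta-(1/2-\gamma)/(2(1-\gamma))}\right) = o(1),
\]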

and cm can be replaced by N from (C.4). Thus, the proof is complete.
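The rates in Lemma C.3(ii) and (iii) can be illustrated with a short numerical sketch. It assumes only the relation $c_m^{1-\gamma} = \sqrt{3}c\sigma m^{1/2-\gamma}/|\delta|$ taken from the first term of (C.1); the parameter values (c, σ, δ, γ) below are hypothetical and chosen purely for illustration, with γ > 0 as required in part (iii).

```python
import math


def c_m(m, c=1.5, sigma=1.0, delta=0.5, gamma=0.25):
    """Solve c_m^(1-gamma) = sqrt(3)*c*sigma*m^(1/2-gamma)/|delta| for c_m."""
    rhs = math.sqrt(3.0) * c * sigma * m ** (0.5 - gamma) / abs(delta)
    return rhs ** (1.0 / (1.0 - gamma))


def ratios(m, **params):
    """Return (c_m / m, c_m^2 / m); both should shrink as m grows if gamma > 0."""
    cm = c_m(m, **params)
    return cm / m, cm ** 2 / m


for m in (10**2, 10**4, 10**6, 10**8):
    r1, r2 = ratios(m)
    print(m, r1, r2)
```

With these illustrative values $c_m$ grows like $m^{1/3}$, so $c_m/m$ decays like $m^{-2/3}$ and $c_m^2/m$ like $m^{-1/3}$; the second rate is slower, which is why part (iii) is the binding condition.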

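Lemma C.4 Under Assumption 3.2, for all real x,

\[
\lim_{m\to\infty}\left(\frac{N}{m}\right)^{\gamma-1/2}\left(c\sigma - \left|\frac{1}{\sqrt{3m}\,(N/m)^{\gamma}}\,S_m(k^*,N)\right|\right) = \frac{\sigma x}{\sqrt{3}}, \tag{C.10}
\]

where $S_m(k^*,a) = \sum_{t=m+k^*}^{m+a} x_t'\Delta$.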

Proof of Lemma C.4. It can be verified that(N

)γ−1/2 1√3m(N/m)γ

Sm(k∗, N)

)γ−1/2(

Nδ√3m(N/m)γ

3m(N/m)γ

m+N∑t=m+k∗

(xt − d)′∆

)+ o(1). (C.11)

Since the first term in the parentheses can be shown to dominate the second one, we can see

for a large m that |Sm(k∗, N)| = Sm(k∗, N) when δ > 0. Then, the left-hand side of (C.10)

can be rewritten by inserting (C.11) as(N

)γ−1/2(cσ − Nδ√

3m(N/m)γ− 1√

3m(N/m)γ

m+N∑t=m+k∗

(xt − d)′∆

)+ o(1). (C.12)

From the definition of N , we find that

Nδ√3m(N/m)γ

= cσ − 1√3m(cm/m)γ

m+cm∑t=m+k∗

(xt − d)′∆− σx√3

)γ−1/2

. (C.13)

If we prove that

Nγ−1/2(R(cm, cm)−R(N,N)) = o(1), where R(y, z) =1

m+z∑t=m+k∗

(xt − d)′∆,

the lemma is proven by using (C.12), (C.13), and the fact that (N/m)γ−1/2(m/cm)γ−1/2 → 1.

We start by transforming

Nγ−1/2|R(cm, cm)−R(N,N)| ≤ Nγ−1/2|R(cm, cm)−R(N, cm)|+Nγ−1/2|R(N, cm)−R(N,N)|.

(C.14)

Applying the mean value theorem and (C.2) of Lemma C.1, the first term of the right-hand

side in (C.14) is rewritten as

Nγ−1/2|Nγ − cγm|cγmNγ

∣∣∣∣∣m+cm∑t=m+k∗

(xt − d)′∆

∣∣∣∣∣ = Nγ−1/2O(c2γ−1m c

1/2−γm /δ)

cγmNγo(1) = o(1),

(see p.186 of Aue et al. (2009)). For the second term of the right-hand side in (C.14), using

c2m/m = o(1), (C.4), and (C.8),

Nγ−1/2|R(N, cm)−R(N,N)| = 1√N

∣∣∣∣∣m+N∑

t=m+cm+1

(xt − d)′∆

∣∣∣∣∣ = N2 − c2m +N − cm

2√Nm

= o(1).

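Lemma C.5 Under Assumptions 3.1 and 3.2,

\[
\left(\frac{N}{m}\right)^{\gamma-1/2}\left(\max_{1\le k<k^*}\frac{|\Gamma(m,k)|}{h(m,k)} - \left|\frac{1}{\sqrt{3m}\,(N/m)^{\gamma}}\,S_m(k^*,N)\right|\right) \xrightarrow{p} -\infty, \tag{C.15}
\]

where h(m, k) = g(m, k)/c.

Proof of Lemma C.5. Γ(m, k) can be written as

\[
\Gamma(m,k) = \sum_{t=m+1}^{m+k}\epsilon_t + \sum_{t=m+1}^{m+k}x_t'(\beta_0-\beta_m) + S_m(k^*,k)\,1\{k\ge k^*\}. \tag{C.16}
\]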

For the first term of (C.16), we have(N

)γ−1/2

max1≤k<k∗

|∑m+k

t=m+1 ϵt − σWm(k)|h(m, k)

= Op(1) max1≤k<k∗

k(1/ν−γ)

N1/2−γ

((k∗

)1/2−γ)

= op(1),

where the inequality holds because 1/ν < 1/2, and we next find that as m → ∞,(N

)γ−1/2 ∣∣∣∣ max1≤k<k∗

h(m, k)

∣∣∣∣ ≤ ∣∣∣∣ sup0<t<k∗

Wm(t)√N(t/N)γ

∣∣∣∣ D=

∣∣∣∣∣ sup0<t<k∗/N

∣∣∣∣∣ = op(1), (C.17)

which implies that (N/m)γ−1/2max1≤k<k∗ |∑m+k

t=m+1 ϵt|/h(m, k) = op(1) holds.

The second term of (C.16) can be rewritten as

)γ−1/2

max1≤k<k∗

∣∣∣∑m+kt=m+1 x

′t(β0 − βm)

∣∣∣h(m, k)

)γ−1/2

max1≤k<k∗

∥∥∥∑m+kt=m+1 xt

∥∥∥h(m, k)

(1√m

= Op(1)

)γ−1/2 max

1≤k<k∗

∥∥∥∑m+kt=m+1 d

∥∥∥√3m(k/m)γ(1 + k/m)2−γ

+ max1≤k<k∗

∥∥∥∑m+kt=m+1(xt − d)

∥∥∥√3m(k/m)γ(1 + k/m)2−γ

= Op(1)

)γ−1/2

((k∗

)1−γ)

= op(1).

The third term of (C.16) is zero when k < k∗. Thus, we prove that the first component of (C.15) tends to 0 and we next show that the second component diverges as m → ∞. From

\[
\sum_{t=m+a}^{m+b}x_t'\Delta = \sum_{t=m+a}^{m+b}d'\Delta + \sum_{t=m+a}^{m+b}(x_t-d)'\Delta = (b-a+1)\delta + \frac{(b+a)(b-a+1)}{2m}\,\Delta_2, \tag{C.18}
\]

and (C.5) of Lemma C.2, we have

Sm(k∗, N)√3m(N/m)γ

∑m+Nt=m+k∗ d

′∆√3m(N/m)γ

∑m+Nt=m+k∗(xt − d)′∆√3m(N/m)γ

=(N − k∗ + 1)δ√

3m(N/m)γ+O

(N2−γ

m3/2−γ

(N − k∗ + 1)δ√3m(N/m)γ

+ o(1).

Since, for δ > 0,

limm→∞

(N − k∗ + 1)δ√3m(N/m)γ

= limm→∞

c1−γm δ√

3m1/2−γ= cσ > 0,

which is obtained from the definition of cm and (C.4), the second term in (C.15) diverges to

infinity and thus we obtain Lemma C.5.

Lemma C.6

\[
\left(\frac{N}{m}\right)^{\gamma-1/2}\max_{k^*\le k\le N}\frac{\left|\Gamma(m,k)-\left(\sigma W_m(k)+S_m(k^*,k)\right)\right|}{h(m,k)} = o_p(1). \tag{C.19}
\]

Proof of Lemma C.6. From (C.16), the left-hand side of (C.19) is decomposed into two

terms, one of which is

)γ−1/2

maxk∗≤k≤N

∣∣∣∑m+kt=m+1 ϵt − σWm(k)

∣∣∣√3m(k/m)γ(1 + k/m)2−γ

= Op(1) maxk∗≤k≤N

k1/ν−γ

N1/2−γ= op(1),

maxk∗≤k≤N

k1/ν−γ

N1/2−γ≤

N1/ν−1/2 if 1/ν ≥ γ

Nγ−1/2k∗1/ν−γ if 1/ν < γ

= op(1),

while the other term becomes(N

)γ−1/2

maxk∗≤k≤N

∣∣∣∑m+kt=m+1 x

′t(β0 − βm)

∣∣∣h(m, k)

= Op(1)

)γ−1/2 max

k∗≤k≤N

∥∥∥∑m+kt=m+1 d

∥∥∥√3m(k/m)γ(1 + k/m)2−γ

+ maxk∗≤k≤N

∥∥∥∑m+kt=m+1(xt − d)

∥∥∥√3m(k/m)γ(1 + k/m)2−γ

= Op(1)

)γ−1/2

(N1−γ

m1−γ

)γ−1/2

(N2−γ

m2−γ

)= op(1).

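Lemma C.7

\[
\left(\frac{N}{m}\right)^{\gamma-1/2}\max_{k^*\le k\le N}\left|\frac{\sigma W_m(k)+S_m(k^*,k)}{h(m,k)} - \frac{\sigma W_m(k)+S_m(k^*,k)}{\sqrt{3m}\,(k/m)^{\gamma}}\right| = o_p(1). \tag{C.20}
\]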

Proof of Lemma C.7. The left-hand side of (C.20) is bounded by(N

)γ−1/2

maxk∗≤k≤N

|σWm(k)|√3m(k/m)γ

∣∣∣∣∣√3m(k/m)γ

h(m, k)− 1

∣∣∣∣∣+

)γ−1/2

maxk∗≤k≤N

|Sm(k∗, k)|√3m(k/m)γ

∣∣∣∣∣√3m(k/m)γ

h(m, k)− 1

∣∣∣∣∣ . (C.21)

The mean value theorem yields that

maxk∗≤k≤N

∣∣∣∣∣√3m(k/m)γ√

3m(k/m)γ(1 + k/m)2−γ− 1

∣∣∣∣∣ = maxk∗≤k≤N

∣∣∣∣∣(1 +

)γ−2

∣∣∣∣∣ = O

)= o(1),

and then the first term of (C.21) is shown to be op(1) as proven by Lemma 3.3 in Aue and

Horvath (2004). By using (C.18) and (C.5), the second component of (C.21) can be written

Nγ−1/2 maxk∗≤k≤N

1√3kγ

∣∣∣∣∣m+k∑

t=m+k∗

d′∆+m+k∑

t=m+k∗

(xt − d)′∆

∣∣∣∣∣∣∣∣∣∣(1 +

)γ−2

∣∣∣∣∣= Nγ−1/2

(O(N1−γ) +O

(N2−γ

)= o(1).

Lemma C.8 . Under Assumptions 3.1 and 3.2,

limm→∞

)γ−1/2

maxk∗≤k≤N

(|σWm(k) + Sm(k∗, k)|√

3m(k/m)γ− |Sm(k∗, N)|√

3m(N/m)γ

)≤ βm(γ)

)= Φ(x),

(C.22)

where βm(γ) =

)γ−1/2(cσ − |Sm(k∗, N)|√

3m(N/m)γ

Proof of Lemma C.8. We can see that

maxk∗≤k≤N

σWm(k) + Sm(k∗, k)√3m(k/m)γ

= maxk∗≤k≤N

1√3m(k/m)γ

(σWm(k) +

m+k∑t=m+k∗

(xt − d)′∆+ (k − k∗ + 1)δ

). (C.23)

We find that the order of the second term of (C.23) becomes, using (C.5) of Lemma C.2,

maxk∗≤k≤N

∣∣∣∑m+kt=m+k∗(xt − d)′∆

∣∣∣√3m1/2−γkγ

(mγ−1/2N

2−γ

)= o(mγ−1/2N1−γ),

while the order of the first term is given by

maxk∗≤k≤N

σ|Wm(k)|√3m(k/m)γ

)1/2−γ)

= op(mγ−1/2N1−γ).

On the contrary, the last term is

maxk∗≤k≤N

(k − k∗ + 1)δ√3m(k/m)γ

= O(mγ−1/2N1−γ),

which implies that the last term dominates the others and thus the maximum of (C.23) is

achieved at k close to N because the last term is an increasing function of k. Hence, for all

ε ∈ (0, 1),

limm→∞

k∗≤k≤N

|σWm(k) + Sm(k∗, k)|√3m(k/m)γ

= max(1−ε)N≤k≤N

|σWm(k) + Sm(k∗, k)|√3m(k/m)γ

Exactly in the same way as Lemma 7.6 of Aue et al. (2009), we can show that the maximum

of (C.23) is attained at k = N . Therefore, because Sm(k∗, N) dominates σWm(N) and

Sm(k∗, N) is positive for a large m when δ > 0, we have, because βm(γ) → σx/√3 by Lemma

limm→∞

)γ−1/2

maxk∗≤k≤N

(|σWm(k) + Sm(k∗, k)|√

3m(k/m)γ− |Sm(k∗, N)|√

3m(N/m)γ

)≤ βm(γ)

= limm→∞

(σ√3

Wm(N)√N

≤ βm(γ)

)= Φ(x).

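Proof of Theorem 3.3. By combining Lemmas C.5–C.8, we can see that

\[
\lim_{m\to\infty}P(\tau_m \ge N) = \lim_{m\to\infty}P\left(\max_{1\le k\le N}\frac{|\Gamma(m,k)|}{g(m,k)} \le 1\right) = \Phi(x).
\]

Because Φ(x) is symmetric around 0, we have

\begin{align*}
\Phi(x) &= 1-\Phi(-x) \\
&= 1-\lim_{m\to\infty}P(\tau_m \ge N(m,-x)) \\
&= 1-\lim_{m\to\infty}P\left(\tau_m^{1-\gamma} \ge a_m^{1-\gamma} + x\,\frac{(1-\gamma)b_m}{c_m^{\gamma}}\right) \\
&= \lim_{m\to\infty}P\left(\frac{c_m^{\gamma}}{1-\gamma}\,\frac{\tau_m^{1-\gamma}-a_m^{1-\gamma}}{b_m} \le x\right).
\end{align*}

This implies that $\tau_m/a_m \xrightarrow{p} 1$ because $a_m^{1-\gamma}c_m^{\gamma}/b_m \to \infty$. Applying the result in the proof of Theorem 3.1 of Aue et al. (2009), we obtain

\[
\lim_{m\to\infty}P\left(\frac{\tau_m-a_m}{b_m} \le x\right) = \lim_{m\to\infty}P\left(\frac{c_m^{\gamma}}{1-\gamma}\,\frac{\tau_m^{1-\gamma}-a_m^{1-\gamma}}{b_m} \le x\right) = \Phi(x),
\]

and hence complete the proof.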

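We next derive the asymptotic distribution of the stopping time based on the maximal-type fluctuation test through Lemmas C.9–C.14. We define the sequence $N^{FL}(m,x)$ as

\[
(N^{FL})^2 = \frac{c^{FL}\sigma m^{3/2}}{|(1/\eta-1)(\delta-\Delta_2/2)|} - \frac{\sigma x\sqrt{\eta-1}\,m\sqrt{a_m^{FL}}}{|(1/\eta-1)(\delta-\Delta_2/2)|} = (a_m^{FL})^2 - 2x\,a_m^{FL}b_m^{FL}, \tag{C.24}
\]

where $a_m^{FL}$ and $b_m^{FL}$ are defined in Theorem 3.4. The following derivation is carried out under the condition that δ − ∆2/2 is negative; the proof for positive δ − ∆2/2 follows similarly and is omitted here. Allowing for an abuse of notation, let $N = N^{FL}$, $a_m = a_m^{FL}$, $b_m = b_m^{FL}$, and $c = c^{FL}$. The detector is rewritten as

\begin{align*}
\Gamma_m^{FL}(k,\ell) ={}& \sum_{t=m+1}^{m+k}\epsilon_t - \frac{k}{m}\sum_{t=1}^{m}\epsilon_t - \frac{k(m+k)}{\ell(m+\ell)}\left(\sum_{t=m+1}^{m+\ell}\epsilon_t - \frac{\ell}{m}\sum_{t=1}^{m}\epsilon_t\right) \\
&+ \sum_{t=m+k^*}^{m+k}x_t'\Delta\,1\{k\ge k^*\} - \frac{k(m+k)}{\ell(m+\ell)}\sum_{t=m+k^*}^{m+\ell}x_t'\Delta\,1\{\ell\ge k^*\}.
\end{align*}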

Lemma C.9 Under Assumptions 3.2(a) and 3.3, as m → ∞,

(i) $k^*/\sqrt{a_m} = o(1)$.

(ii) $N/a_m \to 1$.

(iii) $N/m \to 0$.

(iv) $N^{3/2}/m \to \infty$.

Proof of Lemma C.9. (i) Since k∗ = O(m^θ) with 0 ≤ θ < 1/2, $k^*/\sqrt{a_m} = O(m^{\theta-3/8}) = o(1)$.

(ii) From the definition of N , we have(N

= 1− σx

√η − 1

|(1/η − 1)(δ −∆2/2)|m

The second term tends to 0 since

(|(1/η − 1)(δ −∆2/2)|

cσm3/2

= o(1).

(iii) Using (ii) and

√cσm3/4−1√

|(1/η − 1)(δ −∆2/2)|= o(1),

we find that N/m = o(1).

(iv) From the definition of N , we have N3/2/m = O(m1/8).
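The orders in parts (iii) and (iv) can be seen numerically from the leading term of (C.24). The sketch below assumes only that $(a_m)^2 = c\sigma m^{3/2}/|(1/\eta-1)(\delta-\Delta_2/2)|$, as read off from (C.24), and uses $a_m$ as a stand-in for N, which part (ii) justifies asymptotically. All parameter values are hypothetical; they are chosen so that δ − ∆2/2 is negative, matching the case treated in the text.

```python
import math


def a_fl(m, c=1.5, sigma=1.0, eta=2.0, delta=0.1, delta2=1.0):
    """Leading term a_m from (C.24): a_m^2 = c*sigma*m^(3/2) / |(1/eta - 1)(delta - delta2/2)|."""
    drift = abs((1.0 / eta - 1.0) * (delta - delta2 / 2.0))
    return math.sqrt(c * sigma * m ** 1.5 / drift)


for m in (10**2, 10**4, 10**6, 10**8):
    a = a_fl(m)
    # a/m should decay like m^(-1/4); a^(3/2)/m should grow like m^(1/8).
    print(m, a / m, a ** 1.5 / m)
```

Since $a_m$ is proportional to $m^{3/4}$, the first ratio vanishes while the second diverges slowly, in line with the $O(m^{1/8})$ order stated in part (iv).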

Lemma C.10 Under Assumptions 3.2(a) and 3.3, for all real x,

limm→∞

)−1/2

(cσ − |Jm(k∗, N)|) =√η − 1σx, (C.25)

where Jm(k∗, a) =1√m

m+a∑t=m+k∗

x′t∆− a(m+ a)

[a+1η ](m+

[a+1η

]) m+[a+1η

]∑t=m+k∗

x′t∆

. (C.26)

Proof of Lemma C.10. Let

C(η) =N(m+N)[

[N+1η

]) . (C.27)

Using (C.18), (N/m)−1/2Jm(k∗, k) can be expressed as(N

)−1/2

Jm(k∗, k) =

)−1/2 1√m

[(k − k∗ + 1)δ +

(k + k∗)(k − k∗ + 1)

2m∆2

−k(m+ k)

ℓ(m+ ℓ)

(ℓ− k∗ + 1)δ +

(ℓ+ k∗)(ℓ− k∗ + 1)

2m∆2

]= J1 + J2 + J3,

)−1/2 1√m

2m∆2 −

k(m+ k)

ℓ(m+ ℓ)

(ℓδ +

2m∆2

)−1/2 1√m

2m∆2 −

k(m+ k)

ℓ(m+ ℓ)

2m∆2

)−1/2 1√m

[(−k∗ + 1)δ +

−k∗2 + k∗

2m∆2 −

k(m+ k)

ℓ(m+ ℓ)

(−k∗ + 1)δ +

−k∗2 + k∗

2m∆2

We investigate the order of each term. First, we have

maxk∗≤k≤N

J1 =1√N

maxk∗≤k≤N

k(k − ℓ)

m+ ℓ

(−δ +

maxk∗≤k≤N

k(k − ℓ)

(−δ +

)+ o(1), (C.28)

where the second equality holds because 1/(m+ℓ) = 1/m+O(ℓ/m2) and O(N5/2/m2) = o(1).

Similarly, we have

maxk∗≤k≤N

J2 =1√N

maxk∗≤k≤N

k(ℓ− k)

m+ ℓ

)= o(1), (C.29)

maxk∗≤k≤N

J3 = maxk∗≤k≤N

1− k(m+ k)

ℓ(m+ ℓ)

−k∗δ√

N+ o(1) = O

(k∗√N

)= o(1). (C.30)

From (C.28)–(C.30), we have(N

)−1/2

maxk∗≤k≤N

Jm(k∗, k) =1√N

maxk∗≤k≤N

k(k − ℓ)

(−δ +

)+ o(1), (C.31)

which implies that, because ℓ = [(k + 1)/η],

)−1/2

Jm(k∗, N) =

)−1/2N

(N −

[N+1η

])m3/2

(−δ +

)+ o(1). (C.32)

On the other hand, from the definition of N , we find that

(1− 1

)(∆2

2− δ

m3/2+√η − 1σx

√amm

. (C.33)

Note that as is seen in (C.32) and (C.37), |Jm(k∗, N)| = Jm(k∗, N) for a large m when

δ −∆2/2 < 0. Then, by using (C.32) and (C.33), (C.25) can be written as

)−1/2(1− 1

)(∆2

2− δ

m3/2−(N −

[N + 1

])(∆2

2− δ

+√η − 1σx

√amm

+ o(1). (C.34)

From Lemma C.9, we can see that(N

)−1/2(1− 1

)N −

(N −

[N + 1

])(∆2

2− δ

)−1/2([N + 1

]− N

)(∆2

2− δ

)−1/2 N

)= o(1), (C.35)

limm→∞

)−1/2√η − 1σx

√amm

=√η − 1σx.

Thus, we complete the proof.

Lemma C.11 Under Assumptions 3.1, 3.2(a), and 3.3, then(N

)−1/2(

max1≤k<k∗

|ΓFLm (k, ℓ)|

hFL(m, k)− |Jm(k∗, N)|

)p−→ −∞, (C.36)

where hFL(m, k) = gFL(m, k)/cFL.

Proof of Lemma C.11. We simplify the notations as h(m, k) = hFL(m, k) and in the case

of k ≤ k∗,

ΓFLm (k, ℓ) =

m+k∑t=m+1

ϵt −k

m∑t=1

ϵt −k(m+ k)

ℓ(m+ ℓ)

(m+ℓ∑

ϵt −ℓ

m∑t=1

and consequently we obtain

max1≤k<k∗

|ΓFLm (k, ℓ)|h(m, k)

≤ max1≤k<k∗

∣∣∣∑m+kt=m+1 ϵt

∣∣∣h(m, k)

+ max1≤ℓ<k∗

|kϵm|h(m, k)

+ O(1)

max1≤k<k∗

∣∣∣∑m+ℓt=m+1 ϵt

∣∣∣h(m, k)

+ max1≤ℓ<k∗

|ℓϵm|h(m, k)

= B1 +B2 +B3 +B4.

Then, the first term is bounded by

)−1/2

B1 ≤(N

)−1/2

max1≤k<k∗

∣∣∣∑m+kt=m+1 ϵt − σWm(k)

∣∣∣√m(1 + k/m)2

)−1/2

max1≤k<k∗

|σWm(k)|√m(1 + k/m)2

= Op(1)O

1≤k<k∗

k1/ν√N

)+O(1) max

1≤k<k∗

|σWm(k)|√N

= Op(1),

where the last equality is derived by Lemma C.9 (i) and sup0<t<k∗/N |Wm(t)| = Op(1). Then,

for the second term, we have(N

)−1/2

max1≤k<k∗

k∣∣ 1m

∑mt=1 ϵt

∣∣√m(1 + k/m)2

(k∗√Nm

√m) = op(1).

Similarly, we can derive that (N/m)−1/2(B3 + B4) = Op(1) + op(1). We have proven that

the first term related to the detector in (C.36) is bounded in probability. We next show that

(N/m)−1/2Jm(k∗, N) diverges as m → ∞. Applying (C.32), and (C.35), we have(N

)−1/2

Jm(k∗, N) =

)−1/2(1− 1

)(∆2

2− δ

m3/2+ o(1)

(1− 1

)(∆2

2− δ

m+ o(1). (C.37)

If the term δ −∆2/2 is nonzero, we can show that (N/m)−1/2|Jm(k∗, N)| tends to positive

infinity and hence we finish the proof of Lemma C.11.

Lemma C.12 Under Assumptions 3.1, 3.2(a), and 3.3, we have(N

)−1/2

maxk∗≤k≤N

∣∣∣ΓFLm (k, ℓ)− Wm(k, ℓ)

∣∣∣/h(m, k) = op(1), (C.38)

Wm(k, ℓ) = WQ(k)−k(m+ k)

ℓ(m+ ℓ)WQ(ℓ),

WQ(j) = σWm(j) +

m+j∑t=m+k∗

x′t∆1j≥k∗.

Proof of Lemma C.12. Let

Q(m, j) =

m+j∑t=m+1

ϵt −j

m∑t=1

m+j∑t=m+k∗

x′t∆1j≥k∗.

Then, ΓFLm (k, ℓ) = Q(m, k)− k(m+ k)/ℓ(m+ ℓ)Q(m, ℓ). We have(

)−1/2

maxk∗≤k≤N

|Q(m, k)− WQ(k)|h(m, k)

)−1/2

maxk∗≤k≤N

|∑m+k

t=m+1 ϵt − σWm(k)|√m(1 + k/m)2

)−1/2

maxk∗≤k≤N

| km∑m

t=1 ϵt|√m(1 + k/m)2

k∗≤k≤N

k1/ν√N

k∗≤k≤N

k√Nm

)= op(1).

Similarly, we can also show that(N

)−1/2 k(m+ k)

ℓ(m+ ℓ)max

k∗≤k≤N

|Q(m, ℓ)− WQ(ℓ)|h(m, k)

= op(1),

since ℓ = [(k+1)/η] implies that k(m+k)/ℓ(m+ℓ) = O(1). Hence, the proof is complete.

Lemma C.13 Under Assumptions 3.1, 3.2(a), and 3.3, we have(N

)−1/2

maxk∗≤k≤N

∣∣∣∣∣Wm(k, ℓ)

h(m, k)− Wm(k, ℓ)√

∣∣∣∣∣ = op(1). (C.39)

Proof of Lemma C.13. The left-hand side of (C.39) is bounded by(N

)−1/2

maxk∗≤k≤N

|σWm(k)|√m

∣∣∣∣ √m

h(m, k)− 1

∣∣∣∣+ (N

)−1/2

maxk∗≤k≤N

k(m+ k)

ℓ(m+ ℓ)

|σWm(ℓ)|√m

∣∣∣∣ √m

h(m, k)− 1

∣∣∣∣+

)−1/2

maxk∗≤k≤N

∣∣∣∣∣m+k∑

t=m+k∗

x′t∆− k(m+ k)

ℓ(m+ ℓ)

m+ℓ∑t=m+k∗

x′t∆1ℓ≥k∗

∣∣∣∣∣∣∣∣∣ √

h(m, k)− 1

∣∣∣∣= C1 + C2 + C3.

It is easily seen that

maxk∗≤k≤N

∣∣∣∣ √m

h(m, k)− 1

∣∣∣∣ = maxk∗≤k≤N

∣∣∣∣ 1

(1 + k/m)2− 1

∣∣∣∣ = O

and (N

)−1/2

maxk∗≤k≤N

|Wm(k)|√m

≤ max1≤k≤N

|Wm(k)|√N

D= max

0<t≤1|W (t)| = Op(1). (C.40)

Thus, C1 tends to 0 and similarly, we can show that the term C2 is op(1).

For C3, it tends to zero in the case of ℓ < k∗ from (C.43). When ℓ ≥ k∗, we have, from

(C.31),

C3 ≤(N

)−1/2

maxk∗≤k≤N

|Jm(k∗, k)| maxk∗≤k≤N

∣∣∣∣ √m

h(m, k)− 1

∣∣∣∣=

maxk∗≤k≤N

∣∣∣∣k(k − ℓ)

(−δ +

)∣∣∣∣+ o(1)

)= o(1).

Thus, the proof is complete.

Lemma C.14 Under Assumptions 3.1, 3.2(a), and 3.3, we have

limm→∞

)−1/2

maxk∗≤k≤N

∣∣∣Wm(k, ℓ)

∣∣∣√m

− |Jm(k∗, N)|

≤ βFLm (γ)

= Φ(x), (C.41)

where βFLm (γ) =

)−1/2

(cσ − |Jm(k∗, N)|) .

Proof of Lemma C.14. We first show that the drift term in Wm(k, ℓ)/√m is dominant.

From (C.40), we have

maxk∗≤k≤N

Wm(k)√m

and thus, because k(m+ k)/ℓ(m+ ℓ) = O(1), the stochastic terms are op(N2/m3/2).

We next investigate the order of the magnitude of the following term:(N

)−1/2 1√m

∣∣∣∣∣m+k∑

t=m+k∗

x′t∆− k(m+ k)

ℓ(m+ ℓ)

m+ℓ∑t=m+k∗

x′t∆1ℓ≥k∗

∣∣∣∣∣ . (C.42)

In the case of ℓ < k∗, ℓ > (k+1)/η− 1 implies that k < η(k∗ +1)− 1. Then, it is easily seen

that (C.42) tends to zero because, using (C.18),

maxk∗≤k≤N

m+k∑t=m+k∗

x′t∆ = maxk∗≤k≤N

m+k∑

t=m+k∗

d′∆+m+k∑

t=m+k∗

(xt − d)′∆

(k∗√m

(k∗2

). (C.43)

For ℓ ≥ k∗, (C.42) is equal to (N/m)−1/2|Jm(k∗, k)|. From (C.31), it is easily seen that the

denominating term in (N/m)−1/2maxk∗≤k≤N |Jm(k∗, k)| is

maxk∗≤k≤N

k(k − ℓ)√Nm

∣∣∣∣−∆1 −∆2

∣∣∣∣ . (C.44)

Then, (C.42) is bounded below by

)−1/2

maxk∗≤k≤N

|Jm(k∗, k)| ≥N(N −

[N+1η

])√Nm

∣∣∣∣−∆1 −∆2

∣∣∣∣+ o(1) = O

On the contrary, because k(k − ℓ) in (C.44) is a strictly increasing function of k and thus it

is bounded above by(N

)−1/2

maxk∗≤k≤N

|Jm(k∗, k)| ≤ maxk∗≤k≤N

k√Nm

(η − 1)(k + 1)

∣∣∣∣−∆1 −∆2

∣∣∣∣+o(1) = O

because

η− 1 < ℓ ≤ k + 1

ηand thus

(η − 1)k − 1

η− 1 ≤ k − ℓ <

(η − 1)(k + 1)

This implies that the maximum of |Wm(k, ℓ)|/√N will be determined by the term (C.42)

and then achieved close to N . Further, we have proven that (N/m)−1/2|Jm(k∗, N)| tends to

positive infinity in Lemma C.11. We thus find that for all ε ∈ (0, 1),

limm→∞

)−1/2

maxk∗≤k≤N

|Wm(k, ℓ)|√m

)−1/2

max(1−ε)N≤k≤N

|Wm(k, ℓ)|√m

Next, we show that

)−1/2

max(1−ε)N≤k≤N

∣∣∣∣∣∣Wm(k)− k(m+ k)Wm(ℓ)

ℓ(m+ ℓ)−Wm(N) +

N(m+N)Wm

([N+1η

])[N+1η

[N+1η

])∣∣∣∣∣∣/

≤ sup(1−ε)N≤t≤N

∣∣∣∣∣∣Wm(t)− t(m+ t)Wm(ℓ)

ℓ(m+ ℓ)−Wm(N) +

N(m+N)Wm

([N+1η

])[N+1η

[N+1η

])∣∣∣∣∣∣

D= sup

1−ε≤s≤1

∣∣∣∣∣∣Wm(s)−s(mN + s

(1ηs+ c1

)(1ηs+ c1

)(mN + 1

ηs+ c1

) −Wm(1) +

(mN + 1

(1η + c2

)(1η + c2

)(mN + 1

η + c2

)∣∣∣∣∣∣

≤ sup1−ε≤s≤1

|Wm(s)−Wm(1)|

+ sup1−ε≤s≤1

∣∣∣∣∣∣ s(mN + s

)(1ηs+ c1

)(mN + 1

ηs+ c1

) −(mN + 1

)(1η + c2

)(mN + 1

η + c2

)∣∣∣∣∣∣∣∣∣∣Wm

ηs+ c1

)∣∣∣∣+ sup

1−ε≤s≤1

∣∣∣∣∣∣(mN + 1

)(1η + c2

)(mN + 1

η + c2

)∣∣∣∣∣∣∣∣∣∣Wm

ηs+ c1

)−Wm

η+ c2

)∣∣∣∣p−→ 0,

as ε → 0, where

], s =

N, c1 =

([t+ 1

]− t

)/N, c2 =

([N + 1

]− N

and c1, c2 tend to zero as N → ∞, where we used the scale transformation for equality in

distribution and the last convergence holds according to almost sure continuity of Brownian

motions.

Furthermore, we can see that

∣∣∣∣∣∣W (N)− N(m+N)[

[N+1η

])W ([N + 1

])−(W (N)− ηW

))∣∣∣∣∣∣≤

∣∣∣∣∣∣η − N(m+N)[N+1η

[N+1η

])∣∣∣∣∣∣∣∣∣∣ 1√

([N + 1

])∣∣∣∣+ ∣∣∣∣ η√N

([N + 1

])−W

))∣∣∣∣= op(1).

Since(W (N)− ηW

))/√N

D= W (η − 1), we have

limm→∞

)−1/2

maxk∗≤k≤N

∣∣∣Wm(k, ℓ)

∣∣∣√m

− |Jm(k∗, N)|

≤ βm(γ)

m→∞P

)−1/2∣∣∣Wm(N, [(N + 1)/η])

∣∣∣√m

− |Jm(k∗, N)|

≤ βm(γ)

m→∞P(σW (η − 1) ≤

√η − 1σx

)= Φ(x).

The proof is complete.
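The distributional identity used in the last step, namely that (W(N) − ηW(N/η))/√N has the same law as W(η − 1), rests on a variance calculation. The sketch below is a hedged illustration of that calculation only: it assumes η > 1 and uses the standard covariance Cov(W(s), W(t)) = min(s, t) of Brownian motion; the function names are hypothetical.

```python
def brownian_cov(s, t):
    """Covariance of standard Brownian motion: Cov(W(s), W(t)) = min(s, t)."""
    return min(s, t)


def var_combo(n, eta):
    """Variance of W(n) - eta * W(n / eta), expanded with brownian_cov."""
    u = n / eta
    return (
        brownian_cov(n, n)
        - 2.0 * eta * brownian_cov(n, u)
        + eta ** 2 * brownian_cov(u, u)
    )


# After dividing by n, the variance should equal eta - 1 for any n when eta > 1.
for n, eta in [(100.0, 2.5), (400.0, 2.0), (50.0, 5.0)]:
    print(n, eta, var_combo(n, eta) / n)
```

Because the combination is a centered Gaussian variable, matching the variance η − 1 is enough to identify its law with that of W(η − 1), which is how the bound √(η − 1)σx turns into Φ(x).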

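Proof of Theorem 3.4. By combining Lemmas C.11–C.14, we have

\begin{align*}
\lim_{m\to\infty}P(\tau_m \ge N)
&= \lim_{m\to\infty}P\left(\max_{1\le k\le N}\frac{|\Gamma_m^{FL}(k,\ell)|}{g^{FL}(m,k)} \le 1\right) \\
&= \lim_{m\to\infty}P\left(\left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N}\left(\frac{|W_m(k,\ell)|}{\sqrt{m}} - |J_m(k^*,N)|\right) \le \beta_m(\gamma)\right) = \Phi(x).
\end{align*}

Or we can rewrite it as

\begin{align*}
\lim_{m\to\infty}P\left(\frac{\tau_m^2-a_m^2}{2a_mb_m} \le x\right)
&= 1-\lim_{m\to\infty}P\left(\tau_m^2 \ge a_m^2+2a_mb_mx\right) \\
&= 1-\lim_{m\to\infty}P\left(\tau_m^2 \ge N^2(m,-x)\right) \\
&= \Phi(x). \tag{C.45}
\end{align*}

Since $a_m^2/(2a_mb_m) \to \infty$, (C.45) implies that $\tau_m/a_m \xrightarrow{p} 1$, and the mean value theorem yields that

\[
\frac{\tau_m-a_m}{b_m} = \frac{(\tau_m^2)^{1/2}-(a_m^2)^{1/2}}{b_m} = \frac{\tau_m^2-a_m^2}{2a_mb_m}\,(1+o_p(1)).
\]

From Slutsky's theorem, we thus obtain

\[
\lim_{m\to\infty}P\left(\frac{\tau_m-a_m}{b_m} \le x\right) = \lim_{m\to\infty}P\left(\frac{\tau_m^2-a_m^2}{2a_mb_m} \le x\right) = \Phi(x).
\]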

The proof is complete.

Table 3.1: Critical values

α           1%      2.5%    5%      10%       1%      2.5%    5%      10%

            (a) κ = 1                         (b) κ = 2
γ = 0.05    1.4641  1.2786  1.1296  0.9626    1.8146  1.5781  1.3866  1.1710
γ = 0.15    1.5738  1.3774  1.2180  1.0429    1.8918  1.6456  1.4483  1.2267
γ = 0.25    1.6989  1.4888  1.3213  1.1371    1.9725  1.7185  1.5161  1.2912
γ = 0.35    1.8400  1.6193  1.4476  1.2603    2.0632  1.8059  1.5979  1.3776
γ = 0.45    2.0250  1.8149  1.6523  1.4783    2.1776  1.9295  1.7361  1.5453

            (c) κ = 3                         (d) κ = 4
γ = 0.05    2.0005  1.7353  1.5230  1.2803    2.1099  1.8332  1.6073  1.3494
γ = 0.15    2.0597  1.7875  1.5700  1.3222    2.1582  1.8751  1.6447  1.3837
γ = 0.25    2.1215  1.8433  1.6220  1.3727    2.2083  1.9194  1.6857  1.4226
γ = 0.35    2.1878  1.9061  1.6824  1.4389    2.2600  1.9694  1.7351  1.4780
γ = 0.45    2.2702  1.9969  1.7894  1.5789    2.3288  2.0456  1.8232  1.6024

            (e) κ = 5                         (f) κ = 6
γ = 0.05    2.1886  1.9000  1.6647  1.3935    2.2443  1.9490  1.7051  1.4280
γ = 0.15    2.2295  1.9353  1.6964  1.4218    2.2792  1.9799  1.7326  1.4531
γ = 0.25    2.2717  1.9718  1.7301  1.4553    2.3158  2.0121  1.7625  1.4810
γ = 0.35    2.3146  2.0145  1.7712  1.5055    2.3524  2.0463  1.7981  1.5247
γ = 0.45    2.3686  2.0755  1.8480  1.6187    2.3965  2.1005  1.8666  1.6302

            (g) κ = 7                         (h) κ = 8
γ = 0.05    2.2866  1.9856  1.7365  1.4540    2.3171  2.0131  1.7603  1.4742
γ = 0.15    2.3174  2.0126  1.7609  1.4756    2.3446  2.0369  1.7823  1.4931
γ = 0.25    2.3490  2.0404  1.7864  1.5002    2.3727  2.0617  1.8051  1.5156
γ = 0.35    2.3811  2.0703  1.8173  1.5397    2.4026  2.0885  1.8339  1.5516
γ = 0.45    2.4182  2.1179  1.8802  1.6394    2.4370  2.1327  1.8918  1.6470

Table 3.2: Size of the tests (κ = 3)

m      ρ      CUSUM   FL
50     0.4    0.088   0.099
       0.8    0.169   0.166
100    0.4    0.073   0.073
       0.8    0.113   0.107
250    0.4    0.052   0.062
       0.8    0.072   0.073

Table 3.3: Power of the tests (κ = 3, d = [1, 1]′)

        k∗ = 1           k∗ = 0.8m        k∗ = 1.5m
b       CUSUM   FL       CUSUM   FL       CUSUM   FL

(a) m = 50, ρ = 0.4
0.0     0.088   0.099    0.088   0.099    0.088   0.099
0.5     0.609   0.287    0.373   0.814    0.259   0.870
1.0     0.987   0.815    0.845   0.999    0.642   1.000
1.5     1.000   0.986    0.990   1.000    0.911   1.000
2.0     1.000   0.999    1.000   1.000    0.989   1.000

(b) m = 50, ρ = 0.8
0.0     0.169   0.166    0.169   0.166    0.169   0.166
0.5     0.759   0.497    0.563   0.881    0.438   0.918
1.0     0.986   0.876    0.917   0.996    0.797   0.998
1.5     0.998   0.976    0.992   0.999    0.953   1.000
2.0     0.999   0.995    0.998   1.000    0.991   1.000

(c) m = 100, ρ = 0.4
0.0     0.073   0.073    0.073   0.073    0.073   0.073
0.5     0.867   0.494    0.566   0.988    0.372   0.999
1.0     1.000   0.993    0.984   1.000    0.878   1.000
1.5     1.000   1.000    1.000   1.000    0.997   1.000
2.0     1.000   1.000    1.000   1.000    1.000   1.000

(d) m = 100, ρ = 0.8
0.0     0.113   0.107    0.113   0.107    0.113   0.107
0.5     0.896   0.602    0.661   0.978    0.480   0.992
1.0     1.000   0.980    0.986   1.000    0.910   1.000
1.5     1.000   0.999    1.000   1.000    0.996   1.000
2.0     1.000   1.000    1.000   1.000    1.000   1.000

(e) m = 250, ρ = 0.4
0.0     0.052   0.062    0.052   0.062    0.052   0.062
0.5     0.999   0.955    0.907   1.000    0.677   1.000
1.0     1.000   1.000    1.000   1.000    1.000   1.000
1.5     1.000   1.000    1.000   1.000    1.000   1.000
2.0     1.000   1.000    1.000   1.000    1.000   1.000

(f) m = 250, ρ = 0.8
0.0     0.072   0.073    0.072   0.073    0.072   0.073
0.5     0.998   0.930    0.921   1.000    0.720   1.000
1.0     1.000   1.000    1.000   1.000    0.998   1.000
1.5     1.000   1.000    1.000   1.000    1.000   1.000
2.0     1.000   1.000    1.000   1.000    1.000   1.000

Table 3.4: Power of the tests (κ = 3, b = 0.5, d = [1, 1]′)

        (a) m = 50, ρ = 0.4    (b) m = 100, ρ = 0.4    (c) m = 250, ρ = 0.4
k∗      CUSUM   FL             CUSUM   FL              CUSUM   FL
0.1m    0.548   0.321          0.802   0.482           0.995   0.909
0.2m    0.501   0.485          0.748   0.729           0.987   0.990
0.3m    0.471   0.595          0.704   0.870           0.976   0.999
0.4m    0.447   0.677          0.666   0.931           0.965   1.000
0.5m    0.426   0.720          0.637   0.961           0.951   1.000
0.6m    0.407   0.762          0.613   0.977           0.936   1.000
0.7m    0.390   0.791          0.590   0.982           0.921   1.000
0.8m    0.373   0.814          0.566   0.988           0.907   1.000
0.9m    0.357   0.830          0.539   0.992           0.887   1.000
1.0m    0.339   0.845          0.513   0.994           0.864   1.000
1.1m    0.322   0.850          0.486   0.996           0.832   1.000
1.2m    0.306   0.861          0.457   0.997           0.799   1.000
1.3m    0.289   0.864          0.429   0.998           0.766   1.000
1.4m    0.274   0.871          0.401   0.998           0.727   1.000
1.5m    0.259   0.870          0.372   0.999           0.677   1.000

        (d) m = 50, ρ = 0.8    (e) m = 100, ρ = 0.8    (f) m = 250, ρ = 0.8
k∗      CUSUM   FL             CUSUM   FL              CUSUM   FL
0.1m    0.714   0.493          0.851   0.557           0.994   0.877
0.2m    0.672   0.652          0.807   0.778           0.985   0.980
0.3m    0.645   0.755          0.773   0.878           0.977   0.998
0.4m    0.622   0.805          0.747   0.933           0.967   1.000
0.5m    0.608   0.834          0.723   0.954           0.956   1.000
0.6m    0.593   0.851          0.700   0.966           0.947   1.000
0.7m    0.576   0.872          0.679   0.974           0.933   1.000
0.8m    0.563   0.881          0.661   0.978           0.921   1.000
0.9m    0.550   0.894          0.645   0.983           0.907   1.000
1.0m    0.534   0.902          0.624   0.987           0.885   1.000
1.1m    0.515   0.906          0.597   0.989           0.863   1.000
1.2m    0.500   0.910          0.570   0.990           0.829   1.000
1.3m    0.486   0.914          0.542   0.991           0.800   1.000
1.4m    0.460   0.916          0.512   0.991           0.759   1.000
1.5m    0.438   0.918          0.480   0.992           0.720   1.000

Table 3.5 [table body not recoverable]

Table 3.6: Results of the monitoring tests

(a) Tests for a unit root

               Training period    t-statistic (ADF)    Exp-Wald
Denmark        1995Q2-2001Q3      −3.290479^c          1.975
Japan          1990Q2-2007Q4      −3.311520^c          1.070
New Zealand    1993Q1-2007Q4      −3.516139^b          1.755

(b) Performance of the monitoring procedures

               Training period    Estimated break date    CUSUM     FL
Denmark        1995Q2-2001Q3      2001Q4                  2002Q2    2007Q4
Japan          1990Q2-1998Q4      2008Q3                  2010Q3    2016Q2
               1990Q2-2006Q2                              2010Q4    2009Q4
               1990Q2-2007Q4                              2009Q3    **
New Zealand    1993Q1-2001Q3      2008Q1                  **        2010Q3
               1993Q1-2006Q3                              2010Q1    2010Q2
               1993Q1-2007Q4                              2009Q2    **

Notes: (1) a, b, c denote statistical significance at the 1%, 5%, and 10% levels. (2) ** means that the test cannot reject the null hypothesis of no change.

Figure 3.1: Logarithm of quarterly real GDP

Chapter 4

A New Test for Common Breaks in Heterogeneous Panel Data Models

In this chapter, we develop a new test to detect whether the break points are common in

heterogeneous panel data models where the time series dimension T could be large relative to

cross section dimension N. The error process is assumed to be cross-sectionally independent.

The test is based on the cumulative sum of the ordinary least squares residuals. We derive the

asymptotic distribution of the detecting statistic under the null hypothesis, while proving the

consistency of the test under the alternative. Monte Carlo simulations show good performance

of the test in terms of both size and power.

4.1 Introduction

In recent years, panel data models have become increasingly popular in theoretical and empirical analyses, since richer information from both the cross-section and time series dimensions leads to more powerful inferences than with a single cross-section or a single time series. In particular, the modeling and inference of structural changes in panel frameworks have attracted great attention in the literature. Compared with applying a single-series detection method separately to each series, pooling cross-sectional information improves break detection power. The detecting procedures in panels are often designed to test the null hypothesis that the regression parameters in each series are constant over time against the alternative that at least one series exhibits structural changes. See, for example, Horvath and Huskova (2012) in a mean-shift panel model; De Wachter and Tzavalis (2012), Hidalgo and Schafgans (2017) in dynamic panels; Pauwels et al. (2012) in panel data models allowing for heterogeneous coefficients; Chen and Huang (2018) in time-varying panel data models; Antoch et al. (2019) in panels with fixed T and large N, to name a few. However, the rejection of the null hypothesis leaves the researcher with no information as to which cross-section unit exhibits structural changes. Furthermore, it naturally leads to the issue of change point estimation in panel data models.

The classical change point estimation methodologies in the panel literature often assume that the break occurs at the same location in every series, referred to as the common break point. This assumption is particularly attractive because the common break phenomenon does occur in many practical applications. The other major advantage of this assumption, as Bai (2010) pointed out, is the increased accuracy of the change point estimate. It is well known that only the break fraction (i.e., the break date divided by sample size) can be consistently estimated in a single time series. In panel frameworks, however, this failure of consistency of the break point estimate has been overcome under the common break assumption. The enhanced precision of the common break point estimate has been widely confirmed under various frameworks in panel data analyses. Kim (2011, 2014) focused on panel deterministic time trend models and considered a factor structure in the error component. Although the former study showed that the ordinary least squares break date estimator fails to achieve consistency when the factor structure is imposed, the latter overcame this problem and developed a new estimation strategy, in which the common break date is estimated jointly with the common factor so as to retain the precision advantage of the common break point estimate in panels. In addition, Qian and Su (2016) dealt with a panel data model where the parameters of interest are homogeneous and the errors are assumed to be cross-sectionally independent, while Baltagi et al. (2016) considered a more general panel framework allowing for heterogeneous parameters across individuals and a multifactor error structure. Related works, including Li et al. (2016), Baltagi et al. (2017), Horvath et al. (2017), and Westerlund (2019), among others, have documented that the break date estimate gains precision when the common break assumption is imposed in panels.

In practice, however, the common break assumption is restrictive, and some evidence indicates that the break points are likely to vary significantly across individuals (see Claeys and Vasıcek 2014; Adesanya 2020). To the best of our knowledge, no existing paper examines the validity of the common break assumption in panels. We contribute to the literature in three ways. First, we fill this gap by introducing a test for the null hypothesis that the panels exhibit a common break against the alternative that break dates can vary across units. The most closely related work is Oka and Perron (2018), who considered common break detection in a maximum likelihood framework for multiple equation systems. We extend their model to a more general framework where both the number of series N and the number of observations T are sufficiently large, which makes the test applicable to panel or macroeconomic data.

The second major contribution is that we investigate the statistical properties of the estimated common break point when the common break assumption fails. It is verified that the common break estimate cannot be consistent for each series, but will be confined to a specific region. Based on this property, our test has a non-degenerate distribution under the null hypothesis and achieves consistency under the alternative.

Third, our test delivers monotonic power as the magnitude of breaks rises. The statistic is constructed from the squares of the cumulative sum of the residuals, and we use a normalization factor in place of the long-run variance estimator to avoid power loss when the shift increases under the alternative (the so-called nonmonotonic power problem). Monte Carlo simulations show good size performance for large T. Moreover, the test can successfully reject the null hypothesis of a common break against various types of alternatives and has nontrivial power for large breaks.

From a different perspective, the recent clustering literature has suggested an estimation methodology as an alternative strategy to identify distinct breaks across units in panels. The panel data is modeled using a grouped pattern, in which the regression coefficients and break dates are heterogeneous across groups but homogeneous within a group. In this framework, Okui and Wang (2020) and Lumsdaine et al. (2020) proposed iterative estimation approaches to jointly estimate the break point, group membership structure, and coefficients. Consistency of all coefficient estimates can be achieved simultaneously given prior information on the number of groups and an appropriate choice of the initial values for iteration. Researchers can choose to conduct a testing procedure, to apply an estimation methodology, or to use a hybrid of the two approaches, depending on their empirical purpose.

The remainder of this chapter is organized as follows. Section 4.2 introduces the model and the necessary assumptions. Section 4.3 explains the testing strategy for the common break assumption. Section 4.4 establishes the asymptotic distribution of the statistic under the null and the consistency of the test under the alternative. Monte Carlo simulations are conducted in Section 4.5. Concluding remarks are given in Section 4.6. The mathematical proofs are relegated to Appendix D.

4.2 Model and Assumptions

We consider a panel data model allowing for heterogeneous coefficients across units, defined by

$$y_{it} = x_{it}'\beta_i + x_{it}'\delta_i 1_{\{t > k_i^0\}} + u_{it}, \quad 1 \le i \le N \text{ and } 1 \le t \le T, \tag{4.1}$$

where $x_{it} = [x_{it}(1), \cdots, x_{it}(p)]'$ is a p-dimensional vector of explanatory variables including a constant term, so that its first element is unity for all t. The coefficients $\beta_i = [\beta_{i1}, \cdots, \beta_{ip}]'$ and $\delta_i = [\delta_{i1}, \cdots, \delta_{ip}]'$ are $p \times 1$ vectors of fixed parameters, and $1_{\{t > k_i^0\}}$ is an indicator function, taking the value one if $t > k_i^0$ and zero otherwise. $u_{it}$ is an unobservable stochastic disturbance. We assume that the regression parameters in the ith panel change from $\beta_i$ to $\beta_i + \delta_i$ at an unknown time $k_i^0$, and we are interested in testing whether the break point is common to all series against the alternative that the break point varies across individuals. The null hypothesis is defined as

$$H_0 : k_i^0 = k^0, \quad \text{for all } i = 1, 2, \cdots, N.$$

Under the alternative of distinct breaks across individuals, we suppose that there exist G groups and that the regression coefficients share a common break point within each group $g = 1, 2, \cdots, G$. Then, the alternative hypothesis is defined by

$$H_A : k_{g_1}^0 \neq k_{g_2}^0, \quad \text{for some } g_1, g_2 \in \{1, 2, \cdots, G\}.$$

In this chapter, we impose the following assumptions.

Assumption 4.1 $k_i^0 = [T\tau_i^0]$, where $\tau_i^0 \in (0, 1)$ and $[\cdot]$ is the greatest integer function.

The break point $k_i^0$ is assumed to be bounded away from the end points, that is, it lies at a positive fraction of the total sample size. This is a conventional assumption in the change point literature; see Bai (1997).

Assumption 4.2 Define $\phi_N = \sum_{i=1}^{N} \delta_i^{0\prime}\delta_i^0$. Suppose that

(i) $\phi_N \to \infty$ as $N \to \infty$,

(ii) $\phi_N / N$ is bounded as $N \to \infty$,

(iii) ϕNTN → ∞, ϕN

√TN → ∞, and ϕN

N → ∞ as (T,N) → ∞,

(iv) $N/T \to 0$ as $(T,N) \to \infty$.

Here $\delta_i^0$ denotes the true shift for individual i. Assumptions 4.2(i)-(iii) are borrowed from Assumption A2 in Baltagi et al. (2016). The additional Assumption 4.2(iv) requires that T grows at a faster rate than N. This condition is essential to ensure a non-degenerate distribution of the statistic under the null hypothesis and consistency of the test under the alternative.

Assumption 4.3 (i) $u_{it}$ is independent of $x_{it}$ for all i and t;

(ii) $u_{it} = \sum_{j=0}^{\infty} a_{ij}\epsilon_{i,t-j}$, where $\epsilon_{it} \sim (0, \sigma_{i\epsilon}^2)$ are i.i.d. over all i and t, and $\sum_{j=0}^{\infty} j|a_{ij}| \le M$ for all i.

The idiosyncratic errors form a stationary time series, and $u_{it}$ is assumed to be cross-sectionally independent, similarly to the assumption in Bai (2010). In practice, this assumption is relatively restrictive, as cross-sectional dependence is common in many panel datasets. As explained in Sections 4.3 and 4.4, the non-degenerate distribution of our statistic under the null hypothesis of a common break depends crucially on the consistency of the common change point estimate. However, Kim (2011) has indicated that imposing a factor structure on the error component may impede this consistency property. Additional techniques are needed if we relax Assumption 4.3 to allow for cross-sectional dependence.

Assumption 4.4 (i) For $i = 1, \cdots, N$, the matrices $(1/j)\sum_{t=1}^{j} x_{it}x_{it}'$, $(1/j)\sum_{t=T-j+1}^{T} x_{it}x_{it}'$, $(1/j)\sum_{t=k_i^0-j+1}^{k_i^0} x_{it}x_{it}'$, and $(1/j)\sum_{t=k_i^0+1}^{k_i^0+j} x_{it}x_{it}'$ are stochastically bounded and have minimum eigenvalues uniformly bounded away from zero in probability for all large j.

(ii) For each i, $(1/T)\sum_{t=1}^{T} x_{it}x_{it}'$ converges in probability to a nonrandom and positive definite $p \times p$ matrix $C_i$ as $T \to \infty$.

(iii) For each i, $(1/T)\sum_{t=1}^{T} x_{it}$ converges in probability to a $p \times 1$ vector $c_{i1}$ as $T \to \infty$.

Denote the jth row of $C_i$ by $c_{ij}'$ for $j = 1, \cdots, p$. That is, $C_i = [c_{i1}, \cdots, c_{ip}]'$. Note that the vector $c_{i1}'$ is the first row of $C_i$, because the first element of $x_{it}$ is unity.

Assumption 4.5 (i) For any positive finite integer s, the matrices $(1/N)\sum_{i=1}^{N}\sum_{t=k_i^0-s+1}^{k_i^0} x_{it}x_{it}'$ and $(1/N)\sum_{i=1}^{N}\sum_{t=k_i^0+1}^{k_i^0+s} x_{it}x_{it}'$ are stochastically bounded and have minimum eigenvalues uniformly bounded away from zero in probability for all large N.

(ii) For each t, $(1/N)\sum_{i=1}^{N} x_{it}x_{it}'$ is stochastically bounded as $N \to \infty$.

Assumption 4.4 is conventional in time series models, see, e.g., Bai (1997), while Assumption 4.5 is its extension to the cross-sectional dimension, borrowed from Assumption A5 in Baltagi et al. (2016).

4.3 Test Statistic

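The null hypothesis assumes that the panels exhibit one break occurring at an unknown common location. We first use the least squares method proposed by Baltagi et al. (2016) to estimate the common break point. Let

$$Y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}, \quad X_i = \begin{bmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT}' \end{bmatrix}, \quad Z_i(k_i) = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ x_{i(k_i+1)}' \\ \vdots \\ x_{iT}' \end{bmatrix}, \quad \text{and} \quad u_i = \begin{bmatrix} u_{i1} \\ u_{i2} \\ \vdots \\ u_{iT} \end{bmatrix}.$$

The model with an unknown break point $k_i$ can be rewritten in matrix form as

$$\begin{aligned} Y_i &= X_i\beta_i + Z_i(k_i)\delta_i + u_i \\ &= [X_i, Z_i(k_i)]\begin{bmatrix} \beta_i \\ \delta_i \end{bmatrix} + u_i \\ &= X_i(k_i)b_i + u_i, \end{aligned} \tag{4.2}$$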

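where $X_i(k_i) = [X_i, Z_i(k_i)]$ and $b_i = [\beta_i', \delta_i']'$. Given any $k^* = 1, 2, \cdots, T-1$, one can estimate

$$\hat b_i(k^*) = \begin{bmatrix} \hat\beta_i(k^*) \\ \hat\delta_i(k^*) \end{bmatrix} = [X_i(k^*)'X_i(k^*)]^{-1}X_i(k^*)'Y_i, \quad i = 1, \cdots, N.$$

The sum of squared residuals for the ith equation is given by

$$SSR_i(k^*) = [Y_i - X_i(k^*)\hat b_i(k^*)]'[Y_i - X_i(k^*)\hat b_i(k^*)], \quad i = 1, \cdots, N.$$

The least squares estimator of the common break point is defined as

$$\hat k = \arg\min_{1 \le k^* \le T-1} \sum_{i=1}^{N} \pi_i SSR_i(k^*), \tag{4.3}$$

where the weights satisfy $\pi_i \in (0, 1)$, $i = 1, \cdots, N$, and $\sum_{i=1}^{N}\pi_i = 1$.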
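The grid search in (4.3) can be illustrated with a short Python sketch. It is not the author's implementation (the simulations in this chapter were run in GAUSS); the function names `ssr_two_regimes` and `estimate_common_break`, the equal default weights, and the restriction of the search to dates that leave at least p observations in each regime are assumptions made for the illustration. Regressing $Y_i$ on $[X_i, Z_i(k)]$ gives the same residuals as fitting OLS separately before and after k, which is what the sketch does.

```python
import numpy as np

def ssr_two_regimes(y, x, k):
    """SSR of one unit when all coefficients shift after period k (regimes t<=k, t>k)."""
    total = 0.0
    for ys, xs in ((y[:k], x[:k]), (y[k:], x[k:])):
        coef, *_ = np.linalg.lstsq(xs, ys, rcond=None)
        resid = ys - xs @ coef
        total += resid @ resid
    return total

def estimate_common_break(Y, X, weights=None):
    """Y has shape (N, T), X has shape (N, T, p); returns the minimizer of the pooled SSR."""
    N, T = Y.shape
    p = X.shape[2]
    # Equal weights are an assumption; any weights in (0, 1) summing to one may be passed.
    w = np.full(N, 1.0 / N) if weights is None else np.asarray(weights, dtype=float)
    # Assumption: keep at least p observations per regime so both fits are identified.
    candidates = range(p, T - p + 1)
    pooled = [sum(w[i] * ssr_two_regimes(Y[i], X[i], k) for i in range(N))
              for k in candidates]
    return candidates[int(np.argmin(pooled))]
```

With Y and X stacked as arrays, `estimate_common_break(Y, X)` returns the estimated date in the chapter's convention, that is, the last period of the first regime.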

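Our statistic is composed of the ordinary least squares residuals based on the estimated common break point $\hat k$. We split the panels into two regimes at $\hat k$ along the time series dimension. Then, the OLS residuals are calculated by

$$\begin{bmatrix} \hat u_{i1} \\ \hat u_{i2} \\ \vdots \\ \hat u_{iT} \end{bmatrix} = Y_i - X_i(\hat k)\hat b_i(\hat k), \tag{4.4}$$

and the square of the partial sum of the OLS residuals $\hat u_{it}$ is defined by

$$US_{NT}(k, \hat k) = \left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k}\hat u_{it}\right)^2, \quad \text{where } k = [T\tau] \text{ with } \tau \in (0, 1). \tag{4.5}$$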

The statistic is of CUSUM type and is motivated by the consistency of the break point estimate when the common break assumption holds. Under the null hypothesis that all individuals share a common change point $k^0 = [T\tau^0]$ with $\tau^0 \in (0, 1)$, Baltagi et al. (2016) verified that the common break date is consistently estimated. Because $\hat k \xrightarrow{p} k^0$, the regression parameters within each of the regimes $\{x_{i1}, \cdots, x_{i\hat k}\}$ and $\{x_{i(\hat k+1)}, \cdots, x_{iT}\}$ are asymptotically constant over time. Consequently, the cumulative sums of the corresponding residuals do not diverge and have a non-degenerate limiting distribution, which is derived as follows:

$$US_{NT}(k, \hat k) \Rightarrow \begin{cases} \sigma^2\left[W(\tau) - \dfrac{\tau}{\tau^0}W(\tau^0)\right]^2 & \text{if } \tau \le \tau^0, \\[2ex] \sigma^2\left[W(\tau) - W(\tau^0) - \dfrac{\tau - \tau^0}{1 - \tau^0}\left(W(1) - W(\tau^0)\right)\right]^2 & \text{if } \tau > \tau^0, \end{cases}$$

where $W(\cdot)$ is a one-dimensional Brownian motion and $\sigma^2$ is the long-run variance. Under the alternative of distinct breaks, since the estimated common break point cannot coincide with the true break point for every series, the partial sums of the residuals deviate greatly from their behavior under the null. Hence, $US_{NT}(k, \hat k)$ diverges to infinity as $N, T \to \infty$, so that we can successfully reject the null hypothesis.

A traditional approach is to replace the unknown $\sigma^2$ with a consistent estimate, for which a kernel estimator is commonly used. Typically, the selection of the bandwidth for the kernel estimator greatly affects the size and power performance of the test. In time series analysis, it has been extensively documented that structural change tests suffer from the so-called non-monotonic power problem, that is, the tests may lose power as the magnitude of the break rises. See Vogelsang (1999), Deng and Perron (2008), Yamazaki and Kurozumi (2015), and Jiang and Kurozumi (2019), among others. The main reason is that the long-run variance estimator constructed under the null is consistent under the null but may be severely biased under the alternative.
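The bias described above can be seen in a small numerical sketch. The Python code below is only an illustration under stated assumptions: the function name `bartlett_lrv`, the fixed bandwidth, and the single-series mean-shift design are choices made here and are not taken from the cited papers. The series is demeaned as if no break were present, so an unmodeled shift of size d enters every sample autocovariance and inflates the estimate.

```python
import numpy as np

def bartlett_lrv(e, bandwidth):
    """Bartlett-kernel long-run variance estimate of a series, demeaned under the null."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    T = e.size
    lrv = e @ e / T                        # sample variance (lag 0)
    for j in range(1, bandwidth + 1):
        gamma_j = e[j:] @ e[:-j] / T       # sample autocovariance at lag j
        lrv += 2.0 * (1.0 - j / (bandwidth + 1.0)) * gamma_j
    return lrv

rng = np.random.default_rng(0)
T = 200
noise = rng.normal(0.0, 1.0, T)            # true long-run variance is 1
step = (np.arange(T) >= T // 2).astype(float)
# The estimate grows with the size of the unmodeled mean shift d.
estimates = {d: bartlett_lrv(noise + d * step, bandwidth=4) for d in (0.0, 2.0, 4.0)}
```

Because the estimate in the denominator of a conventional statistic grows with the break, the statistic can shrink as the break becomes larger, which is the non-monotonic power problem.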

To maintain nontrivial detection power for large breaks, we extend the self-normalization method proposed by Shao and Zhang (2010) and construct a normalization factor instead of using a long-run variance estimate. This normalization factor $V_{NT}(k_1, \hat k, k_2)$ is required to be proportional to $\sigma^2$, so that the long-run variance cancels out as

$$\frac{US_{NT}(k, \hat k)}{V_{NT}(k_1, \hat k, k_2)} \Rightarrow \frac{\sigma^2 \times \text{functional of Brownian motions}}{\sigma^2 \times \text{functional of Brownian motions}},$$

where the long-run variance is $\sigma^2 = \lim_{(N,T)\to\infty} E\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}u_{it}\right)^2$. Furthermore, to avoid loss of power, the normalization process must not grow at a faster rate than the process $US_{NT}(k, \hat k)$ under the alternative. To this end, we separate the panels into four regimes by the flexible points $k_1$, $k_2$ and the estimated break point $\hat k$, where $k_1$ and $k_2$ take values satisfying $1 \le k_1 < \hat k < k_2 \le T-1$. We estimate the model on the basis of the four regimes $\{x_{i1}, \cdots, x_{ik_1}\}$, $\{x_{i(k_1+1)}, \cdots, x_{i\hat k}\}$, $\{x_{i(\hat k+1)}, \cdots, x_{ik_2}\}$ and $\{x_{i(k_2+1)}, \cdots, x_{iT}\}$ for the ith equation. Denote $T \times p$ matrices by

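$$X_{ji}(a, b) = [0, \cdots, 0, x_{i,a+1}, \cdots, x_{i,b}, 0, \cdots, 0]', \quad j = 1, 2, \tag{4.6}$$

$$X_{3i}(a) = [0, \cdots, 0, x_{i,a+1}, \cdots, x_{i,T}]', \tag{4.7}$$

where the (a+1)th to bth rows of $X_{ji}(a, b)$ are the same as those of $X_i$ and the other rows are zero, and the (a+1)th to Tth rows of $X_{3i}(a)$ are the same as those of $X_i$ and the other rows are zero. Then, the model can be represented by

$$\begin{aligned} Y_i &= [X_i, X_{1i}(k_1, \hat k), X_{2i}(\hat k, k_2), X_{3i}(k_2)]\begin{bmatrix} \beta_i \\ \delta_{1i} \\ \delta_{2i} \\ \delta_{3i} \end{bmatrix} + u_i \\ &= X_i\beta_i + X_{1i}(k_1, \hat k)\delta_{1i} + X_{2i}(\hat k, k_2)\delta_{2i} + X_{3i}(k_2)\delta_{3i} + u_i \\ &= X_i(k_1, \hat k, k_2)b_i + u_i, \end{aligned} \tag{4.8}$$

where $X_i(k_1, \hat k, k_2) = [X_i, X_{1i}(k_1, \hat k), X_{2i}(\hat k, k_2), X_{3i}(k_2)]$. Using the coefficient estimators $\tilde\beta_i$, $\tilde\delta_{1i}$, $\tilde\delta_{2i}$, and $\tilde\delta_{3i}$, the corresponding residuals are calculated as

$$\begin{bmatrix} \tilde u_{i1} \\ \tilde u_{i2} \\ \vdots \\ \tilde u_{iT} \end{bmatrix} = Y_i - X_i\tilde\beta_i - X_{1i}(k_1, \hat k)\tilde\delta_{1i} - X_{2i}(\hat k, k_2)\tilde\delta_{2i} - X_{3i}(k_2)\tilde\delta_{3i}. \tag{4.9}$$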

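Then, we define the process $V_{NT}(k_1, \hat k, k_2)$ based on the residuals $\tilde u_{it}$ as

$$\begin{aligned} V_{NT}(k_1, \hat k, k_2) = \frac{1}{T}\Bigg[ &\sum_{s=1}^{k_1}\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{s}\tilde u_{it}\right)^2 + \sum_{s=k_1+1}^{\hat k}\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{\hat k}\tilde u_{it}\right)^2 \\ &+ \sum_{s=\hat k+1}^{k_2}\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{s}\tilde u_{it}\right)^2 + \sum_{s=k_2+1}^{T}\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}\tilde u_{it}\right)^2 \Bigg]. \end{aligned} \tag{4.10}$$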

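Thus, our test statistic is composed of the squared CUSUM of residuals (4.5) and the normalization factor (4.10), and is defined by

$$S_{NT}(k, k_1, k_2) = \sup_{(k,k_1,k_2)\in\Omega(\epsilon)} \frac{US_{NT}(k, \hat k)}{V_{NT}(k_1, \hat k, k_2)}$$

$$= \sup_{(k,k_1,k_2)\in\Omega(\epsilon)} \frac{\left(\dfrac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k}\hat u_{it}\right)^2}{\begin{aligned} \frac{1}{T}\Bigg[ &\sum_{s=1}^{k_1}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{s}\tilde u_{it}\Big)^2 + \sum_{s=k_1+1}^{\hat k}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{\hat k}\tilde u_{it}\Big)^2 \\ &+ \sum_{s=\hat k+1}^{k_2}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{s}\tilde u_{it}\Big)^2 + \sum_{s=k_2+1}^{T}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}\tilde u_{it}\Big)^2 \Bigg] \end{aligned}},$$

where $\Omega(\epsilon) = \{(k, k_1, k_2) \text{ or } (\tau, \tau_1, \tau_2) : [T\epsilon] \le k \le [T(1-\epsilon)],\ [T\epsilon] \le k_1 \le \hat k - [T\epsilon],\ \hat k + [T\epsilon] \le k_2 \le [T(1-\epsilon)]\}$, with $k = [T\tau]$, $k_1 = [T\tau_1]$ and $k_2 = [T\tau_2]$ for $\tau, \tau_1, \tau_2 \in (0, 1)$.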
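A minimal Python sketch of how the statistic might be computed is given below. It rests on several assumptions that are not confirmed by the text: the function names `segment_residuals` and `sn_statistic` are hypothetical, the normalization factor is scaled by 1/T so that its sums approximate the integrals in the limiting distribution, and the break date `k_hat` is supplied by the user. Two observations keep the code short. First, because every coefficient is allowed to shift in every regime, the residuals of (4.4) and (4.9) coincide with those from separate OLS fits on each segment. Second, the numerator depends only on k and the denominator only on $(k_1, k_2)$, so the supremum of the ratio is the largest numerator divided by the smallest denominator.

```python
import numpy as np

def segment_residuals(Y, X, cuts):
    """OLS residuals with unit-specific coefficients fitted separately on each segment.

    Y: (N, T), X: (N, T, p); cuts = [0, ..., T] lists the segment boundaries."""
    R = np.empty_like(Y, dtype=float)
    for i in range(Y.shape[0]):
        for a, b in zip(cuts[:-1], cuts[1:]):
            coef, *_ = np.linalg.lstsq(X[i, a:b], Y[i, a:b], rcond=None)
            R[i, a:b] = Y[i, a:b] - X[i, a:b] @ coef
    return R

def sn_statistic(Y, X, k_hat, eps=0.1):
    """Self-normalized statistic: sup of squared CUSUM over inf of the normalizer."""
    N, T = Y.shape
    h = int(np.floor(T * eps))
    if k_hat - h < h or k_hat + h > T - h:
        raise ValueError("k_hat is too close to the sample end points for this eps")
    scale = 1.0 / np.sqrt(N * T)
    # Numerator: squared CUSUM of the two-regime residuals, maximized over k.
    u_hat = segment_residuals(Y, X, [0, k_hat, T]).sum(axis=0)
    cusum = scale * np.cumsum(u_hat)               # cusum[k-1] sums periods 1..k
    ks = np.arange(h, T - h + 1)
    numerator = np.max(cusum[ks - 1] ** 2)
    # Denominator: four-regime normalizer, minimized over (k1, k2).
    best = np.inf
    for k1 in range(h, k_hat - h + 1):
        for k2 in range(k_hat + h, T - h + 1):
            u = segment_residuals(Y, X, [0, k1, k_hat, k2, T]).sum(axis=0)
            v = 0.0
            # Forward partial sums in regimes 1 and 3, backward sums in regimes 2 and 4.
            for a, b, forward in ((0, k1, True), (k1, k_hat, False),
                                  (k_hat, k2, True), (k2, T, False)):
                seg = u[a:b] if forward else u[a:b][::-1]
                v += np.sum((scale * np.cumsum(seg)) ** 2)
            best = min(best, v / T)                # 1/T scaling is an assumption
    return numerator / best
```

The double loop refits every segment for every pair $(k_1, k_2)$, which is simple but slow; since the residuals before $\hat k$ depend only on $k_1$ and those after $\hat k$ only on $k_2$, a practical implementation would minimize the two halves separately.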

4.4 Asymptotic Theory

We next derive the limiting properties of the test statistic.

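Theorem 4.1 Suppose that Assumptions 4.1-4.5 hold. Then, under $H_0$, we have, as $N, T \to \infty$,

$$S_{NT}(k, k_1, k_2) \Rightarrow \sup_{(\tau,\tau_1,\tau_2)\in\Omega(\epsilon)} \frac{\left\{W(\tau) - \tau\dfrac{W(\tau^0)}{\tau^0} - (\tau - \tau^0)\left[\dfrac{W(1) - W(\tau^0)}{1 - \tau^0} - \dfrac{W(\tau^0)}{\tau^0}\right]1_{\{\tau > \tau^0\}}\right\}^2}{\begin{aligned} &\int_0^{\tau_1}\Big[W(r) - r\frac{W(\tau_1)}{\tau_1}\Big]^2 dr + \int_{\tau_1}^{\tau^0}\Big[W(\tau^0) - W(r) - (\tau^0 - r)\frac{W(\tau^0) - W(\tau_1)}{\tau^0 - \tau_1}\Big]^2 dr \\ &+ \int_{\tau^0}^{\tau_2}\Big[W(r) - W(\tau^0) - (r - \tau^0)\frac{W(\tau_2) - W(\tau^0)}{\tau_2 - \tau^0}\Big]^2 dr + \int_{\tau_2}^{1}\Big[W(1) - W(r) - (1 - r)\frac{W(1) - W(\tau_2)}{1 - \tau_2}\Big]^2 dr \end{aligned}},$$

where $W(\cdot)$ is a standard Brownian motion, $k = [T\tau]$, $k_1 = [T\tau_1]$, $k^0 = [T\tau^0]$, and $k_2 = [T\tau_2]$ with $\tau, \tau^0, \tau_1, \tau_2 \in (0, 1)$.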

Under the null hypothesis, the proposed test has a non-standard limiting distribution that depends on the true break fraction, which is unknown in practice. We choose $\tau^0 = 0.1, 0.2, \cdots, 0.9$, and approximate the Brownian motions using 2,000 independent normal random variables with 10,000 replications to obtain the critical values in Table 4.1. A researcher can calculate an appropriate critical value from the estimated break fraction $\hat\tau$ by linear interpolation between the tabulated values. For example, if $\hat\tau \in [0.4, 0.5)$, we obtain the critical value by

$$c = c_{0.4} + 10(\hat\tau - 0.4)(c_{0.5} - c_{0.4}).$$
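The two steps just described, simulating the limiting functional and interpolating between tabulated critical values, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used for Table 4.1: the function names are hypothetical, the Brownian motion is approximated by a scaled random walk on an n-step grid, the integrals are replaced by Riemann sums, and the sketch requires $2\epsilon \le \tau^0 \le 1 - 2\epsilon$ so that the admissible ranges of $\tau_1$ and $\tau_2$ are non-empty. It uses the fact that each integrand in the denominator is, up to sign, a Brownian bridge on its own interval, and that the parts depending on $\tau_1$ and on $\tau_2$ can be minimized separately.

```python
import numpy as np

def bridge_integral(W, j1, j2, n):
    """Riemann sum of the squared bridge built from W on the interval [j1/n, j2/n]."""
    seg = W[j1:j2 + 1] - W[j1]
    t = np.arange(j2 - j1 + 1) / (j2 - j1)
    return np.sum((seg - t * seg[-1]) ** 2) / n

def simulate_limit_draw(tau0, eps, n, rng):
    """One draw of the limiting functional, with W approximated on an n-step grid."""
    W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), n))))
    r = np.arange(n + 1) / n
    j0, h = int(round(tau0 * n)), int(round(eps * n))
    if j0 - h < h or j0 + h > n - h:
        raise ValueError("need 2*eps <= tau0 <= 1 - 2*eps in this sketch")
    # Numerator: bridge on [0, tau0] before the break and on [tau0, 1] after it.
    path = np.where(r <= tau0,
                    W - r / tau0 * W[j0],
                    W - W[j0] - (r - tau0) / (1.0 - tau0) * (W[n] - W[j0]))
    numerator = np.max(path[h:n - h + 1] ** 2)
    # Denominator: minimize the tau1-part and the tau2-part separately.
    left = min(bridge_integral(W, 0, j1, n) + bridge_integral(W, j1, j0, n)
               for j1 in range(h, j0 - h + 1))
    right = min(bridge_integral(W, j0, j2, n) + bridge_integral(W, j2, n, n)
                for j2 in range(j0 + h, n - h + 1))
    return numerator / (left + right)

def interpolate_critical_value(tau_hat, grid, critical_values):
    """Linear interpolation of tabulated critical values at the estimated fraction."""
    return float(np.interp(tau_hat, grid, critical_values))
```

A critical value at level α for a given $\tau^0$ would then be the (1 − α) quantile of many such draws, for example `np.quantile(draws, 0.95)`. The interpolation returns the tabulated value itself when $\hat\tau$ falls exactly on a grid point.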

We next investigate the behavior of the proposed test statistic when the breaks vary across individuals. We focus on the case in which there are two groups and individuals in the same group share a common break $k_j^0$, $j = 1, 2$:

$$H_{1A} : |k_1^0 - k_2^0| \ge \Delta T, \quad \text{for some } \Delta > 0.$$

Assumption 4.6 Let $N_j$, $j = 1, 2$, denote the number of units in group j ($N = N_1 + N_2$). Suppose that $N_j/N \to \pi_j > 0$ for $j = 1, 2$.

In order to characterize the limiting properties of the test statistic under the alternative, it is useful to first state some preliminary results about the statistical properties of the estimated common break point. Define $K(C) = \{k : 1 \le k < k_1^0 - C_1 \text{ or } k_2^0 + C_2 < k \le T - 1\}$, where $C_1$ and $C_2$ are finite numbers.

Proposition 4.1 Suppose that Assumptions 4.1-4.6 hold. Then, under $H_{1A}$, for any given $\epsilon > 0$, for both large N and T,

$$P(\hat k \in K(C)) < \epsilon.$$

Proposition 4.1 describes the region in which the common break date estimator can be located when the common break assumption fails. It implies that this estimator is bounded away from both end points. In other words, the estimated common break point either lies between the two true break points or stays within a stochastically bounded distance of one of the true break dates.

Proposition 4.2 Suppose that Assumptions 4.1-4.6 hold. Under $H_{1A}$, for both large N and T,

(i) if $\hat k < k_1^0$,

$$\sup_{k\in\Omega(\epsilon)} US_{NT}(k, \hat k) = O_p(NT),$$

(ii) if $k_1^0 \le \hat k \le k_2^0$,

$$\sup_{k\in\Omega(\epsilon)} US_{NT}(k, \hat k) = O_p(NT),$$

(iii) if $k_2^0 < \hat k$,

$$\sup_{k\in\Omega(\epsilon)} US_{NT}(k, \hat k) = O_p(NT).$$

Proposition 4.2 derives the divergence rate of the process $US_{NT}(k, \hat k)$ under the alternative. In case (i), Proposition 4.1 implies that the common change point estimate stays within a bounded distance of the true break point $k_1^0$, that is, $k_1^0 - \hat k = O_p(1)$. Since we assume that the two true breaks are separated by a positive fraction of the sample size, $\hat k$ is far from the other break date $k_2^0$. Therefore, for individuals in group 2, the regression parameters are estimated on the basis of an inconsistent break fraction estimate. As a result, the CUSUM of the corresponding residuals $\hat u_{it}$ in $US_{NT}(k, \hat k)$ diverges to infinity at the rate NT. For the second and third cases, it is shown that the divergence rate of the process $US_{NT}(k, \hat k)$ is the same as that in case (i).

Proposition 4.3 Suppose that Assumptions 4.1-4.6 hold. Under $H_{1A}$, for any given $\epsilon > 0$, there exists a finite $M > 0$ such that, for both large N and T,

$$P\left(\inf_{(k_1,k_2)\in\Omega(\epsilon)} V_{NT}(k_1, \hat k, k_2) > M \,\Big|\, k_1^0 - C_1 < \hat k \le k_1^0\right) < \epsilon,$$

$$P\left(\inf_{(k_1,k_2)\in\Omega(\epsilon)} V_{NT}(k_1, \hat k, k_2) > M \,\Big|\, k_1^0 < \hat k < k_2^0\right) < \epsilon,$$

$$P\left(\inf_{(k_1,k_2)\in\Omega(\epsilon)} V_{NT}(k_1, \hat k, k_2) > M \,\Big|\, k_2^0 \le \hat k < k_2^0 + C_2\right) < \epsilon.$$

Proposition 4.3 investigates the limiting properties of the normalization process under the alternative. The results state that $\inf_{(k_1,k_2)\in\Omega(\epsilon)} V_{NT}(k_1, \hat k, k_2)$ is $O_p(1)$. Since the model for the normalization factor is estimated on four subsamples, we can always find appropriate $k_1$ and $k_2$ such that the minimized value does not diverge. The numerator of the statistic thus diverges at the rate NT while the denominator remains bounded in probability. We therefore obtain the consistency of the test under the alternative in the following theorem.

Theorem 4.2 Suppose that Assumptions 4.1-4.6 hold. Then, under H1A, we have, as

N,T → ∞,

SNT (k, k1, k2) → ∞.

The consistency of the test is established under the particular alternative $H_{1A}$. Nevertheless, our simulations suggest that the test remains powerful against a variety of alternatives.

4.5 Monte Carlo Simulation

In this section, we investigate the finite sample performance of the test considered in the previous sections. The data-generating process (DGP.1) under the null hypothesis of one common break is given by

$$y_{it} = x_{it}'\beta_i + x_{it}'\delta_i 1_{\{t > k^0\}} + u_{it}, \quad i = 1, \cdots, N, \quad t = 1, \cdots, T,$$

where $x_{it} = [1, z_{it}]'$ includes a constant, and each $z_{it}$ is drawn from the normal distribution N(1, 1) and is independent of the errors $u_{it}$, $1 \le t \le T$, $1 \le i \le N$. We assume that there exists a common break at $k^0 = [0.5T]$. The coefficients are $\beta_i \sim$ i.i.d. U(0, 0.8), and $\delta_i$ is the jump for each series with $\delta_i \sim$ i.i.d. U(0, 0.5). We allow for serial correlation in the errors, $u_{it} = \rho u_{i(t-1)} + e_{it}$ with $e_{it} \sim$ i.i.d. $N(0, (1-\rho)^2)$. The trimming parameter $\epsilon$ is 0.1, the number of replications is 2,000, and all computations are conducted using the GAUSS matrix language.
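For readers who wish to replicate the design outside GAUSS, the following Python sketch generates one panel from DGP.1. It is not the code used for the reported tables. The function name `simulate_dgp1`, the burn-in length for the AR(1) errors, and the choice to draw each element of $\beta_i$ and $\delta_i$ independently from the stated uniform distributions are assumptions made for the illustration.

```python
import numpy as np

def simulate_dgp1(N, T, rho=0.0, tau0=0.5, burn=100, seed=None):
    """Draw one panel with a common break at k0 = [tau0 * T] and AR(1) errors."""
    rng = np.random.default_rng(seed)
    k0 = int(np.floor(tau0 * T))
    z = rng.normal(1.0, 1.0, size=(N, T))
    X = np.stack([np.ones((N, T)), z], axis=2)           # x_it = [1, z_it]'
    # Assumption: every element of beta_i and delta_i is drawn independently.
    beta = rng.uniform(0.0, 0.8, size=(N, 2))
    delta = rng.uniform(0.0, 0.5, size=(N, 2))
    # e_it has standard deviation (1 - rho), i.e. variance (1 - rho)^2.
    e = rng.normal(0.0, 1.0 - rho, size=(N, T + burn))
    u = np.zeros((N, T + burn))
    for t in range(1, T + burn):
        u[:, t] = rho * u[:, t - 1] + e[:, t]
    u = u[:, burn:]                                      # discard the burn-in (assumption)
    after = (np.arange(1, T + 1) > k0).astype(float)     # indicator 1{t > k0}
    Y = (np.einsum("itp,ip->it", X, beta)
         + after * np.einsum("itp,ip->it", X, delta) + u)
    return Y, X, k0
```

With this error variance, the long-run variance of each $u_{it}$ series equals $(1-\rho)^2/(1-\rho)^2 = 1$ for every $\rho$, so the designs with different $\rho$ are comparable in scale.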

Table 4.2 summarizes the empirical sizes of the test for different pairs of (N, T). In the case of i.i.d. errors, the empirical rejection rate is close to the nominal significance level of the test. When the errors are serially correlated with $\rho = 0.4, 0.8$, the size distortion is quite noticeable for small N and T. The size improves as T grows and is quite close to the nominal level at T = 200 for $\rho = 0.4$.

In practice, researchers usually have no prior information on the form of the structural changes. Therefore, we conduct extensive simulations to explore the empirical power of the test for various group patterns of structural change and different magnitudes of the break. We consider three types of alternative hypotheses:

• H1A: There are two groups and the series in each group share common break k0j ,

j = 1, 2. Let Nj denote the number of units in group j and N = N1 +N2.

• H2A: There are three groups and the series in each group share common break k0j ,

j = 1, 2, 3. Note that N = N1 +N2 +N3.

• H3A: Suppose that there is no group pattern. The break point for jth series is given

by k0j , j = 1, 2, · · · , N .

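The data generating process (DGP.2) under $H_{1A}$ is given by

$$y_{it} = x_{it}'\beta_i + x_{it}'\delta_{1i}1_{\{t > k_1^0\}} + u_{it}, \quad t = 1, \cdots, T, \text{ for } i \text{ in group 1},$$

$$y_{it} = x_{it}'\beta_i + x_{it}'\delta_{2i}1_{\{t > k_2^0\}} + u_{it}, \quad t = 1, \cdots, T, \text{ for } i \text{ in group 2},$$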

which is the same as DGP.1 except that the change point varies across groups. The first group exhibits one common break at $k_1^0 = [T/4]$, and we set the time of change $k_2^0$ equal to $[3T/4]$ in the second group. We assume $\beta_i \sim$ i.i.d. U(0, 0.8) and the jumps $\delta_{1i}, \delta_{2i} \sim$ i.i.d. U(0, 0.5). The ratio of units among groups is set to $N_1 : N_2 = 5 : 5$. Table 4.3 shows that, under $H_{1A}$, the test is powerful except when N is 10. Table 4.4 reports the effect of the magnitude of the change on power. As expected, the proposed test delivers monotonic power. We further find that the power of the test is sensitive to the location of the change points. In Table 4.5, we fix one common break at $[0.2T]$ and let the other break vary from $[0.25T]$ to $[0.8T]$. When the two breaks are well separated by a positive fraction of the sample size, the common break cannot be consistently estimated for at least one group. The residuals based on an inconsistent estimator then deviate further from those under the null, which makes the test powerful. When the two break dates are close and cannot be easily distinguished, the small deviation of the residuals results in a loss of power. Table 4.6 shows that the test is powerful when the number of units in each group is sufficiently large, but loses power when the number of units in one of the groups is relatively small. Furthermore, we investigate the power performance when detecting orthogonal structural changes. The magnitude of the change for individual i is $\Delta_i \sim$ i.i.d. U(0, 0.5). We use the vectors $a = [1, 1]'$ and $b = [1, -1]'$ to represent the directions of the breaks. Then, the jumps are specified by $\delta_{1i}, \delta_{2i} = a\Delta_i$ or $b\Delta_i$, which correspond to non-orthogonal changes and orthogonal changes, respectively. We consider three cases: (a) non-orthogonal changes ($\delta_{1i}, \delta_{2i} = a\Delta_i$), (b) (non-)orthogonal changes ($\delta_{1i} = a\Delta_i$, $\delta_{2i} = b\Delta_i$), and (c) orthogonal changes ($\delta_{1i}, \delta_{2i} = b\Delta_i$). The results are presented in Table 4.7. The test is powerful when detecting non-orthogonal structural changes, but loses power in the case of orthogonal structural changes. Developing a test that is robust to orthogonal changes is left for future work.

We next investigate the power properties of the test under $H_{2A}$, and the results for various change point locations and ratios of units among groups are reported in Table 4.8. The test successfully rejects the null of one common break for large N and is less powerful for small N.

We also examine the empirical power under $H_{3A}$, in which the change point varies across individuals without any group pattern. The change point for individual j is set to $k_j^0 = [T\tau_j^0]$, $j = 1, 2, \cdots, N$, where the break fraction $\tau_j^0$ is drawn from U(0.15, 0.75). Table 4.9 shows that the test successfully rejects the null hypothesis of a common break for large N.

In summary, the size of the proposed test is controlled for large N and T. In addition, the test exhibits monotonic power as the magnitude of the break increases and is powerful against various alternatives.

4.6 Conclusion

In this chapter, we developed a new test based on OLS residuals to detect whether the structural breaks across individuals occur at a common location in panel data models. The asymptotic properties of the test were investigated under the null and the alternative. The simulation results indicated that the test can successfully reject the null hypothesis of a common break.

Appendix D

Supposing that the structural change occurred at a common location, Baltagi et al. (2016) showed the consistency of the common break estimator,

$$\lim_{(N,T)\to\infty} P(\hat k = k^0) = 1, \quad \text{which implies } |\hat k - k^0| = o_p(1). \tag{D.1}$$

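In this Appendix, we derive the asymptotic distribution of the test statistic under the null hypothesis by using this consistency property. We first focus on the limiting properties of the numerator of the statistic. Model (4.2) with the true common break $k^0$ is written as

$$\begin{aligned} Y_i &= X_i(k^0)b_i^0 + u_i \\ &= X_i(\hat k)b_i^0 + u_i + [X_i(k^0) - X_i(\hat k)]b_i^0 \\ &= X_i(\hat k)b_i^0 + u_i + [Z_i(k^0) - Z_i(\hat k)]\delta_i^0, \end{aligned} \tag{D.2}$$

where $b_i^0 = [\beta_i^{0\prime}, \delta_i^{0\prime}]'$. Replacing $Y_i$ by (D.2), the residuals in (4.4) can be rewritten as

$$\begin{aligned} \hat u_i &= X_i(\hat k)b_i^0 + u_i + [Z_i(k^0) - Z_i(\hat k)]\delta_i^0 - X_i(\hat k)\hat b_i(\hat k) \\ &= u_i - X_i(\hat k)[\hat b_i(\hat k) - b_i^0] + [Z_i(k^0) - Z_i(\hat k)]\delta_i^0 \\ &= u_i - X_i[\hat\beta_i(\hat k) - \beta_i^0] - Z_i(\hat k)[\hat\delta_i(\hat k) - \delta_i^0] + [Z_i(k^0) - Z_i(\hat k)]\delta_i^0, \end{aligned} \tag{D.3}$$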

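whose vector form is represented by

$$\begin{bmatrix} \hat u_{i1} \\ \hat u_{i2} \\ \vdots \\ \hat u_{iT} \end{bmatrix} = \begin{bmatrix} u_{i1} \\ u_{i2} \\ \vdots \\ u_{iT} \end{bmatrix} - \begin{bmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT}' \end{bmatrix}(\hat\beta_i(\hat k) - \beta_i^0) - \begin{bmatrix} 0 \\ \vdots \\ 0 \\ x_{i(\hat k+1)}' \\ \vdots \\ x_{iT}' \end{bmatrix}(\hat\delta_i(\hat k) - \delta_i^0) + \left(\begin{bmatrix} 0 \\ \vdots \\ 0 \\ x_{i(k^0+1)}' \\ \vdots \\ x_{iT}' \end{bmatrix} - \begin{bmatrix} 0 \\ \vdots \\ 0 \\ x_{i(\hat k+1)}' \\ \vdots \\ x_{iT}' \end{bmatrix}\right)\delta_i^0.$$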

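For the sake of simplicity, $\hat k$ is suppressed in $\hat\beta_i(\hat k)$ and $\hat\delta_i(\hat k)$. Then, the cumulative sum of the residuals is

$$\begin{aligned} \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k}\hat u_{it} = {} & \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k}u_{it} - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k}x_{it}'(\hat\beta_i - \beta_i^0) - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{k}x_{it}'(\hat\delta_i - \delta_i^0)1_{\{k > \hat k\}} \\ & + \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=k^0+1}^{k}x_{it}'\delta_i^0 1_{\{k^0 < k \le \hat k\}} + \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=k^0+1}^{\hat k}x_{it}'\delta_i^0 1_{\{k^0 < \hat k < k\}} \\ & - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{k}x_{it}'\delta_i^0 1_{\{\hat k < k \le k^0\}} - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{k^0}x_{it}'\delta_i^0 1_{\{\hat k < k^0 < k\}} \\ = {} & U_1 - U_2 - U_3 + U_4 + U_5 - U_6 - U_7. \end{aligned} \tag{D.4}$$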

We can show that the terms $U_4$, $U_5$, $U_6$ and $U_7$ are negligible as $N, T \to \infty$. Since k is bounded by $k^0$ and $\hat k$ in $U_4$, using the convergence property (D.1),

$$U_4 = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=k^0+1}^{k}x_{it}'\delta_i^0 1_{\{k^0 < k \le \hat k\}} = \sqrt{\frac{N}{T}}\,o_p(1) = o_p\left(\sqrt{\frac{N}{T}}\right). \tag{D.5}$$

Similarly, it is shown that the terms $U_5$, $U_6$ and $U_7$ are of order $o_p\left(\sqrt{N/T}\right)$, which vanishes since $N/T \to 0$ by Assumption 4.2(iv). The asymptotic distributions of the dominating terms $U_1$, $U_2$, and $U_3$ are derived in Lemma D.1.

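Lemma D.1 Suppose that Assumptions 4.1-4.5 hold. We have, uniformly in $\tau \in (0, 1)$,

(i) $\dfrac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k}u_{it} \Rightarrow \sigma W(\tau)$,

(ii) $\dfrac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k}x_{it}'(\hat\beta_i - \beta_i^0) \Rightarrow \sigma\tau\dfrac{W(\tau^0)}{\tau^0}$,

(iii) $\dfrac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{k}x_{it}'(\hat\delta_i - \delta_i^0)1_{\{k > \hat k\}} \Rightarrow \sigma(\tau - \tau^0)\left[\dfrac{W(1) - W(\tau^0)}{1 - \tau^0} - \dfrac{W(\tau^0)}{\tau^0}\right]1_{\{\tau > \tau^0\}}$,

where $k = [T\tau]$, $k^0 = [T\tau^0]$, $W(\cdot)$ is a standard Brownian motion, and the long-run variance is $\sigma^2 = \lim_{(N,T)\to\infty} E\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}u_{it}\right)^2$.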

Proof of Lemma D.1. (i) Denote the process

$$X_{N,T}(\tau) = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{[T\tau]}u_{it}.$$

It is shown that, for a particular $\tau$,

$$\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{[T\tau]}u_{it} \xrightarrow{d} \sigma W(\tau),$$

as $N, T \to \infty$. It remains to show that the weak convergence holds uniformly in $\tau \in (0, 1)$. To this end, by Theorem 12.3 of Billingsley (1968), we next show that the moment condition (D.8) is satisfied, so that the process $X_{N,T}(\tau)$ is tight. Applying Rosenthal's inequality, we have,

∣∣∣∣XN,T

)−XN,T

)∣∣∣∣2γ = E

∣∣∣∣∣ 1√NT

N∑i=1

l∑t=k+1

∣∣∣∣∣2γ

≤ c1

N∑i=1

∣∣∣∣∣ 1√NT

l∑t=k+1

∣∣∣∣∣2γ

N∑i=1

(1√T

l∑t=k+1

≤ c1

N∑i=1

∣∣∣∣∣ 1√NT

l∑t=k+1

∣∣∣∣∣2γ

(l − k

, (D.6)

with some constants $c_1$, $c_2$, and $c_3$. According to Phillips and Solo (1992) and p. 637 of Horvath and Huskova (2012), the partial sum of $u_{it}$ is composed of two parts,

$$\sum_{t=1}^{k}u_{it} = a_i\sum_{t=1}^{k}\epsilon_{it} + \eta_{ik},$$

where $\eta_{ik} = e_{i0}^* - e_{ik}^*$, $e_{it}^* = \sum_{l=1}^{\infty}c_{il}^*\epsilon_{i(t-l)}$, and $c_{il}^* = \sum_{k=l+1}^{\infty}c_{ik}$. For the term $\eta_{ik}$, Horvath and Huskova (2012, p. 640) showed that $E|\eta_{ik}|^{\gamma} \le cE|\epsilon_{i0}|^{\gamma}$. Then, using Minkowski's inequality and Rosenthal's inequality, it is shown that, for $\gamma > 1$,

∣∣∣∣∣l∑

∣∣∣∣∣2γ

∣∣∣∣∣ail∑

ϵit + ηil − ηik

∣∣∣∣∣2γ

∣∣∣∣∣ail∑

∣∣∣∣∣2γ 1

+(E |ηil − ηik|2γ

) 12γ

l∑t=k+1

E |ϵit|2γ + c5

E(ϵit)2

)γ] 12γ

+(E |ηil − ηik|2γ

) 12γ

c4(l − k)E |ϵi0|2γ + c5(l − k)γ(E(ϵi0)

2)γ] 1

2γ+(E |ηil − ηik|2γ

) 12γ

c6(l − k)γE |ϵi0|2γ] 1

2γ+(E |ϵi0|2γ

) 12γ

≤ c7(l − k)γE |ϵi0|2γ ,

with some constants c4-c7. Then, we have, for γ > 1 ,

N∑i=1

∣∣∣∣∣ 1√NT

l∑t=k+1

∣∣∣∣∣2γ

(NT )γ

N∑i=1

∣∣∣∣∣l∑

∣∣∣∣∣2γ

(NT )γ

N∑i=1

c7(l − k)γE |ϵi0|2γ

≤ c7

(l − k

N∑i=1

E |ϵi0|2γ

≤ c8

(l − k

, (D.7)

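with a constant $c_8$. Combining (D.6) and (D.7), we can show that there exists a constant $\gamma > 1$ such that

$$E\left|X_{N,T}\left(\frac{l}{T}\right) - X_{N,T}\left(\frac{k}{T}\right)\right|^{2\gamma} \le c_8\left(\frac{l-k}{T}\right)^{\gamma}. \tag{D.8}$$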

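(ii) By regressing $Y_i$ on $X_i(\hat k)$, the coefficient $\beta_i$ is estimated as, if $\hat k \le k^0$,

$$\hat\beta_i = \left(\sum_{t=1}^{\hat k}x_{it}x_{it}'\right)^{-1}\sum_{t=1}^{\hat k}x_{it}y_{it} = \left(\sum_{t=1}^{\hat k}x_{it}x_{it}'\right)^{-1}\sum_{t=1}^{\hat k}x_{it}(x_{it}'\beta_i^0 + u_{it}) = \beta_i^0 + \left(\sum_{t=1}^{\hat k}x_{it}x_{it}'\right)^{-1}\sum_{t=1}^{\hat k}x_{it}u_{it},$$

and if $\hat k > k^0$,

$$\begin{aligned} \hat\beta_i &= \left(\sum_{t=1}^{\hat k}x_{it}x_{it}'\right)^{-1}\left[\sum_{t=1}^{k^0}x_{it}(x_{it}'\beta_i^0 + u_{it}) + \sum_{t=k^0+1}^{\hat k}x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it})\right] \\ &= \beta_i^0 + \left(\sum_{t=1}^{\hat k}x_{it}x_{it}'\right)^{-1}\sum_{t=1}^{\hat k}x_{it}u_{it} + \left(\sum_{t=1}^{\hat k}x_{it}x_{it}'\right)^{-1}\sum_{t=k^0+1}^{\hat k}x_{it}x_{it}'\delta_i^0. \end{aligned} \tag{D.9}$$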

Then, we can see that,

√T (βi − β0

k∑t=1

xitx′it

k∑t=1

xituit +

k∑t=1

xitx′it

k∑t=k0+1

xitx′itδ

0i 1k>k0 (D.10)

k0∑t=1

xitx′it

k0∑t=1

xituit +

k∑t=1

xitx′it

k0∑t=1

xitx′it

−1 1√T

k∑t=1

xituit

k0∑t=1

xitx′it

−1 1√T

k∑t=1

xituit −1√T

k0∑t=1

xituit

k∑t=1

xitx′it

k∑t=k0+1

xitx′itδ

0i 1k>k0

k0∑t=1

xitx′it

k0∑t=1

xituit + op

)Op(1) +Op(1)op

(1√T

)1k>k0

k0∑t=1

xitx′it

k0∑t=1

xituit + op

(1√T

), (D.11)

where we replace k by k0 using the consistency property (D.1) and following orders, 1

k∑t=1

xitx′it

k0∑t=1

xitx′it

k∑t=1

xitx′it

−1 1

k0∑t=1

xitx′it −

k∑t=1

xitx′it

k0∑t=1

xitx′it

= Op(1)op

)Op(1) = op

k∑t=1

xituit −1√T

k0∑t=1

xituit = op

(1√T

k∑t=1

xitx′it

k∑t=k0+1

xitx′itδ

0i = Op(1)op

Substituting (D.11) into the term U2 in (D.4), we have,

1√NT

N∑i=1

k∑t=1

x′it(βi − β0i ) =

N∑i=1

k∑t=1

x′it√T (βi − β0

=1√N

N∑i=1

k∑t=1

x′it

k0∑t=1

xitx′it

k0∑t=1

xituit +1√N

N∑i=1

k∑t=1

x′it

(1√T

=1√N

N∑i=1

k∑t=1

x′it

k0∑t=1

xitx′it

k0∑t=1

xituit + op

), (D.12)

where the second and third terms in the last equality vanish since N/T → 0 by Assumption

4.2(iv). From Assumptions 4.4 (ii)-(iii), we can see that,∥∥∥∥∥1kk∑

x′it − c′i1

∥∥∥∥∥ = op(1), and

∥∥∥∥∥∥(1

k∑t=1

xitx′it

− C−1i

∥∥∥∥∥∥ = op(1). (D.13)

Using orders in (D.13) and equality c′i1C−1i = [1, 0, · · · , 0], we have,∣∣∣∣∣∣ 1√

N∑i=1

k∑t=1

x′it

k0∑t=1

xitx′it

k0∑t=1

xituit −k

k01√NT

N∑i=1

k0∑t=1

∣∣∣∣∣∣=

∣∣∣∣∣∣ 1√N

N∑i=1

k∑t=1

x′it

k0∑t=1

xitx′it

k0∑t=1

xituit −k

k01√N

N∑i=1

c′i1C−1i

k0∑t=1

xituit

∣∣∣∣∣∣≤ 1√

N∑i=1

∥∥∥∥∥∥ 1Tk∑

x′it

k0∑t=1

xitx′it

k0c′i1C

∥∥∥∥∥∥∥∥∥∥∥∥ 1√

k0∑t=1

xituit

∥∥∥∥∥∥=

∥∥∥∥∥∥ 1√NT

N∑i=1

k0∑t=1

xituit

∥∥∥∥∥∥ op(1) = op(1). (D.14)

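Applying the functional central limit theorem (FCLT), we can see that

$$\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k^0}u_{it} \Rightarrow \sigma W(\tau^0). \tag{D.15}$$

Hence, we have, uniformly in $\tau$,

$$\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left(\frac{1}{T}\sum_{t=1}^{k}x_{it}'\right)\left(\frac{1}{T}\sum_{t=1}^{k^0}x_{it}x_{it}'\right)^{-1}\frac{1}{\sqrt{T}}\sum_{t=1}^{k^0}x_{it}u_{it} \Rightarrow \sigma\tau\frac{W(\tau^0)}{\tau^0}. \tag{D.16}$$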

(iii) The coefficient δi is estimated as, if k < k0,

T∑t=k+1

xitx′it

−1T∑

xityit −

k∑t=1

xitx′it

−1k∑

xityit

T∑t=k+1

xitx′it

−1 k0∑t=k+1

xit(x′itβ

0i + uit) +

T∑t=k0+1

xit(x′itβ

0i + x′itδ

0i + uit)

β0i +

k∑t=1

xitx′it

−1k∑

xituit

T∑t=k+1

xitx′it

−1T∑

xituit −

k∑t=1

xitx′it

−1k∑

xituit +

T∑t=k+1

xitx′it

−1T∑

t=k0+1

xitx′itδ

T∑t=k+1

xitx′it

−1T∑

xituit −

k∑t=1

xitx′it

−1k∑

xituit + δ0i −

T∑t=k+1

xitx′it

−1k0∑

xitx′itδ

and if k ≥ k0,

T∑t=k+1

xitx′it

−1T∑

xityit −

k∑t=1

xitx′it

−1k∑

xityit

T∑t=k+1

xitx′it

−1T∑

xit(x′itβ

0i + x′itδ

0i + uit)

k∑t=1

xitx′it

−1 k0∑t=1

xit(x′itβ

0i + uit) +

k∑t=k0+1

xit(x′itβ

0i + x′itδ

0i + uit)

= δ0i +

T∑t=k+1

xitx′it

−1T∑

xituit −

k∑t=1

xitx′it

−1k∑

xituit −

k∑t=1

xitx′it

−1k∑

t=k0+1

xitx′itδ

Using the consistency property (D.1), the term∑k0

t=k+1xitx

′itδ

0i = op(1) is negligible and we

can see that,

√T (δi−δ0i ) =

T∑t=k+1

xitx′it

T∑t=k+1

xituit−

k∑t=1

xitx′it

k∑t=1

xituit+op

(1√T

(D.17)

Similarly to the proof of (ii), k in (D.17) can be replaced by k0 due to consistency that

kp−→ k0. Then, (D.17) is transformed into

√T (δi−δ0i ) =

T∑t=k0+1

xitx′it

T∑t=k0+1

xituit−

k0∑t=1

xitx′it

k0∑t=1

xituit+op

(1√T

(D.18)

Thus, we have,

1√NT

N∑i=1

k∑t=k+1

x′it(δi − δ0i ) =1√N

N∑i=1

k∑t=k+1

x′it√T (δi − δ0i )

=1√N

N∑i=1

k∑t=k0+1

x′it1k>k0 + op

)√T (δi − δ0i )

=1√N

N∑i=1

k∑t=k0+1

x′it1k>k0

√T (δi − δ0i ) + op

)Op(1)

=1√N

N∑i=1

k∑t=k0+1

x′it1k>k0

T∑t=k0+1

xitx′it

T∑t=k0+1

xituit

k0∑t=1

xitx′it

k0∑t=1

xituit

). (D.19)

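The terms in brackets in (D.19) dominate the others. Similarly to (D.14)-(D.16), we can see that, uniformly in $\tau \in (0, 1)$,

$$U_3 \Rightarrow \sigma(\tau - \tau^0)\left[\frac{W(1) - W(\tau^0)}{1 - \tau^0} - \frac{W(\tau^0)}{\tau^0}\right]1_{\{\tau > \tau^0\}}.$$

Thus, we complete the proof of Lemma D.1.

Using (D.4), (D.5) and Lemma D.1, we can show that,

$$\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{[T\tau]}\hat u_{it} \Rightarrow \sigma W(\tau) - \sigma\tau\frac{W(\tau^0)}{\tau^0} - \sigma(\tau - \tau^0)\left[\frac{W(1) - W(\tau^0)}{1 - \tau^0} - \frac{W(\tau^0)}{\tau^0}\right]1_{\{\tau > \tau^0\}},$$

uniformly in $\tau$. Applying the continuous mapping theorem, we obtain,

$$\sup_{\tau\in\Omega(\epsilon)}\left|\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{[T\tau]}\hat u_{it}\right|^2 \Rightarrow \sigma^2\sup_{\tau\in\Omega(\epsilon)}\left|W(\tau) - \tau\frac{W(\tau^0)}{\tau^0} - (\tau - \tau^0)\left[\frac{W(1) - W(\tau^0)}{1 - \tau^0} - \frac{W(\tau^0)}{\tau^0}\right]1_{\{\tau > \tau^0\}}\right|^2. \tag{D.20}$$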

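We next derive the asymptotic distribution of the normalization process under the null hypothesis. By definition (4.10), the normalization factor is based on the residuals $\tilde u_{it}$, which are calculated by regressing $Y_i$ on $X_i(k_1, \hat k, k_2)$ in (4.8). We assume that $[T\epsilon] \le k_1 \le \hat k - [T\epsilon]$ and $\hat k + [T\epsilon] \le k_2 \le [T(1-\epsilon)]$, so that $k_1$ and $k_2$ are bounded away from the end points and from the common break estimate $\hat k$. Since $\hat k$ converges in probability to $k^0$, we have $|k_j - \hat k| > |k^0 - \hat k|$ for $j = 1, 2$. Thus, we only consider the case in which $k_1$ and $k_2$ satisfy $k_1 < k^0 < k_2$. In this case, the true model with common break $k^0$ is written as

$$\begin{aligned} Y_i &= [X_i, X_{1i}(k_1, k^0), X_{2i}(k^0, k_2), X_{3i}(k_2)]\begin{bmatrix} \beta_i^0 \\ 0 \\ \delta_i^0 \\ \delta_i^0 \end{bmatrix} + u_i \\ &= X_i(k_1, k^0, k_2)b_{1i}^0 + u_i \\ &= X_i(k_1, \hat k, k_2)b_{1i}^0 + u_i + [X_i(k_1, k^0, k_2) - X_i(k_1, \hat k, k_2)]b_{1i}^0. \end{aligned} \tag{D.21}$$

The residuals are calculated by

$$\begin{aligned} \tilde u_i &= X_i(k_1, \hat k, k_2)b_{1i}^0 + u_i + [X_i(k_1, k^0, k_2) - X_i(k_1, \hat k, k_2)]b_{1i}^0 - X_i(k_1, \hat k, k_2)\tilde b_{1i}(\hat k) \\ &= u_i + X_i(k_1, \hat k, k_2)b_{1i}^0 - X_i(k_1, \hat k, k_2)\tilde b_{1i}(\hat k) \\ &\quad + [0, X_{1i}(k_1, k^0) - X_{1i}(k_1, \hat k), X_{2i}(k^0, k_2) - X_{2i}(\hat k, k_2), 0]\begin{bmatrix} \beta_i^0 \\ 0 \\ \delta_i^0 \\ \delta_i^0 \end{bmatrix} \\ &= u_i - X_i(k_1, \hat k, k_2)(\tilde b_{1i}(\hat k) - b_{1i}^0) + [X_{2i}(k^0, k_2) - X_{2i}(\hat k, k_2)]\delta_i^0, \end{aligned}$$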

whose vector form is

ui1ui2...

x′i1x′i2...

x′iT

(βi(k)− β0i )−

x′i(k1+1)...

x′ik0......0

δ1i(k)−

0......0

x′i(k+1)...

x′ik20...0

(δ2i(k)− δ0i )

0.........0

x′i(k2+1)...

x′iT

(δ3i(k)− δ0i ) +

0......0

x′i(k0+1)...

x′ik20...0

0......0

x′i(k+1)...

x′ik20...0

δ0i . (D.22)

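For simplicity, $\hat k$ is suppressed in $\tilde\beta_i(\hat k)$, $\tilde\delta_{1i}(\hat k)$, $\tilde\delta_{2i}(\hat k)$ and $\tilde\delta_{3i}(\hat k)$. Then, the normalization factor is the sum of four terms $V_1$, $V_2$, $V_3$, and $V_4$, which are defined by

$$V_1 = \frac{1}{T}\sum_{s=1}^{k_1}\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{s}\tilde u_{it}\right)^2, \quad V_2 = \frac{1}{T}\sum_{s=k_1+1}^{\hat k}\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{\hat k}\tilde u_{it}\right)^2,$$

$$V_3 = \frac{1}{T}\sum_{s=\hat k+1}^{k_2}\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{s}\tilde u_{it}\right)^2, \quad V_4 = \frac{1}{T}\sum_{s=k_2+1}^{T}\left(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}\tilde u_{it}\right)^2.$$

Lemma D.2 derives the asymptotic distributions of the four terms under the null hypothesis.

Lemma D.2 Suppose that Assumptions 4.1-4.5 hold. We have, as $N, T \to \infty$,

(i) $V_1 \Rightarrow \sigma^2\displaystyle\int_0^{\tau_1}\left(W(r) - \frac{r}{\tau_1}W(\tau_1)\right)^2 dr$,

(ii) $V_2 \Rightarrow \sigma^2\displaystyle\int_{\tau_1}^{\tau^0}\left[W(\tau^0) - W(r) - \frac{\tau^0 - r}{\tau^0 - \tau_1}\left(W(\tau^0) - W(\tau_1)\right)\right]^2 dr$,

(iii) $V_3 \Rightarrow \sigma^2\displaystyle\int_{\tau^0}^{\tau_2}\left[W(r) - W(\tau^0) - \frac{r - \tau^0}{\tau_2 - \tau^0}\left(W(\tau_2) - W(\tau^0)\right)\right]^2 dr$,

(iv) $V_4 \Rightarrow \sigma^2\displaystyle\int_{\tau_2}^{1}\left[W(1) - W(r) - \frac{1 - r}{1 - \tau_2}\left(W(1) - W(\tau_2)\right)\right]^2 dr$.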

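Proof of Lemma D.2. (i) Using (D.22), the first term $V_1$ can be rewritten as,

$$V_1 = \frac{1}{T}\sum_{s=1}^{k_1}\left[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\left(\sum_{t=1}^{s}u_{it} - \sum_{t=1}^{s}x_{it}'(\tilde\beta_i - \beta_i^0)\right)\right]^2 = \frac{1}{T}\sum_{s=1}^{k_1}(V_{11} - V_{12})^2.$$

Using the FCLT, it is shown that, with $s = [Tr]$,

$$V_{11} \Rightarrow \sigma W(r). \tag{D.23}$$

By the definition of $\tilde\beta_i$, we can see that,

$$\tilde\beta_i = \left(\sum_{t=1}^{k_1}x_{it}x_{it}'\right)^{-1}\sum_{t=1}^{k_1}x_{it}y_{it} = \left(\sum_{t=1}^{k_1}x_{it}x_{it}'\right)^{-1}\sum_{t=1}^{k_1}x_{it}(x_{it}'\beta_i^0 + u_{it}) = \beta_i^0 + \left(\sum_{t=1}^{k_1}x_{it}x_{it}'\right)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it},$$

thus, we have,

$$V_{12} = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left(\frac{1}{T}\sum_{t=1}^{s}x_{it}'\right)\left(\frac{1}{T}\sum_{t=1}^{k_1}x_{it}x_{it}'\right)^{-1}\frac{1}{\sqrt{T}}\sum_{t=1}^{k_1}x_{it}u_{it} \Rightarrow \sigma\frac{r}{\tau_1}W(\tau_1). \tag{D.24}$$

Combining the results (D.23) and (D.24) and using the continuous mapping theorem, we can derive the asymptotic distribution of the first term $V_1$ as follows:

$$V_1 \Rightarrow \sigma^2\int_0^{\tau_1}\left(W(r) - \frac{r}{\tau_1}W(\tau_1)\right)^2 dr.$$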

(ii) The second term V2 can be rewritten as

k∑s=k1+1

1√NT

N∑i=1

k∑t=s

uit −k∑

k∑t=s

x′itδ1i +

k∑t=s

x′itδ0i 1k0<s<k

k∑s=k1+1

1√NT

N∑i=1

k∑t=s

uit −1√NT

N∑i=1

k∑t=s

1√NT

N∑i=1

k∑t=s

x′itδ1i + op

k∑s=k1+1

(V21 − V22 − V23 + op

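Since $\hat k$ coincides asymptotically with the true break date from (D.1), $\hat k$ in $V_{21}$, $V_{22}$, $V_{23}$ can be replaced by $k^0$. Then, we can show that

\[
V_{21} = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{\hat k}u_{it} - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{s-1}u_{it} \Rightarrow \sigma\big(W(\tau^0) - W(r)\big), \tag{D.25}
\]

\[
\begin{aligned}
V_{22} &= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=s}^{\hat k}x_{it}'\sqrt T(\hat\beta_i - \beta_i^0) \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\Big(\frac{1}{T}\sum_{t=s}^{k^0}x_{it}' + o_p(1)\Big)\Big(\frac{1}{T}\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k_1}x_{it}u_{it} \\
&\Rightarrow \sigma(\tau^0 - r)\frac{W(\tau_1)}{\tau_1}.
\end{aligned}
\tag{D.26}
\]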

The coefficient estimator δ1i in V23 can be calculated as,

δ1i =

k∑t=k1+1

xitx′it

−1k∑

t=k1+1

xityit −

(k1∑t=1

xitx′it

)−1 k1∑t=1

xityit

k∑t=k1+1

xitx′it

−1k∑

t=k1+1

xit(x′itβ

0i + uit)1k≤k0 +

k0∑t=k1+1

xit(x′itβ

0i + uit)

k∑t=k0+1

xit(x′itβ

0i + x′itδ

0i + uit)

1k>k0 −

β0i +

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit

k∑t=k1+1

xitx′it

−1k∑

t=k1+1

xituit −

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit

k∑t=k1+1

xitx′it

−1k∑

t=k0+1

xitx′itδ

0i 1k>k0

k∑t=k1+1

xitx′it

−1k∑

t=k1+1

xituit −

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit + op

Then, the third term V23 becomes, as N,T → ∞,

V23 =1√N

N∑i=1

k∑t=s

x′it√T δ1i

=1√N

N∑i=1

k0∑t=s

x′it + op

k0∑t=k1+1

xitx′it

−1k0∑

t=k1+1

xituit

−√T

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit + op

(1√T

=1√N

N∑i=1

k0∑t=s

x′it

k0∑t=k1+1

xitx′it

−1k0∑

t=k1+1

xituit −√T

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit

(√N√T

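\[
\Rightarrow \sigma(\tau^0 - r)\Big(\frac{W(\tau^0) - W(\tau_1)}{\tau^0 - \tau_1} - \frac{W(\tau_1)}{\tau_1}\Big), \tag{D.27}
\]

since $N/T \to 0$. Combining results (D.25), (D.26), and (D.27), we have

\[
\begin{aligned}
V_2 &\Rightarrow \int_{\tau_1}^{\tau^0}\Big[\sigma\big(W(\tau^0) - W(r)\big) - \sigma(\tau^0 - r)\frac{W(\tau_1)}{\tau_1} - \sigma(\tau^0 - r)\Big(\frac{W(\tau^0) - W(\tau_1)}{\tau^0 - \tau_1} - \frac{W(\tau_1)}{\tau_1}\Big)\Big]^2 dr \\
&= \sigma^2\int_{\tau_1}^{\tau^0}\Big(W(\tau^0) - W(r) - (\tau^0 - r)\frac{W(\tau^0) - W(\tau_1)}{\tau^0 - \tau_1}\Big)^2 dr.
\end{aligned}
\]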

(iii) The third term V3 can be rewritten as

k2∑s=k+1

1√NT

N∑i=1

s∑t=k+1

uit −s∑

s∑t=k+1

x′it(δ2i − δ0i )

−k0∑

x′itδ0i 1k<k0<s −

s∑t=k+1

x′itδ0i 1s≤k0

k2∑s=k+1

1√NT

N∑i=1

s∑t=k+1

uit −1√NT

N∑i=1

s∑t=k+1

1√NT

N∑i=1

s∑t=k+1

− op

k2∑s=k+1

(V31 − V32 − V33 − op

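Similar to (D.25) and (D.27), we can find that

\[
V_{31} \Rightarrow \sigma\big(W(r) - W(\tau^0)\big), \tag{D.28}
\]

\[
V_{32} = \frac{1}{\sqrt N}\sum_{i=1}^{N}\Big(\frac{1}{T}\sum_{t=k^0+1}^{s}x_{it}' + o_p(1)\Big)\Big(\frac{1}{T}\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k_1}x_{it}u_{it}
\Rightarrow \sigma(r - \tau^0)\frac{W(\tau_1)}{\tau_1}. \tag{D.29}
\]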

The coefficient δ2i is estimated as

δ2i =

k2∑t=k+1

xitx′it

−1k2∑

xityit −

(k1∑t=1

xitx′it

)−1 k1∑t=1

xityit

k2∑t=k+1

xitx′it

−1k2∑

xit(x′itβ

0i + x′itδ

0i + uit)1k0≤k +

k0∑t=k+1

xit(x′itβ

0i + uit)

k2∑t=k0+1

xit(x′itβ

0i + x′itδ

0i + uit)

β0i +

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit

= δ0i +

k2∑t=k+1

xitx′it

−1k2∑

xituit −

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit − op

Then, similar to V23, the term V33 becomes, as N,T → ∞,

N∑i=1

s∑t=k+1

x′it√T (δ2i − δ0i )

=1√N

N∑i=1

s∑t=k0+1

x′it + op

k2∑t=k+1

xitx′it

−1k2∑

xituit

−√T

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit + op

(1√T

=1√N

N∑i=1

s∑t=k0+1

x′it

k2∑t=k0+1

xitx′it

−1k2∑

t=k0+1

xituit −√T

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit

(√N√T

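\[
\Rightarrow \sigma(r - \tau^0)\Big(\frac{W(\tau_2) - W(\tau^0)}{\tau_2 - \tau^0} - \frac{W(\tau_1)}{\tau_1}\Big), \tag{D.30}
\]

since $N/T \to 0$. Combining results (D.28), (D.29), and (D.30), we have

\[
V_3 \Rightarrow \sigma^2\int_{\tau^0}^{\tau_2}\Big(W(r) - W(\tau^0) - (r - \tau^0)\frac{W(\tau_2) - W(\tau^0)}{\tau_2 - \tau^0}\Big)^2 dr.
\]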

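(iv) The fourth term $V_4$ can be rewritten as

\[
\begin{aligned}
V_4 &= \frac{1}{T}\sum_{s=k_2+1}^{T}\Big[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\Big(\sum_{t=s}^{T}u_{it} - \sum_{t=s}^{T}x_{it}'(\hat\beta_i - \beta_i^0) - \sum_{t=s}^{T}x_{it}'(\hat\delta_{3i} - \delta_i^0)\Big)\Big]^2 \\
&= \frac{1}{T}\sum_{s=k_2+1}^{T}\Big[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}u_{it} - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}x_{it}'(\hat\beta_i - \beta_i^0) - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}x_{it}'(\hat\delta_{3i} - \delta_i^0)\Big]^2 \\
&= \frac{1}{T}\sum_{s=k_2+1}^{T}(V_{41} - V_{42} - V_{43})^2 .
\end{aligned}
\]

It is easily seen that

\[
V_{41} \Rightarrow \sigma\big(W(1) - W(r)\big), \tag{D.31}
\]

\[
V_{42} = \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=s}^{T}x_{it}'\Big(\frac{1}{T}\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k_1}x_{it}u_{it}
\Rightarrow \sigma(1 - r)\frac{W(\tau_1)}{\tau_1}. \tag{D.32}
\]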

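The coefficient estimator $\hat\delta_{3i}$ can be written as

\[
\begin{aligned}
\hat\delta_{3i} &= \Big(\sum_{t=k_2+1}^{T}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_2+1}^{T}x_{it}y_{it} - \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}y_{it} \\
&= \Big(\sum_{t=k_2+1}^{T}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_2+1}^{T}x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it}) - \Big[\beta_i^0 + \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it}\Big] \\
&= \delta_i^0 + \Big(\sum_{t=k_2+1}^{T}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_2+1}^{T}x_{it}u_{it} - \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it}.
\end{aligned}
\]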

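Then, it is shown that, as $N,T \to \infty$,

\[
\begin{aligned}
V_{43} &= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=s}^{T}x_{it}'\sqrt T(\hat\delta_{3i} - \delta_i^0) \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=s}^{T}x_{it}'\Big[\Big(\frac{1}{T}\sum_{t=k_2+1}^{T}x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=k_2+1}^{T}x_{it}u_{it} - \Big(\frac{1}{T}\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k_1}x_{it}u_{it}\Big] \\
&\Rightarrow \sigma(1 - r)\Big(\frac{W(1) - W(\tau_2)}{1 - \tau_2} - \frac{W(\tau_1)}{\tau_1}\Big),
\end{aligned}
\tag{D.33}
\]

since $N/T \to 0$. From (D.31), (D.32), and (D.33), we have

\[
V_4 \Rightarrow \sigma^2\int_{\tau_2}^{1}\Big[W(1) - W(r) - \frac{1 - r}{1 - \tau_2}\big(W(1) - W(\tau_2)\big)\Big]^2 dr.
\]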

The proof of Lemma D.2 is complete.

Proof of Theorem 4.1. Combining the asymptotic distributions in (D.20) and Lemma D.2, we can complete the proof.
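The four limits in Lemma D.2 are integrals of squared, bridge-type transformations of a Brownian motion on the segments $[0,\tau_1]$, $[\tau_1,\tau^0]$, $[\tau^0,\tau_2]$ and $[\tau_2,1]$. The following Python sketch approximates them by Riemann sums on a discretized path. It is only an illustration: the function names, the grid size and the choice of a simple Riemann sum are assumptions made here and are not taken from the dissertation.

```python
import numpy as np

def simulate_path(n_steps, rng):
    """Discretized standard Brownian motion on an equally spaced grid over [0, 1]."""
    r = np.linspace(0.0, 1.0, n_steps + 1)
    increments = rng.normal(scale=np.sqrt(1.0 / n_steps), size=n_steps)
    w = np.concatenate(([0.0], np.cumsum(increments)))
    return r, w

def lemma_d2_functionals(r, w, tau1, tau0, tau2, sigma2=1.0):
    """Riemann-sum approximations of the limits (i)-(iv) for a path w on grid r.

    Hypothetical helper: tau0 plays the role of the true break fraction and
    tau1 < tau0 < tau2 are the trimming fractions.
    """
    dr = r[1] - r[0]
    # Path values at the three fractions and at the end point.
    w1, w0, w2, w_end = (np.interp(t, r, w) for t in (tau1, tau0, tau2, 1.0))
    seg1 = r <= tau1
    seg2 = (r > tau1) & (r <= tau0)
    seg3 = (r > tau0) & (r <= tau2)
    seg4 = r > tau2
    f1 = w[seg1] - r[seg1] / tau1 * w1
    f2 = w0 - w[seg2] - (tau0 - r[seg2]) / (tau0 - tau1) * (w0 - w1)
    f3 = w[seg3] - w0 - (r[seg3] - tau0) / (tau2 - tau0) * (w2 - w0)
    f4 = w_end - w[seg4] - (1.0 - r[seg4]) / (1.0 - tau2) * (w_end - w2)
    return tuple(float(sigma2 * np.sum(f ** 2) * dr) for f in (f1, f2, f3, f4))

rng = np.random.default_rng(0)
grid, path = simulate_path(2000, rng)
limits = lemma_d2_functionals(grid, path, tau1=0.2, tau0=0.5, tau2=0.8)
```

A useful sanity check is that each integrand removes any linear component of the path, so a path of the form $W(r) = a r$ gives four values equal to zero up to rounding error.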

Proofs of Propositions 4.1-4.3 and Theorem 4.2

We first investigate the statistical properties of the common break estimator $\hat k$ under the alternative hypothesis in Proposition 4.1. The proof follows the proofs of Lemmas 1-2 and Theorem 1 in Baltagi et al. (2016). Suppose that there are two groups and that the individuals in the same group share a common break date. Denote these groups by $G_1 = \{i : \text{individuals in group 1 with common break } k_1^0\}$ and $G_2 = \{i : \text{individuals in group 2 with common break } k_2^0\}$. The model under the alternative can be specified by

\[
\begin{aligned}
y_{it} &= x_{it}'\beta_i^0 + x_{it}'\delta_{1i}^0 1_{(t>k_1^0)} + u_{it}, \quad t = 1,\cdots,T, \quad \text{for } i \in G_1, \\
y_{it} &= x_{it}'\beta_i^0 + x_{it}'\delta_{2i}^0 1_{(t>k_2^0)} + u_{it}, \quad t = 1,\cdots,T, \quad \text{for } i \in G_2.
\end{aligned}
\]

The vector form can be rewritten as

\[
Y_i = [X_i,\ Z_i(k_j^0)]\,b_i^0 + u_i, \quad \text{for } i \in G_j,\ j = 1, 2.
\]

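The common break point is estimated in (4.3) by minimizing the total sum of squared OLS residuals. Let $SSR_i$ denote the sum of squared residuals from the regression of $Y_i$ on $X_i$ (the no-break case). Using the equality on page 185 of Baltagi et al. (2016),

\[
SSR_i - SSR_i(k^*) = \hat\delta_i(k^*)'[Z_i(k^*)'M_iZ_i(k^*)]\hat\delta_i(k^*),
\]

estimation (4.3) can be transformed into

\[
\begin{aligned}
\hat k &= \arg\min_{1\le k^*\le T-1}\sum_{i=1}^{N}SSR_i(k^*)
= \arg\max_{1\le k^*\le T-1}\sum_{i=1}^{N}\big(SSR_i - SSR_i(k^*)\big)
= \arg\max_{1\le k^*\le T-1}\sum_{i=1}^{N}SV_i(k^*) \\
&= \arg\max_{1\le k^*\le T-1}\Big[\sum_{i\in G_1}\big(SV_i(k^*) - SV_i(k_1^0)\big) + \sum_{i\in G_2}\big(SV_i(k^*) - SV_i(k_1^0)\big)\Big],
\end{aligned}
\tag{D.34}
\]

where

\[
\begin{aligned}
M_i &= I - X_i(X_i'X_i)^{-1}X_i', \\
SV_i(k^*) &= \hat\delta_i(k^*)'[Z_i(k^*)'M_iZ_i(k^*)]\hat\delta_i(k^*), \\
SV_i(k_j^0) &= \hat\delta_i(k_j^0)'[Z_i(k_j^0)'M_iZ_i(k_j^0)]\hat\delta_i(k_j^0), \quad \text{for } j = 1, 2.
\end{aligned}
\]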
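The estimator in (D.34) and the identity $SSR_i - SSR_i(k^*) = \hat\delta_i(k^*)'[Z_i(k^*)'M_iZ_i(k^*)]\hat\delta_i(k^*)$ can be illustrated numerically. The sketch below is a minimal Python example under stated assumptions: the function names, the simulated design (an intercept plus one regressor, a single common break) and the restriction of the candidate dates to $p \le k^* \le T-p$ (so that both regimes identify the $p$ coefficients) are choices made here for illustration, not the dissertation's implementation.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals from regressing y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def break_regressor(X, k):
    """Z(k): rows 1..k are zero, rows k+1..T equal the rows of X."""
    Z = X.copy()
    Z[:k] = 0.0
    return Z

def sv(y, X, k):
    """delta_hat' [Z'MZ] delta_hat with M = I - X (X'X)^{-1} X'."""
    Z = break_regressor(X, k)
    M = np.eye(len(y)) - X @ np.linalg.solve(X.T @ X, X.T)
    A = Z.T @ M @ Z
    delta_hat = np.linalg.solve(A, Z.T @ M @ y)
    return float(delta_hat @ A @ delta_hat)

def common_break(Y, X):
    """Y has shape (N, T), X has shape (N, T, p); returns the SSR-minimizing date."""
    N, T, p = X.shape
    candidates = range(p, T - p + 1)  # assumption: trim so each regime has >= p points
    total = [
        sum(ssr(Y[i], np.hstack([X[i], break_regressor(X[i], k)])) for i in range(N))
        for k in candidates
    ]
    return candidates[int(np.argmin(total))]

def simulate_panel(N=20, T=60, k0=30, seed=0):
    """Hypothetical design: intercept and one regressor, common break at k0."""
    rng = np.random.default_rng(seed)
    X = np.ones((N, T, 2))
    X[:, :, 1] = rng.normal(size=(N, T))
    beta = np.array([1.0, 0.5])
    delta = np.array([3.0, 1.0])
    after = (np.arange(1, T + 1) > k0).astype(float)  # indicator 1(t > k0)
    Y = X @ beta + after * (X @ delta) + 0.1 * rng.normal(size=(N, T))
    return Y, X, k0

Y, X, k0 = simulate_panel()
k_hat = common_break(Y, X)
```

Minimizing the total SSR and maximizing the sum of `sv` values select the same date, because $SSR_i$ does not depend on $k^*$.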

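For individuals in group 1, we can see that the coefficient estimators are given by

\[
\begin{aligned}
\hat\delta_i(k^*) &= [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_iY_i, \\
\hat\delta_i(k_1^0) &= [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_iY_i.
\end{aligned}
\]

Replacing $Y_i$ by

\[
Y_i = X_i\beta_i^0 + Z_i(k_1^0)\delta_i^0 + u_i,
\]

and noting that $M_iX_i = 0$, we have

\[
\begin{aligned}
\hat\delta_i(k^*) &= [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_i[X_i\beta_i^0 + Z_i(k_1^0)\delta_i^0 + u_i] \\
&= [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_iZ_i(k_1^0)\delta_i^0 + [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_iu_i, \\
\hat\delta_i(k_1^0) &= [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_i[X_i\beta_i^0 + Z_i(k_1^0)\delta_i^0 + u_i] \\
&= \delta_i^0 + [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_iu_i.
\end{aligned}
\]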

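Similarly, for individuals in group 2, by replacing $Y_i$ by

\[
Y_i = X_i\beta_i^0 + Z_i(k_2^0)\delta_i^0 + u_i,
\]

the coefficient estimators are rewritten as

\[
\begin{aligned}
\hat\delta_i(k^*) &= [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_i[X_i\beta_i^0 + Z_i(k_2^0)\delta_i^0 + u_i] \\
&= [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_iZ_i(k_2^0)\delta_i^0 + [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_iu_i, \\
\hat\delta_i(k_1^0) &= [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_i[X_i\beta_i^0 + Z_i(k_2^0)\delta_i^0 + u_i] \\
&= [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_iZ_i(k_2^0)\delta_i^0 + [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_iu_i.
\end{aligned}
\]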

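To simplify notation, we use $Z_i$, $Z_{1i}^0$, $Z_{2i}^0$ to denote $Z_i(k^*)$, $Z_i(k_1^0)$, $Z_i(k_2^0)$. For individuals in group 1, we have

\[
\begin{aligned}
SV_i(k^*) &= \delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{1i}^0\delta_i^0 + 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i \\
&\quad + u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i,
\end{aligned}
\tag{D.35}
\]

\[
SV_i(k_1^0) = \delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_{1i}^0\delta_i^0 + 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iu_i + u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i. \tag{D.36}
\]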

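Using (D.35) and (D.36), $SV_i(k^*) - SV_i(k_1^0)$ becomes

\[
\begin{aligned}
SV_i(k^*) - SV_i(k_1^0) &= -\delta_i^{0\prime}[Z_{1i}^{0\prime}M_iZ_{1i}^0 - Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{1i}^0]\delta_i^0 \\
&\quad + 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iu_i \\
&\quad + u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i,
\end{aligned}
\]

and can be decomposed into the term defined by

\[
J_{1i}(k^*) = \delta_i^{0\prime}[Z_{1i}^{0\prime}M_iZ_{1i}^0 - Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{1i}^0]\delta_i^0 \tag{D.37}
\]

and the term related to the disturbance $u_i$ defined by

\[
\begin{aligned}
H_{1i}(k^*) &= 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iu_i \\
&\quad + u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i.
\end{aligned}
\tag{D.38}
\]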

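Then, we have $SV_i(k^*) - SV_i(k_1^0) = -J_{1i}(k^*) + H_{1i}(k^*)$ for $i \in G_1$. Applying a similar transformation for individuals in group 2, we can see that

\[
\begin{aligned}
SV_i(k^*) &= \delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{2i}^0\delta_i^0 + 2\delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i \\
&\quad + u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i,
\end{aligned}
\tag{D.39}
\]

\[
\begin{aligned}
SV_i(k_1^0) &= \delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iZ_{2i}^0\delta_i^0 + 2\delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i \\
&\quad + u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i.
\end{aligned}
\tag{D.40}
\]

Using (D.39) and (D.40), it is easily seen that, for $i \in G_2$,

\[
SV_i(k^*) - SV_i(k_1^0) = -J_{2i}(k^*) + H_{2i}(k^*),
\]

where the term $J_{2i}(k^*)$ is defined by

\[
J_{2i}(k^*) = \delta_i^{0\prime}[Z_{2i}^{0\prime}M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iZ_{2i}^0 - Z_{2i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{2i}^0]\delta_i^0, \tag{D.41}
\]

and the term $H_{2i}(k^*)$ related to the disturbance is defined by

\[
\begin{aligned}
H_{2i}(k^*) &= 2\delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - 2\delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i \\
&\quad + u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i.
\end{aligned}
\tag{D.42}
\]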

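Thus, (D.34) can be rewritten as

\[
\hat k = \arg\max_{1\le k^*\le T-1}\Big[\sum_{i\in G_1}\big(-J_{1i}(k^*) + H_{1i}(k^*)\big) + \sum_{i\in G_2}\big(-J_{2i}(k^*) + H_{2i}(k^*)\big)\Big].
\]

Define the sets $K(C_1) = \{k : 1 \le k < k_1^0 - C_1\}$, $K(C_2) = \{k : k_2^0 + C_2 < k \le T - C_1\}$, and $K(C) = K(C_1)\cup K(C_2) = \{k : 1 \le k < k_1^0 - C_1 \text{ or } k_2^0 + C_2 < k \le T\}$ with positive constants $C_1$, $C_2$. We next show, by Lemmas D.3-D.4, that the common break estimator cannot appear in the set $K(C_1)$. A similar result can be obtained for the set $K(C_2)$ by symmetry, and thus the details are omitted. Define

\[
Z_{1i}^{\Delta} =
\begin{cases}
Z_i(k^*) - Z_i(k_1^0) & \text{if } k^* < k_1^0, \\
-(Z_i(k^*) - Z_i(k_1^0)) & \text{if } k^* \ge k_1^0,
\end{cases}
\quad\text{and}\quad
Z_{2i}^{\Delta} =
\begin{cases}
Z_i(k^*) - Z_i(k_2^0) & \text{if } k^* < k_2^0, \\
-(Z_i(k^*) - Z_i(k_2^0)) & \text{if } k^* \ge k_2^0.
\end{cases}
\]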

Lemma D.3 Under Assumptions 4.1-4.6, for all large $N$ and $T$, with probability tending to one,

\[
\inf_{k^*\in K(C_1)}\frac{1}{k_1^0 - k^*}\Big[\sum_{i\in G_1}J_{1i}(k^*) + \sum_{i\in G_2}J_{2i}(k^*)\Big] \ge \lambda\phi_N .
\]

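Proof of Lemma D.3. We first show that the summation of the part $J_{1i}(k^*)$ has a lower bound in the case of $k^* \in K(C_1)$. From Lemma A.2 in Bai (1997), if $k^* < k_1^0$,

\[
\begin{aligned}
J_{1i}(k^*) &= \delta_i^{0\prime}[Z_{1i}^{0\prime}M_iZ_{1i}^0 - Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{1i}^0]\delta_i^0 \\
&= \delta_i^{0\prime}Z_{1i}^{\Delta\prime}Z_{1i}^{\Delta}(Z_i'Z_i)^{-1}Z_{1i}^{0\prime}Z_{1i}^0\delta_i^0 .
\end{aligned}
\tag{D.43}
\]

Since the matrix

\[
\frac{Z_{1i}^{\Delta\prime}Z_{1i}^{\Delta}}{k_1^0 - k^*}\Big(\frac{Z_i'Z_i}{T}\Big)^{-1}\frac{Z_{1i}^{0\prime}Z_{1i}^0}{T} \tag{D.44}
\]

is symmetric and positive definite from Assumption 4.4, we have

\[
\frac{1}{k_1^0 - k^*}\sum_{i\in G_1}J_{1i}(k^*) = \sum_{i\in G_1}\delta_i^{0\prime}S_i'\Lambda_iS_i\delta_i^0 = \sum_{i\in G_1}\tilde\delta_i^{0\prime}\Lambda_i\tilde\delta_i^0 \ge \sum_{i\in G_1}\lambda_i\tilde\delta_i^{0\prime}\tilde\delta_i^0, \tag{D.45}
\]

where $\Lambda_i$ is a diagonal matrix comprising the eigenvalues of matrix (D.44), $\tilde\delta_i^0 = S_i\delta_i^0$, and $\lambda_i$ is the minimum eigenvalue of (D.44). Since $\tilde\delta_i^{0\prime}\tilde\delta_i^0 = \delta_i^{0\prime}S_i'S_i\delta_i^0 = \delta_i^{0\prime}\delta_i^0$, with probability tending to one for large $N$ and $T$, we show that

\[
\frac{1}{k_1^0 - k^*}\sum_{i\in G_1}J_{1i}(k^*) \ge \lambda_1\sum_{i\in G_1}\delta_i^{0\prime}\delta_i^0 = \lambda_1\phi_{N_1}, \tag{D.46}
\]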

where λ1 = mini∈G1λi. We next investigate the lower bound of J2i(k∗) for individuals in

group 2. Denote

Vi(a, b) =

x′i(a+1)

x′i(a+2)...x′ib

, V 0i (a, b, c) =

0(b−a)×p

x′i(b+1)

x′i(b+2)...x′ic

, and S =

[I 0−I I

where Vi(a, b) is a (b − a) × p matrix whose jth row is the same as (a + j)th row of Xi,

V 0i (a, b, c) is a (c− a)× p matrix whose first b− a rows are zeros and the jth row is the same

as (a+ j)th row of Xi for j > b− a, and S is a 2p× 2p matrix constructed by p× p identity

matrix I. The second term J2i(k) can be transformed into

J2i(k∗) = δ0i

′[Z02i′MiZ

01i′MiZ

−1Z01iMiZ

02i − Z0

2i′MiZi(Z

′iMiZi)

−1Z ′iMiZ

= δ0i′Z02i′[MiZ

01i′MiZ

−1Z01iMiZ

02i − Z0

2i′MiZi(Z

′iMiZi)

−1Z ′iMi

]Z02iδ

= δ0i′Z02i′[Mi − Z0

2i′MiZi(Z

′iMiZi)

−1Z ′iMi −

(Mi −MiZ

01i′MiZ

−1Z01iMiZ

)]Z02iδ

= δ0i′Z02i′(MW −MW 0

)Z02iδ

= δ0i′Z02i′(MW −MW 0

)Z02iδ

0i , (D.47)

W = [Xi, Zi(k∗)] =

[Vi(0, k

∗) 0Vi(k

∗, T ) Vi(k∗, T )

], W 0

1 = [Xi, Zi(k01)] =

[Vi(0, k

Vi(k01, T

01 ) Vi(k

01, T )

[Vi(0, k

∗) 00 Vi(k

∗, T )

], W 0

[Vi(0, k

0 Vi(k01, T )

MX = I −X(X ′X)−1X ′, for matrixX.

The final equality (D.47) holds because

MW = I −W (W ′W )−1W ′ = I −WS(S′W ′WS)−1S′W ′ = I − W(W ′W

)−1W = MW ,

MW 01= I −W 0

1 (W01′W 0

1 )−1W 0

1′= I −W 0

1 S(S′W 0

1′W 0

1 S)−1S′W 0

1′= I − W 0

(W 0′

1 W 01

)−1W 0′

1 = MW 01.
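The step above relies on the fact that the residual-maker matrix $M_W = I - W(W'W)^{-1}W'$ is unchanged when $W$ is post-multiplied by a nonsingular matrix $S$, and that for $W = [X_i, Z_i(k^*)]$ the choice $S = \begin{bmatrix} I & 0 \\ -I & I \end{bmatrix}$ turns $W$ into a block-diagonal matrix. The Python sketch below checks both facts numerically; the dimensions, the random regressors and the helper name `annihilator` are assumptions made for illustration only.

```python
import numpy as np

def annihilator(W):
    """M_W = I - W (W'W)^{-1} W'."""
    return np.eye(W.shape[0]) - W @ np.linalg.solve(W.T @ W, W.T)

rng = np.random.default_rng(1)
T, p, k = 12, 2, 5                      # hypothetical sample size, regressors, break date
X = rng.normal(size=(T, p))
Z = X.copy()
Z[:k] = 0.0                             # Z(k): zero before the break, X afterwards
W = np.hstack([X, Z])

I_p, O_p = np.eye(p), np.zeros((p, p))
S = np.block([[I_p, O_p], [-I_p, I_p]])
W_tilde = W @ S                         # equals [X - Z, Z], which is block diagonal

same_projection = np.allclose(annihilator(W), annihilator(W_tilde))
upper_right_zero = np.allclose(W_tilde[:k, p:], 0.0)
lower_left_zero = np.allclose(W_tilde[k:, :p], 0.0)
```

Because $W\tilde{\ } = WS$ has the pre-break rows in its first $p$ columns and the post-break rows in its last $p$ columns, quadratic forms in $M_W$ reduce to expressions in the post-break block alone, which is what (D.48) and (D.49) exploit.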

Since W and W 01 are block matrices, it follows that

Z02i′MWZ0

= Z02i′[I − W

(W ′W

)−1W]Z02i

= [0, V 0i (k

∗, k02, T )]′[I − W

(W ′W

)−1W] [ 0

V 0i (k

∗, k02, T )

]= V 0

i (k∗, k02, T )

′V 0i (k

∗, k02, T )− V 0i (k

∗, k02, T )′Vi(k

∗, T )(Vi(k

∗, T )′Vi(k∗, T )

)−1Vi(k

∗, T )V 0i (k

∗, k02, T )

= Vi(k02, T )

′Vi(k

02, T )− Vi(k

02, T )

′Vi(k

02, T )

∗, T )′Vi(k∗, T )

)−1Vi(k

02, T )

′Vi(k

02, T )

= Vi(k02, T )

′Vi(k

02, T )

[(Vi(k

02, T )

′Vi(k

02, T )

)−1−(Vi(k

∗, T )′Vi(k∗, T )

)−1]Vi(k

02, T )

′Vi(k

02, T ), (D.48)

Z02i′MW 0

= Z02i′[I − W 0

(W 0′

1 W 01

)−1W 0′

= [0, V 0i (k

02, T )]

′[I − W 0

(W 0′

1 W 01

)−1W 0′

V 0i (k

02, T )

]= V 0

i (k01, k

02, T )

′V 0i (k

02, T )− V 0

i (k01, k

02, T )

′Vi(k01, T )

01, T )

′Vi(k

01, T )

)−1Vi(k

01, T )V

02, T )

= Vi(k02, T )

′Vi(k

02, T )

[(Vi(k

02, T )

′Vi(k

02, T )

)−1−(Vi(k

01, T )

′Vi(k

01, T )

)−1]Vi(k

02, T )

′Vi(k

02, T ). (D.49)

Substituting (D.48) and (D.49) into (D.47), we have

J2i(k∗) = δ0

′i Vi(k

02, T )

′Vi(k

02, T )

[(Vi(k

01, T )

′Vi(k

01, T )

)−1−(Vi(k

∗, T )′Vi(k∗, T )

)−1]Vi(k

02, T )

′Vi(k

02, T )δ

= δ0′

i Z02i′Z02i

[(Z01i′Z01i

)−1−(Z ′iZi

)−1]Z02i′Z02iδ

= δ0′

i Z02i′Z02i

(Z ′iZi

)−1(Z ′iZi − Z0

1i′Z01i

)(Z01i′Z01i

)−1Z02i′Z02iδ

= δ0′

i Z02i′Z02i

(Z ′iZi

)−1Z∆1i

′Z∆1i

(Z01i′Z01i

)−1Z02i′Z02iδ

0i , (D.50)

which is symmetric from the first equality. Hence, under Assumptions 4.4 and 4.5,

Z02i′Z02i

(Z ′iZi

)−1 Z∆1i

′Z∆1i

k01 − k∗

(Z01i′Z01i

)−1Z02i′Z02i

T(D.51)

is positive definite. Then, we have

\[
\frac{1}{k_1^0 - k^*}\sum_{i\in G_2}J_{2i}(k^*) \ge \lambda_2\sum_{i\in G_2}\delta_i^{0\prime}\delta_i^0 = \lambda_2\phi_{N_2}, \tag{D.52}
\]

where $\lambda_2 = \min_{i\in G_2}\lambda_i$ and $\lambda_i$ is the minimum eigenvalue of matrix (D.51). From inequalities (D.46) and (D.52), the proof of Lemma D.3 is complete.
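Both bounds in the proof of Lemma D.3 use the same elementary fact: for a symmetric matrix $A = S'\Lambda S$ with orthogonal $S$, the quadratic form satisfies $\delta'A\delta = \tilde\delta'\Lambda\tilde\delta \ge \lambda_{\min}\,\delta'\delta$ with $\tilde\delta = S\delta$. The following Python sketch verifies this on a randomly generated positive definite matrix; the matrix, the vector and the function name are hypothetical and merely stand in for (D.44) or (D.51) and $\delta_i^0$.

```python
import numpy as np

def quadratic_form_bound(A, delta):
    """Return (delta' A delta, lambda_min * delta' delta) for a symmetric matrix A."""
    eigvals, eigvecs = np.linalg.eigh(A)      # eigenvalues in ascending order
    rotated = eigvecs.T @ delta               # plays the role of S * delta
    quad = float(rotated @ (eigvals * rotated))
    lower = float(eigvals[0] * (delta @ delta))
    return quad, lower

rng = np.random.default_rng(2)
B = rng.normal(size=(3, 3))
A = B.T @ B + np.eye(3)                       # symmetric positive definite by construction
delta = rng.normal(size=3)
quad, lower = quadratic_form_bound(A, delta)
```

Summing such inequalities over the individuals in a group and taking the smallest of the minimum eigenvalues gives bounds of the form $\lambda_j\phi_{N_j}$.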

Lemma D.4 Under Assumptions 4.1-4.6, uniformly on k∗ ∈ K(C1),

(i)∑i∈G1

k01 − k∗δ0i

′Z∆1i

′ui = Op(

√ϕN1),

(ii)∑i∈G1

k01 − k∗δ0i

′Z∆1i

(X ′

)−1X ′

iui = Op

(√ϕN1

(iii)∑i∈G1

k01 − k∗δ0i

′Z∆1i

′MiZi

(Z ′iMiZi

)−1Z ′iMiui = Op

(√ϕN1

∑i∈G2

k01 − k∗δ0i

′Z02i′MiZ

(Z ′iMiZi

)−1Z ′iMiui = Op

(√ϕN2

∑i∈G2

k01 − k∗δ0i

′Z02i′MiZ

(Z ′iMiZi

)−1Z∆1i

′Miui = Op

(√ϕN2

∑i∈G2

k01 − k∗δ0i

′Z02i′MiZ

′MiZi

)−1 −(Z01i′MiZ

)−1]Z01i′Miui = Op

(√ϕN2

(iv)N∑i=1

k01 − k∗u′iMiZ

(Z ′iMiZi

)−1Z∆1i

′Miui = Op

N∑i=1

(Z ′iMiZi

)−1Z01i′Miui = Op

(N√T

N∑i=1

[(Z ′iMiZi

)−1 −(Z01i′MiZ

)−1]Z01i′Miui = Op

Proof of Lemma D.4. (i) It is shown that

1√k01 − k∗

Z∆1i

′ui = Op(1), since V ar

k01 − k∗Z∆1i

)< ∞.

Then, we have ∑i∈G1

k01 − k∗δ0i

′Z∆1i

′ui = Op(

√ϕN1).

(ii) We can show that

∑i∈G1

k01 − k∗δ0i

′Z∆1i

(X ′

)−1X ′

iui =1√T

∑i∈G1

δ0i′ Z∆

1i′Xi

k01 − k∗

(X ′

)−1 1√TX ′

iui = Op

(√ϕN1

since for large T1√TXi

′ui = Op(1).

(iii) By expanding Mi, we can show that

∑i∈G1

k01 − k∗δ0i

′Z∆1i

′MiZi

(Z ′iMiZi

)−1Z ′iMiui

=1√T

∑i∈G1

δ0i′ Z∆

1i′Zi

k01 − k∗

(Z ′iMiZi

)−1 1√TZ ′iMiui

− 1√T

∑i∈G1

δ0i′ Z∆

1i′Xi

k01 − k∗

(X ′

)−1 X ′iZi

(Z ′iMiZi

(√ϕN1

To prove the second order, since Z02i′Z∆1i = 0, we have

∑i∈G2

k01 − k∗δ0i

′Z02i′MiZ

(Z ′iMiZi

)−1Z ′iMiui

=∑i∈G2

k01 − k∗δ0i

′Z02i′Z∆1i

(Z ′iMiZi

)−1Z ′iMiui

− 1√T

∑i∈G2

δ0i′Z0

2i′Xi

(X ′

)−1 X ′iZ

k01 − k∗

(Z ′iMiZi

(√ϕN2

Considering the third order, we can show that

∑i∈G2

k01 − k∗δ0i

′Z02i′MiZ

(Z ′iMiZi

)−1Z∆1i

′Miui

=∑i∈G2

k01 − k∗δ0i

′Z02i′MiZ

(Z ′iMiZi

)−1Z∆1i

−∑i∈G2

k01 − k∗δ0i

′Z02i′MiZ

(Z ′iMiZi

)−1Z∆1i

′Xi(X

′iXi)

−1X ′iui

k01 − k∗Op

(√ϕN2

The last term can be transformed into∑i∈G2

k01 − k∗δ0i

′Z02i′MiZ

′MiZi

)−1(Z01i′MiZ

01i − Zi

′MiZi

)(Z01i′MiZ

)−1Z01i′Miui

=∑i∈G2

k01 − k∗δ0i

′Z02i′MiZ

′MiZi

)−1(−Z∆

1i′Z∆1i − Z∆

1i′MiZ

01i − Z0

1i′MiZ

)(Z01i′MiZ

)−1Z01i′Miui

+∑i∈G2

k01 − k∗δ0i

′Z02i′MiZ

′MiZi

)−1(Z∆1i

′Xi(X

′iXi)

−1X ′iZ

)(Z01i′MiZ

)−1Z01i′Miui

(√ϕN2

(iv) The term∑N

T (k01−k∗)u′iMiZ

(Z′iMiZi

)−1Z∆1i

′Miui has the same order as that of∑N

T (k01−k∗)u′iMiZ

∆1iZ

′Miui since the matrix

Z′iMiZi

T = Op(1) for large T. Expanding

matrix Mi, we have

N∑i=1

T (k01 − k∗)u′iMiZ

∆1iZ

′Miui

T (k01 − k∗)

N∑i=1

u′iZ∆1iZ

′ui −

T (k01 − k∗)

N∑i=1

u′iXi(X′iXi)

−1X ′iZ

∆1iZ

T (k01 − k∗)

N∑i=1

u′iXi(X′iXi)

−1X ′iZ

∆1iZ

′Xi(X

′iXi)

−1X ′iui (D.53)

Consider the first term in (D.53),

T (k01 − k∗)

N∑i=1

u′iZ∆1iZ

′ui = Op

Similarly, it can be shown that the second term

T (k01 − k∗)

N∑i=1

u′iXi(X′iXi)

−1X ′iZ

∆1iZ

√k01 − k∗

N∑i=1

u′iXi√T

(X ′

)−1 X ′iZ

k01 − k∗Z∆1i

′ui√

k01 − k∗= Op

and the third term

T (k01 − k∗)

N∑i=1

u′iXi(X′iXi)

−1X ′iZ

∆1iZ

′Xi(X

′iXi)

−1X ′iui

=k01 − k∗

N∑i=1

u′iXi√T

(X ′

)−1 X ′iZ

k01 − k∗Z∆1i

k01 − k∗

(X ′

)−1 X ′iui√T

Thus, we have

N∑i=1

(Z ′iMiZi

)−1Z∆1i

′Miui = Op

(v) By expanding Mi, it is shown that

N∑i=1

(Z ′iMiZi

)−1Z01i′Miui

=N∑i=1

k01 − k∗u′iZ

(Z ′iMiZi

)−1Z01i′Miui −

N∑i=1

k01 − k∗u′iXi(X

′iXi)

−1X ′iZ

(Z ′iMiZi

)−1Z01i′Miui

(N√T

(vi) We show that

N∑i=1

k01 − k∗ui

′MiZ01i

′MiZi

)−1(Z01i′MiZ

01i − Zi

′MiZi

)(Z01i′MiZ

)−1Z01i′Miui

N∑i=1

k01 − k∗ui

′MiZ01i

′MiZi

)−1(−Z∆

1i′MiZ

∆1i − Z∆

1i′MiZ

01i − Z0

1i′MiZ

)(Z01i′MiZ

)−1Z01i′Miui

The proof of Lemma D.4 is complete.

Proof of Proposition 4.1. We first show that for any given $\epsilon > 0$,

\[
P\Bigg(\sup_{K(C_1)}\Bigg|\frac{\sum_{i\in G_1}H_{1i}(k^*) + \sum_{i\in G_2}H_{2i}(k^*)}{k_1^0 - k^*}\Bigg| \ge \lambda\phi_N\Bigg) < \epsilon. \tag{D.54}
\]

Using (D.38) and (D.42), we see that the sum of H1i(k∗) +H2i(k

∗) can be decomposed into

three parts:

k01 − k∗

∑i∈G1

H1i(k∗) +

∑i∈G2

H2i(k∗)

k01 − k∗

∑i∈G1

2δ0i′Z01i′MiZi(Z

′iMiZi)

−1Z ′iMiui −

∑i∈G1

2δ0i′Z01i′Miui

k01 − k∗

∑i∈G2

2δ0i′Z02i′MiZi(Z

′iMiZi)

−1Z ′iMiui −

∑i∈G2

2δ0i′Z02i′MiZ

01i′MiZ

−1Z01iMiui

k01 − k∗

[N∑i=1

u′iMiZ′i(Z

′iMiZi)

−1Z ′iMiui −

N∑i=1

u′iMiZ01i(Z

01i′MiZ

−1Z01i′Miui

]= H1 +H2 +H3.

Consider the first term, by replacing Z01i by Zi − Z∆

|H1| = 2

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G1

′Z01i′MiZi(Z

′iMiZi)

−1Z ′iMiui − δ0i

′Z01i′Miui

]∣∣∣∣∣∣= 2

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G1

′Z ′iMiui − δ0i

′Z∆1i

′MiZi(Z

′iMiZi)

−1Z ′iMiui − δ0i

′Miui + δ0i′Z∆1i

′Miui

]∣∣∣∣∣∣= 2

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G1

′Z∆1i

′Miui − δ0i

′Z∆1i

′MiZi(Z

′iMiZi)

−1Z ′iMiui

]∣∣∣∣∣∣≤ 2

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G1

δ0i′Z∆1i

∣∣∣∣∣∣+ 2

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G1

δ0i′Z∆1i

′Xi(X

′iXi)

−1X ′iui

∣∣∣∣∣∣+2

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G1

δ0i′Z∆1i

′MiZi(Z

′iMiZi)

−1Z ′iMiui

∣∣∣∣∣∣= Op(

√ϕN1) +Op

(√ϕN1

), (D.55)

where the inequality is obtained by expanding Mi and the final equality uses the orders in

(i)-(iii) of Lemma D.4.

For the second term H2, replacing Zi by Z∆1i + Z0

1i, and using (iii) of Lemma D.4,

|H2| = 2

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G2

[2δ0i

′Z02i′MiZi(Z

′iMiZi)

′Z02i′MiZ

01i′MiZ

−1Z01i′Miui

]∣∣∣∣∣∣= 2

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G2

′Z02i′MiZ

∆1i(Z

′iMiZi)

−1Z ′iMiui + δ0i

′Z02i′MiZ

′iMiZi)

−1Z ′iMiui

−δ0i′Z02i′MiZ

01i′MiZ

−1Z01iMiui

]∣∣∣= 2

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G2

′Z02i′MiZ

∆1i(Z

′iMiZi)

−1Z ′iMiui + δ0i

′Z02i′MiZ

′iMiZi)

−1Z∆1i

′Miui

+δ0i′Z02i′MiZ

′MiZi

)−1 −(Z01i′MiZ

)−1]Z01i′Miui

]∣∣∣∣≤ 2

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G2

δ0i′Z02i′MiZ

∆1i(Z

′iMiZi)

−1Z ′iMiui

∣∣∣∣∣∣+2

∣∣∣∣∣∣ 1

k01 − k

∑i∈G2

δ0i′Z02i′MiZ

′iMiZi)

−1Z∆1i

′Miui

∣∣∣∣∣∣ (D.56)

∣∣∣∣∣∣ 1

k01 − k∗

∑i∈G2

δ0i′Z02i′MiZ

′MiZi

)−1 −(Z01i′MiZ

)−1]Z01i′Miui

∣∣∣∣∣∣= Op

(√ϕN2

√ϕN2) +Op

(√ϕN2

). (D.57)

By (iv)-(vi) of Lemma D.4, the order of the third term is

|H3| =

∣∣∣∣∣ 1

k01 − k∗

N∑i=1

[u′iMiZ

′i(Z

′iMiZi)

01i′MiZ

−1Z01i′Miui

]∣∣∣∣∣≤

∣∣∣∣∣ 1

k01 − k∗

N∑i=1

u′iMiZ∆1i

′(Z ′

iMiZi)−1Z∆

1i′Miui

∣∣∣∣∣+∣∣∣∣∣ 1

k01 − k∗

N∑i=1

u′iMiZ∆1i

′(Z ′

iMiZi)−1Z0

1i′Miui

∣∣∣∣∣+

∣∣∣∣∣N∑i=1

[(Z ′iMiZi

)−1 −(Z01i′MiZ

)−1]Z01i′Miui

∣∣∣∣∣= Op

(N√T

). (D.58)

Combining (D.55), (D.57) and (D.58), under Assumption 4.2, the term

∣∣∣∣∣∣∣∑i∈G1

H1i(k∗) +

∑i∈G2

H2i(k∗)

k01 − k∗

∣∣∣∣∣∣∣=

∣∣∣∣∣[Op(

√ϕN1) +Op(

√ϕN2) +Op

(√ϕN1

(√ϕN2

(N√T

)]∣∣∣∣∣→ 0,

which will vanish for any k∗ ∈ K(C1). On the other hand, the part ϕ−1N (k01 − k∗)−1∣∣∣∣∣ ∑i∈G1

J1i(k∗) +

∑i∈G2

J2i(k∗)

∣∣∣∣∣ has a lower bound from Lemma D.3. Hence, for any ϵ > 0,

supK(C1)

∣∣∣∣∣∣∣∑i∈G1

H1i(k∗) +

∑i∈G2

H2i(k∗)

k01 − k∗

∣∣∣∣∣∣∣ ≥ supK(C1)

∣∣∣∣∣∣∣∑i∈G1

J1i(k∗) +

∑i∈G2

J2i(k∗)

k01 − k∗

∣∣∣∣∣∣∣ < ϵ,

which implies that

supK(C1)

∑i∈G1

−J1i(k∗) +H1i(k

∗) +∑i∈G2

−J2i(k∗) +H2i(k

∗) ≥ 0

N∑i=1

[SVi(k)− SVi(k

01)]≥ 0

)< ϵ.

Finally, we obtain that, for any given $\epsilon > 0$ and for both $N$ and $T$ large,

\[
P(\hat k \in K(C_1)) < \epsilon.
\]

In other words, the total sum of squared residuals cannot be minimized at any $k^* \in K(C_1)$. By symmetry, the estimation of the common break point (4.3) can be transformed into

\[
\hat k = \arg\max_{1\le k^*\le T-1}\Big[\sum_{i\in G_1}\big(SV_i(k^*) - SV_i(k_2^0)\big) + \sum_{i\in G_2}\big(SV_i(k^*) - SV_i(k_2^0)\big)\Big].
\]

Similarly, we can show that, for any given $\epsilon > 0$,

\[
P(\hat k \in K(C_2)) < \epsilon.
\]

That is, the common break point estimator falls in the set $K(C_2)$ with probability tending to zero. Thus, we complete the proof of Proposition 4.1.

Proposition 4.1 indicates that the estimated common break is either within a stochastically bounded distance of one of the true break points or located between $k_1^0$ and $k_2^0$. Then, we can say that

\[
\frac{k_1^0 - \hat k}{T} = O_p\Big(\frac{1}{T}\Big), \quad \text{if } \hat k \le k_1^0, \tag{D.59}
\]

\[
\frac{\hat k - k_2^0}{T} = O_p\Big(\frac{1}{T}\Big), \quad \text{if } \hat k \ge k_2^0. \tag{D.60}
\]

Using this property of common break estimator under the alternative, we next show that the

numerator of the statistic will diverge under H1A.

Proof of Proposition 4.2. Under the alternative, from (D.4), the CUSUM of the residuals

for individuals in group j (j = 1, 2) are calculated as

1√NT

∑i∈Gj

k∑t=1

=1√NT

∑i∈Gj

k∑t=1

uit −1√NT

∑i∈Gj

k∑t=1

1√NT

∑i∈Gj

k∑t=k+1

x′it(δi − δ0i )1k>k

+1√NT

∑i∈Gj

k∑t=k0j+1

x′itδ0i 1k0j<k≤k +

1√NT

∑i∈Gj

k∑t=k0j+1

x′itδ0i 1k0j<k<k

− 1√NT

∑i∈Gj

k∑t=k+1

x′itδ0i 1k<k≤k0j

− 1√NT

∑i∈Gj

k0j∑t=k+1

x′itδ0i 1k<k0j<k.

Then, the total sum of squared residuals 1√NT

∑Ni=1

∑kt=1 uit is expressed as

1√NT

N∑i=1

k∑t=1

uit −1√NT

N∑i=1

k∑t=1

x′it(βi(k)− β0i )−

1√NT

N∑i=1

k∑t=k+1

x′it(δi(k)− δ0i )1k>k

+1√NT

∑i∈G1

k∑t=k01+1

1√NT

∑i∈G1

k∑t=k01+1

x′itδ0i 1k01<k<k

− 1√NT

∑i∈G1

k∑t=k+1

x′itδ0i 1k<k≤k01

− 1√NT

∑i∈G1

k01∑t=k+1

x′itδ0i 1k<k01<k

+1√NT

∑i∈G2

k∑t=k02+1

1√NT

∑i∈G2

k∑t=k02+1

x′itδ0i 1k02<k<k

− 1√NT

∑i∈G2

k∑t=k+1

x′itδ0i 1k<k≤k02

− 1√NT

∑i∈G2

k02∑t=k+1

x′itδ0i 1k<k02<k

= UH11 − UH1

2 − UH13 + UH1

4 + UH15 − UH1

6 − UH17 + UH1

8 + UH19 − UH1

10 − UH111 . (D.61)

Since k < k01 in UH16 , UH1

7 , and k > k02 in UH18 , UH1

9 , using the orders (D.59) and (D.60), we

UH16 = Op

), UH1

7 = Op

), UH1

8 = Op

), UH1

9 = Op

). (D.62)

From (D.10), we know that, for i ∈ Gj , j = 1, 2,

√T (βi − β0

i ) =√T

k∑t=1

xitx′it

−1k∑

xituit +√T

k∑t=1

xitx′it

−1k∑

t=k0j+1

xitx′itδ

0i 1k>k0j

t=1xitx

)−1k∑

t=1xituit +

t=1xitx

)−1k∑

t=k01+1

xitx′itδ

0i 1k>k01

if j = 1,

t=1xitx

)−1k∑

t=1xituit +Op

(1√T

)if j = 2,

using order (D.60). Then, the second term UH12 becomes

∑i∈G1

k∑t=1

x′it√T

k∑t=1

xitx′it

−1k∑

xituit +

k∑t=1

xitx′it

−1k∑

t=k01+1

xitx′itδ

0i 1k>k01

∑i∈G2

k∑t=1

x′it

k∑t=1

xitx′it

−1k∑

xituit +Op

(1√T

)= Op(1) +

∑i∈G1

k∑t=1

x′it

k∑t=1

xitx′it

k∑t=k01+1

xitx′itδ

0i 1k>k01

+Op(1) +Op

= Op(1) + UH121 1k>k01

+Op(1) +Op

). (D.63)

Considering the third term UH13 , for individuals i ∈ Gj , the coefficient estimator is

T∑t=k+1

xitx′it

−1T∑

xityit −

k∑t=1

xitx′it

−1k∑

xityit

T∑t=k+1

xitx′it

−1T∑

xit(x′itβ

0i + x′itδ

0i + uit)1k≥k0j

k0j∑t=k+1

xit(x′itβ

0i + uit) +

T∑t=k0j+1

xit(x′itβ

0i + x′itδ

0i + uit)

1k<k0j

β0i +

k∑t=1

xitx′it

−1k∑

xituit +

k∑t=1

xitx′it

−1k∑

t=k0j+1

xitx′itδ

0i 1k>k0j

= δ0i +

T∑t=k+1

xitx′it

−1T∑

xituit −

k∑t=1

xitx′it

−1k∑

xituit

T∑t=k+1

xitx′it

−1 k0j∑t=k+1

xitx′itδ

0i 1k<k0j

k∑t=1

xitx′it

−1k∑

t=k0j+1

xitx′itδ

0i 1k>k0j

where the fourth term in the final equality is Op(1/T ) for individuals in group 1 by using

order (D.59), while the fifth term is Op(1/T ) for individuals in group 2 by using order (D.60).

Then, the third term UH13 can be rewritten as 1√

N∑i=1

k∑t=k+1

x′it√T

T∑t=k+1

xitx′it

−1T∑

xituit −

k∑t=1

xitx′it

−1k∑

xituit

)− 1√

∑i∈G1

k∑t=k+1

x′it

k∑t=1

xitx′it

−1k∑

t=k01+1

xitx′itδ

0i 1k>k01

− 1√N

∑i∈G2

k∑t=k+1

x′it

T∑t=k+1

xitx′it

k02∑t=k+1

xitx′itδ

0i 1k<k02

) 1k>k

[Op(1)−Op

)− UH1

31 1k>k01− UH1

32 1k<k02−Op

)]1k>k. (D.64)

Thus, from (D.62), (D.63) and (D.64), (D.61) can be rewritten by

1√NT

N∑i=1

k∑t=1

= Op(1)−

[Op(1) + UH1

21 1k>k01+Op(1) +Op

[Op(1)−Op

)− UH1

31 1k>k01− UH1

32 1k<k02−Op

)]1k>k

+UH14 + UH1

)− UH1

10 − UH111

= −UH121 1k>k01

+[UH131 1k>k01

+ UH132 1k<k02

]1k>k + UH1

4 + UH15 − UH1

10 − UH111

+Op(1) +Op

). (D.65)

We next show that (D.65) diverges at the rate $\sqrt{NT}$ under the alternative in the following three cases.

Case (i). Suppose that k < k01 < k02, we have

UH121 1k>k01

= 0, UH131 1k>k01

= 0, UH14 = 0, UH1

5 = 0.

Choosing k ∈ (k01 + C1, k02], we can see that UH1

11 = 0, and

−UH110 + UH1

32 1k<k021k>k

= − 1√NT

∑i∈G2

k∑t=k+1

x′itδ0i +

1√NT

∑i∈G2

k∑t=k+1

x′it

T∑t=k+1

xitx′it

−1k02∑

xitx′itδ

= −√

∑i∈G2

k∑t=k+1

x′it

T∑t=k+1

xitx′it

T∑t=k02+1

xitx′itδ

= Op(√NT ).

Thus, we have

supk∈[1,T−1]

USNT (k, k) ≥ supk∈(k01+C1,k02 ]

USNT (k, k)

= supk∈(k01+C1,k02 ]

(UH132 − UH1

10 +Op(1) +Op

= Op(NT ).

Case (ii). Suppose that k01 ≤ k ≤ k02. If k ∈ (k01 + C1, k02], choosing k ∈ [k01, k

01 + C1], we

UH131 1k>k = 0, UH1

32 1k>k = 0, UH15 = 0, UH1

10 = 0, UH111 = 0,

UH14 =

1√NT

∑i∈G1

k∑t=k01+1

x′itδ0i = Op

since k < k, and

UH121 =

∑i∈G1

k∑t=1

x′it

k∑t=1

xitx′it

k∑t=k01+1

xitx′itδ

0i = Op(

√NT ).

Thus, we have

supk∈[1,T−1]

USNT (k, k) ≥ supk∈[k01 ,k01+C1]

USNT (k, k)

= supk∈[k01 ,k01+C1]

(UH121 +Op(1) +Op

= Op(NT ).

If k ∈ [k01, k01 + C1], since (k − k01)/T = Op(1/T ),

UH121 1k>k01

), UH1

31 1k>k01= Op

), UH1

4 = Op

), UH1

5 = Op

Choosing k = k02, we have UH111 = 0, and

UH132 1k<k02

− UH110

=1√NT

∑i∈G2

k02∑t=k+1

x′it

T∑t=k+1

xitx′it

−1k02∑

xitx′itδ

0i −

1√NT

∑i∈G2

k02∑t=k+1

x′itδ0i

= −√

∑i∈G2

k02∑t=k+1

x′it

T∑t=k+1

xitx′it

T∑t=k02+1

xitx′itδ

(√NT

Thus, we have

supk∈[1,T−1]

USNT (k, k) ≥ USNT (k02, k) =

(UH132 − UH1

10 +Op(1) +Op

= Op(NT ).

Case (iii). Suppose that k01 < k02 < k, we have

UH132 1k<k02

= 0, UH110 = 0, UH1

11 = 0.

Choosing k ∈ (k01, k01 + C1], we can see that UH1

31 1k>k = 0, UH15 = 0, UH1

4 = Op

(√N/T

UH121 1k>k01

=1√NT

N∑i=1

k∑t=1

x′it

k∑t=1

xitx′it

−1k∑

t=k01+1

xitx′itδ

N∑i=1

k∑t=1

x′it

k∑t=1

xitx′it

k∑t=k01+1

xitx′itδ

= Op(√NT ).

Thus, we have

supk∈[1,T−1]

USNT (k, k) ≥ supk∈(k01 ,k01+C1]

USNT (k, k)

= supk∈(k01 ,k01+C1]

(UH121 +Op(1) +Op

= Op(NT ).

The proof of Proposition 4.2 is complete.

Proof of Proposition 4.3. From Proposition 4.1, under the alternative $H_{1A}$, the estimated common break $\hat k$ takes a value in $[k_1^0 - C_1, k_2^0 + C_2]$ with probability approaching one, for arbitrary positive constants $C_1$, $C_2$. Thus, we investigate the limiting properties of the normalization factor in three cases: $k_1^0 - C_1 \le \hat k < k_1^0$, $k_1^0 \le \hat k \le k_2^0$, and $k_2^0 < \hat k \le k_2^0 + C_2$.

Case (i). Suppose that k01 − C1 ≤ k < k01, we have,

inf(k1,k2)∈Ω(ϵ)

VNT (k1, k, k2) ≤ VNT (k1, k, k02), for k1 ∈ Ω(ϵ).

To show that the minimum value of VNT (k1, k, k2) is stochastically bounded, it is sufficient

to show that for any k1 ∈ Ω(ϵ),

VNT (k1, k, k02) = Op(1).

In this case, the model is estimated by regressing Yi on [Xi, X1i(k1, k), X2i(k, k02), X3i(k

which is written as,

Yi = [Xi, X1i(k1, k), X2i(k, k02), X3i(k

βiδ1iδ2iδ3i

= Xi(k1, k, k02)b1i + ui, (D.66)

while the true model with distinct common breaks is defined by

Yi = [Xi, X1i(k, k01), X2i(k

02), X3i(k

01i + ui

= Xi(k, k01, k

01i + ui (D.67)

b01i =

i′, 0, δ0i

′, δ0i

′]′ if i ∈ G1,

[β0i′, 0, 0, δ0i

′]′ if i ∈ G2.

Replacing Yi in (D.66) by (D.67), the residuals can be written by, for individuals in group 1,

ui = Xi(k, k01, k

01i + ui − Xi(k1, k, k

02)b1i(k)

= ui − Xi(k1, k, k02)[b1i(k)− b01i] + [Xi(k, k

02)− Xi(k1, k, k

= ui − Xi(k1, k, k02)

βi − β0

δ1iδ2i − δ0iδ3i − δ0i

+ [0, X1i(k, k01)−X1i(k1, k), X2i(k

02)−X2i(k, k

02), 0]

0δ0iδ0i

= ui −Xi(βi − β0

i )−X1i(k1, k)δ1i −X2i(k, k02)(δ2i − δ0i )−X3i(k

02)(δ3i − δ0i )

+[X2i(k0, k02)−X2i(k, k

02)]δ

0i . (D.68)

For individuals in group 2, we have,

ui = ui − Xi(k1, k, k02)

βi − β0

δ1iδ2i

δ3i − δ0i

+ [0, X1i(k, k01)−X1i(k1, k), X2i(k

02)−X2i(k, k

02), 0]

00δ0i

= ui −Xi(βi − β0

i )−X1i(k1, k)δ1i −X2i(k, k02)δ2i −X3i(k

02)(δ3i − δ0i ). (D.69)

By the definition of the denominator, VNT (k1, k, k02) can be decomposed into four parts V H1

V H12 , V H1

3 and V H14 , defined by

V H11 =

k1∑s=1

(1√NT

N∑i=1

s∑t=1

, V H12 =

k∑s=k1+1

1√NT

N∑i=1

k∑t=s

V H13 =

k02∑s=k+1

1√NT

N∑i=1

s∑t=k+1

, V H14 =

T∑s=k02+1

(1√NT

N∑i=1

T∑t=s

From (D.68) and (D.69), for $t \le \hat k$, the residuals $\hat u_{it}$ are calculated on the basis of the subsamples $x_{i1},\cdots,x_{ik_1}$ and $x_{i(k_1+1)},\cdots,x_{i\hat k}$, which are the same as those in (D.22) under the null. Using the asymptotic distributions in (D.23)-(D.27) and $k_1^0 - \hat k = O_p(1)$, we can derive the limiting distribution of the terms $V_1^{H_1}$ and $V_2^{H_1}$ as follows:

\[
V_1^{H_1} + V_2^{H_1} \Rightarrow \sigma^2\int_{0}^{\tau_1}\Big(W(r) - \frac{r}{\tau_1}W(\tau_1)\Big)^2 dr
+ \sigma^2\int_{\tau_1}^{\tau_1^0}\Big[W(\tau_1^0) - W(r) - \frac{\tau_1^0 - r}{\tau_1^0 - \tau_1}\big(W(\tau_1^0) - W(\tau_1)\big)\Big]^2 dr.
\]

We next consider the third term, which can be rewritten as

V H13 =

k02∑s=k+1

1√NT

N∑i=1

s∑t=k+1

uit −1√NT

N∑i=1

s∑t=k+1

1√NT

∑i∈G1

s∑t=k+1

− 1√NT

∑i∈G1

s∑t=k+1

x′itδ0i 1s≤k01 +

k01∑t=k+1

x′itδ0i 1s>k01

− 1√NT

∑i∈G2

s∑t=k+1

x′itδ2i

(D.70)

k02∑s=k+1

1√NT

N∑i=1

s∑t=k+1

uit −1√NT

N∑i=1

s∑t=k+1

1√NT

∑i∈G1

s∑t=k+1

− Op

)− 1√

∑i∈G2

s∑t=k+1

x′itδ2i

k02∑s=k+1

(V H131 − V H1

32 − V H133 −Op

)− V H1

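Since $k_1^0 - \hat k = O_p(1)$, the terms in parentheses in (D.70) are $o_p(1)$ and will vanish as $N,T \to \infty$. Similar to $V_{31}$ and $V_{32}$, we have

\[
V_{31}^{H_1} \Rightarrow \sigma\big(W(r) - W(\tau_1^0)\big), \tag{D.71}
\]

\[
V_{32}^{H_1} \Rightarrow \sigma(r - \tau_1^0)\frac{W(\tau_1)}{\tau_1}. \tag{D.72}
\]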

The coefficient estimator δ2i is calculated by, for i ∈ G1,

δ2i =

k02∑t=k+1

xitx′it

−1k02∑

xityit −

(k1∑t=1

xitx′it

)−1 k1∑t=1

xityit

k02∑t=k+1

xitx′it

−1 k01∑t=k+1

xit(x′itβ

0i + uit) +

k02∑t=k01+1

xit(x′itβ

0i + x′itδ

0i + uit)

β0i +

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit

= δ0i +

k02∑t=k+1

xitx′it

−1k02∑

xituit −

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit −Op

for i ∈ G2,

δ2i =

k02∑t=k+1

xitx′it

−1k02∑

xit(x′itβ

0i + uit)−

β0i +

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit

k02∑t=k+1

xitx′it

−1k02∑

xituit −

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit.

Then, we have,

V H133 + V H1

34 =1√N

N∑i=1

s∑t=k01+1

x′it +Op

k02∑

t=k01+1

xitx′it

−1k02∑

t=k01+1

xituit

(k1∑t=1

xitx′it

)−1 k1∑t=1

xituit + Op

(1√T

)]⇒ σ(r − τ01 )

(W (τ02 )−W (τ01 )

τ02 − τ01− W (τ1)

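Thus, we can find that the limiting distribution of $V_3^{H_1}$ is

\[
\sigma^2\int_{\tau_1^0}^{\tau_2^0}\Big[W(r) - W(\tau_1^0) - \frac{r - \tau_1^0}{\tau_2^0 - \tau_1^0}\big(W(\tau_2^0) - W(\tau_1^0)\big)\Big]^2 dr.
\]

Since the coefficient estimator $\hat\delta_{3i}$ remains the same in groups 1 and 2, we have

\[
\begin{aligned}
V_4^{H_1} &= \frac{1}{T}\sum_{s=k_2^0+1}^{T}\Big[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\Big(\sum_{t=s}^{T}u_{it} - \sum_{t=s}^{T}x_{it}'(\hat\beta_i - \beta_i^0) - \sum_{t=s}^{T}x_{it}'(\hat\delta_{3i} - \delta_i^0)\Big)\Big]^2 \\
&\Rightarrow \sigma^2\int_{\tau_2^0}^{1}\Big[W(1) - W(r) - \frac{1 - r}{1 - \tau_2^0}\big(W(1) - W(\tau_2^0)\big)\Big]^2 dr.
\end{aligned}
\]

Thus, we can say that

\[
\inf_{(k_1,k_2)\in\Omega(\epsilon)}V_{NT}(k_1,\hat k,k_2) \le V_{NT}(k_1,\hat k,k_2^0) = V_1^{H_1} + V_2^{H_1} + V_3^{H_1} + V_4^{H_1} = O_p(1).
\]

The proof of case (i) is complete.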

Case (ii). Suppose that $k_1^0 \le \hat k \le k_2^0$. In this case, we have

\[
\inf_{(k_1,k_2)\in\Omega(\epsilon)}V_{NT}(k_1,\hat k,k_2) \le V_{NT}(k_1^0,\hat k,k_2^0).
\]

We can easily find that the term $V_{NT}(k_1^0,\hat k,k_2^0)$, which is estimated by using the true break points, has a finite limiting distribution.

Case (iii). Suppose that $k_2^0 < \hat k \le k_2^0 + C_2$. In this case, from (D.60), we have $\hat k - k_2^0 = O_p(1)$. Similarly to the proof of case (i), we can show that

\[
\inf_{(k_1,k_2)\in\Omega(\epsilon)}V_{NT}(k_1,\hat k,k_2) \le V_{NT}(k_1^0,\hat k,k_2) = O_p(1), \quad \text{for any } k_2 \in \Omega(\epsilon).
\]

Thus, we complete the proof of Proposition 4.3.

Proof of Theorem 4.2. From Proposition 4.1, we show that $P(\hat k \in [k_1^0 - C_1, k_2^0 + C_2]) \to 1$.

Furthermore, for any k ∈ [k01 − C1, k02 + C2],

supk∈Ω(ϵ)

sup(k1,k2)∈Ω(ϵ)

V −1NT (k1, k, k2) = Op(1) (or ∞),

from Propositions 4.2 and 4.3. Thus, the proof of Theorem 4.2 is complete.

Table 4.1: Critical values

  c_{λ0}     10%      5%       1%
  c_{0.1}    43.425   56.822   92.840
  c_{0.2}    43.912   57.249   92.341
  c_{0.3}    45.501   57.962   93.689
  c_{0.4}    45.427   57.997   90.335
  c_{0.5}    45.540   57.842   85.984
  c_{0.6}    45.250   57.276   90.397
  c_{0.7}    46.489   59.175   93.728
  c_{0.8}    45.201   59.248   94.886
  c_{0.9}    43.515   57.203   92.908

Table 4.2: Size of the test (DGP.1)

  T     N     10%     5%      1%
(a) ρ = 0
  20    10    0.145   0.089   0.034
        50    0.136   0.074   0.026
        100   0.098   0.053   0.015
  50    10    0.086   0.048   0.011
        50    0.076   0.036   0.009
        100   0.063   0.027   0.006
  100   10    0.073   0.032   0.005
        50    0.072   0.034   0.006
        100   0.064   0.033   0.004
  200   10    0.083   0.037   0.008
        50    0.075   0.032   0.006
        100   0.086   0.041   0.008
(b) ρ = 0.4
  20    10    0.226   0.146   0.060
        50    0.234   0.153   0.069
        100   0.231   0.143   0.058
  50    10    0.134   0.068   0.022
        50    0.151   0.084   0.028
        100   0.145   0.084   0.024
  100   10    0.113   0.063   0.017
        50    0.105   0.058   0.016
        100   0.101   0.055   0.016
  200   10    0.107   0.048   0.013
        50    0.091   0.043   0.009
        100   0.091   0.050   0.013
(c) ρ = 0.8
  20    10    0.457   0.353   0.203
        50    0.468   0.353   0.201
        100   0.450   0.360   0.206
  50    10    0.352   0.262   0.147
        50    0.393   0.299   0.156
        100   0.364   0.277   0.150
  100   10    0.252   0.178   0.082
        50    0.246   0.168   0.080
        100   0.246   0.164   0.079
  200   10    0.179   0.110   0.038
        50    0.164   0.099   0.034
        100   0.155   0.100   0.036

Table 4.3: Power of the test, DGP.2 (under H1A)

  T     N     10%     5%      1%
(a) ρ = 0
  20    10    0.149   0.088   0.021
        50    0.586   0.455   0.222
        100   0.846   0.741   0.474
  50    10    0.281   0.174   0.046
        50    0.918   0.847   0.614
        100   0.993   0.982   0.911
  100   10    0.561   0.425   0.190
        50    0.996   0.980   0.916
        100   1.000   1.000   0.992
(b) ρ = 0.4
  20    10    0.304   0.216   0.100
        50    0.803   0.722   0.519
        100   0.965   0.929   0.789
  50    10    0.390   0.276   0.121
        50    0.950   0.901   0.742
        100   0.997   0.991   0.946
  100   10    0.611   0.491   0.258
        50    0.994   0.985   0.932
        100   1.000   1.000   0.994
(c) ρ = 0.8
  20    10    0.745   0.675   0.505
        50    0.995   0.986   0.946
        100   1.000   0.999   0.990
  50    10    0.713   0.622   0.452
        50    0.996   0.988   0.934
        100   1.000   1.000   0.992
  100   10    0.771   0.683   0.493
        50    0.998   0.993   0.964
        100   1.000   1.000   0.999

1 $k_1^0 = [T/4]$, $k_2^0 = [3T/4]$, $N_1 : N_2 = 5 : 5$.

  ρ     δ1i, δ2i      10%     5%      1%
  0     U(0,0.1)      0.130   0.064   0.012
        U(0.1,0.2)    0.538   0.401   0.167
        U(0.2,0.3)    0.918   0.850   0.611
        U(0.3,0.4)    0.995   0.984   0.919
        U(0.4,0.5)    1.000   0.999   0.986
        U(0.5,0.6)    1.000   1.000   0.998
        U(0.6,0.7)    1.000   1.000   1.000
        U(0.7,0.8)    1.000   1.000   1.000
        U(0.8,0.9)    1.000   1.000   1.000
        U(0.9,1.0)    1.000   1.000   1.000
        U(1.4,1.5)    1.000   1.000   1.000
  0.4   U(0,0.1)      0.206   0.138   0.037
        U(0.1,0.2)    0.652   0.540   0.292
        U(0.2,0.3)    0.955   0.913   0.750
        U(0.3,0.4)    0.997   0.992   0.949
        U(0.4,0.5)    1.000   1.000   0.993
        U(0.5,0.6)    1.000   1.000   1.000
        U(0.6,0.7)    1.000   1.000   1.000
        U(0.7,0.8)    1.000   1.000   1.000
        U(0.8,0.9)    1.000   1.000   1.000
        U(0.9,1.0)    1.000   1.000   1.000
        U(1.4,1.5)    1.000   1.000   1.000

1 $T = 50$, $N = 50$.
2 $k_1^0 = [T/4]$, $k_2^0 = [3T/4]$, $N_1 : N_2 = 5 : 5$.

  k_1^0     k_2^0      10%     5%      1%
  [0.2T]    [0.25T]    0.120   0.066   0.018
            [0.3T]     0.325   0.225   0.070
            [0.4T]     0.801   0.695   0.465
            [0.5T]     0.941   0.891   0.734
            [0.6T]     0.955   0.918   0.764
            [0.7T]     0.925   0.875   0.692
            [0.8T]     0.859   0.771   0.558

1 $N = T = 50$, $\rho = 0.4$, $N_1 : N_2 = 5 : 5$.

  N1 : N2    10%     5%      1%
  2:N-2      0.168   0.105   0.037
  1:9        0.262   0.187   0.073
  2:8        0.546   0.447   0.241
  3:7        0.811   0.719   0.500
  4:6        0.938   0.888   0.737
  5:5        0.978   0.950   0.840

1 $N = T = 50$, $\rho = 0.4$.
2 $k_1^0 = [0.3T]$, $k_2^0 = [0.7T]$.

Table 4.7: Power of the test (under H1A)

              Non-orthogonal changes    (Non-)Orthogonal changes   Orthogonal changes
  T     N     10%     5%      1%        10%     5%      1%         10%     5%      1%
(a) ρ = 0
  20    20    0.255   0.166   0.051     0.139   0.085   0.024      0.072   0.040   0.009
        50    0.565   0.434   0.209     0.184   0.120   0.050      0.084   0.045   0.014
        100   0.839   0.733   0.515     0.193   0.129   0.056      0.086   0.049   0.007
  50    20    0.564   0.423   0.186     0.124   0.072   0.018      0.064   0.029   0.006
        50    0.905   0.840   0.615     0.114   0.063   0.017      0.071   0.041   0.006
        100   0.992   0.979   0.897     0.069   0.035   0.009      0.065   0.028   0.004
  100   20    0.822   0.731   0.497     0.103   0.058   0.014      0.073   0.036   0.007
        50    0.994   0.980   0.905     0.081   0.043   0.006      0.077   0.037   0.005
        100   1.000   1.000   0.994     0.084   0.043   0.009      0.073   0.035   0.006
(b) ρ = 0.4
  20    20    0.478   0.357   0.181     0.273   0.202   0.088      0.179   0.113   0.043
        50    0.798   0.701   0.480     0.299   0.220   0.107      0.198   0.133   0.052
        100   0.953   0.910   0.775     0.277   0.206   0.111      0.209   0.141   0.052
  50    20    0.662   0.550   0.329     0.175   0.106   0.037      0.114   0.065   0.016
        50    0.939   0.893   0.740     0.160   0.094   0.028      0.131   0.076   0.026
        100   0.997   0.990   0.943     0.136   0.080   0.022      0.130   0.077   0.020
  100   20    0.844   0.766   0.548     0.128   0.067   0.022      0.113   0.063   0.018
        50    0.994   0.985   0.922     0.117   0.061   0.015      0.101   0.053   0.011
        100   1.000   1.000   0.995     0.117   0.063   0.015      0.105   0.058   0.018
(c) ρ = 0.8
  20    20    0.882   0.834   0.698     0.468   0.384   0.229      0.400   0.292   0.154
        50    0.988   0.977   0.924     0.422   0.343   0.202      0.412   0.321   0.180
        100   1.000   1.000   0.991     0.408   0.316   0.184      0.438   0.339   0.179
  50    20    0.881   0.825   0.677     0.348   0.272   0.150      0.324   0.247   0.121
        50    0.989   0.977   0.928     0.322   0.240   0.122      0.311   0.226   0.114
        100   1.000   0.998   0.991     0.300   0.224   0.117      0.321   0.236   0.116
  100   20    0.930   0.871   0.724     0.229   0.169   0.072      0.208   0.153   0.065
        50    0.999   0.996   0.974     0.231   0.163   0.062      0.209   0.140   0.054
        100   1.000   1.000   0.996     0.234   0.164   0.064      0.209   0.142   0.058

1 $k_1^0 = [T/4]$, $k_2^0 = [3T/4]$, $N_1 : N_2 = 5 : 5$.

Table 4.8: Power of the test (ρ = 0.4 under H2A)

  k_1^0     k_2^0      k_3^0     N1:N2:N3   T     N     10%     5%      1%
  [T/6]     [3T/6]     [4T/6]    3:3:4      50    10    0.296   0.204   0.080
                                                  50    0.646   0.522   0.284
                                                  100   0.740   0.642   0.445
  [0.4T]    [0.5T]     [0.6T]    3:3:4      50    10    0.248   0.165   0.058
                                                  50    0.485   0.381   0.211
                                                  100   0.629   0.506   0.296
  [0.2T]    [0.25T]    [0.5T]    3:3:4      50    10    0.342   0.239   0.109
                                                  50    0.835   0.750   0.562
                                                  100   0.964   0.923   0.809
  [0.2T]    [0.3T]     [0.8T]    3:3:4      50    10    0.334   0.235   0.087
                                                  50    0.704   0.591   0.363
                                                  100   0.851   0.772   0.583
  [0.2T]    [0.5T]     [0.8T]    1:4:5      50    10    0.334   0.224   0.087
                                                  50    0.851   0.758   0.566
                                                  100   0.925   0.866   0.725

Table 4.9: Power of the test (under H3A)

  T     N     10%     5%      1%
(a) ρ = 0
  20    10    0.147   0.089   0.022
        50    0.548   0.412   0.191
        100   0.654   0.497   0.255
  50    10    0.281   0.173   0.050
        50    0.688   0.515   0.247
        100   0.884   0.770   0.474
  100   10    0.468   0.329   0.129
        50    0.908   0.815   0.541
        100   0.969   0.895   0.637
(b) ρ = 0.4
  20    10    0.344   0.243   0.109
        50    0.725   0.584   0.352
        100   0.842   0.740   0.493
  50    10    0.300   0.204   0.076
        50    0.842   0.730   0.477
        100   0.903   0.810   0.538
  100   10    0.517   0.392   0.183
        50    0.942   0.874   0.642
        100   0.879   0.761   0.438
(c) ρ = 0.8
  20    10    0.785   0.692   0.457
        50    0.879   0.728   0.382
        100   0.980   0.914   0.626
  50    10    0.604   0.489   0.314
        50    0.906   0.800   0.508
        100   0.940   0.833   0.485
  100   10    0.710   0.621   0.393
        50    0.940   0.864   0.614
        100   0.840   0.656   0.276

Bibliography

[1] Adesanya, O. (2020). Testing and Dating Structural Breaks in Generalized Linear Multivariate Models for Stock Market Contagion. SSRN.

[2] Anatolyev, S., and Kosenok, G. (2018). Sequential testing with uniformly distributed

size. Journal of Time Series Econometrics 10(2),1-22.

[3] Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance

matrix estimation. Econometrica 59, 817-858.

[4] Andrews, D. W. K., and Monahan, J. C. (1992). An improved heteroskedasticity and

autocorrelation consistent covariance matrix estimator. Econometrica 60(4),953-966.

[5] Antoch, J., Hanousek, J., Horvath, L., Huskova, M., and Wang, S. X. (2019). Structural

breaks in panel data: Large number of panels and short length time series. Econometric

Reviews, 38(7), 828-855.

[6] Aue, A., and Horvath, L. (2004). Delay time in sequential detection of change. Statistics

and Probability Letters 67(3),221-231.

[7] Aue, A., Horvath, L., Huskova, M., and Kokoszka, P. (2008a). Testing for changes in

polynomial regression. Bernoulli 14(3),637-660.

[8] Aue, A., Horvath, L., Kokoszka, P. and Steinebach, J. (2008b). Monitoring shifts in

mean: asymptotic normality of stopping times. Test 17(3),515-530.

[9] Aue, A., Horvath, L., Kuhn, M., and Steinebach, J. (2012). On the reaction time of

moving sum detectors. Journal of Statistical Planning and Inference 142(8),2271-2288.

[10] Aue, A., Horvath, L., and Reimherr, M. L. (2009). Delay times of sequential procedures

for multiple time series regression models. Journal of Econometrics 149(2),174-190.

[11] Aue, A., and Kuhn, M. (2008). Extreme value distribution of a recursive-type detector

in a linear model. Extremes 11,135-166.

[12] Bai, J. (1997). Estimation of a Change Point in Multiple Regression Models. Review of

Economics and Statistics, 79(4), 551-563.

[13] Bai, J. (2010). Common breaks in means and variances for panel data. Journal of Econometrics, 157(1), 78-92.

[14] Baltagi, B.H. Feng, Q. and Kao, C. (2016). Estimation of heterogeneous panels with

structural breaks. Journal of Econometrics, 191(1),176-195.

[15] Baltagi, B. H., Kao, C., and Liu, L. (2017). Estimation and identification of change points

in panel models with nonstationary or stationary regressors and error term. Econometric

Reviews, 36(1-3), 85-102.

[16] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

[17] Brown, R. L., Durbin, J., and Evans, J. M. (1975). Techniques for testing the constancy

of regression relationships over time. Journal of the Royal Statistical Society B, 37, 149-

[18] Carsoule, F., and Franses, PH. (2003). A note on monitoring time-varying parameters

in an autoregression. Metrika 57(1),51-62.

[19] Chen, B., and Huang, L. (2018). Nonparametric testing for smooth structural changes

in panel data models. Journal of Econometrics, 202(2), 245-267.

[20] Choi, M.D. (1983). Tricks or Treats with the Hilbert matrix. American Mathematical

Monthly 90(5),301-312.

[21] Chu, C.-S.J., Stinchcombe, M., and White, H. (1996). Monitoring structural change.

Econometrica 64(5),1045-1065.

[22] Chu, C.-S.J., and White, H. (1992). A direct test for changing trend. Journal of Business

and Economic Statistics 10(3),289-299.

[23] Claeys, P., and Vašíček, B. (2014). Measuring bilateral spillover and testing contagion on sovereign bond markets in Europe. Journal of Banking & Finance, 46, 151-165.

[24] Crainiceanu, C. M., and Vogelsang, T. J. (2007). Nonmonotonic power for tests of a mean

shift in a time series. Journal of Statistical Computation and Simulation 77, 457-476.

[25] Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press, Oxford.

[26] De Wachter, S., and Tzavalis, E. (2012). Detection of structural breaks in linear dynamic

panel data models. Computational Statistics & Data Analysis, 56(11), 3020-3034.

[27] Deng, A., and Perron, P. (2008). A non-local perspective on the power properties of the

CUSUM and CUSUM of squares tests for structural change. Journal of Econometrics

142, 212-240.

[28] Doob, J. (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. The Annals

of Mathematical Statistics 20(3),393-403.

[29] Fremdt, S. (2014). Asymptotic distribution of the delay time in Page’s sequential procedure. Journal of Statistical Planning and Inference 145,74-91.

[30] Fremdt, S. (2015). Page’s sequential procedure for change-point detection in time-series

regression. Statistics: A Journal of Theoretical and Applied Statistics 49(1),128-155.

[31] Garbade, K. (1977). Two methods for examining the stability of regression coefficients.

Journal of the American Statistical Association 72:357, 54-63.

[32] Hidalgo, J., and Schafgans, M. (2017). Inference and testing breaks in large dynamic

panels with strong cross sectional dependence. Journal of Econometrics, 196(2), 259-

[33] Hitotumatu, E. (1988). Cholesky decomposition of the Hilbert matrix. Japan Journal of

Applied Mathematics 5,135-144.

[34] Horvath, L., Kuhn, M., and Steinebach, J. (2008). On the performance of the fluctuation

test for structural change. Sequential Analysis 27(2),126-140.

[35] Horvath, L. (1997). Detection of changes in linear sequences. Annals of the Institute of

Statistical Mathematics 49,271-283.

[36] Horvath, L., and Huskova, M (2012). Change-point detection in panel data. Journal of

Time Series Analysis 33(4), 631-648.

[37] Horvath, L., Huskova, M., Kokoszka, P., and Steinebach, J. (2004). Monitoring changes

in linear models. Journal of Statistical Planning and Inference 126(1),225-251.

[38] Horvath, L., Huskova, M, Rice, G., and Wang, J. (2017). Asymptotic properties of the

cusum estimator for the time of change in linear panel data models. Econometric Theory

33(2), 366-412.

[39] Horvath, L., Kokoszka, P., and Steinebach, J. (2007). On sequential detection of parameter changes in linear regression. Statistics and Probability Letters 77(9),885-895.

[40] Huskova, M., and Koubkova, A. (2005). Monitoring jump changes in linear models.

Journal of Statistical Research 39:2, 51-70.

[41] Jiang, P., and Kurozumi, E. (2019). Power properties of the modified CUSUM tests.

Communications in Statistics - Theory and Methods, 48(12), 2962-2981.

[42] Jiang, P., and Kurozumi, E. (2020). Monitoring parameter changes in models with a trend. Journal of Statistical Planning and Inference, 207, 288-319.

[43] Juhl, T., and Xiao, Z. (2009). Tests for changing mean with monotonic power. Journal

of Econometrics 148, 14-24.

[44] Kejriwal, M. (2009). Tests for a mean shift with good size and monotonic power. Economics Letters 102, 78-82.

[45] Kim, D. (2011). Estimating a common deterministic time trend break in large panels

with cross sectional dependence. Journal of Econometrics, 164(2), 310-330.

[46] Kim, D. (2014). Common breaks in time trends for large panel data with a factor structure. Econometrics Journal, 17(3), 301-337.

[47] Kramer, W., Ploberger, W., and Alt, R. (1988). Testing for Structural Change in Dynamic Models. Econometrica 56, 1355-1369.

[48] Kuan, C. M. (1998). Tests for changes in models with a polynomial trend. Journal of

Econometrics 84(1),75-91.

[49] Kurozumi, E. (2017). Monitoring parameter constancy with endogenous regressors. Journal of Time Series Analysis 38(5),791-805.

[50] Lee, S., Lee, Y., and Na, O. (2009). Monitoring distributional changes in autoregressive

models. Communications in Statistics-Theory and Methods 38,2969-2982.

[51] Leisch, F., Hornik, K., and Kuan, C. M. (2000). Monitoring structural changes with the

generalized fluctuation test. Econometric Theory 16(6),835-854.

[52] Li, D., Qian, J., and Su, L. (2016). Panel Data Models With Interactive Fixed Effects and

Multiple Structural Breaks. Journal of the American Statistical Association, 111(516),

1804-1819.

[53] Luger, R. (2001). A modified CUSUM test for orthogonal structural changes. Economics

Letters 73, 301-306.

[54] Lumsdaine, R. L., Okui, R., and Wang, W. (2020). Estimation of Panel Group Structure

Models with Structural Breaks in Group Memberships and Coefficients. SSRN Electronic

Journal.

[55] Na, O., Lee, Y., and Lee, S. (2011). Monitoring parameter change in time series models.

Statistical Methods and Applications 20(2),171-199.

[56] Oka, T., and Perron, P. (2018). Testing for common breaks in a multiple equations

system. Journal of Econometrics, 204, 66-85.

[57] Okui, R., and Wang, W. (2020). Heterogeneous structural breaks in panel data models.

[58] Pauwels, L.L., Chan, F., and Griffoli, T.M. (2012). Testing for structural change in

heterogeneous panels with an application to the Euro’s trade effect. Journal of Time

Series Econometrics, 4:1-33.

[59] Perron, P. (1989). The great crash, the oil price shock, and the unit root hypothesis.

Econometrica 57(6),1361-1401.

[60] Perron, P., and Yabu, T. (2009). Testing for shifts in trend with an integrated or stationary noise component. Journal of Business and Economic Statistics 27(3),369-396.

[61] Phillips, P.C.B., and Solo, V. (1992). Asymptotics for linear processes. Annals of Statistics, 20:971-1001.

[62] Ploberger, W., and Kramer, W. (1990). The local power of the CUSUM and CUSUM of

squares tests. Econometric Theory 6, 335-347.

[63] Ploberger, W., and Kramer, W. (1992). The CUSUM test with OLS residuals. Econo-

metrica 60, 271-285.

[64] Qi, P., Duan, X., Tian, Z., and Li, F. (2016). Sequential monitoring for changes in models

with a polynomial trend. Communications in Statistics-Simulation and Computation

45(1),222-239.

[65] Qian, J., and Su, L. (2016). Shrinkage estimation of common breaks in panel data models

via adaptive group fused lasso. Journal of Econometrics, 191, 86-109.

[66] Shao, X., and Zhang, X. (2010). Testing for change points in time series. Journal of the American Statistical Association 105, 1228-1240.

[67] Stohr, C. (2019). Sequential change point procedures based on U-statistics and

the detection of covariance changes in functional data. On Semantic Scholar

(https://www.semanticscholar.org/), DOI:10.25673/13826

[68] Tang, S. M., and MacNeill, I. B. (1993). The effect of serial correlation on tests for

parameter change at unknown time. Annals of Statistics 21, 552-575.

[69] Vogelsang, T. J. (1999). Sources of nonmonotonic power when testing for a shift in mean

of a dynamic time series. Journal of Econometrics 88, 283-299.

[70] Westerlund, J. (2019). Common Breaks in Means for Cross-correlated Fixed-T Panel

Data. Journal of Time Series Analysis, 40(2), 248-255.

[71] Xia, Z. M., Guo, P. J., and Zhao, W. Z. (2011). CUSUM methods for monitoring structural changes in structural equations. Communications in Statistics-Theory and Methods 40:6, 1109-1123.

[72] Yamazaki, D., and Kurozumi, E. (2015). Improving the finite sample performance of

tests for a shift in mean. Journal of Statistical Planning and Inference 167, 144-173.

[73] Yang, J., and Vogelsang, T. J. (2011). Fixed-b analysis of LM-type tests for a shift in

mean. Econometrics Journal 14, 438-456.
