Essays on Testing for Structural Changes
by
Peiyun Jiang
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Economics
Graduate School of Economics
Hitotsubashi University
November, 2020
Acknowledgements
This dissertation is composed of three testing procedures related to structural changes. Chapter
2 is concerned with testing for parameter constancy in time series models, which is based
on Jiang and Kurozumi (2019), “Power properties of the modified CUSUM tests”,
Communications in Statistics - Theory and Methods, 48, 2962-2981. Chapter 3 proposes a sequential
test for structural changes in models with a trend, which is based on Jiang and Kurozumi
(2020), “Monitoring parameter changes in models with a trend”, Journal of Statistical
Planning and Inference, 207, 288-319. Chapter 4 introduces a test for common breaks in panels.
I would like to express the deepest appreciation to my advisor, Professor Eiji Kurozumi,
whose insight and knowledge steered me through this dissertation. He inspired my interest
in time series analysis and has always lit up my career path. He did the best he could to
support me and has always been patient in mapping out my Ph.D. journey, providing suggestions on
research topics, and connecting me with the resources I needed. Without his patient guidance
and persistent help, this dissertation would not have been possible. I am also thankful for
the thoughtful comments and constructive suggestions from Professor Yohei Yamamoto. His
enthusiasm for the research has greatly encouraged me to complete my dissertation. I would
like to thank Professor Toshiaki Watanabe for providing valuable suggestions on empirical
analysis and encouraging me to further explore applications in economics. I also appreciate
Professor Yukitoshi Matsushita and Professor Toshio Honda for their helpful comments in
improving this dissertation.
I am deeply grateful to my wonderful parents for their support and encouragement. They
provided me an opportunity to study abroad and helped me achieve all my goals. I thank
them for always being there through all my hardships. I am also thankful to my friends
who support me, uplift me, and bring light to my life. Special thanks go to Hitotsubashi
University, Nomura Foundation, and Mitsubishi UFJ for financial support.
Peiyun Jiang
November, 2020
Contents
Acknowledgements
1 Overview
1.1 Introduction
1.2 Overview: Chapter 2
1.3 Overview: Chapter 3
1.4 Overview: Chapter 4
2 Modified Tests for Orthogonal Structural Changes in Time Series Models
2.1 Introduction
2.2 Models and Assumptions
2.3 Modified CUSUM Tests
2.3.1 Motivation
2.3.2 Modified CUSUM tests
2.3.3 Serially correlated errors
2.4 Finite Sample Properties
2.5 Conclusion
Appendix A
Tables and Figures
3 A Sequential Test for Structural Changes in Models with a Trend
3.1 Introduction
3.2 Model and Assumptions
3.3 Monitoring Procedure for a Change in the Trend
3.3.1 CUSUM-based monitoring procedure
3.3.2 Extension to higher order polynomials
3.4 Asymptotic Distributions of the Stopping Times
3.5 Finite Sample Properties
3.6 Empirical Example
3.7 Conclusion
Appendix B. Proofs of Theorems 3.1 and 3.2
Appendix C. Proofs of Theorems 3.3 and 3.4
Tables and Figures
4 A New Test for Common Breaks in Heterogeneous Panel Data Models
4.1 Introduction
4.2 Model and Assumptions
4.3 Test Statistic
4.4 Asymptotic Theory
4.5 Finite Sample Properties
4.6 Conclusion
Appendix D
Tables
Chapter 1
Overview
1.1 Introduction
Structural change has long been an important issue in the statistics and econometrics literature,
since ignoring parameter instability may lead to inaccurate forecasts and misleading
inferences. Over the last fifty years or so, several important testing procedures have been
proposed in the econometrics literature and extensively investigated to cover models at a
high level of generality.
One statistic that has played an important role in theory and applications related to
structural changes is the cumulative sum (CUSUM) test proposed by Brown et al. (1975).
This test is based on the CUSUM of recursive residuals and has become especially popular
because it is designed to test the null hypothesis of parameter stability against a variety of
alternatives. With regard to its power properties, Ploberger and Kramer (1990) and Deng
and Perron (2008) indicated that the power of the CUSUM test crucially depends on the
angle between the mean of the regressors and the direction of the shift. A major drawback
is that the test loses power when the mean of the regressors is perpendicular to the shift,
which is referred to as an orthogonal structural change. Several modified versions of the test have
been proposed, explicitly or implicitly, to overcome the problem; see Luger (2001) and Huskova
and Koubkova (2005), among others. However, these modified versions do not show
clear advantages over the original CUSUM tests. To avoid losing power when the
mean of the regressor is orthogonal to the shift in a linear regression model, Chapter 2
considers two versions of the modified CUSUM tests, which are satisfactory in terms of both
size and power.
Most of the existing methodologies related to structural changes are used to examine
parameter instability based on a historical data set of fixed size; these are the so-called
retrospective tests. Because new data arrive steadily in practice, two questions arise:
Is the current model still valid for explaining today’s data? How can breaks be detected
sequentially, as soon as possible, as new data arrive? Repeatedly applying a
retrospective test with a given critical value, however, results in an uncontrolled empirical size.
Starting with the pioneering work of Chu et al. (1996) in the econometrics literature,
sequential tests have emerged to check whether incoming data are consistent with the current
relationship, and they have been widely advocated in the literature. The sequential tests
have been investigated extensively in various models, including Aue and Horvath (2004) in a
mean-shift model, Aue et al. (2009) in a linear model, Carsoule and Franses (2003) and Lee
et al. (2009) in autoregressive models, Na et al. (2011) in GARCH models, Xia et al. (2011)
and Kurozumi (2017) in linear models with endogenous regressors. As noted by Perron
(1989) and others, macroeconomic time series are sometimes better characterized by trend-
stationary series with possible change(s) in deterministics, for instance, commodity retail
sales and consumer price index (CPI). While Qi et al. (2016) extended the fluctuation test to
detect parameter instability in a model with a trend, from a different perspective, Chapter 3
develops a CUSUM-type monitoring scheme based on ordinary least squares (OLS) residuals.
Since the sequential tests generally reject the null hypothesis of no change possibly with a
delay after the break, the speed of detection is a significant indicator for sequential tests. In
Chapter 3, we further investigate the properties of delay times of the CUSUM test as well as
the fluctuation test.
In Chapter 4, the focus shifts from pure time series to panel
data models, in which there are N series and each series has T observations. One of the
main reasons for using panels is that cross-sectional information improves the accuracy of break point
estimators. In time series models the break point estimator fails to be consistent; in panel
frameworks this failure has been overcome by means of a common break assumption, under which the
break occurs at the same location in every series. In practice, however, the common break assumption is
restrictive, and some evidence indicates that break points are likely to vary
significantly across individuals (see Claeys and Vašíček 2014; Adesanya 2020). The validity
of the common break assumption, on which we focus in Chapter 4, is an issue of interest for most
applications, but no satisfactory methodology has been proposed to cope with this problem.
To fill this gap, we introduce a test to determine whether the break dates are common or
can vary across series in panel data models.
This dissertation consists of three testing procedures related to structural changes. It
includes modified tests to detect orthogonal breaks in a univariate time series in Chapter 2, a
sequential test to continuously detect parameter instability in a model with polynomial trend
in Chapter 3, and a new test for common breaks in panel data models in Chapter 4.
1.2 Overview: Chapter 2
In Chapter 2, we consider a linear regression model given by
yt = x′tβt + ut (t = 1, 2, · · · , T ),
where xt = [x1t, x2t, · · · , xkt]′ is a k-dimensional regressor, ut is an unobservable stochastic
disturbance, and βt is a k-dimensional vector of coefficients. We are interested in testing for
parameter instability in regressions, which is given by
H0 : βt = β ∀t vs. H1 : βt = β + δ1(t>[Tλ]),
where 1(t>[Tλ]) is an indicator function, taking the value one if t > [Tλ] and zero otherwise.
The original CUSUM tests are based on the recursive residuals or OLS residuals. However,
Deng and Perron (2008) verified that the tests lose power when the mean of the regressors is orthogonal
to the direction of the shift. To overcome this shortcoming, we propose two versions of
the modified tests based on the product of the regressors and the residuals. We show that
the power of the proposed tests under the fixed alternative depends on a quadratic form
of the shift and the second moment of the regressors. This power property ensures that the
modified versions avoid the loss of power in the case of E(x′tδ) = 0. We further extend
the modified tests to models allowing for serial correlation in disturbances. The modified
tests are superior to the original ones in terms of power performance when the mean of the
regressors is orthogonal to the shift.
1.3 Overview: Chapter 3
The tests in Chapter 2 are often used to examine what happens in historical data sets.
Since new data arrive steadily and quickly, this chapter focuses on a sequential test to detect
parameter changes on-line in a model with a trend, which is specified by
yt = x′tβt + ϵt (t = 1, 2, · · · ,m,m+ 1, · · · ),
where xt = [1, t/m]′ is a regressor including a constant term and a trend, ϵt is an unobservable
stochastic disturbance, and βt = [β0t, β1t]′ is a vector of the coefficients.
It is assumed that the parameters are stable in a training period of size m, that is,
βt = β0, t = 1, 2, · · · ,m.
When the data arrive sequentially one by one after the training period, we are interested in
detecting a change in parameters as soon as it occurs. Then, the null hypothesis is specified
by
H0 : βt = β0, t = m+ 1,m+ 2, · · · ,
while the alternative hypothesis is
H1 : There is k∗ ≥ 1 such that βt = β0, t = m+ 1,m+ 2, · · · ,m+ k∗ − 1,
but βt = β0 +∆, t = m+ k∗,m+ k∗ + 1, · · · .
We introduce a stopping time defined by
$$
\tau_m =
\begin{cases}
\inf\{k \ge 1 : |\Gamma(m,k)| \ge g(m,k)\}, \\
\infty, & \text{if } |\Gamma(m,k)| < g(m,k) \text{ for all } k = 1, 2, \cdots .
\end{cases}
$$
The null hypothesis will be rejected if the detecting statistic Γ(m, k) exceeds a suitably
constructed threshold function g(m, k) and otherwise the procedure will never terminate.
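The stopping rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dissertation's implementation: the name `stopping_time`, the finite list of statistic values, and the boundary supplied as a function of k (with m held fixed) are all hypothetical, and a finite monitoring horizon stands in for the open-ended procedure.

```python
import math
from typing import Callable, Iterable


def stopping_time(stats: Iterable[float], boundary: Callable[[int], float]) -> float:
    """Return the first k >= 1 with |Gamma(m, k)| >= g(m, k), or math.inf.

    `stats` holds the detecting statistic for k = 1, 2, ... (hypothetical input);
    `boundary(k)` plays the role of g(m, k) for a fixed training size m.
    """
    for k, gamma in enumerate(stats, start=1):
        if abs(gamma) >= boundary(k):
            return k  # monitoring stops and the null is rejected at time m + k
    return math.inf  # boundary never crossed within the observed horizon


# Toy values only: a boundary that grows with k, as the text requires.
tau = stopping_time([0.1, 0.5, 2.0, 3.0], lambda k: 1.0 + 0.1 * k)
never = stopping_time([0.1, 0.2], lambda k: 1.0)
```

Returning `math.inf` mirrors the convention in the definition that the procedure never terminates when the boundary is not crossed.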
Our detecting statistic is constructed by the cumulative sum of the future residuals after
time m, and we can show that the limiting distribution of the detecting statistic has a
growing variance. This implies that a constant boundary cannot be used for the monitoring
procedure because the detecting statistic will eventually exceed any given critical value and
the null hypothesis will be rejected with a probability approaching one even if the parameter
is stable. To ensure proper empirical size of the test, we design the boundary function with
the same growth rate as that of the detecting statistic.
We derive the limiting null distribution and the consistency of the procedure under the
alternative. Then, we extend the CUSUM monitoring procedure to models with higher order
polynomial trends. We further find that the limit distributions of the stopping time for
the CUSUM test as well as the fluctuation one proposed by Qi et al. (2016) are normal
distributions. Moreover, the stopping time of the CUSUM test grows at a slower rate than
that of the fluctuation test, which implies that the delay time based on the CUSUM procedure
tends to be shorter than that based on the fluctuation one. This advantage has been confirmed
in simulations and an empirical study.
1.4 Overview: Chapter 4
In Chapter 4, we consider a panel data model in which there are N series and each series has
T observations:
$$
y_{it} = x_{it}'\beta_i + x_{it}'\delta_i 1(t > k_i^0) + u_{it}, \quad 1 \le i \le N \text{ and } 1 \le t \le T,
$$
where $x_{it} = [x_{it}(1), \cdots, x_{it}(p)]'$ is a p-dimensional vector of explanatory variables including a constant
term. Coefficient $\delta_i$ is the shift for individual i at an unknown date $k_i^0$, and $u_{it}$ is an
unobservable stochastic disturbance.
We are interested in testing the validity of the common break assumption. Under the null
hypothesis, the break date in each series is assumed to be common, that is,
$$
H_0 : k_i^0 = k^0, \quad \text{for all } i = 1, 2, \cdots, N.
$$
Under the alternative, the break dates are common for individuals in the same group but
allowed to differ across groups. Supposing that there exist G groups, the alternative hypothesis
is defined by
$$
H_A : k_{g_1}^0 \neq k_{g_2}^0, \quad \text{for some } g_1, g_2 \in \{1, 2, \cdots, G\}.
$$
The numerator of the proposed statistic is the square of the cumulative sum of the OLS
residuals, while the denominator is a normalization factor rather than a
long-run variance estimator, so that we avoid the loss of power that can occur as the shift increases under
the alternative (the so-called non-monotonic power problem). We find that the proposed statistic
has a non-degenerate limiting distribution under the null, which is a functional of Brownian bridges.
When the common break assumption fails, the statistic diverges to infinity, so that
the null hypothesis is rejected. Simulations show that the size of the test is controlled when
N and T are large and that the test is powerful under various types of alternatives.
Chapter 2
Modified Tests for Orthogonal Structural Changes in Time Series Models
The CUSUM test has played an important role in theory and applications related to structural
change, but its drawback is that it loses power when the break is orthogonal to the mean of the
regressors. In this chapter, we consider two modified CUSUM tests that have been proposed,
implicitly or explicitly, in the literature to detect such structural changes and investigate the
limiting power properties of these tests under a fixed alternative. We demonstrate that the
modified tests are superior to the classic tests in terms of both asymptotic theory and finite
samples when detecting an orthogonal structural shift.1
1 The published version is Jiang and Kurozumi (2019). Power properties of the modified CUSUM tests. Communications in Statistics - Theory and Methods 48, 2962-2981. https://doi.org/10.1080/03610926.2018.1473598
2.1 Introduction
The original CUSUM test introduced by Brown et al. (1975) has been widely used to test
for parameter stability in practical analyses. It has also been investigated extensively and
extended in various ways in the literature. With regard to its power properties, for example,
Garbade (1977) studied the finite sample performance of the CUSUM test under three
patterns of coefficient variations. The results of the Monte Carlo experiments showed that
the CUSUM test is weak at detecting parameter instability under the simulation settings.
Because this property has been observed repeatedly in the literature, Ploberger and Kramer
(1990) theoretically investigated the power of the test, where changes in the parameters are
local to zero under the alternative. They showed that the limiting distribution of the test
under the local alternative is expressed as a Brownian motion, plus an additional term re-
lated to the interaction between the mean of the regressors and direction of the structural
change. This result implies that in the case of a simple shift in parameters, the power of the
CUSUM test depends on the angle between the mean of the regressors and direction of the
shift. Furthermore, the test loses power when the two are perpendicular. Their result theoretically
explains the poor performance of the CUSUM test in the study of Garbade (1977),
in which the mean of the regressors is set as orthogonal to the shift (the mean is equal to
zero). While the original CUSUM test was proposed by using recursive residuals, Ploberger
and Kramer (1992) developed a CUSUM test based on ordinary least squares residuals and
compared the local power of this test with that of the original test. By contrast, Deng and
Perron (2008) investigated the power properties of both versions of the CUSUM test from
a non-local perspective. They derived the limiting properties of the test statistic under the
fixed alternative and confirmed that even in this case, the power of the test depends on the
angle between the mean of the regressors and direction of the change.
Because these undesirable power properties of the CUSUM test have been noted in the
literature, several modified versions of the test have been proposed, explicitly or implicitly,
to overcome the problem. For example, Luger (2001) introduced a test statistic based on the
symmetrization of the absolute value of the recursive residuals. This modified test performs
better than the original test does in terms of power when the angle between the mean of the
regressor and shift is perpendicular, although the original CUSUM test performs better when
this angle decreases. Huskova and Koubkova (2005) considered a quadratic form of the prod-
uct of the regressors and the residuals for monitoring tests, while Xia, Guo, and Zhao (2011)
proposed a CUSUM test based on the weighted residuals from the GMM estimation. These
studies concentrate on how the tests respond to the location and the magnitude
of the break, but do not analyze the impact of the angle on the power of the tests. Further
studies are thus necessary to investigate the performance of these methodologies in terms of
detecting an orthogonal structural change.
Therefore, in this chapter, we investigate two versions of the CUSUM test that are mod-
ified to avoid losing power when the mean of the regressor is orthogonal to the shift. The
asymptotic distributions of the test statistics are investigated under the null hypothesis of
parameter stability as well as under the fixed alternative, and then we investigate the power
of each of the modified tests. We confirm that the modified tests are superior to the classic
test in terms of both asymptotic theory and finite samples when detecting an orthogonal
structural shift.
The remainder of Chapter 2 is organized as follows. Section 2.2 introduces the model
and assumptions. Section 2.3 presents the asymptotic behaviors of the two modified tests.
Then, we extend the modified tests further to models with serially correlated errors. The
finite sample properties are investigated by using Monte Carlo simulations in Section 2.4.
Concluding remarks are given in Section 2.5. The mathematical proofs are relegated to
Appendix A.
2.2 Models and Assumptions
We consider the standard linear regression model given by
yt = x′tβt + ut (t = 1, 2, · · · , T ), (2.1)
where xt = [x1t, x2t, · · · , xkt]′ is a k-dimensional regressor, ut is an unobservable stochastic
disturbance, and βt is a k-dimensional vector of coefficients. Because a constant term is
typically included in a model, the first element of the regressor, x1t, is unity for all t. We
consider the testing problem given by
H0 : βt = β ∀t vs. H1 : βt = β + δ1(t>[Tλ]),
where λ ∈ (0, 1) represents the break fraction and 1(t>[Tλ]) is an indicator function, taking
the value one if t > [Tλ] and zero otherwise. Then, the parameters in (2.1) are stable under
the null hypothesis, whereas we allow for a one-time change in the parameters under the
alternative.
To investigate the asymptotic properties of the CUSUM test, we make the following
assumptions.
Assumption 2.1 The regressor xt and error term ut are defined on a common probability
space, and the following condition holds:
$$
\limsup_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} \|x_t\|^{2+\delta} < \infty, \quad \text{a.s. for some } \delta > 0.
$$
Assumption 2.2 The following probability limits exist:
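$$
\operatorname{plim}_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x_t = E[x_t] = c_1,
$$
$$
\operatorname{plim}_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x_t x_t' = E[x_t x_t'] = C,
$$
$$
\operatorname{plim}_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x_t x_t' \otimes x_t x_t' = E[x_t x_t' \otimes x_t x_t'] = \Lambda,
$$
where $c_1$ is a $k \times 1$ vector, and $C$ and $\Lambda$ are $k \times k$ and $k^2 \times k^2$ non-singular and non-stochastic
matrices, respectively.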
These assumptions are satisfied, for example, if xt is a weakly dependent stationary process
with more than fourth-order moments. We need Assumption 2.2 to investigate the power of
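the modified CUSUM tests. We denote the rows of C by c′j for j = 1, · · · , k. That is,
$$
C = E[x_t x_t'] = E\begin{bmatrix} x_{1t}x_t' \\ x_{2t}x_t' \\ \vdots \\ x_{kt}x_t' \end{bmatrix}
= \begin{bmatrix} c_1' \\ c_2' \\ \vdots \\ c_k' \end{bmatrix}.
$$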
The vector c1 is called the mean regressor.
Assumption 2.3 The disturbances ut are stationary and ergodic with
$$
E[u_t \mid \mathcal{U}_{t-1}] = 0, \quad E[u_t^2 \mid \mathcal{U}_{t-1}] = \sigma^2, \quad E[u_t^4] < \infty,
$$
where $\mathcal{U}_{t-1}$ is the σ-field generated by $\{x_t, u_{t-1}, x_{t-1}, u_{t-2}, \cdots\}$.
Assumption 2.3 implies that the error term is a martingale difference sequence. The uncor-
relatedness of the errors can be checked by testing for serial correlation using the regression
residuals. In the following, we first proceed with this assumption. However, it is relaxed in
a later section to investigate the effect of serial correlation on the power of the tests.
2.3 Modified CUSUM Tests
2.3.1 Motivation
We first consider the standard CUSUM test to motivate the modification. The test statistic
based on the OLS residuals is given by
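$$
CUSUM^{ols} = \sup_{0\le r\le 1} \left| \frac{1}{\hat\sigma\sqrt{T}} \sum_{t=1}^{[Tr]} \hat u_t \right|,
$$
where $\hat u_t$ are the OLS residuals and $\hat\sigma^2 = T^{-1}\sum_{t=1}^{T} \hat u_t^2$. The recursive residuals-based test
statistic is
$$
CUSUM^{rec} = \sup_{0\le r\le 1} \left| \frac{\sum_{t=k+1}^{[Tr]} \tilde u_t}{\tilde\sigma\sqrt{T-k}} \right| \Big/ \left( 1 + 2\,\frac{[Tr]-k}{T-k} \right),
$$
where $\tilde u_t = (y_t - x_t'\hat\beta_{t-1})/f_t$, for $t = k+1, \cdots, T$, are the recursive residuals; $\hat\beta_t = (X_t'X_t)^{-1}X_t'Y_t$,
with $X_t = [x_1', x_2', \cdots, x_t']'$ and $Y_t = [y_1, y_2, \cdots, y_t]'$; $f_t = (1 + x_t'(X_{t-1}'X_{t-1})^{-1}x_t)^{1/2}$; and
$\tilde\sigma^2 = (T-k)^{-1}\sum_{t=k+1}^{T} (\tilde u_t - \bar{\tilde u})^2$, with $\bar{\tilde u} = (T-k)^{-1}\sum_{t=k+1}^{T} \tilde u_t$.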
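As a rough illustration of the recursive residuals defined above, the following Python sketch recomputes the least-squares fit from the first t − 1 observations at every step. It is a direct, unoptimized transcription under assumptions: the function name `recursive_residuals`, the simulated data, and the use of NumPy are choices made for this example only, and updating formulas would be preferable for large T.

```python
import numpy as np


def recursive_residuals(y, X):
    """Standardized one-step-ahead prediction errors for t = k+1, ..., T."""
    T, k = X.shape
    res = np.empty(T - k)
    for t in range(k, T):  # 0-based row t is observation t + 1
        X_prev, y_prev = X[:t], y[:t]
        gram = X_prev.T @ X_prev
        beta_prev = np.linalg.solve(gram, X_prev.T @ y_prev)  # estimate from first t obs.
        f_t = np.sqrt(1.0 + X[t] @ np.linalg.solve(gram, X[t]))
        res[t - k] = (y[t] - X[t] @ beta_prev) / f_t
    return res


# Hypothetical data: constant plus one N(1, 1) regressor, stable coefficients.
rng = np.random.default_rng(3)
T = 60
X = np.column_stack([np.ones(T), rng.normal(1.0, 1.0, T)])
y = X @ np.array([0.5, 1.0]) + rng.normal(size=T)
w = recursive_residuals(y, X)
```

A convenient sanity check is the classical identity that the sum of squared recursive residuals equals the full-sample OLS residual sum of squares.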
Ploberger and Kramer (1990, 1992) derived the limiting distributions of these test statis-
tics under the local alternative and Deng and Perron (2008) investigated the asymptotic
properties of these statistics under the fixed alternative. Their results imply that the power
of these tests depends on the angle between the mean regressor and direction of the break. To
explain this dependence, we demonstrate that the power of the OLS-based test depends on
c′1δ by focusing on the fixed alternative. Given that the OLS estimator of β can be expressed
as
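$$
\hat\beta = \beta + \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \sum_{t=1}^{T} \left( x_t x_t' \delta 1(t>[T\lambda]) + x_t u_t \right), \tag{2.2}
$$
the OLS residuals are given by
$$
\hat u_t = u_t + x_t'\delta 1(t>[T\lambda]) - x_t'(\hat\beta - \beta), \tag{2.3}
$$
and, thus,
$$
\begin{aligned}
\frac{1}{T}\sum_{t=1}^{[Tr]} \hat u_t ={}& \frac{1}{T}\sum_{t=1}^{[Tr]} u_t + \frac{1}{T}\sum_{t=1}^{[Tr]} x_t'\delta 1(t>[T\lambda]) \\
& - \frac{1}{T}\sum_{t=1}^{[Tr]} x_t' \left( \frac{1}{T}\sum_{s=1}^{T} x_s x_s' \right)^{-1} \left[ \frac{1}{T}\sum_{s=1}^{T} x_s x_s' \delta 1(s>[T\lambda]) + \frac{1}{T}\sum_{s=1}^{T} x_s u_s \right].
\end{aligned} \tag{2.4}
$$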
It can be shown that, for r > λ, the second and third terms on the right-hand side of (2.4)
converge in probability to (r−λ)c′1δ and −r(1−λ)c′1δ, respectively. Therefore, the OLS-based
CUSUM test loses power when c′1δ = 0.
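The loss of power when c′1δ = 0 can be seen in a small simulation. The sketch below is illustrative only: the function names `cusum_ols` and `simulate`, the sample size, the break fraction, the zero pre-break coefficients, and the seed are assumptions for this example, with the regressor design xt = [1, zt]′, zt ∼ N(1, 1), so that c1 = [1, 1]′.

```python
import numpy as np


def cusum_ols(y, X):
    """OLS-based CUSUM: sup over r of |partial sum of residuals| / (sigma_hat * sqrt(T))."""
    T = len(y)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta_hat
    sigma_hat = np.sqrt(np.mean(u_hat ** 2))
    return np.max(np.abs(np.cumsum(u_hat))) / (sigma_hat * np.sqrt(T))


def simulate(delta, T=2000, lam=0.5, seed=0):
    """One sample with a one-time shift delta after [T * lam]; pre-break beta = 0."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(T), rng.normal(1.0, 1.0, T)])
    after = (np.arange(1, T + 1) > int(T * lam)).astype(float)
    y = after * (X @ np.asarray(delta, dtype=float)) + rng.normal(size=T)
    return y, X


stat_orth = cusum_ols(*simulate([1.0, -1.0]))  # c_1' delta = 0 (orthogonal shift)
stat_shift = cusum_ols(*simulate([1.0, 1.0]))  # c_1' delta = 2
print(stat_orth, stat_shift)
```

Under this design the statistic for the orthogonal shift stays of the same order as under the null, whereas the non-orthogonal shift makes it grow with the sample size.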
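To avoid the dependence of the power on c′1δ, we modify the CUSUM test such that it is
not based on the residuals, but instead on the product of xjt (for j ≠ 1) and the residuals.
Let $\hat w_{jt} = x_{jt}\hat u_t$ and $\tilde w_{jt} = x_{jt}\tilde u_t$. Then, the modified CUSUM test statistics are defined as
$$
CUSUM^{ols}_m = \sup_{0\le r\le 1} \left| \frac{1}{\hat\sigma_j\sqrt{T}} \sum_{t=1}^{[Tr]} \hat w_{jt} \right|,
\qquad
CUSUM^{rec}_m = \sup_{0\le r\le 1} \left| \frac{\sum_{t=k+1}^{[Tr]} \tilde w_{jt}}{\tilde\sigma_j\sqrt{T-k}} \right| \Big/ \left( 1 + 2\,\frac{[Tr]-k}{T-k} \right),
$$
where
$$
\hat\sigma_j^2 = \frac{1}{T}\sum_{t=1}^{T} \hat w_{jt}^2
\quad \text{and} \quad
\tilde\sigma_j^2 = \frac{1}{T-k}\sum_{t=k+1}^{T} (\tilde w_{jt} - \bar{\tilde w}_j)^2
\quad \text{with} \quad
\bar{\tilde w}_j = \frac{1}{T-k}\sum_{t=k+1}^{T} \tilde w_{jt}.
$$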
Proposition 2.1 Suppose Assumptions 2.1–2.3 hold.
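(a) Under the null hypothesis,
$$
CUSUM^{ols}_m \Rightarrow \sup_{0\le r\le 1} |BB_j(r)|, \tag{2.5}
$$
$$
CUSUM^{rec}_m \Rightarrow \sup_{0\le r\le 1} \left| \frac{W_j(r)}{1+2r} \right|, \tag{2.6}
$$
where $BB_j(r)$ and $W_j(r)$ are the one-dimensional standard Brownian bridge and Brownian
motion, respectively, and ⇒ denotes the weak convergence of the associated probability measures.
(b) Under the alternative hypothesis,
$$
\frac{1}{\sqrt{T}}\, CUSUM^{ols}_m \xrightarrow{p} \frac{|c_j'\delta|\,\lambda(1-\lambda)}{\sqrt{\sigma^2 c_{jj} + \lambda(1-\lambda)\,\delta'\Lambda_{jj,0}\delta}}, \tag{2.7}
$$
$$
\frac{1}{\sqrt{T}}\, CUSUM^{rec}_m \xrightarrow{p} \frac{|c_j'\delta|\, q}{\sqrt{\sigma^2 c_{jj} + \lambda(1-\lambda)\,\delta'\Lambda_{jj,0}\delta - (c_j'\delta\,\lambda\log(\lambda))^2}}, \tag{2.8}
$$
where $c_{jj}$ and $\Lambda_{jj,0}$ are the (j, j) element of C and (j, j) block of Λ, respectively; that is,
$$
\frac{1}{T}\sum_{t=1}^{T} x_{jt}^2 \xrightarrow{p} c_{jj}, \quad \text{and} \quad \frac{1}{T}\sum_{t=1}^{T} x_{jt}^2 x_t x_t' \xrightarrow{p} \Lambda_{jj,0},
$$
and
$$
q = \sup_{0\le r\le 1} \frac{\lambda \log\left(\frac{r}{\lambda}\right) 1(r>\lambda)}{1+2r}
= \begin{cases}
\dfrac{\lambda \log\left(\frac{\lambda^*}{\lambda}\right)}{1+2\lambda^*}, & 0 \le \lambda < e^{-3/2}, \\[2ex]
\dfrac{-\lambda\log\lambda}{3}, & e^{-3/2} \le \lambda \le 1,
\end{cases} \tag{2.9}
$$
where $\lambda^* = \left\{ \lambda^* : 0 \le \lambda^* \le 1 \text{ and } \log\lambda^* = 1 + \log\lambda + \frac{1}{2\lambda^*} \right\}$.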
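The constant q in (2.9) has no closed form when λ < e^{−3/2}, but it is easy to evaluate numerically. The following sketch solves the first-order condition log λ∗ = 1 + log λ + 1/(2λ∗) by bisection; the function name `q_limit`, the tolerance, and the restriction to 0 < λ ≤ 1 (to avoid log 0) are assumptions of this illustration.

```python
import math


def q_limit(lam: float, tol: float = 1e-12) -> float:
    """Evaluate q = sup_r lam * log(r / lam) * 1(r > lam) / (1 + 2r) for 0 < lam <= 1."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    if lam >= math.exp(-1.5):
        return -lam * math.log(lam) / 3.0  # supremum attained at r = 1

    # First-order condition; h is increasing in r, h(lam) < 0 and h(1) > 0 here.
    def h(r: float) -> float:
        return math.log(r) - 1.0 - math.log(lam) - 1.0 / (2.0 * r)

    lo, hi = lam, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    r_star = 0.5 * (lo + hi)  # numerical lambda*
    return lam * math.log(r_star / lam) / (1.0 + 2.0 * r_star)


q_early = q_limit(0.1)  # interior maximizer branch
q_late = q_limit(0.5)   # boundary branch, r = 1
```

Comparing the result with a brute-force maximum over a fine grid of r is a simple way to check the two branches.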
Proposition 2.1 shows that our modification works well even in the case of c′1δ = 0, provided that
c′jδ ≠ 0 for the chosen j ≠ 1. Thus, we can avoid the loss of power caused by the orthogonal change.
However, the modified test loses power if c′jδ = 0, and we do not know whether c′jδ = 0 for
some j = 1, · · · , k. We discuss how to overcome this problem in the following subsection.
2.3.2 Modified CUSUM tests
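Note that C = E[xtx′t] is positive definite according to Assumption 2.2. Therefore, Cδ = [c′1δ, · · · , c′kδ]′ ≠ 0
whenever δ ≠ 0, so that c′jδ ≠ 0 for at least one of j = 1, · · · , k. Thus, it is natural to construct
the test statistics based on all $\hat w_{1t}, \cdots, \hat w_{kt}$ or $\tilde w_{1t}, \cdots, \tilde w_{kt}$ to avoid the potential loss of power
caused by c′jδ = 0 for some j. One of the possible transformations used in the literature is to
construct a quadratic form based on $x_t\hat u_t = [\hat w_{1t}, \cdots, \hat w_{kt}]'$ or $x_t\tilde u_t = [\tilde w_{1t}, \cdots, \tilde w_{kt}]'$, given by
$$
Q^{ols} = \sup_{0\le r\le 1} Q^{ols}(r) \quad \text{where} \quad
Q^{ols}(r) = \frac{\left( \sum_{t=1}^{[Tr]} x_t \hat u_t \right)' \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \left( \sum_{t=1}^{[Tr]} x_t \hat u_t \right)}{\hat\sigma^2},
$$
$$
Q^{rec} = \sup_{0\le r\le 1} Q^{rec}(r) \quad \text{where} \quad
Q^{rec}(r) = \frac{\left( \sum_{t=k+1}^{[Tr]} x_t \tilde u_t \right)' \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \left( \sum_{t=k+1}^{[Tr]} x_t \tilde u_t \right)}{\tilde\sigma^2},
$$
where $\hat\sigma^2$ and $\tilde\sigma^2$ are defined as before. The following theorem provides the asymptotic
properties of these test statistics.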
Theorem 2.1 Suppose Assumptions 2.1–2.3 hold.
(a) Under the null hypothesis,
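$$
Q^{ols} \Rightarrow \sup_{0\le r\le 1} \|BB(r)\|^2, \tag{2.10}
$$
$$
Q^{rec} \Rightarrow \sup_{0\le r\le 1} \|W(r)\|^2, \tag{2.11}
$$
where BB(r) and W(r) are the k-dimensional standard Brownian bridge and Brownian motion,
respectively.
(b) Under the alternative hypothesis,
$$
\frac{1}{T}\, Q^{ols} \xrightarrow{p} \frac{\delta' C \delta\, \lambda^2 (1-\lambda)^2}{\sigma^2 + \lambda(1-\lambda)\,\delta' C \delta}, \tag{2.12}
$$
$$
\frac{1}{T}\, Q^{rec} \xrightarrow{p} \frac{\delta' C \delta\, (\lambda\log\lambda)^2}{\sigma^2 + \lambda(1-\lambda)\,\delta' C \delta - (c_1'\delta\,\lambda\log(\lambda))^2}. \tag{2.13}
$$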
Theorem 2.1 clearly shows that the modified test statistics Qols and Qrec can avoid the
loss of power caused by c′jδ = 0, for some j, and are consistent because δ′Cδ > 0 whenever δ ≠ 0.
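To make the quadratic-type statistic concrete, here is a minimal NumPy sketch of the OLS-based version. The function name `q_ols` and the simulated design (xt = [1, zt]′ with zt ∼ N(1, 1), a shift δ = [1, −1]′ at mid-sample, zero pre-break coefficients, and a fixed seed) are assumptions chosen for illustration; the supremum over r is taken over the T partial sums.

```python
import numpy as np


def q_ols(y, X):
    """Quadratic-type statistic: sup_r S(r)' (X'X)^{-1} S(r) / sigma_hat^2."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta_hat
    sigma2_hat = np.mean(u_hat ** 2)
    S = np.cumsum(X * u_hat[:, None], axis=0)  # row t holds sum_{s<=t} x_s * u_hat_s
    A_inv = np.linalg.inv(X.T @ X)
    q_path = np.einsum("ti,ij,tj->t", S, A_inv, S) / sigma2_hat
    return q_path.max()


# Hypothetical experiment: same regressors and errors with and without a break.
rng = np.random.default_rng(1)
T, lam = 2000, 0.5
X = np.column_stack([np.ones(T), rng.normal(1.0, 1.0, T)])
u = rng.normal(size=T)
after = np.arange(1, T + 1) > int(T * lam)
q_null = q_ols(u, X)                                       # no break
q_break = q_ols(u + after * (X @ np.array([1.0, -1.0])), X)  # orthogonal break
print(q_null, q_break)
```

In this design c′1δ = 0 but δ′Cδ = 1, so by (2.12) the statistic under the break should grow roughly in proportion to T, unlike the residual-based CUSUM.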
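The other possible transformation is to take the maximum of the absolute values of the
elements of $x_t\hat u_t$ or $x_t\tilde u_t$. Let
$$
\left[ M^{ols}_1(r), \cdots, M^{ols}_k(r) \right]' = \frac{1}{\hat\sigma} \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1/2} \sum_{t=1}^{[Tr]} x_t \hat u_t,
$$
$$
\left[ M^{rec}_1(r), \cdots, M^{rec}_k(r) \right]' = \frac{1}{\tilde\sigma} \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1/2} \sum_{t=k+1}^{[Tr]} x_t \tilde u_t,
$$
and consider the following test statistics:
$$
M^{ols} = \max\left\{ \sup_{0\le r\le 1} \left| M^{ols}_1(r) \right|, \cdots, \sup_{0\le r\le 1} \left| M^{ols}_k(r) \right| \right\},
$$
$$
M^{rec} = \max\left\{ \sup_{0\le r\le 1} \left| M^{rec}_1(r) \right|, \cdots, \sup_{0\le r\le 1} \left| M^{rec}_k(r) \right| \right\}.
$$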
Theorem 2.2 Suppose Assumptions 2.1–2.3 hold.
(a) Under the null hypothesis,
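$$
M^{ols} \Rightarrow \max\left\{ \sup_{0\le r\le 1} |BB_1(r)|, \cdots, \sup_{0\le r\le 1} |BB_k(r)| \right\}, \tag{2.14}
$$
$$
M^{rec} \Rightarrow \max\left\{ \sup_{0\le r\le 1} |W_1(r)|, \cdots, \sup_{0\le r\le 1} |W_k(r)| \right\}, \tag{2.15}
$$
where $BB_j(r)$ and $W_j(r)$ (for j = 1, · · · , k) are the independent one-dimensional standard
Brownian bridge and Brownian motion, respectively.
(b) Under the alternative hypothesis,
$$
\frac{1}{\sqrt{T}}\, M^{ols} \xrightarrow{p} \frac{\max\{|v_1|, \cdots, |v_k|\}\, \lambda(1-\lambda)}{\sqrt{\sigma^2 + \lambda(1-\lambda)\,\delta' C \delta}}, \tag{2.16}
$$
$$
\frac{1}{\sqrt{T}}\, M^{rec} \xrightarrow{p} \frac{\max\{|v_1|, \cdots, |v_k|\}\, (-\lambda\log\lambda)}{\sqrt{\sigma^2 + \lambda(1-\lambda)\,\delta' C \delta - (c_1'\delta\,\lambda\log(\lambda))^2}}, \tag{2.17}
$$
where $v_j$ is the jth element of $C^{1/2}\delta$, for j = 1, · · · , k; that is, $[v_1, \cdots, v_k]' = C^{1/2}\delta$.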
Again, we can see from Theorem 2.2 that the maximum-type tests, Mols and M rec, are
consistent irrespective of whether c′jδ = 0 for some j.
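The maximum-type statistic can be sketched in the same way. In the illustration below the inverse square root of Σ xtx′t is taken to be the symmetric one obtained from an eigendecomposition, which is an assumption of this example (the text does not specify which square root is used); the function name `m_ols_path` and the simulated data are likewise hypothetical.

```python
import numpy as np


def m_ols_path(y, X):
    """Return the T x k array whose row t is [M_1(t/T), ..., M_k(t/T)] (OLS version)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta_hat
    sigma_hat = np.sqrt(np.mean(u_hat ** 2))
    S = np.cumsum(X * u_hat[:, None], axis=0)  # partial sums of x_t * u_hat_t
    vals, vecs = np.linalg.eigh(X.T @ X)       # X'X is symmetric positive definite
    A_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return (S @ A_inv_sqrt) / sigma_hat        # symmetric factor, so rows are (A^{-1/2} S_t)'


# Hypothetical data with stable coefficients.
rng = np.random.default_rng(2)
T = 500
X = np.column_stack([np.ones(T), rng.normal(1.0, 1.0, T)])
y = X @ np.array([0.5, 1.0]) + rng.normal(size=T)
M = m_ols_path(y, X)
m_stat = np.abs(M).max()  # max over components of sup over r
```

With this choice of square root, the squared Euclidean norm of each row of `M` equals the quadratic form Qols(r) at the same r, which links the two types of tests.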
The critical values of the null-limiting distributions of the quadratic-type and maximum-
type tests are obtained by approximating a standard Brownian motion using 2,000 indepen-
dent normal random variables with 1,000,000 replications (see Panel (a) of Table 2.1). In
addition, it is sometimes the case that the test statistics are constructed by removing the
first and last 100ε % observations. In this case, we have
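$$
\sup_{\varepsilon\le r\le 1-\varepsilon} Q^{ols}(r) \Rightarrow \sup_{\varepsilon\le r\le 1-\varepsilon} \|BB(r)\|^2, \qquad
\sup_{\varepsilon\le r\le 1-\varepsilon} Q^{rec}(r) \Rightarrow \sup_{\varepsilon\le r\le 1-\varepsilon} \|W(r)\|^2, \tag{2.18}
$$
$$
\max_{1\le j\le k} \sup_{\varepsilon\le r\le 1-\varepsilon} \left| M^{ols}_j(r) \right| \Rightarrow \max_{1\le j\le k} \sup_{\varepsilon\le r\le 1-\varepsilon} |BB_j(r)|, \tag{2.19}
$$
$$
\max_{1\le j\le k} \sup_{\varepsilon\le r\le 1-\varepsilon} \left| M^{rec}_j(r) \right| \Rightarrow \max_{1\le j\le k} \sup_{\varepsilon\le r\le 1-\varepsilon} |W_j(r)|. \tag{2.20}
$$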
Panel (b) of Table 2.1 presents the critical values for these distributions with ε = 0.15.
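The simulation of critical values described above can be sketched as follows for the quadratic-type OLS test. This is a scaled-down illustration, not a reproduction of Table 2.1: the function name `sup_bb_sq_quantile`, the default of 500 steps and 2,000 replications (far fewer than the 2,000 steps and 1,000,000 replications used in the text), and the seed are assumptions, so the resulting quantiles are only rough.

```python
import numpy as np


def sup_bb_sq_quantile(k, level=0.95, n_steps=500, n_reps=2000, seed=0):
    """Monte Carlo quantile of sup_r ||BB(r)||^2 for a k-dimensional Brownian bridge."""
    rng = np.random.default_rng(seed)
    # Brownian motion approximated by scaled partial sums of i.i.d. N(0, 1) draws.
    increments = rng.normal(size=(n_reps, n_steps, k)) / np.sqrt(n_steps)
    W = np.cumsum(increments, axis=1)
    r = np.arange(1, n_steps + 1)[None, :, None] / n_steps
    BB = W - r * W[:, -1:, :]                       # BB(r) = W(r) - r W(1)
    sup_sq = np.sum(BB ** 2, axis=2).max(axis=1)    # sup over r of squared norm
    return float(np.quantile(sup_sq, level))


cv_k1 = sup_bb_sq_quantile(1)
cv_k2 = sup_bb_sq_quantile(2)
print(cv_k1, cv_k2)
```

Trimmed versions as in (2.18) could be obtained by restricting the maximum to the grid points with ε ≤ r ≤ 1 − ε.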
We next investigate the limiting properties (not the finite sample properties) of the tests
under the alternative, based on Theorems 2.1 and 2.2. Because the probability limits under
the alternative depend on several parameters in the model, we focus on a simple case where
σ2 = 1 and xt = [1, zt]′, with zt ∼ i.i.d.N(1, 1). Furthermore, the change in the coefficients
is specified by δ = [1,−1]′ and δ = [1, 1]′, which correspond to c′1δ = 0 and c′1δ ≠ 0,
respectively. Given that the power properties depend not only on the probability limits
under the alternative, but also on the critical values used in the tests, we compare these
limits divided by the asymptotic 5% critical values.
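For the simple case just described, the probability limits in (2.12) and (2.13) can be evaluated directly. The sketch below assumes the moment matrix implied by zt ∼ N(1, 1), namely C = [[1, 1], [1, 2]]; the function names are hypothetical, and the limits are left undivided because the critical values of Table 2.1 are not reproduced here.

```python
import numpy as np

# E[x_t x_t'] for x_t = [1, z_t]' with z_t ~ N(1, 1): E[z_t] = 1, E[z_t^2] = 2.
C = np.array([[1.0, 1.0], [1.0, 2.0]])
c1 = C[0]        # first row of C, equal to E[x_t]
sigma2 = 1.0


def plim_q_ols(delta, lam):
    """Right-hand side of (2.12)."""
    d = delta @ C @ delta
    return d * lam ** 2 * (1 - lam) ** 2 / (sigma2 + lam * (1 - lam) * d)


def plim_q_rec(delta, lam):
    """Right-hand side of (2.13)."""
    d = delta @ C @ delta
    l = lam * np.log(lam)
    return d * l ** 2 / (sigma2 + lam * (1 - lam) * d - (c1 @ delta * l) ** 2)


orth = np.array([1.0, -1.0])     # c_1' delta = 0
non_orth = np.array([1.0, 1.0])  # c_1' delta = 2
for lam in (0.25, 0.5, 0.75):
    print(lam, plim_q_ols(orth, lam), plim_q_rec(orth, lam),
          plim_q_ols(non_orth, lam), plim_q_rec(non_orth, lam))
```

Both limits remain strictly positive for the orthogonal shift, which is the property that distinguishes the modified tests from the residual-based CUSUM.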
Figures 2.1(a) and (b) show the probability limits of the quadratic-type tests given by
(2.12) and (2.13), respectively, divided by the corresponding critical values. We can see
that the limit of the OLS-based version is maximized at the midpoint, whereas that of the
recursive-based version is skewed to the right. As expected from the power analyses of
Ploberger and Kramer (1990) and Deng and Perron (2008), Qrec is more powerful than Qols
when the break occurs early in the sample, whereas the reversed relation is observed when λ is
closer to one. A similar tendency is observed for the maximum-type test, as shown in Figures
2.1(c) and (d). Figures 2.1(e) and (f) compare the two types of tests, the quadratic-type
and maximum-type tests based on the OLS method. However, neither version is uniformly
superior to the other because the powers depend on many factors such as the number of
regressors k and δ′Cδ, among others. For instance, in the case of k = 2, the quadratic-type
test outperforms the maximum-type test in our setting, as shown in Figure 2.1(e). However,
Figure 2.1(f) implies that the latter performs better in the case of k = 3.
2.3.3 Serially correlated errors
In practice, it is sometimes the case that the error term is not a martingale difference sequence,
but instead is serially correlated. As shown by, for example, Tang and MacNeill (1993), serial
correlation in the error term can produce striking effects on the distribution. Therefore, when
the error term is possibly serially correlated, we need to construct the test statistics by taking
serial correlation into account. In this case, we replace Assumption 2.3 with the following
assumption.
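Assumption 2.4 The following functional central limit theorem holds:
\[
\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} x_t u_t \Rightarrow \Omega^{1/2} W(r)
\]
uniformly for $0 \le r \le 1$, where $\Omega = \sum_{p=-\infty}^{\infty} \Gamma_p$ with $\Gamma_p = \mathrm{Cov}(x_t u_t, x_{t-p} u_{t-p})$.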
The assumptions made in this chapter are standard in the investigation of linear regression
models with possible structural changes. The conditions necessary for Assumption 2.4 to hold
are discussed in econometric and statistical textbooks (e.g., Davidson, 1994).
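The test statistics, $Q^{ols}$, $Q^{rec}$, $M^{ols}$, and $M^{rec}$, are defined as before, with
\[
Q^{ols}(r) = \frac{1}{T} \left( \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right)' \hat{\Omega}^{-1} \left( \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right), \tag{2.21}
\]
\[
Q^{rec}(r) = \frac{1}{T} \left( \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t \right)' \tilde{\Omega}^{-1} \left( \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t \right), \tag{2.22}
\]
\[
\left[ M_1^{ols}(r), \cdots, M_k^{ols}(r) \right]' = \frac{1}{\sqrt{T}} \hat{\Omega}^{-1/2} \sum_{t=1}^{[Tr]} x_t \hat{u}_t, \tag{2.23}
\]
\[
\left[ M_1^{rec}(r), \cdots, M_k^{rec}(r) \right]' = \frac{1}{\sqrt{T}} \tilde{\Omega}^{-1/2} \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t, \tag{2.24}
\]
where $\hat{\Omega}$ and $\tilde{\Omega}$ are consistent estimators of $\Omega$, based on $x_t \hat{u}_t$ and $x_t \tilde{u}_t$, respectively. In
practice, it is often the case that $\Omega$ is estimated non-parametrically, such that
\[
\hat{\Omega} = \hat{\Gamma}_0 + \sum_{p=1}^{m} k(p, m) \left( \hat{\Gamma}_p + \hat{\Gamma}_p' \right)
\quad \text{where} \quad
\hat{\Gamma}_p = \frac{1}{T} \sum_{t=p+1}^{T} x_t x_{t-p}' \hat{u}_t \hat{u}_{t-p},
\]
$k(p, m) = 1 - p/(m + 1)$ is the Bartlett kernel, and the bandwidth $m$ is selected based on
Andrews (1991), such that
\[
m = \left[ 1.1447 \times (a(\delta) T)^{1/3} \right]
\quad \text{where} \quad
a(\delta) = \frac{\sum_{j=1}^{k} 4 \hat{\rho}_j^2 \hat{\sigma}_j^4 / \left[ (1 - \hat{\rho}_j)^6 (1 + \hat{\rho}_j)^2 \right]}{\sum_{j=1}^{k} \hat{\sigma}_j^4 / (1 - \hat{\rho}_j)^4},
\]
with $\hat{\rho}_j$ obtained by regressing $\hat{w}_{jt}$ on $\hat{w}_{jt-1}$ and $\hat{\sigma}_j^2$ defined as before. Then, $\tilde{\Omega}$ is defined
similarly by using the recursive residuals.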
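The estimator above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the GAUSS implementation used in the chapter: the function names are hypothetical, the input `v` is assumed to hold the rows $x_t' \hat{u}_t$, the AR(1) coefficient is fitted without an intercept, and $\hat{\sigma}_j^2$ is taken to be the sample second moment of the $j$th column, which is one reading of "defined as before".

```python
import numpy as np

def bartlett_long_run_variance(v, bandwidth):
    """Bartlett-kernel estimate of Omega from a (T, k) array v with rows x_t * u_t."""
    v = np.asarray(v, dtype=float)
    T = v.shape[0]
    omega = v.T @ v / T                               # Gamma_0
    for p in range(1, bandwidth + 1):
        gamma_p = v[p:].T @ v[:-p] / T                # Gamma_p = T^{-1} sum_{t=p+1}^T v_t v_{t-p}'
        weight = 1.0 - p / (bandwidth + 1.0)          # Bartlett weight k(p, m)
        omega += weight * (gamma_p + gamma_p.T)
    return omega

def andrews_bandwidth(v):
    """Bandwidth m = [1.1447 (a T)^{1/3}] from AR(1) fits to each column of v."""
    v = np.asarray(v, dtype=float)
    T = v.shape[0]
    num = 0.0
    den = 0.0
    for j in range(v.shape[1]):
        y, ylag = v[1:, j], v[:-1, j]
        rho = (ylag @ y) / (ylag @ ylag)              # AR(1) coefficient (assumption: no intercept)
        sig2 = np.mean(v[:, j] ** 2)                  # assumption: sample second moment of column j
        num += 4.0 * rho ** 2 * sig2 ** 2 / ((1.0 - rho) ** 6 * (1.0 + rho) ** 2)
        den += sig2 ** 2 / (1.0 - rho) ** 4
    a = num / den
    return int(np.floor(1.1447 * (a * T) ** (1.0 / 3.0)))

rng = np.random.default_rng(0)
v = rng.standard_normal((200, 2))                     # placeholder data, not from the chapter
m = andrews_bandwidth(v)
omega_hat = bartlett_long_run_variance(v, m)
```

With the Bartlett weights and the $1/T$ scaling of each $\hat{\Gamma}_p$, the resulting matrix is positive semidefinite, which is what makes $\hat{\Omega}^{-1/2}$ in (2.23) well defined when the estimate is nonsingular.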
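Let us define $\gamma_{jj,p}$ and $\Lambda_{jj,p}$ as the probability limits
\[
\frac{1}{T} \sum_{t=p+1}^{T} x_{jt} x_{jt-p} u_t u_{t-p} \xrightarrow{p} \gamma_{jj,p}
\quad \text{and} \quad
\frac{1}{T} \sum_{t=p+1}^{T} x_{jt} x_{jt-p} x_t x_{t-p}' \xrightarrow{p} \Lambda_{jj,p}.
\]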
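Theorem 2.3 Suppose Assumptions 2.1, 2.2, and 2.4 hold and that the quadratic-type and
maximum-type test statistics are constructed by using (2.21)–(2.24).
(a) Under the null hypothesis, $Q^{ols}$, $Q^{rec}$, $M^{ols}$, and $M^{rec}$ have the same limiting distributions
as those in Theorems 2.1(a) and 2.2(a).
(b) Under the alternative hypothesis, if $\delta'(\Lambda_{jj,1} - \Lambda_{jj,0})\delta \to 0$ as $\|\delta\| \to \infty$ for some $j$, then
\[
\frac{1}{T^{2/3}} Q^{ols} = O_p(\|\delta\|^{-4/3}), \quad
\frac{1}{T^{2/3}} Q^{rec} = O_p(\|\delta\|^{-4/3}), \quad
\frac{1}{T^{1/3}} M^{ols} = O_p(\|\delta\|^{-2/3}), \quad
\frac{1}{T^{1/3}} M^{rec} = O_p(\|\delta\|^{-2/3}),
\]
whereas if $\delta'(\Lambda_{jj,1} - \Lambda_{jj,0})\delta \not\to 0$ as $\|\delta\| \to \infty$ for all $j$, then
\[
\frac{1}{T^{2/3}} Q^{ols} = O_p(1), \quad
\frac{1}{T^{2/3}} Q^{rec} = O_p(1), \quad
\frac{1}{T^{1/3}} M^{ols} = O_p(1), \quad
\frac{1}{T^{1/3}} M^{rec} = O_p(1).
\]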
We can see that δ′(Λjj,1−Λjj,0)δ → 0 if xt consists only of a constant (xt = 1). In this case,
the tests suffer from the so-called non-monotonic power problem, as investigated by Vogelsang
(1999). Several methods have been proposed to overcome this problem, including those of
Crainiceanu and Vogelsang (2007), Kejriwal (2009), Juhl and Xiao (2009), Shao and Zhang
(2010), Yang and Vogelsang (2011), and Yamazaki and Kurozumi (2015). Furthermore, even
if the above condition does not hold, note that the divergence rates of the test statistics are
reduced in the case of serially correlated errors compared with the case in Section 2.3.2. That
is, the modifications robust to serial correlation result in the reduction of power, as is often
observed in the literature.
2.4 Finite Sample Properties
In this section, we investigate the finite sample performance of the tests considered in this
chapter. The data-generating process (DGP) we consider is given by
\[
y_t = x_t'(\beta + \delta 1_{(t > [T\lambda])}) + u_t, \qquad u_t = \rho u_{t-1} + \epsilon_t,
\]
where $x_t = [1, z_t]'$, $\beta = [1, 1]'$, and $\epsilon_t \sim$ i.i.d. $N(0, (1 - \rho)^2)$. The settings for $\delta$ and $\lambda$ are
explained later. The stochastic regressor $z_t$ is an AR(1) process with mean 1 and variance 1,
given by
\[
z_t = 0.5 + 0.5 z_{t-1} + e_t, \qquad e_t \sim \text{i.i.d. } N(0, 0.75),
\]
where et is independent of ϵt. We set ρ = 0 to investigate the performance of the
truncated versions of the tests in Section 2.3.2 given in (2.18)–(2.20), while ρ = 0.4 and 0.8
are used for the tests robust to serial correlation developed in Section 2.3.3. The sample size
T is 100 and 200, the number of replications is 5,000, and all computations are conducted by
using the GAUSS matrix language.
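For readers who want to reproduce the design outside GAUSS, one draw from this DGP might be generated as in the sketch below. The function name, the burn-in length, the starting values, and the seed handling are assumptions made for illustration; the chapter does not state how the AR(1) processes were initialized.

```python
import numpy as np

def simulate_dgp(T, delta, lam, rho, beta=(1.0, 1.0), burn=100, seed=None):
    """One sample (y, x) from y_t = x_t'(beta + delta 1(t > [T lam])) + u_t."""
    rng = np.random.default_rng(seed)
    n = T + burn                                       # assumption: burn-in to approach stationarity
    e = rng.normal(0.0, np.sqrt(0.75), n)              # innovations of z_t, variance 0.75
    eps = rng.normal(0.0, abs(1.0 - rho), n)           # innovations of u_t, variance (1 - rho)^2
    z = np.empty(n)
    u = np.empty(n)
    z[0], u[0] = 1.0, 0.0                              # assumption: start at the unconditional means
    for t in range(1, n):
        z[t] = 0.5 + 0.5 * z[t - 1] + e[t]
        u[t] = rho * u[t - 1] + eps[t]
    z, u = z[burn:], u[burn:]
    x = np.column_stack([np.ones(T), z])
    after_break = (np.arange(1, T + 1) > int(np.floor(T * lam))).astype(float)
    y = x @ np.asarray(beta, dtype=float) + after_break * (x @ np.asarray(delta, dtype=float)) + u
    return y, x

# Orthogonal change delta = b[-1, 1]' with b = 1, break at the midpoint, rho = 0.4
y, x = simulate_dgp(T=100, delta=[-1.0, 1.0], lam=0.5, rho=0.4, seed=123)
```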
We first investigate the finite sample performance of the tests in Section 2.3.2 with ρ = 0.
Panel (a) of Table 2.2 shows that the sizes of all the tests are relatively well controlled,
although they tend to be slightly conservative. Because the empirical sizes of the tests are
different, we investigate the finite sample properties of the tests under the alternative using the
size-adjusted powers. We set a one-time shift in the coefficient to δ = b[1, 1]′ (non-orthogonal
change with c′1δ ≠ 0) and δ = b[−1, 1]′ (orthogonal change with c′1δ = 0). Here, the magnitude
of the change is controlled by b = 0, 0.5, 1.0, 1.5, and 2.0, and the break fraction λ is set
to 0.5. Figures 2.2(a) and (b) show that the difference in power is relatively small among
the three tests based on the same (OLS or recursive) residuals when c′1δ ≠ 0. However, as
shown in Figures 2.2(c) and (d), when c′1δ = 0, the modified tests are more powerful than the
original tests. When we focus on either the quadratic-type or the maximum-type tests, the
OLS-based test is more powerful than the recursive-based test. We also investigate the effect
of the location of the break on the performance of the quadratic-type and maximum-type
tests by changing λ from 0.2 to 0.8. Figures 2.2(e)–(h) show that the effect of the location
of the change in finite samples is consistent with the theoretical result given in Section 2.3.
For example, the modified tests using the OLS residuals are maximized at λ = 0.5. For an
early break, the tests using recursive residuals outperform those using OLS residuals, and
vice versa, for a late break.
In the case where the error term may be serially correlated, we should use the tests
proposed in Section 2.3, the empirical sizes of which are summarized in Panels (b) and (c) of
Table 2.2. We can see that the tests based on the OLS residuals tend to suffer from under-size
distortion, particularly when serial correlation is strong (ρ = 0.8). With regard to power in
the case of ρ = 0.4, the relative performance of the tests seems to be preserved, although
Figures 2.3(a) and (b) show that the tests suffer from the so-called non-monotonic power.2
Figures 2.3(g) and (h) show that the effect of the location of a change on the tests is similar
to the case of serially uncorrelated errors. We obtain a similar tendency in the case of ρ = 0.8
and, thus, omit the details.
When detecting non-orthogonal structural changes, the classic CUSUM test is slightly more
powerful than the modified ones in some cases. In application, when the classic CUSUM test
rejects the null hypothesis, but the modified one cannot reject the null hypothesis, we can
estimate the value of c′1δ (the angle between the mean regressor and direction of the break)
by using real data. If the estimated value of c′1δ is far from zero, then we may rely on the
result based on the classic CUSUM test.
2 Because the main purpose of this study is to investigate the modified CUSUM tests, developed to overcome the loss of power caused by an orthogonal shift of the parameters, we do not pursue this problem further here.
2.5 Conclusion
When a structural change is orthogonal to the mean of the regressors, standard CUSUM tests
lose power. As a result, several modified tests have been proposed, explicitly or implicitly, in
the literature. We investigated the asymptotic properties of such modified tests and found
that they can successfully reject the null hypothesis, even in the case of an orthogonal struc-
tural change. In this sense, the modified tests complement the standard test in theoretical
analyses. In this study, we focused on a fixed alternative hypothesis. We can investigate the
power properties of the modified CUSUM tests from a local perspective in future works.
Appendix A
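Proof of Proposition 2.1. (a) In the case of the OLS residuals, from the functional central
limit theorem, weak law of large numbers, and continuous mapping theorem, we have
\[
\sqrt{T}(\hat{\beta} - \beta) \Rightarrow C^{-1} B(1), \tag{A.1}
\]
where $B(r) = [B_1(r), \cdots, B_k(r)]'$ for $0 \le r \le 1$ is a $k$-dimensional Brownian motion with
variance $\sigma^2 C$. Since $\hat{u}_t = u_t - x_t'(\hat{\beta} - \beta)$, we have
\[
\begin{aligned}
\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} \hat{w}_{jt}
&= \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} x_{jt} u_t - \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} x_{jt} x_t'(\hat{\beta} - \beta) \\
&\Rightarrow B_j(r) - r c_j' C^{-1} B(1) \\
&= B_j(r) - r B_j(1),
\end{aligned}
\]
where the last equality holds because $c_j' C^{-1} = [0, \cdots, 0, 1, 0, \cdots, 0]$. Similarly, we have
\[
\hat{\sigma}_j^2 = \frac{1}{T} \sum_{t=1}^{T} x_{jt}^2 u_t^2 + o_p(1) \xrightarrow{p} \sigma^2 c_{jj}.
\]
Since $B_j(r) - r B_j(1) \overset{d}{=} \sigma c_{jj}^{1/2} BB_j(r)$, where $BB_j(r)$ is a standard Brownian bridge, we obtain
(2.5).
For the CUSUM test based on the recursive residuals, following Kramer, Ploberger, and
Alt (1988), we have
\[
\frac{1}{\sqrt{T - k}} \sum_{t=k+1}^{[Tr]} \tilde{w}_{jt} = \frac{1}{\sqrt{T - k}} \sum_{t=k+1}^{[Tr]} x_{jt} \tilde{u}_t \Rightarrow \sigma c_{jj}^{1/2} W_j(r),
\]
and $T^{-1} \sum_{t=1}^{T} \tilde{w}_{jt}^2 \xrightarrow{p} \sigma^2 c_{jj}$, where $W_j(r)$ for $0 \le r \le 1$ is a standard Brownian motion. We
then obtain (2.6).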
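(b) Because $\hat{\beta}$ is expressed as (2.2) under the alternative, we have
\[
\hat{\beta} - \beta \xrightarrow{p} (1 - \lambda)\delta.
\]
Then, by using expression (2.3), the numerator of the test statistic becomes
\[
\begin{aligned}
\frac{1}{T} \sum_{t=1}^{[Tr]} \hat{w}_{jt}
&= \frac{1}{T} \sum_{t=1}^{[Tr]} x_{jt} u_t + \frac{1}{T} \sum_{t=1}^{[Tr]} x_{jt} x_t' \delta 1_{(t > [T\lambda])} - \frac{1}{T} \sum_{t=1}^{[Tr]} x_{jt} x_t'(\hat{\beta} - \beta) \\
&\xrightarrow{p} 0 + c_j' \delta (r - \lambda) 1_{(r > \lambda)} - r c_j'(1 - \lambda)\delta \\
&= c_j' \delta \left[ (r - \lambda) 1_{(r > \lambda)} - r(1 - \lambda) \right]
\end{aligned} \tag{A.2}
\]
uniformly over $0 \le r \le 1$, the absolute value of which is maximized at $r = \lambda$, where it equals
$|c_j' \delta| \lambda (1 - \lambda)$. Similarly, we have
\[
\begin{aligned}
\frac{1}{T} \sum_{t=1}^{T} \hat{w}_{jt}^2
&= \frac{1}{T} \sum_{t=1}^{T} x_{jt}^2 u_t^2
+ \frac{1}{T} \sum_{t=1}^{T} \left( x_{jt} x_t' \delta 1_{(t > [T\lambda])} \right)^2
+ \frac{1}{T} \sum_{t=1}^{T} \left( x_{jt} x_t'(\hat{\beta} - \beta) \right)^2 \\
&\quad + \frac{2}{T} \sum_{t=1}^{T} x_{jt}^2 u_t x_t' \delta 1_{(t > [T\lambda])}
- \frac{2}{T} \sum_{t=1}^{T} x_{jt}^2 u_t x_t'(\hat{\beta} - \beta)
- \frac{2}{T} \sum_{t=1}^{T} x_{jt}^2 1_{(t > [T\lambda])} \delta' x_t x_t'(\hat{\beta} - \beta) \\
&\xrightarrow{p} \sigma^2 c_{jj} + (1 - \lambda)\delta' \Lambda_{jj,0} \delta + (1 - \lambda)^2 \delta' \Lambda_{jj,0} \delta + 0 - 0 - 2(1 - \lambda)^2 \delta' \Lambda_{jj,0} \delta \\
&= \sigma^2 c_{jj} + \lambda(1 - \lambda)\delta' \Lambda_{jj,0} \delta.
\end{aligned} \tag{A.3}
\]
From (A.2) and (A.3), we obtain (2.7).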
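The proof in the case with the recursive residuals is analogous to the OLS case. We first
note that because $f_t \xrightarrow{p} 1$, $T^{-1} \sum_{t=1}^{T} (1/f_t - 1) x_t x_t' \xrightarrow{p} 0$, which implies that $T^{-1} \sum_{t=1}^{T} x_t x_t'/f_t \xrightarrow{p} C$.
This result is used repeatedly in our proofs below.
Since the recursive residuals are written under the alternative as
$\tilde{u}_t = [u_t + x_t' \delta 1_{(t > [T\lambda])} - x_t'(\hat{\beta}_{t-1} - \beta)]/f_t$ and
$\hat{\beta}_{t-1} - \beta = \left( \sum_{s=1}^{t-1} x_s x_s' \right)^{-1} \left( \sum_{s=1}^{t-1} x_s x_s' \delta 1_{(s > [T\lambda])} + \sum_{s=1}^{t-1} x_s u_s \right)$,
$\tilde{w}_{jt}$ can be expressed as
\[
\tilde{w}_{jt} = \frac{1}{f_t} x_{jt} u_t + \frac{1}{f_t} x_{jt} x_t' \delta 1_{(t > [T\lambda])}
- \frac{1}{f_t} x_{jt} x_t' \left( \sum_{s=1}^{t-1} x_s x_s' \right)^{-1} \left( \sum_{s=1}^{t-1} x_s x_s' \delta 1_{(s > [T\lambda])} + \sum_{s=1}^{t-1} x_s u_s \right).
\]
Then, we have
\[
\begin{aligned}
\frac{1}{T} \sum_{t=1}^{[Tr]} \tilde{w}_{jt}
&\xrightarrow{p} 0 + c_j' \int_0^r \delta 1_{(v > \lambda)} \, dv - c_j' \int_0^r \left[ (vC)^{-1} \int_0^v C \delta 1_{(w > \lambda)} \, dw \right] dv \\
&= c_j' \delta \lambda \left[ \log(r) - \log(\lambda) \right] 1_{(r > \lambda)}
\end{aligned} \tag{A.4}
\]
uniformly over $0 \le r \le 1$. By using standard calculus, we can show that
$|c_j' \delta| \lambda [\log(r) - \log(\lambda)] 1_{(r > \lambda)}/(1 + 2r)$ takes the maximum value $|c_j' \delta| q$, where $q$ is defined as (2.9).
Furthermore, similarly to (A.3), we have
\[
\begin{aligned}
\frac{1}{T} \sum_{t=1}^{T} \tilde{w}_{jt}^2
&\xrightarrow{p} \sigma^2 c_{jj} + (1 - \lambda)\delta' \Lambda_{jj,0} \delta \\
&\quad + \int_0^1 \mathrm{tr} \left[ \frac{1}{v} C^{-1} \left( \int_0^v C \delta 1_{(w > \lambda)} \, dw \right) \left( \int_0^v C \delta 1_{(w > \lambda)} \, dw \right)' \frac{1}{v} C^{-1} \Lambda_{jj,0} \right] dv \\
&\quad + 0 - 0 - 2 \int_0^1 \mathrm{tr} \left[ \delta 1_{(v > \lambda)} \int_0^v \delta' 1_{(w > \lambda)} \, dw \, \frac{1}{v} \Lambda_{jj,0} \right] dv \\
&= \sigma^2 c_{jj} + \lambda(1 - \lambda)\delta' \Lambda_{jj,0} \delta.
\end{aligned} \tag{A.5}
\]
From (A.4) and (A.5), we can see that
\[
\tilde{\sigma}_j^2 \xrightarrow{p} \sigma^2 c_{jj} + \lambda(1 - \lambda)\delta' \Lambda_{jj,0} \delta - \left( c_j' \delta \lambda \log(\lambda) \right)^2.
\]
We then obtain (2.8).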
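The closed form in (A.4) follows from two elementary integrals, written out here as a check. For $r > \lambda$ (both integrals vanish for $r \le \lambda$),

\[
\int_0^r 1_{(v > \lambda)} \, dv = r - \lambda,
\qquad
\int_0^r \frac{1}{v} \left( \int_0^v 1_{(w > \lambda)} \, dw \right) dv
= \int_\lambda^r \frac{v - \lambda}{v} \, dv
= (r - \lambda) - \lambda \left[ \log(r) - \log(\lambda) \right],
\]

so that, because $(vC)^{-1} C \delta = \delta / v$, the difference of the two terms is $c_j' \delta \lambda [\log(r) - \log(\lambda)]$. The same device gives the coefficient in (A.5):

\[
(1 - \lambda) + \int_\lambda^1 \left( \frac{v - \lambda}{v} \right)^2 dv - 2 \int_\lambda^1 \frac{v - \lambda}{v} \, dv
= (1 - \lambda) + \int_\lambda^1 \left( \frac{\lambda^2}{v^2} - 1 \right) dv
= \lambda (1 - \lambda).
\]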
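Proof of Theorem 2.1. (a) From (A.1), we have
\[
\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \Rightarrow B(r) - r B(1),
\]
and thus
\[
\left( \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right)' \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \left( \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right) \Rightarrow \sigma^2 \|BB(r)\|^2.
\]
Since $\hat{\sigma}^2 \xrightarrow{p} \sigma^2$ under the null hypothesis, (2.10) is obtained.
The null limiting distribution of $Q^{rec}$ can be derived similarly.
(b) In the same way as (A.2), we have
\[
\frac{1}{T} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \xrightarrow{p} C \delta \left[ (r - \lambda) 1_{(r > \lambda)} - r(1 - \lambda) \right] \tag{A.6}
\]
uniformly over $0 \le r \le 1$, and, thus,
\[
\left( \frac{1}{T} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right)' \left( \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \right)^{-1} \left( \frac{1}{T} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right)
\xrightarrow{p} \delta' C \delta \left[ (r - \lambda) 1_{(r > \lambda)} - r(1 - \lambda) \right]^2
\]
uniformly over $0 \le r \le 1$, which achieves a maximum at $r = \lambda$, while $\hat{\sigma}^2 \xrightarrow{p} \sigma^2 + \lambda(1 - \lambda)\delta' C \delta$,
as proved by Deng and Perron (2008). We then obtain (2.12).
In the case of the recursive residuals, in the same way as in (A.4), we have
\[
\frac{1}{T} \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t \xrightarrow{p} C \delta \lambda \left( \log(r) - \log(\lambda) \right) 1_{(r > \lambda)}
\]
uniformly over $0 \le r \le 1$, and, thus,
\[
\left( \frac{1}{T} \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t \right)' \left( \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \right)^{-1} \left( \frac{1}{T} \sum_{t=k+1}^{[Tr]} x_t \tilde{u}_t \right)
\xrightarrow{p} \delta' C \delta \left[ \lambda (\log(r) - \log(\lambda)) 1_{(r > \lambda)} \right]^2,
\]
while $\tilde{\sigma}^2 \xrightarrow{p} \sigma^2 + \lambda(1 - \lambda)\delta' C \delta - (c_1' \delta \lambda \log(\lambda))^2$ is shown by Deng and Perron (2008), which
implies (2.13).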
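Proof of Theorem 2.2. (a) Because
\[
\left[ M_1^{ols}, \cdots, M_k^{ols} \right]' = \frac{1}{\hat{\sigma}} \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1/2} \sum_{t=1}^{[Tr]} x_t \hat{u}_t \xrightarrow{d} BB(r),
\]
(2.14) is obtained. (2.15) can be proved similarly to (2.11).
(b) Under the alternative, we have
\[
\frac{1}{\sqrt{T}} \left[ M_1^{ols}, \cdots, M_k^{ols} \right]' \xrightarrow{p}
\frac{C^{1/2} \delta \left[ (r - \lambda) 1_{(r > \lambda)} - r(1 - \lambda) \right]}{\sqrt{\sigma^2 + \lambda(1 - \lambda)\delta' C \delta}}
\]
and, therefore,
\[
\frac{1}{\sqrt{T}} \left[ \sup_{0 \le r \le 1} |M_1^{ols}|, \cdots, \sup_{0 \le r \le 1} |M_k^{ols}| \right]'
\xrightarrow{p} \frac{1}{\sqrt{\sigma^2 + \lambda(1 - \lambda)\delta' C \delta}} \left[ |v_1| \lambda(1 - \lambda), \cdots, |v_k| \lambda(1 - \lambda) \right]',
\]
where $v_j$ is defined as in Theorem 2.2. We then have
\[
\frac{1}{\sqrt{T}} \max_{1 \le j \le k} \sup_{0 \le r \le 1} \left| M_j^{ols}(r) \right|
\xrightarrow{p} \frac{\max_{1 \le j \le k} \{ |v_1|, \cdots, |v_k| \} \lambda(1 - \lambda)}{\sqrt{\sigma^2 + \lambda(1 - \lambda)\delta' C \delta}}.
\]
For the recursive residuals, we can prove (2.17) similarly to the proof of Theorem 2.1(b).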
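Proof of Theorem 2.3. (a) The null limiting distributions can be obtained in the same
way as in Theorems 2.1(a) and 2.2(a).
(b) We first derive the divergence rate of the bandwidth $m$. Because $\hat{u}_t$ is expressed as (2.3),
in the same way as (A.3), we have
\[
\frac{1}{T} \sum_{t=p+1}^{T} \hat{w}_{jt} \hat{w}_{jt-p} \xrightarrow{p} \gamma_{jj,p} + \lambda(1 - \lambda)(\delta' \Lambda_{jj,p} \delta) \tag{A.7}
\]
for a given $p$. Then, we have
\[
\hat{\rho}_j \xrightarrow{p} \frac{\gamma_{jj,1} + \lambda(1 - \lambda)\delta' \Lambda_{jj,1} \delta}{\gamma_{jj,0} + \lambda(1 - \lambda)\delta' \Lambda_{jj,0} \delta},
\]
and, hence,
\[
1 - \hat{\rho}_j \xrightarrow{p} \frac{\gamma_{jj,0} - \gamma_{jj,1} + \lambda(1 - \lambda)\delta'(\Lambda_{jj,0} - \Lambda_{jj,1})\delta}{\gamma_{jj,0} + \lambda(1 - \lambda)\delta' \Lambda_{jj,0} \delta}
= \begin{cases}
O_p(\|\delta\|^{-2}) & : \delta'(\Lambda_{jj,0} - \Lambda_{jj,1})\delta \to 0 \\
O_p(1) & : \delta'(\Lambda_{jj,0} - \Lambda_{jj,1})\delta \not\to 0.
\end{cases}
\]
Because $\hat{\sigma}_j^2 = O_p(\|\delta\|^2)$ by (A.3), we can see that $a(\delta) = O_p(\|\delta\|^4)$ if $\delta'(\Lambda_{jj,0} - \Lambda_{jj,1})\delta \to 0$
for some $j$, and is $O_p(1)$ otherwise. Hence,
\[
m = \begin{cases}
O_p(\|\delta\|^{4/3} T^{1/3}) & : \delta'(\Lambda_{jj,0} - \Lambda_{jj,1})\delta \to 0 \ \ \exists j \\
O_p(T^{1/3}) & : \delta'(\Lambda_{jj,0} - \Lambda_{jj,1})\delta \not\to 0 \ \ \forall j.
\end{cases}
\]
By using this result, we next derive the divergence order of $\hat{\Omega}$. In the same way as (A.7),
\[
\hat{\Gamma}_p \xrightarrow{p} \Gamma_p + \lambda(1 - \lambda) \, \mathrm{plim} \, \frac{1}{T} \sum_{t=p+1}^{T} x_t x_t' \delta \delta' x_{t-p} x_{t-p}' = O_p(\|\delta\|^2).
\]
Then, because $\sum_{p=1}^{m} k(p, m) = O(m)$,
\[
\|\hat{\Omega}\| = \left\| \hat{\Gamma}_0 + \sum_{p=1}^{m} k(p, m) \left( \hat{\Gamma}_p + \hat{\Gamma}_p' \right) \right\|
\le O(m) O_p(\|\delta\|^2)
= \begin{cases}
O_p(\|\delta\|^{10/3} T^{1/3}) & : \delta'(\Lambda_{jj,0} - \Lambda_{jj,1})\delta \to 0 \ \ \exists j \\
O_p(\|\delta\|^2 T^{1/3}) & : \delta'(\Lambda_{jj,0} - \Lambda_{jj,1})\delta \not\to 0 \ \ \forall j.
\end{cases} \tag{A.8}
\]
Because it can be shown that (A.6) holds under Assumption 2.4, we have that
\[
Q^{ols} = \begin{cases}
O_p(\|\delta\|^{-4/3} T^{2/3}) & : \delta'(\Lambda_{jj,0} - \Lambda_{jj,1})\delta \to 0 \ \ \exists j \\
O_p(T^{2/3}) & : \delta'(\Lambda_{jj,0} - \Lambda_{jj,1})\delta \not\to 0 \ \ \forall j.
\end{cases}
\]
Thus, we obtain the result.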
The order of Mols is obtained similarly by using (A.6) and (A.8).
Because we can obtain the results of Qrec and M rec in the same way as in the case of the
OLS residuals, we omit the proof here.
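As a reading aid, the exponent arithmetic behind the rate of $Q^{ols}$ can be spelled out. This is only a sketch that treats the bound in (A.8) as the order of $\hat{\Omega}$. By (A.6), $\sum_{t=1}^{[Tr]} x_t \hat{u}_t = O_p(T \|\delta\|)$, so that from (2.21)

\[
Q^{ols}(r) = \frac{1}{T} \left( \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right)' \hat{\Omega}^{-1} \left( \sum_{t=1}^{[Tr]} x_t \hat{u}_t \right)
= O_p\!\left( \frac{T \|\delta\|^2}{\|\hat{\Omega}\|} \right).
\]

In the first case of (A.8) this gives $T \|\delta\|^2 / (\|\delta\|^{10/3} T^{1/3}) = \|\delta\|^{-4/3} T^{2/3}$, and in the second case $T \|\delta\|^2 / (\|\delta\|^2 T^{1/3}) = T^{2/3}$. For the maximum-type statistic in (2.23) the same inputs give $T^{1/2} \|\delta\| / \|\hat{\Omega}\|^{1/2}$, that is, $\|\delta\|^{-2/3} T^{1/3}$ and $T^{1/3}$, respectively.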
Table 2.1: Asymptotic critical values

                      10%                              5%                              1%
             Quadratic          Max           Quadratic          Max           Quadratic          Max
             ols     rec     ols     rec     ols     rec     ols     rec     ols     rec     ols     rec
(a) r ∈ [0, 1]
k = 1      1.465   3.792   1.210   1.947   1.806   4.958   1.344   2.227   2.608   7.795   1.615   2.792
k = 2      2.077   5.776   1.341   2.215   2.469   7.179   1.465   2.478   3.347  10.406   1.718   3.002
k = 3      2.584   7.473   1.412   2.366   3.009   9.033   1.532   2.616   3.945  12.553   1.773   3.122
k = 4      3.038   9.035   1.461   2.468   3.491  10.738   1.578   2.712   4.488  14.486   1.813   3.207
k = 5      3.464  10.522   1.498   2.546   3.942  12.340   1.613   2.784   4.999  16.312   1.842   3.270
(b) r ∈ [0.15, 0.85]
k = 1      1.463   3.218   1.210   1.794   1.805   4.217   1.344   2.054   2.608   6.636   1.615   2.576
k = 2      2.077   4.902   1.340   2.043   2.469   6.100   1.465   2.283   3.347   8.842   1.718   2.770
k = 3      2.584   6.350   1.412   2.179   3.009   7.680   1.532   2.412   3.945  10.680   1.773   2.883
k = 4      3.038   7.675   1.461   2.274   3.491   9.127   1.578   2.500   4.488  12.298   1.813   2.959
k = 5      3.464   8.932   1.498   2.346   3.942  10.489   1.613   2.567   4.999  13.886   1.842   3.018
Table 2.2: Empirical sizes under H0

                 CUSUM          Quadratic           Max
               ols     rec     ols     rec     ols     rec
(a) ρ = 0
T = 100      0.040   0.041   0.033   0.039   0.032   0.040
T = 200      0.042   0.045   0.037   0.039   0.040   0.040
(b) ρ = 0.4
T = 100      0.037   0.056   0.019   0.059   0.020   0.058
T = 200      0.052   0.056   0.033   0.057   0.034   0.054
(c) ρ = 0.8
T = 100      0.011   0.050   0.002   0.072   0.002   0.072
T = 200      0.029   0.047   0.006   0.057   0.009   0.062
[Figure 2.1: Asymptotic power properties. Panels: (a) Quadratic of ols and rec (c′1δ = 0); (b) Quadratic of ols and rec (c′1δ = 0); (c) Max of ols and rec (c′1δ = 0); (d) Max of ols and rec (c′1δ = 0); (e) Quadratic and Max of ols (k = 2); (f) Quadratic and Max of ols (k = 3). Horizontal axis from 0 to 1 in each panel.]
[Figure 2.2: Size-adjusted powers (ρ = 0). Series in each panel: CUSUM-ols, CUSUM-rec, Q-ols, Q-rec, M-ols, M-rec. Panels (a)–(d) plot power against magnitude (0 to 2): (a) T = 100, c′1δ = 1; (b) T = 200, c′1δ = 1; (c) T = 100, c′1δ = 0; (d) T = 200, c′1δ = 0. Panels (e)–(h) plot power against change of location (0.2 to 0.8): (e) T = 100, c′1δ = 1; (f) T = 200, c′1δ = 1; (g) T = 100, c′1δ = 0; (h) T = 200, c′1δ = 0.]
[Figure 2.3: Size-adjusted powers (ρ = 0.4). Series in each panel: CUSUM-ols, CUSUM-rec, Q-ols, Q-rec, M-ols, M-rec. Panels (a)–(d) plot power against magnitude (0 to 2): (a) T = 100, c′1δ = 1; (b) T = 200, c′1δ = 1; (c) T = 100, c′1δ = 0; (d) T = 200, c′1δ = 0. Panels (e)–(h) plot power against change of location (0.2 to 0.8): (e) T = 100, c′1δ = 1; (f) T = 200, c′1δ = 1; (g) T = 100, c′1δ = 0; (h) T = 200, c′1δ = 0.]
Chapter 3
A Sequential Test for Structural Changes in Models with a Trend
We develop a CUSUM-type monitoring procedure based on the ordinary least squares resid-
uals for detecting structural changes in models with a trend. A proper boundary function is
designed to control the size. We derive the limiting null distribution and the consistency of
the procedure under the alternative. In addition, we derive the asymptotic distribution of
the delay time for the CUSUM procedure as well as the fluctuation procedure proposed by Qi
et al. (2016). The simulation and empirical results indicate that although neither procedure
is uniformly superior to the other, the CUSUM test is more suitable for an early break.1
3.1 Introduction
The first contribution to continuously monitoring parameter changes in the econometrics
literature was by Chu et al. (1996). They introduced a monitoring scheme by setting a
training period of size m in which the parameters are known to be stable as a reference
for comparison with new data and argued that the key feature of the sequential tests is to
construct a nondecreasing boundary function such that the tests can maintain a proper size.
This approach has been developed in many directions. Leisch et al. (2000) extended the
fluctuation test of Chu et al. (1996) based on moving estimates, with the boundary function
having a slower growth rate to improve the sensitivity to a late break in a monitoring period.
The MOSUM (moving sum) procedure was further investigated by Horvath et al. (2008), who
indicated that prior information on the moment structure of innovations is required to choose
a suitable boundary function. Horvath et al. (2004) discussed two classes of the residual-based
1 The published version is Jiang and Kurozumi (2020). Monitoring parameter changes in models with a trend. Journal of Statistical Planning and Inference 207, 288-319. https://doi.org/10.1016/j.jspi.2020.01.004
cumulative sum monitoring procedure with an infinite monitoring horizon and introduced an
appropriate boundary function with the parameter γ ∈ [0, 1/2) to deal with different timings
of changes. Since the speed of detection is a crucial measure, Aue and Horvath (2004) derived
the limit distribution of the stopping time for a changing mean model, which is asymptotically
normal, while Aue et al. (2009) extended a local-level model to a linear regression model. They
found that γ close to 1/2 implies a shorter detection delay for an early break. Horvath et al.
(2007), Aue et al. (2008b), and Aue and Kuhn (2008) further investigated the behaviors of
the delay time in the case of γ = 1/2. Following the work of Aue and Horvath (2004), Fremdt
(2014, 2015) derived the asymptotic distribution of Page’s sequential CUSUM procedure and
compared the asymptotic normality of the stopping time with that of the ordinary CUSUM
version under a weaker condition on the change. Furthermore, the monitoring procedure
for sequentially detecting parameter instability has been investigated extensively in various
models. For example, Carsoule and Franses (2003) and Lee et al. (2009) developed sequential
tests in autoregressive models, while Na et al. (2011) applied the monitoring procedure to
detect changes for autocorrelation function, parameter instability in GARCH models, and
distributional changes. Xia et al. (2011) and Kurozumi (2017) considered a monitoring scheme
for linear models with endogenous regressors.
All the aforementioned sequential tests focus on models with nontrending regressors.
However, as noted by Perron (1989) and others, macroeconomic time series are sometimes
better characterized by trend-stationary series with possible change(s) in deterministics. Such
evidence with an upward or downward trend has also been found in the fields of tourism,
marketing, and environmental studies. For models with trending regressors, Chu and White
(1992), Kuan (1998), and Aue et al. (2008a) among others proposed tests of parameter in-
stability based on a given historical sample, while Qi et al. (2016) extended the generalized
fluctuation test to monitor structural changes in polynomial regressions.
In this chapter, we develop a CUSUM-type monitoring scheme based on ordinary least
squares residuals to detect parameter instability in a model with a trend. A new boundary
function is introduced to maintain a proper size. We derive the limit distribution of the
CUSUM detecting statistic under the null hypothesis, while proving that the test is consistent
under the alternative. We also extend the CUSUM monitoring procedure to models with
higher order polynomial trends. In addition, we derive the asymptotic distribution of the
delay time for the CUSUM procedure as well as the fluctuation procedure proposed by Qi
et al. (2016). We find that the delay time of the CUSUM test grows at a slower rate than
that of the fluctuation test, which implies that the latter requires a longer time to detect an
early change than the former. Then, we compare the CUSUM and fluctuation tests in a small
simulation study and apply them to macroeconomic time series. The results confirm that
the performance of the tests strongly depends on the timing of changes. The CUSUM test is
good at detecting an early change soon after the training period and has a shorter detection
time than the fluctuation test, while the fluctuation test is suitable for a late break.
The remainder of this chapter is organized as follows. In Section 3.2, we introduce the model and
our assumptions. The asymptotic properties of the test are investigated in Section 3.3 and
we extend the CUSUM monitoring procedure to models with higher order polynomial trends.
Section 3.4 investigates the asymptotic distribution of delay times. Then, we compare the
CUSUM and fluctuation tests in finite samples via Monte Carlo simulations in Section 3.5.
Section 3.6 provides an empirical example and concluding remarks are given in Section 3.7.
The mathematical proofs are relegated to Appendices B and C.
3.2 Model and Assumptions
We consider the following model:
yt = x′tβt + ϵt (t = 1, 2, · · · ,m,m+ 1, · · · ), (3.1)
where xt = [1, t/m]′ is a regressor including a constant term and a trend, ϵt is an unobservable
stochastic disturbance, and βt = [β0t, β1t]′ is a vector of the coefficients. Our asymptotic
results do not change if the regressor is replaced by xt = [1, t].
The “noncontamination assumption”, as noted by Chu et al. (1996), is particularly im-
portant, and we suppose that there is no change in the training period of size m, that is,
βt = β0, t = 1, 2, · · · ,m.
The historical data are set as a reference to compare with the new data.
We are interested in testing the null hypothesis that βt is stable and allows for a one-time
change in the parameters under the alternative. Thus, we consider the testing problem given
by
H0 : βt = β0, t = m+ 1,m+ 2, · · ·
against the alternative hypothesis
H1 : There is k∗ ≥ 1 such that βt = β0, t = m+ 1,m+ 2, · · · ,m+ k∗ − 1,
but βt = β0 +∆, t = m+ k∗,m+ k∗ + 1, · · · with ∆ = [∆1,∆2]′.
We reject the null hypothesis if the detecting statistic (detector) Γ(m, k) exceeds a bound-
ary function g(m, k) for some k ≥ 1. The detector and boundary function must be designed
such that
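\[
\lim_{m \to \infty} P(\tau_m < \infty) = \alpha, \quad \text{under } H_0, \tag{3.2}
\]
\[
\lim_{m \to \infty} P(\tau_m < \infty) = 1, \quad \text{under } H_1, \tag{3.3}
\]
where the stopping time $\tau_m$ is defined by
\[
\tau_m = \begin{cases}
\inf\{ k \ge 1 : |\Gamma(m, k)| \ge g(m, k) \}, \\
\infty, \quad \text{if } |\Gamma(m, k)| < g(m, k) \text{ for all } k = 1, 2, \cdots.
\end{cases} \tag{3.4}
\]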
Condition (3.2) ensures that the probability of a false alarm is given by α, while Condition
(3.3) means that we reject the hypothesis of no change with a probability approaching one
under the alternative.
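In code, the stopping rule (3.4) amounts to finding the first index at which the detector reaches the boundary. The sketch below assumes the detector and boundary have already been evaluated on a finite grid $k = 1, \ldots, K$; the function name is hypothetical.

```python
import numpy as np

def stopping_time(detector, boundary):
    """First k >= 1 with |detector[k-1]| >= boundary[k-1]; np.inf if there is none."""
    detector = np.asarray(detector, dtype=float)
    boundary = np.asarray(boundary, dtype=float)
    crossed = np.abs(detector) >= boundary
    if not crossed.any():
        return np.inf                       # no alarm within the monitored horizon
    return int(np.argmax(crossed)) + 1      # argmax returns the first True; k is 1-based

print(stopping_time([0.1, -0.5, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]))
```

In the example call the first crossing occurs at the third observation of the monitoring period, so the function returns 3.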
To investigate the asymptotic properties of the monitoring test, we impose the following
assumption.
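Assumption 3.1 For every $m$, there are two independent sequences of Wiener processes
$\{W_{1,m}(t), t \ge 0\}$, $\{W_{2,m}(t), t \ge 0\}$ and a constant $\sigma > 0$ such that, for some $\nu > 2$,
\[
\sup_{1 \le k < \infty} \frac{1}{k^{1/\nu}} \left| \sum_{t=m+1}^{m+k} \epsilon_t - \sigma W_{1,m}(k) \right| = O_p(1), \tag{3.5}
\]
\[
\sum_{t=1}^{m} \epsilon_t - \sigma W_{2,m}(m) = o_p(m^{1/\nu}). \tag{3.6}
\]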
The sequence ϵt satisfying Conditions (3.5) and (3.6) includes not only an i.i.d. sequence but
also a dependent sequence with some regularity conditions, as discussed by Aue and Horvath
(2004). For example, if a sequence $\epsilon_t$ is generated by $\epsilon_t = \sum_{j=0}^{\infty} c_j \delta_{t-j}$, where $\delta_j$ are i.i.d.
random variables with mean 0, variance $\sigma^2$, and $E|\delta_t|^{\nu} < \infty$ for some $\nu > 2$ and if $\delta_t$ has a
smooth density and $c_j$ satisfies some regularity conditions given by Horvath (1997), then
Assumption 3.1 holds as given by Example 2.2 of Aue and Horvath (2004). From Conditions
(3.5) and (3.6), we can derive the following approximation:
\[
\sup_{1 \le m < \infty} \frac{1}{m^{1/\nu}} \left| \sum_{t=1}^{m} \left( \frac{t}{m} \right)^i \epsilon_t - \sigma \int_0^m \left( \frac{x}{m} \right)^i dW_{2,m}(x) \right| = O_p(1), \quad i = 1, 2, \ldots, p. \tag{3.7}
\]
See, for example, Aue et al. (2008a).
3.3 Monitoring Procedure for a Change in the Trend
3.3.1 CUSUM-based monitoring procedure
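The CUSUM procedure is based on the future residuals of the model given by
\[
\hat{\epsilon}_t = y_t - x_t' \hat{\beta}_m, \tag{3.8}
\]
where $\hat{\beta}_m$ is an OLS estimator from the historical data given by
\[
\hat{\beta}_m = \left( \sum_{t=1}^{m} x_t x_t' \right)^{-1} \sum_{t=1}^{m} x_t y_t.
\]
The CUSUM detector is defined by
\[
\Gamma(m, k) = \frac{1}{\hat{\sigma}_m} \tilde{\Gamma}(m, k) \quad \text{where} \quad \tilde{\Gamma}(m, k) = \sum_{t=m+1}^{m+k} \hat{\epsilon}_t,
\]
for $k = 1, 2, \ldots$, where $\hat{\sigma}_m^2$ is the consistent estimator of $\sigma^2$ obtained from the training period.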
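A minimal sketch of how the detector could be computed from a series is given below. The function name is hypothetical, and the variance estimator (residual variance from the training regression with a degrees-of-freedom correction) is an assumption that is only appropriate for serially uncorrelated errors; with dependent errors a long-run variance estimator would be needed in its place.

```python
import numpy as np

def cusum_detector(y, m):
    """Detector Gamma(m, k), k = 1, ..., len(y) - m, for x_t = [1, t/m]'."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    t = np.arange(1, n + 1)
    x = np.column_stack([np.ones(n), t / m])            # constant and scaled trend
    beta_m, *_ = np.linalg.lstsq(x[:m], y[:m], rcond=None)   # OLS on the training period only
    resid_hist = y[:m] - x[:m] @ beta_m
    # assumption: sigma estimated by the training-period residual variance (i.i.d. errors)
    sigma_m = np.sqrt(resid_hist @ resid_hist / (m - 2))
    resid_future = y[m:] - x[m:] @ beta_m               # future residuals as in (3.8)
    return np.cumsum(resid_future) / sigma_m

# Illustrative data only: linear trend plus noise, with no break
rng = np.random.default_rng(0)
m, n = 100, 300
y = 1.0 + 0.5 * np.arange(1, n + 1) / m + rng.standard_normal(n)
gamma_mk = cusum_detector(y, m)
```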
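Letting $k/m = \lambda$, (B.9) in the proof of Theorem 3.1(i) provides the asymptotic distribution
of the detector as follows:
\[
\frac{1}{\hat{\sigma}_m \sqrt{m}} \sum_{t=m+1}^{m+k} \hat{\epsilon}_t \Rightarrow (\lambda + 1) W_1\!\left( \frac{\lambda}{\lambda + 1} \right) + \sqrt{3} \lambda (\lambda + 1) W_2(1) =: G(\lambda), \tag{3.9}
\]
where $W_1(\cdot)$ and $W_2(\cdot)$ are independent Wiener processes. This process has zero mean and
a growing variance
\[
3 \lambda^2 (\lambda + 1)^2 + \lambda (\lambda + 1), \tag{3.10}
\]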
which implies that a constant boundary cannot be used for the monitoring procedure because
the detecting statistic will eventually exceed a constant boundary and the null hypothesis
will be rejected with a probability approaching one even if the parameter is stable.
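The variance in (3.10) follows directly from (3.9), using the independence of $W_1$ and $W_2$ together with $\mathrm{Var}(W(s)) = s$:

\[
\mathrm{Var}(G(\lambda)) = (\lambda + 1)^2 \cdot \frac{\lambda}{\lambda + 1} + 3 \lambda^2 (\lambda + 1)^2 \cdot 1
= \lambda (\lambda + 1) + 3 \lambda^2 (\lambda + 1)^2 .
\]

The first term grows like $\lambda^2$ and the second like $\lambda^4$, so the standard deviation of $G(\lambda)$ is of order $\lambda^2$ for large $\lambda$.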
As noted by Chu et al. (1996), the growing variance of the asymptotic distribution of the
detector induces an increasing monitoring boundary. The boundary function cannot grow
too slowly, because the monitoring procedure would then have a high probability of a type I
error, while a boundary function that grows too fast will result in low power of
the test. The distribution of G(λ) shows that the first term is dominated by the second one
as λ increases, which determines the growth rate of the limiting process, and this enables us
to find a suitable form of the boundary function. Based on the boundary function proposed
by Horvath et al. (2004), we also allow for a flexible adjustment of the test to deal with
an early break in the monitoring period in terms of the parameter γ ∈ (0, 1/2). Thus, we
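design the boundary function such that it grows at the rate $\sqrt{3} \lambda^{\gamma} (\lambda + 1)^{2-\gamma}$. This means
that the probability of the excess over the boundary can be controlled to maintain a proper
size. Then, we propose the boundary function given by
\[
g(m, k) = c \sqrt{3m} \left( \frac{k}{m} \right)^{\gamma} \left( 1 + \frac{k}{m} \right)^{2-\gamma}, \quad \text{for some } 0 < \gamma < \frac{1}{2}.
\]
This boundary function over $\sqrt{m}$ grows approximately at the rate $\lambda^{\gamma} (1 + \lambda)^{2-\gamma}$ to ensure that
$P(|G(\lambda)| \ge c \sqrt{3} \lambda^{\gamma} (1 + \lambda)^{2-\gamma})$ equals $\alpha$ for some $c$. The parameter $\gamma$ is a tuning parameter,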
which must be determined by a researcher. Because the shape of the boundary function affects
the detection property for structural change, γ would be chosen based on that property. The
other possible choice of γ may be based on Anatolyev and Kosenok (2018), which introduced
a reasonable criterion for the shape of boundaries which requires that the size of the test be
uniformly distributed over the testing period. From our preliminary simulations, we found
that the test using the boundary function with γ = 0.35 has such a property. Therefore, in
the simulations and the empirical analysis in later sections, we use g(m, k) with γ = 0.35.
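The boundary function is simple to evaluate on a grid of monitoring times. In the sketch below the function name is hypothetical and the value of `c` passed in the example is a placeholder, not a critical value from Table 3.1.

```python
import numpy as np

def boundary(m, k, c, gamma=0.35):
    """g(m, k) = c * sqrt(3 m) * (k/m)^gamma * (1 + k/m)^(2 - gamma)."""
    lam = np.asarray(k, dtype=float) / m
    return c * np.sqrt(3.0 * m) * lam ** gamma * (1.0 + lam) ** (2.0 - gamma)

m = 100
k = np.arange(1, 4 * m + 1)            # monitoring horizon of 4m observations
g = boundary(m, k, c=1.0)              # c = 1.0 is a placeholder value
```

Because both $\lambda^{\gamma}$ and $(1 + \lambda)^{2-\gamma}$ are increasing in $\lambda$, the boundary is increasing in $k$, as required of a monitoring boundary.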
We next derive the limiting properties of the procedure in Theorem 3.1.
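Theorem 3.1 Suppose that Assumption 3.1 holds.
(i) Under the null hypothesis, we have
\[
\lim_{m \to \infty} P\left( \sup_{1 \le k < \infty} |\Gamma(m, k)| \le g(m, k) \right)
= P\left( \sup_{0 \le t \le 1} \left| \frac{1 - t}{\sqrt{3} t^{\gamma}} W_1(t) + t^{1-\gamma} W_2(1) \right| \le c \right),
\]
where $\{W_1(t), 0 \le t < \infty\}$ and $\{W_2(t), 0 \le t < \infty\}$ are independent Wiener processes.
(ii) Suppose that $\Delta_2 \ne 0$. Then, under the alternative, we have
\[
\sup_{1 \le k < \infty} \frac{|\Gamma(m, k)|}{g(m, k)} \to \infty, \quad \text{as } m \to \infty.
\]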
In Theorem 3.1(i), the critical value c = c(α) determined by the significance level α can
be obtained from the asymptotic distribution of the detector. In practice, the monitoring
period of the procedure cannot go to infinity; instead, a researcher determines how long
he/she would like to monitor a change (the length of the monitoring horizon). Therefore, we
suppose that k ranges from 1 to κm for some κ > 0. This means that we start testing at
time m+ 1 and stop at time m+ κm. Then, the critical values are obtained from
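\[
\lim_{m \to \infty} P\left( \sup_{1 \le k \le \kappa m} |\Gamma(m, k)| \ge g(m, k) \right)
= P\left( \sup_{0 \le t \le \frac{\kappa}{\kappa + 1}} \left| \frac{1 - t}{\sqrt{3} t^{\gamma}} W_1(t) + t^{1-\gamma} W_2(1) \right| \ge c(\alpha) \right) = \alpha, \tag{3.11}
\]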
which depends on the various selections of κ and γ in the boundary function. We choose
κ = 1, 2, . . . , 8, and γ = 0.05, 0.15, . . . , 0.45, and approximate Brownian motions using 1,000
independent normal random variables with 100,000 replications to obtain the critical values
in Table 3.1.
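The simulation of critical values from (3.11) can be sketched as below. This is an illustrative approximation only: the function names, the grid size, the number of replications, and the seed are assumptions chosen to keep the example fast, whereas the text uses 1,000 normal increments and 100,000 replications.

```python
import math
import random


def simulate_sup(kappa: float, gamma: float, n_steps: int, rng: random.Random) -> float:
    """One draw of sup over 0 < t <= kappa/(kappa+1) of
    |(1 - t) / (sqrt(3) t^gamma) * W1(t) + t^(1 - gamma) * W2(1)|."""
    dt = 1.0 / n_steps
    t_max = kappa / (kappa + 1.0)
    w2_at_1 = rng.gauss(0.0, 1.0)  # W2(1) ~ N(0, 1), independent of W1
    w1 = 0.0
    sup_value = 0.0
    for i in range(1, n_steps + 1):
        t = i * dt
        if t > t_max + 1e-12:
            break
        w1 += rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one grid step
        value = abs((1.0 - t) / (math.sqrt(3.0) * t ** gamma) * w1
                    + t ** (1.0 - gamma) * w2_at_1)
        sup_value = max(sup_value, value)
    return sup_value


def critical_values(kappa, gamma, alphas, n_steps=200, n_reps=2000, seed=0):
    """Empirical (1 - alpha) quantiles of the simulated suprema."""
    rng = random.Random(seed)
    draws = sorted(simulate_sup(kappa, gamma, n_steps, rng) for _ in range(n_reps))
    result = {}
    for a in alphas:
        index = max(0, min(n_reps - 1, math.ceil((1.0 - a) * n_reps) - 1))
        result[a] = draws[index]
    return result
```

A call such as `critical_values(1, 0.35, [0.01, 0.05, 0.10])` returns one approximate c(α) per significance level; a coarse grid understates the supremum, so finer grids and more replications are needed for values comparable to Table 3.1.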
Theorem 3.1(i) shows that the asymptotic distribution is composed of two independent
Wiener processes. As t → 0, because of the law of the iterated logarithm, the condition
γ < 1/2 ensures that the term $|W_1(t)|/t^{\gamma}$ converges to zero and consequently the asymptotic
distribution tends to zero. As t → 1, the first component associated with $W_1(t)$ converges
to zero and thus the distribution of the procedure is determined by the second Wiener
process. As a result, the proposed boundary function enables the CUSUM monitoring procedure
to maintain a nondegenerate and finite limit. Theorem 3.1(ii) implies that the CUSUM
monitoring test is consistent and that (3.3) is satisfied. From the proof of Theorem 3.1(ii),
we can see that the diverging order of the detecting statistic under the alternative crucially
depends on the term $\sup_{1\le k<\infty}|\sum_{t=m+k^*}^{m+k} x_t'\Delta|/g(m,k)$. This term is guaranteed to diverge
to infinity as long as $\Delta_2 \neq 0$.
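The monitoring rule can be sketched in a few lines. The sketch assumes, in line with (B.3), that the detector is the cumulative sum of out-of-sample residuals based on the training-sample OLS estimates, divided by an estimate of σ; the function names and the inputs `c` and `sigma` are hypothetical placeholders rather than values from the text.

```python
import math
from typing import Optional, Sequence, Tuple


def ols_linear_trend(y: Sequence[float], m: int) -> Tuple[float, float]:
    """OLS of y_t on x_t = [1, t/m] over the training sample t = 1, ..., m."""
    xs = [t / m for t in range(1, m + 1)]
    x_bar = sum(xs) / m
    y_bar = sum(y[:m]) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(xs, y[:m]))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cusum_stopping_time(y: Sequence[float], m: int, c: float, sigma: float,
                        gamma: float = 0.35) -> Optional[int]:
    """First k with |Gamma(m, k)| >= g(m, k); None if the boundary is never crossed."""
    intercept, slope = ols_linear_trend(y, m)
    cusum = 0.0
    for k in range(1, len(y) - m + 1):
        t = m + k
        cusum += y[t - 1] - intercept - slope * t / m  # residual at monitoring time t
        lam = k / m
        bound = c * math.sqrt(3.0 * m) * lam ** gamma * (1.0 + lam) ** (2.0 - gamma)
        if abs(cusum) / sigma >= bound:
            return k
    return None
```

The returned k is the detection time measured from the end of the training period, so the delay relative to a break at k∗ is k − k∗.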
3.3.2 Extension to higher order polynomials
We extend the CUSUM test to sequentially detect changes in models with higher order
polynomial trend. In this case, βt is a (p+ 1)× 1 parameter vector and the regressor xt is a
(p+ 1) dimensional deterministic vector of the form
\[
x_t = \left[1, \frac{t}{m}, \left(\frac{t}{m}\right)^2, \ldots, \left(\frac{t}{m}\right)^p\right]'.
\]
The detecting statistic of the CUSUM procedure is defined as in the previous subsection and,
as shown in the proof of Theorem 3.2, the asymptotic distribution of $\Gamma(m,k)/\sqrt{m}$ is given by,
for k/m = λ,
\[
G(\lambda) = (\lambda+1)W_1\left(\frac{\lambda}{\lambda+1}\right) + c'(\lambda)W(1),
\]
where $W_1(\cdot)$ and $W(\cdot)$ are 1- and (p+1)-dimensional Wiener processes independent of each
other, $c(\lambda) = D^{-1/2}L^{-1}\{c_{1+\lambda} - (1+\lambda)c_1\}$, $c_{1+\lambda}$ and $C$ are a (p+1)×1 vector and a (p+1)×(p+1)
matrix with the ith and (i, j)th elements given by $c_{1+\lambda}(i) = (1+\lambda)^i/i$ and $C(i,j) = 1/(i+j-1)$,
respectively, and D and L are obtained from the Cholesky decomposition of C given
by $C = LDL'$ (see the proof of Theorem 3.2). For example, G(λ) becomes equal to (B.9) for
p = 1, while it is expressed as, for p = 2,
\[
G(\lambda) = (\lambda+1)W_1\left(\frac{\lambda}{\lambda+1}\right) + \sqrt{3}\lambda(\lambda+1)W_3(1) + \sqrt{5}\lambda(\lambda+1)(2\lambda+1)W_4(1),
\]
where W1(·), W3(·), and W4(·) are independent 1-dimensional Wiener processes.
The process G(·) has mean zero and a growing variance given by
\[
\lambda(\lambda+1) + c'(\lambda)c(\lambda),
\]
where $c'(\lambda)c(\lambda) = \{c_{1+\lambda} - (1+\lambda)c_1\}'C^{-1}\{c_{1+\lambda} - (1+\lambda)c_1\}$ is a polynomial in λ of order
2(p + 1) from (B.33). There is no common rule for the exact expression of c′(λ)c(λ). For
example, we can show that it is $3\lambda^2(1+\lambda)^2$ and $4\lambda^2(1+\lambda)^2(5\lambda^2+5\lambda+2)$ for p = 1 and 2,
respectively.
As in the case of linear trend, we need to take a diverging rate into account to determine
the boundary function. From the structure of c′(λ)c(λ), the highest order $\lambda^{2(p+1)}$ comes from
the (p + 1)th element of $c_{1+\lambda}$, which is $(1+\lambda)^{p+1}/(p+1)$, and the (p + 1, p + 1)th element
of $C^{-1}$, and we need to obtain its coefficient. In general, the (i, j)th element of $C^{-1}$ is given
by Choi (1983) as
\[
C^{-1}(i,j) = (-1)^{i+j}(i+j-1)\binom{p+i}{p+1-j}\binom{p+j}{p+1-i}\binom{i+j-2}{i-1}^{2},
\]
and thus the coefficient associated with $\lambda^{2(p+1)}$ becomes
\[
f(p) = \frac{(2p+1)\binom{2p}{p}^{2}}{(p+1)^{2}}.
\]
For example, f(1) = 3 and f(2) = 20. We then propose the boundary function given by
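The coefficient f(p) and the inverse Hilbert matrix formula can be checked with exact rational arithmetic. The sketch below uses only the Python standard library; the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def hilbert_inverse_entry(p: int, i: int, j: int) -> int:
    """(i, j) element (1-based) of the inverse of the (p+1) x (p+1) matrix C(i, j) = 1/(i+j-1)."""
    sign = -1 if (i + j) % 2 else 1
    return (sign * (i + j - 1) * comb(p + i, p + 1 - j) * comb(p + j, p + 1 - i)
            * comb(i + j - 2, i - 1) ** 2)


def f(p: int) -> Fraction:
    """Leading coefficient f(p) = (2p + 1) * binom(2p, p)^2 / (p + 1)^2."""
    return Fraction((2 * p + 1) * comb(2 * p, p) ** 2, (p + 1) ** 2)


def check_inverse(p: int) -> bool:
    """Verify that C times the claimed inverse equals the identity, exactly."""
    n = p + 1
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            entry = sum(Fraction(1, i + k - 1) * hilbert_inverse_entry(p, k, j)
                        for k in range(1, n + 1))
            if entry != (1 if i == j else 0):
                return False
    return True
```

Here f(p) equals the (p+1, p+1)th element of the inverse divided by (p+1)², which is how the coefficient arises from the (p+1)th element of the vector in the quadratic form.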
\[
g(m,k) = c\sqrt{f(p)}\sqrt{m}\left(\frac{k}{m}\right)^{\gamma}\left(1+\frac{k}{m}\right)^{p+1-\gamma} \quad \text{for some } 0<\gamma<\frac{1}{2},
\]
which enables the monitoring procedure to maintain a finite limit.² We summarize the
limiting properties of the procedure in Theorem 3.2.
Theorem 3.2 Suppose that Assumption 3.1 holds.
(i) Under the null hypothesis, we have
\[
\lim_{m\to\infty} P\left(\sup_{1\le k<\infty}|\Gamma(m,k)| \le g(m,k)\right)
= P\left(\sup_{0\le t\le 1}\left|\frac{(1-t)^{p}W_1(t)}{\sqrt{f(p)}\,t^{\gamma}} + \frac{(1-t)^{p+1}c'(t/(1-t))W(1)}{\sqrt{f(p)}\,t^{\gamma}}\right| \le c\right),
\]
where $W(\cdot) = [W_2(\cdot), W_3(\cdot), \cdots, W_{p+2}(\cdot)]'$ is a (p + 1)-dimensional Wiener process
independent of $W_1(\cdot)$.
(ii) Suppose that $\Delta_{p+1} \neq 0$, where $\Delta_{p+1}$ is the last element of $\Delta = [\Delta_1, \Delta_2, \cdots, \Delta_{p+1}]'$.
Then, under the alternative, we have
\[
\sup_{1\le k<\infty}\frac{|\Gamma(m,k)|}{g(m,k)} \to \infty, \quad \text{as } m\to\infty.
\]
² In the case of quadratic trend, the critical values for γ = 0.35 and κ = 1 are 1.2800, 0.9742, and 0.8215 at significance levels 1%, 5%, and 10%, respectively. We conducted simulations with m = 250 and the end of the monitoring period given by 2m (κ = 1), and found that the size of the test is well controlled (for example, the empirical size at the 5% nominal level is 0.053).
The null limiting distribution depends on the polynomial order. For example, it is given
in Theorem 3.1(i) for a model with linear trend, while in the case of a quadratic trend, it can be
expressed as
\[
P\left(\sup_{0\le t\le 1}\left|\frac{(1-t)^{2}W_1(t)}{\sqrt{20}\,t^{\gamma}} + \frac{\sqrt{3}\,t^{1-\gamma}(1-t)W_3(1)}{\sqrt{20}} + \frac{t^{1-\gamma}(1+t)W_4(1)}{2}\right| \le c\right).
\]
The constant term c can be calculated by simulations using these expressions. When the
monitoring horizon is terminated at some point, we can modify the expression as in (3.11).
3.4 Asymptotic Distributions of the Stopping Times
The monitoring procedure generally rejects the null hypothesis of no change possibly with
a delay after the break. Since a shorter detection delay implies a more reliable conclusion
and lower cost, the speed of detection is an important measure for the sequential test. Thus,
we expect that the procedure should reject the null hypothesis as soon as possible under the
alternative. In this section, we derive the limiting distribution of the stopping time based
on the CUSUM monitoring test as well as that based on the test presented by Qi et al.
(2016), who also proposed a monitoring test for a change in a trend. We thus investigate the
theoretical difference in their limiting behaviors in a linear trend model.
To investigate the asymptotic property of the stopping time based on the CUSUM detec-
tor, we make the following assumption related to k∗ and ∆.
Assumption 3.2 (a) There exists a θ > 0 such that $k^* = O(m^{\theta})$ for some $0 \le \theta < \frac{1-2\gamma}{4(1-\gamma)}$.
(b) Let δ = d′∆ = ∆1 + ∆2 where d = [1, 1]′. There are positive constants C1 and C2 such
that,
C1 ≤ |δ| ≤ C2.
Assumption 3.2(a) implies that the order of the change-point k∗ is related to the historical
sample size m. We focus on the same case as Aue et al. (2009) that a break occurs shortly
after the end of the training period.³ Assumption 3.2(b) assumes that the magnitude of the
change is bounded and excludes, for technical reasons, the case where a change in the trend
coefficient is in the opposite direction to a change in the constant with the same magnitude
(∆2 = −∆1).
Remark 3.1 We focus on the limiting properties of the delay time under early change scenar-
ios, while some papers relaxed the assumptions on the time of the change in linear regression
models and showed that Page’s CUSUM and MOSUM procedures are able to detect late
changes faster than the ordinary CUSUM procedure. See, for example, Aue et al. (2012),
Fremdt (2014, 2015), and Stohr (2019).
Under this assumption, we derive the asymptotic distribution of the stopping time based
on the CUSUM detector.
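Theorem 3.3 Suppose that Assumptions 3.1 and 3.2 hold. Then, we have
\[
\lim_{m\to\infty} P(\tau_m \le a_m + b_m z) = \Phi(z),
\]
where Φ(·) is the cumulative distribution function of a standard normal distribution,
\[
a_m = \left(c_m^{1-\gamma} - \frac{1}{c_m^{\gamma}\delta}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta\right)^{\frac{1}{1-\gamma}}, \quad
b_m = \frac{\sqrt{c_m}\,\sigma}{(1-\gamma)|\delta|}, \quad \text{and} \quad
c_m = \left(\frac{\sqrt{3}c\sigma m^{1/2-\gamma}}{|\delta|}\right)^{\frac{1}{1-\gamma}}.
\]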
We next derive the asymptotic distribution of the delay time based on the maximal-type
fluctuation procedure of Qi et al. (2016). The detector of Qi et al. (2016) is defined by
$\hat{\Gamma}^{FL}_m(k,\ell) = \hat{\sigma}_m^{-1}\Gamma^{FL}_m(k,\ell)$, where ℓ is proportional to k as supposed in Assumption 3.3(a)
and
\[
\Gamma^{FL}_m(k,\ell) = \sum_{t=1}^{m+k}y_t - \frac{m+k}{m}\sum_{t=1}^{m}y_t - \frac{k(m+k)}{\ell(m+\ell)}\left(\sum_{t=1}^{m+\ell}y_t - \frac{m+\ell}{m}\sum_{t=1}^{m}y_t\right),
\]
and the boundary function is given by
\[
g^{FL}(m,k) = c^{FL}\sqrt{m}\left(\frac{m+k}{m}\right)^{2}.
\]
³ The order of k∗ can be slightly relaxed to $O(m^{3/8})$ for the fluctuation test, as is seen in the proof of Lemma C.9(i) and (C.30).
Then, the corresponding stopping time is defined as
\[
\tau^{FL}_m = \inf\left\{k \ge 1 : |\hat{\Gamma}^{FL}_m(k,\ell)| \ge g^{FL}(m,k)\right\}.
\]
The critical value $c^{FL} = c^{FL}(\alpha)$ is determined by the asymptotic distribution of the detector
under the null hypothesis. See Qi et al. (2016) for more details.⁴
To derive the limiting distribution of the stopping time based on the fluctuation test, we
need to make an additional assumption.
Assumption 3.3 (a) Let $\ell = [(k+1)/\eta]$ with η > 1, where the bracket means the integer part of
the term.
(b) Suppose that $\Delta_1 + \Delta_2/2 \neq 0$.
Assumption 3.3(a) defines the relationship between the parameters k and ℓ, where k is
required to be greater than ℓ. Qi et al. (2016) used ℓ = k/2 in their simulations but it can
be relaxed as in (a). Assumption 3.3(b) is a necessary technical condition to ensure that the
limit results in Lemmas C.10, C.11, and C.14 can be derived.
The asymptotic distribution of the stopping time based on the fluctuation test is given in
the following theorem.
Theorem 3.4 Suppose that Assumptions 3.1, 3.2(a), and 3.3 hold. Then, we have
\[
\lim_{m\to\infty} P\left(\tau^{FL}_m \le a^{FL}_m + b^{FL}_m z\right) = \Phi(z),
\]
where
\[
a^{FL}_m = \left(\frac{c^{FL}\sigma m^{3/2}}{\left|\left(\frac{1}{\eta}-1\right)\left(\delta-\frac{\Delta_2}{2}\right)\right|}\right)^{1/2}
\quad \text{and} \quad
b^{FL}_m = \frac{\sqrt{\eta-1}\,\sigma m}{2\sqrt{a^{FL}_m}\left|\left(\frac{1}{\eta}-1\right)\left(\delta-\frac{\Delta_2}{2}\right)\right|}.
\]
Theorems 3.3 and 3.4 show that the limit distributions of the stopping times are normal.
The sequences $a_m$, $b_m$, $a^{FL}_m$, and $b^{FL}_m$ are used to standardize the variables to obtain the
limiting distributions. We can show that $\tau_m/a_m \xrightarrow{p} 1$ and $\tau^{FL}_m/a^{FL}_m \xrightarrow{p} 1$. Since both
$a_m$ and $a^{FL}_m$ diverge to infinity, both the stopping times also go to infinity. Of importance
is the difference in the diverging rates. In the case of the CUSUM detector, $a_m$ is of the
order $m^{(1/2-\gamma)/(1-\gamma)}$, which lies between $m^{0}$ and $m^{1/2}$ depending on γ, whereas $a^{FL}_m$
is of the order $m^{3/4}$. This implies that the stopping time based on the fluctuation test grows
at a faster rate than that based on the CUSUM detector. Thus, we expect that the delay
time based on the CUSUM procedure tends to be shorter than that based on the fluctuation
one. In other words, the monitoring test based on the CUSUM detector has a theoretical
advantage over that based on the fluctuation test as long as the break occurs early in the
monitoring period. This is confirmed by the Monte Carlo simulations in Section 3.5.
⁴ Qi et al. (2016) also considered the range-type test. However, its finite sample property is inferior to the maximum-type test considered in this article according to their simulations and thus we focus on the latter test.
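The difference in diverging rates can be made concrete with a small numerical sketch. It evaluates the exponent of m for the CUSUM delay and the sequences c_m and a_m^FL from Theorems 3.3 and 3.4; the function names and all numerical inputs (critical values, σ, δ, ∆2, η) are hypothetical placeholders, not values from the text.

```python
import math


def cusum_delay_exponent(gamma: float) -> float:
    """Exponent of m in the order of a_m: (1/2 - gamma) / (1 - gamma)."""
    if not 0.0 < gamma < 0.5:
        raise ValueError("gamma must lie strictly between 0 and 1/2")
    return (0.5 - gamma) / (1.0 - gamma)


FLUCTUATION_DELAY_EXPONENT = 0.75  # a_m^FL is of the order m^(3/4)


def c_m(m: int, c: float, sigma: float, delta: float, gamma: float) -> float:
    """c_m = (sqrt(3) c sigma m^(1/2 - gamma) / |delta|)^(1 / (1 - gamma))."""
    return (math.sqrt(3.0) * c * sigma * m ** (0.5 - gamma) / abs(delta)) ** (1.0 / (1.0 - gamma))


def a_fl(m: int, c_fl: float, sigma: float, delta: float, delta2: float, eta: float) -> float:
    """a_m^FL = (c^FL sigma m^(3/2) / |(1/eta - 1)(delta - delta2/2)|)^(1/2)."""
    scale = abs((1.0 / eta - 1.0) * (delta - delta2 / 2.0))
    return math.sqrt(c_fl * sigma * m ** 1.5 / scale)


# With gamma = 0.35 the CUSUM exponent is about 0.23, well below 3/4.
exponent = cusum_delay_exponent(0.35)
```

Quadrupling m multiplies c_m by 4 raised to this exponent, while a_m^FL grows by the factor 4^(3/4), which illustrates why the fluctuation delay grows faster under an early break.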
3.5 Finite Sample Properties
In this section, we investigate the finite sample performance of the tests considered in the
previous section. The data-generating process we consider is given by
\[
y_t = x_t'(\beta_0 + \Delta 1_{\{t\ge m+k^*\}}) + \epsilon_t, \quad \epsilon_t = \rho\epsilon_{t-1} + e_t, \quad t = 1, \ldots, m, m+1, \ldots, m+\kappa m,
\]
where $x_t = [1, t/m]'$, $\beta_0 = [1, 1]'$, and $e_t \sim \text{i.i.d. } N(0, (1-\rho)^2)$, meaning that the long-run
variance of $\epsilon_t$ is 1. The settings for ∆ and k∗ are explained later. In finite samples, we
consider that the monitoring period stops at 4m (κ = 3), while the training period m is 50,
100, and 250. The parameter γ of the boundary function is set to 0.35. We allow for serial
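correlation in the errors and the coefficient ρ is 0.4 and 0.8. To obtain a consistent estimate
$\hat{\sigma}_m^2$ of the variance of the errors based on the historical data, we use the prewhitened kernel
estimator proposed by Andrews and Monahan (1992), which is defined by
\[
\hat{\sigma}_m^2 = (1-\hat{\rho})^{-1}\hat{\Omega}(1-\hat{\rho})^{-1},
\]
where $\hat{\Omega}$ is a standard kernel heteroskedasticity and autocorrelation consistent estimator given
by
\[
\hat{\Omega} = \frac{m}{m-2}\left\{\hat{\Gamma}_0 + \sum_{j=1}^{m-1}k\left(\frac{j}{S_m}\right)(\hat{\Gamma}_j + \hat{\Gamma}_j')\right\}, \quad \text{with } \hat{\Gamma}_j = \frac{1}{m}\sum_{t=j+1}^{m}\hat{e}_t\hat{e}_{t-j}.
\]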
The coefficient estimate $\hat{\rho}$ and residuals $\hat{e}_t$ are obtained by regressing $\hat{\epsilon}_t$ on $\hat{\epsilon}_{t-1}$, where the
OLS residuals $\hat{\epsilon}_t$ are calculated from regressing $y_t$ on $x_t$. In this simulation, we use the
quadratic spectral kernel as k(·), which is defined by
\[
k(x) = \frac{25}{12\pi^2x^2}\left(\frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5)\right),
\]
while the bandwidth $S_m$ is selected based on Andrews (1991) given by
\[
S_m = 1.3221(\alpha(2)m)^{1/5}, \quad \text{where } \alpha(2) = \frac{4\hat{\rho}^2\hat{\sigma}_e^4}{(1-\hat{\rho})^8}\Big/\frac{\hat{\sigma}_e^4}{(1-\hat{\rho})^4},
\]
and $\hat{\sigma}_e^2$ is the estimated variance of the residuals $\hat{e}_t$. The significance level is set to 0.05, the
number of replications is 5,000, and all computations are conducted using the GAUSS matrix
language.
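The simulation design above can be sketched in Python as follows (the original computations use GAUSS). This is a minimal illustration, assuming an initial error of zero and an integer κ; the function names are hypothetical, and `rho_hat` stands for the AR(1) coefficient that enters α(2).

```python
import math
import random
from typing import List, Sequence


def simulate_dgp(m: int, kappa: int, rho: float, delta: Sequence[float],
                 k_star: int, seed: int = 0) -> List[float]:
    """y_t = x_t'(beta_0 + Delta * 1{t >= m + k_star}) + eps_t with AR(1) errors."""
    rng = random.Random(seed)
    beta0 = (1.0, 1.0)
    eps = 0.0  # assumption: the error process starts at zero
    y = []
    for t in range(1, m + kappa * m + 1):
        eps = rho * eps + rng.gauss(0.0, 1.0 - rho)  # sd (1 - rho): long-run variance 1
        shift = 1.0 if t >= m + k_star else 0.0
        x = (1.0, t / m)
        y.append(sum(xi * (b + shift * d) for xi, b, d in zip(x, beta0, delta)) + eps)
    return y


def qs_kernel(x: float) -> float:
    """Quadratic spectral kernel, with the limiting value 1 at x = 0."""
    if x == 0.0:
        return 1.0
    z = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi ** 2 * x ** 2) * (math.sin(z) / z - math.cos(z))


def andrews_bandwidth(rho_hat: float, m: int) -> float:
    """S_m = 1.3221 (alpha(2) m)^(1/5); the sigma_e^4 terms cancel in alpha(2)."""
    alpha2 = 4.0 * rho_hat ** 2 / (1.0 - rho_hat) ** 4
    return 1.3221 * (alpha2 * m) ** 0.2
```

For example, `simulate_dgp(50, 3, 0.4, (1.0, 1.0), 1)` generates one path with m = 50, κ = 3, and an early break of magnitude b = 1.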
Table 3.2 summarizes the empirical sizes of the monitoring procedures. The sizes of both
the CUSUM and fluctuation tests in the cases of m = 100, ρ = 0.4 and m = 250, ρ = 0.4 are
controlled well, whereas for the other cases, the sizes are relatively distorted, especially when
ρ = 0.8.⁵
For a comparison of the power performance, the change in the coefficient is specified by
∆ = bd, where the magnitude b is set to 0, 0.5, 1.0, 1.5, 2.0 and d = [1, 1]′. Table 3.3 reports
the powers of the monitoring tests corresponding to an early break (m+ k∗ = m+ 1), a late
break (m + k∗ = 1.8m), and a very late break (m + k∗ = 2.5m), respectively. The results
imply that neither the CUSUM test nor the fluctuation test dominates the other in small
samples. The CUSUM test is more powerful than the fluctuation test when the break occurs
soon after the historical data. For the later breaks in the monitoring period, the powers of the
fluctuation test are higher than those of the CUSUM test. Additionally, all the monitoring
tests are more powerful for a larger magnitude of change.
We further investigate the effect of the time of the change on the power performance. The
break date is controlled by m+ k∗ = 1.1m, 1.2m, ..., 2.5m (k∗ = 0.1m, 0.2m, ..., 1.5m); Table
3.4 reports the results. The CUSUM-based monitoring procedure performs better under
early-change settings and the earlier the change occurs, the better the test performs. On the
contrary, the power of the fluctuation test increases as k∗ changes from 0.1m to 1.5m.
Next, we compare the delay times of the two procedures since the detection speed is
regarded as an important indicator of the performance of the monitoring tests. We set
d = [1, 1]′ and consider the breaks occurring at m + k∗ = m + 1, 1.8m, and 2.5m. Table
3.5 summarizes the minimum value, quartiles, and maximum value of the delay time. If a
change occurs rapidly after the end of the training period, the CUSUM version rejects the
null hypothesis earlier than the fluctuation version and the minimum value, quartiles, and
maximum value of the delay time for the CUSUM test are much smaller. This is consistent
with the theoretical result that the stopping time based on the CUSUM version grows at a
slower rate than for the fluctuation version. For the later breaks in the monitoring period,
there is only a slight difference between the two procedures and the delay time of the fluctuation test
is shorter in some cases.
⁵ We also conducted simulations for the range-type test by Qi et al. (2016), but the maximum-type test outperforms the range-type one in many cases and thus we report only the former result.
In summary, as far as power is concerned, the CUSUM procedure works significantly
better than the fluctuation test for the early change, while this fact is reversed for the later
changes. Furthermore, the much faster detection of the CUSUM test for the early change and
the similarity of the two procedures in detection time under later-change scenarios indicate that
neither version is uniformly superior to the other in every scenario considered. Therefore, we
recommend using both monitoring procedures in practical analyses.
3.6 Empirical Example
In this section, we apply the monitoring tests to sequentially detect parameter instability in
macroeconomic time series. The following simple linear trend model is considered:
yt = α+ βt+ ϵt,
where yt is the logarithm of real GDP measured in the domestic currency. Three countries,
namely Denmark, Japan, and New Zealand, are selected and quarterly data are taken from
the International Financial Statistics database. The sample periods are different for each
country and Figure 3.1 describes the logarithm of the real GDP series of the three countries.
We first apply the historical test proposed by Perron and Yabu (2009) to detect structural
changes in the whole sample and find that the null hypothesis of no change in the parameters
is rejected. Then, we estimate the break date by minimizing the sum of the squared residuals
and test for parameter constancy in the period before the estimated break. Because GDP
series can have a unit root, we also investigate the presence of a unit root. The results in Panel
(a) of Table 3.6 indicate that the parameters are stable and that the unit root hypothesis can
be rejected for the three series, which implies that we can set the period before the estimated
break as the training period.
We next investigate whether the two procedures can successfully detect the parameter
changes in the three GDP series and compare their speed of detection. We set different
training periods corresponding to the different timings of the break in the monitoring period
and the results are summarized in Panel (b) of Table 3.6. In the case of Japan, the end points
of the training period are set to 1998Q4, 2006Q2, and 2007Q4, which correspond to the late,
moderate, and early breaks in the monitoring period. We can see that all the tests reject
the null hypothesis of no change in the parameters, except the fluctuation test when the
break occurs early. Moreover, the fluctuation test has a much longer detection delay than the
CUSUM procedure in the case of the late break, while for a moderate change, the fluctuation
test performs better than the other. For Denmark, both procedures can detect an early
change and the CUSUM method rejects the null hypothesis of no change much earlier than
the fluctuation method, which is consistent with our theoretical analysis that the CUSUM
test is expected to have a shorter detection delay than the fluctuation one if the change
occurs early in the monitoring period.⁶ We also find evidence of the better performance of
the CUSUM-based test for an early change in the case of New Zealand, while the fluctuation
test is good at detecting a relatively late break in the monitoring period, as shown in the
simulations.
3.7 Conclusion
In this chapter, we applied the CUSUM test based on OLS residuals to sequentially detect
structural change in models with a trend. The asymptotic property of the CUSUM monitoring
procedure was investigated and the results indicated that it can successfully reject the null
hypothesis of no change. We further derived the asymptotic distributions of the stopping
times based on the CUSUM and fluctuation procedures and found that the delay time based
on the CUSUM procedure is shorter than that based on the fluctuation one in the case of
an early break. This tendency is confirmed in finite samples, although the fluctuation test
works better in some cases. Because the location of the break point is unknown in practice,
it would be desirable to consider the monitoring procedure robust to the break location. One
of the possible strategies may be to construct the hybrid procedure using both the CUSUM
and fluctuation tests, which is our future work.
⁶ Apart from the first break date 2001Q4, we also find that the GDP of Denmark exhibited several break points (2003Q3, 2006Q1, and 2008Q3) in the period 2002Q1–2018Q2. The monitoring procedures often require a long stable period with enough observations as the training period. If structural changes occur frequently, as in the period 2002Q1–2018Q2 for Denmark, we cannot find a suitable training period to apply the monitoring procedures.
Appendix B. Proofs of Theorems 3.1 and 3.2
Proof of Theorem 3.1
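In this appendix, we replace $\hat{\sigma}_m^2$ with $\sigma^2$ because it is consistent under both the null and
alternative hypotheses. Let
\[
C_m = \sum_{t=1}^{m}x_tx_t', \quad C = \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{bmatrix}.
\]
Then, we have
\[
\left\|\frac{1}{m}C_m - C\right\| = O\left(\frac{1}{m}\right), \tag{B.1}
\]
\[
\left\|\left(\frac{1}{m}C_m\right)^{-1} - C^{-1}\right\| = O\left(\frac{1}{m}\right). \tag{B.2}
\]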
We rewrite Γ(m, k) as follows:
\[
\sum_{t=m+1}^{m+k}\hat{\epsilon}_t = \sum_{t=m+1}^{m+k}\epsilon_t - \sum_{t=m+1}^{m+k}x_t'\left(\sum_{t=1}^{m}x_tx_t'\right)^{-1}\sum_{t=1}^{m}x_t\epsilon_t. \tag{B.3}
\]
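Lemma B.1 Under Assumption 3.1,
\[
\sup_{1\le k<\infty}\frac{\left|\sum_{t=m+1}^{m+k}x_t'C_m^{-1}\sum_{t=1}^{m}x_t\epsilon_t - \frac{1}{m}\sum_{t=m+1}^{m+k}x_t'C^{-1}\sum_{t=1}^{m}x_t\epsilon_t\right|}{h(m,k)} = o_p(1), \quad \text{as } m\to\infty,
\]
where h(m, k) = g(m, k)/c with c = c(α) determined by the given significance level α.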
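Proof of Lemma B.1. Relations (3.6) and (3.7) imply that
\[
\left\|\sum_{t=1}^{m}x_t\epsilon_t\right\| = O_p(\sqrt{m}), \quad \text{as } m\to\infty. \tag{B.4}
\]
Putting together (B.1), (B.2), and (B.4), we have
\begin{align*}
&\sup_{1\le k<\infty}\frac{\left|\sum_{t=m+1}^{m+k}x_t'C_m^{-1}\sum_{t=1}^{m}x_t\epsilon_t - \frac{1}{m}\sum_{t=m+1}^{m+k}x_t'C^{-1}\sum_{t=1}^{m}x_t\epsilon_t\right|}{h(m,k)} \\
&\quad\le \sup_{1\le k<\infty}\frac{\left\|\frac{1}{m}\sum_{t=m+1}^{m+k}x_t'\right\|\left\|\left(\frac{1}{m}C_m\right)^{-1} - C^{-1}\right\|\left\|\sum_{t=1}^{m}x_t\epsilon_t\right\|}{h(m,k)} \\
&\quad= \sup_{1\le k<\infty}\frac{\left\|\left[\frac{k}{m},\ \frac{k}{m}+\frac{k^2}{2m^2}+\frac{k}{2m^2}\right]\right\|}{h(m,k)}\,O\left(\frac{1}{m}\right)O_p(\sqrt{m}).
\end{align*}
Since $k/m + k^2/(2m^2)$ is the dominating term of $\sum_{t=m+1}^{m+k}x_t/m$, and
\[
\sup_{1\le k<\infty}\frac{\left|\frac{k}{m}+\frac{k^2}{2m^2}\right|}{\left(\frac{k}{m}\right)^{\gamma}\left(1+\frac{k}{m}\right)^{2-\gamma}} \le \sup_{1\le k<\infty}\left(\frac{k}{m}\right)^{1-\gamma}\left(1+\frac{k}{m}\right)^{\gamma-1} = O(1),
\]
the proof is complete.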
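Lemma B.2 Under Assumption 3.1,
\[
\sup_{1\le k<\infty}\frac{\left|\sum_{t=m+1}^{m+k}\epsilon_t - \frac{1}{m}\sum_{t=m+1}^{m+k}x_t'C^{-1}\sum_{t=1}^{m}x_t\epsilon_t - G(m,k)\right|}{h(m,k)} = o_p(1), \tag{B.5}
\]
where
\begin{align*}
G(m,k) &= \sigma W_{1,m}(k) + \frac{k}{m}G_1(m) + \left(\frac{k^2}{m^2}+\frac{k}{m^2}\right)G_2(m), \\
G_1(m) &= 2\sigma W_{2,m}(m) - 6\sigma\int_0^m\frac{x}{m}dW_{2,m}(x), \\
G_2(m) &= 3\sigma W_{2,m}(m) - 6\sigma\int_0^m\frac{x}{m}dW_{2,m}(x).
\end{align*}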
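Proof of Lemma B.2. The left-hand side of (B.5) becomes
\begin{align*}
&\sup_{1\le k<\infty}\frac{\left|\sum_{t=m+1}^{m+k}\epsilon_t - \frac{1}{m}\sum_{t=m+1}^{m+k}x_t'C^{-1}\sum_{t=1}^{m}x_t\epsilon_t - G(m,k)\right|}{h(m,k)} \\
&\quad\le \sup_{1\le k<\infty}\frac{\left|\sum_{t=m+1}^{m+k}\epsilon_t - \sigma W_{1,m}(k)\right|}{h(m,k)}
+ \sup_{1\le k<\infty}\frac{\frac{k}{m}\left|2\sum_{t=1}^{m}\epsilon_t - 6\sum_{t=1}^{m}\frac{t}{m}\epsilon_t - G_1(m)\right|}{h(m,k)} \\
&\qquad+ \sup_{1\le k<\infty}\frac{\left(\frac{k^2}{m^2}+\frac{k}{m^2}\right)\left|3\sum_{t=1}^{m}\epsilon_t - 6\sum_{t=1}^{m}\frac{t}{m}\epsilon_t - G_2(m)\right|}{h(m,k)} \\
&\quad= A_1 + A_2 + A_3.
\end{align*}
Under Assumption 3.1, 1/ν < 1/2, and γ < 1/2, we can see that $A_1 = o_p(1)$ because
\[
\sup_{1\le k\le m}\frac{\left|\sum_{t=m+1}^{m+k}\epsilon_t - \sigma W_{1,m}(k)\right|}{h(m,k)}
= \sup_{1\le k\le m}\frac{O_p(k^{1/\nu})m^{\gamma-1/2}}{k^{\gamma}(1+k/m)^{2-\gamma}}
\le \begin{cases} O_p(1)m^{\gamma-1/2} & \text{if } 1/\nu<\gamma \\ O_p(1)m^{1/\nu-1/2} & \text{if } 1/\nu\ge\gamma \end{cases}
= o_p(1),
\]
and in the case of m < k < ∞,
\[
\sup_{m<k<\infty}\frac{O_p(k^{1/\nu})m^{\gamma-1/2}}{k^{\gamma}(1+k/m)^{2-\gamma}}
< \begin{cases} O_p(1)m^{1/\nu-1/2} & \text{if } 1/\nu<\gamma \\ O_p(1)\sup_{m<k<\infty}k^{1/\nu-\gamma}m^{\gamma-1/2}\left(\frac{m}{k}\right)^{2-\gamma} < O_p(1)m^{1/\nu-1/2} & \text{if } 1/\nu\ge\gamma \end{cases}
= o_p(1).
\]
For $A_2$, we have
\begin{align*}
A_2 &\le \sup_{1\le k<\infty}\frac{\frac{k}{m}\,2\left|\sum_{t=1}^{m}\epsilon_t - \sigma W_{2,m}(m)\right|}{h(m,k)}
+ \sup_{1\le k<\infty}\frac{\frac{k}{m}\,6\left|\sum_{t=1}^{m}\frac{t}{m}\epsilon_t - \sigma\int_0^m\frac{x}{m}dW_{2,m}(x)\right|}{h(m,k)} \\
&= \sup_{1\le k<\infty}\frac{k/m}{\sqrt{3}(k/m)^{\gamma}(1+k/m)^{2-\gamma}}\,O_p(m^{1/\nu-1/2}).
\end{align*}
Since
\begin{align*}
\sup_{1\le k\le m}\frac{k/m}{(k/m)^{\gamma}(1+k/m)^{2-\gamma}} &= \sup_{1\le k\le m}\frac{(k/m)^{1-\gamma}}{(1+k/m)^{2-\gamma}} = O(1), \\
\sup_{m<k<\infty}\frac{k/m}{(k/m)^{\gamma}(1+k/m)^{2-\gamma}} &< \sup_{m<k<\infty}\left(\frac{k}{m}\right)^{1-\gamma}\left(\frac{m}{k}\right)^{2-\gamma} = O(1),
\end{align*}
we obtain $A_2 = o_p(1)$. Similarly, for the term $A_3$, we can see that
\[
A_3 = O\left(\sup_{1\le k<\infty}\frac{\frac{k^2}{m^2}+\frac{k}{m^2}}{\sqrt{3}(k/m)^{\gamma}(1+k/m)^{2-\gamma}}\right)O_p(m^{1/\nu-1/2}) = o_p(1).
\]
Thus, the proof is complete.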
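Proof of Theorem 3.1. (i) Since the distribution of $\{W_{1,m}(t), W_{2,m}(t), 0\le t<\infty\}$ does
not depend on m, we omit the subscript m in the following. We first establish that
\begin{align}
\frac{1}{\sigma}\sup_{1\le k<\infty}\frac{|G(m,k)|}{h(m,k)}
&\overset{D}{=} \sup_{1\le k<\infty}\frac{\left|W_1(k) + \frac{k}{m}\left(2W_2(m) - 6\int_0^m\frac{x}{m}dW_2(x)\right) + \frac{k^2}{m^2}\left(3W_2(m) - 6\int_0^m\frac{x}{m}dW_2(x)\right)\right|/\sqrt{m}}{\sqrt{3}(k/m)^{\gamma}(1+k/m)^{2-\gamma}} + o_p(1) \tag{B.6} \\
&\overset{D}{=} \sup_{1\le k<\infty}\frac{\left|\left(\frac{k}{m}+1\right)W_1\left(\frac{k}{k+m}\right) + \sqrt{3}\frac{k}{m}\left(\frac{k}{m}+1\right)W_2(1)\right|}{\sqrt{3}(k/m)^{\gamma}(1+k/m)^{2-\gamma}} + o_p(1). \tag{B.7}
\end{align}
The first equality in distribution holds because
\[
\sup_{1\le k<\infty}\frac{\left|\frac{k}{m^2}\left(3\frac{1}{\sqrt{m}}W_2(m) - 6\frac{1}{\sqrt{m}}\int_0^m\frac{x}{m}dW_2(x)\right)\right|}{\sqrt{3}(k/m)^{\gamma}(1+k/m)^{2-\gamma}} = o_p(1),
\]
which can be shown by noting that the process $(3W_2(m) - 6\int_0^m\frac{x}{m}dW_2(x))/\sqrt{m}$ has zero
expectation and a finite variance independent of m, which implies that
$\left|(3W_2(m) - 6\int_0^m\frac{x}{m}dW_2(x))/\sqrt{m}\right| = O_p(1)$, and
\[
\sup_{1\le k<\infty}\frac{|k/m^2|}{\sqrt{3}(k/m)^{\gamma}(1+k/m)^{2-\gamma}} = o(1).
\]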
For the second equality in distribution given by (B.7), let k/m = λ, and the numerator
of (B.6) can be written as
\[
G(\lambda) = W_1(\lambda) + \lambda\left(2\frac{1}{\sqrt{m}}W_2(m) - 6\frac{1}{\sqrt{m}}\int_0^m\frac{x}{m}dW_2(x)\right) + \lambda^2\left(3\frac{1}{\sqrt{m}}W_2(m) - 6\frac{1}{\sqrt{m}}\int_0^m\frac{x}{m}dW_2(x)\right).
\]
This is a Gaussian process with zero mean and covariance function given by
\[
E[G(s)G(t)] = s(t+1) + 3st(s+1)(t+1) \quad \text{for } s<t, \tag{B.8}
\]
which can be decomposed into the two independent processes $X_1(\lambda)$ and $X_2(\lambda)$ with the
covariance functions s(t+1) and 3st(s+1)(t+1), respectively. Since the covariance function
of $X_1(\lambda)$ can be written as $E[X_1(s)X_1(t)] = u(s)v(t)$ for s < t, where u(s) = s and v(t) =
t + 1, we can use the technique of Doob (1949) to transform $X_1(\lambda)$ into a Brownian motion.
Let a(λ) = u(λ)/v(λ) = λ/(λ + 1), which is continuous and monotonically increasing with
inverse b(λ) = λ/(1 − λ). Then, $X_1(b(\lambda))/v(b(\lambda))$ is a standard Brownian motion because
$E[X_1(b(\lambda))/v(b(\lambda))] = 0$ and the covariance function is min(s, t). This implies that
$X_1(\lambda) \overset{D}{=} v(\lambda)W(a(\lambda))$. We can also see that $X_2(\lambda) \overset{D}{=} \sqrt{3}\lambda(\lambda+1)W(1)$. Hence, we transform the
Gaussian process G(λ) into a functional of the two independent Brownian motions $W_1(\cdot)$ and
$W_2(\cdot)$ as follows:
\[
G(\lambda) \overset{D}{=} (\lambda+1)W_1\left(\frac{\lambda}{\lambda+1}\right) + \sqrt{3}\lambda(\lambda+1)W_2(1), \tag{B.9}
\]
and we thus obtain the second equality in distribution in (B.7).
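As a worked check of (B.9), the covariance of the right-hand side can be computed directly and compared with (B.8). The display below is a verification sketch, using only the independence of the two Wiener processes and $E[W(u)W(v)] = \min(u,v)$; for s < t we have s/(s+1) < t/(t+1).

```latex
\begin{align*}
E\left[(s+1)W_1\!\left(\tfrac{s}{s+1}\right)(t+1)W_1\!\left(\tfrac{t}{t+1}\right)\right]
  &= (s+1)(t+1)\min\!\left(\tfrac{s}{s+1},\tfrac{t}{t+1}\right)
   = (s+1)(t+1)\,\tfrac{s}{s+1} = s(t+1), \\
E\left[\sqrt{3}s(s+1)W_2(1)\cdot\sqrt{3}t(t+1)W_2(1)\right]
  &= 3st(s+1)(t+1)\,E[W_2(1)^2] = 3st(s+1)(t+1).
\end{align*}
```

The sum of the two terms is s(t+1) + 3st(s+1)(t+1), which agrees with (B.8).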
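We next derive the limiting distribution of (B.7). The continuity of
$\{(t+1)W_1(\frac{t}{t+1}) + \sqrt{3}t(t+1)W_2(1)\}/\{\sqrt{3}t^{\gamma}(1+t)^{2-\gamma}\}$ on [0, T] for a given T > 0 yields that
\[
\sup_{1\le k\le mT}\frac{\left|\left(\frac{k}{m}+1\right)W_1\left(\frac{k}{k+m}\right) + \sqrt{3}\frac{k}{m}\left(\frac{k}{m}+1\right)W_2(1)\right|}{\sqrt{3}(k/m)^{\gamma}(1+k/m)^{2-\gamma}}
\to \sup_{0<t\le T}\frac{\left|(t+1)W_1\left(\frac{t}{t+1}\right) + \sqrt{3}t(t+1)W_2(1)\right|}{\sqrt{3}t^{\gamma}(t+1)^{2-\gamma}} \quad \text{a.s.}
\]
For k ≥ mT, we have
\[
\sup_{mT\le k<\infty}\frac{\left|\left(\frac{k}{m}+1\right)W_1\left(\frac{k}{k+m}\right)\right|}{\sqrt{3}(k/m)^{\gamma}(1+k/m)^{2-\gamma}}
\le \sup_{T\le t<\infty}\frac{\left|(t+1)W_1\left(\frac{t}{t+1}\right)\right|}{\sqrt{3}t^{\gamma}(t+1)^{2-\gamma}}
= \sup_{T\le t<\infty}\frac{\left|W_1\left(\frac{t}{t+1}\right)\right|}{\sqrt{3}t^{\gamma}(t+1)^{1-\gamma}}, \tag{B.10}
\]
and for any δ > 0,
\[
\lim_{T\to\infty}P\left(\sup_{T\le t<\infty}\frac{\left|W_1\left(\frac{t}{t+1}\right)\right|}{\sqrt{3}t^{\gamma}(t+1)^{1-\gamma}} > \delta\right) = 0. \tag{B.11}
\]
We also have
\[
\sup_{mT\le k<\infty}\left|\frac{\sqrt{3}\frac{k}{m}\left(\frac{k}{m}+1\right)}{\sqrt{3}(k/m)^{\gamma}(1+k/m)^{2-\gamma}} - 1\right|
\le \sup_{T\le t<\infty}\left|\frac{\sqrt{3}t(t+1)}{\sqrt{3}t^{\gamma}(1+t)^{2-\gamma}} - 1\right| \to 0, \quad \text{as } T\to\infty. \tag{B.12}
\]
Putting together (B.10), (B.11), and (B.12), we can see that as m → ∞ and T → ∞,
\[
\left|\sup_{mT\le k<\infty}\frac{\left|\left(\frac{k}{m}+1\right)W_1\left(\frac{k}{k+m}\right) + \sqrt{3}\frac{k}{m}\left(\frac{k}{m}+1\right)W_2(1)\right|}{\sqrt{3}(k/m)^{\gamma}(1+k/m)^{2-\gamma}}
- \sup_{T\le t<\infty}\frac{\left|(t+1)W_1\left(\frac{t}{t+1}\right) + \sqrt{3}t(t+1)W_2(1)\right|}{\sqrt{3}t^{\gamma}(t+1)^{2-\gamma}}\right| = o_p(1).
\]
Hence,
\[
\sup_{1\le k<\infty}\frac{\left|G\left(\frac{k}{m}\right)\right|}{h(m,k)} \xrightarrow{d} \sup_{0\le t<\infty}\frac{\left|(t+1)W_1\left(\frac{t}{t+1}\right) + \sqrt{3}t(t+1)W_2(1)\right|}{\sqrt{3}t^{\gamma}(t+1)^{2-\gamma}}. \tag{B.13}
\]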
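From the scalar transformation, we have
\[
\left\{\frac{(t+1)W_1\left(\frac{t}{t+1}\right) + \sqrt{3}t(t+1)W_2(1)}{\sqrt{3}t^{\gamma}(t+1)^{2-\gamma}},\ 0\le t<\infty\right\}
\overset{D}{=} \left\{\frac{1-t}{\sqrt{3}t^{\gamma}}W_1(t) + t^{1-\gamma}W_2(1),\ 0\le t\le 1\right\}.
\]
Therefore, we obtain
\[
\sup_{1\le k<\infty}\frac{\left|G\left(\frac{k}{m}\right)\right|}{h(m,k)} \xrightarrow{d} \sup_{0\le t\le 1}\left|\frac{1-t}{\sqrt{3}t^{\gamma}}W_1(t) + t^{1-\gamma}W_2(1)\right|. \tag{B.14}
\]
Theorem 3.1(i) is obtained from Lemmas B.1 and B.2, (B.3), and (B.14).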
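The scalar transformation can be spelled out as follows. This is a verification sketch in which the variable u is introduced here for illustration: set u = t/(t+1), so that t = u/(1−u) and t + 1 = 1/(1−u), with u ranging over [0, 1) as t ranges over [0, ∞).

```latex
\begin{align*}
\frac{(t+1)W_1\!\left(\frac{t}{t+1}\right)}{\sqrt{3}\,t^{\gamma}(t+1)^{2-\gamma}}
  &= \frac{(1-u)^{-1}W_1(u)}{\sqrt{3}\,u^{\gamma}(1-u)^{-\gamma}(1-u)^{-(2-\gamma)}}
   = \frac{1-u}{\sqrt{3}\,u^{\gamma}}W_1(u), \\
\frac{\sqrt{3}\,t(t+1)W_2(1)}{\sqrt{3}\,t^{\gamma}(t+1)^{2-\gamma}}
  &= \frac{u(1-u)^{-2}W_2(1)}{u^{\gamma}(1-u)^{-2}}
   = u^{1-\gamma}W_2(1).
\end{align*}
```

Adding the two lines gives the process on the right-hand side of the equality in distribution, indexed by u in place of t.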
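(ii) Let k > k∗ and $\Delta = [\Delta_1, \Delta_2]'$. Then, the detector is expressed as
\[
\sum_{t=m+1}^{m+k}\hat{\epsilon}_t = \sum_{t=m+1}^{m+k}\epsilon_t - \sum_{t=m+1}^{m+k}x_t'(\hat{\beta}_m - \beta_0) + \sum_{t=m+k^*}^{m+k}x_t'\Delta.
\]
From Theorem 3.1, we have
\[
\sup_{1\le k<\infty}\left|\sum_{t=m+1}^{m+k}\epsilon_t - \sum_{t=m+1}^{m+k}x_t'(\hat{\beta}_m - \beta_0)\right|\Big/h(m,k) = O_p(1).
\]
We then focus on the last term and will show that
\[
\left|\sum_{t=m+k^*}^{m+k}x_t'\Delta\right|\Big/h(m,k) = \left|\Delta_1 + \Delta_2 + \frac{k+k^*}{2m}\Delta_2\right|(k-k^*+1)/h(m,k) \to \infty. \tag{B.15}
\]
Suppose that $k^* = O(m^{\theta})$ with 0 ≤ θ < 1 and $\Delta_1 + \Delta_2 \neq 0$. Let $k = m^{\bar{\theta}} + k^* - 1$ with $\bar{\theta}$
satisfying $\max(\theta, (1-2\gamma)/2(1-\gamma)) < \bar{\theta} < 1$. Then, we have
\[
\left|\Delta_1 + \Delta_2 + \frac{k+k^*}{2m}\Delta_2\right| \to |\Delta_1 + \Delta_2| > 0. \tag{B.16}
\]
\[
\frac{k-k^*+1}{m^{1/2}(k/m)^{\gamma}(1+k/m)^{2-\gamma}} = \frac{O(m^{\bar{\theta}(1-\gamma)})}{O(m^{1/2-\gamma})} = O(m^{\bar{\theta}(1-\gamma)-(1/2-\gamma)}) \to \infty. \tag{B.17}
\]
Thus, (B.16) and (B.17) imply (B.15).
When $\Delta_1 + \Delta_2 = 0$, let k = m + k∗ − 1 ($\bar{\theta} = 1$). Since $\Delta_2 \neq 0$, we have
\[
\left|\Delta_1 + \Delta_2 + \frac{k+k^*}{2m}\Delta_2\right| \to \frac{|\Delta_2|}{2} > 0.
\]
Since (B.17) holds with $\bar{\theta} = 1$, we have (B.15).
Similarly, we can prove (B.15) for θ ≥ 1 and thus the test is consistent.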
Proof of Theorem 3.2
Defining $x(s) = [1, s, s^2, \cdots, s^p]'$, let
\[
c_{1+k/m} = \int_0^{1+k/m}x(s)ds, \quad C = \int_0^1 x(s)x(s)'ds, \quad a_k = \sum_{t=m+1}^{m+k}x_t, \quad \text{and} \quad C_m = \sum_{t=1}^{m}x_tx_t'.
\]
In the following, we denote the ith and (i, j)th elements of $c_{1+k/m}$ and C as $c_{1+k/m}(i)$ and
C(i, j), respectively. For example, $c_{1+k/m}(i) = (1+k/m)^i/i$ and C(i, j) = 1/(i + j − 1). We
also note that $\frac{1}{m}C_m \to C$ and $c_1$ is the first column of C.
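Lemma B.3 As m → ∞, we have
\[
\left\|\frac{1}{m}a_k - (c_{1+k/m} - c_1)\right\| = O\left(\frac{1}{m}\left\{\left(1+\frac{k}{m}\right)^{p} - 1\right\}\right), \tag{B.18}
\]
\[
\left\|\frac{1}{m}C_m - C\right\| = O\left(\frac{1}{m}\right), \tag{B.19}
\]
\[
\left\|\left(\frac{1}{m}C_m\right)^{-1} - C^{-1}\right\| = O\left(\frac{1}{m}\right). \tag{B.20}
\]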
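Proof of Lemma B.3. We first note that for positive integers a and b (a ≤ b),
\[
\int_{\frac{a-1}{m}}^{\frac{b}{m}}s^i ds = \sum_{t=a}^{b}\int_{\frac{t-1}{m}}^{\frac{t}{m}}s^i ds \ge \sum_{t=a}^{b}\int_{\frac{t-1}{m}}^{\frac{t}{m}}\left(\frac{t-1}{m}\right)^i ds = \frac{1}{m}\sum_{t=a}^{b}\left(\frac{t-1}{m}\right)^i. \tag{B.21}
\]
Using this relation with a = m + 1 and b = m + k, we have, for i = 1, · · · , p + 1,
\begin{align}
0 &\le \frac{1}{m}a_k(i) - (c_{1+k/m}(i) - c_1(i)) \tag{B.22} \\
&= \frac{1}{m}\sum_{t=m+1}^{m+k}\left(\frac{t}{m}\right)^{i-1} - \int_1^{1+\frac{k}{m}}s^{i-1}ds \le \frac{1}{m}\left\{\left(1+\frac{k}{m}\right)^{i-1} - 1\right\}. \tag{B.23}
\end{align}
Because i = p + 1 is the highest order, we obtain (B.18). Similarly, we can see that (B.19)
and (B.20) hold using (B.21) with a = 1 and b = m.
From (B.3), we can express Γ(m, k) as follows:
\[
\Gamma(m,k) = \sum_{t=m+1}^{m+k}\epsilon_t - a_k'C_m^{-1}\sum_{t=1}^{m}x_t\epsilon_t. \tag{B.24}
\]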
Lemma B.4 Under Assumption 3.1,
\[
\sup_{1\le k<\infty}\frac{\left|a_k'C_m^{-1}\sum_{t=1}^{m}x_t\epsilon_t - (c_{1+k/m} - c_1)'C^{-1}\sum_{t=1}^{m}x_t\epsilon_t\right|}{h(m,k)} = o_p(1), \quad \text{as } m\to\infty, \tag{B.25}
\]
where h(m, k) = g(m, k)/c with c = c(α) determined by the given significance level α.
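Proof of Lemma B.4. Putting together (B.4), (B.18), and (B.20), the left-hand side of
(B.25) is bounded by
\begin{align}
&\sup_{1\le k<\infty}\frac{\left\|\frac{1}{m}a_k\right\|\left\|\left(\frac{1}{m}C_m\right)^{-1} - C^{-1}\right\|\left\|\sum_{t=1}^{m}x_t\epsilon_t\right\|}{h(m,k)}
+ \sup_{1\le k<\infty}\frac{\left\|\frac{1}{m}a_k - (c_{1+k/m} - c_1)\right\|\left\|C^{-1}\sum_{t=1}^{m}x_t\epsilon_t\right\|}{h(m,k)} \notag \\
&\quad= \sup_{1\le k<\infty}\frac{\left\|\frac{1}{m}a_k\right\|}{h(m,k)}O\left(\frac{1}{m}\right)O_p(\sqrt{m})
+ \frac{O\left(\frac{1}{m}\left\{\left(1+\frac{k}{m}\right)^{p} - 1\right\}\right)}{h(m,k)}O_p(\sqrt{m}). \tag{B.26}
\end{align}
The first element of $\frac{1}{m}a_k$ is k/m, while for i = 2, · · · , p + 1, we have, from (B.23),
\begin{align*}
\frac{1}{m}\sum_{t=m+1}^{m+k}\left(\frac{t}{m}\right)^{i-1}
&\le \int_1^{1+k/m}s^{i-1}ds + \frac{1}{m}\left\{\left(1+\frac{k}{m}\right)^{i-1} - 1\right\} \\
&= \frac{1}{i}\left\{\left(1+\frac{k}{m}\right)^{i} - 1\right\} + \frac{1}{m}\left\{\left(1+\frac{k}{m}\right)^{i-1} - 1\right\} \\
&\le \frac{k}{m}\left(1+\frac{k}{m}\right)^{i-1} + \frac{i-1}{m}\frac{k}{m}\left(1+\frac{k}{m}\right)^{i-2},
\end{align*}
where the last inequality holds by applying the mean-value theorem to each term of the right
hand side of the equality. Noting that for all 1 ≤ i ≤ p + 1,
\[
\sup_{1\le k<\infty}\frac{\frac{k}{m}\left(1+\frac{k}{m}\right)^{i-1}}{\left(\frac{k}{m}\right)^{\gamma}\left(1+\frac{k}{m}\right)^{p+1-\gamma}} \le \sup_{1\le k<\infty}\frac{\left(\frac{k}{m}\right)^{1-\gamma}}{\left(1+\frac{k}{m}\right)^{1-\gamma}} = O(1),
\]
we can see that $\sup\|\frac{1}{m}a_k\|/h(m,k) = O(m^{-1/2})$, which implies that the first term on the
right hand side of (B.26) is $o_p(1)$. In the same way, the second term is shown to be $o_p(1)$ and
the proof is complete.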
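Lemma B.5 Under Assumption 3.1,
\[
\sup_{1\le k<\infty}\frac{\left|\sum_{t=m+1}^{m+k}\epsilon_t - (c_{1+k/m} - c_1)'C^{-1}\sum_{t=1}^{m}x_t\epsilon_t - G(m,k)\right|}{h(m,k)} = o_p(1), \tag{B.27}
\]
where
\begin{align*}
G(m,k) &= \sigma\left(W_{1,m}(k) - (c_{1+k/m} - c_1)'C^{-1}G(m)\right), \\
G(m) &= \left[W_{2,m}(m), \int_0^m\frac{x}{m}dW_{2,m}(x), \ldots, \int_0^m\left(\frac{x}{m}\right)^{p}dW_{2,m}(x)\right]'.
\end{align*}
Proof of Lemma B.5. The proof is similar to that of Lemma B.2 and thus we omit the details.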
Proof of Theorem 3.2. (i) Since the distribution of $\{W_{1,m}(t), W_{2,m}(t), 0\le t<\infty\}$ does
not depend on m, we omit the subscript m in the following. We first establish that
\[
\frac{1}{\sigma}\sup_{1\le k<\infty}\frac{|G(m,k)|}{h(m,k)} \overset{D}{=} \sup_{1\le k<\infty}\frac{\left|\left(\frac{k}{m}+1\right)W_1\left(\frac{k}{k+m}\right) + c'\left(\frac{k}{m}\right)W(1)\right|}{\sqrt{f(p)}(k/m)^{\gamma}(1+k/m)^{p+1-\gamma}}, \tag{B.28}
\]
where
\[
c(x) = D^{-1/2}L^{-1}\{c_{1+x} - (1+x)c_1\}, \tag{B.29}
\]
L is a lower triangular matrix defined as, for i ≥ j,
\[
L(i,j) = (2j-1)\binom{2j-2}{j-1}\binom{2i-1}{i-j}\Big/(2i-1)\binom{2i-2}{i-1}, \tag{B.30}
\]
and D is a diagonal matrix defined as, for i = j,
\[
D(i,j) = 1\Big/(2i-1)\binom{2i-2}{i-1}^{2}. \tag{B.31}
\]
To see this, let k/m = λ and define $G(\lambda) = G(m,\lambda m)/(\sigma\sqrt{m}) = (W_{1,m}(\lambda m) - (c_{1+\lambda} - c_1)'C^{-1}G(m))/\sqrt{m}$.
This is a Gaussian process with mean zero and covariance function
given by
\begin{align*}
E[G(s)G(t)] &= s + (c_{1+s} - c_1)'C^{-1}(c_{1+t} - c_1) \\
&= s + c_{1+s}'C^{-1}c_{1+t} - (1+s) - (1+t) + 1 \\
&= c_{1+s}'C^{-1}c_{1+t} - (1+t)
\end{align*}
for s ≤ t, where we used the fact that $c_1'C^{-1} = [1, 0, \cdots, 0]$ because $c_1$ is the first column of
the symmetric matrix C. We further decompose the first term into the term of (1+s)(1+t)
and the higher order polynomial. Since $c_{1+s} = (1+s)c_1 + \{c_{1+s} - (1+s)c_1\}$, we have
\[
c_{1+s}'C^{-1}c_{1+t} = (1+s)(1+t) + \{c_{1+s} - (1+s)c_1\}'C^{-1}\{c_{1+t} - (1+t)c_1\}, \tag{B.32}
\]
because $c_1'C^{-1}\{c_{1+s} - (1+s)c_1\} = 0$. By using the Cholesky decomposition in Hitotumatu
(1988), the Hilbert matrix C can be decomposed as $C = LDL'$, where L and D are defined
in (B.30) and (B.31), respectively. Then, the second term of the right-hand side of (B.32)
becomes
\begin{align}
&\{c_{1+s} - (1+s)c_1\}'C^{-1}\{c_{1+t} - (1+t)c_1\} \notag \\
&\quad= \{c_{1+s} - (1+s)c_1\}'(L')^{-1}D^{-1/2}D^{-1/2}L^{-1}\{c_{1+t} - (1+t)c_1\} = c'(s)c(t), \tag{B.33}
\end{align}
where c(·) is defined in (B.29). Therefore, the covariance function can be expressed as
\[
E[G(s)G(t)] = s(1+t) + c'(s)c(t).
\]
In exactly the same way as the derivation of (B.9), we have, using Doob’s transformation,
\[
G(\lambda) \overset{D}{=} (\lambda+1)W_1\left(\frac{\lambda}{\lambda+1}\right) + c'(\lambda)W(1), \tag{B.34}
\]
where W (1) = [W2(1),W3(1), · · · ,Wp+2(1)]′ is a (p + 1)-dimensional Wiener processes and
W1(·),W2(·), · · · , and Wp+2(·) are independent Wiener processes.
We next derive the limiting distribution of (B.28). For given T > 0, we can see that
sup1≤k≤mT
∣∣∣( km + 1)W1(
kk+m) + c′
(km
)W (1)
∣∣∣√f(p)(k/m)γ(1 + k/m)p+1−γ
→ sup0<t≤T
∣∣∣(t+ 1)W1(t
t+1) + c′ (t)W (1)∣∣∣√
f(p)tγ(t+ 1)p+1−γa.s.
On the other hand, as in the proof of Theorem 3.1, we have
supT≤k<∞
∣∣∣( km + 1)W1(
kk+m)
∣∣∣√f(p)(k/m)γ(1 + k/m)p+1−γ
p−→ 0,
while we will also show that∥∥∥∥∥ supT≤k<∞
c′ (k/m)√f(p)(k/m)γ(1 + k/m)p+1−γ
− ℓ
∥∥∥∥∥→ 0, where ℓ = [0, 0, · · · , 1]′. (B.35)
To see this, we note that the first element of $c_{1+t} - (1+t)c_1$ is zero, while the $i$th element
for $i = 2, \cdots, p+1$ is given by
\[ \frac{(1+t)^i}{i} - \frac{1+t}{i} = \frac{1+t}{i}\left[(1+t)^{i-1} - 1\right] = \frac{t}{i} \times (\text{polynomial of } t \text{ of order } i-1). \]
Thus, the highest order in c(t) is p + 1 (corresponding to tp+1) from the (p + 1)th element
and its coefficient is given by 1/(p+1) times (p+1, p+1)th element of the upper triangular
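matrix $(L')^{-1}D^{-1/2}$, which is equivalent to $\sqrt{f(p)}$. Then,
\[ \left| \sup_{T\le k<\infty} \frac{\sqrt{f(p)}\,(k/m)^{p+1}}{\sqrt{f(p)}\,(k/m)^{\gamma}(1+k/m)^{p+1-\gamma}} - 1 \right| \to 0, \]
while for $i = 1, \cdots, p$,
\[ \left| \sup_{T\le k<\infty} \frac{(k/m)^{i}}{\sqrt{f(p)}\,(k/m)^{\gamma}(1+k/m)^{p+1-\gamma}} \right| \to 0, \]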
and thus we obtain (B.35). These results imply that
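\[ \sup_{1\le k<\infty} \frac{\left|\left(\frac{k}{m}+1\right)W_1\!\left(\frac{k}{k+m}\right) + c'(k/m)W(1)\right|}{\sqrt{f(p)}\,(k/m)^{\gamma}(1+k/m)^{p+1-\gamma}} \xrightarrow{d} \sup_{0<t<\infty} \frac{\left|(t+1)W_1\!\left(\frac{t}{t+1}\right) + c'(t)W(1)\right|}{\sqrt{f(p)}\,t^{\gamma}(t+1)^{p+1-\gamma}}. \]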
Finally, by changing t as in the proof of Theorem 3.1, we obtain the null limiting distribution.
(ii) Let $k > k^*$ and $\Delta = [\Delta_1, \Delta_2, \cdots, \Delta_{p+1}]'$. As demonstrated in Theorem 3.1, it is sufficient to
show that
\[ \left|\sum_{t=m+k^*}^{m+k} x_t'\Delta\right| \Big/ h(m,k) \to \infty. \tag{B.36} \]
The left-hand side of (B.36) can be decomposed into the term
\[ \left|\sum_{i=1}^{p+1} \frac{m}{i}\left\{\left(1+\frac{k}{m}\right)^{i} - \left(1+\frac{k^*-1}{m}\right)^{i}\right\}\Delta_i\right| \Big/ h(m,k) \tag{B.37} \]
and a negligible term. To see this, using (B.21) with a = m + k∗ and b = m + k, the ith
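element of $\sum_{t=m+k^*}^{m+k} x_t$ can be rewritten as
\begin{align*}
\sum_{t=m+k^*}^{m+k}\left(\frac{t}{m}\right)^{i-1} &= m\int_{1+\frac{k^*-1}{m}}^{1+\frac{k}{m}} x^{i-1}\,dx + O\left(\left(1+\frac{k}{m}\right)^{i-1} - \left(1+\frac{k^*-1}{m}\right)^{i-1}\right) \\
&= \frac{m}{i}\left\{\left(1+\frac{k}{m}\right)^{i} - \left(1+\frac{k^*-1}{m}\right)^{i}\right\} + O\left(\frac{k-k^*+1}{m}\left(1+\frac{k_i}{m}\right)^{i-2}\right), \tag{B.38}
\end{align*}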
where the last equality holds by applying the mean-value theorem with ki ∈ [k∗ − 1, k]. Note
that the second term of (B.38) does not appear for i = 1, while for i = 2, 3, · · · , p + 1, the
second term of (B.38) over $h(m,k)$ is negligible because
\[ \left|\frac{(k-k^*+1)(1+k_i/m)^{i-2}}{m\,h(m,k)}\right| < \left|\frac{k(1+k/m)^{i-2}}{m\,m^{1/2}(k/m)^{\gamma}(1+k/m)^{p+1-\gamma}}\right| = \left|\frac{(k/m)^{1-\gamma}}{m^{1/2}(1+k/m)^{p+3-\gamma-i}}\right| \to 0 \]
for any value of k.
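A quick numerical illustration of the first line of (B.38) may help. The sketch below assumes, as in the display, that the ith regressor is (t/m)^(i−1); the function names and the sample values of m, k∗, and k are hypothetical. It compares the partial sum with the leading integral term and with the remainder bound that follows from monotonicity of x^(i−1).

```python
def trend_sum(m, k_star, k, i):
    """Sum of (t/m)^(i-1) over t = m+k_star, ..., m+k."""
    return sum((t / m) ** (i - 1) for t in range(m + k_star, m + k + 1))


def leading_term(m, k_star, k, i):
    """(m/i) * [(1 + k/m)^i - (1 + (k_star-1)/m)^i], the integral term."""
    return (m / i) * ((1 + k / m) ** i - (1 + (k_star - 1) / m) ** i)


def remainder_bound(m, k_star, k, i):
    """(1 + k/m)^(i-1) - (1 + (k_star-1)/m)^(i-1), the order term in (B.38)."""
    return (1 + k / m) ** (i - 1) - (1 + (k_star - 1) / m) ** (i - 1)


if __name__ == "__main__":
    m, k_star, k = 200, 20, 150
    for i in (1, 2, 3, 4):
        gap = trend_sum(m, k_star, k, i) - leading_term(m, k_star, k, i)
        print(i, gap, remainder_bound(m, k_star, k, i))
```

For i = 1 the gap is zero up to rounding, matching the remark that the second term of (B.38) does not appear in that case; for i ≥ 2 the gap is nonnegative and no larger than the bound.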
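We next show that the term (B.37) diverges to infinity. Suppose that $k^* = O(m^{\theta})$
with $0 \le \theta < 1$ and $\Delta_1 + \Delta_2 + \cdots + \Delta_{p+1} \neq 0$. Let $k = m^{\tilde\theta} + k^* - 1$ with $\tilde\theta$ satisfying
$\max(\theta, (1-2\gamma)/(2(1-\gamma))) < \tilde\theta < 1$. Then, because $k/m \to 0$, we have, using the binomial
expansion,
\[ \left|\sum_{i=1}^{p+1} \frac{m}{i}\left\{\left(1+\frac{k}{m}\right)^{i} - \left(1+\frac{k^*-1}{m}\right)^{i}\right\}\Delta_i\right| = (k-k^*+1)\,|\Delta_1 + \Delta_2 + \cdots + \Delta_{p+1}|\,(1+o(1)). \]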
By combining this with (B.17), we can see that (B.37) goes to infinity and thus (B.36) holds.
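When $\Delta_1 + \Delta_2 + \cdots + \Delta_{p+1} = 0$, let $k = \xi m + k^* - 1$ ($\theta = 1$), where a positive real value
$\xi$ can be chosen such that
\[ \frac{1}{m}\sum_{i=1}^{p+1} \frac{m}{i}\left\{\left(1+\frac{k}{m}\right)^{i} - \left(1+\frac{k^*-1}{m}\right)^{i}\right\}\Delta_i \to \sum_{i=1}^{p+1} \frac{1}{i}\left\{(1+\xi)^{i} - 1\right\}\Delta_i \neq 0. \]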
Since m/h(m, k) → ∞ by (B.17) with θ = 1, we have (B.36).
Similarly, we can prove (B.36) for θ ≥ 1 and thus the test is consistent.
Appendix C. Proofs of Theorems 3.3 and 3.4
Proof of Theorem 3.3
The proof of Theorem 3.3 is based on the framework of Aue et al. (2009) through a series of
lemmas. The basic idea is to find a sequence $N = N(m,x)$ such that
\[ P\{\tau_m \ge N\} = P\left\{\max_{1\le k\le N} \frac{|\Gamma(m,k)|}{g(m,k)} \le 1\right\} \to \Phi(x), \quad \text{for all real } x. \]
Now, we define N as
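\begin{align*}
N^{1-\gamma} &= \frac{\sqrt{3}c\sigma m^{1/2-\gamma}}{|\delta|} - \frac{1}{c_m^{\gamma}\delta}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta - \sigma x\left(\frac{(\sqrt{3}c\sigma)^{1/2-\gamma}m^{(1/2-\gamma)^2}}{|\delta|^{3/2-2\gamma}}\right)^{\frac{1}{1-\gamma}} \\
&= c_m^{1-\gamma} - \frac{1}{c_m^{\gamma}\delta}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta - \frac{\sigma x}{|\delta|}c_m^{1/2-\gamma} \\
&= a_m^{1-\gamma} - x\,\frac{(1-\gamma)b_m}{c_m^{\gamma}}. \tag{C.1}
\end{align*}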
The following proof assumes that δ is positive and the same result can be derived under the
condition δ < 0. We first derive the order of the maximum of the partial sums of xt − d in
Lemmas C.1 and C.2, where d = [1, 1]′. Note that d is not the mean of regressor xt.
Lemma C.1 For $j = 1$ and 2, as $m \to \infty$,
\[ \max_{k^*\le k\le c_m} \left|\sum_{t=m+k^*}^{m+k}(x_{jt} - d_j)\right| = o(1), \tag{C.2} \]
\[ \max_{1\le k\le k^*} \frac{1}{k}\left|\sum_{t=m+1}^{m+k}(x_{jt} - d_j)\right| = o(1). \tag{C.3} \]
Proof of Lemma C.1. The result for j = 1 is obvious because the first element of xt is
unity. In the case of j = 2,
\[ \max_{k^*\le k\le c_m} \left|\sum_{t=m+k^*}^{m+k}\left(\frac{t}{m} - 1\right)\right| = \max_{k^*\le k\le c_m} \left|\frac{(k+k^*)(k-k^*+1)}{2m}\right| \le \frac{(c_m+k^*)(c_m-k^*+1)}{2m} = o(1), \]
since $k^{*2}/m = o(1)$ and $c_m^2/m = o(1)$ from Lemmas C.3(i) and (iii).
We can also see that
\[ \max_{1\le k\le k^*} \frac{1}{k}\left|\sum_{t=m+1}^{m+k}\left(\frac{t}{m} - 1\right)\right| = \max_{1\le k\le k^*} \left|\frac{k+1}{2m}\right| \le \frac{k^*+1}{2m} = o(1), \]
and we finish the proof.
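The closed forms used in this proof are simple arithmetic sums, and they can be confirmed exactly. The following sketch assumes the second regressor is t/m with d₂ = 1, as in the displays above; the function names are illustrative.

```python
from fractions import Fraction


def centered_trend_sum(m, a, b):
    """Exact value of the sum of (t/m - 1) over t = m+a, ..., m+b."""
    return sum(Fraction(t, m) - 1 for t in range(m + a, m + b + 1))


def closed_form(m, a, b):
    """(b + a)(b - a + 1) / (2m), the closed form used in the proof."""
    return Fraction((b + a) * (b - a + 1), 2 * m)


if __name__ == "__main__":
    for m, a, b in [(50, 1, 10), (100, 7, 40), (250, 30, 30)]:
        print(m, a, b, centered_trend_sum(m, a, b), closed_form(m, a, b))
```

Taking a = k∗ and b = k gives the first display, and a = 1, b = k divided by k gives (k + 1)/(2m) in the second.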
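Lemma C.2 Under Assumption 3.2, as $m \to \infty$,
\[ \frac{N}{c_m} \to 1 \quad \text{and} \quad \frac{a_m}{c_m} \to 1, \tag{C.4} \]
\[ \max_{k^*\le k\le N} \frac{1}{k^{\gamma}}\left|\sum_{t=m+k^*}^{m+k}(x_t-d)'\Delta\right| = O\left(\frac{N^{2-\gamma}}{m}\right), \tag{C.5} \]
\[ \max_{k^*\le k\le N} \frac{1}{k^{\gamma}}\left\|\sum_{t=m+k^*}^{m+k}(x_t-d)\right\| = O\left(\frac{N^{2-\gamma}}{m}\right). \tag{C.6} \]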
Proof of Lemma C.2. From the definition of $N$, we find that
\[ \left(\frac{N}{c_m}\right)^{1-\gamma} = 1 - \frac{1}{c_m\delta}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta - \frac{\sigma x}{\sqrt{c_m}\,|\delta|}. \tag{C.7} \]
Since
\[ \sum_{t=m+a}^{m+b}(x_t-d)'\Delta = \frac{(b+a)(b-a+1)}{2m}\Delta_2, \tag{C.8} \]
Lemma C.3 implies that the second term of the right-hand side in (C.7) is $o(1)$ because
\[ \frac{1}{c_m\delta}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta = \frac{(c_m^2 - k^{*2} + c_m + k^*)\Delta_2}{2mc_m\delta} = o(1), \]
which implies $(a_m/c_m)^{1-\gamma} \to 1$, while the third term also goes to 0 according to the definition
of $c_m$. Thus, we have $N/c_m \to 1$.
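We next observe that
\begin{align*}
\max_{k^*\le k\le N} \frac{1}{k^{\gamma}}\left|\sum_{t=m+k^*}^{m+k}(x_t-d)'\Delta\right| &< \max_{k^*\le k\le N} \frac{(k^2+k+k^*)|\Delta_2|}{2mk^{\gamma}} \\
&\le \left(\frac{N^{2-\gamma}}{2m} + \frac{N^{1-\gamma}}{2m} + \frac{k^{*(1-\gamma)}}{2m}\right)|\Delta_2| \tag{C.9} \\
&= O\left(\frac{N^{2-\gamma}}{m}\right),
\end{align*}
since for the third term of (C.9), according to Assumption 2(a) and (C.4), we have
\[ O\left(\frac{k^{*(1-\gamma)}}{m}\right) = O\left(\frac{k^{*(1-\gamma)}}{N^{2-\gamma}}\,\frac{N^{2-\gamma}}{m}\right) = O\left(m^{\theta(1-\gamma) - \frac{(1/2-\gamma)(2-\gamma)}{1-\gamma}}\right)O\left(\frac{N^{2-\gamma}}{m}\right) = o\left(\frac{N^{2-\gamma}}{m}\right). \]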
Similarly, we can derive (C.6) and we omit the proof.
Lemma C.3 Under Assumption 3.2, as $m \to \infty$,
(i) $k^{*2}/m \to 0$.
(ii) $N/m \to 0$.
(iii) $c_m^2/m \to 0$.
(iv) $k^*/\sqrt{c_m} \to 0$ and $k^*/\sqrt{N} \to 0$.
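Proof of Lemma C.3. (i) This is obvious by noting that $k^* = O(m^{\theta})$ with $0 \le \theta < 1/2$.
(ii) It is proven that
\[ \left(\frac{c_m}{m}\right)^{1-\gamma} = \frac{\sqrt{3}c\sigma}{|\delta|\sqrt{m}} = o(1). \]
Applying (C.4), we obtain $N/m = o(1)$.
(iii) From the definition of $c_m$, if $\gamma \neq 0$ holds, we have
\[ \left(\frac{c_m^2}{m}\right)^{1-\gamma} = \frac{3c^2\sigma^2}{\delta^2 m^{\gamma}} = o(1). \]
(iv) It can be verified by the assumption of $k^*$ and the definitions of $c_m$ that
\[ \frac{k^*}{\sqrt{c_m}} = O\left(m^{\theta - (1/2-\gamma)/(2(1-\gamma))}\right) = o(1), \]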
and cm can be replaced by N from (C.4). Thus, the proof is complete.
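The rates in parts (ii) and (iii) follow from the power-law form of c_m. As a rough numerical illustration, the sketch below assumes the definition c_m^(1−γ) = √3·cσ·m^(1/2−γ)/|δ| that can be read off from the first two lines of (C.1); the function name `c_m` and the parameter values are hypothetical.

```python
import math


def c_m(m, gamma, c=1.0, sigma=1.0, delta=1.0):
    """c_m solving c_m^(1-gamma) = sqrt(3) * c * sigma * m^(1/2-gamma) / |delta|."""
    base = math.sqrt(3) * c * sigma * m ** (0.5 - gamma) / abs(delta)
    return base ** (1.0 / (1.0 - gamma))


if __name__ == "__main__":
    gamma = 0.25
    for m in (100, 10_000, 1_000_000):
        cm = c_m(m, gamma)
        # Both ratios shrink as m grows; c_m^2 / m does so only when gamma > 0.
        print(m, cm / m, cm ** 2 / m)
```

The second ratio decays like m^(−γ/(1−γ)), which is why the argument in (iii) needs γ to be nonzero.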
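Lemma C.4 Under Assumption 3.2, for all real $x$,
\[ \lim_{m\to\infty} \left(\frac{N}{m}\right)^{\gamma-1/2}\left(c\sigma - \left|\frac{1}{\sqrt{3m}(N/m)^{\gamma}}S_m(k^*,N)\right|\right) = \frac{\sigma x}{\sqrt{3}}, \tag{C.10} \]
where $S_m(k^*,a) = \sum_{t=m+k^*}^{m+a} x_t'\Delta$.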
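Proof of Lemma C.4. It can be verified that
\begin{align*}
&\left(\frac{N}{m}\right)^{\gamma-1/2}\frac{1}{\sqrt{3m}(N/m)^{\gamma}}S_m(k^*,N) \\
&\quad = \left(\frac{N}{m}\right)^{\gamma-1/2}\left(\frac{N\delta}{\sqrt{3m}(N/m)^{\gamma}} + \frac{1}{\sqrt{3m}(N/m)^{\gamma}}\sum_{t=m+k^*}^{m+N}(x_t-d)'\Delta\right) + o(1). \tag{C.11}
\end{align*}
Since the first term in the parentheses can be shown to dominate the second one, we can see
for a large $m$ that $|S_m(k^*,N)| = S_m(k^*,N)$ when $\delta > 0$. Then, the left-hand side of (C.10)
can be rewritten by inserting (C.11) as
\[ \left(\frac{N}{m}\right)^{\gamma-1/2}\left(c\sigma - \frac{N\delta}{\sqrt{3m}(N/m)^{\gamma}} - \frac{1}{\sqrt{3m}(N/m)^{\gamma}}\sum_{t=m+k^*}^{m+N}(x_t-d)'\Delta\right) + o(1). \tag{C.12} \]
From the definition of $N$, we find that
\[ \frac{N\delta}{\sqrt{3m}(N/m)^{\gamma}} = c\sigma - \frac{1}{\sqrt{3m}(c_m/m)^{\gamma}}\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta - \frac{\sigma x}{\sqrt{3}}\left(\frac{m}{c_m}\right)^{\gamma-1/2}. \tag{C.13} \]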
If we prove that
\[ N^{\gamma-1/2}\left(R(c_m,c_m) - R(N,N)\right) = o(1), \quad \text{where } R(y,z) = \frac{1}{y^{\gamma}}\sum_{t=m+k^*}^{m+z}(x_t-d)'\Delta, \]
the lemma is proven by using (C.12), (C.13), and the fact that $(N/m)^{\gamma-1/2}(m/c_m)^{\gamma-1/2} \to 1$.
We start by transforming
\[ N^{\gamma-1/2}|R(c_m,c_m) - R(N,N)| \le N^{\gamma-1/2}|R(c_m,c_m) - R(N,c_m)| + N^{\gamma-1/2}|R(N,c_m) - R(N,N)|. \tag{C.14} \]
Applying the mean value theorem and (C.2) of Lemma C.1, the first term of the right-hand
side in (C.14) is rewritten as
\[ N^{\gamma-1/2}\frac{|N^{\gamma} - c_m^{\gamma}|}{c_m^{\gamma}N^{\gamma}}\left|\sum_{t=m+k^*}^{m+c_m}(x_t-d)'\Delta\right| = N^{\gamma-1/2}\frac{O\left(c_m^{2\gamma-1}c_m^{1/2-\gamma}/\delta\right)}{c_m^{\gamma}N^{\gamma}}\,o(1) = o(1), \]
(see p.186 of Aue et al. (2009)). For the second term of the right-hand side in (C.14), using
$c_m^2/m = o(1)$, (C.4), and (C.8),
\[ N^{\gamma-1/2}|R(N,c_m) - R(N,N)| = \frac{1}{\sqrt{N}}\left|\sum_{t=m+c_m+1}^{m+N}(x_t-d)'\Delta\right| = \frac{N^2 - c_m^2 + N - c_m}{2\sqrt{N}m} = o(1). \]
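Lemma C.5 Under Assumptions 3.1 and 3.2,
\[ \left(\frac{N}{m}\right)^{\gamma-1/2}\left(\max_{1\le k<k^*} \frac{|\Gamma(m,k)|}{h(m,k)} - \left|\frac{1}{\sqrt{3m}(N/m)^{\gamma}}S_m(k^*,N)\right|\right) \xrightarrow{p} -\infty, \tag{C.15} \]
where $h(m,k) = g(m,k)/c$.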
Proof of Lemma C.5. $\Gamma(m,k)$ can be written as
\[ \Gamma(m,k) = \sum_{t=m+1}^{m+k}\epsilon_t + \sum_{t=m+1}^{m+k}x_t'(\beta_0 - \beta_m) + S_m(k^*,k)1_{\{k\ge k^*\}}. \tag{C.16} \]
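For the first term of (C.16), we have
\[ \left(\frac{N}{m}\right)^{\gamma-1/2}\max_{1\le k<k^*} \frac{\left|\sum_{t=m+1}^{m+k}\epsilon_t - \sigma W_m(k)\right|}{h(m,k)} = O_p(1)\max_{1\le k<k^*} \frac{k^{1/\nu-\gamma}}{N^{1/2-\gamma}} < O_p\left(\left(\frac{k^*}{N}\right)^{1/2-\gamma}\right) = o_p(1), \]
where the inequality holds because $1/\nu < 1/2$, and we next find that as $m \to \infty$,
\[ \left(\frac{N}{m}\right)^{\gamma-1/2}\left|\max_{1\le k<k^*} \frac{W_m(k)}{h(m,k)}\right| \le \left|\sup_{0<t<k^*} \frac{W_m(t)}{\sqrt{N}(t/N)^{\gamma}}\right| \overset{D}{=} \left|\sup_{0<t<k^*/N} \frac{W(t)}{t^{\gamma}}\right| = o_p(1), \tag{C.17} \]
which implies that $(N/m)^{\gamma-1/2}\max_{1\le k<k^*}\left|\sum_{t=m+1}^{m+k}\epsilon_t\right|/h(m,k) = o_p(1)$ holds.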
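The second term of (C.16) can be rewritten as
\begin{align*}
&\left(\frac{N}{m}\right)^{\gamma-1/2}\max_{1\le k<k^*} \frac{\left|\sum_{t=m+1}^{m+k}x_t'(\beta_0 - \beta_m)\right|}{h(m,k)} = \left(\frac{N}{m}\right)^{\gamma-1/2}\max_{1\le k<k^*} \frac{\left\|\sum_{t=m+1}^{m+k}x_t\right\|}{h(m,k)}\,O_p\left(\frac{1}{\sqrt{m}}\right) \\
&\quad = O_p(1)\left(\frac{N}{m}\right)^{\gamma-1/2}\left\{\max_{1\le k<k^*} \frac{\left\|\sum_{t=m+1}^{m+k}d\right\|}{\sqrt{3}\,m(k/m)^{\gamma}(1+k/m)^{2-\gamma}} + \max_{1\le k<k^*} \frac{\left\|\sum_{t=m+1}^{m+k}(x_t-d)\right\|}{\sqrt{3}\,m(k/m)^{\gamma}(1+k/m)^{2-\gamma}}\right\} \\
&\quad = O_p(1)\left(\frac{N}{m}\right)^{\gamma-1/2}O\left(\left(\frac{k^*}{m}\right)^{1-\gamma}\right) = o_p(1).
\end{align*}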
The third term of (C.16) is zero when k < k∗. Thus, we prove that the first component
of (C.15) tends to 0 and we next show that the second component diverges as m → ∞. From
\[ \sum_{t=m+a}^{m+b}x_t'\Delta = \sum_{t=m+a}^{m+b}d'\Delta + \sum_{t=m+a}^{m+b}(x_t-d)'\Delta = (b-a+1)\delta + \frac{(b+a)(b-a+1)}{2m}\Delta_2, \tag{C.18} \]
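and (C.5) of Lemma C.2, we have
\begin{align*}
\frac{S_m(k^*,N)}{\sqrt{3m}(N/m)^{\gamma}} &= \frac{\sum_{t=m+k^*}^{m+N}d'\Delta}{\sqrt{3m}(N/m)^{\gamma}} + \frac{\sum_{t=m+k^*}^{m+N}(x_t-d)'\Delta}{\sqrt{3m}(N/m)^{\gamma}} \\
&= \frac{(N-k^*+1)\delta}{\sqrt{3m}(N/m)^{\gamma}} + O\left(\frac{N^{2-\gamma}}{m^{3/2-\gamma}}\right) = \frac{(N-k^*+1)\delta}{\sqrt{3m}(N/m)^{\gamma}} + o(1).
\end{align*}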
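Since, for $\delta > 0$,
\[ \lim_{m\to\infty} \frac{(N-k^*+1)\delta}{\sqrt{3m}(N/m)^{\gamma}} = \lim_{m\to\infty} \frac{c_m^{1-\gamma}\delta}{\sqrt{3}\,m^{1/2-\gamma}} = c\sigma > 0, \]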
which is obtained from the definition of cm and (C.4), the second term in (C.15) diverges to
infinity and thus we obtain Lemma C.5.
Lemma C.6 Under Assumptions 3.1 and 3.2,
\[ \left(\frac{N}{m}\right)^{\gamma-1/2}\max_{k^*\le k\le N} \left|\Gamma(m,k) - \left(\sigma W_m(k) + S_m(k^*,k)\right)\right| \Big/ h(m,k) = o_p(1). \tag{C.19} \]
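Proof of Lemma C.6. From (C.16), the left-hand side of (C.19) is decomposed into two
terms, one of which is
\[ \left(\frac{N}{m}\right)^{\gamma-1/2}\max_{k^*\le k\le N} \frac{\left|\sum_{t=m+1}^{m+k}\epsilon_t - \sigma W_m(k)\right|}{\sqrt{3m}(k/m)^{\gamma}(1+k/m)^{2-\gamma}} = O_p(1)\max_{k^*\le k\le N} \frac{k^{1/\nu-\gamma}}{N^{1/2-\gamma}} = o_p(1), \]
since
\[ \max_{k^*\le k\le N} \frac{k^{1/\nu-\gamma}}{N^{1/2-\gamma}} \le \begin{cases} N^{1/\nu-1/2} & \text{if } 1/\nu \ge \gamma \\ N^{\gamma-1/2}k^{*\,1/\nu-\gamma} & \text{if } 1/\nu < \gamma \end{cases} \; = o_p(1), \]
while the other term becomes
\begin{align*}
&\left(\frac{N}{m}\right)^{\gamma-1/2}\max_{k^*\le k\le N} \frac{\left|\sum_{t=m+1}^{m+k}x_t'(\beta_0 - \beta_m)\right|}{h(m,k)} \\
&\quad = O_p(1)\left(\frac{N}{m}\right)^{\gamma-1/2}\left\{\max_{k^*\le k\le N} \frac{\left\|\sum_{t=m+1}^{m+k}d\right\|}{\sqrt{3}\,m(k/m)^{\gamma}(1+k/m)^{2-\gamma}} + \max_{k^*\le k\le N} \frac{\left\|\sum_{t=m+1}^{m+k}(x_t-d)\right\|}{\sqrt{3}\,m(k/m)^{\gamma}(1+k/m)^{2-\gamma}}\right\} \\
&\quad = O_p(1)\left\{\left(\frac{N}{m}\right)^{\gamma-1/2}O\left(\frac{N^{1-\gamma}}{m^{1-\gamma}}\right) + \left(\frac{N}{m}\right)^{\gamma-1/2}O\left(\frac{N^{2-\gamma}}{m^{2-\gamma}}\right)\right\} = o_p(1).
\end{align*}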
Lemma C.7 Under Assumptions 3.1 and 3.2,
\[ \left(\frac{N}{m}\right)^{\gamma-1/2}\max_{k^*\le k\le N} \left|\frac{\sigma W_m(k) + S_m(k^*,k)}{h(m,k)} - \frac{\sigma W_m(k) + S_m(k^*,k)}{\sqrt{3m}(k/m)^{\gamma}}\right| = o_p(1). \tag{C.20} \]
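Proof of Lemma C.7. The left-hand side of (C.20) is bounded by
\begin{align*}
&\left(\frac{N}{m}\right)^{\gamma-1/2}\max_{k^*\le k\le N} \frac{|\sigma W_m(k)|}{\sqrt{3m}(k/m)^{\gamma}}\left|\frac{\sqrt{3m}(k/m)^{\gamma}}{h(m,k)} - 1\right| \\
&\quad + \left(\frac{N}{m}\right)^{\gamma-1/2}\max_{k^*\le k\le N} \frac{|S_m(k^*,k)|}{\sqrt{3m}(k/m)^{\gamma}}\left|\frac{\sqrt{3m}(k/m)^{\gamma}}{h(m,k)} - 1\right|. \tag{C.21}
\end{align*}
The mean value theorem yields that
\[ \max_{k^*\le k\le N} \left|\frac{\sqrt{3m}(k/m)^{\gamma}}{\sqrt{3m}(k/m)^{\gamma}(1+k/m)^{2-\gamma}} - 1\right| = \max_{k^*\le k\le N} \left|\left(1+\frac{k}{m}\right)^{\gamma-2} - 1\right| = O\left(\frac{N}{m}\right) = o(1), \]
and then the first term of (C.21) is shown to be $o_p(1)$ as proven by Lemma 3.3 in Aue and
Horvath (2004). By using (C.18) and (C.5), the second component of (C.21) can be written
as
\begin{align*}
&N^{\gamma-1/2}\max_{k^*\le k\le N} \frac{1}{\sqrt{3}k^{\gamma}}\left|\sum_{t=m+k^*}^{m+k}d'\Delta + \sum_{t=m+k^*}^{m+k}(x_t-d)'\Delta\right|\left|\left(1+\frac{k}{m}\right)^{\gamma-2} - 1\right| \\
&\quad = N^{\gamma-1/2}\left(O(N^{1-\gamma}) + O\left(\frac{N^{2-\gamma}}{m}\right)\right)O\left(\frac{N}{m}\right) = o(1).
\end{align*}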
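Lemma C.8 Under Assumptions 3.1 and 3.2,
\[ \lim_{m\to\infty} P\left(\left(\frac{N}{m}\right)^{\gamma-1/2}\max_{k^*\le k\le N}\left(\frac{|\sigma W_m(k) + S_m(k^*,k)|}{\sqrt{3m}(k/m)^{\gamma}} - \frac{|S_m(k^*,N)|}{\sqrt{3m}(N/m)^{\gamma}}\right) \le \beta_m(\gamma)\right) = \Phi(x), \tag{C.22} \]
where
\[ \beta_m(\gamma) = \left(\frac{N}{m}\right)^{\gamma-1/2}\left(c\sigma - \frac{|S_m(k^*,N)|}{\sqrt{3m}(N/m)^{\gamma}}\right). \]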
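Proof of Lemma C.8. We can see that
\[ \max_{k^*\le k\le N} \frac{\sigma W_m(k) + S_m(k^*,k)}{\sqrt{3m}(k/m)^{\gamma}} = \max_{k^*\le k\le N} \frac{1}{\sqrt{3m}(k/m)^{\gamma}}\left(\sigma W_m(k) + \sum_{t=m+k^*}^{m+k}(x_t-d)'\Delta + (k-k^*+1)\delta\right). \tag{C.23} \]
We find that the order of the second term of (C.23) becomes, using (C.5) of Lemma C.2,
\[ \max_{k^*\le k\le N} \frac{\left|\sum_{t=m+k^*}^{m+k}(x_t-d)'\Delta\right|}{\sqrt{3}\,m^{1/2-\gamma}k^{\gamma}} = O\left(m^{\gamma-1/2}\frac{N^{2-\gamma}}{m}\right) = o\left(m^{\gamma-1/2}N^{1-\gamma}\right), \]
while the order of the first term is given by
\[ \max_{k^*\le k\le N} \frac{\sigma|W_m(k)|}{\sqrt{3m}(k/m)^{\gamma}} = O_p\left(\left(\frac{N}{m}\right)^{1/2-\gamma}\right) = o_p\left(m^{\gamma-1/2}N^{1-\gamma}\right). \]
On the contrary, the last term is
\[ \max_{k^*\le k\le N} \frac{(k-k^*+1)\delta}{\sqrt{3m}(k/m)^{\gamma}} = O\left(m^{\gamma-1/2}N^{1-\gamma}\right), \]
which implies that the last term dominates the others and thus the maximum of (C.23) is
achieved at $k$ close to $N$ because the last term is an increasing function of $k$. Hence, for all
$\varepsilon \in (0,1)$,
\[ \lim_{m\to\infty} P\left(\max_{k^*\le k\le N} \frac{|\sigma W_m(k) + S_m(k^*,k)|}{\sqrt{3m}(k/m)^{\gamma}} = \max_{(1-\varepsilon)N\le k\le N} \frac{|\sigma W_m(k) + S_m(k^*,k)|}{\sqrt{3m}(k/m)^{\gamma}}\right) = 1. \]
Exactly in the same way as Lemma 7.6 of Aue et al. (2009), we can show that the maximum
of (C.23) is attained at $k = N$. Therefore, because $S_m(k^*,N)$ dominates $\sigma W_m(N)$ and
$S_m(k^*,N)$ is positive for a large $m$ when $\delta > 0$, we have, because $\beta_m(\gamma) \to \sigma x/\sqrt{3}$ by Lemma
C.4,
\begin{align*}
&\lim_{m\to\infty} P\left(\left(\frac{N}{m}\right)^{\gamma-1/2}\max_{k^*\le k\le N}\left(\frac{|\sigma W_m(k) + S_m(k^*,k)|}{\sqrt{3m}(k/m)^{\gamma}} - \frac{|S_m(k^*,N)|}{\sqrt{3m}(N/m)^{\gamma}}\right) \le \beta_m(\gamma)\right) \\
&\quad = \lim_{m\to\infty} P\left(\frac{\sigma}{\sqrt{3}}\frac{W_m(N)}{\sqrt{N}} \le \beta_m(\gamma)\right) = \Phi(x).
\end{align*}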
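Proof of Theorem 3.3. By combining Lemmas C.5–C.8, we can see that
\[ \lim_{m\to\infty} P(\tau_m \ge N) = \lim_{m\to\infty} P\left(\max_{1\le k\le N} \frac{|\Gamma(m,k)|}{g(m,k)} \le 1\right) = \Phi(x). \]
Because $\Phi(x)$ is symmetric around 0, we have
\begin{align*}
\Phi(x) &= 1 - \Phi(-x) \\
&= 1 - \lim_{m\to\infty} P(\tau_m \ge N(m,-x)) \\
&= 1 - \lim_{m\to\infty} P\left(\tau_m^{1-\gamma} \ge a_m^{1-\gamma} + x\,\frac{(1-\gamma)b_m}{c_m^{\gamma}}\right) = \lim_{m\to\infty} P\left(\frac{c_m^{\gamma}}{1-\gamma}\,\frac{\tau_m^{1-\gamma} - a_m^{1-\gamma}}{b_m} \le x\right).
\end{align*}
This implies that $\tau_m/a_m \xrightarrow{p} 1$ because $a_m^{1-\gamma}c_m^{\gamma}/b_m \to \infty$. Applying the result in the proof
of Theorem 3.1 of Aue et al. (2009), we obtain
\[ \lim_{m\to\infty} P\left(\frac{\tau_m - a_m}{b_m} \le x\right) = \lim_{m\to\infty} P\left(\frac{c_m^{\gamma}}{1-\gamma}\,\frac{\tau_m^{1-\gamma} - a_m^{1-\gamma}}{b_m} \le x\right) = \Phi(x), \]
and hence complete the proof.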
Proof of Theorem 3.4
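We next derive the asymptotic distribution of the stopping time based on the maximal-type
fluctuation test through Lemmas C.9–C.14. We define the sequence $N^{FL}(m,x)$ as
\[ (N^{FL})^2 = \frac{c^{FL}\sigma m^{3/2}}{|(1/\eta-1)(\delta-\Delta_2/2)|} - \frac{\sigma x\sqrt{\eta-1}\,m\sqrt{a_m^{FL}}}{|(1/\eta-1)(\delta-\Delta_2/2)|} = (a_m^{FL})^2 - 2x\,a_m^{FL}b_m^{FL}, \tag{C.24} \]
where $a_m^{FL}$ and $b_m^{FL}$ are defined in Theorem 3.4. The following derivation is considered under
the condition that $\delta - \Delta_2/2$ is negative and the proof for positive $\delta - \Delta_2/2$ follows similarly
and is omitted here. Allowing for an abuse of notation, let $N = N^{FL}$, $a_m = a_m^{FL}$, $b_m = b_m^{FL}$,
and $c = c^{FL}$. The detector is rewritten as
\begin{align*}
\Gamma_m^{FL}(k,\ell) &= \sum_{t=m+1}^{m+k}\epsilon_t - \frac{k}{m}\sum_{t=1}^{m}\epsilon_t - \frac{k(m+k)}{\ell(m+\ell)}\left(\sum_{t=m+1}^{m+\ell}\epsilon_t - \frac{\ell}{m}\sum_{t=1}^{m}\epsilon_t\right) \\
&\quad + \sum_{t=m+k^*}^{m+k}x_t'\Delta\,1_{\{k\ge k^*\}} - \frac{k(m+k)}{\ell(m+\ell)}\sum_{t=m+k^*}^{m+\ell}x_t'\Delta\,1_{\{\ell\ge k^*\}}.
\end{align*}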
Lemma C.9 Under Assumptions 3.2(a) and 3.3, as $m \to \infty$,
(i) $k^*/\sqrt{a_m} = o(1)$.
(ii) $N/a_m \to 1$.
(iii) $N/m \to 0$.
(iv) $N^{3/2}/m \to \infty$.
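Proof of Lemma C.9. (i) Since $k^* = O(m^{\theta})$ with $0 \le \theta < 1/2$, $k^*/\sqrt{a_m} = O(m^{\theta-3/8}) = o(1)$.
(ii) From the definition of $N$, we have
\[ \left(\frac{N}{a_m}\right)^2 = 1 - \frac{\sigma x\sqrt{\eta-1}}{|(1/\eta-1)(\delta-\Delta_2/2)|}\,\frac{m}{a_m^{3/2}}. \]
The second term tends to 0 since
\[ \frac{m}{a_m^{3/2}} = m\left(\frac{|(1/\eta-1)(\delta-\Delta_2/2)|}{c\sigma m^{3/2}}\right)^{3/4} = o(1). \]
(iii) Using (ii) and
\[ \frac{a_m}{m} = \frac{\sqrt{c\sigma}\,m^{3/4-1}}{\sqrt{|(1/\eta-1)(\delta-\Delta_2/2)|}} = o(1), \]
we find that $N/m = o(1)$.
(iv) From the definition of $N$, we have $N^{3/2}/m = O(m^{1/8})$.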
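Lemma C.10 Under Assumptions 3.2(a) and 3.3, for all real $x$,
\[ \lim_{m\to\infty} \left(\frac{N}{m}\right)^{-1/2}\left(c\sigma - |J_m(k^*,N)|\right) = \sqrt{\eta-1}\,\sigma x, \tag{C.25} \]
where
\[ J_m(k^*,a) = \frac{1}{\sqrt{m}}\left(\sum_{t=m+k^*}^{m+a}x_t'\Delta - \frac{a(m+a)}{\left[\frac{a+1}{\eta}\right]\left(m+\left[\frac{a+1}{\eta}\right]\right)}\sum_{t=m+k^*}^{m+\left[\frac{a+1}{\eta}\right]}x_t'\Delta\right). \tag{C.26} \]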
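Proof of Lemma C.10. Let
\[ C(\eta) = \frac{N(m+N)}{\left[\frac{N+1}{\eta}\right]\left(m+\left[\frac{N+1}{\eta}\right]\right)}. \tag{C.27} \]
Using (C.18), $(N/m)^{-1/2}J_m(k^*,k)$ can be expressed as
\begin{align*}
\left(\frac{N}{m}\right)^{-1/2}J_m(k^*,k) &= \left(\frac{N}{m}\right)^{-1/2}\frac{1}{\sqrt{m}}\Bigg[(k-k^*+1)\delta + \frac{(k+k^*)(k-k^*+1)}{2m}\Delta_2 \\
&\qquad - \frac{k(m+k)}{\ell(m+\ell)}\left\{(\ell-k^*+1)\delta + \frac{(\ell+k^*)(\ell-k^*+1)}{2m}\Delta_2\right\}\Bigg] = J_1 + J_2 + J_3,
\end{align*}
where
\begin{align*}
J_1 &= \left(\frac{N}{m}\right)^{-1/2}\frac{1}{\sqrt{m}}\left\{k\delta + \frac{k^2}{2m}\Delta_2 - \frac{k(m+k)}{\ell(m+\ell)}\left(\ell\delta + \frac{\ell^2}{2m}\Delta_2\right)\right\}, \\
J_2 &= \left(\frac{N}{m}\right)^{-1/2}\frac{1}{\sqrt{m}}\left\{\frac{k}{2m}\Delta_2 - \frac{k(m+k)}{\ell(m+\ell)}\frac{\ell}{2m}\Delta_2\right\}, \\
J_3 &= \left(\frac{N}{m}\right)^{-1/2}\frac{1}{\sqrt{m}}\left[(-k^*+1)\delta + \frac{-k^{*2}+k^*}{2m}\Delta_2 - \frac{k(m+k)}{\ell(m+\ell)}\left\{(-k^*+1)\delta + \frac{-k^{*2}+k^*}{2m}\Delta_2\right\}\right].
\end{align*}
We investigate the order of each term. First, we have
\[ \max_{k^*\le k\le N} J_1 = \frac{1}{\sqrt{N}}\max_{k^*\le k\le N} \frac{k(k-\ell)}{m+\ell}\left(-\delta + \frac{\Delta_2}{2}\right) = \frac{1}{\sqrt{N}}\max_{k^*\le k\le N} \frac{k(k-\ell)}{m}\left(-\delta + \frac{\Delta_2}{2}\right) + o(1), \tag{C.28} \]
where the second equality holds because $1/(m+\ell) = 1/m + O(\ell/m^2)$ and $O(N^{5/2}/m^2) = o(1)$.
Similarly, we have
\[ \max_{k^*\le k\le N} J_2 = \frac{1}{\sqrt{N}}\max_{k^*\le k\le N} \frac{k(\ell-k)}{m+\ell}\frac{\Delta_2}{2m} = O\left(\frac{N^{3/2}}{m^2}\right) = o(1), \tag{C.29} \]
\[ \max_{k^*\le k\le N} J_3 = \max_{k^*\le k\le N} \left\{1 - \frac{k(m+k)}{\ell(m+\ell)}\right\}\frac{-k^*\delta}{\sqrt{N}} + o(1) = O\left(\frac{k^*}{\sqrt{N}}\right) = o(1). \tag{C.30} \]
From (C.28)–(C.30), we have
\[ \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} J_m(k^*,k) = \frac{1}{\sqrt{N}}\max_{k^*\le k\le N} \frac{k(k-\ell)}{m}\left(-\delta + \frac{\Delta_2}{2}\right) + o(1), \tag{C.31} \]
which implies that, because $\ell = [(k+1)/\eta]$,
\[ \left(\frac{N}{m}\right)^{-1/2}J_m(k^*,N) = \left(\frac{N}{m}\right)^{-1/2}\frac{N\left(N - \left[\frac{N+1}{\eta}\right]\right)}{m^{3/2}}\left(-\delta + \frac{\Delta_2}{2}\right) + o(1). \tag{C.32} \]
On the other hand, from the definition of $N$, we find that
\[ c\sigma = \left(1 - \frac{1}{\eta}\right)\left(\frac{\Delta_2}{2} - \delta\right)\frac{N^2}{m^{3/2}} + \sqrt{\eta-1}\,\sigma x\sqrt{\frac{a_m}{m}}. \tag{C.33} \]
Note that as is seen in (C.32) and (C.37), $|J_m(k^*,N)| = J_m(k^*,N)$ for a large $m$ when
$\delta - \Delta_2/2 < 0$. Then, by using (C.32) and (C.33), (C.25) can be written as
\[ \left(\frac{N}{m}\right)^{-1/2}\left\{\left(1 - \frac{1}{\eta}\right)\left(\frac{\Delta_2}{2} - \delta\right)\frac{N^2}{m^{3/2}} - \left(N - \left[\frac{N+1}{\eta}\right]\right)\left(\frac{\Delta_2}{2} - \delta\right)\frac{N}{m^{3/2}} + \sqrt{\eta-1}\,\sigma x\sqrt{\frac{a_m}{m}}\right\} + o(1). \tag{C.34} \]
From Lemma C.9, we can see that
\begin{align*}
&\left(\frac{N}{m}\right)^{-1/2}\left\{\left(1 - \frac{1}{\eta}\right)N - \left(N - \left[\frac{N+1}{\eta}\right]\right)\right\}\left(\frac{\Delta_2}{2} - \delta\right)\frac{N}{m^{3/2}} \\
&\quad = \left(\frac{N}{m}\right)^{-1/2}\left(\left[\frac{N+1}{\eta}\right] - \frac{N}{\eta}\right)\left(\frac{\Delta_2}{2} - \delta\right)\frac{N}{m^{3/2}} = O\left(\left(\frac{N}{m}\right)^{-1/2}\frac{N}{m^{3/2}}\right) = O\left(\frac{\sqrt{N}}{m}\right) = o(1), \tag{C.35}
\end{align*}
and
\[ \lim_{m\to\infty} \left(\frac{N}{m}\right)^{-1/2}\sqrt{\eta-1}\,\sigma x\sqrt{\frac{a_m}{m}} = \sqrt{\eta-1}\,\sigma x. \]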
Thus, we complete the proof.
Lemma C.11 Under Assumptions 3.1, 3.2(a), and 3.3,
\[ \left(\frac{N}{m}\right)^{-1/2}\left(\max_{1\le k<k^*} \frac{|\Gamma_m^{FL}(k,\ell)|}{h^{FL}(m,k)} - |J_m(k^*,N)|\right) \xrightarrow{p} -\infty, \tag{C.36} \]
where $h^{FL}(m,k) = g^{FL}(m,k)/c^{FL}$.
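Proof of Lemma C.11. We simplify the notations as $h(m,k) = h^{FL}(m,k)$ and in the case
of $k \le k^*$,
\[ \Gamma_m^{FL}(k,\ell) = \sum_{t=m+1}^{m+k}\epsilon_t - \frac{k}{m}\sum_{t=1}^{m}\epsilon_t - \frac{k(m+k)}{\ell(m+\ell)}\left(\sum_{t=m+1}^{m+\ell}\epsilon_t - \frac{\ell}{m}\sum_{t=1}^{m}\epsilon_t\right), \]
and consequently we obtain
\begin{align*}
\max_{1\le k<k^*} \frac{|\Gamma_m^{FL}(k,\ell)|}{h(m,k)} &\le \max_{1\le k<k^*} \frac{\left|\sum_{t=m+1}^{m+k}\epsilon_t\right|}{h(m,k)} + \max_{1\le \ell<k^*} \frac{|k\bar{\epsilon}_m|}{h(m,k)} \\
&\quad + O(1)\left\{\max_{1\le k<k^*} \frac{\left|\sum_{t=m+1}^{m+\ell}\epsilon_t\right|}{h(m,k)} + \max_{1\le \ell<k^*} \frac{|\ell\bar{\epsilon}_m|}{h(m,k)}\right\} = B_1 + B_2 + B_3 + B_4,
\end{align*}
where $\bar{\epsilon}_m = m^{-1}\sum_{t=1}^{m}\epsilon_t$. Then, the first term is bounded by
\begin{align*}
\left(\frac{N}{m}\right)^{-1/2}B_1 &\le \left(\frac{N}{m}\right)^{-1/2}\max_{1\le k<k^*} \frac{\left|\sum_{t=m+1}^{m+k}\epsilon_t - \sigma W_m(k)\right|}{\sqrt{m}(1+k/m)^2} + \left(\frac{N}{m}\right)^{-1/2}\max_{1\le k<k^*} \frac{|\sigma W_m(k)|}{\sqrt{m}(1+k/m)^2} \\
&= O_p(1)\,O\left(\max_{1\le k<k^*} \frac{k^{1/\nu}}{\sqrt{N}}\right) + O(1)\max_{1\le k<k^*} \frac{|\sigma W_m(k)|}{\sqrt{N}} = O_p(1),
\end{align*}
where the last equality is derived by Lemma C.9 (i) and $\sup_{0<t<k^*/N}|W_m(t)| = O_p(1)$. Then,
for the second term, we have
\[ \left(\frac{N}{m}\right)^{-1/2}B_2 = \left(\frac{N}{m}\right)^{-1/2}\max_{1\le k<k^*} \frac{k\left|\frac{1}{m}\sum_{t=1}^{m}\epsilon_t\right|}{\sqrt{m}(1+k/m)^2} = O\left(\frac{k^*}{\sqrt{N}m}\right)O_p(\sqrt{m}) = o_p(1). \]
Similarly, we can derive that $(N/m)^{-1/2}(B_3 + B_4) = O_p(1) + o_p(1)$. We have proven that
the first term related to the detector in (C.36) is bounded in probability. We next show that
$(N/m)^{-1/2}J_m(k^*,N)$ diverges as $m \to \infty$. Applying (C.32) and (C.35), we have
\begin{align*}
\left(\frac{N}{m}\right)^{-1/2}J_m(k^*,N) &= \left(\frac{N}{m}\right)^{-1/2}\left(1 - \frac{1}{\eta}\right)\left(\frac{\Delta_2}{2} - \delta\right)\frac{N^2}{m^{3/2}} + o(1) \\
&= \left(1 - \frac{1}{\eta}\right)\left(\frac{\Delta_2}{2} - \delta\right)\frac{N^{3/2}}{m} + o(1). \tag{C.37}
\end{align*}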
If the term δ −∆2/2 is nonzero, we can show that (N/m)−1/2|Jm(k∗, N)| tends to positive
infinity and hence we finish the proof of Lemma C.11.
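Lemma C.12 Under Assumptions 3.1, 3.2(a), and 3.3, we have
\[ \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} \left|\Gamma_m^{FL}(k,\ell) - W_m(k,\ell)\right| \Big/ h(m,k) = o_p(1), \tag{C.38} \]
where
\[ W_m(k,\ell) = W_Q(k) - \frac{k(m+k)}{\ell(m+\ell)}W_Q(\ell), \qquad W_Q(j) = \sigma W_m(j) + \sum_{t=m+k^*}^{m+j}x_t'\Delta\,1_{\{j\ge k^*\}}. \]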
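Proof of Lemma C.12. Let
\[ Q(m,j) = \sum_{t=m+1}^{m+j}\epsilon_t - \frac{j}{m}\sum_{t=1}^{m}\epsilon_t + \sum_{t=m+k^*}^{m+j}x_t'\Delta\,1_{\{j\ge k^*\}}. \]
Then, $\Gamma_m^{FL}(k,\ell) = Q(m,k) - \frac{k(m+k)}{\ell(m+\ell)}Q(m,\ell)$. We have
\begin{align*}
\left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} \frac{|Q(m,k) - W_Q(k)|}{h(m,k)} &\le \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} \frac{\left|\sum_{t=m+1}^{m+k}\epsilon_t - \sigma W_m(k)\right|}{\sqrt{m}(1+k/m)^2} + \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} \frac{\left|\frac{k}{m}\sum_{t=1}^{m}\epsilon_t\right|}{\sqrt{m}(1+k/m)^2} \\
&= O_p\left(\max_{k^*\le k\le N} \frac{k^{1/\nu}}{\sqrt{N}}\right) + O_p\left(\max_{k^*\le k\le N} \frac{k}{\sqrt{Nm}}\right) = o_p(1).
\end{align*}
Similarly, we can also show that
\[ \left(\frac{N}{m}\right)^{-1/2}\frac{k(m+k)}{\ell(m+\ell)}\max_{k^*\le k\le N} \frac{|Q(m,\ell) - W_Q(\ell)|}{h(m,k)} = o_p(1), \]
since $\ell = [(k+1)/\eta]$ implies that $k(m+k)/\ell(m+\ell) = O(1)$. Hence, the proof is complete.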
Lemma C.13 Under Assumptions 3.1, 3.2(a), and 3.3, we have
\[ \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} \left|\frac{W_m(k,\ell)}{h(m,k)} - \frac{W_m(k,\ell)}{\sqrt{m}}\right| = o_p(1). \tag{C.39} \]
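Proof of Lemma C.13. The left-hand side of (C.39) is bounded by
\begin{align*}
&\left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} \frac{|\sigma W_m(k)|}{\sqrt{m}}\left|\frac{\sqrt{m}}{h(m,k)} - 1\right| + \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} \frac{k(m+k)}{\ell(m+\ell)}\frac{|\sigma W_m(\ell)|}{\sqrt{m}}\left|\frac{\sqrt{m}}{h(m,k)} - 1\right| \\
&\quad + \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} \frac{1}{\sqrt{m}}\left|\sum_{t=m+k^*}^{m+k}x_t'\Delta - \frac{k(m+k)}{\ell(m+\ell)}\sum_{t=m+k^*}^{m+\ell}x_t'\Delta\,1_{\{\ell\ge k^*\}}\right|\left|\frac{\sqrt{m}}{h(m,k)} - 1\right| = C_1 + C_2 + C_3.
\end{align*}
It is easily seen that
\[ \max_{k^*\le k\le N} \left|\frac{\sqrt{m}}{h(m,k)} - 1\right| = \max_{k^*\le k\le N} \left|\frac{1}{(1+k/m)^2} - 1\right| = O\left(\frac{N}{m}\right), \]
and
\[ \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} \frac{|W_m(k)|}{\sqrt{m}} \le \max_{1\le k\le N} \frac{|W_m(k)|}{\sqrt{N}} \overset{D}{=} \max_{0<t\le 1}|W(t)| = O_p(1). \tag{C.40} \]
Thus, $C_1$ tends to 0 and similarly, we can show that the term $C_2$ is $o_p(1)$.
For $C_3$, it tends to zero in the case of $\ell < k^*$ from (C.43). When $\ell \ge k^*$, we have, from
(C.31),
\begin{align*}
C_3 &\le \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N}|J_m(k^*,k)|\max_{k^*\le k\le N}\left|\frac{\sqrt{m}}{h(m,k)} - 1\right| \\
&= \left\{\frac{1}{\sqrt{N}}\max_{k^*\le k\le N}\left|\frac{k(k-\ell)}{m}\left(-\delta + \frac{\Delta_2}{2}\right)\right| + o(1)\right\}O\left(\frac{N}{m}\right) = O\left(\frac{N^{5/2}}{m^2}\right) = o(1).
\end{align*}
Thus, the proof is complete.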
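Lemma C.14 Under Assumptions 3.1, 3.2(a), and 3.3, we have
\[ \lim_{m\to\infty} P\left\{\left(\frac{N}{m}\right)^{-1/2}\left(\max_{k^*\le k\le N} \frac{|W_m(k,\ell)|}{\sqrt{m}} - |J_m(k^*,N)|\right) \le \beta_m^{FL}(\gamma)\right\} = \Phi(x), \tag{C.41} \]
where $\beta_m^{FL}(\gamma) = \left(\frac{N}{m}\right)^{-1/2}\left(c\sigma - |J_m(k^*,N)|\right)$.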
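Proof of Lemma C.14. We first show that the drift term in $W_m(k,\ell)/\sqrt{m}$ is dominant.
From (C.40), we have
\[ \max_{k^*\le k\le N} \frac{W_m(k)}{\sqrt{m}} = O_p\left(\left(\frac{N}{m}\right)^{1/2}\right) = o_p\left(\frac{N^2}{m^{3/2}}\right), \]
and thus, because $k(m+k)/\ell(m+\ell) = O(1)$, the stochastic terms are $o_p(N^2/m^{3/2})$.
We next investigate the order of magnitude of the following term:
\[ \left(\frac{N}{m}\right)^{-1/2}\frac{1}{\sqrt{m}}\left|\sum_{t=m+k^*}^{m+k}x_t'\Delta - \frac{k(m+k)}{\ell(m+\ell)}\sum_{t=m+k^*}^{m+\ell}x_t'\Delta\,1_{\{\ell\ge k^*\}}\right|. \tag{C.42} \]
In the case of $\ell < k^*$, $\ell > (k+1)/\eta - 1$ implies that $k < \eta(k^*+1) - 1$. Then, it is easily seen
that (C.42) tends to zero because, using (C.18),
\[ \max_{k^*\le k\le N} \frac{1}{\sqrt{m}}\sum_{t=m+k^*}^{m+k}x_t'\Delta = \max_{k^*\le k\le N} \frac{1}{\sqrt{m}}\left\{\sum_{t=m+k^*}^{m+k}d'\Delta + \sum_{t=m+k^*}^{m+k}(x_t-d)'\Delta\right\} = O\left(\frac{k^*}{\sqrt{m}}\right) + O\left(\frac{k^{*2}}{m^{3/2}}\right). \tag{C.43} \]
For $\ell \ge k^*$, (C.42) is equal to $(N/m)^{-1/2}|J_m(k^*,k)|$. From (C.31), it is easily seen that the
dominating term in $(N/m)^{-1/2}\max_{k^*\le k\le N}|J_m(k^*,k)|$ is
\[ \max_{k^*\le k\le N} \frac{k(k-\ell)}{\sqrt{N}m}\left|-\Delta_1 - \frac{\Delta_2}{2}\right|. \tag{C.44} \]
Then, (C.42) is bounded below by
\[ \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N}|J_m(k^*,k)| \ge \frac{N\left(N - \left[\frac{N+1}{\eta}\right]\right)}{\sqrt{N}m}\left|-\Delta_1 - \frac{\Delta_2}{2}\right| + o(1) = O\left(\frac{N^{3/2}}{m}\right). \]
On the other hand, because $k(k-\ell)$ in (C.44) is a strictly increasing function of $k$, it
is bounded above by
\[ \left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N}|J_m(k^*,k)| \le \max_{k^*\le k\le N} \frac{k}{\sqrt{N}m}\frac{(\eta-1)(k+1)}{\eta}\left|-\Delta_1 - \frac{\Delta_2}{2}\right| + o(1) = O\left(\frac{N^{3/2}}{m}\right), \]
because
\[ \frac{k+1}{\eta} - 1 < \ell \le \frac{k+1}{\eta} \quad \text{and thus} \quad \frac{(\eta-1)k-1}{\eta} - 1 \le k - \ell < \frac{(\eta-1)(k+1)}{\eta}. \]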
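The bounds on k − ℓ follow from reading [·] as the integer part. The sketch below is an illustration under that assumption, with η > 1 taken as an exact rational; the function names are hypothetical. It checks the lower bound ((η−1)k − 1)/η implied directly by ℓ ≤ (k+1)/η, which is at least as tight as the lower bound stated in the text, together with the strict upper bound (η−1)(k+1)/η.

```python
from fractions import Fraction
import math


def ell(k, eta):
    """l = [(k + 1) / eta], with [.] read as the integer part (floor)."""
    return math.floor(Fraction(k + 1) / Fraction(eta))


def bounds_hold(k, eta):
    """Check ((eta-1)k - 1)/eta <= k - l < (eta-1)(k+1)/eta exactly."""
    eta = Fraction(eta)
    gap = k - ell(k, eta)
    lower = ((eta - 1) * k - 1) / eta
    upper = (eta - 1) * (k + 1) / eta
    return lower <= gap < upper


if __name__ == "__main__":
    for eta in (Fraction(3, 2), Fraction(2), Fraction(7, 3)):
        print(eta, all(bounds_hold(k, eta) for k in range(1, 1000)))
```

Since the upper bound grows linearly in k, multiplying by k gives the order N² used for k(k − ℓ) near k = N.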
This implies that the maximum of $|W_m(k,\ell)|/\sqrt{N}$ will be determined by the term (C.42)
and then achieved close to $N$. Further, we have proven that $(N/m)^{-1/2}|J_m(k^*,N)|$ tends to
positive infinity in Lemma C.11. We thus find that for all $\varepsilon \in (0,1)$,
\[ \lim_{m\to\infty} P\left(\left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N} \frac{|W_m(k,\ell)|}{\sqrt{m}} = \left(\frac{N}{m}\right)^{-1/2}\max_{(1-\varepsilon)N\le k\le N} \frac{|W_m(k,\ell)|}{\sqrt{m}}\right) = 1. \]
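Next, we show that
\begin{align*}
&\left(\frac{N}{m}\right)^{-1/2}\max_{(1-\varepsilon)N\le k\le N}\left|W_m(k) - \frac{k(m+k)W_m(\ell)}{\ell(m+\ell)} - W_m(N) + \frac{N(m+N)W_m\left(\left[\frac{N+1}{\eta}\right]\right)}{\left[\frac{N+1}{\eta}\right]\left(m+\left[\frac{N+1}{\eta}\right]\right)}\right| \Big/ \sqrt{m} \\
&\quad \le \sup_{(1-\varepsilon)N\le t\le N} \frac{1}{\sqrt{N}}\left|W_m(t) - \frac{t(m+t)W_m(\ell)}{\ell(m+\ell)} - W_m(N) + \frac{N(m+N)W_m\left(\left[\frac{N+1}{\eta}\right]\right)}{\left[\frac{N+1}{\eta}\right]\left(m+\left[\frac{N+1}{\eta}\right]\right)}\right| \\
&\quad \overset{D}{=} \sup_{1-\varepsilon\le s\le 1}\left|W_m(s) - \frac{s\left(\frac{m}{N}+s\right)W_m\left(\frac{1}{\eta}s+c_1\right)}{\left(\frac{1}{\eta}s+c_1\right)\left(\frac{m}{N}+\frac{1}{\eta}s+c_1\right)} - W_m(1) + \frac{\left(\frac{m}{N}+1\right)W_m\left(\frac{1}{\eta}+c_2\right)}{\left(\frac{1}{\eta}+c_2\right)\left(\frac{m}{N}+\frac{1}{\eta}+c_2\right)}\right| \\
&\quad \le \sup_{1-\varepsilon\le s\le 1}|W_m(s) - W_m(1)| \\
&\qquad + \sup_{1-\varepsilon\le s\le 1}\left|\frac{s\left(\frac{m}{N}+s\right)}{\left(\frac{1}{\eta}s+c_1\right)\left(\frac{m}{N}+\frac{1}{\eta}s+c_1\right)} - \frac{\frac{m}{N}+1}{\left(\frac{1}{\eta}+c_2\right)\left(\frac{m}{N}+\frac{1}{\eta}+c_2\right)}\right|\left|W_m\left(\frac{1}{\eta}s+c_1\right)\right| \\
&\qquad + \sup_{1-\varepsilon\le s\le 1}\left|\frac{\frac{m}{N}+1}{\left(\frac{1}{\eta}+c_2\right)\left(\frac{m}{N}+\frac{1}{\eta}+c_2\right)}\right|\left|W_m\left(\frac{1}{\eta}s+c_1\right) - W_m\left(\frac{1}{\eta}+c_2\right)\right| \xrightarrow{p} 0,
\end{align*}
as $\varepsilon \to 0$, where
\[ \ell = \left[\frac{t+1}{\eta}\right], \quad s = \frac{t}{N}, \quad c_1 = \left(\left[\frac{t+1}{\eta}\right] - \frac{t}{\eta}\right)\Big/N, \quad c_2 = \left(\left[\frac{N+1}{\eta}\right] - \frac{N}{\eta}\right)\Big/N, \]
and $c_1$, $c_2$ tend to zero as $N \to \infty$, where we used the scale transformation for equality in
distribution and the last convergence holds according to almost sure continuity of Brownian
motions.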
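Furthermore, we can see that
\begin{align*}
&\frac{1}{\sqrt{N}}\left|W(N) - \frac{N(m+N)}{\left[\frac{N+1}{\eta}\right]\left(m+\left[\frac{N+1}{\eta}\right]\right)}W\left(\left[\frac{N+1}{\eta}\right]\right) - \left(W(N) - \eta W\left(\frac{N}{\eta}\right)\right)\right| \\
&\quad \le \left|\eta - \frac{N(m+N)}{\left[\frac{N+1}{\eta}\right]\left(m+\left[\frac{N+1}{\eta}\right]\right)}\right|\left|\frac{1}{\sqrt{N}}W\left(\left[\frac{N+1}{\eta}\right]\right)\right| + \left|\frac{\eta}{\sqrt{N}}\left(W\left(\left[\frac{N+1}{\eta}\right]\right) - W\left(\frac{N}{\eta}\right)\right)\right| = o_p(1).
\end{align*}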
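Since $\left(W(N) - \eta W\left(\frac{N}{\eta}\right)\right)/\sqrt{N} \overset{D}{=} W(\eta-1)$, we have
\begin{align*}
&\lim_{m\to\infty} P\left\{\left(\frac{N}{m}\right)^{-1/2}\left(\max_{k^*\le k\le N} \frac{|W_m(k,\ell)|}{\sqrt{m}} - |J_m(k^*,N)|\right) \le \beta_m(\gamma)\right\} \\
&\quad = \lim_{m\to\infty} P\left\{\left(\frac{N}{m}\right)^{-1/2}\left(\frac{|W_m(N,[(N+1)/\eta])|}{\sqrt{m}} - |J_m(k^*,N)|\right) \le \beta_m(\gamma)\right\} \\
&\quad = \lim_{m\to\infty} P\left(\sigma W(\eta-1) \le \sqrt{\eta-1}\,\sigma x\right) = \Phi(x).
\end{align*}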
The proof is complete.
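The distributional identity used in the last step of this proof can be verified from the covariance of a Wiener process. A short worked computation, assuming only Cov(W(s), W(t)) = min(s, t) and η > 1:

```latex
\begin{align*}
\operatorname{Var}\!\left(W(N) - \eta W\!\left(\tfrac{N}{\eta}\right)\right)
  &= \operatorname{Var}(W(N)) - 2\eta\operatorname{Cov}\!\left(W(N), W\!\left(\tfrac{N}{\eta}\right)\right)
     + \eta^2 \operatorname{Var}\!\left(W\!\left(\tfrac{N}{\eta}\right)\right) \\
  &= N - 2\eta\cdot\frac{N}{\eta} + \eta^2\cdot\frac{N}{\eta}
   = (\eta - 1)N .
\end{align*}
```

Hence the normalized difference is a mean-zero normal variable with variance η − 1, the same law as W(η − 1), and dividing σW(η − 1) by σ√(η − 1) gives a standard normal, which yields Φ(x).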
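Proof of Theorem 3.4. By combining Lemmas C.11–C.14, we have
\begin{align*}
\lim_{m\to\infty} P(\tau_m \ge N) &= \lim_{m\to\infty} P\left(\max_{1\le k\le N} \frac{|\Gamma_m^{FL}(k,\ell)|}{g^{FL}(m,k)} \le 1\right) \\
&= \lim_{m\to\infty} P\left(\left(\frac{N}{m}\right)^{-1/2}\max_{k^*\le k\le N}\left(\frac{|W_m(k,\ell)|}{\sqrt{m}} - |J_m(k^*,N)|\right) \le \beta_m(\gamma)\right) = \Phi(x).
\end{align*}
Or we can rewrite it as
\begin{align*}
\lim_{m\to\infty} P\left(\frac{\tau_m^2 - a_m^2}{2a_mb_m} \le x\right) &= 1 - \lim_{m\to\infty} P(\tau_m^2 \ge a_m^2 + 2a_mb_mx) \\
&= 1 - \lim_{m\to\infty} P(\tau_m^2 \ge N^2(m,-x)) \\
&= \Phi(x). \tag{C.45}
\end{align*}
Since $a_m^2/(2a_mb_m) \to \infty$, (C.45) implies that $\tau_m/a_m \xrightarrow{p} 1$, and the mean value theorem
yields that
\[ \frac{\tau_m - a_m}{b_m} = \frac{(\tau_m^2)^{1/2} - (a_m^2)^{1/2}}{b_m} = \frac{\tau_m^2 - a_m^2}{2a_mb_m}(1 + o_p(1)). \]
From Slutsky's theorem, we thus obtain
\[ \lim_{m\to\infty} P\left(\frac{\tau_m - a_m}{b_m} \le x\right) = \lim_{m\to\infty} P\left(\frac{\tau_m^2 - a_m^2}{2a_mb_m} \le x\right) = \Phi(x). \]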
The proof is complete.
Table 3.1: Critical values

            (a) κ = 1                           (b) κ = 2
α           1%      2.5%    5%      10%         1%      2.5%    5%      10%
γ = 0.05    1.4641  1.2786  1.1296  0.9626      1.8146  1.5781  1.3866  1.1710
γ = 0.15    1.5738  1.3774  1.2180  1.0429      1.8918  1.6456  1.4483  1.2267
γ = 0.25    1.6989  1.4888  1.3213  1.1371      1.9725  1.7185  1.5161  1.2912
γ = 0.35    1.8400  1.6193  1.4476  1.2603      2.0632  1.8059  1.5979  1.3776
γ = 0.45    2.0250  1.8149  1.6523  1.4783      2.1776  1.9295  1.7361  1.5453

            (c) κ = 3                           (d) κ = 4
γ = 0.05    2.0005  1.7353  1.5230  1.2803      2.1099  1.8332  1.6073  1.3494
γ = 0.15    2.0597  1.7875  1.5700  1.3222      2.1582  1.8751  1.6447  1.3837
γ = 0.25    2.1215  1.8433  1.6220  1.3727      2.2083  1.9194  1.6857  1.4226
γ = 0.35    2.1878  1.9061  1.6824  1.4389      2.2600  1.9694  1.7351  1.4780
γ = 0.45    2.2702  1.9969  1.7894  1.5789      2.3288  2.0456  1.8232  1.6024

            (e) κ = 5                           (f) κ = 6
γ = 0.05    2.1886  1.9000  1.6647  1.3935      2.2443  1.9490  1.7051  1.4280
γ = 0.15    2.2295  1.9353  1.6964  1.4218      2.2792  1.9799  1.7326  1.4531
γ = 0.25    2.2717  1.9718  1.7301  1.4553      2.3158  2.0121  1.7625  1.4810
γ = 0.35    2.3146  2.0145  1.7712  1.5055      2.3524  2.0463  1.7981  1.5247
γ = 0.45    2.3686  2.0755  1.8480  1.6187      2.3965  2.1005  1.8666  1.6302

            (g) κ = 7                           (h) κ = 8
γ = 0.05    2.2866  1.9856  1.7365  1.4540      2.3171  2.0131  1.7603  1.4742
γ = 0.15    2.3174  2.0126  1.7609  1.4756      2.3446  2.0369  1.7823  1.4931
γ = 0.25    2.3490  2.0404  1.7864  1.5002      2.3727  2.0617  1.8051  1.5156
γ = 0.35    2.3811  2.0703  1.8173  1.5397      2.4026  2.0885  1.8339  1.5516
γ = 0.45    2.4182  2.1179  1.8802  1.6394      2.4370  2.1327  1.8918  1.6470
Table 3.2: Size of the tests (κ = 3)

m      ρ      CUSUM   FL
50     0.4    0.088   0.099
       0.8    0.169   0.166
100    0.4    0.073   0.073
       0.8    0.113   0.107
250    0.4    0.052   0.062
       0.8    0.072   0.073
Table 3.3: Power of the tests (κ = 3, d = [1, 1]′)

        k∗ = 1           k∗ = 0.8m        k∗ = 1.5m
b       CUSUM   FL       CUSUM   FL       CUSUM   FL

(a) m = 50, ρ = 0.4
0.0     0.088   0.099    0.088   0.099    0.088   0.099
0.5     0.609   0.287    0.373   0.814    0.259   0.870
1.0     0.987   0.815    0.845   0.999    0.642   1.000
1.5     1.000   0.986    0.990   1.000    0.911   1.000
2.0     1.000   0.999    1.000   1.000    0.989   1.000

(b) m = 50, ρ = 0.8
0.0     0.169   0.166    0.169   0.166    0.169   0.166
0.5     0.759   0.497    0.563   0.881    0.438   0.918
1.0     0.986   0.876    0.917   0.996    0.797   0.998
1.5     0.998   0.976    0.992   0.999    0.953   1.000
2.0     0.999   0.995    0.998   1.000    0.991   1.000

(c) m = 100, ρ = 0.4
0.0     0.073   0.073    0.073   0.073    0.073   0.073
0.5     0.867   0.494    0.566   0.988    0.372   0.999
1.0     1.000   0.993    0.984   1.000    0.878   1.000
1.5     1.000   1.000    1.000   1.000    0.997   1.000
2.0     1.000   1.000    1.000   1.000    1.000   1.000

(d) m = 100, ρ = 0.8
0.0     0.113   0.107    0.113   0.107    0.113   0.107
0.5     0.896   0.602    0.661   0.978    0.480   0.992
1.0     1.000   0.980    0.986   1.000    0.910   1.000
1.5     1.000   0.999    1.000   1.000    0.996   1.000
2.0     1.000   1.000    1.000   1.000    1.000   1.000

(e) m = 250, ρ = 0.4
0.0     0.052   0.062    0.052   0.062    0.052   0.062
0.5     0.999   0.955    0.907   1.000    0.677   1.000
1.0     1.000   1.000    1.000   1.000    1.000   1.000
1.5     1.000   1.000    1.000   1.000    1.000   1.000
2.0     1.000   1.000    1.000   1.000    1.000   1.000

(f) m = 250, ρ = 0.8
0.0     0.072   0.073    0.072   0.073    0.072   0.073
0.5     0.998   0.930    0.921   1.000    0.720   1.000
1.0     1.000   1.000    1.000   1.000    0.998   1.000
1.5     1.000   1.000    1.000   1.000    1.000   1.000
2.0     1.000   1.000    1.000   1.000    1.000   1.000
Table 3.4: Power of the tests (κ = 3, b = 0.5, d = [1, 1]′)

        (a) m = 50, ρ = 0.4    (b) m = 100, ρ = 0.4    (c) m = 250, ρ = 0.4
k∗      CUSUM   FL             CUSUM   FL              CUSUM   FL
0.1m    0.548   0.321          0.802   0.482           0.995   0.909
0.2m    0.501   0.485          0.748   0.729           0.987   0.990
0.3m    0.471   0.595          0.704   0.870           0.976   0.999
0.4m    0.447   0.677          0.666   0.931           0.965   1.000
0.5m    0.426   0.720          0.637   0.961           0.951   1.000
0.6m    0.407   0.762          0.613   0.977           0.936   1.000
0.7m    0.390   0.791          0.590   0.982           0.921   1.000
0.8m    0.373   0.814          0.566   0.988           0.907   1.000
0.9m    0.357   0.830          0.539   0.992           0.887   1.000
1.0m    0.339   0.845          0.513   0.994           0.864   1.000
1.1m    0.322   0.850          0.486   0.996           0.832   1.000
1.2m    0.306   0.861          0.457   0.997           0.799   1.000
1.3m    0.289   0.864          0.429   0.998           0.766   1.000
1.4m    0.274   0.871          0.401   0.998           0.727   1.000
1.5m    0.259   0.870          0.372   0.999           0.677   1.000

        (d) m = 50, ρ = 0.8    (e) m = 100, ρ = 0.8    (f) m = 250, ρ = 0.8
k∗      CUSUM   FL             CUSUM   FL              CUSUM   FL
0.1m    0.714   0.493          0.851   0.557           0.994   0.877
0.2m    0.672   0.652          0.807   0.778           0.985   0.980
0.3m    0.645   0.755          0.773   0.878           0.977   0.998
0.4m    0.622   0.805          0.747   0.933           0.967   1.000
0.5m    0.608   0.834          0.723   0.954           0.956   1.000
0.6m    0.593   0.851          0.700   0.966           0.947   1.000
0.7m    0.576   0.872          0.679   0.974           0.933   1.000
0.8m    0.563   0.881          0.661   0.978           0.921   1.000
0.9m    0.550   0.894          0.645   0.983           0.907   1.000
1.0m    0.534   0.902          0.624   0.987           0.885   1.000
1.1m    0.515   0.906          0.597   0.989           0.863   1.000
1.2m    0.500   0.910          0.570   0.990           0.829   1.000
1.3m    0.486   0.914          0.542   0.991           0.800   1.000
1.4m    0.460   0.916          0.512   0.991           0.759   1.000
1.5m    0.438   0.918          0.480   0.992           0.720   1.000
Table 3.5: Delay time (κ = 3, d = [1, 1]′)

                k∗ = 1                      k∗ = 0.8m                   k∗ = 1.5m
b    Test       min  1Q   2Q   3Q   max     min  1Q   2Q   3Q   max     min  1Q   2Q   3Q   max

(a) m = 50, ρ = 0.4
0.5  CUSUM      2    10   15   24   150     4    53   70   93   150     4    77   99   121  150
     FL         3    27   47   72   150     5    53   61   70   97      5    97   111  125  150
1.0  CUSUM      2    5    7    9    89      4    53   63   78   150     4    89   105  122  150
     FL         3    35   51   75   150     5    47   50   54   80      5    86   91   96   149
1.5  CUSUM      2    3    4    5    49      4    49   55   63   146     4    88   98   112  150
     FL         2    22   38   53   147     5    45   47   49   64      5    82   85   88   111
2.0  CUSUM      2    3    3    4    11      4    47   51   55   146     4    85   92   102  149
     FL         2    3    3    38   145     5    43   45   46   57      5    80   82   85   101

(b) m = 50, ρ = 0.8
0.5  CUSUM      2    6    10   16   145     3    47   61   80   150     3    53   92   115  150
     FL         3    32   49   74   149     8    48   54   63   97      8    85   98   112  150
1.0  CUSUM      2    3    5    7    86      3    47   55   67   150     3    82   95   112  150
     FL         2    23   41   60   149     8    44   47   51   81      8    81   86   92   150
1.5  CUSUM      2    3    3    4    42      3    45   50   57   148     3    81   90   102  150
     FL         2    3    3    42   145     8    43   45   47   74      8    79   82   86   145
2.0  CUSUM      2    2    3    3    27      3    44   47   52   141     3    80   86   95   150
     FL         2    3    3    3    147     8    42   44   45   80      8    78   80   83   117

(c) m = 100, ρ = 0.4
0.5  CUSUM      3    14   20   31   207     5    112  141  183  300     5    177  210  247  300
     FL         9    73   112  160  300     9    102  112  123  177     9    186  202  218  300
1.0  CUSUM      2    6    8    10   58      5    101  114  132  300     5    179  203  232  300
     FL         3    60   80   107  295     9    90   94   98   123     9    166  172  178  216
1.5  CUSUM      2    4    5    6    13      5    93   100  109  179     5    170  185  201  299
     FL         3    47   58   72   150     9    87   89   92   105     9    161  164  168  187
2.0  CUSUM      2    3    4    4    8       5    90   94   100  133     5    165  174  185  257
     FL         3    39   48   57   109     9    85   87   89   98      9    158  161  163  176
Table 3.5: (continued)

                k∗ = 1                      k∗ = 0.8m                   k∗ = 1.5m
b    Test       min  1Q   2Q   3Q   max     min  1Q   2Q   3Q   max     min  1Q   2Q   3Q   max

(d) m = 100, ρ = 0.8
0.5  CUSUM      2    10   16   25   223     6    103  130  166  300     6    166  202  242  300
     FL         9    67   103  151  300     9    96   106  118  161     9    176  192  210  299
1.0  CUSUM      2    5    6    8    58      6    95   107  124  300     6    171  193  220  300
     FL         3    54   72   99   295     9    88   92   97   143     9    162  169  175  241
1.5  CUSUM      2    3    4    5    26      6    90   97   105  184     6    165  178  194  300
     FL         2    42   55   68   277     9    85   88   91   133     9    158  162  166  239
2.0  CUSUM      2    3    3    4    34      6    87   92   98   177     6    161  170  181  265
     FL         2    3    43   54   135     9    84   86   88   116     9    156  159  162  207

(e) m = 250, ρ = 0.4
0.5  CUSUM      5    17   23   32   192     12   271  319  389  748     12   462  531  618  750
     FL         28   162  233  329  750     28   234  248  261  330     28   431  449  468  587
1.0  CUSUM      4    7    9    10   24      12   232  249  268  453     12   426  459  497  748
     FL         28   102  128  159  306     28   216  222  228  251     28   401  409  417  461
1.5  CUSUM      3    5    5    6    11      12   221  230  241  294     12   408  427  447  555
     FL         28   82   98   115  207     28   211  215  218  233     28   392  397  402  426
2.0  CUSUM      2    4    4    5    8       12   215  222  229  261     12   399  413  426  487
     FL         28   70   83   95   146     28   208  211  214  226     28   388  392  395  411

(f) m = 250, ρ = 0.8
0.5  CUSUM      5    16   21   30   203     12   261  308  378  750     12   449  520  604  750
     FL         31   155  223  318  749     31   231  245  259  346     31   425  444  465  615
1.0  CUSUM      3    6    8    10   28      12   229  245  264  475     12   420  452  490  733
     FL         31   100  127  158  383     31   215  221  227  258     31   398  407  416  474
1.5  CUSUM      3    4    5    6    12      12   219  228  239  300     12   404  423  443  602
     FL         31   80   97   115  211     31   210  214  218  238     31   390  396  401  433
2.0  CUSUM      2    3    4    4    9       12   214  221  228  264     12   396  410  424  513
     FL         3    69   81   94   154     31   207  210  213  229     31   387  391  395  416
Table 3.6: Results of the monitoring tests

(a) Tests for a unit root
               Training period    t-statistic (ADF)    Exp-Wald
Denmark        1995Q2-2001Q3      −3.290479^c          1.975
Japan          1990Q2-2007Q4      −3.311520^c          1.070
New Zealand    1993Q1-2007Q4      −3.516139^b          1.755

(b) Performance of the monitoring procedures
               Training period    Estimated break date    CUSUM     FL
Denmark        1995Q2-2001Q3      2001Q4                  2002Q2    2007Q4
Japan          1990Q2-1998Q4      2008Q3                  2010Q3    2016Q2
               1990Q2-2006Q2                              2010Q4    2009Q4
               1990Q2-2007Q4                              2009Q3    **
New Zealand    1993Q1-2001Q3      2008Q1                  **        2010Q3
               1993Q1-2006Q3                              2010Q1    2010Q2
               1993Q1-2007Q4                              2009Q2    **

1. a, b, c denote statistical significance at 1%, 5%, and 10% level.
2. ** means that the test cannot reject the null hypothesis of no change.
Figure 3.1: Logarithm of quarterly real GDP
Chapter 4
A New Test for Common Breaks in Heterogeneous Panel Data Models
In this chapter, we develop a new test to detect whether the break points are common in
heterogeneous panel data models where the time series dimension T could be large relative to
the cross-section dimension N. The error process is assumed to be cross-sectionally independent.
The test is based on the cumulative sum of the ordinary least squares residuals. We derive the
asymptotic distribution of the detecting statistic under the null hypothesis and prove the
consistency of the test under the alternative. Monte Carlo simulations show good performance
of the test in terms of both size and power.
4.1 Introduction
In recent years, panel data models have become increasingly popular in the theoretical and
empirical analyses, since richer information from both the cross-section and time series dimen-
sion leads to more powerful inferences than with a single cross-section or a single time-series.
In particular, the modeling and inferences of structural changes in panel frameworks have at-
tracted great attention in the literature. Comparing to applying the single detection method
for structural changes separately to each series, using cross-section datasets improves break
detection power. The detecting procedures in panels are often designed to test for the null
hypothesis that the regression parameters in each series are constant over time against the
alternative that at least one series exhibits structural changes. See, for example, Horvath
and Huskova (2012) in a mean-shift panel model; De Wachter and Tzavalis (2012), Hidalgo
and Schafgans (2017) in dynamic panels; Pauwels et al. (2012) in panel data models allowing
for heterogeneous coefficients; Chen and Huang (2018) in time-varying panel data model;
Antoch et al. (2019) in panels with fixed T and large N, to name a few. However, the rejec-
tion of the null hypothesis leaves the researcher with no information as to which cross-section
unit exhibits structural changes. Furthermore, it naturally leads to an issue of change point
estimation in panel data models.
The classical change point estimation methodologies in the panel literature often assume that
the break occurs at the same location in every series, referred to as the common break
point. This assumption is particularly attractive because common breaks do arise in
many practical applications. The other major advantage of this assumption, as
Bai (2010) pointed out, is the increased accuracy of the change point estimate. It is well
known that only the break fraction (i.e., the break date divided by sample size) can be
consistently estimated in a single time series. In panel frameworks, however, the failure of
consistency of the break point in time series models has been overcome under the common
break assumption. This enhanced precision of common break point estimate has been widely
confirmed under various frameworks in panel data analyses. Kim (2011, 2014) focused on
panel deterministic time trend models and considered a factor structure in the error component.
Although the former study showed that the ordinary least squares break date estimator fails
to achieve consistency when a factor structure is imposed, the latter overcame this problem
with a new estimation strategy in which the common break date is estimated jointly
with the common factors, preserving the precision advantage of the common break point
estimate in panels. In addition, Qian and Su (2016) dealt with a panel data model where
the parameters of interest are homogeneous and errors are assumed to be cross-sectionally
independent, while Baltagi et al. (2016) considered a more general panel framework allowing
for heterogeneous parameters across individuals and a multifactor error structure. Related
works, including Li et al. (2016), Baltagi et al. (2017), Horvath et al. (2017), and Westerlund
(2019), among others, have documented that the break date estimate gains precision
when the common break assumption is imposed in panels.
In practice, however, the common break assumption is restrictive, and some evidence
suggests that break points are likely to vary significantly across individuals
(see Claeys and Vasıcek 2014; Adesanya 2020). To the best of our knowledge, no existing
paper examines the validity of the common break assumption in panels. We contribute to the
literature in three ways. First, we fill this gap by introducing a test for the null hypothesis
that the panels exhibit a common break against the alternative that break dates vary
across units. The most closely related work is Oka and Perron (2018), who considered common
break detection in a maximum likelihood framework for multiple equation systems. We extend
their model to a more general framework where both the number of series N and the number
of observations T are sufficiently large, which makes the test applicable to panel or
macroeconomic data.
The second major contribution is that we investigate the statistical properties of the
estimated common break point when the common break assumption fails. We show that
the common break estimate cannot be consistent for every series but is confined to a
specific region. Based on this property, our test has a non-degenerate distribution under the
null hypothesis and achieves consistency under the alternative.
Third, our test delivers monotonic power as the magnitude of breaks rises. The statistic is
built from the squared cumulative sum of the residuals, and we replace the long-run variance
estimator with a normalization factor to avoid the power loss that arises as the shift increases
under the alternative (the so-called nonmonotonic power problem). Monte Carlo simulations
show good size performance for large T. Moreover, the test can successfully reject the null
hypothesis of common break against various types of alternatives and has nontrivial power
for large breaks.
From a different perspective, the recent clustering literature suggests an estimation
methodology as an alternative strategy to identify distinct breaks across units in panels. The
panel data are modeled using a grouped pattern, in which the regression coefficients, including
the break dates, are heterogeneous across groups but homogeneous within a group. In this
framework, Okui and Wang (2020) and Lumsdaine et al. (2020) proposed iterative estimation
approaches to jointly estimate the break point, group membership structure, and coefficients.
Consistency of all coefficient estimates can be achieved simultaneously given prior information
on the number of groups and an appropriate choice of initial values for the iteration.
Researchers can choose to conduct a testing procedure, to apply an estimation methodology, or
to use a hybrid of the two approaches, depending on their empirical purpose.
The remainder of this chapter is organized as follows. Section 4.2 introduces the model and
the necessary assumptions. Section 4.3 explains the testing strategy for the common break
assumption. Section 4.4 establishes the asymptotic distribution of the statistic under the null
and the consistency of the test under the alternative. Monte Carlo simulations are conducted
in Section 4.5. Concluding remarks are given in Section 4.6. The mathematical proofs are
relegated to Appendix D.
4.2 Model and Assumptions
We consider a panel data model allowing for heterogeneous coefficients across units, defined
by
$$y_{it} = x_{it}'\beta_i + x_{it}'\delta_i 1\{t > k_i^0\} + u_{it}, \quad 1 \le i \le N \text{ and } 1 \le t \le T, \tag{4.1}$$
where $x_{it} = [x_{it}(1), \cdots, x_{it}(p)]'$ is a p-dimensional vector of explanatory variables
including a constant term, so that its first element is unity for all t. The coefficients
$\beta_i = [\beta_{i1}, \cdots, \beta_{ip}]'$ and $\delta_i = [\delta_{i1}, \cdots, \delta_{ip}]'$ are p × 1 vectors of fixed
parameters, and $1\{t > k_i^0\}$ is an indicator function taking the value one if $t > k_i^0$ and zero
otherwise. $u_{it}$ is an unobservable stochastic disturbance. We assume that the regression
parameters in the ith panel change from $\beta_i$ to $\beta_i + \delta_i$ at an unknown time $k_i^0$, and we are
interested in testing whether the break point in each series is common against the alternative
that the break point varies across individuals. The null hypothesis is defined as
$$H_0 : k_i^0 = k^0 \quad \text{for all } i = 1, 2, \cdots, N.$$
Under the alternative of distinct breaks across individuals, we suppose that there exist G
groups and that the regression coefficients share a common break point within each group
g = 1, 2, · · · , G. Then, the alternative hypothesis is defined by
$$H_A : k_{g_1}^0 \neq k_{g_2}^0 \quad \text{for some } g_1, g_2 \in \{1, 2, \cdots, G\}.$$
In this chapter, we impose the following assumptions.
Assumption 4.1 $k_i^0 = [T\tau_i^0]$, where $\tau_i^0 \in (0, 1)$ and $[\cdot]$ is the greatest integer function.
The break point $k_i^0$ is a positive fraction of the total sample size and is therefore bounded
away from the end points. This is a conventional assumption in the change point
literature, see Bai (1997).
Assumption 4.2 Define $\phi_N = \sum_{i=1}^{N} \delta_i^{0\prime}\delta_i^0$. Suppose that
(i) $\phi_N \to \infty$ as $N \to \infty$,
(ii) $\phi_N / N$ is bounded as $N \to \infty$,
(iii) $\phi_N T / N \to \infty$, $\phi_N \sqrt{T/N} \to \infty$, and $\phi_N \sqrt{T} / N \to \infty$ as $(T, N) \to \infty$,
(iv) $N / T \to 0$ as $(T, N) \to \infty$.
Here $\delta_i^0$ denotes the true shift for individual i. Assumptions 4.2(i)-(iii) are borrowed from
Assumption A2 in Baltagi et al. (2016). The additional Assumption 4.2(iv) requires that T grows
at a faster rate than N. This condition is essential to ensure a non-degenerate distribution
of the statistic under the null hypothesis and consistency of the test under the alternative.
Assumption 4.3 (i) For each series i, $u_{it}$ is independent of $x_{it}$ for all i and t;
(ii) $u_{it} = \sum_{j=0}^{\infty} a_{ij}\epsilon_{i,t-j}$, where $\epsilon_{it} \sim (0, \sigma_{i\epsilon}^2)$ are i.i.d. over all i and t, and
$\sum_{j} j|a_{ij}| \le M$ for all i.
The idiosyncratic errors form a stationary time series, and it is assumed that uit are cross-
sectionally independent, similarly to the assumption in Bai (2010). In practice, this assump-
tion is relatively restrictive as cross-sectional dependence commonly exists in many panel
datasets. As explained in Sections 4.3 and 4.4, the statistic of our test can have a non-
degenerate distribution under the null hypothesis of common break, crucially depending on
the consistency of common change point estimate. However, Kim (2011) has indicated that
imposing a factor structure on error component may impede the consistency property. Some
additional techniques are needed if we relax Assumption 4.3 to allow for cross-sectional de-
pendence.
Assumption 4.4 (i) For $i = 1, \cdots, N$, the matrices $(1/j)\sum_{t=1}^{j} x_{it}x_{it}'$, $(1/j)\sum_{t=T-j+1}^{T} x_{it}x_{it}'$,
$(1/j)\sum_{t=k_i^0-j+1}^{k_i^0} x_{it}x_{it}'$, and $(1/j)\sum_{t=k_i^0+1}^{k_i^0+j} x_{it}x_{it}'$ are stochastically bounded and have
minimum eigenvalues uniformly bounded away from zero in probability for all large j.
(ii) For each i, $(1/T)\sum_{t=1}^{T} x_{it}x_{it}'$ converges in probability to a nonrandom and positive
definite p × p matrix $C_i$ as $T \to \infty$.
(iii) For each i, $(1/T)\sum_{t=1}^{T} x_{it}$ converges in probability to a p × 1 vector $c_{i1}$ as $T \to \infty$.
Denote the jth row of $C_i$ by $c_{ij}'$ for $j = 1, \cdots, p$. That is, $C_i = [c_{i1}, \cdots, c_{ip}]'$. Note that the
vector $c_{i1}'$ is the first row of $C_i$.
Assumption 4.5 (i) For any positive finite integer s, the matrices
$(1/N)\sum_{i=1}^{N}\sum_{t=k_i^0-s+1}^{k_i^0} x_{it}x_{it}'$ and $(1/N)\sum_{i=1}^{N}\sum_{t=k_i^0+1}^{k_i^0+s} x_{it}x_{it}'$
are stochastically bounded and have minimum eigenvalues
uniformly bounded away from zero in probability for all large N.
(ii) For each t, $(1/N)\sum_{i=1}^{N} x_{it}x_{it}'$ is stochastically bounded as $N \to \infty$.
Assumption 4.4 is a conventional assumption in time series models (see, e.g., Bai 1997),
while Assumption 4.5 extends it to the cross-sectional dimension and is borrowed
from Assumption A5 in Baltagi et al. (2016).
4.3 Test Statistic
The null hypothesis assumes that the panels exhibit one break occurring at an unknown
common location. We first use least squares method as proposed by Baltagi et al. (2016) to
estimate the common break point. Let
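$$Y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}, \quad
X_i = \begin{bmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT}' \end{bmatrix}, \quad
Z_i(k_i) = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ x_{i(k_i+1)}' \\ \vdots \\ x_{iT}' \end{bmatrix}, \quad \text{and} \quad
u_i = \begin{bmatrix} u_{i1} \\ u_{i2} \\ \vdots \\ u_{iT} \end{bmatrix}.$$
The model with an unknown break point $k_i$ can be rewritten in matrix form as
$$\begin{aligned}
Y_i &= X_i\beta_i + Z_i(k_i)\delta_i + u_i \\
&= [X_i, Z_i(k_i)] \begin{bmatrix} \beta_i \\ \delta_i \end{bmatrix} + u_i \\
&= X_i(k_i) b_i + u_i,
\end{aligned} \tag{4.2}$$
where $X_i(k_i) = [X_i, Z_i(k_i)]$ and $b_i = [\beta_i', \delta_i']'$. Given any $k^* = 1, 2, \cdots, T-1$, one can
estimate $b_i$ by
$$\hat{b}_i(k^*) = \begin{bmatrix} \hat{\beta}_i(k^*) \\ \hat{\delta}_i(k^*) \end{bmatrix} = [X_i(k^*)'X_i(k^*)]^{-1} X_i(k^*)'Y_i, \quad i = 1, \cdots, N.$$
The sum of squared residuals for the ith equation is given by
$$SSR_i(k^*) = [Y_i - X_i(k^*)\hat{b}_i(k^*)]'[Y_i - X_i(k^*)\hat{b}_i(k^*)], \quad i = 1, \cdots, N.$$
The least squares estimator of the common break point is defined as
$$\hat{k} = \arg\min_{1 \le k^* \le T-1} \sum_{i=1}^{N} \pi_i SSR_i(k^*), \tag{4.3}$$
where the weights satisfy $\pi_i \in (0, 1)$, $i = 1, \cdots, N$, and $\sum_{i=1}^{N} \pi_i = 1$.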
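The minimization in (4.3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the thesis (which reports using GAUSS): the function names `ssr_one_break` and `estimate_common_break` are hypothetical, equal weights $\pi_i = 1/N$ are used by default, and the candidate dates are restricted to $p \le k^* \le T - p$ so that both regimes are identified. The sketch relies on the fact that regressing $Y_i$ on $[X_i, Z_i(k^*)]$ gives the same residuals as fitting OLS separately before and after $k^*$.

```python
import numpy as np


def ssr_one_break(y, X, k):
    """SSR of one unit when a break is imposed after observation k.

    Regressing y on [X, Z(k)] spans the same column space as fitting OLS
    separately on observations 1..k and k+1..T, so each regime is fitted
    on its own.
    """
    ssr = 0.0
    for ys, Xs in ((y[:k], X[:k]), (y[k:], X[k:])):
        coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        resid = ys - Xs @ coef
        ssr += float(resid @ resid)
    return ssr


def estimate_common_break(Y, X, weights=None):
    """Minimise the weighted sum of unit-level SSRs over candidate dates.

    Y has shape (N, T) and X has shape (N, T, p). Candidates are limited to
    p <= k <= T - p (an assumption of this sketch; the text searches over
    1 <= k <= T - 1). Equal weights are used unless `weights` is given.
    """
    N, T = Y.shape
    p = X.shape[2]
    w = np.full(N, 1.0 / N) if weights is None else np.asarray(weights, float)
    candidates = range(p, T - p + 1)
    objective = [
        sum(w[i] * ssr_one_break(Y[i], X[i], k) for i in range(N))
        for k in candidates
    ]
    return candidates[int(np.argmin(objective))]
```

The returned integer counts the observations in the first regime, matching the convention $1\{t > k\}$ in (4.1).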
Our statistic is constructed from the ordinary least squares residuals based on the estimated
common break point $\hat{k}$. We split the panels into two regimes at $\hat{k}$ along the time series
dimension. Then, the OLS residuals are calculated by
$$\hat{u}_i = [\hat{u}_{i1}, \hat{u}_{i2}, \cdots, \hat{u}_{iT}]' = Y_i - X_i(\hat{k})\hat{b}_i(\hat{k}), \tag{4.4}$$
and the squares of the partial sum of the OLS residuals $\hat{u}_{it}$ are defined by
$$US_{NT}(k, \hat{k}) = \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{k} \hat{u}_{it} \right)^2, \quad \text{where } k = [T\tau] \text{ with } \tau \in (0, 1). \tag{4.5}$$
The statistic is a CUSUM-type statistic of the residuals, motivated by the consistency of the
break point estimate when the common break assumption holds. Under the null hypothesis that
all individuals share a common change point $k^0 = [T\tau^0]$ with $\tau^0 \in (0, 1)$, Baltagi et al.
(2016) verified that the common break date is consistently estimated. Since
$\hat{k} \xrightarrow{p} k^0$, the regression parameters corresponding to the regimes $\{x_{i1}, \cdots, x_{i\hat{k}}\}$ and
$\{x_{i(\hat{k}+1)}, \cdots, x_{iT}\}$ are asymptotically constant over time. Consequently, the cumulative sums
of the corresponding residuals do not diverge and have a non-degenerate limit distribution,
given by
$$US_{NT}(k, \hat{k}) \Rightarrow
\begin{cases}
\sigma^2 \left[ W(\tau) - \dfrac{\tau}{\tau^0} W(\tau^0) \right]^2 & \text{if } \tau \le \tau^0, \\[2ex]
\sigma^2 \left[ W(\tau) - W(\tau^0) - \dfrac{\tau - \tau^0}{1 - \tau^0} \left( W(1) - W(\tau^0) \right) \right]^2 & \text{if } \tau > \tau^0,
\end{cases}$$
where $W(\cdot)$ is a one-dimensional Brownian motion and $\sigma^2$ is the long-run variance. Under the
alternative of distinct breaks, the estimated common break point cannot coincide with
the true break point for every series, so the partial sums of the residuals deviate greatly from
their behavior under the null. Hence, $US_{NT}(k, \hat{k})$ diverges to infinity as $N, T \to \infty$, so that we
can successfully reject the null hypothesis.
A traditional approach is to replace the unknown $\sigma^2$ with a consistent estimate, most
commonly a kernel estimator. Typically, the choice of the bandwidth for the kernel
estimator greatly affects the size and power of the test. In time series analyses,
it has been widely documented that structural change tests suffer from the so-called
non-monotonic power problem, that is, the tests may lose power as the magnitude of the break
rises. See Vogelsang (1999), Deng and Perron (2008), Yamazaki and Kurozumi (2015), and Jiang
and Kurozumi (2019), among others. The main reason is that the long-run variance
estimator is consistent under the null but may be severely biased under the alternative.
To maintain nontrivial detection power for large breaks, we extend the self-normalization
method proposed by Shao and Zhang (2010) and construct a normalization factor instead of
using a long-run variance estimate. This normalization factor $V_{NT}(k_1, \hat{k}, k_2)$ is required to
be proportional to $\sigma^2$ so that the long-run variance cancels out as
$$\frac{US_{NT}(k, \hat{k})}{V_{NT}(k_1, \hat{k}, k_2)} \Rightarrow \frac{\sigma^2 \times \text{functional of Brownian motions}}{\sigma^2 \times \text{functional of Brownian motions}},$$
where the long-run variance is $\sigma^2 = \lim_{(N,T) \to \infty} E\left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{T} u_{it} \right)^2$. Furthermore, to
avoid loss of power, the normalization process must not grow faster than the process
$US_{NT}(k, \hat{k})$ under the alternative. To this end, we separate the panels into four regimes using
flexible points $k_1$, $k_2$ and the estimated break point $\hat{k}$, where $k_1$ and $k_2$ take values such that
$1 \le k_1 < \hat{k} < k_2 \le T - 1$. We estimate the model on the basis of the four regimes
$\{x_{i1}, \cdots, x_{ik_1}\}$, $\{x_{i(k_1+1)}, \cdots, x_{i\hat{k}}\}$, $\{x_{i(\hat{k}+1)}, \cdots, x_{ik_2}\}$, and $\{x_{i(k_2+1)}, \cdots, x_{iT}\}$ for the ith
equation. Denote T × p matrices by
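$$X_{ji}(a, b) = [0, \cdots, 0, x_{i,a+1}, \cdots, x_{i,b}, 0, \cdots, 0]', \quad j = 1, 2, \tag{4.6}$$
$$X_{3i}(a) = [0, \cdots, 0, x_{i,a+1}, \cdots, x_{i,T}]', \tag{4.7}$$
where the (a+1)th to bth rows of $X_{ji}(a, b)$ are the same as those of $X_i$ and all other rows are
zero, and the (a+1)th to Tth rows of $X_{3i}(a)$ are the same as those of $X_i$ and all other rows
are zero. Then, the model can be represented by
$$\begin{aligned}
Y_i &= [X_i, X_{1i}(k_1, \hat{k}), X_{2i}(\hat{k}, k_2), X_{3i}(k_2)]
\begin{bmatrix} \beta_i \\ \delta_{1i} \\ \delta_{2i} \\ \delta_{3i} \end{bmatrix} + u_i \\
&= X_i\beta_i + X_{1i}(k_1, \hat{k})\delta_{1i} + X_{2i}(\hat{k}, k_2)\delta_{2i} + X_{3i}(k_2)\delta_{3i} + u_i \\
&= X_i(k_1, \hat{k}, k_2) b_i + u_i,
\end{aligned} \tag{4.8}$$
where $X_i(k_1, \hat{k}, k_2) = [X_i, X_{1i}(k_1, \hat{k}), X_{2i}(\hat{k}, k_2), X_{3i}(k_2)]$. Using the coefficient estimators
$\tilde{\beta}_i$, $\tilde{\delta}_{1i}$, $\tilde{\delta}_{2i}$, and $\tilde{\delta}_{3i}$, the corresponding residuals are calculated as
$$\tilde{u}_i = [\tilde{u}_{i1}, \tilde{u}_{i2}, \cdots, \tilde{u}_{iT}]' = Y_i - X_i\tilde{\beta}_i - X_{1i}(k_1, \hat{k})\tilde{\delta}_{1i} - X_{2i}(\hat{k}, k_2)\tilde{\delta}_{2i} - X_{3i}(k_2)\tilde{\delta}_{3i}. \tag{4.9}$$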
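Then, we define the process $V_{NT}(k_1, \hat{k}, k_2)$ based on the residuals $\tilde{u}_{it}$ as
$$\begin{aligned}
V_{NT}(k_1, \hat{k}, k_2)
&= \frac{1}{T} \sum_{s=1}^{k_1} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{s} \tilde{u}_{it} \right)^2
+ \frac{1}{T} \sum_{s=k_1+1}^{\hat{k}} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=s}^{\hat{k}} \tilde{u}_{it} \right)^2 \\
&\quad + \frac{1}{T} \sum_{s=\hat{k}+1}^{k_2} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=\hat{k}+1}^{s} \tilde{u}_{it} \right)^2
+ \frac{1}{T} \sum_{s=k_2+1}^{T} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=s}^{T} \tilde{u}_{it} \right)^2.
\end{aligned} \tag{4.10}$$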
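Thus, our test statistic combines the squared CUSUM of residuals (4.5) and the
normalization factor (4.10) and is defined by
$$\begin{aligned}
S_{NT}(k, k_1, k_2) &= \sup_{(k, k_1, k_2) \in \Omega(\epsilon)} \frac{US_{NT}(k, \hat{k})}{V_{NT}(k_1, \hat{k}, k_2)} \\
&= \sup_{(k, k_1, k_2) \in \Omega(\epsilon)} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{k} \hat{u}_{it} \right)^2
\left[ \frac{1}{T} \sum_{s=1}^{k_1} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{s} \tilde{u}_{it} \right)^2
+ \frac{1}{T} \sum_{s=k_1+1}^{\hat{k}} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=s}^{\hat{k}} \tilde{u}_{it} \right)^2 \right. \\
&\qquad \left. + \frac{1}{T} \sum_{s=\hat{k}+1}^{k_2} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=\hat{k}+1}^{s} \tilde{u}_{it} \right)^2
+ \frac{1}{T} \sum_{s=k_2+1}^{T} \left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=s}^{T} \tilde{u}_{it} \right)^2 \right]^{-1},
\end{aligned}$$
where $\Omega(\epsilon) = \{(k, k_1, k_2) \text{ or } (\tau, \tau_1, \tau_2) : [T\epsilon] \le k \le [T(1-\epsilon)],\ [T\epsilon] \le k_1 \le \hat{k} - [T\epsilon],\ \hat{k} + [T\epsilon] \le k_2 \le [T(1-\epsilon)]\}$,
with $k = [T\tau]$, $k_1 = [T\tau_1]$, and $k_2 = [T\tau_2]$ for $\tau, \tau_1, \tau_2 \in (0, 1)$.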
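To make the construction concrete, the following Python sketch computes the self-normalized statistic for a given break estimate. It is an illustration under stated assumptions rather than the thesis code: the names `pooled_residuals`, `sq_partial_sums`, and `self_normalized_stat` are hypothetical, the break estimate `k_hat` is supplied by the caller, and the trimming `[T * eps]` is assumed to be at least p so every regime can be fitted. Two observations keep the search cheap. First, model (4.8) is equivalent to fitting OLS regime by regime, so the residuals before the break estimate depend only on $k_1$ and those after it only on $k_2$. Second, the numerator depends only on $k$, so the supremum of the ratio equals the largest numerator divided by the smallest denominator.

```python
import numpy as np


def pooled_residuals(Y, X, a, b):
    """Sum over units of OLS residuals from fitting rows a..b-1 on their own."""
    total = np.zeros(b - a)
    for y, x in zip(Y, X):
        coef, *_ = np.linalg.lstsq(x[a:b], y[a:b], rcond=None)
        total += y[a:b] - x[a:b] @ coef
    return total


def sq_partial_sums(r):
    """Sum of squared partial sums of the sequence r."""
    return float(np.sum(np.cumsum(r) ** 2))


def self_normalized_stat(Y, X, k_hat, eps=0.1):
    """sup US_NT / V_NT over the trimmed set, for a given break estimate."""
    N, T = Y.shape
    h = int(T * eps)
    upper = int(T * (1 - eps))
    if k_hat - h < h or k_hat + h > upper:
        raise ValueError("k_hat is too close to the sample ends for this eps")
    scale = N * T

    # Numerator (4.5): squared CUSUM of the two-regime residuals, k = h..upper.
    u_hat = np.concatenate([pooled_residuals(Y, X, 0, k_hat),
                            pooled_residuals(Y, X, k_hat, T)])
    cusum_sq = np.cumsum(u_hat) ** 2 / scale
    numerator = cusum_sq[h - 1:upper].max()

    # Denominator (4.10): forward sums on regimes 1 and 3, backward sums on
    # regimes 2 and 4 (reversing a regime turns backward sums into forward ones).
    left = min(
        sq_partial_sums(pooled_residuals(Y, X, 0, k1))
        + sq_partial_sums(pooled_residuals(Y, X, k1, k_hat)[::-1])
        for k1 in range(h, k_hat - h + 1)
    )
    right = min(
        sq_partial_sums(pooled_residuals(Y, X, k_hat, k2))
        + sq_partial_sums(pooled_residuals(Y, X, k2, T)[::-1])
        for k2 in range(k_hat + h, upper + 1)
    )
    denominator = (left + right) / (scale * T)
    return numerator / denominator
```

Because no long-run variance estimate enters, the value is unchanged when all residuals are multiplied by a constant, which is the point of the self-normalization.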
4.4 Asymptotic Theory
We next derive the limiting properties of the test statistic.
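Theorem 4.1 Suppose that Assumptions 4.1-4.5 hold. Then, under $H_0$, we have, as $N, T \to \infty$,
$$\begin{aligned}
S_{NT}(k, k_1, k_2) \Rightarrow \sup_{(\tau, \tau_1, \tau_2) \in \Omega(\epsilon)}
&\left\{ W(\tau) - \tau \frac{W(\tau^0)}{\tau^0} - (\tau - \tau^0) \left[ \frac{W(1) - W(\tau^0)}{1 - \tau^0} - \frac{W(\tau^0)}{\tau^0} \right] 1\{\tau > \tau^0\} \right\}^2 \\
&\times \left\{ \int_0^{\tau_1} \left[ W(r) - r \frac{W(\tau_1)}{\tau_1} \right]^2 dr
+ \int_{\tau_1}^{\tau^0} \left[ W(\tau^0) - W(r) - (\tau^0 - r) \frac{W(\tau^0) - W(\tau_1)}{\tau^0 - \tau_1} \right]^2 dr \right. \\
&\quad + \int_{\tau^0}^{\tau_2} \left[ W(r) - W(\tau^0) - (r - \tau^0) \frac{W(\tau_2) - W(\tau^0)}{\tau_2 - \tau^0} \right]^2 dr \\
&\quad \left. + \int_{\tau_2}^{1} \left[ W(1) - W(r) - (1 - r) \frac{W(1) - W(\tau_2)}{1 - \tau_2} \right]^2 dr \right\}^{-1},
\end{aligned}$$
where $W(\cdot)$ is a standard Brownian motion, $k = [T\tau]$, $k_1 = [T\tau_1]$, $k^0 = [T\tau^0]$, and $k_2 = [T\tau_2]$
with $\tau, \tau^0, \tau_1, \tau_2 \in (0, 1)$.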
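The limit in Theorem 4.1 can be approximated by replacing $W(\cdot)$ with scaled partial sums of independent standard normal draws. The sketch below is a hedged illustration, not the thesis code: the names `bridge_sq_integral`, `limit_draw`, and `simulate_critical_value` are hypothetical, the default grid size and number of replications are deliberately small, and the sketch assumes $2\epsilon \le \tau^0 \le 1 - 2\epsilon$ so that the ranges for $\tau_1$ and $\tau_2$ are non-empty. It uses the fact that every bracketed term in the theorem is a Brownian bridge on one of the intervals $[0, \tau_1]$, $[\tau_1, \tau^0]$, $[\tau^0, \tau_2]$, $[\tau_2, 1]$ (or on $[0, \tau^0]$ and $[\tau^0, 1]$ for the numerator), up to sign.

```python
import numpy as np


def bridge_sq_integral(W, a, b):
    """Riemann sum of the squared bridge of W between grid points a and b."""
    n = len(W) - 1
    j = np.arange(a, b + 1)
    bridge = W[j] - W[a] - (j - a) / (b - a) * (W[b] - W[a])
    return float(np.sum(bridge ** 2) / n)


def limit_draw(tau0, n, eps, rng):
    """One draw from a discretised version of the limit in Theorem 4.1."""
    W = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n)))) / np.sqrt(n)
    j0, h = round(tau0 * n), int(eps * n)
    j = np.arange(n + 1)
    # Numerator: bridge on [0, tau0] for tau <= tau0, on [tau0, 1] otherwise.
    bridge = np.where(
        j <= j0,
        W - j / j0 * W[j0],
        W - W[j0] - (j - j0) / (n - j0) * (W[n] - W[j0]),
    )
    numerator = np.max(bridge[h:n - h + 1] ** 2)
    # Denominator: the tau1 part and the tau2 part can be minimised separately.
    left = min(bridge_sq_integral(W, 0, j1) + bridge_sq_integral(W, j1, j0)
               for j1 in range(h, j0 - h + 1))
    right = min(bridge_sq_integral(W, j0, j2) + bridge_sq_integral(W, j2, n)
                for j2 in range(j0 + h, n - h + 1))
    return numerator / (left + right)


def simulate_critical_value(tau0, level=0.05, n=200, reps=200, eps=0.1, seed=0):
    """Upper `level` quantile of simulated draws (small defaults for speed)."""
    if not 2 * eps <= tau0 <= 1 - 2 * eps:
        raise ValueError("this sketch needs 2*eps <= tau0 <= 1 - 2*eps")
    rng = np.random.default_rng(seed)
    draws = [limit_draw(tau0, n, eps, rng) for _ in range(reps)]
    return float(np.quantile(draws, 1 - level))
```

Larger values of `n` and `reps` give a finer approximation at a higher computational cost.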
Under the null hypothesis, the proposed test has a non-standard limit distribution that depends
on the true break fraction, which is unknown in practice. We choose $\tau^0 = 0.1, 0.2, \cdots, 0.9$
and approximate the Brownian motions using 2,000 independent normal random variables with
10,000 replications to obtain the critical values in Table 4.1. A researcher can calculate
an appropriate critical value from the estimated break fraction $\hat{\tau}$ by interpolating linearly
between adjacent table entries. For example, if $\hat{\tau} \in [0.4, 0.5)$, we obtain the critical value by
$$c = c_{0.4} + 10(\hat{\tau} - 0.4)(c_{0.5} - c_{0.4}).$$
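A small helper makes the interpolation rule explicit for any estimated break fraction on the tabulated grid. This is a sketch only: the function name `interpolated_critical_value` is hypothetical, and the critical values must be taken from Table 4.1, which is not reproduced here.

```python
from bisect import bisect_right


def interpolated_critical_value(tau_hat, taus, crit_values):
    """Linear interpolation between tabulated critical values.

    `taus` is the increasing grid of break fractions and `crit_values` holds
    the matching critical values (to be filled in from the table).
    """
    if len(taus) != len(crit_values) or len(taus) < 2:
        raise ValueError("need matching grids with at least two points")
    if not taus[0] <= tau_hat <= taus[-1]:
        raise ValueError("tau_hat lies outside the tabulated grid")
    # Index of the grid point at or just below tau_hat.
    j = min(bisect_right(taus, tau_hat) - 1, len(taus) - 2)
    weight = (tau_hat - taus[j]) / (taus[j + 1] - taus[j])
    return crit_values[j] + weight * (crit_values[j + 1] - crit_values[j])
```

With a grid spacing of 0.1 the weight equals 10 times the distance from the lower grid point, which is the factor appearing in the formula.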
We next investigate the behavior of the proposed test statistic when the breaks vary across
individuals. We focus on the case in which there are two groups and individuals in the same
group share a common break $k_j^0$, $j = 1, 2$:
$$H_{1A} : |k_1^0 - k_2^0| \ge \Delta T \quad \text{for some } \Delta > 0.$$
Assumption 4.6 Let $N_j$, $j = 1, 2$, denote the number of units in group j ($N = N_1 + N_2$).
Suppose that $N_j / N \to \pi_j > 0$ for $j = 1, 2$.
In order to characterize the limiting properties of the test statistic under the alternative, it is
useful to first state some preliminary results about the statistical properties of the estimated
common break point. Define $K(C) = \{k : 1 \le k < k_1^0 - C_1 \text{ or } k_2^0 + C_2 < k \le T - 1\}$, where
$C_1$, $C_2$ are finite numbers.
Proposition 4.1 Suppose that Assumptions 4.1-4.6 hold. Then, under $H_{1A}$, for any given
$\epsilon > 0$, for both large N and T,
$$P(\hat{k} \in K(C)) < \epsilon.$$
Proposition 4.1 states the possible region of the location of the common break date estimator
when the common break assumption fails. It implies that this estimator stays away from
both end points of the sample. In other words, the estimated common break point
lies between the two true break points or within a stochastically bounded distance of one of
the true break dates.
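Proposition 4.2 Suppose that Assumptions 4.1-4.6 hold. Under $H_{1A}$, for both large N and T,
(i) if $\hat{k} < k_1^0$, $\quad \sup_{k \in \Omega(\epsilon)} US_{NT}(k, \hat{k}) = O_p(NT)$,
(ii) if $k_1^0 \le \hat{k} \le k_2^0$, $\quad \sup_{k \in \Omega(\epsilon)} US_{NT}(k, \hat{k}) = O_p(NT)$,
(iii) if $k_2^0 < \hat{k}$, $\quad \sup_{k \in \Omega(\epsilon)} US_{NT}(k, \hat{k}) = O_p(NT)$.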
Proposition 4.2 derives the divergence rate of the process $US_{NT}(k, \hat{k})$ under the alternative.
In case (i), Proposition 4.1 implies that the common change point estimate is within a bounded
distance of the true break point $k_1^0$, that is, $k_1^0 - \hat{k} = O_p(1)$. Since we assume that the two true
breaks are separated by some positive fraction of the sample size, $\hat{k}$ is far from the other
break date $k_2^0$. Therefore, for individuals in group 2, the regression parameters are estimated
based on an inconsistent break fraction estimate. Then, we can find that the CUSUM of the
corresponding residuals $\hat{u}_{it}$ in $US_{NT}(k, \hat{k})$ diverges to infinity at a rate of NT. For the
second and third cases, it is shown that the divergence rate of the process $US_{NT}(k, \hat{k})$ is the
same as that in case (i).
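Proposition 4.3 Suppose that Assumptions 4.1-4.6 hold. Under $H_{1A}$, for any given $\epsilon > 0$,
there exists a finite $M > 0$ such that, for both large N and T,
(i) $P\left( \inf_{(k_1, k_2) \in \Omega(\epsilon)} V_{NT}(k_1, \hat{k}, k_2) > M \,\middle|\, k_1^0 - C_1 < \hat{k} \le k_1^0 \right) < \epsilon$,
(ii) $P\left( \inf_{(k_1, k_2) \in \Omega(\epsilon)} V_{NT}(k_1, \hat{k}, k_2) > M \,\middle|\, k_1^0 < \hat{k} < k_2^0 \right) < \epsilon$,
(iii) $P\left( \inf_{(k_1, k_2) \in \Omega(\epsilon)} V_{NT}(k_1, \hat{k}, k_2) > M \,\middle|\, k_2^0 \le \hat{k} < k_2^0 + C_2 \right) < \epsilon$.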
Proposition 4.3 investigates the limiting properties of the normalization process under the
alternative. The results state that $\inf_{(k_1, k_2) \in \Omega(\epsilon)} V_{NT}(k_1, \hat{k}, k_2)$ is $O_p(1)$. Since the model
for the normalization factor is estimated on four subsamples, we can always find
appropriate $k_1$ and $k_2$ such that the minimized value does not diverge. The numerator of the
statistic diverges at a rate of NT while the denominator remains stochastically bounded. We
thus obtain the consistency of the test under the alternative in the following theorem.
Theorem 4.2 Suppose that Assumptions 4.1-4.6 hold. Then, under $H_{1A}$, we have, as
$N, T \to \infty$,
$$S_{NT}(k, k_1, k_2) \to \infty.$$
The consistency of this test is achieved under a particular and specified alternative H1A.
Nevertheless, our simulations confirm that this test is still valid and powerful against a
variety of alternatives.
4.5 Finite Sample Properties
In this section, we investigate the finite sample performance of the test considered in the
previous sections. The data-generating process (DGP.1) under the null hypothesis of one
common break is given by
$$y_{it} = x_{it}'\beta_i + x_{it}'\delta_i 1\{t > k^0\} + u_{it}, \quad i = 1, \cdots, N, \quad t = 1, \cdots, T,$$
where $x_{it} = [1, z_{it}]'$ includes a constant, and each $z_{it}$ is drawn from $N(1, 1)$ and is
independent of the errors $u_{it}$, $1 \le t \le T$, $1 \le i \le N$. We assume that there exists a common
break $k^0 = [0.5T]$ in the slopes. The coefficients are $\beta_i \sim$ i.i.d. $U(0, 0.8)$, and $\delta_i$ is the jump
for each series with $\delta_i \sim$ i.i.d. $U(0, 0.5)$. We allow for serial correlation in the errors,
$u_{it} = \rho u_{i(t-1)} + e_{it}$ with $e_{it} \sim$ i.i.d. $N(0, (1 - \rho)^2)$. The trimming parameter $\epsilon$ is 0.1, the number
of replications is 2,000, and all computations are conducted using the GAUSS matrix language.
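For readers who want to reproduce the design outside GAUSS, the following Python sketch generates one panel from DGP.1. It is a minimal illustration under stated assumptions: the function name `simulate_dgp1` is hypothetical, each element of $\beta_i$ and $\delta_i$ is drawn independently from the stated uniform distributions, and the AR(1) errors are started from a burn-in period of 100 observations, a detail the text does not specify. Note that the innovation standard deviation $1 - \rho$ makes the long-run variance of $u_{it}$ equal to $(1-\rho)^2 / (1-\rho)^2 = 1$ for every $\rho$.

```python
import numpy as np


def simulate_dgp1(N, T, rho=0.0, tau0=0.5, burn=100, seed=0):
    """Draw (Y, X, k0) from DGP.1 with a common break at k0 = [tau0 * T]."""
    rng = np.random.default_rng(seed)
    k0 = int(tau0 * T)

    # Regressors x_it = [1, z_it]' with z_it ~ N(1, 1).
    z = rng.normal(1.0, 1.0, size=(N, T))
    X = np.stack([np.ones((N, T)), z], axis=2)          # shape (N, T, 2)

    # Heterogeneous coefficients and jumps (element-wise draws are assumed).
    beta = rng.uniform(0.0, 0.8, size=(N, 2))
    delta = rng.uniform(0.0, 0.5, size=(N, 2))

    # AR(1) errors with innovation standard deviation (1 - rho).
    e = rng.normal(0.0, 1.0 - rho, size=(N, T + burn))
    u = np.zeros((N, T + burn))
    u[:, 0] = e[:, 0]
    for t in range(1, T + burn):
        u[:, t] = rho * u[:, t - 1] + e[:, t]
    u = u[:, burn:]

    # Indicator 1{t > k0} for t = 1, ..., T.
    post = (np.arange(1, T + 1) > k0).astype(float)
    Y = (np.einsum("ntp,np->nt", X, beta)
         + post * np.einsum("ntp,np->nt", X, delta)
         + u)
    return Y, X, k0
```

Calling `simulate_dgp1(N=50, T=200, rho=0.4)` returns a panel `Y` of shape `(50, 200)`, the regressor array `X`, and the true break date.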
Table 4.2 summarizes the empirical sizes of the test for different pairs of (N, T). In the
case of i.i.d. errors, the empirical rejection rate is close to the nominal significance level
of the test. When the errors are serially correlated with ρ = 0.4, 0.8, the size distortion is
quite noticeable for small N and T. The size improves for large T and is quite close to the
nominal level at T = 200 for ρ = 0.4.
In practice, usually no prior information is available on the form of structural changes for
researchers. Therefore, we conduct extensive simulations to explore the empirical power of
the test for various group patterns of structural change and different magnitude of the break.
We consider three types of the alternative hypotheses:
• H1A: There are two groups and the series in each group share common break k0j ,
j = 1, 2. Let Nj denote the number of units in group j and N = N1 +N2.
• H2A: There are three groups and the series in each group share common break k0j ,
j = 1, 2, 3. Note that N = N1 +N2 +N3.
• H3A: Suppose that there is no group pattern. The break point for jth series is given
by k0j , j = 1, 2, · · · , N .
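The data generating process (DGP.2) under $H_{1A}$ is given by
$$\begin{cases}
y_{it} = x_{it}'\beta_i + x_{it}'\delta_{1i} 1\{t > k_1^0\} + u_{it}, & t = 1, \cdots, T, \text{ for } i \text{ in group 1}, \\
y_{it} = x_{it}'\beta_i + x_{it}'\delta_{2i} 1\{t > k_2^0\} + u_{it}, & t = 1, \cdots, T, \text{ for } i \text{ in group 2},
\end{cases}$$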
which is the same as DGP.1 except that the change point varies across groups. The first group
exhibits one common break at $k_1^0 = [T/4]$, and we set the time of change $k_2^0$ equal to $[3T/4]$ in
the second group. We assume $\beta_i \sim$ i.i.d. $U(0, 0.8)$ and the jumps $\delta_{1i}, \delta_{2i} \sim$ i.i.d. $U(0, 0.5)$. The
ratio of units among groups is set to $N_1 : N_2 = 5 : 5$. Table 4.3 shows that the test is powerful
under $H_{1A}$ except when N is 10. Table 4.4 reports the effect of the magnitude of change on
power. As expected, the proposed test delivers monotonic power. We further find that
the power of the test is sensitive to the location of the change points. In Table 4.5, we fix one
common break at $[0.2T]$ and let the other break vary from $[0.25T]$ to $[0.8T]$. When the two
breaks are well separated by a positive fraction of the sample size, the common break cannot
be consistently estimated for at least one group. The residuals based on an inconsistent
estimator deviate further from their behavior under the null, which makes the test powerful.
When the two break dates are quite close and cannot be easily distinguished, the
small deviation of the residuals results in a loss of power. Table 4.6 shows that the test
is powerful when the number of units in each group is sufficiently large, but loses
power when the number of units in one of the groups is relatively small. Furthermore,
we investigate the power performance when detecting orthogonal structural changes. The
magnitude of the change for individual i is $\Delta_i \sim$ i.i.d. $U(0, 0.5)$. We use the vectors $a = [1, 1]'$
and $b = [1, -1]'$ to represent the directions of the breaks. Then, the jumps are specified by
$\delta_{1i}, \delta_{2i} = a\Delta_i$ or $b\Delta_i$, which correspond to non-orthogonal changes and orthogonal changes,
respectively. We consider three cases: (a) non-orthogonal changes ($\delta_{1i}, \delta_{2i} = a\Delta_i$), (b) a mix
of non-orthogonal and orthogonal changes ($\delta_{1i} = a\Delta_i$, $\delta_{2i} = b\Delta_i$), and (c) orthogonal changes
($\delta_{1i}, \delta_{2i} = b\Delta_i$). The results are presented in Table 4.7. When detecting non-orthogonal
structural changes, the test is powerful, while the test loses power in the case of orthogonal
structural changes. The development of a test robust to orthogonal changes is left for future work.
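One way to read the power loss under orthogonal changes, offered here as an interpretation rather than a result stated in the text, is to look at the mean shift that each direction induces in $y_{it}$. Under the simulation design $z_{it} \sim N(1, 1)$, so $E[x_{it}] = [1, 1]' = a$, and a jump $\delta_i = d\Delta_i$ in direction $d$ changes the mean of $y_{it}$ by $\Delta_i E[x_{it}]'d$:

```latex
\begin{aligned}
E[x_{it}'\,a\Delta_i] &= \Delta_i\,(1 \cdot 1 + E[z_{it}] \cdot 1) = 2\Delta_i, \\
E[x_{it}'\,b\Delta_i] &= \Delta_i\,(1 \cdot 1 - E[z_{it}] \cdot 1) = 0 .
\end{aligned}
```

In direction $b$ the intercept and slope shifts offset each other on average, so the residuals from a misplaced break have no systematic drift for a CUSUM of residuals to accumulate.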
We next investigate the power properties of the test under H2A, and the results for various
change points location and ratios of units among groups are reported in Table 4.8. The test
can successfully reject the null of one common break for large N and is less powerful for small
N.
We also examine the empirical power under H3A that the change point varies across
individuals without group pattern. The change point for individual j is set to k0j = [Tτ 0j ],
j = 1, 2, · · · , N , while the break fraction τ0j is drawn from U(0.15, 0.75). Table 4.9 shows
that the test can successfully reject the null hypothesis of common break for large N.
In summary, the size of the proposed test is controlled for large N and T. In addition, the
test exhibits monotonic power as the magnitude of break increases and is powerful against
various alternatives.
4.6 Conclusion
In this chapter, we developed a new test based on OLS residuals to detect whether the
structural breaks across individuals occur at a common location in panel data models.
The asymptotic properties of the test were investigated under the null and the alternative. The
simulation results indicated that the test can successfully reject the null hypothesis of a
common break when the break dates differ across units.
Appendix D
Proof of Theorem 4.1
Supposing that the structural change occurred at a common location, Baltagi et al. (2016)
showed the consistency of the common break estimator,
$$\lim_{(N,T) \to \infty} P(\hat{k} = k^0) = 1, \quad \text{which implies } |\hat{k} - k^0| = o_p(1). \tag{D.1}$$
In this Appendix, we derive the asymptotic distribution of the test statistic under the null
hypothesis by using this consistency property. We first focus on the limiting properties of
the numerator of the statistic. Model (4.2) with the true common break $k^0$ is written as
$$\begin{aligned}
Y_i &= X_i(k^0) b_i^0 + u_i \\
&= X_i(\hat{k}) b_i^0 + u_i + [X_i(k^0) - X_i(\hat{k})] b_i^0 \\
&= X_i(\hat{k}) b_i^0 + u_i + [Z_i(k^0) - Z_i(\hat{k})] \delta_i^0,
\end{aligned} \tag{D.2}$$
where $b_i^0 = [\beta_i^{0\prime}, \delta_i^{0\prime}]'$. Replacing $Y_i$ by (D.2), the residuals in (4.4) can be rewritten as
$$\begin{aligned}
\hat{u}_i &= X_i(\hat{k}) b_i^0 + u_i + [Z_i(k^0) - Z_i(\hat{k})] \delta_i^0 - X_i(\hat{k}) \hat{b}_i(\hat{k}) \\
&= u_i - X_i(\hat{k}) [\hat{b}_i(\hat{k}) - b_i^0] + [Z_i(k^0) - Z_i(\hat{k})] \delta_i^0 \\
&= u_i - X_i [\hat{\beta}_i(\hat{k}) - \beta_i^0] - Z_i(\hat{k}) [\hat{\delta}_i(\hat{k}) - \delta_i^0] + [Z_i(k^0) - Z_i(\hat{k})] \delta_i^0,
\end{aligned} \tag{D.3}$$
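whose vector form is represented by
$$\begin{bmatrix} \hat{u}_{i1} \\ \hat{u}_{i2} \\ \vdots \\ \hat{u}_{iT} \end{bmatrix}
= \begin{bmatrix} u_{i1} \\ u_{i2} \\ \vdots \\ u_{iT} \end{bmatrix}
- \begin{bmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT}' \end{bmatrix} (\hat{\beta}_i(\hat{k}) - \beta_i^0)
- \begin{bmatrix} 0 \\ \vdots \\ 0 \\ x_{i(\hat{k}+1)}' \\ \vdots \\ x_{iT}' \end{bmatrix} (\hat{\delta}_i(\hat{k}) - \delta_i^0)
+ \left( \begin{bmatrix} 0 \\ \vdots \\ 0 \\ x_{i(k^0+1)}' \\ \vdots \\ x_{iT}' \end{bmatrix}
- \begin{bmatrix} 0 \\ \vdots \\ 0 \\ x_{i(\hat{k}+1)}' \\ \vdots \\ x_{iT}' \end{bmatrix} \right) \delta_i^0.$$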
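For simplicity, $\hat{k}$ is suppressed in $\hat{\beta}_i(\hat{k})$ and $\hat{\delta}_i(\hat{k})$. Then, the cumulative sum of
the residuals is
$$\begin{aligned}
\frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{k} \hat{u}_{it}
&= \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{k} u_{it}
- \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{k} x_{it}'(\hat{\beta}_i - \beta_i^0)
- \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=\hat{k}+1}^{k} x_{it}'(\hat{\delta}_i - \delta_i^0) 1\{k > \hat{k}\} \\
&\quad + \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=k^0+1}^{k} x_{it}'\delta_i^0 1\{k^0 < k \le \hat{k}\}
+ \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=k^0+1}^{\hat{k}} x_{it}'\delta_i^0 1\{k^0 < \hat{k} < k\} \\
&\quad - \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=\hat{k}+1}^{k} x_{it}'\delta_i^0 1\{\hat{k} < k \le k^0\}
- \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=\hat{k}+1}^{k^0} x_{it}'\delta_i^0 1\{\hat{k} < k^0 < k\} \\
&= U_1 - U_2 - U_3 + U_4 + U_5 - U_6 - U_7.
\end{aligned} \tag{D.4}$$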
We can show that the terms $U_4$, $U_5$, $U_6$, and $U_7$ are negligible as $N, T \to \infty$. Since k is bounded
by $k^0$ and $\hat{k}$ in $U_4$, using the convergence property (D.1),
$$U_4 = \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=k^0+1}^{k} x_{it}'\delta_i^0 1\{k^0 < k \le \hat{k}\} = \sqrt{\frac{N}{T}}\, o_p(1) = o_p\left( \sqrt{\frac{N}{T}} \right). \tag{D.5}$$
Similarly, it is shown that the terms $U_5$, $U_6$, and $U_7$ are of order $o_p\left( \sqrt{N/T} \right)$, which
vanishes since $N/T \to 0$ by Assumption 4.2(iv). The asymptotic distributions of the dominating
terms $U_1$, $U_2$, and $U_3$ are derived in Lemma D.1.
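Lemma D.1 Suppose that Assumptions 4.1-4.5 hold. We have, uniformly in $\tau \in (0, 1)$,
(i) $\dfrac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{k} u_{it} \Rightarrow \sigma W(\tau)$,
(ii) $\dfrac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{k} x_{it}'(\hat{\beta}_i - \beta_i^0) \Rightarrow \sigma \tau \dfrac{W(\tau^0)}{\tau^0}$,
(iii) $\dfrac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=\hat{k}+1}^{k} x_{it}'(\hat{\delta}_i - \delta_i^0) 1\{k > \hat{k}\} \Rightarrow \sigma (\tau - \tau^0) \left[ \dfrac{W(1) - W(\tau^0)}{1 - \tau^0} - \dfrac{W(\tau^0)}{\tau^0} \right] 1\{\tau > \tau^0\}$,
where $k = [T\tau]$, $k^0 = [T\tau^0]$, $W(\cdot)$ is a standard Brownian motion, and the long-run variance is
$\sigma^2 = \lim_{(N,T) \to \infty} E\left( \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{T} u_{it} \right)^2$.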
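Proof of Lemma D.1. (i) Denote the process
$$X_{N,T}\left( \frac{k}{T} \right) = \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{k} u_{it}.$$
It is shown that, for a particular $\tau$,
$$\frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{[T\tau]} u_{it} \xrightarrow{d} \sigma W(\tau),$$
as $N, T \to \infty$. It remains to show that the weak convergence holds uniformly in $\tau \in (0, 1)$.
To this end, by Theorem 12.3 of Billingsley (1968), we next show that the moment condition (D.8)
is satisfied, so that the process $X_{N,T}(\tau)$ is tight. Applying Rosenthal's inequality, we have
$$\begin{aligned}
E\left| X_{N,T}\left( \frac{l}{T} \right) - X_{N,T}\left( \frac{k}{T} \right) \right|^{2\gamma}
&= E\left| \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=k+1}^{l} u_{it} \right|^{2\gamma} \\
&\le c_1 \sum_{i=1}^{N} E\left| \frac{1}{\sqrt{NT}} \sum_{t=k+1}^{l} u_{it} \right|^{2\gamma}
+ c_2 \left[ \frac{1}{N} \sum_{i=1}^{N} E\left( \frac{1}{\sqrt{T}} \sum_{t=k+1}^{l} u_{it} \right)^2 \right]^{\gamma} \\
&\le c_1 \sum_{i=1}^{N} E\left| \frac{1}{\sqrt{NT}} \sum_{t=k+1}^{l} u_{it} \right|^{2\gamma}
+ c_3 \left( \frac{l - k}{T} \right)^{\gamma},
\end{aligned} \tag{D.6}$$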
with some constants $c_1$, $c_2$, and $c_3$. According to Phillips and Solo (1992) and p.637 of Horvath
and Huskova (2012), the partial sum of $u_{it}$ is composed of two parts,
$$\sum_{t=1}^{k} u_{it} = a_i \sum_{t=1}^{k} \epsilon_{it} + \eta_{ik},$$
where $\eta_{ik} = e_{i0}^* - e_{ik}^*$, $e_{it}^* = \sum_{l=1}^{\infty} c_{il}^* \epsilon_{i(t-l)}$, and $c_{il}^* = \sum_{k=l+1}^{\infty} c_{ik}$. For the term $\eta_{ik}$, Horvath and
Huskova (2012, p.640) showed that $E|\eta_{ik}|^{\gamma} \le c E|\epsilon_{i0}|^{\gamma}$. Then, using Minkowski's inequality
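and Rosenthal's inequality, it is shown that, for $\gamma > 1$,
$$\begin{aligned}
E\left| \sum_{t=k+1}^{l} u_{it} \right|^{2\gamma}
&= E\left| a_i \sum_{t=k+1}^{l} \epsilon_{it} + \eta_{il} - \eta_{ik} \right|^{2\gamma} \\
&\le \left\{ \left( E\left| a_i \sum_{t=k+1}^{l} \epsilon_{it} \right|^{2\gamma} \right)^{\frac{1}{2\gamma}} + \left( E|\eta_{il} - \eta_{ik}|^{2\gamma} \right)^{\frac{1}{2\gamma}} \right\}^{2\gamma} \\
&\le \left\{ \left[ c_4 \sum_{t=k+1}^{l} E|\epsilon_{it}|^{2\gamma} + c_5 \left( \sum_{t=k+1}^{l} E(\epsilon_{it})^2 \right)^{\gamma} \right]^{\frac{1}{2\gamma}} + \left( E|\eta_{il} - \eta_{ik}|^{2\gamma} \right)^{\frac{1}{2\gamma}} \right\}^{2\gamma} \\
&\le \left\{ \left[ c_4 (l - k) E|\epsilon_{i0}|^{2\gamma} + c_5 (l - k)^{\gamma} \left( E(\epsilon_{i0})^2 \right)^{\gamma} \right]^{\frac{1}{2\gamma}} + \left( E|\eta_{il} - \eta_{ik}|^{2\gamma} \right)^{\frac{1}{2\gamma}} \right\}^{2\gamma} \\
&\le \left\{ \left[ c_6 (l - k)^{\gamma} E|\epsilon_{i0}|^{2\gamma} \right]^{\frac{1}{2\gamma}} + \left( E|\epsilon_{i0}|^{2\gamma} \right)^{\frac{1}{2\gamma}} \right\}^{2\gamma} \\
&\le c_7 (l - k)^{\gamma} E|\epsilon_{i0}|^{2\gamma},
\end{aligned}$$
with some constants $c_4$-$c_7$. Then, we have, for $\gamma > 1$,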
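$$\begin{aligned}
\sum_{i=1}^{N} E\left| \frac{1}{\sqrt{NT}} \sum_{t=k+1}^{l} u_{it} \right|^{2\gamma}
&= \frac{1}{(NT)^{\gamma}} \sum_{i=1}^{N} E\left| \sum_{t=k+1}^{l} u_{it} \right|^{2\gamma} \\
&\le \frac{1}{(NT)^{\gamma}} \sum_{i=1}^{N} c_7 (l - k)^{\gamma} E|\epsilon_{i0}|^{2\gamma} \\
&\le c_7 \left( \frac{l - k}{T} \right)^{\gamma} \frac{1}{N} \sum_{i=1}^{N} E|\epsilon_{i0}|^{2\gamma} \\
&\le c_8 \left( \frac{l - k}{T} \right)^{\gamma},
\end{aligned} \tag{D.7}$$
with a constant $c_8$. Combining (D.6) and (D.7), we can show that there exists a constant
$\gamma > 1$ such that
$$E\left| X_{N,T}\left( \frac{l}{T} \right) - X_{N,T}\left( \frac{k}{T} \right) \right|^{2\gamma} \le c_8 \left( \frac{l - k}{T} \right)^{\gamma}. \tag{D.8}$$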
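(ii) By regressing $Y_i$ on $X_i(\hat{k})$, the coefficient $\beta_i$ is estimated as, if $\hat{k} \le k^0$,
$$\hat{\beta}_i = \left( \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \sum_{t=1}^{\hat{k}} x_{it}y_{it}
= \left( \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \sum_{t=1}^{\hat{k}} x_{it}(x_{it}'\beta_i^0 + u_{it})
= \beta_i^0 + \left( \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \sum_{t=1}^{\hat{k}} x_{it}u_{it},$$
and if $\hat{k} > k^0$,
$$\begin{aligned}
\hat{\beta}_i &= \left( \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \left[ \sum_{t=1}^{k^0} x_{it}(x_{it}'\beta_i^0 + u_{it}) + \sum_{t=k^0+1}^{\hat{k}} x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it}) \right] \\
&= \beta_i^0 + \left( \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \sum_{t=1}^{\hat{k}} x_{it}u_{it} + \left( \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \sum_{t=k^0+1}^{\hat{k}} x_{it}x_{it}'\delta_i^0.
\end{aligned} \tag{D.9}$$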
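Then, we can see that
$$\sqrt{T}(\hat{\beta}_i - \beta_i^0) = \left( \frac{1}{T} \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \frac{1}{\sqrt{T}} \sum_{t=1}^{\hat{k}} x_{it}u_{it}
+ \left( \frac{1}{T} \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \frac{1}{\sqrt{T}} \sum_{t=k^0+1}^{\hat{k}} x_{it}x_{it}'\delta_i^0 1\{\hat{k} > k^0\} \tag{D.10}$$
$$\begin{aligned}
&= \left( \frac{1}{T} \sum_{t=1}^{k^0} x_{it}x_{it}' \right)^{-1} \frac{1}{\sqrt{T}} \sum_{t=1}^{k^0} x_{it}u_{it}
+ \left[ \left( \frac{1}{T} \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} - \left( \frac{1}{T} \sum_{t=1}^{k^0} x_{it}x_{it}' \right)^{-1} \right] \frac{1}{\sqrt{T}} \sum_{t=1}^{\hat{k}} x_{it}u_{it} \\
&\quad + \left( \frac{1}{T} \sum_{t=1}^{k^0} x_{it}x_{it}' \right)^{-1} \left[ \frac{1}{\sqrt{T}} \sum_{t=1}^{\hat{k}} x_{it}u_{it} - \frac{1}{\sqrt{T}} \sum_{t=1}^{k^0} x_{it}u_{it} \right]
+ \left( \frac{1}{T} \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \frac{1}{\sqrt{T}} \sum_{t=k^0+1}^{\hat{k}} x_{it}x_{it}'\delta_i^0 1\{\hat{k} > k^0\} \\
&= \left( \frac{1}{T} \sum_{t=1}^{k^0} x_{it}x_{it}' \right)^{-1} \frac{1}{\sqrt{T}} \sum_{t=1}^{k^0} x_{it}u_{it}
+ o_p\left( \frac{1}{T} \right) O_p(1) + O_p(1)\, o_p\left( \frac{1}{\sqrt{T}} \right) + o_p\left( \frac{1}{\sqrt{T}} \right) 1\{\hat{k} > k^0\} \\
&= \left( \frac{1}{T} \sum_{t=1}^{k^0} x_{it}x_{it}' \right)^{-1} \frac{1}{\sqrt{T}} \sum_{t=1}^{k^0} x_{it}u_{it}
+ o_p\left( \frac{1}{T} \right) + o_p\left( \frac{1}{\sqrt{T}} \right),
\end{aligned} \tag{D.11}$$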
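where we replace $\hat{k}$ by $k^0$ using the consistency property (D.1) and the following orders:
$$\begin{aligned}
\left( \frac{1}{T} \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} - \left( \frac{1}{T} \sum_{t=1}^{k^0} x_{it}x_{it}' \right)^{-1}
&= \left( \frac{1}{T} \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \left[ \frac{1}{T} \sum_{t=1}^{k^0} x_{it}x_{it}' - \frac{1}{T} \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right] \left( \frac{1}{T} \sum_{t=1}^{k^0} x_{it}x_{it}' \right)^{-1} \\
&= O_p(1)\, o_p\left( \frac{1}{T} \right) O_p(1) = o_p\left( \frac{1}{T} \right),
\end{aligned}$$
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{\hat{k}} x_{it}u_{it} - \frac{1}{\sqrt{T}} \sum_{t=1}^{k^0} x_{it}u_{it} = o_p\left( \frac{1}{\sqrt{T}} \right),$$
and
$$\left( \frac{1}{T} \sum_{t=1}^{\hat{k}} x_{it}x_{it}' \right)^{-1} \frac{1}{T} \sum_{t=k^0+1}^{\hat{k}} x_{it}x_{it}'\delta_i^0 = O_p(1)\, o_p\left( \frac{1}{T} \right) = o_p\left( \frac{1}{T} \right).$$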
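Substituting (D.11) into the term $U_2$ in (D.4), we have,
\[
\begin{aligned}
\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k} x_{it}'(\hat\beta_i - \beta_i^0)
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{k} x_{it}'\sqrt T(\hat\beta_i - \beta_i^0) \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{k} x_{it}'\Big(\frac{1}{T}\sum_{t=1}^{k^0} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k^0} x_{it}u_{it}
+ \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{k} x_{it}'\Big(o_p\Big(\frac{1}{T}\Big) + o_p\Big(\frac{1}{\sqrt T}\Big)\Big) \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{k} x_{it}'\Big(\frac{1}{T}\sum_{t=1}^{k^0} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k^0} x_{it}u_{it}
+ o_p\Big(\frac{\sqrt N}{T}\Big) + o_p\Big(\sqrt{\frac{N}{T}}\Big),
\end{aligned}
\tag{D.12}
\]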
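where the second and third terms in the last equality vanish since $N/T \to 0$ by Assumption 4.2(iv). From Assumptions 4.4 (ii)-(iii), we can see that,
\[
\Big\| \frac{1}{k}\sum_{t=1}^{k} x_{it}' - c_{i1}' \Big\| = o_p(1), \quad\text{and}\quad
\Big\| \Big(\frac{1}{k}\sum_{t=1}^{k} x_{it}x_{it}'\Big)^{-1} - C_i^{-1} \Big\| = o_p(1). \tag{D.13}
\]
Using the orders in (D.13) and the equality $c_{i1}'C_i^{-1} = [1, 0, \cdots, 0]$, we have,
\[
\begin{aligned}
&\Big| \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{k} x_{it}'\Big(\frac{1}{T}\sum_{t=1}^{k^0} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k^0} x_{it}u_{it} - \frac{k}{k^0}\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k^0} u_{it} \Big| \\
&= \Big| \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{k} x_{it}'\Big(\frac{1}{T}\sum_{t=1}^{k^0} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k^0} x_{it}u_{it} - \frac{k}{k^0}\frac{1}{\sqrt N}\sum_{i=1}^{N} c_{i1}'C_i^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k^0} x_{it}u_{it} \Big| \\
&\le \frac{1}{\sqrt N}\sum_{i=1}^{N}\Big\| \frac{1}{T}\sum_{t=1}^{k} x_{it}'\Big(\frac{1}{T}\sum_{t=1}^{k^0} x_{it}x_{it}'\Big)^{-1} - \frac{k}{k^0}c_{i1}'C_i^{-1} \Big\| \, \Big\| \frac{1}{\sqrt T}\sum_{t=1}^{k^0} x_{it}u_{it} \Big\| \\
&= \Big\| \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k^0} x_{it}u_{it} \Big\| \, o_p(1) = o_p(1).
\end{aligned}
\tag{D.14}
\]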
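The equality $c_{i1}'C_i^{-1} = [1, 0, \cdots, 0]$ used in (D.14) holds whenever the first regressor is a constant, because $c_{i1}$ is then the first column of $C_i$. The sketch below assumes $x_{it} = (1, z_{it})'$ with a scalar $z_{it}$ (an illustrative assumption, with hypothetical names) and checks the sample analogue, which holds exactly and not only in the limit.

```python
import random

def first_row_identity(z):
    """Return m' A^{-1}, where m = mean of x_t, A = mean of x_t x_t', and x_t = (1, z_t)'."""
    k = len(z)
    m1 = sum(z) / k                  # sample mean of z_t
    m2 = sum(v * v for v in z) / k   # sample second moment of z_t
    # A = [[1, m1], [m1, m2]]; invert the 2x2 matrix in closed form.
    det = m2 - m1 * m1
    a_inv = [[m2 / det, -m1 / det], [-m1 / det, 1.0 / det]]
    m = [1.0, m1]
    return [m[0] * a_inv[0][j] + m[1] * a_inv[1][j] for j in range(2)]

rng = random.Random(1)
z = [rng.gauss(2.0, 1.0) for _ in range(50)]
row = first_row_identity(z)
assert abs(row[0] - 1.0) < 1e-9 and abs(row[1]) < 1e-9
```

This is why only the partial sum of $u_{it}$, and not the full vector $x_{it}u_{it}$, survives in the second term on the left-hand side of (D.14).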
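Applying the functional central limit theorem (FCLT), we can see that
\[
\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k^0} u_{it} \Rightarrow \sigma W(\tau_0). \tag{D.15}
\]
Hence, we have, uniformly in $\tau$,
\[
\frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{k} x_{it}'\Big(\frac{1}{T}\sum_{t=1}^{k^0} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k^0} x_{it}u_{it} \Rightarrow \sigma\tau\frac{W(\tau_0)}{\tau_0}. \tag{D.16}
\]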
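(iii) The coefficient $\delta_i$ is estimated as, if $\hat k < k^0$,
\[
\begin{aligned}
\hat\delta_i &= \Big(\sum_{t=\hat k+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat k+1}^{T} x_{it}y_{it} - \Big(\sum_{t=1}^{\hat k} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{\hat k} x_{it}y_{it} \\
&= \Big(\sum_{t=\hat k+1}^{T} x_{it}x_{it}'\Big)^{-1}\Big[\sum_{t=\hat k+1}^{k^0} x_{it}(x_{it}'\beta_i^0 + u_{it}) + \sum_{t=k^0+1}^{T} x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it})\Big]
- \Big[\beta_i^0 + \Big(\sum_{t=1}^{\hat k} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{\hat k} x_{it}u_{it}\Big] \\
&= \Big(\sum_{t=\hat k+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat k+1}^{T} x_{it}u_{it} - \Big(\sum_{t=1}^{\hat k} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{\hat k} x_{it}u_{it} + \Big(\sum_{t=\hat k+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=k^0+1}^{T} x_{it}x_{it}'\delta_i^0 \\
&= \Big(\sum_{t=\hat k+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat k+1}^{T} x_{it}u_{it} - \Big(\sum_{t=1}^{\hat k} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{\hat k} x_{it}u_{it} + \delta_i^0 - \Big(\sum_{t=\hat k+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat k+1}^{k^0} x_{it}x_{it}'\delta_i^0,
\end{aligned}
\]
and if $\hat k \ge k^0$,
\[
\begin{aligned}
\hat\delta_i &= \Big(\sum_{t=\hat k+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat k+1}^{T} x_{it}y_{it} - \Big(\sum_{t=1}^{\hat k} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{\hat k} x_{it}y_{it} \\
&= \Big(\sum_{t=\hat k+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat k+1}^{T} x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it}) \\
&\quad - \Big(\sum_{t=1}^{\hat k} x_{it}x_{it}'\Big)^{-1}\Big[\sum_{t=1}^{k^0} x_{it}(x_{it}'\beta_i^0 + u_{it}) + \sum_{t=k^0+1}^{\hat k} x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it})\Big] \\
&= \delta_i^0 + \Big(\sum_{t=\hat k+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat k+1}^{T} x_{it}u_{it} - \Big(\sum_{t=1}^{\hat k} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{\hat k} x_{it}u_{it} - \Big(\sum_{t=1}^{\hat k} x_{it}x_{it}'\Big)^{-1}\sum_{t=k^0+1}^{\hat k} x_{it}x_{it}'\delta_i^0.
\end{aligned}
\]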
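Using the consistency property (D.1), the term $\sum_{t=\hat k+1}^{k^0} x_{it}x_{it}'\delta_i^0 = o_p(1)$ is negligible and we can see that,
\[
\sqrt T(\hat\delta_i - \delta_i^0) = \Big(\frac{1}{T}\sum_{t=\hat k+1}^{T} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=\hat k+1}^{T} x_{it}u_{it} - \Big(\frac{1}{T}\sum_{t=1}^{\hat k} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{\hat k} x_{it}u_{it} + o_p\Big(\frac{1}{\sqrt T}\Big). \tag{D.17}
\]
Similarly to the proof of (ii), $\hat k$ in (D.17) can be replaced by $k^0$ due to the consistency $\hat k \xrightarrow{p} k^0$. Then, (D.17) is transformed into
\[
\sqrt T(\hat\delta_i - \delta_i^0) = \Big(\frac{1}{T}\sum_{t=k^0+1}^{T} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=k^0+1}^{T} x_{it}u_{it} - \Big(\frac{1}{T}\sum_{t=1}^{k^0} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k^0} x_{it}u_{it} + o_p\Big(\frac{1}{T}\Big) + o_p\Big(\frac{1}{\sqrt T}\Big). \tag{D.18}
\]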
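Thus, we have,
\[
\begin{aligned}
\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{k} x_{it}'(\hat\delta_i - \delta_i^0)
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=\hat k+1}^{k} x_{it}'\sqrt T(\hat\delta_i - \delta_i^0) \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\Big[\frac{1}{T}\sum_{t=k^0+1}^{k} x_{it}'1_{k>k^0} + o_p\Big(\frac{1}{T}\Big)\Big]\sqrt T(\hat\delta_i - \delta_i^0) \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=k^0+1}^{k} x_{it}'1_{k>k^0}\sqrt T(\hat\delta_i - \delta_i^0) + o_p\Big(\frac{\sqrt N}{T}\Big)O_p(1) \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=k^0+1}^{k} x_{it}'1_{k>k^0}\Big[\Big(\frac{1}{T}\sum_{t=k^0+1}^{T} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=k^0+1}^{T} x_{it}u_{it} \\
&\qquad - \Big(\frac{1}{T}\sum_{t=1}^{k^0} x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k^0} x_{it}u_{it}\Big] + o_p\Big(\frac{\sqrt N}{T}\Big) + o_p\Big(\sqrt{\frac{N}{T}}\Big).
\end{aligned}
\tag{D.19}
\]
The terms in brackets in (D.19) dominate the others. Similarly to (D.14)-(D.16), we can see that, uniformly in $\tau \in (0, 1)$,
\[
U_3 \Rightarrow \sigma(\tau - \tau_0)\Big[\frac{W(1) - W(\tau_0)}{1 - \tau_0} - \frac{W(\tau_0)}{\tau_0}\Big]1_{\tau>\tau_0}.
\]
Thus, we complete the proof of Lemma D.1.

Using (D.4), (D.5) and Lemma D.1, we can show that,
\[
\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{[T\tau]} \hat u_{it} \Rightarrow \sigma W(\tau) - \sigma\tau\frac{W(\tau_0)}{\tau_0} - \sigma(\tau - \tau_0)\Big[\frac{W(1) - W(\tau_0)}{1 - \tau_0} - \frac{W(\tau_0)}{\tau_0}\Big]1_{\tau>\tau_0},
\]
uniformly in $\tau$. Applying the continuous mapping theorem, we obtain,
\[
\sup_{\tau\in\Omega(\epsilon)}\Big|\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{[T\tau]} \hat u_{it}\Big|^2 \Rightarrow \sup_{\tau\in\Omega(\epsilon)}\sigma^2\Big|W(\tau) - \tau\frac{W(\tau_0)}{\tau_0} - (\tau - \tau_0)\Big[\frac{W(1) - W(\tau_0)}{1 - \tau_0} - \frac{W(\tau_0)}{\tau_0}\Big]1_{\tau>\tau_0}\Big|^2. \tag{D.20}
\]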
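To make the shape of the limit in (D.20) concrete: for $\tau \le \tau_0$ the process inside the absolute value is $W(\tau) - (\tau/\tau_0)W(\tau_0)$, and for $\tau > \tau_0$ it simplifies to $W(\tau) - W(\tau_0) - (\tau-\tau_0)(W(1)-W(\tau_0))/(1-\tau_0)$, so it is tied down to zero at $0$, $\tau_0$ and $1$. The following is a minimal sketch on a discretized path, with $\sigma = 1$, hypothetical function names, and the supremum taken over the full grid rather than the trimmed set $\Omega(\epsilon)$ purely for illustration.

```python
import math
import random

def limit_process(W, n, j0):
    """Discretised version of the process inside (D.20), divided by sigma.

    W is a list with W[j] approximating W(j/n) and W[0] = 0; j0/n plays the role of tau_0.
    """
    tau0 = j0 / n
    slope_pre = W[j0] / tau0
    slope_post = (W[n] - W[j0]) / (1.0 - tau0)
    G = []
    for j in range(n + 1):
        tau = j / n
        g = W[j] - tau * slope_pre
        if tau > tau0:
            g -= (tau - tau0) * (slope_post - slope_pre)
        G.append(g)
    return G

rng = random.Random(2)
n, j0 = 200, 80
W = [0.0]
for _ in range(n):
    W.append(W[-1] + rng.gauss(0.0, math.sqrt(1.0 / n)))

G = limit_process(W, n, j0)
sup_stat = max(g * g for g in G)   # sup over the whole grid, for illustration only
assert abs(G[0]) < 1e-9 and abs(G[j0]) < 1e-9 and abs(G[n]) < 1e-9
```

The two pieces behave like Brownian bridges on $[0, \tau_0]$ and $[\tau_0, 1]$, which is the same structure that reappears in the normalization terms of Lemma D.2.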
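We next derive the asymptotic distribution of the normalization process under the null hypothesis. By definition (4.10), the normalization factor is based on the residuals $\hat u_{it}$, which are calculated by regressing $Y_i$ on $X_i(k_1, \hat k, k_2)$ in (4.8). We assume that $[T\epsilon] \le k_1 \le \hat k - [T\epsilon]$ and $\hat k + [T\epsilon] \le k_2 \le [T(1-\epsilon)]$, so that $k_1, k_2$ are bounded away from the endpoints and from the common break estimate $\hat k$. Since $\hat k$ converges in probability to $k^0$, we have $|k_j - \hat k| > |k^0 - \hat k|$ for $j = 1, 2$. Thus, we only consider the case where $k_1$ and $k_2$ satisfy $k_1 < k^0 < k_2$. In this case, the true model with common break $k^0$ is written as
\[
\begin{aligned}
Y_i &= [X_i, X_{1i}(k_1, k^0), X_{2i}(k^0, k_2), X_{3i}(k_2)]
\begin{pmatrix} \beta_i^0 \\ 0 \\ \delta_i^0 \\ \delta_i^0 \end{pmatrix} + u_i \\
&= X_i(k_1, k^0, k_2)b_{1i}^0 + u_i \\
&= X_i(k_1, \hat k, k_2)b_{1i}^0 + u_i + [X_i(k_1, k^0, k_2) - X_i(k_1, \hat k, k_2)]b_{1i}^0.
\end{aligned}
\tag{D.21}
\]
The residuals are calculated by
\[
\begin{aligned}
\hat u_i &= X_i(k_1, \hat k, k_2)b_{1i}^0 + u_i + [X_i(k_1, k^0, k_2) - X_i(k_1, \hat k, k_2)]b_{1i}^0 - X_i(k_1, \hat k, k_2)\hat b_{1i}(\hat k) \\
&= u_i + X_i(k_1, \hat k, k_2)b_{1i}^0 - X_i(k_1, \hat k, k_2)\hat b_{1i}(\hat k)
+ [0, X_{1i}(k_1, k^0) - X_{1i}(k_1, \hat k), X_{2i}(k^0, k_2) - X_{2i}(\hat k, k_2), 0]
\begin{pmatrix} \beta_i^0 \\ 0 \\ \delta_i^0 \\ \delta_i^0 \end{pmatrix} \\
&= u_i - X_i(k_1, \hat k, k_2)(\hat b_{1i}(\hat k) - b_{1i}^0) + [X_{2i}(k^0, k_2) - X_{2i}(\hat k, k_2)]\delta_i^0,
\end{aligned}
\]
whose vector form is
\[
\begin{aligned}
\begin{pmatrix} \hat u_{i1} \\ \hat u_{i2} \\ \vdots \\ \hat u_{iT} \end{pmatrix}
&= \begin{pmatrix} u_{i1} \\ u_{i2} \\ \vdots \\ u_{iT} \end{pmatrix}
- \begin{pmatrix} x_{i1}' \\ x_{i2}' \\ \vdots \\ x_{iT}' \end{pmatrix}(\hat\beta_i(\hat k) - \beta_i^0)
- \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_{i(k_1+1)}' \\ \vdots \\ x_{i\hat k}' \\ 0 \\ \vdots \\ 0 \end{pmatrix}\hat\delta_{1i}(\hat k)
- \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_{i(\hat k+1)}' \\ \vdots \\ x_{ik_2}' \\ 0 \\ \vdots \\ 0 \end{pmatrix}(\hat\delta_{2i}(\hat k) - \delta_i^0) \\
&\quad - \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_{i(k_2+1)}' \\ \vdots \\ x_{iT}' \end{pmatrix}(\hat\delta_{3i}(\hat k) - \delta_i^0)
+ \left[ \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_{i(k^0+1)}' \\ \vdots \\ x_{ik_2}' \\ 0 \\ \vdots \\ 0 \end{pmatrix}
- \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_{i(\hat k+1)}' \\ \vdots \\ x_{ik_2}' \\ 0 \\ \vdots \\ 0 \end{pmatrix} \right]\delta_i^0.
\end{aligned}
\tag{D.22}
\]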
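For simplicity, $\hat k$ is suppressed in $\hat\beta_i(\hat k)$, $\hat\delta_{1i}(\hat k)$, $\hat\delta_{2i}(\hat k)$ and $\hat\delta_{3i}(\hat k)$. Then, the normalization factor is constructed from four terms $V_1$, $V_2$, $V_3$, and $V_4$, which are defined by
\[
\begin{aligned}
V_1 &= \frac{1}{T}\sum_{s=1}^{k_1}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{s}\hat u_{it}\Big)^2, &
V_2 &= \frac{1}{T}\sum_{s=k_1+1}^{\hat k}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{\hat k}\hat u_{it}\Big)^2, \\
V_3 &= \frac{1}{T}\sum_{s=\hat k+1}^{k_2}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{s}\hat u_{it}\Big)^2, &
V_4 &= \frac{1}{T}\sum_{s=k_2+1}^{T}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}\hat u_{it}\Big)^2.
\end{aligned}
\]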
Lemma D.2 derives the asymptotic distributions of four terms under the null hypothesis.
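The four terms are forward or backward cumulative sums of residuals within each of the four segments. A minimal sketch of their computation, assuming the residuals are already available as an $N \times T$ list of lists and that $k_1 < \hat k < k_2 < T$ (function and argument names are hypothetical, not from the dissertation):

```python
import math

def normalization_terms(u_hat, k1, k_hat, k2):
    """Compute V1..V4 from an N x T list of residuals (time index t = 1..T).

    Follows the four definitions given above; k1 < k_hat < k2 < T is assumed.
    """
    N, T = len(u_hat), len(u_hat[0])
    scale = 1.0 / math.sqrt(N * T)

    def cusum(a, b):
        # Scaled cross-sectional sum of residuals over t = a..b (inclusive).
        return scale * sum(sum(row[a - 1:b]) for row in u_hat)

    V1 = sum(cusum(1, s) ** 2 for s in range(1, k1 + 1)) / T
    V2 = sum(cusum(s, k_hat) ** 2 for s in range(k1 + 1, k_hat + 1)) / T
    V3 = sum(cusum(k_hat + 1, s) ** 2 for s in range(k_hat + 1, k2 + 1)) / T
    V4 = sum(cusum(s, T) ** 2 for s in range(k2 + 1, T + 1)) / T
    return V1, V2, V3, V4

# Tiny hand-checkable example: one unit, four periods.
V = normalization_terms([[1.0, -1.0, 2.0, -2.0]], k1=1, k_hat=2, k2=3)
assert all(abs(a - b) < 1e-12 for a, b in zip(V, (0.0625, 0.0625, 0.25, 0.25)))
```

How the four terms are combined into the normalization factor is governed by definition (4.10), which is not reproduced in this sketch.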
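Lemma D.2 Suppose that Assumptions 4.1-4.5 hold. We have, as $N, T \to \infty$,
\[
\begin{aligned}
&\text{(i)}\quad V_1 \Rightarrow \sigma^2\int_0^{\tau_1}\Big(W(r) - \frac{r}{\tau_1}W(\tau_1)\Big)^2 dr, \\
&\text{(ii)}\quad V_2 \Rightarrow \sigma^2\int_{\tau_1}^{\tau_0}\Big[W(\tau_0) - W(r) - \frac{\tau_0 - r}{\tau_0 - \tau_1}(W(\tau_0) - W(\tau_1))\Big]^2 dr, \\
&\text{(iii)}\quad V_3 \Rightarrow \sigma^2\int_{\tau_0}^{\tau_2}\Big[W(r) - W(\tau_0) - \frac{r - \tau_0}{\tau_2 - \tau_0}(W(\tau_2) - W(\tau_0))\Big]^2 dr, \\
&\text{(iv)}\quad V_4 \Rightarrow \sigma^2\int_{\tau_2}^{1}\Big[W(1) - W(r) - \frac{1 - r}{1 - \tau_2}(W(1) - W(\tau_2))\Big]^2 dr.
\end{aligned}
\]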
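Proof of Lemma D.2. (i) Using (D.22), the first term $V_1$ can be rewritten as,
\[
V_1 = \frac{1}{T}\sum_{s=1}^{k_1}\Big\{\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\Big[\sum_{t=1}^{s}u_{it} - \sum_{t=1}^{s}x_{it}'(\hat\beta_i - \beta_i^0)\Big]\Big\}^2 = \frac{1}{T}\sum_{s=1}^{k_1}(V_{11} - V_{12})^2.
\]
Using the FCLT, it is shown that
\[
V_{11} \Rightarrow \sigma W(r). \tag{D.23}
\]
By the definition of $\hat\beta_i$, we can see that,
\[
\hat\beta_i = \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}y_{it} = \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}(x_{it}'\beta_i^0 + u_{it}) = \beta_i^0 + \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it},
\]
thus, we have,
\[
V_{12} = \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{s}x_{it}'\Big(\frac{1}{T}\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k_1}x_{it}u_{it} \Rightarrow \sigma\frac{r}{\tau_1}W(\tau_1). \tag{D.24}
\]
Combining the results (D.23) and (D.24) and using the continuous mapping theorem, we can derive the asymptotic distribution of the first term $V_1$ as follows:
\[
V_1 \Rightarrow \sigma^2\int_0^{\tau_1}\Big(W(r) - \frac{r}{\tau_1}W(\tau_1)\Big)^2 dr.
\]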
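The limit of $V_1$ is the integrated square of a Brownian bridge on $[0, \tau_1]$. Since that bridge has variance $r(\tau_1 - r)/\tau_1$ at $r$, the expected value of the integral is $\tau_1^2/6$ when $\sigma = 1$. The Monte Carlo sketch below checks this on a grid; the grid size, number of replications, seed and tolerance are illustrative choices, not values from the dissertation.

```python
import math
import random

def bridge_integral(tau1, n, rng):
    """One draw of a Riemann-sum approximation to the integral in Lemma D.2(i), sigma = 1."""
    m = int(round(tau1 * n))          # number of grid points in (0, tau1]
    dt = 1.0 / n
    W = [0.0]
    for _ in range(m):
        W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))
    total = 0.0
    for j in range(1, m + 1):
        r = j * dt
        total += (W[j] - (r / tau1) * W[m]) ** 2 * dt
    return total

rng = random.Random(3)
tau1, n, reps = 0.5, 200, 2000
mean_val = sum(bridge_integral(tau1, n, rng) for _ in range(reps)) / reps
# The integrand has expectation r (tau1 - r) / tau1, which integrates to tau1^2 / 6.
assert abs(mean_val - tau1 ** 2 / 6) < 0.01
```

The same bridge structure, shifted to the intervals $[\tau_1, \tau_0]$, $[\tau_0, \tau_2]$ and $[\tau_2, 1]$, underlies parts (ii)-(iv).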
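(ii) The second term $V_2$ can be rewritten as
\[
\begin{aligned}
V_2 &= \frac{1}{T}\sum_{s=k_1+1}^{\hat k}\Big\{\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\Big[\sum_{t=s}^{\hat k}u_{it} - \sum_{t=s}^{\hat k}x_{it}'(\hat\beta_i - \beta_i^0) - \sum_{t=s}^{\hat k}x_{it}'\hat\delta_{1i} + \sum_{t=s}^{\hat k}x_{it}'\delta_i^0 1_{k^0<s<\hat k}\Big]\Big\}^2 \\
&= \frac{1}{T}\sum_{s=k_1+1}^{\hat k}\Big[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{\hat k}u_{it} - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{\hat k}x_{it}'(\hat\beta_i - \beta_i^0) - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{\hat k}x_{it}'\hat\delta_{1i} + o_p\Big(\sqrt{\frac{N}{T}}\Big)\Big]^2 \\
&= \frac{1}{T}\sum_{s=k_1+1}^{\hat k}\Big(V_{21} - V_{22} - V_{23} + o_p\Big(\sqrt{\frac{N}{T}}\Big)\Big)^2.
\end{aligned}
\]
Since $\hat k$ coincides asymptotically with the true break date by (D.1), $\hat k$ in $V_{21}$, $V_{22}$, $V_{23}$ can be replaced by $k^0$. Then, we can show that
\[
V_{21} = \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{\hat k}u_{it} - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{s-1}u_{it} \Rightarrow \sigma(W(\tau_0) - W(r)), \tag{D.25}
\]
\[
\begin{aligned}
V_{22} &= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=s}^{\hat k}x_{it}'\sqrt T(\hat\beta_i - \beta_i^0) \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\Big[\frac{1}{T}\sum_{t=s}^{k^0}x_{it}' + o_p\Big(\frac{1}{T}\Big)\Big]\Big(\frac{1}{T}\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k_1}x_{it}u_{it}
\Rightarrow \sigma(\tau_0 - r)\frac{W(\tau_1)}{\tau_1}.
\end{aligned}
\tag{D.26}
\]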
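The coefficient estimator $\hat\delta_{1i}$ in $V_{23}$ can be calculated as,
\[
\begin{aligned}
\hat\delta_{1i} &= \Big(\sum_{t=k_1+1}^{\hat k}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_1+1}^{\hat k}x_{it}y_{it} - \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}y_{it} \\
&= \Big(\sum_{t=k_1+1}^{\hat k}x_{it}x_{it}'\Big)^{-1}\Big\{\sum_{t=k_1+1}^{\hat k}x_{it}(x_{it}'\beta_i^0 + u_{it})1_{\hat k\le k^0} + \Big[\sum_{t=k_1+1}^{k^0}x_{it}(x_{it}'\beta_i^0 + u_{it}) \\
&\qquad + \sum_{t=k^0+1}^{\hat k}x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it})\Big]1_{\hat k>k^0}\Big\} - \Big[\beta_i^0 + \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it}\Big] \\
&= \Big(\sum_{t=k_1+1}^{\hat k}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_1+1}^{\hat k}x_{it}u_{it} - \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it} + \Big(\sum_{t=k_1+1}^{\hat k}x_{it}x_{it}'\Big)^{-1}\sum_{t=k^0+1}^{\hat k}x_{it}x_{it}'\delta_i^0 1_{\hat k>k^0} \\
&= \Big(\sum_{t=k_1+1}^{\hat k}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_1+1}^{\hat k}x_{it}u_{it} - \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it} + o_p\Big(\frac{1}{T}\Big).
\end{aligned}
\]
Then, the third term $V_{23}$ becomes, as $N, T \to \infty$,
\[
\begin{aligned}
V_{23} &= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=s}^{\hat k}x_{it}'\sqrt T\hat\delta_{1i} \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\Big[\frac{1}{T}\sum_{t=s}^{k^0}x_{it}' + o_p\Big(\frac{1}{T}\Big)\Big]\Big[\sqrt T\Big(\sum_{t=k_1+1}^{k^0}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_1+1}^{k^0}x_{it}u_{it} \\
&\qquad - \sqrt T\Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it} + o_p\Big(\frac{1}{T}\Big) + o_p\Big(\frac{1}{\sqrt T}\Big) + o_p\Big(\frac{1}{\sqrt T}\Big)\Big] \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=s}^{k^0}x_{it}'\Big[\sqrt T\Big(\sum_{t=k_1+1}^{k^0}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_1+1}^{k^0}x_{it}u_{it} - \sqrt T\Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it}\Big] \\
&\qquad + o_p\Big(\frac{1}{T}\Big) + o_p\Big(\frac{\sqrt N}{\sqrt T}\Big) + o_p\Big(\frac{\sqrt N}{T}\Big) + o_p\Big(\frac{\sqrt N}{T^{3/2}}\Big) + o_p\Big(\frac{\sqrt N}{T^2}\Big) \\
&\Rightarrow \sigma(\tau_0 - r)\Big(\frac{W(\tau_0) - W(\tau_1)}{\tau_0 - \tau_1} - \frac{W(\tau_1)}{\tau_1}\Big),
\end{aligned}
\tag{D.27}
\]
since $N/T \to 0$. Combining results (D.25), (D.26), and (D.27), we have,
\[
\begin{aligned}
V_2 &\Rightarrow \int_{\tau_1}^{\tau_0}\Big[\sigma(W(\tau_0) - W(r)) - \sigma(\tau_0 - r)\frac{W(\tau_1)}{\tau_1} - \sigma(\tau_0 - r)\Big(\frac{W(\tau_0) - W(\tau_1)}{\tau_0 - \tau_1} - \frac{W(\tau_1)}{\tau_1}\Big)\Big]^2 dr \\
&= \sigma^2\int_{\tau_1}^{\tau_0}\Big(W(\tau_0) - W(r) - (\tau_0 - r)\frac{W(\tau_0) - W(\tau_1)}{\tau_0 - \tau_1}\Big)^2 dr.
\end{aligned}
\]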
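(iii) The third term $V_3$ can be rewritten as
\[
\begin{aligned}
V_3 &= \frac{1}{T}\sum_{s=\hat k+1}^{k_2}\Big\{\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\Big[\sum_{t=\hat k+1}^{s}u_{it} - \sum_{t=\hat k+1}^{s}x_{it}'(\hat\beta_i - \beta_i^0) - \sum_{t=\hat k+1}^{s}x_{it}'(\hat\delta_{2i} - \delta_i^0) \\
&\qquad - \sum_{t=\hat k+1}^{k^0}x_{it}'\delta_i^0 1_{\hat k<k^0<s} - \sum_{t=\hat k+1}^{s}x_{it}'\delta_i^0 1_{s\le k^0}\Big]\Big\}^2 \\
&= \frac{1}{T}\sum_{s=\hat k+1}^{k_2}\Big[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{s}u_{it} - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{s}x_{it}'(\hat\beta_i - \beta_i^0) - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat k+1}^{s}x_{it}'(\hat\delta_{2i} - \delta_i^0) - o_p\Big(\sqrt{\frac{N}{T}}\Big)\Big]^2 \\
&= \frac{1}{T}\sum_{s=\hat k+1}^{k_2}\Big(V_{31} - V_{32} - V_{33} - o_p\Big(\sqrt{\frac{N}{T}}\Big)\Big)^2.
\end{aligned}
\]
Similarly to (D.25) and (D.27), we can find that,
\[
V_{31} \Rightarrow \sigma(W(r) - W(\tau_0)), \tag{D.28}
\]
\[
V_{32} = \frac{1}{\sqrt N}\sum_{i=1}^{N}\Big[\frac{1}{T}\sum_{t=k^0+1}^{s}x_{it}' + o_p\Big(\frac{1}{T}\Big)\Big]\Big(\frac{1}{T}\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k_1}x_{it}u_{it} \Rightarrow \sigma(r - \tau_0)\frac{W(\tau_1)}{\tau_1}. \tag{D.29}
\]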
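The coefficient $\delta_{2i}$ is estimated as
\[
\begin{aligned}
\hat\delta_{2i} &= \Big(\sum_{t=\hat k+1}^{k_2}x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat k+1}^{k_2}x_{it}y_{it} - \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}y_{it} \\
&= \Big(\sum_{t=\hat k+1}^{k_2}x_{it}x_{it}'\Big)^{-1}\Big\{\sum_{t=\hat k+1}^{k_2}x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it})1_{k^0\le\hat k} + \Big[\sum_{t=\hat k+1}^{k^0}x_{it}(x_{it}'\beta_i^0 + u_{it}) \\
&\qquad + \sum_{t=k^0+1}^{k_2}x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it})\Big]1_{\hat k<k^0}\Big\} - \Big[\beta_i^0 + \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it}\Big] \\
&= \delta_i^0 + \Big(\sum_{t=\hat k+1}^{k_2}x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat k+1}^{k_2}x_{it}u_{it} - \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it} - o_p\Big(\frac{1}{T}\Big).
\end{aligned}
\]
Then, similarly to $V_{23}$, the term $V_{33}$ becomes, as $N, T \to \infty$,
\[
\begin{aligned}
V_{33} &= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=\hat k+1}^{s}x_{it}'\sqrt T(\hat\delta_{2i} - \delta_i^0) \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\Big[\frac{1}{T}\sum_{t=k^0+1}^{s}x_{it}' + o_p\Big(\frac{1}{T}\Big)\Big]\Big[\sqrt T\Big(\sum_{t=\hat k+1}^{k_2}x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat k+1}^{k_2}x_{it}u_{it} \\
&\qquad - \sqrt T\Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it} + o_p\Big(\frac{1}{T}\Big) + o_p\Big(\frac{1}{\sqrt T}\Big) + o_p\Big(\frac{1}{\sqrt T}\Big)\Big] \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=k^0+1}^{s}x_{it}'\Big[\sqrt T\Big(\sum_{t=k^0+1}^{k_2}x_{it}x_{it}'\Big)^{-1}\sum_{t=k^0+1}^{k_2}x_{it}u_{it} - \sqrt T\Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it}\Big] \\
&\qquad + o_p\Big(\frac{1}{T}\Big) + o_p\Big(\frac{\sqrt N}{\sqrt T}\Big) + o_p\Big(\frac{\sqrt N}{T}\Big) + o_p\Big(\frac{\sqrt N}{T^{3/2}}\Big) + o_p\Big(\frac{\sqrt N}{T^2}\Big) \\
&\Rightarrow \sigma(r - \tau_0)\Big(\frac{W(\tau_2) - W(\tau_0)}{\tau_2 - \tau_0} - \frac{W(\tau_1)}{\tau_1}\Big),
\end{aligned}
\tag{D.30}
\]
since $N/T \to 0$. Combining results (D.28), (D.29), and (D.30), we have,
\[
V_3 \Rightarrow \sigma^2\int_{\tau_0}^{\tau_2}\Big(W(r) - W(\tau_0) - (r - \tau_0)\frac{W(\tau_2) - W(\tau_0)}{\tau_2 - \tau_0}\Big)^2 dr.
\]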
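(iv) The fourth term $V_4$ can be rewritten as
\[
\begin{aligned}
V_4 &= \frac{1}{T}\sum_{s=k_2+1}^{T}\Big\{\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\Big[\sum_{t=s}^{T}u_{it} - \sum_{t=s}^{T}x_{it}'(\hat\beta_i - \beta_i^0) - \sum_{t=s}^{T}x_{it}'(\hat\delta_{3i} - \delta_i^0)\Big]\Big\}^2 \\
&= \frac{1}{T}\sum_{s=k_2+1}^{T}\Big[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}u_{it} - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}x_{it}'(\hat\beta_i - \beta_i^0) - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}x_{it}'(\hat\delta_{3i} - \delta_i^0)\Big]^2 \\
&= \frac{1}{T}\sum_{s=k_2+1}^{T}(V_{41} - V_{42} - V_{43})^2.
\end{aligned}
\]
It is easily seen that
\[
V_{41} \Rightarrow \sigma(W(1) - W(r)), \tag{D.31}
\]
\[
V_{42} = \frac{1}{\sqrt N}\sum_{i=1}^{N}\Big(\frac{1}{T}\sum_{t=s}^{T}x_{it}'\Big)\Big(\frac{1}{T}\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^{k_1}x_{it}u_{it} \Rightarrow \sigma(1 - r)\frac{W(\tau_1)}{\tau_1}. \tag{D.32}
\]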
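The coefficient estimator $\hat\delta_{3i}$ can be written as,
\[
\begin{aligned}
\hat\delta_{3i} &= \Big(\sum_{t=k_2+1}^{T}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_2+1}^{T}x_{it}y_{it} - \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}y_{it} \\
&= \Big(\sum_{t=k_2+1}^{T}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_2+1}^{T}x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it}) - \Big[\beta_i^0 + \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it}\Big] \\
&= \delta_i^0 + \Big(\sum_{t=k_2+1}^{T}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_2+1}^{T}x_{it}u_{it} - \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it}.
\end{aligned}
\]
Then, it is shown that, as $N, T \to \infty$,
\[
\begin{aligned}
V_{43} &= \frac{1}{\sqrt N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=s}^{T}x_{it}'\sqrt T(\hat\delta_{3i} - \delta_i^0) \\
&= \frac{1}{\sqrt N}\sum_{i=1}^{N}\Big(\frac{1}{T}\sum_{t=s}^{T}x_{it}'\Big)\sqrt T\Big[\Big(\sum_{t=k_2+1}^{T}x_{it}x_{it}'\Big)^{-1}\sum_{t=k_2+1}^{T}x_{it}u_{it} - \Big(\sum_{t=1}^{k_1}x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1}x_{it}u_{it}\Big] \\
&\Rightarrow \sigma(1 - r)\Big(\frac{W(1) - W(\tau_2)}{1 - \tau_2} - \frac{W(\tau_1)}{\tau_1}\Big),
\end{aligned}
\tag{D.33}
\]
since $N/T \to 0$. From (D.31), (D.32), and (D.33), we have,
\[
V_4 \Rightarrow \sigma^2\int_{\tau_2}^{1}\Big[W(1) - W(r) - \frac{1 - r}{1 - \tau_2}(W(1) - W(\tau_2))\Big]^2 dr.
\]
The proof of Lemma D.2 is complete.

Proof of Theorem 4.1. Combining the asymptotic distributions in (D.20) and Lemma D.2, we can complete the proof.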
Proofs of Propositions 4.1-4.3 and Theorem 4.2
We first investigate the statistical properties of the common break estimator $\hat k$ under the alternative hypothesis in Proposition 4.1. The proof follows the proofs of Lemmas 1-2 and Theorem 1 in Baltagi et al. (2016). Suppose that there are two groups and that the individuals in the same group share a common break date. Denote these groups by $G_1 = \{i : \text{individuals in group 1 with common break } k_1^0\}$ and $G_2 = \{i : \text{individuals in group 2 with common break } k_2^0\}$. The model under the alternative can be specified by
\[
\begin{aligned}
y_{it} &= x_{it}'\beta_i^0 + x_{it}'\delta_{1i}^0 1_{(t>k_1^0)} + u_{it}, \quad t = 1, \cdots, T, \quad \text{for } i \in G_1, \\
y_{it} &= x_{it}'\beta_i^0 + x_{it}'\delta_{2i}^0 1_{(t>k_2^0)} + u_{it}, \quad t = 1, \cdots, T, \quad \text{for } i \in G_2.
\end{aligned}
\]
The vector form can be written as
\[
Y_i = [X_i, Z_i(k_j^0)]b_i^0 + u_i, \quad \text{for } i \in G_j,\ j = 1, 2.
\]
The common break point is estimated in (4.3) by minimizing the total sum of squared OLS residuals. Let $SSR_i$ denote the sum of squared residuals of the regression of $Y_i$ on $X_i$ (the no-break case). Using the equality on page 185 of Baltagi et al. (2016),
\[
SSR_i - SSR_i(k^*) = \hat\delta_i(k^*)'[Z_i(k^*)'M_iZ_i(k^*)]\hat\delta_i(k^*),
\]
estimation (4.3) can be transformed into
\[
\begin{aligned}
\hat k &= \arg\min_{1\le k^*\le T-1}\sum_{i=1}^{N}SSR_i(k^*)
= \arg\max_{1\le k^*\le T-1}\sum_{i=1}^{N}(SSR_i - SSR_i(k^*))
= \arg\max_{1\le k^*\le T-1}\sum_{i=1}^{N}SV_i(k^*) \\
&= \arg\max_{1\le k^*\le T-1}\Big\{\sum_{i\in G_1}(SV_i(k^*) - SV_i(k_1^0)) + \sum_{i\in G_2}(SV_i(k^*) - SV_i(k_1^0))\Big\},
\end{aligned}
\tag{D.34}
\]
where
\[
\begin{aligned}
M_i &= I - X_i(X_i'X_i)^{-1}X_i', \\
SV_i(k^*) &= \hat\delta_i(k^*)'[Z_i(k^*)'M_iZ_i(k^*)]\hat\delta_i(k^*), \\
SV_i(k_j^0) &= \hat\delta_i(k_j^0)'[Z_i(k_j^0)'M_iZ_i(k_j^0)]\hat\delta_i(k_j^0), \quad \text{for } j = 1, 2.
\end{aligned}
\]
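The identity $SSR_i - SSR_i(k^*) = \hat\delta_i(k^*)'[Z_i(k^*)'M_iZ_i(k^*)]\hat\delta_i(k^*)$ and the resulting equivalence of the arg-min and arg-max problems can be checked numerically. The sketch below does so for a single unit in the mean-shift special case $x_{it} = 1$ (an assumption made here for illustration), where $Z_i(k)'M_iZ_i(k) = k(T-k)/T$ and $\hat\delta_i(k)$ is the difference between post-break and pre-break sample means. Function names are hypothetical.

```python
import random

def ssr_no_break(y):
    """SSR from regressing y on a constant."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def ssr_with_break(y, k):
    """SSR from regressing y on a constant and a post-k shift dummy."""
    return ssr_no_break(y[:k]) + ssr_no_break(y[k:])

def sv(y, k):
    """delta_hat(k)' [Z(k)' M Z(k)] delta_hat(k) in the scalar case x_t = 1."""
    T = len(y)
    delta_hat = sum(y[k:]) / (T - k) - sum(y[:k]) / k
    zmz = k * (T - k) / T        # Z'MZ with M = I - 11'/T
    return zmz * delta_hat ** 2

rng = random.Random(4)
T, k0 = 60, 25
y = [(3.0 if t > k0 else 0.0) + rng.gauss(0.0, 1.0) for t in range(1, T + 1)]

ks = range(1, T)
assert all(abs(ssr_no_break(y) - ssr_with_break(y, k) - sv(y, k)) < 1e-8 for k in ks)
k_min_ssr = min(ks, key=lambda k: ssr_with_break(y, k))
k_max_sv = max(ks, key=lambda k: sv(y, k))
assert k_min_ssr == k_max_sv
```

In the panel setting the same identity is applied unit by unit and summed over $i$, and subtracting the constant $\sum_i SV_i(k_1^0)$ does not change the maximizer.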
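For individuals in group 1, we can see that the coefficient estimators are given by
\[
\begin{aligned}
\hat\delta_i(k^*) &= [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_iY_i, \\
\hat\delta_i(k_1^0) &= [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_iY_i.
\end{aligned}
\]
Replacing $Y_i$ by
\[
Y_i = X_i\beta_i^0 + Z_i(k_1^0)\delta_i^0 + u_i,
\]
we have,
\[
\begin{aligned}
\hat\delta_i(k^*) &= [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_i[X_i\beta_i^0 + Z_i(k_1^0)\delta_i^0 + u_i] \\
&= [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_iZ_i(k_1^0)\delta_i^0 + [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_iu_i, \\
\hat\delta_i(k_1^0) &= [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_i[X_i\beta_i^0 + Z_i(k_1^0)\delta_i^0 + u_i] \\
&= \delta_i^0 + [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_iu_i.
\end{aligned}
\]
Similarly, for individuals in group 2, by replacing $Y_i$ by
\[
Y_i = X_i\beta_i^0 + Z_i(k_2^0)\delta_i^0 + u_i,
\]
the coefficient estimators are rewritten as
\[
\begin{aligned}
\hat\delta_i(k^*) &= [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_i[X_i\beta_i^0 + Z_i(k_2^0)\delta_i^0 + u_i] \\
&= [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_iZ_i(k_2^0)\delta_i^0 + [Z_i(k^*)'M_iZ_i(k^*)]^{-1}Z_i(k^*)'M_iu_i, \\
\hat\delta_i(k_1^0) &= [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_i[X_i\beta_i^0 + Z_i(k_2^0)\delta_i^0 + u_i] \\
&= [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_iZ_i(k_2^0)\delta_i^0 + [Z_i(k_1^0)'M_iZ_i(k_1^0)]^{-1}Z_i(k_1^0)'M_iu_i.
\end{aligned}
\]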
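To simplify notation, we use $Z_i$, $Z_{1i}^0$, $Z_{2i}^0$ to replace $Z_i(k^*)$, $Z_i(k_1^0)$, $Z_i(k_2^0)$. For individuals in group 1, we have
\[
SV_i(k^*) = \delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{1i}^0\delta_i^0 + 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i + u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i, \tag{D.35}
\]
\[
SV_i(k_1^0) = \delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_{1i}^0\delta_i^0 + 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iu_i + u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i. \tag{D.36}
\]
Using (D.35) and (D.36), $SV_i(k^*) - SV_i(k_1^0)$ becomes
\[
\begin{aligned}
SV_i(k^*) - SV_i(k_1^0) &= -\delta_i^{0\prime}\big[Z_{1i}^{0\prime}M_iZ_{1i}^0 - Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{1i}^0\big]\delta_i^0 \\
&\quad + 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iu_i \\
&\quad + u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i,
\end{aligned}
\]
and can be decomposed into the term defined by
\[
J_{1i}(k^*) = \delta_i^{0\prime}\big[Z_{1i}^{0\prime}M_iZ_{1i}^0 - Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{1i}^0\big]\delta_i^0 \tag{D.37}
\]
and the term related to the disturbance $u_i$ defined by
\[
\begin{aligned}
H_{1i}(k^*) &= 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - 2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iu_i \\
&\quad + u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i.
\end{aligned}
\tag{D.38}
\]
Then, we have $SV_i(k^*) - SV_i(k_1^0) = -J_{1i}(k^*) + H_{1i}(k^*)$ for $i \in G_1$. By a similar transformation for individuals in group 2, we can see that
\[
SV_i(k^*) = \delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{2i}^0\delta_i^0 + 2\delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i + u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i, \tag{D.39}
\]
\[
\begin{aligned}
SV_i(k_1^0) &= \delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iZ_{2i}^0\delta_i^0 + 2\delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i \\
&\quad + u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i.
\end{aligned}
\tag{D.40}
\]
Using (D.39) and (D.40), it is easily seen that, for $i \in G_2$,
\[
SV_i(k^*) - SV_i(k_1^0) = -J_{2i}(k^*) + H_{2i}(k^*),
\]
where the term $J_{2i}(k^*)$ is defined by
\[
J_{2i}(k^*) = \delta_i^{0\prime}\big[Z_{2i}^{0\prime}M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iZ_{2i}^0 - Z_{2i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{2i}^0\big]\delta_i^0, \tag{D.41}
\]
and the term $H_{2i}(k^*)$ related to the disturbance is defined by
\[
\begin{aligned}
H_{2i}(k^*) &= 2\delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - 2\delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i \\
&\quad + u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i.
\end{aligned}
\tag{D.42}
\]
Thus, (D.34) can be rewritten as
\[
\hat k = \arg\max_{1\le k^*\le T-1}\Big\{\sum_{i\in G_1}(-J_{1i}(k^*) + H_{1i}(k^*)) + \sum_{i\in G_2}(-J_{2i}(k^*) + H_{2i}(k^*))\Big\}.
\]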
Define the sets $K(C_1) = \{k : 1 \le k < k_1^0 - C_1\}$, $K(C_2) = \{k : k_2^0 + C_2 < k \le T - C_1\}$, and $K(C) = K(C_1) \cup K(C_2) = \{k : 1 \le k < k_1^0 - C_1 \text{ or } k_2^0 + C_2 < k \le T\}$ with positive constants $C_1, C_2$. We next show, by Lemmas D.3-D.4, that the common break estimator cannot lie in the set $K(C_1)$. A similar result can be obtained for the set $K(C_2)$ by symmetry, and thus the details are omitted. Define
\[
Z_{\Delta 1i} = \begin{cases} Z_i(k^*) - Z_i(k_1^0) & \text{if } k^* < k_1^0 \\ -(Z_i(k^*) - Z_i(k_1^0)) & \text{if } k^* \ge k_1^0 \end{cases}
\quad\text{and}\quad
Z_{\Delta 2i} = \begin{cases} Z_i(k^*) - Z_i(k_2^0) & \text{if } k^* < k_2^0 \\ -(Z_i(k^*) - Z_i(k_2^0)) & \text{if } k^* \ge k_2^0 \end{cases}.
\]
Lemma D.3 Under Assumptions 4.1-4.6, for all large $N$ and $T$, with probability tending to 1,
\[
\inf_{k^*\in K(C_1)}\frac{1}{k_1^0 - k^*}\Big\{\sum_{i\in G_1}J_{1i}(k^*) + \sum_{i\in G_2}J_{2i}(k^*)\Big\} \ge \lambda\phi_N.
\]
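Proof of Lemma D.3. We first show that the sum of the terms $J_{1i}(k^*)$ has a lower bound in the case $k^* \in K(C_1)$. From Lemma A.2 in Bai (1997), if $k^* < k_1^0$,
\[
\begin{aligned}
J_{1i}(k^*) &= \delta_i^{0\prime}\big[Z_{1i}^{0\prime}M_iZ_{1i}^0 - Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{1i}^0\big]\delta_i^0 \\
&= \delta_i^{0\prime}Z_{\Delta 1i}'Z_{\Delta 1i}(Z_i'Z_i)^{-1}Z_{1i}^{0\prime}Z_{1i}^0\delta_i^0.
\end{aligned}
\tag{D.43}
\]
Since the matrix
\[
\frac{Z_{\Delta 1i}'Z_{\Delta 1i}}{k_1^0 - k^*}\Big(\frac{Z_i'Z_i}{T}\Big)^{-1}\frac{Z_{1i}^{0\prime}Z_{1i}^0}{T} \tag{D.44}
\]
is symmetric and positive definite from Assumption 4.4, we have
\[
\frac{1}{k_1^0 - k^*}\sum_{i\in G_1}J_{1i}(k^*) = \sum_{i\in G_1}\delta_i^{0\prime}S_i'\Lambda_iS_i\delta_i^0 = \sum_{i\in G_1}\tilde\delta_i^{0\prime}\Lambda_i\tilde\delta_i^0 \ge \sum_{i\in G_1}\lambda_i\tilde\delta_i^{0\prime}\tilde\delta_i^0, \tag{D.45}
\]
where $\Lambda_i$ is a diagonal matrix consisting of the eigenvalues of the matrix (D.44), $\tilde\delta_i^0 = S_i\delta_i^0$, and $\lambda_i$ is the minimum eigenvalue of (D.44). Since $\tilde\delta_i^{0\prime}\tilde\delta_i^0 = \delta_i^{0\prime}S_i'S_i\delta_i^0 = \delta_i^{0\prime}\delta_i^0$, with probability tending to one for large $N$ and $T$, we show that
\[
\frac{1}{k_1^0 - k^*}\sum_{i\in G_1}J_{1i}(k^*) \ge \lambda_1\sum_{i\in G_1}\delta_i^{0\prime}\delta_i^0 = \lambda_1\phi_{N1}, \tag{D.46}
\]
where $\lambda_1 = \min_{i\in G_1}\lambda_i$. We next investigate the lower bound of $J_{2i}(k^*)$ for individuals in group 2. Denote
\[
V_i(a, b) = \begin{pmatrix} x_{i(a+1)}' \\ x_{i(a+2)}' \\ \vdots \\ x_{ib}' \end{pmatrix}, \quad
V_i^0(a, b, c) = \begin{pmatrix} 0_{(b-a)\times p} \\ x_{i(b+1)}' \\ x_{i(b+2)}' \\ \vdots \\ x_{ic}' \end{pmatrix}, \quad\text{and}\quad
S = \begin{pmatrix} I & 0 \\ -I & I \end{pmatrix},
\]
where $V_i(a, b)$ is a $(b-a)\times p$ matrix whose $j$th row is the same as the $(a+j)$th row of $X_i$, $V_i^0(a, b, c)$ is a $(c-a)\times p$ matrix whose first $b-a$ rows are zeros and whose $j$th row is the same as the $(a+j)$th row of $X_i$ for $j > b-a$, and $S$ is a $2p\times 2p$ matrix constructed from the $p\times p$ identity matrix $I$. The second term $J_{2i}(k^*)$ can be transformed into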
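\[
\begin{aligned}
J_{2i}(k^*) &= \delta_i^{0\prime}\big[Z_{2i}^{0\prime}M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iZ_{2i}^0 - Z_{2i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iZ_{2i}^0\big]\delta_i^0 \\
&= \delta_i^{0\prime}Z_{2i}^{0\prime}\big[M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_i - M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_i\big]Z_{2i}^0\delta_i^0 \\
&= \delta_i^{0\prime}Z_{2i}^{0\prime}\big[\big(M_i - M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_i\big) - \big(M_i - M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_i\big)\big]Z_{2i}^0\delta_i^0 \\
&= \delta_i^{0\prime}Z_{2i}^{0\prime}\big(M_W - M_{W_1^0}\big)Z_{2i}^0\delta_i^0 \\
&= \delta_i^{0\prime}Z_{2i}^{0\prime}\big(M_{\bar W} - M_{\bar W_1^0}\big)Z_{2i}^0\delta_i^0,
\end{aligned}
\tag{D.47}
\]
where
\[
\begin{aligned}
W &= [X_i, Z_i(k^*)] = \begin{pmatrix} V_i(0, k^*) & 0 \\ V_i(k^*, T) & V_i(k^*, T) \end{pmatrix}, &
W_1^0 &= [X_i, Z_i(k_1^0)] = \begin{pmatrix} V_i(0, k_1^0) & 0 \\ V_i(k_1^0, T) & V_i(k_1^0, T) \end{pmatrix}, \\
\bar W &= \begin{pmatrix} V_i(0, k^*) & 0 \\ 0 & V_i(k^*, T) \end{pmatrix}, &
\bar W_1^0 &= \begin{pmatrix} V_i(0, k_1^0) & 0 \\ 0 & V_i(k_1^0, T) \end{pmatrix},
\end{aligned}
\]
and $M_X = I - X(X'X)^{-1}X'$ for a matrix $X$. The final equality in (D.47) holds because
\[
\begin{aligned}
M_W &= I - W(W'W)^{-1}W' = I - WS(S'W'WS)^{-1}S'W' = I - \bar W(\bar W'\bar W)^{-1}\bar W' = M_{\bar W}, \\
M_{W_1^0} &= I - W_1^0(W_1^{0\prime}W_1^0)^{-1}W_1^{0\prime} = I - W_1^0S(S'W_1^{0\prime}W_1^0S)^{-1}S'W_1^{0\prime} = I - \bar W_1^0(\bar W_1^{0\prime}\bar W_1^0)^{-1}\bar W_1^{0\prime} = M_{\bar W_1^0}.
\end{aligned}
\]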
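The step $M_W = M_{\bar W}$ says that post-multiplying the regressor matrix by the nonsingular matrix $S$ leaves the projection unchanged. A minimal numerical illustration, assuming the mean-shift case $x_{it} = 1$ (so $W = [1, 1_{t>k}]$ and $\bar W = WS = [1_{t\le k}, 1_{t>k}]$; the function names are hypothetical), compares the fitted values from the two parameterizations:

```python
import random

def fitted_W(y, k):
    """Fitted values from regressing y on W = [1, 1{t>k}] via the 2x2 normal equations."""
    T = len(y)
    n2 = T - k
    # W'W = [[T, n2], [n2, n2]] and W'y = [sum(y), sum(y[k:])].
    s_all, s_post = sum(y), sum(y[k:])
    det = T * n2 - n2 * n2
    b0 = (n2 * s_all - n2 * s_post) / det
    b1 = (-n2 * s_all + T * s_post) / det
    return [b0 + (b1 if t >= k else 0.0) for t in range(T)]

def fitted_W_bar(y, k):
    """Fitted values from regressing y on W_bar = [1{t<=k}, 1{t>k}], i.e. segment means."""
    pre, post = sum(y[:k]) / k, sum(y[k:]) / (len(y) - k)
    return [pre if t < k else post for t in range(len(y))]

rng = random.Random(5)
y = [rng.gauss(0.0, 1.0) for _ in range(30)]
for k in (5, 12, 20):
    a, b = fitted_W(y, k), fitted_W_bar(y, k)
    assert max(abs(p - q) for p, q in zip(a, b)) < 1e-9
```

Equal fitted values for every $y$ mean equal residual-maker matrices, which is what allows the block-diagonal form $\bar W$ to be used in the calculations that follow.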
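Since $\bar W$ and $\bar W_1^0$ are block-diagonal matrices, it follows that
\[
\begin{aligned}
Z_{2i}^{0\prime}M_{\bar W}Z_{2i}^0
&= Z_{2i}^{0\prime}\big[I - \bar W(\bar W'\bar W)^{-1}\bar W'\big]Z_{2i}^0 \\
&= [0, V_i^0(k^*, k_2^0, T)']\big[I - \bar W(\bar W'\bar W)^{-1}\bar W'\big]\begin{pmatrix} 0 \\ V_i^0(k^*, k_2^0, T) \end{pmatrix} \\
&= V_i^0(k^*, k_2^0, T)'V_i^0(k^*, k_2^0, T) - V_i^0(k^*, k_2^0, T)'V_i(k^*, T)\big(V_i(k^*, T)'V_i(k^*, T)\big)^{-1}V_i(k^*, T)'V_i^0(k^*, k_2^0, T) \\
&= V_i(k_2^0, T)'V_i(k_2^0, T) - V_i(k_2^0, T)'V_i(k_2^0, T)\big(V_i(k^*, T)'V_i(k^*, T)\big)^{-1}V_i(k_2^0, T)'V_i(k_2^0, T) \\
&= V_i(k_2^0, T)'V_i(k_2^0, T)\Big[\big(V_i(k_2^0, T)'V_i(k_2^0, T)\big)^{-1} - \big(V_i(k^*, T)'V_i(k^*, T)\big)^{-1}\Big]V_i(k_2^0, T)'V_i(k_2^0, T),
\end{aligned}
\tag{D.48}
\]
\[
\begin{aligned}
Z_{2i}^{0\prime}M_{\bar W_1^0}Z_{2i}^0
&= Z_{2i}^{0\prime}\big[I - \bar W_1^0(\bar W_1^{0\prime}\bar W_1^0)^{-1}\bar W_1^{0\prime}\big]Z_{2i}^0 \\
&= [0, V_i^0(k_1^0, k_2^0, T)']\big[I - \bar W_1^0(\bar W_1^{0\prime}\bar W_1^0)^{-1}\bar W_1^{0\prime}\big]\begin{pmatrix} 0 \\ V_i^0(k_1^0, k_2^0, T) \end{pmatrix} \\
&= V_i^0(k_1^0, k_2^0, T)'V_i^0(k_1^0, k_2^0, T) - V_i^0(k_1^0, k_2^0, T)'V_i(k_1^0, T)\big(V_i(k_1^0, T)'V_i(k_1^0, T)\big)^{-1}V_i(k_1^0, T)'V_i^0(k_1^0, k_2^0, T) \\
&= V_i(k_2^0, T)'V_i(k_2^0, T)\Big[\big(V_i(k_2^0, T)'V_i(k_2^0, T)\big)^{-1} - \big(V_i(k_1^0, T)'V_i(k_1^0, T)\big)^{-1}\Big]V_i(k_2^0, T)'V_i(k_2^0, T).
\end{aligned}
\tag{D.49}
\]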
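Substituting (D.48) and (D.49) into (D.47), we have
\[
\begin{aligned}
J_{2i}(k^*) &= \delta_i^{0\prime}V_i(k_2^0, T)'V_i(k_2^0, T)\Big[\big(V_i(k_1^0, T)'V_i(k_1^0, T)\big)^{-1} - \big(V_i(k^*, T)'V_i(k^*, T)\big)^{-1}\Big]V_i(k_2^0, T)'V_i(k_2^0, T)\delta_i^0 \\
&= \delta_i^{0\prime}Z_{2i}^{0\prime}Z_{2i}^0\Big[\big(Z_{1i}^{0\prime}Z_{1i}^0\big)^{-1} - \big(Z_i'Z_i\big)^{-1}\Big]Z_{2i}^{0\prime}Z_{2i}^0\delta_i^0 \\
&= \delta_i^{0\prime}Z_{2i}^{0\prime}Z_{2i}^0\big(Z_i'Z_i\big)^{-1}\big(Z_i'Z_i - Z_{1i}^{0\prime}Z_{1i}^0\big)\big(Z_{1i}^{0\prime}Z_{1i}^0\big)^{-1}Z_{2i}^{0\prime}Z_{2i}^0\delta_i^0 \\
&= \delta_i^{0\prime}Z_{2i}^{0\prime}Z_{2i}^0\big(Z_i'Z_i\big)^{-1}Z_{\Delta 1i}'Z_{\Delta 1i}\big(Z_{1i}^{0\prime}Z_{1i}^0\big)^{-1}Z_{2i}^{0\prime}Z_{2i}^0\delta_i^0,
\end{aligned}
\tag{D.50}
\]
which is symmetric from the first equality. Hence, under Assumptions 4.4 and 4.5,
\[
\frac{Z_{2i}^{0\prime}Z_{2i}^0}{T}\Big(\frac{Z_i'Z_i}{T}\Big)^{-1}\frac{Z_{\Delta 1i}'Z_{\Delta 1i}}{k_1^0 - k^*}\Big(\frac{Z_{1i}^{0\prime}Z_{1i}^0}{T}\Big)^{-1}\frac{Z_{2i}^{0\prime}Z_{2i}^0}{T} \tag{D.51}
\]
is positive definite. Then, we have
\[
\frac{1}{k_1^0 - k^*}\sum_{i\in G_2}J_{2i}(k^*) \ge \lambda_2\sum_{i\in G_2}\delta_i^{0\prime}\delta_i^0 = \lambda_2\phi_{N2}, \tag{D.52}
\]
where $\lambda_2 = \min_{i\in G_2}\lambda_i$ and $\lambda_i$ is the minimum eigenvalue of the matrix (D.51). From inequalities (D.46) and (D.52), the proof of Lemma D.3 is complete.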
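The closed form in (D.43), which drives the lower bound for group 1, can be verified directly in a small case. The sketch below assumes the mean-shift setting $x_{it} = 1$, so that $M_i$ is the demeaning matrix and $Z_i(k)$ is the dummy $1_{t>k}$; it compares the definition (D.37) with the product form in (D.43) for every $k^* < k_1^0$. All function names are hypothetical.

```python
def shift_dummy(T, k):
    """Z(k): equals 1 for t > k and 0 otherwise, t = 1..T."""
    return [1.0 if t > k else 0.0 for t in range(1, T + 1)]

def demean(v):
    """Apply M = I - 11'/T to a vector."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def j1_definition(T, k_star, k01, delta):
    """J_1i(k*) computed from (D.37) in the scalar case."""
    z, z01 = shift_dummy(T, k_star), shift_dummy(T, k01)
    mz, mz01 = demean(z), demean(z01)
    return delta ** 2 * (dot(z01, mz01) - dot(z01, mz) ** 2 / dot(z, mz))

def j1_product(T, k_star, k01, delta):
    """Right-hand side of (D.43): Z_D'Z_D (Z'Z)^{-1} Z01'Z01 times delta^2."""
    z, z01 = shift_dummy(T, k_star), shift_dummy(T, k01)
    z_delta = [a - b for a, b in zip(z, z01)]
    return delta ** 2 * dot(z_delta, z_delta) / dot(z, z) * dot(z01, z01)

T, k01, delta = 100, 40, 1.5
for k_star in range(1, k01):
    lhs = j1_definition(T, k_star, k01, delta)
    assert abs(lhs - j1_product(T, k_star, k01, delta)) < 1e-9
    assert lhs / (k01 - k_star) > 0.0   # strictly positive after normalisation
```

In this scalar case the normalized quantity equals $\delta^2 (T - k_1^0)/(T - k^*)$, which stays bounded away from zero, mirroring the role of the minimum eigenvalue in (D.45).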
Lemma D.4 Under Assumptions 4.1-4.6, uniformly on k∗ ∈ K(C1),
(i)∑i∈G1
1
k01 − k∗δ0i
′Z∆1i
′ui = Op(
√ϕN1),
(ii)∑i∈G1
1
k01 − k∗δ0i
′Z∆1i
′Xi
(X ′
iXi
)−1X ′
iui = Op
(√ϕN1
T
),
(iii)∑i∈G1
1
k01 − k∗δ0i
′Z∆1i
′MiZi
(Z ′iMiZi
)−1Z ′iMiui = Op
(√ϕN1
T
),
∑i∈G2
1
k01 − k∗δ0i
′Z02i′MiZ
∆1i
(Z ′iMiZi
)−1Z ′iMiui = Op
(√ϕN2
T
),
∑i∈G2
1
k01 − k∗δ0i
′Z02i′MiZ
01i
(Z ′iMiZi
)−1Z∆1i
′Miui = Op
(√ϕN2
)+Op
(√ϕN2
T
),
∑i∈G2
1
k01 − k∗δ0i
′Z02i′MiZ
01i
[(Zi
′MiZi
)−1 −(Z01i′MiZ
01i
)−1]Z01i′Miui = Op
(√ϕN2
T
),
(iv)N∑i=1
1
k01 − k∗u′iMiZ
∆1i
(Z ′iMiZi
)−1Z∆1i
′Miui = Op
(N
T
),
(v)
N∑i=1
1
k01 − k∗u′iMiZ
∆1i
(Z ′iMiZi
)−1Z01i′Miui = Op
(N
T
)+Op
(N√T
),
(vi)
N∑i=1
1
k01 − k∗u′iMiZ
01i
[(Z ′iMiZi
)−1 −(Z01i′MiZ
01i
)−1]Z01i′Miui = Op
(N
T
).
Proof of Lemma D.4. (i) It is shown that
1√k01 − k∗
Z∆1i
′ui = Op(1), since V ar
(1√
k01 − k∗Z∆1i
′ui
)< ∞.
Then, we have ∑i∈G1
1
k01 − k∗δ0i
′Z∆1i
′ui = Op(
√ϕN1).
(ii) We can show that
∑i∈G1
1
k01 − k∗δ0i
′Z∆1i
′Xi
(X ′
iXi
)−1X ′
iui =1√T
∑i∈G1
δ0i′ Z∆
1i′Xi
k01 − k∗
(X ′
iXi
T
)−1 1√TX ′
iui = Op
(√ϕN1
T
),
since for large T1√TXi
′ui = Op(1).
(iii) By expanding Mi, we can show that
∑i∈G1
1
k01 − k∗δ0i
′Z∆1i
′MiZi
(Z ′iMiZi
)−1Z ′iMiui
=1√T
∑i∈G1
δ0i′ Z∆
1i′Zi
k01 − k∗
(Z ′iMiZi
T
)−1 1√TZ ′iMiui
− 1√T
∑i∈G1
δ0i′ Z∆
1i′Xi
k01 − k∗
(X ′
iXi
T
)−1 X ′iZi
T
(Z ′iMiZi
T
)−1 1√TZ ′iMiui
= Op
(√ϕN1
T
).
To prove the second order, since Z02i′Z∆1i = 0, we have
∑i∈G2
1
k01 − k∗δ0i
′Z02i′MiZ
∆1i
(Z ′iMiZi
)−1Z ′iMiui
=∑i∈G2
1
k01 − k∗δ0i
′Z02i′Z∆1i
(Z ′iMiZi
)−1Z ′iMiui
− 1√T
∑i∈G2
δ0i′Z0
2i′Xi
T
(X ′
iXi
T
)−1 X ′iZ
∆1i
k01 − k∗
(Z ′iMiZi
T
)−1 1√TZ ′iMiui
= Op
(√ϕN2
T
).
Considering the third order, we can show that
∑i∈G2
1
k01 − k∗δ0i
′Z02i′MiZ
01i
(Z ′iMiZi
)−1Z∆1i
′Miui
=∑i∈G2
1
k01 − k∗δ0i
′Z02i′MiZ
01i
(Z ′iMiZi
)−1Z∆1i
′ui
−∑i∈G2
1
k01 − k∗δ0i
′Z02i′MiZ
01i
(Z ′iMiZi
)−1Z∆1i
′Xi(X
′iXi)
−1X ′iui
=1√
k01 − k∗Op
(√ϕN2
)+Op
(√ϕN2
T
)
= Op
(√ϕN2
)+Op
(√ϕN2
T
).
The last term can be transformed into∑i∈G2
1
k01 − k∗δ0i
′Z02i′MiZ
01i
(Zi
′MiZi
)−1(Z01i′MiZ
01i − Zi
′MiZi
)(Z01i′MiZ
01i
)−1Z01i′Miui
=∑i∈G2
1
k01 − k∗δ0i
′Z02i′MiZ
01i
(Zi
′MiZi
)−1(−Z∆
1i′Z∆1i − Z∆
1i′MiZ
01i − Z0
1i′MiZ
∆1i
)(Z01i′MiZ
01i
)−1Z01i′Miui
+∑i∈G2
1
k01 − k∗δ0i
′Z02i′MiZ
01i
(Zi
′MiZi
)−1(Z∆1i
′Xi(X
′iXi)
−1X ′iZ
∆1i
)(Z01i′MiZ
01i
)−1Z01i′Miui
= Op
(√ϕN2
T
)+Op
(√ϕN2
T
).
(iv) The term∑N
i=11
T (k01−k∗)u′iMiZ
∆1i
(Z′iMiZi
T
)−1Z∆1i
′Miui has the same order as that of∑N
i=11
T (k01−k∗)u′iMiZ
∆1iZ
∆1i
′Miui since the matrix
Z′iMiZi
T = Op(1) for large T. Expanding
matrix Mi, we have
N∑i=1
1
T (k01 − k∗)u′iMiZ
∆1iZ
∆1i
′Miui
=1
T (k01 − k∗)
N∑i=1
u′iZ∆1iZ
∆1i
′ui −
2
T (k01 − k∗)
N∑i=1
u′iXi(X′iXi)
−1X ′iZ
∆1iZ
∆1i
′ui
+1
T (k01 − k∗)
N∑i=1
u′iXi(X′iXi)
−1X ′iZ
∆1iZ
∆1i
′Xi(X
′iXi)
−1X ′iui (D.53)
Consider the first term in (D.53),
1
T (k01 − k∗)
N∑i=1
u′iZ∆1iZ
∆1i
′ui = Op
(N
T
).
Similarly, it can be shown that the second term
1
T (k01 − k∗)
N∑i=1
u′iXi(X′iXi)
−1X ′iZ
∆1iZ
∆1i
′ui
=
√k01 − k∗
T
1
T
N∑i=1
u′iXi√T
(X ′
iXi
T
)−1 X ′iZ
∆1i
k01 − k∗Z∆1i
′ui√
k01 − k∗= Op
(N
T
),
and the third term
1
T (k01 − k∗)
N∑i=1
u′iXi(X′iXi)
−1X ′iZ
∆1iZ
∆1i
′Xi(X
′iXi)
−1X ′iui
=k01 − k∗
T
1
T
N∑i=1
u′iXi√T
(X ′
iXi
T
)−1 X ′iZ
∆1i
k01 − k∗Z∆1i
′Xi
k01 − k∗
(X ′
iXi
T
)−1 X ′iui√T
= Op
(N
T
).
Thus, we have
N∑i=1
1
k01 − k∗u′iMiZ
∆1i
(Z ′iMiZi
)−1Z∆1i
′Miui = Op
(N
T
).
(v) By expanding Mi, it is shown that
N∑i=1
1
k01 − k∗u′iMiZ
∆1i
(Z ′iMiZi
)−1Z01i′Miui
=N∑i=1
1
k01 − k∗u′iZ
∆1i
(Z ′iMiZi
)−1Z01i′Miui −
N∑i=1
1
k01 − k∗u′iXi(X
′iXi)
−1X ′iZ
∆1i
(Z ′iMiZi
)−1Z01i′Miui
= Op
(N√T
)+Op
(N
T
).
(vi) We show that
N∑i=1
1
k01 − k∗ui
′MiZ01i
(Zi
′MiZi
)−1(Z01i′MiZ
01i − Zi
′MiZi
)(Z01i′MiZ
01i
)−1Z01i′Miui
=
N∑i=1
1
k01 − k∗ui
′MiZ01i
(Zi
′MiZi
)−1(−Z∆
1i′MiZ
∆1i − Z∆
1i′MiZ
01i − Z0
1i′MiZ
∆1i
)(Z01i′MiZ
01i
)−1Z01i′Miui
= Op
(N
T
).
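The proof of Lemma D.4 is complete.

Proof of Proposition 4.1. We first show that for any given $\epsilon > 0$,
\[
P\left(\sup_{K(C_1)}\left|\frac{\sum_{i\in G_1}H_{1i}(k^*) + \sum_{i\in G_2}H_{2i}(k^*)}{k_1^0 - k^*}\right| \ge \lambda\phi_N\right) < \epsilon. \tag{D.54}
\]
Using (D.38) and (D.42), we see that the sum of $H_{1i}(k^*)$ and $H_{2i}(k^*)$ can be decomposed into three parts:
\[
\begin{aligned}
&\frac{1}{k_1^0 - k^*}\Big\{\sum_{i\in G_1}H_{1i}(k^*) + \sum_{i\in G_2}H_{2i}(k^*)\Big\} \\
&= \frac{1}{k_1^0 - k^*}\Big[\sum_{i\in G_1}2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - \sum_{i\in G_1}2\delta_i^{0\prime}Z_{1i}^{0\prime}M_iu_i\Big] \\
&\quad + \frac{1}{k_1^0 - k^*}\Big[\sum_{i\in G_2}2\delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - \sum_{i\in G_2}2\delta_i^{0\prime}Z_{2i}^{0\prime}M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i\Big] \\
&\quad + \frac{1}{k_1^0 - k^*}\Big[\sum_{i=1}^{N}u_i'M_iZ_i(Z_i'M_iZ_i)^{-1}Z_i'M_iu_i - \sum_{i=1}^{N}u_i'M_iZ_{1i}^0(Z_{1i}^{0\prime}M_iZ_{1i}^0)^{-1}Z_{1i}^{0\prime}M_iu_i\Big] \\
&= H_1 + H_2 + H_3.
\end{aligned}
\]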
Consider the first term, by replacing Z01i by Zi − Z∆
1i ,
|H1| = 2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G1
[δ0i
′Z01i′MiZi(Z
′iMiZi)
−1Z ′iMiui − δ0i
′Z01i′Miui
]∣∣∣∣∣∣= 2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G1
[δ0i
′Z ′iMiui − δ0i
′Z∆1i
′MiZi(Z
′iMiZi)
−1Z ′iMiui − δ0i
′Zi
′Miui + δ0i′Z∆1i
′Miui
]∣∣∣∣∣∣= 2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G1
[δ0i
′Z∆1i
′Miui − δ0i
′Z∆1i
′MiZi(Z
′iMiZi)
−1Z ′iMiui
]∣∣∣∣∣∣≤ 2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G1
δ0i′Z∆1i
′ui
∣∣∣∣∣∣+ 2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G1
δ0i′Z∆1i
′Xi(X
′iXi)
−1X ′iui
∣∣∣∣∣∣+2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G1
δ0i′Z∆1i
′MiZi(Z
′iMiZi)
−1Z ′iMiui
∣∣∣∣∣∣= Op(
√ϕN1) +Op
(√ϕN1
T
)+Op
(√ϕN1
T
), (D.55)
where the inequality is obtained by expanding Mi and the final equality uses the orders in
(i)-(iii) of Lemma D.4.
For the second term H2, replacing Zi by Z∆1i + Z0
1i, and using (iii) of Lemma D.4,
|H2| = 2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G2
[2δ0i
′Z02i′MiZi(Z
′iMiZi)
−1Z ′iMiui − 2δ0i
′Z02i′MiZ
01i(Z
01i′MiZ
01i)
−1Z01i′Miui
]∣∣∣∣∣∣= 2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G2
[δ0i
′Z02i′MiZ
∆1i(Z
′iMiZi)
−1Z ′iMiui + δ0i
′Z02i′MiZ
01i(Z
′iMiZi)
−1Z ′iMiui
−δ0i′Z02i′MiZ
01i(Z
01i′MiZ
01i)
−1Z01iMiui
]∣∣∣= 2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G2
[δ0i
′Z02i′MiZ
∆1i(Z
′iMiZi)
−1Z ′iMiui + δ0i
′Z02i′MiZ
01i(Z
′iMiZi)
−1Z∆1i
′Miui
+δ0i′Z02i′MiZ
01i
[(Zi
′MiZi
)−1 −(Z01i′MiZ
01i
)−1]Z01i′Miui
]∣∣∣∣≤ 2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G2
δ0i′Z02i′MiZ
∆1i(Z
′iMiZi)
−1Z ′iMiui
∣∣∣∣∣∣+2
∣∣∣∣∣∣ 1
k01 − k
∑i∈G2
δ0i′Z02i′MiZ
01i(Z
′iMiZi)
−1Z∆1i
′Miui
∣∣∣∣∣∣ (D.56)
+2
∣∣∣∣∣∣ 1
k01 − k∗
∑i∈G2
δ0i′Z02i′MiZ
01i
[(Zi
′MiZi
)−1 −(Z01i′MiZ
01i
)−1]Z01i′Miui
∣∣∣∣∣∣= Op
(√ϕN2
T
)+Op(
√ϕN2) +Op
(√ϕN2
T
)+Op
(√ϕN2
T
). (D.57)
By (iv)-(vi) of Lemma D.4, the order of the third term is
|H3| =
∣∣∣∣∣ 1
k01 − k∗
N∑i=1
[u′iMiZ
′i(Z
′iMiZi)
−1Z ′iMiui − u′iMiZ
01i(Z
01i′MiZ
01i)
−1Z01i′Miui
]∣∣∣∣∣≤
∣∣∣∣∣ 1
k01 − k∗
N∑i=1
u′iMiZ∆1i
′(Z ′
iMiZi)−1Z∆
1i′Miui
∣∣∣∣∣+∣∣∣∣∣ 1
k01 − k∗
N∑i=1
u′iMiZ∆1i
′(Z ′
iMiZi)−1Z0
1i′Miui
∣∣∣∣∣+
∣∣∣∣∣N∑i=1
1
k01 − k∗u′iMiZ
01i
[(Z ′iMiZi
)−1 −(Z01i′MiZ
01i
)−1]Z01i′Miui
∣∣∣∣∣= Op
(N
T
)+Op
(N
T
)+Op
(N√T
)+Op
(N
T
). (D.58)
Combining (D.55), (D.57) and (D.58), under Assumption 4.2, the term
1
ϕN
∣∣∣∣∣∣∣∑i∈G1
H1i(k∗) +
∑i∈G2
H2i(k∗)
k01 − k∗
∣∣∣∣∣∣∣=
1
ϕN
∣∣∣∣∣[Op(
√ϕN1) +Op(
√ϕN2) +Op
(√ϕN1
T
)+Op
(√ϕN2
T
)+Op
(N√T
)+Op
(N
T
)]∣∣∣∣∣→ 0,
which will vanish for any k∗ ∈ K(C1). On the other hand, the part ϕ−1N (k01 − k∗)−1∣∣∣∣∣ ∑i∈G1
J1i(k∗) +
∑i∈G2
J2i(k∗)
∣∣∣∣∣ has a lower bound from Lemma D.3. Hence, for any ϵ > 0,
P
supK(C1)
∣∣∣∣∣∣∣∑i∈G1
H1i(k∗) +
∑i∈G2
H2i(k∗)
k01 − k∗
∣∣∣∣∣∣∣ ≥ supK(C1)
∣∣∣∣∣∣∣∑i∈G1
J1i(k∗) +
∑i∈G2
J2i(k∗)
k01 − k∗
∣∣∣∣∣∣∣ < ϵ,
which implies that
P
supK(C1)
∑i∈G1
−J1i(k∗) +H1i(k
∗) +∑i∈G2
−J2i(k∗) +H2i(k
∗) ≥ 0
< ϵ,
P
(sup
K(C1)
N∑i=1
[SVi(k)− SVi(k
01)]≥ 0
)< ϵ.
Finally, we obtain that, for any given $\epsilon > 0$ and for both $N$ and $T$ large,
\[
P(\hat k \in K(C_1)) < \epsilon.
\]
In other words, the objective in (D.34) cannot attain its maximum at any $k^* \in K(C_1)$. By symmetry, the estimation of the common break point (4.3) can be transformed into
\[
\hat k = \arg\max_{1\le k^*\le T-1}\Big\{\sum_{i\in G_1}(SV_i(k^*) - SV_i(k_2^0)) + \sum_{i\in G_2}(SV_i(k^*) - SV_i(k_2^0))\Big\}.
\]
Similarly, we can show that, for any given $\epsilon > 0$,
\[
P(\hat k \in K(C_2)) < \epsilon.
\]
That is, the common break point estimator falls in the set $K(C_2)$ with probability tending to zero. Thus, we complete the proof of Proposition 4.1.

Proposition 4.1 indicates that the estimated common break is either stochastically bounded around one of the true break points or located between $k_1^0$ and $k_2^0$. Then, we can say that
\[
\frac{k_1^0 - \hat k}{T} = O_p\Big(\frac{1}{T}\Big), \quad \text{if } \hat k \le k_1^0, \tag{D.59}
\]
\[
\frac{\hat k - k_2^0}{T} = O_p\Big(\frac{1}{T}\Big), \quad \text{if } \hat k \ge k_2^0. \tag{D.60}
\]
Using this property of the common break estimator under the alternative, we next show that the numerator of the statistic diverges under $H_{1A}$.
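The content of Proposition 4.1 can be illustrated with a small simulation: when two groups break at different dates and a single common break is fitted, the estimate lands at or between the two true dates. The sketch below assumes a mean-shift model $x_{it} = 1$ with illustrative values for the break dates, shift size, noise level and seed (none of which come from the dissertation), and hypothetical function names.

```python
import random

def ssr_with_break(y, k):
    """SSR from fitting separate means before and after k (mean-shift model)."""
    def ssr(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    return ssr(y[:k]) + ssr(y[k:])

def common_break_estimate(panel):
    """Minimise the total SSR over one candidate break date k* shared by all units."""
    T = len(panel[0])
    return min(range(1, T), key=lambda k: sum(ssr_with_break(y, k) for y in panel))

rng = random.Random(6)
T, k01, k02, shift, sd = 80, 30, 50, 3.0, 0.2
panel = []
for i in range(20):
    k_true = k01 if i < 10 else k02          # group 1 breaks at k01, group 2 at k02
    panel.append([(shift if t > k_true else 0.0) + rng.gauss(0.0, sd)
                  for t in range(1, T + 1)])

k_hat = common_break_estimate(panel)
assert k01 <= k_hat <= k02
```

With large shifts relative to the noise, as here, the estimate typically coincides with one of the two true dates; the proposition only guarantees that it stays within a bounded distance of the interval between them.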
Proof of Proposition 4.2. Under the alternative, from (D.4), the CUSUM of the residuals
for individuals in group j (j = 1, 2) are calculated as
1√NT
∑i∈Gj
k∑t=1
uit
=1√NT
∑i∈Gj
k∑t=1
uit −1√NT
∑i∈Gj
k∑t=1
x′it(βi − β0i )−
1√NT
∑i∈Gj
k∑t=k+1
x′it(δi − δ0i )1k>k
+1√NT
∑i∈Gj
k∑t=k0j+1
x′itδ0i 1k0j<k≤k +
1√NT
∑i∈Gj
k∑t=k0j+1
x′itδ0i 1k0j<k<k
− 1√NT
∑i∈Gj
k∑t=k+1
x′itδ0i 1k<k≤k0j
− 1√NT
∑i∈Gj
k0j∑t=k+1
x′itδ0i 1k<k0j<k.
Then, the CUSUM of the residuals over all individuals, $\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k}\hat u_{it}$, is expressed as
1√NT
N∑i=1
k∑t=1
uit −1√NT
N∑i=1
k∑t=1
x′it(βi(k)− β0i )−
1√NT
N∑i=1
k∑t=k+1
x′it(δi(k)− δ0i )1k>k
+1√NT
∑i∈G1
k∑t=k01+1
x′itδ0i 1k01<k≤k +
1√NT
∑i∈G1
k∑t=k01+1
x′itδ0i 1k01<k<k
− 1√NT
∑i∈G1
k∑t=k+1
x′itδ0i 1k<k≤k01
− 1√NT
∑i∈G1
k01∑t=k+1
x′itδ0i 1k<k01<k
+1√NT
∑i∈G2
k∑t=k02+1
x′itδ0i 1k02<k≤k +
1√NT
∑i∈G2
k∑t=k02+1
x′itδ0i 1k02<k<k
− 1√NT
∑i∈G2
k∑t=k+1
x′itδ0i 1k<k≤k02
− 1√NT
∑i∈G2
k02∑t=k+1
x′itδ0i 1k<k02<k
= UH11 − UH1
2 − UH13 + UH1
4 + UH15 − UH1
6 − UH17 + UH1
8 + UH19 − UH1
10 − UH111 . (D.61)
Since k < k01 in UH16 , UH1
7 , and k > k02 in UH18 , UH1
9 , using the orders (D.59) and (D.60), we
have
UH16 = Op
(√N
T
), UH1
7 = Op
(√N
T
), UH1
8 = Op
(√N
T
), UH1
9 = Op
(√N
T
). (D.62)
From (D.10), we know that, for i ∈ Gj , j = 1, 2,
√T (βi − β0
i ) =√T
k∑t=1
xitx′it
−1k∑
t=1
xituit +√T
k∑t=1
xitx′it
−1k∑
t=k0j+1
xitx′itδ
0i 1k>k0j
=
√T
(k∑
t=1xitx
′it
)−1k∑
t=1xituit +
√T
(k∑
t=1xitx
′it
)−1k∑
t=k01+1
xitx′itδ
0i 1k>k01
if j = 1,
√T
(k∑
t=1xitx
′it
)−1k∑
t=1xituit +Op
(1√T
)if j = 2,
using order (D.60). Then, the second term UH12 becomes
1√N
∑i∈G1
1
T
k∑t=1
x′it√T
k∑t=1
xitx′it
−1k∑
t=1
xituit +
k∑t=1
xitx′it
−1k∑
t=k01+1
xitx′itδ
0i 1k>k01
+
1√N
∑i∈G2
1
T
k∑t=1
x′it
√T
k∑t=1
xitx′it
−1k∑
t=1
xituit +Op
(1√T
)= Op(1) +
1√N
∑i∈G1
1
T
k∑t=1
x′it
1
T
k∑t=1
xitx′it
−1
1√T
k∑t=k01+1
xitx′itδ
0i 1k>k01
+Op(1) +Op
(√N
T
)
= Op(1) + UH121 1k>k01
+Op(1) +Op
(√N
T
). (D.63)
Considering the third term UH13 , for individuals i ∈ Gj , the coefficient estimator is
δi =
T∑t=k+1
xitx′it
−1T∑
t=k+1
xityit −
k∑t=1
xitx′it
−1k∑
t=1
xityit
=
T∑t=k+1
xitx′it
−1T∑
t=k+1
xit(x′itβ
0i + x′itδ
0i + uit)1k≥k0j
+
k0j∑t=k+1
xit(x′itβ
0i + uit) +
T∑t=k0j+1
xit(x′itβ
0i + x′itδ
0i + uit)
1k<k0j
−
β0i +
k∑t=1
xitx′it
−1k∑
t=1
xituit +
k∑t=1
xitx′it
−1k∑
t=k0j+1
xitx′itδ
0i 1k>k0j
= δ0i +
T∑t=k+1
xitx′it
−1T∑
t=k+1
xituit −
k∑t=1
xitx′it
−1k∑
t=1
xituit
−
T∑t=k+1
xitx′it
−1 k0j∑t=k+1
xitx′itδ
0i 1k<k0j
−
k∑t=1
xitx′it
−1k∑
t=k0j+1
xitx′itδ
0i 1k>k0j
,
where the fourth term in the final equality is Op(1/T ) for individuals in group 1 by using
order (D.59), while the fifth term is Op(1/T ) for individuals in group 2 by using order (D.60).
Then, the third term UH13 can be rewritten as 1√
N
N∑i=1
1
T
k∑t=k+1
x′it√T
T∑t=k+1
xitx′it
−1T∑
t=k+1
xituit −
k∑t=1
xitx′it
−1k∑
t=1
xituit
−Op
(√N
T
)− 1√
N
∑i∈G1
1
T
k∑t=k+1
x′it
k∑t=1
xitx′it
−1k∑
t=k01+1
xitx′itδ
0i 1k>k01
− 1√N
∑i∈G2
1
T
k∑t=k+1
x′it
1
T
T∑t=k+1
xitx′it
−1
1√T
k02∑t=k+1
xitx′itδ
0i 1k<k02
−Op
(√N
T
) 1k>k
=
[Op(1)−Op
(√N
T
)− UH1
31 1k>k01− UH1
32 1k<k02−Op
(√N
T
)]1k>k. (D.64)
Thus, from (D.62), (D.63) and (D.64), (D.61) can be rewritten by
1√NT
N∑i=1
k∑t=1
uit
= Op(1)−
[Op(1) + UH1
21 1k>k01+Op(1) +Op
(√N
T
)]
−
[Op(1)−Op
(√N
T
)− UH1
31 1k>k01− UH1
32 1k<k02−Op
(√N
T
)]1k>k
+UH14 + UH1
5 +Op
(√N
T
)− UH1
10 − UH111
= −UH121 1k>k01
+[UH131 1k>k01
+ UH132 1k<k02
]1k>k + UH1
4 + UH15 − UH1
10 − UH111
+Op(1) +Op
(√N
T
). (D.65)
We next show that (D.65) diverges at the rate of $\sqrt{NT}$ under the alternative in the following three cases.
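Case (i). Suppose that $\hat{k} < k_1^0 < k_2^0$. We have
$$
U^{H1}_{21}1_{\hat{k}>k_1^0} = 0,\quad U^{H1}_{31}1_{\hat{k}>k_1^0} = 0,\quad U^{H1}_{4} = 0,\quad U^{H1}_{5} = 0.
$$
Choosing $k \in (k_1^0 + C_1, k_2^0]$, we can see that $U^{H1}_{11} = 0$, and
$$
\begin{aligned}
-U^{H1}_{10} + U^{H1}_{32}1_{\hat{k}<k_2^0}1_{k>\hat{k}}
&= -\frac{1}{\sqrt{NT}}\sum_{i\in G_2}\sum_{t=\hat{k}+1}^{k} x_{it}'\delta_i^0
 + \frac{1}{\sqrt{NT}}\sum_{i\in G_2}\sum_{t=\hat{k}+1}^{k} x_{it}'
 \Big(\sum_{t=\hat{k}+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}x_{it}'\delta_i^0 \\
&= -\sqrt{\frac{T}{N}}\sum_{i\in G_2}\frac{1}{T}\sum_{t=\hat{k}+1}^{k} x_{it}'
 \Big(\frac{1}{T}\sum_{t=\hat{k}+1}^{T} x_{it}x_{it}'\Big)^{-1}\frac{1}{T}\sum_{t=k_2^0+1}^{T} x_{it}x_{it}'\delta_i^0 \\
&= O_p(\sqrt{NT}).
\end{aligned}
$$
Thus, we have
$$
\begin{aligned}
\sup_{k\in[1,T-1]} US_{NT}(k,\hat{k})
&\ge \sup_{k\in(k_1^0+C_1,\,k_2^0]} US_{NT}(k,\hat{k}) \\
&= \sup_{k\in(k_1^0+C_1,\,k_2^0]} \Big(U^{H1}_{32} - U^{H1}_{10} + O_p(1) + O_p\Big(\sqrt{\tfrac{N}{T}}\Big)\Big)^2 \\
&= O_p(NT).
\end{aligned}
$$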
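The step from the first to the second line of the display for $-U^{H1}_{10} + U^{H1}_{32}$ in Case (i) is not spelled out. A short worked version, written for a single individual $i \in G_2$ and using only the notation of this proof, is sketched below; it splits the post-$\hat{k}$ moment matrix at $k_2^0$.

```latex
\Big(\sum_{t=\hat{k}+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}x_{it}'\delta_i^0
 = \Big(\sum_{t=\hat{k}+1}^{T} x_{it}x_{it}'\Big)^{-1}
   \Big(\sum_{t=\hat{k}+1}^{T} x_{it}x_{it}' - \sum_{t=k_2^0+1}^{T} x_{it}x_{it}'\Big)\delta_i^0
 = \delta_i^0 - \Big(\sum_{t=\hat{k}+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=k_2^0+1}^{T} x_{it}x_{it}'\delta_i^0 ,
\qquad\text{so}\qquad
-\,x_{it}'\delta_i^0 + x_{it}'\Big(\sum_{t=\hat{k}+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}x_{it}'\delta_i^0
 = -\,x_{it}'\Big(\sum_{t=\hat{k}+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=k_2^0+1}^{T} x_{it}x_{it}'\delta_i^0 .
```

Writing $\frac{1}{\sqrt{NT}} = \sqrt{\frac{T}{N}}\cdot\frac{1}{T}$ and moving one factor $\frac{1}{T}$ inside the inverse and one onto the last sum then gives the normalized form in the second line.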
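Case (ii). Suppose that $k_1^0 \le \hat{k} \le k_2^0$. If $\hat{k} \in (k_1^0 + C_1, k_2^0]$, choosing $k \in [k_1^0, k_1^0 + C_1]$, we have
$$
U^{H1}_{31}1_{k>\hat{k}} = 0,\quad U^{H1}_{32}1_{k>\hat{k}} = 0,\quad U^{H1}_{5} = 0,\quad U^{H1}_{10} = 0,\quad U^{H1}_{11} = 0,
$$
$$
U^{H1}_{4} = \frac{1}{\sqrt{NT}}\sum_{i\in G_1}\sum_{t=k_1^0+1}^{k} x_{it}'\delta_i^0 = O_p\Big(\sqrt{\tfrac{N}{T}}\Big),
$$
since $k < \hat{k}$, and
$$
U^{H1}_{21} = \sqrt{\frac{T}{N}}\sum_{i\in G_1}\frac{1}{T}\sum_{t=1}^{k} x_{it}'
 \Big(\frac{1}{T}\sum_{t=1}^{\hat{k}} x_{it}x_{it}'\Big)^{-1}\frac{1}{T}\sum_{t=k_1^0+1}^{\hat{k}} x_{it}x_{it}'\delta_i^0 = O_p(\sqrt{NT}).
$$
Thus, we have
$$
\begin{aligned}
\sup_{k\in[1,T-1]} US_{NT}(k,\hat{k})
&\ge \sup_{k\in[k_1^0,\,k_1^0+C_1]} US_{NT}(k,\hat{k}) \\
&= \sup_{k\in[k_1^0,\,k_1^0+C_1]} \Big(U^{H1}_{21} + O_p(1) + O_p\Big(\sqrt{\tfrac{N}{T}}\Big)\Big)^2 \\
&= O_p(NT).
\end{aligned}
$$
If $\hat{k} \in [k_1^0, k_1^0 + C_1]$, since $(\hat{k} - k_1^0)/T = O_p(1/T)$,
$$
U^{H1}_{21}1_{\hat{k}>k_1^0} = O_p\Big(\sqrt{\tfrac{N}{T}}\Big),\quad
U^{H1}_{31}1_{\hat{k}>k_1^0} = O_p\Big(\sqrt{\tfrac{N}{T}}\Big),\quad
U^{H1}_{4} = O_p\Big(\sqrt{\tfrac{N}{T}}\Big),\quad
U^{H1}_{5} = O_p\Big(\sqrt{\tfrac{N}{T}}\Big).
$$
Choosing $k = k_2^0$, we have $U^{H1}_{11} = 0$, and
$$
\begin{aligned}
U^{H1}_{32}1_{\hat{k}<k_2^0} - U^{H1}_{10}
&= \frac{1}{\sqrt{NT}}\sum_{i\in G_2}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}'
 \Big(\sum_{t=\hat{k}+1}^{T} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}x_{it}'\delta_i^0
 - \frac{1}{\sqrt{NT}}\sum_{i\in G_2}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}'\delta_i^0 \\
&= -\sqrt{\frac{T}{N}}\sum_{i\in G_2}\frac{1}{T}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}'
 \Big(\frac{1}{T}\sum_{t=\hat{k}+1}^{T} x_{it}x_{it}'\Big)^{-1}\frac{1}{T}\sum_{t=k_2^0+1}^{T} x_{it}x_{it}'\delta_i^0 \\
&= O_p(\sqrt{NT}).
\end{aligned}
$$
Thus, we have
$$
\sup_{k\in[1,T-1]} US_{NT}(k,\hat{k}) \ge US_{NT}(k_2^0,\hat{k})
 = \Big(U^{H1}_{32} - U^{H1}_{10} + O_p(1) + O_p\Big(\sqrt{\tfrac{N}{T}}\Big)\Big)^2 = O_p(NT).
$$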
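Case (iii). Suppose that $k_1^0 < k_2^0 < \hat{k}$. We have
$$
U^{H1}_{32}1_{\hat{k}<k_2^0} = 0,\quad U^{H1}_{10} = 0,\quad U^{H1}_{11} = 0.
$$
Choosing $k \in (k_1^0, k_1^0 + C_1]$, we can see that $U^{H1}_{31}1_{k>\hat{k}} = 0$, $U^{H1}_{5} = 0$, $U^{H1}_{4} = O_p(\sqrt{N/T})$, and
$$
\begin{aligned}
U^{H1}_{21}1_{\hat{k}>k_1^0}
&= \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{k} x_{it}'
 \Big(\sum_{t=1}^{\hat{k}} x_{it}x_{it}'\Big)^{-1}\sum_{t=k_1^0+1}^{\hat{k}} x_{it}x_{it}'\delta_i^0 \\
&= \sqrt{\frac{T}{N}}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{k} x_{it}'
 \Big(\frac{1}{T}\sum_{t=1}^{\hat{k}} x_{it}x_{it}'\Big)^{-1}\frac{1}{T}\sum_{t=k_1^0+1}^{\hat{k}} x_{it}x_{it}'\delta_i^0 \\
&= O_p(\sqrt{NT}).
\end{aligned}
$$
Thus, we have
$$
\begin{aligned}
\sup_{k\in[1,T-1]} US_{NT}(k,\hat{k})
&\ge \sup_{k\in(k_1^0,\,k_1^0+C_1]} US_{NT}(k,\hat{k}) \\
&= \sup_{k\in(k_1^0,\,k_1^0+C_1]} \Big(U^{H1}_{21} + O_p(1) + O_p\Big(\sqrt{\tfrac{N}{T}}\Big)\Big)^2 \\
&= O_p(NT).
\end{aligned}
$$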
The proof of Proposition 4.2 is complete.

Proof of Proposition 4.3. From Proposition 4.1, under the alternative H1A, the estimated common break $\hat{k}$ takes values in $[k_1^0 - C_1, k_2^0 + C_2]$ with probability approaching one, for arbitrary positive constants $C_1$, $C_2$. Thus, we investigate the limiting properties of the normalization factor in three cases: $k_1^0 - C_1 \le \hat{k} < k_1^0$, $k_1^0 \le \hat{k} \le k_2^0$ and $k_2^0 < \hat{k} \le k_2^0 + C_2$.
Case (i). Suppose that $k_1^0 - C_1 \le \hat{k} < k_1^0$. We have
$$
\inf_{(k_1,k_2)\in\Omega(\epsilon)} V_{NT}(k_1,\hat{k},k_2) \le V_{NT}(k_1,\hat{k},k_2^0), \quad \text{for } k_1 \in \Omega(\epsilon).
$$
To show that the minimum value of $V_{NT}(k_1,\hat{k},k_2)$ is stochastically bounded, it is sufficient to show that for any $k_1 \in \Omega(\epsilon)$,
$$
V_{NT}(k_1,\hat{k},k_2^0) = O_p(1).
$$
In this case, the model is estimated by regressing $Y_i$ on $[X_i, X_{1i}(k_1,\hat{k}), X_{2i}(\hat{k},k_2^0), X_{3i}(k_2^0)]$, which is written as
$$
\begin{aligned}
Y_i &= [X_i, X_{1i}(k_1,\hat{k}), X_{2i}(\hat{k},k_2^0), X_{3i}(k_2^0)]
 \begin{pmatrix}\hat{\beta}_i \\ \hat{\delta}_{1i} \\ \hat{\delta}_{2i} \\ \hat{\delta}_{3i}\end{pmatrix} + \hat{u}_i \\
&= X_i(k_1,\hat{k},k_2^0)\,\hat{b}_{1i} + \hat{u}_i, \qquad \text{(D.66)}
\end{aligned}
$$
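while the true model with distinct common breaks is defined by
$$
\begin{aligned}
Y_i &= [X_i, X_{1i}(\hat{k},k_1^0), X_{2i}(k_1^0,k_2^0), X_{3i}(k_2^0)]\,b_{1i}^0 + u_i \\
&= X_i(\hat{k},k_1^0,k_2^0)\,b_{1i}^0 + u_i, \qquad \text{(D.67)}
\end{aligned}
$$
$$
b_{1i}^0 =
\begin{cases}
[\beta_i^{0\prime}, 0, \delta_i^{0\prime}, \delta_i^{0\prime}]' & \text{if } i \in G_1, \\
[\beta_i^{0\prime}, 0, 0, \delta_i^{0\prime}]' & \text{if } i \in G_2.
\end{cases}
$$
Replacing $Y_i$ in (D.66) by (D.67), the residuals can be written, for individuals in group 1, as
$$
\begin{aligned}
\hat{u}_i &= X_i(\hat{k},k_1^0,k_2^0)\,b_{1i}^0 + u_i - X_i(k_1,\hat{k},k_2^0)\,\hat{b}_{1i}(\hat{k}) \\
&= u_i - X_i(k_1,\hat{k},k_2^0)\,[\hat{b}_{1i}(\hat{k}) - b_{1i}^0] + [X_i(\hat{k},k_1^0,k_2^0) - X_i(k_1,\hat{k},k_2^0)]\,b_{1i}^0 \\
&= u_i - X_i(k_1,\hat{k},k_2^0)
 \begin{pmatrix}\hat{\beta}_i - \beta_i^0 \\ \hat{\delta}_{1i} \\ \hat{\delta}_{2i} - \delta_i^0 \\ \hat{\delta}_{3i} - \delta_i^0\end{pmatrix}
 + [0,\; X_{1i}(\hat{k},k_1^0) - X_{1i}(k_1,\hat{k}),\; X_{2i}(k_1^0,k_2^0) - X_{2i}(\hat{k},k_2^0),\; 0]
 \begin{pmatrix}\beta_i^0 \\ 0 \\ \delta_i^0 \\ \delta_i^0\end{pmatrix} \\
&= u_i - X_i(\hat{\beta}_i - \beta_i^0) - X_{1i}(k_1,\hat{k})\hat{\delta}_{1i} - X_{2i}(\hat{k},k_2^0)(\hat{\delta}_{2i} - \delta_i^0) - X_{3i}(k_2^0)(\hat{\delta}_{3i} - \delta_i^0) \\
&\quad + [X_{2i}(k_1^0,k_2^0) - X_{2i}(\hat{k},k_2^0)]\,\delta_i^0. \qquad \text{(D.68)}
\end{aligned}
$$
For individuals in group 2, we have
$$
\begin{aligned}
\hat{u}_i &= u_i - X_i(k_1,\hat{k},k_2^0)
 \begin{pmatrix}\hat{\beta}_i - \beta_i^0 \\ \hat{\delta}_{1i} \\ \hat{\delta}_{2i} \\ \hat{\delta}_{3i} - \delta_i^0\end{pmatrix}
 + [0,\; X_{1i}(\hat{k},k_1^0) - X_{1i}(k_1,\hat{k}),\; X_{2i}(k_1^0,k_2^0) - X_{2i}(\hat{k},k_2^0),\; 0]
 \begin{pmatrix}\beta_i^0 \\ 0 \\ 0 \\ \delta_i^0\end{pmatrix} \\
&= u_i - X_i(\hat{\beta}_i - \beta_i^0) - X_{1i}(k_1,\hat{k})\hat{\delta}_{1i} - X_{2i}(\hat{k},k_2^0)\hat{\delta}_{2i} - X_{3i}(k_2^0)(\hat{\delta}_{3i} - \delta_i^0). \qquad \text{(D.69)}
\end{aligned}
$$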
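By the definition of the denominator, $V_{NT}(k_1,\hat{k},k_2^0)$ can be decomposed into four parts $V^{H1}_{1}$, $V^{H1}_{2}$, $V^{H1}_{3}$ and $V^{H1}_{4}$, defined by
$$
V^{H1}_{1} = \frac{1}{T}\sum_{s=1}^{k_1}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{s}\hat{u}_{it}\Big)^2, \qquad
V^{H1}_{2} = \frac{1}{T}\sum_{s=k_1+1}^{\hat{k}}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{\hat{k}}\hat{u}_{it}\Big)^2,
$$
$$
V^{H1}_{3} = \frac{1}{T}\sum_{s=\hat{k}+1}^{k_2^0}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat{k}+1}^{s}\hat{u}_{it}\Big)^2, \qquad
V^{H1}_{4} = \frac{1}{T}\sum_{s=k_2^0+1}^{T}\Big(\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=s}^{T}\hat{u}_{it}\Big)^2.
$$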
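To make the four-part decomposition concrete, the following is a minimal sketch of how the components could be computed from an $N \times T$ array of residuals. It is not taken from the dissertation: the function name `v_components`, the argument names and the toy residual matrix are assumptions for illustration, and the residuals are simply taken as given rather than produced by the regression in (D.66).

```python
import numpy as np

def v_components(resid, k1, k_hat, k2):
    """Return (V1, V2, V3, V4) for an (N, T) residual array.

    Segments (1-based time index): 1..k1, k1+1..k_hat, k_hat+1..k2, k2+1..T.
    V1 and V3 use partial sums that start at the left end of their segment;
    V2 and V4 use partial sums that run to the right end of their segment.
    """
    n, t_len = resid.shape
    # Cross-sectional sum at each date, scaled by 1/sqrt(NT).
    col = resid.sum(axis=0) / np.sqrt(n * t_len)

    def forward(seg):
        return np.cumsum(seg)              # sum_{t = start}^{s}

    def backward(seg):
        return np.cumsum(seg[::-1])[::-1]  # sum_{t = s}^{end}

    v1 = np.sum(forward(col[:k1]) ** 2) / t_len
    v2 = np.sum(backward(col[k1:k_hat]) ** 2) / t_len
    v3 = np.sum(forward(col[k_hat:k2]) ** 2) / t_len
    v4 = np.sum(backward(col[k2:]) ** 2) / t_len
    return v1, v2, v3, v4

# Toy input (hypothetical): N = 2 individuals, T = 6 dates.
toy_resid = np.arange(12, dtype=float).reshape(2, 6) - 5.0
toy_v = v_components(toy_resid, k1=2, k_hat=3, k2=5)
print(sum(toy_v))
```

Summing the four returned values gives the quantity written as $V_{NT}(k_1,\hat{k},k_2^0)$ above when `k2` is set to the true second break.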
From (D.68) and (D.69), for $t \le \hat{k}$, the residuals $\hat{u}_{it}$ are calculated on the basis of the subsamples $x_{i1}, \cdots, x_{ik_1}$ and $x_{i(k_1+1)}, \cdots, x_{i\hat{k}}$, which are the same as those of (D.22) under the null. Using the asymptotic distribution of (D.23)-(D.27) and $k_1^0 - \hat{k} = O_p(1)$, we can derive the limiting distribution of the terms $V^{H1}_{1}$ and $V^{H1}_{2}$ as follows:
$$
V^{H1}_{1} + V^{H1}_{2} \Rightarrow
\sigma^2\int_0^{\tau_1}\Big(W(r) - \frac{r}{\tau_1}W(\tau_1)\Big)^2 dr
+ \sigma^2\int_{\tau_1}^{\tau_1^0}\Big[W(\tau_1^0) - W(r) - \frac{\tau_1^0 - r}{\tau_1^0 - \tau_1}\big(W(\tau_1^0) - W(\tau_1)\big)\Big]^2 dr.
$$
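The first integral in the limit of $V^{H1}_{1} + V^{H1}_{2}$ is the integrated square of a Brownian bridge on $[0, \tau_1]$. As a hedged illustration only, the sketch below approximates draws of $\sigma^2\int_0^{\tau}(W(r) - \frac{r}{\tau}W(\tau))^2 dr$ by discretizing $W$ on a grid. The function name, grid size, number of draws and seed are all assumptions of this example, not settings used in the dissertation. The mean of this functional is $\sigma^2\tau^2/6$, which gives a simple sanity check on the simulation.

```python
import numpy as np

def bridge_functional_draws(tau, n_steps=500, n_draws=2000, sigma=1.0, seed=0):
    """Simulate sigma^2 * int_0^tau (W(r) - (r/tau) W(tau))^2 dr.

    W is approximated by scaled partial sums of Gaussian increments on a
    grid with spacing 1/n_steps; tau * n_steps should be an integer.
    """
    rng = np.random.default_rng(seed)
    m = int(round(tau * n_steps))              # number of grid points in (0, tau]
    dw = rng.normal(scale=np.sqrt(1.0 / n_steps), size=(n_draws, m))
    w = np.cumsum(dw, axis=1)                  # W(j / n_steps), j = 1..m
    r = np.arange(1, m + 1) / n_steps
    bridge = w - (r / tau) * w[:, [-1]]        # W(r) - (r/tau) W(tau)
    return sigma ** 2 * np.sum(bridge ** 2, axis=1) / n_steps

draws = bridge_functional_draws(tau=0.5)
print(draws.mean(), 0.5 ** 2 / 6)              # simulated mean vs. tau^2 / 6
```

The remaining integrals in this proof have the same bridge structure on other intervals, so the same routine could be reused after shifting and reversing the time index; that extension is left out here.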
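We next consider the third term, which can be rewritten as
$$
\begin{aligned}
V^{H1}_{3} &= \frac{1}{T}\sum_{s=\hat{k}+1}^{k_2^0}\Bigg[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat{k}+1}^{s}u_{it}
 - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat{k}+1}^{s}x_{it}'(\hat{\beta}_i - \beta_i^0)
 - \frac{1}{\sqrt{NT}}\sum_{i\in G_1}\sum_{t=\hat{k}+1}^{s}x_{it}'(\hat{\delta}_{2i} - \delta_i^0) \\
&\qquad - \frac{1}{\sqrt{NT}}\sum_{i\in G_1}\Big(\sum_{t=\hat{k}+1}^{s}x_{it}'\delta_i^0\,1_{s\le k_1^0} + \sum_{t=\hat{k}+1}^{k_1^0}x_{it}'\delta_i^0\,1_{s>k_1^0}\Big)
 - \frac{1}{\sqrt{NT}}\sum_{i\in G_2}\sum_{t=\hat{k}+1}^{s}x_{it}'\hat{\delta}_{2i}\Bigg]^2 \qquad \text{(D.70)} \\
&= \frac{1}{T}\sum_{s=\hat{k}+1}^{k_2^0}\Bigg[\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat{k}+1}^{s}u_{it}
 - \frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=\hat{k}+1}^{s}x_{it}'(\hat{\beta}_i - \beta_i^0)
 - \frac{1}{\sqrt{NT}}\sum_{i\in G_1}\sum_{t=\hat{k}+1}^{s}x_{it}'(\hat{\delta}_{2i} - \delta_i^0) \\
&\qquad - O_p\Big(\sqrt{\tfrac{N}{T}}\Big) - \frac{1}{\sqrt{NT}}\sum_{i\in G_2}\sum_{t=\hat{k}+1}^{s}x_{it}'\hat{\delta}_{2i}\Bigg]^2 \\
&= \frac{1}{T}\sum_{s=\hat{k}+1}^{k_2^0}\Big(V^{H1}_{31} - V^{H1}_{32} - V^{H1}_{33} - O_p\Big(\sqrt{\tfrac{N}{T}}\Big) - V^{H1}_{34}\Big)^2.
\end{aligned}
$$
Since $k_1^0 - \hat{k} = O_p(1)$, the terms in parentheses of (D.70) are $o_p(1)$ and will vanish as $N, T \to \infty$. Similar to $V_{31}$ and $V_{32}$, we have
$$
V^{H1}_{31} \Rightarrow \sigma\big(W(r) - W(\tau_1^0)\big), \qquad \text{(D.71)}
$$
$$
V^{H1}_{32} \Rightarrow \sigma(r - \tau_1^0)\frac{W(\tau_1)}{\tau_1}. \qquad \text{(D.72)}
$$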
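The coefficient estimator $\hat{\delta}_{2i}$ is calculated by, for $i \in G_1$,
$$
\begin{aligned}
\hat{\delta}_{2i}
&= \Big(\sum_{t=\hat{k}+1}^{k_2^0} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}y_{it}
 - \Big(\sum_{t=1}^{k_1} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1} x_{it}y_{it} \\
&= \Big(\sum_{t=\hat{k}+1}^{k_2^0} x_{it}x_{it}'\Big)^{-1}
 \Big[\sum_{t=\hat{k}+1}^{k_1^0} x_{it}(x_{it}'\beta_i^0 + u_{it}) + \sum_{t=k_1^0+1}^{k_2^0} x_{it}(x_{it}'\beta_i^0 + x_{it}'\delta_i^0 + u_{it})\Big]
 - \Big[\beta_i^0 + \Big(\sum_{t=1}^{k_1} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1} x_{it}u_{it}\Big] \\
&= \delta_i^0 + \Big(\sum_{t=\hat{k}+1}^{k_2^0} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}u_{it}
 - \Big(\sum_{t=1}^{k_1} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1} x_{it}u_{it} - O_p\Big(\frac{1}{T}\Big),
\end{aligned}
$$
and, for $i \in G_2$,
$$
\begin{aligned}
\hat{\delta}_{2i}
&= \Big(\sum_{t=\hat{k}+1}^{k_2^0} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}(x_{it}'\beta_i^0 + u_{it})
 - \Big[\beta_i^0 + \Big(\sum_{t=1}^{k_1} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1} x_{it}u_{it}\Big] \\
&= \Big(\sum_{t=\hat{k}+1}^{k_2^0} x_{it}x_{it}'\Big)^{-1}\sum_{t=\hat{k}+1}^{k_2^0} x_{it}u_{it}
 - \Big(\sum_{t=1}^{k_1} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1} x_{it}u_{it}.
\end{aligned}
$$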
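Then, we have
$$
\begin{aligned}
V^{H1}_{33} + V^{H1}_{34}
&= \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\Big[\frac{1}{T}\sum_{t=k_1^0+1}^{s} x_{it}' + O_p\Big(\frac{1}{T}\Big)\Big]\sqrt{T}
 \Big[\Big(\sum_{t=k_1^0+1}^{k_2^0} x_{it}x_{it}'\Big)^{-1}\sum_{t=k_1^0+1}^{k_2^0} x_{it}u_{it} \\
&\qquad - \Big(\sum_{t=1}^{k_1} x_{it}x_{it}'\Big)^{-1}\sum_{t=1}^{k_1} x_{it}u_{it}
 + O_p\Big(\frac{1}{T}\Big) + O_p\Big(\frac{1}{\sqrt{T}}\Big) + O_p\Big(\frac{1}{T}\Big)\Big] \\
&\Rightarrow \sigma(r - \tau_1^0)\Big(\frac{W(\tau_2^0) - W(\tau_1^0)}{\tau_2^0 - \tau_1^0} - \frac{W(\tau_1)}{\tau_1}\Big).
\end{aligned}
$$
Thus, the limiting distribution of $V^{H1}_{3}$ is
$$
\sigma^2\int_{\tau_1^0}^{\tau_2^0}\Big[W(r) - W(\tau_1^0) - \frac{r - \tau_1^0}{\tau_2^0 - \tau_1^0}\big(W(\tau_2^0) - W(\tau_1^0)\big)\Big]^2 dr.
$$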
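The passage from (D.71), (D.72) and the limit of $V^{H1}_{33} + V^{H1}_{34}$ to the integrand of the limit of $V^{H1}_{3}$ is a one-line cancellation. The worked step below is a sketch that takes the three limits as holding jointly, which the text uses implicitly; it introduces no new symbols.

```latex
V^{H1}_{31} - V^{H1}_{32} - \big(V^{H1}_{33} + V^{H1}_{34}\big)
 \Rightarrow \sigma\big(W(r) - W(\tau_1^0)\big)
   - \sigma(r - \tau_1^0)\frac{W(\tau_1)}{\tau_1}
   - \sigma(r - \tau_1^0)\Big(\frac{W(\tau_2^0) - W(\tau_1^0)}{\tau_2^0 - \tau_1^0} - \frac{W(\tau_1)}{\tau_1}\Big)
 = \sigma\Big[W(r) - W(\tau_1^0) - \frac{r - \tau_1^0}{\tau_2^0 - \tau_1^0}\big(W(\tau_2^0) - W(\tau_1^0)\big)\Big].
```

The two terms in $W(\tau_1)/\tau_1$ cancel, so the limit no longer depends on $\tau_1$; squaring and integrating over $r \in [\tau_1^0, \tau_2^0]$ gives the stated expression.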
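Since the coefficient estimator $\hat{\delta}_{3i}$ remains the same in groups 1 and 2, we have
$$
\begin{aligned}
V^{H1}_{4} &= \frac{1}{T}\sum_{s=k_2^0+1}^{T}\Bigg\{\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\Big[\sum_{t=s}^{T}u_{it} - \sum_{t=s}^{T}x_{it}'(\hat{\beta}_{1i} - \beta_i^0) - \sum_{t=s}^{T}x_{it}'(\hat{\delta}_{3i} - \delta_i^0)\Big]\Bigg\}^2 \\
&\Rightarrow \sigma^2\int_{\tau_2^0}^{1}\Big[W(1) - W(r) - \frac{1 - r}{1 - \tau_2^0}\big(W(1) - W(\tau_2^0)\big)\Big]^2 dr.
\end{aligned}
$$
Thus, we can say that
$$
\inf_{(k_1,k_2)\in\Omega(\epsilon)} V_{NT}(k_1,\hat{k},k_2) \le V_{NT}(k_1,\hat{k},k_2^0) = V^{H1}_{1} + V^{H1}_{2} + V^{H1}_{3} + V^{H1}_{4} = O_p(1).
$$
The proof of case (i) of Proposition 4.3 is complete.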
Case (ii). Suppose that $k_1^0 \le \hat{k} \le k_2^0$. In this case, we have
$$
\inf_{(k_1,k_2)\in\Omega(\epsilon)} V_{NT}(k_1,\hat{k},k_2) \le V_{NT}(k_1^0,\hat{k},k_2^0).
$$
We can easily find that the term $V_{NT}(k_1^0,\hat{k},k_2^0)$, estimated by using the true break points, has a finite limiting distribution.
Case (iii). Suppose that $k_2^0 < \hat{k} \le k_2^0 + C_2$. In this case, from (D.60), we have $\hat{k} - k_2^0 = O_p(1)$. Similarly to the proof of case (i), we can show that
$$
\inf_{(k_1,k_2)\in\Omega(\epsilon)} V_{NT}(k_1,\hat{k},k_2) \le V_{NT}(k_1^0,\hat{k},k_2) = O_p(1), \quad \text{for any } k_2 \in \Omega(\epsilon).
$$
Thus, we complete the proof of Proposition 4.3.

Proof of Theorem 4.2. From Proposition 4.1, we have $P(\hat{k} \in [k_1^0 - C_1, k_2^0 + C_2]) \to 1$. Furthermore, for any $\hat{k} \in [k_1^0 - C_1, k_2^0 + C_2]$,
$$
\sup_{k\in\Omega(\epsilon)} US_{NT}(k,\hat{k}) = O_p(NT),
$$
$$
\sup_{(k_1,k_2)\in\Omega(\epsilon)} V_{NT}^{-1}(k_1,\hat{k},k_2) = O_p(1) \ (\text{or } \infty),
$$
from Propositions 4.2 and 4.3. Thus, the proof of Theorem 4.2 is complete.
Table 4.1: Critical values

c_{λ0}    10%       5%        1%
c_{0.1}   43.425    56.822    92.840
c_{0.2}   43.912    57.249    92.341
c_{0.3}   45.501    57.962    93.689
c_{0.4}   45.427    57.997    90.335
c_{0.5}   45.540    57.842    85.984
c_{0.6}   45.250    57.276    90.397
c_{0.7}   46.489    59.175    93.728
c_{0.8}   45.201    59.248    94.886
c_{0.9}   43.515    57.203    92.908
Table 4.2: Size of the test (DGP.1)

T      N      10%      5%       1%
(a) ρ = 0
20     10     0.145    0.089    0.034
       50     0.136    0.074    0.026
       100    0.098    0.053    0.015
50     10     0.086    0.048    0.011
       50     0.076    0.036    0.009
       100    0.063    0.027    0.006
100    10     0.073    0.032    0.005
       50     0.072    0.034    0.006
       100    0.064    0.033    0.004
200    10     0.083    0.037    0.008
       50     0.075    0.032    0.006
       100    0.086    0.041    0.008
(b) ρ = 0.4
20     10     0.226    0.146    0.060
       50     0.234    0.153    0.069
       100    0.231    0.143    0.058
50     10     0.134    0.068    0.022
       50     0.151    0.084    0.028
       100    0.145    0.084    0.024
100    10     0.113    0.063    0.017
       50     0.105    0.058    0.016
       100    0.101    0.055    0.016
200    10     0.107    0.048    0.013
       50     0.091    0.043    0.009
       100    0.091    0.050    0.013
(c) ρ = 0.8
20     10     0.457    0.353    0.203
       50     0.468    0.353    0.201
       100    0.450    0.360    0.206
50     10     0.352    0.262    0.147
       50     0.393    0.299    0.156
       100    0.364    0.277    0.150
100    10     0.252    0.178    0.082
       50     0.246    0.168    0.080
       100    0.246    0.164    0.079
200    10     0.179    0.110    0.038
       50     0.164    0.099    0.034
       100    0.155    0.100    0.036
Table 4.3: Power of the test, DGP.2 (under H1A)

T      N      10%      5%       1%
(a) ρ = 0
20     10     0.149    0.088    0.021
       50     0.586    0.455    0.222
       100    0.846    0.741    0.474
50     10     0.281    0.174    0.046
       50     0.918    0.847    0.614
       100    0.993    0.982    0.911
100    10     0.561    0.425    0.190
       50     0.996    0.980    0.916
       100    1.000    1.000    0.992
(b) ρ = 0.4
20     10     0.304    0.216    0.100
       50     0.803    0.722    0.519
       100    0.965    0.929    0.789
50     10     0.390    0.276    0.121
       50     0.950    0.901    0.742
       100    0.997    0.991    0.946
100    10     0.611    0.491    0.258
       50     0.994    0.985    0.932
       100    1.000    1.000    0.994
(c) ρ = 0.8
20     10     0.745    0.675    0.505
       50     0.995    0.986    0.946
       100    1.000    0.999    0.990
50     10     0.713    0.622    0.452
       50     0.996    0.988    0.934
       100    1.000    1.000    0.992
100    10     0.771    0.683    0.493
       50     0.998    0.993    0.964
       100    1.000    1.000    0.999

Note 1: $k_1^0 = [T/4]$, $k_2^0 = [3T/4]$, $N_1 : N_2 = 5 : 5$.
Table 4.4: Power of the test, DGP.2 (under H1A)

ρ      δ1i, δ2i       10%      5%       1%
0      U(0,0.1)       0.130    0.064    0.012
       U(0.1,0.2)     0.538    0.401    0.167
       U(0.2,0.3)     0.918    0.850    0.611
       U(0.3,0.4)     0.995    0.984    0.919
       U(0.4,0.5)     1.000    0.999    0.986
       U(0.5,0.6)     1.000    1.000    0.998
       U(0.6,0.7)     1.000    1.000    1.000
       U(0.7,0.8)     1.000    1.000    1.000
       U(0.8,0.9)     1.000    1.000    1.000
       U(0.9,1.0)     1.000    1.000    1.000
       U(1.4,1.5)     1.000    1.000    1.000
0.4    U(0,0.1)       0.206    0.138    0.037
       U(0.1,0.2)     0.652    0.540    0.292
       U(0.2,0.3)     0.955    0.913    0.750
       U(0.3,0.4)     0.997    0.992    0.949
       U(0.4,0.5)     1.000    1.000    0.993
       U(0.5,0.6)     1.000    1.000    1.000
       U(0.6,0.7)     1.000    1.000    1.000
       U(0.7,0.8)     1.000    1.000    1.000
       U(0.8,0.9)     1.000    1.000    1.000
       U(0.9,1.0)     1.000    1.000    1.000
       U(1.4,1.5)     1.000    1.000    1.000

Note 1: $T = 50$, $N = 50$.
Note 2: $k_1^0 = [T/4]$, $k_2^0 = [3T/4]$, $N_1 : N_2 = 5 : 5$.
Table 4.5: Power of the test, DGP.2 (under H1A)

k_1^0     k_2^0      10%      5%       1%
[0.2T]    [0.25T]    0.120    0.066    0.018
          [0.3T]     0.325    0.225    0.070
          [0.4T]     0.801    0.695    0.465
          [0.5T]     0.941    0.891    0.734
          [0.6T]     0.955    0.918    0.764
          [0.7T]     0.925    0.875    0.692
          [0.8T]     0.859    0.771    0.558

Note 1: $N = T = 50$, $\rho = 0.4$, $N_1 : N_2 = 5 : 5$.
Table 4.6: Power of the test, DGP.2 (under H1A)

N1 : N2    10%      5%       1%
2:N-2      0.168    0.105    0.037
1:9        0.262    0.187    0.073
2:8        0.546    0.447    0.241
3:7        0.811    0.719    0.500
4:6        0.938    0.888    0.737
5:5        0.978    0.950    0.840

Note 1: $N = T = 50$, $\rho = 0.4$.
Note 2: $k_1^0 = [0.3T]$, $k_2^0 = [0.7T]$.
Table 4.7: Power of the test (under H1A)

              Non-orthogonal changes     (Non-)Orthogonal changes   Orthogonal changes
T      N      10%      5%       1%       10%      5%       1%       10%      5%       1%
(a) ρ = 0
20     20     0.255    0.166    0.051    0.139    0.085    0.024    0.072    0.040    0.009
       50     0.565    0.434    0.209    0.184    0.120    0.050    0.084    0.045    0.014
       100    0.839    0.733    0.515    0.193    0.129    0.056    0.086    0.049    0.007
50     20     0.564    0.423    0.186    0.124    0.072    0.018    0.064    0.029    0.006
       50     0.905    0.840    0.615    0.114    0.063    0.017    0.071    0.041    0.006
       100    0.992    0.979    0.897    0.069    0.035    0.009    0.065    0.028    0.004
100    20     0.822    0.731    0.497    0.103    0.058    0.014    0.073    0.036    0.007
       50     0.994    0.980    0.905    0.081    0.043    0.006    0.077    0.037    0.005
       100    1.000    1.000    0.994    0.084    0.043    0.009    0.073    0.035    0.006
(b) ρ = 0.4
20     20     0.478    0.357    0.181    0.273    0.202    0.088    0.179    0.113    0.043
       50     0.798    0.701    0.480    0.299    0.220    0.107    0.198    0.133    0.052
       100    0.953    0.910    0.775    0.277    0.206    0.111    0.209    0.141    0.052
50     20     0.662    0.550    0.329    0.175    0.106    0.037    0.114    0.065    0.016
       50     0.939    0.893    0.740    0.160    0.094    0.028    0.131    0.076    0.026
       100    0.997    0.990    0.943    0.136    0.080    0.022    0.130    0.077    0.020
100    20     0.844    0.766    0.548    0.128    0.067    0.022    0.113    0.063    0.018
       50     0.994    0.985    0.922    0.117    0.061    0.015    0.101    0.053    0.011
       100    1.000    1.000    0.995    0.117    0.063    0.015    0.105    0.058    0.018
(c) ρ = 0.8
20     20     0.882    0.834    0.698    0.468    0.384    0.229    0.400    0.292    0.154
       50     0.988    0.977    0.924    0.422    0.343    0.202    0.412    0.321    0.180
       100    1.000    1.000    0.991    0.408    0.316    0.184    0.438    0.339    0.179
50     20     0.881    0.825    0.677    0.348    0.272    0.150    0.324    0.247    0.121
       50     0.989    0.977    0.928    0.322    0.240    0.122    0.311    0.226    0.114
       100    1.000    0.998    0.991    0.300    0.224    0.117    0.321    0.236    0.116
100    20     0.930    0.871    0.724    0.229    0.169    0.072    0.208    0.153    0.065
       50     0.999    0.996    0.974    0.231    0.163    0.062    0.209    0.140    0.054
       100    1.000    1.000    0.996    0.234    0.164    0.064    0.209    0.142    0.058

Note 1: $k_1^0 = [T/4]$, $k_2^0 = [3T/4]$, $N_1 : N_2 = 5 : 5$.
Table 4.8: Power of the test (ρ = 0.4 under H2A)

k_1^0     k_2^0      k_3^0     N1:N2:N3    T     N      10%      5%       1%
[T/6]     [3T/6]     [4T/6]    3:3:4       50    10     0.296    0.204    0.080
                                                 50     0.646    0.522    0.284
                                                 100    0.740    0.642    0.445
[0.4T]    [0.5T]     [0.6T]    3:3:4       50    10     0.248    0.165    0.058
                                                 50     0.485    0.381    0.211
                                                 100    0.629    0.506    0.296
[0.2T]    [0.25T]    [0.5T]    3:3:4       50    10     0.342    0.239    0.109
                                                 50     0.835    0.750    0.562
                                                 100    0.964    0.923    0.809
[0.2T]    [0.3T]     [0.8T]    3:3:4       50    10     0.334    0.235    0.087
                                                 50     0.704    0.591    0.363
                                                 100    0.851    0.772    0.583
[0.2T]    [0.5T]     [0.8T]    1:4:5       50    10     0.334    0.224    0.087
                                                 50     0.851    0.758    0.566
                                                 100    0.925    0.866    0.725
Table 4.9: Power of the test (under H3A)

T      N      10%      5%       1%
(a) ρ = 0
20     10     0.147    0.089    0.022
       50     0.548    0.412    0.191
       100    0.654    0.497    0.255
50     10     0.281    0.173    0.050
       50     0.688    0.515    0.247
       100    0.884    0.770    0.474
100    10     0.468    0.329    0.129
       50     0.908    0.815    0.541
       100    0.969    0.895    0.637
(b) ρ = 0.4
20     10     0.344    0.243    0.109
       50     0.725    0.584    0.352
       100    0.842    0.740    0.493
50     10     0.300    0.204    0.076
       50     0.842    0.730    0.477
       100    0.903    0.810    0.538
100    10     0.517    0.392    0.183
       50     0.942    0.874    0.642
       100    0.879    0.761    0.438
(c) ρ = 0.8
20     10     0.785    0.692    0.457
       50     0.879    0.728    0.382
       100    0.980    0.914    0.626
50     10     0.604    0.489    0.314
       50     0.906    0.800    0.508
       100    0.940    0.833    0.485
100    10     0.710    0.621    0.393
       50     0.940    0.864    0.614
       100    0.840    0.656    0.276
Bibliography
[1] Adesanya, O. (2020). Testing and Dating Structural Breaks in Generalized Linear Mul-
tivariate Models for Stock Market Contagion. SSRN
[2] Anatolyev, S., and Kosenok, G. (2018). Sequential testing with uniformly distributed
size. Journal of Time Series Econometrics 10(2),1-22.
[3] Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance
matrix estimation. Econometrica 59, 817-858.
[4] Andrews, D. W. K., and Monahan, J. C. (1992). An improved heteroskedasticity and
autocorrelation consistent covariance matrix estimator. Econometrica 60(4),953-966.
[5] Antoch, J., Hanousek, J., Horvath, L., Huskova, M., and Wang, S. X. (2019). Structural
breaks in panel data: Large number of panels and short length time series. Econometric
Reviews, 38(7), 828-855.
[6] Aue, A., and Horvath, L. (2004). Delay time in sequential detection of change. Statistics
and Probability Letters 67(3),221-231.
[7] Aue, A., Horvath, L., Huskova, M., and Kokoszka, P. (2008a). Testing for changes in
polynomial regression. Bernoulli 14(3),637-660.
[8] Aue, A., Horvath, L., Kokoszka, P. and Steinebach, J. (2008b). Monitoring shifts in
mean: asymptotic normality of stopping times. Test 17(3),515-530.
[9] Aue, A., Horvath, L., Kuhn, M., and Steinebach, J. (2012). On the reaction time of
moving sum detectors. Journal of Statistical Planning and Inference 142(8),2271-2288.
[10] Aue, A., Horvath, L., and Reimherr, M. L. (2009). Delay times of sequential procedures
for multiple time series regression models. Journal of Econometrics 149(2),174-190.
[11] Aue, A., and Kuhn, M. (2008). Extreme value distribution of a recursive-type detector
in a linear model. Extremes 11,135-166.
[12] Bai, J. (1997). Estimation of a Change Point in Multiple Regression Models. Review of
Economics and Statistics, 79(4), 551-563.
[13] Bai, J. (2010). Common breaks in means and variances for panel data. Journal of Econo-
metrics, 157(1), 78-92.
[14] Baltagi, B. H., Feng, Q., and Kao, C. (2016). Estimation of heterogeneous panels with
structural breaks. Journal of Econometrics, 191(1),176-195.
[15] Baltagi, B. H., Kao, C., and Liu, L. (2017). Estimation and identification of change points
in panel models with nonstationary or stationary regressors and error term. Econometric
Reviews, 36(1-3), 85-102.
[16] Billingsley P. (1968). Convergence of probability measures. Wiley. New York.
[17] Brown, R. L., Durbin, J., and Evans, J. M. (1975). Techniques for testing the constancy
of regression relationships over time. Journal of the Royal Statistical Society B, 37, 149-
192.
[18] Carsoule, F., and Franses, P. H. (2003). A note on monitoring time-varying parameters
in an autoregression. Metrika 57(1),51-62.
[19] Chen, B., and Huang, L. (2018). Nonparametric testing for smooth structural changes
in panel data models. Journal of Econometrics, 202(2), 245-267.
[20] Choi, M.D. (1983). Tricks or Treats with the Hilbert matrix. American Mathematical
Monthly 90(5),301-312.
[21] Chu, C.-S.J., Stinchcombe, M., and White, H. (1996). Monitoring structural change.
Econometrica 64(5),1045-1065.
[22] Chu, C.-S.J., and White, H. (1992). A direct test for changing trend. Journal of Business
and Economic Statistics 10(3),289-299.
[23] Claeys, P., and Vašíček, B. (2014). Measuring bilateral spillover and testing contagion
on sovereign bond markets in Europe. Journal of Banking & Finance, 46, 151-165.
[24] Crainiceanu, C. M., and Vogelsang, T. J. (2007). Nonmonotonic power for tests of a mean
shift in a time series. Journal of Statistical Computation and Simulation 77, 457-476.
[25] Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press, Oxford.
[26] De Wachter, S., and Tzavalis, E. (2012). Detection of structural breaks in linear dynamic
panel data models. Computational Statistics & Data Analysis, 56(11), 3020-3034.
[27] Deng, A., and Perron, P. (2008). A non-local perspective on the power properties of the
CUSUM and CUSUM of squares tests for structural change. Journal of Econometrics
142, 212-240.
[28] Doob, J. (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. The Annals
of Mathematical Statistics 20(3),393-403.
[29] Fremdt, S. (2014). Asymptotic distribution of the delay time in Page’s sequential procedure.
Journal of Statistical Planning and Inference 145,74-91.
[30] Fremdt, S. (2015). Page’s sequential procedure for change-point detection in time-series
regression. Statistics: A Journal of Theoretical and Applied Statistics 49(1),128-155.
[31] Garbade, K. (1977). Two methods for examining the stability of regression coefficients.
Journal of the American Statistical Association 72:357, 54-63.
[32] Hidalgo, J., and Schafgans, M. (2017). Inference and testing breaks in large dynamic
panels with strong cross sectional dependence. Journal of Econometrics, 196(2), 259-
274.
[33] Hitotumatu, E. (1988). Cholesky decomposition of the Hilbert matrix. Japan Journal of
Applied Mathematics 5,135-144.
[34] Horvath, L., Kuhn, M., and Steinebach, J. (2008). On the performance of the fluctuation
test for structural change. Sequential Analysis 27(2),126-140.
[35] Horvath, L. (1997). Detection of changes in linear sequences. Annals of the Institute of
Statistical Mathematics 49,271-283.
[36] Horvath, L., and Huskova, M. (2012). Change-point detection in panel data. Journal of
Time Series Analysis 33(4), 631-648.
[37] Horvath, L., Huskova, M., Kokoszka, P., and Steinebach, J. (2004). Monitoring changes
in linear models. Journal of Statistical Planning and Inference 126(1),225-251.
[38] Horvath, L., Huskova, M., Rice, G., and Wang, J. (2017). Asymptotic properties of the
CUSUM estimator for the time of change in linear panel data models. Econometric Theory
33(2), 366-412.
[39] Horvath, L., Kokoszka, P., and Steinebach, J. (2007). On sequential detection of param-
eter changes in linear regression. Statistics and Probability Letters 77(9),885-895.
[40] Huskova, M., and Koubkova, A. (2005). Monitoring jump changes in linear models.
Journal of Statistical Research 39:2, 51-70.
[41] Jiang, P., and Kurozumi, E. (2019). Power properties of the modified CUSUM tests.
Communications in Statistics - Theory and Methods, 48(12), 2962-2981.
[42] Jiang, P., and Kurozumi, E. (2020). Monitoring parameter changes in models with a
trend. Journal of Statistical Planning and Inference, 207, 288-319.
[43] Juhl, T., and Xiao, Z. (2009). Tests for changing mean with monotonic power. Journal
of Econometrics 148, 14-24.
[44] Kejriwal, M. (2009). Tests for a mean shift with good size and monotonic power. Eco-
nomics Letters 102, 78-82.
[45] Kim, D. (2011). Estimating a common deterministic time trend break in large panels
with cross sectional dependence. Journal of Econometrics, 164(2), 310-330.
[46] Kim, D. (2014). Common breaks in time trends for large panel data with a factor struc-
ture. Econometrics Journal, 17(3), 301-337.
[47] Kramer, W., W. Ploberger, and Alt, R. (1988). Testing for Structural Change in Dynamic
Models. Econometrica 56, 1355-1369.
[48] Kuan, C. M. (1998). Tests for changes in models with a polynomial trend. Journal of
Econometrics 84(1),75-91.
[49] Kurozumi, E. (2017). Monitoring parameter constancy with endogenous regressors. Jour-
nal of Time Series Analysis 38(5),791-805.
[50] Lee, S., Lee, Y., and Na, O. (2009). Monitoring distributional changes in autoregressive
models. Communications in Statistics-Theory and Methods 38,2969-2982.
[51] Leisch, F., Hornik, K., and Kuan, C. M. (2000). Monitoring structural changes with the
generalized fluctuation test. Econometric Theory 16(6),835-854.
[52] Li, D., Qian, J., and Su, L. (2016). Panel Data Models With Interactive Fixed Effects and
Multiple Structural Breaks. Journal of the American Statistical Association, 111(516),
1804-1819.
[53] Luger, R. (2001). A modified CUSUM test for orthogonal structural changes. Economics
Letters 73, 301-306.
[54] Lumsdaine, R. L., Okui, R., and Wang, W. (2020). Estimation of Panel Group Structure
Models with Structural Breaks in Group Memberships and Coefficients. SSRN Electronic
Journal.
[55] Na, O., Lee, Y., and Lee, S. (2011). Monitoring parameter change in time series models.
Statistical Methods and Applications 20(2),171-199.
[56] Oka, T., and Perron, P. (2018). Testing for common breaks in a multiple equations
system. Journal of Econometrics, 204, 66-85.
[57] Okui, R., and Wang, W. (2020). Heterogeneous structural breaks in panel data models.
[58] Pauwels, L.L., Chan, F., and Griffoli, T.M. (2012). Testing for structural change in
heterogeneous panels with an application to the Euro’s trade effect. Journal of Time
Series Econometrics, 4:1-33.
[59] Perron, P. (1989). The great crash, the oil price shock, and the unit root hypothesis.
Econometrica 57(6),1361-1401.
[60] Perron, P., and Yabu, T. (2009). Testing for shifts in trend with an integrated or sta-
tionary noise component. Journal of Business and Economic Statistics 27(3),369-396.
[61] Phillips, P.C.B., and Solo, V. (1992). Asymptotics for linear processes. Annals of Statistics,
20:971-1001.
[62] Ploberger, W., and Kramer, W. (1990). The local power of the CUSUM and CUSUM of
squares tests. Econometric Theory 6, 335-347.
[63] Ploberger, W., and Kramer, W. (1992). The CUSUM test with OLS residuals. Econo-
metrica 60, 271-285.
[64] Qi, P., Duan, X., Tian, Z., and Li, F. (2016). Sequential monitoring for changes in models
with a polynomial trend. Communications in Statistics-Simulation and Computation
45(1),222-239.
[65] Qian, J., and Su, L. (2016). Shrinkage estimation of common breaks in panel data models
via adaptive group fused lasso. Journal of Econometrics, 191, 86-109.
[66] Shao, X., and Zhang, X. (2010). Testing for change points in time series. Journal of
the American Statistical Association 105, 1228-1240.
[67] Stohr, C. (2019). Sequential change point procedures based on U-statistics and
the detection of covariance changes in functional data. On Semantic Scholar
(https://www.semanticscholar.org/), DOI:10.25673/13826
[68] Tang, S. M., and MacNeill, I. B. (1993). The effect of serial correlation on tests for
parameter change at unknown time. Annals of Statistics 21, 552-575.
[69] Vogelsang, T. J. (1999). Sources of nonmonotonic power when testing for a shift in mean
of a dynamic time series. Journal of Econometrics 88, 283-299.
[70] Westerlund, J. (2019). Common Breaks in Means for Cross-correlated Fixed-T Panel
Data. Journal of Time Series Analysis, 40(2), 248-255.
[71] Xia, Z. M., Guo, P. J., and Zhao, W. Z. (2011). CUSUM methods for monitoring struc-
tural changes in structural equations. Communications in Statistics-Theory and Methods
40:6, 1109-1123.
[72] Yamazaki, D., and Kurozumi, E. (2015). Improving the finite sample performance of
tests for a shift in mean. Journal of Statistical Planning and Inference 167, 144-173.
[73] Yang, J., and Vogelsang, T. J. (2011). Fixed-b analysis of LM-type tests for a shift in
mean. Econometrics Journal 14, 438-456.