[1] We are grateful to Ivan Pastine for helpful comments and to M.J. Hinich and B. LeBaron for sharing their computer codes with us. The authors may be contacted at ashleyr@vt.edu and amex@vt.edu, respectively. This paper is available as Economics Department Working Paper E98-06 at http://ashleymac.econ.vt.edu/working_papers/E9806.pdf; an MSDOS program implementing the calculations is available at http://ashleymac.econ.vt.edu/working_papers/toolzipd.exe.
Preliminary: Do not quote from this version.
NONLINEAR MODEL SPECIFICATION/DIAGNOSTICS:
INSIGHTS FROM A BATTERY OF NONLINEARITY TESTS [1]
Richard A. Ashley
Department of Economics
Virginia Tech
Douglas M. Patterson
Department of Finance
Virginia Tech
July, 1998
Abstract
We present a comprehensive analysis of the most popular statistical tests used to detect
nonlinear dependence in time series data, including the BDS, Engle Lagrange Multiplier (LM),
McLeod-Li, Tsay, Hinich bicovariance, and Hinich bispectrum tests. The size of each test is
evaluated using serially i.i.d. data drawn from the gaussian, exponential, Student’s t, and
symmetric stable Paretian distributions. The power of each test is evaluated using serially
dependent data generated from Generalized Autoregressive Conditional Heteroskedastic (GARCH),
Self-Exciting Threshold Autoregression (SETAR), Markov switching, quadratic, and cubic
processes. The results presented here are unique because of the wide variety of null and
alternative processes considered, because of the breadth of the study in terms of the tests
considered, and because each test is implemented using both the usual asymptotic theory and the
bootstrap.
The simulations using serially i.i.d. data drawn from the exponential distribution allow us to
directly examine the sensitivity of each test’s actual size to asymmetry in the sample data;
similarly, using data drawn from the Student’s t and Paretian distributions allows us to examine
the impact of leptokurtosis and moment failure in the data. Simulating the tests with the serially
dependent data allows us to quantify the relative power of each test across the various alternative
generating processes. We find that the differential relative power of the tests across the
alternatives is substantial. Therefore we conclude that the application of the full battery of tests to
sample data is potentially informative in terms of identifying the form of the underlying process.
As an example, we apply the tests to data on U.S. real GNP and to data simulated from
several estimated models for real GNP. Our battery of test results on the actual data confirms that
the generating mechanism for real GNP is nonlinear. The resulting pattern of test results
constitutes a new “stylized fact” about real GNP which any putative model for real GNP ought to
reproduce. By simulating data from each of several estimated models for real GNP in the
literature, we are able to estimate the probability that each of these models could generate data
exhibiting the pattern of nonlinearity test results observed with the actual data. In this way, we
find that it is very unlikely that the observed nonlinearity in U.S. real GNP is generated by either a
SETAR model or a Markov switching mechanism.
1. Introduction
Satisfactory methods for detecting linear serial dependence in time series and for
specifying statistically adequate models for such dependence, if detected, have been available for a
long time. The same cannot be said regarding nonlinear serial dependence, however.
This limited progress is certainly not for want of potential applications. Numerous
theoretical macroeconomic models are highly nonlinear, from Hicks’ (1950) elaboration of the
Samuelson multiplier-accelerator theory, to Grandmont’s (1985) overlapping generations model,
to labor hoarding models such as Hall (1990), and to recent models, such as Palm and Pfann
(1997), which are based on an explicit treatment of asymmetric adjustment costs. The
nonlinearity in these models is intrinsic to the macroeconomic hypotheses embodied therein and
essential to the derivation of observed macroeconomic properties, such as asymmetric business
cycles.
Nor is there any dearth of empirical support for such intrinsic nonlinearity, in large part
because a good deal of work has been done on the detection of nonlinear serial dependence.
Numerous tests for nonlinear serial dependence have been proposed and applied. Granger and
Andersen (1978), for example, suggested an examination of the sample correlogram of the
squared time series data, $\mathrm{corr}(X_t^2, X_{t-k}^2)$, leading to the McLeod and Li (1983) test, to the
Engle (1982) LM test, and to many applications {e.g., Bollerslev (1986)} examining financial and
macroeconomic time series for ARCH and/or GARCH effects. In a separate line of research,
Subba Rao (1980) and Hinich (1982) developed tests for nonlinearity based on the observation
that the bispectrum, the double Fourier transformation of the third order moments $E(X_t X_{t-j} X_{t-k})$,
is flat across all frequency pairs if $X_t$’s generating mechanism is linear. Ashley, Patterson, and
Hinich (1986) demonstrated that Hinich’s bispectral test has substantial power to detect the kinds
of nonlinearity generated by common statistical models; Ashley and Patterson (1989) showed that
the bispectral test can detect the kinds of nonlinearity intrinsic to simple theoretical
macroeconomic models (e.g., a stochastic Hicks economy) and, further, that it can detect
nonlinearity in the actual macroeconomy, using monthly data on the U.S. Index of Industrial
Production. Similarly, Hinich and Patterson (1985) used the bispectral test to uncover widespread
nonlinearity in firm-level stock return data. These and other tests are described below.
Thus, a number of tests have been developed and nonlinear generating mechanisms have
been thereby detected in a number of important settings. However, these detections have
provided little guidance as to the form of the underlying nonlinear generating mechanism.[2]

[2] Altug, Ashley and Patterson (1997) provides a partial exception to this conclusion; they use a sequence of nonlinearity tests to demonstrate that the nonlinearity in real GNP is generated in the labor markets rather than in the capital markets or via exogenous technological shocks.

In this paper we present the results of a comprehensive comparison of the major tests for
nonlinearity in time series. Our results are unique because of the wide variety of null and
alternative processes considered, because of the breadth of the study in terms of the tests
considered, and because each test is implemented using both the usual asymptotic theory and the
bootstrap. Simulating the tests with serially i.i.d. data drawn from the exponential distribution
allows us to examine the sensitivity of each test’s empirical size to skewness in the data;
simulating the tests with serially i.i.d. data drawn from the Student’s t(df) distribution allows us to
examine the sensitivity of each test’s empirical size to leptokurtosis. Since stock return data has
been posited to follow a symmetric stable Paretian distribution, simulating the tests with serially
i.i.d. data drawn from this distribution allows us to examine the sensitivity of each test’s empirical
size in an empirically relevant circumstance where the data’s higher order moments do not exist.
Simulating the tests with serially dependent data generated by various processes allows us
to quantify the relative power of each test across the various alternatives. We find that the
differential relative power of the tests across the alternatives is substantial. Therefore we
conclude that the application of the full battery of tests to sample data is potentially informative in
terms of identifying the form of the underlying process.
The remainder of this paper is organized as follows. Section 2 describes the statistical
tests considered; Section 3 describes the models used to generate the simulated data. In Section 4
we present our results on the empirical sizes of the tests; here we draw conclusions as to the
extent to which the actual size of each test is sensitive to characteristics such as asymmetry,
leptokurtosis, or moment failure in the underlying distribution of the data. In Section 5 we
present results on the power of the tests against various alternatives. In this Section we draw
conclusions as to which of the tests is most broadly powerful at detecting nonlinearities of the
forms considered and as to what the various tests can tell us about the form of the nonlinear
generating mechanism for the data.
Finally, we apply the tests to U.S. real GNP data in Section 6. Here we find, as expected,
that the generating mechanism for real GNP is nonlinear. However, the pattern of results across
the different nonlinearity tests is notably different from the patterns observed in the simulated
data. We conclude that this pattern of nonlinearity test results itself constitutes a new “stylized
fact” about U.S. real GNP and investigate whether or not existing estimated models for real GNP
can reproduce this stylized fact. Our results cast doubt on the commonly held notion that real
output is generated by some sort of switching process.

[3] In our implementation p is chosen to minimize the Schwarz (SC) criterion. In contrast to alternative choices (e.g., AIC or FPE) the Schwarz criterion is known to be consistent for AR(p) order determination under the null hypothesis of a linear generating mechanism; see Judge, et al. (1985, p. 246).
[4] Ashley, Patterson, and Hinich (1986) have shown that the test statistic for the Hinich bispectral test is invariant to linear filtering of the data, so the adequacy of the prewhitening model is irrelevant to the validity of this test.

2. Testing for Nonlinearities
In this section, we provide a brief description of the statistical tests implemented below. These
include a test for ARCH effects due to McLeod and Li (1983), the Engle (1982) LM test for
ARCH effects, the BDS test proposed by Brock, Dechert, and Scheinkman (1996), the Tsay
(1986) test for quadratic serial dependence, the bicovariance test due to Hinich (1996) and Hinich
and Patterson (1995), and the Hinich bispectral test proposed in Hinich (1982) and studied in
Ashley, Patterson, and Hinich (1986) and in Ashley and Patterson (1989).
Except for the Hinich bispectral test, these tests all share the same premise: once any
linear serial dependence is removed from the data via a prewhitening model, any remaining serial
dependence must be due to a nonlinear generating mechanism. Thus, each of these procedures is
actually a test of serial independence applied to the (by construction) serially uncorrelated fitting
errors of an AR(p) model for the sample data.[3] This fitting error series, standardized to zero
mean and unit variance, is denoted by $\{x_t\}$ below.[4]
McLeod-Li Test
This test for ARCH effects was proposed by McLeod and Li (1983) based on a suggestion in
Granger and Andersen (1978). It looks at the autocorrelation function of the squares of the
prewhitened data and tests whether $\mathrm{corr}(x_t^2, x_{t-k}^2)$ is non-zero for some k. The autocorrelation
function for the squared residuals $\{x_t^2\}$ is estimated by:
$$\hat{r}(k) = \frac{\sum_{t=k+1}^{T} \left(x_t^2 - \hat{\sigma}^2\right)\left(x_{t-k}^2 - \hat{\sigma}^2\right)}{\sum_{t=1}^{T} \left(x_t^2 - \hat{\sigma}^2\right)^2}$$
where
$$\hat{\sigma}^2 = \sum_{t=1}^{T} \frac{x_t^2}{T}$$
Under the null hypothesis that $x_t$ is an i.i.d. process (and assuming that $E(x_t^8)$ exists) McLeod and
Li (1983) show that, for fixed L:
$$\sqrt{T}\,\hat{r} = \sqrt{T}\,\left(\hat{r}(1), \dots, \hat{r}(L)\right)$$
is asymptotically a multivariate unit normal. Consequently the usual Box-Ljung statistic
$$Q = T(T+2) \sum_{i=1}^{L} \frac{\hat{r}^2(i)}{T - i}$$
is asymptotically $\chi^2(L)$ under the null hypothesis of a linear generating mechanism for the data.
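The McLeod-Li calculation described in this subsection can be sketched in a few lines. This is only an illustrative sketch, not the authors' program: the function name `mcleod_li_q` is hypothetical, the input is assumed to be already prewhitened and standardized, and the comparison with the chi-squared critical value is left to the reader.

```python
import random


def mcleod_li_q(x, max_lag):
    """Box-Ljung statistic computed from autocorrelations of the squared data."""
    T = len(x)
    sq = [v * v for v in x]
    sigma2 = sum(sq) / T                       # sample mean of x_t^2
    dev = [s - sigma2 for s in sq]             # x_t^2 minus its sample mean
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # lag-k sample autocorrelation of the squared series
        r_k = sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
        total += r_k * r_k / (T - k)
    return T * (T + 2) * total


# Illustrative input only: 200 gaussian pseudorandom draws.
random.seed(0)
iid = [random.gauss(0.0, 1.0) for _ in range(200)]
print(mcleod_li_q(iid, max_lag=24))
```

With L = 24, as in the tables below, the printed value would be compared with the upper tail of a chi-squared distribution with 24 degrees of freedom, or with a bootstrap distribution.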
Engle LM Test
This test was proposed by Engle (1982) to detect ARCH disturbances; as
Bollerslev (1985) suggests, it should also have power against GARCH alternatives. As with most
Lagrange Multiplier tests, the test statistic itself is based on the $R^2$ of an auxiliary regression, in
this case:
$$x_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i\, x_{t-i}^2 + \nu_t$$
Under the null hypothesis of a linear generating mechanism for $x_t$, $TR^2$ for this regression is
asymptotically distributed $\chi^2(p)$.
BDS Test
The BDS test is a nonparametric test for serial independence based on the correlation
integral of the scalar series, $\{x_t\}$. For embedding dimension m, let $\{x_t^m\}$ denote the sequence of
m-histories generated by $\{x_t\}$:
$$x_t^m = \left(x_t, \dots, x_{t+m-1}\right)$$
Then the correlation integral $C_{m,T}(\epsilon)$ for a realization of length T is given by:
$$C_{m,T}(\epsilon) = \sum_{t<s} I_\epsilon\left(x_t^m, x_s^m\right) \frac{2}{T_m (T_m - 1)}$$
where $T_m = T - (m - 1)$ and $I_\epsilon(x_t^m, x_s^m)$ is an indicator function which equals one if the sup norm
$\| x_t^m - x_s^m \| < \epsilon$ and equals 0 otherwise. Brock, Dechert, and Scheinkman (1996) exploit the
asymptotic normality of $C_{m,T}(\epsilon)$ under the null hypothesis that $\{x_t\}$ is an i.i.d. process to obtain a
test statistic which asymptotically converges to a unit normal.
Tsay Test
The Tsay (1986) test explicitly looks for quadratic serial dependence.
Let $\hat{z}_t$ denote the projection of $z_t$ on the subspace orthogonal to $x_{t-1}, \dots, x_{t-k}$, i.e., the
residuals from a regression of $z_t$ on $x_{t-1}, \dots, x_{t-k}$. And let the K = k(k-1)/2 column vectors $V_1 \dots V_K$
contain all of the possible crossproducts of the form $x_{t-i} x_{t-j}$. Thus, $v_{t,1} = x_{t-1}^2$; $v_{t,2} = x_{t-1} x_{t-2}$;
$v_{t,3} = x_{t-1} x_{t-3}$; $v_{t,k+1} = x_{t-2} x_{t-3}$; $v_{t,k+2} = x_{t-2} x_{t-4}$, and so forth.
Then estimate $\gamma_1 \dots \gamma_K$ by applying OLS to the regression equation
$$x_t = \gamma_0 + \sum_{i=1}^{K} \gamma_i\, \hat{v}_{t,i} + \eta_t.$$
The Tsay test statistic is then just the usual F statistic for testing the null hypothesis that $\gamma_1 \dots \gamma_K$
are all zero.
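For the BDS test described earlier in this section, the correlation integral is the one ingredient that is easy to misindex, so a small sketch may help. It follows the definition given above (m-histories, sup norm, strict inequality); the function name `correlation_integral` is hypothetical and the quadratic-time double loop is chosen for clarity, not speed.

```python
def correlation_integral(x, m, eps):
    """Share of pairs of m-histories whose sup-norm distance is below eps."""
    T = len(x)
    Tm = T - (m - 1)                            # number of m-histories
    histories = [x[t:t + m] for t in range(Tm)]
    close = 0
    for t in range(Tm):
        for s in range(t + 1, Tm):              # each pair t < s counted once
            sup = max(abs(a - b) for a, b in zip(histories[t], histories[s]))
            if sup < eps:
                close += 1
    return 2.0 * close / (Tm * (Tm - 1))


# Tiny hand-checkable example: four 2-histories, two matching pairs.
print(correlation_integral([0, 1, 0, 1, 0], m=2, eps=0.5))
```

The BDS statistic itself is built from the difference between the m-dimensional correlation integral and the m-th power of the one-dimensional one, together with a variance estimate that is not reproduced in this sketch.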
Hinich Bicovariance Test
This test assumes that $\{x_t\}$ is a realization from a third-order stationary stochastic process
and tests for serial independence using the sample bicovariances of the data. The (r,s) sample
bicovariance is defined as:
$$C_3(r,s) = (T - s)^{-1} \sum_{t=1}^{T-s} x_t\, x_{t+r}\, x_{t+s}$$
Under the null hypothesis that $\{x_t\}$ is an i.i.d. process, Hinich and Patterson (1995) show that, for
$R < T^{.5}$,
$$X_3 = \sum_{s=2}^{R} \sum_{r=1}^{s-1} \left[ (T - s)^{.5}\, C_3(r,s) \right]^2$$
is asymptotically distributed $\chi^2(R[R-1]/2)$; they recommend using $R = T^{.4}$ since they find that the
power of the test declines for smaller values of R.
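A minimal sketch of the bicovariance calculation follows, assuming the double sum runs over all lag pairs with 1 <= r < s <= R, each squared bicovariance being weighted by (T - s). The function names are hypothetical and the input is assumed to be prewhitened and standardized.

```python
def bicovariance(x, r, s):
    """Sample (r, s) bicovariance: average of x_t * x_{t+r} * x_{t+s}."""
    T = len(x)
    return sum(x[t] * x[t + r] * x[t + s] for t in range(T - s)) / (T - s)


def bicovariance_statistic(x, R):
    """Sum over lag pairs 1 <= r < s <= R of (T - s) * C3(r, s)^2."""
    T = len(x)
    total = 0.0
    for s in range(2, R + 1):
        for r in range(1, s):
            total += (T - s) * bicovariance(x, r, s) ** 2
    return total


# Number of lag pairs entering the sum when R = 8.
print(sum(1 for s in range(2, 9) for r in range(1, s)))
# The recommended R = T^.4, rounded, for T = 200.
print(round(200 ** 0.4))
```

For T = 200 the recommended value rounds to 8, which agrees with the R = 8 shown in most of the table headings below.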
Hinich Bispectral Test
Suppose that {y(t)}, the series of interest, is a third-order stationary time series with, for
expositional convenience, E[y(t)] = 0. The series {y(t)} might be serially correlated, in which
case it is distinct from the prewhitened fitting error series denoted {x(t)} above. Letting $c_{yyy}(r,s)$
denote the third order cumulant function for {y(t)},
$$c_{yyy}(r,s) = E\left[\, y(t)\, y(t+r)\, y(t+s) \,\right],$$
the bispectrum of {y(t)} at frequency pair $(f_1, f_2)$ is its (double) Fourier transform:
$$B_y(f_1, f_2) = \sum_{r=-\infty}^{\infty} \sum_{s=-\infty}^{\infty} c_{yyy}(r,s)\, \exp\left[ -i 2\pi (f_1 r + f_2 s) \right]$$
$B_y(f_1, f_2)$ is a spatially periodic function of $(f_1, f_2)$, whose principal domain is the triangular set
$S = \{ 0 < f_1 < ½,\ f_2 < f_1,\ 2 f_1 + f_2 < 1 \}$.[5]
The generating mechanism for {y(t)} is linear if and only if it can be expressed as
$$y(t) = \sum_{n=0}^{\infty} a(n)\, u(t-n),$$
where {u(t)} is a serially i.i.d. process and the weights {a(n)} are fixed. Letting $S_y(f)$ denote the
spectrum of {y(t)} at frequency f and assuming that
$$\sum_{n=0}^{\infty} |a(n)| < \infty,$$
Ashley, Patterson, and Hinich (1986) show that the squared skewness function,
$$\Psi^2(f_1, f_2) \equiv \frac{|B_y(f_1, f_2)|^2}{S_y(f_1)\, S_y(f_2)\, S_y(f_1 + f_2)}$$
is a constant for all frequency pairs $(f_1, f_2)$ in S whenever the process generating {y(t)} is linear.
Consequently, under the null hypothesis of a linear generating mechanism, sample
estimates of $\Psi^2(f_1, f_2)$ for different frequency pairs will differ from one another no more than one
would expect due to sampling error; this is the basis for the Hinich bispectral test.
Using an N-sample of data {y(0), y(1), ... y(N-1)} and letting Y(j/N) denote
$$Y\!\left(\tfrac{j}{N}\right) = \sum_{t=0}^{N-1} y(t)\, \exp\left[ -\frac{i 2\pi j t}{N} \right],$$
$F_x(j,k)$ provides an unbiased estimate of $B_y(2\pi j/N, 2\pi k/N)$, where
$$F_x(j,k) = X\!\left(\tfrac{j}{N}\right) X\!\left(\tfrac{k}{N}\right) X^{*}\!\left(\tfrac{j+k}{N}\right).$$

[5] See Brillinger and Rosenblatt (1967) for a rigorous treatment of the bispectrum.

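The unsmoothed triple product can be sketched directly from the discrete Fourier transform. The sketch assumes that the transform appearing in the triple product is the same transform of the data defined just before it, and it applies no normalization beyond what the displayed formulas show; the function names are hypothetical and the direct summation is meant for illustration on short series only.

```python
import cmath


def dft(y, j):
    """Discrete Fourier transform of y at frequency j/N."""
    N = len(y)
    return sum(y[t] * cmath.exp(-2j * cmath.pi * j * t / N) for t in range(N))


def raw_bispectrum(y, j, k):
    """Unsmoothed triple product at the frequency pair (j/N, k/N)."""
    return dft(y, j) * dft(y, k) * dft(y, j + k).conjugate()


# A unit impulse has a flat transform, so every triple product equals one.
print(raw_bispectrum([1, 0, 0, 0], 1, 2))
```

As the text goes on to explain, these raw values must still be averaged over neighbouring frequency pairs before they can be used in the test.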
However, just as the sample periodogram must be smoothed (averaged over its values at
adjacent frequencies) in order to provide a consistent estimator of the spectrum, $S_y(f)$, $F_x(j,k)$
must be smoothed to obtain a consistent estimator of $B_y(2\pi j/N, 2\pi k/N)$. Hinich (1982) shows
that, properly averaged over a square of $M^2$ adjacent values of $F_x(j,k)$, this smoothed estimator of
the bispectrum yields an estimator of $\Psi^2(f_1, f_2)$ which is asymptotically distributed as a noncentral
chi square variate with 2 degrees of freedom and a noncentrality parameter proportional to
$\Psi^2(f_1, f_2)$. Under the null hypothesis of a linear generating mechanism, this estimator of $\Psi^2(f_1, f_2)$
should have a dispersion consistent with this noncentral chi squared distribution; this proposition
is tested using standard results (e.g., David (1970)) on the asymptotic distribution of the
interquartile range of a sample drawn from a given distribution.[6]

[6] See Hinich (1982) and Ashley, Patterson, and Hinich (1986) for details. Based on simulation results in the latter paper, M is set to the integer closest to $N^{.55}$ in the calculations reported below.

[7] Uniform and normal deviates were obtained using the RAN1 and GASDEV routines given by Press, et al. (1986), respectively. The adequacy of these pseudorandom deviates for the present purpose was confirmed by comparing our results to those obtained using IMSL subroutines and also by observing that our results converge to those obtained from asymptotic theory when the sample size is large. Student’s t deviates were generated from its definition, using normal deviates to generate a $\chi^2$ deviate, etc.

3. Data Generation Models
Each of the data generation models was simulated using serially i.i.d. innovations. These
data were generated from the unit normal distribution and from three additional distributions.
Data was generated from the Student’s t distribution with 5 degrees of freedom to simulate the
effects of a leptokurtic (fat-tailed) distribution; data was generated from the exponential
distribution to simulate the effects of an asymmetric innovation distribution. And, to simulate the
effect of an innovation distribution for which finite second (and higher) moments do not exist, we
also used the exact algorithm of Kanter and Steiger (1974) to generate symmetric stable Paretian
innovations.[7]
In a (perhaps quixotic) attempt to span the space of nonlinear processes considered in the
literature, the seven models given in Table 1 were simulated. The first nonlinear process listed is
the “pure GARCH” process. Although many nonlinear processes display conditional
heteroskedasticity, the GARCH(1,1) model simulated here (like all ARCH/GARCH processes) is
“pure” in that it is a martingale difference: by construction, only its variance is serially dependent.
The next process listed is a generic switching regression. A number of switching models have
been considered in the literature, e.g., Tong and Lim (1980), Hamilton (1989), and Teräsvirta
and Anderson (1992). Here, we have simulated a typical threshold autoregressive (SETAR)
model from Tong and Lim (1980) and a simple Markov switching model in the spirit of Hamilton
(1989).[8] Finally, we consider several models suggested by the usual Volterra expansion. Two
quadratic models are considered, so as to include both a forecastable process and a martingale
process. Two cubic models were also considered. The “pure cubic” is included because of its
asymmetry; the “bicubic” is included so as to examine the sensitivity of the tests to more general
third order terms.

[8] Several more sophisticated Markov switching models, estimated by Lam (1997) using U.S. real GNP data, are considered in Section 6.

Table 1. Summary of Data Generating Processes Considered [9]

Serially i.i.d. noise model    $x_t = \epsilon_t$
Pure GARCH(1,1) model          $x_t = (h_t)^{.5}\, \epsilon_t$
                               $h_t = .011 + .12\, (x_{t-1})^2 + .85\, h_{t-1}$
Switching Models:
  SETAR                        $x_t = -.5\, x_{t-1} + \epsilon_t$ if $x_{t-1} < 1$
                               $x_t = .4\, x_{t-1} + \epsilon_t$ otherwise
  Two State Markov             $x_t = -.5\, x_{t-1} + \epsilon_t$ if in state 1
                               $x_t = .4\, x_{t-1} + \epsilon_t$ if in state 2
                               (Remain in state with probability .90)
Quadratic models:
  martingale                   $x_t = \epsilon_t + .6\, \epsilon_t\, [\epsilon_{t-1} + .6\, \epsilon_{t-2} + .6^2\, \epsilon_{t-3} + .6^3\, \epsilon_{t-4} + .6^4\, \epsilon_{t-5}]$
  non-martingale               $x_t = \epsilon_t + .6\, \epsilon_{t-1}\, [\epsilon_{t-2} + .6\, \epsilon_{t-3} + .6^2\, \epsilon_{t-4} + .6^3\, \epsilon_{t-5} + .6^4\, \epsilon_{t-6}]$
Cubic models:
  pure cubic                   $x_t = \epsilon_t + .2\, [\epsilon_{t-1}]^3$
  bicubic                      $x_t = \epsilon_t + .6\, \epsilon_{t-1}\, [(\epsilon_{t-2})^2 + .8\, (\epsilon_{t-3})^2 + .8^2\, (\epsilon_{t-4})^2 + .8^3\, (\epsilon_{t-5})^2]$

[9] $\epsilon_t$ is iid(0,1) in all models. As noted in the text, the distributions for $\epsilon_t$ considered were: gaussian, Student’s t, exponential, and symmetric Paretian.
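Two of the processes in Table 1 can be simulated as follows. This is a minimal sketch assuming gaussian innovations; the burn-in length, the starting values, and the function names are illustrative assumptions and are not taken from the paper.

```python
import random


def simulate_garch(n, rng, burn=200):
    """Pure GARCH(1,1): x_t = sqrt(h_t) * e_t, h_t = .011 + .12 x_{t-1}^2 + .85 h_{t-1}."""
    h = 0.011 / (1.0 - 0.12 - 0.85)            # assumed start: unconditional variance
    x_prev, out = 0.0, []
    for _ in range(n + burn):
        h = 0.011 + 0.12 * x_prev ** 2 + 0.85 * h
        x_prev = h ** 0.5 * rng.gauss(0.0, 1.0)
        out.append(x_prev)
    return out[burn:]                           # discard the assumed burn-in


def simulate_setar(n, rng, burn=200):
    """SETAR: coefficient -.5 when x_{t-1} < 1, coefficient .4 otherwise."""
    x_prev, out = 0.0, []
    for _ in range(n + burn):
        coef = -0.5 if x_prev < 1.0 else 0.4
        x_prev = coef * x_prev + rng.gauss(0.0, 1.0)
        out.append(x_prev)
    return out[burn:]


rng = random.Random(12345)                      # arbitrary seed for reproducibility
garch_sample = simulate_garch(200, rng)
setar_sample = simulate_setar(200, rng)
print(len(garch_sample), len(setar_sample))
```

Each call returns one sample of 200 observations, the sample length used throughout the simulations reported below.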
4. An Examination of the Sensitivity of Empirical Test Size to the Distribution of the Data
Like most econometric procedures, the tests described above are only asymptotically
justified. Particular concern has been expressed about the validity of the BDS test for reasonable
sample sizes (e.g., Ramsey and Yuan (1987)); this concern is, to some degree, addressed in Brock, et al.
(1991). More recently, de Lima (1997) has considered the behavior of a number of nonlinearity
tests where the moment restriction assumptions underlying the asymptotic distributions of these
tests are not satisfied, finding particular problems in situations involving leptokurtic (heavy-tailed)
data.
Because we share these concerns, we routinely bootstrap the significance levels of all the
tests used here, as well as computing significance levels based on asymptotic theory. This is very
straightforward. After pre-whitening, so that the data is serially i.i.d. under the null hypothesis of
a linear generating mechanism, we draw 1000 N-samples at random from the empirical
distribution of the observed N-sample of data. The bootstrap significance level for a given test is
then just the fraction of these 1000 “new” N-samples for which the test statistic exceeds that
observed in the sample data. It is simple enough to confirm that 1000 bootstrap replications is
sufficient by merely observing that the results are invariant to increasing this number; it is
distinctly less clear that N itself is sufficiently large: after all, the pre-whitening procedure and
the bootstrap are themselves only asymptotically justified.
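The bootstrap significance level described above can be sketched as follows. The resampling scheme follows the description in the text (N-samples drawn with replacement from the observed N-sample, and the fraction of test statistics exceeding the observed one). The statistic used here, the lag-1 autocorrelation of the squared data, is only a simple stand-in for the tests of Section 2, and all names are hypothetical.

```python
import random


def lag1_autocorr_of_squares(x):
    """Stand-in test statistic: lag-1 autocorrelation of the squared data."""
    sq = [v * v for v in x]
    mean = sum(sq) / len(sq)
    dev = [s - mean for s in sq]
    return sum(dev[t] * dev[t - 1] for t in range(1, len(dev))) / sum(d * d for d in dev)


def bootstrap_significance(x, statistic, n_boot=1000, seed=0):
    """Fraction of resampled N-samples whose statistic exceeds the observed one."""
    rng = random.Random(seed)                   # fixed seed only for reproducibility
    n = len(x)
    observed = statistic(x)
    exceed = 0
    for _ in range(n_boot):
        # Draw an N-sample with replacement from the empirical distribution.
        resample = [x[rng.randrange(n)] for _ in range(n)]
        if statistic(resample) > observed:
            exceed += 1
    return exceed / n_boot


# Toy series with volatility clustering: a calm half, then a turbulent half.
toy = [0.1 * (-1) ** t for t in range(100)] + [3.0 * (-1) ** t for t in range(100)]
print(bootstrap_significance(toy, lag1_autocorr_of_squares))
```

Because resampling destroys the ordering of the data, the resampled statistics approximate the distribution of the statistic under serial independence, which is why a small returned fraction counts as evidence against the null.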
Consequently, we examined the actual size of each test (using both asymptotic theory and
the bootstrap) using samples of 200 serially i.i.d. variates generated from each of four
distributions: gaussian, exponential, Student’s t with 5 degrees of freedom, and the symmetric
stable Paretian distribution with $\alpha$ = 1.93.[10] The exponential distribution is quite asymmetric.
Both of the latter two distributions are heavy-tailed, to the point where the symmetric stable
Paretian distribution with this index value has infinite variance.
The results of these calculations for N = 200 are given in Table 2. We observe that the
concerns about the small-sample validity of the tests (in particular, the BDS test) are justified, at
least at this sample length. In contrast, the bootstrap results appear to be satisfactory for all of the
tests, except for the Hinich bispectral test with the heavy-tailed distributions.[11] We conclude that
it is reasonable to proceed using the bootstrapped tests for samples of roughly this length or larger
without further concern about the form of the data’s distribution.

[10] Symmetric stable Paretian variates were simulated using the exact algorithm given by Kanter and Steiger (1974). The de Lima (1997) article arbitrarily considers $\alpha$ = 1.50; we chose $\alpha$ = 1.93 because this is the value Fama (1965) estimates for U.S. stock data.
[11] The fact that one of the McLeod-Li bootstrap size estimates lies outside the 95% confidence interval around .05 is inconsequential in view of the number of estimates made. The bootstrap itself is only asymptotically justified; apparently the bispectral test is so ill-behaved with exponential or Paretian data that samples larger than N = 200 are necessary.

Table 2. Empirical Size of 5% Tests [12]
Serially i.i.d. Data; 200 Observations

                            McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                            L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 8   M = 11
Bootstrap
  Gaussian                  .050       .059      .042    .049    .062    .064    .056    .054
  Student’s t(5)            .054       .056      .052    .050    .053    .044    .052    .063
  Exponential               .040       .053      .055    .053    .055    .055    .050    .006*
  Paretian $\alpha$ = 1.93  .039       .052      .051    .052    .048    .050    .035*   .027*
Asymptotic Theory
  Gaussian                  .050       .052      .053    .053    .072*   .087*   .099*   .102*
  Student’s t(5)            .060       .040      .055    .089*   .077*   .085*   .032*   .122*
  Exponential               .050       .057      .061    .065*   .065*   .066*   .088*   .630*
  Paretian $\alpha$ = 1.93  .039       .036      .052    .054    .070*   .074*   .078*   .236*

[12] Results significantly different from .05 are marked with an asterisk. All figures quoted are based on 1000 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. Under the null hypothesis that the actual size is .05, an (asymptotic) 95% confidence interval for these estimates is (.036, .064). The parameters L, p, m, k, R, and M are defined in Section 2, where each test is discussed.
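The confidence interval quoted in the note to Table 2 can be reproduced with a short calculation, assuming it comes from the usual normal approximation to a binomial proportion based on 1000 generated samples:

$$0.05 \pm 1.96 \sqrt{\frac{0.05 \times 0.95}{1000}} \approx 0.05 \pm 0.0135, \quad \text{i.e. roughly } (0.036,\ 0.064).$$

Under that assumption an entry such as .035 falls just outside the interval and an entry such as .064 falls just inside it, which matches the pattern of asterisks in the table.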
5. The Differential Power of the Tests Across the Alternatives:
Implications for Model Identification
In this Section we discuss our estimates of the power of each test against the various
alternative data generating processes discussed in Section 3. Our goal is to answer the following
questions:
1. Do one or more of the tests have high power against all of the alternative processes?
2. Is the pattern of power estimates across the alternative processes similar for all of the tests?
3. For a given generating process, are the results of all of the tests highly correlated across the
simulations?
If one of the tests dominates all the rest in terms of power, then this test is the one to use
as a “nonlinearity screening test” to, for example, routinely check the fitting errors of a proposed
model for a time series. On the other hand, since such a test has relatively high power against all
of the alternatives, it conveys very little information as to which kind of nonlinear model is
appropriate.
Paradoxically, such identifying information is only obtainable from tests whose
performance is uneven across the alternatives. In this context, a test conveys identifying
information to the extent that it is either particularly powerful or particularly unpowerful against a
limited subset of the alternatives.
Finally, we sought to examine the correlations between the results of the tests for a given
alternative. Here, again paradoxically, what is useful is a lack of consistency. For example, we
generated 250 samples of 200 observations from the SETAR model described in Section 3. If test
#1 rejects the null hypothesis of linearity at the 5% level for, say, 200 of these samples, then it has
higher power than test #2 which only rejects the null for 150 samples. But were most of these
150 samples among the 200 samples for which test #1 rejected, or not? If both tests reject over
basically the same samples, then test #2 is simply an inferior alternative to test #1. In contrast, if
this “rejection overlap” is small, then test #2 is sensitive to a different aspect of the data set than
test #1 and provides separately useful information as to whether or not to reject the null
hypothesis; in this case it is worthwhile to do both tests and perhaps combine them into a
portmanteau test.
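The “rejection overlap” idea can be made concrete with a small crosstabulation of two tests' significance levels across the same replications. This is an illustrative sketch: the function name is hypothetical, and treating a significance level strictly below the nominal level as a rejection is an assumption.

```python
def rejection_crosstab(pvals_1, pvals_2, level=0.05):
    """Fractions of replications rejected by both, one only, or neither test."""
    n = len(pvals_1)
    counts = {"both": 0, "only_1": 0, "only_2": 0, "neither": 0}
    for p1, p2 in zip(pvals_1, pvals_2):
        r1, r2 = p1 < level, p2 < level         # assumed rejection rule
        if r1 and r2:
            counts["both"] += 1
        elif r1:
            counts["only_1"] += 1
        elif r2:
            counts["only_2"] += 1
        else:
            counts["neither"] += 1
    return {key: value / n for key, value in counts.items()}


# Four made-up replications, one falling in each cell of the crosstabulation.
print(rejection_crosstab([0.01, 0.01, 0.20, 0.50], [0.02, 0.30, 0.03, 0.60]))
```

A large "only_2" share relative to the power of test #2 would indicate that the second test is picking up something the first one misses.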
Our estimated power results are given in Table 3 and in Tables 5 to 11 below. Table 3
summarizes the results for all seven generating processes discussed in Section 3, all driven by
gaussian innovations. Each of Tables 5 to 11 focuses on one generating process and compares
the power of the tests across the four innovation distributions considered.
No single test dominates all the others across all seven alternative generating processes.
However, the BDS test clearly stands out in terms of overall power against a variety of
alternatives: it has distinctly the highest power for the bicubic and quadratic processes and is a
close second for the GARCH and pure cubic processes. For the SETAR process the Tsay test
stands out, but even there the BDS test still exhibits reasonable power. We conclude that the BDS
test is the best test of this group for use as a “nonlinearity screening test.”
On the other hand, this same consistently high power across the alternatives also implies
that the BDS test conveys very little information as to what kind of nonlinear process generated
the data. Here it is inconsistent power against the alternatives that is useful. In this context the
Tsay test stands out as a possible marker for SETAR models in particular and for switching
models generally. The results given in Table 6 indicate that this result holds up across all four
innovation distributions but, obviously, this result needs to be confirmed across a variety of
different SETAR and other switching models.[13]

[13] This result is confirmed using simulated data from the Potter (1995) SETAR model for U.S. real GNP; see Section 6 below.

Next we turn to the third question raised at the beginning of this Section. Generally
speaking, there appear to be few complementarities among the tests: when a particular test’s
power exceeds that of another against a given alternative, we typically find that the less powerful
test is rejecting the null hypothesis over basically the same sample replications in which the more
powerful test rejects the null also. However, the quadratic models with gaussian innovations
provided some exceptions to this result for the Bicovariance, Tsay, and Engle LM tests. For
example, a crosstabulation of the Bicovariance vs. Tsay test results for the non-martingale
quadratic process is given in Table 4 below. Clearly, neither test is particularly effective in this
instance compared to, say, the BDS test, but the Bicovariance test is rejecting on 27 of the 110
replications that the Tsay test “misses” and the Tsay test is rejecting on 47 of the 130 replications
that the Bicovariance test “misses.” However, Figure 1, a crossplot of all 250 generated
significance levels for these tests, shows that the results of these two tests are still rather highly
correlated. Consequently, we conclude that construction of portmanteau tests is probably not
worth pursuing.
Table 3. Power Estimates of 5% Tests [14]
Gaussian Innovations; 200 Observations

                            McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                            L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 8   M = 11
GARCH                       .75        .72       .63     .76     .82     .43     .74     .32
Switching Models
  SETAR                     .13        .16       .71     .66     .60     .85     .14     .23
  Markov                    .16        .33       .57     .54     .51     .07     .15     .07
Quadratic
  martingale                .24        .40       .86     .90     .90     .34     .40     .25
  non-martingale            .51        .68       .73     .81     .82     .85     .84     .26
Cubic
  pure cubic                .44        .77       .77     .76     .71     .26     .36     .06
  bicubic                   .65        .82       .96     1.00    1.00    .66     .76     .24

[14] All figures quoted are based on 250 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. Data generated from the cubic models were serially correlated, so the bootstrap was, in these cases, applied to the residuals from a prewhitening model. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed. BDS test results were calculated for $\epsilon$ equal to .5, 1, and 2 standard deviations; for brevity (and without much loss of information) results are quoted only for $\epsilon$ = 1. The generating models (GARCH, SETAR, etc.) are discussed in Section 3.
Table 4
Bicovariance Test Versus Tsay Test Crosstabulation
(Quadratic Non-Martingale Model with Gaussian Innovations)

Power for 5% Bicovariance Test                     .48
Power for 5% Tsay Test                             .56
Fraction of replications both tests reject null    .37
Fraction Bicovariance test alone rejects           .11
Fraction Tsay test alone rejects                   .19
Fraction neither test rejects                      .33
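The replication counts quoted in the text and the fractions in Table 4 can be reconciled with a little arithmetic, assuming 250 replications and table entries rounded to two decimals:

$$
\begin{aligned}
\text{Tsay rejects: } & 250 - 110 = 140, & 140/250 &= .56 \\
\text{Bicovariance rejects: } & 250 - 130 = 120, & 120/250 &= .48 \\
\text{both reject: } & 120 - 27 = 140 - 47 = 93, & 93/250 &\approx .37 \\
\text{Bicovariance alone: } & 27, & 27/250 &\approx .11 \\
\text{Tsay alone: } & 47, & 47/250 &\approx .19 \\
\text{neither rejects: } & 250 - 93 - 27 - 47 = 83, & 83/250 &\approx .33
\end{aligned}
$$

The four cells sum to 250, and the two routes to the "both reject" count agree, so the crosstabulation is internally consistent.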
Figure 1 (footnote 15). Transformed Significance Levels, Quadratic Non-Martingale Data Generation Model.
[Scatter plot. Horizontal axis: Tsay Test Sig. Level (ticks from -3 to 4). Vertical axis: Bicovariance Test Sig. Level (ticks from -2 to 4).]
Footnote 15: The test significance levels in this figure are transformed with an inverse gaussian c.d.f. to spread the data out and make the graph more interpretable; e.g., the results in Table 4 correspond to the points in this figure for which one or both transformed significance levels exceeds two. The “bandedness” discernible at the upper and rightmost margins of this figure is an artifact due to the finite number (1000) of bootstrap replications done for each test.
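The transformation used for Figure 1, and the source of the "bandedness", can be illustrated with a short Python sketch. The direction of the transformation (the inverse standard normal c.d.f. evaluated at one minus the significance level, so that strong rejections map to large positive values) is an assumption, as are the function name and the half-step clipping used to keep levels of exactly 0 or 1 finite; the paper does not spell out these details.

```python
from statistics import NormalDist


def transform_sig_level(p, n_boot=1000):
    """Map a bootstrap significance level onto the standard normal scale."""
    # With n_boot replications, p is a multiple of 1/n_boot; clip by half a
    # step so that p = 0 or p = 1 does not map to an infinite value
    # (an illustrative convention, not taken from the paper).
    half_step = 0.5 / n_boot
    clipped = min(max(p, half_step), 1.0 - half_step)
    return NormalDist().inv_cdf(1.0 - clipped)


# The smallest attainable levels are 0, 1/1000, 2/1000, ...; their transforms
# are well separated, which produces visible bands at the plot margins.
bands = [transform_sig_level(k / 1000) for k in range(4)]
```

Because only a handful of distinct significance levels are possible near zero, the corresponding transformed values are spaced far apart relative to the rest of the plot, which is consistent with the banding described in footnote 15.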
Table 5. Empirical Power of 5% Tests (footnote 16)
Data generated from GARCH(1,1) model, 200 Observations (footnote 17)
                      McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                      L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 5   M = 11
Gaussian              .75        .72       .63     .76     .82     .43     .74     .32
Student’s t(5)        .47        .60       .57     .68     .77     .23     .46     --
Exponential           .78        .71       .93     .97     .98     .79     .93     --
Paretian α = 1.93     .91        .94       .97     .98     .98     .97     .98     --
(-- : entry not reported in this version.)
Footnote 16: All figures quoted are based on 250 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.
Footnote 17: See Section 3, Table 1.
Table 6. Empirical Power of 5% Tests (footnote 18)
Data generated from SETAR model, 200 Observations (footnote 19)
                      McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                      L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 8   M = 11
Gaussian              .13        .16       .71     .66     .60     .85     .14     .23
Student’s t(5)        .07        .16       .67     .61     .59     .85     .08     .19
Exponential           .08        .14       .92     .97     .98     .83     .15     .07
Paretian α = 1.93     .06        .16       .57     .49     .41     .66     .16     .12
Footnote 18: All figures quoted are based on 250 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.
Footnote 19: See Section 3, Table 1.
Table 7. Empirical Power of 5% Tests (footnote 20)
Data generated from Two State Markov model, 200 Observations (footnote 21)
                      McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                      L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 8   M = 11
Gaussian              .16        .33       .57     .54     .51     .07     .15     .07
Student’s t(5)        .17        .40       .58     .56     .50     .10     .16     --
Exponential           .24        .44       .89     .92     .89     .30     .36     --
Paretian α = 1.93     .15        .32       .70     .65     .60     .14     .24     --
(-- : entry not reported in this version.)
Footnote 20: All figures quoted are based on 250 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.
Footnote 21: See Section 3, Table 1.
Table 8. Empirical Power of 5% Tests (footnote 22)
Data generated from Quadratic Martingale model, 200 Observations (footnote 23)
                      McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                      L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 8   M = 11
Gaussian              .24        .40       .86     .90     .90     .34     .40     .25
Student’s t(5)        (no entries reported)
Exponential           (no entries reported)
Paretian α = 1.93     (no entries reported)
Footnote 22: All figures quoted are based on 250 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.
Footnote 23: See Section 3, Table 1.
Table 9. Empirical Power of 5% Tests (footnote 24)
Data generated from Quadratic Non-Martingale model, 200 Observations (footnote 25)
                      McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                      L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 8   M = 11
Gaussian              .51        .68       .73     .81     .82     .85     .84     .26
Student’s t(5)        .36        .56       .73     .81     .83     .79     .72     .22
Exponential           .18        .34       .44     .57     .63     .54     .44     .03
Paretian α = 1.93     (no entries reported)
Footnote 24: All figures quoted are based on 250 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.
Footnote 25: See Section 3, Table 1.
Table 10. Empirical Power of 5% Tests (footnote 26)
Data generated from Pure Cubic model, 200 Observations (footnote 27)
                      McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                      L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 8   M = 11
Gaussian              .44        .77       .77     .76     .71     .26     .36     .06
Student’s t(5)        .46        .80       .78     .78     .69     .30     .44     --
Exponential           .06        .13       .92     .90     .88     .96     .35     --
Paretian α = 1.93     .10        .18       .74     .74     .74     .50     .20     --
(-- : entry not reported in this version.)
Footnote 26: All figures quoted are based on 250 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.
Footnote 27: See Section 3, Table 1.
Table 11. Empirical Power of 5% Tests (footnote 28)
Data generated from Bicubic model, 200 Observations (footnote 29)
                      McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                      L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 8   M = 11
Gaussian              .65        .82       .96     1.00    1.00    .66     .76     .24
Student’s t(5)        .73        .86       .96     .99     1.00    .66     .78     --
Exponential           .61        .80       .93     .98     .98     .90     .89     --
Paretian α = 1.93     .60        .78       1.00    1.00    1.00    .77     .82     --
(-- : entry not reported in this version.)
Footnote 28: All figures quoted are based on 250 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.
Footnote 29: See Section 3, Table 1.
6. Analysis of U.S. Real GNP
The battery of tests analyzed above was applied to the logarithmic growth rate of U.S. real
GNP over a sample of 163 quarters from 1953I to 1993III. These data are plotted in Figure 2
below; they appear to be reasonably stationary over this time period. The test results themselves
are given in Table 12.
As expected from results in Ashley and Patterson (1989) on the U.S. Index of Industrial
Production and results in Altug, et al. (1995) and in Potter (1995) on real GNP itself, the null
hypothesis of a linear generating mechanism for this time series can be rejected at the 5% level. In
fact, this null hypothesis can be rejected at the 2% level using the Hinich bicovariance test.
The pattern of test results for this time series is quite interesting, however. For one thing,
the strongest rejection is provided by the bicovariance test, a test whose power is fairly low
relative to that of the other tests in most of our simulations. Note also that the BDS test rejects
the null hypothesis at the 3-5% level of significance for m = 3 and m = 4, but does not reject at all
for m = 2. This pattern also appears nowhere in our simulations.
Nowadays, real output is commonly modelled as a two-state switching process of one sort
or another. And the Tsay test, which has high relative power against the simple SETAR
alternative in Table 1, does reject the null hypothesis at the 3% level for the real data. But the
pattern of power results in Table 3 is quite different from the results in Table 12: for the simple
SETAR model, the Hinich bicovariance test has quite low power and the BDS test has relatively
high power for all three values of m. But perhaps data simulated from a SETAR model estimated
to fit U.S. real GNP data will yield a pattern of test results more in keeping with what is observed
from applying the tests to the data directly.
To examine this hypothesis we estimated the power of all six tests using 250 data sets,
each of length N = 163, simulated from the SETAR model for U.S. real GNP estimated by
Potter (1995). These power estimates are quoted in the first row of Table 13. The pattern of
these power estimates is different from that obtained from the simple SETAR model; however,
this pattern is also quite different from what one would expect if this SETAR model were the
generating mechanism for real GNP. In particular, note that the McLeod-Li and Engle LM tests
have high power against this SETAR alternative, but do not reject the linear null hypothesis on
the actual real GNP data. And the BDS test has high power across all three embedding
dimensions for this SETAR alternative, whereas the BDS test rejects the null hypothesis only for
m = 3 and m = 4 using the actual data (footnote 30). To assess the statistical significance of this discrepancy, we
applied the three BDS tests to 1000 data sets simulated from Potter’s estimated SETAR model:
only .2% of these data sets yielded BDS test results matching or exceeding the pattern of results
obtained with the actual data. (See Table 14 for details.) Evidently, the process generating U.S.
real GNP is nonlinear, but not a SETAR process.
Footnote 30: The Potter (1995) SETAR model is estimated over real GNP data from 1948III - 1990IV. In contrast, the results quoted in Table 12 follow the usual practice of truncating the sample to eliminate the Korean War period. (This practice arises because time plots indicate that it yields sample data which are more plausibly covariance stationary for most macroeconomic time series, including real GNP.) In any case, running the tests over the same sample period Potter used produces materially similar results, except that the Hinich bicovariance test’s significance level rises to .15, yielding a pattern even less consistent with the test power results obtained using data generated from Potter’s estimated model.
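The power calculation described above (250 simulated data sets of length 163, each tested at the 5% level) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two-regime SETAR(1) coefficients, the threshold, the burn-in length, and the function names are hypothetical placeholders and are not Potter's (1995) estimates, and `sig_level_fn` stands for any routine that returns a test's significance level for one sample.

```python
import random


def simulate_setar(n, rng, low=(0.0, 0.5), high=(0.0, -0.5),
                   threshold=0.0, burn_in=100):
    """Two-regime SETAR(1) with delay 1; all coefficients are hypothetical."""
    y_prev = 0.0
    out = []
    for t in range(n + burn_in):
        # The regime is selected by the previous value of the series itself.
        intercept, slope = low if y_prev <= threshold else high
        y = intercept + slope * y_prev + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard start-up values
            out.append(y)
        y_prev = y
    return out


def estimate_power(simulate, sig_level_fn, n_sets=250, n_obs=163,
                   level=0.05, seed=0):
    """Fraction of simulated samples on which the test rejects at `level`."""
    rng = random.Random(seed)  # seed is an illustrative choice
    rejections = 0
    for _ in range(n_sets):
        sample = simulate(n_obs, rng)
        if sig_level_fn(sample) <= level:
            rejections += 1
    return rejections / n_sets


# Placeholder "tests" that always or never reject, used only to check the loop.
always = estimate_power(simulate_setar, lambda s: 0.01, n_sets=20)
never = estimate_power(simulate_setar, lambda s: 0.50, n_sets=20)
```

In practice `sig_level_fn` would be a bootstrapped significance level for one of the six tests, so each power estimate involves 250 x 1000 evaluations of the test statistic.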
We next considered the possibility that the nonlinear process generating U.S. real GNP is
a Markov switching process. As with the SETAR alternative, the simple Markov switching
process considered in Sections 3 and 5 yields a pattern of estimated powers for the six tests that is
totally at odds with the pattern of significance levels obtained when the tests are applied to the
real GNP data itself, but this does not eliminate the possibility that a Markov switching model
estimated to fit these data might yield a good match.
The second and third rows of Table 13 give estimates of the power of each test using data
simulated from each of two Markov switching models for U.S. real GNP estimated by Lam
(1997) over the sample period 1952II - 1996IV. Except for the updated sample period, the first
of these models is identical to that of Hamilton (1989): the economy switches probabilistically
back and forth between a low growth rate state and a high growth rate state, with a fixed matrix
of state transition probabilities. The second model generalizes the Markov switching framework
to allow the mean growth rate and the matrix of state transition probabilities to depend on the
length of time the economy has been in its current state (footnote 31).
Footnote 31: See Lam (1997) for details. The transition probabilities are also allowed to depend on the duration of the previous state.
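To make the fixed-transition-probability case concrete, the following Python sketch simulates a two-state Markov switching growth rate series. It is a deliberately simplified illustration: the state means, the probabilities of remaining in each state, the innovation standard deviation, and the function name are hypothetical values chosen here, not the estimates of Hamilton (1989) or Lam (1997), and the sketch omits any autoregressive dynamics and any duration dependence.

```python
import random


def simulate_markov_switching(n, rng, means=(-0.5, 1.0),
                              stay_prob=(0.75, 0.90), sigma=1.0):
    """Two-state switching mean with a fixed transition matrix (hypothetical values)."""
    state = 1  # start in the high growth rate state (arbitrary choice)
    states, series = [], []
    for _ in range(n):
        # Leave the current state with probability 1 - stay_prob[state].
        if rng.random() > stay_prob[state]:
            state = 1 - state
        states.append(state)
        series.append(means[state] + sigma * rng.gauss(0.0, 1.0))
    return series, states


growth, regime = simulate_markov_switching(163, random.Random(0))
```

Allowing `stay_prob` and `means` to vary with the number of periods already spent in the current state would give a duration-dependent variant in the spirit of the second model, although the exact specification in Lam (1997) is not reproduced here.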
The pattern of test results obtained using data generated from these two estimated
Markov switching models is even more dissimilar to that observed using actual GNP data than
was the pattern obtained using data generated from the Potter SETAR model. The results in
Table 3 hinted that the nonlinearity tests might have some difficulty in detecting these kinds of
processes; evidently, these more realistic models exacerbate the problem. In any case, we see
that, if real GNP were in fact generated by one of these Markov switching models, with power
this low it would hardly be likely that the BDS, Tsay, and Hinich bicovariance tests would be
rejecting the linearity null hypothesis on the actual data. Indeed, the results collected in Table 14
indicate that less than .1% of 1000 data sets simulated from these two estimated models yield
BDS test results matching or exceeding those observed with the actual data.
These results demonstrate that the commonly held notion that real output is generated by
a two-state switching model of some sort is seriously in error. Indeed, our results indicate that
the true generating mechanism for U.S. real GNP is more complicated than (or, at least, different
from) all of the alternative generating mechanisms considered here.
Figure 2. U.S. Real GNP Growth Rate Versus Time, 1953I - 1993III.
[Time series plot. Vertical axis: growth rate (ticks from -0.03 to 0.03). Horizontal axis: ticks from 0 to 180.]
Table 12. Significance Levels for Nonlinearity Tests on U.S. Real GNP (footnote 32)
                      McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                      L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 7   M = 10
N = 163 (53I - 93III) .22        .53       .36     .05*    .03*    .03*    .02*    .19
Footnote 32: Test results marked with an asterisk (set in bold in the original) are rejections at the 5% level of the null hypothesis of a linear generating mechanism. The 5% critical region for each test was obtained using 1000 bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.
Table 13. Empirical Power of 5% Tests (footnote 33)
Data generated from Estimated Models for U.S. real GNP, 163 Observations
                      McLeod-Li  Engle LM  BDS     BDS     BDS     Tsay    Bicov.  Bispectral
                      L = 24     p = 5     m = 2   m = 3   m = 4   k = 5   R = 7   M = 10
Simulated data from Potter (1995) SETAR model for U.S. Real GNP
                      .85        .92       .83     .88     .90     .96     .92     .20
Simulated data from Lam (1997) re-estimation of Hamilton Markov switching model for U.S. Real GNP
                      .04        .07       .09     .11     .11     .07     .06     .05
Simulated data from Lam (1997) estimated Markov switching model for U.S. Real GNP with duration dependent switching probabilities
                      .09        .11       .10     .10     .12     .12     .12     .10
Footnote 33: All figures quoted are based on 250 generated samples. The 5% critical region for each test was obtained using 1000 bootstrap replications. The parameters L, p, m, R, k, and M are defined in Section 2, where each test is discussed.
Table 14
Percentage of 1000 Simulated Data Sets Yielding BDS Test Results at m = 2, 3, and 4
More Extreme than Those Observed with the Actual Real GNP Data (footnote 34)
Potter (1995) SETAR model                                                                       .2%
Lam (1997) re-estimate of Hamilton Markov switching model (constant transition probabilities)   < .1%
Lam (1997) Markov switching model with duration dependent transition probabilities              < .1%
Footnote 34: Each generated data set is 163 observations in length to match the sample length of the actual data. Let s(m) denote the significance level at which the BDS test for embedding dimension m rejects linearity. Then “more extreme” in this context means s(2) ≥ .36, s(3) ≤ .05, and s(4) ≤ .03.
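The "more extreme" criterion of footnote 34 amounts to a simple count over the simulated data sets, which the following Python sketch makes explicit. The function name and the four example triples are hypothetical; only the three cutoffs (.36, .05, and .03) come from the text.

```python
def fraction_more_extreme(bds_levels, observed=(0.36, 0.05, 0.03)):
    """Share of simulated (s2, s3, s4) triples matching or exceeding the observed pattern."""
    s2_obs, s3_obs, s4_obs = observed
    levels = list(bds_levels)
    hits = sum(
        1
        for s2, s3, s4 in levels
        # Weak evidence at m = 2 together with rejections at m = 3 and m = 4.
        if s2 >= s2_obs and s3 <= s3_obs and s4 <= s4_obs
    )
    return hits / len(levels)


# Made-up BDS significance levels for four simulated data sets.
demo_fraction = fraction_more_extreme([
    (0.40, 0.04, 0.02),  # more extreme on all three counts
    (0.01, 0.01, 0.01),  # rejects at m = 2, so it does not match the pattern
    (0.50, 0.20, 0.03),  # fails to reject at m = 3
    (0.36, 0.05, 0.03),  # exactly matches the observed pattern
])
```

Applied to the 1000 simulated data sets for each estimated model, this fraction is the quantity reported as a percentage in Table 14.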
7. Conclusions
The size and power of the McLeod-Li, Engle LM, BDS, Tsay, and Hinich
bicovariance/bispectral tests are examined above over a wide variety of data generating
mechanisms for samples of length 200. We conclude that (footnote 35):
(1) At this sample length, bootstrapping is necessary (and sufficient) in order for the tests to be
properly sized.
(2) Of the tests considered, the BDS test has relatively high power against all of the alternatives,
making it a reasonable choice as a “nonlinearity screening test” for routine use.
(3) The test results appear to be quite highly correlated with one another: based on these results
we see little potential benefit in attempting to combine them into a portmanteau test.
(4) Excluding the BDS test, the remaining tests are quite inconsistent in their power across the
various alternatives considered. Some of the tests (e.g., McLeod-Li) are simply erratic.
Notably, however, the Tsay test appears to have relatively high power against SETAR
alternatives. More simulations need to be done to confirm this, but we tentatively
conclude that observation of a noticeably stronger rejection by the Tsay test than by the
BDS tests should be taken as evidence in favor of a SETAR generating mechanism.
Applying the battery of tests to actual data on real U.S. GNP, we find persuasive evidence that
the generating mechanism for this time series is nonlinear. The pattern of the test results is quite
unlike anything we observe in our simulations, however. In particular, on the actual data, we find
that the Hinich bicovariance test rejects the null hypothesis more strongly than any of the other
tests (footnote 36). And we find that the BDS test rejects the null hypothesis only at embedding dimensions
greater than 2 on the actual data.
Taking the observed pattern of nonlinearity test results as a new “stylized fact” about U.S.
real GNP, we examine whether estimated SETAR and Markov switching models in the literature
are consistent with this pattern of test results obtained from the data itself. We find that data
generated from these SETAR and Markov switching models estimated to fit U.S. real GNP data
yield patterns of test results which are significantly different from the pattern observed in applying
the tests to the sample data itself (footnote 37). We conclude that the commonly held belief that some sort of
regime switching process is an adequate representation of the true generating process for U.S.
real GNP is most likely seriously in error.
Footnote 35: Additional sample lengths (e.g., 100 and 400) will be examined as this project develops.
Footnote 36: This result suggests that the Hinich bicovariance test may be substantially more useful in practice than our results on simulated data in Table 3 indicate.
Footnote 37: We have not yet considered data generated from a STAR model (as in Teräsvirta and Anderson (1992)) but it will be surprising if those calculations materially affect this conclusion.
References
Altug, S., Ashley, R., and Patterson, D. M. (1995) "Are Technology Shocks Nonlinear?"
Virginia Tech Economics Department Working Paper Number E95-55.
Ashley, R. and Patterson, D. M. (1989). “Linear Versus Nonlinear Macroeconomies”
International Economic Review 30, 685-704.
Ashley, R., Patterson, D. M. and Hinich, M. (1986). “A Diagnostic Test for Nonlinear Serial
Dependence in Time Series Fitting Errors” Journal of Time Series Analysis 7, 165-78.
Bollerslev, Tim (1986) “Generalized Autoregressive Conditional Heteroskedasticity” Journal of
Econometrics 31, 307-27.
Brillinger, D. and M. Rosenblatt (1967) “Asymptotic Theory of kth Order Spectra” in Spectral
Analysis of Time Series, (B. Harris, ed.) Wiley: New York, pp. 153-88.
Brock, W. A., Hsieh, D. A., and LeBaron, B.D. (1991) A Test of Nonlinear Dynamics, Chaos,
and Instability: Theory and Evidence MIT Press: Cambridge.
Brock, W. A., Dechert W., and Scheinkman J. (1996) “A Test for Independence Based on the
Correlation Dimension” Econometric Reviews 15, 197-235.
David, H. A. (1970) Order Statistics Wiley: New York.
Engle, Robert F. (1982) “Autoregressive Conditional Heteroskedasticity with Estimates of the
Variance of United Kingdom Inflation” Econometrica 50, 987-1007.
Fama, E. F. (1965) “The Behavior of Stock Market Prices” Journal of Business 38, 34-105.
Grandmont, J. M. (1985) “On Endogenous Competitive Business Cycles” Econometrica 53, 995-1045.
Granger, C. W. J. and Andersen, A. A. (1978) An Introduction to Bilinear Time Series Models
Vandenhoeck and Ruprecht: Göttingen.
Hall, R. (1990) “Invariance Properties of Solow’s Productivity Residual” in P. Diamond (ed.)
Growth/Productivity/Employment MIT Press: Cambridge.
Hamilton, James (1989) “A New Approach to the Economic Analysis of Non-Stationary Time
Series and the Business Cycle” Econometrica 57, 357-84.
Hicks, J. R. (1950) A Contribution to the Theory of the Trade Cycle Oxford University Press:
Oxford.
Hinich, M. (1982) “Testing for Gaussianity and Linearity of a Stationary Time Series” Journal of
Time Series Analysis 3, 169-76.
Hinich, M. (1996) “Testing for Dependence in the Input to a Linear Time Series Model” Journal
of Nonparametric Statistics 6, 205-221.
Hinich, M. and Patterson D. M. (1985) “Evidence of Nonlinearity in Daily Stock Returns”
Journal of Business and Economic Statistics 3, 69-77.
Hinich, M. and Patterson D. M. (1995) “Detecting Epochs of Transient Dependence in White
Noise,” unpublished manuscript.
Judge, G. G., Griffiths, W. E., Hill, R. C., Lütkepohl, H., and Lee, T. C. (1985) The Theory and Practice
of Econometrics John Wiley and Sons: New York.
Kanter, M. and Steiger W. L. (1974) “Regression and Autoregression with Infinite Variance”
Advances in Applied Probability 6, 768-83.
Lam, P. (1997) “A Markov Switching Model of GNP Growth With Duration Dependence”
(unpublished manuscript).
de Lima, P. J. F. (1997) “On the Robustness of Nonlinearity Tests to Moment Condition Failure”
Journal of Econometrics 76, 251-80.
McLeod, A. I. and Li, W. K. (1983) “Diagnostic Checking ARMA Time Series Models Using
Squared-Residual Autocorrelations” Journal of Time Series Analysis 4, 269-73.
Palm, F. C. and Pfann, G. A. (1997) “Sources of Asymmetry in Production Factor Dynamics”
Journal of Econometrics 82, 361-92.
Potter, S. (1995) “A Nonlinear Approach to U.S. GNP” Journal of Applied Econometrics 10,
109-125.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1986) Numerical
Recipes: The Art of Scientific Computing. Cambridge University Press: Cambridge.
Ramsey, J. B. and Yuan, H. J. (1987) “The Statistical Properties of Dimension Calculations
Using Small Data Sets” New York University Economic Research Reports: RR 87-20,
53-63.
Subba Rao, T. and Gabr, M. (1980) “A Test for Linearity of Stationary Time Series”
Journal of Time Series Analysis 1, 145-58.
Teräsvirta, T. and Anderson, H. (1992) “Modelling Nonlinearities in Business Cycles Using
Smooth Transition Autoregressive Models.” Journal of Applied Econometrics 7, 119-36.
Tsay, Ruey S. (1986) “Nonlinearity Tests for Time Series” Biometrika 73, 461-6.