Statistical Papers 37, 225-234 (1996) © Springer-Verlag 1996
Time aggregation and skip sampling in cointegration tests
Wanhong Hu*
Received: October 7, 1994; revised version: April 25, 1995
We examine how the power of Johansen's VAR MLE cointegration test changes when samples are aggregated or skipped. We show by Monte Carlo simulation that, although there are power gains when switching to high frequency data to gain more observations for a fixed time span, the power gains are much more significant when data with a longer time span are used.
I. Introduction
Time aggregation occurs when high frequency time series data are summed into low
frequency data, as for instance when monthly data are aggregated into quarterly or annual
data. In many empirical studies data sets are available only through time aggregation.
But when data sets are available at all frequencies, it is often conjectured that the power
of cointegration tests increases with high frequency data, since for a fixed time span
high frequency data give more observations. Skip sampling occurs when data are observed
only at fixed intervals, for example when only the end-of-quarter value of a monthly
series is recorded. In practice some data sets are available only through skip sampling.
We are interested in examining the effect of skip sampling and time aggregation on the
power of cointegration tests.
Shiller and Perron (1985) and Perron (1989) show that the power of unit root tests
depends more on the time span than on the number of observations. In other words, using
high frequency data to increase the number of observations over a fixed time span does
not give as much gain in power as increasing the time span of the sample. For residual
based cointegration tests the same conclusion has been drawn by Hakkio and Rush (1991).
They show that although increasing the number of observations by increasing data
frequency increases the power of four residual based cointegration tests, the gains are
trivial compared to the gains from increasing the number of observations by lengthening
the sample time span. This is especially true when the high frequency data are highly
serially correlated. Thus they conclude that "...cointegration is a long-run property and
thus we often need long time spans of data to properly test it." Indeed, since a
cointegration test is a test for long-run relationships, its power should depend more on
the time span of the sample than on the number of observations, because more long-run
variation in the sample can be provided only by a longer time span. Hooker (1993) shows
that, contrary to the result of Shiller and Perron, for the ADF residual based
cointegration test the power with monthly data is higher than with quarterly or annual
data, assuming the data are generated at the monthly frequency. This result, however, is
not directly comparable to Shiller and Perron's. Shiller and Perron consider two
dimensions, the frequency and the time span, and compare the power gains from increasing
the frequency with those from increasing the time span. Hooker considers only one
dimension, comparing test power as the sampling frequency increases.

* I thank G. S. Maddala and an anonymous referee for useful comments. The remaining errors are mine.
While all the papers mentioned above use only skip sampling to examine changes in the
power of unit root and cointegration tests, this paper examines the change in test power
under both skip sampling and time aggregation. Both dimensions, the frequency and the time
span, are considered. Because the ADF residual based cointegration test has low power,
this paper uses a cointegration test for a system of equations, Johansen's maximum
likelihood procedure, instead of a single-equation test such as the ADF residual based
test. System tests for cointegration provide information about multiple cointegration
vectors and are useful when weak exogeneity fails in a single equation. The results show
that increasing data frequency by skip sampling increases test power, but increasing the
data span raises test power by more, especially when the high frequency data are highly
serially correlated. Test power for aggregated samples is not as high as for skipped
samples, but the same qualitative pattern holds. The second section of the paper
discusses the model and the data generating process. The third section discusses the
simulation results. The final section provides the conclusions.
II. The tests for cointegration: model and methodology
The cointegration model considered in the simulation is:
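$$x_t = x_{t-1} + \varepsilon_t$$
$$y_t = \beta x_t + \omega_t$$
$$\omega_t = \rho\,\omega_{t-1} + u_t$$
where $\varepsilon_t \sim \text{iid } N(0, \sigma_\varepsilon^2)$ and $u_t \sim \text{iid } N(0, \sigma_u^2)$, for $t = 1, 2, \ldots, T$. After skip sampling or
time aggregation the model becomes: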
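The data generating process above can be simulated in a few lines. The sketch below uses only the Python standard library; the function name `simulate_dgp`, the unit error variances and the zero start-up values $x_0 = \omega_0 = 0$ are assumptions, since the paper does not report them.

```python
import random


def simulate_dgp(n, beta=2.0, rho=0.95, sigma_e=1.0, sigma_u=1.0, seed=None):
    """Simulate x_t = x_{t-1} + e_t, y_t = beta*x_t + w_t, w_t = rho*w_{t-1} + u_t.

    Start-up values x_0 = w_0 = 0 and unit variances are assumptions,
    not values reported in the paper. Returns (x, y, u) as lists.
    """
    rng = random.Random(seed)
    x_prev, w_prev = 0.0, 0.0
    xs, ys, us = [], [], []
    for _ in range(n):
        e = rng.gauss(0.0, sigma_e)
        u = rng.gauss(0.0, sigma_u)
        x_prev = x_prev + e            # random walk x_t
        w_prev = rho * w_prev + u      # AR(1) equilibrium error omega_t
        xs.append(x_prev)
        ys.append(beta * x_prev + w_prev)
        us.append(u)
    return xs, ys, us


# 240 generated observations, as for the data set with span 30
x, y, u = simulate_dgp(240, beta=2.0, rho=0.95, seed=1)
```

Setting `rho=1.0` gives the no-cointegration design under which critical values are tabulated; returning the shocks `u` makes the AR(1) error recursion easy to check.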
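$$X_t = X_{t-1} + \eta_t$$
$$Y_t = \beta X_t + \Omega_t$$
$$\Omega_t = \rho^s\,\Omega_{t-1} + v_t$$
where
$$X_t = x_{st}, \qquad Y_t = y_{st},$$
$$\eta_t = \varepsilon_{st} + \varepsilon_{st-1} + \cdots + \varepsilon_{st-s+1},$$
and
$$v_t = u_{st} + \rho\,u_{st-1} + \cdots + \rho^{s-1} u_{st-s+1}$$
for skip sampling. For time aggregation, we have
$$X_t = x_{s(t-1)+1} + \cdots + x_{st}, \qquad Y_t = y_{s(t-1)+1} + \cdots + y_{st},$$
$$\eta_t = \varepsilon_{st} + 2\,\varepsilon_{st-1} + 3\,\varepsilon_{st-2} + \cdots + 2\,\varepsilon_{st-2s+3} + \varepsilon_{st-2s+2},$$
$$v_t = u_{st} + (1+\rho)\,u_{st-1} + (1+\rho+\rho^2)\,u_{st-2} + \cdots + (\rho^{s-2}+\rho^{s-1})\,u_{st-2s+3} + \rho^{s-1} u_{st-2s+2},$$
where s is the sampling interval, i.e., the number of periods of the underlying (high frequency) process covered by each skipped or aggregated observation; for example, s = 3 when monthly data are skipped or aggregated to quarterly data. So $v_t$ is not iid like $u_t$ but a moving average process. $X_t$ and $Y_t$ are cointegrated in the same way as $x_t$ and $y_t$ if $|\rho| < 1$, and are not cointegrated if $|\rho| = 1$.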
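The two sampling schemes and the moving-average form of $v_t$ can be checked numerically. In this sketch (standard-library Python; all function names are hypothetical, and $\omega_0 = 0$ is an assumed start-up value), `skip_sample` keeps $x_{st}$, `aggregate` sums blocks of s values, and `max_identity_error` returns the largest deviation from $\Omega_t = \rho^s \Omega_{t-1} + v_t$ under both schemes, with $v_t$ built from the moving-average coefficients. Setting $\rho = 1$ turns the AR(1) recursion into a random walk, so the same check covers $\eta_t$.

```python
import random


def skip_sample(series, s):
    """Skip sampling: X_t = x_{st} for t = 1, ..., len(series) // s."""
    # series[0] holds x_1, so x_{st} is stored at index s*t - 1
    return [series[s * t - 1] for t in range(1, len(series) // s + 1)]


def aggregate(series, s):
    """Time aggregation: X_t = x_{s(t-1)+1} + ... + x_{st}."""
    return [sum(series[s * (t - 1):s * t]) for t in range(1, len(series) // s + 1)]


def ar1(u, rho):
    """omega_t = rho * omega_{t-1} + u_t, started from omega_0 = 0 (assumed)."""
    out, prev = [], 0.0
    for shock in u:
        prev = rho * prev + shock
        out.append(prev)
    return out


def v_skip(u, rho, s, t):
    """Skip sampling: v_t = sum_{j=0}^{s-1} rho^j u_{st-j}."""
    return sum(rho ** j * u[s * t - j - 1] for j in range(s))


def v_aggregate(u, rho, s, t):
    """Aggregation: v_t = sum_{i=0}^{2s-2} c_i u_{st-i}, where c_i is the
    sum of rho^j over j = max(0, i-s+1), ..., min(i, s-1)."""
    total = 0.0
    for i in range(2 * s - 1):
        c_i = sum(rho ** j for j in range(max(0, i - s + 1), min(i, s - 1) + 1))
        total += c_i * u[s * t - i - 1]
    return total


def max_identity_error(rho, s, n, seed=0):
    """Largest |Omega_t - rho^s Omega_{t-1} - v_t| over t >= 2, both schemes."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    omega = ar1(u, rho)
    worst = 0.0
    for sample, v in ((skip_sample, v_skip), (aggregate, v_aggregate)):
        big = sample(omega, s)
        for t in range(2, len(big) + 1):      # t is 1-based; big[t-1] is Omega_t
            resid = big[t - 1] - rho ** s * big[t - 2] - v(u, rho, s, t)
            worst = max(worst, abs(resid))
    return worst


print(max_identity_error(rho=0.95, s=4, n=240))  # floating-point noise only
```

The identity is exact, so any nonzero output is rounding error; the check starts at t = 2 because $\Omega_{t-1}$ must exist.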
The cointegration test applied here is Johansen's (1988) maximum likelihood
vector autoregression procedure. Let $Z_t = [y_t \;\; x_t]'$ (or $Z_t = [Y_t \;\; X_t]'$ for the
skipped or aggregated samples). Since the equilibrium error follows an autoregressive
process of order one, a nonstationary VAR(2) model is specified for $Z_t$, and the same
VAR(2) specification is used for the skipped and aggregated samples. Consider a
nonstationary VAR process with iid Gaussian innovations:
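$$Z_t = \Pi_1 Z_{t-1} + \cdots + \Pi_k Z_{t-k} + e_t, \qquad e_t \sim \text{iid } N(0, \Sigma).$$
Define $A(L) = I - \Pi_1 L - \cdots - \Pi_k L^k$ and $\Pi = I - \Pi_1 - \cdots - \Pi_k$. The null hypothesis that
$Z_t$ has r cointegration vectors is equivalent to testing
$$H_0: \operatorname{rank}(\Pi) \le r \quad \text{or} \quad \Pi = \alpha\gamma',$$
where $\alpha$ and $\gamma$ are $p \times r$ matrices. In this case $r = 1$, $p = 2$ and $\gamma = [1 \;\; -\beta]'$.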
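As a worked check of the rank condition, the data generating process of this section can be written directly in VAR form. Substituting $x_t = x_{t-1} + \varepsilon_t$ into $y_t - \beta x_t = \rho(y_{t-1} - \beta x_{t-1}) + u_t$ gives a VAR(1) for $Z_t = [y_t \;\; x_t]'$ (the VAR(2) specification nests it with $\Pi_2 = 0$), and $\Pi$ then factors as $\alpha\gamma'$:

$$
\begin{aligned}
Z_t &= \begin{bmatrix} \rho & \beta(1-\rho) \\ 0 & 1 \end{bmatrix} Z_{t-1}
      + \begin{bmatrix} \beta\varepsilon_t + u_t \\ \varepsilon_t \end{bmatrix}, \\
\Pi &= I - \Pi_1
     = \begin{bmatrix} 1-\rho & -\beta(1-\rho) \\ 0 & 0 \end{bmatrix}
     = \begin{bmatrix} 1-\rho \\ 0 \end{bmatrix}
       \begin{bmatrix} 1 & -\beta \end{bmatrix}
     = \alpha\gamma'.
\end{aligned}
$$

So $\operatorname{rank}(\Pi) = 1$ with $\gamma = [1 \;\; -\beta]'$ when $|\rho| < 1$, while $\Pi = 0$ (no cointegration) when $\rho = 1$. The two innovations are correlated, which the Gaussian VAR accommodates through $\Sigma$.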
Rewrite the model in error correction form
$$\Delta Z_t = \Gamma_1 \Delta Z_{t-1} + \cdots + \Gamma_{k-1} \Delta Z_{t-k+1} + \Gamma_k Z_{t-k} + e_t,$$
where $\Gamma_i = -I + \Pi_1 + \cdots + \Pi_i$ for $i = 1, \ldots, k$ and $\Pi = -\Gamma_k$. In this case k = 3. The null
hypothesis $H_0: r = 0$ is tested by applying the trace test and the maximum eigenvalue test.
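The trace test computes the likelihood ratio (trace) statistic
$$Q(r \mid p) = -T \sum_{i=r+1}^{p} \ln(1 - \hat\lambda_i), \qquad r = 0, 1, \ldots, p-1,$$
where $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_p$ are the ordered eigenvalues (squared canonical correlations) from Johansen's reduced rank regression. The maximum eigenvalue test computes the likelihood ratio statistic
$$Q(r \mid r+1) = -T \ln(1 - \hat\lambda_{r+1}).$$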
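Given the eigenvalues, both statistics are short sums. A minimal sketch; the eigenvalues in the usage lines are made up for illustration, and the function names are hypothetical.

```python
import math


def trace_stat(eigenvalues, n_obs, r):
    """Trace statistic: -T * sum_{i=r+1}^{p} ln(1 - lambda_i)."""
    lam = sorted(eigenvalues, reverse=True)   # lambda_1 >= ... >= lambda_p
    return -n_obs * sum(math.log(1.0 - l) for l in lam[r:])


def max_eig_stat(eigenvalues, n_obs, r):
    """Maximum eigenvalue statistic: -T * ln(1 - lambda_{r+1})."""
    lam = sorted(eigenvalues, reverse=True)
    return -n_obs * math.log(1.0 - lam[r])


# Illustrative (made-up) eigenvalues for p = 2 and T = 120
lams = [0.15, 0.02]
print(round(trace_stat(lams, 120, r=0), 2))    # 21.93
print(round(max_eig_stat(lams, 120, r=0), 2))  # 19.5
```

With these illustrative values the trace statistic for r = 0 (about 21.93) would exceed every simulated 5% critical value in the Appendix, all of which lie below 20.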
For the maximum eigenvalue test, the null hypothesis $H_0$ (r cointegration vectors) is tested against the alternative $H_1$
(r + 1 cointegration vectors); for the trace test, the alternative is p cointegration vectors. Once $\operatorname{rank}(\Pi) = r$ has been determined, the ML
estimates of the r cointegration vectors $\gamma$ are the eigenvectors associated with the r largest
eigenvalues. In this paper we apply the trace test.
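The eigenvalues that enter both statistics come from a reduced rank regression. A minimal sketch for the simplest case, k = 1 with no deterministic terms, where $\Delta Z_t$ is regressed directly on $Z_{t-1}$; this simplifies the paper's VAR(2) design, in which $\Delta Z_t$ and $Z_{t-k}$ would first be regressed on the lagged differences and the residuals used in their place. The pure-Python 2 × 2 helpers, the unit error variances and all function names are assumptions made to keep the example dependency-free.

```python
import math
import random


def mat2_inv(a):
    """Inverse of a 2x2 matrix [[a11, a12], [a21, a22]]."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    return [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]


def mat2_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def moment(us, vs):
    """S_uv = (1/T) sum_t u_t v_t' for equal-length lists of 2-vectors."""
    n = len(us)
    return [[sum(u[i] * v[j] for u, v in zip(us, vs)) / n for j in range(2)]
            for i in range(2)]


def johansen_eigenvalues_k1(z):
    """Ordered eigenvalues for Delta Z_t = Gamma Z_{t-1} + e_t (k = 1, no constant).

    Solves |lambda S11 - S10 S00^{-1} S01| = 0 through the eigenvalues of
    S11^{-1} S10 S00^{-1} S01, which are squared canonical correlations.
    """
    dz = [[z[t][0] - z[t - 1][0], z[t][1] - z[t - 1][1]] for t in range(1, len(z))]
    lag = z[:-1]                                  # Z_{t-1}, aligned with dz
    s00, s11, s01 = moment(dz, dz), moment(lag, lag), moment(dz, lag)
    s10 = [[s01[0][0], s01[1][0]], [s01[0][1], s01[1][1]]]   # transpose of S01
    m = mat2_mul(mat2_inv(s11), mat2_mul(s10, mat2_mul(mat2_inv(s00), s01)))
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))  # roots are real in theory
    return [tr / 2.0 + disc, tr / 2.0 - disc]        # largest first


def simulate_z(n, beta, rho, seed):
    """Z_t = [y_t, x_t] from the Section II process (unit variances assumed)."""
    rng = random.Random(seed)
    x = w = 0.0
    z = []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        w = rho * w + rng.gauss(0.0, 1.0)
        z.append([beta * x + w, x])
    return z


z = simulate_z(2000, beta=2.0, rho=0.5, seed=7)
lam = johansen_eigenvalues_k1(z)
n_eff = len(z) - 1                      # one observation lost to differencing
trace0 = -n_eff * sum(math.log(1.0 - l) for l in lam)
print(lam, trace0)
```

Under these assumptions the largest eigenvalue should settle near $(1-\rho)/2 = 0.25$, the squared correlation between $\omega_{t-1}$ and $\Delta\omega_t$, so the trace statistic for r = 0 is far above the 5% critical values reported in the Appendix.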
III. The Monte Carlo study
Test power is analyzed for different combinations of time spans and frequencies.
Four data sets are generated from the cointegration model, for spans of 30, 60, 120 and
240. For the span of 30, 240 observations are generated; this data set is then skipped or
aggregated to obtain 120, 60 and 30 observations. For the span of 60, 480 observations
are generated and then skipped or aggregated to obtain 240, 120, 60 and 30 observations.
Similarly, 960 observations are generated for the span of 120 and 1920 observations for
the span of 240, and the same method is used to obtain the different numbers of
observations.
Thus the spans we consider are S = {30, 60, 120, 240}, and for each time span the numbers
of observations T = {30, 60, 120, 240} are obtained through either skip sampling or time
aggregation. The sampling interval s is the number of generated periods per retained
observation; since 8S observations are generated for a span of S, s = 8S/T, so that within
a given span a larger s means fewer observations. In this study, s = {1, 2, 4, 8} is used
for the time span of 30 and s = {2, 4, 8, 16} for the time span of 60. Similarly,
s = {4, 8, 16, 32} and s = {8, 16, 32, 64} are used for the time spans of 120 and 240,
respectively.
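The listed intervals follow from the generation scheme described above (240, 480, 960 and 1920 generated observations for spans 30, 60, 120 and 240, i.e. 8 per unit of span) once s is read as the number of generated periods per retained observation. A small sketch that reproduces the four sets; the constant and function names are hypothetical.

```python
SPANS = [30, 60, 120, 240]
N_OBS = [30, 60, 120, 240]
GENERATED_PER_UNIT_SPAN = 8   # 240 for S = 30, 480 for S = 60, 960, 1920


def design_grid():
    """Map each span S to {T: s}, s = generated periods per retained observation."""
    grid = {}
    for span in SPANS:
        n_generated = GENERATED_PER_UNIT_SPAN * span
        grid[span] = {n: n_generated // n for n in N_OBS}
    return grid


for span, cells in design_grid().items():
    print(span, cells)
```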
The true value of $\beta$ is 2 in the simulation. The power of the test is studied for
$\rho$ = 0.98, 0.95, 0.92 and 0.85. The Monte Carlo study shows that it is not very
interesting to examine the problem when $\rho$ < 0.85, since power quickly converges to 1.
The critical values provided by Johansen and Juselius (1990) are themselves obtained by
simulation, for sample sizes that differ from those used here. We therefore take the
critical values from the simulated distribution of the test statistic under the null.
This gives the correct test size, so that power can be compared across designs. The
critical values are tabulated with $\rho$ = 1, so the null hypothesis is no cointegration,
while the alternative hypothesis is cointegration with one cointegrating vector. The
critical values are obtained at the 5% significance level.
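A sketch of this size-adjustment step: the critical value is read off statistics simulated under the null ($\rho = 1$), and power is the rejection frequency of statistics simulated under the alternative. The quantile convention (at most 5% of null draws strictly above the critical value) is an assumption, since the paper does not state which convention it used; the function names are hypothetical.

```python
def empirical_critical_value(null_stats, level=0.05):
    """Upper `level` quantile of statistics simulated under the null (rho = 1).

    Chosen so that at most a fraction `level` of the null draws exceed it.
    """
    ordered = sorted(null_stats)
    n = len(ordered)
    n_reject = int(level * n + 1e-9)        # draws allowed above the critical value
    return ordered[n - n_reject - 1]


def size_adjusted_power(alt_stats, critical_value):
    """Share of statistics simulated under the alternative (|rho| < 1) that reject."""
    return sum(stat > critical_value for stat in alt_stats) / len(alt_stats)


null_draws = [float(i) for i in range(1, 101)]   # toy null distribution
cv = empirical_critical_value(null_draws, 0.05)
print(cv, size_adjusted_power([94.0, 96.0, 99.0, 100.0], cv))
```

With 3000 null replications, as used for Tables 1 and 2, this rule places exactly 150 draws above the critical value.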
In the Appendix tables, the first column gives the time span S and the first row gives
the number of observations T. Table 1 in the Appendix shows the critical values of the
trace test for the null of no cointegration under skip sampling for the VAR(2) model. The
results are based on 3000 simulations. The corresponding critical value given by
Johansen and Juselius (1990) is 20.168 (Table A3). In our simulation, the trace test
critical values for skip sampling range from 17.9387 to 19.0588. Table 2 in the Appendix
gives the critical values for time aggregation, which range from 18.2981 to 19.9539. The
critical values tend to be larger when fewer observations are taken from the same span.
For the same S and T, the critical values for time aggregation are larger than those for
skip sampling, except at S = 30 and T = 240, where every generated observation is used and
the two coincide.
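The ranges and the comparison between the two schemes can be checked directly against the Appendix. A short sketch with the Table 1 and Table 2 critical values transcribed by hand; the variable and function names are hypothetical.

```python
T_VALUES = [30, 60, 120, 240]
SKIP_CV = {  # Table 1: rows are spans S, columns are T = 30, 60, 120, 240
    30: [18.8726, 18.6031, 18.2754, 18.2981],
    60: [19.0588, 18.1248, 17.9387, 18.3213],
    120: [18.8157, 18.4493, 18.3562, 18.4612],
    240: [18.2922, 18.2664, 18.1114, 18.0461],
}
AGG_CV = {  # Table 2, same layout
    30: [19.6043, 19.2396, 18.6093, 18.2981],
    60: [19.9539, 19.2424, 18.6554, 18.5776],
    120: [19.8923, 19.3784, 19.1553, 19.1557],
    240: [19.7111, 19.3095, 18.9446, 18.7439],
}


def value_range(table):
    """(min, max) over all cells of a table."""
    values = [v for row in table.values() for v in row]
    return min(values), max(values)


def cells_where_aggregation_not_larger():
    """(S, T) cells where the aggregation critical value <= the skip one."""
    return [(span, T_VALUES[j])
            for span in SKIP_CV
            for j in range(len(T_VALUES))
            if AGG_CV[span][j] <= SKIP_CV[span][j]]


print(value_range(SKIP_CV), value_range(AGG_CV))
print(cells_where_aggregation_not_larger())
```

The only cell returned is (30, 240), where both tables report 18.2981 because no skipping or aggregation takes place.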
The simulated test power for skip sampling is reported in Tables 1a to 1d in the Appendix
for different values of $\rho$. First, when we increase the sampling frequency for a fixed
span (reading across a row), test power increases in most cases. This is reasonable, since
the additional observations provided by the high frequency data help to discriminate
between the null and the alternative hypotheses. What interests us more is how this gain
compares with the gain from increasing the sample span. The general pattern is that the
power gain from increasing the sample span is more pronounced when $\rho$ is relatively
high, such as 0.95 or 0.98. For example, in Table 1a, for the time span of 120, increasing
the sampling frequency (reading across the row) raises power by about 18 percentage points
(from 0.1635 to 0.3420 as the number of observations rises from 30 to 240). But for 120
observations, increasing the sample span (reading down the column) raises power by nearly
69 percentage points (from 0.0670 to 0.7520 as the span rises from 30 to 240). The power
gain from increasing the frequency is trivial compared with that from increasing the time
span. When $\rho$ decreases, we first notice that overall test power increases, since when
$\rho$ is close to 1 the test cannot discriminate well. We also notice that the power gain
from increasing the sample time span is smaller. In Table 1d, for example, with 120 or 240
observations there is no power gain when the span increases from 120 to 240, partly
because overall test power is already high. These results are consistent with those of
Hakkio and Rush (1991), namely that it is futile to increase the frequency for a fixed span
when the high frequency data are highly serially correlated.
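The gains discussed in this paragraph are differences of table entries, measured in percentage points of rejection frequency. A sketch that recomputes them from Tables 1a and 1d (values transcribed from the Appendix; names hypothetical).

```python
T_VALUES = [30, 60, 120, 240]
SPANS = [30, 60, 120, 240]

# Tables 1a (rho = 0.98) and 1d (rho = 0.85): rows are spans S, columns are T
TABLE_1A = {
    30: [0.0570, 0.0560, 0.0670, 0.0725],
    60: [0.0810, 0.1155, 0.1365, 0.1290],
    120: [0.1635, 0.2410, 0.3030, 0.3420],
    240: [0.3010, 0.5520, 0.7520, 0.8640],
}
TABLE_1D = {
    30: [0.2620, 0.5365, 0.7570, 0.8540],
    60: [0.3635, 0.8745, 0.9945, 0.9995],
    120: [0.4035, 0.9525, 1.0000, 1.0000],
    240: [0.4565, 0.9705, 1.0000, 1.0000],
}


def gain_across_row(table, span):
    """Power gain from T = 30 to T = 240 at a fixed span (higher frequency)."""
    row = table[span]
    return row[-1] - row[0]


def gain_down_column(table, n_obs):
    """Power gain from S = 30 to S = 240 at a fixed number of observations."""
    j = T_VALUES.index(n_obs)
    return table[SPANS[-1]][j] - table[SPANS[0]][j]


for name, table in (("1a", TABLE_1A), ("1d", TABLE_1D)):
    print(name, round(gain_across_row(table, 120), 4),
          round(gain_down_column(table, 120), 4))
```

For $\rho = 0.98$ the span gain at T = 120 (0.685) dwarfs the frequency gain at S = 120 (0.1785); for $\rho = 0.85$ the span gain at T = 120 shrinks to 0.243, partly because power is already close to 1 for long spans.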
Tables 2a to 2d in the Appendix report the simulation results for time aggregation. The
main conclusion is the same. However, test power under time aggregation is slightly lower
than under skip sampling, and the smaller $\rho$, the larger this shortfall relative to skip
sampling. As with skip sampling, when $\rho$ is relatively high, disaggregating the data
results in higher test power, but this increase is trivial compared with the power gain
from increasing the sample time span. When $\rho$ decreases, overall test power is higher,
and the power gain from increasing the sample time span is then smaller.
IV. Conclusion
In this paper, we examine how the power of Johansen's VAR MLE cointegration test changes
when samples are aggregated or skipped. We show by Monte Carlo simulation that, although
there are power gains when switching to high frequency data to gain more observations for
a fixed time span, the power gains are much more significant when data with a longer time
span are used. This confirms the conclusions drawn by Hakkio and Rush (1991). It also
confirms Shiller and Perron's (1985) result that the length of the sample time span is
more important than the number of observations within a fixed time span. Earlier studies
considered only skip sampling; the present study considers both time aggregation and skip
sampling. It studies the Johansen test, which is more powerful than the residual based
tests considered in earlier studies. It also provides critical values of the Johansen test
for skip sampling and time aggregation.
Although the power gains from increasing the number of observations within a fixed time
span are not as great as those from data with longer time spans, it does not follow that
high frequency data should not be used. In practice, data with longer time spans may not
always be available, and even if they are, they are likely to be subject to structural
changes. Thus, given a choice between 20 years of annual, quarterly or monthly data, it is
best to use the 240 monthly observations.¹

¹ This suggestion does not take seasonal fluctuations into account.
Appendix
Table 1 Critical Values of Skip Sampling (5% trace test)
S \ T    30        60        120       240
30       18.8726   18.6031   18.2754   18.2981
60       19.0588   18.1248   17.9387   18.3213
120      18.8157   18.4493   18.3562   18.4612
240      18.2922   18.2664   18.1114   18.0461
Table 1a Cointegration Test with $\rho = 0.98$: power under skip sampling
S \ T    30       60       120      240
30       0.0570   0.0560   0.0670   0.0725
60       0.0810   0.1155   0.1365   0.1290
120      0.1635   0.2410   0.3030   0.3420
240      0.3010   0.5520   0.7520   0.8640
Table 1b Cointegration Test with $\rho = 0.95$: power under skip sampling
S \ T    30       60       120      240
30       0.0945   0.1270   0.1525   0.1680
60       0.1790   0.3475   0.4645   0.4905
120      0.3235   0.6710   0.8925   0.9600
240      0.4285   0.9095   0.9990   1.0000
Table 1c Cointegration Test with $\rho = 0.92$: power under skip sampling
S \ T    30       60       120      240
30       0.1535   0.2395   0.3090   0.3575
60       0.2680   0.5800   0.7960   0.8695
120      0.3650   0.8605   0.9940   1.0000
240      0.4540   0.9610   1.0000   1.0000
Table 1d Cointegration Test with $\rho = 0.85$: power under skip sampling
S \ T    30       60       120      240
30       0.2620   0.5365   0.7570   0.8540
60       0.3635   0.8745   0.9945   0.9995
120      0.4035   0.9525   1.0000   1.0000
240      0.4565   0.9705   1.0000   1.0000
Table 2 Critical Values of Time Aggregation (5% trace test)
S \ T    30        60        120       240
30       19.6043   19.2396   18.6093   18.2981
60       19.9539   19.2424   18.6554   18.5776
120      19.8923   19.3784   19.1553   19.1557
240      19.7111   19.3095   18.9446   18.7439
Table 2a Cointegration Test with $\rho = 0.98$: power under time aggregation
S \ T    30       60       120      240
30       0.0675   0.0665   0.0685   0.0725
60       0.0770   0.1160   0.1470   0.1395
120      0.1380   0.2495   0.3185   0.3725
240      0.2180   0.4795   0.7525   0.8800
Table 2b Cointegration Test with $\rho = 0.95$: power under time aggregation
S \ T    30       60       120      240
30       0.0990   0.1365   0.1615   0.1680
60       0.1505   0.3200   0.4750   0.5175
120      0.2435   0.5885   0.8725   0.9625
240      0.3050   0.8245   0.9970   1.0000
Table 2c Cointegration Test with $\rho = 0.92$: power under time aggregation
S \ T    30       60       120      240
30       0.1435   0.2410   0.3185   0.3575
60       0.2145   0.5255   0.7800   0.8705
120      0.2880   0.7635   0.9835   1.0000
240      0.3395   0.9015   1.0000   1.0000
Table 2d Cointegration Test with $\rho = 0.85$: power under time aggregation
S \ T    30       60       120      240
30       0.2225   0.4895   0.7460   0.8540
60       0.2855   0.7720   0.9845   0.9990
120      0.3300   0.8870   1.0000   1.0000
240      0.3600   0.9375   1.0000   1.0000
References
Hakkio C. and M. Rush (1991) Cointegration: how short is the long run? Journal of International Money and Finance 10:571-581
Hooker M. (1993) Testing for cointegration. Economics Letters 41:359-362
Johansen S. (1988) Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12:231-254
Johansen S. and K. Juselius (1990) Maximum likelihood estimation and inference on cointegration, with applications to the demand for money. Oxford Bulletin of Economics and Statistics 52:169-211
Perron P. (1991) Testing for a random walk: a simulation experiment of power when the sampling interval is varied, in B. Raj (ed.), Advances in Econometrics and Modeling, Kluwer Academic Publishers, Dordrecht
Shiller R. and P. Perron (1985) Testing the random walk hypothesis: power versus frequency of observation. Economics Letters 18:381-386
Wanhong Hu 410 Arias Hall 1945 N. High St. Columbus OH 43210 USA