time aggregation and skip sampling in cointegration tests

Statistical Papers 37, 225-234 (1996) © Springer-Verlag 1996

Time aggregation and skip sampling in cointegration tests

Wanhong Hu*

Received: October 7, 1994; revised version: April 25, 1995

We examine the change of power of Johansen's VAR MLE cointegration test when samples are aggregated or skipped. We show by Monte Carlo simulation that although there are power gains when switching to high frequency data to gain more observations for a fixed time span, the power gains are much more significant when data with longer time span are used.

I. Introduction

Time aggregation occurs when summing high frequency time series data into low

frequency data, as for instance, the aggregation of monthly data into quarterly or annual

data. In many empirical studies data sets are available only through time aggregation.

But when the data sets are available at all frequencies, people often conjecture the power

for cointegration tests increases using high frequency data since for a fixed time span

high frequency data give more observations. Skip sampling occurs when data are only

observable every certain time period. In practice some data sets are available only

through skip sampling. We are interested in examining the effect of skip sampling and

time aggregation on the power of cointegration tests.

Shiller and Perron (1985) and Perron (1989) show that the power of unit root tests

depends more on the time span than the number of observations. In other words, using

high frequency data to increase the number of observations given the fixed time span will

not give as much gain in power as the case of increasing the time span of the sample. For

the residual based cointegration tests the same conclusion has been drawn by Hakkio and

* I thank G. S. Maddala and an anonymous referee for useful comments. The remaining errors are mine.


Rush (1991). They show that although increasing the number of observations by

increasing data frequency will increase the power for four residual based cointegration

tests, the gains are trivial compared to the gains of increasing the number of observations

by increasing the sample time span. This is especially true when the high frequency data

have high serial correlation. Thus they conclude that "... cointegration is a long-run

property and thus we often need long time spans of data to properly test it." Indeed, since

the test for cointegration is a test for long-run relationships, the power of the tests should

depend more on the time span of the sample than the number of observations since more

long-run sample variations could only be provided by increasing the time span. Hooker

(1993) shows that contrary to the result of Shiller and Perron, for the ADF residual based

cointegration test, the test power of monthly data is higher than that for the quarterly or

annual data, assuming data are generated at the monthly frequency. The result, however,

is not quite comparable to Shiller and Perron's result. Two dimensions, both the

frequency and the time span, have been considered in Shiller and Perron's paper. Test

power gains are compared over the increasing frequency and the increasing time span.

Hooker considers only one dimension by only comparing the test power when increasing

the sampling frequency.

While all the papers mentioned above use only skip sampling to examine the

changes in test power of unit root tests and cointegration tests, this paper examines the

change in test power using both skip sampling and time aggregation. Also, two

dimensions have been considered, the frequency and the time span. This paper uses a

cointegration test for systems of equations, Johansen's maximum likelihood

procedure, instead of a single-equation cointegration test such as the ADF

residual based cointegration test, because of the low power of the ADF test. The tests

for cointegration in systems of equations provide sufficient information about multiple

cointegration vectors and are useful when there is a failure of weak exogeneity in a single

equation. The results show that increasing data frequency by skip sampling will increase

test power, but increasing data span will result in higher test power especially for the case

of highly serially correlated high frequency data. For the aggregated sample the test

power is not as high as for the skipped sample, but the above results also hold in general for

the aggregated sample. The second section of the paper discusses the model


and the data generating process. The third section discusses the simulation results. The

final section provides the conclusions.

II. The tests for cointegration: model and methodology

The cointegration model considered in the simulation is:

$$x_t = x_{t-1} + \varepsilon_t$$

$$y_t = \beta x_t + \omega_t$$

$$\omega_t = \rho\,\omega_{t-1} + u_t$$

where $\varepsilon_t \sim \mathrm{iid}\,N(0, \sigma_\varepsilon^2)$ and $u_t \sim \mathrm{iid}\,N(0, \sigma_u^2)$, for $t = 1, 2, \ldots, T$. After skip sampling or time aggregation the model becomes:

$$X_t = X_{t-1} + \eta_t$$

$$Y_t = \beta X_t + \tau_t$$

$$\tau_t = \rho^s \tau_{t-1} + v_t$$

where

$$X_t = x_{st}, \qquad Y_t = y_{st},$$

$$\eta_t = \varepsilon_{s(t-1)+1} + \cdots + \varepsilon_{st}$$

and

$$v_t = u_t + \rho u_{t-1} + \cdots + \rho^{s-1} u_{t-s+1}$$

for skip sampling. For time aggregation, we have

$$X_t = x_{s(t-1)+1} + \cdots + x_{st}$$

$$Y_t = y_{s(t-1)+1} + \cdots + y_{st}$$

$$\eta_t = \varepsilon_{s(t-1)} + 2\varepsilon_{s(t-1)+1} + 2\varepsilon_{s(t-1)+2} + \cdots + \varepsilon_{st}$$

$$v_t = u_t + (1+\rho)u_{t-1} + (1+\rho+\rho^2)u_{t-2} + \cdots + (\rho^{s-2}+\rho^{s-1})u_{t-2(s-1)+1} + \rho^{s-1}u_{t-2(s-1)}$$

where $s$ is the sampling frequency defined as the ratio of the number of observations to the time span. So $v_t$ is not iid like $u_t$ but a moving average process. $X_t$ and $Y_t$ are cointegrated in the same way as $x_t$ and $y_t$ if $|\rho| < 1$ and are not cointegrated if $|\rho| = 1$.
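As an illustration only, the data generating process and the two sampling schemes can be sketched in plain Python. The function names, the seed handling, the zero starting values and the unit innovation variances are assumptions made for this sketch and are not taken from the paper. The argument `k` is the number of generated observations per retained observation, which is the role played by the exponent of $\rho$ in the sampled model.

```python
import random

def simulate_pair(n, beta=2.0, rho=0.95, seed=0):
    """Generate x (random walk) and y = beta * x + omega, with AR(1) omega.

    Unit innovation variances and zero starting values are assumptions of this sketch.
    """
    rng = random.Random(seed)
    x, y = [], []
    x_prev, omega_prev = 0.0, 0.0
    for _ in range(n):
        x_t = x_prev + rng.gauss(0.0, 1.0)                # x_t = x_{t-1} + eps_t
        omega_t = rho * omega_prev + rng.gauss(0.0, 1.0)  # omega_t = rho * omega_{t-1} + u_t
        x.append(x_t)
        y.append(beta * x_t + omega_t)
        x_prev, omega_prev = x_t, omega_t
    return x, y

def skip_sample(series, k):
    """Keep every k-th observation: X_t = x_{k t}, for t = 1, ..., len(series) // k."""
    return [series[k * t - 1] for t in range(1, len(series) // k + 1)]

def aggregate(series, k):
    """Sum non-overlapping blocks of k observations: X_t = x_{k(t-1)+1} + ... + x_{k t}."""
    return [sum(series[k * (t - 1): k * t]) for t in range(1, len(series) // k + 1)]

# 240 generated observations reduced to 60 by a factor of 4 under both schemes.
x, y = simulate_pair(240, beta=2.0, rho=0.95, seed=1)
x_skip, y_skip = skip_sample(x, 4), skip_sample(y, 4)
x_agg, y_agg = aggregate(x, 4), aggregate(y, 4)
print(len(x_skip), len(x_agg))
```

Both reduced series contain 60 observations over the same span; they differ only in whether the intermediate observations are discarded or summed.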

The cointegration test applied here is Johansen's (1988) maximum likelihood vector autoregression procedure. Let $Z_t = [y_t \;\; x_t]'$ (or $Z_t = [Y_t \;\; X_t]'$ for the skipped or aggregated samples). Since the error vector follows a VAR(1) process, a nonstationary VAR(2) model is required for $Z_t$. For the skipped or aggregated samples, we use the same VAR(2) model specification. Consider a nonstationary VAR process with iid Gaussian innovations:

$$Z_t = \Pi_1 Z_{t-1} + \cdots + \Pi_k Z_{t-k} + e_t, \qquad e_t \sim \mathrm{iid}\,N(0, \Sigma).$$

Define $A(L) = I - \Pi_1 L - \cdots - \Pi_k L^k$ and $\Pi = I - \Pi_1 - \cdots - \Pi_k$. The null hypothesis that $Z_t$ has $r$ cointegration vectors is equivalent to testing

$$H_0: \operatorname{rank}(\Pi) \le r \quad \text{or} \quad \Pi = \alpha\gamma'$$

where $\alpha$ and $\gamma$ are $p \times r$ matrices. In this case $r = 1$, $p = 2$ and $\gamma = [1 \;\; -\beta]'$.

Rewrite the model in error correction form

$$\Delta Z_t = \Gamma_1 \Delta Z_{t-1} + \cdots + \Gamma_{k-1} \Delta Z_{t-k+1} + \Gamma_k Z_{t-k} + e_t$$

where $\Gamma_i = -I + \Pi_1 + \cdots + \Pi_i$ for $i = 1, \ldots, k$ and $\Pi = -\Gamma_k$. In this case $k = 3$. The null hypothesis $H_0: r = 0$ is tested by applying the trace test and the maximum eigenvalue test. The trace test computes the likelihood ratio (the trace statistic)

$$Q(p-r) = -T \sum_{i=r+1}^{p} \log(1 - \lambda_i) \quad \text{for } r = 0, 1, \ldots, p-1.$$

The maximum eigenvalue test computes the likelihood ratio statistic

$$Q(r \mid r+1) = -T \ln(1 - \lambda_{r+1}).$$

The null hypothesis $H_0$ ($r$ cointegration vectors) is tested against the alternative $H_1$ ($r + 1$ cointegration vectors). Once $\operatorname{rank}(\gamma) = r$ has been tested, the MLE VAR estimates of the $r$ cointegration vectors $\gamma$ are the eigenvectors linked to the first $r$ eigenvalues. In this paper we applied the trace test.
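The two likelihood ratio statistics are simple functions of the estimated eigenvalues. The following sketch evaluates both formulas; the function names and the eigenvalues are hypothetical and chosen only for illustration, and the step that estimates the eigenvalues from the data is not shown.

```python
import math

def trace_statistic(eigenvalues, T, r):
    """Trace statistic: -T * sum over i = r+1, ..., p of log(1 - lambda_i)."""
    lam = sorted(eigenvalues, reverse=True)  # lambda_1 >= ... >= lambda_p
    return -T * sum(math.log(1.0 - value) for value in lam[r:])

def max_eigenvalue_statistic(eigenvalues, T, r):
    """Maximum eigenvalue statistic: -T * log(1 - lambda_{r+1})."""
    lam = sorted(eigenvalues, reverse=True)
    return -T * math.log(1.0 - lam[r])

# Hypothetical eigenvalues for a bivariate system (p = 2); not results from the paper.
eigs = [0.15, 0.02]
T = 120
trace_r0 = trace_statistic(eigs, T, r=0)
lmax_r0 = max_eigenvalue_statistic(eigs, T, r=0)
print(round(trace_r0, 3), round(lmax_r0, 3))
```

With these made-up eigenvalues the trace statistic for $r = 0$ is roughly 21.9. For $r = p - 1$ the two statistics coincide, since only the smallest eigenvalue remains in the sum.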

III. The Monte Carlo study

The test power is analyzed for different combinations of time spans and

frequencies. Four data sets are generated according to the cointegrating model for spans

of 30, 60, 120 and 240. 240 observations are generated for the data set with time span of

30. Then this data set is skipped or aggregated to obtain different number of observations

of 120, 60 and 30. 480 observations are generated for the span of 60. Then this data set

is skipped or aggregated to obtain different number of observations of 240, 120, 60 and

30. Similarly, 960 observations are generated for span of 120, and 1920 observations for

span of 240. The same method is then used to obtain the different numbers of observations.

Thus the spans we consider are S = {30,60,120,240}, from which different numbers of

observations (T = {30,60,120,240} holds for each time span) are obtained through either

skip sampling or time aggregation. The frequency of the sample is related to the

sample span and the number of observations as T / S. Thus, the higher the frequency, the more

observations there are for a given time span. In this study, the generated series are skipped

or aggregated by factors s = {1,2,4,8} for the time span of 30 and s = {2,4,8,16} for the

time span of 60. Similarly, s = {4,8,16,32} and s = {8,16,32,64} are used for time spans of 120 and 240, respectively.
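The simulation design can be enumerated in a few lines. In this sketch the variable names are illustrative; `frequency` is the ratio T / S, and `factor` is the ratio of generated to retained observations, assuming 8 generated observations per unit of span (240 for a span of 30, as stated above). The factor sets reproduce the sets listed in the text.

```python
spans = [30, 60, 120, 240]      # time spans S
targets = [240, 120, 60, 30]    # retained numbers of observations T
GENERATED_PER_SPAN_UNIT = 8     # 240 generated observations for S = 30, 480 for S = 60, ...

design = {}
for S in spans:
    n_generated = GENERATED_PER_SPAN_UNIT * S
    design[S] = [
        {"T": T, "frequency": T / S, "factor": n_generated // T}
        for T in targets
    ]

for S in spans:
    factors = [cell["factor"] for cell in design[S]]
    frequencies = [cell["frequency"] for cell in design[S]]
    print(S, factors, frequencies)
```

A larger factor leaves fewer observations for the same span, so within a span the factor and the frequency T / S move in opposite directions.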

The true value of $\beta$ is 2 in the simulation. The power of the test is studied over different values of $\rho$, namely 0.98, 0.95, 0.92 and 0.85. The Monte Carlo study shows that it is not very interesting to examine the problem when $\rho < 0.85$ since power quickly converges to 1. The critical values provided by Johansen and Juselius (1990) are tabulated using simulation results. Since the sample sizes are different here, we take the critical values from the actual distributions of the test statistic under the null in the simulation. Thus we can have the correct test size and the power can be compared. The critical values are tabulated when $\rho = 1$. So the null hypothesis is no cointegration, while the alternative hypothesis is cointegration with one cointegrating vector. The critical values are obtained at the 5% significance level.
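Size-adjusted critical values and power of this kind can be computed from simulated statistics as follows. This is a minimal sketch: the quantile rule and the function names are assumptions, and the draws below are placeholder random numbers standing in for simulated trace statistics, not Johansen statistics.

```python
import math
import random

def empirical_critical_value(null_stats, level=0.05):
    """Upper-tail critical value: the (1 - level) empirical quantile of the null statistics."""
    ordered = sorted(null_stats)
    index = math.ceil((1.0 - level) * len(ordered)) - 1
    return ordered[index]

def rejection_rate(stats, critical_value):
    """Share of simulated statistics that exceed the critical value."""
    return sum(1 for value in stats if value > critical_value) / len(stats)

# Placeholder draws standing in for simulated trace statistics (NOT Johansen statistics).
rng = random.Random(0)
null_stats = [rng.gammavariate(4.0, 2.0) for _ in range(3000)]  # stands in for rho = 1
alt_stats = [value + 5.0 for value in null_stats]               # stands in for rho < 1

cv = empirical_critical_value(null_stats, level=0.05)
size = rejection_rate(null_stats, cv)
power = rejection_rate(alt_stats, cv)
print(cv, size, power)
```

Because the critical value is read off the simulated null distribution for each combination of span and number of observations, the rejection rate under the null cannot exceed the nominal level by construction, which is what makes the power figures comparable across cells.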

For the tables in the Appendix, first columns state time spans and first rows state

the number of observations. Table 1 in the Appendix shows the critical values of the

trace test for the null of no cointegration in the case of skip sampling for the VAR(2)

model. The results are based on 3000 simulations. The corresponding trace test critical value

provided by Johansen and Juselius (1990) is 20.168 (Table A3). In our simulation, the

trace test critical values range from 17.9387 to 19.0588 for skip sampling. Table 2 in the

Appendix gives the critical values for time aggregation. The critical values range from

18.2981 to 19.9539. The critical values tend to become larger as s, the sampling

frequency, increases within the same span. For the same S and T, the critical values for

time aggregation are larger than those for skip sampling.

The simulation results of test power for skip sampling are reported in Tables 1a to 1d in the Appendix for different values of $\rho$. First we observe that when we increase the sample frequency for a fixed span (read across the row), the test power increases in most cases. This is a reasonable result since, after all, the additional observations provided by the high frequency data help to discriminate better between the null and the alternative hypotheses. What we are more interested in is comparing the gain in test power from this method with the gain from increasing the sample span. The general pattern is that the gain in test power from increasing the sample span is more significant when $\rho$ is relatively high, such as 0.95 or 0.98. For example, in Table 1a, for a time span of 120, increasing the sample frequency (read across the row) raises test power by about 18 percentage points, from 0.1635 to 0.3420 (increasing the number of observations from 30 to 240). But for 120 observations, increasing the sample span (read down the column) raises test power from 0.0670 to 0.7520, that is, to almost 75% (increasing the span from 30 to 240). The power gain from increasing the frequency is trivial compared to that from increasing the time span. When $\rho$ decreases, we first notice an increase in overall test power, since the test cannot discriminate well when $\rho$ is close to 1. We also notice that the power gain from increasing the sample time span is smaller. In Table 1d, for example, for 120 and 240 observations there are no power gains when the span is increased from 120 to 240, partly because overall test power is already high. The results are consistent with those of Hakkio and Rush (1991), namely that it is futile to increase the frequency for a fixed span when the high frequency data are highly serially correlated.
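The row and column comparisons in the Table 1a example can be reproduced directly from the Appendix entries. The sketch below uses the Table 1a values as transcribed there; the dictionary layout and the function names are illustrative choices.

```python
# Rejection frequencies from Table 1a (skip sampling, rho = 0.98),
# keyed by time span S and then by number of observations T.
table_1a = {
    30:  {30: 0.0570, 60: 0.0560, 120: 0.0670, 240: 0.0725},
    60:  {30: 0.0810, 60: 0.1155, 120: 0.1365, 240: 0.1290},
    120: {30: 0.1635, 60: 0.2410, 120: 0.3030, 240: 0.3420},
    240: {30: 0.3010, 60: 0.5520, 120: 0.7520, 240: 0.8640},
}

def gain_from_frequency(table, span, t_low, t_high):
    """Change in power when T rises from t_low to t_high at a fixed span (across a row)."""
    return table[span][t_high] - table[span][t_low]

def gain_from_span(table, n_obs, s_low, s_high):
    """Change in power when the span rises from s_low to s_high at fixed T (down a column)."""
    return table[s_high][n_obs] - table[s_low][n_obs]

freq_gain = gain_from_frequency(table_1a, span=120, t_low=30, t_high=240)
span_gain = gain_from_span(table_1a, n_obs=120, s_low=30, s_high=240)
print(round(freq_gain, 4), round(span_gain, 4))
```

The row comparison gives a gain of about 0.18, while the column comparison takes power from 0.0670 to 0.7520, a gain of about 0.69.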

Tables 2a to 2d in the Appendix report the simulation results using time aggregation. The main conclusion is the same. However, we find that the test power under time aggregation is slightly smaller than under skip sampling, and the smaller $\rho$ is, the larger the shortfall relative to skip sampling. As with skip sampling, when $\rho$ is relatively high, disaggregating the data results in higher test power, but this increase is trivial compared to the power gains from increasing the sample time span. When $\rho$ decreases, the overall test power is higher, and the power gain from an increased sample time span is then smaller.
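One rough way to summarize the comparison between the two schemes is to average the cell-by-cell difference in power between the skip sampling and time aggregation tables. The averaging is an illustrative summary chosen for this sketch, not a statistic used in the paper; the numbers are the Appendix entries for the highest and lowest values of the autoregressive parameter as transcribed there. Individual cells differ in sign when the parameter is 0.98, so only the average is compared.

```python
# Rows are spans S = 30, 60, 120, 240; columns are T = 30, 60, 120, 240.
skip_098 = [[0.0570, 0.0560, 0.0670, 0.0725],   # Table 1a
            [0.0810, 0.1155, 0.1365, 0.1290],
            [0.1635, 0.2410, 0.3030, 0.3420],
            [0.3010, 0.5520, 0.7520, 0.8640]]
agg_098 = [[0.0675, 0.0665, 0.0685, 0.0725],    # Table 2a
           [0.0770, 0.1160, 0.1470, 0.1395],
           [0.1380, 0.2495, 0.3185, 0.3725],
           [0.2180, 0.4795, 0.7525, 0.8800]]
skip_085 = [[0.2620, 0.5365, 0.7570, 0.8540],   # Table 1d
            [0.3635, 0.8745, 0.9945, 0.9995],
            [0.4035, 0.9525, 1.0000, 1.0000],
            [0.4565, 0.9705, 1.0000, 1.0000]]
agg_085 = [[0.2225, 0.4895, 0.7460, 0.8540],    # Table 2d
           [0.2855, 0.7720, 0.9845, 0.9990],
           [0.3300, 0.8870, 1.0000, 1.0000],
           [0.3600, 0.9375, 1.0000, 1.0000]]

def mean_shortfall(skip, agg):
    """Average of (skip sampling power - time aggregation power) over all cells."""
    diffs = [s - a for row_s, row_a in zip(skip, agg) for s, a in zip(row_s, row_a)]
    return sum(diffs) / len(diffs)

shortfall_098 = mean_shortfall(skip_098, agg_098)
shortfall_085 = mean_shortfall(skip_085, agg_085)
print(shortfall_098, shortfall_085)
```

On this summary the average shortfall of time aggregation is positive in both cases and larger for the lower parameter value, in line with the pattern described above.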

IV. Conclusion

In this paper, we examine the change of power of Johansen's VAR MLE

cointegration test when samples are aggregated or skipped. We show by Monte Carlo

simulation that although there are power gains when switching to high frequency data to

gain more observations for a fixed time span, the power gains are much more significant

when data with longer time span are used. Thus, it confirms the conclusions drawn by


Hakkio and Rush (1991). It also confirms Shiller and Perron's (1985) results, namely, the

length of the sample time span is more important than number of observations within a

fixed time span. Earlier studies considered only skip sampling. The present study

considers both time aggregation and skip sampling. It studies the more powerful

Johansen test as compared to the residual based tests considered in earlier studies. It also

provides critical values for skip sampling and time aggregation for the Johansen test.

Although the gains in the power by increasing the observations with a fixed time

span are not as great as those coming from data with larger time spans, it is not true that

one should not use high frequency data. In practice, data with longer time spans may not

always be available, and even if they are, they are likely to be subject to structural

changes. Thus, if one has a choice between 20 years of annual, quarterly or monthly data,

it is best to use the 240 monthly observations.¹

¹ This suggestion does not take seasonal fluctuations into account.


Appendix

Table 1 Critical Values of Skip Sampling

S \ T        30        60       120       240
 30      18.8726   18.6031   18.2754   18.2981
 60      19.0588   18.1248   17.9387   18.3213
120      18.8157   18.4493   18.3562   18.4612
240      18.2922   18.2664   18.1114   18.0461

Table 1a Cointegration Test with $\rho$ = 0.98

S \ T        30        60       120       240
 30       0.0570    0.0560    0.0670    0.0725
 60       0.0810    0.1155    0.1365    0.1290
120       0.1635    0.2410    0.3030    0.3420
240       0.3010    0.5520    0.7520    0.8640

Table 1b Cointegration Test with $\rho$ = 0.95

S \ T        30        60       120       240
 30       0.0945    0.1270    0.1525    0.1680
 60       0.1790    0.3475    0.4645    0.4905
120       0.3235    0.6710    0.8925    0.9600
240       0.4285    0.9095    0.9990    1.0000

Table 1c Cointegration Test with $\rho$ = 0.92

S \ T        30        60       120       240
 30       0.1535    0.2395    0.3090    0.3575
 60       0.2680    0.5800    0.7960    0.8695
120       0.3650    0.8605    0.9940    1.0000
240       0.4540    0.9610    1.0000    1.0000

Table 1d Cointegration Test with $\rho$ = 0.85

S \ T        30        60       120       240
 30       0.2620    0.5365    0.7570    0.8540
 60       0.3635    0.8745    0.9945    0.9995
120       0.4035    0.9525    1.0000    1.0000
240       0.4565    0.9705    1.0000    1.0000

Table 2 Critical Values of Time Aggregation

S \ T        30        60       120       240
 30      19.6043   19.2396   18.6093   18.2981
 60      19.9539   19.2424   18.6554   18.5776
120      19.8923   19.3784   19.1553   19.1557
240      19.7111   19.3095   18.9446   18.7439

Table 2a Cointegration Test with $\rho$ = 0.98

S \ T        30        60       120       240
 30       0.0675    0.0665    0.0685    0.0725
 60       0.0770    0.1160    0.1470    0.1395
120       0.1380    0.2495    0.3185    0.3725
240       0.2180    0.4795    0.7525    0.8800

Table 2b Cointegration Test with $\rho$ = 0.95

S \ T        30        60       120       240
 30       0.0990    0.1365    0.1615    0.1680
 60       0.1505    0.3200    0.4750    0.5175
120       0.2435    0.5885    0.8725    0.9625
240       0.3050    0.8245    0.9970    1.0000

Table 2c Cointegration Test with $\rho$ = 0.92

S \ T        30        60       120       240
 30       0.1435    0.2410    0.3185    0.3575
 60       0.2145    0.5255    0.7800    0.8705
120       0.2880    0.7635    0.9835    1.0000
240       0.3395    0.9015    1.0000    1.0000

Table 2d Cointegration Test with $\rho$ = 0.85

S \ T        30        60       120       240
 30       0.2225    0.4895    0.7460    0.8540
 60       0.2855    0.7720    0.9845    0.9990
120       0.3300    0.8870    1.0000    1.0000
240       0.3600    0.9375    1.0000    1.0000


References

Hakkio C. and M. Rush (1991) Cointegration: how short is the long run? Journal of International Money and Finance 10:571-581

Hooker M. (1993) Testing for cointegration. Economics Letters 41:359-362

Perron P. (1991) Testing for a random walk: a simulation experiment of power when the sampling interval is varied, in B. Raj (ed.), Advances in Econometrics and Modeling, Kluwer Academic Publishers, Dordrecht

Shiller R. and P. Perron (1985) Testing the random walk hypotheses: power versus frequency of observation. Economics Letters 18:381-386

Johansen S. (1988) Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12:231-254

Johansen S. and K. Juselius (1990) Maximum Likelihood Estimation and Inference on Cointegration - with Applications to the Demand for Money. Oxford Bulletin of Economics and Statistics 52:169-211

Wanhong Hu, 410 Arps Hall, 1945 N. High St., Columbus, OH 43210, USA
