Evaluating multivariate forecast densities: a comparison of two approaches

International Journal of Forecasting 18 (2002) 397–407
www.elsevier.com/locate/ijforecast

Michael P. Clements, Jeremy Smith*
Department of Economics, University of Warwick, Coventry CV4 7AL, UK

Abstract

We consider methods of evaluating multivariate density forecasts. A recently proposed method is found to lack power when the correlation structure is mis-specified. Tests that have good power to detect mis-specifications of this sort are described. We also consider the properties of the tests in the presence of more general mis-specifications. © 2002 International Institute of Forecasters. Published by Elsevier Science B.V. All rights reserved.

Keywords: Multivariate density forecasts; Point forecasts

*Corresponding author. Tel.: +44-1203-523-336; fax: +44-1203-523-032. E-mail address: [email protected] (J. Smith).

¹ In the point forecast evaluation literature, Clements and Hendry (1993) considered the evaluation of multivariate forecasts.

0169-2070/02/$ – see front matter © 2002 International Institute of Forecasters. Published by Elsevier Science B.V. All rights reserved. PII: S0169-2070(01)00126-1

1. Introduction

In recent years there has been much interest in the evaluation and testing of the predictive accuracy of point forecasts. For example, Diebold and Mariano (1995) consider ways of comparing predictive accuracy, and West (1996) and West and McCracken (1998) look at the impact of parameter estimation uncertainty on comparisons of predictive accuracy (West & McCracken, 2001, provide an exposition). Newbold and Harvey (2001) review recent developments in forecast encompassing and combination, and White (2000) the impact of 'data snooping'.

At the same time, a number of recent papers have gone beyond the traditional concern with the production and evaluation of point forecasts to consider the evaluation of interval forecasts (e.g. Granger, White, & Kamstra, 1989; Christoffersen, 1998 and Clements and Taylor, 2000) and density forecasts (see, for example, Diebold, Gunther, & Tay, 1998; Diebold, Hahn, & Tay, 1999; Clements & Smith, 2000 and the review by Tay & Wallis, 2001). This literature is still in its infancy, but some of the recent concerns in the point forecast literature are relevant, such as how to compare a number of forecast densities, and the impact of estimation uncertainty. In this
paper, though, we focus on two alternative methods of evaluating joint forecast densities¹ that have been suggested in the literature. Joint density or probability forecasts seem set to command increasing attention in the macroeconomics policy-making arena, in the form for example of, 'what are the chances that inflation will remain below 2½% next year and growth will exceed 2%?' Thus tools for evaluating such forecasts need to be reviewed. Section 2 discusses the two methods, which are based on the probability integral transform (p.i.t.) of Rosenblatt (1952), but can have very different power properties in some circumstances. This is demonstrated by simulation in Section 3, where we consider the performance of the tests under a number of different distributional assumptions for the data generation process. Section 4 illustrates with an empirical example, and further simulations in Section 5 based on the empirical example support our contentions over the differing power properties of the tests. Section 6 concludes.

Because of our focus on the relative merits of tests of joint density adequacy, we ignore in the simulations and empirical work other issues which warrant consideration. Specifically, in the empirical example the models' parameter estimates are taken as being the population values for generating the forecasts. That is, we neglect parameter estimation uncertainty, but recognise following the work of West (1996) and West and McCracken (1998) that this could be important. We also assume the models' errors are gaussian, whereas in general it may be more appropriate to construct densities by some form of bootstrap, but we do investigate the impact on the tests of allowing the actual observations to come from a variety of non-gaussian distributions.

2. Evaluating density forecasts

The approach to density forecast evaluation is based on Dawid (1984), although the key idea of the probability integral transform goes back at least as far as Rosenblatt (1952). Diebold et al. (1998) have recently popularised these ideas in the econometrics literature.

Consider a series of one-step prediction densities for the value of a variable $Y_t$ made at $t-1$, denoted by $p_t(y_t)$, where we have $n$ of these, $t = 1, \ldots, n$. The probability integral transforms (p.i.t.) of the realizations of the variable with respect to the prediction densities are given by:

$$z_t = \int_{-\infty}^{y_t} p_t(u)\,du \tag{1}$$

for $t = 1, \ldots, n$. When the predicted density corresponds to the true predictive density, $z_t \sim U(0,1)$, $t = 1, \ldots, n$, and the sequence is independently distributed, so $\{z_t\}_{t=1}^{n}$ is iid $U[0,1]$. In a time-series context, the form of $p_t(y_t)$ may be changing over time, so that the sequence $\{z_t\}_{t=1}^{n}$ will only consist of iid uniform variables if the true densities are used at each $t$.

So we can evaluate the predicted densities by assessing whether there is statistically significant evidence that the realizations do not come from that density, which means testing whether the empirical distribution function of the $\{z_t\}$ series departs from the theoretical distribution function of a 45° line. The natural test is the Kolmogorov–Smirnov (KS) test, which is a valid test for uniformity under the assumption of independence. The Kolmogorov–Smirnov (KS) statistic is the maximum difference between the empirical cumulative distribution function of the $z$ values and the theoretical cumulative distribution. If the observed value exceeds the critical value, the null that the $z$s are uniformly distributed is rejected. The performance of other statistics designed to test for uniformity is discussed in Noceti, Smith, and Hodges (2001). The independence part of the null is sometimes tested by looking for autocorrelation in (powers of) the $z_t$ series.

Suppose now we have a joint density for
$\{y_{1t}, y_{2t}\}$. The above approach can be applied if we factor the joint density into the conditional of $y_{2t}$ given $y_{1t}$, and the marginal for $y_{1t}$ (or vice versa), and calculate the p.i.t. values for the conditional and marginal separately. Under the null that the predicted joint density is correct, the two sequences will each be iid $U(0,1)$, and the two sequences will themselves be independent. If we let the p.i.t. sequence for the conditional of $y_{2t}$ given $y_{1t}$ be $Z^{c}_{2|1,t}$, and for the marginal for $y_{1t}$ be $Z^{m}_{1,t}$, then Diebold et al. (1998, p. 881) propose applying the evaluation techniques to the $2n \times 1$ stacked vector $[Z^{c}_{2|1,1}, \ldots, Z^{c}_{2|1,n}, Z^{m}_{1,1}, \ldots, Z^{m}_{1,n}]'$ (or $[Z^{c}_{1|2,1}, \ldots, Z^{c}_{1|2,n}, Z^{m}_{2,1}, \ldots, Z^{m}_{2,n}]'$). Alternatively, Clements and Smith (2000) propose basing a test on the $n$ dimensional vector with typical element $\{Z^{j}_{2t} = Z^{c}_{2|1,t} \times Z^{m}_{1,t}\}$. They derive the distribution function for the 'product' for up to three variables (and indicate how it can be derived for any number of variables) and this can be used to transform the $Z^{j}_{2t}$ series to an iid $U(0,1)$ sequence under the null. In this paper, we also consider the ratio of the conditional and marginal p.i.t. values, with typical element $\{Z^{c}_{2|1,t} / Z^{m}_{1,t}\}$. In Appendix A, we derive the distribution function for the ratio, and report the derivation for the product for completeness. We establish in Section 3 that in some circumstances the ratio yields a test with good power.

Under the null the tests for the 'stacked' sequence which we denote 'S', and for the product and ratio, which we denote 'J1' and 'J2', respectively, are equivalent. When the mis-specification is of the correlation between the two variables, the S test is likely to lack power because it fails to preserve the temporal grouping of the $Z^{c}_{2|1,t}$ and $Z^{m}_{1,t}$ sequences, i.e., that $Z^{c}_{2|1,t_0}$ and $Z^{m}_{1,t_0}$ occur together. As we will show, in some circumstances the J1 and J2 tests will detect mis-specifications of the correlation because these jointly affect $Z^{c}_{2|1,t_0}$ and $Z^{m}_{1,t_0}$, and this will be preserved by the tests based on the product or ratio. When this form of mis-specification is absent, the S-test may fare better as it exploits twice as many observations (in the bivariate case).

3. Monte Carlo simulations

In this section we present some Monte Carlo power results for the KS statistics for both the stacked test (S) proposed in Diebold et al. (1998, 1999), and the joint tests (J1 & J2) in a bivariate system. We simulate random numbers from a variety of distributions.

3.1. Joint normal

The variables are joint normal:

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \right) \tag{2}$$

for $T = 25, 50, 70, 100, 200$, $\sigma_1 = \sigma_2 = 1$, $\mu_1 = \mu_2 = 0$ and for various values of $\rho$. The model or predicted densities are $N(0,1)$ for all the cases we consider for both processes, so that when the data are gaussian, as here, the marginal model densities are correctly-specified. Thus the sets of $\{z_t\}$ of size $T$ for each variable, obtained by evaluating (1) where $p_t(\cdot)$ is the standard normal, consist of iid $U(0,1)$ random variates. However, in using the product of the marginal densities to obtain the joint, $\rho = 0$ is effectively being assumed: that the conditional for $x_{1t}$ given $x_{2t}$ (or $x_{2t}$ given $x_{1t}$) is equal to the marginal for $x_{1t}$ (or $x_{2t}$). This is an example of mis-specification that affects only the correlation. Thus the two sets of $\{z_t\}$ will not be independent of each other. We calculate the KS statistics for the $2 \times T$ vector formed by stacking the two sets of $\{z_t\}$s, and for the $T$ dimensional vectors of products and ratios, using the distributions recorded in Appendix A. We then repeat this exercise until we have 1000 replications, from which the Monte Carlo estimates of the powers for the stacked (S) and joint (J1 and J2) tests of the adequacy of the predicted multivariate densities are simply the frequency of rejections
across replications. These are reported in Table 1 for a 5% significance level.² For each sample size $T$ the second column headed 'M' presents the power of the KS statistic for the marginal distribution of $x_1$.³ Irrespective of $\rho$ these ought to be approximately equal to size. The first column reports the results for the S test and the last two columns for the J1 and J2 tests. When $\rho = 0$ the random variables, $x_1$ and $x_2$, are independent, so the predicted densities are correctly specified, and the marginal, stacked and joint tests approximately return the nominal size. For all $T$ there is some evidence that the tests are slightly undersized (the 95% confidence interval is approximately 4–6%).

² The critical values of the KS test are from Miller (1956).

³ The KS statistics for the marginal of $x_2$ are basically identical.

There is clear asymmetry in the performance of both the S and J1 and J2 tests around $\rho = 0$. The power of the J1 test is markedly better for $\rho < 0$, and the J2 test powers are close to being the mirror image for $\rho > 0$. For $|\rho| \gg 0$ the maximum power of {J1, J2} is far greater than that of the S test. The power of the stacked test increases as $\rho$ increases, but for $T = 50$, for example, the maximum power at 13% for $\rho = 0.8$ is far less than that of J2 (64%).

The tests also perform differently with changes in the sample size. Whereas power increases with sample size for the J1 and J2 tests, it remains largely unaltered for the S test. A direct comparison of the two tests shows that in almost all cases for $|\rho| > 0.3$ the worst J test has more power than the S test, and the best is far superior.

Table 1
Monte Carlo comparison of the KS stacked (S) and joint (J1, J2) tests: normal distribution

| $\rho$ | S (T=25) | M (T=25) | J1 (T=25) | J2 (T=25) | S (T=50) | M (T=50) | J1 (T=50) | J2 (T=50) | S (T=70) | M (T=70) | J1 (T=70) | J2 (T=70) | S (T=100) | M (T=100) | J1 (T=100) | J2 (T=100) | S (T=200) | M (T=200) | J1 (T=200) | J2 (T=200) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.8 | 12.1 | 2.6 | 11.2 | 22.3 | 12.6 | 4.1 | 23.1 | 64.2 | 12.4 | 3.2 | 34.1 | 86.1 | 12.0 | 3.3 | 49.4 | 98.8 | 12.9 | 2.9 | 87.1 | 100.0 |
| 0.7 | 11.3 | 3.0 | 10.1 | 11.2 | 9.5 | 2.1 | 15.1 | 34.3 | 8.5 | 3.2 | 25.3 | 54.0 | 9.5 | 3.1 | 40.8 | 79.2 | 10.2 | 3.9 | 75.8 | 99.8 |
| 0.6 | 7.5 | 1.9 | 5.2 | 5.8 | 9.2 | 3.4 | 13.6 | 19.0 | 9.6 | 3.7 | 19.1 | 29.2 | 7.6 | 3.8 | 30.2 | 45.8 | 7.8 | 3.1 | 58.7 | 89.4 |
| 0.5 | 8.4 | 3.1 | 5.8 | 5.8 | 9.5 | 3.4 | 9.7 | 10.0 | 7.8 | 4.2 | 16.1 | 14.5 | 6.6 | 3.8 | 20.1 | 25.4 | 8.5 | 3.4 | 43.0 | 59.6 |
| 0.4 | 7.1 | 3.5 | 6.3 | 5.2 | 7.3 | 3.2 | 8.5 | 7.0 | 6.2 | 3.5 | 8.2 | 7.9 | 7.8 | 4.3 | 14.5 | 13.9 | 7.4 | 3.3 | 27.7 | 30.8 |
| 0.3 | 5.7 | 2.4 | 4.6 | 3.3 | 7.2 | 3.6 | 5.2 | 4.1 | 6.0 | 3.9 | 7.7 | 5.6 | 5.7 | 3.3 | 7.4 | 7.3 | 6.9 | 2.9 | 16.0 | 13.4 |
| 0.2 | 5.4 | 3.5 | 4.0 | 3.8 | 5.5 | 3.1 | 4.2 | 3.5 | 4.2 | 3.6 | 4.1 | 4.1 | 5.7 | 3.9 | 6.0 | 4.4 | 5.6 | 4.8 | 8.2 | 7.4 |
| 0.1 | 4.7 | 3.8 | 3.2 | 3.8 | 4.6 | 4.1 | 3.5 | 4.0 | 3.8 | 2.7 | 3.6 | 4.0 | 4.2 | 3.9 | 3.9 | 3.5 | 6.1 | 3.3 | 5.4 | 4.0 |
| 0.1 | 3.5 | 2.5 | 4.3 | 3.2 | 4.9 | 3.4 | 4.1 | 3.9 | 4.3 | 3.7 | 3.9 | 2.9 | 3.7 | 3.2 | 4.1 | 3.6 | 3.8 | 3.8 | 3.1 | 2.7 |
| 0.0 | 3.3 | 2.9 | 4.0 | 3.5 | 3.8 | 3.0 | 4.3 | 2.7 | 3.1 | 3.6 | 3.4 | 3.8 | 3.3 | 3.7 | 3.2 | 3.5 | 3.6 | 3.5 | 4.5 | 4.2 |
| -0.1 | 4.3 | 3.8 | 3.3 | 3.2 | 2.9 | 3.2 | 3.3 | 3.3 | 3.7 | 4.4 | 3.6 | 3.7 | 3.3 | 3.4 | 3.6 | 3.8 | 2.7 | 3.1 | 3.1 | 3.8 |
| -0.1 | 2.4 | 3.1 | 2.2 | 3.1 | 1.8 | 2.7 | 3.0 | 2.8 | 2.7 | 3.4 | 3.3 | 4.1 | 3.8 | 4.1 | 5.0 | 5.0 | 3.8 | 3.5 | 4.2 | 5.4 |
| -0.2 | 2.2 | 3.0 | 4.9 | 3.6 | 2.2 | 3.9 | 4.7 | 5.3 | 2.6 | 4.4 | 5.6 | 5.3 | 3.1 | 4.0 | 5.7 | 7.2 | 4.0 | 4.6 | 11.0 | 8.9 |
| -0.3 | 2.6 | 2.9 | 3.8 | 4.9 | 2.0 | 3.6 | 7.6 | 6.7 | 1.7 | 3.4 | 7.7 | 5.4 | 1.4 | 2.9 | 11.0 | 7.4 | 2.4 | 3.7 | 22.2 | 12.4 |
| -0.4 | 1.6 | 3.2 | 4.8 | 5.1 | 1.4 | 3.9 | 9.6 | 8.8 | 1.7 | 3.5 | 13.9 | 9.6 | 1.2 | 3.8 | 19.7 | 12.3 | 0.9 | 3.7 | 45.1 | 20.8 |
| -0.5 | 0.7 | 4.0 | 6.8 | 6.1 | 0.6 | 4.0 | 15.0 | 8.7 | 1.0 | 3.7 | 22.4 | 12.1 | 1.3 | 3.6 | 36.9 | 16.9 | 0.9 | 3.3 | 77.5 | 29.1 |
| -0.6 | 0.8 | 2.6 | 9.4 | 6.3 | 0.5 | 3.4 | 25.4 | 10.7 | 0.8 | 3.5 | 43.4 | 14.7 | 1.1 | 4.0 | 61.9 | 22.0 | 0.6 | 4.2 | 97.1 | 44.5 |
| -0.7 | 0.4 | 2.5 | 17.2 | 8.3 | 0.3 | 4.3 | 47.3 | 17.7 | 0.2 | 5.4 | 70.7 | 21.8 | 0.3 | 4.0 | 90.6 | 27.5 | 0.3 | 4.1 | 100.0 | 54.2 |
| -0.8 | 0.2 | 3.2 | 28.7 | 10.4 | 0.3 | 3.0 | 77.3 | 18.6 | 0.4 | 3.1 | 95.8 | 23.9 | 0.3 | 3.4 | 99.8 | 35.7 | 0.3 | 3.7 | 100.0 | 66.6 |

3.2. Ramberg

Following Joe (1997), we use the bivariate gaussian copula to generate alternative bivariate distributions. These have well defined, correlated marginal distributions, as required. Assuming the bivariate vector $x$ is given by (2) with $\sigma_1 = \sigma_2 = 1$, $\mu_1 = \mu_2 = 0$, and correlation $\rho$, we construct correlated uniform random variates as $p_j = \Phi(x_j)$, $j = 1, 2$, where $\Phi$ is the gaussian cdf. Applying the inverse cumulative distribution function of the desired distribution ($F^{-1}$) to the uniform random variates, we obtain drawings from bivariate distributions with correlated marginals, $w_j = F^{-1}(p_j)$, $j = 1, 2$. We then proceed as above, that is, we calculate the probability integral transforms for the simulated data using $N(0,1)$ distributions, and calculate the KS statistics for a null of a uniform distribution.

One of the distributions we use is the Ramberg distribution, see Ramberg, Dudewicz, Tadikamalla, and Mykytka (1979), which is defined by the percentile function:

$$R(p) = \lambda_1 + \left[ p^{\lambda_3} - (1-p)^{\lambda_4} \right] / \lambda_2, \qquad 0 \le p \le 1$$

and a density function given by:

$$f(R(p)) = \lambda_2 \left[ \lambda_3 p^{\lambda_3 - 1} + \lambda_4 (1-p)^{\lambda_4 - 1} \right]^{-1}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ control the skewness and kurtosis. We report results for a kurtosis of 3 and skewness ranging from 0.1 to 0.7. The distribution is scaled to have a zero mean and unit variance, as are the other non-gaussian distributions we consider, and therefore match the model densities (gaussian zero-mean unit-variance) in this respect.

Table 2 indicates that the power of the S test is increasing in $T$, the degree of skewness, and $\rho$ (as $\rho$ increases from $-0.8$ to $0.8$). For small values of $|\rho|$, the S test is better than J1 and J2, but one of J1 and J2 clearly dominates when there is correlation. The power for the marginal p.i.t. is increasing in the skewness parameter but does not depend on $\rho$ as expected.

3.3. Student's t

Table 3 indicates that the powers of the S and J tests are not very sensitive to mis-specifications of the degree of kurtosis of the magnitude considered here.

3.4. Uniform

Table 3 indicates that both the J1 and J2 tests have good power for large values of $\rho$ of both signs.

4. VAR model for output and unemployment

Clements and Smith (2000) compare the density forecasts derived from four models for US GNP ($x$) growth and changes in unemployment ($u$) using data over the period 1948:1–1993:4. They considered linear autoregressive (AR) models, non-linear self-exciting threshold AR (SETAR) models, a vector autoregressive (VAR) model, and a nonlinear VAR model of the form outlined by Pesaran and Potter (1997). In this section we compare the S and J tests by investigating the performance of the two univariate models of $x$ and $u$. Both the VAR and NVAR models yield an estimated correlation between the errors of around $-0.64$, suggesting that joint densities formed as the product of the univariate densities for $x$ and $u$ will be mis-specified. Because the correlation is negative, the J test reported here is taken to be J1 (the product). Fig. 1 presents the KS tests for the one-step forecast densities from the AR and SETAR models, which were estimated recursively from 1948:1–1974:4 until 1948:1–1992:1. Using 90% confidence intervals it is apparent that the univariate SETAR density forecasts are not rejected for either $x$ or $u$. In contrast, the linear AR model is clearly rejected for $x$ and the result for $u$ is marginal. The J test clearly rejects both the SETAR and AR model densities, whereas the S test only rejects the AR model. Interpreted in the light of the Monte Carlo evidence, we attribute these outcomes to the greater power of the J1 test.
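As a rough illustration of the S, J1 and J2 comparison discussed above, the following Python sketch applies the three KS tests to simulated p.i.t. values. It is not the authors' code. NumPy and SciPy are assumed, the function names (`product_cdf`, `ratio_cdf`, `sj_tests`) are hypothetical, the data are simulated bivariate normals with $\rho = -0.64$ evaluated against independent N(0,1) model densities rather than the GNP and unemployment series, and SciPy's KS p-values stand in for the Miller (1956) critical values. The product and ratio distribution functions are the ones derived in Appendix A.

```python
import numpy as np
from scipy import stats


def product_cdf(z):
    """CDF of the product of two independent U(0,1) variables: z - z*ln(z)."""
    z = np.clip(np.asarray(z, dtype=float), 1e-300, 1.0)  # avoid log(0)
    return z - z * np.log(z)


def ratio_cdf(z):
    """CDF of the ratio of two independent U(0,1) variables."""
    z = np.asarray(z, dtype=float)
    safe = np.maximum(z, 1.0)  # keeps the unused branch free of tiny divisors
    return np.where(z < 1.0, z / 2.0, 1.0 - 1.0 / (2.0 * safe))


def sj_tests(z_cond, z_marg):
    """KS p-values for the stacked (S), product (J1) and ratio (J2) tests."""
    z_cond = np.asarray(z_cond, dtype=float)
    z_marg = np.asarray(z_marg, dtype=float)
    stacked = np.concatenate([z_cond, z_marg])  # 2n x 1 vector, pairing lost
    return {
        "S": stats.kstest(stacked, "uniform").pvalue,
        "J1": stats.kstest(product_cdf(z_cond * z_marg), "uniform").pvalue,
        "J2": stats.kstest(ratio_cdf(z_cond / z_marg), "uniform").pvalue,
    }


# Assumed illustration: correlated data, model densities that ignore the correlation.
rng = np.random.default_rng(0)
rho, n = -0.64, 200
x = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
z1 = stats.norm.cdf(x[:, 0])  # p.i.t. under the N(0,1) marginal for x1
z2 = stats.norm.cdf(x[:, 1])  # "conditional" p.i.t. when rho = 0 is assumed
print(sj_tests(z2, z1))
```

Because the product and ratio are first mapped through their null distribution functions, all three tests reduce to a KS test against U(0,1); they differ only in how the pairs are combined.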

Table 2
Monte Carlo comparison of the KS stacked (S) and joint (J1, J2) tests: Ramberg distribution

| $\rho$ | Skew | S (T=25) | M (T=25) | J1 (T=25) | J2 (T=25) | S (T=50) | M (T=50) | J1 (T=50) | J2 (T=50) | S (T=70) | M (T=70) | J1 (T=70) | J2 (T=70) | S (T=100) | M (T=100) | J1 (T=100) | J2 (T=100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -0.8 | 0.1 | 0.5 | 2.7 | 35.9 | 10.1 | 0.2 | 3.3 | 83.0 | 17.7 | 0.2 | 2.2 | 95.6 | 23.9 | 0.6 | 4.0 | 100.0 | 33.8 |
| -0.4 | 0.1 | 1.6 | 1.8 | 5.0 | 5.1 | 1.9 | 3.7 | 11.6 | 7.2 | 2.5 | 4.2 | 17.3 | 9.4 | 2.0 | 4.7 | 26.4 | 13.8 |
| 0.0 | 0.1 | 2.9 | 4.1 | 3.2 | 2.2 | 3.8 | 2.7 | 3.2 | 2.4 | 4.8 | 4.1 | 4.5 | 3.6 | 3.8 | 3.3 | 3.5 | 2.9 |
| 0.4 | 0.1 | 6.4 | 3.4 | 3.6 | 3.8 | 8.2 | 5.7 | 7.3 | 6.0 | 6.1 | 3.4 | 9.7 | 8.1 | 9.0 | 4.4 | 13.2 | 14.2 |
| 0.8 | 0.1 | 9.9 | 2.7 | 7.4 | 17.8 | 13.2 | 3.5 | 22.4 | 63.4 | 12.5 | 4.6 | 28.9 | 89.2 | 14.3 | 4.5 | 46.3 | 98.8 |
| -0.8 | 0.2 | 0.3 | 3.0 | 41.9 | 12.0 | 0.6 | 3.8 | 87.6 | 19.3 | 0.5 | 6.4 | 98.5 | 31.2 | 0.6 | 4.5 | 100.0 | 41.4 |
| -0.4 | 0.2 | 1.6 | 5.2 | 7.0 | 7.7 | 2.9 | 4.1 | 14.2 | 9.4 | 2.7 | 4.5 | 20.1 | 9.7 | 3.4 | 6.9 | 32.8 | 13.6 |
| 0.0 | 0.2 | 4.9 | 3.8 | 4.1 | 2.8 | 4.6 | 3.7 | 4.7 | 4.6 | 6.8 | 4.8 | 5.5 | 3.8 | 6.4 | 6.2 | 5.7 | 3.8 |
| 0.4 | 0.2 | 7.3 | 3.2 | 4.3 | 2.8 | 8.8 | 4.5 | 5.7 | 5.7 | 8.5 | 4.4 | 6.1 | 10.6 | 10.6 | 6.4 | 8.5 | 12.1 |
| 0.8 | 0.2 | 13.1 | 4.0 | 9.0 | 19.6 | 13.6 | 4.9 | 16.0 | 64.6 | 14.6 | 5.4 | 25.1 | 88.8 | 15.7 | 7.6 | 41.8 | 98.9 |
| -0.8 | 0.3 | 1.0 | 4.5 | 45.6 | 12.4 | 0.8 | 5.2 | 91.6 | 18.9 | 1.7 | 6.8 | 99.7 | 32.6 | 2.6 | 7.5 | 100.0 | 41.7 |
| -0.4 | 0.3 | 2.7 | 5.0 | 7.7 | 6.2 | 4.0 | 5.1 | 15.7 | 10.2 | 4.9 | 6.7 | 29.3 | 10.7 | 6.6 | 7.9 | 40.5 | 15.3 |
| 0.0 | 0.3 | 6.2 | 3.6 | 4.3 | 3.6 | 7.4 | 5.5 | 5.6 | 3.5 | 7.8 | 6.7 | 5.3 | 4.0 | 10.0 | 7.8 | 7.0 | 4.0 |
| 0.4 | 0.3 | 10.0 | 5.6 | 4.3 | 2.2 | 10.2 | 6.2 | 5.4 | 7.0 | 12.9 | 7.5 | 7.4 | 7.5 | 12.8 | 6.9 | 7.7 | 14.3 |
| 0.8 | 0.3 | 13.5 | 5.5 | 7.0 | 20.2 | 17.5 | 6.5 | 18.1 | 65.7 | 15.1 | 6.8 | 25.2 | 90.7 | 18.8 | 8.1 | 37.9 | 99.3 |
| -0.8 | 0.4 | 0.9 | 6.0 | 52.1 | 13.6 | 1.6 | 6.2 | 94.7 | 23.8 | 2.3 | 9.7 | 99.7 | 30.7 | 5.4 | 9.1 | 100.0 | 45.5 |
| -0.4 | 0.4 | 4.3 | 5.1 | 9.8 | 7.5 | 6.0 | 8.3 | 23.7 | 9.5 | 7.6 | 8.5 | 33.9 | 11.2 | 10.7 | 9.7 | 48.5 | 16.1 |
| 0.0 | 0.4 | 7.5 | 5.5 | 5.4 | 3.6 | 9.5 | 6.1 | 5.8 | 2.8 | 12.2 | 8.1 | 7.9 | 3.9 | 15.4 | 10.6 | 10.3 | 4.3 |
| 0.4 | 0.4 | 10.2 | 5.3 | 3.8 | 3.8 | 14.4 | 8.0 | 4.6 | 4.9 | 14.6 | 8.6 | 6.8 | 7.8 | 17.3 | 10.2 | 7.7 | 12.1 |
| 0.8 | 0.4 | 17.2 | 5.3 | 6.2 | 18.8 | 20.3 | 7.5 | 17.5 | 66.1 | 20.9 | 8.7 | 23.0 | 92.3 | 23.6 | 8.7 | 40.2 | 99.7 |
| -0.8 | 0.5 | 1.9 | 6.6 | 58.8 | 12.0 | 3.3 | 8.6 | 97.3 | 24.3 | 6.1 | 10.2 | 99.8 | 35.7 | 10.6 | 12.5 | 100.0 | 50.9 |
| -0.4 | 0.5 | 5.8 | 6.0 | 12.9 | 6.4 | 10.4 | 10.0 | 29.6 | 8.6 | 10.6 | 11.0 | 41.9 | 11.3 | 18.6 | 15.5 | 62.6 | 18.7 |
| 0.0 | 0.5 | 7.8 | 5.8 | 5.4 | 4.2 | 14.6 | 8.9 | 8.0 | 5.1 | 15.6 | 12.3 | 9.0 | 4.6 | 21.8 | 13.4 | 10.8 | 5.0 |
| 0.4 | 0.5 | 12.1 | 6.7 | 4.4 | 3.4 | 17.2 | 8.8 | 4.5 | 6.4 | 20.4 | 10.1 | 7.5 | 8.4 | 27.8 | 15.5 | 7.1 | 10.5 |
| 0.8 | 0.5 | 17.6 | 6.4 | 8.3 | 21.2 | 20.5 | 8.4 | 14.2 | 70.2 | 22.8 | 10.2 | 23.7 | 94.7 | 29.5 | 12.4 | 34.2 | 99.7 |
| -0.8 | 0.6 | 2.4 | 8.3 | 65.2 | 13.6 | 7.1 | 13.1 | 98.4 | 29.5 | 10.4 | 13.3 | 99.9 | 36.4 | 21.9 | 17.6 | 100.0 | 51.5 |
| -0.4 | 0.6 | 7.9 | 7.4 | 16.2 | 6.7 | 12.4 | 11.9 | 36.6 | 12.7 | 17.8 | 15.7 | 49.3 | 13.7 | 27.9 | 17.3 | 70.2 | 17.9 |
| 0.0 | 0.6 | 10.2 | 8.2 | 6.7 | 3.9 | 16.7 | 12.0 | 8.9 | 4.4 | 21.4 | 15.3 | 12.3 | 3.9 | 33.1 | 18.9 | 17.9 | 4.3 |
| 0.4 | 0.6 | 17.0 | 7.9 | 5.6 | 3.2 | 21.6 | 10.4 | 6.3 | 4.2 | 23.8 | 12.7 | 7.8 | 7.5 | 34.0 | 17.4 | 8.7 | 12.8 |
| 0.8 | 0.6 | 23.1 | 7.8 | 9.1 | 20.0 | 29.1 | 13.0 | 14.3 | 76.9 | 27.7 | 11.5 | 22.6 | 96.9 | 41.2 | 22.1 | 34.2 | 100.0 |
| -0.8 | 0.7 | 5.0 | 11.4 | 74.2 | 17.3 | 13.8 | 15.4 | 99.2 | 26.9 | 24.8 | 19.9 | 100.0 | 42.4 | 41.2 | 25.5 | 100.0 | 56.2 |
| -0.4 | 0.7 | 9.6 | 8.6 | 20.0 | 7.0 | 20.8 | 14.6 | 42.7 | 10.3 | 29.8 | 20.3 | 61.1 | 17.5 | 43.9 | 24.8 | 83.1 | 20.9 |
| 0.0 | 0.7 | 14.3 | 9.3 | 8.9 | 3.3 | 26.0 | 13.6 | 14.8 | 3.4 | 32.0 | 21.6 | 18.4 | 5.2 | 45.8 | 26.3 | 25.1 | 4.1 |
| 0.4 | 0.7 | 20.0 | 9.9 | 5.8 | 2.6 | 29.3 | 15.1 | 8.3 | 5.1 | 36.9 | 18.4 | 8.5 | 6.9 | 46.4 | 23.7 | 11.0 | 12.4 |
| 0.8 | 0.7 | 24.8 | 9.7 | 8.3 | 22.6 | 30.7 | 16.6 | 17.3 | 81.7 | 38.5 | 19.4 | 24.3 | 98.2 | 46.1 | 22.6 | 37.8 | 100.0 |
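The data generation behind Table 2 (a gaussian copula combined with the Ramberg percentile function, then p.i.t. values computed under N(0,1) model densities) might be sketched in Python as follows. This is only an illustration under stated assumptions: NumPy and SciPy are assumed, the helper names are hypothetical, and the $\lambda$ values are arbitrary placeholders, not the ones that produce the skewness and kurtosis settings reported in the paper. The standardisation to zero mean and unit variance is done numerically here, which is one possible way of achieving the scaling the text describes.

```python
import numpy as np
from scipy import stats


def ramberg_quantile(p, lam1, lam2, lam3, lam4):
    """Ramberg percentile function R(p) = lam1 + (p**lam3 - (1-p)**lam4) / lam2."""
    p = np.asarray(p, dtype=float)
    return lam1 + (p**lam3 - (1.0 - p) ** lam4) / lam2


def standardised_quantile(lams, grid_size=100_000):
    """Return the quantile function rescaled to zero mean and unit variance."""
    grid = (np.arange(grid_size) + 0.5) / grid_size  # midpoint rule on (0, 1)
    values = ramberg_quantile(grid, *lams)
    mean = values.mean()                             # approximates E[R(U)]
    sd = np.sqrt((values**2).mean() - mean**2)       # approximates sd of R(U)
    return lambda p: (ramberg_quantile(p, *lams) - mean) / sd


def copula_draws(n, rho, quantile, rng):
    """Draw n pairs with a gaussian copula and the given marginal quantile function."""
    x = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    p = stats.norm.cdf(x)   # correlated U(0,1) variates, p_j = Phi(x_j)
    return quantile(p)      # w_j = F^{-1}(p_j)


# Placeholder lambdas (assumption): NOT the values used for Table 2.
ILLUSTRATIVE_LAMS = (0.0, 0.2, 0.1, 0.2)

rng = np.random.default_rng(1)
w = copula_draws(100, 0.8, standardised_quantile(ILLUSTRATIVE_LAMS), rng)
z = stats.norm.cdf(w)  # p.i.t. values under the N(0,1) model densities
print(stats.kstest(z[:, 0], "uniform"))  # marginal ('M') test for the first variable
```

The resulting `z` columns can then be stacked, multiplied or divided to form the S, J1 and J2 statistics in the same way as for the joint normal case.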

Table 3
Monte Carlo comparison of the KS stacked (S) and joint (J1, J2) tests: Student's t-test and uniform distributions

Student's t

| $\rho$ | dof | S (T=25) | M (T=25) | J1 (T=25) | J2 (T=25) | S (T=50) | M (T=50) | J1 (T=50) | J2 (T=50) | S (T=70) | M (T=70) | J1 (T=70) | J2 (T=70) | S (T=100) | M (T=100) | J1 (T=100) | J2 (T=100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -0.8 | 20.0 | 0.3 | 2.4 | 30.4 | 9.4 | 0.2 | 3.0 | 78.3 | 14.3 | 0.1 | 4.6 | 96.5 | 23.8 | 0.4 | 3.1 | 99.6 | 30.8 |
| -0.4 | 20.0 | 1.1 | 3.6 | 4.6 | 7.1 | 1.7 | 3.5 | 10.2 | 7.1 | 1.8 | 3.2 | 12.1 | 8.0 | 1.0 | 4.5 | 20.2 | 9.6 |
| 0.0 | 20.0 | 3.3 | 3.2 | 3.6 | 2.9 | 3.1 | 2.9 | 3.6 | 4.4 | 4.7 | 3.0 | 4.0 | 3.4 | 3.8 | 4.1 | 3.5 | 3.7 |
| 0.4 | 20.0 | 6.2 | 2.2 | 4.4 | 3.0 | 6.7 | 3.2 | 7.2 | 6.9 | 6.3 | 2.9 | 7.8 | 11.8 | 7.2 | 3.4 | 12.7 | 14.1 |
| 0.8 | 20.0 | 11.3 | 2.8 | 9.2 | 20.1 | 12.4 | 3.4 | 18.1 | 66.7 | 12.3 | 3.7 | 31.5 | 88.8 | 13.1 | 5.2 | 46.4 | 98.5 |
| -0.8 | 10.0 | 0.2 | 3.6 | 29.3 | 9.1 | 0.5 | 3.1 | 83.4 | 12.8 | 0.4 | 3.4 | 97.0 | 20.3 | 0.7 | 4.2 | 100.0 | 24.2 |
| -0.4 | 10.0 | 1.6 | 3.4 | 3.6 | 4.6 | 1.1 | 3.6 | 8.1 | 7.1 | 1.4 | 3.9 | 14.2 | 7.3 | 2.5 | 3.7 | 21.2 | 8.0 |
| 0.0 | 10.0 | 4.9 | 2.2 | 3.6 | 3.3 | 4.2 | 3.0 | 3.9 | 2.7 | 3.4 | 4.1 | 2.8 | 3.4 | 5.2 | 4.6 | 4.4 | 3.7 |
| 0.4 | 10.0 | 7.5 | 3.8 | 4.8 | 3.1 | 7.6 | 4.3 | 7.0 | 8.0 | 8.2 | 4.6 | 8.6 | 11.2 | 9.4 | 5.4 | 12.5 | 19.4 |
| 0.8 | 10.0 | 12.2 | 3.6 | 8.0 | 21.1 | 14.3 | 4.5 | 19.5 | 68.5 | 12.9 | 3.8 | 29.9 | 91.9 | 14.6 | 3.4 | 44.1 | 99.1 |
| -0.8 | 7.0 | 0.4 | 3.1 | 29.3 | 8.3 | 0.4 | 3.0 | 84.9 | 12.1 | 0.8 | 3.3 | 98.8 | 14.5 | 2.3 | 4.1 | 100.0 | 18.2 |
| -0.4 | 7.0 | 2.1 | 3.4 | 3.9 | 5.1 | 2.4 | 4.2 | 9.8 | 5.9 | 2.0 | 4.5 | 13.7 | 6.2 | 4.4 | 4.7 | 22.8 | 7.0 |
| 0.0 | 7.0 | 4.0 | 2.8 | 3.4 | 3.2 | 3.8 | 3.7 | 2.0 | 3.1 | 5.2 | 4.3 | 4.3 | 3.9 | 5.6 | 4.9 | 5.3 | 4.4 |
| 0.4 | 7.0 | 5.9 | 2.6 | 3.5 | 4.7 | 8.8 | 3.0 | 6.5 | 8.3 | 8.6 | 5.3 | 8.8 | 13.0 | 11.6 | 5.4 | 12.4 | 20.2 |
| 0.8 | 7.0 | 13.1 | 3.2 | 7.2 | 22.7 | 14.0 | 4.5 | 17.9 | 72.6 | 13.8 | 4.5 | 25.6 | 94.6 | 17.4 | 5.1 | 44.6 | 99.5 |
| -0.8 | 4.0 | 1.4 | 3.1 | 38.0 | 4.6 | 5.8 | 7.4 | 93.3 | 5.9 | 11.0 | 10.4 | 99.9 | 8.2 | 18.9 | 12.4 | 100.0 | 7.8 |
| -0.4 | 4.0 | 4.1 | 3.5 | 5.6 | 2.3 | 9.2 | 5.7 | 14.6 | 4.2 | 12.8 | 8.5 | 24.3 | 3.6 | 27.7 | 14.2 | 45.0 | 5.9 |
| 0.0 | 4.0 | 6.3 | 3.3 | 3.0 | 3.2 | 14.6 | 6.5 | 5.1 | 3.8 | 18.7 | 8.9 | 7.5 | 4.3 | 30.6 | 13.4 | 12.0 | 7.3 |
| 0.4 | 4.0 | 9.8 | 3.7 | 5.2 | 5.1 | 17.4 | 4.9 | 8.2 | 12.5 | 24.9 | 8.3 | 12.9 | 25.5 | 41.3 | 14.4 | 21.1 | 38.2 |
| 0.8 | 4.0 | 15.3 | 3.5 | 5.9 | 32.5 | 27.7 | 6.2 | 15.3 | 86.4 | 31.0 | 9.2 | 23.7 | 98.5 | 44.8 | 15.1 | 35.9 | 100.0 |

Uniform

| $\rho$ | S (T=25) | M (T=25) | J1 (T=25) | J2 (T=25) | S (T=50) | M (T=50) | J1 (T=50) | J2 (T=50) | S (T=70) | M (T=70) | J1 (T=70) | J2 (T=70) | S (T=100) | M (T=100) | J1 (T=100) | J2 (T=100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -0.8 | 2.8 | 7.2 | 36.2 | 21.5 | 8.5 | 11.1 | 71.5 | 45.7 | 15.3 | 15.2 | 90.0 | 65.6 | 27.9 | 21.9 | 99.6 | 84.4 |
| -0.4 | 7.1 | 7.2 | 11.6 | 8.9 | 17.0 | 11.1 | 23.5 | 17.1 | 23.8 | 14.1 | 32.8 | 24.1 | 39.5 | 25.2 | 44.3 | 38.7 |
| 0.0 | 11.5 | 7.2 | 7.6 | 4.1 | 23.9 | 12.3 | 12.2 | 5.2 | 31.4 | 17.3 | 15.9 | 5.6 | 42.4 | 20.0 | 17.2 | 5.5 |
| 0.4 | 14.7 | 7.3 | 7.1 | 4.9 | 27.1 | 14.2 | 17.6 | 5.8 | 35.2 | 16.5 | 24.4 | 8.0 | 50.4 | 21.8 | 39.1 | 13.9 |
| 0.8 | 22.1 | 6.5 | 20.3 | 18.8 | 34.2 | 12.0 | 49.3 | 59.2 | 42.6 | 15.6 | 68.4 | 86.7 | 56.2 | 22.8 | 86.7 | 98.3 |

5. Simulation results for output and unemployment

To investigate the empirical example further, we simulated data from estimated AR(2) and VAR(2) models for $x$ and $u$. The estimated parameters and some basic summary statistics for the VAR(2) are presented in Table 4. The correlation between the error terms is $-0.64$. In experiment (1), we simulated data from univariate AR(2) models for $x$ and $u$ ignoring the correlation in the errors (so $\rho = 0$). We
estimate correctly-specified models for the pro- identical to (2) and (3) except thatr 5 0.64.cesses, and the inflation of the rejection fre- The J2 test is now better than the S test, and forquencies evident in Table 5 is therefore due to experiment (4) rejects the AR models as beingthe effects of parameter estimation. In experi- mis-specified nearly half the time.ment (2) the DGP is an AR(2) for both vari-ables and we setr 5 2 0.64. When AR(2)models are estimated the power of the J1 test is6. Conclusionsmarkedly better than the S test, which is in tunewith the rejection on the J but not the S-test in Stacking the p.i.t. values from the conditionalthe empirical section. In experiment (3), the and marginal densities fails to preserve theDGP is a VAR(2) with r 5 20.64 and we temporal pairing, so that KS tests of uniformityapproximate this by univariate AR(4) models. may fail to reject the null for mis-specificationsThe J1 test has far higher rejection frequencies that affect the temporal pairings, such as gettingthan the S test. Experiments (4) and (5) are the contemporaneous correlation in the pro-

Page 8: Evaluating multivariate forecast densities: a comparison of two approaches

404 M.P. Clements, J. Smith / International Journal of Forecasting 18 (2002) 397–407

Fig. 1. Empirical cdfs for the p.i.t. values for AR and SETAR model forecast forx and u.

Page 9: Evaluating multivariate forecast densities: a comparison of two approaches

M.P. Clements, J. Smith / International Journal of Forecasting 18 (2002) 397–407 405

Table 4 the form of mis-specification, there is a goodVAR(2) model for US GNP and unemployment case for employing a battery of tests, just asVariable GNP (x) Unemployment (u) standard econometrics packages typically report

a number of tests of serial correlation of aCoefficient t-prob Coefficient t-probregression model’s errors (Durbin Watson,

x 0.172 0.076 20.115 0.0001 Box–Pierce test based on squared autocorrela-x 0.266 0.008 20.827 0.0102 tions, Lagrange–Multiplier tests etc.).u 20.954 0.002 0.556 0.0001

Finally, it is customary to test for indepen-u 1.064 0.000 20.367 0.0002

Constant 0.425 0.001 0.158 0.000 dence of the p.i.t.s in addition to running the KStest for uniformity: that we have not done so is

2s 0.854 0.087 because tests for autocorrelation in the p.i.t.sSC 0.518 0.096

(and powers thereof) would lack power for ther 20.640type of mis-specification considered in this

SC is theP value corresponding to an LM test for fourth-order paper.autocorrelation in the equations’ residuals.

Table 5AcknowledgementsMonte Carlo comparison of the KS stacked and joint tests:

AR and VAR modelsWe are grateful to Martin Cripps, Mike Pitt

Exp. no. Marg. Marg. S J1 J2and to an anonymous referee and an Associate

(1) 10.4 10.1 10.2 10.3 11.0 Editor of thisJournal for helpful comments and(2) 10.4 10.7 2.4 52.6 28.7 suggestions. The first author acknowledges(3) 7.3 10.4 8.0 43.1 20.3

financial support under ESRC grant(4) 10.4 9.5 18.9 27.6 44.0L138251009.(5) 15.3 1.9 9.0 13.8 26.9

(1) DGP is AR(2) for bothu andx, and the forecasts aregenerated from estimated AR(2) models; (2) DGP is

Appendix AAR(2) for bothu andx, with r 5 2 0.64, and the forecastsare generated from estimated AR(2) models; (3) DGP is

Let Z and Z be independentU 0,1 randomVAR(2) for both u and x, with r 5 2 0.64, and the s d1 2forecasts are generated from estimated AR(4) models; (4) variables. We begin by deriving the distributionDGP is AR(2) for bothu and x, with r 50.64, and the function for their product. Because of indepen-forecasts are generated from estimated AR(4) models; (5) dence, the joint distribution functionF is theZ Z1 2DGP is VAR(2) for bothu and x, with r 5 0.64, and the

product of the distribution functions ofZ andforecasts are generated from estimated AR(4) models. 1

Z , F z ,z 5 z z , and f 5 f f 51.s d2 Z Z 1 2 1 2 Z Z Z Z1 2 1 2 1 2

Using a change of variables:cesses wrong. We show via Monte Carlo and inan empirical example that tests based on simple *Z 5 Z Z1 1 2transformations of the pairs (e.g. products or *Z 5 Z2 2ratios) can deliver much more powerful tests inthese circumstances. For other types of mis- for which the determinant of the Jacobian forspecification, such as when the model density is the inverse transformation is:of the wrong form (but the first two moments

*Z1 1match that of the DGP), and in the absence of ≠ Z ,Z 1s d ] ]]21 2 2*Z]]] ]J 5 det 5 5*Z2correlation, the S-test may be preferred. Be- 2* * *≠ Z ,Z Zs d * *1 2 2cause the relative powers of the tests depend on 0 1

Page 10: Evaluating multivariate forecast densities: a comparison of two approaches

406 M.P. Clements, J. Smith / International Journal of Forecasting 18 (2002) 397–407

The joint density function of $(Z_1^*, Z_2^*)$ is then:

$$f_{Z_1^* Z_2^*} = f_{Z_1 Z_2}\left(\frac{Z_1^*}{Z_2^*}, Z_2^*\right) \times \frac{1}{Z_2^*} = \frac{1}{Z_2^*} \qquad (3)$$

where $0 < Z_1^* < Z_2^* < 1$.

Since $Z_1^*$ is the random variable of interest, integrating $Z_2^*$ out of $f_{Z_1^* Z_2^*}$ over the permissible range gives:

$$f_{Z_1^*} = \int_{Z_1^*}^{1} \frac{1}{Z_2^*} \, dZ_2^* = \left[ \ln Z_2^* \right]_{Z_1^*}^{1} = -\ln Z_1^*$$

The distribution function is:

$$F_{Z_1^*} = Z_1^* - Z_1^* \ln Z_1^*, \quad 0 < Z_1^* < 1$$

The probability integral transform of $Z_1^*$ with respect to $f_{Z_1^*}$, i.e.:

$$\{z_t^*\}_{t=1}^{n} = \left\{ \int_0^{Z_{1t}^*} f_{Z_1^*}(u) \, du \right\}_{t=1}^{n} \qquad (4)$$

where $Z_{1t}^* = Z_{1t} \times Z_{2t}$, yields an iid $U(0,1)$ sequence under the null. Thus, we need simply to evaluate $F_{Z_1^*}$ at $\{Z_{1t} \times Z_{2t}\}_{t=1}^{n}$ and plot the empirical distribution function, which will be the 45° line under the null. The confidence intervals are again based on Miller (1956).

Now consider the ratio. Again using a change of variables:

$$Z_1^* = Z_1 / Z_2$$
$$Z_2^* = Z_2$$

for which the determinant of the Jacobian for the inverse transformation is:

$$J = \det \frac{\partial (Z_1, Z_2)}{\partial (Z_1^*, Z_2^*)} = \begin{vmatrix} Z_2^* & Z_1^* \\ 0 & 1 \end{vmatrix} = Z_2^*$$

The joint density function of $(Z_1^*, Z_2^*)$ is then:

$$f_{Z_1^* Z_2^*} = f_{Z_1 Z_2}\left(Z_1^* Z_2^*, Z_2^*\right) \times Z_2^* = Z_2^* \qquad (5)$$

where $0 < Z_2^* < 1$, $0 < Z_1^* < \infty$. Because $Z_1 = Z_1^* Z_2^* < 1$, the permissible range of $Z_2^*$ is $(0, 1)$ when $Z_1^* < 1$ and $(0, 1/Z_1^*)$ when $Z_1^* > 1$.

Since $Z_1^*$ is the random variable of interest, integrating $Z_2^*$ out of $f_{Z_1^* Z_2^*}$ over the permissible range gives:

$$f_{Z_1^*} = \int_0^1 Z_2^* \, dZ_2^* = \frac{1}{2}$$

when $Z_1^* < 1$, or:

$$f_{Z_1^*} = \int_0^{1/Z_1^*} Z_2^* \, dZ_2^* = \frac{1}{2 (Z_1^*)^2}$$

when $Z_1^* > 1$. The distribution function is:

$$F_{Z_1^*} = \begin{cases} \dfrac{Z_1^*}{2}, & 0 < Z_1^* < 1 \\[2mm] 1 - \dfrac{1}{2 Z_1^*}, & 1 < Z_1^* < \infty \end{cases}$$

References

Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review 39, 841–862.

Clements, M. P., & Hendry, D. F. (1993). On the limitations of comparing mean squared forecast errors. Journal of Forecasting 12, 617–637, with discussion.

Clements, M. P., & Smith, J. (2000). Evaluating the forecast densities of linear and non-linear models: Applications to output growth and unemployment. Journal of Forecasting 19, 255–276.

Clements, M. P., & Taylor, N. (2000). Evaluating prediction intervals for high-frequency data, Department of Economics, University of Warwick, mimeo.

Dawid, A. P. (1984). Statistical theory: The prequential approach. Journal of The Royal Statistical Society, Series A 147, 278–292.

Diebold, F. X., Gunther, T. A., & Tay, A. S. (1998). Evaluating density forecasts: With applications to financial risk management. International Economic Review 39, 863–883.

Diebold, F. X., Hahn, J. Y., & Tay, A. S. (1999). Multivariate density forecast evaluation and calibration in financial risk management: High frequency returns on foreign exchange. Review of Economics and Statistics 81, 661–673.

Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics 13, 253–263.


Granger, C. W. J., White, H., & Kamstra, M. (1989). Interval forecasting: An analysis based upon ARCH-quantile estimators. Journal of Econometrics 40, 87–96.

Joe, H. (1997). Multivariate models and dependence concepts, Chapman and Hall, London.

Miller, L. H. (1956). Table of percentage points of Kolmogorov statistics. Journal of the American Statistical Association 51, 111–121.

Newbold, P., & Harvey, D. I. (2001). Forecasting combination and encompassing. Mimeo, School of Economics, University of Nottingham. In: Clements, M. P., & Hendry, D. F. (Eds.), A companion to economic forecasting, Blackwell, London, Forthcoming.

Noceti, P., Smith, J., & Hodges, S. (2001). An evaluation of tests of distributional forecasts, Department of Economics, University of Warwick, mimeo.

Pesaran, M. H., & Potter, S. M. (1997). A floor and ceiling model of US output. Journal of Economic Dynamics and Control 21, 661–695.

Ramberg, J., Dudewicz, E., Tadikamalla, P., & Mykytka, A. (1979). A probability distribution and its uses in fitting data. Technometrics 21, 201–209.

Rosenblatt, M. (1952). Remarks on a multivariate transformation. Annals of Mathematical Statistics 23, 470–472.

Tay, A. S., & Wallis, K. F. (2001). Density forecasting: a survey. Mimeo, Department of Economics, University of Warwick. In: Clements, M. P., & Hendry, D. F. (Eds.), A companion to economic forecasting, Blackwell, London, Forthcoming.

West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica 64, 1067–1084.

West, K. D., & McCracken, M. W. (1998). Regression-based tests of predictive ability. International Economic Review 39, 817–840.

West, K. D., & McCracken, M. W. (2001). Inference about predictive ability. Mimeo, Louisiana State University, USA. In: Clements, M. P., & Hendry, D. F. (Eds.), A companion to economic forecasting, Blackwell, London, Forthcoming.

White, H. (2000). A reality check for data snooping. Econometrica 68, 1097–1126.

Biographies: Michael P. CLEMENTS is a Reader in the Department of Economics, University of Warwick. He has published a number of articles in academic journals on forecasting, and is the author of two books on forecasting (both joint with David F. Hendry), Forecasting Economic Time Series. The Marshall Lectures on Economic Forecasting, 1998, Cambridge: Cambridge University Press, and Forecasting Non-Stationary Economic Time Series. The Zeuthen Lectures on Economic Forecasting, 1999, Cambridge, MA: MIT Press.

Jeremy SMITH is a Reader in the Department of Economics, University of Warwick. He has interests in the area of non-linear modelling and fractional integration.