content horizons for conditional variance forecasts

www.elsevier.com/locate/ijforecast

International Journal of Foreca

Content horizons for conditional variance forecasts

John W. Galbraitha, Turgut KVs˙Vnbayb,*

aDepartment of Economics, McGill University, Montreal, Quebec H3A 2T7, CanadabInternational Monetary Fund, Washington, DC 20431, USA

Abstract

Using realized variance to estimate daily conditional variance of financial returns, we compare forecasts of daily variance

from standard GARCH and FIGARCH models estimated by Quasi-Maximum Likelihood (QML), and from projections on past

realized volatilities obtained from high-frequency data. We consider horizons extending to 30 trading days. The forecasts are

compared with the unconditional sample variance of daily returns treated as a predictor of daily variance, allowing us to

estimate the maximum horizon at which conditioning information has exploitable value for variance forecasting. With foreign

exchange return data (DM/$US and Yen/$US), we find evidence of forecasting power at horizons of up to 30 trading days, on

each of two financial returns series. We also find some evidence that the result of (e.g.) Bollerslev and Wright [Bollerslev, T., &

Wright, J. H. (2001) High-frequency data, frequency domain inference, and volatility forecasting. Review of Economics and

Statistics, 83, 596–602], that projections on past realized variance provide better one-step forecasts than the QML-GARCH and

-FIGARCH forecasts, appears to extend to longer horizons up to around 10 to 15 trading days. At longer horizons, there is less

to distinguish the forecast methods, but the evidence does suggest positive forecast content at 30 days for various forecast types.

D 2004 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Keywords: GARCH; High-frequency data; Integrated variance; Realized variance

1. Introduction

The modelling of conditional variance of eco-

nomic time series, following Engle (1982) and many

subsequent contributions, has permitted character-

ization and forecasting of volatility in data such as

time series of asset returns. While returns themselves

are usually approximately unforecastable, forecasting

0169-2070/$ - see front matter D 2004 International Institute of Forecaste

doi:10.1016/j.ijforecast.2004.10.002

* Corresponding author.

E-mail address: [email protected] (T. KVs˙Vnbay).

of the variance of returns has been quite successful.

Of course, evaluating the degree of this success, as

with the forecasting itself, has been complicated by

the unobservability of the realized conditional

variance.

Recent work with high-frequency data, in particular

by Andersen and Bollerslev (1997, 1998), has provided

an avenue for relaxing this constraint. Using direct

estimates of daily conditional variance obtained from

high-frequency (intraday) squared returns may permit

better evaluation of model-based daily variance fore-

casts (e.g. Andersen & Bollerslev, 1998) and character-

sting 21 (2005) 249–260

rs. Published by Elsevier B.V. All rights reserved.

J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260250

ization of daily conditional variances without recourse

to standard parametric models (e.g. Andersen, Boller-

slev, Diebold, & Labys, 2001; Andersen, Bollerslev,

Diebold, & Labys, 2003; Bollerslev & Wright, 2001).

The present paper uses intraday asset returns to

examine model-based variance forecasts at various

horizons. As Christoffersen and Diebold (2000) note,

risk depends on the horizon considered, and different

horizons are relevant for different problems or, in the

case of financial risk management, different classes of

market participant. Nonetheless, little is known about

the forecastability of variance at horizons beyond the

very short, such as 1 day. We investigate forecasts,

both from the standard GARCH model estimated by

Quasi-Maximum Likelihood (QML), from the frac-

tionally integrated GARCH model which allows long

memory in conditional variance, and from projecting

directly on past values of the realized variance, at

horizons of up to 30 trading days.1

These forecasts are compared with the forecast

implicit in the unconditional sample variance. We are

then able to consider the maximum horizon at which

any of these forecasts contain useful information about

deviations of daily conditional variance from the

unconditional variance: that is, the maximum horizon

at which we can exploit conditioning information for

forecasting. In doing so, we use the definitions of

forecast content of Galbraith (2003), and compute the

forecast content function to the 30-day horizon.

Throughout this paper, wewill use the term drealizedvarianceT to refer to an estimate of daily variance

obtained by summing higher frequency returns; this

concept has typically been labelled drealized volatilityT,following Andersen and Bollerslev (1998) and sub-

sequent authors. However, we will follow Barndorff-

Neilsen and Shephard (2002) in using the former term,

since volatility has more usually been used to refer to a

measure of standard deviation rather than variance. The

next section of the paper describes themethods that will

be applied in evaluating the information content of

variance forecasts. The third section describes the high-

frequency data and forecasting models, and the fourth

estimates QML-GARCH, -FIGARCH and realized-

1 We emphasize the QML estimation in referring to these

estimates to make clear that the (FI)GARCH forecasts used are of

the standard type (e.g. Bollerslev, 1986; Baillie et al., 1996), based

on daily data alone.

variance forecast content for the daily conditional

variance of the returns, and interprets the results.

2. Variance forecasts and content horizons

The conditional variance of a time series is not

directly observable. For this reason, both estimation

and evaluation of forecasting models is difficult

relative to the conditional mean case. However,

estimates of variance can be obtained from the

quadratic variation or realized variance. For example,

Andersen and Bollerslev (1998) address the question

of the forecast performance of GARCH models using

estimates of daily conditional variance obtained from

intraday returns, and compare an R2-type measure of

predictive power at a 1-day horizon. The exercise that

we will perform here uses the same realized variance

measures, but uses a measure based on relative

forecast MSE at different horizons to characterize

forecasting performance for daily variance.

The model of asset returns used by Andersen and

Bollerslev, which we adopt for the exposition here,

assumes that returns follow a continuous-time diffu-

sion process such that instantaneous returns obey the

relation dpt=rtdWp,t, where pt is the logarithm of the

price, rt is the instantaneous standard deviation, and

Wp,t is a Wiener process. Consider the sequence of

(Tm+1) discretely sampled logarithmic prices {pj},

j=0, 1/m, 2/m,. . .,T, where integer values of the indexj represent end-of-day closing prices and non-

integer values represent intraday observations. Trans-

form the sequence to obtain Tm returns, r l;lþ1½ �,l ¼ 0; 1

m; 2m; N ; T � 1

m

��. Defining r[a ,b] as pb�pa,

the daily return is r[t�1,t]=pt�pt�1 and the return on

the last of m intraday periods of equal length between

t�1 and t is r[t�1/m,t]=pt�pt�1/m.2

With these definitions, the daily quadratic variation

or integrated variance isR 10

r2tþsds, and using the

result that (Andersen & Bollerslev, 1998)

p limmYl

Z 1

0

r2tþsds �

Xmi¼1

r2tþ i�1ð Þ

m;tþ i

m�½

! ¼ 0; ð2:1Þ

it follows that the summation of intraday squared

returns, or daily realized variance, provides an

2 This is a slight modification of Andersen and Bollerslev’s

notation for the returns.

J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260 251

estimate of the daily conditional variance. It is the

performance of daily conditional variance models in

forecasting this quantity of interest which is evaluated

both by Andersen and Bollerslev and others, and in

the present paper. Note that the result (Eq. (2.1))

requires that the discretely sampled squared returns

r[t+((i�1)/m), t+(i/m)]2 be serially uncorrelated and that the

price process be continuous. In practice, this summa-

tion will provide only an approximation toR 10

r2tþsds,

the quality of which will depend upon the adequacy of

these assumptions as well as on the intraday sampling

frequency m (that is, the asymptotics). Meddahi

(2002) provides a detailed examination of the

discrepancy between integrated and realized variance.

The daily realized variance therefore gives us the

target for the forecasting technique, analogous to the

realized outcome in forecasting an observable series.

With this information, we proceed to the evaluation of

forecasts at different horizons, making use of the

forecast content function as defined in Galbraith

(2003), modified slightly here to reflect the fact that

we are evaluating second-moment forecasts. There,

the forecast content function was defined as the

proportionate reduction in MSE obtainable relative

to the unconditional mean forecast, in forecasting the

level of a quantity observable ex post. Adapting that

definition for a conditional variance forecast, we set

C sð Þ ¼ 1�MSE rr2 sð ÞÞðMSE rr2 sð ÞÞð

; s ¼ 1; N ; S; ð2:2Þ

where r2(s) and r2(s) are, respectively, the model-

based s-step-ahead forecast of the daily conditional

variance, and the estimated unconditional variance. As

in the case of conditional mean forecasts, C(s)Y0 as

T, sYl for a forecast rT+sjT2 based on a correctly

specified forecasting model; C(s) may be less than zero

where model parameter uncertainty dominates the

contribution of these parameters to prediction; C(s)

equal to zero indicates model-based forecasts no better

as predictors than is the unconditional sample variance.

For general autoregressive processes, the content

function can be obtained analytically as a function of

the autoregressive parameters and sample size avail-

able for estimation, taking account of parameter

estimation uncertainty; asymptotic expressions for

C(s) are given in Galbraith (2003). Alternatively, for

a set of forecasts from an arbitrary forecasting model,

C(s) can be estimated from a sequence of outcomes and

forecasts using the sample mean squared errors and

associated asymptotic inference. This is the method

used below for the forecasts evaluated in Section 4.

3. Data and models

We evaluate the forecast content functions for two

currencies, priced relative to the US dollar. In each

case, we examine the daily logarithmic returns. High-

frequency intraday data are available on the foreign

exchange prices (bid and ask) at 5-min intervals. A

point relevant to each of the data series, noted in

Andersen and Bollerslev (1998) and explored further in

Andersen et al. (2001, 2003), is that the highest

possible frequency of observation may not be optimal

from the point of view of daily variance estimation,

because of market microstructure or other effects. As in

Andersen et al. (2001, 2003), we consider various

possible sampling frequencies for the estimation of

daily variance.

3.1. Foreign exchange data

The foreign exchange data are Deutschemark–US

dollar and Yen–US dollar spot exchange rates, taken

from the Olsen and Associates HFDF-2000 data set.

The original sources of the raw data that are used to

construct HFDF are the DM/$US and Yen/$US bid–

ask quotations displayed on the Reuters FXFX screen.

These raw data are subject to various microstructure

frictions, outliers and other anomalies, and have to be

filtered; see Muller et al. (1990) and Dacorogna;

Muller, Nagler, Olsen, and Pictet (1993) for the

filtering procedures and the construction of the

returns. Because these data have been fairly widely

used, our description will be brief.

Each data point in the HFDF-2000 data set

contains a mid-price, a bid–ask spread over a 5-

min interval of spot rate quotation and a time stamp,

resulting in a total of 288 five-minute returns for a

given day. The 5-min returns, expressed in basis

points (i.e. multiplied by 10,000), are computed as

the mid-quote price difference. The mid-price at a

given regular time point is estimated through a linear

interpolation between the previous and following

mid-price of the irregularly spaced tick-by-tick data.


The bid–ask spread is the average over the last 5-min

interval, again expressed in basis points. If there is

no quotation during this interval, the mean bid–ask

spread is zero. The sample spans the period from

January 2, 1987 to December 31, 1998, providing us

with a sample of 1,262,016 five-minute returns,

expressed in US$ terms. However, it is well known

that trading activity in the foreign exchange market

slows markedly over the weekends and certain

holidays. Following Andersen et al. (2001), among

others, we remove such low-activity days from our

data set. Whenever a day t is excluded from the data

set, we cut from 21:05 GMT of the previous calendar

day t�1 to 21:00 GMT on the calendar day t. This

definition of a bdayQ is standard in the literature and

is based on work by Bollerslev and Domowitz

(1993).

In addition to the low-activity period from Friday

21:05 GMT to Sunday 21:00 GMT, we remove the

following fixed holidays from our data set: Christmas

(December 24–26), New Year’s (December 31–Jan-

uary 2), and the Fourth of July. We also remove the

moving holidays of Good Friday, Easter Monday,

Memorial Day, Labour Day, Thanksgiving (US) and

the day after Thanksgiving. Finally, we cut the days for

which the indicator variable (the bid–ask spread) had

144 or more zeros, corresponding with the technical

bholesQ in the recorded data. After these adjustments,

we are left with 2968 trading days of DM/$US data

corresponding with 2968288=854,784 five-minute

return observations, and 2970 days of Yen/$US data

corresponding with 2970288=855,360 five-minute

return observations. A final adjustment is made to the

Yen/$US data set: because of the extremely large

variance observations occurring at October 7–8, 1998

and surrounding dates,3 we terminate this data set at

the end of May 1998, leaving 2830 days of high-

frequency data. We comment below on the effect of

this trimming on the relative performance of the two

forecast types.

We report results for three different aggregations of

the intraday data as well as for the daily squared

return, for comparison: 5-, 10- and 30-min returns. We

use an initial sample of daily returns, derived from the

high-frequency data, for QML estimation.

3 This was the period of the collapse of Long Term Capital

Management.

3.2. Forecasting models

We use three classes of forecasting model to

evaluate the information content of variance forecasts

at horizons extended to 30 days. The first is the

standard GARCH model estimated by Quasi-Max-

imum Likelihood methods on samples of daily returns

data; these forecasts do not use the information present

in the high-frequency data. The second is the FIG-

ARCH model which incorporates a lag polynomial

term of the form (1�L)d, for noninteger d, and thereby

allows long memory effects in conditional variance

persistence: if actual autocorrelations in conditional

variance decay more slowly than is compatible with the

usual short-range dependent specifications, such mod-

els might be expected to perform relatively well at

longer horizons. The third class is the forecast of

conditional variance obtained from projection of

realized variance on past values of realized variance:

that is, autoregressive models of the realized variance,

as examined by Andersen et al. (2001, 2003), and

Bollerslev and Wright (2001).

Among short-memory models in the QML case,

we restrict ourselves for reporting purposes to the

GARCH(1,1) model which Bollerslev and Wright, for

example, found to produce the best one-step variance

forecasts by a substantial margin, on a set of DM/$US

data very similar to that used here.4 We found that

threshold GARCH and EGARCH specifications did

not perform substantially differently.

The GARCH( p,q) model can be characterized

using polynomials in the lag operator as

r2t ¼ x þ b Lð Þr2

t þ a Lð Þe2t ð3:1Þor in the (1,1) case rt

2=x+br2t�1+ae2t�1; we can

rewrite Eq. (3.1) as

1� a Lð Þ � b Lð Þð Þe2t ¼ x þ 1� b Lð Þð Þ e2t � r2t

��:

ð3:2ÞThe FIGARCH( p,d,q) extends the model in the form

of Eq. (3.2) by allowing a term of the form (1�L)d

(usually da(0,1)), typically written as

1� / Lð Þð Þ 1� Lð Þde2t ¼ x þ 1� b Lð Þð Þ e2t � r2t

��;

ð3:3Þ

4 The alternatives considered by Bollerslev and Wright were

GARCH models up to order (2,2), and the EGARCH(1,1).


where the GARCH model may be viewed as a special

case in which d=0 so that the term (1�L)d plays no

role, and in which (1�/(L))=(1�a(L)�b(L)). In the

FIGARCH(1,d ,1) case, (1�/(L ))=(1�/L ) and

(1�b(L))=(1�bL).5

For initial estimation of the GARCH(1,1) parame-

ters, initial daily samples of each of the two foreign

exchange return series are taken from the first 2 years,

1987–1988. From each forecast date, out-of-sample

forecasts are produced for s days in the future,

s=1,2. . .30, so that the first forecasts are computed

for the first 30 trading days of 1989. Parameter

estimates are then updated to include the data from

the first trading day of 1989, and forecasts are produced

for trading days 2 to 31, and so on: the estimates are

updated at each day for recursive estimation and out-of-

sample forecasting, producing sets of forecasts for

1989 through the ends of the data sets in 1998. For the

FIGARCH forecasts, a somewhat larger initial sample

was necessary in order to avoid computational prob-

lems. We therefore used forecasts based on fixed

parameter estimates on 1200 sample points for an

initial period, with recursive updating of parameter

estimates thereafter. This gives a small advantage to the

FIGARCH as initial forecasts benefit from parameter

estimation based on future data, but as the FIGARCH

does not prove to be superior to other methods, it does

not seem to be critical to the results.

Forecasts by autoregressive projection on past

realized volatilities (AR-rv) are also computed recur-

sively beginning with the same initial samples. Note

that these forecasts use the additional information

present in the high-frequency data, and are therefore

not based on the same information set as the QML-

GARCH/-FIGARCH forecasts. We again provide

results for one of several alternatives considered by

Bollerslev and Wright (2001), in this case a time-

domain AR(12) model of the realized volatilities,

which provides good results on each data set. It is

possible nonetheless that overall increases in forecast

content may be available from variants on this model.

For each data set, we then also estimate the

unconditional variance recursively using the same

5 Recall that (1�L)d can be expanded as 1� Lð Þd ¼Pl

k¼0

d

k

� Lð Þk¼

Plk¼0 bkL

k , with b0=1, b1=�d , b2=(1/2)d(d�1), bj=

(�1/j)bj�1(d+1�j), jz3.

initial sample, in order to evaluate the performance of

the forecasting models in predicting deviations of the

conditional variance from this quantity; that is, we

ask whether conditioning information, as exploited

by the model-based forecast, has value in predicting

variance at a given horizon.

Estimates of the variance forecast content function

for each class of forecast are obtained as follows.

Beginning with an initial sample of size t0, let s=t0,t0+1,. . .,T index a sample of trading days for which

high-frequency information is available, on either of

the two data sets (that is, T=2968 and 2970 for DM/

$US and Yen/$US, respectively). Let s=1,2,. . .,S indexthe forecast horizon; we consider a maximum horizon

of S=30 days. Construct the (T�t0+1)S matrix E

having typical element es,s, defined as the forecast error

from the model-based forecast for trading day s given aforecast made s days in the past. Construct the

corresponding (T�t0+1)1 vector U having typical

element us, defined as the difference between the

estimated unconditional variance of the returns and the

realized variance for trading day s. The latter quantitiesdo not depend on s because the unconditional variance

does not, of course, depend on s. For the entries in both

E and U, we take the sample mean squared error

obtained in comparing the forecast with the realized

variance, at the chosen level of aggregation (e.g. 5, 10

min). These sample quantities are substituted into Eq.

(2.2), for each s.

Pointwise standard errors for the content function

are computed using the standard approximate expres-

sion varðXYÞfðlX

lYÞ2½var Xð Þ

l2X

þ var Yð Þl2Y

� 2cov X ;Yð ÞlX lY

�. Let Xbe the estimated MSE of model-based forecasts foreach s, and let Y be the estimated MSE of the

forecast implicit in the unconditional sample var-

iance. We replace each of the population quantities

on the RHS of this expression with their sample

counterparts for asymptotic inference, allowing for

dependence to order 29, in accordance with the

standard result that successive optimal h-step fore-

casts overlap h�1 periods and so can show depend-

ence to this order. We use a Bartlett window in

computing the variance and covariance terms, and

note that consistency would require the window to

increase, or weights to approach 1, with sample size.

Note also that these standard errors should not be

interpreted as leading to asymptotically normal tests

of (non-)zero forecast content: inference on the


content functions will be presented in Section 4 using

a test of Giacomini and White (2003).

4. Empirical results

The figures record the estimated forecast content

functions for the two data sets and the three forecast

methods; the content functionsC(s) are shownwithF2

standard error bands. Each figure shows the forecast

content function for three different time aggregations,

together with the function estimated on squared returns,

for comparison. The latter is of interest primarily to

underline, in accordance with Andersen and Bollerslev

Fig. 1. (a) DM/USD. Forecast content function. GARCH(1,1) forecasts,

function. GARCH(1,1) forecasts, realized volatility of 10-min returns. (c) D

volatility of 30-min returns. (d) DM/USD. Forecast content function. GA

(1998), the value of realized variance in exploring the

performance of conditional variance forecasts; using

the very noisy squared return series, which was used for

forecast evaluation in literature pre-dating Andersen

and Bollerslev (1998), yields no substantial indication

of forecasting power on any data set. Recall that a value

of zero indicates a forecast of daily variance no more

informative, by MSE, than is the unconditional sample

variance.

Specifically, Figs. 1(a–d) and 2(a–d) plot estimated

forecast content functions, using forecasts from the

QML-estimated GARCH(1,1) model, on the three

realized variance measures and the untransformed

daily squared returns. These figures describe the DM/

realized volatility of 5-min returns. (b) DM/USD. Forecast content

M/USD. Forecast content function. GARCH(1,1) forecasts, realized

RCH(1,1) forecasts, squared daily returns.

Fig. 2. (a) Yen/USD. Forecast content function. GARCH(1,1) forecasts, realized volatility of 5-min returns. (b) Yen/USD. Forecast content

function. GARCH(1,1) forecasts, realized volatility of 10-min returns. (c) Yen/USD. Forecast content function. GARCH(1,1) forecasts, realized

volatility of 30-min returns. (d) Yen/USD. Forecast content function. GARCH(1,1) forecasts, squared daily returns.


$US and Yen/$US data, respectively, with aggrega-

tions of 5, 10 and 30 min used in estimating realized

variance, as well as with the squared daily return for

comparison.

On both data sets, all estimates on realized variance

measures (parts a–c) show positive point estimates of

forecast content remaining at the 30-day horizon: use of

a formal variance forecasting model for daily variance

shows positive information content 30 days into the

future, in the sense that the forecasts improve on the

unconditional variance of the process as an indicator of

the realized variance that will emerge. In each data set,

the 5-min aggregation provides the highest estimate of

forecast content.

The results for conditional variance forecasts made

via autoregressions on realized variance appear in

Figs. 3 and 4. The pattern is very similar to that

appearing in Figs. 1 and 2, respectively; however, the

point estimates of forecast content are in general

higher for the realized variance forecasts for

approximately 10–15 trading days. Toward the

end of the 30-day maximum horizon that we

consider, the two types of forecasts appear to be

very similar. As one would expect, there is virtually

no evidence of positive content in applying this

method directly to squared returns (Figs. 3d and

4d). Again, the strongest results on both data sets

appear in the 5-min returns.

Fig. 3. (a) DM/USD. Forecast content function. AR(12) forecasts, realized volatility of 5-min returns. (b) DM/USD. Forecast content function.

AR(12) forecasts, realized volatility of 10-min returns. (c) DM/USD. Forecast content function. AR(12) forecasts, realized volatility of 30-min

returns. (d) DM/USD. Forecast content function. AR(12) forecasts, squared daily returns.


A numerical summary of some of these results

appears in Table 1, for the 5-min aggregation.6 These

results tend to confirm, for the shorter horizons, the

results of Bollerslev and Wright (2001) concerning the

superiority of AR-rv forecasts over QML-GARCH. At

longer horizons, the three methods show very similar

point estimates of information content. The content

functions for FIGARCH forecasts are similar to those

for the GARCH forecasts, and we do not record these

6 Note however that in trimming the Yen sample to remove the

apparent outliers, we have given an advantage to the AR-realized

variance forecasts, which are more sensitive to this effect than the

GARCH forecasts; the latter are based on GARCH estimates of

daily variance, which fluctuate less dramatically than the realized

volatilities.

in separate figures. However, a few differences do

appear, of which the summary measures of Table 1 are

a sufficient indicator. In particular, the FIGARCH

forecasts have slightly less content at very short ho-

rizons, so that there is a clear ranking (AR-rv,

GARCH, FIGARCH, respectively) with respect to

these point estimates of short-horizon content. At the

longest horizon, the FIGARCH forecasts have point

estimates of content virtually identical with the

GARCH.

It is important to recall that even the best realized

variance measure contains measurement noise relative

to the daily integrated varianceR tt�1

r2tþsds. Such

measurement noise tends to lower all measures of the

value of forecasts, relative to that which would be

Fig. 4. (a) Yen/USD. Forecast content function. AR(12) forecasts, realized volatility of 5-min returns. (b) Yen/USD. Forecast content function.

AR(12) forecasts, realized volatility of 10-min returns. (c) Yen/USD. Forecast content function. AR(12) forecasts, realized volatility of 30-min

returns. (d) Yen/USD. Forecast content function. AR(12) forecasts, squared daily returns.


measured with knowledge of the true daily variance, by

obscuring the match between forecast and outcome.

For this reason, we interpret the highest forecast

content across measures as evidence in favour of the

superiority of that measure. The evidence in these data

sets therefore favours the 5-min aggregation.

More precise evidence on the relative performance

of the models comes from a sequence of tests for the

null of equal forecast accuracy, on the two sequences of

(FI)GARCH forecasts vs. autoregression on realized

variance, for each of the two currencies. We first report

the well-known Diebold and Mariano (1995) test

(Table 2).

There are some significant results in this test at

short horizons, corresponding with higher forecast

content in the AR-rv forecasts relative to GARCH

and FIGARCH. Clearly the (FI)GARCH vs. AR-rv

results are stronger on the DM/$US data; there is

some weak evidence on the Yen/$US data that the

AR-realized variance forecast may be superior at

fairly short horizons, by this test. For each

forecasting method, the point estimates of the

content of the forecasts become close to zero

around the 30-day horizon. Although there are

small differences in the point estimates of forecast

content for the longer horizons, there do not appear

to be statistically significant differences among the

methods at long horizons, by this test. Note again

that the ratio of the content to the standard error

should not be taken as an asymptotically normal

Table 1

Numerical summary of forecast content

Forecast method DM/$US Yen/$US

QML-GARCH

C(1) (std. err.) 0.26 (0.02) 0.25 (0.03)

C(5) (std. err.) 0.14 (0.03) 0.11 (0.03)

C(20) (std. err.) 0.07 (0.04) 0.04 (0.02)

C(30) (std. err.) 0.05 (0.03) 0.02 (0.02)

QML-FIGARCH

C(1) (std. err.) 0.23 (0.04) 0.17 (0.04)

C(5) (std. err.) 0.15 (0.07) 0.11 (0.06)

C(20) (std. err.) 0.07 (0.07) 0.05 (0.06)

C(30) (std. err.) 0.05 (0.07) 0.02 (0.05)

AR-rv

C(1) (std. err.) 0.39 (0.03) 0.32 (0.03)

C(5) (std. err.) 0.20 (0.03) 0.15 (0.03)

C(20) (std. err.) 0.07 (0.03) 0.05 (0.02)

C(30) (std. err.) 0.03 (0.03) 0.03 (0.02)

Evaluated by realized variance of 5-min returns.


test. We test the significance of the forecast content

below.

A limitation of these tests is that they do not take

into account uncertainty of parameter estimation and

are not valid when one of the models under

consideration is nested by the other. Tests developed

by Clark and McCracken (2001) deal with nested

models, but require linearity in the models being

estimated. We therefore implement a new test

developed by Giacomini and White (2003), which

allows for tests of equal predictive ability of both

nested and non-nested, linear and nonlinear models.

The tests have asymptotic v2 distributions, and

appear to have good finite-sample size conformity.7

Table 3 presents results from these tests, again by

horizon and p-value. The test requires a choice of the

parameter q, the number of elements in the test

function; see Giacomini and White (2003). This

choice is potentially important, and Giacomini and

White note that the test function should include

information about the future values of the loss

differential (via lags of the loss differential, where

there is autocorrelation), but an excessive value

reduces the power of the test. In their simulation

examples, the authors use q=2, where the test function

7 We thank R. Giacomini and H. White for supplying their Matlab

code for the test.

is formed as a matrix with a column of ones and

another column containing the lagged values of the

loss differential. In the present study, we examined

fixed values of q as well as a choice of q based on the

autocorrelation structure of the loss differential. To

balance the necessity of including highly correlated

lags of the loss differential and the danger of eroding

the power of the test, we include lags with sample

autocorrelation higher than 0.20 in the test function,

and choose q accordingly (using q=2 if there is no

autocorrelation of this magnitude). Table 3 reports the

results with q chosen in this way; results based on

fixed values of q are broadly similar, but differ

substantially in detail.

The GARCH(1,1) and AR(12)-rv forecasting

models are compared in columns 2 and 3 of the table;

columns 4 and 5 compare these forecasting models

with the forecast implicit in the constant unconditional

variance. A significantly lower forecast MSE relative

to the constant-variance forecasts represents signifi-

cant forecast content. That is, the null hypothesis is of

equal conditional predictive ability of the model-

based (GARCH or AR) forecasts and the uncondi-

tional variance, r2. Under this null, forecast content is

zero, so that the tests in columns 4 and 5 are tests of

significance of estimated forecast content at horizon s,

for each forecasting model.8

The result of the GARCH/AR-rv comparison in this

test is broadly similar to that in the Diebold–Mariano

test, but these p-values tend to be lower. At shorter

horizons, there are significant differences between

tests, whereas at longer horizons the differences

between the two classes of model-based forecast tend

not to be significant at conventional levels. With

respect to the significance of the forecast content at

different horizons, there are clear indications of

significant content of either forecasting method to the

30-day horizon considered here on the DM data. On the

Yen data, the same is true for the AR-rv forecasts, while

GARCH forecast content is not significantly different

from zero at the longer horizons.

To summarize the results from the set of tests, there

appears to be a significant difference in short-horizon

predictive ability in favour of the AR-rv forecasts based

on the additional information in the realized volatilities.

8 Results for the FIGARCH models are not included because of

computational constraints.

Table 2

Diebold–Mariano tests of equal forecast loss: p-values

Horizon (s) GARCHv.AR (DM/$US) GARCHv.AR (Yen/$US) FIGARCHv.AR (DM/$US) FIGARCHv.AR (Yen/$US)

1 b0.01 0.11 b0.01 0.15

2 0.01 0.04 0.01 b0.01

3 0.01 0.25 0.06 0.06

4 0.08 0.14 0.03 0.91

5 0.03 0.09 0.01 0.38

10 0.08 0.10 0.07 0.25

15 0.15 0.37 0.15 0.71

20 0.67 0.72 0.62 0.79

30 0.71 0.32 0.81 0.85

GARCH(1,1) and FIGARCH(1,d,1) vs. AR-realized variance forecasts. Evaluated by realized variance of 5-min returns.


At longer horizons near 30 days, the various model-

based methods are difficult to distinguish. However,

although the point estimates of forecast content tend to

be just slightly above zero at 30 days, there is some

indication that this content is statistically significant,

that is, that there is a small but genuine element of

forecast content remaining at the 30-day horizon.

5. Concluding remarks

To measure forecast performance, we need to

estimate ex post the outcome (here, the conditional

variance) which was being forecast. The use of

realized variance, as in Andersen and Bollerslev

(1998) and various subsequent contributions, has

clearly improved our ability to do so, and we take

advantage of this measurement here to examine the

Table 3

Giacomini–White tests of equal conditional predictive ability:

p-values

Horizon

(s)

GARCHv.AR

(DM/$US)

GARCHv.AR

(Yen/$US)

GARCH/AR

vs. r2

(DM/$US)

GARCH/AR

vs. r2

(Yen/$US)

1 b0.01 b0.01 b0.01/b0.01 b0.01/b0.01

2 0.10 b0.01 b0.01/b0.01 b0.01/b0.01

3 0.02 b0.01 b0.01/b0.01 b0.01/b0.01

4 b0.01 b0.01 b0.01/b0.01 b0.01/b0.01

5 0.01 b0.01 b0.01/b0.01 0.02/b0.01

10 0.05 0.03 b0.01/b0.01 0.16/b0.01

15 0.12 0.05 b0.01/0.01 0.28/0.01

20 0.29 0.26 0.01/0.02 0.47/0.02

30 0.06 0.16 b0.01/0.02 0.18/0.01

GARCH(1,1) vs. AR-realized variance forecasts vs. unconditional

mean. Evaluated by realized variance of 5-min returns.

usefulness of forecasts at longer horizons than have

commonly been investigated. Nonetheless, as in

Andersen et al. (2001, 2003), the choice of frequency

of measurement has nontrivial affects on the meas-

ured realized variance. These different estimates

affect our measures of forecast content; we find,

however, that the 5-min realized volatilities provide

the strongest results on all data sets.

The results have a number of other features which

appear common to the different data sets. First,

forecasts of conditional variance appear to have

information content to a horizon of approximately 30

trading days, although the precise maximum horizon

of course depends on the data set and forecast method.

It is possible that a measure of daily variance with even

less noise than the 5-min realized volatilities would

provide yet stronger evidence of the value of variance

forecasts, particularly at long horizons where evidence

is weaker. Second, forecasts based on autoregressive

projection on past realized volatilities do appear to

provide better results at short horizons (to about 10–15

days), as was found by Bollerslev and Wright (2001)

for the 1-day horizon (on DM/$US data). At longer

horizons, as the content of each type of forecasts nears

zero, the methods inevitably become more difficult to

distinguish on statistical grounds. Nonetheless, there is

evidence of significant forecast content, i.e. evidence

that each of the methods usefully exploits conditioning

information to approximately 30 trading days.

Acknowledgement

We would like to thank the Fonds quebecois de la

recherche sur la societe et la culture (FQRSC) and the


Social Sciences and Humanities Research Council of

Canada (SSHRC) for financial support of this

research, and the Centre Interuniversitaire de

recherche en analyse des organisations (CIRANO)

and Centre Interuniversitaire de recherche en econo-

mie quantitative (CIREQ) for research facilities. The

foreign exchange data were provided by Olsen

Associates. We are also grateful to Raffaella Giaco-

mini for advice on implementation of the Giacomini

and White test, and to Serguei Zernov and Victoria

Zinde-Walsh for their comments and suggestions. The

views expressed in this paper are those of the authors

and do not necessarily represent the views or policy of

the IMF.

References

Andersen, T. G., & Bollerslev, T. (1997). Intraday periodicity and

volatility persistence in financial markets. Journal of Empirical

Finance, 4, 115–158.

Andersen, T. G., & Bollerslev, T. (1998). Answering the skeptics:

Yes, standard volatility models do provide accurate forecasts.

International Economic Review, 39(4), 885–905.

Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2001).

The distribution of realized exchange rate volatility. Journal of

the American Statistical Association, 96, 42–55.

Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003).

Modeling and forecasting realized volatility. Econometrica, 71,

579–625.

Baillie, R. T., Bollerslev, T., & Mikkelsen, H. O. (1996). Fraction-

ally integrated generalized autoregressive conditional hetero-

skedasticity. Journal of Econometrics, 74, 3–30.

Barndorff-Neilsen, O. E., & Shephard, N. (2002). Estimating

quadratic variation using realized variance. Journal of Applied

Econometrics, 17, 457–477.

Bollerslev, T. (1986). Generalized autoregressive conditional

heteroskedasticity. Journal of Econometrics, 31, 307–327.

Bollerslev, T., & Domowitz, I. (1993). Trading patterns and prices

in the interbank foreign exchange market. Journal of Finance,

48, 1421–1443.

Bollerslev, T., & Wright, J. H. (2001). High-frequency data,

frequency domain inference, and volatility forecasting. Review

of Economics and Statistics, 83, 596–602.

Christoffersen, P., & Diebold, F. X. (2000). How relevant is

volatility forecasting for financial risk management? Review of

Economics and Statistics, 82, 12–22.

Clark, T. E., & McCracken, M. W. (2001). Tests of equal forecast

accuracy and encompassing for nested models. Journal of

Econometrics, 105, 85–110.

Dacorogna, M. M., Mqller, U. A., Nagler, R. J., Olsen, R. B., &Pictet, O. V. (1993). A geographical model for the daily and

weekly seasonal volatility in the foreign exchange market.

Journal of International Money and Finance, 12, 413–438.

Diebold, F. X., & Mariano, R. (1995). Comparing predictive

accuracy. Journal of Business and Economic Statistics, 13,

253–259.

Engle, R. F. (1982). Autoregressive conditional heteroskedasticity

with estimates of the variance of U.K. inflation. Econometrica,

50, 987–1008.

Galbraith, J. W. (2003). Content horizons for univariate time series

forecasts. International Journal of Forecasting, 19, 43–55.

Giacomini, R., & White, H. (2003). Tests of Conditional

Predictive Ability, Working paper, University of California,

San Diego.

Meddahi, N. (2002). A theoretical comparison between integrated

and realized volatilities. Journal of Applied Econometrics, 17,

479–505.

Mqller, U. A., Dacorogna, M. M., Olsen, R. B., Pictet, O. V.,

Schwarz, M., & Morgenegg, C. (1990). Statistical study of

foreign exchange rates, empirical evidence of a price scaling

law, and intraday analysis. Journal of Banking and Finance, 14,

1189–1208.

Biographies: John W. GALBRAITH is a Professor of Economics at

McGill University, Montreal, Canada, and a Fellow of the Centre

Interuniversitaire de Recherche en Analyse des Organisations

(CIRANO) and of the Centre Interuniversitaire de Recherche en

Economie Quantitative (CIREQ). He completed his doctorate in

Economics at the University of Oxford in 1987. His research is

primarily concerned with time-series econometrics and has appeared

in journals such as Biometrika, Econometric Theory, International

Economic Review, Journal of Applied Econometrics and Journal of

Econometrics.

Turgut KIS˙INBAY is an economist at the International Monetary

Fund. He received his BA in Economics from Bogazici University,

Istanbul, Turkey, and his PhD in Economics from McGill

University, Montreal, Canada. His research interests are in the areas

of monetary economics, business cycle analysis, time-series

econometrics, and financial econometrics.

content horizons for conditional variance forecasts

Documents