content horizons for conditional variance forecasts
TRANSCRIPT
www.elsevier.com/locate/ijforecast
International Journal of Foreca
Content horizons for conditional variance forecasts
John W. Galbraitha, Turgut KVs˙Vnbayb,*
aDepartment of Economics, McGill University, Montreal, Quebec H3A 2T7, CanadabInternational Monetary Fund, Washington, DC 20431, USA
Abstract
Using realized variance to estimate daily conditional variance of financial returns, we compare forecasts of daily variance
from standard GARCH and FIGARCH models estimated by Quasi-Maximum Likelihood (QML), and from projections on past
realized volatilities obtained from high-frequency data. We consider horizons extending to 30 trading days. The forecasts are
compared with the unconditional sample variance of daily returns treated as a predictor of daily variance, allowing us to
estimate the maximum horizon at which conditioning information has exploitable value for variance forecasting. With foreign
exchange return data (DM/$US and Yen/$US), we find evidence of forecasting power at horizons of up to 30 trading days, on
each of two financial returns series. We also find some evidence that the result of (e.g.) Bollerslev and Wright [Bollerslev, T., &
Wright, J. H. (2001) High-frequency data, frequency domain inference, and volatility forecasting. Review of Economics and
Statistics, 83, 596–602], that projections on past realized variance provide better one-step forecasts than the QML-GARCH and
-FIGARCH forecasts, appears to extend to longer horizons up to around 10 to 15 trading days. At longer horizons, there is less
to distinguish the forecast methods, but the evidence does suggest positive forecast content at 30 days for various forecast types.
D 2004 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
Keywords: GARCH; High-frequency data; Integrated variance; Realized variance
1. Introduction
The modelling of conditional variance of eco-
nomic time series, following Engle (1982) and many
subsequent contributions, has permitted character-
ization and forecasting of volatility in data such as
time series of asset returns. While returns themselves
are usually approximately unforecastable, forecasting
0169-2070/$ - see front matter D 2004 International Institute of Forecaste
doi:10.1016/j.ijforecast.2004.10.002
* Corresponding author.
E-mail address: [email protected] (T. KVs˙Vnbay).
of the variance of returns has been quite successful.
Of course, evaluating the degree of this success, as
with the forecasting itself, has been complicated by
the unobservability of the realized conditional
variance.
Recent work with high-frequency data, in particular
by Andersen and Bollerslev (1997, 1998), has provided
an avenue for relaxing this constraint. Using direct
estimates of daily conditional variance obtained from
high-frequency (intraday) squared returns may permit
better evaluation of model-based daily variance fore-
casts (e.g. Andersen & Bollerslev, 1998) and character-
sting 21 (2005) 249–260
rs. Published by Elsevier B.V. All rights reserved.
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260250
ization of daily conditional variances without recourse
to standard parametric models (e.g. Andersen, Boller-
slev, Diebold, & Labys, 2001; Andersen, Bollerslev,
Diebold, & Labys, 2003; Bollerslev & Wright, 2001).
The present paper uses intraday asset returns to
examine model-based variance forecasts at various
horizons. As Christoffersen and Diebold (2000) note,
risk depends on the horizon considered, and different
horizons are relevant for different problems or, in the
case of financial risk management, different classes of
market participant. Nonetheless, little is known about
the forecastability of variance at horizons beyond the
very short, such as 1 day. We investigate forecasts,
both from the standard GARCH model estimated by
Quasi-Maximum Likelihood (QML), from the frac-
tionally integrated GARCH model which allows long
memory in conditional variance, and from projecting
directly on past values of the realized variance, at
horizons of up to 30 trading days.1
These forecasts are compared with the forecast
implicit in the unconditional sample variance. We are
then able to consider the maximum horizon at which
any of these forecasts contain useful information about
deviations of daily conditional variance from the
unconditional variance: that is, the maximum horizon
at which we can exploit conditioning information for
forecasting. In doing so, we use the definitions of
forecast content of Galbraith (2003), and compute the
forecast content function to the 30-day horizon.
Throughout this paper, wewill use the term drealizedvarianceT to refer to an estimate of daily variance
obtained by summing higher frequency returns; this
concept has typically been labelled drealized volatilityT,following Andersen and Bollerslev (1998) and sub-
sequent authors. However, we will follow Barndorff-
Neilsen and Shephard (2002) in using the former term,
since volatility has more usually been used to refer to a
measure of standard deviation rather than variance. The
next section of the paper describes themethods that will
be applied in evaluating the information content of
variance forecasts. The third section describes the high-
frequency data and forecasting models, and the fourth
estimates QML-GARCH, -FIGARCH and realized-
1 We emphasize the QML estimation in referring to these
estimates to make clear that the (FI)GARCH forecasts used are of
the standard type (e.g. Bollerslev, 1986; Baillie et al., 1996), based
on daily data alone.
variance forecast content for the daily conditional
variance of the returns, and interprets the results.
2. Variance forecasts and content horizons
The conditional variance of a time series is not
directly observable. For this reason, both estimation
and evaluation of forecasting models is difficult
relative to the conditional mean case. However,
estimates of variance can be obtained from the
quadratic variation or realized variance. For example,
Andersen and Bollerslev (1998) address the question
of the forecast performance of GARCH models using
estimates of daily conditional variance obtained from
intraday returns, and compare an R2-type measure of
predictive power at a 1-day horizon. The exercise that
we will perform here uses the same realized variance
measures, but uses a measure based on relative
forecast MSE at different horizons to characterize
forecasting performance for daily variance.
The model of asset returns used by Andersen and
Bollerslev, which we adopt for the exposition here,
assumes that returns follow a continuous-time diffu-
sion process such that instantaneous returns obey the
relation dpt=rtdWp,t, where pt is the logarithm of the
price, rt is the instantaneous standard deviation, and
Wp,t is a Wiener process. Consider the sequence of
(Tm+1) discretely sampled logarithmic prices {pj},
j=0, 1/m, 2/m,. . .,T, where integer values of the indexj represent end-of-day closing prices and non-
integer values represent intraday observations. Trans-
form the sequence to obtain Tm returns, r l;lþ1½ �,l ¼ 0; 1
m; 2m; N ; T � 1
m
��. Defining r[a ,b] as pb�pa,
the daily return is r[t�1,t]=pt�pt�1 and the return on
the last of m intraday periods of equal length between
t�1 and t is r[t�1/m,t]=pt�pt�1/m.2
With these definitions, the daily quadratic variation
or integrated variance isR 10
r2tþsds, and using the
result that (Andersen & Bollerslev, 1998)
p limmYl
Z 1
0
r2tþsds �
Xmi¼1
r2tþ i�1ð Þ
m;tþ i
m�½
! ¼ 0; ð2:1Þ
it follows that the summation of intraday squared
returns, or daily realized variance, provides an
2 This is a slight modification of Andersen and Bollerslev’s
notation for the returns.
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260 251
estimate of the daily conditional variance. It is the
performance of daily conditional variance models in
forecasting this quantity of interest which is evaluated
both by Andersen and Bollerslev and others, and in
the present paper. Note that the result (Eq. (2.1))
requires that the discretely sampled squared returns
r[t+((i�1)/m), t+(i/m)]2 be serially uncorrelated and that the
price process be continuous. In practice, this summa-
tion will provide only an approximation toR 10
r2tþsds,
the quality of which will depend upon the adequacy of
these assumptions as well as on the intraday sampling
frequency m (that is, the asymptotics). Meddahi
(2002) provides a detailed examination of the
discrepancy between integrated and realized variance.
The daily realized variance therefore gives us the
target for the forecasting technique, analogous to the
realized outcome in forecasting an observable series.
With this information, we proceed to the evaluation of
forecasts at different horizons, making use of the
forecast content function as defined in Galbraith
(2003), modified slightly here to reflect the fact that
we are evaluating second-moment forecasts. There,
the forecast content function was defined as the
proportionate reduction in MSE obtainable relative
to the unconditional mean forecast, in forecasting the
level of a quantity observable ex post. Adapting that
definition for a conditional variance forecast, we set
C sð Þ ¼ 1�MSE rr2 sð ÞÞðMSE rr2 sð ÞÞð
; s ¼ 1; N ; S; ð2:2Þ
where r2(s) and r2(s) are, respectively, the model-
based s-step-ahead forecast of the daily conditional
variance, and the estimated unconditional variance. As
in the case of conditional mean forecasts, C(s)Y0 as
T, sYl for a forecast rT+sjT2 based on a correctly
specified forecasting model; C(s) may be less than zero
where model parameter uncertainty dominates the
contribution of these parameters to prediction; C(s)
equal to zero indicates model-based forecasts no better
as predictors than is the unconditional sample variance.
For general autoregressive processes, the content
function can be obtained analytically as a function of
the autoregressive parameters and sample size avail-
able for estimation, taking account of parameter
estimation uncertainty; asymptotic expressions for
C(s) are given in Galbraith (2003). Alternatively, for
a set of forecasts from an arbitrary forecasting model,
C(s) can be estimated from a sequence of outcomes and
forecasts using the sample mean squared errors and
associated asymptotic inference. This is the method
used below for the forecasts evaluated in Section 4.
3. Data and models
We evaluate the forecast content functions for two
currencies, priced relative to the US dollar. In each
case, we examine the daily logarithmic returns. High-
frequency intraday data are available on the foreign
exchange prices (bid and ask) at 5-min intervals. A
point relevant to each of the data series, noted in
Andersen and Bollerslev (1998) and explored further in
Andersen et al. (2001, 2003), is that the highest
possible frequency of observation may not be optimal
from the point of view of daily variance estimation,
because of market microstructure or other effects. As in
Andersen et al. (2001, 2003), we consider various
possible sampling frequencies for the estimation of
daily variance.
3.1. Foreign exchange data
The foreign exchange data are Deutschemark–US
dollar and Yen–US dollar spot exchange rates, taken
from the Olsen and Associates HFDF-2000 data set.
The original sources of the raw data that are used to
construct HFDF are the DM/$US and Yen/$US bid–
ask quotations displayed on the Reuters FXFX screen.
These raw data are subject to various microstructure
frictions, outliers and other anomalies, and have to be
filtered; see Muller et al. (1990) and Dacorogna;
Muller, Nagler, Olsen, and Pictet (1993) for the
filtering procedures and the construction of the
returns. Because these data have been fairly widely
used, our description will be brief.
Each data point in the HFDF-2000 data set
contains a mid-price, a bid–ask spread over a 5-
min interval of spot rate quotation and a time stamp,
resulting in a total of 288 five-minute returns for a
given day. The 5-min returns, expressed in basis
points (i.e. multiplied by 10,000), are computed as
the mid-quote price difference. The mid-price at a
given regular time point is estimated through a linear
interpolation between the previous and following
mid-price of the irregularly spaced tick-by-tick data.
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260252
The bid–ask spread is the average over the last 5-min
interval, again expressed in basis points. If there is
no quotation during this interval, the mean bid–ask
spread is zero. The sample spans the period from
January 2, 1987 to December 31, 1998, providing us
with a sample of 1,262,016 five-minute returns,
expressed in US$ terms. However, it is well known
that trading activity in the foreign exchange market
slows markedly over the weekends and certain
holidays. Following Andersen et al. (2001), among
others, we remove such low-activity days from our
data set. Whenever a day t is excluded from the data
set, we cut from 21:05 GMT of the previous calendar
day t�1 to 21:00 GMT on the calendar day t. This
definition of a bdayQ is standard in the literature and
is based on work by Bollerslev and Domowitz
(1993).
In addition to the low-activity period from Friday
21:05 GMT to Sunday 21:00 GMT, we remove the
following fixed holidays from our data set: Christmas
(December 24–26), New Year’s (December 31–Jan-
uary 2), and the Fourth of July. We also remove the
moving holidays of Good Friday, Easter Monday,
Memorial Day, Labour Day, Thanksgiving (US) and
the day after Thanksgiving. Finally, we cut the days for
which the indicator variable (the bid–ask spread) had
144 or more zeros, corresponding with the technical
bholesQ in the recorded data. After these adjustments,
we are left with 2968 trading days of DM/$US data
corresponding with 2968288=854,784 five-minute
return observations, and 2970 days of Yen/$US data
corresponding with 2970288=855,360 five-minute
return observations. A final adjustment is made to the
Yen/$US data set: because of the extremely large
variance observations occurring at October 7–8, 1998
and surrounding dates,3 we terminate this data set at
the end of May 1998, leaving 2830 days of high-
frequency data. We comment below on the effect of
this trimming on the relative performance of the two
forecast types.
We report results for three different aggregations of
the intraday data as well as for the daily squared
return, for comparison: 5-, 10- and 30-min returns. We
use an initial sample of daily returns, derived from the
high-frequency data, for QML estimation.
3 This was the period of the collapse of Long Term Capital
Management.
3.2. Forecasting models
We use three classes of forecasting model to
evaluate the information content of variance forecasts
at horizons extended to 30 days. The first is the
standard GARCH model estimated by Quasi-Max-
imum Likelihood methods on samples of daily returns
data; these forecasts do not use the information present
in the high-frequency data. The second is the FIG-
ARCH model which incorporates a lag polynomial
term of the form (1�L)d, for noninteger d, and thereby
allows long memory effects in conditional variance
persistence: if actual autocorrelations in conditional
variance decay more slowly than is compatible with the
usual short-range dependent specifications, such mod-
els might be expected to perform relatively well at
longer horizons. The third class is the forecast of
conditional variance obtained from projection of
realized variance on past values of realized variance:
that is, autoregressive models of the realized variance,
as examined by Andersen et al. (2001, 2003), and
Bollerslev and Wright (2001).
Among short-memory models in the QML case,
we restrict ourselves for reporting purposes to the
GARCH(1,1) model which Bollerslev and Wright, for
example, found to produce the best one-step variance
forecasts by a substantial margin, on a set of DM/$US
data very similar to that used here.4 We found that
threshold GARCH and EGARCH specifications did
not perform substantially differently.
The GARCH( p,q) model can be characterized
using polynomials in the lag operator as
r2t ¼ x þ b Lð Þr2
t þ a Lð Þe2t ð3:1Þor in the (1,1) case rt
2=x+br2t�1+ae2t�1; we can
rewrite Eq. (3.1) as
1� a Lð Þ � b Lð Þð Þe2t ¼ x þ 1� b Lð Þð Þ e2t � r2t
��:
ð3:2ÞThe FIGARCH( p,d,q) extends the model in the form
of Eq. (3.2) by allowing a term of the form (1�L)d
(usually da(0,1)), typically written as
1� / Lð Þð Þ 1� Lð Þde2t ¼ x þ 1� b Lð Þð Þ e2t � r2t
��;
ð3:3Þ
4 The alternatives considered by Bollerslev and Wright were
GARCH models up to order (2,2), and the EGARCH(1,1).
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260 253
where the GARCH model may be viewed as a special
case in which d=0 so that the term (1�L)d plays no
role, and in which (1�/(L))=(1�a(L)�b(L)). In the
FIGARCH(1,d ,1) case, (1�/(L ))=(1�/L ) and
(1�b(L))=(1�bL).5
For initial estimation of the GARCH(1,1) parame-
ters, initial daily samples of each of the two foreign
exchange return series are taken from the first 2 years,
1987–1988. From each forecast date, out-of-sample
forecasts are produced for s days in the future,
s=1,2. . .30, so that the first forecasts are computed
for the first 30 trading days of 1989. Parameter
estimates are then updated to include the data from
the first trading day of 1989, and forecasts are produced
for trading days 2 to 31, and so on: the estimates are
updated at each day for recursive estimation and out-of-
sample forecasting, producing sets of forecasts for
1989 through the ends of the data sets in 1998. For the
FIGARCH forecasts, a somewhat larger initial sample
was necessary in order to avoid computational prob-
lems. We therefore used forecasts based on fixed
parameter estimates on 1200 sample points for an
initial period, with recursive updating of parameter
estimates thereafter. This gives a small advantage to the
FIGARCH as initial forecasts benefit from parameter
estimation based on future data, but as the FIGARCH
does not prove to be superior to other methods, it does
not seem to be critical to the results.
Forecasts by autoregressive projection on past
realized volatilities (AR-rv) are also computed recur-
sively beginning with the same initial samples. Note
that these forecasts use the additional information
present in the high-frequency data, and are therefore
not based on the same information set as the QML-
GARCH/-FIGARCH forecasts. We again provide
results for one of several alternatives considered by
Bollerslev and Wright (2001), in this case a time-
domain AR(12) model of the realized volatilities,
which provides good results on each data set. It is
possible nonetheless that overall increases in forecast
content may be available from variants on this model.
For each data set, we then also estimate the
unconditional variance recursively using the same
5 Recall that (1�L)d can be expanded as 1� Lð Þd ¼Pl
k¼0
d
k
� Lð Þk¼
Plk¼0 bkL
k , with b0=1, b1=�d , b2=(1/2)d(d�1), bj=
(�1/j)bj�1(d+1�j), jz3.
initial sample, in order to evaluate the performance of
the forecasting models in predicting deviations of the
conditional variance from this quantity; that is, we
ask whether conditioning information, as exploited
by the model-based forecast, has value in predicting
variance at a given horizon.
Estimates of the variance forecast content function
for each class of forecast are obtained as follows.
Beginning with an initial sample of size t0, let s=t0,t0+1,. . .,T index a sample of trading days for which
high-frequency information is available, on either of
the two data sets (that is, T=2968 and 2970 for DM/
$US and Yen/$US, respectively). Let s=1,2,. . .,S indexthe forecast horizon; we consider a maximum horizon
of S=30 days. Construct the (T�t0+1)S matrix E
having typical element es,s, defined as the forecast error
from the model-based forecast for trading day s given aforecast made s days in the past. Construct the
corresponding (T�t0+1)1 vector U having typical
element us, defined as the difference between the
estimated unconditional variance of the returns and the
realized variance for trading day s. The latter quantitiesdo not depend on s because the unconditional variance
does not, of course, depend on s. For the entries in both
E and U, we take the sample mean squared error
obtained in comparing the forecast with the realized
variance, at the chosen level of aggregation (e.g. 5, 10
min). These sample quantities are substituted into Eq.
(2.2), for each s.
Pointwise standard errors for the content function
are computed using the standard approximate expres-
sion varðXYÞfðlX
lYÞ2½var Xð Þ
l2X
þ var Yð Þl2Y
� 2cov X ;Yð ÞlX lY
�. Let Xbe the estimated MSE of model-based forecasts foreach s, and let Y be the estimated MSE of the
forecast implicit in the unconditional sample var-
iance. We replace each of the population quantities
on the RHS of this expression with their sample
counterparts for asymptotic inference, allowing for
dependence to order 29, in accordance with the
standard result that successive optimal h-step fore-
casts overlap h�1 periods and so can show depend-
ence to this order. We use a Bartlett window in
computing the variance and covariance terms, and
note that consistency would require the window to
increase, or weights to approach 1, with sample size.
Note also that these standard errors should not be
interpreted as leading to asymptotically normal tests
of (non-)zero forecast content: inference on the
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260254
content functions will be presented in Section 4 using
a test of Giacomini and White (2003).
4. Empirical results
The figures record the estimated forecast content
functions for the two data sets and the three forecast
methods; the content functionsC(s) are shownwithF2
standard error bands. Each figure shows the forecast
content function for three different time aggregations,
together with the function estimated on squared returns,
for comparison. The latter is of interest primarily to
underline, in accordance with Andersen and Bollerslev
Fig. 1. (a) DM/USD. Forecast content function. GARCH(1,1) forecasts,
function. GARCH(1,1) forecasts, realized volatility of 10-min returns. (c) D
volatility of 30-min returns. (d) DM/USD. Forecast content function. GA
(1998), the value of realized variance in exploring the
performance of conditional variance forecasts; using
the very noisy squared return series, which was used for
forecast evaluation in literature pre-dating Andersen
and Bollerslev (1998), yields no substantial indication
of forecasting power on any data set. Recall that a value
of zero indicates a forecast of daily variance no more
informative, by MSE, than is the unconditional sample
variance.
Specifically, Figs. 1(a–d) and 2(a–d) plot estimated
forecast content functions, using forecasts from the
QML-estimated GARCH(1,1) model, on the three
realized variance measures and the untransformed
daily squared returns. These figures describe the DM/
realized volatility of 5-min returns. (b) DM/USD. Forecast content
M/USD. Forecast content function. GARCH(1,1) forecasts, realized
RCH(1,1) forecasts, squared daily returns.
Fig. 2. (a) Yen/USD. Forecast content function. GARCH(1,1) forecasts, realized volatility of 5-min returns. (b) Yen/USD. Forecast content
function. GARCH(1,1) forecasts, realized volatility of 10-min returns. (c) Yen/USD. Forecast content function. GARCH(1,1) forecasts, realized
volatility of 30-min returns. (d) Yen/USD. Forecast content function. GARCH(1,1) forecasts, squared daily returns.
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260 255
$US and Yen/$US data, respectively, with aggrega-
tions of 5, 10 and 30 min used in estimating realized
variance, as well as with the squared daily return for
comparison.
On both data sets, all estimates on realized variance
measures (parts a–c) show positive point estimates of
forecast content remaining at the 30-day horizon: use of
a formal variance forecasting model for daily variance
shows positive information content 30 days into the
future, in the sense that the forecasts improve on the
unconditional variance of the process as an indicator of
the realized variance that will emerge. In each data set,
the 5-min aggregation provides the highest estimate of
forecast content.
The results for conditional variance forecasts made
via autoregressions on realized variance appear in
Figs. 3 and 4. The pattern is very similar to that
appearing in Figs. 1 and 2, respectively; however, the
point estimates of forecast content are in general
higher for the realized variance forecasts for
approximately 10–15 trading days. Toward the
end of the 30-day maximum horizon that we
consider, the two types of forecasts appear to be
very similar. As one would expect, there is virtually
no evidence of positive content in applying this
method directly to squared returns (Figs. 3d and
4d). Again, the strongest results on both data sets
appear in the 5-min returns.
Fig. 3. (a) DM/USD. Forecast content function. AR(12) forecasts, realized volatility of 5-min returns. (b) DM/USD. Forecast content function.
AR(12) forecasts, realized volatility of 10-min returns. (c) DM/USD. Forecast content function. AR(12) forecasts, realized volatility of 30-min
returns. (d) DM/USD. Forecast content function. AR(12) forecasts, squared daily returns.
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260256
A numerical summary of some of these results
appears in Table 1, for the 5-min aggregation.6 These
results tend to confirm, for the shorter horizons, the
results of Bollerslev and Wright (2001) concerning the
superiority of AR-rv forecasts over QML-GARCH. At
longer horizons, the three methods show very similar
point estimates of information content. The content
functions for FIGARCH forecasts are similar to those
for the GARCH forecasts, and we do not record these
6 Note however that in trimming the Yen sample to remove the
apparent outliers, we have given an advantage to the AR-realized
variance forecasts, which are more sensitive to this effect than the
GARCH forecasts; the latter are based on GARCH estimates of
daily variance, which fluctuate less dramatically than the realized
volatilities.
in separate figures. However, a few differences do
appear, of which the summary measures of Table 1 are
a sufficient indicator. In particular, the FIGARCH
forecasts have slightly less content at very short ho-
rizons, so that there is a clear ranking (AR-rv,
GARCH, FIGARCH, respectively) with respect to
these point estimates of short-horizon content. At the
longest horizon, the FIGARCH forecasts have point
estimates of content virtually identical with the
GARCH.
It is important to recall that even the best realized
variance measure contains measurement noise relative
to the daily integrated varianceR tt�1
r2tþsds. Such
measurement noise tends to lower all measures of the
value of forecasts, relative to that which would be
Fig. 4. (a) Yen/USD. Forecast content function. AR(12) forecasts, realized volatility of 5-min returns. (b) Yen/USD. Forecast content function.
AR(12) forecasts, realized volatility of 10-min returns. (c) Yen/USD. Forecast content function. AR(12) forecasts, realized volatility of 30-min
returns. (d) Yen/USD. Forecast content function. AR(12) forecasts, squared daily returns.
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260 257
measured with knowledge of the true daily variance, by
obscuring the match between forecast and outcome.
For this reason, we interpret the highest forecast
content across measures as evidence in favour of the
superiority of that measure. The evidence in these data
sets therefore favours the 5-min aggregation.
More precise evidence on the relative performance
of the models comes from a sequence of tests for the
null of equal forecast accuracy, on the two sequences of
(FI)GARCH forecasts vs. autoregression on realized
variance, for each of the two currencies. We first report
the well-known Diebold and Mariano (1995) test
(Table 2).
There are some significant results in this test at
short horizons, corresponding with higher forecast
content in the AR-rv forecasts relative to GARCH
and FIGARCH. Clearly the (FI)GARCH vs. AR-rv
results are stronger on the DM/$US data; there is
some weak evidence on the Yen/$US data that the
AR-realized variance forecast may be superior at
fairly short horizons, by this test. For each
forecasting method, the point estimates of the
content of the forecasts become close to zero
around the 30-day horizon. Although there are
small differences in the point estimates of forecast
content for the longer horizons, there do not appear
to be statistically significant differences among the
methods at long horizons, by this test. Note again
that the ratio of the content to the standard error
should not be taken as an asymptotically normal
Table 1
Numerical summary of forecast content
Forecast method DM/$US Yen/$US
QML-GARCH
C(1) (std. err.) 0.26 (0.02) 0.25 (0.03)
C(5) (std. err.) 0.14 (0.03) 0.11 (0.03)
C(20) (std. err.) 0.07 (0.04) 0.04 (0.02)
C(30) (std. err.) 0.05 (0.03) 0.02 (0.02)
QML-FIGARCH
C(1) (std. err.) 0.23 (0.04) 0.17 (0.04)
C(5) (std. err.) 0.15 (0.07) 0.11 (0.06)
C(20) (std. err.) 0.07 (0.07) 0.05 (0.06)
C(30) (std. err.) 0.05 (0.07) 0.02 (0.05)
AR-rv
C(1) (std. err.) 0.39 (0.03) 0.32 (0.03)
C(5) (std. err.) 0.20 (0.03) 0.15 (0.03)
C(20) (std. err.) 0.07 (0.03) 0.05 (0.02)
C(30) (std. err.) 0.03 (0.03) 0.03 (0.02)
Evaluated by realized variance of 5-min returns.
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260258
test. We test the significance of the forecast content
below.
A limitation of these tests is that they do not take
into account uncertainty of parameter estimation and
are not valid when one of the models under
consideration is nested by the other. Tests developed
by Clark and McCracken (2001) deal with nested
models, but require linearity in the models being
estimated. We therefore implement a new test
developed by Giacomini and White (2003), which
allows for tests of equal predictive ability of both
nested and non-nested, linear and nonlinear models.
The tests have asymptotic v2 distributions, and
appear to have good finite-sample size conformity.7
Table 3 presents results from these tests, again by
horizon and p-value. The test requires a choice of the
parameter q, the number of elements in the test
function; see Giacomini and White (2003). This
choice is potentially important, and Giacomini and
White note that the test function should include
information about the future values of the loss
differential (via lags of the loss differential, where
there is autocorrelation), but an excessive value
reduces the power of the test. In their simulation
examples, the authors use q=2, where the test function
7 We thank R. Giacomini and H. White for supplying their Matlab
code for the test.
is formed as a matrix with a column of ones and
another column containing the lagged values of the
loss differential. In the present study, we examined
fixed values of q as well as a choice of q based on the
autocorrelation structure of the loss differential. To
balance the necessity of including highly correlated
lags of the loss differential and the danger of eroding
the power of the test, we include lags with sample
autocorrelation higher than 0.20 in the test function,
and choose q accordingly (using q=2 if there is no
autocorrelation of this magnitude). Table 3 reports the
results with q chosen in this way; results based on
fixed values of q are broadly similar, but differ
substantially in detail.
The GARCH(1,1) and AR(12)-rv forecasting
models are compared in columns 2 and 3 of the table;
columns 4 and 5 compare these forecasting models
with the forecast implicit in the constant unconditional
variance. A significantly lower forecast MSE relative
to the constant-variance forecasts represents signifi-
cant forecast content. That is, the null hypothesis is of
equal conditional predictive ability of the model-
based (GARCH or AR) forecasts and the uncondi-
tional variance, r2. Under this null, forecast content is
zero, so that the tests in columns 4 and 5 are tests of
significance of estimated forecast content at horizon s,
for each forecasting model.8
The result of the GARCH/AR-rv comparison in this
test is broadly similar to that in the Diebold–Mariano
test, but these p-values tend to be lower. At shorter
horizons, there are significant differences between
tests, whereas at longer horizons the differences
between the two classes of model-based forecast tend
not to be significant at conventional levels. With
respect to the significance of the forecast content at
different horizons, there are clear indications of
significant content of either forecasting method to the
30-day horizon considered here on the DM data. On the
Yen data, the same is true for the AR-rv forecasts, while
GARCH forecast content is not significantly different
from zero at the longer horizons.
To summarize the results from the set of tests, there
appears to be a significant difference in short-horizon
predictive ability in favour of the AR-rv forecasts based
on the additional information in the realized volatilities.
8 Results for the FIGARCH models are not included because of
computational constraints.
Table 2
Diebold–Mariano tests of equal forecast loss: p-values
Horizon (s) GARCHv.AR (DM/$US) GARCHv.AR (Yen/$US) FIGARCHv.AR (DM/$US) FIGARCHv.AR (Yen/$US)
1 b0.01 0.11 b0.01 0.15
2 0.01 0.04 0.01 b0.01
3 0.01 0.25 0.06 0.06
4 0.08 0.14 0.03 0.91
5 0.03 0.09 0.01 0.38
10 0.08 0.10 0.07 0.25
15 0.15 0.37 0.15 0.71
20 0.67 0.72 0.62 0.79
30 0.71 0.32 0.81 0.85
GARCH(1,1) and FIGARCH(1,d,1) vs. AR-realized variance forecasts. Evaluated by realized variance of 5-min returns.
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260 259
At longer horizons near 30 days, the various model-
based methods are difficult to distinguish. However,
although the point estimates of forecast content tend to
be just slightly above zero at 30 days, there is some
indication that this content is statistically significant,
that is, that there is a small but genuine element of
forecast content remaining at the 30-day horizon.
5. Concluding remarks
To measure forecast performance, we need to
estimate ex post the outcome (here, the conditional
variance) which was being forecast. The use of
realized variance, as in Andersen and Bollerslev
(1998) and various subsequent contributions, has
clearly improved our ability to do so, and we take
advantage of this measurement here to examine the
Table 3
Giacomini–White tests of equal conditional predictive ability:
p-values
Horizon
(s)
GARCHv.AR
(DM/$US)
GARCHv.AR
(Yen/$US)
GARCH/AR
vs. r2
(DM/$US)
GARCH/AR
vs. r2
(Yen/$US)
1 b0.01 b0.01 b0.01/b0.01 b0.01/b0.01
2 0.10 b0.01 b0.01/b0.01 b0.01/b0.01
3 0.02 b0.01 b0.01/b0.01 b0.01/b0.01
4 b0.01 b0.01 b0.01/b0.01 b0.01/b0.01
5 0.01 b0.01 b0.01/b0.01 0.02/b0.01
10 0.05 0.03 b0.01/b0.01 0.16/b0.01
15 0.12 0.05 b0.01/0.01 0.28/0.01
20 0.29 0.26 0.01/0.02 0.47/0.02
30 0.06 0.16 b0.01/0.02 0.18/0.01
GARCH(1,1) vs. AR-realized variance forecasts vs. unconditional
mean. Evaluated by realized variance of 5-min returns.
usefulness of forecasts at longer horizons than have
commonly been investigated. Nonetheless, as in
Andersen et al. (2001, 2003), the choice of frequency
of measurement has nontrivial affects on the meas-
ured realized variance. These different estimates
affect our measures of forecast content; we find,
however, that the 5-min realized volatilities provide
the strongest results on all data sets.
The results have a number of other features which
appear common to the different data sets. First,
forecasts of conditional variance appear to have
information content to a horizon of approximately 30
trading days, although the precise maximum horizon
of course depends on the data set and forecast method.
It is possible that a measure of daily variance with even
less noise than the 5-min realized volatilities would
provide yet stronger evidence of the value of variance
forecasts, particularly at long horizons where evidence
is weaker. Second, forecasts based on autoregressive
projection on past realized volatilities do appear to
provide better results at short horizons (to about 10–15
days), as was found by Bollerslev and Wright (2001)
for the 1-day horizon (on DM/$US data). At longer
horizons, as the content of each type of forecasts nears
zero, the methods inevitably become more difficult to
distinguish on statistical grounds. Nonetheless, there is
evidence of significant forecast content, i.e. evidence
that each of the methods usefully exploits conditioning
information to approximately 30 trading days.
Acknowledgement
We would like to thank the Fonds quebecois de la
recherche sur la societe et la culture (FQRSC) and the
J.W. Galbraith, T. KVs˙Vnbay / International Journal of Forecasting 21 (2005) 249–260260
Social Sciences and Humanities Research Council of
Canada (SSHRC) for financial support of this
research, and the Centre Interuniversitaire de
recherche en analyse des organisations (CIRANO)
and Centre Interuniversitaire de recherche en econo-
mie quantitative (CIREQ) for research facilities. The
foreign exchange data were provided by Olsen
Associates. We are also grateful to Raffaella Giaco-
mini for advice on implementation of the Giacomini
and White test, and to Serguei Zernov and Victoria
Zinde-Walsh for their comments and suggestions. The
views expressed in this paper are those of the authors
and do not necessarily represent the views or policy of
the IMF.
References
Andersen, T. G., & Bollerslev, T. (1997). Intraday periodicity and
volatility persistence in financial markets. Journal of Empirical
Finance, 4, 115–158.
Andersen, T. G., & Bollerslev, T. (1998). Answering the skeptics:
Yes, standard volatility models do provide accurate forecasts.
International Economic Review, 39(4), 885–905.
Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2001).
The distribution of realized exchange rate volatility. Journal of
the American Statistical Association, 96, 42–55.
Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003).
Modeling and forecasting realized volatility. Econometrica, 71,
579–625.
Baillie, R. T., Bollerslev, T., & Mikkelsen, H. O. (1996). Fraction-
ally integrated generalized autoregressive conditional hetero-
skedasticity. Journal of Econometrics, 74, 3–30.
Barndorff-Neilsen, O. E., & Shephard, N. (2002). Estimating
quadratic variation using realized variance. Journal of Applied
Econometrics, 17, 457–477.
Bollerslev, T. (1986). Generalized autoregressive conditional
heteroskedasticity. Journal of Econometrics, 31, 307–327.
Bollerslev, T., & Domowitz, I. (1993). Trading patterns and prices
in the interbank foreign exchange market. Journal of Finance,
48, 1421–1443.
Bollerslev, T., & Wright, J. H. (2001). High-frequency data,
frequency domain inference, and volatility forecasting. Review
of Economics and Statistics, 83, 596–602.
Christoffersen, P., & Diebold, F. X. (2000). How relevant is
volatility forecasting for financial risk management? Review of
Economics and Statistics, 82, 12–22.
Clark, T. E., & McCracken, M. W. (2001). Tests of equal forecast
accuracy and encompassing for nested models. Journal of
Econometrics, 105, 85–110.
Dacorogna, M. M., Mqller, U. A., Nagler, R. J., Olsen, R. B., &Pictet, O. V. (1993). A geographical model for the daily and
weekly seasonal volatility in the foreign exchange market.
Journal of International Money and Finance, 12, 413–438.
Diebold, F. X., & Mariano, R. (1995). Comparing predictive
accuracy. Journal of Business and Economic Statistics, 13,
253–259.
Engle, R. F. (1982). Autoregressive conditional heteroskedasticity
with estimates of the variance of U.K. inflation. Econometrica,
50, 987–1008.
Galbraith, J. W. (2003). Content horizons for univariate time series
forecasts. International Journal of Forecasting, 19, 43–55.
Giacomini, R., & White, H. (2003). Tests of Conditional
Predictive Ability, Working paper, University of California,
San Diego.
Meddahi, N. (2002). A theoretical comparison between integrated
and realized volatilities. Journal of Applied Econometrics, 17,
479–505.
Mqller, U. A., Dacorogna, M. M., Olsen, R. B., Pictet, O. V.,
Schwarz, M., & Morgenegg, C. (1990). Statistical study of
foreign exchange rates, empirical evidence of a price scaling
law, and intraday analysis. Journal of Banking and Finance, 14,
1189–1208.
Biographies: John W. GALBRAITH is a Professor of Economics at
McGill University, Montreal, Canada, and a Fellow of the Centre
Interuniversitaire de Recherche en Analyse des Organisations
(CIRANO) and of the Centre Interuniversitaire de Recherche en
Economie Quantitative (CIREQ). He completed his doctorate in
Economics at the University of Oxford in 1987. His research is
primarily concerned with time-series econometrics and has appeared
in journals such as Biometrika, Econometric Theory, International
Economic Review, Journal of Applied Econometrics and Journal of
Econometrics.
Turgut KIS˙INBAY is an economist at the International Monetary
Fund. He received his BA in Economics from Bogazici University,
Istanbul, Turkey, and his PhD in Economics from McGill
University, Montreal, Canada. His research interests are in the areas
of monetary economics, business cycle analysis, time-series
econometrics, and financial econometrics.