high-dimensional covariance forecasting for short intra-day horizons

This article was downloaded by: [University of Iowa Libraries] On: 04 October 2014, At: 23:59 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Quantitative Finance Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/rquf20 High-dimensional covariance forecasting for short intra-day horizons Roel C. A. Oomen a a Deutsche Bank , London, and Department of Quantitative Economics, University of Amsterdam Published online: 06 Apr 2010. To cite this article: Roel C. A. Oomen (2010) High-dimensional covariance forecasting for short intra-day horizons, Quantitative Finance, 10:10, 1173-1185, DOI: 10.1080/14697680903220349 To link to this article: http://dx.doi.org/10.1080/14697680903220349 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions


Quantitative Finance, Vol. 10, No. 10, December 2010, 1173–1185

High-dimensional covariance forecasting for short intra-day horizons

ROEL C. A. OOMEN*

Deutsche Bank, London, and Department of Quantitative Economics, University of Amsterdam

(Received 4 November 2007; in final form 28 July 2009)

Asset return covariances at intra-day horizons are known to tend towards zero due to market microstructure effects. Thus, traders who simply scale their daily covariance forecast to match their trading horizon are likely to over-estimate the actual experienced asset dependence. In this paper, some of the key challenges are discussed that are encountered when forecasting high-dimensional covariance matrices for short intra-day horizons. Based on a novel evaluation methodology, and extensive empirical analysis, specific recommendations are made regarding model design and data sampling.

Keywords: Vast covariance matrices; Forecast evaluation; Market microstructure; Factor models

1. Introduction

This paper discusses some of the key ingredients of a robust and reliable approach to large-dimensional covariance matrix forecasting over short intra-day horizons. With the rapid development of algorithmic portfolio execution engines, and statistical arbitrage strategies operating at increasingly high frequencies, good covariance forecasts for intra-day use are important. In this setting, the main challenge is to deal correctly with the market microstructure contaminations that emerge in data sampled at ultra-high frequency. In this study, we contrast five competing methods, three based on intra-day data and two based on daily data. We cover models with and without microstructure noise corrections and models with and without factor structure imposed. Our performance evaluation approach centres around portfolio optimality and portfolio stability. The large amount of available data allows us to discriminate accurately amongst the performance of the alternative models and we illustrate their relative merits and weaknesses using the FTSE-100 index constituents.

Our main findings can be summarized by the following ‘rules-of-thumb’ or recommendations that should be considered when developing a covariance forecasting procedure at intra-day horizons.

don’t use low-frequency risk models for high-frequency covariance forecasting

Our analysis shows that models based on daily or lower-frequency data perform poorly at intra-day frequencies. This is primarily because, at intra-day frequencies, the dependence structure of returns changes in subtle ways due to various market microstructure effects and this cannot be captured by models based on daily data. Thus, the common practice of scaling the covariance matrix using the ‘$\sqrt{T}$-rule’ fails when moving to intra-day horizons.

impose a factor structure to stabilize portfolio weights

Particularly in large-dimensional systems, the stability of the covariance matrix can be at risk when the number of time series observations is limited. Our analysis shows that a statistical factor reduction is effective in ‘stabilizing’ the covariance matrix and that this leads to significant reduction in portfolio weight variation and resulting transaction costs.

align sampling frequency with trading or evaluation frequency

The mis-match observed between the level of covariance at daily and intra-day frequencies is not a spurious effect introduced by data sampling. Instead, it reflects the genuinely different dependence structure that a trader experiences over short intra-day horizons. To best capture the covariance structure that one is exposed to it is key to align the data sampling frequency with the (expected) trading or evaluation frequency. Failing to do so can severely under- or over-estimate the experienced dependence among the assets.

*Email: [email protected]

Quantitative Finance ISSN 1469–7688 print/ISSN 1469–7696 online © 2010 Taylor & Francis
http://www.informaworld.com DOI: 10.1080/14697680903220349

2. Covariance forecasting and evaluation

Below, we describe the forecasting models, performance evaluation criteria, and data we use in this paper. As the focus is on covariance forecasting over short horizons, the models are deliberately built using intra-day data. We make forecasts at the start of each day and then evaluate these – using various portfolio optimality and stability criteria – over subsets of the asset universe and short intervals throughout that day. Here the intervals and subsets are randomly selected, and the procedure repeated many times per day. We should emphasise that the covariance forecasting methods we use in this paper are rather simple and there is substantial scope for refinement. For instance, one could consider the use of irregularly sampled transaction data to estimate covariances (e.g. Hayashi and Yoshida 2005). Also, incorporating long memory effects (e.g. Chiriac and Voev 2009), accounting for diurnal patterns in volatility and correlation dynamics, and extending the methodology to allow for real-time updates of the forecast, would be interesting. We will further discuss and illustrate some of these aspects in the concluding remarks, but leave a detailed analysis for future work as it goes well beyond the scope of the current paper and distracts from the main points we want to make.

2.1. Forecasting methods

Define the $n \times 1$ vector of the $i$th intra-day return on day $t$ for a set of $n$ assets as

\[ r_{t+i/M} = p_{t+i/M} - p_{t+(i-1)/M} \quad \text{for } i = 1, 2, \ldots, M, \]

where $p$ denotes the log price vector and $M$ denotes the number of synchronously sampled intra-day returns. The ex-post realised covariance (RC) for day $t$, using returns sampled at frequency $M$, is defined as

\[ RC_{M,t} \equiv \sum_{i=1}^{M} r_{t+i/M}\, r'_{t+i/M}. \tag{1} \]
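As an illustration of equation (1), the following minimal sketch computes the realised covariance for one day from a grid of synchronously sampled log prices. The function name `realised_covariance` and the simulated prices are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def realised_covariance(log_prices: np.ndarray) -> np.ndarray:
    """Realised covariance for one day, as in equation (1).

    log_prices has shape (M + 1, n): M + 1 synchronous log prices for n assets.
    Returns the n x n matrix sum_i r_i r_i'.
    """
    returns = np.diff(log_prices, axis=0)  # (M, n) intra-day returns
    return returns.T @ returns             # sum of outer products r_i r_i'

# Simulated example (assumed data): M = 34 fifteen-minute returns, n = 3 assets.
rng = np.random.default_rng(0)
M, n = 34, 3
log_prices = np.cumsum(rng.normal(scale=1e-3, size=(M + 1, n)), axis=0)
rc = realised_covariance(log_prices)
```

The matrix product `returns.T @ returns` is simply a vectorised form of the sum of outer products in the definition.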

Under ideal conditions, in a frictionless market, prices are martingales and it is well known (see, e.g., Barndorff-Nielsen and Shephard 2004) that RC is an unbiased and consistent estimator of the true covariance matrix with increasingly fine sampling, i.e. $M \to \infty$. In other words, the expectation of RC is invariant to the choice of $M$ but the accuracy of the estimates improves by increasing $M$. In practice, however, market microstructure effects such as bid–ask bounce, non-synchronous trading, and sluggish adjustment of prices, are a reality and RC loses these properties (see for instance Fisher 1966, Scholes and Williams 1977, Epps 1979, Roll 1984, Griffin and Oomen forthcoming). In particular, Epps (1979) was the first to note that as the sampling frequency increases, covariance estimates tend to zero as a consequence of non-synchronous trading effects (see also Lo and MacKinlay 1990). This so-called Epps effect is illustrated in figure 1 for the FTSE-100 dataset used in this paper. Consequently, when implementing RC the sampling frequency plays a key role: it controls the level of noise contamination present in the sampled data and determines the horizon of the covariance estimate. To mitigate the Epps effect, Scholes and Williams (1977)† suggest incorporating a cross-autocovariance correction, i.e.

\[ SW_{q,M,t} \equiv RC_{M,t} + \sum_{j=1}^{q} \sum_{i=1}^{M-j} \left( r_{t+(i+j)/M}\, r'_{t+i/M} + r_{t+i/M}\, r'_{t+(i+j)/M} \right). \tag{2} \]

Note that with $q = 1$, the diagonal elements of SW correspond to the bias-corrected realised variance estimator of Zhou (1996). Griffin and Oomen (forthcoming) provide a detailed theoretical and empirical study of the properties of SW in a bi-variate setting with noise and non-synchronous trading and compare its performance to RC and the Hayashi and Yoshida (2005) estimator. From figure 1 we observe that the SW modification is quite effective in reducing the bias induced by non-synchronous trading: while the average correlation at a five-minute frequency for RC is about 20%, SW with $q = 2$ estimates the correlation at 26%, which is the level at which it stabilizes as the sampling frequency is lowered and the noise progressively reduced. An important drawback of the estimator in equation (2) is that it is not guaranteed to be positive definite. This can be resolved by applying suitable kernel weights to the cross-autocovariance terms. Barndorff-Nielsen et al. (2008) derive various asymptotic results for such an estimator based on a refresh time sampling scheme.
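The cross-autocovariance correction in equation (2) can be sketched as follows. This is a minimal illustration rather than the author's implementation; the function name `scholes_williams` and the toy return series are assumptions. The toy series also shows, for a single asset, how the estimate can turn negative, which is the lack of positive definiteness mentioned above.

```python
import numpy as np

def scholes_williams(returns: np.ndarray, q: int = 1) -> np.ndarray:
    """Equation (2): RC plus q leads and lags of cross-autocovariances.

    returns has shape (M, n): M synchronous intra-day returns for n assets.
    """
    M = returns.shape[0]
    sw = returns.T @ returns                 # RC term of equation (1)
    for j in range(1, q + 1):
        lead = returns[j:]                   # r_{t+(i+j)/M}, i = 1..M-j
        lag = returns[:M - j]                # r_{t+i/M},     i = 1..M-j
        cross = lead.T @ lag                 # sum_i r_{t+(i+j)/M} r'_{t+i/M}
        sw = sw + cross + cross.T            # add the term and its transpose
    return sw

# Toy single-asset series with strong negative first-order autocorrelation.
toy_returns = np.array([[1.0], [-1.0], [1.0]])
sw_toy = scholes_williams(toy_returns, q=1)  # RC = 3, correction = -4
```

For the toy series the realised variance is 3 and the first-order correction is -4, so the corrected estimate is negative.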

We now turn to a description of the competing covariance forecasts we consider in this study. Given a time series of realised covariance matrices, the baseline forecasting method is constructed by simple exponential filtering of past RC measurements, i.e.

\[ RC_{M,t|t-1} = \frac{1-\lambda}{1-\lambda^{\tau}} \sum_{j=1}^{\tau} \lambda^{j-1} RC_{M,t-j}. \tag{3} \]

Thus, the covariance matrix is forecast at a daily frequency – by exponentially smoothing a time series of covariance matrices computed from intra-day data – and is then assumed to remain unchanged throughout the day.
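The exponential filter in equation (3) amounts to a weighted average of the most recent daily matrices, with weights that sum to one. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and converting a half-life into a decay parameter via 0.5^(1/h) with h = 21 trading days is one common reading of a one-month half-life, not a detail given in the text.

```python
import numpy as np

def decay_from_half_life(half_life_days: float) -> float:
    """Decay parameter whose weight halves every half_life_days (assumed convention)."""
    return 0.5 ** (1.0 / half_life_days)

def exp_filter_forecast(past: list, lam: float, tau: int) -> np.ndarray:
    """Equation (3). past[0] is the matrix for day t-1, past[j-1] for day t-j."""
    weights = lam ** np.arange(tau)                  # lam^(j-1), j = 1..tau
    weights *= (1.0 - lam) / (1.0 - lam ** tau)      # normalise: weights sum to 1
    return np.tensordot(weights, np.asarray(past[:tau]), axes=1)

# Assumed example: 60 days of 2 x 2 matrices, half-life of 21 trading days.
lam = decay_from_half_life(21)
past = [np.eye(2) * (j + 1) for j in range(60)]
forecast = exp_filter_forecast(past, lam, tau=60)
```

Because the weights sum to one, filtering a constant sequence of matrices returns that same matrix, which is a quick sanity check on the normalising factor.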

In large-dimensional systems, particularly when we have more assets than return observations (i.e. $n > M$), realised covariance – or a forecast thereof – may become unstable and a factor based approach is often desirable. Therefore, the second forecasting method we consider is a standard principal components (PC) reduction applied to the covariance forecast by equation (3). Here, the number of statistical factors to include is optimally set following the methodology outlined in Johnstone (2001). See appendix B for further details.

†See also Dimson (1979) and Cohen et al. (1983).
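The principal components reduction described above can be sketched as follows. The details of the paper's procedure are in its appendix B, which is not reproduced here, so this is only one common construction and an assumption: keep the `k` largest eigen-components as the factor part and replace the remainder by a diagonal matrix so that the asset variances are preserved. The function name `pc_reduce` and the choice `k = 2` are hypothetical.

```python
import numpy as np

def pc_reduce(cov: np.ndarray, k: int) -> np.ndarray:
    """Rank-k factor part plus diagonal residual (assumed construction)."""
    n = cov.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must lie between 1 and the number of assets")
    eigval, eigvec = np.linalg.eigh(cov)        # eigenvalues in ascending order
    top_val = eigval[-k:]                       # k largest eigenvalues
    top_vec = eigvec[:, -k:]                    # matching eigenvectors
    factor_part = (top_vec * top_val) @ top_vec.T
    # Residual variances; clipped at zero to guard against rounding error.
    resid_var = np.clip(np.diag(cov) - np.diag(factor_part), 0.0, None)
    return factor_part + np.diag(resid_var)

# Assumed example: a random positive definite 6 x 6 matrix reduced to 2 factors.
rng = np.random.default_rng(2)
a = rng.normal(size=(6, 6))
cov = a @ a.T + np.eye(6)
reduced = pc_reduce(cov, k=2)
```

With this construction the diagonal of the reduced matrix equals the diagonal of the input, so only the dependence structure is simplified.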


As already mentioned above, microstructure effects can be a concern when constructing covariance forecasts from intra-day data. Motivated by this, our third forecasting method is based on exponential filtering of a time-series of SW measurements, i.e.

\[ SW_{q,M,t|t-1} = \frac{1-\lambda}{1-\lambda^{\tau}} \sum_{j=1}^{\tau} \lambda^{j-1} SW_{q,M,t-j}. \tag{4} \]

In addition to the above methods, we further consider two benchmark forecasts commonly used in this literature. The first is a ‘BARRA-type’ fundamental factor model that decomposes risk into industry exposure and an idiosyncratic company specific component. The model is estimated from daily data, using market cap weighting, and optimised for a daily forecasting horizon. Hence, we refer to it as the daily factor model or DF. See Briner and Connor (2008) for a detailed description of such models. The second benchmark method we consider is the popular dynamic conditional correlation (DCC) model of Engle and Sheppard (2001) and Engle (2002). Here, univariate GARCH models are estimated for each individual asset in the universe with highly parsimonious joint correlation dynamics imposed on the full system. Like the DF model, we estimate the DCC model from daily data. Even though DF and DCC are expressly not designed to forecast covariances at intra-day horizons, we recognize that these types of risk models are used extensively throughout the industry and it is conceivable that they are being applied to increasingly short horizons as the speed of trading continues to grow. The results presented here should therefore be viewed as a measure of the potential gains or losses associated with the arguably sub-optimal use of a low-frequency model for high-frequency forecasting.

2.2. Forecast evaluation

The methods described above will produce, for each day, five competing forecasts of the covariance matrix, namely RC, SW, PC, DF and DCC. In evaluating the quality of these forecasts we concentrate on two criteria: portfolio optimality and portfolio stability or sensitivity.

2.2.1. Evaluation criterion I: portfolio optimality. The primary use of a covariance forecast is often to determine an ‘optimal’ portfolio allocation strategy where risk is minimized subject to certain user-defined constraints. In such a setting, the best covariance forecast is the one that generates portfolio returns with the lowest ex-post realised variance. Our first evaluation criterion is based on this insight. In particular, for a given covariance forecast $\Sigma$, we derive the associated optimal portfolio weights $\omega^*_\Sigma$ for a number of commonly encountered minimum variance allocation problems. Based on the out-of-sample asset returns $r$ realised over the forecast horizon we then estimate $V(r'\omega^*_\Sigma)$ and identify the best forecasting method as the one which attains the lowest ex-post realised portfolio return variance. Here the statistical significance of differences in performance between competing forecasts can be established using a standard bootstrap procedure.

[Figure 1: average correlation (vertical axis, approximately 0.08 to 0.28) against sampling frequency in minutes (horizontal axis, 0 to 30), with one line each for RC, SW1 and SW2.]

Figure 1. The Epps effect and the Scholes–Williams bias correction. Note that this figure plots the average correlation as a function of sampling frequency in minutes, between the constituents of the FTSE-100 index over the period 1 April 2006 through 31 March 2009. RC is defined in equation (1), and SW1 and SW2 are defined in equation (2) with $q = 1$ and $q = 2$, respectively.


See Patton and Sheppard (2009) for some further discussion of this approach.

Below, we list the minimum variance portfolio allocation strategies used to evaluate the forecast performance of the competing methods (here $n$ denotes the dimension of $\Sigma$ and $\iota$ is an $n \times 1$ vector of ones).

(i) The ‘net long’ strategy, i.e.

\[ \min_{\omega}\; \omega' \Sigma \omega \quad \text{s.t.} \quad \iota'\omega = 1. \]

‘take an overall long position, spreading weights to exploit diversification’

An explicit solution to this optimisation problem is available: $\omega^*_\Sigma = \Sigma^{-1}\iota / (\iota'\Sigma^{-1}\iota)$.

(ii) The ‘long only’ strategy, i.e.

\[ \min_{\omega}\; \omega' \Sigma \omega \quad \text{s.t.} \quad \iota'\omega = 1 \text{ and } \{\omega_i \geq 0\}_{i=1}^{n}. \]

‘take a long-only position, spreading weights to exploit diversification’

Quadratic programming can be used to obtain optimal portfolio weights.

(iii) The ‘ad hoc long–short’ strategy, i.e.

\[ \min_{\omega}\; \omega' \Sigma \omega \quad \text{s.t.} \quad \iota'\omega = 0,\; \iota'|\omega| = 1,\; \{\omega_i \geq 0\}_{i=1}^{\lfloor n/2 \rfloor}, \text{ and } \{\omega_i \leq 0\}_{i=\lfloor n/2 \rfloor + 1}^{n}. \]

‘take a cash-neutral long–short position, going long the first and short the second half of the asset universe’

Quadratic programming can be used to obtain optimal portfolio weights.

(iv) The ‘target long–short’ strategy, i.e.

\[ \min_{\omega}\; \omega' \Sigma \omega \quad \text{s.t.} \quad \iota'\omega = 0, \text{ and } \mu'\omega = c > 0. \]

‘take a cash-neutral long–short position, targeting a positive expected return’

An explicit solution to this optimisation problem is available:

\[ \omega^*_\Sigma(c) = \Sigma^{-1}(\mu, \iota) \left[ (\mu, \iota)' \Sigma^{-1} (\mu, \iota) \right]^{-1} (c, 0)'. \tag{5} \]

(v) The ‘min eig’ strategy, i.e.

\[ \min_{\omega}\; \omega' \Sigma \omega \quad \text{s.t.} \quad \omega'\omega = 1. \]

‘invest in a long–short position in low-volatility high-correlation assets’

The vector of optimal portfolio weights $\omega^*_\Sigma$ is the eigenvector of $\Sigma$ associated with the smallest eigenvalue.

Strategy 1 is the standard minimum variance problem. Motivated by Jagannathan and Ma (2003), strategy 2 adds a short-sell constraint. Both strategies are fully invested in that $\iota'\omega = 1$. In contrast, strategies 3 and 4 enter into long–short portfolios with $\iota'\omega = 0$, i.e. the proceeds from taking short positions are fully re-invested to make the overall position cash-neutral. Strategy 3 assigns the short assets in an ad hoc fashion while strategy 4 does this based on the expected return $\mu$. Finally, strategy 5 is a minimum variance strategy where the positions taken can be both long and short provided that the sum of squared weights adds up to unity. Because the portfolio weights are given by the eigenvector of $\Sigma$ associated with the smallest eigenvalue, this strategy provides an interesting setting in which to evaluate the performance of PC. In particular, if PC has retained too few principal components, the realised portfolio variance of strategy 5 will be inflated. As such, our evaluation method in effect provides an out-of-sample test for the number of (economically) significant principal components.
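The three strategies with explicit solutions (‘net long’, ‘target long–short’ and ‘min eig’) can be sketched directly with linear algebra routines. The function names, the simulated covariance matrix and the alternating-sign expected return vector below are assumptions made for illustration. Strategies 2 and 3 need a quadratic programming solver and are not covered here.

```python
import numpy as np

def net_long_weights(cov: np.ndarray) -> np.ndarray:
    """Strategy 1: Sigma^{-1} iota / (iota' Sigma^{-1} iota)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / (ones @ x)

def target_long_short_weights(cov: np.ndarray, mu: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Strategy 4, equation (5). Requires mu not to be proportional to iota."""
    B = np.column_stack([mu, np.ones(mu.size)])      # n x 2 matrix (mu, iota)
    SinvB = np.linalg.solve(cov, B)                  # Sigma^{-1} (mu, iota)
    return SinvB @ np.linalg.solve(B.T @ SinvB, np.array([c, 0.0]))

def min_eig_weights(cov: np.ndarray) -> np.ndarray:
    """Strategy 5: unit-norm eigenvector of the smallest eigenvalue."""
    _, eigvec = np.linalg.eigh(cov)                  # ascending eigenvalues
    return eigvec[:, 0]

# Assumed example inputs.
rng = np.random.default_rng(1)
a = rng.normal(size=(5, 5))
cov = a @ a.T + np.eye(5)
mu = np.array([1.0, -1.0, 1.0, -1.0, 1.0])

w_net = net_long_weights(cov)
w_target = target_long_short_weights(cov, mu, c=0.5)
w_min_eig = min_eig_weights(cov)
```

Solving linear systems with `np.linalg.solve` avoids forming the inverse of the covariance matrix explicitly, which tends to be more stable when the forecast is close to singular.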

The implementation of strategy 4 requires one to fix $\mu$ and $c$. From equation (5) and appendix A we see that the optimal portfolio weights scale linearly in $c$ and inverse-linearly in $\mu$: when the expected returns are halved, we need to double our position (and thus increase risk) to attain the same target mean. In this paper we set $\mu$ equal to the sign of the daily open-to-close return. Even though such an approach uses day-$t$ information and is therefore unfeasible in practice, $\mu$ is the same for all the competing covariance forecasting methods and therefore still allows us to gauge their relative performance. For given $\mu$, the parameter $c$ is then identified by imposing an additional constraint on the magnitude of the position, i.e. we find $c^*$ such that $\iota'|\omega| = 1$.

By following the above logic underlying the portfolio optimality criterion, one may also construct maximum variance portfolios. In that case the best forecast should generate the highest realised portfolio return variance. Although such a scenario has little relevance from an economic viewpoint, it does provide yet another dimension along which to judge the statistical quality of the forecast. Amongst the allocation strategies considered in this paper, we therefore add the maximum variance analogue to strategy 5.

(vi) The ‘max eig’ strategy, i.e.

\[ \max_{\omega}\; \omega' \Sigma \omega \quad \text{s.t.} \quad \omega'\omega = 1. \]

‘invest in a one-sided position in high-volatility assets’

The vector of optimal portfolio weights $\omega^*_\Sigma$ is the eigenvector of $\Sigma$ associated with the largest eigenvalue.

In recent work, DeMiguel et al. (2009) find that naive equally weighted ‘1/n’ strategies outperform those based on risk minimization using covariance forecasts. Motivated by this study, we consider ‘naive’ implementations of strategies 1, 2 and 3 that require no covariance forecast. In particular, for strategies 1 and 2, the portfolio weights are given by $\omega_i = 1/n$ for $i = 1, \ldots, n$. For the ad hoc long–short strategy 3, the portfolio weights are given by $\omega_i = 1/(2\lfloor n/2 \rfloor)$ for $i = 1, \ldots, \lfloor n/2 \rfloor$ and $\omega_i = -1/(2(n - \lfloor n/2 \rfloor))$ for $i = \lfloor n/2 \rfloor + 1, \ldots, n$.
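The naive weights can be written down in a few lines; the sketch below uses hypothetical function names and checks the two constraints of strategy 3, namely that the weights sum to zero and that their absolute values sum to one, including for an odd number of assets.

```python
import numpy as np

def naive_fully_invested(n: int) -> np.ndarray:
    """Naive weights for strategies 1 and 2: 1/n in every asset."""
    return np.full(n, 1.0 / n)

def naive_long_short(n: int) -> np.ndarray:
    """Naive weights for strategy 3: long the first floor(n/2) assets, short the rest."""
    h = n // 2
    w = np.empty(n)
    w[:h] = 1.0 / (2 * h)            # long leg sums to +1/2
    w[h:] = -1.0 / (2 * (n - h))     # short leg sums to -1/2
    return w

w_even = naive_long_short(10)
w_odd = naive_long_short(7)
```

With an odd number of assets the short leg holds one more name than the long leg, so its individual positions are slightly smaller in magnitude.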

2.2.2. Evaluation criterion II: portfolio stability. In addition to portfolio optimality, one may also judge the quality of a covariance forecast by the stability and spread of the portfolio weights. For instance, extreme position taking is often the result of error-maximisation in the optimisation step.† Also, very small positions are impractical due to contract divisibility and fixed transaction costs while very large positions may incur excessive market impact. Motivated by this, we compute the cross-sectional variation of portfolio weights implied by the competing forecasting models, i.e.

\[ S_1 = \left\| \omega^*_\Sigma \right\|_2, \]

where $\|A\|_p \equiv \left( \sum_{i,j} |A_{ij}|^p \right)^{1/p}$. Note that criterion $S_1$ is minimized for the equally weighted ‘1/n’ strategy: it will favour covariance forecasting methods that result in well balanced allocations. In the tables below, we refer to $S_1$ as the ‘smoothness’ criterion.

In similar spirit, another evaluation criterion can be constructed specific to the ‘target long–short’ strategy 4. It is motivated by the question: ‘if we revise our view on expected returns, by how much do the optimal portfolio weights change?’. From a transaction costs viewpoint, the covariance forecast that yields the more stable weights is clearly preferred. We compute the following two quantities:

\[ S_2 = \left\| \partial \omega^*_\Sigma / \partial \mu \right\|_1, \qquad S_3 = \left\| \partial \omega^*_\Sigma / \partial \mu \right\|_2, \]

where $\partial \omega^*_\Sigma / \partial \mu$ can be expressed in closed form, see equation (A1) in appendix A. Note that $S_2$ ($S_3$) is consistent with linear (quadratic) transaction costs and in the tables below we refer to it as the ‘linear costs’ (‘quadratic costs’) criterion.
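The three stability criteria can be illustrated numerically. The closed-form derivative of equation (A1) is in the appendix and is not reproduced here, so the sketch below approximates the derivative of the weights with respect to the expected returns by central finite differences, holding the target `c` fixed. Both choices are assumptions made for illustration, as are the function names and inputs.

```python
import numpy as np

def entrywise_norm(a: np.ndarray, p: float) -> float:
    """||A||_p = (sum_ij |A_ij|^p)^(1/p), for a vector or a matrix."""
    return float((np.abs(a) ** p).sum() ** (1.0 / p))

def target_weights(cov: np.ndarray, mu: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Closed-form weights of the 'target long-short' strategy, equation (5)."""
    B = np.column_stack([mu, np.ones(mu.size)])
    SinvB = np.linalg.solve(cov, B)
    return SinvB @ np.linalg.solve(B.T @ SinvB, np.array([c, 0.0]))

def weight_jacobian(cov: np.ndarray, mu: np.ndarray, c: float = 1.0,
                    eps: float = 1e-6) -> np.ndarray:
    """Finite-difference approximation of d omega / d mu (n x n), c held fixed."""
    n = mu.size
    jac = np.empty((n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = eps
        jac[:, k] = (target_weights(cov, mu + step, c)
                     - target_weights(cov, mu - step, c)) / (2.0 * eps)
    return jac

# Assumed example inputs.
rng = np.random.default_rng(1)
a = rng.normal(size=(5, 5))
cov = a @ a.T + np.eye(5)
mu = np.array([1.0, -1.0, 1.0, -1.0, 1.0])

s1 = entrywise_norm(target_weights(cov, mu), 2)   # 'smoothness'
jac = weight_jacobian(cov, mu)
s2 = entrywise_norm(jac, 1)                       # 'linear costs'
s3 = entrywise_norm(jac, 2)                       # 'quadratic costs'
```

Since the weights satisfy the cash-neutrality constraint for every value of the expected returns, each column of the Jacobian sums to zero, which gives a simple check on the approximation.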

2.3. Data and notes on implementation

The dataset we use in this paper consists of last-tick interpolated 15-second mid-quote data for the constituents of the FTSE-100 index over the period 3 January 2006 through 31 March 2009. With official trading hours from 8:00 to 16:30 London time, this results in 2041 price observations per asset per day.

To calculate the forecast in equations (3) and (4), we use a rolling history of $\tau = 60$ days, a value of $\lambda$ to imply an ad hoc half-life of 1 month, and 15-minute returns or $M = 34$. For PC we include the first eight principal components: these were found to be significant by the Johnstone (2001) test described in appendix B. The DCC model is specified in its simplest form (i.e. a GARCH(1,1) for the asset volatilities and one innovation term and lagged correlation for the correlation dynamics) with parameters estimated using maximum likelihood.‡ We start the forecasting exercise on 1 April 2006 and run it over the subsequent three years up to 31 March 2009, totalling 781 trading days.

For a given day and covariance forecast, we calculate the optimal portfolio weights for the strategies described above for a restricted universe of $n = 10$ randomly selected names. Then, with given portfolio weights (which are assumed to be fixed throughout the day), we calculate 100 realised portfolio returns from 100 sets of synchronously sampled asset returns obtained by selecting a random starting point within the day and a random duration with a pre-specified mean $\bar{d}$. This procedure is then repeated 50 times. Specifically, let $p^{(j)}$ denote the $n \times 1$ price vector associated with the $j$th random draw of $n$ names from the universe. The realised portfolio returns are calculated as

\[ z_{t,j,i} = \left( p^{(j)}_{t+(s_i+d_i)/2040} - p^{(j)}_{t+s_i/2040} \right)' \omega^{(j)}_{t|t-1}, \]

for $i = 1, 2, \ldots, 100$, $j = 1, 2, \ldots, 50$, and $t = 1, 2, \ldots, 781$. The starting point $s \sim$ i.i.d. $U(0, 2040)$ and duration $d - 1 \sim$ i.i.d. Poisson$(\bar{d} - 1)$. This sampling procedure is graphically illustrated in figure 2. When $s + d > 2040$ we ‘wrap’ the data by continuing to sample the remaining returns from the beginning of the same trading day. Such an approach is quite common in the bootstrap literature (e.g. Politis and Romano 1994), and is suitable here as well. In the analysis below, we vary the trade or evaluation horizon $\bar{d}$ between 1 and 45 minutes depending on the experiment.

In summary, for each day from 1 April 2006 onwards, we compute the five competing covariance forecasts RC, PC, SW, DF and DCC. Next, for the six allocation strategies we compute their implied optimal portfolio weights using 50 randomly drawn subsets of the asset universe each of size 10, we then randomly draw 100 synchronous returns for the respective stocks, and finally compute the realised portfolio return, stacked over universe draws, return draws and days. This leads to $50 \times 100 \times 781 = 3\,905\,000$ returns for each strategy and covariance forecasting method. Based on these return series, we then compute the realised portfolio return volatility (in basis points) and use a bootstrap re-sampling method to determine significant differences in performance. The main advantages of a forecast evaluation approach as described here are: (i) it thoroughly ‘exercises’ the covariance matrix along several dimensions by considering subsets of the full asset universe as well as various allocation problems and random time horizons; (ii) it mimics a real trading environment with random timing and duration of trades; (iii) the large number of generated portfolio returns allows us to measure accurately the statistical significance of differences in performance among the competing forecasts; and (iv) the procedure concentrates on the dependence among assets and is invariant to scaling of the covariance matrix.

†The solution to the minimum variance allocation problem tends to invest in assets for which the volatility is under-estimated and take long–short positions in assets for which the correlation is over-estimated. In this case, the ex-ante portfolio risk is evidently lower than the ex-post portfolio risk.

‡The DCC model is estimated using the ‘UCSD GARCH’ Matlab™ toolbox of Kevin Sheppard available from www.kevinsheppard.com.
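The random return sampling of section 2.3, including the ‘wrap’ past the end of the trading day, can be sketched as follows. Several details are assumptions made for the example: the starting point is discretised to the 15-second grid, the mean duration is expressed in grid steps rather than minutes, and the function names and simulated prices are hypothetical.

```python
import numpy as np

GRID = 2040  # number of 15-second return intervals per trading day

def draw_intervals(rng: np.random.Generator, n_draws: int, mean_duration: float):
    """Random starts on the grid and durations with d - 1 ~ Poisson(mean - 1)."""
    start = rng.integers(0, GRID, size=n_draws)              # 0, ..., GRID - 1
    duration = 1 + rng.poisson(mean_duration - 1.0, size=n_draws)
    return start, duration

def realised_portfolio_return(log_prices: np.ndarray, weights: np.ndarray,
                              start: int, duration: int) -> float:
    """Portfolio return over [start, start + duration), wrapping past the close.

    log_prices has shape (GRID + 1, n); weights has shape (n,).
    """
    step_returns = np.diff(log_prices, axis=0)               # (GRID, n)
    idx = (start + np.arange(duration)) % GRID               # wrap to the open
    return float(step_returns[idx].sum(axis=0) @ weights)

# Assumed example: 4 assets, equal weights, mean duration of 60 steps (15 minutes).
rng = np.random.default_rng(3)
log_prices = np.cumsum(rng.normal(scale=1e-4, size=(GRID + 1, 4)), axis=0)
weights = np.full(4, 0.25)
starts, durations = draw_intervals(rng, 100, mean_duration=60)
z = np.array([realised_portfolio_return(log_prices, weights, s, d)
              for s, d in zip(starts, durations)])
```

When an interval does not cross the close, summing the 15-second returns reproduces the simple price difference in the formula for the realised portfolio return.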

3. Empirical results

Below, we discuss the forecast evaluation results. To facilitate comparison and interpretation of the results, we report the performance statistics as fractions relative to RC. In all tables, bootstrapped p-values are reported in parentheses, for the null hypothesis that the performance statistic associated with a particular method is lower than that of RC. For all evaluation criteria – except the ‘max eig’ realised portfolio volatility – a ratio of less than one with a p-value sufficiently close to one means that the method under consideration is a significant improvement over RC and vice versa.
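The ratio and p-value convention described above can be illustrated with a simple bootstrap. The paper does not spell out its resampling scheme in this section, so the sketch assumes i.i.d. resampling of paired portfolio returns; the function name and the simulated returns are hypothetical. The p-value is the share of bootstrap replications in which the method's volatility is lower than that of RC.

```python
import numpy as np

def bootstrap_ratio_pvalue(z_method: np.ndarray, z_rc: np.ndarray,
                           n_boot: int = 500, seed: int = 0):
    """Volatility ratio relative to RC and bootstrap share of 'method < RC'."""
    rng = np.random.default_rng(seed)
    n = z_rc.size
    ratio = z_method.std() / z_rc.std()
    below = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample the pairs jointly
        if z_method[idx].std() < z_rc[idx].std():
            below += 1
    return ratio, below / n_boot

# Assumed example: one method is uniformly worse than RC, one uniformly better.
rng = np.random.default_rng(7)
z_rc = rng.normal(size=2000)
ratio_worse, p_worse = bootstrap_ratio_pvalue(1.5 * z_rc, z_rc)
ratio_better, p_better = bootstrap_ratio_pvalue(0.5 * z_rc, z_rc)
```

Resampling the two return series with the same indices keeps the comparison paired, so that common shocks affecting both portfolios do not blur the difference in volatility.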

don’t use low-frequency risk models for high-frequency covariance forecasting

Panel A of table 1 reports evaluation criterion I, i.e. the realised portfolio volatility, for the competing forecasting methods. The message is unambiguous: the forecasting methods DF and DCC, which are based on daily data, significantly under-perform RC. At the same time, the performance of PC is statistically indistinguishable from RC at conventional confidence levels. These results hold for all strategies considered. For instance, for the ‘target long–short’ minimum variance strategy 4, the realised portfolio volatility when using RC is 14.07 basis points (over a 15-minute horizon). For PC, this figure is 0.2% higher but statistically insignificant with a p-value of 39%, whereas for DF and DCC the realised portfolio volatility is significantly increased by 8.3 and 3.5%, respectively. Similarly, for the ‘max eig’ maximum variance strategy 6, RC attains a volatility of 88 basis points. For PC this figure is 0.1% lower but insignificant, while for DF and DCC the reduction is highly significant. Figure 3 further illustrates these findings. Panels A and B plot the bootstrapped distributions of the performance criterion for the two strategies and we clearly observe a close correspondence between the RC and PC distributions, while those for DF and DCC are shifted in unfavourable directions (to the right for the minimum variance strategy and to the left for the maximum variance strategy). Also note the performance of PC for ‘min eig’ strategy 5. This strategy invests in the smallest eigenvector of the covariance matrix. If the number of principal components used for the construction of PC was too small, we would expect the resulting portfolio volatility to be significantly higher than for RC. Yet, we find that it is comparable to RC, indicating that PC is both statistically and economically well specified.

As already alluded to above, the under-performance of DF and DCC is in itself not that surprising: the models are

[Figure 2: 100 numbered horizontal lines, one per random return draw, plotted against time of day from 08:00 to 16:00.]

Figure 2. Illustration of intra-day random return sampling. Note that this figure illustrates the procedure for randomly sampling intra-day returns. Each numbered horizontal line indicates the sampling horizon of a random return draw. With $\bar{d} = 15$ minutes, the majority of durations lie between 10 and 20 minutes.


Figure 3. Bootstrap distributions of selected evaluation criteria. Note that this figure reports the bootstrap distributions of the realised portfolio volatility for minvar strategy 4 and maxvar strategy 6 in Panels A and B and the portfolio sensitivity measures 2 and 3 in Panels C and D. Each panel shows the distributions for RC, PC, SW, DF and DCC. Panel A: minvar ‘long–short target’ strategy, Panel B: maxvar ‘max eig’ strategy, Panel C: sensitivity measure S2 and Panel D: sensitivity measure S3.

Table 1. Forecast evaluation of competing models.

Panel A: evaluation criterion I (realised portfolio volatility by strategy)

| Strategy | RC | PC | SW | DF | DCC | NAIVE |
|---|---|---|---|---|---|---|
| 1. ‘net long’ | 21.055 | 1.002 (0.37) | 1.028 (0.00) | 1.135 (0.00) | 1.133 (0.00) | 1.201 (0.00) |
| 2. ‘long only’ | 21.145 | 1.001 (0.43) | 1.018 (0.00) | 1.057 (0.00) | 1.094 (0.00) | 1.196 (0.00) |
| 3. ‘ad hoc long–short’ | 10.048 | 1.002 (0.36) | 1.023 (0.00) | 1.155 (0.00) | 1.064 (0.00) | 1.214 (0.00) |
| 4. ‘target long–short’ | 14.067 | 1.002 (0.39) | 1.015 (0.04) | 1.083 (0.00) | 1.035 (0.00) | – |
| 5. ‘min eig’ | 22.799 | 1.009 (0.07) | 1.039 (0.00) | 1.157 (0.00) | 1.109 (0.00) | – |
| 6. ‘max eig’ | 88.346 | 0.999 (0.54) | 0.998 (0.61) | 0.963 (1.00) | 0.981 (0.99) | – |

Panel B: evaluation criterion II (portfolio sensitivity measures for strategy 4)

| Measure | RC | PC | SW | DF | DCC | NAIVE |
|---|---|---|---|---|---|---|
| 1. ‘smoothness’ | 0.168 | 0.986 (1.00) | 1.057 (0.00) | 1.108 (0.00) | 1.106 (0.00) | – |
| 2. ‘linear costs’ | 3.600 | 0.974 (1.00) | 1.054 (0.00) | 1.031 (0.00) | 1.102 (0.00) | – |
| 3. ‘quadratic costs’ | 0.595 | 0.930 (1.00) | 1.089 (0.00) | 1.079 (0.00) | 1.203 (0.00) | – |
| Average correlation | 0.240 | 0.245 | 0.270 | 0.344 | 0.375 | – |

Note that this table reports the realised portfolio volatility for the strategies described in section 2.2.1 and the portfolio sensitivity statistics for strategy 4 as described in section 2.2.2. RC is taken as the benchmark, and the statistics for all methods are reported as ratios relative to RC. Bootstrapped p-values are reported in parentheses, for the null hypothesis that the sample statistic associated with RC is higher than its competitor. The column ‘NAIVE’ reports results for the naive equally weighted strategies that require no covariance forecast.
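The bootstrapped p-values in table 1 can be read as the share of bootstrap resamples in which the benchmark's statistic exceeds that of its competitor. The following is a minimal sketch of that idea using the stationary bootstrap of Politis and Romano (1994). It is an illustration under stated assumptions, not the paper's implementation: the function names, the root-mean-square definition of realised volatility, the mean block length and the number of resamples are all hypothetical choices made for the example.

```python
import math
import random


def stationary_bootstrap_indices(n, mean_block, rng):
    """Resampled indices built from blocks with geometric lengths."""
    indices = [rng.randrange(n)]
    while len(indices) < n:
        if rng.random() < 1.0 / mean_block:
            indices.append(rng.randrange(n))  # start a new block
        else:
            indices.append((indices[-1] + 1) % n)  # continue the block, wrapping around
    return indices


def realised_vol(returns):
    """Root mean square of the returns (assumed volatility measure)."""
    return math.sqrt(sum(r * r for r in returns) / len(returns))


def bootstrap_p_value(bench_returns, rival_returns, n_boot=500, mean_block=10, seed=0):
    """Share of resamples in which the benchmark's volatility exceeds the rival's."""
    rng = random.Random(seed)
    n = len(bench_returns)
    hits = 0
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        vol_bench = realised_vol([bench_returns[i] for i in idx])
        vol_rival = realised_vol([rival_returns[i] for i in idx])
        hits += vol_bench > vol_rival
    return hits / n_boot
```

Both return series are resampled with the same indices so that the comparison is paired. Under this reading, a p-value near 0 means the benchmark is almost never the more volatile of the two, and a p-value near 1 means it almost always is.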


designed for longer-term horizons and estimated from daily data. In contrast, both RC and PC are based on intra-day data and therefore better able to capture the dependence structure of returns at short horizons. Note from table 1 that the average correlation (across all asset-combinations and over time) for DF and DCC is about 10 percentage points higher than it is for the RC and PC methods based on 15-minute data. As already discussed and illustrated in figure 1, this is a direct consequence of non-synchronous trading leading to the well known Epps effect. The important point, however, is that this downward ‘bias’ in correlation is not spurious. It reflects the weakened dependence between assets that one is exposed to when the trading or evaluation horizon is very short. Put differently, RC computed from 15-minute data does provide an unbiased estimate for the true covariance between assets at a 15-minute horizon: it measures the dependence ‘experienced’ by the trader over this short horizon. Yet, the RC estimates are biased when evaluated over a daily horizon. We can further substantiate this point by looking at the performance of SW. As we can see from figure 1 and table 1, SW forecasts higher correlation amongst the assets, thanks to the correction that captures the lead–lag dependence induced by non-synchronous trading or sluggish adjustment of prices. If the Epps effect were spurious, then SW should perform better than RC or PC, but we find the opposite: with SW, all the minimum variance strategies lead to significantly higher realised portfolio volatility.
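The Epps effect discussed above can be reproduced in a small simulation. The sketch below is illustrative only: the one-factor return model, the price-refresh probability and all function names are assumptions made for the example, and none of it uses the paper's data or estimators. Two assets share a common factor (true correlation 0.5), but each observed price is only refreshed at random times, as a crude stand-in for non-synchronous trading.

```python
import math
import random


def simulate_stale_prices(n_steps, p_update, rng):
    """Two observed log-price paths on a fine grid that share a common factor.

    Each asset refreshes its observed price with probability p_update per
    step; otherwise the last observed price is carried forward.
    """
    true_a = true_b = 0.0
    obs_a, obs_b = [0.0], [0.0]
    for _ in range(n_steps):
        common = rng.gauss(0.0, 1.0)
        true_a += common + rng.gauss(0.0, 1.0)
        true_b += common + rng.gauss(0.0, 1.0)
        obs_a.append(true_a if rng.random() < p_update else obs_a[-1])
        obs_b.append(true_b if rng.random() < p_update else obs_b[-1])
    return obs_a, obs_b


def realised_correlation(prices_a, prices_b, step):
    """Correlation of non-overlapping returns sampled every `step` grid points."""
    ra = [prices_a[i] - prices_a[i - step] for i in range(step, len(prices_a), step)]
    rb = [prices_b[i] - prices_b[i - step] for i in range(step, len(prices_b), step)]
    cov = sum(x * y for x, y in zip(ra, rb))
    var_a = sum(x * x for x in ra)
    var_b = sum(y * y for y in rb)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
a, b = simulate_stale_prices(100_000, p_update=0.2, rng=rng)
corr_fine = realised_correlation(a, b, step=1)
corr_coarse = realised_correlation(a, b, step=60)
print(f"step=1: {corr_fine:.3f}, step=60: {corr_coarse:.3f}")
```

The correlation measured on the fine grid is far below the one measured on the coarse grid, yet it is the fine-grid figure that describes the co-movement a trader actually observes over one grid step.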

We should make three further observations. Firstly, there is recent work which shows that the use of intra-day data helps to improve volatility forecasts (see for example Andersen et al. 2003). One may argue that our finding that forecasting methods based on intra-day data outperform those based on daily data simply reflects this. Yet, the comparison between RC/PC and SW shows that appropriately capturing the dependence structure of returns at intra-day horizons is key. Thus, at least some of the improvement obtained by moving from daily to intra-day data comes from this effect and not merely from having more data. Secondly, we see from table 1 that the naive ‘1/n’ strategy of DeMiguel et al. (2009) under-performs all other methods, including those based on daily data. Thus, covariance models are useful and do play an important role in asset allocation, at least in the current setting. Thirdly, Jagannathan and Ma (2003) find that imposing a long-only constraint helps the performance of minimum variance strategies. Our results, however, show that the realised portfolio volatility of ‘long only’ strategy 2 is always higher than that of ‘net long’ strategy 1. The constraint harms the performance rather than improving it by providing more structure.
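To make the comparison between a model-based minimum variance portfolio and the naive ‘1/n’ portfolio concrete, here is a minimal sketch. It assumes that the ‘net long’ strategy is the fully invested minimum variance portfolio with weights proportional to the inverse covariance matrix times a vector of ones; the strategy definitions are given in section 2.2.1 and may differ in detail. The covariance matrix is made up for illustration, and the helper names are hypothetical. Only the standard library is used so that the example runs anywhere; in practice a linear algebra library would replace the hand-written solver.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def portfolio_variance(weights, cov):
    """Quadratic form w' Sigma w."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))


def min_variance_weights(cov):
    """Fully invested minimum variance weights: Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    raw = solve(cov, [1.0] * len(cov))
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical 3-asset covariance matrix (made-up numbers, not from the paper).
cov = [
    [4.0, 1.0, 0.5],
    [1.0, 2.0, 0.3],
    [0.5, 0.3, 1.0],
]
w_minvar = min_variance_weights(cov)
w_naive = [1.0 / len(cov)] * len(cov)
print("min-variance:", portfolio_variance(w_minvar, cov))
print("naive 1/n:   ", portfolio_variance(w_naive, cov))
```

With a known covariance matrix the minimum variance weights can never do worse than 1/n. The paper's comparison is out of sample, where the weights come from a forecast and are evaluated on subsequent realised returns, so the ranking is an empirical question there.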

Impose a factor structure to stabilize portfolio weights

Panel B of table 1 reports evaluation criterion II, i.e. the portfolio sensitivity measures for strategy 4. Up to this point, we found that RC and PC are the best performing methods and statistically indistinguishable based on evaluation criterion I. Yet, now we see that when RC and PC are compared based on their implied portfolio sensitivity, PC is clearly superior. For instance, under ‘linear costs’ the improvement is about 2.5% and highly significant (also note that SW, DF and DCC all deteriorate relative to RC and are thus inferior on both criteria). For ‘quadratic costs’ the pattern is even more pronounced. As before, Panels C and D of figure 3 reiterate these findings by plotting the bootstrapped distribution of the performance measures for all strategies. The distribution of PC clearly lies to the left of all its competitors. From all this, it is evident that the imposed factor structure is key in stabilizing the covariance matrix, and consequently the portfolio weights, while retaining good performance in terms of realised portfolio volatility.
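One simple way to impose a factor structure on a covariance matrix is to keep the leading principal component and to replace everything else by a diagonal matrix of residual variances. The sketch below does this for a single factor. It is an assumption-laden illustration: the paper's PC forecast is defined in an earlier section and may differ in the number of components retained and in how the residual is treated. The function names and the example matrix are hypothetical, and power iteration stands in for a library eigen-decomposition so that the example needs only the standard library.

```python
import math


def leading_eigenpair(cov, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix.

    Uses power iteration from an equal-weight start, which works unless the
    start vector happens to be orthogonal to the leading eigenvector.
    """
    n = len(cov)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * cov[i][j] * vec[j] for i in range(n) for j in range(n))
    return value, vec


def one_factor_covariance(cov):
    """Rank-one factor part plus a diagonal of residual variances."""
    n = len(cov)
    value, vec = leading_eigenpair(cov)
    structured = [[value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # The residual variance restores each asset's total variance.
        structured[i][i] = cov[i][i]
    return structured


# Hypothetical 3-asset covariance matrix (made-up numbers, not from the paper).
cov = [
    [4.0, 1.0, 0.5],
    [1.0, 2.0, 0.3],
    [0.5, 0.3, 1.0],
]
for row in one_factor_covariance(cov):
    print([round(x, 3) for x in row])
```

The structured matrix keeps every variance unchanged but lets a single factor drive all off-diagonal terms, so noise in individual pairwise covariances no longer feeds directly into the optimal weights.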

To gauge the robustness of the above results, consider table 2. From Panels A–C, we see that our conclusions remain unchanged over different sample periods. Note that the realised portfolio volatility for all strategies is significantly higher in the third subsample from April 2008 to March 2009. This is of course a direct reflection of the extremely turbulent market environment over this period. Yet, the relative performance of the forecast strategies remains unchanged, with RC and PC superior to SW, DCC and DF according to portfolio optimality criterion I and, in addition, PC superior to RC according to portfolio stability criterion II. The results are also robust to increasing the portfolio size to 25 assets (Panel D) and to alternative sampling frequencies of 5 and 45 minutes (Panels E and F).

Align sampling frequency with trading or evaluation frequency

As already discussed above, the use of high-frequency data is important because it allows one to capture the dependence structure of returns at intra-day horizons. But how should we select the sampling frequency in relation to the trading or evaluation frequency? To shed some light on this question, consider table 3. Here, to compute the PC forecast, the sampling frequency and evaluation frequency are varied between 1 and 45 minutes. The statistics and bootstrapped p-values are now computed relative to the benchmark case where the sampling frequency is equal to the evaluation frequency. Looking at the results, a clear overall pattern emerges, namely: it is optimal to align the sampling frequency with the evaluation frequency. Consider for instance the ‘net long’ minimum variance strategy 1 with an evaluation frequency of 1 minute. When the sampling frequency is also 1 minute, the realised portfolio volatility is 5.09 basis points, but this deteriorates by more than 6% when lowering the sampling frequency to 45 minutes. At the same time, when the evaluation frequency is 15 minutes, but data are sampled at a 1-minute frequency (Panel C) then the realised portfolio volatility also deteriorates significantly. There are a few cases where this pattern does not hold up (e.g. ‘target long–short’ strategy) but here the results are not significant and therefore do not contradict the statement. When lowering the evaluation frequency further to 45 minutes, we see that the harm of sampling at 15


Table 2. Performance evaluation robustness analysis.

Panel A: Apr06–Mar07

| | RC | PC | SW | DF | DCC | NAIVE |
|---|---|---|---|---|---|---|
| 1. ‘net long’ | 10.99 | 1.00 (0.41) | 1.03 (0.00) | 1.10 (0.00) | 1.21 (0.00) | 1.14 (0.00) |
| 2. ‘long only’ | 11.00 | 1.00 (0.42) | 1.02 (0.00) | 1.08 (0.00) | 1.15 (0.00) | 1.14 (0.00) |
| 3. ‘ad hoc long–short’ | 6.70 | 1.00 (0.30) | 1.02 (0.00) | 1.08 (0.00) | 1.06 (0.00) | 1.15 (0.00) |
| 4. ‘target long–short’ | 9.08 | 1.00 (0.44) | 1.01 (0.09) | 1.04 (0.00) | 1.03 (0.00) | – |
| 5. ‘min eig’ | 15.42 | 1.01 (0.05) | 1.03 (0.00) | 1.12 (0.00) | 1.07 (0.00) | – |
| 6. ‘max eig’ | 45.20 | 1.00 (0.51) | 0.99 (0.86) | 0.97 (1.00) | 0.97 (1.00) | – |
| 1. ‘smoothness’ | 0.17 | 0.99 (1.00) | 1.05 (0.00) | 1.09 (0.00) | 1.11 (0.00) | – |
| 2. ‘linear costs’ | 3.47 | 0.98 (0.98) | 1.05 (0.00) | 1.06 (0.00) | 1.11 (0.00) | – |
| 3. ‘quadratic costs’ | 0.57 | 0.94 (1.00) | 1.10 (0.00) | 1.10 (0.00) | 1.23 (0.00) | – |
| Average correlation | 0.15 | 0.15 | 0.22 | 0.31 | 0.37 | – |

Panel B: Apr07–Mar08

| | RC | PC | SW | DF | DCC | NAIVE |
|---|---|---|---|---|---|---|
| 1. ‘net long’ | 17.93 | 1.00 (0.41) | 1.03 (0.00) | 1.06 (0.00) | 1.11 (0.00) | 1.16 (0.00) |
| 2. ‘long only’ | 17.96 | 1.00 (0.45) | 1.02 (0.04) | 1.04 (0.00) | 1.08 (0.00) | 1.16 (0.00) |
| 3. ‘ad hoc long–short’ | 8.51 | 1.00 (0.20) | 1.02 (0.00) | 1.09 (0.00) | 1.05 (0.00) | 1.12 (0.00) |
| 4. ‘target long–short’ | 11.70 | 1.00 (0.38) | 1.02 (0.06) | 1.04 (0.00) | 1.03 (0.00) | – |
| 5. ‘min eig’ | 19.88 | 1.01 (0.10) | 1.03 (0.00) | 1.08 (0.00) | 1.08 (0.00) | – |
| 6. ‘max eig’ | 68.92 | 1.00 (0.53) | 0.99 (0.78) | 0.98 (0.96) | 0.98 (0.98) | – |
| 1. ‘smoothness’ | 0.17 | 0.99 (0.99) | 1.06 (0.00) | 1.10 (0.00) | 1.11 (0.00) | – |
| 2. ‘linear costs’ | 3.55 | 0.97 (1.00) | 1.07 (0.00) | 1.05 (0.00) | 1.10 (0.00) | – |
| 3. ‘quadratic costs’ | 0.58 | 0.93 (1.00) | 1.11 (0.00) | 1.10 (0.00) | 1.20 (0.00) | – |
| Average correlation | 0.25 | 0.26 | 0.28 | 0.31 | 0.38 | – |

Panel C: Apr08–Mar09

| | RC | PC | SW | DF | DCC | NAIVE |
|---|---|---|---|---|---|---|
| 1. ‘net long’ | 29.68 | 1.00 (0.43) | 1.03 (0.00) | 1.17 (0.00) | 1.13 (0.00) | 1.22 (0.00) |
| 2. ‘long only’ | 29.85 | 1.00 (0.46) | 1.02 (0.02) | 1.06 (0.00) | 1.09 (0.00) | 1.22 (0.00) |
| 3. ‘ad hoc long–short’ | 13.57 | 1.00 (0.47) | 1.02 (0.00) | 1.20 (0.00) | 1.07 (0.00) | 1.26 (0.00) |
| 4. ‘target long–short’ | 19.28 | 1.00 (0.47) | 1.02 (0.11) | 1.11 (0.00) | 1.04 (0.00) | – |
| 5. ‘min eig’ | 30.33 | 1.01 (0.21) | 1.04 (0.00) | 1.20 (0.00) | 1.13 (0.00) | – |
| 6. ‘max eig’ | 128.38 | 1.00 (0.51) | 1.00 (0.50) | 0.96 (1.00) | 0.98 (0.96) | – |
| 1. ‘smoothness’ | 0.17 | 0.98 (1.00) | 1.06 (0.00) | 1.13 (0.00) | 1.10 (0.00) | – |
| 2. ‘linear costs’ | 3.78 | 0.97 (0.99) | 1.05 (0.00) | 0.99 (0.72) | 1.10 (0.00) | – |
| 3. ‘quadratic costs’ | 0.64 | 0.92 (1.00) | 1.06 (0.00) | 1.04 (0.01) | 1.18 (0.00) | – |
| Average correlation | 0.31 | 0.31 | 0.31 | 0.41 | 0.37 | – |

Panel D: size = 25, freq = 15 min

| | RC | PC | SW | DF | DCC | NAIVE |
|---|---|---|---|---|---|---|
| 1. ‘net long’ | 18.09 | 1.01 (0.09) | 1.05 (0.00) | 1.21 (0.00) | 1.20 (0.00) | 1.29 (0.00) |
| 2. ‘long only’ | 18.53 | 1.00 (0.40) | 1.04 (0.00) | 1.10 (0.00) | 1.14 (0.00) | 1.26 (0.00) |
| 3. ‘ad hoc long–short’ | 6.11 | 1.01 (0.01) | 1.06 (0.00) | 1.28 (0.00) | 1.09 (0.00) | 1.26 (0.00) |
| 4. ‘target long–short’ | 9.79 | 1.01 (0.28) | 1.14 (0.00) | 1.13 (0.00) | 1.03 (0.00) | – |
| 5. ‘min eig’ | 19.62 | 1.07 (0.00) | 1.10 (0.00) | 1.24 (0.00) | 1.14 (0.00) | – |
| 6. ‘max eig’ | 125.51 | 1.00 (0.53) | 1.00 (0.74) | 0.98 (1.00) | 0.97 (1.00) | – |
| 1. ‘smoothness’ | 0.08 | 0.94 (1.00) | 1.21 (0.00) | 1.22 (0.00) | 1.18 (0.00) | – |
| 2. ‘linear costs’ | 4.56 | 0.92 (1.00) | 1.24 (0.00) | 1.01 (0.04) | 1.15 (0.00) | – |
| 3. ‘quadratic costs’ | 0.45 | 0.82 (1.00) | 1.22 (0.00) | 1.11 (0.00) | 1.31 (0.00) | – |
| Average correlation | 0.24 | 0.25 | 0.27 | 0.34 | 0.37 | – |

Panel E: size = 10, freq = 5 min

| | RC | PC | SW | DF | DCC | NAIVE |
|---|---|---|---|---|---|---|
| 1. ‘net long’ | 12.10 | 1.00 (0.40) | 1.02 (0.00) | 1.16 (0.00) | 1.17 (0.00) | 1.18 (0.00) |
| 2. ‘long only’ | 12.14 | 1.00 (0.43) | 1.01 (0.01) | 1.08 (0.00) | 1.12 (0.00) | 1.18 (0.00) |
| 3. ‘ad hoc long–short’ | 6.15 | 1.00 (0.29) | 1.01 (0.01) | 1.16 (0.00) | 1.08 (0.00) | 1.20 (0.00) |
| 4. ‘target long–short’ | 8.22 | 1.00 (0.41) | 1.01 (0.08) | 1.09 (0.00) | 1.05 (0.00) | – |
| 5. ‘min eig’ | 14.08 | 1.01 (0.05) | 1.03 (0.00) | 1.16 (0.00) | 1.12 (0.00) | – |
| 6. ‘max eig’ | 49.70 | 1.00 (0.52) | 1.00 (0.66) | 0.96 (1.00) | 0.98 (1.00) | – |
| 1. ‘smoothness’ | 0.17 | 0.99 (1.00) | 1.04 (0.00) | 1.13 (0.00) | 1.12 (0.00) | – |
| 2. ‘linear costs’ | 3.55 | 0.98 (1.00) | 1.05 (0.00) | 1.05 (0.00) | 1.13 (0.00) | – |
| 3. ‘quadratic costs’ | 0.58 | 0.95 (1.00) | 1.10 (0.00) | 1.12 (0.00) | 1.25 (0.00) | – |
| Average correlation | 0.20 | 0.20 | 0.26 | 0.34 | 0.37 | – |

Panel F: size = 10, freq = 45 min

| | RC | PC | SW | DF | DCC | NAIVE |
|---|---|---|---|---|---|---|
| 1. ‘net long’ | 36.24 | 1.00 (0.33) | 1.04 (0.00) | 1.12 (0.00) | 1.09 (0.00) | 1.21 (0.00) |
| 2. ‘long only’ | 36.30 | 1.00 (0.41) | 1.02 (0.00) | 1.04 (0.00) | 1.06 (0.00) | 1.21 (0.00) |
| 3. ‘ad hoc long–short’ | 16.99 | 1.00 (0.43) | 1.04 (0.00) | 1.13 (0.00) | 1.04 (0.00) | 1.21 (0.00) |
| 4. ‘target long–short’ | 25.87 | 1.00 (0.41) | 1.02 (0.00) | 1.06 (0.00) | 1.02 (0.03) | – |
| 5. ‘min eig’ | 39.04 | 1.01 (0.04) | 1.07 (0.00) | 1.12 (0.00) | 1.07 (0.00) | – |
| 6. ‘max eig’ | 151.74 | 1.00 (0.53) | 1.00 (0.70) | 0.97 (1.00) | 0.99 (0.85) | – |
| 1. ‘smoothness’ | 0.17 | 0.98 (1.00) | 1.10 (0.00) | 1.08 (0.00) | 1.07 (0.00) | – |
| 2. ‘linear costs’ | 3.64 | 0.96 (1.00) | 1.09 (0.00) | 1.00 (0.36) | 1.07 (0.00) | – |
| 3. ‘quadratic costs’ | 0.61 | 0.91 (1.00) | 1.09 (0.00) | 1.03 (0.00) | 1.15 (0.00) | – |
| Average correlation | 0.25 | 0.26 | 0.27 | 0.34 | 0.37 | – |

Note that this table reports the same statistics as in table 1 for three subsamples (in Panels A–C), different portfolio size (in Panel D), and different sampling frequencies (in Panels E and F). The evaluation frequency is set equal to the sampling frequency in all cases.


minutes is not significant but at five minutes it is for most strategies.

Based on the above, we come to the following conclusion. At very high trading frequencies, where microstructure effects are ubiquitous, the alignment of sampling frequency with evaluation frequency is key. In this frequency range, the return dependence varies in complex ways with the sampling frequency, and in particular, using a lower sampling frequency tends to over-estimate the covariance among assets. But when the trading frequency is lowered, and market microstructure effects diminish, the importance of alignment diminishes as well. As a rough guide for where the important frequency range lies, one may consider figure 1 and Panel D of figure 4. The Epps curve stabilizes at around a 25-minute frequency: when trading at higher frequencies than this, alignment is important as the dependence structure of returns varies with the sampling frequency, but when trading at lower frequencies alignment is less important and should be balanced against the loss of information incurred when lowering the sampling frequency.

4. Concluding remarks

In this paper we highlight a number of important considerations when building covariance models to forecast return dependence over short intra-day horizons. We find that forecasts that are based on intra-day data, impose a factor structure, and use a sampling frequency that is aligned with the trading or evaluation frequency perform best in terms of the various portfolio optimality and stability metrics considered here. To the best of our knowledge, the focus on short intra-day horizons and the performance evaluation methodology used in this paper are new in this literature. We also note that our results contradict those of Jagannathan and Ma (2003) and DeMiguel et al. (2009): in our setting, the long-only constraint harms performance and the naive ‘1/n’ strategy severely under-performs model-based minimum variance portfolios.

As already alluded to above, the forecasting models we use are rather simple. This is deliberate, as the focus is more on basic modelling principles than on subtle refinements. Yet, a number of interesting improvements are worth investigating. Consider figure 4, which reports some summary statistics for the FTSE-100 dataset studied in this paper. From Panels A and B we observe pronounced diurnal patterns in both volatility and correlation. The volatility curve takes on the well known U-shape, with large spikes around the US open and at scheduled news announcement times. The correlation pattern is less well known, but equally prevalent: correlation is low at the beginning of the day, gradually grows, and reaches its maximum once the US market is trading. Both these salient features provide scope for improving the covariance model by allowing the forecast to depend on the time of the day. In Panel C we plot the bid–ask spread curve, which shows a steep decline by a factor of three over the first hour of trading, after which it stabilizes and

Table 3. Sampling frequency versus evaluation frequency.

Panel A: 1-minute evaluation frequency

| Sampling frequency | 1-min | 5-min | 15-min | 45-min |
|---|---|---|---|---|
| 1. ‘net long’ | 5.086 | 1.019 (0.00) | 1.044 (0.00) | 1.065 (0.00) |
| 2. ‘long only’ | 5.090 | 1.015 (0.01) | 1.032 (0.00) | 1.044 (0.00) |
| 3. ‘ad hoc long–short’ | 3.016 | 1.006 (0.13) | 1.015 (0.00) | 1.027 (0.00) |
| 4. ‘target long–short’ | 3.872 | 1.001 (0.46) | 1.004 (0.38) | 0.998 (0.54) |
| 5. ‘min eig’ | 6.945 | 1.008 (0.14) | 1.026 (0.00) | 1.047 (0.00) |
| 6. ‘max eig’ | 20.961 | 0.997 (0.65) | 0.992 (0.88) | 0.984 (0.99) |

Panel B: 5-minute evaluation frequency

| Sampling frequency | 1-min | 5-min | 15-min | 45-min |
|---|---|---|---|---|
| 1. ‘net long’ | 1.007 (0.14) | 12.100 | 1.009 (0.08) | 1.026 (0.00) |
| 2. ‘long only’ | 1.007 (0.15) | 12.138 | 1.005 (0.21) | 1.016 (0.01) |
| 3. ‘ad hoc long–short’ | 1.004 (0.25) | 6.153 | 1.004 (0.23) | 1.016 (0.00) |
| 4. ‘target long–short’ | 1.008 (0.20) | 8.223 | 0.996 (0.65) | 0.988 (0.88) |
| 5. ‘min eig’ | 1.010 (0.02) | 14.082 | 1.009 (0.06) | 1.027 (0.00) |
| 6. ‘max eig’ | 0.991 (0.89) | 49.697 | 1.000 (0.51) | 0.993 (0.84) |

Panel C: 15-minute evaluation frequency

| Sampling frequency | 1-min | 5-min | 15-min | 45-min |
|---|---|---|---|---|
| 1. ‘net long’ | 1.015 (0.01) | 0.999 (0.54) | 21.055 | 1.011 (0.03) |
| 2. ‘long only’ | 1.015 (0.00) | 1.002 (0.38) | 21.145 | 1.007 (0.15) |
| 3. ‘ad hoc long–short’ | 1.010 (0.02) | 1.001 (0.45) | 10.048 | 1.009 (0.04) |
| 4. ‘target long–short’ | 1.020 (0.01) | 1.008 (0.17) | 14.067 | 0.988 (0.91) |
| 5. ‘min eig’ | 1.027 (0.00) | 1.004 (0.25) | 22.799 | 1.011 (0.04) |
| 6. ‘max eig’ | 0.982 (0.99) | 0.996 (0.69) | 88.346 | 0.996 (0.66) |

Panel D: 45-minute evaluation frequency

| Sampling frequency | 1-min | 5-min | 15-min | 45-min |
|---|---|---|---|---|
| 1. ‘net long’ | 1.021 (0.00) | 1.004 (0.45) | 1.000 (0.70) | 36.235 |
| 2. ‘long only’ | 1.019 (0.00) | 1.005 (0.21) | 1.000 (0.50) | 36.300 |
| 3. ‘ad hoc long–short’ | 1.016 (0.01) | 1.004 (0.50) | 1.000 (0.79) | 16.993 |
| 4. ‘target long–short’ | 1.024 (0.00) | 1.010 (0.00) | 1.000 (0.06) | 25.872 |
| 5. ‘min eig’ | 1.039 (0.00) | 1.012 (0.06) | 1.000 (0.67) | 39.038 |
| 6. ‘max eig’ | 0.977 (1.00) | 0.993 (0.82) | 1.000 (0.49) | 151.743 |

Note that this table reports the realised portfolio volatility for the strategies described in section 2.2.1 for forecasting method PC. The sampling and evaluation frequency is varied between 1 and 45 minutes. The benchmark case is where the sampling frequency is equal to the evaluation frequency; for it the realised portfolio volatility is reported. All other statistics are reported as ratios relative to the benchmark. Bootstrapped p-values are reported in parentheses, for the null hypothesis that the sample statistic associated with the benchmark case is higher than its competitor.


remains roughly constant for the remainder of the day. Microstructure noise effects induced by the bid–ask bounce (see for example Roll 1984) are thus expected to be stronger in the morning. This may at least partially account for the depressed correlation observed early in the morning. The diurnal pattern in microstructure noise is again something that may be incorporated in the modelling framework.
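The link between bid–ask bounce and negative serial covariance in price changes is the basis of the Roll (1984) spread measure, which estimates the effective spread as twice the square root of minus the first-order autocovariance of price changes. The sketch below applies it to simulated trades. The simulation set-up (a random-walk efficient price with trades hitting bid or ask at random), the parameter values and the function name are assumptions for illustration only.

```python
import math
import random


def roll_spread(prices):
    """Roll (1984) estimate 2 * sqrt(-autocov), or None if the autocovariance is non-negative."""
    changes = [prices[i] - prices[i - 1] for i in range(1, len(prices))]
    mean = sum(changes) / len(changes)
    autocov = sum(
        (changes[i] - mean) * (changes[i - 1] - mean) for i in range(1, len(changes))
    ) / (len(changes) - 1)
    if autocov >= 0:
        return None
    return 2.0 * math.sqrt(-autocov)


rng = random.Random(1)
true_spread = 0.10
efficient = 100.0
observed = []
for _ in range(20_000):
    efficient += rng.gauss(0.0, 0.01)  # efficient price follows a random walk
    side = rng.choice((-1.0, 1.0))  # trade executes at the bid or at the ask
    observed.append(efficient + side * true_spread / 2.0)
print(roll_spread(observed))
```

Applied to rolling windows within the day, an estimate of this kind gives one way to track the diurnal pattern in microstructure noise described above.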

Figure 4. Summary statistics. Note that this figure reports summary statistics, averaged over the FTSE-100 universe. Panel A reports the average volatility (normalized asset-by-asset and day-by-day) as a function of time of the day. Panel B reports the average correlation at one-minute frequency as a function of time of the day. Panel C reports the average bid–ask spread (normalized asset-by-asset and day-by-day) as a function of time of the day. Panel D reports the average (normalized) volatility as a function of sampling frequency in minutes. Panel A: volatility curve, Panel B: correlation curve, Panel C: spread curve and Panel D: volatility signature.

Table 4. Sparse-sampling versus subsampling.

| Strategy | 1 min: Sparse | 1 min: Subsam | 5 min: Sparse | 5 min: Subsam | 15 min: Sparse | 15 min: Subsam | 45 min: Sparse | 45 min: Subsam |
|---|---|---|---|---|---|---|---|---|
| 1. ‘net long’ | 5.086 | 1.000 (0.50) | 12.100 | 1.000 (0.48) | 21.055 | 0.999 (0.57) | 36.235 | 0.997 (0.67) |
| 2. ‘long only’ | 5.090 | 1.000 (0.50) | 12.138 | 1.000 (0.50) | 21.145 | 0.999 (0.57) | 36.300 | 0.997 (0.67) |
| 3. ‘ad hoc long–short’ | 3.016 | 1.000 (0.52) | 6.153 | 0.999 (0.61) | 10.048 | 0.999 (0.62) | 16.993 | 0.995 (0.82) |
| 4. ‘target long–short’ | 3.872 | 1.000 (0.52) | 8.223 | 0.999 (0.55) | 14.067 | 0.998 (0.61) | 25.872 | 0.996 (0.64) |
| 5. ‘min eig’ | 6.945 | 0.999 (0.54) | 14.082 | 0.997 (0.66) | 22.799 | 0.996 (0.73) | 39.038 | 0.992 (0.92) |
| 6. ‘max eig’ | 20.961 | 1.000 (0.48) | 49.697 | 1.001 (0.46) | 88.346 | 1.000 (0.50) | 151.743 | 1.004 (0.29) |

Note that this table reports the realised portfolio volatility for the strategies described in section 2.2.1 for forecasting method PC based on sparse sampling (first column at each frequency) and ‘subsampling and averaging’ (second column at each frequency, reported as a ratio relative to sparse sampling). In parentheses are the p-values for the null hypothesis that the forecast constructed using subsampling reduces the realised portfolio volatility.


In this paper we use synchronized 15-second data, but irregularly spaced tick data may be used as an alternative. The advantage is that all available data can then be used, with methods such as those advocated in Hayashi and Yoshida (2005) and others. However, constructing high-dimensional covariance matrices in this fashion is not straightforward as they are typically not guaranteed to be positive definite and regularisation would be required. Another way of making more efficient use of the data would be to subsample and average the estimators, see for instance Zhang et al. (2005). Table 4 implements such an approach where we obtain the forecast as an average over PCs each computed at the same frequency but with the starting point shifted by 15-second increments. So at a sampling frequency of 1 (45) minute(s), the forecast is constructed as the average over 4 (180) individual forecasts. The results are quite intuitive, namely that at lower sampling frequencies the scope for improvement by subsampling is greater. Still, the benefit appears marginal at best even at a 45-minute sampling frequency. One explanation for this may be that the smoothing done to obtain the forecast diminishes the benefits. All these issues, and more, warrant further investigation and we leave this for future research.
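The ‘subsample and average’ idea can be sketched as follows. Note the simplification: the paper averages PC forecasts, whereas this example averages plain realised covariances for a single pair of assets, which is enough to show the mechanics of shifting the sampling grid. The function names are hypothetical; the 15-second base grid is taken from the text.

```python
def realised_covariance(prices_a, prices_b, step, offset=0):
    """Sum of cross-products of non-overlapping returns, starting at `offset`."""
    total = 0.0
    for i in range(offset + step, len(prices_a), step):
        total += (prices_a[i] - prices_a[i - step]) * (prices_b[i] - prices_b[i - step])
    return total


def subsampled_covariance(prices_a, prices_b, step):
    """Average the estimator over all `step` possible grid offsets."""
    estimates = [realised_covariance(prices_a, prices_b, step, k) for k in range(step)]
    return sum(estimates) / len(estimates)


GRID_SECONDS = 15  # base grid of the synchronized data


def offsets_per_estimate(minutes):
    """Number of shifted sub-grids available at a given sampling frequency."""
    return minutes * 60 // GRID_SECONDS


print(offsets_per_estimate(1), offsets_per_estimate(45))
```

At a 1-minute sampling frequency only 4 shifted grids exist, while at 45 minutes there are 180, which is why the potential gain from averaging grows as the sampling frequency is lowered.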

Acknowledgements

The author would like to thank two anonymous referees, Karim Bannouh, Andy Ferraris, Alexander Gerko and Tom Halahan for helpful comments and suggestions.

References

Andersen, T.G., Bollerslev, T., Diebold, F.X. and Labys, P., Modeling and forecasting realised volatility. Econometrica, 2003, 71, 579–625.

Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A. and Shephard, N., Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Working paper, Oxford-Man Institute, 2008.

Barndorff-Nielsen, O.E. and Shephard, N., Econometric analysis of realised covariation: high frequency based covariance, regression and correlation in financial economics. Econometrica, 2004, 72, 885–925.

Briner, B.G. and Connor, G., How much structure is best? A comparison of market model, factor model and unstructured equity covariance matrices. J. Risk, 2008, 10, 3–30.

Chiriac, R. and Voev, V., Modelling and forecasting multivariate realized volatility. J. Appl. Econom., 2009, forthcoming.

Cohen, K.J., Hawawini, G.A., Maier, S.F., Schwartz, R.A. and Whitcomb, D.K., Friction in the trading process and the estimation of systematic risk. J. Finan. Econ., 1983, 12, 263–278.

DeMiguel, V., Garlappi, L. and Uppal, R., Optimal versus naive diversification: how inefficient is the 1/N portfolio strategy? Rev. Finan. Stud., 2009, 22, 1915–1953.

Dimson, E., Risk measurement when shares are subject to infrequent trading. J. Finan. Econ., 1979, 7, 197–226.

Engle, R., Dynamic conditional correlation – a simple class of multivariate GARCH models. J. Bus. & Econom. Statist., 2002, 20, 339–350.

Engle, R.F. and Sheppard, K., Theoretical and empirical properties of dynamic conditional correlation multivariate GARCH. Unpublished paper, University of California, 2001.

Epps, T.W., Comovements in stock prices in the very short run. J. Amer. Statist. Assoc., 1979, 74, 291–298.

Fisher, L., Some new stock-market indexes. J. Bus., 1966, 39, 191–225.

Griffin, J.E. and Oomen, R.C., Covariance measurement in the presence of non-synchronous trading and market microstructure noise. J. Econometrics, forthcoming.

Hayashi, T. and Yoshida, N., On covariance estimation of non-synchronously observed diffusion processes. Bernoulli, 2005, 11, 359–379.

Jagannathan, R. and Ma, T., Risk reduction in large portfolios: why imposing the wrong constraints helps. J. Finan., 2003, 58, 1651–1683.

Johnstone, I., On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., 2001, 29, 295–327.

Lo, A.W. and MacKinlay, A.C., An econometric analysis of nonsynchronous-trading. J. Econom., 1990, 45, 181–212.

Patton, A. and Sheppard, K., Evaluating volatility and correlation forecasts. In Handbook of Financial Time Series, edited by T.G. Andersen, R.A. Davis, J.P. Kreiss, and T. Mikosch, pp. 801–838, 2009 (Springer-Verlag: Berlin).

Politis, D.N. and Romano, J.P., The stationary bootstrap. J. Amer. Statist. Assoc., 1994, 89, 1303–1313.

Roll, R., A simple implicit measure of the effective bid–ask spread in an efficient market. J. Finan., 1984, 39, 1127–1139.

Scholes, M. and Williams, J., Estimating betas from nonsynchronous data. J. Finan. Econ., 1977, 5, 309–327.

Zhang, L., Mykland, P.A. and Aït-Sahalia, Y., A tale of two time scales: determining integrated volatility with noisy high frequency data. J. Amer. Statist. Assoc., 2005, 100, 1394–1411.

Zhou, B., High frequency data and volatility in foreign-exchange rates. J. Bus. & Econom. Statist., 1996, 14, 45–52.

Appendix A: Stability of portfolio weights

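For portfolio strategy 4, we consider the variability of optimal weights with respect to changes in expected return. Using the expression for the partitioned inverse matrix, note that

$$
\omega^* = \begin{pmatrix} \Sigma^{-1}\mu & \Sigma^{-1}\iota \end{pmatrix}
\begin{pmatrix} \mu'\Sigma^{-1}\mu & \mu'\Sigma^{-1}\iota \\ \iota'\Sigma^{-1}\mu & \iota'\Sigma^{-1}\iota \end{pmatrix}^{-1}
\begin{pmatrix} c \\ \varsigma \end{pmatrix}
= \begin{pmatrix} \Sigma^{-1}\mu & \Sigma^{-1}\iota \end{pmatrix}
\begin{pmatrix} A & B \\ B & D \end{pmatrix}
\begin{pmatrix} c \\ \varsigma \end{pmatrix}
= c\,\Sigma^{-1}\mu A + \varsigma\,\Sigma^{-1}\mu B + c\,\Sigma^{-1}\iota B + \varsigma\,\Sigma^{-1}\iota D,
$$

where

$$
A = \frac{1}{\mu'\Sigma^{-1}\mu} + \frac{1}{\mu'\Sigma^{-1}\mu}\,
\frac{\mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}\mu}{\mu'\Sigma^{-1}\mu\,\iota'\Sigma^{-1}\iota - \mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}\mu},
$$

$$
B = \frac{-\mu'\Sigma^{-1}\iota}{\mu'\Sigma^{-1}\mu\,\iota'\Sigma^{-1}\iota - \mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}\mu},
$$

$$
D = \frac{\mu'\Sigma^{-1}\mu}{\mu'\Sigma^{-1}\mu\,\iota'\Sigma^{-1}\iota - \mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}\mu}.
$$

From this it then follows that

$$
\frac{d\omega^*}{d\mu} = \Sigma^{-1}(cA + \varsigma B) + c\,\Sigma^{-1}\mu\,\frac{dA}{d\mu}
+ (\varsigma\,\Sigma^{-1}\mu + c\,\Sigma^{-1}\iota)\,\frac{dB}{d\mu}
+ \varsigma\,\Sigma^{-1}\iota\,\frac{dD}{d\mu}, \qquad \text{(A1)}
$$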


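where

$$
\frac{dA}{d\mu} = -\,\iota'\Sigma^{-1}\iota\;
\frac{2\,\iota'\Sigma^{-1}\iota\,\mu'\Sigma^{-1} - 2\,\mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}}
{\left(\mu'\Sigma^{-1}\mu\,\iota'\Sigma^{-1}\iota - \mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}\mu\right)^2},
$$

$$
\frac{dB}{d\mu} = \mu'\Sigma^{-1}\iota\;
\frac{2\,\iota'\Sigma^{-1}\iota\,\mu'\Sigma^{-1} - 2\,\mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}}
{\left(\mu'\Sigma^{-1}\mu\,\iota'\Sigma^{-1}\iota - \mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}\mu\right)^2}
- \frac{\iota'\Sigma^{-1}}{\mu'\Sigma^{-1}\mu\,\iota'\Sigma^{-1}\iota - \mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}\mu},
$$

$$
\frac{dD}{d\mu} = \frac{2\,\mu'\Sigma^{-1}}{\mu'\Sigma^{-1}\mu\,\iota'\Sigma^{-1}\iota - \mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}\mu}
- \frac{\mu'\Sigma^{-1}\mu\,\left(2\,\iota'\Sigma^{-1}\iota\,\mu'\Sigma^{-1} - 2\,\mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}\right)}
{\left(\mu'\Sigma^{-1}\mu\,\iota'\Sigma^{-1}\iota - \mu'\Sigma^{-1}\iota\,\iota'\Sigma^{-1}\mu\right)^2}.
$$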
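The closed-form weights of Appendix A can be checked numerically. The sketch below assumes that strategy 4 minimises portfolio variance subject to two linear constraints, a target expected return and a target net exposure, so that the optimal weights combine the inverse covariance matrix applied to the expected-return vector and to a vector of ones through the inverse of a 2 × 2 matrix with elements A, B and D. This reading of the constraints, the function names and all numerical inputs are assumptions made for illustration.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def target_weights(cov, mu, target_return, net_exposure):
    """Minimum variance weights with mu'w = target_return and sum(w) = net_exposure."""
    ones = [1.0] * len(mu)
    si_mu = solve(cov, mu)  # Sigma^{-1} mu
    si_one = solve(cov, ones)  # Sigma^{-1} iota
    a = sum(m * x for m, x in zip(mu, si_mu))  # mu' Sigma^{-1} mu
    b = sum(m * x for m, x in zip(mu, si_one))  # mu' Sigma^{-1} iota
    d = sum(si_one)  # iota' Sigma^{-1} iota
    det = a * d - b * b
    A, B, D = d / det, -b / det, a / det  # elements of the inverse 2 x 2 matrix
    coef_mu = target_return * A + net_exposure * B
    coef_one = target_return * B + net_exposure * D
    return [coef_mu * x + coef_one * y for x, y in zip(si_mu, si_one)]


# Hypothetical inputs (made-up numbers, not from the paper).
cov = [
    [4.0, 1.0, 0.5],
    [1.0, 2.0, 0.3],
    [0.5, 0.3, 1.0],
]
mu = [0.02, 0.01, 0.03]
weights = target_weights(cov, mu, target_return=0.015, net_exposure=0.0)
print(weights)
```

Because the weights depend on the expected returns through A, B and D as well as directly, small changes in the expected returns can move the weights considerably when the covariance matrix is poorly conditioned, which is what the sensitivity measures are designed to capture.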

Appendix B: Testing for the number of significant principal components

Let $X$ denote an $n \times k$ matrix of i.i.d. standard normal random variables and let $\lambda_{\max}$ denote the largest eigenvalue of $X'X$. Johnstone (2001) proves that the distribution of $\lambda_{\max}$, when centred by $\mu_{\lambda} = (\sqrt{n-1} + \sqrt{k})^2$ and scaled by $\sigma_{\lambda} = (\sqrt{n-1} + \sqrt{k})(1/\sqrt{n-1} + 1/\sqrt{k})^{1/3}$, converges to the Tracy–Widom law of order 1 when $k, n \to \infty$ and $n/k \to c \geq 1$. Unreported simulations confirm the finding of Johnstone (2001) that the finite sample properties of the test for realistic sample sizes are excellent.
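The centring and scaling constants are straightforward to compute. The short sketch below evaluates them and standardises an observed largest eigenvalue; the function names are hypothetical. Comparing the standardised value with Tracy–Widom critical values requires a table or a specialised routine, which is outside the scope of this example.

```python
import math


def johnstone_constants(n, k):
    """Centring and scaling for the largest eigenvalue of X'X, with X an n x k N(0, 1) matrix."""
    root_n, root_k = math.sqrt(n - 1), math.sqrt(k)
    centre = (root_n + root_k) ** 2
    scale = (root_n + root_k) * (1.0 / root_n + 1.0 / root_k) ** (1.0 / 3.0)
    return centre, scale


def standardised_largest_eigenvalue(lambda_max, n, k):
    """Statistic to be compared with the Tracy-Widom law of order 1."""
    centre, scale = johnstone_constants(n, k)
    return (lambda_max - centre) / scale


print(johnstone_constants(5, 4))
```

For n = 5 and k = 4 both square roots equal 2, so the centring constant is 16 and the scaling constant is 4, which gives a quick sanity check of the formulas.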
