dynamic integration of time- and ... - princeton...

14
Dynamic Integration of Time- and State-Domain Methods for Volatility Estimation Jianqing FAN, Yingying FAN, and Jiancheng J IANG Time- and state-domain methods are two common approaches to nonparametric prediction. Whereas the former uses data predominantly from recent history, the latter relies mainly on historical information. Combining these two pieces of valuable information is an interesting challenge in statistics. We surmount this problem by dynamically integrating information from both the time and state domains. The estimators from these two domains are optimally combined based on a data-driven weighting strategy, which provides a more efficient estimator of volatility. Asymptotic normality is separately established for the time domain, the state domain, and the integrated estimators. By comparing the efficiency of the estimators, we demonstrate that the proposed integrated estimator uniformly dominates the other two estimators. The proposed dynamic integration approach is also applicable to other estimation problems in time series. Extensive simulations are conducted to demonstrate that the newly proposed procedure outperforms some popular ones, such as the RiskMetrics and historical simulation approaches, among others. In addition, empirical studies convincingly endorse our integration method. KEY WORDS: Bayes; Dynamical integration; Smoothing; State domain; Time domain; Volatility. 1. INTRODUCTION When forecasting a future event or making an investment decision, two pieces of useful information are frequently con- sulted. On the one hand, based on the recent history, a form of local average, such as the moving average in the time do- main, can be used to forecast a future event. This approach uses the continuity of a function and ignores the information in the remote history, which is related to the current through stationarity. On the other hand, a future event can be forecast based on state-domain modeling techniques (see Fan and Yao 2003 for details). For example, forecasting the volatility of bond yields with the current rate 6.47% involves computing the stan- dard deviation based on the historical information with yields of around 6.47%. This approach relies on the stationarity of the yields and depends predominately on historical data, and ignores the importance of recent data. In general, the foregoing two pieces of information are weakly dependent. For example, consider the weekly data on the yields of 3-month Treasury Bills presented in Figure 1. Sup- pose that the current time is January 4, 1991 and the interest rate is 6.47%. Based on the weighted squared differences in the past 52 weeks (1 year), for example, the volatility may be estimated. This corresponds to the time-domain smoothing, us- ing the small vertical stretch of data in Figure 1(a). Figure 1(b) computes the squared differences of the past year’s data and de- picts the associated exponential weights. The estimated volatil- ity (conditional variance) is indicated by the dashed horizontal bar. Let the resulting estimator be ˆ σ 2 t,time . On the other hand, in financial activities, we do consult historical information to make better decisions. The current interest rate is 6.47%. The volatility of the yields may be examined when the interest rate is around 6.47%, say 6.47% ± .25%. This corresponds to using the part of the data indicated by the horizontal bar. Figure 1(c) Jianqing Fan is Frederick L. Moore 18 Professor of Finance (E-mail: jqfan@ princeton.edu), Yingying Fan is Ph.D. Candidate (E-mail: yingying@princeton. edu), and Jiancheng Jiang is Associate Research Scholar (E-mail: jjiang@ princeton.edu), Department of Operations Research and Financial Engineer- ing, Princeton University, Princeton, NJ 08544. Jiancheng Jiang is also As- sistant Professor, Department of Mathematics and Statistics, University of North Carolina, Charlotte, NC 28223 (E-mail: [email protected]). This work was supported in part by the Research Grants Council of the Hong Kong SAR (project CUHK 400903/03P), the National Science Foundation (grants DMS-03-55179 and DMS-05-32370), and Chinese National Science Founda- tion (grant 10471006). The authors thank reviewers for constructive comments and Dr. Juan Gu for assistance. plots the squared differences (X t X t1 ) 2 against X t1 , with X t1 restricted to the interval 6.47% ± .25%. Applying the lo- cal kernel weight to the squared differences results in a state- domain estimator ˆ σ 2 t,state , indicated by the horizontal bar in Fig- ure 1(c). Clearly, as shown in Figure 1(a), except in the 3-week period immediately before January 4, 1991, the last period with an interest rate within 6.47% ± .25% is the period from May 15, 1988 and July 22, 1988. Thus, the time- and state-domain estimators use two weakly dependent components of the time series, because these two components are 136 weeks apart in time; see the horizontal and vertical bars of Figure 1(a). It is important to combine the estimators from the time do- main and the state domain separately, because this combination allows us to use more sampling information and to give better estimates. In fact, compared with the state-domain estimator, our new integrated estimator will put more emphasis (weight) on recent data; in contrast with the time-domain estimator, it will use historical data to improve the efficiency. These will also be demonstrated mathematically. Both state- and time-domain smoothing have been popularly studied in the literature. Many authors have contributed their works on the first topic; the survey papers by Cai and Hong (2003) and Fan (2005) provide overviews. Other works include, for example, drift and volatility estimation for short rate by Chapman and Pearson (2000); density estimation for spatial lin- ear processes by Hallin, Lu, and Tran (2001); diffusion estima- tion by Bandi and Phillips (2003); Fan and Zhang (2003), and Aït-Sahalia and Mykland (2004); kernel estimation and test- ing for continuous-time financial models by Arapis and Gao (2004); and tests for diffusion model by Chen, Gao, and Tang (2007) and Hong and Li (2005). On the other hand, there is also a large literature on time-domain smoothing, including work by Hall and Hart (1990), Robinson (1997), Gijbels, Pope, and Wand (1999), Hardle, Herwartz, and Spokoiny (2002), Fan and Gu (2003), Mercurio and Spokoiny (2004), and Aït-Sahalia, Mykland, and Zhang (2005), among others. The accuracies of both time- and state-domain estimators for the volatility de- pend on the information contained in the recent data and his- torical data. Brenner, Harjes, and Kroner (1996) proposed an interesting model on term interest rate that has some flavor of © 2007 American Statistical Association Journal of the American Statistical Association June 2007, Vol. 102, No. 478, Theory and Methods DOI 10.1198/016214507000000176 618

Upload: others

Post on 26-Apr-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

Dynamic Integration of Time- and State-DomainMethods for Volatility Estimation

Jianqing FAN Yingying FAN and Jiancheng JIANG

Time- and state-domain methods are two common approaches to nonparametric prediction Whereas the former uses data predominantlyfrom recent history the latter relies mainly on historical information Combining these two pieces of valuable information is an interestingchallenge in statistics We surmount this problem by dynamically integrating information from both the time and state domains Theestimators from these two domains are optimally combined based on a data-driven weighting strategy which provides a more efficientestimator of volatility Asymptotic normality is separately established for the time domain the state domain and the integrated estimatorsBy comparing the efficiency of the estimators we demonstrate that the proposed integrated estimator uniformly dominates the other twoestimators The proposed dynamic integration approach is also applicable to other estimation problems in time series Extensive simulationsare conducted to demonstrate that the newly proposed procedure outperforms some popular ones such as the RiskMetrics and historicalsimulation approaches among others In addition empirical studies convincingly endorse our integration method

KEY WORDS Bayes Dynamical integration Smoothing State domain Time domain Volatility

1 INTRODUCTION

When forecasting a future event or making an investmentdecision two pieces of useful information are frequently con-sulted On the one hand based on the recent history a formof local average such as the moving average in the time do-main can be used to forecast a future event This approachuses the continuity of a function and ignores the informationin the remote history which is related to the current throughstationarity On the other hand a future event can be forecastbased on state-domain modeling techniques (see Fan and Yao2003 for details) For example forecasting the volatility of bondyields with the current rate 647 involves computing the stan-dard deviation based on the historical information with yieldsof around 647 This approach relies on the stationarity ofthe yields and depends predominately on historical data andignores the importance of recent data

In general the foregoing two pieces of information areweakly dependent For example consider the weekly data onthe yields of 3-month Treasury Bills presented in Figure 1 Sup-pose that the current time is January 4 1991 and the interestrate is 647 Based on the weighted squared differences inthe past 52 weeks (1 year) for example the volatility may beestimated This corresponds to the time-domain smoothing us-ing the small vertical stretch of data in Figure 1(a) Figure 1(b)computes the squared differences of the past yearrsquos data and de-picts the associated exponential weights The estimated volatil-ity (conditional variance) is indicated by the dashed horizontalbar Let the resulting estimator be σ 2

ttime On the other handin financial activities we do consult historical information tomake better decisions The current interest rate is 647 Thevolatility of the yields may be examined when the interest rateis around 647 say 647 plusmn 25 This corresponds to usingthe part of the data indicated by the horizontal bar Figure 1(c)

Jianqing Fan is Frederick L Moore 18 Professor of Finance (E-mail jqfanprincetonedu) Yingying Fan is PhD Candidate (E-mail yingyingprincetonedu) and Jiancheng Jiang is Associate Research Scholar (E-mail jjiangprincetonedu) Department of Operations Research and Financial Engineer-ing Princeton University Princeton NJ 08544 Jiancheng Jiang is also As-sistant Professor Department of Mathematics and Statistics University ofNorth Carolina Charlotte NC 28223 (E-mail jjiang1unccedu) This workwas supported in part by the Research Grants Council of the Hong KongSAR (project CUHK 40090303P) the National Science Foundation (grantsDMS-03-55179 and DMS-05-32370) and Chinese National Science Founda-tion (grant 10471006) The authors thank reviewers for constructive commentsand Dr Juan Gu for assistance

plots the squared differences (Xt minus Xtminus1)2 against Xtminus1 with

Xtminus1 restricted to the interval 647 plusmn 25 Applying the lo-cal kernel weight to the squared differences results in a state-domain estimator σ 2

tstate indicated by the horizontal bar in Fig-ure 1(c) Clearly as shown in Figure 1(a) except in the 3-weekperiod immediately before January 4 1991 the last period withan interest rate within 647 plusmn 25 is the period from May15 1988 and July 22 1988 Thus the time- and state-domainestimators use two weakly dependent components of the timeseries because these two components are 136 weeks apart intime see the horizontal and vertical bars of Figure 1(a)

It is important to combine the estimators from the time do-main and the state domain separately because this combinationallows us to use more sampling information and to give betterestimates In fact compared with the state-domain estimatorour new integrated estimator will put more emphasis (weight)on recent data in contrast with the time-domain estimator itwill use historical data to improve the efficiency These willalso be demonstrated mathematically

Both state- and time-domain smoothing have been popularlystudied in the literature Many authors have contributed theirworks on the first topic the survey papers by Cai and Hong(2003) and Fan (2005) provide overviews Other works includefor example drift and volatility estimation for short rate byChapman and Pearson (2000) density estimation for spatial lin-ear processes by Hallin Lu and Tran (2001) diffusion estima-tion by Bandi and Phillips (2003) Fan and Zhang (2003) andAiumlt-Sahalia and Mykland (2004) kernel estimation and test-ing for continuous-time financial models by Arapis and Gao(2004) and tests for diffusion model by Chen Gao and Tang(2007) and Hong and Li (2005) On the other hand there is alsoa large literature on time-domain smoothing including workby Hall and Hart (1990) Robinson (1997) Gijbels Pope andWand (1999) Hardle Herwartz and Spokoiny (2002) Fan andGu (2003) Mercurio and Spokoiny (2004) and Aiumlt-SahaliaMykland and Zhang (2005) among others The accuracies ofboth time- and state-domain estimators for the volatility de-pend on the information contained in the recent data and his-torical data Brenner Harjes and Kroner (1996) proposed aninteresting model on term interest rate that has some flavor of

copy 2007 American Statistical AssociationJournal of the American Statistical Association

June 2007 Vol 102 No 478 Theory and MethodsDOI 101198016214507000000176

618

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 619

(a)

(b) (c)

Figure 1 Illustration of Time-Domain and State-Domain Estimation (a) The yields of 3-month Treasury Bills from 1954 to 2004 The vertical barindicates localization in time and the horizontal bar represents localization in state (b) Illustration of time-domain smoothing Squared differencesare plotted against its time index and the exponential weights are used to compute the local average (c) Illustration of state-domain smoothingSquared differences are plotted against the level of interest rates restricted to the interval 647 plusmn 25 indicated by the horizontal bar in (a) TheEpanechnikov kernel is used for computing the local average

combining the time- and state-domain information in a para-metric form However there is no formal work in the litera-ture on efficiently integrating the time- and state-domain esti-mators

In this article we realize the existence of two pieces of weaklyindependent information and introduce a general approach forintegrating the time- and state-domain estimators The inte-grated estimator borrows the strengths of both the time- andstate-domain estimators with aggregated information from thedata and thus dominates both of the estimators when the data-generating process is a continuous stationary diffusion processIn particular when the time domain is far more informative thanthe state domain the integrated estimator will basically becomethe time-domain estimator On the other hand if at a particulartime when the performance of the state-domain estimator dom-inates that of the time-domain estimator the procedure opts au-tomatically for the state-domain estimator

Our strategy for integration is to introduce a dynamic weight-ing scheme 0 le wt le 1 to combine the two weakly dependentestimators Define the resulting integrated estimator as

σ 2t = wtσ

2ttime + (1 minus wt)σ

2tstate (1)

The question is how to choose the dynamic weight wt to op-timize the performance A reasonable approach is to minimizethe variance of the combined estimator leading to the dynamic

optimal weight

wt = var(σ 2tstate)

var(σ 2ttime) + var(σ 2

tstate) (2)

Estimation of the unknown variances in (2) is introduced in Sec-tion 3 Another approach to integration is to use the Bayesianapproach which considers the historical information the priorWe explore this idea in Section 4

To appreciate the intuition behind our approach and how itworks consider the diffusion process

drt = μ(rt)dt + σ(rt)dWt (3)

where Wt is a Wiener process on [0infin) This diffusion processis frequently used to model asset price and the yields of bondswhich are fundamental to fixed income securities financialmarkets consumer spending corporate earnings asset pricingand inflation The volatilities of interest rates are fundamentallyimportant to value the prices of bonds and contract debt obliga-tions (CDOs) the derivatives of bonds and their associated riskmanagement (eg hedging)

The family of models in (4) includes famous ones such asthe Vasicek (1977) model the CIR model (Cox Ingersoll andRoss 1985) and the CKLS model (Chan Karolyi Longstaffand Sanders 1992) Suppose that at time t we have historic datartiN

i=0 from process (3) with a sampling interval Our aim

620 Journal of the American Statistical Association June 2007

is to estimate the volatility σ 2t equiv σ 2(rt) Let Yi = minus12(rti+1 minus

rti) Then for model (3) the Euler approximation scheme is

Yi asymp μ(rti)12 + σ(rti)εi (4)

where εiiidsim N(01) for i = 0 N minus 1 Fan and Zhang (2003)

studied the impact of the order of difference on statistical es-timation They found that although a higher order can possi-bly reduce approximation errors it substantially increases thevariances of data They recommended the Euler scheme (4) formost practical situations

To facilitate the derivation of mathematical theory we focuson the estimation of volatility in model (3) to illustrate howto deal with the problem of dynamic integration Asymptoticnormality of the proposed estimator is established under thismodel and extensive simulations are conducted This theoret-ically and empirically demonstrates the dominant performanceof the integrated estimation Our method focuses on only theestimation of volatility but it can be adapted to other estima-tion problems such as the value at risk studied by Duffie andPan (1997) the drift estimation for diffusion considered bySpokoiny (2000) and conditional moments conditional corre-lation conditional distribution and derivative pricing It is alsoapplicable to other estimation problems in time series such asforecasting the mean function Further studies along these linesare beyond the scope of the current investigation

2 ESTIMATION OF VOLATILITY

From here on for theoretical derivations we assume that thedata rti are sampled from the diffusion model (3) As demon-strated by Stanton (1997) and Fan and Zhang (2003) the driftterm in (4) contributes to the volatility in the order of o(1) undercertain conditions when rarr 0 (eg Stanton 1997) Thus forsimplicity we sometimes ignore the drift term when estimatingthe volatility

21 Time-Domain Estimator

A popular version of the time-domain estimator of thevolatility is the moving average estimator

σ 2MAt = nminus1

tminus1sum

i=tminusn

Y2i (5)

where n is the size of the moving window This estimatorignores the drift component and uses local n data points Anextension of the moving average estimator is the exponentialsmoothing estimation given by

σ 2ESt = (1 minus λ)Y2

tminus1 + λσ 2EStminus1

= (1 minus λ)Y2tminus1 + λY2

tminus2 + λ2Y2tminus3 + middot middot middot (6)

where λ is a smoothing parameter between 0 and 1 that controlsthe size of the local neighborhood This estimator is closelyrelated to that from the GARCH(1 1) model (see Fan JiangZhang and Zhou 2003 for more details) The RiskMetrics ofJ P Morgan (1996) which is used for measuring the risk calledvalue at risk (VaR) of financial assets recommends λ = 94and λ = 97 for calculating the VaR of the daily and monthlyreturns

The exponential smoothing estimator in (6) is a weightedsum of the squared returns before time t Because the weightdecays exponentially it essentially uses recent data A slightlymodified version that explicitly uses only n data points beforetime t is

σ 2ESt = 1 minus λ

1 minus λn

nsum

i=1

Y2tminusiλ

iminus1 (7)

where the smoothing parameter λ depends on n When λ rarr 1it becomes the moving average estimator (5)

Theorem 1 Suppose that σ 2t gt 0 Under conditions (C1)

and (C2) if n rarr infin and n rarr 0 then σ 2ESt minus σ 2

t rarr 0 almostsurely Moreover if the limit c = limnrarrinfin n(1 minus λ) exists andn(2pminus1)(4pminus1) rarr 0 where p is as specified in condition (C2)then

radicn[σ 2

ESt minus σ 2t ]s1t

Dminusrarr N (01)

where s21t = cσ 4

tec+1ecminus1

Theorem 1 has very interesting implications We can com-pute the variance as if the data were independent Indeed if thedata in (7) were independent and locally homogeneous then

var(σ 2ESt) asymp (1 minus λ)2

(1 minus λn)22σ 4

t

nsum

i=1

λ2(iminus1) asymp 1

ns2

1t

This is indeed the asymptotic variance given in Theorem 1

22 Estimation in State Domain

To obtain the nonparametric estimation of the functionsf (x) = 12μ(x) and σ 2(x) in (4) we use the local linearsmoother studied by Ruppert Wand Holst and Houmlssjer (1997)and Fan and Yao (1998) The local linear technique is chosenfor its several nice properties including asymptotic minimaxefficiency and design adaptation Furthermore it automaticallycorrects edge effects and facilitates bandwidth selections (Fanand Yao 2003)

Note that the historical data used in the state-domain estima-tion at time t are (rti Yi) i = 0 N minus 1 Let f (x) = α1 bethe local linear estimator that solves the weighted least squaresproblem

(α1 α2) = arg minα1α2

Nminus1sum

i=0

[Yi minus α1 minus α2

(rti minus x

)]2Kh1

(rti minus x

)

where K(middot) is a kernel function and h1 gt 0 is a bandwidth De-note the squared residuals by Ri = Yi minus f (rti)2 Then the locallinear estimator of σ 2(x) is σ 2

S (x) = β0 given by

(β0 β1) = arg minβ0β1

Nminus1sum

i=0

Ri minus β0 minus β1

(rti minus x

)2Wh

(rti minus x

)

(8)with a kernel function W and a bandwidth h Fan and Yao(1998) gave strategies for bandwidth selection Stanton (1997)and Fan and Zhang (2003) showed that Y2

i instead of Ri in (8)also can be used for the estimation of σ 2(x)

The asymptotic bias and variance of σ 2S (x) have been given

by Fan and Zhang (2003 thm 4) Set νj = intujW2(u)du for

j = 012 and let p(middot) be the invariant density function of the

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 621

Markov process rs from (3) We then have the following re-sult

Theorem 2 Set s22(x) = 2ν0σ

4(x)p(x) Let x be in the inte-rior of the support of p(middot) Suppose that the second derivativesof μ(middot) and σ 2(middot) exist in a neighborhood of x Under conditions

(C3)ndash(C6)radic

Nh[σ 2S (x) minus σ 2(x)]s2(x)

Dminusrarr N (01)

3 DYNAMIC INTEGRATION OF TIMEndash ANDSTATEndashDOMAIN ESTIMATORS

In this section we first show how the optimal dynamicweights in (2) can be estimated and then prove that the time-domain and state-domain estimators are indeed asymptoticallyindependent

31 Estimation of Dynamic Weights

For the exponential smoothing estimator in (7) we can applythe asymptotic formula given in Theorem 1 to get an estimate ofits asymptotic variance But because the estimator is a weightedaverage of Y2

tminusi we also can obtain its variance directly by as-suming that Ytminusj sim N(0 σ 2

t ) for small j Indeed with the fore-going local homogeneous model we have

var(σ 2ESt)

asymp (1 minus λ)2

(1 minus λn)22σ 4

t

nsum

i=1

nsum

j=1

λi+jminus2ρt(|i minus j|)

= 2(1 minus λ)2σ 4t

(1 minus λn)2

n + 2

nminus1sum

k=1

ρt(k)λk(1 minus λ2(nminusk))

1 minus λ2

(9)

where ρt(k) = cor(Y2t Y2

tminusk) is the autocorrelation of the seriesY2

tminusk The autocorrelation can be estimated from the data inhistory Note that due to the locality of the exponential smooth-ing only ρt(k)rsquos with the first 30 lags say contribute to thevariance calculation

We now turn to estimate the variance of σ 2St = σ 2

S (rt) Detailsof this have been given by Fan and Yao (1998 2003 sec 62)Let

Vj(x) =Nminus1sum

i=1

(rti minus x)jW((

rti minus x)h1

)(10)

and ξi(x) = W(rti minusx

h1)V2(x) minus (rti minus x)V1(x)V0(x)V2(x) minus

V1(x)2 Then the local linear estimator can be expressed asσ 2

S (x) = sumNminus1i=1 ξi(x)Ri and its variance can be approximated

as

var(σ 2S (x)) asymp 2σ 4(x)

Nminus1sum

i=1

ξ2i (x) (11)

Substituting (9) and (11) into (2) we propose to combinethe time-domain and state-domain estimators with the dynamicweight

wt = σ 4St

sumNminus1i=1 ξ2

i (rt)

σ 4St

sumNminus1i=1 ξ2

i (rt) + ctσ4ESt

(12)

where ct = (1minusλ)2

(1minusλn)2 n+2sumnminus1

k=1 ρt(k)λk(1minusλ2(nminusk))(1minusλ2)For practical implementation we truncate the series ρt(k)tminus1

k=1

in the summation as ρt(k)30k=1 This results in the dynamically

integrated estimator

σ 2It = wtσ

2ESt + (1 minus wt)σ

2St (13)

where σ 2St = σ 2

S (rt) The function σ 2S (middot) uses historical data

up to the time t and we need to update this function as timeevolves Fortunately we need know only the function value atthe point rt which significantly reduces the computational costThe computational cost can be further reduced if we update theestimated function σ 2

St on a prescribed time schedule (eg onceevery 2 months for weekly data)

Finally we note that in the choice of weight only the vari-ance of the estimated volatility is considered not the meansquared error (MSE) This is mainly to facilitate the choice ofdynamic weights Because the smoothing parameters in σ 2

ESt

and σ 2S (x) have been tuned to optimize their performance sep-

arately their bias and variance trade-offs have been consideredindirectly Thus controlling the variance of the integrated esti-mator σ 2

It also controls to some extent the bias of the estima-tor

32 Sampling Properties

The fundamental component in the choice of dynamicweights is the asymptotic independence between the time- andstate-domain estimators The following theorem shows that thetime- and state-domain estimators are indeed asymptotically in-dependent To facilitate the expression of notations we presentthe result at the current time tN

Theorem 3 Suppose that the second derivatives of μ(middot) andσ 2(middot) exist in a neighborhood of rtN Under conditions (C1) and(C3)ndash(C6) if condition (C2) holds at time point t = tN then

(a) Asymptotic independence

[radicn(σ 2

EStNminus σ 2

tN )

s1tN

radicNh(σ 2

StNminus σ 2

tN )

s2tN

]T Dminusrarr N (0 I2)

where s1tN and s2tN = s2(rtN ) are as given in Theorems 1 and 2(b) Asymptotic normality of σ 2

tN with wtN in (2) If the

limit d = limNrarrinfin n[Nh] exists thenradic

Nhω[σ 2tN minus σ 2

tN ] DminusrarrN (01) where ω = w2

tN s21tN

d + (1 minus wtN )2s22tN

(c) Asymptotic normality of σ 2ItN

with an estimated weightwtN If the weight wtN converges to the weight wtN in proba-bility and the limit d = limNrarrinfin n[Nh] exists then

radicNhω times

(σ 2ItN

minus σ 2tN )

Dminusrarr N (01)

Because the estimators σ 2St and σ 2

ESt are consistent

wt asympsumtminus1

i=1 ξ2i (rt)

sumtminus1i=1 ξ2

i (rt) + ctasymp var(σ 2

S (rt))

var(σ 2S (rt)) + var(σ 2

ESt(rt))= wt

and hence the condition in Theorem 3(c) holds Note that thetheoretically optimal weight minimizing the variance in Theo-rem 3(b) is

wtN opt = ds22tN

s21tN

+ ds22tN

asymp var(σ 2S (rt))

var(σ 2S (rt)) + var(σ 2

ESt(rt))asymp wtN

622 Journal of the American Statistical Association June 2007

It follows that the integrated estimator σ 2ItN

is optimal in thesense that it achieves the minimum variance among all of theweighted estimators in (1)

When 0 lt d lt infin in Theorem 3 the effective sample sizesin both time and state domains are comparable thus neitherthe time-domain estimator nor the state-domain estimator dom-inates From Theorem 3 based on the optimal weight the as-ymptotic relative efficiencies of σ 2

ItNwith respect to σ 2

StNand

σ 2EStN

are

eff(σ 2ItN σ 2

StN ) = 1 + ds22tN s2

1tN and

eff(σ 2ItN σ 2

EStN ) = 1 + s21tN (ds2

2tN )

which are gt1 This demonstrates that the integrated estimatorσ 2

ItNis more efficient than the time-domain and state-domain

estimators

4 BAYESIAN INTEGRATION OFVOLATILITY ESTIMATES

Another possible approach is to consider the historical infor-mation as the prior and to incorporate it into the estimation ofvolatility using the Bayesian framework We now explore suchan approach

41 Bayesian Estimation of Volatility

The Bayesian approach is to consider the recent data Ytminusn

Ytminus1 as an independent sample from N(0 σ 2) [see (4)] andto consider the historical information being summarized in aprior To incorporate the historical information we assume thatthe variance σ 2 follows an inverse-gamma distribution with pa-rameters a and b which has the density function

f (σ 2) = baminus1(a)σ 2minus(a+1)exp(minusbσ 2)

Write σ 2 sim IG(ab) It is well known that

E(σ 2) = b

a minus 1

var(σ 2) = b2

(a minus 1)2(a minus 2) and (14)

mode(σ 2) = b

a + 1

The hyperparameters a and b are estimated from historical datausing the state-domain estimators

It can be easily shown that given Y = (Ytminusn Ytminus1) theposterior density of σ 2 is IG(alowastblowast) where alowast = a + n

2 andblowast = 1

2

sumni=1 Y2

tminusi + b From (14) the Bayesian mean of σ 2 is

σ 2 = blowast

alowast minus 1=

nsum

i=1

Y2tminusi + 2b

2(a minus 1) + n

This Bayesian estimator can be easily written as

σ 2B = n

n + 2(a minus 1)σ 2

MAt + 2(a minus 1)

n + 2(a minus 1)σ 2

P (15)

where σ 2MAt is the moving average estimator given by (5) and

σ 2P = b(aminus1) is the prior mean determined from the historical

data This combines the estimate based on the data and priorknowledge

The Bayesian estimator (15) uses the local average of n datapoints To incorporate the exponential smoothing estimator (6)we consider it the local average of nlowast = sumn

i=1 λiminus1 = 1minusλn

1minusλdata

points This leads to the following integrated estimator

σ 2Bt = nlowast

nlowast + 2(a minus 1)σ 2

ESt + 2(a minus 1)

2(a minus 1) + nlowast σ 2P

= 1 minus λn

1 minus λn + 2(a minus 1)(1 minus λ)σ 2

ESt

+ 2(a minus 1)(1 minus λ)

1 minus λn + 2(a minus 1)(1 minus λ)σ 2

P (16)

In particular when λ rarr 1 the estimator (16) reduces to (15)

42 Estimation of Prior Parameters

A reasonable source for the prior information in (16) is thehistorical data up to time t Thus the hyperparameters a and bshould depend on t and can be used to match with the momentsfrom the historical information Using the approximation model(4) we have

E[(Yt minus f (rt))

2 | rt] asymp σ 2(rt) and

(17)var

[(Yt minus f (rt))

2 | rt] asymp 2σ 4(rt)

These can be estimated from the historical data up to time tnamely the state-domain estimator σ 2

S (rt) Because we have as-sumed that the prior distribution for σ 2

t is IG(atbt) by (17) andthe method of moments we would get the following estimationequations by matching the moments from the prior distributionwith the moment estimation from the historical estimate

E(σ 2t ) = σ 2

S (rt) and var(σ 2t ) = 2σ 4

S (rt)

This together with (14) leads to

bt

at minus 1= σ 2

S (rt) andb2

t

(at minus 1)2(at minus 2)= 2σ 4

S (rt)

where E represents from the expectation with respect to theprior Solving the foregoing equations we obtain

at = 25 and bt = 15σ 2S (rt)

Substituting this into (16) we obtain the estimator

σ 2Bt = 1 minus λn

1 minus λn + 3(1 minus λ)σ 2

ESt +3(1 minus λ)

1 minus λn + 3(1 minus λ)σ 2

St (18)

Unfortunately the weights in (18) are static and do not dependon the time t Thus the Bayesian method that we use doesnot produce a satisfactory answer to this problem Howeverother implementations of the Bayesian method may yield thedynamic weights

5 SIMULATION STUDIES AND EXAMPLES

To facilitate the presentation we use the simple abbrevia-tions in Table 1 to denote seven volatility estimation methodsDetails on the first three methods have been given by Fan andGu (2003) In particular the first method is to estimate volatilityusing the standard deviation of the yields in the past year andthe RiskMetrics method is based on the exponential smooth-ing with λ = 94 The semiparametric method of Fan and Gu(2003) is an extension of a local model used in the exponential

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 623

Table 1 Seven Volatility Estimators

Hist The historical methodRiskM The RiskMetrics method of J P MorganSemi The semiparametric estimator (SEV) of Fan and Gu (2003)NonBay The nonparametric Bayesian method in (18)Integ The integration method of time and state domains in (13)GARCH The maximum likelihood method based on a GARCH(1 1)

modelStaDo The state-domain estimation method in Section 22

smoothing with the smoothing parameter determined by min-imizing the prediction error It includes exponential smoothingwith λ selected by data as a specific example For our integratedmethods the parameter λ is taken as 94

The following five measures are used to assess the per-formance of different procedures for estimating the volatilityOther related measures also can be used (see Daveacute and Stahl1997)

Measure 1 Exceedence Ratio Against Confidence LevelThis measure counts the number of the events for which theloss of an asset exceeds the loss predicted by the normal modelat a given confidence α It was given by Fan and Gu (2003) andcomputed as

ER(σ 2t ) = mminus1

T+msum

i=T+1

I(Yi lt 13minus1(α)σi) (19)

where 13minus1(α) is the α-quantile of the standard normal distribu-tion and m is the size of the out-sample This gives an indicationof how effectively the volatility estimator can be used for pre-diction

As noted by Fan and Gu (2003) one shortcoming of the mea-sure is its large Monte Carlo error Unless the postsample sizem is sufficiently large this measure has difficulty by differenti-ating the performance of various estimators due to the presenceof large error margins Note that the ER depends strongly onthe assumption of normality In our simulation study we usethe true α-quantile of the error distribution instead of 13minus1(α)

in (19) to compute the ER For real data analysis we use theα-quantile of the last 250 residuals for the in-sample data

Measure 2 Mean Absolute Deviation Error To motivate thismeasure we first consider the MSEs

PE(σ 2t ) = mminus1

T+msum

i=T+1

(Y2i minus σ 2

i )2

The expected value can be decomposed as

E(PE) = mminus1T+msum

i=T+1

E(σ 2i minus σ 2

i )2 + mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )2

+ mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )(σ 2i minus σ 2

i ) (20)

Because σ 2i is predicable at time ti and Y2

i minus σ 2i is a martingale

difference the third term is 0 [the drift is assumed to be neg-ligible in the forgoing formulation of PE otherwise the driftterm should be removed first In both cases it has an affec-tion O(2)] Therefore we need consider only the first two

terms The first term reflects the effectiveness of the estimatedvolatility whereas the second term is the size of the stochasticerror and independent of estimators As in all statistical pre-diction problems the second term is usually of an order ofmagnitude larger than the first term Thus a small improvementin PE could mean substantial improvement over the estimatedvolatility However due to the well-known fact that financialtime series contain outliers the MSE is not a robust measureTherefore we used the mean absolute deviation error (MADE)MADE(σ 2

t ) = mminus1 sumT+mi=T+1 |Y2

i minus σ 2i |

Measure 3 Square-Root Absolute Deviation Error An al-ternative to MADE is the Square-root absolute deviation error(RADE) defined as

RADE(σ 2t ) = mminus1

T+msum

i=T+1

∣∣| Yi | minusradic2πσi

∣∣

The constant factor comes from the fact that E|εt| = radic2π for

εt sim N(01) If the underlying error distribution deviates fromnormality then this measure is not robust

Measure 4 Ideal Mean Absolute Deviation Error and Oth-ers To assess the estimation of the volatility in simulations wealso can use the ideal mean absolute deviation error (IMADE)

IMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |

and the ideal square root absolute deviation error (IRADE)

IRADE = mminus1T+msum

i=T+1

|σi minus σi|

Another intuitive measure is the relative IMADE (RIMADE)

RIMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |σ 2

i

This measure may create some outliers when the true volatilityis quite small at certain time points For real data analysis bothmeasures are not applicable

51 Simulations

To assess the performance of the five estimation methodslisted in Table 1 we compute the average and standard devi-ation of each of the four measures over 600 simulations Gen-erally speaking the smaller the average (or the standard devia-tion) the better the estimation approach We also compute theldquoscorerdquo of an estimator which is the percentage of times among600 simulations that the estimator outperforms the average ofthe 7 methods in terms of an effectiveness measure To be morespecific for example consider RiskMetrics using MADE as aneffectiveness measure Let mi be the MADE of the RiskMet-rics estimator at the ith simulation and let mi be the average ofthe MADEs for the five estimators at the ith simulation Thenthe ldquoscorerdquo of the RiskMetrics approach in terms of the MADEis defined as 1

600

sum600i=1 I(mi lt mi) Obviously estimators with

higher scores are preferred In addition we define the ldquorelativelossrdquo of an estimator σ 2

t relative to σ 2It in terms of MADEs as

RLOSS(σ 2t σ 2

It) = MADE(σ 2t )MADE(σ 2

It) minus 1

624 Journal of the American Statistical Association June 2007

where MADE(σ 2t ) is the average of MADE(σ 2

t ) among simula-tions

Example 1 In this example we compare the proposedmethods with the maximum likelihood estimation approachfor a parametric generalized autoregressive conditional het-eroscedasticity [GARCH(1 1)] model This will give an assess-ment of how well our methods perform compared with the mostefficient estimation method

Because the GARCH model does not have a state variablerti but the diffusion model does the diffusion model cannot beused to fit the data simulated from the GARCH model A sen-sible approach is to use the diffusion limit of the GARCH(1 1)model to generate data There are several interesting works onreconciling the two modeling approaches along this line forexample Nelson (1990) established the continuous-time dif-fusion limit for the discrete ARCH model Duan (1997) pro-posed an augmented GARCH model to unify various paramet-ric GARCH models and derived its diffusion limit and Wang(2002) studied the asymptotic relationship between GARCHmodels and diffusions Here we use the result on the diffusionlimit of the GARCH(1 1) model of Wang (2002)

Specifically we first generate data from the GARCH(1 1)model

Yt = σtεt

σ 2t = c0 + a0σ

2tminus1 + b0Y2

tminus1

where the εtrsquos are from the standard normal distribution and thetrue parameters c0 = 498 times 10minus7 a0 = 9289615 and b0 =0411574 are from the fitted values of the parameters for thedaily exchange rate of the Australian dollar with the US dollarfrom January 3 1994 to December 29 2000 The GARCH(1 1)model is then fitted by maximum likelihood estimation

Second we use the diffusion limit (see Wang 2002 sec 23)of the foregoing GARCH model

drt = σt dW1t

dσ 2t = (β0 + β1σ

2t )dt + β2σ

2t dW2t

where W1t and W2t are two independent standard Wienerprocess and the parameters are computed as (β0 β1 β2) =(25896 times 10minus5minus15538 4197) To simulate the diffusionprocess σ 2

t we use the discrete-time order 10 strong ap-proximation scheme of Kloeden Platen Schurz and Soslashrensen(1996)

For each of the two models we generate 600 series of dailydata each with length 1200 For each simulation we set thefirst 900 observations as the ldquoin-samplerdquo data and the last 300observations as the ldquoout-samplerdquo data The results summarizedin Table 2 show that all estimators have acceptable ER valuesclose to 05 The integrated estimator has minimum RIMADEIMADE MADE IRADE and RADE on average among allof the nonparametric estimators Compared with the most ef-ficient maximum likelihood estimator (MLE) of the GARCHmodel the integrated estimator has smaller MADE but largerRIMADE IMADE and IRADE The RADEs of the MLEand the integrated estimator are very close demonstrating thatthe integrated estimator is very impressive in forecasting thevolatility of this example

Note that the dynamic weights for the integrated estimatorare estimated by minimizing the variance of the combined esti-mator in (1) As one anonymous referee pointed out one maychoose to minimize the MSE instead which seems to be dif-ficult and computationally intensive in current situations Asargued in Section 31 the biases and variances trade-off havebeen considered indirectly in the time-domain and state-domainestimation methods The bias of the integrated estimator wascontrolled before the weights were chosen

To investigate whether the foregoing intuition is correct herewe opt for the suggestion from the referee to visually displayhow the integrated estimator dominates the state-domaintime-domain estimators over time through the dynamic weights andcompare the biases of the state-domain estimator and the inte-grated estimator The weights and biases are shown in Figure 2

Table 2 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 4073 2805 6494 5092 7379 7629 4090Average 205 204 183 187 178 169 204

Standard (times10minus2) 655 331 450 308 397 367 567Relative loss () 1544 1444 266 497 0 minus476 1475

IMADE Score () 6177 2554 5342 4608 7212 7546 4090Average (times10minus5) 332 367 330 336 318 304 362Standard (times10minus6) 1118 744 877 690 851 800 1113Relative loss () 440 1544 373 581 0 minus447 1390

MADE Score () 2821 4958 4975 5659 8815 5743 7963Average (times10minus4) 164 153 153 152 149 152 149Standard (times10minus6) 2314 2182 2049 2105 1869 1852 1876Relative loss () 1016 284 293 209 0 207 11

IRADE Score () 6277 2404 5109 4708 7062 7713 4040Average (times10minus3) 403 448 407 410 391 369 446Standard (times10minus3) 121 075 092 070 093 085 131Relative loss () 317 1475 411 498 0 minus553 1422

RADE Score () 3055 5042 4341 5910 8464 5893 7913Average (times10minus2) 20 19 19 19 19 19 19Standard (times10minus4) 153 145 141 142 134 133 137Relative loss () 513 142 165 96 0 106 minus25

ER Average (times10minus2) 495 545 538 528 525 516 500Standard (times10minus2) 157 113 128 112 136 125 149

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 625

(a) (b)

Figure 2 The Dynamic Weights for Prediction for Example 1 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

It is seen that the bias is actually controlled because both of theestimators have very close biases The weights are around 2and through the weights the integrated method improves boththe RiskMetrics method and the state-domain approach

Example 2 To simulate the interest rate data we considerthe CoxndashIngersollndashRoss (CIR) model

drt = κ(θ minus rt)dt + σ r12t dWt t ge t0

where the spot rate rt moves around a central location or long-run equilibrium level θ = 08571 at speed κ = 21459 The σ

is set at 07830 These parameters values given by Chapmanand Pearson (2000) satisfy the condition 2κθ ge σ 2 so that theprocess rt is stationary and positive The model has been studiedby Chapman and Pearson (2000) and Fan and Zhang (2003)

We simulated weekly data (600 runs) from the CIR modelFor each simulation we designated the first 900 observations

as the ldquoin-samplerdquo data and the last 300 observations the ldquoout-samplerdquo data The results summarized in Table 3 show that theintegrated method performs closely to the state-domain methodThis is mainly because the dynamic weights shown in Figure 3are very small and the integration estimator is very close to thestate-domain estimator supporting our statement at the end ofthe third paragraph in Section 1 Table 3 shows that the inte-grated estimator uniformly dominates the other estimators onaverage due to its highest score and lowest averaged RIMADEIMADE MADE IRADE and RADE The improvement inRIMADE IMADE and IRADE is about 100 This demon-strates that our integrated volatility method better captures thevolatility dynamics The historical simulation method performspoorly due to misspecification of the function of the volatilityparameter The results here show the advantage of aggregatingthe information of the time and state domains Note that all es-timators have reasonable ER values at level 05 and that the ERvalue of the integrated estimator is closest to 05

Table 3 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 501 1820 2104 3656 9933 2404 9917Average (times10minus1) 267 190 188 164 61 202 62Standard (times10minus1) 148 35 81 31 39 130 43Relative loss () 33732 21074 20798 16821 0 23012 112

IMADE Score () 768 1319 1469 2972 9967 2120 9950Average (times10minus6) 237 208 191 179 64 196 63Standard (times10minus6) 100 78 69 68 45 91 46Relative loss () 27162 22558 19905 18107 0 20696 minus163

MADE Score () 3823 5209 5776 5609 7145 5392 6945Average (times10minus5) 102 93 93 93 91 94 92Standard (times10minus6) 349 333 321 330 317 310 317Relative loss () 1100 212 221 162 0 229 21

IRADE Score () 818 1252 1402 3072 9983 2104 9950Average (times10minus4) 376 322 307 277 99 316 98Standard (times10minus4) 140 75 90 66 60 192 62Relative loss () 27855 22397 20898 17892 0 21821 minus111

RADE Score () 4040 5209 5042 5593 7379 4875 7279Average (times10minus2) 15 15 15 15 15 15 15Standard (times10minus4) 272 267 257 265 258 251 257Relative loss () 608 123 160 92 0 189 06

ER Average 056 055 054 053 049 053 049Standard 023 011 013 011 013 036 013

626 Journal of the American Statistical Association June 2007

(a) (b)

Figure 3 The Dynamic Weights for Prediction for Example 2 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and forthe Integration Estimator ( mdash) (b)

Again Figure 3(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

Example 3 To assess the performance of the proposed meth-ods under a nonstationary situation we now consider the geo-metric Brownian (GBM)

drt = μrt dt + σ rt dWt

where Wt is a standard one-dimensional Brownian motion Thisis a nonstationary process against which we check whether ourmethod continues to apply Note that the celebrated BlackndashScholes option price formula is derived from Osbornersquos as-sumption that the stock price follows the GBM model By Itocircrsquosformula we have

log rt minus log r0 = (μ minus σ 22)t + σ 2Wt

We set μ = 03 and σ = 26 in our simulations With theBrownian motion simulated from independent Gaussian incre-ments we can generate the samples for the GBM We simulate600 times with = 1252 For each simulation we generate1000 observations and use the first two-thirds of the observa-tions as in-sample data and the remaining observations as out-sample data

Because of the nonstationarity of the process the simulationresults are somewhat unstable with a few uncommon realiza-tions appearing Therefore we use a robust method to sum-marize the results For each measure we compute the relativeloss and the median along with the median absolute deviation(MAD) for scale estimation The MAD for sample ais

i=1 isdefined as MAD(a) = median|ai minus median(a)| i = 1 sIt is easy to see that MAD(a) provides a robust estimator ofthe standard deviation of ai Table 4 reports the results Thetable shows that the integrated estimator performs quite wellAll estimators have acceptable ER values The proposed esti-

Table 4 Robust Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 733 4300 3583 7283 6133 2733 5233Median (times10minus1) 2254 1561 1770 1490 1382 2036 1443

MAD 035 012 024 012 034 036 037Relative loss () 6310 1296 2805 781 0 4732 444

IMADE Score () 1333 4333 2633 7333 6517 2550 5300Median (times10minus7) 217 174 193 163 146 265 154MAD (times10minus8) 566 474 486 445 322 817 357

Relative loss () 4864 1913 3211 1205 0 8145 561

MADE Score () 3900 3950 2650 4933 5200 4233 5133Median (times10minus6) 115 111 111 110 109 111 109MAD (times10minus7) 236 244 240 244 231 230 230

Relative loss () 592 143 181 123 0 174 29

Score () 1417 4317 2500 7317 6467 2683 5300IRADE Median (times10minus4) 346 267 310 255 251 378 266

MAD (times10minus5) 599 406 488 389 566 847 644Relative loss () 3787 649 2358 149 0 5037 591

RADE Score () 3900 3700 2433 5117 6150 2950 5750Median (times10minus4) 1581 1509 1515 1508 1504 1661 1503MAD (times10minus4) 185 201 198 200 193 257 192

Relative loss () 509 30 73 23 0 1042 minus07

ER Median 054 051 054 051 048 057 0480MAD 010 003 004 003 006 007 006

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 627

(a) (b)

Figure 4 The Dynamic Weights for Prediction for Example 3 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

mators have the smallest measure values in median or robustscale This shows that our integrated method continues to per-form better than the others for this nonstationary case

Again Figure 4(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

52 Real Examples

In this section we apply the integrated volatility estimationmethods and other methods to the analysis of real financial data

521 Treasury Bond Here we consider the weekly returnsof three Treasury Bonds with terms of 1 5 and 10 years Thein-sample data are for January 4 1974ndashDecember 30 1994 andout-sample data are for January 6 1995ndashAugust 8 2003 Thetotal sample size is 1545 and the in-sample size is 1096 Theresults are reported in Table 5

From Table 5 for the 1-year Treasury Notes the integratedestimator is of the smallest MADE and the smallest RADE re-flecting that the integrated estimation method of the volatilityis the best among the seven methods For the bonds with 5-or 10-year maturities the seven estimators have close MADEsand RADEs where the historical simulation method is bet-ter than the RiskMetrics in terms of MADE and RADE andthe integrated estimation approach has the smallest MADEsIn addition the integrated estimator has ER values closest to

05 This demonstrates the advantage of using state-domaininformation which can improve the time-domain predictionof the volatilities in the interest dynamics The nonparametricBayesian method does not perform as well as the integrated es-timator because it uses fixed weights and the inverse-gammaprior which may not be true in practice

522 Exchange Rate We analyze the daily exchange rateof several foreign currencies against the US dollar The dataare for January 3 1994ndashAugust 1 2003 The in-sample dataconsist of the observations before January 1 2001 and the out-sample data consist of the remaining observations The resultsreported in Table 6 show that the integrated estimator has thesmallest MADEs and moderate RADEs for the exchange ratesNote that the RADE measure is effective only when the dataseem to be from a normal distribution and thus cannot calibratethe accuracy of volatility forecast for data from an unknowndistribution For this reason the RADE might not be a goodmeasure of performance for this example Both the integratedvolatility estimation and GARCH perform nicely for this exam-ple They outperform other nonparametric methods

Figure 5 reports the dynamic weights for the daily exchangerate of the UK pound to the US dollar Because the RiskMestimator is far more informative than the StaDo estimator ourintegrated estimator becomes basically the time-domain esti-mator by automatically choosing large dynamic weights Thisexemplifies our statement in the paragraph immediately before(1) in Section 1

Table 5 Comparisons of Several Volatility Estimation Methods

Term Measure (times10minus2) Hist RiskM Semi NonBay Integ GARCH StaDo

1 year MADE 1044 787 787 794 732 893 914RADE 5257 4231 4256 4225 4105 4402 4551

ER 2237 2009 2013 1562 3795 0 7366

5 years MADE 1207 1253 1296 1278 1201 1245 1638RADE 5315 5494 5630 5563 5571 5285 6543

ER 671 1339 1566 1112 5804 0 8929

10 years MADE 1041 1093 1103 1111 1018 1046 1366RADE 4939 5235 5296 5280 5152 4936 6008

ER 1112 1563 1790 1334 4911 0 6920

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 2: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 619

(a)

(b) (c)

Figure 1 Illustration of Time-Domain and State-Domain Estimation (a) The yields of 3-month Treasury Bills from 1954 to 2004 The vertical barindicates localization in time and the horizontal bar represents localization in state (b) Illustration of time-domain smoothing Squared differencesare plotted against its time index and the exponential weights are used to compute the local average (c) Illustration of state-domain smoothingSquared differences are plotted against the level of interest rates restricted to the interval 647 plusmn 25 indicated by the horizontal bar in (a) TheEpanechnikov kernel is used for computing the local average

combining the time- and state-domain information in a para-metric form However there is no formal work in the litera-ture on efficiently integrating the time- and state-domain esti-mators

In this article we realize the existence of two pieces of weaklyindependent information and introduce a general approach forintegrating the time- and state-domain estimators The inte-grated estimator borrows the strengths of both the time- andstate-domain estimators with aggregated information from thedata and thus dominates both of the estimators when the data-generating process is a continuous stationary diffusion processIn particular when the time domain is far more informative thanthe state domain the integrated estimator will basically becomethe time-domain estimator On the other hand if at a particulartime when the performance of the state-domain estimator dom-inates that of the time-domain estimator the procedure opts au-tomatically for the state-domain estimator

Our strategy for integration is to introduce a dynamic weight-ing scheme 0 le wt le 1 to combine the two weakly dependentestimators Define the resulting integrated estimator as

σ 2t = wtσ

2ttime + (1 minus wt)σ

2tstate (1)

The question is how to choose the dynamic weight wt to op-timize the performance A reasonable approach is to minimizethe variance of the combined estimator leading to the dynamic

optimal weight

wt = var(σ 2tstate)

var(σ 2ttime) + var(σ 2

tstate) (2)

Estimation of the unknown variances in (2) is introduced in Sec-tion 3 Another approach to integration is to use the Bayesianapproach which considers the historical information the priorWe explore this idea in Section 4

To appreciate the intuition behind our approach and how itworks consider the diffusion process

drt = μ(rt)dt + σ(rt)dWt (3)

where Wt is a Wiener process on [0infin) This diffusion processis frequently used to model asset price and the yields of bondswhich are fundamental to fixed income securities financialmarkets consumer spending corporate earnings asset pricingand inflation The volatilities of interest rates are fundamentallyimportant to value the prices of bonds and contract debt obliga-tions (CDOs) the derivatives of bonds and their associated riskmanagement (eg hedging)

The family of models in (4) includes famous ones such asthe Vasicek (1977) model the CIR model (Cox Ingersoll andRoss 1985) and the CKLS model (Chan Karolyi Longstaffand Sanders 1992) Suppose that at time t we have historic datartiN

i=0 from process (3) with a sampling interval Our aim

620 Journal of the American Statistical Association June 2007

is to estimate the volatility σ 2t equiv σ 2(rt) Let Yi = minus12(rti+1 minus

rti) Then for model (3) the Euler approximation scheme is

Yi asymp μ(rti)12 + σ(rti)εi (4)

where εiiidsim N(01) for i = 0 N minus 1 Fan and Zhang (2003)

studied the impact of the order of difference on statistical es-timation They found that although a higher order can possi-bly reduce approximation errors it substantially increases thevariances of data They recommended the Euler scheme (4) formost practical situations

To facilitate the derivation of mathematical theory we focuson the estimation of volatility in model (3) to illustrate howto deal with the problem of dynamic integration Asymptoticnormality of the proposed estimator is established under thismodel and extensive simulations are conducted This theoret-ically and empirically demonstrates the dominant performanceof the integrated estimation Our method focuses on only theestimation of volatility but it can be adapted to other estima-tion problems such as the value at risk studied by Duffie andPan (1997) the drift estimation for diffusion considered bySpokoiny (2000) and conditional moments conditional corre-lation conditional distribution and derivative pricing It is alsoapplicable to other estimation problems in time series such asforecasting the mean function Further studies along these linesare beyond the scope of the current investigation

2 ESTIMATION OF VOLATILITY

From here on for theoretical derivations we assume that thedata rti are sampled from the diffusion model (3) As demon-strated by Stanton (1997) and Fan and Zhang (2003) the driftterm in (4) contributes to the volatility in the order of o(1) undercertain conditions when rarr 0 (eg Stanton 1997) Thus forsimplicity we sometimes ignore the drift term when estimatingthe volatility

21 Time-Domain Estimator

A popular version of the time-domain estimator of thevolatility is the moving average estimator

σ 2MAt = nminus1

tminus1sum

i=tminusn

Y2i (5)

where n is the size of the moving window This estimatorignores the drift component and uses local n data points Anextension of the moving average estimator is the exponentialsmoothing estimation given by

σ 2ESt = (1 minus λ)Y2

tminus1 + λσ 2EStminus1

= (1 minus λ)Y2tminus1 + λY2

tminus2 + λ2Y2tminus3 + middot middot middot (6)

where λ is a smoothing parameter between 0 and 1 that controlsthe size of the local neighborhood This estimator is closelyrelated to that from the GARCH(1 1) model (see Fan JiangZhang and Zhou 2003 for more details) The RiskMetrics ofJ P Morgan (1996) which is used for measuring the risk calledvalue at risk (VaR) of financial assets recommends λ = 94and λ = 97 for calculating the VaR of the daily and monthlyreturns

The exponential smoothing estimator in (6) is a weightedsum of the squared returns before time t Because the weightdecays exponentially it essentially uses recent data A slightlymodified version that explicitly uses only n data points beforetime t is

σ 2ESt = 1 minus λ

1 minus λn

nsum

i=1

Y2tminusiλ

iminus1 (7)

where the smoothing parameter λ depends on n When λ rarr 1it becomes the moving average estimator (5)

Theorem 1 Suppose that σ 2t gt 0 Under conditions (C1)

and (C2) if n rarr infin and n rarr 0 then σ 2ESt minus σ 2

t rarr 0 almostsurely Moreover if the limit c = limnrarrinfin n(1 minus λ) exists andn(2pminus1)(4pminus1) rarr 0 where p is as specified in condition (C2)then

radicn[σ 2

ESt minus σ 2t ]s1t

Dminusrarr N (01)

where s21t = cσ 4

tec+1ecminus1

Theorem 1 has very interesting implications We can com-pute the variance as if the data were independent Indeed if thedata in (7) were independent and locally homogeneous then

var(σ 2ESt) asymp (1 minus λ)2

(1 minus λn)22σ 4

t

nsum

i=1

λ2(iminus1) asymp 1

ns2

1t

This is indeed the asymptotic variance given in Theorem 1

22 Estimation in State Domain

To obtain the nonparametric estimation of the functionsf (x) = 12μ(x) and σ 2(x) in (4) we use the local linearsmoother studied by Ruppert Wand Holst and Houmlssjer (1997)and Fan and Yao (1998) The local linear technique is chosenfor its several nice properties including asymptotic minimaxefficiency and design adaptation Furthermore it automaticallycorrects edge effects and facilitates bandwidth selections (Fanand Yao 2003)

Note that the historical data used in the state-domain estima-tion at time t are (rti Yi) i = 0 N minus 1 Let f (x) = α1 bethe local linear estimator that solves the weighted least squaresproblem

(α1 α2) = arg minα1α2

Nminus1sum

i=0

[Yi minus α1 minus α2

(rti minus x

)]2Kh1

(rti minus x

)

where K(middot) is a kernel function and h1 gt 0 is a bandwidth De-note the squared residuals by Ri = Yi minus f (rti)2 Then the locallinear estimator of σ 2(x) is σ 2

S (x) = β0 given by

(β0 β1) = arg minβ0β1

Nminus1sum

i=0

Ri minus β0 minus β1

(rti minus x

)2Wh

(rti minus x

)

(8)with a kernel function W and a bandwidth h Fan and Yao(1998) gave strategies for bandwidth selection Stanton (1997)and Fan and Zhang (2003) showed that Y2

i instead of Ri in (8)also can be used for the estimation of σ 2(x)

The asymptotic bias and variance of σ 2S (x) have been given

by Fan and Zhang (2003 thm 4) Set νj = intujW2(u)du for

j = 012 and let p(middot) be the invariant density function of the

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 621

Markov process rs from (3) We then have the following re-sult

Theorem 2 Set s22(x) = 2ν0σ

4(x)p(x) Let x be in the inte-rior of the support of p(middot) Suppose that the second derivativesof μ(middot) and σ 2(middot) exist in a neighborhood of x Under conditions

(C3)ndash(C6)radic

Nh[σ 2S (x) minus σ 2(x)]s2(x)

Dminusrarr N (01)

3 DYNAMIC INTEGRATION OF TIMEndash ANDSTATEndashDOMAIN ESTIMATORS

In this section we first show how the optimal dynamicweights in (2) can be estimated and then prove that the time-domain and state-domain estimators are indeed asymptoticallyindependent

31 Estimation of Dynamic Weights

For the exponential smoothing estimator in (7) we can applythe asymptotic formula given in Theorem 1 to get an estimate ofits asymptotic variance But because the estimator is a weightedaverage of Y2

tminusi we also can obtain its variance directly by as-suming that Ytminusj sim N(0 σ 2

t ) for small j Indeed with the fore-going local homogeneous model we have

var(σ 2ESt)

asymp (1 minus λ)2

(1 minus λn)22σ 4

t

nsum

i=1

nsum

j=1

λi+jminus2ρt(|i minus j|)

= 2(1 minus λ)2σ 4t

(1 minus λn)2

n + 2

nminus1sum

k=1

ρt(k)λk(1 minus λ2(nminusk))

1 minus λ2

(9)

where ρt(k) = cor(Y2t Y2

tminusk) is the autocorrelation of the seriesY2

tminusk The autocorrelation can be estimated from the data inhistory Note that due to the locality of the exponential smooth-ing only ρt(k)rsquos with the first 30 lags say contribute to thevariance calculation

We now turn to estimate the variance of σ 2St = σ 2

S (rt) Detailsof this have been given by Fan and Yao (1998 2003 sec 62)Let

Vj(x) =Nminus1sum

i=1

(rti minus x)jW((

rti minus x)h1

)(10)

and ξi(x) = W(rti minusx

h1)V2(x) minus (rti minus x)V1(x)V0(x)V2(x) minus

V1(x)2 Then the local linear estimator can be expressed asσ 2

S (x) = sumNminus1i=1 ξi(x)Ri and its variance can be approximated

as

var(σ 2S (x)) asymp 2σ 4(x)

Nminus1sum

i=1

ξ2i (x) (11)

Substituting (9) and (11) into (2) we propose to combinethe time-domain and state-domain estimators with the dynamicweight

wt = σ 4St

sumNminus1i=1 ξ2

i (rt)

σ 4St

sumNminus1i=1 ξ2

i (rt) + ctσ4ESt

(12)

where ct = (1minusλ)2

(1minusλn)2 n+2sumnminus1

k=1 ρt(k)λk(1minusλ2(nminusk))(1minusλ2)For practical implementation we truncate the series ρt(k)tminus1

k=1

in the summation as ρt(k)30k=1 This results in the dynamically

integrated estimator

σ 2It = wtσ

2ESt + (1 minus wt)σ

2St (13)

where σ 2St = σ 2

S (rt) The function σ 2S (middot) uses historical data

up to the time t and we need to update this function as timeevolves Fortunately we need know only the function value atthe point rt which significantly reduces the computational costThe computational cost can be further reduced if we update theestimated function σ 2

St on a prescribed time schedule (eg onceevery 2 months for weekly data)

Finally we note that in the choice of weight only the vari-ance of the estimated volatility is considered not the meansquared error (MSE) This is mainly to facilitate the choice ofdynamic weights Because the smoothing parameters in σ 2

ESt

and σ 2S (x) have been tuned to optimize their performance sep-

arately their bias and variance trade-offs have been consideredindirectly Thus controlling the variance of the integrated esti-mator σ 2

It also controls to some extent the bias of the estima-tor

32 Sampling Properties

The fundamental component in the choice of dynamicweights is the asymptotic independence between the time- andstate-domain estimators The following theorem shows that thetime- and state-domain estimators are indeed asymptotically in-dependent To facilitate the expression of notations we presentthe result at the current time tN

Theorem 3 Suppose that the second derivatives of μ(middot) andσ 2(middot) exist in a neighborhood of rtN Under conditions (C1) and(C3)ndash(C6) if condition (C2) holds at time point t = tN then

(a) Asymptotic independence

[radicn(σ 2

EStNminus σ 2

tN )

s1tN

radicNh(σ 2

StNminus σ 2

tN )

s2tN

]T Dminusrarr N (0 I2)

where s1tN and s2tN = s2(rtN ) are as given in Theorems 1 and 2(b) Asymptotic normality of σ 2

tN with wtN in (2) If the

limit d = limNrarrinfin n[Nh] exists thenradic

Nhω[σ 2tN minus σ 2

tN ] DminusrarrN (01) where ω = w2

tN s21tN

d + (1 minus wtN )2s22tN

(c) Asymptotic normality of σ 2ItN

with an estimated weightwtN If the weight wtN converges to the weight wtN in proba-bility and the limit d = limNrarrinfin n[Nh] exists then

radicNhω times

(σ 2ItN

minus σ 2tN )

Dminusrarr N (01)

Because the estimators σ 2St and σ 2

ESt are consistent

wt asympsumtminus1

i=1 ξ2i (rt)

sumtminus1i=1 ξ2

i (rt) + ctasymp var(σ 2

S (rt))

var(σ 2S (rt)) + var(σ 2

ESt(rt))= wt

and hence the condition in Theorem 3(c) holds Note that thetheoretically optimal weight minimizing the variance in Theo-rem 3(b) is

wtN opt = ds22tN

s21tN

+ ds22tN

asymp var(σ 2S (rt))

var(σ 2S (rt)) + var(σ 2

ESt(rt))asymp wtN

622 Journal of the American Statistical Association June 2007

It follows that the integrated estimator σ 2ItN

is optimal in thesense that it achieves the minimum variance among all of theweighted estimators in (1)

When 0 lt d lt infin in Theorem 3 the effective sample sizesin both time and state domains are comparable thus neitherthe time-domain estimator nor the state-domain estimator dom-inates From Theorem 3 based on the optimal weight the as-ymptotic relative efficiencies of σ 2

ItNwith respect to σ 2

StNand

σ 2EStN

are

eff(σ 2ItN σ 2

StN ) = 1 + ds22tN s2

1tN and

eff(σ 2ItN σ 2

EStN ) = 1 + s21tN (ds2

2tN )

which are gt1 This demonstrates that the integrated estimatorσ 2

ItNis more efficient than the time-domain and state-domain

estimators

4 BAYESIAN INTEGRATION OFVOLATILITY ESTIMATES

Another possible approach is to consider the historical infor-mation as the prior and to incorporate it into the estimation ofvolatility using the Bayesian framework We now explore suchan approach

41 Bayesian Estimation of Volatility

The Bayesian approach is to consider the recent data Ytminusn

Ytminus1 as an independent sample from N(0 σ 2) [see (4)] andto consider the historical information being summarized in aprior To incorporate the historical information we assume thatthe variance σ 2 follows an inverse-gamma distribution with pa-rameters a and b which has the density function

f (σ 2) = baminus1(a)σ 2minus(a+1)exp(minusbσ 2)

Write σ 2 sim IG(ab) It is well known that

E(σ 2) = b

a minus 1

var(σ 2) = b2

(a minus 1)2(a minus 2) and (14)

mode(σ 2) = b

a + 1

The hyperparameters a and b are estimated from historical datausing the state-domain estimators

It can be easily shown that given Y = (Ytminusn Ytminus1) theposterior density of σ 2 is IG(alowastblowast) where alowast = a + n

2 andblowast = 1

2

sumni=1 Y2

tminusi + b From (14) the Bayesian mean of σ 2 is

σ 2 = blowast

alowast minus 1=

nsum

i=1

Y2tminusi + 2b

2(a minus 1) + n

This Bayesian estimator can be easily written as

σ 2B = n

n + 2(a minus 1)σ 2

MAt + 2(a minus 1)

n + 2(a minus 1)σ 2

P (15)

where σ 2MAt is the moving average estimator given by (5) and

σ 2P = b(aminus1) is the prior mean determined from the historical

data This combines the estimate based on the data and priorknowledge

The Bayesian estimator (15) uses the local average of n datapoints To incorporate the exponential smoothing estimator (6)we consider it the local average of nlowast = sumn

i=1 λiminus1 = 1minusλn

1minusλdata

points This leads to the following integrated estimator

σ 2Bt = nlowast

nlowast + 2(a minus 1)σ 2

ESt + 2(a minus 1)

2(a minus 1) + nlowast σ 2P

= 1 minus λn

1 minus λn + 2(a minus 1)(1 minus λ)σ 2

ESt

+ 2(a minus 1)(1 minus λ)

1 minus λn + 2(a minus 1)(1 minus λ)σ 2

P (16)

In particular when λ rarr 1 the estimator (16) reduces to (15)

42 Estimation of Prior Parameters

A reasonable source for the prior information in (16) is thehistorical data up to time t Thus the hyperparameters a and bshould depend on t and can be used to match with the momentsfrom the historical information Using the approximation model(4) we have

E[(Yt minus f (rt))

2 | rt] asymp σ 2(rt) and

(17)var

[(Yt minus f (rt))

2 | rt] asymp 2σ 4(rt)

These can be estimated from the historical data up to time tnamely the state-domain estimator σ 2

S (rt) Because we have as-sumed that the prior distribution for σ 2

t is IG(atbt) by (17) andthe method of moments we would get the following estimationequations by matching the moments from the prior distributionwith the moment estimation from the historical estimate

E(σ 2t ) = σ 2

S (rt) and var(σ 2t ) = 2σ 4

S (rt)

This together with (14) leads to

bt

at minus 1= σ 2

S (rt) andb2

t

(at minus 1)2(at minus 2)= 2σ 4

S (rt)

where E represents from the expectation with respect to theprior Solving the foregoing equations we obtain

at = 25 and bt = 15σ 2S (rt)

Substituting this into (16) we obtain the estimator

σ 2Bt = 1 minus λn

1 minus λn + 3(1 minus λ)σ 2

ESt +3(1 minus λ)

1 minus λn + 3(1 minus λ)σ 2

St (18)

Unfortunately the weights in (18) are static and do not dependon the time t Thus the Bayesian method that we use doesnot produce a satisfactory answer to this problem Howeverother implementations of the Bayesian method may yield thedynamic weights

5 SIMULATION STUDIES AND EXAMPLES

To facilitate the presentation we use the simple abbrevia-tions in Table 1 to denote seven volatility estimation methodsDetails on the first three methods have been given by Fan andGu (2003) In particular the first method is to estimate volatilityusing the standard deviation of the yields in the past year andthe RiskMetrics method is based on the exponential smooth-ing with λ = 94 The semiparametric method of Fan and Gu(2003) is an extension of a local model used in the exponential

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 623

Table 1 Seven Volatility Estimators

Hist The historical methodRiskM The RiskMetrics method of J P MorganSemi The semiparametric estimator (SEV) of Fan and Gu (2003)NonBay The nonparametric Bayesian method in (18)Integ The integration method of time and state domains in (13)GARCH The maximum likelihood method based on a GARCH(1 1)

modelStaDo The state-domain estimation method in Section 22

smoothing with the smoothing parameter determined by min-imizing the prediction error It includes exponential smoothingwith λ selected by data as a specific example For our integratedmethods the parameter λ is taken as 94

The following five measures are used to assess the per-formance of different procedures for estimating the volatilityOther related measures also can be used (see Daveacute and Stahl1997)

Measure 1 Exceedence Ratio Against Confidence LevelThis measure counts the number of the events for which theloss of an asset exceeds the loss predicted by the normal modelat a given confidence α It was given by Fan and Gu (2003) andcomputed as

ER(σ 2t ) = mminus1

T+msum

i=T+1

I(Yi lt 13minus1(α)σi) (19)

where 13minus1(α) is the α-quantile of the standard normal distribu-tion and m is the size of the out-sample This gives an indicationof how effectively the volatility estimator can be used for pre-diction

As noted by Fan and Gu (2003) one shortcoming of the mea-sure is its large Monte Carlo error Unless the postsample sizem is sufficiently large this measure has difficulty by differenti-ating the performance of various estimators due to the presenceof large error margins Note that the ER depends strongly onthe assumption of normality In our simulation study we usethe true α-quantile of the error distribution instead of 13minus1(α)

in (19) to compute the ER For real data analysis we use theα-quantile of the last 250 residuals for the in-sample data

Measure 2 Mean Absolute Deviation Error To motivate thismeasure we first consider the MSEs

PE(σ 2t ) = mminus1

T+msum

i=T+1

(Y2i minus σ 2

i )2

The expected value can be decomposed as

E(PE) = mminus1T+msum

i=T+1

E(σ 2i minus σ 2

i )2 + mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )2

+ mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )(σ 2i minus σ 2

i ) (20)

Because σ 2i is predicable at time ti and Y2

i minus σ 2i is a martingale

difference the third term is 0 [the drift is assumed to be neg-ligible in the forgoing formulation of PE otherwise the driftterm should be removed first In both cases it has an affec-tion O(2)] Therefore we need consider only the first two

terms The first term reflects the effectiveness of the estimatedvolatility whereas the second term is the size of the stochasticerror and independent of estimators As in all statistical pre-diction problems the second term is usually of an order ofmagnitude larger than the first term Thus a small improvementin PE could mean substantial improvement over the estimatedvolatility However due to the well-known fact that financialtime series contain outliers the MSE is not a robust measureTherefore we used the mean absolute deviation error (MADE)MADE(σ 2

t ) = mminus1 sumT+mi=T+1 |Y2

i minus σ 2i |

Measure 3 Square-Root Absolute Deviation Error An al-ternative to MADE is the Square-root absolute deviation error(RADE) defined as

RADE(σ 2t ) = mminus1

T+msum

i=T+1

∣∣| Yi | minusradic2πσi

∣∣

The constant factor comes from the fact that E|εt| = radic2π for

εt sim N(01) If the underlying error distribution deviates fromnormality then this measure is not robust

Measure 4 Ideal Mean Absolute Deviation Error and Oth-ers To assess the estimation of the volatility in simulations wealso can use the ideal mean absolute deviation error (IMADE)

IMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |

and the ideal square root absolute deviation error (IRADE)

IRADE = mminus1T+msum

i=T+1

|σi minus σi|

Another intuitive measure is the relative IMADE (RIMADE)

RIMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |σ 2

i

This measure may create some outliers when the true volatilityis quite small at certain time points For real data analysis bothmeasures are not applicable

51 Simulations

To assess the performance of the five estimation methodslisted in Table 1 we compute the average and standard devi-ation of each of the four measures over 600 simulations Gen-erally speaking the smaller the average (or the standard devia-tion) the better the estimation approach We also compute theldquoscorerdquo of an estimator which is the percentage of times among600 simulations that the estimator outperforms the average ofthe 7 methods in terms of an effectiveness measure To be morespecific for example consider RiskMetrics using MADE as aneffectiveness measure Let mi be the MADE of the RiskMet-rics estimator at the ith simulation and let mi be the average ofthe MADEs for the five estimators at the ith simulation Thenthe ldquoscorerdquo of the RiskMetrics approach in terms of the MADEis defined as 1

600

sum600i=1 I(mi lt mi) Obviously estimators with

higher scores are preferred In addition we define the ldquorelativelossrdquo of an estimator σ 2

t relative to σ 2It in terms of MADEs as

RLOSS(σ 2t σ 2

It) = MADE(σ 2t )MADE(σ 2

It) minus 1

624 Journal of the American Statistical Association June 2007

where MADE(σ 2t ) is the average of MADE(σ 2

t ) among simula-tions

Example 1 In this example we compare the proposedmethods with the maximum likelihood estimation approachfor a parametric generalized autoregressive conditional het-eroscedasticity [GARCH(1 1)] model This will give an assess-ment of how well our methods perform compared with the mostefficient estimation method

Because the GARCH model does not have a state variablerti but the diffusion model does the diffusion model cannot beused to fit the data simulated from the GARCH model A sen-sible approach is to use the diffusion limit of the GARCH(1 1)model to generate data There are several interesting works onreconciling the two modeling approaches along this line forexample Nelson (1990) established the continuous-time dif-fusion limit for the discrete ARCH model Duan (1997) pro-posed an augmented GARCH model to unify various paramet-ric GARCH models and derived its diffusion limit and Wang(2002) studied the asymptotic relationship between GARCHmodels and diffusions Here we use the result on the diffusionlimit of the GARCH(1 1) model of Wang (2002)

Specifically we first generate data from the GARCH(1 1)model

Yt = σtεt

σ 2t = c0 + a0σ

2tminus1 + b0Y2

tminus1

where the εtrsquos are from the standard normal distribution and thetrue parameters c0 = 498 times 10minus7 a0 = 9289615 and b0 =0411574 are from the fitted values of the parameters for thedaily exchange rate of the Australian dollar with the US dollarfrom January 3 1994 to December 29 2000 The GARCH(1 1)model is then fitted by maximum likelihood estimation

Second we use the diffusion limit (see Wang 2002 sec 23)of the foregoing GARCH model

drt = σt dW1t

dσ 2t = (β0 + β1σ

2t )dt + β2σ

2t dW2t

where W1t and W2t are two independent standard Wienerprocess and the parameters are computed as (β0 β1 β2) =(25896 times 10minus5minus15538 4197) To simulate the diffusionprocess σ 2

t we use the discrete-time order 10 strong ap-proximation scheme of Kloeden Platen Schurz and Soslashrensen(1996)

For each of the two models we generate 600 series of dailydata each with length 1200 For each simulation we set thefirst 900 observations as the ldquoin-samplerdquo data and the last 300observations as the ldquoout-samplerdquo data The results summarizedin Table 2 show that all estimators have acceptable ER valuesclose to 05 The integrated estimator has minimum RIMADEIMADE MADE IRADE and RADE on average among allof the nonparametric estimators Compared with the most ef-ficient maximum likelihood estimator (MLE) of the GARCHmodel the integrated estimator has smaller MADE but largerRIMADE IMADE and IRADE The RADEs of the MLEand the integrated estimator are very close demonstrating thatthe integrated estimator is very impressive in forecasting thevolatility of this example

Note that the dynamic weights for the integrated estimatorare estimated by minimizing the variance of the combined esti-mator in (1) As one anonymous referee pointed out one maychoose to minimize the MSE instead which seems to be dif-ficult and computationally intensive in current situations Asargued in Section 31 the biases and variances trade-off havebeen considered indirectly in the time-domain and state-domainestimation methods The bias of the integrated estimator wascontrolled before the weights were chosen

To investigate whether the foregoing intuition is correct herewe opt for the suggestion from the referee to visually displayhow the integrated estimator dominates the state-domaintime-domain estimators over time through the dynamic weights andcompare the biases of the state-domain estimator and the inte-grated estimator The weights and biases are shown in Figure 2

Table 2 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 4073 2805 6494 5092 7379 7629 4090Average 205 204 183 187 178 169 204

Standard (times10minus2) 655 331 450 308 397 367 567Relative loss () 1544 1444 266 497 0 minus476 1475

IMADE Score () 6177 2554 5342 4608 7212 7546 4090Average (times10minus5) 332 367 330 336 318 304 362Standard (times10minus6) 1118 744 877 690 851 800 1113Relative loss () 440 1544 373 581 0 minus447 1390

MADE Score () 2821 4958 4975 5659 8815 5743 7963Average (times10minus4) 164 153 153 152 149 152 149Standard (times10minus6) 2314 2182 2049 2105 1869 1852 1876Relative loss () 1016 284 293 209 0 207 11

IRADE Score () 6277 2404 5109 4708 7062 7713 4040Average (times10minus3) 403 448 407 410 391 369 446Standard (times10minus3) 121 075 092 070 093 085 131Relative loss () 317 1475 411 498 0 minus553 1422

RADE Score () 3055 5042 4341 5910 8464 5893 7913Average (times10minus2) 20 19 19 19 19 19 19Standard (times10minus4) 153 145 141 142 134 133 137Relative loss () 513 142 165 96 0 106 minus25

ER Average (times10minus2) 495 545 538 528 525 516 500Standard (times10minus2) 157 113 128 112 136 125 149

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 625

(a) (b)

Figure 2 The Dynamic Weights for Prediction for Example 1 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

It is seen that the bias is actually controlled because both of theestimators have very close biases The weights are around 2and through the weights the integrated method improves boththe RiskMetrics method and the state-domain approach

Example 2 To simulate the interest rate data we considerthe CoxndashIngersollndashRoss (CIR) model

drt = κ(θ minus rt)dt + σ r12t dWt t ge t0

where the spot rate rt moves around a central location or long-run equilibrium level θ = 08571 at speed κ = 21459 The σ

is set at 07830 These parameters values given by Chapmanand Pearson (2000) satisfy the condition 2κθ ge σ 2 so that theprocess rt is stationary and positive The model has been studiedby Chapman and Pearson (2000) and Fan and Zhang (2003)

We simulated weekly data (600 runs) from the CIR modelFor each simulation we designated the first 900 observations

as the ldquoin-samplerdquo data and the last 300 observations the ldquoout-samplerdquo data The results summarized in Table 3 show that theintegrated method performs closely to the state-domain methodThis is mainly because the dynamic weights shown in Figure 3are very small and the integration estimator is very close to thestate-domain estimator supporting our statement at the end ofthe third paragraph in Section 1 Table 3 shows that the inte-grated estimator uniformly dominates the other estimators onaverage due to its highest score and lowest averaged RIMADEIMADE MADE IRADE and RADE The improvement inRIMADE IMADE and IRADE is about 100 This demon-strates that our integrated volatility method better captures thevolatility dynamics The historical simulation method performspoorly due to misspecification of the function of the volatilityparameter The results here show the advantage of aggregatingthe information of the time and state domains Note that all es-timators have reasonable ER values at level 05 and that the ERvalue of the integrated estimator is closest to 05

Table 3 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 501 1820 2104 3656 9933 2404 9917Average (times10minus1) 267 190 188 164 61 202 62Standard (times10minus1) 148 35 81 31 39 130 43Relative loss () 33732 21074 20798 16821 0 23012 112

IMADE Score () 768 1319 1469 2972 9967 2120 9950Average (times10minus6) 237 208 191 179 64 196 63Standard (times10minus6) 100 78 69 68 45 91 46Relative loss () 27162 22558 19905 18107 0 20696 minus163

MADE Score () 3823 5209 5776 5609 7145 5392 6945Average (times10minus5) 102 93 93 93 91 94 92Standard (times10minus6) 349 333 321 330 317 310 317Relative loss () 1100 212 221 162 0 229 21

IRADE Score () 818 1252 1402 3072 9983 2104 9950Average (times10minus4) 376 322 307 277 99 316 98Standard (times10minus4) 140 75 90 66 60 192 62Relative loss () 27855 22397 20898 17892 0 21821 minus111

RADE Score () 4040 5209 5042 5593 7379 4875 7279Average (times10minus2) 15 15 15 15 15 15 15Standard (times10minus4) 272 267 257 265 258 251 257Relative loss () 608 123 160 92 0 189 06

ER Average 056 055 054 053 049 053 049Standard 023 011 013 011 013 036 013

626 Journal of the American Statistical Association June 2007

(a) (b)

Figure 3 The Dynamic Weights for Prediction for Example 2 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and forthe Integration Estimator ( mdash) (b)

Again Figure 3(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

Example 3 To assess the performance of the proposed meth-ods under a nonstationary situation we now consider the geo-metric Brownian (GBM)

drt = μrt dt + σ rt dWt

where Wt is a standard one-dimensional Brownian motion Thisis a nonstationary process against which we check whether ourmethod continues to apply Note that the celebrated BlackndashScholes option price formula is derived from Osbornersquos as-sumption that the stock price follows the GBM model By Itocircrsquosformula we have

log rt minus log r0 = (μ minus σ 22)t + σ 2Wt

We set μ = 03 and σ = 26 in our simulations With theBrownian motion simulated from independent Gaussian incre-ments we can generate the samples for the GBM We simulate600 times with = 1252 For each simulation we generate1000 observations and use the first two-thirds of the observa-tions as in-sample data and the remaining observations as out-sample data

Because of the nonstationarity of the process the simulationresults are somewhat unstable with a few uncommon realiza-tions appearing Therefore we use a robust method to sum-marize the results For each measure we compute the relativeloss and the median along with the median absolute deviation(MAD) for scale estimation The MAD for sample ais

i=1 isdefined as MAD(a) = median|ai minus median(a)| i = 1 sIt is easy to see that MAD(a) provides a robust estimator ofthe standard deviation of ai Table 4 reports the results Thetable shows that the integrated estimator performs quite wellAll estimators have acceptable ER values The proposed esti-

Table 4 Robust Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 733 4300 3583 7283 6133 2733 5233Median (times10minus1) 2254 1561 1770 1490 1382 2036 1443

MAD 035 012 024 012 034 036 037Relative loss () 6310 1296 2805 781 0 4732 444

IMADE Score () 1333 4333 2633 7333 6517 2550 5300Median (times10minus7) 217 174 193 163 146 265 154MAD (times10minus8) 566 474 486 445 322 817 357

Relative loss () 4864 1913 3211 1205 0 8145 561

MADE Score () 3900 3950 2650 4933 5200 4233 5133Median (times10minus6) 115 111 111 110 109 111 109MAD (times10minus7) 236 244 240 244 231 230 230

Relative loss () 592 143 181 123 0 174 29

Score () 1417 4317 2500 7317 6467 2683 5300IRADE Median (times10minus4) 346 267 310 255 251 378 266

MAD (times10minus5) 599 406 488 389 566 847 644Relative loss () 3787 649 2358 149 0 5037 591

RADE Score () 3900 3700 2433 5117 6150 2950 5750Median (times10minus4) 1581 1509 1515 1508 1504 1661 1503MAD (times10minus4) 185 201 198 200 193 257 192

Relative loss () 509 30 73 23 0 1042 minus07

ER Median 054 051 054 051 048 057 0480MAD 010 003 004 003 006 007 006

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 627

(a) (b)

Figure 4 The Dynamic Weights for Prediction for Example 3 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

mators have the smallest measure values in median or robustscale This shows that our integrated method continues to per-form better than the others for this nonstationary case

Again Figure 4(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

52 Real Examples

In this section we apply the integrated volatility estimationmethods and other methods to the analysis of real financial data

521 Treasury Bond Here we consider the weekly returnsof three Treasury Bonds with terms of 1 5 and 10 years Thein-sample data are for January 4 1974ndashDecember 30 1994 andout-sample data are for January 6 1995ndashAugust 8 2003 Thetotal sample size is 1545 and the in-sample size is 1096 Theresults are reported in Table 5

From Table 5 for the 1-year Treasury Notes the integratedestimator is of the smallest MADE and the smallest RADE re-flecting that the integrated estimation method of the volatilityis the best among the seven methods For the bonds with 5-or 10-year maturities the seven estimators have close MADEsand RADEs where the historical simulation method is bet-ter than the RiskMetrics in terms of MADE and RADE andthe integrated estimation approach has the smallest MADEsIn addition the integrated estimator has ER values closest to

05 This demonstrates the advantage of using state-domaininformation which can improve the time-domain predictionof the volatilities in the interest dynamics The nonparametricBayesian method does not perform as well as the integrated es-timator because it uses fixed weights and the inverse-gammaprior which may not be true in practice

522 Exchange Rate We analyze the daily exchange rateof several foreign currencies against the US dollar The dataare for January 3 1994ndashAugust 1 2003 The in-sample dataconsist of the observations before January 1 2001 and the out-sample data consist of the remaining observations The resultsreported in Table 6 show that the integrated estimator has thesmallest MADEs and moderate RADEs for the exchange ratesNote that the RADE measure is effective only when the dataseem to be from a normal distribution and thus cannot calibratethe accuracy of volatility forecast for data from an unknowndistribution For this reason the RADE might not be a goodmeasure of performance for this example Both the integratedvolatility estimation and GARCH perform nicely for this exam-ple They outperform other nonparametric methods

Figure 5 reports the dynamic weights for the daily exchangerate of the UK pound to the US dollar Because the RiskMestimator is far more informative than the StaDo estimator ourintegrated estimator becomes basically the time-domain esti-mator by automatically choosing large dynamic weights Thisexemplifies our statement in the paragraph immediately before(1) in Section 1

Table 5 Comparisons of Several Volatility Estimation Methods

Term Measure (times10minus2) Hist RiskM Semi NonBay Integ GARCH StaDo

1 year MADE 1044 787 787 794 732 893 914RADE 5257 4231 4256 4225 4105 4402 4551

ER 2237 2009 2013 1562 3795 0 7366

5 years MADE 1207 1253 1296 1278 1201 1245 1638RADE 5315 5494 5630 5563 5571 5285 6543

ER 671 1339 1566 1112 5804 0 8929

10 years MADE 1041 1093 1103 1111 1018 1046 1366RADE 4939 5235 5296 5280 5152 4936 6008

ER 1112 1563 1790 1334 4911 0 6920

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 3: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

620 Journal of the American Statistical Association June 2007

is to estimate the volatility σ 2t equiv σ 2(rt) Let Yi = minus12(rti+1 minus

rti) Then for model (3) the Euler approximation scheme is

Yi asymp μ(rti)12 + σ(rti)εi (4)

where εiiidsim N(01) for i = 0 N minus 1 Fan and Zhang (2003)

studied the impact of the order of difference on statistical es-timation They found that although a higher order can possi-bly reduce approximation errors it substantially increases thevariances of data They recommended the Euler scheme (4) formost practical situations

To facilitate the derivation of mathematical theory we focuson the estimation of volatility in model (3) to illustrate howto deal with the problem of dynamic integration Asymptoticnormality of the proposed estimator is established under thismodel and extensive simulations are conducted This theoret-ically and empirically demonstrates the dominant performanceof the integrated estimation Our method focuses on only theestimation of volatility but it can be adapted to other estima-tion problems such as the value at risk studied by Duffie andPan (1997) the drift estimation for diffusion considered bySpokoiny (2000) and conditional moments conditional corre-lation conditional distribution and derivative pricing It is alsoapplicable to other estimation problems in time series such asforecasting the mean function Further studies along these linesare beyond the scope of the current investigation

2 ESTIMATION OF VOLATILITY

From here on for theoretical derivations we assume that thedata rti are sampled from the diffusion model (3) As demon-strated by Stanton (1997) and Fan and Zhang (2003) the driftterm in (4) contributes to the volatility in the order of o(1) undercertain conditions when rarr 0 (eg Stanton 1997) Thus forsimplicity we sometimes ignore the drift term when estimatingthe volatility

21 Time-Domain Estimator

A popular version of the time-domain estimator of thevolatility is the moving average estimator

σ 2MAt = nminus1

tminus1sum

i=tminusn

Y2i (5)

where n is the size of the moving window This estimatorignores the drift component and uses local n data points Anextension of the moving average estimator is the exponentialsmoothing estimation given by

σ 2ESt = (1 minus λ)Y2

tminus1 + λσ 2EStminus1

= (1 minus λ)Y2tminus1 + λY2

tminus2 + λ2Y2tminus3 + middot middot middot (6)

where λ is a smoothing parameter between 0 and 1 that controlsthe size of the local neighborhood This estimator is closelyrelated to that from the GARCH(1 1) model (see Fan JiangZhang and Zhou 2003 for more details) The RiskMetrics ofJ P Morgan (1996) which is used for measuring the risk calledvalue at risk (VaR) of financial assets recommends λ = 94and λ = 97 for calculating the VaR of the daily and monthlyreturns

The exponential smoothing estimator in (6) is a weightedsum of the squared returns before time t Because the weightdecays exponentially it essentially uses recent data A slightlymodified version that explicitly uses only n data points beforetime t is

σ 2ESt = 1 minus λ

1 minus λn

nsum

i=1

Y2tminusiλ

iminus1 (7)

where the smoothing parameter λ depends on n When λ rarr 1it becomes the moving average estimator (5)

Theorem 1 Suppose that σ 2t gt 0 Under conditions (C1)

and (C2) if n rarr infin and n rarr 0 then σ 2ESt minus σ 2

t rarr 0 almostsurely Moreover if the limit c = limnrarrinfin n(1 minus λ) exists andn(2pminus1)(4pminus1) rarr 0 where p is as specified in condition (C2)then

radicn[σ 2

ESt minus σ 2t ]s1t

Dminusrarr N (01)

where s21t = cσ 4

tec+1ecminus1

Theorem 1 has very interesting implications We can com-pute the variance as if the data were independent Indeed if thedata in (7) were independent and locally homogeneous then

var(σ 2ESt) asymp (1 minus λ)2

(1 minus λn)22σ 4

t

nsum

i=1

λ2(iminus1) asymp 1

ns2

1t

This is indeed the asymptotic variance given in Theorem 1

22 Estimation in State Domain

To obtain the nonparametric estimation of the functionsf (x) = 12μ(x) and σ 2(x) in (4) we use the local linearsmoother studied by Ruppert Wand Holst and Houmlssjer (1997)and Fan and Yao (1998) The local linear technique is chosenfor its several nice properties including asymptotic minimaxefficiency and design adaptation Furthermore it automaticallycorrects edge effects and facilitates bandwidth selections (Fanand Yao 2003)

Note that the historical data used in the state-domain estima-tion at time t are (rti Yi) i = 0 N minus 1 Let f (x) = α1 bethe local linear estimator that solves the weighted least squaresproblem

(α1 α2) = arg minα1α2

Nminus1sum

i=0

[Yi minus α1 minus α2

(rti minus x

)]2Kh1

(rti minus x

)

where K(middot) is a kernel function and h1 gt 0 is a bandwidth De-note the squared residuals by Ri = Yi minus f (rti)2 Then the locallinear estimator of σ 2(x) is σ 2

S (x) = β0 given by

(β0 β1) = arg minβ0β1

Nminus1sum

i=0

Ri minus β0 minus β1

(rti minus x

)2Wh

(rti minus x

)

(8)with a kernel function W and a bandwidth h Fan and Yao(1998) gave strategies for bandwidth selection Stanton (1997)and Fan and Zhang (2003) showed that Y2

i instead of Ri in (8)also can be used for the estimation of σ 2(x)

The asymptotic bias and variance of σ 2S (x) have been given

by Fan and Zhang (2003 thm 4) Set νj = intujW2(u)du for

j = 012 and let p(middot) be the invariant density function of the

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 621

Markov process rs from (3) We then have the following re-sult

Theorem 2 Set s22(x) = 2ν0σ

4(x)p(x) Let x be in the inte-rior of the support of p(middot) Suppose that the second derivativesof μ(middot) and σ 2(middot) exist in a neighborhood of x Under conditions

(C3)ndash(C6)radic

Nh[σ 2S (x) minus σ 2(x)]s2(x)

Dminusrarr N (01)

3 DYNAMIC INTEGRATION OF TIMEndash ANDSTATEndashDOMAIN ESTIMATORS

In this section we first show how the optimal dynamicweights in (2) can be estimated and then prove that the time-domain and state-domain estimators are indeed asymptoticallyindependent

31 Estimation of Dynamic Weights

For the exponential smoothing estimator in (7) we can applythe asymptotic formula given in Theorem 1 to get an estimate ofits asymptotic variance But because the estimator is a weightedaverage of Y2

tminusi we also can obtain its variance directly by as-suming that Ytminusj sim N(0 σ 2

t ) for small j Indeed with the fore-going local homogeneous model we have

var(σ 2ESt)

asymp (1 minus λ)2

(1 minus λn)22σ 4

t

nsum

i=1

nsum

j=1

λi+jminus2ρt(|i minus j|)

= 2(1 minus λ)2σ 4t

(1 minus λn)2

n + 2

nminus1sum

k=1

ρt(k)λk(1 minus λ2(nminusk))

1 minus λ2

(9)

where ρt(k) = cor(Y2t Y2

tminusk) is the autocorrelation of the seriesY2

tminusk The autocorrelation can be estimated from the data inhistory Note that due to the locality of the exponential smooth-ing only ρt(k)rsquos with the first 30 lags say contribute to thevariance calculation

We now turn to estimate the variance of σ 2St = σ 2

S (rt) Detailsof this have been given by Fan and Yao (1998 2003 sec 62)Let

Vj(x) =Nminus1sum

i=1

(rti minus x)jW((

rti minus x)h1

)(10)

and ξi(x) = W(rti minusx

h1)V2(x) minus (rti minus x)V1(x)V0(x)V2(x) minus

V1(x)2 Then the local linear estimator can be expressed asσ 2

S (x) = sumNminus1i=1 ξi(x)Ri and its variance can be approximated

as

var(σ 2S (x)) asymp 2σ 4(x)

Nminus1sum

i=1

ξ2i (x) (11)

Substituting (9) and (11) into (2) we propose to combinethe time-domain and state-domain estimators with the dynamicweight

wt = σ 4St

sumNminus1i=1 ξ2

i (rt)

σ 4St

sumNminus1i=1 ξ2

i (rt) + ctσ4ESt

(12)

where ct = (1minusλ)2

(1minusλn)2 n+2sumnminus1

k=1 ρt(k)λk(1minusλ2(nminusk))(1minusλ2)For practical implementation we truncate the series ρt(k)tminus1

k=1

in the summation as ρt(k)30k=1 This results in the dynamically

integrated estimator

σ 2It = wtσ

2ESt + (1 minus wt)σ

2St (13)

where σ 2St = σ 2

S (rt) The function σ 2S (middot) uses historical data

up to the time t and we need to update this function as timeevolves Fortunately we need know only the function value atthe point rt which significantly reduces the computational costThe computational cost can be further reduced if we update theestimated function σ 2

St on a prescribed time schedule (eg onceevery 2 months for weekly data)

Finally we note that in the choice of weight only the vari-ance of the estimated volatility is considered not the meansquared error (MSE) This is mainly to facilitate the choice ofdynamic weights Because the smoothing parameters in σ 2

ESt

and σ 2S (x) have been tuned to optimize their performance sep-

arately their bias and variance trade-offs have been consideredindirectly Thus controlling the variance of the integrated esti-mator σ 2

It also controls to some extent the bias of the estima-tor

32 Sampling Properties

The fundamental component in the choice of dynamicweights is the asymptotic independence between the time- andstate-domain estimators The following theorem shows that thetime- and state-domain estimators are indeed asymptotically in-dependent To facilitate the expression of notations we presentthe result at the current time tN

Theorem 3 Suppose that the second derivatives of μ(middot) andσ 2(middot) exist in a neighborhood of rtN Under conditions (C1) and(C3)ndash(C6) if condition (C2) holds at time point t = tN then

(a) Asymptotic independence

[radicn(σ 2

EStNminus σ 2

tN )

s1tN

radicNh(σ 2

StNminus σ 2

tN )

s2tN

]T Dminusrarr N (0 I2)

where s1tN and s2tN = s2(rtN ) are as given in Theorems 1 and 2(b) Asymptotic normality of σ 2

tN with wtN in (2) If the

limit d = limNrarrinfin n[Nh] exists thenradic

Nhω[σ 2tN minus σ 2

tN ] DminusrarrN (01) where ω = w2

tN s21tN

d + (1 minus wtN )2s22tN

(c) Asymptotic normality of σ 2ItN

with an estimated weightwtN If the weight wtN converges to the weight wtN in proba-bility and the limit d = limNrarrinfin n[Nh] exists then

radicNhω times

(σ 2ItN

minus σ 2tN )

Dminusrarr N (01)

Because the estimators σ 2St and σ 2

ESt are consistent

wt asympsumtminus1

i=1 ξ2i (rt)

sumtminus1i=1 ξ2

i (rt) + ctasymp var(σ 2

S (rt))

var(σ 2S (rt)) + var(σ 2

ESt(rt))= wt

and hence the condition in Theorem 3(c) holds Note that thetheoretically optimal weight minimizing the variance in Theo-rem 3(b) is

wtN opt = ds22tN

s21tN

+ ds22tN

asymp var(σ 2S (rt))

var(σ 2S (rt)) + var(σ 2

ESt(rt))asymp wtN

622 Journal of the American Statistical Association June 2007

It follows that the integrated estimator σ 2ItN

is optimal in thesense that it achieves the minimum variance among all of theweighted estimators in (1)

When 0 lt d lt infin in Theorem 3 the effective sample sizesin both time and state domains are comparable thus neitherthe time-domain estimator nor the state-domain estimator dom-inates From Theorem 3 based on the optimal weight the as-ymptotic relative efficiencies of σ 2

ItNwith respect to σ 2

StNand

σ 2EStN

are

eff(σ 2ItN σ 2

StN ) = 1 + ds22tN s2

1tN and

eff(σ 2ItN σ 2

EStN ) = 1 + s21tN (ds2

2tN )

which are gt1 This demonstrates that the integrated estimatorσ 2

ItNis more efficient than the time-domain and state-domain

estimators

4 BAYESIAN INTEGRATION OFVOLATILITY ESTIMATES

Another possible approach is to consider the historical infor-mation as the prior and to incorporate it into the estimation ofvolatility using the Bayesian framework We now explore suchan approach

41 Bayesian Estimation of Volatility

The Bayesian approach is to consider the recent data Ytminusn

Ytminus1 as an independent sample from N(0 σ 2) [see (4)] andto consider the historical information being summarized in aprior To incorporate the historical information we assume thatthe variance σ 2 follows an inverse-gamma distribution with pa-rameters a and b which has the density function

f (σ 2) = baminus1(a)σ 2minus(a+1)exp(minusbσ 2)

Write σ 2 sim IG(ab) It is well known that

E(σ 2) = b

a minus 1

var(σ 2) = b2

(a minus 1)2(a minus 2) and (14)

mode(σ 2) = b

a + 1

The hyperparameters a and b are estimated from historical datausing the state-domain estimators

It can be easily shown that given Y = (Ytminusn Ytminus1) theposterior density of σ 2 is IG(alowastblowast) where alowast = a + n

2 andblowast = 1

2

sumni=1 Y2

tminusi + b From (14) the Bayesian mean of σ 2 is

σ 2 = blowast

alowast minus 1=

nsum

i=1

Y2tminusi + 2b

2(a minus 1) + n

This Bayesian estimator can be easily written as

σ 2B = n

n + 2(a minus 1)σ 2

MAt + 2(a minus 1)

n + 2(a minus 1)σ 2

P (15)

where σ 2MAt is the moving average estimator given by (5) and

σ 2P = b(aminus1) is the prior mean determined from the historical

data This combines the estimate based on the data and priorknowledge

The Bayesian estimator (15) uses the local average of n datapoints To incorporate the exponential smoothing estimator (6)we consider it the local average of nlowast = sumn

i=1 λiminus1 = 1minusλn

1minusλdata

points This leads to the following integrated estimator

σ 2Bt = nlowast

nlowast + 2(a minus 1)σ 2

ESt + 2(a minus 1)

2(a minus 1) + nlowast σ 2P

= 1 minus λn

1 minus λn + 2(a minus 1)(1 minus λ)σ 2

ESt

+ 2(a minus 1)(1 minus λ)

1 minus λn + 2(a minus 1)(1 minus λ)σ 2

P (16)

In particular when λ rarr 1 the estimator (16) reduces to (15)

42 Estimation of Prior Parameters

A reasonable source for the prior information in (16) is thehistorical data up to time t Thus the hyperparameters a and bshould depend on t and can be used to match with the momentsfrom the historical information Using the approximation model(4) we have

E[(Yt minus f (rt))

2 | rt] asymp σ 2(rt) and

(17)var

[(Yt minus f (rt))

2 | rt] asymp 2σ 4(rt)

These can be estimated from the historical data up to time tnamely the state-domain estimator σ 2

S (rt) Because we have as-sumed that the prior distribution for σ 2

t is IG(atbt) by (17) andthe method of moments we would get the following estimationequations by matching the moments from the prior distributionwith the moment estimation from the historical estimate

E(σ 2t ) = σ 2

S (rt) and var(σ 2t ) = 2σ 4

S (rt)

This together with (14) leads to

bt

at minus 1= σ 2

S (rt) andb2

t

(at minus 1)2(at minus 2)= 2σ 4

S (rt)

where E represents from the expectation with respect to theprior Solving the foregoing equations we obtain

at = 25 and bt = 15σ 2S (rt)

Substituting this into (16) we obtain the estimator

σ 2Bt = 1 minus λn

1 minus λn + 3(1 minus λ)σ 2

ESt +3(1 minus λ)

1 minus λn + 3(1 minus λ)σ 2

St (18)

Unfortunately the weights in (18) are static and do not dependon the time t Thus the Bayesian method that we use doesnot produce a satisfactory answer to this problem Howeverother implementations of the Bayesian method may yield thedynamic weights

5 SIMULATION STUDIES AND EXAMPLES

To facilitate the presentation we use the simple abbrevia-tions in Table 1 to denote seven volatility estimation methodsDetails on the first three methods have been given by Fan andGu (2003) In particular the first method is to estimate volatilityusing the standard deviation of the yields in the past year andthe RiskMetrics method is based on the exponential smooth-ing with λ = 94 The semiparametric method of Fan and Gu(2003) is an extension of a local model used in the exponential

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 623

Table 1 Seven Volatility Estimators

Hist The historical methodRiskM The RiskMetrics method of J P MorganSemi The semiparametric estimator (SEV) of Fan and Gu (2003)NonBay The nonparametric Bayesian method in (18)Integ The integration method of time and state domains in (13)GARCH The maximum likelihood method based on a GARCH(1 1)

modelStaDo The state-domain estimation method in Section 22

smoothing with the smoothing parameter determined by min-imizing the prediction error It includes exponential smoothingwith λ selected by data as a specific example For our integratedmethods the parameter λ is taken as 94

The following five measures are used to assess the per-formance of different procedures for estimating the volatilityOther related measures also can be used (see Daveacute and Stahl1997)

Measure 1 Exceedence Ratio Against Confidence LevelThis measure counts the number of the events for which theloss of an asset exceeds the loss predicted by the normal modelat a given confidence α It was given by Fan and Gu (2003) andcomputed as

ER(σ 2t ) = mminus1

T+msum

i=T+1

I(Yi lt 13minus1(α)σi) (19)

where 13minus1(α) is the α-quantile of the standard normal distribu-tion and m is the size of the out-sample This gives an indicationof how effectively the volatility estimator can be used for pre-diction

As noted by Fan and Gu (2003) one shortcoming of the mea-sure is its large Monte Carlo error Unless the postsample sizem is sufficiently large this measure has difficulty by differenti-ating the performance of various estimators due to the presenceof large error margins Note that the ER depends strongly onthe assumption of normality In our simulation study we usethe true α-quantile of the error distribution instead of 13minus1(α)

in (19) to compute the ER For real data analysis we use theα-quantile of the last 250 residuals for the in-sample data

Measure 2 Mean Absolute Deviation Error To motivate thismeasure we first consider the MSEs

PE(σ 2t ) = mminus1

T+msum

i=T+1

(Y2i minus σ 2

i )2

The expected value can be decomposed as

E(PE) = mminus1T+msum

i=T+1

E(σ 2i minus σ 2

i )2 + mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )2

+ mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )(σ 2i minus σ 2

i ) (20)

Because σ 2i is predicable at time ti and Y2

i minus σ 2i is a martingale

difference the third term is 0 [the drift is assumed to be neg-ligible in the forgoing formulation of PE otherwise the driftterm should be removed first In both cases it has an affec-tion O(2)] Therefore we need consider only the first two

terms The first term reflects the effectiveness of the estimatedvolatility whereas the second term is the size of the stochasticerror and independent of estimators As in all statistical pre-diction problems the second term is usually of an order ofmagnitude larger than the first term Thus a small improvementin PE could mean substantial improvement over the estimatedvolatility However due to the well-known fact that financialtime series contain outliers the MSE is not a robust measureTherefore we used the mean absolute deviation error (MADE)MADE(σ 2

t ) = mminus1 sumT+mi=T+1 |Y2

i minus σ 2i |

Measure 3 Square-Root Absolute Deviation Error An al-ternative to MADE is the Square-root absolute deviation error(RADE) defined as

RADE(σ 2t ) = mminus1

T+msum

i=T+1

∣∣| Yi | minusradic2πσi

∣∣

The constant factor comes from the fact that E|εt| = radic2π for

εt sim N(01) If the underlying error distribution deviates fromnormality then this measure is not robust

Measure 4 Ideal Mean Absolute Deviation Error and Oth-ers To assess the estimation of the volatility in simulations wealso can use the ideal mean absolute deviation error (IMADE)

IMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |

and the ideal square root absolute deviation error (IRADE)

IRADE = mminus1T+msum

i=T+1

|σi minus σi|

Another intuitive measure is the relative IMADE (RIMADE)

RIMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |σ 2

i

This measure may create some outliers when the true volatilityis quite small at certain time points For real data analysis bothmeasures are not applicable

51 Simulations

To assess the performance of the five estimation methodslisted in Table 1 we compute the average and standard devi-ation of each of the four measures over 600 simulations Gen-erally speaking the smaller the average (or the standard devia-tion) the better the estimation approach We also compute theldquoscorerdquo of an estimator which is the percentage of times among600 simulations that the estimator outperforms the average ofthe 7 methods in terms of an effectiveness measure To be morespecific for example consider RiskMetrics using MADE as aneffectiveness measure Let mi be the MADE of the RiskMet-rics estimator at the ith simulation and let mi be the average ofthe MADEs for the five estimators at the ith simulation Thenthe ldquoscorerdquo of the RiskMetrics approach in terms of the MADEis defined as 1

600

sum600i=1 I(mi lt mi) Obviously estimators with

higher scores are preferred In addition we define the ldquorelativelossrdquo of an estimator σ 2

t relative to σ 2It in terms of MADEs as

RLOSS(σ 2t σ 2

It) = MADE(σ 2t )MADE(σ 2

It) minus 1

624 Journal of the American Statistical Association June 2007

where MADE(σ 2t ) is the average of MADE(σ 2

t ) among simula-tions

Example 1 In this example we compare the proposedmethods with the maximum likelihood estimation approachfor a parametric generalized autoregressive conditional het-eroscedasticity [GARCH(1 1)] model This will give an assess-ment of how well our methods perform compared with the mostefficient estimation method

Because the GARCH model does not have a state variablerti but the diffusion model does the diffusion model cannot beused to fit the data simulated from the GARCH model A sen-sible approach is to use the diffusion limit of the GARCH(1 1)model to generate data There are several interesting works onreconciling the two modeling approaches along this line forexample Nelson (1990) established the continuous-time dif-fusion limit for the discrete ARCH model Duan (1997) pro-posed an augmented GARCH model to unify various paramet-ric GARCH models and derived its diffusion limit and Wang(2002) studied the asymptotic relationship between GARCHmodels and diffusions Here we use the result on the diffusionlimit of the GARCH(1 1) model of Wang (2002)

Specifically we first generate data from the GARCH(1 1)model

Yt = σtεt

σ 2t = c0 + a0σ

2tminus1 + b0Y2

tminus1

where the εtrsquos are from the standard normal distribution and thetrue parameters c0 = 498 times 10minus7 a0 = 9289615 and b0 =0411574 are from the fitted values of the parameters for thedaily exchange rate of the Australian dollar with the US dollarfrom January 3 1994 to December 29 2000 The GARCH(1 1)model is then fitted by maximum likelihood estimation

Second we use the diffusion limit (see Wang 2002 sec 23)of the foregoing GARCH model

drt = σt dW1t

dσ 2t = (β0 + β1σ

2t )dt + β2σ

2t dW2t

where W1t and W2t are two independent standard Wienerprocess and the parameters are computed as (β0 β1 β2) =(25896 times 10minus5minus15538 4197) To simulate the diffusionprocess σ 2

t we use the discrete-time order 10 strong ap-proximation scheme of Kloeden Platen Schurz and Soslashrensen(1996)

For each of the two models we generate 600 series of dailydata each with length 1200 For each simulation we set thefirst 900 observations as the ldquoin-samplerdquo data and the last 300observations as the ldquoout-samplerdquo data The results summarizedin Table 2 show that all estimators have acceptable ER valuesclose to 05 The integrated estimator has minimum RIMADEIMADE MADE IRADE and RADE on average among allof the nonparametric estimators Compared with the most ef-ficient maximum likelihood estimator (MLE) of the GARCHmodel the integrated estimator has smaller MADE but largerRIMADE IMADE and IRADE The RADEs of the MLEand the integrated estimator are very close demonstrating thatthe integrated estimator is very impressive in forecasting thevolatility of this example

Note that the dynamic weights for the integrated estimatorare estimated by minimizing the variance of the combined esti-mator in (1) As one anonymous referee pointed out one maychoose to minimize the MSE instead which seems to be dif-ficult and computationally intensive in current situations Asargued in Section 31 the biases and variances trade-off havebeen considered indirectly in the time-domain and state-domainestimation methods The bias of the integrated estimator wascontrolled before the weights were chosen

To investigate whether the foregoing intuition is correct herewe opt for the suggestion from the referee to visually displayhow the integrated estimator dominates the state-domaintime-domain estimators over time through the dynamic weights andcompare the biases of the state-domain estimator and the inte-grated estimator The weights and biases are shown in Figure 2

Table 2 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 4073 2805 6494 5092 7379 7629 4090Average 205 204 183 187 178 169 204

Standard (times10minus2) 655 331 450 308 397 367 567Relative loss () 1544 1444 266 497 0 minus476 1475

IMADE Score () 6177 2554 5342 4608 7212 7546 4090Average (times10minus5) 332 367 330 336 318 304 362Standard (times10minus6) 1118 744 877 690 851 800 1113Relative loss () 440 1544 373 581 0 minus447 1390

MADE Score () 2821 4958 4975 5659 8815 5743 7963Average (times10minus4) 164 153 153 152 149 152 149Standard (times10minus6) 2314 2182 2049 2105 1869 1852 1876Relative loss () 1016 284 293 209 0 207 11

IRADE Score () 6277 2404 5109 4708 7062 7713 4040Average (times10minus3) 403 448 407 410 391 369 446Standard (times10minus3) 121 075 092 070 093 085 131Relative loss () 317 1475 411 498 0 minus553 1422

RADE Score () 3055 5042 4341 5910 8464 5893 7913Average (times10minus2) 20 19 19 19 19 19 19Standard (times10minus4) 153 145 141 142 134 133 137Relative loss () 513 142 165 96 0 106 minus25

ER Average (times10minus2) 495 545 538 528 525 516 500Standard (times10minus2) 157 113 128 112 136 125 149

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 625

(a) (b)

Figure 2 The Dynamic Weights for Prediction for Example 1 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

It is seen that the bias is actually controlled because both of theestimators have very close biases The weights are around 2and through the weights the integrated method improves boththe RiskMetrics method and the state-domain approach

Example 2 To simulate the interest rate data we considerthe CoxndashIngersollndashRoss (CIR) model

drt = κ(θ minus rt)dt + σ r12t dWt t ge t0

where the spot rate rt moves around a central location or long-run equilibrium level θ = 08571 at speed κ = 21459 The σ

is set at 07830 These parameters values given by Chapmanand Pearson (2000) satisfy the condition 2κθ ge σ 2 so that theprocess rt is stationary and positive The model has been studiedby Chapman and Pearson (2000) and Fan and Zhang (2003)

We simulated weekly data (600 runs) from the CIR modelFor each simulation we designated the first 900 observations

as the ldquoin-samplerdquo data and the last 300 observations the ldquoout-samplerdquo data The results summarized in Table 3 show that theintegrated method performs closely to the state-domain methodThis is mainly because the dynamic weights shown in Figure 3are very small and the integration estimator is very close to thestate-domain estimator supporting our statement at the end ofthe third paragraph in Section 1 Table 3 shows that the inte-grated estimator uniformly dominates the other estimators onaverage due to its highest score and lowest averaged RIMADEIMADE MADE IRADE and RADE The improvement inRIMADE IMADE and IRADE is about 100 This demon-strates that our integrated volatility method better captures thevolatility dynamics The historical simulation method performspoorly due to misspecification of the function of the volatilityparameter The results here show the advantage of aggregatingthe information of the time and state domains Note that all es-timators have reasonable ER values at level 05 and that the ERvalue of the integrated estimator is closest to 05

Table 3 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 501 1820 2104 3656 9933 2404 9917Average (times10minus1) 267 190 188 164 61 202 62Standard (times10minus1) 148 35 81 31 39 130 43Relative loss () 33732 21074 20798 16821 0 23012 112

IMADE Score () 768 1319 1469 2972 9967 2120 9950Average (times10minus6) 237 208 191 179 64 196 63Standard (times10minus6) 100 78 69 68 45 91 46Relative loss () 27162 22558 19905 18107 0 20696 minus163

MADE Score () 3823 5209 5776 5609 7145 5392 6945Average (times10minus5) 102 93 93 93 91 94 92Standard (times10minus6) 349 333 321 330 317 310 317Relative loss () 1100 212 221 162 0 229 21

IRADE Score () 818 1252 1402 3072 9983 2104 9950Average (times10minus4) 376 322 307 277 99 316 98Standard (times10minus4) 140 75 90 66 60 192 62Relative loss () 27855 22397 20898 17892 0 21821 minus111

RADE Score () 4040 5209 5042 5593 7379 4875 7279Average (times10minus2) 15 15 15 15 15 15 15Standard (times10minus4) 272 267 257 265 258 251 257Relative loss () 608 123 160 92 0 189 06

ER Average 056 055 054 053 049 053 049Standard 023 011 013 011 013 036 013

626 Journal of the American Statistical Association June 2007

(a) (b)

Figure 3 The Dynamic Weights for Prediction for Example 2 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and forthe Integration Estimator ( mdash) (b)

Again Figure 3(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

Example 3 To assess the performance of the proposed meth-ods under a nonstationary situation we now consider the geo-metric Brownian (GBM)

drt = μrt dt + σ rt dWt

where Wt is a standard one-dimensional Brownian motion Thisis a nonstationary process against which we check whether ourmethod continues to apply Note that the celebrated BlackndashScholes option price formula is derived from Osbornersquos as-sumption that the stock price follows the GBM model By Itocircrsquosformula we have

log rt minus log r0 = (μ minus σ 22)t + σ 2Wt

We set μ = 03 and σ = 26 in our simulations With theBrownian motion simulated from independent Gaussian incre-ments we can generate the samples for the GBM We simulate600 times with = 1252 For each simulation we generate1000 observations and use the first two-thirds of the observa-tions as in-sample data and the remaining observations as out-sample data

Because of the nonstationarity of the process the simulationresults are somewhat unstable with a few uncommon realiza-tions appearing Therefore we use a robust method to sum-marize the results For each measure we compute the relativeloss and the median along with the median absolute deviation(MAD) for scale estimation The MAD for sample ais

i=1 isdefined as MAD(a) = median|ai minus median(a)| i = 1 sIt is easy to see that MAD(a) provides a robust estimator ofthe standard deviation of ai Table 4 reports the results Thetable shows that the integrated estimator performs quite wellAll estimators have acceptable ER values The proposed esti-

Table 4 Robust Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 733 4300 3583 7283 6133 2733 5233Median (times10minus1) 2254 1561 1770 1490 1382 2036 1443

MAD 035 012 024 012 034 036 037Relative loss () 6310 1296 2805 781 0 4732 444

IMADE Score () 1333 4333 2633 7333 6517 2550 5300Median (times10minus7) 217 174 193 163 146 265 154MAD (times10minus8) 566 474 486 445 322 817 357

Relative loss () 4864 1913 3211 1205 0 8145 561

MADE Score () 3900 3950 2650 4933 5200 4233 5133Median (times10minus6) 115 111 111 110 109 111 109MAD (times10minus7) 236 244 240 244 231 230 230

Relative loss () 592 143 181 123 0 174 29

Score () 1417 4317 2500 7317 6467 2683 5300IRADE Median (times10minus4) 346 267 310 255 251 378 266

MAD (times10minus5) 599 406 488 389 566 847 644Relative loss () 3787 649 2358 149 0 5037 591

RADE Score () 3900 3700 2433 5117 6150 2950 5750Median (times10minus4) 1581 1509 1515 1508 1504 1661 1503MAD (times10minus4) 185 201 198 200 193 257 192

Relative loss () 509 30 73 23 0 1042 minus07

ER Median 054 051 054 051 048 057 0480MAD 010 003 004 003 006 007 006

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 627

(a) (b)

Figure 4 The Dynamic Weights for Prediction for Example 3 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

mators have the smallest measure values in median or robustscale This shows that our integrated method continues to per-form better than the others for this nonstationary case

Again Figure 4(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

52 Real Examples

In this section we apply the integrated volatility estimationmethods and other methods to the analysis of real financial data

521 Treasury Bond Here we consider the weekly returnsof three Treasury Bonds with terms of 1 5 and 10 years Thein-sample data are for January 4 1974ndashDecember 30 1994 andout-sample data are for January 6 1995ndashAugust 8 2003 Thetotal sample size is 1545 and the in-sample size is 1096 Theresults are reported in Table 5

From Table 5 for the 1-year Treasury Notes the integratedestimator is of the smallest MADE and the smallest RADE re-flecting that the integrated estimation method of the volatilityis the best among the seven methods For the bonds with 5-or 10-year maturities the seven estimators have close MADEsand RADEs where the historical simulation method is bet-ter than the RiskMetrics in terms of MADE and RADE andthe integrated estimation approach has the smallest MADEsIn addition the integrated estimator has ER values closest to

05 This demonstrates the advantage of using state-domaininformation which can improve the time-domain predictionof the volatilities in the interest dynamics The nonparametricBayesian method does not perform as well as the integrated es-timator because it uses fixed weights and the inverse-gammaprior which may not be true in practice

522 Exchange Rate We analyze the daily exchange rateof several foreign currencies against the US dollar The dataare for January 3 1994ndashAugust 1 2003 The in-sample dataconsist of the observations before January 1 2001 and the out-sample data consist of the remaining observations The resultsreported in Table 6 show that the integrated estimator has thesmallest MADEs and moderate RADEs for the exchange ratesNote that the RADE measure is effective only when the dataseem to be from a normal distribution and thus cannot calibratethe accuracy of volatility forecast for data from an unknowndistribution For this reason the RADE might not be a goodmeasure of performance for this example Both the integratedvolatility estimation and GARCH perform nicely for this exam-ple They outperform other nonparametric methods

Figure 5 reports the dynamic weights for the daily exchangerate of the UK pound to the US dollar Because the RiskMestimator is far more informative than the StaDo estimator ourintegrated estimator becomes basically the time-domain esti-mator by automatically choosing large dynamic weights Thisexemplifies our statement in the paragraph immediately before(1) in Section 1

Table 5 Comparisons of Several Volatility Estimation Methods

Term Measure (times10minus2) Hist RiskM Semi NonBay Integ GARCH StaDo

1 year MADE 1044 787 787 794 732 893 914RADE 5257 4231 4256 4225 4105 4402 4551

ER 2237 2009 2013 1562 3795 0 7366

5 years MADE 1207 1253 1296 1278 1201 1245 1638RADE 5315 5494 5630 5563 5571 5285 6543

ER 671 1339 1566 1112 5804 0 8929

10 years MADE 1041 1093 1103 1111 1018 1046 1366RADE 4939 5235 5296 5280 5152 4936 6008

ER 1112 1563 1790 1334 4911 0 6920

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 4: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 621

Markov process rs from (3) We then have the following re-sult

Theorem 2 Set s22(x) = 2ν0σ

4(x)p(x) Let x be in the inte-rior of the support of p(middot) Suppose that the second derivativesof μ(middot) and σ 2(middot) exist in a neighborhood of x Under conditions

(C3)ndash(C6)radic

Nh[σ 2S (x) minus σ 2(x)]s2(x)

Dminusrarr N (01)

3 DYNAMIC INTEGRATION OF TIMEndash ANDSTATEndashDOMAIN ESTIMATORS

In this section we first show how the optimal dynamicweights in (2) can be estimated and then prove that the time-domain and state-domain estimators are indeed asymptoticallyindependent

31 Estimation of Dynamic Weights

For the exponential smoothing estimator in (7) we can applythe asymptotic formula given in Theorem 1 to get an estimate ofits asymptotic variance But because the estimator is a weightedaverage of Y2

tminusi we also can obtain its variance directly by as-suming that Ytminusj sim N(0 σ 2

t ) for small j Indeed with the fore-going local homogeneous model we have

var(σ 2ESt)

asymp (1 minus λ)2

(1 minus λn)22σ 4

t

nsum

i=1

nsum

j=1

λi+jminus2ρt(|i minus j|)

= 2(1 minus λ)2σ 4t

(1 minus λn)2

n + 2

nminus1sum

k=1

ρt(k)λk(1 minus λ2(nminusk))

1 minus λ2

(9)

where ρt(k) = cor(Y2t Y2

tminusk) is the autocorrelation of the seriesY2

tminusk The autocorrelation can be estimated from the data inhistory Note that due to the locality of the exponential smooth-ing only ρt(k)rsquos with the first 30 lags say contribute to thevariance calculation

We now turn to estimate the variance of σ 2St = σ 2

S (rt) Detailsof this have been given by Fan and Yao (1998 2003 sec 62)Let

Vj(x) =Nminus1sum

i=1

(rti minus x)jW((

rti minus x)h1

)(10)

and ξi(x) = W(rti minusx

h1)V2(x) minus (rti minus x)V1(x)V0(x)V2(x) minus

V1(x)2 Then the local linear estimator can be expressed asσ 2

S (x) = sumNminus1i=1 ξi(x)Ri and its variance can be approximated

as

var(σ 2S (x)) asymp 2σ 4(x)

Nminus1sum

i=1

ξ2i (x) (11)

Substituting (9) and (11) into (2) we propose to combinethe time-domain and state-domain estimators with the dynamicweight

wt = σ 4St

sumNminus1i=1 ξ2

i (rt)

σ 4St

sumNminus1i=1 ξ2

i (rt) + ctσ4ESt

(12)

where ct = (1minusλ)2

(1minusλn)2 n+2sumnminus1

k=1 ρt(k)λk(1minusλ2(nminusk))(1minusλ2)For practical implementation we truncate the series ρt(k)tminus1

k=1

in the summation as ρt(k)30k=1 This results in the dynamically

integrated estimator

σ 2It = wtσ

2ESt + (1 minus wt)σ

2St (13)

where σ 2St = σ 2

S (rt) The function σ 2S (middot) uses historical data

up to the time t and we need to update this function as timeevolves Fortunately we need know only the function value atthe point rt which significantly reduces the computational costThe computational cost can be further reduced if we update theestimated function σ 2

St on a prescribed time schedule (eg onceevery 2 months for weekly data)

Finally we note that in the choice of weight only the vari-ance of the estimated volatility is considered not the meansquared error (MSE) This is mainly to facilitate the choice ofdynamic weights Because the smoothing parameters in σ 2

ESt

and σ 2S (x) have been tuned to optimize their performance sep-

arately their bias and variance trade-offs have been consideredindirectly Thus controlling the variance of the integrated esti-mator σ 2

It also controls to some extent the bias of the estima-tor

32 Sampling Properties

The fundamental component in the choice of dynamicweights is the asymptotic independence between the time- andstate-domain estimators The following theorem shows that thetime- and state-domain estimators are indeed asymptotically in-dependent To facilitate the expression of notations we presentthe result at the current time tN

Theorem 3 Suppose that the second derivatives of μ(middot) andσ 2(middot) exist in a neighborhood of rtN Under conditions (C1) and(C3)ndash(C6) if condition (C2) holds at time point t = tN then

(a) Asymptotic independence

[radicn(σ 2

EStNminus σ 2

tN )

s1tN

radicNh(σ 2

StNminus σ 2

tN )

s2tN

]T Dminusrarr N (0 I2)

where s1tN and s2tN = s2(rtN ) are as given in Theorems 1 and 2(b) Asymptotic normality of σ 2

tN with wtN in (2) If the

limit d = limNrarrinfin n[Nh] exists thenradic

Nhω[σ 2tN minus σ 2

tN ] DminusrarrN (01) where ω = w2

tN s21tN

d + (1 minus wtN )2s22tN

(c) Asymptotic normality of σ 2ItN

with an estimated weightwtN If the weight wtN converges to the weight wtN in proba-bility and the limit d = limNrarrinfin n[Nh] exists then

radicNhω times

(σ 2ItN

minus σ 2tN )

Dminusrarr N (01)

Because the estimators σ 2St and σ 2

ESt are consistent

wt asympsumtminus1

i=1 ξ2i (rt)

sumtminus1i=1 ξ2

i (rt) + ctasymp var(σ 2

S (rt))

var(σ 2S (rt)) + var(σ 2

ESt(rt))= wt

and hence the condition in Theorem 3(c) holds Note that thetheoretically optimal weight minimizing the variance in Theo-rem 3(b) is

wtN opt = ds22tN

s21tN

+ ds22tN

asymp var(σ 2S (rt))

var(σ 2S (rt)) + var(σ 2

ESt(rt))asymp wtN

622 Journal of the American Statistical Association June 2007

It follows that the integrated estimator σ 2ItN

is optimal in thesense that it achieves the minimum variance among all of theweighted estimators in (1)

When 0 lt d lt infin in Theorem 3 the effective sample sizesin both time and state domains are comparable thus neitherthe time-domain estimator nor the state-domain estimator dom-inates From Theorem 3 based on the optimal weight the as-ymptotic relative efficiencies of σ 2

ItNwith respect to σ 2

StNand

σ 2EStN

are

eff(σ 2ItN σ 2

StN ) = 1 + ds22tN s2

1tN and

eff(σ 2ItN σ 2

EStN ) = 1 + s21tN (ds2

2tN )

which are gt1 This demonstrates that the integrated estimatorσ 2

ItNis more efficient than the time-domain and state-domain

estimators

4 BAYESIAN INTEGRATION OFVOLATILITY ESTIMATES

Another possible approach is to consider the historical infor-mation as the prior and to incorporate it into the estimation ofvolatility using the Bayesian framework We now explore suchan approach

41 Bayesian Estimation of Volatility

The Bayesian approach is to consider the recent data Ytminusn

Ytminus1 as an independent sample from N(0 σ 2) [see (4)] andto consider the historical information being summarized in aprior To incorporate the historical information we assume thatthe variance σ 2 follows an inverse-gamma distribution with pa-rameters a and b which has the density function

f (σ 2) = baminus1(a)σ 2minus(a+1)exp(minusbσ 2)

Write σ 2 sim IG(ab) It is well known that

E(σ 2) = b

a minus 1

var(σ 2) = b2

(a minus 1)2(a minus 2) and (14)

mode(σ 2) = b

a + 1

The hyperparameters a and b are estimated from historical datausing the state-domain estimators

It can be easily shown that given Y = (Ytminusn Ytminus1) theposterior density of σ 2 is IG(alowastblowast) where alowast = a + n

2 andblowast = 1

2

sumni=1 Y2

tminusi + b From (14) the Bayesian mean of σ 2 is

σ 2 = blowast

alowast minus 1=

nsum

i=1

Y2tminusi + 2b

2(a minus 1) + n

This Bayesian estimator can be easily written as

σ 2B = n

n + 2(a minus 1)σ 2

MAt + 2(a minus 1)

n + 2(a minus 1)σ 2

P (15)

where σ 2MAt is the moving average estimator given by (5) and

σ 2P = b(aminus1) is the prior mean determined from the historical

data This combines the estimate based on the data and priorknowledge

The Bayesian estimator (15) uses the local average of n datapoints To incorporate the exponential smoothing estimator (6)we consider it the local average of nlowast = sumn

i=1 λiminus1 = 1minusλn

1minusλdata

points This leads to the following integrated estimator

σ 2Bt = nlowast

nlowast + 2(a minus 1)σ 2

ESt + 2(a minus 1)

2(a minus 1) + nlowast σ 2P

= 1 minus λn

1 minus λn + 2(a minus 1)(1 minus λ)σ 2

ESt

+ 2(a minus 1)(1 minus λ)

1 minus λn + 2(a minus 1)(1 minus λ)σ 2

P (16)

In particular when λ rarr 1 the estimator (16) reduces to (15)

42 Estimation of Prior Parameters

A reasonable source for the prior information in (16) is thehistorical data up to time t Thus the hyperparameters a and bshould depend on t and can be used to match with the momentsfrom the historical information Using the approximation model(4) we have

E[(Yt minus f (rt))

2 | rt] asymp σ 2(rt) and

(17)var

[(Yt minus f (rt))

2 | rt] asymp 2σ 4(rt)

These can be estimated from the historical data up to time tnamely the state-domain estimator σ 2

S (rt) Because we have as-sumed that the prior distribution for σ 2

t is IG(atbt) by (17) andthe method of moments we would get the following estimationequations by matching the moments from the prior distributionwith the moment estimation from the historical estimate

E(σ 2t ) = σ 2

S (rt) and var(σ 2t ) = 2σ 4

S (rt)

This together with (14) leads to

bt

at minus 1= σ 2

S (rt) andb2

t

(at minus 1)2(at minus 2)= 2σ 4

S (rt)

where E represents from the expectation with respect to theprior Solving the foregoing equations we obtain

at = 25 and bt = 15σ 2S (rt)

Substituting this into (16) we obtain the estimator

σ 2Bt = 1 minus λn

1 minus λn + 3(1 minus λ)σ 2

ESt +3(1 minus λ)

1 minus λn + 3(1 minus λ)σ 2

St (18)

Unfortunately the weights in (18) are static and do not dependon the time t Thus the Bayesian method that we use doesnot produce a satisfactory answer to this problem Howeverother implementations of the Bayesian method may yield thedynamic weights

5 SIMULATION STUDIES AND EXAMPLES

To facilitate the presentation we use the simple abbrevia-tions in Table 1 to denote seven volatility estimation methodsDetails on the first three methods have been given by Fan andGu (2003) In particular the first method is to estimate volatilityusing the standard deviation of the yields in the past year andthe RiskMetrics method is based on the exponential smooth-ing with λ = 94 The semiparametric method of Fan and Gu(2003) is an extension of a local model used in the exponential

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 623

Table 1 Seven Volatility Estimators

Hist The historical methodRiskM The RiskMetrics method of J P MorganSemi The semiparametric estimator (SEV) of Fan and Gu (2003)NonBay The nonparametric Bayesian method in (18)Integ The integration method of time and state domains in (13)GARCH The maximum likelihood method based on a GARCH(1 1)

modelStaDo The state-domain estimation method in Section 22

smoothing with the smoothing parameter determined by min-imizing the prediction error It includes exponential smoothingwith λ selected by data as a specific example For our integratedmethods the parameter λ is taken as 94

The following five measures are used to assess the per-formance of different procedures for estimating the volatilityOther related measures also can be used (see Daveacute and Stahl1997)

Measure 1 Exceedence Ratio Against Confidence LevelThis measure counts the number of the events for which theloss of an asset exceeds the loss predicted by the normal modelat a given confidence α It was given by Fan and Gu (2003) andcomputed as

ER(σ 2t ) = mminus1

T+msum

i=T+1

I(Yi lt 13minus1(α)σi) (19)

where 13minus1(α) is the α-quantile of the standard normal distribu-tion and m is the size of the out-sample This gives an indicationof how effectively the volatility estimator can be used for pre-diction

As noted by Fan and Gu (2003) one shortcoming of the mea-sure is its large Monte Carlo error Unless the postsample sizem is sufficiently large this measure has difficulty by differenti-ating the performance of various estimators due to the presenceof large error margins Note that the ER depends strongly onthe assumption of normality In our simulation study we usethe true α-quantile of the error distribution instead of 13minus1(α)

in (19) to compute the ER For real data analysis we use theα-quantile of the last 250 residuals for the in-sample data

Measure 2 Mean Absolute Deviation Error To motivate thismeasure we first consider the MSEs

PE(σ 2t ) = mminus1

T+msum

i=T+1

(Y2i minus σ 2

i )2

The expected value can be decomposed as

E(PE) = mminus1T+msum

i=T+1

E(σ 2i minus σ 2

i )2 + mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )2

+ mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )(σ 2i minus σ 2

i ) (20)

Because σ 2i is predicable at time ti and Y2

i minus σ 2i is a martingale

difference the third term is 0 [the drift is assumed to be neg-ligible in the forgoing formulation of PE otherwise the driftterm should be removed first In both cases it has an affec-tion O(2)] Therefore we need consider only the first two

terms The first term reflects the effectiveness of the estimatedvolatility whereas the second term is the size of the stochasticerror and independent of estimators As in all statistical pre-diction problems the second term is usually of an order ofmagnitude larger than the first term Thus a small improvementin PE could mean substantial improvement over the estimatedvolatility However due to the well-known fact that financialtime series contain outliers the MSE is not a robust measureTherefore we used the mean absolute deviation error (MADE)MADE(σ 2

t ) = mminus1 sumT+mi=T+1 |Y2

i minus σ 2i |

Measure 3 Square-Root Absolute Deviation Error An al-ternative to MADE is the Square-root absolute deviation error(RADE) defined as

RADE(σ 2t ) = mminus1

T+msum

i=T+1

∣∣| Yi | minusradic2πσi

∣∣

The constant factor comes from the fact that E|εt| = radic2π for

εt sim N(01) If the underlying error distribution deviates fromnormality then this measure is not robust

Measure 4 Ideal Mean Absolute Deviation Error and Oth-ers To assess the estimation of the volatility in simulations wealso can use the ideal mean absolute deviation error (IMADE)

IMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |

and the ideal square root absolute deviation error (IRADE)

IRADE = mminus1T+msum

i=T+1

|σi minus σi|

Another intuitive measure is the relative IMADE (RIMADE)

RIMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |σ 2

i

This measure may create some outliers when the true volatilityis quite small at certain time points For real data analysis bothmeasures are not applicable

51 Simulations

To assess the performance of the five estimation methodslisted in Table 1 we compute the average and standard devi-ation of each of the four measures over 600 simulations Gen-erally speaking the smaller the average (or the standard devia-tion) the better the estimation approach We also compute theldquoscorerdquo of an estimator which is the percentage of times among600 simulations that the estimator outperforms the average ofthe 7 methods in terms of an effectiveness measure To be morespecific for example consider RiskMetrics using MADE as aneffectiveness measure Let mi be the MADE of the RiskMet-rics estimator at the ith simulation and let mi be the average ofthe MADEs for the five estimators at the ith simulation Thenthe ldquoscorerdquo of the RiskMetrics approach in terms of the MADEis defined as 1

600

sum600i=1 I(mi lt mi) Obviously estimators with

higher scores are preferred In addition we define the ldquorelativelossrdquo of an estimator σ 2

t relative to σ 2It in terms of MADEs as

RLOSS(σ 2t σ 2

It) = MADE(σ 2t )MADE(σ 2

It) minus 1

624 Journal of the American Statistical Association June 2007

where MADE(σ 2t ) is the average of MADE(σ 2

t ) among simula-tions

Example 1 In this example we compare the proposedmethods with the maximum likelihood estimation approachfor a parametric generalized autoregressive conditional het-eroscedasticity [GARCH(1 1)] model This will give an assess-ment of how well our methods perform compared with the mostefficient estimation method

Because the GARCH model does not have a state variablerti but the diffusion model does the diffusion model cannot beused to fit the data simulated from the GARCH model A sen-sible approach is to use the diffusion limit of the GARCH(1 1)model to generate data There are several interesting works onreconciling the two modeling approaches along this line forexample Nelson (1990) established the continuous-time dif-fusion limit for the discrete ARCH model Duan (1997) pro-posed an augmented GARCH model to unify various paramet-ric GARCH models and derived its diffusion limit and Wang(2002) studied the asymptotic relationship between GARCHmodels and diffusions Here we use the result on the diffusionlimit of the GARCH(1 1) model of Wang (2002)

Specifically we first generate data from the GARCH(1 1)model

Yt = σtεt

σ 2t = c0 + a0σ

2tminus1 + b0Y2

tminus1

where the εtrsquos are from the standard normal distribution and thetrue parameters c0 = 498 times 10minus7 a0 = 9289615 and b0 =0411574 are from the fitted values of the parameters for thedaily exchange rate of the Australian dollar with the US dollarfrom January 3 1994 to December 29 2000 The GARCH(1 1)model is then fitted by maximum likelihood estimation

Second we use the diffusion limit (see Wang 2002 sec 23)of the foregoing GARCH model

drt = σt dW1t

dσ 2t = (β0 + β1σ

2t )dt + β2σ

2t dW2t

where W1t and W2t are two independent standard Wienerprocess and the parameters are computed as (β0 β1 β2) =(25896 times 10minus5minus15538 4197) To simulate the diffusionprocess σ 2

t we use the discrete-time order 10 strong ap-proximation scheme of Kloeden Platen Schurz and Soslashrensen(1996)

For each of the two models we generate 600 series of dailydata each with length 1200 For each simulation we set thefirst 900 observations as the ldquoin-samplerdquo data and the last 300observations as the ldquoout-samplerdquo data The results summarizedin Table 2 show that all estimators have acceptable ER valuesclose to 05 The integrated estimator has minimum RIMADEIMADE MADE IRADE and RADE on average among allof the nonparametric estimators Compared with the most ef-ficient maximum likelihood estimator (MLE) of the GARCHmodel the integrated estimator has smaller MADE but largerRIMADE IMADE and IRADE The RADEs of the MLEand the integrated estimator are very close demonstrating thatthe integrated estimator is very impressive in forecasting thevolatility of this example

Note that the dynamic weights for the integrated estimatorare estimated by minimizing the variance of the combined esti-mator in (1) As one anonymous referee pointed out one maychoose to minimize the MSE instead which seems to be dif-ficult and computationally intensive in current situations Asargued in Section 31 the biases and variances trade-off havebeen considered indirectly in the time-domain and state-domainestimation methods The bias of the integrated estimator wascontrolled before the weights were chosen

To investigate whether the foregoing intuition is correct herewe opt for the suggestion from the referee to visually displayhow the integrated estimator dominates the state-domaintime-domain estimators over time through the dynamic weights andcompare the biases of the state-domain estimator and the inte-grated estimator The weights and biases are shown in Figure 2

Table 2 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 4073 2805 6494 5092 7379 7629 4090Average 205 204 183 187 178 169 204

Standard (times10minus2) 655 331 450 308 397 367 567Relative loss () 1544 1444 266 497 0 minus476 1475

IMADE Score () 6177 2554 5342 4608 7212 7546 4090Average (times10minus5) 332 367 330 336 318 304 362Standard (times10minus6) 1118 744 877 690 851 800 1113Relative loss () 440 1544 373 581 0 minus447 1390

MADE Score () 2821 4958 4975 5659 8815 5743 7963Average (times10minus4) 164 153 153 152 149 152 149Standard (times10minus6) 2314 2182 2049 2105 1869 1852 1876Relative loss () 1016 284 293 209 0 207 11

IRADE Score () 6277 2404 5109 4708 7062 7713 4040Average (times10minus3) 403 448 407 410 391 369 446Standard (times10minus3) 121 075 092 070 093 085 131Relative loss () 317 1475 411 498 0 minus553 1422

RADE Score () 3055 5042 4341 5910 8464 5893 7913Average (times10minus2) 20 19 19 19 19 19 19Standard (times10minus4) 153 145 141 142 134 133 137Relative loss () 513 142 165 96 0 106 minus25

ER Average (times10minus2) 495 545 538 528 525 516 500Standard (times10minus2) 157 113 128 112 136 125 149

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 625

(a) (b)

Figure 2 The Dynamic Weights for Prediction for Example 1 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

It is seen that the bias is actually controlled because both of theestimators have very close biases The weights are around 2and through the weights the integrated method improves boththe RiskMetrics method and the state-domain approach

Example 2 To simulate the interest rate data we considerthe CoxndashIngersollndashRoss (CIR) model

drt = κ(θ minus rt)dt + σ r12t dWt t ge t0

where the spot rate rt moves around a central location or long-run equilibrium level θ = 08571 at speed κ = 21459 The σ

is set at 07830 These parameters values given by Chapmanand Pearson (2000) satisfy the condition 2κθ ge σ 2 so that theprocess rt is stationary and positive The model has been studiedby Chapman and Pearson (2000) and Fan and Zhang (2003)

We simulated weekly data (600 runs) from the CIR modelFor each simulation we designated the first 900 observations

as the ldquoin-samplerdquo data and the last 300 observations the ldquoout-samplerdquo data The results summarized in Table 3 show that theintegrated method performs closely to the state-domain methodThis is mainly because the dynamic weights shown in Figure 3are very small and the integration estimator is very close to thestate-domain estimator supporting our statement at the end ofthe third paragraph in Section 1 Table 3 shows that the inte-grated estimator uniformly dominates the other estimators onaverage due to its highest score and lowest averaged RIMADEIMADE MADE IRADE and RADE The improvement inRIMADE IMADE and IRADE is about 100 This demon-strates that our integrated volatility method better captures thevolatility dynamics The historical simulation method performspoorly due to misspecification of the function of the volatilityparameter The results here show the advantage of aggregatingthe information of the time and state domains Note that all es-timators have reasonable ER values at level 05 and that the ERvalue of the integrated estimator is closest to 05

Table 3 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 501 1820 2104 3656 9933 2404 9917Average (times10minus1) 267 190 188 164 61 202 62Standard (times10minus1) 148 35 81 31 39 130 43Relative loss () 33732 21074 20798 16821 0 23012 112

IMADE Score () 768 1319 1469 2972 9967 2120 9950Average (times10minus6) 237 208 191 179 64 196 63Standard (times10minus6) 100 78 69 68 45 91 46Relative loss () 27162 22558 19905 18107 0 20696 minus163

MADE Score () 3823 5209 5776 5609 7145 5392 6945Average (times10minus5) 102 93 93 93 91 94 92Standard (times10minus6) 349 333 321 330 317 310 317Relative loss () 1100 212 221 162 0 229 21

IRADE Score () 818 1252 1402 3072 9983 2104 9950Average (times10minus4) 376 322 307 277 99 316 98Standard (times10minus4) 140 75 90 66 60 192 62Relative loss () 27855 22397 20898 17892 0 21821 minus111

RADE Score () 4040 5209 5042 5593 7379 4875 7279Average (times10minus2) 15 15 15 15 15 15 15Standard (times10minus4) 272 267 257 265 258 251 257Relative loss () 608 123 160 92 0 189 06

ER Average 056 055 054 053 049 053 049Standard 023 011 013 011 013 036 013

626 Journal of the American Statistical Association June 2007

(a) (b)

Figure 3 The Dynamic Weights for Prediction for Example 2 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and forthe Integration Estimator ( mdash) (b)

Again Figure 3(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

Example 3 To assess the performance of the proposed meth-ods under a nonstationary situation we now consider the geo-metric Brownian (GBM)

drt = μrt dt + σ rt dWt

where Wt is a standard one-dimensional Brownian motion Thisis a nonstationary process against which we check whether ourmethod continues to apply Note that the celebrated BlackndashScholes option price formula is derived from Osbornersquos as-sumption that the stock price follows the GBM model By Itocircrsquosformula we have

log rt minus log r0 = (μ minus σ 22)t + σ 2Wt

We set μ = 03 and σ = 26 in our simulations With theBrownian motion simulated from independent Gaussian incre-ments we can generate the samples for the GBM We simulate600 times with = 1252 For each simulation we generate1000 observations and use the first two-thirds of the observa-tions as in-sample data and the remaining observations as out-sample data

Because of the nonstationarity of the process the simulationresults are somewhat unstable with a few uncommon realiza-tions appearing Therefore we use a robust method to sum-marize the results For each measure we compute the relativeloss and the median along with the median absolute deviation(MAD) for scale estimation The MAD for sample ais

i=1 isdefined as MAD(a) = median|ai minus median(a)| i = 1 sIt is easy to see that MAD(a) provides a robust estimator ofthe standard deviation of ai Table 4 reports the results Thetable shows that the integrated estimator performs quite wellAll estimators have acceptable ER values The proposed esti-

Table 4 Robust Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 733 4300 3583 7283 6133 2733 5233Median (times10minus1) 2254 1561 1770 1490 1382 2036 1443

MAD 035 012 024 012 034 036 037Relative loss () 6310 1296 2805 781 0 4732 444

IMADE Score () 1333 4333 2633 7333 6517 2550 5300Median (times10minus7) 217 174 193 163 146 265 154MAD (times10minus8) 566 474 486 445 322 817 357

Relative loss () 4864 1913 3211 1205 0 8145 561

MADE Score () 3900 3950 2650 4933 5200 4233 5133Median (times10minus6) 115 111 111 110 109 111 109MAD (times10minus7) 236 244 240 244 231 230 230

Relative loss () 592 143 181 123 0 174 29

Score () 1417 4317 2500 7317 6467 2683 5300IRADE Median (times10minus4) 346 267 310 255 251 378 266

MAD (times10minus5) 599 406 488 389 566 847 644Relative loss () 3787 649 2358 149 0 5037 591

RADE Score () 3900 3700 2433 5117 6150 2950 5750Median (times10minus4) 1581 1509 1515 1508 1504 1661 1503MAD (times10minus4) 185 201 198 200 193 257 192

Relative loss () 509 30 73 23 0 1042 minus07

ER Median 054 051 054 051 048 057 0480MAD 010 003 004 003 006 007 006

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 627

(a) (b)

Figure 4 The Dynamic Weights for Prediction for Example 3 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

mators have the smallest measure values in median or robustscale This shows that our integrated method continues to per-form better than the others for this nonstationary case

Again Figure 4(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

52 Real Examples

In this section we apply the integrated volatility estimationmethods and other methods to the analysis of real financial data

521 Treasury Bond Here we consider the weekly returnsof three Treasury Bonds with terms of 1 5 and 10 years Thein-sample data are for January 4 1974ndashDecember 30 1994 andout-sample data are for January 6 1995ndashAugust 8 2003 Thetotal sample size is 1545 and the in-sample size is 1096 Theresults are reported in Table 5

From Table 5 for the 1-year Treasury Notes the integratedestimator is of the smallest MADE and the smallest RADE re-flecting that the integrated estimation method of the volatilityis the best among the seven methods For the bonds with 5-or 10-year maturities the seven estimators have close MADEsand RADEs where the historical simulation method is bet-ter than the RiskMetrics in terms of MADE and RADE andthe integrated estimation approach has the smallest MADEsIn addition the integrated estimator has ER values closest to

05 This demonstrates the advantage of using state-domaininformation which can improve the time-domain predictionof the volatilities in the interest dynamics The nonparametricBayesian method does not perform as well as the integrated es-timator because it uses fixed weights and the inverse-gammaprior which may not be true in practice

522 Exchange Rate We analyze the daily exchange rateof several foreign currencies against the US dollar The dataare for January 3 1994ndashAugust 1 2003 The in-sample dataconsist of the observations before January 1 2001 and the out-sample data consist of the remaining observations The resultsreported in Table 6 show that the integrated estimator has thesmallest MADEs and moderate RADEs for the exchange ratesNote that the RADE measure is effective only when the dataseem to be from a normal distribution and thus cannot calibratethe accuracy of volatility forecast for data from an unknowndistribution For this reason the RADE might not be a goodmeasure of performance for this example Both the integratedvolatility estimation and GARCH perform nicely for this exam-ple They outperform other nonparametric methods

Figure 5 reports the dynamic weights for the daily exchangerate of the UK pound to the US dollar Because the RiskMestimator is far more informative than the StaDo estimator ourintegrated estimator becomes basically the time-domain esti-mator by automatically choosing large dynamic weights Thisexemplifies our statement in the paragraph immediately before(1) in Section 1

Table 5 Comparisons of Several Volatility Estimation Methods

Term Measure (times10minus2) Hist RiskM Semi NonBay Integ GARCH StaDo

1 year MADE 1044 787 787 794 732 893 914RADE 5257 4231 4256 4225 4105 4402 4551

ER 2237 2009 2013 1562 3795 0 7366

5 years MADE 1207 1253 1296 1278 1201 1245 1638RADE 5315 5494 5630 5563 5571 5285 6543

ER 671 1339 1566 1112 5804 0 8929

10 years MADE 1041 1093 1103 1111 1018 1046 1366RADE 4939 5235 5296 5280 5152 4936 6008

ER 1112 1563 1790 1334 4911 0 6920

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 5: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

622 Journal of the American Statistical Association June 2007

It follows that the integrated estimator σ 2ItN

is optimal in thesense that it achieves the minimum variance among all of theweighted estimators in (1)

When 0 lt d lt infin in Theorem 3 the effective sample sizesin both time and state domains are comparable thus neitherthe time-domain estimator nor the state-domain estimator dom-inates From Theorem 3 based on the optimal weight the as-ymptotic relative efficiencies of σ 2

ItNwith respect to σ 2

StNand

σ 2EStN

are

eff(σ 2ItN σ 2

StN ) = 1 + ds22tN s2

1tN and

eff(σ 2ItN σ 2

EStN ) = 1 + s21tN (ds2

2tN )

which are gt1 This demonstrates that the integrated estimatorσ 2

ItNis more efficient than the time-domain and state-domain

estimators

4 BAYESIAN INTEGRATION OFVOLATILITY ESTIMATES

Another possible approach is to consider the historical infor-mation as the prior and to incorporate it into the estimation ofvolatility using the Bayesian framework We now explore suchan approach

41 Bayesian Estimation of Volatility

The Bayesian approach is to consider the recent data Ytminusn

Ytminus1 as an independent sample from N(0 σ 2) [see (4)] andto consider the historical information being summarized in aprior To incorporate the historical information we assume thatthe variance σ 2 follows an inverse-gamma distribution with pa-rameters a and b which has the density function

f (σ 2) = baminus1(a)σ 2minus(a+1)exp(minusbσ 2)

Write σ 2 sim IG(ab) It is well known that

E(σ 2) = b

a minus 1

var(σ 2) = b2

(a minus 1)2(a minus 2) and (14)

mode(σ 2) = b

a + 1

The hyperparameters a and b are estimated from historical datausing the state-domain estimators

It can be easily shown that given Y = (Ytminusn Ytminus1) theposterior density of σ 2 is IG(alowastblowast) where alowast = a + n

2 andblowast = 1

2

sumni=1 Y2

tminusi + b From (14) the Bayesian mean of σ 2 is

σ 2 = blowast

alowast minus 1=

nsum

i=1

Y2tminusi + 2b

2(a minus 1) + n

This Bayesian estimator can be easily written as

σ 2B = n

n + 2(a minus 1)σ 2

MAt + 2(a minus 1)

n + 2(a minus 1)σ 2

P (15)

where σ 2MAt is the moving average estimator given by (5) and

σ 2P = b(aminus1) is the prior mean determined from the historical

data This combines the estimate based on the data and priorknowledge

The Bayesian estimator (15) uses the local average of n datapoints To incorporate the exponential smoothing estimator (6)we consider it the local average of nlowast = sumn

i=1 λiminus1 = 1minusλn

1minusλdata

points This leads to the following integrated estimator

σ 2Bt = nlowast

nlowast + 2(a minus 1)σ 2

ESt + 2(a minus 1)

2(a minus 1) + nlowast σ 2P

= 1 minus λn

1 minus λn + 2(a minus 1)(1 minus λ)σ 2

ESt

+ 2(a minus 1)(1 minus λ)

1 minus λn + 2(a minus 1)(1 minus λ)σ 2

P (16)

In particular when λ rarr 1 the estimator (16) reduces to (15)

42 Estimation of Prior Parameters

A reasonable source for the prior information in (16) is thehistorical data up to time t Thus the hyperparameters a and bshould depend on t and can be used to match with the momentsfrom the historical information Using the approximation model(4) we have

E[(Yt minus f (rt))

2 | rt] asymp σ 2(rt) and

(17)var

[(Yt minus f (rt))

2 | rt] asymp 2σ 4(rt)

These can be estimated from the historical data up to time tnamely the state-domain estimator σ 2

S (rt) Because we have as-sumed that the prior distribution for σ 2

t is IG(atbt) by (17) andthe method of moments we would get the following estimationequations by matching the moments from the prior distributionwith the moment estimation from the historical estimate

E(σ 2t ) = σ 2

S (rt) and var(σ 2t ) = 2σ 4

S (rt)

This together with (14) leads to

bt

at minus 1= σ 2

S (rt) andb2

t

(at minus 1)2(at minus 2)= 2σ 4

S (rt)

where E represents from the expectation with respect to theprior Solving the foregoing equations we obtain

at = 25 and bt = 15σ 2S (rt)

Substituting this into (16) we obtain the estimator

σ 2Bt = 1 minus λn

1 minus λn + 3(1 minus λ)σ 2

ESt +3(1 minus λ)

1 minus λn + 3(1 minus λ)σ 2

St (18)

Unfortunately the weights in (18) are static and do not dependon the time t Thus the Bayesian method that we use doesnot produce a satisfactory answer to this problem Howeverother implementations of the Bayesian method may yield thedynamic weights

5 SIMULATION STUDIES AND EXAMPLES

To facilitate the presentation we use the simple abbrevia-tions in Table 1 to denote seven volatility estimation methodsDetails on the first three methods have been given by Fan andGu (2003) In particular the first method is to estimate volatilityusing the standard deviation of the yields in the past year andthe RiskMetrics method is based on the exponential smooth-ing with λ = 94 The semiparametric method of Fan and Gu(2003) is an extension of a local model used in the exponential

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 623

Table 1 Seven Volatility Estimators

Hist The historical methodRiskM The RiskMetrics method of J P MorganSemi The semiparametric estimator (SEV) of Fan and Gu (2003)NonBay The nonparametric Bayesian method in (18)Integ The integration method of time and state domains in (13)GARCH The maximum likelihood method based on a GARCH(1 1)

modelStaDo The state-domain estimation method in Section 22

smoothing with the smoothing parameter determined by min-imizing the prediction error It includes exponential smoothingwith λ selected by data as a specific example For our integratedmethods the parameter λ is taken as 94

The following five measures are used to assess the per-formance of different procedures for estimating the volatilityOther related measures also can be used (see Daveacute and Stahl1997)

Measure 1 Exceedence Ratio Against Confidence LevelThis measure counts the number of the events for which theloss of an asset exceeds the loss predicted by the normal modelat a given confidence α It was given by Fan and Gu (2003) andcomputed as

ER(σ 2t ) = mminus1

T+msum

i=T+1

I(Yi lt 13minus1(α)σi) (19)

where 13minus1(α) is the α-quantile of the standard normal distribu-tion and m is the size of the out-sample This gives an indicationof how effectively the volatility estimator can be used for pre-diction

As noted by Fan and Gu (2003) one shortcoming of the mea-sure is its large Monte Carlo error Unless the postsample sizem is sufficiently large this measure has difficulty by differenti-ating the performance of various estimators due to the presenceof large error margins Note that the ER depends strongly onthe assumption of normality In our simulation study we usethe true α-quantile of the error distribution instead of 13minus1(α)

in (19) to compute the ER For real data analysis we use theα-quantile of the last 250 residuals for the in-sample data

Measure 2 Mean Absolute Deviation Error To motivate thismeasure we first consider the MSEs

PE(σ 2t ) = mminus1

T+msum

i=T+1

(Y2i minus σ 2

i )2

The expected value can be decomposed as

E(PE) = mminus1T+msum

i=T+1

E(σ 2i minus σ 2

i )2 + mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )2

+ mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )(σ 2i minus σ 2

i ) (20)

Because σ 2i is predicable at time ti and Y2

i minus σ 2i is a martingale

difference the third term is 0 [the drift is assumed to be neg-ligible in the forgoing formulation of PE otherwise the driftterm should be removed first In both cases it has an affec-tion O(2)] Therefore we need consider only the first two

terms The first term reflects the effectiveness of the estimatedvolatility whereas the second term is the size of the stochasticerror and independent of estimators As in all statistical pre-diction problems the second term is usually of an order ofmagnitude larger than the first term Thus a small improvementin PE could mean substantial improvement over the estimatedvolatility However due to the well-known fact that financialtime series contain outliers the MSE is not a robust measureTherefore we used the mean absolute deviation error (MADE)MADE(σ 2

t ) = mminus1 sumT+mi=T+1 |Y2

i minus σ 2i |

Measure 3 Square-Root Absolute Deviation Error An al-ternative to MADE is the Square-root absolute deviation error(RADE) defined as

RADE(σ 2t ) = mminus1

T+msum

i=T+1

∣∣| Yi | minusradic2πσi

∣∣

The constant factor comes from the fact that E|εt| = radic2π for

εt sim N(01) If the underlying error distribution deviates fromnormality then this measure is not robust

Measure 4 Ideal Mean Absolute Deviation Error and Oth-ers To assess the estimation of the volatility in simulations wealso can use the ideal mean absolute deviation error (IMADE)

IMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |

and the ideal square root absolute deviation error (IRADE)

IRADE = mminus1T+msum

i=T+1

|σi minus σi|

Another intuitive measure is the relative IMADE (RIMADE)

RIMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |σ 2

i

This measure may create some outliers when the true volatilityis quite small at certain time points For real data analysis bothmeasures are not applicable

51 Simulations

To assess the performance of the five estimation methodslisted in Table 1 we compute the average and standard devi-ation of each of the four measures over 600 simulations Gen-erally speaking the smaller the average (or the standard devia-tion) the better the estimation approach We also compute theldquoscorerdquo of an estimator which is the percentage of times among600 simulations that the estimator outperforms the average ofthe 7 methods in terms of an effectiveness measure To be morespecific for example consider RiskMetrics using MADE as aneffectiveness measure Let mi be the MADE of the RiskMet-rics estimator at the ith simulation and let mi be the average ofthe MADEs for the five estimators at the ith simulation Thenthe ldquoscorerdquo of the RiskMetrics approach in terms of the MADEis defined as 1

600

sum600i=1 I(mi lt mi) Obviously estimators with

higher scores are preferred In addition we define the ldquorelativelossrdquo of an estimator σ 2

t relative to σ 2It in terms of MADEs as

RLOSS(σ 2t σ 2

It) = MADE(σ 2t )MADE(σ 2

It) minus 1

624 Journal of the American Statistical Association June 2007

where MADE(σ 2t ) is the average of MADE(σ 2

t ) among simula-tions

Example 1 In this example we compare the proposedmethods with the maximum likelihood estimation approachfor a parametric generalized autoregressive conditional het-eroscedasticity [GARCH(1 1)] model This will give an assess-ment of how well our methods perform compared with the mostefficient estimation method

Because the GARCH model does not have a state variablerti but the diffusion model does the diffusion model cannot beused to fit the data simulated from the GARCH model A sen-sible approach is to use the diffusion limit of the GARCH(1 1)model to generate data There are several interesting works onreconciling the two modeling approaches along this line forexample Nelson (1990) established the continuous-time dif-fusion limit for the discrete ARCH model Duan (1997) pro-posed an augmented GARCH model to unify various paramet-ric GARCH models and derived its diffusion limit and Wang(2002) studied the asymptotic relationship between GARCHmodels and diffusions Here we use the result on the diffusionlimit of the GARCH(1 1) model of Wang (2002)

Specifically we first generate data from the GARCH(1 1)model

Yt = σtεt

σ 2t = c0 + a0σ

2tminus1 + b0Y2

tminus1

where the εtrsquos are from the standard normal distribution and thetrue parameters c0 = 498 times 10minus7 a0 = 9289615 and b0 =0411574 are from the fitted values of the parameters for thedaily exchange rate of the Australian dollar with the US dollarfrom January 3 1994 to December 29 2000 The GARCH(1 1)model is then fitted by maximum likelihood estimation

Second we use the diffusion limit (see Wang 2002 sec 23)of the foregoing GARCH model

drt = σt dW1t

dσ 2t = (β0 + β1σ

2t )dt + β2σ

2t dW2t

where W1t and W2t are two independent standard Wienerprocess and the parameters are computed as (β0 β1 β2) =(25896 times 10minus5minus15538 4197) To simulate the diffusionprocess σ 2

t we use the discrete-time order 10 strong ap-proximation scheme of Kloeden Platen Schurz and Soslashrensen(1996)

For each of the two models we generate 600 series of dailydata each with length 1200 For each simulation we set thefirst 900 observations as the ldquoin-samplerdquo data and the last 300observations as the ldquoout-samplerdquo data The results summarizedin Table 2 show that all estimators have acceptable ER valuesclose to 05 The integrated estimator has minimum RIMADEIMADE MADE IRADE and RADE on average among allof the nonparametric estimators Compared with the most ef-ficient maximum likelihood estimator (MLE) of the GARCHmodel the integrated estimator has smaller MADE but largerRIMADE IMADE and IRADE The RADEs of the MLEand the integrated estimator are very close demonstrating thatthe integrated estimator is very impressive in forecasting thevolatility of this example

Note that the dynamic weights for the integrated estimatorare estimated by minimizing the variance of the combined esti-mator in (1) As one anonymous referee pointed out one maychoose to minimize the MSE instead which seems to be dif-ficult and computationally intensive in current situations Asargued in Section 31 the biases and variances trade-off havebeen considered indirectly in the time-domain and state-domainestimation methods The bias of the integrated estimator wascontrolled before the weights were chosen

To investigate whether the foregoing intuition is correct herewe opt for the suggestion from the referee to visually displayhow the integrated estimator dominates the state-domaintime-domain estimators over time through the dynamic weights andcompare the biases of the state-domain estimator and the inte-grated estimator The weights and biases are shown in Figure 2

Table 2 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 4073 2805 6494 5092 7379 7629 4090Average 205 204 183 187 178 169 204

Standard (times10minus2) 655 331 450 308 397 367 567Relative loss () 1544 1444 266 497 0 minus476 1475

IMADE Score () 6177 2554 5342 4608 7212 7546 4090Average (times10minus5) 332 367 330 336 318 304 362Standard (times10minus6) 1118 744 877 690 851 800 1113Relative loss () 440 1544 373 581 0 minus447 1390

MADE Score () 2821 4958 4975 5659 8815 5743 7963Average (times10minus4) 164 153 153 152 149 152 149Standard (times10minus6) 2314 2182 2049 2105 1869 1852 1876Relative loss () 1016 284 293 209 0 207 11

IRADE Score () 6277 2404 5109 4708 7062 7713 4040Average (times10minus3) 403 448 407 410 391 369 446Standard (times10minus3) 121 075 092 070 093 085 131Relative loss () 317 1475 411 498 0 minus553 1422

RADE Score () 3055 5042 4341 5910 8464 5893 7913Average (times10minus2) 20 19 19 19 19 19 19Standard (times10minus4) 153 145 141 142 134 133 137Relative loss () 513 142 165 96 0 106 minus25

ER Average (times10minus2) 495 545 538 528 525 516 500Standard (times10minus2) 157 113 128 112 136 125 149

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 625

(a) (b)

Figure 2 The Dynamic Weights for Prediction for Example 1 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

It is seen that the bias is actually controlled because both of theestimators have very close biases The weights are around 2and through the weights the integrated method improves boththe RiskMetrics method and the state-domain approach

Example 2 To simulate the interest rate data we considerthe CoxndashIngersollndashRoss (CIR) model

drt = κ(θ minus rt)dt + σ r12t dWt t ge t0

where the spot rate rt moves around a central location or long-run equilibrium level θ = 08571 at speed κ = 21459 The σ

is set at 07830 These parameters values given by Chapmanand Pearson (2000) satisfy the condition 2κθ ge σ 2 so that theprocess rt is stationary and positive The model has been studiedby Chapman and Pearson (2000) and Fan and Zhang (2003)

We simulated weekly data (600 runs) from the CIR modelFor each simulation we designated the first 900 observations

as the ldquoin-samplerdquo data and the last 300 observations the ldquoout-samplerdquo data The results summarized in Table 3 show that theintegrated method performs closely to the state-domain methodThis is mainly because the dynamic weights shown in Figure 3are very small and the integration estimator is very close to thestate-domain estimator supporting our statement at the end ofthe third paragraph in Section 1 Table 3 shows that the inte-grated estimator uniformly dominates the other estimators onaverage due to its highest score and lowest averaged RIMADEIMADE MADE IRADE and RADE The improvement inRIMADE IMADE and IRADE is about 100 This demon-strates that our integrated volatility method better captures thevolatility dynamics The historical simulation method performspoorly due to misspecification of the function of the volatilityparameter The results here show the advantage of aggregatingthe information of the time and state domains Note that all es-timators have reasonable ER values at level 05 and that the ERvalue of the integrated estimator is closest to 05

Table 3 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 501 1820 2104 3656 9933 2404 9917Average (times10minus1) 267 190 188 164 61 202 62Standard (times10minus1) 148 35 81 31 39 130 43Relative loss () 33732 21074 20798 16821 0 23012 112

IMADE Score () 768 1319 1469 2972 9967 2120 9950Average (times10minus6) 237 208 191 179 64 196 63Standard (times10minus6) 100 78 69 68 45 91 46Relative loss () 27162 22558 19905 18107 0 20696 minus163

MADE Score () 3823 5209 5776 5609 7145 5392 6945Average (times10minus5) 102 93 93 93 91 94 92Standard (times10minus6) 349 333 321 330 317 310 317Relative loss () 1100 212 221 162 0 229 21

IRADE Score () 818 1252 1402 3072 9983 2104 9950Average (times10minus4) 376 322 307 277 99 316 98Standard (times10minus4) 140 75 90 66 60 192 62Relative loss () 27855 22397 20898 17892 0 21821 minus111

RADE Score () 4040 5209 5042 5593 7379 4875 7279Average (times10minus2) 15 15 15 15 15 15 15Standard (times10minus4) 272 267 257 265 258 251 257Relative loss () 608 123 160 92 0 189 06

ER Average 056 055 054 053 049 053 049Standard 023 011 013 011 013 036 013

626 Journal of the American Statistical Association June 2007

(a) (b)

Figure 3 The Dynamic Weights for Prediction for Example 2 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and forthe Integration Estimator ( mdash) (b)

Again Figure 3(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

Example 3 To assess the performance of the proposed meth-ods under a nonstationary situation we now consider the geo-metric Brownian (GBM)

drt = μrt dt + σ rt dWt

where Wt is a standard one-dimensional Brownian motion Thisis a nonstationary process against which we check whether ourmethod continues to apply Note that the celebrated BlackndashScholes option price formula is derived from Osbornersquos as-sumption that the stock price follows the GBM model By Itocircrsquosformula we have

log rt minus log r0 = (μ minus σ 22)t + σ 2Wt

We set μ = 03 and σ = 26 in our simulations With theBrownian motion simulated from independent Gaussian incre-ments we can generate the samples for the GBM We simulate600 times with = 1252 For each simulation we generate1000 observations and use the first two-thirds of the observa-tions as in-sample data and the remaining observations as out-sample data

Because of the nonstationarity of the process the simulationresults are somewhat unstable with a few uncommon realiza-tions appearing Therefore we use a robust method to sum-marize the results For each measure we compute the relativeloss and the median along with the median absolute deviation(MAD) for scale estimation The MAD for sample ais

i=1 isdefined as MAD(a) = median|ai minus median(a)| i = 1 sIt is easy to see that MAD(a) provides a robust estimator ofthe standard deviation of ai Table 4 reports the results Thetable shows that the integrated estimator performs quite wellAll estimators have acceptable ER values The proposed esti-

Table 4 Robust Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 733 4300 3583 7283 6133 2733 5233Median (times10minus1) 2254 1561 1770 1490 1382 2036 1443

MAD 035 012 024 012 034 036 037Relative loss () 6310 1296 2805 781 0 4732 444

IMADE Score () 1333 4333 2633 7333 6517 2550 5300Median (times10minus7) 217 174 193 163 146 265 154MAD (times10minus8) 566 474 486 445 322 817 357

Relative loss () 4864 1913 3211 1205 0 8145 561

MADE Score () 3900 3950 2650 4933 5200 4233 5133Median (times10minus6) 115 111 111 110 109 111 109MAD (times10minus7) 236 244 240 244 231 230 230

Relative loss () 592 143 181 123 0 174 29

Score () 1417 4317 2500 7317 6467 2683 5300IRADE Median (times10minus4) 346 267 310 255 251 378 266

MAD (times10minus5) 599 406 488 389 566 847 644Relative loss () 3787 649 2358 149 0 5037 591

RADE Score () 3900 3700 2433 5117 6150 2950 5750Median (times10minus4) 1581 1509 1515 1508 1504 1661 1503MAD (times10minus4) 185 201 198 200 193 257 192

Relative loss () 509 30 73 23 0 1042 minus07

ER Median 054 051 054 051 048 057 0480MAD 010 003 004 003 006 007 006

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 627

(a) (b)

Figure 4 The Dynamic Weights for Prediction for Example 3 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

mators have the smallest measure values in median or robustscale This shows that our integrated method continues to per-form better than the others for this nonstationary case

Again Figure 4(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

52 Real Examples

In this section we apply the integrated volatility estimationmethods and other methods to the analysis of real financial data

521 Treasury Bond Here we consider the weekly returnsof three Treasury Bonds with terms of 1 5 and 10 years Thein-sample data are for January 4 1974ndashDecember 30 1994 andout-sample data are for January 6 1995ndashAugust 8 2003 Thetotal sample size is 1545 and the in-sample size is 1096 Theresults are reported in Table 5

From Table 5 for the 1-year Treasury Notes the integratedestimator is of the smallest MADE and the smallest RADE re-flecting that the integrated estimation method of the volatilityis the best among the seven methods For the bonds with 5-or 10-year maturities the seven estimators have close MADEsand RADEs where the historical simulation method is bet-ter than the RiskMetrics in terms of MADE and RADE andthe integrated estimation approach has the smallest MADEsIn addition the integrated estimator has ER values closest to

05 This demonstrates the advantage of using state-domaininformation which can improve the time-domain predictionof the volatilities in the interest dynamics The nonparametricBayesian method does not perform as well as the integrated es-timator because it uses fixed weights and the inverse-gammaprior which may not be true in practice

522 Exchange Rate We analyze the daily exchange rateof several foreign currencies against the US dollar The dataare for January 3 1994ndashAugust 1 2003 The in-sample dataconsist of the observations before January 1 2001 and the out-sample data consist of the remaining observations The resultsreported in Table 6 show that the integrated estimator has thesmallest MADEs and moderate RADEs for the exchange ratesNote that the RADE measure is effective only when the dataseem to be from a normal distribution and thus cannot calibratethe accuracy of volatility forecast for data from an unknowndistribution For this reason the RADE might not be a goodmeasure of performance for this example Both the integratedvolatility estimation and GARCH perform nicely for this exam-ple They outperform other nonparametric methods

Figure 5 reports the dynamic weights for the daily exchangerate of the UK pound to the US dollar Because the RiskMestimator is far more informative than the StaDo estimator ourintegrated estimator becomes basically the time-domain esti-mator by automatically choosing large dynamic weights Thisexemplifies our statement in the paragraph immediately before(1) in Section 1

Table 5 Comparisons of Several Volatility Estimation Methods

Term Measure (times10minus2) Hist RiskM Semi NonBay Integ GARCH StaDo

1 year MADE 1044 787 787 794 732 893 914RADE 5257 4231 4256 4225 4105 4402 4551

ER 2237 2009 2013 1562 3795 0 7366

5 years MADE 1207 1253 1296 1278 1201 1245 1638RADE 5315 5494 5630 5563 5571 5285 6543

ER 671 1339 1566 1112 5804 0 8929

10 years MADE 1041 1093 1103 1111 1018 1046 1366RADE 4939 5235 5296 5280 5152 4936 6008

ER 1112 1563 1790 1334 4911 0 6920

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 6: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 623

Table 1 Seven Volatility Estimators

Hist The historical methodRiskM The RiskMetrics method of J P MorganSemi The semiparametric estimator (SEV) of Fan and Gu (2003)NonBay The nonparametric Bayesian method in (18)Integ The integration method of time and state domains in (13)GARCH The maximum likelihood method based on a GARCH(1 1)

modelStaDo The state-domain estimation method in Section 22

smoothing with the smoothing parameter determined by min-imizing the prediction error It includes exponential smoothingwith λ selected by data as a specific example For our integratedmethods the parameter λ is taken as 94

The following five measures are used to assess the per-formance of different procedures for estimating the volatilityOther related measures also can be used (see Daveacute and Stahl1997)

Measure 1 Exceedence Ratio Against Confidence LevelThis measure counts the number of the events for which theloss of an asset exceeds the loss predicted by the normal modelat a given confidence α It was given by Fan and Gu (2003) andcomputed as

ER(σ 2t ) = mminus1

T+msum

i=T+1

I(Yi lt 13minus1(α)σi) (19)

where 13minus1(α) is the α-quantile of the standard normal distribu-tion and m is the size of the out-sample This gives an indicationof how effectively the volatility estimator can be used for pre-diction

As noted by Fan and Gu (2003) one shortcoming of the mea-sure is its large Monte Carlo error Unless the postsample sizem is sufficiently large this measure has difficulty by differenti-ating the performance of various estimators due to the presenceof large error margins Note that the ER depends strongly onthe assumption of normality In our simulation study we usethe true α-quantile of the error distribution instead of 13minus1(α)

in (19) to compute the ER For real data analysis we use theα-quantile of the last 250 residuals for the in-sample data

Measure 2 Mean Absolute Deviation Error To motivate thismeasure we first consider the MSEs

PE(σ 2t ) = mminus1

T+msum

i=T+1

(Y2i minus σ 2

i )2

The expected value can be decomposed as

E(PE) = mminus1T+msum

i=T+1

E(σ 2i minus σ 2

i )2 + mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )2

+ mminus1T+msum

i=T+1

E(Y2i minus σ 2

i )(σ 2i minus σ 2

i ) (20)

Because σ 2i is predicable at time ti and Y2

i minus σ 2i is a martingale

difference the third term is 0 [the drift is assumed to be neg-ligible in the forgoing formulation of PE otherwise the driftterm should be removed first In both cases it has an affec-tion O(2)] Therefore we need consider only the first two

terms The first term reflects the effectiveness of the estimatedvolatility whereas the second term is the size of the stochasticerror and independent of estimators As in all statistical pre-diction problems the second term is usually of an order ofmagnitude larger than the first term Thus a small improvementin PE could mean substantial improvement over the estimatedvolatility However due to the well-known fact that financialtime series contain outliers the MSE is not a robust measureTherefore we used the mean absolute deviation error (MADE)MADE(σ 2

t ) = mminus1 sumT+mi=T+1 |Y2

i minus σ 2i |

Measure 3 Square-Root Absolute Deviation Error An al-ternative to MADE is the Square-root absolute deviation error(RADE) defined as

RADE(σ 2t ) = mminus1

T+msum

i=T+1

∣∣| Yi | minusradic2πσi

∣∣

The constant factor comes from the fact that E|εt| = radic2π for

εt sim N(01) If the underlying error distribution deviates fromnormality then this measure is not robust

Measure 4 Ideal Mean Absolute Deviation Error and Oth-ers To assess the estimation of the volatility in simulations wealso can use the ideal mean absolute deviation error (IMADE)

IMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |

and the ideal square root absolute deviation error (IRADE)

IRADE = mminus1T+msum

i=T+1

|σi minus σi|

Another intuitive measure is the relative IMADE (RIMADE)

RIMADE = mminus1T+msum

i=T+1

|σ 2i minus σ 2

i |σ 2

i

This measure may create some outliers when the true volatilityis quite small at certain time points For real data analysis bothmeasures are not applicable

51 Simulations

To assess the performance of the five estimation methodslisted in Table 1 we compute the average and standard devi-ation of each of the four measures over 600 simulations Gen-erally speaking the smaller the average (or the standard devia-tion) the better the estimation approach We also compute theldquoscorerdquo of an estimator which is the percentage of times among600 simulations that the estimator outperforms the average ofthe 7 methods in terms of an effectiveness measure To be morespecific for example consider RiskMetrics using MADE as aneffectiveness measure Let mi be the MADE of the RiskMet-rics estimator at the ith simulation and let mi be the average ofthe MADEs for the five estimators at the ith simulation Thenthe ldquoscorerdquo of the RiskMetrics approach in terms of the MADEis defined as 1

600

sum600i=1 I(mi lt mi) Obviously estimators with

higher scores are preferred In addition we define the ldquorelativelossrdquo of an estimator σ 2

t relative to σ 2It in terms of MADEs as

RLOSS(σ 2t σ 2

It) = MADE(σ 2t )MADE(σ 2

It) minus 1

624 Journal of the American Statistical Association June 2007

where MADE(σ 2t ) is the average of MADE(σ 2

t ) among simula-tions

Example 1 In this example we compare the proposedmethods with the maximum likelihood estimation approachfor a parametric generalized autoregressive conditional het-eroscedasticity [GARCH(1 1)] model This will give an assess-ment of how well our methods perform compared with the mostefficient estimation method

Because the GARCH model does not have a state variablerti but the diffusion model does the diffusion model cannot beused to fit the data simulated from the GARCH model A sen-sible approach is to use the diffusion limit of the GARCH(1 1)model to generate data There are several interesting works onreconciling the two modeling approaches along this line forexample Nelson (1990) established the continuous-time dif-fusion limit for the discrete ARCH model Duan (1997) pro-posed an augmented GARCH model to unify various paramet-ric GARCH models and derived its diffusion limit and Wang(2002) studied the asymptotic relationship between GARCHmodels and diffusions Here we use the result on the diffusionlimit of the GARCH(1 1) model of Wang (2002)

Specifically we first generate data from the GARCH(1 1)model

Yt = σtεt

σ 2t = c0 + a0σ

2tminus1 + b0Y2

tminus1

where the εtrsquos are from the standard normal distribution and thetrue parameters c0 = 498 times 10minus7 a0 = 9289615 and b0 =0411574 are from the fitted values of the parameters for thedaily exchange rate of the Australian dollar with the US dollarfrom January 3 1994 to December 29 2000 The GARCH(1 1)model is then fitted by maximum likelihood estimation

Second we use the diffusion limit (see Wang 2002 sec 23)of the foregoing GARCH model

drt = σt dW1t

dσ 2t = (β0 + β1σ

2t )dt + β2σ

2t dW2t

where W1t and W2t are two independent standard Wienerprocess and the parameters are computed as (β0 β1 β2) =(25896 times 10minus5minus15538 4197) To simulate the diffusionprocess σ 2

t we use the discrete-time order 10 strong ap-proximation scheme of Kloeden Platen Schurz and Soslashrensen(1996)

For each of the two models we generate 600 series of dailydata each with length 1200 For each simulation we set thefirst 900 observations as the ldquoin-samplerdquo data and the last 300observations as the ldquoout-samplerdquo data The results summarizedin Table 2 show that all estimators have acceptable ER valuesclose to 05 The integrated estimator has minimum RIMADEIMADE MADE IRADE and RADE on average among allof the nonparametric estimators Compared with the most ef-ficient maximum likelihood estimator (MLE) of the GARCHmodel the integrated estimator has smaller MADE but largerRIMADE IMADE and IRADE The RADEs of the MLEand the integrated estimator are very close demonstrating thatthe integrated estimator is very impressive in forecasting thevolatility of this example

Note that the dynamic weights for the integrated estimatorare estimated by minimizing the variance of the combined esti-mator in (1) As one anonymous referee pointed out one maychoose to minimize the MSE instead which seems to be dif-ficult and computationally intensive in current situations Asargued in Section 31 the biases and variances trade-off havebeen considered indirectly in the time-domain and state-domainestimation methods The bias of the integrated estimator wascontrolled before the weights were chosen

To investigate whether the foregoing intuition is correct herewe opt for the suggestion from the referee to visually displayhow the integrated estimator dominates the state-domaintime-domain estimators over time through the dynamic weights andcompare the biases of the state-domain estimator and the inte-grated estimator The weights and biases are shown in Figure 2

Table 2 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 4073 2805 6494 5092 7379 7629 4090Average 205 204 183 187 178 169 204

Standard (times10minus2) 655 331 450 308 397 367 567Relative loss () 1544 1444 266 497 0 minus476 1475

IMADE Score () 6177 2554 5342 4608 7212 7546 4090Average (times10minus5) 332 367 330 336 318 304 362Standard (times10minus6) 1118 744 877 690 851 800 1113Relative loss () 440 1544 373 581 0 minus447 1390

MADE Score () 2821 4958 4975 5659 8815 5743 7963Average (times10minus4) 164 153 153 152 149 152 149Standard (times10minus6) 2314 2182 2049 2105 1869 1852 1876Relative loss () 1016 284 293 209 0 207 11

IRADE Score () 6277 2404 5109 4708 7062 7713 4040Average (times10minus3) 403 448 407 410 391 369 446Standard (times10minus3) 121 075 092 070 093 085 131Relative loss () 317 1475 411 498 0 minus553 1422

RADE Score () 3055 5042 4341 5910 8464 5893 7913Average (times10minus2) 20 19 19 19 19 19 19Standard (times10minus4) 153 145 141 142 134 133 137Relative loss () 513 142 165 96 0 106 minus25

ER Average (times10minus2) 495 545 538 528 525 516 500Standard (times10minus2) 157 113 128 112 136 125 149

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 625

(a) (b)

Figure 2 The Dynamic Weights for Prediction for Example 1 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

It is seen that the bias is actually controlled because both of theestimators have very close biases The weights are around 2and through the weights the integrated method improves boththe RiskMetrics method and the state-domain approach

Example 2 To simulate the interest rate data we considerthe CoxndashIngersollndashRoss (CIR) model

drt = κ(θ minus rt)dt + σ r12t dWt t ge t0

where the spot rate rt moves around a central location or long-run equilibrium level θ = 08571 at speed κ = 21459 The σ

is set at 07830 These parameters values given by Chapmanand Pearson (2000) satisfy the condition 2κθ ge σ 2 so that theprocess rt is stationary and positive The model has been studiedby Chapman and Pearson (2000) and Fan and Zhang (2003)

We simulated weekly data (600 runs) from the CIR modelFor each simulation we designated the first 900 observations

as the ldquoin-samplerdquo data and the last 300 observations the ldquoout-samplerdquo data The results summarized in Table 3 show that theintegrated method performs closely to the state-domain methodThis is mainly because the dynamic weights shown in Figure 3are very small and the integration estimator is very close to thestate-domain estimator supporting our statement at the end ofthe third paragraph in Section 1 Table 3 shows that the inte-grated estimator uniformly dominates the other estimators onaverage due to its highest score and lowest averaged RIMADEIMADE MADE IRADE and RADE The improvement inRIMADE IMADE and IRADE is about 100 This demon-strates that our integrated volatility method better captures thevolatility dynamics The historical simulation method performspoorly due to misspecification of the function of the volatilityparameter The results here show the advantage of aggregatingthe information of the time and state domains Note that all es-timators have reasonable ER values at level 05 and that the ERvalue of the integrated estimator is closest to 05

Table 3 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 501 1820 2104 3656 9933 2404 9917Average (times10minus1) 267 190 188 164 61 202 62Standard (times10minus1) 148 35 81 31 39 130 43Relative loss () 33732 21074 20798 16821 0 23012 112

IMADE Score () 768 1319 1469 2972 9967 2120 9950Average (times10minus6) 237 208 191 179 64 196 63Standard (times10minus6) 100 78 69 68 45 91 46Relative loss () 27162 22558 19905 18107 0 20696 minus163

MADE Score () 3823 5209 5776 5609 7145 5392 6945Average (times10minus5) 102 93 93 93 91 94 92Standard (times10minus6) 349 333 321 330 317 310 317Relative loss () 1100 212 221 162 0 229 21

IRADE Score () 818 1252 1402 3072 9983 2104 9950Average (times10minus4) 376 322 307 277 99 316 98Standard (times10minus4) 140 75 90 66 60 192 62Relative loss () 27855 22397 20898 17892 0 21821 minus111

RADE Score () 4040 5209 5042 5593 7379 4875 7279Average (times10minus2) 15 15 15 15 15 15 15Standard (times10minus4) 272 267 257 265 258 251 257Relative loss () 608 123 160 92 0 189 06

ER Average 056 055 054 053 049 053 049Standard 023 011 013 011 013 036 013

626 Journal of the American Statistical Association June 2007

(a) (b)

Figure 3 The Dynamic Weights for Prediction for Example 2 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and forthe Integration Estimator ( mdash) (b)

Again Figure 3(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

Example 3 To assess the performance of the proposed meth-ods under a nonstationary situation we now consider the geo-metric Brownian (GBM)

drt = μrt dt + σ rt dWt

where Wt is a standard one-dimensional Brownian motion Thisis a nonstationary process against which we check whether ourmethod continues to apply Note that the celebrated BlackndashScholes option price formula is derived from Osbornersquos as-sumption that the stock price follows the GBM model By Itocircrsquosformula we have

log rt minus log r0 = (μ minus σ 22)t + σ 2Wt

We set μ = 03 and σ = 26 in our simulations With theBrownian motion simulated from independent Gaussian incre-ments we can generate the samples for the GBM We simulate600 times with = 1252 For each simulation we generate1000 observations and use the first two-thirds of the observa-tions as in-sample data and the remaining observations as out-sample data

Because of the nonstationarity of the process the simulationresults are somewhat unstable with a few uncommon realiza-tions appearing Therefore we use a robust method to sum-marize the results For each measure we compute the relativeloss and the median along with the median absolute deviation(MAD) for scale estimation The MAD for sample ais

i=1 isdefined as MAD(a) = median|ai minus median(a)| i = 1 sIt is easy to see that MAD(a) provides a robust estimator ofthe standard deviation of ai Table 4 reports the results Thetable shows that the integrated estimator performs quite wellAll estimators have acceptable ER values The proposed esti-

Table 4 Robust Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 733 4300 3583 7283 6133 2733 5233Median (times10minus1) 2254 1561 1770 1490 1382 2036 1443

MAD 035 012 024 012 034 036 037Relative loss () 6310 1296 2805 781 0 4732 444

IMADE Score () 1333 4333 2633 7333 6517 2550 5300Median (times10minus7) 217 174 193 163 146 265 154MAD (times10minus8) 566 474 486 445 322 817 357

Relative loss () 4864 1913 3211 1205 0 8145 561

MADE Score () 3900 3950 2650 4933 5200 4233 5133Median (times10minus6) 115 111 111 110 109 111 109MAD (times10minus7) 236 244 240 244 231 230 230

Relative loss () 592 143 181 123 0 174 29

Score () 1417 4317 2500 7317 6467 2683 5300IRADE Median (times10minus4) 346 267 310 255 251 378 266

MAD (times10minus5) 599 406 488 389 566 847 644Relative loss () 3787 649 2358 149 0 5037 591

RADE Score () 3900 3700 2433 5117 6150 2950 5750Median (times10minus4) 1581 1509 1515 1508 1504 1661 1503MAD (times10minus4) 185 201 198 200 193 257 192

Relative loss () 509 30 73 23 0 1042 minus07

ER Median 054 051 054 051 048 057 0480MAD 010 003 004 003 006 007 006

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 627

(a) (b)

Figure 4 The Dynamic Weights for Prediction for Example 3 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

mators have the smallest measure values in median or robustscale This shows that our integrated method continues to per-form better than the others for this nonstationary case

Again Figure 4(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

52 Real Examples

In this section we apply the integrated volatility estimationmethods and other methods to the analysis of real financial data

521 Treasury Bond Here we consider the weekly returnsof three Treasury Bonds with terms of 1 5 and 10 years Thein-sample data are for January 4 1974ndashDecember 30 1994 andout-sample data are for January 6 1995ndashAugust 8 2003 Thetotal sample size is 1545 and the in-sample size is 1096 Theresults are reported in Table 5

From Table 5 for the 1-year Treasury Notes the integratedestimator is of the smallest MADE and the smallest RADE re-flecting that the integrated estimation method of the volatilityis the best among the seven methods For the bonds with 5-or 10-year maturities the seven estimators have close MADEsand RADEs where the historical simulation method is bet-ter than the RiskMetrics in terms of MADE and RADE andthe integrated estimation approach has the smallest MADEsIn addition the integrated estimator has ER values closest to

05 This demonstrates the advantage of using state-domaininformation which can improve the time-domain predictionof the volatilities in the interest dynamics The nonparametricBayesian method does not perform as well as the integrated es-timator because it uses fixed weights and the inverse-gammaprior which may not be true in practice

522 Exchange Rate We analyze the daily exchange rateof several foreign currencies against the US dollar The dataare for January 3 1994ndashAugust 1 2003 The in-sample dataconsist of the observations before January 1 2001 and the out-sample data consist of the remaining observations The resultsreported in Table 6 show that the integrated estimator has thesmallest MADEs and moderate RADEs for the exchange ratesNote that the RADE measure is effective only when the dataseem to be from a normal distribution and thus cannot calibratethe accuracy of volatility forecast for data from an unknowndistribution For this reason the RADE might not be a goodmeasure of performance for this example Both the integratedvolatility estimation and GARCH perform nicely for this exam-ple They outperform other nonparametric methods

Figure 5 reports the dynamic weights for the daily exchangerate of the UK pound to the US dollar Because the RiskMestimator is far more informative than the StaDo estimator ourintegrated estimator becomes basically the time-domain esti-mator by automatically choosing large dynamic weights Thisexemplifies our statement in the paragraph immediately before(1) in Section 1

Table 5 Comparisons of Several Volatility Estimation Methods

Term Measure (times10minus2) Hist RiskM Semi NonBay Integ GARCH StaDo

1 year MADE 1044 787 787 794 732 893 914RADE 5257 4231 4256 4225 4105 4402 4551

ER 2237 2009 2013 1562 3795 0 7366

5 years MADE 1207 1253 1296 1278 1201 1245 1638RADE 5315 5494 5630 5563 5571 5285 6543

ER 671 1339 1566 1112 5804 0 8929

10 years MADE 1041 1093 1103 1111 1018 1046 1366RADE 4939 5235 5296 5280 5152 4936 6008

ER 1112 1563 1790 1334 4911 0 6920

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 7: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

624 Journal of the American Statistical Association June 2007

where MADE(σ 2t ) is the average of MADE(σ 2

t ) among simula-tions

Example 1 In this example we compare the proposedmethods with the maximum likelihood estimation approachfor a parametric generalized autoregressive conditional het-eroscedasticity [GARCH(1 1)] model This will give an assess-ment of how well our methods perform compared with the mostefficient estimation method

Because the GARCH model does not have a state variablerti but the diffusion model does the diffusion model cannot beused to fit the data simulated from the GARCH model A sen-sible approach is to use the diffusion limit of the GARCH(1 1)model to generate data There are several interesting works onreconciling the two modeling approaches along this line forexample Nelson (1990) established the continuous-time dif-fusion limit for the discrete ARCH model Duan (1997) pro-posed an augmented GARCH model to unify various paramet-ric GARCH models and derived its diffusion limit and Wang(2002) studied the asymptotic relationship between GARCHmodels and diffusions Here we use the result on the diffusionlimit of the GARCH(1 1) model of Wang (2002)

Specifically we first generate data from the GARCH(1 1)model

Yt = σtεt

σ 2t = c0 + a0σ

2tminus1 + b0Y2

tminus1

where the εtrsquos are from the standard normal distribution and thetrue parameters c0 = 498 times 10minus7 a0 = 9289615 and b0 =0411574 are from the fitted values of the parameters for thedaily exchange rate of the Australian dollar with the US dollarfrom January 3 1994 to December 29 2000 The GARCH(1 1)model is then fitted by maximum likelihood estimation

Second we use the diffusion limit (see Wang 2002 sec 23)of the foregoing GARCH model

drt = σt dW1t

dσ 2t = (β0 + β1σ

2t )dt + β2σ

2t dW2t

where W1t and W2t are two independent standard Wienerprocess and the parameters are computed as (β0 β1 β2) =(25896 times 10minus5minus15538 4197) To simulate the diffusionprocess σ 2

t we use the discrete-time order 10 strong ap-proximation scheme of Kloeden Platen Schurz and Soslashrensen(1996)

For each of the two models we generate 600 series of dailydata each with length 1200 For each simulation we set thefirst 900 observations as the ldquoin-samplerdquo data and the last 300observations as the ldquoout-samplerdquo data The results summarizedin Table 2 show that all estimators have acceptable ER valuesclose to 05 The integrated estimator has minimum RIMADEIMADE MADE IRADE and RADE on average among allof the nonparametric estimators Compared with the most ef-ficient maximum likelihood estimator (MLE) of the GARCHmodel the integrated estimator has smaller MADE but largerRIMADE IMADE and IRADE The RADEs of the MLEand the integrated estimator are very close demonstrating thatthe integrated estimator is very impressive in forecasting thevolatility of this example

Note that the dynamic weights for the integrated estimatorare estimated by minimizing the variance of the combined esti-mator in (1) As one anonymous referee pointed out one maychoose to minimize the MSE instead which seems to be dif-ficult and computationally intensive in current situations Asargued in Section 31 the biases and variances trade-off havebeen considered indirectly in the time-domain and state-domainestimation methods The bias of the integrated estimator wascontrolled before the weights were chosen

To investigate whether the foregoing intuition is correct herewe opt for the suggestion from the referee to visually displayhow the integrated estimator dominates the state-domaintime-domain estimators over time through the dynamic weights andcompare the biases of the state-domain estimator and the inte-grated estimator The weights and biases are shown in Figure 2

Table 2 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 4073 2805 6494 5092 7379 7629 4090Average 205 204 183 187 178 169 204

Standard (times10minus2) 655 331 450 308 397 367 567Relative loss () 1544 1444 266 497 0 minus476 1475

IMADE Score () 6177 2554 5342 4608 7212 7546 4090Average (times10minus5) 332 367 330 336 318 304 362Standard (times10minus6) 1118 744 877 690 851 800 1113Relative loss () 440 1544 373 581 0 minus447 1390

MADE Score () 2821 4958 4975 5659 8815 5743 7963Average (times10minus4) 164 153 153 152 149 152 149Standard (times10minus6) 2314 2182 2049 2105 1869 1852 1876Relative loss () 1016 284 293 209 0 207 11

IRADE Score () 6277 2404 5109 4708 7062 7713 4040Average (times10minus3) 403 448 407 410 391 369 446Standard (times10minus3) 121 075 092 070 093 085 131Relative loss () 317 1475 411 498 0 minus553 1422

RADE Score () 3055 5042 4341 5910 8464 5893 7913Average (times10minus2) 20 19 19 19 19 19 19Standard (times10minus4) 153 145 141 142 134 133 137Relative loss () 513 142 165 96 0 106 minus25

ER Average (times10minus2) 495 545 538 528 525 516 500Standard (times10minus2) 157 113 128 112 136 125 149

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 625

(a) (b)

Figure 2 The Dynamic Weights for Prediction for Example 1 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

It is seen that the bias is actually controlled because both of theestimators have very close biases The weights are around 2and through the weights the integrated method improves boththe RiskMetrics method and the state-domain approach

Example 2 To simulate the interest rate data we considerthe CoxndashIngersollndashRoss (CIR) model

drt = κ(θ minus rt)dt + σ r12t dWt t ge t0

where the spot rate rt moves around a central location or long-run equilibrium level θ = 08571 at speed κ = 21459 The σ

is set at 07830 These parameters values given by Chapmanand Pearson (2000) satisfy the condition 2κθ ge σ 2 so that theprocess rt is stationary and positive The model has been studiedby Chapman and Pearson (2000) and Fan and Zhang (2003)

We simulated weekly data (600 runs) from the CIR modelFor each simulation we designated the first 900 observations

as the ldquoin-samplerdquo data and the last 300 observations the ldquoout-samplerdquo data The results summarized in Table 3 show that theintegrated method performs closely to the state-domain methodThis is mainly because the dynamic weights shown in Figure 3are very small and the integration estimator is very close to thestate-domain estimator supporting our statement at the end ofthe third paragraph in Section 1 Table 3 shows that the inte-grated estimator uniformly dominates the other estimators onaverage due to its highest score and lowest averaged RIMADEIMADE MADE IRADE and RADE The improvement inRIMADE IMADE and IRADE is about 100 This demon-strates that our integrated volatility method better captures thevolatility dynamics The historical simulation method performspoorly due to misspecification of the function of the volatilityparameter The results here show the advantage of aggregatingthe information of the time and state domains Note that all es-timators have reasonable ER values at level 05 and that the ERvalue of the integrated estimator is closest to 05

Table 3 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 501 1820 2104 3656 9933 2404 9917Average (times10minus1) 267 190 188 164 61 202 62Standard (times10minus1) 148 35 81 31 39 130 43Relative loss () 33732 21074 20798 16821 0 23012 112

IMADE Score () 768 1319 1469 2972 9967 2120 9950Average (times10minus6) 237 208 191 179 64 196 63Standard (times10minus6) 100 78 69 68 45 91 46Relative loss () 27162 22558 19905 18107 0 20696 minus163

MADE Score () 3823 5209 5776 5609 7145 5392 6945Average (times10minus5) 102 93 93 93 91 94 92Standard (times10minus6) 349 333 321 330 317 310 317Relative loss () 1100 212 221 162 0 229 21

IRADE Score () 818 1252 1402 3072 9983 2104 9950Average (times10minus4) 376 322 307 277 99 316 98Standard (times10minus4) 140 75 90 66 60 192 62Relative loss () 27855 22397 20898 17892 0 21821 minus111

RADE Score () 4040 5209 5042 5593 7379 4875 7279Average (times10minus2) 15 15 15 15 15 15 15Standard (times10minus4) 272 267 257 265 258 251 257Relative loss () 608 123 160 92 0 189 06

ER Average 056 055 054 053 049 053 049Standard 023 011 013 011 013 036 013

626 Journal of the American Statistical Association June 2007

(a) (b)

Figure 3 The Dynamic Weights for Prediction for Example 2 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and forthe Integration Estimator ( mdash) (b)

Again Figure 3(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

Example 3 To assess the performance of the proposed meth-ods under a nonstationary situation we now consider the geo-metric Brownian (GBM)

drt = μrt dt + σ rt dWt

where Wt is a standard one-dimensional Brownian motion Thisis a nonstationary process against which we check whether ourmethod continues to apply Note that the celebrated BlackndashScholes option price formula is derived from Osbornersquos as-sumption that the stock price follows the GBM model By Itocircrsquosformula we have

log rt minus log r0 = (μ minus σ 22)t + σ 2Wt

We set μ = 03 and σ = 26 in our simulations With theBrownian motion simulated from independent Gaussian incre-ments we can generate the samples for the GBM We simulate600 times with = 1252 For each simulation we generate1000 observations and use the first two-thirds of the observa-tions as in-sample data and the remaining observations as out-sample data

Because of the nonstationarity of the process the simulationresults are somewhat unstable with a few uncommon realiza-tions appearing Therefore we use a robust method to sum-marize the results For each measure we compute the relativeloss and the median along with the median absolute deviation(MAD) for scale estimation The MAD for sample ais

i=1 isdefined as MAD(a) = median|ai minus median(a)| i = 1 sIt is easy to see that MAD(a) provides a robust estimator ofthe standard deviation of ai Table 4 reports the results Thetable shows that the integrated estimator performs quite wellAll estimators have acceptable ER values The proposed esti-

Table 4 Robust Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 733 4300 3583 7283 6133 2733 5233Median (times10minus1) 2254 1561 1770 1490 1382 2036 1443

MAD 035 012 024 012 034 036 037Relative loss () 6310 1296 2805 781 0 4732 444

IMADE Score () 1333 4333 2633 7333 6517 2550 5300Median (times10minus7) 217 174 193 163 146 265 154MAD (times10minus8) 566 474 486 445 322 817 357

Relative loss () 4864 1913 3211 1205 0 8145 561

MADE Score () 3900 3950 2650 4933 5200 4233 5133Median (times10minus6) 115 111 111 110 109 111 109MAD (times10minus7) 236 244 240 244 231 230 230

Relative loss () 592 143 181 123 0 174 29

Score () 1417 4317 2500 7317 6467 2683 5300IRADE Median (times10minus4) 346 267 310 255 251 378 266

MAD (times10minus5) 599 406 488 389 566 847 644Relative loss () 3787 649 2358 149 0 5037 591

RADE Score () 3900 3700 2433 5117 6150 2950 5750Median (times10minus4) 1581 1509 1515 1508 1504 1661 1503MAD (times10minus4) 185 201 198 200 193 257 192

Relative loss () 509 30 73 23 0 1042 minus07

ER Median 054 051 054 051 048 057 0480MAD 010 003 004 003 006 007 006

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 627

(a) (b)

Figure 4 The Dynamic Weights for Prediction for Example 3 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

mators have the smallest measure values in median or robustscale This shows that our integrated method continues to per-form better than the others for this nonstationary case

Again Figure 4(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

52 Real Examples

In this section we apply the integrated volatility estimationmethods and other methods to the analysis of real financial data

521 Treasury Bond Here we consider the weekly returnsof three Treasury Bonds with terms of 1 5 and 10 years Thein-sample data are for January 4 1974ndashDecember 30 1994 andout-sample data are for January 6 1995ndashAugust 8 2003 Thetotal sample size is 1545 and the in-sample size is 1096 Theresults are reported in Table 5

From Table 5 for the 1-year Treasury Notes the integratedestimator is of the smallest MADE and the smallest RADE re-flecting that the integrated estimation method of the volatilityis the best among the seven methods For the bonds with 5-or 10-year maturities the seven estimators have close MADEsand RADEs where the historical simulation method is bet-ter than the RiskMetrics in terms of MADE and RADE andthe integrated estimation approach has the smallest MADEsIn addition the integrated estimator has ER values closest to

05 This demonstrates the advantage of using state-domaininformation which can improve the time-domain predictionof the volatilities in the interest dynamics The nonparametricBayesian method does not perform as well as the integrated es-timator because it uses fixed weights and the inverse-gammaprior which may not be true in practice

522 Exchange Rate We analyze the daily exchange rateof several foreign currencies against the US dollar The dataare for January 3 1994ndashAugust 1 2003 The in-sample dataconsist of the observations before January 1 2001 and the out-sample data consist of the remaining observations The resultsreported in Table 6 show that the integrated estimator has thesmallest MADEs and moderate RADEs for the exchange ratesNote that the RADE measure is effective only when the dataseem to be from a normal distribution and thus cannot calibratethe accuracy of volatility forecast for data from an unknowndistribution For this reason the RADE might not be a goodmeasure of performance for this example Both the integratedvolatility estimation and GARCH perform nicely for this exam-ple They outperform other nonparametric methods

Figure 5 reports the dynamic weights for the daily exchangerate of the UK pound to the US dollar Because the RiskMestimator is far more informative than the StaDo estimator ourintegrated estimator becomes basically the time-domain esti-mator by automatically choosing large dynamic weights Thisexemplifies our statement in the paragraph immediately before(1) in Section 1

Table 5 Comparisons of Several Volatility Estimation Methods

Term Measure (times10minus2) Hist RiskM Semi NonBay Integ GARCH StaDo

1 year MADE 1044 787 787 794 732 893 914RADE 5257 4231 4256 4225 4105 4402 4551

ER 2237 2009 2013 1562 3795 0 7366

5 years MADE 1207 1253 1296 1278 1201 1245 1638RADE 5315 5494 5630 5563 5571 5285 6543

ER 671 1339 1566 1112 5804 0 8929

10 years MADE 1041 1093 1103 1111 1018 1046 1366RADE 4939 5235 5296 5280 5152 4936 6008

ER 1112 1563 1790 1334 4911 0 6920

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 8: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 625

(a) (b)

Figure 2 The Dynamic Weights for Prediction for Example 1 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

It is seen that the bias is actually controlled because both of theestimators have very close biases The weights are around 2and through the weights the integrated method improves boththe RiskMetrics method and the state-domain approach

Example 2 To simulate the interest rate data we considerthe CoxndashIngersollndashRoss (CIR) model

drt = κ(θ minus rt)dt + σ r12t dWt t ge t0

where the spot rate rt moves around a central location or long-run equilibrium level θ = 08571 at speed κ = 21459 The σ

is set at 07830 These parameters values given by Chapmanand Pearson (2000) satisfy the condition 2κθ ge σ 2 so that theprocess rt is stationary and positive The model has been studiedby Chapman and Pearson (2000) and Fan and Zhang (2003)

We simulated weekly data (600 runs) from the CIR modelFor each simulation we designated the first 900 observations

as the ldquoin-samplerdquo data and the last 300 observations the ldquoout-samplerdquo data The results summarized in Table 3 show that theintegrated method performs closely to the state-domain methodThis is mainly because the dynamic weights shown in Figure 3are very small and the integration estimator is very close to thestate-domain estimator supporting our statement at the end ofthe third paragraph in Section 1 Table 3 shows that the inte-grated estimator uniformly dominates the other estimators onaverage due to its highest score and lowest averaged RIMADEIMADE MADE IRADE and RADE The improvement inRIMADE IMADE and IRADE is about 100 This demon-strates that our integrated volatility method better captures thevolatility dynamics The historical simulation method performspoorly due to misspecification of the function of the volatilityparameter The results here show the advantage of aggregatingthe information of the time and state domains Note that all es-timators have reasonable ER values at level 05 and that the ERvalue of the integrated estimator is closest to 05

Table 3 Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 501 1820 2104 3656 9933 2404 9917Average (times10minus1) 267 190 188 164 61 202 62Standard (times10minus1) 148 35 81 31 39 130 43Relative loss () 33732 21074 20798 16821 0 23012 112

IMADE Score () 768 1319 1469 2972 9967 2120 9950Average (times10minus6) 237 208 191 179 64 196 63Standard (times10minus6) 100 78 69 68 45 91 46Relative loss () 27162 22558 19905 18107 0 20696 minus163

MADE Score () 3823 5209 5776 5609 7145 5392 6945Average (times10minus5) 102 93 93 93 91 94 92Standard (times10minus6) 349 333 321 330 317 310 317Relative loss () 1100 212 221 162 0 229 21

IRADE Score () 818 1252 1402 3072 9983 2104 9950Average (times10minus4) 376 322 307 277 99 316 98Standard (times10minus4) 140 75 90 66 60 192 62Relative loss () 27855 22397 20898 17892 0 21821 minus111

RADE Score () 4040 5209 5042 5593 7379 4875 7279Average (times10minus2) 15 15 15 15 15 15 15Standard (times10minus4) 272 267 257 265 258 251 257Relative loss () 608 123 160 92 0 189 06

ER Average 056 055 054 053 049 053 049Standard 023 011 013 011 013 036 013

626 Journal of the American Statistical Association June 2007

(a) (b)

Figure 3 The Dynamic Weights for Prediction for Example 2 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and forthe Integration Estimator ( mdash) (b)

Again Figure 3(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

Example 3 To assess the performance of the proposed meth-ods under a nonstationary situation we now consider the geo-metric Brownian (GBM)

drt = μrt dt + σ rt dWt

where Wt is a standard one-dimensional Brownian motion Thisis a nonstationary process against which we check whether ourmethod continues to apply Note that the celebrated BlackndashScholes option price formula is derived from Osbornersquos as-sumption that the stock price follows the GBM model By Itocircrsquosformula we have

log rt minus log r0 = (μ minus σ 22)t + σ 2Wt

We set μ = 03 and σ = 26 in our simulations With theBrownian motion simulated from independent Gaussian incre-ments we can generate the samples for the GBM We simulate600 times with = 1252 For each simulation we generate1000 observations and use the first two-thirds of the observa-tions as in-sample data and the remaining observations as out-sample data

Because of the nonstationarity of the process the simulationresults are somewhat unstable with a few uncommon realiza-tions appearing Therefore we use a robust method to sum-marize the results For each measure we compute the relativeloss and the median along with the median absolute deviation(MAD) for scale estimation The MAD for sample ais

i=1 isdefined as MAD(a) = median|ai minus median(a)| i = 1 sIt is easy to see that MAD(a) provides a robust estimator ofthe standard deviation of ai Table 4 reports the results Thetable shows that the integrated estimator performs quite wellAll estimators have acceptable ER values The proposed esti-

Table 4 Robust Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 733 4300 3583 7283 6133 2733 5233Median (times10minus1) 2254 1561 1770 1490 1382 2036 1443

MAD 035 012 024 012 034 036 037Relative loss () 6310 1296 2805 781 0 4732 444

IMADE Score () 1333 4333 2633 7333 6517 2550 5300Median (times10minus7) 217 174 193 163 146 265 154MAD (times10minus8) 566 474 486 445 322 817 357

Relative loss () 4864 1913 3211 1205 0 8145 561

MADE Score () 3900 3950 2650 4933 5200 4233 5133Median (times10minus6) 115 111 111 110 109 111 109MAD (times10minus7) 236 244 240 244 231 230 230

Relative loss () 592 143 181 123 0 174 29

Score () 1417 4317 2500 7317 6467 2683 5300IRADE Median (times10minus4) 346 267 310 255 251 378 266

MAD (times10minus5) 599 406 488 389 566 847 644Relative loss () 3787 649 2358 149 0 5037 591

RADE Score () 3900 3700 2433 5117 6150 2950 5750Median (times10minus4) 1581 1509 1515 1508 1504 1661 1503MAD (times10minus4) 185 201 198 200 193 257 192

Relative loss () 509 30 73 23 0 1042 minus07

ER Median 054 051 054 051 048 057 0480MAD 010 003 004 003 006 007 006

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 627

(a) (b)

Figure 4 The Dynamic Weights for Prediction for Example 3 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

mators have the smallest measure values in median or robustscale This shows that our integrated method continues to per-form better than the others for this nonstationary case

Again Figure 4(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

52 Real Examples

In this section we apply the integrated volatility estimationmethods and other methods to the analysis of real financial data

521 Treasury Bond Here we consider the weekly returnsof three Treasury Bonds with terms of 1 5 and 10 years Thein-sample data are for January 4 1974ndashDecember 30 1994 andout-sample data are for January 6 1995ndashAugust 8 2003 Thetotal sample size is 1545 and the in-sample size is 1096 Theresults are reported in Table 5

From Table 5 for the 1-year Treasury Notes the integratedestimator is of the smallest MADE and the smallest RADE re-flecting that the integrated estimation method of the volatilityis the best among the seven methods For the bonds with 5-or 10-year maturities the seven estimators have close MADEsand RADEs where the historical simulation method is bet-ter than the RiskMetrics in terms of MADE and RADE andthe integrated estimation approach has the smallest MADEsIn addition the integrated estimator has ER values closest to

05 This demonstrates the advantage of using state-domaininformation which can improve the time-domain predictionof the volatilities in the interest dynamics The nonparametricBayesian method does not perform as well as the integrated es-timator because it uses fixed weights and the inverse-gammaprior which may not be true in practice

522 Exchange Rate We analyze the daily exchange rateof several foreign currencies against the US dollar The dataare for January 3 1994ndashAugust 1 2003 The in-sample dataconsist of the observations before January 1 2001 and the out-sample data consist of the remaining observations The resultsreported in Table 6 show that the integrated estimator has thesmallest MADEs and moderate RADEs for the exchange ratesNote that the RADE measure is effective only when the dataseem to be from a normal distribution and thus cannot calibratethe accuracy of volatility forecast for data from an unknowndistribution For this reason the RADE might not be a goodmeasure of performance for this example Both the integratedvolatility estimation and GARCH perform nicely for this exam-ple They outperform other nonparametric methods

Figure 5 reports the dynamic weights for the daily exchangerate of the UK pound to the US dollar Because the RiskMestimator is far more informative than the StaDo estimator ourintegrated estimator becomes basically the time-domain esti-mator by automatically choosing large dynamic weights Thisexemplifies our statement in the paragraph immediately before(1) in Section 1

Table 5 Comparisons of Several Volatility Estimation Methods

Term Measure (times10minus2) Hist RiskM Semi NonBay Integ GARCH StaDo

1 year MADE 1044 787 787 794 732 893 914RADE 5257 4231 4256 4225 4105 4402 4551

ER 2237 2009 2013 1562 3795 0 7366

5 years MADE 1207 1253 1296 1278 1201 1245 1638RADE 5315 5494 5630 5563 5571 5285 6543

ER 671 1339 1566 1112 5804 0 8929

10 years MADE 1041 1093 1103 1111 1018 1046 1366RADE 4939 5235 5296 5280 5152 4936 6008

ER 1112 1563 1790 1334 4911 0 6920

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 9: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

626 Journal of the American Statistical Association June 2007

(a) (b)

Figure 3 The Dynamic Weights for Prediction for Example 2 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and forthe Integration Estimator ( mdash) (b)

Again Figure 3(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

Example 3 To assess the performance of the proposed meth-ods under a nonstationary situation we now consider the geo-metric Brownian (GBM)

drt = μrt dt + σ rt dWt

where Wt is a standard one-dimensional Brownian motion Thisis a nonstationary process against which we check whether ourmethod continues to apply Note that the celebrated BlackndashScholes option price formula is derived from Osbornersquos as-sumption that the stock price follows the GBM model By Itocircrsquosformula we have

log rt minus log r0 = (μ minus σ 22)t + σ 2Wt

We set μ = 03 and σ = 26 in our simulations With theBrownian motion simulated from independent Gaussian incre-ments we can generate the samples for the GBM We simulate600 times with = 1252 For each simulation we generate1000 observations and use the first two-thirds of the observa-tions as in-sample data and the remaining observations as out-sample data

Because of the nonstationarity of the process the simulationresults are somewhat unstable with a few uncommon realiza-tions appearing Therefore we use a robust method to sum-marize the results For each measure we compute the relativeloss and the median along with the median absolute deviation(MAD) for scale estimation The MAD for sample ais

i=1 isdefined as MAD(a) = median|ai minus median(a)| i = 1 sIt is easy to see that MAD(a) provides a robust estimator ofthe standard deviation of ai Table 4 reports the results Thetable shows that the integrated estimator performs quite wellAll estimators have acceptable ER values The proposed esti-

Table 4 Robust Comparisons of Several Volatility Estimation Methods

Measure Empirical formula Hist RiskM Semi NonBay Integ GARCH StaDo

RIMADE Score () 733 4300 3583 7283 6133 2733 5233Median (times10minus1) 2254 1561 1770 1490 1382 2036 1443

MAD 035 012 024 012 034 036 037Relative loss () 6310 1296 2805 781 0 4732 444

IMADE Score () 1333 4333 2633 7333 6517 2550 5300Median (times10minus7) 217 174 193 163 146 265 154MAD (times10minus8) 566 474 486 445 322 817 357

Relative loss () 4864 1913 3211 1205 0 8145 561

MADE Score () 3900 3950 2650 4933 5200 4233 5133Median (times10minus6) 115 111 111 110 109 111 109MAD (times10minus7) 236 244 240 244 231 230 230

Relative loss () 592 143 181 123 0 174 29

Score () 1417 4317 2500 7317 6467 2683 5300IRADE Median (times10minus4) 346 267 310 255 251 378 266

MAD (times10minus5) 599 406 488 389 566 847 644Relative loss () 3787 649 2358 149 0 5037 591

RADE Score () 3900 3700 2433 5117 6150 2950 5750Median (times10minus4) 1581 1509 1515 1508 1504 1661 1503MAD (times10minus4) 185 201 198 200 193 257 192

Relative loss () 509 30 73 23 0 1042 minus07

ER Median 054 051 054 051 048 057 0480MAD 010 003 004 003 006 007 006

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 627

(a) (b)

Figure 4 The Dynamic Weights for Prediction for Example 3 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

mators have the smallest measure values in median or robustscale This shows that our integrated method continues to per-form better than the others for this nonstationary case

Again Figure 4(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

52 Real Examples

In this section we apply the integrated volatility estimationmethods and other methods to the analysis of real financial data

521 Treasury Bond Here we consider the weekly returnsof three Treasury Bonds with terms of 1 5 and 10 years Thein-sample data are for January 4 1974ndashDecember 30 1994 andout-sample data are for January 6 1995ndashAugust 8 2003 Thetotal sample size is 1545 and the in-sample size is 1096 Theresults are reported in Table 5

From Table 5 for the 1-year Treasury Notes the integratedestimator is of the smallest MADE and the smallest RADE re-flecting that the integrated estimation method of the volatilityis the best among the seven methods For the bonds with 5-or 10-year maturities the seven estimators have close MADEsand RADEs where the historical simulation method is bet-ter than the RiskMetrics in terms of MADE and RADE andthe integrated estimation approach has the smallest MADEsIn addition the integrated estimator has ER values closest to

05 This demonstrates the advantage of using state-domaininformation which can improve the time-domain predictionof the volatilities in the interest dynamics The nonparametricBayesian method does not perform as well as the integrated es-timator because it uses fixed weights and the inverse-gammaprior which may not be true in practice

522 Exchange Rate We analyze the daily exchange rateof several foreign currencies against the US dollar The dataare for January 3 1994ndashAugust 1 2003 The in-sample dataconsist of the observations before January 1 2001 and the out-sample data consist of the remaining observations The resultsreported in Table 6 show that the integrated estimator has thesmallest MADEs and moderate RADEs for the exchange ratesNote that the RADE measure is effective only when the dataseem to be from a normal distribution and thus cannot calibratethe accuracy of volatility forecast for data from an unknowndistribution For this reason the RADE might not be a goodmeasure of performance for this example Both the integratedvolatility estimation and GARCH perform nicely for this exam-ple They outperform other nonparametric methods

Figure 5 reports the dynamic weights for the daily exchangerate of the UK pound to the US dollar Because the RiskMestimator is far more informative than the StaDo estimator ourintegrated estimator becomes basically the time-domain esti-mator by automatically choosing large dynamic weights Thisexemplifies our statement in the paragraph immediately before(1) in Section 1

Table 5 Comparisons of Several Volatility Estimation Methods

Term Measure (times10minus2) Hist RiskM Semi NonBay Integ GARCH StaDo

1 year MADE 1044 787 787 794 732 893 914RADE 5257 4231 4256 4225 4105 4402 4551

ER 2237 2009 2013 1562 3795 0 7366

5 years MADE 1207 1253 1296 1278 1201 1245 1638RADE 5315 5494 5630 5563 5571 5285 6543

ER 671 1339 1566 1112 5804 0 8929

10 years MADE 1041 1093 1103 1111 1018 1046 1366RADE 4939 5235 5296 5280 5152 4936 6008

ER 1112 1563 1790 1334 4911 0 6920

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 10: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 627

(a) (b)

Figure 4 The Dynamic Weights for Prediction for Example 3 (a) and the Biases for Prediction for the State-Domain Estimator ( minus minus minus) and theIntegration Estimator ( mdash) (b)

mators have the smallest measure values in median or robustscale This shows that our integrated method continues to per-form better than the others for this nonstationary case

Again Figure 4(b) displays the biases for the state domainand integration estimators It can be seen that the state-domainand integration estimators have almost overlaid biases demon-strating that the bias of the integrated estimator is taken care ofeven though we choose the dynamic weights by minimizing thevariance

52 Real Examples

In this section we apply the integrated volatility estimationmethods and other methods to the analysis of real financial data

521 Treasury Bond Here we consider the weekly returnsof three Treasury Bonds with terms of 1 5 and 10 years Thein-sample data are for January 4 1974ndashDecember 30 1994 andout-sample data are for January 6 1995ndashAugust 8 2003 Thetotal sample size is 1545 and the in-sample size is 1096 Theresults are reported in Table 5

From Table 5 for the 1-year Treasury Notes the integratedestimator is of the smallest MADE and the smallest RADE re-flecting that the integrated estimation method of the volatilityis the best among the seven methods For the bonds with 5-or 10-year maturities the seven estimators have close MADEsand RADEs where the historical simulation method is bet-ter than the RiskMetrics in terms of MADE and RADE andthe integrated estimation approach has the smallest MADEsIn addition the integrated estimator has ER values closest to

05 This demonstrates the advantage of using state-domaininformation which can improve the time-domain predictionof the volatilities in the interest dynamics The nonparametricBayesian method does not perform as well as the integrated es-timator because it uses fixed weights and the inverse-gammaprior which may not be true in practice

522 Exchange Rate We analyze the daily exchange rateof several foreign currencies against the US dollar The dataare for January 3 1994ndashAugust 1 2003 The in-sample dataconsist of the observations before January 1 2001 and the out-sample data consist of the remaining observations The resultsreported in Table 6 show that the integrated estimator has thesmallest MADEs and moderate RADEs for the exchange ratesNote that the RADE measure is effective only when the dataseem to be from a normal distribution and thus cannot calibratethe accuracy of volatility forecast for data from an unknowndistribution For this reason the RADE might not be a goodmeasure of performance for this example Both the integratedvolatility estimation and GARCH perform nicely for this exam-ple They outperform other nonparametric methods

Figure 5 reports the dynamic weights for the daily exchangerate of the UK pound to the US dollar Because the RiskMestimator is far more informative than the StaDo estimator ourintegrated estimator becomes basically the time-domain esti-mator by automatically choosing large dynamic weights Thisexemplifies our statement in the paragraph immediately before(1) in Section 1

Table 5 Comparisons of Several Volatility Estimation Methods

Term Measure (times10minus2) Hist RiskM Semi NonBay Integ GARCH StaDo

1 year MADE 1044 787 787 794 732 893 914RADE 5257 4231 4256 4225 4105 4402 4551

ER 2237 2009 2013 1562 3795 0 7366

5 years MADE 1207 1253 1296 1278 1201 1245 1638RADE 5315 5494 5630 5563 5571 5285 6543

ER 671 1339 1566 1112 5804 0 8929

10 years MADE 1041 1093 1103 1111 1018 1046 1366RADE 4939 5235 5296 5280 5152 4936 6008

ER 1112 1563 1790 1334 4911 0 6920

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 11: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

628 Journal of the American Statistical Association June 2007

Table 6 Comparison of Several Volatility Estimation Methods

Currency Measure Hist RiskM Semi NonBay Integ GARCH StaDo

UK MADE (times10minus4) 614 519 536 519 493 506 609RADE (times10minus3) 3991 3424 3513 3440 3492 3369 3900

ER 009 014 019 015 039 005 052

Australia MADE (times10minus4) 172 132 135 135 126 132 164RADE (times10minus3) 1986 1775 1830 1798 1761 1768 2033

ER 054 025 026 022 042 018 045

Japan MADE (times10minus1) 5554 5232 5444 5439 5064 5084 6987RADE (times10minus1) 3596 3546 3622 3588 3559 3448 4077

ER 012 011 019 012 028 003 032

6 CONCLUSIONS

We have proposed a dynamically integrated method and aBayesian method to aggregate the information from the timedomain and the state domain We studied the performance com-parisons both empirically and theoretically We showed that theproposed integrated method effectively aggregates the informa-tion from both the time and state domains and has advantagesover some previous methods for example it is powerful in fore-casting volatilities for the yields of bonds and for exchangerates Our study also revealed that proper use of informationfrom both the time domain and the state domain makes volatil-ity forecasting more accurate Our method exploits both thecontinuity in the time domain and the stationarity in the statedomain and can be applied to situations in which these twoconditions hold approximately

APPENDIX CONDITIONS AND PROOFSOF THEOREMS

We state technical conditions for the proof of our results

(C1) (Global Lipschitz condition) There exists a constant k0 ge 0such that

|μ(x) minus μ(y)| + |σ(x) minus σ(y)| le k0|x minus y| for all x y isin R

(C2) Given time point t gt 0 there exists a constant L gt 0 such that

E|μ(rs)|4(p+δ) le L and E|σ(rs)|4(p+δ) le L

for any s isin [t minus η t] where η is some positive constant p is a positiveinteger and δ is some small positive constant

Figure 5 Dynamic Weights for the Daily Exchange Rate of the UKPound and the US Dollar

(C3) The discrete observations rtiNi=0 satisfy the stationarity con-

ditions of Banon (1978) Furthermore the G2 condition of Rosenblatt(1970) holds for the transition operator

(C4) The conditional density p(y|x) of rti+ given rti is continuousin the arguments (y x) and is bounded by a constant independent of

(C5) The kernel W is a bounded symmetric probability densityfunction with compact support say [minus11]

(C6) (N minus n)h rarr infin (N minus n)h5 rarr 0 and (N)h rarr 0

Condition (C1) ensures that (3) has a unique strong solution rt t ge0 continuous with probability 1 if the initial value r0 satisfies thatP(|r0| lt infin) = 1 (see thm 46 of Liptser and Shiryayev 2001) Condi-tion (C2) indicates that given time point t gt 0 there is a time interval[t minus η t] on which the drift and the volatility have finite 4(p + δ)thmoments which is needed for deriving asymptotic normality of thetime-domain estimate σ 2

ESt Conditions (C3)ndash(C5) are similar to thoseof Fan and Zhang (2003) Condition (C6) is assumed for bandwidthswhere undersmoothing is uses to yield zero bias in the asymptotic nor-mality of the state-domain estimate

Throughout the proof we denote by M a generic positive constantand use μs and σs to represent μ(rs) and σ(rs)

Proposition A1 Under conditions (C1) and (C2) we have for al-most all sample paths

|σ 2s minus σ 2

u | le K|s minus u|(2pminus1)(4p) (A1)

for any su isin [t minus η t] where the coefficient K satisfies E[K2(p+δ)]ltinfin

Proof First we show that the process rs is locally Houmllder-continuous with order q = (p minus 1)(2p) and coefficient K1 such that

E[K4(p+δ)1 ] lt infin that is

|rs minus ru| le K1|s minus u|q foralls u isin [t minus η t] (A2)

In fact by Jensenrsquos inequality and martingale moment inequalities(Karatzas and Shreve 1991 sec 33D p 163) we have

E|ru minus rs|4(p+δ)

le M

(E

∣∣∣∣int u

sμv dv

∣∣∣∣4(p+δ)

+ E

∣∣∣∣int u

sσv dWv

∣∣∣∣4(p+δ))

le M(u minus s)4(p+δ)minus1int u

sE|μv|4(p+δ) dv

+ M(u minus s)2(p+δ)minus1int u

sE|σv|4(p+δ) dv

le M(u minus s)2(p+δ)

Then by theorem 21 of Revuz and Yor (1999 p 26)

E[(

supt 13=s

|rs minus ru||s minus u|α)4(p+δ)]

lt infin (A3)

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 12: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 629

for any α isin [02(p+δ)minus1

4(p+δ)) Let α = pminus1

2p and K1 = sups 13=u|rs minusru||sminusu|(pminus1)(2p) Then E[K4(p+δ)

1 lt infin] and the inequality (A2)holds

Second by condition (C1) we have |σ 2s minus σ 2

u | le k0(σs + σu)|rs minusru| This combined with (A2) leads to

|σ 2s minus σ 2

u | le k0(σs + σu)K1|s minus u|q equiv K|s minus u|q

Then by Houmllderrsquos inequality

E[K2(p+δ)

] le ME[(σsK1)2(p+δ)

]

le Mradic

E[σ

4(p+δ)s

]E[K4(p+δ)

1

]lt infin

Proof of Theorem 1

Let Zis = (rs minus rti)2 Applying Itocircrsquos formula to Zis we obtain

dZis = 2

(int s

tiμu du +

int s

tiσu dWu

)(μs ds + σs dWs

)+ σ 2

s ds

= 2

[(int s

tiμu du +

int s

tiσu dWu

)μs ds + σs

(int s

tiμu du

)dWs

]

+ 2

(int s

tiσu dWu

)σs dWs + σ 2

s ds

Then Y2i can be decomposed as Y2

i = 2ai + 2bi + σ 2i where

ai = minus1[int ti+1

tiμs ds

int s

tiμu du +

int ti+1

tiμs ds

int s

tiσu dWu

+int ti+1

tiσs dWs

int s

tiμu du

]

bi = minus1int ti+1

ti

int s

tiσu dWuσs dWs

and

σ 2i = minus1

int ti+1

tiσ 2

s ds

Therefore σ 2ESt can be written as

σ 2ESt = 2

1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1ai + 21 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1bi

+ 1 minus λ

1 minus λn

tminus1sum

i=tminusn

λtminusiminus1σ 2i

equiv An + Bn + Cn

By Proposition A1 as n rarr 0 |Cn minus σ 2t | le K(n)q where q =

(2p minus 1)(4p) This combined with Lemmas A1 and A2 completesthe proof of the theorem

Lemma A1 If condition (C2) is satisfied then E[A2n] = O()

Proof Simple algebra gives the result In fact

E(a2i ) le 3E

[minus1

int ti+1

tiμs ds

int s

tiμu du

]2

+ 3E

[minus1

int ti+1

tiμs ds

int s

tiσu dWu

]2

+ 3E

[minus1

int ti+1

tiσs dWs

int s

tiμu du

]2

equiv I1() + I2() + I3()

Applying Jensenrsquos inequality we obtain that

I1() = O(minus1)E

[int ti+1

ti

int s

tiμ2

s μ2u du ds

]

= O(minus1)

int ti+1

ti

int s

tiE(μ4

u + μ4s )du ds = O()

By Jensenrsquos inequality Houmllderrsquos inequality and martingale momentinequalities we have

I2() = O(minus1)

int ti+1

tiE

(μs

int s

tiσu dWu

)2ds

= O(minus1)

int ti+1

ti

E[μs]4E

[int ti+1

tiσu dWu

]412ds

= O()

Similarly I3() = O() Therefore E(a2i ) = O() Then by the

CauchyndashSchwartz inequality and noting that n(1 minus λ) = O(1) we ob-tain that

E[A2n] le n

(1 minus λ

1 minus λn

)2 nsum

i=1

λ2(nminusi)E(a2i ) = O()

Lemma A2 Under conditions (C1) and (C2) if n rarr infin andn rarr 0 then

sminus11t

radicnBn

Dminusrarr N (01) (A4)

Proof Note that bj = σ 2t minus1 int tj+1

tj (Ws minus Wtj)dWs + εj where

εj = minus1int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

+ minus1σt

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

By the central limit theorem for martingales (see Hall and Heyde 1980cor 31) it suffices to show that

V2n equiv E[sminus2

1t nB2n] rarr 1 (A5)

and that the following Lyapunov condition holds

tminus1sum

i=tminusn

E

(radicn

1 minus λ

1 minus λn λtminusiminus1bi

)4rarr 0 (A6)

Note that

2

2E(ε2

j ) le E

int tj+1

tj(σs minus σt)

[int s

tjσu dWu

]dWs

2

+ σ 2t E

int tj+1

tj

[int s

tj(σu minus σt)dWu

]dWs

2

equiv Ln1 + Ln2 (A7)

By Jensenrsquos inequality Houmllderrsquos inequality and moment inequalitiesfor martingale we have

Ln1 leint tj+1

tjE

(σs minus σt)

2[int s

tjσu dWu

]2ds

leint tj+1

tj

E(σs minus σt)

4E

[int s

tjσu dWu

]412ds

leint tj+1

tj

E[k0K1(n)q]436

int s

tjE(σ 4

u )du

12ds

le M(n)2q2 (A8)

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 13: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

630 Journal of the American Statistical Association June 2007

where K1 is defined in Proposition A1 Similarly

Ln2 le M(n)2q2 (A9)

By (A7) (A8) and (A9) we have E(ε2j ) le M(n)2q therefore

E[σminus4t b2

j ] = 12 + O((n)q) By the theory of stochastic calculus sim-

ple algebra gives that E(bj) = 0 and E(bibj) = 0 for i 13= j It followsthat

V2n = E(sminus2

1t nB2n) =

tminus1sum

i=tminusn

E

(2s1t

radicn

1 minus λ

1 minus λn λtminusiminus1bi

)2rarr 1

that is (A5) holds For (A6) it suffices to prove that E(b4j ) is

bounded which holds from condition (C2) and by applying the mo-ment inequalities for martingales to b4

j

Proof of Theorem 2

The proof is completed along the same lines as given by Fan andZhang (2003)

Proof of Theorem 3

By Fan and Yao (1998) the volatility estimator σ 2StN

behaves asif the instantaneous return function f were known Thus without lossof generality we assume that f (x) = 0 and hence Ri = Y2

i Let Y =(Y2

0 Y2Nminus1)T and W = diagWh(rt0 minus rtN ) Wh(rtNminus1 minus rtN )

and let X be a (N minus t0)times 2 matrix with each of the elements in the firstcolumn being 1 and the second column being (rt0 minus rtN rtNminus1 minusrtN )prime Let mi = E[Y2

i |rti ] m = (m0 mNminus1)T and e1 = (10)T Define SN = XT WX and TN = XT WY Then it can be written thatσ 2

StN= eT

1 Sminus1N TN (see Fan and Yao 2003) Hence

σ 2StN

minus σ 2tN = eT

1 Sminus1N XT Wm minus XβN + eT

1 Sminus1N XT W(Y minus m)

equiv eT1 b + eT

1 t (A10)

where βN = (m(rtN ) mprime(rtN ))T with m(rx) = E[Y21 |rt1 = x] Follow-

ing Fan and Zhang (2003) the bias vector b converges in probability toa vector b with b = O(h2) = o(1

radic(N)h) In what follows we show

that the centralized vector t is asymptotically normalIn fact write u = (N)minus1Hminus1XT W(Y minus m) where H = diag1h

Then following Fan and Zhang (2003) the vector t can be written as

t = pminus1(rtN )Hminus1Sminus1u(1 + op(1)) (A11)

where S = (μi+jminus2)ij=12 with μj = intujW(u)du and p(x) is the in-

variant density of rt defined in Section 22 For any constant vector cdefine

QN = cT u = 1

N

Nminus1sum

i=0

Y2i minus miCh

(rti minus rtN

)

where Ch(middot) = 1hC(middoth) with C(x) = c0W(x) + c1xW(x) Applyingthe ldquobig-blockrdquo and ldquosmall-blockrdquo arguments of Fan and Yao (2003thm 63 p 238) we obtain

θminus1(rtN )radic

(N)hQNDminusrarr N(01) (A12)

where θ2(rtN ) = 2p(rtN )σ 4(rtN )int +infinminusinfin C2(u)du In what follows we

decompose QN into two parts QprimeN and Qprimeprime

N which satisfy the follow-ing

(a) NhE[θminus1(rtN )QprimeN ]2 le (hN)(hminus1aN(1 + o(1)) + (N) times

o(hminus1)) rarr 0

(b) QprimeprimeN is identically distributed as QN and is asymptotically inde-

pendent of σ 2EStN

Define QprimeN = 1

NsumaN

i=0Y2i minus E[Y2

i |rti ]Ch(rti minus rtN ) and QprimeprimeN = QN minus

QprimeN where aN is a positive integer satisfying aN = o(N) and aN rarr

infin Obviously aN n Let ϑNi = (Y2i minus mi)Ch(rti minus rtN ) Then fol-

lowing Fan and Zhang (2003)

var[θminus1(

rtN)ϑN1

] = hminus1(1 + o(1))

and

Nminus2sum

=1

| cov(ϑN1 ϑN+1)| = o(hminus1)

which yields the result in (a) This combined with (A12) (a) and thedefinition of Qprimeprime

N leads to

θminus1(rtN )radic

NhQprimeprimeN

Dminusrarr N(01) (A13)

Note that the stationarity conditions of Banon (1978) and the G2 con-dition of Rosenblatt (1970) on the transition operator imply that theρt mixing coefficient ρt() of rti decays exponentially and that thestrong mixing coefficient α() le ρt() it follows that∣∣E exp

(Qprimeprime

N + σ 2EStN

) minus E expiξ(QprimeprimeN)E exp

(iξ σ 2

EStN

)∣∣

le 32α((aN minus n)) rarr 0

for any ξ isin R Using the theorem of Volkonskii and Rozanov (1959)we get the asymptotic independence of σ 2

EStNand Qprimeprime

N

By (a)radic

NhQprimeN is asymptotically negligible Then by Theorem 1

d1θminus1(rtN

)radicNhQN + d2Vminus12

2radic

n[σ 2

EStNminus σ 2(

rtN)]

DminusrarrN (0d21 + d2

2)

for any real d1 and d2 where V2 = ec+1ecminus1σ 4(rtN ) Because QN is a

linear transformation of u

Vminus12[ radic

Nhuradicn[σ 2

EStNminus σ 2(rtN )]

]Dminusrarr N (0 I3)

where V = blockdiagV1V2 with V1 = 2σ 4(rtN )p(rtN )Slowast whereSlowast = (νi+jminus2)ij=12 with νj = int

ujW2(u)du This combined with

(A11) gives the joint asymptotic normality of t and σ 2EStN

Note that

b = op(1radic

Nh) and it follows that

minus12(radic

Nh[σ 2StN

minus σ 2(rtN )]radic

n[σ 2EStN

minus σ 2(rtN )])

Dminusrarr N (0 I2)

where = diag2σ 4(rtN )ν0p(rtN )V2 Note that σ 2StN

and σ 2EStN

are asymptotically independent it follows that the asymptotical nor-mality in (b) holds

The result in (c) follows from (b) and the fact that wtNPrarr wtN and

σ 2ItN

= wtN σ 2EStN

+ (1 minus wtN )σ 2StN

It follows that

σ 2ItN minus σ 2

tN

= σ 2tN minus σ 2

tN + (wtN minus wtN

)[(σ 2

EStNminus σ 2

tN

) minus (σ 2

StNminus σ 2

tN

)]

equiv Ln1 + (wtN minus wtN

)(Ln2 minus Ln3)

Because Ln1 Ln2 and Ln3 are of the same order the second and thirdterms are dominated by the first term Ln1 Then σ 2

ItNminus σ 2

tN = (σ 2tN minus

σ 2tN )(1 + op(1)) and thus σ 2

ItNminus σ 2

tN shares the same asymptotical

normality as σ 2tN minus σ 2

tN

[Received November 2005 Revised December 2006]

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783

Page 14: Dynamic Integration of Time- and ... - Princeton Universityorfe.princeton.edu/~jqfan/papers/04/timestate4.pdf · Dynamic Integration of Time- and State-Domain Methods for Volatility

Fan Fan and Jiang Integrating Time and State Domains for Volatility Estimation 631

REFERENCESAiumlt-Sahalia Y and Mykland P (2004) ldquoEstimators of Diffusions With Ran-

domly Spaced Discrete Observations A General Theoryrdquo The Annals of Sta-tistics 32 2186ndash2222

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Arapis M and Gao J (2004) ldquoNonparametric Kernel Estimation and Testingin Continuous-Time Financial Econometricsrdquo manuscript

Bandi F and Phillips (2003) ldquoFully Nonparametric Estimation of Scalar Dif-fusion Modelsrdquo Econometrica 71 241ndash283

Banon G (1978) ldquoNonparametric Identification for Diffusion ProcessesrdquoSIAM Journal of Control Optimization 16 380ndash395

Barndoff-Neilsen O E and Shephard N (2001) ldquoNon-Gaussian OrnsteinndashUhlenbeckndashBased Models and Some of Their Uses in Financial Economicsrdquo(with discussion) Journal of the Royal Statistical Society Ser B 63167ndash241

(2002) ldquoEconometric Analysis of Realized Volatility and Its Use inEstimating Stochastic Volatility Modelsrdquo Journal of the Royal Statistical So-ciety Ser B 64 253ndash280

Bollerslev T and Zhou H (2002) ldquoEstimating Stochastic Volatility DiffusionUsing Conditional Moments of Integrated Volatilityrdquo Journal of Economet-rics 109 33ndash65

Brenner R J Harjes R H and Kroner K F (1996) ldquoAnother Look at Mod-els of the Short-Term Interest Raterdquo Journal of Financial and QuantitativeAnalysis 31 85ndash107

Cai Z and Hong Y (2003) ldquoNonparametric Methods in Continuous-TimeFinance A Selective Reviewrdquo in Recent Advances and Trends in Nonpara-metric Statistics eds M G Akritas and D M Politis Amsterdam Elsevierpp 283ndash302

Chan K C Karolyi A G Longstaff F A and Sanders A B (1992)ldquoAn Empirical Comparison of Alternative Models of the Short-Term Inter-est Raterdquo Journal of Finance 47 1209ndash1227

Chapman D A and Pearson N D (2000) ldquoIs the Short Rate Drift ActuallyNonlinearrdquo Journal of Finance 55 355ndash388

Chen S X Gao J and Tang C Y (2007) ldquoA Test for Model Specificationof Diffusion Processesrdquo The Annals of Statistics to appear

Cox J C Ingersoll J E and Ross S A (1985) ldquoA Theory of the TermStructure of Interest Ratesrdquo Econometrica 53 385ndash467

Daveacute R D and Stahl G (1997) ldquoOn the Accuracy of VaR Estimates Based onthe VariancendashCovariance Approachrdquo working paper Olshen and Associates

Duan J C (1997) ldquoAugumented GARCH(pq) Process and Its DiffusionLimitrdquo Journal of Econometrics 79 97ndash127

Duffie D and Pan J (1997) ldquoAn Overview of Value at Riskrdquo The Journal ofDerivatives 4 7ndash49

Fan J (2005) ldquoA Selective Overview of Nonparametric Methods in FinancialEconometricsrdquo (with discussion) Statistical Science 20 317ndash357

Fan J and Gu J (2003) ldquoSemiparametric Estimation of Value-at-RiskrdquoEconometrics Journal 6 261ndash290

Fan J Jiang J Zhang C and Zhou Z (2003) ldquoTime-Dependent DiffusionModels for Term Structure Dynamics and the Stock Price Volatilityrdquo Statis-tica Sinica 13 965ndash992

Fan J and Yao Q (1998) ldquoEfficient Estimation of Conditional Variance Func-tions in Stochastic Regressionrdquo Biometrika 85 645ndash660

(2003) Nonlinear Time Series Nonparametric and Parametric Meth-ods New York Springer-Verlag

Fan J and Zhang C M (2003) ldquoA Reexamination of Diffusion EstimatorsWith Applications to Financial Model Validationrdquo Journal of the AmericanStatistical Association 98 118ndash134

Genon-Catalot V Jeanthheau T and Laredo C (1999) ldquoParameter Esti-mation for Discretely Observed Stochastic Volatility Modelsrdquo Bernoulli 5855ndash872

Gijbels I Pope A and Wand M P (1999) ldquoUnderstanding ExponentialSmoothing via Kernel Regressionrdquo Journal of the Royal Statistical SocietySer B 61 39ndash50

Hall P and Hart J D (1990) ldquoNonparametric Regression With Long-RangeDependencerdquo Stochastic Processes and Their Applications 36 339ndash351

Hall P and Heyde C (1980) Martingale Limit Theorem and Its ApplicationsNew York Academic Press

Hallin M Lu Z and Tran L T (2001) ldquoDensity Estimation for Spatial Lin-ear Processesrdquo Bernoulli 7 657ndash668

Haumlrdle W Herwartz H and Spokoiny V (2002) ldquoTime-InhomogeneousMultiple Volatility Modellingrdquo Journal of Finance and Econometrics 155ndash95

Hong Y and Li H (2005) ldquoNonparametric Specification Testing forContinuous-Time Models With Applications to Term Structure of InterestrdquoReview of Financial Studies 18 37ndash84

Karatzas I and Shreve S (1991) Brownian Motion and Stochastic Calculus(2nd ed) New York Springer-Verlag

Kloeden D E Platen E Schurz H and Soslashrensen M (1996) ldquoOn Effectsof Discretization on Estimators of Drift Parameters for Diffusion ProcessesrdquoJournal of Applied Probability 33 1061ndash1076

Liptser R S and Shiryayev A N (2001) Statistics of Random Processes I II(2nd ed) New York Springer-Verlag

Mercurio D and Spokoiny V (2004) ldquoStatistical Inference for Time-Inhomogeneous Volatility Modelsrdquo The Annals of Statistics 32 577ndash602

Morgan J P (1996) RiskMetrics Technical Document (4th ed) New YorkAuthor

Nelson D B (1990) ldquoARCH Models as Diffusion Approximationsrdquo Journalof Econometrics 45 7ndash38

Revuz D and Yor M (1999) Continuous Martingales and Brownian MotionNew York Springer-Verlag

Robinson P M (1997) ldquoLarge-Sample Inference for Nonparametric Regres-sion With Dependent Errorsrdquo The Annals of Statistics 25 2054ndash2083

Rosenblatt M (1970) ldquoDensity Estimates and Markov Sequencesrdquo in Non-parametric Techniques in Statistical Inference ed M L Puri CambridgeUK Cambridge University Press pp 199ndash213

Ruppert D Wand M P Holst U and Houmlssjer O (1997) ldquoLocal PolynomialVariance Function Estimationrdquo Technometrics 39 262ndash273

Spokoiny V (2000) ldquoDrift Estimation for Nonparametric Diffusionrdquo The An-nals of Statistics 28 815ndash836

Stanton R (1997) ldquoA Nonparametric Models of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance LII 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of Term Structurerdquo Jour-nal of Financial Economics 5 177ndash188

Volkonskii V A and Rozanov Yu A (1959) ldquoSome Limit Theorems forRandom Functions Part Irdquo Theory of Probability and Its Applications 4178ndash197

Wang Y (2002) ldquoAsymptotic Nonequivalence of GARCH Models and Diffu-sionsrdquo The Annals of Statistics 30 754ndash783