
Econometric Theory, 32, 2016, 1023–1054. doi:10.1017/S0266466615000055

SPLINE ESTIMATION OF A SEMIPARAMETRIC GARCH MODEL

RONG LIU
Soochow University and University of Toledo

LIJIAN YANG
Soochow University

The semiparametric GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) model of Yang (2006, Journal of Econometrics 130, 365–384) combines the flexibility of a nonparametric link function with the dependence on infinitely many past observations of the classic GARCH model. We propose a cubic spline procedure to estimate the unknown quantities in the semiparametric GARCH model that is intuitively appealing due to its simplicity. The procedure has the same theoretical properties as the kernel procedure, while simulated and real data examples show that its numerical performance is either better than or comparable to that of the kernel method. The new method is computationally much more efficient than the kernel method and very useful for analyzing large financial time series data.

1. INTRODUCTION

Volatility forecasting is of special interest for risk management and portfolio choice, which involve many financial time series such as stock and foreign exchange returns. Empirical evidence has led to the understanding that for such series, the volatility often depends on infinitely many past returns with diminishing weights. The GARCH(p,q) model of Bollerslev (1986), for example, allows the volatility function to depend on all past observations, with a geometrically decaying rate.

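As a special case, the GARCH(1,1) model describes a process $\{Y_t\}_{t=-\infty}^{\infty}$ of the form $Y_t = \sigma_t\xi_t$, $t \in \mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$, where the innovations $\{\xi_t\}_{t\in\mathbb{Z}}$ are i.i.d. random variables satisfying $E(\xi_t) = 0$, $E(\xi_t^2) = 1$, and $\{\sigma_t^2\}_{t=-\infty}^{\infty}$ denotes the conditional volatility series $\sigma_t^2 = \operatorname{var}(Y_t \mid Y_{t-1}, Y_{t-2}, \ldots)$, i.e.,

$$\sigma_t^2 = w + \beta_0\sum_{j=1}^{\infty}\theta_0^{j-1}Y_{t-j}^2, \quad t \in \mathbb{Z},\ w, \beta_0, \theta_0 > 0,\ \theta_0 + \beta_0 < 1. \tag{1}$$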
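As a numerical check of the ARCH($\infty$) representation in (1), the Python sketch below (an illustration, not the authors' code) computes $\sigma_t^2$ in two ways. The first uses the weighted sum in (1) directly, assuming $Y_t = 0$ before the sample starts. The second uses the equivalent recursion $\sigma_t^2 = w(1-\theta_0) + \beta_0 Y_{t-1}^2 + \theta_0\sigma_{t-1}^2$, obtained by subtracting $\theta_0$ times (1) at $t-1$ from (1) at $t$. The parameter values and the standard normal innovations are arbitrary choices that satisfy the constraints in (1).

```python
import random


def simulate_garch11(n, w, beta0, theta0, seed=0):
    """Simulate Y_t = sigma_t * xi_t with standard normal xi_t, starting from sigma_1^2 = w."""
    rng = random.Random(seed)
    y, s2 = [], w
    for t in range(n):
        if t > 0:
            # Recursion implied by (1): sigma_t^2 = w(1 - theta0) + beta0 Y_{t-1}^2 + theta0 sigma_{t-1}^2
            s2 = w * (1 - theta0) + beta0 * y[-1] ** 2 + theta0 * s2
        y.append(s2 ** 0.5 * rng.gauss(0.0, 1.0))
    return y


def sigma2_from_sum(y, w, beta0, theta0):
    """Conditional variance from the ARCH(infinity) form (1), taking Y_t = 0 before the sample."""
    out = []
    for t in range(len(y)):
        s = sum(theta0 ** (j - 1) * y[t - j] ** 2 for j in range(1, t + 1))
        out.append(w + beta0 * s)
    return out


def sigma2_from_recursion(y, w, beta0, theta0):
    """The same conditional variance, computed recursively in O(n) operations."""
    out = [w]
    for t in range(1, len(y)):
        out.append(w * (1 - theta0) + beta0 * y[t - 1] ** 2 + theta0 * out[-1])
    return out


w, beta0, theta0 = 0.1, 0.1, 0.85  # w, beta0, theta0 > 0 and theta0 + beta0 < 1
y = simulate_garch11(500, w, beta0, theta0)
a = sigma2_from_sum(y, w, beta0, theta0)
b = sigma2_from_recursion(y, w, beta0, theta0)
print(max(abs(u - v) for u, v in zip(a, b)))  # floating-point rounding only
```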

The comments from editor Peter Phillips, co-editor Oliver Linton and two anonymous referees have resulted in substantial improvement of the work. This research has been supported in part by NSF awards DMS 0706518 and 1007594, Jiangsu Specially-Appointed Professor Program SR10700111, Jiangsu Province Key-Discipline (Statistics) Program ZY107002, ZY107992, National Natural Science Foundation of China Award NSFC 11371272, and Research Fund for the Doctoral Program of Higher Education of China Award 20133201110002. Address correspondence to Lijian Yang, Center for Advanced Statistics and Econometrics Research, Soochow University, Suzhou 215006, China; e-mail: [email protected].

© Cambridge University Press 2015


Nelson (1990) established necessary and sufficient conditions for the stationarity and ergodicity of the GARCH(1,1) process. Engle and Ng (1993), Glosten, Jagannathan, and Runkle (1993), Hentschel (1995), Duan (1997), Hafner and Herwartz (2006), and Hafner (2008) examined various useful extensions of model (1), mostly providing empirical evidence without establishing asymptotic results. For related theoretical work on GARCH models, see Peng and Yao (2003), Sun and Stengos (2006), Chan, Deng, Peng, and Xia (2007), Giraitis, Leipus, and Surgailis (2010), Meitz and Saikkonen (2011), and Andrews (2012). Linton, Pan, and Wang (2010) and Zhang and Ling (2014) established asymptotic results for heavy-tailed noise.

In recent years, there has been a surge of interest in applying nonparametric smoothing theory to volatility estimation, as in Yang, Härdle, and Nielsen (1999), Dahl and Levine (2006), Levine (2006), Brown and Levine (2007), and Kim and Linton (2011). In particular, Linton and Mammen (2005) proposed an iterative algorithm, without asymptotic theory, for the nonparametric GARCH model

$$\sigma_t^2 = \sum_{j=1}^{\infty}\theta_0^{j-1}m(Y_{t-j}), \quad t \in \mathbb{Z},\ 0 < \theta_0 < 1,$$

with unknown parameter $\theta_0$ and unknown smooth news impact function $m$. A truncated version of this nonparametric model was studied, with asymptotic results, in Yang (2000), Yang (2002), and Wang, Feng, Song, and Yang (2012), yet it fails to capture the dependence of $\sigma_t^2$ on infinitely many past $Y_{t-j}$.

As an alternative, Yang (2006) proposed a class of semiparametric GARCH models

$$\sigma_t^2 = m\Big\{\sum_{j=1}^{\infty}\theta_0^{j-1}v(Y_{t-j}; \eta_0)\Big\}, \quad t \in \mathbb{Z},\ \theta_0 \in (0,1),\ \eta_0 \in [\eta_1, \eta_2] \subset [0, \infty), \tag{2}$$

where $v(y; \eta) = y^2 + \eta y^2 1(y < 0)$, with unknown parameter vector $\gamma_0 = (\theta_0, \eta_0)$ and unknown smooth link function $m$. The unknown $\gamma_0$ and $m$ were estimated by a kernel method with satisfactory theoretical properties and numerical accuracy in simulations and applications. Like all the aforementioned works based on kernel smoothing, however, the algorithm in Yang (2006) is extremely slow, because it requires solving as many least squares problems as the sample size. The average computing time of the local linear based algorithm in Yang (2006) is reported in Table 3 for sample sizes $n$ from 1,000 to 4,000; it grows roughly at the rate of $n^2$. At $n = 4{,}000$, a moderate sample size for financial time series, the estimation of the unknown parameter vector $\gamma_0$ takes 5 hours. The method of Yang (2006) is therefore not appealing for practical use.

Model (1) has been extended to multivariate GARCH by Bauwens, Laurent, and Rombouts (2006), Silvennoinen and Teräsvirta (2009), Linton (2009), Conrad and Karanasos (2010), Hafner and Linton (2010), and Francq and Zakoïan (2012), which take into account conditional correlations in addition to conditional volatility. Extending the semiparametric model (2) to multivariate time series would bring much progress to an active field, and this paper serves as an important first step in this direction.

It is widely recognized that global smoothing methods such as splines or wavelets are much more computationally efficient than local kernel smoothing; see, for example, the comparison of computing time in Xue and Yang (2006) and Wang and Yang (2007). Recent developments of regression spline smoothing in terms of local asymptotics (Huang, 2003) and high dimensional and weakly dependent data (Huang and Yang, 2004; Xue and Yang, 2006; Wang and Yang, 2007) have presented convincing incentives for applying spline smoothing to challenging problems in time series analysis. We apply cubic spline smoothing to the semiparametric GARCH model (2), resulting in a procedure that is much faster than, but shares the same theoretical and numerical properties as, the kernel smoothing procedure in Yang (2006). Table 3 compares the computing time of the proposed cubic spline method with that of the local linear method in estimating $\gamma_0$. Clearly, the cubic spline method is superior for large samples, as the ratio of its computing time to that of the local linear method decreases at the rate $n^{-1}$. The advantage of the spline method had already been recognized by Engle and Ng (1993), who proposed spline estimation of the news impact curve for extensions of model (1) without developing justifications by asymptotic theory. Theoretical justifications may also be rather difficult to establish for wavelets or other bases when applied in the time series context; some comparisons can be found in Baraud, Comte, and Viennet (2001).

The paper is organized as follows. In Section 2 we discuss the assumptions of model (2), the spline estimation of the unknown parameter $\gamma_0$, and its asymptotic properties, including oracle efficiency. In Section 3 we describe the implementation of the estimator. In Sections 4 and 5 we apply the method to simulated and empirical examples. All technical proofs are given in the Appendix.

2. ESTIMATION METHOD

Statistical inference for the semiparametric GARCH model (2) consists of estimating both the parameter $\gamma_0$ and the link function $m$. In this paper we focus on estimating the parameter. One can estimate the link function by treating the estimate $\hat{\gamma}$ as the true value of $\gamma_0$, but the theoretical properties of such plug-in estimation require further research.

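For convenience, define

$$X_t = \sum_{j=1}^{\infty}\theta_0^{j-1}v(Y_{t-j}; \eta_0), \quad t \in \mathbb{Z},$$

which simplifies model (2) to $Y_t = m^{1/2}(X_t)\xi_t$, $\sigma_t^2 = m(X_t)$, $t \in \mathbb{Z}$, while the process $\{X_t\}_{t=-\infty}^{\infty}$ satisfies the Markovian equation

$$X_t = \theta_0X_{t-1} + v(Y_{t-1}; \eta_0) = \theta_0X_{t-1} + v\{m^{1/2}(X_{t-1})\xi_{t-1}; \eta_0\}, \quad t \in \mathbb{Z}.$$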


The following assumptions on the data generating process are used.

A1: The process $\{Y_t\}_{t=-\infty}^{\infty}$ is strictly stationary, and the innovations $\{\xi_t\}_{t\in\mathbb{Z}}$ have a finite sixth absolute moment, $E|\xi_t|^6 < \infty$.

A2: The link function $m(\cdot)$ is positive everywhere on $\mathbb{R}^+$ and has a Lipschitz continuous fourth derivative. There exist constants $0 < \delta, c < \infty$ such that $EX_t^{\delta} < \infty$ and

$$\limsup_{x\to\infty}m^{2/\delta}(x)/x = m_0 \in (0, c).$$

Since $\gamma_0$ is an unknown parameter vector in $(0,1) \times [\eta_1, \eta_2]$, to make numerical optimization feasible, we assume that $\theta_0$ lies in the interior of $[\theta_1, \theta_2]$, where $0 < \theta_1 < \theta_2 < 1$ are boundary values known a priori. In practice, one takes sufficiently small $\theta_1$ and large $\theta_2$ based on prior knowledge of the data, and denotes $\Gamma = [\theta_1, \theta_2] \times [\eta_1, \eta_2]$. Define next $X_{\gamma,t}$ as a series analogous to $X_t$ but with any candidate value $\gamma = (\theta, \eta) \in \Gamma$:

$$X_{\gamma,t} = \sum_{j=1}^{\infty}\theta^{j-1}v(Y_{t-j}; \eta), \quad t \in \mathbb{Z}. \tag{3}$$
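For a finite sample, definition (3) can be evaluated with the recursion $X_{\gamma,t} = \theta X_{\gamma,t-1} + v(Y_{t-1}; \eta)$, which mirrors the Markovian equation for $X_t$. The Python sketch below is illustrative only. It assumes $Y_t = 0$ before the sample, so the first value is 0; Section 3 instead truncates the sum at $n'$ lags. It also uses simulated standard normal data. It checks numerically that $X_{\gamma,t}$ is nondecreasing in both $\theta$ and $\eta$, a property the construction of the common transformation $F$ relies on.

```python
import random


def v(y, eta):
    """News impact v(y; eta) = y^2 + eta * y^2 * 1(y < 0): negative returns weigh more."""
    return y * y * (1.0 + eta * (y < 0))


def x_gamma(y, theta, eta):
    """X_{gamma,t} = sum_{j>=1} theta^(j-1) v(Y_{t-j}; eta), computed recursively.

    Assumes Y_t = 0 before the sample, so the first element is 0.
    """
    x = [0.0]
    for t in range(1, len(y)):
        x.append(theta * x[-1] + v(y[t - 1], eta))
    return x


rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(300)]
x_low = x_gamma(y, 0.75, 0.5)   # gamma_1 = (theta_1, eta_1)
x_mid = x_gamma(y, 0.85, 0.55)  # an interior gamma
x_high = x_gamma(y, 0.95, 0.6)  # gamma_2 = (theta_2, eta_2)
# Every v(.) term is >= 0, so X_{gamma,t} is nondecreasing in theta and in eta.
print(all(a <= b <= c for a, b, c in zip(x_low, x_mid, x_high)))  # True
```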

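We need the following assumption on the processes $\{X_{\gamma,t}\}_{t=-\infty}^{\infty}$, $\gamma \in \Gamma$.

A3: The processes $\{X_{\gamma,t}\}_{t=-\infty}^{\infty}$, $\gamma \in \Gamma$, are jointly strictly stationary and geometrically $\alpha$-mixing, i.e., the $\alpha$-mixing coefficient satisfies $\alpha(k) \le c\rho^k$ for constants $c > 0$, $0 < \rho < 1$, where

$$\alpha(k) = \sup_{A \in \sigma(X_{\gamma,t},\, t \le 0,\, \gamma \in \Gamma),\ B \in \sigma(X_{\gamma,t},\, t \ge k,\, \gamma \in \Gamma)}|P(A)P(B) - P(A \cap B)|.$$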

The ergodicity and mixing properties of $X_{\gamma,t}$ were discussed in Carrasco and Chen (2002) and Yang (2006). Mixing conditions similar to Assumption (A3) are standard in the time series literature (see Linton and Mammen, 2005; Wang and Yang, 2007), although primitive conditions that ensure Assumption (A3) remain unavailable. From Assumption (A3) and the fact that the innovations $\{\xi_t\}_{t=-\infty}^{\infty}$ are i.i.d., the joint distribution of $(Y_t, \xi_t, X_{\gamma,t}, \gamma \in \Gamma)$ is strictly stationary. Since the range of each $X_{\gamma,t}$ is $(0, +\infty)$, one first transforms all the $X_{\gamma,t}$'s by a common transformation to make their range $[0,1]$, so that B-spline regression can be applied to the transformed variables. For each $\gamma \in \Gamma$, define the transformed variables as

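$$U_{\gamma,t} = F(X_{\gamma,t}) = \frac{F_{\gamma_1}(X_{\gamma,t}) + F_{\gamma_2}(X_{\gamma,t})}{2}, \quad 1 \le t \le n,$$

in which $F_{\gamma_1}$ and $F_{\gamma_2}$ are the cdfs of $X_{\gamma_1,t}$ and $X_{\gamma_2,t}$, respectively, where $\gamma_1 = (\theta_1, \eta_1)$ and $\gamma_2 = (\theta_2, \eta_2)$. Since $X_{\gamma,t}$ is increasing in both $\theta$ and $\eta$, the cdf $F_\gamma$ of $X_{\gamma,t}$ satisfies $F_{\gamma_2} \le F_\gamma \le F_{\gamma_1}$ for any $\gamma \in \Gamma$; thus the common transformation function $F$ assigns sufficient probability mass to the whole range $[0,1]$. In particular, we denote $U_t = U_{\gamma_0,t} = F(X_{\gamma_0,t}) = F(X_t)$. With the above transformation, we assume the following.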


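A4: The pdf $f$ associated with $F$ satisfies $f(x) > 0$ for all $x \in (0, +\infty)$, and $U_{\gamma,t}$ has a Lipschitz continuous pdf $\varphi_\gamma(\cdot)$; there exist constants $c_\varphi, C_\varphi$ such that $\inf_{\gamma\in\Gamma, 0\le u\le 1}\varphi_\gamma(u) \ge c_\varphi$ and $\sup_{\gamma\in\Gamma, 0\le u\le 1}\varphi_\gamma(u) \le C_\varphi$.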
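The common transformation can be computed with empirical cdfs in place of $F_{\gamma_1}$ and $F_{\gamma_2}$, which is what Section 3 does in practice. Below is a minimal Python sketch with hypothetical helper names and toy inputs. Each empirical cdf is a nondecreasing step function with values in $[0,1]$, and so is their average. The transformed values therefore stay in $[0,1]$ and keep the ordering of the $X_{\gamma,t}$.

```python
import bisect


def empirical_cdf(sample):
    """Return the function x -> #{sample points <= x} / len(sample)."""
    s = sorted(sample)
    n = len(s)
    return lambda x: bisect.bisect_right(s, x) / n


def common_transform(x_gamma1, x_gamma2):
    """U = F(X) = {F_gamma1(X) + F_gamma2(X)} / 2, with empirical cdfs plugged in."""
    f1 = empirical_cdf(x_gamma1)
    f2 = empirical_cdf(x_gamma2)
    return lambda x: 0.5 * (f1(x) + f2(x))


# Toy positive values standing in for X_{gamma1,t} and X_{gamma2,t}.
x1 = [0.2, 0.5, 0.9, 1.4, 2.0]
x2 = [0.4, 1.0, 1.8, 2.8, 4.0]
F = common_transform(x1, x2)
u = [F(x) for x in [0.1, 0.5, 1.0, 3.0, 5.0]]
print(u)  # approximately [0.0, 0.3, 0.5, 0.9, 1.0]
```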

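For any $\gamma \in \Gamma$, define the predictor of $Y_t^2$ based on $U_{\gamma,t}$ as $g_\gamma(u) = E(Y_t^2 \mid U_{\gamma,t} = u)$, $0 < u < 1$. In particular, denote $g(U_t) = g_{\gamma_0}(U_{\gamma_0,t}) = E(Y_t^2 \mid U_{\gamma_0,t}) = m(X_t)$. Define the risk function of $\gamma$ as $R(\gamma) = E\{Y_t^2 - g_\gamma(U_{\gamma,t})\}^2$. Note that $Y_t$ has a finite fourth moment by Assumptions (A1) and (A2), so $R(\gamma)$ admits the usual bias–variance decomposition

$$R(\gamma) = E\{g(U_t) - g_\gamma(U_{\gamma,t})\}^2 + (E|\xi_t|^4 - 1)Eg^2(U_t),$$

which, together with $g(U_t) \equiv g_{\gamma_0}(U_{\gamma_0,t})$, implies that

$$R(\gamma) = E\{g(U_t) - g_\gamma(U_{\gamma,t})\}^2 + R(\gamma_0) \ge R(\gamma_0), \quad \forall\gamma \in \Gamma.$$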

We need the following assumption on the function $R(\gamma)$.

A5: The function $R(\gamma)$ has a positive definite Hessian matrix at $\gamma_0$, and consequently $R(\gamma)$ is locally convex at $\gamma_0$, i.e., for any $\varepsilon > 0$ there exists $\delta > 0$ such that $R(\gamma) - R(\gamma_0) < \delta$ implies $\|\gamma - \gamma_0\| < \varepsilon$, where $\|\cdot\|$ is the Euclidean norm.

Thus, by minimizing the prediction error of $Y_t^2$ on $U_{\gamma,t}$, one should be able to locate the true parameter $\gamma_0$ consistently via polynomial spline smoothing. To introduce the space of splines, we divide $[0,1]$ into $N+1$ subintervals $J_j = [t_j, t_{j+1})$, $j = 0, \ldots, N-1$, $J_N = [t_N, 1]$, where $T := \{t_j\}_{j=1}^{N}$ is a sequence of equally spaced points, called interior knots, given as

$$t_{1-k} = \cdots = t_{-1} = t_0 = 0 < t_1 < \cdots < t_N < 1 = t_{N+1} = \cdots = t_{N+k},$$

in which $t_j = jh$, $j = 0, 1, \ldots, N+1$, and $h = 1/(N+1)$ is the distance between neighboring knots. The $j$-th B-spline of order $k$ for the knot sequence $T$, denoted by $B_{j,k}$, is recursively defined by de Boor (2001) as

$$B_{j,k}(u) = \frac{(u - t_j)B_{j,k-1}(u)}{t_{j+k-1} - t_j} - \frac{(u - t_{j+k})B_{j+1,k-1}(u)}{t_{j+k} - t_{j+1}}, \quad 1 - k \le j \le N,$$

for $k > 1$ (with the convention $0/0 = 0$ at coincident knots), with

$$B_{j,1}(u) = I\{u \in J_j\} = \begin{cases} 1, & t_j \le u < t_{j+1}, \\ 0, & \text{otherwise}. \end{cases}$$

Define the spaces of linear, quadratic, and cubic spline functions on $[0,1]$ as

$$G^{(k-2)} = G^{(k-2)}[0,1] = \Big\{\psi : \psi(u) \equiv \sum_{J=1-k}^{N}\lambda_JB_{J,k}(u),\ u \in [0,1]\Big\}, \quad k = 2, 3, 4.$$
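The recursion for $B_{j,k}$ translates directly into code. The Python sketch below is illustrative, and its function names are hypothetical. It builds the knot sequence with $N$ equally spaced interior knots and $k$-fold boundary knots, evaluates all $N + k$ B-splines $B_{j,k}$, $j = 1-k, \ldots, N$, at a point $u$, and checks that they sum to one. It uses the convention $0/0 = 0$ at coincident knots and, following the definition of $J_N$, closes the last interval at $u = 1$.

```python
def knots(N, k):
    """Knot list t_{1-k} = ... = t_0 = 0 < t_1 < ... < t_N < 1 = t_{N+1} = ... = t_{N+k}.

    Python index i holds t_{i+1-k}; the right boundary is set to exactly 1.0.
    """
    h = 1.0 / (N + 1)
    return [0.0] * (k - 1) + [j * h for j in range(N + 1)] + [1.0] * k


def bspline_basis(u, N, k):
    """Values B_{j,k}(u), j = 1-k, ..., N, as a list of length N + k."""
    t = knots(N, k)
    m = len(t)  # N + 2k knots
    # Order 1: indicators of [t_i, t_{i+1}); the last nonempty interval is closed at u = 1.
    b = []
    for i in range(m - 1):
        inside = t[i] <= u < t[i + 1] or (u == 1.0 and t[i] < t[i + 1] == 1.0)
        b.append(1.0 if inside else 0.0)
    # Orders 2..k by the de Boor recursion, with 0/0 = 0 at coincident knots.
    for order in range(2, k + 1):
        nxt = []
        for i in range(m - order):
            val = 0.0
            d1 = t[i + order - 1] - t[i]
            d2 = t[i + order] - t[i + 1]
            if d1 > 0:
                val += (u - t[i]) / d1 * b[i]
            if d2 > 0:
                val += (t[i + order] - u) / d2 * b[i + 1]
            nxt.append(val)
        b = nxt
    return b


N, k = 4, 4  # cubic B-splines (order k = 4) with N = 4 interior knots
for u in [0.0, 0.13, 0.5, 0.99, 1.0]:
    vals = bspline_basis(u, N, k)
    print(u, len(vals), sum(vals))  # N + k = 8 values summing to 1 up to rounding
```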


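Given a realization $\{Y_t\}_{t=1}^{n}$, define for every $\gamma \in \Gamma$ the cubic spline estimator of $g_\gamma(\cdot)$ as

$$\hat{g}_\gamma(\cdot) = \arg\min_{\psi \in G^{(2)}}\frac{1}{n''}\sum_{t=n'+1}^{n}\{Y_t^2 - \psi(U_{\gamma,t})\}^2,$$

with $n'' = n - n'$, where the first $n'$ data points are not used in the above estimator for implementation reasons given in Section 3. Define next the empirical risk function

$$\hat{R}(\gamma) = \frac{1}{n''}\sum_{t=n'+1}^{n}\{Y_t^2 - \hat{g}_\gamma(U_{\gamma,t})\}^2,$$

and let $\hat{\gamma}$ be the minimizer of $\hat{R}(\gamma)$, i.e.,

$$\hat{\gamma} = \arg\min_{\gamma \in \Gamma}\hat{R}(\gamma). \tag{4}$$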
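The least squares step behind the spline estimator of $g_\gamma$ and the empirical risk in (4) can be sketched in a few lines of Python for a single candidate $\gamma$. This is a sketch under two stated assumptions, not the authors' implementation. First, it uses the truncated power basis $1, u, u^2, u^3, (u - t_j)_+^3$, which spans the same cubic spline space $G^{(2)}$ as the B-splines $B_{j,4}$ with these interior knots (so the fitted values coincide in exact arithmetic) but is worse conditioned. Second, it solves the normal equations directly. A careful implementation would more likely use B-splines with a QR-based solver. Synthetic data stand in for $U_{\gamma,t}$ and $Y_t^2$.

```python
import math
import random


def cubic_spline_row(u, N):
    """Truncated power basis 1, u, u^2, u^3, (u - t_j)_+^3 (j = 1..N), t_j = j/(N+1).

    Spans the same space G^(2) as the cubic B-splines B_{j,4}, but is less stable numerically.
    """
    h = 1.0 / (N + 1)
    return [1.0, u, u * u, u ** 3] + [max(u - j * h, 0.0) ** 3 for j in range(1, N + 1)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for cc in range(c, n + 1):
                m[r][cc] -= f * m[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][cc] * x[cc] for cc in range(r + 1, n))) / m[r][r]
    return x


def spline_risk(u, y2, N):
    """Least-squares cubic spline fit of y2 on u; returns the mean squared residual."""
    rows = [cubic_spline_row(ui, N) for ui in u]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y2)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y2, fitted)) / len(y2)


# Synthetic stand-ins for (U_{gamma,t}, Y_t^2), t = n'+1, ..., n.
rng = random.Random(0)
u = [rng.random() for _ in range(2000)]
y2 = [1.0 + math.sin(2 * math.pi * ui) + rng.gauss(0.0, 0.1) for ui in u]
print(spline_risk(u, y2, N=4))  # close to the noise variance 0.01
```

Repeating this computation for each candidate $\gamma$, with $U_{\gamma,t}$ recomputed each time, gives the function whose minimizer defines the estimator in (4).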

A6: The number of interior knots $N$ satisfies $n^{1/6} \ll N = N_n \ll n^{1/5}(\log n)^{-2/5}$, $h = (N+3)^{-1}$.

The proofs of the following proposition and theorems use complex spline smoothing arguments and are given in the Appendix.

PROPOSITION 1. Under Assumptions (A1)–(A6), as $n \to \infty$,

$$\sup_{\gamma\in\Gamma}\big|\nabla^{(k)}\hat{R}(\gamma) - \nabla^{(k)}R(\gamma)\big| \to 0 \quad \text{a.s.}, \quad k = 0, 1, 2.$$

The next theorem establishes the strong consistency of $\hat{\gamma}$.

THEOREM 1. Under Assumptions (A1)–(A6), as $n \to \infty$, $|\hat{\gamma} - \gamma_0| \to 0$ a.s.

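Denote the asymptotic variance of $\hat{\gamma}$ by

$$\Sigma(\gamma_0) = \{\nabla^2R(\gamma_0)\}^{-1}\Psi(\gamma_0)\{\nabla^2R(\gamma_0)\}^{-1} \tag{5}$$

with

$$\Psi(\gamma_0) = 4E\Big[\{g_{\gamma_0}(U_{\gamma_0,t}) - Y_t^2\}^2\{\nabla g_\gamma(U_{\gamma,t})\}\{\nabla g_\gamma(U_{\gamma,t})\}^T\big|_{\gamma=\gamma_0}\Big] \tag{6}$$

and

$$\nabla^2R(\gamma_0) = 2E\Big[\{g_{\gamma_0}(U_{\gamma_0,t}) - Y_t^2\}\nabla^2g_\gamma(U_{\gamma,t})\big|_{\gamma=\gamma_0} + \{\nabla g_\gamma(U_{\gamma,t})\}\{\nabla g_\gamma(U_{\gamma,t})\}^T\big|_{\gamma=\gamma_0}\Big]. \tag{7}$$

Here $\nabla^2g_\gamma(U_{\gamma,t})|_{\gamma=\gamma_0}$ is not the same as $\nabla^2g_\gamma(U_{\gamma_0,t})|_{\gamma=\gamma_0}$, since both $g_\gamma$ and $U_{\gamma,t}$ depend on $\gamma$. The next theorem establishes the $\sqrt{n}$-asymptotic normality of $\hat{\gamma}$.

THEOREM 2. Under Assumptions (A1)–(A6), as $n \to \infty$,

$$\sqrt{n}(\hat{\gamma} - \gamma_0) \to_d N(0, \Sigma(\gamma_0)). \tag{8}$$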
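Formula (5) has the usual sandwich form. Theorem 2 suggests approximating the covariance matrix of the spline estimator of $\gamma_0$ by the matrix in (5) divided by $n$, once the matrices in (6) and (7) have been estimated. This section does not discuss how to estimate them, so the Python sketch below plugs in hypothetical numbers for the two-dimensional case $\gamma = (\theta, \eta)$. It only illustrates the arithmetic of (5).

```python
import math


def inv2(a):
    """Inverse of a 2x2 matrix [[a11, a12], [a21, a22]]."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sandwich(hessian, middle):
    """hessian^{-1} * middle * hessian^{-1}, the form of the asymptotic variance in (5)."""
    h_inv = inv2(hessian)
    return matmul(matmul(h_inv, middle), h_inv)


# Hypothetical plug-in values for the Hessian in (7) and the middle matrix in (6).
H = [[2.0, 0.3], [0.3, 1.0]]
Psi = [[0.8, 0.1], [0.1, 0.5]]
Sigma = sandwich(H, Psi)
n = 2000
se = [math.sqrt(Sigma[i][i] / n) for i in range(2)]  # approximate standard errors of (theta, eta)
print(Sigma, se)
```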

According to Theorem 2, the true parameter vector $\gamma_0$ can be estimated by $\hat{\gamma}$ at the $\sqrt{n}$ rate. One can then use $\hat{\gamma}$ in place of the unknown $\gamma_0$ to estimate the function $m$. We next define the "would-be oracle" estimator of $\gamma_0$ as $\tilde{\gamma} = \arg\min_{\gamma\in\Gamma}\tilde{R}(\gamma)$ under the oracle assumption that the link function $g$ is known, where the oracle empirical risk is $\tilde{R}(\gamma) = (n'')^{-1}\sum_{t=n'+1}^{n}\{Y_t^2 - g(U_{\gamma,t})\}^2$. Thus $\tilde{\gamma}$ serves as a benchmark of oracle optimality, and the following theorem states the asymptotic oracle efficiency of the estimator $\hat{\gamma}$.

THEOREM 3. Under Assumptions (A1)–(A6), as $n \to \infty$, the estimator $\hat{\gamma}$ is asymptotically oracle efficient, i.e., it is asymptotically as efficient as $\tilde{\gamma}$. Specifically, $\sqrt{n}(\tilde{\gamma} - \gamma_0) \to_d N(0, \Sigma(\gamma_0))$, where the variance $\Sigma(\gamma_0)$ is the same as in (5) and (8).

The proof of Theorem 3 consists of routine arguments in parametric inference and is therefore omitted. If $\gamma_0$ is known, the asymptotic convergence rates for estimating the functions $g$ and $m$ are given by Stone (1985, Thm. 1) and Huang and Yang (2004, Lemma 4).

3. IMPLEMENTATION

For a given realization $\{Y_t\}_{t=1}^{n}$, define the two integers

$$n' = \big[2\log n/\log(\theta_2^{-1})\big] + 1, \quad n'' = n - n'.$$

It is easily verified that

$$\sup_{\theta\in[\theta_1,\theta_2]}\theta^{n'} = \theta_2^{n'} < n^{-2},$$

which is the magnitude of the error one would incur if the infinite series in (2) were truncated at $n'$. In practice, one always has to replace the infinite series $X_{\gamma,t}$ in (3) by the finite truncated sum $\sum_{j=1}^{n'}\theta^{j-1}v(Y_{t-j}; \eta)$, the difference between the two being

$$\sum_{j=n'+1}^{\infty}\theta^{j-1}v(Y_{t-j}; \eta) \le \sum_{j=n'+1}^{\infty}\theta_2^{j-1}v(Y_{t-j}; \eta_2) = \theta_2^{n'}\sum_{j=1}^{\infty}\theta_2^{j-1}v(Y_{t-n'-j}; \eta_2) = \theta_2^{n'}X_{\gamma_2,t-n'} < n^{-2}X_{\gamma_2,t-n'},$$

which is bounded by $n^{-2}$ times a stationary process with finite variance according to Assumption (A1). Thus, instead of computing the infinite sum $\sum_{j=1}^{\infty}\theta^{j-1}v(Y_{t-j}; \eta)$, the slowly growing truncation $\sum_{j=1}^{n'}\theta^{j-1}v(Y_{t-j}; \eta)$ is used to implement the algorithm. For the same practical reason, the empirical cdfs $\hat{F}_{\gamma_1}$ and $\hat{F}_{\gamma_2}$ of $X_{\gamma_1,t}$ and $X_{\gamma_2,t}$ are used in place of $F_{\gamma_1}$ and $F_{\gamma_2}$, respectively, to compute the transformation function $F$. The range $[\gamma_1, \gamma_2]$ can be chosen as wide as possible. In practice, one can start with a narrow range, expand it if the estimator $\hat{\gamma}$ is too close to $\gamma_1$ or $\gamma_2$, and re-estimate with the new parameter range. Lastly, the number of interior knots $N = N_n$ is computed according to the formula $N = \min([n^{2/11}] + 1, n/4 - 1)$, which satisfies Assumption (A6).

One computes the value of $\hat{R}(\gamma)$ over an equally spaced grid of points from $\gamma_1$ to $\gamma_2$ and takes the point with the smallest $\hat{R}(\gamma)$ value as $\hat{\gamma}$, according to (4). In the next two sections, numerical evidence is presented on how the proposed procedure works for both simulated and real time series data.
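The implementation choices of this section fit in a short Python sketch: the truncation lag $n'$, the knot number $N$, and a grid search for the minimizer in (4) that widens the search range when the minimizer lands on its boundary. Several details are assumptions for illustration. These are the grid resolution (21 points per coordinate), the widening step of 0.05, the use of $\lfloor n/4 \rfloor$ to keep $N$ an integer, and the toy risk surface. In practice `risk` would evaluate the empirical risk in (4) at a candidate $(\theta, \eta)$, built from the truncated sums, the empirical-cdf transformation, and a cubic spline least squares fit.

```python
import math


def truncation_lag(n, theta2):
    """n' = [2 log n / log(1/theta2)] + 1, which guarantees theta2 ** n' < n ** -2."""
    return int(2 * math.log(n) / math.log(1 / theta2)) + 1


def num_knots(n):
    """N = min([n^(2/11)] + 1, n/4 - 1); n // 4 keeps N an integer (an assumption here)."""
    return min(int(n ** (2 / 11)) + 1, n // 4 - 1)


def grid(lo, hi, steps):
    """Equally spaced grid from lo to hi with the given number of points."""
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def grid_search(risk, theta_rng, eta_rng, steps=21, max_expand=5, widen=0.05):
    """Minimize risk(theta, eta) over an equally spaced grid on [theta1, theta2] x [eta1, eta2].

    If the minimizer lies on the boundary of the range, that side is widened (keeping
    0 < theta < 1 and eta >= 0) and the search is repeated.
    """
    (t1, t2), (e1, e2) = theta_rng, eta_rng
    for _ in range(max_expand + 1):
        ts, es = grid(t1, t2, steps), grid(e1, e2, steps)
        _, i, j = min((risk(t, e), i, j) for i, t in enumerate(ts) for j, e in enumerate(es))
        moved = False
        if i == 0 and t1 - widen > 0:
            t1, moved = t1 - widen, True
        if i == steps - 1 and t2 + widen < 1:
            t2, moved = t2 + widen, True
        if j == 0 and e1 > 0:
            e1, moved = max(e1 - widen, 0.0), True
        if j == steps - 1:
            e2, moved = e2 + widen, True
        if not moved:
            break
    return ts[i], es[j]


n, theta2 = 1000, 0.95
n_prime = truncation_lag(n, theta2)
print(n_prime, theta2 ** n_prime < n ** -2, num_knots(n))  # 270 True 4


def toy_risk(t, e):
    """Toy risk surface with its minimum at (0.9, 0.7), outside the initial range."""
    return (t - 0.9) ** 2 + (e - 0.7) ** 2


print(grid_search(toy_risk, (0.75, 0.85), (0.5, 0.6)))  # close to (0.9, 0.7)
```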

4. SIMULATION

To investigate the finite-sample precision of the proposed estimator, the procedure is applied to time series data generated according to (2) with $\theta_0 = 0.85$, $\eta_0 = 0.5$, $\Gamma = [\theta_1, \theta_2] \times [\eta_1, \eta_2] = [0.75, 0.95] \times [0.5, 0.6]$, and function

$$m(x) = 0.01\{2x + 1 + \sin(x/5)\}/(1 - \theta_0), \tag{9}$$

where $\xi_t$ has either the standard double exponential distribution or the standard normal distribution. The data generating process essentially follows the standard GARCH model, possessing all the known theoretical properties presented in Engle and Ng (1993) and Glosten et al. (1993).

For sample sizes $n = 1{,}000, 2{,}000, 4{,}000, 8{,}000$, a total of $K = 100$ realizations of length $n + 400$ are generated according to model (2), with function $m(x)$ as in (9). For each realization, the last $n$ observations are kept as data for inference; discarding the first 400 observations ensures that the remaining series behaves like a stationary one. The estimator $\hat{\gamma}$ of the parameter $\gamma_0$ is computed with cubic splines according to the setup described in Section 3. For comparison, we also compute the infeasible oracle estimator $\tilde{\gamma}$ and the maximum likelihood estimator $\bar{\gamma}$, which uses the correct link function and treats $\xi_t$ as standard normal.
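A Python sketch of this data generating process: $X_t = \theta_0X_{t-1} + v(Y_{t-1}; \eta_0)$ and $Y_t = m^{1/2}(X_t)\xi_t$ with $m$ from (9), generating $n + 400$ values and keeping the last $n$. Two details are assumptions of this sketch, not taken from the text. The recursion starts from $X = 0$ and $Y = 0$. The double exponential innovations are rescaled to unit variance (Laplace scale $1/\sqrt{2}$, drawn as a difference of two standard exponentials) so that $E(\xi_t^2) = 1$ as model (2) requires.

```python
import math
import random

THETA0, ETA0 = 0.85, 0.5


def m(x):
    """Link function (9): m(x) = 0.01 {2x + 1 + sin(x/5)} / (1 - theta0)."""
    return 0.01 * (2 * x + 1 + math.sin(x / 5)) / (1 - THETA0)


def v(y, eta):
    """v(y; eta) = y^2 + eta * y^2 * 1(y < 0)."""
    return y * y * (1.0 + eta * (y < 0))


def simulate(n, innovation="normal", burn_in=400, seed=0):
    """Generate n + burn_in observations of model (2) and keep the last n."""
    rng = random.Random(seed)
    x, y_prev, ys = 0.0, 0.0, []
    for _ in range(n + burn_in):
        x = THETA0 * x + v(y_prev, ETA0)  # Markovian equation for X_t
        if innovation == "normal":
            xi = rng.gauss(0.0, 1.0)
        else:  # double exponential rescaled to unit variance (assumption)
            xi = (rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2.0)
        y_prev = math.sqrt(m(x)) * xi
        ys.append(y_prev)
    return ys[burn_in:]


y_norm = simulate(1000, "normal", seed=1)
y_dexp = simulate(1000, "double_exponential", seed=2)
print(len(y_norm), len(y_dexp))  # 1000 1000
```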

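For a parameter $\lambda$ and its estimate $\hat{\lambda}$, define

$$\mathrm{Var}(\hat{\lambda}) = K^{-1}\sum_{k=1}^{K}\Big(\hat{\lambda}_k - K^{-1}\sum_{j=1}^{K}\hat{\lambda}_j\Big)^2, \quad \mathrm{Bias}(\hat{\lambda}) = K^{-1}\sum_{k=1}^{K}(\hat{\lambda}_k - \lambda), \quad \mathrm{MSE}(\hat{\lambda}) = K^{-1}\sum_{k=1}^{K}(\hat{\lambda}_k - \lambda)^2,$$

where $\hat{\lambda}_k$ is obtained from the $k$-th replication. Table 1 reports the mean squared error, bias, and variance of $\hat{\theta}$, $\tilde{\theta}$, and $\bar{\theta}$ for $n = 1{,}000, 2{,}000, 4{,}000, 8{,}000$, together with the efficiencies $\mathrm{EFF}(\hat{\theta}, \tilde{\theta}) = \mathrm{MSE}(\tilde{\theta})/\mathrm{MSE}(\hat{\theta})$ and $\mathrm{EFF}(\hat{\theta}, \bar{\theta}) = \mathrm{MSE}(\bar{\theta})/\mathrm{MSE}(\hat{\theta})$. Table 2 contains the same measures for $\hat{\eta}$, $\tilde{\eta}$, and $\bar{\eta}$.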
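The accuracy measures and efficiency ratios are simple to compute from the $K$ replicated estimates. The Python sketch below uses made-up replications. The only numbers taken from the paper are the two MSEs in the first column of Table 1 (spline and oracle estimators of $\theta_0$, double exponential innovations), used to recompute the first EFF entry. Because all three measures use the divisor $K$, $\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Var}$ holds exactly.

```python
def summary(estimates, true_value):
    """Var, Bias and MSE of an estimator over K replications (divisor K throughout)."""
    K = len(estimates)
    mean = sum(estimates) / K
    var = sum((e - mean) ** 2 for e in estimates) / K
    bias = sum(e - true_value for e in estimates) / K
    mse = sum((e - true_value) ** 2 for e in estimates) / K
    return {"var": var, "bias": bias, "mse": mse}


def eff(mse_spline, mse_other):
    """EFF(spline, other) = MSE(other) / MSE(spline); values above 1 favour the spline estimator."""
    return mse_other / mse_spline


# Made-up replications of a theta estimate with theta0 = 0.85, for illustration only.
s = summary([0.83, 0.88, 0.86, 0.81, 0.84], 0.85)
print(s)
# With divisor K in all three measures, MSE = Bias^2 + Var up to rounding.
print(abs(s["mse"] - (s["bias"] ** 2 + s["var"])) < 1e-12)  # True
# First EFF entry of Table 1, recomputed from the reported MSEs 0.00290 and 0.00227.
print(round(eff(0.00290, 0.00227), 3))  # 0.783
```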


TABLE 1. Simulated example: MSE, Bias, and Var of $\hat{\theta}$, $\tilde{\theta}$, $\bar{\theta}$ and $\mathrm{EFF}(\hat{\theta}, \tilde{\theta})$, $\mathrm{EFF}(\hat{\theta}, \bar{\theta})$ for $n = 1{,}000, 2{,}000, 4{,}000, 8{,}000$ with 100 replications. Numbers outside/inside parentheses correspond to $\xi_t$ having standard double exponential/standard normal distributions

Measure                            n = 1,000           n = 2,000           n = 4,000           n = 8,000
MSE($\hat{\theta}$)                0.00290 (0.00298)   0.00221 (0.00244)   0.00158 (0.00120)   0.00076 (0.00086)
MSE($\tilde{\theta}$)              0.00227 (0.00250)   0.00191 (0.00206)   0.00137 (0.00105)   0.00073 (0.00080)
MSE($\bar{\theta}$)                0.00368 (0.00246)   0.00271 (0.00188)   0.00230 (0.00093)   0.00134 (0.00068)
Bias($\hat{\theta}$)               −0.0110 (−0.0081)   −0.0104 (−0.0204)   0.0004 (0.00051)    0.0070 (−0.00134)
Bias($\tilde{\theta}$)             −0.0048 (0.0050)    −0.0034 (0.0026)    −0.0040 (−0.0062)   −0.0020 (0.0064)
Bias($\bar{\theta}$)               −0.0078 (−0.0042)   −0.0052 (−0.0074)   0.0044 (−0.0136)    0.0150 (−0.0040)
Var($\hat{\theta}$)                0.00277 (0.00283)   0.00210 (0.00202)   0.00158 (0.0011)    0.00071 (0.00068)
Var($\tilde{\theta}$)              0.00197 (0.00241)   0.00188 (0.00192)   0.00135 (0.0009)    0.00067 (0.00072)
Var($\bar{\theta}$)                0.00366 (0.00239)   0.00271 (0.00182)   0.00228 (0.0008)    0.00105 (0.00067)
EFF($\hat{\theta}, \tilde{\theta}$)  0.783 (0.839)     0.864 (0.844)       0.867 (0.875)       0.961 (0.930)
EFF($\hat{\theta}, \bar{\theta}$)    1.269 (0.825)     1.226 (0.770)       1.456 (0.775)       1.684 (0.79)

TABLE 2. Simulated example: MSE, Bias, and Var of $\hat{\eta}$, $\tilde{\eta}$, $\bar{\eta}$ and $\mathrm{EFF}(\hat{\eta}, \tilde{\eta})$, $\mathrm{EFF}(\hat{\eta}, \bar{\eta})$ for $n = 1{,}000, 2{,}000, 4{,}000, 8{,}000$ with 100 replications

Measure                            n = 1,000           n = 2,000           n = 4,000           n = 8,000
MSE($\hat{\eta}$)                  0.00256 (0.00265)   0.00210 (0.00244)   0.00138 (0.00114)   0.00073 (0.00071)
MSE($\tilde{\eta}$)                0.00184 (0.00235)   0.00163 (0.00206)   0.00145 (0.00105)   0.00068 (0.00062)
MSE($\bar{\eta}$)                  0.00334 (0.00226)   0.00236 (0.00163)   0.00205 (0.00078)   0.00128 (0.00057)
Bias($\hat{\eta}$)                 0.0067 (−0.0064)    −0.0174 (−0.0105)   −0.0024 (0.0036)    0.0066 (−0.0076)
Bias($\tilde{\eta}$)               −0.0035 (0.0038)    −0.0022 (−0.0038)   −0.0045 (−0.0045)   −0.0018 (0.0024)
Bias($\bar{\eta}$)                 −0.0064 (−0.0029)   −0.0052 (−0.0068)   0.0032 (−0.0027)    0.0078 (−0.0034)
Var($\hat{\eta}$)                  0.00249 (0.00254)   0.00181 (0.00215)   0.00138 (0.0009)    0.00081 (0.00068)
Var($\tilde{\eta}$)                0.00173 (0.00226)   0.00162 (0.00184)   0.00142 (0.0009)    0.00065 (0.00059)
Var($\bar{\eta}$)                  0.00366 (0.00204)   0.00271 (0.00156)   0.00228 (0.0007)    0.00105 (0.00054)
EFF($\hat{\eta}, \tilde{\eta}$)    0.719 (0.839)       0.776 (0.844)       0.917 (0.875)       0.939 (0.930)
EFF($\hat{\eta}, \bar{\eta}$)      1.437 (0.853)       1.123 (0.668)       1.485 (0.684)       1.753 (0.803)

Tables 1 and 2 show that $\hat{\theta}$ and $\hat{\eta}$ converge to the true parameters $\theta_0$ and $\eta_0$ as the sample size increases, corroborating the asymptotics in Theorem 2. In Figure 1, the probability density functions of $\hat{\theta}/\theta_0$ and $\hat{\eta}/\eta_0$ are estimated by kernel smoothing based on the 100 replications, which also confirms the numerical convergence. The efficiency results in Tables 1 and 2 show that both $\mathrm{EFF}(\hat{\theta}, \tilde{\theta})$ and $\mathrm{EFF}(\hat{\eta}, \tilde{\eta})$ approach 1 as $n$ increases, as Theorem 3 states. On the other hand, both $\mathrm{EFF}(\hat{\theta}, \bar{\theta})$ and $\mathrm{EFF}(\hat{\eta}, \bar{\eta})$ are below 1 for normal innovations and above 1 for double exponential innovations. These phenomena are explained by $\bar{\gamma}$ being optimally efficient when the correct parametric model is specified (here, the Gaussian model when the actual innovations are normal) and inefficient when an incorrect one is used (here, the Gaussian model when the actual innovations are double exponential).

FIGURE 1. Plot of densities of (a) $\hat{\theta}/\theta_0$ and (b) $\hat{\eta}/\eta_0$ for $n = 1{,}000$ (dashed line), $n = 2{,}000$ (dotted line), $n = 4{,}000$ (thin solid line), and $n = 8{,}000$ (thick solid line).

TABLE 3. Computing time (in seconds) of cubic spline estimation and local linear estimation of parameter $\theta_0$ for one replication with $n = 1{,}000, 2{,}000, 4{,}000, 8{,}000$ (for $n = 8{,}000$, the time for local linear estimation is omitted as it is excessive)

n                           1,000    2,000    4,000    8,000
Spline estimation           12       35       92       252
Local linear estimation     650      3600     18000    −
Time ratio                  1:57     1:103    1:196    −

We have experimented with knot numbers ranging from 3 to 10 and have not seen significant changes in the simulation results.

As discussed in the introduction, Table 3 compares the computing time of the proposed cubic spline method with that of the local linear method of Yang (2006) in estimating the parameter $\gamma_0$. For each candidate parameter vector $\gamma$, the cubic spline method needs to solve one linear least squares problem to compute the empirical risk, while the local linear method has to solve $n$ of them, one for each data point; hence the ratio of their computing times is inversely proportional to $n$. In fact, the computing times are roughly of order $n$ and $n^2$, respectively, for the cubic spline and local linear methods. Since the theoretical properties and numerical performance of the two are similar, the cubic spline method is the one we recommend for estimating $\gamma_0$. Once $\gamma_0$ has been efficiently estimated, the functions $g$ and $m$ can be estimated by either a kernel-type or a spline-type method, using $\hat{\gamma}$ in place of $\gamma_0$.

Given the above empirical observations, $\hat{\gamma}$ is a very competitive estimator of $\gamma_0$ in terms of robustness, efficiency, and computing time. Since the sample sizes we have used are common for high-frequency financial time series, such as the data set in the next section, the satisfactory numerical performance provides the assurance one needs to apply the procedure to real data.

5. APPLICATIONS

In this section, we apply the semiparametric GARCH model to daily percentage returns of the BMW share price from 1 June 1986 to 30 January 1994, a total of 2,000 observations. We truncated $Y_t$ at its 0.01 and 0.99 quantiles; for more details, see Wang et al. (2012). In analyzing the data set, a process $\{X_{\gamma,t}\}_{t=1}^{2000}$ is generated for every parameter value $\gamma$. The parameter estimate $\hat{\gamma}$ is first obtained according to Section 3. In the second step, we use $\hat{\gamma}$ in place of the unknown $\gamma_0$ for the Nadaraya–Watson estimation of the function $g$. The volatility forecasts are $\hat{\sigma}_t^2 = \hat{g}_{\hat{\gamma}}(U_{\hat{\gamma},t})$, and the residuals are $\hat{\xi}_t = Y_t/\hat{\sigma}_t$, $t = 1, \ldots, 2000$.

TABLE 4. Fitting the BMW stock returns: the GARCH(1,1) model has $\theta = 0.87$, the GJR(1,1) model has $\theta = 0.87$, $\eta = 0.66$, and the semiparametric GARCH (spline) model has $\theta = 0.87$, $\eta = 0.66$

Fitted model                Log-likelihood    Volatility prediction error
GARCH(1,1)                  −0.8528           15.65
GJR                         −0.8234           15.04
Semi. GARCH (Spline)        −0.7952           14.15

In Table 4, we compare the goodness of fit of our model with that of GARCH(1,1) and GJR(1,1) in terms of the volatility prediction error $\sum_{t=51}^{2000}(Y_t^2 - \hat{\sigma}_t^2)^2/1950$ and the log-likelihood $-(1/2)\sum_{t=51}^{2000}\{Y_t^2/\hat{\sigma}_t^2 + \ln(\hat{\sigma}_t^2)\}/1950$, with $n' = 50$. The semiparametric GARCH model (2) with the spline estimation method has the best log-likelihood and prediction error. In Figure 2, we show the standardized residuals and the estimated autocorrelation function (ACF) for the daily return series; there is very little, if any, dependence left in the residuals. Further evidence of the residuals' randomness is provided in Table 5, which lists p-values of the Ljung–Box and Box–Pierce tests applied to the semiparametric GARCH residuals. All p-values are large, so there is no evidence of serial dependence in the residuals.
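The two goodness-of-fit criteria in Table 4 are straightforward to compute once fitted variances $\hat{\sigma}_t^2$ are available from any of the models. Below is a minimal Python sketch; the function name and the toy numbers are hypothetical. As in the text, the first $n'$ observations are skipped and both sums are averaged over $n - n'$ terms.

```python
import math


def fit_criteria(y, sigma2, n_prime=50):
    """Volatility prediction error and average Gaussian log-likelihood over t = n'+1, ..., n.

    With n = 2000 and n' = 50 the divisor n - n' is 1950, as in the text.
    Like the formula in the text, the log-likelihood omits the constant -log(2*pi)/2.
    """
    idx = range(n_prime, len(y))  # 0-based positions of t = n'+1, ..., n
    denom = len(y) - n_prime
    vpe = sum((y[i] ** 2 - sigma2[i]) ** 2 for i in idx) / denom
    loglik = -0.5 * sum(y[i] ** 2 / sigma2[i] + math.log(sigma2[i]) for i in idx) / denom
    return vpe, loglik


# Tiny made-up example: returns and fitted variances from some model.
vpe, ll = fit_criteria([1.0, -2.0, 0.5, 1.5], [1.0, 4.0, 1.0, 2.0], n_prime=1)
print(vpe, ll)
```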

Figure 3(a) shows the estimate of the function $g$. We can then estimate the unknown smooth link function $m$ by $\hat{m}(X_{\hat{\gamma},t}) = \hat{g}_{\hat{\gamma}}(U_{\hat{\gamma},t})$, which is shown in Figure 3(b).

TABLE 5. Significance probabilities of portmanteau tests on the residuals of the semiparametric GARCH model

Lag    LB        BP
20     0.7569    0.7633
30     0.7268    0.7308
40     0.5538    0.5771
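For reference, the two portmanteau statistics in Table 5 are $Q_{BP} = n\sum_{k=1}^{L}r_k^2$ and $Q_{LB} = n(n+2)\sum_{k=1}^{L}r_k^2/(n-k)$, where $r_k$ is the lag-$k$ sample autocorrelation. Each is compared with a $\chi^2$ distribution. In the Python sketch below, the choice of $L$ degrees of freedom (no correction for estimated parameters) is an assumption. For even degrees of freedom the $\chi^2$ tail probability has the closed form $e^{-x/2}\sum_{i<L/2}(x/2)^i/i!$, which keeps the sketch free of third-party libraries. The text does not state whether the tests were applied to $\hat{\xi}_t$ or to $\hat{\xi}_t^2$, so the input series is left generic. Simulated white noise stands in for the BMW residuals.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    c0 = sum(di * di for di in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0 for k in range(1, max_lag + 1)]


def chi2_sf_even(x, df):
    """P(chi^2_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    assert df % 2 == 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def portmanteau(x, lag):
    """Ljung-Box and Box-Pierce statistics with p-values (df = lag, an assumption)."""
    n = len(x)
    r = acf(x, lag)
    q_bp = n * sum(rk * rk for rk in r)
    q_lb = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return {"LB": (q_lb, chi2_sf_even(q_lb, lag)), "BP": (q_bp, chi2_sf_even(q_bp, lag))}


rng = random.Random(0)
resid = [rng.gauss(0.0, 1.0) for _ in range(2000)]
for lag in (20, 30, 40):
    print(lag, portmanteau(resid, lag))
```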


FIGURE 2. Semiparametric GARCH modeling of BMW stock returns: (a) standardized residuals; (b) the estimated ACF with 95% Bartlett intervals.


FIGURE 3. Plots of the estimated functions (a) $g$ and (b) $m$, with $\hat{m}(X_{\hat{\gamma},t}) = \hat{g}_{\hat{\gamma}}(U_{\hat{\gamma},t})$, for the semiparametric GARCH model.


REFERENCES

Andrews, B. (2012) Rank-based estimation for GARCH processes. Econometric Theory 28, 1037–1064.

Baraud, Y., F. Comte, & G. Viennet (2001) Model selection for (auto-)regression with dependent data. ESAIM: Probability and Statistics 5, 33–49.

Bauwens, L., S. Laurent, & J. Rombouts (2006) Multivariate GARCH models: A survey. Journal of Applied Econometrics 21, 79–109.

Bollerslev, T.P. (1986) Generalized autoregressive conditional heteroscedasticity. Journal of Econometrics 31, 307–327.

Bosq, D. (1998) Nonparametric Statistics for Stochastic Processes. Springer-Verlag.

Brown, L.D. & M. Levine (2007) Variance estimation in nonparametric regression via the difference sequence method. Annals of Statistics 35, 2219–2232.

Carrasco, M. & X. Chen (2002) Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17–39.

Chan, N., S. Deng, L. Peng, & Z. Xia (2007) Interval estimation of value-at-risk based on GARCH models with heavy-tailed innovations. Journal of Econometrics 137, 556–576.

Conrad, C. & M. Karanasos (2010) Negative volatility spillovers in the unrestricted ECCC-GARCH model. Econometric Theory 26, 838–862.

Dahl, C.M. & M. Levine (2006) Nonparametric estimation of volatility models with serially dependent innovations. Statistics and Probability Letters 76, 2007–2016.

de Boor, C. (2001) A Practical Guide to Splines. Springer-Verlag.

DeVore, R.A. & G.G. Lorentz (1993) Constructive Approximation: Polynomials and Splines Approximation. Springer-Verlag.

Duan, J.C. (1997) Augmented GARCH(p,q) process and its diffusion limit. Journal of Econometrics 79, 97–127.

Engle, R.F. & V. Ng (1993) Measuring and testing the impact of news on volatility. Journal of Finance 48, 1749–1778.

Fan, J. & Q. Yao (2003) Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag.

Francq, C. & J.M. Zakoïan (2012) QML estimation of a class of multivariate asymmetric GARCH models. Econometric Theory 28, 179–206.

Giraitis, L., R. Leipus, & D. Surgailis (2010) Aggregation of the random coefficient GLARCH(1,1) process. Econometric Theory 26, 406–425.

Glosten, L.R., R. Jagannathan, & D.E. Runkle (1993) On the relation between the expected value and the volatility of the nominal excess return on stocks. Journal of Finance 48, 1779–1801.

Hafner, C.M. (2008) Temporal aggregation of multivariate GARCH processes. Journal of Econometrics 142, 467–483.

Hafner, C.M. & H. Herwartz (2006) Volatility impulse responses for multivariate GARCH models: An exchange rate illustration. Journal of International Money and Finance 25, 719–740.

Hafner, C.M. & O. Linton (2010) Efficient estimation of a multivariate multiplicative volatility model. Journal of Econometrics 159, 55–73.

Hentschel, L. (1995) All in the family: Nesting symmetric and asymmetric GARCH models. Journal of Financial Economics 39, 71–104.

Huang, J.Z. (2003) Local asymptotics for polynomial spline regression. Annals of Statistics 31, 1600–1635.

Huang, J.Z. & L. Yang (2004) Identification of nonlinear additive autoregressive models. Journal of the Royal Statistical Society Series B 66, 463–477.

Kim, W. & O. Linton (2011) Estimation of a semiparametric IGARCH(1,1) model. Econometric Theory 27, 639–661.


Levine, M. (2006) Bandwidth selection for a class of difference-based variance estimators in the nonparametric regression: A possible approach. Computational Statistics and Data Analysis 50, 3405–3431.

Linton, O. (2009) Semiparametric and nonparametric ARCH modeling. In T.G. Andersen, R.A. Davis, J.P. Kreiss, & Th. Mikosch (Eds.), Handbook of Financial Time Series, pp. 157–167. Springer-Verlag.

Linton, O. & E. Mammen (2005) Estimating semiparametric ARCH(∞) models by kernel smoothing methods. Econometrica 73, 771–836.

Linton, O., J. Pan, & H. Wang (2010) Estimation for a nonstationary semi-strong GARCH(1,1) model with heavy-tailed errors. Econometric Theory 26, 1–28.

Meitz, M. & P. Saikkonen (2011) Parameter estimation in nonlinear AR–GARCH models. Econometric Theory 27, 1236–1278.

Nelson, D.B. (1990) Stationarity and persistence in the GARCH(1,1) model. Econometric Theory 6, 318–334.

Peng, L. & Q. Yao (2003) Least absolute deviations estimation for ARCH and GARCH models. Biometrika 90, 967–975.

Silvennoinen, A. & T. Teräsvirta (2009) Modeling multivariate autoregressive conditional heteroskedasticity with the double smooth transition conditional correlation GARCH model. Journal of Financial Econometrics 7, 373–411.

Stone, C. (1985) Additive regression and other nonparametric models. Annals of Statistics 13, 689–705.

Sun, Y. & T. Stengos (2006) Semiparametric efficient adaptive estimation of asymmetric GARCH models. Journal of Econometrics 133, 373–386.

Wang, L., C. Feng, Q. Song, & L. Yang (2012) Efficient semiparametric GARCH modelling of financial volatility. Statistica Sinica 22, 249–270.

Wang, L. & L. Yang (2007) Spline-backfitted kernel smoothing of nonlinear additive autoregression model. Annals of Statistics 35, 2474–2503.

Xue, L. & L. Yang (2006) Additive coefficient modeling via polynomial spline. Statistica Sinica 16, 1423–1446.

Yang, L. (2000) Finite nonparametric GARCH model for foreign exchange volatility. Communications in Statistics Theory and Methods 5 & 6, 1347–1365.

Yang, L. (2002) Direct estimation in an additive model when the components are proportional. Statistica Sinica 12, 801–821.

Yang, L. (2006) A semiparametric GARCH model for foreign exchange volatility. Journal of Econometrics 130, 365–384.

Yang, L., W. Härdle, & J.P. Nielsen (1999) Nonparametric autoregression with multiplicative volatility and additive mean. Journal of Time Series Analysis 20, 597–604.

Zhang, R. & S. Ling (2014) Asymptotic inference for AR models with heavy-tailed G-GARCH noises. Econometric Theory. Available on CJO2014. doi:10.1017/S0266466614000632.

APPENDIX

A.1. Preliminaries

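Throughout this section, we use $\|g\|_\infty$ to denote $\sup_x|g(x)|$. For any functions $g_1, g_2 \in L^2[0,1]$, define for every $\gamma \in \Gamma$ the theoretical inner product and norm as

$$\langle g_1, g_2\rangle_\gamma = \int_0^1g_1(u)g_2(u)\varphi_\gamma(u)\,du, \quad \|g_1\|_{2,\gamma}^2 = \langle g_1, g_1\rangle_\gamma.$$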


For any vector λ= (λ1, . . . ,λp)

and 0 < r < ∞, ‖λ‖r =(∑p

i=1 |λi |r)1/r

and ‖λ‖∞ =max

(|λ1| , . . . ,∣∣λp∣∣). In particular, denote ‖λ‖ = ‖λ‖2.

LEMMA A.1 (Bernstein's inequality, Bosq, 1998, Thm. 1.4). Let $\{\xi_t\}$ be a zero-mean real-valued strongly mixing process with mixing coefficients $\alpha(\cdot)$, and let $S_n=\sum_{i=1}^n\xi_i$. Suppose that there exists $c>0$ such that for $i=1,\ldots,n$ and $k\ge3$, $E|\xi_i|^k\le c^{k-2}k!\,E\xi_i^2<+\infty$, and let $m_r=\max_{1\le i\le n}\|\xi_i\|_r$, $r\ge2$. Then for each $n>1$, each integer $q=q_n\in[1,n/2]$, each $\varepsilon_n>0$ and each $k\ge3$,

$$P\left\{\left|\sum_{i=1}^n\xi_i\right|>n\varepsilon_n\right\}\le a_1\exp\left(-\frac{q\varepsilon_n^2}{25m_2^2+5c\varepsilon_n}\right)+a_2(k)\,\alpha\left(\left[\frac{n}{q+1}\right]\right)^{2k/(2k+1)},$$

where

$$a_1=2\frac{n}{q}+2\left(1+\frac{\varepsilon_n^2}{25m_2^2+5c\varepsilon_n}\right),\qquad a_2(k)=11n\left(1+\frac{5m_k^{2k/(2k+1)}}{\varepsilon_n}\right).$$

Next, we introduce some properties of B-splines. We denote by $Q_T(g)$ the 4th-order quasi-interpolant of $g$ corresponding to the knots $T$; see DeVore and Lorentz (1993, eqn. 4.12, p. 146). According to DeVore and Lorentz (1993, Thm. 7.7.4, p. 225), the following lemma holds.

LEMMA A.2. There exists a constant $C_\infty>0$ such that for any $g\in C^{(r)}[0,1]$ and $0\le k\le2$,

$$\left\|\left(Q_T(g)-g\right)^{(k)}\right\|_\infty\le C_\infty\left\|g^{(r)}\right\|_\infty h^{r-k}.$$

LEMMA A.3 (B-spline properties).

(i) Partition of unity (de Boor, 2001, p. 96). The sequence $\{B_{j,k}\}_{j=-k+1}^{N}$ provides a positive and local partition of unity, i.e., each $B_{j,k}$ is positive on $(t_j,t_{j+k})$, is zero off $[t_j,t_{j+k}]$, and $\sum_{j=-k+1}^{N}B_{j,k}=1$.

(ii) Differentiation (de Boor, 2001, p. 116).

$$\frac{d}{du}B_{j,k}(u)=(k-1)\left\{\frac{B_{j,k-1}(u)}{t_{j+k-1}-t_j}-\frac{B_{j+1,k-1}(u)}{t_{j+k}-t_{j+1}}\right\},\qquad 1-k\le j\le N.$$

(iii) Good condition (DeVore and Lorentz, 1993, Thm. 5.4.2, p. 145). There is a constant $D_k>0$ such that for each spline $S=\sum_{j=-k+1}^{N}c_jB_{j,k}$ of order $k$ and each $0<r\le\infty$,

$$\begin{cases}D_k\|c'\|_r\le\|S\|_r\le\|c'\|_r, & 1\le r\le\infty,\\[2pt] D_k\|c'\|_r\le\|S\|_r\le k^{1/r}\|c'\|_r, & 0<r<1.\end{cases}$$
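Parts (i) and (ii) of Lemma A.3 are easy to check numerically. The following is a minimal sketch, assuming the equally spaced knots $t_i=ih$, $h=1/(N+1)$, used in the proof of Lemma A.4. It evaluates $B_{j,k}$ with the Cox–de Boor recursion, sums $B_{j,k}$ over $j=-k+1,\ldots,N$ at points of $[0,1)$, and compares the differentiation formula with a central finite difference. The function names (`bspline`, `derivative_formula`, `check_lemma_a3`) are illustrative and not from the paper.

```python
def bspline(j, k, u, h):
    """Order-k B-spline B_{j,k}(u) on the equally spaced knots t_i = i*h,
    evaluated with the Cox-de Boor recursion."""
    if k == 1:
        # order 1: indicator of the half-open knot interval [t_j, t_{j+1})
        return 1.0 if j * h <= u < (j + 1) * h else 0.0
    # weights (u - t_j)/(t_{j+k-1} - t_j) and (t_{j+k} - u)/(t_{j+k} - t_{j+1})
    left = (u - j * h) / ((j + k - 1) * h - j * h)
    right = ((j + k) * h - u) / ((j + k) * h - (j + 1) * h)
    return left * bspline(j, k - 1, u, h) + right * bspline(j + 1, k - 1, u, h)


def derivative_formula(j, k, u, h):
    """Right-hand side of Lemma A.3 (ii)."""
    t = lambda i: i * h
    return (k - 1) * (bspline(j, k - 1, u, h) / (t(j + k - 1) - t(j))
                      - bspline(j + 1, k - 1, u, h) / (t(j + k) - t(j + 1)))


def check_lemma_a3(N=5, eps=1e-6):
    """Return (max partition-of-unity error, max derivative-formula error)."""
    h = 1.0 / (N + 1)
    pu_err = 0.0
    for k in (2, 3, 4):
        for i in range(200):
            u = i / 200  # points in [0, 1)
            total = sum(bspline(j, k, u, h) for j in range(-k + 1, N + 1))
            pu_err = max(pu_err, abs(total - 1.0))
    d_err = 0.0
    for k in (2, 3, 4):
        for j in range(-k + 1, N + 1):
            u = (j + 0.37) * h  # inside [t_j, t_{j+1}), away from the knots
            fd = (bspline(j, k, u + eps, h) - bspline(j, k, u - eps, h)) / (2 * eps)
            d_err = max(d_err, abs(fd - derivative_formula(j, k, u, h)))
    return pu_err, d_err


if __name__ == "__main__":
    print(check_lemma_a3())
```

Both errors should be at rounding level; the derivative is tested away from the knots because the linear B-splines ($k=2$) are not differentiable there.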

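LEMMA A.4. There exists a constant $c>0$ such that for any $\lambda:=(\lambda_{-1,2},\lambda_{0,2},\ldots,\lambda_{N,2},\ldots,\lambda_{N,4})\in\mathbb{R}^{3N+9}$,

$$\begin{cases}ch^{1/r}\|\lambda\|_r\le\left\|\sum_{k=2}^4\sum_{j=-k+1}^N\lambda_{j,k}B_{j,k}\right\|_r\le\left(3^{r-1}kh\right)^{1/r}\|\lambda\|_r, & 1\le r\le\infty,\\[2pt] ch^{1/r}\|\lambda\|_r\le\left\|\sum_{k=2}^4\sum_{j=-k+1}^N\lambda_{j,k}B_{j,k}\right\|_r\le(3kh)^{1/r}\|\lambda\|_r, & 0<r<1.\end{cases}$$

In particular, under Assumption (A4), there exist constants $c,C\in(0,+\infty)$ such that

$$ch^{1/2}\|\lambda\|_2\le\left\|\sum_{k=2}^4\sum_{j=-k+1}^N\lambda_{j,k}B_{j,k}\right\|_{2,\gamma}\le Ch^{1/2}\|\lambda\|_2\quad\text{for all }\gamma .$$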

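Proof. It follows from Lemma A.3 (i) that $\sum_{k=2}^4\sum_{j=-k+1}^NB_{j,k}\equiv3$ on $[0,1]$, so the right inequality follows immediately for $r=\infty$. When $1\le r<\infty$, Hölder's inequality implies that

$$\left|\sum_{k=2}^4\sum_{j=-k+1}^N\lambda_{j,k}B_{j,k}\right|\le3^{1-1/r}\left(\sum_{k=2}^4\sum_{j=-k+1}^N|\lambda_{j,k}|^rB_{j,k}\right)^{1/r}.$$

Since all the knots are equally spaced, Lemma A.3 (i) ensures that $\int_{-\infty}^{\infty}B_{j,k}(u)\,du\le kh$, and the right inequality follows from

$$\int_0^1\left|\sum_{k=2}^4\sum_{j=-k+1}^N\lambda_{j,k}B_{j,k}(u)\right|^rdu\le3^{r-1}kh\|\lambda\|_r^r .$$

When $r<1$, we have $\left|\sum_{k=2}^4\sum_{j=-k+1}^N\lambda_{j,k}B_{j,k}\right|^r\le\sum_{k=2}^4\sum_{j=-k+1}^N|\lambda_{j,k}|^rB_{j,k}^r$. Since $\int_{-\infty}^{\infty}B_{j,k}^r(u)\,du\le t_{j+k}-t_j=kh$ and

$$\int_0^1\left|\sum_{k=2}^4\sum_{j=-k+1}^N\lambda_{j,k}B_{j,k}(u)\right|^rdu\le\|\lambda\|_r^r\int_{-\infty}^{\infty}B_{j,k}^r(u)\,du\le3kh\|\lambda\|_r^r,$$

the right inequality follows in this case as well. For the left inequalities, we derive from Lemma A.3 (iii) that, for any $0<r\le\infty$,

$$|\lambda_{j,k}|^r\le C_rh^{-1}\int_{t_j}^{t_{j+1}}\left|\sum_{j'=-k+1}^N\lambda_{j',k}B_{j',k}(u)\right|^rdu .$$

Since each $u\in[0,1]$ appears in at most $k$ intervals $(t_j,t_{j+k})$, adding up these inequalities we obtain

$$\|\lambda\|_r^r\le C_1h^{-1}\sum_{k=2}^4\sum_{j=-k+1}^N\int_{t_j}^{t_{j+k}}\left|\sum_{j'=-k+1}^N\lambda_{j',k}B_{j',k}(u)\right|^rdu\le3Ch^{-1}\max_{2\le k\le4}\left\|\sum_{j=-k+1}^N\lambda_{j,k}B_{j,k}\right\|_r^r .$$

The left inequality follows. ∎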

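Define, for any functions $g_1,g_2\in L^2[0,1]$ and any $\gamma$, the empirical inner product and norm as

$$\langle g_1,g_2\rangle_{n,\gamma}=(n'')^{-1}\sum_{t=n'+1}^{n}g_1(U_{\gamma,t})g_2(U_{\gamma,t}),\qquad\|g_1\|_{2,n,\gamma}^2=\langle g_1,g_1\rangle_{n,\gamma},$$

where $n''=n-n'$ is the number of summands.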
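The empirical inner products of the cubic B-splines form the Gram matrix $V_{n,\gamma}=(n'')^{-1}B_\gamma^{\mathrm T}B_\gamma$, and Lemma A.5 below says it is uniformly close to its theoretical counterpart. The sketch below illustrates this under simplifying assumptions that are not the paper's setting: the $U_{\gamma,t}$ are drawn i.i.d. from Uniform(0,1), so $\varphi_\gamma\equiv1$ and there is no serial dependence, and $n$ plays the role of $n''$. The theoretical Gram matrix is approximated by a midpoint rule. The helper names are hypothetical.

```python
import numpy as np


def cubic_bspline_basis(u, N):
    """Matrix with columns B_{j,4}(u), j = -3, ..., N, on the knots t_i = i/(N+1)."""
    h = 1.0 / (N + 1)
    u = np.asarray(u, dtype=float)[:, None]
    js = np.arange(-3, N + 4)                      # order-1 indices j = -3, ..., N+3
    B = ((js * h <= u) & (u < (js + 1) * h)).astype(float)
    for m in range(2, 5):                          # Cox-de Boor: order m-1 -> order m
        j = js[: B.shape[1] - 1]
        B = ((u - j * h) * B[:, :-1] + ((j + m) * h - u) * B[:, 1:]) / ((m - 1) * h)
    return B                                       # shape (len(u), N + 4)


def gram_matrices(n, N, seed=0, n_quad=20000):
    """Empirical V_n = B^T B / n versus a quadrature approximation of V,
    for the hypothetical case U ~ Uniform(0, 1), i.e. phi_gamma = 1."""
    rng = np.random.default_rng(seed)
    B = cubic_bspline_basis(rng.uniform(0.0, 1.0, size=n), N)
    V_n = B.T @ B / n
    grid = (np.arange(n_quad) + 0.5) / n_quad      # midpoint rule on [0, 1]
    Bg = cubic_bspline_basis(grid, N)
    V = Bg.T @ Bg / n_quad
    return V_n, V


if __name__ == "__main__":
    for n in (2_000, 20_000, 200_000):
        V_n, V = gram_matrices(n, N=8)
        print(n, np.max(np.abs(V_n - V)))
```

The printed maximal entrywise discrepancy should shrink roughly like $n^{-1/2}$ as $n$ grows, in line with the $(nN)^{-1/2}\log n$ rate of Lemma A.5.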

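LEMMA A.5. Under Assumptions (A3), (A4), and (A6), as $n\to\infty$, with probability 1,

$$\sup_\gamma\max_{\substack{k,k'=2,3,4\\1-k\le j\le N,\,1-k'\le j'\le N}}\left|\langle B_{j,k},B_{j',k'}\rangle_{n,\gamma}-\langle B_{j,k},B_{j',k'}\rangle_\gamma\right|=O\left\{(nN)^{-1/2}\log n\right\}.$$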


Proof. We only prove the case $k=k'=4$; all other cases are similar. Let

$$\zeta_{\gamma,j,j',t}=B_{j,4}(U_{\gamma,t})B_{j',4}(U_{\gamma,t})-EB_{j,4}(U_{\gamma,t})B_{j',4}(U_{\gamma,t}),$$

with second moment

$$E\zeta_{\gamma,j,j',t}^2=E\left[B_{j,4}^2(U_{\gamma,t})B_{j',4}^2(U_{\gamma,t})\right]-\left\{EB_{j,4}(U_{\gamma,t})B_{j',4}(U_{\gamma,t})\right\}^2,$$

where $E[B_{j,4}^2(U_{\gamma,t})B_{j',4}^2(U_{\gamma,t})]\sim N^{-1}$ and $\{EB_{j,4}(U_{\gamma,t})B_{j',4}(U_{\gamma,t})\}^2\sim N^{-2}$ uniformly for all $-3\le j,j'\le N$ by Assumption (A4) and Lemma A.4. Hence $E\zeta_{\gamma,j,j',t}^2\sim N^{-1}$ uniformly for all $-3\le j,j'\le N$. The $k$-th moment satisfies

$$E|\zeta_{\gamma,j,j',t}|^k\le2^{k-1}\left\{E\left|B_{j,4}(U_{\gamma,t})B_{j',4}(U_{\gamma,t})\right|^k+\left|EB_{j,4}(U_{\gamma,t})B_{j',4}(U_{\gamma,t})\right|^k\right\},$$

where $E|B_{j,4}(U_{\gamma,t})B_{j',4}(U_{\gamma,t})|^k\sim N^{-1}$ and $|EB_{j,4}(U_{\gamma,t})B_{j',4}(U_{\gamma,t})|^k\sim N^{-k}$ uniformly for all $-3\le j,j'\le N$. Thus there exists a constant $C>0$ such that $E|\zeta_{\gamma,j,j',t}|^k\le C2^{k-1}k!\,E\zeta_{\gamma,j,j',t}^2$ for all $-3\le j,j'\le N$, so Cramér's condition in Lemma A.1 is satisfied. For $\delta_n=\delta\log n/\sqrt{nN}$, some constant $c$ such that $cn/\log n\le q=q_n<n/2$, and fixed $\gamma$, one then has

$$P\left\{\frac{1}{n''}\left|\sum_{t=n'+1}^n\zeta_{\gamma,j,j',t}\right|>\delta_n\right\}\le n^{-10}.\qquad(A.1)$$

We divide the intervals $[\theta_1,\theta_2]$ and $[\eta_1,\eta_2]$ into $M_n=n^3$ equally spaced intervals with endpoints $\theta_1=\theta_{(1)}<\cdots<\theta_{(M_n)}=\theta_2$ and $\eta_1=\eta_{(1)}<\cdots<\eta_{(M_n)}=\eta_2$ (discretization), and write $\gamma_{rs}=(\theta_{(r)},\eta_{(s)})$. Then $\sup_\gamma\max_{-3\le j,j'\le N}|\zeta_{\gamma,j,j',t}|$ is bounded by

$$\sup_{1\le r,s\le M_n}\max_{-3\le j,j'\le N}\left|\zeta_{\gamma_{rs},j,j',t}\right|+\max_{-3\le j,j'\le N}\sup_{1\le r,s\le M_n}\max_{\gamma\in[\theta_{(r)},\theta_{(r+1)}]\times[\eta_{(s)},\eta_{(s+1)}]}\left|\zeta_{\gamma,j,j',t}-\zeta_{\gamma_{rs},j,j',t}\right|.\qquad(A.2)$$

By the Borel–Cantelli lemma, (A.1) implies that

$$\sup_{1\le r,s\le M_n}(n'')^{-1}\left|\sum_{t=n'+1}^n\zeta_{\gamma_{rs},j,j',t}\right|=O\left\{(nN)^{-1/2}\log n\right\},\quad\text{a.s.}\qquad(A.3)$$

Employing the Lipschitz continuity of the cubic B-splines, one has with probability 1

$$\sup_{1\le r,s\le M_n}\max_{\gamma\in[\theta_{(r)},\theta_{(r+1)}]\times[\eta_{(s)},\eta_{(s+1)}]}\left|(n'')^{-1}\sum_{t=n'+1}^n\left(\zeta_{\gamma,j,j',t}-\zeta_{\gamma_{rs},j,j',t}\right)\right|=O\left(M_n^{-1}h^{-6}\right).\qquad(A.4)$$

Therefore Assumption (A4), (A.2), (A.3), and (A.4) lead to the result. ∎
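The proof above relies on the Bernstein-type bound of Lemma A.1. To see how its two terms trade off, the sketch below evaluates the right-hand side of Lemma A.1 exactly as transcribed above, including the exponent of $m_k$ in $a_2(k)$. The mixing coefficients are supplied as a callable; the geometric rate $\alpha(m)=e^{-m}$ used in the example is a hypothetical choice, not an assumption of the paper, and `bosq_bernstein_bound` is an illustrative name.

```python
import math


def bosq_bernstein_bound(n, q, eps, m2, mk, c, k, alpha):
    """Right-hand side of the Bernstein-type inequality in Lemma A.1.

    alpha is a callable returning the strong mixing coefficient alpha(m);
    the exponent of m_k follows Lemma A.1 as printed above.
    """
    if not (1 <= q <= n / 2) or eps <= 0 or k < 3:
        raise ValueError("need 1 <= q <= n/2, eps > 0 and k >= 3")
    denom = 25 * m2 ** 2 + 5 * c * eps
    a1 = 2 * n / q + 2 * (1 + eps ** 2 / denom)
    a2 = 11 * n * (1 + 5 * mk ** (2 * k / (2 * k + 1)) / eps)
    blocks = n // (q + 1)                          # integer part [n / (q + 1)]
    exponential_part = a1 * math.exp(-q * eps ** 2 / denom)
    mixing_part = a2 * alpha(blocks) ** (2 * k / (2 * k + 1))
    return exponential_part + mixing_part


if __name__ == "__main__":
    geometric = lambda m: math.exp(-m)             # hypothetical mixing rate
    for eps in (0.1, 0.3, 0.5):
        print(eps, bosq_bernstein_bound(10**6, 10**4, eps, 1.0, 1.0, 1.0, 3, geometric))
```

With these illustrative inputs the bound exceeds 1 (so it is vacuous) at $\varepsilon=0.1$ and is negligible at $\varepsilon=0.5$. The block length $[n/(q+1)]$ must be long enough for $\alpha(\cdot)$ to kill the factor $a_2(k)$, which grows linearly in $n$.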


Denote by $G=G^{(0)}\cup G^{(1)}\cup G^{(2)}$ the space of all linear, quadratic, and cubic spline functions on $[0,1]$. We next establish the uniform rate at which the empirical inner product approximates the theoretical inner product for all splines in $G$, that is, for all linear combinations of the B-splines $B_{j,k}$ with $k=2,3,4$.

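LEMMA A.6. Under Assumptions (A3), (A4), and (A6), as $n\to\infty$, one has

$$A_n=\sup_\gamma\sup_{\phi_1,\phi_2\in G}\left|\frac{\langle\phi_1,\phi_2\rangle_{n,\gamma}-\langle\phi_1,\phi_2\rangle_\gamma}{\|\phi_1\|_{2,\gamma}\|\phi_2\|_{2,\gamma}}\right|=O\left\{(nh)^{-1/2}\log n\right\},\quad\text{a.s.}\qquad(A.5)$$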

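Proof. Without loss of generality, write $\phi_a=\sum_{k=2}^4\sum_{j=-k+1}^N\phi_{a,j,k}B_{j,k}$, $a=1,2$. Then

$$\langle\phi_1,\phi_2\rangle_{n,\gamma}=\sum_{k=2}^4\sum_{j=-k+1}^N\sum_{k'=2}^4\sum_{j'=-k'+1}^N\phi_{1,j,k}\phi_{2,j',k'}\langle B_{j,k},B_{j',k'}\rangle_{n,\gamma},$$

$$\|\phi_1\|_{2,\gamma}^2=\sum_{k=2}^4\sum_{j=-k+1}^N\sum_{k'=2}^4\sum_{j'=-k'+1}^N\phi_{1,j,k}\phi_{1,j',k'}\langle B_{j,k},B_{j',k'}\rangle_\gamma,$$

$$\|\phi_2\|_{2,\gamma}^2=\sum_{k=2}^4\sum_{j=-k+1}^N\sum_{k'=2}^4\sum_{j'=-k'+1}^N\phi_{2,j,k}\phi_{2,j',k'}\langle B_{j,k},B_{j',k'}\rangle_\gamma .$$

Let $\boldsymbol\phi_1=(\phi_{1,-1,2},\phi_{1,0,2},\ldots,\phi_{1,N,2},\ldots,\phi_{1,N,4})$ and $\boldsymbol\phi_2=(\phi_{2,-1,2},\phi_{2,0,2},\ldots,\phi_{2,N,2},\ldots,\phi_{2,N,4})$. According to Lemma A.4, one has for any $\gamma$

$$ch\|\boldsymbol\phi_1\|_2^2\le\|\phi_1\|_{2,\gamma}^2\le Ch\|\boldsymbol\phi_1\|_2^2,\qquad ch\|\boldsymbol\phi_2\|_2^2\le\|\phi_2\|_{2,\gamma}^2\le Ch\|\boldsymbol\phi_2\|_2^2,$$

$$ch\|\boldsymbol\phi_1\|_2\|\boldsymbol\phi_2\|_2\le\|\phi_1\|_{2,\gamma}\|\phi_2\|_{2,\gamma}\le Ch\|\boldsymbol\phi_1\|_2\|\boldsymbol\phi_2\|_2 .$$

Hence

$$A_n\le\frac{\|\boldsymbol\phi_1\|_\infty\|\boldsymbol\phi_2\|_\infty}{c_1h\|\boldsymbol\phi_1\|_2\|\boldsymbol\phi_2\|_2}\sup_\gamma\max_{\substack{k,k'=2,3,4\\1-k\le j\le N,\,1-k'\le j'\le N}}\left|\langle B_{j,k},B_{j',k'}\rangle_{n,\gamma}-\langle B_{j,k},B_{j',k'}\rangle_\gamma\right|\le c_0h^{-1}\sup_\gamma\max_{\substack{k,k'=2,3,4\\1-k\le j\le N,\,1-k'\le j'\le N}}\left|\langle B_{j,k},B_{j',k'}\rangle_{n,\gamma}-\langle B_{j,k},B_{j',k'}\rangle_\gamma\right|,$$

which, together with Lemma A.5, implies (A.5). ∎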

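For any fixed $\gamma$, write $Y^2=\{Y_t^2\}_{t=n'+1}^n$ and $g_\gamma=\{g_\gamma(U_{\gamma,t})\}_{t=n'+1}^n$. One has $Y^2=g_\gamma+g-g_\gamma+E=g_\gamma+E_\gamma+E$, where $E=\{g(U_t)(\xi_t^2-1)\}_{t=n'+1}^n$ and $E_\gamma=\{g(U_t)-g_\gamma(U_{\gamma,t})\}_{t=n'+1}^n$. Then one can break the cubic spline estimation error as

$$\hat g_\gamma(u)-g_\gamma(u)=\tilde g_\gamma(u)-g_\gamma(u)+\tilde\varepsilon_\gamma(u)+\hat\varepsilon_\gamma(u),\qquad(A.6)$$

where

$$\tilde g_\gamma(u)=\{B_{j,4}(u)\}_{-3\le j\le N}^{\mathrm T}V_{n,\gamma}^{-1}\left\{\langle g_\gamma,B_{j,4}\rangle_{n,\gamma}\right\}_{j=-3}^N,$$

$$\tilde\varepsilon_\gamma(u)=\{B_{j,4}(u)\}_{-3\le j\le N}^{\mathrm T}V_{n,\gamma}^{-1}\left\{\langle E_\gamma,B_{j,4}\rangle_{n,\gamma}\right\}_{j=-3}^N,$$

$$\hat\varepsilon_\gamma(u)=\{B_{j,4}(u)\}_{-3\le j\le N}^{\mathrm T}V_{n,\gamma}^{-1}\left\{\langle E,B_{j,4}\rangle_{n,\gamma}\right\}_{j=-3}^N,$$

$$V_{n,\gamma}=\left\{\langle B_{j,4},B_{j',4}\rangle_{n,\gamma}\right\}_{j,j'=-3}^N,\qquad V_\gamma=\left\{\langle B_{j,4},B_{j',4}\rangle_\gamma\right\}_{j,j'=-3}^N.\qquad(A.7)$$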
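Decomposition (A.6) is a consequence of the linearity of the least-squares spline fit: the coefficient vector $V_{n,\gamma}^{-1}\{(n'')^{-1}B_\gamma^{\mathrm T}Y^2\}$ of $\hat g_\gamma$ is the sum of the coefficient vectors of $\tilde g_\gamma$, $\tilde\varepsilon_\gamma$, and $\hat\varepsilon_\gamma$. The sketch below checks this and also the bound $\sup_u|\hat\varepsilon_\gamma(u)|\le\|a\|_\infty$ used later in Lemma A.14, which follows from the partition of unity. Everything about the data is hypothetical: a single $U$ sample stands in for both $U_{\gamma,t}$ and $U_t$, and $g_\gamma$, $g-g_\gamma$ and the standard normal $\xi_t$ are illustrative choices.

```python
import numpy as np


def cubic_bspline_basis(u, N):
    """Matrix with columns B_{j,4}(u), j = -3, ..., N, on the knots t_i = i/(N+1)."""
    h = 1.0 / (N + 1)
    u = np.asarray(u, dtype=float)[:, None]
    js = np.arange(-3, N + 4)
    B = ((js * h <= u) & (u < (js + 1) * h)).astype(float)
    for m in range(2, 5):                          # Cox-de Boor recursion
        j = js[: B.shape[1] - 1]
        B = ((u - j * h) * B[:, :-1] + ((j + m) * h - u) * B[:, 1:]) / ((m - 1) * h)
    return B


def spline_coef(B, y):
    """Coefficients V_n^{-1} {(n'')^{-1} B^T y} of the least-squares cubic spline fit."""
    n2 = B.shape[0]                                # plays the role of n''
    return np.linalg.solve(B.T @ B / n2, B.T @ y / n2)


def decomposition_demo(n2=5000, N=10, seed=1):
    rng = np.random.default_rng(seed)
    U = rng.uniform(0.0, 1.0, size=n2)             # hypothetical U_{gamma,t} (also used as U_t)
    g_gamma = 1.0 + 0.5 * np.sin(2 * np.pi * U)    # hypothetical g_gamma(U_{gamma,t})
    E_gamma = 0.05 * np.cos(7 * U)                 # hypothetical g(U_t) - g_gamma(U_{gamma,t})
    E = (g_gamma + E_gamma) * (rng.standard_normal(n2) ** 2 - 1)  # g(U_t)(xi_t^2 - 1)
    Y2 = g_gamma + E_gamma + E                     # Y_t^2 = g_gamma + E_gamma + E
    B = cubic_bspline_basis(U, N)
    return {name: spline_coef(B, v) for name, v in
            [("g_hat", Y2), ("g_tilde", g_gamma), ("eps_tilde", E_gamma), ("eps_hat", E)]}


if __name__ == "__main__":
    c = decomposition_demo()
    print(np.max(np.abs(c["g_hat"] - (c["g_tilde"] + c["eps_tilde"] + c["eps_hat"]))))
```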

The next proposition is used in proving Proposition 1.

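PROPOSITION 2. Under Assumptions (A1)–(A4) and (A6), as $n\to\infty$,

$$\sup_\gamma\sup_{u\in[0,1]}\left|\hat g_\gamma(u)-g_\gamma(u)\right|=O\left\{(nh)^{-1/2}\log n+h^4\right\},\quad\text{a.s.},\qquad(A.8)$$

$$\sup_\gamma\max_{n'+1\le t\le n}\left|\nabla\left\{\hat g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\right\}\right|=O\left\{n^{-1/2}h^{-3/2}\log n+h^3\right\},\quad\text{a.s.},\qquad(A.9)$$

$$\sup_\gamma\max_{n'+1\le t\le n}\left|\nabla^2\left\{\hat g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\right\}\right|=O\left\{n^{-1/2}h^{-5/2}\log n+h^2\right\},\quad\text{a.s.}\qquad(A.10)$$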

In order to prove the above proposition, we need several technical lemmas. The following is a special case of DeVore and Lorentz (1993, Thm. 13.4.3). For a square positive definite symmetric matrix $B=(b_{i,j})$, we denote $\|B\|_2=\sup\{\|Bx\|_2/\|x\|_2:x\ne0\}=\sup\{x^{\mathrm T}Bx/\|x\|_2^2:x\ne0\}$, which is the largest eigenvalue of $B$, and $\|B\|_\infty=\max_i\sum_j|b_{i,j}|$.

LEMMA A.7. If a bi-infinite matrix $A$ with bandwidth $r$ has a bounded inverse $A^{-1}$ on $\ell^2$ (defined in DeVore and Lorentz, 1993, p. 19), and $\kappa=\kappa(A)=\|A\|_2\|A^{-1}\|_2$ is the condition number of $A$, then $\|A^{-1}\|_\infty\le2c_0(1-v)^{-1}$, with

$$c_0=v^{-2r}\|A^{-1}\|_2,\qquad v=\left(\kappa^2-1\right)^{1/(4r)}\left(\kappa^2+1\right)^{-1/(4r)}.$$
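Lemma A.7 turns a condition-number bound into a bound on $\|A^{-1}\|_\infty$ for banded matrices. The sketch below evaluates $2c_0(1-v)^{-1}$, reading $c_0$ as $v^{-2r}\|A^{-1}\|_2$ as stated in the lemma, for a finite symmetric tridiagonal test matrix ($r=1$) chosen purely for illustration, and compares it with the exact $\|A^{-1}\|_\infty$. For this diagonally dominant matrix, the classical Varah bound also gives $\|A^{-1}\|_\infty\le1/(4-2)=1/2$ independently of the size. The helper names are hypothetical.

```python
import numpy as np


def lemma_a7_bound(A, r):
    """2 c0 / (1 - v) from Lemma A.7 for a symmetric positive definite A of
    bandwidth r, with c0 = v^{-2r} ||A^{-1}||_2 and
    v = ((kappa^2 - 1) / (kappa^2 + 1))^{1/(4r)}."""
    eig = np.linalg.eigvalsh(A)                    # eigenvalues in ascending order
    kappa = eig[-1] / eig[0]                       # ||A||_2 ||A^{-1}||_2
    v = ((kappa ** 2 - 1) / (kappa ** 2 + 1)) ** (1.0 / (4 * r))
    c0 = v ** (-2 * r) / eig[0]                    # ||A^{-1}||_2 = 1 / smallest eigenvalue
    return 2 * c0 / (1 - v)


def inverse_inf_norm(A):
    """||A^{-1}||_inf, the maximum absolute row sum of A^{-1}."""
    return np.max(np.abs(np.linalg.inv(A)).sum(axis=1))


def tridiagonal(size, diag=4.0, off=1.0):
    """Symmetric tridiagonal (bandwidth r = 1) test matrix."""
    return (np.diag(np.full(size, diag)) + np.diag(np.full(size - 1, off), 1)
            + np.diag(np.full(size - 1, off), -1))


if __name__ == "__main__":
    for size in (25, 100, 400):
        A = tridiagonal(size)
        print(size, inverse_inf_norm(A), lemma_a7_bound(A, r=1))
```

The bound from Lemma A.7 is loose but, like the exact norm, it does not grow with the matrix size. In the proof of Lemma A.8 the role of $A$ is played by $V_{n,\gamma}$, whose smallest eigenvalue is of order $N^{-1}$; this is where the factor $N$ in (A.13) comes from.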

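LEMMA A.8. Under Assumptions (A3), (A4), and (A6), there exist constants $0<c_V<C_V$ such that for any vector $w\ne0$,

$$c_VN^{-1}\|w\|_2^2\le w^{\mathrm T}V_\gamma w\le C_VN^{-1}\|w\|_2^2,\qquad(A.11)$$

$$c_VN^{-1}\|w\|_2^2\le w^{\mathrm T}V_{n,\gamma}w\le C_VN^{-1}\|w\|_2^2,\qquad(A.12)$$

with the matrices $V_\gamma$ and $V_{n,\gamma}$ defined in (A.7). In addition, there exists a constant $C>0$ such that

$$\sup_\gamma\left\|V_{n,\gamma}^{-1}\right\|_\infty\le CN,\ \text{a.s.},\qquad\sup_\gamma\left\|V_\gamma^{-1}\right\|_\infty\le CN.\qquad(A.13)$$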


Proof. Let $w=(w_{-3},\ldots,w_N)^{\mathrm T}$ be any $(N+4)$-vector and $\phi_w(u)=\sum_{j=-3}^Nw_jB_{j,4}(u)$; then $B_\gamma w=\{\phi_w(U_{\gamma,n'+1}),\ldots,\phi_w(U_{\gamma,n})\}^{\mathrm T}$, and $A_n$ in (A.5) entails that

$$\|\phi_w\|_{2,\gamma}^2(1-A_n)\le w^{\mathrm T}V_{n,\gamma}w\le\|\phi_w\|_{2,\gamma}^2(1+A_n).\qquad(A.14)$$

By DeVore and Lorentz (1993, Thm. 5.4.2) and Assumption (A4), one has

$$c_\varphi CN^{-1}\|w\|_2^2\le w^{\mathrm T}V_\gamma w\le C_\varphi CN^{-1}\|w\|_2^2,\qquad(A.15)$$

which, together with (A.14), yields

$$c_\varphi CN^{-1}\|w\|_2^2(1-A_n)\le w^{\mathrm T}V_{n,\gamma}w\le C_\varphi CN^{-1}\|w\|_2^2(1+A_n).$$

Then (A.11) and (A.12) follow from (A.15), (A.14), and (A.5). Next, denote by $\lambda_{\max}(V_{n,\gamma})$ and $\lambda_{\min}(V_{n,\gamma})$ the maximum and minimum eigenvalues of $V_{n,\gamma}$. By (A.12) and the definition of $\|\cdot\|_2$, $\|V_{n,\gamma}\|_2=\lambda_{\max}(V_{n,\gamma})\le C_VN^{-1}$ and $\|V_{n,\gamma}^{-1}\|_2=\lambda_{\min}^{-1}(V_{n,\gamma})\le c_V^{-1}N$, a.s., so that

$$\kappa=\|V_{n,\gamma}\|_2\left\|V_{n,\gamma}^{-1}\right\|_2=\lambda_{\max}(V_{n,\gamma})\lambda_{\min}^{-1}(V_{n,\gamma})\le C_V/c_V<\infty,\quad\text{a.s.}$$

One can also show that $\kappa\ge C>1$, a.s. Combining the above with Lemma A.7 for $r=4$, i.e., with $v=(\kappa^2-1)^{1/16}(\kappa^2+1)^{-1/16}$, one gets $\|V_{n,\gamma}^{-1}\|_\infty\le2v^{-8}c_V^{-1}N(1-v)^{-1}=CN$, a.s., which is part one of (A.13). Part two of (A.13) can be proved similarly. ∎

A.2. Proof of Proposition 2

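We only illustrate the first element of the vector $\nabla\{\hat g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\}$ and of the matrix $\nabla^2\{\hat g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\}$, i.e., $\partial\{\hat g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\}/\partial\theta$ and $\partial^2\{\hat g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\}/\partial\theta^2$. The proofs for the other elements are similar.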

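LEMMA A.9. Under Assumptions (A2)–(A4) and (A6), as $n\to\infty$,

$$\sup_\gamma\left\|\left(\tilde g_\gamma-g_\gamma\right)^{(k)}\right\|_\infty\le C\left\|m^{(4)}\right\|_\infty h^{4-k},\quad\text{a.s.},\ 0\le k\le2.\qquad(A.16)$$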

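Proof. According to Huang (2003, Lemma 5.1), there exists an absolute constant $C>0$ such that

$$\sup_\gamma\left\|\tilde g_\gamma-g_\gamma\right\|_\infty\le C\sup_\gamma\inf_{\phi\in G^{(2)}}\left\|\phi-g_\gamma\right\|_\infty\le C\left\|m^{(4)}\right\|_\infty h^4,\quad\text{a.s.},$$

which proves the case $k=0$. Applying Lemma A.2, one has for $0\le k\le2$

$$\sup_\gamma\left\|\left(Q_T(g_\gamma)-g_\gamma\right)^{(k)}\right\|_\infty\le C\sup_\gamma\left\|g_\gamma^{(4)}\right\|_\infty h^{4-k}\le C\left\|m^{(4)}\right\|_\infty h^{4-k},\quad\text{a.s.}\qquad(A.17)$$

By the triangle inequality, the case $k=0$ and (A.17) give $\sup_\gamma\|Q_T(g_\gamma)-\tilde g_\gamma\|_\infty\le C\|m^{(4)}\|_\infty h^4$ a.s. Since $Q_T(g_\gamma)-\tilde g_\gamma$ is a cubic spline on equally spaced knots, the inverse inequality $\|s^{(k)}\|_\infty\le Ch^{-k}\|s\|_\infty$ for such splines then entails that

$$\sup_\gamma\left\|\left(Q_T(g_\gamma)-\tilde g_\gamma\right)^{(k)}\right\|_\infty\le C\left\|m^{(4)}\right\|_\infty h^{4-k},\quad\text{a.s.},\ 0\le k\le2.\qquad(A.18)$$

The lemma is proved by combining (A.17) and (A.18). ∎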
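The $h^{4}$ rate in Lemma A.9 can be seen empirically. The sketch below is an assumption-laden stand-in rather than the paper's construction: it fits a cubic spline by least squares to a hypothetical smooth target $g(u)=\sin(2\pi u)$ on a dense equally spaced design (mimicking $\tilde g_\gamma$ with $\varphi_\gamma\equiv1$), and it does not implement the quasi-interpolant $Q_T$. Halving $h$ should reduce the sup-norm error by a factor of roughly $2^4=16$.

```python
import numpy as np


def cubic_bspline_basis(u, N):
    """Matrix with columns B_{j,4}(u), j = -3, ..., N, on the knots t_i = i/(N+1)."""
    h = 1.0 / (N + 1)
    u = np.asarray(u, dtype=float)[:, None]
    js = np.arange(-3, N + 4)
    B = ((js * h <= u) & (u < (js + 1) * h)).astype(float)
    for m in range(2, 5):                          # Cox-de Boor recursion
        j = js[: B.shape[1] - 1]
        B = ((u - j * h) * B[:, :-1] + ((j + m) * h - u) * B[:, 1:]) / ((m - 1) * h)
    return B


def ls_spline_sup_error(g, N, n_fit=4000, n_eval=10007):
    """Sup-norm error on [0, 1) of the least-squares cubic spline fit of g."""
    u_fit = (np.arange(n_fit) + 0.5) / n_fit       # equally spaced design points
    B = cubic_bspline_basis(u_fit, N)
    coef, *_ = np.linalg.lstsq(B, g(u_fit), rcond=None)
    u_eval = np.linspace(0.0, 1.0, n_eval, endpoint=False)
    return np.max(np.abs(cubic_bspline_basis(u_eval, N) @ coef - g(u_eval)))


if __name__ == "__main__":
    g = lambda u: np.sin(2 * np.pi * u)            # hypothetical smooth target
    for N in (3, 7, 15, 31):                       # h = 1/4, 1/8, 1/16, 1/32
        print(N, ls_spline_sup_error(g, N))
```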

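Denote $B_\gamma=\{B_{j,4}(U_{\gamma,t})\}_{t=n'+1,\,j=-3}^{n,\,N}$ and

$$P_\gamma=B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}\qquad(A.19)$$

as the projection matrix onto the cubic spline space $G^{(2)}$, and let $\dot B_\gamma=\{\partial B_{j,4}(U_{\gamma,t})/\partial\theta\}_{t=n'+1,\,j=-3}^{n,\,N}$ and $\dot P_\gamma=\partial P_\gamma/\partial\theta$.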

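LEMMA A.10. Under Assumption (A4), one has

$$\dot B_\gamma=\left[\left\{B_{j,3}(U_{\gamma,t})-B_{j+1,3}(U_{\gamma,t})\right\}f(X_{\gamma,t})h^{-1}\sum_{l=1}^\infty(l-1)\theta^{l-2}Y_{t-l}^2\left\{1+\eta I(Y_{t-l}<0)\right\}\right]_{t=n'+1,\,j=-3}^{n,\,N},\qquad(A.20)$$

$$\dot P_\gamma=(I-P_\gamma)\dot B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}+B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\dot B_\gamma^{\mathrm T}(I-P_\gamma).\qquad(A.21)$$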

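Proof. Property (ii) in Lemma A.3 implies that

$$\dot B_\gamma=\left\{\frac{\partial}{\partial\theta}B_{j,4}(U_{\gamma,t})\right\}_{t=n'+1,\,j=-3}^{n,\,N}=\left\{B_{j,4}'(U_{\gamma,t})\frac{\partial U_{\gamma,t}}{\partial\theta}\right\}_{t=n'+1,\,j=-3}^{n,\,N}$$

$$=\left[3\left\{\frac{B_{j,3}(U_{\gamma,t})}{t_{j+3}-t_j}-\frac{B_{j+1,3}(U_{\gamma,t})}{t_{j+4}-t_{j+1}}\right\}f(X_{\gamma,t})\sum_{l=1}^\infty(l-1)\theta^{l-2}Y_{t-l}^2\left\{1+\eta I(Y_{t-l}<0)\right\}\right]_{t=n'+1,\,j=-3}^{n,\,N},$$

which is (A.20) because $t_{j+3}-t_j=t_{j+4}-t_{j+1}=3h$ for equally spaced knots. Next, note that

$$\dot P_\gamma=\dot B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}+B_\gamma\frac{\partial}{\partial\theta}\left\{\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\right\}B_\gamma^{\mathrm T}+B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\dot B_\gamma^{\mathrm T}$$

and

$$\frac{\partial}{\partial\theta}\left\{\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\right\}=-\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\frac{\partial}{\partial\theta}\left(B_\gamma^{\mathrm T}B_\gamma\right)\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}=-\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\left(\dot B_\gamma^{\mathrm T}B_\gamma+B_\gamma^{\mathrm T}\dot B_\gamma\right)\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}.\qquad(A.22)$$

Hence

$$\dot P_\gamma=\dot B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}-B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\dot B_\gamma^{\mathrm T}B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}-B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}\dot B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}+B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\dot B_\gamma^{\mathrm T}$$

$$=(I-P_\gamma)\dot B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}+B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\dot B_\gamma^{\mathrm T}(I-P_\gamma).\qquad(A.23)$$

∎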
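Formula (A.21) holds for any differentiable family of full-column-rank design matrices, not only for B-spline designs, so it can be checked with a finite difference. The sketch below uses a hypothetical smooth family $B(\theta)=B_0+\sin(\theta)B_1$ with random $B_0,B_1$ (so $\dot B=\cos(\theta)B_1$) and compares a central difference of $P(\theta)=B(B^{\mathrm T}B)^{-1}B^{\mathrm T}$ with the right-hand side of (A.21). The function names are illustrative.

```python
import numpy as np


def projection(B):
    """Hat matrix P = B (B^T B)^{-1} B^T, as in (A.19)."""
    return B @ np.linalg.solve(B.T @ B, B.T)


def projection_derivative(B, B_dot):
    """Right-hand side of (A.21)."""
    I = np.eye(B.shape[0])
    P = projection(B)
    G_inv = np.linalg.inv(B.T @ B)
    return (I - P) @ B_dot @ G_inv @ B.T + B @ G_inv @ B_dot.T @ (I - P)


def max_discrepancy(theta=0.3, n=30, p=5, step=1e-5, seed=0):
    """Compare (A.21) with a central finite difference for B(theta) = B0 + sin(theta) B1."""
    rng = np.random.default_rng(seed)
    B0, B1 = rng.standard_normal((n, p)), rng.standard_normal((n, p))
    B_of = lambda th: B0 + np.sin(th) * B1         # hypothetical smooth family
    fd = (projection(B_of(theta + step)) - projection(B_of(theta - step))) / (2 * step)
    exact = projection_derivative(B_of(theta), np.cos(theta) * B1)
    return np.max(np.abs(fd - exact))


if __name__ == "__main__":
    print(max_discrepancy())
```

The discrepancy should be of the order of the finite-difference truncation error, far below the size of the entries of $\dot P$.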

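LEMMA A.11. Under Assumptions (A3), (A4), and (A6), as $n\to\infty$,

$$\sup_\gamma\left\|(n'')^{-1}B_\gamma^{\mathrm T}\right\|_\infty\le Ch,\qquad\sup_\gamma\left\|(n'')^{-1}\dot B_\gamma^{\mathrm T}\right\|_\infty\le C,\quad\text{a.s.},\qquad(A.24)$$

$$\sup_\gamma\left\|P_\gamma\right\|_\infty\le C,\qquad\sup_\gamma\left\|\dot P_\gamma\right\|_\infty\le Ch,\qquad\sup_\gamma\left\|\frac{\partial}{\partial\theta}\left\{\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\right\}\right\|_\infty=O(N),\quad\text{a.s.}\qquad(A.25)$$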

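Proof. For any vector $a\in\mathbb{R}^{n''}$, one has

$$\left\|(n'')^{-1}B_\gamma^{\mathrm T}a\right\|_\infty\le\|a\|_\infty\max_{-3\le j\le N}\left|(n'')^{-1}\sum_{t=n'+1}^nB_{j,4}(U_{\gamma,t})\right|\le Ch\|a\|_\infty,\quad\text{a.s.},$$

and, using (A.20), $\|(n'')^{-1}\dot B_\gamma^{\mathrm T}a\|_\infty$ is bounded with probability 1 by

$$\|a\|_\infty\max_{-3\le j\le N}\left|(n''h)^{-1}\sum_{t=n'+1}^n\left(B_{j,3}-B_{j+1,3}\right)(U_{\gamma,t})f(X_{\gamma,t})\sum_{l=1}^\infty(l-1)\theta^{l-2}Y_{t-l}^2\left\{1+\eta I(Y_{t-l}<0)\right\}\right|\le C\|a\|_\infty .$$

This proves (A.24). Then (A.25) follows from (A.19), (A.13), (A.24), (A.23), and (A.22); equations (A.20) and (A.21) are needed for proving the rest of the inequalities. ∎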

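LEMMA A.12. Under Assumptions (A2)–(A4) and (A6),

$$\sup_\gamma\left|\frac{\partial^k}{\partial\theta^k}\left\{\tilde g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\right\}\right|\le C\left\|m^{(4)}\right\|_\infty h^{4-k},\quad\text{a.s.},\ k=1,2.\qquad(A.26)$$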

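Proof. Since $Q_T(g_\gamma)$ is a cubic spline, $P_\gamma$ leaves its values at the points $U_{\gamma,t}$ unchanged, so by the definition of $\tilde g_\gamma$ one has

$$\frac{\partial}{\partial\theta}\left[\left\{Q_T(g_\gamma)-\tilde g_\gamma\right\}(U_{\gamma,t})\right]=\frac{\partial}{\partial\theta}P_\gamma\left[\left\{Q_T(g_\gamma)-g_\gamma\right\}(U_{\gamma,t})\right]=\dot P_\gamma\left[\left\{Q_T(g_\gamma)-g_\gamma\right\}(U_{\gamma,t})\right]+P_\gamma\frac{\partial}{\partial\theta}\left[\left\{Q_T(g_\gamma)-g_\gamma\right\}(U_{\gamma,t})\right],$$

$$\frac{\partial}{\partial\theta}\left[\left\{Q_T(g_\gamma)-g_\gamma\right\}(U_{\gamma,t})\right]=\left[\left\{Q_T\left(\frac{\partial}{\partial\theta}g_\gamma\right)-\frac{\partial}{\partial\theta}g_\gamma\right\}(U_{\gamma,t})\right]+\left[\left\{Q_T(g_\gamma)-g_\gamma\right\}^{(1)}(U_{\gamma,t})\right]f(X_{\gamma,t})\sum_{l=1}^\infty(l-1)\theta^{l-2}Y_{t-l}^2\left\{1+\eta I(Y_{t-l}<0)\right\},$$

which yield (A.26) for $k=1$ by (A.17), (A.25), and Lemma A.2. The proof for $k=2$ is similar. ∎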

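LEMMA A.13. Under Assumptions (A2)–(A4) and (A6), as $n\to\infty$, one has with probability 1

$$\sup_\gamma\left\|\frac{B_\gamma^{\mathrm T}E}{n''}\right\|_\infty=O\left(\frac{\log n}{\sqrt{nN}}\right),\qquad\sup_\gamma\left\|\frac{B_\gamma^{\mathrm T}E_\gamma}{n''}\right\|_\infty=O\left(\frac{\log n}{\sqrt{nN}}\right),\qquad(A.27)$$

$$\sup_\gamma\left\|\frac{\partial}{\partial\theta}\left(\frac{B_\gamma^{\mathrm T}E}{n''}\right)\right\|_\infty=O\left(\frac{\log n}{\sqrt{nh}}\right),\qquad\sup_\gamma\left\|\frac{\partial}{\partial\theta}\left(\frac{B_\gamma^{\mathrm T}E_\gamma}{n''}\right)\right\|_\infty=O\left(\frac{\log n}{\sqrt{nh}}\right),\qquad(A.28)$$

$$\sup_\gamma\left\|\frac{\dot B_\gamma^{\mathrm T}E}{n''}\right\|_\infty=O\left(\frac{\log n}{\sqrt{nh}}\right),\qquad\sup_\gamma\left\|\frac{\dot B_\gamma^{\mathrm T}E_\gamma}{n''}\right\|_\infty=O\left(\frac{\log n}{\sqrt{nh}}\right).\qquad(A.29)$$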

Proof. We prove only the first equation in (A.27) and the second equation in (A.28); the other equations can be proved similarly. One has

$$\frac{B_\gamma^{\mathrm T}E}{n''}=\left[(n'')^{-1}\sum_{t=n'+1}^nB_{j,4}(U_{\gamma,t})g(U_t)(\xi_t^2-1)\right]_{j=-3}^N .$$

Denote $Z_t=g(U_t)(\xi_t^2-1)=Z_{t,1}^{D_n}+Z_{t,2}^{D_n}+Z_{t,3}^{D_n}$, where $D_n=n^\eta$ $(1/3<\eta<2/5)$ and

$$Z_{t,1}^{D_n}=Z_tI\{|Z_t|>D_n\},\qquad Z_{t,2}^{D_n}=Z_tI\{|Z_t|\le D_n\}-Z_{t,3}^{D_n},\qquad Z_{t,3}^{D_n}=E\left[Z_tI\{|Z_t|\le D_n\}\right].$$

Note that the B-spline basis is bounded and $Eg(U_t)(\xi_t^2-1)=0$, so

$$\left|Z_{t,3}^{D_n}\right|=\left|E\left[Z_tI\{|Z_t|>D_n\}\right]\right|\le E|Z_t|^3D_n^{-2}.$$

Then

$$\sup_\gamma\left|(n'')^{-1}\sum_{t=n'+1}^nB_{j,4}(U_{\gamma,t})Z_{t,3}^{D_n}\right|=O\left(D_n^{-2}\right)=o\left(n^{-2/3}\right).$$

One has $\sum_{n=n'+1}^\infty P\{|g(U_{n-1})(\xi_n^2-1)|>D_n\}\le\sum_{n=n'+1}^\infty D_n^{-3}<\infty$ by the assumption $E(\xi_t^6)=m_6<+\infty$, and the Borel–Cantelli lemma implies that the tail part satisfies

$$\sup_\gamma\left|(n'')^{-1}\sum_{t=n'+1}^nB_{j,4}(U_{\gamma,t})Z_{t,1}^{D_n}\right|=O\left(n^{-k}\right)\quad\text{for any }k>0 .$$

For the truncated part, similarly to the proof of Lemma A.5, using Lemma A.1 and the discretization technique in Fan and Yao (2003, p. 266), one has

$$\sup_\gamma\left|(n'')^{-1}\sum_{t=n'+1}^nB_{j,4}(U_{\gamma,t})Z_{t,2}^{D_n}\right|=O\left(\log n/\sqrt{Nn}\right).$$

Therefore the first equation in (A.27) is established with probability 1. To prove the second equation of (A.28), notice that

$$\frac{B_\gamma^{\mathrm T}E_\gamma}{n''}=\left[(n'')^{-1}\sum_{t=n'+1}^nB_{j,4}(U_{\gamma,t})\left\{g(U_t)-g_\gamma(U_{\gamma,t})\right\}\right]_{j=-3}^N,$$

$$\frac{\partial}{\partial\theta}\left(\frac{B_\gamma^{\mathrm T}E_\gamma}{n''}\right)=\left[(n'')^{-1}\sum_{t=n'+1}^n\frac{\partial}{\partial\theta}\left[B_{j,4}(U_{\gamma,t})\left\{g(U_t)-g_\gamma(U_{\gamma,t})\right\}\right]\right]_{j=-3}^N .$$

Since $E[B_{j,4}(U_{\gamma,t})\{g(U_t)-g_\gamma(U_{\gamma,t})\}]=0$ for $-3\le j\le N$ and every $\gamma$, differentiating with respect to $\theta$ gives

$$E\left\{\frac{\partial}{\partial\theta}\left[B_{j,4}(U_{\gamma,t})\left\{g(U_t)-g_\gamma(U_{\gamma,t})\right\}\right]\right\}=0,\qquad-3\le j\le N,$$

which allows one to apply Lemma A.1 to obtain that, with probability 1,

$$\sup_\gamma\left\|\frac{\partial}{\partial\theta}\left(\frac{B_\gamma^{\mathrm T}E_\gamma}{n''}\right)\right\|_\infty=O\left(\log n/\sqrt{nh}\right).$$

∎
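The truncation device in the proof of Lemma A.13 splits $Z_t$ into a rare tail part $Z_{t,1}^{D_n}$, a bounded centred part $Z_{t,2}^{D_n}$ to which Lemma A.1 applies, and a small deterministic part $Z_{t,3}^{D_n}$. The sketch below shows the split on simulated data. It replaces the expectation defining $Z_{t,3}^{D_n}$ by a sample mean, and it uses a hypothetical link value $g(U_t)=1+U_t^2$ with standard normal $\xi_t$; both are illustrative assumptions. The function names are hypothetical.

```python
import numpy as np


def truncation_split(Z, D):
    """Split Z_t into Z_{t,1} (tail), Z_{t,2} (centred truncated part) and Z_{t,3}
    (mean of the truncated part), as in the proof of Lemma A.13.

    The expectation defining Z_{t,3} is replaced by a sample mean here.
    """
    Z = np.asarray(Z, dtype=float)
    inside = np.abs(Z) <= D
    Z1 = np.where(inside, 0.0, Z)                  # Z_t I{|Z_t| > D_n}
    truncated = np.where(inside, Z, 0.0)           # Z_t I{|Z_t| <= D_n}
    Z3 = truncated.mean()                          # stand-in for E[Z_t I{|Z_t| <= D_n}]
    Z2 = truncated - Z3                            # Z_t I{|Z_t| <= D_n} - Z_{t,3}
    return Z1, Z2, Z3


def simulate_Z(n, seed=0):
    """Z_t = g(U_t)(xi_t^2 - 1) with a hypothetical g and standard normal xi_t."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(0.0, 1.0, size=n)
    g = 1.0 + U ** 2                               # hypothetical positive link values
    return g * (rng.standard_normal(n) ** 2 - 1)


if __name__ == "__main__":
    n, eta = 10_000, 0.35                          # D_n = n^eta with 1/3 < eta < 2/5
    Z = simulate_Z(n)
    Z1, Z2, Z3 = truncation_split(Z, n ** eta)
    print(np.count_nonzero(Z1), Z2.mean(), Z3)
```

Only a handful of observations exceed $D_n=n^{0.35}\approx25$ here, which is the finite-sample face of the Borel–Cantelli argument for the tail part.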

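LEMMA A.14. Under Assumptions (A2)–(A4) and (A6), as $n\to\infty$,

$$\sup_\gamma\sup_{u\in[0,1]}\left|\hat\varepsilon_\gamma(u)\right|=O\left(\log n/\sqrt{nh}\right),\quad\text{a.s.},\qquad(A.30)$$

$$\sup_\gamma\sup_{u\in[0,1]}\left|\tilde\varepsilon_\gamma(u)\right|=O\left(\log n/\sqrt{nh}\right),\quad\text{a.s.}\qquad(A.31)$$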

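Proof. We only prove (A.30); the proof of (A.31) is similar. Denote $a=(a_{-3},\ldots,a_N)^{\mathrm T}=(B_\gamma^{\mathrm T}B_\gamma)^{-1}B_\gamma^{\mathrm T}E=V_{n,\gamma}^{-1}\{(n'')^{-1}B_\gamma^{\mathrm T}E\}$; then $\hat\varepsilon_\gamma(u)=\sum_{j=-3}^Na_jB_{j,4}(u)$. By the partition of unity in Lemma A.3 (i),

$$\sup_\gamma\sup_{u\in[0,1]}\left|\hat\varepsilon_\gamma(u)\right|\le\sup_\gamma\|a\|_\infty=\sup_\gamma\left\|V_{n,\gamma}^{-1}\left\{(n'')^{-1}B_\gamma^{\mathrm T}E\right\}\right\|_\infty\le CN\sup_\gamma\left\|(n'')^{-1}B_\gamma^{\mathrm T}E\right\|_\infty,\quad\text{a.s.},$$

where the last inequality follows from Lemma A.8. By Lemma A.13, the right-hand side is $O(N\log n/\sqrt{nN})=O(\log n/\sqrt{nh})$, a.s. ∎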

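LEMMA A.15. Under Assumptions (A2)–(A4) and (A6), as $n\to\infty$,

$$\sup_\gamma\max_{n'+1\le t\le n}\left|\frac{\partial}{\partial\theta}\hat\varepsilon_\gamma(U_{\gamma,t})\right|=O\left(n^{-1/2}N^{3/2}\log n\right),\quad\text{a.s.},\qquad(A.32)$$

$$\sup_\gamma\max_{n'+1\le t\le n}\left|\frac{\partial}{\partial\theta}\tilde\varepsilon_\gamma(U_{\gamma,t})\right|=O\left(n^{-1/2}N^{3/2}\log n\right),\quad\text{a.s.},\qquad(A.33)$$

$$\sup_\gamma\max_{n'+1\le t\le n}\left|\frac{\partial^2}{\partial\theta^2}\hat\varepsilon_\gamma(U_{\gamma,t})\right|=O\left(n^{-1/2}N^{5/2}\log n\right),\quad\text{a.s.},\qquad(A.34)$$

$$\sup_\gamma\max_{n'+1\le t\le n}\left|\frac{\partial^2}{\partial\theta^2}\tilde\varepsilon_\gamma(U_{\gamma,t})\right|=O\left(n^{-1/2}N^{5/2}\log n\right),\quad\text{a.s.}\qquad(A.35)$$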


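Proof. We only prove (A.32) and (A.33); the proofs of (A.34) and (A.35) are similar. Since $\{\hat\varepsilon_\gamma(U_{\gamma,t})\}_{t=n'+1}^n=P_\gamma E$ and $E$ does not depend on $\gamma$, (A.21) gives

$$\left\{\frac{\partial}{\partial\theta}\hat\varepsilon_\gamma(U_{\gamma,t})\right\}_{t=n'+1}^n=(I-P_\gamma)\dot B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}E+B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\dot B_\gamma^{\mathrm T}(I-P_\gamma)E$$

$$=(I-P_\gamma)\dot B_\gamma\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\frac{B_\gamma^{\mathrm T}E}{n''}+B_\gamma\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\frac{\dot B_\gamma^{\mathrm T}(I-P_\gamma)E}{n''}.$$

According to (A.13), (A.24), (A.25), and (A.27), one has (A.32). To prove (A.33), note that

$$\left\{\frac{\partial}{\partial\theta}\tilde\varepsilon_\gamma(U_{\gamma,t})\right\}_{t=n'+1}^n=(I-P_\gamma)\dot B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}E_\gamma+B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\dot B_\gamma^{\mathrm T}(I-P_\gamma)E_\gamma+B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}\frac{\partial}{\partial\theta}E_\gamma$$

$$=(I-P_\gamma)\dot B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}E_\gamma-B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\dot B_\gamma^{\mathrm T}P_\gamma E_\gamma+B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\dot B_\gamma^{\mathrm T}E_\gamma+B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}\frac{\partial}{\partial\theta}E_\gamma=T_1+T_2,$$

where

$$T_1=\left\{(I-P_\gamma)\dot B_\gamma-B_\gamma\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}\dot B_\gamma^{\mathrm T}B_\gamma\right\}\left(B_\gamma^{\mathrm T}B_\gamma\right)^{-1}B_\gamma^{\mathrm T}E_\gamma=\left\{(I-P_\gamma)\dot B_\gamma-B_\gamma\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\frac{\dot B_\gamma^{\mathrm T}B_\gamma}{n''}\right\}\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\frac{B_\gamma^{\mathrm T}E_\gamma}{n''},$$

$$T_2=B_\gamma\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\frac{\partial}{\partial\theta}\left(\frac{B_\gamma^{\mathrm T}E_\gamma}{n''}\right).$$

By (A.13), (A.24), (A.25), (A.27), and (A.28), one has $\sup_\gamma\|T_1\|_\infty=O(n^{-1/2}N^{3/2}\log n)$ and $\sup_\gamma\|T_2\|_\infty=O(n^{-1/2}N^{3/2}\log n)$, a.s., which leads to (A.33). ∎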

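Proof of Proposition 2. According to (A.6), (A.8) follows from (A.16), (A.30), and (A.31). Similarly, one has

$$\frac{\partial}{\partial\theta}\left\{\hat g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\right\}=\frac{\partial}{\partial\theta}\left\{\tilde g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\right\}+\frac{\partial}{\partial\theta}\tilde\varepsilon_\gamma(U_{\gamma,t})+\frac{\partial}{\partial\theta}\hat\varepsilon_\gamma(U_{\gamma,t}).$$

Thus (A.9) follows from (A.26), (A.32), and (A.33). The proof of (A.10) is similar. ∎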

A.3. Proof of Proposition 1

LEMMA A.16. Under Assumptions (A1)–(A6), as $n\to\infty$, $\sup_\gamma\left|\hat R(\gamma)-R(\gamma)\right|=o(1)$, a.s.


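Proof. One has

$$\hat R(\gamma)=\frac{1}{n''}\sum_{t=n'+1}^n\left\{Y_t^2-\hat g_\gamma(U_{\gamma,t})\right\}^2=\frac{1}{n''}\sum_{t=n'+1}^n\left\{g(U_t)+g(U_t)(\xi_t^2-1)-g_\gamma(U_{\gamma,t})+g_\gamma(U_{\gamma,t})-\hat g_\gamma(U_{\gamma,t})\right\}^2$$

$$=\frac{1}{n''}\sum_{t=n'+1}^n\left\{g_\gamma(U_{\gamma,t})-\hat g_\gamma(U_{\gamma,t})\right\}^2+\frac{1}{n''}\sum_{t=n'+1}^n\left\{g(U_t)-g_\gamma(U_{\gamma,t})\right\}^2+\frac{2}{n''}\sum_{t=n'+1}^n\left\{g(U_t)-g_\gamma(U_{\gamma,t})\right\}\left\{g(U_t)(\xi_t^2-1)\right\}$$

$$+\frac{1}{n''}\sum_{t=n'+1}^n\left\{g(U_t)(\xi_t^2-1)\right\}^2+\frac{2}{n''}\sum_{t=n'+1}^n\left\{g_\gamma(U_{\gamma,t})-\hat g_\gamma(U_{\gamma,t})\right\}\left\{g(U_t)-g_\gamma(U_{\gamma,t})+g(U_t)(\xi_t^2-1)\right\},$$

$$R(\gamma)=E\left\{Y_t^2-g_\gamma(U_{\gamma,t})\right\}^2=E\left\{g(U_t)+g(U_t)(\xi_t^2-1)-g_\gamma(U_{\gamma,t})\right\}^2=E\left\{g(U_t)-g_\gamma(U_{\gamma,t})\right\}^2+E\left\{g(U_t)(\xi_t^2-1)\right\}^2 .$$

Hence

$$\sup_\gamma\left|\hat R(\gamma)-R(\gamma)\right|\le I_1+I_2+I_3+I_4,$$

where

$$I_1=\sup_\gamma\left|\frac{1}{n''}\sum_{t=n'+1}^n\left\{g_\gamma(U_{\gamma,t})-\hat g_\gamma(U_{\gamma,t})\right\}^2\right|,$$

$$I_2=\sup_\gamma\left|\frac{2}{n''}\sum_{t=n'+1}^n\left\{g_\gamma(U_{\gamma,t})-\hat g_\gamma(U_{\gamma,t})\right\}\left\{g(U_t)-g_\gamma(U_{\gamma,t})+g(U_t)(\xi_t^2-1)\right\}\right|,$$

$$I_3=\sup_\gamma\left|\frac{1}{n''}\sum_{t=n'+1}^n\left\{g(U_t)-g_\gamma(U_{\gamma,t})\right\}^2-E\left\{g(U_t)-g_\gamma(U_{\gamma,t})\right\}^2\right|,$$

$$I_4=\sup_\gamma\left\{\left|\frac{1}{n''}\sum_{t=n'+1}^n\left\{g(U_t)(\xi_t^2-1)\right\}^2-(m_4-1)Eg^2(U_t)\right|+\left|\frac{2}{n''}\sum_{t=n'+1}^n\left\{g(U_t)-g_\gamma(U_{\gamma,t})\right\}\left\{g(U_t)(\xi_t^2-1)\right\}\right|\right\}.$$

According to Lemma A.1, $I_3+I_4=o(1)$, a.s., and (A.8) entails that $I_1=O\{((nh)^{-1/2}\log n)^2+(h^4)^2\}$, a.s. One also has

$$I_2\le O\left\{(nh)^{-1/2}\log n+h^4\right\}\sup_\gamma\frac{2}{n''}\sum_{t=n'+1}^n\left|g(U_t)-g_\gamma(U_{\gamma,t})+g(U_t)(\xi_t^2-1)\right|,$$

which is $O\{(nh)^{-1/2}\log n+h^4\}$, a.s. Combining the bounds on $I_1$, $I_2$, $I_3$, and $I_4$ proves the lemma. ∎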
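For concreteness, the profile criterion $\hat R(\gamma)$ analysed in Lemma A.16 can be computed in a few lines once $U_{\gamma,t}$ is available. The sketch below rests on assumptions that go beyond this appendix and are labelled as such: $X_{\gamma,t}=\sum_{l}\theta^{l-1}Y_{t-l}^2\{1+\eta I(Y_{t-l}<0)\}$ (the form whose $\theta$-derivative appears in (A.20)) with the infinite sum truncated at $n'$ lags, and $U_{\gamma,t}=F(X_{\gamma,t})$ for a hypothetical distribution function $F(x)=x/(1+x)$. The data are a GARCH(1,1) path generated recursively from (1) with illustrative parameter values. None of the names are from the paper.

```python
import numpy as np


def cubic_bspline_basis(u, N):
    """Matrix with columns B_{j,4}(u), j = -3, ..., N, on the knots t_i = i/(N+1)."""
    h = 1.0 / (N + 1)
    u = np.asarray(u, dtype=float)[:, None]
    js = np.arange(-3, N + 4)
    B = ((js * h <= u) & (u < (js + 1) * h)).astype(float)
    for m in range(2, 5):                          # Cox-de Boor recursion
        j = js[: B.shape[1] - 1]
        B = ((u - j * h) * B[:, :-1] + ((j + m) * h - u) * B[:, 1:]) / ((m - 1) * h)
    return B


def simulate_garch11(n, w=0.1, beta=0.1, theta=0.8, seed=0):
    """GARCH(1,1) path via sigma_t^2 = w(1 - theta) + beta Y_{t-1}^2 + theta sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    y = np.empty(n)
    s2 = w * (1 - theta) / (1 - theta - beta)      # stationary mean of sigma_t^2
    for t in range(n):
        y[t] = np.sqrt(s2) * rng.standard_normal()
        s2 = w * (1 - theta) + beta * y[t] ** 2 + theta * s2
    return y


def profile_criterion(y, theta, eta, N, n_lag, F=lambda x: x / (1.0 + x)):
    """Sketch of R_hat(gamma) = (1/n'') sum_{t=n'+1}^{n} {Y_t^2 - g_hat_gamma(U_{gamma,t})}^2,
    with n' = n_lag and the hypothetical U_{gamma,t} = F(X_{gamma,t}) described above."""
    y = np.asarray(y, dtype=float)
    lags = np.arange(1, n_lag + 1)
    idx = np.arange(n_lag, len(y))[:, None] - lags[None, :]  # rows t = n'+1..n, cols t - l
    past = y[idx]
    X = np.sum(theta ** (lags - 1) * past ** 2 * (1.0 + eta * (past < 0)), axis=1)
    B = cubic_bspline_basis(F(X), N)
    target = y[n_lag:] ** 2                                   # Y_t^2, t = n'+1..n
    coef, *_ = np.linalg.lstsq(B, target, rcond=None)         # cubic spline fit g_hat_gamma
    return np.mean((target - B @ coef) ** 2)


if __name__ == "__main__":
    y = simulate_garch11(3000)
    for th in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(th, profile_criterion(y, theta=th, eta=0.0, N=5, n_lag=50))
```

Because the cubic B-splines sum to one, constants lie in the spline space, so $\hat R(\gamma)$ can never exceed the sample variance of $Y_t^2$. Minimising it over a grid of $\gamma$ values is one simple way to approximate the estimator studied in Theorems 1 and 2.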

LEMMA A.17. Under Assumptions (A1)–(A6), as $n\to\infty$, one has for $k=1,2$

$$\sup_\gamma\left|\nabla^{(k)}\hat R(\gamma)-\nabla^{(k)}R(\gamma)\right|=O\left(n^{-1/2}h^{-1/2-k}\log n+h^{4-k}\right),\quad\text{a.s.}\qquad(A.36)$$

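Proof. We only show the proof for $\partial\hat R(\gamma)/\partial\theta-\partial R(\gamma)/\partial\theta$; the proof for the other elements is similar. Since

$$\frac{1}{2}\frac{\partial}{\partial\theta}\hat R(\gamma)=\frac{1}{n''}\sum_{t=n'+1}^n\left\{\hat g_\gamma(U_{\gamma,t})-Y_t^2\right\}\frac{\partial}{\partial\theta}\hat g_\gamma(U_{\gamma,t}),\qquad\frac{1}{2}\frac{\partial}{\partial\theta}R(\gamma)=E\left[\left\{g_\gamma(U_{\gamma,t})-Y_t^2\right\}\frac{\partial}{\partial\theta}g_\gamma(U_{\gamma,t})\right],$$

one has

$$\frac{1}{2}\frac{\partial}{\partial\theta}\left\{\hat R(\gamma)-R(\gamma)\right\}=\frac{1}{n''}\sum_{t=n'+1}^n\xi_{\gamma,t}+J_{\gamma,1}+J_{\gamma,2}+J_{\gamma,3},$$

where $\xi_{\gamma,t}$ is the first element of the vector defined in (A.38), with $E\xi_{\gamma,t}=0$, and

$$J_{\gamma,1}=\frac{1}{n''}\sum_{t=n'+1}^n\left\{\hat g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\right\}\frac{\partial}{\partial\theta}\left(\hat g_\gamma-g_\gamma\right)(U_{\gamma,t}),$$

$$J_{\gamma,2}=\frac{1}{n''}\sum_{t=n'+1}^n\left\{g_\gamma(U_{\gamma,t})-Y_t^2\right\}\frac{\partial}{\partial\theta}\left(\hat g_\gamma-g_\gamma\right)(U_{\gamma,t}),$$

$$J_{\gamma,3}=\frac{1}{n''}\sum_{t=n'+1}^n\left\{\hat g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\right\}\frac{\partial}{\partial\theta}g_\gamma(U_{\gamma,t}).$$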

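By Lemma A.1, $\sup_\gamma|(n'')^{-1}\sum_{t=n'+1}^n\xi_{\gamma,t}|=O(n^{-1/2}\log n)$ a.s. Meanwhile, (A.8) and (A.9) imply that $\sup_\gamma|J_{\gamma,1}|=O(n^{-1}h^{-2}\log^2n+h^7)$ a.s. Since $\hat g_\gamma(U_{\gamma,t})=\tilde g_\gamma(U_{\gamma,t})+\{P_\gamma(E_\gamma+E)\}_t$ and $g_\gamma(U_{\gamma,t})-Y_t^2=-(E_\gamma+E)_t$, note that

$$J_{\gamma,2}=\frac{1}{n''}\sum_{t=n'+1}^n\left\{g_\gamma(U_{\gamma,t})-Y_t^2\right\}\frac{\partial}{\partial\theta}\left(\tilde g_\gamma-g_\gamma\right)(U_{\gamma,t})-\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}\frac{\partial}{\partial\theta}\left\{P_\gamma\left(E_\gamma+E\right)\right\}.$$

One has

$$\sup_\gamma\left|\frac{1}{n''}\sum_{t=n'+1}^n\left\{g_\gamma(U_{\gamma,t})-Y_t^2\right\}\frac{\partial}{\partial\theta}\left(\tilde g_\gamma-g_\gamma\right)(U_{\gamma,t})\right|=O\left(h^3\right)\quad\text{a.s.}$$

according to (A.26). Next,

$$\left|\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}\frac{\partial}{\partial\theta}\left\{P_\gamma\left(E_\gamma+E\right)\right\}\right|=\left|\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}\frac{\partial}{\partial\theta}\left\{B_\gamma\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\frac{B_\gamma^{\mathrm T}}{n''}\left(E_\gamma+E\right)\right\}\right|$$

$$\le\left|\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}\dot B_\gamma\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\frac{B_\gamma^{\mathrm T}}{n''}\left(E_\gamma+E\right)\right|+\left|\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}B_\gamma\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\frac{\partial}{\partial\theta}\frac{B_\gamma^{\mathrm T}}{n''}\left(E_\gamma+E\right)\right|$$

$$+\left|\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}B_\gamma\frac{\partial}{\partial\theta}\left\{\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\right\}\frac{B_\gamma^{\mathrm T}}{n''}\left(E_\gamma+E\right)\right|.$$

Thus

$$\sup_\gamma\left|\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}\frac{\partial}{\partial\theta}\left\{P_\gamma\left(E_\gamma+E\right)\right\}\right|\le O(N)\sup_\gamma\left\{\left\|\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}\dot B_\gamma\right\|_\infty\left\|\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\right\|_\infty\left\|\frac{B_\gamma^{\mathrm T}}{n''}\left(E_\gamma+E\right)\right\|_\infty\right\}$$

$$+O(N)\sup_\gamma\left\{\left\|\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}B_\gamma\right\|_\infty\left\|\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\right\|_\infty\left\|\frac{\partial}{\partial\theta}\frac{B_\gamma^{\mathrm T}}{n''}\left(E_\gamma+E\right)\right\|_\infty\right\}$$

$$+O(N)\sup_\gamma\left\{\left\|\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}B_\gamma\right\|_\infty\left\|\frac{\partial}{\partial\theta}\left\{\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\right\}\right\|_\infty\left\|\frac{B_\gamma^{\mathrm T}}{n''}\left(E_\gamma+E\right)\right\|_\infty\right\}$$

$$=O(N)\times O\left(\log n/\sqrt{nh}\right)\times O(N)\times O\left(\log n/\sqrt{nN}\right)=O\left(n^{-1}N^2\log^2n\right)\quad\text{a.s.}$$

according to (A.27), (A.28), (A.29), (A.13), and (A.25). So $\sup_\gamma|J_{\gamma,2}|=O(n^{-1}N^2\log^2n+h^3)$, a.s. Similarly, one can write

$$J_{\gamma,3}=\frac{1}{n''}\sum_{t=n'+1}^n\left\{\tilde g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\right\}\frac{\partial}{\partial\theta}g_\gamma(U_{\gamma,t})+\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}B_\gamma\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\frac{B_\gamma^{\mathrm T}}{n''}\frac{\partial}{\partial\theta}g_\gamma,$$

and has

$$\sup_\gamma\left|\frac{1}{n''}\sum_{t=n'+1}^n\left\{\tilde g_\gamma(U_{\gamma,t})-g_\gamma(U_{\gamma,t})\right\}\frac{\partial}{\partial\theta}g_\gamma(U_{\gamma,t})\right|=O\left(h^4\right)\quad\text{a.s.},$$

$$\sup_\gamma\left|\frac{1}{n''}\left(E_\gamma+E\right)^{\mathrm T}B_\gamma\left(\frac{B_\gamma^{\mathrm T}B_\gamma}{n''}\right)^{-1}\frac{B_\gamma^{\mathrm T}}{n''}\frac{\partial}{\partial\theta}g_\gamma\right|=O\left\{\frac{\log n}{\sqrt{nN}}\times N\times h\right\}=O\left\{\frac{\log n}{\sqrt{nN}}\right\}\quad\text{a.s.}$$

Thus (A.36) is proved for $k=1$. One can prove that for the term $\xi_{\gamma,t}$ defined in (A.38), with probability 1,

$$\sup_\gamma\left|\frac{1}{2}\frac{\partial}{\partial\theta}\left\{\hat R(\gamma)-R(\gamma)\right\}-\frac{1}{n''}\sum_{t=n'+1}^n\xi_{\gamma,t}\right|=o\left(n^{-1/2}\right).\qquad(A.37)$$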

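The proof of (A.36) for $k=2$ follows from (A.8), (A.9), and (A.10), since

$$\frac{1}{2}\nabla^2\hat R(\gamma)=\frac{1}{n''}\sum_{t=n'+1}^n\left[\left\{\hat g_\gamma(U_{\gamma,t})-Y_t^2\right\}\nabla^2\hat g_\gamma(U_{\gamma,t})+\nabla\hat g_\gamma(U_{\gamma,t})\left\{\nabla\hat g_\gamma(U_{\gamma,t})\right\}^{\mathrm T}\right],$$

$$\frac{1}{2}\nabla^2R(\gamma)=E\left[\left\{g_\gamma(U_{\gamma,t})-Y_t^2\right\}\nabla^2g_\gamma(U_{\gamma,t})+\nabla g_\gamma(U_{\gamma,t})\left\{\nabla g_\gamma(U_{\gamma,t})\right\}^{\mathrm T}\right].$$

∎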

Proof of Proposition 1. It follows from Lemmas A.16 and A.17. ∎

Proof of Theorem 1. According to Proposition 1, $\sup_\gamma|\hat R(\gamma)-R(\gamma)|\to0$, a.s. Thus there exists an integer $n_0$ such that $\hat R(\gamma_0)-R(\gamma_0)<\delta/2$ when $n>n_0$. Notice that $\hat\gamma$ is the minimizer of $\hat R(\gamma)$, so $\hat R(\hat\gamma)-R(\gamma_0)\le\hat R(\gamma_0)-R(\gamma_0)<\delta/2$. There also exists an integer $n_1$ such that $R(\hat\gamma)-\hat R(\hat\gamma)<\delta/2$ when $n>n_1$. Thus, when $n>\max(n_0,n_1)$,

$$R(\hat\gamma)-R(\gamma_0)=R(\hat\gamma)-\hat R(\hat\gamma)+\hat R(\hat\gamma)-R(\gamma_0)<\delta .$$

According to Assumption (A5), $R(\gamma)$ is locally convex at $\gamma_0$, so for any $\varepsilon>0$, if $R(\hat\gamma)-R(\gamma_0)<\delta$, then $|\hat\gamma-\gamma_0|<\varepsilon$ for $n$ large enough, which proves the theorem. ∎

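Proof of Theorem 2. Denote $S(\gamma)=\nabla\hat R(\gamma)$ and

$$\xi_{\gamma,t}=\left\{g_\gamma(U_{\gamma,t})-Y_t^2\right\}\nabla g_\gamma(U_{\gamma,t})-E\left[\left\{g_\gamma(U_{\gamma,t})-Y_t^2\right\}\nabla g_\gamma(U_{\gamma,t})\right].\qquad(A.38)$$

Then, because $\nabla R(\gamma_0)=0$, one has

$$\left|S(\gamma_0)-\frac{2}{n''}\sum_{t=n'+1}^n\xi_{\gamma_0,t}\right|=o\left(n^{-1/2}\right),\quad\text{a.s.},\qquad(A.39)$$

according to (A.37). For some $\gamma_1,\gamma_2$ between $\hat\gamma$ and $\gamma_0$,

$$S(\hat\gamma)-S(\gamma_0)=\begin{pmatrix}(\partial^2/\partial\theta^2)\hat R(\gamma_1)&(\partial^2/\partial\theta\partial\eta)\hat R(\gamma_1)\\(\partial^2/\partial\theta\partial\eta)\hat R(\gamma_2)&(\partial^2/\partial\eta^2)\hat R(\gamma_2)\end{pmatrix}(\hat\gamma-\gamma_0)=A(\hat\gamma-\gamma_0),$$

and $S(\hat\gamma)=0$ because $\hat R(\gamma)$ attains its minimum at $\hat\gamma$. Thus

$$\hat\gamma-\gamma_0=-A^{-1}S(\gamma_0).$$

According to Theorem 1 and Proposition 1, $A^{-1}\to\{\nabla^2R(\gamma_0)\}^{-1}$ a.s., where $\nabla^2R(\gamma_0)$ is given in (7). According to (A.39) and the central limit theorem for strongly mixing processes (Bosq, 1998, Thm. 1.7), $\sqrt{n''}\,S(\gamma_0)$ converges in distribution to a normal vector with mean zero and the covariance matrix given in (6). Then Theorem 2 is proved by formula (5) and Slutsky's theorem. ∎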
