Advances in Decision Sciences

Special Issue: Statistical Estimation of Portfolios for Dependent Financial Returns

Guest Editors: Masanobu Taniguchi, Cathy W. S. Chen, Junichi Hirukawa, Hiroshi Shiraishi, Kenichiro Tamaki, and David Veredas
Copyright © 2012 Hindawi Publishing Corporation. All rights reserved.
This is a special issue published in “Advances in Decision Sciences.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Editorial Board
Mahyar A. Amouzegar, USA
Fernando Beltran, New Zealand
Omer S. Benli, USA
David Bulger, Australia
Raymond Chan, Hong Kong
Wai Ki Ching, Hong Kong
Stefanka Chukova, New Zealand
S. Dempe, Germany
C. D. Lai, New Zealand
YanXia Lin, Canada
Chenghu Ma, China
Khosrow Moshirvaziri, USA
Shelton Peiris, Australia
Jack Penm, Australia
Roger Z. Ríos-Mercado, Mexico
Henry Schellhorn, USA
Andreas Soteriou, Cyprus
Olivier Thas, Belgium
WingKeung Wong, Hong Kong
Graham Raymond Wood, Australia
Contents
Statistical Estimation of Portfolios for Dependent Financial Returns, Masanobu Taniguchi, Cathy W. S. Chen, Junichi Hirukawa, Hiroshi Shiraishi, Kenichiro Tamaki, and David Veredas
Volume 2012, Article ID 681490, 3 pages

Large-Deviation Results for Discriminant Statistics of Gaussian Locally Stationary Processes, Junichi Hirukawa
Volume 2012, Article ID 572919, 15 pages

Asymptotic Optimality of Estimating Function Estimator for CHARN Model, Tomoyuki Amano
Volume 2012, Article ID 515494, 11 pages

Optimal Portfolio Estimation for Dependent Financial Returns with Generalized Empirical Likelihood, Hiroaki Ogata
Volume 2012, Article ID 973173, 8 pages

Statistically Efficient Construction of α-Risk-Minimizing Portfolio, Hiroyuki Taniai and Takayuki Shiohama
Volume 2012, Article ID 980294, 17 pages

Estimation for Non-Gaussian Locally Stationary Processes with Empirical Likelihood Method, Hiroaki Ogata
Volume 2012, Article ID 704693, 22 pages

A Simulation Approach to Statistical Estimation of Multiperiod Optimal Portfolios, Hiroshi Shiraishi
Volume 2012, Article ID 341476, 13 pages

On the Causality between Multiple Locally Stationary Processes, Junichi Hirukawa
Volume 2012, Article ID 261707, 15 pages

Optimal Portfolios with End-of-Period Target, Hiroshi Shiraishi, Hiroaki Ogata, Tomoyuki Amano, Valentin Patilea, David Veredas, and Masanobu Taniguchi
Volume 2012, Article ID 703465, 13 pages

Least Squares Estimators for Unit Root Processes with Locally Stationary Disturbance, Junichi Hirukawa and Mako Sadakata
Volume 2012, Article ID 893497, 16 pages

Statistical Portfolio Estimation under the Utility Function Depending on Exogenous Variables, Kenta Hamada, Dong Wei Ye, and Masanobu Taniguchi
Volume 2012, Article ID 127571, 15 pages

Statistical Estimation for CAPM with Long-Memory Dependence, Tomoyuki Amano, Tsuyoshi Kato, and Masanobu Taniguchi
Volume 2012, Article ID 571034, 12 pages
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 681490, 3 pages
doi:10.1155/2012/681490
Editorial
Statistical Estimation of Portfolios for Dependent Financial Returns
Masanobu Taniguchi,1 Cathy W. S. Chen,2 Junichi Hirukawa,3 Hiroshi Shiraishi,4 Kenichiro Tamaki,1 and David Veredas5
1 Department of Applied Mathematics, Waseda University, Tokyo 169-8555, Japan
2 Department of Statistics/Graduate Institute of Statistics & Actuarial Science, Feng Chia University, Taichung 407, Taiwan
3 Department of Mathematics, Faculty of Science, Niigata University, Niigata 950-2181, Japan
4 The Jikei University School of Medicine, Tokyo 105-8461, Japan
5 ECARES-Solvay Brussels School of Economics and Management, Université Libre de Bruxelles, 1050 Brussels, Belgium
Correspondence should be addressed to Masanobu Taniguchi, [email protected]
Received 22 August 2012; Accepted 22 August 2012
Copyright © 2012 Masanobu Taniguchi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The field of financial engineering has developed over the last decade as a broad integration of economics, mathematics, probability theory, statistics, time series analysis, operations research, and related fields. The construction of portfolios of financial assets is one of the most important issues in financial engineering. It is empirically observed that financial returns are non-Gaussian and dependent, and it has been shown that the classical mean-variance portfolio estimator is then not statistically optimal. These observations have led to the development of general time series modeling for financial returns, sophisticated optimal estimation theory, robust estimation methods, empirical likelihood for time series, nonstationary time series analysis, prediction of time series, and various numerical approaches for portfolios.
This special issue includes the following topics in financial time series analysis and financial engineering.
The paper titled “Large deviation results for discriminant statistics of Gaussian locallystationary processes” by J. Hirukawa discusses the large deviation principle of discriminantstatistics for Gaussian locally stationary processes. The large deviation theorems for quadraticforms and the log-likelihood ratio for a Gaussian locally stationary process with a meanfunction are proved. Their asymptotics are described by the large deviation rate functions.Next, the situation where processes are misspecified to be stationary is considered. In these
misspecified cases, the log-likelihood ratio discriminant statistics are formally constructed, and their large-deviation theorems are derived. Since the results are mathematically complicated, they are evaluated and illustrated by numerical examples. We see that misspecifying the process as stationary seriously affects the discrimination.
The paper by T. Amano is titled “Asymptotic optimality of estimating function estimator for CHARN model.” The CHARN model is a well-known and important model in finance, which includes many financial time series models and can be used to model the return processes of assets. One of the most fundamental estimators for financial time series models is the conditional least squares (CL) estimator. However, it was recently shown that the optimal estimating function estimator (G estimator) is better than the CL estimator for some time series models in the sense of efficiency. This paper examines the efficiencies of the CL and G estimators for the CHARN model and derives the condition under which the G estimator is asymptotically optimal.
The next paper, titled “Optimal portfolio estimation for dependent financial returns with generalized empirical likelihood” by H. Ogata, proposes to use the method of generalized empirical likelihood to find the optimal portfolio weights. The log-returns of assets are modeled by multivariate stationary processes rather than i.i.d. sequences. The variance of the portfolio is expressed in terms of the spectral density matrix, and the portfolio weights minimizing it are sought. An illustration of this method on real market index data is also given.
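For orientation, the classical mean-variance benchmark that this line of work generalizes can be sketched in a few lines. The toy example below uses i.i.d. simulated returns and the ordinary sample covariance, whereas the paper replaces that covariance with a spectral-density-based quantity for dependent returns; all numbers here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy i.i.d. returns for 4 assets with different scales (the paper instead
# models dependent stationary returns via the spectral density matrix)
R = rng.standard_normal((500, 4)) * np.array([1.0, 1.2, 0.8, 1.5])
Sigma = np.cov(R, rowvar=False)

# global minimum-variance weights: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
ones = np.ones(Sigma.shape[0])
w = np.linalg.solve(Sigma, ones)
w /= w.sum()
print(w)  # weights sum to 1 by construction
```

By construction these weights attain the smallest variance among all fully invested portfolios, so for instance they cannot do worse than equal weighting on the same covariance estimate.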
The paper titled “Statistically efficient construction of α-risk-minimizing portfolio” by Hiroyuki Taniai and Takayuki Shiohama proposes a semiparametrically efficient estimator for α-risk-minimizing portfolio weights. The optimal portfolio whose α-risk is to be minimized is formulated as a linear quantile regression problem. The authors apply a rank-based semiparametric method, using the signs and ranks of residuals, to provide an efficient construction of the optimal portfolios. The efficiency gain is verified by Monte Carlo simulations and empirical applications.
The paper titled “Estimation for non-Gaussian locally stationary processes with empirical likelihood method” by H. Ogata considers an estimation problem for non-Gaussian locally stationary processes by empirical likelihood. The parameter of interest is specified by a time-varying spectral moment condition, and it can express various important indices for locally stationary processes, such as the autocorrelation function. The asymptotic distributions of the maximum empirical likelihood estimator and the empirical likelihood ratio test statistic are given based on the central limit theorem for locally stationary processes.
The paper “A simulation approach to statistical estimation of multiperiod optimal portfolios” by H. Shiraishi discusses a simulation-based method for solving discrete-time multiperiod portfolio choice problems under an AR(1) return process. Based on the AR bootstrap, simulation sample paths of the random returns are first generated. Then, for each sample path and each investment time, an optimal portfolio estimator, which optimizes a constant relative risk aversion (CRRA) utility function, is obtained.
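The AR-bootstrap path-generation step described above can be sketched roughly as follows. The AR(1) coefficient, innovation scale, and path counts are illustrative assumptions rather than values from the paper, and the subsequent CRRA portfolio-optimization step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy AR(1) return series (coefficient 0.4 and scale 0.01 are illustrative)
T = 300
r = np.zeros(T)
for t in range(1, T):
    r[t] = 0.4 * r[t - 1] + 0.01 * rng.standard_normal()

# fit AR(1) by least squares and keep the residuals
phi = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])
resid = r[1:] - phi * r[:-1]

# AR bootstrap: regenerate return paths by resampling the fitted residuals
def bootstrap_path(n, phi, resid, rng):
    e = rng.choice(resid, size=n, replace=True)
    path = np.empty(n)
    prev = 0.0
    for t in range(n):
        prev = phi * prev + e[t]
        path[t] = prev
    return path

paths = np.array([bootstrap_path(50, phi, resid, rng) for _ in range(100)])
```

Each bootstrap path then plays the role of one simulated future return scenario over which the utility is optimized.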
The paper by J. Hirukawa entitled “On the causality between multiple locally stationary processes” is concerned with the concepts of dependence and causality, which can describe the relations between multivariate time series. These concepts also appear to be useful when one is describing the properties of an engineering or econometric model. Although the measures of dependence and causality under the stationary assumption are well established, empirical studies show that these measures are not constant in time. In this paper, generalized measures of linear dependence and causality for multiple locally stationary processes are proposed. The measures of linear dependence, linear causality from one series to the other, and instantaneous linear feedback, at each time and each frequency, are given.
The paper titled “Optimal portfolios with end-of-period target” by H. Shiraishi et al. studies the estimation of optimal portfolios for a Reserve Fund with an end-of-period target, when the returns of the assets constituting the Reserve Fund portfolio follow two specifications. They focus on the case when assets are split into short-memory bonds and long-memory equity, and on the case when the return distribution is heavy-tailed stable.
The next paper is by J. Hirukawa and M. Sadakata, entitled “Least squares estimators for unit root processes with locally stationary disturbance.” It contains a discussion of various properties of the least squares estimators for unit root processes with locally stationary innovation processes. Since a locally stationary process is not stationary, these models include two different types of nonstationarity, namely, the unit root and local stationarity. The locally stationary innovation has a time-varying spectral structure; hence, it is suitable for describing empirical financial time series data. Due to this nonstationarity, the least squares estimators of these models do not satisfy asymptotic normality. In this paper, the limiting distributions of the least squares estimators of unit root, near unit root, and general integrated processes with locally stationary innovations are derived.
The paper titled “Statistical portfolio estimation under the utility function depending on exogenous variables” by K. Hamada et al. develops portfolio estimation for the situation where the utility function depends on exogenous variables. To estimate the optimal portfolio, a function of the sample moments of the return process and the sample cumulants between the return process and the exogenous variables is introduced. Its asymptotic distribution is then derived, and the influence of the exogenous variables on the return process is illuminated.
The paper titled “Statistical estimation for CAPM with long-memory dependence” by T. Amano et al. investigates the Capital Asset Pricing Model (CAPM) with a time dimension. From the viewpoint of time series analysis, the authors describe the CAPM as a regression model in which the market portfolio and the error process are long-memory processes correlated with each other. They give a sufficient condition for the returns of assets in the CAPM to be short memory. In this setting, they propose a two-stage least squares estimator for the regression coefficient and derive its asymptotic distribution. Some numerical studies are also given.
This issue develops a modern, high-level statistical optimal estimation theory forportfolio coefficients, assuming that the financial returns are “dependent and non-Gaussian,”which opens up a new horizon in the field of portfolio estimation.
Masanobu Taniguchi
Cathy W. S. Chen
Junichi Hirukawa
Hiroshi Shiraishi
Kenichiro Tamaki
David Veredas
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 572919, 15 pages
doi:10.1155/2012/572919
Research Article
Large-Deviation Results for Discriminant Statistics of Gaussian Locally Stationary Processes
Junichi Hirukawa
Faculty of Science, Niigata University, 8050 Ikarashi 2-no-cho, Nishi-ku, Niigata 950-2181, Japan
Correspondence should be addressed to Junichi Hirukawa, [email protected]
Received 16 February 2012; Accepted 9 April 2012
Academic Editor: Kenichiro Tamaki
Copyright © 2012 Junichi Hirukawa. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper discusses the large-deviation principle of discriminant statistics for Gaussian locally stationary processes. First, large-deviation theorems for quadratic forms and the log-likelihood ratio for a Gaussian locally stationary process with a mean function are proved. Their asymptotics are described by the large-deviation rate functions. Second, we consider the situations where the processes are misspecified as stationary. In these misspecified cases, we formally construct the log-likelihood ratio discriminant statistics and derive their large-deviation theorems. Since the rate functions are complicated, they are evaluated and illustrated by numerical examples. We find that misspecifying the process as stationary seriously affects the discrimination.
1. Introduction
Consider a sequence of random variables S_1, S_2, ... converging (in probability) to a real constant c. By this we mean that Pr{|S_T − c| > ε} → 0 as T → ∞ for all ε > 0. The simplest setting in which to obtain large-deviation results is that of sums of independent identically distributed (i.i.d.) random variables on the real line. For example, we would like to consider the large-excursion probabilities of sums such as the sample average

\[
S_T = T^{-1}\sum_{i=1}^{T} X_i,
\tag{1.1}
\]

where the X_i, i = 1, 2, ..., are i.i.d. and T approaches infinity. Suppose that E(X_i) = m exists and is finite. By the law of large numbers, we know that S_T should converge to m. Hence,
c is merely the expected value of the random process. It is often the case that Pr{|S_T − c| > ε} not only goes to zero but does so exponentially fast. That is,

\[
\Pr\{|S_T - c| > \varepsilon\} \approx K(\varepsilon, c, T)\exp\{-T I(\varepsilon, c)\},
\tag{1.2}
\]
where K(ε, c, T) is a slowly varying function of T (relative to the exponential) and I(ε, c) is a positive quantity. Loosely, if such a relationship is satisfied, we say that the sequence {S_T} satisfies a large-deviation principle. Large-deviation theory is concerned primarily with determining the quantity I(ε, c) and (to a lesser extent) K(ε, c, T). The reason for the nomenclature is that, for a fixed ε > 0 and a large index T, a large deviation from the nominal value occurs if |S_T − c| > ε. Large-deviation theory can rightly be considered a generalization or extension of the law of large numbers: the law of large numbers says that certain probabilities converge to zero, and large-deviation theory is concerned with the rate of that convergence. Bucklew [1] describes the historical development of large deviations in detail.
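Relation (1.2) can be checked numerically in the simplest case. For i.i.d. N(0, 1) summands, S_T is exactly N(0, 1/T), and Cramér's theorem gives the rate I(ε) = ε²/2; the following sketch (assuming SciPy is available) compares the exact tail probabilities with that rate.

```python
import numpy as np
from scipy.stats import norm

eps = 0.5  # deviation threshold
Ts = (100, 1_000, 10_000)

# For i.i.d. N(0,1) summands, S_T ~ N(0, 1/T), so the tail probability is
# exactly Pr{|S_T| > eps} = 2 * Pr{Z > eps * sqrt(T)}.  Cramer's theorem
# predicts -T^{-1} log Pr{|S_T| > eps} -> I(eps) = eps^2 / 2 = 0.125.
# norm.logsf is used to stay accurate deep in the tail.
rates = [-(np.log(2) + norm.logsf(eps * np.sqrt(T))) / T for T in Ts]
for T, r in zip(Ts, rates):
    print(T, r)  # decreases toward 0.125 as T grows
```

The gap between the finite-T rate and 0.125 is exactly the contribution of the slowly varying prefactor K(ε, c, T) in (1.2).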
There have been a few works on large-deviation theory for time series data. Sato et al. [2] discussed the large-deviation theory of several statistics for short- and long-memory stationary processes. However, it is still hard to find large-deviation results for nonstationary processes. Recently, Dahlhaus [3, 4] has formulated an important class of nonstationary processes with a rigorous asymptotic theory, which he calls locally stationary. A locally stationary process has a time-varying spectral density whose spectral structure changes smoothly with time. Several papers discuss discriminant analysis for locally stationary processes (e.g., Chandler and Polonik [5], Sakiyama and Taniguchi [6], and Hirukawa [7]). In this paper, we discuss the large-deviation theory of discriminant statistics of Gaussian locally stationary processes. In Section 2 we present the Gärtner-Ellis theorem, which establishes a large-deviation principle for random variables based only upon convergence properties of the associated sequence of cumulant generating functions. Since no assumptions are made about the dependency structure of the random variables, we can apply this theorem to nonstationary time series data. In Section 3, we deal with a Gaussian locally stationary process with a mean function. First, we prove the large-deviation principle for a general quadratic form of the observed stretch. We also give the large-deviation principle for the log-likelihood ratio and the misspecified log-likelihood ratio between two hypotheses. These fundamental statistics are important not only in statistical estimation and testing theory but also in discriminant problems. The above asymptotics are described by the large-deviation rate functions. In our stochastic models, the rate functions are very complicated; thus, in Section 4, we evaluate them numerically. The evaluations demonstrate that misspecifying a nonstationary process as stationary has serious effects.
All the proofs of the theorems presented in Section 3 are givenin the Appendix.
2. Gärtner-Ellis Theorem
Cramér's theorem (e.g., Bucklew [1]) is usually credited with being the first large-deviation result. It gives the large-deviation principle for sums of independent identically distributed random variables. One of the most useful and surprising generalizations of this theorem is the one due to Gärtner [8] and, more recently, Ellis [9]. These authors established a large-deviation principle for random variables based only upon convergence properties of the associated sequence of moment generating functions Φ(ω). Their methods thus allow large-deviation results to be derived for dependent random processes such as Markov chains and
functionals of Gaussian random processes. Gärtner [8] assumed throughout that Φ(ω) < ∞ for all ω. By extensive use of convexity theory, Ellis [9] relaxed this fairly stringent condition.
Suppose that we are given an infinite sequence of random variables {Y_T, T ∈ ℕ}. No assumptions are made about the dependency structure of this sequence. Define

\[
\psi_T(\omega) \equiv T^{-1}\log E\{\exp(\omega Y_T)\}.
\tag{2.1}
\]
Now let us list two assumptions.
Assumption 2.1. ψ(ω) ≡ lim_{T→∞} ψ_T(ω) exists for all ω ∈ ℝ, where we allow ∞ both as a limit value and as an element of the sequence {ψ_T(ω)}.

Assumption 2.2. ψ(ω) is differentiable on D_ψ ≡ {ω : ψ(ω) < ∞}.
Define the large-deviation rate function by

\[
I(x) \equiv \sup_{\omega}\{\omega x - \psi(\omega)\};
\tag{2.2}
\]

this function plays a crucial role in the development of the theory. Furthermore, define

\[
\psi'(D_\psi) \equiv \{\psi'(\omega) : \omega \in D_\psi\},
\tag{2.3}
\]

where ψ′ indicates the derivative of ψ. Before proceeding to the main theorem, we first state some properties of this rate function.
Property 1. I(x) is convex.
We remark that a convex function I(·) on the real line is continuous everywhere on D_I ≡ {x : I(x) < ∞}, the domain of I(·).
Property 2. I(x) has its minimum value at m = lim_{T→∞} T^{-1} E(Y_T), and I(m) = 0.
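To make the Legendre-transform definition (2.2) concrete, here is a small numerical sketch for the standard Gaussian cumulant generating function ψ(ω) = ω²/2, for which the supremum is attained at ω = x and I(x) = x²/2; the grid bounds and resolution are illustrative choices.

```python
import numpy as np

def rate_function(psi, x, omegas):
    """Numerical Legendre transform: I(x) = sup_w {w*x - psi(w)} over a grid."""
    return np.max(omegas * x - psi(omegas))

# cumulant generating function of a standard Gaussian: psi(w) = w^2 / 2
psi_gauss = lambda w: 0.5 * w ** 2
omegas = np.linspace(-10.0, 10.0, 100_001)

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, rate_function(psi_gauss, x, omegas))  # matches x**2 / 2
```

Both properties above are visible here: the grid maximum is convex in x and vanishes at the mean m = 0.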
We now state a simple form of a general large-deviation theorem known as the Gärtner-Ellis theorem (e.g., Bucklew [1]).
Lemma 2.3 (Gärtner-Ellis). Let (a, b) be an interval with [a, b] ∩ D_I ≠ ∅. If Assumption 2.1 holds and a < b, then

\[
\limsup_{T\to\infty} T^{-1}\log \Pr\bigl\{T^{-1}Y_T \in [a, b]\bigr\} \le -\inf_{x\in[a,b]} I(x).
\tag{2.4}
\]

If Assumptions 2.1 and 2.2 hold and (a, b) ⊂ ψ′(D_ψ), then

\[
\liminf_{T\to\infty} T^{-1}\log \Pr\bigl\{T^{-1}Y_T \in (a, b)\bigr\} \ge -\inf_{x\in(a,b)} I(x).
\tag{2.5}
\]
Large-deviation theorems are usually expressed as two separate limit theorems: an upper bound for closed sets and a lower bound for open sets. In the case of interval subsets
of R, it can be guaranteed that the upper bound equals the lower bound by the continuity ofI(·). For the applications that we have in mind, the interval subsets will be sufficient.
3. Large-Deviation Results for Locally Stationary Processes
In this section, using the Gärtner-Ellis theorem, we develop the large-deviation principle for some nonstationary time series statistics. When we deal with nonstationary processes, one of the difficult problems to solve is how to set up an adequate asymptotic theory. To overcome this problem, an important class of nonstationary processes, called locally stationary processes, has been formulated in a rigorous asymptotic framework by Dahlhaus [3, 4]. Locally stationary processes have time-varying spectral densities whose spectral structure changes smoothly in time. We give the precise definition of locally stationary processes, which is due to Dahlhaus [3, 4].
Definition 3.1. A sequence of stochastic processes X_{t,T} (t = 1, ..., T; T ≥ 1) is called locally stationary with transfer function A° and trend μ if there exists a representation

\[
X_{t,T} = \mu\!\left(\frac{t}{T}\right) + \int_{-\pi}^{\pi} \exp(i\lambda t)\, A^{\circ}_{t,T}(\lambda)\, d\xi(\lambda),
\tag{3.1}
\]

where

(i) ξ(λ) is a stochastic process on [−π, π] with \overline{ξ(λ)} = ξ(−λ) and

\[
\operatorname{cum}\{d\xi(\lambda_1), \ldots, d\xi(\lambda_k)\}
= \eta\!\left(\sum_{j=1}^{k}\lambda_j\right)\nu_k(\lambda_1, \ldots, \lambda_{k-1})\, d\lambda_1 \cdots d\lambda_k,
\tag{3.2}
\]

where cum{·} denotes the cumulant of kth order, ν_1 = 0, ν_2(λ) = 1, |ν_k(λ_1, ..., λ_{k−1})| ≤ const_k for all k, and η(λ) = Σ_{j=−∞}^{∞} δ(λ + 2πj) is the period-2π extension of the Dirac delta function. To simplify the problem, we assume in this paper that the process X_{t,T} is Gaussian, namely, that ν_k(λ) ≡ 0 for all k ≥ 3;

(ii) there exist a constant K and a 2π-periodic function A : [0, 1] × ℝ → ℂ with \overline{A(u, λ)} = A(u, −λ) and

\[
\sup_{t,\lambda}\left|A^{\circ}_{t,T}(\lambda) - A\!\left(\frac{t}{T}, \lambda\right)\right| \le K T^{-1}
\tag{3.3}
\]

for all T. A(u, λ) and μ(u) are assumed to be continuous in u.
The function f(u, λ) := |A(u, λ)|² is called the time-varying spectral density of the process. In the following, we will always denote by s and t time points in the interval [1, T], while u and v will denote time points in the rescaled interval [0, 1], that is, u = t/T.
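A time-varying AR(1) recursion is a standard concrete instance of Definition 3.1. The following sketch simulates one; the trend and coefficient functions anticipate the numerical example of Section 4, and the recursion is a finite-T stand-in for the spectral representation (3.1) rather than a literal implementation of it.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 512
u = np.arange(1, T + 1) / T                # rescaled time u = t/T

mu = 0.5 * np.exp(-u ** 2)                 # trend mu(u)
sigma = 0.5 * np.exp(-(u - 1) ** 2)        # innovation scale sigma(u)
a = 0.5 * np.exp(-4 * (u - 0.5) ** 2)      # AR coefficient a(u), |a(u)| < 1

# time-varying AR(1): the centered process follows
#   Z_t = a(t/T) Z_{t-1} + sigma(t/T) eps_t,   X_{t,T} = mu(t/T) + Z_t,
# whose time-varying spectral density is sigma(u)^2 / |1 - a(u) e^{i lam}|^2.
x = np.empty(T)
prev = 0.0
for t in range(T):
    prev = a[t] * prev + sigma[t] * rng.standard_normal()
    x[t] = mu[t] + prev
```

Because a(u) and sigma(u) vary slowly on the rescaled time axis, short stretches of the simulated series look stationary while the overall spectral structure drifts, which is exactly the behavior the definition formalizes.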
We discuss the asymptotics away from the expectation of some statistics used for theproblem of discriminating between two Gaussian locally stationary processes with specified
mean functions. Suppose that {X_{t,T}, t = 1, ..., T; T ≥ 1} is a Gaussian locally stationary process which, under the hypothesis Π_j, has mean function μ^{(j)}(u) and time-varying spectral density f^{(j)}(u, λ) for j = 1, 2. Let X_T = (X_{1,T}, ..., X_{T,T})′ be a stretch of the series {X_{t,T}}, and let p^{(j)}(·) be the probability density function of X_T under Π_j (j = 1, 2). The problem is to classify X_T into one of the two categories Π_1 and Π_2 in the case that we do not have any information on the prior probabilities of Π_1 and Π_2.
Set μ^{(j)}_T = {μ^{(j)}(1/T), ..., μ^{(j)}(T/T)}′ and Σ^{(j)}_T = Σ_T(A^{(j)}, A^{(j)}), where

\[
\Sigma_T(A, B) = \left\{\int_{-\pi}^{\pi} A^{\circ}_{s,T}(\lambda)\, B^{\circ}_{t,T}(-\lambda)\exp(i\lambda(s-t))\, d\lambda\right\}_{s,t=1,\ldots,T}.
\tag{3.4}
\]
Initially, we make the following assumption.
Assumption 3.2. (i) We observe a realisation X1,T , . . . , XT,T of a Gaussian locally stationaryprocess with mean function μ(j) and transfer function A(j)◦, under Πj , j = 1, 2;
(ii) the A(j)(u, λ) are uniformly bounded from above and below, and are differentiablein u and λ with uniformly continuous derivatives (∂/∂u)(∂/∂λ)A(j);
(iii) the μ^{(j)}(u) are differentiable in u with uniformly continuous derivatives.

In time series analysis, the class of statistics which are quadratic forms of X_T is fundamental and important. This class includes the first-order terms (in the expansion with respect to T) of the quasi-Gaussian maximum likelihood estimator (QMLE), test statistics, discriminant statistics, and so forth. Assume that G° is the transfer function of a locally stationary process, where the corresponding G satisfies Assumption 3.2(ii), and that g(u) is a continuous function of u which satisfies Assumption 3.2(iii) (with A^{(j)} replaced by G and μ^{(j)}(u) by g(u), respectively). Set G_T ≡ Σ_T(G, G), f_G(u, λ) ≡ |G(u, λ)|², g_T ≡ {g(1/T), ..., g(T/T)}′, and

\[
Q_T \equiv X_T' G_T^{-1} X_T + g_T' X_T.
\]

Henceforth, E^{(j)}(·) stands for the expectation with respect to p^{(j)}(·). Set S^{(j)}_T(Q) ≡ Q_T − E^{(j)}(Q_T) for j = 1, 2. We first prove the large-deviation theorem for the quadratic form Q_T of X_T. All the proofs of the theorems are in the Appendix.
Theorem 3.3. Let Assumption 3.2 hold. Then under Π_1,

\[
\lim_{T\to\infty} T^{-1}\log \Pr_1\bigl\{T^{-1}S^{(1)}_T(Q) > x\bigr\}
= \inf_{\omega}\bigl\{\psi_Q\bigl(\omega; f^{(1)}\bigr) - \omega\max(x,0)\bigr\},
\tag{3.5}
\]

and under Π_2,

\[
\lim_{T\to\infty} T^{-1}\log \Pr_2\bigl\{T^{-1}S^{(2)}_T(Q) < x\bigr\}
= \inf_{\omega}\bigl\{\psi_Q\bigl(\omega; f^{(2)}\bigr) - \omega\min(x,0)\bigr\},
\tag{3.6}
\]

where, for j = 1, 2, ψ_Q(ω; f^{(j)}) equals

\[
\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}
\left[
\log\frac{f_G(u,\lambda)}{f_G(u,\lambda) - 2\omega f^{(j)}(u,\lambda)}
- \frac{2\omega f^{(j)}(u,\lambda)}{f_G(u,\lambda)}
+ \frac{\omega^{2}\bigl\{f_G(u,0)g(u) + 2\mu^{(j)}(u)\bigr\}^{2} f^{(j)}(u,0)}
       {2\pi\bigl\{f_G(u,0) - 2\omega f^{(j)}(u,0)\bigr\} f_G(u,0)}
\right] d\lambda\, du.
\tag{3.7}
\]
Next, we consider the log-likelihood ratio statistic. It is well known that the log-likelihood ratio criterion

\[
\Lambda_T \equiv \log\frac{p^{(2)}(X_T)}{p^{(1)}(X_T)}
\tag{3.8}
\]

gives the optimal discrimination rule in the sense that it minimizes the probability of misdiscrimination (Anderson [10]). Set S^{(j)}_T(Λ) ≡ Λ_T − E^{(j)}(Λ_T) for j = 1, 2. For the discrimination problem, we give the large-deviation principle for Λ_T.

Theorem 3.4. Let Assumption 3.2 hold. Then under Π_1,

\[
\lim_{T\to\infty} T^{-1}\log \Pr_1\bigl\{T^{-1}S^{(1)}_T(\Lambda) > x\bigr\}
= \inf_{\omega}\bigl\{\psi_L\bigl(\omega; f^{(1)}, f^{(2)}\bigr) - \omega\max(x,0)\bigr\},
\tag{3.9}
\]

where ψ_L(ω; f^{(1)}, f^{(2)}) equals

\[
\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}
\left[
\log\frac{f^{(2)}(u,\lambda)}{(1-\omega)f^{(2)}(u,\lambda) + \omega f^{(1)}(u,\lambda)}
+ \omega\left\{\frac{f^{(1)}(u,\lambda)}{f^{(2)}(u,\lambda)} - 1\right\}
+ \frac{\omega^{2}\bigl\{\mu^{(1)}(u) - \mu^{(2)}(u)\bigr\}^{2} f^{(1)}(u,0)}
       {2\pi\bigl\{(1-\omega)f^{(2)}(u,0) + \omega f^{(1)}(u,0)\bigr\} f^{(2)}(u,0)}
\right] d\lambda\, du.
\tag{3.10}
\]

Similarly, under Π_2,

\[
\lim_{T\to\infty} T^{-1}\log \Pr_2\bigl\{T^{-1}S^{(2)}_T(\Lambda) < x\bigr\}
= \inf_{\omega}\bigl\{\psi_L\bigl(-\omega; f^{(2)}, f^{(1)}\bigr) - \omega\min(x,0)\bigr\}.
\tag{3.11}
\]
In practice, misspecification occurs in many statistical problems. We consider the following three situations. Although {X_{t,T}} actually has the time-varying mean functions μ^{(j)}(u) and the time-varying spectral densities f^{(j)}(u, λ) under Π_j, j = 1, 2, respectively,

(i) the mean functions are misspecified as μ^{(j)}(u) ≡ 0, j = 1, 2;

(ii) the spectral densities are misspecified as f^{(j)}(u, λ) ≡ f^{(j)}(0, λ), j = 1, 2;

(iii) both the mean functions and the spectral densities are misspecified, as μ^{(j)}(u) ≡ 0 and f^{(j)}(u, λ) ≡ f^{(j)}(0, λ), j = 1, 2; namely, X_T is misspecified as stationary.
In each misspecified case, one can formally construct the log-likelihood ratio in the form

\[
M_{1,T} = \frac{1}{2}\left[\log\frac{\bigl|\Sigma^{(1)}_T\bigr|}{\bigl|\Sigma^{(2)}_T\bigr|}
+ X_T'\bigl\{\Sigma^{(1)-1}_T - \Sigma^{(2)-1}_T\bigr\}X_T\right],
\]
\[
M_{2,T} = \frac{1}{2}\left[\log\frac{\bigl|\tilde{\Sigma}^{(1)}_T\bigr|}{\bigl|\tilde{\Sigma}^{(2)}_T\bigr|}
+ \bigl(X_T - \mu^{(1)}_T\bigr)'\tilde{\Sigma}^{(1)-1}_T\bigl(X_T - \mu^{(1)}_T\bigr)
- \bigl(X_T - \mu^{(2)}_T\bigr)'\tilde{\Sigma}^{(2)-1}_T\bigl(X_T - \mu^{(2)}_T\bigr)\right],
\]
\[
M_{3,T} = \frac{1}{2}\left[\log\frac{\bigl|\tilde{\Sigma}^{(1)}_T\bigr|}{\bigl|\tilde{\Sigma}^{(2)}_T\bigr|}
+ X_T'\bigl\{\tilde{\Sigma}^{(1)-1}_T - \tilde{\Sigma}^{(2)-1}_T\bigr\}X_T\right],
\tag{3.12}
\]

where \tilde{\Sigma}^{(j)}_T denotes the covariance matrix built from the misspecified stationary spectral density,

\[
\tilde{\Sigma}^{(j)}_T = \left\{\int_{-\pi}^{\pi}\exp(i\lambda(t-s))\, f^{(j)}(0,\lambda)\, d\lambda\right\}_{s,t=1,\ldots,T}.
\tag{3.13}
\]
Set S^{(j)}_T(M_k) ≡ M_{k,T} − E^{(j)}(M_{k,T}) for j = 1, 2 and k = 1, 2, 3. The next result is a large-deviation theorem for the misspecified log-likelihood ratios M_{k,T}. It is useful in investigating the effect of misspecification.
Theorem 3.5. Let Assumption 3.2 hold. Then under Π_1,

\[
\lim_{T\to\infty} T^{-1}\log \Pr_1\bigl\{T^{-1}S^{(1)}_T(M_k) > x\bigr\}
= \inf_{\omega}\bigl\{\psi_{M_k}\bigl(\omega; f^{(1)}, f^{(2)}, \mu^{(1)}, \mu^{(2)}\bigr) - \omega\max(x,0)\bigr\},
\tag{3.14}
\]

where ψ_{M_1}(ω; f^{(1)}, f^{(2)}, μ^{(1)}, μ^{(2)}) equals

\[
\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}
\left[
\log\frac{f^{(2)}(u,\lambda)}{(1-\omega)f^{(2)}(u,\lambda) + \omega f^{(1)}(u,\lambda)}
+ \omega\left\{\frac{f^{(1)}(u,\lambda)}{f^{(2)}(u,\lambda)} - 1\right\}
+ \frac{\bigl[\omega\mu^{(1)}(u)\bigl\{f^{(1)}(u,0) - f^{(2)}(u,0)\bigr\}\bigr]^{2}}
       {2\pi\bigl\{(1-\omega)f^{(2)}(u,0) + \omega f^{(1)}(u,0)\bigr\} f^{(1)}(u,0) f^{(2)}(u,0)}
\right] d\lambda\, du,
\tag{3.15}
\]
ψ_{M_2}(ω; f^{(1)}, f^{(2)}, μ^{(1)}, μ^{(2)}) equals

\[
\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}
\left[
\log\frac{f^{(1)}(0,\lambda)\, f^{(2)}(0,\lambda)}
         {f^{(1)}(0,\lambda)\, f^{(2)}(0,\lambda) - \omega f^{(1)}(u,\lambda)\bigl\{f^{(2)}(0,\lambda) - f^{(1)}(0,\lambda)\bigr\}}
+ \omega\left\{\frac{f^{(1)}(u,\lambda)}{f^{(2)}(0,\lambda)} - \frac{f^{(1)}(u,\lambda)}{f^{(1)}(0,\lambda)}\right\}
+ \frac{\omega^{2}\bigl\{\mu^{(2)}(u) - \mu^{(1)}(u)\bigr\}^{2} f^{(1)}(u,0)\, f^{(1)}(0,0)/f^{(2)}(0,0)}
       {2\pi\bigl[f^{(1)}(0,0)\, f^{(2)}(0,0) - \omega f^{(1)}(u,0)\bigl\{f^{(2)}(0,0) - f^{(1)}(0,0)\bigr\}\bigr]}
\right] d\lambda\, du,
\tag{3.16}
\]

and ψ_{M_3}(ω; f^{(1)}, f^{(2)}, μ^{(1)}, μ^{(2)}) equals

\[
\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}
\left[
\log\frac{f^{(1)}(0,\lambda)\, f^{(2)}(0,\lambda)}
         {f^{(1)}(0,\lambda)\, f^{(2)}(0,\lambda) - \omega f^{(1)}(u,\lambda)\bigl\{f^{(2)}(0,\lambda) - f^{(1)}(0,\lambda)\bigr\}}
+ \omega\left\{\frac{f^{(1)}(u,\lambda)}{f^{(2)}(0,\lambda)} - \frac{f^{(1)}(u,\lambda)}{f^{(1)}(0,\lambda)}\right\}
+ \frac{\omega^{2}\mu^{(1)}(u)^{2} f^{(1)}(u,0)\bigl\{f^{(1)}(0,0) - f^{(2)}(0,0)\bigr\}^{2}/\bigl\{f^{(1)}(0,0)\, f^{(2)}(0,0)\bigr\}}
       {2\pi\bigl[f^{(1)}(0,0)\, f^{(2)}(0,0) - \omega f^{(1)}(u,0)\bigl\{f^{(2)}(0,0) - f^{(1)}(0,0)\bigr\}\bigr]}
\right] d\lambda\, du.
\tag{3.17}
\]

Similarly, under Π_2,

\[
\lim_{T\to\infty} T^{-1}\log \Pr_2\bigl\{T^{-1}S^{(2)}_T(M_k) < x\bigr\}
= \inf_{\omega}\bigl\{\psi_{M_k}\bigl(-\omega; f^{(2)}, f^{(1)}, \mu^{(2)}, \mu^{(1)}\bigr) - \omega\min(x,0)\bigr\}.
\tag{3.18}
\]
Now we turn to the discussion of our discriminant problem of classifying X_T into one of two categories described by the two hypotheses

\[
\Pi_1 : \mu^{(1)}(u),\; f^{(1)}(u,\lambda), \qquad \Pi_2 : \mu^{(2)}(u),\; f^{(2)}(u,\lambda).
\tag{3.19}
\]
We use Λ_T as the discriminant statistic for the problem (3.19); namely, if Λ_T > 0 we assign X_T to Π_2, and otherwise to Π_1. Taking x = −lim_{T→∞} T^{-1} E^{(1)}(Λ_T) in (3.9), we can evaluate the probability of misdiscriminating X_T from Π_1 into Π_2 as follows:

\[
P(2 \mid 1) \equiv \Pr_1\{\Lambda_T > 0\}
\approx \exp\!\left[
T \inf_{\omega}\left\{
\frac{1}{4\pi}\int_{0}^{1}\int_{-\pi}^{\pi}
\left[
\log\frac{f^{(1)}(u,\lambda)^{\omega} f^{(2)}(u,\lambda)^{1-\omega}}
         {(1-\omega)f^{(2)}(u,\lambda) + \omega f^{(1)}(u,\lambda)}
+ \frac{\omega(\omega-1)\bigl\{\mu^{(1)}(u) - \mu^{(2)}(u)\bigr\}^{2}}
       {2\pi\bigl\{(1-\omega)f^{(2)}(u,0) + \omega f^{(1)}(u,0)\bigr\}}
\right] d\lambda\, du
\right\}
\right].
\tag{3.20}
\]
Thus, we see that the rate functions play an important role in the discriminant problem.
4. Numerical Illustration for Nonstationary Processes
We illustrate the implications of Theorems 3.4 and 3.5 by numerically evaluating the large-deviation probabilities of the statistics Λ_T and M_{k,T}, k = 1, 2, 3, for the following hypotheses:

\[
\text{(stationary white noise)}\quad \Pi_1 : \mu^{(1)}(u) \equiv 0, \quad f^{(1)}(u,\lambda) \equiv 1,
\]
\[
\text{(time-varying AR(1))}\quad \Pi_2 : \mu^{(2)}(u) = \mu(u), \quad f^{(2)}(u,\lambda) = \frac{\sigma(u)^{2}}{\bigl|1 - a(u)e^{i\lambda}\bigr|^{2}},
\tag{4.1}
\]

where μ(u) = (1/2) exp(−u²), σ(u) = (1/2) exp{−(u − 1)²}, and a(u) = (1/2) exp{−4(u − 1/2)²}, u ∈ [0, 1]. Figure 1 plots the mean function μ(u) (the solid line) and the coefficient functions σ(u) (the dashed line) and a(u) (the dotted line). The time-varying spectral density f^{(2)}(u, λ) is plotted in Figure 2.
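The spectral density surface of Figure 2 can be reproduced directly from the closed form of f^{(2)}(u, λ); a minimal sketch over a (u, λ) grid (the grid sizes are illustrative):

```python
import numpy as np

u = np.linspace(0.0, 1.0, 101)[:, None]          # rescaled time, column
lam = np.linspace(-np.pi, np.pi, 201)[None, :]   # frequency, row

sigma = 0.5 * np.exp(-(u - 1.0) ** 2)            # sigma(u)
a = 0.5 * np.exp(-4.0 * (u - 0.5) ** 2)          # a(u)

# time-varying AR(1) spectral density under Pi_2:
#   f^(2)(u, lam) = sigma(u)^2 / |1 - a(u) e^{i lam}|^2
f2 = sigma ** 2 / np.abs(1.0 - a * np.exp(1j * lam)) ** 2
print(f2.shape)  # (101, 201)
```

On this grid the surface peaks at λ = 0 around u ≈ 0.6, consistent with the observation below that the magnitude of the time-varying spectral density is large at later times u.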
From these figures, we see that the magnitude of the mean function is large at u closeto 0, while the magnitude of the time-varying spectral density is large at u close to 1.
Specifically, we use the formulae in those theorems concerning Π_2 to evaluate the limits of the large-deviation probabilities:

\[
\mathrm{LDP}(\Lambda) = \lim_{T\to\infty} T^{-1}\log \Pr_2\bigl\{T^{-1}S^{(2)}_T(\Lambda) < x\bigr\},
\]
\[
\mathrm{LDP}(M_k) = \lim_{T\to\infty} T^{-1}\log \Pr_2\bigl\{T^{-1}S^{(2)}_T(M_k) < x\bigr\}, \quad k = 1, 2, 3.
\tag{4.2}
\]
Although the result is an asymptotic theory, the simulation is performed with a limited sample size. For fairness we therefore use several levels of x, namely, x = −0.1, −1, −10. The results are listed in Table 1.
For each value of x, the large-deviation rate of Λ_T is the largest and that of M_{3,T} is the smallest. Namely, the correctly specified case is the best, while the case misspecified as stationary is the worst. Furthermore, the large-deviation rates −LDP(M_2)
Figure 1: The mean function μ(u) (the solid line), the coefficient functions σ(u) (the dashed line), and a(u) (the dotted line).
Figure 2: The time-varying spectral density f^{(2)}(u, λ).
Table 1: The limits of the large-deviation probabilities of Λ_T and M_{k,T}, k = 1, 2, 3.

            x = −0.1     x = −1       x = −10
LDP(Λ)      −0.012078    −0.562867    −9.460066
LDP(M_1)    −0.009895    −0.486088    −8.857859
LDP(M_2)    −0.000348    −0.026540    −0.703449
LDP(M_3)    −0.000290    −0.022313    −0.629251
and −LDP(M_3) are significantly small compared with −LDP(M_1). This fact implies that misspecifying the spectral density as constant in time seriously affects the large-deviation rate.
Figures 3, 4, 5, and 6 show the large-deviation probabilities of Λ_T and M_{k,T}, k = 1, 2, 3, for x = −1, at each time u and frequency λ.
We see that the large-deviation rate of Λ_T remains almost constant over all times u and frequencies λ. On the other hand, that of M_{1,T} is small at u close to 0, and those of M_{2,T} and M_{3,T} are small at u close to 1 and λ close to 0. That is, the large-deviation probability of M_{1,T} is degraded by the large magnitude of the mean function, while those of M_{2,T} and
Figure 3: The time-frequency plot of the large-deviation probabilities of Λ_T.
Figure 4: The time-frequency plot of the large-deviation probabilities of M_{1,T}.
M_{3,T} are degraded by the large magnitude of the time-varying spectral density. Hence, we can conclude that the misspecifications seriously affect our discrimination.
Appendix
We sketch the proofs of Theorems 3.3–3.5. First, we summarize the assumptions used in this paper.
Assumption A.1. (i) Suppose that A : [0, 1] × R → C is a 2π-periodic function with A(u, λ) = A(u, −λ), which is differentiable in u and λ with uniformly bounded derivative (∂/∂u)(∂/∂λ)A. f_A(u, λ) ≡ |A(u, λ)|² denotes the time-varying spectral density. The A°_{t,T} : R → C are 2π-periodic functions with

sup_{t,λ} |A°_{t,T}(λ) − A(t/T, λ)| ≤ KT^{−1}. (A.1)

(ii) Suppose that μ : [0, 1] → R is differentiable with uniformly bounded derivative.
Figure 5: The time-frequency plot of the large-deviation probabilities of M_{2,T}.
Figure 6: The time-frequency plot of the large-deviation probabilities of M_{3,T}.
We introduce the following matrices (see Dahlhaus [4], p. 154, for the detailed definition):

W_T(φ) = (S/N) Σ_{j=1}^{M} K_T^{(j)′} W_T^{(j)}(φ) K_T^{(j)}, (A.2)

where

W_T^{(j)}(φ) = {∫_{−π}^{π} φ(u_j, λ) exp(iλ(k − l)) dλ}_{k,l=1,...,L_j}, (A.3)

and K_T^{(j)} = (0_{j1}, I_{L_j}, 0_{j2}). According to Lemmata 4.4 and 4.7 of Dahlhaus [4], we can see that

‖Σ_T(A, A)‖ ≤ C + o(1),  ‖Σ_T(A, A)^{−1}‖ ≤ C + o(1), (A.4)
and W_T(f_A) and W_T({4π²f_A}^{−1}) are approximations of Σ_T(A, A) and Σ_T(A, A)^{−1}, respectively. We need the following lemmata, which are due to Dahlhaus [3, 4]: Lemma A.2 is Lemma A.5 of Dahlhaus [3], and Lemma A.3 is Theorem 3.2(ii) of Dahlhaus [4].
Lemma A.2. Let k ∈ N, let A_l, B_l fulfill Assumption A.1(i), and let μ_1, μ_2 fulfill Assumption A.1(ii). Let Σ_l = Σ_T(A_l, A_l) or W_T(f_{A_l}). Furthermore, let Γ_l = Σ_T(B_l, B_l) or W_T({4π²}^{−1}f_{B_l}), or Γ_l^{−1} = W_T({4π²f_{B_l}}^{−1}). Then we have

T^{−1} tr{Π_{l=1}^{k} Γ_l^{−1}Σ_l} = (1/2π) ∫_0^1 ∫_{−π}^{π} {Π_{l=1}^{k} f_{A_l}(u, λ)/f_{B_l}(u, λ)} dλ du + O(T^{−1/2} log^{2k+2} T),

T^{−1} μ_{1,T}′ {Π_{l=1}^{k−1} Γ_l^{−1}Σ_l} Γ_k^{−1} μ_{2,T} = (1/2π) ∫_0^1 {Π_{l=1}^{k−1} f_{A_l}(u, 0)/f_{B_l}(u, 0)} f_{B_k}(u, 0)^{−1} μ_1(u)μ_2(u) du + O(T^{−1/2} log^{2k+2} T). (A.5)
Lemma A.3. Let D° be the transfer function of a locally stationary process {Z_{t,T}}, where the corresponding D is bounded from below and has uniformly bounded derivative (∂/∂u)(∂/∂λ)D. f_D(u, λ) ≡ |D(u, λ)|² denotes the time-varying spectral density of Z_{t,T}. Then, for Σ_T(d) ≡ Σ_T(D, D), we have

lim_{T→∞} T^{−1} log|Σ_T(d)| = (1/2π) ∫_0^1 ∫_{−π}^{π} log 2πf_D(u, λ) dλ du. (A.6)
We also remark that if U_T and V_T are real nonnegative symmetric matrices, then

tr{U_T V_T} ≤ tr{U_T} ‖V_T‖. (A.7)
Proof of Theorems 3.3–3.5. We need the cumulant generating function of a quadratic form in normal variables X_T ∼ N(μ_T^{(j)}, Σ_T^{(j)}). It is known that the quadratic form S_T^{(j)} ≡ X_T′H_TX_T + h_T′X_T − E^{(j)}(X_T′H_TX_T + h_T′X_T) has cumulant generating function log E^{(j)}(e^{ωS_T^{(j)}}) equal to

−(1/2) log|Σ_T^{(j)}| − (1/2) log|Σ_T^{(j)−1} − 2ωH_T| − ω tr{H_TΣ_T^{(j)}}
+ (1/2)ω² {h_T + 2H_Tμ_T^{(j)}}′ {Σ_T^{(j)−1} − 2ωH_T}^{−1} {h_T + 2H_Tμ_T^{(j)}} (A.8)
(see Mathai and Provost [11], Theorem 3.2a.3). Theorems 3.3, 3.4, and 3.5 correspond to the respective cases:

H_T(Q) = G_T^{−1}, h_T(Q) = g_T;
H_T(Λ) = (1/2)(Σ_T^{(1)−1} − Σ_T^{(2)−1}), h_T(Λ) = −Σ_T^{(1)−1}μ_T^{(1)} + Σ_T^{(2)−1}μ_T^{(2)};
H_T(M_1) = (1/2)(Σ_T^{(1)−1} − Σ_T^{(2)−1}), h_T(M_1) = 0;
H_T(M_2) = (1/2)(Σ_T^{(1)−1} − Σ_T^{(2)−1}), h_T(M_2) = −Σ_T^{(1)−1}μ_T^{(1)} + Σ_T^{(2)−1}μ_T^{(2)};
H_T(M_3) = (1/2)(Σ_T^{(1)−1} − Σ_T^{(2)−1}), h_T(M_3) = 0. (A.9)
We prove Theorem 3.5 for M_{3,T} (under Π_1) only; Theorems 3.3 and 3.4 are obtained similarly. In order to use the Gärtner-Ellis theorem, consider

ψ_T^{(j)}(ω) ≡ T^{−1} log[E^{(j)}{exp(ωS_T^{(j)}(M_3))}]. (A.10)
Setting H_T = H_T(M_3) and h_T = h_T(M_3) in (A.8), we have under Π_1 the following:

ψ_T^{(1)}(ω) = −(2T)^{−1}[log|Σ_T^{(1)}| − log|Σ_T^{(1)−1} − ω(Σ_T^{(1)−1} − Σ_T^{(2)−1})|
− ω tr{(Σ_T^{(1)−1} − Σ_T^{(2)−1})Σ_T^{(1)}} + ω²μ_T^{(1)′}{Σ_T^{(1)−1} − Σ_T^{(2)−1}}
× {Σ_T^{(1)−1} − ω(Σ_T^{(1)−1} − Σ_T^{(2)−1})}^{−1}{Σ_T^{(1)−1} − Σ_T^{(2)−1}}μ_T^{(1)}]. (A.11)
Using the inequality (A.7), we then replace {Σ_T^{(1)−1} − ω(Σ_T^{(1)−1} − Σ_T^{(2)−1})} by W_T((f_0^{(1)}f_0^{(2)} − ωf^{(1)}{f_0^{(2)} − f_0^{(1)}})/(4π²f^{(1)}f_0^{(1)}f_0^{(2)})), where f_0^{(j)} denotes f^{(j)}(0, λ), j = 1, 2; that is, we obtain the approximation

−(2T)^{−1}[log|Σ_T^{(1)}| − log|W_T((f_0^{(1)}f_0^{(2)} − ωf^{(1)}{f_0^{(2)} − f_0^{(1)}})/(4π²f^{(1)}f_0^{(1)}f_0^{(2)}))|
− ω tr{(Σ_T^{(1)−1} − Σ_T^{(2)−1})Σ_T^{(1)}}
+ ω²μ_T^{(1)′}{Σ_T^{(1)−1} − Σ_T^{(2)−1}}{W_T((f_0^{(1)}f_0^{(2)} − ωf^{(1)}{f_0^{(2)} − f_0^{(1)}})/(4π²f^{(1)}f_0^{(1)}f_0^{(2)}))}^{−1}
× {Σ_T^{(1)−1} − Σ_T^{(2)−1}}μ_T^{(1)}]. (A.12)
In view of Lemmas A.2 and A.3, the above ψ_T^{(1)}(ω) converges to ψ_{M_3}, given in Theorem 3.5. Clearly, ψ_{M_3} exists for ω ∈ D_{ψ_{M_3}} = {ω : 1 − ω{(f^{(1)}(u, λ)/f^{(1)}(0, λ)) − (f^{(1)}(u, λ)/f^{(2)}(0, λ))} > 0} and is convex and continuously differentiable with respect to ω. For a sequence {ω_m} → ω_0 ∈ ∂D_{ψ_{M_3}} as m → ∞, we can show that

∂ψ_{M_3}(ω_m; f^{(1)}, f^{(2)}, μ^{(1)}, μ^{(2)})/∂ω → ∞,  ∂ψ_{M_3}(0; f^{(1)}, f^{(2)}, μ^{(1)}, μ^{(2)})/∂ω = 0. (A.13)

Hence, ψ_{M_3}′(D_{ψ_{M_3}}) ⊃ (x, ∞) for every x > 0. Application of the Gärtner-Ellis theorem completes the proof.
Acknowledgments
The author would like to thank the referees for their many insightful comments, which improved the original version of this paper. The author would also like to thank Professor Masanobu Taniguchi, the lead guest editor of this special issue, for his efforts, and to congratulate him on his sixtieth birthday.
References
[1] J. A. Bucklew, Large Deviation Techniques in Decision, Simulation, and Estimation, Wiley, New York, NY,USA, 1990.
[2] T. Sato, Y. Kakizawa, and M. Taniguchi, “Large deviation results for statistics of short- and long-memory Gaussian processes,” Australian & New Zealand Journal of Statistics, vol. 40, no. 1, pp. 17–29,1998.
[3] R. Dahlhaus, “Maximum likelihood estimation and model selection for locally stationary processes,”Journal of Nonparametric Statistics, vol. 6, no. 2-3, pp. 171–191, 1996.
[4] R. Dahlhaus, “On the Kullback-Leibler information divergence of locally stationary processes,”Stochastic Processes and their Applications, vol. 62, no. 1, pp. 139–168, 1996.
[5] G. Chandler and W. Polonik, “Discrimination of locally stationary time series based on the excessmass functional,” Journal of the American Statistical Association, vol. 101, no. 473, pp. 240–253, 2006.
[6] K. Sakiyama and M. Taniguchi, “Discriminant analysis for locally stationary processes,” Journal ofMultivariate Analysis, vol. 90, no. 2, pp. 282–300, 2004.
[7] J. Hirukawa, “Discriminant analysis for multivariate non-Gaussian locally stationary processes,”Scientiae Mathematicae Japonicae, vol. 60, no. 2, pp. 357–380, 2004.
[8] J. Gärtner, “On large deviations from an invariant measure,” Theory of Probability and its Applications, vol. 22, no. 1, pp. 24–39, 1977.
[9] R. S. Ellis, “Large deviations for a general class of random vectors,” The Annals of Probability, vol. 12,no. 1, pp. 1–12, 1984.
[10] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, New York, NY, USA, 2ndedition, 1984.
[11] A. M. Mathai and S. B. Provost, Quadratic Forms in Random Variables: Theory and Applications, MarcelDekker, New York, NY, USA, 1992.
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 515494, 11 pages
doi:10.1155/2012/515494
Research Article

Asymptotic Optimality of Estimating Function Estimator for CHARN Model
Tomoyuki Amano
Faculty of Economics, Wakayama University, 930 Sakaedani, Wakayama 640-8510, Japan
Correspondence should be addressed to Tomoyuki Amano, [email protected]
Received 15 February 2012; Accepted 9 April 2012
Academic Editor: Hiroshi Shiraishi
Copyright © 2012 Tomoyuki Amano. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The CHARN model is a well-known and important model in finance; it includes many financial time series models and can be assumed for the return processes of assets. One of the most fundamental estimators for financial time series models is the conditional least squares (CL) estimator. However, it was recently shown that the optimal estimating function estimator (G estimator) is better than the CL estimator, in the sense of efficiency, for some time series models. In this paper, we examine the efficiencies of the CL and G estimators for the CHARN model and derive a condition under which the G estimator is asymptotically optimal.
1. Introduction
The conditional least squares (CL) estimator is one of the most fundamental estimators for financial time series models. It has two advantages: it can be calculated with ease, and it does not require knowledge of the innovation process (i.e., the error term). Hence this convenient estimator has been widely used for many financial time series models. However, Amano and Taniguchi [1] proved that it is not efficient for the ARCH model, which is the most famous financial time series model.
The estimating function estimator was introduced by Godambe [2, 3] and Hansen [4]. Recently, Chandra and Taniguchi [5] constructed the optimal estimating function estimator (G estimator) for the parameters of the random coefficient autoregressive (RCA) model, which was introduced to describe occasional sharp spikes exhibited in many fields, and of the ARCH model, based on Godambe's asymptotically optimal estimating function. In Chandra and Taniguchi [5], it was shown by simulation that the G estimator is better than the CL estimator. Furthermore, Amano [6] applied the CL and G estimators to some important time series models (RCA, GARCH, and nonlinear AR models) and proved that the G estimator is
Table 1: MSE of θ_CL and θ_G for the parameter a in (4.1).

                 a = 0.1       a = 0.2       a = 0.3
θ_CL (n = 100)   0.01103311    0.01135393    0.01035005
θ_G  (n = 100)   0.01096804    0.01113519    0.01006857
θ_CL (n = 200)   0.00596135    0.00586717    0.00549567
θ_G  (n = 200)   0.00565094    0.00555699    0.00533862
θ_CL (n = 300)   0.00371269    0.00376673    0.00351314
θ_G  (n = 300)   0.00356603    0.00359798    0.00348829
better than the CL estimator in the sense of efficiency, theoretically. Amano [6] also derived conditions under which the G estimator becomes asymptotically optimal; these conditions are natural and not restrictive.
However, in Amano [6], the G estimator was not applied to the conditional heteroscedastic autoregressive nonlinear (CHARN) model. The CHARN model was proposed by Härdle and Tsybakov [7] and Härdle et al. [8]; it includes many financial time series models and is widely used in finance. Kanai et al. [9] applied the G estimator to the CHARN model and proved its asymptotic normality. However, Kanai et al. [9] did not compare the efficiencies of the CL and G estimators or discuss the asymptotic optimality of the G estimator theoretically. Since the CHARN model is an important and rich model, which includes many financial time series models and can be assumed for the return processes of assets, further investigation of the CL and G estimators for this model is needed. Hence, in this paper, we compare the efficiencies of the CL and G estimators and investigate the asymptotic optimality of the G estimator for this model.
This paper is organized as follows. Section 2 describes the definitions of the CL and G estimators. In Section 3, the CL and G estimators are applied to the CHARN model, and the efficiencies of these estimators are compared. Furthermore, we derive a condition for the asymptotic optimality of the G estimator. We also compare the mean squared errors of θ_CL and θ_G by simulation in Section 4. Proofs of the theorems are relegated to Section 5. Throughout this paper, we use the following notation: |A| denotes the sum of the absolute values of all entries of A.
2. Definitions of CL and G Estimators
One of the most fundamental estimators for the parameters of financial time series models is the conditional least squares (CL) estimator θ_CL introduced by Tjøstheim [10], and it has been widely used in finance. θ_CL for a time series model {X_t} is obtained by minimizing the penalty function

Q_n(θ) ≡ Σ_{t=m+1}^{n} (X_t − E[X_t | F_t(m)])², (2.1)

where F_t(m) is the σ-algebra generated by {X_s : t − m ≤ s ≤ t − 1}, and m is an appropriate positive integer (e.g., if {X_t} follows a kth-order nonlinear autoregressive model, we can take m = k). The CL estimator generally has a simple expression. However, it is not asymptotically optimal in general (see Amano and Taniguchi [1]).
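For concreteness, when E[X_t | F_t(1)] = θX_{t−1} (an AR(1)), the minimizer of Q_n(θ) in (2.1) has a closed form; a small sketch of this special case (our example, not from the paper):

```python
import numpy as np

def cl_estimate_ar1(x):
    """Conditional least squares for X_t = theta X_{t-1} + e_t:
    Q_n(theta) = sum_t (X_t - theta X_{t-1})^2 is minimized at
    theta_hat = sum X_t X_{t-1} / sum X_{t-1}^2."""
    return np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

rng = np.random.default_rng(1)
theta0, n = 0.5, 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = theta0 * x[t - 1] + rng.standard_normal()
print(cl_estimate_ar1(x))  # close to 0.5
```

Note that no knowledge of the innovation distribution is needed, which is exactly the convenience of the CL estimator mentioned above.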
Hence, Chandra and Taniguchi [5] constructed the G estimator θ_G based on Godambe's asymptotically optimal estimating function for the RCA and ARCH models. For the definition of θ_G, we prepare the following estimating function G(θ). Let {X_t} be a stochastic process depending on the k-dimensional parameter θ_0; then G(θ) is given by

G(θ) = Σ_{t=1}^{n} a_{t−1}h_t, (2.2)

where a_{t−1} is a k-dimensional vector depending on X_1, ..., X_{t−1} and θ, h_t = X_t − E[X_t | F_{t−1}], and F_{t−1} is the σ-field generated by {X_s, s ≤ t − 1}. The estimating function estimator θ_E for the parameter θ_0 is defined as the solution of

G(θ_E) = 0. (2.3)
Chandra and Taniguchi [5] derived the asymptotic variance of √n(θ_E − θ_0) as

(n^{−1}E[(∂/∂θ′)G(θ_0)])^{−1} (n^{−1}E[G(θ_0)G′(θ_0)]) (n^{−1}E[(∂/∂θ′)G(θ_0)]′)^{−1} (2.4)

and gave the following lemma by extending the result of Godambe [3].
and gave the following lemma by extending the result of Godambe [3].
Lemma 2.1. The asymptotic variance (2.4) is minimized if G(θ) = G*(θ), where

G*(θ) = Σ_{t=1}^{n} a*_{t−1}h_t,  a*_{θ,t−1} = E[(∂h_t/∂θ) | F_{t−1}] E[h_t² | F_{t−1}]^{−1}. (2.5)
Based on the estimating function G*(θ) in Lemma 2.1, Chandra and Taniguchi [5] constructed the G estimator θ_G for the parameters of the RCA and ARCH models and showed by simulation that θ_G is better than θ_CL. Furthermore, Amano [6] applied θ_G to some important financial time series models (RCA, GARCH, and nonlinear AR models) and showed theoretically that θ_G is better than θ_CL in the sense of efficiency. Amano [6] also derived conditions under which θ_G becomes asymptotically optimal. However, in Amano [6], θ_CL and θ_G were not applied to the CHARN model, which includes many important financial time series models. Hence, in the next section, we apply θ_CL and θ_G to this model and prove that θ_G is better than θ_CL in the sense of efficiency for this model. Furthermore, conditions for the asymptotic optimality of θ_G are also derived.
3. CL and G Estimators for CHARN Model
In this section, we discuss the asymptotics of θ_CL and θ_G for the CHARN model.
The CHARN model of order m is defined as

X_t = F_θ(X_{t−1}, ..., X_{t−m}) + H_θ(X_{t−1}, ..., X_{t−m})u_t, (3.1)

where F_θ, H_θ : R^m → R are measurable functions, and {u_t} is a sequence of i.i.d. random variables with Eu_t = 0, E[u_t²] = 1, independent of {X_s; s < t}. Here, the parameter vector θ = (θ_1, ..., θ_k)′ is assumed to lie in an open set Θ ⊂ R^k. Its true value is denoted by θ_0.

First we estimate the true parameter θ_0 of (3.1) by use of θ_CL, which is obtained by minimizing the penalty function

Q_n(θ) = Σ_{t=m+1}^{n} (X_t − E[X_t | F_t(m)])² = Σ_{t=m+1}^{n} (X_t − F_θ)². (3.2)
For the asymptotics of θCL, we impose the following assumptions.
Assumption 3.1. (1) u_t has a probability density function f(u) > 0 a.e. u ∈ R.
(2) There exist constants a_i ≥ 0, b_i ≥ 0, 1 ≤ i ≤ m, such that for x ∈ R^m with |x| → ∞,

|F_θ(x)| ≤ Σ_{i=1}^{m} a_i|x_i| + o(|x|),  |H_θ(x)| ≤ Σ_{i=1}^{m} b_i|x_i| + o(|x|). (3.3)

(3) H_θ(x) is continuous and symmetric on R^m, and there exists a positive constant λ such that

H_θ(x) ≥ λ for all x ∈ R^m. (3.4)

(4) The following holds:

Σ_{i=1}^{m} a_i + E|u_1| Σ_{i=1}^{m} b_i < 1. (3.5)
Assumption 3.1 makes {X_t} strictly stationary and ergodic (see [11]). We further impose the following.

Assumption 3.2. For all θ ∈ Θ,

E_θ|F_θ(X_{t−1}, ..., X_{t−m})|² < ∞,  E_θ|H_θ(X_{t−1}, ..., X_{t−m})|² < ∞. (3.6)
Assumption 3.3. (1) F_θ and H_θ are almost surely twice continuously differentiable in Θ, and their derivatives ∂F_θ/∂θ_j and ∂H_θ/∂θ_j, j = 1, ..., k, satisfy the condition that there exist square-integrable functions A_j and B_j such that

|∂F_θ/∂θ_j| ≤ A_j,  |∂H_θ/∂θ_j| ≤ B_j, (3.7)

for all θ ∈ Θ.
(2) f(u) satisfies

lim_{|u|→∞} |u|f(u) = 0,  ∫u²f(u)du = 1. (3.8)

(3) The continuous derivative f′(u) ≡ ∂f(u)/∂u exists on R and satisfies

∫(f′/f)⁴ f(u)du < ∞,  ∫u²(f′/f)² f(u)du < ∞. (3.9)
From Tjøstheim [10], the following lemma holds.

Lemma 3.4. Under Assumptions 3.1, 3.2, and 3.3, θ_CL has the following asymptotic normality:

√n(θ_CL − θ_0) →_d N(0, U^{−1}WU^{−1}), (3.10)

where

W = E[H²_{θ_0} (∂/∂θ)F_{θ_0} (∂/∂θ′)F_{θ_0}],  U = E[(∂/∂θ)F_{θ_0} (∂/∂θ′)F_{θ_0}]. (3.11)
Next, we apply θ_G to the CHARN model. From Lemma 2.1, θ_G is obtained by solving the equation

Σ_{t=m+1}^{n} (1/H²_θ) (∂/∂θ)F_θ (X_t − F_θ) = 0. (3.12)
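In the special case where F_θ is linear in θ and H_θ does not depend on θ (as in the simulation model of Section 4), equation (3.12) is linear in θ and can be solved in closed form; a sketch under these assumptions (ours, not code from the paper):

```python
import numpy as np

def H2(x_prev):
    # conditional variance H(x)^2; here the Section 4 form, assumed known
    return 0.2 + 0.1 * x_prev ** 2

def g_estimate(x):
    """Solve (3.12) for X_t = theta X_{t-1} + H(X_{t-1}) u_t:
    sum_t (x_{t-1} / H^2(x_{t-1})) (x_t - theta x_{t-1}) = 0,
    which is linear in theta (a weighted least squares condition)."""
    w = 1.0 / H2(x[:-1])
    return np.sum(w * x[1:] * x[:-1]) / np.sum(w * x[:-1] ** 2)

rng = np.random.default_rng(2)
a0, n = 0.3, 3000
x = np.zeros(n)
for t in range(1, n):
    x[t] = a0 * x[t - 1] + np.sqrt(H2(x[t - 1])) * rng.standard_normal()
print(g_estimate(x))  # close to 0.3
```

For a nonlinear F_θ, the same equation would instead be solved by a numerical root search in θ.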
For the asymptotics of θ_G, we impose the following assumptions.

Assumption 3.5. (1) For all θ ∈ Θ,

E_θ‖a*_{θ,t−1}‖² < ∞. (3.13)

(2) a*_{θ,t−1} is almost surely twice continuously differentiable in Θ, and for the derivatives ∂a*_{θ,t−1}/∂θ_j, j = 1, ..., k, there exist square-integrable functions C_j such that

|∂a*_{θ,t−1}/∂θ_j| ≤ C_j, (3.14)

for all θ ∈ Θ.
(3) V = E[(1/H²_{θ_0})(∂/∂θ)F_{θ_0}(∂/∂θ′)F_{θ_0}] is a k × k positive definite matrix and satisfies

|V| < ∞. (3.15)

(4) For θ ∈ B (a neighborhood of θ_0 in Θ), there exist integrable functions P^{ijl}_θ(X^{(t−1)}), Q^{ijl}_θ(X^{(t−1)}), and R^{ijl}_θ(X^{(t−1)}) such that

|(∂²/∂θ_j∂θ_l)(a*_{θ,t−1})_i h_t| ≤ P^{ijl}_θ(X^{(t−1)}),
|(∂/∂θ_j)(a*_{θ,t−1})_i (∂/∂θ_l)h_t| ≤ Q^{ijl}_θ(X^{(t−1)}),
|(a*_{θ,t−1})_i (∂²/∂θ_j∂θ_l)h_t| ≤ R^{ijl}_θ(X^{(t−1)}), (3.16)

for i, j, l = 1, ..., k, where X^{(t−1)} = (X_1, ..., X_{t−1}).

From Kanai et al. [9], the following lemma holds.
Lemma 3.6. Under Assumptions 3.1, 3.2, 3.3, and 3.5, the following holds:

√n(θ_G − θ_0) →_d N(0, V^{−1}). (3.17)
Finally, we compare the efficiencies of θ_CL and θ_G. We give the following theorem.

Theorem 3.7. Under Assumptions 3.1, 3.2, 3.3, and 3.5, the following inequality holds:

U^{−1}WU^{−1} ≥ V^{−1}, (3.18)

and equality holds if and only if H_{θ_0} is constant or ∂F_{θ_0}/∂θ = 0 (for matrices A and B, A ≥ B means A − B is positive definite).
This theorem is proved using the Kholevo inequality (see Kholevo [12]). From this theorem, we can see that the asymptotic variance of θ_G is smaller than that of θ_CL, and the condition under which these asymptotic variances coincide is strict. Therefore, θ_G is better than θ_CL in the sense of efficiency. Hence, we evaluate the condition under which θ_G is asymptotically optimal, based on local asymptotic normality (LAN). LAN is the concept of local asymptotic normality for the likelihood ratio of general statistical models, which was established by Le Cam [13]. Once LAN is established, the asymptotic optimality of estimators and tests can be described in terms of the LAN property. In particular, the Fisher information matrix Γ is described in terms of LAN, and the asymptotic variance of an estimator has the lower bound Γ^{−1}. Now, we prepare the following lemma, which is due to Kato et al. [14].
Lemma 3.8. Under Assumptions 3.1, 3.2, and 3.3, the CHARN model has LAN, and its Fisher information matrix Γ is

Γ = E[(1/H²_{θ_0}) (−∂H_{θ_0}/∂θ, ∂F_{θ_0}/∂θ) (a c; c b) (−∂H_{θ_0}/∂θ′; ∂F_{θ_0}/∂θ′)], (3.19)

where

a_t = u_t(f′(u_t)/f(u_t)) + 1,  b_t = −f′(u_t)/f(u_t),
a = E[a_t²],  b = E[b_t²],  c = E[a_tb_t]. (3.20)
From this lemma, the asymptotic variance V^{−1} of θ_G has the lower bound Γ^{−1}, that is,

V^{−1} ≥ Γ^{−1}. (3.21)

The next theorem gives the condition under which V^{−1} equals Γ^{−1}, that is, θ_G becomes asymptotically optimal.

Theorem 3.9. Under Assumptions 3.1, 3.2, 3.3, and 3.5, if ∂H_{θ_0}/∂θ = 0 and u_t is Gaussian, then θ_G is asymptotically optimal, that is,

V^{−1} = Γ^{−1}. (3.22)
Finally, we give the following example, which satisfies the assumptions of Theorems 3.7 and 3.9.

Example 3.10. The CHARN model includes the following nonlinear AR model:

X_t = F_θ(X_{t−1}, ..., X_{t−m}) + u_t, (3.23)

where F_θ : R^m → R is a measurable function and {u_t} is a sequence of i.i.d. random variables with Eu_t = 0, E[u_t²] = 1, independent of {X_s; s < t}, and we assume Assumptions 3.1, 3.2, 3.3, and 3.5 (for example, we may take F_θ = √(a_0 + a_1X²_{t−1} + ··· + a_mX²_{t−m}), where a_0 > 0, a_j ≥ 0, j = 1, ..., m, and Σ_{j=1}^{m} a_j < 1). In Amano [6], it was shown that the asymptotic variance of θ_CL attains that of θ_G. Amano [6] also showed that, under the condition that u_t is Gaussian, θ_G is asymptotically optimal.
4. Numerical Studies
In this section, we evaluate the accuracies of θ_CL and θ_G for the parameter of the CHARN model by simulation. Throughout this section, we assume the following model:

X_t = aX_{t−1} + √(0.2 + 0.1X²_{t−1}) u_t, (4.1)

where {u_t} ∼ i.i.d. N(0, 1). Mean squared errors (MSEs) of θ_CL and θ_G for the parameter a are reported in Table 1. The simulations are based on 1000 realizations, and we set the parameter value a and the length of observations n as a = 0.1, 0.2, 0.3 and n = 100, 200, 300.
From Table 1, we can see that the MSE of θ_G is smaller than that of θ_CL. Furthermore, the MSEs of θ_CL and θ_G decrease as the length of observations n increases.
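A reduced version of this experiment (our sketch, with fewer replications than the 1000 used above) reproduces the qualitative finding. Both estimators have closed forms here, since F_θ is linear in a and H does not depend on a:

```python
import numpy as np

def simulate(a, n, rng):
    # X_t = a X_{t-1} + sqrt(0.2 + 0.1 X_{t-1}^2) u_t, u_t ~ N(0, 1)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + np.sqrt(0.2 + 0.1 * x[t - 1] ** 2) * rng.standard_normal()
    return x

def mse_cl_g(a, n, reps=500, seed=0):
    rng = np.random.default_rng(seed)
    err_cl, err_g = [], []
    for _ in range(reps):
        x = simulate(a, n, rng)
        lag, cur = x[:-1], x[1:]
        w = 1.0 / (0.2 + 0.1 * lag ** 2)          # optimal weights 1 / H^2
        cl = np.sum(cur * lag) / np.sum(lag ** 2)  # conditional least squares
        g = np.sum(w * cur * lag) / np.sum(w * lag ** 2)  # G estimator
        err_cl.append((cl - a) ** 2)
        err_g.append((g - a) ** 2)
    return float(np.mean(err_cl)), float(np.mean(err_g))

mse_cl, mse_g = mse_cl_g(a=0.2, n=200)
print(mse_cl, mse_g)  # the G estimator's MSE should typically be slightly smaller
```

The improvement is modest for this model, consistent with the small differences reported in Table 1.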
5. Proofs
This section provides the proofs of the theorems. First, we prepare the following lemma, used to compare the asymptotic variances of the CL and G estimators (see Kholevo [12]).

Lemma 5.1. Let ψ(ω) and φ(ω) be r × s and t × s random matrices, respectively, and let h(ω) be a random variable that is positive everywhere. If the matrix E[φφ′/h]^{−1} exists, then the following inequality holds:

E[ψψ′h] ≥ E[ψφ′] E[φφ′/h]^{−1} E[ψφ′]′. (5.1)

The equality holds if and only if there exists a constant r × t matrix C such that

hψ + Cφ = 0 a.e. (5.2)
Now we proceed to prove Theorem 3.7.
Proof of Theorem 3.7. Let ψ = (∂/∂θ)F_{θ_0}, φ = (∂/∂θ)F_{θ_0}, and h = H²_{θ_0}; then, from the definitions of the matrices U, W, and V, we can write

U = E[ψφ′],  W = E[ψψ′h],  V = E[φφ′/h]. (5.3)

Hence, from Lemma 5.1, we can see that

W ≥ UV^{−1}U. (5.4)

From this inequality, we obtain

U^{−1}WU^{−1} ≥ V^{−1}. (5.5)
Proof of Theorem 3.9. The Fisher information matrix Γ of the CHARN model based on LAN can be represented as

Γ = E[(1/H²_{θ_0}) (−∂H_{θ_0}/∂θ, ∂F_{θ_0}/∂θ) (a c; c b) (−∂H_{θ_0}/∂θ′; ∂F_{θ_0}/∂θ′)]
= E[(1/H²_{θ_0}) (a(∂H_{θ_0}/∂θ)(∂H_{θ_0}/∂θ′) − c(∂F_{θ_0}/∂θ)(∂H_{θ_0}/∂θ′) − c(∂H_{θ_0}/∂θ)(∂F_{θ_0}/∂θ′) + b(∂F_{θ_0}/∂θ)(∂F_{θ_0}/∂θ′))]
= E[(1/H²_{θ_0}) (a(∂H_{θ_0}/∂θ)(∂H_{θ_0}/∂θ′) − c(∂F_{θ_0}/∂θ)(∂H_{θ_0}/∂θ′) − c(∂H_{θ_0}/∂θ)(∂F_{θ_0}/∂θ′))]
+ E[(f′(u_t)/f(u_t))²] E[(1/H²_{θ_0})(∂F_{θ_0}/∂θ)(∂F_{θ_0}/∂θ′)], (5.6)

since b = E[(f′(u_t)/f(u_t))²].
From (5.6), if ∂H_{θ_0}/∂θ = 0, Γ becomes

Γ = E[(f′(u_t)/f(u_t))²] E[(1/H²_{θ_0})(∂F_{θ_0}/∂θ)(∂F_{θ_0}/∂θ′)]. (5.7)
Next, we show that E[(f′(u_t)/f(u_t))²] = 1 under the Gaussianity of u_t. From the Schwarz inequality, we obtain

E[(f′(u_t)/f(u_t))²] = E[u_t²] E[(f′(u_t)/f(u_t))²]
≥ (E[u_t f′(u_t)/f(u_t)])²
= (∫_{−∞}^{∞} x f′(x) dx)²
= ([x f(x)]_{−∞}^{∞} − ∫_{−∞}^{∞} f(x) dx)²
= 1, (5.8)

where the last step uses integration by parts together with Assumption 3.3(2).
The equality holds if and only if there exists some constant c such that

cx = f′(x)/f(x). (5.9)

Equation (5.9) yields, for some constant k,

cx = (log f(x))′,  (c/2)x² + k = log f(x),  f(x) = e^{(c/2)x²+k} = e^k e^{(c/2)x²}. (5.10)

Hence, c = −1, and f(x) becomes the density function of the standard normal distribution.
Acknowledgment
The author would like to thank the referees for their comments, which improved the original version of this paper.
References
[1] T. Amano and M. Taniguchi, “Asymptotic efficiency of conditional least squares estimators for ARCHmodels,” Statistics & Probability Letters, vol. 78, no. 2, pp. 179–185, 2008.
[2] V. P. Godambe, “An optimum property of regular maximum likelihood estimation,” Annals ofMathematical Statistics, vol. 31, pp. 1208–1211, 1960.
[3] V. P. Godambe, “The foundations of finite sample estimation in stochastic processes,” Biometrika, vol.72, no. 2, pp. 419–428, 1985.
[4] L. P. Hansen, “Large sample properties of generalized method of moments estimators,” Econometrica,vol. 50, no. 4, pp. 1029–1054, 1982.
[5] S. A. Chandra and M. Taniguchi, “Estimating functions for nonlinear time series models,” Annals ofthe Institute of Statistical Mathematics, vol. 53, no. 1, pp. 125–141, 2001.
[6] T. Amano, “Asymptotic efficiency of estimating function estimators for nonlinear time series models,”Journal of the Japan Statistical Society, vol. 39, no. 2, pp. 209–231, 2009.
[7] W. Härdle and A. Tsybakov, “Local polynomial estimators of the volatility function in nonparametric autoregression,” Journal of Econometrics, vol. 81, no. 1, pp. 223–242, 1997.
[8] W. Härdle, A. Tsybakov, and L. Yang, “Nonparametric vector autoregression,” Journal of Statistical Planning and Inference, vol. 68, no. 2, pp. 221–245, 1998.
[9] H. Kanai, H. Ogata, and M. Taniguchi, “Estimating function approach for CHARN models,” Metron,vol. 68, pp. 1–21, 2010.
[10] D. Tjøstheim, “Estimation in nonlinear time series models,” Stochastic Processes and Their Applications,vol. 21, no. 2, pp. 251–273, 1986.
[11] Z. Lu and Z. Jiang, “L1 geometric ergodicity of a multivariate nonlinear AR model with an ARCHterm,” Statistics & Probability Letters, vol. 51, no. 2, pp. 121–130, 2001.
[12] A. S. Kholevo, “On estimates of regression coefficients,” Theory of Probability and its Applications, vol.14, pp. 79–104, 1969.
[13] L. Le Cam, “Locally asymptotically normal families of distributions. Certain approximations tofamilies of distributions and their use in the theory of estimation and testing hypotheses,” vol. 3,pp. 37–98, 1960.
[14] H. Kato, M. Taniguchi, and M. Honda, “Statistical analysis for multiplicatively modulated nonlinearautoregressive model and its applications to electrophysiological signal analysis in humans,” IEEETransactions on Signal Processing, vol. 54, pp. 3414–3425, 2006.
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 973173, 8 pages
doi:10.1155/2012/973173
Research Article

Optimal Portfolio Estimation for Dependent Financial Returns with Generalized Empirical Likelihood
Hiroaki Ogata
School of International Liberal Studies, Waseda University, Tokyo 169-8050, Japan
Correspondence should be addressed to Hiroaki Ogata, [email protected]
Received 16 February 2012; Accepted 10 April 2012
Academic Editor: Junichi Hirukawa
Copyright © 2012 Hiroaki Ogata. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper proposes using the method of generalized empirical likelihood to find the optimal portfolio weights. The log-returns of assets are modeled by multivariate stationary processes rather than i.i.d. sequences. The variance of the portfolio is expressed in terms of the spectral density matrix, and we seek the portfolio weights that minimize it.
1. Introduction
Modern portfolio theory has been developed since circa the 1950s. It is common knowledge that Markowitz [1, 2] is a pioneer in this field. He introduced the so-called mean-variance theory, in which we maximize the expected return under a constant variance (or minimize the variance under a constant expected return). Many researchers have followed, and portfolio theory has been greatly improved. For a comprehensive survey of this field, refer to Elton et al. [3], for example.

Despite its sophisticated paradigm, there are several criticisms of the early portfolio theory. One of them is that it blindly assumes that asset returns are normally distributed. As Mandelbrot [4] pointed out, price changes in the financial market do not seem to be normally distributed. Therefore, it is appropriate to use nonparametric estimation methods to find the optimal portfolio. Furthermore, it is empirically observed that financial returns are dependent; therefore, it is unreasonable to fit an independent model to them.
One of the nonparametric techniques that has been capturing the spotlight recently is the empirical likelihood method. It was originally proposed by Owen [5, 6] as a method of inference based on a data-driven likelihood ratio function. Smith [7] and Newey and Smith [8] extended it to the generalized empirical likelihood (GEL). GEL can also be considered an alternative to the generalized method of moments (GMM), and it is known that its asymptotic bias does not grow with the number of moment restrictions, while the bias of GMM often does.
From the above point of view, we consider finding the optimal portfolio weights by using the GEL method under multivariate stationary processes. The optimal portfolio weights are defined as the weights that minimize the variance of the return process under a constant mean. The analysis is done in the frequency domain.
This paper is organized as follows. Section 2 explains a frequency domain estimating function. In Section 3, we review the GEL method and mention the related asymptotic theory. Monte Carlo simulations and a real-data example are given in Section 4. Throughout this paper, A′ and A* indicate the transposition and the adjoint of a matrix A, respectively.
2. Frequency Domain Estimating Function
Here, we are concerned with the m-dimensional stationary process {X(t)}_{t∈Z} with mean vector 0, autocovariance matrix

Γ(h) = Cov[X(t + h), X(t)] = E[X(t + h)X(t)′], (2.1)

and spectral density matrix

f(λ) = (1/2π) Σ_{h=−∞}^{∞} Γ(h)e^{−ihλ},  −π ≤ λ < π. (2.2)
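For a scalar AR(1), definition (2.2) can be checked against the closed-form spectral density by truncating the sum over lags; a small numerical sketch (our illustration):

```python
import numpy as np

phi = 0.6        # AR(1): x(t) = phi * x(t-1) + e(t), Var e(t) = 1
lam = 1.0        # a frequency in [-pi, pi)
H = 200          # truncation point for the sum over lags

# autocovariances of a stationary AR(1): Gamma(h) = phi^|h| / (1 - phi^2)
h = np.arange(-H, H + 1)
gamma = phi ** np.abs(h) / (1.0 - phi ** 2)

# definition (2.2), truncated, versus the closed form 1 / (2 pi |1 - phi e^{-i lam}|^2)
f_sum = np.real(np.sum(gamma * np.exp(-1j * h * lam))) / (2.0 * np.pi)
f_closed = 1.0 / (2.0 * np.pi * np.abs(1.0 - phi * np.exp(-1j * lam)) ** 2)
print(f_sum, f_closed)  # the two values agree
```

The truncation error decays geometrically in H, so H = 200 is far more than enough here.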
Suppose that information about a p-dimensional parameter of interest θ ∈ Θ ⊂ R^p exists through a system of general estimating equations in the frequency domain, as follows. Let φ_j(λ; θ), j = 1, ..., q (q ≥ p), be m × m matrix-valued continuous functions on [−π, π) satisfying φ_j(λ; θ) = φ_j(λ; θ)* and φ_j(−λ; θ) = φ_j(λ; θ)′. We assume that each φ_j(λ; θ) satisfies the spectral moment condition

∫_{−π}^{π} tr{φ_j(λ; θ_0)f(λ)} dλ = 0  (j = 1, ..., q), (2.3)

where θ_0 = (θ_{10}, ..., θ_{p0})′ is the true value of the parameter. By taking an appropriate function for φ_j(λ; θ), (2.3) can express the best portfolio weights, as shown in Example 2.1.
Example 2.1 (portfolio selection). In this example, we set m = p = q. Let x_i(t) be the log-return of the ith asset (i = 1, ..., m) at time t, and suppose that the process {X(t) = (x_1(t), ..., x_m(t))′} is stationary with zero mean. Consider the portfolio p(t) = Σ_{i=1}^{m} θ_i x_i(t), where θ = (θ_1, ..., θ_m)′ is a vector of weights satisfying Σ_{i=1}^{m} θ_i = 1. The process {p(t)} is a linear combination of the stationary process, hence {p(t)} is still stationary, and, from Herglotz's theorem, its variance is

Var{p(t)} = θ′ Var{X(t)} θ = θ′ {∫_{−π}^{π} f(λ) dλ} θ. (2.4)

Our aim is to find the weights θ_0 = (θ_{10}, ..., θ_{m0})′ that minimize the variance (the risk) of the portfolio p(t) under the constraint Σ_{i=1}^{m} θ_i = 1. The Lagrange function is given by

L(θ, ξ) = θ′ {∫_{−π}^{π} f(λ) dλ} θ + ξ(θ′e − 1), (2.5)

where e = (1, 1, ..., 1)′ and ξ is the Lagrange multiplier. The first order condition leads to

(I − eθ_0′) [∫_{−π}^{π} {f(λ) + f(λ)′} dλ] θ_0 = 0, (2.6)

where I is an identity matrix. Now, for fixed j = 1, ..., m, consider taking φ_j(λ; θ) with entries

2θ_j(1 − θ_j) for the (j, j)th component,
1 − 2θ_jθ_ℓ for the (j, ℓ)th and (ℓ, j)th components with ℓ = 1, ..., m and ℓ ≠ j,
−2θ_kθ_ℓ for the (k, ℓ)th component with k, ℓ = 1, ..., m and k ≠ j, ℓ ≠ j. (2.7)

Then, (2.3) coincides with the first order condition (2.6), which implies that the best portfolio weights can be found within the framework of the spectral moment condition.
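By Herglotz's theorem, ∫_{−π}^{π} f(λ)dλ = Var{X(t)} ≡ Σ, so the weights solving (2.5)-(2.6) are the global minimum-variance weights θ_0 = Σ^{−1}e / (e′Σ^{−1}e). A sketch with a hypothetical 3-asset covariance matrix (our numbers, for illustration only):

```python
import numpy as np

# Sigma plays the role of int_{-pi}^{pi} f(lam) d lam = Var{X(t)}
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
e = np.ones(3)

w = np.linalg.solve(Sigma, e)
theta0 = w / (e @ w)   # minimum-variance weights; they sum to 1
print(theta0)

# check the first order condition (2.6): (I - e theta0')(Sigma + Sigma') theta0 = 0
foc = (np.eye(3) - np.outer(e, theta0)) @ (Sigma + Sigma.T) @ theta0
print(np.max(np.abs(foc)))  # numerically zero
```

Since Σθ_0 is proportional to e, the first order condition holds exactly, and the resulting portfolio variance θ_0′Σθ_0 is no larger than that of any single asset.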
Besides, we can express other important indices for time series. In what follows, severalexamples are given.
Example 2.2 (autocorrelation). Denote the autocovariance and the autocorrelation of the process {xi(t)} (the ith component of {X(t)}) at lag h by γi(h) and ρi(h), respectively. Suppose that we are interested in the joint estimation of ρi(h) and ρj(k). Take

φ1(λ;θ) = cos(hλ) − θ1 for the (i, i)th component, and 0 otherwise,
φ2(λ;θ) = cos(kλ) − θ2 for the (j, j)th component, and 0 otherwise.   (2.8)
Then, (2.3) leads to

θ10 = {∫_{−π}^{π} exp(ihλ) fii(λ) dλ} {∫_{−π}^{π} fii(λ) dλ}^{−1},
θ20 = {∫_{−π}^{π} exp(ikλ) fjj(λ) dλ} {∫_{−π}^{π} fjj(λ) dλ}^{−1}.   (2.9)
From Herglotz’s theorem, θ10 = γi(h)/γi(0), and θ20 = γj(k)/γj(0). Then, θ0 = (θ10, θ20)′
corresponds to the desired autocorrelations ρ = (ρi(h), ρj(k))′. The idea can be directly
extended to more than two autocorrelations.
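A univariate sketch of this example: substituting the periodogram for f in (2.9) over the Fourier frequencies reproduces the usual sample autocorrelation up to a circularity term of order 1/n. The AR(1) data below are simulated purely for illustration:

```python
import numpy as np

# Sketch of Example 2.2: lag-h autocorrelation recovered spectrally as
# rho(h) = { sum exp(ih*lam) I(lam) } / { sum I(lam) } over Fourier frequencies.
rng = np.random.default_rng(0)
n, phi = 2000, 0.5
x = np.zeros(n)
for t in range(1, n):                                # zero-mean AR(1), true rho(1) = 0.5
    x[t] = phi * x[t - 1] + rng.standard_normal()

lam = 2 * np.pi * np.arange(n) / n                   # Fourier frequencies
I = np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * n)     # periodogram I_n(lambda_t)

h = 1
rho_spec = np.real(np.sum(np.exp(1j * h * lam) * I) / np.sum(I))

# direct time-domain estimate for comparison
rho_acf = np.sum(x[:-h] * x[h:]) / np.sum(x * x)
print(round(rho_spec, 3), round(rho_acf, 3))         # the two nearly coincide
```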
Example 2.3 (Whittle estimation). In this example, we set p = q. Consider fitting a parametric spectral density model fθ(λ) to the true spectral density f(λ). The disparity between fθ(λ) and f(λ) is measured by the criterion

D(fθ, f) ≡ ∫_{−π}^{π} [log det fθ(λ) + tr{fθ(λ)^{−1} f(λ)}] dλ,   (2.10)

which is based on the Whittle likelihood. The purpose here is to seek the quasi-true value θ∗ defined by

θ∗ = arg min_{θ∈Θ} D(fθ, f).   (2.11)
Assume that the spectral density model is expressed in the form

fθ(λ) = {Σ_{j=0}^{∞} Bθ(j) exp(ijλ)} K {Σ_{j=0}^{∞} Bθ(j) exp(ijλ)}∗,   (2.12)
where each Bθ(j) is an m × m matrix (Bθ(0) is defined as the identity matrix), and K is an m × m symmetric matrix. The general linear process has a spectral density of this form, so this assumption is not very restrictive. The key point of this assumption is that the elements of the parameter θ do not depend on K; we call such a parameter innovation-free. Suppose, for example, that a VARMA process is fitted. Innovation-freeness implies that the elements of θ depend only on the AR or MA coefficients and not on the covariance matrix of the innovation process. Now, let us consider the equation
∂/∂θ D(fθ, f) |_{θ=θ∗} = 0,   (2.13)

to find the quasi-true value. Kolmogorov's formula states that

det K = exp[ (1/2π) ∫_{−π}^{π} log det{2π fθ(λ)} dλ ].   (2.14)
This implies that if θ is innovation-free, the quantity ∫_{−π}^{π} log det{fθ(λ)} dλ is independent of θ, and (2.13) leads to

∂/∂θ ∫_{−π}^{π} tr{fθ(λ)^{−1} f(λ)} dλ |_{θ=θ∗} = 0.   (2.15)
This corresponds to (2.3) when we set

φj(λ;θ) = ∂fθ(λ)^{−1}/∂θj   (j = 1, . . . , p),   (2.16)

so the quasi-true value can be expressed by the spectral moment condition.
Based on the form of (2.3), we set the estimating function for θ as

m(λt;θ) = (tr{φ1(λt;θ) In(λt)}, . . . , tr{φq(λt;θ) In(λt)})′,   (2.17)
where In(λ) is the periodogram, defined by

In(λ) = (2πn)^{−1} {Σ_{t=1}^{n} X(t) exp(itλ)} {Σ_{t=1}^{n} X(t) exp(itλ)}∗,   (2.18)
with λt = 2πt/n (t = −[(n − 1)/2], . . . , [n/2]). Then, we have

(2π/n) Σ_{t=−[(n−1)/2]}^{[n/2]} E[m(λt;θ0)] −→ [∫_{−π}^{π} tr{φj(λ;θ0) f(λ)} dλ]_{j=1,...,q} = 0.   (2.19)
3. Generalized Empirical Likelihood
Once the estimating function is constructed, we can make use of the GEL method, as in the work of Smith [7] and Newey and Smith [8]. GEL was introduced as an alternative to GMM; its asymptotic bias does not grow with the number of moment restrictions, while the bias of GMM often does.
To describe GEL, let ρ(v) be a function of a scalar v which is concave on its domain, an open interval V containing zero, and let Λn(θ) = {λ : λ′m(λt;θ) ∈ V, t = 1, . . . , n}. The estimator is the solution to the saddle point problem

θGEL = arg min_{θ∈Θ} sup_{λ∈Λn(θ)} Σ_{t=1}^{n} ρ(λ′m(λt;θ)).   (3.1)
The empirical likelihood (EL) estimator (cf. [9]), the exponential tilting (ET) estimator (cf. [10]), and the continuous updating estimator (CUE) (cf. [11]) are special cases, with ρ(v) = log(1 − v), ρ(v) = −e^v, and ρ(v) = −(1 + v)²/2, respectively. Let Ω = E[m(λt;θ0) m(λt;θ0)′],
Table 1: Estimated autocorrelations for the two-dimensional AR(1) model.

          EL               ET               CUE
          Mean     s.d.    Mean     s.d.    Mean     s.d.
ρ1(1)     0.3779   0.0890  0.3760   0.0900  0.3797   0.0911
ρ2(1)     0.4680   0.0832  0.4660   0.0817  0.4650   0.0855
G = E[∂m(λt;θ0)/∂θ], and Σ = (G′Ω−1G)−1. The following assumptions and theorems aredue to Newey and Smith [8].
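As a brief numerical aside (an observation, not taken from the text), the three carrier functions ρ listed above share the standard GEL normalization ρ′(0) = ρ″(0) = −1, which is what makes the resulting estimators first-order comparable:

```python
import numpy as np

# Sketch: the three GEL carrier functions named above (EL, ET, CUE).
# All are concave near zero and satisfy rho'(0) = rho''(0) = -1,
# checked here by central finite differences.
rho = {
    "EL":  lambda v: np.log(1.0 - v),        # empirical likelihood (v < 1)
    "ET":  lambda v: -np.exp(v),             # exponential tilting
    "CUE": lambda v: -(1.0 + v) ** 2 / 2.0,  # continuous updating
}

eps = 1e-5
derivs = {}
for name, r in rho.items():
    d1 = (r(eps) - r(-eps)) / (2 * eps)            # central first difference
    d2 = (r(eps) - 2 * r(0.0) + r(-eps)) / eps**2  # central second difference
    derivs[name] = (d1, d2)
    print(name, round(float(d1), 4), round(float(d2), 4))  # prints -1.0 -1.0 for each
```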
Assumption 3.1. (i) θ0 ∈ Θ is the unique solution to (2.3).
(ii) Θ is compact.
(iii) m(λt;θ) is continuous at each θ ∈ Θ with probability one.
(iv) E[sup_{θ∈Θ} ||m(λt;θ)||^α] < ∞ for some α > 2.
(v) Ω is nonsingular.
(vi) ρ(v) is twice continuously differentiable in a neighborhood of zero.
Theorem 3.2. Let Assumption 3.1 hold. Then θGEL →p θ0.
Assumption 3.3. (i) θ0 ∈ int(Θ).
(ii) m(λt;θ) is continuously differentiable in a neighborhood N of θ0, and E[sup_{θ∈N} ||∂m(λt;θ)/∂θ′||] < ∞.
(iii) rank(G) = p.
Theorem 3.4. Let Assumptions 3.1 and 3.3 hold. Then √n(θGEL − θ0) →d N(0, Σ).
4. Monte Carlo Studies and Illustration
In the first part of this section, we summarize the estimation results of Monte Carlo studies of the GEL method. We generate 200 observations from the following two-dimensional AR(1) model:
[X1(t)]   [ 0.2  −0.5] [X1(t − 1)]   [U1(t)]
[X2(t)] = [−0.5   0.3] [X2(t − 1)] + [U2(t)],   (4.1)
where {U(t) = (U1(t), U2(t))′}_{t∈Z} is an i.i.d. innovation process following a two-dimensional t-distribution with identity correlation matrix and 5 degrees of freedom. The true lag-1 autocorrelations of this process are ρ1(1) = 0.3894 and ρ2(1) = 0.4761, respectively. As described in Example 2.2, we estimate ρ1(1) and ρ2(1) using the three types of frequency domain GEL method (EL, ET, and CUE). Table 1 shows the mean and standard deviation of the estimates over 1000 replications. All three estimators work properly.
Next, we apply the proposed method to the returns of market index data. The sampleconsists of 7 weekly indices (S&P 500, Bovespa, CAC 40, AEX, ATX, HKHSI, and Nikkei)having 800 observations each: the initial date is April 30, 1993, and the ending date is August22, 2008. Refer to Table 2 for the market of each index.
Table 2: Market of each index.

Index     Market
S&P 500   NYSE
Bovespa   Sao Paulo Stock Exchange
CAC 40    Bourse de Paris
AEX       Amsterdam Stock Exchange
ATX       Wiener Borse
HKHSI     Hong Kong Exchanges and Clearing
Nikkei    Tokyo Stock Exchange
Table 3: Estimated portfolio weights.

          EL       ET       CUE
S&P 500   0.0759   0.0617   0.0001
Bovespa   0.6648   0.6487   0.6827
CAC 40    0.0000   0.0000   0.0000
AEX       0.0000   0.0000   0.0000
ATX       0.2593   0.2558   0.3168
HKHSI     0.0000   0.0000   0.0000
Nikkei    0.0000   0.0338   0.0004
As shown in Example 2.1, we use the frequency domain GEL method to estimate the optimal portfolio weights; the results are shown in Table 3. Bovespa and ATX account for a large part of the optimal portfolio.
Acknowledgment
This work was supported by Grant-in-Aid for Young Scientists (B) (22700291).
References
[1] H. Markowitz, "Portfolio selection," Journal of Finance, vol. 7, no. 1, pp. 77–91, 1952.
[2] H. M. Markowitz, Portfolio Selection: Efficient Diversification of Investments, John Wiley & Sons, New York, NY, USA, 1959.
[3] E. J. Elton, M. J. Gruber, S. J. Brown, and W. N. Goetzmann, Modern Portfolio Theory and Investment Analysis, John Wiley & Sons, New York, NY, USA, 7th edition, 2007.
[4] B. Mandelbrot, "The variation of certain speculative prices," Journal of Business, vol. 36, pp. 394–419, 1963.
[5] A. B. Owen, "Empirical likelihood ratio confidence intervals for a single functional," Biometrika, vol. 75, no. 2, pp. 237–249, 1988.
[6] A. Owen, "Empirical likelihood ratio confidence regions," The Annals of Statistics, vol. 18, no. 1, pp. 90–120, 1990.
[7] R. J. Smith, "GEL criteria for moment condition models," working paper, 2004, http://cemmap.ifs.org.uk/wps/cwp0419.pdf.
[8] W. K. Newey and R. J. Smith, "Higher order properties of GMM and generalized empirical likelihood estimators," Econometrica, vol. 72, no. 1, pp. 219–255, 2004.
[9] J. Qin and J. Lawless, "Empirical likelihood and general estimating equations," The Annals of Statistics, vol. 22, no. 1, pp. 300–325, 1994.
[10] Y. Kitamura and M. Stutzer, "An information-theoretic alternative to generalized method of moments estimation," Econometrica, vol. 65, no. 4, pp. 861–874, 1997.
[11] L. P. Hansen, J. Heaton, and A. Yaron, "Finite-sample properties of some alternative GMM estimators," Journal of Business and Economic Statistics, vol. 14, no. 3, pp. 262–280, 1996.
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 980294, 17 pages
doi:10.1155/2012/980294

Research Article
Statistically Efficient Construction of α-Risk-Minimizing Portfolio
Hiroyuki Taniai1 and Takayuki Shiohama2
1 School of International Liberal Studies, Waseda University, 1-6-1 Nishi-Waseda, Shinjuku, Tokyo 169-8050, Japan
2 Department of Management Science, Faculty of Engineering, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku, Tokyo 162-8601, Japan
Correspondence should be addressed to Hiroyuki Taniai, [email protected]
Received 21 March 2012; Accepted 19 April 2012
Academic Editor: Masanobu Taniguchi
Copyright © 2012 H. Taniai and T. Shiohama. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We propose a semiparametrically efficient estimator for α-risk-minimizing portfolio weights. Based on the work of Bassett et al. (2004), an α-risk-minimizing portfolio optimization is formulated as a linear quantile regression problem. The quantile regression method uses a pseudolikelihood based on an asymmetric Laplace reference density, and asymptotic properties such as consistency and asymptotic normality are obtained. We apply the results of Hallin et al. (2008) to the problem of constructing α-risk-minimizing portfolios using residual signs and ranks and a general reference density. Monte Carlo simulations assess the performance of the proposed method. Empirical applications are also investigated.
1. Introduction
Since the initial formulation of Markowitz's mean-variance model, portfolio optimization and construction have been a critical part of asset and fund management. At the same time, portfolio risk assessment has become an essential tool in risk management. Yet there are well-known shortcomings of variance as a risk measure for the purposes of portfolio optimization; namely, variance is a good risk measure only for elliptical and symmetric return distributions.
The proper mathematical characterization of risk is of central importance in finance. The choice of an adequate risk measure is a complex task that, in principle, involves deep consideration of the attitudes of market players and the structure of markets. Recently, value at risk (VaR) has gained widespread use, in practice as well as in regulation. VaR has been criticized, however, because, being a quantile, it has no reason to be convex, and indeed it is easy to construct portfolios for which VaR seriously violates convexity. The shortcomings of VaR led to the introduction of coherent risk measures. Artzner et al. [1] and Follmer and Schied [2]
question whether VaR qualifies as such a measure, and both find that VaR is not an adequate measure of risk. Unlike VaR, expected shortfall (or tail VaR), defined as the expected portfolio tail return, has been shown to have all the necessary characteristics of a coherent risk measure. In this paper, we use α-risk as a risk measure that satisfies the conditions of a coherent risk measure (see [3]). Variants of the α-risk measure include expected shortfall and tail VaR. The α-risk-minimizing portfolio, introduced as a pessimistic portfolio in Bassett et al. [3], can be formulated as a problem of linear quantile regression.
Since the seminal work by Koenker and Bassett [4], quantile regression (QR) has become more widely used to describe the conditional distribution of a random variable given a set of covariates. One common finding in the extant literature is that the quantile regression estimator has good asymptotic properties under various data dependence structures and for a wide variety of conditional quantile models. A comprehensive guide to quantile regression is provided by Koenker [5].
Quantile regression methods use a pseudolikelihood based on an asymmetric Laplace reference density (see [6]). Komunjer [7] introduced a class of "tick-exponential" distributions, which includes the asymmetric Laplace density as a particular case, and showed that the tick-exponential QMLE reduces to the standard quantile regression estimator of Koenker and Bassett [4].
In quantile regression, one must know the conditional error density at zero, andincorrect specification of the conditional error density leads to inefficient estimators. Yetcorrect specification is difficult, because reliable shape information may be scarce. Zhao [8],Whang [9], and Komunjer and Vuong [10] propose efficiency corrections for the univariatequantile regression model.
This paper describes a semiparametrically efficient estimation of an α-risk-minimizing portfolio that replaces the asymmetric Laplace reference density (the standard quantile regression estimator) with any other reference density f having α-quantile zero, based on residual ranks and signs. A √n-consistent and asymptotically normal one-step estimator is proposed. Like all semiparametric estimators in the literature, our method relies on the availability of a √n-consistent first-round estimator, a natural choice being the standard quantile regression estimator. Under a correctly specified density, the estimator attains the semiparametric efficiency bound associated with f.
The remainder of this paper is organized as follows. In Section 2, we introduce the setup and definition of an α-risk-minimizing portfolio and present its equivalent formulation in the quantile regression setting. Section 3 contains theoretical results for our one-step estimator, and Section 4 describes its computation and performance. Section 5 gives empirical applications, and Section 6 our conclusions.
2. α-Risk-Minimizing Portfolio Formulation
"α-risk" can be considered a coherent measure of risk, as discussed in Artzner et al. [1]. The α-risk of X, say ρνα(X), is defined as

ρνα(X) := −∫_{0}^{1} F←X(t) dνα(t) = −(1/α) ∫_{0}^{α} F←X(t) dt,   α ∈ (0, 1),   (2.1)

where να(t) := min{t/α, 1}, and F←X(α) := inf{x : FX(x) ≥ α} denotes the quantile function of a random variable X with distribution function FX. Here, we recall the definition of expected shortfall and the relationship among the tail risk measures in finance. The α-expected shortfall, defined for α ∈ (0, 1) as

ES(α)(X) = −(1/α) ∫_{0}^{α} F←X(p) dp,   (2.2)
can be shown to be a risk measure that satisfies the axioms of a coherent measure of risk. It is worth mentioning that the expected shortfall is closely related but not identical to the notion of conditional value at risk CVaR(α) defined in Uryasev [11] and Pflug [12]. We note that expected shortfall and conditional VaR or tail conditional expectations are identical "extreme" risk measures only for continuous distributions, that is,

CVaR(α)(X) = TCE(α)(X) = −E[X | X ≤ F←X(α)].   (2.3)
To avoid confusion, in this paper, we use the term “α-risk measure” instead of terms likeexpected shortfall, CVaR, or tail conditional expectation.
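For a finite sample, the natural empirical counterpart of (2.1) is minus the average of the lower α-tail of the observed returns. A small illustrative sketch (the return figures are hypothetical):

```python
import numpy as np

# Empirical alpha-risk of a return sample: minus the average of the lower
# alpha-tail, the sample analogue of (2.1). Toy returns are hypothetical.
def alpha_risk(x, alpha):
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.floor(alpha * x.size))   # assumes alpha * n is (near-)integer
    return -x[:k].mean()

returns = np.array([0.02, -0.05, 0.01, 0.03, -0.08, 0.04, -0.01, 0.00, 0.05, -0.03])
print(alpha_risk(returns, 0.2))   # mean of the two worst returns, sign-flipped (0.065)
```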
Bassett et al. [3] showed that a portfolio with minimized α-risk can be constructed via the quantile regression (QR) methods of Koenker and Bassett [4]. QR is based on the fact that a quantile can be characterized as the minimizer of an expected asymmetric absolute loss function, namely,

F←X(α) = arg min_θ E[(α 1{X − θ ≥ 0} + (1 − α) 1{X − θ < 0}) |X − θ|] =: arg min_θ E[ρα(X − θ)],   (2.4)

where ρα(u) := u(α − 1{u < 0}), u ∈ R, is called the check function (see [5]), and 1A is the indicator function defined by 1A = 1A(ω) := 1 if ω ∈ A and := 0 if ω ∉ A. To construct the optimal (i.e., α-risk-minimized) portfolio, the following lemma is needed.
Lemma 2.1 (Theorem 2 of [3]). Let X be a real-valued random variable with EX = μ < ∞. Then

min_{θ∈R} E[ρα(X − θ)] = α(μ + ρνα(X)).   (2.5)
Then, Y = Y(π) = X′π denotes a portfolio consisting of d different assets X := (X1, . . . , Xd)′ with allocation weights π := (π1, . . . , πd)′ (subject to Σ_{j=1}^{d} πj = 1), and the optimization problem under study is, for some prespecified expected return μ0,

min_π ρνα(X′π)   subject to E(X′π) = μ0, 1d′π = 1.   (2.6)
The sample or empirical analogue of this problem can be expressed as

min_{b∈R^d} Σ_{i=1}^{n+1} ρα(Zi − (Aw b)i),   (2.7)
where Xij denotes the jth sample value of asset i and X̄i := n^{−1} Σ_{j=1}^{n} Xij,

Z = (Z1, . . . , Zn, Zn+1)′ := (X11, . . . , Xn1, κ(X̄1 − μ0))′,

Aw := ⎡ 1   X11 − X21     · · ·   X11 − Xd1   ⎤
      ⎢ ⋮        ⋮          ⋱         ⋮      ⎥
      ⎢ 1   X1n − X2n     · · ·   X1n − Xdn   ⎥
      ⎣ 0   κ(X̄1 − X̄2)    · · ·   κ(X̄1 − X̄d) ⎦  =:  (W1, . . . , Wn, Wn+1)′,   (2.8)
with some κ sufficiently large. The minimizer of (2.7), namely,

β(n)(α) := arg min_{b∈R^d} Σ_{i=1}^{n+1} ρα(Zi − (Aw b)i) = arg min_{b∈R^d} Σ_{i=1}^{n+1} ρα(Zi − Wi′b),   (2.9)

together with π(n)1(α) := 1 − Σ_{i=2}^{d} β(n)i(α), provides the optimal weights yielding the minimal α-risk.
The large-sample properties of β(n)(α), especially its √n-consistency, follow from the standard arguments and assumptions in the QR context (see [5]).
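The augmented design of (2.8) is mechanical to build; the following sketch uses hypothetical returns and the choice κ = O(n^{1/2}) discussed below (function name and data are illustrative only):

```python
import numpy as np

# Sketch of the design in (2.8): stack the n return observations and one
# penalty row enforcing E(X' pi) = mu0, with kappa taken large.
def build_qr_design(X, mu0, kappa):
    """X: (n, d) matrix of returns; returns (Z, Aw) as in (2.8)."""
    n, d = X.shape
    xbar = X.mean(axis=0)
    Z = np.concatenate([X[:, 0], [kappa * (xbar[0] - mu0)]])
    Aw = np.zeros((n + 1, d))
    Aw[:n, 0] = 1.0
    Aw[:n, 1:] = X[:, [0]] - X[:, 1:]            # X_{1t} - X_{jt}
    Aw[n, 1:] = kappa * (xbar[0] - xbar[1:])     # penalty row
    return Z, Aw

rng = np.random.default_rng(2)
X = 0.05 + 0.02 * rng.standard_normal((100, 4))  # hypothetical returns
Z, Aw = build_qr_design(X, mu0=0.07, kappa=np.sqrt(100))
print(Z.shape, Aw.shape)  # (101,) (101, 4)
```

Minimizing the check loss of Z − Aw b over b, for instance with a linear programming solver, then yields the estimator β(n)(α) of (2.9).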
Let W̄(n) and Σ(n)W be the mean vector and the covariance matrix of the Wi, given by

W̄(n) := (1/(n + 1)) Σ_{i=1}^{n+1} Wi,   Σ(n)W := (1/(n + 1)) Σ_{i=1}^{n+1} (Wi − W̄(n))(Wi − W̄(n))′,   (2.10)
respectively. Here, the (p, q)th element of Σ(n)W is

σ(n)W,pq =
  n/(n + 1)²                                           if p = q = 1,
  (n(1 − κ)/(n + 1)²)(X̄1 − X̄p)                         if q = 1, p = 2, . . . , d,
  (n(1 − κ)/(n + 1)²)(X̄1 − X̄q)                         if p = 1, q = 2, . . . , d,
  σpq + (n(κ − 1)²/(n + 1)²)(X̄1 − X̄p)(X̄1 − X̄q)        if p, q = 2, . . . , d,   (2.11)
where

σpq := (1/(n + 1)) Σ_{i=1}^{n} (X1i − Xpi)(X1i − Xqi) − (n/(n + 1))(X̄1 − X̄p)(X̄1 − X̄q).   (2.12)
Let DΣW := diag{σ(n)W,11, . . . , σ(n)W,dd}. Then the correlation matrix of {Wi}_{i=1,...,n+1} becomes R := D^{−1/2}_{ΣW} Σ(n)W D^{−1/2}_{ΣW}, and the (p, q)th element of R is given by

rpq = [1 + ((n + 1)²/n(κ − 1)²) · (σpq/((X̄1 − X̄p)(X̄1 − X̄q)))]
      / {[1 + ((n + 1)²/n(κ − 1)²) · (σpp/(X̄1 − X̄p)²)]^{1/2} [1 + ((n + 1)²/n(κ − 1)²) · (σqq/(X̄1 − X̄q)²)]^{1/2}},   (2.13)
for p, q = 2, . . . , d. The above correlation coefficient can take values close to 1 when n/κ² is close to 0 with (X̄1 − X̄p) ≠ 0 and (X̄1 − X̄q) ≠ 0. Hence, the estimated portfolio weights can be highly correlated among assets whose sample means differ from X̄1, although these problems are negligible in an asymptotic inference problem if we take κ = O(n^{1/2}).
Thus far, we have seen that the α-risk-minimizing portfolio can be obtained by (2.9), which was the result of Bassett et al. [3]. In what follows, we show that semiparametrically efficient inference on the optimal weights β(n)(α) is feasible. The quantity estimated by (2.9) can be regarded as a QR coefficient β(α), defined by

F←Zi(α | wi) := F←_{Zi|Wi=wi}(α) =: wi′β(α),   (2.14)

where F←_{X|S}(·) denotes a conditional quantile function, that is, F←_{X|S}(α) := inf{x : P(X ≤ x | S) ≥ α}. Note that the QR model (2.14) has a random coefficient regression (RCR) interpretation of the form Zi = Wi′β(Ui), with a componentwise monotone increasing function β and random variables Ui that are uniformly distributed over [0, 1], that is, Ui ∼ Uniform[0, 1] (see [5]). Here, a choice such that β(u) = [β1(u), β2(u), . . . , βd(u)]′ := [b1 + F←ξ(u), b2, . . . , bd]′, with Fξ the distribution function of some independent and identically distributed (i.i.d.) n-tuple (ξ1, . . . , ξn), yields
Zi = Wi′β(Ui) = Wi′β(Fξ(ξi)) = Wi′ [b1 + ξi, b2, . . . , bd]′.   (2.15)
Hence, recalling that the first component of Wi is 1, it follows that, for any fixed α ∈ [0, 1], the QR coefficient β(α) can be characterized as the parameter b ∈ R^d of a model such as

Zi = Wi′b + ξi,   ξi iid∼ G,   (2.16)

where the density g of G is subject to

g ∈ Fα := {f : ∫_{−∞}^{0} f(x) dx = α = 1 − ∫_{0}^{∞} f(x) dx},   (2.17)
that is, G←(α) = 0. Let us describe this model as (Z(n), A(n), P(n)Q), with P(n)Q := {P(n)_{b,g} | b ∈ R^d, g ∈ Fα}, where P(n)_{b,g} denotes the distribution of an observation {Zi}_{i=1}^{n}. This model (2.16) is a fixed-α submodel of (2.14) and is the parametric submodel through which we will achieve semiparametric efficiency.
The model (2.16) is a quantile-restricted linear regression model. But here we have no knowledge about the true density g, other than that it belongs to Fα, which allows us to identify b. So, we arbitrarily choose f from Fα, call it the "reference density," and correspondingly define a "reference model"

Zi = Wi′b + ei,   ei iid∼ F,   (2.18)

where the density f of F is subject to f ∈ Fα. The goal of the next section is to construct an asymptotically efficient version of β(n)(α) based on some feasible f ∈ Fα, that is, one attaining the semiparametric lower bound under a correctly specified density f = g while nevertheless remaining √n-consistent under a misspecified density (f ≠ g).
3. Semiparametrically Efficient Estimation
The procedure that we will apply here to achieve semiparametric efficiency is based on the invariance principle, as introduced by Hallin and Werker [13]. To this end, we first need local asymptotic normality (LAN; see, e.g., van der Vaart [14]) for a parametric submodel P(n)_{b,g}, namely,

log [dP(n)_{b+h/√n,g} / dP(n)_{b,g}] = h′Δ(n)_{b;g} − (1/2) h′I_{b;g} h + oP(1),   h ∈ R^d,

Δ(n)_{b;g} →d N(0, I_{b;g}),   (3.1)

where all the stochastic convergences are taken under P_{b,g} := P(∞)_{b,g}. Here, the random vector Δ(n)_{b;g} is called the central sequence, and the positive definite matrix I_{b;g} is the information matrix. To ensure the LAN condition for model (2.18), the following assumptions are required.
Assumption 3.1. The reference density f has finite Fisher information for location:

0 < If := ∫_{−∞}^{∞} φf(x)² f(x) dx = ∫_{0}^{1} φf(F←(u))² du < ∞,   where φf(x) := −f′(x)/f(x).   (3.2)
Assumption 3.2. The regression vectors Wi satisfy, under P_{b,g},

W̄(n) →P μW,   Σ(n)W →P ΣW,   (3.3)

for some vector μW and positive definite ΣW, where W̄(n) and Σ(n)W are defined by (2.10).
Then, by Theorem 2.1 and Example 4.1 of Drost et al. [15], model (2.18) satisfies the uniform LAN condition for any bn of the form b + O(n^{−1/2}), with central sequence and information matrix

Δ(n)_{bn;f} = (1/√(n + 1)) Σ_{i=1}^{n+1} φf(e_{bn,i}) Wi,   I∗;f = If (ΣW + μW μW′),   (3.4)
where e_{bn,i} denotes the residual (i.e., e_{bn,i} := Zi − Wi′bn). Consequently, we have the contiguity P(n)_{bn+h/√n,f} ◁ P(n)_{bn,f}, and of course P(n)_{bn,f} ◁ P(n)_{b,f} as well. Recall that the contiguity P(n) ◁ Q(n) means that, for any sequence S(n), if P(n)(S(n)) → 0, then Q(n)(S(n)) → 0 also. The reason why we have specified uniform LAN, rather than LAN at a single b, is the one-step improvement, which will be discussed later.
Following Hallin and Werker [13], a semiparametrically efficient procedure can be obtained by projecting Δ(n)_{bn;f} on some σ-field with respect to which the generating group for {P(n)_{bn,f} | f ∈ Fα} becomes maximal invariant (see, e.g., Schmetterer [16]). For the quantile-restricted regression model (2.16), such a σ-field is studied by Hallin et al. [6] and found to be generated by the signs and ranks of the residuals. Here, let us denote the sign of a residual by S_{bn,i}, the rank of a residual by R(n)_{bn,i}, and the σ-field generated by them as

B(n)_{bn} := σ(S_{bn,1}, . . . , S_{bn,n}; R(n)_{bn,1}, . . . , R(n)_{bn,n}).   (3.5)
Then, "good" inference should be based on

Δ̃(n)_{bn,f} := E(n)_{bn,f}[Δ(n)_{bn,f} | B(n)_{bn}]
  = (1/√(n + 1)) Σ_{i=1}^{n+1} E(n)_{bn,f}[φf(e_{bn,i}) | B(n)_{bn}] Wi
  = (1/√(n + 1)) Σ_{i=1}^{n+1} E(n)_{bn,f}[φf(F←(U_{bn,i})) | B(n)_{bn}] Wi
  = (1/√(n + 1)) Σ_{i=1}^{n+1} φf(F←(V_{bn,i})) Wi + oP(1),   (3.6)
where U_{bn,i} := F(e_{bn,i}) is i.i.d. uniform on [0, 1] under P(n)_{bn,f} and hence approximated by

V(n)_{bn,i} :=
  α · R(n)_{bn,i}/(N(n)_{bn,L} + 1)                              if R(n)_{bn,i} ≤ N(n)_{bn,L},
  α + (1 − α) · (R(n)_{bn,i} − N(n)_{bn,L})/(n − N(n)_{bn,L} + 1)  otherwise,   (3.7)

with N(n)_{bn,L} := #{i ∈ {1, . . . , n} | S_{bn,i} ≤ 0}. In short, we first rewrite the residual e_{bn,i} as F←(U_{bn,i}), with realizations U_{bn,i} of a [0, 1]-uniform random variable, and then approximate those U_{bn,i} by V(n)_{bn,i} given {N(n)_{bn,L}; R(n)_{bn,1}, . . . , R(n)_{bn,n}}.
Using this rank-based central sequence, we can construct the one-step estimator (see,e.g., Bickel [17]; Bickel et al. [18]) as follows.
Definition 3.3. For any sequence of estimators θn, the discretized estimator θ̄n is defined to be the nearest vertex of {θ : θ = (1/√n)(i1, i2, . . . , id)′, ij integers}.
Definition 3.4. Let β̄(n)(α) be the discretized version of β(n)(α) defined at (2.9). We define the (rank-based) one-step estimator of b based on a reference density f ∈ Fα as

b(n)_f := β̄(n)(α) + Σ̂^{−1}_{fg} Δ̃(n)_{β̄(n)(α),f}/√n,   with Σ̂_{fg} := Î_{fg} ΣW − (f(0)/(1 − α)) μ̂_{−φg} μW μW′,   (3.8)

where Î_{fg} and μ̂_{−φg} are consistent estimates of

I_{fg} := ∫_{0}^{1} φf[F←(u)] φg[G←(u)] du,   (3.9)

μ_{−φg} := E[φg[G←(U)] | U ≤ α]  (= −g(0)/α),   (3.10)

respectively.
Consistent estimates Î_{fg} and μ̂_{−φg} can be obtained in the manner of Hallin et al. [19], without kernel estimation of g; here we omit the details.
Lemma 3.5 (Section 4.1 of [6]). Under P_{b,g} with g ∈ Fα,

√n(b(n)_f − b) →d N(0, Σ^{−1}_{fg} Σ_{ff} Σ^{−1}_{fg}).   (3.11)

Therefore, the one-step estimator b(n)_f defined by (3.8) for b is semiparametrically efficient at f = g.

In our original notation, the above statement can be rewritten as: for some fixed α ∈ (0, 1), √n(b(n)_f − β(α)) →d N(0, Σ^{−1}_{fg} Σ_{ff} Σ^{−1}_{fg}).
Recall that the standard QR estimator, defined at (2.9), is asymptotically normal (see Koenker [5]):

√n(β(n)(α) − β(α)) →d N(0, D^{−1}),   (3.12)

where

D := (g²(0)/α(1 − α)) D0,   with D0 = lim_{n→∞} (1/(n + 1)) Σ_{i=1}^{n+1} Wi Wi′.   (3.13)
Denote the true portfolio weight with respect to risk probability α by π = (1 − 1′_{d−1}π2(α), π2(α)′)′, where π2(α) = (π2(α), . . . , πd(α))′, and its standard quantile regression and one-step estimators by π̂(QR) := (1 − 1′_{d−1}β(n)_2(α), β(n)_2(α)′)′ and π̂(OS)_f := (1 − 1′_{d−1}b(n)_{f,2}, b(n)′_{f,2})′, respectively. Denote the block partitions of the covariance matrices of the standard quantile and one-step estimators by

D^{−1} = ( D11   D12
           D12′  D22 ),   Σ^{−1}_{fg} Σ_{ff} Σ^{−1}_{fg} = ( Σ11   Σ12
                                                             Σ12′  Σ22 ),   (3.14)
where the submatrices D22 and Σ22 are (d − 1) × (d − 1) symmetric matrices for the covariance of the portfolio weights π2. The variances of the α-risk-minimizing portfolios constructed by the standard quantile and one-step estimators are stated in the following proposition; since it follows by direct evaluation, we omit the proof.
Proposition 3.6. The asymptotic conditional variances of an α-risk-minimizing portfolio using the standard quantile regression and one-step estimators, given X = x, are, respectively,

Var(X′π̂(QR) | X = x = [x1, x2′]′) = x1² · tr[1_{d−1}1′_{d−1}D22] + 2x1 · tr[1_{d−1}x2′D22] + tr[x2x2′D22],

Var(X′π̂(OS)_f | X = x = [x1, x2′]′) = x1² · tr[1_{d−1}1′_{d−1}Σ22] + 2x1 · tr[1_{d−1}x2′Σ22] + tr[x2x2′Σ22],   (3.15)

where x2 = (x2, . . . , xd)′.
For any positive definite matrices A and B, we say A ≤ B if B − A is nonnegative definite. To compare the efficiency of the standard quantile regression estimator and the one-step estimator, we need to show that Σ^{−1}_{fg} Σ_{ff} Σ^{−1}_{fg} ≤ D^{−1}. To see this, as in Section 3 of Koenker and Zhao [20], let us consider

Σ := ( Σ_{fg}Σ^{−1}_{ff}Σ_{fg}   Σ_{fg}
       Σ_{fg}                    Σ_{fg}D^{−1}Σ_{fg} ).   (3.16)

Note that Σ is a nonnegative definite matrix. If Σ^{−1}_{fg}DΣ^{−1}_{fg} is positive definite, then there exists an orthogonal matrix P such that

P′ΣP = ( Σ_{fg}Σ^{−1}_{ff}Σ_{fg}   0
         0                         Σ_{fg}D^{−1}Σ_{fg} − Σ_{ff} ),   (3.17)
so Σ_{fg}D^{−1}Σ_{fg} − Σ_{ff} is nonnegative definite. Hence, D^{−1} − Σ^{−1}_{fg}Σ_{ff}Σ^{−1}_{fg} is nonnegative definite if Σ_{fg} is nonsingular. This result assures that the one-step estimator is asymptotically more efficient than the standard quantile regression estimator. From this result, it is easy to see that

Var(X′π̂(OS)_f | X = x) ≤ Var(X′π̂(QR) | X = x).   (3.18)

Also, by taking expectations on both sides, the same inequality holds for the unconditional variances.
4. Numerical Studies
In this section, we examine the finite sample properties of the proposed one-step estimatordescribed in Section 3 for the cases where α = 0.1 and 0.5. Our simulations are performedwith two data generating processes to focus on the underlying true density g and how thechoice of the reference density f might affect the finite sample performances.
The first data-generating process (DGP1) is the same as that investigated by Bassett et al. [3]. For DGP1, we consider the construction of an α-risk-minimizing portfolio from four independently distributed assets: asset 1 is normally distributed with mean 0.05 and standard deviation 0.02; asset 2 has a reversed χ²₃ density with location and scale chosen so that its mean and variance are identical to those of asset 1; asset 3 is normally distributed with mean 0.09 and standard deviation 0.05; finally, asset 4 has a χ²₃ density with mean and standard deviation identical to those of asset 3. DGP2 is a four-dimensional normal distribution with the same mean vector as DGP1 and covariance matrix Σ = [σij]_{i,j=1,...,4}, with diag Σ = (0.02, 0.02, 0.05, 0.05) and σij = σii σjj ρ for i ≠ j. Here, we set ρ = 0.5, which indicates that the asset returns have correlation 0.5. Notice that both DGP1 and DGP2 have the same mean and variance structures. The underlying true conditional densities of u = Z − Aw b for DGP1 and DGP2 are, respectively, a mixture of the normal, χ²₃, and reversed χ²₃ distributions, and the normal distribution. A simulation of the estimator, for sample sizes n = 100, 500, and 1000, consists of 1000 replications. We choose the prespecified expected return μ0 = 0.07.
For each scenario, we computed standard quantile regression estimates β(n)(α) with corresponding portfolio weights π̂(QR) = (1 − Σ_{j=2}^{d} β(n)_j(α), β(n)_2(α), . . . , β(n)_d(α)), and our one-step estimates defined by (3.8), for various choices of the reference density f and actual density g in the α-risk-minimizing portfolio allocation problem.
To make the problem a pure location model, we standardize the estimated residuals to unit variance, that is, ũ = û/√Var(û), where û = [ûi]_{i=1,...,n+1} with ûi = Zi − (Aw β(n)(α))i. The true density g can be estimated by the kernel estimator for DGP1,

ĝ(u) = (1/(n + 1)h) Σ_{i=1}^{n+1} K((u − ũi)/h),   (4.1)

where K is a kernel function and h is a bandwidth. The first derivative g′(u) is estimated by

ĝ′(u) = (1/(n + 1)h²) Σ_{i=1}^{n+1} K′((u − ũi)/h).   (4.2)
As for DGP2, the actual density g is normal because the portfolio is constructed from normally distributed returns. We use the normal distribution (N), the asymmetric Laplace distribution (AL), the logistic distribution (LGT), and the asymmetric power distribution (APD) with λ = 1.5 for the reference density f.
The density function of the asymmetric power distribution introduced by Komunjer [7] is given by

f(u) = (δ^{1/λ}_{α,λ}/Γ(1 + 1/λ)) exp[−(δ_{α,λ}/α^λ) |u|^λ]          if u ≤ 0,
       (δ^{1/λ}_{α,λ}/Γ(1 + 1/λ)) exp[−(δ_{α,λ}/(1 − α)^λ) |u|^λ]    if u > 0,   (4.3)

where 0 ≤ α < 1, λ > 0, and

δ_{α,λ} = 2α^λ(1 − α)^λ/(α^λ + (1 − α)^λ).   (4.4)
When α = 0.5, the APD pdf is symmetric around zero; in this case, the APD density reduces to the standard generalized power distribution (GPD) [21, pages 194-195]. Special cases of the GPD include the uniform (λ = ∞), Gaussian (λ = 2), and Laplace (λ = 1) distributions. When α ≠ 0.5, the APD pdf is asymmetric; special cases include the asymmetric Laplace (λ = 1) and two-piece normal (λ = 2) distributions.
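By construction, the APD density (4.3)-(4.4) places probability α on (−∞, 0], so its α-quantile is zero and it is an admissible reference density in Fα. A numerical sketch confirming this (grid and tolerances are illustrative):

```python
import math
import numpy as np

# Sketch: the APD density of (4.3)-(4.4). Its mass on (-inf, 0] equals alpha,
# i.e., its alpha-quantile is zero, which puts it in F_alpha; checked by quadrature.
def apd_pdf(u, alpha, lam):
    u = np.asarray(u, dtype=float)
    delta = 2 * alpha**lam * (1 - alpha)**lam / (alpha**lam + (1 - alpha)**lam)
    norm = delta ** (1 / lam) / math.gamma(1 + 1 / lam)
    tail = np.where(u <= 0, alpha, 1 - alpha)    # alpha on the left, 1 - alpha on the right
    return norm * np.exp(-(delta / tail**lam) * np.abs(u) ** lam)

alpha, lam = 0.1, 1.5
grid = np.linspace(-30.0, 30.0, 600001)
f = apd_pdf(grid, alpha, lam)
du = grid[1] - grid[0]
total_mass = f.sum() * du                        # simple Riemann sum
mass_below = f[grid <= 0].sum() * du
print(round(total_mass, 3), round(mass_below, 3))  # 1.0 0.1
```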
For a given sample size, we compute the simulated mean and standard deviation of π̂(QR) and π̂(OS)_f, as well as the relative efficiency Var(π̂(OS)_{f,i})/Var(π̂(QR)_i) for i = 1, . . . , 4.
Table 1 gives the relative efficiencies for DGP1. When α = 0.1, the efficiency gains of one-step estimators with the asymmetric Laplace reference density are large compared with the other reference densities at n = 1000, while these gains are smaller at sample size n = 100. When α = 0.5, the relative efficiency of assets 3 and 4 is minimized under the asymmetric Laplace reference density, while for assets 1 and 2 it is minimized under the normal reference density. This is because of the covariance structure of Σ(n)W defined by (2.10). As seen in Section 2, if μ1 ≠ μp and μ1 ≠ μq, the (p, q)th element of the correlation matrix defined by (2.13) has a value close to unity. In this case, the asymptotic variance of the usual quantile regression estimator becomes large, which leads to unsatisfactorily large variances for assets 3 and 4. The asymptotic variance of our one-step estimator does not suffer from this problem.
Table 2 gives the results of the relative efficiencies for DGP2. In line with efficiency at a correctly specified reference density f = N, we see that the relative efficiency is minimal for all assets and sample sizes with α = 0.1 and 0.5. Even though we misspecify the reference density (f ≠ N), there is some efficiency gain except for assets 1 and 2 under the asymmetric Laplace reference density with n = 100 and α = 0.1. Efficiency gains for the normal and logistic reference densities are almost the same because the underlying true density is a symmetric normal distribution, and the asymmetric power reference density with λ = 1.5 outperforms the asymmetric Laplace reference density.
Figure 1 plots the kernel densities for the estimated portfolio weights for DGP2 with α = 0.5 and n = 1000. We see that the standard quantile regression estimators have long
12 Advances in Decision Sciences
Table 1: $\mathrm{Var}(\hat{\pi}^{(OS)}_{f})/\mathrm{Var}(\hat{\pi}^{(QR)})$ for DGP 1 [3]. In this case, we estimate an unknown g (which must be an N-χ² mixture) using the kernel method.

f\π         π1(.1)   π2(.1)   π3(.1)   π4(.1)   π1(.5)   π2(.5)   π3(.5)   π4(.5)
n = 100
AL          1.1054   1.1088   1.0036   1.0082   1.0211   1.0193   0.9925   0.9570
N           0.9773   0.9826   0.9093   0.9331   0.9412   0.9383   0.9227   0.9490
LGT         0.9731   0.9788   0.9017   0.9289   0.9458   0.9436   0.9228   0.9463
APD1.5      0.9851   0.9838   0.9827   0.9893   0.9517   0.9492   0.9273   0.9471
n = 500
AL          1.0266   1.0068   0.9549   0.9871   0.9596   0.9563   0.3626   0.5478
N           0.9762   0.9778   0.8966   0.9340   0.9186   0.9138   0.7522   0.8285
LGT         0.9739   0.9769   0.8893   0.9301   0.9255   0.9225   0.7277   0.8102
APD1.5      0.9765   0.9712   0.9773   0.9896   0.9291   0.9262   0.6922   0.7842
n = 1000
AL          0.8702   0.9084   0.5621   0.8193   0.9150   0.8985   0.2225   0.3614
N           0.9416   0.9485   0.8680   0.9112   0.7850   0.7714   0.4019   0.5214
LGT         0.9453   0.9532   0.8640   0.9085   0.8101   0.7981   0.4043   0.5199
APD1.5      0.9290   0.9296   0.9242   0.9645   0.8206   0.8077   0.3572   0.4806
Table 2: $\mathrm{Var}(\hat{\pi}^{(OS)}_{f})/\mathrm{Var}(\hat{\pi}^{(QR)})$ for DGP2 (multinormal). In this case, the residual density is a normal distribution. Hence, we adopt g = N.

f\π         π1(.1)   π2(.1)   π3(.1)   π4(.1)   π1(.5)   π2(.5)   π3(.5)   π4(.5)
n = 100
AL          1.0890   1.0996   0.7311   0.7221   0.7428   0.7584   0.6652   0.6370
N           0.5891   0.6013   0.4105   0.4199   0.5896   0.6010   0.5451   0.5347
LGT         0.6153   0.6280   0.4113   0.4229   0.6026   0.6154   0.5503   0.5356
APD1.5      0.8512   0.8603   0.6120   0.6069   0.6076   0.6204   0.5580   0.5420
n = 500
AL          0.8970   0.8814   0.7742   0.7402   0.6944   0.7030   0.6235   0.5965
N           0.5090   0.4993   0.4883   0.4648   0.5618   0.5666   0.5169   0.4949
LGT         0.5184   0.5096   0.4892   0.4678   0.5703   0.5761   0.5321   0.5086
APD1.5      0.7271   0.7183   0.6746   0.6423   0.5723   0.5784   0.5317   0.5099
n = 1000
AL          0.8836   0.8633   0.8015   0.8712   0.8778   0.8612   0.6726   0.7390
N           0.4800   0.4655   0.4861   0.5095   0.7071   0.6944   0.5746   0.6261
LGT         0.4897   0.4750   0.4939   0.5198   0.7238   0.7094   0.5831   0.6347
APD1.5      0.7144   0.7041   0.6617   0.7088   0.7261   0.7126   0.5845   0.6395
tails on both sides for all assets, whereas the one-step estimators have a narrower interval and a higher peak at the true weight. This confirms that the one-step estimators are semiparametrically more efficient than the standard ones.
Figure 1: Kernel density plots for the portfolio weights. Panels (a) to (d) correspond to the kernel density for assets 1 to 4, respectively. The density shows the standard quantile estimator (QR; solid line); the estimator with an asymmetric Laplace distribution reference density (AL; dashed line); normal distribution (N; dotted line); logistic distribution (LGT; dotted-dashed line); asymmetric power distribution (APD; long dashed line).
5. Empirical Application
We apply our methodology to weekly log returns of the 96 stocks of the TOPIX Large 100 index. The samples run from January 5, 2007, to December 2, 2011, for a total of 257 observations. The stock prices are adjusted to take into account events such as stock splits on individual securities. Preliminary tests reveal that most log return series have high values of kurtosis and negative values of skewness in general, which indicates that the log returns are non-Gaussian.
We computed the optimal portfolio allocations for α = 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5. We set κ = 1000 and μ = −0.002, which is the third quartile of the average log-return distribution. For the first-round estimates, we used the standard quantile regression estimator, and for the one-step estimates, we chose a normal distribution as a reference
Figure 2: Empirical cumulative distribution function of the α-risk-minimizing portfolio based on the standard quantile regression estimator (thin line) and the one-step estimator (thick line) for α = 0.1, 0.2, 0.3, and 0.4, which correspond to panels (a) to (d), respectively.
density. Since we do not have enough information about the shape of the portfolio distributions for the various choices of α, the actual density g is estimated by the kernel method.
Figure 2 plots the cumulative distribution functions of the α-risk-minimizing portfolios obtained by the standard quantile regression estimates and the one-step estimates for α = 0.1, 0.2, 0.3, and 0.4. Summary statistics for the distributions of the different portfolios are reported in Table 3.
Figure 2 and Table 3 clearly show that the optimal α-risk-minimizing portfolio manages to reduce the occurrence of events in the left tail when α is small, for both the standard QR estimates and the one-step estimates. The standard deviation of the one-step estimates of an α-minimizing portfolio is smaller than that of the standard QR estimates. We can also observe that the range of a portfolio constructed with one-step estimates is much smaller than that of the standard QR estimates, due to the semiparametric efficiency properties of our one-step estimators. When α becomes large, the difference in the standard deviation of the constructed portfolio between the standard QR estimates and the one-step estimates tends to become large. Hence, efficiency gains are large for α = 0.5, which corresponds to the mean absolute deviation portfolio.
Figure 3: Efficient frontiers for an α-risk-minimizing portfolio based on the standard quantile regression estimator and the one-step one. The lines with triangles and circles represent the pairs of obtained standard deviation and mean for the portfolio with α = 0.5 and 0.1, respectively. The solid and dashed lines represent risks and returns for the standard quantile regression and one-step portfolios, respectively.
Table 3: Summary statistics for the α-minimizing portfolios using quantile regression estimates $X'\hat{\pi}^{(QR)}$ and one-step estimates $X'\hat{\pi}^{(OS)}_{f}$.

              X′π̂(QR)                                        X′π̂(OS)_f
α       Min      Max     Mean     Std. dev.  Q(α)       Min      Max     Mean     Std. dev.  Q(α)
0.01  −0.0210   0.0557  −0.0020   0.0187   −0.0210    −0.0210   0.0555  −0.0020   0.0186   −0.0209
0.05  −0.0192   0.0756  −0.0020   0.0196   −0.0192    −0.0202   0.0733  −0.0020   0.0192   −0.0194
0.1   −0.0190   0.0649  −0.0020   0.0196   −0.0190    −0.0227   0.0606  −0.0020   0.0189   −0.0191
0.2   −0.0959   0.0708  −0.0020   0.0189   −0.0149    −0.0944   0.0624  −0.0020   0.0179   −0.0146
0.3   −0.1247   0.0512  −0.0020   0.0171   −0.0078    −0.1200   0.0487  −0.0020   0.0163   −0.0077
0.4   −0.1307   0.0545  −0.0020   0.0173   −0.0038    −0.1230   0.0482  −0.0020   0.0162   −0.0040
0.5   −0.1011   0.0605  −0.0020   0.0171   −0.0012    −0.0958   0.0537  −0.0020   0.0158   −0.0013

Note: the corresponding summary statistics of the TOPIX log returns for the minimum, maximum, mean, and standard deviation are −0.2202, 0.0924, −0.0032, and 0.0328, respectively. Also, the quantiles of the TOPIX log returns for α = 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 are −0.0981, −0.0536, −0.0158, −0.0068, and 0.0006, respectively.
Another interesting finding is that the standard QR-constructed portfolios have high-density peaks at the required quantiles for all values of α, whereas the portfolio constructed by the one-step estimates has a quite moderate density reduction at the required quantiles.
Consistent with economic intuition, higher risk aversion is associated with a shorter left tail. In the case where α ≤ 0.1, the maximum loss is limited to less than −0.02. This result is particularly striking given that the sample includes the stock market crash of October 2008 due to the US subprime mortgage crisis and the bankruptcy of Lehman Brothers, which resulted in a weekly loss of more than −0.220 for TOPIX. The sample also includes the stock market crash of March 2011 due to the catastrophic earthquake and tsunami that hit Japan, which resulted in a weekly loss of −0.104.
Figure 3 presents empirical efficient frontiers corresponding to the standard quantileregression-based portfolios and one-step estimates of a portfolio with α = 0.1 and 0.5.
Figure 3 clearly illustrates that the standard quantile regression-based portfolio is completely inefficient, far from the one-step frontier.
6. Summary and Conclusions
This paper considered semiparametrically efficient estimation of an α-risk-minimizing portfolio. A one-step estimator based on residual signs and ranks was proposed, and simulations were performed to compare the finite-sample relative efficiencies of the standard quantile regression estimators and the one-step ones. These simulations confirmed our theoretical findings. An empirical application constructing a portfolio from 96 Japanese stocks was investigated and confirms that the one-step α-risk-minimizing portfolio has smaller variance than that obtained by the standard quantile regression estimator.
Further research topics include (1) the construction of portfolios under short-sale constraints and (2) extending the results to time-series covariates with heteroskedastic returns. For the former, we impose nonnegativity of the weights by using a penalty function containing a term that diverges to infinity as any of the weights becomes negative (see [22]). For the latter, we refer to Hallin et al. [6] and Taniai [23].
Acknowledgments
This paper was supported by the Norinchukin Bank and the Nochu Information System Endowed Chair of Financial Engineering in the Department of Management Science, Tokyo University of Science. The authors thank Professors Masanobu Taniguchi and Marc Hallin and an anonymous referee for their helpful comments.
References
[1] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, "Coherent measures of risk," Mathematical Finance, vol. 9, no. 3, pp. 203–228, 1999.
[2] H. Föllmer and A. Schied, "Convex measures of risk and trading constraints," Finance and Stochastics, vol. 6, no. 4, pp. 429–447, 2002.
[3] G. W. J. Bassett, R. Koenker, and G. Kordas, "Pessimistic portfolio allocation and Choquet expected utility," Journal of Financial Econometrics, vol. 2, no. 4, pp. 477–492, 2004.
[4] R. Koenker and G. Bassett, Jr., "Regression quantiles," Econometrica, vol. 46, no. 1, pp. 33–50, 1978.
[5] R. Koenker, Quantile Regression, vol. 38 of Econometric Society Monographs, Cambridge University Press, Cambridge, UK, 2005.
[6] M. Hallin, C. Vermandele, and B. J. M. Werker, "Semiparametrically efficient inference based on signs and ranks for median-restricted models," Journal of the Royal Statistical Society B, vol. 70, no. 2, pp. 389–412, 2008.
[7] I. Komunjer, "Asymmetric power distribution: theory and applications to risk measurement," Journal of Applied Econometrics, vol. 22, no. 5, pp. 891–921, 2007.
[8] Q. Zhao, "Asymptotically efficient median regression in the presence of heteroskedasticity of unknown form," Econometric Theory, vol. 17, no. 4, pp. 765–784, 2001.
[9] Y.-J. Whang, "Smoothed empirical likelihood methods for quantile regression models," Econometric Theory, vol. 22, no. 2, pp. 173–205, 2006.
[10] I. Komunjer and Q. Vuong, "Semiparametric efficiency bound in time-series models for conditional quantiles," Econometric Theory, vol. 26, no. 2, pp. 383–405, 2010.
[11] S. Uryasev, "Conditional value-at-risk: optimization algorithms and applications," in Proceedings of the IEEE/IAFE/INFORMS Conference on Computational Intelligence for Financial Engineering (CIFEr '00), pp. 49–57, 2000.
[12] G. C. Pflug, "Some remarks on the value-at-risk and the conditional value-at-risk," in Probabilistic Constrained Optimization, vol. 49 of Nonconvex Optimization and Its Applications, pp. 272–281, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000.
[13] M. Hallin and B. J. M. Werker, "Semi-parametric efficiency, distribution-freeness and invariance," Bernoulli, vol. 9, no. 1, pp. 137–165, 2003.
[14] A. W. van der Vaart, Asymptotic Statistics, vol. 3 of Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, UK, 1998.
[15] F. C. Drost, C. A. J. Klaassen, and B. J. M. Werker, "Adaptive estimation in time-series models," The Annals of Statistics, vol. 25, no. 2, pp. 786–817, 1997.
[16] L. Schmetterer, Introduction to Mathematical Statistics, Springer, Berlin, Germany, 1974.
[17] P. J. Bickel, "On adaptive estimation," The Annals of Statistics, vol. 10, no. 3, pp. 647–671, 1982.
[18] P. J. Bickel, C. A. J. Klaassen, Y. Ritov, and J. A. Wellner, Efficient and Adaptive Estimation for Semiparametric Models, Johns Hopkins Series in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, Md, USA, 1993.
[19] M. Hallin, H. Oja, and D. Paindaveine, "Semiparametrically efficient rank-based inference for shape. II. Optimal R-estimation of shape," The Annals of Statistics, vol. 34, no. 6, pp. 2757–2789, 2006.
[20] R. Koenker and Q. Zhao, "Conditional quantile estimation and inference for ARCH models," Econometric Theory, vol. 12, no. 5, pp. 793–813, 1996.
[21] N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, Vol. 1, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, NY, USA, 2nd edition, 1994.
[22] S. Leorato, F. Peracchi, and A. V. Tanase, "Asymptotically efficient estimation of the conditional expected shortfall," Computational Statistics & Data Analysis, vol. 56, no. 4, pp. 768–784, 2012.
[23] H. Taniai, Inference for the quantiles of ARCH processes, Ph.D. thesis, Université libre de Bruxelles, Brussels, Belgium, 2009.
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 704693, 22 pages
doi:10.1155/2012/704693
Research Article
Estimation for Non-Gaussian Locally Stationary Processes with Empirical Likelihood Method
Hiroaki Ogata
Waseda University, Tokyo 169-8050, Japan
Correspondence should be addressed to Hiroaki Ogata, [email protected]
Received 28 January 2012; Revised 28 March 2012; Accepted 30 March 2012
Academic Editor: David Veredas
Copyright © 2012 Hiroaki Ogata. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
An application of the empirical likelihood method to non-Gaussian locally stationary processes is presented. Based on the central limit theorem for locally stationary processes, we give the asymptotic distributions of the maximum empirical likelihood estimator and the empirical likelihood ratio statistic, respectively. It is shown that the empirical likelihood method enables us to make inferences on various important indices in time series analysis. Furthermore, we give a numerical study and investigate finite sample properties.
1. Introduction
The empirical likelihood is a nonparametric method for statistical inference proposed by Owen [1, 2]. It is used for constructing confidence regions for a mean, for a class of M-estimates that includes quantiles, and for differentiable statistical functionals. The empirical likelihood method has been applied to various problems because of its good properties: the generality of a nonparametric method and the effectiveness of the likelihood method. For example, we can name applications to general estimating equations [3], regression models [4–6], biased sample models [7], and so forth. Applications have also been extended to dependent observations. Kitamura [8] developed the blockwise empirical likelihood for estimating equations and for smooth functions of means. Monti [9] applied the empirical likelihood method to linear processes, essentially under the circular Gaussian assumption, using a spectral method. For short- and long-range dependence, Nordman and Lahiri [10] gave the asymptotic properties of the frequency domain empirical likelihood. As noted above, some applications to time series analysis can be found, but they were mainly for stationary processes. Although stationarity is the most fundamental assumption in time series analysis, it is also known that real time series data are generally nonstationary (e.g., in economic analysis). Therefore we need to
use nonstationary models in order to describe the real world. Recently, Dahlhaus [11–13] proposed an important class of nonstationary processes, called locally stationary processes. They have so-called time-varying spectral densities, whose spectral structures change smoothly in time.
In this paper we extend the empirical likelihood method to non-Gaussian locally stationary processes with time-varying spectra. First, we derive the asymptotic normality of the maximum empirical likelihood estimator based on the central limit theorem for locally stationary processes, which is stated in Dahlhaus [13, Theorem A.2]. Next, we show that the empirical likelihood ratio converges to a sum of Gamma distributions. In particular, when we consider the stationary case, that is, when the time-varying spectral density does not depend on the time parameter, the asymptotic distribution becomes the chi-square.
As an application of this method, we can estimate an extended autocorrelation for locally stationary processes. Besides, we can consider the Whittle estimation stated in Dahlhaus [13].
This paper is organized as follows. Section 2 briefly reviews stationary processes and explains locally stationary processes. In Section 3, we propose the empirical likelihood method for non-Gaussian locally stationary processes and give its asymptotic properties. In Section 4 we give numerical studies on confidence intervals of the autocorrelation for locally stationary processes. Proofs of the theorems are given in Section 5.
2. Locally Stationary Processes
The stationary process is a fundamental setting in time series analysis. If the process $\{X_t\}_{t \in \mathbb{Z}}$ is stationary with mean zero, it is known to have the spectral representation
$$X_t = \int_{-\pi}^{\pi} \exp(i\lambda t)\, A(\lambda)\, d\xi(\lambda), \tag{2.1}$$
where $A(\lambda)$ is a $2\pi$-periodic complex-valued function with $A(-\lambda) = \overline{A(\lambda)}$, called the transfer function, and $\xi(\lambda)$ is a stochastic process on $[-\pi, \pi]$ with $\xi(-\lambda) = \overline{\xi(\lambda)}$ and
$$E[d\xi(\lambda)] = 0, \qquad \mathrm{Cov}(d\xi(\lambda_1), d\xi(\lambda_2)) = \eta(\lambda_1 - \lambda_2), \tag{2.2}$$
where $\eta(\lambda) = \sum_{j=-\infty}^{\infty} \delta(\lambda + 2\pi j)$ is the $2\pi$-periodic extension of the Dirac delta function. If the process is stationary, the covariance between $X_t$ and $X_{t+k}$ does not depend on the time $t$ and is a function of the time lag $k$ only. We denote it by $\gamma(k) = \mathrm{Cov}(X_t, X_{t+k})$. The Fourier transform of the autocovariance function,
$$g(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma(k) \exp(-ik\lambda), \tag{2.3}$$
is called the spectral density function. In the representation (2.1), the spectral density function is written as $g(\lambda) = |A(\lambda)|^2$. It is estimated by the periodogram, defined by $I_T(\lambda) = (2\pi T)^{-1} \bigl| \sum_{t=1}^{T} X_t \exp(-i\lambda t) \bigr|^2$. If one wants to change the weight of each observation, one can insert a function $h(x)$ defined on $[0,1]$ into the periodogram:
$$I_T(\lambda) = \left( 2\pi \sum_{t=1}^{T} h(t/T)^2 \right)^{-1} \left| \sum_{t=1}^{T} h(t/T)\, X_t \exp(-i\lambda t) \right|^2.$$
The function $h(x)$ is called a data taper. Now, we give a simple example of a stationary process.
Example 2.1. Consider the following AR(p) process:
$$\sum_{j=0}^{p} a_j X_{t-j} = \varepsilon_t, \tag{2.4}$$
where the $\varepsilon_t$ are independent random variables with mean zero and variance 1. In the form of (2.1), this is obtained by letting
$$A(\lambda) = \frac{1}{\sqrt{2\pi}} \left( \sum_{j=0}^{p} a_j \exp(-i\lambda j) \right)^{-1}. \tag{2.5}$$
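To make (2.4)-(2.5) concrete, the sketch below evaluates the AR transfer function for a hypothetical AR(1) coefficient (my illustrative choice, not from the text) and checks that the implied spectral density $g(\lambda) = |A(\lambda)|^2$ integrates over $[-\pi, \pi]$ to the known variance $\gamma(0) = 1/(1 - a_1^2)$.

```python
import cmath
import math

def ar_transfer(coeffs, lam):
    """Transfer function (2.5): A(lam) = (2*pi)^(-1/2) / sum_j a_j e^{-i*lam*j}."""
    s = sum(a * cmath.exp(-1j * lam * j) for j, a in enumerate(coeffs))
    return 1.0 / (math.sqrt(2 * math.pi) * s)

def spectral_density(coeffs, lam):
    """Spectral density g(lam) = |A(lam)|^2."""
    return abs(ar_transfer(coeffs, lam)) ** 2

# AR(1): X_t - 0.6 X_{t-1} = eps_t, i.e. coefficients (a_0, a_1) = (1, -0.6).
coeffs = [1.0, -0.6]

# The integral of g over [-pi, pi] equals gamma(0) = 1/(1 - 0.6^2) = 1.5625.
n = 20000
dx = 2 * math.pi / n
integral = sum(spectral_density(coeffs, -math.pi + (i + 0.5) * dx) for i in range(n)) * dx
assert abs(integral - 1.0 / (1 - 0.36)) < 1e-6
```

The midpoint rule converges very fast here because the integrand is smooth and $2\pi$-periodic.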
As an extension of the stationary process, Dahlhaus [13] introduced the concept of local stationarity. An example of a locally stationary process is the following time-varying AR(p) process:
$$\sum_{j=0}^{p} a_j\!\left(\frac{t}{T}\right) X_{t-j,T} = \varepsilon_t, \tag{2.6}$$
where $a_j(u)$ is a function defined on $[0,1]$ and the $\varepsilon_t$ are independent random variables with mean zero and variance 1. If all $a_j(u)$ are constant, the process (2.6) reduces to a stationary one. To define a general class of locally stationary processes, we can naturally consider the time-varying spectral representation
$$X_{t,T} = \int_{-\pi}^{\pi} \exp(i\lambda t)\, A\!\left(\frac{t}{T}, \lambda\right) d\xi(\lambda). \tag{2.7}$$
However, it turns out that (2.6) does not have an exact solution of the form (2.7), but only an approximate one. Therefore, we only require that (2.7) hold approximately. The following is the definition of locally stationary processes given by Dahlhaus [13].
Definition 2.2. A sequence of stochastic processes $X_{t,T}$ $(t = 1, \ldots, T)$ is called locally stationary with mean zero and transfer function $A^{\circ}_{t,T}$ if there exists a representation
$$X_{t,T} = \int_{-\pi}^{\pi} \exp(i\lambda t)\, A^{\circ}_{t,T}(\lambda)\, d\xi(\lambda), \tag{2.8}$$
where the following holds.
(i) $\xi(\lambda)$ is a stochastic process on $[-\pi, \pi]$ with $\xi(-\lambda) = \overline{\xi(\lambda)}$ and
$$\mathrm{cum}\{d\xi(\lambda_1), \ldots, d\xi(\lambda_k)\} = \eta\!\left(\sum_{j=1}^{k} \lambda_j\right) q_k(\lambda_1, \ldots, \lambda_{k-1})\, d\lambda_1 \cdots d\lambda_k, \tag{2.9}$$
where $\mathrm{cum}\{\cdots\}$ denotes the cumulant of $k$th order, $q_1 = 0$, $q_2(\lambda) = 1$, $|q_k(\lambda_1, \ldots, \lambda_{k-1})| \le \mathrm{const}_k$ for all $k$, and $\eta(\lambda) = \sum_{j=-\infty}^{\infty} \delta(\lambda + 2\pi j)$ is the $2\pi$-periodic extension of the Dirac delta function.

(ii) There exist a constant $K$ and a $2\pi$-periodic function $A : [0,1] \times \mathbb{R} \to \mathbb{C}$ with $A(u, -\lambda) = \overline{A(u, \lambda)}$ which satisfies
$$\sup_{t, \lambda} \left| A^{\circ}_{t,T}(\lambda) - A\!\left(\frac{t}{T}, \lambda\right) \right| \le K T^{-1} \tag{2.10}$$
for all $T$; $A(u, \lambda)$ is assumed to be continuous in $u$.
The time-varying spectral density is defined by $g(u, \lambda) := |A(u, \lambda)|^2$. As an estimator of $g(u, \lambda)$, we define the local periodogram $I_N(u, \lambda)$ (for even $N$) as follows:
$$d_N(u, \lambda) = \sum_{s=1}^{N} h\!\left(\frac{s}{N}\right) X_{[uT]-N/2+s,\,T}\, \exp(-i\lambda s),$$
$$H_{k,N} = \sum_{s=1}^{N} h\!\left(\frac{s}{N}\right)^{k},$$
$$I_N(u, \lambda) = \frac{1}{2\pi H_{2,N}}\, |d_N(u, \lambda)|^2. \tag{2.11}$$
Here, $h : \mathbb{R} \to \mathbb{R}$ is a data taper with $h(x) = 0$ for $x \notin [0,1]$. Thus, $I_N(u, \lambda)$ is nothing but the periodogram over a segment of length $N$ with midpoint $[uT]$. The shift from segment to segment is denoted by $S$, which means we calculate $I_N$ with midpoints $t_j = S(j-1) + N/2$ $(j = 1, \ldots, M)$, where $T = S(M-1) + N$, or, written in rescaled time, at time points $u_j := t_j / T$. Hereafter we set $S = 1$ rather than $S = N$, which means the segments overlap each other.
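The definitions in (2.11) translate directly into code. The sketch below is an illustration (the toy series is made up; the cosine-bell taper is the one used later in Section 4): it computes $d_N$, $H_{2,N}$, and $I_N$ for one segment, and checks a Parseval-type identity over the segment's Fourier frequencies.

```python
import cmath
import math

def taper(x):
    """Cosine-bell data taper h(x) = (1/2)(1 - cos(2*pi*x)) on [0, 1], zero outside."""
    return 0.5 * (1 - math.cos(2 * math.pi * x)) if 0 <= x <= 1 else 0.0

def local_periodogram(X, mid, N, lam):
    """I_N(u, lam) of (2.11) over the length-N segment with midpoint index mid = [uT].
    s is 1-based as in the text; X is a 0-based Python list."""
    dN = sum(taper(s / N) * X[mid - N // 2 + s - 1] * cmath.exp(-1j * lam * s)
             for s in range(1, N + 1))
    H2N = sum(taper(s / N) ** 2 for s in range(1, N + 1))
    return abs(dN) ** 2 / (2 * math.pi * H2N)

# Parseval check: sum_j |d_N(u, 2*pi*j/N)|^2 = N * sum_s (h(s/N) X_s')^2, so the
# periodogram ordinates over the Fourier frequencies sum to N*energy/(2*pi*H_{2,N}).
X = [math.sin(0.3 * t) + 0.1 * t for t in range(50)]   # toy data
N, mid = 16, 25
total = sum(local_periodogram(X, mid, N, 2 * math.pi * j / N) for j in range(N))
energy = sum((taper(s / N) * X[mid - N // 2 + s - 1]) ** 2 for s in range(1, N + 1))
H2N = sum(taper(s / N) ** 2 for s in range(1, N + 1))
assert abs(total - N * energy / (2 * math.pi * H2N)) < 1e-8
```

Computing $I_N$ at every midpoint $t_j = S(j-1) + N/2$ with $S = 1$ then gives the overlapping-segment estimates used in (3.10).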
3. Empirical Likelihood Approach for Non-Gaussian Locally Stationary Processes
Consider an inference on a parameter $\theta \in \Theta \subset \mathbb{R}^q$ based on an observed stretch $X_{1,T}, \ldots, X_{T,T}$. We suppose that information about $\theta$ exists through a system of general estimating equations. For short- or long-memory processes, Nordman and Lahiri [10] supposed that $\theta_0$, the true value of $\theta$, is specified by the following spectral moment condition:
$$\int_{-\pi}^{\pi} \phi(\lambda, \theta_0)\, g(\lambda)\, d\lambda = 0, \tag{3.1}$$
where $\phi(\lambda, \theta)$ is an appropriate function depending on $\theta$. Following this manner, we naturally suppose that $\theta_0$ satisfies the time-varying spectral moment condition
$$\int_{0}^{1} \int_{-\pi}^{\pi} \phi(u, \lambda, \theta_0)\, g(u, \lambda)\, d\lambda\, du = 0 \tag{3.2}$$
in the locally stationary setting. Here $\phi : [0,1] \times [-\pi, \pi] \times \mathbb{R}^q \to \mathbb{C}^q$ is a function depending on $\theta$ that satisfies Assumption 3.4(i). We give brief examples of $\phi$ and the corresponding $\theta_0$.
Example 3.1 (autocorrelation). Let us set
$$\phi(u, \lambda, \theta) = \theta - \exp(i\lambda k). \tag{3.3}$$
Then (3.2) leads to
$$\theta_0 = \frac{\int_0^1 \int_{-\pi}^{\pi} \exp(i\lambda k)\, g(u, \lambda)\, d\lambda\, du}{\int_0^1 \int_{-\pi}^{\pi} g(u, \lambda)\, d\lambda\, du}. \tag{3.4}$$
When we consider the stationary case, that is, when $g(u, \lambda)$ does not depend on the time parameter $u$, (3.4) becomes
$$\theta_0 = \frac{\int_{-\pi}^{\pi} \exp(i\lambda k)\, g(\lambda)\, d\lambda}{\int_{-\pi}^{\pi} g(\lambda)\, d\lambda} = \frac{\gamma(k)}{\gamma(0)} = \rho(k), \tag{3.5}$$
which is the autocorrelation with lag $k$. So, (3.4) can be interpreted as a kind of autocorrelation with lag $k$ for locally stationary processes.
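This interpretation of (3.4) can be verified numerically. The sketch below is illustrative (the grid sizes and the closed-form cross-check are my own): it evaluates (3.4) for the time-varying AR(1) spectral density $g(u,\lambda) = (2\pi)^{-1}|1 - a(u)e^{-i\lambda}|^{-2}$, recovering the lag-one autocorrelation in the stationary case and a value close to the $\theta_0 = 0.085$ reported for $b = 0.5$ in Section 4.

```python
import cmath
import math

def g_tvar1(a_u, lam):
    """Time-varying AR(1) spectral density g(u, lam) = (2*pi)^(-1) |1 - a(u) e^{-i*lam}|^{-2}."""
    return 1.0 / (2 * math.pi * abs(1 - a_u * cmath.exp(-1j * lam)) ** 2)

def theta0(a, k=1, nu=400, nlam=400):
    """Midpoint-rule evaluation of (3.4) with lag k for a time-varying AR(1) coefficient a(u)."""
    du, dlam = 1.0 / nu, 2 * math.pi / nlam
    num = den = 0.0
    for i in range(nu):
        a_u = a((i + 0.5) * du)
        for j in range(nlam):
            lam = -math.pi + (j + 0.5) * dlam
            g = g_tvar1(a_u, lam)
            num += math.cos(lam * k) * g   # the imaginary part cancels by symmetry
            den += g
    return num / den                       # the common factor du*dlam cancels in the ratio

# Stationary sanity check: constant a(u) = 0.6 gives the AR(1) lag-1 autocorrelation 0.6.
assert abs(theta0(lambda u: 0.6) - 0.6) < 1e-3

# Time-varying case a(u) = (u - 0.5)^2 used in Section 4 (b = 0.5): since
# gamma_u(1)/gamma_u(0) = a(u)/(1 - a(u)^2), theta0 comes out close to 0.085.
assert abs(theta0(lambda u: (u - 0.5) ** 2) - 0.085) < 0.002
```

The innovation variance cancels in the ratio (3.4), so any unit-variance normalization gives the same value.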
Example 3.2 (Whittle estimation). Consider the problem of fitting a parametric spectral model to the true spectral density by minimizing the disparity between them. For stationary processes, this problem is considered in Hosoya and Taniguchi [14] and Kakizawa [15]. For locally stationary processes, the disparity between the parametric model $g_\theta(u, \lambda)$ and the true spectral density $g(u, \lambda)$ is measured by
$$\mathcal{L}(\theta) = \frac{1}{4\pi} \int_0^1 \int_{-\pi}^{\pi} \left\{ \log g_\theta(u, \lambda) + \frac{g(u, \lambda)}{g_\theta(u, \lambda)} \right\} d\lambda\, du, \tag{3.6}$$
and we seek the minimizer
$$\theta_0 = \arg\min_{\theta \in \Theta} \mathcal{L}(\theta). \tag{3.7}$$
Under appropriate conditions, $\theta_0$ in (3.7) is obtained by solving the equation $\partial \mathcal{L}(\theta)/\partial \theta = 0$. Suppose that the fitted model is described as $g_\theta(u, \lambda) = \sigma^2(u) f_\theta(u, \lambda)$, which means $\theta$ is free from the innovation part $\sigma^2(u)$. Then, by Kolmogorov's formula (Dahlhaus [11, Theorem 3.2]), we can see that $\int_{-\pi}^{\pi} \log g_\theta(u, \lambda)\, d\lambda$ does not depend on $\theta$. So the first-order condition on $\theta_0$ becomes
$$\int_0^1 \int_{-\pi}^{\pi} \frac{\partial}{\partial \theta}\, g_\theta(u, \lambda)^{-1} \Big|_{\theta = \theta_0}\, g(u, \lambda)\, d\lambda\, du = 0. \tag{3.8}$$
This is the case when we set
$$\phi(u, \lambda, \theta) = \frac{\partial}{\partial \theta}\, g_\theta(u, \lambda)^{-1}. \tag{3.9}$$
Now, we set
$$m_j(\theta) = \int_{-\pi}^{\pi} \phi(u_j, \lambda, \theta)\, I_N(u_j, \lambda)\, d\lambda \qquad (j = 1, \ldots, M) \tag{3.10}$$
as estimating functions and use the empirical likelihood ratio function $R(\theta)$ defined by
$$R(\theta) = \max_{w} \left\{ \prod_{j=1}^{M} M w_j \;\middle|\; \sum_{j=1}^{M} w_j m_j(\theta) = 0,\; w_j \ge 0,\; \sum_{j=1}^{M} w_j = 1 \right\}. \tag{3.11}$$
Denote by $\hat{\theta}$ the maximum empirical likelihood estimator, which maximizes the empirical likelihood ratio function $R(\theta)$.
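Computing $R(\theta)$ in (3.11) is a convex problem whose standard Lagrange-dual solution is $w_j = 1/(M(1 + t\, m_j))$, with $t$ solving $\sum_j m_j/(1 + t\, m_j) = 0$ (Owen's method). The scalar sketch below is a generic illustration with made-up values of $m_j$, not the paper's frequency-domain statistic.

```python
import math

def el_log_ratio(m):
    """log R for scalar estimating-function values m_1..m_M as in (3.11).
    Duality gives w_j = 1/(M(1 + t*m_j)) with sum_j m_j/(1 + t*m_j) = 0,
    hence log R = -sum_j log(1 + t*m_j)."""
    lo, hi = min(m), max(m)
    if lo >= 0 or hi <= 0:            # zero outside the convex hull: R(theta) = 0
        return float("-inf")
    # the dual equation is strictly decreasing in t on the feasible interval
    a, b = -1.0 / hi + 1e-10, -1.0 / lo - 1e-10
    f = lambda t: sum(mj / (1 + t * mj) for mj in m)
    for _ in range(200):              # bisection
        t = (a + b) / 2
        if f(t) > 0:
            a = t
        else:
            b = t
    return -sum(math.log(1 + t * mj) for mj in m)

# When the m_j already average to zero, uniform weights are optimal and R = 1.
m = [-2.0, -1.0, 0.5, 2.5]            # sums to zero
assert abs(el_log_ratio(m)) < 1e-6

# Shifting the values away from zero strictly decreases R.
assert el_log_ratio([x + 0.5 for x in m]) < -1e-3
```

In the paper's setting the test statistic is $-\pi^{-1}\log R(\theta_0)$ with the nonstandard limit of Theorem 3.7 below, rather than the usual $-2\log R$ chi-square calibration.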
Remark 3.3. We can also use the following alternative estimating function:
$$m_j^{(T)}(\theta) = \frac{2\pi}{T} \sum_{t=1}^{T} \phi(u_j, \Delta_t, \theta)\, I_N(u_j, \Delta_t) \qquad \left( \Delta_t = \frac{2\pi t}{T} \right) \tag{3.12}$$
instead of $m_j(\theta)$ in (3.10). The asymptotic equivalence of $m_j(\theta)$ and $m_j^{(T)}(\theta)$ can be proven if
$$E\left| m_j^{(T)}(\theta) - m_j(\theta) \right| = o(1) \tag{3.13}$$
is satisfied for every $j$, and this is shown by a straightforward calculation.
To show the asymptotic properties of $\hat{\theta}$ and $R(\theta_0)$, we impose the following assumption.
Assumption 3.4. (i) The functions $A(u, \lambda)$ and $\phi(u, \lambda, \theta)$ are $2\pi$-periodic in $\lambda$, and the periodic extensions are differentiable in $u$ and $\lambda$ with uniformly bounded derivative $(\partial/\partial u)(\partial/\partial \lambda) A$ ($\phi$, resp.).

(ii) The parameters $N$ and $T$ fulfill the relations $T^{1/4} \ll N \ll T^{1/2}/\log T$.

(iii) The data taper $h : \mathbb{R} \to \mathbb{R}$ with $h(x) = 0$ for all $x \notin (0, 1)$ is continuous on $\mathbb{R}$ and twice differentiable at all $x \notin P$, where $P$ is a finite set, and $\sup_{x \notin P} |h''(x)| < \infty$.

(iv) For $k = 1, \ldots, 8$,
$$q_k(\lambda_1, \ldots, \lambda_{k-1}) = c_k \quad \text{(constant)}. \tag{3.14}$$
Remark 3.5. Assumption 3.4(ii) seems restrictive. However, it is required in order to use the central limit theorem for locally stationary processes (cf. Assumption A.1 and Theorem A.2 of Dahlhaus [13]); most of the restrictions on $N$ result from the $\sqrt{T}$-unbiasedness in the central limit theorem. See also the A.3 Remarks of Dahlhaus [13] for details.
Now we give the following theorem.
Theorem 3.6. Suppose that Assumption 3.4 holds and that $X_{1,T}, \ldots, X_{T,T}$ is a realization of the locally stationary process with representation (2.8). Then,
$$\sqrt{M}\,\bigl(\hat{\theta} - \theta_0\bigr) \xrightarrow{d} N(0, \Sigma) \tag{3.15}$$
as $T \to \infty$, where
$$\Sigma = 4\pi \left( \Sigma_3' \Sigma_2^{-1} \Sigma_3 \right)^{-1} \Sigma_3' \Sigma_2^{-1} \Sigma_1 \Sigma_2^{-1} \Sigma_3 \left( \Sigma_3' \Sigma_2^{-1} \Sigma_3 \right)^{-1}. \tag{3.16}$$
Here $\Sigma_1$ and $\Sigma_2$ are the $q \times q$ matrices whose $(i,j)$ elements are
$$(\Sigma_1)_{ij} = \frac{1}{2\pi} \int_0^1 \left[ \int_{-\pi}^{\pi} \phi_i(u, \lambda, \theta_0) \left\{ \phi_j(u, \lambda, \theta_0) + \phi_j(u, -\lambda, \theta_0) \right\} g(u, \lambda)^2\, d\lambda + c_4 \int_{-\pi}^{\pi} \phi_i(u, \lambda, \theta_0)\, g(u, \lambda)\, d\lambda \int_{-\pi}^{\pi} \phi_j(u, \mu, \theta_0)\, g(u, \mu)\, d\mu \right] du, \tag{3.17}$$
$$(\Sigma_2)_{ij} = \frac{1}{2\pi} \int_0^1 \left[ \int_{-\pi}^{\pi} \phi_i(u, \lambda, \theta_0) \left\{ \phi_j(u, \lambda, \theta_0) + \phi_j(u, -\lambda, \theta_0) \right\} g(u, \lambda)^2\, d\lambda + \int_{-\pi}^{\pi} \phi_i(u, \lambda, \theta_0)\, g(u, \lambda)\, d\lambda \int_{-\pi}^{\pi} \phi_j(u, \mu, \theta_0)\, g(u, \mu)\, d\mu \right] du, \tag{3.18}$$
respectively, and $\Sigma_3$ is the $q \times q$ matrix defined as
$$\Sigma_3 = \int_0^1 \int_{-\pi}^{\pi} \frac{\partial \phi(u, \lambda, \theta)}{\partial \theta'}\, g(u, \lambda)\, d\lambda\, du. \tag{3.19}$$
In addition, we give the following theorem on the asymptotic property of the empirical likelihood ratio $R(\theta_0)$.
Theorem 3.7. Suppose that Assumption 3.4 holds and that $X_{1,T}, \ldots, X_{T,T}$ is a realization of a locally stationary process with representation (2.8). Then,
$$-\frac{1}{\pi} \log R(\theta_0) \xrightarrow{d} (F\mathbf{N})'(F\mathbf{N}) \tag{3.20}$$
as $T \to \infty$, where $\mathbf{N}$ is a $q$-dimensional normal random vector with zero mean vector and covariance matrix $I_q$ (the identity matrix), and $F = \Sigma_2^{-1/2} \Sigma_1^{1/2}$. Here $\Sigma_1$ and $\Sigma_2$ are the same matrices as in Theorem 3.6.
Remark 3.8. Denote the eigenvalues of $F'F$ by $a_1, \ldots, a_q$; then we can write
$$(F\mathbf{N})'(F\mathbf{N}) = \sum_{i=1}^{q} Z_i, \tag{3.21}$$
where the $Z_i$ are independently distributed as $\mathrm{Gamma}(1/2,\, 1/(2a_i))$.
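Remark 3.8 can be checked by simulation: $Z_i = a_i \chi^2_1$, so the quadratic form $(F\mathbf{N})'(F\mathbf{N})$ has mean $\mathrm{tr}(F'F) = \sum_i a_i$, matching the mean of the Gamma mixture. The 2x2 matrix $F$ below is hypothetical.

```python
import math
import random

# A hypothetical 2x2 matrix F; the eigenvalues of F'F determine the Gamma mixture.
F = [[1.0, 0.3], [0.0, 0.7]]
FtF = [[sum(F[k][i] * F[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# Eigenvalues of the symmetric 2x2 matrix F'F.
tr = FtF[0][0] + FtF[1][1]
det = FtF[0][0] * FtF[1][1] - FtF[0][1] * FtF[1][0]
disc = math.sqrt(tr * tr / 4 - det)
a1, a2 = tr / 2 + disc, tr / 2 - disc

# Simulated mean of (FN)'(FN) should be close to trace(F'F) = a1 + a2.
random.seed(0)
n = 200000
acc = 0.0
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    y1 = F[0][0] * z1 + F[0][1] * z2
    y2 = F[1][0] * z1 + F[1][1] * z2
    acc += y1 * y1 + y2 * y2
assert abs(acc / n - (a1 + a2)) < 0.05

# Each Z_i = a_i * chi^2_1 is Gamma with shape 1/2 and rate 1/(2 a_i), mean a_i,
# so the mixture mean is a1 + a2 = tr(F'F) exactly.
assert abs((a1 + a2) - tr) < 1e-12
```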
Remark 3.9. If the process is stationary, that is, if the time-varying spectral density does not depend on the time parameter $u$, we can easily see that $\Sigma_1 = \Sigma_2$, and the asymptotic distribution becomes the chi-square with $q$ degrees of freedom.
Remark 3.10. In our setting, the number of estimating equations and the number of parameters are equal. In that case, the empirical likelihood ratio at the maximum empirical likelihood estimator, $R(\hat{\theta})$, becomes one (cf. [3, page 305]). That means the test statistic in Theorem 3.7 becomes zero when we evaluate it at the maximum empirical likelihood estimator.
4. Numerical Example
In this section, we present simulation results for the estimation of the autocorrelation in locally stationary processes stated in Example 3.1. Consider the following time-varying AR(1) process:
$$X_{t,T} - a\!\left(\frac{t}{T}\right) X_{t-1,T} = \varepsilon_t \quad \text{for } t \in \mathbb{Z}, \tag{4.1}$$
where $\varepsilon_t \stackrel{\text{i.i.d.}}{\sim} \mathrm{Gamma}(3/\pi, (3/\pi)^{1/2}) - (3/\pi)^{1/2}$ and $a(u) = (u - b)^2$, $b = 0.1, 0.5, 0.9$. The observations $X_{1,T}, \ldots, X_{T,T}$ are generated from the process (4.1), and we construct confidence intervals for the autocorrelation with lag $k = 1$, which is expressed as
$$\theta_0 = \frac{\int_0^1 \int_{-\pi}^{\pi} e^{i\lambda}\, g(u, \lambda)\, d\lambda\, du}{\int_0^1 \int_{-\pi}^{\pi} g(u, \lambda)\, d\lambda\, du}, \tag{4.2}$$
based on the result of Theorem 3.7. Several combinations of the sample size $T$ and the window length $N$ are chosen: $(T, N) = (100, 10), (500, 10), (500, 50), (1000, 10), (1000, 100)$, and the data taper is set as $h(x) = (1/2)\{1 - \cos(2\pi x)\}$. Then we calculate the values of the test statistic $-\pi^{-1}\log R(\theta)$ at numerous points $\theta$ and obtain confidence intervals by
Table 1: 90% confidence intervals of the autocorrelation with lag k = 1.

(T, N)        Lower bound   Upper bound   Interval length   Successful rate
b = 0.1, θ0 = 0.308
(100, 10)        0.057         0.439          0.382             0.854
(500, 10)        0.172         0.382          0.210             0.866
(500, 50)        0.203         0.332          0.129             0.578
(1000, 10)       0.203         0.356          0.154             0.826
(1000, 100)      0.225         0.308          0.084             0.444
b = 0.5, θ0 = 0.085
(100, 10)       −0.087         0.225          0.312             0.890
(500, 10)        0.001         0.169          0.168             0.910
(500, 50)        0.028         0.104          0.076             0.515
(1000, 10)       0.023         0.139          0.116             0.922
(1000, 100)      0.047         0.087          0.040             0.384
b = 0.9, θ0 = 0.308
(100, 10)        0.060         0.449          0.388             0.841
(500, 10)        0.176         0.393          0.216             0.871
(500, 50)        0.201         0.332          0.131             0.586
(1000, 10)       0.203         0.359          0.156             0.827
(1000, 100)      0.226         0.310          0.083             0.467
collecting the points $\theta$ which satisfy $-\pi^{-1}\log R(\theta) < z_\alpha$, where $z_\alpha$ is the $\alpha$-percentile of the asymptotic distribution in Theorem 3.7. We admit that Assumption 3.4(ii) is hard to satisfy in a finite-sample experiment, but this Monte Carlo simulation is purely illustrative and is intended to investigate how the sample size and the window length affect the confidence intervals.
We set the confidence level as α = 0.90 and carry out the above procedure 1000 times for each case. Table 1 shows the averages of the lower and upper bounds, the lengths of the intervals, and the successful rates. Looking at the results, we find that a larger sample size gives a shorter interval, as expected. Furthermore, the results indicate that a larger window length leads to a worse successful rate. We can predict that the best ratio N/T lies around 0.02 because the combination (T, N) = (500, 10) seems to give the best result among all.
5. Proofs
5.1. Some Lemmas
In this subsection we give three lemmas used to prove Theorems 3.6 and 3.7. First of all, we introduce the function $L_N : \mathbb{R} \to \mathbb{R}$, defined as the $2\pi$-periodic extension of
$$L_N(\alpha) := \begin{cases} N, & |\alpha| \le \dfrac{1}{N}, \\[1ex] \dfrac{1}{|\alpha|}, & \dfrac{1}{N} \le |\alpha| \le \pi. \end{cases} \tag{5.1}$$
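The property of $L_N$ exploited in (5.7) below is that its integral over one period grows only like $\log N$: for (5.1) one computes exactly $\int_{-\pi}^{\pi} L_N(\alpha)\, d\alpha = 2(1 + \log(\pi N))$. The sketch below (a numerical illustration of my own) confirms this.

```python
import math

def L_N(alpha, N):
    """The 2*pi-periodic extension of (5.1)."""
    a = math.fmod(alpha + math.pi, 2 * math.pi)   # reduce to the principal interval
    if a <= 0:
        a += 2 * math.pi
    a = abs(a - math.pi)                           # now a = |alpha mod 2*pi| in [0, pi]
    return N if a <= 1.0 / N else 1.0 / a

# int_{-pi}^{pi} L_N = 2*[N*(1/N) + int_{1/N}^{pi} da/a] = 2*(1 + log(pi*N)).
for N in (8, 32, 128):
    n = 200000
    dx = 2 * math.pi / n
    integral = sum(L_N(-math.pi + (i + 0.5) * dx, N) for i in range(n)) * dx
    assert abs(integral - 2 * (1 + math.log(math.pi * N))) < 1e-2
```

This logarithmic growth is what turns the $(k-1)$-fold convolution bounds in the proof below into the $O((\log N)^{k-1})$ error terms.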
The properties of the function LN are described in Lemma A.4 of Dahlhaus [13].
Lemma 5.1. Suppose (3.2) and Assumption 3.4 hold. Then for $1 \le k \le 8$,
$$\begin{aligned}
\mathrm{cum}\{d_N(u_1, \lambda_1), &\ldots, d_N(u_k, \lambda_k)\} \\
&= (2\pi)^{k-1} c_k \left\{ \prod_{j=1}^{k} A(u_j, \lambda_j) \right\} \exp\left\{ -i \sum_{j=1}^{k} \lambda_j \bigl( [u_k T] - [u_j T] \bigr) \right\} \\
&\quad \times \sum_{s=1}^{N} \left\{ \prod_{j=1}^{k} h\!\left( \frac{s + [u_k T] - [u_j T]}{N} \right) \right\} \exp\left\{ -i \left( \sum_{j=1}^{k} \lambda_j \right) s \right\} + O\!\left( \frac{N^2}{T} \right) + O\!\left( (\log N)^{k-1} \right) \\
&= O\!\left( L_N\!\left( \sum_{j=1}^{k} \lambda_j \right) \right) + O\!\left( \frac{N^2}{T} \right) + O\!\left( (\log N)^{k-1} \right).
\end{aligned} \tag{5.2}$$
Proof. Let Π = (−π, π] and let ω = (ω₁, …, ω_k). Since
\[
\mathrm{cum}(X_{t_1,T},\dots,X_{t_k,T}) = c_k \int_{\Pi^k}
\exp\Biggl(i\sum_{j=1}^{k}\omega_j t_j\Biggr)
\Biggl(\prod_{j=1}^{k} A^{\circ}_{t_j,T}(\omega_j)\Biggr)
\eta\Biggl(\sum_{j=1}^{k}\omega_j\Biggr)\, d\omega, \tag{5.3}
\]
the kth cumulant of d_N is equal to
\[
\begin{aligned}
c_k \int_{\Pi^k} &\exp\Biggl\{ i\sum_{j=1}^{k}\omega_j\Bigl([u_jT]-\frac{N}{2}\Bigr)\Biggr\}
\eta\Biggl(\sum_{j=1}^{k}\omega_j\Biggr) \\
&\times \prod_{j=1}^{k} \sum_{s=1}^{N} h\Bigl(\frac{s}{N}\Bigr)
A^{\circ}_{[u_jT]-N/2+s,T}(\omega_j)\exp\bigl\{-i(\lambda_j-\omega_j)s\bigr\}\, d\omega.
\end{aligned} \tag{5.4}
\]
As in the proof of Theorem 2.2 of Dahlhaus [12] we replace $A^{\circ}_{[u_1T]-N/2+s_1,T}(\omega_1)$ by $A(u_1+(-N/2+s_1)/T,\lambda_1)$, and we obtain
\[
\Biggl| \sum_{s=1}^{N} h\Bigl(\frac{s}{N}\Bigr)\Bigl\{A^{\circ}_{[u_1T]-N/2+s,T}(\omega_1)
- A\Bigl(u_1+\frac{-N/2+s}{T},\lambda_1\Bigr)\Bigr\}\exp\{-i(\lambda_1-\omega_1)s\}\Biggr| \le K \tag{5.5}
\]
with some constant K, while
\[
\Biggl| \sum_{s=1}^{N} h\Bigl(\frac{s}{N}\Bigr)A^{\circ}_{[u_jT]-N/2+s,T}(\omega_j)
\exp\{-i(\lambda_j-\omega_j)s\}\Biggr| \le K L_N(\lambda_j-\omega_j) \tag{5.6}
\]
for j = 2, …, k. The replacement error is smaller than
\[
K \int_{\Pi^k} \prod_{j=2}^{k} L_N(\lambda_j-\omega_j)\, d\omega \le K(\log N)^{k-1}. \tag{5.7}
\]
In the same way we replace $A^{\circ}_{[u_jT]-N/2+s_j,T}(\omega_j)$ by $A(u_j+(-N/2+s_j)/T,\lambda_j)$ for j = 2, …, k, and then we obtain
\[
\begin{aligned}
& c_k \sum_{s_1,\dots,s_k=1}^{N}
\Biggl\{\prod_{j=1}^{k} h\Bigl(\frac{s_j}{N}\Bigr)
A\Bigl(u_j+\frac{-N/2+s_j}{T},\lambda_j\Bigr)\Biggr\}
\exp\Biggl(-i\sum_{j=1}^{k}\lambda_j s_j\Biggr) \\
&\quad\times \int_{\Pi^k} \eta\Biggl(\sum_{j=1}^{k}\omega_j\Biggr)
\exp\Biggl\{ i\sum_{j=1}^{k}\omega_j\Bigl([u_jT]-\frac{N}{2}+s_j\Bigr)\Biggr\}\, d\omega
+ O\bigl((\log N)^{k-1}\bigr).
\end{aligned} \tag{5.8}
\]
The integral part is equal to
\[
\prod_{j=1}^{k-1} \int_{\Pi} \exp\bigl\{ i\omega_j\bigl([u_jT]-[u_kT]+s_j-s_k\bigr)\bigr\}\, d\omega_j. \tag{5.9}
\]
So we get
\[
\begin{aligned}
&(2\pi)^{k-1} c_k \sum_{s=1}^{N}
\Biggl\{\prod_{j=1}^{k} h\Biggl(\frac{s+[u_kT]-[u_jT]}{N}\Biggr)
A\Biggl(u_j+\frac{-N/2+s+[u_kT]-[u_jT]}{T},\lambda_j\Biggr)\Biggr\} \\
&\quad\times \exp\Biggl\{-i\sum_{j=1}^{k}\lambda_j\bigl(s+[u_kT]-[u_jT]\bigr)\Biggr\}
+ O\bigl((\log N)^{k-1}\bigr).
\end{aligned} \tag{5.10}
\]
Since h(x) = 0 for x ∉ (0, 1), we only have to consider the range of s which satisfies 1 ≤ s + [u_kT] − [u_jT] ≤ N − 1. Therefore we can regard (−N/2 + s + [u_kT] − [u_jT])/T as O(N/T), and a Taylor expansion of A around u_j gives the first equation of the desired result. Moreover, in the same manner as the proof of Lemma A.5 of Dahlhaus [13], we can see that
\[
\sum_{s=1}^{N}\Biggl\{\prod_{j=1}^{k} h\Biggl(\frac{s+[u_kT]-[u_jT]}{N}\Biggr)\Biggr\}
\exp\Biggl\{-i\Biggl(\sum_{j=1}^{k}\lambda_j\Biggr)s\Biggr\}
= O\Biggl(L_N\Biggl(\sum_{j=1}^{k}\lambda_j\Biggr)\Biggr), \tag{5.11}
\]
which leads to the second equation.
Lemma 5.2. Suppose (3.2) and Assumption 3.4 hold. Then,
\[
P_M := \frac{1}{2\pi\sqrt{M}} \sum_{j=1}^{M} m_j(\theta_0)
\xrightarrow{d} N(0,\Sigma_1). \tag{5.12}
\]
Proof. We set
\[
J_T(\phi) := \frac{1}{M}\sum_{j=1}^{M}\int_{-\pi}^{\pi}
\phi(u_j,\lambda,\theta_0)\, I_N(u_j,\lambda)\, d\lambda, \qquad
J(\phi) := \int_{0}^{1}\int_{-\pi}^{\pi}\phi(u,\lambda,\theta_0)\, g(u,\lambda)\, d\lambda\, du. \tag{5.13}
\]
Henceforth we denote φ(u, λ, θ₀) by φ(u, λ) for simplicity. This lemma is proved by proving the convergence of the cumulants of all orders. Due to Lemma A.8 of Dahlhaus [13], the expectation of P_M is equal to
\[
\frac{\sqrt{M}}{2\pi}\bigl\{ J(\phi) + o\bigl(T^{-1/2}\bigr)\bigr\}. \tag{5.14}
\]
By (3.2) and O(M) = O(T), this converges to zero.

Next, we calculate the covariance of P_M. From the relation T = M + N − 1 we can rewrite
\[
P_M = \frac{\sqrt{M}}{\sqrt{T}}\,\frac{\sqrt{T}}{2\pi}\, J_T(\phi)
= \sqrt{1-\frac{N-1}{T}}\;\frac{\sqrt{T}}{2\pi}\, J_T(\phi). \tag{5.15}
\]
Then the (α, β)-element of the covariance matrix of P_M is equal to
\[
\frac{1}{(2\pi)^2}\Bigl(1-\frac{N-1}{T}\Bigr) T\,
\mathrm{cov}\bigl\{ J_T(\phi_\alpha), J_T(\phi_\beta)\bigr\}. \tag{5.16}
\]
Due to Lemma A.9 of Dahlhaus [13], this converges to
\[
\begin{aligned}
\frac{1}{2\pi}\int_{0}^{1}\Biggl[ &\int_{-\pi}^{\pi}
\phi_\alpha(u,\lambda)\bigl\{\phi_\beta(u,\lambda)+\phi_\beta(u,-\lambda)\bigr\}\, g(u,\lambda)^2\, d\lambda \\
&+ \iint_{-\pi}^{\pi} \phi_\alpha(u,\lambda)\phi_\beta(u,\mu)\,
g(u,\lambda)\, g(u,\mu)\, q_4(\lambda,-\lambda,\mu)\, d\lambda\, d\mu \Biggr] du.
\end{aligned} \tag{5.17}
\]
By Assumption 3.4(iv) the covariance tends to Σ₁. The kth cumulant for k ≥ 3 tends to zero due to Lemma A.10 of Dahlhaus [13]. Then we obtain the desired result.
Lemma 5.3. Suppose (3.2) and Assumption 3.4 hold. Then,
\[
S_M := \frac{1}{2\pi M}\sum_{j=1}^{M} m_j(\theta_0)\, m_j(\theta_0)'
\xrightarrow{p} \Sigma_2. \tag{5.18}
\]
Proof. First we calculate the mean of the (α, β)-element of S_M:
\[
\begin{aligned}
E\Biggl[\frac{1}{2\pi M}\sum_{j=1}^{M} m_j(\theta_0)\, m_j(\theta_0)'\Biggr]_{\alpha\beta}
&= \frac{1}{2\pi M}\sum_{j=1}^{M}\iint_{-\pi}^{\pi}
\phi_\alpha(u_j,\lambda)\phi_\beta(u_j,\mu)\,
E\bigl[ I_N(u_j,\lambda)\, I_N(u_j,\mu)\bigr]\, d\lambda\, d\mu \\
&= \frac{1}{2\pi M}\sum_{j=1}^{M}\iint_{-\pi}^{\pi}
\phi_\alpha(u_j,\lambda)\phi_\beta(u_j,\mu) \\
&\qquad\times\bigl[\mathrm{cov}\bigl\{ I_N(u_j,\lambda), I_N(u_j,\mu)\bigr\}
+ E I_N(u_j,\lambda)\, E I_N(u_j,\mu)\bigr]\, d\lambda\, d\mu.
\end{aligned} \tag{5.19}
\]
Due to Dahlhaus [12, Theorem 2.2(i)] the second term of (5.19) becomes
\[
\begin{aligned}
&\frac{1}{2\pi M}\sum_{j=1}^{M}\int_{-\pi}^{\pi}\phi_\alpha(u_j,\lambda)
\Bigl\{ g(u_j,\lambda)+O\Bigl(\frac{N^2}{T^2}\Bigr)+O\Bigl(\frac{\log N}{N}\Bigr)\Bigr\}\, d\lambda \\
&\qquad\times\int_{-\pi}^{\pi}\phi_\beta(u_j,\mu)
\Bigl\{ g(u_j,\mu)+O\Bigl(\frac{N^2}{T^2}\Bigr)+O\Bigl(\frac{\log N}{N}\Bigr)\Bigr\}\, d\mu \\
&\quad= \frac{1}{2\pi}\int_{0}^{1}\Biggl\{\int_{-\pi}^{\pi}\phi_\alpha(u,\lambda)\, g(u,\lambda)\, d\lambda
\int_{-\pi}^{\pi}\phi_\beta(u,\mu)\, g(u,\mu)\, d\mu\Biggr\}\, du
+ O\Bigl(\frac{1}{M}\Bigr)+O\Bigl(\frac{N^2}{T^2}\Bigr)+O\Bigl(\frac{\log N}{N}\Bigr).
\end{aligned} \tag{5.20}
\]
Next we consider
\[
\begin{aligned}
\mathrm{cov}\bigl\{ I_N(u_j,\lambda), I_N(u_j,\mu)\bigr\}
&= \frac{1}{(2\pi H_{2,N})^2}\bigl[
\mathrm{cum}\{ d_N(u_j,\lambda), d_N(u_j,\mu)\}\,
\mathrm{cum}\{ d_N(u_j,-\lambda), d_N(u_j,-\mu)\} \\
&\quad+ \mathrm{cum}\{ d_N(u_j,\lambda), d_N(u_j,-\mu)\}\,
\mathrm{cum}\{ d_N(u_j,-\lambda), d_N(u_j,\mu)\} \\
&\quad+ \mathrm{cum}\{ d_N(u_j,\lambda), d_N(u_j,-\lambda), d_N(u_j,\mu), d_N(u_j,-\mu)\}\bigr].
\end{aligned} \tag{5.21}
\]
We calculate the three terms separately. From Lemma 5.1 the first term of (5.21) is equal to
\[
\begin{aligned}
\frac{1}{(2\pi H_{2,N})^2}
&\Biggl\{2\pi A(u_j,\lambda)\, A(u_j,\mu)\sum_{s=1}^{N} h\Bigl(\frac{s}{N}\Bigr)^2
\exp\{-i(\lambda+\mu)s\} + O\Bigl(\frac{N^2}{T}\Bigr)+O(\log N)\Biggr\} \\
\times &\Biggl\{2\pi A(u_j,-\lambda)\, A(u_j,-\mu)\sum_{s=1}^{N} h\Bigl(\frac{s}{N}\Bigr)^2
\exp\{ i(\lambda+\mu)s\} + O\Bigl(\frac{N^2}{T}\Bigr)+O(\log N)\Biggr\}.
\end{aligned} \tag{5.22}
\]
It converges to zero when λ ≠ −μ and is equal to
\[
g(u_j,\lambda)^2 + O\Bigl(\frac{N}{T}\Bigr)+O\Bigl(\frac{\log N}{N}\Bigr) \tag{5.23}
\]
when λ = −μ. Similarly, the second term of (5.21) converges to zero when λ ≠ μ and is equal to (5.23) when λ = μ. We can also apply Lemma 5.1 to the third term of (5.21), and an analogous calculation shows that it converges to zero. Altogether we can see that (5.19) converges to (Σ₂)_{αβ}, the (α, β)-element of Σ₂.
Next we calculate the second-order cumulant:
\[
\mathrm{cum}\Biggl\{\Biggl[\frac{1}{2\pi M}\sum_{j=1}^{M} m_j(\theta_0)\, m_j(\theta_0)'\Biggr]_{\alpha_1\beta_1},
\Biggl[\frac{1}{2\pi M}\sum_{j=1}^{M} m_j(\theta_0)\, m_j(\theta_0)'\Biggr]_{\alpha_2\beta_2}\Biggr\}. \tag{5.24}
\]
This is equal to
\[
\begin{aligned}
&(2\pi M)^{-2}(2\pi H_{2,N})^{-4}
\sum_{j_1=1}^{M}\sum_{j_2=1}^{M}
\idotsint_{-\pi}^{\pi}
\phi_{\alpha_1}(u_{j_1},\lambda_1)\phi_{\beta_1}(u_{j_1},\mu_1)
\phi_{\alpha_2}(u_{j_2},\lambda_2)\phi_{\beta_2}(u_{j_2},\mu_2) \\
&\quad\times\mathrm{cum}\bigl\{ d_N(u_{j_1},\lambda_1)\, d_N(u_{j_1},-\lambda_1)\,
d_N(u_{j_1},\mu_1)\, d_N(u_{j_1},-\mu_1), \\
&\qquad\qquad d_N(u_{j_2},\lambda_2)\, d_N(u_{j_2},-\lambda_2)\,
d_N(u_{j_2},\mu_2)\, d_N(u_{j_2},-\mu_2)\bigr\}\, d\lambda_1\, d\mu_1\, d\lambda_2\, d\mu_2.
\end{aligned} \tag{5.25}
\]
Using the product theorem for cumulants (cf. [16, Theorem 2.3.2]) we have to sum over all indecomposable partitions {P₁, …, P_m} with |P_i| ≥ 2 of the scheme
\[
\begin{matrix}
d_N(u_{j_1},\lambda_1) & d_N(u_{j_1},-\lambda_1) & d_N(u_{j_1},\mu_1) & d_N(u_{j_1},-\mu_1) \\
d_N(u_{j_2},\lambda_2) & d_N(u_{j_2},-\lambda_2) & d_N(u_{j_2},\mu_2) & d_N(u_{j_2},-\mu_2).
\end{matrix} \tag{5.26}
\]
We can apply Lemma 5.1 to all the cumulants appearing in (5.25), and the dominant term of the cumulants is o(N⁴), so (5.25) tends to zero. Then we obtain the desired result.
5.2. Proof of Theorem 3.6
Using the lemmas in Section 5.1, we prove Theorem 3.6. To find the maximizing weights w_j of (3.11), we proceed by the Lagrange multiplier method. Write
\[
G = \sum_{j=1}^{M}\log(Mw_j) - M\alpha'\sum_{j=1}^{M} w_j m_j(\theta)
+ \gamma\Biggl(\sum_{j=1}^{M} w_j - 1\Biggr), \tag{5.27}
\]
where α ∈ ℝ^q and γ ∈ ℝ are Lagrange multipliers. Setting ∂G/∂w_j = 0 gives
\[
\frac{\partial G}{\partial w_j} = \frac{1}{w_j} - M\alpha' m_j(\theta) + \gamma = 0. \tag{5.28}
\]
So the equation $\sum_{j=1}^{M} w_j(\partial G/\partial w_j) = 0$ gives γ = −M. Then, we may write
\[
w_j = \frac{1}{M}\,\frac{1}{1+\alpha' m_j(\theta)}, \tag{5.29}
\]
where the vector α = α(θ) satisfies the q equations given by
\[
\frac{1}{M}\sum_{j=1}^{M}\frac{m_j(\theta)}{1+\alpha' m_j(\theta)} = 0. \tag{5.30}
\]
Therefore, $\hat\theta$ is a minimizer of the following (minus) empirical log-likelihood ratio function
\[
l(\theta) := \sum_{j=1}^{M}\log\bigl\{1+\alpha' m_j(\theta)\bigr\} \tag{5.31}
\]
and satisfies
\[
0 = \frac{\partial l(\theta)}{\partial\theta}\bigg|_{\theta=\hat\theta}
= \sum_{j=1}^{M}\frac{(\partial\alpha'(\theta)/\partial\theta)\, m_j(\theta)
+ (\partial m_j'(\theta)/\partial\theta)\,\alpha(\theta)}{1+\alpha'(\theta)\, m_j(\theta)}\Bigg|_{\theta=\hat\theta}
= \sum_{j=1}^{M}\frac{(\partial m_j'(\theta)/\partial\theta)\,\alpha(\theta)}
{1+\alpha'(\theta)\, m_j(\theta)}\Bigg|_{\theta=\hat\theta}. \tag{5.32}
\]
Denote
\[
Q_{1M}(\theta,\alpha) := \frac{1}{M}\sum_{j=1}^{M}
\frac{m_j(\theta)}{1+\alpha'(\theta)\, m_j(\theta)}, \qquad
Q_{2M}(\theta,\alpha) := \frac{1}{M}\sum_{j=1}^{M}
\frac{1}{1+\alpha'(\theta)\, m_j(\theta)}\,
\frac{\partial m_j'(\theta)}{\partial\theta}\,\alpha(\theta). \tag{5.33}
\]
Then, from (5.30) and (5.32), we have
\[
0 = Q_{1M}(\hat\theta,\hat\alpha)
= Q_{1M}(\theta_0,0) + \frac{\partial Q_{1M}(\theta_0,0)}{\partial\theta'}(\hat\theta-\theta_0)
+ \frac{\partial Q_{1M}(\theta_0,0)}{\partial\alpha'}(\hat\alpha-0) + o_p(\delta_M), \tag{5.34}
\]
\[
0 = Q_{2M}(\hat\theta,\hat\alpha)
= Q_{2M}(\theta_0,0) + \frac{\partial Q_{2M}(\theta_0,0)}{\partial\theta'}(\hat\theta-\theta_0)
+ \frac{\partial Q_{2M}(\theta_0,0)}{\partial\alpha'}(\hat\alpha-0) + o_p(\delta_M), \tag{5.35}
\]
where $\hat\alpha = \alpha(\hat\theta)$ and $\delta_M = \|\hat\theta-\theta_0\| + \|\hat\alpha\|$. Let us see the asymptotic properties of the above four derivatives. First,
\[
\frac{\partial Q_{1M}(\theta_0,0)}{\partial\theta'}
= \frac{1}{M}\sum_{j=1}^{M}\frac{\partial m_j(\theta_0)}{\partial\theta'}
= \frac{1}{M}\sum_{j=1}^{M}\int_{-\pi}^{\pi}
\frac{\partial\phi(u_j,\lambda,\theta_0)}{\partial\theta'}\, I_N(u_j,\lambda)\, d\lambda. \tag{5.36}
\]
From Lemmas A.8 and A.9 of Dahlhaus [13], we have
\[
\begin{aligned}
E\Biggl[\frac{\partial Q_{1M}(\theta_0,0)}{\partial\theta'}\Biggr]
&= \int_{0}^{1}\int_{-\pi}^{\pi}
\frac{\partial\phi(u,\lambda,\theta)}{\partial\theta'}\, g(u,\lambda)\, d\lambda\, du + o\bigl(M^{-1/2}\bigr), \\
\mathrm{cov}\Biggl[\Biggl[\frac{\partial Q_{1M}(\theta_0,0)}{\partial\theta'}\Biggr]_{\alpha_1\beta_1},
\Biggl[\frac{\partial Q_{1M}(\theta_0,0)}{\partial\theta'}\Biggr]_{\alpha_2\beta_2}\Biggr]
&= O\bigl(M^{-1}\bigr),
\end{aligned} \tag{5.37}
\]
which leads to
\[
\frac{\partial Q_{1M}(\theta_0,0)}{\partial\theta'}
\xrightarrow{p} \int_{0}^{1}\int_{-\pi}^{\pi}
\frac{\partial\phi(u,\lambda,\theta)}{\partial\theta'}\, g(u,\lambda)\, d\lambda\, du = \Sigma_3. \tag{5.38}
\]
Similarly, we have
\[
\frac{\partial Q_{2M}(\theta_0,0)}{\partial\alpha'}
= \frac{1}{M}\sum_{j=1}^{M}\frac{\partial m_j(\theta_0)'}{\partial\theta}
\xrightarrow{p} \Sigma_3'. \tag{5.39}
\]
Next, from Lemma 5.3, we obtain
\[
\frac{\partial Q_{1M}(\theta_0,0)}{\partial\alpha'}
= -\frac{1}{M}\sum_{j=1}^{M} m_j(\theta_0)\, m_j(\theta_0)'
\xrightarrow{p} -2\pi\Sigma_2. \tag{5.40}
\]
Finally, we have
\[
\frac{\partial Q_{2M}(\theta_0,0)}{\partial\theta'} = 0. \tag{5.41}
\]
Now, (5.34), (5.35), and (5.38)–(5.41) give
\[
\begin{pmatrix} \hat\alpha \\ \hat\theta-\theta_0 \end{pmatrix}
= \begin{pmatrix}
\dfrac{\partial Q_{1M}}{\partial\alpha'} & \dfrac{\partial Q_{1M}}{\partial\theta'} \\[8pt]
\dfrac{\partial Q_{2M}}{\partial\alpha'} & \dfrac{\partial Q_{2M}}{\partial\theta'}
\end{pmatrix}^{-1}_{(\theta_0,0)}
\begin{pmatrix} -Q_{1M}(\theta_0,0)+o_p(\delta_M) \\ o_p(\delta_M) \end{pmatrix}, \tag{5.42}
\]
where
\[
\begin{pmatrix}
\dfrac{\partial Q_{1M}}{\partial\alpha'} & \dfrac{\partial Q_{1M}}{\partial\theta'} \\[8pt]
\dfrac{\partial Q_{2M}}{\partial\alpha'} & \dfrac{\partial Q_{2M}}{\partial\theta'}
\end{pmatrix}_{(\theta_0,0)}
\xrightarrow{p}
\begin{pmatrix} -2\pi\Sigma_2 & \Sigma_3 \\ \Sigma_3' & 0 \end{pmatrix}. \tag{5.43}
\]
Because of Lemma 5.2, we have
\[
Q_{1M}(\theta_0,0) = \frac{1}{M}\sum_{j=1}^{M} m_j(\theta_0) = O_p\bigl(M^{-1/2}\bigr). \tag{5.44}
\]
From this and the relations (5.42) and (5.43), we can see that $\delta_M = O_p(M^{-1/2})$. Again, from (5.42), (5.43), and Lemma 5.2, direct calculation gives
\[
\sqrt{M}\bigl(\hat\theta-\theta_0\bigr) \xrightarrow{d} N(0,\Sigma). \tag{5.45}
\]
5.3. Proof of Theorem 3.7
Using the lemmas in Section 5.1, we prove Theorem 3.7. The proof is the same as that of Theorem 3.6 up to (5.30). Let α = ‖α‖e where ‖e‖ = 1, and introduce
\[
Y_j := \alpha' m_j(\theta_0), \qquad
Z^{*}_M := \max_{1\le j\le M}\bigl\| m_j(\theta_0)\bigr\|. \tag{5.46}
\]
Note that 1/(1 + Y_j) = 1 − Y_j/(1 + Y_j), and from (5.30) we find that
\[
\begin{aligned}
& e'\Biggl\{\frac{1}{M}\sum_{j=1}^{M}\Bigl(1-\frac{Y_j}{1+Y_j}\Bigr) m_j(\theta_0)\Biggr\} = 0, \\
& e'\Biggl(\frac{1}{M}\sum_{j=1}^{M}\frac{\alpha' m_j(\theta_0)}{1+Y_j}\, m_j(\theta_0)\Biggr)
= e'\Biggl(\frac{1}{M}\sum_{j=1}^{M} m_j(\theta_0)\Biggr), \\
& \|\alpha\|\, e'\Biggl(\frac{1}{M}\sum_{j=1}^{M}
\frac{m_j(\theta_0)\, m_j(\theta_0)'}{1+Y_j}\Biggr) e
= e'\Biggl(\frac{1}{M}\sum_{j=1}^{M} m_j(\theta_0)\Biggr).
\end{aligned} \tag{5.47}
\]
Every w_j > 0, so 1 + Y_j > 0, and therefore by (5.47) we get
\[
\begin{aligned}
\|\alpha\|\, e' S_M e
&\le \|\alpha\|\, e'\Biggl(\frac{1}{2\pi M}\sum_{j=1}^{M}
\frac{m_j(\theta_0)\, m_j(\theta_0)'}{1+Y_j}\Biggr) e\cdot
\Bigl(1+\max_j Y_j\Bigr) \\
&\le \|\alpha\|\, e'\Biggl(\frac{1}{2\pi M}\sum_{j=1}^{M}
\frac{m_j(\theta_0)\, m_j(\theta_0)'}{1+Y_j}\Biggr) e\cdot
\bigl(1+\|\alpha\| Z^{*}_M\bigr)
= e' M^{-1/2} P_M\bigl(1+\|\alpha\| Z^{*}_M\bigr),
\end{aligned} \tag{5.48}
\]
where S_M and P_M are defined in Lemmas 5.2 and 5.3. Then by (5.48), we get
\[
\|\alpha\|\bigl\{ e' S_M e - Z^{*}_M\, e'\bigl(M^{-1/2} P_M\bigr)\bigr\}
\le e'\bigl(M^{-1/2} P_M\bigr). \tag{5.49}
\]
From Lemmas 5.2 and 5.3 we can see that
\[
M^{-1/2} P_M = O_p\bigl(M^{-1/2}\bigr), \qquad S_M = O_p(1). \tag{5.50}
\]
We evaluate the order of Z*_M. We can write
\[
Z^{*}_M \le \max_{1\le j\le M}\int_{-\pi}^{\pi}
\bigl\|\phi_{\theta_0}(u_j,\lambda)\bigr\|\, I_N(u_j,\lambda)\, d\lambda
=: \max_{1\le j\le M} m^{*}_j(\theta_0) \quad (\text{say}). \tag{5.51}
\]
Then, for any ε > 0,
\[
\begin{aligned}
P\Bigl(\max_{1\le j\le M} m^{*}_j(\theta_0) > \varepsilon\sqrt{M}\Bigr)
&\le \sum_{j=1}^{M} P\bigl(m^{*}_j(\theta_0) > \varepsilon\sqrt{M}\bigr)
= \sum_{j=1}^{M} P\Bigl(m^{*}_j(\theta_0)^3 > \bigl(\varepsilon\sqrt{M}\bigr)^3\Bigr)
\le \sum_{j=1}^{M}\frac{1}{\varepsilon^3 M^{3/2}}\, E\bigl| m^{*}_j(\theta_0)\bigr|^3 \\
&= \frac{1}{\varepsilon^3 M^{3/2}}\sum_{j=1}^{M}
\iiint_{-\pi}^{\pi}
\bigl\|\phi_{\theta_0}(u_j,\lambda_1)\,\phi_{\theta_0}(u_j,\lambda_2)\,\phi_{\theta_0}(u_j,\lambda_3)\bigr\| \\
&\qquad\times E\bigl[ I_N(u_j,\lambda_1)\, I_N(u_j,\lambda_2)\, I_N(u_j,\lambda_3)\bigr]\,
d\lambda_1\, d\lambda_2\, d\lambda_3.
\end{aligned} \tag{5.52}
\]
The above expectation is written as
\[
E\bigl[ I_N(u_j,\lambda_1)\, I_N(u_j,\lambda_2)\, I_N(u_j,\lambda_3)\bigr]
= \frac{1}{(2\pi H_{2,N})^3}\,
\mathrm{cum}\bigl[ d_N(u_j,\lambda_1)\, d_N(u_j,-\lambda_1)\, d_N(u_j,\lambda_2)\,
d_N(u_j,-\lambda_2)\, d_N(u_j,\lambda_3)\, d_N(u_j,-\lambda_3)\bigr]. \tag{5.53}
\]
From Lemma 5.1 this is of order O_p(1), so we can see that (5.52) tends to zero, which leads to
\[
Z^{*}_M = o_p\bigl(M^{1/2}\bigr). \tag{5.54}
\]
From (5.49), (5.50), and (5.54), it is seen that
\[
\|\alpha\|\bigl[ O_p(1) - o_p\bigl(M^{1/2}\bigr)\, O_p\bigl(M^{-1/2}\bigr)\bigr]
\le O_p\bigl(M^{-1/2}\bigr). \tag{5.55}
\]
Therefore,
\[
\|\alpha\| = O_p\bigl(M^{-1/2}\bigr). \tag{5.56}
\]
Now we have from (5.54) that
\[
\max_{1\le j\le M}\bigl| Y_j\bigr| = O_p\bigl(M^{-1/2}\bigr)\, o_p\bigl(M^{1/2}\bigr) = o_p(1) \tag{5.57}
\]
and from (5.30) that
\[
0 = \frac{1}{M}\sum_{j=1}^{M} m_j(\theta_0)\,\frac{1}{1+Y_j}
= \frac{1}{M}\sum_{j=1}^{M} m_j(\theta_0)
\Biggl(1-Y_j+\frac{Y_j^2}{1+Y_j}\Biggr)
= 2\pi M^{-1/2} P_M - 2\pi S_M\alpha
+ \frac{1}{M}\sum_{j=1}^{M} m_j(\theta_0)\,\frac{Y_j^2}{1+Y_j}. \tag{5.58}
\]
Noting that
\[
\frac{1}{M}\sum_{j=1}^{M}\bigl\| m_j(\theta_0)\bigr\|^3
\le \frac{1}{M}\sum_{j=1}^{M} Z^{*}_M\bigl\| m_j(\theta_0)\bigr\|^2
= o_p\bigl(M^{1/2}\bigr), \tag{5.59}
\]
we can see that the final term in (5.58) has norm bounded by
\[
\frac{1}{M}\sum_{j=1}^{M}\bigl\| m_j(\theta_0)\bigr\|^3
\|\alpha\|^2\bigl|1+Y_j\bigr|^{-1}
= o_p\bigl(M^{1/2}\bigr)\, O_p\bigl(M^{-1}\bigr)\, O_p(1) = o_p\bigl(M^{-1/2}\bigr). \tag{5.60}
\]
Hence, we can write
\[
\alpha = M^{-1/2} S_M^{-1} P_M + \varepsilon, \tag{5.61}
\]
where $\varepsilon = o_p(M^{-1/2})$. By (5.57), we may write
\[
\log\bigl(1+Y_j\bigr) = Y_j - \tfrac{1}{2} Y_j^2 + \eta_j, \tag{5.62}
\]
where for some finite K
\[
\Pr\bigl(\bigl|\eta_j\bigr| \le K\bigl| Y_j\bigr|^3,\ 1\le j\le M\bigr)
\longrightarrow 1 \quad (T\to\infty). \tag{5.63}
\]
We may write
\[
\begin{aligned}
-\frac{1}{\pi}\log R(\theta_0)
&= -\frac{1}{\pi}\sum_{j=1}^{M}\log(Mw_j)
= \frac{1}{\pi}\sum_{j=1}^{M}\log\bigl(1+Y_j\bigr)
= \frac{1}{\pi}\sum_{j=1}^{M} Y_j
- \frac{1}{2\pi}\sum_{j=1}^{M} Y_j^2
+ \frac{1}{\pi}\sum_{j=1}^{M}\eta_j \\
&= P_M' S_M^{-1} P_M - M\varepsilon' S_M\varepsilon
+ \frac{1}{\pi}\sum_{j=1}^{M}\eta_j
= (A) - (B) + (C) \quad (\text{say}).
\end{aligned} \tag{5.64}
\]
Here it is seen that
\[
\begin{aligned}
(B) &= M\, o_p\bigl(M^{-1/2}\bigr)\, O_p(1)\, o_p\bigl(M^{-1/2}\bigr) = o_p(1), \\
(C) &\le K\|\alpha\|^3\sum_{j=1}^{M}\bigl\| m_j(\theta_0)\bigr\|^3
= O_p\bigl(M^{-3/2}\bigr)\, o_p\bigl(M^{3/2}\bigr) = o_p(1).
\end{aligned} \tag{5.65}
\]
And finally, from Lemmas 5.2 and 5.3, we can show that
\[
(A) = \bigl(\Sigma_2^{-1/2}\Sigma_1^{1/2}\,\Sigma_1^{-1/2} P_M\bigr)'
\bigl(\Sigma_2^{-1/2}\Sigma_1^{1/2}\,\Sigma_1^{-1/2} P_M\bigr) + o_p(1)
\xrightarrow{d} (FN)'(FN). \tag{5.66}
\]
Then we can obtain the desired result.
Acknowledgments
The author is grateful to Professors M. Taniguchi, J. Hirukawa, and H. Shiraishi for their instructive advice and helpful comments. Thanks are also extended to the two referees whose comments were useful. This work was supported by a Grant-in-Aid for Young Scientists (B) (22700291).
References
[1] A. B. Owen, “Empirical likelihood ratio confidence intervals for a single functional,” Biometrika, vol. 75, no. 2, pp. 237–249, 1988.
[2] A. B. Owen, “Empirical likelihood ratio confidence regions,” The Annals of Statistics, vol. 18, no. 1, pp. 90–120, 1990.
[3] J. Qin and J. Lawless, “Empirical likelihood and general estimating equations,” The Annals of Statistics, vol. 22, no. 1, pp. 300–325, 1994.
[4] A. B. Owen, “Empirical likelihood for linear models,” The Annals of Statistics, vol. 19, no. 4, pp. 1725–1747, 1991.
[5] S. X. Chen, “On the accuracy of empirical likelihood confidence regions for linear regression model,” Annals of the Institute of Statistical Mathematics, vol. 45, no. 4, pp. 621–637, 1993.
[6] S. X. Chen, “Empirical likelihood confidence intervals for linear regression coefficients,” Journal of Multivariate Analysis, vol. 49, no. 1, pp. 24–40, 1994.
[7] J. Qin, “Empirical likelihood in biased sample problems,” The Annals of Statistics, vol. 21, no. 3, pp. 1182–1196, 1993.
[8] Y. Kitamura, “Empirical likelihood methods with weakly dependent processes,” The Annals of Statistics, vol. 25, no. 5, pp. 2084–2102, 1997.
[9] A. C. Monti, “Empirical likelihood confidence regions in time series models,” Biometrika, vol. 84, no. 2, pp. 395–405, 1997.
[10] D. J. Nordman and S. N. Lahiri, “A frequency domain empirical likelihood for short- and long-range dependence,” The Annals of Statistics, vol. 34, no. 6, pp. 3019–3050, 2006.
[11] R. Dahlhaus, “On the Kullback-Leibler information divergence of locally stationary processes,” Stochastic Processes and Their Applications, vol. 62, no. 1, pp. 139–168, 1996.
[12] R. Dahlhaus, “Asymptotic statistical inference for nonstationary processes with evolutionary spectra,” in Proceedings of the Athens Conference on Applied Probability and Time Series Analysis, vol. 115 of Lecture Notes in Statistics, pp. 145–159, Springer, 1996.
[13] R. Dahlhaus, “Fitting time series models to nonstationary processes,” The Annals of Statistics, vol. 25, no. 1, pp. 1–37, 1997.
[14] Y. Hosoya and M. Taniguchi, “A central limit theorem for stationary processes and the parameter estimation of linear processes,” The Annals of Statistics, vol. 10, no. 1, pp. 132–153, 1982; correction: vol. 21, pp. 1115–1117, 1993.
[15] Y. Kakizawa, “Parameter estimation and hypothesis testing in stationary vector time series,” Statistics & Probability Letters, vol. 33, no. 3, pp. 225–234, 1997.
[16] D. R. Brillinger, Time Series: Data Analysis and Theory, Holden-Day, San Francisco, Calif, USA, 2001.
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 341476, 13 pages
doi:10.1155/2012/341476

Research Article
A Simulation Approach to Statistical Estimation of Multiperiod Optimal Portfolios

Hiroshi Shiraishi

The Jikei University School of Medicine, Tokyo 182-8570, Japan

Correspondence should be addressed to Hiroshi Shiraishi, [email protected]

Received 24 February 2012; Accepted 9 April 2012

Academic Editor: Kenichiro Tamaki

Copyright © 2012 Hiroshi Shiraishi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper discusses a simulation-based method for solving discrete-time multiperiod portfolio choice problems under an AR(1) return process. The method is applicable even if the distributions of the return processes are unknown. We first generate simulated sample paths of the random returns by the AR bootstrap. Then, for each sample path and each investment time, we obtain an optimal portfolio estimator, which optimizes a constant relative risk aversion (CRRA) utility function. When an investor considers an optimal investment strategy with portfolio rebalancing, it is convenient to introduce a value function. The most important difference between single-period portfolio choice problems and multiperiod ones is that the value function is time dependent. Our method takes care of the time dependency by using bootstrapped sample paths. Numerical studies are provided to examine the validity of our method. The result shows the necessity of taking care of the time dependency of the value function.
1. Introduction
Portfolio optimization is said to be “myopic” when the investor does not know what will happen beyond the immediate next period. In this framework, basic results about single-period portfolio optimization (such as mean-variance analysis) are justified for short-term investments without portfolio rebalancing. Multiperiod problems are much more realistic than single-period ones. In this framework, we assume that an investor makes a sequence of decisions to maximize a utility function at each time. The fundamental method to solve this problem is dynamic programming. In this method, a value function which expresses the expected terminal wealth is introduced. The recursive equation with respect to the value function is the so-called Bellman equation. The first-order conditions (FOCs) that satisfy the Bellman equation are the key tool for solving the dynamic problem.
The original literature on dynamic portfolio choice, pioneered by Merton [1] in continuous time and by Samuelson [2] and Fama [3] in discrete time, produced many important insights into the properties of optimal portfolio policies. Unfortunately, since closed-form solutions are known only for a few special cases, the recent literature uses a variety of numerical and approximate solution methods to incorporate realistic features into the dynamic portfolio problem, such as Ait-Sahalia and Brandt [4] and Brandt et al. [5].
We introduce a procedure to construct the dynamic portfolio weights based on the AR bootstrap.
The simulation algorithm is as follows: first, we generate simulated sample paths of the vector random returns by the AR bootstrap. Based on the bootstrap samples, an optimal portfolio estimator, applied from time T − 1 to the end of trading time T, is obtained under a constant relative risk aversion (CRRA) utility function. Note that this optimal portfolio corresponds to the “myopic” (single-period) optimal portfolio. Next, we approximate the value function by linear functions of the past observations. This idea is similar to that of [4, 5]. Then, optimal portfolio weight estimators at each trading time are obtained from the value function. Finally, we construct an optimal investment strategy as a sequence of the optimal portfolio weight estimators.
This paper is organized as follows. We describe the basic idea for solving for multiperiod optimal portfolio weights under a CRRA utility function in Section 2. In Section 3, we discuss an algorithm to construct the estimator involving the AR bootstrap method. Applications of our method are in Section 4.
2. Multiperiod Optimal Portfolio
Suppose the existence of a finite number of risky assets indexed by i (i = 1, …, m). Let X_t = (X₁(t), …, X_m(t))′ denote the random excess returns on the m assets from time t to t + 1 (if S_i(t) is the value of asset i at time t, the return is described as 1 + X_i(t) = S_i(t)/S_i(t − 1)). Suppose too that there exists a risk-free asset with excess return X_f (if B(t) is the value of the risk-free asset at time t, the return is described as 1 + X_f = B(t)/B(t − 1)). Based on the process {X_t}_{t=1}^T and X_f, we consider an investment strategy from time 0 to time T, where T (∈ ℕ) denotes the end of the investment horizon. Let w_t = (w₁(t), …, w_m(t))′ be the vector of portfolio weights for the risky assets at the beginning of time t + 1. Here we assume that the portfolio weights w_t can be rebalanced at the beginning of time t + 1 and are measurable (predictable) with respect to the past information F_t = σ(X_t, X_{t−1}, …). We make the following assumption.

Assumption 2.1. There exists an optimal portfolio weight w_t ∈ ℝ^m satisfying |w_t′X_{t+1} + (1 − w_t′e)X_f| ≪ 1 (we assume that the risky assets exclude ultra-high-risk, high-return ones, for instance those whose value S_i(t + 1) may be larger than 2S_i(t)), almost surely for each time t = 0, 1, …, T − 1, where e = (1, …, 1)′.
Then the return of the portfolio from time t to t + 1 is written as 1 + X_f + w_t′(X_{t+1} − X_f e) (setting S_t := (S₁(t), …, S_m(t))′ = B(t)e, the portfolio return is written as (w_t′S_{t+1} + (1 − w_t′e)B(t + 1))/(w_t′S_t + (1 − w_t′e)B(t)) = 1 + X_f + w_t′(X_{t+1} − X_f e)), and the return from time 0 to time T (called the terminal wealth) is written as
\[
W_T := \prod_{t=0}^{T-1}\bigl(1 + X_f + w_t'\bigl(X_{t+1} - X_f e\bigr)\bigr). \tag{2.1}
\]
Suppose that a utility function U : x ↦ U(x) is differentiable, concave, and strictly increasing for each x ∈ ℝ. Consider an investor's problem
\[
\max_{\{w_t\}_{t=0}^{T-1}} E[U(W_T)]. \tag{2.2}
\]
Following a formulation by dynamic programming (e.g., Bellman [6]), it is convenient to express the expected terminal wealth in terms of a value function V_t:
\[
V_t \equiv \max_{\{w_s\}_{s=t}^{T-1}} E[U(W_T)\mid\mathcal{F}_t]
= \max_{w_t} E\Bigl[\max_{\{w_s\}_{s=t+1}^{T-1}} E[U(W_T)\mid\mathcal{F}_{t+1}]\ \Big|\ \mathcal{F}_t\Bigr]
= \max_{w_t} E[V_{t+1}\mid\mathcal{F}_t], \tag{2.3}
\]
subject to the terminal condition V_T = U(W_T). The recursive equation (2.3) is the so-called Bellman equation and is the basis for any recursive solution of the dynamic portfolio choice problem. The first-order conditions (FOCs) for an optimal solution at each time t (here (∂/∂w_t)E[V_{t+1} | F_t] = E[(∂/∂w_t)V_{t+1} | F_t] is assumed) are
\[
\frac{\partial V_t}{\partial w_t}
= E\bigl[\partial_1 U(W_T)\bigl(X_{t+1}-X_f e\bigr)\mid\mathcal{F}_t\bigr] = 0, \tag{2.4}
\]
where ∂₁U(x₀) = (∂/∂x)U(x)|_{x=x₀}. These FOCs make up a system of nonlinear equations involving integrals that can in general be solved for w_t only numerically.
According to the literature (e.g., [5]), we can simplify this problem in the case of a constant relative risk aversion (CRRA) utility function, that is,
\[
U(W) = \frac{W^{1-\gamma}}{1-\gamma}, \qquad \gamma\neq 1, \tag{2.5}
\]
where γ denotes the coefficient of relative risk aversion. In this case, the Bellman equation simplifies to
\[
\begin{aligned}
V_t &= \max_{w_t} E\Biggl[\max_{\{w_s\}_{s=t+1}^{T-1}}
E\Bigl[\frac{1}{1-\gamma}(W_T)^{1-\gamma}\ \Big|\ \mathcal{F}_{t+1}\Bigr]\ \Big|\ \mathcal{F}_t\Biggr] \\
&= \max_{w_t} E\Biggl[\max_{\{w_s\}_{s=t+1}^{T-1}}
E\Biggl[\frac{1}{1-\gamma}\Biggl(\prod_{s=0}^{T-1}
\bigl(1+X_f+w_s'(X_{s+1}-X_f e)\bigr)\Biggr)^{1-\gamma}\ \Big|\ \mathcal{F}_{t+1}\Biggr]
\ \Big|\ \mathcal{F}_t\Biggr] \\
&= \max_{w_t} E\Biggl[\frac{1}{1-\gamma}\Biggl(\prod_{s=0}^{t}
\bigl(1+X_f+w_s'(X_{s+1}-X_f e)\bigr)\Biggr)^{1-\gamma} \\
&\qquad\qquad\times\max_{\{w_s\}_{s=t+1}^{T-1}}
E\Biggl[\Biggl(\prod_{s=t+1}^{T-1}
\bigl(1+X_f+w_s'(X_{s+1}-X_f e)\bigr)\Biggr)^{1-\gamma}\ \Big|\ \mathcal{F}_{t+1}\Biggr]
\ \Big|\ \mathcal{F}_t\Biggr] \\
&= \max_{w_t} E\Bigl[\frac{1}{1-\gamma}(W_{t+1})^{1-\gamma}
\max_{\{w_s\}_{s=t+1}^{T-1}}
E\bigl[\bigl(W^{T}_{t+1}\bigr)^{1-\gamma}\mid\mathcal{F}_{t+1}\bigr]\ \Big|\ \mathcal{F}_t\Bigr]
= \max_{w_t} E[U(W_{t+1})\Psi_{t+1}\mid\mathcal{F}_t],
\end{aligned} \tag{2.6}
\]
where $W^{T}_{t+1} = \prod_{s=t+1}^{T-1}(1+X_f+w_s'(X_{s+1}-X_f e))$ and $\Psi_{t+1} = \max_{\{w_s\}_{s=t+1}^{T-1}} E[(W^{T}_{t+1})^{1-\gamma}\mid\mathcal{F}_{t+1}]$.
From this, the value function V_t can be expressed as
\[
V_t = U(W_t)\Psi_t, \tag{2.7}
\]
and Ψ_t also satisfies a Bellman equation
\[
\Psi_t = \max_{w_t}
E\bigl[\bigl(1+X_f+w_t'(X_{t+1}-X_f e)\bigr)^{1-\gamma}\Psi_{t+1}\mid\mathcal{F}_t\bigr], \tag{2.8}
\]
subject to the terminal condition Ψ_T = 1. The corresponding FOCs (in terms of Ψ_t) are
\[
E\bigl[\bigl(1+X_f+w_t'(X_{t+1}-X_f e)\bigr)^{-\gamma}
\Psi_{t+1}\bigl(X_{t+1}-X_f e\bigr)\mid\mathcal{F}_t\bigr] = 0. \tag{2.9}
\]
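To make the recursion (2.8) concrete, here is a toy backward recursion (our own illustration, not the paper's estimator): with i.i.d. two-point excess returns, Ψ_t is deterministic, so each backward step multiplies by the same one-period optimum. Note that for γ > 1 the factor 1/(1 − γ) is negative, so maximizing utility corresponds to minimizing E[(portfolio return)^{1−γ}]. All numbers are illustrative.

```python
import numpy as np

# Toy backward recursion for (2.8): Psi_T = 1 and
# Psi_t = opt_w E[(1 + X_f + w(X - X_f))^{1-gamma} Psi_{t+1}],
# where "opt" is a min for gamma > 1 (utility sign flip) and a max otherwise.

def psi_recursion(x_outcomes, probs, x_f, gamma, horizon,
                  grid=np.linspace(0.0, 1.0, 1001)):
    rets = 1.0 + x_f + np.outer(grid, np.asarray(x_outcomes) - x_f)
    one_step = (rets ** (1.0 - gamma)) @ np.asarray(probs)  # E[ret^{1-gamma}]
    best = one_step.min() if gamma > 1 else one_step.max()
    psi = 1.0
    for _ in range(horizon):      # each backward step multiplies by `best`
        psi = best * psi
    return psi

psi0 = psi_recursion([0.06, -0.02], [0.5, 0.5], x_f=0.01, gamma=5.0, horizon=3)
```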
3. Estimation
Suppose that {X_t = (X₁(t), …, X_m(t))′; t ∈ ℤ} is an m-vector AR(1) process defined by
\[
X_t = \mu + A\bigl(X_{t-1}-\mu\bigr) + \varepsilon_t, \tag{3.1}
\]
where μ = (μ₁, …, μ_m)′ is a constant m-dimensional vector, ε_t = (ε₁(t), …, ε_m(t))′ are independent and identically distributed (i.i.d.) random m-dimensional vectors with E[ε_t] = 0 and E[ε_tε_t′] = Γ (Γ is a nonsingular m × m matrix), and A is a nonsingular m × m matrix. We make the following assumption.
Assumption 3.1. det{I_m − Az} ≠ 0 on {z ∈ ℂ; |z| ≤ 1}.

Given {X_{−n+1}, …, X₀, X₁, …, X_t}, the least-squares estimator $\hat A^{(t)}$ of A is obtained by solving
\[
\hat\Gamma^{(t)}\hat A^{(t)} = \sum_{s=-n+2}^{t}\hat Y^{(t)}_{s-1}\bigl(\hat Y^{(t)}_s\bigr)', \tag{3.2}
\]
where $\hat Y^{(t)}_s = X_s - \hat\mu^{(t)}$, $\hat\Gamma^{(t)} = \sum_{s=-n+1}^{t}\hat Y^{(t)}_s(\hat Y^{(t)}_s)'$, and $\hat\mu^{(t)} = (1/(n+t))\sum_{s=-n+1}^{t} X_s$. Then, the error $\hat\varepsilon^{(t)}_s = (\hat\varepsilon^{(t)}_1(s),\dots,\hat\varepsilon^{(t)}_m(s))'$ is “recovered” by
\[
\hat\varepsilon^{(t)}_s := \hat Y^{(t)}_s - \hat A^{(t)}\hat Y^{(t)}_{s-1},
\qquad s = -n+2,\dots,t. \tag{3.3}
\]
Let $\hat F^{(t)}_n(\cdot)$ denote the distribution which puts mass 1/(n + t) at $\hat\varepsilon^{(t)}_s$. Let $\{\hat\varepsilon^{(b,t)*}_s\}_{s=t+1}^{T}$ (for b = 1, …, B (∈ ℕ)) be i.i.d. bootstrapped observations from $\hat F^{(t)}_n$.

Given $\{\hat\varepsilon^{(b,t)*}_s\}$, define $\hat Y^{(b,t)*}_s$ and $X^{(b_1,b_2,t)*}_s$ by
\[
\hat Y^{(b,t)*}_s = \bigl(\hat A^{(t)}\bigr)^{s-t}\bigl(X_t-\hat\mu^{(t)}\bigr)
+ \sum_{k=t+1}^{s}\bigl(\hat A^{(t)}\bigr)^{s-k}\hat\varepsilon^{(b,t)*}_k,
\qquad
X^{(b_1,b_2,t)*}_s = \hat\mu^{(t)} + \hat A^{(t)}\hat Y^{(b_1,t)*}_{s-1}
+ \hat\varepsilon^{(b_2,t)*}_s, \tag{3.4}
\]
for s = t + 1, …, T.

Based on the above $\{X^{(b_1,b_2,t)*}_s\}_{b_1,b_2=1,\dots,B;\ s=t+1,\dots,T}$ for each t = 0, …, T − 1, we construct an estimator of the optimal portfolio weight w_t as follows.

Step 1. First, we fix the current time t, which implies that the observed stretch n + t is fixed. Then, we can generate $\{X^{(b_1,b_2,t)*}_s\}$ by (3.4).
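The AR bootstrap in (3.4) can be sketched as follows in the univariate case; this is a simplified single-index version of the paper's two-index construction, with illustrative function names and parameters.

```python
import numpy as np

# Sketch of AR-bootstrap path generation (3.4), univariate: resample the
# recovered residuals with replacement and iterate forward from X_t.

def bootstrap_paths(x_t, mu, a, resid, horizon, n_paths, rng):
    eps_star = rng.choice(resid, size=(n_paths, horizon))  # i.i.d. from F_n
    paths = np.empty((n_paths, horizon))
    prev = np.full(n_paths, x_t - mu)          # Y*_t = X_t - mu-hat
    for s in range(horizon):                   # Y*_s = a Y*_{s-1} + eps*_s
        prev = a * prev + eps_star[:, s]
        paths[:, s] = mu + prev
    return paths

rng = np.random.default_rng(1)
resid = rng.normal(0.0, 0.05, 200)             # stand-in recovered residuals
paths = bootstrap_paths(0.03, 0.02, 0.1, resid, horizon=10, n_paths=100, rng=rng)
```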
Step 2. Next, for each b₀ = 1, …, B, we obtain $\hat w^{(b_0,t)}_{T-1}$ as the maximizer of
\[
E^{*}_{T-1}\Bigl[\bigl(1+X_f+w'\bigl(X^{(b_0,b,t)*}_T-X_f e\bigr)\bigr)^{1-\gamma}\Bigr]
= \frac{1}{B}\sum_{b=1}^{B}
\bigl(1+X_f+w'\bigl(X^{(b_0,b,t)*}_T-X_f e\bigr)\bigr)^{1-\gamma}, \tag{3.5}
\]
or the solution of
\[
\begin{aligned}
&E^{*}_{T-1}\Bigl[\bigl(1+X_f+w'\bigl(X^{(b_0,b,t)*}_T-X_f e\bigr)\bigr)^{1-\gamma}
\bigl(X^{(b_0,b,t)*}_T-X_f e\bigr)\Bigr] \\
&\quad= \frac{1}{B}\sum_{b=1}^{B}
\bigl(1+X_f+w'\bigl(X^{(b_0,b,t)*}_T-X_f e\bigr)\bigr)^{1-\gamma}
\bigl(X^{(b_0,b,t)*}_T-X_f e\bigr) = 0,
\end{aligned} \tag{3.6}
\]
with respect to w. Here we introduce the notation $E^{*}_s[\cdot]$ as an estimator of the conditional expectation E[· | F_s], defined by $E^{*}_s[h(X^{(b_0,b,t)*}_{s+1})] = (1/B)\sum_{b=1}^{B} h(X^{(b_0,b,t)*}_{s+1})$ for any function h of $X^{(b_0,b,t)*}_{s+1}$. This $\hat w^{(b_0,t)}_{T-1}$ corresponds to the estimator of the myopic (single-period) optimal portfolio weight.
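For one risky asset, Step 2 can be sketched by maximizing the bootstrap sample average (3.5) over a grid of candidate weights. The grid range and the simulated stand-in "bootstrap" draws are our own illustrative choices, not the paper's.

```python
import numpy as np

# Sketch of Step 2: myopic CRRA-optimal weight via grid search over (3.5),
# one risky asset. Rows with nonpositive wealth are excluded (Assumption 2.1).

def myopic_weight(excess_star, x_f, gamma, grid=None):
    if grid is None:
        grid = np.linspace(-1.0, 2.0, 3001)
    rets = 1.0 + x_f + np.outer(grid, excess_star - x_f)   # (grid, B) returns
    util = np.full(grid.shape, -np.inf)
    ok = np.all(rets > 0, axis=1)                          # keep wealth positive
    util[ok] = np.mean(rets[ok] ** (1.0 - gamma), axis=1) / (1.0 - gamma)
    return grid[np.argmax(util)]

rng = np.random.default_rng(2)
x_star = 0.02 + rng.normal(0.0, 0.05, 10_000)  # stand-in bootstrap returns
w_hat = myopic_weight(x_star, x_f=0.01, gamma=5.0)
```

With these moments the classical mean-variance approximation (μ − X_f)/(γσ²) = 0.01/(5 · 0.0025) = 0.8 gives a sanity check on the grid optimum.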
Step 3. Next, we construct estimators of Ψ_{T−1}. Since it is difficult to express the explicit form of Ψ_{T−1}, we parameterize it by linear functions of X_{T−1} as follows:
\[
\Psi^{(1)}(X_{T-1},\theta_{T-1}) := \bigl[1,\ X_{T-1}'\bigr]\theta_{T-1}, \tag{3.7}
\]
\[
\Psi^{(2)}(X_{T-1},\theta_{T-1}) := \bigl[1,\ X_{T-1}',\
\mathrm{vech}\bigl(X_{T-1}X_{T-1}'\bigr)'\bigr]\theta_{T-1}. \tag{3.8}
\]
Note that the dimensions of θ_{T−1} in Ψ^{(1)} and Ψ^{(2)} are m + 1 and m(m + 1)/2 + m + 1, respectively. The idea of Ψ^{(1)} and Ψ^{(2)} is inspired by the parameterization of the conditional expectations in [5].
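The regressor vectors in (3.7)-(3.8) can be built as below; `vech` stacks the lower triangle of a matrix, and the function names are our own.

```python
import numpy as np

# Sketch of the feature vectors [1, X'] and [1, X', vech(XX')'] of (3.7)-(3.8).

def vech(mat):
    idx = np.tril_indices(mat.shape[0])    # lower-triangle (incl. diagonal)
    return mat[idx]

def psi_features(x, order):
    if order == 1:
        return np.concatenate(([1.0], x))                       # dim m + 1
    return np.concatenate(([1.0], x, vech(np.outer(x, x))))     # dim m(m+1)/2 + m + 1

f = psi_features(np.array([0.1, -0.2]), order=2)
```

For m = 2 the second-order feature vector has dimension 2(2 + 1)/2 + 2 + 1 = 6, matching the count stated above.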
In order to construct the estimators of Ψ^{(i)} (i = 1, 2), we introduce the conditional least squares estimators of the parameter θ^{(i)}_{T−1}, that is,
\[
\hat\theta^{(i)}_{T-1} = \arg\min_{\theta} Q^{(i)}_{T-1}(\theta), \tag{3.9}
\]
where
\[
\begin{aligned}
Q^{(i)}_{T-1}(\theta)
&= \frac{1}{B}\sum_{b_0=1}^{B}
E^{*}_{T-1}\Bigl[\bigl(\Psi_{T-1}-\Psi^{(i)}\bigr)^2\Bigr]
= \frac{1}{B}\sum_{b_0=1}^{B}\Biggl[\frac{1}{B}\sum_{b=1}^{B}
\Bigl\{\Psi_{T-1}\bigl(X^{(b_0,b,t)*}_T\bigr)
- \Psi^{(i)}\bigl(X^{(b_0,b_0,t)*}_{T-1},\theta\bigr)\Bigr\}^2\Biggr], \\
\Psi_{T-1}\bigl(X^{(b_0,b,t)*}_T\bigr)
&= \Bigl(1+X_f+\bigl(\hat w^{(b_0,t)}_{T-1}\bigr)'
\bigl(X^{(b_0,b,t)*}_T-X_f e\bigr)\Bigr)^{1-\gamma}.
\end{aligned} \tag{3.10}
\]
Then, by using $\hat\theta^{(i)}_{T-1}$, we can compute $\Psi^{(i)}(X^{(b_0,b,t)*}_{T-1}, \hat\theta^{(i)}_{T-1})$.
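Since Ψ^{(i)} is linear in θ, the minimization (3.9) reduces to an ordinary least-squares regression of the realized Ψ values on the features; a minimal sketch (with the Ψ^{(1)} features and our own names) is:

```python
import numpy as np

# Sketch of (3.9) for Psi^(1): least-squares fit of realized Psi on [1, X].

def fit_theta(x_prev, psi_real):
    design = np.column_stack([np.ones_like(x_prev), x_prev])  # [1, X_{T-1}]
    theta, *_ = np.linalg.lstsq(design, psi_real, rcond=None)
    return theta

x_prev = np.array([0.0, 0.1, 0.2, 0.3])
theta = fit_theta(x_prev, 1.0 + 2.0 * x_prev)   # exactly linear toy target
```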
Step 4. Based on the above Ψ^{(i)}, we obtain $\hat w^{(b_0,t)}_{T-2}$ as the maximizer of
\[
\begin{aligned}
&E^{*}_{T-2}\Bigl[\bigl(1+X_f+w'\bigl(X^{(b_0,b,t)*}_{T-1}-X_f e\bigr)\bigr)^{1-\gamma}
\Psi^{(i)}\bigl(X^{(b_0,b,t)*}_{T-1},\hat\theta^{(i)}_{T-1}\bigr)\Bigr] \\
&\quad= \frac{1}{B}\sum_{b=1}^{B}
\bigl(1+X_f+w'\bigl(X^{(b_0,b,t)*}_{T-1}-X_f e\bigr)\bigr)^{1-\gamma}
\Psi^{(i)}\bigl(X^{(b_0,b,t)*}_{T-1},\hat\theta^{(i)}_{T-1}\bigr),
\end{aligned} \tag{3.11}
\]
or the solution of
\[
\begin{aligned}
&E^{*}_{T-2}\Bigl[\bigl(1+X_f+w'\bigl(X^{(b_0,b,t)*}_{T-1}-X_f e\bigr)\bigr)^{1-\gamma}
\bigl(X^{(b_0,b,t)*}_{T-1}-X_f e\bigr)
\Psi^{(i)}\bigl(X^{(b_0,b,t)*}_{T-1},\hat\theta^{(i)}_{T-1}\bigr)\Bigr] \\
&\quad= \frac{1}{B}\sum_{b=1}^{B}
\bigl(1+X_f+w'\bigl(X^{(b_0,b,t)*}_{T-1}-X_f e\bigr)\bigr)^{1-\gamma}
\bigl(X^{(b_0,b,t)*}_{T-1}-X_f e\bigr)
\Psi^{(i)}\bigl(X^{(b_0,b,t)*}_{T-1},\hat\theta^{(i)}_{T-1}\bigr) = 0,
\end{aligned} \tag{3.12}
\]
with respect to w. This $\hat w^{(b_0,t)}_{T-2}$ does not correspond to the estimator of the myopic (single-period) optimal portfolio weight, due to the effect of Ψ^{(i)}.
Step 5. In the same manner as Steps 3–4, we obtain $\hat\theta^{(i)}_s$ and $\hat w^{(b_0,t)}_s$ recursively for s = T − 2, T − 3, …, t + 1.

Step 6. Then, we define an optimal portfolio weight estimator at time t as $\hat w^{(t)}_t := \hat w^{(b_0,t)}_t$ by Step 4. Note that $\hat w^{(t)}_t$ is obtained as a single solution, because $X^{(b_0,b,t)*}_{t+1}\ (= \hat\mu^{(t)} + \hat A^{(t)}(X_t-\hat\mu^{(t)}) + \hat\varepsilon^{(b,t)*}_{t+1})$ is independent of b₀.

Step 7. For each time t = 0, 1, …, T − 1, we obtain $\hat w^{(t)}_t$ by Steps 1–6. Finally, we can construct an optimal investment strategy as $\{\hat w^{(t)}_t\}_{t=0}^{T-1}$.
4. Examples
In this section we examine our approach numerically. Suppose that there exists a risky asset with excess return X_t at time t and a risk-free asset with excess return X_f = 0.01. We assume that X_t is defined by the following univariate AR(1) model:
\[
X_t = \mu + A\bigl(X_{t-1}-\mu\bigr) + \varepsilon_t,
\qquad \varepsilon_t \sim N(0,\Gamma). \tag{4.1}
\]
Let w_t be the portfolio weight for the risky asset at the beginning of time t + 1. Suppose that an investor is interested in the investment strategy from time 0 to time T. Then the terminal wealth is written as (2.1). Applying our method, the estimator $\hat W_T$ can be obtained by
\[
\hat W_T = \prod_{t=0}^{T-1}\bigl(1+X_f+\hat w_t\bigl(X_{t+1}-X_f\bigr)\bigr), \tag{4.2}
\]
where $\hat w_t$ is the estimator of the optimal portfolio under the CRRA utility function defined by (2.5). In what follows, we examine the effect on $\hat W_T$ of a variety of n (initial sample size), B (resampling size), A (AR parameter), Γ (variance of ε_t), γ (relative risk aversion parameter), and Ψ (defined by (3.7) or (3.8)).
Example 4.1 (myopic (single period) versus dynamic (multiperiod)). Let μ = 0.02, A = 0.1, Γ = 0.05, n = 100, T = 10, and B = 100. We generate the excess return process $\{X_t\}_{t=-n+1,\dots,T}$ by (4.1). First, for each t = 0, …, T − 1 we generate $\{X^{(b_1,b_2,t)*}_s\}_{b_1,b_2=1,\dots,B;\ s=t+1,\dots,T}$ by (3.4) based on $\{X_s\}_{s=-n+1}^{t}$ (as Step 1). We plot $\{X_t\}_{t=1,\dots,T}$ and $\{X^{(b_1,b_2,t)*}_s\}_{b=1,\dots,10;\ s=1,\dots,T}$ in Figure 1. It can be seen that the $X^{(b_1,b_2,t)*}_s$ show similar behavior to X_t.

[Figure 1: Resampled excess return. Time series plots of X*_t and X_f.]

[Figure 2: Myopic and dynamic portfolio return. Panels: single, cumulative, and utility of portfolio returns, each for γ = 5, 10, 20.]

[Figure 3: Boxplot of terminal wealth (T = 10) for n ∈ {10, 100, 1000} and B ∈ {5, 20, 100}.]

[Figure 4: Boxplot of terminal wealth (T = 10) for A ∈ {0.01, 0.1, 0.2} and Γ ∈ {0.01, 0.05, 0.1}.]
Table 1: Dynamic portfolio returns for γ = 5.

T    Myopic: Mean (q0.25, q0.5, q0.75)        Dynamic (Ψ(1)): Mean (q0.25, q0.5, q0.75)   Dynamic (Ψ(2)): Mean (q0.25, q0.5, q0.75)
A: Terminal wealth
1    1.013564 (0.9924, 1.0091, 1.0192)        1.013667 (0.9920, 1.0096, 1.0176)           1.013814 (0.9920, 1.0096, 1.0176)
2    1.024329 (0.9917, 1.0192, 1.0445)        1.024396 (0.9923, 1.0177, 1.0436)           1.024667 (0.9924, 1.0183, 1.0437)
5    1.065896 (1.0021, 1.0504, 1.1125)        1.065988 (1.0000, 1.0509, 1.1115)           1.066355 (0.9999, 1.0505, 1.1106)
10   1.137727 (1.0273, 1.1062, 1.2024)        1.137707 (1.0264, 1.1041, 1.2005)           1.138207 (1.0265, 1.1043, 1.2002)
B: Utility of terminal wealth
1    −0.24158 (−0.257, −0.241, −0.231)        −0.24139 (−0.258, −0.240, −0.233)           −0.24130 (−0.258, −0.240, −0.233)
2    −0.23609 (−0.258, −0.231, −0.210)        −0.23595 (−0.257, −0.233, −0.210)           −0.23578 (−0.257, −0.232, −0.210)
5    −0.21761 (−0.247, −0.205, −0.163)        −0.21761 (−0.249, −0.204, −0.163)           −0.21703 (−0.250, −0.205, −0.164)
10   −0.18349 (−0.224, −0.166, −0.119)        −0.18339 (−0.225, −0.168, −0.120)           −0.18287 (−0.225, −0.168, −0.120)
Table 2: Dynamic portfolio returns for γ = 10.

T    Myopic: Mean (q0.25, q0.5, q0.75)        Dynamic (Ψ(1)): Mean (q0.25, q0.5, q0.75)   Dynamic (Ψ(2)): Mean (q0.25, q0.5, q0.75)
A: Terminal wealth
1    1.011802 (1.0011, 1.0095, 1.0146)        1.011859 (1.0010, 1.0098, 1.0138)           1.011944 (1.0010, 1.0098, 1.0138)
2    1.022249 (1.0059, 1.0196, 1.0323)        1.022286 (1.0065, 1.0190, 1.0319)           1.022439 (1.0065, 1.0192, 1.0319)
5    1.058344 (1.0276, 1.0512, 1.0825)        1.058373 (1.0254, 1.0509, 1.0823)           1.058584 (1.0253, 1.0507, 1.0818)
10   1.120369 (1.0658, 1.1070, 1.1544)        1.120323 (1.0687, 1.1068, 1.1533)           1.120595 (1.0666, 1.1060, 1.1532)
B: Utility of terminal wealth
1    −0.10224 (−0.109, −0.101, −0.097)        −0.10215 (−0.110, −0.101, −0.098)           −0.10210 (−0.110, −0.101, −0.098)
2    −0.09530 (−0.105, −0.093, −0.083)        −0.09523 (−0.104, −0.093, −0.083)           −0.09515 (−0.104, −0.093, −0.083)
5    −0.07581 (−0.086, −0.070, −0.054)        −0.07582 (−0.088, −0.071, −0.054)           −0.07557 (−0.088, −0.071, −0.054)
10   −0.05007 (−0.062, −0.044, −0.030)        −0.05003 (−0.061, −0.044, −0.030)           −0.04986 (−0.062, −0.044, −0.030)
Table 3: Dynamic portfolio returns for γ = 20.

T    Myopic: Mean (q0.25, q0.5, q0.75)        Dynamic (Ψ(1)): Mean (q0.25, q0.5, q0.75)   Dynamic (Ψ(2)): Mean (q0.25, q0.5, q0.75)
A: Terminal wealth
1    1.010905 (1.0055, 1.0097, 1.0123)        1.010934 (1.0055, 1.0099, 1.0119)           1.010979 (1.0054, 1.0099, 1.0119)
2    1.021181 (1.0131, 1.0198, 1.0262)        1.021200 (1.0133, 1.0195, 1.0260)           1.021281 (1.0133, 1.0196, 1.0260)
5    1.054646 (1.0396, 1.0512, 1.0668)        1.054655 (1.0381, 1.0509, 1.0668)           1.054767 (1.0381, 1.0508, 1.0667)
10   1.112289 (1.0853, 1.1062, 1.1297)        1.112256 (1.0876, 1.1060, 1.1291)           1.112396 (1.0865, 1.1056, 1.1290)
B: Utility of terminal wealth
1    −0.04386 (−0.047, −0.043, −0.041)        −0.04382 (−0.047, −0.043, −0.042)           −0.04379 (−0.047, −0.043, −0.042)
2    −0.03705 (−0.041, −0.036, −0.032)        −0.03702 (−0.040, −0.036, −0.032)           −0.03699 (−0.040, −0.036, −0.032)
5    −0.02189 (−0.025, −0.020, −0.015)        −0.02189 (−0.025, −0.020, −0.015)           −0.02181 (−0.025, −0.020, −0.015)
10   −0.00881 (−0.011, −0.007, −0.005)        −0.00880 (−0.010, −0.007, −0.005)           −0.00876 (−0.010, −0.007, −0.005)
Next, we construct the optimal portfolio estimator \hat{w}_t^{(t)} along the lines of Steps 2–7. Here we apply the approximate solution of (3.5) or (3.11), following [5]; that is,
\hat{w}_s^{(b_0,t)} = -\frac{1}{2 E_s^*\big[D_{3,s+1}^{(b_0,b,t)*}\big]} \Big\{ E_s^*\big[D_{2,s+1}^{(b_0,b,t)*}\big] + 3 \big(\bar{w}_s^{(b_0,t)}\big)^2 E_s^*\big[D_{4,s+1}^{(b_0,b,t)*}\big] + 4 \big(\bar{w}_s^{(b_0,t)}\big)^3 E_s^*\big[D_{5,s+1}^{(b_0,b,t)*}\big] \Big\},   (4.3)

where

D_{2,s+1}^{(b_0,b,t)*} = (1 + X_f)^{-\gamma} \big(X_{s+1}^{(b_1,b_2,t)*} - X_f\big) \Psi^{(i)}\big(X_{s+1}^{(b_1,b_2,t)*}, \theta_{s+1}^{(i)}\big),

D_{3,s+1}^{(b_0,b,t)*} = -\frac{\gamma}{2} (1 + X_f)^{-1-\gamma} \big(X_{s+1}^{(b_1,b_2,t)*} - X_f\big)^2 \Psi^{(i)}\big(X_{s+1}^{(b_1,b_2,t)*}, \theta_{s+1}^{(i)}\big),

D_{4,s+1}^{(b_0,b,t)*} = \frac{(-\gamma)(-1-\gamma)}{6} (1 + X_f)^{-2-\gamma} \big(X_{s+1}^{(b_1,b_2,t)*} - X_f\big)^3 \Psi^{(i)}\big(X_{s+1}^{(b_1,b_2,t)*}, \theta_{s+1}^{(i)}\big),

D_{5,s+1}^{(b_0,b,t)*} = \frac{(-\gamma)(-1-\gamma)(-2-\gamma)}{24} (1 + X_f)^{-3-\gamma} \big(X_{s+1}^{(b_1,b_2,t)*} - X_f\big)^4 \Psi^{(i)}\big(X_{s+1}^{(b_1,b_2,t)*}, \theta_{s+1}^{(i)}\big),

\bar{w}_s^{(b_0,t)} = -\frac{E_s^*\big[D_{2,s+1}^{(b_0,b,t)*}\big]}{2 E_s^*\big[D_{3,s+1}^{(b_0,b,t)*}\big]}.   (4.4)
This approximate solution describes a fourth-order expansion of the value function around 1 + X_f (\bar{w}_s describes a second-order expansion). According to [5], a second-order expansion of the value function is sometimes not sufficiently accurate, whereas a fourth-order expansion includes adjustments for the skewness and kurtosis of returns and their effects on the investor's utility.
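As a rough illustration of how (4.4) and (4.3) are computed in practice, the following Python sketch replaces the conditional expectations E*_s[·] with Monte Carlo averages over resampled excess returns. The numbers (γ, X_f, and the hypothetical return draws) and the choice Ψ^{(i)} = 1 (the myopic case) are assumptions for the example, not the paper's simulation design.

```python
import numpy as np

# Sketch: second-order weight (4.4), then the fourth-order refinement (4.3),
# with E*_s[.] approximated by sample means over hypothetical resampled
# one-period returns X_star.  gamma, Xf, and X_star are illustrative.
rng = np.random.default_rng(0)
gamma, Xf = 5.0, 0.001
X_star = Xf + 0.04 + 0.15 * rng.standard_normal(10_000)  # resampled returns
Psi = 1.0                                                # myopic: Psi^(i) = 1

e = X_star - Xf                                          # excess returns
base = 1.0 + Xf
D2 = base ** (-gamma) * e * Psi
D3 = (-gamma / 2.0) * base ** (-1.0 - gamma) * e ** 2 * Psi
D4 = (-gamma) * (-1.0 - gamma) / 6.0 * base ** (-2.0 - gamma) * e ** 3 * Psi
D5 = (-gamma) * (-1.0 - gamma) * (-2.0 - gamma) / 24.0 \
     * base ** (-3.0 - gamma) * e ** 4 * Psi

w_bar = -D2.mean() / (2.0 * D3.mean())                   # eq. (4.4)
w_hat = -(D2.mean() + 3.0 * w_bar ** 2 * D4.mean()
          + 4.0 * w_bar ** 3 * D5.mean()) / (2.0 * D3.mean())  # eq. (4.3)
print(w_bar, w_hat)
```

With these illustrative numbers the correction terms are small, so the fourth-order weight stays close to the second-order one; the skewness (D4) and kurtosis (D5) adjustments matter more for strongly asymmetric or heavy-tailed return draws.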
Figure 2 shows time series plots of the single-period portfolio return (= 1 + X_f + \hat{w}_t(X_{t+1} − X_f), Line 1), the cumulative portfolio return (= W_T, Line 2), and the value of the utility function (= (1/(1−γ)) W_T^{1−γ}, Line 3) for γ = 5, 10, and 20. The solid line shows investment in the risk-free asset only (i.e., \hat{w}_t = 0), one dotted line shows the myopic (single-period) portfolio (i.e., Ψ^{(i)} = 1), and the dotted line with + shows the dynamic (multiperiod) portfolio using Ψ^{(1)}.
Regarding the single-period portfolio return, we cannot identify the best investment strategy among the risk-free, myopic, and dynamic portfolio investments. However, looking at the cumulative portfolio return or the value of the utility function, it is obvious that the dynamic portfolio investment is the best. The difference between the myopic and dynamic portfolios is due to Ψ and is called the "hedging demand": by deviating from the single-period portfolio choice, the investor tries to hedge against changes in the investment opportunities. Regarding the effect of γ, we can see that the magnitude of the hedging demand decreases as γ increases.
Next, we repeat the above algorithm 100 times using different generated data. Tables 1, 2, and 3 show the means, 25th percentiles (q0.25), medians (q0.5), and 75th percentiles (q0.75) of the terminal wealth (W_T) and of the values of the utility function ((1/(1−γ)) W_T^{1−γ}) for T = 1, 2, 5, 10 and γ = 5, 10, 20.
We can see that, for all T, the means of the terminal wealth W_T are larger than that of the risk-free investment (i.e., (1 + X_f)^T). Regarding the distribution of W_T, the means are larger than the medians (q0.5), which shows the asymmetry of the distribution. Among the myopic portfolio and the dynamic portfolios using Ψ^{(1)} and Ψ^{(2)}, the dynamic portfolio using Ψ^{(2)} is the best investment strategy in terms of the means of W_T and of (1/(1−γ)) W_T^{1−γ}. In some cases the means of W_T for the dynamic portfolio using Ψ^{(1)} are smaller than those for the myopic portfolio; this would reflect the inaccuracy of the approximation of Ψ. In addition, the dispersion of W_T for the dynamic portfolios is somewhat smaller than that for the myopic portfolio.
Example 4.2 (sample size (n) and resampling size (B)). In this example, we examine the effect of the initial sample size (n) and the resampling size (B). Let μ = 0.02, A = 0.1, Γ = 0.05, T = 10, and γ = 5. In the same manner as Example 4.1, we consider the effect on W_T for n = 10, 100, 1000 and B = 5, 20, 100. Figure 3 shows the box plots of the terminal wealth W_T for each n and B.
It can be seen that the medians tend to increase as n and B increase. In addition, the width of the box plots decreases as n and B increase. This phenomenon reflects the accuracy of the approximation of X*_t.
Example 4.3 (AR parameter (A) and variance of ε_t (Γ)). In this example, we examine the effect of the AR parameter (A) and the variance of ε_t (Γ). Let μ = 0.02, n = 100, B = 100, T = 10, and γ = 5. In the same manner as Example 4.1, we consider the effect on W_T for A = 0.01, 0.1, 0.2 and Γ = 0.01, 0.05, 0.10. Figure 4 shows the box plots of the terminal wealth W_T for each A and Γ.
Obviously, the medians increase as Γ decreases, which shows that the investment result is better when the variance of ε_t is small. On the other hand, the width of the box plots increases as A increases, which shows that the variation in the investment result is wide when A is large.
References
[1] R. C. Merton, "Lifetime portfolio selection under uncertainty: the continuous-time case," The Review of Economics and Statistics, vol. 51, no. 3, pp. 247–257, 1969.
[2] P. A. Samuelson, "Lifetime portfolio selection by dynamic stochastic programming," The Review of Economics and Statistics, vol. 51, no. 3, pp. 239–246, 1969.
[3] E. F. Fama, "Multiperiod consumption-investment decisions," American Economic Review, vol. 60, no. 1, pp. 163–174, 1970.
[4] Y. Aït-Sahalia and M. W. Brandt, "Variable selection for portfolio choice," Journal of Finance, vol. 56, no. 4, pp. 1297–1351, 2001.
[5] M. W. Brandt, A. Goyal, P. Santa-Clara, and J. R. Stroud, "A simulation approach to dynamic portfolio choice with an application to learning about return predictability," Review of Financial Studies, vol. 18, no. 3, pp. 831–873, 2005.
[6] R. Bellman, Dynamic Programming, Princeton Landmarks in Mathematics, Princeton University Press, Princeton, NJ, USA, 2010.
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 261707, 15 pages
doi:10.1155/2012/261707
Research Article
On the Causality between Multiple Locally Stationary Processes
Junichi Hirukawa
Faculty of Science, Niigata University, 8050 Ikarashi 2-no-cho, Nishi-ku, Niigata 950-2181, Japan
Correspondence should be addressed to Junichi Hirukawa, [email protected]
Received 14 January 2012; Accepted 25 March 2012
Academic Editor: Kenichiro Tamaki
Copyright © 2012 Junichi Hirukawa. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
When one would like to describe the relations between multivariate time series, the concepts of dependence and causality are of importance. These concepts also appear to be useful when one is describing the properties of an engineering or econometric model. Although measures of dependence and causality under the stationarity assumption are well established, empirical studies show that these measures are not constant in time. Recently, one of the most important classes of nonstationary processes, called locally stationary processes, has been formulated in a rigorous asymptotic framework by Dahlhaus (1996, 1997, 2000). Locally stationary processes have time-varying spectral densities whose spectral structures change smoothly in time. Here, we generalize measures of linear dependence and causality to multiple locally stationary processes. We give the measures of linear dependence, linear causality from one series to the other, and instantaneous linear feedback, at time t and frequency λ.
1. Introduction
In discussions of the relations between time series, the concepts of dependence and causality are frequently invoked. Geweke [1] and Hosoya [2] proposed measures of dependence and causality for multiple stationary processes (see also Taniguchi et al. [3]). They also showed that these measures can be decomposed additively into frequency-wise components. However, it seems restrictive that these measures are constant over time. Priestley [4] developed extensions of prediction and filtering theory to nonstationary processes which have evolutionary spectra. Alternatively, in this paper we generalize measures of dependence and causality to multiple locally stationary processes.
When we deal with nonstationary processes, one of the difficult problems is how to set up an adequate asymptotic theory. To meet this, Dahlhaus [5–7] introduced an important class of nonstationary processes and developed statistical inference for it. We give the precise definition of multivariate locally stationary processes, which is due to Dahlhaus [8].
Definition 1.1. A sequence of multivariate stochastic processes Z_{t,T} = (Z_{t,T}^{(1)}, \dots, Z_{t,T}^{(d(Z))})' (t = 2 - N/2, \dots, 1, \dots, T, \dots, T + N/2; T, N \ge 1) is called locally stationary with mean vector 0 and transfer function matrix A^{\circ} if there exists a representation

Z_{t,T} = \int_{-\pi}^{\pi} \exp(i\lambda t) A^{\circ}_{t,T}(\lambda) \, d\zeta(\lambda),   (1.1)

where

(i) \zeta(\lambda) = (\zeta^{(1)}(\lambda), \dots, \zeta^{(d(Z))}(\lambda))' is a complex-valued stochastic vector process on [-\pi, \pi] with \zeta^{(a)}(\lambda) = \overline{\zeta^{(a)}(-\lambda)} and

\operatorname{cum}\{ d\zeta^{(a_1)}(\lambda_1), \dots, d\zeta^{(a_k)}(\lambda_k) \} = \eta\Big( \sum_{j=1}^{k} \lambda_j \Big) \frac{\kappa_{a_1,\dots,a_k}}{(2\pi)^{k-1}} \, d\lambda_1 \cdots d\lambda_{k-1},   (1.2)

for k \ge 2, a_1, \dots, a_k = 1, \dots, d(Z), where \operatorname{cum}\{\cdots\} denotes the cumulant of kth order and \eta(\lambda) = \sum_{j=-\infty}^{\infty} \delta(\lambda + 2\pi j) is the period-2\pi extension of the Dirac delta function.

(ii) There exist a constant K and a 2\pi-periodic matrix-valued function A : [0,1] \times \mathbb{R} \to \mathbb{C}^{d(Z) \times d(Z)} with A(u, -\lambda) = \overline{A(u, \lambda)} and

\sup_{t,\lambda} \big| A^{\circ}_{t,T}(\lambda)_{a,b} - A(t/T, \lambda)_{a,b} \big| \le K T^{-1}   (1.3)

for all a, b = 1, \dots, d(Z) and T \in \mathbb{N}. A(u, \lambda) is assumed to be continuous in u.
We call f(u, \lambda) := A(u, \lambda) \Omega A(u, \lambda)^* the time-varying spectral density matrix of the process, where \Omega = (\kappa_{a,b})_{a,b=1,\dots,d(Z)}. Write

\varepsilon_t := \int_{-\pi}^{\pi} \exp(i\lambda t) \, d\zeta(\lambda);   (1.4)
then {\varepsilon_t} becomes a white noise process with E(\varepsilon_t) = 0 and Var(\varepsilon_t) = \Omega.
Our objective is to generalize the dependence and causality measures to locally stationary processes and to construct test statistics which can examine the nonstationary effects in actual time series data. The paper is organized as follows. Section 2 explains the generalization of causality measures to multiple locally stationary processes. Since this extension is natural, we do not repeat the original idea of the causality measures in the stationary case; we refer the reader to Geweke [1] and Hosoya [2] for it. In Section 3 we introduce the nonparametric spectral estimator of multivariate locally stationary processes and explain its asymptotic properties. Finally, in Section 4 we propose the test statistics for linear dependence and demonstrate their performance in an empirical numerical example.
2. Measurements of Linear Dependence and Causality for Nonstationary Processes

Here, we generalize measures of dependence and causality to multiple locally stationary processes. The assumptions and results of this section are straightforward extensions of the original ideas in the stationary case. To avoid repetition, we refer to Geweke [1] and Hosoya [2] for the original ideas on causality.
For the d(Z)-dimensional locally stationary process {Z_{t,T}}, we introduce H, the Hilbert space spanned by Z_{t,T}^{(j)}, j = 1, \dots, d(Z), t = 0, \pm 1, \dots, and call H(Z_{t,T}) the closed subspace spanned by Z_{s,T}^{(j)}, j = 1, \dots, d(Z), s \le t. We obtain the best one-step linear predictor of Z_{t+1,T} by projecting the components of the vector onto H(Z_{t,T}); here projection means componentwise projection. We denote the prediction error by \xi_{t+1,T}. Then, for a locally stationary process we have

E\big( \xi_{s,T} \xi_{t,T}' \big) = \delta_{s,t} G_{t,T},   (2.1)

where \delta_{s,t} is the Kronecker delta. Note that the \xi_{t,T} are uncorrelated but do not have identical covariance matrices; namely, the G_{t,T} are time dependent. Now, we impose the following assumption on G_{t,T}.
Assumption 2.1. The covariance matrices of errors Gt,T are nonsingular for all t and T .
Define

u_{t,T} = \sum_{j=0}^{\infty} H_{t,T}(j) \xi_{t-j,T}, \qquad \operatorname{tr}\Big\{ \sum_{j=0}^{\infty} H_{t,T}(j) G_{t-j,T} H_{t,T}(j)' \Big\} < \infty,   (2.2)

as a one-sided linear process and

v_{t,T} = Z_{t,T} - u_{t,T},   (2.3)

where the coefficient matrices are

H_{t,T}(j) = E\big( Z_{t,T} \xi_{t-j,T}' \big) G_{t-j,T}^{-1}, \quad j \ge 1, \qquad H_{t,T}(0) = I_{d(Z)}.   (2.4)
Note that each H_{t,T}(j) \xi_{t-j,T}, j = 0, 1, \dots, is the projection of Z_{t,T} onto the closed subspace spanned by \xi_{t-j,T}. Now, we have the following Wold decomposition for locally stationary processes.

Lemma 2.2 (Wold decomposition). If {Z_{t,T}} is a locally stationary vector process with d(Z) components, then Z_{t,T} = u_{t,T} + v_{t,T}, where u_{t,T} is given by (2.1), (2.2), and (2.4), v_{t,T} is deterministic, and E(v_{s,T} \xi_{t,T}') \equiv 0.

If only u_{t,T} occurs, we say that Z_{t,T} is purely nondeterministic.
Assumption 2.3. Zt,T is purely nondeterministic.
In view of Lemma 2.2, we can see that under Assumptions 2.1 and 2.3, Z_{t,T} becomes the one-sided linear process given by (2.2). For a locally stationary process, if we choose an orthonormal basis \varepsilon_t^{(j)}, j = 1, \dots, d(Z), in the closed subspace spanned by \xi_{t,T}, then {\varepsilon_t} will be an uncorrelated stationary process. We call {\varepsilon_t} a fundamental process of {Z_{t,T}} and let C_{t,T}(j), j = 0, 1, \dots, denote the corresponding coefficients, that is,

Z_{t,T} = \sum_{j=0}^{\infty} C_{t,T}(j) \varepsilon_{t-j}.   (2.5)

Let f_{t,T}(\lambda) be the time-varying spectral density matrix of Z_{t,T}. A process is said to have maximal rank if it has a nondegenerate spectral density matrix a.e.
Assumption 2.4. The locally stationary process {Z_{t,T}} has maximal rank for all t and T. In particular,

\int_{-\pi}^{\pi} \log |f_{t,T}(\lambda)| \, d\lambda > -\infty, \quad \forall t, T,   (2.6)

where |D| denotes the determinant of the matrix D.
We will say that a function \varphi(z), analytic in the unit disc, belongs to the class H_2 if

H_2(\varphi) = \sup_{0 \le \rho < 1} \int_{-\pi}^{\pi} \big| \varphi(\rho e^{-i\lambda}) \big|^2 \, d\lambda < \infty.   (2.7)
Under Assumptions 2.1–2.4, it follows that {Z_{t,T}} has a time-varying spectral density f_{t,T}(\lambda) which has rank d(Z) for almost all \lambda and is representable in the form

f_{t,T}(\lambda) = \frac{1}{2\pi} \Phi_{t,T}\big(e^{i\lambda}\big) \Phi_{t,T}\big(e^{i\lambda}\big)^*,   (2.8)

where D^* denotes the complex conjugate transpose of the matrix D and \Phi_{t,T}(e^{i\lambda}) is the boundary value of a d(Z) \times d(Z) analytic function

\Phi_{t,T}(z) = \sum_{j=0}^{\infty} C_{t,T}(j) z^j   (2.9)

in the unit disc, and it holds that \Phi_{t,T}(0) \Phi_{t,T}(0)^* = G_{t,T}.
Now, we introduce measures of linear dependence, linear causality, and instantaneous linear feedback at time t. Let Z_{t,T} = (X_{t,T}', Y_{t,T}')' be a d(Z) = (d(X) + d(Y))-dimensional locally stationary process which has the time-varying spectral density matrix

f_{t,T}(\lambda) = \begin{pmatrix} f_{t,T}^{(xx)}(\lambda) & f_{t,T}^{(xy)}(\lambda) \\ f_{t,T}^{(yx)}(\lambda) & f_{t,T}^{(yy)}(\lambda) \end{pmatrix}.   (2.10)

We will find the partitions

\xi_{t,T} = \begin{pmatrix} \xi_{t,T}^{(1)} & (d(X) \times 1) \\ \xi_{t,T}^{(2)} & (d(Y) \times 1) \end{pmatrix}, \qquad \operatorname{Cov}\big( \xi_{t,T}, \xi_{t,T} \big) = G_{t,T} = \begin{pmatrix} G_{t,T}^{(1,1)} & G_{t,T}^{(1,2)} \\ G_{t,T}^{(2,1)} & G_{t,T}^{(2,2)} \end{pmatrix}   (2.11)

useful. Meanwhile, G_{t,T}^{(X)} and G_{t,T}^{(Y)} denote the covariance matrices of the one-step-ahead errors \xi_{t,T}^{(X)} and \xi_{t,T}^{(Y)} when X_{t,T} and Y_{t,T} are forecast from their own pasts alone; namely, \xi_{t,T}^{(X)} and \xi_{t,T}^{(Y)} are the residuals of the projections of X_{t,T} and Y_{t,T} onto H(X_{t-1,T}) and H(Y_{t-1,T}), respectively.

We define the measures of linear dependence, of linear causality from {Y_{t,T}} to {X_{t,T}}, of linear causality from {X_{t,T}} to {Y_{t,T}}, and of instantaneous linear feedback, at time t, as

M_{t,T}^{(X,Y)} = \log \frac{\big| G_{t,T}^{(X)} \big| \big| G_{t,T}^{(Y)} \big|}{\big| G_{t,T} \big|}, \qquad M_{t,T}^{(Y \to X)} = \log \frac{\big| G_{t,T}^{(X)} \big|}{\big| G_{t,T}^{(1,1)} \big|}, \qquad M_{t,T}^{(X \to Y)} = \log \frac{\big| G_{t,T}^{(Y)} \big|}{\big| G_{t,T}^{(2,2)} \big|},   (2.12)

M_{t,T}^{(X \cdot Y)} = \log \frac{\big| G_{t,T}^{(1,1)} \big| \big| G_{t,T}^{(2,2)} \big|}{\big| G_{t,T} \big|},   (2.13)

respectively; then we have

M_{t,T}^{(X,Y)} = M_{t,T}^{(Y \to X)} + M_{t,T}^{(X \to Y)} + M_{t,T}^{(X \cdot Y)}.   (2.14)
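As a toy numerical illustration of (2.12)–(2.14) (with d(X) = d(Y) = 1 at a fixed time point), the following snippet computes the four measures from error covariances and checks the additive decomposition (2.14). The numerical values of G, G^{(X)}, and G^{(Y)} are made up for the example; they only need to satisfy G^{(X)} ≥ G^{(1,1)} and G^{(Y)} ≥ G^{(2,2)}.

```python
import numpy as np

# Hypothetical innovation covariance of the joint process (blocks G11, G22)
# and one-step error variances GX, GY when each series is predicted from
# its own past alone.
G = np.array([[1.0, 0.3],
              [0.3, 2.0]])
GX, GY = 1.2, 2.5

def logdet(a):
    return np.linalg.slogdet(np.atleast_2d(a))[1]

M_dep  = logdet(GX) + logdet(GY) - logdet(G)                  # (2.12), dependence
M_YtoX = logdet(GX) - logdet(G[:1, :1])                       # Y -> X causality
M_XtoY = logdet(GY) - logdet(G[1:, 1:])                       # X -> Y causality
M_inst = logdet(G[:1, :1]) + logdet(G[1:, 1:]) - logdet(G)    # (2.13), instantaneous

# (2.14): the dependence measure decomposes additively.
assert np.isclose(M_dep, M_YtoX + M_XtoY + M_inst)
print(M_dep, M_YtoX, M_XtoY, M_inst)
```

Note that (2.14) holds as an algebraic identity of the log-determinants; what the theory supplies is the interpretation of each term and its frequency-wise decomposition below.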
Next, we decompose the measures of linear causality frequency-wise. To define frequency-wise measures of causality, we introduce the following analytic facts.

Lemma 2.5. The analytic matrix \Phi_{t,T}(z) corresponding to a fundamental process {\varepsilon_t} (for {Z_{t,T}}) is maximal among analytic matrices \Psi_{t,T}(z) with components from the class H_2 satisfying the boundary condition (2.8); that is,

\Phi_{t,T}(0) \Phi_{t,T}(0)^* \ge \Psi_{t,T}(0) \Psi_{t,T}(0)^*.   (2.15)

Although the following assumption is a natural extension of Kolmogorov's formula in the stationary case (see, e.g., [9]), it is not straightforward, and unfortunately, so far, we cannot prove it from simpler assumptions; we suspect this would require another, entirely technical paper.
Assumption 2.6 (Kolmogorov's formula). Under Assumptions 2.1–2.4, an analytic matrix \Phi_{t,T}(z) satisfying the boundary condition (2.8) is maximal if and only if

\big| \Phi_{t,T}(0) \big|^2 = \big| G_{t,T} \big| = \exp \frac{1}{2\pi} \int_{-\pi}^{\pi} \log \big| 2\pi f_{t,T}(\lambda) \big| \, d\lambda.   (2.16)
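Assumption 2.6 can be sanity-checked in the stationary special case, where it reduces to the classical Kolmogorov formula. For a stationary AR(1), X_t = a X_{t−1} + ε_t with Var(ε_t) = σ², the spectral density is f(λ) = σ² / (2π |1 − a e^{−iλ}|²), the one-step prediction error variance is σ², and the right-hand side of (2.16) reproduces it numerically. The model and the numbers below are illustrative.

```python
import numpy as np

# Kolmogorov's formula for a stationary AR(1): the exponential of the
# averaged log-spectrum recovers the innovation variance sig2.
a, sig2 = 0.6, 1.7
lam = np.linspace(-np.pi, np.pi, 200_001)
f = sig2 / (2 * np.pi * np.abs(1 - a * np.exp(-1j * lam)) ** 2)

# (1/2pi) * integral over [-pi, pi] ~ mean over a uniform grid
G = np.exp(np.log(2 * np.pi * f).mean())
print(G)   # approximately 1.7 = Var(eps_t)
```

The cancellation behind this is that ∫ log|1 − a e^{−iλ}|² dλ = 0 for |a| < 1, so only log σ² survives the averaging.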
Now we define the process {\eta_{t,T}} as

\begin{pmatrix} \eta_{t,T}^{(1)} \\ \eta_{t,T}^{(2)} \end{pmatrix} = \begin{pmatrix} I_{d(X)} & -G_{t,T}^{(1,2)} G_{t,T}^{(2,2)-1} \\ -G_{t,T}^{(2,1)} G_{t,T}^{(1,1)-1} & I_{d(Y)} \end{pmatrix} \begin{pmatrix} \xi_{t,T}^{(1)} \\ \xi_{t,T}^{(2)} \end{pmatrix};   (2.17)

then \eta_{t,T}^{(1)} is the residual of the projection of X_{t,T} onto H(X_{t-1,T}, Y_{t,T}), whereas \eta_{t,T}^{(2)} is the residual of the projection of Y_{t,T} onto H(X_{t,T}, Y_{t-1,T}).
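The claim that this transformation decorrelates the blocks can be checked numerically in a toy case with d(X) = d(Y) = 1; the covariance matrix G below is hypothetical.

```python
import numpy as np

# Check, for a hypothetical innovation covariance G, that
# eta2 = xi2 - G21 G11^{-1} xi1 (the second row of (2.17)) is
# uncorrelated with xi1, and that Cov((xi1, eta2)') is block diagonal
# with blocks G11 and the Schur complement G22 - G21 G11^{-1} G12.
G = np.array([[2.0, 0.8],
              [0.8, 1.5]])
G11, G12, G21, G22 = G[0, 0], G[0, 1], G[1, 0], G[1, 1]

L = np.array([[1.0, 0.0],
              [-G21 / G11, 1.0]])   # maps (xi1, xi2)' to (xi1, eta2)'
C = L @ G @ L.T                     # covariance of (xi1, eta2)'

assert np.isclose(C[0, 1], 0.0)                       # orthogonality
assert np.isclose(C[1, 1], G22 - G21 * G12 / G11)     # Schur complement
print(C)
```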
Furthermore, we have

\operatorname{Cov}\left\{ \begin{pmatrix} \xi_{t,T}^{(1)} \\ \eta_{t,T}^{(2)} \end{pmatrix}, \begin{pmatrix} \xi_{t,T}^{(1)} \\ \eta_{t,T}^{(2)} \end{pmatrix} \right\} = \begin{pmatrix} G_{t,T}^{(1,1)} & 0 \\ 0 & G_{t,T}^{(2,2)} - G_{t,T}^{(2,1)} G_{t,T}^{(1,1)-1} G_{t,T}^{(1,2)} \end{pmatrix} = \begin{pmatrix} G_{t,T}^{(1,1)} & 0 \\ 0 & \tilde{G}_{t,T}^{(2,2)} \end{pmatrix} = \tilde{G}_{t,T} = \tilde{G}_{t,T}^{1/2} \tilde{G}_{t,T}^{1/2},   (2.18)

so we can see that \eta_{t,T}^{(2)} is orthogonal to \xi_{t,T}^{(1)}. For the d(Z) \times d(Z) matrix

F_{t,T} = \begin{pmatrix} I_{d(X)} & 0 \\ -G_{t,T}^{(2,1)} G_{t,T}^{(1,1)-1} & I_{d(Y)} \end{pmatrix},   (2.19)

we have \big( \xi_{t,T}^{(1)\prime}, \eta_{t,T}^{(2)\prime} \big)' = F_{t,T} \big( \xi_{t,T}^{(1)\prime}, \xi_{t,T}^{(2)\prime} \big)'.

If we set

\tilde{\Phi}_{t,T}(z) = \Phi_{t,T}(z) \Phi_{t,T}(0)^{-1} F_{t,T}^{-1} \tilde{G}_{t,T}^{1/2} = \Gamma_{t,T}(z) \tilde{G}_{t,T}^{1/2},   (2.20)

we have the following lemma.
Lemma 2.7. \tilde{\Phi}_{t,T}(z) is an analytic function in the unit disc with \tilde{\Phi}_{t,T}(0) \tilde{\Phi}_{t,T}(0)^* = G_{t,T}, and thus maximal, such that the time-varying spectral density f_{t,T}(\lambda) has the factorization

f_{t,T}(\lambda) = \frac{1}{2\pi} \tilde{\Phi}_{t,T}\big(e^{i\lambda}\big) \tilde{\Phi}_{t,T}\big(e^{i\lambda}\big)^*.   (2.21)
From this lemma, it is seen that the time-varying spectral density is decomposed into two parts:

f_{t,T}^{(xx)}(\lambda) = \frac{1}{2\pi} \Big\{ \Gamma_{t,T}^{(1,1)}\big(e^{i\lambda}\big) G_{t,T}^{(1,1)} \Gamma_{t,T}^{(1,1)}\big(e^{i\lambda}\big)^* + \Gamma_{t,T}^{(1,2)}\big(e^{i\lambda}\big) \tilde{G}_{t,T}^{(2,2)} \Gamma_{t,T}^{(1,2)}\big(e^{i\lambda}\big)^* \Big\},   (2.22)

where \Gamma_{t,T}^{(1,1)}(z) is the d(X) \times d(X) upper-left submatrix of \Gamma_{t,T}(z). The former part is related to the process \{\xi_{t,T}^{(1)}\}, whereas the latter part is related to the process \{\eta_{t,T}^{(2)}\}, which is orthogonal to \{\xi_{t,T}^{(1)}\}. This relation suggests the frequency-wise measure of causality from {Y_{t,T}} to {X_{t,T}} at time t:

M_{t,T}^{(Y \to X)}(\lambda) = \log \frac{\big| f_{t,T}^{(xx)}(\lambda) \big|}{\Big| (1/2\pi) \big\{ \Gamma_{t,T}^{(1,1)}\big(e^{i\lambda}\big) G_{t,T}^{(1,1)} \Gamma_{t,T}^{(1,1)}\big(e^{i\lambda}\big)^* \big\} \Big|}.   (2.23)
Similarly, we propose

M_{t,T}^{(X \to Y)}(\lambda) = \log \frac{\big| f_{t,T}^{(yy)}(\lambda) \big|}{\Big| (1/2\pi) \big\{ \Delta_{t,T}^{(2,2)}\big(e^{i\lambda}\big) G_{t,T}^{(2,2)} \Delta_{t,T}^{(2,2)}\big(e^{i\lambda}\big)^* \big\} \Big|},

M_{t,T}^{(X,Y)}(\lambda) = -\log \Big| I_{d(Y)} - f_{t,T}^{(yx)}(\lambda) f_{t,T}^{(xx)}(\lambda)^{-1} f_{t,T}^{(xy)}(\lambda) f_{t,T}^{(yy)}(\lambda)^{-1} \Big|,

M_{t,T}^{(X \cdot Y)}(\lambda) = \log \frac{\Big| (1/2\pi) \big\{ \Gamma_{t,T}^{(1,1)}\big(e^{i\lambda}\big) G_{t,T}^{(1,1)} \Gamma_{t,T}^{(1,1)}\big(e^{i\lambda}\big)^* \big\} \Big| \Big| (1/2\pi) \big\{ \Delta_{t,T}^{(2,2)}\big(e^{i\lambda}\big) G_{t,T}^{(2,2)} \Delta_{t,T}^{(2,2)}\big(e^{i\lambda}\big)^* \big\} \Big|}{\big| f_{t,T}(\lambda) \big|},   (2.24)

where \Delta_{t,T}^{(2,2)}(z) is defined in the same manner as \Gamma_{t,T}^{(1,1)}(z).
Now, we introduce the following assumption.
Assumption 2.8. The roots of \big| \Gamma_{t,T}^{(1,1)}(z) \big| and \big| \Delta_{t,T}^{(2,2)}(z) \big| all lie outside the unit circle.
The relation of the frequency-wise measures to the overall measures is addressed in the following result.

Theorem 2.9. Under Assumptions 2.1–2.8, we have

M_{t,T}^{(\cdot)} = \frac{1}{2\pi} \int_{-\pi}^{\pi} M_{t,T}^{(\cdot)}(\lambda) \, d\lambda.   (2.25)

If Assumptions 2.1–2.6 hold, but Assumption 2.8 does not hold, then

M_{t,T}^{(Y \to X)} > \frac{1}{2\pi} \int_{-\pi}^{\pi} M_{t,T}^{(Y \to X)}(\lambda) \, d\lambda, \qquad M_{t,T}^{(X \to Y)} > \frac{1}{2\pi} \int_{-\pi}^{\pi} M_{t,T}^{(X \to Y)}(\lambda) \, d\lambda.   (2.26)
Remark 2.10. Since H(Z_{t,T}) = H(\xi_{t,T}) = H(\eta_{t,T}) and H(Z_{t,T}) \supseteq H(X_{t,T}, \eta_{t,T}^{(2)}) \supseteq H(\eta_{t,T}), we can see that H(Z_{t,T}) = H(X_{t,T}, \eta_{t,T}^{(2)}). Therefore, the best one-step prediction error of the process \big( X_{t,T}', \eta_{t,T}^{(2)\prime} \big)' is given by \big( \xi_{t,T}^{(1)\prime}, \eta_{t,T}^{(2)\prime} \big)'. Let \tilde{f}_{t,T}(\lambda) be the time-varying spectral density matrix of the process \big( X_{t,T}', \eta_{t,T}^{(2)\prime} \big)' and denote the partition by

\tilde{f}_{t,T}(\lambda) = \begin{pmatrix} f_{t,T}^{(xx)}(\lambda) & \tilde{f}_{t,T}^{(1,2)}(\lambda) \\ \tilde{f}_{t,T}^{(2,1)}(\lambda) & \frac{1}{2\pi} \tilde{G}_{t,T}^{(2,2)} \end{pmatrix}.   (2.27)

Then, we obtain another representation of the frequency-wise measure of causality from {Y_{t,T}} to {X_{t,T}} at time t:

M_{t,T}^{(Y \to X)}(\lambda) = \log \frac{\big| f_{t,T}^{(xx)}(\lambda) \big|}{\big| f_{t,T}^{(xx)}(\lambda) - 2\pi \tilde{f}_{t,T}^{(1,2)}(\lambda) \tilde{G}_{t,T}^{(2,2)-1} \tilde{f}_{t,T}^{(2,1)}(\lambda) \big|}.   (2.28)

This relation suggests applying the nonparametric time-varying spectral density estimator to the residual process \big( X_{t,T}', \eta_{t,T}^{(2)\prime} \big)'. However, this problem requires another paper; we leave it as future work.
3. Nonparametric Spectral Estimator of Multivariate Locally Stationary Processes

In this section we introduce the nonparametric spectral estimator of multivariate locally stationary processes. First, we make the following assumption on the transfer function matrix A(u, λ).
Assumption 3.1. (i) The transfer function matrix A(u, \lambda) is 2\pi-periodic in \lambda, and the periodic extension is twice differentiable in u and \lambda with uniformly bounded continuous derivatives \partial^2 A / \partial u^2, \partial^2 A / \partial \lambda^2, and (\partial / \partial u)(\partial / \partial \lambda) A. Furthermore, the uniformly bounded continuous derivative (\partial^2 / \partial u^2)(\partial / \partial \lambda) A also exists.

(ii) All the eigenvalues of f(u, \lambda) are bounded from below and above by some constants \delta_1, \delta_2 > 0 uniformly in u and \lambda.
As an estimator of f(u, \lambda), we use the kernel-type nonparametric estimator defined by

\hat{f}(u, \lambda) = \int_{-\pi}^{\pi} W_T(\lambda - \mu) I_N(u, \mu) \, d\mu,   (3.1)
where W_T(\omega) = M \sum_{\nu=-\infty}^{\infty} W(M(\omega + 2\pi\nu)) is the weight function, M > 0 depends on T, and I_N(u, \lambda) is the localized periodogram matrix over the segment \{ [uT] - N/2 + 1, \dots, [uT] + N/2 \}, defined as

I_N(u, \lambda) = \frac{1}{2\pi H_{2,N}} \left\{ \sum_{s=1}^{N} h\Big(\frac{s}{N}\Big) Z_{[uT]-N/2+s,T} \exp(i\lambda s) \right\} \times \left\{ \sum_{r=1}^{N} h\Big(\frac{r}{N}\Big) Z_{[uT]-N/2+r,T} \exp(i\lambda r) \right\}^*.   (3.2)
Here h : [0, 1] \to \mathbb{R} is a data taper and H_{2,N} = \sum_{s=1}^{N} h(s/N)^2. It should be noted that I_N(u, \lambda) is not a consistent estimator of the time-varying spectral density. To obtain a consistent estimator of f(u, \lambda), we have to smooth it over neighbouring frequencies.
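A minimal univariate sketch of (3.2) and the smoothing step (3.1) may make the construction concrete. The cosine-bell taper, the flat frequency kernel W, the choices of N and M, and the simulated time-varying AR(1) below are all illustrative assumptions, not the paper's tuning.

```python
import numpy as np

# Simulate a simple locally stationary series: a time-varying AR(1).
rng = np.random.default_rng(1)
T, N, M = 1000, 128, 8
a = 0.8 * np.sin(np.pi * np.arange(T) / T)      # slowly varying AR coefficient
Z = np.zeros(T)
for t in range(1, T):
    Z[t] = a[t] * Z[t - 1] + rng.standard_normal()

def I_N(u, lam):
    """Localized tapered periodogram (3.2) over the segment around [uT]."""
    s = np.arange(1, N + 1)
    h = np.sin(np.pi * s / N) ** 2               # cosine-bell data taper
    idx = int(u * T) - N // 2 + s - 1            # segment indices
    d = np.sum(h * Z[idx] * np.exp(1j * lam * s))
    return np.abs(d) ** 2 / (2 * np.pi * np.sum(h ** 2))

def f_hat(u, lam):
    """Smooth I_N(u, .) over frequencies, bandwidth 1/M, flat kernel (3.1)."""
    mu = np.linspace(lam - 1 / (2 * M), lam + 1 / (2 * M), 21)
    return np.mean([I_N(u, m) for m in mu])

print(f_hat(0.5, 0.0))
```

The flat kernel corresponds to W(x) = 1 on [−1/2, 1/2] in Assumption 3.2 below; any kernel satisfying that assumption would do.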
Now we impose the following assumptions on W(·) and h(·).
Assumption 3.2. The weight function W : \mathbb{R} \to [0, \infty) satisfies W(x) = 0 for x \notin [-1/2, 1/2] and is a continuous, even function satisfying \int_{-1/2}^{1/2} W(x) \, dx = 1 and \int_{-1/2}^{1/2} x^2 W(x) \, dx < \infty.
Assumption 3.3. The data taper h : \mathbb{R} \to \mathbb{R} satisfies (i) h(x) = 0 for all x \notin [0, 1] and h(x) = h(1 - x); (ii) h(x) is continuous on \mathbb{R}, twice differentiable at all x \notin U, where U is a finite subset of \mathbb{R}, and \sup_{x \notin U} |h''(x)| < \infty. Write

K_t(x) := \left\{ \int_0^1 h(x)^2 \, dx \right\}^{-1} h\Big( x + \frac{1}{2} \Big)^2, \quad x \in \Big[ -\frac{1}{2}, \frac{1}{2} \Big],   (3.3)

which plays the role of a kernel in the time domain.
Furthermore, we assume the following.
Assumption 3.4. M = M(T) and N = N(T), with M \le N \le T, satisfy

\frac{\sqrt{T}}{M^2} = o(1), \qquad \frac{N^2}{T^{3/2}} = o(1), \qquad \frac{\sqrt{T} \log N}{N} = o(1).   (3.4)
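One admissible (though by no means unique) choice of rates satisfying (3.4) is, for example, N = T^{0.7} and M = T^{0.3}; a quick numerical check that all three quantities then tend to zero:

```python
import numpy as np

# Illustrative rate choice for Assumption 3.4: N = T^0.7, M = T^0.3.
def rates(T):
    N, M = T ** 0.7, T ** 0.3
    return (np.sqrt(T) / M ** 2,            # sqrt(T)/M^2      = T^{-0.1}
            N ** 2 / T ** 1.5,              # N^2/T^{3/2}      = T^{-0.1}
            np.sqrt(T) * np.log(N) / N)     # sqrt(T) log N / N ~ T^{-0.2} log T

for T in (10 ** 3, 10 ** 6, 10 ** 9):
    print(T, rates(T))
```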
The following lemmas are multivariate versions of Theorem 2.2 of Dahlhaus [10] and Theorem A.2 of Dahlhaus [7] (see also [11]).
Lemma 3.5. Assume that Assumptions 3.1–3.4 hold. Then

(i)

E\big( I_N(u, \lambda) \big) = f(u, \lambda) + \frac{N^2}{2T^2} \int_{-1/2}^{1/2} x^2 K_t(x) \, dx \, \frac{\partial^2}{\partial u^2} f(u, \lambda) + o\Big( \frac{N^2}{T^2} \Big) + O\Big( \frac{\log N}{N} \Big),   (3.5)
(ii)

E\big( \hat{f}(u, \lambda) \big) = f(u, \lambda) + \frac{N^2}{2T^2} \int_{-1/2}^{1/2} x^2 K_t(x) \, dx \, \frac{\partial^2}{\partial u^2} f(u, \lambda) + \frac{1}{2M^2} \int_{-1/2}^{1/2} x^2 W(x) \, dx \, \frac{\partial^2}{\partial \lambda^2} f(u, \lambda) + o\Big( \frac{N^2}{T^2} + M^{-2} \Big) + O\Big( \frac{\log N}{N} \Big),   (3.6)

(iii)

\sum_{i,j=1}^{m} \operatorname{Var}\big( \hat{f}_{i,j}(u, \lambda) \big) = \frac{M}{N} \sum_{i,j=1}^{m} f_{i,j}(u, \lambda)^2 \int_{-1/2}^{1/2} K_t(x)^2 \, dx \int_{-1/2}^{1/2} W(x)^2 \, dx \, \big( 2\pi + 2\pi \, \mathbf{1}\{ \lambda \equiv 0 \bmod \pi \} \big) + o\Big( \frac{M}{N} \Big).   (3.7)

Hence, we have

E \big\| \hat{f}(u, \lambda) - f(u, \lambda) \big\|^2 = O\Big( \frac{M}{N} \Big) + O\big( M^{-2} + N^2 T^{-2} \big)^2 = O\Big( \frac{M}{N} \Big),   (3.8)

where \|D\| is the Euclidean norm of the matrix D, \|D\| = \{ \operatorname{tr}(D D^*) \}^{1/2}.
Lemma 3.6. Assume that Assumptions 3.1–3.4 hold. Let \varphi_j(u, \lambda), j = 1, \dots, k, be d(Z) \times d(Z) matrix-valued continuous functions on [0, 1] \times [-\pi, \pi] which satisfy the same conditions as the transfer function matrix A(u, \lambda) in Assumption 3.1 and \varphi_j(u, \lambda)^* = \varphi_j(u, \lambda), \varphi_j(u, -\lambda) = \varphi_j(u, \lambda)'. Then

L_T\big( \varphi_j \big) = \sqrt{T} \left\{ \frac{1}{T} \sum_{t=1}^{T} \int_{-\pi}^{\pi} \operatorname{tr}\Big\{ \varphi_j\Big( \frac{t}{T}, \lambda \Big) I_N\Big( \frac{t}{T}, \lambda \Big) \Big\} \, d\lambda - \int_0^1 \int_{-\pi}^{\pi} \operatorname{tr}\big\{ \varphi_j(u, \lambda) f(u, \lambda) \big\} \, d\lambda \, du \right\}, \quad j = 1, \dots, k,   (3.9)
have, asymptotically, a normal distribution with zero mean vector and covariance matrix V whose (i, j)th element is

4\pi \int_0^1 \Bigg[ \int_{-\pi}^{\pi} \operatorname{tr}\big\{ \varphi_i(u, \lambda) f(u, \lambda) \varphi_j(u, \lambda) f(u, \lambda) \big\} \, d\lambda + \frac{1}{4\pi^2} \sum_{a_1, a_2, a_3, a_4} \sum_{b_1, b_2, b_3, b_4} \kappa_{b_1, b_2, b_3, b_4} \times \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \varphi_i(u, \lambda)_{a_1, a_2} \varphi_j(u, \mu)_{a_4, a_3} A(u, \lambda)_{a_2, b_1} A(u, -\lambda)_{a_1, b_2} A(u, -\mu)_{a_4, b_3} A(u, \mu)_{a_3, b_4} \, d\lambda \, d\mu \Bigg] du.   (3.10)
Assumption 3.4 does not coincide with Assumption A.1(ii) of Dahlhaus [7]. As mentioned in the A.3 Remarks of Dahlhaus [7, page 27], Assumption A.1(ii) there is required because of the \sqrt{T}-unbiasedness at the boundaries 0 and 1. If we assume that \{ Z_{2-N/2,T}, \dots, Z_{0,T} \} and \{ Z_{T+1,T}, \dots, Z_{T+N/2,T} \} are available, then with Assumption 3.4, from Lemma 3.5(i),

E\big( L_T(\varphi_j) \big) = \sqrt{T} E \left\{ \frac{1}{T} \sum_{t=1}^{T} \int_{-\pi}^{\pi} \operatorname{tr}\Big\{ \varphi_j\Big( \frac{t}{T}, \lambda \Big) I_N\Big( \frac{t}{T}, \lambda \Big) \Big\} \, d\lambda - \int_0^1 \int_{-\pi}^{\pi} \operatorname{tr}\big\{ \varphi_j(u, \lambda) f(u, \lambda) \big\} \, d\lambda \, du \right\} = O\left( \sqrt{T} \Big( \frac{N^2}{T^2} + \frac{\log N}{N} + \frac{1}{T} \Big) \right) = o(1).   (3.11)
4. Testing Problem for Linear Dependence
In this section we discuss the testing problem for linear dependence. The average measure of linear dependence is given by the following integral functional of the time-varying spectral density:

\lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} M_{t,T}^{(X,Y)} = \int_0^1 \int_{-\pi}^{\pi} -\frac{1}{2\pi} \log \Big| I_{d(Y)} - f_{yx}(u, \lambda) f_{xx}(u, \lambda)^{-1} f_{xy}(u, \lambda) f_{yy}(u, \lambda)^{-1} \Big| \, d\lambda \, du = \int_0^1 \int_{-\pi}^{\pi} K_{(X,Y)}\{ f(u, \lambda) \} \, d\lambda \, du,   (4.1)
where

K_{(X,Y)}\{ f(u, \lambda) \} \equiv -\frac{1}{2\pi} \log \Big| I_{d(Y)} - f_{yx}(u, \lambda) f_{xx}(u, \lambda)^{-1} f_{xy}(u, \lambda) f_{yy}(u, \lambda)^{-1} \Big|.   (4.2)
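The integrand (4.2) is straightforward to evaluate from a block spectral matrix. In the sketch below, the 2×2 numerical matrix is a hypothetical value of f(u, λ) at one (u, λ) point with d(X) = d(Y) = 1; K vanishes exactly when the off-diagonal blocks are zero.

```python
import numpy as np

def K_XY(f, dX):
    """-(1/2pi) log |I - f_yx f_xx^{-1} f_xy f_yy^{-1}|, eq. (4.2)."""
    fxx, fxy = f[:dX, :dX], f[:dX, dX:]
    fyx, fyy = f[dX:, :dX], f[dX:, dX:]
    m = np.eye(f.shape[0] - dX) \
        - fyx @ np.linalg.inv(fxx) @ fxy @ np.linalg.inv(fyy)
    return -np.linalg.slogdet(m)[1] / (2 * np.pi)

f = np.array([[1.0, 0.4],
              [0.4, 2.0]])          # hypothetical f(u, lam), correlated blocks
print(K_XY(f, 1))                   # positive: nonzero coherence here
```

The average measure (4.1) is then a double quadrature of this function over u and λ, with f replaced by the estimator \hat{f} in the test statistic.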
We consider the testing problem for the existence of linear dependence:

H : \int_0^1 \int_{-\pi}^{\pi} K_{(X,Y)}\{ f(u, \lambda) \} \, d\lambda \, du = 0   (4.3)

against

A : \int_0^1 \int_{-\pi}^{\pi} K_{(X,Y)}\{ f(u, \lambda) \} \, d\lambda \, du \ne 0.   (4.4)
For this testing problem, we define the test statistic S_T as

S_T = \sqrt{T} \int_0^1 \int_{-\pi}^{\pi} K_{(X,Y)}\big\{ \hat{f}(u, \lambda) \big\} \, d\lambda \, du;   (4.5)

then, we have the following result.
Theorem 4.1. Under H,

S_T \xrightarrow{D} N\big( 0, V_{K_{(X,Y)}}^2 \big),   (4.6)

where the asymptotic variance of S_T is given by

V_{K_{(X,Y)}}^2 = 4\pi \int_0^1 \Bigg[ \int_{-\pi}^{\pi} \operatorname{tr}\big[ f(u, \lambda) K_{(X,Y)}^{(1)}\{ f(u, \lambda) \}' \big]^2 \, d\lambda + \frac{1}{4\pi^2} \sum_{a,b,c,d} \kappa_{a,b,c,d} \gamma_{b,a}(u) \gamma_{c,d}(u) \Bigg] du,   (4.7)

with

\Gamma(u) = \{ \gamma_{a,b}(u) \}_{a,b=1,\dots,d(Z)} = \int_{-\pi}^{\pi} A(u, \lambda)^* K_{(X,Y)}^{(1)}\{ f(u, \lambda) \} A(u, \lambda) \, d\lambda,   (4.8)

and K_{(X,Y)}^{(1)}(\cdot) is the first derivative of K_{(X,Y)}(\cdot).
For simplicity, {Z_{t,T}} is assumed to be a Gaussian locally stationary process. Then, the asymptotic variance of S_T becomes an integral functional of the time-varying spectral density:

V_{K_{(X,Y)}}^2 = 4\pi \int_0^1 \int_{-\pi}^{\pi} \operatorname{tr}\big[ f(u, \lambda) K_{(X,Y)}^{(1)}\{ f(u, \lambda) \}' \big]^2 \, d\lambda \, du = V_{K_{(X,Y)}}^2 \{ f(u, \lambda) \}.   (4.9)

If we take \hat{V}_{K_{(X,Y)}}^2 = V_{K_{(X,Y)}}^2 \{ \hat{f}(u, \lambda) \}, then \hat{V}_{K_{(X,Y)}}^2 is a consistent estimator of the asymptotic variance, so we have

L_T = \frac{S_T}{\sqrt{\hat{V}_{K_{(X,Y)}}^2}} \xrightarrow{D} N(0, 1).   (4.10)
Next, we introduce a measure of the goodness of our test. Consider a sequence of alternative spectral density matrices

g_T(u, \lambda) = f(u, \lambda) + \frac{1}{\sqrt{T}} b(u, \lambda),   (4.11)

where b(u, \lambda) is a d(Z) \times d(Z) matrix whose entries b_{ab}(u, \lambda) are square-integrable functions on [0, 1] \times [-\pi, \pi].
Let E_{g_T}(\cdot) and V_f(\cdot) denote the expectation under g_T(u, \lambda) and the variance under f(u, \lambda), respectively. It is natural to define the efficacy of L_T by

\operatorname{eff}(L_T) = \lim_{T \to \infty} \frac{E_{g_T}(S_T)}{\sqrt{V_f(S_T)}}   (4.12)

in line with the usual definition for a sequence of "parametric alternatives." Then we see that

\operatorname{eff}(L_T) = \lim_{T \to \infty} \frac{\sqrt{T} \int_0^1 \int_{-\pi}^{\pi} \big[ K_{(X,Y)}\{ g_T(u, \lambda) \} - K_{(X,Y)}\{ f(u, \lambda) \} \big] \, d\lambda \, du}{\sqrt{V_{K_{(X,Y)}}^2}} = \frac{\int_0^1 \int_{-\pi}^{\pi} \operatorname{tr}\big[ K_{(X,Y)}^{(1)}\{ f(u, \lambda) \} b(u, \lambda)' \big] \, d\lambda \, du}{\sqrt{V_{K_{(X,Y)}}^2}}.   (4.13)
For another test L_T^* we can define the asymptotic relative efficiency (ARE) of L_T relative to L_T^* by

\operatorname{ARE}\big( L_T, L_T^* \big) = \left\{ \frac{\operatorname{eff}(L_T)}{\operatorname{eff}(L_T^*)} \right\}^2.   (4.14)
Table 1: L_T in (4.10) for each pair of companies.
1:Hi 2:Ma 3:Sh 4:So 5:Ho 6:Ni 7:To
1:Hi — — — — — — —
2:Ma 18.79 — — — — — —
3:Sh 19.86 18.93 — — — — —
4:So 19.22 19.18 18.27 — — — —
5:Ho 15.35 14.46 15.17 15.42 — — —
6:Ni 15.18 15.03 15.84 16.58 19.24 — —
7:To 15.86 16.06 16.00 16.61 20.57 19.12 —
Figure 1: The daily linear dependence between HONDA and TOYOTA, 2000–2004.
If we take a test statistic based on the stationarity assumption as the other test L_T^*, we can measure the effect of nonstationarity when the process concerned is locally stationary.
Finally, we discuss a testing problem of linear dependence for stock prices on the Tokyo Stock Exchange. The data are daily log returns of 7 companies: 1: HITACHI, 2: MATSUSHITA, 3: SHARP, 4: SONY, 5: HONDA, 6: NISSAN, 7: TOYOTA. Each individual time series consists of 1174 data points from December 28, 1999 to October 1, 2004. We compute L_T in (4.10) for each pair of companies. The selected parameters are T = 1000, N = 175, and M = 8, where N is the length of the segment over which the localized periodogram is taken and M is the bandwidth of the weight function.
The results are listed in Table 1. All values for each pair of companies are large. Since under the null hypothesis the limit distribution of L_T is standard normal, we conclude that the hypothesis is rejected; namely, linear dependence exists between each pair of companies. In particular, the values both among the electric appliance companies and among the automobile companies are significantly large. Therefore, we can see that companies in the same business have strong dependence.
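To put the magnitudes in Table 1 in perspective: under H the statistics are asymptotically N(0, 1), so even the smallest entry, 14.46 (the HONDA-MATSUSHITA pair), lies far beyond any conventional critical value. A quick check with the standard normal tail:

```python
import math

# Two-sided p-value of the smallest statistic in Table 1 under N(0, 1):
# P(|Z| > z) = erfc(z / sqrt(2)).
z = 14.46
p = math.erfc(z / math.sqrt(2))
print(p < 1e-40)   # True: far below any usual significance level
```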
In Figures 1 and 2, the daily linear dependence between HONDA and TOYOTA and between HITACHI and SHARP is plotted. They show that the daily dependencies are not
Figure 2: The daily linear dependence between HITACHI and SHARP, 2000–2004.
constant and change in time. So, it seems reasonable to use the test statistic based on the nonstationarity assumption.
References
[1] J. Geweke, "Measurement of linear dependence and feedback between multiple time series," Journal of the American Statistical Association, vol. 77, no. 378, pp. 304–313, 1982.
[2] Y. Hosoya, "The decomposition and measurement of the interdependency between second-order stationary processes," Probability Theory and Related Fields, vol. 88, no. 4, pp. 429–444, 1991.
[3] M. Taniguchi, M. L. Puri, and M. Kondo, "Nonparametric approach for non-Gaussian vector stationary processes," Journal of Multivariate Analysis, vol. 56, no. 2, pp. 259–283, 1996.
[4] M. B. Priestley, Spectral Analysis and Time Series, Academic Press, London, UK, 1981.
[5] R. Dahlhaus, "On the Kullback-Leibler information divergence of locally stationary processes," Stochastic Processes and Their Applications, vol. 62, no. 1, pp. 139–168, 1996.
[6] R. Dahlhaus, "Maximum likelihood estimation and model selection for locally stationary processes," Journal of Nonparametric Statistics, vol. 6, no. 2-3, pp. 171–191, 1996.
[7] R. Dahlhaus, "Fitting time series models to nonstationary processes," The Annals of Statistics, vol. 25, no. 1, pp. 1–37, 1997.
[8] R. Dahlhaus, "A likelihood approximation for locally stationary processes," The Annals of Statistics, vol. 28, no. 6, pp. 1762–1794, 2000.
[9] Yu. A. Rozanov, Stationary Random Processes, Holden-Day, San Francisco, Calif, USA, 1967.
[10] R. Dahlhaus, "Asymptotic statistical inference for nonstationary processes with evolutionary spectra," in Athens Conference on Applied Probability and Time Series 2, vol. 115 of Lecture Notes in Statistics, pp. 145–159, Springer, New York, NY, USA, 1996.
[11] K. Sakiyama and M. Taniguchi, "Discriminant analysis for locally stationary processes," Journal of Multivariate Analysis, vol. 90, no. 2, pp. 282–300, 2004.
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 703465, 13 pages
doi:10.1155/2012/703465
Research Article
Optimal Portfolios with End-of-Period Target
Hiroshi Shiraishi,1 Hiroaki Ogata,2 Tomoyuki Amano,3 Valentin Patilea,4 David Veredas,5 and Masanobu Taniguchi6

1 The Jikei University School of Medicine, Tokyo 182-8570, Japan
2 School of International Liberal Studies, Waseda University, Tokyo 169-8050, Japan
3 Faculty of Economics, Wakayama University, Wakayama 640-8510, Japan
4 CREST, Ecole Nationale de la Statistique et de l'Analyse de l'Information, France
5 ECARES, Solvay Brussels School of Economics and Management, Université libre de Bruxelles, CP 114/04, Avenue F.D. Roosevelt 50, 1050 Brussels, Belgium
6 Department of Applied Mathematics, School of Fundamental Science and Engineering, Waseda University, Tokyo 169-8555, Japan
Correspondence should be addressed to Hiroshi Shiraishi, [email protected]
Received 7 November 2011; Accepted 22 December 2011
Academic Editor: Junichi Hirukawa
Copyright © 2012 Hiroshi Shiraishi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We study the estimation of optimal portfolios for a Reserve Fund with an end-of-period target when the returns of the assets that constitute the Reserve Fund portfolio follow two specifications. In the first one, assets are split into short memory (bonds) and long memory (equity), and the optimality of the portfolio is based on maximizing the Sharpe ratio. In the second, returns follow a conditional heteroskedasticity autoregressive nonlinear model, and we study the case when the distribution of the innovation vector is heavy-tailed stable. For this specification, we consider appropriate estimation methods, which include bootstrap and empirical likelihood.
1. Introduction
The Government Pension Investment Fund (GPIF) of Japan was established on April 1, 2006 as an independent administrative institution with the mission of managing and investing the Reserve Fund of the employees’ pension insurance and the national pension (see http://www.gpif.go.jp/ for more information) [1]. It is the world’s largest pension fund ($1.4 trillion in assets under management as of December 2009), and it has a mission of managing and investing the Reserve Funds in safe and efficient investments with a long-term perspective. Business management targets to be achieved by GPIF are set by the Minister of Health, Labour, and Welfare based on the law on the general rules of independent administrative agencies. In actuarial science, the “required Reserve Fund” for pension insurance has been investigated for a long time. The traditional approach focuses on the expected value of future obligations and
interest rate. Then, the investment strategy is determined so as to exceed the expected value of the interest rate. Recently, solvency for the insurer has been defined in terms of random values of future obligations (e.g., Olivieri and Pitacco [2]). In this paper, we assume that the Reserve Fund is defined in terms of the random interest rate and the expected future obligations. Then, we propose optimal portfolios by optimizing the randomized Reserve Fund.
The GPIF invests in a portfolio of domestic and international stocks and bonds. In this paper, we consider the optimal portfolio problem of the Reserve Fund under two econometric specifications for the assets’ returns.
First, we select the optimal portfolio weights based on the maximization of the Sharpe ratio under three different functional forms for the portfolio mean and variance, two of them depending on the Reserve Fund at the end-of-period target (about 100 years). Following the asset structure of the GPIF, we split the assets into cash and domestic and foreign bonds on one hand, and domestic and foreign equity on the other. The first type of assets is assumed to be short memory, while the second type is long memory. To obtain the optimal portfolio weights, we rely on the bootstrap. For the short memory returns, we use the wild bootstrap (WB). Early work focuses on providing first- and second-order theoretical justification for the wild bootstrap in the classical linear regression model (see, e.g., [3]). Goncalves and Kilian [4] show that the WB is applicable to the linear regression model with conditional heteroscedasticity, such as stationary ARCH, GARCH, and stochastic volatility effects. For the long memory returns, we apply the sieve bootstrap (SB). Buhlmann [5] establishes consistency of the autoregressive sieve bootstrap. Assuming that the long memory process can be written as AR(∞) and MA(∞) processes, we estimate the long memory parameter by means of Whittle’s approximate likelihood [6]. Given this estimator, the residuals are computed and resampled for the construction of the bootstrap samples, from which the estimated optimal portfolio weights are obtained. We study the usefulness of these procedures with an application to the GPIF assets.
Second, we consider the case when the returns are time dependent and follow a heavy-tailed distribution. It is known that one of the stylized facts of financial returns is heavy tails. It is, therefore, reasonable to use the stable distribution instead of the Gaussian, since it allows for skewness and fat tails. We couple this distribution with the conditional heteroskedasticity autoregressive nonlinear (CHARN) model that nests many well-known time series models, such as ARMA and ARCH. We estimate the parameters and the optimal portfolio by means of empirical likelihood.
The paper is organized as follows. Section 2 sets up the Reserve Fund portfolio problem. Section 3 focuses on the first part, that is, estimation in terms of the Sharpe ratio, and discusses the bootstrap procedure. Section 4 covers the CHARN model under stable innovations and the estimation by means of empirical likelihood. Section 5 concludes.
2. Reserve Funds Portfolio with End-of-Period Target
Let Si,t be the price of the ith asset at time t (i = 1, . . . , k), and let Xi,t be its log-return. Time runs from 0 to T. In this paper, we consider that today is T0 and that T is the end-of-period target. Hence, the past and present observations run for t = 0, . . . , T0, and the future until the end-of-period target for t = T0 + 1, . . . , T. The price Si,t can be written as
Si,t = Si,t−1 exp{Xi,t} = Si,0 exp( ∑_{s=1}^{t} Xi,s ),   (2.1)
where Si,0 is the initial price. Let Fi,t denote the Reserve Fund on asset i at time t, defined by
Fi,t = Fi,t−1 exp{Xi,t} − ci,t, (2.2)
where ci,t denotes the maintenance cost at time t. By recursion, Fi,t can be written as
Fi,t = Fi,t−1 exp{Xi,t} − ci,t
     = Fi,t−2 exp( ∑_{s=t−1}^{t} Xi,s ) − ∑_{s=t−1}^{t} ci,s exp( ∑_{s′=s+1}^{t} Xi,s′ )
     = Fi,0 exp( ∑_{s=1}^{t} Xi,s ) − ∑_{s=1}^{t} ci,s exp( ∑_{s′=s+1}^{t} Xi,s′ ),   (2.3)
where Fi,0 = Si,0. We gather the Reserve Funds in the vector Ft = (F1,t, . . . , Fk,t). Let Ft(α) = α′Ft be a portfolio formed by the k Reserve Funds, which depends on the vector of weights α = (α1, . . . , αk). The portfolio Reserve Fund can be expressed as a function of all past returns:
Ft(α) ≡ ∑_{i=1}^{k} αi Fi,t = ∑_{i=1}^{k} αi ( Fi,0 exp( ∑_{s=1}^{t} Xi,s ) − ∑_{s=1}^{t} ci,s exp( ∑_{s′=s+1}^{t} Xi,s′ ) ).   (2.4)
We are interested in maximizing Ft(α) at the end-of-period target FT (α)
FT(α) = ∑_{i=1}^{k} αi ( Fi,T0 exp( ∑_{s=T0+1}^{T} Xi,s ) − ∑_{s=T0+1}^{T} ci,s exp( ∑_{s′=s+1}^{T} Xi,s′ ) ).   (2.5)
It depends on the future returns, the maintenance cost, and the portfolio weights. While the first two are assumed to be constant from T0 to T (the constant return can be seen as the average return over the T − T0 periods), we focus on the optimality of the weights, which we denote by αopt.
3. Sharpe-Ratio-Based Optimal Portfolios
In the first specification, the estimation of the optimal portfolio weights is based on the max-imization of the Sharpe ratio:
αopt = arg max_α μ(α)/σ(α),   (3.1)
under different functional forms of the expectation μ(α) and the risk σ(α) of the portfolio. We propose three functional forms, two of them depending on the Reserve Fund. The first one is the traditional one, based on the returns:
μ(α) = α′E(XT ), σ(α) = √( α′V(XT )α ),   (3.2)
where E(XT ) and V (XT ) are the expectation and the covariance matrix of the returns at theend-of-period target. The second form for the portfolio expectation and risk is based on thevector of Reserve Funds:
μ(α) = α′E(FT ), σ(α) = √( α′V(FT )α ),   (3.3)
where E(FT ) and V (FT ) indicate the mean and covariance of the Reserve Funds at time T .Last, we consider the case where the portfolio risk depends on the lower partial moments ofthe Reserve Funds at the end-of-period target:
μ(α) = α′E(FT ), σ(α) = E{ (F − FT(α)) I( FT(α) < F ) },   (3.4)
where F is a given value.
Standard portfolio management rules are based on a mean-variance approach, for which risk is measured by the standard deviation of the future portfolio value. However, the variance often does not provide a correct assessment of risk under dependency and non-Gaussianity. To overcome this problem, various optimization models have been proposed, such as the mean-semivariance model, the mean-absolute deviation model, the mean-variance-skewness model, the mean-(C)VaR model, and the mean-lower partial moment model. The mean-lower partial moment model is an appropriate model for reducing the influence of heavy tails.
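For the return-based form (3.2), the unconstrained maximizer of the Sharpe ratio has the classical tangency-portfolio closed form α ∝ V⁻¹μ. The sketch below (Python) uses made-up expected returns and a made-up diagonal covariance for four assets; it ignores the fixed 5% cash weight imposed in the paper's empirical exercise, so it is an illustration of (3.1)–(3.2), not of the paper's exact procedure.

```python
import numpy as np

def sharpe_optimal_weights(mu, V):
    """Unconstrained maximizer of mu(a)/sigma(a) in (3.1)-(3.2):
    the tangency portfolio a proportional to V^{-1} mu, normalized to sum to one."""
    raw = np.linalg.solve(V, mu)
    return raw / raw.sum()

def sharpe(a, mu, V):
    """Sharpe ratio of portfolio a under moments (mu, V)."""
    return a @ mu / np.sqrt(a @ V @ a)

# hypothetical expected returns and covariance matrix for (DB, DE, FB, FE)
mu = np.array([0.020, 0.050, 0.015, 0.055])
V = np.diag([0.010, 0.040, 0.012, 0.050])
a_opt = sharpe_optimal_weights(mu, V)
equal = np.full(4, 0.25)
```

By construction, `a_opt` attains at least the Sharpe ratio of any other fully invested portfolio, such as the equally weighted one.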
The k returns are split into p- and q-dimensional vectors {XSt ; t ∈ Z} and {XLt ; t ∈ Z},where S and L stand for short and long memory, respectively. The short memory returnscorrespond to cash and domestic and foreign bonds, which we generically denote by bonds.The long memory returns correspond to domestic and foreign equity, which we denote asequity.
Cash and bonds follow the nonlinear model
XSt = μS + H( XSt−1, . . . , XSt−m ) εSt,   (3.5)

where μS is a vector of constants, H : Rmp → Rp×p is a positive definite matrix-valued measurable function, and the εSt = (εS1,t, . . . , εSp,t) are i.i.d. random vectors with mean 0 and covariance matrix ΣS. By contrast, equity returns follow the long memory nonlinear model
XLt = ∑_{ν=0}^{∞} φν εLt−ν,  εLt = ∑_{ν=0}^{∞} ψν XLt−ν,   (3.6)
where
φν = Γ(ν + d) / ( Γ(ν + 1)Γ(d) ),  ψν = Γ(ν − d) / ( Γ(ν + 1)Γ(−d) ),   (3.7)

with −1/2 < d < 1/2, and the εLt = (εL1,t, . . . , εLq,t) are i.i.d. random vectors with mean 0 and covariance matrix ΣL.
We estimate the optimal portfolio weights by means of the bootstrap. Let the superscripts (S, b) and (L, b) denote the bootstrapped samples for the bonds and equity, respectively, and let B be the total number of bootstrapped samples. In the sequel, we show the bootstrap procedure for both types of assets.
Bootstrap Procedure for X(S,b)t
Step 1. Generate the i.i.d. sequences {ε(S,b)t } for t = T0 + 1, . . . , T and b = 1, . . . , B from N(0, Ip).
Step 2. Let YSt = XSt − μ̂S, where μ̂S = (1/T0) ∑_{s=1}^{T0} XSs. Generate the i.i.d. sequences {Y(S,b)t} for t = T0 + 1, . . . , T and b = 1, . . . , B from the empirical distribution of {YSt}.

Step 3. Compute {X(S,b)t} for t = T0 + 1, . . . , T and b = 1, . . . , B as

X(S,b)t = μ̂S + Y(S,b)t ⊙ ε(S,b)t,   (3.8)

where ⊙ denotes the cellwise (elementwise) product.
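Steps 1–3 of the wild bootstrap can be sketched as follows. The sample size, horizon, number of replications, and return moments below are made up for illustration; the random draws stand in for the paper's observed bond returns.

```python
import numpy as np

rng = np.random.default_rng(1)
T0, T, B, p = 120, 126, 500, 3              # sample size, target date, replications, assets
XS = rng.normal(0.001, 0.02, size=(T0, p))  # hypothetical observed short-memory returns

mu_S = XS.mean(axis=0)                      # sample mean of the observed returns
YS = XS - mu_S                              # centred returns Y_t = X_t - mu_hat
h = T - T0                                  # number of future periods to simulate

eps = rng.standard_normal(size=(B, h, p))   # Step 1: N(0, I_p) multipliers
idx = rng.integers(0, T0, size=(B, h))      # Step 2: draw from the empirical law of Y
Yb = YS[idx]                                # resampled centred returns, shape (B, h, p)
Xb = mu_S + Yb * eps                        # Step 3: cellwise product as in (3.8)
```

Each of the B slices of `Xb` is one bootstrapped future return path for the p short-memory assets.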
Bootstrap Procedure for X(L,b)t
Step 1. Estimate d from the observed returns by means of Whittle’s approximate likelihood:
d̂ = arg min_{d∈(0,1/2)} L(d, Σ),   (3.9)

where

L(d, Σ) = (2/T0) ∑_{j=1}^{(T0−1)/2} { log det f(λj,T0, d, Σ) + tr( f(λj,T0, d, Σ)^{−1} I(λj,T0) ) },
f(λ, d, Σ) = |1 − exp(iλ)|^{−2d} Σ / (2π),
I(λ) = (1/√(2πT0)) | ∑_{t=1}^{T0} XLt e^{itλ} |²,
λj,T0 = 2πj/T0.   (3.10)
Step 2. Compute {εLt } for t = 1, . . . , T0,
ε̂Lt = ∑_{k=0}^{t−1} πk XLt−k,  where  πk = Γ(k − d̂) / ( Γ(k + 1)Γ(−d̂) ) for k ≤ 100,  and  πk = k^{−d̂−1} / Γ(−d̂) for k > 100.   (3.11)
Step 3. Generate {ε(L,b)t} for t = T0 + 1, . . . , T and b = 1, . . . , B from the empirical distribution of {ε̂Lt}.

Step 4. Generate {X(L,b)t} for t = T0 + 1, . . . , T and b = 1, . . . , B as

X(L,b)t = ∑_{k=0}^{t−T0−1} τk ε(L,b)t−k + ∑_{k=t−T0}^{t−1} τk ε̂t−k.   (3.12)
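The sieve bootstrap hinges on the fractional-filter coefficients of (3.7) and (3.11). A minimal sketch (Python, with a made-up value d = 0.3 standing in for the estimated long-memory parameter) computes both filters and checks that they invert each other, which is the property that lets Step 2 recover the residuals from the AR(∞) representation.

```python
from math import gamma
import numpy as np

d = 0.3  # hypothetical value of the estimated long-memory parameter

def pi_k(k, d):
    """AR(infinity) coefficients pi_k of (3.11); the tail form k^{-d-1}/Gamma(-d)
    avoids Gamma overflow for large k."""
    if k <= 100:
        return gamma(k - d) / (gamma(k + 1) * gamma(-d))
    return k ** (-d - 1) / gamma(-d)

def phi_k(k, d):
    """MA(infinity) coefficients phi_k of (3.7)."""
    return gamma(k + d) / (gamma(k + 1) * gamma(d))

N = 60
pi = np.array([pi_k(k, d) for k in range(N)])
phi = np.array([phi_k(k, d) for k in range(N)])

# phi and pi are the binomial expansions of (1 - L)^{-d} and (1 - L)^{d},
# so their convolution must be the identity filter (1, 0, 0, ...)
conv = np.convolve(phi, pi)[:N]

def residuals(XL, d):
    """Step 2 of the sieve bootstrap: eps_t = sum_{k=0}^{t-1} pi_k X_{t-k}."""
    return np.array([sum(pi_k(k, d) * XL[t - k] for k in range(t + 1))
                     for t in range(len(XL))])
```

For a univariate series `XL`, `residuals(XL, d)` returns the filtered innovations that are then resampled in Step 3.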
We gather X(S,b)t and X(L,b)t into X(b)t = (X(S,b)t, X(L,b)t) = (X(b)1,t, . . . , X(b)p+q,t) for t = T0 + 1, . . . , T and b = 1, . . . , B. The bootstrapped Reserve Funds F(b)T = (F(b)1,T, . . . , F(b)p+q,T) are

F(b)i,T = Fi,T0 exp( ∑_{s=T0+1}^{T} X(b)i,s ) − ∑_{s=T0+1}^{T} ci,s exp( ∑_{s′=s+1}^{T} X(b)i,s′ ).   (3.13)
And the bootstrapped Reserve Fund portfolio is
F(b)T(α) = α′F(b)T = ∑_{i=1}^{p+q} αi F(b)i,T.   (3.14)
Finally, the estimated portfolio weights that give the optimal portfolio are
α̂opt = arg max_α μ(b)(α)/σ(b)(α),   (3.15)

where μ(b)(α) and σ(b)(α) may take any of the three forms introduced earlier, but evaluated on the bootstrapped returns or Reserve Funds.
3.1. An Illustration
We consider monthly log-returns from January 31, 1971 to October 31, 2009 (466 observations) of the five types of assets considered earlier: domestic bond (DB), domestic equity (DE), foreign bond (FB), foreign equity (FE), and cash. Cash and bonds are gathered in the short-memory panel XSt = (X(DB)t, X(FB)t, X(cash)t) and follow (3.5). Equities are gathered into
Figure 1: Monthly log-returns of the five assets (DB, DE, FB, FE, and cash).
Table 1: Estimated optimal portfolio weights (Section 3).
              DB    DE    FB    FE    Cash
Returns       0.95  0.00  0.00  0.00  0.05
Reserve fund  0.75  0.00  0.20  0.00  0.05
Lower partial 0.85  0.10  0.00  0.00  0.05
the long-memory panel XLt = (X(DE)t, X(FE)t) and follow (3.6). Figure 1 shows the five assets. Cash is virtually constant, and equities are significantly more volatile than bonds, with averages that are slightly higher than those of bonds.
We estimate the optimal portfolio weights, αopt1, αopt2, and αopt3, corresponding to the three forms for the expectation and risk of the Sharpe ratio, and we compute the trajectory of the optimal Reserve Fund for t = T0 + 1, . . . , T. For liquidity reasons, the portfolio weight for cash is kept constant at 5%. The target period is fixed at 100 years, and the maintenance cost is based on the 2004 Pension Reform.
Table 1 shows the estimated optimal portfolio weights for the three different choices of the portfolio expectation and risk. The weight of domestic bonds is very high and clearly dominates the other assets. Domestic bonds are low risk and medium return, in contrast with equity, which shows higher return but also higher risk, and with foreign bonds, which show low return and risk. Therefore, in a sense, domestic bonds are a compromise between the characteristics of the remaining equities and bonds.
Figure 2 shows the trajectory of the future Reserve Fund for different values of the yearly return (assumed to be constant from T0 + 1 to T), ranging from 2.7% to 3.7%. Since the investment term is extremely long, 100 years, the Reserve Fund is quite sensitive to the choice of the yearly return. In the 2004 Pension Reform, the authorities assumed a yearly portfolio return of 3.2%, which corresponds to the middle thick line of the figure.
4. Optimal Portfolio with Time-Dependent Returns and Heavy Tails
In this section, we consider the second scenario, where returns follow a dependent model with stable innovations. The theory of portfolio choice is mostly based on the assumption that investors maximize their expected utility. The most well-known utility is Markowitz’s mean-variance function, which is optimal under Gaussianity. However, it is widely
Figure 2: Trajectories of the future Reserve Fund for yearly returns (YR) ranging from 2.7% to 3.7%; YR = 3.2% is the base case.
acknowledged that financial returns show fat tails and, frequently, skewness. Moreover, the variance may not always be the best risk measure. Since the purpose of the GPIF is to avoid making a big loss at a certain point in the future, risk measures that summarize the probability that the Reserve Fund falls below a prescribed level at a certain future point, such as value at risk (VaR), are more appropriate [7]. In addition, the traditional mean-variance approach considers that returns are i.i.d., which is not realistic, as past information may help to explain today’s distribution of returns.
We need a specification that allows for heavy tails, skewness, and time dependence. This calls for a general model with location and scale that are functions of past observations and with innovations that are stable distributed. The location-scale model for the returns is the conditional heteroscedastic autoregressive nonlinear (CHARN) model, which is fairly general and nests important models such as ARMA and ARCH.
Estimation of the parameters in a stable framework is not straightforward, since the density does not have a closed form. (Maximum likelihood is feasible for the i.i.d. univariate case thanks to the STABLE package developed by John Nolan—see Nolan [8] and the website http://academic2.american.edu/∼jpnolan/stable/stable.html. For more complicated cases, including dynamics, maximum likelihood is a quite complex task.) We rely on empirical likelihood, a nonparametric method that has already been studied in this context [9]. Once the parameters are estimated, we simulate samples of the returns, which are used to compute the Reserve Fund at the end-of-period target, and estimate the optimal portfolio weights by minimizing the empirical VaR of the Reserve Fund at time T.
Suppose that the vector of returns Xt ∈ Rk follows the CHARN model

Xt = Fμ( Xt−1, . . . , Xt−p ) + Hσ( Xt−1, . . . , Xt−q ) εt,   (4.1)

where Fμ : Rkp → Rk is a vector-valued measurable function with a parameter μ ∈ Rp1, and Hσ : Rkq → Rk×k is a positive definite matrix-valued measurable function with a parameter σ ∈ Rp2. Each element of the vector of innovations εt ∈ Rk is standardized stable distributed, εi,t ∼ S(αi, βi, 1, 0), and the εi,t are independent with respect to both i and t. We set θ = (μ, σ, α, β), where α = (α1, . . . , αk) and β = (β1, . . . , βk).
The stable distribution is often represented by its characteristic function:
φ(ν) = E[ exp(iνεi,t) ] = exp( −γ^α |ν|^α ( 1 + iβ sgn(ν) tan(πα/2) ( |γν|^{1−α} − 1 ) ) + iνδ ),   (4.2)
where δ ∈ R is a location parameter, γ > 0 is a scale parameter, β ∈ [−1, 1] is a skewness parameter, and α ∈ (0, 2] is a characteristic exponent that captures the tail thickness of the distribution: the smaller the α, the heavier the tail. The distribution with α = 2 corresponds to the Gaussian. The existence of moments is governed by α: moments of order higher than α do not exist, with the case α = 2 being an exception, for which all moments exist.
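The characteristic function (4.2) is easy to evaluate directly. The sketch below (Python, restricted to the α ≠ 1 branch printed above) checks two of the properties just stated: at α = 2 the skewness term vanishes and the CF collapses to the Gaussian one, and for α < 2 with β = 0 the modulus is exp(−|ν|^α).

```python
import cmath, math

def stable_cf(nu, alpha, beta=0.0, gamma=1.0, delta=0.0):
    """Characteristic function (4.2) of S(alpha, beta, gamma, delta) in the
    parameterization printed in the paper (alpha != 1 branch only)."""
    if nu == 0:
        return 1.0 + 0.0j
    s = math.copysign(1.0, nu)
    skew = 1.0 + 1j * beta * s * math.tan(math.pi * alpha / 2.0) \
               * (abs(gamma * nu) ** (1.0 - alpha) - 1.0)
    return cmath.exp(-(gamma ** alpha) * abs(nu) ** alpha * skew + 1j * nu * delta)

# alpha = 2: tan(pi) = 0, so the CF reduces to the Gaussian exp(i nu delta - gamma^2 nu^2)
phi_gauss = stable_cf(1.0, alpha=2.0, beta=0.7)
# alpha < 2, beta = 0: |phi(nu)| = exp(-|nu|^alpha), slower decay than the Gaussian CF
phi_stable = stable_cf(2.0, alpha=1.5)
```

The slower decay of |φ| in ν for small α is the frequency-domain face of the heavy tails.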
The lack of important moments may, in principle, render estimation by the method of moments difficult. However, instead of matching moments, it is fairly simple to match the theoretical and empirical characteristic functions evaluated at a grid of frequencies [9]. Let
εt = Hσ^{−1}( Xt − Fμ )   (4.3)
be the residual of the CHARN model. If the parameters μ and σ are the true ones, the residuals εi,t should be independently and identically distributed as S(αi, βi, 1, 0). So the aim is to find estimated parameters such that the residuals are i.i.d. and stable distributed, meaning that their probability law is expressed by the above characteristic function. In other words, we estimate the parameters by matching the empirical and theoretical characteristic functions and minimizing their distance. Let J be the number of frequencies at which we evaluate the characteristic function: ν1, . . . , νJ. That makes, in principle, a system of J matching equations. But since the characteristic function can be split into real and imaginary parts, φ(ν) = E[cos(νεi,t)] + iE[sin(νεi,t)], we double the dimension of the system by matching both parts. Let Re(φ(ν)) and Im(φ(ν)) be the real and imaginary parts of the theoretical characteristic function, and cos(νεi,t) and sin(νεi,t) their empirical counterparts. The estimating functions are
ψθ(εi,t) = ( cos(ν1εi,t) − Re(φ(ν1)), . . . , cos(νJεi,t) − Re(φ(νJ)), sin(ν1εi,t) − Im(φ(ν1)), . . . , sin(νJεi,t) − Im(φ(νJ)) )′,   (4.4)
for each i = 1, . . . , k, and gather them into the vector
ψθ(εt) = ( ψθ(ε1,t), . . . , ψθ(εk,t) ).   (4.5)
The number of frequencies J and the frequencies themselves are chosen arbitrarily. Feuerverger and McDunnough [10] show that the asymptotic variance can be made arbitrarily close to the Cramer-Rao lower bound if the number of frequencies is sufficiently large and the grid is sufficiently fine and extended. Similarly, Yu [11, Section 2.1] argues that, from the viewpoint of minimizing the asymptotic variance, a fine grid with many frequencies is appropriate. However, Carrasco and Florens [12] show that a grid that is too fine leads to a singular asymptotic variance matrix, whose inverse cannot be computed.
Given the estimating functions (4.5), the natural estimator is constructed by GMM:
θ̂ = arg min_θ E[ ψθ(εt) ]′ W E[ ψθ(εt) ],   (4.6)
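The moment conditions (4.4)–(4.6) can be sketched numerically. For illustration, the residuals below are drawn from the α = 2 member of the stable family, S(2, 0, 1, 0), a Gaussian with variance 2 whose CF is exp(−ν²) in closed form; the weight matrix is taken as the identity rather than the optimal choice. The sample size, seed, and frequency grid are assumptions.

```python
import numpy as np

def psi_bar(eps, nus, phi):
    """Sample means of the estimating functions (4.4): empirical minus
    theoretical real and imaginary CF parts at the frequencies nus."""
    g = []
    for nu, ph in zip(nus, phi):
        g.append(np.cos(nu * eps).mean() - ph.real)
        g.append(np.sin(nu * eps).mean() - ph.imag)
    return np.array(g)

rng = np.random.default_rng(7)
eps = rng.normal(0.0, np.sqrt(2.0), size=200_000)  # residuals from S(2, 0, 1, 0)
nus = np.array([0.2, 0.5, 1.0])                    # hypothetical frequency grid
phi = np.exp(-nus ** 2) + 0j                       # theoretical CF at alpha = 2

gbar = psi_bar(eps, nus, phi)   # moment conditions, ~ 0 at the true parameters
Q = gbar @ gbar                 # GMM objective (4.6) with W = I
```

At the true parameters the averaged estimating functions are close to zero, which is what the minimization of (4.6) exploits when searching over θ.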
where W is a weighting matrix defining the metric (its optimal choice is typically the inverse of the covariance matrix of ψθ(εt)) and the expectations are replaced by sample moments. The GMM estimator can be generalized to the empirical likelihood estimator, which was originally proposed by Owen [13] as a nonparametric method of inference based on a data-driven likelihood ratio function (see also [14] for a review and applications). It produces a better variance estimate in one step, while, in general, the optimal GMM requires a preliminary step and a preliminary estimate of an optimal W matrix. We define the empirical likelihood ratio function for θ as
R(θ) = max_p { ∏_{t=1}^{T0} T0 pt  |  ∑_{t=1}^{T0} pt ψθ(εt) = 0, ∑_{t=1}^{T0} pt = 1, pt ≥ 0 },   (4.7)
where p = (p1, . . . , pT0), and the maximum empirical likelihood estimator is

θ̂ = arg max_θ R(θ).   (4.8)
Qin and Lawless [15] show that this estimator is consistent and asymptotically Gaussian with covariance matrix ( B′θ0 A^{−1}θ0 Bθ0 )^{−1}, where

Bθ = E[ ∂ψθ/∂θ′ ],  Aθ = E[ ψθ(εt) ψθ(εt)′ ].   (4.9)
Once the parameters are estimated, we compute the optimal portfolio weights and the portfolio Reserve Fund at the end-of-period target. Because of a notational conflict, the weights are now denoted by a = (a1, . . . , ak). For simplicity, we assume that there is no maintenance cost, so (2.5) simplifies to
FT(a) = ∑_{i=1}^{k} ai Fi,T0 exp( ∑_{t=T0+1}^{T} Xi,t ).   (4.10)
The procedure to estimate the optimal portfolio weights is as follows.
Step 1. For each asset i = 1, . . . , k, we simulate the innovation process
εi,t ~ i.i.d. S( α̂i, β̂i, 1, 0 ), t = T0 + 1, . . . , T,   (4.11)

based on the maximum empirical likelihood estimators (α̂i, β̂i).
Step 2. We calculate the predicted log-returns
X̂t = Fμ̂( X̂t−1, . . . , X̂t−p ) + Hσ̂( X̂t−1, . . . , X̂t−q ) ε̂t   (4.12)

for t = T0 + 1, . . . , T, based on the estimators (μ̂, σ̂) and the simulated ε̂t obtained in Step 1.
Step 3. For a given portfolio weight a, we calculate the predicted value of the fund at time T, F̂T(a), with (4.10).
Step 4. We repeat Steps 1–3 M times and save F̂(1)T(a), . . . , F̂(M)T(a). Then we calculate the proportion of predicted values that fall below the prescribed level F, that is,

ĝ(a) = (1/M) ∑_{m=1}^{M} I{ F̂(m)T(a) < F }.   (4.13)
Step 5. Minimize ĝ(a) with respect to a: a∗ = arg min_a ĝ(a).
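Steps 1–5 can be sketched as a Monte Carlo search over weights. In the sketch below (Python), Gaussian innovations stand in for the fitted stable law, the horizon, return moments, and prescribed level F are made up, and the minimization in Step 5 is a coarse grid search over two assets rather than a full optimization.

```python
import numpy as np

rng = np.random.default_rng(3)
H, M, F_level = 6, 2000, 0.98    # months to target, replications, prescribed level F

def g_hat(a):
    """Steps 1-4: simulate M return paths, compound to F_T(a) as in (4.10)
    with F_{T0} = 1 and no maintenance cost, and return the shortfall
    frequency (4.13). Gaussian draws stand in for the stable innovations."""
    # hypothetical monthly log-returns: asset 0 bond-like, asset 1 equity-like
    X = rng.normal([0.001, 0.004], [0.005, 0.040], size=(M, H, 2))
    F_T = (a * np.exp(X.sum(axis=1))).sum(axis=1)
    return float((F_T < F_level).mean())

# Step 5: minimize g(a) over a coarse grid of two-asset weights
grid = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 21)]
g = [g_hat(a) for a in grid]
a_star = grid[int(np.argmin(g))]
```

With a low prescribed level F, the shortfall probability is minimized by a bond-heavy portfolio, mirroring the pattern reported in Table 2.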
4.1. An Illustration
In this section, we apply the above procedure to real financial data. We consider the same monthly log-return data as in Section 3.1. Domestic bond (DB), domestic equity (DE), foreign bond (FB), and foreign equity (FE) are each assumed to follow the ARCH(1) model

Xt = σtεt, σt² = b X²t−1,   (4.14)

where b > 0 and εt ~ i.i.d. S(α, β, 1, 0). Cash is virtually constant, so we take the log-return of cash to be 0 permanently. We set the present Reserve Fund FT0 = 1, and the target period is fixed at half a year.
Table 2: Estimated optimal portfolio weights (Section 4).

F     DB    DE    FB    FE    Cash
0.90  0.79  0.03  0.07  0.05  0.05
0.95  0.49  0.03  0.38  0.05  0.05
1.00  0.39  0.01  0.54  0.01  0.05
1.05  0.17  0.36  0.36  0.06  0.05
1.10  0.37  0.27  0.26  0.05  0.05

Table 2 shows the estimated optimal portfolio weights for different prescribed levels F. The weights of domestic and foreign bonds tend to be high when F is small; a small F implies that we want to avoid losses. On the contrary, the weights of equities become higher when F is large; a large F implies that we do not want to miss the chance of a big gain. This result seems natural because bonds are lower risk (less volatile) than equities.

5. Conclusions

In this paper, we study the estimation of optimal portfolios for a Reserve Fund with an end-of-period target in two different settings. In the first setting, assets are split into short memory (bonds) and long memory (equity), and the optimality of the portfolio is based on maximizing the Sharpe ratio. The simulation results show that the portfolio weight of domestic bonds is quite high. The reason is that the investment term is extremely long (100 years). Because the investment risk for the Reserve Fund is amplified exponentially year by year, the portfolio selection problem for the Reserve Fund is quite sensitive to the year-based portfolio risk. In the second setting, returns follow a conditional heteroskedasticity autoregressive nonlinear model, and we study the case when the distribution of the innovation vector is heavy-tailed stable. Simulation studies show that bonds are preferred when we want to avoid a big loss in the future. This result seems natural because bonds are less volatile than equities.
Acknowledgments
This work was supported by the Government Pension Investment Fund (GPIF). The authors thank everyone involved at GPIF, especially Dr. Takashi Yamashita.
References
[1] GPIF Home Page, http://www.gpif.go.jp/eng/index.html.
[2] A. Olivieri and E. Pitacco, “Solvency requirements for life annuities,” in Proceedings of the Actuarial Approach for Financial Risks (AFIR ’00) Colloquium, 2000.
[3] C.-F. J. Wu, “Jackknife, bootstrap and other resampling methods in regression analysis,” The Annals of Statistics, vol. 14, no. 4, pp. 1261–1350, 1986.
[4] S. Goncalves and L. Kilian, “Bootstrapping autoregressions with conditional heteroskedasticity of unknown form,” Journal of Econometrics, vol. 123, no. 1, pp. 89–120, 2004.
[5] P. Buhlmann, “Sieve bootstrap for time series,” Bernoulli, vol. 3, no. 2, pp. 123–148, 1997.
[6] J. Beran, Statistics for Long-Memory Processes, vol. 61 of Monographs on Statistics and Applied Probability, Chapman and Hall, New York, NY, USA, 1994.
[7] T. Shiohama, M. Hallin, D. Veredas, and M. Taniguchi, “Dynamic portfolio optimization using generalized conditional heteroskedasticity factor models,” ECARES WP 2010-30, 2010.
[8] J. P. Nolan, “Numerical calculation of stable densities and distribution functions,” Communications in Statistics—Stochastic Models, vol. 13, no. 4, pp. 759–774, 1997.
[9] H. Ogata, “Empirical likelihood estimation for a class of stable processes,” Mimeo, 2009.
[10] A. Feuerverger and P. McDunnough, “On the efficiency of empirical characteristic function procedures,” Journal of the Royal Statistical Society B, vol. 43, no. 1, pp. 20–27, 1981.
[11] J. Yu, “Empirical characteristic function estimation and its applications,” Econometric Reviews, vol. 23, no. 2, pp. 93–123, 2004.
[12] M. Carrasco and J. Florens, “Efficient GMM estimation using the empirical characteristic function,” IDEI Working Papers 140, Institut d’Economie Industrielle (IDEI), Toulouse, France, 2002.
[13] A. B. Owen, “Empirical likelihood ratio confidence intervals for a single functional,” Biometrika, vol. 75, no. 2, pp. 237–249, 1988.
[14] A. B. Owen, Empirical Likelihood, Chapman-Hall/CRC, Boca Raton, Fla, USA, 2001.
[15] J. Qin and J. Lawless, “Empirical likelihood and general estimating equations,” The Annals of Statistics, vol. 22, no. 1, pp. 300–325, 1994.
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 893497, 16 pages
doi:10.1155/2012/893497
Research Article
Least Squares Estimators for Unit Root Processes with Locally Stationary Disturbance
Junichi Hirukawa and Mako Sadakata
Faculty of Science, Niigata University, 8050 Ikarashi 2-no-cho, Nishi-ku, Niigata 950-2181, Japan
Correspondence should be addressed to Junichi Hirukawa, [email protected]
Received 4 November 2011; Accepted 26 December 2011
Academic Editor: Hiroshi Shiraishi
Copyright © 2012 J. Hirukawa and M. Sadakata. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The random walk is used as a model expressing equitableness and the effectiveness of various finance phenomena. The random walk is included in unit root processes, a class of nonstationary processes. Due to its nonstationarity, the least squares estimator (LSE) of the random walk does not satisfy asymptotic normality. However, it is well known that the sequence of partial sum processes of the random walk weakly converges to standard Brownian motion; this result is the so-called functional central limit theorem (FCLT). We can derive the limiting distribution of the LSE of a unit root process from the FCLT result. The FCLT result has been extended to unit root processes with locally stationary process (LSP) innovations. This model includes two different types of nonstationarity. Since the LSP innovation has a time-varying spectral structure, it is suitable for describing empirical financial time series data. Here we derive the limiting distributions of the LSE of unit root, near unit root, and general integrated processes with LSP innovations. The testing problem between unit root and near unit root is also discussed. Furthermore, we suggest two kinds of extensions of the LSE, which include various famous estimators as special cases.
1. Introduction
Since the random walk is a martingale sequence, the best predictor of the next term is the current value. In this sense, the random walk is used as a model expressing equitableness and the effectiveness of various finance phenomena in economics. Furthermore, because the random walk is a unit root process, taking the difference of the random walk recovers an independent sequence. However, information about the original sequence is lost by taking the difference when it does not include a unit root. Therefore, testing for the existence of a unit root in the original sequence becomes important.
In this section, we review the fundamental asymptotic results for unit root processes. Let {εj} be i.i.d. (0, σ2) random variables, where σ2 > 0, and define the partial sum
rj = rj−1 + εj (r0 = 0) = ∑_{i=1}^{j} εi,  (j = 1, . . . , T),   (1.1)
which is the so-called random walk process. The random walk corresponds to the first-order autoregressive (AR(1)) model with unit coefficient; therefore, the random walk is included in unit root (I(1)) processes, a class of nonstationary processes. Let C = C[0, 1] be the space of all real-valued continuous functions defined on [0, 1]. For the random walk process, we construct the sequence of partial sum processes {RT} in C as
RT(t) = (1/(σ√T)) rj + T( t − j/T )(1/(σ√T)) εj,  ( (j − 1)/T ≤ t ≤ j/T ).   (1.2)
It is well known that the partial sum process {RT} converges weakly to a standard Brownian motion on [0, 1], namely,

L(RT) −→ L(W) as T −→ ∞,   (1.3)

where L(·) denotes the distribution law of the corresponding random element. This result is the so-called functional central limit theorem (FCLT) (see Billingsley [1]).
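The construction (1.2) and the FCLT can be illustrated numerically. The sketch below (Python, with an arbitrary seed and T = 128 chosen so that the knots j/T are exact in floating point) evaluates RT at a knot and then samples RT(1), which is approximately standard normal in line with the weak convergence to Brownian motion.

```python
import numpy as np

def R_T(t, eps, sigma):
    """The partial sum process (1.2): piecewise-linear interpolation of the
    scaled random walk r_j/(sigma sqrt(T)) between the knots j/T."""
    T = len(eps)
    j = max(1, int(np.ceil(t * T)))      # block with (j-1)/T <= t <= j/T
    r_j = eps[:j].sum()
    return (r_j + T * (t - j / T) * eps[j - 1]) / (sigma * np.sqrt(T))

rng = np.random.default_rng(5)
T = 128                                  # power of two keeps the knots exact
eps = rng.standard_normal(T)
r = np.cumsum(eps)

# at the knots, R_T(j/T) equals r_j/(sigma sqrt(T))
knot = R_T(64 / T, eps, 1.0)

# a Monte Carlo glimpse of the limit: R_T(1) = r_T/(sigma sqrt(T)) ~ N(0, 1)
ends = np.array([R_T(1.0, rng.standard_normal(T), 1.0) for _ in range(4000)])
```

The sample of endpoint values `ends` has mean near 0 and variance near 1, matching W(1) ~ N(0, 1).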
The FCLT result can be extended to the unit root process where the innovation is a general linear process. We consider a sequence {RT} of stochastic processes in C defined by
RT(t) = (1/√T) rj + T( t − j/T )(1/√T) uj,  ( (j − 1)/T ≤ t ≤ j/T ),   (1.4)
where rj = ∑_{i=1}^{j} ui and {uj} is assumed to be generated by

uj = ∑_{l=0}^{∞} αl εj−l, α0 = 1.   (1.5)
Here, {εj} is a sequence of i.i.d. (0, σ2) random variables, and {αj} is a sequence of constants which satisfies ∑_{l=0}^{∞} l|αl| < ∞; therefore, {uj} becomes a stationary process. Using the Beveridge and Nelson [2] decomposition, it holds that (see, e.g., Tanaka [3])

L(RT) −→ L(αW), α = ∑_{l=0}^{∞} αl.   (1.6)
The asymptotic properties of the LSE for stationary autoregressive models are well established (see, e.g., Hannan [4]). On the other hand, due to its nonstationarity, the LSE of the random walk does not satisfy asymptotic normality. However, we can derive the limiting distribution of the LSE of a unit root process from the FCLT result. For a more detailed treatment of unit root processes with i.i.d. or stationary innovations, refer to, for example, Billingsley [1] and Tanaka [3].

In the above case, the {uj}'s are stationary and, hence, have constant variance, while covariances depend only on time differences. This is referred to as the homogeneous case, which is too restrictive to interpret empirical data, for example, empirical financial data. Recently, an important class of nonstationary processes has been proposed by Dahlhaus (see, e.g., Dahlhaus [5, 6]), called locally stationary processes. In this paper, we alternatively adopt a locally stationary innovation process, which has smoothly changing variance. Since the LSP innovation has a time-varying spectral structure, it is suitable for describing empirical financial time series data.
This paper is organized as follows. In the appendix, we review the extension of the FCLT to the cases where the innovations are locally stationary processes; namely, we explain the FCLT for unit root, near unit root, and general integrated processes with LSP innovations. In Section 2, we obtain the asymptotic distribution of the least squares estimator for each case of the appendix. In Section 3, we consider the testing problem for a unit root with LSP innovation. Finally, in Section 4, we discuss extensions of the LSE, which include various famous estimators as special cases.
2. The Property of Least Squares Estimator
In this section, we investigate the asymptotic properties of least squares estimators for unit root, near unit root, and I$(d)$ processes with locally stationary innovations. The testing problem for a unit root is also discussed. For notation not defined in this section, refer to the appendix.
2.1. Least Squares Estimator for Unit Root Process
Here, we consider the following statistic:
$$\hat{\rho} = \frac{\sum_{j=2}^{T} x_{j-1,T}\, x_{j,T}}{\sum_{j=2}^{T} x_{j-1,T}^2}, \tag{2.1}$$
obtained from model (A.3), which can be regarded as the least squares estimator (LSE) of the autoregressive coefficient in the first-order autoregressive (AR(1)) model $x_{j,T} = \rho\, x_{j-1,T} + u_{j,T}$. Define
$$
\begin{aligned}
U_{1,T} &= \frac{1}{T\sigma^2} \sum_{j=2}^{T} x_{j-1,T}\left(x_{j,T} - x_{j-1,T}\right) \\
&= \frac{1}{2}\, X_T(1)^2 - \frac{1}{2}\, X(0)^2 - \frac{1}{2T\sigma^2} \sum_{j=1}^{T} u_{j,T}^2 - \frac{X(0)\, u_{1,T}}{\sqrt{T}\,\sigma}, \\
V_{1,T} &= \frac{1}{T^2\sigma^2} \sum_{j=2}^{T} x_{j-1,T}^2 = \frac{1}{T} \sum_{j=1}^{T} X_T\!\left(\frac{j}{T}\right)^{2} - \frac{1}{T}\, X_T(1)^2;
\end{aligned} \tag{2.2}
$$
then we have
$$S_{1,T} \equiv T\left(\hat{\rho} - 1\right) = \frac{U_{1,T}}{V_{1,T}}. \tag{2.3}$$
Let us define a continuous function $H_1(x) = (H_{11}(x), H_{12}(x))$ for $x \in C$, where
$$H_{11}(x) = \frac{1}{2}\left\{x(1)^2 - x(0)^2 - \int_0^1 \sum_{l=0}^{\infty}\alpha_l(\nu)^2\, d\nu\right\}, \qquad H_{12}(x) = \int_0^1 x(\nu)^2\, d\nu. \tag{2.4}$$
It is easy to check that
$$U_{1,T} = H_{11}(X_T) + o_P(1), \qquad V_{1,T} = H_{12}(X_T) + o_P(1). \tag{2.5}$$
Therefore, the continuous mapping theorem (CMT) leads to $\mathcal{L}(U_{1,T}, V_{1,T}) \to \mathcal{L}(H_1(X))$ and
$$
\begin{aligned}
\mathcal{L}(S_{1,T}) = \mathcal{L}\left(T\left(\hat{\rho} - 1\right)\right)
&\longrightarrow \mathcal{L}\left(\frac{H_{11}(X)}{H_{12}(X)}\right)
= \mathcal{L}\left(\frac{(1/2)\left\{X(1)^2 - X(0)^2 - \int_0^1 \sum_{l=0}^{\infty}\alpha_l(\nu)^2\, d\nu\right\}}{\int_0^1 X(\nu)^2\, d\nu}\right) \\
&= \mathcal{L}\left(\frac{\int_0^1 X(\nu)\, dX(\nu) + (1/2)\int_0^1 \left[\left\{\sum_{l=0}^{\infty}\alpha_l(\nu)\right\}^2 - \sum_{l=0}^{\infty}\alpha_l(\nu)^2\right] d\nu}{\int_0^1 X(\nu)^2\, d\nu}\right).
\end{aligned} \tag{2.6}
$$
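In the homogeneous special case where $\alpha_l(\nu)$ does not depend on $\nu$ and the innovations are i.i.d., the correction term in (2.6) vanishes and the limit reduces to the classical Dickey-Fuller functional $\int_0^1 W\,dW \big/ \int_0^1 W(\nu)^2\,d\nu$. A small Monte Carlo sketch of $S_{1,T} = T(\hat{\rho} - 1)$ (with $x_{0,T} = 0$ for simplicity; sample sizes are illustrative) makes this visible: since $\int_0^1 W\,dW = (W(1)^2 - 1)/2$, roughly 68% of the draws should be negative.

```python
import numpy as np

rng = np.random.default_rng(1)
T, reps = 500, 4000

def S1():
    x = np.cumsum(rng.normal(size=T))     # random walk: unit root, i.i.d. innovations
    num = np.sum(x[:-1] * np.diff(x))     # sum_j x_{j-1} (x_j - x_{j-1})
    den = np.sum(x[:-1] ** 2)             # sum_j x_{j-1}^2
    return T * num / den                  # S_{1,T} = T (rho_hat - 1)

stats = np.array([S1() for _ in range(reps)])
print((stats < 0).mean())   # about P(W(1)^2 < 1) = 0.68
```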
2.2. Least Squares Estimator for Near Unit Root Process
We next consider the least squares estimator $\hat{\rho}_T$ for model (A.11) in the case that $\beta(t) \equiv \beta$ is a constant on $[0, 1]$, namely,
$$y_{j,T} = \rho_T\, y_{j-1,T} + u_{j,T}, \qquad j = 1, \ldots, T, \tag{2.7}$$
with $\rho_T = 1 - \beta/T$. Then, we have
$$\hat{\rho}_T = 1 - \frac{\hat{\beta}}{T} = \frac{\sum_{j=2}^{T} y_{j-1,T}\, y_{j,T}}{\sum_{j=2}^{T} y_{j-1,T}^2}, \qquad S_{2,T} \equiv T\left(\hat{\rho}_T - 1\right) = -\hat{\beta} = \frac{U_{2,T}}{V_{2,T}}, \tag{2.8}$$
where
$$
\begin{aligned}
U_{2,T} &= \frac{1}{T\sigma^2} \sum_{j=2}^{T} y_{j-1,T}\left(y_{j,T} - y_{j-1,T}\right) \\
&= \frac{1}{2}\, Y_T(1)^2 - \frac{1}{2}\, Y(0)^2 - \frac{1}{2T\sigma^2} \sum_{j=1}^{T} \left(u_{j,T} - \frac{\beta}{T}\, y_{j-1,T}\right)^{2} - \frac{1}{\sqrt{T}\,\sigma}\, Y(0)\left(u_{1,T} - \frac{\beta}{T}\, y_{0,T}\right), \\
V_{2,T} &= \frac{1}{T^2\sigma^2} \sum_{j=2}^{T} y_{j-1,T}^2 = \frac{1}{T} \sum_{j=1}^{T} Y_T\!\left(\frac{j}{T}\right)^{2} - \frac{1}{T}\, Y_T(1)^2.
\end{aligned} \tag{2.9}
$$
Let us define a continuous function $H_2(x) = (H_{21}(x), H_{22}(x))$ for $x \in C$, where
$$H_{21}(x) = \frac{1}{2}\left\{x(1)^2 - x(0)^2 - \int_0^1 \sum_{l=0}^{\infty}\alpha_l(\nu)^2\, d\nu\right\}, \qquad H_{22}(x) = \int_0^1 x(\nu)^2\, d\nu. \tag{2.10}$$
It is easy to check that
$$U_{2,T} = H_{21}(Y_T) + o_P(1), \qquad V_{2,T} = H_{22}(Y_T) + o_P(1). \tag{2.11}$$
Therefore, the CMT leads to $\mathcal{L}(U_{2,T}, V_{2,T}) \to \mathcal{L}(H_2(Y))$ and
$$
\begin{aligned}
\mathcal{L}(S_{2,T}) = \mathcal{L}\left(T\left(\hat{\rho}_T - 1\right)\right) = \mathcal{L}\left(-\hat{\beta}\right)
&\longrightarrow \mathcal{L}\left(\frac{H_{21}(Y)}{H_{22}(Y)}\right)
= \mathcal{L}\left(\frac{(1/2)\left\{Y(1)^2 - Y(0)^2 - \int_0^1 \sum_{l=0}^{\infty}\alpha_l(\nu)^2\, d\nu\right\}}{\int_0^1 Y(\nu)^2\, d\nu}\right) \\
&= \mathcal{L}\left(\frac{\int_0^1 Y(\nu)\, dY(\nu) + (1/2)\int_0^1 \left[\left\{\sum_{l=0}^{\infty}\alpha_l(\nu)\right\}^2 - \sum_{l=0}^{\infty}\alpha_l(\nu)^2\right] d\nu}{\int_0^1 Y(\nu)^2\, d\nu}\right).
\end{aligned} \tag{2.12}
$$
2.3. Least Squares Estimator for I(d) Process
Furthermore, we consider the least squares estimator
$$\hat{\rho}^{\{d\}} = \frac{\sum_{j=2}^{T} x^{\{d\}}_{j-1,T}\, x^{\{d\}}_{j,T}}{\sum_{j=2}^{T} \left(x^{\{d\}}_{j-1,T}\right)^{2}}, \qquad S_{3,T} \equiv T\left(\hat{\rho}^{\{d\}} - 1\right) = \frac{U_{3,T}}{V_{3,T}}, \tag{2.13}$$
obtained from the model $x^{\{d\}}_{j,T} = \rho\, x^{\{d\}}_{j-1,T} + x^{\{d-1\}}_{j,T}$, where
$$
\begin{aligned}
U_{3,T} &= \frac{1}{T^{2d-1}\sigma^2} \sum_{j=2}^{T} x^{\{d\}}_{j-1,T}\left(x^{\{d\}}_{j,T} - x^{\{d\}}_{j-1,T}\right) \\
&= \frac{1}{2}\, X^{\{d\}}_T(1)^2 - \frac{1}{2T^2} \sum_{j=1}^{T} \left\{X^{\{d-1\}}_T\!\left(\frac{j}{T}\right)\right\}^{2} - \frac{1}{T}\, X^{\{d\}}_T(0)\, X^{\{d-1\}}_T\!\left(\frac{1}{T}\right), \\
V_{3,T} &= \frac{1}{T^{2d}\sigma^2} \sum_{j=2}^{T} \left(x^{\{d\}}_{j-1,T}\right)^{2} = \frac{1}{T} \sum_{j=1}^{T} \left\{X^{\{d\}}_T\!\left(\frac{j}{T}\right)\right\}^{2} - \frac{1}{T}\left\{X^{\{d\}}_T(1)\right\}^{2}.
\end{aligned} \tag{2.14}
$$
Let us define a continuous function $H_3(x) = (H_{31}(x), H_{32}(x))$ for $x \in C$, where
$$H_{31}(x) = \frac{1}{2}\, x(1)^2, \qquad H_{32}(x) = \int_0^1 x(\nu)^2\, d\nu. \tag{2.15}$$
It is easy to check that
$$U_{3,T} = H_{31}\left(X^{\{d\}}_T\right) + o_P(1), \qquad V_{3,T} = H_{32}\left(X^{\{d\}}_T\right) + o_P(1). \tag{2.16}$$
Therefore, the CMT leads to $\mathcal{L}(U_{3,T}, V_{3,T}) \to \mathcal{L}(H_3(X^{\{d-1\}}))$ and
$$
\begin{aligned}
\mathcal{L}(S_{3,T}) = \mathcal{L}\left(T\left(\hat{\rho}^{\{d\}} - 1\right)\right)
&\longrightarrow \mathcal{L}\left(\frac{H_{31}\left(X^{\{d-1\}}\right)}{H_{32}\left(X^{\{d-1\}}\right)}\right)
= \mathcal{L}\left(\frac{(1/2)\left\{X^{\{d-1\}}(1)\right\}^{2}}{\int_0^1 \left\{X^{\{d-1\}}(\nu)\right\}^{2} d\nu}\right) \\
&= \mathcal{L}\left(\frac{\int_0^1 X^{\{d-1\}}(\nu)\, dX^{\{d-1\}}(\nu)}{\int_0^1 \left\{X^{\{d-1\}}(\nu)\right\}^{2} d\nu}\right).
\end{aligned} \tag{2.17}
$$
The last equality is due to the $(d-1)$-times differentiability of $X^{\{d-1\}}$.
3. Testing for Unit Root
In the analysis of empirical financial data, the existence of a unit root is an important problem. However, as we saw in the previous section, the asymptotic results for unit root and near unit root processes are quite different (a drift term appears in the limiting process of the near unit root case). Therefore, we consider the following testing problem against the local alternative hypothesis:
$$H_0 : \rho = 1 \qquad \text{versus} \qquad H_1 : \rho = 1 - \frac{\beta}{T}. \tag{3.1}$$
We assume that $\sigma^2 = 1$ to identify the models. Let the statistic $S_{1,T}$ be constructed as in (2.3). Recall that, as $T \to \infty$, under $H_0$,
$$
\begin{aligned}
\mathcal{L}(S_{1,T}) &\longrightarrow \mathcal{L}\left(\frac{\int_0^1 X(\nu)\, dX(\nu) + (1/2)\int_0^1 \left[\left\{\sum_{l=0}^{\infty}\alpha_l(\nu)\right\}^2 - \sum_{l=0}^{\infty}\alpha_l(\nu)^2\right] d\nu}{\int_0^1 X(\nu)^2\, d\nu}\right) \\
&= \mathcal{L}\left(\frac{U}{V} + \frac{\int_0^1 \left[\left\{\sum_{l=0}^{\infty}\alpha_l(\nu)\right\}^2 - \sum_{l=0}^{\infty}\alpha_l(\nu)^2\right] d\nu}{2\int_0^1 X(\nu)^2\, d\nu}\right),
\end{aligned} \tag{3.2}
$$
where
$$U = \int_0^1 X(\nu)\, dX(\nu), \qquad V = \int_0^1 X(\nu)^2\, d\nu. \tag{3.3}$$
Since $\{\sum_{l=0}^{\infty}\alpha_l(\nu)\}^2$ and $\sum_{l=0}^{\infty}\alpha_l(\nu)^2$ are unknown, we construct a test statistic
$$Z_\rho = T\left(\hat{\rho} - 1\right) + \frac{(1/T)\sum_{j=1}^{T} u_{j,T}^2 - (1/T)\sum_{t=1}^{T} \hat{f}(t/T, 0)}{2\,(1/T^2)\sum_{j=2}^{T} x_{j-1,T}^2}, \tag{3.4}$$
where $u_{j,T} = x_{j,T} - x_{j-1,T}$. A nonparametric time-varying spectral density estimator $\hat{f}(u, \lambda)$ is given by
$$
\begin{aligned}
\hat{f}(u, \lambda_l) &= M \int K\left(M\left(\lambda_l - \mu\right)\right) I_N\left(u, \mu\right) d\mu \\
&\approx \frac{2\pi M}{T} \sum_{k = -T/4\pi M + l}^{T/4\pi M + l} K\left(M\left(\lambda_l - \mu_k\right)\right) I_N\left(u, \mu_k\right),
\end{aligned} \tag{3.5}
$$
where $\lambda_l = (2\pi/T)l - \pi$, $l = 1, \ldots, T-1$, and $\mu_k = (2\pi/T)k - \pi$, $k = 1, \ldots, T-1$. Here, $I_N(u, \lambda)$ is the local periodogram around time $u$, given by
$$I_N(u, \lambda) = \frac{1}{2\pi N} \left| \sum_{s=1}^{N} h\!\left(\frac{s}{N}\right) u_{[uT]-N/2+s,\,T}\; e^{-i\lambda s} \right|^{2}, \tag{3.6}$$
where $[\cdot]$ denotes the Gauss symbol; that is, for a real number $a$, $[a]$ is the greatest integer less than or equal to $a$. Furthermore, we employ the following kernel functions and orders of bandwidth for smoothing in the time and frequency domains, respectively:
$$K(x) = 6\left(\frac{1}{4} - x^2\right),\ x \in \left[-\frac{1}{2}, \frac{1}{2}\right], \qquad h(x) = \{6x(1-x)\}^{1/2},\ x \in [0, 1], \qquad M = T^{1/6}, \quad N = T^{5/6}, \tag{3.7}$$
which are optimal in the sense that they minimize the mean squared error of the nonparametric estimator (see Dahlhaus [6]); here we simply set the bandwidth constants equal to one. Then, it can be established that, under $H_0$,
$$\mathcal{L}\left(Z_\rho\right) \longrightarrow \mathcal{L}\left(\frac{U}{V}\right). \tag{3.8}$$
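A minimal implementation of the tapered local periodogram (3.6) can be sketched as follows (Python; the sample size, midpoint $u = 0.5$, and frequency grid are illustrative). For Gaussian white noise the time-varying spectral density is flat, so the estimator should fluctuate around $\sigma^2/2\pi \approx 0.159$:

```python
import numpy as np

def local_periodogram(x, T, u, lam, N):
    # I_N(u, lam) = (2 pi N)^{-1} | sum_{s=1}^N h(s/N) x_{[uT]-N/2+s} e^{-i lam s} |^2
    s = np.arange(1, N + 1)
    h = np.sqrt(6 * (s / N) * (1 - s / N))   # data taper h(x) = {6x(1-x)}^{1/2} from (3.7)
    seg = x[int(u * T) - N // 2 + s - 1]     # 0-based indexing of the observations
    return np.abs(np.sum(h * seg * np.exp(-1j * lam * s))) ** 2 / (2 * np.pi * N)

rng = np.random.default_rng(2)
T = 1024
N = int(T ** (5 / 6))                        # N = T^{5/6} as in (3.7)
x = rng.normal(size=T)                       # white noise: flat spectrum
vals = [local_periodogram(x, T, 0.5, lam, N) for lam in np.linspace(0.1, 3.0, 40)]
print(np.mean(vals))                         # roughly sigma^2 / (2 pi) for white noise
```

Smoothing these raw periodogram values over $\mu_k$ with the kernel $K$ as in (3.5) would give $\hat{f}(u, \lambda_l)$.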
We now have to deal with statistics for which the numerical integration must be elaborated. Let $R$ be such a statistic, which takes the form $R = U/V$. Imhof's [7] formula gives the distribution function of $R$,
$$F_R(x) = P(R \le x) = P(xV - U \ge 0) = \frac{1}{2} + \frac{1}{\pi} \int_0^{\infty} \frac{1}{s}\, \mathrm{Im}\left\{\phi(s; x)\right\} ds, \tag{3.9}$$
where $\phi(s; x)$ is the characteristic function of $xV - U$, namely,
$$\phi(-is; x) = E\left[\exp\{s(xV - U)\}\right] = E\left[\exp\left\{s\left(x\int_0^1 X(\nu)^2\, d\nu - \int_0^1 X(\nu)\, dX(\nu)\right)\right\}\right]. \tag{3.10}$$
However, we do not yet have an explicit form of the distribution function of the estimator. Therefore, we cannot perform numerical experiments except in clear, simple cases. The problem involves a complicated differential equation and requires a further paper for its solution.
4. Extensions of LSE
In this section, we consider extensions of the LSE $\hat{\rho}_T$ for the near random walk model $y_{j,T} = \rho_T\, y_{j-1,T} + u_{j,T}$, $\rho_T = 1 - \beta/T$.
4.1. Ochi Estimator
Ochi [8] proposed the following class of estimators of the autoregressive coefficient, which extends the LSE:
$$
\begin{gathered}
\hat{\rho}^{(\theta_1,\theta_2)}_T = 1 - \frac{\hat{\beta}^{(\theta_1,\theta_2)}}{T} = \frac{\sum_{j=2}^{T} y_{j-1,T}\, y_{j,T}}{\sum_{j=2}^{T-1} y_{j,T}^2 + \theta_1\, y_{1,T}^2 + \theta_2\, y_{T,T}^2}, \qquad \theta_1, \theta_2 \ge 0, \\
S_{4,T} = T\left(\hat{\rho}^{(\theta_1,\theta_2)}_T - 1\right) = -\hat{\beta}^{(\theta_1,\theta_2)} = \frac{U_{4,T}}{V_{4,T}},
\end{gathered} \tag{4.1}
$$
where
$$
\begin{aligned}
U_{4,T} &= \frac{1}{T\sigma^2} \left\{ \sum_{j=2}^{T} y_{j-1,T}\, y_{j,T} - \sum_{j=2}^{T-1} y_{j,T}^2 - \theta_1\, y_{1,T}^2 - \theta_2\, y_{T,T}^2 \right\} \\
&= \left\{ \frac{1}{2}(1 - 2\theta_1) + \frac{\beta}{T}(2\theta_1 - 1) + \frac{\beta^2}{T^2}(1 - \theta_1) \right\} Y(0)^2 \\
&\quad + \frac{1}{2}(1 - 2\theta_2)\, Y_T(1)^2 - \frac{1}{2}\,\frac{1}{T\sigma^2} \sum_{j=1}^{T} \left(u_{j,T} - \frac{\beta}{T}\, y_{j-1,T}\right)^{2} \\
&\quad + \frac{1}{\sqrt{T}\,\sigma}\left\{1 - 2\theta_1 + \frac{2\beta}{T}(\theta_1 - 1)\right\} u_{1,T}\, Y(0) + \frac{1}{T\sigma^2}\,(1 - \theta_1)\, u_{1,T}^2, \\
V_{4,T} &= \frac{1}{T^2\sigma^2} \left\{ \sum_{j=2}^{T-1} y_{j,T}^2 + \theta_1\, y_{1,T}^2 + \theta_2\, y_{T,T}^2 \right\} \\
&= \frac{1}{T} \sum_{j=1}^{T} Y_T\!\left(\frac{j}{T}\right)^{2} + (\theta_1 - 1)\,\frac{1}{T}\, Y_T\!\left(\frac{1}{T}\right)^{2} + (\theta_2 - 1)\,\frac{1}{T}\, Y_T(1)^2.
\end{aligned} \tag{4.2}
$$
This class of estimators includes the LSE $\hat{\rho}^{(1,0)}_T$, Daniels's estimator $\hat{\rho}^{(1/2,1/2)}_T$, and the Yule-Walker estimator $\hat{\rho}^{(1,1)}_T$ as special cases.
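These special cases can be verified directly. The sketch below (Python; the path length and seed are illustrative) evaluates $\hat{\rho}^{(\theta_1,\theta_2)}_T$ on a simulated random walk path and checks that $(\theta_1, \theta_2) = (1, 0)$ reproduces the LSE while $(1, 1)$ reproduces the Yule-Walker estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=200))   # a sample path y_1, ..., y_T

def ochi(y, th1, th2):
    # denominator of (4.1): sum_{j=2}^{T-1} y_j^2 + th1 y_1^2 + th2 y_T^2
    num = np.sum(y[:-1] * y[1:])      # sum_{j=2}^T y_{j-1} y_j
    den = np.sum(y[1:-1] ** 2) + th1 * y[0] ** 2 + th2 * y[-1] ** 2
    return num / den

lse = np.sum(y[:-1] * y[1:]) / np.sum(y[:-1] ** 2)   # least squares estimator
yw = np.sum(y[:-1] * y[1:]) / np.sum(y ** 2)         # Yule-Walker estimator
print(np.isclose(ochi(y, 1, 0), lse), np.isclose(ochi(y, 1, 1), yw))   # True True
```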
Define, for $x \in C$, $H_4(x) = (H_{41}(x), H_{42}(x))$,
$$
\begin{gathered}
H_{41}(x) = \frac{1}{2}\left\{(1 - 2\theta_1)\, x(0)^2 + (1 - 2\theta_2)\, x(1)^2 - \int_0^1 \sum_{l=0}^{\infty}\alpha_l(\nu)^2\, d\nu\right\}, \\
H_{42}(x) = \int_0^1 x(\nu)^2\, d\nu;
\end{gathered} \tag{4.3}
$$
then we see that $H_4(x)$ is continuous and
$$U_{4,T} = H_{41}(Y_T) + o_P(1), \qquad V_{4,T} = H_{42}(Y_T) + o_P(1). \tag{4.4}$$
From the CMT, we obtain $\mathcal{L}(U_{4,T}, V_{4,T}) \to \mathcal{L}(H_4(Y))$, and therefore,
$$\mathcal{L}(S_{4,T}) = \mathcal{L}\left(T\left(\hat{\rho}^{(\theta_1,\theta_2)}_T - 1\right)\right) = \mathcal{L}\left(-\hat{\beta}^{(\theta_1,\theta_2)}\right) \longrightarrow \mathcal{L}\left(\frac{H_{41}(Y)}{H_{42}(Y)}\right), \tag{4.5}$$
where
$$
\begin{aligned}
H_{41}(Y) &= \frac{1}{2}\left\{(1 - 2\theta_1)\, Y(0)^2 + (1 - 2\theta_2)\, Y(1)^2 - \int_0^1 \sum_{l=0}^{\infty}\alpha_l(\nu)^2\, d\nu\right\} \\
&= (1 - 2\theta_2)\int_0^1 Y(\nu)\, dY(\nu) + (1 - \theta_1 - \theta_2)\, Y(0)^2 \\
&\quad + \frac{1}{2}\int_0^1 \left[(1 - 2\theta_2)\left\{\sum_{l=0}^{\infty}\alpha_l(\nu)\right\}^2 - \sum_{l=0}^{\infty}\alpha_l(\nu)^2\right] d\nu, \\
H_{42}(Y) &= \int_0^1 Y(\nu)^2\, d\nu.
\end{aligned} \tag{4.6}
$$
4.2. Another Extension of LSE
Next, we suggest another class of estimators which also extends the LSE. Define, for $\theta(u) \in C$ with continuous derivative $\theta'(u) = (\partial/\partial u)\theta(u)$,
$$\hat{\rho}^{\theta}_T = 1 - \frac{\hat{\beta}^{\theta}}{T} = \frac{\sum_{j=2}^{T} \theta\!\left(\frac{j-1}{T}\right) y_{j-1,T}\, y_{j,T}}{\sum_{j=2}^{T} \theta\!\left(\frac{j-1}{T}\right) y_{j-1,T}^2}, \qquad S_{5,T} = T\left(\hat{\rho}^{\theta}_T - 1\right) = -\hat{\beta}^{\theta} = \frac{U_{5,T}}{V_{5,T}}, \tag{4.7}$$
where
$$
\begin{aligned}
U_{5,T} &= \frac{1}{T\sigma^2} \sum_{j=2}^{T} \theta\!\left(\frac{j-1}{T}\right) y_{j-1,T}\left(y_{j,T} - y_{j-1,T}\right) \\
&= -\frac{1}{2} \sum_{j=1}^{T} \left\{\theta\!\left(\frac{j}{T}\right) - \theta\!\left(\frac{j-1}{T}\right)\right\} Y_T\!\left(\frac{j}{T}\right)^{2} + \frac{1}{2}\,\theta(1)\, Y_T(1)^2 - \frac{1}{2}\,\theta(0)\, Y(0)^2 \\
&\quad - \frac{1}{2}\,\frac{1}{T\sigma^2} \sum_{j=1}^{T} \theta\!\left(\frac{j}{T}\right) \left(u_{j,T} - \frac{\beta}{T}\, y_{j-1,T}\right)^{2} + \frac{1}{2T\sigma^2}\,\theta\!\left(\frac{1}{T}\right) \left(u_{1,T} - \frac{\beta}{T}\, y_{0,T}\right)^{2} \\
&\quad + \frac{1}{2T\sigma^2}\,\theta(0) \left\{u_{1,T}\left(u_{1,T} + 2 y_{0,T}\right) - \frac{2\beta}{T}\, y_{0,T}\left(y_{0,T} + u_{1,T}\right) + \frac{\beta^2}{T^2}\, y_{0,T}^2\right\}, \\
V_{5,T} &= \frac{1}{T^2\sigma^2} \sum_{j=2}^{T} \theta\!\left(\frac{j-1}{T}\right) y_{j-1,T}^2 = \frac{1}{T} \sum_{j=1}^{T} \theta\!\left(\frac{j}{T}\right) Y_T\!\left(\frac{j}{T}\right)^{2} - \frac{1}{T}\,\theta(1)\, Y_T(1)^2.
\end{aligned} \tag{4.8}
$$
If we take $\theta(u)$ to be a taper function, this estimator corresponds to a local LSE.
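Computing $\hat{\rho}^{\theta}_T$ for a given weight $\theta(u)$ is immediate; the sketch below (Python; the path, the flat weight, and the taper-type weight $6u(1-u)$ are illustrative) also checks that $\theta \equiv 1$ recovers the ordinary LSE:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 500
y = np.cumsum(rng.normal(size=T))     # unit root path y_1, ..., y_T

def weighted_lse(y, theta):
    # rho_hat^theta = sum_{j=2}^T theta((j-1)/T) y_{j-1} y_j / sum_{j=2}^T theta((j-1)/T) y_{j-1}^2
    T = len(y)
    w = theta((np.arange(2, T + 1) - 1) / T)
    return np.sum(w * y[:-1] * y[1:]) / np.sum(w * y[:-1] ** 2)

rho_flat = weighted_lse(y, lambda u: np.ones_like(u))    # theta(u) = 1 recovers the ordinary LSE
rho_taper = weighted_lse(y, lambda u: 6 * u * (1 - u))   # a taper-type weight (illustrative)
print(rho_flat, rho_taper)
```

Both estimates stay near one on a unit root path; the taper downweights the path near the endpoints.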
Define, for $x \in C$, $H_5(x) = (H_{51}(x), H_{52}(x))$,
$$
\begin{aligned}
H_{51}(x) &= -\frac{1}{2}\left\{\int_0^1 \theta'(\nu)\, x(\nu)^2\, d\nu - \theta(1)\, x(1)^2 + \theta(0)\, x(0)^2\right\} - \frac{1}{2}\int_0^1 \theta(\nu) \sum_{l=0}^{\infty}\alpha_l(\nu)^2\, d\nu, \\
H_{52}(x) &= \int_0^1 \theta(\nu)\, x(\nu)^2\, d\nu,
\end{aligned} \tag{4.9}
$$
where $\theta'(u) = (\partial/\partial u)\theta(u)$; then we see that $H_5(x)$ is continuous and
$$U_{5,T} = H_{51}(Y_T) + o_P(1), \qquad V_{5,T} = H_{52}(Y_T) + o_P(1). \tag{4.10}$$
From the CMT, we obtain $\mathcal{L}(U_{5,T}, V_{5,T}) \to \mathcal{L}(H_5(Y))$, and therefore,
$$\mathcal{L}(S_{5,T}) = \mathcal{L}\left(T\left(\hat{\rho}^{\theta}_T - 1\right)\right) = \mathcal{L}\left(-\hat{\beta}^{\theta}\right) \longrightarrow \mathcal{L}\left(\frac{H_{51}(Y)}{H_{52}(Y)}\right) \equiv \mathcal{L}\left(Y_\theta\right), \tag{4.11}$$
where
$$
\begin{aligned}
H_{51}(Y) &= -\frac{1}{2}\left\{\int_0^1 \theta'(\nu)\, Y(\nu)^2\, d\nu - \theta(1)\, Y(1)^2 + \theta(0)\, Y(0)^2\right\} - \frac{1}{2}\int_0^1 \theta(\nu) \sum_{l=0}^{\infty}\alpha_l(\nu)^2\, d\nu, \\
H_{52}(Y) &= \int_0^1 \theta(\nu)\, Y(\nu)^2\, d\nu.
\end{aligned} \tag{4.12}
$$
Integration by parts leads to
$$Y_\theta = \frac{(1/2)\left\{\int_0^1 \theta(\nu)\, dY^{(1)}(\nu) - \int_0^1 \theta(\nu) \sum_{l=0}^{\infty}\alpha_l(\nu)^2\, d\nu\right\}}{\int_0^1 \theta(\nu)\, Y(\nu)^2\, d\nu}, \tag{4.13}$$
with $Y^{(1)}(t) = Y(t)^2$. Hence, using Ito's formula,
$$dY^{(1)}(t) = d\left\{Y(t)^2\right\} = 2Y(t)\, dY(t) + \left\{\sum_{l=0}^{\infty}\alpha_l(t)\right\}^{2} dt, \tag{4.14}$$
we have
$$Y_\theta = \frac{\int_0^1 \theta(\nu)\, Y(\nu)\, dY(\nu) + (1/2)\int_0^1 \theta(\nu)\left[\left\{\sum_{l=0}^{\infty}\alpha_l(\nu)\right\}^2 - \sum_{l=0}^{\infty}\alpha_l(\nu)^2\right] d\nu}{\int_0^1 \theta(\nu)\, Y(\nu)^2\, d\nu}. \tag{4.15}$$
Appendices
In this appendix, we review the extensions of the functional central limit theorem to the cases where the innovations are locally stationary processes, which are used for the main results of this paper.
A. FCLT for Locally Stationary Processes
Hirukawa and Sadakata [9] extended the FCLT to unit root processes with locally stationary innovations; namely, they derived the FCLT for unit root, near unit root, and general integrated processes with LSP innovations. In this section, we briefly review these results, which are applied in the previous sections.
A.1. Unit Root Process with Locally Stationary Disturbance
First, we introduce the locally stationary innovation process. Let $\{u_{j,T}\}$ be generated by the following time-varying MA$(\infty)$ model:
$$u_{j,T} = \sum_{l=0}^{\infty} \alpha_l\!\left(\frac{j}{T}\right) \varepsilon_{j-l} := \sum_{l=0}^{\infty} \alpha_l\!\left(\frac{j}{T}\right) L^l \varepsilon_j = \alpha\!\left(\frac{j}{T}, L\right) \varepsilon_j, \tag{A.1}$$
where $L$ is the lag operator which acts as $L\varepsilon_j = \varepsilon_{j-1}$ and $\alpha(u, L) = \sum_{l=0}^{\infty} \alpha_l(u) L^l$, and the time-varying MA coefficients satisfy
$$\sum_{l=0}^{\infty} l \sup_{0 \le u \le 1} |\alpha_l(u)| < \infty, \qquad \sum_{l=0}^{\infty} l \sup_{0 \le u \le 1} \left|\frac{\partial}{\partial u}\alpha_l(u)\right| < \infty. \tag{A.2}$$
Then, these $\{u_{j,T}\}$'s become locally stationary processes (see Dahlhaus [5], Hirukawa and Taniguchi [10]). Using this innovation process, define the partial sum $\{x_{j,T}\}$ as
$$x_{j,T} = x_{j-1,T} + u_{j,T} = x_{0,T} + \sum_{i=1}^{j} u_{i,T}, \tag{A.3}$$
where $x_{0,T} = \sigma\sqrt{T}\, X(0)$, $X(0) \sim N(\gamma_X, \delta_X^2)$, and $X(0)$ is independent of $\{\varepsilon_j\}$. We consider a sequence $\{X_T\}$ of partial sum stochastic processes in $C$ defined by
$$X_T(t) = \frac{1}{\sigma\sqrt{T}}\, x_{j,T} + T\left(t - \frac{j}{T}\right)\frac{1}{\sigma\sqrt{T}}\, u_{j,T}, \qquad \frac{j-1}{T} \le t \le \frac{j}{T}. \tag{A.4}$$
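The time-varying MA model (A.1) is easy to simulate. The sketch below uses a hypothetical single coefficient $\alpha_1(u) = 0.9u$ (so the local variance $\sigma^2\{1 + \alpha_1(u)^2\}$ grows smoothly in rescaled time) and checks the smoothly changing variance across replications:

```python
import numpy as np

rng = np.random.default_rng(4)
T, reps = 1000, 4000
a1 = lambda u: 0.9 * u                     # smoothly varying MA(1) coefficient (illustrative)

# u_{j,T} = eps_j + a1(j/T) eps_{j-1}: a time-varying MA(1), special case of (A.1)
eps = rng.normal(size=(reps, T + 1))
j = np.arange(1, T + 1)
u = eps[:, 1:] + a1(j / T) * eps[:, :-1]

early = u[:, :100].var()                   # local variance near u = 0, about 1
late = u[:, -100:].var()                   # local variance near u = 1, about 1 + 0.9^2
print(early, late)
```

The variance drifts smoothly from about $1$ at the start of the sample to about $1.8$ at the end, which a stationary MA model cannot reproduce.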
Now, we define, on $\mathbb{R} \times C$,
$$
\begin{gathered}
h^{(1)}_t\left(x, y\right) = x + \alpha(t, 1)\, y(t) - \int_0^t \alpha'(\nu, 1)\, y(\nu)\, d\nu, \\
\alpha(t, 1) = \sum_{l=0}^{\infty} \alpha_l(t), \qquad \alpha'(t, 1) = \frac{\partial}{\partial t}\alpha(t, 1) = \sum_{l=0}^{\infty} \frac{\partial}{\partial t}\alpha_l(t).
\end{gathered} \tag{A.5}
$$
Then, we can obtain
$$\mathcal{L}(X_T) \longrightarrow \mathcal{L}\left\{h^{(1)}(X(0), W)\right\} \equiv \mathcal{L}(X). \tag{A.6}$$
Integration by parts leads to
$$
\begin{gathered}
X(t) = X(0) + \alpha(t, 1)\, W(t) - \int_0^t \alpha'(\nu, 1)\, W(\nu)\, d\nu = X(0) + \int_0^t \alpha(\nu, 1)\, dW(\nu), \\
dX(t) = \alpha(t, 1)\, dW(t).
\end{gathered} \tag{A.7}
$$
Note that the time-varying MA$(\infty)$ process $u_{j,T}$ in (A.1) has the spectral representation
$$u_{j,T} = \int_{-\pi}^{\pi} A\!\left(\frac{j}{T}, \lambda\right) e^{ij\lambda}\, d\xi(\lambda), \tag{A.8}$$
where $\xi(\lambda)$ is the spectral measure of the i.i.d. process $\{\varepsilon_j\}$, which satisfies $\varepsilon_j = \int_{-\pi}^{\pi} e^{ij\lambda}\, d\xi(\lambda)$, and the transfer function $A(t, \lambda)$ is given by
$$A(t, \lambda) = \sum_{l=0}^{\infty} \alpha_l(t)\, e^{-il\lambda}, \qquad A(t, 0) = \sum_{l=0}^{\infty} \alpha_l(t) = \alpha(t, 1). \tag{A.9}$$
Therefore, the stochastic differential in (A.7) can be written as
$$dX(t) = A(t, 0)\, dW(t). \tag{A.10}$$
A.2. Near Unit Root Process with Locally Stationary Disturbance
In this section, we consider the following near unit root process $\{y_{j,T}\}$ with locally stationary disturbance:
$$y_{j,T} = \rho_{j,T}\, y_{j-1,T} + u_{j,T} = \prod_{i=1}^{j} \rho_{i,T}\; y_{0,T} + \sum_{i=1}^{j} \left(\prod_{k=i+1}^{j} \rho_{k,T}\right) u_{i,T}, \qquad j = 1, \ldots, T, \tag{A.11}$$
where $\{u_{j,T}\}$ is generated from the time-varying MA$(\infty)$ model in (A.1), $\rho_{j,T} = 1 - (1/T)\beta(j/T)$, $\beta(t) \in C[0, 1]$, $y_{0,T} = \sqrt{T}\sigma\, Y(0)$, and $Y(0) \sim N(\gamma_Y, \delta_Y^2)$ is independent of $\{\varepsilon_j\}$ and $X(0)$. Then, we define a sequence $\{Y_T\}$ of partial sum processes in $C$ as
$$Y_T(t) = \frac{1}{\sigma\sqrt{T}}\, y_{j,T} + T\left(t - \frac{j}{T}\right)\frac{y_{j,T} - y_{j-1,T}}{\sigma\sqrt{T}}, \qquad \frac{j-1}{T} \le t \le \frac{j}{T}. \tag{A.12}$$
Define, on $\mathbb{R}^2 \times C$,
$$h^{(2)}_t\left(x, y, z\right) = e^{-\int_0^t \beta(\nu)\, d\nu}\left(y - x\right) - \int_0^t \beta(\nu)\, e^{-\int_\nu^t \beta(s)\, ds}\, z(\nu)\, d\nu + z(t). \tag{A.13}$$
Then, we can obtain
$$\mathcal{L}(Y_T) \longrightarrow \mathcal{L}\left\{h^{(2)}(X(0), Y(0), X)\right\} \equiv \mathcal{L}(Y). \tag{A.14}$$
Integration by parts and Ito's formula lead to
$$
\begin{aligned}
Y(t) &= e^{-\int_0^t \beta(s)\, ds}\left(Y(0) - X(0) - \int_0^t \beta(\nu)\, e^{\int_0^\nu \beta(s)\, ds}\, X(\nu)\, d\nu\right) + X(t) \\
&= e^{-\int_0^t \beta(s)\, ds}\left(Y(0) + \int_0^t e^{\int_0^\nu \beta(\mu)\, d\mu}\, dX(\nu)\right) \\
&= e^{-\int_0^t \beta(s)\, ds}\left(Y(0) + \int_0^t e^{\int_0^\nu \beta(\mu)\, d\mu}\, \alpha(\nu, 1)\, dW(\nu)\right), \\
dY(t) &= -\beta(t)\, Y(t)\, dt + \alpha(t, 1)\, dW(t) = -\beta(t)\, Y(t)\, dt + A(t, 0)\, dW(t) = -\beta(t)\, Y(t)\, dt + dX(t).
\end{aligned} \tag{A.15}
$$
A.3. I(d) Process with Locally Stationary Disturbance
Let the I$(d)$ process $\{x^{\{d\}}_{j,T}\}$ be generated by
$$(1 - L)^d\, x^{\{d\}}_{j,T} = u_{j,T}, \qquad j = 1, \ldots, T, \tag{A.16}$$
with $x^{\{d\}}_{-d+1,T} = \cdots = x^{\{d\}}_{0,T} = 0$ and $\{u_{j,T}\}$ being the time-varying MA$(\infty)$ process in (A.1). Note that the relation (A.16) can be rewritten as
$$(1 - L)\, x^{\{d\}}_{j,T} = x^{\{d-1\}}_{j,T}. \tag{A.17}$$
Then, we construct the partial sum process $\{X^{\{d\}}_T\}$ as
$$X^{\{d\}}_T(t) = \frac{1}{T^{d-1}}\left\{\frac{1}{\sigma\sqrt{T}}\, x^{\{d\}}_{j,T} + T\left(t - \frac{j}{T}\right)\frac{1}{\sigma\sqrt{T}}\, x^{\{d-1\}}_{j,T}\right\}, \tag{A.18}$$
for $(j-1)/T \le t \le j/T$, $d \ge 2$, and $X^{\{1\}}_T(t) \equiv X_T(t)$, where the partial sum process $\{X_T\}$ is defined in (A.4). Let us first discuss weak convergence to the onefold integrated process $\{X^{\{1\}}\}$ defined by
$$X^{\{1\}}(t) = \int_0^t X(\nu)\, d\nu = \int_0^t \left\{X(0) + \int_0^\nu \alpha\left(\mu, 1\right) dW\left(\mu\right)\right\} d\nu. \tag{A.19}$$
For $d = 2$, the partial sum process in (A.18) becomes
$$X^{\{2\}}_T(t) = \frac{1}{T}\left\{\sum_{i=1}^{j} X_T\!\left(\frac{i}{T}\right) + T\left(t - \frac{j}{T}\right) X_T\!\left(\frac{j}{T}\right)\right\}, \qquad \frac{j-1}{T} \le t \le \frac{j}{T}. \tag{A.20}$$
Define, on $C$,
$$h^{(3)}_t(x) = \int_0^t x(\nu)\, d\nu. \tag{A.21}$$
Then, we can see that
$$\mathcal{L}\left(X^{\{2\}}_T\right) \longrightarrow \mathcal{L}\left\{h^{(3)}(X)\right\} = \mathcal{L}\left\{X^{\{1\}}\right\}. \tag{A.22}$$
For a general integer $d$, define the $d$-fold integrated process $\{X^{\{d\}}\}$ by
$$X^{\{d\}}(t) = \int_0^t X^{\{d-1\}}(\nu)\, d\nu, \qquad X^{\{0\}}(t) = X(t). \tag{A.23}$$
By an argument similar to the case $d = 2$, we can see that the partial sum process $\{X^{\{d\}}_T\}$ satisfies
$$\mathcal{L}\left(X^{\{d\}}_T\right) \longrightarrow \mathcal{L}\left\{h^{(3)}\left(X^{\{d-1\}}\right)\right\} = \mathcal{L}\left\{X^{\{d-1\}}\right\}. \tag{A.24}$$
Acknowledgments
The authors would like to thank the referees for their many insightful comments, which improved the original version of this paper. The authors would also like to thank Professor Masanobu Taniguchi, the lead guest editor of this special issue, for his efforts, and to celebrate his sixtieth birthday.
References
[1] P. Billingsley, Convergence of Probability Measures, John Wiley & Sons, New York, NY, USA, 1968.
[2] S. Beveridge and C. R. Nelson, "A new approach to decomposition of economic time series into permanent and transitory components with particular attention to measurement of the 'business cycle'," Journal of Monetary Economics, vol. 7, no. 2, pp. 151-174, 1981.
[3] K. Tanaka, Time Series Analysis: Nonstationary and Noninvertible Distribution Theory, Wiley Series in Probability and Statistics, John Wiley & Sons, New York, NY, USA, 1996.
[4] E. J. Hannan, Multiple Time Series, John Wiley & Sons, London, UK, 1970.
[5] R. Dahlhaus, "Maximum likelihood estimation and model selection for locally stationary processes," Journal of Nonparametric Statistics, vol. 6, no. 2-3, pp. 171-191, 1996.
[6] R. Dahlhaus, "Asymptotic statistical inference for nonstationary processes with evolutionary spectra," in Athens Conference on Applied Probability and Time Series Analysis, Vol. II, vol. 115 of Lecture Notes in Statistics, pp. 145-159, Springer, New York, NY, USA, 1996.
[7] J. P. Imhof, "Computing the distribution of quadratic forms in normal variables," Biometrika, vol. 48, pp. 419-426, 1961.
[8] Y. Ochi, "Asymptotic expansions for the distribution of an estimator in the first-order autoregressive process," Journal of Time Series Analysis, vol. 4, no. 1, pp. 57-67, 1983.
[9] J. Hirukawa and M. Sadakata, "Asymptotic properties of unit root processes with locally stationary disturbance," Preprint.
[10] J. Hirukawa and M. Taniguchi, "LAN theorem for non-Gaussian locally stationary processes and its applications," Journal of Statistical Planning and Inference, vol. 136, no. 3, pp. 640-688, 2006.
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 127571, 15 pages
doi:10.1155/2012/127571

Research Article
Statistical Portfolio Estimation under the Utility Function Depending on Exogenous Variables

Kenta Hamada, Dong Wei Ye, and Masanobu Taniguchi

Department of Applied Mathematics, School of Fundamental Science and Engineering, Waseda University, Tokyo 169-8050, Japan

Correspondence should be addressed to Masanobu Taniguchi, [email protected]

Received 8 September 2011; Accepted 15 November 2011

Academic Editor: Cathy W. S. Chen

Copyright © 2012 Kenta Hamada et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In the estimation of portfolios, it is natural to assume that the utility function depends on exogenous variables. From this point of view, in this paper, we develop the estimation under a utility function depending on exogenous variables. To estimate the optimal portfolio, we introduce a function of the moments of the return process and the cumulants between the return process and the exogenous variables, where this function is a generalized version of the portfolio weight function. First, assuming that the exogenous variables form a random process, we derive the asymptotic distribution of the sample version of the portfolio weight function. Then, the influence of the exogenous variables on the return process is illuminated when the exogenous variables have a shot noise in the frequency domain. Second, assuming that the exogenous variables are nonstochastic, we derive the asymptotic distribution of the sample version of the portfolio weight function. Then, the influence of the exogenous variables on the return process is illuminated when the exogenous variables have a harmonic trend. We also evaluate the influence of the exogenous variables on the return process numerically.
1. Introduction
In the usual theory of portfolio analysis, optimal portfolios are determined by the mean $\mu$ and the variance $\Sigma$ of the portfolio return $X = \{X(t)\}$. Several authors proposed estimators of optimal portfolios as functions of the sample mean $\hat{\mu}$ and the sample variance $\hat{\Sigma}$ for independent returns of assets. However, empirical studies show that financial return processes are often dependent and non-Gaussian. Shiraishi and Taniguchi [1] showed that the above estimators are generally not asymptotically efficient if the returns are dependent. Under non-Gaussianity, if we consider a general utility function $U(\cdot)$, the expected utility should depend on higher-order moments of the return. From this point of view, Shiraishi and Taniguchi [1] proposed portfolios including higher-order moments of the return.
However, empirical studies also show that the utility function often depends on exogenous variables $Z = \{Z(t)\}$. From this point of view, in this paper, we develop the estimation under a utility function depending on exogenous variables. Denote the optimal portfolio estimator by a function $\hat{g} = g(\hat{\theta}) = g(\widehat{E(X)}, \widehat{\mathrm{cov}}(X, X), \widehat{\mathrm{cov}}(X, Z), \widehat{\mathrm{cum}}(X, X, Z))$, where the hat $\widehat{(\cdot)}$ denotes the sample version of $(\cdot)$. Although Shiraishi and Taniguchi's [1] setting does not include the exogenous variable $Z(t)$ in $g$, we can develop the asymptotic theory in the light of their work.
First, assuming that $Z = \{Z(t)\}$ is a random process, we derive the asymptotic distribution of $\hat{g}$. Then, the influence of $Z$ on the return process is illuminated when $Z$ has a shot noise in the frequency domain. Second, assuming that $Z$ is a nonrandom sequence of variables satisfying Grenander's conditions, we also derive the asymptotic distribution of $\hat{g}$. Then the influence of $Z$ on $X$ is evaluated when $Z$ is a sequence of harmonic functions. Numerical studies will be given, and they show some interesting features.
The paper is organized as follows. Section 2 introduces the optimal portfolio of the form $g$ and provides the asymptotic distribution of $\hat{g}$. Assuming that $Z$ is a stochastic process, we derive the asymptotics of $\hat{g}$ when $Z$ has a shot noise in the frequency domain. The influence of $Z$ on $X$ is numerically evaluated in Section 2.2. Assuming that $Z$ is a nonrandom sequence satisfying Grenander's conditions, we derive the asymptotic distribution of $\hat{g}$ in Section 3, which also provides numerical studies of the influence of $Z$ on $X$ when $Z$ is a sequence of harmonic functions. The appendix gives the proofs of all the theorems.
2. Optimal Portfolio with the Exogenous Variables
Suppose the existence of a finite number of assets indexed by $i$ $(i = 1, \ldots, p)$. Let $X(t) = (X_1(t), \ldots, X_p(t))'$ denote the random returns on the $p$ assets at time $t$, and let $Z(t) = (Z_1(t), \ldots, Z_q(t))'$ denote the exogenous variables influencing the utility function at time $t$. We write $Y(t) = (X(t)', Z(t)')' = (X_1(t), \ldots, X_p(t), Z_1(t), \ldots, Z_q(t))'$.

Since it is empirically observed that $\{X(t)\}$ is non-Gaussian and dependent, we assume that it is a non-Gaussian stationary process with 3rd-order cumulants. Also, suppose that there exists a risk-free asset whose return is denoted by $Y_0(t)$. Let $\alpha_0$ and $\alpha = (\overbrace{\alpha_1, \ldots, \alpha_p}^{p}, \overbrace{0, \ldots, 0}^{q})'$ be the portfolio weights at time $t$; the portfolio is $M(t) = Y(t)'\alpha + Y_0(t)\alpha_0$, whose higher-order cumulants are written as
$$
\begin{aligned}
c^M_1(t) &= \mathrm{cum}\{M(t)\} = c_{a_1}\alpha_{a_1} + Y_0(t)\,\alpha_0, \\
c^M_2(t) &= \mathrm{cum}\{M(t), M(t)\} = c_{a_2 a_3}\alpha_{a_2}\alpha_{a_3}, \\
c^M_3(t) &= \mathrm{cum}\{M(t), M(t), M(t)\} = c_{a_4 a_5 a_6}\alpha_{a_4}\alpha_{a_5}\alpha_{a_6}.
\end{aligned} \tag{2.1}
$$
We use Einstein’s summation convention here and throughout the paper. For a utility func-tion U(·), the expected utility can be approximated as
E[U(M(t))] ≈ U(cM1 (t)
)+
12!D2U
(cM1 (t)
)cM2 (t) +
13!D3U
(cM1 (t)
)cM3 (t), (2.2)
by a Taylor expansion of order 3. The approximate optimal portfolio may be described as
$$\max_{\alpha_0,\,\alpha}\ \left\{\text{the right-hand side of (2.2)}\right\}, \qquad \text{subject to } \alpha_0 + \sum_{i=1}^{p} \alpha_i = 1. \tag{2.3}$$
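For a concrete check of the order-3 approximation (2.2), one can take the exponential utility $U(x) = -e^{-\gamma x}$ (for which $D^2U(c) = \gamma^2 U(c)$ and $D^3U(c) = -\gamma^3 U(c)$) and a Gaussian portfolio return $M(t)$, so that $c^M_3 = 0$ and $E[U(M)]$ is known in closed form. The parameter values below are illustrative, not from the paper:

```python
import numpy as np

gamma = 0.5                        # risk-aversion parameter (illustrative)
c1, c2, c3 = 0.05, 0.2 ** 2, 0.0   # cumulants of a Gaussian portfolio return M(t)

U = lambda x: -np.exp(-gamma * x)
# third-order approximation (2.2): U(c1) + D^2 U(c1) c2/2! + D^3 U(c1) c3/3!
approx = U(c1) + gamma ** 2 * U(c1) * c2 / 2 - gamma ** 3 * U(c1) * c3 / 6
exact = -np.exp(-gamma * c1 + gamma ** 2 * c2 / 2)   # E[-e^{-gamma M}], M ~ N(c1, c2)
print(approx, exact)               # the two agree to about 1e-5 here
```

For non-Gaussian returns the $c^M_3$ term is nonzero, which is exactly why the third-order cumulants enter the portfolio problem.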
Solving (2.3), Shiraishi and Taniguchi [1] introduced the optimal portfolio depending on the mean, variance, and third-order cumulants, and then derived the asymptotic distribution of a sample version estimator. Although our problem is different from that of Shiraishi and Taniguchi [1], we develop the discussion with methods inspired by them.
Introduce a portfolio estimator function based on observed higher-order cumulants,
$$\hat{g} \equiv g\left(\hat{\theta}\right) = g\left(\widehat{E(X)}, \widehat{\mathrm{cov}}(X, X), \widehat{\mathrm{cov}}(X, Z), \widehat{\mathrm{cum}}\{X, X, Z\}\right), \tag{2.4}$$
and assume that the function $g(\cdot)$ is $p$-dimensional and measurable, that is,
$$g(\theta) : g\left(E(X), \mathrm{cov}(X, X), \mathrm{cov}(X, Z), \mathrm{cum}\{X, X, Z\}\right) \longrightarrow \mathbb{R}^p. \tag{2.5}$$
Let the random process $\{Y(t) = (Y_1(t), \ldots, Y_{p+q}(t))'\}$ be a $(p+q)$-vector linear process generated by
$$Y(t) = \sum_{j=0}^{\infty} G(j)\, \varepsilon(t - j) + \mu, \tag{2.6}$$
where $\{\varepsilon(t)\}$ is a $(p+q)$-dimensional stationary process such that $E\{\varepsilon(t)\} = 0$ and $E\{\varepsilon(s)\varepsilon(t)'\} = \delta(s, t)\, K$, with $K$ a nonsingular $(p+q) \times (p+q)$ matrix, the $G(j)$'s are $(p+q) \times (p+q)$ matrices, and $\mu = (\mu_1, \ldots, \mu_{p+q})$ is the mean vector of $\{Y(t)\}$. All the components of $Y$, $\varepsilon$, $G$, and $\mu$ are real. Assuming that $\{\varepsilon(t)\}$ has cumulants of all orders, let $Q^{\varepsilon}_{a_1, \ldots, a_j}(t_1, \ldots, t_{j-1})$ be the joint $j$th-order cumulant of $\varepsilon_{a_1}(t), \varepsilon_{a_2}(t + t_1), \ldots, \varepsilon_{a_j}(t + t_{j-1})$. In what follows, we assume that, for each $j = 1, 2, 3, \ldots$,
$$\sum_{t_1, \ldots, t_j = -\infty}^{\infty} \sum_{a_1, \ldots, a_j = 1}^{p+q} \left|Q^{\varepsilon}_{a_1, \ldots, a_j}\left(t_1, \ldots, t_{j-1}\right)\right| < \infty, \qquad \sum_{t=0}^{\infty} \|G(t)\| < \infty. \tag{2.7}$$
Letting $Q^{Y}_{a_1, \ldots, a_j}(t_1, \ldots, t_{j-1})$ be the joint $j$th-order cumulant of $Y_{a_1}(t), Y_{a_2}(t + t_1), \ldots, Y_{a_j}(t + t_{j-1})$, we define the $j$th-order cumulant spectral density by
$$f_{a_1, \ldots, a_j}\left(\lambda_1, \ldots, \lambda_{j-1}\right) = \left(\frac{1}{2\pi}\right)^{j-1} \sum_{t_1, \ldots, t_{j-1} = -\infty}^{\infty} \exp\left\{-i\left(\lambda_1 t_1 + \cdots + \lambda_{j-1} t_{j-1}\right)\right\} Q^{Y}_{a_1, \ldots, a_j}\left(t_1, \ldots, t_{j-1}\right), \tag{2.8}$$
which is expressed as
$$f_{a_1, \ldots, a_j}\left(\lambda_1, \ldots, \lambda_{j-1}\right) = \sum_{b_1, \ldots, b_j = 1}^{p+q} k_{a_1 b_1}\left(\lambda_1 + \cdots + \lambda_{j-1}\right) k_{a_2 b_2}(-\lambda_1) \cdots k_{a_{j-1} b_{j-1}}\left(-\lambda_{j-2}\right) k_{a_j b_j}\left(-\lambda_{j-1}\right) Q^{\varepsilon}_{a_1, \ldots, a_j}\left(\lambda_1, \ldots, \lambda_{j-1}\right), \tag{2.9}$$
where $Q^{\varepsilon}_{a_1, \ldots, a_j}$ is the $j$th-order cumulant spectral density of $\varepsilon_{a_1}(t), \ldots, \varepsilon_{a_j}(t)$, $k_{ab}(\lambda) = \sum_{l=0}^{\infty} G_{ab}(l)\, e^{i\lambda l}$, and $G_{ab}(l)$ is the $(a, b)$th element of $G(l)$. We introduce the following quantities:
$$
\begin{aligned}
\hat{c}_{a_1} &= \frac{1}{n} \sum_{s=1}^{n} Y_{a_1}(s), \\
\hat{c}_{a_2 a_3} &= \frac{1}{n} \sum_{s=1}^{n} \left(Y_{a_2}(s) - \hat{c}_{a_2}\right)\left(Y_{a_3}(s) - \hat{c}_{a_3}\right), \\
\hat{c}_{a_4 a_5} &= \frac{1}{n} \sum_{s=1}^{n} \left(Y_{a_4}(s) - \hat{c}_{a_4}\right)\left(Y_{a_5}(s) - \hat{c}_{a_5}\right), \\
\hat{c}_{a_6 a_7 a_8} &= \frac{1}{n} \sum_{s=1}^{n} \left(Y_{a_6}(s) - \hat{c}_{a_6}\right)\left(Y_{a_7}(s) - \hat{c}_{a_7}\right)\left(Y_{a_8}(s) - \hat{c}_{a_8}\right),
\end{aligned} \tag{2.10}
$$
where $1 \le a_1, a_2, a_3, a_4, a_6, a_7 \le p$ and $p + 1 \le a_5, a_8 \le p + q$. Write the quantities appearing in (2.4) as
$$\hat{\theta} = \left(\hat{c}_{a_1}, \hat{c}_{a_2 a_3}, \hat{c}_{a_4 a_5}, \hat{c}_{a_6 a_7 a_8}\right), \qquad \theta = \left(c_{a_1}, c_{a_2 a_3}, c_{a_4 a_5}, c_{a_6 a_7 a_8}\right), \tag{2.11}$$
where $c_{a_1, \ldots, a_j} \equiv Q^{Y}_{a_1, \ldots, a_j}(0, \ldots, 0)$. Then $\dim\hat{\theta} = \dim\theta = a + b + c + d$, where $a = p$, $b = p(p+1)/2$, $c = pq$, $d = p(p+1)q/2$.

First, we derive the asymptotics of the fundamental quantity $\hat{\theta}$.
Theorem 2.1. Under the assumptions,
$$\sqrt{n}\left(\hat{\theta} - \theta\right) \xrightarrow{\ \mathcal{D}\ } N(0, \Omega), \qquad (n \longrightarrow \infty), \tag{2.12}$$
where
$$\Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} & \Omega_{13} & \Omega_{14} \\ \Omega_{21} & \Omega_{22} & \Omega_{23} & \Omega_{24} \\ \Omega_{31} & \Omega_{32} & \Omega_{33} & \Omega_{34} \\ \Omega_{41} & \Omega_{42} & \Omega_{43} & \Omega_{44} \end{pmatrix}, \tag{2.13}$$
Advances in Decision Sciences 5
and the typical element of $\Omega_{ij}$, corresponding to the covariance between $\hat{c}_{\Delta}$ and $\hat{c}_{\nabla}$, is denoted by $V\{(\Delta)(\nabla)\}$, with
$$
\begin{aligned}
V\left\{(a_1)\left(a'_1\right)\right\} &= \Omega_{11} = 2\pi f_{a_1 a'_1}(0), \\
V\left\{(a_2 a_3)\left(a'_1\right)\right\} &= \Omega_{12} = 2\pi \int_{-\pi}^{\pi} f_{a_2 a_3 a'_1}(\lambda, 0)\, d\lambda \quad (= \Omega_{21}), \\
V\left\{(a_4 a_5)\left(a'_1\right)\right\} &= \Omega_{13} = 2\pi \int_{-\pi}^{\pi} f_{a_4 a_5 a'_1}(\lambda, 0)\, d\lambda \quad (= \Omega_{31}), \\
V\left\{(a_6 a_7 a_8)\left(a'_1\right)\right\} &= \Omega_{14} = 2\pi \iint_{-\pi}^{\pi} f_{a_6 a_7 a_8 a'_1}(\lambda_1, \lambda_2, 0)\, d\lambda_1\, d\lambda_2 \quad (= \Omega_{41}), \\
V\left\{(a_2 a_3)\left(a'_2 a'_3\right)\right\} &= \Omega_{22} = 2\pi \iint_{-\pi}^{\pi} f_{a_2 a_3 a'_2 a'_3}(\lambda_1, \lambda_2, -\lambda_2)\, d\lambda_1\, d\lambda_2 \\
&\quad + 2\pi \int_{-\pi}^{\pi} \left\{f_{a_2 a'_2}(\lambda)\, f_{a_3 a'_3}(-\lambda) + f_{a_2 a'_3}(\lambda)\, f_{a_3 a'_2}(-\lambda)\right\} d\lambda, \\
V\left\{(a_2 a_3)\left(a'_4 a'_5\right)\right\} &= \Omega_{23} = 2\pi \iint_{-\pi}^{\pi} f_{a_2 a_3 a'_4 a'_5}(\lambda_1, \lambda_2, -\lambda_2)\, d\lambda_1\, d\lambda_2 \\
&\quad + 2\pi \int_{-\pi}^{\pi} \left\{f_{a_2 a'_4}(\lambda)\, f_{a_3 a'_5}(-\lambda) + f_{a_2 a'_5}(\lambda)\, f_{a_3 a'_4}(-\lambda)\right\} d\lambda \quad (= \Omega_{32}), \\
V\left\{(a_2 a_3)\left(a'_6 a'_7 a'_8\right)\right\} &= \Omega_{24} = 2\pi \iiint_{-\pi}^{\pi} f_{a_2 a_3 a'_6 a'_7 a'_8}(\lambda_1, \lambda_2, \lambda_3, -\lambda_3)\, d\lambda_1\, d\lambda_2\, d\lambda_3 \quad (= \Omega_{42}), \\
V\left\{(a_4 a_5)\left(a'_4 a'_5\right)\right\} &= \Omega_{33} = 2\pi \iint_{-\pi}^{\pi} f_{a_4 a_5 a'_4 a'_5}(\lambda_1, \lambda_2, -\lambda_2)\, d\lambda_1\, d\lambda_2 \\
&\quad + 2\pi \int_{-\pi}^{\pi} \left\{f_{a_4 a'_4}(\lambda)\, f_{a_5 a'_5}(-\lambda) + f_{a_4 a'_5}(\lambda)\, f_{a_5 a'_4}(-\lambda)\right\} d\lambda, \\
V\left\{(a_4 a_5)\left(a'_6 a'_7 a'_8\right)\right\} &= \Omega_{34} = 2\pi \iiint_{-\pi}^{\pi} f_{a_4 a_5 a'_6 a'_7 a'_8}(\lambda_1, \lambda_2, \lambda_3, -\lambda_3)\, d\lambda_1\, d\lambda_2\, d\lambda_3 \quad (= \Omega_{43}), \\
V\left\{(a_6 a_7 a_8)\left(a'_6 a'_7 a'_8\right)\right\} &= \Omega_{44} = 2\pi \iiiint_{-\pi}^{\pi} f_{a_6 a_7 a_8 a'_6 a'_7 a'_8}(\lambda_1, \lambda_2, \lambda_3, -\lambda_3 - \lambda_4)\, d\lambda_1 \cdots d\lambda_4 \\
&\quad + 2\pi \iiint_{-\pi}^{\pi} \sum_{\nu_1} f_{a_{i_1} a_{i_2} a_{i_3} a_{i_4}}(\lambda_1, \lambda_2, \lambda_3)\, f_{a_{i_5} a_{i_6}}(-\lambda_2 - \lambda_3)\, d\lambda_1\, d\lambda_2\, d\lambda_3 \\
&\quad + 2\pi \iiint_{-\pi}^{\pi} \sum_{\nu_2} f_{a_{i_1} a_{i_2} a_{i_3}}(\lambda_1, \lambda_2)\, f_{a_{i_4} a_{i_5} a_{i_6}}(\lambda_3, -\lambda_2 - \lambda_3)\, d\lambda_1\, d\lambda_2\, d\lambda_3 \\
&\quad + 2\pi \iint_{-\pi}^{\pi} \sum_{\nu_3} f_{a_{i_1} a_{i_2}}(\lambda_1)\, f_{a_{i_3} a_{i_4}}(\lambda_2)\, f_{a_{i_5} a_{i_6}}(-\lambda_1 - \lambda_2)\, d\lambda_1\, d\lambda_2,
\end{aligned} \tag{2.14}
$$
where the sums over $\nu_1$, $\nu_2$, and $\nu_3$ run over the corresponding indecomposable partitions of the index set $\{a_6, a_7, a_8, a'_6, a'_7, a'_8\}$.
6 Advances in Decision Sciences
Table 1: Standardized 3rd-order cumulants of the returns of five stocks from 2005/11/08 to 2011/11/08.

      IBM       Ford      Merck     HP        EXXON
s     0.05608   2.50879   0.79521   9.84777   0.34485
In what follows, we place all the proofs of the theorems in the appendix.

Next, we discuss the estimation of the portfolio $g(\theta)$. For this, we assume that the portfolio function $g(\theta)$ is continuously differentiable. Henceforth, we use the unified estimator $g(\hat{\theta})$ for $g(\theta)$. The $\delta$-method and Slutsky's lemma imply the following.
Theorem 2.2. Under the assumptions,
$$\sqrt{n}\left(g\left(\hat{\theta}\right) - g(\theta)\right) \xrightarrow{\ \mathcal{D}\ } N\left(0,\, (Dg)\,\Omega\,(Dg)'\right), \qquad (n \longrightarrow \infty), \tag{2.15}$$
where $Dg = \{\partial_i g_j;\ i = 1, \ldots, \dim\theta,\ j = 1, \ldots, p\}$.
The quantities $\hat{c}_{a_6 a_7 a_8}$ are the 3rd-order cumulants of the process, which reflect the non-Gaussianity. For the returns of five financial stocks, IBM, Ford, Merck, HP, and EXXON, we calculated the standardized 3rd-order cumulants $s = \hat{c}_{a_6 a_7 a_8}/(\hat{v}_2)^{3/2}$, where $\hat{v}_2$ is the sample variance of the stock. Table 1 shows their values.
From Table 1 we observe that the five returns are non-Gaussian. In view of Theorem 2.1, it is possible to construct the $(1 - \alpha)$ confidence interval for $c = c_{a_6 a_7 a_8}$ in the following form:
$$\left[\hat{c} - \frac{z_\alpha}{\sqrt{n}}\, \hat{\Omega}_{44}^{1/2},\ \hat{c} + \frac{z_\alpha}{\sqrt{n}}\, \hat{\Omega}_{44}^{1/2}\right], \tag{2.16}$$
where $z_\alpha$ is the upper $\alpha$-level point of $N(0, 1)$ and $\hat{\Omega}_{44}$ is a consistent estimator of $\Omega_{44}$, calculated by the method of Keenan [2] and Taniguchi [3].
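The sample cumulants (2.10) and the standardized statistic $s$ used for Table 1 are straightforward to compute; the sketch below uses centered exponential draws (population skewness 2) in place of real stock returns, so all data here are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
Y = rng.exponential(size=(5000, 3)) - 1.0    # skewed "returns" on 3 assets, mean 0

def c1(a):       return Y[:, a].mean()
def c2(a, b):    return np.mean((Y[:, a] - c1(a)) * (Y[:, b] - c1(b)))
def c3(a, b, c): return np.mean((Y[:, a] - c1(a)) * (Y[:, b] - c1(b)) * (Y[:, c] - c1(c)))

# standardized 3rd-order cumulant, as reported in Table 1
s = c3(0, 0, 0) / c2(0, 0) ** 1.5
print(s)   # near 2, the skewness of the exponential distribution
```

A clearly nonzero $s$, judged against the interval (2.16), is evidence of non-Gaussianity of the return.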
2.1. Influence of Exogenous Variable
In this subsection, we investigate the influence of the exogenous variables $Z(t)$ on the asymptotics of the portfolio estimator $g(\hat{\theta})$.

Assume that the exogenous variables have a "shot noise" in the frequency domain, that is,
$$Z_{a_j}(\lambda) = \delta\left(\lambda_{a_j} - \lambda\right), \tag{2.17}$$
where $\delta(\cdot)$ is the Dirac delta function with period $2\pi$ and $\lambda_{a_j} \neq 0$; hence $Z_{a_j}(\lambda)$ has one peak, at $\lambda - \lambda_{a_j} \equiv 0 \pmod{2\pi}$.
Advances in Decision Sciences 7
Theorem 2.3. For (2.17), denote $\Omega_{ij}$ and $V\{(\Delta)(\nabla)\}$ in Theorem 2.1 by $\Omega'_{ij}$ and $V'\{(\Delta)(\nabla)\}$, respectively; that is, $\Omega'_{ij}$ and $V'\{(\Delta)(\nabla)\}$ represent the asymptotic variances when the exogenous variables are shot noise. Then,
$$
\begin{aligned}
V'\left\{(a_4 a_5)\left(a'_1\right)\right\} &= \Omega'_{13} = 0 \quad \left(= \Omega'_{31}\right), \\
V'\left\{(a_6 a_7 a_8)\left(a'_1\right)\right\} &= \Omega'_{14} = 2\pi f_{a_6 a_7 a'_1}\left(\lambda_{a_8}, 0\right) \quad \left(= \Omega'_{41}\right), \\
V'\left\{(a_2 a_3)\left(a'_4 a'_5\right)\right\} &= \Omega'_{23} = 2\pi f_{a_2 a_3 a'_4}\left(\lambda_{a'_5}, 0\right) + 2\pi f_{a_2 a'_4}\left(\lambda_{a_5}\right) f_{a_3 a'_5}\left(-\lambda_{a'_5}\right) \\
&\quad + 2\pi f_{a_2 a'_5}\left(-\lambda_{a'_5}\right) f_{a_3 a'_4}\left(\lambda_{a_5}\right) \quad \left(= \Omega'_{32}\right), \\
V'\left\{(a_2 a_3)\left(a'_6 a'_7 a'_8\right)\right\} &= \Omega'_{24} = 2\pi \iint_{-\pi}^{\pi} f_{a_2 a_3 a'_6 a'_7 a'_8}\left(-\lambda_{a'_8}, \lambda_1, \lambda_2, -\lambda_2\right) d\lambda_1\, d\lambda_2 \quad \left(= \Omega'_{42}\right), \\
V'\left\{(a_4 a_5)\left(a'_4 a'_5\right)\right\} &= \Omega'_{33} = 2\pi f_{a_4 a_5 a'_4 a'_5}\left(\lambda_{a'_5}, \lambda_{a_5}, -\lambda_{a_5}\right) + 2\pi f_{a_4 a'_4}\left(\lambda_{a_5}\right) f_{a_5 a'_5}\left(-\lambda_{a_5}\right) \\
&\quad + 2\pi f_{a_4 a'_5}\left(\lambda_{a_5}\right) f_{a_5 a'_4}\left(-\lambda_{a_5}\right), \\
V'\left\{(a_4 a_5)\left(a'_6 a'_7 a'_8\right)\right\} &= \Omega'_{34} = 2\pi \int_{-\pi}^{\pi} f_{a_4 a_5 a'_6 a'_7 a'_8}\left(-\lambda_{a'_8}, \lambda_{a_5}, \lambda, -\lambda\right) d\lambda \quad \left(= \Omega'_{43}\right).
\end{aligned} \tag{2.18}
$$
2.2. Numerical Studies for Stochastic Exogenous Variables
This subsection provides some numerical examples which show the influence of Z(t) on Ωij .
Example 2.4. For a risk-free asset $X_0(t)$ and a risky asset $X(t)$, we consider the construction of optimal portfolios $\alpha X(t) + \alpha_0 X_0(t)$. Here, $\{X(t)\}$ is the return process of the risky asset, generated by
$$X(t) = \theta X(t - 1) + \varepsilon(t) + \mu_1, \tag{2.19}$$
where $E\{\varepsilon(t)\} = 0$ and $\mathrm{Var}\{\varepsilon(t)\} = \sigma^2$. We assume that $X_0(t) = \mu$ and that the exogenous variable in the frequency domain is given by $Z(\lambda) = \delta(\lambda)$. Write
$$Y(t) = \left(X(t), X_0(t), Z(t)'\right)'; \tag{2.20}$$
then
$$
\begin{aligned}
\Omega'_{13} &= \Omega'_{31} = V'\left\{(a_4 a_5)\left(a'_1\right)\right\} = 0, \\
\Omega'_{23} &= \Omega'_{32} = V'\left\{(a_2 a_3)\left(a'_4 a'_5\right)\right\} = 0, \\
\Omega'_{33} &= V\left\{(a_4 a_5)\left(a'_4 a'_5\right)\right\} = \mathrm{cum}\left(a_1, a'_1\right)\mathrm{cum}\left(a_3, a'_3\right) = \sigma^2\, \frac{1}{\left(1 - \theta e^{i\lambda_{a_3}}\right)},
\end{aligned} \tag{2.21}
$$
8 Advances in Decision Sciences
[Figure 1: Values of $\Omega'_{33}$ for $\theta = -0.9(0.018)0.9$, $\lambda_{a_3} = -\pi(0.06)\pi$.]
which are covariances between $Z(t)$ and $X(t)$ and show the influence of $Z(t)$ on $X(t)$. From Figure 1, it is seen that as $\theta$ tends to $1$ and $\lambda_{a_3}$ tends to $0$, $\Omega'_{33}$ increases. If $\theta$ tends to $-1$ and $\lambda_{a_3}$ tends to $-\pi$ or $\pi$, then $\Omega'_{33}$ also increases, which entails that the exogenous variables have a big influence on the asymptotics of the estimators when $\theta$ is close to the unit root of the AR model (2.19).
Remark 2.5. $\Omega'_{13}$ is robust against the shot noise in $Z(t)$ at $\lambda = \lambda_{a_3}$.
3. Portfolio Estimation for Nonstochastic Exogenous Variables
So far, we have assumed that the sequence of exogenous variables $\{Z(t)\}$ is a stochastic process. In this section, assuming that $\{Z(t)\}$ is a nonrandom sequence, we propose a portfolio estimator and elucidate its asymptotics. We introduce the following quantities:
$$\hat{A}_{j,k} = \frac{\sum_{t=1}^{n} X_j(t)\, Z_k(t)}{\sqrt{n \sum_{t=1}^{n} Z_k^2(t)}}, \qquad \hat{B}_{j,m,k} = \frac{\sum_{t=1}^{n} X_j(t)\, X_m(t)\, Z_k(t)}{\sqrt{n \sum_{t=1}^{n} Z_k^2(t)}}. \tag{3.1}$$
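The statistics (3.1) can be computed directly. A sketch with a single return series and a harmonic nonstochastic $Z$ (the white-noise return, frequencies, and sample size are all illustrative choices) is:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
t = np.arange(1, n + 1)
Z = np.cos(0.8 * t) + np.cos(0.2 * t)   # nonstochastic harmonic exogenous sequence
X = rng.normal(size=n)                  # a single return series (white noise, p = 1)

denom = np.sqrt(n * np.sum(Z ** 2))
A_hat = np.sum(X * Z) / denom           # A-hat_{j,k} of (3.1) with j = k = 1
B_hat = np.sum(X * X * Z) / denom       # B-hat_{j,m,k} of (3.1) with j = m = k = 1
print(A_hat, B_hat)                     # both small: O(n^{-1/2}) fluctuations
```

The $\sqrt{n}$-scaled fluctuations of these statistics are what Theorems 3.1 and 3.2 below characterize.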
We assume that the $Z(t)$'s satisfy Grenander's conditions (G1)-(G4) with
$$a^{(n)}_{j,k}(h) = \sum_{t=1}^{n-h} Z_j(t + h)\, Z_k(t). \tag{3.2}$$

(G1) $\lim_{n\to\infty} a^{(n)}_{j,j}(0) = \infty$, $(j = 1, \ldots, q)$.

(G2) $\lim_{n\to\infty} Z_j(n + 1)^2 / a^{(n)}_{j,j}(0) = 0$, $(j = 1, \ldots, q)$.

(G3) $a^{(n)}_{j,k}(h) \big/ \sqrt{a^{(n)}_{j,j}(0)\, a^{(n)}_{k,k}(0)} = \rho_{j,k}(h) + o\left(1/\sqrt{n}\right)$ for $j, k = 1, \ldots, q$, $h \in \mathbb{Z}$.

(G4) The $(q, q)$-matrix $\Phi(0) \equiv \{\rho_{j,k}(0) : j, k = 1, \ldots, q\}$ is regular.
Under Grenander’s conditions, there exists a Hermitian matrix function M(λ) ={Mj,k(λ) : j, k = 1, . . . , q}with positive semidefinite increments such that
Φ(h) =∫π
−πeihλdM(λ). (3.3)
M(λ) is the regression spectral measure of {Z(t)}. Next we discuss the asymptotics for sampleversions of cov(X,Z) and cov{XX,Z}. For this we need the following assumption, Thereexists constant b > 0 such that
det{fX(λ)
} ≥ b, (3.4)
where fX(λ) is the spectral density matrix of {X(t)}.
Theorem 3.1. Under Grenander's conditions and the assumption (3.4), we have
\[
\sqrt{n}\bigl\{\hat A_{j,k} - A_{j,k}\bigr\} \xrightarrow{\;D\;} N\bigl(0, \Omega_{j,k}\bigr), \tag{3.5}
\]
where the (j′, k′)th element of Ω_{j,k} is given by
\[
V(j, k : j', k') = 2\pi \int_{-\pi}^{\pi} f_{jj'}(\lambda)\, dM_{k,k'}(\lambda). \tag{3.6}
\]
Theorem 3.2. Under Grenander's conditions and the assumption (3.4), we have
\[
\sqrt{n}\bigl\{\hat B_{j,m,k} - B_{j,m,k}\bigr\} \xrightarrow{\;D\;} N\bigl(0, \Omega_{j,m,k}\bigr), \tag{3.7}
\]
where Ω_{j,m,k} = {V(j, m, k : j′, m′, k′)} with
\[
\begin{aligned}
V(j, m, k : j', m', k') = 2\pi \int_{-\pi}^{\pi} \Bigl[ & \int_{-\pi}^{\pi} \bigl\{ f_{jm'}(\lambda - \lambda_1) f_{mj'}(\lambda_1) + f_{jj'}(\lambda - \lambda_1) f_{mm'}(\lambda_1) \bigr\}\, d\lambda_1 \\
& + \iint_{-\pi}^{\pi} f_{jmj'm'}(\lambda_1, \lambda_2 - \lambda, \lambda_2)\, d\lambda_1\, d\lambda_2 \Bigr]\, dM_{k,k'}(\lambda). \tag{3.8}
\end{aligned}
\]
3.1. Numerical Studies for Nonstochastic Exogenous Variables
Letting {X(t)} and {Z(t)} be scalar processes, we investigate the influence of the nonstochastic process {Z(t)} on {X(t)}. The figures below show the influence of harmonic trends {Z(t)} on V(j, m, k : j′, m′, k′) in Ω_{j,m,k}. In these cases V(j, m, k : j′, m′, k′) measures the amount of covariance between XX and Z.
Figure 2: Values of V(j, m, k : j′, m′, k′) in Theorem 3.2 for η = −0.9(0.1)0.9, μ = −3.14(0.33)3.14.
Figure 3: Values of V(j, m, k : j′, m′, k′) for η = −0.9(0.1)0.9, μ = −3.14(0.33)3.14.
Example 3.3. Let the return process {X(t)} and the exogenous process {Z(t)} be generated by
\[
\begin{aligned}
X(t) &= \varepsilon(t) - \eta\,\varepsilon(t-1), \\
Z(t) &= \cos(\mu t) + \cos(0.25\mu t),
\end{aligned} \tag{3.9}
\]
where the ε(t)'s are i.i.d. N(0, 1) variables, so that {Z(t)} consists of harmonic trends with frequencies μ and 0.25μ. We plotted the graph of V(j, m, k : j′, m′, k′) in Figure 2.
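The variance V(j, m, k : j′, m′, k′) appearing in Theorem 3.2 can be probed with a small Monte Carlo experiment. The sketch below (the values n = 2000, η = 0.6, μ = 1.0 are illustrative assumptions) simulates (3.9), forms the scalar version of the statistic in (3.1), and reports the empirical variance of the normalized statistic over replications:

```python
import numpy as np

# Monte Carlo sketch for Example 3.3; n, eta, mu are illustrative assumptions.
rng = np.random.default_rng(0)
n, eta, mu = 2000, 0.6, 1.0
t = np.arange(1, n + 1)
Z = np.cos(mu * t) + np.cos(0.25 * mu * t)      # harmonic trend of (3.9)
denom = np.sqrt(n * np.sum(Z ** 2))             # normalization of (3.1)

reps = 500
stats = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(n + 1)
    X = e[1:] - eta * e[:-1]                    # MA(1) return process of (3.9)
    stats[r] = np.sum(X * X * Z) / denom        # scalar B-hat of (3.1)

# n * var approximates the variance of sqrt(n)*(B-hat - E B-hat).
print("empirical variance of the normalized statistic:", n * stats.var())
```

Repeating this over a grid of (η, μ) values reproduces the qualitative shape of Figure 2.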
Example 3.4. Assume that {X(t)} and {Z(t)} are generated by
\[
\begin{aligned}
X(t) - \eta X(t-1) &= \varepsilon(t), \\
Z(t) &= \cos(\mu t) + \cos(0.25\mu t).
\end{aligned} \tag{3.10}
\]
We observe that there exist two peaks in Figure 3: if μ ≈ 0 and |η| ≈ 1, then V(j, m, k : j′, m′, k′) increases rapidly. For further study, it may be noted that Cheng et al. [4] discussed statistical estimation of generalized multiparameter likelihood models. Although these models are for independent samples, there is some possibility of applying them to our portfolio problem.
Appendix
This section provides the proofs of theorems.
Proof of Theorem 2.1. Our setting includes the exogenous variables. Although Shiraishi and Taniguchi's [1] setting does not include them, we can prove the theorem along the lines of Shiraishi and Taniguchi [1].
Let
\[
\begin{aligned}
\hat c_{a_2 a_3} &= \frac{1}{n} \sum_{s=1}^{n} \bigl(Y_{a_2}(s) - \mu_{a_2}\bigr)\bigl(Y_{a_3}(s) - \mu_{a_3}\bigr), \\
\hat c_{a_6 a_7 a_8} &= \frac{1}{n} \sum_{s=1}^{n} \bigl(Y_{a_6}(s) - \mu_{a_6}\bigr)\bigl(Y_{a_7}(s) - \mu_{a_7}\bigr)\bigl(Y_{a_8}(s) - \mu_{a_8}\bigr).
\end{aligned} \tag{A.1}
\]
From Fuller [5], it is easy to see that
\[
\begin{aligned}
\bigl(\hat c_{a_6} - \mu_{a_6}\bigr)\bigl(\hat c_{a_7} - \mu_{a_7}\bigr)\bigl(\hat c_{a_8} - \mu_{a_8}\bigr) &= o_p\!\left(\frac{1}{\sqrt{n}}\right), \\
\hat c_{a_6 a_7}\bigl(\hat c_{a_8} - \mu_{a_8}\bigr) &= c_{a_6 a_7}\bigl(\hat c_{a_8} - \mu_{a_8}\bigr) + o_p\!\left(\frac{1}{\sqrt{n}}\right), \tag{A.2}
\end{aligned}
\]
where
\[
1 \le a_6, a_7 \le p, \qquad p + 1 \le a_8 \le p + q. \tag{A.3}
\]
Then we can see that
\[
\hat c_{a_6 a_7 a_8} = c_{a_6 a_7 a_8} - \sum_{k=6}^{8*} c_{a_{j_k} a_{i_k}} \bigl(\hat c_{a_k} - \mu_{a_k}\bigr) + o_p\!\left(\frac{1}{\sqrt{n}}\right), \tag{A.4}
\]
where Σ_{k=6}^{8*} is the sum over k = 6, 7, 8 with i_k, j_k ∈ {6, 7, 8} satisfying i_k < j_k and k ≠ i_k, j_k. Hence it follows that
\[
\begin{aligned}
n \operatorname{Cov}&\bigl(\hat c_{a_6 a_7 a_8} - c_{a_6 a_7 a_8},\ \hat c_{a_{6'} a_{7'} a_{8'}} - c_{a_{6'} a_{7'} a_{8'}}\bigr) \\
&= n \operatorname{cum}\bigl\{\hat c_{a_6 a_7 a_8}, \hat c_{a_{6'} a_{7'} a_{8'}}\bigr\} \\
&\quad - n \sum_{k=6}^{8*} \operatorname{cum}\bigl\{c_{a_{i_k} a_{j_k}}\bigl(\hat c_{a_k} - \mu_{a_k}\bigr), \hat c_{a_{6'} a_{7'} a_{8'}}\bigr\}
 - n \sum_{k'=6}^{8*} \operatorname{cum}\bigl\{\hat c_{a_6 a_7 a_8}, c_{a_{i_{k'}} a_{j_{k'}}}\bigl(\hat c_{a_{k'}} - \mu_{a_{k'}}\bigr)\bigr\} \\
&\quad + n \sum_{k=6}^{8*} \sum_{k'=6}^{8*} \operatorname{cum}\bigl\{c_{a_{i_k} a_{j_k}}\bigl(\hat c_{a_k} - \mu_{a_k}\bigr), c_{a_{i_{k'}} a_{j_{k'}}}\bigl(\hat c_{a_{k'}} - \mu_{a_{k'}}\bigr)\bigr\} + o(1). \tag{A.5}
\end{aligned}
\]
In what follows we assume that (a_{i_1}, …, a_{i_6}) is an arbitrary permutation of (a_6, a_7, a_8, a_{6'}, a_{7'}, a_{8'}) with
\[
1 \le a_6, a_7, a_{6'}, a_{7'} \le p, \qquad p + 1 \le a_8, a_{8'} \le p + q. \tag{A.6}
\]
Then we get
\[
\begin{aligned}
n \operatorname{cum}\bigl\{\hat c_{a_6 a_7 a_8}, \hat c_{a_{6'} a_{7'} a_{8'}}\bigr\} \longrightarrow \sum_{t=-\infty}^{\infty} \Bigl\{ & Q^Y_{a_6 a_7 a_8 a_{6'} a_{7'} a_{8'}}(0, 0, t, t, t) + \sum_{\nu_1} Q^Y_{a_{i_1} a_{i_2} a_{i_3} a_{i_4}}(0, t, t)\, Q^Y_{a_{i_5} a_{i_6}}(t) \\
& + \sum_{\nu_2} Q^Y_{a_{i_1} a_{i_2} a_{i_3}}(0, t)\, Q^Y_{a_{i_4} a_{i_5} a_{i_6}}(t, t) + \sum_{\nu_3} Q^Y_{a_{i_1} a_{i_2}}(t)\, Q^Y_{a_{i_3} a_{i_4}}(t)\, Q^Y_{a_{i_5} a_{i_6}}(t) \Bigr\} \equiv (A1). \tag{A.7}
\end{aligned}
\]
By use of the Fourier transform, we see that
\[
\begin{aligned}
(A1) &= 2\pi \iiiint_{-\pi}^{\pi} f_{a_6 a_7 a_8 a_{6'} a_{7'} a_{8'}}(\lambda_1, \lambda_2, \lambda_3, \lambda_4, -\lambda_3 - \lambda_4)\, d\lambda_1 \cdots d\lambda_4 \\
&\quad + 2\pi \iiint_{-\pi}^{\pi} \sum_{\nu_1} f_{a_{i_1} a_{i_2} a_{i_3} a_{i_4}}(\lambda_1, \lambda_2, \lambda_3)\, f_{a_{i_5} a_{i_6}}(-\lambda_2 - \lambda_3)\, d\lambda_1\, d\lambda_2\, d\lambda_3 \\
&\quad + 2\pi \iiint_{-\pi}^{\pi} \sum_{\nu_2} f_{a_{i_1} a_{i_2} a_{i_3}}(\lambda_1, \lambda_2)\, f_{a_{i_4} a_{i_5} a_{i_6}}(\lambda_3, -\lambda_2 - \lambda_3)\, d\lambda_1\, d\lambda_2\, d\lambda_3 \\
&\quad + 2\pi \iint_{-\pi}^{\pi} \sum_{\nu_3} f_{a_{i_1} a_{i_2}}(\lambda_1)\, f_{a_{i_3} a_{i_4}}(\lambda_2)\, f_{a_{i_5} a_{i_6}}(-\lambda_1 - \lambda_2)\, d\lambda_1\, d\lambda_2. \tag{A.8}
\end{aligned}
\]
The other asymptotic covariances are similarly evaluated. Finally, it suffices to prove the asymptotic normality of √n(θ̂ − θ). For this we prove
\[
\operatorname{cum}\bigl\{\sqrt{n}\bigl(\hat\theta_{a_1} - \theta_{a_1}\bigr), \ldots, \sqrt{n}\bigl(\hat\theta_{a_j} - \theta_{a_j}\bigr)\bigr\} \longrightarrow 0, \quad j \ge 3, \tag{A.9}
\]
where θ̂_{a_i} and θ_{a_i} are the ith components of θ̂ and θ, respectively. Let
\[
\hat\theta_{a_i} - \theta_{a_i} =
\begin{cases}
\hat c_{b_1} - c_{b_1} & \text{if } i = 1, \ldots, j_1, \\
\hat c_{b_2 b_3} - c_{b_2 b_3} & \text{if } i = j_1 + 1, \ldots, j_1 + j_2, \\
\hat c_{b_4 b_5} - c_{b_4 b_5} & \text{if } i = j_1 + j_2 + 1, \ldots, j_1 + j_2 + j_3, \\
\hat c_{b_6 b_7 b_8} - c_{b_6 b_7 b_8} & \text{if } i = j_1 + j_2 + j_3 + 1, \ldots, j_1 + j_2 + j_3 + j_4\ (= j).
\end{cases} \tag{A.10}
\]
Then, similarly as in Shiraishi and Taniguchi [1], we can see that
\[
\operatorname{cum}\bigl\{\sqrt{n}\bigl(\hat\theta_{a_1} - \theta_{a_1}\bigr), \ldots, \sqrt{n}\bigl(\hat\theta_{a_j} - \theta_{a_j}\bigr)\bigr\} = n^{j/2} \operatorname{cum}\bigl\{\hat\theta_{a_1}, \ldots, \hat\theta_{a_j}\bigr\} = O\bigl(n^{1-j/2}\bigr) \quad \text{for } j \ge 3, \tag{A.11}
\]
which implies the asymptotic normality of √n(θ̂ − θ).
Proof of Theorem 3.1. We write d_k(n) = √(Σ_{t=1}^{n} Z_k²(t)). Then
\[
E\bigl[\hat A_{j,k}\bigr] = \mu_j\, \frac{\sum_{t=1}^{n} 1 \cdot Z_k(t)}{\sqrt{n}\, d_k(n)} = \mu_j \int_{-\pi}^{\pi} dM_{0,k}(\lambda) + o\!\left(\frac{1}{\sqrt{n}}\right) \quad (\text{by (G3)}), \tag{A.12}
\]
and, writing A_{j,k} = μ_j ∫_{−π}^{π} dM_{0,k}(λ), this leads to
\[
E\bigl[\hat A_{j,k}\bigr] = A_{j,k} + o\!\left(\frac{1}{\sqrt{n}}\right). \tag{A.13}
\]
Next we evaluate the covariance:
\[
\begin{aligned}
\operatorname{Cov}&\bigl(\sqrt{n}\bigl\{\hat A_{j,k} - A_{j,k}\bigr\}, \sqrt{n}\bigl\{\hat A_{j',k'} - A_{j',k'}\bigr\}\bigr) \\
&= n E\Bigl[\bigl\{\bigl(\hat A_{j,k} - A_{j,k}\bigr) - E\bigl[\hat A_{j,k} - A_{j,k}\bigr]\bigr\}\bigl\{\bigl(\hat A_{j',k'} - A_{j',k'}\bigr) - E\bigl[\hat A_{j',k'} - A_{j',k'}\bigr]\bigr\}\Bigr] \\
&= n E\Bigl[\bigl\{\bigl(\hat A_{j,k} - A_{j,k}\bigr) + o\bigl(1/\sqrt{n}\bigr)\bigr\}\bigl\{\bigl(\hat A_{j',k'} - A_{j',k'}\bigr) + o\bigl(1/\sqrt{n}\bigr)\bigr\}\Bigr] \\
&= n E\bigl[\bigl(\hat A_{j,k} - A_{j,k}\bigr)\bigl(\hat A_{j',k'} - A_{j',k'}\bigr)\bigr] + o(1) \\
&= \sum_{t=1}^{n} \sum_{s=1}^{n} R_{j,j'}(s - t)\, \frac{Z_k(t) Z_{k'}(s)}{d_k(n)\, d_{k'}(n)} + o(1) \\
&= \sum_{l=-n+1}^{n-1} R_{j,j'}(l) \sum_{\substack{s=1 \\ 1 \le s - l \le n}}^{n} \frac{Z_k(s - l)\, Z_{k'}(s)}{d_k(n)\, d_{k'}(n)} + o(1) \\
&\longrightarrow 2\pi \int_{-\pi}^{\pi} f_{jj'}(\lambda)\, dM_{k,k'}(\lambda) = V(j, k : j', k'), \tag{A.14}
\end{aligned}
\]
where R_{j,j'}(s − t) is the covariance function of X_j(t) and X_{j'}(s). The asymptotic normality of √n(Â_{j,k} − A_{j,k}) can be shown if we prove
\[
\operatorname{cum}\bigl\{\sqrt{n}\bigl(\hat A_{j_1,k_1} - A_{j_1,k_1}\bigr), \ldots, \sqrt{n}\bigl(\hat A_{j_l,k_l} - A_{j_l,k_l}\bigr)\bigr\} \longrightarrow 0, \quad l \ge 3. \tag{A.15}
\]
Similarly as in Theorem 5.11.1 of Brillinger [6], we can see that
\[
\operatorname{cum}\bigl\{\sqrt{n}\bigl(\hat A_{j_1,k_1} - A_{j_1,k_1}\bigr), \ldots, \sqrt{n}\bigl(\hat A_{j_l,k_l} - A_{j_l,k_l}\bigr)\bigr\} = n^{l/2} \operatorname{cum}\bigl\{\hat A_{j_1,k_1}, \ldots, \hat A_{j_l,k_l}\bigr\} = O\bigl(n^{1-l/2}\bigr) \quad \text{for } l \ge 3. \tag{A.16}
\]
Proof of Theorem 3.2. First, it is seen that
\[
\lim_{n\to\infty} E\bigl[\hat B_{j,m,k}\bigr] = \lim_{n\to\infty} R_{j,m}(0)\, \frac{\sum_{t=1}^{n} Z_k(t)}{\sqrt{n}\, d_k(n)} = R_{j,m}(0) \int_{-\pi}^{\pi} dM_{0,k}(\lambda) \stackrel{\text{say}}{=} B_{j,m,k} \quad (\text{by (G3)}). \tag{A.17}
\]
We can evaluate the covariance as follows:
\[
\begin{aligned}
\operatorname{Cov}&\bigl(\sqrt{n}\bigl\{\hat B_{j,m,k} - B_{j,m,k}\bigr\}, \sqrt{n}\bigl\{\hat B_{j',m',k'} - B_{j',m',k'}\bigr\}\bigr) \\
&= n E\bigl[\bigl(\hat B_{j,m,k} - B_{j,m,k}\bigr)\bigl(\hat B_{j',m',k'} - B_{j',m',k'}\bigr)\bigr] \\
&= n E\bigl[\hat B_{j,m,k} \hat B_{j',m',k'}\bigr] - n B_{j,m,k} B_{j',m',k'} + o(1) \\
&= \sum_{t=1}^{n} \sum_{s=1}^{n} \operatorname{Cov}\bigl(X_j(t) X_m(t), X_{j'}(s) X_{m'}(s)\bigr)\, \frac{Z_k(t) Z_{k'}(s)}{d_k(n)\, d_{k'}(n)} + o(1) \\
&= \int_{-\pi}^{\pi} \sum_{l=-\infty}^{\infty} \bigl\{ \operatorname{cum}_{j,m,j',m'}(0, l, l) + R_{jm'}(l) R_{mj'}(l) + R_{jj'}(l) R_{mm'}(l) \bigr\} e^{-il\lambda}\, dM_{k,k'}(\lambda) + o(1) \\
&\longrightarrow 2\pi \int_{-\pi}^{\pi} \Bigl[ \int_{-\pi}^{\pi} \bigl\{ f_{jm'}(\lambda - \lambda_1) f_{mj'}(\lambda_1) + f_{jj'}(\lambda - \lambda_1) f_{mm'}(\lambda_1) \bigr\}\, d\lambda_1 \\
&\qquad\qquad + \iint_{-\pi}^{\pi} f_{jmj'm'}(\lambda_1, \lambda_2 - \lambda, \lambda_2)\, d\lambda_1\, d\lambda_2 \Bigr]\, dM_{k,k'}(\lambda) = V(j, m, k : j', m', k'). \tag{A.18}
\end{aligned}
\]
Next we derive the asymptotic normality of √n(B̂_{j,m,k} − B_{j,m,k}). For this we prove
\[
\operatorname{cum}\bigl\{\sqrt{n}\bigl(\hat B_{j_1,m_1,k_1} - B_{j_1,m_1,k_1}\bigr), \ldots, \sqrt{n}\bigl(\hat B_{j_l,m_l,k_l} - B_{j_l,m_l,k_l}\bigr)\bigr\} \longrightarrow 0, \quad l \ge 3. \tag{A.19}
\]
Similarly as in Theorem 5.11.1 of Brillinger [6], it is shown that
\[
\operatorname{cum}\bigl\{\sqrt{n}\bigl(\hat B_{j_1,m_1,k_1} - B_{j_1,m_1,k_1}\bigr), \ldots, \sqrt{n}\bigl(\hat B_{j_l,m_l,k_l} - B_{j_l,m_l,k_l}\bigr)\bigr\} = n^{l/2} \operatorname{cum}\bigl\{\hat B_{j_1,m_1,k_1}, \ldots, \hat B_{j_l,m_l,k_l}\bigr\} = O\bigl(n^{1-l/2}\bigr), \tag{A.20}
\]
which proves the asymptotic normality.
Acknowledgments
The authors thank Professor Cathy Chen and two referees for their kind comments.
References
[1] H. Shiraishi and M. Taniguchi, “Statistical estimation of optimal portfolios depending on higher order cumulants,” Annales de l’ISUP, vol. 53, no. 1, pp. 3–18, 2009.
[2] D. M. Keenan, “Limiting behavior of functionals of higher-order sample cumulant spectra,” The Annals of Statistics, vol. 15, no. 1, pp. 134–151, 1987.
[3] M. Taniguchi, “On estimation of the integrals of the fourth order cumulant spectral density,” Biometrika, vol. 69, no. 1, pp. 117–122, 1982.
[4] M.-Y. Cheng, W. Zhang, and L.-H. Chen, “Statistical estimation in generalized multiparameter likelihood models,” Journal of the American Statistical Association, vol. 104, no. 487, pp. 1179–1191, 2009.
[5] W. A. Fuller, Introduction to Statistical Time Series, John Wiley & Sons, New York, NY, USA, 2nd edition, 1996.
[6] D. R. Brillinger, Time Series: Data Analysis and Theory, Holden-Day, San Francisco, Calif, USA, 2nd edition, 1981.
Hindawi Publishing Corporation
Advances in Decision Sciences
Volume 2012, Article ID 571034, 12 pages
doi:10.1155/2012/571034

Research Article
Statistical Estimation for CAPM with Long-Memory Dependence

Tomoyuki Amano,¹ Tsuyoshi Kato,² and Masanobu Taniguchi²

¹ Faculty of Economics, Wakayama University, Wakayama 640-8510, Japan
² Department of Applied Mathematics, School of Fundamental Science and Engineering, Waseda University, Tokyo 169-8555, Japan

Correspondence should be addressed to Tomoyuki Amano, [email protected]

Received 24 June 2011; Revised 27 August 2011; Accepted 10 September 2011

Academic Editor: Junichi Hirukawa

Copyright © 2012 Tomoyuki Amano et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We investigate the Capital Asset Pricing Model (CAPM) with a time dimension. Using time series analysis, we discuss the estimation of the CAPM when the market portfolio and the error process are long-memory processes and correlated with each other. We give a sufficient condition for the returns of assets in the CAPM to be short-memory. In this setting, we propose a two-stage least squares estimator for the regression coefficient and derive its asymptotic distribution. Some numerical studies are given. They show an interesting feature of this model.
1. Introduction
The CAPM is one of the typical models of risk asset prices on an equilibrium market and has been used for pricing individual stocks and portfolios. At first, Markowitz [1] did the groundwork of this model. In his research, he cast the investor's portfolio selection problem in terms of expected return and variance. Sharpe [2] and Lintner [3] developed Markowitz's idea for its economic implications. Black [4] derived a more general version of the CAPM. In their version, the CAPM is constructed based on the excess of the return of the asset over the zero-beta return: E[Ri] = E[R0m] + βim(E[Rm] − E[R0m]), where Ri and Rm are the returns of the ith asset and the market portfolio, and R0m is the return of the zero-beta portfolio of the market portfolio. Campbell et al. [5] discussed the estimation of the CAPM, but in their work they did not discuss the time dimension. However, in econometric analysis, it is necessary to investigate this model with the time dimension; that is, the model is represented as Ri,t = αim + βimRm,t + εi,t. Recently, from empirical analysis, it is known that the return of an asset follows a short-memory process. But Granger [6] showed that the aggregation of short-memory processes yields long-memory dependence, and it is known that the return of the market portfolio follows a long-memory process. From this point of view, first, we show that the return of the market portfolio and the error process εt are long-memory dependent and correlated with each other.
For the regression model, the most fundamental estimator is the ordinary least squares estimator. However, the dependence of the error process on the explanatory process makes this estimator inconsistent. To overcome this difficulty, the instrumental variable method was proposed, using instrumental variables which are uncorrelated with the error process and correlated with the explanatory variable. This method was first used by Wright [7], and many researchers developed it (see Reiersøl [8], Geary [9], etc.). Comprehensive reviews are given in White [10]. However, the instrumental variable method has been discussed only in the case where the error process does not follow a long-memory process, and long-memory dependence makes the estimation difficult.
For the analysis of long-memory processes, Robinson and Hidalgo [11] considered a stochastic regression model defined by yt = α + β′xt + ut, where α, β = (β1, …, βK)′ are unknown parameters and the K-vector processes {xt} and {ut} are long-memory dependent with E(xt) = 0, E(ut) = 0. Furthermore, Choy and Taniguchi [12] considered the stochastic regression model yt = βxt + ut, where {xt} and {ut} are stationary processes with E(xt) = μ ≠ 0, and introduced a ratio estimator, the least squares estimator, and the best linear unbiased estimator for β. However, Robinson and Hidalgo [11] and Choy and Taniguchi [12] assume that the explanatory process {xt} and the error process {ut} are independent.
In this paper, by using the instrumental variable method, we propose a two-stage least squares (2SLS) estimator for the CAPM in which the returns of the individual assets and the error process are long-memory dependent and mutually correlated with each other. Then we prove its consistency and asymptotic normality under some conditions. Also, some numerical studies are provided.
This paper is organized as follows. Section 2 gives our definition of the CAPM, and we give a sufficient condition under which short-memory returns of assets are generated by returns of the market portfolio and an error process which are long-memory dependent and mutually correlated with each other. In Section 3 we propose the 2SLS estimator for this model and show its consistency and asymptotic normality. Section 4 provides some numerical studies which show interesting features of our estimator. The proof of the theorem is relegated to Section 5.
2. CAPM (Capital Asset Pricing Model)
For the Sharpe and Lintner version of the CAPM (see Sharpe [2] and Lintner [3]), the expected return of asset i is given by
\[
E[R_i] = R_f + \beta_{im}\bigl(E\bigl[R_m - R_f\bigr]\bigr), \tag{2.1}
\]
where
\[
\beta_{im} = \frac{\operatorname{Cov}[R_i, R_m]}{V[R_m]}, \tag{2.2}
\]
Rm is the return of the market portfolio, and Rf is the return of the risk-free asset. Another Sharpe–Lintner CAPM (see Sharpe [2] and Lintner [3]) is defined for Zi ≡ Ri − Rf:
\[
E[Z_i] = \beta_{im} E[Z_m], \tag{2.3}
\]
where
\[
\beta_{im} = \frac{\operatorname{Cov}[Z_i, Z_m]}{V[Z_m]} \tag{2.4}
\]
and Zm = Rm − Rf. Black [4] derived a more general version of the CAPM, which is written as
\[
E[R_i] = \alpha_{im} + \beta_{im} E[R_m], \tag{2.5}
\]
where αim = E[R0m](1 − βim) and R0m is the return on the zero-beta portfolio. Since the CAPM is a single-period model, (2.1) and (2.5) do not have a time dimension.
However, for econometric analysis of the model, it is necessary to add assumptions concerning the time dimension. Hence, it is natural to consider the model
\[
Y_{i,t} = \alpha_i + \beta_i Z_t + \varepsilon_{i,t}, \tag{2.6}
\]
where i denotes the asset, t denotes the period, and Y_{i,t} and Z_t, i = 1, …, n and t = 1, …, T, are, respectively, the returns of asset i and the market portfolio at time t.
Empirical features of the realized returns for assets and market portfolios are well known. We plot the autocorrelation function (ACF(l), l the time lag) of the returns of IBM stock and of S&P500 (square transformed) in Figures 1 and 2, respectively. From Figures 1 and 2, we observe that the return of a stock (i.e., IBM) shows short-memory dependence and that a market index (i.e., S&P500) shows long-memory dependence.
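The raw IBM and S&P500 series behind Figures 1 and 2 are not reproduced here, but the same short-memory versus long-memory contrast can be mimicked with simulated data; the sketch below (all parameter values are illustrative assumptions) compares the sample ACF of an MA(1) series with that of a FARIMA(0, d, 0) series:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r(1), ..., r(nlags) of a one-dimensional array."""
    x = x - x.mean()
    c0 = np.dot(x, x)
    return np.array([np.dot(x[:-l], x[l:]) / c0 for l in range(1, nlags + 1)])

rng = np.random.default_rng(1)
T, d, k = 20000, 0.4, 5000

# FARIMA(0, d, 0) via truncated MA(inf) weights psi_j = Gamma(j+d)/(Gamma(j+1)Gamma(d)),
# computed with the recursion psi_j = psi_{j-1} * (j - 1 + d) / j.
psi = np.empty(k)
psi[0] = 1.0
for j in range(1, k):
    psi[j] = psi[j - 1] * (j - 1 + d) / j
e = rng.standard_normal(T + k)
long_mem = np.convolve(e, psi, mode="valid")[:T]     # long-memory "market index"
short_mem = e[1:T + 1] - 0.5 * e[:T]                 # MA(1) "stock return"

acf_long = sample_acf(long_mem, 40)
acf_short = sample_acf(short_mem, 40)
print("lag-40 ACF, long memory:", acf_long[-1], " short memory:", acf_short[-1])
```

The long-memory ACF stays visibly positive at lag 40, while the MA(1) ACF is indistinguishable from zero beyond lag 1, matching the pattern of Figures 1 and 2.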
Suppose that an n-dimensional process {Yt = (Y1,t, …, Yn,t)′} is generated by
\[
Y_t = \alpha + B' Z_t + \varepsilon_t \quad (t = 1, 2, \ldots, T), \tag{2.7}
\]
where α = (α1, …, αn)′ and B = {βij ; i = 1, …, p, j = 1, …, n} are an unknown vector and matrix, respectively, {Zt = (Z1,t, …, Zp,t)′} is an explanatory stochastic regressor process, and {εt = (ε1,t, …, εn,t)′} is a sequence of disturbance processes. The ith component is written as
\[
Y_{i,t} = \alpha_i + \beta_i' Z_t + \varepsilon_{i,t}, \tag{2.8}
\]
where β′i = (βi,1, …, βi,p). In the CAPM, Yt is the return of assets and Zt is the return of the market portfolios. As we saw, empirical studies suggest that {Yt} is short-memory dependent and that {Zt} is long-memory dependent. On this ground, we investigate the conditions under which the CAPM (2.7)
Figure 1: ACF of the return of the IBM stock.
Figure 2: ACF of the return of S&P500 (square transformed).
is well defined. It is seen that, if the model (2.7) is valid, we have to assume that {εt} is also long-memory dependent and is correlated with {Zt}. Hence, we suppose that {Zt} and {εt} are defined by
\[
\begin{aligned}
Z_t &= \sum_{j=0}^{\infty} \gamma_j a_{t-j} + \sum_{j=0}^{\infty} \rho_j b_{t-j}, \\
\varepsilon_t &= \sum_{j=0}^{\infty} \eta_j e_{t-j} + \sum_{j=0}^{\infty} \xi_j b_{t-j},
\end{aligned} \tag{2.9}
\]
where {at}, {bt}, and {et} are p-dimensional zero-mean uncorrelated processes, and they are mutually independent. Here the coefficients {γj} and {ρj} are p × p matrices, all the components of γj are ℓ1-summable (for short, γj ∈ ℓ1), and those of ρj are ℓ2-summable (for short, ρj ∈ ℓ2). The coefficients {ηj} and {ξj} are n × p matrices, ηj ∈ ℓ1, and ξj ∈ ℓ2. From (2.9) it follows that
\[
Y_t = \alpha + \sum_{j=0}^{\infty} \bigl(B'\gamma_j a_{t-j} + \eta_j e_{t-j}\bigr) + \sum_{j=0}^{\infty} \bigl(B'\rho_j + \xi_j\bigr) b_{t-j}. \tag{2.10}
\]
Although (B′ρj + ξj) ∈ ℓ2 generally, if B′ρj + ξj = O(1/j^α), α > 1, then (B′ρj + ξj) ∈ ℓ1, which leads to the following.
Proposition 2.1. If B′ρj + ξj = O(j−α), α > 1, then the process {Yt} is short-memory dependent.
Proposition 2.1 provides an important view of the CAPM; that is, if we assume natural conditions on (2.7) based on the empirical studies, then they impose a sort of “curved structure,” B′ρj + ξj = O(j^{−α}), on the regressor and disturbance. A more important view is that the process {β′iZt + εi,t} is then fractionally cointegrated. Here βi and εi,t are called the cointegrating vector and error, respectively (see Robinson and Yajima [13]).
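A minimal scalar sketch of Proposition 2.1, under assumed coefficients: take ρ_j to be the slowly decaying FARIMA(0, d, 0) weights (square-summable but not absolutely summable for 0 < d < 1/2) and set ξ_j = −Bρ_j, so that B′ρ_j + ξ_j = 0 ∈ ℓ1. The common long-memory component of {Z_t} and {ε_t} then cancels exactly in Y_t = BZ_t + ε_t:

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, B, k = 20000, 0.4, 1.5, 5000

rho = np.empty(k)
rho[0] = 1.0
for j in range(1, k):
    rho[j] = rho[j - 1] * (j - 1 + d) / j      # FARIMA(0, d, 0) weights (in l2, not l1)

a = rng.standard_normal(T + k)
b = rng.standard_normal(T + k)
e = rng.standard_normal(T + k)
frac = np.convolve(b, rho, mode="valid")[:T]   # shared long-memory component
Z = a[k:] + frac                               # gamma_0 = 1 and the rho_j above
eps = e[k:] - B * frac                         # xi_j = -B * rho_j (cancellation)
Y = B * Z + eps                                # = B*a_t + e_t: white noise

def acf(x, l):
    x = x - x.mean()
    return np.dot(x[:-l], x[l:]) / np.dot(x, x)

print("lag-30 ACF of Z:", acf(Z, 30), " of Y:", acf(Y, 30))
```

The regressor Z_t keeps its long memory, while Y_t equals B·a_t + e_t exactly, a short-memory (here white-noise) process: the “curved structure” is what removes the long memory from the returns.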
3. Two-Stage Least Squares Estimation
This section discusses the estimation of (2.7) satisfying Proposition 2.1. Since E(Ztε′t) ≠ 0, the least squares estimator for B is known to be inconsistent. In what follows we assume that α = 0 in (2.7), because it can be estimated consistently by the sample mean. However, by use of econometric theory, it is often possible to find other variables that are uncorrelated with the errors εt, which we call instrumental variables, and to overcome this difficulty. Without instrumental variables, correlations between the observables {Zt} and the unobservables {εt} persistently contaminate our estimator for B. Hence, instrumental variables are useful in allowing us to estimate B.
Let {Xt} be an r × 1-dimensional vector (p ≤ r) of instrumental variables with E[Xt] = 0, Cov(Xt, Zt) ≠ 0, and Cov(Xt, εt) = 0. Consider the OLS regression of Zt on Xt. If Zt can be represented as
\[
Z_t = \delta' X_t + u_t, \tag{3.1}
\]
where δ is an r × p matrix and {ut} is a p-dimensional vector process which is independent of {Xt}, then δ can be estimated by the OLS estimator
\[
\hat\delta = \left[\sum_{t=1}^{T} X_t X_t'\right]^{-1} \left[\sum_{t=1}^{T} X_t Z_t'\right]. \tag{3.2}
\]
From (2.7) with α = 0 and (3.1), Yt has the form
\[
Y_t = B'\delta' X_t + B'u_t + \varepsilon_t, \tag{3.3}
\]
and δ′Xt is uncorrelated with B′ut + εt; hence, B can be estimated by the OLS estimator
\[
\tilde B_{\mathrm{OLS}} = \left[\sum_{t=1}^{T} \bigl(\delta' X_t\bigr)\bigl(\delta' X_t\bigr)'\right]^{-1} \left[\sum_{t=1}^{T} \bigl(\delta' X_t\bigr) Y_t'\right]. \tag{3.4}
\]
Using (3.2) and (3.4), we can propose the 2SLS estimator
\[
\hat B_{\mathrm{2SLS}} = \left[\sum_{t=1}^{T} \bigl(\hat\delta' X_t\bigr)\bigl(\hat\delta' X_t\bigr)'\right]^{-1} \left[\sum_{t=1}^{T} \bigl(\hat\delta' X_t\bigr) Y_t'\right]. \tag{3.5}
\]
Now we aim at proving the consistency and asymptotic normality of the 2SLS estimator B̂_2SLS. For this we assume that {εt} and {Xt} jointly constitute the following linear process:
\[
\begin{pmatrix} \varepsilon_t \\ X_t \end{pmatrix} = \sum_{j=0}^{\infty} G(j)\, \Gamma(t - j) = A_t \ (\text{say}), \tag{3.6}
\]
where {Γ(t)} is an uncorrelated (n + r)-dimensional vector process with
\[
E[\Gamma(t)] = 0, \qquad E\bigl[\Gamma(t)\Gamma(s)^*\bigr] = \delta(t, s) K, \qquad \delta(t, s) = \begin{cases} 1, & t = s, \\ 0, & t \ne s, \end{cases} \tag{3.7}
\]
and the G(j)'s are (n + r) × (n + r) matrices which satisfy Σ_{j=0}^{∞} tr{G(j) K G(j)*} < ∞. Then {At} has the spectral density matrix
\[
f(\omega) = \frac{1}{2\pi}\, k(\omega)\, K\, k(\omega)^* = \bigl\{ f_{ab}(\omega);\ 1 \le a, b \le n + r \bigr\} \quad (-\pi < \omega \le \pi), \tag{3.8}
\]
where
\[
k(\omega) = \sum_{j=0}^{\infty} G(j) e^{i\omega j} = \bigl\{ k_{ab}(\omega);\ 1 \le a, b \le n + r \bigr\} \quad (-\pi < \omega \le \pi). \tag{3.9}
\]
Further, we assume that ∫_{−π}^{π} log det f(ω) dω > −∞, so that the process {At} is nondeterministic. For the asymptotics of B̂_2SLS, following Hosoya [14, pages 108–109], we impose the following assumption.
Assumption 3.1. (i) There exists ε > 0 such that, for any t < t1 ≤ t2 ≤ t3 ≤ t4 and for each β1, β2,
\[
\operatorname{var}\bigl[E\bigl\{\Gamma_{\beta_1}(t_1)\Gamma_{\beta_2}(t_2) \mid \mathcal{B}(t)\bigr\} - \delta(t_1 - t_2, 0)\, K_{\beta_1\beta_2}\bigr] = O\bigl\{(t_1 - t)^{-2-\varepsilon}\bigr\}, \tag{3.10}
\]
and also
\[
E\bigl|E\bigl\{\Gamma_{\beta_1}(t_1)\Gamma_{\beta_2}(t_2)\Gamma_{\beta_3}(t_3)\Gamma_{\beta_4}(t_4) \mid \mathcal{B}(t)\bigr\} - E\bigl\{\Gamma_{\beta_1}(t_1)\Gamma_{\beta_2}(t_2)\Gamma_{\beta_3}(t_3)\Gamma_{\beta_4}(t_4)\bigr\}\bigr| = O\bigl\{(t_1 - t)^{-1-\varepsilon}\bigr\}, \tag{3.11}
\]
uniformly in t, where B(t) is the σ-field generated by {Γ(s); s ≤ t}.
(ii) For any ε > 0 and for any integer M ≥ 0, there exists Bε > 0 such that
\[
E\bigl[T(n, s)^2 \{T(n, s) > B_\varepsilon\}\bigr] < \varepsilon, \tag{3.12}
\]
uniformly in n, s, where
\[
T(n, s) = \left[\sum_{\alpha,\beta=1}^{p} \sum_{r=0}^{M} \left\{\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \bigl(\Gamma_\alpha(t + s)\Gamma_\beta(t + s + r) - K_{\alpha\beta}\,\delta(0, r)\bigr)\right\}^2\right]^{1/2}, \tag{3.13}
\]
and {T(n, s) > Bε} is the indicator function, which is equal to 1 if T(n, s) > Bε and 0 otherwise.
(iii) Each fab(ω) is square-integrable.
Under the above assumptions, we can establish the following theorem.
Theorem 3.2. Under Assumption 3.1, it holds that
(i)
\[
\hat B_{\mathrm{2SLS}} \xrightarrow{\;P\;} B, \tag{3.14}
\]
(ii)
\[
\sqrt{T}\bigl(\hat B_{\mathrm{2SLS}} - B\bigr) \xrightarrow{\;d\;} Q^{-1} E\bigl[Z_t X_t'\bigr] E\bigl[X_t X_t'\bigr]^{-1} U, \tag{3.15}
\]
where
\[
Q = \bigl[E\bigl(Z_t X_t'\bigr)\bigr]\bigl[E\bigl(X_t X_t'\bigr)\bigr]^{-1}\bigl[E\bigl(X_t Z_t'\bigr)\bigr], \tag{3.16}
\]
and U = {U_{i,j}; 1 ≤ i ≤ r, 1 ≤ j ≤ n} is a random matrix whose elements follow normal distributions with mean 0 and
\[
\begin{aligned}
\operatorname{Cov}\bigl[U_{i,j}, U_{k,l}\bigr] &= 2\pi \int_{-\pi}^{\pi} \bigl[f_{n+i,n+k}(\omega) f_{j,l}(\omega) + f_{n+i,l}(\omega) f_{j,n+k}(\omega)\bigr]\, d\omega \\
&\quad + 2\pi \sum_{\beta_1,\ldots,\beta_4 = 1}^{p} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} k_{n+i,\beta_1}(\omega_1)\, k_{j,\beta_2}(-\omega_1)\, k_{n+k,\beta_3}(\omega_2)\, k_{l,\beta_4}(-\omega_2)\, Q^\Gamma_{\beta_1,\ldots,\beta_4}(\omega_1, -\omega_2, \omega_2)\, d\omega_1\, d\omega_2. \tag{3.17}
\end{aligned}
\]
The next example prepares the asymptotic variance formula of B̂_2SLS in order to investigate its features in the simulation study.
Example 3.3. Let {Zt} and {Xt} be scalar long-memory processes with spectral densities {2π|1 − e^{iλ}|^{2d_Z}}^{−1} and {2π|1 − e^{iλ}|^{2d_X}}^{−1}, respectively, and cross spectral density (1/2π)(1 − e^{iλ})^{−d_X}(1 − e^{−iλ})^{−d_Z}, where 0 < d_Z < 1/2 and 0 < d_X < 1/2. Then
\[
E(X_t Z_t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{\bigl(1 - e^{i\lambda}\bigr)^{d_X}}\, \frac{1}{\bigl(1 - e^{-i\lambda}\bigr)^{d_Z}}\, d\lambda. \tag{3.18}
\]
Suppose that {εt} is a scalar uncorrelated process with σ²_ε ≡ E{ε²_t}. Assuming Gaussianity of {At}, it is seen that the right-hand side of (3.17) is
\[
2\pi \int_{-\pi}^{\pi} \frac{1}{2\pi\bigl|1 - e^{i\lambda}\bigr|^{2d_X}}\, \frac{\sigma_\varepsilon^2}{2\pi}\, d\lambda, \tag{3.19}
\]
which entails
\[
\begin{aligned}
\lim_{T\to\infty} \operatorname{var}\bigl[\sqrt{T}\bigl(\hat B_{\mathrm{2SLS}} - B\bigr)\bigr]
&= \frac{2\pi \int_{-\pi}^{\pi} \bigl(1/2\pi\bigl|1 - e^{i\lambda}\bigr|^{2d_X}\bigr)\bigl(\sigma_\varepsilon^2/2\pi\bigr)\, d\lambda}{\Bigl((1/2\pi)\int_{-\pi}^{\pi} \bigl(1/(1 - e^{i\lambda})^{d_X}\bigr)\bigl(1/(1 - e^{-i\lambda})^{d_Z}\bigr)\, d\lambda\Bigr)^2} \\
&= \sigma_\varepsilon^2 \left(\frac{2\pi \int_{-\pi}^{\pi} \bigl(1/\bigl|1 - e^{i\lambda}\bigr|^{2d_X}\bigr)\, d\lambda}{\Bigl(\int_{-\pi}^{\pi} \bigl(1/(1 - e^{i\lambda})^{d_X}\bigr)\bigl(1/(1 - e^{-i\lambda})^{d_Z}\bigr)\, d\lambda\Bigr)^2}\right) \\
&= \sigma_\varepsilon^2 \times V^*(d_X, d_Z). \tag{3.20}
\end{aligned}
\]
4. Numerical Studies
In this section, we evaluate the behaviour of B2SLS in the case p = 1 in (2.7) numerically.
Example 4.1. Under the conditions of Example 3.3, we investigate the asymptotic variance behaviour of B̂_2SLS by simulation. Figure 3 plots V*(d_X, d_Z) for 0 < d_X < 1/2 and 0 < d_Z < 1/2.
From Figure 3, we observe that, if d_Z ↘ 0 and d_X ↗ 1/2, then V* becomes large, and otherwise V* is small. This result implies that V* is large only in the case where the long-memory behaviour of Z_t is weak and the long-memory behaviour of X_t is strong. Note that long-memory behaviour of Z_t makes the asymptotic variance of the 2SLS estimator small, but that of X_t makes it large.
Figure 3: V*(d_X, d_Z) in Section 4.
Table 1: MSE of B̂_2SLS and B̂_OLS.

                      d2 = 0.1   d2 = 0.2   d2 = 0.3
B2SLS (d1 = 0.1)      0.03       0.052      0.189
BOLS  (d1 = 0.1)      0.259      0.271      0.34
B2SLS (d1 = 0.2)      0.03       0.075      0.342
BOLS  (d1 = 0.2)      0.178      0.193      0.307
B2SLS (d1 = 0.3)      0.019      0.052      0.267
BOLS  (d1 = 0.3)      0.069      0.089      0.23
Example 4.2. In this example, we consider the following model:
\[
\begin{aligned}
Y_t &= Z_t + \varepsilon_t, \\
Z_t &= X_t + u_t, \\
\varepsilon_t &= w_t + u_t,
\end{aligned} \tag{4.1}
\]
where X_t, w_t, and u_t are scalar long-memory processes which follow FARIMA(0, d1, 0), FARIMA(0, d2, 0), and FARIMA(0, 0.1, 0), respectively. Note that Z_t and ε_t are correlated, X_t and Z_t are correlated, but X_t and ε_t are independent. Under this model we compare B̂_2SLS with the ordinary least squares estimator B̂_OLS for B, which is defined as
\[
\hat B_{\mathrm{OLS}} = \left[\sum_{t=1}^{T} Z_t^2\right]^{-1} \left[\sum_{t=1}^{T} Z_t Y_t\right]. \tag{4.2}
\]
The lengths of X_t, Y_t, and Z_t are set to 100, and based on 5000 replications we report the mean square errors (MSE) of B̂_2SLS and B̂_OLS. We set d1, d2 = 0.1, 0.2, 0.3 in Table 1. In most cases of d1 and d2 in Table 1, the MSE of B̂_2SLS is smaller than that of B̂_OLS. Hence, from this example we can see that our estimator B̂_2SLS is better than B̂_OLS in the sense of MSE. Furthermore, from Table 1, we can see that the MSEs of B̂_2SLS and B̂_OLS increase as d2 becomes large; that is, long-memory behaviour of w_t makes the asymptotic variances of B̂_2SLS and B̂_OLS large.
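The design of this example can be sketched as follows (a scaled-down run with assumed shortcuts: 1000 replications instead of 5000, a truncated MA(∞) representation for the FARIMA components, and d1 = d2 = 0.1). In the scalar case with a single instrument, the 2SLS estimator reduces to the classical IV ratio ΣX_tY_t / ΣX_tZ_t:

```python
import numpy as np

rng = np.random.default_rng(4)
T, B, reps, k, d = 100, 1.0, 1000, 500, 0.1

# Truncated MA(inf) weights psi_j = Gamma(j+d)/(Gamma(j+1)Gamma(d)) of FARIMA(0, d, 0).
psi = np.empty(k)
psi[0] = 1.0
for j in range(1, k):
    psi[j] = psi[j - 1] * (j - 1 + d) / j

def farima(psi, T):
    e = rng.standard_normal(T + len(psi))
    return np.convolve(e, psi, mode="valid")[:T]

err_2sls = np.empty(reps)
err_ols = np.empty(reps)
for r in range(reps):
    X, w, u = farima(psi, T), farima(psi, T), farima(psi, T)
    Z, eps = X + u, w + u                            # model (4.1): Z and eps share u
    Y = B * Z + eps
    err_2sls[r] = np.dot(X, Y) / np.dot(X, Z) - B    # scalar 2SLS = IV ratio
    err_ols[r] = np.dot(Z, Y) / np.dot(Z, Z) - B
print("MSE 2SLS:", np.mean(err_2sls ** 2), " MSE OLS:", np.mean(err_ols ** 2))
```

The OLS mean square error stays bounded away from zero because of the shared component u_t, in line with the d1 = d2 = 0.1 entries of Table 1.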
Table 2: B̂_2SLS based on the actual financial data.

Stock:   IBM    Nike   Amazon   American Express   Ford
B2SLS:   0.75   1.39   1.71     2.61               −1.89
Example 4.3. In this example, we calculate B̂_2SLS based on actual financial data. We choose S&P500 (square transformed) as Z_t and the Nikkei stock average as an instrumental variable X_t. Assuming that Y_t (5 × 1) consists of the returns of IBM, Nike, Amazon, American Express, and Ford, the 2SLS estimates for B_i, i = 1, …, 5, are recorded in Table 2. We chose the Nikkei stock average as the instrumental variable because we obtained the following correlation analysis between the residual processes of the returns and Nikkei.
Correlation of IBM’s residual and Nikkei’s return: −0.000311
Correlation of Nike’s residual and Nikkei’s return: −0.00015
Correlation of Amazon’s residual and Nikkei’s return: −0.000622
Correlation of American Express’s residual and Nikkei’s return: 0.000147
Correlation of Ford’s residual and Nikkei’s return: −0.000536,
which supports the assumption Cov(Xt, εt) = 0.
From Table 2, we observe that the return of the finance stock (American Express) is strongly correlated with that of S&P500 and that the return of the auto industry stock (Ford) is negatively correlated with that of S&P500.
5. Proof of Theorem
This section provides the proof of Theorem 3.2. First, for convenience, we define Ẑ_t = (Ẑ_{1,t}, …, Ẑ_{p,t})′ ≡ δ̂′X_t. Let û_t = (û_{1,t}, …, û_{p,t})′ be the residual from the OLS estimation of (3.1); that is,
\[
\hat u_{i,t} = Z_{i,t} - \hat Z_{i,t}. \tag{5.1}
\]
The OLS makes this residual orthogonal to X_t:
\[
\sum_{t=1}^{T} X_t' \hat u_{i,t} = 0, \tag{5.2}
\]
which implies that the residual is orthogonal to Ẑ_{j,t}:
\[
\sum_{t=1}^{T} \hat Z_{j,t}\, \hat u_{i,t} = \left(\sum_{t=1}^{T} X_t' \hat u_{i,t}\right) \hat\delta_j = 0, \tag{5.3}
\]
where δ̂_j is the jth column vector of δ̂. Hence, we can obtain
\[
\sum_{t=1}^{T} \hat Z_{j,t} Z_{i,t} = \sum_{t=1}^{T} \hat Z_{j,t} \bigl(\hat Z_{i,t} + \hat u_{i,t}\bigr) = \sum_{t=1}^{T} \hat Z_{j,t} \hat Z_{i,t}, \tag{5.4}
\]
for all i and j. This means
\[
\sum_{t=1}^{T} \hat Z_t Z_t' = \sum_{t=1}^{T} \hat Z_t \hat Z_t'. \tag{5.5}
\]
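The orthogonality identities (5.2)–(5.5) are finite-sample algebraic facts about OLS residuals, and can be verified numerically on arbitrary data, for example:

```python
import numpy as np

rng = np.random.default_rng(5)
T, p, r = 200, 2, 3
X = rng.standard_normal((T, r))                  # instruments
Z = rng.standard_normal((T, p))                  # regressors
delta_hat = np.linalg.solve(X.T @ X, X.T @ Z)    # OLS of (3.1)
Z_hat = X @ delta_hat
u_hat = Z - Z_hat                                # residuals of (5.1)

assert np.allclose(X.T @ u_hat, 0)                # (5.2): residuals orthogonal to X
assert np.allclose(Z_hat.T @ u_hat, 0)            # (5.3)
assert np.allclose(Z_hat.T @ Z, Z_hat.T @ Z_hat)  # (5.5)
print("identities (5.2)-(5.5) hold numerically")
```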
So the ith column vector of the 2SLS estimator (3.5), β̂_{2SLS,i} (say), can be represented as
\[
\hat\beta_{\mathrm{2SLS},i} = \left[\sum_{t=1}^{T} \hat Z_t \hat Z_t'\right]^{-1} \left[\sum_{t=1}^{T} \hat Z_t Y_{i,t}\right], \tag{5.6}
\]
which leads to
\[
\hat\beta_{\mathrm{2SLS},i} - \beta_i = \left[\frac{1}{T}\sum_{t=1}^{T} \hat Z_t \hat Z_t'\right]^{-1} \left[\frac{1}{T}\sum_{t=1}^{T} \hat Z_t \varepsilon_{i,t}\right]. \tag{5.7}
\]
Hence, we can see that
\[
\sqrt{T}\bigl(\hat B_{\mathrm{2SLS}} - B\bigr) = \left[\frac{1}{T}\sum_{t=1}^{T} \hat Z_t \hat Z_t'\right]^{-1} \left[\frac{1}{\sqrt{T}}\sum_{t=1}^{T} \hat Z_t \varepsilon_t'\right]. \tag{5.8}
\]
Note that, by the ergodic theorem (e.g., Stout [15, pages 179–181]),
\[
\frac{1}{T}\sum_{t=1}^{T} \hat Z_t \hat Z_t' = \frac{1}{T}\,\hat\delta' \sum_{t=1}^{T} X_t Z_t' = \left[\frac{1}{T}\sum_{t=1}^{T} Z_t X_t'\right] \left[\frac{1}{T}\sum_{t=1}^{T} X_t X_t'\right]^{-1} \left[\frac{1}{T}\sum_{t=1}^{T} X_t Z_t'\right] \xrightarrow{\;P\;} Q. \tag{5.9}
\]
Furthermore, the second term on the right side of (5.8) can be represented as
\[
\left[\frac{1}{\sqrt{T}}\sum_{t=1}^{T} \hat Z_t \varepsilon_t'\right] = \hat\delta'\, \frac{1}{\sqrt{T}}\sum_{t=1}^{T} X_t \varepsilon_t', \tag{5.10}
\]
and by the ergodic theorem (e.g., Stout [15, pages 179–181]), we can see that
\[
\hat\delta' = \left[\sum_{t=1}^{T} Z_t X_t'\right] \left[\sum_{t=1}^{T} X_t X_t'\right]^{-1} \xrightarrow{\;P\;} \bigl[E\bigl(Z_t X_t'\bigr)\bigr]\bigl[E\bigl(X_t X_t'\bigr)\bigr]^{-1}. \tag{5.11}
\]
Proof of (i). From the above,
\[
\hat B_{\mathrm{2SLS}} - B = O_P\!\left[\frac{1}{T}\sum_{t=1}^{T} X_t \varepsilon_t'\right]. \tag{5.12}
\]
In view of Theorem 1.2(i) of Hosoya [14], the right-hand side of (5.12) converges to 0 in probability.
Proof of (ii). From Theorem 3.2 of Hosoya [14], if Assumption 3.1 holds, it follows that (1/√T) Σ_{t=1}^{T} X_t ε_t' →d U. Hence, Theorem 3.2 is proved.
Acknowledgments
The author would like to thank the Editor and the referees for their comments, which im-proved the original version of this paper.
References
[1] H. Markowitz, Portfolio Selection: Efficient Diversification of Investments, John Wiley & Sons, New York, NY, USA, 1991.
[2] W. Sharpe, “Capital asset prices: a theory of market equilibrium under conditions of risk,” Journal of Finance, vol. 19, pp. 425–442, 1964.
[3] J. Lintner, “The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets,” The Review of Economics and Statistics, vol. 47, pp. 13–37, 1965.
[4] F. Black, “Capital market equilibrium with restricted borrowing,” The Journal of Business, vol. 45, pp. 444–455, 1972.
[5] J. Y. Campbell, A. W. Lo, and A. C. MacKinlay, The Econometrics of Financial Markets, Princeton University Press, Princeton, NJ, USA, 1997.
[6] C. W. J. Granger, “Long memory relationships and the aggregation of dynamic models,” Journal of Econometrics, vol. 14, no. 2, pp. 227–238, 1980.
[7] P. G. Wright, The Tariff on Animal and Vegetable Oils, Macmillan, New York, NY, USA, 1928.
[8] O. Reiersøl, “Confluence analysis by means of instrumental sets of variables,” Arkiv för Matematik, Astronomi och Fysik, vol. 32A, no. 4, pp. 1–119, 1945.
[9] R. C. Geary, “Determination of linear relations between systematic parts of variables with errors of observation the variances of which are unknown,” Econometrica, vol. 17, pp. 30–58, 1949.
[10] H. White, Asymptotic Theory for Econometricians, Academic Press, New York, NY, USA, 2001.
[11] P. M. Robinson and F. J. Hidalgo, “Time series regression with long-range dependence,” The Annals of Statistics, vol. 25, no. 1, pp. 77–104, 1997.
[12] K. Choy and M. Taniguchi, “Stochastic regression model with dependent disturbances,” Journal of Time Series Analysis, vol. 22, no. 2, pp. 175–196, 2001.
[13] P. M. Robinson and Y. Yajima, “Determination of cointegrating rank in fractional systems,” Journal of Econometrics, vol. 106, no. 2, pp. 217–241, 2002.
[14] Y. Hosoya, “A limit theory for long-range dependence and statistical inference on related models,” The Annals of Statistics, vol. 25, no. 1, pp. 105–137, 1997.
[15] W. F. Stout, Almost Sure Convergence, Academic Press, New York, NY, USA, 1974.