Long memory models for volatility and high frequency financial
data econometrics
Dmitri Koulikov
Department of Economics
School of Economics and Management
University of Aarhus
Aarhus C, 8000, Denmark
phone: +45 89421577
e-mail: [email protected]
PhD dissertation submitted to
The Faculty of Social Sciences
University of Aarhus
Completed under supervision of
Professor Niels Haldrup, University of Aarhus
and
Professor Bent Jesper Christensen, University of Aarhus
June 11, 2004
Contents
Preface
Summary
Dansk resume
Chapter 1 Modeling sequences of long memory non-negative covariance stationary random variables
Chapter 2 Long memory ARCH(∞) models: specification and quasi–maximum likelihood estimation
Chapter 3 Non–stationary models for volatility of speculative returns: with application to foreign exchange data
Chapter 4 Conditional heteroscedasticity model for discrete high-frequency price changes: with application to IBM trades data
1 Preface
I wish to thank Niels Haldrup and Bent Jesper Christensen, my PhD thesis supervisors, for their
effort and advice during the period of my study at the Department of Economics, University
of Aarhus. I also wish to use this opportunity to express my gratitude to Niels Haldrup
for stimulating my interest in long memory time series analysis back in 1999, while he was
supervising my master's thesis. I am greatly indebted to Svend Hylleberg for supporting my
master's education at the Department of Economics in 1998–1999, and for his help during my
PhD studies in 2000–2003.
In addition, I would like to thank my teachers and colleagues from the University of Tartu
and Euro Faculty: Raul Eamets, Arne Gotfredsen, Jens A. Larsen, Helje Kaldaru, Tiiu Paas,
Alf Vanags, Morten Hansen and Kenneth Smith for inspiring and supporting my early inroads
into the field of econometric and economic research.
2 Summary
This thesis contributes to the branch of econometric research devoted to long memory models for sequences of non-negative stationary random variables {(Xt, ψt) : t ∈ Z}. Among the most common fields of application of such models are conditional heteroscedasticity models and econometric models for high-frequency financial data. An enlightening survey of recent theoretical developments for this class of econometric models is Giraitis, Leipus and Surgailis (2003); for an overview of empirical work in this area, see Bollerslev, Engle and Nelson (1994).
In Chapter 1 of this thesis I introduce a class of stationary MD-ARCH(∞) models for {(Xt, ψt) : t ∈ Z}, defined as follows:
$$X_t = \psi_t \varepsilon_t, \qquad \psi_t = a + \sum_{j=1}^{\infty} \theta_{j-1}(X_{t-j} - \psi_{t-j}), \qquad (1)$$
where εt : t ∈ Z is a sequence of i.i.d. non-negative random variables and all other parame-
ters are positive. I show that under a mild set of conditions (1) has a non-negative covariance
stationary solution with non-summable autocovariance function, commonly referred to as long
memory. Moreover, the class of MD-ARCH(∞) models includes all short memory covariance
stationary GARCH sequences, and therefore provides a natural extension of the GARCH mod-
els to the long memory case, just as ARFIMA is an extension of the classical ARMA models.
In Chapter 2 I study the class of long memory ARCH(∞) models, as well as the asymptotic properties of a time-domain QML estimator for such models. In their influential study, Ding and Granger (1996) define the following ARCH(∞) model:
$$X_t = \psi_t \varepsilon_t, \qquad \psi_t = \sum_{j=1}^{\infty} \pi_{j-1} X_{t-j}, \qquad (2)$$
but leave a number of important issues unresolved. In particular, the existence of a non-degenerate long memory solution of this model has not previously been established in the literature.
Under certain conditions, covariance stationary solutions of (1) and (2) are equivalent, which allows me to establish properties of the Ding and Granger (1996) model. The second part of the paper examines the QML estimator for this class of models. One of the most notable results is that the asymptotic variance of the memory parameter estimator equals 6/π^2, the same as in the class of linear ARFIMA models.
In Chapter 3 I propose a framework for non–stationary conditional heteroscedasticity models and examine some empirical evidence of non–stationary volatility. A family of models, referred to as cMD–ARCH models, suitable for non–stationary conditional heteroscedasticity time series is defined as:
$$X_t = \psi_t = a_0 \quad \text{for } t \le 0,$$
$$X_t = \psi_t \varepsilon_t, \qquad \psi_t = a_t + \sum_{j=1}^{t} \theta_{j-1}(X_{t-j} - \psi_{t-j}) \quad \text{for } t > 0,$$
where at > 0 is a non-stochastic function of the index t and the other parameters are as in the stationary MD-ARCH(∞) model of Chapter 1. The model allows deterministic and stochastic effects in the conditional volatility to be separated, as in the linear time series case. A statistical inference theory for the parameters of cMD–ARCH is developed. Consistency and asymptotic normality of the QML estimator are shown under general assumptions on the innovations εt. An empirical application of the new model to thirteen major European and Asian foreign exchange returns is included, illustrating empirical cases of non–stationary volatility.
In Chapter 4 of this thesis I introduce MD-ARCH(∞)-like models for time series of discrete
price changes in high frequency financial data, allowing for separate modeling of conditional
mean and conditional variance parameters. The paper borrows ideas from the ordered probit
model, but includes an observation-driven dynamic volatility component. Both short and long memory
volatility models are discussed, and an application to the IBM trades dataset is included.
3 Dansk resume (Danish summary)
This dissertation addresses the area of econometrics concerned with long memory models for sequences of non-negative stationary random variables {(Xt, ψt) : t ∈ Z}. Among the most common applications of such models are conditional volatility models and econometric models for high-frequency financial data. Giraitis, Leipus and Surgailis (2003) give an enlightening overview of recent theoretical contributions to this class of models, and Bollerslev, Engle and Nelson (1994) summarize the empirical research in the area.
In chapter 1 I introduce a class of stationary MD-ARCH(∞) models for {(Xt, ψt) : t ∈ Z}, defined as follows:
$$X_t = \psi_t \varepsilon_t, \qquad \psi_t = a + \sum_{j=1}^{\infty} \theta_{j-1}(X_{t-j} - \psi_{t-j}), \qquad (3)$$
where {εt : t ∈ Z} is a sequence of i.i.d. non-negative random variables and all other parameters are positive. Under a set of mild conditions I show that (3) has a non-negative covariance stationary solution with a non-summable autocovariance function, known as long memory. Furthermore, the MD-ARCH(∞) class includes all short memory covariance stationary GARCH sequences and therefore provides a natural extension of the GARCH models to the long memory case, just as ARFIMA is an extension of the classical ARMA models.
In chapter 2 I examine the class of long memory ARCH(∞) models, as well as the asymptotic properties of the QML estimator for these models. In their groundbreaking article, Ding and Granger (1996) define the following ARCH(∞) model:
$$X_t = \psi_t \varepsilon_t, \qquad \psi_t = \sum_{j=1}^{\infty} \pi_{j-1} X_{t-j}, \qquad (4)$$
but leave a number of issues unresolved. In particular, the existence of a non-degenerate long memory solution of this model has not been established in the literature. Under certain conditions, covariance stationary solutions of (3) and (4) are identical, which allows me to establish the properties of the Ding and Granger (1996) model. The second part of the chapter examines the QML estimator for this class of models. One of the most notable results is that the asymptotic variance of the memory parameter estimator equals 6/π^2, the same as for the ARFIMA class.
In chapter 3 I propose a framework for non–stationary conditional heteroscedasticity and review a number of empirical examples of non–stationary volatility. A group of models, called cMD–ARCH models, suited to non–stationary conditional heteroscedasticity, is defined as:
$$X_t = \psi_t = a_0 \quad \text{for } t \le 0,$$
$$X_t = \psi_t \varepsilon_t, \qquad \psi_t = a_t + \sum_{j=1}^{t} \theta_{j-1}(X_{t-j} - \psi_{t-j}) \quad \text{for } t > 0,$$
where at > 0 is a non-stochastic function of the index t and the other parameters are identical to those of the stationary MD-ARCH(∞) model in chapter 1. This model allows deterministic and stochastic effects in the conditional volatility to be separated, similarly to linear time series. A statistical inference theory for the parameters of cMD–ARCH is developed. Under general conditions on the residuals εt, the QML estimator is consistent and asymptotically normal. An empirical application of the new model to returns on thirteen major European and Asian exchange rates is also included; these illustrate empirical cases of non–stationary volatility.
In chapter 4 of this dissertation I introduce MD-ARCH(∞)-like models for time series of discrete price changes in high-frequency financial data, which allow separate modeling of the conditional mean and the conditional volatility. The paper borrows ideas from the ordered probit model, but contains an observation-driven volatility component. The chapter also includes a discussion of models with both short and long memory, and an empirical application of these models to IBM data.
References
Bollerslev, Tim, Robert F. Engle and Daniel B. Nelson (1994) ARCH models. Handbook of Econometrics, vol. IV, pp. 2961–3031. New York: Elsevier Science.
Ding, Z. and Clive W.J. Granger (1996) Modeling volatility persistence of speculative returns: a new approach. Journal of Econometrics, vol. 73, pp. 185–215.
Giraitis, Liudas, Remigijus Leipus and Donatas Surgailis (2003) Recent advances in ARCH modelling. Preprint.
Chapter 1: Modeling sequences of long memory non-negative covariance stationary random variables
Modeling sequences of long memory non-negative covariance
stationary random variables
Dmitri Koulikov∗
Department of Economics
School of Economics and Management
University of Aarhus
Aarhus C, 8000, Denmark
phone: +45 89421577
e-mail: [email protected]
This revision:
July 4, 2003
Abstract
This paper extends the class of covariance stationary GARCH processes of Engle (1982)
and Bollerslev (1986) to the case of non-summable autocovariances. We improve on the results of two previous studies in this field: the FIGARCH model of Baillie, Bollerslev and Mikkelsen (1996), which generates sequences of non-negative random variables with infinite first and higher-order moments and a hyperbolic decay rate of the impulse response function, and the linear ARCH sequences of Giraitis, Robinson and Surgailis (2000), which do not contain the class of short-memory covariance stationary GARCH processes. We use an infinite series representation of GARCH models in terms of martingale difference innovations, referred to as the MD-ARCH(∞) representation. This allows for the case of hyperbolically decaying square-summable weighting coefficients. Conditions for the existence,
non-negativity and covariance stationarity of the MD-ARCH(∞) sequences are derived,
and the functional limit of normalized partial sums of the process is studied. Applications
of long-memory MD-ARCH(∞) processes include volatility modeling and high-frequency
financial data econometrics.
JEL classification: C22, C51
Keywords: Conditional heteroscedasticity, Long-memory, Weak stationarity, ARCH(∞),
Econometrics of high-frequency financial data
∗The author wishes to thank the participants of the seminars at CORE, Université catholique de Louvain, and at Nuffield College, University of Oxford, for their helpful feedback. Comments from Luc Bauwens, Neil Shephard and Bent Nielsen are gratefully acknowledged. All remaining mistakes are my own.
1 Introduction
In this paper we show that the class of covariance stationary GARCH processes1 of Engle (1982)
and Bollerslev (1986) can be extended to the case of long memory. For the class of linear ARMA models, such an extension was proposed in the early eighties by Granger and Joyeux (1980) and Hosking (1981), and is nowadays widely used in various applications in economics. Evidence of
long-memory and persistent autocorrelations has been documented in many fields in economics,
including volatility of financial series and trading intensity in financial durations data. Short-
memory GARCH processes are widely used in these settings, but their extension to the long-
memory case proved to be non-trivial, largely because of their complicated non-linear structure.
Let {εt : t ∈ Z} be a sequence of non-negative i.i.d. innovations. Similarly to Engle (1982) and Bollerslev (1986), we seek to model a sequence of non-negative random variables {Xt : t ∈ Z} recursively defined in the following way:
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = \psi(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots), \qquad (1)$$
where ψ is a measurable function of possibly infinite history of innovations. Non-negativity of
Xt : t ∈ Z implies that we are interested in the class of non-negative functions ψ.
The general formulation given in (1) includes many models, but the primary interest in the
econometric literature has been concentrated on the class of GARCH models, where the function
ψ is linear in the previous history of {(Xt, ψt) : t ∈ Z}. GARCH processes are appealing for modeling financial time series, where {ψt : t ∈ Z} is interpreted as the time-varying conditional second moment of returns, the squares of which are represented by the sequence {Xt : t ∈ Z}. Recently, techniques and ideas from the GARCH literature have been used for statistical modeling of other non-negative processes, most notably high-frequency financial durations data in Engle and Russell (1998) and Engle (2000), where {ψt : t ∈ Z} represents the time-varying trading intensity of financial markets. Recent surveys of the GARCH literature are Bollerslev, Engle and Nelson (1994) and Berkes, Horváth and Kokoszka (2002).
During the last decade substantial empirical evidence of long-memory in volatility and
trading intensity of many financial time-series has been accumulated and documented; see
Andersen, Bollerslev, Diebold and Labys (2001) and Jasiak (1998) among many others. Some
empirical regularities of such data can be modeled in the framework of FIGARCH processes
introduced by Baillie, Bollerslev and Mikkelsen (1996) and Ding and Granger (1996). The
simplest FIGARCH process is given by:
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = a + [1 - (1 - L)^d] X_t, \qquad (2)$$
where Eε0 = 1, L denotes the lag operator, and a > 0 and 0 ≤ d < 1 are given parameters. In contrast to the original GARCH formulation of Engle (1982) and Bollerslev (1986), the FIGARCH model assigns hyperbolically decaying weights to the past history of the process. As shown in Baillie, Bollerslev and Mikkelsen (1996), the FIGARCH model implies infinite first and higher-order unconditional moments of {(Xt, ψt) : t ∈ Z}, but does feature a hyperbolic decay rate of the impulse response function of ψt.

¹Throughout the paper we adopt the notation and terminology of the ARCH(∞) literature, whereby we only consider sequences of non-negative random variables, representing squared returns and volatilities in the mainstream GARCH literature. All statistical concepts used in the paper are therefore based on this notation. For example, “covariance stationary GARCH sequence” refers to a GARCH model with a well-defined time-invariant autocorrelation function of squared returns and volatility.
Infinite unconditional moments of the sequence {(Xt, ψt) : t ∈ Z} in the FIGARCH model may be unattractive in many empirical settings, especially for modeling financial durations data, where they imply an infinite unconditional expected waiting time until the next high-frequency event. Moreover, Giraitis, Kokoszka and Leipus (2000) show that the covariance stationary version of the Ding and Granger (1996) model, which is closely related to (2), has an absolutely summable autocovariance function, and hence short memory as defined in McLeod and Hipel (1978).
The short-memory nature of the Ding and Granger (1996) model is, among other factors, due to the summable coefficients of the polynomial 1 − (1 − z)^d, which also appears in the definition of ψt in the FIGARCH process (2). Giraitis, Robinson and Surgailis (2000) relax the summability requirement by perturbing ψ∗t with zero-centered random variables {Yt : t ∈ Z} as shown below:
$$X_t = Y_t^2, \qquad Y_t = \psi^*_t \cdot z_t, \qquad \psi^*_t = a + (1 - L)^{-d} Y_{t-1}, \qquad (3)$$
where {zt : t ∈ Z} is a sequence of i.i.d. random variables with mean zero and unit variance. Giraitis, Robinson and Surgailis (2000) further show that for 0 < d < 1/2, {Xt : t ∈ Z} is covariance stationary with a non-summable autocovariance function.
Note that the process {ψ∗t : t ∈ Z} in the linear ARCH model (3) takes values in R, and hence lacks the usual volatility interpretation of {ψt : t ∈ Z} in the class of GARCH processes. This precludes applications of linear ARCH models in high-frequency financial econometrics, where {ψt : t ∈ Z} is also required to be non-negative. Additionally, expressions for the autocovariance function of {Xt : t ∈ Z} in model (3) are complicated due to the square transformation of {Yt : t ∈ Z}.
In this paper we introduce a new class of processes, referred to as the MD-ARCH(∞) class, which extends the covariance stationary GARCH sequences of Engle (1982) and Bollerslev (1986) to the case of non-summable autocovariances. We stay within the general framework of (1): as in short-memory GARCH processes, ψt is a linear function of the zero-centered innovations {Xt − ψt : t ∈ Z}, weighted by a sequence of coefficients {θj : j ≥ 0}. We show that the conditions for existence, non-negativity and covariance stationarity of MD-ARCH(∞) sequences allow for the case of square-summable {θj : j ≥ 0} and long memory in {(Xt, ψt) : t ∈ Z}. We derive closed-form expressions for the moments of {(Xt, ψt) : t ∈ Z} in terms of the underlying parameters. For the important case of hyperbolically decaying {θj : j ≥ 0}, we derive the functional limit of normalized partial sums of {Xt : t ∈ Z}. Finally, we give an overview of statistical inference methods for the new class of models.
The paper is organized as follows. Section 2 collects the main results of the paper, introducing the class of MD-ARCH(∞) sequences and the conditions for existence, non-negativity and stationarity of the new model, and ending with a functional CLT for partial sums of {Xt : t ∈ Z}. Section 3 describes semiparametric and fully parametric approaches to statistical inference for long-memory MD-ARCH(∞) sequences. The conclusion summarizes the findings. Full proofs of the main results are presented in the Appendix.
2 Sequences of long-memory covariance stationary non-negative
random variables
In this section we introduce and study the class of MD-ARCH(∞) processes. The new class contains the short-memory covariance stationary GARCH sequences of Engle (1982) and Bollerslev (1986). We derive sufficient conditions for the covariance stationarity of MD-ARCH(∞) processes and show that these conditions allow for long memory.
2.1 Representations of GARCH sequences
In a series of recent articles, Giraitis, Kokoszka and Leipus (2000) and Kazakevicius and Leipus (2002) study statistical properties of a wide class of GARCH models by expressing them in the framework of ARCH(∞) processes, defined as follows:
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = a^* + \sum_{j=1}^{\infty} \theta^*_{j-1} X_{t-j}, \qquad (4)$$
where a∗ > 0, θ∗j : j ≥ 0 ⊆ R0+ and εt : t ∈ Z is a sequence of i.i.d. non-negative random
variables. Sufficient conditions for stationarity of ARCH(∞) sequences derived by Giraitis,
Kokoszka and Leipus (2000) imply absolute summability of the coefficients θ∗j : j ≥ 0, and
ultimately the short-memory nature of the process.
Absolute summability of θ∗j : j ≥ 0 in the ARCH(∞) framework of Giraitis, Kokoszka
and Leipus (2000) and Kazakevicius and Leipus (2002) is necessary to ensure convergence of
the infinite series in the definition of ψt in (4). Consider the following alternative to (4):
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = a + \sum_{j=1}^{\infty} \theta_{j-1}(X_{t-j} - \psi_{t-j}), \qquad (5)$$
where the following assumptions hold:
A1. εt : t ∈ Z is defined on the common probability space (Ω,F ,P), and consists of i.i.d.
copies of a non-negative random variable ε0 with Eε0 = 1.
A2. a > 0 and θj : j ≥ 0 ⊆ R0+.
The ψt part of (5) is formulated in terms of the sequence of zero-centered innovations {Xt − ψt : t ∈ Z}, where Xt − ψt = ψt(εt − 1), and ψt and εt are independent for each t ∈ Z. In this paper we only consider the case in which the first two moments of ψt are finite for each t ∈ Z. It follows that E[Xt − ψt] = 0 and E[Xt − ψt | Ft−1] = 0 for each t ∈ Z, Ft being the process filtration, and hence {Xt − ψt : t ∈ Z} is a sequence of martingale difference innovations. This structure of innovations, much as in the class of linear ARFIMA processes, can potentially ensure convergence of the infinite series in (5) without assuming absolute summability of {θj : j ≥ 0}. Model (5) will be referred to as the MD-ARCH(∞) model.
The specification of GARCH models in terms of the zero-centered innovations {Xt − ψt : t ∈ Z} can be traced back to Robinson (1991) and Robinson and Henry (1999). In the latter study the authors show that a particular choice of weighting coefficients in such models can lead to non-summable autocovariances and long memory. However, they leave open the crucial questions of covariance stationarity and non-negativity of such specifications of GARCH processes, noting that the results of Giraitis, Kokoszka and Leipus (2000) may contradict their conjectures.
The issues of covariance stationarity and non-negativity of the proposed MD-ARCH(∞)
class of processes are central to this paper. We start with the following examples, demonstrating
the range of potential parametrizations of the MD-ARCH(∞) sequences:
EXAMPLE 1. Consider the covariance stationary GARCH(p,q) model of Engle (1982) and Bollerslev (1986). It can be written in our notation as:
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = a[1 - A(1) - B(1)] + [A(L) + B(L)]\psi_t + A(L)(X_t - \psi_t), \qquad (6)$$
where A(z) := ∑_{j=1}^{q} αj z^j, B(z) := ∑_{j=1}^{p} βj z^j, and A1 holds. Recall that the covariance stationarity assumption implies that all roots of 1 − A(z) − B(z) = 0 lie outside the unit circle, see Bollerslev (1986), and hence we can rewrite the process as:
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = a + A(L)[1 - A(L) - B(L)]^{-1}(X_t - \psi_t). \qquad (7)$$
This representation of the covariance stationary GARCH(p,q) model is closely related to (5). In particular, the covariance stationarity assumption implies that {Xt − ψt : t ∈ Z} in (7) is a square-integrable martingale difference sequence. The power series expansion of A(z)[1 − A(z) − B(z)]^{−1} gives a sequence of absolutely summable coefficients with an exponential rate of decay, which can be found from the following recursion (with αj := 0 for j > q and βj := 0 for j > p):
$$\theta_0 = \alpha_1, \quad \theta_1 = \alpha_2 + [\beta_1 + \alpha_1]\theta_0, \quad \theta_2 = \alpha_3 + [\beta_1 + \alpha_1]\theta_1 + [\beta_2 + \alpha_2]\theta_0, \quad \ldots \qquad (8)$$
Using these results, the MD-ARCH(∞) representation of the GARCH(1,1) model can be written as follows:
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = a + \sum_{j=1}^{\infty} \alpha_1[\beta_1 + \alpha_1]^{j-1}(X_{t-j} - \psi_{t-j}). \qquad (9)$$
Covariance stationarity of the GARCH(p,q) sequence (7) ensures that Eψ0^2 < ∞ and EX0^2 < ∞.
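The correspondence between GARCH(1,1) and its MD-ARCH(∞) form (9) can be checked numerically. The Python sketch below is illustrative only, not the author's code: the function names are hypothetical, the weights are obtained by expanding A(z)[1 − A(z) − B(z)]^{−1} through the identity C(z)[1 − A(z) − B(z)] = A(z) with C(z) := ∑_{j≥1} θj−1 z^j, and both processes are assumed to be started at t = 0 with ψ0 = a and no earlier innovations, so that the infinite sum in (5) becomes finite.

```python
# Illustrative sketch; function names are hypothetical, not from the paper.
import numpy as np


def garch_to_mdarch_theta(alpha, beta, n):
    """Return theta_0..theta_{n-1} for GARCH(p,q), alpha=[a_1..a_q], beta=[b_1..b_p].

    From C(z)[1 - A(z) - B(z)] = A(z), with C(z) = sum_{j>=1} theta_{j-1} z^j:
    theta_{j-1} = alpha_j + sum_{k=1}^{j-1} (alpha_k + beta_k) theta_{j-1-k}.
    """
    size = max(len(alpha), len(beta), n) + 1
    a = np.zeros(size)
    b = np.zeros(size)
    a[1:len(alpha) + 1] = alpha          # alpha_j = 0 for j > q
    b[1:len(beta) + 1] = beta            # beta_j = 0 for j > p
    theta = np.zeros(n)
    for j in range(1, n + 1):
        s = a[j]
        for k in range(1, j):
            s += (a[k] + b[k]) * theta[j - 1 - k]
        theta[j - 1] = s
    return theta


def simulate_mdarch(theta, a, eps):
    """Started recursion (5): psi_t = a + sum_{j=1}^{t} theta_{j-1} (X_{t-j} - psi_{t-j}).

    Innovations before t = 0 are taken to be zero; requires len(theta) >= len(eps).
    """
    T = len(eps)
    psi, x, u = np.empty(T), np.empty(T), np.empty(T)   # u_t = X_t - psi_t
    for t in range(T):
        past = u[:t][::-1]                              # u_{t-1}, ..., u_0
        psi[t] = a + np.dot(theta[:t], past)
        x[t] = psi[t] * eps[t]
        u[t] = x[t] - psi[t]
    return x, psi


def simulate_garch11(alpha1, beta1, a, eps):
    """GARCH(1,1) as in (6): psi_t = a(1 - alpha1 - beta1) + alpha1 X_{t-1} + beta1 psi_{t-1}."""
    T = len(eps)
    psi, x = np.empty(T), np.empty(T)
    psi[0] = a
    x[0] = psi[0] * eps[0]
    for t in range(1, T):
        psi[t] = a * (1 - alpha1 - beta1) + alpha1 * x[t - 1] + beta1 * psi[t - 1]
        x[t] = psi[t] * eps[t]
    return x, psi


rng = np.random.default_rng(0)
eps = rng.standard_normal(500) ** 2      # i.i.d. chi^2_1 innovations, so E eps_0 = 1 (A1)
alpha1, beta1, a = 0.1, 0.8, 1.0         # illustrative parameter values
theta = garch_to_mdarch_theta([alpha1], [beta1], len(eps))
x_md, psi_md = simulate_mdarch(theta, a, eps)
x_g, psi_g = simulate_garch11(alpha1, beta1, a, eps)
print(np.max(np.abs(psi_md - psi_g)))   # differences are at rounding-error level
```

The quadratic-time MD-ARCH loop is deliberately naive: it mirrors the sum in (5) term by term, which is the point of the comparison.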
EXAMPLE 2. A potentially important parametrization of {θj : j ≥ 0} in model (5) is given by the coefficients from the power series expansion of (1 − z)^{−d}, previously discussed in Robinson (1991) and Robinson and Henry (1999). This specification forms a building block of many parametric long-memory time-series models, the most popular being the class of linear ARFIMA models of Granger and Joyeux (1980) and Hosking (1981). The coefficients of the expansion are given by:
$$\theta_j := \frac{\Gamma(d + j)}{\Gamma(d)\Gamma(1 + j)} \quad \forall j \ge 0, \qquad (10)$$
where 0 ≤ d < 1/2 and Γ is the gamma function. In the class of linear time-series models, the sequence of square-summable hyperbolically decaying coefficients (10) is known to lead to a non-summable autocovariance function, and hence long memory as defined in McLeod and Hipel (1978); see also Hosking (1996).
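The coefficients (10) are easy to compute without evaluating large gamma functions, through the ratio θj/θj−1 = (j − 1 + d)/j. The Python sketch below (the function name is hypothetical) uses this recursion and checks three properties numerically for an illustrative d = 0.3: the hyperbolic rate θj ≈ j^{d−1}/Γ(d), square summability with ∑θj² = Γ(1 − 2d)/Γ(1 − d)² (the k = 0 case of the summation formula of Hosking (1996) used in Example 4), and the growth of the partial sums ∑θj, which shows that the coefficients are not absolutely summable.

```python
# Illustrative sketch; frac_coeffs is a hypothetical helper, not from the paper.
import math
import numpy as np


def frac_coeffs(d, n):
    """theta_j = Gamma(d + j) / (Gamma(d) Gamma(1 + j)) for j = 0..n-1, as in (10).

    Uses theta_j = theta_{j-1} (j - 1 + d) / j, which avoids overflow of Gamma for
    large j and also covers the limiting case d = 0 (theta = 1, 0, 0, ...).
    """
    theta = np.empty(n)
    theta[0] = 1.0
    for j in range(1, n):
        theta[j] = theta[j - 1] * (j - 1 + d) / j
    return theta


d = 0.3
theta = frac_coeffs(d, 100_000)

# Hyperbolic decay: theta_j * j^(1-d) * Gamma(d) should be close to 1 for large j.
j = 50_000
print(theta[j] * j ** (1 - d) * math.gamma(d))

# Square-summable for d < 1/2: truncated sum versus the closed form.
print(np.sum(theta ** 2), math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2)

# Not absolutely summable for d > 0: partial sums keep growing with the truncation point.
print(np.sum(theta[:1_000]), np.sum(theta))
```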
Of the two examples above, the first is of particular importance, showing that the covariance stationary GARCH(p,q) model is a member of the class of MD-ARCH(∞) sequences, just as it is nested within the original ARCH(∞) framework of Giraitis, Kokoszka and Leipus (2000) and Kazakevicius and Leipus (2002). More importantly, Example 1 demonstrates that even though the sequence of innovations {Xt − ψt : t ∈ Z} in (7) is supported on R, the process {(Xt, ψt) : t ∈ Z} is non-negative with probability one.
In the remainder of this section we address the following three issues. First, we show that covariance stationary MD-ARCH(∞) sequences can be constructed assuming only square summability of {θj : j ≥ 0}, and that a non-summable autocovariance function of {(Xt, ψt) : t ∈ Z} can be obtained. Second, we study conditions for non-negativity of MD-ARCH(∞) sequences. Finally, we derive the functional limit of the appropriately normalized partial sums of {Xt : t ∈ Z} in the case of the square-summable hyperbolically decaying coefficients {θj : j ≥ 0} from Example 2. This limit is potentially useful for semiparametric inference in the class of long-memory MD-ARCH(∞) sequences.
2.2 Covariance stationary MD-ARCH(∞) sequences
The convergence of the infinite series in the definition of the MD-ARCH(∞) process (5) is central to the existence and stationarity of the model. The sequence of innovations {Xt − ψt : t ∈ Z} is constructed from the past history of the process itself, and hence the linear representation of ψt in (5) cannot be used directly to study the properties of the MD-ARCH(∞) process. Instead, we follow the approach of Giraitis, Kokoszka and Leipus (2000) and Kazakevicius and Leipus (2002) and use a Volterra series representation of (5), given by:
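$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = a \sum_{k=0}^{\infty} M(k, t), \qquad (11)$$
where for each t ∈ Z the sequence {M(k, t) : k ≥ 0} is defined as:
$$M(0, t) := 1, \qquad M(k, t) := \sum_{j_1, \ldots, j_k = 1}^{\infty} \theta_{j_1 - 1} \cdots \theta_{j_k - 1} (\varepsilon_{t - j_1} - 1) \cdots (\varepsilon_{t - j_1 - \cdots - j_k} - 1) \quad \forall k \ge 1. \qquad (12)$$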
Expressed in Volterra series form, the ψt part of the MD-ARCH(∞) process (11) is given by the infinite sum of random variables {M(k, t) : k ≥ 0}, which are themselves non-linear functions of the underlying sequence of innovations {εt : t ∈ Z}. Under appropriate assumptions on {θj : j ≥ 0} and on the moments of {εt : t ∈ Z}, we are able to show that for each t ∈ Z, {M(k, t) : k ≥ 0} is an L2 sequence of mutually orthogonal elements with exponentially decreasing variance; see Lemma 1 in the Appendix. This finding considerably simplifies the derivation of the following two theorems.
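For a started process with no innovations before t = 0, the sums in (12) are finite, and factoring out the first index j1 in (12) gives the nested form M(k, t) = ∑_{j≥1} θj−1(εt−j − 1) M(k − 1, t − j). The Python sketch below uses this form to check that a∑k M(k, t) reproduces ψt from the recursion (5), where Xt − ψt = ψt(εt − 1). The function names, the exponential innovations and the geometric weights are hypothetical choices made only for illustration, as is the starting convention.

```python
# Illustrative sketch; function names and parameter values are hypothetical.
import numpy as np


def psi_direct(theta, a, eps):
    """psi_t from (5), using X_t - psi_t = psi_t (eps_t - 1); no innovations before t = 0."""
    T = len(eps)
    psi = np.empty(T)
    for t in range(T):
        psi[t] = a + sum(theta[j - 1] * psi[t - j] * (eps[t - j] - 1) for j in range(1, t + 1))
    return psi


def psi_volterra(theta, a, eps):
    """psi_t = a * sum_k M(k, t) as in (11), with M(k, t) built from the nested form of (12).

    With no innovations before t = 0, M(k, t) = 0 whenever k > t, so only k <= t is filled.
    """
    T = len(eps)
    M = np.zeros((T, T))        # M[k, t]
    M[0, :] = 1.0               # M(0, t) = 1
    for t in range(T):
        for k in range(1, t + 1):
            M[k, t] = sum(theta[j - 1] * (eps[t - j] - 1) * M[k - 1, t - j]
                          for j in range(1, t + 1))
    return a * M.sum(axis=0)


rng = np.random.default_rng(1)
eps = rng.exponential(1.0, size=30)     # i.i.d. non-negative innovations with E eps_0 = 1
theta = 0.3 * 0.7 ** np.arange(30)      # illustrative square-summable weights
print(np.max(np.abs(psi_direct(theta, 1.0, eps) - psi_volterra(theta, 1.0, eps))))
```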
THEOREM 1. Under A1–A2 and conditions
$$\sum_{j=1}^{\infty} [\log j]^2 \theta_j^2 < \infty, \qquad (13)$$
$$E(\varepsilon_0 - 1)^2 \sum_{j=0}^{\infty} \theta_j^2 < 1, \qquad (14)$$
the process {(Xt, ψt) : t ∈ Z} defined in (5), equivalently (11)–(12), is finite a.e. on (Ω, F, P), and is stationary and ergodic.
Note that conditions (13) and (14) involve only squares of {θj : j ≥ 0}, and therefore allow for parametrizations involving the hyperbolically decaying coefficients of Example 2. Under the stronger assumption of absolute summability of {θj : j ≥ 0}, as in the case of stationary GARCH sequences (7), the existence of the second moment of the innovations {εt : t ∈ Z} implied by (14) can be relaxed; see Kazakevicius and Leipus (2002) for a thorough discussion of the latter case.
Sufficient conditions for the covariance stationarity of the MD-ARCH(∞) process are given
in the following theorem:
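THEOREM 2. Assume A1–A2 and (14). Then the sequence {(Xt, ψt) : t ∈ Z} defined in (5) is covariance stationary, where for each t ∈ Z and k ≥ 0:
$$EX_t = E\psi_t = a, \qquad (15)$$
$$E[(\psi_{t+k} - a)(\psi_t - a)] = \frac{a^2 E(\varepsilon_0 - 1)^2}{1 - E(\varepsilon_0 - 1)^2 \sum_{j=0}^{\infty} \theta_j^2} \sum_{j=0}^{\infty} \theta_j \theta_{j+k}, \qquad (16)$$
$$E[(X_{t+k} - a)(X_t - a)] = E[(\psi_{t+k} - a)(\psi_t - a)] + \frac{a^2 E(\varepsilon_0 - 1)^2}{1 - E(\varepsilon_0 - 1)^2 \sum_{j=0}^{\infty} \theta_j^2}\, \theta^*_k, \qquad (17)$$
and {θ∗k : k ≥ 0} is defined as θ∗0 := 1, θ∗k := θk−1 for k ≥ 1.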
The covariance stationarity condition (14) in Theorem 2 involves only sums of squared weighting coefficients {θj : j ≥ 0}, again allowing for the parametrization suggested in Example 2. In addition, Theorem 2 shows that the behavior of the autocovariance function of {(Xt, ψt) : t ∈ Z} is determined by the rate of decay of the weighting coefficients, as in the class of linear ARFIMA processes. Hence, the properties of the autocovariance function of MD-ARCH(∞) sequences are closely tied to the properties of {θj : j ≥ 0}. Implications of Theorem 2 are elaborated upon in the following examples:
EXAMPLE 3. Consider the covariance stationary GARCH(p,q) model in Example 1. Bollerslev (1986) and Ding and Granger (1996) derive the autocovariance function of the GARCH(1,1) model (9), where the innovations {εt : t ∈ Z} are i.i.d. copies of a χ²_1 random variable. Simple calculations show that, in this case, the covariance stationarity condition for MD-ARCH(∞) sequences in Theorem 2 reduces to the well-known restriction β1² + 2β1α1 + 3α1² < 1: indeed, E(ε0 − 1)² = 2 and ∑j θj² = α1²/[1 − (α1 + β1)²], so (14) reads 2α1² < 1 − (α1 + β1)². Similarly, the autocovariance functions (16) and (17) reduce to the corresponding expressions in Ding and Granger (1996).
The moment structure of general GARCH(p,q) sequences is studied in He and Teräsvirta (1999). Because a direct comparison of Theorem 2 with their results would require a considerable amount of technical detail, we do not pursue it in this example.
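Theorem 2 translates directly into a numerical routine once the weights are truncated at a finite length. The Python sketch below is a hypothetical helper, not part of the paper; it evaluates (16) and (17) for given θ0, …, θn−1 and κ := E(ε0 − 1)², and applies it to the GARCH(1,1) weights (9) with χ²_1 innovations (κ = 2) and illustrative values α1 = 0.1, β1 = 0.8. For GARCH(1,1), ∑j θjθj+k is proportional to (α1 + β1)^k, so successive ratios of the ψ-autocovariances should equal α1 + β1; the loop at the end compares condition (14) with the restriction of Example 3.

```python
# Illustrative sketch; mdarch_acov is a hypothetical helper, not from the paper.
import numpy as np


def mdarch_acov(theta, a, kappa, max_lag):
    """Autocovariances (16) and (17) for lags 0..max_lag, with kappa = E(eps_0 - 1)^2.

    theta holds a truncation theta_0..theta_{n-1} (n > max_lag) of the MD-ARCH weights.
    """
    theta = np.asarray(theta, dtype=float)
    s = kappa * np.sum(theta ** 2)
    if s >= 1.0:
        raise ValueError("condition (14) fails: kappa * sum(theta_j^2) >= 1")
    scale = a ** 2 * kappa / (1.0 - s)
    n = len(theta)
    acov_psi = np.array([scale * np.dot(theta[:n - k], theta[k:]) for k in range(max_lag + 1)])
    theta_star = np.concatenate(([1.0], theta[:max_lag]))   # theta*_0 = 1, theta*_k = theta_{k-1}
    return acov_psi, acov_psi + scale * theta_star


alpha1, beta1 = 0.1, 0.8
theta = alpha1 * (alpha1 + beta1) ** np.arange(2_000)       # GARCH(1,1) weights from (9)
acov_psi, acov_x = mdarch_acov(theta, a=1.0, kappa=2.0, max_lag=10)
print(acov_psi[1:4] / acov_psi[:3])                         # each ratio equals alpha1 + beta1

# Condition (14) with kappa = 2 versus beta1^2 + 2 beta1 alpha1 + 3 alpha1^2 < 1.
for al, be in [(0.1, 0.8), (0.05, 0.9), (0.3, 0.65), (0.4, 0.5)]:
    th = al * (al + be) ** np.arange(5_000)
    print(al, be, 2.0 * np.sum(th ** 2) < 1, be ** 2 + 2 * be * al + 3 * al ** 2 < 1)
```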
EXAMPLE 4. The autocovariance function of the GARCH(p,q) sequences in the previous example is known to decay at an exponential rate, see He and Teräsvirta (1999) and Giraitis, Kokoszka and Leipus (2000), and can be shown to be summable.
Example 2 introduces a parametrization of the MD-ARCH(∞) model with hyperbolically decaying square-summable coefficients {θj : j ≥ 0}. Consider the sequence of coefficients from the power series expansion of (1 − z)^{−d} given in (10). Using lag operator notation, the MD-ARCH(∞) model (5) can be written in this case as follows:
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = a + \gamma(1 - L)^{-d}(X_{t-1} - \psi_{t-1}), \qquad (18)$$
where an additional parameter γ > 0 is needed to guarantee a.s. non-negativity of the process, as shown in Example 6 of subsection 2.3. The autocovariance function of this model can be derived using the summation formula
$$\sum_{j=0}^{\infty} \frac{\Gamma(d+j)\Gamma(d+j+k)}{\Gamma^2(d)\Gamma(1+j)\Gamma(1+j+k)} = \frac{\Gamma(1-2d)\Gamma(d+k)}{\Gamma(d)\Gamma(1-d)\Gamma(1-d+k)}$$
found in Hosking (1996). Together with Stirling's formula, this implies that E[(Xt+k − a)(Xt − a)] ∼ C k^{2d−1} as k → ∞ for some constant C > 0, when 0 < d < 1/2 and the covariance stationarity condition in Theorem 2 is satisfied. Thus, the autocovariance function of model (18) is non-summable. We refer to model (18) as the long-memory MD-ARCH(∞) model.
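The summation formula and the resulting k^{2d−1} rate can be checked numerically. In the Python sketch below (helper names are hypothetical), the closed form is evaluated with `math.lgamma` to avoid overflow at large k, compared against a truncated sum of products of the coefficients (10), and compared with its leading term Γ(1 − 2d)/(Γ(d)Γ(1 − d))·k^{2d−1}. Since the weights in (18) are γ times the coefficients (10), the k = 0 case also gives condition (14) for model (18) in closed form, κγ²Γ(1 − 2d)/Γ(1 − d)² < 1 with κ := E(ε0 − 1)²; this is a direct consequence of the formula rather than a result stated in the paper.

```python
# Illustrative sketch; helper names are hypothetical, not from the paper.
import math


def hosking_sum(d, k):
    """Closed form of sum_j theta_j theta_{j+k} for the coefficients (10), 0 < d < 1/2."""
    return math.exp(math.lgamma(1 - 2 * d) + math.lgamma(d + k)
                    - math.lgamma(d) - math.lgamma(1 - d) - math.lgamma(1 - d + k))


def truncated_sum(d, k, n=200_000):
    """sum_{j < n} theta_j theta_{j+k}, with theta_j from the ratio recursion for (10)."""
    theta = [1.0]
    for j in range(1, n + k):
        theta.append(theta[-1] * (j - 1 + d) / j)
    return sum(theta[j] * theta[j + k] for j in range(n))


def max_gamma(d, kappa):
    """Condition (14) for model (18) holds if and only if gamma < max_gamma(d, kappa)."""
    return math.sqrt(math.gamma(1 - d) ** 2 / (kappa * math.gamma(1 - 2 * d)))


d = 0.2                                                   # illustrative memory parameter
for k in (0, 5):
    print(k, hosking_sum(d, k), truncated_sum(d, k))      # agree up to truncation error

c = math.gamma(1 - 2 * d) / (math.gamma(d) * math.gamma(1 - d))
for k in (10, 1_000, 100_000):
    print(k, hosking_sum(d, k) / (c * k ** (2 * d - 1)))  # tends to 1: decay rate k^(2d-1)

print(max_gamma(d, kappa=2.0))                            # e.g. chi^2_1 innovations
```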
A potentially interesting variation of model (18), allowing for both summable and non-summable autocovariance functions, can be defined as:
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = a + \gamma(1 - \phi L)^{-1}(1 - L)^{-d}(X_{t-1} - \psi_{t-1}), \qquad (19)$$
where 0 ≤ φ < 1 and γ > 0. The sequence of coefficients {θj : j ≥ 0} in this specification is given by θj := γ ∑_{k=0}^{j} φ^k θ∗_{j−k} ∀j ≥ 0, with {θ∗j : j ≥ 0} as defined in (10). For d = 0 and 0 < φ < 1, θj = γφ^j, and the process (19) reduces to the covariance stationary GARCH(1,1) model (9) with α1 = γ and β1 = φ − γ (provided φ ≥ γ). The process has a non-summable autocovariance function when d > 0 and the covariance stationarity condition (14) is satisfied.
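The weights of (19) are the convolution of the geometric sequence φ^k with the coefficients (10), scaled by γ, so they can be generated by the one-step recursion sj = φ sj−1 + θ∗j, θj = γ sj. The Python sketch below (function names and parameter values are hypothetical) implements this recursion, checks it against a direct convolution, and confirms that d = 0 gives the GARCH(1,1) weights γφ^j.

```python
# Illustrative sketch; function names and parameter values are hypothetical.
import numpy as np


def frac_coeffs(d, n):
    """Coefficients (10) via theta_j = theta_{j-1} (j - 1 + d) / j, theta_0 = 1."""
    th = np.empty(n)
    th[0] = 1.0
    for j in range(1, n):
        th[j] = th[j - 1] * (j - 1 + d) / j
    return th


def weights_19(gamma, phi, d, n):
    """theta_j = gamma * sum_{k=0}^{j} phi^k theta*_{j-k}, j = 0..n-1, for model (19)."""
    base = frac_coeffs(d, n)
    theta = np.empty(n)
    acc = 0.0
    for j in range(n):
        acc = phi * acc + base[j]       # acc = sum_{k=0}^{j} phi^k base[j - k]
        theta[j] = gamma * acc
    return theta


gamma, phi, d, n = 0.1, 0.9, 0.25, 200
theta = weights_19(gamma, phi, d, n)
direct = gamma * np.convolve(phi ** np.arange(n), frac_coeffs(d, n))[:n]
print(np.max(np.abs(theta - direct)))          # rounding-level difference

# d = 0: GARCH(1,1) weights alpha1 (alpha1 + beta1)^j with alpha1 = gamma, beta1 = phi - gamma
print(weights_19(gamma, phi, 0.0, 5))
```

The recursion costs O(n) instead of the O(n²) of the direct convolution, which matters when long truncations are needed for d close to 1/2.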
EXAMPLE 5. One of the models introduced by Ding and Granger (1996) for capturing the long memory dynamics in the volatility of financial returns can be written in our notation as follows:
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = [1 - (1 - L)^d] X_t. \qquad (20)$$
In the terminology of Ding and Granger (1996), this model corresponds to the case “µ = 1”. While Giraitis, Kokoszka and Leipus (2000) find that the case “µ < 1” has a summable autocovariance function and hence short memory, they could not draw a definitive conclusion about model (20) because it violates their sufficient stationarity condition.
Consider the following parametrization of the MD-ARCH(∞) process (5):
$$X_t = \psi_t \cdot \varepsilon_t, \qquad \psi_t = a + [(1 - L)^{-d} - 1](X_t - \psi_t). \qquad (21)$$
By analogy with the long-memory MD-ARCH(∞) model of the previous example, one can show that under the conditions of Theorem 2, model (21) is covariance stationary with a non-summable autocovariance function. This permits inversion of (1 − L)^{−d} and allows one to establish the equivalence of (21) and model (20) of Ding and Granger (1996).
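The inversion step can be spelled out as a formal lag-operator manipulation, assuming, as stated above, that the conditions of Theorem 2 justify inverting (1 − L)^{−d}. Moving the ψt terms of (21) to the left-hand side and then applying (1 − L)^d gives

$$\begin{aligned}
\psi_t + [(1-L)^{-d} - 1]\psi_t &= a + [(1-L)^{-d} - 1]X_t,\\
(1-L)^{-d}\psi_t &= a + [(1-L)^{-d} - 1]X_t,\\
\psi_t &= (1-L)^{d}a + [1 - (1-L)^{d}]X_t = [1 - (1-L)^{d}]X_t,
\end{aligned}$$

where the last equality uses that, for d > 0, the coefficients of (1 − z)^d sum to (1 − 1)^d = 0, so that (1 − L)^d annihilates the constant a. The result is exactly the Ding and Granger (1996) model (20).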
2.3 Non-negativity of MD-ARCH(∞) sequences
The definition of the MD-ARCH(∞) process involves the infinite series of weighted zero-centered innovations {Xt − ψt : t ∈ Z}. Consequently, non-negativity of the process is not immediate from the definition and can be expected to hold only under suitable parameter restrictions. Recall
that non-negativity of the GARCH(p,q) sequence (7) is a simple consequence of its finite-
dimensional representation (6). Unlike GARCH(p,q), general MD-ARCH(∞) sequences do not
possess such a finite-dimensional form. Instead, we work with a sequence of finite-dimensional
approximations to MD-ARCH(∞) as detailed below.
Define a conditional process $\{(X_{t,n}, \psi_{t,n}) : t \in \mathbb{Z}, n \ge 0\}$ as follows:
$$X_{t,n} = \psi_{t,n} \varepsilon_t, \qquad \psi_{t,n} = a + \theta_n(\psi - a) + \sum_{j=1}^{n} \theta_{j-1}(X_{t-j,n-j} - \psi_{t-j,n-j}), \qquad (22)$$
where $\psi \in \mathbb{R}_0^+$ is a known starting value and A1–A2 hold. The conditional process can be regarded as a "started" MD-ARCH(∞) sequence, where the infinite series in (5) is replaced by the finite stretch of the $n$ previous innovations. As stated in the first part of Theorem 3 below, the following additional assumption on the sequence of coefficients $\{\theta_j : j \ge 0\}$ is sufficient to ensure a.s. non-negativity of the sequence of conditional models (22):
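A3. For the sequence $\{\eta_j : j \ge 0\}$ defined as
$$\eta_0 := \theta_0, \qquad \eta_j := \theta_j - \sum_{k=0}^{j-1} \theta_{j-1-k}\, \eta_k \quad \forall j \ge 1, \qquad (23)$$
let $\{\eta_j : j \ge 0\} \subseteq \mathbb{R}_0^+$ and $\sum_{j=0}^{\infty} \eta_j \le 1$.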
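Assumption A3 can be checked numerically for a given (truncated) coefficient sequence by running recursion (23) directly. This is a minimal sketch with hypothetical function names; a finite truncation can only suggest, not prove, that the infinite sequence satisfies A3. As a sanity check we use GARCH(1,1)-type weights $\theta_j = \alpha(\alpha+\beta)^j$, our own mapping of $\psi_t = \omega + \alpha X_{t-1} + \beta\psi_{t-1}$ into the form (5), for which the recursion should return $\eta_j = \alpha\beta^j$, in line with Example 7 below.

```python
def eta_from_theta(theta):
    """Recursion (23): eta_0 = theta_0, eta_j = theta_j - sum_{k<j} theta_{j-1-k} * eta_k."""
    eta = []
    for j, th in enumerate(theta):
        eta.append(th - sum(theta[j - 1 - k] * eta[k] for k in range(j)))
    return eta


def check_a3(theta, tol=1e-12):
    """Check A3 on a truncated sequence: returns (all eta_j >= 0, sum of eta_j, eta)."""
    eta = eta_from_theta(theta)
    nonneg = all(e >= -tol for e in eta)
    return nonneg, sum(eta), eta


# GARCH(1,1)-type weights theta_j = alpha * (alpha + beta)^j (illustrative values).
alpha, beta = 0.1, 0.85
ok, total, eta = check_a3([alpha * (alpha + beta) ** j for j in range(300)])
print(ok, round(total, 6), alpha / (1 - beta))
```

The recursion costs $O(n^2)$ operations, which is harmless for the few hundred terms needed when $\theta_j$ decays geometrically; slowly decaying long-memory weights need longer truncations.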
Assumption A3 strengthens A2 by requiring $\{\theta_j : j \ge 0\}$ to be strictly positive and to decline to zero sufficiently fast. In the second part of Theorem 3, the sequence of a.s. non-negative conditional models $\{(X_{t,n}, \psi_{t,n}) : n \ge 0\}$ is shown to converge a.e. (along a subsequence) to the MD-ARCH(∞) process $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ for each $t \in \mathbb{Z}$, which establishes non-negativity of (5):
THEOREM 3. Under A1–A3:
1. For each $t \in \mathbb{Z}$, the sequence $\{(X_{t,n}, \psi_{t,n}) : n \ge 0\}$ defined in (22) is non-negative a.e.
2. Under condition (14), there exists a subsequence $\{n_j : j \ge 0\}$ such that for each $t \in \mathbb{Z}$, $(X_{t,n_j}, \psi_{t,n_j}) \xrightarrow{a.s.} (X_t, \psi_t)$ as $j \to \infty$, where $(X_t, \psi_t)$ is defined in (5).
EXAMPLE 6. Consider the sequence of square-summable hyperbolically decaying coefficients (10) in the long-memory MD-ARCH(∞) process (18). It can easily be shown that the sequence $\{\eta_j : j \ge 0\}$ implied by the model satisfies the inequality
$$\eta_j \ge \gamma (d - \gamma)^j \quad \forall j \ge 0.$$
Hence the condition $d \ge \gamma$ is sufficient for non-negativity of $\{\eta_j : j \ge 0\}$. No closed-form solution is available for the sum of the $\eta_j$'s in this model, but numerical results show that the second part of A3 is also satisfied.
In the Ding and Granger (1996) model (21), the sequence $\{\eta_j : j \ge 0\}$ is given by the coefficients of the power series expansion of $1 - (1 - z)^d$; for $0 < d < 1$ these coefficients are positive and sum to one, so A3 is satisfied.
Consider the combined short- and long-memory model (19). Simple calculations show that the sequence $\{\eta_j : j \ge 0\}$ for this process satisfies
$$\eta_j \ge \gamma (d - \gamma + \phi)^j \quad \forall j \ge 0$$
when $d(1 - d - 2\phi) \ge 0$. The latter condition, together with $d - \gamma + \phi \ge 0$, is therefore sufficient for the first part of A3 to hold. Numerical calculations can be used to show that the second part of A3 is satisfied as well.
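The bounds in Example 6 can be inspected numerically. The sketch below assumes, as in the discussion of (19), that the $\theta^*_j$ are the coefficients of $(1-z)^{-d}$, and treats model (18) as the special case $\phi = 0$ of (19), which matches the coefficient formula given for (19); both points are our reading rather than statements from the text, and all names are hypothetical. It computes $\eta_j$ by recursion (23), checks $\eta_j \ge \gamma(d - \gamma + \phi)^j$ on the first `n` terms and reports the partial sum $\sum_{j<n}\eta_j$, which A3 requires to stay at or below one.

```python
def frac_coeffs(d, n):
    """First n coefficients of (1 - z)^(-d)."""
    c = [1.0]
    for j in range(1, n):
        c.append(c[-1] * (j - 1 + d) / j)
    return c[:n]


def theta_model19(gamma, phi, d, n):
    """theta_j = gamma * sum_{k=0}^{j} phi^k * theta*_{j-k}; phi = 0 gives model (18)."""
    star, theta, acc = frac_coeffs(d, n), [], 0.0
    for j in range(n):
        acc = star[j] + phi * acc
        theta.append(gamma * acc)
    return theta


def eta_from_theta(theta):
    """Recursion (23)."""
    eta = []
    for j, th in enumerate(theta):
        eta.append(th - sum(theta[j - 1 - k] * eta[k] for k in range(j)))
    return eta


def example6_check(gamma, phi, d, n=500, tol=1e-12):
    """Return (lower bound holds on the first n terms, partial sum of eta_j)."""
    eta = eta_from_theta(theta_model19(gamma, phi, d, n))
    rho = d - gamma + phi
    bound_ok = all(e >= gamma * rho ** j - tol for j, e in enumerate(eta))
    return bound_ok, sum(eta)


print(example6_check(gamma=0.2, phi=0.0, d=0.4))   # model (18): needs d >= gamma
print(example6_check(gamma=0.2, phi=0.1, d=0.4))   # model (19): d(1 - d - 2 phi) >= 0
```

Because $\theta_j$ decays only hyperbolically here, the partial sums converge slowly, so a finite `n` illustrates rather than verifies the second part of A3.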
EXAMPLE 7. Representation (6) of covariance stationary GARCH(p,q) sequences ensures their a.s. non-negativity. Using (8), one can easily check that A3 is also satisfied, where $\{\eta_j : j \ge 0\}$ is recursively given by
$$\eta_0 = \alpha_1, \quad \eta_1 = \alpha_2 + \beta_1 \eta_0, \quad \eta_2 = \alpha_3 + \beta_1 \eta_1 + \beta_2 \eta_0, \quad \ldots$$
This sequence coincides with the coefficients $\{\theta^*_j : j \ge 0\}$ in the ARCH(∞) representation (4) of covariance stationary GARCH(p,q) models. The sufficient covariance stationarity condition of Giraitis, Kokoszka and Leipus (2000), together with A1, implies that $\sum_{j=0}^{\infty} \eta_j < 1$.
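The recursion of Example 7 extends to any finite GARCH order. In this hypothetical sketch `alphas` holds $\alpha_1, \alpha_2, \ldots$ and `betas` holds $\beta_1, \beta_2, \ldots$, with coefficients beyond the supplied lags treated as zero, so that $\eta_j = \alpha_{j+1} + \sum_{i \ge 1} \beta_i \eta_{j-i}$. The sum of the truncated sequence can then be compared with $\sum_i \alpha_i / (1 - \sum_i \beta_i)$, its limit when $\sum_i \beta_i < 1$.

```python
def garch_eta(alphas, betas, n):
    """eta_j = alpha_{j+1} + sum_{i=1}^{j} beta_i * eta_{j-i}, j = 0..n-1 (zero past given lags)."""
    eta = []
    for j in range(n):
        val = alphas[j] if j < len(alphas) else 0.0
        for i in range(1, min(j, len(betas)) + 1):
            val += betas[i - 1] * eta[j - i]
        eta.append(val)
    return eta


alphas, betas = [0.05, 0.05], [0.8]          # illustrative GARCH coefficients
eta = garch_eta(alphas, betas, 400)
print(eta[:3], sum(eta), sum(alphas) / (1 - sum(betas)))
```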
2.4 Convergence to fractional Brownian motion
Theorem 2 shows that summability of the autocovariance function of the MD-ARCH(∞) model depends on the properties of $\{\theta_j : j \ge 0\}$. In particular, the GARCH(p,q) model (7) has summable autocovariances, while the sequence of hyperbolically decaying square-summable coefficients in the long-memory MD-ARCH(∞) models (18) and (21) leads to a non-summable autocovariance function. In the case of the linear long-memory ARFIMA models of Granger and Joyeux (1980) and Hosking (1981), the limit of appropriately normalized partial sums is given by fractional Brownian motion; see Marinucci and Robinson (1999), among others. In this subsection we establish a similar result for partial sums of long-memory MD-ARCH(∞) sequences.
Let us introduce the following notation. For $0 < H < 1$ and $r \in \mathbb{R}$, let $B_H(r)$ denote a zero-mean Gaussian process with covariance function given by
$$E[B_H(r_1) B_H(r_2)] = \tfrac{1}{2}\left(|r_1|^{2H} + |r_2|^{2H} - |r_1 - r_2|^{2H}\right).$$
This process was introduced by Mandelbrot and Van Ness (1968) and is commonly referred to as fractional Brownian motion. Note that the case $H = \tfrac{1}{2}$ corresponds to standard Brownian motion. Let $\Rightarrow$ stand for convergence of finite-dimensional distributions, and let $\lfloor \cdot \rfloor$ denote the floor function.
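The covariance function above is easy to evaluate directly. The sketch below (function names are ours) also computes the autocovariance of the unit increments $B_H(k+1) - B_H(k)$, i.e. fractional Gaussian noise, from the same formula. For $H = d + \tfrac{1}{2}$ with $d > 0$ these increments are positively correlated and decay like $k^{2H-2}$, which is not summable, whereas for $H = \tfrac{1}{2}$ they are uncorrelated.

```python
def fbm_cov(r1, r2, H):
    """E[B_H(r1) B_H(r2)] = 0.5 * (|r1|^{2H} + |r2|^{2H} - |r1 - r2|^{2H}), 0 < H < 1."""
    if not 0.0 < H < 1.0:
        raise ValueError("H must lie in (0, 1)")
    return 0.5 * (abs(r1) ** (2 * H) + abs(r2) ** (2 * H) - abs(r1 - r2) ** (2 * H))


def fgn_autocov(k, H):
    """Cov(B_H(k+1) - B_H(k), B_H(1) - B_H(0)), built from fbm_cov."""
    return (fbm_cov(k + 1, 1, H) - fbm_cov(k, 1, H)
            - fbm_cov(k + 1, 0, H) + fbm_cov(k, 0, H))


d = 0.3                                    # illustrative long-memory parameter
H = d + 0.5                                # Hurst index appearing in Theorem 4
print([round(fgn_autocov(k, H), 4) for k in range(1, 6)])
print([fgn_autocov(k, 0.5) for k in range(1, 6)])   # H = 1/2: uncorrelated increments
```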
It is well known that the functional limit of normalized partial sums of random variables depends on the summability of their autocovariances. Under certain additional assumptions, such as a linear structure of the underlying time series process, a summable autocovariance function implies a limit given by standard Brownian motion. On the other hand, non-summable autocovariances lead to convergence to fractional Brownian motion.
Although GARCH(p,q) sequences (7) do not belong to the class of linear time series mod-
els, Giraitis, Kokoszka and Leipus (2000) show that the functional limit of such sequences is
given by the standard Brownian motion. Using similar techniques, Giraitis, Robinson and
Surgailis (2000) demonstrate that the normalized partial sums of their linear ARCH model (3)
converge in finite-dimensional distributions to the fractional Brownian motion. In line with
these results we show that the following limit holds for the long memory MD-ARCH(∞) se-
quences:
THEOREM 4. Under A1–A2, let $\{X_t : t \in \mathbb{Z}\}$ be defined by (5) such that, for some $d > 0$, the sequence of coefficients $\{\theta_j : j \ge 0\}$ satisfies $\theta_j = O(j^{d-1})$ together with condition (14) of Theorem 1. Then the following distributional limit holds for each $0 < r \le 1$:
$$\frac{1}{c_d\, T^{d+1/2}} \sum_{t=1}^{\lfloor Tr \rfloor} (X_t - a) \Rightarrow B_{d+1/2}(r) \quad \text{as } T \to \infty,$$
where, for $0 < K_d < \infty$, the coefficient $c_d$ is defined by $c_d^2 := E(\varepsilon_0 - 1)^2\, E\psi_0^2\, \dfrac{K_d}{d(1+2d)}$.
3 Statistical inference for MD-ARCH(∞) model
This section presents an account of the methods for statistical inference for the MD-ARCH(∞)
sequences introduced in section 2. Considering the novel structure of the model, our discussion
in this section will necessarily have a preliminary character. Using the functional limit result
of subsection 2.4, we first show how a well-known semiparametric estimator can be used to obtain inference on the parameter $d$ in long-memory MD-ARCH(∞) sequences. In the second subsection we propose a time-domain quasi-maximum likelihood estimator for simultaneous estimation of all parameters of the model.
3.1 Semiparametric inference for long-memory MD-ARCH(∞) sequences
The class of MD-ARCH(∞) models, introduced in section 2, allows for parsimonious statistical modeling of long-memory covariance stationary non-negative sequences $\{X_t : t \in \mathbb{Z}\}$. In particular, we showed that the limiting behavior of the autocovariance function of the long-memory MD-ARCH(∞) model (18) and of the Ding and Granger (1996) model (21) depends on the parameter $d$. Using methods developed for general long-memory sequences, inference on $d$ can be obtained separately from the other parameters of the model, which in some applications are of secondary importance to the researcher. In this subsection we discuss estimation of $d$ using the R/S statistic of Hurst (1951).
Let $\{x_t : 1 \le t \le T\}$ be a sample of the long-memory MD-ARCH(∞) process with parameter $d$ satisfying the assumptions of Theorem 4. Giraitis, Kokoszka, Leipus and Teyssiere (2000) consider the application of R/S analysis to the linear ARCH sequences (3) of Giraitis, Robinson and Surgailis (2000), for which a functional limit analogous to that in Theorem 4 is available.
Define the empirical range estimator as follows:
$$R_T = \max_{1 \le k \le T} \sum_{t=1}^{k} (x_t - \bar{x}_T) - \min_{1 \le k \le T} \sum_{t=1}^{k} (x_t - \bar{x}_T),$$
where $\bar{x}_T$ is the usual sample mean. Let $s_T^2$ be the sample variance estimator given by
$$s_T^2 = \frac{1}{T} \sum_{t=1}^{T} (x_t - \bar{x}_T)^2.$$
Under assumptions analogous to those of Theorem 4 in subsection 2.4, Giraitis, Kokoszka, Leipus and Teyssiere (2000) show the following limit of the R/S statistic as $T \to \infty$:
$$\frac{R_T}{T^{d+1/2}\, s_T} \xrightarrow{d} \frac{c_d \left[\max_{0 \le r \le 1} B^0_{d+1/2}(r) - \min_{0 \le r \le 1} B^0_{d+1/2}(r)\right]}{E[(X_0 - a)^2]^{1/2}}, \qquad (24)$$
where $\xrightarrow{d}$ denotes convergence in distribution, $c_d$ is given in Theorem 4, and $B^0_{d+1/2}(r) = B_{d+1/2}(r) - r B_{d+1/2}(1)$, for $0 \le r \le 1$, is a fractional Brownian bridge. Finally, an estimator of the long-memory parameter $d$ can be defined as follows:
$$\hat{d}_T := \frac{\log(R_T / s_T)}{\log T} - \frac{1}{2},$$
for which, by (24), the rate of convergence is given by $\hat{d}_T - d = O_p\big([\log T]^{-1}\big)$.
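A direct transcription of $R_T$, $s_T$ and $\hat{d}_T$ follows. It is a minimal sketch with hypothetical function names, assuming a non-constant sample (otherwise $s_T = 0$); given the $O_p([\log T]^{-1})$ rate, estimates from moderate samples should be read with caution.

```python
import math


def rs_statistic(x):
    """Return (R_T, s_T): range of the demeaned partial sums and the 1/T sample std. dev."""
    T = len(x)
    if T < 2:
        raise ValueError("need at least two observations")
    mean = sum(x) / T
    partial, max_s, min_s = 0.0, -math.inf, math.inf
    for xt in x:
        partial += xt - mean          # sum_{t<=k} (x_t - mean) for k = 1..T
        max_s = max(max_s, partial)
        min_s = min(min_s, partial)
    s2 = sum((xt - mean) ** 2 for xt in x) / T
    return max_s - min_s, math.sqrt(s2)


def rs_estimate_d(x):
    """d_hat = log(R_T / s_T) / log(T) - 1/2."""
    R, s = rs_statistic(x)
    return math.log(R / s) / math.log(len(x)) - 0.5


x = [1.0, 2.0, 3.0, 4.0]              # toy input, only to exercise the formulas
print(rs_statistic(x), rs_estimate_d(x))
```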
Giraitis, Kokoszka, Leipus and Teyssiere (2000) report the results of a Monte Carlo study of the empirical bias and MSE of $\hat{d}_T$. They conclude that its performance in the setting of the long-memory linear ARCH model (3) is similar to that found for linear time series models. The R/S statistic is also found to have a somewhat lower MSE than a number of other semiparametric estimators of $d$.
We note that the parameter $a$ in covariance stationary MD-ARCH(∞) sequences can simply be estimated by $\bar{x}_T$, using (15). Hosking (1996) derives the rate of convergence of $\bar{x}_T$ under assumptions on the limiting behavior of the autocovariance function similar to those of the long-memory MD-ARCH(∞) models (18) and (21). He shows that $\operatorname{Var}(\bar{x}_T) = O(T^{2d-1})$, so that $\bar{x}_T - a = O_p(T^{d-1/2})$. However, the limiting distribution of $\bar{x}_T$ shown in Hosking (1996) requires a linear structure of $\{X_t : t \in \mathbb{Z}\}$.

A number of other semiparametric estimators of the long-memory parameter $d$ are available in the literature. A large class of estimators is based on log-periodogram regressions similar to the one pioneered by Geweke and Porter-Hudak (1983).
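For comparison, a bare-bones log-periodogram regression in the spirit of Geweke and Porter-Hudak (1983) is sketched below. This is our own illustration, not a method developed in the text: it regresses $\log I(\lambda_j)$ on $\log(4\sin^2(\lambda_j/2))$ over the first $m$ Fourier frequencies $\lambda_j = 2\pi j/T$ and takes minus the slope as the estimate of $d$. The bandwidth default $m = \lfloor\sqrt{T}\rfloor$ is an assumption, and the naive $O(Tm)$ Fourier sums are used for clarity rather than speed.

```python
import cmath
import math
import random


def gph_estimate(x, m=None):
    """Log-periodogram regression estimate of d over the first m Fourier frequencies."""
    T = len(x)
    if m is None:
        m = int(math.sqrt(T))            # assumed bandwidth rule, not from the text
    mean = sum(x) / T
    xc = [v - mean for v in x]
    regressors, log_periodogram = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / T
        dft = sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(xc))
        periodogram = abs(dft) ** 2 / (2.0 * math.pi * T)
        regressors.append(math.log(4.0 * math.sin(lam / 2.0) ** 2))
        log_periodogram.append(math.log(periodogram))
    r_bar = sum(regressors) / m
    y_bar = sum(log_periodogram) / m
    sxy = sum((r - r_bar) * (y - y_bar) for r, y in zip(regressors, log_periodogram))
    sxx = sum((r - r_bar) ** 2 for r in regressors)
    return -sxy / sxx                    # spectral density ~ |2 sin(lambda/2)|^(-2d)


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1024)]
print(round(gph_estimate(noise), 3))     # white noise has d = 0 in population
```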
3.2 The quasi-maximum likelihood estimator
The semiparametric methods presented above allow for estimation of only a subset of the parameters of the MD-ARCH(∞) model (5). Apart from efficiency considerations, testing of some interesting hypotheses is ruled out, in particular those concerning summability of the autocovariance function in the framework of the combined short- and long-memory model (19). For joint estimation of all parameters together with their covariance matrix, we propose a time-domain quasi-maximum likelihood estimator.
Properties of the QML estimator of short-memory GARCH(p,q) models were previously addressed by several authors, most notably Lee and Hansen (1994), Lumsdaine (1996) and Berkes, Horvath and Kokoszka (2003). These studies impose only a handful of moment conditions on the sequence of shocks $\{\varepsilon_t : t \in \mathbb{Z}\}$, largely compatible with A1. However, all three rely on the ARCH(∞) representation of GARCH(p,q) sequences with non-zero $a^*$ and absolutely summable $\{\theta^*_j : j \ge 0\}$ in (4). Hence, the properties of the QML estimator for the class of long-memory MD-ARCH(∞) models introduced in section 2 remain unknown. The general ideas associated with such an estimator are introduced below.
Recall that the set of parameters of MD-ARCH(∞) processes consists of $a$ and $\{\theta_j : j \ge 0\}$. Let the sequence of coefficients $\{\theta_j : j \ge 0\}$ be further parametrized by a finite-dimensional vector $\kappa$, and let $\lambda_0 = (a, \kappa')'$ denote the vector of true parameters. Under A1–A3, let $\{x_t : 1 \le t \le T\}$ be a sample of the $X_t$ part of the MD-ARCH(∞) process, with parameters evaluated at $\lambda_0$. The following sequence of functions approximates the unobserved sequence $\{p_t : 1 \le t \le T\}$, corresponding to the $\psi_t$ part of the MD-ARCH(∞) process:
$$w_1(u) = \alpha, \qquad w_t(u) = \alpha + \sum_{j=1}^{t-1} \theta_{j-1}(u)\,[x_{t-j} - w_{t-j}(u)] \quad \text{for } 1 < t \le T, \qquad (25)$$
where $u = (\alpha, k')'$ has the same dimension as $\lambda_0$. Define the quasi-maximum likelihood function as follows:
$$L_T(u) = -\sum_{t=1}^{T} \left[\log w_t(u) + \frac{x_t}{w_t(u)}\right]. \qquad (26)$$
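The recursion (25) and the criterion (26) translate directly into code. In this sketch the caller supplies $\alpha$ and the already-evaluated coefficients $\theta_j(u)$, $j = 0, \ldots, T-2$; how $\theta_j(u)$ depends on $k$ is left open in the text, so the geometric choice $\theta_j(u) = \gamma\phi^j$ in the usage lines is purely a hypothetical example, as are the function names. The filter costs $O(T^2)$ operations because every $w_t(u)$ uses the whole past of the sample.

```python
import math


def qml_filter(x, alpha, theta):
    """Recursion (25); w[t] (0-based) corresponds to w_{t+1}(u) in the text."""
    w = []
    for t in range(len(x)):
        val = alpha
        for j in range(1, t + 1):
            val += theta[j - 1] * (x[t - j] - w[t - j])
        w.append(val)
    return w


def quasi_loglik(x, alpha, theta):
    """QML function (26): L_T(u) = -sum_t [log w_t(u) + x_t / w_t(u)]."""
    w = qml_filter(x, alpha, theta)
    if min(w) <= 0.0:
        return -math.inf             # outside the region where (26) is defined
    return -sum(math.log(wt) + xt / wt for xt, wt in zip(x, w))


# Hypothetical parametrization theta_j(u) = gamma * phi^j and a toy sample.
x = [1.0, 2.0, 0.5, 0.8, 1.6]
gamma, phi = 0.1, 0.9
print(quasi_loglik(x, alpha=1.0, theta=[gamma * phi ** j for j in range(len(x))]))
```

A crude stand-in for the maximization in (27) is to evaluate `quasi_loglik` over a grid of candidate parameter values and keep the maximizer; a numerical optimizer would be the usual choice in practice.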
The quasi-maximum likelihood function (26) would be the proper likelihood function for the model if, in addition to A1, we were to assume an exponential distribution of $\varepsilon_0$. For many other specific distributional assumptions on $\varepsilon_0$, notably those involving the exponential family of distributions, the difference between (26) and the respective proper likelihoods will consist of inessential constants. Finally, the QML estimator of $\lambda_0$ based on a sample of $T$ observations of the MD-ARCH(∞) process is defined as:
$$\hat{\lambda}_T = \arg\max_{u \in U} L_T(u), \qquad (27)$$
where $\lambda_0 \in U$, and $U$ is a compact subspace of $\mathbb{R}^k$, with $k$ the dimension of $\lambda_0$, such that A2–A3 are satisfied. Non-negativity of $\{w_t(u) : 1 \le t \le T\}$ for all $u \in U$ can then be deduced from the following representation of (25):
$$w_1(u) = \alpha, \qquad w_t(u) = \alpha + \sum_{j=1}^{t-1} \eta_{j-1}(u)(x_{t-j} - \alpha) \quad \text{for } 1 < t \le T,$$
where $\{\eta_j(u) : j \ge 0\}$ is defined in terms of $\{\theta_j(u) : j \ge 0\}$ as in (23).
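The $\eta$-representation can be checked against (25) numerically. The sketch below (hypothetical names and toy inputs) runs both recursions on the same sample. Because both act on sequences that start at $t = 1$, the outputs agree up to rounding, and when $\eta_j \ge 0$ with $\sum_j \eta_j \le 1$ and $x_t \ge 0$ the resulting $w_t(u)$ are non-negative, as argued above.

```python
def eta_from_theta(theta):
    """Recursion (23): eta_0 = theta_0, eta_j = theta_j - sum_{k<j} theta_{j-1-k} * eta_k."""
    eta = []
    for j, th in enumerate(theta):
        eta.append(th - sum(theta[j - 1 - k] * eta[k] for k in range(j)))
    return eta


def w_via_theta(x, alpha, theta):
    """Recursion (25); w[t] (0-based) corresponds to w_{t+1}(u)."""
    w = []
    for t in range(len(x)):
        w.append(alpha + sum(theta[j - 1] * (x[t - j] - w[t - j]) for j in range(1, t + 1)))
    return w


def w_via_eta(x, alpha, theta):
    """w_t(u) = alpha + sum_{j=1}^{t-1} eta_{j-1}(u) (x_{t-j} - alpha)."""
    eta = eta_from_theta(theta)
    return [alpha + sum(eta[j - 1] * (x[t - j] - alpha) for j in range(1, t + 1))
            for t in range(len(x))]


x = [0.5, 1.7, 0.2, 3.1, 0.9, 1.4, 0.0, 2.2]       # toy non-negative sample
theta = [0.2 * 0.9 ** j for j in range(len(x))]    # hypothetical theta_j(u) = gamma * phi^j
w1, w2 = w_via_theta(x, 0.8, theta), w_via_eta(x, 0.8, theta)
print(max(abs(p - q) for p, q in zip(w1, w2)), min(w2))
```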
However, it is clear that even when evaluated at the true parameters, the sequence $\{w_t(\lambda_0) : 1 \le t \le T\}$ is only a rough approximation to the unobserved sequence $\{p_t : 1 \le t \le T\}$. By analogy with Lee and Hansen (1994), Lumsdaine (1996) and Berkes, Horvath and Kokoszka (2003), it appears convenient to base the study of the asymptotic properties of the QML estimator on the following unobserved quasi-maximum likelihood function:
$$\tilde{L}_T(u) = -\sum_{t=1}^{T} \left[\log \tilde{w}_t(u) + \frac{x_t}{\tilde{w}_t(u)}\right], \qquad (28)$$
where the sequence of functions $\{\tilde{w}_t(u) : 1 \le t \le T\}$ depends on the unobserved part of the sample $\{x_t : t \le 0\}$ and is defined as:
$$\tilde{w}_t(u) = \alpha + \sum_{j=1}^{\infty} \theta_{j-1}(u)\,[x_{t-j} - \tilde{w}_{t-j}(u)] \quad \text{for } 1 \le t \le T. \qquad (29)$$
Upon establishing the properties of the QML estimator based on (28) and (29), it needs to be shown that $\sup_{u \in U} \left| \frac{1}{T} L_T(u) - \frac{1}{T} \tilde{L}_T(u) \right| \to 0$. Much of the previously cited literature on inference and estimation of GARCH processes refers to the QML estimator based on the likelihood function (26), respectively (28), as the feasible, respectively infeasible, QML estimator.
Recall that Berkes, Horvath and Kokoszka (2003) establish consistency and asymptotic normality of the QML estimator for GARCH(p,q) processes using their ARCH(∞) representation. As shown in Section 2, covariance stationary GARCH(p,q) processes have an MD-ARCH(∞) representation with an exponentially decaying sequence of weighting coefficients $\{\theta_j : j \ge 0\}$. We conjecture that the same conclusions hold with respect to the asymptotic properties of the QML estimator (27) when applied to covariance stationary MD-ARCH(∞) sequences. It is also likely that, in the case of long-memory MD-ARCH(∞) sequences, the rate of convergence of the first element of $u$, which corresponds to the parameter $a$, will differ from $O_p(T^{-1/2})$ and will depend on the degree of long memory.
4 Conclusion
This paper introduces a class of models for sequences of long-memory covariance stationary non-negative random variables. The new class, referred to as the MD-ARCH(∞) class, is closely related to the GARCH sequences of Engle (1982) and Bollerslev (1986), sharing their multiplicative structure of innovations and linear dependence on their own history. Using a representation in terms of a sequence of martingale differences, similar to the one in Robinson (1991) and Robinson and Henry (1999), the MD-ARCH(∞) model with a sequence of hyperbolically decaying square-summable weighting coefficients is shown to have a non-summable autocovariance function. In addition, the MD-ARCH(∞) class includes the usual covariance stationary short-memory GARCH sequences, providing a natural extension of GARCH models to the long-memory case.
The models introduced in this paper have a range of potential empirical applications in a variety of fields. One of the most interesting questions that can be addressed using this class of models is testing the hypothesis of long memory against the alternative of short memory in samples of non-negative random variables, e.g. squared returns on financial and real assets. However, more work is needed before the properties of parametric estimators, such as the QML estimator described in Section 3, become known for the class of MD-ARCH(∞) processes.
Another area of application of MD-ARCH(∞) sequences is the econometrics of high-frequency financial data. The time-series dynamics of durations between events in transactions data are complicated and often exhibit long-range dependence. However, direct application of FIGARCH models to these data implies an infinite first unconditional moment of financial durations, a feature that is hardly attractive for empirical models in this field. We believe that researchers working with high-frequency financial durations data will find the results and ideas of this paper useful.
5 Appendix
This technical appendix collects proofs of the main theorems in Section 2. In the appendix we use the notation $u_t := \varepsilon_t - 1$, where by assumption A1 $\{u_t : t \in \mathbb{Z}\}$ is a zero-mean i.i.d. sequence. In addition, we use the convention $\sum_{m}^{n} \cdot = 0$ whenever $m > n$ for $m, n \in \mathbb{Z}$.
LEMMA 1. Under the conditions of Theorem 2, for each $t \in \mathbb{Z}$ the sequence $\{M(k,t) : k \ge 0\}$ defined in (12) is orthogonal in $L^2$, with moments given by
$$E M(k,t) = 0, \qquad E M(k,t)^2 = \Big(E u_0^2 \sum_{j=0}^{\infty} \theta_j^2\Big)^k$$
for each $k \ge 1$. The sequence $\{M(k,t) : t \in \mathbb{Z}\}$ is stationary and ergodic for each $k \ge 0$.
PROOF: As shown in Kokoszka and Leipus (2000), the following recursive equality holds for the sequence $\{M(k,t) : k \ge 0\}$:
$$M(0,t) = 1, \qquad M(k,t) = \sum_{j=1}^{\infty} \theta_{j-1} u_{t-j} M(k-1, t-j) \quad \text{for } k \ge 1,\ t \in \mathbb{Z}. \qquad (A.1)$$
By condition (14), $E u_0^2 < \infty$ and $\{\theta_j : j \ge 0\}$ is square-summable. Then by Lemma 2.2.2 of Stout (1974), $\{M(1,t) : t \in \mathbb{Z}\} \subseteq L^2$. Stationarity and ergodicity of $\{M(1,t) : t \in \mathbb{Z}\}$ follow from Lemma 3.5.8 and Theorem 3.5.8 of Stout (1974), by the assumptions on $\{u_t : t \in \mathbb{Z}\}$ and (A.1).
Let $\{M(k-1,t) : t \in \mathbb{Z}\} \subseteq L^2$ be stationary and ergodic for a given $k \ge 1$. Then the sequence $\{u_t M(k-1,t) : t \in \mathbb{Z}\}$ is orthogonal in $L^2$, where we use independence of $u_t$ and $M(k-1,t)$ for each $t \in \mathbb{Z}$. From (A.1) and Lemma 2.2.2 of Stout (1974) it follows that $\{M(k,t) : t \in \mathbb{Z}\} \subseteq L^2$. Stationarity and ergodicity of $\{M(k,t) : t \in \mathbb{Z}\}$ are a consequence of Theorem 3.5.8 of Stout (1974), the assumptions on $\{u_t : t \in \mathbb{Z}\}$ and $\{M(k-1,t) : t \in \mathbb{Z}\}$, and representation (A.1). Induction shows that $\{M(k,t) : k \ge 0\} \subseteq L^2$ for each $t \in \mathbb{Z}$ and that $\{M(k,t) : t \in \mathbb{Z}\}$ is stationary and ergodic for each $k \ge 0$.
The previous results, together with Theorem 2.3.1 of Lukacs (1975), allow us to obtain for each $t \in \mathbb{Z}$, $k \ge 1$: $E M(k,t) = \sum_{j=1}^{\infty} \theta_{j-1} E[u_{t-j} M(k-1,t-j)] = 0$ and $E M(k,t)^2 = \sum_{j=1}^{\infty} \theta_{j-1}^2 E[u_{t-j}^2 M(k-1,t-j)^2]$, from which $E M(k,t)^2 = \big[E u_0^2 \sum_{j=0}^{\infty} \theta_j^2\big]^k$.
Similarly, the covariance of $M(1,t)$ and elements of $\{M(k,t) : k > 1\}$ is given by $E[M(1,t) M(k,t)] = \sum_{j_1=1}^{\infty} \sum_{j_2=1}^{\infty} \theta_{j_1-1}\theta_{j_2-1} E[u_{t-j_1} u_{t-j_2} M(k-1, t-j_2)] = 0$, where we use the independence of $u_{t-m}$ from the other terms under the expectation operator for $m = j_1 \wedge j_2$, $j_1, j_2 \ge 1$, $j_1 \ne j_2$, and $E M(k-1,t) = 0$ for each $t \in \mathbb{Z}$, $k > 1$. It follows trivially that $E[M(0,t) M(k,t)] = 0$ for each $k \ge 1$.
Let $M(p-1,t)$ be orthogonal to the elements of $\{M(k,t) : k \ge 1, k \ne p-1\}$. Then similar arguments show that $E[M(p,t) M(k,t)] = \sum_{j_1=1}^{\infty} \sum_{j_2=1}^{\infty} \theta_{j_1-1}\theta_{j_2-1} E[u_{t-j_1} u_{t-j_2} M(p-1, t-j_1) M(k-1, t-j_2)] = 0$. By induction, $\{M(k,t) : k \ge 0\}$ is orthogonal in $L^2$ for each $t \in \mathbb{Z}$.
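LEMMA 2. Under the conditions of Theorem 2, for each $t \in \mathbb{Z}$ and $0 \le n < \infty$ define the sequences of random variables $\{M_n(k,t) : k \ge 0\}$ and $\{\tilde{M}_n(k,t) : k \ge 0\}$ as follows:
$$M_n(0,t) := 1, \qquad M_n(k,t) := \sum_{\substack{j_1, \ldots, j_k \ge 1 \\ j_1 + \cdots + j_k \le n}} \theta_{j_1-1} \cdots \theta_{j_k-1}\, u_{t-j_1} \cdots u_{t-j_1-\cdots-j_k} \quad \forall\, 1 \le k \le n, \qquad (A.2)$$
$$\tilde{M}_n(0,t) := \theta_n, \qquad \tilde{M}_n(k,t) := \sum_{\substack{j_1, \ldots, j_k \ge 1 \\ j_1 + \cdots + j_k \le n}} \theta_{j_1-1} \cdots \theta_{j_k-1}\, \theta_{n-j_1-\cdots-j_k}\, u_{t-j_1} \cdots u_{t-j_1-\cdots-j_k} \quad \forall\, 1 \le k \le n, \qquad (A.3)$$
and $M_n(k,t) = \tilde{M}_n(k,t) := 0$ for $k > n$. Then $\{M_n(k,t) : k \ge 0\}$ and $\{\tilde{M}_n(k,t) : k \ge 0\}$ are orthogonal sequences in $L^2$ with the following properties:
$$E[M_n(p,t) M(k,t)] = 0, \qquad E[\tilde{M}_n(p,t) M(k,t)] = 0, \qquad E[M_n(p,t) \tilde{M}_n(k,t)] = 0$$
for $p, k \ge 0$, $p \ne k$.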
PROOF: The definitions of $M_n(k,t)$ and $\tilde{M}_n(k,t)$ involve only finite sums, and it follows immediately that $\{M_n(k,t) : k \ge 0\} \subseteq L^2$ and $\{\tilde{M}_n(k,t) : k \ge 0\} \subseteq L^2$ for each $t \in \mathbb{Z}$ and $0 \le n < \infty$ under the assumptions of Theorem 2. The results $E M_n(k,t) = E \tilde{M}_n(k,t) = 0$ for $k \ge 1$ and $E[M_n(p,t) M_n(k,t)] = E[\tilde{M}_n(p,t) \tilde{M}_n(k,t)] = 0$ for $p, k \ge 0$, $p \ne k$, follow easily from the i.i.d. property of $\{u_t : t \in \mathbb{Z}\}$.

Analogously to (A.1), $M_n(k,t)$ and $\tilde{M}_n(k,t)$ admit the following recursive representations:
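$$M_n(0,t) = 1, \qquad M_n(k,t) = \sum_{j=1}^{n-k+1} \theta_{j-1} u_{t-j} M_{n-j}(k-1, t-j) \quad \forall\, 1 \le k \le n, \qquad (A.4)$$
$$\tilde{M}_n(0,t) = \theta_n, \qquad \tilde{M}_n(k,t) = \sum_{j=1}^{n-k+1} \theta_{j-1} u_{t-j} \tilde{M}_{n-j}(k-1, t-j) \quad \forall\, 1 \le k \le n \qquad (A.5)$$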
for each $t \in \mathbb{Z}$ and $0 \le n < \infty$. Using Theorem 2.3.1 of Lukacs (1975) together with the arguments presented in Lemma 1 we get
$$E[M_n(1,t) M(k,t)] = \sum_{j_1=1}^{n} \sum_{j_2=1}^{\infty} \theta_{j_1-1}\theta_{j_2-1} E[u_{t-j_1} u_{t-j_2} M(k-1, t-j_2)] = 0$$
for each $0 \le n < \infty$ and $k > 1$. Trivially, $E[M_n(0,t) M(k,t)] = E[M(0,t) M_n(k,t)] = 0$ for each $k \ge 1$. Assume that for each $0 \le n < \infty$, $M_n(p-1,t)$ is orthogonal to the elements of $\{M(k,t) : k \ge 1, k \ne p-1\}$. Then similar considerations lead to
$$E[M_n(p,t) M(k,t)] = \sum_{j_1=1}^{n-p+1} \sum_{j_2=1}^{\infty} \theta_{j_1-1}\theta_{j_2-1} E[u_{t-j_1} u_{t-j_2} M_{n-j_1}(p-1, t-j_1) M(k-1, t-j_2)] = 0.$$
By induction, $E[M_n(p,t) M(k,t)] = 0$ for $p, k \ge 0$, $p \ne k$. A similar line of reasoning can be used to establish the remaining results.
PROOF OF THEOREM 1: As shown in Lemma 1, for given $t \in \mathbb{Z}$ the sequence $\{M(k,t) : k \ge 0\}$ can be written in the linear form (A.1). The sequence of innovations in (A.1) is orthogonal in $L^2$, weighted by the coefficients $\{\theta_j : j \ge 0\}$. Theorem 2.3.2 of Stout (1974) and condition (13) imply that $M(k,t)$ is finite a.e. on $(\Omega, \mathcal{F}, P)$ for every $k \ge 0$.

Consider representation (11) of the MD-ARCH(∞) process. Using the expression for $E M(k,t)^2$ derived in Lemma 1 and condition (14), it follows by the standard infinite series convergence tests that $\sum_{k=1}^{\infty} [\log k]^2 E M(k,t)^2 < \infty$. Orthogonality of the sequence $\{M(k,t) : k \ge 0\}$ shown in Lemma 1, together with Theorem 2.3.2 of Stout (1974), implies that the infinite series in (11) is finite a.e. on $(\Omega, \mathcal{F}, P)$.

From (11) and (12), $X_t = X(\varepsilon_t, \varepsilon_{t-1}, \ldots)$ and $\psi_t = \psi(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)$, where $X$ and $\psi$ are measurable functions of $\{\varepsilon_t : t \in \mathbb{Z}\}$. Hence, stationarity and ergodicity of $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ follow from Lemma 3.5.8 and Theorem 3.5.8 of Stout (1974).
PROOF OF THEOREM 2: Using the orthogonality of $\{M(k,t) : k \ge 0\}$ for given $t \in \mathbb{Z}$, shown in Lemma 1, and Lemma 2.2.2 of Stout (1974), the infinite series in (11) converges in $L^2$ if $\sum_{k=0}^{\infty} E M(k,t)^2 < \infty$. This holds by condition (14). The first unconditional moment of $\psi_t$ is given by $E\psi_t = a \sum_{k=0}^{\infty} E M(k,t) = a$, which coincides with the first unconditional moment of $X_t$.
The following auxiliary result, where $n \ge 0$, is used in the derivation of the autocovariance function of $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$:
$$\begin{aligned} E[M(p,t+n) M(k,t)] &= \sum_{j_1, j_2 = 1}^{\infty} \theta_{j_1-1}\theta_{j_2-1} E[u_{t+n-j_1} u_{t-j_2} M(p-1, t+n-j_1) M(k-1, t-j_2)] \\ &= E u_0^2 \sum_{j=1}^{\infty} \theta_{j-1}\theta_{j+n-1} E[M(p-1, t-j) M(k-1, t-j)], \end{aligned}$$
where the last equality is justified by the fact that for $j_1 - n \ne j_2$ and $m = (j_1 - n) \wedge j_2$, with $j_1, j_2 \ge 1$, the factor $u_{t-m} = \varepsilon_{t-m} - 1$ is independent of the rest of the terms under the expectation operator, producing zero terms. By orthogonality of $\{M(k,t) : k \ge 0\}$ for given $t \in \mathbb{Z}$, the expectations $E[M(p-1,t-j) M(k-1,t-j)]$ are different from zero only for $p = k$. Hence:
$$E[M(p,t+n) M(k,t)] = \begin{cases} E u_0^2 \sum_{j=1}^{\infty} \theta_{j-1}\theta_{j+n-1}\, E M(p-1, t-j)^2 & \text{for } p = k, \\ 0 & \text{for } p \ne k. \end{cases}$$
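This result, together with stationarity of $\{M(k,t) : t \in \mathbb{Z}\}$ for given $k \ge 0$, is used to derive the autocovariance function of $\{\psi_t : t \in \mathbb{Z}\}$ as follows:
$$\begin{aligned} E[(\psi_{t+n} - a)(\psi_t - a)] &= a^2 \sum_{p=1}^{\infty} \sum_{k=1}^{\infty} E[M(p,t+n) M(k,t)] \\ &= a^2 E u_0^2 \Big(\sum_{k=0}^{\infty} E M(k,t)^2\Big) \Big(\sum_{j=0}^{\infty} \theta_j \theta_{j+n}\Big) = \frac{a^2 E u_0^2}{1 - E u_0^2 \sum_{j=0}^{\infty} \theta_j^2} \sum_{j=0}^{\infty} \theta_j \theta_{j+n}. \end{aligned}$$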
Finally, the autocovariance function of $\{X_t : t \in \mathbb{Z}\}$ is derived using its representation given below:
$$X_t - a = \sum_{j=1}^{\infty} \theta_{j-1}(X_{t-j} - \psi_{t-j}) + (X_t - \psi_t) = \sum_{j=0}^{\infty} \theta^*_j (X_{t-j} - \psi_{t-j}) = \sum_{j=0}^{\infty} \theta^*_j u_{t-j} \psi_{t-j}.$$
Note that the sequence $\{u_t \psi_t : t \in \mathbb{Z}\}$ is orthogonal in $L^2$ by the previous results and the assumptions on $\{u_t : t \in \mathbb{Z}\}$, and that $\{\theta^*_j : j \ge 0\}$ is square-summable. Hence, standard results of Brockwell and Davis (1991) can be used to show that $E[(X_{t+n} - a)(X_t - a)] = E u_0^2 \sum_{j=0}^{\infty} \theta^*_j \theta^*_{j+n} E\psi_{t-j}^2$. Using stationarity of $\{\psi_t : t \in \mathbb{Z}\}$ together with the expression for its variance derived above, we arrive at the desired result.
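The autocovariance expressions derived in this proof can be evaluated for truncated coefficient sequences. The sketch below (hypothetical names) implements $\operatorname{Cov}(\psi_{t+n}, \psi_t)$ and $\operatorname{Cov}(X_{t+n}, X_t)$, using $E\psi_0^2 = a^2/(1 - Eu_0^2\sum_j\theta_j^2)$, which follows from the $n = 0$ case, and $\theta^*_0 = 1$, $\theta^*_j = \theta_{j-1}$ as in the representation of $X_t - a$ above. As a cross-check we use GARCH(1,1)-type weights $\theta_j = \alpha(\alpha+\beta)^j$ (our own mapping), for which the variance of $\psi_t$ should match the standard GARCH(1,1) fourth-moment formula with $\kappa = E\varepsilon_0^2 = 1 + Eu_0^2$.

```python
def psi_autocov(n, a, eu2, theta):
    """Cov(psi_{t+n}, psi_t) = a^2 Eu^2 / (1 - Eu^2 sum theta_j^2) * sum_j theta_j theta_{j+n}."""
    s2 = sum(th * th for th in theta)
    if eu2 * s2 >= 1.0:
        raise ValueError("Eu^2 * sum(theta_j^2) must be below 1")
    cross = sum(theta[j] * theta[j + n] for j in range(len(theta) - n))
    return a * a * eu2 / (1.0 - eu2 * s2) * cross


def x_autocov(n, a, eu2, theta):
    """Cov(X_{t+n}, X_t) = Eu^2 E[psi^2] sum_j theta*_j theta*_{j+n}."""
    s2 = sum(th * th for th in theta)
    if eu2 * s2 >= 1.0:
        raise ValueError("Eu^2 * sum(theta_j^2) must be below 1")
    e_psi2 = a * a / (1.0 - eu2 * s2)        # E psi_0^2 = Var(psi_0) + a^2
    star = [1.0] + list(theta)               # theta*_0 = 1, theta*_j = theta_{j-1}
    cross = sum(star[j] * star[j + n] for j in range(len(star) - n))
    return eu2 * e_psi2 * cross


# GARCH(1,1)-type weights with eps_t = z_t^2, z_t standard normal, so Eu^2 = 2.
alpha, beta, a, eu2 = 0.1, 0.8, 1.0, 2.0
theta = [alpha * (alpha + beta) ** j for j in range(500)]
print(psi_autocov(0, a, eu2, theta), x_autocov(0, a, eu2, theta))
```

For these weights the ratio of $\psi$-autocovariances at lags $n$ and $0$ is $(\alpha+\beta)^n$, the geometric decay expected from a short-memory GARCH model.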
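PROOF OF THEOREM 3: By simple recursive substitution, for each $n \ge 0$ and $t \in \mathbb{Z}$, the $\psi_{t,n}$ part of the conditional process (22) can be written as follows:
$$\psi_{t,0} = \psi, \qquad \psi_{t,n} = a + \sum_{j=1}^{n} \eta_{j-1}(X_{t-j,n-j} - a) + \eta_n(\psi - a) \quad \forall n \ge 1.$$
The right-hand side equals $a\big(1 - \sum_{j=0}^{n} \eta_j\big) + \sum_{j=1}^{n} \eta_{j-1} X_{t-j,n-j} + \eta_n \psi$. By A3 the sequence $\{\eta_j : j \ge 0\}$ is non-negative with $\sum_{j=0}^{n} \eta_j \le 1$ for all $n \ge 0$, so by induction on $n$ every term is non-negative, showing a.s. non-negativity of $\{(X_{t,n}, \psi_{t,n}) : n \ge 0\}$.

To prove the second part of the theorem we use another representation of $\psi_{t,n}$, which is again derived by simple recursive substitution: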
$$\psi_{t,n} = \sum_{k=0}^{n} \left[a M_n(k,t) + (\psi - a) \tilde{M}_n(k,t)\right] \quad \forall n \ge 0,\ \forall t \in \mathbb{Z}.$$
It follows from Lemma 2 that $\{\psi_{t,n} : n \ge 0\} \subseteq L^2$ for every $t \in \mathbb{Z}$. In the following we study $L^2$ convergence of the random variables $(\psi_t - \psi_{t,n})$ as $n \to \infty$. Using the previous expression for $\psi_{t,n}$ together with equation (11) and Lemmas 1 and 2, we can write for each $t \in \mathbb{Z}$ and $n \ge 0$:
$$E(\psi_t - \psi_{t,n})^2 = \sum_{k=0}^{n} E\big[a[M(k,t) - M_n(k,t)] - (\psi - a)\tilde{M}_n(k,t)\big]^2 + a^2 \sum_{k=n+1}^{\infty} E M(k,t)^2.$$
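Since $\sum_{k=0}^{\infty} E M(k,t)^2 < \infty$ by condition (14), the second part of this expression converges to zero as $n \to \infty$. Consider now the first part. Using inequality (A.8) we can write for each $n \ge 0$:
$$\begin{aligned} 0 &\le \sum_{k=0}^{n} E\big[a[M(k,t) - M_n(k,t)] - (\psi - a)\tilde{M}_n(k,t)\big]^2 \le \sum_{k=0}^{\infty} E\big[a[M(k,t) - M_n(k,t)] - (\psi - a)\tilde{M}_n(k,t)\big]^2 \\ &\le \Big(a^2 + (\psi - a)^2 \sum_{j=0}^{\infty} \theta_j^2\Big) \sum_{k=0}^{\infty} E M(k,t)^2 < \infty. \end{aligned}$$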
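Given $\epsilon > 0$, we can then choose $K \ge 0$ such that the following inequality holds for each $n \ge 0$ and $t \in \mathbb{Z}$:
$$\begin{aligned} 0 &\le \sum_{k=0}^{\infty} E\big[a[M(k,t) - M_n(k,t)] - (\psi - a)\tilde{M}_n(k,t)\big]^2 \\ &\le \sum_{k=0}^{K} E\big[a[M(k,t) - M_n(k,t)] - (\psi - a)\tilde{M}_n(k,t)\big]^2 + \Big(a^2 + (\psi - a)^2 \sum_{j=0}^{\infty} \theta_j^2\Big) \sum_{k=K}^{\infty} E M(k,t)^2 \\ &\le \sum_{k=0}^{K} E\big[a[M(k,t) - M_n(k,t)] - (\psi - a)\tilde{M}_n(k,t)\big]^2 + \frac{\epsilon}{2}. \end{aligned}$$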
From (A.7) it follows that the last sum in the expression above converges to zero as $n \to \infty$, establishing that $E(\psi_t - \psi_{t,n})^2 \to 0$. The result of the theorem then follows from Corollary 2.1.1 of Stout (1974). Using a.s. convergence of $\{\psi_{t,n} : n \ge 0\}$ to $\psi_t$ and the definitions of $X_{t,n}$ and $X_t$ in (22) and (5), respectively, the second part of the theorem is established.
The auxiliary results (A.7) and (A.8) used above are derived as follows. Using the recursive representations (A.1), (A.4) and (A.5), together with orthogonality of the terms under the summation sign, we can write for each $n \ge 0$, $k \ge 1$ and $t \in \mathbb{Z}$:
$$\begin{aligned} E\big[a[M(k,t) - M_n(k,t)] - (\psi - a)\tilde{M}_n(k,t)\big]^2 = E u_0^2 \Big[ &\sum_{j=1}^{n-k+1} \theta_{j-1}^2\, E\big[a[M(k-1,t-j) - M_{n-j}(k-1,t-j)] - (\psi - a)\tilde{M}_{n-j}(k-1,t-j)\big]^2 \\ &+ a^2 \sum_{j=n-k+2}^{\infty} \theta_{j-1}^2\, E M(k-1,t-j)^2 \Big]. \end{aligned} \qquad (A.6)$$
Consider the case $k = 1$. The right-hand side of (A.6) reduces to $E u_0^2 \big[(\psi - a)^2 \sum_{j=0}^{n-1} \theta_j^2 \theta_{n-j-1}^2 + a^2 \sum_{j=n}^{\infty} \theta_j^2\big]$. Using the property of absolutely convergent series $\big(\sum_{j=0}^{\infty} \theta_j^2\big)^2 = \sum_{i=0}^{\infty} \sum_{j=0}^{i} \theta_j^2 \theta_{i-j}^2$, we conclude that for each $t \in \mathbb{Z}$ this expression converges to zero as $n \to \infty$. In addition, the following inequality holds: $E\big[a[M(1,t) - M_n(1,t)] - (\psi - a)\tilde{M}_n(1,t)\big]^2 \le \big((\psi - a)^2 \sum_{j=0}^{\infty} \theta_j^2 + a^2\big) E M(1,t)^2$. Assume now that for each $t \in \mathbb{Z}$, $E\big[a[M(k-1,t) - M_n(k-1,t)] - (\psi - a)\tilde{M}_n(k-1,t)\big]^2 \to 0$ as $n \to \infty$. We can rewrite (A.6) in the following way:
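$$\begin{aligned} E\big[a[M(k,t) - M_n(k,t)] - (\psi - a)\tilde{M}_n(k,t)\big]^2 = E u_0^2 \Big[ &\sum_{i=0}^{\infty} \theta_{n-k-i}^2 \mathbf{1}_{\{i \le n-k\}}\, E\big[a[M(k-1,t-1-i) - M_{k-1+i}(k-1,t-1-i)] - (\psi - a)\tilde{M}_{k-1+i}(k-1,t-1-i)\big]^2 \\ &+ a^2 \Big(E u_0^2 \sum_{j=0}^{\infty} \theta_j^2\Big)^{k-1} \sum_{j=n-k+1}^{\infty} \theta_j^2 \Big]. \end{aligned}$$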
The second term on the right-hand side of this expression converges to zero as $n \to \infty$ by the square-summability of $\{\theta_j : j \ge 0\}$ implied by the condition in Theorem 2. Square-summability also ensures that $\sum_{i=0}^{\infty} \theta_{n-k-i}^2 \mathbf{1}_{\{i \le n-k\}} \le \sum_{j=0}^{\infty} \theta_j^2 < \infty$ for all $n \ge 0$, and $\theta_{n-k-i}^2 \mathbf{1}_{\{i \le n-k\}} \to 0$ as $n \to \infty$ for each $i \ge 0$. Using Lemma 3.2.3 of Stout (1974), we conclude that the first term on the right-hand side of the expression above also converges to zero as $n \to \infty$. By induction we conclude that for each $k \ge 0$ and $t \in \mathbb{Z}$:
$$E\big[a[M(k,t) - M_n(k,t)] - (\psi - a)\tilde{M}_n(k,t)\big]^2 \to 0 \quad \text{as } n \to \infty, \qquad (A.7)$$
where the case $k = 0$ follows trivially. Assume now that for each $t \in \mathbb{Z}$, $E\big[a[M(k-1,t) - M_n(k-1,t)] - (\psi - a)\tilde{M}_n(k-1,t)\big]^2 \le \big(a^2 + (\psi - a)^2 \sum_{j=0}^{\infty} \theta_j^2\big) E M(k-1,t)^2$. Substituting this inequality into (A.6) and collecting terms, we conclude that for each $n, k \ge 0$ and $t \in \mathbb{Z}$:
$$E\big[a[M(k,t) - M_n(k,t)] - (\psi - a)\tilde{M}_n(k,t)\big]^2 \le \Big(a^2 + (\psi - a)^2 \sum_{j=0}^{\infty} \theta_j^2\Big) E M(k,t)^2, \qquad (A.8)$$
where the case $k = 0$ is trivial.
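PROOF OF THEOREM 4: Define $\nu_t := \psi_t u_t$ and write $X_t = \psi_t + \nu_t$. Then:
$$\frac{1}{T^{d+1/2}} \sum_{t=1}^{\lfloor Tr \rfloor} (X_t - E X_t) = \frac{1}{T^{d+1/2}} \sum_{t=1}^{\lfloor Tr \rfloor} (\psi_t - a) + \frac{1}{T^{d+1/2}} \sum_{t=1}^{\lfloor Tr \rfloor} \nu_t,$$
where $E X_t = a$ follows from (15) by the assumed stationarity of $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$. The second term in this expression is asymptotically negligible:
$$\frac{1}{T^{2d+1}} E\Big(\sum_{t=1}^{T} \nu_t\Big)^2 = \frac{1}{T^{2d}}\, E u_0^2\, E\psi_0^2 \to 0 \quad \text{as } T \to \infty,$$
where we again use the stationarity assumption together with orthogonality of $\{\nu_t : t \in \mathbb{Z}\}$. Hence, it is sufficient to show that:
$$\frac{1}{T^{d+1/2}} \sum_{t=1}^{\lfloor Tr \rfloor} (\psi_t - a) \Rightarrow c_d B_{d+1/2}(r). \qquad (A.9)$$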
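As in Giraitis, Robinson and Surgailis (2000), we use representation (5) to write:
$$\psi_t - a = \sum_{j=1}^{\infty} \theta_{j-1} E[\psi_{t-j} \mid \mathcal{F}^+_{t-j-M}]\, u_{t-j} + \sum_{j=1}^{\infty} \theta_{j-1} \big(\psi_{t-j} - E[\psi_{t-j} \mid \mathcal{F}^+_{t-j-M}]\big) u_{t-j} := z^-_t + z^+_t,$$
where $M \ge 0$ is a given integer, $\mathcal{F}^+_t$ denotes the information generated by the process $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ from $t$ onwards, and $\theta_j = O(j^{d-1})$. It follows that $\{z^+_t : t \in \mathbb{Z}\}$ and $\{z^-_t : t \in \mathbb{Z}\}$ are stationary $L^2$ sequences. We can write:
$$\frac{1}{T^{d+1/2}} \sum_{t=1}^{\lfloor Tr \rfloor} (\psi_t - a) = \frac{1}{T^{d+1/2}} \sum_{t=1}^{\lfloor Tr \rfloor} z^-_t + \frac{1}{T^{d+1/2}} \sum_{t=1}^{\lfloor Tr \rfloor} z^+_t. \qquad (A.10)$$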
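We show that, by choosing a sufficiently large $M$, the variance of the second term in (A.10) can be made arbitrarily close to zero, and hence it can be ignored in subsequent derivations of the limiting process:
$$\begin{aligned} \frac{1}{T^{2d+1}} E\Big(\sum_{t=1}^{T} z^+_t\Big)^2 &= \frac{1}{T^{2d+1}} \sum_{t=1}^{T} \sum_{s=1}^{T} E[z^+_t z^+_s] \\ &= E u_0^2\, E\big(\psi_0 - E[\psi_0 \mid \mathcal{F}^+_{-M}]\big)^2\, \frac{1}{T^{2d+1}} \sum_{t=1}^{T} \sum_{s=1}^{T} \sum_{j=0}^{\infty} \theta_j \theta_{j+|t-s|}. \end{aligned}$$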
The following limit relies on the assumed structure of $\{\theta_j : j \ge 0\}$:
$$\frac{1}{T^{2d+1}} \sum_{t=1}^{T} \sum_{s=1}^{T} \sum_{j=0}^{\infty} \theta_j \theta_{j+|t-s|} \to \frac{K_d}{d(1+2d)} \quad \text{as } T \to \infty,$$
where the constant $0 < K_d < \infty$ depends on $d$ and may differ across parametrizations of $\{\theta_j : j \ge 0\}$; refer to Giraitis, Robinson and Surgailis (2000) and Example 4. It follows that $\frac{1}{T^{2d+1}} E\big(\sum_{t=1}^{T} z^+_t\big)^2 \to E u_0^2\, E\big(\psi_0 - E[\psi_0 \mid \mathcal{F}^+_{-M}]\big)^2 \frac{K_d}{d(1+2d)}$ as $T \to \infty$, which in turn goes to zero as $M$ increases, since $E\big(\psi_0 - E[\psi_0 \mid \mathcal{F}^+_{-M}]\big)^2 \to 0$ as $M \to \infty$. Consider now the first part of (A.10). Using the previous results we can write:
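$$\begin{aligned} \frac{1}{T^{2d+1}} E\Big(\sum_{t=1}^{T} z^-_t\Big)^2 &= E u_0^2\, E\big(E[\psi_0 \mid \mathcal{F}^+_{-M}]\big)^2\, \frac{1}{T^{2d+1}} \sum_{t=1}^{T} \sum_{s=1}^{T} \sum_{j=0}^{\infty} \theta_j \theta_{j+|t-s|} \\ &\to E u_0^2\, E\big(E[\psi_0 \mid \mathcal{F}^+_{-M}]\big)^2\, \frac{K_d}{d(1+2d)} \quad \text{as } T \to \infty. \end{aligned}$$
By choosing a sufficiently large value of $M$, the last expression can be made arbitrarily close to $c_d^2 := E u_0^2\, E\psi_0^2\, \frac{K_d}{d(1+2d)}$. Result (A.9) then follows by the arguments in Giraitis, Robinson and Surgailis (2000) based on the linear structure of $z^-_t$, where $\{E[\psi_t \mid \mathcal{F}^+_{t-M}]\, u_t : t \in \mathbb{Z}\}$ is a stationary $M$-dependent orthogonal sequence in $L^2$.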
References
Andersen, Torben G., Tim Bollerslev, Francis Diebold, and Paul Labys (2001) The distribution
of exchange rate volatility. Journal of the American Statistical Association, vol. 96, pp. 42-
55.
Baillie, Richard T., Tim Bollerslev and Hans O. Mikkelsen (1996) Fractionally integrated gen-
eralized autoregressive conditional heteroskedasticity. Journal of Econometrics, vol. 74,
pp. 3-30.
Berkes, Istvan, Lajos Horvath and Piotr Kokoszka (2003) GARCH processes: Structure and
estimation. Bernoulli, vol. 9.
Berkes, Istvan, Lajos Horvath and Piotr Kokoszka (2002) Probabilistic and statistical proper-
ties of GARCH processes. Preprint.
Bollerslev, Tim (1986) Generalized autoregressive conditional heteroskedasticity. Journal of
Econometrics, vol. 31, pp. 307-327.
Bollerslev, Tim, Robert F. Engle and D. B. Nelson (1994) ARCH models. Handbook of Econometrics, vol. IV, pp. 2961-3031, New York: Elsevier Science.
Bollerslev, Tim and Hans O. Mikkelsen (1996) Modeling and pricing long memory in stock
market volatility. Journal of Econometrics, vol. 73, pp. 151-184.
Brockwell, Peter J. and Richard A. Davis (1991) Time series: theory and methods. Second Edition, New York: Springer-Verlag.
Chung, Ching-Fan and Richard T. Baillie (1993) Small sample bias in conditional sum-of-
squares estimators of fractionally integrated ARMA models. Empirical Economics, vol. 18,
pp. 791-806.
Ding, Z. and Clive W.J. Granger (1996) Modeling volatility persistence of speculative returns:
a new approach. Journal of Econometrics, vol. 73, pp. 185-215.
Engle, Robert F. (2000) The econometrics of ultra-high-frequency data. Econometrica, vol. 68,
no. 1, pp. 1-22.
Engle, Robert F. (1982) Autoregressive conditional heteroskedasticity with estimates of the
variance of U.K. inflation. Econometrica, vol. 50, pp. 987-1008.
Engle, Robert F. and Tim Bollerslev (1986) Modeling persistence of conditional variance. Econometric Reviews, vol. 5, pp. 1-50.
Engle, Robert F. and Jeffrey R. Russell (1998) Autoregressive conditional duration: a new
model for irregularly spaced transaction data. Econometrica, vol. 66, pp. 1127-1162.
Geweke, John and Susan Porter-Hudak (1983) The estimation and application of long memory
time series models. Journal of Time Series Analysis, vol. 4, pp. 221-238.
Giraitis, Liudas, Piotr Kokoszka and Remigijus Leipus (2000) Stationary ARCH models: de-
pendence structure and central limit theorem. Econometric Theory, vol. 16, pp. 3-22.
Giraitis, Liudas, Piotr Kokoszka, Remigijus Leipus and Gilles Teyssiere (2000) Semiparametric
estimation of the intensity of long memory in conditional heteroskedasticity. Statistical
Inference for Stochastic Processes, vol. 3, pp. 113-128.
Giraitis, Liudas, Peter M. Robinson and Donatas Surgailis (2000) A model for long memory
conditional heteroscedasticity. The Annals of Applied Probability, vol. 10, pp. 1002-1024.
Granger, Clive W.J. and Joyeux, R. (1980) An introduction to long memory time series models
and fractional differencing. Journal of Time Series Analysis, vol. 1, pp. 15-39.
Hosking, Jonathan R.M. (1981) Fractional differencing. Biometrika, vol. 68, pp. 165-76.
Hosking, Jonathan R.M. (1996) Asymptotic distributions of the sample mean, autocovariances,
and autocorrelations of long-memory time series. Journal of Econometrics, vol. 73, pp. 261-
284.
Jasiak, Joanna (1998) Persistence in intertrade durations. Finance, vol. 19, pp. 166-195.
Kazakevicius, Vytautas and Remigijus Leipus (2002) On stationarity in the ARCH(∞) model.
Econometric Theory, vol. 18, pp. 1-16.
Kokoszka, Piotr and Remigijus Leipus (2000) Chainge-point estimation in ARCH models.
Bernoulli, vol. 6, pp. 1-28.
Lee, Sang-Won and Bruce E. Hansen (1994) Asymptotic theory for the GARCH(1,1) quasi-
maximum likelihood estimator. Econometric Theory, vol. 10, pp. 29-52.
Lukacs, Eugene (1975) Stochastic convergence. Second Edition, Academic Press, Inc.
Lumsdaine, Robin L. (1996) Consistency and asymptotic normality of the quasi-maximum like-
lihood estimator in IGARCH(1,1) and covariance stationary GARCH(1,1) models. Econo-
metrica, vol. 64, pp. 575-596.
Mandelbrot, B. and J. W. Van Ness (1968) Fractional Brownian motions, fractional noises and
applications. SIAM Review, vol. 10, pp. 422-437.
Marinucci, D. and Peter M. Robinson (1999) Alternative forms of fractional Brownian motion.
Journal of Statistical Planning and Inference, vol. 80, pp. 111-122.
McLeod, A. I. and K. W. Hipel (1978) Preservation of the rescaled adjusted range, 1: a
reassessment of the Hurst phenomenon, Water Resources Research, vol. 14, pp. 491-508.
Nelson, Daniel B. (1990) Stationarity and persistence in the GARCH(1,1) model. Econometric
24
Theory, vol. 6, pp. 318-334.
Robinson, Peter M. (1991) Testing for strong serial correlation and dynamic conditional het-
eroskedasticity in multiple regression. Journal of Econometrics, vol. 47, pp. 67-84.
Robinson, Peter M. and M. Henry (1999) Long and short memory conditional heteroscedasticity
in estimating the memory parameter of levels. Econometric Theory, vol. 15, pp. 299-336.
Stout, William F. (1974) Almost sure convergence. Academic Press, Inc.
Chapter 2: Long memory ARCH(∞) models: specification and quasi–maximum likelihood estimation
Long memory ARCH(∞) models: specification and
quasi-maximum likelihood estimation
Dmitri Koulikov∗
Department of Economics
School of Economics and Management
University of Aarhus
Aarhus C, 8000, Denmark
phone: +45 89421577
e-mail: [email protected]
This revision:
December 8, 2003
Abstract
The paper introduces the long memory ARCH(∞) model and studies the asymptotic properties of its QML estimator. The class of ARCH(∞) sequences of Robinson (1991) includes many popular models in dynamic conditional volatility modeling and high-frequency financial econometrics. Giraitis, Kokoszka and Leipus (2000) show that the covariance stationary solution of the ARCH(∞) model with a non-zero intercept has a summable autocovariance function, and hence short memory as defined in McLeod and Hipel (1978). The first part of the paper shows that a long memory non-negative solution of the ARCH(∞) model exists as well, but requires the intercept to be zero. The result is established using the equivalence between covariance stationary ARCH(∞) and MD-ARCH(∞) models, the latter studied in Koulikov (2003). The second part of the paper examines the properties of a time-domain QML estimator of the memory parameter of the new model. Strong consistency and asymptotic normality of the infeasible estimator are established, while only consistency is shown for the feasible estimator. A Monte Carlo experiment is conducted to assess the properties of the QML estimator in finite samples.
JEL classification: C13, C15, C22
Keywords: Conditional heteroscedasticity, Long-memory, ARCH(∞), Weak stationarity,
Quasi-maximum likelihood estimation
∗The most important parts of the paper were completed during my visit to Nuffield College, University of Oxford, in spring 2003. The College's hospitality is gratefully acknowledged. I would also like to thank Bent Jesper Christensen, Neil Shephard, Bent Nielsen and Matthias Winkel for their helpful suggestions.
1 Introduction
The class of ARCH(∞) processes, defined in Robinson (1991) and studied in detail in recent papers by Giraitis, Kokoszka and Leipus (2000) and Kazakevicius and Leipus (2002), is an important class of models for dynamic sequences of non-negative random variables, henceforth denoted $\{(X_t,\psi_t) : t\in\mathbb{Z}\}$. The stationary sequence $\{(X_t,\psi_t) : t\in\mathbb{Z}\}$ is said to be ARCH(∞) if it satisfies the following set of stochastic equations:
$$X_t = \psi_t\,\varepsilon_t, \qquad \psi_t = a^* + \sum_{j=1}^{\infty}\pi_{j-1}X_{t-j}, \tag{1}$$
where $\{\varepsilon_t : t\in\mathbb{Z}\}$ is a sequence of non-negative shocks and all parameters are assumed to be non-negative. The popular GARCH(p,q) model of Bollerslev (1986) and many of its subsequent variations are the most widely used parametrizations of the ARCH(∞) process; see Giraitis, Kokoszka and Leipus (2000) for a thorough description and further examples.
This class of models is particularly useful in finance, where non-negative time series are common. Examples include various volatility measures and, more recently, durations between market events in high-frequency financial datasets. The complicated nature of the dynamic dependence in such series is a well established empirical fact, as highlighted inter alia in Andersen, Bollerslev, Diebold and Labys (2001) and Jasiak (1998).
Yet several outstanding issues related to ARCH(∞) sequences have hitherto received limited attention in the literature. One such issue is the existence of a long memory solution of the ARCH(∞) model. This is especially relevant as many financial series exhibit strong dependence, often characterized by a highly persistent estimated autocovariance function. One of the early attempts to define a long memory ARCH(∞) model is the study by Ding and Granger (1996), who also present a wealth of empirical evidence of persistent autocorrelations in squared and absolute returns of several real-world financial series. One of their conclusions is that existing GARCH and IGARCH volatility models do not adequately fit the observed autocorrelation structure of the empirical volatility estimates of their series.
In order to account for persistent volatility, Ding and Granger (1996) suggest the following parametrization of the ARCH(∞) process:
$$X_t = \psi_t\,\varepsilon_t, \qquad \psi_t = a\int_0^1 \frac{1-\alpha-\beta}{1-\beta}\,dF(\alpha,\beta) + \sum_{j=1}^{\infty}\left[\int_0^1 \alpha\beta^{j-1}\,dF(\alpha,\beta)\right]X_{t-j},$$
where $a>0$, $\alpha,\beta\ge 0$ s.t. $0\le\alpha+\beta\le 1$, and $F(\alpha,\beta)$ is a joint distribution function of the parameters α and β. This model is motivated by the influential study of Granger (1980), which shows that a linear fractionally integrated process can arise from the aggregation of an infinite number of short memory AR(1) components, each with an autoregressive coefficient drawn randomly from a specific distribution on the unit interval. Ding and Granger (1996) use a similar idea and show that aggregation of short memory GARCH(1,1) processes leads to the model above, where a, α and β correspond to the respective parameters of the component GARCH(1,1) processes.
Additional parametric assumptions on the distribution function $F(\alpha,\beta)$ allow Ding and Granger (1996) to identify two important cases of their model. The first case corresponds to the restriction $\alpha+\beta<1$, resulting in the following ARCH(∞) model:
$$X_t = \psi_t\,\varepsilon_t, \qquad \psi_t = a(1-\mu) + \mu\sum_{j=1}^{\infty}\pi_{j-1}X_{t-j}, \tag{2}$$
where $0<\mu<1$ and the sequence of coefficients $\{\pi_j : j\ge 0\}$ is defined for $p,d>0$ as $\pi_j := \frac{d\,\Gamma(p+d)\,\Gamma(p+j)}{\Gamma(p)\,\Gamma(p+d+j+1)}$, where Γ is the gamma function. Using properties of the hypergeometric function one can show that $\sum_{j=0}^{\infty}\pi_j = 1$ for all values of p and d. The second case arises when $F(\alpha,\beta)$ is such that $\alpha+\beta=1$. Ding and Granger (1996) show that the resulting process can be written as:
$$X_t = \psi_t\,\varepsilon_t, \qquad \psi_t = \sum_{j=1}^{\infty}\pi_{j-1}X_{t-j}. \tag{3}$$
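The claim $\sum_{j\ge 0}\pi_j = 1$ can be checked numerically. The sketch below is in Python (an illustrative choice; the function names are hypothetical and not from the paper). Instead of the hypergeometric argument, it uses the elementary telescoping identity $\frac{\Gamma(p+j)}{\Gamma(p+d+j+1)} = \frac{1}{d}\left[\frac{\Gamma(p+j)}{\Gamma(p+d+j)} - \frac{\Gamma(p+j+1)}{\Gamma(p+d+j+1)}\right]$, which gives the closed form $\sum_{j=0}^{N-1}\pi_j = 1 - \frac{\Gamma(p+d)\Gamma(p+N)}{\Gamma(p)\Gamma(p+d+N)}$. Since the remainder behaves like $N^{-d}$, the series converges slowly.

```python
import math


def pi_coef(j: int, p: float, d: float) -> float:
    """pi_j = d*Gamma(p+d)*Gamma(p+j) / (Gamma(p)*Gamma(p+d+j+1)), computed via log-gamma."""
    log_val = (math.lgamma(p + d) + math.lgamma(p + j)
               - math.lgamma(p) - math.lgamma(p + d + j + 1))
    return d * math.exp(log_val)


def partial_sum(n_terms: int, p: float, d: float) -> float:
    """Direct sum pi_0 + ... + pi_{n_terms-1}."""
    return sum(pi_coef(j, p, d) for j in range(n_terms))


def closed_form_partial_sum(n_terms: int, p: float, d: float) -> float:
    """1 - Gamma(p+d)Gamma(p+N) / (Gamma(p)Gamma(p+d+N)), from the telescoping identity."""
    log_tail = (math.lgamma(p + d) + math.lgamma(p + n_terms)
                - math.lgamma(p) - math.lgamma(p + d + n_terms))
    return 1.0 - math.exp(log_tail)


if __name__ == "__main__":
    p, d = 0.65, 0.35  # p = 1 - d, the restricted case used later in (12)
    for n in (10, 1_000, 100_000):
        print(n, partial_sum(n, p, d), closed_form_partial_sum(n, p, d))
```

With p = 0.65 and d = 0.35 the direct and closed-form partial sums agree to rounding error, while the remainder at N = 100000 is still of order $10^{-2}$.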
Having defined the processes (2) and (3), Ding and Granger (1996) leave a number of issues open for further research. One of the most important concerns the existence of a covariance stationary solution of their models, together with the corresponding properties of the autocovariance function. In particular, Ding and Granger (1996) conjecture that the process (2) has a non-summable autocovariance function, that is, long memory by the widely used definition of McLeod and Hipel (1978). However, Giraitis, Kokoszka and Leipus (2000) show that a sufficient covariance stationarity condition for this process implies summable autocovariances, and therefore short memory.
Yet more questions arise in connection with the model defined in (3), which in the remainder of the paper is referred to as the ARCH(∞) model with zero intercept. Under i.i.d. assumptions on the sequence of shocks $\{\varepsilon_t : t\in\mathbb{Z}\}$ and $E\varepsilon_t\sum_{j=0}^{\infty}\pi_j < 1$, Giraitis, Kokoszka and Leipus (2000) show that the unique solution of (3) is given by $X_t=\psi_t=0$ for all $t\in\mathbb{Z}$. The same trivial solution for the class of short memory GARCH processes with zero intercept appears in Bougerol and Picard (1992). Moreover, even if a non-degenerate solution of (3) exists, its covariance stationarity and long memory remain open questions. Note that none of the component GARCH processes in this model is covariance stationary, indeed not even first order stationary, owing to the restriction $\alpha+\beta=1$ imposed in the derivation of (3).
In this paper we focus on the class of ARCH(∞) processes with zero intercept, analogous to the one defined in (3). We show that there exists a non-degenerate, non-negative covariance stationary solution of such processes, which coincides with the solution of the recently studied class of MD-ARCH(∞) processes of Koulikov (2003). In particular, this solution is shown to have a non-summable autocovariance function. In section 3 we show consistency and asymptotic normality of the QML estimator for the mean and memory parameters of a simple long memory ARCH(∞) model. Finite sample properties of the estimator are examined in a small Monte Carlo study in section 4. Proofs of the main results of the paper are collected in the appendix.
2 Long memory solution of the ARCH(∞) equations
The class of ARCH(∞) models with zero intercept, such as model (3) of Ding and Granger (1996), does not have a Volterra series expansion similar to the one used in Giraitis, Kokoszka and Leipus (2000) and Kazakevicius and Leipus (2002). Hence, the sufficient stationarity and covariance stationarity conditions derived in these papers are not immediately applicable to this class of models. In order to find the solution of (3) and to study its properties, a new representation in terms of the sequence of i.i.d. shocks $\{\varepsilon_t : t\in\mathbb{Z}\}$ is needed. As we show below, under a set of mild conditions on the sequence of coefficients $\{\pi_j : j\ge 0\}$, the covariance stationary solution of the ARCH(∞) models with zero intercept coincides with the long memory solution of the class of MD-ARCH(∞) sequences studied in Koulikov (2003).
We begin by examining a set of necessary conditions for the existence of a covariance
stationary solution of the general ARCH(∞) sequences (1). Kazakevicius and Leipus (2002)
study conditions for existence of a strictly stationary solution of the ARCH(∞) model, but their
approach requires a∗ > 0. As demonstrated in Theorem 1, non-negative covariance stationary
solutions of the ARCH(∞) equations do not require this restriction.
THEOREM 1. Let $\{(X_t,\psi_t) : t\in\mathbb{Z}\}$ be a non-negative covariance stationary solution of (1) satisfying $EX_t = E\psi_t = a$ for some $a\ge 0$. Then the following conditions hold when $a>0$:
$$a^*\ge 0, \qquad \sum_{j=0}^{\infty}\pi_j\le 1, \quad\text{where } a^*>0 \text{ iff } \sum_{j=0}^{\infty}\pi_j<1, \tag{4}$$
$$\sum_{j=1}^{\infty}\pi_{j-1}\,E(X_t-a)(X_{t-j}-a) < \infty. \tag{5}$$
In addition, the only non-negative stationary solution of (1) when $a=0$ is given by $X_t=\psi_t=0$ for all $t\in\mathbb{Z}$.
Condition (5) on the autocovariance function of any covariance stationary solution of the ARCH(∞) model is an important result of the theorem. In particular, it does not rule out long memory solutions of the ARCH(∞) equations, whose autocovariance function satisfies $E(X_t-a)(X_{t-j}-a) = O(j^{2d-1})$: since the autocovariances are bounded by the variance and $\{\pi_j : j\ge 0\}$ is summable, the condition is fulfilled.
An equally notable result of Theorem 1 is the link between $a^*$ and the sequence of coefficients $\{\pi_j : j\ge 0\}$ shown in (4). Giraitis, Kokoszka and Leipus (2000) demonstrate that for an ARCH(∞) process with $a^*>0$, the condition $\sum_{j=0}^{\infty}\pi_j<1$, together with some additional assumptions on the sequence of shocks $\{\varepsilon_t : t\in\mathbb{Z}\}$, implies summability of the autocovariance function of the weakly stationary solution. Hence, any long memory solution of model (1) must belong to the class of ARCH(∞) models with zero intercept, such as model (3) of Ding and Granger (1996).
For convenience of the subsequent exposition, the assumptions on the ARCH(∞) model (1) that are repeatedly referred to in the remainder of the paper are collected below:
A1. $\{\varepsilon_t : t\in\mathbb{Z}\}$ is defined on a common probability space $(\Omega,\mathcal{F},P)$ and consists of i.i.d. copies of a non-negative random variable $\varepsilon_0$ with $E\varepsilon_0=1$ and $E(\varepsilon_0-1)^2<\infty$.
A2. $a^*\ge 0$ and $\{\pi_j : j\ge 0\}\subseteq\mathbb{R}_0^+$ s.t. $\sum_{j=0}^{\infty}\pi_j\le 1$.
We note that A1 is comparable to the assumptions of Giraitis, Kokoszka and Leipus (2000), and is necessary for the existence of a covariance stationary solution of the model with $a^*>0$.
While Theorem 1 shows that the ARCH(∞) process with zero intercept can have covariance stationary solutions with non-summable autocovariances, the existence of such solutions has yet to be established. Consider the MD-ARCH(∞) process, studied in detail in Koulikov (2003):
$$X^*_t = \psi^*_t\,\varepsilon_t, \qquad \psi^*_t = a + \sum_{j=1}^{\infty}\theta_{j-1}\left(X^*_{t-j}-\psi^*_{t-j}\right). \tag{6}$$
Under A1 and $E(\varepsilon_0-1)^2\sum_{j=0}^{\infty}\theta_j^2<1$, a covariance stationary solution of this process is derived in Koulikov (2003); it is given by the following Volterra series expansion of (6):
$$X^*_t = \psi^*_t\,\varepsilon_t, \qquad \psi^*_t = a\sum_{k=0}^{\infty}M(k,t), \tag{7}$$
where each element of the sequence $\{M(k,t) : k\ge 0,\ t\in\mathbb{Z}\}$ is a non-linear square-integrable function of the underlying sequence of i.i.d. innovations $\{\varepsilon_t : t\in\mathbb{Z}\}$. Let the sequence of coefficients $\{\theta_j : j\ge 0\}$ in (6) be related to $\{\pi_j : j\ge 0\}$ in the ARCH(∞) model as follows. Define $P(z) := 1-\sum_{j=1}^{\infty}\pi_{j-1}z^j$ on the open complex unit disc. When $P(z)\ne 0$ for all $|z|<1$, let $P^{-1}(z) = 1+\sum_{j=1}^{\infty}\theta_{j-1}z^j$ be given by the power series expansion of $1/P(z)$ around $z=0$. Then $P(z)P^{-1}(z)=1$ for all $|z|<1$, and matching the coefficients of $z^{j+1}$ on both sides shows that $\{\theta_j : j\ge 0\}$ is related to $\{\pi_j : j\ge 0\}$ as:
$$\theta_0=\pi_0, \qquad \theta_j = \pi_j + \sum_{i=0}^{j-1}\theta_{j-1-i}\,\pi_i \quad \forall j\ge 1. \tag{8}$$
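The recursion (8) is easy to run. As a check, for the model (12) introduced below, $P(z) = (1-z)^d$, so (8) should reproduce the coefficients of $(1-z)^{-d}$, namely $\theta_j = \frac{\Gamma(j+1+d)}{\Gamma(d)\,\Gamma(j+2)}$. The following Python sketch (hypothetical helper names; an illustration, not code from the paper) computes $\pi_j$ from the parametrization (14) and compares the recursion with that closed form.

```python
import math


def pi_coefs(n: int, d: float) -> list[float]:
    """pi_j = d*Gamma(1-d+j) / (Gamma(1-d)*Gamma(2+j)) for j = 0..n-1, i.e. coefficients of 1-(1-z)^d."""
    return [d * math.exp(math.lgamma(1 - d + j) - math.lgamma(1 - d) - math.lgamma(2 + j))
            for j in range(n)]


def theta_from_pi(pi: list[float]) -> list[float]:
    """Recursion (8): theta_0 = pi_0, theta_j = pi_j + sum_{i<j} theta_{j-1-i} * pi_i."""
    theta: list[float] = []
    for j in range(len(pi)):
        acc = pi[j]
        for i in range(j):
            acc += theta[j - 1 - i] * pi[i]
        theta.append(acc)
    return theta


def theta_closed_form(n: int, d: float) -> list[float]:
    """Coefficients of (1-z)^{-d} - 1: theta_j = Gamma(j+1+d) / (Gamma(d)*Gamma(j+2))."""
    return [math.exp(math.lgamma(j + 1 + d) - math.lgamma(d) - math.lgamma(j + 2))
            for j in range(n)]


if __name__ == "__main__":
    d = 0.35
    rec = theta_from_pi(pi_coefs(200, d))
    ref = theta_closed_form(200, d)
    print(max(abs(r - c) for r, c in zip(rec, ref)))
```

The double loop is quadratic in the number of coefficients, which is adequate for a check of a few hundred terms.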
Theorem 2 establishes the link between covariance stationary solutions of the MD-ARCH(∞) and ARCH(∞) models.
THEOREM 2. Under A1–A2, consider covariance stationary solutions of the ARCH(∞) process (1) and of the MD-ARCH(∞) process (6), where $EX_t = EX^*_t = a > 0$ and the sequences $\{\theta_j : j\ge 0\}$ and $\{\pi_j : j\ge 0\}$ are related as in (8). Assume that the following condition is satisfied:
$$\sum_{j=N}^{\infty}\left(\pi_j + \sum_{i=0}^{N-1}\theta_{j-1-i}\,\pi_i\right)\to 0 \quad\text{as } N\to\infty. \tag{9}$$
Then the solution $\{(X_t,\psi_t) : t\in\mathbb{Z}\}$ of model (1) satisfies the following set of stochastic equations:
$$X_t=\psi_t\,\varepsilon_t, \qquad \psi_t = a + \sum_{j=1}^{\infty}\theta_{j-1}\left(X_{t-j}-\psi_{t-j}\right). \tag{10}$$
Similarly, the solution $\{(X^*_t,\psi^*_t) : t\in\mathbb{Z}\}$ of model (6) satisfies the following set of stochastic equations:
$$X^*_t=\psi^*_t\,\varepsilon_t, \qquad \psi^*_t = a + \sum_{j=1}^{\infty}\pi_{j-1}\left(X^*_{t-j}-a\right). \tag{11}$$
Figure 1: Graph of $E(\varepsilon_0-1)^2\left[\frac{\Gamma(1-2d)}{\Gamma(1-d)^2}-1\right]=1$, with d (from 0 to 0.5) on the horizontal axis and $E(\varepsilon_0-1)^2$ (from 0 to 10) on the vertical axis, separating the covariance stationary region (below the curve) from the non-stationary region (above the curve) of model (12).
It follows from Theorem 2 that the class of covariance stationary MD-ARCH(∞) sequences studied in Koulikov (2003) has the ARCH(∞) representation (11). In particular, the Volterra series form (7) gives the solution of any covariance stationary ARCH(∞) sequence $\{(X_t,\psi_t) : t\in\mathbb{Z}\}$, including the long memory case studied in Koulikov (2003). By Theorem 1, the long memory solution of (1) has $a^*=0$ and $\sum_{j=0}^{\infty}\pi_j=1$. Model (3) of Ding and Granger (1996) gives one possible parametrization of long memory ARCH(∞) processes.
Giraitis, Kokoszka and Leipus (2000) consider the following ARCH(∞) process, corresponding to the restricted version of model (3) of Ding and Granger (1996) with $p=1-d$:
$$X_t=\psi_t\,\varepsilon_t, \qquad \psi_t = \left[1-(1-L)^d\right]X_t, \tag{12}$$
where L denotes the lag operator. The techniques used by Giraitis, Kokoszka and Leipus (2000) do not allow them to establish conditions for the covariance stationarity of this process. Theorem 2 shows that the covariance stationary solution of (12) is the same as the covariance stationary solution of the following long memory MD-ARCH(∞) model:
$$X_t=\psi_t\,\varepsilon_t, \qquad \psi_t = a + \left[(1-L)^{-d}-1\right](X_t-\psi_t), \tag{13}$$
provided that condition (9) is fulfilled. Indeed, the recursive structure of the coefficients in the power series expansions of $1-(1-z)^d$ and $(1-z)^{-d}-1$ allows us to write:
$$\sum_{j=N}^{\infty}\left(\pi_j+\sum_{i=0}^{N-1}\theta_{j-1-i}\,\pi_i\right) = \frac{\theta_N(N+1)}{\pi_0}\sum_{j=1}^{\infty}\frac{j}{N+j}\,\pi_{j-1} \le K_1N^{d}\sum_{j=1}^{\infty}\frac{j^{-2-d}}{j+N} \le K_2N^{d}\int_0^{\infty}\frac{s^{-\frac{3}{2}d}}{N+s}\,ds = O(N^{-d})\to 0 \quad\text{as } N\to\infty,$$
where $K_1,K_2>0$ are constants, and we use (18.25) in Spiegel and Liu (1999) to evaluate the integral.
Under A1, the sufficient covariance stationarity condition in Theorem 2 of Koulikov (2003) implies that the parameter d in model (12) has to satisfy the inequality
$$E(\varepsilon_0-1)^2\left[\frac{\Gamma(1-2d)}{\Gamma(1-d)^2}-1\right]<1.$$
This is the condition $E(\varepsilon_0-1)^2\sum_{j=0}^{\infty}\theta_j^2<1$ applied to (13), since for the coefficients of $(1-z)^{-d}-1$ one has $\sum_{j=0}^{\infty}\theta_j^2 = \frac{\Gamma(1-2d)}{\Gamma(1-d)^2}-1$. The inequality, linking the variance of the shocks and the range of the parameter d, is akin to similar conditions for the class of short memory GARCH models; see Giraitis, Kokoszka and Leipus (2000). Figure 1 depicts the covariance stationary region of the model in the $(d, E(\varepsilon_0-1)^2)$ plane. It follows that for shocks with unit variance, the parameter d is restricted to the approximate interval [0, 0.395]. When $\{\varepsilon_t : t\in\mathbb{Z}\}$ consists of i.i.d. copies of a $\chi^2_1$ random variable (which has variance 2), a choice popular in the empirical volatility modeling literature, the approximate interval of d corresponding to the covariance stationary region of the model shrinks to [0, 0.340].
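The interval endpoints quoted above can be reproduced by solving $\sigma^2\left[\Gamma(1-2d)/\Gamma(1-d)^2-1\right]=1$ for d, where $\sigma^2 = E(\varepsilon_0-1)^2$. The left-hand side is increasing in d on $[0, 1/2)$, equals zero at $d=0$ and diverges as $d\to 1/2$, so plain bisection suffices. This is a Python sketch with hypothetical function names; the choice of bisection is ours, not the paper's.

```python
import math


def excess_variance_factor(d: float) -> float:
    """Gamma(1-2d)/Gamma(1-d)^2 - 1, i.e. the sum of squared coefficients of (1-z)^{-d} - 1."""
    return math.exp(math.lgamma(1.0 - 2.0 * d) - 2.0 * math.lgamma(1.0 - d)) - 1.0


def stationarity_bound(shock_var: float, tol: float = 1e-10) -> float:
    """Largest d in [0, 1/2) with shock_var * excess_variance_factor(d) < 1, found by bisection."""
    lo, hi = 0.0, 0.5 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shock_var * excess_variance_factor(mid) < 1.0:
            lo = mid  # mid is still inside the covariance stationary region
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(stationarity_bound(1.0))  # unit-variance shocks
    print(stationarity_bound(2.0))  # raw chi-square(1) shocks, variance 2
```

The two calls return values close to 0.395 and 0.340, in line with the intervals stated in the text.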
3 Quasi-maximum likelihood estimator for the long memory ARCH(∞) model
Since the introduction of the GARCH(p,q) model by Bollerslev (1986), there has been continued interest in the statistical properties of various estimators of the parameters of this class of models. Recent studies by Giraitis, Kokoszka, Leipus and Teyssiere (2000) and Giraitis and Robinson (2001) consider certain semiparametric estimators of ARCH(∞) models with non-zero intercept; both rely on minimal additional assumptions about the process, namely the weak limit of normalized partial sums in the former and a specific local behavior of the spectral density around the origin in the latter. The quasi-maximum likelihood (QML) estimator for the class of GARCH(p,q) processes has been examined in a number of studies, most notably Weiss (1986), Lee and Hansen (1994), Lumsdaine (1996) and Berkes, Horvath and Kokoszka (2003). The latter covers the entire class of stationary GARCH(p,q) sequences by making use of their ARCH(∞) representation. Robinson and Zaffaroni (2003) attempt to provide an asymptotic theory of the QML estimator for a yet wider class of ARCH(∞) sequences with non-zero intercept, including the FIGARCH model of Baillie, Bollerslev and Mikkelsen (1996). However, the existence and properties of stationary solutions of FIGARCH sequences have not been studied in the literature, making it difficult to develop a rigorous estimation theory for this class of models.
None of the studies cited above, however, consider the QML estimator in the context of the long memory ARCH(∞) processes examined in section 2. Recall that by Theorems 1 and 2 the long memory ARCH(∞) model is required to have $a^*=0$, as in models (3) and (12). This poses a number of challenges not previously addressed in the estimation literature. In particular, the existence of $E\psi_t^{-\nu}$ for some $\nu>0$ is not an immediate consequence of the process structure and has to be established using a new set of techniques. Negative moments of $\psi_t$ are required for the existence of the limiting log-likelihood function and of various ratios of random variables in a number of auxiliary results. Another issue, commonly present in the time-domain QML estimation of linear long memory processes, is the relatively slow convergence of the log-likelihood function and its derivatives to their respective limits. This has important implications for the limiting distribution of the estimator; see Yajima (1985) for the case of Gaussian fractionally integrated processes and Robinson and Zaffaroni (2003) for the non-stationary FIGARCH model.
In this section we examine the properties of the time-domain QML estimator of the long memory ARCH(∞) model (12). For a time series of non-negative random variables $\{(X_t,\psi_t) : t\in\mathbb{Z}\}$ with non-summable autocovariances, this model offers a particularly simple parametrization in terms of only two parameters, a and d. In this respect (12) is similar to the well studied fractional white noise model of Granger (1980) and Hosking (1981). Moreover, as will be shown in Theorem 3, the asymptotic properties of the estimators in the two models are remarkably similar as well.
In addition to A1–A2, the QML estimator studied in this section is based on the following assumptions:
A3. The distribution function of the random variable $\varepsilon_0$ satisfies $\lim_{s\to 0}s^{-\mu}P\{\varepsilon_0\le s\}=0$ for some $\mu>0$.
A4. Let $D := [\underline{d},\bar{d}]$ for some $0<\underline{d}<\bar{d}<1$. Let $d_0\in D_0$ and $a_0\in\mathbb{R}^+$, where $D_0\subseteq D$ is the covariance stationary region of the model.
Note that A3 is equivalent to the corresponding assumption in Berkes, Horvath and Kokoszka (2003), and ensures the existence of $N\ge 1$ such that $E\left(\sum_{n=1}^{N}\varepsilon_n\right)^{-1}<\infty$ without imposing specific distributional assumptions on the sequence of i.i.d. shocks $\{\varepsilon_t : t\in\mathbb{Z}\}$.
Suppose that a finite stretch $\{X_t : 1\le t\le T\}$ of a covariance stationary solution $\{X_t : t\in\mathbb{Z}\}$ of the long memory ARCH(∞) model (12) is observed by an econometrician. Under A4, let $a_0$ and $d_0$ be the unknown parameters of (12), and suppose that statistical inference on these parameters is required. Define a sequence of positive functions $\{\pi_j(d) : j\ge 0\}$ as follows:
$$\pi_j(d) := \frac{d\,\Gamma(1-d+j)}{\Gamma(1-d)\,\Gamma(2+j)}, \tag{14}$$
for $d\in D$. Let $\pi_j(d_0)$ be denoted simply as $\pi_j$ for all $j\ge 0$. In particular, definition (14) implies that $\{\pi_j(d) : j\ge 0\}\subseteq\mathbb{R}^+$ for all $d\in D$. Denoting the n-th derivative of (14) with respect to d by $\pi_j^{(n)}(d)$, with $\pi_j^{(0)}(d)=\pi_j(d)$, the following inequality holds for all $j\ge 1$ and $n\ge 0$:
$$\left|\pi_{j-1}^{(n)}(d)\right| \le K_n\,[\log j]^n\,j^{-1-d}, \tag{15}$$
where $\{K_n : n\ge 0\}$ is a sequence of non-negative constants. Furthermore, define a sequence of non-negative functions $\{w_t(d) : 1\le t\le T\}$ as follows:
$$w_t(d) = \sum_{j=1}^{\infty}\pi_{j-1}(d)\,X_{t-j}, \tag{16}$$
so that $w_t(d_0)=\psi_t$ by (12). Let $w_t^{(n)}(d)$ denote the n-th derivative of (16) with respect to d. Finally, let the log-likelihood function based on (16) be defined as:
$$L_T(d) = -\sum_{t=1}^{T}\left[\log w_t(d) + \frac{X_t}{w_t(d)}\right], \tag{17}$$
and let the QML estimator of the parameter $d_0$ be given by:
$$\hat{d}_T = \arg\max_{d\in D}\frac{1}{T}L_T(d). \tag{18}$$
For exponential shocks $\{\varepsilon_t : t\in\mathbb{Z}\}$, the function (17) is the proper log-likelihood function of the model. However, apart from A1 and A3, the asymptotic properties of the estimator (18) derived in this section do not depend on particular distributional assumptions about $\varepsilon_0$, and we therefore refer to (18) as the QML estimator for model (12). Theorem 3 shows consistency and asymptotic normality of the QML estimator defined above.
THEOREM 3. Under A1–A4, let the sequence of estimators $\{\hat{d}_T : T\ge 1\}$ of the long memory ARCH(∞) model (12) be defined as in (18). Then:
$$\hat{d}_T\xrightarrow{a.s.}d_0 \quad\text{as } T\to\infty. \tag{19}$$
In addition:
$$T^{1/2}(\hat{d}_T-d_0)\xrightarrow{d}N\!\left(0,\frac{6}{\pi^2}\right) \quad\text{as } T\to\infty, \tag{20}$$
where $N(0,S)$ denotes a univariate normal distribution with mean 0 and variance S.
We note the similarity of the limiting distribution of $\hat{d}_T$ in (20) to that of the QML estimator in the fractional white noise model established in Yajima (1985). Notably, the asymptotic variance of $T^{1/2}(\hat{d}_T-d_0)$ is the same in both models in spite of their markedly different structure. In addition, the estimator of the memory parameter does not involve the mean parameter $a_0$, as the log-likelihood function (17) does not depend on $a_0$.
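The limiting distribution (20) yields a simple Wald-type interval $\hat d_T \pm z_{0.975}\sqrt{6/(\pi^2 T)}$, where $6/\pi^2$ is the reciprocal of $\sum_{j\ge 1}j^{-2}=\pi^2/6$. Intervals of this form presumably underlie the 95% coverage columns in Tables 1 and 2, though the paper does not spell out the construction. The Python sketch below uses hypothetical function names and the standard library's `statistics.NormalDist` for the normal quantile.

```python
import math
from statistics import NormalDist


def asymptotic_se(T: int) -> float:
    """Standard error of the memory estimator implied by (20): sqrt(6 / (pi^2 * T))."""
    return math.sqrt(6.0 / (math.pi ** 2 * T))


def asymptotic_ci(d_hat: float, T: int, level: float = 0.95) -> tuple[float, float]:
    """Wald-type interval d_hat +/- z * se based on the normal limit in Theorem 3."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * asymptotic_se(T)
    return d_hat - half_width, d_hat + half_width


if __name__ == "__main__":
    for T in (500, 2000):
        print(T, asymptotic_ci(0.35, T))
```

For T = 2000 the half-width is about 0.034, and for T = 500 it is about 0.068.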
As in Yajima (1985), we propose the following simple estimator of $a_0$:
$$\hat{a}_T = \frac{1}{T}\sum_{t=1}^{T}X_t. \tag{21}$$
Asymptotic properties of the sequence $\{\hat{a}_T : T\ge 1\}$ follow from Theorem 4 of Koulikov (2003), using the equivalence of representations (12) and (13). In particular, $\hat{a}_T$ is weakly consistent for $a_0$, with limiting distribution given by:
$$T^{1/2-d_0}(\hat{a}_T-a_0)\xrightarrow{d}N\!\left(0,c^2_{a_0,d_0}\right) \quad\text{as } T\to\infty,$$
where no closed form expression for $c^2_{a_0,d_0}$ is available, although it is known to depend on both $a_0$ and $d_0$. The rate $T^{1/2-d_0}$ of the estimator $\hat{a}_T$ is in line with the corresponding results of Brockwell and Davis (1991) for the class of linear long memory models.
In practice, the estimator (18) is not feasible, as the functions $\{w_t(d) : 1\le t\le T\}$ require the infinite past $\{X_t : t\le 0\}$ of the process, which is not observed. However, the asymptotic properties of such an estimator are easier to establish. Therefore, the literature on maximum likelihood estimation of GARCH models commonly defines both feasible and infeasible estimators and shows that they converge to the same limit. We proceed in an analogous way. Consider the following sequence of functions, related to (16):
$$\tilde{w}_1(d) = a_0, \qquad \tilde{w}_t(d) = a_0\sum_{j=t}^{\infty}\pi_{j-1}(d) + \sum_{j=1}^{t-1}\pi_{j-1}(d)\,X_{t-j} \quad\text{for } 1<t\le T, \tag{22}$$
where $d\in D$; that is, $\tilde{w}_t(d)$ replaces each unobserved pre-sample value $X_{t-j}$, $j\ge t$, in (16) by its mean $a_0$. The feasible QML estimator is defined as:
$$\tilde{d}_T = \arg\max_{d\in D}\frac{1}{T}\tilde{L}_T(d), \tag{23}$$
where the function $\tilde{L}_T(d)$ is given by:
$$\tilde{L}_T(d) = -\sum_{t=1}^{T}\left[\log\tilde{w}_t(d)+\frac{X_t}{\tilde{w}_t(d)}\right]. \tag{24}$$
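A compact way to see (22)–(24) at work is to simulate (12) and maximize (24) numerically. The sketch below rests on several assumptions of ours, not the paper's: Python with NumPy and SciPy, unit-mean exponential shocks, pre-sample values set to $a_0$, D = [0.01, 0.99], and SciPy's bounded scalar minimizer in place of whatever optimizer the author used. All function names are hypothetical. The tail weights $\sum_{j\ge t}\pi_{j-1}(d)$ use the closed form $\Gamma(1-d+s)/(\Gamma(1-d)\Gamma(1+s))$ with $s=t-1$, which follows from $\sum_j\pi_j(d)=1$.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln


def pi_weights(n: int, d: float) -> np.ndarray:
    """pi_j(d) for j = 0, ..., n-1, as in (14)."""
    j = np.arange(n)
    return d * np.exp(gammaln(1.0 - d + j) - gammaln(1.0 - d) - gammaln(2.0 + j))


def tail_weights(n: int, d: float) -> np.ndarray:
    """sum_{k >= s} pi_k(d) for s = 0, ..., n-1 (equal to 1 at s = 0)."""
    s = np.arange(n)
    return np.exp(gammaln(1.0 - d + s) - gammaln(1.0 - d) - gammaln(1.0 + s))


def w_tilde(x: np.ndarray, d: float, a0: float) -> np.ndarray:
    """Truncated conditional means (22); x[0] holds X_1, pre-sample values replaced by a0."""
    n = len(x)
    full = np.convolve(pi_weights(n, d), x)            # full[k] = sum_i pi_i * x[k - i]
    observed = np.concatenate(([0.0], full[: n - 1]))  # sum_{k<s} pi_k * x[s-1-k]
    return a0 * tail_weights(n, d) + observed


def neg_avg_loglik(d: float, x: np.ndarray, a0: float) -> float:
    """-(1/T) times the feasible log-likelihood (24)."""
    w = w_tilde(x, d, a0)
    return float(np.mean(np.log(w) + x / w))


def feasible_qml(x: np.ndarray, a0: float, bounds=(0.01, 0.99)) -> float:
    """Feasible QML estimate (23) by bounded scalar minimisation over D = bounds."""
    res = minimize_scalar(neg_avg_loglik, bounds=bounds, args=(x, a0), method="bounded")
    return float(res.x)


def simulate_exp(T: int, d: float, a0: float, burn: int, rng: np.random.Generator) -> np.ndarray:
    """Simulate (12) with unit-mean exponential shocks, pre-sample values set to a0,
    and discard the first `burn` observations."""
    n = T + burn
    pi, tail = pi_weights(n, d), tail_weights(n, d)
    x = np.empty(n)
    for s in range(n):
        psi = a0 * tail[s] + pi[:s] @ x[:s][::-1]
        x[s] = psi * rng.exponential(1.0)
    return x[burn:]


if __name__ == "__main__":
    rng = np.random.default_rng(2003)
    x = simulate_exp(T=1000, d=0.35, a0=1.0, burn=2000, rng=rng)
    print(feasible_qml(x, a0=1.0), feasible_qml(x, a0=float(x.mean())))
```

The second call mimics replacing $a_0$ in (22) by the sample mean (21). The convolution makes each likelihood evaluation quadratic in T, which is fast enough for samples of a few thousand observations.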
Theorem 4 shows consistency of the feasible QML estimator $\tilde{d}_T$ for $d_0$.
THEOREM 4. Under A1–A4, let the sequence of feasible estimators $\{\tilde{d}_T : T\ge 1\}$ of the long memory ARCH(∞) model (12) be defined as in (23). Then:
$$\tilde{d}_T\xrightarrow{a.s.}d_0 \quad\text{as } T\to\infty. \tag{25}$$
The limiting distribution of $T^{1/2}(\tilde{d}_T-d_0)$ is not available, as $|\tilde{d}_T-\hat{d}_T|$ vanishes at a slower rate than the required $o_p(T^{-1/2})$. This follows from the slow convergence of $\sup_{d\in D}\frac{1}{T}\left|L_T^{(1)}(d)-\tilde{L}_T^{(1)}(d)\right|$ as $T\to\infty$, where $L_T^{(1)}(d)$ and $\tilde{L}_T^{(1)}(d)$ denote the first derivatives of (17) and (24), respectively. We note that Yajima (1985) relies on Gaussianity of his fractional white noise model in deriving the limiting distribution of the QML estimator of the memory parameter. In the context of the FIGARCH model, Robinson and Zaffaroni (2003) assume $d_0>\frac{1}{2}$ in order to show that their feasible and infeasible QML estimators have the same limiting distribution. An analogous restriction is not suitable for the long memory ARCH(∞) case, as the stationarity properties of such models are not known. The Monte Carlo experiment in the next section is designed to examine how well the limiting distribution in Theorem 3 approximates that of the feasible estimator $\tilde{d}_T$.
4 Monte Carlo experiment
In order to assess the finite sample properties of the QML estimator of model (12) studied in the previous section, we conduct a small-scale Monte Carlo experiment, the results of which are reported below. Recall that although consistent, the feasible QML estimator defined in (23) has an unknown limiting distribution, owing to the relatively slow convergence of $\sup_{d\in D}\frac{1}{T}\left|L_T^{(1)}(d)-\tilde{L}_T^{(1)}(d)\right|$ as $T\to\infty$. One of the goals of this section is to study experimentally the distribution of $T^{1/2}(\tilde{d}_T-d_0)$ for a variety of sample sizes and distributions of the shocks $\{\varepsilon_t : t\in\mathbb{Z}\}$. In particular, it is of interest to know how well this distribution is approximated by that of the infeasible estimator (18) given in Theorem 3. Another issue of interest is the effect of estimation uncertainty in the location parameter $a_0$ on the distribution of (23); recall that Theorem 4 is proved under the assumption that the true value $a_0$ is used in (22). Finally, the distribution of $T^{1/2}(\tilde{d}_T-d_0)$ is examined for mildly non-stationary values of $d_0$ in model (12).
We use the following experimental setup. The data generating process in all experiments is based on the set of stochastic equations (12), with $d_0=0.35$ for the covariance stationary version of the model and $d_0=0.45$ for the non-stationary case. In both versions of the data generating process $a_0=1$. All experiments are conducted with two sample sizes, $T=500$ and $T=2000$. A common problem when generating long memory time series on a computer is the slowly dissipating effect of the starting values of the generated process, owing to the hyperbolic decay of the coefficients in the power series expansion of $(1-z)^{-d}$ around $z=0$; see Beran (1994) and references therein. In order to mitigate this effect, the data generating algorithm used in the experiments starts the process at its unconditional expected value $a_0$ and discards the first 3000 realizations. The remainder of the generated data is used in the estimation phase of the experiment.
The data are simulated using three alternative distributions of the random shocks $\{\varepsilon_t : t\in\mathbb{Z}\}$, chosen such that assumptions A1 and A3 are satisfied. The first choice is exponential random numbers, for which (24) is the proper log-likelihood function; finite sample properties of the feasible QML estimator (23) are expected to be best in this case. The second case is given by $\chi^2_1$ distributed shocks $\varepsilon^*_t$, normalized as follows:
$$\varepsilon_t = \frac{\varepsilon^*_t+c}{b}, \qquad\text{where } b=\sqrt{2},\quad c=b-1.$$
The normalization scales the variance such that $E(\varepsilon_0-1)^2=1$ while preserving $E\varepsilon_0=1$. The third choice of random numbers is Student's t variates $z_t$ with parameter ν, transformed to $\varepsilon_t$ in order to ensure A1 and to scale the variance to unity as follows:
$$\varepsilon_t = \frac{z_t^2+c}{b}, \qquad\text{where } b=\nu\sqrt{\frac{3}{(\nu-2)(\nu-4)}},\quad c=b-\frac{\nu}{\nu-2}.$$
The parameter ν in this setup is related to the distribution of $\varepsilon_t$ through the moment condition $E\varepsilon_t^{\nu^*}<\infty$ for $\nu^*<\nu/2$. In all experiments in this section we use $\nu=7$.
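The first two shock distributions can be generated and their normalization checked by simulation, as in the Python sketch below (NumPy's `Generator.exponential` and `Generator.chisquare`; function names hypothetical). The Student's t case is left out of this sketch. One consequence of the affine normalization is that the $\chi^2_1$-based shocks are bounded below by $c/b = 1-1/\sqrt{2}\approx 0.29$, so A3 holds trivially for them.

```python
import numpy as np


def exponential_shocks(n: int, rng: np.random.Generator) -> np.ndarray:
    """Unit-mean exponential shocks: E eps = 1 and E(eps - 1)^2 = 1."""
    return rng.exponential(1.0, size=n)


def normalized_chi2_shocks(n: int, rng: np.random.Generator) -> np.ndarray:
    """(chi2_1 + c) / b with b = sqrt(2), c = b - 1: mean 1, variance 1,
    and bounded below by c / b = 1 - 1/sqrt(2)."""
    b = np.sqrt(2.0)
    c = b - 1.0
    return (rng.chisquare(1, size=n) + c) / b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for draw in (exponential_shocks, normalized_chi2_shocks):
        eps = draw(1_000_000, rng)
        print(draw.__name__, eps.mean(), eps.var(), eps.min())
```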
Table 1: Results of the Monte Carlo experiment for the data generating process based on (12) with parameters $a_0 = 1$ and $d_0 = 0.35$.

| Location parameter | Shocks | $T$ | Mean | Median | Bias | MSE | Variance | 95% coverage |
|---|---|---|---|---|---|---|---|---|
| $a_0$ | Exp. | 2000 | 0.35168 | 0.35163 | 0.00168 | 0.00042 | 0.00042 | 0.90600 |
| $a_0$ | $\chi^2_1$ | 2000 | 0.35303 | 0.35293 | 0.00303 | 0.00050 | 0.00049 | 0.86800 |
| $a_0$ | $t$ | 2000 | 0.35263 | 0.35484 | 0.00263 | 0.00075 | 0.00074 | 0.79000 |
| $a_T$ | Exp. | 2000 | 0.34316 | 0.34313 | -0.00683 | 0.00093 | 0.00088 | 0.74600 |
| $a_T$ | $\chi^2_1$ | 2000 | 0.34267 | 0.34198 | -0.00733 | 0.00094 | 0.00088 | 0.73600 |
| $a_T$ | $t$ | 2000 | 0.33988 | 0.33845 | -0.01012 | 0.00109 | 0.00099 | 0.67400 |
| $a_0$ | Exp. | 500 | 0.35972 | 0.36028 | 0.00972 | 0.00198 | 0.00189 | 0.87800 |
| $a_0$ | $\chi^2_1$ | 500 | 0.35604 | 0.35877 | 0.00604 | 0.00231 | 0.00228 | 0.85200 |
| $a_0$ | $t$ | 500 | 0.36286 | 0.36768 | 0.01286 | 0.00348 | 0.00331 | 0.75400 |
| $a_T$ | Exp. | 500 | 0.32642 | 0.32698 | -0.02358 | 0.00337 | 0.00282 | 0.76000 |
| $a_T$ | $\chi^2_1$ | 500 | 0.32994 | 0.33332 | -0.02006 | 0.00363 | 0.00322 | 0.74800 |
| $a_T$ | $t$ | 500 | 0.32826 | 0.32414 | -0.02174 | 0.00409 | 0.00361 | 0.69000 |

Notes: In the first column, $a_0$, respectively $a_T$, indicates that the true parameter, respectively the estimator (21), has been used in (22) during the estimation. Exp., $\chi^2_1$ and $t$ denote the normalized distribution of the shocks, and $T$ gives the sample size. The table reports statistics of the estimator $\hat d_T$ across 500 replications. The Variance column shows the sampling variance of the estimator. The 95% coverage column reports the empirical frequency with which the estimator falls within the 95% confidence interval implied by the limiting distribution of the infeasible estimator in Theorem 3.
Table 2: Results of the Monte Carlo experiment for the data generating process based on (12) with parameters $a_0 = 1$ and $d_0 = 0.45$.

| Location parameter | Shocks | $T$ | Mean | Median | Bias | MSE | Variance | 95% coverage |
|---|---|---|---|---|---|---|---|---|
| $a_0$ | Exp. | 2000 | 0.46113 | 0.46274 | 0.01113 | 0.00066 | 0.00053 | 0.81800 |
| $a_0$ | $\chi^2_1$ | 2000 | 0.46010 | 0.46077 | 0.01010 | 0.00076 | 0.00066 | 0.77400 |
| $a_0$ | $t$ | 2000 | 0.45923 | 0.46168 | 0.00923 | 0.00104 | 0.00096 | 0.70800 |
| $a_T$ | Exp. | 2000 | 0.43395 | 0.43147 | -0.01605 | 0.00136 | 0.00111 | 0.63400 |
| $a_T$ | $\chi^2_1$ | 2000 | 0.43469 | 0.43199 | -0.01531 | 0.00146 | 0.00123 | 0.62200 |
| $a_T$ | $t$ | 2000 | 0.43672 | 0.43422 | -0.01328 | 0.00169 | 0.00151 | 0.62400 |
| $a_0$ | Exp. | 500 | 0.47574 | 0.48219 | 0.02574 | 0.00294 | 0.00227 | 0.80000 |
| $a_0$ | $\chi^2_1$ | 500 | 0.47433 | 0.48048 | 0.02433 | 0.00339 | 0.00279 | 0.73200 |
| $a_0$ | $t$ | 500 | 0.48043 | 0.47936 | 0.03043 | 0.00463 | 0.00371 | 0.66200 |
| $a_T$ | Exp. | 500 | 0.42494 | 0.42282 | -0.02506 | 0.00434 | 0.00371 | 0.70800 |
| $a_T$ | $\chi^2_1$ | 500 | 0.42021 | 0.42340 | -0.02979 | 0.00461 | 0.00372 | 0.69600 |
| $a_T$ | $t$ | 500 | 0.41299 | 0.41004 | -0.03701 | 0.00581 | 0.00444 | 0.62000 |

Notes: In the first column, $a_0$, respectively $a_T$, indicates that the true parameter, respectively the estimator (21), has been used in (22) during the estimation. Exp., $\chi^2_1$ and $t$ denote the normalized distribution of the shocks, and $T$ gives the sample size. The table reports statistics of the estimator $\hat d_T$ across 500 replications. The Variance column shows the sampling variance of the estimator. The 95% coverage column reports the empirical frequency with which the estimator falls within the 95% confidence interval implied by the limiting distribution of the infeasible estimator in Theorem 3.
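The summary statistics in Tables 1 and 2 can be computed from the replicated estimates roughly as in the Python sketch below. The function `mc_summary` is hypothetical, and two conventions are assumptions made for this sketch. First, the variance uses divisor $n$, so that the squared-error column equals bias squared plus variance exactly. Second, the coverage interval is centered at $d_0$ with half-width $1.96\sqrt{6/(\pi^2 T)}$, taken from the limiting variance in Theorem 3.

```python
import math
import statistics


def mc_summary(estimates, d0, T, z=1.959964):
    """Hypothetical summary of Monte Carlo replications of an estimator of d0.

    Assumptions: divisor-n variance, and a coverage interval
    d0 +/- z * sqrt(6 / (pi^2 T)) based on the limiting variance 6/pi^2.
    """
    n = len(estimates)
    mean = statistics.fmean(estimates)
    half_width = z * math.sqrt(6.0 / (math.pi ** 2 * T))
    return {
        "mean": mean,
        "median": statistics.median(estimates),
        "bias": mean - d0,
        "mse": sum((e - d0) ** 2 for e in estimates) / n,
        "variance": statistics.pvariance(estimates, mu=mean),
        "coverage": sum(abs(e - d0) <= half_width for e in estimates) / n,
    }


print(mc_summary([0.30, 0.35, 0.40], d0=0.35, T=2000))
```

With $T = 2000$ the half-width is about 0.034. In the usage line, therefore, only the middle estimate lies inside the interval.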
The feasible QML estimator studied in this section is defined by equations (22), (23) and (24). In one set of experiments the parameter $a_0$ is assumed to be known, as required by Theorem 4. In another set of experiments $a_0$ is estimated in a first stage by (21), and this estimate replaces $a_0$ in (22) during the estimation of the memory parameter. The sampling variance of $a_T$ adds to the estimation uncertainty of the parameter $d_0$ in the feasible QML case, but Theorem 4 does not quantify the amount of added variation. The design of the Monte Carlo experiment allows us to assess this effect. All results reported in Tables 1 and 2 are based on 500 replications.
As expected, the results of the Monte Carlo experiment point to a higher sampling variance of the feasible estimator $\hat d_T$ than the asymptotic variance of the infeasible estimator given in Theorem 3. This is also reflected in the lower than nominal 95% coverage frequencies of the estimator. In all experiments the fat-tailed $t$-distributed shocks lead to an increased variance and a worse coverage rate of the estimator, in contrast to both the exponential and the $\chi^2_1$ cases, which have finite moments of all orders. In addition, the sampling variance and the bias of $\hat d_T$ in the non-stationary region of the model, shown in Table 2, are uniformly higher than those in Table 1. Recall that the limiting results derived in Section 3 apply only in the covariance stationary case.
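The gap between sampling and limiting variance can be checked with simple arithmetic on the reported figures. The Python sketch below uses the $a_0$-known, exponential-shock rows of Table 1, a selection made only for illustration. It compares the Variance column with $6/(\pi^2 T)$: the ratio is about 1.38 for $T = 2000$ and 1.55 for $T = 500$. It also confirms that the squared-error column equals bias squared plus variance, up to rounding.

```python
import math

# Rows of Table 1 (d0 = 0.35), a0 known, exponential shocks, as reported:
# sample size T -> (Variance, Bias, squared-error column).
ROWS = {2000: (0.00042, 0.00168, 0.00042), 500: (0.00189, 0.00972, 0.00198)}


def asymptotic_variance(T):
    """Variance of the infeasible estimator implied by Theorem 3: (6 / pi^2) / T."""
    return 6.0 / (math.pi ** 2 * T)


for T, (var, bias, mse) in ROWS.items():
    print(f"T={T}: 6/(pi^2 T) = {asymptotic_variance(T):.5f}, "
          f"Variance / asymptotic = {var / asymptotic_variance(T):.2f}, "
          f"bias^2 + Variance = {bias ** 2 + var:.5f} (reported {mse:.5f})")
```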
In line with the corresponding results for covariance stationary linear ARFIMA models reported in Cheung and Diebold (1994), the absolute bias of $\hat d_T$ in Table 1 increases sharply when (21) is used as the estimator of $a_0$ in place of its true value. The increase in bias is less noticeable for the non-stationary model in Table 2. Unlike in the linear case, in all experiments reported in Tables 1 and 2 the bias changes sign from positive to negative when (21) replaces the true location parameter. Predictably, the sampling variance of $\hat d_T$ increases when (21) is used in the log-likelihood function, and proportionally more so for the larger sample size.
5 Conclusion
This paper studies the specification and estimation of a class of long memory ARCH(∞) models. We show that the ARCH(∞) model of Robinson (1991) can have a covariance stationary solution with a non-summable autocovariance function, equivalently long memory as defined in McLeod and Hipel (1978). A notable feature of the long memory ARCH(∞) model is the absence of an intercept, together with weighting coefficients that sum to unity. This makes the Volterra series expansion of the classical ARCH(∞) model used in Giraitis, Kokoszka and Leipus (2000) and Kazakevicius and Leipus (2002) inapplicable in the present context. In order to establish the existence of a non-negative long memory solution of (1) and examine its properties, we show the equivalence between the covariance stationary solutions of the ARCH(∞) model and the class of MD-ARCH(∞) sequences of Koulikov (2003). Two important parametrizations of the long memory ARCH(∞) model have been introduced by Ding and Granger (1996), including the particularly simple two-parameter model (12).
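In model (12) the weights $\pi_j$ are the coefficients of $1 - (1 - L)^d$, so that $\pi_0 = d$ and $\pi_j = \pi_{j-1}(j - d)/(j + 1)$. This is the recursion (A.3) used in the appendix. The Python sketch below generates the weights for $d = 0.35$, a value chosen to match Table 1. It shows that the partial sums approach one only slowly, because the weights decay hyperbolically. The helper names are hypothetical. The closed form used for the tail, $1 - \sum_{j<n}\pi_j = \Gamma(n + 1 - d)/(\Gamma(1 - d)\,n!)$, follows from the expansion of $(1 - L)^{d-1}$.

```python
import math


def pi_weights(d, n):
    """First n weights pi_0, ..., pi_{n-1} of 1 - (1 - L)^d, via pi_j = pi_{j-1} (j - d) / (j + 1)."""
    w = [d]
    for j in range(1, n):
        w.append(w[-1] * (j - d) / (j + 1))
    return w


def tail_weight(d, n):
    """Exact 1 - (pi_0 + ... + pi_{n-1}) = Gamma(n + 1 - d) / (Gamma(1 - d) n!)."""
    return math.exp(math.lgamma(n + 1 - d) - math.lgamma(1 - d) - math.lgamma(n + 1))


d, n = 0.35, 10_000
w = pi_weights(d, n)
# The partial sum is still visibly below one: the gap decays like n**(-d).
print(sum(w), 1.0 - tail_weight(d, n))
```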
In the second part of the paper we examine the asymptotic properties of the QML estimator for the mean and memory parameters of model (12). It is shown that the estimator of the memory parameter is strongly consistent. Asymptotic normality is not available in the feasible case, however, because the difference between the sequences of feasible and infeasible estimators vanishes too slowly. The asymptotic variance of the infeasible estimator is shown to be $6/\pi^2$, the same as in the fractional white noise model of Granger (1980) and Hosking (1981).
The class of long memory ARCH(∞) models may find applications in several areas of financial econometrics. In volatility modeling, manifestations of long range dependence have been well documented and extensively studied. Beyond that area, the class offers an attractive and parsimonious way of modeling time series of non-negative data in the newly emerged field of econometric models for high frequency financial data.
6 Appendix
This technical appendix collects proofs of the main results in Section 2 and Section 3. To simplify notation, we use the convention $\sum_{m}^{n} \cdot = 0$ whenever $m > n$ for $m, n \in \mathbb{Z}$, and $\{K_n : n \ge 0\}$ denotes a sequence of non-negative constants.
PROOF OF THEOREM 1: Using stationarity and non-negativity of the solution $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ and the monotone convergence theorem, we write:
$$E\psi_t = E\Big[a^* + \sum_{j=1}^{\infty}\pi_{j-1}X_{t-j}\Big] = a^* + \sum_{j=1}^{\infty}\pi_{j-1}EX_{t-j} = a^* + a\sum_{j=0}^{\infty}\pi_j ,$$
from where $\sum_{j=0}^{\infty}\pi_j \le 1$ for $a > 0$, since $E\psi_t = a$. Since all parameters are non-negative, $a^* > 0$ iff $\sum_{j=0}^{\infty}\pi_j < 1$. Hence, the process can be written as:
$$X_t - a = (X_t - \psi_t) + \sum_{j=1}^{\infty}\pi_{j-1}(X_{t-j} - a) .$$
Then, by $E|X_t - a||X_{t-j} - a| \le E(X_t - a)^2 < \infty$, summability of $\{\pi_j : j \ge 0\}$ and the monotone convergence theorem:
$$E(X_t - a)^2 = E(X_t - \psi_t)(X_t - a) + \sum_{j=1}^{\infty}\pi_{j-1}E(X_t - a)(X_{t-j} - a) .$$
By the Cauchy-Schwarz inequality and covariance stationarity of $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$:
$$E(X_t - \psi_t)(X_t - a) < \infty ,$$
implying that $\sum_{j=1}^{\infty}\pi_{j-1}E(X_t - a)(X_{t-j} - a) < \infty$. Finally, the last part of the theorem follows by non-negativity of the solution $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ and $EX_t = E\psi_t = 0$ when $a = 0$.
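PROOF OF THEOREM 2: Suppose a covariance stationary solution $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ of the ARCH(∞) process is given. Then, using (8), for each $N \ge 0$ we can rewrite (1) as:
$$\psi_t = a + \sum_{j=1}^{N}\theta_{j-1}(X_{t-j} - \psi_{t-j}) + \sum_{j=N}^{\infty}\Big(\pi_j + \sum_{i=0}^{N-1}\theta_{j-1-i}\pi_i\Big)(X_{t-1-j} - a) .$$
By covariance stationarity of $\{X_t : t \in \mathbb{Z}\}$ it follows that:
$$E\Big[\psi_t - a - \sum_{j=1}^{N}\theta_{j-1}(X_{t-j} - \psi_{t-j})\Big]^2 \le E(X_t - a)^2\Big[\sum_{j=N}^{\infty}\Big(\pi_j + \sum_{i=0}^{N-1}\theta_{j-1-i}\pi_i\Big)\Big]^2 ,$$
and the last expression converges to zero as $N \to \infty$ by (9). Since $\sum_{j=1}^{N}\theta_{j-1}(X_{t-j} - \psi_{t-j})$ converges in $L^2$ as $N \to \infty$, we establish representation (10).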
Next, consider a covariance stationary solution $\{(X^*_t, \psi^*_t) : t \in \mathbb{Z}\}$ of the MD-ARCH(∞) process (6). By (8), for each $N \ge 0$:
$$\psi^*_t = a + \sum_{j=1}^{N}\pi_{j-1}(X^*_{t-j} - a) - \pi_{N-1}(\psi^*_{t-N} - a) + \sum_{j=N+1}^{\infty}b_{j-1,N-1}(X^*_{t-j} - \psi^*_{t-j}) ,$$
where $\{b_{j,N} : j, N \ge 0\}$ is defined as:
$$b_{j,N} := \begin{cases} \theta_j - \sum_{i=0}^{N-1}\theta_{j-1-i}\pi_i & \text{for } j \ge N \ge 0, \\ \theta_j & \text{otherwise.} \end{cases} \qquad \text{(A.1)}$$
It is easy to see that $\pi_j \le b_{j,N} \le \theta_j$ for all $j, N \ge 0$, hence the sequence $\{b_{j,N} : j \ge 0\}$ is square-summable for each $N \ge 0$. By the Cauchy-Schwarz inequality we can write:
$$E\Big[\psi^*_t - a - \sum_{j=1}^{N}\pi_{j-1}(X^*_{t-j} - a)\Big]^2 \le 2E\Big[\sum_{j=N+1}^{\infty}b_{j-1,N-1}(X^*_{t-j} - \psi^*_{t-j})\Big]^2 + 2\pi_{N-1}^2E(\psi^*_{t-N} - a)^2 .$$
It remains to show that the right-hand side of this expression converges to zero as $N \to \infty$. The limit of the last term above is clearly zero, as $E(\psi^*_{t-N} - a)^2 < \infty$ for any $N \ge 0$, and the summability of $\{\pi_j : j \ge 0\}$ implies that $\pi_N^2 \to 0$ as $N \to \infty$. Similarly, covariance stationarity of $\{(X^*_t, \psi^*_t) : t \in \mathbb{Z}\}$ together with (6) implies that $\{X^*_t - \psi^*_t : t \in \mathbb{Z}\}$ is a sequence of uncorrelated random variables, with $E(X^*_t - \psi^*_t) = 0$ and $E(X^*_t - \psi^*_t)^2 < \infty$ for all $t \in \mathbb{Z}$. It follows that:
$$E\Big[\sum_{j=N+1}^{\infty}b_{j-1,N-1}(X^*_{t-j} - \psi^*_{t-j})\Big]^2 \le \sum_{j=N+1}^{\infty}\theta_{j-1}^2E(X^*_{t-j} - \psi^*_{t-j})^2 \to 0 \quad \text{as } N \to \infty ,$$
where we use (A.1). Standard results on stationary time series, such as Brockwell and Davis (1991), imply that $a + \sum_{j=1}^{N}\pi_{j-1}(X^*_{t-j} - a)$ converges in $L^2$ as $N \to \infty$, and hence (11) follows.
LEMMA 1. Let $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ be a sequence of non-negative random variables satisfying the ARCH(∞) equations (12) and $P\{\psi_t = 0\} = 0$ for all $t \in \mathbb{Z}$. Assume that the sequence $\{\pi_j : j \ge 0\}$ is given by (14). Then the following inequalities hold a.s.:
$$K_1 \le \frac{\psi_t}{\psi_{t-1}} \le 1 + \varepsilon_{t-1} ,$$
for some $0 < K_1 < 1$.
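PROOF: Using the structure of model (12), we can write:
$$\frac{\psi_t}{\psi_{t-1}} = \pi_0\frac{X_{t-1}}{\psi_{t-1}} + \sum_{j=2}^{\infty}\pi_{j-1}\frac{X_{t-j}}{\psi_{t-1}} \le 1 + \varepsilon_{t-1} , \qquad \text{(A.2)}$$
where the recursive structure of (14),
$$\pi_j = \pi_{j-1}\frac{j - d_0}{j + 1} \quad \text{for all } j \ge 1 , \qquad \text{(A.3)}$$
is utilized in the following way:
$$\sum_{j=2}^{\infty}\pi_{j-1}\frac{X_{t-j}}{\psi_{t-1}} = \sum_{j=1}^{\infty}\pi_{j-1}\frac{j - d_0}{j + 1}\frac{X_{t-1-j}}{\psi_{t-1}} \le \sum_{j=1}^{\infty}\pi_{j-1}\frac{X_{t-1-j}}{\psi_{t-1}} = 1 \quad \text{a.s.},$$
noting non-negativity of the summands, and $0 < \frac{j - d_0}{j + 1} < 1$ for all $j \ge 1$ and $0 \le d_0 < 1$. From (A.2) and (A.3) we write:
$$\frac{\psi_t}{\psi_{t-1}} \ge \sum_{j=2}^{\infty}\pi_{j-1}\frac{X_{t-j}}{\psi_{t-1}} = \sum_{j=1}^{\infty}\pi_{j-1}\frac{j - d_0}{j + 1}\frac{X_{t-1-j}}{\psi_{t-1}} \ge \frac{1 - d_0}{2} > 0 ,$$
where we use non-negativity of the summands, and $\frac{1 - d_0}{2} \le \frac{j - d_0}{j + 1}$ for all $j \ge 1$.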
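The two bounds of Lemma 1 can be illustrated by simulating model (12). The following assumptions apply only to this sketch: exponential shocks, $d_0 = 0.35$, and all pre-sample observations fixed at $a_0 = 1$. Because the weights sum to one, the pre-sample then enters $\psi_t$ with the residual weight $1 - \sum_{j<t}\pi_j$. The function names are hypothetical. The sketch checks $(1 - d_0)/2 \le \psi_t/\psi_{t-1} \le 1 + \varepsilon_{t-1}$ along a simulated path.

```python
import random


def pi_weights(d, n):
    """Weights of model (12): pi_0 = d, pi_j = pi_{j-1} (j - d) / (j + 1), as in (A.3)."""
    w = [d]
    for j in range(1, n):
        w.append(w[-1] * (j - d) / (j + 1))
    return w


def simulate_model12(d, T, a0=1.0, seed=0):
    """Hypothetical simulator: psi_t = sum_{j>=1} pi_{j-1} X_{t-j}, X_t = psi_t * eps_t.
    Assumptions: unit exponential shocks; pre-sample X_s = a0 for s < 0, which
    contribute a0 * (1 - pi_0 - ... - pi_{t-1}) because the weights sum to one."""
    rng = random.Random(seed)
    pi = pi_weights(d, T)
    X, psi, eps = [], [], []
    for t in range(T):
        in_sample = sum(pi[j - 1] * X[t - j] for j in range(1, t + 1))
        psi.append(in_sample + a0 * (1.0 - sum(pi[:t])))
        eps.append(rng.expovariate(1.0))
        X.append(psi[-1] * eps[-1])
    return psi, eps


d = 0.35
psi, eps = simulate_model12(d, 400, seed=1)
ratios = [psi[t] / psi[t - 1] for t in range(1, len(psi))]
lower_ok = all(r >= (1 - d) / 2 for r in ratios)
upper_ok = all(r <= 1 + eps[t - 1] for t, r in zip(range(1, len(psi)), ratios))
print(lower_ok, upper_ok, min(ratios))
```

The proof actually yields the sharper upper bound $1 + d_0\varepsilon_{t-1}$, since $\pi_0 = d_0$; the lemma states the weaker form.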
LEMMA 2. Let $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ be a covariance stationary non-negative solution of the ARCH(∞) model (12), where $EX_t = E\psi_t = a > 0$ and assumptions A1–A3 hold. Then:
$$P\{\psi_t = 0\} = 0 , \qquad \text{(A.4)}$$
$$E\psi_t^{-\nu} < \infty , \qquad \text{(A.5)}$$
for any $\nu > 0$.
PROOF: We first show (A.4). By A3 the distribution of the shocks $\{\varepsilon_t : t \in \mathbb{Z}\}$ does not contain an atom at zero. Using (12) and the structure of (14), whereby $\{\pi_j : j \ge 0\} \subseteq \mathbb{R}_+$, the following probabilities are equal:
$$P\{\psi_t = 0\} = P\Big\{\bigcap_{j=1}^{\infty}\{\psi_{t-j} = 0\}\Big\} .$$
By Theorem 2, the covariance stationary solution $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ also satisfies (6). Thus, the probability of the event on the right-hand side of this expression is zero, since $\psi_{t-1-j} = 0$ for all $j \ge 1$ implies that $\psi_{t-1} = a$.
To show (A.5), it is sufficient to establish that:
$$P\{\psi_t^{-1} \ge s\} = P\{\psi_t \le s^{-1}\} \le O(s^{-\nu^*}) , \qquad \text{(A.6)}$$
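for some $\nu^* > \nu$ and $s \in \mathbb{R}_+$. Choose a sequence of positive numbers $\{c_i : i \ge 0\}$, where $c_0 := 1$, $c_i \to \infty$ as $i \to \infty$, and $\frac{c_i}{c_{i+1}} \ge K_1 > 0$ for all $i \ge 0$. Using non-negativity of the solution $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ and Lemma 1, we can write for an arbitrary $1 \le M < \infty$ and $i \ge 0$:
$$P\Big\{K_2^i c_i\,\psi_t \le s^{-\frac{1}{2^i}}\Big\} \le P\Big\{K_2^i c_i\,\psi_{t-M}\sum_{j=1}^{M}\frac{\psi_{t-j}}{\psi_{t-M}}\varepsilon_{t-j} \le s^{-\frac{1}{2^i}}\Big\} \le P\Big\{K_2^{i+1} c_i\,\psi_{t-M}\sum_{j=1}^{M}\varepsilon_{t-j} \le s^{-\frac{1}{2^i}}\Big\} ,$$
where, by Lemma 1, there exists a constant $0 < K_2 < 1$ such that $K_2 \le \min\Big\{\frac{\psi_{t-1}}{\psi_{t-M}}, \ldots, \frac{\psi_{t-M}}{\psi_{t-M}}\Big\}$.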
Observe that the following inequality holds for a pair of non-negative random variables $A$ and $B$ and any $s \in \mathbb{R}_+$:
$$P\{A \cdot B \le s\} \le P\{A \le s^{1/2}\} + P\{B \le s^{1/2}\} .$$
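We continue writing:
$$P\Big\{K_2^{i+1}\frac{c_i}{c_{i+1}}c_{i+1}\psi_{t-M}\sum_{j=1}^{M}\varepsilon_{t-j} \le s^{-\frac{1}{2^i}}\Big\} \le P\Big\{\frac{c_i}{c_{i+1}}\sum_{j=1}^{M}\varepsilon_{t-j} \le s^{-\frac{1}{2^{i+1}}}\Big\} + P\Big\{K_2^{i+1}c_{i+1}\psi_{t-M} \le s^{-\frac{1}{2^{i+1}}}\Big\} .$$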
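Using stationarity of the solution $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$, we can write for an arbitrary $1 \le N < \infty$:
$$P\{\psi_t \le s^{-1}\} \le \sum_{i=1}^{N}P\Big\{\frac{c_{i-1}}{c_i}\sum_{j=1}^{M}\varepsilon_j \le s^{-\frac{1}{2^i}}\Big\} + P\Big\{K_2^N c_N\,\psi_t \le s^{-\frac{1}{2^N}}\Big\} . \qquad \text{(A.7)}$$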
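Choose the sequence $\{c_i : i \ge 0\}$ such that $K_2^N c_N \to \infty$ as $N \to \infty$. By (A.4) the last probability on the right-hand side of this expression converges to zero. Next, consider the probabilities under the summation sign in (A.7). Using Markov's inequality:
$$P\Big\{\frac{c_{i-1}}{c_i}\sum_{j=1}^{M}\varepsilon_j \le s^{-\frac{1}{2^i}}\Big\} \le P\Big\{\Big(K_1\sum_{j=1}^{M}\varepsilon_j\Big)^{-2^i} \ge s\Big\} \le s^{-\nu^*}E\Big(K_1\sum_{j=1}^{M}\varepsilon_j\Big)^{-\nu^* 2^i} .$$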
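Let $M$ in the last expression be given by $M = K_3 \cdot M^*$. By A3 there exists $K_3 \ge 1$ such that $E\big(\sum_{n=1}^{K_3}\varepsilon_n\big)^{-1} < \infty$. The harmonic–arithmetic mean inequality, see Spiegel and Liu (1999), helps to establish the following:
$$E\Big(\sum_{j=1}^{M}\varepsilon_j\Big)^{-1} \le (M^*)^{-2}\sum_{j=1}^{M^*}E\Big(\sum_{n=1}^{K_3}\varepsilon_{n+K_3(j-1)}\Big)^{-1} = O(M^{-1}) = K_4 .$$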
Finally, choose $M$ sufficiently large to ensure $K_1^{-1}K_4 < 1$ and write (A.7) for any given $s \in \mathbb{R}_+$ as follows:
$$P\{\psi_t \le s^{-1}\} \le s^{-\nu^*}\Big(\sum_{i=1}^{\infty}\big[K_1^{-1}K_4\big]^{\nu^* 2^i} + 1\Big) .$$
This finishes the proof of (A.5).
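The bound on the negative moment in the proof above uses the harmonic-arithmetic mean inequality for $n$ positive numbers $y_1, \ldots, y_n$: $1/\sum_j y_j \le n^{-2}\sum_j 1/y_j$. Here the $y_j$ are the $M^*$ block sums of $K_3$ consecutive shocks. The Python check below is purely illustrative: the values $K_3 = 3$, $M^* = 50$ and the exponential shocks are assumptions, and `hm_am_bound_holds` is a hypothetical helper.

```python
import random


def hm_am_bound_holds(ys):
    """Check 1 / sum(ys) <= n**-2 * sum(1 / y) for positive ys, n = len(ys),
    i.e. the harmonic mean never exceeds the arithmetic mean."""
    n = len(ys)
    return 1.0 / sum(ys) <= sum(1.0 / y for y in ys) / n ** 2


rng = random.Random(3)
K3, M_star = 3, 50  # illustrative block length and number of blocks
blocks = [sum(rng.expovariate(1.0) for _ in range(K3)) for _ in range(M_star)]
print(hm_am_bound_holds(blocks))
```

Equality holds only when all block sums coincide. This is why the bound is of order $(M^*)^{-1}$ times the expected reciprocal of a single block sum.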
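LEMMA 3. Let $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ be a covariance stationary non-negative solution of the ARCH(∞) model (12), and let $\{w_t(d) : 1 \le t \le T\}$ be defined as in (16). Then, under A3–A4, for any $\nu > 0$:
$$E\Big(\sup_{d\in D}\frac{\psi_t}{w_t(d)}\Big)^{\nu} < \infty . \qquad \text{(A.8)}$$
PROOF: For any $1 \le M < \infty$ we can write:
$$\frac{\psi_t}{w_t(d)} \le \frac{\psi_t}{\sum_{j=1}^{M}\pi_{j-1}(d)X_{t-j}} \le \frac{\psi_t}{\underline{\psi}\,\underline{\pi}(d)\sum_{j=1}^{M}\varepsilon_{t-j}} ,$$
where $\underline{\psi} := \min\{\psi_{t-1}, \ldots, \psi_{t-M}\}$ and
$$\underline{\pi}(d) := \min\{\pi_0(d), \ldots, \pi_{M-1}(d)\} . \qquad \text{(A.9)}$$
Note that $\underline{\psi} > 0$ by Lemma 2. Using Lemma 1, $\frac{\psi_t}{\underline{\psi}} \le \max\Big\{(1 + \varepsilon_{t-1}), \ldots, \prod_{j=1}^{M}(1 + \varepsilon_{t-j})\Big\} \le \prod_{j=1}^{M}(1 + \varepsilon_{t-j})$, and hence we write:
$$\sup_{d\in D}\frac{\psi_t}{w_t(d)} \le K_1\frac{\prod_{j=1}^{M}(1 + \varepsilon_{t-j})}{\sum_{j=1}^{M}\varepsilon_{t-j}} .$$
The remainder of the proof is analogous to the proof of Lemma 5.1 in Berkes, Horvath and Kokoszka (2003), using A3.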
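LEMMA 4. Let $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ be a covariance stationary non-negative solution of the ARCH(∞) model (12), and let $\{w_t(d) : 1 \le t \le T\}$ be defined as in (16). Then, under A3–A4, for any $\nu > 0$:
$$E\Big(\sup_{d\in D}\Big|\frac{w^{(1)}_t(d)}{w_t(d)}\Big|\Big)^{\nu} < \infty , \qquad \text{(A.10)}$$
$$E\Big(\sup_{d\in D}\Big|\frac{w^{(2)}_t(d)}{w_t(d)}\Big|\Big)^{\nu} < \infty . \qquad \text{(A.11)}$$
PROOF: First, consider (A.10). Using (14), we can write for any $1 < N < M < \infty$:
$$\Big|\frac{w^{(1)}_t(d)}{w_t(d)}\Big| \le K_9\frac{\frac{1}{d}\sum_{j=1}^{\infty}\pi_{j-1}(d)X_{t-j} + \sum_{j=1}^{\infty}\pi_{j-1}(d)\log j\,X_{t-j}}{\sum_{j=1}^{\infty}\pi_{j-1}(d)X_{t-j}}$$
$$\le \frac{K_9}{d} + K_9\frac{\sum_{j=1}^{M-1}\pi_{j-1}(d)\log j\,X_{t-j} + \sum_{j=M}^{\infty}\pi_{j-1}(d)\log j\,X_{t-j}}{\sum_{j=1}^{M-1}\pi_{j-1}(d)X_{t-j}}$$
$$\le K_9\Big[1 + \frac{1}{d}\Big]\log M + \frac{K_9}{\underline{\pi}(d)}\frac{\sum_{j=M}^{\infty}\pi_{j-1}(d)\log j\,X_{t-j}}{\sum_{n=1}^{N}X_{t-n}} ,$$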
where $\underline{\pi}(d)$ is defined in (A.9). Using properties of the sequence $\{\pi_j(d) : j \ge 0\}$, it follows that:
$$\sup_{d\in D}\Big|\frac{w^{(1)}_t(d)}{w_t(d)}\Big| \le K_1\log M + K_2\frac{\sum_{j=M}^{\infty}j^{-\gamma}\log j\,X_{t-j}}{\sum_{n=1}^{N}X_{t-n}} ,$$
for some $\gamma > 1$. Next, consider (A.11). Using similar arguments, it follows for any $1 < N < M < \infty$:
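$$\Big|\frac{w^{(2)}_t(d)}{w_t(d)}\Big| \le K_{10}\frac{\frac{2}{d}\sum_{j=1}^{\infty}\pi_{j-1}(d)\log j\,X_{t-j} + \sum_{j=1}^{\infty}\pi_{j-1}(d)(\log j)^2X_{t-j}}{\sum_{j=1}^{\infty}\pi_{j-1}(d)X_{t-j}}$$
$$\le K_{10}\Big[1 + \frac{2}{d}\Big](\log M)^2 + \frac{K_{10}}{\underline{\pi}(d)}\Big[1 + \frac{2}{d}\Big]\frac{\sum_{j=M}^{\infty}\pi_{j-1}(d)(\log j)^2X_{t-j}}{\sum_{n=1}^{N}X_{t-n}} ,$$
from where we can write:
$$\sup_{d\in D}\Big|\frac{w^{(2)}_t(d)}{w_t(d)}\Big| \le K_{11}(\log M)^2 + K_{12}\frac{\sum_{j=M}^{\infty}j^{-\gamma}(\log j)^2X_{t-j}}{\sum_{n=1}^{N}X_{t-n}} .$$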
Hence, in order to establish (A.10) and (A.11) it is sufficient to show that:
$$P\Big\{K_{11}(\log M)^2 + K_{12}\frac{\sum_{j=M}^{\infty}j^{-\gamma}(\log j)^2X_{t-j}}{\sum_{n=1}^{N}X_{t-n}} > s\Big\} \le O(s^{-\nu^*}) , \qquad \text{(A.12)}$$
for some $\nu^* > \nu$ and $s \in \mathbb{R}_+$. The following auxiliary result is used in subsequent derivations:
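$$E\frac{X_{t-j}}{\sum_{n=1}^{N}X_{t-n}} < \infty \quad \text{for all } 1 \le N < j < \infty .$$
Observe that:
$$\frac{X_{t-j}}{\sum_{n=1}^{N}X_{t-n}} = \frac{X_{t-j}}{\Big(\frac{\psi_{t-1}}{\psi_{t-N}}\varepsilon_{t-1} + \ldots + \varepsilon_{t-N}\Big)\psi_{t-N}} \le K_4\Big(\sum_{n=1}^{N}\varepsilon_{t-n}\Big)^{-1}\frac{X_{t-j}}{\psi_{t-N}} ,$$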
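using the inequality $\frac{\psi_t}{\psi_{t-1}} \ge K_3 > 0$ for all $t \in \mathbb{Z}$ shown in Lemma 1. By Hölder's inequality, independence of the shocks $\{\varepsilon_t : t \in \mathbb{Z}\}$, covariance stationarity of the solution $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ and Lemma 2, for all $1 \le N < j < \infty$:
$$E\frac{X_{t-j}}{\sum_{n=1}^{N}X_{t-n}} \le K_4E\Big(\sum_{n=1}^{N}\varepsilon_{t-n}\Big)^{-1}E\frac{X_{t-j}}{\psi_{t-N}} \le K_5\Big(E\frac{1}{\psi_0^2}\,EX_0^2\Big)^{1/2} < \infty ,$$
where $N$ is sufficiently large for $E\big(\sum_{n=1}^{N}\varepsilon_{t-n}\big)^{-1} < \infty$ in view of A3.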
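Returning to (A.12), we write using Markov's inequality and the equation above:
$$P\Big\{K_{12}\frac{\sum_{j=M}^{\infty}j^{-\gamma}(\log j)^2X_{t-j}}{\sum_{n=1}^{N}X_{t-n}} > s - K_{11}(\log M)^2\Big\} \le \frac{K_{12}}{s - K_{11}(\log M)^2}E\frac{\sum_{j=M}^{\infty}j^{-\gamma}(\log j)^2X_{t-j}}{\sum_{n=1}^{N}X_{t-n}} \le \frac{K_{13}}{s - K_{11}(\log M)^2}\sum_{j=M}^{\infty}j^{-\gamma}(\log j)^2 ,$$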
where the interchange of limits is justified by the monotone convergence theorem and non-negativity of the summands. Finally, it is known that $\sum_{j=M}^{\infty}j^{-\gamma}(\log j)^2 = O(M^{-\gamma^*})$ for some $\gamma > \gamma^* > 1$. Choosing $M = s^{\nu^*/\gamma^*}$, we establish (A.12), since:
$$\frac{K_{13}}{s - K_{11}(\log M)^2}\sum_{j=M}^{\infty}j^{-\gamma}(\log j)^2 = \frac{K_{13}}{s - K_{14}(\log s)^2}O(s^{-\nu^*}) ,$$
for large enough $s$, where $s - K_{14}(\log s)^2$ is positive and increasing.
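PROOF OF THEOREM 3: Using Lemmas 3 and 4, the proof of (19) follows from the same arguments as the proof of Theorem 4.1 in Berkes, Horvath and Kokoszka (2003). We add the following remarks.
First, consider $E|\log w_0(d)| < \infty$. Since $|\log w_0(d)| \le w_0^{-1}(d) + w_0(d)$, it is sufficient to show that $Ew_0^{-1}(d) < \infty$ and $Ew_0(d) < \infty$ for all $d \in D$. We write for any $1 \le N < \infty$:
$$\frac{1}{w_0(d)} \le \frac{1}{\sum_{j=1}^{N}\pi_{j-1}(d)X_{-j}} \le \frac{1}{\underline{\pi}(d)\Big(\frac{\psi_{-1}}{\psi_{-N}}\varepsilon_{-1} + \ldots + \varepsilon_{-N}\Big)\psi_{-N}} \le \Big(\sum_{j=1}^{N}\varepsilon_{-j}\Big)^{-1}\frac{K_1}{\psi_{-N}} , \qquad \text{(A.13)}$$
where $\underline{\pi}(d)$ is defined in (A.9), and we use $\frac{\psi_t}{\psi_{t-1}} \ge K_2 > 0$ for all $t \in \mathbb{Z}$ shown in Lemma 1. Hence, by Lemma 2 and A3, for sufficiently large $N$:
$$E\frac{1}{w_0(d)} \le K_1E\Big(\sum_{j=1}^{N}\varepsilon_{-j}\Big)^{-1}E\frac{1}{\psi_0} < \infty .$$
In view of (15), we have $w_0(d) \le K_3\sum_{j=1}^{\infty}j^{-\gamma}X_{-j}$ for some $\gamma > 1$ and all $d \in D$. Hence, by the monotone convergence theorem and stationarity of the solution $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$, $Ew_0(d) \le K_3\sum_{j=1}^{\infty}j^{-\gamma}EX_0 < \infty$.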
Second, the uniqueness of the maximum of the limiting log-likelihood function $L(d) := -E\Big(\log w_0(d) + \frac{X_0}{w_0(d)}\Big)$ follows from arguments similar to those in Theorem 2.3 in Berkes, Horvath and Kokoszka (2003), where their equation (2.2) can be written as:
$$\varepsilon_{t-m} = \frac{\sum_{j=m+1}^{\infty}(\pi^*_j - \pi_j)X_{t-j}}{(\pi^*_m - \pi_m)\psi_{t-m}} ,$$
where the sequence $\{\pi^*_j : j \ge 0\}$ satisfies A2. The right-hand side of this expression is well defined, as $P\{\psi_t = 0\} = 0$ for all $t \in \mathbb{Z}$ by Lemma 2.
The asymptotic normality of the estimator (18) follows by the arguments used by Berkes, Horvath and Kokoszka (2003) to prove their Theorem 4.2. Additionally, we comment on the following.
Define $A_0 := E\Big[-\frac{w^{(1)}_0(d_0)}{\psi_0}(1 - \varepsilon_0)\Big]^2$ and $B_0 := -E\Big[\frac{w^{(1)}_0(d_0)}{\psi_0}\Big]^2$, so that they correspond to the respective definitions of Berkes, Horvath and Kokoszka (2003). Existence of $A_0$ and $B_0$ follows from Lemma 3 and A1. In order to establish that they are different from zero, it is sufficient to show that $|w^{(1)}_0(d_0)|^2 \ne 0$ a.s. Indeed, using (12) and (14), $|w^{(1)}_0(d_0)|^2 \ge X_{-1}^2$, from where we establish the result by Lemma 2 and A3.
The limiting variance of the QML estimator in Theorem 3 is derived as follows. Using (12) we can write $w_0(d) = [1 - (1 - L)^d]X_0$, from where:
$$w^{(1)}_0(d_0) = -\log(1 - L)(1 - L)^{d_0}X_0 = -\log(1 - L)(X_0 - \psi_0) ,$$
and the following expectation holds by A1:
$$E\Big[\frac{w^{(1)}_0(d_0)}{\psi_0}\Big]^2 = E[\log(1 - L)(\varepsilon_0 - 1)]^2 = \frac{\pi^2}{6}E(\varepsilon_0 - 1)^2 .$$
The limiting variance of the QML estimator is then:
$$\frac{A_0}{B_0^2} = E(1 - \varepsilon_0)^2\Big(E\Big[\frac{w^{(1)}_0(d_0)}{\psi_0}\Big]^2\Big)^{-1} = \frac{6}{\pi^2} .$$
This concludes the proof of Theorem 3.
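The constant $\pi^2/6$ comes from the coefficients of $\log(1 - L) = -\sum_{j\ge1}L^j/j$. For i.i.d. shocks with unit variance, the variance of the filtered series is $\sum_{j\ge1}1/j^2 = \pi^2/6$, and the limiting variance is its reciprocal. The numerical check below is written in Python; the truncation lag of $10^5$ is an arbitrary choice, and it leaves a tail of roughly $10^{-5}$.

```python
import math


def log1mL_coefficients(n):
    """Coefficients 1/j of -log(1 - L) = sum_{j>=1} L^j / j, truncated at lag n."""
    return [1.0 / j for j in range(1, n + 1)]


n = 100_000
s = sum(c * c for c in log1mL_coefficients(n))  # approximates pi^2 / 6
print(s, math.pi ** 2 / 6)   # filtered-noise variance per unit shock variance
print(1.0 / s, 6 / math.pi ** 2)   # limiting variance A0 / B0^2
```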
LEMMA 5. Let $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ be a covariance stationary non-negative solution of the ARCH(∞) model (12), and let the functions $L_T(d)$ and $\hat L_T(d)$ be defined in (17) and (24), respectively. Then, under A1 and A3–A4, as $T \to \infty$:
$$\sup_{d\in D}\frac{1}{T}\big|L_T(d) - \hat L_T(d)\big| \xrightarrow{a.s.} 0 . \qquad \text{(A.14)}$$
PROOF: Using the definitions of the functions $L_T(d)$ and $\hat L_T(d)$, by the triangle inequality we can write:
$$\frac{1}{T}\big|L_T(d) - \hat L_T(d)\big| \le \frac{1}{T}\sum_{t=1}^{T}|\log w_t(d) - \log\hat w_t(d)| + \frac{1}{T}\sum_{t=1}^{T}\frac{X_t}{w_t(d)}\Big|\frac{w_t(d) - \hat w_t(d)}{\hat w_t(d)}\Big| . \qquad \text{(A.15)}$$
We rewrite the two expressions on the right-hand side of this equation as follows. First, by non-negativity of $w_t(d)$ and $\hat w_t(d)$, the inequality $|\log x - \log y| \le (x^{-1} + y^{-1})|x - y|$ for $x, y \in \mathbb{R}_+$, and (16) and (22), it follows that:
$$\frac{1}{T}\sum_{t=1}^{T}|\log w_t(d) - \log\hat w_t(d)| \le \frac{1}{T}\sum_{t=1}^{T}\big[w_t^{-1}(d) + \hat w_t^{-1}(d)\big]\sum_{j=t}^{\infty}\pi_{j-1}(d)|X_{t-j} - a_0| .$$
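Similarly to (A.13), the function $w_t^{-1}(d)$ is bounded uniformly in $d \in D$ by the following expression, for $t \ge N \ge 1$:
$$w_t^{-1}(d) \le \frac{1}{\sum_{i=1}^{N}\pi_{i-1}(d)X_{t-i}} \le \Big(\sum_{i=1}^{N}\varepsilon_{t-i}\Big)^{-1}\frac{K_1}{\psi_{t-N}} .$$
The same bound holds for $\hat w_t^{-1}(d)$ as well. Hence, using (15) we obtain:
$$\sup_{d\in D}\frac{1}{T}\sum_{t=N+1}^{T}|\log w_t(d) - \log\hat w_t(d)| \le K_2\frac{1}{T}\sum_{t=N+1}^{T}\sum_{j=t}^{\infty}j^{-\gamma}\Big(\sum_{i=1}^{N}\varepsilon_{t-i}\Big)^{-1}\frac{|X_{t-j} - a_0|}{\psi_{t-N}} ,$$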
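for some $\gamma > 1$. Next, consider the second term on the right-hand side of (A.15). Using the bounds on $w_t^{-1}(d)$ and $\hat w_t^{-1}(d)$ shown above together with (A.2), (16) and (22), we obtain the following inequality:
$$\sup_{d\in D}\frac{1}{T}\sum_{t=N+1}^{T}\frac{X_t}{w_t(d)}\Big|\frac{w_t(d) - \hat w_t(d)}{\hat w_t(d)}\Big| \le K_3\frac{1}{T}\sum_{t=N+1}^{T}\sum_{j=t}^{\infty}j^{-\gamma}\frac{\prod_{i=0}^{N}(1 + \varepsilon_{t-i})}{\Big(\sum_{i=1}^{N}\varepsilon_{t-i}\Big)^2}\frac{|X_{t-j} - a_0|}{\psi_{t-N}} .$$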
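Comparison of the last two equations shows that the latter dominates the former, given an appropriate choice of the generic constants, by non-negativity of the shocks $\{\varepsilon_t : t \in \mathbb{Z}\}$. By Kronecker's Lemma, the following condition is therefore sufficient for the uniform convergence of the difference between $\frac{1}{T}L_T(d)$ and $\frac{1}{T}\hat L_T(d)$ to zero, apart from the first $N$ terms:
$$\sum_{t=N+1}^{\infty}\frac{1}{t}\sum_{j=t}^{\infty}j^{-\gamma}\frac{\prod_{i=0}^{N}(1 + \varepsilon_{t-i})}{\Big(\sum_{i=1}^{N}\varepsilon_{t-i}\Big)^2}\frac{|X_{t-j} - a_0|}{\psi_{t-N}} < \infty \quad \text{a.s.}$$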
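By Beppo Levi's theorem the desired result is established upon showing convergence of the following infinite series:
$$\sum_{t=N+1}^{\infty}E\Big[\frac{1}{t}\sum_{j=t}^{\infty}j^{-\gamma}\frac{\prod_{i=0}^{N}(1 + \varepsilon_{t-i})}{\Big(\sum_{i=1}^{N}\varepsilon_{t-i}\Big)^2}\frac{|X_{t-j} - a_0|}{\psi_{t-N}}\Big]$$
$$\le \sum_{t=N+1}^{\infty}\frac{1}{t}\sum_{j=t}^{\infty}j^{-\gamma}E\frac{\prod_{i=0}^{N}(1 + \varepsilon_{t-i})}{\Big(\sum_{i=1}^{N}\varepsilon_{t-i}\Big)^2}\Big[E\frac{1}{\psi_{t-N}^2}E(X_{t-j} - a_0)^2\Big]^{1/2} = K_4\sum_{t=N+1}^{\infty}\frac{1}{t}\sum_{j=t}^{\infty}j^{-\gamma} < \infty ,$$
where we used A1, covariance stationarity of the solution $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$, Hölder's inequality, Lemma 2, and the last part of the proof of Lemma 3.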
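To finally establish (A.14), it remains to show that the difference between the first $N$ terms of the functions $\frac{1}{T}L_T(d)$ and $\frac{1}{T}\hat L_T(d)$ converges to zero a.s. as $T \to \infty$. We use decomposition (A.15) and write:
$$\sup_{d\in D}\frac{1}{T}\sum_{t=1}^{N}|\log w_t(d) - \log\hat w_t(d)| \le K_5\frac{1}{T}\sum_{t=1}^{N}\sum_{j=t}^{\infty}j^{-\gamma}\sup_{d\in D}\frac{|X_{t-j} - a_0|}{w_t(d)} + K_6\frac{1}{T}\sum_{t=1}^{N}\sum_{j=t}^{\infty}j^{-\gamma}|X_{t-j} - a_0| ,$$
upon noting that $\hat w_t^{-1}(d) \le \Big(a_0\sum_{j=N}^{\infty}\pi_{j-1}(d)\Big)^{-1} \le K_6$ uniformly in $d \in D$ for $1 \le t \le N$. Since $N$ is fixed, it is sufficient to show that: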
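$$E\Big[\sum_{t=1}^{N}\sum_{j=t}^{\infty}j^{-\gamma}\sup_{d\in D}\frac{|X_{t-j} - a_0|}{w_t(d)}\Big] \le \sum_{t=1}^{N}\sum_{j=t}^{\infty}j^{-\gamma}\Big[E\Big(\sup_{d\in D}\frac{1}{w_t(d)}\Big)^2E(X_{t-j} - a_0)^2\Big]^{1/2} < \infty ,$$
where we used Hölder's inequality, covariance stationarity of $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$ and an argument similar to (A.13). Similarly:
$$E\Big[\sum_{t=1}^{N}\sum_{j=t}^{\infty}j^{-\gamma}|X_{t-j} - a_0|\Big] = \sum_{t=1}^{N}\sum_{j=t}^{\infty}j^{-\gamma}E|X_{t-j} - a_0| < \infty ,$$
by the covariance stationarity of $\{(X_t, \psi_t) : t \in \mathbb{Z}\}$. Finally, the last part of (A.15) can be written as follows: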
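$$\sup_{d\in D}\frac{1}{T}\sum_{t=1}^{N}\frac{X_t}{w_t(d)}\Big|\frac{w_t(d) - \hat w_t(d)}{\hat w_t(d)}\Big| \le K_7\frac{1}{T}\sum_{t=1}^{N}\varepsilon_t\sum_{j=t}^{\infty}j^{-\gamma}\sup_{d\in D}\frac{\psi_t}{w_t(d)}|X_{t-j} - a_0| ,$$
where we use the uniform bound on $\hat w_t^{-1}(d)$ shown above. Taking the expectation of the right-hand side of this expression and using independence of $\{\varepsilon_t : t \in \mathbb{Z}\}$, Hölder's inequality, and Lemma 3, we conclude that:
$$E\Big[\sum_{t=1}^{N}\varepsilon_t\sum_{j=t}^{\infty}j^{-\gamma}\sup_{d\in D}\frac{\psi_t}{w_t(d)}|X_{t-j} - a_0|\Big] = \sum_{t=1}^{N}\sum_{j=t}^{\infty}j^{-\gamma}\Big[E\Big(\sup_{d\in D}\frac{\psi_t}{w_t(d)}\Big)^2E(X_{t-j} - a_0)^2\Big]^{1/2} < \infty .$$
This finishes the proof of (A.14).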
PROOF OF THEOREM 4: Using Lemma 5, (25) follows directly from the proof of Theorem 4.3
in Berkes, Horvath and Kokoszka (2003).
References
Andersen, Torben G., Tim Bollerslev, Francis Diebold, and Paul Labys (2001) The distribution of exchange rate volatility. Journal of the American Statistical Association, vol. 96, pp. 42-55.
Baillie, Richard T., Tim Bollerslev and Hans O. Mikkelsen (1996) Fractionally integrated generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, vol. 74, pp. 3-30.
Beran, J. (1994) Statistics for long-memory processes. Chapman and Hall, New York.
Berkes, Istvan, Lajos Horvath and Piotr Kokoszka (2003) GARCH processes: Structure and
estimation. Bernoulli, vol. 9, pp. 201-227.
Bollerslev, Tim (1986) Generalized autoregressive conditional heteroskedasticity. Journal of
Econometrics, vol. 31, pp. 307-327.
Bougerol, Philippe and Nico Picard (1992) Stationarity of GARCH processes and of some
non-negative time series. Journal of Econometrics, vol. 52, pp. 115-127.
Brockwell, Peter J. and Richard A. Davis (1991) Time series: theory and methods. Second Edition, New York: Springer-Verlag.
Cheung, Yin-Wong and Francis X. Diebold (1994) On maximum likelihood estimation of the
differencing parameter of fractionally-integrated noise with unknown mean. Journal of
Econometrics, vol. 62, pp. 301-316.
Ding, Z. and Clive W.J. Granger (1996) Modeling volatility persistence of speculative returns:
a new approach. Journal of Econometrics, vol. 73, pp. 185-215.
Giraitis, Liudas, Piotr Kokoszka and Remigijus Leipus (2000) Stationary ARCH models: dependence structure and central limit theorem. Econometric Theory, vol. 16, pp. 3-22.
Giraitis, Liudas, Piotr Kokoszka, Remigijus Leipus and Gilles Teyssiere (2000) Semiparametric
estimation of the intensity of long memory in conditional heteroskedasticity. Statistical
Inference for Stochastic Processes, vol. 3, pp. 113-128.
Giraitis, Liudas and Peter M. Robinson (2001) Whittle estimation of ARCH models. Econometric Theory, vol. 17, pp. 608-631.
Granger, Clive W.J. (1980) Long memory relationships and aggregation of dynamic models.
Journal of Econometrics, vol. 14, pp. 227-238.
Hosking, Jonathan R.M. (1981) Fractional differencing. Biometrika, vol. 68, pp. 165-176.
Jasiak, Joanna (1998) Persistence in intertrade durations. Finance, vol. 19, pp. 166-195.
Kazakevicius, Vytautas and Remigijus Leipus (2002) On stationarity in the ARCH(∞) model.
Econometric Theory, vol. 18, pp. 1-16.
Koulikov, Dmitri (2003) Modeling sequences of long memory non-negative covariance stationary random variables. CAF Working Paper Series, no. 156.
Lee, Sang-Won and Bruce E. Hansen (1994) Asymptotic theory for the GARCH(1,1) quasi-maximum likelihood estimator. Econometric Theory, vol. 10, pp. 29-52.
Lumsdaine, Robin L. (1996) Consistency and asymptotic normality of the quasi-maximum likelihood estimator in IGARCH(1,1) and covariance stationary GARCH(1,1) models. Econometrica, vol. 64, pp. 575-596.
McLeod, A. I. and K. W. Hipel (1978) Preservation of the rescaled adjusted range, 1: a reassessment of the Hurst phenomenon. Water Resources Research, vol. 14, pp. 491-508.
Robinson, Peter M. (1991) Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. Journal of Econometrics, vol. 47, pp. 67-84.
Robinson, Peter M. and Paolo Zaffaroni (2003) Pseudo-maximum likelihood estimation of ARCH(∞) models. Preprint.
Spiegel, Murray R. and John Liu (1999) Mathematical handbook of formulas and tables. Second
Edition, McGraw-Hill.
Weiss, Andrew A. (1986) Asymptotic theory for ARCH models: estimation and testing. Econo-
metric Theory, vol. 2, pp. 107-131.
Yajima, Yoshihiro (1985) On estimation of long-memory time series models. Australian Journal of Statistics, vol. 27, pp. 303-320.
Chapter 3: Non–stationary models for volatility of speculative returns: with application to foreign exchange data
Non–stationary models for volatility of speculative returns: with
application to foreign exchange data
Dmitri Koulikov
Department of Economics
School of Economics and Management
University of Aarhus
Aarhus C, 8000, Denmark
phone: +45 89421577
e-mail: [email protected]
This revision:
December 29, 2003
Abstract
A family of models for non–stationary conditional heteroscedasticity is introduced in
the paper, allowing for both a flexible deterministic time dependence and various forms of
stochastic trends in the volatility process. The stochastic part of the new model, referred
to as cMD–ARCH, is based on a weighted sequence of martingale difference innovations,
constructed from the process history. Sufficient conditions for a.s. non–negativity are pro-
vided, along with a number of possible parametrizations based on a number of popular
stationary volatility models. Consistency and asymptotic normality of the QML estimator
are established, drawing on the contributions of Jensen and Rahbek (2002, 2003). Finally,
an empirical application to thirteen series of daily exchange rate returns is included to
illustrate a practical potential of the cMD–ARCH process.
JEL classification: C13, C22
Keywords: Conditional heteroscedasticity, Non–stationarity, Quasi-maximum likelihood
estimation
1 Introduction
The literature on modeling conditionally heteroscedastic time series, such as returns on speculative assets, considers infinite sequences of strictly stationary random variables $\{(r_t, \sigma^2_t) : t \in \mathbb{Z}\}$. Here $r_t$ usually represents the compounded return on a financial asset between consecutive time periods, and $\sigma^2_t$ can be regarded as a scaling parameter of $r_t$. The most popular class of models for heteroscedastic data is given by the following general family:
$$r_t = \sigma_t z^*_t, \qquad \sigma^2_t = a^* + \sum_{j=1}^{\infty}\pi_{j-1}r^2_{t-j} , \qquad (1)$$
where $\{z^*_t : t \in \mathbb{Z}\}$ is an infinite sequence of i.i.d. disturbances, and $a^* \ge 0$ and $\{\pi_j : j \ge 0\} \subseteq \mathbb{R}_{0+}$ are the model parameters. Members of (1) include the ARCH model of Engle (1982), the GARCH model of Bollerslev (1986), the IGARCH sequences of Engle and Bollerslev (1986) and a number of other specifications. Robinson (1991) introduced the ARCH(∞) model (1) in the context of heteroscedasticity testing. An extensive survey of recent theoretical advances in ARCH modeling is Giraitis, Leipus and Surgailis (2003), while an overview of the applied GARCH literature is found in Bollerslev, Engle and Nelson (1994).
Another class of stationary models for conditionally heteroscedastic data, referred to as MD–ARCH(∞), was recently studied in Koulikov (2003a) and is given as follows:
$$r_t = \sigma_t z^*_t, \qquad \sigma^2_t = a + \sum_{j=1}^{\infty}\theta_{j-1}(r^2_{t-j} - \sigma^2_{t-j}) , \qquad (2)$$
where $a > 0$ and $\{\theta_j : j \ge 0\} \subseteq \mathbb{R}_+$ is a sequence of square–summable coefficients. This model is particularly suited for covariance stationary sequences $\{(r^2_t, \sigma^2_t) : t \in \mathbb{Z}\}$, allowing for either short or long memory in the conditional volatility process. Koulikov (2003b) demonstrates that covariance stationary non-negative solutions of (2) also belong to the ARCH(∞) family.
ARCH(∞) and MD–ARCH(∞) models define strictly stationary infinite sequences of returns and conditional volatility $\{(r_t, \sigma^2_t) : t \in \mathbb{Z}\}$. The existence of the moments $E\sigma^{\nu}_t$, for $\nu > 0$, depends on the parameters and on moment assumptions on the sequence of disturbances $\{z^*_t : t \in \mathbb{Z}\}$. Nelson (1990) derives a general moment condition for the GARCH(1,1) case, including IGARCH(1,1). Ling and McAleer (2002) show necessary and sufficient conditions for the existence of higher order moments $E\sigma^{2i}_t$, for integer $i \ge 1$, in the class of GARCH(p,q) models. The moment conditions for the general ARCH(∞) family of models are derived in Giraitis, Kokoszka and Leipus (2000) and Giraitis, Leipus and Surgailis (2003), while Kazakevicius and Leipus (2002) study the existence of strictly stationary ARCH(∞) sequences. Koulikov (2003a) shows second order stationarity conditions for the class of MD–ARCH(∞) models.
In this paper we introduce a family of models for heteroscedastic time series that allows for non–stationary behavior of the conditional volatility parameter. It is defined by the following set of stochastic equations:
$$\begin{cases} r^2_t = \sigma^2_t = a_0 & \text{for } t \le 0, \\ r_t = \sigma_t z_t, \quad \sigma^2_t = a_t + \sum_{j=1}^{t}\theta_{j-1}(r^2_{t-j} - \sigma^2_{t-j}) & \text{for } t > 0 , \end{cases} \qquad (3)$$
where $\{a_t : t \ge 0\} \subseteq \mathbb{R}_+$ is a non-stochastic function on the set of integers, and $\{z_t : t \ge 0\}$ are innovations with $Ez^2_t = 1$. The model implies a set of conditional distributions of the vectors $(r_t, \sigma^2_t)$ given the history of the process and its parameters. Other than this, definition (3) places few restrictions on the time dynamics of $\{(r_t, \sigma^2_t) : t \ge 0\}$. It allows for explosive behavior, for various forms of deterministic trends in $\{a_t : t \ge 0\}$, and for diverging stochastic trends in the conditional volatility process. Naturally, only solutions where $\{\sigma^2_t : t \ge 0\} \subseteq \mathbb{R}_{0+}$ will be of interest for volatility modeling. In the remainder of the paper such solutions will be referred to as the cMD–ARCH process.
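To make definition (3) concrete, the Python sketch below simulates a cMD–ARCH path for user-supplied sequences $\{a_t\}$ and $\{\theta_j\}$; `simulate_cmd_arch` is a hypothetical name. The sketch relies on a reduction that is our own algebra, not a parametrization quoted from the paper. With constant $a_t = a_0 = \bar a$ and geometric weights $\theta_j = \alpha\rho^j$, (3) collapses to the GARCH(1,1) recursion $\sigma^2_t = \bar a(1 - \rho) + \alpha r^2_{t-1} + (\rho - \alpha)\sigma^2_{t-1}$. The code checks this numerically with Gaussian $z_t$. The parameter values are illustrative assumptions.

```python
import random


def simulate_cmd_arch(a, theta, a0, z):
    """Hypothetical simulator for model (3). Index 0 holds r_0^2 = sigma_0^2 = a0;
    for t >= 1, sigma_t^2 = a[t] + sum_{j=1}^{t} theta[j-1] (r_{t-j}^2 - sigma_{t-j}^2)
    and r_t^2 = sigma_t^2 z[t]^2. Raises if sigma_t^2 becomes negative."""
    r2, s2 = [a0], [a0]
    for t in range(1, len(z)):
        s = a[t] + sum(theta[j - 1] * (r2[t - j] - s2[t - j]) for j in range(1, t + 1))
        if s < 0:
            raise ValueError(f"negative conditional variance at t={t}")
        s2.append(s)
        r2.append(s * z[t] ** 2)
    return r2, s2


rng = random.Random(0)
T, a_bar, alpha, rho = 300, 1.0, 0.1, 0.95  # illustrative values only
z = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]
theta = [alpha * rho ** j for j in range(T)]
r2, s2 = simulate_cmd_arch([a_bar] * (T + 1), theta, a_bar, z)

# The same variances from GARCH(1,1) with omega = a_bar (1 - rho), beta = rho - alpha.
g = [a_bar]
for t in range(1, T + 1):
    g.append(a_bar * (1 - rho) + alpha * r2[t - 1] + (rho - alpha) * g[t - 1])
print(max(abs(x - y) for x, y in zip(s2, g)))
```

In this special case, non-negativity of $\sigma^2_t$ is guaranteed whenever $\rho \ge \alpha \ge 0$. For general $\{\theta_j\}$ the simulator simply reports a violation.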
As shown in the paper, particular parametrizations of $\{a_t : t \ge 0\}$ and $\{\theta_j : j \ge 0\}$ in the cMD–ARCH model (3) allow the process to converge to stationary limits defined by the ARCH(∞) and MD–ARCH(∞) equations. In particular, stationary GARCH and IGARCH limits are possible. Moreover, the cMD–ARCH model permits a study of volatility processes with unknown or possibly non-stationary solutions, such as the FIGARCH process of Baillie, Bollerslev and Mikkelsen (1996) or the model (2) with non–square–summable coefficients. An attractive property of the cMD–ARCH processes is the ease of statistical inference about the set of parameters in (3). Consistency and asymptotic normality of the QML estimator of the model parameters follow from a set of sufficient conditions in Basawa, Feigin and Heyde (1976), regardless of the limiting stability of the cMD–ARCH process. This result generalizes recent findings of Jensen and Rahbek (2002, 2003), who consider non-stationary ARCH(1) and GARCH(1,1) models, to the wide class of models given by (3).
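The QML criterion is not spelled out at this point, so the sketch below assumes the common Gaussian quasi log-likelihood. Its per-observation term, $-\frac{1}{2}(\log\sigma^2_t + r^2_t/\sigma^2_t)$, has the same form as the criterion $-E(\log w_0(d) + X_0/w_0(d))$ used in the previous chapter, with $X_t = r^2_t$. In an application, `sigma2` would be computed from recursion (3) at a candidate parameter vector, and the QML estimate maximizes the criterion over those parameters. The function name is hypothetical.

```python
import math


def gaussian_quasi_loglik(r, sigma2):
    """Assumed Gaussian QML criterion, constants dropped:
    -1/2 * sum_t (log sigma2_t + r_t^2 / sigma2_t)."""
    if len(r) != len(sigma2):
        raise ValueError("r and sigma2 must have the same length")
    if any(s <= 0 for s in sigma2):
        raise ValueError("conditional variances must be positive")
    return -0.5 * sum(math.log(s) + x * x / s for x, s in zip(r, sigma2))


# For a single observation the term is maximized at sigma2_t = r_t^2.
r = [0.5, -1.2, 2.0]
print(gaussian_quasi_loglik(r, [x * x for x in r]) > gaussian_quasi_loglik(r, [1.0, 1.0, 1.0]))
```

Only the ratio $r^2_t/\sigma^2_t$ and $\log\sigma^2_t$ enter the criterion. The normality of $z_t$ is therefore a working assumption rather than a requirement, which is what the "quasi" in QML refers to.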
The simplicity of statistical inference, coupled with a wide range of permitted parametrizations of the sequences $\{\theta_j : j \ge 0\}$ and $\{a_t : t \ge 0\}$, allows for estimation and testing of flexible conditional volatility models for a variety of real-world financial time series. This includes parametrizations outside the stationary regions of known ARCH(∞) and MD–ARCH(∞) processes. In this paper we apply the cMD–ARCH model to thirteen series of daily exchange rate returns on major world currencies. The hypotheses of interest in this application include the presence of a deterministic trend in $\{a_t : t \ge 0\}$ and square–summability of $\{\theta_j : j \ge 0\}$. We show that deterministic trends in (3), implied by the IGARCH and FIGARCH limits of the cMD–ARCH process, are rejected in most foreign exchange rate series in our sample. The hypothesis of square–summability of the sequence $\{\theta_j : j \ge 0\}$ is often rejected as well, in favor of non–square–summable coefficients. Therefore, the empirically preferred model for our sample of foreign exchange rate returns does not belong to any known class of stationary ARCH(∞) or MD–ARCH(∞) processes.
The paper is organized as follows. Statistical properties of the cMD–ARCH model, including QML inference, are discussed in detail in Section 2. Section 3 presents an empirical application of the cMD–ARCH model to a sample of thirteen foreign exchange returns series. Section 4 concludes, summarizing the findings and outlining plans for further research. Proofs of the main theorems are collected in the Appendix.
2 Specification and estimation of cMD–ARCH model
In this section we consider a class of cMD–ARCH models which allow for a wide range of non–stationary behavior in the conditional second moment of heteroscedastic time series. Subsection 2.1 introduces the new class of volatility models and provides a number of possible parametrizations of cMD–ARCH processes. Subsection 2.2 discusses statistical inference on the model parameters.
2.1 Framework for non–stationary volatility modeling
Most current theoretical and empirical research in volatility modeling is centered around the class of stationary ARCH(∞) and MD–ARCH(∞) models, which include the popular GARCH, IGARCH and covariance stationary long memory MD–ARCH(∞) processes for conditional volatility. All these models impose a set of restrictions on the parameters in the stochastic equations (1) and (2) in order to ensure existence of stationary solutions $\{(r_t, \sigma_t^2) : t \in \mathbb{Z}\}$. A review of theoretical properties of stationary ARCH(∞) models and the implied parameter restrictions is given in Giraitis, Leipus and Surgailis (2003); for the corresponding results pertaining to covariance stationary MD–ARCH(∞) sequences, refer to Koulikov (2003a).
Yet the empirical and theoretical relevance of these restrictions has received very limited attention in the econometric literature. One reason behind the lack of research in this direction is the underdeveloped theory of statistical inference for non–stationary volatility models. Important progress in developing such a theory has recently been made by Jensen and Rahbek (2002, 2003) for non–stationary ARCH(1) and GARCH(1,1) models. They establish consistency and asymptotic normality of the QML estimator for the parameters of the two processes outside the stationary region derived in Nelson (1990). On the other hand, most earlier contributions dealing with statistical inference for ARCH(∞) and MD–ARCH(∞) processes assume stationarity of $\{(r_t, \sigma_t^2) : t \in \mathbb{Z}\}$; see Lee and Hansen (1994), Berkes, Horvath and Kokoszka (2003a) and Koulikov (2003b).
Empirically, there is substantial interest in modeling the behavior of various financial assets in potentially non–stationary environments, such as foreign exchange rates during periods of market instability. A recent contribution along these lines is Davidson (2003), who uses a volatility model with time–invariant parameters to fit both the pre– and post–crisis periods in a sample of major Asian currencies. Davidson (2003) reports remarkable stability of parameter estimates across the sub-samples and concludes that some existing volatility models, such as the FIGARCH process of Baillie, Bollerslev and Mikkelsen (1996), provide an adequate statistical tool for potentially non–stationary volatility modeling.
The goal of this paper is to introduce a formal framework for non–stationary volatility
modeling and develop a statistical inference theory in this context. To this end we define the
cMD–ARCH model in (3), where the following assumptions hold:
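A1. $\{z_t^2 - 1 : t \ge 0\}$ is a stochastic non–degenerate sequence of martingale difference innovations satisfying $E[z_t^2 - 1]^2 < \infty$ and $E[z_t^2 - 1 \mid \mathcal{F}_{t-1}] = 0$, where $\mathcal{F}_t$ is the natural filtration.

A2. $\{a_t : t \ge 0\} \subseteq \mathbb{R}_+$ and $\{\theta_j : j \ge 0\} \subseteq \mathbb{R}_+$ are non–stochastic sequences of parameters such that $\{\pi_j : j \ge 0\} \subseteq \mathbb{R}_{0+}$ and $a_t - \sum_{j=0}^{t-1} \pi_j a_{t-1-j} \ge 0$ for all $t > 0$, where:
$$\pi_0 := \theta_0, \qquad \pi_j := \theta_j - \sum_{k=0}^{j-1} \theta_{j-1-k}\, \pi_k \quad \text{for } j > 0. \tag{4}$$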
We wish to remark on the following points. A1 places few restrictions on the dependence structure and distribution of the shocks $\{z_t : t \ge 0\}$. This compares favorably with the much stronger assumptions on the sequence of shocks $\{z_t^* : t \in \mathbb{Z}\}$ in stationary ARCH(∞) and MD–ARCH(∞) models, where the i.i.d. property is usually required. In addition, under A1 the innovations $\{r_t^2 - \sigma_t^2 : t \ge 0\}$ in the conditional volatility part of (3) are martingale differences. This follows from $\sigma_t^2 < \infty$ for all $t \ge 0$: since $\sigma_t^2$ is determined by $r_0, \ldots, r_{t-1}$ and is therefore $\mathcal{F}_{t-1}$-measurable, $E[\sigma_t^2 (1 - z_t^2) \mid \mathcal{F}_{t-1}] = \sigma_t^2\, E[1 - z_t^2 \mid \mathcal{F}_{t-1}] = 0$ is immediate from the definition of the process. Assumption A2 is needed to ensure a.s. non–negativity of the cMD–ARCH process, and is more general than the corresponding assumption in Koulikov (2003a).
For modeling the conditional volatility of financial time series, the process $\{\sigma_t^2 : t \ge 0\}$ defined by the cMD–ARCH model has to remain non–negative. The following theorem formally establishes the required result:

THEOREM 1. Under assumptions A1–A2 the sequence $\{(r_t^2, \sigma_t^2) : t \ge 0\}$ defined by the stochastic equations (3) is a.s. non–negative.

Examples 1 to 4 provide a number of possible parametrizations of the cMD–ARCH model and demonstrate the applicability of the non–negativity condition in A2:
EXAMPLE 1. One implication of the IGARCH process of Engle and Bollerslev (1986) is that the volatility forecast conditional on the current information set increases linearly with the forecast horizon. This is a manifestation of the fact that the stationary process $\{(r_t, \sigma_t^2) : t \in \mathbb{Z}\}$ defined by the IGARCH equations has no finite integer moments, including the first moment. Let $\{z_t : t \ge 0\}$ be a sequence of shocks satisfying A1, and let $r_0^2 = \sigma_0^2 = a_0$ denote an arbitrary non–negative initial value, from which we wish to recursively construct a sequence $\{(r_t, \sigma_t^2) : t \ge 0\}$ according to the usual IGARCH equations:
$$r_t = \sigma_t z_t, \qquad \sigma_t^2 = c + (1-\gamma)\, \sigma_{t-1}^2 + \gamma\, r_{t-1}^2,$$
where the parameters satisfy $0 < \gamma < 1$ and $c \in \mathbb{R}_+$. Then the corresponding cMD–ARCH model is given by:
$$r_t^2 = \sigma_t^2 = a_0 \quad \text{for } t \le 0,$$
$$r_t = \sigma_t z_t, \qquad \sigma_t^2 = a_0 + c\, t + \sum_{j=1}^{t} \gamma\, \big(r_{t-j}^2 - \sigma_{t-j}^2\big) \quad \text{for } t > 0. \tag{5}$$
In this representation $a_t = a_0 + c\, t$ contains a linear trend, corresponding to the linear component in the IGARCH conditional volatility forecast. The sequence of martingale difference innovations in (5) forms a stochastic trend in $\{\sigma_t^2 : t \ge 0\}$.
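In this model, the sequence of $\pi_j$'s defined in (4) is given by $\pi_j = \gamma (1-\gamma)^j$ for all $j \ge 0$. The following condition has to be checked in order to establish A2:
$$a_t - \sum_{j=0}^{t-1} \pi_j a_{t-1-j} = a_0 \Big[1 - \gamma \sum_{j=0}^{t-1} (1-\gamma)^j\Big] + c \Big[t - \gamma \sum_{j=0}^{t-1} (1-\gamma)^j (t-1-j)\Big] \ge c > 0 \quad \text{for all } t > 0,$$
by the inequality $0 < \gamma \sum_{j=0}^{t} (1-\gamma)^j \le 1$ for all $t \ge 0$: the first bracket is non-negative, and since $t-1-j \le t-1$ the second bracket is at least $t - (t-1) = 1$. Hence, by Theorem 1, the conditional volatility part of (5) is a.s. non–negative.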
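Differencing (5) gives $\sigma_t^2 = c + (1-\gamma)\sigma_{t-1}^2 + \gamma r_{t-1}^2$ for $t \ge 1$, so the cMD–ARCH form and the recursive IGARCH form should generate the same path from the same shocks. The Python/NumPy sketch below checks this numerically and also evaluates the A2 margin $a_t - \sum_{j<t}\pi_j a_{t-1-j}$, which by the argument above is at least $c$. The function names are hypothetical, and Gaussian shocks are only an illustrative choice; A1 requires neither Gaussian nor i.i.d. shocks.

```python
import numpy as np


def simulate_igarch(z, a0, c, gamma):
    """IGARCH recursion sigma_t^2 = c + (1 - gamma) sigma_{t-1}^2 + gamma r_{t-1}^2, r_0^2 = sigma_0^2 = a0."""
    T = len(z)
    s2 = np.empty(T + 1)
    r2 = np.empty(T + 1)
    s2[0] = r2[0] = a0
    for t in range(1, T + 1):
        s2[t] = c + (1 - gamma) * s2[t - 1] + gamma * r2[t - 1]
        r2[t] = s2[t] * z[t - 1] ** 2  # r_t = sigma_t z_t
    return s2


def simulate_cmd_igarch(z, a0, c, gamma):
    """cMD-ARCH form (5): sigma_t^2 = a0 + c t + gamma * sum_{j=1}^t (r_{t-j}^2 - sigma_{t-j}^2)."""
    T = len(z)
    s2 = np.empty(T + 1)
    r2 = np.empty(T + 1)
    s2[0] = r2[0] = a0
    for t in range(1, T + 1):
        s2[t] = a0 + c * t + gamma * np.sum(r2[:t] - s2[:t])
        r2[t] = s2[t] * z[t - 1] ** 2
    return s2


def a2_margin(a0, c, gamma, T):
    """a_t - sum_{j<t} pi_j a_{t-1-j}, t = 1..T, with a_t = a0 + c t and pi_j = gamma (1 - gamma)^j."""
    a = a0 + c * np.arange(T + 1)
    pi = gamma * (1 - gamma) ** np.arange(T)
    return np.array([a[t] - np.dot(pi[:t], a[t - 1::-1]) for t in range(1, T + 1)])
```

For example, with `z = np.random.default_rng(0).standard_normal(300)`, `simulate_igarch(z, 1.0, 0.01, 0.1)` and `simulate_cmd_igarch(z, 1.0, 0.01, 0.1)` agree up to rounding error.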
Finally, we note that, unlike in the stationary IGARCH model, the sequence of shocks in the cMD–ARCH sequence (5) need not be i.i.d. In this case, however, convergence of $\{(r_t, \sigma_t^2) : t \ge 0\}$ to the limiting IGARCH process cannot be shown. In the i.i.d. case the result follows from Theorem 2 of Nelson (1990).
EXAMPLE 2. The FIGARCH process of Baillie, Bollerslev and Mikkelsen (1996) belongs to the family of ARCH(∞) sequences (1), with coefficients $\pi_j = O(j^{-1-d})$ converging to zero at a relatively slow hyperbolic rate controlled by a parameter $0 < d < 1$. As in the IGARCH case, the sequence $\{\pi_j : j \ge 0\}$ in the FIGARCH model sums to unity. However, existence and properties of a stationary solution $\{(r_t, \sigma_t^2) : t \in \mathbb{Z}\}$ satisfying the FIGARCH equations have not been established in the literature; see Giraitis, Leipus and Surgailis (2003).
While the limiting properties of the FIGARCH model remain unknown, it can be applied empirically to samples of financial data by truncating the infinite series in (1) at the sample size. This effectively sidesteps the problem of potential non–stationarity of the model. In the cMD–ARCH framework the truncated FIGARCH process has the following representation:
$$r_t^2 = \sigma_t^2 = a_0 \quad \text{for } t \le 0,$$
$$r_t = \sigma_t z_t, \qquad \sigma_t^2 = a_0 + a^* \sum_{j=1}^{t} \theta_{j-1} + \sum_{j=1}^{t} \theta_{j-1} \big(r_{t-j}^2 - \sigma_{t-j}^2\big) \quad \text{for } t > 0, \tag{6}$$
where $a_0 \in \mathbb{R}_+$ is a starting value of the process, and the sequence of coefficients $\{\theta_j : j \ge 0\}$ is obtained from $\{\pi_j : j \ge 0\}$ by inverting formula (4). It can be shown that for the hyperbolic $\pi_j = O(j^{-1-d})$ the corresponding $\theta_j$ decay at the exact rate $j^{-1+d}$; therefore the sequence $\{\theta_j : j \ge 0\}$ is not absolutely summable.
The trend component in (6) is given by $a_t = a_0 + a^* \sum_{j=1}^{t} \theta_{j-1}$, whose asymptotic rate $O(t^d)$ is slower than the linear trend in model (5). The stochastic part of the model consists of a weighted sequence of martingale difference innovations $\{r_t^2 - \sigma_t^2 : t \ge 0\}$, where the weighting coefficients $\{\theta_j : j \ge 0\}$ are square–summable for $0 < d < 1/2$, half of the permitted parameter range.
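Non–negativity of the conditional volatility part in (6) holds by Theorem 1 and the following argument:
$$\begin{aligned} a_t - \sum_{j=0}^{t-1} \pi_j a_{t-1-j} &= a_0 \Big[1 - \sum_{j=0}^{t-1} \pi_j\Big] + a^* \sum_{j=0}^{t-1} \Big[\theta_j - \pi_j \sum_{i=0}^{t-2-j} \theta_i\Big] \\ &= a_0 \Big[1 - \sum_{j=0}^{t-1} \pi_j\Big] + a^* \sum_{j=0}^{t-1} \pi_j \;\ge\; a^* \sum_{j=0}^{t-1} \pi_j > 0 \quad \text{for all } t > 0, \end{aligned} \tag{7}$$
where we use (4) and the inequality $0 < \sum_{j=0}^{t} \pi_j \le 1$ for all $t \ge 0$.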
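The FIGARCH statements above can be checked numerically. In the Python/NumPy sketch below (our own illustration, with hypothetical names), `figarch_pi` assumes the FIGARCH(0, d, 0) weights are the coefficients of $1 - (1-z)^d$; `theta_from_pi` inverts (4), and `a2_margin` evaluates the left-hand side of (7). Inverting (4) for these weights should reproduce the coefficients of $(1-z)^{-d}$ beyond the constant term, which decay like $j^{d-1}/\Gamma(d)$, in line with the rate $j^{-1+d}$ stated above.

```python
import math

import numpy as np


def figarch_pi(d, n):
    """pi_0, ..., pi_{n-1} of FIGARCH(0, d, 0): minus the coefficients of z^1, ..., z^n in (1 - z)^d."""
    b = np.empty(n + 1)
    b[0] = 1.0
    for k in range(1, n + 1):
        b[k] = b[k - 1] * (k - 1 - d) / k  # binomial series of (1 - z)^d
    return -b[1:]


def theta_from_pi(pi):
    """Invert (4): theta_0 = pi_0, theta_j = pi_j + sum_{k=0}^{j-1} theta_{j-1-k} pi_k."""
    pi = np.asarray(pi, dtype=float)
    theta = np.empty_like(pi)
    theta[0] = pi[0]
    for j in range(1, pi.size):
        theta[j] = pi[j] + np.dot(theta[j - 1::-1], pi[:j])
    return theta


def a2_margin(a0, a_star, theta, pi, T):
    """Left-hand side of (7), t = 1..T, for a_t = a0 + a* sum_{j=1}^t theta_{j-1}."""
    a = a0 + a_star * np.concatenate(([0.0], np.cumsum(theta[:T])))  # a_0, ..., a_T
    return np.array([a[t] - np.dot(pi[:t], a[t - 1::-1]) for t in range(1, T + 1)])


d, n = 0.4, 2000
pi = figarch_pi(d, n)
theta = theta_from_pi(pi)
# theta_{n-1} is the coefficient of z^n in (1 - z)^(-d), close to n^(d-1) / Gamma(d) for large n
ratio = theta[-1] * n ** (1 - d) * math.gamma(d)
```

The value of `ratio` should be close to one, and `a2_margin` should coincide with $a_0[1 - \sum_{j<t}\pi_j] + a^* \sum_{j<t}\pi_j$ as in the second line of (7).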
EXAMPLE 3. The class of stationary GARCH(p,q) sequences of Bollerslev (1986) and the long memory MD–ARCH(∞) model of Koulikov (2003a) have representation (2); for a detailed discussion refer to Koulikov (2003a, 2003b). By starting the process at a non–negative value $r_0^2 = \sigma_0^2 = a_0$ and truncating the infinite series in equation (2) at $t$, the cMD–ARCH model can be defined as:
$$r_t^2 = \sigma_t^2 = a_0 \quad \text{for } t \le 0,$$
$$r_t = \sigma_t z_t, \qquad \sigma_t^2 = a_0 + \sum_{j=1}^{t} \theta_{j-1} \big(r_{t-j}^2 - \sigma_{t-j}^2\big) \quad \text{for } t > 0, \tag{8}$$
where the sequence of coefficients $\{\theta_j : j \ge 0\}$ is at least square–summable. It follows from Theorem 3 of Koulikov (2003a) that the cMD–ARCH model (8) converges to a stationary MD–ARCH(∞) limit when $\{z_t : t \ge 0\}$ are i.i.d. shocks.
In contrast to models (5) and (6) in the previous examples, the sequence $\{a_t : t \ge 0\}$ in (8) is constant for all $t \ge 0$. The stochastic part of the model is similar to that in (6), with an added square–summability requirement on the $\theta_j$'s. The latter is needed for the three series theorem to hold, ensuring a.s. convergence of the Volterra series expansion of (8); for details refer to Koulikov (2003a).
Non–negativity of the process $\{\sigma_t^2 : t \ge 0\}$ in (8) follows by the same arguments as in Example 2, on replacing $a^*$ in (7) by 0 and observing that the inequality $\sum_{j=0}^{t} \pi_j \le 1$ holds for the class of GARCH and MD–ARCH(∞) models by Theorem 2.1 in Giraitis, Kokoszka and Leipus (2000) and Theorem 1 in Koulikov (2003b).
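As a concrete instance of (8), consider GARCH(1,1), $\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2$ with $\alpha + \beta < 1$. A short calculation, which is ours and not taken from the paper, rewrites it as $\sigma_t^2 - a_0 = (\alpha+\beta)(\sigma_{t-1}^2 - a_0) + \alpha(r_{t-1}^2 - \sigma_{t-1}^2)$ with $a_0 = \omega/(1-\alpha-\beta)$. Iterating from $\sigma_0^2 = a_0$ yields (8) with the square–summable weights $\theta_j = \alpha(\alpha+\beta)^j$. The Python/NumPy sketch below, using illustrative parameter values and hypothetical function names, checks that both recursions give the same path.

```python
import numpy as np


def garch11(z, omega, alpha, beta):
    """GARCH(1,1): sigma_t^2 = omega + alpha r_{t-1}^2 + beta sigma_{t-1}^2, started at a0 = omega/(1-alpha-beta)."""
    a0 = omega / (1 - alpha - beta)
    T = len(z)
    s2 = np.empty(T + 1)
    r2 = np.empty(T + 1)
    s2[0] = r2[0] = a0
    for t in range(1, T + 1):
        s2[t] = omega + alpha * r2[t - 1] + beta * s2[t - 1]
        r2[t] = s2[t] * z[t - 1] ** 2  # r_t = sigma_t z_t
    return s2


def cmd_arch_8(z, a0, theta):
    """Truncated form (8): sigma_t^2 = a0 + sum_{j=1}^t theta_{j-1} (r_{t-j}^2 - sigma_{t-j}^2)."""
    T = len(z)
    s2 = np.empty(T + 1)
    r2 = np.empty(T + 1)
    s2[0] = r2[0] = a0
    for t in range(1, T + 1):
        # theta[:t] = (theta_0, ..., theta_{t-1}) is paired with lags 1, ..., t
        s2[t] = a0 + np.dot(theta[:t], r2[t - 1::-1] - s2[t - 1::-1])
        r2[t] = s2[t] * z[t - 1] ** 2
    return s2
```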
EXAMPLE 4. The cMD–ARCH models (5), (6) and (8) in the previous examples are based on specifications of known members of the ARCH(∞) or MD–ARCH(∞) family. They imply different forms of the deterministic and stochastic parts of the conditional volatility process $\{\sigma_t^2 : t \ge 0\}$. In this example we combine ideas of the three models into one general cMD–ARCH specification as follows:
$$r_t^2 = \sigma_t^2 = a_0 \quad \text{for } t \le 0,$$
$$r_t = \sigma_t z_t, \qquad \sigma_t^2 = a_0 + c\, t^{\alpha} + \gamma\, (1 - \phi L)^{-1} (1 - L)^{-d} \big(r_{t-1}^2 - \sigma_{t-1}^2\big) \quad \text{for } t > 0, \tag{9}$$
where $L$ denotes the lag operator, and the parameters satisfy $a_0, c \in \mathbb{R}_+$, $0 \le d, \alpha, \gamma \le 1$ and $|\phi| < 1$. A number of cross restrictions on these parameters have to be imposed to ensure a.s. non–negativity of the conditional variance.
This process nests all previous specifications in Examples 1 to 3 as special cases. In particular, the sequence of coefficients $\{\theta_j : j \ge 0\}$ generated by $\gamma (1 - \phi z)^{-1} (1 - z)^{-d}$ satisfies $\sum_{j=0}^{\infty} \theta_j^{\nu} < \infty$ for $\nu > 1/(1-d)$ and $0 < d < 1$, and is absolutely summable when $d = 0$ and $0 < \phi < 1$. At the other extreme, the case $d = 1$, $\phi = 0$ and $0 < \gamma < 1$ corresponds, with $\alpha = 1$, to model (5) in Example 1.
The deterministic part of the conditional volatility process in (9) allows for substantial flexibility in $\{a_t : t \ge 0\}$, ranging from a linear trend resembling the one in model (5) to the absence of any deterministic time dependence, as in (8). The parameter $c$ controls the importance of the deterministic time behavior.
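To establish non–negativity of model (9) one needs to restrict the parameters of the model such that A2 is satisfied. The closed–form expressions for $\{\pi_j : j \ge 0\}$ are somewhat complicated, but sufficient conditions for non–negativity of the $\pi_j$'s are given by $d(1 - d - 2\phi) \ge 0$ and $d - \gamma + \phi \ge 0$; refer to Koulikov (2003a). For sufficiently large $d$ these conditions permit a negative $\phi$ parameter. The second condition in A2 follows from:
$$a_t - \sum_{j=0}^{t-1} \pi_j a_{t-1-j} = a_0 \Big[1 - \sum_{j=0}^{t-1} \pi_j\Big] + c \Big[t^{\alpha} - \sum_{j=0}^{t-1} \pi_j (t-1-j)^{\alpha}\Big] \ge c \big[t^{\alpha} - (t-1)^{\alpha}\big] \ge 0 \quad \text{for all } t > 0,$$
using $\pi_j \ge 0$, $(t-1-j)^{\alpha} \le (t-1)^{\alpha}$ and the inequality $0 < \sum_{j=0}^{t} \pi_j \le 1$ according to Koulikov (2003a). For $\alpha = 1$ the lower bound equals $c > 0$, as in Example 1.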
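The weights of specification (9) can be generated by expanding $(1-z)^{-d}$ with the recursion $c_k = c_{k-1}(k-1+d)/k$ and convolving with the powers $\phi^j$. The Python/NumPy sketch below (an illustration; all function names are hypothetical) does this, applies recursion (4), and evaluates the sufficient conditions quoted above together with the A2 margin for $a_t = a_0 + c\,t^{\alpha}$. Because it only inspects finitely many coefficients, it is a diagnostic aid rather than a proof of A2.

```python
import numpy as np


def theta_example4(gamma, phi, d, n):
    """theta_0, ..., theta_{n-1}: coefficients of gamma (1 - phi z)^(-1) (1 - z)^(-d)."""
    c = np.empty(n)
    c[0] = 1.0
    for k in range(1, n):
        c[k] = c[k - 1] * (k - 1 + d) / k  # binomial series of (1 - z)^(-d)
    theta = np.empty(n)
    acc = 0.0
    for j in range(n):
        acc = phi * acc + c[j]  # acc = sum_{k=0}^{j} phi^(j-k) c_k
        theta[j] = gamma * acc
    return theta


def pi_from_theta(theta):
    """Recursion (4): pi_0 = theta_0, pi_j = theta_j - sum_{k<j} theta_{j-1-k} pi_k."""
    pi = np.empty_like(theta)
    pi[0] = theta[0]
    for j in range(1, theta.size):
        pi[j] = theta[j] - np.dot(theta[j - 1::-1], pi[:j])
    return pi


def sufficient_conditions(gamma, phi, d):
    """Sufficient conditions for pi_j >= 0 quoted in the text from Koulikov (2003a)."""
    return d * (1 - d - 2 * phi) >= 0 and d - gamma + phi >= 0


def a2_margin(a0, c, alpha, pi, T):
    """a_t - sum_{j<t} pi_j a_{t-1-j} for t = 1..T, with a_t = a0 + c t^alpha."""
    a = a0 + c * np.arange(T + 1, dtype=float) ** alpha
    a[0] = a0  # pre-sample value, as in (9)
    return np.array([a[t] - np.dot(pi[:t], a[t - 1::-1]) for t in range(1, T + 1)])
```

Two special cases make handy checks: $d = 1$, $\phi = 0$ returns the constant weights $\theta_j = \gamma$ of Example 1, and $d = 0$ gives $\theta_j = \gamma\phi^j$ with $\pi_j = \gamma(\phi-\gamma)^j$.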
2.2 The quasi–maximum likelihood estimation
A substantial amount of research in the theoretical GARCH literature has been devoted to statistical inference for this family of time series models. However, most of this work assumes a stationary ARCH(∞) or MD–ARCH(∞) process $\{(r_t, \sigma_t^2) : t \in \mathbb{Z}\}$, from which a finite realization of returns $\{r_t : 1 \le t \le T\}$ is observed. Recent contributions within this framework include Berkes, Horvath and Kokoszka (2003a), Robinson and Zaffaroni (2003), Hall and Yao (2003) and Koulikov (2003b); for a broader review of the literature refer to Giraitis, Leipus and Surgailis (2003). When stationarity of the data generating process of returns and conditional volatility is not assumed, the asymptotic properties of the QML estimator have been established only for some special cases. Jensen and Rahbek (2002, 2003) show consistency and asymptotic normality of the QML estimator of the non–intercept parameters of non–stationary explosive ARCH(1) and GARCH(1,1) processes. They note that, in contrast to unit root and explosive non–stationarities in linear time series models, the properties of the QML estimator in non–stationary volatility processes remain similar to the stationary case. In this subsection we extend their results to the class of cMD–ARCH sequences (3), which includes the two cases considered in Jensen and Rahbek (2002, 2003).
Let an observed sequence of returns of length $T \ge 1$ satisfy the set of equations (3). Let $\{a_t(u) : t \ge 0\}$ and $\{\theta_j(u) : j \ge 0\}$ be functions of a vector $u$ such that A2 is satisfied for every $u \in U$, where $U$ is a subset of a finite–dimensional Euclidean space. Let the unknown parameter vector be $u_0 \in U$, and let the sequence of shocks $\{z_t : t \ge 0\}$ in the data generating process satisfy A1. We wish to obtain statistical inference on $u_0$ based on the sample $\{r_t : 1 \le t \le T\}$ by maximizing the following stochastic function:
$$L_T(u) = -\sum_{t=1}^{T} \Big[\log \sigma_t^2(u) + \frac{r_t^2}{\sigma_t^2(u)}\Big], \tag{10}$$
where the sequence of non–negative functions $\{\sigma_t^2(u) : 1 \le t \le T\}$ on $U$ is defined as:
$$r_t^2 = \sigma_t^2(u) = a_0(u) \quad \text{for } t \le 0,$$
$$\sigma_t^2(u) = a_t(u) + \sum_{j=1}^{t} \theta_{j-1}(u) \big[r_{t-j}^2 - \sigma_{t-j}^2(u)\big] \quad \text{for } 1 \le t \le T.$$
It is important to note the following: in contrast to most of the previously cited studies, where properties of the QML estimator are derived under the assumption of stationarity, the sequence $\{\sigma_t^2(u_0) : 1 \le t \le T\}$ gives the true values of the conditional volatility in the sample, not just a consistent approximation. In view of this fact, and in order to save on notation, we write $\sigma_t^2 = \sigma_t^2(u_0)$ for every $1 \le t \le T$. The superscript $(n)$ next to a function denotes its $n$-th derivative with respect to $u$.
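The quasi log–likelihood (10) only requires the recursion for $\sigma_t^2(u)$. The Python/NumPy sketch below evaluates both for specification (9), with $u = (c, \alpha, \gamma, d, \phi)$ and $a_0$ held fixed as in the empirical section. The function names are our own. The maximization step is left out. A generic numerical optimizer would be one option (for instance `scipy.optimize.minimize` applied to $-L_T(u)$, with constraints keeping A2 satisfied), but it is not the routine used in the paper, which was written in Ox. The `simulate` helper draws from the same model so that the recursion can be checked against a known path.

```python
import numpy as np


def theta_coeffs(gamma, phi, d, n):
    """theta_0..theta_{n-1} of gamma (1 - phi L)^(-1) (1 - L)^(-d), as in specification (9)."""
    k = np.arange(1, n)
    c = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))  # coefficients of (1 - L)^(-d)
    theta = np.empty(n)
    acc = 0.0
    for j in range(n):
        acc = phi * acc + c[j]  # convolution with phi^j
        theta[j] = gamma * acc
    return theta


def sigma2_path(r, a0, c, alpha, gamma, d, phi):
    """sigma_t^2(u), t = 1..T, for (9) given returns r_1..r_T; pre-sample values equal a0."""
    r2 = np.asarray(r, dtype=float) ** 2
    T = r2.size
    theta = theta_coeffs(gamma, phi, d, T)
    a = a0 + c * np.arange(T + 1, dtype=float) ** alpha  # a[0] is not used below
    r2_full = np.concatenate(([a0], r2))  # r_0^2 = a0(u)
    s2 = np.empty(T + 1)
    s2[0] = a0  # sigma_0^2 = a0(u)
    for t in range(1, T + 1):
        s2[t] = a[t] + np.dot(theta[:t], r2_full[t - 1::-1] - s2[t - 1::-1])
    return s2[1:]


def quasi_loglik(r, sigma2):
    """L_T(u) of (10): -sum_t [log sigma_t^2(u) + r_t^2 / sigma_t^2(u)]."""
    r = np.asarray(r, dtype=float)
    return -np.sum(np.log(sigma2) + r ** 2 / sigma2)


def simulate(z, a0, c, alpha, gamma, d, phi):
    """Draw (r_t, sigma_t^2), t = 1..T, from (9) for given shocks z_1..z_T."""
    T = len(z)
    theta = theta_coeffs(gamma, phi, d, T)
    a = a0 + c * np.arange(T + 1, dtype=float) ** alpha
    r2 = np.empty(T + 1)
    s2 = np.empty(T + 1)
    r2[0] = s2[0] = a0
    for t in range(1, T + 1):
        s2[t] = a[t] + np.dot(theta[:t], r2[t - 1::-1] - s2[t - 1::-1])
        r2[t] = s2[t] * z[t - 1] ** 2
    return np.sqrt(s2[1:]) * z, s2[1:]
```

At the true parameter values the recursion reproduces the simulated conditional variance exactly (up to rounding), which mirrors the remark above that $\sigma_t^2(u_0)$ is the true conditional volatility rather than an approximation.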
The following additional regularity conditions on the parameters $\{a_t(u) : t \ge 0\}$ and $\{\theta_j(u) : j \ge 0\}$ are needed for Theorem 2:

A3. For a non–zero vector $v$ with Euclidean norm $\|v\| = 1$ and a non–negative finite constant $C$ let:
$$v' \left| \Big[\log\Big(a_t(u_0) - \sum_{j=0}^{t-1} \pi_j(u_0)\, a_{t-1-j}(u_0)\Big)\Big]^{(1)} + \sum_{j=1}^{t} \big[\log \pi_{j-1}(u_0)\big]^{(1)} \right| \le C \quad \text{for all } t \ge 0.$$

A4. The following function is continuous element–wise in $u$ in a neighborhood of $u_0$:
$$\Big[a_t(u) - \sum_{j=0}^{t-1} \pi_j(u)\, a_{t-1-j}(u)\Big]^{(2)} + \sum_{j=1}^{t} \big[\pi_{j-1}(u)\big]^{(2)} \quad \text{for all } t \ge 0.$$
The proof of the following result is based on Basawa, Feigin and Heyde (1976), and Jensen and Rahbek (2002, 2003):

THEOREM 2. Under A1–A4, let the sequence of estimators $\{\hat u_T : T \ge 1\}$ be defined by:
$$\hat u_T = \arg\max_{u \in U} \frac{1}{T} L_T(u),$$
where the log–likelihood function $L_T(u)$ is given in (10). Then:
$$T^{1/2} (\hat u_T - u_0) \xrightarrow{\;d\;} N\big(0,\, A_0^{-1} B_0 A_0^{-1}\big) \quad \text{as } T \to \infty,$$
where $A_0 := \lim_{T \to \infty} E\big[\tfrac{1}{T} L_T^{(2)}(u_0)\big]$, $B_0 := \lim_{T \to \infty} E\big[\tfrac{1}{T} L_T^{(1)}(u_0)\, L_T^{(1)}(u_0)'\big]$, and $N(0, S)$ denotes a multivariate normal distribution with mean $0$ and variance $S$.
In the empirical part of the paper we estimate the matrices $A_0$ and $B_0$ by the numerical Hessian and the outer product of gradients of the sample log–likelihood function, evaluated at the consistent estimates $\hat u_T$.
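The sandwich covariance in Theorem 2 can be estimated by finite differences, in the spirit of the numerical Hessian and outer product of gradients just described. The Python/NumPy sketch below is our own illustration, and its function name and step sizes are assumptions. It expects a function returning the vector of per-observation terms $l_t(u)$, so that $L_T(u) = \sum_t l_t(u)$, and returns an estimate of $A_0^{-1} B_0 A_0^{-1}$. Approximate standard errors of $\hat u_T$ are then the square roots of its diagonal divided by $T$.

```python
import numpy as np


def numerical_sandwich(loglik_t, u_hat, h_grad=1e-6, h_hess=1e-4):
    """Estimate A^{-1} B A^{-1} by central finite differences.

    loglik_t(u) returns the length-T vector of per-observation terms l_t(u), L_T(u) = sum_t l_t(u).
    A is the Hessian of L_T / T, and B the outer product of per-observation gradients / T, at u_hat.
    """
    u_hat = np.asarray(u_hat, dtype=float)
    k = u_hat.size
    T = np.asarray(loglik_t(u_hat)).size
    e = np.eye(k)
    # T x k matrix of per-observation scores
    scores = np.column_stack([
        (loglik_t(u_hat + h_grad * e[i]) - loglik_t(u_hat - h_grad * e[i])) / (2 * h_grad)
        for i in range(k)
    ])

    def L(u):
        return np.sum(loglik_t(u))

    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            di, dj = h_hess * e[i], h_hess * e[j]
            H[i, j] = (L(u_hat + di + dj) - L(u_hat + di - dj)
                       - L(u_hat - di + dj) + L(u_hat - di - dj)) / (4 * h_hess ** 2)
    A = H / T
    B = scores.T @ scores / T
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv
```

For (10) one would take $l_t(u) = -[\log \sigma_t^2(u) + r_t^2/\sigma_t^2(u)]$. Central differences keep the sketch generic; analytic derivatives of the $\sigma_t^2(u)$ recursion would be more accurate but model-specific.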
The vector of QML estimates $\hat u_T$ allows us to calculate estimates of the unobserved sequence of true shocks $\{z_t : 1 \le t \le T\}$ as follows:
$$\hat z_t = \operatorname{sign}(r_t) \cdot \sqrt{\frac{r_t^2}{\sigma_t^2(\hat u_T)}}, \tag{11}$$
where $\operatorname{sign}(r_t)$ returns $-1$ for $r_t < 0$ and $1$ for $r_t \ge 0$. It follows from Theorem 2 that $\{\hat z_t : 1 \le t \le T\}$ are consistent for the sequence of true shocks. In Section 3 we assess the fit of empirical foreign exchange volatility models by checking the adequacy of A1 in the sequence of estimated residuals $\{\hat z_t : 1 \le t \le T\}$. We note, however, that the statistical properties of diagnostic tests based on (11) remain unknown, and are likely to depend on the corresponding properties of $\{\hat u_T : T \ge 1\}$. Asymptotic properties of autocorrelation tests based on $\{\hat z_t^2 : 1 \le t \le T\}$ have recently been established by Berkes, Horvath and Kokoszka (2003b) for the stationary GARCH(p,q) model.
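The residuals (11) and the portmanteau statistics reported in Tables 1 and 3 can be computed along the following lines. This Python/NumPy sketch (our own, with hypothetical names) implements the plain Box–Pierce statistic $Q(m) = T \sum_{k=1}^{m} \hat\rho_k^2$. Any further adjustments used in the paper, for example those of Li and Mak (1994) for squared residuals, are not reproduced, so its output need not match the tables exactly. Note that (11) simply equals $r_t / \sigma_t(\hat u_T)$.

```python
import numpy as np


def std_residuals(r, sigma2_hat):
    """Estimated shocks (11): sign(r_t) * sqrt(r_t^2 / sigma_t^2(u_hat)), with sign(0) taken as +1."""
    r = np.asarray(r, dtype=float)
    sign = np.where(r < 0, -1.0, 1.0)
    return sign * np.sqrt(r ** 2 / np.asarray(sigma2_hat, dtype=float))


def box_pierce(x, m):
    """Box-Pierce statistic Q(m) = T * sum_{k=1}^m rho_k^2 from sample autocorrelations rho_k."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    rho = np.array([np.dot(xc[k:], xc[:-k]) / denom for k in range(1, m + 1)])
    return x.size * np.sum(rho ** 2)
```

With `z_hat = std_residuals(r, sigma2_hat)`, the columns Q(100) and Q²(100) of Table 3 correspond to `box_pierce(z_hat, 100)` and `box_pierce(z_hat ** 2, 100)`.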
3 Application to foreign exchange returns
This section provides an empirical illustration of the new class of conditional volatility models
introduced in section 2. A general overview of the dataset is given in subsection 3.1, followed
by the empirical results in subsection 3.2.
3.1 Data and descriptive statistics
An empirical application of the cMD–ARCH model in section 3 is based on a sample of thirteen
major European and Asian foreign exchange rate returns series. The dataset contains ten years
of daily foreign exchange rates against the US dollar, from which compound daily returns are
calculated by the usual methodology. All series are obtained from Datastream and cover 2576
returns from 1st January 1994 to 18th November 2003.
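A minimal sketch of the data preparation follows. It assumes that "compound daily returns" means continuously compounded (log) returns, which the text does not state explicitly. The moment estimators use $1/T$ normalization and report kurtosis rather than excess kurtosis, which appears consistent with the values of 4 to 5 in Table 1. The Jarque–Bera statistic is the standard $T[\hat s^2/6 + (\hat k - 3)^2/24]$. Function names are hypothetical.

```python
import numpy as np


def log_returns(prices):
    """Continuously compounded returns r_t = log(P_t / P_{t-1}) (assumed convention)."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))


def moments(r):
    """Mean, variance, skewness, kurtosis (3 for a Gaussian) and Jarque-Bera statistic, 1/T moments."""
    r = np.asarray(r, dtype=float)
    T = r.size
    mean = r.mean()
    dev = r - mean
    var = np.mean(dev ** 2)
    skew = np.mean(dev ** 3) / var ** 1.5
    kurt = np.mean(dev ** 4) / var ** 2
    jb = T * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)
    return mean, var, skew, kurt, jb
```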
The dataset includes exchange rate series for nine European countries: Denmark, Finland, Germany, Ireland, Italy, Portugal, Spain, Switzerland, and the United Kingdom. Since January 1999, six of these countries have been members of the common European currency area, having adopted the euro as their national currency. This is likely to affect a number of results reported in the next subsection, due to the common dynamics of the six currencies against the US dollar.
In addition, the dataset contains foreign exchange returns series for four major Asian economies: Indonesia, Japan, South Korea, and Taiwan. The sample covers the Asian economic crisis of 1997–1998. Because of this, and due to the relatively sluggish economic development in Asia during the last decade in comparison to Europe and America, we expect to find substantial differences in volatility modeling results for this part of the sample.
Table 1: Descriptive statistics of foreign exchange returns series.
Series Mean Variance Skewness Kurtosis Q(100) Q²(100)
DKK -2.870·10^-5 3.483·10^-5 -0.28542∗ 4.49634∗ 99.6128 260.666∗
FIM -5.362·10^-5 3.528·10^-5 -0.25159∗ 4.21366∗ 95.3173 289.506∗
DEM -1.767·10^-5 3.677·10^-5 -0.27849∗ 4.57641∗ 102.359 372.140∗
IEP -2.339·10^-5 3.219·10^-5 -0.18452∗ 4.73030∗ 98.6883 389.494∗
ITL -1.606·10^-5 3.297·10^-5 -0.19310∗ 4.59435∗ 98.9594 527.978∗
PTE -1.483·10^-5 3.473·10^-5 -0.27748∗ 4.54844∗ 108.759 338.646∗
ESP -4.853·10^-6 3.373·10^-5 -0.24890∗ 4.46264∗ 104.279 307.245∗
CHF -4.554·10^-5 4.405·10^-5 -0.32785∗ 5.06566∗ 89.1619 373.699∗
GBP -5.164·10^-5 2.102·10^-5 -0.09735∗ 4.73871∗ 99.0589 188.816∗
IDR 5.399·10^-4 3.846·10^-4 1.49429∗ 44.4082∗ 664.571∗ 5083.84∗
JPY -9.539·10^-6 5.152·10^-5 -0.65022∗ 8.54790∗ 139.409 941.560∗
KRW 1.482·10^-4 1.001·10^-4 -0.75246∗ 114.191∗ 1194.46∗ 3299.99∗
TWD 9.443·10^-5 7.410·10^-6 2.56988∗ 54.5937∗ 320.633∗ 737.627∗
Notes: The table reports the respective statistics for the sample of returns $\{r_t : 1 \le t \le T\}$, where $T = 2576$. The series are denoted using the international currency codes. Except for the mean and variance, a star next to a reported test statistic indicates significance at the 5% level under the appropriate distribution. The column Q(100), respectively Q²(100), shows Box–Pierce statistics for returns, respectively squared returns; see Li and Mak (1994). Skewness and kurtosis statistics are computed according to Jarque and Bera (1987).
Summary statistics of the returns in the dataset are given in Table 1. The reported results point to significant deviations from Gaussianity in the distribution of foreign exchange rate returns, a well known result in the empirical volatility modeling literature. For the four Asian currencies the deviation appears to be especially pronounced, and can be attributed to the effects of the 1997–1998 crisis.
Strong dynamic dependence in the squared returns is present in all foreign exchange series. In addition, the IDR, KRW and TWD series have significant autocorrelation in the returns themselves. This is likely to be an effect of the rapid depreciation of these currencies, which introduced a long sequence of negative returns against the US dollar into the sample during the crisis period. We use a simple AR(1) filter to eliminate these dependencies before estimating the conditional volatility models in the next subsection.
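The AR(1) filter can be sketched as an ordinary least squares regression of $r_t$ on a constant and $r_{t-1}$, with the residuals passed on to the volatility model. The text does not specify the exact filter used in the paper, for example whether it was estimated jointly with the volatility parameters. The Python/NumPy function below is therefore an assumption. Note that it drops the first observation.

```python
import numpy as np


def ar1_filter(r):
    """OLS fit of r_t = mu + rho * r_{t-1} + e_t; returns (mu, rho, residuals e_2, ..., e_T)."""
    r = np.asarray(r, dtype=float)
    X = np.column_stack([np.ones(r.size - 1), r[:-1]])  # regressors: constant and r_{t-1}
    y = r[1:]
    (mu, rho), *_ = np.linalg.lstsq(X, y, rcond=None)
    return mu, rho, y - X @ np.array([mu, rho])
```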
3.2 Empirical results
In this subsection we discuss an empirical application of the cMD–ARCH model to foreign exchange data. As shown in Section 2, the class of cMD–ARCH models imposes very few restrictions on the parameters of the conditional volatility process, allowing for non–stationary and explosive behavior. At the same time, stationary limiting processes, such as IGARCH and long memory covariance stationary MD–ARCH(∞) sequences, are also included. The QML estimator of the cMD–ARCH parameters is shown to be consistent and asymptotically normal. The goal of the modeling exercise in this section is to apply these ideas to real world data, testing for the possible presence of non–stationary behavior in foreign exchange returns.
The empirical heteroscedasticity model of foreign exchange returns in this subsection is based on the cMD–ARCH specification (9). Recall from Example 4 that this specification includes both stationary and non–stationary limits, as well as a flexible deterministic part in the conditional volatility process. In particular, the empirically important IGARCH model of Engle and Bollerslev (1986) is nested within specification (9). This allows for testing the hypothesis of integrated volatility in the sense of Engle and Bollerslev (1986) against a number of other nested stationary and non–stationary limiting processes. Among these, processes with a non–zero fractional integration parameter $d$ are of special interest. As shown in Example 4, $0 < d < 1$ implies non–summable coefficients $\{\theta_j : j \ge 0\}$ in the general cMD–ARCH model (3), while square–summability of $\{\theta_j : j \ge 0\}$, and thus the covariance stationary MD–ARCH(∞) limit of (9), holds for $0 < d < 1/2$.
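The role of $d = 1/2$ as the boundary of square–summability can be seen numerically. With $\phi = 0$ the weights of (9) are $\theta_j = \gamma c_j$, where $c_j$ are the coefficients of $(1-z)^{-d}$ and behave like $j^{d-1}/\Gamma(d)$. A non-zero $|\phi| < 1$ only rescales the tail and does not change summability. The Python/NumPy sketch below (illustrative, with names of our own) compares partial sums of $c_j^2$ up to $10^4$ and $10^5$ terms. For $d = 0.3$ they settle down, while for $d = 0.7$ they keep growing like $N^{2d-1}$.

```python
import numpy as np


def frac_coeffs(d, n):
    """First n coefficients c_0, ..., c_{n-1} of (1 - z)^(-d), via c_k = c_{k-1} (k - 1 + d) / k."""
    k = np.arange(1, n)
    return np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))


def partial_square_sums(d, horizons=(10_000, 100_000)):
    """Partial sums sum_{j<N} c_j^2 for each N in horizons (equal to theta_j^2 / gamma^2 sums when phi = 0)."""
    s = np.cumsum(frac_coeffs(d, max(horizons)) ** 2)
    return [s[N - 1] for N in horizons]


for d in (0.3, 0.7):
    small, large = partial_square_sums(d)
    print(f"d = {d}: ratio of partial sums (1e5 vs 1e4 terms) = {large / small:.3f}")
```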
During estimation, the parameter $a_0$ in the cMD–ARCH model (9) has to be fixed in order to avoid identification issues between $a_0$ and $c$ when $\alpha = 0$. In all empirical volatility models estimated in this subsection we fix $a_0$ at the sample average of $\{r_t^2 : 1 \le t \le T\}$. For second–order stationary returns processes this choice of $a_0$ naturally corresponds to the unconditional expected value of the volatility; refer to Example 3. For other cases, $a_0$ represents an unobserved pre–sample value of the squared returns process, permitting a range of empirically reasonable choices of $a_0$.

Table 2: cMD–ARCH modeling results for thirteen series of foreign exchange returns.
Series a0 c α γ d φ
DKK 4.2487·10^-09 2.4097·10^-12 1.8823 0.023134 0.09493 0.98374
(—) (2.9590·10^-11) (1.5899) (0.006463) (0.11292) (0.00795)
FIM 4.0123·10^-09 1.7955·10^-19 2.1433 0.021467 0.29847 0.94637
(—) (1.1868·10^-17) (2.1034) (0.005488) (0.17020) (0.03482)
DEM 4.8389·10^-09 4.2332·10^-13 2.0502 0.029754 0.01376 0.98639
(—) (7.0662·10^-12) (2.1604) (0.007198) (0.09756) (0.00602)
IEP 3.8691·10^-09 5.0261·10^-10 1.4605 0.008056 0.73252 0.89955
(—) (2.1124·10^-09) (0.5204) (0.001789) (0.12003) (0.04581)
ITL 3.9091·10^-09 2.1709·10^-10 1.6522 0.027258 0.78169 0.69117
(—) (9.6382·10^-10) (0.5444) (0.005660) (0.08346) (0.10746)
PTE 4.2839·10^-09 4.4976·10^-11 1.6040 0.021412 0.44304 0.90978
(—) (3.8021·10^-10) (1.0683) (0.004465) (0.18011) (0.06193)
ESP 3.9411·10^-09 8.7586·10^-11 1.5713 0.017922 0.60947 0.84159
(—) (6.8220·10^-10) (0.9834) (0.003846) (0.10145) (0.06143)
CHF 7.9079·10^-09 1.2588·10^-14 2.5932 0.097256 0.54513 -0.34532
(—) (2.0515·10^-13) (2.0742) (0.014078) (0.06818) (0.12570)
GBP 1.6547·10^-09 2.9505·10^-14 2.3072 0.044645 -0.15794 0.98908
(—) (6.8264·10^-13) (3.0115) (0.011270) (0.08410) (0.00408)
IDR 6.4466·10^-06 1.1387·10^-13 3.0680 0.266080 0.99317 -0.34217
(—) (7.4904·10^-14) (0.0883) (0.014152) (0.00286) (0.04295)
JPY 2.0041·10^-08 2.5396·10^-10 1.5101 0.089290 0.66882 0.20970
(—) (2.9743·10^-09) (1.4339) (0.016493) (0.06534) (0.22682)
KRW 1.1327·10^-06 1.5517·10^-10 2.0602 0.389410 0.96060 -0.42893
(—) (8.6000·10^-11) (0.0715) (0.021410) (0.01093) (0.03143)
TWD 2.9625·10^-09 5.7081·10^-08 1.1764 0.457370 0.91817 -0.35245
(—) (2.3774·10^-08) (0.0590) (0.019184) (0.02405) (0.03477)
Notes: The table reports estimation results of the cMD–ARCH model (9) for thirteen samples of foreign exchange returns $\{r_t : 1 \le t \le T\}$, where $T = 2576$. The series are denoted using the international currency codes. Notation of the model parameters corresponds to that in (9). Asymptotic maximum–likelihood standard errors are given in parentheses below the coefficient estimates, except for the fixed $a_0$ parameter. Diagnostic tests are summarized in Table 3.

Table 3: Residuals diagnostics for models in Table 2.
Series Mean Variance Skewness Kurtosis Q(100) Q²(100)
DKK -0.00653864 0.987771 -0.298274∗ 4.46094∗ 100.614 99.5539
FIM -0.00914468 1.005780 -0.238782∗ 4.06924∗ 98.9181 112.562
DEM -0.00412344 0.989840 -0.297643∗ 4.37323∗ 103.247 110.441
IEP -0.00825703 0.982476 -0.059215∗ 5.32161∗ 99.1079 93.2667
ITL -0.00458738 0.975643 -0.184337∗ 4.34671∗ 97.7134 104.048
PTE -0.00315817 0.985543 -0.252066∗ 4.50373∗ 111.260 112.995
ESP -0.00008092 0.987327 -0.219347∗ 4.39398∗ 107.536 112.684
CHF -0.00830310 0.980774 -0.336811∗ 4.61003∗ 82.2046 111.603
GBP -0.01527700 0.987953 -0.154136∗ 4.76172∗ 96.3491 82.6668
IDR 0.08611490 1.150080 0.586289∗ 20.2904∗ 109.062 60.5305
JPY 0.00323757 0.984919 -0.475329∗ 5.33905∗ 111.057 94.8046
KRW 0.01435000 0.977692 0.616896∗ 8.17019∗ 113.408 95.4020
TWD 0.02019300 1.076450 1.898260∗ 40.5118∗ 105.181 43.8986
Notes: The table reports the respective statistics of the estimated residuals $\{\hat z_t : 1 \le t \le T\}$, given in (11), for the models in Table 2. The series are denoted using the international currency codes. Except for the mean and variance, a star next to a reported test statistic indicates significance at the 5% level under the appropriate distribution. The column Q(100), respectively Q²(100), shows Box–Pierce statistics for the estimated residuals, respectively squared estimated residuals; see Li and Mak (1994). Skewness and kurtosis statistics are computed according to Jarque and Bera (1987).
The estimation results for the thirteen series of foreign exchange data described in the previous subsection are presented in Table 2. A battery of diagnostic tests based on the estimated residual series $\{\hat z_t : 1 \le t \le T\}$ for each model is reported in Table 3. All estimation and diagnostic routines used in this subsection are written in Ox version 3.30; see Doornik (2002).
The empirical volatility models reported in Table 2 can be grouped into two main categories. The models for the DKK, FIM, DEM, and GBP series have a statistically insignificant fractional integration parameter $d$, and a point estimate of $\phi$ close to unity. For a larger group of models, based on the IEP, ITL, PTE, ESP, CHF, IDR, JPY, KRW, and TWD series, both $d$ and $\phi$ are statistically significant (the estimate of $\phi$ for JPY being an exception), where $d$ is close to unity for the IDR and KRW models. Importantly, the estimates of $d$ in all models of the second group except PTE are above $1/2$, even though the asymptotic standard errors point to statistical significance of this hypothesis only in the ITL, IDR, JPY, KRW, and TWD models.
The deterministic component of the conditional volatility process in the estimated models shows an almost uniform absence of time dependence. Recall that the parameter $c$ determines the importance of the time trend in the deterministic part of the cMD–ARCH process (9). The point estimates of $c$ are statistically insignificant for all models except TWD. In the latter case $\alpha$ is close to unity, pointing to IGARCH-like deterministic dynamics in $\{\sigma_t^2 : t \ge 0\}$ for this series.
The diagnostic tests reported in Table 3 use the series of estimated residuals (11) corresponding to the empirical volatility models in Table 2. The reported tests indicate overall success of the estimated cMD–ARCH models in picking up the essential volatility dependence in the data: none of the Q(100) or Q²(100) statistics in Table 3 is marked as significant. The remaining skewness and excess kurtosis in $\{\hat z_t : 1 \le t \le T\}$ do not contradict A1 and Theorem 2.
The results of empirical volatility modeling presented in this subsection can be summarized as follows. First, relaxing the strict time dynamics in $\{a_t : t \ge 0\}$ imposed in many familiar conditional heteroscedasticity processes, such as IGARCH and FIGARCH, leads to rejection of a deterministic trend component in empirical volatility models. We find that in all but one model for our sample of foreign exchange rate returns the sequence $\{a_t : t \ge 0\}$ is $O(1)$. Second, only four out of thirteen empirical models imply a summable sequence of coefficients $\{\theta_j : j \ge 0\}$ when expressed in the general cMD–ARCH form (3). Among the remaining models, all but one have a fractional integration parameter $d$ significantly below unity. This finding suggests the empirical importance of conditional heteroscedasticity models with hyperbolically decaying, non–summable weighting coefficients. Third, even though the sequence $\{\theta_j : j \ge 0\}$ is close to $O(1)$ for six models in Table 2, the IGARCH hypothesis is not supported by the data due to the lack of a statistically significant linear trend component in $\{a_t : t \ge 0\}$. Fourth, in six out of thirteen cases the empirical results point to non–square–summable $\{\theta_j : j \ge 0\}$ and thus to a non–stationary limiting conditional volatility process. This result indicates the potential significance of non–stationary fractionally integrated cMD–ARCH processes (9) with $1/2 \le d < 1$ and $c = 0$, opening a potentially interesting direction for further research.
4 Concluding remarks and further research
Non–stationary models for linear time series, such as I(d) models for d ≥ 1 and trend–stationary
processes, play a substantial role in theoretical and applied econometric research. The main
goal of this paper is to introduce a framework for non–stationary conditional heteroscedasticity
models and to examine some empirical evidence of non–stationary volatility.
The proposed family of models, referred to in the paper as cMD–ARCH processes, allows
for separation of deterministic and stochastic effects in the conditional volatility, similarly to
the linear time series models. The stochastic part consists of a weighted sequence of martingale
difference innovations, permitting a range of different weighting structures. The deterministic
part may include a time trend component. A set of cross–restrictions between the parameters
of the cMD–ARCH process is imposed to ensure non–negativity of the conditional variance.
The paper examines a number of parametrizations of the new process.
A statistical inference theory for the parameters of cMD–ARCH is developed in the paper,
drawing on previous contributions of Jensen and Rahbek (2002, 2003). Consistency and asymptotic
normality of the QML estimator are shown under general assumptions on the sequence of
innovations.
Finally, an empirical application of the new model to thirteen major European and Asian
foreign exchange rate returns is included. The results indicate non–stationary volatility in six
cases, in the form of non–square–summable weighting coefficients in the stochastic part of the
estimated volatility process.
The findings reported in the paper suggest the potential empirical importance of the following
non–stationary cMD–ARCH model:
$$ r_t^2 = \sigma_t^2 = a_0 \quad \text{for } t \le 0 , $$
$$ r_t = \sigma_t z_t , \qquad \sigma_t^2 = a_0 + \gamma \, (1-L)^{-d} \left( r_{t-1}^2 - \sigma_{t-1}^2 \right) \quad \text{for } t > 0 , $$
where $\tfrac{1}{2} \le d < 1$ and $0 < \gamma \le d < 1$. More work needs to be done to establish statistical
properties and practical implications of this process. In particular, a detailed comparison of
this model with IGARCH sequences of Engle and Bollerslev (1986) may provide fruitful insights
into the area of non–stationary volatility modeling.
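A minimal simulation sketch of this process may help fix ideas. It rests on several assumptions:

- the innovations $z_t$ are Gaussian;
- the filter $(1-L)^{-d}$ is truncated at the available sample, and the pre-sample terms vanish because $r_t^2 = \sigma_t^2$ for $t \le 0$;
- the filter weights are computed with the standard recursion $\psi_0 = 1$, $\psi_j = \psi_{j-1}(j-1+d)/j$.

The function names and default parameter values are hypothetical, not taken from the paper.

```python
import numpy as np


def frac_weights(d, n):
    """First n coefficients psi_j of (1 - L)^(-d) = sum_j psi_j L^j."""
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return psi


def simulate_cmd_arch(T, a0=1.0, gamma=0.3, d=0.6, seed=0):
    """Simulate r_t = sigma_t z_t, t = 1..T, with Gaussian z_t (an assumption) and
    sigma_t^2 = a0 + gamma * sum_{j=0}^{t-1} psi_j (r_{t-1-j}^2 - sigma_{t-1-j}^2),
    starting from r_0^2 = sigma_0^2 = a0 so that pre-sample innovations are zero."""
    if not (0.5 <= d < 1.0 and 0.0 < gamma <= d):
        raise ValueError("need 1/2 <= d < 1 and 0 < gamma <= d")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T + 1)
    psi = frac_weights(d, T)
    sigma2 = np.full(T + 1, a0)  # index 0 holds the initial value at t = 0
    u = np.zeros(T + 1)          # innovations u_t = r_t^2 - sigma_t^2, with u_0 = 0
    for t in range(1, T + 1):
        # psi_0 * u_{t-1} + psi_1 * u_{t-2} + ... + psi_{t-1} * u_0
        sigma2[t] = a0 + gamma * np.dot(psi[:t], u[t - 1::-1])
        u[t] = sigma2[t] * (z[t] ** 2 - 1.0)
    r = np.sqrt(sigma2[1:]) * z[1:]
    return r, sigma2[1:]


r, sigma2 = simulate_cmd_arch(T=2000)
assert np.all(sigma2 > 0), "conditional variance became non-positive"
```

The positivity check is a sanity test rather than a proof. Rewriting the recursion with the coefficients of $(1-L)^{d}$ suggests that, when $\gamma \le d$, each $\sigma_t^2$ is a non-negative combination of $a_0$, $r_{t-1}^2$ and past conditional variances.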
5 Appendix
This technical appendix collects proofs of the main results in Section 2. The proof of Theorem 2
is preliminary and is likely to be revised further. Notation is simplified by using the convention
$\sum_{m}^{n} \cdot = 0$ whenever $m > n$ for $m, n \in \mathbb{Z}$.
PROOF OF THEOREM 1: Using the definition of $\{\pi_j : j \ge 0\}$ in (4) and substituting recursively,
one can rewrite the conditional variance part of (3) as follows:
$$ \sigma_t^2 = a_t - \sum_{j=0}^{t-1} \pi_j \, a_{t-1-j} + \sum_{j=1}^{t} \pi_{j-1} \, r_{t-j}^2 , \qquad (A.1) $$
from which the result follows by induction on $t \ge 0$, using A1–A2.
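PROOF OF THEOREM 2: We proceed to establish the sufficient conditions of Basawa, Feigin
and Heyde (1976). First, using (A.1) and denoting $\Pi_t(u) := a_t(u) - \sum_{j=0}^{t-1} \pi_j(u) \, a_{t-1-j}(u)$, we
observe that:
$$ \frac{\sigma_t^{2(1)}(u_0)}{\sigma_t^2} = \frac{\Pi_t^{(1)}(u_0)}{\sigma_t^2} + \sum_{j=1}^{t} \pi_{j-1}^{(1)}(u_0) \, \frac{r_{t-j}^2}{\sigma_t^2} \le \left[ \log \Pi_t(u_0) \right]^{(1)} + \sum_{j=0}^{t-1} \left[ \log \pi_j(u_0) \right]^{(1)} , \qquad (A.2) $$
where the following inequalities are used:
$$ \frac{1}{\sigma_t^2} \le \frac{1}{\Pi_t(u_0)} \quad \text{and} \quad \frac{r_{t-j}^2}{\sigma_t^2} \le \frac{1}{\pi_{j-1}(u_0)} \quad \text{for all } t \ge 0 \text{ and } 1 \le j \le t . $$
By A3 the right–hand side of (A.2) is bounded in absolute value for all $t \ge 0$.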
Second, the score vector of (10) is shown to be asymptotically normal as follows. Write:
$$ T^{-1/2} L_T^{(1)}(u_0) = -T^{-1/2} \sum_{t=1}^{T} \left[ 1 - z_t^2 \right] \frac{\sigma_t^{2(1)}(u_0)}{\sigma_t^2} , \qquad (A.3) $$
where by A1 the vectors under the summation are martingale differences. Then:
$$ T^{-1} \sum_{t=1}^{T} E\!\left( \left[ 1 - z_t^2 \right]^2 \frac{\sigma_t^{2(1)}(u_0)}{\sigma_t^2} \, \frac{\sigma_t^{2(1)}(u_0')}{\sigma_t^2} \,\middle|\, \mathcal{F}_{t-1} \right) \xrightarrow{P} B_0 , $$
by A1 and A3, where $B_0$ is positive definite. Since $\{ \sigma_t^{2(1)}(u_0) / \sigma_t^2 : t \ge 0 \}$ is bounded according
to A3, the asymptotic normality of (A.3) follows by the martingale central limit theorem for
random vectors.
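The score formula (A.3) can be sanity-checked numerically on a simple special case. The sketch rests on two assumptions:

- the criterion in (10) is $L_T(u) = -\sum_{t=1}^{T} \left( \log \sigma_t^2(u) + r_t^2 / \sigma_t^2(u) \right)$, which is the form whose derivative has the shape of (A.3);
- a plain ARCH(1) recursion $\sigma_t^2 = \omega + \alpha r_{t-1}^2$ with $u = (\omega, \alpha)$ serves as a hypothetical stand-in for a cMD–ARCH parametrization.

It compares the analytic score with a central finite difference. All function names and the fixed pre-sample value are illustrative.

```python
import numpy as np


def simulate_arch1(T, omega, alpha, seed=0):
    """Simulate a Gaussian ARCH(1) path r_1..r_T (illustrative data only)."""
    rng = np.random.default_rng(seed)
    r = np.empty(T)
    r2_prev = omega / (1.0 - alpha)  # start from the unconditional variance
    for t in range(T):
        r[t] = np.sqrt(omega + alpha * r2_prev) * rng.standard_normal()
        r2_prev = r[t] ** 2
    return r


def arch1_sigma2(u, r, r2_init):
    """sigma_t^2 = omega + alpha * r_{t-1}^2 and its gradient with respect to u = (omega, alpha).
    r2_init is a fixed pre-sample value of r_0^2 (an assumption)."""
    omega, alpha = u
    r2_lag = np.concatenate(([r2_init], r[:-1] ** 2))
    sigma2 = omega + alpha * r2_lag
    grad = np.column_stack((np.ones_like(sigma2), r2_lag))  # d sigma_t^2 / du
    return sigma2, grad


def loglik(u, r, r2_init):
    # Assumed criterion: L_T(u) = -sum_t (log sigma_t^2 + r_t^2 / sigma_t^2)
    sigma2, _ = arch1_sigma2(u, r, r2_init)
    return -np.sum(np.log(sigma2) + r ** 2 / sigma2)


def score(u, r, r2_init):
    # Shape of (A.3): -sum_t (1 - z_t^2) * sigma_t^{2(1)}(u) / sigma_t^2
    sigma2, grad = arch1_sigma2(u, r, r2_init)
    z2 = r ** 2 / sigma2
    return -np.sum(((1.0 - z2) / sigma2)[:, None] * grad, axis=0)


def numerical_score(u, r, r2_init, h=1e-6):
    """Central finite-difference gradient of loglik."""
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    for k in range(u.size):
        e = np.zeros_like(u)
        e[k] = h
        out[k] = (loglik(u + e, r, r2_init) - loglik(u - e, r, r2_init)) / (2 * h)
    return out


u0 = np.array([0.5, 0.4])
r = simulate_arch1(2000, 0.5, 0.4)
analytic, numeric = score(u0, r, 1.0), numerical_score(u0, r, 1.0)
```

The two vectors should agree up to finite-difference error at any admissible $u$, not only at the true value.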
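Third, the Hessian of the log–likelihood function evaluated at $u_0$ is given by:
$$ T^{-1} L_T^{(2)}(u_0) = -T^{-1} \sum_{t=1}^{T} \left[ 2 z_t^2 - 1 \right] \frac{\sigma_t^{2(1)}(u_0)}{\sigma_t^2} \, \frac{\sigma_t^{2(1)}(u_0')}{\sigma_t^2} - T^{-1} \sum_{t=1}^{T} \left[ 1 - z_t^2 \right] \frac{\sigma_t^{2(2)}(u_0)}{\sigma_t^2} . $$
The second part on the right–hand side of this expression converges in probability to zero by A1. The limit
of the first part is as follows:
$$ -T^{-1} \sum_{t=1}^{T} \left[ 2 z_t^2 - 1 \right] \frac{\sigma_t^{2(1)}(u_0)}{\sigma_t^2} \, \frac{\sigma_t^{2(1)}(u_0')}{\sigma_t^2} \xrightarrow{P} -A_0 , $$
where $A_0$ is a positive definite matrix.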
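The Hessian expression can be checked numerically on a simple special case. The sketch assumes the criterion $L_T(u) = -\sum_t \left( \log \sigma_t^2 + r_t^2 / \sigma_t^2 \right)$. Direct differentiation then gives
$$ L_T^{(2)}(u) = -\sum_t \left\{ [2 z_t^2 - 1] \, g_t g_t' + [1 - z_t^2] \, \sigma_t^{2(2)}(u) / \sigma_t^2 \right\} , \qquad g_t = \sigma_t^{2(1)}(u) / \sigma_t^2 . $$
A hypothetical ARCH(1) stand-in $\sigma_t^2 = \omega + \alpha r_{t-1}^2$ is used, for which the $\sigma_t^{2(2)}$ term vanishes because $\sigma_t^2$ is linear in $u = (\omega, \alpha)$.

The code compares this formula with a finite-difference derivative of the score. It also checks that $T^{-1} L_T^{(2)}(u_0)$ is negative definite at the true parameter, in line with the limit $-A_0$. All names are illustrative.

```python
import numpy as np


def simulate_arch1(T, omega, alpha, seed=0):
    """Simulate a Gaussian ARCH(1) path r_1..r_T (illustrative data only)."""
    rng = np.random.default_rng(seed)
    r = np.empty(T)
    r2_prev = omega / (1.0 - alpha)
    for t in range(T):
        r[t] = np.sqrt(omega + alpha * r2_prev) * rng.standard_normal()
        r2_prev = r[t] ** 2
    return r


def arch1_sigma2(u, r, r2_init):
    """sigma_t^2 = omega + alpha * r_{t-1}^2 and its gradient; r2_init is a fixed r_0^2."""
    omega, alpha = u
    r2_lag = np.concatenate(([r2_init], r[:-1] ** 2))
    sigma2 = omega + alpha * r2_lag
    return sigma2, np.column_stack((np.ones_like(sigma2), r2_lag))


def score(u, r, r2_init):
    # -sum_t (1 - z_t^2) * g_t, with g_t = grad_t / sigma_t^2
    sigma2, grad = arch1_sigma2(u, r, r2_init)
    z2 = r ** 2 / sigma2
    return -np.sum(((1.0 - z2) / sigma2)[:, None] * grad, axis=0)


def hessian(u, r, r2_init):
    # -sum_t (2 z_t^2 - 1) g_t g_t'; the sigma_t^{2(2)} term is zero for ARCH(1)
    sigma2, grad = arch1_sigma2(u, r, r2_init)
    z2 = r ** 2 / sigma2
    g = grad / sigma2[:, None]
    return -np.einsum("t,ti,tj->ij", 2.0 * z2 - 1.0, g, g)


def numerical_hessian(u, r, r2_init, h=1e-6):
    """Central finite differences of the analytic score, column by column."""
    u = np.asarray(u, dtype=float)
    H = np.empty((u.size, u.size))
    for k in range(u.size):
        e = np.zeros_like(u)
        e[k] = h
        H[:, k] = (score(u + e, r, r2_init) - score(u - e, r, r2_init)) / (2 * h)
    return H


u0 = np.array([0.5, 0.4])
r = simulate_arch1(5000, 0.5, 0.4)
H = hessian(u0, r, 1.0)
eigenvalues = np.linalg.eigvalsh(H / r.size)  # expected to be negative near u0
```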
Finally, condition (B7) in Basawa, Feigin and Heyde (1976) is needed to ensure weak convergence
of the Hessian evaluated in a neighborhood of $u_0$ to $-A_0$. It is non–trivial to check in
the general setting of the cMD–ARCH model without considering specific parametrizations.
In Berkes, Horvath and Kokoszka (2003a) the required convergence is obtained
from sufficient continuity of $T^{-1} L_T^{(2)}(u)$ with respect to $u$ in the neighborhood of the true vector
$u_0$. Analogous continuity requirements are imposed in A4.
References
Baillie, Richard T., Tim Bollerslev and Hans O. Mikkelsen (1996) Fractionally integrated gen-
eralized autoregressive conditional heteroskedasticity. Journal of Econometrics, vol. 74,
pp. 3–30.
Berkes, Istvan, Lajos Horvath and Piotr Kokoszka (2003a) GARCH processes: Structure and
estimation. Bernoulli, vol. 9, pp. 201–227.
Berkes, Istvan, Lajos Horvath and Piotr Kokoszka (2003b) Asymptotics for GARCH squared
residual correlations. Econometric Theory, vol. 19, pp. 515–540.
Bollerslev, Tim (1986) Generalized autoregressive conditional heteroskedasticity. Journal of
Econometrics, vol. 31, pp. 307–327.
Bollerslev, Tim, Robert F. Engle and D. B. Nelson (1994) ARCH models. Handbook of Econometrics,
vol. IV, pp. 2961–3031, New York: Elsevier Science.
Bougerol, Philippe and Nico Picard (1992) Stationarity of GARCH processes and of some
non–negative time series. Journal of Econometrics, vol. 52, pp. 115–127.
Davidson, James (2003) Moment and memory properties of linear conditional heteroscedasticity
models, and a new model. Preprint.
Doornik, Jurgen A. (1998) Object–Oriented Matrix Programming Using Ox, 3rd ed. London:
Timberlake Consultants Press and Oxford: www.nuff.ox.ac.uk/Users/Doornik.
Engle, Robert F. (1982) Autoregressive conditional heteroskedasticity with estimates of the
variance of U.K. inflation. Econometrica, vol. 50, pp. 987–1008.
Engle, Robert F. and Tim Bollerslev (1986) Modeling persistence of conditional variance.
Econometric Reviews, vol. 5, pp. 1–50.
Giraitis, Liudas, Piotr Kokoszka and Remigijus Leipus (2000) Stationary ARCH models: de-
pendence structure and central limit theorem. Econometric Theory, vol. 16, pp. 3–22.
Giraitis, Liudas, Piotr Kokoszka, Remigijus Leipus and Gilles Teyssiere (2000) Semiparametric
estimation of the intensity of long memory in conditional heteroskedasticity. Statistical
Inference for Stochastic Processes, vol. 3, pp. 113–128.
Giraitis, Liudas, Remigijus Leipus and Donatas Surgailis (2003) Recent advances in ARCH
modelling. Preprint.
Hall, Peter and Qiwei Yao (2003) Inference in ARCH and GARCH models with heavy–tailed
errors. Econometrica, vol. 71, pp. 285–317.
Jarque, C. M. and A. K. Bera (1987) A test for normality of observations and regression
residuals. International Statistical Review, vol. 55, pp. 163–172.
Jensen, Søren Tolver and Anders Rahbek (2002) Non–stationary and no moments asymptotics
for the ARCH model. CAF Working Paper Series, no. 124.
Jensen, Søren Tolver and Anders Rahbek (2003) Asymptotic normality for non–stationary,
explosive GARCH. Preprint.
Kazakevicius, Vytautas and Remigijus Leipus (2002) On stationarity in the ARCH(∞) model.
Econometric Theory, vol. 18, pp. 1–16.
Koulikov, Dmitri (2003a) Modeling sequences of long memory non–negative covariance stationary
random variables. CAF Working Paper Series, no. 156.
Koulikov, Dmitri (2003b) Long memory ARCH(∞) models: specification and quasi–maximum
likelihood estimation. CAF Working Paper Series, no. 165.
Lee, Sang-Won and Bruce E. Hansen (1994) Asymptotic theory for the GARCH(1,1) quasi–
maximum likelihood estimator. Econometric Theory, vol. 10, pp. 29–52.
Li, W. K. and T. K. Mak (1994) On the squared residual autocorrelations in non–linear time se-
ries with conditional heteroscedasticity. Journal of Time Series Analysis, vol. 15, pp. 627–
636.
Ling, Shiqing and Michael McAleer (2002) Necessary and sufficient moment conditions for the
GARCH(r,s) and asymmetric power GARCH(r,s) models. Econometric Theory, vol. 18,
pp. 722–729.
Nelson, Daniel B. (1990) Stationarity and persistence in the GARCH(1,1) model. Econometric
Theory, vol. 6, pp. 318–334.
Robinson, Peter M. (1991) Testing for strong serial correlation and dynamic conditional het-
eroskedasticity in multiple regression. Journal of Econometrics, vol. 47, pp. 67–84.
Robinson, Peter M. and Paolo Zaffaroni (2003) Pseudo–maximum likelihood estimation of
ARCH(∞) models. Preprint.
Chapter 4: Conditional heteroscedasticity model for discrete high-frequency price changes: with
application to IBM trades data
Conditional heteroscedasticity model for discrete high-frequency
price changes: with application to IBM trades data
Dmitri Koulikov∗
Department of Economics
School of Economics and Management
University of Aarhus
Aarhus C, 8000, Denmark
phone: +45 89421577
e-mail: [email protected]
This revision:
February 7, 2002
Abstract
In this paper we present conditional heteroscedasticity models for time-series of discrete
price changes in high-frequency financial data. They combine the tractability of the observation-driven
GARCH models of Bollerslev (1986) with the simplicity of the ordered probit/logit
structure of Hausman, Lo and MacKinlay (1992). In contrast to the ACM model of Russell
and Engle (1998) and the ADS decomposition model of Rydberg and Shephard (2003), we
separate the groups of parameters driving the conditional mean and the conditional variance of the
data, allowing us to test the effects of explanatory variables separately on the two moments
of high-frequency price changes. We introduce two models belonging to the class outlined
above: the IV-GARCH model with short-memory volatility dynamics and the IV-FIARCH model
with long-range dependence in the conditional volatility. An application of the models to an IBM
trades dataset is provided.
JEL classification: C22, C25, C51, G10
Keywords: High-frequency financial data, Time-series of discrete random variables, Con-
ditional heteroscedasticity, Markov chains, Non-linear econometric models.
∗I wish to thank the participants of the following conferences for their helpful suggestions: “Market Mi-
crostructure and High-Frequency Data in Finance”, Sandbjerg, “57th European Meeting of the Econometric
Society”, Venice, and “VIIth Spring Meeting of Young Economists”, Paris. The usual disclaimer applies.
1 Introduction
This paper presents a contribution to the literature on econometric modeling of high-frequency
financial data. We introduce a class of observation-driven models for conditionally heteroscedas-
tic discrete price changes in the spirit of GARCH models of Engle (1982) and Bollerslev (1986)1.
This class includes both short- and long-memory models, where the latter is able to accommo-
date substantial persistence found in the volatility of discrete price changes. In addition, our
models admit a relatively straightforward integration with financial duration models, such as
the ACD model of Engle and Russell (1998) and Engle (2000), providing a framework for joint
modeling of stochastically dependent inter-trade durations and price changes. Thus, this paper
follows a research agenda put forward in Rydberg and Shephard (2000), where the authors
propose a compound Poisson process as the basic statistical model for high-frequency financial
data.
The recent availability of high-frequency financial data, coupled with increased computing capacity,
has spurred a literature seeking to develop a range of econometric techniques suitable
for its statistical modeling. Among important recent contributions in the area are Davis, Rydberg,
Shephard and Streett (2001), Engle (2000), Engle and Russell (1998), Gerhard and
Pohlmeier (2000), Hausman, Lo and MacKinlay (1992), Rydberg and Shephard (1999, 2000)
and Russell and Engle (1998). A good survey of this relatively new econometric
field is Hautsch and Pohlmeier (2002).
From the viewpoint of established econometric methodology, high-frequency financial data
presents several new challenges. First and foremost, high-frequency datasets contain collections
of variables, such as bid and ask quotes, trade prices and trade volumes, where observations are
separated by stochastic time intervals. As suggested by the market microstructure literature,
these intervals themselves carry an important informational content and therefore should be
modeled simultaneously with other variables. This point has been recently stressed in papers
by Dufour and Engle (2000), Ghysels (2000) and Gerhard and Pohlmeier (2000).
Secondly, trade prices and quotes of many financial assets are discrete, reflecting the insti-
tutional structure of the markets. As documented in Rydberg and Shephard (2000), Campbell,
Lo and MacKinlay (1997) and in section 4 of this paper, discrete price changes in high-frequency
financial data exhibit statistical features similar to those normally observed in continuous low-
frequency returns.
Finally, the real-time character of high-frequency data further contributes to its statistical
complexity. Intra-day volatility patterns, news announcement effects and a range of mar-
ket microstructure-specific idiosyncrasies make econometric modeling of this data particularly
challenging.
In this paper we focus on discreteness of prices and quotes in high-frequency financial data.
Our models are suitable both for price change series “binned” into regular time intervals in the
spirit of Davis, Rydberg, Shephard and Streett (2001), and for the original irregularly-spaced
data. In the latter case, our models integrate with the class of stochastic duration models
for financial data, providing a foundation for joint modeling of durations and price changes in
high-frequency data.
¹ A comprehensive survey of the GARCH literature can be found in Bollerslev, Chou and Kroner (1992).
Our work is motivated by the need for a tractable model that picks up essential dynamics of
discrete price changes, in particular, of their second moment. The success of GARCH models of
Engle (1982) and Bollerslev (1986) for low-frequency financial returns calls for the development
of a similar observation-driven conditional heteroscedasticity model for discrete data. While
both the Autoregressive Conditional Multinomial (ACM) model of Russell and Engle (1998)
and the ADS decomposition model of Rydberg and Shephard (1999a) are able to describe
the observed volatility clustering phenomenon in high-frequency price changes, neither of the
two has a single underlying parameter driving the volatility. Among other things, this fact
complicates the testing of economically relevant hypotheses linked to the second moment of price
changes in high-frequency data.
We reuse the idea of Hausman, Lo and MacKinlay (1992), whereby the discrete distribu-
tion of price changes is approximated using the logistic distribution function in a framework
resembling the ordered logit model. However, unlike Hausman, Lo and MacKinlay (1992), our
models are cast entirely in terms of discrete random variables and therefore allow for simple
estimation and diagnostic procedures. Moreover, the probabilistic structure of the models lends
itself to the study of stationarity and moments of the discrete price change series.
Our models have two separate parameters: one driving conditional first moment of the
discrete distribution of price changes, and the other driving its conditional variance. The latter
is specified similarly to the GARCH model of Bollerslev (1986), allowing for a straightforward
extension to the long-memory case and augmentation with a set of exogenous explanatory
variables, such as announcement dummies and deterministic intra-day seasonality. In the paper
we look in detail at both short- and long-memory cases, referred to as IV-GARCH and IV-
FIARCH respectively2.
The paper is organized as follows. Section 2 gives a short overview of the existing models
for high-frequency financial data and discusses the advantages and drawbacks of these approaches.
Section 3 introduces the IV-GARCH and IV-FIARCH models, together with conditions for their
existence and stationarity and an overview of the estimation and diagnostic methods. Section 4
describes the high-frequency IBM trades dataset used in the empirical part of
the paper. Section 5 presents estimation results of the IV-GARCH and IV-FIARCH models.
The conclusion summarizes the findings and discusses directions for future research.
² These abbreviations stand for Integer-Valued Generalized Autoregressive Conditional Heteroscedasticity and
Integer-Valued Fractionally Integrated Generalized Autoregressive Conditional Heteroscedasticity respectively.
2 Short overview of existing models for high-frequency financial data
In this section we give an overview of three existing models for discrete price changes in high-
frequency financial data: the ordered probit model of Hausman, Lo and MacKinlay (1992), the
ACM model of Russell and Engle (1998), and the ADS decomposition model of Rydberg and
Shephard (2003). We also provide a brief discussion of a number of other contributions in the
field. A brief comparison of these three models of price changes with our model is also given.
2.1 Ordered probit model of Hausman, Lo and MacKinlay (1992)
In one of the first studies in the literature on modeling discrete price changes in high-frequency
financial data, Hausman, Lo and MacKinlay (1992) propose an ordered probit model to capture
essential features of such data. The ordered probit and logit models are well known from cross-
sectional econometrics, with main application area in the modeling of individual choices with
a natural ordering structure.
Hausman, Lo and MacKinlay (1992) motivate their model using the traditional framework
for ordered probit and logit:
$$ \Delta p_i^* = x_i' \beta + \varepsilon_i , $$
where $x_i$ is a vector of exogenous explanatory variables and $\varepsilon_i$ is assumed to be an independent
normal random variable with variance $\sigma_i^2$. In this setting $\Delta p_i^*$ is an unobserved continuous
state variable³. The observation rule is then given by:
$$ \Delta p_i = \begin{cases} -K & \text{if } \Delta p_i^* \in A_{-K} \\ -K+1 & \text{if } \Delta p_i^* \in A_{-K+1} \\ \ \vdots & \ \ \vdots \\ K & \text{if } \Delta p_i^* \in A_{K} \end{cases} $$
where $K$ denotes the maximum allowable absolute price change, and $\{A_k : -K \le k \le K\}$ defines
a finite partition of $\mathbb{R}$.
Hausman, Lo and MacKinlay (1992) make the variance of the $\varepsilon_i$ term dependent on a set of
exogenous variables $w_i$:
$$ \sigma_i^2 = 1 + w_i' \gamma . $$
The need for heteroscedastic εi in their model is motivated by appealing to the diffusion models
for prices of financial assets, where the variance of increments depends on the sampling interval.
Since price changes in high-frequency financial data are spaced irregularly, they include the
inter-trade duration in $w_i$ to control for possible heteroscedasticity.
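To make the observation rule concrete, here is a minimal sketch of the implied probabilities. It assumes, as is standard for ordered probit, that the cells $A_k$ are consecutive intervals $(c_{k-1}, c_k]$ with finite cut points $c_{-K} < \dots < c_{K-1}$ and outer cut points $\mp\infty$. Then
$$ P(\Delta p_i = k) = \Phi\big((c_k - x_i'\beta)/\sigma_i\big) - \Phi\big((c_{k-1} - x_i'\beta)/\sigma_i\big) . $$
The cut-point values, the single regressor $w_i$ and $\gamma = 0.5$ below are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF Phi(x), with Phi(-inf) = 0 and Phi(inf) = 1."""
    if math.isinf(x):
        return 0.0 if x < 0 else 1.0
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordered_probit_probs(mean, sigma2, cuts):
    """P(Delta p_i = k), k = -K..K, when Delta p_i^* ~ N(mean, sigma2) and
    A_k = (c_{k-1}, c_k] with the 2K finite cut points `cuts` in increasing order."""
    sd = math.sqrt(sigma2)
    edges = [-math.inf] + list(cuts) + [math.inf]
    cdf = [norm_cdf((e - mean) / sd) for e in edges]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]


# K = 1 with illustrative cut points; variance sigma_i^2 = 1 + w_i * gamma for a
# single hypothetical regressor w_i (for instance an inter-trade duration), gamma = 0.5.
cuts = [-0.5, 0.5]
p_short = ordered_probit_probs(0.0, 1.0 + 0.5 * 0.0, cuts)  # w_i = 0
p_long = ordered_probit_probs(0.0, 1.0 + 0.5 * 2.0, cuts)   # w_i = 2
```

A larger $w_i$ raises $\sigma_i^2$ and moves probability mass away from the middle state. This is the heteroscedasticity channel described above.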
The models for $\Delta p_i$ developed in section 3 have a structure superficially similar to that of the
ordered probit model of Hausman, Lo and MacKinlay (1992). In particular, our models inherit
the ordered logit mechanism of modeling probabilities of price changes. However, the models
developed in this paper do not have the underlying continuous state variable $\Delta p_i^*$ of Hausman,
Lo and MacKinlay (1992)⁴. This difference is not purely motivational: the variable $\Delta p_i^*$ in the
ordered probit model of Hausman, Lo and MacKinlay (1992) figures prominently throughout
their paper, and in particular the diagnostic procedures developed by the authors rely on
conditional independence and normality of $\Delta p_i^*$. This makes the specification tests unnecessarily
complicated due to the latent character of the state variable.
³ In the remainder of this paper we index observations in the high-frequency financial data by the subscript
$i$. This convention underlines the fact that such data may not be regularly spaced in time.
Moreover, interpretation of ∆p∗i in the setting of financial markets is at best vague. Haus-
man, Lo and MacKinlay (1992) carefully avoid linking the state variable ∆p∗i to the hypothetical
underlying continuous price, which is then rounded to the nearest tick by the ordering structure
of their model. Their empirical findings indicate that the boundaries of $\{A_k : -K \le k \le K\}$
are misaligned with respect to the \$1/8 grid and vary substantially from stock to stock. However,
when $\Delta p_i^*$ does not have a price interpretation, the relevance of a time-varying $\sigma_i^2$ becomes
unclear.
In this paper we show that the empirical success of the model in Hausman, Lo and MacKin-
lay (1992) lies in the flexibility of the ordered probit/logit structure in fitting the shape of the
discrete distribution of high-frequency price changes. By looking at the ordered logit model
merely as a convenient and parsimonious way to specify a probability distribution function of
discrete ∆pi, we are able to cast an entire class of models in section 3 in terms of discrete
random variables.
2.2 ACM model of Russell and Engle (1998)
The autoregressive conditional multinomial (ACM) model of Russell and Engle (1998) provides a
general framework for modeling the dynamics of a series of discrete random variables. Russell
and Engle (1998) specify a dynamic model for the discrete probability distribution of high-frequency
price changes $\Delta p_i$. For price changes in transaction data the state space of $\Delta p_i$ is
assumed to be a bounded interval of $\mathbb{Z}$ symmetric around zero⁵. Let the probabilities of individual
price changes be denoted by $\pi = (\pi_{-K}, \dots, \pi_K)'$, where the support of $\Delta p_i$ consists of
$2K + 1$ points. Russell and Engle (1998) suggest a dynamic model for $\pi_i$ of the following form:
$$ h(\pi_i) = \sum_{j=1}^{p} A_j \left( \mathbf{1}_{\Delta p_{i-j}} - \pi_{i-j} \right) + \sum_{j=1}^{q} B_j \, \mathbf{1}_{\Delta p_{i-j}} + \sum_{j=1}^{r} C_j \, h(\pi_{i-j}) + G Z_i , \qquad (1) $$
where $\mathbf{1}_{\Delta p_i}$ denotes a $(2K + 1) \times 1$ vector of the form $(\mathbf{1}_{\Delta p_i = -K}, \dots, \mathbf{1}_{\Delta p_i = K})'$, $Z_i$ stands for a
set of weakly exogenous explanatory variables, and the link function $h$ is chosen such that the
probabilities $\{\pi_k : -K \le k \le K\}$ sum up to unity. In their subsequent discussion of the model
and its empirical applications Russell and Engle (1998) utilize a multinomial logit specification
for $h$.
⁴ Historically, ordered probit and logit models were always built upon linear regressions with continuous
disturbances and appropriate classifying observation rules. In medical and biological applications the underlying
state variable is often interpreted as time or drug dosage, whereas in economics it most often represents the level
of utility. For an overview of the models refer to McFadden (1984). In this paper we work only with discrete
random variables whose probability mass function is parametrized similarly to the structure of the ordered logit
model; see section 3.
⁵ We use this assumption throughout the paper. However, neither the ACM model of Russell and Engle (1998)
nor the models developed in section 3 are limited to this case.
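A minimal sketch of how a multinomial logit link turns a recursion like (1) into valid probabilities may help. It rests on these assumptions:

- $h$ collects the log-odds of each state against one reference state, placed first here for convenience;
- $p = q = r = 1$;
- the indicator vector drops the reference state.

The matrices `A`, `B`, `C` and the vector `g_z` (standing in for $G Z_i$) are hypothetical inputs. This is an illustration, not Russell and Engle's (1998) exact parametrization.

```python
import numpy as np


def probs_from_logodds(h):
    """Inverse multinomial logit: h[k] = log(pi_{k+1} / pi_0) for the non-reference states."""
    e = np.exp(np.concatenate(([0.0], h)))  # the reference state has log-odds 0
    return e / e.sum()


def logodds_from_probs(pi):
    """Multinomial logit link h(pi) relative to the reference state pi[0]."""
    return np.log(pi[1:] / pi[0])


def acm_step(h_prev, x_prev, A, B, C, g_z):
    """One update in the spirit of (1) with p = q = r = 1:
    h_i = A (x_{i-1} - pi_{i-1}) + B x_{i-1} + C h_{i-1} + G Z_i,
    where x_{i-1} indicates the realized state (reference state dropped)."""
    pi_prev = probs_from_logodds(h_prev)[1:]
    return A @ (x_prev - pi_prev) + B @ x_prev + C @ h_prev + g_z


# Three states (K = 1) and illustrative diagonal coefficient matrices.
I2 = np.eye(2)
h0 = np.zeros(2)              # all three states equally likely
x = np.array([1.0, 0.0])      # last observed state: first non-reference state
h1 = acm_step(h0, x, 0.2 * I2, 0.1 * I2, 0.8 * I2, np.zeros(2))
pi = probs_from_logodds(h1)
```

Each coefficient acts on log-odds against the reference state. Its effect on any single probability therefore depends on all the other entries of $h$, which is the interpretation issue discussed next.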
With this level of generality almost every other model for the time series of discrete random
variables will be a special case of the ACM model. In their practical application to the IBM
transaction data Russell and Engle (1998) impose certain restrictions on the parameters of
the ACM model in equation (1) justified by the considerations of response symmetry. These
restrictions allow them to substantially reduce the number of parameters that have to be
estimated.
While capable of producing very good fits to high-frequency price change data, the
ACM model has certain drawbacks from the empirical researcher's perspective. The multinomial
logit specification of the link function $h$ leads to difficulties in interpreting the parameters of the
model. In particular, the direction in which included variables affect the probabilities of the states
of $\Delta p_i$ will not in general coincide with the signs of the respective coefficients, and the total effect
of an explanatory variable will depend on a subset of parameters. In the case of complicated
dynamic specification, such as given in equation (1), numerical simulations from the model
give a feasible solution to this interpretation problem. More generally, parameters of the ACM
model do not naturally fall into groups driving conditional moments of the discrete distribution
of ∆pi. While it is possible to identify a subset of parameters in the model that will enter the
expressions for the conditional moments of ∆pi, it is likely to be a complicated expression that
is difficult to keep track of in practice. Therefore, testing of hypotheses that explicitly involve
restrictions on the conditional moments is difficult.
The models for conditionally heteroscedastic discrete price changes presented in section 3
share many similarities with the ACM model of Russell and Engle (1998). However, several
important differences are apparent. Firstly, our models are designed to have identifiable groups
of parameters driving conditional first and conditional second moments of ∆pi. Hence, a range
of economically interesting hypotheses related to the moments of discrete price changes can be
tested directly, without the need to resort to post-estimation simulations. Secondly, by switch-
ing from the multinomial logit link function of the ACM model to the ordered logit or ordered
probit link function, such as in the model of Hausman, Lo and MacKinlay (1992), substantial
gains in terms of model parsimony are realized. This reduces the computation time
needed to maximize the likelihood function, an important consideration in increasingly large
high-frequency datasets. Finally, having a simpler specification for discrete price changes allows
us to derive results pertaining to the stationary distribution of ∆pi.
2.3 The ADS model of Rydberg and Shephard (2003)
Rydberg and Shephard (2003) propose a decomposition model for the discrete price changes in
financial data, whereby ∆pi is assumed to be the product of three random processes as follows:
$$ \Delta p_i = A_i D_i S_i , $$
where $\{A_i : i \in \mathbb{Z}\}$ is a binary process on $\{0, 1\}$ describing trading activity, $\{D_i : i \in \mathbb{Z}\}$
is a binary process on $\{-1, 1\}$ modeling the direction of the price movement, and $\{S_i : i \in \mathbb{Z}\}$
is a random process defined on the set of positive integers giving the magnitude of the price
change. All three processes are allowed to be interdependent, with possible inclusion of other
exogenous explanatory variables and intra-day seasonality. In their empirical analysis Rydberg
and Shephard (2003) use the autologistic model for $\{A_i : i \in \mathbb{Z}\}$ and $\{D_i : i \in \mathbb{Z}\}$, and specify
$\{S_i : i \in \mathbb{Z}\}$ by the negative binomial GLARMA process.
The ADS decomposition model has certain advantages from the point of view of market
microstructure research and hypothesis testing. In many cases the market activity process
$\{A_i : i \in \mathbb{Z}\}$ can be the sole focus of research. The product of $A_i$ and $D_i$ represents a censored
model of price movement, where $\Delta p_i$ is allowed to change by at most one tick. Rydberg and
Shephard (2003) report a number of interesting results concerning the dynamics of, and the effect of
exogenous variables on, $\{A_i : i \in \mathbb{Z}\}$, $\{D_i : i \in \mathbb{Z}\}$ and $\{S_i : i \in \mathbb{Z}\}$.
However, as in the ACM model of Russell and Engle (1998), the effects of explanatory
variables in the ADS decomposition model are not directly tied to the moments of the discrete
price changes. Although Si is tightly related to the volatility of ∆pi, the activity indicator Ai
must also be accounted for in the implied second moment of the price changes distribution.
The model for high-frequency price changes presented in this paper makes the link between the
parameters and the moments of ∆pi even more explicit.
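The role of the activity indicator in the second moment can be seen in a simple special case. Assume, purely for illustration, that:

- $A_i$, $D_i$ and $S_i$ are independent;
- $P(A_i = 1) = p_A$ and $P(D_i = 1) = p_D$;
- $S_i$ is geometric on $\{1, 2, \dots\}$.

None of this is the autologistic/GLARMA specification of Rydberg and Shephard (2003). Under these assumptions $E[\Delta p_i] = p_A (2 p_D - 1) E[S_i]$. Since $A_i^2 = A_i$ and $D_i^2 = 1$, also $E[\Delta p_i^2] = p_A E[S_i^2]$. The function name and parameter values are hypothetical.

```python
def ads_moments(p_active, p_up, q, s_max=500):
    """Mean and variance of dp = A * D * S under a hypothetical independent specification:
    A ~ Bernoulli(p_active) on {0, 1}, D = +1 with probability p_up and -1 otherwise,
    S geometric on {1, 2, ...} with P(S = s) = (1 - q) q**(s - 1), truncated at s_max."""
    support = range(1, s_max + 1)
    pmf = [(1 - q) * q ** (s - 1) for s in support]
    e_s = sum(s * p for s, p in zip(support, pmf))
    e_s2 = sum(s * s * p for s, p in zip(support, pmf))
    mean = p_active * (2 * p_up - 1) * e_s
    second_moment = p_active * e_s2  # uses A^2 = A and D^2 = 1
    return mean, second_moment - mean ** 2


mean, var = ads_moments(p_active=0.4, p_up=0.5, q=0.2)
```

With symmetric direction ($p_D = 1/2$), doubling the activity probability doubles the variance of $\Delta p_i$ while the size process $S_i$ stays unchanged. This is the point made in the text: $S_i$ alone does not pin down the volatility of price changes.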
2.4 Other contributions
Among other important contributions to the econometric modeling of high-frequency data we
mention ACD-GARCH models for irregularly spaced financial data by Ghysels and Jasiak (1997),
UHF-GARCH model by Engle (2000) and dynamic model for discrete bid-ask quotes with
ARCH volatility by Hasbrouck (1999). The first two models are not designed to account for
data discreteness, whereas Hasbrouck (1999) models it in the framework of the so-called round-
ing models of discreteness surveyed in Campbell, Lo and MacKinlay (1997) pp. 114-122.
ACD-GARCH model of Ghysels and Jasiak (1997) is based on the GARCH aggregation
results of Drost and Nijman (1993), where aggregation intervals are stochastic and driven by
the autoregressive conditional duration (ACD) model of Engle and Russell (1998). Ghysels
and Jasiak (1997) introduce a latent GARCH model that generates unobserved conditionally
heteroscedastic returns at the highest observed frequency (normally 1 second). Parameters of
this latent GARCH model are of the primary interest for the researcher. Observed irregularly
spaced returns come from the aggregation of the latent data, where the aggregation intervals
are stochastic and driven by the ACD model. This leads to the GARCH model with random
coefficients that depend on the expected duration parameter from the ACD part. The basic
model can be extended to include deterministic intra-day seasonality and effects of the exogenous
explanatory variables. Ghysels and Jasiak (1997) present an application of the ACD-GARCH model
to the IBM transaction dataset. They find that the latent GARCH model features remarkably low
volatility persistence, which contrasts with many other results, including those reported
in Engle (2000). Ghysels and Jasiak (1997) interpret this result as a demonstration of the
important role of the persistence of inter-trade durations, which, together with the high-frequency
returns, produces the evidence of substantial volatility persistence in high-frequency data.
The ACD-GARCH model of Ghysels and Jasiak (1997) is one of the few contributions
in the current literature attempting to link the volatility process of high-frequency returns
with the process driving the inter-trade durations through their joint modeling. Another such
attempt is made in Russell and Engle (1998), who also use the ACD model to fit the durations
data. However, as also pointed out in Ghysels (2000), both models reduce to a two-step
framework, where the ACD model for durations data is estimated first under the assumption of
exogeneity from the process driving the high-frequency returns or price changes data. Ghysels
and Jasiak (1997) also attempt to test causality from the volatility of high-frequency returns
to the inter-trade durations and find some supporting evidence for it.
The UHF-GARCH model of Engle (2000) uses high-frequency returns scaled by the actual
inter-trade durations for modeling in the usual GARCH framework. Scaling of the returns by
the square root of durations is intuitively justified as the natural measure of the volatility per
unit of time. Like Russell and Engle (1998), Engle (2000) also studies the effect of actual and
predicted durations on the conditional second moment of scaled returns. However, in contrast to
Russell and Engle (1998), Engle (2000) finds that actual durations have a statistically significant
effect on the variance of scaled high-frequency returns.
Both the ACD-GARCH model of Ghysels and Jasiak (1997) and the UHF-GARCH model of
Engle (2000) ignore the inherent discreteness of high-frequency financial data and fit traditional
GARCH models to it. To date, no studies have discussed the consequences of this
modeling decision for the performance of GARCH models. As documented in Hausman,
Lo and MacKinlay (1992), Campbell, Lo and MacKinlay (1997) pp. 107-114, Russell and En-
gle (1998), Rydberg and Shephard (2000) and many other studies, high-frequency transaction
data normally contains a large proportion of zero price changes and, therefore, zero returns.
In addition, the minimum price change of one tick is usually coarse relative
to the price level of the asset, leading to the bunching of high-frequency returns around the
points of support of the discrete price change distribution; see Szpiro (1998) and Crack and
Ledoit (1996) for more on this effect. For some distributional assumptions, such as GED in the
EGARCH model of Nelson (1991), the concentration of probability mass on the zero returns
may lead to numerical instabilities and failures to estimate the model; see Hasbrouck (1999).
Moreover, such models are unlikely to generate a sufficiently large number
of zero returns or to pick up the bunching.
Hasbrouck (1999) proposes a dynamic model for the discrete bid and ask quotes that is
largely motivated by the insights from the market microstructure theory. Apart from the
discreteness, his model features ARCH effects and incorporates costs of market making. Hasbrouck
(1999) approaches discreteness by rounding continuous data generated from the
latent time-series process with conditionally heteroscedastic innovations. The rounding is asym-
metric and is related to the cost of market making. This way of introducing discreteness into
the model goes back to the contributions of Gottlieb and Kalay (1985) and Ball (1988), who
use this setup to consider effects of discreteness on the estimator of variance of the continuous
underlying process. Inference in the model is complicated by the presence of several latent
components and is based on the recursive likelihood calculations, where the state variables
are integrated out using numerical methods. Empirical findings of Hasbrouck (1999) imply
highly peaked distribution of the innovations to the unobserved price process together with the
relatively low degree of persistence of their variance.
3 IV-GARCH and IV-FIARCH models for high-frequency financial data
In this section we present a class of models for time series of discrete, conditionally heteroscedastic price changes in high-frequency financial datasets. The models are motivated by the ordered probit model for transaction data of Hausman, Lo and MacKinlay (1992), with a variance process similar to the GARCH model of Bollerslev (1986). They belong to the class of observation-driven models in the sense of Cox (1981), so inference can be based directly on maximum likelihood. The basic specification can also be used to parametrize the conditional variance of discrete price changes in terms of an unobserved background driving process.
3.1 Empirical regularities in the distribution of high-frequency price changes in financial data
Before discussing the IV-GARCH and IV-FIARCH models in the following subsections, we present several stylized facts about the statistical properties of high-frequency price changes. For a much broader survey of the empirical regularities of high-frequency financial data, see Campbell, Lo and MacKinlay (1997) pp. 107-114, and Hautsch and Pohlmeier (2002). Here we emphasize the three most notable features of $\{\Delta p_i : i \in \mathbb{Z}\}$:
1. Figure 1 depicts the unconditional probability mass function of high-frequency price changes in the IBM trades dataset, which is described in section 4. The distribution is nearly symmetric around $\Delta p_i = 0$, with a high concentration of probability mass on the middle state and almost no mass on price changes of half a dollar or more in absolute value; see Campbell, Lo and MacKinlay (1997) pp. 107-114 and Hautsch and Pohlmeier (2001) for similar evidence in other high-frequency datasets;
2. Another notable regularity of the data is a significant negative first-order autocorrelation of the $\{\Delta p_i : i \in \mathbb{Z}\}$ series; see Figure 2. It follows that, conditional on the sign of the previous observation, the distribution of $\Delta p_i$ is asymmetric; see Figure 3 for an illustration. The negative autocorrelation is consistent with the bid-ask bounce model of Roll (1984).
3. Finally, as with many financial series with continuous support, $\{\Delta p_i : i \in \mathbb{Z}\}$ appears to exhibit dynamic heteroscedasticity. For the IBM data this is illustrated by the correlogram of $|\Delta p_i|$ in Figure 2. Figure 4 also shows the probability mass function of $\Delta p_i$ on two different trading days within our sample, where the difference in the tail mass is apparent.

[Figure 1: Unconditional distribution of $\Delta p_i$ in IBM transaction data. Tick size equals 1/8 of a dollar.]
[Figure 2: Correlograms of $\Delta p_i$ (upper panel) and $|\Delta p_i|$ (lower panel) for IBM transaction data, lags 1 to 500.]
[Figure 3: Distribution of $\Delta p_i$ conditional on $\Delta p_{i-1} \ge 0$ (upper panel) and on $\Delta p_{i-1} \le 0$ (lower panel) in IBM transaction data. Tick size equals 1/8 of a dollar.]
[Figure 4: Marginal distributions of $\Delta p_i$ on 23 November 1990 (upper panel) and 17 January 1991 (lower panel) in IBM transaction data. Tick size equals 1/8 of a dollar.]
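The bid-ask bounce behind the second stylized fact can be illustrated with a small simulation of the Roll (1984) model. This is a minimal sketch under our own assumptions, not part of the models of this chapter: the observed price equals an efficient price (a Gaussian random walk with per-trade standard deviation `sd_efficient`) plus half the spread times a trade-direction indicator that is +1 or -1 with equal probability, independently across trades. Under these assumptions the first-order autocorrelation of price changes is $-(s^2/4)/(\sigma_m^2 + s^2/2)$, with $s$ the spread and $\sigma_m^2$ the efficient-price variance, and autocorrelations at higher lags are zero.

```python
import random


def roll_price_changes(n, spread, sd_efficient, seed=0):
    """Simulate n price changes from the Roll (1984) bid-ask bounce model."""
    rng = random.Random(seed)
    efficient = 0.0
    p_prev = efficient + 0.5 * spread * rng.choice((-1, 1))
    changes = []
    for _ in range(n):
        efficient += rng.gauss(0.0, sd_efficient)   # efficient-price innovation
        q = rng.choice((-1, 1))                     # +1 buyer-, -1 seller-initiated
        p = efficient + 0.5 * spread * q            # trade at the ask or at the bid
        changes.append(p - p_prev)
        p_prev = p
    return changes


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / sum(d * d for d in dev)


spread, sd_eff = 0.125, 0.05        # hypothetical values, in dollars
dp = roll_price_changes(100_000, spread, sd_eff)
theory = -(spread ** 2 / 4) / (sd_eff ** 2 + spread ** 2 / 2)
print(f"lag 1: {autocorr(dp, 1):.3f} (theory {theory:.3f}), lag 2: {autocorr(dp, 2):.3f}")
```

The simulated prices are continuous, so the sketch reproduces only the sign and size of the first-order autocorrelation, not the discreteness of the IBM data.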
In the next subsections we present a framework for econometric modeling of discrete price changes that accounts for the three properties outlined above in a relatively parsimonious and mathematically tractable way. The models can easily be extended in many other directions, which makes them suitable for testing a range of market microstructure theories. In sections 4 and 5 we evaluate the fit of the models on the IBM trades dataset.
3.2 Discrete distribution for high-frequency price changes
The discrete distribution for $\Delta p_i$ in high-frequency financial series is the basic building block of the models proposed in this section. While there is a large class of statistical distributions with discrete support (good overviews can be found in Feller (1968) and Johnson and Kotz (1969)), most are restricted to non-negative counts and are therefore not suitable for modeling $\Delta p_i$. The few parametric discrete distributions defined on intervals of $\mathbb{Z}$ do not seem flexible enough to accommodate the range of patterns of $\Delta p_i$ documented in the previous subsection. Below we introduce a parametrization of the discrete distribution of high-frequency price changes that picks up changes in the first and second moments of the discrete data using as few parameters as possible.
We model the set of probability atoms $\pi = (\pi_{-K}, \dots, \pi_K)$ on a bounded interval of $\mathbb{Z}$ symmetric around zero as a function of two parameters linked to the first two moments of $\Delta p_i$.⁶ In the remainder of the paper the two parameters are denoted $\mu$ and $\sigma^2$, and we allow for non-linear relationships between $\mu$ and $E(\Delta p_i \mid \mu, \sigma^2)$ and between $\sigma^2$ and $V(\Delta p_i \mid \mu, \sigma^2)$. In addition, the following assumptions are used throughout the paper:

A1. $0 < \pi_k(\mu, \sigma^2) < 1$ for all $k = -K, \dots, K$, and $\sum_{k=-K}^{K} \pi_k(\mu, \sigma^2) = 1$, for all $\mu \in \mathbb{R}$ and $\sigma^2 \in \mathbb{R}_+$.

⁶ Here and henceforth we use the same notation as in subsection 2.
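A2. The functions $\{\pi_k(\mu, \sigma^2) : -K \le k \le K\}$ have the following limits, for some $0 \le \delta < 1$:
$$\lim_{\sigma^2 \to 0} \pi_k(\mu, \sigma^2) = \delta, \quad k = -K, \dots, -1, 1, \dots, K,$$
$$\lim_{\sigma^2 \to 0} \pi_0(\mu, \sigma^2) = 1 - \delta,$$
$$\lim_{\sigma^2 \to \infty} \pi_k(\mu, \sigma^2) < 1, \quad k = -K, \dots, K.$$

A3. The functions $\{\sigma^2 \mapsto \pi_k(\mu, \sigma^2) : -K \le k \le K\}$ are Lipschitz, i.e. they satisfy $|\pi_k(\mu, x) - \pi_k(\mu, y)| \le C_k |x - y|$ for all $k = -K, \dots, K$ and some constants $\{C_k : -K \le k \le K\} \subset \mathbb{R}_+$.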
A1 restricts the vector $\pi$ so that every point in the support of $\Delta p_i$ carries positive probability, regardless of the values of $\mu$ and $\sigma^2$. A similar restriction is common in econometric models with absolutely continuous distributions and time-varying volatility, such as GARCH and EGARCH models. It also plays an important role in ensuring the dynamic stability of time-series models built on this discrete distribution.
A2 restricts the behavior of $\pi$ as a function of $\sigma^2$. The first two limits guarantee that, as $\sigma^2$ approaches zero, the probability mass concentrates on the middle point of the support, possibly leaving some limited probability mass in the tails. The third limit ensures stability of the probability distribution as $\sigma^2 \to \infty$ by requiring the functions $\{\pi_k(\mu, \sigma^2) : -K \le k \le K\}$ to have well-defined limits below one, so that the mass does not collapse onto a single state.
Finally, A3 imposes extra regularity on the functions $\{\pi_k(\mu, \sigma^2) : -K \le k \le K\}$. In particular, it guarantees that they vary continuously, indeed in a Lipschitz fashion, with $\sigma^2$, a desirable property in an econometric model.
The general framework presented above resembles the one outlined by Russell and Engle (1998), but there are some important differences. Notably, as mentioned in subsection 2.2, the model for the discrete distribution of $\Delta p_i$ in this paper is designed to have identifiable groups of parameters associated with the moments of high-frequency price changes. Because volatility plays a prominent role in finance and many hypotheses are specifically linked to the second moment of financial data, this structure should provide a powerful and convenient tool for empirical research. Another important distinction of our approach is that we make $\{\pi_k : -K \le k \le K\}$ functions of the parameters of interest, rather than modeling the evolution of $\pi$ in a generalized VAR framework. As will become clearer below, we sacrifice some of the flexibility of the generalized VAR framework of Russell and Engle (1998) in order to obtain a more parsimonious and more easily interpretable structure for the discrete distribution of $\Delta p_i$. In fact, for modeling price changes in high-frequency data, one hardly needs a completely flexible parametrization of $\pi$. As we saw in the previous subsection, the distributions of $\Delta p_i$ share clear common features, such as a pronounced concentration of probability mass on zero price changes and thin tails. We use these facts to simplify the model and to make it easier to interpret.
Once a suitable mapping $(\mu, \sigma^2) \mapsto \pi$ is established, a parametrization of $\mu$ and $\sigma^2$ can be selected. For example, in the spirit of the observation-driven models of Cox (1981), $\mu$ and $\sigma^2$ can be made dependent on the history of the process and on a set of exogenous variables. Alternatively, they can be parametrized in terms of latent state variables. Moreover, $\mu$ and $\sigma^2$ can be modeled jointly with other endogenous variables of interest in the high-frequency dataset. We explore some of these possibilities later in the paper.
In the remainder of this subsection we describe a particular parametrization of the functions $\{\pi_k(\mu, \sigma^2) : -K \le k \le K\}$ such that A1–A3 are satisfied and the parameters $\mu$ and $\sigma^2$ are linked to the first two moments of $\Delta p_i$. Our choice resembles the popular ordered logit model, in which the logistic distribution function serves as the link between $\mu$ and $\sigma^2$ and the probabilities $\pi$. This choice was originally motivated by the ordered probit model of Hausman, Lo and MacKinlay (1992), but, as mentioned previously, our model is not in the class of ordered logit models. We use the logistic link function to parametrize $\pi$ because it conveniently describes the variety of shapes of the probability mass function of $\Delta p_i$ observed in the data; refer to subsection 2.1 for the discussion.⁷
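The logistic distribution function is a continuous bounded function with two parameters: the location parameter $\mu$ and the scale parameter $\sigma^2$. In addition, the probabilities $\{\pi_k : -K \le k \le K\}$ are defined using an extra set of parameters $\alpha = (-\alpha_{K-1}, \dots, -\alpha_1, -1, 1, \alpha_1, \dots, \alpha_{K-1})$ in the following way:
$$
\begin{aligned}
\pi_{-K}(\mu, \sigma^2) &:= \frac{1}{1 + e^{-\frac{-\alpha_{K-1} - \mu}{\sigma}}}, \\
&\;\;\vdots \\
\pi_0(\mu, \sigma^2) &:= \frac{1}{1 + e^{-\frac{1 - \mu}{\sigma}}} - \frac{1}{1 + e^{-\frac{-1 - \mu}{\sigma}}}, \\
&\;\;\vdots \\
\pi_K(\mu, \sigma^2) &:= 1 - \frac{1}{1 + e^{-\frac{\alpha_{K-1} - \mu}{\sigma}}}.
\end{aligned}
\tag{2}
$$
The intermediate probabilities are, analogously, differences of the logistic distribution function evaluated at the two consecutive elements of $\alpha$ that bracket state $k$, with $\sigma = \sqrt{\sigma^2}$.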
Together with the assumed structure of $\alpha$, this makes the support of $\Delta p_i$ an interval of $\mathbb{Z}$ symmetric around 0. The parametrization ensures that the probabilities $\{\pi_k : -K \le k \le K\}$ sum to unity. In the rest of this paper we assume that the parameters $(\alpha_1, \dots, \alpha_{K-1})$ are time-invariant and do not depend on any exogenous or predetermined variables. The vector $\alpha$ can thus be thought of as defining the unconditional distribution of $\{\Delta p_i : i \in \mathbb{Z}\}$, whereas the conditional distributions of price changes are modeled through $\mu$ and $\sigma^2$.
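Equation (2) can be evaluated by noting that each $\pi_k$ is a difference of logistic distribution function values at consecutive elements of $\alpha$. The Python sketch below assumes, beyond what equation (2) states explicitly, that the free thresholds satisfy $1 < \alpha_1 < \dots < \alpha_{K-1}$, which keeps every atom positive as A1 requires; the function and argument names are ours.

```python
import math


def logistic_cdf(x):
    """Logistic distribution function F(x) = 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def state_probs(mu, sigma2, alpha_free):
    """Probabilities (pi_{-K}, ..., pi_K) of equation (2).

    alpha_free = [alpha_1, ..., alpha_{K-1}] (assumed increasing and > 1), so the
    cut points are (-alpha_{K-1}, ..., -alpha_1, -1, 1, alpha_1, ..., alpha_{K-1}).
    """
    sigma = math.sqrt(sigma2)
    upper = [1.0] + list(alpha_free)
    cuts = [-a for a in reversed(upper)] + upper            # 2K cut points
    cdf = [logistic_cdf((c - mu) / sigma) for c in cuts]
    padded = [0.0] + cdf + [1.0]                            # F at -inf and +inf
    return [padded[j + 1] - padded[j] for j in range(len(padded) - 1)]


# seven states (K = 3) with hypothetical thresholds alpha_1 = 2, alpha_2 = 3
print([round(p, 4) for p in state_probs(0.0, 0.5, [2.0, 3.0])])
```

A positive $\mu$ shifts mass towards the right tail, and $\mu = 0$ gives a distribution that is symmetric around the middle state.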
A symmetric discrete distribution of $\Delta p_i$ around its middle state, normally $\Delta p_i = 0$, obtains whenever the location parameter $\mu$ is zero. When $\mu \ne 0$, probability mass shifts to the left or to the right tail of the distribution, leading to a non-zero expected price change. This provides a convenient way of picking up variations in the conditional first moment of $\Delta p_i$ in real-world data. The mechanism is seen from the expression for the first moment of $\Delta p_i$:
$$E(\Delta p_i \mid \mu, \sigma^2) = \sum_{k=-K}^{K} k \cdot \pi_k(\mu, \sigma^2),$$
where $\{\pi_k : -K \le k \le K\}$ are defined in equation (2) and $E(\Delta p_i)$ is measured in ticks, i.e. as the number of ticks by which the price is expected to change. As is apparent from equation (2), whenever $\alpha$ has the structure given above, the probability atoms in the left and right tails of the distribution mirror each other when $\mu = 0$ (so that $E(\Delta p_i \mid 0, \sigma^2) = 0$) and differ otherwise.

⁷ Other models for heteroscedastic sequences of discrete random variables can be constructed. For example, consider a partition of the unit interval $(0, 1]$ into subintervals $A_1 = (0, \alpha_1]$, $A_2 = (\alpha_1, \alpha_2]$ and $A_3 = (\alpha_2, 1]$ such that $0 < \alpha_1 < \alpha_2 < 1$, and assign $\pi_i = \lambda_1(A_i)$ for $i = 1, 2, 3$, where $\lambda_1$ is the Lebesgue measure on $(0, 1]$. By parametrizing $\alpha_1$ and $\alpha_2$, a variety of shapes of the three-state distribution of $\Delta p_i$ can be achieved. In particular, $E(\Delta p_i)$ changes when $A_2$ is moved around the unit interval while $\lambda_1(A_2)$ stays constant, whereas $V(\Delta p_i)$ varies with $\lambda_1(A_2)$.

[Figure 5: Dependence of $E(\Delta p_i)$ on $\mu$ for $\sigma = 1, 2, 3$ (left panel) and of $V(\Delta p_i)$ on $\sigma$ for $\mu = 0, 1, 2$ (right panel) in the static model with a three-state distribution of $\Delta p_i$.]
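The variance (second central moment) of $\Delta p_i$ is given by the following expression:
$$V(\Delta p_i \mid \mu, \sigma^2) = \sum_{k=-K}^{K} k^2 \cdot \pi_k(\mu, \sigma^2) - \big(E(\Delta p_i \mid \mu, \sigma^2)\big)^2 .$$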
The variance is measured in squared ticks. When $\mu = 0$, the second term of the expression drops out, and it follows from equation (2) that the remaining term $\sum_{k} k^2 \pi_k(0, \sigma^2) = \sum_{j=1}^{K} (2j - 1) \Pr(|\Delta p_i| \ge j)$ increases monotonically with $\sigma^2$, since each tail probability $\Pr(|\Delta p_i| \ge j)$ does. This links $\sigma^2$ to $V(\Delta p_i)$.
However, as in many parametric discrete distributions, $\mu$ also affects $V(\Delta p_i)$ and $\sigma^2$ also affects $E(\Delta p_i)$, since $\{\pi_k(\mu, \sigma^2) : -K \le k \le K\}$ are functions of both parameters. Figure 5 plots the first two moments of a simple three-state model of $\Delta p_i$ as functions of $\mu$ and $\sigma^2$. An increase in $\sigma^2$ weakens the dependence of $E(\Delta p_i)$ on $\mu$, and a similar effect is observed for the dependence of $V(\Delta p_i)$ on $\sigma^2$ when $\mu$ increases. The same result holds for models with a larger support of $\Delta p_i$.
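The cross effects described above can be checked numerically for the three-state case ($K = 1$, so $\alpha = (-1, 1)$ has no free elements). The sketch computes $E(\Delta p_i \mid \mu, \sigma^2)$ and $V(\Delta p_i \mid \mu, \sigma^2)$ directly from the two moment formulas; the grid values are illustrative and mirror the legend of Figure 5, which is stated in terms of $\sigma$ rather than $\sigma^2$.

```python
import math


def logistic_cdf(x):
    """Logistic distribution function, evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def three_state_moments(mu, sigma2):
    """Mean (ticks) and variance (squared ticks) of Delta p for K = 1."""
    sigma = math.sqrt(sigma2)
    lo = logistic_cdf((-1.0 - mu) / sigma)     # pi_{-1}
    hi = logistic_cdf((1.0 - mu) / sigma)      # pi_{-1} + pi_0
    probs = {-1: lo, 0: hi - lo, 1: 1.0 - hi}
    mean = sum(k * p for k, p in probs.items())
    second = sum(k * k * p for k, p in probs.items())
    return mean, second - mean ** 2


for sigma in (1.0, 2.0, 3.0):                  # left panel of Figure 5: E as a function of mu
    print(sigma, [round(three_state_moments(mu, sigma ** 2)[0], 3) for mu in (0.0, 1.0, 2.0)])
```

For $\sigma = 1, 2, 3$ the expected price change at $\mu = 1$ falls as $\sigma$ grows, and at $\mu = 0$ the variance rises monotonically with $\sigma^2$.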
For constant $\mu$, $\sigma^2$ and $\alpha$, the static model presented in this subsection produces a sequence of i.i.d. homoscedastic discrete random variables $\{\Delta p_i : i \in \mathbb{Z}\}$. By specifying $\mu$ and $\sigma^2$ in terms of exogenous variables, the probability mass functions $\{\pi_i : i \in \mathbb{Z}\}$ are allowed to vary over time, permitting the discrete process $\{\Delta p_i : i \in \mathbb{Z}\}$ to exhibit a variety of dynamic features. Note that the sequence $\{\Delta p_i : i \in \mathbb{Z}\}$ is bounded by construction, hence all moments of $\Delta p_i$ exist. In the following subsections we show how to introduce dependence into the time series of high-frequency price changes so that the stylized facts presented in subsection 3.1 can be modeled adequately.
3.3 IV-GARCH model for heteroscedastic discrete ∆pi
Building upon the model for homoscedastic price changes introduced in the previous subsection, we now show how to model sequences of heteroscedastic $\Delta p_i$ in high-frequency data. As documented in subsection 3.1, and as will be seen in section 4, real-world $\Delta p_i$ series exhibit dynamic dependence in both the first and the second moments. We capture this by borrowing ideas from the GARCH model of Bollerslev (1986). In particular, we parametrize $\sigma^2$ in terms of its own lags and a set of exogenous variables, with disturbances given by a sequence of martingale difference (henceforth MD) innovations. The location parameter $\mu$ is assumed to be non-dynamic, possibly depending on a set of exogenous variables.
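The first model for a dependent heteroscedastic sequence of price changes features a short-memory dynamic structure in the second moment and is defined as follows:
$$
\begin{aligned}
\Delta p_i &\sim \{\pi_{-K}(\mu_i, \sigma_i^2; \alpha), \dots, \pi_K(\mu_i, \sigma_i^2; \alpha)\}, \\
\mu_i &= x_i' \beta, \\
\sigma_i^2 &= \gamma_{0i} + \gamma_1 \sigma_{i-1}^2 + \gamma_2 \big[\Delta p_{i-1}^2 - E(\Delta p_{i-1}^2 \mid \sigma_{i-1}^2)\big],
\end{aligned}
\tag{3}
$$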
where $\{\pi_{-K}(\mu_i, \sigma_i^2; \alpha), \dots, \pi_K(\mu_i, \sigma_i^2; \alpha)\}$ denotes the discrete distribution introduced in the previous subsection with parameters $\mu_i$, $\sigma_i^2$ and $\alpha$; for simplicity we suppress the dependence of the conditional expectation $E(\Delta p_i^2 \mid \sigma_i^2)$ on $\mu_i$. The vector $x_i$ collects a set of exogenous variables normalized to have zero mean. The parameter $\gamma_{0i}$ is assumed to be non-negative for all $i \in \mathbb{Z}$, and can optionally be parametrized as $z_i' \delta$ to include the effects of intra-day seasonality, news announcements and additional exogenous variables. As follows from (3), the conditional volatility parameter $\sigma_i^2$ has one autoregressive component and one driving martingale difference innovation. In the remainder of this paper this model is referred to as IV-GARCH(1,1).
The process $\{\sigma_i^2 : i \in \mathbb{Z}\}$ in the IV-GARCH(1,1) model above is driven by the innovations $\Delta p_i^2 - E(\Delta p_i^2 \mid \sigma_i^2)$.⁸ They have a natural interpretation as volatility "surprises", i.e. unexpected increases or decreases in the volatility of the discrete random variable $\Delta p_i$. The innovations are martingale differences by construction: since $\sigma_i^2$ is determined by the information available at time $i-1$,
$$E\big[\Delta p_i^2 - E(\Delta p_i^2 \mid \sigma_i^2) \,\big|\, \mathcal{F}_{i-1}\big] = 0 \quad \text{for all } i \in \mathbb{Z},$$
where $\mathcal{F}_i$ stands for the information generated by the process up to time $i$. The sequence of squared price changes $\{\Delta p_i^2 : i \in \mathbb{Z}\}$ entering the volatility process is not i.i.d., and therefore requires careful treatment in proofs of the distributional results and statistical properties of $\{\sigma_i^2 : i \in \mathbb{Z}\}$.

⁸ At this point the difference between our view of the model in equation (3) and the traditional ordered probit/logit literature is most visible. Since there is no underlying latent variable with an absolutely continuous distribution in (3), the only random innovation in our model is the discrete $\Delta p_i$. The traditional approach might call for the inclusion of the squared unobserved continuous $\Delta p_i^{*}$, which leads to complications due to its latent character. Also, diagnostic procedures in traditional ordered probit/logit models require calculation of the expected underlying innovation; see Hausman, Lo and MacKinlay (1992). We base our specification tests directly on $\Delta p_i$.
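Under the assumptions $E\sigma_i^2 < \infty$ and $0 \le \gamma_1 < 1$, the IV-GARCH(1,1) model has the following representation, referred to as IV-ARCH(∞):
$$\sigma_i^2 = \sum_{j=0}^{\infty} \gamma_1^j \gamma_{0,i-j} + \gamma_2 \sum_{j=0}^{\infty} \gamma_1^j \big[\Delta p_{i-j-1}^2 - E(\Delta p_{i-j-1}^2 \mid \sigma_{i-j-1}^2)\big].$$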
This representation highlights an important feature of model (3): the martingale difference innovations are weighted by an exponentially decaying sequence of coefficients $\{\gamma_1^j\}$, so the effect of past shocks on the current conditional volatility parameter $\sigma_i^2$ dies out quickly. In particular, when the conditional volatility process has a stationary distribution and $\gamma_{0i}$ is constant, the autocovariance function of $\{\sigma_i^2 : i \in \mathbb{Z}\}$ is absolutely summable, as in the class of short-memory linear ARMA models.
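The equivalence between recursion (3) and its IV-ARCH(∞) form can be checked numerically on a finite sample. Started from $\sigma_0^2$, the recursion gives $\sigma_n^2 = \gamma_1^n \sigma_0^2 + \sum_{j=0}^{n-1} \gamma_1^j(\gamma_0 + \gamma_2 \varepsilon_{n-j-1})$, where $\varepsilon_i = \Delta p_i^2 - E(\Delta p_i^2 \mid \sigma_i^2)$; letting $n \to \infty$ with $0 \le \gamma_1 < 1$ removes the first term. The sketch assumes a constant $\gamma_0$ and treats the innovations as given numbers, so it checks only the algebra of the two forms, not the model itself.

```python
def volatility_recursion(sigma2_0, gamma0, gamma1, gamma2, eps):
    """sigma^2_i = gamma0 + gamma1 * sigma^2_{i-1} + gamma2 * eps_{i-1}, for i = 1, ..., n."""
    path, s = [], sigma2_0
    for e in eps:
        s = gamma0 + gamma1 * s + gamma2 * e
        path.append(s)
    return path


def volatility_ma_form(sigma2_0, gamma0, gamma1, gamma2, eps, n):
    """Moving-average form of sigma^2_n; the weights gamma1**j decay geometrically."""
    total = gamma1 ** n * sigma2_0
    for j in range(n):
        total += gamma1 ** j * (gamma0 + gamma2 * eps[n - j - 1])
    return total


eps = [0.3, -0.2, 0.5, -0.1, 0.0, 0.25]        # arbitrary innovation values
path = volatility_recursion(0.4, 0.05, 0.85, 0.1, eps)
print(path[-1], volatility_ma_form(0.4, 0.05, 0.85, 0.1, eps, len(eps)))
```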
Below we give a sufficient condition for non-negativity of the conditional volatility process $\{\sigma_i^2 : i \in \mathbb{Z}\}$ in the IV-GARCH(1,1) model:

THEOREM 1. The conditional volatility process in the IV-GARCH(1,1) model is non-negative if $\gamma_0 + \gamma_1 \sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2) > 0$ for all $\sigma^2 > 0$.
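Parameters $\gamma_0$, $\gamma_1$ and $\gamma_2$ satisfying the condition in Theorem 1 can always be chosen because, by A1 and A3, the function $\sigma^2 \mapsto E(\Delta p^2 \mid \sigma^2)$ is continuous and bounded between 0 and $K^2$. In addition, a simpler sufficient condition for positivity of the volatility process in the IV-GARCH(1,1) model is $\gamma_0 - \gamma_2 E(\Delta p^2 \mid \sigma^2) > 0$ for all $\sigma^2 > 0$, as is seen from (A.1); with $\gamma_2 \ge 0$, this holds in particular whenever $\gamma_0 > \gamma_2 K^2$.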
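For the three-state model with $\mu = 0$, where $E(\Delta p^2 \mid \sigma^2) = \pi_{-1} + \pi_1 \le K^2 = 1$, the condition of Theorem 1 can be inspected on a grid of $\sigma^2$ values. A grid check is only a numerical illustration, not a proof; the parameter values and the grid are hypothetical.

```python
import math


def logistic_cdf(x):
    """Logistic distribution function, evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def second_moment(sigma2):
    """E(Delta p^2 | sigma^2) = pi_{-1} + pi_1 for the three-state model with mu = 0."""
    sigma = math.sqrt(sigma2)
    return logistic_cdf(-1.0 / sigma) + 1.0 - logistic_cdf(1.0 / sigma)


GRID = [10 ** (k / 10) for k in range(-40, 41)]   # sigma^2 from 1e-4 to 1e4


def theorem1_margin(gamma0, gamma1, gamma2, grid=GRID):
    """Smallest value of gamma0 + gamma1*s - gamma2*E(Delta p^2 | s) over the grid."""
    return min(gamma0 + gamma1 * s - gamma2 * second_moment(s) for s in grid)


print(theorem1_margin(0.05, 0.85, 0.1))   # positive: condition holds on the grid
print(theorem1_margin(0.15, 0.0, 0.1))    # positive: gamma0 > gamma2 * K^2 with K = 1
print(theorem1_margin(0.01, 0.0, 0.5))    # negative: condition fails for large sigma^2
```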
Before giving conditions for the stochastic stability of the IV-GARCH(1,1) model, we establish the deterministic stability of the noise-free skeleton of the model given in equation (A.1). We also obtain a lower bound for the volatility process in the IV-GARCH(1,1) model. We need the following extra regularity assumption:

A4. The function $\sigma^2 \mapsto \gamma_0 + \gamma_1 \sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2)$ defines a contraction, that is,
$$\big|\gamma_1 [x - y] - \gamma_2 [E(\Delta p^2 \mid x) - E(\Delta p^2 \mid y)]\big| \le \delta |x - y|$$
holds for all $x, y \in \mathbb{R}_+$ and some $0 < \delta < 1$.
THEOREM 2. Under A4, the deterministic part of the volatility process in the IV-GARCH(1,1) model, defined by the recursion $\sigma_i^2 = \gamma_0 + \gamma_1 \sigma_{i-1}^2 - \gamma_2 E(\Delta p^2 \mid \sigma_{i-1}^2)$, is globally stable and has a unique limit point given by the solution of $\sigma^2 = \gamma_0 + \gamma_1 \sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2)$.
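Theorem 2 suggests computing the limit point of the deterministic skeleton by fixed-point iteration. The sketch below does this for the three-state model with $\mu = 0$ and hypothetical parameter values for which A4 holds. Under A4 the iteration reaches the same limit from any positive starting value, and the limit lies below $\gamma_0/(1-\gamma_1)$ because the skeleton subtracts $\gamma_2 E(\Delta p^2 \mid \sigma^2) > 0$ at every step.

```python
import math


def logistic_cdf(x):
    """Logistic distribution function, evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def second_moment(sigma2):
    """E(Delta p^2 | sigma^2) = pi_{-1} + pi_1 for the three-state model with mu = 0."""
    sigma = math.sqrt(sigma2)
    return logistic_cdf(-1.0 / sigma) + 1.0 - logistic_cdf(1.0 / sigma)


def skeleton_limit(sigma2_start, gamma0, gamma1, gamma2, tol=1e-13, max_iter=100_000):
    """Iterate s <- gamma0 + gamma1*s - gamma2*E(Delta p^2 | s) until it settles."""
    s = sigma2_start
    for _ in range(max_iter):
        s_next = gamma0 + gamma1 * s - gamma2 * second_moment(s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    raise RuntimeError("fixed-point iteration did not converge")


g0, g1, g2 = 0.05, 0.85, 0.1                   # hypothetical parameter values
limits = [skeleton_limit(start, g0, g1, g2) for start in (0.01, 1.0, 10.0)]
print(limits, g0 / (1 - g1))
```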
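Taking expectations in equation (3) and using the fact that the innovations have zero mean shows that, in a stationary regime with $0 \le \gamma_1 < 1$, the expected value of the volatility process is
$$E\sigma_i^2 = \frac{E\gamma_{0i}}{1 - \gamma_1}.$$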
Finally, existence and uniqueness of the stationary distribution of the IV-GARCH(1,1) model draw on results from the theory of Markov chains on general state spaces. In particular, we follow the line of proof given in Davis, Rydberg, Shephard and Streett (2001). The IV-GARCH(1,1) model and the CBIN model introduced by these authors have similar probabilistic structures, and the cited work has been very helpful in establishing stationarity results for our model. Note that the conditional volatility part of the IV-GARCH(1,1) model defines a Markov chain on the positive half-line:
$$\sigma_i^2 = \gamma_0 + \gamma_1 \sigma_{i-1}^2 - \gamma_2 E(\Delta p_{i-1}^2 \mid \sigma_{i-1}^2) + \gamma_2 \Delta p_{i-1}^2, \tag{4}$$
where $\{\Delta p_i^2 : i \in \mathbb{Z}\}$ is a sequence of non-i.i.d. innovations. The following result is central in this subsection:

THEOREM 3. The Markov chain $\{\sigma_i^2 : i \in \mathbb{Z}\}$ with transition function defined by (4) possesses a unique stationary distribution under A1–A4.
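Theorem 3 can be illustrated by simulating the chain (4). The sketch below draws $\Delta p_i$ from the three-state distribution with $\mu = 0$ and a constant $\gamma_0$, using hypothetical parameter values that satisfy Theorem 1 and A4. It compares the long-run sample mean of $\sigma_i^2$ with $E\sigma_i^2 = \gamma_0/(1-\gamma_1)$ and checks that the volatility innovations average to zero, as the martingale difference property implies. Agreement of sample averages is consistent with ergodicity but does not prove it.

```python
import math
import random


def logistic_cdf(x):
    """Logistic distribution function, evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_iv_garch(n, gamma0, gamma1, gamma2, seed=0):
    """Simulate chain (4) for the three-state model with mu = 0 and constant gamma0."""
    rng = random.Random(seed)
    s = gamma0 / (1.0 - gamma1)                 # start at the stationary mean
    sigma2_path, innovations = [], []
    for _ in range(n):
        sigma = math.sqrt(s)
        lo = logistic_cdf(-1.0 / sigma)          # Pr(Delta p = -1)
        hi = logistic_cdf(1.0 / sigma)           # Pr(Delta p <= 0)
        u = rng.random()
        dp = -1 if u < lo else (0 if u < hi else 1)
        eps = dp * dp - (lo + 1.0 - hi)          # Delta p^2 - E(Delta p^2 | sigma^2)
        sigma2_path.append(s)
        innovations.append(eps)
        s = gamma0 + gamma1 * s + gamma2 * eps   # equation (4)
    return sigma2_path, innovations


g0, g1, g2 = 0.05, 0.85, 0.1                    # hypothetical values
path, eps = simulate_iv_garch(200_000, g0, g1, g2)
print(sum(path) / len(path), g0 / (1 - g1), sum(eps) / len(eps))
```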
3.4 IV-FIARCH model for heteroscedastic discrete ∆pi
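The long-memory version of the IV-ARCH(∞) model, referred to as the IV-FIARCH(∞) model, is defined by:
$$
\begin{aligned}
\Delta p_i &\sim \{\pi_{-K}(\mu_i, \sigma_i^2; \alpha), \dots, \pi_K(\mu_i, \sigma_i^2; \alpha)\}, \\
\mu_i &= x_i' \beta, \\
\sigma_i^2 &= \gamma_{0i} + \gamma_2 (1 - L)^{-d} \big[\Delta p_{i-1}^2 - E(\Delta p_{i-1}^2 \mid \sigma_{i-1}^2)\big],
\end{aligned}
\tag{5}
$$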
where $d$ is the fractional integration parameter and $L$ denotes the lag operator.⁹ The power series expansion of $(1 - z)^{-d}$ around $z = 0$ gives a sequence of coefficients $\{\theta_j : j \ge 0\}$ (for $d > 0$, $\theta_j = \Gamma(j + d)/(\Gamma(d)\Gamma(j + 1)) \sim j^{d-1}/\Gamma(d)$) with the property $\sum_{j=0}^{\infty} \theta_j^{\delta} < \infty$ for all $\delta > (1 - d)^{-1}$ and $0 \le d < 1$. In particular, for $0 < d < 1$ the sequence $\{\theta_j : j \ge 0\}$ is not absolutely summable, but it is square-summable for $0 \le d < 1/2$. Under the assumption of stationarity of the conditional volatility process and constant $\gamma_{0i}$, this implies a non-summable autocovariance function.
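The coefficients $\theta_j$ can be generated with the recursion $\theta_0 = 1$, $\theta_j = \theta_{j-1}(j - 1 + d)/j$, which follows from the ratio of consecutive terms of $\Gamma(j+d)/(\Gamma(d)\Gamma(j+1))$. The sketch checks the recursion against the closed form and against the hyperbolic rate $j^{d-1}/\Gamma(d)$; $d = 0.4$ is an arbitrary value in $(0, 1/2)$.

```python
import math


def frac_weights(d, n):
    """theta_0, ..., theta_{n-1} in (1 - z)^(-d) = sum_j theta_j z^j."""
    theta = [1.0]
    for j in range(1, n):
        theta.append(theta[-1] * (j - 1 + d) / j)
    return theta


d = 0.4
theta = frac_weights(d, 10_001)

# closed form via log-gamma, for comparison at j = 50
closed_50 = math.exp(math.lgamma(50 + d) - math.lgamma(d) - math.lgamma(51))
# hyperbolic decay: theta_j * Gamma(d) * j^(1 - d) approaches 1 as j grows
ratio = theta[10_000] * math.gamma(d) * 10_000 ** (1 - d)
# partial sums keep growing (roughly like n^d); sums of squares settle down for d < 1/2
print(sum(theta[:1_000]), sum(theta), sum(t * t for t in theta))
```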
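As in the IV-GARCH(1,1) model (3), the sequence $\{\Delta p_i^2 - E(\Delta p_i^2 \mid \sigma_i^2) : i \in \mathbb{Z}\}$ is by construction a martingale difference sequence: $E|\Delta p_i^2 - E(\Delta p_i^2 \mid \sigma_i^2)| < \infty$ holds because the random variable $\Delta p_i^2$ is bounded between 0 and $K^2$, and $E[\Delta p_i^2 - E(\Delta p_i^2 \mid \sigma_i^2) \mid \mathcal{F}_{i-1}] = 0$ holds because $\sigma_i^2$ is determined by the information up to time $i - 1$. It follows that $E\sigma_i^2 = E\gamma_{0i}$.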
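To see how the long-memory weights enter model (5), the sketch below computes $\sigma_i^2 = \gamma_0 + \gamma_2 \sum_{j \ge 0} \theta_j \varepsilon_{i-1-j}$ for a given innovation sequence, truncating the sum at the start of the sample (pre-sample innovations set to zero, an assumption of ours) and after a maximum number of lags. Feeding in a single unit shock shows that its effect on $\sigma_i^2$ after $j + 1$ periods is $\gamma_2 \theta_j$, which decays hyperbolically rather than geometrically as in the IV-ARCH(∞) case. Parameter values are hypothetical.

```python
def frac_weights(d, n):
    """theta_0, ..., theta_{n-1} in (1 - z)^(-d) = sum_j theta_j z^j."""
    theta = [1.0]
    for j in range(1, n):
        theta.append(theta[-1] * (j - 1 + d) / j)
    return theta


def fiarch_volatility(gamma0, gamma2, eps, theta):
    """sigma^2_i for i = 0, ..., n: gamma0 + gamma2 * sum_j theta_j * eps_{i-1-j}.

    eps[t] is the innovation Delta p_t^2 - E(Delta p_t^2 | sigma_t^2); the sum stops at
    the first observation and after len(theta) lags.
    """
    path = []
    for i in range(len(eps) + 1):
        acc = sum(theta[j] * eps[i - 1 - j] for j in range(min(i, len(theta))))
        path.append(gamma0 + gamma2 * acc)
    return path


gamma0, gamma2, d = 0.3, 0.1, 0.4              # hypothetical values
theta = frac_weights(d, 50)
shock = [0.0] * 20
shock[3] = 1.0                                  # a unit volatility surprise at t = 3
response = [s - gamma0 for s in fiarch_volatility(gamma0, gamma2, shock, theta)]
print([round(r, 4) for r in response])
```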
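A sufficient non-negativity condition for the process $\{\sigma_i^2 : i \in \mathbb{Z}\}$ in the IV-FIARCH(∞) model is given in Theorem 4:

THEOREM 4. The conditional volatility process in the IV-FIARCH(∞) model (5) is non-negative if $\sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2) \ge 0$ for all $\sigma^2 > 0$.

An immediate consequence of the condition in Theorem 4 (for $\gamma_2 > 0$) is that $\lim_{\sigma^2 \to 0} E(\Delta p^2 \mid \sigma^2) = 0$, since $0 \le \gamma_2 E(\Delta p^2 \mid \sigma^2) \le \sigma^2$. This implies $\delta = 0$ in A2 for the IV-FIARCH(∞) model.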
Unlike in the IV-GARCH(1,1) case, only a deterministic stability result is available for the conditional volatility process in model (5), because the IV-FIARCH(∞) model is not Markovian:

THEOREM 5. Under the conditions of Theorem 4, the deterministic part of the volatility process in the IV-FIARCH(∞) model, defined by $\sigma_i^2 = \gamma_0 - \gamma_2 \sum_{j=1}^{i} \theta_{j-1} E(\Delta p_{i-j}^2 \mid \sigma_{i-j}^2)$, is globally stable and has a unique limit point $\sigma^2 = 0$.

⁹ Baillie (1996) gives an overview of recent results on fractional integration in econometrics.
So far we have introduced two simple dynamic heteroscedasticity models for high-frequency price changes. As seen from equations (3) and (5), the IV-GARCH(1,1) and IV-FIARCH(∞) models have relatively restricted short- and long-memory dynamics. Work is in progress to extend the models to richer dynamics, including a short-memory component in the IV-FIARCH model.
3.5 Estimation and diagnostics for IV-GARCH and IV-FIARCH models
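Statistical inference in the IV-GARCH and IV-FIARCH models is based on a straightforward maximum likelihood procedure. Given the observed data $\{\Delta p_i, x_i, z_i : 1 \le i \le N\}$, the log-likelihood function is
$$\mathcal{L}_N\big(\alpha, \beta, \gamma, \delta \,\big|\, \{\Delta p_i, x_i, z_i : 1 \le i \le N\}\big) = \sum_{i=1}^{N} \mathbf{1}_{\Delta p_i}' \log \pi_i ,$$
where $\pi_i = \pi(\mu_i, \sigma_i^2; \alpha)$ is the vector of state probabilities at time $i$, the logarithm is taken element by element, and $\mathbf{1}_{\Delta p_i}$ is the $(2K+1)$-vector with a one in the position of the observed state and zeros elsewhere.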
In the IV-FIARCH model the parameter $d$ replaces $\gamma_1$ in the log-likelihood function. In the empirical part of the paper (section 5) we use numerical procedures to calculate the gradient and the Hessian matrix of the log-likelihood function, and the BHHH method of Berndt, Hall, Hall and Hausman (1974) for numerical maximization of the log-likelihood functions. Standard errors of the parameter estimates are computed as the square roots of the diagonal elements of the inverted negative Hessian matrix at the maximum of the log-likelihood function.
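A minimal sketch of the log-likelihood for the simplest case: a three-state IV-GARCH(1,1) model with constant $\mu$ and constant $\gamma_0$, so that no exogenous variables $x_i$ or $z_i$ enter. The recursion is started at $\sigma_1^2 = \gamma_0/(1-\gamma_1)$, which is our assumption, since the text does not state how the recursion is initialized. The function returns $-\infty$ when the volatility recursion leaves the positive half-line, so that a numerical optimizer treats such parameter values as inadmissible.

```python
import math


def logistic_cdf(x):
    """Logistic distribution function, evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def iv_garch_loglik(dp, gamma0, gamma1, gamma2, mu=0.0):
    """Log-likelihood of a three-state IV-GARCH(1,1) model.

    dp: observed price changes in ticks, censored to {-1, 0, 1}.
    Returns sum_i log pi_{dp_i}(mu, sigma_i^2), with sigma_1^2 = gamma0 / (1 - gamma1).
    """
    s = gamma0 / (1.0 - gamma1)
    ll = 0.0
    for x in dp:
        if s <= 0.0:
            return -math.inf                   # volatility left the admissible region
        sigma = math.sqrt(s)
        lo = logistic_cdf((-1.0 - mu) / sigma)
        hi = logistic_cdf((1.0 - mu) / sigma)
        probs = {-1: lo, 0: hi - lo, 1: 1.0 - hi}
        ll += math.log(probs[x])               # contribution 1'_{dp_i} log(pi_i)
        expected_sq = probs[-1] + probs[1]     # E(dp^2 | sigma^2), as k^2 = 1 for k = -1, 1
        s = gamma0 + gamma1 * s + gamma2 * (x * x - expected_sq)
    return ll


print(iv_garch_loglik([0, 1, 0, 0, -1, 0, 1], gamma0=0.05, gamma1=0.85, gamma2=0.1))
```

The negative of this function could be passed to a general-purpose optimizer such as `scipy.optimize.minimize`; the BHHH algorithm used in the paper is not reproduced here.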
Diagnostics for the IV-GARCH and IV-FIARCH models can be based on generalized residuals defined as follows:
$$u_i = \frac{\Delta p_i - E(\Delta p_i \mid \sigma_i^2)}{\sqrt{V(\Delta p_i \mid \sigma_i^2)}} . \tag{6}$$
By construction, under the maintained hypothesis that $\{\Delta p_i : i \in \mathbb{Z}\}$ is generated by an IV-GARCH or IV-FIARCH model, the sequence $\{u_i : 1 \le i \le N\}$ has zero conditional mean and unit conditional variance, and is therefore homoscedastic and serially uncorrelated. A battery of standard test procedures can be applied to $\{u_i : 1 \le i \le N\}$ to test these properties in empirical applications of the models.
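The generalized residual (6) can be computed state by state. The sketch below does so for the three-state model. Because $E(u_i \mid \mathcal{F}_{i-1}) = 0$ and $E(u_i^2 \mid \mathcal{F}_{i-1}) = 1$ hold exactly under the model, the probability-weighted mean and second moment of the residuals over the three states give a quick sanity check of an implementation. Here $\mu$ is passed explicitly, whereas (6) suppresses it in the notation.

```python
import math


def logistic_cdf(x):
    """Logistic distribution function, evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def three_state_probs(mu, sigma2):
    """{-1: pi_{-1}, 0: pi_0, 1: pi_1} for K = 1."""
    sigma = math.sqrt(sigma2)
    lo = logistic_cdf((-1.0 - mu) / sigma)
    hi = logistic_cdf((1.0 - mu) / sigma)
    return {-1: lo, 0: hi - lo, 1: 1.0 - hi}


def generalized_residual(dp, mu, sigma2):
    """u = (dp - E(dp)) / sqrt(V(dp)) under the three-state distribution."""
    probs = three_state_probs(mu, sigma2)
    mean = sum(k * p for k, p in probs.items())
    var = sum(k * k * p for k, p in probs.items()) - mean ** 2
    return (dp - mean) / math.sqrt(var)


print([round(generalized_residual(k, 0.4, 1.2), 4) for k in (-1, 0, 1)])
```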
4 Data and descriptive statistics
In this section we give an overview of the high-frequency transaction data used in the empirical part of the paper (section 5). We present simple descriptive statistics of the dataset, discuss evidence of long-range dependence in the volatility of high-frequency price changes, and examine how this dependence is affected by the tails of the price change distribution.
In section 5 we apply the IV-GARCH and IV-FIARCH models to high-frequency IBM trades data. This dataset has been used in a series of recent papers by Engle and Russell (1998), Russell and Engle (1998) and Engle (2000). It originates from the TORQ database and covers trades in IBM stock on weekdays during the three-month period from November 1990 through January 1991. There are 60328 observations in the sample; besides the date and time stamp, the dataset includes the traded volume, the transaction price and the bid and ask prices of the stock at the time of the trade.
From this dataset we extract transaction prices and create price change data $\Delta p_i$ by taking their first differences. The prices and price changes are discrete because of the institutional structure of the NYSE, where the stock is traded: $\Delta p_i$ is a multiple of 1/8 of a dollar, and we delete a few "keypunch" errors where this is not the case. In addition to the high-frequency price changes, we calculate inter-trade durations and construct trading-hour indicator dummies to pick up intraday seasonal effects. We also create buyer-seller indicators by comparing the current transaction price with the mid-quote prevailing at least 5 seconds before the transaction; see Campbell, Lo and MacKinlay (1997) pp. 136-137 for a discussion of this methodology.
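The buyer-seller indicator described above can be sketched as follows: for each trade, find the most recent quote time-stamped at least 5 seconds earlier and compare the trade price with that quote's mid-point. The record layout (times in seconds, prices in dollars), the function name, the example data, and the convention of assigning 0 to trades at the mid-quote or without an earlier quote are our assumptions, not details given in the text.

```python
import bisect


def buyer_seller_indicator(trades, quotes, lag=5.0):
    """+1 for buyer-initiated, -1 for seller-initiated, 0 if undetermined.

    trades: list of (time, price); quotes: list of (time, bid, ask) sorted by time.
    """
    quote_times = [q[0] for q in quotes]
    out = []
    for t, price in trades:
        j = bisect.bisect_right(quote_times, t - lag) - 1   # last quote at or before t - lag
        if j < 0:
            out.append(0)                                   # no sufficiently old quote
            continue
        _, bid, ask = quotes[j]
        mid = 0.5 * (bid + ask)
        out.append(1 if price > mid else (-1 if price < mid else 0))
    return out


quotes = [(0.0, 100.0, 100.25), (10.0, 100.125, 100.375)]          # hypothetical quotes
trades = [(3.0, 100.25), (12.0, 100.0), (16.0, 100.25), (20.0, 100.375)]
print(buyer_seller_indicator(trades, quotes))
```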
It has become common practice in the literature to filter out observations with both a zero inter-trade duration and a zero price change; see Russell and Engle (1998) and Jasiak (1998), among others. This adjustment is believed to reduce the influence of so-called split trades, in which larger orders are divided into a number of smaller ones traded at the same price. The remaining price changes can therefore be attributed to unique transactions, with price movements driven by new information arrivals and/or position adjustments and liquidity considerations. In the empirical application of the IV-GARCH and IV-FIARCH models in section 5 we follow this procedure for the IBM data.
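The construction of tick-valued price changes and the split-trade filter can be sketched with exact rational arithmetic, which avoids floating-point trouble when testing whether a change is a whole multiple of 1/8 of a dollar. Prices are passed as decimal strings and times in seconds; the input layout, function names and example data are hypothetical, and dropping every change that is not a whole number of ticks is our simplified stand-in for the deletion of keypunch errors.

```python
from fractions import Fraction

TICK = Fraction(1, 8)   # tick size in dollars for the sample period


def tick_changes(times, prices):
    """(duration, change in ticks) for consecutive trades; off-grid changes are dropped."""
    rows = []
    for i in range(1, len(prices)):
        ticks = (Fraction(prices[i]) - Fraction(prices[i - 1])) / TICK
        if ticks.denominator != 1:          # not a whole number of ticks
            continue
        rows.append((times[i] - times[i - 1], int(ticks)))
    return rows


def drop_split_trades(rows):
    """Remove observations with both zero duration and zero price change."""
    return [(dur, dp) for dur, dp in rows if not (dur == 0 and dp == 0)]


times = [0, 0, 3, 5, 5]
prices = ["100", "100", "100.125", "100", "100"]
rows = tick_changes(times, prices)
print(rows, drop_split_trades(rows))
```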
Table 1: Probabilities of ∆pi.

  s_k (dollars)    Pr(∆p_i = s_k)
  ≤ -0.500         0.0042211
    -0.375         0.0027775
    -0.250         0.014217
    -0.125         0.16002
     0.000         0.63633
     0.125         0.16016
     0.250         0.014947
     0.375         0.0031064
  ≥  0.500         0.0042211
The distribution of the resulting $\Delta p_i$ series is shown in Figure 1. Table 1 shows the state probabilities of high-frequency $\Delta p_i$, where we censor price changes larger than 1/2 of a dollar in absolute value. Although the distribution of the $\Delta p_i$ series in Figure 1 is notably symmetric around its middle state, Table 1 shows that the right half of the distribution has slightly higher probability mass, reflecting the upward drift of the IBM stock price during the sample period. Nevertheless, the assumption of a symmetric unconditional distribution of price changes in the IV-GARCH and IV-FIARCH models of section 3 seems warranted in view of the evidence in Table 1 and Figure 1. Note that the parameter $\mu_i$ in the IV-GARCH and IV-FIARCH models is designed to pick up short-term fluctuations in the first moment of high-frequency price changes, and is likely to have only limited success in describing its long-term trend.
Descriptive statistics of the $\Delta p_i$ series and its transformations are given in Table 2. After all data adjustments there are 54725 observations in the sample. The sample average of the series is statistically indistinguishable from zero, and the small variance of $\Delta p_i$ reflects the dominant probability of zero price movements in the data. Symmetry of the distribution of $\Delta p_i$ is confirmed by the statistically insignificant skewness statistic. The portmanteau statistics in the table clearly indicate a dynamic structure in both the first and second moments of high-frequency price changes. This is further confirmed by the estimated autocorrelation functions in Figure 2.¹⁰ While there is a significant negative first-order autocorrelation in the $\Delta p_i$ series, all higher-order autocorrelations are statistically insignificant. This pattern corresponds very well to the bid-ask bounce model of Roll (1984). In the empirical part of the paper (section 5) we include lags of the buyer-seller indicator in our specification of $\mu_i$ to capture this effect.
Table 2: Sample statistics of high-frequency price changes.

                 ∆p_i           |∆p_i|        ∆p_i^2
  Mean           0.000342622    0.0558748*    0.0140304*
  Variance       0.0140303      0.0109084     0.0244471
  Skewness       -0.02079       8.3517*       61.4053*
  Kurtosis       125.193*       187.434*      4555.33*
  Q(500)         12773.6*       42829*        32837.3*
  Maximum        3.625          3.625         13.1406
  Minimum        -3.625         0             0
  No. of obs.    54725          54725         54725

Notes: ∆p_i denotes the high-frequency price change for IBM transaction data. Tick size equals 1/8 of a dollar. Skewness and kurtosis statistics and their standard errors are computed according to Jarque and Bera (1987). Q(500) denotes the Ljung-Box statistic with autocorrelations up to 500 lags; see Ljung and Box (1978). An asterisk indicates significance at the 5% level for the appropriate distribution.
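For reference, the Ljung-Box statistic in Table 2 is $Q(h) = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2/(n-k)$, where $\hat\rho_k$ is the sample autocorrelation at lag $k$; under the null of no autocorrelation it is approximately $\chi^2_h$ distributed. The sketch computes it in plain Python, together with the $\pm 1.96/\sqrt{T}$ band used for the correlograms, and confirms that $T = 54725$ gives the band of $\pm 0.0084$ used for Figure 2.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_1, ..., rho_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box(x, h):
    """Ljung-Box Q(h) = n(n+2) * sum_{k=1}^{h} rho_k^2 / (n - k)."""
    n = len(x)
    acf = sample_acf(x, h)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


def acf_band(n, z=1.96):
    """Half-width of the approximate 95% band for autocorrelations of an i.i.d. series."""
    return z / math.sqrt(n)


print(round(acf_band(54725), 4))
print(ljung_box([1.0, -1.0] * 50, 1))     # strongly negatively autocorrelated toy series
```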
The lower panel of Figure 2 reveals a substantial degree of persistence in the second moment of high-frequency price changes in the IBM trades data: the autocorrelations of the $|\Delta p_i|$ series remain significant up to lag 500. A similar pattern in irregularly spaced stock market data has been documented earlier by Rydberg and Shephard (2000), while Andersen and Bollerslev (1997a, 1997b, 1998) document long-range dependence in the volatility of five-minute foreign exchange returns.
¹⁰ Under the i.i.d. normality assumption, the 95% confidence band for estimated autocorrelations is given by $\pm 1.96/\sqrt{T}$. The confidence bands for the correlograms shown in figure 2 are ±0.0084 for both the $\Delta p_i$ and $|\Delta p_i|$ series.

[Figure 6: Correlograms of censored $|\Delta p_i|$ series: 7 states (upper panel), 5 states (middle panel) and 3 states (lower panel). Lags 0 to 500 on the horizontal axis.]

The number of parameters in the IV-GARCH and IV-FIARCH models of section 3 depends on the number of support points in the distribution of high-frequency price changes. As shown in table 1, the states of $\Delta p_i$ far in the tails of the distribution have quite small probabilities. Therefore, in section 5 of the paper we estimate the IV-GARCH and IV-FIARCH models with a reduced number of states of $\Delta p_i$, saving estimation time by cutting parameters that are likely to be estimated inefficiently. However, there has been little evidence in the literature on how censoring large price changes in high-frequency data affects the volatility dynamics of the $\Delta p_i$ series. Figure 6 depicts correlograms of the $|\Delta p_i|$ series censored to 7, 5 and 3 states¹¹. The very extreme states of the price change distribution do not play a major role in the volatility dynamics of the series. When $\Delta p_i$ is censored down to 3 states from the initial 37 states (equivalent to losing less than 5% of the information relative to the initial distribution), the autocorrelations of absolute price changes become remarkably low and are statistically insignificant already after 50 lags. With 7 states in the price change distribution, still far fewer than in the uncensored data, the dynamics of $|\Delta p_i|$ closely resemble those of the original series in figure 2.
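A minimal sketch of the censoring step, assuming that price changes are recorded as integer multiples of the tick and that censoring to an odd number of states K means clipping at $\pm(K-1)/2$ ticks. The text does not spell out this mapping, so it is an illustrative assumption. The helper also reports the share of observations moved by the clipping; whether this matches the "less than 5% of the information" measure used above is not stated, so treat it as one possible proxy.

```python
import numpy as np


def censor_to_states(dp_ticks, n_states):
    """Clip integer price changes to n_states symmetric support points.

    Assumption: n_states = 2*m + 1 and the censored support is {-m, ..., m}.
    """
    if n_states < 1 or n_states % 2 == 0:
        raise ValueError("n_states must be a positive odd integer")
    m = (n_states - 1) // 2
    return np.clip(np.asarray(dp_ticks), -m, m)


def share_censored(dp_ticks, n_states):
    """Fraction of observations whose value is changed by the censoring."""
    m = (n_states - 1) // 2
    return float(np.mean(np.abs(np.asarray(dp_ticks)) > m))
```

The absolute values of the censored series can then be passed to any sample autocorrelation routine to reproduce correlograms of the kind shown in figure 6.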
5 Application to IBM transaction data
In this section we report estimation results for the IV-GARCH and IV-FIARCH models using the IBM transaction dataset introduced in section 4. We study the influence of several explanatory variables on the conditional volatility part of the IV-GARCH and IV-FIARCH models, but our main goal is to gauge the overall success of these two models in explaining the volatility dynamics of discrete high-frequency data. We present model diagnostics based on the generalized residuals in equation (6) to assess the quality of the fit¹².

¹¹ The confidence band for autocorrelations in figure 6 is ±0.0084 for all three graphs.
In the application of the IV-GARCH and IV-FIARCH models to the IBM data, we censor high-frequency price changes to 7 states. This allows us to save on the number of estimated parameters while preserving the essential dynamic structure of the volatility in the data; see section 4. With 7 support points in the estimated distribution of $\Delta p_i$, the dimensionality of $\alpha$ is 6, of which 2 parameters are free.
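To make the count of thresholds concrete: a discrete distribution with 7 ordered support points needs 6 cut-off points. The sketch below uses a generic heteroscedastic ordered logit with location $\mu$ and scale $\sigma$. It is an assumption for illustration only, not the exact IV-GARCH parametrization of section 3, which additionally restricts the cut-offs so that only 2 $\alpha$ parameters are free; the cut-off values are hypothetical.

```python
import numpy as np


def ordered_logit_probs(cutpoints, mu, sigma):
    """Probabilities of len(cutpoints) + 1 ordered states.

    P(state <= k) = logistic((c_k - mu) / sigma) for the k-th cut-off c_k.
    """
    c = np.asarray(cutpoints, dtype=float)
    if np.any(np.diff(c) <= 0) or sigma <= 0:
        raise ValueError("cut-offs must increase and sigma must be positive")
    cdf = 1.0 / (1.0 + np.exp(-(c - mu) / sigma))
    upper = np.append(cdf, 1.0)     # P(state <= k) for k = 0, ..., K
    lower = np.insert(cdf, 0, 0.0)  # P(state <= k - 1)
    return upper - lower


# Hypothetical symmetric cut-offs for 7 states (price changes of -3..3 ticks).
cuts = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
p = ordered_logit_probs(cuts, mu=0.0, sigma=0.8)
```

With symmetric cut-offs and $\mu = 0$ the resulting 7 probabilities are symmetric around the zero-change state, which receives the largest mass.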
The conditional mean parameter $\mu_i$ and the parameter $\gamma_{0i}$ in the IV-GARCH and IV-FIARCH models are parametrized using the following sets of predetermined and exogenous variables. For the $\mu_i$ parameter we use:
- $Ibs_{i-1}$ is the lagged buyer-seller indicator, constructed as outlined in section 4. We include one lag of this variable to pick up the bid-ask bounce in the first moment of the $\Delta p_i$ series;
- $\Delta p_{i-1}$ is included to account for possible autoregressive dynamics in the first moment of the $\Delta p_i$ series. Hausman, Lo and MacKinlay (1992) found lags of the endogenous variable to be significant in the conditional mean of their ordered probit model;
- $Durat_i$ is the inter-trade duration variable for the current observation. We treat this variable as exogenous with respect to the price changes in all models in this section, although this assumption may not be entirely realistic; see Ghysels and Jasiak (1997) for evidence against it.
The parameter $\gamma_{0i}$ in the conditional volatility part of the models includes the following variables:
- $Negdp_{i-1}$ is an indicator variable for the occurrence of a negative price change at the previous observation. This variable is included to study a possible leverage effect in the high-frequency data; refer to Nelson (1991). Rydberg and Shephard (2003) found this variable to be significant in their ADS decomposition model;
- $Durat_i$ is the same variable that appears in the conditional mean part of the model. By including the inter-trade duration variable twice, we study the effect of trading intensity separately on the first and second moments of $\Delta p_i$. Hausman, Lo and MacKinlay (1992) found this variable marginally significant in the static conditional volatility part of their model;
- Trade hour dummies pick up possible intraday seasonality in the second moment of high-frequency price changes. Intraday seasonal patterns in the volatility of high-frequency data are widely reported for regularly spaced data; see Andersen, Bollerslev and Cai (2000) for recent evidence.

¹² All models in this paper are estimated using Ox version 2.20 for Linux 2.2.17; refer to Doornik (1998).
As mentioned in section 3, all variables in the conditional mean part of the IV-GARCH and IV-FIARCH models, with the exception of the constant term, should be normalized to have zero means. This is done for all regressors entering $\mu_i$ in both models. The signs of the coefficients of the variables listed above were not restricted during estimation. Therefore, the direction of the influence of the included explanatory variables on the second moment of high-frequency price changes coincides with the signs of the estimated parameters.
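The zero-mean normalization can be illustrated with a linear index. The linear form is an assumption here, since the functional form of $\mu_i$ is set out in section 3 and not reproduced in this section. With demeaned regressors the sample average of the index equals the constant, so the constant keeps its interpretation as the average conditional location. The coefficients are the IV-GARCH point estimates from table 3, used only as plug-in numbers; the regressor values are made up.

```python
import numpy as np


def mean_index(const, coefs, regressors):
    """Linear index const + sum_j b_j * (x_j - mean(x_j)).

    Every regressor is demeaned; the constant term is not.
    """
    X = np.column_stack([np.asarray(r, dtype=float) for r in regressors])
    Xc = X - X.mean(axis=0)
    return const + Xc @ np.asarray(coefs, dtype=float)


# Made-up data for five trades: lagged buyer-seller indicator,
# lagged price change and current inter-trade duration.
ibs_lag = [1, -1, 1, 0, -1]
dp_lag = [0, 1, -1, 0, 0]
durat = [2.0, 5.0, 1.0, 3.0, 9.0]

mu = mean_index(0.0043945, [-0.30770, -5.3752, -0.055171], [ibs_lag, dp_lag, durat])
```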
First, we discuss the estimation results for the short-memory IV-GARCH model. As seen from table 3, the $\gamma_1$ parameter of the conditional volatility process is highly significant but firmly below unity. With this parameter estimated at 0.93478, the half-life of a unit shock to the conditional variance process is only about 10 periods. This seems to contradict the evidence of long-range dependence in the volatility of the $\Delta p_i$ series presented in section 4. The generalized residual diagnostics in the bottom half of table 3 also hint at unexplained dynamics left in both the first and second moments of the $\Delta p_i$ series. However, graphical examination of the correlograms of the generalized residuals and absolute generalized residuals in figure 7¹³ reveals a dramatic reduction of the volatility dynamics of the residuals compared with the original series. In fact, as suggested by the lower right panel of figure 7, the significance of the Ljung-Box statistic of the squared residuals stems from unexplained first-order autocorrelation in the volatility of the $\Delta p_i$ series, indicating a possible omission of explanatory variables or the need for extra MA dynamics in $\sigma^2_i$. Most importantly, however, the IV-GARCH model seems to do a good job of picking up the conditional heteroscedasticity of high-frequency price changes.
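The half-life figure follows from geometric decay: in the skeleton (A.1) the weight of a shock after h periods is $\gamma_1^h$, so the half-life solves $\gamma_1^h = 1/2$, that is $h = \ln(1/2)/\ln\gamma_1$. A small check in Python:

```python
import math


def ar1_half_life(phi):
    """Periods h solving phi**h = 1/2 for a geometric decay rate 0 < phi < 1."""
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(phi)


h = ar1_half_life(0.93478)  # gamma_1 estimate for IV-GARCH in table 3
```

This gives roughly 10.3 periods, in line with the figure of about 10 periods quoted above.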
Most coefficients of the other explanatory variables in the IV-GARCH model in table 3 have the expected signs. The buyer-seller indicator $Ibs_{i-1}$ has the expected negative influence on the conditional mean part of the model, indicating an increased probability of a negative price change when the transaction is seller-initiated. However, as seen from the lower left panel of figure 7, the generalized residuals still retain significant first-order negative autocorrelation. This may signal the failure of the $Ibs_{i-1}$ variable to correctly classify all transactions in the dataset as initiated by either the buyer or the seller¹⁴. The $Durat_i$ variable is significant only in the conditional mean part, where it also indicates a right skew of the distribution of $\Delta p_i$ for longer inter-trade durations. As in the static model of Hausman, Lo and MacKinlay (1992), inter-trade durations have no significant influence in the conditional volatility part of the model. A somewhat surprising result shown in table 3 is the effect of the previous negative price change on the conditional volatility of $\Delta p_i$. In contrast to the leverage hypothesis of Nelson (1991), the conditional volatility of high-frequency $\Delta p_i$ in the IBM data becomes lower after a preceding stock price drop, and the coefficient estimate of $Negdp_{i-1}$ is statistically significant for the given sample size. The effect of the intraday seasonality dummies on $\sigma^2_i$ is also insignificant.
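The significance statements can be checked against table 3 by forming asymptotic z-ratios (estimate divided by standard error) and comparing them with the two-sided 5% normal critical value 1.96. The sketch covers the IV-GARCH column only; the parameter labels are shorthand introduced here.

```python
# IV-GARCH estimates and standard errors transcribed from table 3.
IV_GARCH = {
    "mean: Ibs(i-1)": (-0.30770, 0.0084919),
    "mean: dp(i-1)": (-5.3752, 0.091196),
    "mean: Durat(i)": (-0.055171, 0.0095503),
    "var: Negdp(i-1)": (-0.024142, 0.0071386),
    "var: Durat(i)": (-0.0017309, 0.0015081),
    "var: 9-10h": (-0.0011467, 0.0016714),
    "var: 11-12h": (-0.0016986, 0.0016620),
    "var: 13-14h": (-0.0020903, 0.0017012),
}


def z_ratios(table):
    """Map each parameter label to estimate / standard error."""
    return {name: est / se for name, (est, se) in table.items()}


def significant(table, crit=1.96):
    """Sorted labels whose |z| exceeds the two-sided 5% normal critical value."""
    return sorted(name for name, z in z_ratios(table).items() if abs(z) > crit)


z = z_ratios(IV_GARCH)
```

For example, the ratio for $Negdp_{i-1}$ is about -3.4, while that for $Durat_i$ in the variance equation is about -1.1.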
¹³ The confidence band for autocorrelations in figure 7 is ±0.0084 for all three graphs.
¹⁴ It is also necessary to note that the methodology used to construct the $Ibs_{i-1}$ variable does not allow for full classification of the data. In our dataset, around 21% of observations are left unclassified, which may also contribute to the remaining first-order autocorrelation in the generalized residuals.
Table 3: Results of modeling high-frequency IBM trades series.

                                  IV-GARCH                    IV-FIARCH
α parameters
  $\alpha_1$                      2.5553 (0.022118)           2.5555 (0.039110)
  $\alpha_2$                      3.4401 (0.041139)           3.3469 (0.064460)
Conditional mean parameters
  Const                           0.0043945 (0.0068764)       0.0086375 (0.013443)
  $Ibs_{i-1}$                     -0.30770 (0.0084919)        -0.32839 (0.016626)
  $\Delta p_{i-1}$                -5.3752 (0.091196)          -5.4225 (0.16522)
  $Durat_i$                       -0.055171 (0.0095503)       -0.10364 (0.030933)
Conditional variance parameters
  Const                           0.030987 (0.0037965)        1.5471 (0.44586)
  $Negdp_{i-1}$                   -0.024142 (0.0071386)       0.083911 (0.021269)
  $Durat_i$                       -0.0017309 (0.0015081)      -0.032359 (0.021301)
  9-10h                           -0.0011467 (0.0016714)      -0.21259 (0.047269)
  11-12h                          -0.0016986 (0.0016620)      -0.16512 (0.063266)
  13-14h                          -0.0020903 (0.0017012)      -0.17244 (0.064233)
  $\gamma_1$                      0.93478 (0.0086118)         n/a
  $\gamma_2$                      0.069761 (0.0060288)        0.10259 (0.011412)
  $d$                             n/a                         0.60528 (0.033268)
Generalized residual diagnostics
  Skewness                        -0.153967*                  -0.132489*
  Kurtosis                        5.55161*                    5.35271*
  Q(500)                          1777.92*                    1081.83*
  Q²(500)                         729.472*                    594.943*
  Log lik.                        -27929.287                  -8619.459
  No. of obs.                     54725                       22000
Notes: Parameters of the models are denoted as detailed in the text. Asymptotic maximum-likelihood standard errors are given in parentheses. Generalized residuals are calculated according to equation (6). Skewness and kurtosis statistics and their standard errors are computed according to Jarque and Bera (1987). Q(500) denotes the Ljung-Box statistic of the generalized residuals and Q²(500) that of the squared generalized residuals, with autocorrelations up to 500 lags; see Ljung and Box (1978). A star next to a test statistic indicates significance at the 5% level for the appropriate distribution.
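The residual diagnostics in the table notes can be sketched with the standard formulas: the Ljung-Box statistic $Q(m) = T(T+2)\sum_{k=1}^{m} r_k^2/(T-k)$, applied to $u_i$ for Q(500) and to $u_i^2$ for Q²(500), and the moment-based skewness and kurtosis that enter the Jarque-Bera test. The sketch assumes the generalized residuals $u_i$ are already available as an array; computing them requires equation (6), which is not reproduced here.

```python
import numpy as np


def ljung_box(x, m):
    """Ljung-Box Q(m) = T(T+2) * sum_{k=1}^m r_k^2 / (T-k)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    denom = np.dot(d, d)
    q = 0.0
    for k in range(1, m + 1):
        r_k = np.dot(d[:-k], d[k:]) / denom  # lag-k sample autocorrelation
        q += r_k ** 2 / (T - k)
    return T * (T + 2) * q


def skew_kurt(x):
    """Sample skewness m3 / m2^1.5 and kurtosis m4 / m2^2 (normal value: 3)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    m2, m3, m4 = (np.mean(d ** p) for p in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(x):
    """JB = T/6 * (S^2 + (K - 3)^2 / 4), asymptotically chi-squared(2)."""
    s, k = skew_kurt(x)
    return len(x) / 6.0 * (s ** 2 + (k - 3.0) ** 2 / 4.0)
```

With residuals `u`, Q(500) would be `ljung_box(u, 500)` and Q²(500) would be `ljung_box(u ** 2, 500)`, each compared here with a chi-squared(500) critical value, ignoring any degrees-of-freedom adjustment for estimated parameters.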
[Figure 7: Residuals diagnostics for IV-GARCH model. Correlograms of $u_i$ (lower left panel) and $|u_i|$ (lower right panel). Estimated density of $u_i$ (upper right panel) and correlogram of $|\Delta p_i|$ (upper left panel). Correlograms are plotted over lags 0 to 500.]
[Figure 8: Residuals diagnostics for IV-FIARCH model. Correlograms of $u_i$ (lower left panel) and $|u_i|$ (lower right panel). Estimated density of $u_i$ (upper right panel) and correlogram of $|\Delta p_i|$ (upper left panel). Correlograms are plotted over lags 0 to 500.]
Estimation results for the long-memory IV-FIARCH model are given in the right half of table 3¹⁵. We use the same set of exogenous explanatory variables as in the IV-GARCH model. The estimate of the fractional integration coefficient $d$ is above 0.5 and highly statistically significant. Figure 8¹⁶ shows high persistence of the volatility of $\Delta p_i$ in the estimation subsample as well as a dramatic reduction of this persistence in the estimated generalized residuals, although, as with the IV-GARCH model, some significant low-order autocorrelation is still present. The effects of the included explanatory variables remain the same, except for $Negdp_{i-1}$ in the conditional variance part of the model. Even though its coefficient now supports the leverage effect of Nelson (1991), the estimated standard error of the coefficient is relatively large for the given sample size.
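The contrast between geometric and hyperbolic memory can be made concrete with the lag weights of the two noise-free skeletons: $\gamma_1^{j-1}$ in (A.1) and $\theta_{j-1}$ in (A.4), where the $\theta_j$ are the coefficients of $(1-z)^{-d}$ and can be computed by the recursion $\theta_0 = 1$, $\theta_j = \theta_{j-1}(j-1+d)/j$. The sketch plugs in the table 3 point estimates purely for illustration.

```python
import numpy as np


def frac_weights(d, n):
    """First n coefficients theta_0, ..., theta_{n-1} of (1 - z)^(-d)."""
    theta = np.empty(n)
    theta[0] = 1.0
    for j in range(1, n):
        theta[j] = theta[j - 1] * (j - 1 + d) / j
    return theta


d_hat, gamma1_hat = 0.60528, 0.93478  # IV-FIARCH d and IV-GARCH gamma_1 from table 3
lags = 501
theta = frac_weights(d_hat, lags)          # hyperbolic: decays like j^(d-1)
geometric = gamma1_hat ** np.arange(lags)  # geometric: decays like gamma_1^j
```

At lag 500 the fractional weight is still of order 0.05, whereas the geometric weight is below 1e-14. For 0 < d < 1 the $\theta_j$ also decrease monotonically, which is the non-negativity of $\theta_{j-1} - \theta_j$ used in the proof of Theorem 4.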
Surprisingly, the IV-GARCH and IV-FIARCH models perform very much alike in terms of the residual diagnostics shown in figures 7 and 8, even though the AR(1) structure of the volatility process of the IV-GARCH model is not able to pick up the high persistence exhibited by the data. The relatively low point estimate of $\gamma_1$ reported in table 3 seems to reaffirm the analogous results documented in Ghysels and Jasiak (1997) and Hasbrouck (1999); see subsection 2.4. In summary, the findings call for further investigation into potential infrequent changes in market volatility regimes that can create the statistical illusion of long-range dependence in the data.
6 Conclusions
In this paper we present a class of models for time series of discrete high-frequency price changes. In contrast to the ACM model of Russell and Engle (1998) and the ADS decomposition model of Rydberg and Shephard (2003), our models have separate sets of parameters linked to the first two moments of the conditional distribution of discrete price changes. We borrow the idea of Hausman, Lo and MacKinlay (1992) and specify the discrete distribution of $\Delta p_i$ similarly to the well-known ordered logit model. Unlike the latter study, however, our models are cast entirely in terms of discrete random variables, including the procedures for model diagnostics. We introduce the IV-GARCH and IV-FIARCH models for heteroscedastic sequences of discrete price changes, where the volatility parameter has a dynamic structure resembling that of the GARCH models of Bollerslev (1986).
Separate sets of parameters for the moments of discrete price changes allow us to isolate the effects of exogenous variables on the conditional mean and conditional variance of $\Delta p_i$. We present an application of the IV-GARCH(1,1) and IV-FIARCH(∞) models to high-frequency IBM trades data, where we study the effects of inter-trade durations, the buyer-seller indicator and previous negative price changes on the moments of high-frequency price changes.
We find that both the IV-GARCH and IV-FIARCH models explain the dynamic heteroscedasticity of the $\Delta p_i$ series quite well. In particular, both models succeed in explaining most of the long-range dependence observed in the absolute price change series, although some low-order dependence remains. Unexpectedly, the short-memory IV-GARCH(1,1) model fits the dataset at least as well as the IV-FIARCH(∞) model in terms of residual diagnostics. This may indicate that the observed long-range dependence in $|\Delta p_i|$ comes from infrequent changes in the volatility regimes of the stock market, rather than from a genuine long-memory structure in the second moment.

¹⁵ Please note that the model is estimated on a reduced sample due to computational time considerations.
¹⁶ The confidence band for autocorrelations in figure 8 is ±0.0132 for all three graphs.
Current research efforts in the literature are directed at joint modeling of price changes and durations in high-frequency financial data; see Gerhard and Pohlmeier (2000). In this paper we estimate a conditional model of price change volatility and find an insignificant effect of the immediately preceding inter-trade duration on the $\sigma^2_i$ parameter in both the IV-GARCH and IV-FIARCH models. This finding is surprising and calls for further investigation of the interaction between the two variables in a joint model. A combination of the ACD model of Engle and Russell (1998) and the IV-GARCH model introduced in this paper may provide a useful tool for such analysis.
The interrelation between possible volatility regimes and long-range dependence in the second moment of high-frequency price changes is another research issue raised by the findings in this paper. The literature on potential links between structural breaks and long memory in the volatility of financial data is extensive (see Hamilton and Susmel (1994), Lamoureux and Lastrapes (1990) and Liu (2000) among others), but it is mostly limited to lower-frequency financial data. The IV-FIARCH model offers an opportunity to study this issue in high-frequency datasets.
7 Appendix
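PROOF OF THEOREM 1: The noise-free skeleton of the volatility process in (3) is given by the following expression, for any arbitrarily large $0 < N < \infty$:

$$\sigma^2_i = \sum_{j=1}^{N} \gamma_1^{j-1}\left[\gamma_0 - \gamma_2 E(\Delta p^2_{i-j} \mid \sigma^2_{i-j})\right] + \gamma_1^N \sigma^2_{i-N}. \qquad (A.1)$$

Assume that the process is started at $\sigma^2_{-N}$. Recursive substitution into the equation above and positivity of the map $\sigma^2 \mapsto \gamma_0 + \gamma_1\sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2)$ show that:

$$\begin{aligned}
\sigma^2_{-N} &= \gamma_0 \\
\sigma^2_{-N+1} &= \gamma_0 - \gamma_2 E(\Delta p^2_{-N} \mid \sigma^2_{-N}) + \gamma_1 \sigma^2_{-N} > 0 \\
\sigma^2_{-N+2} &= \left[\gamma_0 - \gamma_2 E(\Delta p^2_{-N+1} \mid \sigma^2_{-N+1})\right] + \gamma_1\left[\gamma_0 - \gamma_2 E(\Delta p^2_{-N} \mid \sigma^2_{-N})\right] + \gamma_1^2 \sigma^2_{-N} \\
&= \left[\gamma_0 - \gamma_2 E(\Delta p^2_{-N+1} \mid \sigma^2_{-N+1})\right] + \gamma_1 \sigma^2_{-N+1} > 0 \\
&\;\;\vdots
\end{aligned}$$

Letting $N$ tend to infinity establishes the required result.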
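PROOF OF THEOREM 2: We utilize the invariance principle given in Theorem 2.9 of Tong (1990). We have to show that the mapping $\sigma^2 \mapsto \gamma_0 + \gamma_1\sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2)$ is continuous and bounded, and that a Lyapunov function exists for the recursion $\sigma^2_i = \gamma_0 + \gamma_1\sigma^2_{i-1} - \gamma_2 E(\Delta p^2 \mid \sigma^2_{i-1})$.
1. Continuity of the mapping $\sigma^2 \mapsto \gamma_0 + \gamma_1\sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2)$ follows from A3, whereby the function $\sigma^2 \mapsto E(\Delta p^2 \mid \sigma^2)$ is continuous. Boundedness is an immediate consequence of the parameter restrictions $\gamma_0, \gamma_2 > 0$, $0 < \gamma_1 < 1$ and non-negativity of the function $\sigma^2 \mapsto E(\Delta p^2 \mid \sigma^2)$.
2. Define the identity map $V(\sigma^2) \equiv \sigma^2$. In order for $V$ to be a Lyapunov function for the recursion $\sigma^2_i = \gamma_0 + \gamma_1\sigma^2_{i-1} - \gamma_2 E(\Delta p^2 \mid \sigma^2_{i-1})$, we have to check that

$$V\big(\gamma_0 + \gamma_1\sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2)\big) - V(\sigma^2) = \gamma_0 + \gamma_1\sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2) - \sigma^2 \le 0$$

for $\sigma^2 \in G \subseteq \mathbb{R}_+$. Under A4 there exists a unique solution of the equation $\gamma_0 - (1-\gamma_1)\sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2) = 0$, provided that $\lim_{\sigma^2 \downarrow 0} E(\Delta p^2 \mid \sigma^2) \le \gamma_0/\gamma_2$. We denote this solution by $\bar\sigma^2$. Hence, $V$ is a Lyapunov function for the recursion $\sigma^2_i = \gamma_0 + \gamma_1\sigma^2_{i-1} - \gamma_2 E(\Delta p^2 \mid \sigma^2_{i-1})$ on $G = [\bar\sigma^2, \infty)$.

Global stability of the recursion $\sigma^2_i = \gamma_0 + \gamma_1\sigma^2_{i-1} - \gamma_2 E(\Delta p^2 \mid \sigma^2_{i-1})$ follows from the fact that it converges to $\bar\sigma^2$ from any $\sigma^2_0 \in G$.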
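A numerical illustration of Theorem 2 under loudly hypothetical ingredients: the conditional probabilities $\pi_k(\sigma^2)$ are taken as a discretized Gaussian kernel on the 7 points {-3, ..., 3}, and $\gamma_0 = 0.03$, $\gamma_1 = 0.93$, $\gamma_2 = 0.07$ are round illustrative values, not estimates. The code locates the root $\bar\sigma^2$ of $\gamma_0 - (1-\gamma_1)\sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2) = 0$ by bisection, iterates the noise-free recursion from $\sigma^2_0 = 1$, and checks that the path settles at $\bar\sigma^2$ and that the Lyapunov difference is non-positive above $\bar\sigma^2$.

```python
import math

K = 3                          # hypothetical support {-K, ..., K}
G0, G1, G2 = 0.03, 0.93, 0.07  # illustrative gamma_0, gamma_1, gamma_2


def probs(s2):
    """Hypothetical pi_k(s2): weights exp(-k^2 / (2 s2)), normalized over k."""
    w = [math.exp(-k * k / (2.0 * s2)) for k in range(-K, K + 1)]
    total = sum(w)
    return [wk / total for wk in w]


def cond_second_moment(s2):
    """E(dp^2 | s2) under the hypothetical pi_k."""
    return sum(k * k * p for k, p in zip(range(-K, K + 1), probs(s2)))


def skeleton(s2):
    """Noise-free map s2 -> gamma_0 + gamma_1 s2 - gamma_2 E(dp^2 | s2)."""
    return G0 + G1 * s2 - G2 * cond_second_moment(s2)


def fixed_point(lo=1e-9, hi=1e3, tol=1e-13):
    """Bisection for gamma_0 - (1 - gamma_1) s2 - gamma_2 E(dp^2 | s2) = 0."""
    def f(s2):
        return skeleton(s2) - s2

    if not (f(lo) > 0 > f(hi)):
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)


s2_bar = fixed_point()
s2 = 1.0
for _ in range(5000):  # iterate the noise-free recursion
    s2 = skeleton(s2)
```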
LEMMA 1. Suppose that there exist a measurable function $V : X \to [0,\infty)$ and a set $A \in \mathcal{B}(X)$ satisfying:
1. For some $b < \infty$, $PV \le V - 1 + b\mathbf{1}_A$.
2. $\lim_{i\to\infty} \sup_{a \in A} E\big(V(\sigma^2_i)\mathbf{1}(\tau_A > i) \mid \sigma^2_0 = a\big) = 0$.
3. For each $m \ge 1$, the family of probability measures $\{\frac{1}{m}\sum_{k=1}^{m} P^k(a,\cdot) : a \in A\}$ is tight.
Then the chain is bounded in probability on average.
PROOF: The proof follows directly from Glynn and Meyn (1997).
LEMMA 2. If $\{\sigma^2_i : i \in \mathbb{Z}\}$ is a weak Feller chain, then for each $m \ge 1$ and compact $A$, the family of probability measures $\{\frac{1}{m}\sum_{k=1}^{m} P^k(a,\cdot) : a \in A\}$ is tight.
PROOF: The proof is given in Davis, Rydberg, Shephard and Streett (2001).
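PROOF OF THEOREM 3: To show existence and uniqueness of the stationary distribution of the IV-GARCH(1,1) model, we show that the Markov chain $\{\sigma^2_i : i \in \mathbb{Z}\}$ defined by equation (4) is bounded in probability on average, is an e-chain and possesses a reachable state.
1. To show that the chain $\{\sigma^2_i : i \in \mathbb{Z}\}$ is bounded in probability on average, we verify the three conditions given in Lemma 1 of Glynn and Meyn (1997).
(a) Let the function $V$ be given by the identity map $V(x) = x$, let the set $A$ be the interval $[\bar\sigma^2, \frac{1+\gamma_0}{1-\gamma_1}]$, where $\bar\sigma^2 = \frac{\gamma_0 - \gamma_2 E(\Delta p^2 \mid \bar\sigma^2)}{1-\gamma_1}$ is given in Theorem 2, and let $b = 1 + \gamma_2 E(\Delta p^2 \mid \bar\sigma^2)$. Recall that (4) is defined on $\mathbb{R}_+$. We are required to show that $PV - V \le -1 + b\mathbf{1}_A$. This follows from:

$$\begin{aligned}
PV - V &\equiv E(\sigma^2_i \mid \sigma^2_{i-1} = x) - x \\
&= \gamma_0 + \gamma_1 x + \gamma_2 E(\Delta p^2_{i-1} \mid \sigma^2_{i-1}) - \gamma_2 E(\Delta p^2_{i-1} \mid \sigma^2_{i-1}) - x \\
&= \gamma_0 + x(\gamma_1 - 1) \\
&\le -1 + \big(1 + \gamma_2 E(\Delta p^2 \mid \bar\sigma^2)\big)\mathbf{1}_A.
\end{aligned}$$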
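(b) Let the set $A$ be as before and note that by the Cauchy-Schwarz inequality we have:

$$\lim_{i\to\infty}\sup_{a\in A} E\big(\sigma^2_i \mathbf{1}(\tau_A > i) \mid \sigma^2_0 = a\big) \le \lim_{i\to\infty}\sup_{a\in A} E^{1/2}\big((\sigma^2_i)^2 \mid \sigma^2_0 = a\big)\, P^{1/2}(\tau_A > i \mid \sigma^2_0 = a). \qquad (A.2)$$

The first term in the inequality above can be written as:

$$\begin{aligned}
E\big((\sigma^2_i)^2 \mid \sigma^2_0 = a\big) &= E\Big(\big(\gamma_0 + \gamma_1\sigma^2_{i-1} + \gamma_2[\Delta p^2_{i-1} - E(\Delta p^2_{i-1} \mid \sigma^2_{i-1})]\big)^2 \,\Big|\, \sigma^2_0 = a\Big) \\
&= E\Big(E\big(\gamma_0^2 + \gamma_1^2(\sigma^2_{i-1})^2 + \gamma_2^2[\Delta p^2_{i-1} - E(\Delta p^2_{i-1} \mid \sigma^2_{i-1})]^2 \\
&\qquad + 2\gamma_0\gamma_1\sigma^2_{i-1} + 2\gamma_0\gamma_2[\Delta p^2_{i-1} - E(\Delta p^2_{i-1} \mid \sigma^2_{i-1})] \\
&\qquad + 2\gamma_1\gamma_2\sigma^2_{i-1}[\Delta p^2_{i-1} - E(\Delta p^2_{i-1} \mid \sigma^2_{i-1})] \,\big|\, \sigma^2_{i-1}\big) \,\Big|\, \sigma^2_0 = a\Big) \\
&= \gamma_0^2 + \gamma_1^2 E\big((\sigma^2_{i-1})^2 \mid \sigma^2_0 = a\big) + \gamma_2^2\, \mathbb{V}(\Delta p^2_{i-1} \mid \sigma^2_0 = a) + 2\gamma_0\gamma_1 E(\sigma^2_{i-1} \mid \sigma^2_0 = a).
\end{aligned}$$

By recursively substituting $E((\sigma^2_i)^2 \mid \sigma^2_0 = a)$ into the expression above, the following equation obtains:

$$E\big((\sigma^2_i)^2 \mid \sigma^2_0 = a\big) = \gamma_0^2\sum_{j=0}^{i-1}\gamma_1^{2j} + \gamma_1^{2i}a^2 + 2\gamma_0\gamma_1\sum_{j=0}^{i-1}\gamma_1^{2j} E(\sigma^2_{i-j-1} \mid \sigma^2_0 = a) + \gamma_2^2\sum_{j=0}^{i-1}\gamma_1^{2j}\, \mathbb{V}(\Delta p^2_{i-j-1} \mid \sigma^2_0 = a). \qquad (A.3)$$

In this expression, $E(\sigma^2_i \mid \sigma^2_0 = a)$ can be written as follows:

$$\begin{aligned}
E(\sigma^2_i \mid \sigma^2_0 = a) &= E\big(\gamma_0 + \gamma_1\sigma^2_{i-1} + \gamma_2[\Delta p^2_{i-1} - E(\Delta p^2_{i-1} \mid \sigma^2_{i-1})] \mid \sigma^2_0 = a\big) \\
&= E\Big(E\big(\gamma_0 + \gamma_1\sigma^2_{i-1} + \gamma_2[\Delta p^2_{i-1} - E(\Delta p^2_{i-1} \mid \sigma^2_{i-1})] \mid \sigma^2_{i-1}\big) \,\Big|\, \sigma^2_0 = a\Big) \\
&= \gamma_0 + \gamma_1 E(\sigma^2_{i-1} \mid \sigma^2_0 = a) \\
&= \dots = \gamma_0\sum_{j=0}^{i-1}\gamma_1^j + \gamma_1^i a.
\end{aligned}$$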
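From this we have that:

$$\begin{aligned}
2\gamma_0\gamma_1\sum_{j=0}^{i-1}\gamma_1^{2j} E(\sigma^2_{i-j-1} \mid \sigma^2_0 = a) &= 2\gamma_0\gamma_1\sum_{j=0}^{i-1}\gamma_1^{2j}\Big(\gamma_0\sum_{l=0}^{i-j-2}\gamma_1^l + \gamma_1^{i-j-1}a\Big) \\
&= 2\gamma_0\gamma_1^i a\sum_{j=0}^{i-1}\gamma_1^j + 2\gamma_0^2\gamma_1\sum_{j=0}^{i-1}\gamma_1^{2j}\sum_{l=0}^{i-j-2}\gamma_1^l,
\end{aligned}$$

from where, using the fact that $\sum_{l=0}^{i-j-2}\gamma_1^l \le \frac{1}{1-\gamma_1}$, we arrive at the limit:

$$\lim_{i\to\infty} 2\gamma_0\gamma_1\sum_{j=0}^{i-1}\gamma_1^{2j} E(\sigma^2_{i-j-1} \mid \sigma^2_0 = a) \le \frac{2\gamma_0^2\gamma_1}{(1-\gamma_1)(1-\gamma_1^2)}.$$

Next, consider $\mathbb{V}(\Delta p^2_i \mid \sigma^2_0 = a)$ in equation (A.3). Because the support of the random variable $\Delta p^2_i$ is a finite subset of $\mathbb{N}_0$, its variance is always bounded. Let the upper bound of $\mathbb{V}(\Delta p^2_i \mid \sigma^2_0 = a)$ be given by $\bar{\mathbb{V}}$. Then we have the following limit:

$$\lim_{i\to\infty}\gamma_2^2\sum_{j=0}^{i-1}\gamma_1^{2j}\,\mathbb{V}(\Delta p^2_{i-j-1} \mid \sigma^2_0 = a) \le \lim_{i\to\infty}\gamma_2^2\sum_{j=0}^{i-1}\gamma_1^{2j}\,\bar{\mathbb{V}} = \frac{\gamma_2^2\bar{\mathbb{V}}}{1-\gamma_1^2}.$$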
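Combining these results for the first part of equation (A.2), we get the following inequality:

$$E\big((\sigma^2_i)^2 \mid \sigma^2_0 = a\big) \le \frac{\gamma_0^2}{1-\gamma_1^2} + a^2 + \frac{2\gamma_0^2\gamma_1}{(1-\gamma_1)(1-\gamma_1^2)} + \frac{\gamma_2^2\bar{\mathbb{V}}}{1-\gamma_1^2} = c_1 < \infty.$$

The second part of equation (A.2) follows from Theorem 11.3.4 of Meyn and Tweedie (1993), whereby:

$$P(\tau_A > i \mid \sigma^2_0 = a) \le \frac{E(\tau_A \mid \sigma^2_0 = a)}{i+1} \le \frac{V(a) + b\mathbf{1}_A(a)}{i+1} \le \frac{a + 1 + \gamma_2 E(\Delta p^2 \mid \bar\sigma^2)}{i+1}.$$

The second condition of Lemma 1 then follows from:

$$\begin{aligned}
\lim_{i\to\infty}\sup_{a\in A} E\big(\sigma^2_i\mathbf{1}(\tau_A > i) \mid \sigma^2_0 = a\big) &\le c_1^{1/2}\lim_{i\to\infty}\sup_{a\in A}\left(\frac{a + 1 + \gamma_2 E(\Delta p^2 \mid \bar\sigma^2)}{i+1}\right)^{1/2} \\
&\le c_1^{1/2}\lim_{i\to\infty}\left(\frac{\frac{1+\gamma_0}{1-\gamma_1} + 1 + \gamma_2 E(\Delta p^2 \mid \bar\sigma^2)}{i+1}\right)^{1/2} = 0.
\end{aligned}$$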
(c) The third condition of Lemma 1 is a consequence of Lemma 2 if we show that the chain $\{\sigma^2_i : i \in \mathbb{Z}\}$ is a weak Feller chain. Recall that a Markov chain is said to be weak Feller if its transition function $P(\cdot, O)$ is a lower semicontinuous function for any open set $O \in \mathcal{B}(X)$; refer to Tweedie (1998). Rewrite (4) as:

$$\sigma^2_i = \gamma_0 + \gamma_1\sigma^2_{i-1} - \gamma_2 E(\Delta p^2_{i-1} \mid \sigma^2_{i-1}) + \gamma_2\Delta p^2_{i-1}.$$

It follows that the Markov transition kernel $P(\cdot, O)$ for the chain defined by the equation above is given by:

$$P(x, O) = \sum_{k=-K}^{K}\mathbf{1}_O\big(\gamma_0 + \gamma_1 x - \gamma_2 E(\Delta p^2 \mid x) + \gamma_2 k^2\big)\pi_k(x).$$

Recall that the function $\mathbf{1}_O$ is lower semicontinuous for an open set $O$, and that according to A3 the functions $x \mapsto E(\Delta p^2 \mid x)$ and $x \mapsto \pi_k(x)$, $k = -K, \dots, K$, are continuous in $x$. Hence, $x \mapsto P(x, O)$ is lower semicontinuous for the IV-GARCH(1,1) model.

This finishes the proof that the chain $\{\sigma^2_i : i \in \mathbb{Z}\}$ is bounded in probability on average.
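2. Recall that a Markov chain is said to be an e-chain if the collection of Markov transition kernels $\{P^n f : n \ge 1\}$ is equicontinuous for each continuous function $f$ with compact support; refer to Meyn and Tweedie (1993). Therefore, it is necessary to show that for any $x, y$ in the state space of the chain $\{\sigma^2_i : i \in \mathbb{Z}\}$ and for a given $\varepsilon_1 > 0$ there is $\varepsilon_2 > 0$ such that $|P^n_x f - P^n_y f| < \varepsilon_1$ whenever $|x - y| < \varepsilon_2$, for all $n \ge 1$.
We start with one-step transition probabilities. Since by assumption $f$ is uniformly continuous and bounded, assume without loss of generality that $|f| \le 1$. Observe that:

$$\begin{aligned}
|P_x f - P_y f| &= \Big|\sum_{k=-K}^{K} f\big(\gamma_0 + \gamma_1 x - \gamma_2 E(\Delta p^2 \mid x) + \gamma_2 k^2\big)\pi_k(x) - \sum_{k=-K}^{K} f\big(\gamma_0 + \gamma_1 y - \gamma_2 E(\Delta p^2 \mid y) + \gamma_2 k^2\big)\pi_k(y)\Big| \\
&= \Big|\sum_{k=-K}^{K}\Big(f\big(\gamma_0 + \gamma_1 x - \gamma_2 E(\Delta p^2 \mid x) + \gamma_2 k^2\big) - f\big(\gamma_0 + \gamma_1 y - \gamma_2 E(\Delta p^2 \mid y) + \gamma_2 k^2\big)\Big)\pi_k(x) \\
&\qquad + \sum_{k=-K}^{K} f\big(\gamma_0 + \gamma_1 y - \gamma_2 E(\Delta p^2 \mid y) + \gamma_2 k^2\big)\big(\pi_k(x) - \pi_k(y)\big)\Big| \\
&\le \sum_{k=-K}^{K}\Big|f\big(\gamma_0 + \gamma_1 x - \gamma_2 E(\Delta p^2 \mid x) + \gamma_2 k^2\big) - f\big(\gamma_0 + \gamma_1 y - \gamma_2 E(\Delta p^2 \mid y) + \gamma_2 k^2\big)\Big|\pi_k(x) \\
&\qquad + \sum_{k=-K}^{K}\big|\pi_k(x) - \pi_k(y)\big|.
\end{aligned}$$

Now, using uniform continuity of $f$, the differences $|f(\gamma_0 + \gamma_1 x - \gamma_2 E(\Delta p^2 \mid x) + \gamma_2 k^2) - f(\gamma_0 + \gamma_1 y - \gamma_2 E(\Delta p^2 \mid y) + \gamma_2 k^2)|$ can be made less than $\varepsilon' > 0$ whenever $|x - y| < \varepsilon_2$ for any $x, y$ in the state space of the chain. Also recall that by A3 the functions $x \mapsto \pi_k(x)$ are Lipschitz for all $k = -K, \dots, K$. Therefore we can select $C = \max\{C_{-K}, \dots, C_K\}$ such that $|\pi_k(x) - \pi_k(y)| \le C|x - y|$ for all $x, y$ in the state space of the chain. Hence, we arrive at the following inequality:

$$|P_x f - P_y f| \le \varepsilon' + (2K+1)C|x - y|.$$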
Next, consider the two-step transition probabilities. Using similar arguments we have:

$$|P^2_x f - P^2_y f| = \big|P_x(P_{x'} f) - P_y(P_{y'} f)\big| \le \sum_{k=-K}^{K}|P_{x'} f - P_{y'} f|\,\pi_k(x) + \sum_{k=-K}^{K}\big|\pi_k(x) - \pi_k(y)\big|,$$

where $x' = \gamma_0 + \gamma_1 x - \gamma_2 E(\Delta p^2 \mid x) + \gamma_2 k^2$ and analogously for $y'$. Then $|x' - y'| = |\gamma_1(x-y) - \gamma_2(E(\Delta p^2 \mid x) - E(\Delta p^2 \mid y))| \le \phi|x - y|$ by A4, where $\phi < 1$. Hence we have:

$$|P^2_x f - P^2_y f| \le \varepsilon' + (2K+1)C\phi|x - y| + (2K+1)C|x - y|.$$

By induction,

$$|P^n_x f - P^n_y f| \le \varepsilon' + (2K+1)C|x - y|\sum_{j=0}^{n-1}\phi^j \le \varepsilon' + \frac{(2K+1)C}{1-\phi}|x - y| \le \varepsilon' + \frac{(2K+1)C}{1-\phi}\varepsilon_2 \le \varepsilon_1.$$

Hence, the collection of Markov transition kernels $\{P^n f : n \ge 1\}$ is equicontinuous for the IV-GARCH(1,1) model.
3. Lastly, we show that the point $\bar\sigma^2$ is a reachable state of the chain $\{\sigma^2_i : i \in \mathbb{Z}\}$. It is enough to show that for any open $O \in \mathcal{B}(X)$ containing $\bar\sigma^2$ there exists $1 \le n < \infty$ such that $P^n(x, O) > 0$ for any starting value $x$ in the state space of the chain.
From equation (4) we see that $\sigma^2_i$ can be written as:

$$\sigma^2_i = \gamma_0\sum_{j=0}^{i-1}\gamma_1^j + \gamma_1^i\sigma^2_0 + \gamma_2\sum_{j=0}^{i-1}\gamma_1^{i-1-j}\Delta p^2_j - \gamma_2\sum_{j=0}^{i-1}\gamma_1^{i-1-j}E(\Delta p^2_j \mid \sigma^2_j).$$

Consider the case when $\{\Delta p^2_i : i \in \mathbb{Z}\}$ is a sequence of zero price innovations, where each zero price innovation has probability $\pi_0(\sigma^2_i)$, which by A2 is strictly greater than zero for all $\sigma^2_i$ in the state space of the chain. By Proposition 2 the limit of $\sigma^2_i$ is given by:

$$\lim_{i\to\infty}\sigma^2_i = \lim_{i\to\infty}\gamma_0\sum_{j=0}^{i-1}\gamma_1^j + \lim_{i\to\infty}\gamma_1^i\sigma^2_0 - \lim_{i\to\infty}\gamma_2\sum_{j=0}^{i-1}\gamma_1^{i-1-j}E(\Delta p^2_j \mid \sigma^2_j) = \bar\sigma^2,$$

and by the definition of the limit there exists $1 \le n < \infty$ such that $\sigma^2_n$ is arbitrarily close to $\bar\sigma^2$ with probability $\prod_{i=0}^{n}\pi_0(\sigma^2_i) > 0$.

This finishes the proof of Theorem 3.
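The one-step kernel used in parts 1(a) and 1(c) can be written out directly: from $\sigma^2_{i-1} = x$ the chain moves to $\gamma_0 + \gamma_1 x - \gamma_2 E(\Delta p^2 \mid x) + \gamma_2 k^2$ with probability $\pi_k(x)$. The sketch below uses hypothetical ingredients (a Gaussian-kernel $\pi_k$ on {-3, ..., 3} and round illustrative $\gamma$ values), checks the drift identity $E(\sigma^2_i \mid \sigma^2_{i-1} = x) = \gamma_0 + \gamma_1 x$ from part 1(a), and simulates a path of recursion (4).

```python
import math
import random

K = 3                          # hypothetical support {-K, ..., K}
G0, G1, G2 = 0.03, 0.93, 0.07  # illustrative parameter values


def probs(s2):
    """Hypothetical pi_k(s2), k = -K..K: normalized Gaussian kernel weights."""
    w = [math.exp(-k * k / (2.0 * s2)) for k in range(-K, K + 1)]
    total = sum(w)
    return [wk / total for wk in w]


def transition(x):
    """Support points and probabilities of sigma^2_i given sigma^2_{i-1} = x, eq. (4)."""
    p = probs(x)
    e_dp2 = sum(k * k * pk for k, pk in zip(range(-K, K + 1), p))
    base = G0 + G1 * x - G2 * e_dp2
    return [(base + G2 * k * k, pk) for k, pk in zip(range(-K, K + 1), p)]


def simulate(x0, n, seed=0):
    """Draw dp_{i-1} from pi_k(sigma^2_{i-1}) and update sigma^2_i by (4)."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n):
        states, weights = zip(*transition(path[-1]))
        path.append(rng.choices(states, weights=weights)[0])
    return path
```

Writing the kernel as a finite list of (point, probability) pairs mirrors the finite sum in $P(x, O)$, which is why continuity of $x \mapsto \pi_k(x)$ and $x \mapsto E(\Delta p^2 \mid x)$ carries over to the kernel.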
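PROOF OF THEOREM 4: Similarly to the IV-GARCH model, we write the noise-free skeleton of the conditional volatility process $\{\sigma^2_i : i \in \mathbb{Z}\}$ in (5) as follows:

$$\sigma^2_i = \gamma_0 - \gamma_2\sum_{j=1}^{N}\theta_{j-1}E(\Delta p^2_{i-j} \mid \sigma^2_{i-j}), \qquad (A.4)$$

where the sequence $\{\theta_j : j \ge 0\}$ is from the power series expansion of $(1-z)^{-d}$, refer to Hosking (1981), and $0 < N < \infty$ is arbitrarily large. Assume that the process is started at $\sigma^2_{-N}$. The following recursion holds:

$$\begin{aligned}
\sigma^2_{-N} &= \gamma_0 \\
\sigma^2_{-N+1} &= \sigma^2_{-N} - \gamma_2\theta_0 E(\Delta p^2_{-N} \mid \sigma^2_{-N}) \\
\sigma^2_{-N+2} &= \gamma_0 - \gamma_2\theta_0 E(\Delta p^2_{-N} \mid \sigma^2_{-N}) + \gamma_2\theta_0 E(\Delta p^2_{-N} \mid \sigma^2_{-N}) \\
&\qquad - \gamma_2\theta_1 E(\Delta p^2_{-N} \mid \sigma^2_{-N}) - \gamma_2\theta_0 E(\Delta p^2_{-N+1} \mid \sigma^2_{-N+1}) \\
&= \sigma^2_{-N+1} - \gamma_2\theta_0 E(\Delta p^2_{-N+1} \mid \sigma^2_{-N+1}) + \gamma_2(\theta_0 - \theta_1)E(\Delta p^2_{-N} \mid \sigma^2_{-N}) \\
\sigma^2_{-N+3} &= \sigma^2_{-N+2} - \gamma_2\theta_0 E(\Delta p^2_{-N+2} \mid \sigma^2_{-N+2}) + \gamma_2(\theta_0 - \theta_1)E(\Delta p^2_{-N+1} \mid \sigma^2_{-N+1}) \\
&\qquad + \gamma_2(\theta_1 - \theta_2)E(\Delta p^2_{-N} \mid \sigma^2_{-N}) \\
&\;\;\vdots
\end{aligned}$$

By induction we can write:

$$\sigma^2_i = \sigma^2_{i-1} - \gamma_2\theta_0 E(\Delta p^2_{i-1} \mid \sigma^2_{i-1}) + \gamma_2\sum_{j=2}^{N}(\theta_{j-2} - \theta_{j-1})E(\Delta p^2_{i-j} \mid \sigma^2_{i-j}),$$

from where, recalling that $\theta_0 = 1$, sufficiency of $\sigma^2 \mapsto \sigma^2 - \gamma_2 E(\Delta p^2 \mid \sigma^2) \ge 0$ follows by letting $N$ tend to infinity and by non-negativity of $\{\theta_{j-1} - \theta_j : j \ge 1\}$.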
PROOF OF THEOREM 5: From Theorem 4 it follows that, starting from any $\sigma^2_{-N} \in \mathbb{R}_+$, the sequence $\{\sigma^2_i : i \in \mathbb{Z}\}$ from the noise-free skeleton (A.4) of the volatility process in the IV-FIARCH(∞) model is bounded below by zero. At the same time, equation (A.4) implies that the sequence $\{\sigma^2_i : i \in \mathbb{Z}\}$ is monotonically decreasing. Hence, the result follows from the convergence theorem for monotonic bounded sequences; see Theorem 3.14 in Rudin (1976).
References
Andersen, Torben G. and Tim Bollerslev (1997a) Intraday periodicity and volatility persistence in financial markets. Journal of Empirical Finance, vol. 4, pp. 115-158.
Andersen, Torben G. and Tim Bollerslev (1997b) Heterogeneous information arrivals and return volatility dynamics: uncovering the long-run in high frequency returns. Journal of Finance, vol. 52, pp. 975-1005.
Andersen, Torben G. and Tim Bollerslev (1998) Deutsche Mark-Dollar volatility: intraday activity patterns, macroeconomic announcements and longer run dependencies. Journal of Finance, vol. 53, pp. 219-265.
Andersen, Torben G., Tim Bollerslev and Jun Cai (2000) Intraday and interday volatility in the Japanese stock market. Journal of International Financial Markets, Institutions and Money, vol. 10, pp. 107-130.
Baillie, Richard T. (1996) Long memory processes and fractional integration in econometrics.
Journal of Econometrics, vol. 73, pp. 5-59.
Ball, C. (1988) Estimation bias induced by discrete security prices. Journal of Finance, vol. 43,
pp. 841-865.
Berndt, E., B. Hall, R. Hall and J. Hausman (1974) Estimation and inference in non-linear
structural models. Annals of Economic and Social Measurement, vol. 3, pp. 653-665.
Bollerslev, Tim (1986) Generalized autoregressive conditional heteroskedasticity. Journal of
Econometrics, vol. 31, pp. 307-327.
Bollerslev, Tim, Ray F. Chou and Kenneth F. Kroner (1992) ARCH modeling in finance.
Journal of Econometrics, vol. 52, pp. 5-59.
Campbell, John Y., Andrew W. Lo, and A. Craig MacKinlay (1997) The econometrics of
financial markets. Princeton, New Jersey: Princeton University Press.
Cox, D. R. (1981) Statistical analysis of time series: some recent developments. Scandinavian
Journal of Statistics, vol. 8, pp. 93-115.
Crack, Timothy Falcon and Olivier Ledoit (1996) Robust structure without predictability: the
“compass rose” pattern of the stock market. Journal of Finance, vol. 51, no. 2, pp. 751-762.
Davis, Richard A., Tina Hviid Rydberg, Neil Shephard and Sarah B. Streett (2001) The CBIN model for counts: testing for common features in the speed of trading, quote changes, limit and market order arrivals. Preprint.
Doornik, Jurgen A. (1998) Object-Oriented Matrix Programming using Ox 2.0. London: Tim-
berlake Consultants Ltd and Oxford: www.nuff.ox.ac.uk/Users/~Doornik.
Drost, Feike C. and Theo E. Nijman (1993) Temporal aggregation of GARCH processes. Econo-
metrica, vol. 61, no. 4, pp. 909-927.
Dufour, Alfonso and Robert F. Engle (2000) Time and the price impact of a trade. Journal of
Finance, vol. 55, no. 6, pp. 2467-2498.
Engle, Robert F. (1982) Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica, vol. 50, pp. 987-1008.
Engle, Robert F. (2000) The econometrics of ultra-high-frequency data. Econometrica, vol. 68, no. 1, pp. 1-22.
Engle, Robert F. and Jeffrey R. Russell (1998) Autoregressive conditional duration: a new
model for irregularly spaced transaction data. Econometrica, vol. 66, pp. 1127-1162.
Feller, William (1968) An Introduction to Probability Theory and its Applications, 3rd edition,
Wiley.
Gerhard, Frank and Winfried Pohlmeier (2000) On the Simultaneity of Components of the
Transaction Process. University of Konstanz. Preprint.
Ghysels, Eric (2000) Some econometric recipes for high-frequency data cooking. Journal of
Business and Economic Statistics, vol. 18, no. 2, pp. 154-163.
Ghysels, Eric and Joanna Jasiak (1997) GARCH for irregularly spaced financial data: the
ACD-GARCH model. Preprint.
Glynn, Peter and Sean Meyn (1997) Tightness for non-irreducible Markov chains. Preprint.
Gottlieb, Gary and Avner Kalay (1985) Implications of the discreteness of observed stock prices. Journal of Finance, vol. 40, pp. 135-153.
Hamilton, James D. and Raul Susmel (1994) Autoregressive conditional heteroskedasticity and
changes in regime. Journal of Econometrics, vol. 64, pp. 307-333.
Hasbrouck, Joel (1999) The dynamics of discrete bid and ask quotes. Journal of Finance, vol. 54, no. 6, pp. 2109-2142.
Hausman, Jerry A., Andrew W. Lo and A. Craig MacKinlay (1992) An ordered probit analysis of transaction stock prices. Journal of Financial Economics, vol. 31, pp. 319-379.
Hautsch, Nikolas and Winfried Pohlmeier (2002) Econometric analysis of financial transaction
data: pitfalls and opportunities. Allgemeines Statistisches Archiv, vol. 86, pp. 5-30.
Hosking, Jonathan R.M. (1981) Fractional differencing. Biometrika, vol. 68, pp. 165-176.
Jarque, C. M. and A. K. Bera (1987) A test for normality of observations and regression
residuals. International Statistical Review, vol. 55, pp. 163-172.
Jasiak, Joanna (1998) Persistence in intertrade durations. Finance, vol. 19, pp. 166-195.
Johnson, Norman L. and Samuel Kotz (1969) Discrete distributions, Houghton Mifflin Company, Boston.
Lamoureux, Christopher G. and William D. Lastrapes (1990) Persistence in variance, structural
change and the GARCH model. Journal of Business and Economic Statistics, vol. 8,
pp. 225-234.
Liu, Ming (2000) Modeling long memory in stock market volatility. Journal of Econometrics,
vol. 99, pp. 139-171.
Ljung, G. M. and G. E. P. Box (1978) On a measure of lack of fit in time series models. Biometrika, vol. 65, no. 2, pp. 297-303.
McFadden, Daniel L. (1984) Econometric analysis of qualitative response models. In Handbook of Econometrics, vol. II, Elsevier Science, North-Holland.
Meyn, Sean and Richard L. Tweedie (1993) Markov chains and stochastic stability, Springer-Verlag, London.
Nelson, Daniel B. (1991) Conditional heteroskedasticity in asset returns: a new approach.
Econometrica, vol. 59, pp. 347-370.
Roll, R. (1984) A simple implicit measure of the effective bid-ask spread in an efficient market.
Journal of Finance, vol. 39, pp. 1127-1140.
Rudin, Walter (1976) Principles of mathematical analysis. Third Edition, McGraw-Hill Inter-
national.
Russell, Jeffrey R. and Robert F. Engle (1998) Econometric analysis of discrete-valued irregularly-
spaced financial transactions data using a new autoregressive conditional multinomial
model. Preprint.
Rydberg, Tina Hviid and Neil Shephard (2000) A modelling framework for the prices and times
of trades made on the New York stock exchange. Preprint.
Rydberg, Tina Hviid and Neil Shephard (2003) Dynamics of trade-by-trade price movements:
decomposition and models. Journal of Financial Econometrics, vol. 1, pp. 2-25.
Szpiro, George G. (1998) Tick size, the compass rose and market nanostructure. Journal of
Banking and Finance, vol. 22, pp. 1559-1569.
Tong, Howell (1990) Non-linear time series. A dynamic system approach. New York, Oxford
University Press.
Tweedie, Richard L. (1998) Markov chains: structure and applications. Preprint.