2015-1
Anne Floor Brix
PhD Thesis
DEPARTMENT OF ECONOMICS AND BUSINESS
AARHUS UNIVERSITY DENMARK
Estimation of Continuous Time Models Driven
by Lévy Processes
By Anne Floor Brix
A PhD thesis submitted to
School of Business and Social Sciences, Aarhus University,
in partial fulfilment of the requirements of
the PhD degree in
Economics and Business
August 2014
CREATES - Center for Research in Econometric Analysis of Time Series
PREFACE
This dissertation is the result of my Ph.D. studies at the Department of Economics
and Business at Aarhus University and was written in the period from October 2010
to August 2014. I am grateful to the Department of Economics and Business as well
as the Center for Research in Econometric Analysis of Time Series (CREATES) funded
by the Danish National Research Foundation for providing an excellent research envi-
ronment and for allowing me to attend numerous inspiring courses and conferences
both in Denmark and abroad. I would also like to thank the department for giving me
the opportunity to teach the course Mathematics - Dynamic Analysis; it was a very
exciting and fruitful experience for me.
I wish to thank my main advisor Asger Lunde for his guidance, many helpful
comments and for encouraging me to apply for a Ph.D. position. I had the pleasure of
working with Asger during my studies, and two of the chapters in this dissertation
are the outcome of our collaboration. I also wish to thank Wei Wei for working with
me and for introducing me to Bayesian techniques. It has been a great experience
working with her and Asger on the last chapter of the dissertation. My co-advisors
Claus Munk and Elisa Nicolato also deserve thanks.
From March 2013 to July 2013 I had the pleasure of visiting Torben G. Andersen
at Kellogg School of Management, Northwestern University. I would like to thank
Torben for inviting me and to thank Kellogg School of Management for their hos-
pitality. I would also like to thank Almut Veraart for letting me visit her at Imperial
College London during December 2012 and 2013. She has been extremely helpful and
encouraging during my studies. Jan Pedersen also deserves my gratitude for many
interesting discussions, helpful comments and guidance in general. I also owe him a
big thank-you for making it possible for me to change my line of studies from mathematics
to econometrics and finance, back when I was a bachelor student at the Department
of Mathematics. I would also like to thank Michael Sørensen for inspiring discussions
and for inviting me to visit him at University of Copenhagen.
At Aarhus University I would like to thank the faculty and my fellow PhD students
for providing a friendly and inspiring work environment. I owe my old office mates
Lasse and Mikkel H a big thank-you for saving me from the basement and inviting me
into their office. Special thanks go to my new office mates Mikkel B and Strange for
all their help and numerous hours of fun discussions about math, statistics, gossip
and Game of Thrones. I would also like to thank Jannie, Tine and in particular Sanni
for their support and for always being up for coffee and a chat when I needed it. All
my friends at CREATES, especially, Anders Kock, Laurent, Juan Carlos, Manuel, Jonas,
Rasmus, Mikkel B and Strange deserve huge thanks for making all the lunch breaks,
Friday bars and other social activities fun and unforgettable. Johannes and Strange
also deserve my gratitude for providing friendly and patient LaTeX support.
I am thankful for the never-ending encouragement and support from my parents,
family and friends. Thank you for helping me take my mind off the dissertation from
time to time.
Finally and most importantly, I would like to thank my husband Rasmus for all his
love and encouragement. Thank you for always believing in me; I truly would never
have made it without you.
Anne Floor Brix
Aarhus, July 2014
UPDATED PREFACE
The pre-defence meeting was held on September 5, 2014, in Aarhus. I would like
to thank the assessment committee, consisting of Fred Espen Benth, University of
Oslo, Almut Veraart, Imperial College London, and Niels Haldrup, Aarhus University,
for their careful reading of the dissertation and their constructive comments and
suggestions. I am very grateful for the input from our discussion, and many of the
suggestions have been incorporated in the present version of the dissertation.
Anne Floor Brix
Aarhus, December 2014
CONTENTS

Summary  vii
Danish summary  xiii

1 PBEFs for Stochastic Volatility Models with Noisy Data  1
    1.1 Introduction  2
    1.2 Estimating Stochastic Volatility Models  4
    1.3 A Monte Carlo Study of the Finite Sample Performances  16
    1.4 Empirical Application  26
    1.5 Conclusion and Final Remarks  31
    1.6 Appendix  34
    1.7 References  44

2 On Estimation Methods for non-Gaussian OU Processes  47
    2.1 Introduction  48
    2.2 Non-Gaussian Ornstein-Uhlenbeck Processes  50
    2.3 Estimation Methods  52
    2.4 Monte Carlo Study  60
    2.5 Extensions  75
    2.6 Conclusion  77
    2.7 Appendix  79
    2.8 References  83

3 A Generalized Schwartz Model for Energy Spot Prices  85
    3.1 Introduction  86
    3.2 Model Descriptions  89
    3.3 Data Description and Initial Analysis  94
    3.4 Estimation Method  96
    3.5 Estimation Results  104
    3.6 Extensions  119
    3.7 Conclusion  120
    3.8 Appendix  123
    3.9 References  126
SUMMARY
The dissertation comprises three self-contained chapters on different methods for
estimating continuous time models driven by Lévy processes. The first two chapters
focus on the derivation and finite sample performances of estimators based on
estimating functions. In Chapter 1, prediction-based estimating functions are used
to estimate a model with diffusion-type stochastic volatility. In Chapter 2, the purely
jump-driven non-Gaussian Ornstein-Uhlenbeck (OU) processes are estimated using
martingale estimating functions. In both chapters, a Monte Carlo simulation study is
carried out, and the performance of the estimators is compared to that of other
competing estimators. In Chapter 3, the features of the two models from the
previous chapters are combined and a two-factor OU-based model with stochastic
volatility and jumps is proposed as a model for commodity spot prices. Due to the
complexity of the model under consideration, the methods based on estimating
functions are disregarded, and instead the Bayesian technique of Markov Chain
Monte Carlo (MCMC) with particle filters is employed.
The first chapter investigates the finite sample performances of estimators based
on prediction-based estimating functions (PBEFs). More specifically, we consider
estimation of the widely popular Heston model. Ever since the introduction of PBEFs
in Sørensen (2000), many papers have suggested using PBEFs to estimate various
models, including stochastic volatility models. However, the actual implementation
and the performance of this estimation approach was still to be investigated and
served as the starting point for our endeavors in Chapter 1. PBEFs are a generalization
of martingale estimating functions (MGEFs), and are particularly useful when consid-
ering non-Markovian models and other models where conditional moments become
analytically intractable. PBEFs are instead based on unconditional moments and aim
at approximating the unknown score function, which is itself an MGEF, by suitably choosing
the functions of the data to predict and the corresponding prediction spaces, used
for constructing the PBEFs. There are no guidelines on how to make these choices,
and the choices might impact the loss in efficiency from not using the likelihood
function to conduct inference. Inspired by the stylized fact of volatility clustering, we
consider predicting squared five-minute returns using the most recent observations
of squared returns. As a method of reference, used for evaluating the finite sample
performance of the PBEF-based estimator, we consider the GMM estimator from
Bollerslev and Zhou (2002). The GMM estimator is based on conditional moments of
the latent daily integrated volatility and is constructed using realized variance as a
proxy. Our Monte Carlo study investigates the possible informational gain from using
the intra-daily returns directly in the PBEF-based estimator versus aggregating them
into the realized measures used in the GMM estimator.
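The realized-variance proxy underlying the GMM estimator is straightforward to compute from the intra-daily returns. The following is a minimal sketch of that aggregation step; the function name and the simulated price path are illustrative, not the chapter's code:

```python
import numpy as np

def realized_variance(prices):
    """Daily realized variance: the sum of squared intraday log returns."""
    log_returns = np.diff(np.log(prices))
    return np.sum(log_returns ** 2)

# Illustrative day of 79 five-minute prices (6.5 trading hours).
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.001, 79)))
rv = realized_variance(prices)
```

Compressing the 78 five-minute returns into one daily number like this is precisely where information may be lost relative to an estimator that uses the squared returns directly.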
Besides investigating the potential of PBEFs in the context of stochastic volatility
models, we also extend both estimators to account for the presence of market mi-
crostructure noise in the observations, thereby making the estimators more suitable
for practical applications. In both settings, with and without noise, our Monte Carlo
study reveals great promise for the PBEF-based estimator. Finally, we conduct an
empirical study, where the Heston model is fitted to SPDR S&P 500 (SPY) data. The
study shows how the flexibility of the PBEF-based estimator can be utilized to check
for model misspecification.
The focus of the second chapter is on estimation of non-Gaussian OU processes
driven by Lévy subordinators. These processes were made popular by Barndorff-
Nielsen and Shephard (2001) in the context of stochastic volatility models and used
by Benth, Kallsen, and Meyer-Brandis (2007), among others, to model commodity
spot prices. In most multi-factor models found in the literature on commodity spot
price modeling, the non-Gaussian OU factors are split using various filtering tech-
niques in a first step and then estimated separately in a subsequent step. Chapter 2
investigates the performance of different estimators used for estimating one-factor
non-Gaussian OU processes. The chapter therefore aims at answering the question
of which estimation procedure to invoke after splitting the factors in a multi-factor
model. In contrast to Chapter 1, the models are now Markovian. The transition den-
sity of the observations is however still not tractable, making maximum likelihood
estimation infeasible. We consider two ways of circumventing this problem. The first
approach, put forward in Valdivieso, Schoutens, and Tuerlinckx (2009), is to consider
approximating the likelihood function using the fast Fourier transform (FFT) and
the analytical expression for the characteristic function of the Lévy functionals. As a
second approach, we derive the optimal quadratic martingale estimating function
(MGEF). The finite sample performances of the two estimators are then investigated
in a Monte Carlo study. Both finite and infinite activity OU processes are considered
and two different parameter settings, resembling respectively the base-signal and
spike part of the commodity spot price, are investigated. The performances of the
two estimation methods are also compared to that of straightforward methods, such
as quasi maximum likelihood and simple moment matching.
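The FFT-based approach rests on recovering a density from a known characteristic function by Fourier inversion. The sketch below shows that generic building block; it is not the implementation of Valdivieso et al. (2009), and the grid parameters N and du are illustrative tuning choices of exactly the kind the method is sensitive to:

```python
import numpy as np

def density_from_cf(phi, N=4096, du=0.05):
    """Recover a density on a grid from its characteristic function phi.

    Uses f(x) = (1/pi) * Re[ integral_0^inf exp(-i*u*x) * phi(u) du ],
    valid for real-valued densities since phi(-u) = conj(phi(u)),
    and evaluates the discretized integral with a single FFT.
    """
    dx = 2 * np.pi / (N * du)      # FFT grid constraint: du * dx = 2*pi / N
    b = N * dx / 2                 # spatial grid covers [-b, b)
    u = np.arange(N) * du
    x = -b + np.arange(N) * dx
    w = np.ones(N)
    w[0] = 0.5                     # trapezoid weight at the u = 0 endpoint
    integrand = np.exp(1j * u * b) * phi(u) * w * du
    f = np.real(np.fft.fft(integrand)) / np.pi
    return x, f

# Sanity check against the standard normal: phi(u) = exp(-u**2 / 2).
x, f = density_from_cf(lambda u: np.exp(-u ** 2 / 2))
```

The phase shift exp(i*u*b) aligns the FFT output with the spatial grid; for the Gaussian check, the recovered value at x = 0 should be close to 1/sqrt(2*pi).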
The performance of the FFT MLE method from Valdivieso et al. (2009) was supe-
rior in the ideal setup with simulated data and no model misspecification. However,
leaving this setup, the MGEF-based estimation method seems like a numerically more
robust approach, especially given the trouble the FFT MLE method has with
handling high-frequency data and its sensitivity to the nuisance parameters
used for fine-tuning the FFT algorithm. Furthermore, using the MGEF-based method,
the mean-reversion parameter as well as the parameters governing the marginal
distribution can also be estimated simultaneously for finite activity processes. The
MGEF-based method also has the advantage of being able to handle a setup with
noise in the observations. This extension and others are discussed at the end of the
chapter.
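For intuition, a finite-activity non-Gaussian OU process of the kind considered in the chapter can be simulated exactly on a grid, since between jumps the process simply decays exponentially. Below is a hedged sketch with a compound Poisson subordinator and exponential jump sizes; the parameter names and values are illustrative, not the chapter's simulation setup:

```python
import numpy as np

def simulate_cp_ou(lam, c, alpha, x0, dt, n_steps, rng):
    """Exact grid simulation of dX_t = -lam * X_t dt + dL_t, where L is a
    compound Poisson subordinator with jump rate c and Exp(alpha) jump sizes.

    Between jumps X decays exponentially, so each step is exact:
    X_{t+dt} = exp(-lam*dt) * X_t + sum_i exp(-lam*(dt - tau_i)) * J_i.
    """
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        n_jumps = rng.poisson(c * dt)
        jump_times = rng.uniform(0.0, dt, n_jumps)       # jump times within the step
        jump_sizes = rng.exponential(1.0 / alpha, n_jumps)
        decayed_jumps = np.sum(np.exp(-lam * (dt - jump_times)) * jump_sizes)
        x[k + 1] = np.exp(-lam * dt) * x[k] + decayed_jumps
    return x

rng = np.random.default_rng(1)
path = simulate_cp_ou(lam=0.5, c=1.0, alpha=2.0, x0=1.0, dt=0.1,
                      n_steps=1000, rng=rng)
```

With these parameters the stationary mean is c/(lam*alpha) = 1, which a long simulated path should roughly reproduce.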
The third and last chapter of the dissertation combines the building blocks from
Chapters 1 and 2 and proposes a two-factor OU-based commodity spot price model
with stochastic volatility and spikes. The model and benchmark models are fitted to
UK natural gas spot prices using Bayesian estimation techniques. We use a Markov
Chain Monte Carlo method with a particle filter (PMCMC), introduced in Andrieu,
Doucet, and Holenstein (2010), in order to avoid the aforementioned splitting of
the OU factors, often encountered in the literature, and are able to estimate the
complex model in one step. Besides adapting the PMCMC approach to our proposed
model, the chapter contributes by investigating the interplay between the inclusion
of stochastic volatility in the model and having a separate factor to account for the
spikes. Furthermore, our PMCMC method enables us to consider both a continuous
and purely jump-driven specification of the volatility process and thereby assess
whether the volatility specifications also influence the dynamics of the spike process.
Another advantage of the PMCMC method is that one of the outputs from the particle
filter is an estimate of the likelihood function, making model comparison easy.
It turns out that the inclusion of stochastic volatility in the process used for
modeling the base-signal has a strong impact on the jump intensity and jump size
distribution in the spike process. Our study reveals that, for the UK natural gas data,
having stochastic volatility in the model is much more important than allowing for
jumps, and that neglecting to include stochastic volatility will drive up the jump
intensity severely. We also find that, even though the jumps only account for a small
part of the variations in the data, having separate mean-reversion rates for the two
OU-factors justifies the inclusion of jumps in the model. By tracking the likelihood
ratio sequentially, it becomes clear that the jump-driven volatility specification is well
suited for modeling the volatile periods with large changes in the volatility, whereas
the continuous specification is better at handling the more tranquil periods.
The Bayesian estimation approach employed in the last chapter is very different
from the classical estimation methods investigated in the other chapters. In classical
statistics, the parameters of interest, θ, are considered fixed but unknown, and the
data, y, is treated as a random realization from a hypothetical infinitely large data set.
The object of interest is the likelihood p(y|θ) (or its derivative, the score), from which
parameter estimates are obtained. In the Bayesian framework, θ is assumed to have a
distribution, and the target is now the posterior p(θ|y) ∝ p(y|θ)p(θ), with the prior, p(θ),
allowing prior beliefs about θ to be included in the estimation method.
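The Bayesian target can be made concrete with a toy random-walk Metropolis sampler, which needs the log-posterior only up to an additive constant. This is purely an illustration of sampling from p(θ|y) ∝ p(y|θ)p(θ), not the PMCMC machinery of Chapter 3; the toy model and all names are ours:

```python
import numpy as np

def metropolis(log_post, theta0, n_iter, step, rng):
    """Random-walk Metropolis: samples from p(theta|y) proportional to
    p(y|theta)p(theta), using the log-posterior up to an additive constant."""
    draws = np.empty(n_iter)
    theta, lp = theta0, log_post(theta0)
    for i in range(n_iter):
        proposal = theta + step * rng.normal()
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:    # accept/reject step
            theta, lp = proposal, lp_prop
        draws[i] = theta
    return draws

# Toy model: y_i ~ N(theta, 1) with prior theta ~ N(0, 10^2).
rng = np.random.default_rng(2)
y = rng.normal(1.5, 1.0, 100)
log_post = lambda th: -0.5 * np.sum((y - th) ** 2) - 0.5 * th ** 2 / 100.0
draws = metropolis(log_post, theta0=0.0, n_iter=5000, step=0.3, rng=rng)
```

After a burn-in, the mean of the draws should sit close to the sample mean of y, since the prior here is nearly flat relative to the likelihood.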
The methods from Chapter 1 and Chapter 2, which are based on estimating
functions, have the advantage of being fairly straightforward to implement, as they
only rely on unconditional and conditional moments, respectively. Furthermore, the
methods are simulation free and are fast to execute. The relative simplicity of the
PBEF-based estimation approach, along with the promising results found in the first
chapter, suggest that the method could be used for estimating more complex Lévy-
driven models, such as the multi-factor model considered in Chapter 3. However, the
estimation method still remains to be tested on non-Gaussian Lévy based models,
like the superposition of non-Gaussian OU processes, as was suggested in Benth
et al. (2007). Since the PBEF-based approach relies only on unconditional moments,
which might have to be simulated in complex models, and due to the unpredictable
nature of the jumps in Lévy processes, the method might have a hard time producing
reliable results for multi-factor models with a high-dimensional parameter vector.
One solution to this potential problem could be, as previously discussed, to disen-
tangle the factors of the model in a first step and then in a second step apply the
relevant estimation methods from Chapter 1 and 2 on each factor. Of course, the
major drawback of this approach would be the loss of coherency, as the obtained
results might depend on the filtering technique used for splitting the factors and the
interplay between the factors is lost, since the estimation of each factor is carried out
separately. This drawback can be overcome by using Bayesian techniques, such as
the PMCMC method from Chapter 3. The method also has the advantage that model
comparison and the computation of statistics, like standard errors, come almost for free,
since the method estimates the entire distribution of θ, and the likelihood function
follows as a by-product of the particle filter. The underlying states of the model, like
the stochastic volatility process, can also be obtained from the particle filter. This is a
clear advantage of the Bayesian approach, since the pricing of financial derivatives, such
as forward contracts, often depends on the filtered states of the spot price model.
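A minimal bootstrap particle filter makes both by-products visible: the running log-likelihood estimate and the filtered state means. The sketch below is generic and runs on a toy linear state-space model; it is not the filter used in Chapter 3, and all names are ours:

```python
import numpy as np

def bootstrap_pf(y, propagate, log_obs_dens, init, n_particles, rng):
    """Bootstrap particle filter. Returns a log-likelihood estimate (up to
    constants dropped in log_obs_dens) and the filtered state means: the
    two by-products that PMCMC exploits."""
    x = init(n_particles, rng)
    loglik, means = 0.0, []
    for yt in y:
        x = propagate(x, rng)                   # draw from the state transition
        logw = log_obs_dens(yt, x)              # weight by the observation density
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())          # log p(y_t | y_{1:t-1}) estimate
        w /= w.sum()
        means.append(np.dot(w, x))
        x = x[rng.choice(n_particles, n_particles, p=w)]   # multinomial resampling
    return loglik, np.array(means)

# Toy linear state space: x_t = 0.9 x_{t-1} + N(0,1), y_t = x_t + N(0,1).
rng = np.random.default_rng(3)
x_true = np.zeros(50)
for t in range(1, 50):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(size=50)
ll, x_filtered = bootstrap_pf(
    y,
    propagate=lambda x, r: 0.9 * x + r.normal(size=x.size),
    log_obs_dens=lambda yt, x: -0.5 * (yt - x) ** 2,
    init=lambda n, r: r.normal(size=n),
    n_particles=1000,
    rng=rng,
)
```

Running the filter once at fixed parameters is exactly the cheap post-estimation step described above: it delivers updated filtered states and a fresh likelihood estimate in a single pass through the data.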
One disadvantage of the Bayesian approach is its lack of simplicity, in the sense
that the approach might not be easy to explain to practitioners. Furthermore, many
choices have to be made when implementing the method, such as how many particles
to use in the filter and which priors to use. Time issues also arise if one wishes to
re-estimate the model on a daily basis, as the most general model from Chapter 3
takes several days to estimate. However, once the model parameters have been
estimated, updating the filtered state variables as new data arrives is just a question
of running the particle filter once, a task which takes less than a minute.
When estimating complex Lévy-driven models, Bayesian methods are, in my
opinion, to be preferred and will in fact often be the only feasible approach. In the
context of commodity spot price modeling, topics for future research could include
estimation of advanced multi-dimensional cross-commodity models, like the one
recently proposed by Benth and Vos (2013), or the models found in Barndorff-Nielsen,
Benth, and Veraart (2013) which are driven by Lévy semistationary processes.
References
Andrieu, C., Doucet, A., Holenstein, R., 2010. Particle Markov chain Monte Carlo
methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology)
72 (3), 269–342.
Barndorff-Nielsen, O. E., Benth, F. E., Veraart, A. E. D., 2013. Modelling energy spot
prices by volatility modulated Lévy-driven Volterra processes. Bernoulli 19 (3),
803–845.
Barndorff-Nielsen, O. E., Shephard, N., 2001. Non-Gaussian OU-based models and
some of their uses in financial economics (with discussion). Journal of the Royal
Statistical Society B 63, 167–241.
Benth, F. E., Kallsen, J., Meyer-Brandis, T., 2007. A non-Gaussian Ornstein-Uhlenbeck
process for electricity spot price modeling and derivatives pricing. Applied Mathematical
Finance 14 (2), 153–169.
Benth, F. E., Vos, L., 2013. Cross-commodity spot price modeling with stochastic
volatility and leverage for energy markets. Advances in Applied Probability 45,
545–571.
Bollerslev, T., Zhou, H., 2002. Estimating stochastic volatility diffusions using condi-
tional moments of integrated volatility. Journal of Econometrics 109, 33–65.
Sørensen, M., 2000. Prediction-based estimating functions. Econometrics Journal 3,
123–147.
Valdivieso, L., Schoutens, W., Tuerlinckx, F., 2009. Maximum likelihood estimation
in processes of Ornstein-Uhlenbeck type. Statistical Inference for Stochastic Pro-
cesses 12, 1–19.
DANISH SUMMARY
Afhandlingen indeholder tre selvstændige kapitler der alle omhandler estimation af
kontinuert-tids modeller, hvor stokastikken er drevet af Lévy processer. De to første
kapitler fokuserer på udledningen af estimatorer, som bygger på estimationsfunktio-
ner, samt på undersøgelsen af estimatorernes egenskaber, når man betragter endelige
stikprøver. I kapitel 1 bruges prædiktionsbaserede estimationsfunktioner til at
estimere en diffusionsbaseret stokastisk volatilitetsmodel. I kapitel 2 betragtes i stedet
springdrevne ikke-Gaussiske Ornstein-Uhlenbeck (OU) processer, og disse estimeres
vha. martingal estimationsfunktioner. I begge kapitler sammenlignes estimatorernes
performance med andre konkurrerende estimatorer i et Monte Carlo simulations
studie. I kapitel 3 kombineres karakteristika for modellerne fra de to foregående kapit-
ler, til en to-faktor OU-baseret model med stokastisk volatilitet og spring. Modellen
bruges i kapitel 3 til at modellere råvarepriser. På grund af modellens kompleksitet be-
nyttes den Bayesianske estimationsmetode Markov Chain Monte Carlo (MCMC) med
et partikelfilter, i stedet for estimationsmetoderne baseret på estimationsfunktioner.
I det første kapitel undersøges brugen af prædiktionsbaserede estimationsfunktio-
ner (PBEF) til at estimere den populære Heston model. Siden PBEF blev introduceret
i Sørensen (2000), har mange artikler foreslået brugen af PBEF, bl.a. til at
estimere stokastiske volatilitetsmodeller. Dog er implementeringen og kvaliteten af den
PBEF-baserede estimationsmetode endnu ikke undersøgt, og dette er derfor ud-
gangspunktet for vores analyse i kapitel 1. PBEF er en generalisering af martingal
estimationsfunktioner (MGEF), som er specielt brugbar når man betragter modeller
der ikke er Markovianske, eller andre modeller, hvor betingede middelværdier er svæ-
re at beregne analytisk. Byggestenene i PBEF er i stedet ubetingede middelværdier,
og man forsøger at approksimere den ukendte scorefunktion, der selv er en MGEF,
ved nøje at vælge hvilken funktion af data der prædikteres og hvilke prædiktorer der
skal bruges til at udforme PBEF’en. Der er ingen retningslinjer for, hvordan disse
valg foretages, og valgene kan have indflydelse på tabet af efficiens, ved ikke at bruge
maximum likelihood estimatoren. Inspireret af det velkendte fænomen volatility
clustering, vælger vi at prædiktere kvadrerede afkast vha. de seneste observerede
kvadrerede afkast. Den resulterende PBEF implementeres derefter og bruges til at
estimere Heston modellen i et Monte Carlo studie. Som referencemetode betragter vi
også GMM estimatoren fra Bollerslev and Zhou (2002). GMM estimatoren er baseret
på betingede middelværdier af funktioner af den latente integrerede daglige volati-
litet (IV), og konstrueres ved at benytte realized variance som et proxy for IV. Vores
Monte Carlo studie undersøger den mulige informationsfordel der kunne ligge i at
bruge intra-dags afkast direkte, som i PBEF estimatoren, i stedet for at aggregere dem
til daglige mål og bruge disse til at konstruere estimatorer.
Udover at undersøge den PBEF-baserede metodes potentiale indenfor estimation
af stokastiske volatilitetsmodeller, udvider vi også begge estimationsmetoder til at
kunne håndtere market microstructure støj. Både i scenariet med og uden støj i data
viser vores Monte Carlo studie, at den PBEF-baserede metode har et stort potentiale,
især når man kun har et kort datasæt til rådighed. Til sidst i kapitlet udfører vi et
empirisk studie, hvor vi fitter Heston modellen til SPDR S&P 500 (SPY) data vha. af
de to forskellige estimationsmetoder. Fra anvendelsen af markedsdata ses, hvordan
fleksibiliteten af den PBEF-baserede metode kan bruges til at udføre robusthedstjek
for modelmisspecifikation.
I kapitel 2 er fokus på estimation af ikke-Gaussiske OU processer drevet af Lévy su-
bordinatorer. Disse processer blev gjort populære i Barndorff-Nielsen and Shephard
(2001), som modeller for den stokastiske volatilitet, men er siden også blevet brugt til
at modellere råvarepriser i bl.a. Benth et al. (2007). I de fleste multi-faktor modeller,
som findes i litteraturen om modellering af råvarepriser, bruges diverse filtreringstek-
niker til først at skille de indgående faktorer ad, hvorefter de estimeres separat. Kapitel
2 undersøger performance af forskellige estimatorer brugt til at estimere en-faktor
ikke-Gaussiske OU processer. Kapitlet forsøger derved at besvare spørgsmålet om,
hvilken estimationsmetode der er bedst at bruge efter modellens faktorer er skilt ad. I
modsætning til kapitel 1 er modellerne nu Markov. Tæthedsfunktionen for transitio-
nerne er dog stadigvæk ikke tilgængelig og maximum likelihood estimation er derfor
ikke mulig. Vi betragter to forskellige metoder til at omgå dette problem. Den første
metode vi betragter er metoden fra Valdivieso et al. (2009), hvor likelihood funktionen
approksimeres vha. fast Fourier transform (FFT) algoritmen og den karakteristiske
funktion for processens Lévy-baserede tilvækster. Den anden estimationsmetode
bygger på den optimale kvadratiske martingal estimationsfunktion (MGEF), som vi
udleder analytisk for den ikke-Gaussiske OU-proces. De to estimatorers egenskaber
undersøges derefter i et Monte Carlo studie, hvor både processer med endelig og
uendelig springaktivitet betragtes. For hver proces betragter vi to forskellige
konfigurationer af parametrene, alt efter om det simulerede data skal minde om base-signal
eller spike-delen af råvareprisen. De to estimatorers performance sammenlignes og-
så med simple estimationsmetoder, såsom quasi maximum likelihood og moment
matching.
FFT MLE metoden fra Valdivieso et al. (2009) havde den klart bedste performance
i vores simulations-setup, uden modelmisspecifikation. Hvis man bevæger sig væk
fra dette setup virker den MGEF-baserede metode dog til at være mere numerisk
robust, især set i lyset af de vanskeligheder FFT MLE metoden har med at håndtere
højfrekvent data, og den følsomhed metoden har overfor parametrene der bruges
til at fine-tune FFT algoritmen. MGEF metoden har også den fordel, at alle model-
lens parametre stadig estimeres simultant, når vi betragter processer med endelig
springaktivitet. Ydermere har MGEF metoden, i modsætning til FFT MLE metoden,
potentiale for at kunne udvides til at håndtere støj i observationerne. Denne udvidel-
se, og andre, diskuteres sidst i kapitlet.
Det tredje og sidste kapitel i afhandlingen kombinerer modellerne fra de to foregå-
ende kapitler og betragter en to-faktor OU baseret model med stokastisk volatilitet og
spring. Ved hjælp af Bayesianske estimationsteknikker fittes modellen og andre ben-
chmark modeller til naturgasspotpriser fra Storbritannien (UK). Mere specifikt bruger
vi en Markov Chain Monte Carlo metode med et partikelfilter (PMCMC), introduceret
i Andrieu et al. (2010), for at undgå at splitte de indgående faktorer fra hinanden før
modellen kan estimeres. Estimationsmetoden, og den foreslåede model, muliggør
en undersøgelse af vigtigheden af at inkludere stokastisk volatilitet og forskellige
mean-reversion rates for faktorerne i modelspecifikationen. Vores PMCMC estima-
tionsmetode gør det også muligt at betragte både en kontinuert og en springdreven
specifikation af volatiliteten, og derved undersøge specifikationernes indflydelse på
fx. estimaterne i springdelen af prisprocessen. En anden fordel ved PMCMC metoden
er, at likelihood funktionen automatisk estimeres i partikelfiltret og sammenligning
af forskellige modeller er derfor let at udføre.
Vi viser, at det er yderst vigtigt at have stokastisk volatilitet med i modellen for
naturgasspotpriserne, samt at denne beslutning har stor indflydelse på springdelen af
prisprocessen. Vores studie viser også, at hvis stokastisk volatilitet ikke inkorporeres
i modellen, så bliver springintensiteten kraftigt overestimeret. Vores studie viser, at
forskellige mean-reversion rates for faktorerne retfærdiggør spring i modellen, selv-
om springene kun står for en lille andel af variansen i spotpriserne. Ved at betragte
likelihood ratioen som en funktion af tid kan vi se, at den springdrevne volatilitets-
specifikation er bedst til at modellere volatilt data, hvor variansen af variansen er stor,
hvorimod den kontinuerte specifikation vi betragter er bedst til at håndtere de mere
rolige perioder.
Den Bayesianske estimationsmetode fra det sidste kapitel er meget anderledes
end de klassiske estimationsteknikker fra de andre kapitler. I klassisk statistik bli-
ver parametrene man ønsker at bestemme, θ, betragtet som faste men ukendte og
data, y, betragtes som en tilfældig stikprøve fra en hypotetisk uendeligt stor
population. Man er hovedsageligt interesseret i at beregne likelihoodfunktionen p(y|θ)
(eller dens afledte, scorefunktionen) idet denne bruges til at bestemme et estimat
for θ. I Bayesiansk statistik antages det, at θ også har en fordeling og ikke blot er
fast. Man er nu interesseret i a posteriori-fordelingen p(θ|y) ∝ p(y|θ)p(θ), hvor p(θ)
betegner a priori viden (forhåndsviden) om parametrene, og ud fra denne bestemme
parameterestimatet.
Metoderne fra kapitel 1 og 2, der er baseret på estimationsfunktioner, har den
fordel at de er forholdsvis lette at implementere, da de kun afhænger af henholdsvis
ubetingede og betingede middelværdier. Derudover benyttes ingen simulations-
teknikker og metoderne er hurtige at udføre. Den PBEF-baserede metodes relative
simplicitet, sammenholdt med de lovende resultater fra det første kapitel kunne
antyde, at metoden også har potentiale i forbindelse med estimation af mere kompli-
cerede Lévy-drevne modeller, som fx. den multi-faktor model vi betragter i kapitel 3.
Dette potentiale er dog endnu ikke testet på ikke-Gaussiske Lévy-drevne modeller,
såsom superpositioner af ikke-Gaussiske OU processer, hvilket blev foreslået i Benth
et al. (2007). Metoden har muligvis svært ved at producere nøjagtige resultater for
multi-faktor modeller med mange parametre, da metoden kun afhænger af ubetinge-
de middelværdier, hvilke muligvis skal simuleres i komplekse modeller, og pga. den
ikke-prædiktable opførsel som springene i Lévy processer har. En mulig løsning på
dette potentielle problem kunne være, som tidligere diskuteret, at splitte de indgåen-
de faktorer ad og dernæst estimere dem separat, fx. vha. metoderne fra kapitel 1 og 2.
Den største ulempe ved denne løsning er naturligvis den manglende sammenhæng i
estimationen. Parameterestimaterne kommer potentielt til at afhænge af den valgte
metode der er brugt til at splitte faktorerne, og da hver faktor efterfølgende estimeres
separat, går samspillet mellem faktorernes specifikation tabt. Denne problemstilling
kan løses ved brugen af Bayesianske estimationsmetoder og var præcis motivationen
for brugen af PMCMC metoden i kapitel 3. PMCMC metoden har også den fordel, at
det er let at sammenligne forskellige modeller samt beregne fx. standardafvigelser,
da likelihoodfunktionen estimeres i partikelfiltret og man med PMCMC metoden
estimerer fordelingen for θ. Modellens underliggende variable, såsom den
stokastiske volatilitetsproces, kan også estimeres vha. partikelfiltret. Dette er en stor fordel
ved den Bayesianske estimationmetode idet prisfastsættelse af afledte finansielle
instrumenter, som fx. forwardkontrakter, ofte afhænger af disse variable.
En mulig ulempe ved den Bayesianske tilgang til estimation er manglende simpli-
citet, da metoden potentielt kan være svær at forklare til praktikere. Desuden skal der
foretages en masse valg når metoden implementeres, fx. hvor mange partikler der
skal anvendes i filteret, og hvilken a priori viden der skal specificeres for parametrene.
Endelig kan de Bayesianske metoder være meget tidskrævende. Fx. tager det flere
dage at estimere den mest komplekse model fra kapitel 3. Efter parametrene er esti-
meret for en given model, tager det dog mindre end et minut at opdatere de filtrede
underliggende variable, når nye observationer kommer til.
Når man skal estimere komplekse Lévy-drevne modeller med mange parametre,
er Bayesianske metoder efter min mening det bedste, og ofte eneste valg. I forbindelse
med modellering af råvarepriser, kunne fremtidige forskningsemner eksempelvis
fokusere på estimation af avancerede multi-dimensionale modeller, fx. modellen
fra Benth and Vos (2013), der kan bruges til simultan modellering af flere råvare-
priser, som fx. gas- og elektricitetspriser. Det ville også være interessant at udvikle
estimationsmetoder for modellerne fra Barndorff-Nielsen et al. (2013) som er drevet
xvii
af såkaldte Lévy semistationære processer.
References
Andrieu, C., Doucet, A., Holenstein, R., 2010. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (3), 269–342.
Barndorff-Nielsen, O. E., Benth, F. E., Veraart, A. E. D., 2013. Modelling energy spot
prices by volatility modulated Lévy-driven Volterra processes. Bernoulli 19(3),
803–845.
Barndorff-Nielsen, O. E., Shephard, N., 2001. Non-Gaussian OU-based models and some of their uses in financial economics (with discussion). Journal of the Royal Statistical Society B 63, 167–241.
Benth, F. E., Kallsen, J., Meyer-Brandis, T., 2007. A non-Gaussian Ornstein-Uhlenbeck process for electricity spot price modeling and derivatives pricing. Applied Mathematical Finance 14 (2), 153–169.
Benth, F. E., Vos, L., 2013. Cross-commodity spot price modeling with stochastic
volatility and leverage for energy markets. Advances in Applied Probability 45,
545–571.
Bollerslev, T., Zhou, H., 2002. Estimating stochastic volatility diffusions using condi-
tional moments of integrated volatility. Journal of Econometrics 109, 33–65.
Sørensen, M., 2000. Prediction-based estimating functions. Econometrics Journal 3,
123–147.
Valdivieso, L., Schoutens, W., Tuerlinckx, F., 2009. Maximum likelihood estimation in processes of Ornstein-Uhlenbeck type. Statistical Inference for Stochastic Processes 12, 1–19.
CHAPTER 1

PREDICTION-BASED ESTIMATING FUNCTIONS FOR STOCHASTIC VOLATILITY MODELS WITH NOISY DATA

A COMPARATIVE STUDY
Anne Floor Brix
Aarhus University and CREATES
Asger Lunde
Aarhus University and CREATES
Abstract
Prediction-based estimating functions (PBEFs), introduced in Sørensen (2000), are
reviewed, and PBEFs for the Heston (1993) stochastic volatility model are derived
with and without the inclusion of noise in the data. The finite sample performance of
the PBEF-based estimator is investigated in a Monte Carlo study and compared to the
performance of the Generalized Method of Moments (GMM) estimator from Boller-
slev and Zhou (2002) that is based on conditional moments of integrated volatility. We
derive new moment conditions in the presence of noise, but we also consider noise
correcting the GMM estimator by basing it on a realized kernel instead of realized
variance. Our Monte Carlo study reveals great promise for the estimator based on
PBEFs. The study also shows that the PBEF-based estimator outperforms the GMM
estimator, both in the setting with MMS noise and in the setting without MMS noise,
especially for small sample sizes. Finally, in an empirical application we fit the Heston
model to SPY data and investigate how the two methods handle real data and possible
model misspecification. The empirical study also shows how the flexibility of the
PBEF-based method can be used for robustness checks.
1.1 Introduction
Continuous time stochastic volatility (SV) models are widely used in econometrics
and empirical finance for modeling prices of financial assets. Considerable efforts
have been put into modeling and estimation of the latent volatility process. Most of
this research is surveyed in part II of Andersen, Davis, Kreiss, and Mikosch (2009).
Stochastic volatility diffusion models, such as the Heston (1993) model, represent a
popular class of models within the continuous time framework. The Heston model
will be the baseline model considered in this chapter, since it is one of the most widely
used models in financial institutions, due to its analytical tractability.
Parameter estimation in SV-models is difficult because the volatility process is
latent. The hidden Markov structure complicates inference, since the observed log-price process will not in itself be a Markov process, which implies that computing conditional expectations of functions of the observed process is practically infeasible. As a consequence, martingale estimating functions will not be a useful tool for
conducting inference in SV-models. Likelihood inference is also not straightforward,
because an analytical expression for the transition density is almost never available
and methods based on extensive simulations are called for.
We will circumvent the above mentioned problems for conducting inference in
SV models by using prediction-based estimating functions (PBEFs), introduced in
Sørensen (2000), which are a generalization of martingale estimating functions. This
generalization becomes particularly useful when applied to observations from a non-
Markovian model. PBEFs are estimating functions based on predictors of functions
of the observed process. The structure of PBEFs is essentially a sum of weighted
augmented prediction errors, and an estimator is found by making this sum zero.
In this chapter we investigate and contrast two estimation approaches. First,
PBEFs will be reviewed, detailed, and used for parameter estimation in the Heston
model. The estimation method is fairly easy to implement and fast to execute, as
the construction of PBEFs relies only on the computation of unconditional moments.
When the Heston SV-model is considered, no simulations are needed for constructing
the PBEFs used in the chapter.1 As a benchmark, we consider the method suggested
in Bollerslev and Zhou (2002). In Bollerslev and Zhou (2002) a Generalized Method
of Moments (GMM) type estimator based on the first and second order conditional
moments of the integrated volatility (IV ) is derived. Since IV is latent, realized
variance (RV ) is used as a proxy and the sample moments of RV are matched to
the population moments of IV implied by the model. When high-frequency data is
1 An implementable version of the optimal PBEF will, however, require simulation of a covariance matrix. Simulation-based estimation methods for continuous time SV models, such as indirect inference, see Gourieroux, Monfort, and Renault (1993), the efficient method of moments (EMM), see Gallant and Tauchen (1996), or Markov Chain Monte Carlo (MCMC), see Eraker (2001), are not as easily implemented, since many of them require substantial computational effort. Another way of tackling the difficulties that arise when considering parameter estimation in continuous time SV models is based on approximations of the likelihood function, see for example Aït-Sahalia and Kimmel (2007).
available, several other simulation-free methods have been suggested in the literature,
see for instance Barndorff-Nielsen and Shephard (2002), Corradi and Distaso (2006),
and Todorov (2009). Common to these methods, including the GMM-based estimator
from Bollerslev and Zhou (2002), is that they are all based on time-series of daily
realized measures, such as realized variance (RV ) and bipower variation (BV ). Instead
of being transformed into daily realized measures, the squared intra-daily returns
are used directly when constructing PBEFs. This means that PBEFs have a potential
informational advantage, the strength of which will be investigated throughout the
chapter. More specifically, the chapter investigates the finite sample properties of the
PBEF-based estimator in a Monte Carlo study and compares its performance to that
of the GMM estimator from Bollerslev and Zhou (2002).
The case where the efficient price is assumed to be directly observable as well as
the case where noise is present in data are considered. In particular, we contribute by
extending the two competing methods to handle the presence of noise. The usage of
PBEFs for estimating SV-models was suggested, among others, in Barndorff-Nielsen
and Shephard (2001), but to the best of our knowledge this is the first time the finite
sample performance of PBEFs applied to SV-models is studied. In fact, this is the most
extensive Monte Carlo study of the finite sample performance of the PBEF-based
estimation method. In Nolsøe, Nielsen, and Madsen (2000) the authors conduct a
small Monte Carlo study for the case where a Cox-Ingersoll-Ross (CIR) process is
observed with additive white noise, but the potential of using PBEFs to estimate
SV-models has not previously been studied.
The chapter also addresses the link between the estimation method based on
PBEFs and GMM based on the moment conditions underlying the PBEF. In particular, the connection between the optimal PBEF and the optimal choice of the weight
matrix in GMM estimation is established.
Lastly, an empirical application using SPY data is carried out, investigating how
the two estimation methods handle real data characteristics and possible model
misspecification. In the empirical application we also study how different choices in
the flexible PBEF-based estimation method might impact the parameter estimates.
In particular, we investigate how considering different choices of the predictor space
might serve as a robustness check of whether there is a need for additional volatility
factors in the model.
The chapter is organized as follows: In the following section the PBEF estimation
method is reviewed and detailed. The connection to GMM estimation is established,
and a brief review of the GMM estimator from Bollerslev and Zhou (2002) is provided.
For both methods, the estimator of the parameters in the Heston model is derived
with and without the inclusion of noise in the data. In Section 1.3 we present our
Monte Carlo study. This includes an investigation of how i.i.d. noise impacts the
performances of the two methods and how the noise corrected estimators perform.
Section 1.4 contains an empirical application to SPY data that investigates how the
methods handle real data and if and how the choice of estimation method impacts
the parameter estimates. The final section concludes, and ideas on further research
are outlined.
1.2 Estimating Stochastic Volatility Models
In this section the two estimation methods from Sørensen (2000) and Bollerslev and
Zhou (2002) are reviewed and extended to handle market microstructure (MMS)
noise. We also discuss the link between the optimal PBEF and a GMM estimator with
the optimal choice of weight matrix. The focus of this chapter is on the performance
of the two considered estimation methods used for estimating SV models of the form
dX_t = \sqrt{v_t}\,dW_t, \qquad dv_t = b(v_t;\theta)\,dt + c(v_t;\theta)\,dB_t, \qquad (1.1)
where W and B are independent standard Brownian motions. The independence
assumption rules out the possibility of leverage effects, but it is only imposed for
computational ease and could be relaxed in other applications. We will assume v to
be a positive, ergodic diffusion process with invariant measure \mu_\theta and that v_0 \sim \mu_\theta is independent of B, which implies that v is stationary. In particular, we are interested
in studying inference for the Heston SV-model
dX_t = \sqrt{v_t}\,dW_t, \qquad dv_t = \kappa(\alpha - v_t)\,dt + \sigma\sqrt{v_t}\,dB_t, \qquad (1.2)
where the spot volatility, v_t, is a CIR process. The parameter, \alpha, is the long run average variance of the observed process, X_t, and the other drift parameter, \kappa, is the rate at which v_t reverts to the long run average. The third parameter, \sigma, can be interpreted as the volatility of volatility. The Feller condition, 2\kappa\alpha \ge \sigma^2, ensures positivity of the variance process v_t. The Heston model is widely used in mathematical finance, where the observed process, X_t, would be the logarithm of an asset price. The popularity of the Heston model in financial institutions is primarily due to the analytical tractability of the model, which allows for (quasi) closed form expressions for prices of financial derivatives, such as European options.
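As a concrete point of reference for later sections, the Heston dynamics in (1.2) can be simulated with a full-truncation Euler scheme. The following is a minimal sketch, not the code used in this chapter; the parameter values are hypothetical and chosen to satisfy the Feller condition:

```python
import numpy as np

def simulate_heston(kappa, alpha, sigma, x0, v0, T, n, seed=0):
    """Full-truncation Euler scheme for dX = sqrt(v) dW, dv = kappa(alpha - v) dt + sigma sqrt(v) dB,
    with W and B independent (no leverage), as in (1.2)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    v = np.empty(n + 1)
    x[0], v[0] = x0, v0
    for i in range(n):
        vp = max(v[i], 0.0)  # truncate at zero so the square roots stay real
        dW, dB = rng.normal(0.0, np.sqrt(dt), size=2)
        x[i + 1] = x[i] + np.sqrt(vp) * dW
        v[i + 1] = v[i] + kappa * (alpha - vp) * dt + sigma * np.sqrt(vp) * dB
    return x, v

# hypothetical parameters; Feller condition 2*kappa*alpha = 0.4 >= sigma**2 = 0.25
x, v = simulate_heston(kappa=5.0, alpha=0.04, sigma=0.5, x0=0.0, v0=0.04, T=10.0, n=20_000)
```

The truncation is needed because a naive Euler discretization of a CIR-type variance process can go slightly negative; exact simulation of the CIR transition is also possible but is not needed for illustration.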
1.2.1 Estimation using Prediction-based Estimating Functions
First, we explain the general setup and ideas underlying the estimation method based
on PBEFs that was introduced in Sørensen (2000) and further developed in Sørensen
(2011). Then, following Sørensen (2000), we derive the PBEFs for the Heston model
without MMS noise. Finally, we add MMS noise to the observations and derive PBEFs
in this setting.
The General Setup and Ideas
The estimation method based on PBEFs is used for conducting parametric inference based on observations Y_1, Y_2, \dots, Y_n from a general stochastic process. The
stochastic process is assumed to belong to a class of models parametrized by a p-
dimensional vector, θ ∈Θ⊆Rp , that we wish to estimate. An estimating function is a
p-dimensional function Gn(θ) that depends on the data Y1,Y2, . . . ,Yn and θ, and an
estimator is then obtained by solving the p equations Gn(θ) = 0 w.r.t. θ.
Let \mathcal{F}_i denote the \sigma-algebra generated by the observations Y_1, Y_2, \dots, Y_i. When \theta is the true parameter, we denote by \mathcal{H}_i^\theta the L^2-space of all square integrable, \mathcal{F}_i-measurable, 1-dimensional random variables. \mathcal{H}_i^\theta is a Hilbert space of real-valued functions of the type h(Y_1, \dots, Y_i), with inner product given by \langle h_1, h_2 \rangle = E_\theta[h_1(Y_1, \dots, Y_i)\, h_2(Y_1, \dots, Y_i)]. For each i, a closed linear subspace \mathcal{P}_{i-1}^\theta of \mathcal{H}_{i-1}^\theta can be chosen as the predictor space for predicting f(Y_i), where f is some known 1-dimensional function.2 In the setup of PBEFs, we study estimating functions of the form

G_n(\theta) = \sum_{i=1}^{n} \underbrace{\Pi^{(i-1)}(\theta)}_{p \times 1} \big[ f(Y_i) - \underbrace{\breve{\pi}^{(i-1)}(\theta)}_{1 \times 1,\ \in \mathcal{P}_{i-1}^\theta} \big], \qquad (1.3)

where the function to be predicted, f(Y_i), is defined on the state space of the data generating process Y. The function f is assumed to satisfy the condition E_\theta[f(Y_i)^2] < \infty for all \theta \in \Theta and for i = 1, \dots, n. In (1.3), the p-dimensional stochastic vector of weights, \Pi^{(i-1)}(\theta) = \big(\pi_1^{(i-1)}(\theta), \dots, \pi_p^{(i-1)}(\theta)\big), has elements that belong to the predictor space \mathcal{P}_{i-1}^\theta, and \breve{\pi}^{(i-1)}(\theta) is the minimum mean square error (MMSE) predictor of f(Y_i) in \mathcal{P}_{i-1}^\theta. That is, \breve{\pi}^{(i-1)}(\theta) is the orthogonal projection of f(Y_i) onto \mathcal{P}_{i-1}^\theta w.r.t. the inner product in \mathcal{H}_i^\theta defined above. Since the predictor space is both closed and linear, this orthogonal projection exists and is uniquely determined by the normal equations

E_\theta\big[ \pi \big( f(Y_i) - \breve{\pi}^{(i-1)}(\theta) \big) \big] = 0, \quad \text{for all } \pi \in \mathcal{P}_{i-1}^\theta, \qquad (1.4)

see e.g. Thm. 3.1 in Karlin and Taylor (1975).3
A special class of PBEFs is the class of martingale estimating functions (MGEFs), which is obtained by choosing \mathcal{P}_{i-1}^\theta := \mathcal{H}_{i-1}^\theta. In this case, the MMSE predictor of f(Y_i) in \mathcal{P}_{i-1}^\theta is the conditional expectation, \breve{\pi}^{(i-1)}(\theta) = E_\theta[f(Y_i) \mid Y_1, \dots, Y_{i-1}], and G_n(\theta) becomes a P_\theta-martingale w.r.t. the filtration generated by the data process. MGEFs are, however, mainly useful when considering Markovian models, since for non-Markovian models it is practically infeasible to calculate conditional expectations, conditioning on the entire past of observations. The idea underlying PBEFs is to use a smaller and more tractable predictor space in place of \mathcal{H}_{i-1}^\theta and think of the resulting PBEF as an approximation of the MGEF. The advantage of considering this
2 One could also choose to predict functions of the type f(Y_i, \dots, Y_{i-s}), see Sørensen (2011), but for the purpose of this study f(Y_i) will be general enough. The function f can be chosen freely but will often take the form f(Y_i) = Y_i^\nu, \nu \in \mathbb{N}, such that the moments needed to find the (optimal) PBEF are easier to calculate. PBEFs can in fact be further generalized to a setup where several functions of the data, f_j(Y_i), j = 1, \dots, N, are predicted, see Sørensen (2000) and Sørensen (2011). This generalization will, however, not be necessary for estimating the SV-model we are considering.
3 Unique in the sense of mean square distance.
approximation is that it is only based on unconditional moments, which are much
easier to compute, or simulate, than conditional moments. Regarding efficiency issues, estimators based on conditional moments are more efficient than those based
on the unconditional version of the moments. The reason is that the score function is
a martingale, so we can obtain a close approximation to the score function by using
MGEFs. By suitably choosing the predictor space, the hope is that the PBEF will also
be a good approximation of the score function, and the resulting estimator will obtain
high efficiency.
In the rest of the chapter we will restrict our attention to finite dimensional predictor spaces, \mathcal{P}_{i-1}^\theta, and assume that the observed process Y_i is stationary.4 For asymptotic properties of the estimator in this setting, consult Sørensen (2000) and Sørensen (2011). In order to obtain even more tractable PBEFs, we will only consider (q+1)-dimensional predictor spaces with basis elements of the form Z_k^{(i-1)} = h_k(Y_{i-1}, \dots, Y_{i-s}), k = 0, \dots, q, where h_k : \mathbb{R}^s \mapsto \mathbb{R}, s \in \mathbb{N}, and where the functions h_0, h_1, \dots, h_q are linearly independent and do not depend on \theta. The predictor space used for predicting f(Y_i) is then given by \mathcal{P}_{i-1}^\theta = \text{span}\{Z_0^{(i-1)}, Z_1^{(i-1)}, \dots, Z_q^{(i-1)}\}. The basis elements of the predictor space are no longer assumed to be functions of the entire past, but are instead functions of the “most recent past” of a period of length s. To adapt to usual practice, and to ensure that the resulting MMSE predictor of f(Y_i) in \mathcal{P}_{i-1}^\theta becomes unbiased, we will assume h_0 = 1. The predictors in \mathcal{P}_{i-1}^\theta will therefore be of the form a_0 + a'Z^{(i-1)}, where a' = (a_1, \dots, a_q) and Z^{(i-1)} = (Z_1^{(i-1)}, \dots, Z_q^{(i-1)})'.5 The normal equations (1.4) lead to the MMSE predictor

\breve{\pi}^{(i-1)}(\theta) = a_0(\theta) + a(\theta)' Z^{(i-1)}, \qquad (1.5)

where a(\theta) = C(\theta)^{-1} b(\theta) and a_0(\theta) = E_\theta[f(Y_i)] - a(\theta)' E_\theta[Z^{(i-1)}]. C(\theta) denotes the q \times q covariance matrix of Z^{(i-1)}, and b(\theta) = \big[\text{Cov}_\theta(Z_1^{(i-1)}, f(Y_i)), \dots, \text{Cov}_\theta(Z_q^{(i-1)}, f(Y_i))\big]'. Note that, since the observed process Y_i is stationary, the coefficients of the MMSE predictor do not depend on i, but stay constant across time.6 For a formal derivation of the expressions for the coefficients in (1.5) see Appendix A.
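Numerically, the projection in (1.5) reduces to solving a linear system. The sketch below is generic, and the moment values in it are made up purely for illustration; in practice C, b and the means would come from the model:

```python
import numpy as np

def mmse_coefficients(C, b, mean_f, mean_Z):
    """Coefficients of the MMSE predictor a0 + a'Z:
    a = C^{-1} b and a0 = E[f(Y_i)] - a' E[Z^{(i-1)}]."""
    a = np.linalg.solve(C, b)  # solve C a = b rather than inverting C
    a0 = mean_f - a @ mean_Z
    return a0, a

# made-up moment structure for a q = 2 predictor space
C = np.array([[2.0, 0.5],
              [0.5, 2.0]])    # Cov(Z^{(i-1)})
b = np.array([0.8, 0.4])      # Cov(Z_k^{(i-1)}, f(Y_i)), k = 1, 2
a0, a = mmse_coefficients(C, b, mean_f=1.0, mean_Z=np.array([1.0, 1.0]))
# by construction the predictor is unbiased: a0 + a' E[Z] = E[f(Y_i)]
```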
From (1.3)–(1.5) it follows that PBEFs can be calculated provided that we can calculate the first- and second-order moments of the random vector \big(f(Y_i), Z_1^{(i-1)}, \dots, Z_q^{(i-1)}\big).
Within the setup of the finite dimensional predictor spaces considered above, we now turn to the specification of the p \times 1 vector \Pi^{(i-1)}(\theta) from (1.3). Since each element of the vector \Pi^{(i-1)}(\theta) belongs to the predictor space \mathcal{P}_{i-1}^\theta, the jth element of \Pi^{(i-1)}(\theta) is of the form \pi_j^{(i-1)}(\theta) = \sum_{k=0}^{q} a_{jk}(\theta) Z_k^{(i-1)}, where, as before, Z_0^{(i-1)} = 1. Note that the coefficients a_{jk}(\theta) do not depend on i but are, like the coefficients of
4 In the context of stochastic volatility models, Y_i will be the series of stationary asset returns.
5 In the case of the Heston model, the basis elements we will consider are of the form Z_k^{(i-1)} = Y_{i-k}^2, k = 1, \dots, q.
6 PBEFs with finite dimensional predictor spaces can also be computed for non-stationary processes, but in this case computing the MMSE predictor, \breve{\pi}(\theta), is a bit more complicated since the coefficients, a_0(\theta), \dots, a_q(\theta), become time-varying.
the MMSE, a(θ) and a0(θ), constant over time. Therefore, in order to ease notation,
we define
A(\theta) = \begin{pmatrix} a_{10}(\theta) & \cdots & a_{1q}(\theta) \\ \vdots & & \vdots \\ a_{p0}(\theta) & \cdots & a_{pq}(\theta) \end{pmatrix}_{p \times (q+1)}, \qquad H^{(i)}(\theta) = \begin{pmatrix} Z_0^{(i-1)} \big[ f(Y_i) - \breve{\pi}^{(i-1)}(\theta) \big] \\ \vdots \\ Z_q^{(i-1)} \big[ f(Y_i) - \breve{\pi}^{(i-1)}(\theta) \big] \end{pmatrix}_{(q+1) \times 1},
for i = 1, \dots, n and F_n(\theta) := \sum_{i=s+1}^{n} H^{(i)}(\theta).7 With this notation at hand we are considering PBEFs of the form

G_n(\theta) = A(\theta) F_n(\theta), \qquad (1.6)
where we need p ≤ q +1 to identify the p unknown parameters. Finding the optimal
PBEF within a class of PBEFs of the type (1.6) is then a question of choosing an optimal
weight matrix, A∗(θ). The resulting optimal PBEF will be the estimating function,
within the considered class of estimating functions of type (1.6), that is closest to the
score in an L^2-sense. Since we have asymptotic normality of the estimator, the PBEF
with the optimal choice of weight matrix will then give rise to the estimator with the
smallest asymptotic variance. For further details on the optimal PBEF see Sørensen
(2000) or Appendix B.
Relating PBEFs to GMM Estimation
The PBEFs and martingale estimating functions share many similarities with GMM
estimation from the econometrics literature. In this subsection we will explain the link
between the optimal PBEF and the optimal GMM estimator based on the moment
conditions from the normal equations. Throughout the chapter, optimal refers to
efficiency of the resulting estimator.
The PBEF-based estimator is obtained by solving G_n(\theta) = 0 for \theta, but for numerical reasons it is often easier to minimize G_n(\theta)' G_n(\theta) w.r.t. \theta \in \Theta. We employ this approach and find an estimator by solving

\min_{\theta \in \Theta} G_n(\theta)' G_n(\theta) = \min_{\theta \in \Theta} F_n(\theta)' A(\theta)' A(\theta) F_n(\theta).

This expression looks very similar to the GMM objective function that emerges if we perform GMM estimation using the q+1 moment conditions E_\theta[H(\theta)] = 0. In this case the GMM objective function to be minimized is \big(\tfrac{1}{n-s} F_n(\theta)\big)' W_n(\theta) \big(\tfrac{1}{n-s} F_n(\theta)\big), which is equivalent to minimizing F_n(\theta)' W_n(\theta) F_n(\theta). In the latter case, the corresponding p first order conditions are
2\, \underbrace{\big(\partial_\theta F_n(\theta)\big)'}_{p \times (q+1)} \, \underbrace{W_n(\theta)}_{(q+1) \times (q+1)} \, \underbrace{F_n(\theta)}_{(q+1) \times 1} = 0, \qquad (1.7)
7 Note that the sum starts at i = s+1 since Z_k^{(i-1)} is only well-defined for i \ge s+1.
if we evaluate the GMM weight matrix, W_n(\theta), at some consistent parameter estimate \bar{\theta} such that the weight matrix does not depend on \theta. The first order conditions (1.7) have the same structure as the PBEFs in (1.6). The only difference is that the term in front of F_n(\theta) in (1.7) becomes data dependent. However, it turns out that there is a strong link between (1.7) with W_n(\theta) chosen optimally and the optimal PBEF of type (1.6). The optimal PBEF takes the form G_n^*(\theta) = A_n^*(\theta) F_n(\theta), where A_n^*(\theta) = U(\theta)' \bar{M}_n(\theta)^{-1}, and the expressions for U(\theta) and \bar{M}_n(\theta)^{-1} can be found in Sørensen (2000) or Appendix B. Straightforward calculations show that

-\frac{1}{n-s}\, \partial_\theta F_n(\theta)' \xrightarrow{p} U(\theta)', \quad \text{as } n \to \infty. \qquad (1.8)
From the theory on GMM estimation, we know that the optimal choice of weight matrix, W_n(\theta), is the inverse of the covariance matrix of F_n(\theta), since the H^{(i)}'s are correlated. In the GMM setting this weight matrix will in practice be constructed using the sample version of the covariance matrix. When W_n(\theta) is chosen optimally, W_n(\theta) equals \frac{1}{n-s} \bar{M}_n(\theta)^{-1}, and (1.7) becomes (except for a factor of -\tfrac{1}{2}) the empirical analog of the optimal PBEF, G_n^*(\theta) = A_n^*(\theta) F_n(\theta). Constructing the optimal PBEF is therefore
the same as constructing the theoretical first order conditions that emerge from the
optimal GMM objective function based on the moment conditions, E_\theta[H(\theta)] = 0,
from the normal equations. The choice of f and predictor space then translates into
which moment conditions to use in the GMM estimation. These choices will therefore
also impact the efficiency of the resulting PBEF-based estimator. Once these choices
are made, the optimal PBEF-based estimator is linked to the optimal GMM estimator,
based on the moment conditions from the normal equations, as described above.
Note that a sub-optimal choice of the weight matrix W_n(\theta) will lead to a sub-optimal PBEF, but the class of PBEFs is in general broader than the ones having the structure (1.7).
PBEFs for SV-Models without Noise in the Data
We now return to the setup from (1.1) and, following Sørensen (2000), review how to
compute PBEFs for SV-models without MMS noise.
Suppose the process X has been observed at discrete time points X_0, X_\Delta, \dots, X_{n\Delta}. It is more convenient to base the statistical inference on the differences Y_i = X_{i\Delta} - X_{(i-1)\Delta}, since the process Y_i, in contrast to X_{i\Delta}, will be stationary when v_t is assumed stationary. In this setup, inference based on MGEFs becomes practically infeasible, since the conditional expectations appearing in the MGEFs, which are based on f(Y_i) - E_\theta[f(Y_i) \mid Y_{i-1}, \dots, Y_1], are difficult to compute analytically, as well as
numerically. One feasible approach for conducting inference is to use PBEFs instead.
In fact, for many models, such as the Heston model, we are able to derive analytical
expressions for the PBEFs. The continuous time returns from (1.1) are given by

Y_i = X_{i\Delta} - X_{(i-1)\Delta} = \int_{(i-1)\Delta}^{i\Delta} \sqrt{v_t}\,dW_t, \qquad (1.9)

which allows for the decomposition Y_i = \sqrt{S_i}\, Z_i, where the Z_i's are i.i.d. standard normal random variables independent of S_i, and where the process S_i is given by S_i = \int_{(i-1)\Delta}^{i\Delta} v_t\,dt. The distribution of v_t is the same on all intervals [0,\Delta), \dots, [(n-1)\Delta, n\Delta) because v_t is stationary, hence it follows that S_i and Y_i are stationary processes. Note that the Y_i's have zero mean and are uncorrelated, but not independent.
To construct PBEFs, we have to decide on which function of the data to predict. Since the Y_i's are uncorrelated, trying to predict Y_i using Y_{i-1}, Y_{i-2}, \dots, Y_{i-q} will not work, and f(y) = y is a bad choice. To match empirical data, where we have volatility clustering, squared returns from the considered SV-models are often correlated, and a natural choice for f would therefore be f(y) = y^2. The decomposition Y_i = \sqrt{S_i}\, Z_i also reveals that f(y) = y^2 is a convenient choice, as it eases the computation of the moments required to construct the PBEFs. Correlation between absolute returns tends to be even more persistent than the correlation between squared returns, and f(y) = |y| might also seem like a good choice. This choice will however complicate the computation of the moments needed for constructing the PBEF. The problem is that there is in general no simple way of relating the moments E_\theta[S_i^\eta] to the moments of the volatility process, unless \eta is an integer. For instance, computing E_\theta[\sqrt{S_i}] is not an easy task. Other choices of f might result in efficiency gains, but without further knowledge of the intractable score functions that we aim to estimate, we will stick to the class of polynomial PBEFs and use f(y) = y^2, as this offers computational ease.8
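The motivation for f(y) = y^2 can be illustrated on simulated Heston returns: raw returns are serially uncorrelated while squared returns are not. The sketch below uses a crude Euler discretization on a fine grid with hypothetical parameter values; it is meant only to exhibit the qualitative pattern, not to be an accurate simulation scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
kappa, alpha, sigma, dt = 5.0, 0.04, 0.5, 1e-3  # hypothetical parameters
n = 200_000
v = alpha
Y = np.empty(n)
for i in range(n):
    # return over one step given the current variance, then an Euler step for v
    Y[i] = np.sqrt(max(v, 0.0) * dt) * rng.normal()
    v += kappa * (alpha - v) * dt + sigma * np.sqrt(max(v, 0.0) * dt) * rng.normal()

def acf1(x):
    """Lag-one sample autocorrelation."""
    xc = x - x.mean()
    return float(np.mean(xc[1:] * xc[:-1]) / np.mean(xc * xc))

r_returns, r_squared = acf1(Y), acf1(Y**2)
# r_returns is close to zero; r_squared is clearly positive (volatility clustering)
```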
As our predictor spaces we choose

\mathcal{P}_{i-1}^\theta = \{ a_0 + a_1 Y_{i-1}^2 + \cdots + a_q Y_{i-q}^2 \mid a_k \in \mathbb{R},\ k = 0, 1, \dots, q \}.

This means that the predictor variables Z_k^{(i-1)} = Y_{i-k}^2 for k = 1, 2, \dots, q have the same functional form as the function of the data to predict.9 Notice that in this case s = q, since \mathcal{P}_{i-1}^\theta is spanned by the “most recent past of squared returns of length q”.10 With the above choice of f and predictor space, the MMSE predictor is given by

\breve{\pi}^{(i-1)}(\theta) = a_0(\theta) + a(\theta)' Z^{(i-1)}, \quad \text{with } Z^{(i-1)} = (Y_{i-1}^2, \dots, Y_{i-q}^2)',
a(\theta) = C^{-1}(\theta)\, b(\theta), \quad a_0(\theta) = E_\theta(Y_1^2)\,\big[1 - (a_1(\theta) + \cdots + a_q(\theta))\big]. \qquad (1.10)
8 Higher powers of y could also have been considered, but very high moments are often not reliable for empirical investigations, and we choose to stick with f(y) = y^2 as suggested in Sørensen (2000).
9 It should be noted that one does not have to choose a predictor space spanned by variables of the same functional form as f, even though it seems like the most natural choice.
10 When the volatility process v_t is \rho-mixing, the coefficients a_k(\theta) decrease exponentially with k, and q need not be very large, since q represents the “required” information for predicting f(Y) (see Thm. 3.3 in Bradley (2005)). Note that if the volatility process v_t is \alpha-mixing, then the observed process Y_i inherits this property and is also \alpha-mixing (see Lemma 6.3 in Sørensen (2000)).
As before, C denotes the covariance matrix of Z^{(i-1)}, and b is the q \times 1 vector with jth element given by \text{Cov}_\theta(Y_{i-j}^2, Y_i^2). Together with (1.6), this means that we are considering PBEFs of the form

G_n(\theta) = \sum_{i=q+1}^{n} \underbrace{\Pi^{(i-1)}(\theta)}_{p \times 1} \big[ Y_i^2 - (a_0(\theta) + a_1(\theta) Y_{i-1}^2 + \cdots + a_q(\theta) Y_{i-q}^2) \big], \qquad (1.11)

with \Pi^{(i-1)}(\theta) = A(\theta) Z^{(i-1)}, where Z^{(i-1)} here includes the constant: Z^{(i-1)} = (1, Y_{i-1}^2, \dots, Y_{i-q}^2)'. In our Monte Carlo study we will use the following sub-optimal, yet simple weight matrix

A(\theta) = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix},
since computing the optimal weight matrix A^*(\theta) involves computing the covariance matrix of F_n(\theta).11 The resulting sub-optimal PBEF is
G_n(\theta) = \sum_{i=q+1}^{n} \begin{pmatrix} 1 \\ Y_{i-1}^2 \\ Y_{i-2}^2 \end{pmatrix} \big[ Y_i^2 - (a_0(\theta) + a_1(\theta) Y_{i-1}^2 + \cdots + a_q(\theta) Y_{i-q}^2) \big]. \qquad (1.12)
Equating (1.12) to zero and solving for \theta gives a \sqrt{n}-consistent estimator, but we may lose some efficiency for not using the optimal weight matrix A^*(\theta). However, the aim of the chapter is to study the finite sample performance of an easily implementable and simulation-free PBEF. As we shall see in our Monte Carlo study, the estimator based on the sub-optimal PBEF performs well in finite samples, and a study of the possible further improvements from using the optimal PBEF is left for future research.
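Schematically, the sub-optimal PBEF in (1.12) is straightforward to code once the coefficient functions are available. In the sketch below the mapping \theta \mapsto (a_0(\theta), a(\theta)) is left as a user-supplied function, and the one used in the example is deliberately artificial, chosen so that the estimating function vanishes exactly on constant data; this illustrates the structure of (1.12), not the chapter's implementation:

```python
import numpy as np

def pbef(theta, y2, coef):
    """Sub-optimal PBEF (1.12): each summand weights the prediction error
    y_i^2 - (a0 + a1 y_{i-1}^2 + ... + aq y_{i-q}^2) by (1, y_{i-1}^2, y_{i-2}^2)'."""
    a0, a = coef(theta)            # model-implied predictor coefficients; len(a) = q >= 2
    q = len(a)
    G = np.zeros(3)
    for i in range(q, len(y2)):
        lagged = y2[i - q:i][::-1]  # (y_{i-1}^2, ..., y_{i-q}^2)
        err = y2[i] - (a0 + a @ lagged)
        G += np.array([1.0, y2[i - 1], y2[i - 2]]) * err
    return G

# artificial coefficient map for illustration: theta = (a0, a1, a2) directly
coef = lambda theta: (theta[0], np.asarray(theta[1:]))
y2 = np.ones(100)                    # constant "squared returns"
G = pbef((0.5, 0.3, 0.2), y2, coef)  # 0.5 + 0.3 + 0.2 = 1, so every prediction error is zero
```

An estimator is then obtained by driving G to zero, e.g. by minimizing G(\theta)'G(\theta) over \theta with a standard numerical optimizer.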
The sub-optimal PBEF from (1.12) can now be computed if we can calculate a_0(\theta), a_1(\theta), \dots, a_q(\theta). For this we need E_\theta[Y_i^2], \text{Var}_\theta(Y_i^2) and \text{Cov}_\theta(Y_i^2, Y_{i+j}^2) for j = 1, \dots, q, so we have to assume E_\theta[Y_i^4] < \infty for the MMSE predictor to be well-defined.12
The required moments can be calculated from the moments of the volatility process v_t. Now, define the mean, variance and autocorrelation function of the volatility process as \xi(\theta) := E_\theta[v_t], \omega(\theta) := \text{Var}_\theta(v_t), and r(u;\theta) := \text{Cov}_\theta(v_t, v_{t+u})/\omega(\theta). From (Barndorff-Nielsen and Shephard, 2001, pp. 179-181) it follows that

E_\theta[Y_i^2] = \Delta\, \xi(\theta), \qquad (1.13)
\text{Var}_\theta(Y_i^2) = 6\,\omega(\theta) R^*(\Delta;\theta) + 2\Delta^2 \xi(\theta)^2, \qquad (1.14)
\text{Cov}_\theta(Y_i^2, Y_{i+j}^2) = \omega(\theta)\big[R^*(\Delta(j+1);\theta) - 2R^*(\Delta j;\theta) + R^*(\Delta(j-1);\theta)\big], \qquad (1.15)
11 A task that involves computing E_\theta[Y_i^2 Y_j^2 Y_k^2 Y_1^2] for i \ge j \ge k. For further details on how to compute optimal PBEFs for stochastic volatility models, see Sørensen (2000). In Sørensen (2000) an analytical formula for the optimal PBEF for an affine SV-model, such as the Heston model, is also given. Even though an analytical expression for A^*(\theta) is in principle available, it is a very complicated expression and not easily implementable. In practice, a feasible strategy could be to simulate A^*(\theta).
12 From Jensen's inequality it follows that E_\theta[v_t^{\beta/2}] < \infty implies E_\theta[Y_i^\beta] < \infty for \beta \ge 2. For \beta \le 2, E_\theta[v_t] < \infty implies E_\theta[Y_i^\beta] < \infty.
where R^*(t;\theta) = \int_0^t \int_0^s r(u;\theta)\,du\,ds. In the Heston model the stationary distribution of v_t is the Gamma distribution with shape parameter 2\kappa\alpha\sigma^{-2} and rate parameter 2\kappa\sigma^{-2}, provided that \sigma > 0, \alpha > 0 (non-negativity), \kappa > 0 (stationarity in mean), and 2\kappa\alpha \ge \sigma^2 (stationarity in volatility). Thus, we have \xi(\theta) = \alpha, \omega(\theta) = \frac{\alpha\sigma^2}{2\kappa}, r(u;\theta) = e^{-\kappa u}, and R^*(t;\theta) = \frac{1}{\kappa^2}\big(e^{-\kappa t} + \kappa t - 1\big). The moments we need are then given by

E_\theta[Y_i^2] = \Delta\alpha, \qquad (1.16)
\text{Var}_\theta(Y_i^2) = \frac{6\alpha\sigma^2}{2\kappa^3}\big(e^{-\kappa\Delta} + \kappa\Delta - 1\big) + 2\Delta^2\alpha^2, \qquad (1.17)
\text{Cov}_\theta(Y_i^2, Y_{i+j}^2) = \frac{\alpha\sigma^2}{2\kappa^3}\, e^{-\kappa\Delta j}\big(e^{-\kappa\Delta} - 2 + e^{\kappa\Delta}\big). \qquad (1.18)
From the above derivations it is clear that PBEFs can easily be derived in other diffusion models where we can compute the mean, variance and autocorrelation structure of the volatility process. This is for instance the case when the volatility process belongs to the class of Pearson diffusions, which nests the CIR process, see Forman and Sørensen (2008).
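As a concrete illustration, the closed-form moments (1.16)-(1.18) are straightforward to implement. The following sketch (the function name is ours, not from the thesis) evaluates them for the Scenario 2 parameter values used later in the Monte Carlo study:

```python
import math

def heston_sq_return_moments(kappa, alpha, sigma, delta, j=1):
    """Mean, variance and lag-j autocovariance of squared returns in the
    Heston model, cf. equations (1.16)-(1.18)."""
    mean = delta * alpha                                              # (1.16)
    var = (6 * alpha * sigma**2 / (2 * kappa**3)) * \
          (math.exp(-kappa * delta) + kappa * delta - 1) \
          + 2 * delta**2 * alpha**2                                   # (1.17)
    cov = (alpha * sigma**2 / (2 * kappa**3)) * math.exp(-kappa * delta * j) * \
          (math.exp(-kappa * delta) - 2 + math.exp(kappa * delta))    # (1.18)
    return mean, var, cov

# Scenario 2 parameters with five-minute sampling (Δ = 1/78):
m, v, c1 = heston_sq_return_moments(0.10, 0.25, 0.10, 1 / 78)
```

Note that the autocovariances decay geometrically in j at rate e^{−κΔ}, mirroring the exponential autocorrelation of v_t.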
PBEFs for SV-Models with Noisy Data
We now add noise to the observation scheme from (1.9). More specifically, we add
i.i.d. Gaussian noise to the log-price process
X_i = X*_i + U_i,    U_i i.i.d. N(0, ω²),    (1.19)
where the efficient log-price process X ∗ comes from the Heston model and X ∗ and
U are assumed to be independent. The additive error term Ui will be interpreted as
market microstructure (MMS) noise due to market frictions such as bid-ask bounce,
liquidity changes, and discreteness of prices. When MMS noise is present, the ob-
served returns have the following structure
Y_i = X_i − X_{i−1} = (X*_i − X*_{i−1}) + (U_i − U_{i−1}) = Y*_i + ε_i,    (1.20)

where the MA(1) process, ε, is normally distributed, N(0, 2ω²), and independent of the efficient return process Y*.
To correct for MMS noise in the PBEFs, the moments used to construct the MMSE predictor have to be recalculated. That is, Eθ[Y_i²], Varθ(Y_i²) and Covθ(Y_i², Y_{i+j}²) need to be computed in the setting from (1.20).
Straightforward calculations give Eθ[Y_i²] = Δα + 2ω², since Y* and ε are independent and have mean zero. We can now derive the bias in α that can be expected to occur when performing the PBEF-based estimation procedure without correcting for MMS noise. If the MMS noise is not taken into account, the equation Eθ[Y_i²] = Δα is erroneously used for constructing the PBEF. The expected bias in α is therefore 2ω²/Δ, and, as we shall see, this quantity matches the bias
12 CHAPTER 1. PBEFS FOR STOCHASTIC VOLATILITY MODELS WITH NOISY DATA
found in our Monte Carlo study. Since we do not have an analytical expression for the estimators, we will not attempt to derive the bias encountered in κ and σ.
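Plugging in the noise level ω² = 0.001 and the five-minute spacing Δ = 1/78 used later in the Monte Carlo study gives the magnitude of this bias:

```python
# Expected bias in α from ignoring MMS noise: the estimation fits Δα to
# E[Y_i²] = Δα + 2ω², so α is inflated by 2ω²/Δ.
omega2 = 0.001   # noise variance used in the Monte Carlo study below
delta = 1 / 78   # five-minute returns over a 6.5-hour trading day
alpha = 0.25

bias = 2 * omega2 / delta        # absolute bias in α: 0.156
rel_bias = 100 * bias / alpha    # relative bias: 62.4 (%)
```

The 62.4% figure is exactly the relative bias in α reported for the noisy-data Monte Carlo experiment below.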
As for the variance of the squared returns, it follows from (1.20) that

Y_i² = Y*_i² + ε_i² + 2Y*_i ε_i,

and since the three terms are uncorrelated, we have that

Varθ(Y_i²) = Varθ(Y*_i²) + Varθ(ε_i²) + 4Varθ(Y*_i ε_i).    (1.21)
Given the structure of the noise process, which is normally distributed with mean zero and variance 2ω², we find that Varθ(ε_i²) = 8ω⁴. The efficient return process and the noise process are independent, and both have zero mean, so Varθ(Y*_i ε_i) = Eθ[ε_i²]Eθ[Y*_i²] = 2ω²Δα. Plugging this into (1.21) yields

Varθ(Y_i²) = Varθ(Y*_i²) + 8ω⁴ + 8ω²Δα.    (1.22)
Regarding the covariance structure of the squared returns, only the first-order covariance will change, due to the MA(1) structure in the return errors ε. By once again exploiting that Y* and ε are independent and both have mean zero, we obtain the following expression for the first-order covariance of the observed squared return series

Covθ(Y_i², Y_{i+1}²) = Covθ(Y*_i², Y*_{i+1}²) + Covθ(ε_i², ε_{i+1}²) = Covθ(Y*_i², Y*_{i+1}²) + 2ω⁴.    (1.23)
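Combining (1.16)-(1.18) with the corrections above gives the noise-adjusted moments in closed form. A sketch (the function name is ours):

```python
import math

def noisy_sq_return_moments(kappa, alpha, sigma, delta, omega2):
    """Moments of squared observed returns when i.i.d. N(0, ω²) MMS noise is
    added to the log-price: (1.16)-(1.18) plus the corrections (1.22)-(1.23).
    Sketch only; covariances at lags j >= 2 are unaffected by the noise."""
    mean_eff = delta * alpha                       # efficient-return moments
    var_eff = (6 * alpha * sigma**2 / (2 * kappa**3)) * \
              (math.exp(-kappa * delta) + kappa * delta - 1) \
              + 2 * delta**2 * alpha**2
    cov1_eff = (alpha * sigma**2 / (2 * kappa**3)) * math.exp(-kappa * delta) * \
               (math.exp(-kappa * delta) - 2 + math.exp(kappa * delta))
    mean = mean_eff + 2 * omega2                   # E[Y²] = Δα + 2ω²
    var = var_eff + 8 * omega2**2 + 8 * omega2 * mean_eff   # (1.22)
    cov1 = cov1_eff + 2 * omega2**2                         # (1.23)
    return mean, var, cov1
```

Setting ω² = 0 recovers the no-noise moments, which is a convenient consistency check.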
We can now compute the noise corrected version of the PBEF previously described. Note that we can choose to estimate the variance of the noise, ω², in a first step, for instance by plugging a non-parametric estimator into the noise corrected PBEF used for estimating κ, α and σ. Another approach would be to expand the parameter vector to θ = (κ, α, σ, ω²) and use the noise corrected PBEF to estimate all four parameters. In the latter approach one would have to choose a 4×(q+1) weight matrix, A(θ). This results in a 4×1 estimating function G_n(θ), such that our estimator θ is obtained by solving four equations in four unknowns. We will follow the latter approach and estimate all four parameters in one step. Since we have chosen q = 3 and wish to estimate ω², the weight matrix will be a 4×4 matrix. This means that the weight matrix can be ignored when solving G_n(θ) = 0, provided that A(θ) is invertible, and the sub-optimal PBEF we have considered so far will, in this setting, be optimal.
1.2.2 A GMM Estimator based on Moments of Integrated Volatility
In this subsection the GMM estimation procedure from Bollerslev and Zhou (2002)
is reviewed and extended to handle MMS noise. In Bollerslev and Zhou (2002), the
moment conditions for constructing the GMM estimator arise from the analytical
derivations of the conditional first- and second-order moments of the daily integrated
volatility (IV ) process. We will consider both a parametric and a non-parametric way
of accounting for the presence of MMS noise in the data used for constructing the
GMM estimator. In the parametric approach, the moment conditions are adjusted
to hold in the MMS noise setting, and in the non-parametric approach we use a
noise robust estimate of IV , namely the realized kernel (RK ) from Barndorff-Nielsen,
Hansen, Lunde, and Shephard (2008a).
The GMM Estimator without Noise in the Data
We now review the GMM estimator from Bollerslev and Zhou (2002) in the case where
we have observations from the Heston model without MMS noise. Since the daily
IV is latent, the realization of this time-series is approximated by the daily realized
variance (RV ). Replacing population moments of IV with sample moments of RV
results in an easy-to-implement GMM estimator. Once again, the statistical inference
will be based on the discretely sampled returns Yi = Xi∆ − X(i−1)∆, which we will
assume to be available at high frequencies. The GMM estimation method crucially
depends on the availability of high-frequency data, since high-frequency data will
ensure that RV is a good approximation of IV and, hence the moment conditions
will hold approximately for RV .
When considering the Heston model, the conditional moment conditions used
for constructing the GMM estimator are given by
Eθ[IV_{t+1,t+2} − δIV_{t,t+1} − β | G_t] = 0,
Eθ[IV²_{t+1,t+2} − H(IV²_{t,t+1}) − I(IV_{t,t+1}) − J | G_t] = 0,    (1.24)
where IV_{t,t+1} denotes the integrated volatility from day t to day t + 1 and G_t = σ(IV_{t−s−1,t−s} | s = 0, 1, 2, ...). The functions δ, β, H, I, and J are functions of the parameters κ, α, and σ and can be found in Appendix C. The functions δ and β only depend on the drift parameters κ and α, which is why the second moment condition is needed. For further details on the derivation of the two conditional moment conditions, see Bollerslev and Zhou (2002) or Appendix C. To get enough moment conditions to identify θ, the two moment conditions are augmented by IV_{t−1,t} and IV²_{t−1,t}, yielding a total of six moment conditions. By replacing daily IV with daily RV, and using the unconditional versions of these six moment conditions, we are now able to construct a feasible GMM estimator for the parameters of interest θ = (κ, α, σ).
Letting T denote the number of trading days, the feasible GMM estimator is then
given by
θ̂_T = argmin_θ ( (1/(T−2)) Σ_{t=1}^{T−2} f_t(θ) )′ W ( (1/(T−2)) Σ_{t=1}^{T−2} f_t(θ) ),    (1.25)
with W = S⁻¹, where S is a consistent estimate of the asymptotic covariance matrix of g_T(θ) = (1/(T−2)) Σ_{t=1}^{T−2} f_t(θ) and where f_t(θ) is given by

f_t(θ) = [ RV_{t+1,t+2} − δRV_{t,t+1} − β,
           RV²_{t+1,t+2} − H(RV²_{t,t+1}) − I(RV_{t,t+1}) − J,
           (RV_{t+1,t+2} − δRV_{t,t+1} − β) RV_{t−1,t},
           (RV²_{t+1,t+2} − H(RV²_{t,t+1}) − I(RV_{t,t+1}) − J) RV_{t−1,t},
           (RV_{t+1,t+2} − δRV_{t,t+1} − β) RV²_{t−1,t},
           (RV²_{t+1,t+2} − H(RV²_{t,t+1}) − I(RV_{t,t+1}) − J) RV²_{t−1,t} ]′.    (1.26)
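The moment vector (1.26) and the criterion (1.25) translate directly into code. In the sketch below the coefficient functions δ, β, H, I, J are passed in as precomputed scalars (their closed forms are in Appendix C and are not reproduced here), and the indexing convention rv[t] = RV over day t is our own:

```python
import numpy as np

def gmm_objective(coef, rv, W):
    """GMM criterion from (1.25)-(1.26), sketched.  `coef = (d, b, H, I, J)`
    are the scalar coefficient functions of θ from Appendix C (assumed given),
    `rv` is the daily realized-variance series, W the weight matrix."""
    d, b, H, I, J = coef
    rows = []
    for t in range(1, len(rv) - 1):
        e1 = rv[t + 1] - d * rv[t] - b                       # first moment condition
        e2 = rv[t + 1]**2 - H * rv[t]**2 - I * rv[t] - J     # second moment condition
        z = rv[t - 1]                                        # lagged RV instrument
        rows.append([e1, e2, e1 * z, e2 * z, e1 * z**2, e2 * z**2])
    g = np.mean(rows, axis=0)                                # g_T(θ), six elements
    return g @ W @ g                                         # quadratic form (1.25)
```

With a constant RV series and coefficients that make both moment errors vanish, the criterion is exactly zero, which gives a quick correctness check.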
Parametrically Correcting for Noisy Data in the GMM Approach
Recall that, when MMS noise is present, the observed returns have the following structure

Y_i = X_i − X_{i−1} = (X*_i − X*_{i−1}) + (U_i − U_{i−1}) = Y*_i + ε_i.
If we denote the realized variance based on the MMS noise contaminated returns by RV^MMS and the realized variance based on the efficient return process by RV*, we can rewrite RV^MMS over day t as

RV^MMS_{t,t+1} = RV*_{t,t+1} + Σ_{i=1}^m ε²_{i,t} + 2 Σ_{i=1}^m ε_{i,t} Y*_{i,t},    (1.27)

where the number of intra-day observations is given by m := Δ⁻¹.
The idea is to noise correct the GMM estimation approach from Bollerslev and Zhou (2002) by adjusting the moment conditions from (1.26) such that they hold for RV^MMS. In order to do so, we have to extend the filtration we condition on to a larger filtration, making RV^MMS measurable w.r.t. that filtration. The moment conditions from Bollerslev and Zhou (2002) were derived using the sigma-algebra G_t = σ(IV_{t−s−1,t−s} | s = 0, 1, 2, ...), which was approximated by G_t = σ(RV*_{t−s−1,t−s} | s = 0, 1, 2, ...). Instead of G_t, we will now consider the larger filtration, H_t, generated by RV*, the efficient return process, Y*, and the noise process, ε, up until the beginning of day t. Define

H_t := σ(RV*_{t−s−1,t−s}, Y*_{i,t−s−1}, ε_{i,t−s−1} | s = 0, 1, 2, ... and i = 1, 2, ..., m),
where Y*_{i,t−1} and ε_{i,t−1} for i = 1, 2, ..., m denote the intra-day returns of the efficient price process and the MMS noise process during day t − 1, respectively. We now consider how to extend the first conditional moment condition from (1.24), using the decomposition from (1.27):

Eθ[RV^MMS_{t+1,t+2} − δRV^MMS_{t,t+1} − β | H_t] = Eθ[RV*_{t+1,t+2} − δRV*_{t,t+1} − β | H_t]    (1.28)
    + Eθ[Σ_{i=1}^m ε²_{i,t+1} − δ Σ_{i=1}^m ε²_{i,t} | H_t]    (1.29)
    + 2Eθ[Σ_{i=1}^m ε_{i,t+1} Y*_{i,t+1} − δ Σ_{i=1}^m ε_{i,t} Y*_{i,t} | H_t].    (1.30)
Let us first consider (1.28). From the moment conditions used in the no noise case we know that we approximately get a zero when conditioning on G_t in (1.28) instead. Under the assumption that this approximation still holds when information on the efficient return series, Y*, up until time t is added to the sigma-algebra G_t, (1.28) will also approximately equal zero, since both the RV* and Y* series are independent of the noise process. We do not use overnight returns, so the MA(1) structure in the noise process only holds within the trading day. This means that it will not impact the calculation of the conditional expectation (1.29), which will just equal the unconditional expectation (1 − δ)2ω²m. Since, as just discussed, the noise process is independent of any realizations from previous days and has mean zero, the conditional expectation (1.30) will just equal zero. All in all, this leaves us with the noise adjusted moment condition

Eθ[RV^MMS_{t+1,t+2} − δRV^MMS_{t,t+1} − β − (1 − δ)2ω²m | H_t] ≈ 0.    (1.31)
As in the no MMS noise case, we will augment the conditional moment condition (1.31) by RV^MMS_{t−1,t} and (RV^MMS_{t−1,t})² to get two additional moment conditions.
Turning our attention to the second conditional moment condition from (1.24), we wish to compute

Eθ[(RV^MMS_{t+1,t+2})² − H(RV^MMS_{t,t+1})² − I(RV^MMS_{t,t+1}) − J | H_t].    (1.32)
This task is, however, not feasible, since it involves computing Eθ[Y*²_{i,t+1} | H_t] and Eθ[Y*²_{i,t} | H_t]. If this were possible, we could use these expressions to form martingale estimating functions and would not need to use PBEFs for the estimation of our SV-model. The problem is that we do not have an analytical expression for the conditional expectation of the squared returns during day t + 1 and day t given the filtration generated by the return series up until time t. Instead, we will settle for four moment conditions and simply use the unconditional expectation of (1.32), given by

Eθ[(RV^MMS_{t+1,t+2})² − H(RV^MMS_{t,t+1})² − I(RV^MMS_{t,t+1}) − J − K − L] ≈ 0,    (1.33)

where

K = (1 − H)(4m²ω⁴ + 4mω²α + 12mω⁴ − 4mω⁴ + 8ω²α),
L = −2mω²I.
The derivation of the moment condition above can be found in Appendix C.
Non-parametrically Correcting for Noisy Data in the GMM Approach
In the presence of i.i.d. MMS noise that is independent of the efficient log-price
process, Hansen and Lunde (2006) show that the bias in RV equals 2∆−1ω2. In fact,
the variance of RV also diverges to infinity as the sampling frequency increases. In
the setting with MMS noise, RV is no longer a consistent estimator of IV . We will
therefore use a noise robust estimate of IV when constructing the GMM estimator.
Instead of basing the estimation procedure on the time-series of daily RV , we will use
the time series of daily realized kernels (RK ) from Barndorff-Nielsen et al. (2008a).
The estimator is then constructed using the moment conditions (1.26), replacing RV with RK. We use the flat-top Tukey-Hanning₂ kernel, since the resulting RK is closest to being efficient in the setting of i.i.d. noise that is independent of the efficient price process. As for the bandwidth, H, we follow the asymptotic derivations from Barndorff-Nielsen et al. (2008a) and let H ∝ (1/Δ)^{1/2}, in order to obtain the optimal rate of convergence, (1/Δ)^{1/4}, of RK to IV.13
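A minimal sketch of a flat-top realized kernel with the Tukey-Hanning₂ weight k(x) = sin²((π/2)(1 − x)²) might look as follows; the end-point treatment (jittering) discussed in Barndorff-Nielsen et al. (2008a) is omitted for brevity:

```python
import math

def realized_kernel(returns, H):
    """Flat-top realized kernel estimate of IV from intra-day returns,
    using the Tukey-Hanning2 weight k(x) = sin^2((pi/2)(1-x)^2).
    Sketch only: no end-point (jittering) correction."""
    n = len(returns)

    def gamma(h):
        # h-th realized autocovariance of the return series
        h = abs(h)
        return sum(returns[i] * returns[i - h] for i in range(h, n))

    rk = gamma(0)
    for h in range(1, H + 1):
        # flat-top convention: the weight is evaluated at (h - 1) / H
        k = math.sin(math.pi / 2 * (1 - (h - 1) / H) ** 2) ** 2
        rk += k * (gamma(h) + gamma(-h))
    return rk
```

With a single nonzero return all autocovariances vanish and the estimator reduces to the realized variance, which serves as a basic sanity check.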
1.3 A Monte Carlo Study of the Finite Sample Performances
1.3.1 The Setup and the Case without Noisy Data
In this subsection and the following, the finite sample performances of the PBEF-
based estimator from Sørensen (2000) and the GMM estimator from Bollerslev and
Zhou (2002) are investigated in a Monte Carlo study. This subsection investigates
the potential of using the intra-day returns directly in the PBEF-based estimator in
a setting without MMS noise. The benchmark used for evaluating the performance
of the PBEF-based method is the GMM approach from the previous subsection. In
the next subsection, we first consider a setup with mild model misspecification, in
the sense that we now add MMS noise to the simulated data and investigate how
this impacts the two estimation methods. Afterwards, the performance of the noise
corrected estimation methods are studied.
The data used for constructing the estimators are simulated realizations from the
Heston model (1.2). We use a first-order Euler scheme to simulate the volatility- and
log-price processes. The log-price is sampled every 30 seconds in the artificial 6.5
hours of daily trading, for sample sizes of T = 100,400,1000 and 4000 trading days.
Using the simulated data, daily realized variances based on the artificial five-minute
returns are constructed. We will think of the five-minute returns as our available data.
Since we are using five-minute returns over 6.5 hours of trading, we have ∆= 1/78.
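The simulation design just described can be sketched as follows. We assume the no-leverage specification with independent Brownian motions and initialize the volatility at its long-run mean; both choices are ours for illustration:

```python
import numpy as np

def simulate_heston_rv(kappa, alpha, sigma, days, seed=0):
    """First-order Euler scheme for a no-leverage Heston model: log-price on a
    30-second grid over a 6.5-hour day (780 steps, time in days), with daily RV
    built from the 78 five-minute returns.  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    steps, dt = 780, 1.0 / 780
    v = alpha                                # start volatility at its long-run mean
    rv = np.zeros(days)
    for t in range(days):
        z = rng.standard_normal((steps, 2))  # independent shocks: no leverage
        incs = np.zeros(steps)
        for i in range(steps):
            vp = max(v, 0.0)                 # guard against Euler negativity
            incs[i] = np.sqrt(vp * dt) * z[i, 0]
            v += kappa * (alpha - v) * dt + sigma * np.sqrt(vp * dt) * z[i, 1]
        five_min = incs.reshape(78, 10).sum(axis=1)   # aggregate to 5-min returns
        rv[t] = np.sum(five_min**2)
    return rv
```

In Scenario 2 the daily RV series should fluctuate around the long-run mean α = 0.25.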
To get a better grasp of the finite sample performance of the estimator based on
PBEFs, as well as the GMM estimator, we conduct our Monte Carlo experiment in
three different scenarios of parameter configurations.
• Scenario 1: (κ, α, σ) = (0.03, 0.25, 0.10). The volatility process is highly persistent (near unit-root). The autocorrelation function is given by r(u;θ) = e^{−κu}, so the correlation between the volatility process sampled five minutes apart equals e^{−0.03/78} ≈ 0.9996. The half-life of the volatility process equals 23.1 days.
13 For further details on how the bandwidth is chosen, consult Section 4 of Barndorff-Nielsen et al. (2008a).
• Scenario 2: (κ,α,σ) = (0.10,0.25,0.10). Here we have a slightly less persistent
volatility process due to the increase in the mean-reversion parameter. The
half-life now equals 6.93 days.
• Scenario 3: (κ,α,σ) = (0.10,0.25,0.20). The local variance of volatility is now
increased. This process is also close to the non-stationary region since the
CIR process is stationary if and only if 2κα ≥ σ2, and here 2κα−σ2 = 0.01
(compared to 0.04 in scenario 2).
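The persistence figures quoted in the scenarios follow directly from κ; a quick check (time in days, Δ = 1/78):

```python
import math

# Sanity checks on the quantities quoted for the three scenarios.
def half_life(kappa):
    return math.log(2) / kappa           # solves e^{-kappa * h} = 1/2

def feller_margin(kappa, alpha, sigma):
    return 2 * kappa * alpha - sigma**2  # stationarity margin 2κα − σ²

print(round(math.exp(-0.03 / 78), 4))        # 0.9996: 5-min autocorrelation, Scenario 1
print(round(half_life(0.03), 1))             # 23.1 days, Scenario 1
print(round(half_life(0.10), 2))             # 6.93 days, Scenarios 2 and 3
print(round(feller_margin(0.10, 0.25, 0.20), 2))  # 0.01, Scenario 3
print(round(feller_margin(0.10, 0.25, 0.10), 2))  # 0.04, Scenario 2
```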
The same scenarios were considered in the Monte Carlo study conducted in Bollerslev
and Zhou (2002). In Bollerslev and Zhou (2002), the authors only consider the rather
large sample sizes T = 1000 and T = 4000, corresponding to 4 and 16 years of data.
Our Monte Carlo study therefore also contributes by investigating the usability of this
method when less data are available. We impose strict positivity of the parameter
estimates κ, α and σ and use the true values of θ as starting values in the numerical
routines. In each case the number of Monte Carlo replications is 1000.
An interesting question that arises when considering PBEFs is how to optimally choose q. No theory exists for this choice, as it would require knowledge of the intractable conditional expectation Eθ[f(Y_i) | Y_1, ..., Y_{i−1}] that we wish to approximate. One approach could be to consider the partial autocorrelation function of f(Y_i) and the functions used for predicting f(Y_i), and then choose q as the cut-off point where the function dies out. In our setting this corresponds to considering the partial autocorrelation function of the squared returns. However, inverting the covariance matrix C(θ) can cause numerical challenges and inaccuracies for large values of q. Instead, we start at the smallest interesting choice, q = 3, and later investigate the sensitivity w.r.t. the choice of q.14
When the parameters θ = (κ, α, σ) are estimated, we minimize Gn(θ)′Gn(θ) instead of solving Gn(θ) = 0. In the implementation of the GMM estimation procedure, we use continuously updated GMM, where the weight matrix is estimated simultaneously with the parameters θ. The asymptotic covariance matrix of gT(θ) is estimated using the heteroskedasticity and autocorrelation consistent estimator from Newey and West (1987). Regarding the lag length in the Bartlett kernel, we follow the rule-of-thumb from Newey and West (1987), that is, ⌊4(T/100)^{2/9}⌋. The results on the finite sample performance of the two estimation methods in the absence of noise are summarized in Table 1.1.
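For the four sample sizes considered, the rule-of-thumb gives the following Bartlett lag lengths:

```python
import math

def nw_lag(T):
    """Newey-West rule-of-thumb lag length, floor(4 * (T / 100) ** (2 / 9))."""
    return math.floor(4 * (T / 100) ** (2 / 9))

print([nw_lag(T) for T in (100, 400, 1000, 4000)])  # [4, 5, 6, 9]
```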
The potential of PBEFs is clear from Panel A of Table 1.1. The estimators are practically unbiased for the large sample sizes, T = 1000 and T = 4000, and the bias for the small sample sizes is of an acceptable size. For the small sample sizes, there is a small downwards bias in κ and a more pronounced upwards bias in σ. The root mean square relative errors (root MSRE) behave as expected, decaying with T and
14 q = 2 would automatically result in an optimal PBEF, because the weight matrix A(θ) would then be a 3×3 matrix and could be disregarded when solving Gn(θ) = 0.
Table 1.1. Performance of estimators in absence of noise.

Panel A: PBEF based estimator with q = 3.

                           Relative Bias (%)                                     Root MSRE (%)
T =            100             400             1000            4000             100     400     1000    4000

Scenario 1
κ = 0.03   -6.924 [0.295]  -3.386 [0.173]  -1.467 [0.111]  -0.279 [0.044]    11.97   6.648   3.975   1.499
α = 0.25   -1.983 [1.590]  -2.150 [0.893]  -0.929 [0.594]  -0.256 [0.311]    52.74   29.67   19.70   10.30
σ = 0.10   19.97  [0.974]   8.368 [0.434]   3.486 [0.260]   0.631 [0.093]    37.96   16.64   9.309   3.151

Scenario 2
κ = 0.10   -1.207 [0.113]  -0.165 [0.025]  -0.062 [0.018]  -0.012 [0.007]    3.934   0.842   0.584   0.245
α = 0.25    0.185 [0.570]  -0.021 [0.293]  -0.265 [0.186]  -0.096 [0.096]    18.91   9.726   6.172   3.175
σ = 0.10    2.940 [0.275]   0.348 [0.048]   0.132 [0.034]   0.026 [0.015]    9.594   1.623   1.146   0.486

Scenario 3
κ = 0.10   -4.418 [0.192]  -1.436 [0.111]  -0.373 [0.058]  -0.129 [0.025]    7.750   3.942   1.956   0.837
α = 0.25   -2.285 [1.101]  -1.244 [0.567]  -0.337 [0.364]   0.298 [0.184]    36.59   18.83   12.06   6.104
σ = 0.20   10.92  [0.527]   3.347 [0.254]   0.853 [0.122]   0.275 [0.051]    20.61   9.065   4.148   1.719

Panel B: GMM with daily realized variance from five-minute returns.

                           Relative Bias (%)                                     Root MSRE (%)
T =            100             400             1000            4000             100     400     1000    4000

Scenario 1
κ = 0.03   180.7  [10.34]  36.88  [2.127]  15.26  [1.110]   4.925 [0.518]    386.5   79.59   39.85   17.85
α = 0.25   226.5  [47.73]   1.956 [2.677]  -3.437 [0.620]  -2.567 [0.311]    1594    88.76   20.84   10.64
σ = 0.10   -3.781 [0.892]  -0.762 [0.367]   0.900 [0.232]   2.053 [0.127]    29.73   12.18   7.743   4.690

Scenario 2
κ = 0.10   54.14  [3.840]  13.60  [1.172]   7.114 [0.644]   2.370 [0.291]    138.3   41.15   22.51   9.948
α = 0.25   51.57  [24.82]  -2.793 [0.303]  -2.640 [0.187]  -1.885 [0.095]    824.0   10.43   6.747   3.668
σ = 0.10    3.609 [1.036]   5.573 [0.390]   6.584 [0.234]   6.977 [0.115]    34.52   14.07   10.16   7.946

Scenario 3
κ = 0.10   74.58  [4.084]  22.22  [1.215]  10.67  [0.673]   4.222 [0.323]    154.5   45.99   24.73   11.51
α = 0.25   33.65  [11.96]  -7.543 [0.585]  -5.095 [0.364]  -2.806 [0.181]    397.6   20.82   13.10   6.629
σ = 0.20   -0.839 [0.643]   0.402 [0.278]   1.301 [0.170]   2.038 [0.086]    21.31   9.234   5.796   3.504

The table reports the relative bias (with Monte Carlo std. error in brackets) and the root mean squared relative error (root MSRE) of the estimates from the sub-optimal PBEF and the RV-based GMM estimation procedure in the three different scenarios of parameter configurations. The number of Monte Carlo replications is 1000.
are roughly halved when the sample size grows from T = 100 to T = 400 and from
T = 1000 to T = 4000. The mean-reversion rate, κ, is most accurately estimated in
terms of root MSRE, whereas the other drift parameter, α, has the highest root MSRE.
All three parameters are easiest to estimate in Scenario 2, where the volatility process
is less persistent and less volatile. This could be because the other two scenarios are
closer to the non-stationary region where the Feller condition is violated. In Scenario
1, with a highly persistent volatility process, the root MSREs are higher compared to
the other two scenarios.
Turning our attention to Panel B of Table 1.1, we see that the GMM estimator from
Bollerslev and Zhou (2002) performs poorly when the sample size is small. In fact,
the results for T = 100 indicate that the method is not working for sample sizes this
small. For the larger sample sizes, our results match those found in Bollerslev and
Zhou (2002). The table reveals an upwards bias in κ and a smaller downwards bias in
α. The volatility of volatility parameter σ has a small, yet systematic, upwards bias
that actually seems to worsen when the sample size increases.15 The drift parameters
again appear to be easiest to estimate in Scenario 2. In contrast to the results for the
PBEF-based estimator, the most accurate estimates of σ are now found in Scenario 3,
where the volatility of volatility is high.
If we compare Panel A and B of Table 1.1, it is clear that the PBEF-based method
outperforms the GMM approach, especially when the sample size is small. The
informational content of 100 observations of daily realized variance is too small
to fully extract the dynamics of the underlying volatility process compared to 7800
observations of intra-day squared returns, and in general it seems that the PBEF-
based estimator is able to exploit the extra information contained in the intra-daily
returns. Furthermore, a GMM estimator based on 100 observations will, in general,
often result in inaccurate estimates. The gains from using PBEFs are most prominent
for the mean-reversion rate, κ, whereas the root MSREs for α are similar across
the two estimation methods for the larger sample sizes. The gains from using the
PBEF-based method might be even larger if the optimal PBEF was used. As already
discussed, the optimal PBEF could be constructed by simulating the optimal weight
matrix A∗(θ), but this would render the Monte Carlo study more time consuming.
Besides, the aim of the study is to investigate the potential of PBEFs by comparing the
performance of two easily implementable simulation-free estimation methods, and
the promising results for the sub-optimal PBEF only leave room for minor efficiency
gains. The study of the performance of the optimal PBEF is left for future research.
15 This bias in σ can, as found in Bollerslev and Zhou (2002), be explained by the variance of the discretization error u_{t,t+1} := RV_{t,t+1} − IV_{t,t+1}, since Barndorff-Nielsen and Shephard (2002) show that RV²_{t,t+1} is, for any fixed sampling frequency, an upwards biased estimator of IV²_{t,t+1}. To account for this discretization error, Bollerslev and Zhou (2002) introduce a nuisance parameter, γ, and approximate IV²_{t+1,t+2} by RV²_{t+1,t+2} − γ. We also implemented this simple discretization error correction and found, in line with Bollerslev and Zhou (2002), that it helps remove the systematic bias in σ, but also roughly doubles the root MSRE. We will therefore proceed without this correction in the rest of our Monte Carlo study. The results with discretization error correction are available upon request.
[Figure 1.1 about here: three panels plotting the normalized root MSRE for κ, α, and σ against the choice of q (q = 2, ..., 10), for Scenarios 1-3.]
Figure 1.1. Normalized root MSRE for the three parameter estimates in Scenarios 1-3, with T = 1000, plotted as a function of the number of predictor variables q. The root MSREs are in each case normalized to have a minimum of 1.
To investigate the optimal choice of q in the PBEF, the impact on the root MSRE from increasing q is examined. In Figure 1.1 the three different scenarios are considered for T = 1000, and the root MSREs for the three parameters are plotted against the choice of q. The shapes of the plots for κ and σ look almost identical. In Scenarios 1 and 3, where the volatility process is close to the non-stationary region, q = 3 seems to be the optimal choice for κ and σ. In Scenario 2, q = 6 appears to be the optimal choice for both parameters. Looking at the three plots for α, there does not seem to be much variation in the root MSRE across the choice of q. In Scenarios 1 and 3, where we are close to the non-stationary region, the root MSRE decreases when q increases. We nevertheless chose q = 3 in our Monte Carlo study, since the variations in the root MSREs are small. As already discussed, the optimal choice of q might depend on the PBEF under consideration, that is, on the choice of f and the functional form of the basis elements in the predictor space. In the rest of our Monte Carlo study we will fix q = 3.
1.3.2 Including Noise in the Observations
In this section, the impact of MMS noise on the parameter estimates from the two
estimation procedures is investigated. We consider the noise level ω² = 0.001, as this
choice is in line with the empirical estimates found for stock returns in Hansen and
Lunde (2006).16 First, we will simulate data from the Heston model with the inclusion
of MMS noise and then perform parameter estimation ignoring the presence of noise.
The resulting estimates will be analyzed in the next subsection. Then, in the following
subsection, the finite sample performance of the noise corrected estimators will be
investigated.
The Impact of Failing to Correct for the Presence of Noise
The finite sample performances of the two estimation methods without noise cor-
rection are summarized in Table 1.2. Panel A of the table reports the results for the
PBEF-based estimator, and Panel B reports the results for the GMM estimator based
on RV .
From the results it follows that the inclusion of MMS noise in the observed process
leads to biases in the parameters. Panel A of the table shows that the downwards bias
in κ has worsened in the presence of noise and the small downwards bias in α has
turned into a severe upwards bias. The bias in α matches exactly the expected bias, 2ω²/Δ, which in relative terms equals 62.4%. The upwards bias in σ has also worsened,
but α is the parameter that is most affected by ignoring the presence of noise. For all
three parameters, the highest root MSREs still occur in Scenario 1, and the lowest in
Scenario 2. In Panel B, the results for the performance of the GMM estimator based
on RV reveal that the bias in κ has roughly doubled compared to the no noise setting,
16 We also considered the noise level ω² = 0.0005; the results are available upon request.
Table 1.2. Performance of estimators in presence of noise, ω² = 0.001.

Panel A: PBEF based estimator with q = 3.

                           Relative Bias (%)                                     Root MSRE (%)
T =            100             400             1000            4000             100     400     1000    4000

Scenario 1
κ = 0.03   -9.624 [0.436]  -8.276 [0.239]  -7.992 [0.148]  -7.544 [0.055]    17.37   11.46   9.383   7.764
α = 0.25   60.40  [1.589]  60.22  [0.893]  61.46  [0.594]  62.14  [0.311]    80.14   67.11   64.53   62.99
σ = 0.10   36.08  [1.915]  22.65  [0.907]  19.58  [0.504]  17.29  [0.147]    73.02   37.63   25.74   17.96

Scenario 2
κ = 0.10   -5.750 [0.091]  -5.485 [0.036]  -5.490 [0.022]  -5.563 [0.012]    6.492   5.614   5.539   5.577
α = 0.25   62.66  [0.572]  62.40  [0.295]  62.13  [0.186]  62.31  [0.096]    65.46   63.16   62.44   62.39
σ = 0.10   12.88  [0.249]  11.92  [0.087]  11.90  [0.052]  12.06  [0.028]    15.30   12.27   12.03   12.09

Scenario 3
κ = 0.10   -8.494 [0.291]  -7.842 [0.132]  -7.647 [0.068]  -7.717 [0.031]    12.85   8.983   7.976   7.784
α = 0.25   60.10  [1.107]  61.18  [0.567]  62.12  [0.364]  62.71  [0.184]    70.42   64.00   63.28   63.01
σ = 0.20   24.43  [1.154]  18.38  [0.422]  17.16  [0.180]  17.16  [0.077]    45.39   23.10   18.17   17.35

Panel B: GMM with daily realized variance from five-minute returns.

                           Relative Bias (%)                                     Root MSRE (%)
T =            100             400             1000            4000             100     400     1000    4000

Scenario 1
κ = 0.03   251.2  [28.73]  48.49  [2.902]  24.77  [1.479]  13.07  [0.678]    982.2   107.7   54.93   26.00
α = 0.25   466.1  [55.44]  82.07  [11.53]  57.64  [0.615]  58.53  [0.310]    1890    390.9   61.13   59.42
σ = 0.10   -21.71 [1.531]  -23.15 [0.589]  -21.23 [0.373]  -19.48 [0.180]    55.06   30.28   24.56   20.37

Scenario 2
κ = 0.10   70.58  [7.012]  14.78  [1.829]  10.29  [1.020]   5.459 [0.459]    242.6   62.42   35.35   16.17
α = 0.25   212.4  [29.47]  59.98  [0.471]  59.10  [0.191]  59.67  [0.096]    998.5   61.98   59.44   59.76
σ = 0.10   -5.983 [1.735]  -10.01 [0.728]  -7.537 [0.409]  -7.136 [0.192]    57.75   26.13   15.51   9.559

Scenario 3
κ = 0.10   99.39  [9.789]  35.58  [1.518]  20.86  [0.797]  14.58  [0.395]    339.4   61.63   33.67   19.59
α = 0.25   210.8  [41.62]  54.90  [0.607]  56.99  [0.372]  58.11  [0.190]    1396    58.47   58.31   58.45
σ = 0.20   -21.13 [0.974]  -20.20 [0.291]  -19.61 [0.176]  -18.99 [0.091]    38.58   22.38   20.46   19.23

The table reports the relative bias (with Monte Carlo std. error in brackets) and the root mean squared relative error (root MSRE) of the estimates from the sub-optimal PBEF and the RV-based GMM estimation procedure in the three different scenarios of parameter configurations. The number of Monte Carlo replications is 1000.
and it is approximately twice the size of the bias in κ in Panel A. As for the other drift
parameter, the downwards bias in α has been turned into a severe upwards bias of a
similar size as the one found in Panel A. The sign of the bias in σ has also changed,
and the table now reports a downwards bias around the same size as the upwards
bias found in Panel A.
The root MSREs reported in Table 1.2 have all gone up compared to the no noise
case from Table 1.1, and the results show that failing to correct for noise strongly
impacts the parameter estimates. The long-run mean of volatility, α, appears to be
affected the most. As in the no noise setting of Table 1.1, the performance of the
GMM estimator is quite poor for small sample sizes, and it only produces trustworthy
results when T = 400 and we are in Scenario 2 or 3. When the larger sample sizes
T = 1000 and T = 4000 are considered, the root MSREs of κ are higher in Panel B, but
the impact of noise on the root MSREs of α and σ appears to be the same across the
two estimation methods.
Finite Sample Performances of the Noise Corrected Estimation Procedures
The performances of the estimator based on the noise corrected PBEF and the GMM
estimator where noise is corrected for parametrically are reported in Table 1.3. Table
1.4 presents the results for the GMM estimator based on RK .
The results in Panel A of Table 1.3 show that the noise corrected PBEF based
estimation procedure is able to correctly account for the presence of noise and
produce unbiased estimates for the larger sample sizes. In fact, the small upwards
bias in σ found in Table 1.1 has decreased significantly for the smaller sample sizes,
and it disappears as the sample size grows. The variance of the noise process, ω2, is
also accurately estimated. The root MSREs for κ are in general very similar to those
reported in Table 1.1. For the other drift parameter, α, the root MSREs have now
gone down compared to the no noise setting. Due to the bias reduction in σ, the root MSREs for σ are also lower for the smaller sample sizes T = 100 and T = 400 than those reported in Table 1.1. For the larger sample sizes T = 1000 and T = 4000, the
root MSREs for σ are a bit higher or similar to those found in Table 1.1. For all three
parameters, the root MSREs are smallest in Scenario 2, as was also the case in the no
noise setting.
The performance of the parametrically noise corrected GMM estimator is sum-
marized in Panel B of Table 1.3. An inspection of the results shows that although the
GMM estimator does not produce unbiased estimates, it succeeds at incorporating the noise and produces estimates with small biases, comparable to those from Table 1.1.
In fact, the biases are now a bit lower than in Table 1.1, with the difference being most
apparent in Scenario 3. The results for the smallest sample size, T = 100, appear unre-
liable, in the sense that 100 observations are simply not enough to infer the dynamics
of the underlying volatility process and produce estimates with an acceptable level of
bias and root MSREs. Comparing the root MSREs to those reported in Table 1.1, we
24 CHAPTER 1. PBEFS FOR STOCHASTIC VOLATILITY MODELS WITH NOISY DATA
Table 1.3. Performance of noise corrected estimators, ω2 = 0.001.

Panel A: PBEF based estimator with q = 3.

                          Relative Bias (%)                                             Root MSRE (%)
             T = 100         T = 400         T = 1000        T = 4000           100    400    1000   4000

Scenario 1
κ = 0.03     -2.119 [0.398]  -1.157 [0.190]  -0.302 [0.106]  -0.191 [0.060]     13.35  6.388  3.535  1.986
α = 0.25     -0.945 [1.166]  -1.326 [0.657]  -0.549 [0.432]  -0.130 [0.225]     38.66  21.81  14.35  7.475
σ = 0.10      8.567 [0.676]   3.595 [0.414]   0.985 [0.221]   0.505 [0.123]     24.00  14.17  7.389  4.097
ω2 = 0.001   -1.680 [0.699]  -1.359 [0.386]  -0.624 [0.263]  -0.200 [0.139]     23.22  12.87  8.742  4.614

Scenario 2
κ = 0.10      0.230 [0.156]   0.090 [0.047]   0.019 [0.030]   0.020 [0.014]     5.176  1.567  0.993  0.469
α = 0.25      0.185 [0.413]   0.016 [0.213]  -0.185 [0.135]  -0.060 [0.069]     13.71  7.068  4.464  2.300
σ = 0.10      0.023 [0.189]  -0.114 [0.085]  -0.009 [0.058]  -0.033 [0.029]     6.278  2.805  1.928  0.947
ω2 = 0.001    0.121 [0.261]  -0.023 [0.134]  -0.130 [0.084]  -0.052 [0.044]     8.652  4.456  2.796  1.456

Scenario 3
κ = 0.10     -1.743 [0.247]  -0.297 [0.110]  -0.134 [0.077]  -0.088 [0.042]     8.356  3.649  2.554  1.403
α = 0.25     -1.350 [0.817]  -0.742 [0.416]  -0.163 [0.267]   0.251 [0.134]     27.13  13.79  8.865  4.454
σ = 0.20      5.617 [0.520]   0.991 [0.226]   0.466 [0.159]   0.235 [0.086]     18.13  7.545  5.290  2.864
ω2 = 0.001   -1.512 [0.483]  -0.761 [0.260]  -0.178 [0.170]   0.096 [0.082]     16.07  8.661  5.648  2.731

Panel B: GMM with daily realized variance from five-minute returns.

                          Relative Bias (%)                                             Root MSRE (%)
             T = 100         T = 400         T = 1000        T = 4000           100    400    1000   4000

Scenario 1
κ = 0.03      227.0 [36.85]   37.05 [2.709]   13.99 [1.452]   3.166 [0.683]     1226   97.15  50.13  22.88
α = 0.25      504.5 [94.12]   1.871 [2.559]  -2.444 [0.468]  -1.674 [0.228]     3118   84.86  15.71  7.732
σ = 0.10      2.073 [5.774]  -5.076 [0.816]  -1.670 [0.440]   0.909 [0.216]     188.8  27.52  14.70  7.209
ω2 = 0.001    22.76 [3.409]   5.935 [1.124]  -0.732 [0.300]  -1.308 [0.143]     113.8  37.74  9.987  4.912

Scenario 2
κ = 0.10      78.30 [6.307]   11.78 [1.773]   6.247 [1.000]   1.541 [0.459]     223.1  59.96  33.75  15.29
α = 0.25      209.6 [59.22]  -2.932 [0.511]  -1.869 [0.147]  -1.248 [0.071]     1973   17.21  5.219  2.662
σ = 0.10      6.528 [2.338]   3.196 [0.832]   6.355 [0.451]   6.846 [0.214]     77.73  27.78  16.25  9.855
ω2 = 0.001    18.27 [3.274]   1.651 [0.392]  -1.276 [0.118]  -1.736 [0.050]     110.0  13.11  4.109  2.401

Scenario 3
κ = 0.10      69.43 [4.119]   17.70 [1.287]   7.345 [0.721]   2.263 [0.373]     153.1  46.20  25.01  12.57
α = 0.25      136.3 [54.52]  -2.967 [0.430]  -2.182 [0.266]  -1.422 [0.134]     1811   14.57  9.088  4.674
σ = 0.20     -1.539 [1.072]  -0.614 [0.342]   0.546 [0.208]   1.784 [0.109]     35.54  11.35  6.919  4.020
ω2 = 0.001    15.14 [4.856]  -1.739 [0.270]  -1.611 [0.162]  -1.402 [0.081]     161.5  9.123  5.597  3.035

The table reports the relative bias (with Monte Carlo std. error in brackets) and the root mean squared relative error (root MSRE) of the estimates from the noise corrected versions of the sub-optimal PBEF and the RV-based GMM estimation procedure in the three different scenarios of parameter configurations. The number of Monte Carlo replications is 1000.
Table 1.4. Performance of the GMM estimator based on RK, ω2 = 0.001.

                          Relative Bias (%)                                             Root MSRE (%)
             T = 100         T = 400         T = 1000        T = 4000           100    400    1000   4000

Scenario 1
κ = 0.03      282.0 [26.76]   36.34 [3.053]   13.81 [1.716]   3.484 [0.834]     928.8  107.5  58.55  27.85
α = 0.25      398.7 [62.41]   13.14 [5.635]  -4.466 [0.719]  -4.056 [0.308]     2103   187.3  24.25  10.98
σ = 0.10      15.83 [2.544]   3.289 [1.138]   6.062 [0.802]   9.292 [0.423]     85.63  37.85  27.26  16.83

Scenario 2
κ = 0.10      179.6 [30.65]   12.80 [2.324]   5.320 [1.375]   1.527 [0.655]     1031   78.10  45.88  21.75
α = 0.25      206.5 [32.43]   4.226 [2.124]  -3.348 [0.407]  -3.674 [0.095]     1094   70.54  13.89  4.840
σ = 0.10      50.92 [6.523]   23.22 [1.409]   26.45 [0.899]   28.85 [0.425]     222.1  52.15  39.85  32.10

Scenario 3
κ = 0.10      75.02 [4.856]   19.33 [1.403]   8.028 [0.779]   2.546 [0.389]     177.5  50.37  27.03  13.14
α = 0.25      166.0 [35.48]  -7.895 [0.584]  -5.964 [0.356]  -4.141 [0.179]     1187   20.90  13.23  7.238
σ = 0.20      2.936 [1.112]   6.264 [0.499]   7.654 [0.327]   9.132 [0.178]     36.95  17.69  13.26  10.87

The table reports the relative bias (with Monte Carlo std. error in brackets) and the root mean squared relative error (root MSRE) of the estimates from the RK-based GMM estimation procedure in the three different scenarios of parameter configurations. The number of Monte Carlo replications is 1000.
find that the root MSREs are in general similar to the no noise setting, with the root
MSREs for κ and σ being a bit bigger than in Table 1.1 and the root MSREs for α a bit
smaller. The patterns found in Panel B of Table 1.1 across the different scenarios are repeated: the drift parameters have lower root MSREs in Scenario 2, whereas σ is most accurately estimated in Scenario 3.
When comparing Panel A and B of Table 1.3, it is clear that the PBEF-based
estimation method produces more accurate parameter estimates. The root MSREs
of α are quite similar for the two methods, but for the other two parameters and the noise variance the root MSREs are lower in Panel A. The difference between the two
methods is most prominent for the mean-reversion parameter, which is extremely
well estimated with the PBEF-based method.
We also considered estimating the Heston model using noisy data, by simply
replacing RV with a realized kernel, RK , in the original moment conditions from
Bollerslev and Zhou (2002). The finite sample performance of this estimator is sum-
marized in Table 1.4. Even though the estimator is based on six moment conditions,
compared to the four moment conditions used for constructing the parametrically
noise corrected estimator, the performance is not nearly as good. The biases in the
parameters are larger, but have the same signs as in Panel B of Table 1.3. The small
systematic bias in σ due to the discretization error and noisy data has also increased
compared to Table 1.1. This could be explained by the slower rate of convergence of
RK to IV , compared to the convergence rate of RV . The biases in the drift parameters
are however comparable to those reported in the no noise setting, and for κ the bias
is in fact a bit lower. The two ways of correcting for noise in the GMM approach give
rise to somewhat similar results for the mean-reversion rate κ, but the root MSREs
are lower for α and σ when the noise corrected estimator based on RV is employed.
In the parametrically noise corrected GMM approach, the noise variance ω2 is also estimated. This was not possible in the non-parametric way of accounting for noise. Of
course, not having to specify the dynamics of the noise process could be an advantage
in other applications where model misspecification might occur. The investigation
of robustness towards misspecification of the noise process is outside the scope of
this chapter. Note that the noise specification also impacts the optimal choice of
kernel used for constructing the time-series of RK . In our setting, without model
misspecification, the parametric approach of correcting for the noise outperforms
the non-parametric approach.
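The two volatility proxies compared above can be sketched as follows; the Parzen weight function is standard, but the non-flat-top kernel form and the bandwidth H = 5 are illustrative assumptions rather than the exact implementation used in the chapter:

```python
import numpy as np

def parzen(x):
    """Parzen kernel weight with k(0) = 1 and support on [0, 1]."""
    x = abs(x)
    if x <= 0.5:
        return 1 - 6 * x**2 + 6 * x**3
    if x <= 1.0:
        return 2 * (1 - x)**3
    return 0.0

def realized_variance(r):
    """Daily realized variance: the sum of squared intra-day returns."""
    return float(np.sum(r**2))

def realized_kernel(r, H):
    """RV plus Parzen-weighted realized autocovariances (non-flat-top form)."""
    rk = float(np.sum(r**2))
    for h in range(1, H + 1):
        gamma_h = float(np.sum(r[h:] * r[:-h]))     # h-th realized autocovariance
        rk += 2 * parzen(h / (H + 1)) * gamma_h     # symmetric terms gamma_h and gamma_{-h}
    return rk

rng = np.random.default_rng(1)
r = 0.001 * rng.standard_normal(77)   # one day of hypothetical five-minute returns
rv = realized_variance(r)
rk = realized_kernel(r, H=5)
```

For noise-free i.i.d. returns the autocovariance terms are small, so RK and RV are close; in the presence of MMS noise it is the kernel weighting that removes the noise-induced bias of RV.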
In conclusion, the PBEF-based estimation method produced promising results,
and it appears that the method is able to exploit the extra information contained in the intra-day returns by using their dynamics directly, instead of aggregating them into
realized measures. When little data history is available, the difference between the
two methods also becomes more pronounced, as the PBEF-based method is based on
moments of high frequency data and not on moments of the daily realized measures.
The PBEF-based method also handles the presence of noise better than the GMM
approach, at least in our simulation setting.
It still remains to be investigated how the two estimation methods perform in an empirical application to economic data, such as stock returns, a task that we will
undertake in the following section.
1.4 Empirical Application
In this section we use actual five-minute returns as input in the two estimation
methods analyzed in our Monte Carlo study. We are well aware that the Heston model
might not fit the chosen data. This is however not the purpose of this exercise. The
empirical application should rather be seen as a check of what happens when the
estimation methods are used to fit a (possibly misspecified) model to real data. The
empirical application is also an investigation of how different choices, such as how to
correct for MMS noise in the GMM estimator and the choice of predictor space in the
flexible PBEF-based method, might affect the parameter estimates.
1.4.1 Data description
For our empirical illustration we use five-minute returns for SPDR S&P 500 (SPY).
SPY is an exchange traded fund (ETF) that tracks the S&P 500. The sample covers
the period from January 4, 2010 through December 31, 2013. We sample the first
price each day at 9:35 and then every 5 minutes until the close at 16:00. Thus, we
have 77 daily five-minute returns for each of the 1006 trading days in our sample,
yielding a total of 77462 five-minute returns. Careful data cleaning is important when
estimating volatility models from high-frequency data. Numerous problems and
solutions are discussed in Falkenberry (2001), Hansen and Lunde (2006), Brownlees
and Gallo (2006) and Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008b). We
follow the step-by-step cleaning procedure used in Barndorff-Nielsen et al. (2008b).
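The sampling scheme described above is easy to verify in code; the date below is arbitrary:

```python
from datetime import datetime, timedelta

# First price sampled at 9:35, then every five minutes until the 16:00 close.
start = datetime(2010, 1, 4, 9, 35)
close = datetime(2010, 1, 4, 16, 0)

grid = []
t = start
while t <= close:
    grid.append(t)
    t += timedelta(minutes=5)

n_prices = len(grid)              # price observations per day
n_returns = n_prices - 1          # five-minute returns per day
total_returns = n_returns * 1006  # 1006 trading days in the sample
```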
As a first inspection of the data characteristics, we consider the empirical auto-
correlation functions for the squared five-minute returns, reported in the top panel
of Figure 1.2. The autocorrelation function does not seem to be exponentially de-
caying, revealing that the Heston model will not be able to properly account for the
dynamics of the data. However, our main interest lies in investigating whether the two
estimation methods will yield similar parameter estimates or whether they perform
differently. The autocorrelation function also exhibits cyclical patterns corresponding
to a lag length of one trading day. This is due to the well-documented intra-day peri-
odicity in volatility in foreign exchange and equity markets, see for instance Andersen
and Bollerslev (1997) or Dacorogna, Müller, Nagler, Olsen, and Pictet (1993).
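A minimal sketch of the empirical ACF computation; the synthetic series below has a multiplicative period-77 pattern, mimicking the intra-day cycle, and is only meant to reproduce the qualitative shape of the top panel of Figure 1.2, not the SPY data themselves:

```python
import numpy as np

def acf(x, max_lag):
    """Empirical autocorrelation function for lags 1, ..., max_lag."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(xc**2)
    return np.array([np.sum(xc[h:] * xc[:-h]) / denom for h in range(1, max_lag + 1)])

rng = np.random.default_rng(7)
n = 77 * 250                                                  # 250 synthetic "days"
pattern = 1 + 0.5 * np.cos(2 * np.pi * np.arange(n) / 77)     # period-77 intra-day factor
x = pattern * rng.chisquare(1, size=n)                        # stand-in for squared returns
rho = acf(x, max_lag=154)
```

The deterministic pattern shows up as local peaks in the ACF at multiples of 77 lags, which is exactly the cyclical behaviour visible in the figure.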
[Figure: two panels plotting the empirical ACF against lag length j = 1, . . . , 770 (ticks at multiples of 77, i.e. one trading day), with values ranging from about −0.02 to 0.14; top panel: squared five-minute returns, bottom panel: adjusted squared five-minute returns.]

Figure 1.2. Autocorrelation function for the squared five-minute returns (top) and the adjusted squared five-minute returns (bottom) on SPY.
The intra-daily periodicity in the volatility might cause the two estimation meth-
ods to perform differently. The intra-daily periodicity should not affect the GMM esti-
mator much, since the intra-daily pattern in volatility will be “smoothed out” when
the five-minute returns are aggregated into the daily realized measures. The same
does not apply for the estimator based on PBEFs, since this estimator is based directly
on the squared five-minute returns. Hence, the intra-daily periodicity might affect
the parameter estimates when the PBEF-based estimation method is carried out. In
order to avoid this, the intra-daily volatility pattern is captured by fitting a spline function to the intra-daily averages of the squared five-minute returns using a non-parametric kernel regression.17
The data are then adjusted for periodicity in intra-daily volatility by dividing the
squared returns by the fitted values from the spline function, matched according to
the intra-daily five-minute interval in which the observation falls. Finally, the squared
returns are normalized such that the overall variance of the squared returns remains
unchanged. The bottom panel of Figure 1.2 displays the autocorrelation function
for the adjusted data. From the figure it is clear that the intra-daily periodicity has
been removed. It is however also evident that the autocorrelation function is not
exponentially decaying, rendering the Heston model a poor model choice. There
seems to be a need for at least a two factor SV model, in order to properly capture
the dynamics of the autocorrelation function. One factor is needed for capturing
the fast decay in the autocorrelation function at the short end, whereas the other
factor should be slowly mean-reverting and thereby account for the persistent or long
memory-like factor in the variance.
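The three adjustment steps above (intra-day averages, division by the fitted pattern, renormalization) can be sketched as below. The MATLAB smoothing spline of footnote 17 is replaced by a crude moving-average smoother, so this is an illustration of the procedure, not the exact implementation:

```python
import numpy as np

def adjust_periodicity(sq_returns):
    """Remove the intra-day volatility pattern from squared five-minute returns.

    sq_returns: (n_days, 77) array of squared returns, one row per trading day.
    """
    pattern = sq_returns.mean(axis=0)                    # intra-day averages per interval
    kernel = np.ones(5) / 5
    smooth = np.convolve(pattern, kernel, mode="same")   # stand-in for the smoothing spline
    adjusted = sq_returns / smooth                       # divide by the fitted intra-day factor
    # renormalize so the overall variance of the squared returns is unchanged
    adjusted *= sq_returns.std() / adjusted.std()
    return adjusted

rng = np.random.default_rng(2)
pattern = 1 + 0.5 * np.cos(np.linspace(0, 2 * np.pi, 77))   # U-shaped intra-day factor
sq = pattern * rng.chisquare(1, size=(250, 77))             # synthetic squared returns
adj = adjust_periodicity(sq)
```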
1.4.2 Estimation results
In the Heston model, the decay rate of the autocorrelation function for the squared
returns is uniquely governed by the mean-reversion parameter κ. Due to the dynamic
structure of the autocorrelation function, discussed above, the choice of prediction
space might heavily influence the estimated value of κ. Depending on the largest
time lag of past squared returns included in the predictor space, different dynamics
might be captured. When fitting the Heston model to the adjusted data, we hold the
dimension of the predictor space fixed at 4 (q = 3), but consider four different choices
of basis elements, spanning the predictor space. The four cases correspond to having
1 hour, 1 day, 2 days, and 4 days between each of the basis elements.18 The time lag
between the basis elements will be denoted by the variable l . The simple choice of
weight matrix used in the Monte Carlo study is also employed when constructing the
PBEFs. We will only consider the noise corrected estimators, as MMS noise is a stylized
feature of the present data. The variance of the noise process is not estimated in the
GMM approach based on RK , so instead we report the non-parametric estimate of
the daily MMS noise variance, using the formula favored in Barndorff-Nielsen et al.
(2008a)
ω̂2 = exp[log(ω̃2) − RK/RV],

with RK and RV constructed using the intra-daily adjusted five-minute returns and where ω̃2 = RV/(2m) denotes the naive noise variance estimate. In our case m = 77, and the overall estimate of the MMS noise variance is found by averaging the 1006 daily estimates. For the SPY data we find ω2avg = 0.001563. The realized kernel is now computed using the Parzen kernel with H ∝ (1/∆)^(3/5), as recommended in Barndorff-Nielsen et al. (2008b) for empirical applications. The obtained convergence rate of RK to IV is now (1/∆)^(1/5). The choice of bandwidth H that resulted in the convergence rate of (1/∆)^(1/4) in our Monte Carlo study relies heavily on the assumption of i.i.d. MMS noise, which might not hold in practice.

17 We use the fit function in MATLAB with a smoothing spline, and we set the smoothing parameter equal to 0.001.

18 That is, in the first case we let the predictor space be spanned by Y²ᵢ₋₁, Y²ᵢ₋₁₃, Y²ᵢ₋₂₅ and a constant. In the second case we choose Y²ᵢ₋₁, Y²ᵢ₋₇₈, Y²ᵢ₋₁₅₅ and a constant as the basis elements of the predictor space, and so on.
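The averaging of the daily noise-variance estimates can be sketched as follows. The realized kernel is replaced here by a crude stand-in (RV plus one autocovariance term), so only the structure of the formula, ω̂2 = exp[log(ω̃2) − RK/RV] with ω̃2 = RV/(2m), is illustrated:

```python
import numpy as np

def noise_variance_avg(returns_by_day):
    """Average of the daily MMS noise variance estimates.

    returns_by_day: (n_days, m) array of adjusted five-minute returns.
    """
    n_days, m = returns_by_day.shape
    omega2 = np.empty(n_days)
    for d, r in enumerate(returns_by_day):
        rv = np.sum(r**2)
        rk = rv + 2 * np.sum(r[1:] * r[:-1])  # illustrative stand-in for the realized kernel
        omega2_naive = rv / (2 * m)           # naive estimate; RV also contains IV, hence the correction
        omega2[d] = np.exp(np.log(omega2_naive) - rk / rv)
    return omega2.mean()

rng = np.random.default_rng(3)
r = 0.001 * rng.standard_normal((1006, 77))   # hypothetical sample of daily return rows
w2_avg = noise_variance_avg(r)
```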
The results from fitting the Heston model to the data, using the various estimators,
are reported in Table 1.5. Computation of asymptotic standard errors for the PBEF
estimator is challenging. It involves computation of the matrix Mn(θ), which also enters the expression for the optimal weight matrix A∗(θ); this was in fact the main reason why we focused on the sub-optimal PBEF in our Monte Carlo study. Therefore, we
resort to bootstrap methods for computing standard errors. The standard errors and
95% confidence intervals (CI) reported in Table 1.5 are computed using the moving
block bootstrap, as recommended in Lahiri (1999). The confidence intervals are
equal tail intervals, constructed using the percentile method. As for the block length in the bootstrap method, T^(1/3) (≈ 10) days is the rule-of-thumb advocated in Hall, Horowitz, and Jing (1995) for constructing standard errors. Due to the strong
persistence in the data, we choose to be conservative and use a block length of 20
days.
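The moving block bootstrap used for the standard errors and equal tail CIs can be sketched as follows; `stat` stands in for the estimator of interest, which here is simply the sample mean of a synthetic daily series:

```python
import numpy as np

def moving_block_bootstrap(data, block_len, n_boot, stat, seed=0):
    """Moving block bootstrap std. error and 95% equal tail percentile CI."""
    rng = np.random.default_rng(seed)
    n = len(data)
    n_blocks = -(-n // block_len)                 # ceil(n / block_len)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        resample = np.concatenate([data[s:s + block_len] for s in starts])[:n]
        reps[b] = stat(resample)
    se = reps.std(ddof=1)
    ci = np.percentile(reps, [2.5, 97.5])         # equal tail percentile interval
    return se, ci

rng = np.random.default_rng(4)
x = rng.standard_normal(1006)                     # synthetic series of daily observations
se, ci = moving_block_bootstrap(x, block_len=20, n_boot=999, stat=np.mean)
```

Resampling whole 20-day blocks preserves the within-block dependence that an i.i.d. bootstrap would destroy, which is why a conservative block length is chosen for the persistent SPY data.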
Table 1.5 also reports the fit to the moments of the adjusted squared returns im-
plied by the parameter estimates from the various estimation methods. The obtained
parameter estimates vary across the different estimation methods, but all are mean-
ingful and within the same range. From Table 1.5, we see that the two different ways
of noise correcting the GMM estimator impact the parameter estimates, especially
σ. The estimated noise variance in the parametrically noise corrected method is
only about half of the non-parametric estimate ω2avg . It also becomes evident that
the choice of the predictor space, represented by the l variable, highly impacts the
parameter estimates. When l is low, κ is high, and the PBEF estimator puts more emphasis on capturing the fast decaying part in the short end of the autocorrelation function. As the l variable increases, κ drops drastically and reveals the need for
several volatility factors in order to fully capture the dynamics of the data. The split
between how much of the mean of the squared adjusted returns is due to the long-run
mean of the volatility process, α, and how much is due to the variance of MMS noise,
ω2, also varies with the l variable. An exception is when l = 77, where the estimated
noise variance is similar to the estimate obtained with the parametrically noise corrected GMM method; otherwise the noise variance is either severely underestimated
or potentially overestimated. This could be a consequence of the i.i.d. Gaussian noise
assumption, which might not hold. It could also be that the PBEF-based method has
Table 1.5. Estimation results for the estimation methods with noise correction.

Estimation method   κ                 α               σ               FC              100×ω2            100×E[Y²ᵢ]   100×V(Y²ᵢ)   ACF₁(Y²ᵢ)

Sample moments                                                                                          0.684        0.0442       0.240

GMM with RK         0.0776 (0.0561)   0.500 (0.111)   0.485 (0.120)   0.157 (0.115)   0.1563 (0.0250)   0.961        0.0567       0.233
                    [0.0469:0.2770]   [0.347:0.692]   [0.299:0.745]   [-0.012:0.386]  [0.1119:0.2103]

GMM noise corr.     0.1013 (0.0528)   0.426 (0.186)   0.775 (0.223)   0.514 (0.356)   0.0880 (0.0094)   0.866        0.0788       0.276
                    [0.0317:0.2586]   [0.269:0.661]   [0.456:1.284]   [0.127:1.460]   [0.0768:0.1059]

PBEF 1 step
q = 3, l = 12       0.3874 (0.2085)   0.403 (0.140)   1.017 (0.520)   0.721 (1.897)   0.0803 (0.0745)   0.836        0.0412       0.231
                    [0.1036:0.9379]   [0.144:0.641]   [0.413:2.319]   [-0.075:5.160]  [0.0004:0.2450]
q = 3, l = 77       0.2593 (0.0532)   0.514 (0.078)   0.786 (0.350)   0.351 (0.908)   0.0085 (0.0111)   0.981        0.0502       0.215
                    [0.2300:0.4461]   [0.375:0.680]   [0.437:1.642]   [-0.075:2.313]  [0.0004:0.0086]
q = 3, l = 154      0.0956 (0.0506)   0.128 (0.059)   0.630 (0.617)   0.373 (2.587)   0.2598 (0.0631)   0.479        0.0180       0.275
                    [0.0376:0.2474]   [0.034:0.268]   [0.195:2.376]   [0.020:5.635]   [0.1259:0.3741]
q = 3, l = 308      0.0677 (0.0330)   0.139 (0.132)   0.535 (0.405)   0.267 (1.185)   0.2534 (0.0899)   0.493        0.0197       0.276
                    [0.0539:0.1807]   [0.053:0.547]   [0.251:1.684]   [0.021:2.824]   [0.0087:0.3493]

PBEF 2 step
q = 3, l = 12       0.2356 (0.2232)   0.286 (0.043)   0.876 (0.632)   0.632 (2.032)   0.1563 (0.0250)   0.684        0.0329       0.253
                    [0.0085:0.8752]   [0.214:0.379]   [0.118:2.372]   [0.007:5.301]   [0.1119:0.2103]
q = 3, l = 77       0.2270 (0.0498)   0.287 (0.043)   0.894 (0.466)   0.668 (1.491)   0.1563 (0.0250)   0.685        0.0349       0.257
                    [0.1667:0.3747]   [0.214:0.379]   [0.411:2.008]   [0.064:3.853]   [0.1119:0.2103]
q = 3, l = 154      0.1585 (0.0386)   0.287 (0.043)   0.709 (0.376)   0.412 (0.966)   0.1563 (0.0251)   0.686        0.0325       0.251
                    [0.1255:0.2813]   [0.213:0.378]   [0.325:1.669]   [0.035:2.658]   [0.1115:0.2103]
q = 3, l = 308      0.0963 (0.0253)   0.289 (0.043)   0.569 (0.301)   0.269 (0.561)   0.1563 (0.0250)   0.687        0.0340       0.255
                    [0.0892:0.1912]   [0.213:0.379]   [0.295:1.355]   [0.034:1.746]   [0.1119:0.2103]

The table reports the parameter estimates from fitting the Heston model to the SPY data. The variable FC denotes the Feller condition, σ² − 2κα, and it is positive if the parameter constraint is violated. The table also reports sample moments as well as theoretical moments implied by the various obtained parameter estimates. The std. errors (in parentheses) and 95% equal tail CI's (in square brackets) are computed using the moving block bootstrap with a block length of 20 days and B = 999.
problems identifying ω2 and α in the data at hand. We therefore also consider fixing
the noise variance at the estimate ω2avg and only estimate the three parameters from
the Heston model. This procedure is denoted by PBEF 2 step in Table 1.5.
The results from the PBEF 2 step method reveal that fixing the noise variance
mainly affects the estimate of α, which is now constant across the choice of predictor
space. The PBEF 2 step method also appears more stable, with the relative standard
errors19 being constant across the values of the time lag in the predictor space, except
for the intra-day time lag (l = 12). Checking for parameter stability across the different
predictor spaces employed could serve as a general robustness check that might
reveal model misspecification. The last three columns of Table 1.5 report the model-
implied fit to the sample moments of the data. The mean of the adjusted squared
returns is extremely well fitted by the PBEF 2 step procedure, whereas the other
methods do not give as good fits. The variance is however reasonably well matched
when the lower l values are used in the PBEF 1 step procedure, but is poorly matched
when the l variable is high. The PBEF 2 step procedure and the GMM estimator based
on RK also produce reasonable fits to the variance. The first order autocorrelation of the squared returns is best matched by the GMM method based on RV and the PBEF 1 step method with l = 12, with the PBEF 2 step approach being the runner-up.
Not surprisingly, the overall best fit is obtained by the PBEF-based method, and the
fit appears more stable when the PBEF 2 step approach is used.
From Table 1.5 we also observe that the Feller condition is violated in all the
estimation procedures, indicating that the Heston model provides a poor fit to the
data.20 However, this has no influence on the specific aim of this section, which
was to investigate how the two different estimation methods handle real data with
possible model misspecification. The problem seems to be that the dynamic structure
implied by the Heston model is not flexible enough to adequately model the observed
dynamics. The need for allowing for several volatility factors is best highlighted by
the PBEF-based estimation method. The flexibility of the PBEF-based method can
in general serve as a robustness check of the specified model, including the noise
specification.
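As a quick consistency check of Table 1.5, the FC column can be reproduced from the reported point estimates; for the two GMM rows the computed values match the tabulated 0.157 and 0.514 to within rounding:

```python
def feller_condition(kappa, alpha, sigma):
    """FC = σ² − 2κα; a positive value means the Feller condition is violated."""
    return sigma**2 - 2 * kappa * alpha

# point estimates taken from Table 1.5
fc_rk = feller_condition(0.0776, 0.500, 0.485)     # GMM with RK
fc_corr = feller_condition(0.1013, 0.426, 0.775)   # parametrically noise corrected GMM
```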
1.5 Conclusion and Final Remarks
The general theory underlying PBEFs was reviewed and detailed. We explicitly con-
structed PBEFs for parameter estimation in the Heston model with and without the
inclusion of noise in the data. Implementation issues were discussed, and the link between optimal GMM estimation and the optimal PBEF was derived. As a benchmark for
evaluating the performance of the PBEF-based estimator, we considered the GMM
estimator from Bollerslev and Zhou (2002), and we extended the method to handle
19 The standard errors normalized by the parameter estimate.
20 0 is actually contained in the CI's for the FC variable when the GMM method based on RK and the PBEF 1 step method with l = 12 and l = 77 are employed, but only barely.
noisy data in a parametric and non-parametric way. The finite sample performance
of the estimator based on PBEFs was investigated in a Monte Carlo study and compared to that of the GMM estimator. Both the cases with and without the inclusion of additive i.i.d. MMS noise were considered. In the no MMS noise setting, there are
gains to be made from using PBEFs, both in terms of bias and root MSRE, especially
when the sample size is small. The gain from using the PBEF-based method was most
prominent for the mean-reversion parameter, κ, that was extremely well estimated.
The PBEF-based method produced promising results in all three parameter configu-
rations, but the root MSREs were lower when the volatility process was less persistent
and less volatile.
Including MMS noise in the observation equation, but neglecting to correct
for it, produced biased estimates, with the upwards bias in the long run average
variance, α, being most severe. We then considered the performance of the noise
corrected estimation methods. The PBEF-based estimator and the parametrically
noise corrected GMM estimator produced results similar to those found in the no
MMS noise setting. The non-parametric approach, where the GMM estimator is
based on RK , did not perform as well. This result is probably caused by the slower
convergence rate of RK compared to RV , making it a more noisy proxy for IV . The
non-parametric approach also has the drawback of not producing an estimator for
the noise variance. Even though the parametric way of correcting the GMM estimator
for noise produced results similar to those found in the no MMS noise setting, the
estimator could not compete with the results obtained using the noise corrected
PBEF. The difference is again more pronounced for the mean-reversion rate, but
the volatility of volatility parameter, σ, and the noise variance, ω2, were also more
accurately estimated with the PBEF-based method.
Based on our Monte Carlo study, PBEFs seem like a promising tool for conducting
inference in stochastic processes, and it appears that the method is able to exploit
the additional information contained in the intra-day returns. The gain from using
high frequency observations directly might, however, come at a cost, and one concern regarding the application of the PBEF-based method to real data is its possible sensitivity towards intra-daily dynamics, such as intra-daily periodicity in volatility. With the GMM-based estimator, this intra-daily periodicity is of no concern, as it is averaged out in the aggregation step. On the other hand, if the data is only available
at low frequencies, then the PBEF-based method has an obvious advantage. In our
empirical application, by fitting the Heston model to SPY data, we investigated how
the two different approaches handle real data. The data was cleaned and corrected
for the intra-daily volatility pattern. The aim was not to provide the best fitting model,
but to investigate how the methods deal with possible model misspecification. The
empirical application revealed that the choice of estimation approach impacts the
parameter estimates. The study also made it clear how the great flexibility of the PBEF-based estimation method could serve as a way of conducting robustness checks,
for instance by checking for parameter stability across different time-spans of the
predictor space.
An interesting extension of the Heston model would be to allow for several inde-
pendent volatility factors, in order to better capture the persistence in the data. It is
possible to derive PBEFs in this setup as long as the mean, variance and covariance
structure is computable for each of the volatility factors. It would also be of interest
to see how the estimation method based on PBEFs performs if we extend the Monte
Carlo setup by relaxing the assumption of i.i.d. noise. This would however complicate
the construction of the MMS noise corrected PBEF and the recalculation of the mo-
ments used for constructing the GMM estimator. A solution to this potential problem
could be to filter out the noise in a first step using the method of pre-averaging intro-
duced by Jacod, Li, Mykland, Podolskij, and Vetter (2009), instead of modeling the
noise directly. The performance of this approach is still to be investigated. Since the
PBEF based estimation method is quite general, an important contribution to the
existing literature would be to consider PBEFs in a setting where the driving sources
of randomness are general Lévy processes, like the models considered in Brockwell
(2001), Barndorff-Nielsen and Shephard (2001), and in Todorov and Tauchen (2006).
Finally, quantifying the gain from using the optimal PBEF in different settings, as well
as how to best simulate or approximate the optimal weight matrix, would also be a
topic for future research.
1.6 Appendix
Appendix A: A Note on Orthogonal Projection
Let Y , Z1, . . . , Zn denote random variables with finite second moments. We wish to
compute the orthogonal projection, Ŷ, of the random variable Y on the linear space V = span{1, Z1, . . . , Zn}. To that end, let us introduce the notation
Z = (Z1, . . . , Zn),
mY = E[Y],
mZ = (E[Z1], . . . , E[Zn]),
Cov(Z,Z) = E[(Z − mZ)(Z − mZ)T],
Cov(Y,Z) = E[(Y − mY)(Z − mZ)T].
From Karlin and Taylor (1975) we know that the orthogonal projection, Ŷ, of Y on V is an element of V that fulfills the normal equations E[v(Y − Ŷ)] = 0 for all v ∈ V. This means (Y − Ŷ) ⊥ V due to our definition of the inner product in this L²-space.
Theorem 1. Under the assumption of a non-singular covariance matrix Cov(Z,Z), the orthogonal projection Ŷ exists and is given by

Ŷ = mY + Cov(Y,Z)Cov−1(Z,Z)(Z − mZ).
Proof. Let a = (a1, . . . , an) be an arbitrary vector and consider the random variable φ given by

φ = (Y − mY) + a(Z − mZ).

The aim is now to choose the vector a such that φ becomes orthogonal to V, because in this case we obtain the following decomposition of Y,

Y = [mY − a(Z − mZ)] + φ,

where mY − a(Z − mZ) ∈ V and φ ∈ V⊥, and hence we have Ŷ = mY − a(Z − mZ). By construction, we know that E[φ] = 0 and E[Z − mZ] = 0, and since we furthermore have φ ∈ V⊥ and Z − mZ ∈ V we get

E[φ(Z − mZ)] = 0.

Combining this with the definition of φ, we obtain the equation

Cov(Y,Z) + aCov(Z,Z) = 0.

In order to ensure φ ∈ V⊥ we conclude that we should put

a = −Cov(Y,Z)Cov−1(Z,Z).

Thus, the orthogonal projection of Y onto V is given by

Ŷ = mY + Cov(Y,Z)Cov−1(Z,Z)(Z − mZ).
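The projection formula can be verified numerically by replacing population moments with sample moments; the simulated linear model below is arbitrary. The residual Y − Ŷ should have mean zero and be empirically uncorrelated with every Zj, as required by the normal equations:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
Z = rng.standard_normal((n, 3))
Y = 1.0 + Z @ np.array([0.5, -0.3, 0.2]) + 0.1 * rng.standard_normal(n)

# sample analogues of mY, mZ, Cov(Z,Z) and Cov(Y,Z)
mY, mZ = Y.mean(), Z.mean(axis=0)
cov_ZZ = np.cov(Z, rowvar=False)
cov_YZ = np.array([np.cov(Y, Z[:, j])[0, 1] for j in range(3)])

# Ŷ = mY + Cov(Y,Z) Cov(Z,Z)^{-1} (Z − mZ)
Y_hat = mY + (Z - mZ) @ np.linalg.solve(cov_ZZ, cov_YZ)
resid = Y - Y_hat

ortho = np.array([np.cov(resid, Z[:, j])[0, 1] for j in range(3)])  # should all be ~0
```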
Appendix B: General Theory on Optimal Estimating Functions
It is well-known that the ideal choice of estimating function would be to use the
score function, Un(θ), since this usually yields an efficient estimator and provides
a minimal sufficient partitioning of the sample space. However, the score function
might be unavailable or difficult to calculate, so the need for an optimal estimating
function within a class of estimating functions arises. We focus only on the class, G ,
of zero mean, square integrable estimating functions Gn(θ) :=Gn(Yi , i ≤ n,θ). We
furthermore assume that the p ×p matrices Eθ[∂θT Gn(θ)] and Eθ[Gn(θ)Gn(θ)T ] are
non-singular. Let Gn ⊆G and consider the standardized estimating function given by
G(s)n(θ) = −Eθ[∂θT Gn(θ)]T (Eθ[Gn(θ)Gn(θ)T])−1 Gn(θ).
OF-optimality (fixed sample optimality, or Godambe optimality) within Gn is achieved by maximizing the covariance matrix of the standardized estimating functions. That is, the information criterion to be maximized is the Godambe information

I(Gn(θ)) = Eθ[G(s)n(θ)G(s)n(θ)T] = Eθ[∂θT Gn(θ)]T (Eθ[Gn(θ)Gn(θ)T])−1 Eθ[∂θT Gn(θ)],
which is a natural generalization of the Fisher information. If the score function, U_n(θ) = ∂_{θᵀ} log L_n(θ), exists we actually obtain the Fisher information

I(U_n(θ)) = E_θ[∂_{θᵀ} U_n(θ)]ᵀ (E_θ[U_n(θ)U_n(θ)ᵀ])^{-1} E_θ[∂_{θᵀ} U_n(θ)] = E_θ[U_n(θ)U_n(θ)ᵀ],

because the score function (usually) satisfies the second Bartlett identity

E_θ[U_n(θ)U_n(θ)ᵀ] = −E_θ[∂_{θᵀ} U_n(θ)].
The rationale behind considering the standardized estimating function G_n^{(s)}(θ) is that G_n^{(s)}(θ) satisfies the second Bartlett identity and is therefore more directly comparable to the score function.
Definition 2. G_n*(θ) ∈ G_n is an O_F-optimal estimating function within G_n if

I(G_n*(θ)) − I(G_n(θ))

is non-negative definite for all G_n(θ) ∈ G_n and for all θ ∈ Θ.
If an O_F-optimal estimating function exists, it is often referred to as the quasi-score estimating function and the corresponding estimator as the quasi-likelihood estimator. The quasi-score estimating function is close to the score function in an L²-sense, since we have the following result. Suppose G_n*(θ) is O_F-optimal in G_n; then

E_θ[(G_n^{(s)}(θ) − U_n(θ))ᵀ (G_n^{(s)}(θ) − U_n(θ))] ≥ E_θ[(G_n^{*(s)}(θ) − U_n(θ))ᵀ (G_n^{*(s)}(θ) − U_n(θ))]
36 CHAPTER 1. PBEFS FOR STOCHASTIC VOLATILITY MODELS WITH NOISY DATA
for all G_n ∈ G_n and for all θ ∈ Θ. In fact, if G_n is a closed subspace of G, then the quasi-score function is the orthogonal projection of U_n(θ) onto G_n and can be interpreted as an approximation to the score function. By choosing a sequence of subspaces G_n that, as n → ∞, converges to a subspace containing U_n(θ), a sequence of estimators that are asymptotically fully efficient can be constructed.
The following theorem (Thm. 2.1 in Heyde (1997)) provides a tool for verifying optimality of an estimating function and can be used to find optimal PBEFs.

Theorem 3. G_n* ∈ G_n is an O_F-optimal estimating function within G_n if

E_θ[G_n^{*(s)}(θ) G_n^{(s)}(θ)ᵀ] = E_θ[G_n^{(s)}(θ) G_n^{*(s)}(θ)ᵀ] = E_θ[G_n^{(s)}(θ) G_n^{(s)}(θ)ᵀ], (1.34)

or, equivalently, if

E_θ[∂_{θᵀ} G_n(θ)]^{-1} E_θ[G_n(θ) G_n*(θ)ᵀ] (1.35)

is a constant matrix for all G_n ∈ G_n and for all θ ∈ Θ. Conversely, if G_n is convex and G_n* ∈ G_n is O_F-optimal within G_n, then (1.34) holds.
For a proof of the theorem see pp. 14–15 in Heyde (1997). Condition (1.35) can often be verified by showing that

E_θ[G_n(θ) G_n*(θ)ᵀ] = −E_θ[∂_{θᵀ} G_n(θ)]

for all G_n ∈ G_n and for all θ ∈ Θ. In this case the optimal estimating function G_n*(θ) satisfies the second Bartlett identity, and the Godambe information simplifies to the Fisher information, which in this situation equals −E_θ[∂_{θᵀ} G_n*(θ)].
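To fix ideas, the Godambe information can be computed directly in a toy example of our own choosing (not taken from the thesis): for X_i i.i.d. N(θ, σ²) and the linear estimating function G_n(θ) = Σ_i (X_i − θ), the quantity E_θ[∂_θ G_n]ᵀ (E_θ[G_n²])^{-1} E_θ[∂_θ G_n] reproduces the Fisher information n/σ². The helper name below is ours.

```python
import numpy as np

def godambe_information(dG_mean, GGt_mean):
    """I(G) = E[dG/dtheta]^T (E[G G^T])^{-1} E[dG/dtheta], scalar case."""
    return dG_mean * (1.0 / GGt_mean) * dG_mean

# Toy model: X_i iid N(theta, sigma^2), G_n(theta) = sum_i (X_i - theta).
n, sigma2 = 50, 4.0
dG_mean = -float(n)            # E[dG_n/dtheta] = -n
GGt_mean = n * sigma2          # E[G_n^2] = n * sigma^2 at the true theta
info = godambe_information(dG_mean, GGt_mean)
print(info, n / sigma2)        # Godambe information equals Fisher information here
```

Here the estimating function is proportional to the score, so the inequality in Definition 2 is attained with equality, which is exactly why the two numbers agree.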
Optimal Prediction-based Estimating Functions

We now consider how to find the optimal PBEF within the class of PBEFs based on finite-dimensional predictor spaces of the form considered in the second section of the chapter. This means we are studying PBEFs of the form (1.6), with s = q:

G_n(θ) = A(θ) Σ_{i=q+1}^n H^{(i)}(θ). (1.36)

Let r = q + 1 and define the r × p matrix

U(θ) = −E_θ[∂_{θᵀ} H^{(r)}] = C̄(θ) ∂_{θᵀ} â(θ),

where

C̄(θ) = [E_θ[Z_k^{(r−1)} Z_l^{(r−1)}]]_{k,l=0,...,q}

with Z_0^{(r−1)} = 1 as usual, and where the r-dimensional vector â(θ) is given by

â(θ) = (a_0(θ), a_1(θ), ..., a_q(θ))ᵀ.
Remark 4. Note that C̄(θ) is related to the covariance matrix C(θ) in the following way:

C̄(θ) = ( 0   0ᵀ
          0   C(θ) ) + E_θ[Z^{(r−1)}] E_θ[Z^{(r−1)}]ᵀ,

where the first matrix has zeros in its first row and first column and C(θ) as its lower-right q × q block, and where the r-dimensional vector Z^{(r−1)} is given by

Z^{(r−1)} = (Z_0^{(r−1)}, Z_1^{(r−1)}, ..., Z_q^{(r−1)})ᵀ.
From Sørensen (2000) we have the following result on how to choose the optimal weight matrix A*(θ).

Proposition 1. Suppose that for all θ ∈ Θ the matrix ∂_{θᵀ} â(θ) has rank p. Then the matrix

M̄_n(θ) = E_θ[H^{(r)}(θ) H^{(r)}(θ)ᵀ] (1.37)
  + Σ_{k=1}^{n−r} ((n − r − k + 1)/(n − r + 1)) (E_θ[H^{(r)}(θ) H^{(r+k)}(θ)ᵀ] + E_θ[H^{(r+k)}(θ) H^{(r)}(θ)ᵀ]) (1.38)

is invertible, and the estimating function

G_n*(θ) = A_n*(θ) Σ_{i=r}^n H^{(i)}(θ),

where

A_n*(θ) = U(θ)ᵀ M̄_n(θ)^{-1},

is O_F-optimal within the class of estimating functions of the type (1.36) for which A(θ) has rank p. Furthermore, the optimal estimating function G_n*(θ) satisfies the second Bartlett identity with Godambe information U(θ)ᵀ M̄_n(θ)^{-1} U(θ).
For a proof see Proposition 3.2 in Sørensen (2000). In the uncommon case where p equals q + 1, the weight matrix A_n*(θ) is invertible and hence does not influence the estimator. In this case, A_n*(θ) only ensures that the second Bartlett identity holds.

Since the observed process is assumed to be stationary, we use equation (1.5) and find

∂_{θ_k} a(θ) = C(θ)^{-1} [∂_{θ_k} b(θ) − (∂_{θ_k} C(θ)) a(θ)].

An expression for ∂_{θᵀ} a_0 can be found by differentiating the expression following (1.5). Note that only unconditional moments and derivatives of unconditional moments are needed to compute optimal prediction-based estimating functions. Thus, if we know C(θ), b(θ), E_θ[Z^{(r−1)}] and E_θ[f(Y_r)], their derivatives, and the moments appearing in (1.38), we can compute optimal PBEFs.
If the observed process Y_i is sufficiently α-mixing, then M̄_n(θ) → M̄(θ) as n → ∞ (see Section 6 in Sørensen (2000) or Section 4 in Sørensen (2011) for this type of result). Asymptotically it does not matter whether we use U(θ)ᵀ M̄_n(θ)^{-1} or U(θ)ᵀ M̄(θ)^{-1} as our optimal weight; the asymptotic variance of the resulting estimator is unaffected by this choice. The most challenging part of computing the optimal weight matrix, A_n*(θ), is to compute M̄_n(θ).
Remark 5 (from Sørensen (2011)). If we let

H̄_n(θ) = (1/(n − r + 1)) Σ_{i=r}^n H^{(i)}(θ),

then M̄_n(θ) is the covariance matrix of √(n − r + 1) H̄_n(θ). This means that in practice the matrix M̄_n(θ) can be calculated by simulating √(n − r + 1) H̄_n(θ) a large number of times under P_θ and then calculating the empirical covariance matrix. Calculating the optimal weight matrix A_n*(θ) can be quite time consuming, and one can save a lot of time if A_n*(θ) is calculated for one parameter value only. This can be done by replacing A_n*(θ) by A_n*(θ̄_n), where θ̄_n is a consistent estimator of θ. Under Conditions 4.1 and 4.2 from Sørensen (2011), one way of obtaining a consistent estimator θ̄_n would be to use the estimating function obtained by choosing p coordinates of H̄_n(θ).
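The simulation recipe in the Remark can be checked in a scalar toy example (entirely our construction: a Gaussian AR(1) surrogate for the observed process, with H^{(i)}(θ) = X_i − θ at the true θ = 0 and r = 1, so that M̄_n is simply the variance of √n X̄):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: stationary Gaussian AR(1) with unit innovation variance.
n, phi, B = 100, 0.5, 20_000

e = rng.normal(size=(B, n))
x = np.empty((B, n))
x[:, 0] = e[:, 0] / np.sqrt(1.0 - phi**2)   # stationary start
for i in range(1, n):
    x[:, i] = phi * x[:, i - 1] + e[:, i]

# sqrt(n - r + 1) * H_bar_n(theta), simulated B times under P_theta
samples = np.sqrt(n) * x.mean(axis=1)
M_mc = samples.var()

# Exact counterpart of (1.37)-(1.38): gamma_0 + sum_k 2 (n - k)/n gamma_k,
# with gamma_k = phi^k / (1 - phi^2) the AR(1) autocovariances.
gamma = phi ** np.arange(n) / (1.0 - phi**2)
M_exact = gamma[0] + 2.0 * np.sum((n - np.arange(1, n)) / n * gamma[1:])
print(M_mc, M_exact)
```

The empirical covariance of the simulated draws matches the exact finite-sample expression to within Monte Carlo error, which is exactly how M̄_n(θ) would be obtained when closed forms are unavailable.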
Appendix C: Moment Conditions underlying the GMM Estimator

Derivation of the Conditional First- and Second-order Moments of Integrated Volatility

The moment conditions that Bollerslev and Zhou (2002) base their GMM estimator on come from the derivations of the conditional first- and second-order moments of daily integrated volatility (IV). In this subsection the derivation for the Heston model found in Bollerslev and Zhou (2002) will be reviewed. In the next subsection we consider how to compute the expectation of squared realized variance in the presence of MMS noise. The derivations are then used to form a moment condition used for constructing the GMM estimator that takes MMS noise into account.

Before deriving the conditional moments and the resulting moment conditions, the following notation is introduced:

F_t = σ{v_s ; s ≤ t},
G_t = σ{IV_{t−s−1,t−s} ; s = 0, 1, 2, ...}.

Note that the discrete sigma-algebra, G_t, generated by the integrated volatility series is contained in the continuous sigma-algebra, F_t, generated by the point-in-time volatility process. The distinction between the two sigma-algebras is important in the derivation of the conditional first- and second-order moments of IV.
First, let us consider how to derive the moment condition based on the conditional mean of daily IV. From Cox, Ingersoll, and Ross (1985) it follows that the conditional mean of the volatility process is given by

E_θ[v_T | F_t] = v_t e^{-κ(T-t)} + α(1 - e^{-κ(T-t)}) = δ_{T-t} v_t + β_{T-t}. (1.39)

Hence, by interchanging the order of integration using Fubini's theorem we obtain

E_θ[IV_{t,T} | F_t] = E_θ[∫_t^T v_s ds | F_t] = ∫_t^T E_θ[v_s | F_t] ds
= ∫_t^T (v_t e^{-κ(s-t)} + α(1 - e^{-κ(s-t)})) ds
= v_t (1/κ)(1 - e^{-κ(T-t)}) + α(T - t) - (α/κ)(1 - e^{-κ(T-t)})
= d_{T-t} v_t + b_{T-t}. (1.40)
To ease the notation we let δ, β, d and b denote the parameters corresponding to the daily horizon where T - t = 1. Using the law of iterated expectations we obtain

E_θ[E_θ[IV_{t+1,t+2} | F_{t+1}] | F_t] = E_θ[IV_{t+1,t+2} | F_t] = E_θ[d v_{t+1} + b | F_t]
= d E_θ[v_{t+1} | F_t] + b = d(δ v_t + β) + b
= δ d v_t + dβ + b
= δ(E_θ[IV_{t,t+1} | F_t] - b) + dβ + b
= δ E_θ[IV_{t,t+1} | F_t] + β,
and by using the law of iterated expectations once more we get

E_θ[IV_{t+1,t+2} | G_t] = δ E_θ[IV_{t,t+1} | G_t] + β. (1.41)

That is, the integrated volatility process satisfies the conditional moment condition

E_θ[IV_{t+1,t+2} - δ IV_{t,t+1} - β | G_t] = 0, (1.42)

and hence also the unconditional moment restriction

E_θ[IV_{t+1,t+2} - δ IV_{t,t+1} - β] = 0. (1.43)
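The coefficient identity that makes the iterated-expectation chain above close, dβ + b = β + δb (equivalently β(1 − d) = b(1 − δ)), is easy to verify numerically. The sketch below is our own, with illustrative parameter values:

```python
import numpy as np

def iv_mean_coefficients(kappa, alpha, h=1.0):
    """Coefficients in E[v_T|F_t] = delta v_t + beta and
    E[IV_{t,T}|F_t] = d v_t + b, for horizon T - t = h (cf. (1.39)-(1.40))."""
    delta = np.exp(-kappa * h)
    beta = alpha * (1.0 - delta)
    d = (1.0 - delta) / kappa
    b = alpha * h - (alpha / kappa) * (1.0 - delta)
    return delta, beta, d, b

kappa, alpha = 2.0, 0.5
delta, beta, d, b = iv_mean_coefficients(kappa, alpha)

# Identity needed for (1.41): d*beta + b = beta + delta*b.
print(d * beta + b, beta + delta * b)
```

Without this identity the AR(1)-type recursion (1.41) for daily IV would pick up a spurious intercept term; the check confirms the algebra.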
Since the moment condition (1.43) only depends on the drift parameters, κ and α, we need an additional moment condition in order to construct a GMM estimator for all three parameters. We will therefore also derive a moment condition based on the conditional second-order moment of integrated volatility.

From the SDE describing the evolution of spot volatility and the formula (1.40) we can obtain an SDE for E_θ[IV_{t,T} | F_t] by applying Itô's lemma:

dE_θ[IV_{t,T} | F_t] = (d_{T-t} κ(α - v_t) + (∂d_{T-t}/∂t) v_t + ∂b_{T-t}/∂t) dt + d_{T-t} σ √v_t dW_t
= -v_t dt + d_{T-t} σ √v_t dW_t. (1.44)
Now we fix the upper limit T and let the lower limit t be time-varying. Integrating from t to T in (1.44) then yields

E_θ[IV_{T,T} | F_T] = E_θ[IV_{t,T} | F_t] - ∫_t^T v_s ds + ∫_t^T d_{T-s} σ √v_s dW_s,

but since the left-hand side obviously equals zero, this implies

IV_{t,T} - E_θ[IV_{t,T} | F_t] = ∫_t^T d_{T-s} σ √v_s dW_s.
Using the Itô isometry we now obtain an expression for the conditional variance of integrated volatility:

Var_θ[IV_{t,T} | F_t] = E_θ[(IV_{t,T} - E_θ[IV_{t,T} | F_t])² | F_t]
= E_θ[(∫_t^T d_{T-s} σ √v_s dW_s)² | F_t]
= E_θ[∫_t^T d²_{T-s} σ² v_s ds | F_t]
= ∫_t^T d²_{T-s} σ² E_θ[v_s | F_t] ds
= ∫_t^T d²_{T-s} σ² [δ_{s-t} v_t + β_{s-t}] ds
= D_{T-t} v_t + B_{T-t}, (1.45)
where

D_{T-t} = (σ²/κ²) [1/κ - 2e^{-κ(T-t)}(T - t) - (1/κ)e^{-2κ(T-t)}],
B_{T-t} = (σ²/κ²) [α(T - t)(1 + 2e^{-κ(T-t)}) + (α/(2κ))(e^{-κ(T-t)} + 5)(e^{-κ(T-t)} - 1)].
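The closed forms for D and B can be spot-checked against a direct numerical evaluation of the integral in (1.45); the sketch and the parameter values below are our own:

```python
import numpy as np

def var_iv_coefficients(kappa, alpha, sigma, tau):
    """Closed-form D_tau and B_tau in Var[IV_{t,T}|F_t] = D v_t + B, tau = T - t."""
    e1, e2 = np.exp(-kappa * tau), np.exp(-2.0 * kappa * tau)
    D = (sigma**2 / kappa**2) * (1.0 / kappa - 2.0 * tau * e1 - e2 / kappa)
    B = (sigma**2 / kappa**2) * (alpha * tau * (1.0 + 2.0 * e1)
                                 + (alpha / (2.0 * kappa)) * (e1 + 5.0) * (e1 - 1.0))
    return D, B

kappa, alpha, sigma, tau, v0 = 2.0, 0.04, 0.3, 1.0, 0.05
D, B = var_iv_coefficients(kappa, alpha, sigma, tau)

# Trapezoidal evaluation of int_0^tau d_{tau-s}^2 sigma^2 (delta_s v0 + beta_s) ds
s = np.linspace(0.0, tau, 200_001)
d_s = (1.0 - np.exp(-kappa * (tau - s))) / kappa
integrand = d_s**2 * sigma**2 * (np.exp(-kappa * s) * v0
                                 + alpha * (1.0 - np.exp(-kappa * s)))
numeric = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s))
print(D * v0 + B, numeric)
```

Both evaluations agree to high precision, confirming that the two bracketed expressions above are the correct antiderivatives.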
From Cox et al. (1985) and the obtained expression for E_θ[v_T | F_t] it follows that

E_θ[v²_T | F_t] = Var_θ(v_T | F_t) + (E_θ[v_T | F_t])²
= v_t (σ²/κ)(e^{-κ(T-t)} - e^{-2κ(T-t)}) + (σ²α/(2κ))(1 - e^{-κ(T-t)})² + (δ_{T-t} v_t + β_{T-t})²
= C_{T-t} v_t + E_{T-t} + δ²_{T-t} v²_t + β²_{T-t} + 2δ_{T-t} β_{T-t} v_t
= δ²_{T-t} v²_t + (C_{T-t} + 2δ_{T-t} β_{T-t}) v_t + (E_{T-t} + β²_{T-t}).
Focusing on the one-day horizon and using (1.40) and (1.45) gives us

E_θ[IV²_{t,t+1} | F_t] = Var_θ(IV_{t,t+1} | F_t) + (E_θ[IV_{t,t+1} | F_t])²
= D v_t + B + d² v²_t + b² + 2db v_t
= d² v²_t + (D + 2db) v_t + (B + b²).
Now, by using the law of iterated expectations and leading the arguments by one period, we get

E_θ[E_θ[IV²_{t+1,t+2} | F_{t+1}] | F_t] = d² E_θ[v²_{t+1} | F_t] + (D + 2db) E_θ[v_{t+1} | F_t] + (B + b²),

and substituting in the obtained expressions for E_θ[v_{t+1} | F_t] and E_θ[v²_{t+1} | F_t] gives us

E_θ[IV²_{t+1,t+2} | F_t] = d²[δ² v²_t + (C + 2δβ) v_t + (E + β²)] + (D + 2db)(δ v_t + β) + (B + b²)
= δ² d² v²_t + [d²(C + 2δβ) + δ(D + 2db)] v_t + [d²(E + β²) + β(D + 2db) + (B + b²)].
If we now reversely substitute out v²_t by using our expression for E_θ[IV²_{t,t+1} | F_t] we get

E_θ[IV²_{t+1,t+2} | F_t] = δ²[E_θ[IV²_{t,t+1} | F_t] - (D + 2db) v_t - (B + b²)]
+ [d²(C + 2δβ) + δ(D + 2db)] v_t + [d²(E + β²) + β(D + 2db) + (B + b²)].

Reversely substituting out v_t by using E_θ[IV_{t,t+1} | F_t] = d v_t + b gives

E_θ[IV²_{t+1,t+2} | F_t] = δ² E_θ[IV²_{t,t+1} | F_t]
+ [d²(C + 2δβ) + (δ - δ²)(D + 2db)] (1/d)[E_θ[IV_{t,t+1} | F_t] - b]
+ [d²(E + β²) + β(D + 2db) + (1 - δ²)(B + b²)],
and rearranging the terms yields

E_θ[IV²_{t+1,t+2} | F_t] = δ² E_θ[IV²_{t,t+1} | F_t] + (1/d)[d²(C + 2δβ) + (δ - δ²)(D + 2db)] E_θ[IV_{t,t+1} | F_t]
- (b/d)[d²(C + 2δβ) + (δ - δ²)(D + 2db)]
+ [d²(E + β²) + β(D + 2db) + (1 - δ²)(B + b²)].

Finally, using the law of iterated expectations we get

E_θ[IV²_{t+1,t+2} | G_t] = δ² E_θ[IV²_{t,t+1} | G_t] + (1/d)[d²(C + 2δβ) + (δ - δ²)(D + 2db)] E_θ[IV_{t,t+1} | G_t]
- (b/d)[d²(C + 2δβ) + (δ - δ²)(D + 2db)]
+ [d²(E + β²) + β(D + 2db) + (1 - δ²)(B + b²)]
= H E_θ[IV²_{t,t+1} | G_t] + I E_θ[IV_{t,t+1} | G_t] + J.
That is, the integrated volatility process satisfies the conditional moment restriction

E_θ[IV²_{t+1,t+2} - H(IV²_{t,t+1}) - I(IV_{t,t+1}) - J | G_t] = 0, (1.46)

and hence also the unconditional moment restriction

E_θ[IV²_{t+1,t+2} - H(IV²_{t,t+1}) - I(IV_{t,t+1}) - J] = 0. (1.47)
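For reference, the mapping from the structural parameters (κ, α, σ) to the reduced-form coefficients H, I and J can be collected in one helper. This is a sketch with our own variable names; the daily horizon T − t = 1 is hardcoded:

```python
import numpy as np

def iv_second_moment_coefficients(kappa, alpha, sigma):
    """H, I, J in E[IV^2_{t+1,t+2}|G_t] = H E[IV^2_{t,t+1}|G_t] + I E[IV_{t,t+1}|G_t] + J."""
    delta = np.exp(-kappa)
    beta = alpha * (1.0 - delta)
    d = (1.0 - delta) / kappa
    b = alpha - alpha * d
    C = (sigma**2 / kappa) * (delta - delta**2)
    E = (sigma**2 * alpha / (2.0 * kappa)) * (1.0 - delta)**2
    D = (sigma**2 / kappa**2) * (1.0 / kappa - 2.0 * delta - delta**2 / kappa)
    B = (sigma**2 / kappa**2) * (alpha * (1.0 + 2.0 * delta)
                                 + (alpha / (2.0 * kappa)) * (delta + 5.0) * (delta - 1.0))
    H = delta**2
    lin = d**2 * (C + 2.0 * delta * beta) + (delta - delta**2) * (D + 2.0 * d * b)
    I = lin / d
    J = (-(b / d) * lin + d**2 * (E + beta**2)
         + beta * (D + 2.0 * d * b) + (1.0 - delta**2) * (B + b**2))
    return H, I, J

H, I, J = iv_second_moment_coefficients(kappa=2.0, alpha=0.04, sigma=0.3)
print(H, I, J)
```

Since H = δ² involves only κ while I and J also involve σ, condition (1.47) indeed adds the identifying information about the diffusion parameter that (1.43) lacks.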
Derivation of the Fourth Moment Condition in the Presence of MMS Noise

We now consider computing

E_θ[(RV^MMS_{t+1,t+2})² - H(RV^MMS_{t,t+1})² - I(RV^MMS_{t,t+1}) - J].

The first step is to compute E_θ[(RV^MMS_{t,t+1})²]. By recalling the decomposition of realized variance in the presence of MMS noise,

RV^MMS_{t,t+1} = RV*_{t,t+1} + Σ_{i=1}^m ε²_{i,t} + 2 Σ_{i=1}^m ε_{i,t} Y*_{i,t}, (1.48)
we see that E_θ[RV^MMS_{t,t+1}] = E_θ[RV*_{t,t+1}] + 2mω². The three terms in (1.48) are uncorrelated, so the variance of RV^MMS_{t,t+1} equals

Var_θ(RV^MMS_{t,t+1}) = Var_θ(RV*_{t,t+1}) + Var_θ(Σ_{i=1}^m ε²_{i,t}) + 4 Var_θ(Σ_{i=1}^m ε_{i,t} Y*_{i,t}).
Due to the MA(1) structure and distribution of the noise process, ε ∼ N(0, 2ω²), we find that the second term equals

Var_θ(Σ_{i=1}^m ε²_{i,t}) = Σ_{i=1}^m Var_θ(ε²_{i,t}) + Σ_{i,j=1; i≠j}^m Cov_θ(ε²_{i,t}, ε²_{j,t})
= m(8ω⁴) + 2(m - 1)(2ω⁴) = 12mω⁴ - 4ω⁴. (1.49)
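Equation (1.49) can be spot-checked by Monte Carlo. In the sketch below the noise is generated as ε_i = u_i − u_{i−1} with u_i i.i.d. N(0, ω²), one standard way of obtaining an MA(1) noise process with ε ∼ N(0, 2ω²); treat this particular construction as our assumption, not as the one fixed by the chapter:

```python
import numpy as np

rng = np.random.default_rng(2)

m, omega2, B = 50, 0.01, 100_000
u = rng.normal(scale=np.sqrt(omega2), size=(B, m + 1))
eps = u[:, 1:] - u[:, :-1]            # MA(1) noise, eps_i ~ N(0, 2 omega^2)

var_mc = (eps**2).sum(axis=1).var()   # Var of sum_i eps_i^2, B replications
var_theory = 12 * m * omega2**2 - 4 * omega2**2
print(var_mc, var_theory)
```

The simulated variance matches 12mω⁴ − 4ω⁴ up to Monte Carlo error, confirming both the per-term variance 8ω⁴ and the 2(m − 1) adjacent covariance terms of size 2ω⁴ each.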
The last term in the expression for the variance consists of uncorrelated terms, and since the efficient return process and the noise process are independent we find

4 Var_θ(Σ_{i=1}^m ε_{i,t} Y*_{i,t}) = 4 Σ_{i=1}^m Var_θ(ε_{i,t} Y*_{i,t}) = 8ω² E_θ[RV*_{t,t+1}]. (1.50)
This means that

Var_θ(RV^MMS_{t,t+1}) = Var_θ(RV*_{t,t+1}) + 12mω⁴ - 4ω⁴ + 8ω² E_θ[RV*_{t,t+1}],

and we get the following expression for E_θ[(RV^MMS_{t,t+1})²]:

E_θ[(RV^MMS_{t,t+1})²] = E_θ[(RV*_{t,t+1})²] + 4m²ω⁴ + 4mω² E_θ[RV*_{t,t+1}] + 12mω⁴ - 4ω⁴ + 8ω² E_θ[RV*_{t,t+1}]. (1.51)
Using the old moment condition

E_θ[(RV*_{t+1,t+2})² - H(RV*_{t,t+1})² - I(RV*_{t,t+1}) - J] ≈ 0,

where equality holds if we replace RV* with IV, and the above derivations, we find that

E_θ[(RV^MMS_{t+1,t+2})² - H(RV^MMS_{t,t+1})² - I(RV^MMS_{t,t+1}) - J - K - L] ≈ 0, (1.52)

where

K = (1 - H)(4m²ω⁴ + 4mω²α + 12mω⁴ - 4ω⁴ + 8ω²α),
L = -2mω² I,

since the expectation of RV* is α.
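The correction terms can be wrapped in a small helper (our sketch; the parameter values in the example call are arbitrary, not calibrated):

```python
def noise_corrections(m, omega2, alpha, H, I):
    """K and L in the noise-adjusted fourth-moment condition (1.52).

    Q collects the noise-induced terms in (1.51) with E[RV*] replaced by alpha.
    """
    Q = (4 * m**2 * omega2**2 + 4 * m * omega2 * alpha
         + 12 * m * omega2**2 - 4 * omega2**2 + 8 * omega2 * alpha)
    return (1 - H) * Q, -2 * m * omega2 * I

K, L = noise_corrections(m=48, omega2=1e-5, alpha=0.04, H=0.2, I=0.5)
print(K, L)
```

Note that both corrections vanish as ω² → 0, so (1.52) collapses back to the noise-free condition (1.47) in that limit.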
1.7 References
Andersen, T. G., Bollerslev, T., 1997. Intraday periodicity and volatility persistence in
financial markets. Journal of Empirical Finance 4, 115–158.
Andersen, T. G., Davis, R., Kreiss, J.-P., Mikosch, T., 2009. Handbook of Financial Time
Series. Springer.
Aït-Sahalia, Y., Kimmel, R., 2007. Maximum likelihood estimation of stochastic volatil-
ity models. Journal of Financial Economics 83, 413–452.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., Shephard, N., 2008a. Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., Shephard, N., 2008b. Realised
kernels in practice: Trades and quotes. Econometrics Journal 04, 1–32.
Barndorff-Nielsen, O. E., Shephard, N., 2001. Non-Gaussian OU-based models and some of their uses in financial economics (with discussion). Journal of the Royal Statistical Society B 63, 167–241.
Barndorff-Nielsen, O. E., Shephard, N., 2002. Econometric analysis of realized volatil-
ity and its use in estimating stochastic volatility models. Journal of the Royal Statis-
tical Society B 64, 253–280.
Bollerslev, T., Zhou, H., 2002. Estimating stochastic volatility diffusions using condi-
tional moments of integrated volatility. Journal of Econometrics 109, 33–65.
Bradley, R. C., 2005. Basic properties of strong mixing conditions: A survey and some
open questions. Probability Surveys 2, 107–144.
Brockwell, P. J., 2001. Lévy-driven CARMA processes. Annals of the Institute of Statistical Mathematics 53, 113–124.
Brownlees, C. T., Gallo, G. M., 2006. Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics & Data Analysis 51, 2232–2245.
Corradi, V., Distaso, W., 2006. Semi-parametric comparison of stochastic volatility
models using realized measures. Review of Economic Studies 73, 635–667.
Cox, J. C., Ingersoll, J. E., Ross, S. A., 1985. A theory of the term structure of interest
rates. Econometrica 53, 385–408.
Dacorogna, M., Müller, U., Nagler, R., Olsen, R., Pictet, O., 1993. A geographical model
for the daily and weekly seasonal volatility in the foreign exchange market. Journal
of International Money and Finance 12, 413–438.
Eraker, B., 2001. Markov Chain Monte Carlo analysis of diffusion models with appli-
cation to finance. Journal of Business and Economic Statistics 19-2, 177–191.
Falkenberry, T. N., 2001. High frequency data filtering. Technical report, Tick Data.
Forman, J. L., Sørensen, M., 2008. The Pearson diffusions: A class of statistically
tractable diffusion processes. Scandinavian Journal of Statistics 35, 438–465.
Gallant, A. R., Tauchen, G., 1996. Which moments to match? Econometric Theory 12,
657–681.
Gourieroux, C., Monfort, A., Renault, E., 1993. Indirect inference. Journal of Applied
Econometrics 8, S85–S118.
Hall, P., Horowitz, J., Jing, B., 1995. On blocking rules for the bootstrap with dependent
data. Biometrika 82, 561–574.
Hansen, P. R., Lunde, A., 2006. Realized variance and market microstructure noise.
Journal of Business and Economic Statistics 24, 127–218.
Heston, S. L., 1993. A closed-form solution for options with stochastic volatility with
applications to bond and currency options. Review of Financial Studies 6, 327–343.
Heyde, C. C., 1997. Quasi-Likelihood and its Application. Springer-Verlag, New York.
Jacod, J., Li, Y., Mykland, P., Podolskij, M., Vetter, M., 2009. Microstructure noise in
the continuous case: The pre-averaging approach. Stochastic Processes and Their
Applications 119, 2249–2276.
Karlin, S., Taylor, H. M., 1975. A First Course in Stochastic Processes. Academic Press,
New York.
Lahiri, S. N., 1999. Theoretical comparison of block bootstrap methods. Annals of
Statistics 27, 386–404.
Newey, W. K., West, K. D., 1987. A simple positive semi-definite, heteroskedasticity
and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Nolsøe, K., Nielsen, J. N., Madsen, H., 2000. Prediction-based estimating functions for
diffusion processes with measurement noise. Technical reports no. 10, Informatics
and mathematical modelling, Technical University of Denmark.
Sørensen, M., 2000. Prediction-based estimating functions. Econometrics Journal 3,
123–147.
Sørensen, M., 2011. Prediction-based estimating functions: Review and new develop-
ments. Brazilian Journal of Probability and Statistics 25, 362–391.
Todorov, V., 2009. Estimation of continuous-time stochastic volatility models with
jumps using high-frequency data. Journal of Econometrics 148, 131–148.
Todorov, V., Tauchen, G., 2006. Simulation methods for Lévy-driven CARMA stochastic
volatility models. Journal of Business and Economic Statistics 24, 455–469.
CHAPTER 2

ON ESTIMATION METHODS FOR NON-GAUSSIAN
ORNSTEIN-UHLENBECK PROCESSES:
A MONTE CARLO STUDY
Anne Floor Brix
Aarhus University and CREATES
Abstract

Estimators based on quadratic martingale estimating functions are derived for the parameters governing the non-Gaussian Ornstein-Uhlenbeck process. The performance of the estimators is analyzed in a Monte Carlo study and compared to the performance of the approximate maximum likelihood estimator from Valdivieso et al. (2009), which is based on fast Fourier transforms. Finite activity as well as infinite activity non-Gaussian Ornstein-Uhlenbeck processes are considered. The performance of the estimators is investigated in different scenarios, corresponding to the two different types of processes used for modeling commodity spot prices: the "base-signal" and the "spike" process. Different ways of obtaining initial values for the estimation procedures are also analyzed.
2.1 Introduction
In the vast literature on energy commodities, Ornstein-Uhlenbeck (OU) processes are
often used as building blocks for constructing models. Since the spot prices in these
markets are determined by supply and demand equilibrium they are, in contrast to
stock prices, stationary and mean-revert to a, possibly stochastic, seasonally varying
mean-level. The usage of Gaussian Ornstein-Uhlenbeck processes for modeling
energy spot prices goes back to the well-known Schwartz model from Schwartz
(1997). Another stylized feature of energy markets, that needs to be taken into account
when building models, is price spikes, which are particularly important for electricity
modeling. Spikes in electricity spot prices are caused by large imbalances in supply
and demand. Because of the exponentially increasing marginal cost structure of
production1 and inelastic demand, these imbalances result in sudden large jumps in
the spot price. The mean-reversion is typically very strong during these peak periods
and the prices rapidly revert back to normal evolution, leaving a spike in the spot
price series. Models based on non-Gaussian OU processes are a popular way of dealing
with these price spikes, see for instance Benth and Saltyte Benth (2004), Benth et al.
(2007), Meyer-Brandis and Tankov (2008), Klüppelberg, Meyer-Brandis, and Schmidt
(2010) and Benth and Vos (2013). In these papers the spot prices are often modeled as
a superposition of OU processes, one OU process for modeling the normal variations
(often referred to as the base-signal) and one used for capturing the price spikes. In
for instance Benth et al. (2007) and Benth, Kiesel, and Nazarova (2012) the authors
also use a non-Gaussian OU process for the base-signal part of electricity spot prices.
Statistical inference for superpositions of OU processes is complicated by the
resulting model not being Markovian. Most (if not all) non-Bayesian parametric
approaches found in the literature on commodity modeling deal with this problem
by splitting the observations into a base-component and a spike-component using
various filtering techniques, see Meyer-Brandis and Tankov (2008), Klüppelberg et al.
(2010) and Benth et al. (2012). Parameter estimation is then, subsequently, carried
out on each of the filtered OU processes. This will serve as the starting point for the
present chapter, which is an investigation of various estimation methods used for
conducting parametric inference for non-Gaussian OU processes. More specifically
we will consider observations from D-OU processes, where D refers to the marginal
distribution of the observations. As for the choices of D, we consider both the finite
activity Γ-OU process and the infinite activity Inverse Gaussian (IG) OU process.
In contrast to the Gaussian case, maximum likelihood estimation is not directly
feasible since no analytical expression for the transition density is available. However,
the characteristic function is analytically tractable and can be inverted using Fourier
techniques, making approximate likelihood estimation possible. This idea was sug-
¹Nuclear power and renewable energy often have low variable costs and are used to cover the base demand. A sudden big increase in demand is often covered by burning fuel, which has a very high variable cost.
gested and carried out in Valdivieso et al. (2009), where the finite sample performance
is also investigated for non-Gaussian OU processes with marginal distributions simi-
lar to the ones used for modeling the normal variations in commodity spot prices.
This chapter extends the Monte Carlo study from Valdivieso et al. (2009) to also in-
clude parameter configurations resulting in marginal distributions more suitable for
modeling the spike behavior, for instance by having larger but fewer jumps in the
Γ-OU case. Instead of approximating the likelihood function, the estimation method
based on martingale estimating functions (MGEFs) aims at approximating the score
function. Since the non-Gaussian OU process is a Markov process and conditional moments are analytically computable, MGEFs are a natural choice of estimating function. For an introduction to MGEFs, a non-exhaustive list of references includes Bibby and Sørensen (1995), Kessler (1995), Sørensen (1999) and Bibby, Jacobsen, and
Sørensen (2002).
The class of MGEFs considered in this chapter is the quadratic MGEFs which
are based on the conditional mean and variance of the OU process. The optimal quadratic MGEF and a simple benchmark with a similar structure are derived for the
non-Gaussian OU process and their finite sample performances are studied and
compared to the maximum likelihood method based on fast Fourier transforms from
Valdivieso et al. (2009). MGEFs are usually derived and studied in the context of
(jump) diffusions. A suboptimal quadratic MGEF is derived for non-Gaussian OU
processes in Hubalek and Posedel (2013) in the context of the stochastic volatility
model from Barndorff-Nielsen and Shephard (2001b), under the assumption that
both the price process and the volatility process are observable. The quadratic MGEF from
Hubalek and Posedel (2013) admits an explicit estimator and the authors prove both
consistency and asymptotic normality of this estimator. However, to the best of my
knowledge, it is the first time that optimal quadratic MGEFs are explicitly derived for
non-Gaussian OU processes, making a study of the finite sample performances of
the resulting estimator possible. The optimal quadratic MGEF does not result in an
explicit expression for the estimator, so numerical routines must be applied.
As natural benchmarks for the two estimation approaches under consideration,
we consider straightforward estimation methods often applied in the aforementioned
literature. In particular, we investigate which of these methods are best suited for
generating initial values to be used in the two estimation approaches, and investigate
how big, if any, the gain is from using the two more complex methods. Both finite
and infinite activity OU processes are considered in two different parameter settings (the base-signal and the spike scenario). Furthermore, the finite sample performances
are also investigated for two different levels of the mean-reversion parameter, which
directly translates into two different observation frequencies.
The chapter is organized as follows: in the following section the non-Gaussian
OU process and some of its basic properties are reviewed. In section 3 the method
from Valdivieso et al. (2009) is reviewed and an optimal MGEF for the non-Gaussian
OU process is derived. Section 4 contains the extensive Monte Carlo study of the
finite sample performances of the various estimators under consideration. In section
5 possible extensions of the estimation procedures to other non-Gaussian OU based
models are discussed. The final section concludes on the findings.
2.2 Non-Gaussian Ornstein-Uhlenbeck Processes
The non-Gaussian Ornstein-Uhlenbeck (OU) process, X, is given as the stationary solution of the following stochastic differential equation driven by the Lévy process Z:

dX(t) = -λX(t) dt + dZ(λt), λ > 0, (2.1)

where X(0) is assumed to be independent of Z and have marginal distribution D. The Lévy process Z is denoted the background driving Lévy process (BDLP) and will throughout the chapter be assumed to be a subordinator². This choice ensures positivity of X and also implies that X will have bounded variation. Except for the derivation of a high-frequency estimator for the mean-reversion parameter, all the estimators in the paper are also valid with Z being a general Lévy process. The time-change of the BDLP, Z, implies that the marginal distribution of X will not depend on λ.³ Since E(Z(λt)) = λt E(Z(1)), we can rewrite the SDE in equation (2.1) as

dX(t) = -λ(X(t) - E(Z(1))) dt + d(Z(λt) - E(Z(λt))).
Hence, it follows that X(t) mean-reverts to the long-run mean of the BDLP, E(Z(1)), at rate λ. The non-Gaussian OU process moves up when the BDLP Z jumps up, and when Z is of finite activity X(t) decays exponentially at rate λ between the jumps. Note that, due to the time-change, the mean-reversion parameter λ also determines the rate at which jumps occur. When E(log(1 + |Z(1)|)) < ∞, the SDE (2.1) has the following stationary solution:

X(t) = e^{-λt} X(0) + ∫_0^t e^{-λ(t-u)} dZ(λu). (2.2)

From the solution it is clear that X(t) will be non-negative when the background driving Lévy process (BDLP) Z is a subordinator and X(0) ≥ 0. By using (2.2) and letting ∆ denote the length of the time interval between observations, we can derive a recursive relationship for the discretized non-Gaussian OU process:

X(i∆) = e^{-λ∆} X((i-1)∆) + e^{-λi∆} ∫_{λ(i-1)∆}^{λi∆} e^s dZ(s)
 =_d e^{-λ∆} X((i-1)∆) + e^{-λ∆} ∫_0^{λ∆} e^s dZ(s), (2.3)
²A Lévy process with stationary, independent and non-negative increments. ³See Barndorff-Nielsen and Shephard (2001b).
where the equality in distribution follows from Proposition 3.1 in Valdivieso et al. (2009). Now, introducing the notation E(X(t)) = ξ and Var(X(t)) = ω², it follows from (2.3) that the non-Gaussian OU process can be interpreted as a continuous-time analog of the AR(1) model, since we have

X(i∆) = e^{-λ∆} X((i-1)∆) + ε_i, where the ε_i are i.i.d., (2.4)

with

E(ε) = ξ(1 - e^{-λ∆}) and Var(ε) = ω²(1 - e^{-2λ∆}).
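For the Γ(ν,α)-OU process treated below, the Lévy functionals ε_i in (2.3)-(2.4) can be simulated exactly, since the BDLP is compound Poisson with jump intensity ν and Exp(mean α) jump sizes; that compound-Poisson form is the standard one from Barndorff-Nielsen and Shephard (2001b), but the implementation sketched here is our own, and the moment check uses parameter values of our choosing.

```python
import numpy as np

rng = np.random.default_rng(3)

def gamma_ou_innovations(lam, nu, alpha, delta_t, size, rng):
    """Draw eps from (2.3) for the Gamma(nu, alpha)-OU process.

    eps = e^{-lam*dt} * sum_j e^{s_j} J_j, with Poisson(nu*lam*dt) jumps,
    jump times s_j uniform on [0, lam*dt] and Exp(mean alpha) jump sizes.
    """
    T = lam * delta_t
    eps = np.empty(size)
    n_jumps = rng.poisson(nu * T, size=size)
    for i, k in enumerate(n_jumps):
        s = rng.uniform(0.0, T, size=k)
        jumps = rng.exponential(scale=alpha, size=k)
        eps[i] = np.exp(-T) * np.sum(np.exp(s) * jumps)
    return eps

lam, nu, alpha, dt = 1.0, 2.0, 0.5, 0.1
xi, omega2 = nu * alpha, nu * alpha**2      # stationary mean and variance of X
eps = gamma_ou_innovations(lam, nu, alpha, dt, size=100_000, rng=rng)
print(eps.mean(), xi * (1.0 - np.exp(-lam * dt)))
print(eps.var(), omega2 * (1.0 - np.exp(-2.0 * lam * dt)))
```

The simulated first two moments of ε reproduce the closed forms below (2.4), which is a useful implementation check before running any of the Monte Carlo experiments.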
In Barndorff-Nielsen and Shephard (2001b) the relationship between the cumulant generating function of the non-Gaussian OU process and the BDLP is established. If we let k̄(θ) := log(E[exp(-θZ(1))]) and k(θ) := log(E[exp(-θX(t))]), then it follows from Barndorff-Nielsen and Shephard (2001b) that k̄(θ) = θk′(θ), and we furthermore get κ̄_m = m κ_m, where κ̄_m and κ_m denote the cumulants of Z(1) and X(t), respectively. In particular this yields E(X(t)) = E(Z(1)) and Var(X(t)) = ½ Var(Z(1)). The autocorrelation function of the stationary non-Gaussian OU process is given by r(u) = e^{-λ|u|}.

When building non-Gaussian OU processes, there are two different approaches.
One approach is to specify the marginal distribution, D, of the process X, in which case X is called a D-OU process. In the other approach, the non-Gaussian OU process is constructed by specifying the BDLP. For constraints on valid BDLPs see Barndorff-Nielsen and Shephard (2001b). We will take the first approach and consider two different choices of D, both from the class of generalized inverse Gaussian (GIG) marginal laws. The following two special cases of the GIG class, which are commonly used in the literature, will be considered.

• The Γ-OU process:
In this case X ∼ Γ(ν,α) with shape parameter ν > 0 and scale parameter α > 0. The density of the non-Gaussian OU process X is given by

f(x; ν, α) = (1/(Γ(ν)α^ν)) x^{ν-1} exp(-x/α), ∀x > 0,

where Γ(·) denotes the gamma function.

• The IG-OU process:
In this case X ∼ IG(δ,γ) with δ > 0 and γ ≥ 0. The density of X is given by

f(x; δ, γ) = (δ/√(2π)) exp(δγ) x^{-3/2} exp(-½(δ²x^{-1} + γ²x)), ∀x > 0.

In the literature the Inverse Gaussian distribution has been parametrized in several different ways. Here the parametrization from Barndorff-Nielsen (1998) is followed.
The Γ(ν,α)-OU process is of finite activity, meaning that almost all paths have a finite number of jumps on any finite time interval. The IG(δ,γ)-OU process displays an infinite number of jumps on any finite time interval and is therefore of infinite activity. The activity of the OU process is determined by the behavior of the small jumps. More precisely, X will be of finite activity if ∫_{0+}^1 W(dx) < ∞, where W is the Lévy measure in the Lévy-Khintchine representation of Z(1).

One of the questions this chapter aims to answer is whether the different jump characters of the processes will influence the finite sample performances of the estimation methods under consideration.
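One of the straightforward benchmark methods alluded to above is pure moment matching: estimate ξ and ω² by the sample mean and variance and back out λ from the lag-one autocorrelation via r(∆) = e^{-λ∆}. The sketch below is our own, and the sanity check deliberately uses a Gaussian AR(1) surrogate with the same first two moments and ACF rather than an actual D-OU path:

```python
import numpy as np

def moment_estimates(x, delta_t):
    """Moment-based estimates of (xi, omega^2, lambda) from equally spaced data,
    using E(X) = xi, Var(X) = omega^2 and r(u) = exp(-lambda |u|)."""
    xi_hat = x.mean()
    omega2_hat = x.var()
    xc = x - xi_hat
    rho1 = np.sum(xc[1:] * xc[:-1]) / np.sum(xc**2)
    lam_hat = -np.log(rho1) / delta_t
    return xi_hat, omega2_hat, lam_hat

rng = np.random.default_rng(4)
lam, dt, n = 1.0, 0.1, 100_000
phi = np.exp(-lam * dt)
x = np.empty(n)
x[0] = rng.normal()
innov = rng.normal(scale=np.sqrt(1.0 - phi**2), size=n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + innov[i]

xi_hat, omega2_hat, lam_hat = moment_estimates(x, dt)
print(xi_hat, omega2_hat, lam_hat)
```

Because these estimates use only the AR(1)-type moment structure (2.4), they apply to both the Γ-OU and the IG-OU case, which is what makes them natural candidates for generating initial values.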
2.3 Estimation Methods
In this section the two main estimation methods under consideration will be pre-
sented. The Gaussian OU process can be estimated using the maximum likelihood
(ML) method. Although the model is still Markovian in the non-Gaussian case, we
no longer have an analytical expression for the density of the i.i.d. Lévy functionals,
ε, from (2.4). Furthermore, for finite activity OU processes the Lévy functionals are
mixed random variables that only have a density conditional on the presence of
jumps. One way of circumventing these challenges and performing approximate
likelihood estimation was suggested in Valdivieso et al. (2009). The method from Val-
divieso et al. (2009) uses the fast Fourier transform (FFT) to obtain an approximation
of the likelihood function by inverting the characteristic function of the Lévy func-
tionals. The estimation method from Valdivieso et al. (2009) is presented in the next
subsection and should, if well implemented, give parameter estimates close to the
infeasible ML estimates. In subsection 2.3.2 the estimation method based on martin-
gale estimating functions (MGEFs) is presented. This method aims at approximating
the score function rather than the likelihood function. In contrast to the method from Valdivieso et al. (2009), which exploits the link between the characteristic function and the density, the MGEF based approximation is somewhat ad hoc: the unknown score function is simply approximated by martingales of a certain structure.
The optimal MGEF within a given class of MGEFs is the one closest to the score
function in an L² sense. The choice of MGEFs used to approximate the score will
obviously influence the efficiency of the resulting estimator. The MGEFs considered
in this chapter are the so-called quadratic MGEFs. The optimal MGEF within this
class will be derived in subsection 2.3.2. The estimation method based on quadratic
MGEFs relies only on computation of conditional moments of the OU process and
does not require knowledge of the entire conditional density function. Therefore, the
MGEF based procedure is not affected by the Lévy functionals of the finite activity OU processes being mixed random variables.
2.3.1 Maximum Likelihood Estimation using the Fast Fourier Transform
Assume we have observations x0, x∆, . . . , xn∆ from a discretely sampled D-OU process
X , sampled at equidistant time intervals of length ∆. We will let time be measured
in days such that daily sampling corresponds to ∆= 1. Due to Markovianity and the
AR(1) structure of X , the likelihood function of the sample is given by
\mathcal{L}(\theta) = f_{X(0)}(x_0)\prod_{i=1}^{n} f_{X(i\Delta)\mid X((i-1)\Delta)=x_{(i-1)\Delta}}(x_{i\Delta}),

where f_{X(i\Delta)\mid X((i-1)\Delta)=x_{(i-1)\Delta}} denotes the conditional density of X(i\Delta) given that X((i-1)\Delta) takes the value x_{(i-1)\Delta}. Unfortunately, this conditional density is not available in
closed form when X is a non-Gaussian OU process. One way of circumventing this
problem is to approximate the conditional density as done in Valdivieso et al. (2009).
Recall the recursive formula

X(i\Delta) \stackrel{d}{=} e^{-\lambda\Delta}X((i-1)\Delta) + e^{-\lambda\Delta}Z^{*}(\Delta),

where Z^{*}(\Delta) = \int_{0}^{\lambda\Delta} e^{s}\,\mathrm{d}Z(s). From this it follows that the conditional cumulative distribution function evaluated at x_{i\Delta} is given by

P\bigl(X(i\Delta) \le x_{i\Delta} \mid X((i-1)\Delta) = x_{(i-1)\Delta}\bigr) = P\bigl(Z^{*}(\Delta) \le e^{\lambda\Delta}x_{i\Delta} - x_{(i-1)\Delta}\bigr).

If Z^{*}(\Delta) is a continuous random variable, we can differentiate w.r.t. x_{i\Delta} and obtain the link between the density functions

f_{X(i\Delta)\mid X((i-1)\Delta)=x_{(i-1)\Delta}}(x_{i\Delta}) = e^{\lambda\Delta} f_{Z^{*}(\Delta)}\bigl(e^{\lambda\Delta}x_{i\Delta} - x_{(i-1)\Delta}\bigr).
The idea is then to compute the characteristic function, \varphi_{Z^{*}(\Delta)}, of the Lévy functional Z^{*}(\Delta) and use the inversion formula

f_{Z^{*}(\Delta)}(z) = \frac{1}{\pi}\int_{0}^{\infty} e^{-iuz}\varphi_{Z^{*}(\Delta)}(u)\,\mathrm{d}u \qquad (2.5)

and the discrete fast Fourier transform (FFT) to evaluate the density function of Z^{*}(\Delta). Using the proposition below from Barndorff-Nielsen (1998), the cumulant characteristic function, and hence also the characteristic function of Z^{*}(\Delta), can be computed in terms of the cumulant characteristic function of the BDLP, Z, and of a process with the same marginal distribution D as the OU process.

Proposition 2. For all \Delta > 0 and v \in \mathbb{R} we have that

C_{Z^{*}(\Delta)}(v) = \log\bigl(E(\exp(ivZ^{*}(\Delta)))\bigr) = \lambda\int_{0}^{\Delta} C_{Z(1)}(v\exp(\lambda s))\,\mathrm{d}s,

where C_{Z(1)}(v) = v\,\frac{\mathrm{d}C_{D}}{\mathrm{d}v}(v).

Proof. See Barndorff-Nielsen (1998) for a proof.
In the Monte Carlo study the method will be applied to the Γ(ν,α)-OU process and the IG(δ,γ)-OU process. For the Γ(ν,α)-OU process, the corresponding Lévy functional Z^{*}(\Delta) will not be a continuous random variable but a mixed random variable, since Z^{*}(\Delta) equals 0 if there are no jumps on the interval, which happens with probability p = e^{-\nu\lambda\Delta}. Again following Valdivieso et al. (2009), the conditional density of the OU process is slightly modified to

f_{X(i\Delta)\mid X((i-1)\Delta)=x_{(i-1)\Delta}}(x_{i\Delta}) =
\begin{cases}
p, & \text{if } e^{\lambda\Delta}x_{i\Delta} - x_{(i-1)\Delta} = 0,\\
e^{\lambda\Delta} f^{J}_{Z^{*}(\Delta)}\bigl(e^{\lambda\Delta}x_{i\Delta} - x_{(i-1)\Delta}\bigr), & \text{if } e^{\lambda\Delta}x_{i\Delta} - x_{(i-1)\Delta} > 0,
\end{cases}
\qquad (2.6)

where f^{J}_{Z^{*}(\Delta)} denotes the density of Z^{*}(\Delta) conditional on the presence of jumps. Since the characteristic function of Z^{*}(\Delta) equals one when there are no jumps, f^{J}_{Z^{*}(\Delta)} can be evaluated by computing the Fourier transform of

\varphi^{J}_{Z^{*}(\Delta)} = \frac{\varphi_{Z^{*}(\Delta)} - p}{1 - p}.

The
characteristic function of a Γ(ν,α)-distributed random variable X is given by

\varphi_{X}(v) = \Bigl(\frac{1/\alpha}{1/\alpha - iv}\Bigr)^{\nu},

and using Proposition 2 we find C_{D}(v) = \log(\varphi_{X}(v)) = \nu\bigl(\log(1/\alpha) - \log(1/\alpha - iv)\bigr) and

C_{Z(1)}(v) = \frac{\nu v i}{1/\alpha - iv} = \nu\Bigl(\frac{1/\alpha}{1/\alpha - iv} - 1\Bigr).

The characteristic function of the Lévy functional can therefore be found to equal

\varphi_{Z^{*}(\Delta)}(v) = \exp\biggl(\lambda\int_{0}^{\Delta}\nu\Bigl(\frac{1/\alpha}{1/\alpha - iv\exp(\lambda s)} - 1\Bigr)\mathrm{d}s\biggr)
= \exp\biggl(\nu\log\Bigl(\frac{1/\alpha - iv}{1/\alpha - iv\exp(\lambda\Delta)}\Bigr)\biggr)
= \Bigl(\frac{1/\alpha - iv}{1/\alpha - iv\exp(\lambda\Delta)}\Bigr)^{\nu}.
In the IG(δ,γ)-OU case the characteristic function \varphi_{X}(v) is given by

\varphi_{X}(v) = \exp\bigl(\delta\gamma - \delta\sqrt{\gamma^{2} - 2iv}\bigr),

so, again using Proposition 2, we find

C_{Z(1)}(v) = \frac{\delta v i}{\sqrt{\gamma^{2} - 2iv}},

and

\varphi_{Z^{*}(\Delta)}(v) = \exp\biggl(\lambda\int_{0}^{\Delta}\frac{\delta iv\exp(\lambda s)}{\sqrt{\gamma^{2} - 2iv\exp(\lambda s)}}\,\mathrm{d}s\biggr)
= \exp\Bigl(\delta\Bigl[-\sqrt{\gamma^{2} - 2iv\exp(\lambda s)}\Bigr]_{0}^{\Delta}\Bigr)
= \exp\Bigl(\delta\bigl(\sqrt{\gamma^{2} - 2iv} - \sqrt{\gamma^{2} - 2iv\exp(\lambda\Delta)}\bigr)\Bigr).
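The two closed forms above can be cross-checked numerically against Proposition 2. The sketch below (my own illustration, with scenario-1 parameters assumed) compares the closed-form characteristic functions with a trapezoidal discretization of the integral \lambda\int_{0}^{\Delta} C_{Z(1)}(v e^{\lambda s})\,\mathrm{d}s:

```python
import numpy as np

lam, Delta = 0.25, 1.0
nu, alpha = 10.0, 1/15
delta, gam = np.sqrt(20/3), np.sqrt(15)

def C_Z1_gamma(v):
    # cumulant function of Z(1) in the Gamma-OU case
    return nu * ((1/alpha) / (1/alpha - 1j*v) - 1.0)

def C_Z1_ig(v):
    # cumulant function of Z(1) in the IG-OU case
    return delta * v * 1j / np.sqrt(gam**2 - 2j*v)

def phi_Zstar_numeric(C, v, n=20001):
    # Proposition 2: phi_{Z*(Delta)}(v) = exp(lam * int_0^Delta C_{Z(1)}(v e^{lam s}) ds)
    s = np.linspace(0.0, Delta, n)
    vals = C(v * np.exp(lam * s))
    h = Delta / (n - 1)
    integral = h * (vals.sum() - 0.5*(vals[0] + vals[-1]))   # trapezoid rule
    return np.exp(lam * integral)

v = 1.7
phi_gamma = ((1/alpha - 1j*v) / (1/alpha - 1j*v*np.exp(lam*Delta)))**nu
phi_ig = np.exp(delta*(np.sqrt(gam**2 - 2j*v)
                       - np.sqrt(gam**2 - 2j*v*np.exp(lam*Delta))))
print(abs(phi_Zstar_numeric(C_Z1_gamma, v) - phi_gamma))  # ~0
print(abs(phi_Zstar_numeric(C_Z1_ig, v) - phi_ig))        # ~0
```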
Given the characteristic functions, FFT can now be used to approximate the likelihood
function.
Discrete Fast Fourier Transform
The discrete FFT is a numerical method used for evaluating the integral in (2.5) on a grid of points. For an input vector x = (x(0), x(1), \ldots, x(N-1)), the function fft in MATLAB computes the sums

X(j) = \sum_{m=0}^{N-1} x(m)\,\omega_{N}^{m(j-1)}, \qquad \omega_{N} = e^{-2\pi i/N}, \qquad j = 1, \ldots, N. \qquad (2.7)
To see how this fits with the inversion formula, consider the discretized version of the integral using the trapezoid rule

f_{Z^{*}(\Delta)}(z) = \frac{1}{\pi}\int_{0}^{\infty} e^{-iuz}\varphi_{Z^{*}(\Delta)}(u)\,\mathrm{d}u \approx \frac{1}{\pi}\sum_{m=0}^{N-1}\delta_{m}\,e^{-iu_{m}z}\varphi_{Z^{*}(\Delta)}(u_{m})\,\Delta u,

where \delta_{0} = 1/2 and \delta_{m} = 1 for m \neq 0. Now let \eta = \Delta u and u_{m} = \eta m. If we let z_{j} = -b + \zeta(j-1) for j = 1, \ldots, N with \zeta = 2\pi/(\eta N) and b \in \mathbb{R}, we obtain the approximation

f_{Z^{*}(\Delta)}(z_{j}) \approx \frac{1}{\pi}\sum_{m=0}^{N-1}\delta_{m}\,e^{-i\eta m(-b + \frac{2\pi}{\eta N}(j-1))}\,\varphi_{Z^{*}(\Delta)}(u_{m})\,\eta = \sum_{m=0}^{N-1} x(m)\,e^{-\frac{2\pi i}{N}m(j-1)}, \qquad\text{with } x(m) = \frac{1}{\pi}\,\delta_{m}\,e^{i\eta m b}\,\varphi_{Z^{*}(\Delta)}(\eta m)\,\eta,

which is of the FFT form (2.7) and can be used to evaluate the desired density f_{Z^{*}(\Delta)} at the grid points z_{1}, \ldots, z_{N}. Note that to center the grid points around 0, one sets b = \zeta N/2. When constructing the likelihood function, linear interpolation will be used to evaluate the density function at points that do not coincide with the grid points.
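A minimal Python sketch of the scheme above (using numpy's FFT in place of MATLAB's fft, the IG-OU characteristic function derived earlier, and taking the real part since the density is real; the grid sizes N and η are illustrative choices of mine):

```python
import numpy as np

lam, Delta = 0.25, 1.0
delta, gam = np.sqrt(20/3), np.sqrt(15)      # IG scenario-1 parameters

def phi_Zstar(u):
    # characteristic function of Z*(Delta) for the IG(delta, gamma)-OU process
    return np.exp(delta * (np.sqrt(gam**2 - 2j*u)
                           - np.sqrt(gam**2 - 2j*u*np.exp(lam*Delta))))

N, eta = 2**14, 0.25                 # number of grid points and u-spacing
zeta = 2*np.pi / (eta*N)             # resulting z-spacing
b = zeta*N/2                         # centers the z-grid around zero
m = np.arange(N)
w = np.where(m == 0, 0.5, 1.0)       # trapezoid weights delta_m
xin = w * np.exp(1j*eta*m*b) * phi_Zstar(eta*m) * eta / np.pi
f = np.fft.fft(xin).real             # density of Z*(Delta) at the grid points
z = -b + zeta*m                      # z_j = -b + zeta*(j-1)

mass = f.sum()*zeta                  # should be close to 1
mean = (z*f).sum()*zeta              # should be close to E[Z*(Delta)]
print(mass, mean)
```

The resulting grid (z, f) is then what the linear interpolation acts on when the likelihood is evaluated.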
2.3.2 Martingale Estimating Functions
In this subsection the theory of martingale estimating functions (MGEFs) is briefly
reviewed and the optimal quadratic MGEF for the non-Gaussian OU process is de-
rived. Most of the literature on MGEFs is developed for diffusion processes, see for
example Bibby and Sørensen (1995) and Bibby et al. (2002), but as also noted in
Sørensen (1997), Kessler (1995) and Sørensen (1999) most of the results extend to
general Markov processes and more general MGEFs than those considered here.
The non-Gaussian OU process is a Markov process but, as was already noted
in the previous subsection, the transition density is unknown and exact maximum
likelihood estimation is therefore infeasible. The main idea underlying the use of
MGEFs is to try to approximate the unknown score function, which is itself an MGEF.
Asymptotic results for MGEFs utilize the well-developed martingale limit theory and
the reader is referred to Bibby and Sørensen (1995), Kessler (1995) and Sørensen
(1999) for conditions ensuring consistency and asymptotic normality of the resulting
estimators.
An estimating function, G_n(θ), is a function of the n data points and of the p-dimensional parameter vector θ ∈ Θ ⊆ R^p that we wish to estimate. An estimator is then obtained by solving the p equations G_n(θ) = 0 w.r.t. θ. In our case θ is 3-dimensional and consists of the mean-reversion parameter λ and the two parameters governing the marginal distribution D. An MGEF is an estimating function satisfying

E_{\theta}\bigl(G_n(\theta) \mid \mathcal{F}_{n-1}\bigr) = G_{n-1}(\theta), \qquad n = 1, 2, \ldots,

where \mathcal{F}_{n-1} is the σ-field generated by the past observations up to time (n-1)\Delta and G_0 = 0. If we let y ↦ p(s, x, y;θ) denote the transition density, i.e. the conditional density of X_{t+s} given X_t = x, then the score function takes the form

U_n(\theta) = \sum_{i=1}^{n}\partial_{\theta}\log p(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta).
Under mild regularity conditions allowing for the interchange of differentiation and
integration it easily follows that the score function is an MGEF, see for instance
Barndorff-Nielsen and Sørensen (1994). The score function will be approximated
using MGEFs of the form

G_n(\theta) = \sum_{i=1}^{n} g(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta) = \sum_{i=1}^{n} a(\Delta, X_{(i-1)\Delta};\theta)\,h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta), \qquad (2.8)

where h = (h_1, \ldots, h_N)' and, for each j, the function h_j satisfies \int_{S} h_j(\Delta, x, y;\theta)\,p(\Delta, x, y;\theta)\,\mathrm{d}y = 0 for all x in the state space, S, of X and for all θ ∈ Θ. The p × N weight matrix a(\Delta, x;\theta) is a function of x such that (2.8) is P_θ-integrable. G_n(θ) defined in this way is clearly an MGEF and in particular it is an unbiased estimating function, i.e. E_θ(G_n(θ)) = 0. Given the real-valued functions h_j, the weight matrix
can be chosen in an optimal way using the theory of optimal estimating functions.
The optimal weight matrix will result in the estimating function that approximates
the score function the best, in a mean square sense, within the class of estimating
functions of the form (2.8). Results stating the optimal weight matrix in a setting containing non-Gaussian OU processes can be found in Theorem 3.1 of Sørensen (1997) and, for the more general setting of Markov chains, in Kessler (1995). How to choose the approximating functions, the h_j's, is however much more of an art than a science. The choice of h_j's will affect how well the score function is approximated and hence also the efficiency of the corresponding estimator.
In our application to non-Gaussian OU processes we will focus on the quadratic
MGEFs, which have proven to work well in the diffusion setting (see for instance
Bibby and Sørensen (1995)). The quadratic MGEF is of the form (2.8) with N = 2 and
with h_1 and h_2 given by

h_{1}(\Delta, x, y;\theta) = y - F(\Delta, x;\theta),
h_{2}(\Delta, x, y;\theta) = \bigl(y - F(\Delta, x;\theta)\bigr)^{2} - \beta(\Delta, x;\theta),

where F and β denote the conditional mean and variance of the transition density, F(\Delta, x;\theta) = E_{\theta}(X_{\Delta}\mid X_{0} = x) and \beta(\Delta, x;\theta) = \mathrm{Var}_{\theta}(X_{\Delta}\mid X_{0} = x). With no a priori insight on how to choose the approximating functions, it seems like a good choice to consider
quadratic MGEFs, since these will ensure that the empirical first and second order
conditional moments match the theoretical ones. One could also consider increasing N and including functions of higher-order conditional moments or trigonometric/exponential moments of X. However, this would further complicate the derivation of the optimal weight matrix, and as already mentioned the quadratic MGEFs seem like a natural starting point for investigating the potential of MGEFs for pure-jump OU processes.
In most settings F and β will have to be computed using simulations, but for the
non-Gaussian OU process we can find explicit analytic expressions for them as well
as for the optimal weight matrix a^{*}(\Delta, X_{(i-1)\Delta};\theta). The optimal weight matrix is found by computing the L² projection of \partial_{\theta}\log p(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta) onto the space spanned by the two functions h_1 and h_2. As a result (see Kessler (1995)), the optimal weight is given by
a^{*}(\Delta, X_{(i-1)\Delta};\theta) = -E_{\theta}\bigl(\partial_{\theta'}h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta)\mid\mathcal{F}_{i-1}\bigr)'\,V_h(\Delta, X_{(i-1)\Delta};\theta)^{-1}, \quad\text{where}

V_h(\Delta, X_{(i-1)\Delta};\theta) = E_{\theta}\bigl(h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta)\,h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta)'\mid\mathcal{F}_{i-1}\bigr).

With our choice of h we have

V_h(\Delta, x;\theta) = E_{\theta}\bigl(h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta)\,h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta)'\mid X_{(i-1)\Delta} = x\bigr) = \begin{pmatrix} \beta(\Delta, x;\theta) & \eta(\Delta, x;\theta)\\ \eta(\Delta, x;\theta) & \psi(\Delta, x;\theta) \end{pmatrix},

where \beta(\Delta, x;\theta) as before denotes the conditional variance of X_{i\Delta} given X_{(i-1)\Delta} = x and

\eta(\Delta, x;\theta) = E_{\theta}\bigl((X_{i\Delta} - F(\Delta, X_{(i-1)\Delta};\theta))^{3}\mid X_{(i-1)\Delta} = x\bigr),
\psi(\Delta, x;\theta) = E_{\theta}\bigl((X_{i\Delta} - F(\Delta, X_{(i-1)\Delta};\theta))^{4}\mid X_{(i-1)\Delta} = x\bigr) - \beta(\Delta, x;\theta)^{2}.
To ease notation, we will suppress \Delta and for instance write F(x;\theta) instead of F(\Delta, x;\theta). The optimal weight matrix (after multiplying by -1) will then have columns given by

a^{*}_{1}(x;\theta) = \frac{\partial_{\theta}\beta(x;\theta)\,\eta(x;\theta) - \partial_{\theta}F(x;\theta)\,\psi(x;\theta)}{\beta(x;\theta)\psi(x;\theta) - \eta(x;\theta)^{2}},
\qquad
a^{*}_{2}(x;\theta) = \frac{\partial_{\theta}F(x;\theta)\,\eta(x;\theta) - \partial_{\theta}\beta(x;\theta)\,\beta(x;\theta)}{\beta(x;\theta)\psi(x;\theta) - \eta(x;\theta)^{2}}. \qquad (2.9)
Recall that for the non-Gaussian OU processes the discretized process has the AR(1) structure from (2.4). We therefore get

F(x;\theta) = e^{-\lambda\Delta}x + E_{\theta}(\varepsilon_i) = e^{-\lambda\Delta}x + \xi(1 - e^{-\lambda\Delta}),
\beta(x;\theta) = \mathrm{Var}_{\theta}(X_{i\Delta}\mid X_{(i-1)\Delta} = x) = \mathrm{Var}_{\theta}(\varepsilon_i) = \omega^{2}(1 - e^{-2\lambda\Delta}),
\eta(x;\theta) = E_{\theta}([\varepsilon_i - E_{\theta}(\varepsilon_i)]^{3}\mid X_{(i-1)\Delta} = x) = E_{\theta}([\varepsilon_i - E_{\theta}(\varepsilon_i)]^{3}),
\psi(x;\theta) = E_{\theta}([\varepsilon_i - E_{\theta}(\varepsilon_i)]^{4}\mid X_{(i-1)\Delta} = x) - \beta(x;\theta)^{2} = E_{\theta}([\varepsilon_i - E_{\theta}(\varepsilon_i)]^{4}) - \beta(x;\theta)^{2},
since \varepsilon_i is independent of X_{(i-1)\Delta}. Note that for OU processes, \beta, \eta and \psi depend only on \theta and \Delta and not on x. This also means that V_h depends only on \theta and \Delta. Since we do not know the distribution of \varepsilon, we have to compute the moments entering \eta(x;\theta) and \psi(x;\theta) recursively. Defining e_j := E_{\theta}(\varepsilon^{j}) and denoting the marginal moments of the OU process by m_j := E_{\theta}(X^{j}), we can use (2.4) to obtain the following recursive relationship:

e_1 = E_{\theta}(\varepsilon) = (1 - e^{-\lambda\Delta})m_1,
e_2 = E_{\theta}(\varepsilon^{2}) = (1 - e^{-2\lambda\Delta})m_2 - 2e^{-\lambda\Delta}m_1 e_1,
e_3 = E_{\theta}(\varepsilon^{3}) = (1 - e^{-3\lambda\Delta})m_3 - 3e^{-2\lambda\Delta}m_2 e_1 - 3e^{-\lambda\Delta}m_1 e_2,
e_4 = E_{\theta}(\varepsilon^{4}) = (1 - e^{-4\lambda\Delta})m_4 - 4e^{-3\lambda\Delta}m_3 e_1 - 6e^{-2\lambda\Delta}m_2 e_2 - 4e^{-\lambda\Delta}m_1 e_3.

Finally, \eta(x;\theta) and \psi(x;\theta) can be computed as

\eta(x;\theta) = e_3 - 3e_2 e_1 + 2e_1^{3},
\psi(x;\theta) = e_4 - 4e_3 e_1 + 8e_2 e_1^{2} - 4e_1^{4} - e_2^{2}.
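The recursion and the two central-moment identities above can be sketched as follows for the Γ(ν,α)-OU process (scenario-1 parameters assumed; the checks use the closed-form variance and third central moment of the Γ law, the latter being 2α³ν):

```python
import numpy as np

lam, Delta = 0.25, 1.0
nu, alpha = 10.0, 1/15                      # Gamma-OU, scenario 1

# marginal moments m_1..m_4 of X ~ Gamma(nu, alpha)
m = [alpha*nu,
     alpha**2*nu*(nu+1),
     alpha**3*nu*(nu+1)*(nu+2),
     alpha**4*nu*(nu+1)*(nu+2)*(nu+3)]

k = [np.exp(-j*lam*Delta) for j in range(5)]   # k[j] = e^{-j*lam*Delta}
e1 = (1-k[1])*m[0]
e2 = (1-k[2])*m[1] - 2*k[1]*m[0]*e1
e3 = (1-k[3])*m[2] - 3*k[2]*m[1]*e1 - 3*k[1]*m[0]*e2
e4 = (1-k[4])*m[3] - 4*k[3]*m[2]*e1 - 6*k[2]*m[1]*e2 - 4*k[1]*m[0]*e3

eta = e3 - 3*e2*e1 + 2*e1**3
psi = e4 - 4*e3*e1 + 8*e2*e1**2 - 4*e1**4 - e2**2

omega2 = alpha**2*nu                         # Var(X)
# Var(eps) must equal beta = omega^2 (1 - e^{-2 lam Delta})
assert abs((e2 - e1**2) - omega2*(1-k[2])) < 1e-12
# third central moment of eps must equal (1 - e^{-3 lam Delta}) * mu_3(X)
assert abs(eta - (1-k[3])*2*alpha**3*nu) < 1e-12
print(eta, psi)
```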
Now we only need to compute the partial derivatives \partial_{\theta}F(x;\theta) and \partial_{\theta}\beta(x;\theta) before the optimal quadratic MGEF can be constructed. The 3-dimensional parameter vector is given by \theta = (\lambda, \theta_2, \theta_3), where \theta_2 and \theta_3 are the two parameters governing the marginal distribution of the D-OU process. The partial derivatives we need are given by

\partial_{\theta}F(x;\theta)' = \bigl(-\Delta e^{-\lambda\Delta}x + \xi\Delta e^{-\lambda\Delta},\; (1 - e^{-\lambda\Delta})\partial_{\theta_2}\xi,\; (1 - e^{-\lambda\Delta})\partial_{\theta_3}\xi\bigr),
\partial_{\theta}\beta(x;\theta)' = \bigl(2\omega^{2}\Delta e^{-2\lambda\Delta},\; (1 - e^{-2\lambda\Delta})\partial_{\theta_2}\omega^{2},\; (1 - e^{-2\lambda\Delta})\partial_{\theta_3}\omega^{2}\bigr).
Using the moment generating function of the stationary OU process we can compute m_1, \ldots, m_4, and the optimal quadratic MGEF can then be implemented. For the two types of OU processes considered in this chapter the moment generating functions are given below, and using E(X^{n}) = \frac{\mathrm{d}^{n}M_{X}}{\mathrm{d}u^{n}}(0) the marginal moments can be derived.
• When X ∼ Γ(ν,α) the moment generating function and marginal moments are given by

M_{X}(u) := E\bigl(e^{uX}\bigr) = (1 - \alpha u)^{-\nu}, \qquad u < 1/\alpha,

m_{1} = \xi = E(X) = \alpha\nu, \qquad m_{2} = \alpha^{2}\nu(\nu+1), \qquad \omega^{2} = \mathrm{Var}(X) = \alpha^{2}\nu,
m_{3} = \alpha^{3}\nu(\nu+1)(\nu+2), \qquad m_{4} = \alpha^{4}\nu(\nu+1)(\nu+2)(\nu+3).
• When X ∼ IG(δ,γ) the moment generating function and marginal moments are given by

M_{X}(u) = e^{\delta\gamma - \delta\sqrt{\gamma^{2} - 2u}}, \qquad u \le \gamma^{2}/2,

m_{1} = \xi = E(X) = \delta/\gamma, \qquad m_{2} = \frac{\delta(1 + \delta\gamma)}{\gamma^{3}}, \qquad \omega^{2} = \mathrm{Var}(X) = \delta/\gamma^{3},
m_{3} = \frac{\delta(3 + 3\delta\gamma + \delta^{2}\gamma^{2})}{\gamma^{5}}, \qquad m_{4} = \frac{\delta(15 + 15\delta\gamma + 6\delta^{2}\gamma^{2} + \delta^{3}\gamma^{3})}{\gamma^{7}}.
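The moment formulas in the two bullets can be checked against SciPy's distributions (a cross-check of mine; it assumes the standard mapping of the IG(δ,γ) law onto scipy.stats.invgauss, namely mu = 1/(δγ) and scale = δ²):

```python
import numpy as np
from scipy import stats

# Gamma(nu, alpha): scipy's gamma with shape a = nu and scale = alpha
nu, alpha = 10.0, 1/15
g = stats.gamma(a=nu, scale=alpha)
m_gamma = [alpha*nu, alpha**2*nu*(nu+1),
           alpha**3*nu*(nu+1)*(nu+2), alpha**4*nu*(nu+1)*(nu+2)*(nu+3)]
for n, mn in enumerate(m_gamma, start=1):
    assert abs(g.moment(n) - mn) < 1e-8*max(1.0, mn)

# IG(delta, gamma) in the Barndorff-Nielsen parametrization
delta, gam = np.sqrt(20/3), np.sqrt(15)
ig = stats.invgauss(mu=1/(delta*gam), scale=delta**2)
m_ig = [delta/gam, delta*(1+delta*gam)/gam**3,
        delta*(3+3*delta*gam+delta**2*gam**2)/gam**5,
        delta*(15+15*delta*gam+6*delta**2*gam**2+delta**3*gam**3)/gam**7]
for n, mn in enumerate(m_ig, start=1):
    assert abs(ig.moment(n) - mn) < 1e-6*max(1.0, mn)
print("marginal moment formulas confirmed")
```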
We now have everything we need for constructing the optimal quadratic MGEF.
To the best of my knowledge this is the first time an optimal quadratic MGEF
is derived for non-Gaussian OU processes. In Hubalek and Posedel (2013), the au-
thors consider the Barndorff-Nielsen and Shephard stochastic volatility model from
Barndorff-Nielsen and Shephard (2001b) in a bivariate setting, assuming that both the
price process as well as the volatility process can be directly observed. The volatility
specification in that model is a non-Gaussian OU process and the part of their MGEF
that concerns the volatility process can therefore be compared to the quadratic MGEF
derived in this paper. Their estimating function corresponds to letting the weight
matrix be a 3×3 identity matrix and choosing the h_j's as

h_{1}(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta) = X_{i\Delta} - F(\Delta, X_{(i-1)\Delta};\theta),
h_{2}(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta) = X_{i\Delta}X_{(i-1)\Delta} - E_{\theta}\bigl(X_{i\Delta}X_{(i-1)\Delta} \mid X_{(i-1)\Delta} = x_{(i-1)\Delta}\bigr),
h_{3}(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta) = X_{i\Delta}^{2} - E_{\theta}\bigl(X_{i\Delta}^{2} \mid X_{(i-1)\Delta} = x_{(i-1)\Delta}\bigr).
In contrast to our MGEF, the vector of h j ’s used in Hubalek and Posedel (2013) is now
3-dimensional. The MGEF studied in Hubalek and Posedel (2013) is sub-optimal since
their simple choice of weight matrix does not result in the most efficient estimator
within the class of MGEFs based on the h_j's above. However, the simple structure allows for an explicit estimator, since an explicit solution to G_n(θ) = 0 for θ = (κ, ξ, ω²) is available, where κ = e^{-λ∆} and ξ and ω² are the mean and variance of the OU process.
Hence, the estimator from Hubalek and Posedel (2013) does not rely on numerical
optimization procedures. The parameters of interest, the mean-reversion rate and
the parameters characterizing D, can then be backed out from their estimate of θ.
The resulting estimator for the mean-reversion parameter, λ, becomes a function
of the autocorrelation coefficient of an AR(1) process and the estimates of ξ and
ω2 are also based on well-known estimators for the mean and variance of an AR(1)
process. The estimator for the mean-reversion parameter from Hubalek and Posedel
(2013) and similar moment based estimates for the parameters characterizing D,
will be studied in our Monte Carlo study as a way of obtaining initial values for the
numerical routines used in the FFT MLE method and the method based on our
optimal quadratic MGEF.
Note that in the case of the Γ-OU process, the mean-reversion parameter λ can
be estimated simultaneously with the parameters from the marginal distribution.
This is not possible in the context of the FFT MLE procedure, where the conditional
density has to be split according to whether or not a jump has occurred, see (2.6). The
splitting depends on λ, and in Valdivieso et al. (2009) and in our Monte Carlo study λ will be assumed known in order to implement the estimation procedure.
2.4 Monte Carlo Study
In the Monte Carlo study the performance of the estimators will be investigated in two
different parameter scenarios, both for the Γ(ν,α)-OU process and the IG(δ,γ)-OU
process. The parameters controlling the marginal distribution of the OU process are
chosen to resemble the “base-signal” and “spike” part of commodity prices. That is,
the Γ(ν,α)-OU process in scenario 1 (the “base-signal” scenario) will have many, but
small, jumps, mimicking the small imbalances in demand and supply. In scenario 2 (the “spike” scenario), the Γ(ν,α)-OU process will have few, but large, jumps, capturing possible shocks to, or large imbalances in, demand and supply. Given the parameters
ν and α in the two scenarios, the parameters governing the marginal distribution of
the IG(δ,γ)-OU process will then be chosen to match the mean and variance of the
Γ(ν,α)-OU process in each scenario. As for the mean-reversion parameter, λ, several
studies have shown that the base-signal has a slower mean-reversion than the spike
process.4 Therefore, two choices of the mean-reversion parameter will be considered.
The parameters are chosen in the following way:
• Scenario 1 (the base-signal):
– The Γ(ν,α)-OU process: ν = 10 and α = 1/15.
– The IG(δ,γ)-OU process: δ = \sqrt{20/3} and γ = \sqrt{15}.
• Scenario 2 (the spike process):
– The Γ(ν,α)-OU process: ν = 0.5 and α = 0.5.
– The IG(δ,γ)-OU process: δ = \sqrt{1/8} and γ = \sqrt{2}.
4See for instance Benth, Benth, and Koekebakker (2008) and Meyer-Brandis and Tankov (2008).
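The moment matching behind these parameter choices can be verified directly, using E(X) = αν and Var(X) = α²ν for the Γ law and E(X) = δ/γ and Var(X) = δ/γ³ for the IG law:

```python
import numpy as np

scenarios = [
    ((10.0, 1/15), (np.sqrt(20/3), np.sqrt(15))),   # scenario 1: (nu, alpha), (delta, gamma)
    ((0.5, 0.5),   (np.sqrt(1/8),  np.sqrt(2))),    # scenario 2
]
for (nu, alpha), (delta, gam) in scenarios:
    assert abs(nu*alpha - delta/gam) < 1e-12        # means match
    assert abs(nu*alpha**2 - delta/gam**3) < 1e-12  # variances match
print("means and variances match in both scenarios")
```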
Figure 2.1. The top panel shows the marginal density of X(t) in scenario 1 and 2 for the Γ(ν,α)-OU process and the bottom panel shows the marginal densities in each scenario for the IG(δ,γ)-OU process.
In each of the two scenarios we will consider both λ = 0.02 and λ = 0.25. In
scenario 1 we have E(X t ) = 0.667 and Var(X t ) = 0.044, whereas in scenario 2 we
have a lower mean and higher variance, E(X t ) = 0.25 and Var(X t ) = 0.125. Figure 2.1
depicts the marginal densities in each scenario. From the figure it is clear that the two scenarios also give rise to quite different shapes of the marginal densities. Plots of simulated paths of the two OU processes in each of the two scenarios are provided in Figures 2.2–2.5 in the Appendix, where the methods used for simulating the Γ-OU and IG-OU processes are also described.
In the next subsections the results of various simulation studies examining ways to estimate the Γ-OU and IG-OU processes are presented. The finite sample performance of the FFT MLE procedure from Valdivieso et al. (2009) and of the MGEF based procedure derived in the previous subsection is investigated. Estimators used for choosing starting values for the two estimation procedures will also be derived and analyzed. The parameter configurations from scenarios 1 and 2 are considered, and in each setting we fix ∆ = 1 and simulate the processes on the interval [0,T] with T = 1000, leaving us with n = 1000 daily observations. The number of Monte Carlo replications is 500.5
Before analyzing the performance of the estimation methods we make a few observations. First, note that the effective observation frequency is in fact determined by λ∆, due to the timing in Z(λt), which also shows up in the autocorrelation function r(u) = corr(X((n+u)∆), X(n∆)) = exp(-λ∆|u|). Secondly, λ would in fact be known
5In Valdivieso et al. (2009) the authors only use 100 Monte Carlo replications.
if we could observe the OU process, X , continuously. This can be seen by considering
the integral version of equation (2.1)

X(t) = X(0) - \lambda\int_{0}^{t}X(s)\,\mathrm{d}s + Z(\lambda t), \qquad (2.10)

where Z is a pure-jump process of bounded variation.6 Since Z(\lambda t) = \sum_{0<s\le t}\Delta Z_{\lambda s}, where \Delta Z_{\lambda s} := Z(\lambda s) - Z(\lambda s-), and because t \mapsto \int_{0}^{t}X(s)\,\mathrm{d}s is continuous, the jumps of the OU process are the jumps of Z. Equation (2.10) therefore becomes

X(t) = X(0) - \lambda\int_{0}^{t}X(s)\,\mathrm{d}s + \sum_{0<s\le t}\Delta X_{s},

which gives

\lambda = \frac{1}{\int_{0}^{t}X(s)\,\mathrm{d}s}\Bigl(X(0) - X(t) + \sum_{0<s\le t}\Delta X_{s}\Bigr). \qquad (2.11)
This means that λ would be known if we could observe the OU process continuously. So for a fixed time horizon T, using high-frequency observations will result in an (infeasible) maximum likelihood estimator of λ that is very close to the true parameter value. Note that a simulation study investigating this property would involve fixing the values of T and λ and letting ∆ → 0, thereby also increasing the number of observations used for constructing the estimator. Our Monte Carlo study instead focuses on the finite sample performance of the various estimators given a fixed number of observations. We investigate the performance at different effective frequencies by fixing ∆ = 1 and varying λ.
2.4.1 Initial Values for the two Estimation Procedures
In this subsection several ways of obtaining initial values for the two estimation procedures from the previous section are considered. Two straightforward ways of obtaining initial estimates of the mean-reversion parameter, λ, are to exploit the AR(1) structure of the discretized process or to match the theoretical and empirical autocorrelation functions. By regressing X(n∆) on X((n-1)∆) in (2.3) and solving for λ from the regression coefficient, we get the estimator

\lambda_{1} = \frac{-\log(\widehat{\mathrm{acf}}(1))}{\Delta},

where \widehat{\mathrm{acf}}(1) denotes the empirical autocorrelation function at lag 1. If we instead match the theoretical and empirical autocorrelation structure up to lag 100, we obtain the estimator

\lambda_{100} = \arg\min_{\lambda}\sum_{k=1}^{100}\bigl(\widehat{\mathrm{acf}}(k) - \exp(-\lambda k\Delta)\bigr)^{2}.
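A sketch of the two estimators (my own illustration: since only the autocorrelation function matters here, a Gaussian AR(1) with coefficient e^{-λ∆} is used as a stand-in for the discretized OU process):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
lam, Delta, n = 0.25, 1.0, 100_000
phi = np.exp(-lam*Delta)

# any stationary AR(1) with coefficient e^{-lam*Delta} has acf(k) = exp(-lam*k*Delta)
x = np.empty(n)
x[0] = 0.0
shocks = rng.standard_normal(n)
for i in range(1, n):
    x[i] = phi*x[i-1] + shocks[i]

def acf(x, k):
    xc = x - x.mean()
    return (xc[:-k] @ xc[k:]) / (xc @ xc)

lam_1 = -np.log(acf(x, 1)) / Delta

ks = np.arange(1, 101)
acfs = np.array([acf(x, k) for k in ks])
lam_100 = minimize_scalar(lambda l: np.sum((acfs - np.exp(-l*ks*Delta))**2),
                          bounds=(1e-4, 2.0), method="bounded").x
print(lam_1, lam_100)   # both close to 0.25
```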
6The Lévy measure ν of Z satisfies \int 1\wedge|s|\,\nu(\mathrm{d}s) < \infty. Subordinators are of bounded variation.
Inspired by the observation that λ would be known if we could observe X in continuous time, the following high-frequency estimator for λ is derived. Since Z is a pure-jump subordinator, Z, and therefore also X, is of bounded variation. Recall that the total variation of the stochastic process X : Ω → ℝ₊ with càdlàg paths on [0,t] is defined, for each ω ∈ Ω, as the supremum of the sum of the absolute values of the increments of the path X_s(ω) over the set of all partitions of the interval [0,t]:

\|X\|_{t}(\omega) = \sup\Bigl\{\sum_{i=1}^{n}\bigl|X_{t_{i}}(\omega) - X_{t_{i-1}}(\omega)\bigr| : 0 = t_{0} < t_{1} < \cdots < t_{n} = t,\ n \in \mathbb{N}\Bigr\}.
When the stochastic process is of bounded variation, the supremum is finite for almost all ω ∈ Ω. It now follows that the total variation process, \|X\|_{t}, of the OU process is given by

\|X\|_{t} = \lambda\int_{0}^{t}X(s)\,\mathrm{d}s + Z(\lambda t) = \lambda\int_{0}^{t}X(s)\,\mathrm{d}s + \bigl(X(t) - X(0)\bigr) + \lambda\int_{0}^{t}X(s)\,\mathrm{d}s = 2\lambda\int_{0}^{t}X(s)\,\mathrm{d}s + \bigl(X(t) - X(0)\bigr),

since X(t) can be written as the difference between the two monotone, non-decreasing, right-continuous processes Y^{1}_{t} := Z(\lambda t) and Y^{2}_{t} := \lambda\int_{0}^{t}X(s)\,\mathrm{d}s - X(0), and the two Lebesgue–Stieltjes measures induced by these functions are singular. Hence, the total variation of X is the sum of the total variations of the two stochastic processes Y^{1}_{t} and Y^{2}_{t}. The total variation on [0,t] of a monotone non-decreasing function f is simply f(t) - f(0), as the sum of increments in the definition becomes a telescoping
sum, and the first equality now follows. The second equality follows from (2.10). Under the assumption that \int_{0}^{t}X(s)\,\mathrm{d}s \neq 0, the above derivation gives rise to the following high-frequency estimator:

\lambda_{HF} = \frac{\widehat{\|X\|}_{t} - \bigl(X(t) - X(0)\bigr)}{2\,\widehat{\int_{0}^{t}X(s)\,\mathrm{d}s}}.
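As an illustrative sketch (not the thesis's implementation), λHF can be computed on an exactly simulated Γ-OU path. The sketch assumes the standard representation of the Γ(ν,α)-OU BDLP as a compound Poisson process, so that jumps of Z(λ·) arrive at rate λν with exponential sizes of mean α (consistent with p = e^{-νλ∆} above), and uses the discrete total-variation and Riemann-sum estimators described in the text below:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, nu, alpha = 0.25, 10.0, 1/15      # scenario-1 Gamma-OU parameters
T, dt = 2000.0, 0.01                   # long horizon, fine sampling (in-fill regime)

# compound Poisson BDLP: jump times of Z(lam*t) on [0, T] and exponential sizes
n_jumps = rng.poisson(lam*nu*T)
times = np.sort(rng.uniform(0.0, T, n_jumps))
sizes = rng.exponential(alpha, n_jumps)

n = int(round(T/dt))
decay = np.exp(-lam*dt)
x = np.empty(n + 1)
x[0] = nu*alpha                        # start at the stationary mean
j = 0
for k in range(1, n + 1):
    xk = x[k-1]*decay                  # exponential decay between jumps
    while j < n_jumps and times[j] <= k*dt:          # jumps in ((k-1)dt, k*dt]
        xk += sizes[j]*np.exp(-lam*(k*dt - times[j]))
        j += 1
    x[k] = xk

tv_hat = np.abs(np.diff(x)).sum()      # estimated total variation ||X||_T
int_hat = (x[:-1]*dt).sum()            # Riemann sum for int_0^T X(s) ds
lam_HF = (tv_hat - (x[-1] - x[0])) / (2.0*int_hat)
print(lam_HF)                          # close to lam = 0.25
```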
As an estimator of the total variation we use \widehat{\|X\|}_{t} = \sum_{i=1}^{t/\Delta}\bigl|X(i\Delta) - X((i-1)\Delta)\bigr|, and as an estimator of the integral, \widehat{\int_{0}^{t}X(s)\,\mathrm{d}s}, the Riemann sum \sum_{i=1}^{t/\Delta}X((i-1)\Delta)\,\Delta will be used. The estimator will be implemented for t = T. The performance of the high-frequency estimator λHF depends on in-fill asymptotics (∆ → 0), in contrast to the other two estimators λ1 and λ100, whose performance improves as the sample size grows (T → ∞).

In order to obtain initial values for the parameters governing the marginal distribution of the D-OU process, two different ideas will be employed.
The first way of obtaining initial values comes from simple moment matching and
is also the method used in Valdivieso et al. (2009). From matching the theoretical
expressions for the mean and variance of the process

Y_{i} = \int_{\lambda(i-1)\Delta}^{\lambda i\Delta} \exp(s)\,\mathrm{d}Z(s) = \exp(\lambda\Delta)X(i\Delta) - X((i-1)\Delta) \stackrel{d}{=} Z^{*}(\Delta)

with the empirical estimators \bar{Y} = \frac{1}{T/\Delta}\sum_{i=1}^{T/\Delta}Y_{i} and S^{2} = \frac{1}{T/\Delta - 1}\sum_{i=1}^{T/\Delta}(Y_{i} - \bar{Y})^{2}, we obtain the following estimators.
• When X ∼ Γ(ν,α) we have

\nu_{MoM} = \frac{\bar{Y}}{\bigl(\exp(\lambda\Delta) - 1\bigr)\alpha_{MoM}} \qquad\text{with}\qquad \alpha_{MoM} = \frac{S^{2}\bigl(\exp(\lambda\Delta) - 1\bigr)}{\bar{Y}\bigl(\exp(2\lambda\Delta) - 1\bigr)}.

• When X ∼ IG(δ,γ) we have

\delta_{MoM} = \frac{\gamma_{MoM}\,\bar{Y}}{\exp(\lambda\Delta) - 1} \qquad\text{with}\qquad \gamma_{MoM} = \frac{1}{S}\sqrt{\frac{\bar{Y}\bigl(\exp(2\lambda\Delta) - 1\bigr)}{\exp(\lambda\Delta) - 1}}.
The above estimators depend on λ, and in the implementations λ1 will instead be used.
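A small consistency sketch of the two moment matchers (the function names are mine): feeding in the exact mean and variance of Z*(∆) must return the true parameters:

```python
import numpy as np

lam, Delta = 0.25, 1.0
g1, g2 = np.exp(lam*Delta) - 1, np.exp(2*lam*Delta) - 1

def mom_gamma(Ybar, S2):
    alpha = S2*g1 / (Ybar*g2)
    return Ybar/(g1*alpha), alpha            # (nu_MoM, alpha_MoM)

def mom_ig(Ybar, S2):
    gam = np.sqrt(Ybar*g2/g1) / np.sqrt(S2)
    return gam*Ybar/g1, gam                  # (delta_MoM, gamma_MoM)

# exact mean/variance of Z*(Delta): E = m1*g1, Var = omega^2*g2
nu0, alpha0 = 10.0, 1/15
nu_hat, alpha_hat = mom_gamma(alpha0*nu0*g1, alpha0**2*nu0*g2)
delta0, gam0 = np.sqrt(20/3), np.sqrt(15)
delta_hat, gam_hat = mom_ig((delta0/gam0)*g1, (delta0/gam0**3)*g2)
print(nu_hat, alpha_hat, delta_hat, gam_hat)   # the true parameter values
```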
The other way of obtaining initial values for the parameters governing the marginal distribution of X is to (wrongly) assume that x_0, x_∆, \ldots, x_T form an i.i.d. sequence and perform maximum likelihood estimation using the marginal distribution, D, of X. This procedure gives rise to estimators denoted by the subscript iidMLE. The accuracy of the different ways of obtaining initial parameters is investigated in a simulation study, and the results are summarized in Tables 2.1–2.4. In each scenario we report the mean, bias, standard deviation and root mean square error (RMSE) of the estimators.
For the Γ(ν,α)-OU process we see that, as expected, our λHF estimator performs best when λ = 0.02. However, λHF is the worst performing estimator when λ = 0.25, mainly because of a large increase in the bias of the estimator. This result holds both in scenario 1 and scenario 2. Comparing λ1 with λ100, the first estimator always outperforms the latter. In both scenarios there appears to be little difference between αMoM and αiidMLE, and their RMSEs are lowest when λ = 0.25. As for the two estimators of ν, νiidMLE has a lower RMSE than νMoM when λ = 0.02, but when λ = 0.25 the difference between the two estimators diminishes. Again, the RMSEs are in general lower when λ = 0.25. This result is not surprising, since λ = 0.25 corresponds to sampling at a lower effective frequency, which makes the i.i.d. assumption more reasonable and also causes the empirical moments to be closer to the true theoretical moments, due to the persistence of the OU process. All in all, this results in more accurate parameter estimates than when λ = 0.02, both in terms of bias and standard deviation.
For the infinite activity IG(δ,γ)-OU process, the results for the three estimators of λ are the same as in the Γ(ν,α) case. The downward bias in λHF, which worsens when λ∆ increases, could be explained by the estimator of the total variation in the
Table 2.1. Initial values for the Γ(ν,α)-OU process in scenario 1 with λ = λ1 in the MoM estimates.

λ = 0.02     λ1       λ100     λHF       νMoM     αMoM      νiidMLE   αiidMLE
True         0.0200   0.0200   0.0200    10.00    0.0666    10.00     0.0666
Mean         0.0252   0.0300   0.0166    12.74    0.0583    12.18     0.0609
Bias         0.0052   0.0100   -0.0035   2.737    -0.0084   2.176     -0.0057
Std. dev.    0.0080   0.0123   0.0002    4.373    0.0194    4.139     0.0203
RMSE         0.0096   0.0158   0.0035    5.155    0.0211    4.673     0.0210

λ = 0.25     λ1       λ100     λHF       νMoM     αMoM      νiidMLE   αiidMLE
True         0.2500   0.2500   0.2500    10.00    0.0666    10.00     0.0666
Mean         0.2539   0.2600   0.0814    10.13    0.0664    10.09     0.0666
Bias         0.0039   0.0100   -0.1686   0.1278   -0.0003   0.0893    -4.7e-05
Std. dev.    0.0256   0.0415   0.0018    0.9598   0.0063    0.9469    0.0062
RMSE         0.0259   0.0427   0.1686    0.9674   0.0063    0.9502    0.0062
Table 2.2. Initial values for the Γ(ν,α)-OU process in scenario 2 with λ = λ1 in the MoM estimates.

λ = 0.02     λ1       λ100     λHF       νMoM     αMoM      νiidMLE   αiidMLE
True         0.0200   0.0200   0.0200    0.5000   0.5000    0.5000    0.5000
Mean         0.0248   0.0285   0.0196    0.7185   0.3896    0.6513    0.4158
Bias         0.0048   0.0085   -0.0004   0.2185   -0.1104   0.1513    -0.0842
Std. dev.    0.0068   0.0117   0.0001    0.5556   0.1880    0.2726    0.2089
RMSE         0.0083   0.0144   0.0004    0.5965   0.2179    0.3116    0.2251

λ = 0.25     λ1       λ100     λHF       νMoM     αMoM      νiidMLE   αiidMLE
True         0.2500   0.2500   0.2500    0.5000   0.5000    0.5000    0.5000
Mean         0.2568   0.2628   0.1992    0.5194   0.4880    0.5190    0.4877
Bias         0.0068   0.0128   -0.0508   0.0194   -0.0120   0.0190    -0.0123
Std. dev.    0.0247   0.0422   0.0031    0.0742   0.0738    0.0726    0.0692
RMSE         0.0256   0.0440   0.0509    0.0766   0.0747    0.0750    0.0702
Table 2.3. Initial values for the IG(δ,γ)-OU process in scenario 1 with λ = λ1 in the MoM estimates.

λ = 0.02     λ1       λ100     λHF       δMoM     γMoM      δiidMLE   γiidMLE
True         0.0200   0.0200   0.0200    2.582    3.873     2.582     3.873
Mean         0.0260   0.0303   0.0139    2.944    4.434     2.868     4.318
Bias         0.0059   0.0103   -0.0061   0.3616   0.5612    0.2863    0.4449
Std. dev.    0.0074   0.0119   0.0002    0.4862   0.7406    0.4830    0.7356
RMSE         0.0095   0.0157   0.0061    0.6055   0.9287    0.5610    0.8590

λ = 0.25     λ1       λ100     λHF       δMoM     γMoM      δiidMLE   γiidMLE
True         0.2500   0.2500   0.2500    2.582    3.873     2.582     3.873
Mean         0.2560   0.2602   0.0770    2.610    3.913     2.607     3.908
Bias         0.0060   0.0103   -0.1730   0.0285   0.0404    0.0249    0.0347
Std. dev.    0.0257   0.0419   0.0017    0.1369   0.2125    0.1300    0.2056
RMSE         0.0263   0.0431   0.1730    0.1397   0.2161    0.1322    0.2083
Table 2.4. Initial values for the IG(δ,γ)-OU process in scenario 2 with λ = λ1 in the MoM estimates.

λ = 0.02     λ1       λ100     λHF       δMoM     γMoM      δiidMLE   γiidMLE
True         0.0200   0.0200   0.0200    0.3536   1.414     0.3536    1.414
Mean         0.0251   0.0284   0.0184    0.4521   2.098     0.4015    1.891
Bias         0.0051   0.0084   -0.0016   0.0986   0.6836    0.0480    0.4770
Std. dev.    0.0117   0.0101   0.0004    0.1201   0.9048    0.0949    0.8669
RMSE         0.0128   0.0131   0.0017    0.1553   1.133     0.1062    0.9887

λ = 0.25     λ1       λ100     λHF       δMoM     γMoM      δiidMLE   γiidMLE
True         0.2500   0.2500   0.2500    0.3536   1.414     0.3536    1.414
Mean         0.2565   0.2615   0.1755    0.3626   1.468     0.3578    1.451
Bias         0.0065   0.0115   -0.0745   0.0091   0.0533    0.0043    0.0371
Std. dev.    0.0230   0.0385   0.0036    0.0400   0.1963    0.0249    0.1820
RMSE         0.0239   0.0401   0.0746    0.0409   0.2032    0.0252    0.1856
numerator, which is always smaller than or equal to the true total variation. The bias in the total variation estimator seems to dominate the bias from replacing the integral in the denominator by a Riemann sum, resulting in an overall downwards biased estimator of λ. For both the Γ-OU and the IG-OU process this bias is more pronounced in scenario 1. For the two parameters governing the marginal distribution of the IG-OU process, the patterns are also comparable to those obtained for the Γ-OU process. The estimators based on (wrongly) assuming i.i.d. observations and conducting MLE using the marginal distribution have smaller RMSEs than those obtained by matching the mean and variance of the process Y.
Based on the results above, the iidMLE estimators of the parameters governing
the marginal distribution will be used as initial values in the two estimation methods
from the previous section. In applications to real data, we will never know whether we
are in a situation where λHF severely underestimates the mean-reversion parameter λ.
A more robust choice is therefore to use λ1 as the initial value for the FFT MLE and
MGEF-based estimation procedures.
2.4.2 Results for the FFT MLE Procedure
In this subsection we investigate the performance of the FFT MLE procedure from
Valdivieso et al. (2009) in a Monte Carlo study. The simulation study extends the
one found in Valdivieso et al. (2009) by also considering parameter configurations
yielding marginal densities of the type in the right panel of Figure 2.1, and by using
500 Monte Carlo replications instead of 100.
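The numerical core of the procedure is recovering a density from its characteristic function on a grid by FFT. A minimal, hypothetical sketch of this inversion step (not the exact routine used in the chapter; `cf` stands in for the characteristic function of the Lévy functionals, and N and ζ play the role of the two nuisance parameters tuned throughout this subsection) is:

```python
import numpy as np

def density_from_cf(cf, N, zeta):
    """Recover a density on [0, 2*pi/zeta) from its characteristic function.

    Discretizes f(x) = (1/pi) * Re INT_0^inf e^{-iux} cf(u) du on the
    frequency grid u_j = j*zeta; with x_k = 2*pi*k/(N*zeta) the sum is
    exactly a DFT, so one FFT evaluates the density at all N points."""
    u = np.arange(N) * zeta
    x = 2.0 * np.pi * np.arange(N) / (N * zeta)
    w = np.ones(N)
    w[0] = 0.5  # trapezoid weight at u = 0
    f = zeta / np.pi * np.real(np.fft.fft(cf(u) * w))
    return x, np.maximum(f, 0.0)  # clip tiny negative ripples

# sanity check against a Gamma(2, scale=1) density, f(x) = x * exp(-x)
x, f = density_from_cf(lambda u: (1.0 - 1j * u) ** -2.0, 2 ** 14, 0.05)
```

The choice of N and ζ trades off the truncation of the frequency integral against the spacing and range of the spatial grid, which is exactly the tuning problem discussed for the IG-OU case in the text.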
We follow Valdivieso et al. (2009) and only estimate the parameters characterizing
the marginal distribution when considering the Γ-OU process. The reason for this
is based on the assumption that for at least one j ∈ {1,2, . . . ,T} the Γ-OU process
will not jump between the two consecutive time points (j − 1)∆ and j∆. Hence,
the mean-reversion parameter does not need to be estimated and can be recovered
according to

λ = (1/∆) log( x_{(j−1)∆} / x_{j∆} ).
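Assuming at least one jump-free interval, this recovery can be implemented as a maximum over consecutive log-ratios, since a (positive) jump only makes the log-ratio smaller than λ∆. A minimal illustrative sketch (hypothetical code, not the chapter's implementation):

```python
import numpy as np

def recover_lambda(x, dt=1.0):
    """Recover lambda for a subordinator-driven OU process.

    On a jump-free interval the path decays exactly exponentially, so
    log(x_{(j-1)dt} / x_{j dt}) / dt equals lambda there; a jump adds
    positive mass and makes the log-ratio smaller, hence the maximum."""
    x = np.asarray(x, dtype=float)
    return np.max(np.log(x[:-1] / x[1:])) / dt

# exponential decay with lambda = 0.25 and a single upward jump
lam, dt = 0.25, 1.0
jumps = [0.0, 0.0, 1.5, 0.0, 0.0]
x = [2.0]
for J in jumps:
    x.append(x[-1] * np.exp(-lam * dt) + J)
# the jump-free intervals give back lambda exactly
```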
The results from estimating the Γ-OU process in the two different scenarios are
reported in Table 2.5 and Table 2.6. In both scenarios the FFT parameters are set as
N = 2^15 and ζ = 0.001, and the optimization is performed using fmincon with initial
values (νiidMLE, αiidMLE). For the infinite activity IG-OU process the mean-reversion
parameter λ also needs to be estimated, and we use (λ1, νiidMLE, αiidMLE) as initial
values for the optimization procedure. For the IG-OU process it was harder to
obtain convergence in the optimization procedure, mainly due to the sensitivity
towards the initial value λ1.7 As a result we used the simulated annealing procedure
(the simulannealbnd function in MATLAB) for the first 1000 iterations and then

7 Especially λ1 < λ gave rise to convergence problems.
Table 2.5. FFT MLE parameter estimates for the Γ(ν,α)-OU process in scenario 1.

             λ = 0.02            λ = 0.25
             νFFT     αFFT       νFFT     αFFT
True        10.00    0.0666     10.00    0.0666
Mean        10.02    0.0672     10.01    0.0667
Bias        0.0235   0.0005     0.0119   4.6e-05
Std. dev.   0.7086   0.0049     0.3637   0.0024
RMSE        0.7088   0.0049     0.3636   0.0024
nRMSE       0.0709   0.0735     0.0364   0.0360

Table 2.6. FFT MLE parameter estimates for the Γ(ν,α)-OU process in scenario 2.

             λ = 0.02            λ = 0.25
             νFFT     αFFT       νFFT     αFFT
True        0.5000   0.5000     0.5000   0.5000
Mean        0.4941   0.5095     0.5013   0.4993
Bias       -0.0059   0.0095     0.0013  -0.0007
Std. dev.   0.1445   0.1908     0.0447   0.0455
RMSE        0.1445   0.1908     0.0447   0.0454
nRMSE       0.2890   0.3816     0.0894   0.0908
proceeded with fmincon. The necessary choices of FFT parameters also reflect the
fact that the optimization becomes more unstable in the IG-OU case. In scenario
1, N = 2^17 and ζ = 5e-05 were used for both choices of λ. In scenario 2, convergence
could only be obtained in the lower frequency case λ = 0.25, where we had to set N = 2^19
and ζ = 1.25e-05. Since the frequency (λ = 0.02) seemed to be the problem, we instead
evaluated the density function fZ∗(∆′) at the data points e^{λ∆′} x_{i∆′} − x_{(i−1)∆′}, with ∆′ = c∆
for c ∈ ℕ. In order not to reduce the number of data points too much, we used rolling
time intervals, leaving us with T − c data points. The cost of this is that the data points
will no longer be independent but instead display a correlation structure of order c − 1.
The resulting method will be referred to as the quasi FFT MLE estimation procedure.
As for the choice of c, one must balance the gain from lowering the frequency
against the loss from using correlated data when constructing the likelihood function.
A more extensive investigation of how to optimally choose c is not pursued in this
chapter. The results for the IG-OU process are reported in Tables 2.7–2.9.
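The construction of the rolling, coarser-spaced data points can be sketched as follows (a hypothetical helper; `x` holds the discrete observations of the process):

```python
import numpy as np

def quasi_levy_functionals(x, lam, dt, c):
    """Rolling data points e^{lam*c*dt} x_{i*dt} - x_{(i-c)*dt}.

    Uses every start point, so T - c points remain, at the cost of a
    moving-average-type correlation of order c - 1 between them."""
    x = np.asarray(x, dtype=float)
    return np.exp(lam * c * dt) * x[c:] - x[:-c]

# on a pure exponential decay the functionals are identically zero
lam, dt, c = 0.25, 1.0, 10
x = 2.0 * np.exp(-lam * dt * np.arange(100))
z = quasi_levy_functionals(x, lam, dt, c)
```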
In each scenario we report the mean, bias, standard deviation, RMSE and the RMSE
normalized by the true parameter value (nRMSE).

Table 2.7. FFT MLE parameter estimates for the IG(δ,γ)-OU process in scenario 1.

             λ = 0.02                       λ = 0.25
             λFFT      δFFT     γFFT       λFFT     δFFT     γFFT
True         0.0200    2.582    3.873      0.2500   2.582    3.873
Mean         0.0200    2.520    3.977      0.2496   2.525    3.953
Bias        -1.8e-05  -0.0622   0.1043    -0.0004  -0.0575   0.0797
Std. dev.    0.0003    0.1252   0.4103     0.0026   0.0680   0.1279
RMSE         0.0003    0.1397   0.4230     0.0026   0.0890   0.1506
nRMSE        0.0150    0.0541   0.1093     0.0104   0.0345   0.0389

Table 2.8. FFT MLE parameter estimates for the IG(δ,γ)-OU process in scenario 2.

             λ = 0.25
             λFFT      δFFT     γFFT
True         0.2500    0.3536   1.414
Mean         0.2500    0.3442   1.447
Bias        -3.7e-05  -0.0093   0.0327
Std. dev.    0.0003    0.0230   0.2046
RMSE         0.0003    0.0248   0.2070
nRMSE        0.0012    0.0701   0.1464

Table 2.9. Quasi FFT MLE parameter estimates for the IG(δ,γ)-OU process in scenario 2 with c = 10.

             λ = 0.02
             λFFT      δFFT     γFFT
True         0.0200    0.3536   1.414
Mean         0.0200    0.3885   1.536
Bias        -4.6e-06   0.0350   0.1217
Std. dev.    1.6e-05   0.0287   0.7790
RMSE         1.7e-05   0.0453   0.7873
nRMSE        8.5e-04   0.1281   0.5567

The results reveal that in all
scenarios and for both types of OU processes the parameters from the marginal dis-
tribution are easier to estimate when λ= 0.25. This is not surprising since observing
the OU process over a longer time span, although at the cost of lower frequency of
observations, is beneficial, especially for persistent processes. Furthermore, for the
Γ-OU process the number of expected jumps in our sample is given by λν∆T , which
in scenario 2 equals 125 when λ= 0.25 and only 10 when λ= 0.02. In this case, the
jumps contain almost all the information on the parameters governing the marginal
distribution and the extra available information when λ= 0.25 translates into more
accurate parameter estimates. Another reason could be the usage of more accurate
initial values, since the i.i.d. assumption underlying the iidMLE estimators becomes
more plausible when λ increases. Finally, a more numerical reason could be that the
order of magnitude of the data points, e^{λ∆} x_{i∆} − x_{(i−1)∆}, at which the density function is
to be evaluated, decreases when λ decreases. As a consequence it becomes harder
to fine-tune the FFT parameters (N and ζ) and obtain convergence when λ = 0.02.
As for the mean-reversion parameter, which is also estimated for the IG-OU process,
λ is extremely well estimated at both frequencies. In scenario 1, the same results
for the marginal parameters as well as the mean-reversion parameter were found in
the Monte Carlo study in Valdivieso et al. (2009).
If we look at the performance of the estimation procedure across the two sce-
narios, by comparing the normalized RMSEs, several patterns become evident. First
of all, the parameters from the marginal distribution are easier to estimate in the
“base-signal” scenario (scenario 1), regardless of the value of λ and for both types
of OU processes. For the IG-OU process, where λ is also estimated, the results for λ
show that the parameter is more accurately estimated in the “spike” scenario. These
results could be explained by the fact that in the “base-signal” scenario there is a lot
of activity/spikes, and hence the Lévy functionals Yi d= Z∗(∆) influence the observa-
tions more than in the “spike” scenario, where the OU process (almost) just exhibits
exponential decay, determined by λ, on the intervals between large spikes.
Based on the results of the Monte Carlo study, the FFT procedure seems to per-
form equally well for finite and infinite activity OU processes. However, one important
difference lies in the applicability of the estimation method. It was in general harder
to obtain convergence in the estimation procedure when the infinite activity process
was considered, especially in scenario 2. And even when convergence was obtained
in the different settings, the choice of nuisance parameters (N and ζ) influences the
parameter estimates. If the nuisance parameters are not chosen optimally, the estima-
tor loses efficiency and the bias may increase. For the Γ-OU process λ is assumed to
be known, and one way of fine-tuning N and ζ is to calibrate fZ∗(∆) to the empirical
density of the Lévy functionals. The construction of the Lévy functionals depends on
λ, and for the IG-OU process an estimate of λ must be used.
In terms of RMSEs the FFT MLE procedure performs significantly better than any
of the estimators used for finding initial values, except for the marginal parameters of
the IG-OU process in scenario 2 where the performance is similar to the i.i.d. MLE
approach.
2.4.3 Results for the Estimation Procedure based on the Optimal MGEF
In this subsection the results on the finite sample performance of the optimal MGEF
are presented. As a benchmark for the optimal MGEF based estimator, we will also
consider the simpler estimating function that emerges from solving the minimization
problem

min_{θ∈Θ} Σ_{i=1}^{n} [x_{i∆} − F(x_{(i−1)∆};θ)]² + [(x_{i∆} − F(x_{(i−1)∆};θ))² − β(x_{(i−1)∆};θ)]².   (2.12)
Differentiation of (2.12) w.r.t. the parameters in θ yields the following expression
Sn(θ) = Σ_{i=1}^{n} −2 ∂θF (x_{i∆} − F) − ( 4 (x_{i∆} − F) ∂θF + 2 ∂θβ ) ( (x_{i∆} − F)² − β ),   (2.13)
where the dependence on θ and x(i−1)∆ is suppressed. Note that (2.13) is almost of
the form (2.8) with the same choice of functions, h1 and h2, as in the optimal MGEF.
The only difference is the dependence on xi∆ in the weight in front of h2. In most
cases this means that the estimating function will not be a martingale. In particular,
Sn(θ) will only be a martingale for OU processes satisfying Eθ((x_{n∆} − F)³ | X_{(n−1)∆} =
x) = η(∆, x;θ) = 0. This property holds for the Gaussian OU process, and it is also
satisfied for non-Gaussian OU processes where the AR(1)-corrected residuals, ε, have
a symmetric distribution around their mean. Unfortunately, this is not the case for the
two subordinator-driven non-Gaussian OU processes considered in this chapter. This
also implies that the estimating function, Sn(θ), is biased, and the resulting estimates
might be biased as well. The size of η(∆, x;θ) is, however, quite small in our setting
and is, not surprisingly, smaller in scenario 1 and decreases when λ decreases. In an
attempt to balance out the potential bias, we instead minimize the function
min_{θ∈Θ} Σ_{i=1}^{n} [x_{i∆} − F(x_{(i−1)∆};θ)]² + 0.01 [(x_{i∆} − F(x_{(i−1)∆};θ))² − β(x_{(i−1)∆};θ)]².   (2.14)
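For a stationary OU process the conditional mean and variance have the closed forms F(x) = x e^{−λ∆} + ξ(1 − e^{−λ∆}) and β(x) = ω²(1 − e^{−2λ∆}), where ξ and ω² are the stationary mean and variance. A hypothetical minimal sketch of the criterion in (2.14) in this (λ, ξ, ω²) parametrization (the affine forms of F and β are an assumption that holds for stationary Lévy-driven OU processes):

```python
import numpy as np

def simple_criterion(theta, x, dt, weight=0.01):
    """Criterion (2.14): squared mean residuals plus down-weighted
    squared variance residuals, in the (lam, xi, om2) parametrization."""
    lam, xi, om2 = theta
    e = np.exp(-lam * dt)
    F = x[:-1] * e + xi * (1.0 - e)   # conditional mean
    beta = om2 * (1.0 - e * e)        # conditional variance
    r = x[1:] - F
    return np.sum(r ** 2 + weight * (r ** 2 - beta) ** 2)

# on a noise-free path following the conditional-mean recursion exactly,
# the first term vanishes at the true (lam, xi)
lam, xi, om2, dt = 0.25, 0.5, 0.2, 1.0
x = [2.0]
for _ in range(50):
    x.append(x[-1] * np.exp(-lam * dt) + xi * (1.0 - np.exp(-lam * dt)))
x = np.array(x)
```

Minimizing this criterion over (λ, ξ, ω²) and backing out the marginal parameters afterwards mirrors the re-parametrization discussed in the text.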
The constant 0.01 in (2.14) might seem somewhat arbitrary, and of course one
could choose the constant in a more optimal way, but since the resulting estimator
just serves as a benchmark for the optimal MGEF this will not be pursued further.
Besides, an optimal constant would depend on the unknown parameters, θ, and the
purpose of the simple estimating function is to investigate the performance of an
easily implementable and straightforward approach that has a structure similar to the
optimal MGEF. Furthermore, the simple structure in (2.12) allows for a parametriza-
tion in terms of λ, ξ and ω², which gives a smoother objective function. The estimates
of the parameters from the marginal distribution can then be backed out using the
estimates of ξ and ω². Such a re-parametrization is not possible for the optimal MGEF,
since η(∆, x;θ) and ψ(∆, x;θ) cannot be expressed in terms of ξ and ω².8 The two
estimating functions are easy to implement and, in contrast to the FFT MLE procedure,
they do not rely on any nuisance parameters. Furthermore, the mean-reversion
parameter can be estimated simultaneously with the parameters from the marginal
distribution for all non-Gaussian OU processes. With the FFT MLE procedure this
was only possible for infinite activity OU processes. For numerical
reasons, when implementing the estimator based on the optimal MGEF, we minimize
Gn(θ)′Gn(θ) w.r.t. θ instead of solving Gn(θ) = 0. For both types of OU processes and
at both frequencies, the method based on the simple estimating function was robust
w.r.t. the choice of initial values and had no problems converging. For the method
using the optimal MGEF, it was in some cases a bit harder to obtain convergence and
avoid ending up in corner solutions. We therefore used a grid of initial values centered
around λ1 and the iidMLE estimates. Except for the Γ-OU process in scenario 1, this
strategy worked and convergence to a local minimum was obtained. For the Γ-OU
process in scenario 1, further numerical difficulties were encountered. The objective
function to be minimized became very irregular and had several (meaningful) local
minima. The problem is not a small-sample problem, and fixing the value of λ and only
optimizing w.r.t. the parameters from the marginal distribution did not help either.
The results presented for that case therefore depend on the choice of the grid of initial
values. The problem of not having a unique solution to Gn(θ) = 0 was also reported in
Benth et al. (2012), where a Γ-OU process is used for modeling the base-signal of EEX
electricity spot prices and estimated using prediction-based estimating functions.9
Prediction-based estimating functions were introduced in Sørensen (2000) and are a
generalization of MGEFs based on unconditional moments instead of conditional
moments. In this case, where the model is Markovian and conditional moments are
computable, MGEFs result in more efficient estimators (asymptotically) and are to
be preferred. The results from implementing the simple benchmark from (2.12) and
the optimal MGEF are reported in Tables 2.10–2.13, where the abbreviations SB and
OMG are used for the estimators based on the simple and the optimal quadratic MGEF,
respectively.
As was the case with the FFT MLE procedure, the parameters are more accurately
estimated when the frequency is lower (λ = 0.25). As with, for instance, the MoM
estimation procedure, this can be explained by the improved match between em-
pirical moments, in this case conditional moments, and the theoretical moments
underlying the estimation procedure when the process is observed over a longer
time span. The optimal MGEF-based method outperforms the estimator from (2.12),
in terms of RMSEs, except for the Γ-OU process in scenario 1 with frequency λ= 0.25.
In that case the simple benchmark performs a bit better and, as mentioned in the
8 Other re-parametrizations might be fruitful in terms of numerical stability, but this was not pursued further.
9 See Sørensen (2000) or Brix and Lunde (2013) for an introduction to prediction-based estimating functions.
Table 2.10. Parameter estimates for the Γ(ν,α)-OU process in scenario 1 using the optimal MGEF and the simple benchmark.

λ = 0.02      λSB       νSB       αSB       λOMG      νOMG      αOMG
True         0.0200   10.0000    0.0666    0.0200   10.0000    0.0666
Mean         0.0241   12.2331    0.0611    0.0241   10.8144    0.0656
Bias         0.0041    2.2331   -0.0056    0.0041    0.8144   -0.0011
Std. dev.    0.0079    4.3024    0.0209    0.0079    2.5237    0.0178
RMSE         0.0089    4.8436    0.0216    0.0089    2.6494    0.0178
nRMSE        0.4445    0.4844    0.3239    0.4438    0.2649    0.2667

λ = 0.25      λSB       νSB       αSB       λOMG      νOMG      αOMG
True         0.2500   10.0000    0.0666    0.2500   10.0000    0.0666
Mean         0.2529   10.1074    0.0665    0.2529   10.0948    0.0667
Bias         0.0029    0.1074   -0.0001    0.0029    0.0948    0.0000
Std. dev.    0.0255    0.9573    0.0063    0.0255    1.0417    0.0067
RMSE         0.0256    0.9623    0.0063    0.0256    1.0450    0.0067
nRMSE        0.1025    0.0962    0.0941    0.1024    0.1045    0.0998

Table 2.11. Parameter estimates for the Γ(ν,α)-OU process in scenario 2 using the optimal MGEF and the simple benchmark.

λ = 0.02      λSB       νSB       αSB       λOMG      νOMG      αOMG
True         0.0200    0.5000    0.5000    0.0200    0.5000    0.5000
Mean         0.0237    0.6995    0.4070    0.0231    0.6573    0.4149
Bias         0.0037    0.1995   -0.0930    0.0031    0.1573   -0.0851
Std. dev.    0.0053    0.3014    0.2889    0.0046    0.2800    0.2126
RMSE         0.0064    0.3612    0.3032    0.0056    0.3209    0.2288
nRMSE        0.3205    0.7224    0.6065    0.2785    0.6418    0.4576

λ = 0.25      λSB       νSB       αSB       λOMG      νOMG      αOMG
True         0.2500    0.5000    0.5000    0.2500    0.5000    0.5000
Mean         0.2560    0.5417    0.4779    0.2545    0.5165    0.4899
Bias         0.0060    0.0417   -0.0221    0.0045    0.0165   -0.0101
Std. dev.    0.0250    0.0735    0.0693    0.0192    0.0708    0.0701
RMSE         0.0257    0.0844    0.0727    0.0197    0.0726    0.0707
nRMSE        0.1028    0.1688    0.1454    0.0788    0.1451    0.1414

Table 2.12. Parameter estimates for the IG(δ,γ)-OU process in scenario 1 using the optimal MGEF and the simple benchmark.

λ = 0.02      λSB       δSB       γSB       λOMG      δOMG      γOMG
True         0.0200    2.5820    3.8730    0.0200    2.5820    3.8730
Mean         0.0242    2.8875    4.3056    0.0216    2.7461    4.1035
Bias         0.0042    0.3055    0.4326    0.0016    0.1642    0.2305
Std. dev.    0.0064    0.4581    0.6613    0.0035    0.3193    0.4965
RMSE         0.0077    0.5502    0.7897    0.0038    0.3588    0.5469
nRMSE        0.3835    0.2131    0.2039    0.1916    0.1389    0.1412

λ = 0.25      λSB       δSB       γSB       λOMG      δOMG      γOMG
True         0.2500    2.5820    3.8730    0.2500    2.5820    3.8730
Mean         0.2558    2.6151    3.9204    0.2529    2.6030    3.9045
Bias         0.0058    0.0331    0.0474    0.0029    0.0210    0.0316
Std. dev.    0.0258    0.1376    0.2119    0.0183    0.1200    0.1852
RMSE         0.0264    0.1414    0.2170    0.0185    0.1217    0.1877
nRMSE        0.1055    0.0548    0.0560    0.0739    0.0471    0.0485
Table 2.13. Parameter estimates for the IG(δ,γ)-OU process in scenario 2 using the optimal MGEF and the simple benchmark.

λ = 0.02      λSB       δSB       γSB       λOMG      δOMG      γOMG
True         0.0200    0.3536    1.4142    0.0200    0.3536    1.4142
Mean         0.0234    0.4738    2.0955    0.0214    0.4254    1.9986
Bias         0.0034    0.1203    0.6813    0.0014    0.0718    0.5844
Std. dev.    0.0053    0.3580    0.8393    0.0029    0.1023    0.7911
RMSE         0.0063    0.3773    1.0803    0.0032    0.1249    0.9829
nRMSE        0.3150    1.0672    0.7639    0.1599    0.3532    0.6950

λ = 0.25      λSB       δSB       γSB       λOMG      δOMG      γOMG
True         0.2500    0.3536    1.4142    0.2500    0.3536    1.4142
Mean         0.2553    0.3810    1.4979    0.2511    0.3610    1.4671
Bias         0.0053    0.0274    0.0837    0.0011    0.0074    0.0529
Std. dev.    0.0250    0.0374    0.1854    0.0139    0.0377    0.1882
RMSE         0.0255    0.0463    0.2032    0.0139    0.0384    0.1953
nRMSE        0.1021    0.1310    0.1437    0.0555    0.1086    0.1381
beginning of this subsection, is also numerically more stable. The gain from using the
optimal MGEF is most apparent when λ = 0.02. A possible explanation is that the
better the data fit the underlying conditional moment conditions imposed by the
functions h1 and h2, the smaller the gain from using an optimal weighting matrix. If we compare the
performance of the estimation method based on MGEFs across the two scenarios, in
terms of nRMSEs, we find the same patterns as was found for the FFT MLE estimation
procedure. Namely, that the mean-reversion parameter is easier to estimate in the
“spike” scenario, whereas the parameters from the marginal distribution of the OU
process are more accurately estimated in the “base-signal” scenario. The differences
between the nRMSEs are most prominent when λ= 0.02, corresponding to the case
where the process is being observed at a higher frequency but over a shorter time
span. One possible explanation for this could be that in scenario 2 there are more
intervals without jumps/without big jumps, making inference on lambda easier and
λ = 0.02 increases the prob of these intervals. As for the marginal parameters, the
opposite argument can be applied since we in the “base-signal” scenario have more
observations containing info about the marginal parameters.
If we in each of the two scenarios compare the nRMSE across the two types of
OU processes we find that in scenario 1 the parameters seem to be more accurately
estimated for the IG-OU process. This is also the case in scenario 2 when λ= 0.25,
but when λ = 0.02 the estimation method performs equally well for the finite and
infinite activity OU processes.
For the Γ-OU process the estimator for λ based on the optimal MGEF has a
lower RMSE than the regression-based estimator, λ1, especially in scenario 2. For
the IG-OU process there is an even bigger gain from using the optimal MGEF to
estimate λ instead of using the estimator λ1. In general, the gain from using the
optimal MGEF-based estimation method, instead of the straightforward methods
studied in the subsection on methods for obtaining initial values, is largest for
the estimation of the mean-reversion parameter. In fact, for the marginal parameters
the performance is only better than the iidMLE estimators in scenario 1, and again
the gain is most apparent when λ = 0.02. In scenario 2, the RMSEs for the marginal
parameters are around the same size as for the iidMLE estimators for the Γ-OU
process. For the IG-OU process in scenario 2, the iidMLE estimators actually perform
slightly better than the estimator based on the optimal MGEF. However, it is still
preferable to estimate all three parameters simultaneously using the MGEFs, instead
of splitting the estimation procedure and using the iidMLE estimators and λ1.
As already discussed, the results for the FFT MLE estimation procedure may serve
as a bound on the attainable efficiency, since maximum likelihood estimation is
asymptotically the most efficient estimation procedure, and this becomes evident
when comparing the performance of the MGEF-based estimators with the results for
the FFT MLE procedure. For the Γ-OU process the differences between the RMSEs for the
marginal parameters are largest in scenario 1, where the RMSEs for the optimal MGEF-
based procedure are approximately three times higher than for the FFT MLE proce-
dure. In scenario 2 the RMSE is around twice as high for ν and a bit less for α. For
the IG-OU process the difference in scenario 1 is a factor of 2, whereas the difference is
smaller in scenario 2 and is in fact mostly present for δ. For both types of processes,
the differences in RMSEs between the FFT MLE procedure and the procedure based
on the optimal MGEF are larger when λ = 0.02. The mean-reversion parameter is
extremely well estimated when the FFT MLE estimation procedure is used compared
to the optimal MGEF-based procedure, and again the difference is most pronounced
when λ = 0.02.
2.5 Extensions
This section offers a discussion of interesting ways in which estimation methods
could be extended to handle other models based on non-Gaussian OU processes.
It would be natural to consider extending the estimation procedures to the case
where we have observations from a superposition of independent OU processes.
These multi-factor models are very popular in the literature on modeling commodity
spot prices, like electricity, that can be split into a base-signal and spike component,
see for instance Benth et al. (2007), Meyer-Brandis and Tankov (2008), Klüppelberg
et al. (2010) and Benth and Vos (2013). When estimating these models, standard prac-
tice in the literature on modeling commodity prices is to split the spot prices into
the base-signal and spike components using various filtering techniques, and then
estimate each of the OU processes separately in a second step. It would therefore be of
great interest to develop estimation methods for estimating all the parameters simul-
taneously. If the OU processes have the same mean-reversion rates, Markovianity of
the model is preserved and MGEFs are still applicable. The FFT MLE approach is also
still feasible since the characteristic function of a sum of independent processes is
just the product of the characteristic functions of the entering factors. However, this
extension of the FFT MLE procedure is only directly applicable to the superposition
of infinite activity OU processes. For the finite activity processes, the extension will
be difficult to implement, since these processes are mixed random variables and it is
therefore not straightforward how the density function should be defined.
One will again need to condition on the presence of jumps, and this time also on
which of the processes jumped.
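The product rule for characteristic functions that makes the FFT extension feasible can be verified in a couple of lines; here Gamma distributions merely stand in for the factors' Lévy functionals (an illustration only, using the shape parametrization with scale 1):

```python
import numpy as np

def gamma_cf(u, shape):
    """Characteristic function of a Gamma(shape, scale=1) variable."""
    return (1.0 - 1j * u) ** (-shape)

u = np.linspace(-5.0, 5.0, 101)
# for independent factors, the cf of the sum is the product of the cfs;
# for Gamma factors the shape parameters simply add
prod = gamma_cf(u, 2.0) * gamma_cf(u, 3.0)
```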
Unfortunately, when using a superposition to capture the dynamics of energy
commodity spot prices, in particular electricity prices, different mean-reversion rates
are a crucial feature of the model. The OU process used for modeling the spike com-
ponent must have a much faster mean-reversion rate, since the spikes occur when, for
instance, a nuclear power plant unexpectedly shuts down. These sudden imbalances
in the market cause prices to spike, but shortly after, prices revert back to normal
levels. In order to create this spike behavior, it is necessary for the spike process to
have a larger mean-reversion rate than the one used for capturing minor imbalances
in supply and demand (the base-signal process). Furthermore, allowing for different
mean-reversion rates makes modeling of the autocorrelation structure more flexible,
and the stylized fact of a multi-scale autocorrelation function can now be accommo-
dated. Incorporating different mean-reversion rates causes statistical challenges
since the model becomes non-Markovian. This also means that maximum likelihood
estimation is no longer feasible. Instead of resorting to filtering methods and splitting
the spot price into a base and spike component, simultaneous estimation of the
parameters can be carried out using prediction-based estimating functions, which
are a generalization of martingale estimating functions (see Sørensen (2000) or Brix
and Lunde (2013) for an introduction). The idea of using prediction-based estimating
functions to estimate superpositions of OU processes was put forward in Sørensen
(2000), and in the context of commodity prices in Benth et al. (2007), but the
performance of this method is still to be investigated.
The method only relies on the computation of unconditional moments, a task that is
still feasible when considering the non-Markovian superposition of OU processes.
Implementing and studying the performance of this estimation approach is left for
further research.
Another interesting way of extending the two estimation procedures is to consider
the inclusion of measurement errors in the observations. This would also be a step in
the direction of investigating how the methods handle more realistic data. For the
methods based on MGEFs the measurement errors could be taken into account para-
metrically if we are willing to assume a concrete specification of the measurement
errors, such as normally distributed additive measurement errors. In fact, for con-
struction of the simple benchmark in (2.12) no distributional assumption is needed
for the measurement errors. The mean and variance of the error process could simply
be included as additional parameters. In contrast, for the optimal MGEF the third-
and fourth-order moments of the error process are also needed, and a quasi-MGEF
approach where the errors are assumed to be normally distributed might be more
efficient than including four additional parameters in the parameter vector. For the FFT MLE
method the extension to the measurement error case is not as straightforward, at
least not for the finite activity OU processes. For the IG-OU process, the characteristic
function for the Lévy functionals plus the independent error term can be computed
if we make a distributional assumption about the errors. For the Γ-OU process the
estimation is complicated by the fact that the process is a mixed random variable.
This means we have to split the computation of the conditional density into a jump
and no-jump part as was done in (2.6) for the Γ-OU process without measurement
errors. One way of deciding if a jump has occurred or not could be to plot the AR(1)
transformed residuals (the Lévy functional plus the measurement error) and from
this plot obtain a threshold, instead of 0 as was used in (2.6), for deciding whether or
not a jump has occurred. This method would probably only perform well in a setting
where the jumps are of a much larger magnitude than the measurement errors. Also,
if the jump intensity is believed to be very high, ignoring the no-jump part of the
likelihood might also work. However, the theoretical justification and finite sample
performance of the estimation procedures are outside the scope of this chapter.
2.6 Conclusion
The Monte Carlo study from Valdivieso et al. (2009) was extended by also considering
other shapes of the marginal density of the OU process, better suited for modeling
the spike part of commodity prices. In this parameter setting the density of the Lévy
functionals was more concentrated around zero, giving rise to numerical challenges.
In particular, fine-tuning the parameters N and ζ of the FFT was a bit harder. However,
when well implemented, the FFT MLE procedure performed very well in terms of bias
and standard deviation of the estimates and could serve as a bound on the attainable
efficiency of other estimators. The chapter also considered another simulation-free
estimation method, based on MGEFs, which are analytically tractable in the
Markovian setting of non-Gaussian OU processes. The optimal MGEF was derived for
a general non-Gaussian OU process and the method was implemented and analyzed
for the Γ-OU and IG-OU process in the “base-signal” and “spike” scenario. Except for
numerical problems for the Γ-OU process in the “base-signal” scenario, the method
performed well. The highest gain in terms of RMSE, compared to the methods sug-
gested for obtaining initial values for the estimation procedures, was encountered
in the mean-reversion parameter. As for the parameters governing the marginal
distribution, the RMSEs were not much higher than those from the simple maxi-
mum likelihood estimation procedure that wrongly assumes that the observations
are independent (the iidMLE estimator). Comparing the performance of the MGEF-
based method and FFT MLE method revealed that despite the good performance
of the MGEF-based method, there is still room for efficiency improvements. These
improvements could possibly be obtained by leaving the class of quadratic MGEFs
and increasing the dimension of h, the vector containing conditional moment condi-
tions. It is also worth noting that, in contrast to the FFT MLE estimation procedure,
the MGEFs also offer a framework for simultaneously estimating the mean-reversion
parameter and the parameters from the marginal distribution for finite activity OU
processes. Furthermore, the MGEF-based methods are much easier to extend to han-
dle more realistic data containing measurement errors or to the case of observations
from a superposition of OU processes.
All in all, leaving the ideal setup with no model-misspecification and simulated
data, the MGEF-based estimation method seems like a numerically more robust
approach than the FFT MLE method. Especially if we consider the trouble the FFT
MLE method has with handling high-frequency data and the sensitivity towards the
nuisance parameters N and ζ. In spite of the superior performance of the FFT MLE
method documented in the present Monte Carlo study, the performance might be
quite different in settings where the parameters N and ζ are not easily fine-tuned
and, as already discussed, the method might not even be applicable when fitting
finite activity OU processes to real data. If the aim is to model the “base-signal” by a
non-Gaussian OU process, like the one from scenario 1 with a low value of λ, then
the estimation method using the optimal MGEF seems like a good choice, especially
for the infinite activity OU processes. If, on the other hand, the “spike” process is
considered, that is, we are in scenario 2 and λ is high, then the iidMLE estimates of
the marginal parameters seem like an easily implementable and robust choice, with
little or no loss in efficiency compared to the MGEF-based estimators. The gain from
using MGEF-based methods in the “spike” scenario lies in the efficiency of the estimate
of the mean-reversion parameter.
2.7 Appendix
2.7.1 Simulation of non-Gaussian OU Processes
In this section simulation of the Γ-OU and IG-OU processes is described. Time is measured in units of trading days, such that ∆ = 1 corresponds to daily sampling of the data. The aim is to simulate the non-Gaussian OU process X at the discrete time points X(0), X(∆), X(2∆), ..., X(n∆). The initial value X(0) can be simulated by drawing from the invariant distribution D, which will be either the Γ(ν,α) or the IG(δ,γ) distribution. From the recursive relationship in (2.3) the discretized non-Gaussian OU process can be simulated if we can simulate random draws of the variable $\int_0^{\lambda\Delta} e^{s}\,dZ(s)$.
Simulating the Γ(ν,α)-OU Process

Simulation of the Γ-OU process is based on the following infinite series representation of the Lévy integral $\int_0^{T} f(s)\,dZ(s)$ of a positive and integrable function f w.r.t. a subordinator:

$$\int_0^{\lambda} f(s)\,dZ(s) \overset{d}{=} \sum_{i=1}^{\infty} W^{-1}(a_i/\lambda)\, f(\lambda r_i), \qquad (2.15)$$

where $W^{-1}$ denotes the inverse of the tail mass function $W^{+}(x) = \int_x^{\infty} w(y)\,dy$, with w being the density of W, the Lévy measure of Z(1). In (2.15) the two series $a_i$ and $r_i$ are independent, $a_1 < a_2 < \cdots < a_i < \cdots$ are the arrival times of a Poisson process with intensity 1, and the $r_i$ are independent U[0,1] random variables (see Thm 8.1 in Barndorff-Nielsen and Shephard (2001a) for a proof of this result).

When $X \sim \Gamma(\nu,\alpha)$, the inverse of the tail mass function is

$$W^{-1}(x) = \max\!\left(0,\, -\alpha \log\!\left(\frac{x}{\nu}\right)\right). \qquad (2.16)$$

By combining (2.15) and (2.16) we can now simulate from a gamma-OU process by using

$$\int_0^{\lambda\Delta} e^{s}\,dZ(s) \overset{d}{=} \sum_{i=1}^{\infty} W^{-1}(a_i/\lambda\Delta)\, e^{\lambda\Delta r_i} \overset{d}{=} -\alpha \sum_{i=1}^{\infty} \mathbf{1}_{]0,\nu[}(a_i/\lambda\Delta)\, \log\!\left(\frac{a_i}{\nu\lambda\Delta}\right) e^{\lambda\Delta r_i}$$
$$\overset{d}{=} \alpha \sum_{i=1}^{\infty} \mathbf{1}_{]0,1[}(c_i)\, \log\!\left(c_i^{-1}\right) e^{\lambda\Delta r_i} \overset{d}{=} \alpha \sum_{i=1}^{N(1)} \log\!\left(c_i^{-1}\right) e^{\lambda\Delta r_i}, \qquad (2.17)$$

where $c_1 < c_2 < \ldots$ are the arrival times of a Poisson process with intensity $\nu\lambda\Delta$ and $N(1)$ is the corresponding number of events up until time 1. This means that both $c_i$ and $r_i$ are U[0,1] random variables.^{10}
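The finite sum in (2.17) translates directly into a sampler. Below is a minimal Python sketch under the text's parameterization, using the fact that, conditionally on their number, the arrival times in [0,1] are i.i.d. uniforms; the function and variable names are illustrative, not the chapter's code.

```python
import numpy as np

def sim_gamma_ou_innovation(lam, dt, nu, alpha, rng):
    """Exact draw of int_0^{lam*dt} e^s dZ(s) for the Gamma(nu, alpha)-OU
    process via (2.17): alpha * sum_{i<=N(1)} log(1/c_i) * e^{lam*dt*r_i},
    where N(1) ~ Poisson(nu*lam*dt) and, given N(1), the c_i and r_i are
    i.i.d. U[0,1]. A sketch with illustrative names."""
    N = rng.poisson(nu * lam * dt)       # number of arrivals in [0, 1]
    if N == 0:
        return 0.0
    c = rng.uniform(size=N)              # arrival times (unordered is fine)
    r = rng.uniform(size=N)
    return float(alpha * np.sum(np.log(1.0 / c) * np.exp(lam * dt * r)))
```

Since the sum terminates at N(1), no truncation error is incurred, in contrast to the IG-OU case treated next.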
Figure 2.2. A short and a long plot of the non-Gaussian gamma-OU process X with ∆ = 1, ν = 10, α = 1/15 and for two different choices of λ. In the first row λ = 0.02 and in the second λ = 0.25.
Simulating the IG(δ,γ)-OU Process

When $X \sim IG(\delta,\gamma)$, the tail mass function is given by (see Barndorff-Nielsen and Shephard (2001b))

$$W^{+}(x) = \frac{\delta}{\sqrt{2\pi}}\, x^{-1/2} \exp\!\left(-\tfrac{1}{2}\gamma^{2}x\right).$$

Finding the inverse tail mass function requires the use of the Lambert W function, $L_W(x)$, which solves for w as a function of x in $we^{w} = x$. Straightforward computation now gives

$$W^{-1}(x) = \frac{1}{\gamma^{2}}\, L_W\!\left(\frac{\gamma^{2}\delta^{2}}{2\pi x^{2}}\right). \qquad (2.18)$$

Unlike in the case of the gamma-OU process, combining (2.15) and (2.18) will not result in a finite sum. Therefore, the sum in (2.15) must be truncated at some point. This also means that a simulation scheme based on a truncation of the infinite series representation will only result in an approximation of the desired IG(δ,γ)-OU process, since the sum of the small jumps is neglected.^{11}

^{10} Note that to simulate the corresponding realization of Z, one can just choose the constant function 1 as the f function in (2.15) and reuse the $c_i$ and $r_i$ from (2.17).

^{11} See Gander and Stephens (2007) for a way of choosing the truncation point.

Figure 2.3. A short and a long plot of the non-Gaussian gamma-OU process X with ∆ = 1, ν = 1/2, α = 1/2 and for two different choices of λ. In the first row λ = 0.02 and in the second λ = 0.25.

Instead, we use the exact simulation method from Zhang and Zhang (2008) when simulating trajectories from the IG(δ,γ)-OU process. From the recursive relationship in (2.3) we need to simulate from the i.i.d. process

$$\varepsilon_n := e^{-\lambda\Delta}\int_{\lambda\Delta(n-1)}^{\lambda\Delta n} e^{s}\,dZ(s),$$
and from Theorem 1 in Zhang and Zhang (2008) we can do so by using the following result.

Theorem 6. For fixed ∆ > 0 and γ > 0, the random variable $\varepsilon_1$ equals, in distribution, the sum of an inverse Gaussian random variable and a compound Poisson sum, that is,

$$\varepsilon_1 \overset{d}{=} W_0^{\Delta} + \sum_{i=1}^{N^{\Delta}} W_i^{\Delta},$$

where $W_0^{\Delta} \sim IG\!\left(\delta(1-e^{-\frac{1}{2}\lambda\Delta}),\, \gamma\right)$, $N^{\Delta}$ is Poisson distributed with intensity $\delta(1-e^{-\frac{1}{2}\lambda\Delta})\gamma$, and $W_1^{\Delta}, W_2^{\Delta}, \ldots$ are independent random variables with density function

$$f_{W^{\Delta}}(w) = \begin{cases} \dfrac{\gamma^{-1}}{\sqrt{2\pi}}\, w^{-3/2}\left(e^{\frac{1}{2}\lambda\Delta}-1\right)^{-1}\left(e^{-\frac{1}{2}\gamma^{2}w} - e^{-\frac{1}{2}\gamma^{2}w e^{\lambda\Delta}}\right), & w > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Further, $W_0^{\Delta}$, $W_1^{\Delta}, W_2^{\Delta}, \ldots$ and $N^{\Delta}$ are independent.

Proof. For a proof of the Theorem see pages 341–343 in Zhang and Zhang (2008).
Zhang and Zhang (2008) also prove that for any w > 0 the density function $f_{W^{\Delta}}(w)$ is dominated by $\tfrac{1}{2}\left(1+e^{\frac{1}{2}\lambda\Delta}\right)g(w)$, where g is the density function of a $\Gamma\!\left(\tfrac{1}{2},\, 2\gamma^{-2}\right)$ distribution. The i.i.d. random variables $W_1^{\Delta}, W_2^{\Delta}, \ldots$ can now be generated using the general acceptance-rejection method, and the simulation scheme for the IG(δ,γ)-OU process follows immediately.
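The resulting scheme can be sketched as follows in Python. This is a sketch, not the chapter's code: function and variable names are ours, NumPy's `wald` distribution is mapped to IG(δ,γ) via mean δ/γ and shape δ², and the acceptance probability f(w)/(M g(w)) has been simplified algebraically under the dominating Gamma density stated above.

```python
import numpy as np

def sim_ig_ou_innovations(n, lam, dt, delta, gamma, rng):
    """Draw n i.i.d. innovations via Theorem 6: an IG variable plus a
    compound Poisson sum whose terms are sampled by acceptance-rejection
    against the dominating Gamma(1/2, scale 2/gamma^2) density."""
    d0 = delta * (1.0 - np.exp(-0.5 * lam * dt))
    # W0 ~ IG(d0, gamma); NumPy's Wald(mu, lam) matches IG(d, g) with
    # mu = d / g and lam = d^2.
    eps = rng.wald(d0 / gamma, d0**2, size=n)
    N = rng.poisson(d0 * gamma, size=n)            # jump counts N^Delta
    b = 0.5 * gamma**2 * (np.exp(lam * dt) - 1.0)
    for k in range(n):
        m = int(N[k])
        while m > 0:
            w = rng.gamma(0.5, 2.0 / gamma**2, size=m)   # proposals from g
            u = rng.uniform(size=m)
            # acceptance probability f(w) / (M g(w)) simplifies to
            # (1 - exp(-b w)) / (b w) with b = gamma^2 (e^{lam dt} - 1) / 2
            acc = w[u < (1.0 - np.exp(-b * w)) / (b * w)]
            eps[k] += acc.sum()
            m -= acc.size
    return eps
```

Feeding these innovations into the AR(1)-type recursion (2.3) then yields an exact trajectory of the IG-OU process, with no truncation error.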
Figure 2.4. A short and a long plot of the non-Gaussian IG-OU process X with ∆ = 1, δ = √(20/3), γ = √15 and for two different choices of λ. In the first row λ = 0.02 and in the second λ = 0.25.
Figure 2.5. A short and a long plot of the non-Gaussian IG-OU process X with ∆ = 1, δ = √(1/8), γ = √2 and for two different choices of λ. In the first row λ = 0.02 and in the second λ = 0.25.
2.8 References
Barndorff-Nielsen, O. E., 1998. Processes of normal inverse Gaussian type. Finance and Stochastics 2, 41–68.
Barndorff-Nielsen, O. E., Shephard, N., 2001a. Modelling by Lévy processes for finan-
cial econometrics. In: Barndorff-Nielsen, O. E., Mikosch, T., Resnick, S. (Eds.), Lévy
processes -Theory and Applications. Birkhäuser, pp. 283–318.
Barndorff-Nielsen, O. E., Shephard, N., 2001b. Non-Gaussian OU-based models and some of their uses in financial economics (with discussion). Journal of the Royal Statistical Society B 63, 167–241.
Barndorff-Nielsen, O. E., Sørensen, M., 1994. A review of some aspects of asymptotic
likelihood theory for stochastic processes. International Statistical Review 62, 133–
165.
Benth, F. E., Benth, J. S., Koekebakker, S., 2008. Statistical Modeling of Electricity and
Related Markets. Advanced Series on Statistical Science and Applied Probability.
World Scientific.
Benth, F. E., Kallsen, J., Meyer-Brandis, T., 2007. A non-Gaussian Ornstein-Uhlenbeck
process for electricity spot price modeling and derivatives pricing. Applied Mathe-
matical Finance 14:2, 153–169.
Benth, F. E., Kiesel, R., Nazarova, A., 2012. A critical empirical study of three electricity
spot price models. Energy Economics 34, 1589–1616.
Benth, F. E., Saltyte Benth, J., 2004. The normal inverse gaussian distribution and
spot price modelling in energy markets. International Journal of Theoretical and
Applied Finance 07 (02), 177–192.
Benth, F. E., Vos, L., 2013. Cross-commodity spot price modeling with stochastic
volatility and leverage for energy markets. Advances in Applied Probability 45,
545–571.
Bibby, B. M., Jacobsen, M., Sørensen, M., 2002. Estimating functions for discretely
sampled diffusion-type models. In: Aït-Sahalia, Hansen, L. P. (Eds.), Handbook of
Financial Econometrics. North Holland - Elsevier, pp. 203–268.
Bibby, B. M., Sørensen, M., 1995. Martingale estimation functions for discretely observed diffusion processes. Bernoulli 1, 17–39.
Brix, A. F., Lunde, A., 2013. Estimating stochastic volatility models using prediction-based estimating functions. CREATES Research Paper 2013-23.
Gander, M. P. S., Stephens, D. A., 2007. Stochastic volatility modelling with general
marginal distributions: Inference, prediction and model selection. Journal of Sta-
tistical Planning and Inference 137, 3068–3081.
Hubalek, F., Posedel, P., 2013. Asymptotic analysis and explicit estimation of a class of stochastic volatility models with jumps using the martingale estimating function approach. Glasnik Matematicki 48(1), 185–210.
Kessler, M., 1995. Martingale estimating functions for a Markov chain. PhD dissertation preprint, Laboratoire de Probabilités, Université Paris VI.
Klüppelberg, C., Meyer-Brandis, T., Schmidt, A., 2010. Electricity spot price modelling
with a view towards extreme spike risk. Quantitative Finance 10:9, 963–974.
Meyer-Brandis, T., Tankov, P., 2008. Multi-factor jump-diffusion models of electricity
prices. International Journal of Theoretical and Applied Finance 11, 503–528.
Schwartz, E., 1997. The stochastic behaviour of commodity prices: Implications for
valuation and hedging. The Journal of Finance 52(3), 923–973.
Sørensen, M., 1997. Estimating functions for discretely observed diffusions: A review. Lecture Notes–Monograph Series 32 (Selected Proceedings of the Symposium on Estimating Functions), 305–325.
Sørensen, M., 1999. On asymptotics of estimating functions. Brazilian Journal of
Probability and Statistics 13, 111–136.
Sørensen, M., 2000. Prediction-based estimating functions. Econometrics Journal 3,
123–147.
Valdivieso, L., Schoutens, W., Tuerlinckx, F., 2009. Maximum likelihood estimation
in processes of Ornstein-Uhlenbeck type. Statistical Inference for Stochastic Pro-
cesses 12, 1–19.
Zhang, S., Zhang, X., 2008. Exact simulation of IG-OU processes. Methodology and
Computing in Applied Probability 10, 337–355.
CHAPTER 3

A GENERALIZED SCHWARTZ MODEL FOR ENERGY SPOT PRICES

ESTIMATION USING A PARTICLE MCMC METHOD
Anne Floor Brix
Aarhus University and CREATES
Asger Lunde
Aarhus University and CREATES
Wei Wei
Aarhus University and CREATES
Abstract
We consider a two-factor geometric spot price model with stochastic volatility and
jumps. The first factor models the normal variations of the price process and the
other factor accounts for the presence of spikes. Instead of using various filtering
techniques for splitting the two factors, as often found in the literature, we estimate
the model in one step using an MCMC method with a particle filter. In our empirical
analysis we fit the model to UK natural gas spot prices and investigate the importance
of allowing for jumps and stochastic volatility. We find that the inclusion of stochastic
volatility in the process used for modeling the normal price variations is crucial and
that it strongly impacts the jump intensity in the spike process. Furthermore, our
estimation method enables us to consider both a continuous and purely jump-driven
specification of the volatility process, and thereby assess if the volatility specification
also influences the spike process and the overall model fit.
3.1 Introduction
The liberalization of the European energy markets has over the last couple of decades led to highly deregulated and liquid markets for trading energy commodities, such as gas and electricity. The introduction of competition, and hence also price risk, has caused the markets to experience a significant increase in price volatility, and a market place for energy-based derivatives, used for hedging, has emerged. The transition to a
competitive market where prices are set according to supply and demand means that
energy spot prices have several distinct characteristics that should be captured by
any proposed model. The most important features are seasonality, mean-reversion,
spikes, multi-scale autocorrelation and stochastic volatility, see for instance Eydeland
and Wolyniec (2003) for empirical evidence on these stylized facts. Seasonality is
caused by the seasonal pattern on the demand side of the market, for instance by an
increased need for heating during the winter. Mean-reversion is a direct consequence
of the markets being supply and demand driven, which means that, unlike the stock
market, prices are not allowed to evolve freely but will fluctuate around a (possibly
stochastic) level. This also has the important implication that the deseasonalized
spot prices will be modeled using stationary processes. Due to delivery constraints in
the spot market, sudden imbalances in supply and demand are almost immediately
reflected in the spot price, causing the price to jump because of an inelastic demand
curve. These imbalances are typically caused by an unexpected rise in demand or
technical problems on the supply side. After experiencing a jump, the price quickly
mean-reverts to the normal level of production costs, leaving a spike in the price path.
The multi-scale autocorrelation structure that is observed in many markets is often a
consequence of the spike part of the price process having a stronger mean-reversion
rate than the so-called base-signal process that accounts for the normal variations.
The inclusion of stochastic volatility in the modeling framework helps to replicate
the time-series properties of the prices, such as a leptokurtic distribution, and to
accurately estimate the jump part of the model. As we shall see in our application to the UK natural gas market, failing to include stochastic volatility will drive up the expected number of jumps, contradicting the fact that jumps are supposed to be rare events.
In the univariate model proposed in this chapter, the logarithmic spot price is given as the sum of three factors. The first factor is a deterministic mean-level
function that models the, possibly trending, seasonally varying mean-level of the
logarithmic spot price. The second factor captures the base-signal part of the price
process and will be modeled using a Gaussian Ornstein-Uhlenbeck (OU) process
with stochastic volatility. The third factor is a non-Gaussian OU process that accounts
for the spike behavior. We will consider both a continuous and a purely jump-driven
specification of the volatility process. Instead of using various filtering techniques
to split the base-signal and spike process in a first step before estimating the model
parameters, the chapter contributes to the existing literature by proposing a method
for estimating the model in one step using the particle MCMC (PMCMC) methods
developed in Andrieu et al. (2010). The estimation procedure therefore allows for an
investigation of the interplay between the specifications of base-signal and the spike
process. In Green and Nossman (2008), a similar model is also estimated in one step
using MCMC. In contrast to our approach, the authors in Green and Nossman (2008)
condition on future values when computing the posterior distribution, rendering in-
sample forecasts a useless tool for model evaluation. The authors also have to include
a Brownian component in the specification of the spike factor in order to ensure that
the factor has an absolutely continuous distribution when conditioning on the jumps,
thereby simplifying the MCMC estimation. Furthermore, our estimation approach,
using the particle marginal Metropolis-Hastings sampler, has the great advantage of
being able to accommodate different volatility specifications, including pure jump
processes. This will enable us to investigate if the different volatility specifications
have the same impact on the filtered spike process and if the volatility specification
impacts the overall fit of the model. The method can also handle non-Markovian
models, which is essential for effective sampling of the spike process in our proposed
two-factor model. Finally, one of the outputs of the particle filter is the likelihood,
which makes computation of Bayes factors and model comparison straightforward.
The stepping stone for many of the spot price models found in the literature is
the mean-reverting one-factor Schwartz model from Schwartz (1997), where the spot
price is defined as the exponential of a Gaussian OU process. This model was further
extended to include a deterministic seasonality factor in Lucia and Schwartz (2002).
In Benth, Ekeland, Hauge, and Nielsen (2003), the geometric spot price model from Lucia and Schwartz (2002) is generalized to allow for jumps. A special case of this model, based solely on the NIG distribution, is applied to oil and gas in Benth and Saltyte Benth (2004). Another special case of the model from Benth et al. (2003) is the jump diffusion model, which has been used for modeling electricity spot prices in Cartea and Figueroa (2005) and Benth, Kiesel, and Nazarova (2012).
In Benth, Kallsen, and Meyer-Brandis (2007) an arithmetic multi-factor model
based on non-Gaussian OU processes is suggested. The model is able to capture
both the spike behavior and multi-scale autocorrelation structure of spot prices, and positivity of prices is ensured by letting the non-Gaussian OU processes be driven
by subordinators. The possibility of negative prices in arithmetic models can also be
viewed as an advantage since negative prices can actually occur in some energy mar-
kets, such as the electricity market. Arithmetic models are also advantageous when it
comes to pricing of forward contracts with delivery being made over a period instead
of at a single point in time. Due to the affine structure of the spot price in arithmetic
models, forward prices become more analytically tractable than in geometric models.
We will instead consider geometric models, as these are a natural extension of the
GBM used in the financial markets. Besides, derivative pricing is not the focus of this
chapter. It is also easier to model negative spikes, a feature often observed in gas
spot prices, and ensure prices above a certain level (for instance zero) when using geometric models. The arithmetic multi-factor model from Benth et al. (2007) is estimated in Meyer-Brandis and Tankov (2008) and Benth et al. (2012) by splitting the spike process and base-signal process using a nonparametric method called hard-thresholding. In Klüppelberg, Meyer-Brandis, and Schmidt (2010), the filtering of the two mean-reverting processes is instead performed using a method based on extreme value theory. A two-factor extension of the geometric jump diffusion model
from Cartea and Figueroa (2005), with a different jump size distribution, can be found
in Hambley, Howison, and Kluge (2009), but no estimation method is suggested in
the paper.
The inclusion of stochastic volatility in the models used for energy markets was, among others, suggested by Geman (2005), where a Heston stochastic volatility extension of the Schwartz model is considered, but not estimated. In Green
and Nossman (2008) a two-factor extension of the Schwartz model with Heston
stochastic volatility is proposed and fitted to electricity spot prices using Markov
Chain Monte Carlo (MCMC) techniques. A jump-driven specification of the volatility
process is considered in Benth (2011), where the geometric one-factor model from
Lucia and Schwartz (2002) is augmented with stochastic volatility given by the sum of
non-Gaussian OU processes. However, in Benth (2011) a one-factor volatility process
is utilized when the model is fitted to UK natural gas spot prices. The stochastic
volatility model from Benth (2011) is extended in Benth and Vos (2013a) to incorporate
spikes and a leverage effect, in a multidimensional setting, allowing for the joint
modeling of several commodities. The model in Benth and Vos (2013a) only allows for
positive jumps in the spot price, as the non-Gaussian OU factors entering the model
are driven by subordinators. Estimation of the model from Benth and Vos (2013a) is
however still an open question. The estimation method detailed and employed in this
chapter in a univariate setting, also has potential for usage in the multi-dimensional
setup.
With the purely jump-driven specification of the volatility process, our proposed model is a univariate version of the geometric model from Benth and Vos (2013a), with the extra flexibility of accommodating both negative and positive spikes. We will use the tempered stable OU process as our jump-driven volatility specification. When the continuous specification is considered instead, our model closely resembles the model proposed in Green and Nossman (2008), where a CIR specification is used to describe the volatility dynamics. Instead, we will use a logarithmic OU process as our continuous specification of the volatility process.
The chapter is organized as follows: In Section 2 our proposed model and the benchmark models are presented. Section 3 describes the setup of our empirical application, and the PMCMC estimation method is outlined in Section 4. In Section 5 the estimation results are presented and various model comparisons are performed. Section 6 offers a discussion of possible extensions of the model and estimation procedure. Final remarks are given in Section 7.
3.2 Model Descriptions
In this section we will describe our proposed model. Let S(t ) denote the spot price at
time t . The dynamics of the spot price will be described using the following geometric
OU-based factor model, augmented with stochastic volatility:

$$d\log S(t) = d\log\Lambda(t) + dX(t) + dY(t),$$
$$dX(t) = -\alpha_x X(t)\,dt + \sigma(t)\,dB(t),$$
$$dY(t) = -\alpha_y Y(t)\,dt + dI(t).$$
The first factor, Λ(t), is a deterministic function that accounts for the possible trend and seasonal patterns of the data. The specification of Λ(t) and the procedure used for detrending and deseasonalizing the spot prices will be described in Section 3. In this section, the focus will be on modeling the detrended and deseasonalized process $Z(t) = \log S(t) - \log\Lambda(t) := X(t) + Y(t)$, discretized using a time interval of length ∆ = 1 day to match the data in our empirical application. The process X(t) is a Gaussian OU process with stochastic volatility, σ(t), that models the continuous variations in the logarithmic spot price and will be interpreted as the base-signal process. The last factor, Y(t), is a non-Gaussian OU process with a pure jump Lévy process as background driving Lévy process (BDLP). Y(t) will be interpreted as the spike process. The different mean-reversion rates, $\alpha_x > 0$ and $\alpha_y > 0$, make multi-scale autocorrelation possible and help to reproduce the time series properties of the data, where a faster mean-reversion rate is observed for the spike process. The specifications of X(t) and Y(t) are given below, and in the following subsections we present our benchmark models, which are all nested in the model described in Subsection 3.2.1.
3.2.1 The TF-SVJ Model
In our proposed two-factor model with stochastic volatility and jumps, labeled TF-SVJ, and in the benchmark models, we will use I(t) = N(t) as our BDLP, where N(t) is a compound Poisson process with intensity parameter $\lambda_J$ and normally distributed jump sizes. The detrended and deseasonalized spot price, Z(t), will then solve

$$dZ(t) = dX(t) + dY(t) = -\alpha_x X(t)\,dt - \alpha_y Y(t)\,dt + \sigma(t)\,dB(t) + dN(t).$$

If we assume that at most one jump occurs per day and approximate the variance of the increments in the AR(1) representation of X(t), $\int_t^{t+1} e^{-2\alpha_x(t+1-s)}\sigma^2(s)\,ds$, by

$$\sigma^2(t)\int_t^{t+1} e^{-2\alpha_x(t+1-s)}\,ds = \sigma^2(t)\,\frac{1-e^{-2\alpha_x}}{2\alpha_x},$$

the discretized model becomes

$$Z_{t+1} = X_{t+1} + Y_{t+1},$$
$$X_{t+1} = e^{-\alpha_x} X_t + \varepsilon_{t+1},$$
$$Y_{t+1} = e^{-\alpha_y} Y_t + \xi_{t+1} J_{t+1},$$

where $\varepsilon_{t+1} \sim N\!\left(0,\, \sigma^2(t)\frac{1-e^{-2\alpha_x}}{2\alpha_x}\right)$, $J_{t+1} \sim \mathrm{Bernoulli}(\lambda_J)$ and $\xi_{t+1} \sim \mathrm{Normal}(\mu_J, \sigma_J^2)$.
A similar model was suggested in Green and Nossman (2008), using a CIR specification of the stochastic volatility process and including an additional independent Brownian component in the spike process, Y(t). We consider both a purely jump-driven specification of the volatility process and a continuous specification. Note that the base-signal, X(t), will be continuous regardless of the specification of the volatility process, and the process Y(t) will therefore account for the spikes. The two specifications of the volatility process σ²(t) are given below.
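Given a daily volatility path σ²(t) (produced by either specification), the discretized system above can be simulated directly. A minimal Python sketch with illustrative parameter names (not the chapter's code):

```python
import numpy as np

def simulate_tf_svj(sigma2, alpha_x, alpha_y, lam_J, mu_J, sigma_J, rng):
    """Simulate the discretized TF-SVJ model Z_t = X_t + Y_t for a given
    daily volatility path sigma2: at most one jump per day, with jump
    indicator J ~ Bernoulli(lam_J) and jump size xi ~ N(mu_J, sigma_J^2)."""
    n = len(sigma2)
    X = np.zeros(n + 1)
    Y = np.zeros(n + 1)
    phi_x, phi_y = np.exp(-alpha_x), np.exp(-alpha_y)
    var_scale = (1.0 - np.exp(-2.0 * alpha_x)) / (2.0 * alpha_x)
    for t in range(n):
        eps = rng.normal(0.0, np.sqrt(sigma2[t] * var_scale))
        jump = rng.normal(mu_J, sigma_J) if rng.uniform() < lam_J else 0.0
        X[t + 1] = phi_x * X[t] + eps
        Y[t + 1] = phi_y * Y[t] + jump
    return X[1:] + Y[1:]
```

With $\alpha_y \gg \alpha_x$ the jump factor mean-reverts quickly, producing the spike-shaped excursions described in the introduction.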
Lévy-driven volatility with tempered stable marginals

The first specification of the volatility process under consideration is the case where σ²(t) is a tempered stable (TS) OU process. That is, σ²(t) solves

$$d\sigma^2(t) = -\lambda\sigma^2(t)\,dt + dL(\lambda t),$$

and the marginal distribution of σ²(t) follows a tempered stable distribution, $\sigma^2(t) \sim TS(\kappa,\delta,\gamma)$. This process can be simulated recursively from

$$\sigma^2(t+1) = e^{-\lambda}\sigma^2(t) + e^{-\lambda}\int_0^1 e^{\lambda u}\,dL(\lambda u).$$

It is shown in Barndorff-Nielsen and Shephard (2001) that the BDLP of the TS-OU process is the sum of a TS Lévy process and a compound Poisson process. We use Rosinski’s method to simulate the infinite activity part, and the innovations can be sampled using the following expression:

$$e^{-\lambda\Delta}\int_0^{\Delta} e^{\lambda u}\,dL(\lambda u) \overset{d}{=} \sum_{i=1}^{\infty} e^{-\lambda\Delta r_i} \min\!\left\{\left(\frac{a_i \kappa}{A\lambda\Delta}\right)^{-1/\kappa},\, e_i v_i^{1/\kappa}\right\} + \sum_{i=1}^{N(\lambda\Delta)} e^{-\lambda\Delta r_i^*} c_i,$$

where $A = \delta 2^{\kappa}\kappa^2/\Gamma(1-\kappa)$ and $B = \frac{1}{2}\gamma^{1/\kappa}$. The sequences of random variables $r_i$, $a_i$, $e_i$, $v_i$, $r_i^*$ and $c_i$ are all mutually independent. The $r_i$, $v_i$ and $r_i^*$ are i.i.d. standard uniforms, the $e_i$ are i.i.d. exponential with mean $1/B$, and the $c_i$ are i.i.d. Gamma with shape parameter $(1-\kappa)$ and scale parameter $1/B$. The $a_1 < \ldots < a_i < \ldots$ are the arrival times of a Poisson process with intensity 1. Finally, $N(\lambda\Delta)$ is a Poisson random variable with mean $\lambda\Delta\,\delta\gamma\kappa$. Further, we have

$$\sigma^2(0) \overset{d}{=} \sum_{i=1}^{\infty} \min\!\left\{\left(\frac{a_i \kappa}{A_0}\right)^{-1/\kappa},\, e_i v_i^{1/\kappa}\right\},$$

where $A_0 = \delta 2^{\kappa}\kappa/\Gamma(1-\kappa)$.
The infinite sums are dominated by the first few terms, as shown in Barndorff-Nielsen and Shephard (2001). We truncate the sum to its first 100 terms, as in Andrieu et al. (2010). As a special case, the volatility process becomes an Inverse Gaussian (IG) OU process when κ = 0.5. In Benth (2011) a Gaussian OU process with stochastic volatility following an IG-OU process is fitted to the logarithm of natural gas spot prices in the UK.
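The truncated series above can be sketched as follows in Python. This is a sketch under the expressions stated in the text (A, B, the compound Poisson mean λ∆δγκ, and the 100-term truncation); function and variable names are illustrative.

```python
import math
import numpy as np

def sim_ts_ou_innovation(lam, dt, kappa, delta, gamma, n_terms=100, rng=None):
    """Truncated draw of e^{-lam*dt} int_0^dt e^{lam u} dL(lam u) for the
    TS(kappa, delta, gamma)-OU volatility: a Rosinski-type series for the
    infinite activity part plus a compound Poisson part."""
    rng = np.random.default_rng() if rng is None else rng
    A = delta * 2.0**kappa * kappa**2 / math.gamma(1.0 - kappa)
    B = 0.5 * gamma**(1.0 / kappa)
    # infinite activity part, truncated after n_terms jumps
    a = np.cumsum(rng.exponential(1.0, n_terms))    # Poisson(1) arrival times
    r = rng.uniform(size=n_terms)
    e = rng.exponential(1.0 / B, size=n_terms)      # exponential, mean 1/B
    v = rng.uniform(size=n_terms)
    jumps = np.minimum((a * kappa / (A * lam * dt)) ** (-1.0 / kappa),
                       e * v ** (1.0 / kappa))
    total = float(np.sum(np.exp(-lam * dt * r) * jumps))
    # compound Poisson part
    n_cp = rng.poisson(lam * dt * delta * gamma * kappa)
    if n_cp > 0:
        r_star = rng.uniform(size=n_cp)
        c = rng.gamma(1.0 - kappa, 1.0 / B, size=n_cp)  # shape 1-kappa, scale 1/B
        total += float(np.sum(np.exp(-lam * dt * r_star) * c))
    return total
```

Iterating $\sigma^2(t+1) = e^{-\lambda}\sigma^2(t)$ plus such an innovation (with ∆ = 1) yields an approximate TS-OU volatility path; with κ = 0.5 this reduces to the IG-OU case.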
Logarithmic volatility

The second volatility specification we consider is a continuous specification, where we assume that the logarithmic volatility, $h(t) = \log\sigma^2(t)$, follows a Gaussian OU process

$$dh(t) = -\alpha_h(h(t)-\mu_h)\,dt + \sigma_h\,dB_h(t),$$

where $B_h(t)$ and $B(t)$ are two independent Brownian motions.
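Because h(t) is a Gaussian OU process, it admits an exact AR(1) discretization over unit steps, analogous to the approximation used for X(t) but without error. A minimal sketch with illustrative names:

```python
import numpy as np

def simulate_log_vol(n, alpha_h, mu_h, sigma_h, rng, h0=None):
    """Exact unit-step discretization of the Gaussian OU log-volatility:
    h_{t+1} = mu_h + e^{-alpha_h} (h_t - mu_h) + eta_t, with
    eta_t ~ N(0, sigma_h^2 (1 - e^{-2 alpha_h}) / (2 alpha_h)).
    Returns the volatility path sigma^2(t) = exp(h_t)."""
    phi = np.exp(-alpha_h)
    sd = sigma_h * np.sqrt((1.0 - phi**2) / (2.0 * alpha_h))
    h = np.empty(n)
    h[0] = mu_h if h0 is None else h0   # start at the mean level by default
    for t in range(1, n):
        h[t] = mu_h + phi * (h[t - 1] - mu_h) + sd * rng.normal()
    return np.exp(h)
```

The exponential map guarantees a strictly positive volatility path, which can be fed directly into the discretized TF-SVJ recursion.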
3.2.2 The SF-J Model
In this subsection and the following ones, the benchmark models used for comparison are outlined. All the models are nested in the TF-SVJ model, and all of them describe the dynamics of the deseasonalized logarithmic spot price Z(t).
In the first, and simplest, benchmark model we consider a single factor model with jumps (and constant volatility), which we label the SF-J model (Single Factor model with Jumps). That is, we assume $\alpha_x = \alpha_y = \alpha$ and $\sigma(t) = \sigma$ and obtain the following model:

$$dZ(t) = -\alpha Z(t)\,dt + \sigma\,dB(t) + dN(t).$$

If we assume that there is at most one jump in a day, we get the following discretized model:

$$Z_{t+1} = e^{-\alpha}Z_t + \varepsilon_{t+1} + \xi_{t+1}J_{t+1},$$

where $\varepsilon_{t+1} \sim N\!\left(0,\, \sigma^2\frac{1-e^{-2\alpha}}{2\alpha}\right)$, $J_{t+1} \sim \mathrm{Bernoulli}(\lambda_J)$ and $\xi_{t+1} \sim \mathrm{Normal}(\mu_J,\sigma_J^2)$. This model resembles the model proposed in Cartea and Figueroa (2005), the only difference being the jump size distribution. In Cartea and Figueroa (2005) the authors instead use a log-normal jump distribution.
3.2.3 The SF-SV Model
The next benchmark model we consider is a single factor Gaussian OU process with stochastic volatility, corresponding to the assumptions $\alpha_x = \alpha_y = \alpha$ and $I(t) = 0$. This model will be labeled the SF-SV model, since it is a single factor model with stochastic volatility (and no jumps). The detrended and deseasonalized logarithmic spot price now solves

$$dZ(t) = -\alpha Z(t)\,dt + \sigma(t)\,dB(t),$$

and

$$Z(t+1) = e^{-\alpha}Z(t) + \int_t^{t+1}\sigma(s)e^{-\alpha(t+1-s)}\,dB(s) \sim N\!\left(e^{-\alpha}Z(t),\, \int_t^{t+1} e^{-2\alpha(t+1-s)}\sigma^2(s)\,ds\right).$$

If we again approximate $\int_t^{t+1} e^{-2\alpha(t+1-s)}\sigma^2(s)\,ds$ by $\sigma^2(t)\int_t^{t+1} e^{-2\alpha(t+1-s)}\,ds$, the discretized model becomes the AR(1) model

$$Z_{t+1} = e^{-\alpha}Z_t + \varepsilon_{t+1},$$

where $\varepsilon_{t+1} \sim N\!\left(0,\, \sigma^2(t)\frac{1-e^{-2\alpha}}{2\alpha}\right)$. We consider the same two specifications of σ(t) as in the full model, the TF-SVJ model. With the TS-OU specification, the SF-SV model is a special case of the model considered in Benth (2011).
3.2.4 The SF-SVJ Model
We now consider adding jumps to the model specification from the previous subsection, resulting in a single factor version of our proposed model, the TF-SVJ model. This model will therefore be labeled the SF-SVJ model. Using the same assumptions and approximations as in the SF-J and SF-SV models, we obtain the following discretized model:

$$Z_{t+1} = e^{-\alpha}Z_t + \varepsilon_{t+1} + \xi_{t+1}J_{t+1},$$

where $\varepsilon_{t+1} \sim N\!\left(0,\, \sigma^2(t)\frac{1-e^{-2\alpha}}{2\alpha}\right)$, $J_{t+1} \sim \mathrm{Bernoulli}(\lambda_J)$ and $\xi_{t+1} \sim \mathrm{Normal}(\mu_J,\sigma_J^2)$. The specifications of the volatility process are again the tempered stable OU process and the logarithmic volatility model.
3.2.5 The TF-J Model
All the benchmark models considered so far have been single factor models, in the sense that the base-signal and spike part have had the same mean-reversion parameter α. The restriction $\alpha_x = \alpha_y$ also has the important implication that Z(t) will be a Markov process. We now consider extending our first benchmark model, the SF-J model, by relaxing the assumption $\alpha_x = \alpha_y$, and instead allow each factor to have a separate mean-reversion rate. The resulting two factor model with jumps will be labeled the TF-J model. Z(t) will then solve

$$dZ(t) = -\alpha_x X(t)\,dt - \alpha_y Y(t)\,dt + \sigma\,dB(t) + dN(t).$$

Once again we assume that there is at most one jump a day, and we get the following discretized model:

$$Z_{t+1} = X_{t+1} + Y_{t+1},$$
$$X_{t+1} = e^{-\alpha_x}X_t + \varepsilon_{t+1}^{x},$$
$$Y_{t+1} = e^{-\alpha_y}Y_t + \xi_{t+1}J_{t+1},$$

where $\varepsilon_{t+1}^{x} \sim N\!\left(0,\, \sigma^2\frac{1-e^{-2\alpha_x}}{2\alpha_x}\right)$, $J_{t+1} \sim \mathrm{Bernoulli}(\lambda_J)$ and $\xi_{t+1} \sim \mathrm{Normal}(\mu_J,\sigma_J^2)$.
3.2.6 Model Overview
In Table 3.1 we give an overview of our proposed model and the benchmark models
described in the previous subsections.
Table 3.1. Model Overview.

Model         Stoc. vol.   Const. vol.   Jumps   Jumps in vol.   Two-factor (αx ≠ αy)
TF-SVJ_TS         ✓                        ✓           ✓                 ✓
TF-SVJ_Log        ✓                        ✓                             ✓
SF-J                            ✓          ✓
SF-SV_TS          ✓                                    ✓
SF-SV_Log         ✓
SF-SVJ_TS         ✓                        ✓           ✓
SF-SVJ_Log        ✓                        ✓
TF-J                            ✓          ✓                             ✓
3.3 Data Description and Initial Analysis
This section describes the data used in our empirical investigation and the detrending
and deseasonalization of the data. We will fit our model and the benchmark models
from the previous section to a time series of daily UK gas spot prices ranging from
September 11, 2007 to February 10, 2014. The data is collected from Bloomberg^{1} and reports day-ahead gas spot prices collected at the virtual hub NBP (National Balancing Point) for trading days (weekdays) in the sample period. This leaves us with a total of 1620 daily price quotes. There are no missing observations in the data set, and the log
spot price is depicted in Figure 3.1.
Figure 3.1. The logarithm of the daily day-ahead UK gas spot price, together with the fitted trend and seasonality function.
From Figure 3.1, the presence of both positive and negative spikes become evi-
dent. Positive spikes are usually caused by unpredicted weather changes, yielding
an increase in demand for gas used in power production. In the UK market, supply
uncertainty is also starting to play a role as the UK are becoming more and more de-
pendent on gas import. The dependence on import from mainland Europe, through
capacity constrained pipelines, can cause a slower reaction to an increase in demand,
and in turn cause a spike in the price process. The negative spikes are often a conse-
quence of poor anticipation of market-wide gas storage levels. Storage is costly and
1Code: NBPGDAHD index
3.3. DATA DESCRIPTION AND INITIAL ANALYSIS 95
[Figure 3.2 here: histogram of daily changes in the logarithm of gas spot prices, compared to a fitted normal density (legend: log-return; Gaussian).]
Figure 3.2. Histogram of daily changes in the logarithm of gas spot prices.
cannot fully reconcile the variable seasonal demand for gas with the more constant rate of production. Low inventory levels can also increase price volatility and the risk of spike occurrence.
As in most of the literature on commodity modeling, we start our investigation of the data characteristics by fitting a deterministic trend and seasonality function to the data. However, before this can be implemented we need to check for outliers in the data, as these might influence the parameters of the fitted trend and seasonal components. A visual inspection of Figure 3.1 already suggested the presence of outliers, i.e. the large price spikes. From the histogram in Figure 3.2 it becomes clear that the daily changes in the logarithmic spot price are not normally distributed, but instead follow a leptokurtic distribution. To detect possible outliers in data that are not normally distributed, the same approach as in Chapter 5 of Benth et al. (2008) is employed. Let the daily change in the logarithmic spot price from day t − 1 to day t be denoted by ∆st = log(St) − log(St−1) for t = 2, . . . , 1620. Now define the interquartile range, IQR, as the difference between the upper quartile Q3 and the lower quartile Q1 of the time series ∆st. Assuming that the first observation in the data is not an outlier, log(St−1) will be labeled as an outlier whenever ∆st is larger than Q3 + 3 × IQR or smaller than Q1 − 3 × IQR. This procedure resulted in 56 detected outliers. The
detected outliers are then replaced by the average of the two closest non-outlier observations.

Table 3.2. Fitted parameter values and std. errors for logΛ(t).

              a0          a1           a2          a3
estimate      3.6494      0.0003       0.0761      100.08
(std. err.)   (0.0139)    (1.49e-05)   (0.0098)    (5.1467)
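As a concrete sketch, the outlier rule above can be implemented as follows. Python with NumPy is used purely for illustration (the thesis works in MATLAB), and the synthetic series in the usage note is ours, not the gas-price data.

```python
import numpy as np

def replace_outliers(log_prices, k=3.0):
    """Label log(S_{t-1}) an outlier when Delta s_t falls outside
    [Q1 - k*IQR, Q3 + k*IQR], then replace each outlier by the average of
    its two closest non-outlier neighbours (one on each side)."""
    s = np.asarray(log_prices, dtype=float)
    ds = np.diff(s)                                  # Delta s_t, t = 2,...,T
    q1, q3 = np.percentile(ds, [25, 75])
    iqr = q3 - q1
    # an extreme Delta s_t flags the *previous* observation, log(S_{t-1})
    flagged = np.zeros(len(s), dtype=bool)
    flagged[:-1] = (ds > q3 + k * iqr) | (ds < q1 - k * iqr)
    clean = s.copy()
    good = np.where(~flagged)[0]
    for t in np.where(flagged)[0]:
        lo = good[good < t]
        hi = good[good > t]
        nbrs = [x for x in (lo[-1] if lo.size else None,
                            hi[0] if hi.size else None) if x is not None]
        clean[t] = np.mean(s[nbrs])
    return clean, flagged
```

On a synthetic random walk with one artificial spike, the procedure flags the spiked observation (and, by construction of the rule, the observation just before it) and smooths it out.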
Assuming 250 trading days a year, the trend and seasonal patterns in the logarithmic spot prices are modeled by the following function

logΛ(t) = a0 + a1 t + a2 cos(2π(t − a3)/250).
The function represents the average level around which the gas prices fluctuate, and
consists of a linear trend describing the inflation in the natural gas prices and a
seasonal component modeling the seasonal variation over the year. The function is
fitted to the logarithmic spot prices, with the replacement of the outliers, using the
nlinfit function in MATLAB. The results are reported in Table 3.2 and the fitted
seasonality function is depicted in Figure 3.1.
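A hypothetical Python analogue of the nlinfit step, using scipy.optimize.curve_fit on synthetic data; the "true" parameter values and starting point below are our own choices, loosely inspired by Table 3.2, not the actual gas-price fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_seasonal(t, a0, a1, a2, a3):
    """log Lambda(t) = a0 + a1*t + a2*cos(2*pi*(t - a3)/250)."""
    return a0 + a1 * t + a2 * np.cos(2 * np.pi * (t - a3) / 250.0)

# synthetic data (our assumption, mimicking 1620 trading days)
rng = np.random.default_rng(1)
t = np.arange(1620.0)
true = (3.65, 3e-4, 0.076, 100.0)
y = log_seasonal(t, *true) + rng.normal(0.0, 0.05, t.size)

# nlinfit analogue: nonlinear least squares from a rough starting point
est, cov = curve_fit(log_seasonal, t, y, p0=(3.5, 0.0, 0.1, 80.0))
se = np.sqrt(np.diag(cov))  # standard errors, as reported in Table 3.2
```

Note that a3 is only identified modulo the 250-day period, so it is the fitted curve rather than the raw parameters that should be compared.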
All the estimates in Table 3.2 are significant at the 5% level. We also fitted a
function taking weekly, monthly and quarterly effects into account, but these effects
were not significant at the 5% level and will be ignored going forward. The detrended
and deseasonalized logarithmic spot price, Z (t ), can now be computed by inserting
back the detected outliers and subtracting the fitted logΛ(t ) function. The resulting
time series is depicted in Figure 3.3 and will serve as the input for our estimation
method outlined in the following section.
3.4 Estimation Method
In this section, the Bayesian techniques underlying our estimation method will be
carefully described. Our model is able to account for important features of the spot
price dynamics, such as stochastic volatility, jumps, and separate mean reversion
rates for the base-signal and the spike process. The flexibility of the model also poses
many challenges to the estimation. First, for models with stochastic volatility, eval-
uating the exact likelihood involves intractable high dimensional integration since
volatility is latent. By treating the stochastic volatility as a state variable, these models
have a nonlinear state space representation, where the measurement equation de-
scribes how the logarithmic price changes given state variables, and the transition
equation describes the evolution of the states. Jacquier, Polson, and Rossi (1994)
developed Bayesian MCMC methods for conducting exact inference in stochastic
volatility models. Since then, Bayesian methods have been extensively applied to
[Figure 3.3 here: time series of the detrended and deseasonalized logarithm of the gas spot price, 2008–2014.]
Figure 3.3. Detrended and deseasonalized logarithm of gas spot prices.
stock return models, including jump-diffusion models; see for example Eraker, Johannes, and Polson (2003). Second, contrary to stock prices, energy prices tend to revert to a long-run mean determined by the marginal cost of production. When jumps are present, they appear as spikes, meaning that prices revert quickly to the mean level after a jump has occurred. Green and Nossman (2008) propose an MCMC algorithm to handle energy models with these special features.
The third complication arises when we consider stochastic volatility that is driven by a pure jump process. In this case, the volatility process and the parameters governing its dynamics can be highly correlated in their posterior distributions, which results in extremely slowly mixing chains in the above-mentioned MCMC algorithms. This problem is referred to as over-conditioning. Roberts, Papaspiliopoulos, and Dellaportas (2004) suggest a reparameterization to reduce the correlation. Griffin and Steel (2006) propose an algorithm with dependent thinning and reversible jump MCMC. However, these procedures cannot easily be generalized to the multi-factor models that are popular for commodity prices.
We adopt the particle MCMC methods introduced in Andrieu et al. (2010), in
particular the particle marginal Metropolis-Hastings (PMMH) sampler. PMMH al-
gorithms can be easily adapted to accommodate different volatility specifications,
including both pure jump OU processes and the logarithmic Gaussian OU process.
Furthermore, it can be applied to non-Markovian models, where the measurement
density or the transition density may depend on the entire past of the latent process.
This allows us to use a non-Markovian representation of the multi-factor model and
is essential for effective sampling of the spike process. Last but not least, we use the
likelihood obtained from the algorithm to compute Bayes factors and conduct model
comparison.
As the name suggests, PMCMC has two components: a particle filter or sequential Monte Carlo (SMC) step and an MCMC step. Specifically, the PMMH sampler employs SMC to approximate the likelihood and the latent variables conditional on the model parameters, and then applies an MH algorithm to obtain the joint distribution of the parameters and the states, given the observations. We extend the standard PMMH algorithm in two respects. First, for models with jumps or spikes, advanced SMC techniques need to be employed to alleviate a problem known as sample impoverishment. We propose to deal with this issue by marginalizing out some latent variables, a technique called Rao-Blackwellization; see Doucet, Freitas, Murphy, and Russell (2000). Our approach is closely related to the auxiliary particle filters developed by Pitt and Shephard (1999) and illustrated in Johannes, Polson, and Stroud (2009). Second, it is costly to evaluate the likelihood using SMC, and we therefore utilize adaptive algorithms to improve the efficiency of the Metropolis-Hastings sampler; see Andrieu and Thoms (2008) for a review of adaptive MCMC. The rest of this section focuses on the estimation of the model in Section 3.2.1, as it is the most complex model and nests all the benchmark models.
3.4.1 Sequential Monte Carlo
In the state space representation of our proposed model, the TF-SVJ model, the observed price process, Zt, is the sum of two latent processes, Xt and Yt, without any measurement error. We cannot apply particle filters directly in this case since there is no measurement density. One solution is to add a small Gaussian error term to the measurement equation. This is equivalent to assuming that Yt is a jump-diffusion instead of a pure jump process, and it is comparable to the model in Green and Nossman (2008). However, this would still be problematic if particle filters with blind proposals, also called bootstrap filters, are implemented. If the variance of the measurement errors is small compared to the variance of the latent process, then the observations are informative about the latent process and bootstrap filters will perform poorly; see Pitt, dos Santos Silva, Giordani, and Kohn (2012) for example. We propose a different approach for solving this problem. Specifically, we use the following representation of Model 1,

Zt+1 = e^{−αx} Zt + Yt+1 − e^{−αx} Yt + εt+1,
Yt+1 = e^{−αy} Yt + ξt+1 Jt+1.
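A minimal simulation sketch of this representation, assuming constant Gaussian base-signal noise in place of the stochastic volatility (a simplification of the TF-SVJ model; all parameter values in the usage note are illustrative):

```python
import numpy as np

def simulate_tfj(T, alpha_x, alpha_y, sigma, lam_J, mu_J, sigma_J, seed=0):
    """Simulate Z_{t+1} = e^{-ax} Z_t + Y_{t+1} - e^{-ax} Y_t + eps_{t+1},
    Y_{t+1} = e^{-ay} Y_t + xi_{t+1} J_{t+1}, with constant-variance
    Gaussian eps (in the TF-SVJ model Var(eps) is stochastic).
    J is a Bernoulli jump indicator, xi a normal jump size."""
    rng = np.random.default_rng(seed)
    Z, Y = np.zeros(T), np.zeros(T)
    for t in range(T - 1):
        J = rng.random() < lam_J                    # jump time
        xi = rng.normal(mu_J, sigma_J)              # jump size
        Y[t + 1] = np.exp(-alpha_y) * Y[t] + xi * J
        Z[t + 1] = (np.exp(-alpha_x) * Z[t] + Y[t + 1]
                    - np.exp(-alpha_x) * Y[t] + rng.normal(0.0, sigma))
    return Z, Y
```

By construction, the base-signal X = Z − Y recovered from a simulated path follows the one-factor OU recursion exactly, which is a useful internal check.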
Notice that this is no longer a Markovian state space model, in the sense that the measurement density depends on both Yt and Yt+1, but we can still use SMC methods to evaluate the likelihood and simulate the states given the parameters. Let θ and K denote the parameters and the latent variables respectively, where Kt+1 = (σ²(t), Yt+1). SMC methods start by approximating the continuous filtering density pθ(K1:t | Z1:t) by a discrete distribution made up of weighted random samples called particles. Given particles and associated weights, {K^(i)_{1:t}, ω̄^(i)_t}_{i=1}^N, that approximate pθ(K1:t | Z1:t), SMC obtains samples from pθ(K1:t+1 | Z1:t+1) and computes pθ(Zt+1 | Z1:t) sequentially. Using Bayes' theorem,

pθ(K1:t+1 | Z1:t+1) = [pθ(Zt+1 | Kt+1, Z1:t, K1:t) pθ(Kt+1 | K1:t) / pθ(Zt+1 | Z1:t)] pθ(K1:t | Z1:t),   (3.1)

the density of interest pθ(K1:t+1 | Z1:t+1) can be sampled using importance sampling techniques. The basic SMC chooses the proposal density (importance density) gθ(K1:t+1) to be pθ(Kt+1 | K1:t) pθ(K1:t | Z1:t), i.e., the new particles K^(i)_{t+1} are propagated from K^(i)_t using only transition densities and are "blind" to the observations. The importance weights are given by the ratio of the target density and the proposal density. From equation (3.1), it is therefore easily seen that the weights for the particles K^(i)_{1:t+1} are proportional to ω̃^(i)_{t+1} ω̄^(i)_t, where the incremental weights ω̃^(i)_{t+1} are simply given by pθ(Zt+1 | K^(i)_{t+1}, Z1:t, K^(i)_{1:t}). The likelihood, pθ(Zt+1 | Z1:t), is the normalizing constant for the particles and is equal to Σ_{i=1}^N ω̃^(i)_{t+1} ω̄^(i)_t. After normalizing the weights, ω̄^(i)_{t+1} = ω̄^(i)_t ω̃^(i)_{t+1} / pθ(Zt+1 | Z1:t), the particles {K^(i)_{1:t+1}, ω̄^(i)_{t+1}}_{i=1}^N approximate pθ(K1:t+1 | Z1:t+1).
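To make the propagate–weight–normalize recursion concrete, here is a bootstrap-filter sketch for a toy linear-Gaussian model. The model and all parameters are our own illustration, not taken from the thesis.

```python
import numpy as np

def bootstrap_filter(z, N, phi, q, r, seed=0):
    """Blind-proposal (bootstrap) SMC for the toy model
       x_t = phi*x_{t-1} + N(0, q),  z_t = x_t + N(0, r).
    Particles are propagated from the transition density; the incremental
    weight is the measurement density. Returns the log-likelihood estimate."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(q / (1 - phi**2)), N)   # stationary start
    loglik = 0.0
    for zt in z:
        # incremental weights: measurement density at each particle
        w = np.exp(-0.5 * (zt - x)**2 / r) / np.sqrt(2 * np.pi * r)
        loglik += np.log(w.mean())          # estimate of p(z_t | z_{1:t-1})
        w /= w.sum()
        idx = rng.choice(N, size=N, p=w)    # multinomial resampling
        x = phi * x[idx] + rng.normal(0.0, np.sqrt(q), N)
    return loglik
```

Since resampling leaves uniform weights, averaging the incremental weights each step yields the likelihood factor, and the sum of the log factors estimates log p(z_{1:T}).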
If the variance of the weights is large, the particles yield a poor approximation to the continuous distribution pθ(K1:t+1 | Z1:t+1), as the number of effective particles has decreased. In bootstrap filters, the incremental weights are simply the measurement density, and the algorithm performs better when the states are persistent and when the observations are less informative about the states than the transition density. This is not the case for the spike process Yt+1. If there is a jump at time t + 1, and Yt+1 is propagated from a blind proposal, the measurement density, pθ(Zt+1 | Kt+1, Z1:t, K1:t), will peak at a few values, resulting in only a few particles having prominent weights. To alleviate this problem, one needs to adapt the proposal density of Yt+1, or in other words, to incorporate Zt+1 in the proposal density.
We employ the Rao-Blackwellization technique, as the innovations in the spike process can be integrated out conditional on the other state variables. The vector Kt+1 has two components, the stochastic volatility, σ²(t), and the spike process, Yt+1. Since the innovations in Yt+1 are assumed to be Bernoulli distributed jump times with normally distributed jump sizes, pθ(Yt+1 | Zt+1, σ²(t), Z1:t, K1:t) is analytically tractable. Hence, we can rewrite equation (3.1) as

pθ(K1:t+1 | Z1:t+1) = [pθ(Yt+1 | Zt+1, σ²(t), Z1:t, K1:t) pθ(Zt+1 | σ²(t), K1:t, Z1:t) pθ(σ²(t) | σ²(t−1)) / pθ(Zt+1 | Z1:t)] pθ(K1:t | Z1:t),
and choose the following proposal density,

gθ(K1:t+1 | Z1:t+1) = pθ(Yt+1 | Zt+1, σ²(t), Z1:t, K1:t) pθ(σ²(t) | σ²(t−1)) pθ(K1:t | Z1:t).

Here, the stochastic volatility, σ²(t), is still propagated from its transition density, but Yt+1 is adapted to Zt+1, as we can sample directly from pθ(Yt+1 | Zt+1, σ²(t), Z1:t, K1:t). In particular, we draw Jt+1 and ξt+1 from

pθ(Jt+1 | Zt+1, σ²(t), Z1:t, K1:t) = pθ(Zt+1 | Jt+1, σ²(t), Z1:t, K1:t) pθ(Jt+1) / pθ(Zt+1 | σ²(t), K1:t, Z1:t)

and

pθ(ξt+1 | Jt+1, Zt+1, σ²(t), Z1:t, K1:t) = pθ(Zt+1 | ξt+1, Jt+1, σ²(t), Z1:t, K1:t) pθ(ξt+1) / [pθ(Zt+1 | σ²(t), Z1:t, K1:t) pθ(Jt+1)],

and then set Yt+1 = e^{−αy} Yt + ξt+1 Jt+1.
The incremental weights ω̃^(i)_{t+1} = pθ(Zt+1 | σ²(t), K1:t, Z1:t) do not depend on Yt+1, as ξt+1 and Jt+1 are integrated out. As before, the likelihood pθ(Zt+1 | Z1:t) equals Σ_{i=1}^N ω̃^(i)_{t+1} ω̄^(i)_t.
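The idea of integrating out (Jt+1, ξt+1) can be sketched in a stylized one-observation setting, where z = m + ξJ + ε with Gaussian ε stands in for the measurement equation; the conditioning set and notation are deliberately simplified relative to the text.

```python
import numpy as np

def rao_blackwell_step(z, m, v, lam, mu_J, s2_J, rng):
    """One Rao-Blackwellized step for the stylized model
       z = m + xi*J + eps,  eps ~ N(0, v),  J ~ Bernoulli(lam),
       xi ~ N(mu_J, s2_J).
    Returns the incremental weight p(z) with (J, xi) integrated out, and
    a draw of (J, xi) from their exact conditional given z."""
    def npdf(x, mean, var):
        return np.exp(-0.5 * (x - mean)**2 / var) / np.sqrt(2 * np.pi * var)
    p0 = (1 - lam) * npdf(z, m, v)            # no-jump branch
    p1 = lam * npdf(z, m + mu_J, v + s2_J)    # jump branch, xi integrated out
    weight = p0 + p1                          # the SMC incremental weight
    J = rng.random() < p1 / weight            # exact posterior jump draw
    if J:
        # conjugate normal posterior for the jump size xi given z and J = 1
        post_var = 1.0 / (1.0 / s2_J + 1.0 / v)
        post_mean = post_var * (mu_J / s2_J + (z - m) / v)
        xi = rng.normal(post_mean, np.sqrt(post_var))
    else:
        xi = rng.normal(mu_J, np.sqrt(s2_J))  # prior draw; unused when J = 0
    return weight, float(J), xi
```

A large observation relative to the noise scale is thus classified as a jump with near-certainty, while the weight itself never depends on the sampled (J, ξ).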
If importance sampling is carried out sequentially, the weights will degenerate and only a few particles will have significant weights after a few iterations. The degeneracy grows exponentially in time and makes particle approximations unreliable. SMC uses a resampling step to deal with this problem. The particles K^(i)_{1:t} are resampled with replacement according to their normalized weights ω̄^(i)_t, for instance by drawing from the multinomial distribution with probabilities {ω̄^(i)_t}_{i=1}^N. Particles with higher weights will be duplicated and particles with lower weights will be eliminated. After resampling, all particles have equal weights.
The likelihood computed from SMC is random, and the variance of the likelihood, which is related to the variance of the weights, greatly impacts the acceptance rate in the MCMC step. The Rao-Blackwellization technique described above is the first step we take to reduce this variance. Second, the resampling step introduces additional Monte Carlo error, and we implement residual resampling as it has smaller variance than multinomial resampling; see Douc and Cappe (2005). It also satisfies the unbiasedness condition, meaning that the expected number of copies of each particle is proportional to its weight. Lastly, the variance of the likelihood decreases as the number of particles increases. However, for limited computation time, one faces a trade-off between the number of MCMC iterations to run and the number of particles to use in each iteration. See Pitt et al. (2012) for a guide on how to choose the optimal number of particles.
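A sketch of the generic residual resampling scheme (our own implementation for illustration, not code from the thesis):

```python
import numpy as np

def residual_resample(weights, rng):
    """Residual resampling: keep floor(N*w_i) deterministic copies of each
    particle, then draw the remaining particles multinomially from the
    residual weights. Lower variance than pure multinomial resampling."""
    w = np.asarray(weights, dtype=float)
    N = len(w)
    counts = np.floor(N * w).astype(int)          # deterministic copies
    residual = N * w - counts                     # leftover fractional mass
    n_rest = N - counts.sum()
    if n_rest > 0:
        residual /= residual.sum()
        extra = rng.choice(N, size=n_rest, p=residual)
        counts += np.bincount(extra, minlength=N)
    return np.repeat(np.arange(N), counts)        # resampled particle indices
```

The deterministic part of the allocation is what reduces the resampling variance: only the fractional remainders are left to chance.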
3.4.2 MCMC
SMC methods approximate the likelihood and state variables conditional on the
parameters, but we are more interested in the joint distribution of the parameters and
states. Notice that p(θ, K1:T | Z1:T) can be decomposed into p(θ | Z1:T) p(K1:T | θ, Z1:T). The PMMH sampler suggests the following proposal density,

q(θ, K1:T | Z1:T) = q(θ | θg) p(K1:T | θ, Z1:T).

The draw, θg+1, from q(θ | θg), therefore has the simplified acceptance probability

αg+1 = min{ [p(Z1:T | θg+1) p(θg+1) q(θg | θg+1)] / [p(Z1:T | θg) p(θg) q(θg+1 | θg)], 1 },   (3.2)

where p(Z1:T | θ) can be replaced by its particle approximation, as shown in Andrieu et al. (2010). If the marginal distributions of the states are of interest, we also sample K^{g+1}_{1:T} from p(K1:T | θg+1, Z1:T) in the SMC step and accept it jointly with θg+1.
The choice of the proposal density q(θ | θg) is another crucial element in determining the efficiency of the MCMC algorithm. We use a random-walk proposal, θg+1 ∼ TN(θg, βg Σg), where TN denotes the truncated normal distribution, as some of the parameters have finite support. In particular,

q(θg | θg+1) = fN(θg; θg+1, βg Σg) / [FN(θu; θg+1, βg Σg) − FN(θl; θg+1, βg Σg)],
q(θg+1 | θg) = fN(θg+1; θg, βg Σg) / [FN(θu; θg, βg Σg) − FN(θl; θg, βg Σg)],   (3.3)

where fN and FN denote the pdf and cdf of the multivariate normal distribution respectively, θu is the upper limit of the parameters, and θl is the lower limit. The ratio of proposal densities in equation (3.2) simplifies to the ratio of normalizing constants, as fN is symmetric:

q(θg | θg+1) / q(θg+1 | θg) = [FN(θu; θg, βg Σg) − FN(θl; θg, βg Σg)] / [FN(θu; θg+1, βg Σg) − FN(θl; θg+1, βg Σg)].   (3.4)

Notice that if all parameters have support on the whole real line, the proposal density is symmetric, and the ratio of proposal densities becomes 1.
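A one-dimensional sketch of the ratio in (3.4), using scipy.stats.norm; the thesis works with a multivariate truncated normal, so this scalar version only illustrates how the normalizing constants enter the acceptance ratio.

```python
import numpy as np
from scipy.stats import norm

def tn_log_proposal_ratio(theta_old, theta_new, scale, lo, hi):
    """log q(theta_old | theta_new) - log q(theta_new | theta_old) for a
    truncated-normal random walk on (lo, hi). The normal pdf terms cancel
    by symmetry; only the normalizing constants of (3.4) survive."""
    z_new = norm.cdf(hi, theta_new, scale) - norm.cdf(lo, theta_new, scale)
    z_old = norm.cdf(hi, theta_old, scale) - norm.cdf(lo, theta_old, scale)
    # q(old|new) carries constant 1/z_new, q(new|old) carries 1/z_old
    return np.log(z_old) - np.log(z_new)
```

Far from the truncation bounds the correction vanishes; near a bound it penalizes moves whose reverse step would be heavily truncated.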
Gelman, Roberts, and Gilks (1996) show that the efficiency of the random-walk MH algorithm is maximized when Σg is the covariance matrix of the target posterior distribution and the scaling factor βg is approximately 2.38²/d, where d is the number of parameters.
In practice, we do not know Σg a priori. Adaptive MCMC allows us to learn Σg "on the fly", using previous updates in the chain to construct this covariance. The resulting chain {θg}_{g=1}^G is not Markovian, as the proposal density depends on the history of θ, and the ergodicity of the chain can be perturbed. Haario, Saksman, and Tamminen (2001) propose an adaptive Metropolis (AM) algorithm, using the whole history of the chain, or any increasing part of the past, which leads to vanishing adaptation and preserves the correct ergodic property. We adopt the AM algorithm with global adaptive scaling as in Andrieu and Thoms (2008). When the chain is started, Σg might be a poor
initial guess, resulting in too many or too few rejections. Andrieu and Thoms (2008) suggest adapting the scaling factor βg using the acceptance probability in (3.2). If the acceptance probability is higher than the optimal acceptance probability, βg increases, and vice versa. The optimal acceptance rate is chosen to be around 24%, as suggested in Gelman et al. (1996).
Bayesian inference requires specifying prior distributions for the parameters. For most of the model parameters, we choose diffuse but proper priors. For the jump process, we use a prior that elicits our belief that jumps are large compared to the base-signal. Specifically, we use a gamma distribution for the standard deviation of the jump sizes, which places lower probability on small jumps.
3.4.3 PMMH Algorithm
We outline the algorithm for the TF-SVJ model in this subsection:

1. For g = 1, ..., G, where G is the number of MCMC iterations, sample θg+1 ∼ TN(θg, βg Σg), then run the following SMC algorithm to obtain p(Z1:T | θg+1) and K^{g+1}_{1:T}:

   a) sample σ²(0, i) and Y^(i)_1 from their stationary distributions, for i = 1, ..., N, where N is the number of particles.

      i. compute ω̃^(i)_1 = pθ(Z1 | Y^(i)_1, σ²(0, i)) / N.

      ii. obtain the likelihood from p(Z1) = Σ_{i=1}^N ω̃^(i)_1, and compute the normalized weights: ω̄^(i)_1 = ω̃^(i)_1 / p(Z1).

   b) at t = 1, ..., T − 1,

      i. sample the index a^(i)_t for i = 1, ..., N, using ω̄_t, and set ω̄_t = 1/N.

      ii. sample σ²(t, i) ∼ pθ(σ²(t) | σ²(t − 1, a^(i)_t)) and Y^(i)_{t+1} ∼ pθ(Yt+1 | Zt+1, σ²(t, i), Y^{a^(i)_t}_t).

      iii. compute the incremental weights: ω̃^(i)_{t+1} = pθ(Zt+1 | σ²(t, i), Y^{a^(i)_t}_t, Z1:t).

      iv. obtain the likelihood: pθ(Zt+1 | Z1:t) = Σ_{i=1}^N ω̃^(i)_{t+1} ω̄^(i)_t.

      v. normalize the weights: ω̄^(i)_{t+1} = ω̄^(i)_t ω̃^(i)_{t+1} / Σ_{j=1}^N ω̄^(j)_t ω̃^(j)_{t+1}.

   c) at t = T,

      i. obtain p(Z1:T | θg+1) = pθ(Z1) ∏_{t=1}^{T−1} pθ(Zt+1 | Z1:t).

      ii. use ω̄_T and a_{1:T} to draw a realization of the states K^{g+1}_{1:T}.

2. accept θg+1 and K^{g+1}_{1:T} with probability

   αg+1 = min{ [p(Z1:T | θg+1) p(θg+1) q(θg | θg+1)] / [p(Z1:T | θg) p(θg) q(θg+1 | θg)], 1 },

   where p(Z1:T | θg+1) is computed from the SMC algorithm above, p(θ) is the prior density of the model parameters, which we specify in Section 3.5, and q is the truncated normal proposal density. The ratio of proposal densities is given in equation (3.4). If rejected, we set θg+1 and K^{g+1}_{1:T} equal to θg and K^g_{1:T}.

3. update the scaling factor and the covariance matrix of the proposal density:

   νg+1 = 1/(g + 1)^{0.5},
   log βg+1 = log βg + νg+1 (αg+1 − α*),
   µg+1 = µg + νg+1 (θg+1 − µg),
   Σg+1 = Σg + νg+1 ((θg+1 − µg)(θg+1 − µg)^T − Σg).
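Step 3 can be sketched directly; this is a generic implementation of the global adaptive-scaling update, with variable names of our own choosing.

```python
import numpy as np

def am_update(g, beta, mu, Sigma, theta_new, accept_prob, target=0.24):
    """One adaptive-Metropolis update with global adaptive scaling:
    the Robbins-Monro step size nu = 1/(g+1)^0.5 moves log(beta) toward
    the target acceptance rate and updates the running mean/covariance."""
    nu = 1.0 / (g + 1) ** 0.5
    beta = np.exp(np.log(beta) + nu * (accept_prob - target))
    diff = theta_new - mu                          # theta_{g+1} - mu_g
    mu = mu + nu * diff
    Sigma = Sigma + nu * (np.outer(diff, diff) - Sigma)
    return beta, mu, Sigma
```

Note that the covariance recursion uses the old mean, matching the update equations above, and that β only grows when the realized acceptance probability exceeds the target.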
3.4.4 Model Comparison
The estimation procedure is easily adapted to all the models we considered in Section 3.2. The question remains: which of the models fits the data better? Specifically, is stochastic volatility important? Which volatility process is more suitable for the UK gas price data? What is the role of jumps? Is it necessary to have different mean reversion rates for the spike process and the base-signal? To address these important questions, we estimate a large set of models and conduct an extensive model comparison. For nested models, we carry out model specification tests using likelihood ratio statistics. We also compute Bayes factors, as the models with tempered stable volatility and logarithmic volatility are not nested.

Given two competing models, say the TF-SVJTS model and the TF-SVJLog model, the Bayes factor is the ratio of the posterior probabilities of the two models given the data, i.e., BF = p(TF-SVJTS | Z)/p(TF-SVJLog | Z). If we assume that the competing models are equally probable a priori, this posterior odds ratio reduces to the ratio of marginal likelihoods: BF = p(Z | TF-SVJTS)/p(Z | TF-SVJLog). The density p(Z | M) is termed the marginal likelihood, as it is the likelihood of the data under model M obtained by marginalizing over the parameters of model M:

p(Z | M) = ∫ p(Z | θ, M) p(θ | M) dθ,   (3.5)

where p(θ | M) is the prior density of the parameters in model M.
We use the output from the PMMH algorithm to compute the Bayes factors. The algorithm produces {p(Z | θg, M)}_{g=1}^G, where the θg are draws from the posterior density p(θ | Z, M). In equation (3.5), the integration is over the prior density of θ. Newton and Raftery (1994) propose several estimators of the marginal likelihood based on importance sampling and Monte Carlo integration. We adopt the version which uses a mixture of the prior and the posterior as the importance density, yet does not require further simulation from the prior. Given G samples of θ from the posterior, imagine that δp G/(1 − δp) additional samples of θ are drawn from the prior, resulting in a total of G/(1 − δp) samples from the mixture density δp p(θ | M) + (1 − δp) p(θ | Z, M). Assuming that the draws from the prior all have likelihood p(Z | θ, M) equal to its expected value p(Z | M), we obtain the following estimator,

p̂(Z | M) = [δp G/(1 − δp) + Σ_{g=1}^G p(Z | θg, M)/(δp p̂(Z | M) + (1 − δp) p(Z | θg, M))] / [δp G/((1 − δp) p̂(Z | M)) + Σ_{g=1}^G (δp p̂(Z | M) + (1 − δp) p(Z | θg, M))^{−1}],   (3.6)

which, since p̂(Z | M) appears on both sides, is solved by fixed-point iteration.
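A sketch of the fixed-point iteration for (3.6), carried out in log space for numerical stability; the implementation (and the log-space reformulation) is ours, not code from the thesis.

```python
import numpy as np
from scipy.special import logsumexp

def newton_raftery_logml(loglik, delta=0.01, iters=200):
    """Fixed-point iteration for the Newton and Raftery (1994) estimator
    in equation (3.6). `loglik` holds the draws log p(Z | theta_g, M)
    produced by the PMMH sampler; returns log p-hat(Z | M)."""
    loglik = np.asarray(loglik, dtype=float)
    G = loglik.size
    lml = logsumexp(loglik) - np.log(G)          # starting value
    const = np.log(delta * G / (1.0 - delta))    # log(delta*G/(1-delta))
    for _ in range(iters):
        # log(delta*phat + (1-delta)*L_g) for every posterior draw g
        log_denom_g = np.logaddexp(np.log(delta) + lml,
                                   np.log(1.0 - delta) + loglik)
        log_num = np.logaddexp(const, logsumexp(loglik - log_denom_g))
        log_den = np.logaddexp(const - lml, logsumexp(-log_denom_g))
        lml = log_num - log_den
    return lml
```

At the fixed point the estimator is a weighted average of the likelihood draws, so the result always lies between the smallest and largest log-likelihood in the sample.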
3.5 Estimation Results
We apply the PMMH algorithm from Subsection 3.4.3 to the deseasonalized and detrended logarithmic gas spot prices. We start with a preliminary run which uses adaptive MCMC, then "freeze" the covariance matrix Σ and the scaling factor β and run a further 20,000 iterations to get the posterior distributions of θ and K. The number of particles for the logarithmic volatility specification is set to 4800, while the number of particles for the model with tempered stable volatility is set to 1600 for computational reasons. To conduct likelihood ratio tests, we need a point estimate of θ, and we use both the mean and the median of the posterior distribution of θ. To minimize the randomness from SMC, we use 300,000 particles to evaluate the likelihood at the point estimates. The marginal likelihood p(Z | M) is computed from equation (3.6).

The estimation for the benchmark models from Section 3.2 follows similar procedures. For models with jumps in the logarithmic price, we utilize Rao-Blackwellization and integrate out the jump times and jump sizes analytically when computing the likelihood.² For models with two factors, we use the non-Markovian representation. For the SF-J model, the likelihood can be obtained analytically, and we use the MH step for updating the parameters.
The same priors are specified for models with overlapping parameters, and we assume that the parameters are independent a priori, so the joint prior is simply the product of the prior distributions. In summary, the priors for the mean reversion parameters are αx ∼ G(1,1) and αy ∼ G(1,1), where G denotes the gamma distribution. In the TF-J model, we also impose αy > αx to ensure that the mean reversion rate for the spike process is larger than that of the base-signal. For stochastic volatility with tempered stable marginals, we specify λ ∼ G(1,2), κ ∼ Beta(10,10),
δ ∼ G(1,√50), and γ ∼ G(1,√50). For the logarithmic volatility, αh ∼ G(1,1), µh ∼ N(−5,5) and σ²h ∼ G(1,2). For models with constant volatility, we choose σ² ∼ G(1,2). Finally, the priors for the jump parameters are µJ ∼ N(0,2), σJ ∼ G(1.5,0.5), and λJ ∼ G(1,10).

² For the TF-J model we use auxiliary particle filters. This is the case of "perfect adaptation", and Rao-Blackwellized particle filters and auxiliary particle filters differ only in the order of the sampling and resampling steps.
3.5.1 Parameter Estimates
The parameter estimates obtained from fitting the models to the detrended and
deseasonalized logarithmic gas spot prices are reported in Table 3.3. We also report
the log-likelihood evaluated at the mean and median of the posterior distribution of
the parameters, and the marginal log-likelihood computed using formula (3.6).
The parameter estimates for the simplest benchmark model, SF-J, indicate that the model does not adequately capture the dynamics of the data. As in Green and Nossman (2008), failing to include stochastic volatility severely drives up the estimated jump intensity. In our case, we find λJ = 0.2445, which does not fit well with our data and the general observation that jumps are rare events. The high jump intensity means that most of the variability in the data is explained by jumps, and as a consequence the estimate of the constant volatility, σ, is very low. From the estimated spike innovations, plotted in the top panel of Figure 3.4, we see that there is clustering in the jump times and that the assumption of a constant jump intensity does not hold. The estimate of the mean-reversion rate, α, is in line with the estimate α = 0.0064 obtained from fitting the theoretical autocorrelation function, exp(−α|t|), to the first 100 lags of the empirical autocorrelation function of Z(t). The mean-reversion rate corresponds to a half-life of approximately 90 days, revealing that most emphasis is put on capturing the mean reversion of the base-signal and that the logarithmic spot price is very persistent.
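The half-lives quoted here and below follow from solving exp(−αt) = 1/2, i.e. t½ = log 2 / α; a quick arithmetic check:

```python
import math

def half_life(alpha):
    """Time for an OU deviation to decay by half: exp(-alpha*t) = 1/2."""
    return math.log(2.0) / alpha

# alpha around 0.0077 per day gives roughly a 90-day half-life,
# while alpha around 2.1 gives roughly a third of a day
```

This reproduces both the ~90-day figure for the base-signal rate and, later, the third-of-a-day figure for the spike rate in the TF-SVJ models.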
In the next two benchmark models, SF-SVTS and SF-SVLog, we allow for stochastic volatility but neglect to account for the presence of jumps. In Green and Nossman (2008), the authors find that in the constant volatility case, neglecting to account for jumps will drive up the estimate of the mean-reversion rate. This is not the case when there is stochastic volatility in the model, at least not for the data at hand. The estimate of the mean-reversion rate has not changed much, and it appears that most of the variation in the data can be explained by stochastic volatility. This can also be seen from the filtered variance processes in the top panel of Figure 3.6, where the volatile period in the beginning of the data sample, which was explained by jumps in the SF-J model (see the top panel of Figure 3.4), is now captured by the stochastic volatility process. The log-likelihood evaluated at the mean and median of the posterior distribution of the parameters has increased considerably, indicating that including stochastic volatility in the model is more important than allowing for jumps. From the plots of the filtered stochastic variance processes in Figure 3.6, there does not seem to be much difference across the two specifications. When κ equals 0.5 in the TS-OU specification we get an IG-OU process, which was the specification used in the SF-SV model that was fitted to UK natural gas spot prices in Benth (2011). Our
estimate of κ is 0.4504, and testing κ= 0.5 yields a p-value of 0.196.
Table 3.3. Parameter estimates. Standard errors in parentheses; a blank entry indicates that the parameter does not enter the model.

        SF-J        SF-SVTS     SF-SVLog    SF-SVJTS    SF-SVJLog   TF-J        TF-SVJTS    TF-SVJLog
αx      0.00773     0.00679     0.00696     0.00553     0.00784     0.00421     0.00867     0.00766
        (0.00293)   (0.00304)   (0.00207)   (0.00268)   (0.00291)   (0.00203)   (0.00226)   (0.00161)
αy      0.00773     0.00679     0.00696     0.00553     0.00784     0.03956     2.1008      2.1297
        (0.00293)   (0.00304)   (0.00207)   (0.00268)   (0.00291)   (0.01709)   (0.378)     (0.478)
σ²      0.00047                                                     0.00046
        (0.00004)                                                   (0.00004)
λ                   0.2108                  0.1841                              0.2011
                    (0.02637)               (0.02701)                           (0.02623)
κ                   0.4504                  0.4429                              0.4108
                    (0.03834)               (0.03024)                           (0.04401)
δ                   0.04262                 0.04430                             0.06809
                    (0.01879)               (0.01541)                           (0.03196)
γ                   8.3085                  8.4637                              8.4585
                    (1.420)                 (1.389)                             (1.522)
αh                              0.06049                 0.06369                             0.06336
                                (0.01346)               (0.01434)                           (0.01353)
µh                              -7.0472                 -7.1098                             -7.1014
                                (0.138)                 (0.216)                             (0.148)
σ²h                             0.2737                  0.2742                              0.2651
                                (0.05783)               (0.05434)                           (0.05281)
µJ      -0.00189                            -0.07008    0.07445     -0.00583    -0.09599    -0.3270
        (0.00520)                           (0.125)     (1.686)     (0.00526)   (0.328)     (0.358)
σJ      0.09342                             0.1605      0.4919      0.09140     0.7232      0.4330
        (0.00480)                           (0.104)     (0.292)     (0.00452)   (0.273)     (0.230)
λJ      0.2445                              0.01092     0.00031     0.2521      0.00265     0.00221
        (0.02340)                           (0.00975)   (0.00020)   (0.02292)   (0.00172)   (0.00128)

LogLikMean    2941.69  3174.17  3194.21  3176.57  3194.26  2944.32  3178.72  3199.63
LogLikMedian  2941.71  3174.82  3194.69  3177.59  3194.49  2944.36  3180.44  3199.95
MarginalLL    2937.97  3172.10  3193.00  3173.35  3191.60  2939.96  3176.43  3196.45
In the benchmark models labeled SF-SVJ, we still consider a one-factor model, but this time we include both stochastic volatility and jumps in the model specification. From the results in Table 3.3, we see that the different volatility specifications now result in different estimates of the mean-reversion rate. The half-life of the observed process is now 125 days in the SF-SVJTS model and 88 days in the SF-SVJLog model. The jump intensity and the jump size distribution also change with the volatility specification. Perhaps a bit surprisingly, the estimated jump intensity is higher in the SF-SVJTS model, where the stochastic volatility process is allowed to jump. If we compare the middle and bottom panels in Figure 3.4, it is clear that the two models especially differ in the modeling of the large innovation at the end of the data set. The volatile period at the beginning of the data set is captured (mainly) by stochastic volatility in both models, and as a consequence the jump intensity has dropped drastically compared to the estimate from the SF-J model. From the estimated spike innovations in the SF-SVJTS model, we see that there is still some clustering in the jump times, and allowing for a time-varying jump intensity would be a natural extension of the model. This is left for future research, but will be discussed further in Section 3.6. The estimated jump intensity in the SF-SVJLog model is extremely low, and from the plots in Figure 3.4 we see that only one large innovation is categorized as a jump. The low jump intensities in both models also explain the high standard errors on the parameters, µJ and σJ, governing the jump size distribution. The inclusion of jumps does, however, not seem to impact the parameters governing the volatility process much. Testing κ = 0.5 in the TS-OU specification now gives a p-value of 0.0590, making it less plausible that the IG-OU specification could have been used instead.
In our last benchmark model, TF-J, we extend the SF-J model to a two-factor model, where the mean-reversion rates differ between the base-signal and the spike process. We impose the restriction αy > αx to ensure that the estimated mean-reversion rate for the spike process is higher than that of the base-signal process. We see that the estimate of α obtained in the SF-J model is a weighted average of αx and αy, with most weight put on the mean-reversion rate of the base-signal. The half-life of the base-signal is 165 days and the half-life of the spike process is 17.5 days. The latter is too long compared to what is normally understood by a spike. The estimates for the jump process are unaffected by the inclusion of separate mean-reversion rates for the two factors, and hence the estimated jump intensity is still unrealistically high.
In our proposed model, TF-SVJ, which is the two-factor extension of SF-SVJ, the
estimates of the mean-reversion rates are higher than the estimate found for the
SF-SVJ model. The half-life of the base-signal equals 80 days for the TF-SVJTS model
and 90 days for the TF-SVJLog model. The estimated half-life for the spike process now
equals a third of a day with both volatility specifications. We also fitted the theoretical
autocorrelation function, w1 exp(−αx |t |)+(1−w1)exp(−αy |t |), to the first 100 lags of
the empirical autocorrelation function, and obtained the estimates αx = 0.0050 and
αy = 0.4141, corresponding to half-lives of 138 days and 1.7 days. The high estimate
of αy in our TF-SVJ models could simply be a consequence of the low number of jumps
in the spike process. The jump intensity has gone down for the TS-OU specification
and up for the log-OU specification, such that it is now roughly equal for the two
108 CHAPTER 3. A GENERALIZED SCHWARTZ MODEL FOR ENERGY SPOT PRICES
volatility specifications. Even though the estimates of the mean and variance of the
jump size distribution are not the same for the two models, Figure 3.5 reveals that
the filtered jump processes are almost identical. From the filtered variance processes
in Figure 3.6, we see that the volatility processes look very similar to the ones from
the SF-SVJ and SF-SV models, except for the high peak at the end of 2009, which is
now less pronounced. The parameters governing the TS-OU volatility process have
changed slightly compared to the benchmark models, but the log-OU specification
remains unaffected by the inclusion of separate mean-reversion rates. Testing the
restriction κ = 0.5 gives a p-value of 0.0386, rejecting that an IG-OU specification could
have captured the volatility dynamics equally well.
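The two-factor ACF fit described above can be sketched as follows. The empirical ACF here is a synthetic stand-in generated from the fitted parameters plus noise (the thesis fits the empirical ACF of the deseasonalized UK gas data), so the data and tolerances are illustrative only:

```python
import numpy as np
from scipy.optimize import curve_fit

def two_factor_acf(lag, w1, alpha_x, alpha_y):
    """Theoretical ACF w1*exp(-alpha_x*|t|) + (1 - w1)*exp(-alpha_y*|t|)."""
    return w1 * np.exp(-alpha_x * np.abs(lag)) + (1.0 - w1) * np.exp(-alpha_y * np.abs(lag))

lags = np.arange(1, 101)
# Synthetic stand-in for the empirical ACF of the deseasonalized log-price:
rng = np.random.default_rng(0)
emp_acf = two_factor_acf(lags, 0.7, 0.0050, 0.4141) + rng.normal(0.0, 0.005, lags.size)

# Least-squares fit over the first 100 lags:
(w1, ax, ay), _ = curve_fit(two_factor_acf, lags, emp_acf,
                            p0=(0.5, 0.01, 0.5), bounds=([0, 0, 0], [1, 1, 5]))
print(np.log(2) / ax, np.log(2) / ay)   # implied half-lives in days
```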
In order to check whether our proposed model is internally consistent, we compare the
theoretical mean of the variance process, σ2(t), to the mean of the filtered variance
processes from Figure 3.6. The theoretical mean of the TS-OU volatility process
equals 0.00262, whereas the empirical mean is found to be 0.00224. With the log-OU
specification, the theoretical mean equals 0.00230 and the empirical mean is found
to be 0.00221. Hence, both models appear internally consistent. From the reported
log-likelihood values, we see that the inclusion of separate mean-reversion rates in
the model matters more in the TF-SVJLog model.
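The internal consistency check amounts to comparing the stationary mean implied by the volatility parameters with the time average of the filtered variance path. A minimal sketch with the reported means plugged in directly; `consistency_gap` is a hypothetical helper, not part of the estimation code:

```python
import numpy as np

def consistency_gap(theoretical_mean, filtered_path):
    """Relative gap between the model-implied stationary mean of the variance
    process and the time average of the filtered variance path."""
    empirical_mean = float(np.mean(filtered_path))
    return (empirical_mean - theoretical_mean) / theoretical_mean

# Reported means (the filtered paths themselves are summarized by their mean):
print(round(consistency_gap(0.00262, [0.00224]), 3))   # TS-OU:  -0.145
print(round(consistency_gap(0.00230, [0.00221]), 3))   # log-OU: -0.039
```

A small relative gap, as found for both specifications, supports internal consistency.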
The significance of the different model characteristics and the internal ranking of
the models will be investigated thoroughly in the next subsection.
3.5. ESTIMATION RESULTS 109
[Figure omitted: estimated spike innovations, 2008–2014, panels SF-J, SF-SVJTS and SF-SVJLog.]
Figure 3.4. The figure depicts the estimated spike innovations, computed from the mean of the posterior distribution p(ξ1:T , J1:T |Z1:T ).
[Figure omitted: estimated spike process against the log-price, 2008–2014, panels TF-J, TF-SVJTS and TF-SVJLog.]
Figure 3.5. The figure depicts the estimated spike processes, computed from the mean of the posterior distribution p(Y1:T |Z1:T ).
[Figure omitted: estimated volatility σ2(t), 2008–2014, panels SF-SV, SF-SVJ and TF-SVJ, each showing the log-OU and TS-OU specifications.]
Figure 3.6. The figure depicts the estimated stochastic volatility processes, computed from the mean of the posterior distribution p(σ2(1 : T −1)|Z1:T ).
3.5.2 Model Evaluation
We conduct a comprehensive comparison between the models using Bayes factors
computed from the marginal log-likelihoods in Table 3.3. In Table 3.4 we report twice
the logarithm of the Bayes factor, as it is on the same scale as a likelihood ratio test
statistic. Let LBF(M1, M0) = 2(log p(Z |M1) − log p(Z |M0)). Kass and Raftery (1995)
suggest the following scale for interpretation: if LBF(M1, M0) is between 2 and 6, it is
viewed as positive evidence against model M0; between 6 and 10, it indicates strong
evidence; and a value greater than 10 is interpreted as very strong evidence. Negative
values are interpreted on the same scale, but suggest evidence in favor of M0.
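The comparison and the Kass and Raftery (1995) scale are easy to mechanize. A small sketch, assuming the marginal log-likelihoods are given (e.g. from the particle filter); the function names are illustrative:

```python
def lbf(log_ml_1, log_ml_0):
    """Twice the log Bayes factor of M1 against M0; positive favors M1."""
    return 2.0 * (log_ml_1 - log_ml_0)

def kass_raftery(value):
    """Kass and Raftery (1995) evidence scale for 2*log(Bayes factor)."""
    v = abs(value)
    if v < 2:
        return "not worth more than a bare mention"
    if v < 6:
        return "positive"
    if v < 10:
        return "strong"
    return "very strong"

# e.g. the Table 3.4 entry LBF(TF-J, SF-J) = 3.989:
print(kass_raftery(3.989))   # positive evidence against SF-J
```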
When volatility is assumed to be constant, allowing for different mean-reversion
rates in the two factors improves the overall performance of the model, as seen
from LBF(TF-J, SF-J) = 3.989. However, allowing for stochastic volatility has a much
larger impact on the model fit, as LBF(SF-SV, SF-J) is over 400 for both volatility
specifications.
Next, we look at the effect of having price jumps in the model when stochastic
volatility is accounted for. With the TS-OU volatility specification, including
jumps in the price yields LBF(SF-SVJTS, SF-SVTS) = 2.513, indicating positive evi-
dence against SF-SVTS. The two-factor version of the model further improves the fit, as
LBF(TF-SVJTS, SF-SVJTS) equals 6.159. For the log-OU volatility specification, the
Bayes factor favors SF-SVLog over SF-SVJLog. Although the log-likelihood of SF-SVJLog
evaluated at the posterior mean of the parameters is higher than that of SF-SVLog, the
Bayes factor penalizes models with more parameters and in this case selects the sim-
pler model SF-SVLog. However, when we consider the full model, TF-SVJLog, there is
strong evidence against both SF-SVJLog and SF-SVLog, with LBF(TF-SVJLog, SF-SVLog) = 6.91
and LBF(TF-SVJLog, SF-SVJLog) = 9.69. In summary, simply including jumps in
the model description is only slightly favorable (TS-OU volatility) or unfavorable (log-
OU volatility), whereas specifying a spike process, by imposing αx ≠ αy, does improve
the model fit. This finding is in line with our intuition that jumps decay faster
and that this feature is important for model building.
The Bayes factors also provide a direct comparison between the two types of
volatility specifications. Here LBF(MLog, MTS) ranges from 36.5 to 41.8, with M
denoting SF-SV, SF-SVJ or TF-SVJ. For the UK natural gas data, models with the
log-OU volatility specification are favored over models with TS-OU volatility across
the different jump specifications, with the TF-SVJLog model providing the best fit to
the data.
We further investigate the performance of the two volatility specifications in
different periods of time. Johannes et al. (2009) suggest tracking the likelihood ratio
sequentially to help identify when a model fails. We compute the sequential deviance,
defined as Dt = −2(log p(Z1:t |θ, MTS) − log p(Z1:t |θ, MLog)). We choose θ to be the
posterior mean, since the log-likelihoods evaluated at the mean and at the median are
quite close. As before, we use 300,000 particles when computing the likelihood. The
Table 3.4. Bayes factors.
SF-J SF-SVTS SF-SVLog SF-SVJTS SF-SVJLog TF-J TF-SVJTS
SF-SVTS 468.26
SF-SVLog 510.06 41.80
SF-SVJTS 470.77 2.513 -39.29
SF-SVJLog 507.28 39.02 -2.788 36.50
TF-J 3.989 -464.27 -506.07 -466.78 -503.29
TF-SVJTS 476.93 8.672 -33.13 6.159 -30.35 472.94
TF-SVJLog 516.97 48.71 6.910 46.20 9.697 512.98 40.04
The table reports twice the natural logarithm of the Bayes factors. The entry (i, j) in the matrix compares the model in the i-th row and the model in the j-th column, with a positive value favoring the first model and a negative value favoring the latter.
results are plotted in Figure 3.7. By comparing the three panels in Figure 3.7, we see
that while Dt behaves similarly across the different jump specifications, its dynamics
reveal important differences between the TS-OU and log-OU volatility processes.
Compared with Figure 3.6, we see that Dt is positive during 2007 and the beginning
of 2008, indicating that the log-OU specification provides a better fit than the TS-OU
specification in this low-volatility period. In mid-2008, Dt drops drastically at the
pronounced volatility spikes, and stays negative in the following, relatively volatile
period. After 2010, as the market enters a more tranquil period, Dt starts to increase,
resulting in a positive value for the full sample.
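Given cumulative log-likelihood paths for the two volatility specifications, the sequential deviance is a one-liner; a sketch with toy numbers (hypothetical, not thesis output):

```python
import numpy as np

def sequential_deviance(cum_loglik_ts, cum_loglik_log):
    """D_t = -2*(log p(Z_{1:t}|theta, M_TS) - log p(Z_{1:t}|theta, M_Log));
    positive values favor the log-OU specification up to time t."""
    return -2.0 * (np.asarray(cum_loglik_ts) - np.asarray(cum_loglik_log))

# Toy cumulative log-likelihood paths (hypothetical numbers):
d = sequential_deviance([-10.0, -21.0, -30.5], [-10.5, -20.0, -29.0])
print(d)   # [-1.  2.  3.]
```

The sign of the increments shows which model fits the incoming observations better at each point in time.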
Our results indicate that the TS-OU specification is well suited for volatile periods
where the volatility of volatility is high, while the log-OU specification fits the tranquil
periods better. The differences could arise from two aspects: First, while the log-OU
volatility process is a continuous process, the TS-OU process is purely jump-driven,
allowing it to better explain large price changes. Second, from Figure 3.8, we see that
the autocorrelation function for the log-OU specification decays more slowly, implying
a more persistent volatility and providing a better fit to the empirical autocorrelation
function. This helps the log-OU specification outperform the TS-OU specification
in the last four years of the sample period. The theoretical autocorrelation function
for the TS-OU specification fits the first few lags of the empirical autocorrelation very
well, and its faster decay rate underlines its ability to better capture large
changes in the volatility, as seen in the first period of our data set. Figure 3.8 also
suggests extending the model to a multi-factor specification for the volatility process.
We also conduct LR tests on the nested models. The models SF-J, SF-SVTS, SF-SVJTS
and TF-J are all nested within TF-SVJTS, and we report the LR test statistics and p-values
in Table 3.5. At a 10% significance level, all of the restricted models are rejected.
The same conclusion applies to the log-OU specification at a 5% significance level.
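The LR statistics in Table 3.5 are twice the gap in log-likelihood between the full and the restricted model, with an asymptotic chi-square reference distribution. A sketch; the log-likelihood inputs are hypothetical, chosen so the statistic matches the SF-SVJTS entry, and the degrees of freedom are an assumption, as the table does not state them:

```python
from scipy.stats import chi2

def lr_test(loglik_full, loglik_restricted, df):
    """LR statistic 2*(l_full - l_restricted) and asymptotic chi2 p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2.sf(stat, df)

# Hypothetical log-likelihoods chosen so the statistic equals the SF-SVJTS
# entry of Table 3.5; df=1 assumes a single restriction (alpha_x = alpha_y):
stat, p = lr_test(-1000.0, -1002.155, df=1)
print(round(stat, 2), round(p, 3))   # 4.31 0.038
```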
[Figure omitted: sequential deviance, 2008–2014, panels SF-SV, SF-SVJ and TF-SVJ.]
Figure 3.7. The figure plots the sequential deviance between models with different volatility specifications.
[Figure omitted: theoretical and sample autocorrelation functions (lags 0–50) of the TS-OU and log-OU volatility processes.]
Figure 3.8. The figure displays the theoretical and empirical autocorrelation functions for the two stochastic volatility processes in the TF-SVJ model. The theoretical acf's are computed using the posterior mean of the parameters governing the volatility processes. The empirical acf's are computed in each MCMC iteration and then averaged.
Table 3.5. LR-tests for the full model.

          SF-J      SF-SVTS   SF-SVJTS  TF-J
LRmean    474.072   9.11      4.31      468.802
          (0.000)   (0.058)   (0.038)   (0.000)
LRmedian  477.453   11.239    5.69      472.157
          (0.000)   (0.024)   (0.017)   (0.000)

          SF-J      SF-SVLog  SF-SVJLog TF-J
LRmean    515.896   10.849    10.742    510.625
          (0.000)   (0.028)   (0.001)   (0.000)
LRmedian  516.477   10.520    10.908    511.181
          (0.000)   (0.033)   (0.001)   (0.000)

The table reports the LR-test statistics for comparing the models in the columns to the full TF-SVJ model. The p-values for accepting the null model are reported in parentheses.
3.5.3 Model Validation
We conclude our empirical investigation by checking that our proposed model is
able to reproduce the statistical properties of the data. For purposes of derivative
pricing and risk management, it is very important that the model implied price and
return distributions match the empirical distributions. For each estimated model, we
simulate 5000 artificial data sets of the same length as the observed spot prices. Then,
using these simulated paths, we calculate the empirical model implied distributions
of skewness and kurtosis. In Table 3.6, the 5th, 50th and 95th percentiles of the model
implied distributions for the skewness and kurtosis of the deseasonalized logarithmic
spot price, Z (t), and its returns are reported for each model. The table also reports
the values from the observed deseasonalized UK natural gas spot prices.
Table 3.6 shows that stochastic volatility is crucial for capturing the price skewness
and kurtosis. All the models, except for the SF-J and TF-J models, have distributions
covering the observed negative price skewness and small excess kurtosis of the
deseasonalized UK natural gas spot prices. For both the SF-J and TF-J models, the
probability that the observed sample values of the skewness and kurtosis are real-
izations from the model implied distributions is less than 5%. The best performing
model is the TF-SVJLog model, with the SF-SVJLog model being a close runner-up. In
fact, as we shall see in Figure 3.9, there is almost no difference in the distribution of
skewness and kurtosis implied by these two models.
Table 3.6. Distribution of skewness and kurtosis
prctile SF-J SF-SVTS SF-SVLog SF-SVJTS SF-SVJLog TF-J TF-SVJTS TF-SVJLog
Price Skewness -0.7818
5th -0.628 -0.827 -1.088 -0.813 -1.113 -0.505 -0.793 -1.129
50th -0.011 -0.004 0.005 -0.028 0.006 -0.014 -0.002 -0.026
95th 0.603 0.808 1.117 0.759 1.141 0.491 0.757 1.059
Price Kurtosis 3.8611
5th 2.065 2.106 2.242 2.023 2.271 2.238 2.245 2.296
50th 2.632 2.832 3.174 2.728 3.245 2.778 3.106 3.300
95th 3.651 4.445 5.855 4.201 5.972 3.655 4.908 5.867
Return Skewness -0.3703
5th -0.515 -0.875 -1.425 -1.297 -1.771 -0.622 -2.050 -1.999
50th -0.071 0.024 0.012 -0.377 0.039 -0.203 -0.224 -0.540
95th 0.376 0.850 1.387 0.508 2.411 0.206 1.493 0.798
Return Kurtosis 21.4775
5th 7.947 8.203 8.613 9.082 8.617 7.675 10.777 11.838
50th 9.058 11.523 14.391 12.796 15.502 8.710 71.515 43.733
95th 10.593 20.381 41.516 21.153 68.271 10.145 210.076 140.477
The table reports the skewness and kurtosis of the deseasonalized logarithmic spot price (Z (t )) and its return series. The table also reports the percentiles of the simulated sample skewness and kurtosis for each of the considered models. For each model, the percentiles are computed using 5000 simulated data sets with 1620 observations each. The simulations are performed using the parameter estimates from Table 3.3.
Turning our attention to the return distribution, Table 3.6 shows that the return
skewness can be captured by all the considered models. Once again, we find that
inclusion of stochastic volatility enables the model to produce more skewness. The
transition from a single factor model to a two-factor model also impacts the distri-
bution of return skewness, in contrast to the results found for the price skewness.
The model that captures the return skewness the best, in terms of producing the
highest probability of observing an outcome from the model implied distribution
that is more extreme than the sample skewness of the data, is the SF-SVJTS model. As
for the kurtosis of the return series, it follows from Table 3.6 that stochastic volatility
is essential in order to produce a high enough level of kurtosis. It is also clear that the
volatility specification impacts the distribution of the kurtosis of the returns. For the
single factor models, only the models with a log-OU volatility specification are able to
fully match the level of kurtosis observed in the data. With the TS-OU specification,
the probability of observing an outcome from the distribution of kurtosis that is
more extreme than the sample value from our data is only around 5%. Allowing the
spike process to have its own mean-reversion rate significantly impacts the model's
ability to generate high levels of kurtosis. In the two-factor models, a jump is now
followed by a quick reversion back to the base-signal process, instead of the much
slower reversion found in the single factor models, and this behavior increases the
kurtosis of the return series. The distribution that best captures the observed kurtosis
is the one implied by the SF-SVJLog model. The closest runner-up is the
TF-SVJLog model.
[Figure omitted: model implied distributions of sample skewness (TF-SVJLog: pval = 0.222; SF-SVJLog: pval = 0.222) and sample kurtosis (TF-SVJLog: pval = 0.566; SF-SVJLog: pval = 0.566) for Z(t).]
Figure 3.9. The figure plots the model implied distributions of skewness and kurtosis of the deseasonalized logarithmic spot price Z (t ), along with the sample values.
In Figure 3.9 and Figure 3.10, the model implied distributions of skewness
and kurtosis of the price and return series are plotted for our favoured model, the
TF-SVJLog model, and the closest contender, the SF-SVJLog model. The reported p-
values should be interpreted as the probability of observing an outcome from the
distribution that is more extreme than the sample value. Hence, if the sample value
matches the median of the model implied distribution, we report a p-value of 1.
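One way to formalize these p-values from the simulated distributions is to measure extremeness relative to the median; the exact construction used in the thesis may differ, so treat this as an illustrative sketch:

```python
import numpy as np

def tail_pvalue(simulated, observed):
    """Probability of a draw from the model-implied distribution being at
    least as extreme as the observed value, measured from the median, so a
    sample value at the median gets p = 1."""
    simulated = np.asarray(simulated, dtype=float)
    med = np.median(simulated)
    return float(np.mean(np.abs(simulated - med) >= abs(observed - med)))

print(tail_pvalue([1.0, 2.0, 3.0, 4.0, 5.0], 3.0))   # 1.0 (at the median)
print(tail_pvalue([1.0, 2.0, 3.0, 4.0, 5.0], 5.0))   # 0.4
```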
Following the discussion above, Figures 3.9 and 3.10 highlight the fact that allow-
ing for separate mean-reversion rates matters for the return distribution in terms of
matching skewness and kurtosis. Also, in the TF-SVJLog model, the estimated jump
intensity is about seven times larger than the jump intensity in the SF-SVJLog model,
which contributes to the increased spread of the kurtosis distribution. Both models
are, however, able to capture the statistical properties of the data at hand.
To sum up, we find that inclusion of stochastic volatility in the models is crucial
for matching the price and return distributions. In financial applications, such as risk
management, matching the skewness and kurtosis of the return series is often more
[Figure omitted: model implied distributions of sample skewness (TF-SVJLog: pval = 0.828; SF-SVJLog: pval = 0.545) and sample kurtosis (TF-SVJLog: pval = 0.444; SF-SVJLog: pval = 0.586) for r(t).]
Figure 3.10. The figure plots the model implied distributions of skewness and kurtosis of the returns from the deseasonalized logarithmic spot price Z (t ), along with the sample values.
important than matching the price distribution. In this context, we found that the
volatility specification and the inclusion of a separate mean-reversion rate for the
spike process greatly impact the model's ability to generate high levels of kurtosis.
3.6 Extensions
The economic importance of our findings remains to be tested. One approach could
be to consider how the different model specifications impact the forward prices by
considering the empirical risk premium, RP(t) = Fobserved(t ,T1,T2) − FP(t ,T1,T2),
defined as the difference between the observed market price and the predicted spot
price over the period of delivery [T1,T2]. The latter is computed using the theoretical
forward prices with Q = P, and then averaged over the delivery period. From the
forward prices derived in Benth and Vos (2013b), we suspect that the forward prices
will depend on the filtered factors, X (t ) and Y (t ), as well as the filtered spot volatility
σ2(t). Among other things, this has the implication that even without jumps in the
spot price model the forward price can still jump if the volatility process has jumps.
The study of forward price dynamics might therefore serve as a way of testing the
economic difference between the different volatility specifications. Derivation of
forward prices in our full model, TF-SVJ, is therefore a topic for future research.
It would also be of interest to see how the model performs on data from other
energy markets, in particular the electricity market, where spikes are larger and more
frequent and the overall volatility of the observed spot prices is higher than for
the gas spot prices. For such data it might also be important to consider other jump size
specifications, as most of the spikes are positive. Related to this, extending the model
to have several spike components, i.e. one component for the positive jumps
and one for capturing the negative jumps, would also be of relevance. It
should be noted that there are significant differences between electricity markets,
with e.g. spike occurrences and sizes becoming smaller in the EEX market. Our model
and estimation setup could therefore also be used to investigate these changes, and
the differences across the various electricity markets.
The clustered jumps in the SF-J model suggest that it could be interesting to
incorporate a time-varying jump intensity. For example, we can specify a stochastic
process for the jump intensity, or allow the jump intensity to depend on the spot
volatility or on exogenous variables such as weather. From the plot of the autocorrela-
tion functions for the filtered volatility processes, it also appears that a better model
fit might be obtained by extending the volatility specification to a multi-factor
specification with different decay rates for the entering factors.
Another extension of the model specification could be to incorporate leverage
effects. In Green and Nossman (2008) the authors introduce leverage in their model
by making the driving Brownian motions of the volatility process and base-signal
process correlated. This would of course only be possible with the log-OU volatility
specification. With the TS-OU specification, another approach could be to investigate
the implications of making the volatility and the spike process correlated.
Finally, it would be very interesting to try to adapt our estimation approach to
the multi-dimensional model from Benth and Vos (2013a) and investigate how the
different model characteristics impact the joint modeling of for instance gas and
electricity spot prices.
3.7 Conclusion
We proposed a two-factor geometric model with stochastic volatility and jumps for
the detrended and deseasonalized logarithmic UK natural gas spot price. We then
described how this model could be estimated by using a non-Markovian representa-
tion of the model, with the spike process and stochastic volatility process being latent
variables. In contrast to most estimation approaches found in the literature, the base-
signal and spike component of the model are estimated simultaneously, allowing us
to investigate the interplay between the specification of the two components. The es-
timation method employed uses the particle marginal Metropolis-Hastings sampler
from Andrieu et al. (2010) and has the advantage of being easy to adapt to different
volatility specifications, including pure jump-driven specifications. The estimation
method is very general and made it easy to estimate and compare our proposed
model to other benchmark models in a unified framework. This also made it possible
to answer questions such as: What matters the most, including stochastic volatility or
jumps? Is this conclusion affected by the volatility specification? And is it necessary to
allow for jumps in the volatility process? Our empirical application to logarithmic UK
natural gas spot prices showed that inclusion of stochastic volatility is much more
important than having jumps in the model. As in Green and Nossman (2008), we
found that neglecting to include stochastic volatility results in a severely overes-
timated jump intensity. From the results for the two-factor version of the model with
jumps and constant volatility, we saw that the inclusion of separate mean-reversion
rates did not make the jump intensity drop. The results for the single factor model
with jumps (SF-J), resembling the model from Cartea and Figueroa (2005), also re-
vealed the need for a time-varying jump intensity when stochastic volatility is not
included in the model.
Sticking with a single factor model, but allowing for both stochastic volatility
and jumps showed that the volatility specification can impact the estimated jump
intensity and jump size distribution. The TS-OU specification, where the volatility
process is allowed to jump, actually resulted in a higher jump intensity than with the
log-OU volatility specification. From fitting the full model (TF-SVJ), with separate
mean-reversion rates, jumps and stochastic volatility, we found that having different
mean-reversion rates justifies the inclusion of jumps in the model specification,
even though the spike process only accounts for a small part of the variations in
our data. The model with a log-OU volatility specification outperforms the model
with the TS-OU specification, even when jumps and different mean-reversion rates are
included in the latter model but not in the former. By tracking the likelihood ratio
sequentially, it became clear that the TS-OU volatility specification is well suited
for volatile periods where the volatility process is changing a lot, while the log-OU
specification outperforms the TS-OU specification during the more tranquil periods
where the volatility of volatility is low.
The estimation method based on particle MCMC is very general and it would
be interesting to extend our proposed model by incorporating a time-varying jump
intensity, to see if this changes the split between how much of the variations in the
data is captured by the base-signal and how much by the spike process.
Another path for future research is to investigate whether stochastic volatility also matters
the most when other energy spot prices, such as electricity prices, are considered.
In electricity prices, spikes are larger and more frequent so the inclusion of a time-
varying jump intensity might be more relevant in this setting. Furthermore, adapting
the estimation framework to models with several spike components, like one process
for modeling the negative spikes and one for the positive spikes, as well as considering
other jump size distributions would be interesting topics for future research. The ACFs
of our filtered volatility processes also indicate the need for multi-factor volatility
processes, something our estimation approach can easily be adapted to and an
obvious path for future research. As already noted, the spike dynamics in some
electricity markets are changing, with spike occurrences and sizes dropping. It might
therefore be interesting to use our proposed model and estimation technique to
further investigate the differences across various electricity markets. The PMCMC
method also has potential in applications to multi-dimensional cross-commodity
models, like the one considered in Benth and Vos (2013a), and it would be very
interesting to adapt the approach to this setup.
3.8 Appendix
3.8.1 Model Diagnostics
We present the model diagnostic plots for TF-SVJTS and TF-SVJLog in Figure 3.11 and
3.12. The plots for other models are similar and omitted to save space. The left panels
are trace plots of parameter draws against iterations, while the right panels report
the prior and empirical posterior distributions (histogram) of the parameters. We
first look at the trace plots. For the TF-SVJTS model, we ran four chains in parallel from
different starting points for speed considerations, and the vertical lines indicate
when a new chain is started. For TF-SVJLog, one long chain is used. Visual inspection
indicates convergence, although the mixing of some parameters is less satisfactory,
for example for σ j , the standard deviation of jump sizes. This is likely due to the very
low jump intensity in these two models, and hence the algorithms have a hard time
estimating the jump sizes.
From the posterior-prior plots, we see that for most of the parameters, prior
information is negligible compared with the posterior. To assist visual inspection, we
use different scales for the densities: ticks on the left axis give the density of the posterior,
while ticks on the right axis give the density of the prior. For example, for
αx in the TF-SVJTS model in Figure 3.11, the prior density at αx = 0.0087 is around 1, while the
posterior density is about 180. For αx in the TF-SVJLog model, the prior density at
αx = 0.0077 is around 1, while the posterior density is about 250. One exception is
σj , for which we choose a prior that places lower probability on jumps being small.
The posterior-prior plots in both models indicate that the prior for σj is informative
about the posterior.
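The visual convergence check across the four parallel chains can be complemented by a numerical diagnostic; the thesis relies on visual inspection, so the Gelman-Rubin statistic below is a standard add-on rather than part of the reported procedure:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter across m chains of
    n draws each (shape (m, n)); values near 1 indicate convergence."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    b = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * w + b / n
    return float(np.sqrt(var_hat / w))

rng = np.random.default_rng(1)
well_mixed = rng.normal(size=(4, 15000))      # four well-mixed chains
print(round(gelman_rubin(well_mixed), 3))     # approx 1.0
```

Values well above 1, by contrast, would flag that the chains have not converged to a common distribution.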
[Figure omitted: trace plots and posterior/prior densities for κ, δ, γ, λ, αx , αy , µj , σj and λj .]
Figure 3.11. Diagnostic plots for the TF-SVJ model with TS-OU volatility. Left panels are trace plots of parameters. In the right panels, blue lines are posterior densities, while green lines indicate priors.
[Figure omitted: trace plots and posterior/prior densities for αh , µh , σ2h , αx , αy , µj , σj and λj .]
Figure 3.12. Diagnostic plots for the TF-SVJ model with log-OU volatility. Left panels are trace plots of parameters. In the right panels, blue lines are posterior densities, while green lines indicate priors.
DEPARTMENT OF ECONOMICS AND BUSINESS AARHUS UNIVERSITY
SCHOOL OF BUSINESS AND SOCIAL SCIENCES www.econ.au.dk
PhD Theses since 1 July 2011

2011-4 Anders Bredahl Kock: Forecasting and Oracle Efficient Econometrics
2011-5 Christian Bach: The Game of Risk
2011-6 Stefan Holst Bache: Quantile Regression: Three Econometric Studies
2011:12 Bisheng Du: Essays on Advance Demand Information, Prioritization and Real Options in Inventory Management
2011:13 Christian Gormsen Schmidt: Exploring the Barriers to Globalization
2011:16 Dewi Fitriasari: Analyses of Social and Environmental Reporting as a Practice of Accountability to Stakeholders
2011:22 Sanne Hiller: Essays on International Trade and Migration: Firm Behavior, Networks and Barriers to Trade
2012-1 Johannes Tang Kristensen: From Determinants of Low Birthweight to Factor-Based Macroeconomic Forecasting
2012-2 Karina Hjortshøj Kjeldsen: Routing and Scheduling in Liner Shipping
2012-3 Soheil Abginehchi: Essays on Inventory Control in Presence of Multiple Sourcing
2012-4 Zhenjiang Qin: Essays on Heterogeneous Beliefs, Public Information, and Asset Pricing
2012-5 Lasse Frisgaard Gunnersen: Income Redistribution Policies
2012-6 Miriam Wüst: Essays on early investments in child health
2012-7 Yukai Yang: Modelling Nonlinear Vector Economic Time Series
2012-8 Lene Kjærsgaard: Empirical Essays of Active Labor Market Policy on Employment
2012-9 Henrik Nørholm: Structured Retail Products and Return Predictability
2012-10 Signe Frederiksen: Empirical Essays on Placements in Outside Home Care
2012-11 Mateusz P. Dziubinski: Essays on Financial Econometrics and Derivatives Pricing
2012-12 Jens Riis Andersen: Option Games under Incomplete Information
2012-13 Margit Malmmose: The Role of Management Accounting in New Public Management Reforms: Implications in a Socio-Political Health Care Context
2012-14 Laurent Callot: Large Panels and High-dimensional VAR
2012-15 Christian Rix-Nielsen: Strategic Investment
2013-1 Kenneth Lykke Sørensen: Essays on Wage Determination
2013-2 Tue Rauff Lind Christensen: Network Design Problems with Piecewise Linear Cost Functions
2013-3 Dominyka Sakalauskaite: A Challenge for Experts: Auditors, Forensic Specialists and the Detection of Fraud
2013-4 Rune Bysted: Essays on Innovative Work Behavior
2013-5 Mikkel Nørlem Hermansen: Longer Human Lifespan and the Retirement Decision
2013-6 Jannie H.G. Kristoffersen: Empirical Essays on Economics of Education
2013-7 Mark Strøm Kristoffersen: Essays on Economic Policies over the Business Cycle
2013-8 Philipp Meinen: Essays on Firms in International Trade
2013-9 Cédric Gorinas: Essays on Marginalization and Integration of Immigrants and Young Criminals – A Labour Economics Perspective
2013-10 Ina Charlotte Jäkel: Product Quality, Trade Policy, and Voter Preferences: Essays on International Trade
2013-11 Anna Gerstrøm: World Disruption - How Bankers Reconstruct the Financial Crisis: Essays on Interpretation
2013-12 Paola Andrea Barrientos Quiroga: Essays on Development Economics
2013-13 Peter Bodnar: Essays on Warehouse Operations
2013-14 Rune Vammen Lesner: Essays on Determinants of Inequality
2013-15 Peter Arendorf Bache: Firms and International Trade
2013-16 Anders Laugesen: On Complementarities, Heterogeneous Firms, and International Trade
2013-17 Anders Bruun Jonassen: Regression Discontinuity Analyses of the Disincentive Effects of Increasing Social Assistance
2014-1 David Sloth Pedersen: A Journey into the Dark Arts of Quantitative Finance
2014-2 Martin Schultz-Nielsen: Optimal Corporate Investments and Capital Structure
2014-3 Lukas Bach: Routing and Scheduling Problems - Optimization using Exact and Heuristic Methods
2014-4 Tanja Groth: Regulatory impacts in relation to a renewable fuel CHP technology: A financial and socioeconomic analysis
2014-5 Niels Strange Hansen: Forecasting Based on Unobserved Variables
2014-6 Ritwik Banerjee: Economics of Misbehavior
2014-7 Christina Annette Gravert: Giving and Taking – Essays in Experimental Economics
2014-8 Astrid Hanghøj: Papers in purchasing and supply management: A capability-based perspective
2014-9 Nima Nonejad: Essays in Applied Bayesian Particle and Markov Chain Monte Carlo Techniques in Time Series Econometrics
2014-10 Tine L. Mundbjerg Eriksen: Essays on Bullying: an Economist’s Perspective
2014-11 Sashka Dimova: Essays on Job Search Assistance
2014-12 Rasmus Tangsgaard Varneskov: Econometric Analysis of Volatility in Financial Additive Noise Models
2015-1 Anne Floor Brix: Estimation of Continuous Time Models Driven by Lévy Processes
ISBN: 9788793195097