2015-1
Anne Floor Brix
PhD Thesis
DEPARTMENT OF ECONOMICS AND BUSINESS
AARHUS UNIVERSITY DENMARK
Estimation of Continuous Time Models Driven
by Lévy Processes
By Anne Floor Brix
A PhD thesis submitted to
School of Business and Social Sciences, Aarhus University,
in partial fulfilment of the requirements of
the PhD degree in
Economics and Business
August 2014
CREATES - Center for Research in Econometric Analysis of Time Series
PREFACE
This dissertation is the result of my Ph.D. studies at the Department of Economics
and Business at Aarhus University and was written in the period from October 2010
to August 2014. I am grateful to the Department of Economics and Business as well
as the Center for Research in Econometric Analysis of Time Series (CREATES) funded
by the Danish National Research Foundation for providing an excellent research envi-
ronment and for allowing me to attend numerous inspiring courses and conferences
both in Denmark and abroad. I would also like to thank the department for giving me
the opportunity to teach the course Mathematics - Dynamic Analysis; it was a very
exciting and fruitful experience for me.
I wish to thank my main advisor Asger Lunde for his guidance, many helpful
comments and for encouraging me to apply for a Ph.D. position. I had the pleasure of
working with Asger during my studies, and two of the chapters in this dissertation
are the outcome of our collaboration. I also wish to thank Wei Wei for working with
me and for introducing me to Bayesian techniques. It has been a great experience
working with her and Asger on the last chapter of the dissertation. My co-advisors
Claus Munk and Elisa Nicolato also deserve thanks.
From March 2013 to July 2013 I had the pleasure of visiting Torben G. Andersen
at Kellogg School of Management, Northwestern University. I would like to thank
Torben for inviting me and to thank Kellogg School of Management for their hos-
pitality. I would also like to thank Almut Veraart for letting me visit her at Imperial
College London during December 2012 and 2013. She has been extremely helpful and
encouraging during my studies. Jan Pedersen also deserves my gratitude for many
interesting discussions, helpful comments and guidance in general. I also owe him a
big thank-you for making it possible for me to change my line of studies from mathematics
to econometrics and finance, back when I was a bachelor student at the Department
of Mathematics. I would also like to thank Michael Sørensen for inspiring discussions
and for inviting me to visit him at University of Copenhagen.
At Aarhus University I would like to thank the faculty and my fellow PhD students
for providing a friendly and inspiring work environment. I owe my old office mates
Lasse and Mikkel H a big thank-you for saving me from the basement and inviting me
into their office. Special thanks go to my new office mates Mikkel B and Strange for
all their help and numerous hours of fun discussions about math, statistics, gossip
and Game of Thrones. I would also like to thank Jannie, Tine and in particular Sanni
for their support and for always being up for coffee and a chat when I needed it. All
my friends at CREATES, especially, Anders Kock, Laurent, Juan Carlos, Manuel, Jonas,
Rasmus, Mikkel B and Strange deserve huge thanks for making all the lunch breaks,
Friday bars and other social activities fun and unforgettable. Johannes and Strange
also deserve my gratitude for providing friendly and patient LaTeX support.
I am thankful for the never-ending encouragement and support from my parents,
family and friends. Thank you for helping me take my mind off the dissertation from
time to time.
Finally and most importantly, I would like to thank my husband Rasmus for all his
love and encouragement. Thank you for always believing in me; I truly would never
have made it without you.
Anne Floor Brix
Aarhus, July 2014
UPDATED PREFACE
The pre-defence meeting was held on September 5, 2014, in Aarhus. I would like
to thank the assessment committee, consisting of Fred Espen Benth, University of
Oslo, Almut Veraart, Imperial College London, and Niels Haldrup, Aarhus University,
for their careful reading of the dissertation and their constructive comments and
suggestions. I am very grateful for the input from our discussion, and many of the
suggestions have been incorporated in the present version of the dissertation.
Anne Floor Brix
Aarhus, December 2014
CONTENTS

Summary  vii
Danish summary  xiii

1 PBEFs for Stochastic Volatility Models with Noisy Data  1
    1.1 Introduction  2
    1.2 Estimating Stochastic Volatility Models  4
    1.3 A Monte Carlo Study of the Finite Sample Performances  16
    1.4 Empirical Application  26
    1.5 Conclusion and Final Remarks  31
    1.6 Appendix  34
    1.7 References  44

2 On Estimation Methods for non-Gaussian OU Processes  47
    2.1 Introduction  48
    2.2 Non-Gaussian Ornstein-Uhlenbeck Processes  50
    2.3 Estimation Methods  52
    2.4 Monte Carlo Study  60
    2.5 Extensions  75
    2.6 Conclusion  77
    2.7 Appendix  79
    2.8 References  83

3 A Generalized Schwartz Model for Energy Spot Prices  85
    3.1 Introduction  86
    3.2 Model Descriptions  89
    3.3 Data Description and Initial Analysis  94
    3.4 Estimation Method  96
    3.5 Estimation Results  104
    3.6 Extensions  119
    3.7 Conclusion  120
    3.8 Appendix  123
    3.9 References  126
SUMMARY
The dissertation comprises three self-contained chapters on different methods for
estimating continuous time models driven by Lévy processes. The first two chapters
focus on the derivation and finite sample performances of estimators based on
estimating functions. In Chapter 1, prediction-based estimating functions are used
to estimate a model with diffusion-type stochastic volatility. In Chapter 2, the purely
jump-driven non-Gaussian Ornstein-Uhlenbeck (OU) processes are estimated using
martingale estimating functions. In both chapters, a Monte Carlo simulation study is
carried out, and the performance of the estimators is compared to that of other
competing estimators. In Chapter 3, the features of the two models from the
previous chapters are combined and a two-factor OU-based model with stochastic
volatility and jumps is proposed as a model for commodity spot prices. Due to the
complexity of the model under consideration, the methods based on estimating
functions are disregarded, and instead the Bayesian technique of Markov Chain
Monte Carlo (MCMC) with particle filters is employed.
The first chapter investigates the finite sample performances of estimators based
on prediction-based estimating functions (PBEFs). More specifically, we consider
estimation of the widely popular Heston model. Ever since the introduction of PBEFs
in Sørensen (2000), many papers have suggested using PBEFs to estimate various
models, including stochastic volatility models. However, the actual implementation
and the performance of this estimation approach was still to be investigated and
served as the starting point for our endeavors in Chapter 1. PBEFs are a generalization
of martingale estimating functions (MGEFs), and are particularly useful when consid-
ering non-Markovian models and other models where conditional moments become
analytically intractable. PBEFs are instead based on unconditional moments and aim
at approximating the unknown score function, which is itself an MGEF, by suitably choosing
the functions of the data to predict and the corresponding prediction spaces, used
for constructing the PBEFs. There are no guidelines on how to make these choices,
and the choices might impact the loss in efficiency from not using the likelihood
function to conduct inference. Inspired by the stylized fact of volatility clustering, we
consider predicting squared five-minute returns using the most recent observations
of squared returns. As a method of reference, used for evaluating the finite sample
performance of the PBEF-based estimator, we consider the GMM estimator from
Bollerslev and Zhou (2002). The GMM estimator is based on conditional moments of
the latent daily integrated volatility and is constructed using realized variance as a
proxy. Our Monte Carlo study investigates the possible informational gain from using
the intra-daily returns directly in the PBEF-based estimator versus aggregating them
into the realized measures used in the GMM estimator.
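The realized-variance proxy underlying the GMM estimator is straightforward to compute from the intra-daily returns. The following is a minimal sketch of that aggregation step; the function name and the simulated price path are illustrative, not the chapter's code:

```python
import numpy as np

def realized_variance(prices):
    """Daily realized variance: the sum of squared intraday log returns."""
    log_returns = np.diff(np.log(prices))
    return np.sum(log_returns ** 2)

# Illustrative day of 79 five-minute prices (6.5 trading hours).
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.001, 79)))
rv = realized_variance(prices)
```

Compressing the 78 five-minute returns into one daily number like this is precisely where information may be lost relative to an estimator that uses the squared returns directly.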
Besides investigating the potential of PBEFs in the context of stochastic volatility
models, we also extend both estimators to account for the presence of market mi-
crostructure noise in the observations, thereby making the estimators more suitable
for practical applications. In both settings, with and without noise, our Monte Carlo
study reveals great promise for the PBEF-based estimator. Finally, we conduct an
empirical study, where the Heston model is fitted to SPDR S&P 500 (SPY) data. The
study shows how the flexibility of the PBEF-based estimator can be utilized to check
for model misspecification.
The focus of the second chapter is on estimation of non-Gaussian OU processes
driven by Lévy subordinators. These processes were made popular by Barndorff-
Nielsen and Shephard (2001) in the context of stochastic volatility models and used
by Benth, Kallsen, and Meyer-Brandis (2007), among others, to model commodity
spot prices. In most multi-factor models found in the literature on commodity spot
price modeling, the non-Gaussian OU factors are split using various filtering tech-
niques in a first step and then estimated separately in a subsequent step. Chapter 2
investigates the performance of different estimators used for estimating one-factor
non-Gaussian OU processes. The chapter therefore aims at answering the question
of which estimation procedure to invoke after splitting the factors in a multi-factor
model. In contrast to Chapter 1, the models are now Markovian. The transition den-
sity of the observations is however still not tractable, making maximum likelihood
estimation infeasible. We consider two ways of circumventing this problem. The first
approach, put forward in Valdivieso, Schoutens, and Tuerlinckx (2009), is to consider
approximating the likelihood function using the fast Fourier transform (FFT) and
the analytical expression for the characteristic function of the Lévy functionals. As a
second approach, we derive the optimal quadratic martingale estimating function
(MGEF). The finite sample performances of the two estimators are then investigated
in a Monte Carlo study. Both finite and infinite activity OU processes are considered
and two different parameter settings, resembling respectively the base-signal and
spike part of the commodity spot price, are investigated. The performances of the
two estimation methods are also compared to that of straightforward methods, such
as quasi maximum likelihood and simple moment matching.
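The FFT-based approach rests on recovering a density from a known characteristic function by Fourier inversion. The sketch below shows that generic building block; it is not the implementation of Valdivieso et al. (2009), and the grid parameters N and du are illustrative tuning choices of exactly the kind the method is sensitive to:

```python
import numpy as np

def density_from_cf(phi, N=4096, du=0.05):
    """Recover a density on a grid from its characteristic function phi.

    Uses f(x) = (1/pi) * Re[ integral_0^inf exp(-i*u*x) * phi(u) du ],
    valid for real-valued densities since phi(-u) = conj(phi(u)),
    and evaluates the discretized integral with a single FFT.
    """
    dx = 2 * np.pi / (N * du)      # FFT grid constraint: du * dx = 2*pi / N
    b = N * dx / 2                 # spatial grid covers [-b, b)
    u = np.arange(N) * du
    x = -b + np.arange(N) * dx
    w = np.ones(N)
    w[0] = 0.5                     # trapezoid weight at the u = 0 endpoint
    integrand = np.exp(1j * u * b) * phi(u) * w * du
    f = np.real(np.fft.fft(integrand)) / np.pi
    return x, f

# Sanity check against the standard normal: phi(u) = exp(-u**2 / 2).
x, f = density_from_cf(lambda u: np.exp(-u ** 2 / 2))
```

The phase shift exp(i*u*b) aligns the FFT output with the spatial grid; for the Gaussian check, the recovered value at x = 0 should be close to 1/sqrt(2*pi).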
The performance of the FFT MLE method from Valdivieso et al. (2009) was supe-
rior in the ideal setup with simulated data and no model misspecification. However,
leaving this setup, the MGEF-based estimation method seems like a numerically more
robust approach, especially given the trouble the FFT MLE method has with
handling high-frequency data and its sensitivity to the nuisance parameters
used for fine-tuning the FFT algorithm. Furthermore, using the MGEF-based method,
the mean-reversion parameter as well as the parameters governing the marginal
distribution can also be estimated simultaneously for finite activity processes. The
MGEF-based method also has the advantage of being able to handle a setup with
noise in the observations. This extension and others are discussed at the end of the
chapter.
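For intuition, a finite-activity non-Gaussian OU process of the kind considered in the chapter can be simulated exactly on a grid, since between jumps the process simply decays exponentially. Below is a hedged sketch with a compound Poisson subordinator and exponential jump sizes; the parameter names and values are illustrative, not the chapter's simulation setup:

```python
import numpy as np

def simulate_cp_ou(lam, c, alpha, x0, dt, n_steps, rng):
    """Exact grid simulation of dX_t = -lam * X_t dt + dL_t, where L is a
    compound Poisson subordinator with jump rate c and Exp(alpha) jump sizes.

    Between jumps X decays exponentially, so each step is exact:
    X_{t+dt} = exp(-lam*dt) * X_t + sum_i exp(-lam*(dt - tau_i)) * J_i.
    """
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        n_jumps = rng.poisson(c * dt)
        jump_times = rng.uniform(0.0, dt, n_jumps)       # jump times within the step
        jump_sizes = rng.exponential(1.0 / alpha, n_jumps)
        decayed_jumps = np.sum(np.exp(-lam * (dt - jump_times)) * jump_sizes)
        x[k + 1] = np.exp(-lam * dt) * x[k] + decayed_jumps
    return x

rng = np.random.default_rng(1)
path = simulate_cp_ou(lam=0.5, c=1.0, alpha=2.0, x0=1.0, dt=0.1,
                      n_steps=1000, rng=rng)
```

With these parameters the stationary mean is c/(lam*alpha) = 1, which a long simulated path should roughly reproduce.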
The third and last chapter of the dissertation combines the building blocks from
Chapters 1 and 2 and proposes a two-factor OU-based commodity spot price model
with stochastic volatility and spikes. The model and benchmark models are fitted to
UK natural gas spot prices using Bayesian estimation techniques. We use a Markov
Chain Monte Carlo method with a particle filter (PMCMC), introduced in Andrieu,
Doucet, and Holenstein (2010), in order to avoid the aforementioned splitting of
the OU factors, often encountered in the literature, and are able to estimate the
complex model in one step. Besides adapting the PMCMC approach to our proposed
model, the chapter contributes by investigating the interplay between the inclusion
of stochastic volatility in the model and having a separate factor to account for the
spikes. Furthermore, our PMCMC method enables us to consider both a continuous
and purely jump-driven specification of the volatility process and thereby assess
whether the volatility specifications also influence the dynamics of the spike process.
Another advantage of the PMCMC method is that one of the outputs from the particle
filter is an estimate of the likelihood function, making model comparison easy.
It turns out that the inclusion of stochastic volatility in the process used for
modeling the base-signal has a strong impact on the jump intensity and jump size
distribution in the spike process. Our study reveals that, for the UK natural gas data,
having stochastic volatility in the model is much more important than allowing for
jumps, and that neglecting to include stochastic volatility will drive up the jump
intensity severely. We also find that, even though the jumps only account for a small
part of the variations in the data, having separate mean-reversion rates for the two
OU-factors justifies the inclusion of jumps in the model. By tracking the likelihood
ratio sequentially, it becomes clear that the jump-driven volatility specification is well
suited for modeling the volatile periods with large changes in the volatility, whereas
the continuous specification is better at handling the more tranquil periods.
The Bayesian estimation approach employed in the last chapter is very different
from the classical estimation methods investigated in the other chapters. In classical
statistics, the parameters of interest, θ, are considered fixed but unknown, and the
data, y, is treated as a random realization from a hypothetical infinitely large data set.
The object of interest is the likelihood p(y|θ) (or its derivative, the score), from which
parameter estimates are obtained. In the Bayesian framework, θ is assumed to have a
distribution, and the target is now the posterior p(θ|y) ∝ p(y|θ)p(θ), with the prior, p(θ),
allowing prior beliefs about θ to be included in the estimation method.
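The Bayesian target can be made concrete with a toy random-walk Metropolis sampler, which needs the log-posterior only up to an additive constant. This is purely an illustration of sampling from p(θ|y) ∝ p(y|θ)p(θ), not the PMCMC machinery of Chapter 3; the toy model and all names are ours:

```python
import numpy as np

def metropolis(log_post, theta0, n_iter, step, rng):
    """Random-walk Metropolis: samples from p(theta|y) proportional to
    p(y|theta)p(theta), using the log-posterior up to an additive constant."""
    draws = np.empty(n_iter)
    theta, lp = theta0, log_post(theta0)
    for i in range(n_iter):
        proposal = theta + step * rng.normal()
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:    # accept/reject step
            theta, lp = proposal, lp_prop
        draws[i] = theta
    return draws

# Toy model: y_i ~ N(theta, 1) with prior theta ~ N(0, 10^2).
rng = np.random.default_rng(2)
y = rng.normal(1.5, 1.0, 100)
log_post = lambda th: -0.5 * np.sum((y - th) ** 2) - 0.5 * th ** 2 / 100.0
draws = metropolis(log_post, theta0=0.0, n_iter=5000, step=0.3, rng=rng)
```

After a burn-in, the mean of the draws should sit close to the sample mean of y, since the prior here is nearly flat relative to the likelihood.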
The methods from Chapter 1 and Chapter 2, which are based on estimating
functions, have the advantage of being fairly straightforward to implement, as they
only rely on unconditional and conditional moments, respectively. Furthermore, the
methods are simulation free and are fast to execute. The relative simplicity of the
PBEF-based estimation approach, along with the promising results found in the first
chapter, suggest that the method could be used for estimating more complex Lévy-
driven models, such as the multi-factor model considered in Chapter 3. However, the
estimation method still remains to be tested on non-Gaussian Lévy based models,
like the superposition of non-Gaussian OU processes, as was suggested in Benth
et al. (2007). Since the PBEF-based approach relies only on unconditional moments,
which might have to be simulated in complex models, and due to the unpredictable
nature of the jumps in Lévy processes, the method might have a hard time producing
reliable results for multi-factor models with a high-dimensional parameter vector.
One solution to this potential problem could be, as previously discussed, to disen-
tangle the factors of the model in a first step and then in a second step apply the
relevant estimation methods from Chapter 1 and 2 on each factor. Of course, the
major drawback of this approach would be the loss of coherency, as the obtained
results might depend on the filtering technique used for splitting the factors and the
interplay between the factors is lost, since the estimation of each factor is carried out
separately. This drawback can be overcome by using Bayesian techniques, such as
the PMCMC method from Chapter 3. The method also has the advantage that model
comparison and the computation of statistics, like standard errors, come almost for free,
since the method estimates the entire distribution of θ, and the likelihood function
follows as a by-product of the particle filter. The underlying states of the model, like
the stochastic volatility process, can also be obtained from the particle filter. This is a
clear advantage of the Bayesian approach, since the pricing of financial derivatives, such
as forward contracts, often depends on the filtered states of the spot price model.
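A minimal bootstrap particle filter makes both by-products visible: the running log-likelihood estimate and the filtered state means. The sketch below is generic and runs on a toy linear state-space model; it is not the filter used in Chapter 3, and all names are ours:

```python
import numpy as np

def bootstrap_pf(y, propagate, log_obs_dens, init, n_particles, rng):
    """Bootstrap particle filter. Returns a log-likelihood estimate (up to
    constants dropped in log_obs_dens) and the filtered state means: the
    two by-products that PMCMC exploits."""
    x = init(n_particles, rng)
    loglik, means = 0.0, []
    for yt in y:
        x = propagate(x, rng)                   # draw from the state transition
        logw = log_obs_dens(yt, x)              # weight by the observation density
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())          # log p(y_t | y_{1:t-1}) estimate
        w /= w.sum()
        means.append(np.dot(w, x))
        x = x[rng.choice(n_particles, n_particles, p=w)]   # multinomial resampling
    return loglik, np.array(means)

# Toy linear state space: x_t = 0.9 x_{t-1} + N(0,1), y_t = x_t + N(0,1).
rng = np.random.default_rng(3)
x_true = np.zeros(50)
for t in range(1, 50):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(size=50)
ll, x_filtered = bootstrap_pf(
    y,
    propagate=lambda x, r: 0.9 * x + r.normal(size=x.size),
    log_obs_dens=lambda yt, x: -0.5 * (yt - x) ** 2,
    init=lambda n, r: r.normal(size=n),
    n_particles=1000,
    rng=rng,
)
```

Running the filter once at fixed parameters is exactly the cheap post-estimation step described above: it delivers updated filtered states and a fresh likelihood estimate in a single pass through the data.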
One disadvantage of the Bayesian approach is its lack of simplicity, in the sense
that the approach might not be easy to explain to practitioners. Furthermore, many
choices have to be made when implementing the method, such as how many particles
to use in the filter and which priors to use. Time issues also arise if one wishes to
re-estimate the model on a daily basis, as the most general model from Chapter 3
takes several days to estimate. However, once the model parameters have been
estimated, updating the filtered state variables as new data arrives is just a question
of running the particle filter once, a task which takes less than a minute.
When estimating complex Lévy-driven models, Bayesian methods are, in my
opinion, to be preferred and will in fact often be the only feasible approach. In the
context of commodity spot price modeling, topics for future research could include
estimation of advanced multi-dimensional cross-commodity models, like the one
recently proposed by Benth and Vos (2013), or the models found in Barndorff-Nielsen,
Benth, and Veraart (2013) which are driven by Lévy semistationary processes.
References
Andrieu, C., Doucet, A., Holenstein, R., 2010. Particle Markov chain Monte Carlo
methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology)
72 (3), 269–342.
Barndorff-Nielsen, O. E., Benth, F. E., Veraart, A. E. D., 2013. Modelling energy spot
prices by volatility modulated Lévy-driven Volterra processes. Bernoulli 19 (3),
803–845.
Barndorff-Nielsen, O. E., Shephard, N., 2001. Non-Gaussian OU-based models and
some of their uses in financial economics (with discussion). Journal of the Royal
Statistical Society B 63, 167–241.
Benth, F. E., Kallsen, J., Meyer-Brandis, T., 2007. A non-Gaussian Ornstein-Uhlenbeck
process for electricity spot price modeling and derivatives pricing. Applied Mathematical
Finance 14 (2), 153–169.
Benth, F. E., Vos, L., 2013. Cross-commodity spot price modeling with stochastic
volatility and leverage for energy markets. Advances in Applied Probability 45,
545–571.
Bollerslev, T., Zhou, H., 2002. Estimating stochastic volatility diffusions using condi-
tional moments of integrated volatility. Journal of Econometrics 109, 33–65.
Sørensen, M., 2000. Prediction-based estimating functions. Econometrics Journal 3,
123–147.
Valdivieso, L., Schoutens, W., Tuerlinckx, F., 2009. Maximum likelihood estimation
in processes of Ornstein-Uhlenbeck type. Statistical Inference for Stochastic Pro-
cesses 12, 1–19.
DANISH SUMMARY
Afhandlingen indeholder tre selvstændige kapitler der alle omhandler estimation af
kontinuert-tids modeller, hvor stokastikken er drevet af Lévy processer. De to første
kapitler fokuserer på udledningen af estimatorer, som bygger på estimationsfunktio-
ner, samt på undersøgelsen af estimatorernes egenskaber, når man betragter endelige
stikprøver. I kapitel 1 bruges prædiktionsbaserede estimationsfunktioner til at
estimere en diffusionsbaseret stokastisk volatilitetsmodel. I kapitel 2 betragtes i stedet
springdrevne ikke-Gaussiske Ornstein-Uhlenbeck (OU) processer, og disse estimeres
vha. martingal estimationsfunktioner. I begge kapitler sammenlignes estimatorernes
performance med andre konkurrerende estimatorer i et Monte Carlo simulations
studie. I kapitel 3 kombineres karakteristika for modellerne fra de to foregående kapit-
ler, til en to-faktor OU-baseret model med stokastisk volatilitet og spring. Modellen
bruges i kapitel 3 til at modellere råvarepriser. På grund af modellens kompleksitet be-
nyttes den Bayesianske estimationsmetode Markov Chain Monte Carlo (MCMC) med
et partikelfilter, i stedet for estimationsmetoderne baseret på estimationsfunktioner.
I det første kapitel undersøges brugen af prædiktionsbaserede estimationsfunktio-
ner (PBEF) til at estimere den populære Heston model. Siden PBEF blev introduceret
i Sørensen (2000), har mange artikler foreslået brugen af PBEF, bl.a. til at
estimere stokastiske volatilitetsmodeller. Dog er implementeringen og kvaliteten af den
PBEF-baserede estimationsmetode endnu ikke undersøgt, og dette er derfor ud-
gangspunktet for vores analyse i kapitel 1. PBEF er en generalisering af martingal
estimationsfunktioner (MGEF), som er specielt brugbar når man betragter modeller
der ikke er Markovianske, eller andre modeller, hvor betingede middelværdier er svæ-
re at beregne analytisk. Byggestenene i PBEF er i stedet ubetingede middelværdier,
og man forsøger at approksimere den ukendte scorefunktion, der selv er en MGEF,
ved nøje at vælge hvilken funktion af data der prædikteres og hvilke prædiktorer der
skal bruges til at udforme PBEF’en. Der er ingen retningslinjer for, hvordan disse
valg foretages, og valgene kan have indflydelse på tabet af efficiens, ved ikke at bruge
maximum likelihood estimatoren. Inspireret af det velkendte fænomen volatility
clustering, vælger vi at prædiktere kvadrerede afkast vha. de seneste observerede
kvadrerede afkast. Den resulterende PBEF implementeres derefter og bruges til at
estimere Heston modellen i et Monte Carlo studie. Som referencemetode betragter vi
også GMM estimatoren fra Bollerslev and Zhou (2002). GMM estimatoren er baseret
på betingede middelværdier af funktioner af den latente integrerede daglige volati-
litet (IV), og konstrueres ved at benytte realized variance som et proxy for IV. Vores
Monte Carlo studie undersøger den mulige informationsfordel der kunne ligge i at
bruge intra-dags afkast direkte, som i PBEF estimatoren, i stedet for at aggregere dem
til daglige mål og bruge disse til at konstruere estimatorer.
Udover at undersøge den PBEF-baserede metodes potentiale indenfor estimation
af stokastiske volatilitetsmodeller, udvider vi også begge estimationsmetoder til at
kunne håndtere market microstructure støj. Både i scenariet med og uden støj i data
viser vores Monte Carlo studie, at den PBEF-baserede metode har et stort potentiale,
især når man kun har et kort datasæt til rådighed. Til sidst i kapitlet udfører vi et
empirisk studie, hvor vi fitter Heston modellen til SPDR S&P 500 (SPY) data vha. af
de to forskellige estimationsmetoder. Fra anvendelsen af markedsdata ses, hvordan
fleksibiliteten af den PBEF-baserede metode kan bruges til at udføre robusthedstjek
for modelmisspecifikation.
I kapitel 2 er fokus på estimation af ikke-Gaussiske OU processer drevet af Lévy su-
bordinatorer. Disse processer blev gjort populære i Barndorff-Nielsen and Shephard
(2001), som modeller for den stokastiske volatilitet, men er siden også blevet brugt til
at modellere råvarepriser i bl.a. Benth et al. (2007). I de fleste multi-faktor modeller,
som findes i litteraturen om modellering af råvarepriser, bruges diverse filtreringstek-
niker til først at skille de indgående faktorer ad, hvorefter de estimeres separat. Kapitel
2 undersøger performance af forskellige estimatorer brugt til at estimere en-faktor
ikke-Gaussiske OU processer. Kapitlet forsøger derved at besvare spørgsmålet om,
hvilken estimationsmetode der er bedst at bruge efter modellens faktorer er skilt ad. I
modsætning til kapitel 1 er modellerne nu Markov. Tæthedsfunktionen for transitio-
nerne er dog stadigvæk ikke tilgængelig og maximum likelihood estimation er derfor
ikke mulig. Vi betragter to forskellige metoder til at omgå dette problem. Den første
metode vi betragter er metoden fra Valdivieso et al. (2009), hvor likelihood funktionen
approksimeres vha. fast Fourier transform (FFT) algoritmen og den karakteristiske
funktion for processens Lévy-baserede tilvækster. Den anden estimationsmetode
bygger på den optimale kvadratiske martingal estimationsfunktion (MGEF), som vi
udleder analytisk for den ikke-Gaussiske OU-proces. De to estimatorers egenskaber
undersøges derefter i et Monte Carlo studie, hvor både processer med endelig og
uendelig springaktivitet betragtes. For hver proces betragter vi to forskellige
konfigurationer af parametrene, alt efter om det simulerede data skal minde om base-signal
eller spike-delen af råvareprisen. De to estimatorers performance sammenlignes og-
så med simple estimationsmetoder, såsom quasi maximum likelihood og moment
matching.
FFT MLE metoden fra Valdivieso et al. (2009) havde den klart bedste performance
i vores simulations-setup, uden modelmisspecifikation. Hvis man bevæger sig væk
fra dette setup virker den MGEF-baserede metode dog til at være mere numerisk
robust, især set i lyset af de vanskeligheder FFT MLE metoden har med at håndtere
højfrekvent data, og den følsomhed metoden har overfor parametrene der bruges
til at fine-tune FFT algoritmen. MGEF metoden har også den fordel, at alle model-
lens parametre stadig estimeres simultant, når vi betragter processer med endelig
springaktivitet. Ydermere har MGEF metoden, i modsætning til FFT MLE metoden,
potentiale for at kunne udvides til at håndtere støj i observationerne. Denne udvidel-
se, og andre, diskuteres sidst i kapitlet.
Det tredje og sidste kapitel i afhandlingen kombinerer modellerne fra de to foregå-
ende kapitler og betragter en to-faktor OU baseret model med stokastisk volatilitet og
spring. Ved hjælp af Bayesianske estimationsteknikker fittes modellen og andre ben-
chmark modeller til naturgasspotpriser fra Storbritannien (UK). Mere specifikt bruger
vi en Markov Chain Monte Carlo metode med et partikelfilter (PMCMC), introduceret
i Andrieu et al. (2010), for at undgå at splitte de indgående faktorer fra hinanden før
modellen kan estimeres. Estimationsmetoden, og den foreslåede model, muliggør
en undersøgelse af vigtigheden af at inkludere stokastisk volatilitet og forskellige
mean-reversion rates for faktorerne i modelspecifikationen. Vores PMCMC estima-
tionsmetode gør det også muligt at betragte både en kontinuert og en springdreven
specifikation af volatiliteten, og derved undersøge specifikationernes indflydelse på
fx. estimaterne i springdelen af prisprocessen. En anden fordel ved PMCMC metoden
er, at likelihood funktionen automatisk estimeres i partikelfiltret og sammenligning
af forskellige modeller er derfor let at udføre.
Vi viser, at det er yderst vigtigt at have stokastisk volatilitet med i modellen for
naturgasspotpriserne, samt at denne beslutning har stor indflydelse på springdelen af
prisprocessen. Vores studie viser også, at hvis stokastisk volatilitet ikke inkorporeres
i modellen, så bliver springintensiteten kraftigt overestimeret. Vores studie viser, at
forskellige mean-reversion rates for faktorerne retfærdiggør spring i modellen, selv-
om springene kun står for en lille andel af variansen i spotpriserne. Ved at betragte
likelihood ratioen som en funktion af tid kan vi se, at den springdrevne volatilitets-
specifikation er bedst til at modellere volatilt data, hvor variansen af variansen er stor,
hvorimod den kontinuerte specifikation vi betragter er bedst til at håndtere de mere
rolige perioder.
Den Bayesianske estimationsmetode fra det sidste kapitel er meget anderledes
end de klassiske estimationsteknikker fra de andre kapitler. I klassisk statistik bli-
ver parametrene man ønsker at bestemme, θ, betragtet som faste men ukendte og
data, y, betragtes som en tilfældig stikprøve fra en hypotetisk uendeligt stor
population. Man er hovedsageligt interesseret i at beregne likelihoodfunktionen p(y|θ)
(eller dens afledte, scorefunktionen) idet denne bruges til at bestemme et estimat
for θ. I Bayesiansk statistik antages det, at θ også har en fordeling og ikke blot er
fast. Man er nu interesseret i a posteriori-fordelingen p(θ|y) ∝ p(y|θ)p(θ), hvor p(θ)
betegner a priori viden (forhåndsviden) om parametrene, og ud fra denne bestemme
parameterestimatet.
Metoderne fra kapitel 1 og 2, der er baseret på estimationsfunktioner, har den
fordel at de er forholdsvis lette at implementere, da de kun afhænger af henholdsvis
ubetingede og betingede middelværdier. Derudover benyttes ingen simulations-
teknikker og metoderne er hurtige at udføre. Den PBEF-baserede metodes relative
simplicitet, sammenholdt med de lovende resultater fra det første kapitel kunne
antyde, at metoden også har potentiale i forbindelse med estimation af mere kompli-
cerede Lévy-drevne modeller, som fx. den multi-faktor model vi betragter i kapitel 3.
Dette potentiale er dog endnu ikke testet på ikke-Gaussiske Lévy-drevne modeller,
såsom superpositioner af ikke-Gaussiske OU processer, hvilket blev foreslået i Benth
et al. (2007). Metoden har muligvis svært ved at producere nøjagtige resultater for
multi-faktor modeller med mange parametre, da metoden kun afhænger af ubetinge-
de middelværdier, hvilke muligvis skal simuleres i komplekse modeller, og pga. den
ikke-prædiktable opførsel som springene i Lévy processer har. En mulig løsning på
dette potentielle problem kunne være, som tidligere diskuteret, at splitte de indgåen-
de faktorer ad og dernæst estimere dem separat, fx. vha. metoderne fra kapitel 1 og 2.
Den største ulempe ved denne løsning er naturligvis den manglende sammenhæng i
estimationen. Parameterestimaterne kommer potentielt til at afhænge af den valgte
metode der er brugt til at splitte faktorerne, og da hver faktor efterfølgende estimeres
separat, går samspillet mellem faktorernes specifikation tabt. Denne problemstilling
kan løses ved brugen af Bayesianske estimationsmetoder og var præcis motivationen
for brugen af PMCMC metoden i kapitel 3. PMCMC metoden har også den fordel, at
det er let at sammenligne forskellige modeller samt beregne fx. standardafvigelser,
da likelihoodfunktionen estimeres i partikelfiltret og man med PMCMC metoden
estimerer fordelingen for θ. Modellens underliggende variable, såsom den
stokastiske volatilitetsproces, kan også estimeres vha. partikelfiltret. Dette er en stor fordel
ved den Bayesianske estimationmetode idet prisfastsættelse af afledte finansielle
instrumenter, som fx. forwardkontrakter, ofte afhænger af disse variable.
En mulig ulempe ved den Bayesianske tilgang til estimation er manglende simpli-
citet, da metoden potentielt kan være svær at forklare til praktikere. Desuden skal der
foretages en masse valg når metoden implementeres, fx. hvor mange partikler der
skal anvendes i filteret, og hvilken a priori viden der skal specificeres for parametrene.
Endelig kan de Bayesianske metoder være meget tidskrævende. Fx. tager det flere
dage at estimere den mest komplekse model fra kapitel 3. Efter parametrene er esti-
meret for en given model, tager det dog mindre end et minut at opdatere de filtrede
underliggende variable, når nye observationer kommer til.
Når man skal estimere komplekse Lévy-drevne modeller med mange parametre,
er Bayesianske metoder efter min mening det bedste, og ofte eneste valg. I forbindelse
med modellering af råvarepriser, kunne fremtidige forskningsemner eksempelvis
fokusere på estimation af avancerede multi-dimensionale modeller, fx. modellen
fra Benth and Vos (2013), der kan bruges til simultan modellering af flere råvare-
priser, som fx. gas- og elektricitetspriser. Det ville også være interessant at udvikle
estimationsmetoder for modellerne fra Barndorff-Nielsen et al. (2013) som er drevet
xvii
af såkaldte Lévy semistationære processer.
References
Andrieu, C., Doucet, A., Holenstein, R., 2010. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (3), 269–342.
Barndorff-Nielsen, O. E., Benth, F. E., Veraart, A. E. D., 2013. Modelling energy spot
prices by volatility modulated Lévy-driven Volterra processes. Bernoulli 19(3),
803–845.
Barndorff-Nielsen, O. E., Shephard, N., 2001. Non-Gaussian OU-based models and some of their uses in financial economics (with discussion). Journal of the Royal Statistical Society B 63, 167–241.
Benth, F. E., Kallsen, J., Meyer-Brandis, T., 2007. A non-Gaussian Ornstein-Uhlenbeck process for electricity spot price modeling and derivatives pricing. Applied Mathematical Finance 14 (2), 153–169.
Benth, F. E., Vos, L., 2013. Cross-commodity spot price modeling with stochastic
volatility and leverage for energy markets. Advances in Applied Probability 45,
545–571.
Bollerslev, T., Zhou, H., 2002. Estimating stochastic volatility diffusions using condi-
tional moments of integrated volatility. Journal of Econometrics 109, 33–65.
Sørensen, M., 2000. Prediction-based estimating functions. Econometrics Journal 3,
123–147.
Valdivieso, L., Schoutens, W., Tuerlinckx, F., 2009. Maximum likelihood estimation in processes of Ornstein-Uhlenbeck type. Statistical Inference for Stochastic Processes 12, 1–19.
CHAPTER 1

PREDICTION-BASED ESTIMATING FUNCTIONS FOR STOCHASTIC VOLATILITY MODELS WITH NOISY DATA

A COMPARATIVE STUDY
Anne Floor Brix
Aarhus University and CREATES
Asger Lunde
Aarhus University and CREATES
Abstract
Prediction-based estimating functions (PBEFs), introduced in Sørensen (2000), are
reviewed, and PBEFs for the Heston (1993) stochastic volatility model are derived
with and without the inclusion of noise in the data. The finite sample performance of
the PBEF-based estimator is investigated in a Monte Carlo study and compared to the
performance of the Generalized Method of Moments (GMM) estimator from Boller-
slev and Zhou (2002) that is based on conditional moments of integrated volatility. We
derive new moment conditions in the presence of noise, but we also consider noise
correcting the GMM estimator by basing it on a realized kernel instead of realized
variance. Our Monte Carlo study reveals great promise for the estimator based on
PBEFs. The study also shows that the PBEF-based estimator outperforms the GMM
estimator, both in the setting with MMS noise and in the setting without MMS noise,
especially for small sample sizes. Finally, in an empirical application we fit the Heston
model to SPY data and investigate how the two methods handle real data and possible
model misspecification. The empirical study also shows how the flexibility of the
PBEF-based method can be used for robustness checks.
1.1 Introduction
Continuous time stochastic volatility (SV) models are widely used in econometrics
and empirical finance for modeling prices of financial assets. Considerable efforts
have been put into modeling and estimation of the latent volatility process. Most of
this research is surveyed in part II of Andersen, Davis, Kreiss, and Mikosch (2009).
Stochastic volatility diffusion models, such as the Heston (1993) model, represent a
popular class of models within the continuous time framework. The Heston model
will be the baseline model considered in this chapter, since it is one of the most widely
used models in financial institutions, due to its analytical tractability.
Parameter estimation in SV-models is difficult because the volatility process is
latent. The hidden Markov structure complicates inference, since the observed log-price process will not in itself be a Markov process, which implies that computing conditional expectations of functions of the observed process is practically infeasible. As a consequence, martingale estimating functions will not be a useful tool for
conducting inference in SV-models. Likelihood inference is also not straightforward,
because an analytical expression for the transition density is almost never available
and methods based on extensive simulations are called for.
We will circumvent the above mentioned problems for conducting inference in
SV models by using prediction-based estimating functions (PBEFs), introduced in
Sørensen (2000), which are a generalization of martingale estimating functions. This
generalization becomes particularly useful when applied to observations from a non-
Markovian model. PBEFs are estimating functions based on predictors of functions
of the observed process. The structure of PBEFs is essentially a sum of weighted
augmented prediction errors, and an estimator is found by making this sum zero.
In this chapter we investigate and contrast two estimation approaches. First,
PBEFs will be reviewed, detailed, and used for parameter estimation in the Heston
model. The estimation method is fairly easy to implement and fast to execute, as
the construction of PBEFs relies only on the computation of unconditional moments.
When the Heston SV-model is considered, no simulations are needed for constructing
the PBEFs used in the chapter.1 As a benchmark, we consider the method suggested
in Bollerslev and Zhou (2002). In Bollerslev and Zhou (2002) a Generalized Method
of Moments (GMM) type estimator based on the first and second order conditional
moments of the integrated volatility (IV ) is derived. Since IV is latent, realized
variance (RV ) is used as a proxy and the sample moments of RV are matched to
the population moments of IV implied by the model. When high-frequency data is
1 An implementable version of the optimal PBEF will, however, require simulation of a covariance matrix. Simulation-based estimation methods for continuous time SV models, such as indirect inference, see Gourieroux, Monfort, and Renault (1993), the efficient method of moments (EMM), see Gallant and Tauchen (1996), or Markov Chain Monte Carlo (MCMC), see Eraker (2001), are not as easily implemented, since many of them require substantial computational effort. Another way of tackling the difficulties that arise when considering parameter estimation in continuous time SV models is based on approximations of the likelihood function, see for example Aït-Sahalia and Kimmel (2007).
available, several other simulation-free methods have been suggested in the literature,
see for instance Barndorff-Nielsen and Shephard (2002), Corradi and Distaso (2006),
and Todorov (2009). Common to these methods, including the GMM-based estimator
from Bollerslev and Zhou (2002), is that they are all based on time-series of daily
realized measures, such as realized variance (RV ) and bipower variation (BV ). Instead
of being transformed into daily realized measures, the squared intra-daily returns
are used directly when constructing PBEFs. This means that PBEFs have a potential
informational advantage, the strength of which will be investigated throughout the
chapter. More specifically, the chapter investigates the finite sample properties of the
PBEF-based estimator in a Monte Carlo study and compares its performance to that
of the GMM estimator from Bollerslev and Zhou (2002).
The case where the efficient price is assumed to be directly observable as well as
the case where noise is present in data are considered. In particular, we contribute by
extending the two competing methods to handle the presence of noise. The usage of
PBEFs for estimating SV-models was suggested, among others, in Barndorff-Nielsen
and Shephard (2001), but to the best of our knowledge this is the first time the finite
sample performance of PBEFs applied to SV-models is studied. In fact, this is the most
extensive Monte Carlo study of the finite sample performance of the PBEF-based
estimation method. In Nolsøe, Nielsen, and Madsen (2000) the authors conduct a
small Monte Carlo study for the case where a Cox-Ingersoll-Ross (CIR) process is
observed with additive white noise, but the potential of using PBEFs to estimate
SV-models has not previously been studied.
The chapter also addresses the link between the estimation method based on
PBEFs and GMM based on the moment conditions underlying the PBEF. In particular, the connection between the optimal PBEF and the optimal choice of the weight
matrix in GMM estimation is established.
Lastly, an empirical application using SPY data is carried out, investigating how
the two estimation methods handle real data characteristics and possible model
misspecification. In the empirical application we also study how different choices in
the flexible PBEF-based estimation method might impact the parameter estimates.
In particular, we investigate how considering different choices of the predictor space
might serve as a robustness check of whether there is a need for additional volatility
factors in the model.
The chapter is organized as follows: In the following section the PBEF estimation
method is reviewed and detailed. The connection to GMM estimation is established,
and a brief review of the GMM estimator from Bollerslev and Zhou (2002) is provided.
For both methods, the estimator of the parameters in the Heston model is derived
with and without the inclusion of noise in the data. In Section 1.3 we present our
Monte Carlo study. This includes an investigation of how i.i.d. noise impacts the
performances of the two methods and how the noise corrected estimators perform.
Section 1.4 contains an empirical application to SPY data that investigates how the
methods handle real data and if and how the choice of estimation method impacts
the parameter estimates. The final section concludes, and ideas on further research
are outlined.
1.2 Estimating Stochastic Volatility Models
In this section the two estimation methods from Sørensen (2000) and Bollerslev and
Zhou (2002) are reviewed and extended to handle market microstructure (MMS)
noise. We also discuss the link between the optimal PBEF and a GMM estimator with
the optimal choice of weight matrix. The focus of this chapter is on the performance
of the two considered estimation methods used for estimating SV models of the form
dX_t = \sqrt{v_t}\,dW_t, \qquad dv_t = b(v_t;\theta)\,dt + c(v_t;\theta)\,dB_t, \qquad (1.1)
where W and B are independent standard Brownian motions. The independence
assumption rules out the possibility of leverage effects, but it is only imposed for
computational ease and could be relaxed in other applications. We will assume v to
be a positive, ergodic diffusion process with invariant measure \mu_\theta and that v_0 \sim \mu_\theta is independent of B, which implies that v is stationary. In particular, we are interested
in studying inference for the Heston SV-model
dX_t = \sqrt{v_t}\,dW_t, \qquad dv_t = \kappa(\alpha - v_t)\,dt + \sigma\sqrt{v_t}\,dB_t, \qquad (1.2)
where the spot volatility, v_t, is a CIR process. The parameter, \alpha, is the long run average variance of the observed process, X_t, and the other drift parameter, \kappa, is the rate at which v_t reverts to the long run average. The third parameter, \sigma, can be interpreted as the volatility of volatility. The Feller condition, 2\kappa\alpha \ge \sigma^2, ensures positivity of the variance process v_t. The Heston model is widely used in mathematical finance, where the observed process, X_t, would be the logarithm of an asset price. The popularity of the Heston model in financial institutions is primarily due to the analytical tractability of the model, which allows for (quasi) closed form expressions for prices of financial derivatives, such as European options.
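As a concrete point of reference for later sections, the Heston dynamics in (1.2) can be simulated with a full-truncation Euler scheme. The following is a minimal sketch, not the code used in this chapter; the parameter values are hypothetical and chosen to satisfy the Feller condition:

```python
import numpy as np

def simulate_heston(kappa, alpha, sigma, x0, v0, T, n, seed=0):
    """Full-truncation Euler scheme for dX = sqrt(v) dW, dv = kappa(alpha - v) dt + sigma sqrt(v) dB,
    with W and B independent (no leverage), as in (1.2)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    v = np.empty(n + 1)
    x[0], v[0] = x0, v0
    for i in range(n):
        vp = max(v[i], 0.0)  # truncate at zero so the square roots stay real
        dW, dB = rng.normal(0.0, np.sqrt(dt), size=2)
        x[i + 1] = x[i] + np.sqrt(vp) * dW
        v[i + 1] = v[i] + kappa * (alpha - vp) * dt + sigma * np.sqrt(vp) * dB
    return x, v

# hypothetical parameters; Feller condition 2*kappa*alpha = 0.4 >= sigma**2 = 0.25
x, v = simulate_heston(kappa=5.0, alpha=0.04, sigma=0.5, x0=0.0, v0=0.04, T=10.0, n=20_000)
```

The truncation is needed because a naive Euler discretization of a CIR-type variance process can go slightly negative; exact simulation of the CIR transition is also possible but is not needed for illustration.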
1.2.1 Estimation using Prediction-based Estimating Functions
First, we explain the general setup and ideas underlying the estimation method based
on PBEFs that was introduced in Sørensen (2000) and further developed in Sørensen
(2011). Then, following Sørensen (2000), we derive the PBEFs for the Heston model
without MMS noise. Finally, we add MMS noise to the observations and derive PBEFs
in this setting.
The General Setup and Ideas
The estimation method based on PBEFs is used for conducting parametric inference based on observations Y_1, Y_2, \dots, Y_n from a general stochastic process. The
stochastic process is assumed to belong to a class of models parametrized by a p-
dimensional vector, θ ∈Θ⊆Rp , that we wish to estimate. An estimating function is a
p-dimensional function Gn(θ) that depends on the data Y1,Y2, . . . ,Yn and θ, and an
estimator is then obtained by solving the p equations Gn(θ) = 0 w.r.t. θ.
Let \mathcal{F}_i denote the \sigma-algebra generated by the observations Y_1, Y_2, \dots, Y_i. When \theta is the true parameter, we denote by \mathcal{H}_i^\theta the L^2-space of all square integrable, \mathcal{F}_i-measurable, 1-dimensional random variables. \mathcal{H}_i^\theta is a Hilbert space of real-valued functions of the type h(Y_1, \dots, Y_i), with inner product given by \langle h_1, h_2 \rangle = E_\theta[h_1(Y_1, \dots, Y_i)\, h_2(Y_1, \dots, Y_i)]. For each i, a closed linear subspace \mathcal{P}_{i-1}^\theta of \mathcal{H}_{i-1}^\theta can be chosen as the predictor space for predicting f(Y_i), where f is some known 1-dimensional function.2 In the setup of PBEFs, we study estimating functions of the form

G_n(\theta) = \sum_{i=1}^{n} \underbrace{\Pi^{(i-1)}(\theta)}_{p \times 1} \big[ f(Y_i) - \underbrace{\breve{\pi}^{(i-1)}(\theta)}_{1 \times 1,\ \in \mathcal{P}_{i-1}^\theta} \big], \qquad (1.3)

where the function to be predicted, f(Y_i), is defined on the state space of the data generating process Y. The function f is assumed to satisfy the condition E_\theta[f(Y_i)^2] < \infty for all \theta \in \Theta and for i = 1, \dots, n. In (1.3), the p-dimensional stochastic vector of weights, \Pi^{(i-1)}(\theta) = \big(\pi_1^{(i-1)}(\theta), \dots, \pi_p^{(i-1)}(\theta)\big), has elements that belong to the predictor space \mathcal{P}_{i-1}^\theta, and \breve{\pi}^{(i-1)}(\theta) is the minimum mean square error (MMSE) predictor of f(Y_i) in \mathcal{P}_{i-1}^\theta. That is, \breve{\pi}^{(i-1)}(\theta) is the orthogonal projection of f(Y_i) onto \mathcal{P}_{i-1}^\theta w.r.t. the inner product in \mathcal{H}_i^\theta defined above. Since the predictor space is both closed and linear, this orthogonal projection exists and is uniquely determined by the normal equations

E_\theta\big[ \pi \big( f(Y_i) - \breve{\pi}^{(i-1)}(\theta) \big) \big] = 0, \quad \text{for all } \pi \in \mathcal{P}_{i-1}^\theta, \qquad (1.4)

see e.g. Thm. 3.1 in Karlin and Taylor (1975).3
A special class of PBEFs is the class of martingale estimating functions (MGEFs), which is obtained by choosing \mathcal{P}_{i-1}^\theta := \mathcal{H}_{i-1}^\theta. In this case, the MMSE predictor of f(Y_i) in \mathcal{P}_{i-1}^\theta is the conditional expectation, \breve{\pi}^{(i-1)}(\theta) = E_\theta[f(Y_i) \mid Y_1, \dots, Y_{i-1}], and G_n(\theta) becomes a P_\theta-martingale w.r.t. the filtration generated by the data process. MGEFs are, however, mainly useful when considering Markovian models, since for non-Markovian models it is practically infeasible to calculate conditional expectations, conditioning on the entire past of observations. The idea underlying PBEFs is to use a smaller and more tractable predictor space in place of \mathcal{H}_{i-1}^\theta and think of the resulting PBEF as an approximation of the MGEF. The advantage of considering this
2 One could also choose to predict functions of the type f(Y_i, \dots, Y_{i-s}), see Sørensen (2011), but for the purpose of this study f(Y_i) will be general enough. The function f can be chosen freely but will often take the form f(Y_i) = Y_i^\nu, \nu \in \mathbb{N}, such that the moments needed to find the (optimal) PBEF are easier to calculate. PBEFs can in fact be further generalized to a setup where several functions of the data, f_j(Y_i), j = 1, \dots, N, are predicted, see Sørensen (2000) and Sørensen (2011). This generalization will, however, not be necessary for estimating the SV-model we are considering.
3 Unique in the sense of mean square distance.
approximation is that it is only based on unconditional moments, which are much
easier to compute, or simulate, than conditional moments. Regarding efficiency issues, estimators based on conditional moments are more efficient than those based
on the unconditional version of the moments. The reason is that the score function is
a martingale, so we can obtain a close approximation to the score function by using
MGEFs. By suitably choosing the predictor space, the hope is that the PBEF will also
be a good approximation of the score function, and the resulting estimator will obtain
high efficiency.
In the rest of the chapter we will restrict our attention to finite dimensional predictor spaces, \mathcal{P}_{i-1}^\theta, and assume that the observed process Y_i is stationary.4 For asymptotic properties of the estimator in this setting, consult Sørensen (2000) and Sørensen (2011). In order to obtain even more tractable PBEFs, we will only consider (q+1)-dimensional predictor spaces with basis elements of the form Z_k^{(i-1)} = h_k(Y_{i-1}, \dots, Y_{i-s}), k = 0, \dots, q, where h_k : \mathbb{R}^s \mapsto \mathbb{R}, s \in \mathbb{N}, and where the functions h_0, h_1, \dots, h_q are linearly independent and do not depend on \theta. The predictor space used for predicting f(Y_i) is then given by \mathcal{P}_{i-1}^\theta = \text{span}\{Z_0^{(i-1)}, Z_1^{(i-1)}, \dots, Z_q^{(i-1)}\}. The basis elements of the predictor space are no longer assumed to be functions of the entire past, but are instead functions of the “most recent past” of a period of length s. To adapt to usual practice, and to ensure that the resulting MMSE predictor of f(Y_i) in \mathcal{P}_{i-1}^\theta becomes unbiased, we will assume h_0 = 1. The predictors in \mathcal{P}_{i-1}^\theta will therefore be of the form a_0 + a'Z^{(i-1)}, where a' = (a_1, \dots, a_q) and Z^{(i-1)} = (Z_1^{(i-1)}, \dots, Z_q^{(i-1)})'.5 The normal equations (1.4) lead to the MMSE predictor

\breve{\pi}^{(i-1)}(\theta) = a_0(\theta) + a(\theta)' Z^{(i-1)}, \qquad (1.5)

where a(\theta) = C(\theta)^{-1} b(\theta) and a_0(\theta) = E_\theta[f(Y_i)] - a(\theta)' E_\theta[Z^{(i-1)}]. C(\theta) denotes the q \times q covariance matrix of Z^{(i-1)}, and b(\theta) = \big[\text{Cov}_\theta(Z_1^{(i-1)}, f(Y_i)), \dots, \text{Cov}_\theta(Z_q^{(i-1)}, f(Y_i))\big]'. Note that, since the observed process Y_i is stationary, the coefficients of the MMSE predictor do not depend on i, but stay constant across time.6 For a formal derivation of the expressions for the coefficients in (1.5) see Appendix A.
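Numerically, the projection in (1.5) reduces to solving a linear system. The sketch below is generic, and the moment values in it are made up purely for illustration; in practice C, b and the means would come from the model:

```python
import numpy as np

def mmse_coefficients(C, b, mean_f, mean_Z):
    """Coefficients of the MMSE predictor a0 + a'Z:
    a = C^{-1} b and a0 = E[f(Y_i)] - a' E[Z^{(i-1)}]."""
    a = np.linalg.solve(C, b)  # solve C a = b rather than inverting C
    a0 = mean_f - a @ mean_Z
    return a0, a

# made-up moment structure for a q = 2 predictor space
C = np.array([[2.0, 0.5],
              [0.5, 2.0]])    # Cov(Z^{(i-1)})
b = np.array([0.8, 0.4])      # Cov(Z_k^{(i-1)}, f(Y_i)), k = 1, 2
a0, a = mmse_coefficients(C, b, mean_f=1.0, mean_Z=np.array([1.0, 1.0]))
# by construction the predictor is unbiased: a0 + a' E[Z] = E[f(Y_i)]
```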
From (1.3)–(1.5) it follows that PBEFs can be calculated provided that we can calculate the first- and second-order moments of the random vector \big(f(Y_i), Z_1^{(i-1)}, \dots, Z_q^{(i-1)}\big).
Within the setup of the finite dimensional predictor spaces considered above, we now turn to the specification of the p \times 1 vector \Pi^{(i-1)}(\theta) from (1.3). Since each element of the vector \Pi^{(i-1)}(\theta) belongs to the predictor space \mathcal{P}_{i-1}^\theta, the jth element of \Pi^{(i-1)}(\theta) is of the form \pi_j^{(i-1)}(\theta) = \sum_{k=0}^{q} a_{jk}(\theta) Z_k^{(i-1)}, where, as before, Z_0^{(i-1)} = 1. Note that the coefficients a_{jk}(\theta) do not depend on i but are, like the coefficients of
4 In the context of stochastic volatility models, Y_i will be the series of stationary asset returns.
5 In the case of the Heston model, the basis elements we will consider are of the form Z_k^{(i-1)} = Y_{i-k}^2, k = 1, \dots, q.
6 PBEFs with finite dimensional predictor spaces can also be computed for non-stationary processes, but in this case computing the MMSE predictor, \breve{\pi}(\theta), is a bit more complicated since the coefficients, a_0(\theta), \dots, a_q(\theta), become time-varying.
the MMSE, a(θ) and a0(θ), constant over time. Therefore, in order to ease notation,
we define
A(\theta) = \begin{pmatrix} a_{10}(\theta) & \cdots & a_{1q}(\theta) \\ \vdots & & \vdots \\ a_{p0}(\theta) & \cdots & a_{pq}(\theta) \end{pmatrix}_{p \times (q+1)}, \qquad H^{(i)}(\theta) = \begin{pmatrix} Z_0^{(i-1)} \big[ f(Y_i) - \breve{\pi}^{(i-1)}(\theta) \big] \\ \vdots \\ Z_q^{(i-1)} \big[ f(Y_i) - \breve{\pi}^{(i-1)}(\theta) \big] \end{pmatrix}_{(q+1) \times 1},
for i = 1, \dots, n and F_n(\theta) := \sum_{i=s+1}^{n} H^{(i)}(\theta).7 With this notation at hand we are considering PBEFs of the form

G_n(\theta) = A(\theta) F_n(\theta), \qquad (1.6)
where we need p ≤ q +1 to identify the p unknown parameters. Finding the optimal
PBEF within a class of PBEFs of the type (1.6) is then a question of choosing an optimal
weight matrix, A∗(θ). The resulting optimal PBEF will be the estimating function,
within the considered class of estimating functions of type (1.6), that is closest to the
score in an L^2-sense. Since we have asymptotic normality of the estimator, the PBEF
with the optimal choice of weight matrix will then give rise to the estimator with the
smallest asymptotic variance. For further details on the optimal PBEF see Sørensen
(2000) or Appendix B.
Relating PBEFs to GMM Estimation
The PBEFs and martingale estimating functions share many similarities with GMM
estimation from the econometrics literature. In this subsection we will explain the link
between the optimal PBEF and the optimal GMM estimator based on the moment
conditions from the normal equations. Throughout the chapter, optimal refers to
efficiency of the resulting estimator.
The PBEF-based estimator is obtained by solving G_n(\theta) = 0 for \theta, but for numerical reasons it is often easier to minimize G_n(\theta)' G_n(\theta) w.r.t. \theta \in \Theta. We employ this approach and find an estimator by solving

\min_{\theta \in \Theta} G_n(\theta)' G_n(\theta) = \min_{\theta \in \Theta} F_n(\theta)' A(\theta)' A(\theta) F_n(\theta).

This expression looks very similar to the GMM objective function that emerges if we perform GMM estimation using the q+1 moment conditions E_\theta[H(\theta)] = 0. In this case the GMM objective function to be minimized is \big(\tfrac{1}{n-s} F_n(\theta)\big)' W_n(\theta) \big(\tfrac{1}{n-s} F_n(\theta)\big), which is equivalent to minimizing F_n(\theta)' W_n(\theta) F_n(\theta). In the latter case, the corresponding p first order conditions are
2\, \underbrace{\big(\partial_\theta F_n(\theta)\big)'}_{p \times (q+1)} \, \underbrace{W_n(\theta)}_{(q+1) \times (q+1)} \, \underbrace{F_n(\theta)}_{(q+1) \times 1} = 0, \qquad (1.7)
7 Note that the sum starts at i = s+1 since Z_k^{(i-1)} is only well-defined for i \ge s+1.
if we evaluate the GMM weight matrix, W_n(\theta), at some consistent parameter estimate \bar{\theta} such that the weight matrix does not depend on \theta. The first order conditions (1.7) have the same structure as the PBEFs in (1.6). The only difference is that the term in front of F_n(\theta) in (1.7) becomes data dependent. However, it turns out that there is a strong link between (1.7) with W_n(\theta) chosen optimally and the optimal PBEF of type (1.6). The optimal PBEF takes the form G_n^*(\theta) = A_n^*(\theta) F_n(\theta), where A_n^*(\theta) = U(\theta)' \bar{M}_n(\theta)^{-1}, and the expressions for U(\theta) and \bar{M}_n(\theta)^{-1} can be found in Sørensen (2000) or Appendix B. Straightforward calculations show that

-\frac{1}{n-s}\, \partial_\theta F_n(\theta)' \xrightarrow{p} U(\theta)', \quad \text{as } n \to \infty. \qquad (1.8)
From the theory on GMM estimation, we know that the optimal choice of weight matrix, W_n(\theta), is the inverse of the covariance matrix of F_n(\theta), since the H^{(i)}'s are correlated. In the GMM setting this weight matrix will in practice be constructed using the sample version of the covariance matrix. When W_n(\theta) is chosen optimally, W_n(\theta) equals \frac{1}{n-s} \bar{M}_n(\theta)^{-1}, and (1.7) becomes (except for a factor of -\tfrac{1}{2}) the empirical analog of the optimal PBEF, G_n^*(\theta) = A_n^*(\theta) F_n(\theta). Constructing the optimal PBEF is therefore
the same as constructing the theoretical first order conditions that emerge from the
optimal GMM objective function based on the moment conditions, E_\theta[H(\theta)] = 0,
from the normal equations. The choice of f and predictor space then translates into
which moment conditions to use in the GMM estimation. These choices will therefore
also impact the efficiency of the resulting PBEF-based estimator. Once these choices
are made, the optimal PBEF-based estimator is linked to the optimal GMM estimator,
based on the moment conditions from the normal equations, as described above.
Note that a sub-optimal choice of the weight matrix W_n(\theta) will lead to a sub-optimal PBEF, but the class of PBEFs is in general broader than the ones having the structure (1.7).
PBEFs for SV-Models without Noise in the Data
We now return to the setup from (1.1) and, following Sørensen (2000), review how to
compute PBEFs for SV-models without MMS noise.
Suppose the process X has been observed at discrete time points X_0, X_\Delta, \dots, X_{n\Delta}. It is more convenient to base the statistical inference on the differences Y_i = X_{i\Delta} - X_{(i-1)\Delta}, since the process Y_i, in contrast to X_{i\Delta}, will be stationary when v_t is assumed stationary. In this setup, inference based on MGEFs becomes practically infeasible, since the conditional expectations appearing in the MGEFs, which are based on f(Y_i) - E_\theta[f(Y_i) \mid Y_{i-1}, \dots, Y_1], are difficult to compute analytically, as well as
numerically. One feasible approach for conducting inference is to use PBEFs instead.
In fact, for many models, such as the Heston model, we are able to derive analytical
expressions for the PBEFs. The continuous time returns from (1.1) are given by

Y_i = X_{i\Delta} - X_{(i-1)\Delta} = \int_{(i-1)\Delta}^{i\Delta} \sqrt{v_t}\,dW_t, \qquad (1.9)

which allows for the decomposition Y_i = \sqrt{S_i}\, Z_i, where the Z_i's are i.i.d. standard normal random variables independent of S_i, and where the process S_i is given by S_i = \int_{(i-1)\Delta}^{i\Delta} v_t\,dt. The distribution of v_t is the same on all intervals [0,\Delta), \dots, [(n-1)\Delta, n\Delta) because v_t is stationary, hence it follows that S_i and Y_i are stationary processes. Note that the Y_i's have zero mean and are uncorrelated, but not independent.
To construct PBEFs, we have to decide on which function of the data to predict. Since the Y_i's are uncorrelated, trying to predict Y_i using Y_{i-1}, Y_{i-2}, \dots, Y_{i-q} will not work, and f(y) = y is a bad choice. To match empirical data, where we have volatility clustering, squared returns from the considered SV-models are often correlated, and a natural choice for f would therefore be f(y) = y^2. The decomposition Y_i = \sqrt{S_i}\, Z_i also reveals that f(y) = y^2 is a convenient choice, as it eases the computation of the moments required to construct the PBEFs. Correlation between absolute returns tends to be even more persistent than the correlation between squared returns, and f(y) = |y| might also seem like a good choice. This choice will however complicate the computation of the moments needed for constructing the PBEF. The problem is that there is in general no simple way of relating the moments E_\theta[S_i^\eta] to the moments of the volatility process, unless \eta is an integer. For instance, computing E_\theta[\sqrt{S_i}] is not an easy task. Other choices of f might result in efficiency gains, but without further knowledge of the intractable score functions that we aim to estimate, we will stick to the class of polynomial PBEFs and use f(y) = y^2, as this offers computational ease.8
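The motivation for f(y) = y^2 can be illustrated on simulated Heston returns: raw returns are serially uncorrelated while squared returns are not. The sketch below uses a crude Euler discretization on a fine grid with hypothetical parameter values; it is meant only to exhibit the qualitative pattern, not to be an accurate simulation scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
kappa, alpha, sigma, dt = 5.0, 0.04, 0.5, 1e-3  # hypothetical parameters
n = 200_000
v = alpha
Y = np.empty(n)
for i in range(n):
    # return over one step given the current variance, then an Euler step for v
    Y[i] = np.sqrt(max(v, 0.0) * dt) * rng.normal()
    v += kappa * (alpha - v) * dt + sigma * np.sqrt(max(v, 0.0) * dt) * rng.normal()

def acf1(x):
    """Lag-one sample autocorrelation."""
    xc = x - x.mean()
    return float(np.mean(xc[1:] * xc[:-1]) / np.mean(xc * xc))

r_returns, r_squared = acf1(Y), acf1(Y**2)
# r_returns is close to zero; r_squared is clearly positive (volatility clustering)
```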
As our predictor spaces we choose

\mathcal{P}_{i-1}^\theta = \{ a_0 + a_1 Y_{i-1}^2 + \cdots + a_q Y_{i-q}^2 \mid a_k \in \mathbb{R},\ k = 0, 1, \dots, q \}.

This means that the predictor variables Z_k^{(i-1)} = Y_{i-k}^2 for k = 1, 2, \dots, q have the same functional form as the function of the data to predict.9 Notice that in this case s = q, since \mathcal{P}_{i-1}^\theta is spanned by the “most recent past of squared returns of length q”.10 With the above choice of f and predictor space, the MMSE predictor is given by

\breve{\pi}^{(i-1)}(\theta) = a_0(\theta) + a(\theta)' Z^{(i-1)}, \quad \text{with } Z^{(i-1)} = (Y_{i-1}^2, \dots, Y_{i-q}^2)',
a(\theta) = C^{-1}(\theta)\, b(\theta), \quad a_0(\theta) = E_\theta(Y_1^2)\,\big[1 - (a_1(\theta) + \cdots + a_q(\theta))\big]. \qquad (1.10)
8 Higher powers of y could also have been considered, but very high moments are often not reliable for empirical investigations, and we choose to stick with f(y) = y^2 as suggested in Sørensen (2000).
9 It should be noted that one does not have to choose a predictor space spanned by variables of the same functional form as f, even though it seems like the most natural choice.
10 When the volatility process v_t is \rho-mixing, the coefficients a_k(\theta) decrease exponentially with k, and q need not be very large, since q represents the “required” information for predicting f(Y) (see Thm. 3.3 in Bradley (2005)). Note that if the volatility process v_t is \alpha-mixing, then the observed process Y_i inherits this property and is also \alpha-mixing (see Lemma 6.3 in Sørensen (2000)).
As before, C denotes the covariance matrix of Z^{(i-1)}, and b is the q \times 1 vector with jth element given by \text{Cov}_\theta(Y_{i-j}^2, Y_i^2). Together with (1.6), this means that we are considering PBEFs of the form

G_n(\theta) = \sum_{i=q+1}^{n} \underbrace{\Pi^{(i-1)}(\theta)}_{p \times 1} \big[ Y_i^2 - (a_0(\theta) + a_1(\theta) Y_{i-1}^2 + \cdots + a_q(\theta) Y_{i-q}^2) \big], \qquad (1.11)

with \Pi^{(i-1)}(\theta) = A(\theta) Z^{(i-1)}, where Z^{(i-1)} here includes the constant: Z^{(i-1)} = (1, Y_{i-1}^2, \dots, Y_{i-q}^2)'. In our Monte Carlo study we will use the following sub-optimal, yet simple weight matrix

A(\theta) = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix},
since computing the optimal weight matrix A^*(\theta) involves computing the covariance matrix of F_n(\theta).11 The resulting sub-optimal PBEF is
G_n(\theta) = \sum_{i=q+1}^{n} \begin{pmatrix} 1 \\ Y_{i-1}^2 \\ Y_{i-2}^2 \end{pmatrix} \big[ Y_i^2 - (a_0(\theta) + a_1(\theta) Y_{i-1}^2 + \cdots + a_q(\theta) Y_{i-q}^2) \big]. \qquad (1.12)
Equating (1.12) to zero and solving for \theta gives a \sqrt{n}-consistent estimator, but we may lose some efficiency for not using the optimal weight matrix A^*(\theta). However, the aim of the chapter is to study the finite sample performance of an easily implementable and simulation-free PBEF. As we shall see in our Monte Carlo study, the estimator based on the sub-optimal PBEF performs well in finite samples, and a study of the possible further improvements from using the optimal PBEF is left for future research.
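Schematically, the sub-optimal PBEF in (1.12) is straightforward to code once the coefficient functions are available. In the sketch below the mapping \theta \mapsto (a_0(\theta), a(\theta)) is left as a user-supplied function, and the one used in the example is deliberately artificial, chosen so that the estimating function vanishes exactly on constant data; this illustrates the structure of (1.12), not the chapter's implementation:

```python
import numpy as np

def pbef(theta, y2, coef):
    """Sub-optimal PBEF (1.12): each summand weights the prediction error
    y_i^2 - (a0 + a1 y_{i-1}^2 + ... + aq y_{i-q}^2) by (1, y_{i-1}^2, y_{i-2}^2)'."""
    a0, a = coef(theta)            # model-implied predictor coefficients; len(a) = q >= 2
    q = len(a)
    G = np.zeros(3)
    for i in range(q, len(y2)):
        lagged = y2[i - q:i][::-1]  # (y_{i-1}^2, ..., y_{i-q}^2)
        err = y2[i] - (a0 + a @ lagged)
        G += np.array([1.0, y2[i - 1], y2[i - 2]]) * err
    return G

# artificial coefficient map for illustration: theta = (a0, a1, a2) directly
coef = lambda theta: (theta[0], np.asarray(theta[1:]))
y2 = np.ones(100)                    # constant "squared returns"
G = pbef((0.5, 0.3, 0.2), y2, coef)  # 0.5 + 0.3 + 0.2 = 1, so every prediction error is zero
```

An estimator is then obtained by driving G to zero, e.g. by minimizing G(\theta)'G(\theta) over \theta with a standard numerical optimizer.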
The sub-optimal PBEF from (1.12) can now be computed if we can calculate a_0(\theta), a_1(\theta), \dots, a_q(\theta). For this we need E_\theta[Y_i^2], \text{Var}_\theta(Y_i^2) and \text{Cov}_\theta(Y_i^2, Y_{i+j}^2) for j = 1, \dots, q, so we have to assume E_\theta[Y_i^4] < \infty for the MMSE predictor to be well-defined.12
The required moments can be calculated from the moments of the volatility process v_t. Now, define the mean, variance and autocorrelation function of the volatility process as \xi(\theta) := E_\theta[v_t], \omega(\theta) := \text{Var}_\theta(v_t), and r(u;\theta) := \text{Cov}_\theta(v_t, v_{t+u})/\omega(\theta). From (Barndorff-Nielsen and Shephard, 2001, pp. 179-181) it follows that

E_\theta[Y_i^2] = \Delta\, \xi(\theta), \qquad (1.13)
\text{Var}_\theta(Y_i^2) = 6\,\omega(\theta) R^*(\Delta;\theta) + 2\Delta^2 \xi(\theta)^2, \qquad (1.14)
\text{Cov}_\theta(Y_i^2, Y_{i+j}^2) = \omega(\theta)\big[R^*(\Delta(j+1);\theta) - 2R^*(\Delta j;\theta) + R^*(\Delta(j-1);\theta)\big], \qquad (1.15)
11 A task that involves computing E_\theta[Y_i^2 Y_j^2 Y_k^2 Y_1^2] for i \ge j \ge k. For further details on how to compute optimal PBEFs for stochastic volatility models, see Sørensen (2000). In Sørensen (2000) an analytical formula for the optimal PBEF for an affine SV-model, such as the Heston model, is also given. Even though an analytical expression for A^*(\theta) is in principle available, it is a very complicated expression and not easily implementable. In practice, a feasible strategy could be to simulate A^*(\theta).
12 From Jensen's inequality it follows that E_\theta[v_t^{\beta/2}] < \infty implies E_\theta[Y_i^\beta] < \infty for \beta \ge 2. For \beta \le 2, E_\theta[v_t] < \infty implies E_\theta[Y_i^\beta] < \infty.
where R^*(t;\theta) = \int_0^t \int_0^s r(u;\theta)\,du\,ds. In the Heston model the stationary distribution of v_t is the Gamma distribution with shape parameter 2\kappa\alpha\sigma^{-2} and rate parameter 2\kappa\sigma^{-2}, provided that \sigma > 0, \alpha > 0 (non-negativity), \kappa > 0 (stationarity in mean), and 2\kappa\alpha \ge \sigma^2 (stationarity in volatility). Thus, we have \xi(\theta) = \alpha, \omega(\theta) = \frac{\alpha\sigma^2}{2\kappa}, r(u;\theta) = e^{-\kappa u}, and R^*(t;\theta) = \frac{1}{\kappa^2}\big(e^{-\kappa t} + \kappa t - 1\big). The moments we need are then given by

E_\theta[Y_i^2] = \Delta\alpha, \qquad (1.16)
\text{Var}_\theta(Y_i^2) = \frac{6\alpha\sigma^2}{2\kappa^3}\big(e^{-\kappa\Delta} + \kappa\Delta - 1\big) + 2\Delta^2\alpha^2, \qquad (1.17)
\text{Cov}_\theta(Y_i^2, Y_{i+j}^2) = \frac{\alpha\sigma^2}{2\kappa^3}\, e^{-\kappa\Delta j}\big(e^{-\kappa\Delta} - 2 + e^{\kappa\Delta}\big). \qquad (1.18)
From the above derivations it is clear that PBEFs can easily be derived in other diffusion models where we can compute the mean, variance and autocorrelation structure of the volatility process. This is for instance the case when the volatility process belongs to the class of Pearson diffusions, which nests the CIR process, see Forman and Sørensen (2008).
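As a concrete illustration, the closed-form moments (1.16)-(1.18) are straightforward to implement. The following sketch (the function name is ours, not from the thesis) evaluates them for the Scenario 2 parameter values used later in the Monte Carlo study:

```python
import math

def heston_sq_return_moments(kappa, alpha, sigma, delta, j=1):
    """Mean, variance and lag-j autocovariance of squared returns in the
    Heston model, cf. equations (1.16)-(1.18)."""
    mean = delta * alpha                                              # (1.16)
    var = (6 * alpha * sigma**2 / (2 * kappa**3)) * \
          (math.exp(-kappa * delta) + kappa * delta - 1) \
          + 2 * delta**2 * alpha**2                                   # (1.17)
    cov = (alpha * sigma**2 / (2 * kappa**3)) * math.exp(-kappa * delta * j) * \
          (math.exp(-kappa * delta) - 2 + math.exp(kappa * delta))    # (1.18)
    return mean, var, cov

# Scenario 2 parameters with five-minute sampling (Δ = 1/78):
m, v, c1 = heston_sq_return_moments(0.10, 0.25, 0.10, 1 / 78)
```

Note that the autocovariances decay geometrically in j at rate e^{−κΔ}, mirroring the exponential autocorrelation of v_t.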
PBEFs for SV-Models with Noisy Data
We now add noise to the observation scheme from (1.9). More specifically, we add
i.i.d. Gaussian noise to the log-price process
X_i = X*_i + U_i,    U_i i.i.d. N(0, ω²),    (1.19)
where the efficient log-price process X ∗ comes from the Heston model and X ∗ and
U are assumed to be independent. The additive error term Ui will be interpreted as
market microstructure (MMS) noise due to market frictions such as bid-ask bounce,
liquidity changes, and discreteness of prices. When MMS noise is present, the ob-
served returns have the following structure
Y_i = X_i − X_{i−1} = (X*_i − X*_{i−1}) + (U_i − U_{i−1}) = Y*_i + ε_i,    (1.20)

where the MA(1) process, ε, is normally distributed, N(0, 2ω²), and independent of the efficient return process Y*.
To correct for MMS noise in the PBEFs, the moments used to construct the MMSE predictor have to be recalculated. That is, Eθ[Y_i²], Varθ(Y_i²) and Covθ(Y_i², Y_{i+j}²) need to be computed in the setting from (1.20).
Straightforward calculations give Eθ[Y_i²] = Δα + 2ω², since Y* and ε are independent and have mean zero. We can now derive the bias in α that can be expected to occur when performing the PBEF-based estimation procedure without correcting for MMS noise. If the MMS noise is not taken into account, the equation Eθ[Y_i²] = Δα is erroneously used for constructing the PBEF. The expected bias in α is therefore 2ω²/Δ, and, as we shall see, this quantity matches the bias
12 CHAPTER 1. PBEFS FOR STOCHASTIC VOLATILITY MODELS WITH NOISY DATA
found in our Monte Carlo study. Since we do not have an analytical expression for the estimators, we will not attempt to derive the bias encountered in κ and σ.
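Plugging in the noise level ω² = 0.001 and the five-minute spacing Δ = 1/78 used later in the Monte Carlo study gives the magnitude of this bias:

```python
# Expected bias in α from ignoring MMS noise: the estimation fits Δα to
# E[Y_i²] = Δα + 2ω², so α is inflated by 2ω²/Δ.
omega2 = 0.001   # noise variance used in the Monte Carlo study below
delta = 1 / 78   # five-minute returns over a 6.5-hour trading day
alpha = 0.25

bias = 2 * omega2 / delta        # absolute bias in α: 0.156
rel_bias = 100 * bias / alpha    # relative bias: 62.4 (%)
```

The 62.4% figure is exactly the relative bias in α reported for the noisy-data Monte Carlo experiment below.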
As for the variance of the squared returns, it follows from (1.20) that

Y_i² = Y*_i² + ε_i² + 2Y*_i ε_i,

and since the three terms are uncorrelated, we have that

Varθ(Y_i²) = Varθ(Y*_i²) + Varθ(ε_i²) + 4Varθ(Y*_i ε_i).    (1.21)
Given the structure of the noise process, which is normally distributed with mean zero and variance 2ω², we find that Varθ(ε_i²) = 8ω⁴. The efficient return process and the noise process are independent, and both have zero mean, so Varθ(Y*_i ε_i) = Eθ[ε_i²]Eθ[Y*_i²] = 2ω²Δα. Plugging this into (1.21) yields

Varθ(Y_i²) = Varθ(Y*_i²) + 8ω⁴ + 8ω²Δα.    (1.22)
Regarding the covariance structure of the squared returns, only the first-order covariance will change, due to the MA(1) structure in the return errors ε. By once again exploiting that Y* and ε are independent and both have mean zero, we obtain the following expression for the first-order covariance of the observed squared return series

Covθ(Y_i², Y_{i+1}²) = Covθ(Y*_i², Y*_{i+1}²) + Covθ(ε_i², ε_{i+1}²) = Covθ(Y*_i², Y*_{i+1}²) + 2ω⁴.    (1.23)
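Combining (1.16)-(1.18) with the corrections above gives the noise-adjusted moments in closed form. A sketch (the function name is ours):

```python
import math

def noisy_sq_return_moments(kappa, alpha, sigma, delta, omega2):
    """Moments of squared observed returns when i.i.d. N(0, ω²) MMS noise is
    added to the log-price: (1.16)-(1.18) plus the corrections (1.22)-(1.23).
    Sketch only; covariances at lags j >= 2 are unaffected by the noise."""
    mean_eff = delta * alpha                       # efficient-return moments
    var_eff = (6 * alpha * sigma**2 / (2 * kappa**3)) * \
              (math.exp(-kappa * delta) + kappa * delta - 1) \
              + 2 * delta**2 * alpha**2
    cov1_eff = (alpha * sigma**2 / (2 * kappa**3)) * math.exp(-kappa * delta) * \
               (math.exp(-kappa * delta) - 2 + math.exp(kappa * delta))
    mean = mean_eff + 2 * omega2                   # E[Y²] = Δα + 2ω²
    var = var_eff + 8 * omega2**2 + 8 * omega2 * mean_eff   # (1.22)
    cov1 = cov1_eff + 2 * omega2**2                         # (1.23)
    return mean, var, cov1
```

Setting ω² = 0 recovers the no-noise moments, which is a convenient consistency check.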
We can now compute the noise corrected version of the PBEF previously described. Note that we can choose to estimate the variance of the noise, ω², in a first step, for instance by plugging a non-parametric estimator into the noise corrected PBEF used for estimating κ, α and σ. Another approach would be to expand the parameter vector to θ = (κ, α, σ, ω²) and use the noise corrected PBEF to estimate all four parameters. In the latter approach one would have to choose a 4×(q+1) weight matrix, A(θ). This results in a 4×1 estimating function G_n(θ), such that our estimator θ is obtained by solving four equations in four unknowns. We will follow the latter approach and estimate all four parameters in one step. Since we have chosen q = 3 and wish to estimate ω², the weight matrix will be a 4×4 matrix. This means that the weight matrix can be ignored when solving G_n(θ) = 0, provided that A(θ) is invertible, and the sub-optimal PBEF we have considered so far will, in this setting, be optimal.
1.2.2 A GMM Estimator based on Moments of Integrated Volatility
In this subsection the GMM estimation procedure from Bollerslev and Zhou (2002)
is reviewed and extended to handle MMS noise. In Bollerslev and Zhou (2002), the
moment conditions for constructing the GMM estimator arise from the analytical
derivations of the conditional first- and second-order moments of the daily integrated
volatility (IV ) process. We will consider both a parametric and a non-parametric way
of accounting for the presence of MMS noise in the data used for constructing the
GMM estimator. In the parametric approach, the moment conditions are adjusted
to hold in the MMS noise setting, and in the non-parametric approach we use a
noise robust estimate of IV , namely the realized kernel (RK ) from Barndorff-Nielsen,
Hansen, Lunde, and Shephard (2008a).
The GMM Estimator without Noise in the Data
We now review the GMM estimator from Bollerslev and Zhou (2002) in the case where
we have observations from the Heston model without MMS noise. Since the daily
IV is latent, the realization of this time-series is approximated by the daily realized
variance (RV ). Replacing population moments of IV with sample moments of RV
results in an easy-to-implement GMM estimator. Once again, the statistical inference
will be based on the discretely sampled returns Yi = Xi∆ − X(i−1)∆, which we will
assume to be available at high frequencies. The GMM estimation method crucially
depends on the availability of high-frequency data, since high-frequency data will
ensure that RV is a good approximation of IV and, hence the moment conditions
will hold approximately for RV .
When considering the Heston model, the conditional moment conditions used
for constructing the GMM estimator are given by
Eθ[IV_{t+1,t+2} − δIV_{t,t+1} − β | G_t] = 0,
Eθ[IV²_{t+1,t+2} − H(IV²_{t,t+1}) − I(IV_{t,t+1}) − J | G_t] = 0,    (1.24)
where IV_{t,t+1} denotes the integrated volatility from day t to day t + 1 and G_t = σ(IV_{t−s−1,t−s} | s = 0, 1, 2, ...). The functions δ, β, H, I, and J are functions of the parameters κ, α, and σ and can be found in Appendix C. The functions δ and β only depend on the drift parameters κ and α, which is why the second moment condition is needed. For further details on the derivation of the two conditional moment conditions, see Bollerslev and Zhou (2002) or Appendix C. To get enough moment conditions to identify θ, the two moment conditions are augmented by IV_{t−1,t} and IV²_{t−1,t}, yielding a total of six moment conditions. By replacing daily IV with daily RV, and using the unconditional versions of these six moment conditions, we are now able to construct a feasible GMM estimator for the parameters of interest θ = (κ, α, σ).
Letting T denote the number of trading days, the feasible GMM estimator is then
given by
θ̂_T = argmin_θ ( (1/(T−2)) Σ_{t=1}^{T−2} f_t(θ) )′ W ( (1/(T−2)) Σ_{t=1}^{T−2} f_t(θ) ),    (1.25)
with W = S⁻¹, where S is a consistent estimate of the asymptotic covariance matrix of g_T(θ) = (1/(T−2)) Σ_{t=1}^{T−2} f_t(θ) and where f_t(θ) is given by

f_t(θ) = [ RV_{t+1,t+2} − δRV_{t,t+1} − β,
           RV²_{t+1,t+2} − H(RV²_{t,t+1}) − I(RV_{t,t+1}) − J,
           (RV_{t+1,t+2} − δRV_{t,t+1} − β) RV_{t−1,t},
           (RV²_{t+1,t+2} − H(RV²_{t,t+1}) − I(RV_{t,t+1}) − J) RV_{t−1,t},
           (RV_{t+1,t+2} − δRV_{t,t+1} − β) RV²_{t−1,t},
           (RV²_{t+1,t+2} − H(RV²_{t,t+1}) − I(RV_{t,t+1}) − J) RV²_{t−1,t} ]′.    (1.26)
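The moment vector (1.26) and the criterion (1.25) translate directly into code. In the sketch below the coefficient functions δ, β, H, I, J are passed in as precomputed scalars (their closed forms are in Appendix C and are not reproduced here), and the indexing convention rv[t] = RV over day t is our own:

```python
import numpy as np

def gmm_objective(coef, rv, W):
    """GMM criterion from (1.25)-(1.26), sketched.  `coef = (d, b, H, I, J)`
    are the scalar coefficient functions of θ from Appendix C (assumed given),
    `rv` is the daily realized-variance series, W the weight matrix."""
    d, b, H, I, J = coef
    rows = []
    for t in range(1, len(rv) - 1):
        e1 = rv[t + 1] - d * rv[t] - b                       # first moment condition
        e2 = rv[t + 1]**2 - H * rv[t]**2 - I * rv[t] - J     # second moment condition
        z = rv[t - 1]                                        # lagged RV instrument
        rows.append([e1, e2, e1 * z, e2 * z, e1 * z**2, e2 * z**2])
    g = np.mean(rows, axis=0)                                # g_T(θ), six elements
    return g @ W @ g                                         # quadratic form (1.25)
```

With a constant RV series and coefficients that make both moment errors vanish, the criterion is exactly zero, which gives a quick correctness check.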
Parametrically Correcting for Noisy Data in the GMM Approach
Recall that, when MMS noise is present, the observed returns have the following structure

Y_i = X_i − X_{i−1} = (X*_i − X*_{i−1}) + (U_i − U_{i−1}) = Y*_i + ε_i.
If we denote the realized variance based on the MMS noise contaminated returns by RV^MMS and the realized variance based on the efficient return process by RV*, we can rewrite RV^MMS over day t as

RV^MMS_{t,t+1} = RV*_{t,t+1} + Σ_{i=1}^m ε²_{i,t} + 2 Σ_{i=1}^m ε_{i,t} Y*_{i,t},    (1.27)

where the number of intra-day observations is given by m := Δ⁻¹.
The idea is to noise correct the GMM estimation approach from Bollerslev and Zhou (2002) by adjusting the moment conditions from (1.26) such that they hold for RV^MMS. In order to do so, we have to extend the filtration we condition on to a larger filtration, making RV^MMS measurable w.r.t. that filtration. The moment conditions from Bollerslev and Zhou (2002) were derived using the sigma-algebra G_t = σ(IV_{t−s−1,t−s} | s = 0, 1, 2, ...), which was approximated by G_t = σ(RV*_{t−s−1,t−s} | s = 0, 1, 2, ...). Instead of G_t, we will now consider the larger filtration, H_t, generated by RV*, the efficient return process, Y*, and the noise process, ε, up until the beginning of day t. Define

H_t := σ(RV*_{t−s−1,t−s}, Y*_{i,t−s−1}, ε_{i,t−s−1} | s = 0, 1, 2, ... and i = 1, 2, ..., m),
where Y*_{i,t−1} and ε_{i,t−1} for i = 1, 2, ..., m denote the intra-day returns of the efficient price process and the MMS noise process during day t − 1, respectively. We now consider how to extend the first conditional moment condition from (1.24), using the decomposition from (1.27):

Eθ[RV^MMS_{t+1,t+2} − δRV^MMS_{t,t+1} − β | H_t] = Eθ[RV*_{t+1,t+2} − δRV*_{t,t+1} − β | H_t]    (1.28)
    + Eθ[Σ_{i=1}^m ε²_{i,t+1} − δ Σ_{i=1}^m ε²_{i,t} | H_t]    (1.29)
    + 2Eθ[Σ_{i=1}^m ε_{i,t+1} Y*_{i,t+1} − δ Σ_{i=1}^m ε_{i,t} Y*_{i,t} | H_t].    (1.30)
Let us first consider (1.28). From the moment conditions used in the no noise case we know that we approximately get a zero when conditioning on G_t in (1.28) instead. Under the assumption that this approximation still holds when information on the efficient return series, Y*, up until time t is added to the sigma-algebra G_t, (1.28) will also approximately equal zero, since both the RV* and Y* series are independent of the noise process. We do not use overnight returns, so the MA(1) structure in the noise process only holds within the trading day. This means that it will not impact the calculation of the conditional expectation (1.29), which will just equal the unconditional expectation (1 − δ)2ω²m. Since, as just discussed, the noise process is independent of any realizations from previous days and has mean zero, the conditional expectation (1.30) will just equal zero. All in all, this leaves us with the noise adjusted moment condition

Eθ[RV^MMS_{t+1,t+2} − δRV^MMS_{t,t+1} − β − (1 − δ)2ω²m | H_t] ≈ 0.    (1.31)
As in the no MMS noise case, we will augment the conditional moment condition (1.31) by RV^MMS_{t−1,t} and (RV^MMS_{t−1,t})² to get two additional moment conditions.
Turning our attention to the second conditional moment condition from (1.24), we wish to compute

Eθ[(RV^MMS_{t+1,t+2})² − H(RV^MMS_{t,t+1})² − I(RV^MMS_{t,t+1}) − J | H_t].    (1.32)
This task is, however, not feasible, since it involves computing Eθ[Y*²_{i,t+1} | H_t] and Eθ[Y*²_{i,t} | H_t]. If this were possible, we could use these expressions to form martingale estimating functions and would not need to use PBEFs for the estimation of our SV-model. The problem is that we do not have an analytical expression for the conditional expectation of the squared returns during day t + 1 and day t given the filtration generated by the return series up until time t. Instead, we will settle for four moment conditions and simply use the unconditional expectation of (1.32), given by

Eθ[(RV^MMS_{t+1,t+2})² − H(RV^MMS_{t,t+1})² − I(RV^MMS_{t,t+1}) − J − K − L] ≈ 0,    (1.33)

where

K = (1 − H)(4m²ω⁴ + 4mω²α + 12mω⁴ − 4mω⁴ + 8ω²α),
L = −2mω²I.
The derivation of the moment condition above can be found in Appendix C.
Non-parametrically Correcting for Noisy Data in the GMM Approach
In the presence of i.i.d. MMS noise that is independent of the efficient log-price
process, Hansen and Lunde (2006) show that the bias in RV equals 2∆−1ω2. In fact,
the variance of RV also diverges to infinity as the sampling frequency increases. In
the setting with MMS noise, RV is no longer a consistent estimator of IV . We will
therefore use a noise robust estimate of IV when constructing the GMM estimator.
Instead of basing the estimation procedure on the time-series of daily RV , we will use
the time series of daily realized kernels (RK ) from Barndorff-Nielsen et al. (2008a).
The estimator is then constructed using the moment conditions (1.26), replacing RV with RK. We use the flat-top Tukey-Hanning₂ kernel, since the resulting RK is closest to being efficient in the setting of i.i.d. noise that is independent of the efficient price process. As for the bandwidth, H, we follow the asymptotic derivations from Barndorff-Nielsen et al. (2008a) and let H ∝ (1/Δ)^{1/2}, in order to obtain the optimal rate of convergence, (1/Δ)^{1/4}, of RK to IV.13
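A minimal sketch of a flat-top realized kernel with the Tukey-Hanning₂ weight k(x) = sin²((π/2)(1 − x)²) might look as follows; the end-point treatment (jittering) discussed in Barndorff-Nielsen et al. (2008a) is omitted for brevity:

```python
import math

def realized_kernel(returns, H):
    """Flat-top realized kernel estimate of IV from intra-day returns,
    using the Tukey-Hanning2 weight k(x) = sin^2((pi/2)(1-x)^2).
    Sketch only: no end-point (jittering) correction."""
    n = len(returns)

    def gamma(h):
        # h-th realized autocovariance of the return series
        h = abs(h)
        return sum(returns[i] * returns[i - h] for i in range(h, n))

    rk = gamma(0)
    for h in range(1, H + 1):
        # flat-top convention: the weight is evaluated at (h - 1) / H
        k = math.sin(math.pi / 2 * (1 - (h - 1) / H) ** 2) ** 2
        rk += k * (gamma(h) + gamma(-h))
    return rk
```

With a single nonzero return all autocovariances vanish and the estimator reduces to the realized variance, which serves as a basic sanity check.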
1.3 A Monte Carlo Study of the Finite Sample Performances
1.3.1 The Setup and the Case without Noisy Data
In this subsection and the following, the finite sample performances of the PBEF-
based estimator from Sørensen (2000) and the GMM estimator from Bollerslev and
Zhou (2002) are investigated in a Monte Carlo study. This subsection investigates
the potential of using the intra-day returns directly in the PBEF-based estimator in
a setting without MMS noise. The benchmark used for evaluating the performance
of the PBEF-based method is the GMM approach from the previous subsection. In
the next subsection, we first consider a setup with mild model misspecification, in
the sense that we now add MMS noise to the simulated data and investigate how
this impacts the two estimation methods. Afterwards, the performance of the noise
corrected estimation methods are studied.
The data used for constructing the estimators are simulated realizations from the
Heston model (1.2). We use a first-order Euler scheme to simulate the volatility- and
log-price processes. The log-price is sampled every 30 seconds in the artificial 6.5
hours of daily trading, for sample sizes of T = 100,400,1000 and 4000 trading days.
Using the simulated data, daily realized variances based on the artificial five-minute
returns are constructed. We will think of the five-minute returns as our available data.
Since we are using five-minute returns over 6.5 hours of trading, we have ∆= 1/78.
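The simulation design just described can be sketched as follows. We assume the no-leverage specification with independent Brownian motions and initialize the volatility at its long-run mean; both choices are ours for illustration:

```python
import numpy as np

def simulate_heston_rv(kappa, alpha, sigma, days, seed=0):
    """First-order Euler scheme for a no-leverage Heston model: log-price on a
    30-second grid over a 6.5-hour day (780 steps, time in days), with daily RV
    built from the 78 five-minute returns.  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    steps, dt = 780, 1.0 / 780
    v = alpha                                # start volatility at its long-run mean
    rv = np.zeros(days)
    for t in range(days):
        z = rng.standard_normal((steps, 2))  # independent shocks: no leverage
        incs = np.zeros(steps)
        for i in range(steps):
            vp = max(v, 0.0)                 # guard against Euler negativity
            incs[i] = np.sqrt(vp * dt) * z[i, 0]
            v += kappa * (alpha - v) * dt + sigma * np.sqrt(vp * dt) * z[i, 1]
        five_min = incs.reshape(78, 10).sum(axis=1)   # aggregate to 5-min returns
        rv[t] = np.sum(five_min**2)
    return rv
```

In Scenario 2 the daily RV series should fluctuate around the long-run mean α = 0.25.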
To get a better grasp of the finite sample performance of the estimator based on
PBEFs, as well as the GMM estimator, we conduct our Monte Carlo experiment in
three different scenarios of parameter configurations.
• Scenario 1: (κ, α, σ) = (0.03, 0.25, 0.10). The volatility process is highly persistent (near unit-root). The autocorrelation function is given by r(u;θ) = e^{−κu}, so the correlation between the volatility process sampled five minutes apart equals e^{−0.03/78} ≈ 0.9996. The half-life of the volatility process equals 23.1 days.
13 For further details on how the bandwidth is chosen, consult Section 4 of Barndorff-Nielsen et al. (2008a).
• Scenario 2: (κ,α,σ) = (0.10,0.25,0.10). Here we have a slightly less persistent
volatility process due to the increase in the mean-reversion parameter. The
half-life now equals 6.93 days.
• Scenario 3: (κ,α,σ) = (0.10,0.25,0.20). The local variance of volatility is now
increased. This process is also close to the non-stationary region since the
CIR process is stationary if and only if 2κα ≥ σ2, and here 2κα−σ2 = 0.01
(compared to 0.04 in scenario 2).
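The persistence figures quoted in the scenarios follow directly from κ; a quick check (time in days, Δ = 1/78):

```python
import math

# Sanity checks on the quantities quoted for the three scenarios.
def half_life(kappa):
    return math.log(2) / kappa           # solves e^{-kappa * h} = 1/2

def feller_margin(kappa, alpha, sigma):
    return 2 * kappa * alpha - sigma**2  # stationarity margin 2κα − σ²

print(round(math.exp(-0.03 / 78), 4))        # 0.9996: 5-min autocorrelation, Scenario 1
print(round(half_life(0.03), 1))             # 23.1 days, Scenario 1
print(round(half_life(0.10), 2))             # 6.93 days, Scenarios 2 and 3
print(round(feller_margin(0.10, 0.25, 0.20), 2))  # 0.01, Scenario 3
print(round(feller_margin(0.10, 0.25, 0.10), 2))  # 0.04, Scenario 2
```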
The same scenarios were considered in the Monte Carlo study conducted in Bollerslev
and Zhou (2002). In Bollerslev and Zhou (2002), the authors only consider the rather
large sample sizes T = 1000 and T = 4000, corresponding to 4 and 16 years of data.
Our Monte Carlo study therefore also contributes by investigating the usability of this
method when less data are available. We impose strict positivity of the parameter
estimates κ, α and σ and use the true values of θ as starting values in the numerical
routines. In each case the number of Monte Carlo replications is 1000.
An interesting question that arises when considering PBEFs is how to optimally choose q. No theory exists for this choice, as it would require knowledge of the intractable conditional expectation Eθ[f(Y_i) | Y_1, ..., Y_{i−1}] that we wish to approximate. One approach could be to consider the partial autocorrelation function of f(Y_i) and the functions used for predicting f(Y_i), and then choose q as the cut-off point where the function dies out. In our setting this corresponds to considering the partial autocorrelation function of the squared returns. However, inverting the covariance matrix C(θ) can cause numerical challenges and inaccuracies for large values of q. Instead, we start at the smallest interesting choice, q = 3, and later investigate the sensitivity w.r.t. the choice of q.14
When the parameters θ = (κ, α, σ) are estimated, we minimize Gn(θ)′Gn(θ) instead of solving Gn(θ) = 0. In the implementation of the GMM estimation procedure, we use continuously updated GMM, where the weight matrix is estimated simultaneously with the parameters θ. The asymptotic covariance matrix of gT(θ) is estimated using the heteroskedasticity and autocorrelation consistent estimator from Newey and West (1987). Regarding the lag length in the Bartlett kernel, we follow the rule-of-thumb from Newey and West (1987), that is, ⌊4(T/100)^{2/9}⌋. The results on the finite sample performance of the two estimation methods in the absence of noise are summarized in Table 1.1.
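For the four sample sizes considered, the rule-of-thumb gives the following Bartlett lag lengths:

```python
import math

def nw_lag(T):
    """Newey-West rule-of-thumb lag length, floor(4 * (T / 100) ** (2 / 9))."""
    return math.floor(4 * (T / 100) ** (2 / 9))

print([nw_lag(T) for T in (100, 400, 1000, 4000)])  # [4, 5, 6, 9]
```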
The potential of PBEFs is clear from Panel A of Table 1.1. The estimators are practically unbiased for the large sample sizes, T = 1000 and T = 4000, and the bias for the small sample sizes is of an acceptable size. For the small sample sizes, there is a small downwards bias in κ and a more pronounced upwards bias in σ. The root mean square relative errors (root MSRE) behave as expected, decaying with T and
14 q = 2 would automatically result in an optimal PBEF, because the weight matrix A(θ) would then be a 3×3 matrix and could be disregarded when solving Gn(θ) = 0.
Table 1.1. Performance of estimators in absence of noise.

Panel A: PBEF based estimator with q = 3.

                           Relative Bias (%)                                     Root MSRE (%)
T =            100             400             1000            4000             100     400     1000    4000

Scenario 1
κ = 0.03   -6.924 [0.295]  -3.386 [0.173]  -1.467 [0.111]  -0.279 [0.044]    11.97   6.648   3.975   1.499
α = 0.25   -1.983 [1.590]  -2.150 [0.893]  -0.929 [0.594]  -0.256 [0.311]    52.74   29.67   19.70   10.30
σ = 0.10   19.97  [0.974]   8.368 [0.434]   3.486 [0.260]   0.631 [0.093]    37.96   16.64   9.309   3.151

Scenario 2
κ = 0.10   -1.207 [0.113]  -0.165 [0.025]  -0.062 [0.018]  -0.012 [0.007]    3.934   0.842   0.584   0.245
α = 0.25    0.185 [0.570]  -0.021 [0.293]  -0.265 [0.186]  -0.096 [0.096]    18.91   9.726   6.172   3.175
σ = 0.10    2.940 [0.275]   0.348 [0.048]   0.132 [0.034]   0.026 [0.015]    9.594   1.623   1.146   0.486

Scenario 3
κ = 0.10   -4.418 [0.192]  -1.436 [0.111]  -0.373 [0.058]  -0.129 [0.025]    7.750   3.942   1.956   0.837
α = 0.25   -2.285 [1.101]  -1.244 [0.567]  -0.337 [0.364]   0.298 [0.184]    36.59   18.83   12.06   6.104
σ = 0.20   10.92  [0.527]   3.347 [0.254]   0.853 [0.122]   0.275 [0.051]    20.61   9.065   4.148   1.719

Panel B: GMM with daily realized variance from five-minute returns.

                           Relative Bias (%)                                     Root MSRE (%)
T =            100             400             1000            4000             100     400     1000    4000

Scenario 1
κ = 0.03   180.7  [10.34]  36.88  [2.127]  15.26  [1.110]   4.925 [0.518]    386.5   79.59   39.85   17.85
α = 0.25   226.5  [47.73]   1.956 [2.677]  -3.437 [0.620]  -2.567 [0.311]    1594    88.76   20.84   10.64
σ = 0.10   -3.781 [0.892]  -0.762 [0.367]   0.900 [0.232]   2.053 [0.127]    29.73   12.18   7.743   4.690

Scenario 2
κ = 0.10   54.14  [3.840]  13.60  [1.172]   7.114 [0.644]   2.370 [0.291]    138.3   41.15   22.51   9.948
α = 0.25   51.57  [24.82]  -2.793 [0.303]  -2.640 [0.187]  -1.885 [0.095]    824.0   10.43   6.747   3.668
σ = 0.10    3.609 [1.036]   5.573 [0.390]   6.584 [0.234]   6.977 [0.115]    34.52   14.07   10.16   7.946

Scenario 3
κ = 0.10   74.58  [4.084]  22.22  [1.215]  10.67  [0.673]   4.222 [0.323]    154.5   45.99   24.73   11.51
α = 0.25   33.65  [11.96]  -7.543 [0.585]  -5.095 [0.364]  -2.806 [0.181]    397.6   20.82   13.10   6.629
σ = 0.20   -0.839 [0.643]   0.402 [0.278]   1.301 [0.170]   2.038 [0.086]    21.31   9.234   5.796   3.504

The table reports the relative bias (with Monte Carlo std. error in brackets) and the root mean squared relative error (root MSRE) of the estimates from the sub-optimal PBEF and the RV-based GMM estimation procedure in the three different scenarios of parameter configurations. The number of Monte Carlo replications is 1000.
are roughly halved when the sample size grows from T = 100 to T = 400 and from
T = 1000 to T = 4000. The mean-reversion rate, κ, is most accurately estimated in
terms of root MSRE, whereas the other drift parameter, α, has the highest root MSRE.
All three parameters are easiest to estimate in Scenario 2, where the volatility process
is less persistent and less volatile. This could be because the other two scenarios are
closer to the non-stationary region where the Feller condition is violated. In Scenario
1, with a highly persistent volatility process, the root MSREs are higher compared to
the other two scenarios.
Turning our attention to Panel B of Table 1.1, we see that the GMM estimator from
Bollerslev and Zhou (2002) performs poorly when the sample size is small. In fact,
the results for T = 100 indicate that the method is not working for sample sizes this
small. For the larger sample sizes, our results match those found in Bollerslev and
Zhou (2002). The table reveals an upwards bias in κ and a smaller downwards bias in
α. The volatility of volatility parameter σ has a small, yet systematic, upwards bias
that actually seems to worsen when the sample size increases.15 The drift parameters
again appear to be easiest to estimate in Scenario 2. In contrast to the results for the
PBEF-based estimator, the most accurate estimates of σ are now found in Scenario 3,
where the volatility of volatility is high.
If we compare Panel A and B of Table 1.1, it is clear that the PBEF-based method
outperforms the GMM approach, especially when the sample size is small. The
informational content of 100 observations of daily realized variance is too small
to fully extract the dynamics of the underlying volatility process compared to 7800
observations of intra-day squared returns, and in general it seems that the PBEF-
based estimator is able to exploit the extra information contained in the intra-daily
returns. Furthermore, a GMM estimator based on 100 observations will, in general,
often result in inaccurate estimates. The gains from using PBEFs are most prominent
for the mean-reversion rate, κ, whereas the root MSREs for α are similar across
the two estimation methods for the larger sample sizes. The gains from using the
PBEF-based method might be even larger if the optimal PBEF was used. As already
discussed, the optimal PBEF could be constructed by simulating the optimal weight
matrix A∗(θ), but this would render the Monte Carlo study more time consuming.
Besides, the aim of the study is to investigate the potential of PBEFs by comparing the
performance of two easily implementable simulation-free estimation methods, and
the promising results for the sub-optimal PBEF only leave room for minor efficiency
gains. The study of the performance of the optimal PBEF is left for future research.
15 This bias in σ can, as found in Bollerslev and Zhou (2002), be explained by the variance of the discretization error u_{t,t+1} := RV_{t,t+1} − IV_{t,t+1}, since Barndorff-Nielsen and Shephard (2002) show that RV²_{t,t+1} is, for any fixed sampling frequency, an upwards biased estimator of IV²_{t,t+1}. To account for this discretization error, Bollerslev and Zhou (2002) introduce a nuisance parameter, γ, and approximate IV²_{t+1,t+2} by RV²_{t+1,t+2} − γ. We also implemented this simple discretization error correction and found, in line with Bollerslev and Zhou (2002), that it helps remove the systematic bias in σ, but also roughly doubles the root MSRE. We will therefore proceed without this correction in the rest of our Monte Carlo study. The results with discretization error correction are available upon request.
[Figure 1.1 about here: three panels plotting the normalized root MSRE for κ, α, and σ against the choice of q (q = 2, ..., 10), for Scenarios 1-3.]
Figure 1.1. Normalized root MSRE for the three parameter estimates in Scenarios 1-3, with T = 1000, plotted as a function of the number of predictor variables q. The root MSREs are in each case normalized to have a minimum of 1.
To investigate the optimal choice of q in the PBEF, the impact on the root MSRE from increasing q is examined. In Figure 1.1 the three different scenarios are considered for T = 1000, and the root MSREs for the three parameters are plotted against the choice of q. The shapes of the plots for κ and σ look almost identical. In Scenarios 1 and 3, where the volatility process is close to the non-stationary region, q = 3 seems to be the optimal choice for κ and σ. In Scenario 2, q = 6 appears to be the optimal choice for both parameters. Looking at the three plots for α, there does not seem to be much variation in the root MSRE across the choice of q. In Scenarios 1 and 3, where we are close to the non-stationary region, the root MSRE decreases when q increases. We nevertheless chose q = 3 in our Monte Carlo study, since the variations in the root MSREs are small. As already discussed, the optimal choice of q might depend on the PBEF under consideration, that is, on the choice of f and the functional form of the basis elements in the predictor space. In the rest of our Monte Carlo study we will fix q = 3.
1.3.2 Including Noise in the Observations
In this section, the impact of MMS noise on the parameter estimates from the two
estimation procedures is investigated. We consider the noise level ω² = 0.001, as this
choice is in line with the empirical estimates found for stock returns in Hansen and
Lunde (2006).16 First, we will simulate data from the Heston model with the inclusion
of MMS noise and then perform parameter estimation ignoring the presence of noise.
The resulting estimates will be analyzed in the next subsection. Then, in the following
subsection, the finite sample performance of the noise corrected estimators will be
investigated.
The Impact of Failing to Correct for the Presence of Noise
The finite sample performances of the two estimation methods without noise cor-
rection are summarized in Table 1.2. Panel A of the table reports the results for the
PBEF-based estimator, and Panel B reports the results for the GMM estimator based
on RV .
From the results it follows that the inclusion of MMS noise in the observed process
leads to biases in the parameters. Panel A of the table shows that the downwards bias
in κ has worsened in the presence of noise and the small downwards bias in α has
turned into a severe upwards bias. The bias in α matches exactly the expected bias, 2ω²/Δ, which in relative terms equals 62.4%. The upwards bias in σ has also worsened,
but α is the parameter that is most affected by ignoring the presence of noise. For all
three parameters, the highest root MSREs still occur in Scenario 1, and the lowest in
Scenario 2. In Panel B, the results for the performance of the GMM estimator based
on RV reveal that the bias in κ has roughly doubled compared to the no noise setting,
16 We also considered the noise level ω² = 0.0005; the results are available upon request.
Table 1.2. Performance of estimators in presence of noise, ω² = 0.001.

Panel A: PBEF based estimator with q = 3.

                           Relative Bias (%)                                     Root MSRE (%)
T =            100             400             1000            4000             100     400     1000    4000

Scenario 1
κ = 0.03   -9.624 [0.436]  -8.276 [0.239]  -7.992 [0.148]  -7.544 [0.055]    17.37   11.46   9.383   7.764
α = 0.25   60.40  [1.589]  60.22  [0.893]  61.46  [0.594]  62.14  [0.311]    80.14   67.11   64.53   62.99
σ = 0.10   36.08  [1.915]  22.65  [0.907]  19.58  [0.504]  17.29  [0.147]    73.02   37.63   25.74   17.96

Scenario 2
κ = 0.10   -5.750 [0.091]  -5.485 [0.036]  -5.490 [0.022]  -5.563 [0.012]    6.492   5.614   5.539   5.577
α = 0.25   62.66  [0.572]  62.40  [0.295]  62.13  [0.186]  62.31  [0.096]    65.46   63.16   62.44   62.39
σ = 0.10   12.88  [0.249]  11.92  [0.087]  11.90  [0.052]  12.06  [0.028]    15.30   12.27   12.03   12.09

Scenario 3
κ = 0.10   -8.494 [0.291]  -7.842 [0.132]  -7.647 [0.068]  -7.717 [0.031]    12.85   8.983   7.976   7.784
α = 0.25   60.10  [1.107]  61.18  [0.567]  62.12  [0.364]  62.71  [0.184]    70.42   64.00   63.28   63.01
σ = 0.20   24.43  [1.154]  18.38  [0.422]  17.16  [0.180]  17.16  [0.077]    45.39   23.10   18.17   17.35

Panel B: GMM with daily realized variance from five-minute returns.

                           Relative Bias (%)                                     Root MSRE (%)
T =            100             400             1000            4000             100     400     1000    4000

Scenario 1
κ = 0.03   251.2  [28.73]  48.49  [2.902]  24.77  [1.479]  13.07  [0.678]    982.2   107.7   54.93   26.00
α = 0.25   466.1  [55.44]  82.07  [11.53]  57.64  [0.615]  58.53  [0.310]    1890    390.9   61.13   59.42
σ = 0.10   -21.71 [1.531]  -23.15 [0.589]  -21.23 [0.373]  -19.48 [0.180]    55.06   30.28   24.56   20.37

Scenario 2
κ = 0.10   70.58  [7.012]  14.78  [1.829]  10.29  [1.020]   5.459 [0.459]    242.6   62.42   35.35   16.17
α = 0.25   212.4  [29.47]  59.98  [0.471]  59.10  [0.191]  59.67  [0.096]    998.5   61.98   59.44   59.76
σ = 0.10   -5.983 [1.735]  -10.01 [0.728]  -7.537 [0.409]  -7.136 [0.192]    57.75   26.13   15.51   9.559

Scenario 3
κ = 0.10   99.39  [9.789]  35.58  [1.518]  20.86  [0.797]  14.58  [0.395]    339.4   61.63   33.67   19.59
α = 0.25   210.8  [41.62]  54.90  [0.607]  56.99  [0.372]  58.11  [0.190]    1396    58.47   58.31   58.45
σ = 0.20   -21.13 [0.974]  -20.20 [0.291]  -19.61 [0.176]  -18.99 [0.091]    38.58   22.38   20.46   19.23

The table reports the relative bias (with Monte Carlo std. error in brackets) and the root mean squared relative error (root MSRE) of the estimates from the sub-optimal PBEF and the RV-based GMM estimation procedure in the three different scenarios of parameter configurations. The number of Monte Carlo replications is 1000.
and it is approximately twice the size of the bias in κ in Panel A. As for the other drift
parameter, the downwards bias in α has been turned into a severe upwards bias of a
similar size as the one found in Panel A. The sign of the bias in σ has also changed,
and the table now reports a downwards bias around the same size as the upwards
bias found in Panel A.
The root MSREs reported in Table 1.2 have all gone up compared to the no noise
case from Table 1.1, and the results show that failing to correct for noise strongly
impacts the parameter estimates. The long-run mean of volatility, α, appears to be
affected the most. As in the no noise setting of Table 1.1, the performance of the
GMM estimator is quite poor for small sample sizes, and it only produces trustworthy
results when T = 400 and we are in Scenario 2 or 3. When the larger sample sizes
T = 1000 and T = 4000 are considered, the root MSREs of κ are higher in Panel B, but
the impact of noise on the root MSREs of α and σ appears to be the same across the
two estimation methods.
Finite Sample Performances of the Noise Corrected Estimation Procedures
The performances of the estimator based on the noise corrected PBEF and the GMM
estimator where noise is corrected for parametrically are reported in Table 1.3. Table
1.4 presents the results for the GMM estimator based on RK .
The results in Panel A of Table 1.3 show that the noise corrected PBEF based
estimation procedure is able to correctly account for the presence of noise and
produce unbiased estimates for the larger sample sizes. In fact, the small upwards
bias in σ found in Table 1.1 has decreased significantly for the smaller sample sizes,
and it disappears as the sample size grows. The variance of the noise process, ω2, is
also accurately estimated. The root MSREs for κ are in general very similar to those
reported in Table 1.1. For the other drift parameter, α, the root MSREs have now
gone down compared to the no noise setting. Due to the bias reduction in σ, the root MSREs for σ are also lower for the smaller sample sizes T = 100 and T = 400 than those reported in Table 1.1. For the larger sample sizes T = 1000 and T = 4000, the
root MSREs for σ are a bit higher or similar to those found in Table 1.1. For all three
parameters, the root MSREs are smallest in Scenario 2, as was also the case in the no
noise setting.
The performance of the parametrically noise corrected GMM estimator is sum-
marized in Panel B of Table 1.3. An inspection of the results shows that although the
GMM estimator does not produce unbiased estimates, it succeeds at incorporating the noise and produces estimates with small biases, comparable to those from Table 1.1.
In fact, the biases are now a bit lower than in Table 1.1, with the difference being most
apparent in Scenario 3. The results for the smallest sample size, T = 100, appear unre-
liable, in the sense that 100 observations are simply not enough to infer the dynamics
of the underlying volatility process and produce estimates with an acceptable level of
bias and root MSREs. Comparing the root MSREs to those reported in Table 1.1, we
24 CHAPTER 1. PBEFS FOR STOCHASTIC VOLATILITY MODELS WITH NOISY DATA
Table 1.3. Performance of noise corrected estimators, ω2 = 0.001.

Panel A: PBEF based estimator with q = 3.

                          Relative Bias (%)                                             Root MSRE (%)
             T = 100         T = 400         T = 1000        T = 4000           100    400    1000   4000

Scenario 1
κ = 0.03     -2.119 [0.398]  -1.157 [0.190]  -0.302 [0.106]  -0.191 [0.060]     13.35  6.388  3.535  1.986
α = 0.25     -0.945 [1.166]  -1.326 [0.657]  -0.549 [0.432]  -0.130 [0.225]     38.66  21.81  14.35  7.475
σ = 0.10      8.567 [0.676]   3.595 [0.414]   0.985 [0.221]   0.505 [0.123]     24.00  14.17  7.389  4.097
ω2 = 0.001   -1.680 [0.699]  -1.359 [0.386]  -0.624 [0.263]  -0.200 [0.139]     23.22  12.87  8.742  4.614

Scenario 2
κ = 0.10      0.230 [0.156]   0.090 [0.047]   0.019 [0.030]   0.020 [0.014]     5.176  1.567  0.993  0.469
α = 0.25      0.185 [0.413]   0.016 [0.213]  -0.185 [0.135]  -0.060 [0.069]     13.71  7.068  4.464  2.300
σ = 0.10      0.023 [0.189]  -0.114 [0.085]  -0.009 [0.058]  -0.033 [0.029]     6.278  2.805  1.928  0.947
ω2 = 0.001    0.121 [0.261]  -0.023 [0.134]  -0.130 [0.084]  -0.052 [0.044]     8.652  4.456  2.796  1.456

Scenario 3
κ = 0.10     -1.743 [0.247]  -0.297 [0.110]  -0.134 [0.077]  -0.088 [0.042]     8.356  3.649  2.554  1.403
α = 0.25     -1.350 [0.817]  -0.742 [0.416]  -0.163 [0.267]   0.251 [0.134]     27.13  13.79  8.865  4.454
σ = 0.20      5.617 [0.520]   0.991 [0.226]   0.466 [0.159]   0.235 [0.086]     18.13  7.545  5.290  2.864
ω2 = 0.001   -1.512 [0.483]  -0.761 [0.260]  -0.178 [0.170]   0.096 [0.082]     16.07  8.661  5.648  2.731

Panel B: GMM with daily realized variance from five-minute returns.

                          Relative Bias (%)                                             Root MSRE (%)
             T = 100         T = 400         T = 1000        T = 4000           100    400    1000   4000

Scenario 1
κ = 0.03      227.0 [36.85]   37.05 [2.709]   13.99 [1.452]   3.166 [0.683]     1226   97.15  50.13  22.88
α = 0.25      504.5 [94.12]   1.871 [2.559]  -2.444 [0.468]  -1.674 [0.228]     3118   84.86  15.71  7.732
σ = 0.10      2.073 [5.774]  -5.076 [0.816]  -1.670 [0.440]   0.909 [0.216]     188.8  27.52  14.70  7.209
ω2 = 0.001    22.76 [3.409]   5.935 [1.124]  -0.732 [0.300]  -1.308 [0.143]     113.8  37.74  9.987  4.912

Scenario 2
κ = 0.10      78.30 [6.307]   11.78 [1.773]   6.247 [1.000]   1.541 [0.459]     223.1  59.96  33.75  15.29
α = 0.25      209.6 [59.22]  -2.932 [0.511]  -1.869 [0.147]  -1.248 [0.071]     1973   17.21  5.219  2.662
σ = 0.10      6.528 [2.338]   3.196 [0.832]   6.355 [0.451]   6.846 [0.214]     77.73  27.78  16.25  9.855
ω2 = 0.001    18.27 [3.274]   1.651 [0.392]  -1.276 [0.118]  -1.736 [0.050]     110.0  13.11  4.109  2.401

Scenario 3
κ = 0.10      69.43 [4.119]   17.70 [1.287]   7.345 [0.721]   2.263 [0.373]     153.1  46.20  25.01  12.57
α = 0.25      136.3 [54.52]  -2.967 [0.430]  -2.182 [0.266]  -1.422 [0.134]     1811   14.57  9.088  4.674
σ = 0.20     -1.539 [1.072]  -0.614 [0.342]   0.546 [0.208]   1.784 [0.109]     35.54  11.35  6.919  4.020
ω2 = 0.001    15.14 [4.856]  -1.739 [0.270]  -1.611 [0.162]  -1.402 [0.081]     161.5  9.123  5.597  3.035

The table reports the relative bias (with Monte Carlo std. error in brackets) and the root mean squared relative error (root MSRE) of the estimates from the noise corrected versions of the sub-optimal PBEF and the RV-based GMM estimation procedure in the three different scenarios of parameter configurations. The number of Monte Carlo replications is 1000.
Table 1.4. Performance of the GMM estimator based on RK, ω2 = 0.001.

                          Relative Bias (%)                                             Root MSRE (%)
             T = 100         T = 400         T = 1000        T = 4000           100    400    1000   4000

Scenario 1
κ = 0.03      282.0 [26.76]   36.34 [3.053]   13.81 [1.716]   3.484 [0.834]     928.8  107.5  58.55  27.85
α = 0.25      398.7 [62.41]   13.14 [5.635]  -4.466 [0.719]  -4.056 [0.308]     2103   187.3  24.25  10.98
σ = 0.10      15.83 [2.544]   3.289 [1.138]   6.062 [0.802]   9.292 [0.423]     85.63  37.85  27.26  16.83

Scenario 2
κ = 0.10      179.6 [30.65]   12.80 [2.324]   5.320 [1.375]   1.527 [0.655]     1031   78.10  45.88  21.75
α = 0.25      206.5 [32.43]   4.226 [2.124]  -3.348 [0.407]  -3.674 [0.095]     1094   70.54  13.89  4.840
σ = 0.10      50.92 [6.523]   23.22 [1.409]   26.45 [0.899]   28.85 [0.425]     222.1  52.15  39.85  32.10

Scenario 3
κ = 0.10      75.02 [4.856]   19.33 [1.403]   8.028 [0.779]   2.546 [0.389]     177.5  50.37  27.03  13.14
α = 0.25      166.0 [35.48]  -7.895 [0.584]  -5.964 [0.356]  -4.141 [0.179]     1187   20.90  13.23  7.238
σ = 0.20      2.936 [1.112]   6.264 [0.499]   7.654 [0.327]   9.132 [0.178]     36.95  17.69  13.26  10.87

The table reports the relative bias (with Monte Carlo std. error in brackets) and the root mean squared relative error (root MSRE) of the estimates from the RK-based GMM estimation procedure in the three different scenarios of parameter configurations. The number of Monte Carlo replications is 1000.
find that the root MSREs are in general similar to the no noise setting, with the root
MSREs for κ and σ being a bit bigger than in Table 1.1 and the root MSREs for α a bit
smaller. The patterns found in Panel B of Table 1.1 across the different scenarios are repeated: the drift parameters have lower root MSREs in Scenario 2, whereas σ is most accurately estimated in Scenario 3.
When comparing Panel A and B of Table 1.3, it is clear that the PBEF-based
estimation method produces more accurate parameter estimates. The root MSREs
of α are quite similar for the two methods, but for the other two parameters and the noise variance the root MSREs are lower in Panel A. The difference between the two
methods is most prominent for the mean-reversion parameter, which is extremely
well estimated with the PBEF-based method.
We also considered estimating the Heston model using noisy data, by simply
replacing RV with a realized kernel, RK , in the original moment conditions from
Bollerslev and Zhou (2002). The finite sample performance of this estimator is sum-
marized in Table 1.4. Even though the estimator is based on six moment conditions,
compared to the four moment conditions used for constructing the parametrically
noise corrected estimator, the performance is not nearly as good. The biases in the
parameters are larger, but have the same signs as in Panel B of Table 1.3. The small
systematic bias in σ due to the discretization error and noisy data has also increased
compared to Table 1.1. This could be explained by the slower rate of convergence of
RK to IV , compared to the convergence rate of RV . The biases in the drift parameters
are however comparable to those reported in the no noise setting, and for κ the bias
is in fact a bit lower. The two ways of correcting for noise in the GMM approach give
rise to somewhat similar results for the mean-reversion rate κ, but the root MSREs
are lower for α and σ when the noise corrected estimator based on RV is employed.
In the parametrically noise corrected GMM approach, the noise variance ω2 is also estimated. This was not possible in the non-parametric way of accounting for noise. Of
course, not having to specify the dynamics of the noise process could be an advantage
in other applications where model misspecification might occur. The investigation
of robustness towards misspecification of the noise process is outside the scope of
this chapter. Note that the noise specification also impacts the optimal choice of
kernel used for constructing the time-series of RK . In our setting, without model
misspecification, the parametric approach of correcting for the noise outperforms
the non-parametric approach.
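The two volatility proxies compared above can be sketched as follows; the Parzen weight function is standard, but the non-flat-top kernel form and the bandwidth H = 5 are illustrative assumptions rather than the exact implementation used in the chapter:

```python
import numpy as np

def parzen(x):
    """Parzen kernel weight with k(0) = 1 and support on [0, 1]."""
    x = abs(x)
    if x <= 0.5:
        return 1 - 6 * x**2 + 6 * x**3
    if x <= 1.0:
        return 2 * (1 - x)**3
    return 0.0

def realized_variance(r):
    """Daily realized variance: the sum of squared intra-day returns."""
    return float(np.sum(r**2))

def realized_kernel(r, H):
    """RV plus Parzen-weighted realized autocovariances (non-flat-top form)."""
    rk = float(np.sum(r**2))
    for h in range(1, H + 1):
        gamma_h = float(np.sum(r[h:] * r[:-h]))     # h-th realized autocovariance
        rk += 2 * parzen(h / (H + 1)) * gamma_h     # symmetric terms gamma_h and gamma_{-h}
    return rk

rng = np.random.default_rng(1)
r = 0.001 * rng.standard_normal(77)   # one day of hypothetical five-minute returns
rv = realized_variance(r)
rk = realized_kernel(r, H=5)
```

For noise-free i.i.d. returns the autocovariance terms are small, so RK and RV are close; in the presence of MMS noise it is the kernel weighting that removes the noise-induced bias of RV.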
In conclusion, the PBEF-based estimation method produced promising results,
and it appears that the method is able to exploit the extra information contained in the intra-day returns by using their dynamics directly, instead of aggregating them into
realized measures. When little data history is available, the difference between the
two methods also becomes more pronounced, as the PBEF-based method is based on
moments of high frequency data and not on moments of the daily realized measures.
The PBEF-based method also handles the presence of noise better than the GMM
approach, at least in our simulation setting.
It still remains to be investigated how the two estimation methods perform in an empirical application to economic data, such as stock returns, a task that we will
undertake in the following section.
1.4 Empirical Application
In this section we use actual five-minute returns as input in the two estimation
methods analyzed in our Monte Carlo study. We are well aware that the Heston model
might not fit the chosen data. This is however not the purpose of this exercise. The
empirical application should rather be seen as a check of what happens when the
estimation methods are used to fit a (possibly misspecified) model to real data. The
empirical application is also an investigation of how different choices, such as how to
correct for MMS noise in the GMM estimator and the choice of predictor space in the
flexible PBEF-based method, might affect the parameter estimates.
1.4.1 Data description
For our empirical illustration we use five-minute returns for SPDR S&P 500 (SPY).
SPY is an exchange traded fund (ETF) that tracks the S&P 500. The sample covers
the period from January 4, 2010 through December 31, 2013. We sample the first
price each day at 9:35 and then every 5 minutes until the close at 16:00. Thus, we
have 77 daily five-minute returns for each of the 1006 trading days in our sample,
yielding a total of 77462 five-minute returns. Careful data cleaning is important when
estimating volatility models from high-frequency data. Numerous problems and
solutions are discussed in Falkenberry (2001), Hansen and Lunde (2006), Brownlees
and Gallo (2006) and Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008b). We
follow the step-by-step cleaning procedure used in Barndorff-Nielsen et al. (2008b).
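The sampling scheme described above is easy to verify in code; the date below is arbitrary:

```python
from datetime import datetime, timedelta

# First price sampled at 9:35, then every five minutes until the 16:00 close.
start = datetime(2010, 1, 4, 9, 35)
close = datetime(2010, 1, 4, 16, 0)

grid = []
t = start
while t <= close:
    grid.append(t)
    t += timedelta(minutes=5)

n_prices = len(grid)              # price observations per day
n_returns = n_prices - 1          # five-minute returns per day
total_returns = n_returns * 1006  # 1006 trading days in the sample
```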
As a first inspection of the data characteristics, we consider the empirical auto-
correlation functions for the squared five-minute returns, reported in the top panel
of Figure 1.2. The autocorrelation function does not seem to be exponentially de-
caying, revealing that the Heston model will not be able to properly account for the
dynamics of the data. However, our main interest lies in investigating whether the two
estimation methods will yield similar parameter estimates or whether they perform
differently. The autocorrelation function also exhibits cyclical patterns corresponding
to a lag length of one trading day. This is due to the well-documented intra-day peri-
odicity in volatility in foreign exchange and equity markets, see for instance Andersen
and Bollerslev (1997) or Dacorogna, Müller, Nagler, Olsen, and Pictet (1993).
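A minimal sketch of the empirical ACF computation; the synthetic series below has a multiplicative period-77 pattern, mimicking the intra-day cycle, and is only meant to reproduce the qualitative shape of the top panel of Figure 1.2, not the SPY data themselves:

```python
import numpy as np

def acf(x, max_lag):
    """Empirical autocorrelation function for lags 1, ..., max_lag."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(xc**2)
    return np.array([np.sum(xc[h:] * xc[:-h]) / denom for h in range(1, max_lag + 1)])

rng = np.random.default_rng(7)
n = 77 * 250                                                  # 250 synthetic "days"
pattern = 1 + 0.5 * np.cos(2 * np.pi * np.arange(n) / 77)     # period-77 intra-day factor
x = pattern * rng.chisquare(1, size=n)                        # stand-in for squared returns
rho = acf(x, max_lag=154)
```

The deterministic pattern shows up as local peaks in the ACF at multiples of 77 lags, which is exactly the cyclical behaviour visible in the figure.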
[Figure: two panels plotting the empirical ACF against lag length j = 1, . . . , 770 (ticks at multiples of 77, i.e. one trading day), with values ranging from about −0.02 to 0.14; top panel: squared five-minute returns, bottom panel: adjusted squared five-minute returns.]

Figure 1.2. Autocorrelation function for the squared five-minute returns (top) and the adjusted squared five-minute returns (bottom) on SPY.
The intra-daily periodicity in the volatility might cause the two estimation meth-
ods to perform differently. The intra-daily periodicity should not affect the GMM esti-
mator much, since the intra-daily pattern in volatility will be “smoothed out” when
the five-minute returns are aggregated into the daily realized measures. The same
does not apply for the estimator based on PBEFs, since this estimator is based directly
on the squared five-minute returns. Hence, the intra-daily periodicity might affect
the parameter estimates when the PBEF-based estimation method is carried out. In
order to avoid this, the intra-daily volatility pattern is captured by fitting a spline function to the intra-daily averages of the squared five-minute returns using a non-parametric kernel regression.17
The data are then adjusted for periodicity in intra-daily volatility by dividing the
squared returns by the fitted values from the spline function, matched according to
the intra-daily five-minute interval in which the observation falls. Finally, the squared
returns are normalized such that the overall variance of the squared returns remains
unchanged. The bottom panel of Figure 1.2 displays the autocorrelation function
for the adjusted data. From the figure it is clear that the intra-daily periodicity has
been removed. It is however also evident that the autocorrelation function is not
exponentially decaying, rendering the Heston model a poor model choice. There
seems to be a need for at least a two factor SV model, in order to properly capture
the dynamics of the autocorrelation function. One factor is needed for capturing
the fast decay in the autocorrelation function at the short end, whereas the other
factor should be slowly mean-reverting and thereby account for the persistent or long
memory-like factor in the variance.
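The three adjustment steps above (intra-day averages, division by the fitted pattern, renormalization) can be sketched as below. The MATLAB smoothing spline of footnote 17 is replaced by a crude moving-average smoother, so this is an illustration of the procedure, not the exact implementation:

```python
import numpy as np

def adjust_periodicity(sq_returns):
    """Remove the intra-day volatility pattern from squared five-minute returns.

    sq_returns: (n_days, 77) array of squared returns, one row per trading day.
    """
    pattern = sq_returns.mean(axis=0)                    # intra-day averages per interval
    kernel = np.ones(5) / 5
    smooth = np.convolve(pattern, kernel, mode="same")   # stand-in for the smoothing spline
    adjusted = sq_returns / smooth                       # divide by the fitted intra-day factor
    # renormalize so the overall variance of the squared returns is unchanged
    adjusted *= sq_returns.std() / adjusted.std()
    return adjusted

rng = np.random.default_rng(2)
pattern = 1 + 0.5 * np.cos(np.linspace(0, 2 * np.pi, 77))   # U-shaped intra-day factor
sq = pattern * rng.chisquare(1, size=(250, 77))             # synthetic squared returns
adj = adjust_periodicity(sq)
```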
1.4.2 Estimation results
In the Heston model, the decay rate of the autocorrelation function for the squared
returns is uniquely governed by the mean-reversion parameter κ. Due to the dynamic
structure of the autocorrelation function, discussed above, the choice of prediction
space might heavily influence the estimated value of κ. Depending on the largest
time lag of past squared returns included in the predictor space, different dynamics
might be captured. When fitting the Heston model to the adjusted data, we hold the
dimension of the predictor space fixed at 4 (q = 3), but consider four different choices
of basis elements, spanning the predictor space. The four cases correspond to having
1 hour, 1 day, 2 days, and 4 days between each of the basis elements.18 The time lag
between the basis elements will be denoted by the variable l . The simple choice of
weight matrix used in the Monte Carlo study is also employed when constructing the
PBEFs. We will only consider the noise corrected estimators, as MMS noise is a stylized
feature of the present data. The variance of the noise process is not estimated in the
GMM approach based on RK , so instead we report the non-parametric estimate of
the daily MMS noise variance, using the formula favored in Barndorff-Nielsen et al.
(2008a)
ω̂2 = exp[log(ω̃2) − RK/RV],

with RK and RV constructed using the intra-daily adjusted five-minute returns and where ω̃2 = RV/(2m) denotes the naive noise variance estimate. In our case m = 77, and the overall estimate of the MMS noise variance is found by averaging the 1006 daily estimates. For the SPY data we find ω2avg = 0.001563. The realized kernel is now computed using the Parzen kernel with H ∝ (1/∆)^(3/5), as recommended in Barndorff-Nielsen et al. (2008b) for empirical applications. The obtained convergence rate of RK to IV is now (1/∆)^(1/5). The choice of bandwidth H that resulted in the convergence rate of (1/∆)^(1/4) in our Monte Carlo study relies heavily on the assumption of i.i.d. MMS noise, which might not hold in practice.

17 We use the fit function in MATLAB with a smoothing spline, and we set the smoothing parameter equal to 0.001.

18 That is, in the first case we let the predictor space be spanned by Y²ᵢ₋₁, Y²ᵢ₋₁₃, Y²ᵢ₋₂₅ and a constant. In the second case we choose Y²ᵢ₋₁, Y²ᵢ₋₇₈, Y²ᵢ₋₁₅₅ and a constant as the basis elements of the predictor space, and so on.
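The averaging of the daily noise-variance estimates can be sketched as follows. The realized kernel is replaced here by a crude stand-in (RV plus one autocovariance term), so only the structure of the formula, ω̂2 = exp[log(ω̃2) − RK/RV] with ω̃2 = RV/(2m), is illustrated:

```python
import numpy as np

def noise_variance_avg(returns_by_day):
    """Average of the daily MMS noise variance estimates.

    returns_by_day: (n_days, m) array of adjusted five-minute returns.
    """
    n_days, m = returns_by_day.shape
    omega2 = np.empty(n_days)
    for d, r in enumerate(returns_by_day):
        rv = np.sum(r**2)
        rk = rv + 2 * np.sum(r[1:] * r[:-1])  # illustrative stand-in for the realized kernel
        omega2_naive = rv / (2 * m)           # naive estimate; RV also contains IV, hence the correction
        omega2[d] = np.exp(np.log(omega2_naive) - rk / rv)
    return omega2.mean()

rng = np.random.default_rng(3)
r = 0.001 * rng.standard_normal((1006, 77))   # hypothetical sample of daily return rows
w2_avg = noise_variance_avg(r)
```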
The results from fitting the Heston model to the data, using the various estimators,
are reported in Table 1.5. Computation of asymptotic standard errors for the PBEF
estimator is challenging. It involves computation of the matrix Mn(θ), which also enters the expression for the optimal weight matrix A∗(θ); this was in fact the main reason why we focused on the sub-optimal PBEF in our Monte Carlo study. Therefore, we
resort to bootstrap methods for computing standard errors. The standard errors and
95% confidence intervals (CI) reported in Table 1.5 are computed using the moving
block bootstrap, as recommended in Lahiri (1999). The confidence intervals are
equal tail intervals, constructed using the percentile method. As for the block length in the bootstrap method, T^(1/3) (≈ 10) days is the rule-of-thumb advocated in Hall, Horowitz, and Jing (1995) for constructing standard errors. Due to the strong
persistence in the data, we choose to be conservative and use a block length of 20
days.
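The moving block bootstrap used for the standard errors and equal tail CIs can be sketched as follows; `stat` stands in for the estimator of interest, which here is simply the sample mean of a synthetic daily series:

```python
import numpy as np

def moving_block_bootstrap(data, block_len, n_boot, stat, seed=0):
    """Moving block bootstrap std. error and 95% equal tail percentile CI."""
    rng = np.random.default_rng(seed)
    n = len(data)
    n_blocks = -(-n // block_len)                 # ceil(n / block_len)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        resample = np.concatenate([data[s:s + block_len] for s in starts])[:n]
        reps[b] = stat(resample)
    se = reps.std(ddof=1)
    ci = np.percentile(reps, [2.5, 97.5])         # equal tail percentile interval
    return se, ci

rng = np.random.default_rng(4)
x = rng.standard_normal(1006)                     # synthetic series of daily observations
se, ci = moving_block_bootstrap(x, block_len=20, n_boot=999, stat=np.mean)
```

Resampling whole 20-day blocks preserves the within-block dependence that an i.i.d. bootstrap would destroy, which is why a conservative block length is chosen for the persistent SPY data.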
Table 1.5 also reports the fit to the moments of the adjusted squared returns im-
plied by the parameter estimates from the various estimation methods. The obtained
parameter estimates vary across the different estimation methods, but all are mean-
ingful and within the same range. From Table 1.5, we see that the two different ways
of noise correcting the GMM estimator impact the parameter estimates, especially
σ. The estimated noise variance in the parametrically noise corrected method is
only about half of the non-parametric estimate ω2avg . It also becomes evident that
the choice of the predictor space, represented by the l variable, highly impacts the
parameter estimates. When l is low, κ is high, and the PBEF estimator puts more emphasis on capturing the fast decaying part in the short end of the autocorrelation function. As the l variable increases, κ drops drastically and reveals the need for
several volatility factors in order to fully capture the dynamics of the data. The split
between how much of the mean of the squared adjusted returns is due to the long-run
mean of the volatility process, α, and how much is due to the variance of MMS noise,
ω2, also varies with the l variable. An exception is when l = 77, where the estimated
noise variance is similar to the estimate obtained with the parametrically noise corrected GMM method; otherwise the noise variance is either severely underestimated
or potentially overestimated. This could be a consequence of the i.i.d. Gaussian noise
assumption, which might not hold. It could also be that the PBEF-based method has
Table 1.5. Estimation results for the estimation methods with noise correction.

Estimation method   κ                 α               σ               FC              100×ω2            100×E[Y²ᵢ]   100×V(Y²ᵢ)   ACF₁(Y²ᵢ)

Sample moments                                                                                          0.684        0.0442       0.240

GMM with RK         0.0776 (0.0561)   0.500 (0.111)   0.485 (0.120)   0.157 (0.115)   0.1563 (0.0250)   0.961        0.0567       0.233
                    [0.0469:0.2770]   [0.347:0.692]   [0.299:0.745]   [-0.012:0.386]  [0.1119:0.2103]

GMM noise corr.     0.1013 (0.0528)   0.426 (0.186)   0.775 (0.223)   0.514 (0.356)   0.0880 (0.0094)   0.866        0.0788       0.276
                    [0.0317:0.2586]   [0.269:0.661]   [0.456:1.284]   [0.127:1.460]   [0.0768:0.1059]

PBEF 1 step
q = 3, l = 12       0.3874 (0.2085)   0.403 (0.140)   1.017 (0.520)   0.721 (1.897)   0.0803 (0.0745)   0.836        0.0412       0.231
                    [0.1036:0.9379]   [0.144:0.641]   [0.413:2.319]   [-0.075:5.160]  [0.0004:0.2450]
q = 3, l = 77       0.2593 (0.0532)   0.514 (0.078)   0.786 (0.350)   0.351 (0.908)   0.0085 (0.0111)   0.981        0.0502       0.215
                    [0.2300:0.4461]   [0.375:0.680]   [0.437:1.642]   [-0.075:2.313]  [0.0004:0.0086]
q = 3, l = 154      0.0956 (0.0506)   0.128 (0.059)   0.630 (0.617)   0.373 (2.587)   0.2598 (0.0631)   0.479        0.0180       0.275
                    [0.0376:0.2474]   [0.034:0.268]   [0.195:2.376]   [0.020:5.635]   [0.1259:0.3741]
q = 3, l = 308      0.0677 (0.0330)   0.139 (0.132)   0.535 (0.405)   0.267 (1.185)   0.2534 (0.0899)   0.493        0.0197       0.276
                    [0.0539:0.1807]   [0.053:0.547]   [0.251:1.684]   [0.021:2.824]   [0.0087:0.3493]

PBEF 2 step
q = 3, l = 12       0.2356 (0.2232)   0.286 (0.043)   0.876 (0.632)   0.632 (2.032)   0.1563 (0.0250)   0.684        0.0329       0.253
                    [0.0085:0.8752]   [0.214:0.379]   [0.118:2.372]   [0.007:5.301]   [0.1119:0.2103]
q = 3, l = 77       0.2270 (0.0498)   0.287 (0.043)   0.894 (0.466)   0.668 (1.491)   0.1563 (0.0250)   0.685        0.0349       0.257
                    [0.1667:0.3747]   [0.214:0.379]   [0.411:2.008]   [0.064:3.853]   [0.1119:0.2103]
q = 3, l = 154      0.1585 (0.0386)   0.287 (0.043)   0.709 (0.376)   0.412 (0.966)   0.1563 (0.0251)   0.686        0.0325       0.251
                    [0.1255:0.2813]   [0.213:0.378]   [0.325:1.669]   [0.035:2.658]   [0.1115:0.2103]
q = 3, l = 308      0.0963 (0.0253)   0.289 (0.043)   0.569 (0.301)   0.269 (0.561)   0.1563 (0.0250)   0.687        0.0340       0.255
                    [0.0892:0.1912]   [0.213:0.379]   [0.295:1.355]   [0.034:1.746]   [0.1119:0.2103]

The table reports the parameter estimates from fitting the Heston model to the SPY data. The variable FC denotes the Feller condition, σ² − 2κα, and it is positive if the parameter constraint is violated. The table also reports sample moments as well as theoretical moments implied by the various obtained parameter estimates. The std. errors (in parentheses) and 95% equal tail CI's (in square brackets) are computed using the moving block bootstrap with a block length of 20 days and B = 999.
problems identifying ω2 and α in the data at hand. We therefore also consider fixing
the noise variance at the estimate ω2avg and only estimate the three parameters from
the Heston model. This procedure is denoted by PBEF 2 step in Table 1.5.
The results from the PBEF 2 step method reveal that fixing the noise variance
mainly affects the estimate of α, which is now constant across the choice of predictor
space. The PBEF 2 step method also appears more stable, with the relative standard
errors19 being constant across the values of the time lag in the predictor space, except
for the intra-day time lag (l = 12). Checking for parameter stability across the different
predictor spaces employed could serve as a general robustness check that might
reveal model misspecification. The last three columns of Table 1.5 report the model-
implied fit to the sample moments of the data. The mean of the adjusted squared
returns is extremely well fitted by the PBEF 2 step procedure, whereas the other
methods do not give as good fits. The variance is however reasonably well matched
when the lower l values are used in the PBEF 1 step procedure, but is poorly matched
when the l variable is high. The PBEF 2 step procedure and the GMM estimator based
on RK also produce reasonable fits to the variance. The first order autocorrelation of the squared returns is best matched by the GMM method based on RV and the PBEF 1 step method with l = 12, with the PBEF 2 step approach being the runner-up.
Not surprisingly, the overall best fit is obtained by the PBEF-based method, and the
fit appears more stable when the PBEF 2 step approach is used.
From Table 1.5 we also observe that the Feller condition is violated in all the
estimation procedures, indicating that the Heston model provides a poor fit to the
data.20 However, this has no influence on the specific aim of this section, which
was to investigate how the two different estimation methods handle real data with
possible model misspecification. The problem seems to be that the dynamic structure
implied by the Heston model is not flexible enough to adequately model the observed
dynamics. The need for allowing for several volatility factors is best highlighted by
the PBEF-based estimation method. The flexibility of the PBEF-based method can
in general serve as a robustness check of the specified model, including the noise
specification.
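As a quick consistency check of Table 1.5, the FC column can be reproduced from the reported point estimates; for the two GMM rows the computed values match the tabulated 0.157 and 0.514 to within rounding:

```python
def feller_condition(kappa, alpha, sigma):
    """FC = σ² − 2κα; a positive value means the Feller condition is violated."""
    return sigma**2 - 2 * kappa * alpha

# point estimates taken from Table 1.5
fc_rk = feller_condition(0.0776, 0.500, 0.485)     # GMM with RK
fc_corr = feller_condition(0.1013, 0.426, 0.775)   # parametrically noise corrected GMM
```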
1.5 Conclusion and Final Remarks
The general theory underlying PBEFs was reviewed and detailed. We explicitly con-
structed PBEFs for parameter estimation in the Heston model with and without the
inclusion of noise in the data. Implementation issues were discussed, and the link between optimal GMM estimation and the optimal PBEF was derived. As a benchmark for
evaluating the performance of the PBEF-based estimator, we considered the GMM
estimator from Bollerslev and Zhou (2002), and we extended the method to handle
19 The standard errors normalized by the parameter estimate.
20 0 is actually contained in the CI's for the FC variable when the GMM method based on RK and the PBEF 1 step method with l = 12 and l = 77 are employed, but only barely.
noisy data in a parametric and non-parametric way. The finite sample performance
of the estimator based on PBEFs was investigated in a Monte Carlo study and compared to that of the GMM estimator. Both the cases with and without the inclusion of additive i.i.d. MMS noise were considered. In the no MMS noise setting, there are
gains to be made from using PBEFs, both in terms of bias and root MSRE, especially
when the sample size is small. The gain from using the PBEF-based method was most
prominent for the mean-reversion parameter, κ, that was extremely well estimated.
The PBEF-based method produced promising results in all three parameter configu-
rations, but the root MSREs were lower when the volatility process was less persistent
and less volatile.
Including MMS noise in the observation equation, but neglecting to correct
for it, produced biased estimates, with the upwards bias in the long run average
variance, α, being most severe. We then considered the performance of the noise
corrected estimation methods. The PBEF-based estimator and the parametrically
noise corrected GMM estimator produced results similar to those found in the no
MMS noise setting. The non-parametric approach, where the GMM estimator is
based on RK , did not perform as well. This result is probably caused by the slower
convergence rate of RK compared to RV , making it a more noisy proxy for IV . The
non-parametric approach also has the drawback of not producing an estimator for
the noise variance. Even though the parametric way of correcting the GMM estimator
for noise produced results similar to those found in the no MMS noise setting, the
estimator could not compete with the results obtained using the noise corrected
PBEF. The difference is again more pronounced for the mean-reversion rate, but
the volatility of volatility parameter, σ, and the noise variance, ω2, were also more
accurately estimated with the PBEF-based method.
Based on our Monte Carlo study, PBEFs seem like a promising tool for conducting
inference in stochastic processes, and it appears that the method is able to exploit
the additional information contained in the intra-day returns. The gain from using
high frequency observations directly might, however, come at a cost, and one concern regarding the application of the PBEF-based method to real data is its possible sensitivity towards intra-daily dynamics, such as intra-daily periodicity in volatility. With the GMM-based estimator, this intra-daily periodicity is of no concern, as it is averaged out in the aggregation step. On the other hand, if the data is only available
at low frequencies, then the PBEF-based method has an obvious advantage. In our
empirical application, by fitting the Heston model to SPY data, we investigated how
the two different approaches handle real data. The data was cleaned and corrected
for the intra-daily volatility pattern. The aim was not to provide the best fitting model,
but to investigate how the methods deal with possible model misspecification. The
empirical application revealed that the choice of estimation approach impacts the
parameter estimates. The study also made it clear how the great flexibility of the PBEF-based estimation method could serve as a way of conducting robustness checks,
for instance by checking for parameter stability across different time-spans of the
predictor space.
An interesting extension of the Heston model would be to allow for several inde-
pendent volatility factors, in order to better capture the persistence in the data. It is
possible to derive PBEFs in this setup as long as the mean, variance and covariance
structure is computable for each of the volatility factors. It would also be of interest
to see how the estimation method based on PBEFs performs if we extend the Monte
Carlo setup by relaxing the assumption of i.i.d. noise. This would however complicate
the construction of the MMS noise corrected PBEF and the recalculation of the mo-
ments used for constructing the GMM estimator. A solution to this potential problem
could be to filter out the noise in a first step using the method of pre-averaging intro-
duced by Jacod, Li, Mykland, Podolskij, and Vetter (2009), instead of modeling the
noise directly. The performance of this approach is still to be investigated. Since the
PBEF based estimation method is quite general, an important contribution to the
existing literature would be to consider PBEFs in a setting where the driving sources
of randomness are general Lévy processes, like the models considered in Brockwell
(2001), Barndorff-Nielsen and Shephard (2001), and in Todorov and Tauchen (2006).
Finally, quantifying the gain from using the optimal PBEF in different settings, as well
as how to best simulate or approximate the optimal weight matrix, would also be a
topic for future research.
1.6 Appendix
Appendix A: A Note on Orthogonal Projection
Let Y , Z1, . . . , Zn denote random variables with finite second moments. We wish to
compute the orthogonal projection, Ŷ, of the random variable Y on the linear space V = span{1, Z1, . . . , Zn}. To that end, let us introduce the notation
Z = (Z1, . . . , Zn),
mY = E[Y],
mZ = (E[Z1], . . . , E[Zn]),
Cov(Z,Z) = E[(Z − mZ)(Z − mZ)T],
Cov(Y,Z) = E[(Y − mY)(Z − mZ)T].
From Karlin and Taylor (1975) we know that the orthogonal projection, Ŷ, of Y on V is an element of V that fulfills the normal equations E[v(Y − Ŷ)] = 0 for all v ∈ V. This means (Y − Ŷ) ⊥ V due to our definition of the inner product in this L²-space.
Theorem 1. Under the assumption of a non-singular covariance matrix Cov(Z,Z), the orthogonal projection Ŷ exists and is given by

Ŷ = mY + Cov(Y,Z)Cov−1(Z,Z)(Z − mZ).
Proof. Let a = (a1, . . . , an) be an arbitrary vector and consider the random variable φ given by

φ = (Y − mY) + a(Z − mZ).

The aim is now to choose the vector a such that φ becomes orthogonal to V, because in this case we obtain the following decomposition of Y,

Y = [mY − a(Z − mZ)] + φ,

where mY − a(Z − mZ) ∈ V and φ ∈ V⊥, and hence we have Ŷ = mY − a(Z − mZ). By construction, we know that E[φ] = 0 and E[Z − mZ] = 0, and since we furthermore have φ ∈ V⊥ and Z − mZ ∈ V we get

E[φ(Z − mZ)] = 0.

Combining this with the definition of φ, we obtain the equation

Cov(Y,Z) + aCov(Z,Z) = 0.

In order to ensure φ ∈ V⊥ we conclude that we should put

a = −Cov(Y,Z)Cov−1(Z,Z).

Thus, the orthogonal projection of Y onto V is given by

Ŷ = mY + Cov(Y,Z)Cov−1(Z,Z)(Z − mZ).
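The projection formula can be verified numerically by replacing population moments with sample moments; the simulated linear model below is arbitrary. The residual Y − Ŷ should have mean zero and be empirically uncorrelated with every Zj, as required by the normal equations:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
Z = rng.standard_normal((n, 3))
Y = 1.0 + Z @ np.array([0.5, -0.3, 0.2]) + 0.1 * rng.standard_normal(n)

# sample analogues of mY, mZ, Cov(Z,Z) and Cov(Y,Z)
mY, mZ = Y.mean(), Z.mean(axis=0)
cov_ZZ = np.cov(Z, rowvar=False)
cov_YZ = np.array([np.cov(Y, Z[:, j])[0, 1] for j in range(3)])

# Ŷ = mY + Cov(Y,Z) Cov(Z,Z)^{-1} (Z − mZ)
Y_hat = mY + (Z - mZ) @ np.linalg.solve(cov_ZZ, cov_YZ)
resid = Y - Y_hat

ortho = np.array([np.cov(resid, Z[:, j])[0, 1] for j in range(3)])  # should all be ~0
```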
Appendix B: General Theory on Optimal Estimating Functions
It is well-known that the ideal choice of estimating function would be to use the
score function, Un(θ), since this usually yields an efficient estimator and provides
a minimal sufficient partitioning of the sample space. However, the score function
might be unavailable or difficult to calculate, so the need for an optimal estimating
function within a class of estimating functions arises. We focus only on the class, G ,
of zero mean, square integrable estimating functions Gn(θ) :=Gn(Yi , i ≤ n,θ). We
furthermore assume that the p ×p matrices Eθ[∂θT Gn(θ)] and Eθ[Gn(θ)Gn(θ)T ] are
non-singular. Let Gn ⊆G and consider the standardized estimating function given by
G(s)n(θ) = −Eθ[∂θT Gn(θ)]T (Eθ[Gn(θ)Gn(θ)T])−1 Gn(θ).
OF-optimality (fixed sample optimality, or Godambe optimality) within Gn is achieved by maximizing the covariance matrix of the standardized estimating functions. That is, the information criterion to be maximized is the Godambe information

I(Gn(θ)) = Eθ[G(s)n(θ)G(s)n(θ)T] = Eθ[∂θT Gn(θ)]T (Eθ[Gn(θ)Gn(θ)T])−1 Eθ[∂θT Gn(θ)],
which is a natural generalization of the Fisher information. If the score function, U_n(θ) = ∂_{θᵀ} log L_n(θ), exists we actually obtain the Fisher information

I(U_n(θ)) = E_θ[∂_{θᵀ} U_n(θ)]ᵀ (E_θ[U_n(θ)U_n(θ)ᵀ])^{-1} E_θ[∂_{θᵀ} U_n(θ)] = E_θ[U_n(θ)U_n(θ)ᵀ],

because the score function (usually) satisfies the second Bartlett identity

E_θ[U_n(θ)U_n(θ)ᵀ] = −E_θ[∂_{θᵀ} U_n(θ)].
The rationale behind considering the standardized estimating function G_n^{(s)}(θ) is that G_n^{(s)}(θ) satisfies the second Bartlett identity and is therefore more directly comparable to the score function.
Definition 2. G_n*(θ) ∈ G_n is an O_F-optimal estimating function within G_n if

I(G_n*(θ)) − I(G_n(θ))

is non-negative definite for all G_n(θ) ∈ G_n and for all θ ∈ Θ.
If an O_F-optimal estimating function exists, it is often referred to as the quasi-score estimating function and the corresponding estimator as the quasi-likelihood estimator. The quasi-score estimating function is close to the score function in an L²-sense, since we have the following result. Suppose G_n*(θ) is O_F-optimal in G_n; then

E_θ[(G_n^{(s)}(θ) − U_n(θ))ᵀ (G_n^{(s)}(θ) − U_n(θ))] ≥ E_θ[(G_n^{*(s)}(θ) − U_n(θ))ᵀ (G_n^{*(s)}(θ) − U_n(θ))]
36 CHAPTER 1. PBEFS FOR STOCHASTIC VOLATILITY MODELS WITH NOISY DATA
for all G_n ∈ G_n and for all θ ∈ Θ. In fact, if G_n is a closed subspace of G, then the quasi-score function is the orthogonal projection of U_n(θ) onto G_n and can be interpreted as an approximation to the score function. By choosing a sequence of subspaces G_n that, as n → ∞, converges to a subspace containing U_n(θ), a sequence of estimators that are asymptotically fully efficient can be constructed.
The following theorem (Thm. 2.1 in Heyde (1997)) provides a tool for verifying optimality of an estimating function and can be used to find optimal PBEFs.

Theorem 3. G_n* ∈ G_n is an O_F-optimal estimating function within G_n if

E_θ[G_n^{*(s)}(θ) G_n^{(s)}(θ)ᵀ] = E_θ[G_n^{(s)}(θ) G_n^{*(s)}(θ)ᵀ] = E_θ[G_n^{(s)}(θ) G_n^{(s)}(θ)ᵀ], (1.34)

or, equivalently, if

E_θ[∂_{θᵀ} G_n(θ)]^{-1} E_θ[G_n(θ) G_n*(θ)ᵀ] (1.35)

is a constant matrix for all G_n ∈ G_n and for all θ ∈ Θ. Conversely, if G_n is convex and G_n* ∈ G_n is O_F-optimal within G_n, then (1.34) holds.
For a proof of the theorem see pp. 14–15 in Heyde (1997). Condition (1.35) can often be verified by showing that

E_θ[G_n(θ) G_n*(θ)ᵀ] = −E_θ[∂_{θᵀ} G_n(θ)]

for all G_n ∈ G_n and for all θ ∈ Θ. In this case the optimal estimating function G_n*(θ) satisfies the second Bartlett identity, and the Godambe information simplifies to the Fisher information, which in this situation equals −E_θ[∂_{θᵀ} G_n*(θ)].
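To fix ideas, the Godambe information can be computed directly in a toy example of our own choosing (not taken from the thesis): for X_i i.i.d. N(θ, σ²) and the linear estimating function G_n(θ) = Σ_i (X_i − θ), the quantity E_θ[∂_θ G_n]ᵀ (E_θ[G_n²])^{-1} E_θ[∂_θ G_n] reproduces the Fisher information n/σ². The helper name below is ours.

```python
import numpy as np

def godambe_information(dG_mean, GGt_mean):
    """I(G) = E[dG/dtheta]^T (E[G G^T])^{-1} E[dG/dtheta], scalar case."""
    return dG_mean * (1.0 / GGt_mean) * dG_mean

# Toy model: X_i iid N(theta, sigma^2), G_n(theta) = sum_i (X_i - theta).
n, sigma2 = 50, 4.0
dG_mean = -float(n)            # E[dG_n/dtheta] = -n
GGt_mean = n * sigma2          # E[G_n^2] = n * sigma^2 at the true theta
info = godambe_information(dG_mean, GGt_mean)
print(info, n / sigma2)        # Godambe information equals Fisher information here
```

Here the estimating function is proportional to the score, so the inequality in Definition 2 is attained with equality, which is exactly why the two numbers agree.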
Optimal Prediction-based Estimating Functions

We now consider how to find the optimal PBEF within the class of PBEFs based on finite-dimensional predictor spaces of the form considered in the second section of the chapter. This means we are studying PBEFs of the form (1.6), with s = q:

G_n(θ) = A(θ) Σ_{i=q+1}^n H^{(i)}(θ). (1.36)

Let r = q + 1 and define the r × p matrix

U(θ) = −E_θ[∂_{θᵀ} H^{(r)}] = C̄(θ) ∂_{θᵀ} â(θ),

where

C̄(θ) = [E_θ[Z_k^{(r−1)} Z_l^{(r−1)}]]_{k,l=0,...,q}

with Z_0^{(r−1)} = 1 as usual, and where the r-dimensional vector â(θ) is given by

â(θ) = (a_0(θ), a_1(θ), ..., a_q(θ))ᵀ.
Remark 4. Note that C̄(θ) is related to the covariance matrix C(θ) in the following way:

C̄(θ) = ( 0   0ᵀ
          0   C(θ) ) + E_θ[Z^{(r−1)}] E_θ[Z^{(r−1)}]ᵀ,

where the first matrix has zeros in its first row and first column and C(θ) as its lower-right q × q block, and where the r-dimensional vector Z^{(r−1)} is given by

Z^{(r−1)} = (Z_0^{(r−1)}, Z_1^{(r−1)}, ..., Z_q^{(r−1)})ᵀ.
From Sørensen (2000) we have the following result on how to choose the optimal weight matrix A*(θ).

Proposition 1. Suppose that for all θ ∈ Θ the matrix ∂_{θᵀ} â(θ) has rank p. Then the matrix

M̄_n(θ) = E_θ[H^{(r)}(θ) H^{(r)}(θ)ᵀ] (1.37)
  + Σ_{k=1}^{n−r} ((n − r − k + 1)/(n − r + 1)) (E_θ[H^{(r)}(θ) H^{(r+k)}(θ)ᵀ] + E_θ[H^{(r+k)}(θ) H^{(r)}(θ)ᵀ]) (1.38)

is invertible, and the estimating function

G_n*(θ) = A_n*(θ) Σ_{i=r}^n H^{(i)}(θ),

where

A_n*(θ) = U(θ)ᵀ M̄_n(θ)^{-1},

is O_F-optimal within the class of estimating functions of the type (1.36) for which A(θ) has rank p. Furthermore, the optimal estimating function G_n*(θ) satisfies the second Bartlett identity with Godambe information U(θ)ᵀ M̄_n(θ)^{-1} U(θ).
For a proof see Proposition 3.2 in Sørensen (2000). In the uncommon case where p equals q + 1, the weight matrix A_n*(θ) is invertible and hence does not influence the estimator. In this case, A_n*(θ) only ensures that the second Bartlett identity holds.

Since the observed process is assumed to be stationary, we use equation (1.5) and find

∂_{θ_k} a(θ) = C(θ)^{-1} [∂_{θ_k} b(θ) − (∂_{θ_k} C(θ)) a(θ)].

An expression for ∂_{θᵀ} a_0 can be found by differentiating the expression following (1.5). Note that only unconditional moments and derivatives of unconditional moments are needed to compute optimal prediction-based estimating functions. Thus, if we know C(θ), b(θ), E_θ[Z^{(r−1)}] and E_θ[f(Y_r)], their derivatives, and the moments appearing in (1.38), we can compute optimal PBEFs.
If the observed process Y_i is sufficiently α-mixing, then M̄_n(θ) → M̄(θ) as n → ∞ (see Section 6 in Sørensen (2000) or Section 4 in Sørensen (2011) for this type of result). Asymptotically it does not matter whether we use U(θ)ᵀ M̄_n(θ)^{-1} or U(θ)ᵀ M̄(θ)^{-1} as our optimal weight; the asymptotic variance of the resulting estimator is unaffected by this choice. The most challenging part of computing the optimal weight matrix, A_n*(θ), is to compute M̄_n(θ).
Remark 5 (from Sørensen (2011)). If we let

H̄_n(θ) = (1/(n − r + 1)) Σ_{i=r}^n H^{(i)}(θ),

then M̄_n(θ) is the covariance matrix of √(n − r + 1) H̄_n(θ). This means that in practice the matrix M̄_n(θ) can be calculated by simulating √(n − r + 1) H̄_n(θ) a large number of times under P_θ and then calculating the empirical covariance matrix. Calculating the optimal weight matrix A_n*(θ) can be quite time consuming, and one can save a lot of time if A_n*(θ) is calculated for one parameter value only. This can be done by replacing A_n*(θ) by A_n*(θ̄_n), where θ̄_n is a consistent estimator of θ. Under Conditions 4.1 and 4.2 from Sørensen (2011), one way of obtaining a consistent estimator θ̄_n would be to use the estimating function obtained by choosing p coordinates of H̄_n(θ).
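The simulation recipe in the Remark can be checked in a scalar toy example (entirely our construction: a Gaussian AR(1) surrogate for the observed process, with H^{(i)}(θ) = X_i − θ at the true θ = 0 and r = 1, so that M̄_n is simply the variance of √n X̄):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: stationary Gaussian AR(1) with unit innovation variance.
n, phi, B = 100, 0.5, 20_000

e = rng.normal(size=(B, n))
x = np.empty((B, n))
x[:, 0] = e[:, 0] / np.sqrt(1.0 - phi**2)   # stationary start
for i in range(1, n):
    x[:, i] = phi * x[:, i - 1] + e[:, i]

# sqrt(n - r + 1) * H_bar_n(theta), simulated B times under P_theta
samples = np.sqrt(n) * x.mean(axis=1)
M_mc = samples.var()

# Exact counterpart of (1.37)-(1.38): gamma_0 + sum_k 2 (n - k)/n gamma_k,
# with gamma_k = phi^k / (1 - phi^2) the AR(1) autocovariances.
gamma = phi ** np.arange(n) / (1.0 - phi**2)
M_exact = gamma[0] + 2.0 * np.sum((n - np.arange(1, n)) / n * gamma[1:])
print(M_mc, M_exact)
```

The empirical covariance of the simulated draws matches the exact finite-sample expression to within Monte Carlo error, which is exactly how M̄_n(θ) would be obtained when closed forms are unavailable.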
Appendix C: Moment Conditions underlying the GMM Estimator

Derivation of the Conditional First- and Second-order Moments of Integrated Volatility

The moment conditions that Bollerslev and Zhou (2002) base their GMM estimator on come from the derivations of the conditional first- and second-order moments of daily integrated volatility (IV). In this subsection the derivation for the Heston model found in Bollerslev and Zhou (2002) will be reviewed. In the next subsection we consider how to compute the expectation of squared realized variance in the presence of MMS noise. The derivations are then used to form a moment condition used for constructing the GMM estimator that takes MMS noise into account.

Before deriving the conditional moments and the resulting moment conditions, the following notation is introduced:

F_t = σ{v_s ; s ≤ t},
G_t = σ{IV_{t−s−1,t−s} ; s = 0, 1, 2, ...}.

Note that the discrete sigma-algebra, G_t, generated by the integrated volatility series is contained in the continuous sigma-algebra, F_t, generated by the point-in-time volatility process. The distinction between the two sigma-algebras is important in the derivation of the conditional first- and second-order moments of IV.
First, let us consider how to derive the moment condition based on the conditional mean of daily IV. From Cox, Ingersoll, and Ross (1985) it follows that the conditional mean of the volatility process is given by

E_θ[v_T | F_t] = v_t e^{-κ(T-t)} + α(1 - e^{-κ(T-t)}) = δ_{T-t} v_t + β_{T-t}. (1.39)

Hence, by interchanging the order of integration using Fubini's theorem we obtain

E_θ[IV_{t,T} | F_t] = E_θ[∫_t^T v_s ds | F_t] = ∫_t^T E_θ[v_s | F_t] ds
= ∫_t^T (v_t e^{-κ(s-t)} + α(1 - e^{-κ(s-t)})) ds
= v_t (1/κ)(1 - e^{-κ(T-t)}) + α(T - t) - (α/κ)(1 - e^{-κ(T-t)})
= d_{T-t} v_t + b_{T-t}. (1.40)
To ease the notation we let δ, β, d and b denote the parameters corresponding to the daily horizon where T - t = 1. Using the law of iterated expectations we obtain

E_θ[E_θ[IV_{t+1,t+2} | F_{t+1}] | F_t] = E_θ[IV_{t+1,t+2} | F_t] = E_θ[d v_{t+1} + b | F_t]
= d E_θ[v_{t+1} | F_t] + b = d(δ v_t + β) + b
= δ d v_t + dβ + b
= δ(E_θ[IV_{t,t+1} | F_t] - b) + dβ + b
= δ E_θ[IV_{t,t+1} | F_t] + β,
and by using the law of iterated expectations once more we get

E_θ[IV_{t+1,t+2} | G_t] = δ E_θ[IV_{t,t+1} | G_t] + β. (1.41)

That is, the integrated volatility process satisfies the conditional moment condition

E_θ[IV_{t+1,t+2} - δ IV_{t,t+1} - β | G_t] = 0, (1.42)

and hence also the unconditional moment restriction

E_θ[IV_{t+1,t+2} - δ IV_{t,t+1} - β] = 0. (1.43)
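The coefficient identity that makes the iterated-expectation chain above close, dβ + b = β + δb (equivalently β(1 − d) = b(1 − δ)), is easy to verify numerically. The sketch below is our own, with illustrative parameter values:

```python
import numpy as np

def iv_mean_coefficients(kappa, alpha, h=1.0):
    """Coefficients in E[v_T|F_t] = delta v_t + beta and
    E[IV_{t,T}|F_t] = d v_t + b, for horizon T - t = h (cf. (1.39)-(1.40))."""
    delta = np.exp(-kappa * h)
    beta = alpha * (1.0 - delta)
    d = (1.0 - delta) / kappa
    b = alpha * h - (alpha / kappa) * (1.0 - delta)
    return delta, beta, d, b

kappa, alpha = 2.0, 0.5
delta, beta, d, b = iv_mean_coefficients(kappa, alpha)

# Identity needed for (1.41): d*beta + b = beta + delta*b.
print(d * beta + b, beta + delta * b)
```

Without this identity the AR(1)-type recursion (1.41) for daily IV would pick up a spurious intercept term; the check confirms the algebra.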
Since the moment condition (1.43) only depends on the drift parameters, κ and α, we need an additional moment condition in order to construct a GMM estimator for all three parameters. We will therefore also derive a moment condition based on the conditional second-order moment of integrated volatility.

From the SDE describing the evolution of spot volatility and the formula (1.40) we can obtain an SDE for E_θ[IV_{t,T} | F_t] by applying Itô's lemma:

dE_θ[IV_{t,T} | F_t] = (d_{T-t} κ(α - v_t) + (∂d_{T-t}/∂t) v_t + ∂b_{T-t}/∂t) dt + d_{T-t} σ √v_t dW_t
= -v_t dt + d_{T-t} σ √v_t dW_t. (1.44)
Now we fix the upper limit T and let the lower limit t be time-varying. Integrating from t to T in (1.44) then yields

E_θ[IV_{T,T} | F_T] = E_θ[IV_{t,T} | F_t] - ∫_t^T v_s ds + ∫_t^T d_{T-s} σ √v_s dW_s,

but since the left-hand side obviously equals zero, this implies

IV_{t,T} - E_θ[IV_{t,T} | F_t] = ∫_t^T d_{T-s} σ √v_s dW_s.
Using the Itô isometry we now obtain an expression for the conditional variance of integrated volatility:

Var_θ[IV_{t,T} | F_t] = E_θ[(IV_{t,T} - E_θ[IV_{t,T} | F_t])² | F_t]
= E_θ[(∫_t^T d_{T-s} σ √v_s dW_s)² | F_t]
= E_θ[∫_t^T d²_{T-s} σ² v_s ds | F_t]
= ∫_t^T d²_{T-s} σ² E_θ[v_s | F_t] ds
= ∫_t^T d²_{T-s} σ² [δ_{s-t} v_t + β_{s-t}] ds
= D_{T-t} v_t + B_{T-t}, (1.45)
where

D_{T-t} = (σ²/κ²) [1/κ - 2e^{-κ(T-t)}(T - t) - (1/κ)e^{-2κ(T-t)}],
B_{T-t} = (σ²/κ²) [α(T - t)(1 + 2e^{-κ(T-t)}) + (α/(2κ))(e^{-κ(T-t)} + 5)(e^{-κ(T-t)} - 1)].
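The closed forms for D and B can be spot-checked against a direct numerical evaluation of the integral in (1.45); the sketch and the parameter values below are our own:

```python
import numpy as np

def var_iv_coefficients(kappa, alpha, sigma, tau):
    """Closed-form D_tau and B_tau in Var[IV_{t,T}|F_t] = D v_t + B, tau = T - t."""
    e1, e2 = np.exp(-kappa * tau), np.exp(-2.0 * kappa * tau)
    D = (sigma**2 / kappa**2) * (1.0 / kappa - 2.0 * tau * e1 - e2 / kappa)
    B = (sigma**2 / kappa**2) * (alpha * tau * (1.0 + 2.0 * e1)
                                 + (alpha / (2.0 * kappa)) * (e1 + 5.0) * (e1 - 1.0))
    return D, B

kappa, alpha, sigma, tau, v0 = 2.0, 0.04, 0.3, 1.0, 0.05
D, B = var_iv_coefficients(kappa, alpha, sigma, tau)

# Trapezoidal evaluation of int_0^tau d_{tau-s}^2 sigma^2 (delta_s v0 + beta_s) ds
s = np.linspace(0.0, tau, 200_001)
d_s = (1.0 - np.exp(-kappa * (tau - s))) / kappa
integrand = d_s**2 * sigma**2 * (np.exp(-kappa * s) * v0
                                 + alpha * (1.0 - np.exp(-kappa * s)))
numeric = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s))
print(D * v0 + B, numeric)
```

Both evaluations agree to high precision, confirming that the two bracketed expressions above are the correct antiderivatives.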
From Cox et al. (1985) and the obtained expression for E_θ[v_T | F_t] it follows that

E_θ[v²_T | F_t] = Var_θ(v_T | F_t) + (E_θ[v_T | F_t])²
= v_t (σ²/κ)(e^{-κ(T-t)} - e^{-2κ(T-t)}) + (σ²α/(2κ))(1 - e^{-κ(T-t)})² + (δ_{T-t} v_t + β_{T-t})²
= C_{T-t} v_t + E_{T-t} + δ²_{T-t} v²_t + β²_{T-t} + 2δ_{T-t} β_{T-t} v_t
= δ²_{T-t} v²_t + (C_{T-t} + 2δ_{T-t} β_{T-t}) v_t + (E_{T-t} + β²_{T-t}).
Focusing on the one-day horizon and using (1.40) and (1.45) gives us

E_θ[IV²_{t,t+1} | F_t] = Var_θ(IV_{t,t+1} | F_t) + (E_θ[IV_{t,t+1} | F_t])²
= D v_t + B + d² v²_t + b² + 2db v_t
= d² v²_t + (D + 2db) v_t + (B + b²).
Now, by using the law of iterated expectations and leading the arguments by one period, we get

E_θ[E_θ[IV²_{t+1,t+2} | F_{t+1}] | F_t] = d² E_θ[v²_{t+1} | F_t] + (D + 2db) E_θ[v_{t+1} | F_t] + (B + b²),

and substituting in the obtained expressions for E_θ[v_{t+1} | F_t] and E_θ[v²_{t+1} | F_t] gives us

E_θ[IV²_{t+1,t+2} | F_t] = d²[δ² v²_t + (C + 2δβ) v_t + (E + β²)] + (D + 2db)(δ v_t + β) + (B + b²)
= δ² d² v²_t + [d²(C + 2δβ) + δ(D + 2db)] v_t + [d²(E + β²) + β(D + 2db) + (B + b²)].
If we now reversely substitute out v²_t by using our expression for E_θ[IV²_{t,t+1} | F_t] we get

E_θ[IV²_{t+1,t+2} | F_t] = δ²[E_θ[IV²_{t,t+1} | F_t] - (D + 2db) v_t - (B + b²)]
+ [d²(C + 2δβ) + δ(D + 2db)] v_t + [d²(E + β²) + β(D + 2db) + (B + b²)].

Reversely substituting out v_t by using E_θ[IV_{t,t+1} | F_t] = d v_t + b gives

E_θ[IV²_{t+1,t+2} | F_t] = δ² E_θ[IV²_{t,t+1} | F_t]
+ [d²(C + 2δβ) + (δ - δ²)(D + 2db)] (1/d)[E_θ[IV_{t,t+1} | F_t] - b]
+ [d²(E + β²) + β(D + 2db) + (1 - δ²)(B + b²)],
and rearranging the terms yields

E_θ[IV²_{t+1,t+2} | F_t] = δ² E_θ[IV²_{t,t+1} | F_t] + (1/d)[d²(C + 2δβ) + (δ - δ²)(D + 2db)] E_θ[IV_{t,t+1} | F_t]
- (b/d)[d²(C + 2δβ) + (δ - δ²)(D + 2db)]
+ [d²(E + β²) + β(D + 2db) + (1 - δ²)(B + b²)].

Finally, using the law of iterated expectations we get

E_θ[IV²_{t+1,t+2} | G_t] = δ² E_θ[IV²_{t,t+1} | G_t] + (1/d)[d²(C + 2δβ) + (δ - δ²)(D + 2db)] E_θ[IV_{t,t+1} | G_t]
- (b/d)[d²(C + 2δβ) + (δ - δ²)(D + 2db)]
+ [d²(E + β²) + β(D + 2db) + (1 - δ²)(B + b²)]
= H E_θ[IV²_{t,t+1} | G_t] + I E_θ[IV_{t,t+1} | G_t] + J.
That is, the integrated volatility process satisfies the conditional moment restriction

E_θ[IV²_{t+1,t+2} - H(IV²_{t,t+1}) - I(IV_{t,t+1}) - J | G_t] = 0, (1.46)

and hence also the unconditional moment restriction

E_θ[IV²_{t+1,t+2} - H(IV²_{t,t+1}) - I(IV_{t,t+1}) - J] = 0. (1.47)
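For reference, the mapping from the structural parameters (κ, α, σ) to the reduced-form coefficients H, I and J can be collected in one helper. This is a sketch with our own variable names; the daily horizon T − t = 1 is hardcoded:

```python
import numpy as np

def iv_second_moment_coefficients(kappa, alpha, sigma):
    """H, I, J in E[IV^2_{t+1,t+2}|G_t] = H E[IV^2_{t,t+1}|G_t] + I E[IV_{t,t+1}|G_t] + J."""
    delta = np.exp(-kappa)
    beta = alpha * (1.0 - delta)
    d = (1.0 - delta) / kappa
    b = alpha - alpha * d
    C = (sigma**2 / kappa) * (delta - delta**2)
    E = (sigma**2 * alpha / (2.0 * kappa)) * (1.0 - delta)**2
    D = (sigma**2 / kappa**2) * (1.0 / kappa - 2.0 * delta - delta**2 / kappa)
    B = (sigma**2 / kappa**2) * (alpha * (1.0 + 2.0 * delta)
                                 + (alpha / (2.0 * kappa)) * (delta + 5.0) * (delta - 1.0))
    H = delta**2
    lin = d**2 * (C + 2.0 * delta * beta) + (delta - delta**2) * (D + 2.0 * d * b)
    I = lin / d
    J = (-(b / d) * lin + d**2 * (E + beta**2)
         + beta * (D + 2.0 * d * b) + (1.0 - delta**2) * (B + b**2))
    return H, I, J

H, I, J = iv_second_moment_coefficients(kappa=2.0, alpha=0.04, sigma=0.3)
print(H, I, J)
```

Since H = δ² involves only κ while I and J also involve σ, condition (1.47) indeed adds the identifying information about the diffusion parameter that (1.43) lacks.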
Derivation of the Fourth Moment Condition in the Presence of MMS Noise

We now consider computing

E_θ[(RV^MMS_{t+1,t+2})² - H(RV^MMS_{t,t+1})² - I(RV^MMS_{t,t+1}) - J].

The first step is to compute E_θ[(RV^MMS_{t,t+1})²]. By recalling the decomposition of realized variance in the presence of MMS noise,

RV^MMS_{t,t+1} = RV*_{t,t+1} + Σ_{i=1}^m ε²_{i,t} + 2 Σ_{i=1}^m ε_{i,t} Y*_{i,t}, (1.48)
we see that E_θ[RV^MMS_{t,t+1}] = E_θ[RV*_{t,t+1}] + 2mω². The three terms in (1.48) are uncorrelated, so the variance of RV^MMS_{t,t+1} equals

Var_θ(RV^MMS_{t,t+1}) = Var_θ(RV*_{t,t+1}) + Var_θ(Σ_{i=1}^m ε²_{i,t}) + 4 Var_θ(Σ_{i=1}^m ε_{i,t} Y*_{i,t}).
Due to the MA(1) structure and distribution of the noise process, ε ∼ N(0, 2ω²), we find that the second term equals

Var_θ(Σ_{i=1}^m ε²_{i,t}) = Σ_{i=1}^m Var_θ(ε²_{i,t}) + Σ_{i,j=1; i≠j}^m Cov_θ(ε²_{i,t}, ε²_{j,t})
= m(8ω⁴) + 2(m - 1)(2ω⁴) = 12mω⁴ - 4ω⁴. (1.49)
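Equation (1.49) can be spot-checked by Monte Carlo. In the sketch below the noise is generated as ε_i = u_i − u_{i−1} with u_i i.i.d. N(0, ω²), one standard way of obtaining an MA(1) noise process with ε ∼ N(0, 2ω²); treat this particular construction as our assumption, not as the one fixed by the chapter:

```python
import numpy as np

rng = np.random.default_rng(2)

m, omega2, B = 50, 0.01, 100_000
u = rng.normal(scale=np.sqrt(omega2), size=(B, m + 1))
eps = u[:, 1:] - u[:, :-1]            # MA(1) noise, eps_i ~ N(0, 2 omega^2)

var_mc = (eps**2).sum(axis=1).var()   # Var of sum_i eps_i^2, B replications
var_theory = 12 * m * omega2**2 - 4 * omega2**2
print(var_mc, var_theory)
```

The simulated variance matches 12mω⁴ − 4ω⁴ up to Monte Carlo error, confirming both the per-term variance 8ω⁴ and the 2(m − 1) adjacent covariance terms of size 2ω⁴ each.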
The last term in the expression for the variance consists of uncorrelated terms, and since the efficient return process and the noise process are independent we find

4 Var_θ(Σ_{i=1}^m ε_{i,t} Y*_{i,t}) = 4 Σ_{i=1}^m Var_θ(ε_{i,t} Y*_{i,t}) = 8ω² E_θ[RV*_{t,t+1}]. (1.50)
This means that

Var_θ(RV^MMS_{t,t+1}) = Var_θ(RV*_{t,t+1}) + 12mω⁴ - 4ω⁴ + 8ω² E_θ[RV*_{t,t+1}],

and we get the following expression for E_θ[(RV^MMS_{t,t+1})²]:

E_θ[(RV^MMS_{t,t+1})²] = E_θ[(RV*_{t,t+1})²] + 4m²ω⁴ + 4mω² E_θ[RV*_{t,t+1}] + 12mω⁴ - 4ω⁴ + 8ω² E_θ[RV*_{t,t+1}]. (1.51)
Using the old moment condition

E_θ[(RV*_{t+1,t+2})² - H(RV*_{t,t+1})² - I(RV*_{t,t+1}) - J] ≈ 0,

where equality holds if we replace RV* with IV, and the above derivations, we find that

E_θ[(RV^MMS_{t+1,t+2})² - H(RV^MMS_{t,t+1})² - I(RV^MMS_{t,t+1}) - J - K - L] ≈ 0, (1.52)

where

K = (1 - H)(4m²ω⁴ + 4mω²α + 12mω⁴ - 4ω⁴ + 8ω²α),
L = -2mω² I,

since the expectation of RV* is α.
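The correction terms can be wrapped in a small helper (our sketch; the parameter values in the example call are arbitrary, not calibrated):

```python
def noise_corrections(m, omega2, alpha, H, I):
    """K and L in the noise-adjusted fourth-moment condition (1.52).

    Q collects the noise-induced terms in (1.51) with E[RV*] replaced by alpha.
    """
    Q = (4 * m**2 * omega2**2 + 4 * m * omega2 * alpha
         + 12 * m * omega2**2 - 4 * omega2**2 + 8 * omega2 * alpha)
    return (1 - H) * Q, -2 * m * omega2 * I

K, L = noise_corrections(m=48, omega2=1e-5, alpha=0.04, H=0.2, I=0.5)
print(K, L)
```

Note that both corrections vanish as ω² → 0, so (1.52) collapses back to the noise-free condition (1.47) in that limit.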
1.7 References
Andersen, T. G., Bollerslev, T., 1997. Intraday periodicity and volatility persistence in
financial markets. Journal of Empirical Finance 4, 115–158.
Andersen, T. G., Davis, R., Kreiss, J.-P., Mikosch, T., 2009. Handbook of Financial Time
Series. Springer.
Aït-Sahalia, Y., Kimmel, R., 2007. Maximum likelihood estimation of stochastic volatil-
ity models. Journal of Financial Economics 83, 413–452.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., Shephard, N., 2008a. Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., Shephard, N., 2008b. Realised
kernels in practice: Trades and quotes. Econometrics Journal 04, 1–32.
Barndorff-Nielsen, O. E., Shephard, N., 2001. Non-Gaussian OU-based models and some of their uses in financial economics (with discussion). Journal of the Royal Statistical Society B 63, 167–241.
Barndorff-Nielsen, O. E., Shephard, N., 2002. Econometric analysis of realized volatil-
ity and its use in estimating stochastic volatility models. Journal of the Royal Statis-
tical Society B 64, 253–280.
Bollerslev, T., Zhou, H., 2002. Estimating stochastic volatility diffusions using condi-
tional moments of integrated volatility. Journal of Econometrics 109, 33–65.
Bradley, R. C., 2005. Basic properties of strong mixing conditions: A survey and some
open questions. Probability Surveys 2, 107–144.
Brockwell, P. J., 2001. Lévy-driven CARMA processes. Annals of the Institute of Statistical Mathematics 53, 113–124.
Brownlees, C. T., Gallo, G. M., 2006. Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics & Data Analysis 51, 2232–2245.
Corradi, V., Distaso, W., 2006. Semi-parametric comparison of stochastic volatility
models using realized measures. Review of Economic Studies 73, 635–667.
Cox, J. C., Ingersoll, J. E., Ross, S. A., 1985. A theory of the term structure of interest
rates. Econometrica 53, 385–408.
Dacorogna, M., Müller, U., Nagler, R., Olsen, R., Pictet, O., 1993. A geographical model
for the daily and weekly seasonal volatility in the foreign exchange market. Journal
of International Money and Finance 12, 413–438.
Eraker, B., 2001. Markov Chain Monte Carlo analysis of diffusion models with appli-
cation to finance. Journal of Business and Economic Statistics 19-2, 177–191.
Falkenberry, T. N., 2001. High frequency data filtering. Technical report, Tick Data.
Forman, J. L., Sørensen, M., 2008. The Pearson diffusions: A class of statistically
tractable diffusion processes. Scandinavian Journal of Statistics 35, 438–465.
Gallant, A. R., Tauchen, G., 1996. Which moments to match? Econometric Theory 12,
657–681.
Gourieroux, C., Monfort, A., Renault, E., 1993. Indirect inference. Journal of Applied
Econometrics 8, S85–S118.
Hall, P., Horowitz, J., Jing, B., 1995. On blocking rules for the bootstrap with dependent
data. Biometrika 82, 561–574.
Hansen, P. R., Lunde, A., 2006. Realized variance and market microstructure noise.
Journal of Business and Economic Statistics 24, 127–218.
Heston, S. L., 1993. A closed-form solution for options with stochastic volatility with
applications to bond and currency options. Review of Financial Studies 6, 327–343.
Heyde, C. C., 1997. Quasi-Likelihood and its Application. Springer-Verlag, New York.
Jacod, J., Li, Y., Mykland, P., Podolskij, M., Vetter, M., 2009. Microstructure noise in
the continuous case: The pre-averaging approach. Stochastic Processes and Their
Applications 119, 2249–2276.
Karlin, S., Taylor, H. M., 1975. A First Course in Stochastic Processes. Academic Press,
New York.
Lahiri, S. N., 1999. Theoretical comparison of block bootstrap methods. Annals of
Statistics 27, 386–404.
Newey, W. K., West, K. D., 1987. A simple positive semi-definite, heteroskedasticity
and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Nolsøe, K., Nielsen, J. N., Madsen, H., 2000. Prediction-based estimating functions for
diffusion processes with measurement noise. Technical reports no. 10, Informatics
and mathematical modelling, Technical University of Denmark.
Sørensen, M., 2000. Prediction-based estimating functions. Econometrics Journal 3,
123–147.
Sørensen, M., 2011. Prediction-based estimating functions: Review and new develop-
ments. Brazilian Journal of Probability and Statistics 25, 362–391.
Todorov, V., 2009. Estimation of continuous-time stochastic volatility models with
jumps using high-frequency data. Journal of Econometrics 148, 131–148.
Todorov, V., Tauchen, G., 2006. Simulation methods for Lévy-driven CARMA stochastic
volatility models. Journal of Business and Economic Statistics 24, 455–469.
CHAPTER 2

ON ESTIMATION METHODS FOR NON-GAUSSIAN
ORNSTEIN-UHLENBECK PROCESSES:
A MONTE CARLO STUDY
Anne Floor Brix
Aarhus University and CREATES
Abstract

Estimators based on quadratic martingale estimating functions are derived for the parameters governing the non-Gaussian Ornstein-Uhlenbeck process. The performance of the estimators is analyzed in a Monte Carlo study and compared to the performance of the approximate maximum likelihood estimator from Valdivieso et al. (2009), which is based on fast Fourier transforms. Finite activity as well as infinite activity non-Gaussian Ornstein-Uhlenbeck processes are considered. The performance of the estimators is investigated in different scenarios, corresponding to the two different types of processes used for modeling commodity spot prices: the "base-signal" and the "spike" process. Different ways of obtaining initial values for the estimation procedures are also analyzed.
2.1 Introduction
In the vast literature on energy commodities, Ornstein-Uhlenbeck (OU) processes are
often used as building blocks for constructing models. Since the spot prices in these
markets are determined by supply and demand equilibrium they are, in contrast to
stock prices, stationary and mean-revert to a, possibly stochastic, seasonally varying
mean-level. The usage of Gaussian Ornstein-Uhlenbeck processes for modeling
energy spot prices goes back to the well-known Schwartz model from Schwartz
(1997). Another stylized feature of energy markets, that needs to be taken into account
when building models, is price spikes, which are particularly important for electricity
modeling. Spikes in electricity spot prices are caused by large imbalances in supply
and demand. Because of the exponentially increasing marginal cost structure of
production1 and inelastic demand, these imbalances result in sudden large jumps in
the spot price. The mean-reversion is typically very strong during these peak periods
and the prices rapidly revert back to normal evolution, leaving a spike in the spot
price series. Models based on non-Gaussian OU processes are a popular way of dealing
with these price spikes, see for instance Benth and Saltyte Benth (2004), Benth et al.
(2007), Meyer-Brandis and Tankov (2008), Klüppelberg, Meyer-Brandis, and Schmidt
(2010) and Benth and Vos (2013). In these papers the spot prices are often modeled as
a superposition of OU processes, one OU process for modeling the normal variations
(often referred to as the base-signal) and one used for capturing the price spikes. In
for instance Benth et al. (2007) and Benth, Kiesel, and Nazarova (2012) the authors
also use a non-Gaussian OU process for the base-signal part of electricity spot prices.
Statistical inference for superpositions of OU processes is complicated by the
resulting model not being Markovian. Most (if not all) non-Bayesian parametric
approaches found in the literature on commodity modeling deal with this problem
by splitting the observations into a base-component and a spike-component using
various filtering techniques, see Meyer-Brandis and Tankov (2008), Klüppelberg et al.
(2010) and Benth et al. (2012). Parameter estimation is then, subsequently, carried
out on each of the filtered OU processes. This will serve as the starting point for the
present chapter, which is an investigation of various estimation methods used for
conducting parametric inference for non-Gaussian OU processes. More specifically
we will consider observations from D-OU processes, where D refers to the marginal
distribution of the observations. As for the choices of D, we consider both the finite
activity Γ-OU process and the infinite activity Inverse Gaussian (IG) OU process.
In contrast to the Gaussian case, maximum likelihood estimation is not directly
feasible since no analytical expression for the transition density is available. However,
the characteristic function is analytically tractable and can be inverted using Fourier
techniques, making approximate likelihood estimation possible. This idea was sug-
¹Nuclear power and renewable energy often have low variable costs and are used to cover the base demand. A sudden big increase in demand is often covered by burning fuel, which has a very high variable cost.
gested and carried out in Valdivieso et al. (2009), where the finite sample performance
is also investigated for non-Gaussian OU processes with marginal distributions simi-
lar to the ones used for modeling the normal variations in commodity spot prices.
This chapter extends the Monte Carlo study from Valdivieso et al. (2009) to also in-
clude parameter configurations resulting in marginal distributions more suitable for
modeling the spike behavior, for instance by having larger but fewer jumps in the
Γ-OU case. Instead of approximating the likelihood function, the estimation method
based on martingale estimating functions (MGEFs) aims at approximating the score
function. Since the non-Gaussian OU process is a Markov process and conditional moments are analytically computable, MGEFs are a natural choice of estimating function. For an introduction to MGEFs, a non-exhaustive list of references includes Bibby and Sørensen (1995), Kessler (1995), Sørensen (1999) and Bibby, Jacobsen, and
Sørensen (2002).
The class of MGEFs considered in this chapter is the quadratic MGEFs which
are based on the conditional mean and variance of the OU process. The optimal quadratic MGEF and a simple benchmark with a similar structure are derived for the
non-Gaussian OU process and their finite sample performances are studied and
compared to the maximum likelihood method based on fast Fourier transforms from
Valdivieso et al. (2009). MGEFs are usually derived and studied in the context of
(jump) diffusions. A suboptimal quadratic MGEF is derived for non-Gaussian OU
processes in Hubalek and Posedel (2013) in the context of the stochastic volatility
model from Barndorff-Nielsen and Shephard (2001b), under the assumption that
both the price process and the volatility process are observable. The quadratic MGEF from
Hubalek and Posedel (2013) admits an explicit estimator and the authors prove both
consistency and asymptotic normality of this estimator. However, to the best of my
knowledge, it is the first time that optimal quadratic MGEFs are explicitly derived for
non-Gaussian OU processes, making a study of the finite sample performances of
the resulting estimator possible. The optimal quadratic MGEF does not result in an
explicit expression for the estimator, so numerical routines must be applied.
As natural benchmarks for the two estimation approaches under consideration,
we consider straightforward estimation methods often applied in the aforementioned
literature. In particular, we investigate which of these methods are best suited for
generating initial values to be used in the two estimation approaches, and investigate
how big, if any, the gain is from using the two more complex methods. Both finite
and infinite activity OU processes are considered in two different parameter settings (the base-signal and the spike scenario). Furthermore, the finite sample performances
are also investigated for two different levels of the mean-reversion parameter, which
directly translates into two different observation frequencies.
The chapter is organized as follows: in the following section the non-Gaussian
OU process and some of its basic properties are reviewed. In section 3 the method
from Valdivieso et al. (2009) is reviewed and an optimal MGEF for the non-Gaussian
OU process is derived. Section 4 contains the extensive Monte Carlo study of the
finite sample performances of the various estimators under consideration. In section
5 possible extensions of the estimation procedures to other non-Gaussian OU based
models are discussed. The final section concludes on the findings.
2.2 Non-Gaussian Ornstein-Uhlenbeck Processes
The non-Gaussian Ornstein-Uhlenbeck (OU) process, X, is given as the stationary solution of the following stochastic differential equation driven by the Lévy process Z:

dX(t) = -λX(t) dt + dZ(λt), λ > 0, (2.1)

where X(0) is assumed to be independent of Z and have marginal distribution D. The Lévy process Z is denoted the background driving Lévy process (BDLP) and will throughout the chapter be assumed to be a subordinator². This choice ensures positivity of X and also implies that X will have bounded variation. Except for the derivation of a high-frequency estimator for the mean-reversion parameter, all the estimators in the paper are also valid with Z being a general Lévy process. The time-change of the BDLP, Z, implies that the marginal distribution of X will not depend on λ.³ Since E(Z(λt)) = λt E(Z(1)), we can rewrite the SDE in equation (2.1) as

dX(t) = -λ(X(t) - E(Z(1))) dt + d(Z(λt) - E(Z(λt))).
Hence, it follows that X(t) mean-reverts to the long-run mean of the BDLP, E(Z(1)), at rate λ. The non-Gaussian OU process moves up when the BDLP Z jumps up, and when Z is of finite activity X(t) decays exponentially at rate λ between the jumps. Note that, due to the time-change, the mean-reversion parameter λ also determines the rate at which jumps occur. When E(log(1 + |Z(1)|)) < ∞, the SDE (2.1) has the following stationary solution:

X(t) = e^{-λt} X(0) + ∫_0^t e^{-λ(t-u)} dZ(λu). (2.2)

From the solution it is clear that X(t) will be non-negative when the background driving Lévy process (BDLP) Z is a subordinator and X(0) ≥ 0. By using (2.2) and letting ∆ denote the length of the time interval between observations, we can derive a recursive relationship for the discretized non-Gaussian OU process:

X(i∆) = e^{-λ∆} X((i-1)∆) + e^{-λi∆} ∫_{λ(i-1)∆}^{λi∆} e^s dZ(s)
 =_d e^{-λ∆} X((i-1)∆) + e^{-λ∆} ∫_0^{λ∆} e^s dZ(s), (2.3)
²A Lévy process with stationary, independent and non-negative increments. ³See Barndorff-Nielsen and Shephard (2001b).
where the equality in distribution follows from Proposition 3.1 in Valdivieso et al. (2009). Now, introducing the notation E(X(t)) = ξ and Var(X(t)) = ω², it follows from (2.3) that the non-Gaussian OU process can be interpreted as a continuous-time analog of the AR(1) model, since we have

X(i∆) = e^{-λ∆} X((i-1)∆) + ε_i, where the ε_i are i.i.d., (2.4)

with

E(ε) = ξ(1 - e^{-λ∆}) and Var(ε) = ω²(1 - e^{-2λ∆}).
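For the Γ(ν,α)-OU process treated below, the Lévy functionals ε_i in (2.3)-(2.4) can be simulated exactly, since the BDLP is compound Poisson with jump intensity ν and Exp(mean α) jump sizes; that compound-Poisson form is the standard one from Barndorff-Nielsen and Shephard (2001b), but the implementation sketched here is our own, and the moment check uses parameter values of our choosing.

```python
import numpy as np

rng = np.random.default_rng(3)

def gamma_ou_innovations(lam, nu, alpha, delta_t, size, rng):
    """Draw eps from (2.3) for the Gamma(nu, alpha)-OU process.

    eps = e^{-lam*dt} * sum_j e^{s_j} J_j, with Poisson(nu*lam*dt) jumps,
    jump times s_j uniform on [0, lam*dt] and Exp(mean alpha) jump sizes.
    """
    T = lam * delta_t
    eps = np.empty(size)
    n_jumps = rng.poisson(nu * T, size=size)
    for i, k in enumerate(n_jumps):
        s = rng.uniform(0.0, T, size=k)
        jumps = rng.exponential(scale=alpha, size=k)
        eps[i] = np.exp(-T) * np.sum(np.exp(s) * jumps)
    return eps

lam, nu, alpha, dt = 1.0, 2.0, 0.5, 0.1
xi, omega2 = nu * alpha, nu * alpha**2      # stationary mean and variance of X
eps = gamma_ou_innovations(lam, nu, alpha, dt, size=100_000, rng=rng)
print(eps.mean(), xi * (1.0 - np.exp(-lam * dt)))
print(eps.var(), omega2 * (1.0 - np.exp(-2.0 * lam * dt)))
```

The simulated first two moments of ε reproduce the closed forms below (2.4), which is a useful implementation check before running any of the Monte Carlo experiments.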
In Barndorff-Nielsen and Shephard (2001b) the relationship between the cumulant generating function of the non-Gaussian OU process and the BDLP is established. If we let k̄(θ) := log(E[exp(-θZ(1))]) and k(θ) := log(E[exp(-θX(t))]), then it follows from Barndorff-Nielsen and Shephard (2001b) that k̄(θ) = θk′(θ), and we furthermore get κ̄_m = m κ_m, where κ̄_m and κ_m denote the cumulants of Z(1) and X(t), respectively. In particular this yields E(X(t)) = E(Z(1)) and Var(X(t)) = ½ Var(Z(1)). The autocorrelation function of the stationary non-Gaussian OU process is given by r(u) = e^{-λ|u|}.

When building non-Gaussian OU processes, there are two different approaches.
One approach is to specify the marginal distribution, D, of the process X, in which case X is called a D-OU process. In the other approach, the non-Gaussian OU process is constructed by specifying the BDLP. For constraints on valid BDLPs see Barndorff-Nielsen and Shephard (2001b). We will take the first approach and consider two different choices of D, both from the class of generalized inverse Gaussian (GIG) marginal laws. The following two special cases of the GIG class, which are commonly used in the literature, will be considered.

• The Γ-OU process:
In this case X ∼ Γ(ν,α) with shape parameter ν > 0 and scale parameter α > 0. The density of the non-Gaussian OU process X is given by

f(x; ν, α) = (1/(Γ(ν)α^ν)) x^{ν-1} exp(-x/α), ∀x > 0,

where Γ(·) denotes the gamma function.

• The IG-OU process:
In this case X ∼ IG(δ,γ) with δ > 0 and γ ≥ 0. The density of X is given by

f(x; δ, γ) = (δ/√(2π)) exp(δγ) x^{-3/2} exp(-½(δ²x^{-1} + γ²x)), ∀x > 0.

In the literature the Inverse Gaussian distribution has been parametrized in several different ways. Here the parametrization from Barndorff-Nielsen (1998) is followed.
The Γ(ν,α)-OU process is of finite activity, meaning that almost all paths have a finite number of jumps on any finite time interval. The IG(δ,γ)-OU process displays an infinite number of jumps on any finite time interval and is therefore of infinite activity. The activity of the OU process is determined by the behavior of the small jumps. More precisely, X will be of finite activity if ∫_{0+}^1 W(dx) < ∞, where W is the Lévy measure in the Lévy-Khintchine representation of Z(1).

One of the questions this chapter aims to answer is whether the different jump characters of the processes will influence the finite sample performances of the estimation methods under consideration.
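One of the straightforward benchmark methods alluded to above is pure moment matching: estimate ξ and ω² by the sample mean and variance and back out λ from the lag-one autocorrelation via r(∆) = e^{-λ∆}. The sketch below is our own, and the sanity check deliberately uses a Gaussian AR(1) surrogate with the same first two moments and ACF rather than an actual D-OU path:

```python
import numpy as np

def moment_estimates(x, delta_t):
    """Moment-based estimates of (xi, omega^2, lambda) from equally spaced data,
    using E(X) = xi, Var(X) = omega^2 and r(u) = exp(-lambda |u|)."""
    xi_hat = x.mean()
    omega2_hat = x.var()
    xc = x - xi_hat
    rho1 = np.sum(xc[1:] * xc[:-1]) / np.sum(xc**2)
    lam_hat = -np.log(rho1) / delta_t
    return xi_hat, omega2_hat, lam_hat

rng = np.random.default_rng(4)
lam, dt, n = 1.0, 0.1, 100_000
phi = np.exp(-lam * dt)
x = np.empty(n)
x[0] = rng.normal()
innov = rng.normal(scale=np.sqrt(1.0 - phi**2), size=n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + innov[i]

xi_hat, omega2_hat, lam_hat = moment_estimates(x, dt)
print(xi_hat, omega2_hat, lam_hat)
```

Because these estimates use only the AR(1)-type moment structure (2.4), they apply to both the Γ-OU and the IG-OU case, which is what makes them natural candidates for generating initial values.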
2.3 Estimation Methods
In this section the two main estimation methods under consideration will be pre-
sented. The Gaussian OU process can be estimated using the maximum likelihood
(ML) method. Although the model is still Markovian in the non-Gaussian case, we
no longer have an analytical expression for the density of the i.i.d. Lévy functionals,
ε, from (2.4). Furthermore, for finite activity OU processes the Lévy functionals are
mixed random variables that only have a density conditional on the presence of
jumps. One way of circumventing these challenges and performing approximate
likelihood estimation was suggested in Valdivieso et al. (2009). The method from Val-
divieso et al. (2009) uses the fast Fourier transform (FFT) to obtain an approximation
of the likelihood function by inverting the characteristic function of the Lévy func-
tionals. The estimation method from Valdivieso et al. (2009) is presented in the next
subsection and should, if well implemented, give parameter estimates close to the
infeasible ML estimates. In subsection 2.3.2 the estimation method based on martin-
gale estimating functions (MGEFs) is presented. This method aims at approximating
the score function rather than the likelihood function. In contrast to the method from Valdivieso et al. (2009), which exploits the link between the characteristic function and the density, the MGEF based approximation is somewhat ad hoc: the unknown score function is simply approximated by martingales of a certain structure.
The optimal MGEF within a given class of MGEFs is the one closest to the score
function in an L² sense. The choice of MGEFs used to approximate the score will
obviously influence the efficiency of the resulting estimator. The MGEFs considered
in this chapter are the so-called quadratic MGEFs. The optimal MGEF within this
class will be derived in subsection 2.3.2. The estimation method based on quadratic
MGEFs relies only on computation of conditional moments of the OU process and
does not require knowledge of the entire conditional density function. Therefore, the
MGEF based procedure is not affected by the Lévy functionals of the finite activity OU processes being mixed random variables.
2.3.1 Maximum Likelihood Estimation using the Fast Fourier Transform
Assume we have observations x0, x∆, . . . , xn∆ from a discretely sampled D-OU process
X , sampled at equidistant time intervals of length ∆. We will let time be measured
in days such that daily sampling corresponds to ∆= 1. Due to Markovianity and the
AR(1) structure of X , the likelihood function of the sample is given by
\mathcal{L}(\theta) = f_{X(0)}(x_0)\prod_{i=1}^{n} f_{X(i\Delta)\mid X((i-1)\Delta)=x_{(i-1)\Delta}}(x_{i\Delta}),

where f_{X(i\Delta)\mid X((i-1)\Delta)=x_{(i-1)\Delta}} denotes the conditional density of X(i\Delta) given that X((i-1)\Delta) takes the value x_{(i-1)\Delta}. Unfortunately, this conditional density is not available in
closed form when X is a non-Gaussian OU process. One way of circumventing this
problem is to approximate the conditional density as done in Valdivieso et al. (2009).
Recall the recursive formula

X(i\Delta) \stackrel{d}{=} e^{-\lambda\Delta}X((i-1)\Delta) + e^{-\lambda\Delta}Z^{*}(\Delta),

where Z^{*}(\Delta) = \int_{0}^{\lambda\Delta} e^{s}\,\mathrm{d}Z(s). From this it follows that the conditional cumulative distribution function evaluated at x_{i\Delta} is given by

P\bigl(X(i\Delta) \le x_{i\Delta} \mid X((i-1)\Delta) = x_{(i-1)\Delta}\bigr) = P\bigl(Z^{*}(\Delta) \le e^{\lambda\Delta}x_{i\Delta} - x_{(i-1)\Delta}\bigr).

If Z^{*}(\Delta) is a continuous random variable, we can differentiate w.r.t. x_{i\Delta} and obtain the link between the density functions

f_{X(i\Delta)\mid X((i-1)\Delta)=x_{(i-1)\Delta}}(x_{i\Delta}) = e^{\lambda\Delta} f_{Z^{*}(\Delta)}\bigl(e^{\lambda\Delta}x_{i\Delta} - x_{(i-1)\Delta}\bigr).
The idea is then to compute the characteristic function, \varphi_{Z^{*}(\Delta)}, of the Lévy functional Z^{*}(\Delta) and use the inversion formula

f_{Z^{*}(\Delta)}(z) = \frac{1}{\pi}\int_{0}^{\infty} e^{-iuz}\varphi_{Z^{*}(\Delta)}(u)\,\mathrm{d}u \qquad (2.5)

and the discrete fast Fourier transform (FFT) to evaluate the density function of Z^{*}(\Delta). Using the proposition below from Barndorff-Nielsen (1998), the cumulant characteristic function, and hence also the characteristic function of Z^{*}(\Delta), can be computed in terms of the cumulant characteristic function of the BDLP, Z, and of a process with the same marginal distribution D as the OU process.

Proposition 2. For all \Delta > 0 and v \in \mathbb{R} we have that

C_{Z^{*}(\Delta)}(v) = \log\bigl(E(\exp(ivZ^{*}(\Delta)))\bigr) = \lambda\int_{0}^{\Delta} C_{Z(1)}(v\exp(\lambda s))\,\mathrm{d}s,

where C_{Z(1)}(v) = v\,\frac{\mathrm{d}C_{D}}{\mathrm{d}v}(v).

Proof. See Barndorff-Nielsen (1998) for a proof.
In the Monte Carlo study the method will be applied to the Γ(ν,α)-OU process and the IG(δ,γ)-OU process. For the Γ(ν,α)-OU process, the corresponding Lévy functional Z^{*}(\Delta) will not be a continuous random variable but a mixed random variable, since Z^{*}(\Delta) equals 0 if there are no jumps on the interval, which happens with probability p = e^{-\nu\lambda\Delta}. Again following Valdivieso et al. (2009), the conditional density of the OU process is slightly modified to

f_{X(i\Delta)\mid X((i-1)\Delta)=x_{(i-1)\Delta}}(x_{i\Delta}) =
\begin{cases}
p, & \text{if } e^{\lambda\Delta}x_{i\Delta} - x_{(i-1)\Delta} = 0,\\
e^{\lambda\Delta} f^{J}_{Z^{*}(\Delta)}\bigl(e^{\lambda\Delta}x_{i\Delta} - x_{(i-1)\Delta}\bigr), & \text{if } e^{\lambda\Delta}x_{i\Delta} - x_{(i-1)\Delta} > 0,
\end{cases}
\qquad (2.6)

where f^{J}_{Z^{*}(\Delta)} denotes the density of Z^{*}(\Delta) conditional on the presence of jumps. Since the characteristic function of Z^{*}(\Delta) equals one when there are no jumps, f^{J}_{Z^{*}(\Delta)} can be evaluated by computing the Fourier transform of

\varphi^{J}_{Z^{*}(\Delta)} = \frac{\varphi_{Z^{*}(\Delta)} - p}{1 - p}.

The
characteristic function of a Γ(ν,α)-distributed random variable X is given by

\varphi_{X}(v) = \Bigl(\frac{1/\alpha}{1/\alpha - iv}\Bigr)^{\nu},

and using Proposition 2 we find C_{D}(v) = \log(\varphi_{X}(v)) = \nu\bigl(\log(1/\alpha) - \log(1/\alpha - iv)\bigr) and

C_{Z(1)}(v) = \frac{\nu v i}{1/\alpha - iv} = \nu\Bigl(\frac{1/\alpha}{1/\alpha - iv} - 1\Bigr).

The characteristic function of the Lévy functional can therefore be found to equal

\varphi_{Z^{*}(\Delta)}(v) = \exp\biggl(\lambda\int_{0}^{\Delta}\nu\Bigl(\frac{1/\alpha}{1/\alpha - iv\exp(\lambda s)} - 1\Bigr)\mathrm{d}s\biggr)
= \exp\biggl(\nu\log\Bigl(\frac{1/\alpha - iv}{1/\alpha - iv\exp(\lambda\Delta)}\Bigr)\biggr)
= \Bigl(\frac{1/\alpha - iv}{1/\alpha - iv\exp(\lambda\Delta)}\Bigr)^{\nu}.
In the IG(δ,γ)-OU case the characteristic function \varphi_{X}(v) is given by

\varphi_{X}(v) = \exp\bigl(\delta\gamma - \delta\sqrt{\gamma^{2} - 2iv}\bigr),

so, again using Proposition 2, we find

C_{Z(1)}(v) = \frac{\delta v i}{\sqrt{\gamma^{2} - 2iv}},

and

\varphi_{Z^{*}(\Delta)}(v) = \exp\biggl(\lambda\int_{0}^{\Delta}\frac{\delta iv\exp(\lambda s)}{\sqrt{\gamma^{2} - 2iv\exp(\lambda s)}}\,\mathrm{d}s\biggr)
= \exp\Bigl(\delta\Bigl[-\sqrt{\gamma^{2} - 2iv\exp(\lambda s)}\Bigr]_{0}^{\Delta}\Bigr)
= \exp\Bigl(\delta\bigl(\sqrt{\gamma^{2} - 2iv} - \sqrt{\gamma^{2} - 2iv\exp(\lambda\Delta)}\bigr)\Bigr).
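The two closed forms above can be cross-checked numerically against Proposition 2. The sketch below (my own illustration, with scenario-1 parameters assumed) compares the closed-form characteristic functions with a trapezoidal discretization of the integral \lambda\int_{0}^{\Delta} C_{Z(1)}(v e^{\lambda s})\,\mathrm{d}s:

```python
import numpy as np

lam, Delta = 0.25, 1.0
nu, alpha = 10.0, 1/15
delta, gam = np.sqrt(20/3), np.sqrt(15)

def C_Z1_gamma(v):
    # cumulant function of Z(1) in the Gamma-OU case
    return nu * ((1/alpha) / (1/alpha - 1j*v) - 1.0)

def C_Z1_ig(v):
    # cumulant function of Z(1) in the IG-OU case
    return delta * v * 1j / np.sqrt(gam**2 - 2j*v)

def phi_Zstar_numeric(C, v, n=20001):
    # Proposition 2: phi_{Z*(Delta)}(v) = exp(lam * int_0^Delta C_{Z(1)}(v e^{lam s}) ds)
    s = np.linspace(0.0, Delta, n)
    vals = C(v * np.exp(lam * s))
    h = Delta / (n - 1)
    integral = h * (vals.sum() - 0.5*(vals[0] + vals[-1]))   # trapezoid rule
    return np.exp(lam * integral)

v = 1.7
phi_gamma = ((1/alpha - 1j*v) / (1/alpha - 1j*v*np.exp(lam*Delta)))**nu
phi_ig = np.exp(delta*(np.sqrt(gam**2 - 2j*v)
                       - np.sqrt(gam**2 - 2j*v*np.exp(lam*Delta))))
print(abs(phi_Zstar_numeric(C_Z1_gamma, v) - phi_gamma))  # ~0
print(abs(phi_Zstar_numeric(C_Z1_ig, v) - phi_ig))        # ~0
```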
Given the characteristic functions, FFT can now be used to approximate the likelihood
function.
Discrete Fast Fourier Transform
The discrete FFT is a numerical method used for evaluating the integral in (2.5) on a grid of points. For an input vector x = (x(0), x(1), \ldots, x(N-1)), the function fft in MATLAB computes the sums

X(j) = \sum_{m=0}^{N-1} x(m)\,\omega_{N}^{m(j-1)}, \qquad \omega_{N} = e^{-2\pi i/N}, \qquad j = 1, \ldots, N. \qquad (2.7)
To see how this fits with the inversion formula, consider the discretized version of the integral using the trapezoid rule

f_{Z^{*}(\Delta)}(z) = \frac{1}{\pi}\int_{0}^{\infty} e^{-iuz}\varphi_{Z^{*}(\Delta)}(u)\,\mathrm{d}u \approx \frac{1}{\pi}\sum_{m=0}^{N-1}\delta_{m}\,e^{-iu_{m}z}\varphi_{Z^{*}(\Delta)}(u_{m})\,\Delta u,

where \delta_{0} = 1/2 and \delta_{m} = 1 for m \neq 0. Now let \eta = \Delta u and u_{m} = \eta m. If we let z_{j} = -b + \zeta(j-1) for j = 1, \ldots, N with \zeta = 2\pi/(\eta N) and b \in \mathbb{R}, we obtain the approximation

f_{Z^{*}(\Delta)}(z_{j}) \approx \frac{1}{\pi}\sum_{m=0}^{N-1}\delta_{m}\,e^{-i\eta m(-b + \frac{2\pi}{\eta N}(j-1))}\,\varphi_{Z^{*}(\Delta)}(u_{m})\,\eta = \sum_{m=0}^{N-1} x(m)\,e^{-\frac{2\pi i}{N}m(j-1)}, \qquad\text{with } x(m) = \frac{1}{\pi}\,\delta_{m}\,e^{i\eta m b}\,\varphi_{Z^{*}(\Delta)}(\eta m)\,\eta,

which is of the FFT form (2.7) and can be used to evaluate the desired density f_{Z^{*}(\Delta)} at the grid points z_{1}, \ldots, z_{N}. Note that to center the grid points around 0, one sets b = \zeta N/2. When constructing the likelihood function, linear interpolation will be used to evaluate the density function at points that do not coincide with the grid points.
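A minimal Python sketch of the scheme above (using numpy's FFT in place of MATLAB's fft, the IG-OU characteristic function derived earlier, and taking the real part since the density is real; the grid sizes N and η are illustrative choices of mine):

```python
import numpy as np

lam, Delta = 0.25, 1.0
delta, gam = np.sqrt(20/3), np.sqrt(15)      # IG scenario-1 parameters

def phi_Zstar(u):
    # characteristic function of Z*(Delta) for the IG(delta, gamma)-OU process
    return np.exp(delta * (np.sqrt(gam**2 - 2j*u)
                           - np.sqrt(gam**2 - 2j*u*np.exp(lam*Delta))))

N, eta = 2**14, 0.25                 # number of grid points and u-spacing
zeta = 2*np.pi / (eta*N)             # resulting z-spacing
b = zeta*N/2                         # centers the z-grid around zero
m = np.arange(N)
w = np.where(m == 0, 0.5, 1.0)       # trapezoid weights delta_m
xin = w * np.exp(1j*eta*m*b) * phi_Zstar(eta*m) * eta / np.pi
f = np.fft.fft(xin).real             # density of Z*(Delta) at the grid points
z = -b + zeta*m                      # z_j = -b + zeta*(j-1)

mass = f.sum()*zeta                  # should be close to 1
mean = (z*f).sum()*zeta              # should be close to E[Z*(Delta)]
print(mass, mean)
```

The resulting grid (z, f) is then what the linear interpolation acts on when the likelihood is evaluated.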
2.3.2 Martingale Estimating Functions
In this subsection the theory of martingale estimating functions (MGEFs) is briefly
reviewed and the optimal quadratic MGEF for the non-Gaussian OU process is de-
rived. Most of the literature on MGEFs is developed for diffusion processes, see for
example Bibby and Sørensen (1995) and Bibby et al. (2002), but as also noted in
Sørensen (1997), Kessler (1995) and Sørensen (1999) most of the results extend to
general Markov processes and more general MGEFs than those considered here.
The non-Gaussian OU process is a Markov process but, as was already noted
in the previous subsection, the transition density is unknown and exact maximum
likelihood estimation is therefore infeasible. The main idea underlying the use of
MGEFs is to try to approximate the unknown score function, which is itself an MGEF.
Asymptotic results for MGEFs utilize the well-developed martingale limit theory and
the reader is referred to Bibby and Sørensen (1995), Kessler (1995) and Sørensen
(1999) for conditions ensuring consistency and asymptotic normality of the resulting
estimators.
An estimating function, G_n(θ), is a function of the n data points and of the p-dimensional parameter vector θ ∈ Θ ⊆ R^p that we wish to estimate. An estimator is then obtained by solving the p equations G_n(θ) = 0 w.r.t. θ. In our case θ is 3-dimensional and consists of the mean-reversion parameter λ and the two parameters governing the marginal distribution D. An MGEF is an estimating function satisfying

E_{\theta}\bigl(G_n(\theta) \mid \mathcal{F}_{n-1}\bigr) = G_{n-1}(\theta), \qquad n = 1, 2, \ldots,

where \mathcal{F}_{n-1} is the σ-field generated by the past observations up to time (n-1)\Delta and G_0 = 0. If we let y ↦ p(s, x, y;θ) denote the transition density, i.e. the conditional density of X_{t+s} given X_t = x, then the score function takes the form

U_n(\theta) = \sum_{i=1}^{n}\partial_{\theta}\log p(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta).
Under mild regularity conditions allowing for the interchange of differentiation and
integration it easily follows that the score function is an MGEF, see for instance
Barndorff-Nielsen and Sørensen (1994). The score function will be approximated
using MGEFs of the form

G_n(\theta) = \sum_{i=1}^{n} g(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta) = \sum_{i=1}^{n} a(\Delta, X_{(i-1)\Delta};\theta)\,h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta), \qquad (2.8)

where h = (h_1, \ldots, h_N)' and, for each j, the function h_j satisfies \int_{S} h_j(\Delta, x, y;\theta)\,p(\Delta, x, y;\theta)\,\mathrm{d}y = 0 for all x in the state space, S, of X and for all θ ∈ Θ. The p × N weight matrix a(\Delta, x;\theta) is a function of x such that (2.8) is P_θ-integrable. G_n(θ) defined in this way is clearly an MGEF and in particular it is an unbiased estimating function, i.e. E_θ(G_n(θ)) = 0. Given the real-valued functions h_j, the weight matrix
can be chosen in an optimal way using the theory of optimal estimating functions.
The optimal weight matrix will result in the estimating function that approximates
the score function the best, in a mean square sense, within the class of estimating
functions of the form (2.8). Results stating the optimal weight matrix in a setting containing non-Gaussian OU processes can be found in Theorem 3.1 of Sørensen (1997) and, for the more general setting of Markov chains, in Kessler (1995). How to choose the approximating functions, the h_j's, is however much more of an art than a science. The choice of h_j's will affect how well the score function is approximated and hence also the efficiency of the corresponding estimator.
In our application to non-Gaussian OU processes we will focus on the quadratic
MGEFs, which have proven to work well in the diffusion setting (see for instance
Bibby and Sørensen (1995)). The quadratic MGEF is of the form (2.8) with N = 2 and
with h_1 and h_2 given by

h_{1}(\Delta, x, y;\theta) = y - F(\Delta, x;\theta),
h_{2}(\Delta, x, y;\theta) = \bigl(y - F(\Delta, x;\theta)\bigr)^{2} - \beta(\Delta, x;\theta),

where F and β denote the conditional mean and variance of the transition density, F(\Delta, x;\theta) = E_{\theta}(X_{\Delta}\mid X_{0} = x) and \beta(\Delta, x;\theta) = \mathrm{Var}_{\theta}(X_{\Delta}\mid X_{0} = x). With no a priori insight on how to choose the approximating functions, it seems like a good choice to consider
quadratic MGEFs, since these will ensure that the empirical first and second order
conditional moments match the theoretical ones. One could also consider increasing N and including functions of higher-order conditional moments or trigonometric/exponential moments of X. However, this would further complicate the derivation of the optimal weight matrix, and as already mentioned the quadratic MGEFs seem like a natural starting point for investigating the potential of MGEFs for pure-jump OU processes.
In most settings F and β will have to be computed using simulations, but for the
non-Gaussian OU process we can find explicit analytic expressions for them as well
as for the optimal weight matrix a^{*}(\Delta, X_{(i-1)\Delta};\theta). The optimal weight matrix is found by computing the L² projection of \partial_{\theta}\log p(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta) onto the space spanned by the two functions h_1 and h_2. As a result (see Kessler (1995)), the optimal weight is given by
a^{*}(\Delta, X_{(i-1)\Delta};\theta) = -E_{\theta}\bigl(\partial_{\theta'}h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta)\mid\mathcal{F}_{i-1}\bigr)'\,V_h(\Delta, X_{(i-1)\Delta};\theta)^{-1}, \quad\text{where}

V_h(\Delta, X_{(i-1)\Delta};\theta) = E_{\theta}\bigl(h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta)\,h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta)'\mid\mathcal{F}_{i-1}\bigr).

With our choice of h we have

V_h(\Delta, x;\theta) = E_{\theta}\bigl(h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta)\,h(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta)'\mid X_{(i-1)\Delta} = x\bigr) = \begin{pmatrix} \beta(\Delta, x;\theta) & \eta(\Delta, x;\theta)\\ \eta(\Delta, x;\theta) & \psi(\Delta, x;\theta) \end{pmatrix},

where \beta(\Delta, x;\theta) as before denotes the conditional variance of X_{i\Delta} given X_{(i-1)\Delta} = x and

\eta(\Delta, x;\theta) = E_{\theta}\bigl((X_{i\Delta} - F(\Delta, X_{(i-1)\Delta};\theta))^{3}\mid X_{(i-1)\Delta} = x\bigr),
\psi(\Delta, x;\theta) = E_{\theta}\bigl((X_{i\Delta} - F(\Delta, X_{(i-1)\Delta};\theta))^{4}\mid X_{(i-1)\Delta} = x\bigr) - \beta(\Delta, x;\theta)^{2}.
To ease notation, we will suppress \Delta and for instance write F(x;\theta) instead of F(\Delta, x;\theta). The optimal weight matrix (after multiplying by -1) will then have columns given by

a^{*}_{1}(x;\theta) = \frac{\partial_{\theta}\beta(x;\theta)\,\eta(x;\theta) - \partial_{\theta}F(x;\theta)\,\psi(x;\theta)}{\beta(x;\theta)\psi(x;\theta) - \eta(x;\theta)^{2}},
\qquad
a^{*}_{2}(x;\theta) = \frac{\partial_{\theta}F(x;\theta)\,\eta(x;\theta) - \partial_{\theta}\beta(x;\theta)\,\beta(x;\theta)}{\beta(x;\theta)\psi(x;\theta) - \eta(x;\theta)^{2}}. \qquad (2.9)
Recall that for the non-Gaussian OU processes the discretized process has the AR(1) structure from (2.4). We therefore get

F(x;\theta) = e^{-\lambda\Delta}x + E_{\theta}(\varepsilon_i) = e^{-\lambda\Delta}x + \xi(1 - e^{-\lambda\Delta}),
\beta(x;\theta) = \mathrm{Var}_{\theta}(X_{i\Delta}\mid X_{(i-1)\Delta} = x) = \mathrm{Var}_{\theta}(\varepsilon_i) = \omega^{2}(1 - e^{-2\lambda\Delta}),
\eta(x;\theta) = E_{\theta}([\varepsilon_i - E_{\theta}(\varepsilon_i)]^{3}\mid X_{(i-1)\Delta} = x) = E_{\theta}([\varepsilon_i - E_{\theta}(\varepsilon_i)]^{3}),
\psi(x;\theta) = E_{\theta}([\varepsilon_i - E_{\theta}(\varepsilon_i)]^{4}\mid X_{(i-1)\Delta} = x) - \beta(x;\theta)^{2} = E_{\theta}([\varepsilon_i - E_{\theta}(\varepsilon_i)]^{4}) - \beta(x;\theta)^{2},
since \varepsilon_i is independent of X_{(i-1)\Delta}. Note that for OU processes, \beta, \eta and \psi depend only on \theta and \Delta and not on x. This also means that V_h depends only on \theta and \Delta. Since we do not know the distribution of \varepsilon, we have to compute the moments entering \eta(x;\theta) and \psi(x;\theta) recursively. Defining e_j := E_{\theta}(\varepsilon^{j}) and denoting the marginal moments of the OU process by m_j := E_{\theta}(X^{j}), we can use (2.4) to obtain the following recursive relationship:

e_1 = E_{\theta}(\varepsilon) = (1 - e^{-\lambda\Delta})m_1,
e_2 = E_{\theta}(\varepsilon^{2}) = (1 - e^{-2\lambda\Delta})m_2 - 2e^{-\lambda\Delta}m_1 e_1,
e_3 = E_{\theta}(\varepsilon^{3}) = (1 - e^{-3\lambda\Delta})m_3 - 3e^{-2\lambda\Delta}m_2 e_1 - 3e^{-\lambda\Delta}m_1 e_2,
e_4 = E_{\theta}(\varepsilon^{4}) = (1 - e^{-4\lambda\Delta})m_4 - 4e^{-3\lambda\Delta}m_3 e_1 - 6e^{-2\lambda\Delta}m_2 e_2 - 4e^{-\lambda\Delta}m_1 e_3.

Finally, \eta(x;\theta) and \psi(x;\theta) can be computed as

\eta(x;\theta) = e_3 - 3e_2 e_1 + 2e_1^{3},
\psi(x;\theta) = e_4 - 4e_3 e_1 + 8e_2 e_1^{2} - 4e_1^{4} - e_2^{2}.
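The recursion and the two central-moment identities above can be sketched as follows for the Γ(ν,α)-OU process (scenario-1 parameters assumed; the checks use the closed-form variance and third central moment of the Γ law, the latter being 2α³ν):

```python
import numpy as np

lam, Delta = 0.25, 1.0
nu, alpha = 10.0, 1/15                      # Gamma-OU, scenario 1

# marginal moments m_1..m_4 of X ~ Gamma(nu, alpha)
m = [alpha*nu,
     alpha**2*nu*(nu+1),
     alpha**3*nu*(nu+1)*(nu+2),
     alpha**4*nu*(nu+1)*(nu+2)*(nu+3)]

k = [np.exp(-j*lam*Delta) for j in range(5)]   # k[j] = e^{-j*lam*Delta}
e1 = (1-k[1])*m[0]
e2 = (1-k[2])*m[1] - 2*k[1]*m[0]*e1
e3 = (1-k[3])*m[2] - 3*k[2]*m[1]*e1 - 3*k[1]*m[0]*e2
e4 = (1-k[4])*m[3] - 4*k[3]*m[2]*e1 - 6*k[2]*m[1]*e2 - 4*k[1]*m[0]*e3

eta = e3 - 3*e2*e1 + 2*e1**3
psi = e4 - 4*e3*e1 + 8*e2*e1**2 - 4*e1**4 - e2**2

omega2 = alpha**2*nu                         # Var(X)
# Var(eps) must equal beta = omega^2 (1 - e^{-2 lam Delta})
assert abs((e2 - e1**2) - omega2*(1-k[2])) < 1e-12
# third central moment of eps must equal (1 - e^{-3 lam Delta}) * mu_3(X)
assert abs(eta - (1-k[3])*2*alpha**3*nu) < 1e-12
print(eta, psi)
```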
Now we only need to compute the partial derivatives \partial_{\theta}F(x;\theta) and \partial_{\theta}\beta(x;\theta) before the optimal quadratic MGEF can be constructed. The 3-dimensional parameter vector is given by \theta = (\lambda, \theta_2, \theta_3), where \theta_2 and \theta_3 are the two parameters governing the marginal distribution of the D-OU process. The partial derivatives we need are given by

\partial_{\theta}F(x;\theta)' = \bigl(-\Delta e^{-\lambda\Delta}x + \xi\Delta e^{-\lambda\Delta},\; (1 - e^{-\lambda\Delta})\partial_{\theta_2}\xi,\; (1 - e^{-\lambda\Delta})\partial_{\theta_3}\xi\bigr),
\partial_{\theta}\beta(x;\theta)' = \bigl(2\omega^{2}\Delta e^{-2\lambda\Delta},\; (1 - e^{-2\lambda\Delta})\partial_{\theta_2}\omega^{2},\; (1 - e^{-2\lambda\Delta})\partial_{\theta_3}\omega^{2}\bigr).
Using the moment generating function of the stationary OU process we can compute m_1, \ldots, m_4, and the optimal quadratic MGEF can then be implemented. For the two types of OU processes considered in this chapter the moment generating functions are given below, and using E(X^{n}) = \frac{\mathrm{d}^{n}M_{X}}{\mathrm{d}u^{n}}(0) the marginal moments can be derived.
• When X ∼ Γ(ν,α) the moment generating function and marginal moments are given by

M_{X}(u) := E\bigl(e^{uX}\bigr) = (1 - \alpha u)^{-\nu}, \qquad u < 1/\alpha,

m_{1} = \xi = E(X) = \alpha\nu, \qquad m_{2} = \alpha^{2}\nu(\nu+1), \qquad \omega^{2} = \mathrm{Var}(X) = \alpha^{2}\nu,
m_{3} = \alpha^{3}\nu(\nu+1)(\nu+2), \qquad m_{4} = \alpha^{4}\nu(\nu+1)(\nu+2)(\nu+3).
• When X ∼ IG(δ,γ) the moment generating function and marginal moments are given by

M_{X}(u) = e^{\delta\gamma - \delta\sqrt{\gamma^{2} - 2u}}, \qquad u \le \gamma^{2}/2,

m_{1} = \xi = E(X) = \delta/\gamma, \qquad m_{2} = \frac{\delta(1 + \delta\gamma)}{\gamma^{3}}, \qquad \omega^{2} = \mathrm{Var}(X) = \delta/\gamma^{3},
m_{3} = \frac{\delta(3 + 3\delta\gamma + \delta^{2}\gamma^{2})}{\gamma^{5}}, \qquad m_{4} = \frac{\delta(15 + 15\delta\gamma + 6\delta^{2}\gamma^{2} + \delta^{3}\gamma^{3})}{\gamma^{7}}.
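The moment formulas in the two bullets can be checked against SciPy's distributions (a cross-check of mine; it assumes the standard mapping of the IG(δ,γ) law onto scipy.stats.invgauss, namely mu = 1/(δγ) and scale = δ²):

```python
import numpy as np
from scipy import stats

# Gamma(nu, alpha): scipy's gamma with shape a = nu and scale = alpha
nu, alpha = 10.0, 1/15
g = stats.gamma(a=nu, scale=alpha)
m_gamma = [alpha*nu, alpha**2*nu*(nu+1),
           alpha**3*nu*(nu+1)*(nu+2), alpha**4*nu*(nu+1)*(nu+2)*(nu+3)]
for n, mn in enumerate(m_gamma, start=1):
    assert abs(g.moment(n) - mn) < 1e-8*max(1.0, mn)

# IG(delta, gamma) in the Barndorff-Nielsen parametrization
delta, gam = np.sqrt(20/3), np.sqrt(15)
ig = stats.invgauss(mu=1/(delta*gam), scale=delta**2)
m_ig = [delta/gam, delta*(1+delta*gam)/gam**3,
        delta*(3+3*delta*gam+delta**2*gam**2)/gam**5,
        delta*(15+15*delta*gam+6*delta**2*gam**2+delta**3*gam**3)/gam**7]
for n, mn in enumerate(m_ig, start=1):
    assert abs(ig.moment(n) - mn) < 1e-6*max(1.0, mn)
print("marginal moment formulas confirmed")
```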
We now have everything we need for constructing the optimal quadratic MGEF.
To the best of my knowledge this is the first time an optimal quadratic MGEF
is derived for non-Gaussian OU processes. In Hubalek and Posedel (2013), the au-
thors consider the Barndorff-Nielsen and Shephard stochastic volatility model from
Barndorff-Nielsen and Shephard (2001b) in a bivariate setting, assuming that both the
price process as well as the volatility process can be directly observed. The volatility
specification in that model is a non-Gaussian OU process and the part of their MGEF
that concerns the volatility process can therefore be compared to the quadratic MGEF
derived in this paper. Their estimating function corresponds to letting the weight
matrix be a 3×3 identity matrix and choosing the h_j's as

h_{1}(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta) = X_{i\Delta} - F(\Delta, X_{(i-1)\Delta};\theta),
h_{2}(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta) = X_{i\Delta}X_{(i-1)\Delta} - E_{\theta}\bigl(X_{i\Delta}X_{(i-1)\Delta} \mid X_{(i-1)\Delta} = x_{(i-1)\Delta}\bigr),
h_{3}(\Delta, X_{(i-1)\Delta}, X_{i\Delta};\theta) = X_{i\Delta}^{2} - E_{\theta}\bigl(X_{i\Delta}^{2} \mid X_{(i-1)\Delta} = x_{(i-1)\Delta}\bigr).
In contrast to our MGEF, the vector of h j ’s used in Hubalek and Posedel (2013) is now
3-dimensional. The MGEF studied in Hubalek and Posedel (2013) is sub-optimal since
their simple choice of weight matrix does not result in the most efficient estimator
within the class of MGEFs based on the h_j's above. However, the simple structure allows for an explicit estimator, since an explicit solution to G_n(θ) = 0 for θ = (κ, ξ, ω²) is available, where κ = e^{-λ∆} and ξ and ω² are the mean and variance of the OU process.
Hence, the estimator from Hubalek and Posedel (2013) does not rely on numerical
optimization procedures. The parameters of interest, the mean-reversion rate and
the parameters characterizing D, can then be backed out from their estimate of θ.
The resulting estimator for the mean-reversion parameter, λ, becomes a function
of the autocorrelation coefficient of an AR(1) process and the estimates of ξ and
ω2 are also based on well-known estimators for the mean and variance of an AR(1)
process. The estimator for the mean-reversion parameter from Hubalek and Posedel
(2013) and similar moment based estimates for the parameters characterizing D,
will be studied in our Monte Carlo study as a way of obtaining initial values for the
numerical routines used in the FFT MLE method and the method based on our
optimal quadratic MGEF.
Note that in the case of the Γ-OU process, the mean-reversion parameter λ can
be estimated simultaneously with the parameters from the marginal distribution.
This is not possible in the context of the FFT MLE procedure, where the conditional
density has to be split according to whether or not a jump has occurred, see (2.6). The
splitting depends on λ, and in Valdivieso et al. (2009) and in our Monte Carlo study λ will be assumed known in order to implement the estimation procedure.
2.4 Monte Carlo Study
In the Monte Carlo study the performance of the estimators will be investigated in two
different parameter scenarios, both for the Γ(ν,α)-OU process and the IG(δ,γ)-OU
process. The parameters controlling the marginal distribution of the OU process are
chosen to resemble the “base-signal” and “spike” part of commodity prices. That is,
the Γ(ν,α)-OU process in scenario 1 (the “base-signal” scenario) will have many, but
small, jumps, mimicking the small imbalances in demand and supply. In scenario 2 (the “spike” scenario), the Γ(ν,α)-OU process will have few, but large, jumps, capturing possible shocks to, or large imbalances in, demand and supply. Given the parameters
ν and α in the two scenarios, the parameters governing the marginal distribution of
the IG(δ,γ)-OU process will then be chosen to match the mean and variance of the
Γ(ν,α)-OU process in each scenario. As for the mean-reversion parameter, λ, several
studies have shown that the base-signal has a slower mean-reversion than the spike
process.4 Therefore, two choices of the mean-reversion parameter will be considered.
The parameters are chosen in the following way:
• Scenario 1 (the base-signal):
– The Γ(ν,α)-OU process: ν = 10 and α = 1/15.
– The IG(δ,γ)-OU process: δ = \sqrt{20/3} and γ = \sqrt{15}.
• Scenario 2 (the spike process):
– The Γ(ν,α)-OU process: ν = 0.5 and α = 0.5.
– The IG(δ,γ)-OU process: δ = \sqrt{1/8} and γ = \sqrt{2}.
4See for instance Benth, Benth, and Koekebakker (2008) and Meyer-Brandis and Tankov (2008).
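The moment matching behind these parameter choices can be verified directly, using E(X) = αν and Var(X) = α²ν for the Γ law and E(X) = δ/γ and Var(X) = δ/γ³ for the IG law:

```python
import numpy as np

scenarios = [
    ((10.0, 1/15), (np.sqrt(20/3), np.sqrt(15))),   # scenario 1: (nu, alpha), (delta, gamma)
    ((0.5, 0.5),   (np.sqrt(1/8),  np.sqrt(2))),    # scenario 2
]
for (nu, alpha), (delta, gam) in scenarios:
    assert abs(nu*alpha - delta/gam) < 1e-12        # means match
    assert abs(nu*alpha**2 - delta/gam**3) < 1e-12  # variances match
print("means and variances match in both scenarios")
```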
Figure 2.1. The top panel shows the marginal density of X(t) in scenario 1 and 2 for the Γ(ν,α)-OU process and the bottom panel shows the marginal densities in each scenario for the IG(δ,γ)-OU process.
In each of the two scenarios we will consider both λ = 0.02 and λ = 0.25. In
scenario 1 we have E(X t ) = 0.667 and Var(X t ) = 0.044, whereas in scenario 2 we
have a lower mean and higher variance, E(X t ) = 0.25 and Var(X t ) = 0.125. Figure 2.1
depicts the marginal densities in each scenario. From the figure it is clear that the two scenarios also give rise to quite different shapes of the marginal densities. Plots of simulated paths of the two OU processes in each of the two scenarios are provided in Figures 2.2–2.5 in the Appendix, where the methods used for simulating the Γ-OU and IG-OU processes are also described.
In the next subsections the results of various simulation studies examining ways to estimate the Γ-OU and IG-OU processes are presented. The finite sample performance of the FFT MLE procedure from Valdivieso et al. (2009) and of the MGEF based procedure derived in the previous subsection is investigated. Estimators used for choosing starting values for the two estimation procedures will also be derived and analyzed. The parameter configurations from scenarios 1 and 2 are considered, and in each setting we fix ∆ = 1 and simulate the processes on the interval [0,T] with T = 1000, leaving us with n = 1000 daily observations. The number of Monte Carlo replications is 500.5
Before analyzing the performance of the estimation methods we make a few observations. First, note that the effective observation frequency is in fact determined by λ∆, due to the timing in Z(λt), which also shows up in the autocorrelation function r(u) = corr(X((n+u)∆), X(n∆)) = exp(-λ∆|u|). Secondly, λ would in fact be known
5In Valdivieso et al. (2009) the authors only use 100 Monte Carlo replications.
if we could observe the OU process, X , continuously. This can be seen by considering
the integral version of equation (2.1)

X(t) = X(0) - \lambda\int_{0}^{t}X(s)\,\mathrm{d}s + Z(\lambda t), \qquad (2.10)

where Z is a pure-jump process of bounded variation.6 Since Z(\lambda t) = \sum_{0<s\le t}\Delta Z_{\lambda s}, where \Delta Z_{\lambda s} := Z(\lambda s) - Z(\lambda s-), and because t \mapsto \int_{0}^{t}X(s)\,\mathrm{d}s is continuous, the jumps of the OU process are the jumps of Z. Equation (2.10) therefore becomes

X(t) = X(0) - \lambda\int_{0}^{t}X(s)\,\mathrm{d}s + \sum_{0<s\le t}\Delta X_{s},

which gives

\lambda = \frac{1}{\int_{0}^{t}X(s)\,\mathrm{d}s}\Bigl(X(0) - X(t) + \sum_{0<s\le t}\Delta X_{s}\Bigr). \qquad (2.11)
This means that λ would be known if we could observe the OU process continuously. So for a fixed time horizon T, using high-frequency observations will result in an (infeasible) maximum likelihood estimator of λ that is very close to the true parameter value. Note that a simulation study investigating this property would involve fixing the values of T and λ and letting ∆ → 0, thereby also increasing the number of observations used for constructing the estimator. Our Monte Carlo study instead focuses on the finite sample performance of the various estimators given a fixed number of observations. We investigate the performance at different effective frequencies by fixing ∆ = 1 and varying λ.
2.4.1 Initial Values for the two Estimation Procedures
In this subsection several ways of obtaining initial values for the two estimation procedures from the previous section are considered. Two straightforward ways of obtaining initial estimates of the mean-reversion parameter, λ, are to exploit the AR(1) structure of the discretized process or to match the theoretical and empirical autocorrelation functions. By regressing X(n∆) on X((n-1)∆) in (2.3) and solving for λ from the regression coefficient, we get the estimator

\lambda_{1} = \frac{-\log(\widehat{\mathrm{acf}}(1))}{\Delta},

where \widehat{\mathrm{acf}}(1) denotes the empirical autocorrelation function at lag 1. If we instead match the theoretical and empirical autocorrelation structure up to lag 100, we obtain the estimator

\lambda_{100} = \arg\min_{\lambda}\sum_{k=1}^{100}\bigl(\widehat{\mathrm{acf}}(k) - \exp(-\lambda k\Delta)\bigr)^{2}.
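A sketch of the two estimators (my own illustration: since only the autocorrelation function matters here, a Gaussian AR(1) with coefficient e^{-λ∆} is used as a stand-in for the discretized OU process):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
lam, Delta, n = 0.25, 1.0, 100_000
phi = np.exp(-lam*Delta)

# any stationary AR(1) with coefficient e^{-lam*Delta} has acf(k) = exp(-lam*k*Delta)
x = np.empty(n)
x[0] = 0.0
shocks = rng.standard_normal(n)
for i in range(1, n):
    x[i] = phi*x[i-1] + shocks[i]

def acf(x, k):
    xc = x - x.mean()
    return (xc[:-k] @ xc[k:]) / (xc @ xc)

lam_1 = -np.log(acf(x, 1)) / Delta

ks = np.arange(1, 101)
acfs = np.array([acf(x, k) for k in ks])
lam_100 = minimize_scalar(lambda l: np.sum((acfs - np.exp(-l*ks*Delta))**2),
                          bounds=(1e-4, 2.0), method="bounded").x
print(lam_1, lam_100)   # both close to 0.25
```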
6The Lévy measure ν of Z satisfies \int 1\wedge|s|\,\nu(\mathrm{d}s) < \infty. Subordinators are of bounded variation.
Inspired by the observation that λ would be known if we could observe X in continuous time, the following high-frequency estimator for λ is derived. Since Z is a pure-jump subordinator, Z, and therefore also X, is of bounded variation. Recall that the total variation of the stochastic process X : Ω → ℝ₊ with càdlàg paths on [0,t] is defined, for each ω ∈ Ω, as the supremum of the sum of the absolute values of the increments of the path X_s(ω) over the set of all partitions of the interval [0,t]:

\|X\|_{t}(\omega) = \sup\Bigl\{\sum_{i=1}^{n}\bigl|X_{t_{i}}(\omega) - X_{t_{i-1}}(\omega)\bigr| : 0 = t_{0} < t_{1} < \cdots < t_{n} = t,\ n \in \mathbb{N}\Bigr\}.
When the stochastic process is of bounded variation, the supremum is finite for almost all ω ∈ Ω. It now follows that the total variation process, \|X\|_{t}, of the OU process is given by

\|X\|_{t} = \lambda\int_{0}^{t}X(s)\,\mathrm{d}s + Z(\lambda t) = \lambda\int_{0}^{t}X(s)\,\mathrm{d}s + \bigl(X(t) - X(0)\bigr) + \lambda\int_{0}^{t}X(s)\,\mathrm{d}s = 2\lambda\int_{0}^{t}X(s)\,\mathrm{d}s + \bigl(X(t) - X(0)\bigr),

since X(t) can be written as the difference between the two monotone, non-decreasing, right-continuous processes Y^{1}_{t} := Z(\lambda t) and Y^{2}_{t} := \lambda\int_{0}^{t}X(s)\,\mathrm{d}s - X(0), and the two Lebesgue–Stieltjes measures induced by these functions are singular. Hence, the total variation of X is the sum of the total variations of the two stochastic processes Y^{1}_{t} and Y^{2}_{t}. The total variation on [0,t] of a monotone non-decreasing function f is simply f(t) - f(0), as the sum of increments in the definition becomes a telescoping
sum, and the first equality now follows. The second equality follows from (2.10). Under the assumption that \int_{0}^{t}X(s)\,\mathrm{d}s \neq 0, the above derivation gives rise to the following high-frequency estimator:

\lambda_{HF} = \frac{\widehat{\|X\|}_{t} - \bigl(X(t) - X(0)\bigr)}{2\,\widehat{\int_{0}^{t}X(s)\,\mathrm{d}s}}.
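As an illustrative sketch (not the thesis's implementation), λHF can be computed on an exactly simulated Γ-OU path. The sketch assumes the standard representation of the Γ(ν,α)-OU BDLP as a compound Poisson process, so that jumps of Z(λ·) arrive at rate λν with exponential sizes of mean α (consistent with p = e^{-νλ∆} above), and uses the discrete total-variation and Riemann-sum estimators described in the text below:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, nu, alpha = 0.25, 10.0, 1/15      # scenario-1 Gamma-OU parameters
T, dt = 2000.0, 0.01                   # long horizon, fine sampling (in-fill regime)

# compound Poisson BDLP: jump times of Z(lam*t) on [0, T] and exponential sizes
n_jumps = rng.poisson(lam*nu*T)
times = np.sort(rng.uniform(0.0, T, n_jumps))
sizes = rng.exponential(alpha, n_jumps)

n = int(round(T/dt))
decay = np.exp(-lam*dt)
x = np.empty(n + 1)
x[0] = nu*alpha                        # start at the stationary mean
j = 0
for k in range(1, n + 1):
    xk = x[k-1]*decay                  # exponential decay between jumps
    while j < n_jumps and times[j] <= k*dt:          # jumps in ((k-1)dt, k*dt]
        xk += sizes[j]*np.exp(-lam*(k*dt - times[j]))
        j += 1
    x[k] = xk

tv_hat = np.abs(np.diff(x)).sum()      # estimated total variation ||X||_T
int_hat = (x[:-1]*dt).sum()            # Riemann sum for int_0^T X(s) ds
lam_HF = (tv_hat - (x[-1] - x[0])) / (2.0*int_hat)
print(lam_HF)                          # close to lam = 0.25
```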
As an estimator of the total variation we use \widehat{\|X\|}_{t} = \sum_{i=1}^{t/\Delta}\bigl|X(i\Delta) - X((i-1)\Delta)\bigr|, and as an estimator of the integral, \widehat{\int_{0}^{t}X(s)\,\mathrm{d}s}, the Riemann sum \sum_{i=1}^{t/\Delta}X((i-1)\Delta)\,\Delta will be used. The estimator will be implemented for t = T. The performance of the high-frequency estimator λHF depends on in-fill asymptotics (∆ → 0), in contrast to the other two estimators λ1 and λ100, whose performance improves as the sample size grows (T → ∞).

In order to obtain initial values for the parameters governing the marginal distribution of the D-OU process, two different ideas will be employed.
The first way of obtaining initial values comes from simple moment matching and
is also the method used in Valdivieso et al. (2009). From matching the theoretical
expressions for the mean and variance of the process

Y_{i} = \int_{\lambda(i-1)\Delta}^{\lambda i\Delta} \exp(s)\,\mathrm{d}Z(s) = \exp(\lambda\Delta)X(i\Delta) - X((i-1)\Delta) \stackrel{d}{=} Z^{*}(\Delta)

with the empirical estimators \bar{Y} = \frac{1}{T/\Delta}\sum_{i=1}^{T/\Delta}Y_{i} and S^{2} = \frac{1}{T/\Delta - 1}\sum_{i=1}^{T/\Delta}(Y_{i} - \bar{Y})^{2}, we obtain the following estimators.
• When X ∼ Γ(ν,α) we have

\nu_{MoM} = \frac{\bar{Y}}{\bigl(\exp(\lambda\Delta) - 1\bigr)\alpha_{MoM}} \qquad\text{with}\qquad \alpha_{MoM} = \frac{S^{2}\bigl(\exp(\lambda\Delta) - 1\bigr)}{\bar{Y}\bigl(\exp(2\lambda\Delta) - 1\bigr)}.

• When X ∼ IG(δ,γ) we have

\delta_{MoM} = \frac{\gamma_{MoM}\,\bar{Y}}{\exp(\lambda\Delta) - 1} \qquad\text{with}\qquad \gamma_{MoM} = \frac{1}{S}\sqrt{\frac{\bar{Y}\bigl(\exp(2\lambda\Delta) - 1\bigr)}{\exp(\lambda\Delta) - 1}}.
The above estimators depend on λ, and in the implementations λ1 will instead be used.
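A small consistency sketch of the two moment matchers (the function names are mine): feeding in the exact mean and variance of Z*(∆) must return the true parameters:

```python
import numpy as np

lam, Delta = 0.25, 1.0
g1, g2 = np.exp(lam*Delta) - 1, np.exp(2*lam*Delta) - 1

def mom_gamma(Ybar, S2):
    alpha = S2*g1 / (Ybar*g2)
    return Ybar/(g1*alpha), alpha            # (nu_MoM, alpha_MoM)

def mom_ig(Ybar, S2):
    gam = np.sqrt(Ybar*g2/g1) / np.sqrt(S2)
    return gam*Ybar/g1, gam                  # (delta_MoM, gamma_MoM)

# exact mean/variance of Z*(Delta): E = m1*g1, Var = omega^2*g2
nu0, alpha0 = 10.0, 1/15
nu_hat, alpha_hat = mom_gamma(alpha0*nu0*g1, alpha0**2*nu0*g2)
delta0, gam0 = np.sqrt(20/3), np.sqrt(15)
delta_hat, gam_hat = mom_ig((delta0/gam0)*g1, (delta0/gam0**3)*g2)
print(nu_hat, alpha_hat, delta_hat, gam_hat)   # the true parameter values
```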
The other way of obtaining initial values for the parameters governing the marginal distribution of X is to (wrongly) assume that x_0, x_∆, \ldots, x_T form an i.i.d. sequence and perform maximum likelihood estimation using the marginal distribution, D, of X. This procedure gives rise to estimators denoted by the subscript iidMLE. The accuracy of the different ways of obtaining initial parameters is investigated in a simulation study, and the results are summarized in Tables 2.1–2.4. In each scenario we report the mean, bias, standard deviation and root mean square error (RMSE) of the estimators.
For the Γ(ν,α)-OU process we see that, as expected, our λHF estimator performs best when λ = 0.02. However, λHF is the worst performing estimator when λ = 0.25, mainly because of a large increase in the bias of the estimator. This result holds both in scenario 1 and scenario 2. Comparing λ1 with λ100, the first estimator always outperforms the latter. In both scenarios there appears to be little difference between αMoM and αiidMLE, and their RMSEs are lowest when λ = 0.25. As for the two estimators of ν, νiidMLE has a lower RMSE than νMoM when λ = 0.02, but when λ = 0.25 the difference between the two estimators diminishes. Again, the RMSEs are in general lower when λ = 0.25. This result is not surprising, since λ = 0.25 corresponds to sampling at a lower effective frequency, which makes the i.i.d. assumption more reasonable and also causes the empirical moments to be closer to the true theoretical moments, due to the persistence of the OU process. All in all, this results in more accurate parameter estimates than when λ = 0.02, both in terms of bias and standard deviation.
For the infinite activity IG(δ,γ)-OU process, the results for the three estimators of λ are the same as in the Γ(ν,α) case. The downward bias in λHF, which worsens when λ∆ increases, could be explained by the estimator of the total variation in the
Table 2.1. Initial values for the Γ(ν,α)-OU process in scenario 1 with λ = λ1 in the MoM estimates.

λ = 0.02     λ1       λ100     λHF       νMoM     αMoM      νiidMLE   αiidMLE
True         0.0200   0.0200   0.0200    10.00    0.0666    10.00     0.0666
Mean         0.0252   0.0300   0.0166    12.74    0.0583    12.18     0.0609
Bias         0.0052   0.0100   -0.0035   2.737    -0.0084   2.176     -0.0057
Std. dev.    0.0080   0.0123   0.0002    4.373    0.0194    4.139     0.0203
RMSE         0.0096   0.0158   0.0035    5.155    0.0211    4.673     0.0210

λ = 0.25     λ1       λ100     λHF       νMoM     αMoM      νiidMLE   αiidMLE
True         0.2500   0.2500   0.2500    10.00    0.0666    10.00     0.0666
Mean         0.2539   0.2600   0.0814    10.13    0.0664    10.09     0.0666
Bias         0.0039   0.0100   -0.1686   0.1278   -0.0003   0.0893    -4.7e-05
Std. dev.    0.0256   0.0415   0.0018    0.9598   0.0063    0.9469    0.0062
RMSE         0.0259   0.0427   0.1686    0.9674   0.0063    0.9502    0.0062
Table 2.2. Initial values for the Γ(ν,α)-OU process in scenario 2 with λ = λ1 in the MoM estimates.

λ = 0.02     λ1       λ100     λHF       νMoM     αMoM      νiidMLE   αiidMLE
True         0.0200   0.0200   0.0200    0.5000   0.5000    0.5000    0.5000
Mean         0.0248   0.0285   0.0196    0.7185   0.3896    0.6513    0.4158
Bias         0.0048   0.0085   -0.0004   0.2185   -0.1104   0.1513    -0.0842
Std. dev.    0.0068   0.0117   0.0001    0.5556   0.1880    0.2726    0.2089
RMSE         0.0083   0.0144   0.0004    0.5965   0.2179    0.3116    0.2251

λ = 0.25     λ1       λ100     λHF       νMoM     αMoM      νiidMLE   αiidMLE
True         0.2500   0.2500   0.2500    0.5000   0.5000    0.5000    0.5000
Mean         0.2568   0.2628   0.1992    0.5194   0.4880    0.5190    0.4877
Bias         0.0068   0.0128   -0.0508   0.0194   -0.0120   0.0190    -0.0123
Std. dev.    0.0247   0.0422   0.0031    0.0742   0.0738    0.0726    0.0692
RMSE         0.0256   0.0440   0.0509    0.0766   0.0747    0.0750    0.0702
Table 2.3. Initial values for the IG(δ,γ)-OU process in scenario 1 with λ = λ1 in the MoM estimates.

λ = 0.02     λ1       λ100     λHF       δMoM     γMoM      δiidMLE   γiidMLE
True         0.0200   0.0200   0.0200    2.582    3.873     2.582     3.873
Mean         0.0260   0.0303   0.0139    2.944    4.434     2.868     4.318
Bias         0.0059   0.0103   -0.0061   0.3616   0.5612    0.2863    0.4449
Std. dev.    0.0074   0.0119   0.0002    0.4862   0.7406    0.4830    0.7356
RMSE         0.0095   0.0157   0.0061    0.6055   0.9287    0.5610    0.8590

λ = 0.25     λ1       λ100     λHF       δMoM     γMoM      δiidMLE   γiidMLE
True         0.2500   0.2500   0.2500    2.582    3.873     2.582     3.873
Mean         0.2560   0.2602   0.0770    2.610    3.913     2.607     3.908
Bias         0.0060   0.0103   -0.1730   0.0285   0.0404    0.0249    0.0347
Std. dev.    0.0257   0.0419   0.0017    0.1369   0.2125    0.1300    0.2056
RMSE         0.0263   0.0431   0.1730    0.1397   0.2161    0.1322    0.2083
Table 2.4. Initial values for the IG(δ,γ)-OU process in scenario 2 with λ = λ1 in the MoM estimates.

λ = 0.02     λ1       λ100     λHF       δMoM     γMoM      δiidMLE   γiidMLE
True         0.0200   0.0200   0.0200    0.3536   1.414     0.3536    1.414
Mean         0.0251   0.0284   0.0184    0.4521   2.098     0.4015    1.891
Bias         0.0051   0.0084   -0.0016   0.0986   0.6836    0.0480    0.4770
Std. dev.    0.0117   0.0101   0.0004    0.1201   0.9048    0.0949    0.8669
RMSE         0.0128   0.0131   0.0017    0.1553   1.133     0.1062    0.9887

λ = 0.25     λ1       λ100     λHF       δMoM     γMoM      δiidMLE   γiidMLE
True         0.2500   0.2500   0.2500    0.3536   1.414     0.3536    1.414
Mean         0.2565   0.2615   0.1755    0.3626   1.468     0.3578    1.451
Bias         0.0065   0.0115   -0.0745   0.0091   0.0533    0.0043    0.0371
Std. dev.    0.0230   0.0385   0.0036    0.0400   0.1963    0.0249    0.1820
RMSE         0.0239   0.0401   0.0746    0.0409   0.2032    0.0252    0.1856
numerator, which is always smaller than or equal to the true total variation. The bias in the total variation estimator seems to dominate the bias from replacing the integral in the denominator by a Riemann sum, resulting in an overall downwards biased estimator of λ. For both the Γ-OU and the IG-OU process this bias is more pronounced in scenario 1. For the two parameters governing the marginal distribution of the IG-OU process, the patterns are also comparable to those obtained for the Γ-OU process. The estimators based on (wrongly) assuming i.i.d. observations and conducting MLE using the marginal distribution have smaller RMSEs than those obtained by matching the mean and variance of the process Y.
Based on the results above, the iidMLE estimators of the parameters governing
the marginal distribution will be used as initial values in the two estimation methods
from the previous section. In applications to real data, we will never know whether we
are in a situation where λHF severely underestimates the mean-reversion parameter λ.
A more robust choice is therefore to use λ1 as the initial value for the FFT MLE and
MGEF-based estimation procedures.
2.4.2 Results for the FFT MLE Procedure
In this subsection we investigate the performance of the FFT MLE procedure from
Valdivieso et al. (2009) in a Monte Carlo study. The simulation study extends the
one found in Valdivieso et al. (2009) by also considering parameter configurations
yielding marginal densities of the type in the right panel of Figure 2.1, and by using
500 Monte Carlo replications instead of 100.
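The numerical core of the procedure is recovering a density from its characteristic function on a grid by FFT. A minimal, hypothetical sketch of this inversion step (not the exact routine used in the chapter; `cf` stands in for the characteristic function of the Lévy functionals, and N and ζ play the role of the two nuisance parameters tuned throughout this subsection) is:

```python
import numpy as np

def density_from_cf(cf, N, zeta):
    """Recover a density on [0, 2*pi/zeta) from its characteristic function.

    Discretizes f(x) = (1/pi) * Re INT_0^inf e^{-iux} cf(u) du on the
    frequency grid u_j = j*zeta; with x_k = 2*pi*k/(N*zeta) the sum is
    exactly a DFT, so one FFT evaluates the density at all N points."""
    u = np.arange(N) * zeta
    x = 2.0 * np.pi * np.arange(N) / (N * zeta)
    w = np.ones(N)
    w[0] = 0.5  # trapezoid weight at u = 0
    f = zeta / np.pi * np.real(np.fft.fft(cf(u) * w))
    return x, np.maximum(f, 0.0)  # clip tiny negative ripples

# sanity check against a Gamma(2, scale=1) density, f(x) = x * exp(-x)
x, f = density_from_cf(lambda u: (1.0 - 1j * u) ** -2.0, 2 ** 14, 0.05)
```

The choice of N and ζ trades off the truncation of the frequency integral against the spacing and range of the spatial grid, which is exactly the tuning problem discussed for the IG-OU case in the text.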
We follow Valdivieso et al. (2009) and only estimate the parameters characterizing
the marginal distribution when considering the Γ-OU process. The reason for this
is based on the assumption that for at least one j ∈ {1,2, . . . ,T} the Γ-OU process
will not jump between the two consecutive time points (j − 1)∆ and j∆. Hence,
the mean-reversion parameter does not need to be estimated and can be recovered
according to

λ = (1/∆) log( x_{(j−1)∆} / x_{j∆} ).
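Assuming at least one jump-free interval, this recovery can be implemented as a maximum over consecutive log-ratios, since a (positive) jump only makes the log-ratio smaller than λ∆. A minimal illustrative sketch (hypothetical code, not the chapter's implementation):

```python
import numpy as np

def recover_lambda(x, dt=1.0):
    """Recover lambda for a subordinator-driven OU process.

    On a jump-free interval the path decays exactly exponentially, so
    log(x_{(j-1)dt} / x_{j dt}) / dt equals lambda there; a jump adds
    positive mass and makes the log-ratio smaller, hence the maximum."""
    x = np.asarray(x, dtype=float)
    return np.max(np.log(x[:-1] / x[1:])) / dt

# exponential decay with lambda = 0.25 and a single upward jump
lam, dt = 0.25, 1.0
jumps = [0.0, 0.0, 1.5, 0.0, 0.0]
x = [2.0]
for J in jumps:
    x.append(x[-1] * np.exp(-lam * dt) + J)
# the jump-free intervals give back lambda exactly
```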
The results from estimating the Γ-OU process in the two different scenarios are
reported in Table 2.5 and Table 2.6. In both scenarios the FFT parameters are set as
N = 2^15 and ζ = 0.001, and the optimization is performed using fmincon with initial
values (νiidMLE, αiidMLE). For the infinite activity IG-OU process the mean-reversion
parameter λ also needs to be estimated, and we use (λ1, νiidMLE, αiidMLE) as initial
values for the optimization procedure. For the IG-OU process it was harder to
obtain convergence in the optimization procedure, mainly due to the sensitivity
towards the initial value λ1.7 As a result we used the simulated annealing procedure
(the simulannealbnd function in MATLAB) for the first 1000 iterations and then

7 Especially λ1 < λ gave rise to convergence problems.
Table 2.5. FFT MLE parameter estimates for the Γ(ν,α)-OU process in scenario 1.

             λ = 0.02            λ = 0.25
             νFFT     αFFT       νFFT     αFFT
True        10.00    0.0666     10.00    0.0666
Mean        10.02    0.0672     10.01    0.0667
Bias        0.0235   0.0005     0.0119   4.6e-05
Std. dev.   0.7086   0.0049     0.3637   0.0024
RMSE        0.7088   0.0049     0.3636   0.0024
nRMSE       0.0709   0.0735     0.0364   0.0360

Table 2.6. FFT MLE parameter estimates for the Γ(ν,α)-OU process in scenario 2.

             λ = 0.02            λ = 0.25
             νFFT     αFFT       νFFT     αFFT
True        0.5000   0.5000     0.5000   0.5000
Mean        0.4941   0.5095     0.5013   0.4993
Bias       -0.0059   0.0095     0.0013  -0.0007
Std. dev.   0.1445   0.1908     0.0447   0.0455
RMSE        0.1445   0.1908     0.0447   0.0454
nRMSE       0.2890   0.3816     0.0894   0.0908
proceeded with fmincon. The necessary choices of FFT parameters also reflect the
fact that the optimization becomes more unstable in the IG-OU case. In scenario
1, N = 2^17 and ζ = 5e-05 were used for both choices of λ. In scenario 2, convergence
could only be obtained in the lower frequency case λ = 0.25, where we had to set N = 2^19
and ζ = 1.25e-05. Since the frequency (λ = 0.02) seemed to be the problem, we instead
evaluated the density function fZ∗(∆′) at the data points e^{λ∆′} x_{i∆′} − x_{(i−1)∆′}, with ∆′ = c∆
for c ∈ ℕ. In order not to reduce the number of data points too much, we used rolling
time intervals, leaving us with T − c data points. The cost of this is that the data points
will no longer be independent but instead display a correlation structure of order c − 1.
The resulting method will be referred to as the quasi FFT MLE estimation procedure.
As for the choice of c, one must balance the gain from lowering the frequency
against the loss from using correlated data when constructing the likelihood function.
A more extensive investigation of how to optimally choose c is not pursued in this
chapter. The results for the IG-OU process are reported in Tables 2.7–2.9.
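The construction of the rolling, coarser-spaced data points can be sketched as follows (a hypothetical helper; `x` holds the discrete observations of the process):

```python
import numpy as np

def quasi_levy_functionals(x, lam, dt, c):
    """Rolling data points e^{lam*c*dt} x_{i*dt} - x_{(i-c)*dt}.

    Uses every start point, so T - c points remain, at the cost of a
    moving-average-type correlation of order c - 1 between them."""
    x = np.asarray(x, dtype=float)
    return np.exp(lam * c * dt) * x[c:] - x[:-c]

# on a pure exponential decay the functionals are identically zero
lam, dt, c = 0.25, 1.0, 10
x = 2.0 * np.exp(-lam * dt * np.arange(100))
z = quasi_levy_functionals(x, lam, dt, c)
```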
In each scenario we report the mean, bias, standard deviation, RMSE and the RMSE
normalized by the true parameter value (nRMSE).

Table 2.7. FFT MLE parameter estimates for the IG(δ,γ)-OU process in scenario 1.

             λ = 0.02                       λ = 0.25
             λFFT      δFFT     γFFT       λFFT     δFFT     γFFT
True         0.0200    2.582    3.873      0.2500   2.582    3.873
Mean         0.0200    2.520    3.977      0.2496   2.525    3.953
Bias        -1.8e-05  -0.0622   0.1043    -0.0004  -0.0575   0.0797
Std. dev.    0.0003    0.1252   0.4103     0.0026   0.0680   0.1279
RMSE         0.0003    0.1397   0.4230     0.0026   0.0890   0.1506
nRMSE        0.0150    0.0541   0.1093     0.0104   0.0345   0.0389

Table 2.8. FFT MLE parameter estimates for the IG(δ,γ)-OU process in scenario 2.

             λ = 0.25
             λFFT      δFFT     γFFT
True         0.2500    0.3536   1.414
Mean         0.2500    0.3442   1.447
Bias        -3.7e-05  -0.0093   0.0327
Std. dev.    0.0003    0.0230   0.2046
RMSE         0.0003    0.0248   0.2070
nRMSE        0.0012    0.0701   0.1464

Table 2.9. Quasi FFT MLE parameter estimates for the IG(δ,γ)-OU process in scenario 2 with c = 10.

             λ = 0.02
             λFFT      δFFT     γFFT
True         0.0200    0.3536   1.414
Mean         0.0200    0.3885   1.536
Bias        -4.6e-06   0.0350   0.1217
Std. dev.    1.6e-05   0.0287   0.7790
RMSE         1.7e-05   0.0453   0.7873
nRMSE        8.5e-04   0.1281   0.5567

The results reveal that in all
scenarios and for both types of OU processes the parameters from the marginal dis-
tribution are easier to estimate when λ= 0.25. This is not surprising since observing
the OU process over a longer time span, although at the cost of lower frequency of
observations, is beneficial, especially for persistent processes. Furthermore, for the
Γ-OU process the number of expected jumps in our sample is given by λν∆T , which
in scenario 2 equals 125 when λ= 0.25 and only 10 when λ= 0.02. In this case, the
jumps contain almost all the information on the parameters governing the marginal
distribution and the extra available information when λ= 0.25 translates into more
accurate parameter estimates. Another reason could be the usage of more accurate
initial values, since the i.i.d. assumption underlying the iidMLE estimators becomes
more plausible when λ increases. Finally, a more numerical reason could be that the
order of magnitude of the data points, e^{λ∆} x_{i∆} − x_{(i−1)∆}, at which the density function is
to be evaluated, decreases when λ decreases. As a consequence it becomes harder
to fine-tune the FFT parameters (N and ζ) and obtain convergence when λ = 0.02.
As for the mean-reversion parameter, which is also estimated for the IG-OU process,
λ is extremely well estimated at both frequencies. In scenario 1, the same results
for the marginal parameters as well as the mean-reversion parameter were found in
the Monte Carlo study in Valdivieso et al. (2009).
If we look at the performance of the estimation procedure across the two sce-
narios, by comparing the normalized RMSEs, several patterns become evident. First
of all, the parameters from the marginal distribution are easier to estimate in the
“base-signal” scenario (scenario 1), regardless of the value of λ and for both types
of OU processes. For the IG-OU process, where λ is also estimated, the results for λ
show that the parameter is more accurately estimated in the “spike” scenario. These
results could be explained by the fact that in the “base-signal” scenario there is a lot
of activity/spikes, and hence the Lévy functionals Yi d= Z∗(∆) influence the observa-
tions more than in the “spike” scenario, where the OU process (almost) just exhibits
exponential decay, determined by λ, on the intervals between large spikes.
Based on the results of the Monte Carlo study, the FFT procedure seems to per-
form equally well for finite and infinite activity OU processes. However, one important
difference lies in the applicability of the estimation method. It was in general harder
to obtain convergence in the estimation procedure when the infinite activity process
was considered, especially in scenario 2. And even when convergence was obtained
in the different settings, the choice of nuisance parameters (N and ζ) influences the
parameter estimates. If the nuisance parameters are not chosen optimally, the estima-
tor loses efficiency and the bias may increase. For the Γ-OU process λ is assumed to
be known, and one way of fine-tuning N and ζ is to calibrate fZ∗(∆) to the empirical
density of the Lévy functionals. The construction of the Lévy functionals depends on
λ, and for the IG-OU process an estimate of λ must be used.
In terms of RMSEs the FFT MLE procedure performs significantly better than any
of the estimators used for finding initial values, except for the marginal parameters of
the IG-OU process in scenario 2 where the performance is similar to the i.i.d. MLE
approach.
2.4.3 Results for the Estimation Procedure based on the Optimal MGEF
In this subsection the results on the finite sample performance of the optimal MGEF
are presented. As a benchmark for the optimal MGEF based estimator, we will also
consider the simpler estimating function that emerges from solving the minimization
problem

min_{θ∈Θ} Σ_{i=1}^{n} [x_{i∆} − F(x_{(i−1)∆};θ)]² + [(x_{i∆} − F(x_{(i−1)∆};θ))² − β(x_{(i−1)∆};θ)]².   (2.12)
Differentiation of (2.12) w.r.t. the parameters in θ yields the following expression
Sn(θ) = Σ_{i=1}^{n} −2 ∂θF (x_{i∆} − F) − ( 4 (x_{i∆} − F) ∂θF + 2 ∂θβ ) ( (x_{i∆} − F)² − β ),   (2.13)
where the dependence on θ and x(i−1)∆ is suppressed. Note that (2.13) is almost of
the form (2.8) with the same choice of functions, h1 and h2, as in the optimal MGEF.
The only difference is the dependence on xi∆ in the weight in front of h2. In most
cases this means that the estimating function will not be a martingale. In particular,
Sn(θ) will only be a martingale for OU processes satisfying Eθ((x_{n∆} − F)³ | X_{(n−1)∆} =
x) = η(∆, x;θ) = 0. This property holds for the Gaussian OU process, and it is also
satisfied for non-Gaussian OU processes where the AR(1)-corrected residuals, ε, have
a symmetric distribution around their mean. Unfortunately, this is not the case for the
two subordinator-driven non-Gaussian OU processes considered in this chapter. This
also implies that the estimating function, Sn(θ), is biased, and the resulting estimates
might be biased as well. The size of η(∆, x;θ) is, however, quite small in our setting
and is, not surprisingly, smaller in scenario 1 and decreases when λ decreases. In an
attempt to balance out the potential bias, we instead minimize the function
min_{θ∈Θ} Σ_{i=1}^{n} [x_{i∆} − F(x_{(i−1)∆};θ)]² + 0.01 [(x_{i∆} − F(x_{(i−1)∆};θ))² − β(x_{(i−1)∆};θ)]².   (2.14)
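For a stationary OU process the conditional mean and variance have the closed forms F(x) = x e^{−λ∆} + ξ(1 − e^{−λ∆}) and β(x) = ω²(1 − e^{−2λ∆}), where ξ and ω² are the stationary mean and variance. A hypothetical minimal sketch of the criterion in (2.14) in this (λ, ξ, ω²) parametrization (the affine forms of F and β are an assumption that holds for stationary Lévy-driven OU processes):

```python
import numpy as np

def simple_criterion(theta, x, dt, weight=0.01):
    """Criterion (2.14): squared mean residuals plus down-weighted
    squared variance residuals, in the (lam, xi, om2) parametrization."""
    lam, xi, om2 = theta
    e = np.exp(-lam * dt)
    F = x[:-1] * e + xi * (1.0 - e)   # conditional mean
    beta = om2 * (1.0 - e * e)        # conditional variance
    r = x[1:] - F
    return np.sum(r ** 2 + weight * (r ** 2 - beta) ** 2)

# on a noise-free path following the conditional-mean recursion exactly,
# the first term vanishes at the true (lam, xi)
lam, xi, om2, dt = 0.25, 0.5, 0.2, 1.0
x = [2.0]
for _ in range(50):
    x.append(x[-1] * np.exp(-lam * dt) + xi * (1.0 - np.exp(-lam * dt)))
x = np.array(x)
```

Minimizing this criterion over (λ, ξ, ω²) and backing out the marginal parameters afterwards mirrors the re-parametrization discussed in the text.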
The constant 0.01 in (2.14) might seem somewhat arbitrary, and of course one
could choose the constant in a more optimal way, but since the resulting estimator
just serves as a benchmark for the optimal MGEF this will not be pursued further.
Besides, an optimal constant would depend on the unknown parameters, θ, and the
purpose of the simple estimating function is to investigate the performance of an
easily implementable and straightforward approach that has a structure similar to the
optimal MGEF. Furthermore, the simple structure in (2.12) allows for a parametriza-
tion in terms of λ, ξ and ω², which gives a smoother objective function. The estimates
of the parameters from the marginal distribution can then be backed out using the
estimates of ξ and ω². Such a re-parametrization is not possible for the optimal MGEF,
since η(∆, x;θ) and ψ(∆, x;θ) cannot be expressed in terms of ξ and ω².8 The two
estimating functions are easy to implement and, in contrast to the FFT MLE procedure,
they do not rely on any nuisance parameters. Furthermore, the mean-reversion
parameter can be estimated simultaneously with the parameters from the marginal
distribution for all non-Gaussian OU processes. With the FFT MLE procedure this
was only possible for infinite activity OU processes. For numerical
reasons, when implementing the estimator based on the optimal MGEF, we minimize
Gn(θ)′Gn(θ) w.r.t. θ instead of solving Gn(θ) = 0. For both types of OU processes and
at both frequencies, the method based on the simple estimating function was robust
w.r.t. the choice of initial values and had no problems converging. For the method
using the optimal MGEF, it was in some cases a bit harder to obtain convergence and
avoid ending up in corner solutions. We therefore used a grid of initial values centered
around λ1 and the iidMLE estimates. Except for the Γ-OU process in scenario 1, this
strategy worked and convergence to a local minimum was obtained. For the Γ-OU
process in scenario 1, further numerical difficulties were encountered. The objective
function to be minimized became very irregular and had several (meaningful) local
minima. The problem is not a small-sample problem, and fixing the value of λ and only
optimizing w.r.t. the parameters from the marginal distribution did not help either.
The results presented for that case therefore depend on the choice of the grid of initial
values. The problem of not having a unique solution to Gn(θ) = 0 was also reported in
Benth et al. (2012), where a Γ-OU process is used for modeling the base-signal of EEX
electricity spot prices and estimated using prediction-based estimating functions.9
Prediction-based estimating functions were introduced in Sørensen (2000) and are a
generalization of MGEFs based on unconditional moments instead of conditional
moments. In this case, where the model is Markovian and conditional moments are
computable, MGEFs result in more efficient estimators (asymptotically) and are to
be preferred. The results from implementing the simple benchmark from (2.12) and
the optimal MGEF are reported in Tables 2.10–2.13, where the abbreviations SB and
OMG are used for the estimators based on the simple and the optimal quadratic MGEF,
respectively.
As was the case with the FFT MLE procedure, the parameters are more accurately
estimated when the frequency is lower (λ = 0.25). As with, for instance, the MoM
estimation procedure, this can be explained by the improved match between em-
pirical moments, in this case conditional moments, and the theoretical moments
underlying the estimation procedure when the process is observed over a longer
time span. The optimal MGEF-based method outperforms the estimator from (2.12),
in terms of RMSEs, except for the Γ-OU process in scenario 1 with frequency λ= 0.25.
In that case the simple benchmark performs a bit better and, as mentioned in the
8 Other re-parametrizations might be fruitful in terms of numerical stability, but this was not pursued further.
9 See Sørensen (2000) or Brix and Lunde (2013) for an introduction to prediction-based estimating functions.
Table 2.10. Parameter estimates for the Γ(ν,α)-OU process in scenario 1 using the optimal MGEF and the simple benchmark.

λ = 0.02      λSB       νSB       αSB       λOMG      νOMG      αOMG
True         0.0200   10.0000    0.0666    0.0200   10.0000    0.0666
Mean         0.0241   12.2331    0.0611    0.0241   10.8144    0.0656
Bias         0.0041    2.2331   -0.0056    0.0041    0.8144   -0.0011
Std. dev.    0.0079    4.3024    0.0209    0.0079    2.5237    0.0178
RMSE         0.0089    4.8436    0.0216    0.0089    2.6494    0.0178
nRMSE        0.4445    0.4844    0.3239    0.4438    0.2649    0.2667

λ = 0.25      λSB       νSB       αSB       λOMG      νOMG      αOMG
True         0.2500   10.0000    0.0666    0.2500   10.0000    0.0666
Mean         0.2529   10.1074    0.0665    0.2529   10.0948    0.0667
Bias         0.0029    0.1074   -0.0001    0.0029    0.0948    0.0000
Std. dev.    0.0255    0.9573    0.0063    0.0255    1.0417    0.0067
RMSE         0.0256    0.9623    0.0063    0.0256    1.0450    0.0067
nRMSE        0.1025    0.0962    0.0941    0.1024    0.1045    0.0998

Table 2.11. Parameter estimates for the Γ(ν,α)-OU process in scenario 2 using the optimal MGEF and the simple benchmark.

λ = 0.02      λSB       νSB       αSB       λOMG      νOMG      αOMG
True         0.0200    0.5000    0.5000    0.0200    0.5000    0.5000
Mean         0.0237    0.6995    0.4070    0.0231    0.6573    0.4149
Bias         0.0037    0.1995   -0.0930    0.0031    0.1573   -0.0851
Std. dev.    0.0053    0.3014    0.2889    0.0046    0.2800    0.2126
RMSE         0.0064    0.3612    0.3032    0.0056    0.3209    0.2288
nRMSE        0.3205    0.7224    0.6065    0.2785    0.6418    0.4576

λ = 0.25      λSB       νSB       αSB       λOMG      νOMG      αOMG
True         0.2500    0.5000    0.5000    0.2500    0.5000    0.5000
Mean         0.2560    0.5417    0.4779    0.2545    0.5165    0.4899
Bias         0.0060    0.0417   -0.0221    0.0045    0.0165   -0.0101
Std. dev.    0.0250    0.0735    0.0693    0.0192    0.0708    0.0701
RMSE         0.0257    0.0844    0.0727    0.0197    0.0726    0.0707
nRMSE        0.1028    0.1688    0.1454    0.0788    0.1451    0.1414

Table 2.12. Parameter estimates for the IG(δ,γ)-OU process in scenario 1 using the optimal MGEF and the simple benchmark.

λ = 0.02      λSB       δSB       γSB       λOMG      δOMG      γOMG
True         0.0200    2.5820    3.8730    0.0200    2.5820    3.8730
Mean         0.0242    2.8875    4.3056    0.0216    2.7461    4.1035
Bias         0.0042    0.3055    0.4326    0.0016    0.1642    0.2305
Std. dev.    0.0064    0.4581    0.6613    0.0035    0.3193    0.4965
RMSE         0.0077    0.5502    0.7897    0.0038    0.3588    0.5469
nRMSE        0.3835    0.2131    0.2039    0.1916    0.1389    0.1412

λ = 0.25      λSB       δSB       γSB       λOMG      δOMG      γOMG
True         0.2500    2.5820    3.8730    0.2500    2.5820    3.8730
Mean         0.2558    2.6151    3.9204    0.2529    2.6030    3.9045
Bias         0.0058    0.0331    0.0474    0.0029    0.0210    0.0316
Std. dev.    0.0258    0.1376    0.2119    0.0183    0.1200    0.1852
RMSE         0.0264    0.1414    0.2170    0.0185    0.1217    0.1877
nRMSE        0.1055    0.0548    0.0560    0.0739    0.0471    0.0485
Table 2.13. Parameter estimates for the IG(δ,γ)-OU process in scenario 2 using the optimal MGEF and the simple benchmark.

λ = 0.02      λSB       δSB       γSB       λOMG      δOMG      γOMG
True         0.0200    0.3536    1.4142    0.0200    0.3536    1.4142
Mean         0.0234    0.4738    2.0955    0.0214    0.4254    1.9986
Bias         0.0034    0.1203    0.6813    0.0014    0.0718    0.5844
Std. dev.    0.0053    0.3580    0.8393    0.0029    0.1023    0.7911
RMSE         0.0063    0.3773    1.0803    0.0032    0.1249    0.9829
nRMSE        0.3150    1.0672    0.7639    0.1599    0.3532    0.6950

λ = 0.25      λSB       δSB       γSB       λOMG      δOMG      γOMG
True         0.2500    0.3536    1.4142    0.2500    0.3536    1.4142
Mean         0.2553    0.3810    1.4979    0.2511    0.3610    1.4671
Bias         0.0053    0.0274    0.0837    0.0011    0.0074    0.0529
Std. dev.    0.0250    0.0374    0.1854    0.0139    0.0377    0.1882
RMSE         0.0255    0.0463    0.2032    0.0139    0.0384    0.1953
nRMSE        0.1021    0.1310    0.1437    0.0555    0.1086    0.1381
beginning of this subsection, is also numerically more stable. The gain from using the
optimal MGEF is most apparent when λ = 0.02. A possible explanation is that the
better the data fit the underlying conditional moment conditions imposed by the
functions h1 and h2, the smaller the gain from using an optimal weighting matrix. If we compare the
performance of the estimation method based on MGEFs across the two scenarios, in
terms of nRMSEs, we find the same patterns as was found for the FFT MLE estimation
procedure. Namely, that the mean-reversion parameter is easier to estimate in the
“spike” scenario, whereas the parameters from the marginal distribution of the OU
process are more accurately estimated in the “base-signal” scenario. The differences
between the nRMSEs are most prominent when λ= 0.02, corresponding to the case
where the process is being observed at a higher frequency but over a shorter time
span. One possible explanation for this could be that in scenario 2 there are more
intervals without jumps/without big jumps, making inference on lambda easier and
λ = 0.02 increases the prob of these intervals. As for the marginal parameters, the
opposite argument can be applied since we in the “base-signal” scenario have more
observations containing info about the marginal parameters.
If we in each of the two scenarios compare the nRMSE across the two types of
OU processes we find that in scenario 1 the parameters seem to be more accurately
estimated for the IG-OU process. This is also the case in scenario 2 when λ= 0.25,
but when λ = 0.02 the estimation method performs equally well for the finite and
infinite activity OU processes.
For the Γ-OU process the estimator for λ based on the optimal MGEF has a
lower RMSE than the regression-based estimator, λ1, especially in scenario 2. For
the IG-OU process there is an even bigger gain from using the optimal MGEF to
estimate λ instead of using the estimator λ1. In general, the gain from using the
optimal MGEF-based estimation method, instead of the straightforward methods
studied in the subsection on methods for obtaining initial values, is largest for
the estimation of the mean-reversion parameter. In fact, for the marginal parameters
the performance is only better than the iidMLE estimators in scenario 1, and again
the gain is most apparent when λ = 0.02. In scenario 2, the RMSEs for the marginal
parameters are around the same size as for the iidMLE estimators for the Γ-OU
process. For the IG-OU process in scenario 2, the iidMLE estimators actually perform
slightly better than the estimator based on the optimal MGEF. However, it is still
preferable to estimate all three parameters simultaneously using the MGEFs, instead
of splitting the estimation procedure and using the iidMLE estimators and λ1.
As already discussed, the results for the FFT MLE estimation procedure may serve
as a bound on the attainable efficiency, since maximum likelihood estimation is
asymptotically the most efficient estimation procedure, and this becomes evident
when comparing the performance of the MGEF-based estimators with the results for
the FFT MLE procedure. For the Γ-OU process the differences between the RMSEs for the
marginal parameters are largest in scenario 1, where the RMSEs for the optimal MGEF-
based procedure are approximately three times higher than for the FFT MLE proce-
dure. In scenario 2 the RMSE is around twice as high for ν and a bit less for α. For
the IG-OU process the difference in scenario 1 is a factor of 2, whereas the difference is
smaller in scenario 2 and is in fact mostly present for δ. For both types of processes,
the differences in RMSEs between the FFT MLE procedure and the procedure based
on the optimal MGEF are larger when λ = 0.02. The mean-reversion parameter is
extremely well estimated when the FFT MLE estimation procedure is used compared
to the optimal MGEF-based procedure, and again the difference is most pronounced
when λ = 0.02.
2.5 Extensions
This section offers a discussion of interesting ways in which estimation methods
could be extended to handle other models based on non-Gaussian OU processes.
It would be natural to consider extending the estimation procedures to the case
where we have observations from a superposition of independent OU processes.
These multi-factor models are very popular in the literature on modeling commodity
spot prices, like electricity, that can be split into a base-signal and spike component,
see for instance Benth et al. (2007), Meyer-Brandis and Tankov (2008), Klüppelberg
et al. (2010) and Benth and Vos (2013). When estimating these models, standard prac-
tice in the literature on modeling commodity prices is to split the spot prices into
the base-signal and spike components using various filtering techniques, and then
estimate each of the OU processes separately in a second step. It would therefore be of
great interest to develop estimation methods for estimating all the parameters simul-
taneously. If the OU processes have the same mean-reversion rates, Markovianity of
the model is preserved and MGEFs are still applicable. The FFT MLE approach is also
still feasible since the characteristic function of a sum of independent processes is
just the product of the characteristic functions of the entering factors. However, this
extension of the FFT MLE procedure is only directly applicable to the superposition
of infinite activity OU processes. For the finite activity processes, the extension will
be difficult to implement, since these processes are mixed random variables and it is
therefore not straightforward how the density function should be defined.
One will again need to condition on the presence of jumps, and this time also on
which of the processes jumped.
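The product rule for characteristic functions that makes the FFT extension feasible can be verified in a couple of lines; here Gamma distributions merely stand in for the factors' Lévy functionals (an illustration only, using the shape parametrization with scale 1):

```python
import numpy as np

def gamma_cf(u, shape):
    """Characteristic function of a Gamma(shape, scale=1) variable."""
    return (1.0 - 1j * u) ** (-shape)

u = np.linspace(-5.0, 5.0, 101)
# for independent factors, the cf of the sum is the product of the cfs;
# for Gamma factors the shape parameters simply add
prod = gamma_cf(u, 2.0) * gamma_cf(u, 3.0)
```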
Unfortunately, when using a superposition to capture the dynamics of energy
commodity spot prices, in particular electricity prices, different mean-reversion rates
are a crucial feature of the model. The OU process used for modeling the spike com-
ponent must have a much faster mean-reversion rate, since the spikes occur when, for
instance, a nuclear power plant unexpectedly shuts down. These sudden imbalances
in the market cause prices to spike, but shortly after, prices revert back to normal
levels. In order to create this spike behavior, it is necessary for the spike process to
have a larger mean-reversion rate than the one used for capturing minor imbalances
in supply and demand (the base-signal process). Furthermore, allowing for different
mean-reversion rates makes modeling of the autocorrelation structure more flexible,
and the stylized fact of a multi-scale autocorrelation function can now be accommo-
dated. Incorporating different mean-reversion rates causes statistical challenges
since the model becomes non-Markovian. This also means that maximum likelihood
estimation is no longer feasible. Instead of resorting to filtering methods and splitting
the spot price into a base and spike component, simultaneous estimation of the
parameters can be carried out using prediction-based estimating functions, which
are a generalization of martingale estimating functions (see Sørensen (2000) or Brix
and Lunde (2013) for an introduction). The idea of using prediction-based estimating
functions to estimate superpositions of OU processes was put forward in Sørensen
(2000), and in the context of commodity prices in Benth et al. (2007), but the
performance of this method is still to be investigated.
The method only relies on the computation of unconditional moments, a task that is
still feasible when considering the non-Markovian superposition of OU processes.
Implementing and studying the performance of this estimation approach is left for
further research.
Another interesting way of extending the two estimation procedures is to consider
the inclusion of measurement errors in the observations. This would also be a step in
the direction of investigating how the methods handle more realistic data. For the
methods based on MGEFs the measurement errors could be taken into account para-
metrically if we are willing to assume a concrete specification of the measurement
errors, such as normally distributed additive measurement errors. In fact, for con-
struction of the simple benchmark in (2.12) no distributional assumption is needed
for the measurement errors. The mean and variance of the error process could simply
be included as additional parameters. In contrast, for the optimal MGEF the third-
and fourth-order moments of the error process are also needed, and a quasi-MGEF
approach where the errors are assumed to be normally distributed might be more
efficient than including four additional parameters in the parameter vector. For the FFT MLE
method the extension to the measurement error case is not as straightforward, at
least not for the finite activity OU processes. For the IG-OU process, the characteristic
function for the Lévy functionals plus the independent error term can be computed
if we make a distributional assumption about the errors. For the Γ-OU process the
estimation is complicated by the fact that the process is a mixed random variable.
This means we have to split the computation of the conditional density into a jump
and no-jump part as was done in (2.6) for the Γ-OU process without measurement
errors. One way of deciding if a jump has occurred or not could be to plot the AR(1)
transformed residuals (the Lévy functional plus the measurement error) and from
this plot obtain a threshold, instead of 0 as was used in (2.6), for deciding whether or
not a jump has occurred. This method would probably only perform well in a setting
where the jumps are of a much larger magnitude than the measurement errors. Also,
if the jump intensity is believed to be very high, ignoring the no-jump part of the
likelihood might also work. However, the theoretical justification and finite sample
performance of the estimation procedures are outside the scope of this chapter.
2.6 Conclusion
The Monte Carlo study from Valdivieso et al. (2009) was extended by also considering
other shapes of the marginal density of the OU process, better suited for modeling
the spike part of commodity prices. In this parameter setting the density of the Lévy
functionals was more concentrated around zero, giving rise to numerical challenges.
In particular, fine-tuning the parameters N and ζ of the FFT was a bit harder. However,
when well implemented, the FFT MLE procedure performed very well in terms of bias
and standard deviation of the estimates and could serve as a bound on the attainable
efficiency of other estimators. The chapter also considered another simulation-free
estimation method, based on MGEFs, which are analytically tractable in the
Markovian setting of non-Gaussian OU processes. The optimal MGEF was derived for
a general non-Gaussian OU process and the method was implemented and analyzed
for the Γ-OU and IG-OU process in the “base-signal” and “spike” scenario. Except for
numerical problems for the Γ-OU process in the “base-signal” scenario, the method
performed well. The highest gain in terms of RMSE, compared to the methods sug-
gested for obtaining initial values for the estimation procedures, was encountered
in the mean-reversion parameter. As for the parameters governing the marginal
distribution, the RMSEs were not much higher than those from the simple maxi-
mum likelihood estimation procedure that wrongly assumes that the observations
are independent (the iidMLE estimator). Comparing the performance of the MGEF-
based method and FFT MLE method revealed that despite the good performance
of the MGEF-based method, there is still room for efficiency improvements. These
improvements could possibly be obtained by leaving the class of quadratic MGEFs
and increasing the dimension of h, the vector containing conditional moment condi-
tions. It is also worth noting that, in contrast to the FFT MLE estimation procedure,
the MGEFs also offer a framework for simultaneously estimating the mean-reversion
parameter and the parameters from the marginal distribution for finite activity OU
processes. Furthermore, the MGEF-based methods are much easier to extend to han-
dle more realistic data containing measurement errors or to the case of observations
from a superposition of OU processes.
All in all, leaving the ideal setup with no model-misspecification and simulated
data, the MGEF-based estimation method seems like a numerically more robust
approach than the FFT MLE method. Especially if we consider the trouble the FFT
MLE method has with handling high-frequency data and the sensitivity towards the
nuisance parameters N and ζ. In spite of the superior performance of the FFT MLE
method documented in the present Monte Carlo study, the performance might be
quite different in settings where the parameters N and ζ are not easily fine-tuned
and, as already discussed, the method might not even be applicable when fitting
finite activity OU processes to real data. If the aim is to model the “base-signal” by a
non-Gaussian OU process, like the one from scenario 1 with a low value of λ, then
the estimation method using the optimal MGEF seems like a good choice, especially
for the infinite activity OU processes. If, on the other hand, the “spike” process is
considered, that is, we are in scenario 2 and λ is high, then the iidMLE estimates of
the marginal parameters seem like an easily implementable and robust choice, with
little or no loss in efficiency compared to the MGEF-based estimators. The gain from
using MGEF-based methods in the “spike” scenario lies in the efficiency of the estimate
of the mean-reversion parameter.
2.7 Appendix
2.7.1 Simulation of non-Gaussian OU Processes
In this section simulation of the Γ-OU and IG-OU processes is described. Time is measured in units of trading days, such that ∆ = 1 corresponds to daily sampling of the data. The aim is to simulate the non-Gaussian OU process X at the discrete time points X(0), X(∆), X(2∆), ..., X(n∆). The initial value X(0) can be simulated by drawing from the invariant distribution D, which will be either the Γ(ν,α) or the IG(δ,γ) distribution. From the recursive relationship in (2.3) the discretized non-Gaussian OU process can be simulated if we can simulate random draws of the variable $\int_0^{\lambda\Delta} e^{s}\,dZ(s)$.
Simulating the Γ(ν,α)-OU Process

Simulation of the Γ-OU process is based on the following infinite series representation of the Lévy integral $\int_0^{T} f(s)\,dZ(s)$ of a positive and integrable function f w.r.t. a subordinator:

$$\int_0^{\lambda} f(s)\,dZ(s) \overset{d}{=} \sum_{i=1}^{\infty} W^{-1}(a_i/\lambda)\, f(\lambda r_i), \qquad (2.15)$$

where $W^{-1}$ denotes the inverse of the tail mass function $W^{+}(x) = \int_x^{\infty} w(y)\,dy$, with w being the density of W, the Lévy measure of Z(1). In (2.15) the two series $a_i$ and $r_i$ are independent, $a_1 < a_2 < \cdots < a_i < \cdots$ are the arrival times of a Poisson process with intensity 1, and the $r_i$ are independent U[0,1] random variables (see Thm 8.1 in Barndorff-Nielsen and Shephard (2001a) for a proof of this result).

When $X \sim \Gamma(\nu,\alpha)$, the inverse of the tail mass function is

$$W^{-1}(x) = \max\!\left(0,\, -\alpha \log\!\left(\frac{x}{\nu}\right)\right). \qquad (2.16)$$

By combining (2.15) and (2.16) we can now simulate from a gamma-OU process by using

$$\int_0^{\lambda\Delta} e^{s}\,dZ(s) \overset{d}{=} \sum_{i=1}^{\infty} W^{-1}(a_i/\lambda\Delta)\, e^{\lambda\Delta r_i} \overset{d}{=} -\alpha \sum_{i=1}^{\infty} \mathbf{1}_{]0,\nu[}(a_i/\lambda\Delta)\, \log\!\left(\frac{a_i}{\nu\lambda\Delta}\right) e^{\lambda\Delta r_i}$$
$$\overset{d}{=} \alpha \sum_{i=1}^{\infty} \mathbf{1}_{]0,1[}(c_i)\, \log\!\left(c_i^{-1}\right) e^{\lambda\Delta r_i} \overset{d}{=} \alpha \sum_{i=1}^{N(1)} \log\!\left(c_i^{-1}\right) e^{\lambda\Delta r_i}, \qquad (2.17)$$

where $c_1 < c_2 < \ldots$ are the arrival times of a Poisson process with intensity $\nu\lambda\Delta$ and $N(1)$ is the corresponding number of events up until time 1. This means that both $c_i$ and $r_i$ are U[0,1] random variables.^{10}
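The finite sum in (2.17) translates directly into a sampler. Below is a minimal Python sketch under the text's parameterization, using the fact that, conditionally on their number, the arrival times in [0,1] are i.i.d. uniforms; the function and variable names are illustrative, not the chapter's code.

```python
import numpy as np

def sim_gamma_ou_innovation(lam, dt, nu, alpha, rng):
    """Exact draw of int_0^{lam*dt} e^s dZ(s) for the Gamma(nu, alpha)-OU
    process via (2.17): alpha * sum_{i<=N(1)} log(1/c_i) * e^{lam*dt*r_i},
    where N(1) ~ Poisson(nu*lam*dt) and, given N(1), the c_i and r_i are
    i.i.d. U[0,1]. A sketch with illustrative names."""
    N = rng.poisson(nu * lam * dt)       # number of arrivals in [0, 1]
    if N == 0:
        return 0.0
    c = rng.uniform(size=N)              # arrival times (unordered is fine)
    r = rng.uniform(size=N)
    return float(alpha * np.sum(np.log(1.0 / c) * np.exp(lam * dt * r)))
```

Since the sum terminates at N(1), no truncation error is incurred, in contrast to the IG-OU case treated next.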
Figure 2.2. A short and a long plot of the non-Gaussian gamma-OU process X with ∆ = 1, ν = 10, α = 1/15 and for two different choices of λ. In the first row λ = 0.02 and in the second λ = 0.25.
Simulating the IG(δ,γ)-OU Process

When $X \sim IG(\delta,\gamma)$, the tail mass function is given by (see Barndorff-Nielsen and Shephard (2001b))

$$W^{+}(x) = \frac{\delta}{\sqrt{2\pi}}\, x^{-1/2} \exp\!\left(-\tfrac{1}{2}\gamma^{2}x\right).$$

Finding the inverse tail mass function requires the use of the Lambert W function, $L_W(x)$, which solves for w as a function of x in $we^{w} = x$. Straightforward computation now gives

$$W^{-1}(x) = \frac{1}{\gamma^{2}}\, L_W\!\left(\frac{\gamma^{2}\delta^{2}}{2\pi x^{2}}\right). \qquad (2.18)$$

Unlike in the case of the gamma-OU process, combining (2.15) and (2.18) will not result in a finite sum. Therefore, the sum in (2.15) must be truncated at some point. This also means that a simulation scheme based on a truncation of the infinite series representation will only result in an approximation of the desired IG(δ,γ)-OU process, since the sum of the small jumps is neglected.^{11}

^{10} Note that to simulate the corresponding realization of Z, one can just choose the constant function 1 as the f function in (2.15) and reuse the $c_i$ and $r_i$ from (2.17).

^{11} See Gander and Stephens (2007) for a way of choosing the truncation point.

Figure 2.3. A short and a long plot of the non-Gaussian gamma-OU process X with ∆ = 1, ν = 1/2, α = 1/2 and for two different choices of λ. In the first row λ = 0.02 and in the second λ = 0.25.

Instead, we use the exact simulation method from Zhang and Zhang (2008) when simulating trajectories from the IG(δ,γ)-OU process. From the recursive relationship in (2.3) we need to simulate from the i.i.d. process

$$\varepsilon_n := e^{-\lambda\Delta}\int_{\lambda\Delta(n-1)}^{\lambda\Delta n} e^{s}\,dZ(s),$$
and from Theorem 1 in Zhang and Zhang (2008) we can do so by using the following result.

Theorem 6. For fixed ∆ > 0 and γ > 0, the random variable $\varepsilon_1$ equals, in distribution, the sum of an inverse Gaussian random variable and a compound Poisson sum, that is,

$$\varepsilon_1 \overset{d}{=} W_0^{\Delta} + \sum_{i=1}^{N^{\Delta}} W_i^{\Delta},$$

where $W_0^{\Delta} \sim IG\!\left(\delta(1-e^{-\frac{1}{2}\lambda\Delta}),\, \gamma\right)$, $N^{\Delta}$ is Poisson distributed with intensity $\delta(1-e^{-\frac{1}{2}\lambda\Delta})\gamma$, and $W_1^{\Delta}, W_2^{\Delta}, \ldots$ are independent random variables with density function

$$f_{W^{\Delta}}(w) = \begin{cases} \dfrac{\gamma^{-1}}{\sqrt{2\pi}}\, w^{-3/2}\left(e^{\frac{1}{2}\lambda\Delta}-1\right)^{-1}\left(e^{-\frac{1}{2}\gamma^{2}w} - e^{-\frac{1}{2}\gamma^{2}w e^{\lambda\Delta}}\right), & w > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Further, $W_0^{\Delta}$, $W_1^{\Delta}, W_2^{\Delta}, \ldots$ and $N^{\Delta}$ are independent.

Proof. For a proof of the Theorem see pages 341–343 in Zhang and Zhang (2008).
Zhang and Zhang (2008) also prove that for any w > 0 the density function $f_{W^{\Delta}}(w)$ is dominated by $\tfrac{1}{2}\left(1+e^{\frac{1}{2}\lambda\Delta}\right)g(w)$, where g is the density function of a $\Gamma\!\left(\tfrac{1}{2},\, 2\gamma^{-2}\right)$ distribution. The i.i.d. random variables $W_1^{\Delta}, W_2^{\Delta}, \ldots$ can now be generated using the general acceptance-rejection method, and the simulation scheme for the IG(δ,γ)-OU process follows immediately.
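The resulting scheme can be sketched as follows in Python. This is a sketch, not the chapter's code: function and variable names are ours, NumPy's `wald` distribution is mapped to IG(δ,γ) via mean δ/γ and shape δ², and the acceptance probability f(w)/(M g(w)) has been simplified algebraically under the dominating Gamma density stated above.

```python
import numpy as np

def sim_ig_ou_innovations(n, lam, dt, delta, gamma, rng):
    """Draw n i.i.d. innovations via Theorem 6: an IG variable plus a
    compound Poisson sum whose terms are sampled by acceptance-rejection
    against the dominating Gamma(1/2, scale 2/gamma^2) density."""
    d0 = delta * (1.0 - np.exp(-0.5 * lam * dt))
    # W0 ~ IG(d0, gamma); NumPy's Wald(mu, lam) matches IG(d, g) with
    # mu = d / g and lam = d^2.
    eps = rng.wald(d0 / gamma, d0**2, size=n)
    N = rng.poisson(d0 * gamma, size=n)            # jump counts N^Delta
    b = 0.5 * gamma**2 * (np.exp(lam * dt) - 1.0)
    for k in range(n):
        m = int(N[k])
        while m > 0:
            w = rng.gamma(0.5, 2.0 / gamma**2, size=m)   # proposals from g
            u = rng.uniform(size=m)
            # acceptance probability f(w) / (M g(w)) simplifies to
            # (1 - exp(-b w)) / (b w) with b = gamma^2 (e^{lam dt} - 1) / 2
            acc = w[u < (1.0 - np.exp(-b * w)) / (b * w)]
            eps[k] += acc.sum()
            m -= acc.size
    return eps
```

Feeding these innovations into the AR(1)-type recursion (2.3) then yields an exact trajectory of the IG-OU process, with no truncation error.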
Figure 2.4. A short and a long plot of the non-Gaussian IG-OU process X with ∆ = 1, δ = √(20/3), γ = √15 and for two different choices of λ. In the first row λ = 0.02 and in the second λ = 0.25.
Figure 2.5. A short and a long plot of the non-Gaussian IG-OU process X with ∆ = 1, δ = √(1/8), γ = √2 and for two different choices of λ. In the first row λ = 0.02 and in the second λ = 0.25.
2.8 References
Barndorff-Nielsen, O. E., 1998. Processes of normal inverse Gaussian type. Finance and Stochastics 2, 41–68.
Barndorff-Nielsen, O. E., Shephard, N., 2001a. Modelling by Lévy processes for finan-
cial econometrics. In: Barndorff-Nielsen, O. E., Mikosch, T., Resnick, S. (Eds.), Lévy
processes -Theory and Applications. Birkhäuser, pp. 283–318.
Barndorff-Nielsen, O. E., Shephard, N., 2001b. Non-Gaussian OU-based models and some of their uses in financial economics (with discussion). Journal of the Royal Statistical Society B 63, 167–241.
Barndorff-Nielsen, O. E., Sørensen, M., 1994. A review of some aspects of asymptotic
likelihood theory for stochastic processes. International Statistical Review 62, 133–
165.
Benth, F. E., Benth, J. S., Koekebakker, S., 2008. Statistical Modeling of Electricity and
Related Markets. Advanced Series on Statistical Science and Applied Probability.
World Scientific.
Benth, F. E., Kallsen, J., Meyer-Brandis, T., 2007. A non-Gaussian Ornstein-Uhlenbeck
process for electricity spot price modeling and derivatives pricing. Applied Mathe-
matical Finance 14:2, 153–169.
Benth, F. E., Kiesel, R., Nazarova, A., 2012. A critical empirical study of three electricity
spot price models. Energy Economics 34, 1589–1616.
Benth, F. E., Saltyte Benth, J., 2004. The normal inverse gaussian distribution and
spot price modelling in energy markets. International Journal of Theoretical and
Applied Finance 07 (02), 177–192.
Benth, F. E., Vos, L., 2013. Cross-commodity spot price modeling with stochastic
volatility and leverage for energy markets. Advances in Applied Probability 45,
545–571.
Bibby, B. M., Jacobsen, M., Sørensen, M., 2002. Estimating functions for discretely
sampled diffusion-type models. In: Aït-Sahalia, Hansen, L. P. (Eds.), Handbook of
Financial Econometrics. North Holland - Elsevier, pp. 203–268.
Bibby, B. M., Sørensen, M., 1995. Martingale estimation functions for discretely observed diffusion processes. Bernoulli 1, 17–39.
Brix, A. F., Lunde, A., 2013. Estimating stochastic volatility models using prediction-based estimating functions. CREATES Research Paper 2013-23.
Gander, M. P. S., Stephens, D. A., 2007. Stochastic volatility modelling with general
marginal distributions: Inference, prediction and model selection. Journal of Sta-
tistical Planning and Inference 137, 3068–3081.
Hubalek, F., Posedel, P., 2013. Asymptotic analysis and explicit estimation of a class of stochastic volatility models with jumps using the martingale estimating function approach. Glasnik Matematicki 48(1), 185–210.
Kessler, M., 1995. Martingale estimating functions for a Markov chain. PhD dissertation preprint, Laboratoire de Probabilités, Université Paris VI.
Klüppelberg, C., Meyer-Brandis, T., Schmidt, A., 2010. Electricity spot price modelling
with a view towards extreme spike risk. Quantitative Finance 10:9, 963–974.
Meyer-Brandis, T., Tankov, P., 2008. Multi-factor jump-diffusion models of electricity
prices. International Journal of Theoretical and Applied Finance 11, 503–528.
Schwartz, E., 1997. The stochastic behaviour of commodity prices: Implications for
valuation and hedging. The Journal of Finance 52(3), 923–973.
Sørensen, M., 1997. Estimating functions for discretely observed diffusions: A review. Lecture Notes–Monograph Series 32 (Selected Proceedings of the Symposium on Estimating Functions), 305–325.
Sørensen, M., 1999. On asymptotics of estimating functions. Brazilian Journal of
Probability and Statistics 13, 111–136.
Sørensen, M., 2000. Prediction-based estimating functions. Econometrics Journal 3,
123–147.
Valdivieso, L., Schoutens, W., Tuerlinckx, F., 2009. Maximum likelihood estimation
in processes of Ornstein-Uhlenbeck type. Statistical Inference for Stochastic Pro-
cesses 12, 1–19.
Zhang, S., Zhang, X., 2008. Exact simulation of IG-OU processes. Methodology and
Computing in Applied Probability 10, 337–355.
CHAPTER 3

A GENERALIZED SCHWARTZ MODEL FOR ENERGY SPOT PRICES

ESTIMATION USING A PARTICLE MCMC METHOD
Anne Floor Brix
Aarhus University and CREATES
Asger Lunde
Aarhus University and CREATES
Wei Wei
Aarhus University and CREATES
Abstract
We consider a two-factor geometric spot price model with stochastic volatility and
jumps. The first factor models the normal variations of the price process and the
other factor accounts for the presence of spikes. Instead of using various filtering
techniques for splitting the two factors, as often found in the literature, we estimate
the model in one step using an MCMC method with a particle filter. In our empirical
analysis we fit the model to UK natural gas spot prices and investigate the importance
of allowing for jumps and stochastic volatility. We find that the inclusion of stochastic
volatility in the process used for modeling the normal price variations is crucial and
that it strongly impacts the jump intensity in the spike process. Furthermore, our
estimation method enables us to consider both a continuous and purely jump-driven
specification of the volatility process, and thereby assess if the volatility specification
also influences the spike process and the overall model fit.
3.1 Introduction
The liberalization of the European energy markets has over the last couple of decades led to highly deregulated and liquid markets for trading energy commodities, such as gas and electricity. The introduction of competition, and hence also price risk, has caused the markets to experience a significant increase in price volatility, and a market place for energy-based derivatives, used for hedging, has emerged. The transition to a
competitive market where prices are set according to supply and demand means that
energy spot prices have several distinct characteristics that should be captured by
any proposed model. The most important features are seasonality, mean-reversion,
spikes, multi-scale autocorrelation and stochastic volatility, see for instance Eydeland
and Wolyniec (2003) for empirical evidence on these stylized facts. Seasonality is
caused by the seasonal pattern on the demand side of the market, for instance by an
increased need for heating during the winter. Mean-reversion is a direct consequence
of the markets being supply and demand driven, which means that, unlike the stock
market, prices are not allowed to evolve freely but will fluctuate around a (possibly
stochastic) level. This also has the important implication that the deseasonalized
spot prices will be modeled using stationary processes. Due to delivery constraints in
the spot market, sudden imbalances in supply and demand are almost immediately
reflected in the spot price, causing the price to jump because of an inelastic demand
curve. These imbalances are typically caused by an unexpected rise in demand or
technical problems on the supply side. After experiencing a jump, the price quickly
mean-reverts to the normal level of production costs, leaving a spike in the price path.
The multi-scale autocorrelation structure that is observed in many markets is often a
consequence of the spike part of the price process having a stronger mean-reversion
rate than the so-called base-signal process that accounts for the normal variations.
The inclusion of stochastic volatility in the modeling framework helps to replicate
the time-series properties of the prices, such as a leptokurtic distribution, and to
accurately estimate the jump part of the model. As we shall see in our application to the UK natural gas market, failing to include stochastic volatility will drive up the expected number of jumps, contradicting the fact that jumps are supposed to be rare events.
In the univariate model proposed in this chapter, the logarithmic spot price is given as the sum of three factors. The first factor is a deterministic mean-level
function that models the, possibly trending, seasonally varying mean-level of the
logarithmic spot price. The second factor captures the base-signal part of the price
process and will be modeled using a Gaussian Ornstein-Uhlenbeck (OU) process
with stochastic volatility. The third factor is a non-Gaussian OU process that accounts
for the spike behavior. We will consider both a continuous and a purely jump-driven
specification of the volatility process. Instead of using various filtering techniques
to split the base-signal and spike process in a first step before estimating the model
parameters, the chapter contributes to the existing literature by proposing a method
for estimating the model in one step using the particle MCMC (PMCMC) methods
developed in Andrieu et al. (2010). The estimation procedure therefore allows for an
investigation of the interplay between the specifications of base-signal and the spike
process. In Green and Nossman (2008), a similar model is also estimated in one step
using MCMC. In contrast to our approach, the authors in Green and Nossman (2008)
condition on future values when computing the posterior distribution, rendering in-
sample forecasts a useless tool for model evaluation. The authors also have to include
a Brownian component in the specification of the spike factor in order to ensure that
the factor has an absolutely continuous distribution when conditioning on the jumps,
thereby simplifying the MCMC estimation. Furthermore, our estimation approach,
using the particle marginal Metropolis-Hastings sampler, has the great advantage of
being able to accommodate different volatility specifications, including pure jump
processes. This will enable us to investigate if the different volatility specifications
have the same impact on the filtered spike process and if the volatility specification
impacts the overall fit of the model. The method can also handle non-Markovian
models, which is essential for effective sampling of the spike process in our proposed
two-factor model. Finally, one of the outputs of the particle filter is the likelihood,
which makes computation of Bayes factors and model comparison straightforward.
The stepping stone for many of the spot price models found in the literature is
the mean-reverting one-factor Schwartz model from Schwartz (1997), where the spot
price is defined as the exponential of a Gaussian OU process. This model was further
extended to include a deterministic seasonality factor in Lucia and Schwartz (2002).
In Benth, Ekeland, Hauge, and Nielsen (2003), the geometric spot price model from Lucia and Schwartz (2002) is generalized to allow for jumps. A special case of this model, based solely on the NIG distribution, is applied to oil and gas in Benth and Saltyte Benth (2004). Another special case of the model from Benth et al. (2003) is the jump diffusion model, which has been used for modeling electricity spot prices in Cartea and Figueroa (2005) and Benth, Kiesel, and Nazarova (2012).
In Benth, Kallsen, and Meyer-Brandis (2007) an arithmetic multi-factor model
based on non-Gaussian OU processes is suggested. The model is able to capture
both the spike behavior and multi-scale autocorrelation structure of spot prices, and positivity of prices is ensured by letting the non-Gaussian OU processes be driven
by subordinators. The possibility of negative prices in arithmetic models can also be
viewed as an advantage since negative prices can actually occur in some energy mar-
kets, such as the electricity market. Arithmetic models are also advantageous when it
comes to pricing of forward contracts with delivery being made over a period instead
of at a single point in time. Due to the affine structure of the spot price in arithmetic
models, forward prices become more analytically tractable than in geometric models.
We will instead consider geometric models, as these are a natural extension of the
GBM used in the financial markets. Besides, derivative pricing is not the focus of this
chapter. It is also easier to model negative spikes, a feature often observed in gas
spot prices, and ensure prices above a certain level (for instance zero) when using geometric models. The arithmetic multi-factor model from Benth et al. (2007) is estimated in Meyer-Brandis and Tankov (2008) and Benth et al. (2012) by splitting the spike process and base-signal process using a nonparametric method called hard-thresholding. In Klüppelberg, Meyer-Brandis, and Schmidt (2010), the filtering of the two mean-reverting processes is instead performed using a method based on extreme value theory. A two-factor extension of the geometric jump diffusion model
from Cartea and Figueroa (2005), with a different jump size distribution, can be found
in Hambley, Howison, and Kluge (2009), but no estimation method is suggested in
the paper.
The inclusion of stochastic volatility in the models used for energy markets was, among others, suggested by Geman (2005), where a Heston stochastic volatility extension of the Schwartz model is considered, but not estimated. In Green
and Nossman (2008) a two-factor extension of the Schwartz model with Heston
stochastic volatility is proposed and fitted to electricity spot prices using Markov
Chain Monte Carlo (MCMC) techniques. A jump-driven specification of the volatility
process is considered in Benth (2011), where the geometric one-factor model from
Lucia and Schwartz (2002) is augmented with stochastic volatility given by the sum of
non-Gaussian OU processes. However, in Benth (2011) a one-factor volatility process
is utilized when the model is fitted to UK natural gas spot prices. The stochastic
volatility model from Benth (2011) is extended in Benth and Vos (2013a) to incorporate
spikes and a leverage effect, in a multidimensional setting, allowing for the joint
modeling of several commodities. The model in Benth and Vos (2013a) only allows for
positive jumps in the spot price, as the non-Gaussian OU factors entering the model
are driven by subordinators. Estimation of the model from Benth and Vos (2013a) is
however still an open question. The estimation method detailed and employed in this
chapter in a univariate setting, also has potential for usage in the multi-dimensional
setup.
With the purely jump-driven specification of the volatility process, our proposed model is a univariate version of the geometric model from Benth and Vos (2013a), with the extra flexibility of accommodating both negative and positive spikes. We will use the tempered stable OU process as our jump-driven volatility specification. When the continuous specification is considered instead, our model closely resembles the model proposed in Green and Nossman (2008), where a CIR specification is used to describe the volatility dynamics. Instead, we will use a logarithmic OU process as our continuous specification of the volatility process.
The chapter is organized as follows: In Section 2 our proposed model and the benchmark models are presented. Section 3 describes the setup of our empirical application, and the PMCMC estimation method is outlined in Section 4. In Section 5 the estimation results are presented and various model comparisons are performed. Section 6 offers a discussion of possible extensions of the model and estimation procedure. Final remarks are given in Section 7.
3.2 Model Descriptions
In this section we will describe our proposed model. Let S(t ) denote the spot price at
time t . The dynamics of the spot price will be described using the following geometric
OU-based factor model, augmented with stochastic volatility:

$$d\log S(t) = d\log\Lambda(t) + dX(t) + dY(t),$$
$$dX(t) = -\alpha_x X(t)\,dt + \sigma(t)\,dB(t),$$
$$dY(t) = -\alpha_y Y(t)\,dt + dI(t).$$
The first factor, Λ(t), is a deterministic function that accounts for the possible trend and seasonal patterns of the data. The specification of Λ(t) and the procedure used for detrending and deseasonalizing the spot prices will be described in Section 3. In this section, the focus will be on modeling the detrended and deseasonalized process $Z(t) = \log S(t) - \log\Lambda(t) := X(t) + Y(t)$, discretized using a time interval of length ∆ = 1 day to match the data in our empirical application. The process X(t) is a Gaussian OU process with stochastic volatility, σ(t), that models the continuous variations in the logarithmic spot price and will be interpreted as the base-signal process. The last factor, Y(t), is a non-Gaussian OU process with a pure jump Lévy process as background driving Lévy process (BDLP). Y(t) will be interpreted as the spike process. The different mean-reversion rates, $\alpha_x > 0$ and $\alpha_y > 0$, make multi-scale autocorrelation possible and help to reproduce the time series properties of the data, where a faster mean-reversion rate is observed for the spike process. The specifications of X(t) and Y(t) are given below, and in the following subsections we present our benchmark models, which are all nested in the model described in Subsection 3.2.1.
3.2.1 The TF-SVJ Model
In our proposed two-factor model with stochastic volatility and jumps, labeled TF-SVJ, and in the benchmark models, we will use I(t) = N(t) as our BDLP, where N(t) is a compound Poisson process with intensity parameter $\lambda_J$ and normally distributed jump sizes. The detrended and deseasonalized spot price, Z(t), will then solve

$$dZ(t) = dX(t) + dY(t) = -\alpha_x X(t)\,dt - \alpha_y Y(t)\,dt + \sigma(t)\,dB(t) + dN(t).$$

If we assume that at most one jump occurs per day and approximate the variance of the increments in the AR(1) representation of X(t), $\int_t^{t+1} e^{-2\alpha_x(t+1-s)}\sigma^2(s)\,ds$, by

$$\sigma^2(t)\int_t^{t+1} e^{-2\alpha_x(t+1-s)}\,ds = \sigma^2(t)\,\frac{1-e^{-2\alpha_x}}{2\alpha_x},$$

the discretized model becomes

$$Z_{t+1} = X_{t+1} + Y_{t+1},$$
$$X_{t+1} = e^{-\alpha_x} X_t + \varepsilon_{t+1},$$
$$Y_{t+1} = e^{-\alpha_y} Y_t + \xi_{t+1} J_{t+1},$$

where $\varepsilon_{t+1} \sim N\!\left(0,\, \sigma^2(t)\frac{1-e^{-2\alpha_x}}{2\alpha_x}\right)$, $J_{t+1} \sim \mathrm{Bernoulli}(\lambda_J)$ and $\xi_{t+1} \sim \mathrm{Normal}(\mu_J, \sigma_J^2)$.
A similar model was suggested in Green and Nossman (2008), using a CIR specification of the stochastic volatility process and including an additional independent Brownian component in the spike process, Y(t). We consider both a purely jump-driven specification of the volatility process and a continuous specification. Note that the base-signal, X(t), will be continuous regardless of the specification of the volatility process, and the process Y(t) will therefore account for the spikes. The two specifications of the volatility process σ²(t) are given below.
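Given a daily volatility path σ²(t) (produced by either specification), the discretized system above can be simulated directly. A minimal Python sketch with illustrative parameter names (not the chapter's code):

```python
import numpy as np

def simulate_tf_svj(sigma2, alpha_x, alpha_y, lam_J, mu_J, sigma_J, rng):
    """Simulate the discretized TF-SVJ model Z_t = X_t + Y_t for a given
    daily volatility path sigma2: at most one jump per day, with jump
    indicator J ~ Bernoulli(lam_J) and jump size xi ~ N(mu_J, sigma_J^2)."""
    n = len(sigma2)
    X = np.zeros(n + 1)
    Y = np.zeros(n + 1)
    phi_x, phi_y = np.exp(-alpha_x), np.exp(-alpha_y)
    var_scale = (1.0 - np.exp(-2.0 * alpha_x)) / (2.0 * alpha_x)
    for t in range(n):
        eps = rng.normal(0.0, np.sqrt(sigma2[t] * var_scale))
        jump = rng.normal(mu_J, sigma_J) if rng.uniform() < lam_J else 0.0
        X[t + 1] = phi_x * X[t] + eps
        Y[t + 1] = phi_y * Y[t] + jump
    return X[1:] + Y[1:]
```

With $\alpha_y \gg \alpha_x$ the jump factor mean-reverts quickly, producing the spike-shaped excursions described in the introduction.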
Lévy-driven volatility with tempered stable marginals

The first specification of the volatility process under consideration is the case where σ²(t) is a tempered stable (TS) OU process. That is, σ²(t) solves

$$d\sigma^2(t) = -\lambda\sigma^2(t)\,dt + dL(\lambda t),$$

and the marginal distribution of σ²(t) follows a tempered stable distribution, $\sigma^2(t) \sim TS(\kappa,\delta,\gamma)$. This process can be simulated recursively from

$$\sigma^2(t+1) = e^{-\lambda}\sigma^2(t) + e^{-\lambda}\int_0^1 e^{\lambda u}\,dL(\lambda u).$$

It is shown in Barndorff-Nielsen and Shephard (2001) that the BDLP of the TS-OU process is the sum of a TS Lévy process and a compound Poisson process. We use Rosinski’s method to simulate the infinite activity part, and the innovations can be sampled using the following expression:

$$e^{-\lambda\Delta}\int_0^{\Delta} e^{\lambda u}\,dL(\lambda u) \overset{d}{=} \sum_{i=1}^{\infty} e^{-\lambda\Delta r_i} \min\!\left\{\left(\frac{a_i \kappa}{A\lambda\Delta}\right)^{-1/\kappa},\, e_i v_i^{1/\kappa}\right\} + \sum_{i=1}^{N(\lambda\Delta)} e^{-\lambda\Delta r_i^*} c_i,$$

where $A = \delta 2^{\kappa}\kappa^2/\Gamma(1-\kappa)$ and $B = \frac{1}{2}\gamma^{1/\kappa}$. The sequences of random variables $r_i$, $a_i$, $e_i$, $v_i$, $r_i^*$ and $c_i$ are all mutually independent. The $r_i$, $v_i$ and $r_i^*$ are i.i.d. standard uniforms, the $e_i$ are i.i.d. exponential with mean $1/B$, and the $c_i$ are i.i.d. Gamma with shape parameter $(1-\kappa)$ and scale parameter $1/B$. The $a_1 < \ldots < a_i < \ldots$ are the arrival times of a Poisson process with intensity 1. Finally, $N(\lambda\Delta)$ is a Poisson random variable with mean $\lambda\Delta\,\delta\gamma\kappa$. Further, we have

$$\sigma^2(0) \overset{d}{=} \sum_{i=1}^{\infty} \min\!\left\{\left(\frac{a_i \kappa}{A_0}\right)^{-1/\kappa},\, e_i v_i^{1/\kappa}\right\},$$

where $A_0 = \delta 2^{\kappa}\kappa/\Gamma(1-\kappa)$.
The infinite sums are dominated by the first few terms, as shown in Barndorff-Nielsen and Shephard (2001). We truncate the sum to its first 100 terms, as in Andrieu et al. (2010). As a special case, the volatility process becomes an Inverse Gaussian (IG) OU process when κ = 0.5. In Benth (2011) a Gaussian OU process with stochastic volatility following an IG-OU process is fitted to the logarithm of natural gas spot prices in the UK.
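The truncated series above can be sketched as follows in Python. This is a sketch under the expressions stated in the text (A, B, the compound Poisson mean λ∆δγκ, and the 100-term truncation); function and variable names are illustrative.

```python
import math
import numpy as np

def sim_ts_ou_innovation(lam, dt, kappa, delta, gamma, n_terms=100, rng=None):
    """Truncated draw of e^{-lam*dt} int_0^dt e^{lam u} dL(lam u) for the
    TS(kappa, delta, gamma)-OU volatility: a Rosinski-type series for the
    infinite activity part plus a compound Poisson part."""
    rng = np.random.default_rng() if rng is None else rng
    A = delta * 2.0**kappa * kappa**2 / math.gamma(1.0 - kappa)
    B = 0.5 * gamma**(1.0 / kappa)
    # infinite activity part, truncated after n_terms jumps
    a = np.cumsum(rng.exponential(1.0, n_terms))    # Poisson(1) arrival times
    r = rng.uniform(size=n_terms)
    e = rng.exponential(1.0 / B, size=n_terms)      # exponential, mean 1/B
    v = rng.uniform(size=n_terms)
    jumps = np.minimum((a * kappa / (A * lam * dt)) ** (-1.0 / kappa),
                       e * v ** (1.0 / kappa))
    total = float(np.sum(np.exp(-lam * dt * r) * jumps))
    # compound Poisson part
    n_cp = rng.poisson(lam * dt * delta * gamma * kappa)
    if n_cp > 0:
        r_star = rng.uniform(size=n_cp)
        c = rng.gamma(1.0 - kappa, 1.0 / B, size=n_cp)  # shape 1-kappa, scale 1/B
        total += float(np.sum(np.exp(-lam * dt * r_star) * c))
    return total
```

Iterating $\sigma^2(t+1) = e^{-\lambda}\sigma^2(t)$ plus such an innovation (with ∆ = 1) yields an approximate TS-OU volatility path; with κ = 0.5 this reduces to the IG-OU case.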
Logarithmic volatility

The second volatility specification we consider is a continuous specification, where we assume that the logarithmic volatility, $h(t) = \log\sigma^2(t)$, follows a Gaussian OU process

$$dh(t) = -\alpha_h(h(t)-\mu_h)\,dt + \sigma_h\,dB_h(t),$$

where $B_h(t)$ and $B(t)$ are two independent Brownian motions.
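Because h(t) is a Gaussian OU process, it admits an exact AR(1) discretization over unit steps, analogous to the approximation used for X(t) but without error. A minimal sketch with illustrative names:

```python
import numpy as np

def simulate_log_vol(n, alpha_h, mu_h, sigma_h, rng, h0=None):
    """Exact unit-step discretization of the Gaussian OU log-volatility:
    h_{t+1} = mu_h + e^{-alpha_h} (h_t - mu_h) + eta_t, with
    eta_t ~ N(0, sigma_h^2 (1 - e^{-2 alpha_h}) / (2 alpha_h)).
    Returns the volatility path sigma^2(t) = exp(h_t)."""
    phi = np.exp(-alpha_h)
    sd = sigma_h * np.sqrt((1.0 - phi**2) / (2.0 * alpha_h))
    h = np.empty(n)
    h[0] = mu_h if h0 is None else h0   # start at the mean level by default
    for t in range(1, n):
        h[t] = mu_h + phi * (h[t - 1] - mu_h) + sd * rng.normal()
    return np.exp(h)
```

The exponential map guarantees a strictly positive volatility path, which can be fed directly into the discretized TF-SVJ recursion.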
3.2.2 The SF-J Model
In this subsection and the following ones, the benchmark models used for comparison are outlined. All the models are nested in the TF-SVJ model, and all of them describe the dynamics of the deseasonalized logarithmic spot price Z(t).
In the first, and simplest, benchmark model we consider a single factor model with jumps (and constant volatility), which we label the SF-J model (Single Factor model with Jumps). That is, we assume $\alpha_x = \alpha_y = \alpha$ and $\sigma(t) = \sigma$ and obtain the following model:

$$dZ(t) = -\alpha Z(t)\,dt + \sigma\,dB(t) + dN(t).$$

If we assume that there is at most one jump in a day, we get the following discretized model:

$$Z_{t+1} = e^{-\alpha}Z_t + \varepsilon_{t+1} + \xi_{t+1}J_{t+1},$$

where $\varepsilon_{t+1} \sim N\!\left(0,\, \sigma^2\frac{1-e^{-2\alpha}}{2\alpha}\right)$, $J_{t+1} \sim \mathrm{Bernoulli}(\lambda_J)$ and $\xi_{t+1} \sim \mathrm{Normal}(\mu_J,\sigma_J^2)$. This model resembles the model proposed in Cartea and Figueroa (2005), the only difference being the jump size distribution. In Cartea and Figueroa (2005) the authors instead use a log-normal jump distribution.
3.2.3 The SF-SV Model
The next benchmark model we consider is a single factor Gaussian OU process with stochastic volatility, corresponding to the assumptions $\alpha_x = \alpha_y = \alpha$ and $I(t) = 0$. This model will be labeled the SF-SV model, since it is a single factor model with stochastic volatility (and no jumps). The detrended and deseasonalized logarithmic spot price now solves

$$dZ(t) = -\alpha Z(t)\,dt + \sigma(t)\,dB(t),$$

and

$$Z(t+1) = e^{-\alpha}Z(t) + \int_t^{t+1}\sigma(s)e^{-\alpha(t+1-s)}\,dB(s) \sim N\!\left(e^{-\alpha}Z(t),\, \int_t^{t+1} e^{-2\alpha(t+1-s)}\sigma^2(s)\,ds\right).$$

If we again approximate $\int_t^{t+1} e^{-2\alpha(t+1-s)}\sigma^2(s)\,ds$ by $\sigma^2(t)\int_t^{t+1} e^{-2\alpha(t+1-s)}\,ds$, the discretized model becomes the AR(1) model

$$Z_{t+1} = e^{-\alpha}Z_t + \varepsilon_{t+1},$$

where $\varepsilon_{t+1} \sim N\!\left(0,\, \sigma^2(t)\frac{1-e^{-2\alpha}}{2\alpha}\right)$. We consider the same two specifications of σ(t) as in the full model, the TF-SVJ model. With the TS-OU specification, the SF-SV model is a special case of the model considered in Benth (2011).
3.2.4 The SF-SVJ Model
We now consider adding jumps to the model specification from the previous subsection, resulting in a single factor version of our proposed model, the TF-SVJ model. This model will therefore be labeled the SF-SVJ model. Using the same assumptions and approximations as in the SF-J and SF-SV models, we obtain the following discretized model:

$$Z_{t+1} = e^{-\alpha}Z_t + \varepsilon_{t+1} + \xi_{t+1}J_{t+1},$$

where $\varepsilon_{t+1} \sim N\!\left(0,\, \sigma^2(t)\frac{1-e^{-2\alpha}}{2\alpha}\right)$, $J_{t+1} \sim \mathrm{Bernoulli}(\lambda_J)$ and $\xi_{t+1} \sim \mathrm{Normal}(\mu_J,\sigma_J^2)$. The specifications of the volatility process are again the tempered stable OU process and the logarithmic volatility model.
3.2.5 The TF-J Model
All the benchmark models considered so far have been single factor models, in the sense that the base-signal and spike part have had the same mean-reversion parameter α. The restriction $\alpha_x = \alpha_y$ also has the important implication that Z(t) will be a Markov process. We now consider extending our first benchmark model, the SF-J model, by relaxing the assumption $\alpha_x = \alpha_y$, and instead allow each factor to have a separate mean-reversion rate. The resulting two factor model with jumps will be labeled the TF-J model. Z(t) will then solve

$$dZ(t) = -\alpha_x X(t)\,dt - \alpha_y Y(t)\,dt + \sigma\,dB(t) + dN(t).$$

Once again we assume that there is at most one jump a day, and we get the following discretized model:

$$Z_{t+1} = X_{t+1} + Y_{t+1},$$
$$X_{t+1} = e^{-\alpha_x}X_t + \varepsilon_{t+1}^{x},$$
$$Y_{t+1} = e^{-\alpha_y}Y_t + \xi_{t+1}J_{t+1},$$

where $\varepsilon_{t+1}^{x} \sim N\!\left(0,\, \sigma^2\frac{1-e^{-2\alpha_x}}{2\alpha_x}\right)$, $J_{t+1} \sim \mathrm{Bernoulli}(\lambda_J)$ and $\xi_{t+1} \sim \mathrm{Normal}(\mu_J,\sigma_J^2)$.
3.2.6 Model Overview
In Table 3.1 we give an overview of our proposed model and the benchmark models
described in the previous subsections.
Table 3.1. Model Overview.

Model         Stoc. vol.   Const. vol.   Jumps   Jumps in vol.   Two-factor (αx ≠ αy)
TF-SVJ_TS         ✓                        ✓           ✓                 ✓
TF-SVJ_Log        ✓                        ✓                             ✓
SF-J                            ✓          ✓
SF-SV_TS          ✓                                    ✓
SF-SV_Log         ✓
SF-SVJ_TS         ✓                        ✓           ✓
SF-SVJ_Log        ✓                        ✓
TF-J                            ✓          ✓                             ✓
3.3 Data Description and Initial Analysis
This section describes the data used in our empirical investigation and the detrending
and deseasonalization of the data. We will fit our model and the benchmark models
from the previous section to a time series of daily UK gas spot prices ranging from
September 11, 2007 to February 10, 2014. The data is collected from Bloomberg^{1} and reports day-ahead gas spot prices collected at the virtual hub NBP (National Balancing Point) for trading days (weekdays) in the sample period. This leaves us with a total of 1620 daily price quotes. There are no missing observations in the data set, and the log
spot price is depicted in Figure 3.1.
Figure 3.1. The logarithm of the daily day-ahead UK gas spot price, together with the fitted trend and seasonality function.
From Figure 3.1, the presence of both positive and negative spikes become evi-
dent. Positive spikes are usually caused by unpredicted weather changes, yielding
an increase in demand for gas used in power production. In the UK market, supply
uncertainty is also starting to play a role as the UK are becoming more and more de-
pendent on gas import. The dependence on import from mainland Europe, through
capacity constrained pipelines, can cause a slower reaction to an increase in demand,
and in turn cause a spike in the price process. The negative spikes are often a conse-
quence of poor anticipation of market-wide gas storage levels. Storage is costly and
1Code: NBPGDAHD index
3.3. DATA DESCRIPTION AND INITIAL ANALYSIS 95
[Figure 3.2 here: histogram of daily changes in the logarithm of gas spot prices, compared to a fitted normal density (legend: log-return; Gaussian).]
Figure 3.2. Histogram of daily changes in the logarithm of gas spot prices.
cannot fully reconcile the variable seasonal demand for gas with the more constant rate of production. Low inventory levels can also increase price volatility and the risk of spike occurrence.
As in most of the literature on commodity modeling, we start our investigation of the data characteristics by fitting a deterministic trend and seasonality function to the data. However, before this can be implemented we need to check for outliers in the data, as these might influence the parameters of the fitted trend and seasonal components. A visual inspection of Figure 3.1 already suggested the presence of outliers, i.e. the large price spikes. From the histogram in Figure 3.2 it becomes clear that the daily changes in the logarithmic spot price are not normally distributed, but instead follow a leptokurtic distribution. To detect possible outliers in data that are not normally distributed, the same approach as in Chapter 5 of Benth et al. (2008) is employed. Let the daily change in the logarithmic spot price from day t − 1 to day t be denoted by ∆st = log(St) − log(St−1) for t = 2, . . . , 1620. Now define the interquartile range, IQR, as the difference between the upper quartile Q3 and the lower quartile Q1 of the time series ∆st. Assuming that the first observation in the data is not an outlier, log(St−1) will be labeled as an outlier whenever ∆st is larger than Q3 + 3 × IQR or smaller than Q1 − 3 × IQR. This procedure resulted in 56 detected outliers. The
detected outliers are then replaced by the average of the two closest non-outlier observations.

Table 3.2. Fitted parameter values and std. errors for logΛ(t).

              a0          a1           a2          a3
estimate      3.6494      0.0003       0.0761      100.08
(std. err.)   (0.0139)    (1.49e-05)   (0.0098)    (5.1467)
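As a concrete sketch, the outlier rule above can be implemented as follows. Python with NumPy is used purely for illustration (the thesis works in MATLAB), and the synthetic series in the usage note is ours, not the gas-price data.

```python
import numpy as np

def replace_outliers(log_prices, k=3.0):
    """Label log(S_{t-1}) an outlier when Delta s_t falls outside
    [Q1 - k*IQR, Q3 + k*IQR], then replace each outlier by the average of
    its two closest non-outlier neighbours (one on each side)."""
    s = np.asarray(log_prices, dtype=float)
    ds = np.diff(s)                                  # Delta s_t, t = 2,...,T
    q1, q3 = np.percentile(ds, [25, 75])
    iqr = q3 - q1
    # an extreme Delta s_t flags the *previous* observation, log(S_{t-1})
    flagged = np.zeros(len(s), dtype=bool)
    flagged[:-1] = (ds > q3 + k * iqr) | (ds < q1 - k * iqr)
    clean = s.copy()
    good = np.where(~flagged)[0]
    for t in np.where(flagged)[0]:
        lo = good[good < t]
        hi = good[good > t]
        nbrs = [x for x in (lo[-1] if lo.size else None,
                            hi[0] if hi.size else None) if x is not None]
        clean[t] = np.mean(s[nbrs])
    return clean, flagged
```

On a synthetic random walk with one artificial spike, the procedure flags the spiked observation (and, by construction of the rule, the observation just before it) and smooths it out.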
Assuming 250 trading days a year, the trend and seasonal patterns in the logarithmic spot prices are modeled by the following function

logΛ(t) = a0 + a1 t + a2 cos(2π(t − a3)/250).
The function represents the average level around which the gas prices fluctuate, and
consists of a linear trend describing the inflation in the natural gas prices and a
seasonal component modeling the seasonal variation over the year. The function is
fitted to the logarithmic spot prices, with the replacement of the outliers, using the
nlinfit function in MATLAB. The results are reported in Table 3.2 and the fitted
seasonality function is depicted in Figure 3.1.
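A hypothetical Python analogue of the nlinfit step, using scipy.optimize.curve_fit on synthetic data; the "true" parameter values and starting point below are our own choices, loosely inspired by Table 3.2, not the actual gas-price fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_seasonal(t, a0, a1, a2, a3):
    """log Lambda(t) = a0 + a1*t + a2*cos(2*pi*(t - a3)/250)."""
    return a0 + a1 * t + a2 * np.cos(2 * np.pi * (t - a3) / 250.0)

# synthetic data (our assumption, mimicking 1620 trading days)
rng = np.random.default_rng(1)
t = np.arange(1620.0)
true = (3.65, 3e-4, 0.076, 100.0)
y = log_seasonal(t, *true) + rng.normal(0.0, 0.05, t.size)

# nlinfit analogue: nonlinear least squares from a rough starting point
est, cov = curve_fit(log_seasonal, t, y, p0=(3.5, 0.0, 0.1, 80.0))
se = np.sqrt(np.diag(cov))  # standard errors, as reported in Table 3.2
```

Note that a3 is only identified modulo the 250-day period, so it is the fitted curve rather than the raw parameters that should be compared.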
All the estimates in Table 3.2 are significant at the 5% level. We also fitted a
function taking weekly, monthly and quarterly effects into account, but these effects
were not significant at the 5% level and will be ignored going forward. The detrended
and deseasonalized logarithmic spot price, Z (t ), can now be computed by inserting
back the detected outliers and subtracting the fitted logΛ(t ) function. The resulting
time series is depicted in Figure 3.3 and will serve as the input for our estimation
method outlined in the following section.
3.4 Estimation Method
In this section, the Bayesian techniques underlying our estimation method will be
carefully described. Our model is able to account for important features of the spot
price dynamics, such as stochastic volatility, jumps, and separate mean reversion
rates for the base-signal and the spike process. The flexibility of the model also poses
many challenges to the estimation. First, for models with stochastic volatility, eval-
uating the exact likelihood involves intractable high dimensional integration since
volatility is latent. By treating the stochastic volatility as a state variable, these models
have a nonlinear state space representation, where the measurement equation de-
scribes how the logarithmic price changes given state variables, and the transition
equation describes the evolution of the states. Jacquier, Polson, and Rossi (1994)
developed Bayesian MCMC methods for conducting exact inference in stochastic
volatility models. Since then, Bayesian methods have been extensively applied to
[Figure 3.3 here: time series of the detrended and deseasonalized logarithm of the gas spot price, 2008–2014.]
Figure 3.3. Detrended and deseasonalized logarithm of gas spot prices.
stock return models, including jump-diffusion models; see for example Eraker, Johannes, and Polson (2003). Second, contrary to stock prices, energy prices tend to revert to a long-run mean determined by the marginal cost of production. When jumps are present, they appear as spikes, meaning that prices revert quickly to the mean level after a jump has occurred. Green and Nossman (2008) propose an MCMC algorithm to handle energy models with these special features.
The third complication arises when we consider stochastic volatility that is driven by a pure jump process. In this case, the volatility process and the parameters governing its dynamics can be highly correlated in their posterior distributions, which results in extremely slowly mixing chains in the above-mentioned MCMC algorithms. This problem is referred to as over-conditioning. Roberts, Papaspiliopoulos, and Dellaportas (2004) suggest a reparameterization to reduce the correlation. Griffin and Steel (2006) propose an algorithm with dependent thinning and reversible jump MCMC. However, these procedures cannot easily be generalized to the multi-factor models that are popular for commodity prices.
We adopt the particle MCMC methods introduced in Andrieu et al. (2010), in
particular the particle marginal Metropolis-Hastings (PMMH) sampler. PMMH al-
gorithms can be easily adapted to accommodate different volatility specifications,
including both pure jump OU processes and the logarithmic Gaussian OU process.
Furthermore, it can be applied to non-Markovian models, where the measurement
density or the transition density may depend on the entire past of the latent process.
This allows us to use a non-Markovian representation of the multi-factor model and
is essential for effective sampling of the spike process. Last but not least, we use the
likelihood obtained from the algorithm to compute Bayes factors and conduct model
comparison.
As the name suggests, PMCMC has two components: a particle filter or sequential Monte Carlo (SMC) step and an MCMC step. Specifically, the PMMH sampler employs SMC to approximate the likelihood and the latent variables conditional on the model parameters, and then applies an MH algorithm to obtain the joint distribution of the parameters and the states, given the observations. We extend the standard PMMH algorithm in two respects. First, for models with jumps or spikes, advanced SMC techniques need to be employed to alleviate a problem known as sample impoverishment. We propose to deal with this issue by marginalizing out some latent variables, a technique called Rao-Blackwellization; see Doucet, Freitas, Murphy, and Russell (2000). Our approach is closely related to the auxiliary particle filters developed by Pitt and Shephard (1999) and illustrated in Johannes, Polson, and Stroud (2009). Second, it is costly to evaluate the likelihood using SMC, and we therefore utilize adaptive algorithms to improve the efficiency of the Metropolis-Hastings sampler; see Andrieu and Thoms (2008) for a review of adaptive MCMC. The rest of this section focuses on the estimation of the model in Section 3.2.1, as it is the most complex model and nests all the benchmark models.
3.4.1 Sequential Monte Carlo
In the state space representation of our proposed model, the TF-SVJ model, the observed price process, Zt, is the sum of two latent processes, Xt and Yt, without any measurement error. We cannot apply particle filters directly in this case since there is no measurement density. One solution is to add a small Gaussian error term to the measurement equation. This is equivalent to assuming that Yt is a jump-diffusion instead of a pure jump process, and it is comparable to the model in Green and Nossman (2008). However, this would still be problematic if particle filters with blind proposals, also called bootstrap filters, are implemented. If the variance of the measurement errors is small compared to the variance of the latent process, then the observations are informative about the latent process and bootstrap filters will perform poorly; see Pitt, dos Santos Silva, Giordani, and Kohn (2012) for example. We propose a different approach for solving this problem. Specifically, we use the following representation of Model 1,

Zt+1 = e^{−αx} Zt + Yt+1 − e^{−αx} Yt + εt+1,
Yt+1 = e^{−αy} Yt + ξt+1 Jt+1.
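A minimal simulation sketch of this representation, assuming constant Gaussian base-signal noise in place of the stochastic volatility (a simplification of the TF-SVJ model; all parameter values in the usage note are illustrative):

```python
import numpy as np

def simulate_tfj(T, alpha_x, alpha_y, sigma, lam_J, mu_J, sigma_J, seed=0):
    """Simulate Z_{t+1} = e^{-ax} Z_t + Y_{t+1} - e^{-ax} Y_t + eps_{t+1},
    Y_{t+1} = e^{-ay} Y_t + xi_{t+1} J_{t+1}, with constant-variance
    Gaussian eps (in the TF-SVJ model Var(eps) is stochastic).
    J is a Bernoulli jump indicator, xi a normal jump size."""
    rng = np.random.default_rng(seed)
    Z, Y = np.zeros(T), np.zeros(T)
    for t in range(T - 1):
        J = rng.random() < lam_J                    # jump time
        xi = rng.normal(mu_J, sigma_J)              # jump size
        Y[t + 1] = np.exp(-alpha_y) * Y[t] + xi * J
        Z[t + 1] = (np.exp(-alpha_x) * Z[t] + Y[t + 1]
                    - np.exp(-alpha_x) * Y[t] + rng.normal(0.0, sigma))
    return Z, Y
```

By construction, the base-signal X = Z − Y recovered from a simulated path follows the one-factor OU recursion exactly, which is a useful internal check.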
Notice that this is no longer a Markovian state space model, in the sense that the measurement density depends on both Yt and Yt+1, but we can still use SMC methods to evaluate the likelihood and simulate the states given the parameters. Let θ and K denote the parameters and the latent variables respectively, where Kt+1 = (σ²(t), Yt+1). SMC methods start by approximating the continuous filtering density pθ(K1:t | Z1:t) by a discrete distribution made up of weighted random samples called particles. Given particles and associated weights, {K^(i)_{1:t}, ω̄^(i)_t}_{i=1}^N, that approximate pθ(K1:t | Z1:t), SMC obtains samples from pθ(K1:t+1 | Z1:t+1) and computes pθ(Zt+1 | Z1:t) sequentially. Using Bayes' theorem,

pθ(K1:t+1 | Z1:t+1) = [pθ(Zt+1 | Kt+1, Z1:t, K1:t) pθ(Kt+1 | K1:t) / pθ(Zt+1 | Z1:t)] pθ(K1:t | Z1:t),   (3.1)

the density of interest pθ(K1:t+1 | Z1:t+1) can be sampled using importance sampling techniques. The basic SMC chooses the proposal density (importance density) gθ(K1:t+1) to be pθ(Kt+1 | K1:t) pθ(K1:t | Z1:t), i.e., the new particles K^(i)_{t+1} are propagated from K^(i)_t using only transition densities and are "blind" to the observations. The importance weights are given by the ratio of the target density and the proposal density. From equation (3.1), it is therefore easily seen that the weights for the particles K^(i)_{1:t+1} are proportional to ω̃^(i)_{t+1} ω̄^(i)_t, where the incremental weights ω̃^(i)_{t+1} are simply given by pθ(Zt+1 | K^(i)_{t+1}, Z1:t, K^(i)_{1:t}). The likelihood, pθ(Zt+1 | Z1:t), is the normalizing constant for the particles and is equal to Σ_{i=1}^N ω̃^(i)_{t+1} ω̄^(i)_t. After normalizing the weights, ω̄^(i)_{t+1} = ω̄^(i)_t ω̃^(i)_{t+1} / pθ(Zt+1 | Z1:t), the particles {K^(i)_{1:t+1}, ω̄^(i)_{t+1}}_{i=1}^N approximate pθ(K1:t+1 | Z1:t+1).
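To make the propagate–weight–normalize recursion concrete, here is a bootstrap-filter sketch for a toy linear-Gaussian model. The model and all parameters are our own illustration, not taken from the thesis.

```python
import numpy as np

def bootstrap_filter(z, N, phi, q, r, seed=0):
    """Blind-proposal (bootstrap) SMC for the toy model
       x_t = phi*x_{t-1} + N(0, q),  z_t = x_t + N(0, r).
    Particles are propagated from the transition density; the incremental
    weight is the measurement density. Returns the log-likelihood estimate."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(q / (1 - phi**2)), N)   # stationary start
    loglik = 0.0
    for zt in z:
        # incremental weights: measurement density at each particle
        w = np.exp(-0.5 * (zt - x)**2 / r) / np.sqrt(2 * np.pi * r)
        loglik += np.log(w.mean())          # estimate of p(z_t | z_{1:t-1})
        w /= w.sum()
        idx = rng.choice(N, size=N, p=w)    # multinomial resampling
        x = phi * x[idx] + rng.normal(0.0, np.sqrt(q), N)
    return loglik
```

Since resampling leaves uniform weights, averaging the incremental weights each step yields the likelihood factor, and the sum of the log factors estimates log p(z_{1:T}).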
If the variance of the weights is large, the particles yield a poor approximation to the continuous distribution pθ(K1:t+1 | Z1:t+1), as the number of effective particles has decreased. In bootstrap filters, the incremental weights are simply the measurement density, and the algorithm performs better when the states are persistent and when the observations are less informative about the states than the transition density. This is not the case for the spike process Yt+1. If there is a jump at time t + 1, and Yt+1 is propagated from a blind proposal, the measurement density, pθ(Zt+1 | Kt+1, Z1:t, K1:t), will peak at a few values, resulting in only a few particles having prominent weights. To alleviate this problem, one needs to adapt the proposal density of Yt+1, or in other words, to incorporate Zt+1 in the proposal density.
We employ the Rao-Blackwellization technique, as the innovations in the spike process can be integrated out conditional on the other state variables. The vector Kt+1 has two components, the stochastic volatility, σ²(t), and the spike process, Yt+1. Since the innovations in Yt+1 are assumed to be Bernoulli distributed jump times with normally distributed jump sizes, pθ(Yt+1 | Zt+1, σ²(t), Z1:t, K1:t) is analytically tractable. Hence, we can rewrite equation (3.1) as

pθ(K1:t+1 | Z1:t+1) = [pθ(Yt+1 | Zt+1, σ²(t), Z1:t, K1:t) pθ(Zt+1 | σ²(t), K1:t, Z1:t) pθ(σ²(t) | σ²(t−1)) / pθ(Zt+1 | Z1:t)] pθ(K1:t | Z1:t),
and choose the following proposal density,

gθ(K1:t+1 | Z1:t+1) = pθ(Yt+1 | Zt+1, σ²(t), Z1:t, K1:t) pθ(σ²(t) | σ²(t−1)) pθ(K1:t | Z1:t).

Here, the stochastic volatility, σ²(t), is still propagated from its transition density, but Yt+1 is adapted to Zt+1, as we can sample directly from pθ(Yt+1 | Zt+1, σ²(t), Z1:t, K1:t). In particular, we draw Jt+1 and ξt+1 from

pθ(Jt+1 | Zt+1, σ²(t), Z1:t, K1:t) = pθ(Zt+1 | Jt+1, σ²(t), Z1:t, K1:t) pθ(Jt+1) / pθ(Zt+1 | σ²(t), K1:t, Z1:t)

and

pθ(ξt+1 | Jt+1, Zt+1, σ²(t), Z1:t, K1:t) = pθ(Zt+1 | ξt+1, Jt+1, σ²(t), Z1:t, K1:t) pθ(ξt+1) / [pθ(Zt+1 | σ²(t), Z1:t, K1:t) pθ(Jt+1)],

and then set Yt+1 = e^{−αy} Yt + ξt+1 Jt+1.
The incremental weights ω̃^(i)_{t+1} = pθ(Zt+1 | σ²(t), K1:t, Z1:t) do not depend on Yt+1, as ξt+1 and Jt+1 are integrated out. As before, the likelihood pθ(Zt+1 | Z1:t) equals Σ_{i=1}^N ω̃^(i)_{t+1} ω̄^(i)_t.
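The idea of integrating out (Jt+1, ξt+1) can be sketched in a stylized one-observation setting, where z = m + ξJ + ε with Gaussian ε stands in for the measurement equation; the conditioning set and notation are deliberately simplified relative to the text.

```python
import numpy as np

def rao_blackwell_step(z, m, v, lam, mu_J, s2_J, rng):
    """One Rao-Blackwellized step for the stylized model
       z = m + xi*J + eps,  eps ~ N(0, v),  J ~ Bernoulli(lam),
       xi ~ N(mu_J, s2_J).
    Returns the incremental weight p(z) with (J, xi) integrated out, and
    a draw of (J, xi) from their exact conditional given z."""
    def npdf(x, mean, var):
        return np.exp(-0.5 * (x - mean)**2 / var) / np.sqrt(2 * np.pi * var)
    p0 = (1 - lam) * npdf(z, m, v)            # no-jump branch
    p1 = lam * npdf(z, m + mu_J, v + s2_J)    # jump branch, xi integrated out
    weight = p0 + p1                          # the SMC incremental weight
    J = rng.random() < p1 / weight            # exact posterior jump draw
    if J:
        # conjugate normal posterior for the jump size xi given z and J = 1
        post_var = 1.0 / (1.0 / s2_J + 1.0 / v)
        post_mean = post_var * (mu_J / s2_J + (z - m) / v)
        xi = rng.normal(post_mean, np.sqrt(post_var))
    else:
        xi = rng.normal(mu_J, np.sqrt(s2_J))  # prior draw; unused when J = 0
    return weight, float(J), xi
```

A large observation relative to the noise scale is thus classified as a jump with near-certainty, while the weight itself never depends on the sampled (J, ξ).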
If importance sampling is carried out sequentially, the weights will degenerate and only a few particles will have significant weights after a few iterations. The degeneracy grows exponentially in time and makes particle approximations unreliable. SMC uses a resampling step to deal with this problem. The particles K^(i)_{1:t} are resampled with replacement according to their normalized weights ω̄^(i)_t, for instance by drawing from the multinomial distribution with probabilities {ω̄^(i)_t}_{i=1}^N. Particles with higher weights will be duplicated and particles with lower weights will be eliminated. After resampling, all particles have equal weights.
The likelihood computed from SMC is random, and the variance of the likelihood, which is related to the variance of the weights, greatly impacts the acceptance rate in the MCMC step. The Rao-Blackwellization technique described above is the first step we take to reduce this variance. Second, the resampling step introduces additional Monte Carlo error, and we implement residual resampling as it has smaller variance than multinomial resampling; see Douc and Cappe (2005). It also satisfies the unbiasedness condition, meaning that the expected number of copies of each particle is proportional to its weight. Lastly, the variance of the likelihood decreases as the number of particles increases. However, for limited computation time, one faces a trade-off between the number of MCMC iterations to run and the number of particles to use in each iteration. See Pitt et al. (2012) for a guide on how to choose the optimal number of particles.
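A sketch of the generic residual resampling scheme (our own implementation for illustration, not code from the thesis):

```python
import numpy as np

def residual_resample(weights, rng):
    """Residual resampling: keep floor(N*w_i) deterministic copies of each
    particle, then draw the remaining particles multinomially from the
    residual weights. Lower variance than pure multinomial resampling."""
    w = np.asarray(weights, dtype=float)
    N = len(w)
    counts = np.floor(N * w).astype(int)          # deterministic copies
    residual = N * w - counts                     # leftover fractional mass
    n_rest = N - counts.sum()
    if n_rest > 0:
        residual /= residual.sum()
        extra = rng.choice(N, size=n_rest, p=residual)
        counts += np.bincount(extra, minlength=N)
    return np.repeat(np.arange(N), counts)        # resampled particle indices
```

The deterministic part of the allocation is what reduces the resampling variance: only the fractional remainders are left to chance.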
3.4.2 MCMC
SMC methods approximate the likelihood and state variables conditional on the
parameters, but we are more interested in the joint distribution of the parameters and
states. Notice that p(θ, K1:T | Z1:T) can be decomposed into p(θ | Z1:T) p(K1:T | θ, Z1:T). The PMMH sampler suggests the following proposal density,

q(θ, K1:T | Z1:T) = q(θ | θg) p(K1:T | θ, Z1:T).

The draw, θg+1, from q(θ | θg), therefore has the simplified acceptance probability

αg+1 = min{ [p(Z1:T | θg+1) p(θg+1) q(θg | θg+1)] / [p(Z1:T | θg) p(θg) q(θg+1 | θg)], 1 },   (3.2)

where p(Z1:T | θ) can be replaced by its particle approximation, as shown in Andrieu et al. (2010). If the marginal distributions of the states are of interest, we also sample K^{g+1}_{1:T} from p(K1:T | θg+1, Z1:T) in the SMC step and accept it jointly with θg+1.
The choice of the proposal density q(θ | θg) is another crucial element in determining the efficiency of the MCMC algorithm. We use a random-walk proposal, θg+1 ∼ TN(θg, βg Σg), where TN denotes the truncated normal distribution, as some of the parameters have finite support. In particular,

q(θg | θg+1) = fN(θg; θg+1, βg Σg) / [FN(θu; θg+1, βg Σg) − FN(θl; θg+1, βg Σg)],
q(θg+1 | θg) = fN(θg+1; θg, βg Σg) / [FN(θu; θg, βg Σg) − FN(θl; θg, βg Σg)],   (3.3)

where fN and FN denote the pdf and cdf of the multivariate normal distribution respectively, θu is the upper limit of the parameters, and θl is the lower limit. The ratio of proposal densities in equation (3.2) simplifies to the ratio of normalizing constants, as fN is symmetric:

q(θg | θg+1) / q(θg+1 | θg) = [FN(θu; θg, βg Σg) − FN(θl; θg, βg Σg)] / [FN(θu; θg+1, βg Σg) − FN(θl; θg+1, βg Σg)].   (3.4)

Notice that if all parameters have support on the whole real line, the proposal density is symmetric, and the ratio of proposal densities becomes 1.
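A one-dimensional sketch of the ratio in (3.4), using scipy.stats.norm; the thesis works with a multivariate truncated normal, so this scalar version only illustrates how the normalizing constants enter the acceptance ratio.

```python
import numpy as np
from scipy.stats import norm

def tn_log_proposal_ratio(theta_old, theta_new, scale, lo, hi):
    """log q(theta_old | theta_new) - log q(theta_new | theta_old) for a
    truncated-normal random walk on (lo, hi). The normal pdf terms cancel
    by symmetry; only the normalizing constants of (3.4) survive."""
    z_new = norm.cdf(hi, theta_new, scale) - norm.cdf(lo, theta_new, scale)
    z_old = norm.cdf(hi, theta_old, scale) - norm.cdf(lo, theta_old, scale)
    # q(old|new) carries constant 1/z_new, q(new|old) carries 1/z_old
    return np.log(z_old) - np.log(z_new)
```

Far from the truncation bounds the correction vanishes; near a bound it penalizes moves whose reverse step would be heavily truncated.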
Gelman, Roberts, and Gilks (1996) show that the efficiency of the random-walk MH algorithm is maximized when Σg is the covariance matrix of the target posterior distribution and the scaling factor βg is approximately 2.38²/d, where d is the number of parameters.
In practice, we do not know Σg a priori. Adaptive MCMC allows us to learn Σg "on the fly", using previous updates in the chain to construct this covariance. The resulting chain {θg}_{g=1}^G is not Markovian, as the proposal density depends on the history of θ, and the ergodicity of the chain can be perturbed. Haario, Saksman, and Tamminen (2001) propose an adaptive Metropolis (AM) algorithm, using the whole history of the chain, or any increasing part of the past, which leads to vanishing adaptation and preserves the correct ergodic property. We adopt the AM algorithm with global adaptive scaling as in Andrieu and Thoms (2008). When the chain is started, Σg might be a poor
initial guess, resulting in too many or too few rejections. Andrieu and Thoms (2008) suggest adapting the scaling factor βg using the acceptance probability in (3.2). If the acceptance probability is higher than the optimal acceptance probability, βg increases, and vice versa. The optimal acceptance rate is chosen to be around 24%, as suggested in Gelman et al. (1996).
Bayesian inference requires specifying prior distributions for the parameters. For most of the model parameters, we choose diffuse but proper priors. For the jump process, we use a prior that elicits our belief that jumps are large compared to the base-signal. Specifically, we use a gamma distribution for the standard deviation of the jump sizes, which places lower probability on small jumps.
3.4.3 PMMH Algorithm
We outline the algorithm for the TF-SVJ model in this subsection:

1. For g = 1, ..., G, where G is the number of MCMC iterations, sample θg+1 ∼ TN(θg, βg Σg), then run the following SMC algorithm to obtain p(Z1:T | θg+1) and K^{g+1}_{1:T}:

   a) sample σ²(0, i) and Y^(i)_1 from their stationary distributions, for i = 1, ..., N, where N is the number of particles.

      i. compute ω̃^(i)_1 = pθ(Z1 | Y^(i)_1, σ²(0, i)) / N.

      ii. obtain the likelihood from p(Z1) = Σ_{i=1}^N ω̃^(i)_1, and compute the normalized weights: ω̄^(i)_1 = ω̃^(i)_1 / p(Z1).

   b) at t = 1, ..., T − 1,

      i. sample the index a^(i)_t for i = 1, ..., N, using ω̄_t, and set ω̄_t = 1/N.

      ii. sample σ²(t, i) ∼ pθ(σ²(t) | σ²(t − 1, a^(i)_t)) and Y^(i)_{t+1} ∼ pθ(Yt+1 | Zt+1, σ²(t, i), Y^{a^(i)_t}_t).

      iii. compute the incremental weights: ω̃^(i)_{t+1} = pθ(Zt+1 | σ²(t, i), Y^{a^(i)_t}_t, Z1:t).

      iv. obtain the likelihood: pθ(Zt+1 | Z1:t) = Σ_{i=1}^N ω̃^(i)_{t+1} ω̄^(i)_t.

      v. normalize the weights: ω̄^(i)_{t+1} = ω̄^(i)_t ω̃^(i)_{t+1} / Σ_{j=1}^N ω̄^(j)_t ω̃^(j)_{t+1}.

   c) at t = T,

      i. obtain p(Z1:T | θg+1) = pθ(Z1) ∏_{t=1}^{T−1} pθ(Zt+1 | Z1:t).

      ii. use ω̄_T and a_{1:T} to draw a realization of the states K^{g+1}_{1:T}.

2. accept θg+1 and K^{g+1}_{1:T} with probability

   αg+1 = min{ [p(Z1:T | θg+1) p(θg+1) q(θg | θg+1)] / [p(Z1:T | θg) p(θg) q(θg+1 | θg)], 1 },

   where p(Z1:T | θg+1) is computed from the SMC algorithm above, p(θ) is the prior density of the model parameters, which we specify in Section 3.5, and q is the truncated normal proposal density. The ratio of proposal densities is given in equation (3.4). If rejected, we set θg+1 and K^{g+1}_{1:T} equal to θg and K^g_{1:T}.

3. update the scaling factor and the covariance matrix of the proposal density:

   νg+1 = 1/(g + 1)^{0.5},
   log βg+1 = log βg + νg+1 (αg+1 − α*),
   µg+1 = µg + νg+1 (θg+1 − µg),
   Σg+1 = Σg + νg+1 ((θg+1 − µg)(θg+1 − µg)^T − Σg).
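Step 3 can be sketched directly; this is a generic implementation of the global adaptive-scaling update, with variable names of our own choosing.

```python
import numpy as np

def am_update(g, beta, mu, Sigma, theta_new, accept_prob, target=0.24):
    """One adaptive-Metropolis update with global adaptive scaling:
    the Robbins-Monro step size nu = 1/(g+1)^0.5 moves log(beta) toward
    the target acceptance rate and updates the running mean/covariance."""
    nu = 1.0 / (g + 1) ** 0.5
    beta = np.exp(np.log(beta) + nu * (accept_prob - target))
    diff = theta_new - mu                          # theta_{g+1} - mu_g
    mu = mu + nu * diff
    Sigma = Sigma + nu * (np.outer(diff, diff) - Sigma)
    return beta, mu, Sigma
```

Note that the covariance recursion uses the old mean, matching the update equations above, and that β only grows when the realized acceptance probability exceeds the target.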
3.4.4 Model Comparison
The estimation procedure is easily adapted to all the models we considered in Section 3.2. The question remains: which of the models fits the data better? Specifically, is stochastic volatility important? Which volatility process is more suitable for the UK gas price data? What is the role of jumps? Is it necessary to have different mean reversion rates for the spike process and the base-signal? To address these important questions, we estimate a large set of models and conduct an extensive model comparison. For nested models, we carry out model specification tests using likelihood ratio statistics. We also compute Bayes factors, as the models with tempered stable volatility and logarithmic volatility are not nested.

Given two competing models, say the TF-SVJTS model and the TF-SVJLog model, the Bayes factor is the ratio of the posterior probabilities of the two models given the data, i.e., BF = p(TF-SVJTS | Z)/p(TF-SVJLog | Z). If we assume that the competing models are equally probable a priori, this posterior odds ratio reduces to the ratio of marginal likelihoods: BF = p(Z | TF-SVJTS)/p(Z | TF-SVJLog). The density p(Z | M) is termed the marginal likelihood, as it is the likelihood of the data under model M obtained by marginalizing over the parameters of model M:

p(Z | M) = ∫ p(Z | θ, M) p(θ | M) dθ,   (3.5)

where p(θ | M) is the prior density of the parameters in model M.
We use the output from the PMMH algorithm to compute the Bayes factors. The algorithm produces {p(Z | θg, M)}_{g=1}^G, where the θg are draws from the posterior density p(θ | Z, M). In equation (3.5), the integration is over the prior density of θ. Newton and Raftery (1994) propose several estimators of the marginal likelihood based on importance sampling and Monte Carlo integration. We adopt the version which uses a mixture of the prior and the posterior as the importance density, yet does not require further simulation from the prior. Given G samples of θ from the posterior, imagine that δp G/(1 − δp) additional samples of θ are drawn from the prior, resulting in a total of G/(1 − δp) samples from the mixture density δp p(θ | M) + (1 − δp) p(θ | Z, M). Assuming that the draws from the prior all have likelihood p(Z | θ, M) equal to its expected value p(Z | M), we obtain the following estimator,

p̂(Z | M) = [δp G/(1 − δp) + Σ_{g=1}^G p(Z | θg, M)/(δp p̂(Z | M) + (1 − δp) p(Z | θg, M))] / [δp G/((1 − δp) p̂(Z | M)) + Σ_{g=1}^G (δp p̂(Z | M) + (1 − δp) p(Z | θg, M))^{−1}],   (3.6)

which, since p̂(Z | M) appears on both sides, is solved by fixed-point iteration.
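A sketch of the fixed-point iteration for (3.6), carried out in log space for numerical stability; the implementation (and the log-space reformulation) is ours, not code from the thesis.

```python
import numpy as np
from scipy.special import logsumexp

def newton_raftery_logml(loglik, delta=0.01, iters=200):
    """Fixed-point iteration for the Newton and Raftery (1994) estimator
    in equation (3.6). `loglik` holds the draws log p(Z | theta_g, M)
    produced by the PMMH sampler; returns log p-hat(Z | M)."""
    loglik = np.asarray(loglik, dtype=float)
    G = loglik.size
    lml = logsumexp(loglik) - np.log(G)          # starting value
    const = np.log(delta * G / (1.0 - delta))    # log(delta*G/(1-delta))
    for _ in range(iters):
        # log(delta*phat + (1-delta)*L_g) for every posterior draw g
        log_denom_g = np.logaddexp(np.log(delta) + lml,
                                   np.log(1.0 - delta) + loglik)
        log_num = np.logaddexp(const, logsumexp(loglik - log_denom_g))
        log_den = np.logaddexp(const - lml, logsumexp(-log_denom_g))
        lml = log_num - log_den
    return lml
```

At the fixed point the estimator is a weighted average of the likelihood draws, so the result always lies between the smallest and largest log-likelihood in the sample.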
3.5 Estimation Results
We apply the PMMH algorithm from Subsection 3.4.3 to the deseasonalized and detrended logarithmic gas spot prices. We start with a preliminary run which uses adaptive MCMC, then "freeze" the covariance matrix Σ and the scaling factor β and run a further 20,000 iterations to get the posterior distributions of θ and K. The number of particles for the logarithmic volatility specification is set to 4800, while the number of particles for the model with tempered stable volatility is set to 1600 for computational reasons. To conduct likelihood ratio tests, we need a point estimate of θ, and we use both the mean and the median of the posterior distribution of θ. To minimize the randomness from SMC, we use 300,000 particles to evaluate the likelihood at the point estimates. The marginal likelihood p(Z | M) is computed from equation (3.6).

The estimation for the benchmark models from Section 3.2 follows similar procedures. For models with jumps in the logarithmic price, we utilize Rao-Blackwellization and integrate out the jump times and jump sizes analytically when computing the likelihood.² For models with two factors, we use the non-Markovian representation. For the SF-J model, the likelihood can be obtained analytically, and we use the MH step for updating the parameters.
The same priors are specified for models with overlapping parameters, and we assume that the parameters are independent a priori, so the joint prior is simply the product of the prior distributions. In summary, the priors for the mean reversion parameters are αx ∼ G(1,1) and αy ∼ G(1,1), where G denotes the gamma distribution. In the TF-J model, we also impose αy > αx to ensure that the mean reversion rate for the spike process is larger than that of the base-signal. For stochastic volatility with tempered stable marginals, we specify λ ∼ G(1,2), κ ∼ Beta(10,10),
δ ∼ G(1,√50), and γ ∼ G(1,√50). For the logarithmic volatility, αh ∼ G(1,1), µh ∼ N(−5,5) and σ²h ∼ G(1,2). For models with constant volatility, we choose σ² ∼ G(1,2). Finally, the priors for the jump parameters are µJ ∼ N(0,2), σJ ∼ G(1.5,0.5), and λJ ∼ G(1,10).

² For the TF-J model we use auxiliary particle filters. This is the case of "perfect adaptation", and Rao-Blackwellized particle filters and auxiliary particle filters differ only in the order of the sampling and resampling steps.
3.5.1 Parameter Estimates
The parameter estimates obtained from fitting the models to the detrended and
deseasonalized logarithmic gas spot prices are reported in Table 3.3. We also report
the log-likelihood evaluated at the mean and median of the posterior distribution of
the parameters, and the marginal log-likelihood computed using formula (3.6).
The parameter estimates for the simplest benchmark model, SF-J, indicate that the model does not adequately capture the dynamics of the data. As in Green and Nossman (2008), failing to include stochastic volatility severely drives up the estimated jump intensity. In our case, we find λJ = 0.2445, which does not fit well with our data and the general observation that jumps are rare events. The high jump intensity means that most of the variability in the data is explained by jumps, and as a consequence the estimate of the constant volatility, σ, is very low. From the estimated spike innovations, plotted in the top panel of Figure 3.4, we see that there is clustering in the jump times and that the assumption of a constant jump intensity does not hold. The estimate of the mean-reversion rate, α, is in line with the estimate α = 0.0064 obtained from fitting the theoretical autocorrelation function, exp(−α|t|), to the first 100 lags of the empirical autocorrelation function of Z(t). The mean-reversion rate corresponds to a half-life of approximately 90 days, revealing that most emphasis is put on capturing the mean reversion of the base-signal and that the logarithmic spot price is very persistent.
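The half-lives quoted here and below follow from solving exp(−αt) = 1/2, i.e. t½ = log 2 / α; a quick arithmetic check:

```python
import math

def half_life(alpha):
    """Time for an OU deviation to decay by half: exp(-alpha*t) = 1/2."""
    return math.log(2.0) / alpha

# alpha around 0.0077 per day gives roughly a 90-day half-life,
# while alpha around 2.1 gives roughly a third of a day
```

This reproduces both the ~90-day figure for the base-signal rate and, later, the third-of-a-day figure for the spike rate in the TF-SVJ models.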
In the next two benchmark models, SF-SVTS and SF-SVLog, we allow for stochastic volatility but neglect to account for the presence of jumps. In Green and Nossman (2008), the authors find that in the constant volatility case, neglecting to account for jumps will drive up the estimate of the mean-reversion rate. This is not the case when there is stochastic volatility in the model, at least not for the data at hand. The estimate of the mean-reversion rate has not changed much, and it appears that most of the variation in the data can be explained by stochastic volatility. This can also be seen from the filtered variance processes in the top panel of Figure 3.6, where the volatile period in the beginning of the data sample, which was explained by jumps in the SF-J model (see the top panel of Figure 3.4), is now captured by the stochastic volatility process. The log-likelihood evaluated at the mean and median of the posterior distribution of the parameters has increased considerably, indicating that including stochastic volatility in the model is more important than allowing for jumps. From the plots of the filtered stochastic variance processes in Figure 3.6, there does not seem to be much difference across the two specifications. When κ equals 0.5 in the TS-OU specification we get an IG-OU process, which was the specification used in the SF-SV model that was fitted to UK natural gas spot prices in Benth (2011). Our
estimate of κ is 0.4504, and testing κ= 0.5 yields a p-value of 0.196.
Table 3.3. Parameter estimates. Standard errors in parentheses; a blank entry indicates that the parameter does not enter the model.

        SF-J        SF-SVTS     SF-SVLog    SF-SVJTS    SF-SVJLog   TF-J        TF-SVJTS    TF-SVJLog
αx      0.00773     0.00679     0.00696     0.00553     0.00784     0.00421     0.00867     0.00766
        (0.00293)   (0.00304)   (0.00207)   (0.00268)   (0.00291)   (0.00203)   (0.00226)   (0.00161)
αy      0.00773     0.00679     0.00696     0.00553     0.00784     0.03956     2.1008      2.1297
        (0.00293)   (0.00304)   (0.00207)   (0.00268)   (0.00291)   (0.01709)   (0.378)     (0.478)
σ²      0.00047                                                     0.00046
        (0.00004)                                                   (0.00004)
λ                   0.2108                  0.1841                              0.2011
                    (0.02637)               (0.02701)                           (0.02623)
κ                   0.4504                  0.4429                              0.4108
                    (0.03834)               (0.03024)                           (0.04401)
δ                   0.04262                 0.04430                             0.06809
                    (0.01879)               (0.01541)                           (0.03196)
γ                   8.3085                  8.4637                              8.4585
                    (1.420)                 (1.389)                             (1.522)
αh                              0.06049                 0.06369                             0.06336
                                (0.01346)               (0.01434)                           (0.01353)
µh                              -7.0472                 -7.1098                             -7.1014
                                (0.138)                 (0.216)                             (0.148)
σ²h                             0.2737                  0.2742                              0.2651
                                (0.05783)               (0.05434)                           (0.05281)
µJ      -0.00189                            -0.07008    0.07445     -0.00583    -0.09599    -0.3270
        (0.00520)                           (0.125)     (1.686)     (0.00526)   (0.328)     (0.358)
σJ      0.09342                             0.1605      0.4919      0.09140     0.7232      0.4330
        (0.00480)                           (0.104)     (0.292)     (0.00452)   (0.273)     (0.230)
λJ      0.2445                              0.01092     0.00031     0.2521      0.00265     0.00221
        (0.02340)                           (0.00975)   (0.00020)   (0.02292)   (0.00172)   (0.00128)

LogLikMean    2941.69  3174.17  3194.21  3176.57  3194.26  2944.32  3178.72  3199.63
LogLikMedian  2941.71  3174.82  3194.69  3177.59  3194.49  2944.36  3180.44  3199.95
MarginalLL    2937.97  3172.10  3193.00  3173.35  3191.60  2939.96  3176.43  3196.45
In the benchmark models labeled SF-SVJ, we still consider a one-factor model, but this time we include both stochastic volatility and jumps in the model specification. From the results in Table 3.3, we see that the different volatility specifications now result in different estimates of the mean-reversion rate. The half-life of the observed process is now 125 days in the SF-SVJTS model and 88 days in the SF-SVJLog model. The jump intensity and the jump size distribution also change with the volatility specification. Perhaps a bit surprisingly, the estimated jump intensity is higher in the SF-SVJTS model, where the stochastic volatility process is allowed to jump. If we compare the middle and bottom panels in Figure 3.4, it is clear that the two models especially differ in the modeling of the large innovation at the end of the data set. The volatile period at the beginning of the data set is captured (mainly) by stochastic volatility in both models, and as a consequence the jump intensity has dropped drastically compared to the estimate from the SF-J model. From the estimated spike innovations in the SF-SVJTS model, we see that there is still some clustering in the jump times, and allowing for a time-varying jump intensity would be a natural extension of the model. This is left for future research, but will be discussed further in Section 3.6. The estimated jump intensity in the SF-SVJLog model is extremely low, and from the plots in Figure 3.4 we see that only one large innovation is categorized as a jump. The low jump intensities in both models also explain the high standard errors on the parameters, µJ and σJ, governing the jump size distribution. The inclusion of jumps does, however, not seem to impact the parameters governing the volatility process much. Testing κ = 0.5 in the TS-OU specification now gives a p-value of 0.0590, making it less plausible that the IG-OU specification could have been used instead.
In our last benchmark model, TF-J, we extend the SF-J model to a two-factor model, where the mean-reversion rates differ between the base-signal and the spike process. We impose the restriction αy > αx to ensure that the estimated mean-reversion rate for the spike process is higher than that of the base-signal process. We see that the estimate of α obtained in the SF-J model is a weighted average of αx and αy, with most weight put on the mean-reversion rate of the base-signal. The half-life of the base-signal is 165 days and the half-life of the spike process is 17.5 days. The latter is too long compared to what is normally understood by a spike. The estimates for the jump process are unaffected by the inclusion of separate mean-reversion rates for the two factors, and hence the estimated jump intensity is still unrealistically high.
In our proposed model, TF-SVJ, which is the two-factor extension of SF-SVJ, the
estimates of the mean-reversion rates are higher than the estimate found for the
SF-SVJ model. The half-life of the base-signal equals 80 days for the TF-SVJTS model
and 90 days for the TF-SVJLog model. The estimated half-life for the spike process now
equals a third of a day with both volatility specifications. We also fitted the theoretical
autocorrelation function, w1 exp(−αx |t |)+(1−w1)exp(−αy |t |), to the first 100 lags of
the empirical autocorrelation function, and obtained the estimates αx = 0.0050 and
αy = 0.4141, corresponding to half-lives of 138 days and 1.7 days. The high estimate
of αy in our TF-SVJ models could simply be a consequence of the low number of jumps
in the spike process. The jump intensity has gone down for the TS-OU specification
and up for the log-OU specification, such that it is now roughly equal for the two
108 CHAPTER 3. A GENERALIZED SCHWARTZ MODEL FOR ENERGY SPOT PRICES
volatility specifications. Even though the estimates of the mean and variance of the
jump size distribution are not the same for the two models, Figure 3.5 reveals that
the filtered jump processes are almost identical. From the filtered variance processes
in Figure 3.6, we see that the volatility processes look very similar to the ones from
the SF-SVJ and SF-SV models, except for the high peak at the end of 2009, which is
now less pronounced. The parameters governing the TS-OU volatility process have
changed slightly compared to the benchmark models, but the log-OU specification
remains unaffected by the inclusion of separate mean-reversion rates. Testing the
restriction κ = 0.5 gives a p-value of 0.0386, rejecting that an IG-OU specification could
have captured the volatility dynamics equally well.
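The two-factor ACF fit described above can be sketched as follows. The empirical ACF here is a synthetic stand-in generated from the fitted parameters plus noise (the thesis fits the empirical ACF of the deseasonalized UK gas data), so the data and tolerances are illustrative only:

```python
import numpy as np
from scipy.optimize import curve_fit

def two_factor_acf(lag, w1, alpha_x, alpha_y):
    """Theoretical ACF w1*exp(-alpha_x*|t|) + (1 - w1)*exp(-alpha_y*|t|)."""
    return w1 * np.exp(-alpha_x * np.abs(lag)) + (1.0 - w1) * np.exp(-alpha_y * np.abs(lag))

lags = np.arange(1, 101)
# Synthetic stand-in for the empirical ACF of the deseasonalized log-price:
rng = np.random.default_rng(0)
emp_acf = two_factor_acf(lags, 0.7, 0.0050, 0.4141) + rng.normal(0.0, 0.005, lags.size)

# Least-squares fit over the first 100 lags:
(w1, ax, ay), _ = curve_fit(two_factor_acf, lags, emp_acf,
                            p0=(0.5, 0.01, 0.5), bounds=([0, 0, 0], [1, 1, 5]))
print(np.log(2) / ax, np.log(2) / ay)   # implied half-lives in days
```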
In order to check whether our proposed model is internally consistent, we compare the
theoretical mean of the variance process, σ2(t), to the mean of the filtered variance
processes from Figure 3.6. The theoretical mean of the TS-OU volatility process
equals 0.00262, whereas the empirical mean is found to be 0.00224. With the log-OU
specification, the theoretical mean equals 0.00230 and the empirical mean is found
to be 0.00221. Hence, both models appear internally consistent. From the reported
log-likelihood values, we see that the inclusion of separate mean-reversion rates in
the model matters more in the TF-SVJLog model.
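The internal consistency check amounts to comparing the stationary mean implied by the volatility parameters with the time average of the filtered variance path. A minimal sketch with the reported means plugged in directly; `consistency_gap` is a hypothetical helper, not part of the estimation code:

```python
import numpy as np

def consistency_gap(theoretical_mean, filtered_path):
    """Relative gap between the model-implied stationary mean of the variance
    process and the time average of the filtered variance path."""
    empirical_mean = float(np.mean(filtered_path))
    return (empirical_mean - theoretical_mean) / theoretical_mean

# Reported means (the filtered paths themselves are summarized by their mean):
print(round(consistency_gap(0.00262, [0.00224]), 3))   # TS-OU:  -0.145
print(round(consistency_gap(0.00230, [0.00221]), 3))   # log-OU: -0.039
```

A small relative gap, as found for both specifications, supports internal consistency.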
The significance of the different model characteristics and the internal ranking of
the models will be investigated thoroughly in the next subsection.
3.5. ESTIMATION RESULTS 109
[Figure omitted: estimated spike innovations, 2008–2014, panels SF-J, SF-SVJTS and SF-SVJLog.]
Figure 3.4. The figure depicts the estimated spike innovations, computed from the mean of the posterior distribution p(ξ1:T , J1:T |Z1:T ).
[Figure omitted: estimated spike process against the log-price, 2008–2014, panels TF-J, TF-SVJTS and TF-SVJLog.]
Figure 3.5. The figure depicts the estimated spike processes, computed from the mean of the posterior distribution p(Y1:T |Z1:T ).
[Figure omitted: estimated volatility σ2(t), 2008–2014, panels SF-SV, SF-SVJ and TF-SVJ, each showing the log-OU and TS-OU specifications.]
Figure 3.6. The figure depicts the estimated stochastic volatility processes, computed from the mean of the posterior distribution p(σ2(1 : T −1)|Z1:T ).
3.5.2 Model Evaluation
We conduct a comprehensive comparison between the models using Bayes factors
computed from the marginal log-likelihoods in Table 3.3. In Table 3.4 we report twice
the logarithm of the Bayes factor, as it is on the same scale as a likelihood ratio test
statistic. Let LBF(M1, M0) = 2(log p(Z |M1) − log p(Z |M0)). Kass and Raftery (1995)
suggest the following scale for interpretation: if LBF(M1, M0) is between 2 and 6, it is
viewed as positive evidence against model M0; between 6 and 10, it indicates strong
evidence; and a value greater than 10 is interpreted as very strong evidence. Negative
values are interpreted on the same scale, but suggest evidence in favor of M0.
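The comparison and the Kass and Raftery (1995) scale are easy to mechanize. A small sketch, assuming the marginal log-likelihoods are given (e.g. from the particle filter); the function names are illustrative:

```python
def lbf(log_ml_1, log_ml_0):
    """Twice the log Bayes factor of M1 against M0; positive favors M1."""
    return 2.0 * (log_ml_1 - log_ml_0)

def kass_raftery(value):
    """Kass and Raftery (1995) evidence scale for 2*log(Bayes factor)."""
    v = abs(value)
    if v < 2:
        return "not worth more than a bare mention"
    if v < 6:
        return "positive"
    if v < 10:
        return "strong"
    return "very strong"

# e.g. the Table 3.4 entry LBF(TF-J, SF-J) = 3.989:
print(kass_raftery(3.989))   # positive evidence against SF-J
```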
When volatility is assumed to be constant, allowing for different mean-reversion
rates in the two factors improves the overall performance of the model, as seen
from LBF(TF-J, SF-J) = 3.989. However, allowing for stochastic volatility has a much
larger impact on the model fit, as LBF(SF-SV, SF-J) is over 400 for both volatility
specifications.
Next, we look at the effect of having price jumps in the model when stochastic
volatility is accounted for. With the TS-OU volatility specification, including
jumps in the price yields LBF(SF-SVJTS, SF-SVTS) = 2.513, indicating positive evi-
dence against SF-SVTS. The two-factor version of the model further improves the fit, as
LBF(TF-SVJTS, SF-SVJTS) equals 6.159. For the log-OU volatility specification, the
Bayes factor favors SF-SVLog over SF-SVJLog. Although the log-likelihood of SF-SVJLog
evaluated at the posterior mean of the parameters is higher than that of SF-SVLog, the
Bayes factor penalizes models with more parameters and in this case selects the sim-
pler model SF-SVLog. However, when we consider the full model, TF-SVJLog, there is
strong evidence against both SF-SVJLog and SF-SVLog, with LBF(TF-SVJLog, SF-SVLog) = 6.91
and LBF(TF-SVJLog, SF-SVJLog) = 9.69. In summary, simply including jumps in
the model description is only slightly favorable (TS-OU volatility) or unfavorable (log-
OU volatility), whereas specifying a spike process, by imposing αx ≠ αy, does improve
the model fit. This finding is in line with our intuition that jumps decay faster
and that this feature is important for model building.
The Bayes factors also provide a direct comparison between the two types of
volatility specifications. Here LBF(MLog, MTS) ranges from 36.5 to 41.8, with M
denoting SF-SV, SF-SVJ or TF-SVJ. For the UK natural gas data, models with the
log-OU volatility specification are favored over models with TS-OU volatility across
the different jump specifications, with the TF-SVJLog model providing the best fit to
the data.
We further investigate the performance of the two volatility specifications in
different periods of time. Johannes et al. (2009) suggest tracking the likelihood ratio
sequentially to help identify when a model fails. We compute the sequential deviance,
defined as Dt = −2(log p(Z1:t |θ, MTS) − log p(Z1:t |θ, MLog)). We choose θ to be the
posterior mean, since the log-likelihoods evaluated at the mean and at the median are
quite close. As before, we use 300,000 particles when computing the likelihood. The
Table 3.4. Bayes factors.
SF-J SF-SVTS SF-SVLog SF-SVJTS SF-SVJLog TF-J TF-SVJTS
SF-SVTS 468.26
SF-SVLog 510.06 41.80
SF-SVJTS 470.77 2.513 -39.29
SF-SVJLog 507.28 39.02 -2.788 36.50
TF-J 3.989 -464.27 -506.07 -466.78 -503.29
TF-SVJTS 476.93 8.672 -33.13 6.159 -30.35 472.94
TF-SVJLog 516.97 48.71 6.910 46.20 9.697 512.98 40.04
The table reports twice the natural logarithm of the Bayes factors. The entry (i, j) in the matrix compares the model in the i-th row and the model in the j-th column, with a positive value favoring the first model and a negative value favoring the latter.
results are plotted in Figure 3.7. By comparing the three panels in Figure 3.7, we see
that while Dt behaves similarly across the different jump specifications, its dynamics
reveal important differences between the TS-OU and log-OU volatility processes.
Compared with Figure 3.6, we see that Dt is positive during 2007 and the beginning
of 2008, indicating that the log-OU specification provides a better fit than the TS-OU
specification in this low-volatility period. In mid-2008, Dt drops drastically at the
pronounced volatility spikes, and stays negative in the following, relatively volatile
period. After 2010, as the market enters a more tranquil period, Dt starts to increase,
resulting in a positive value for the full sample.
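Given cumulative log-likelihood paths for the two volatility specifications, the sequential deviance is a one-liner; a sketch with toy numbers (hypothetical, not thesis output):

```python
import numpy as np

def sequential_deviance(cum_loglik_ts, cum_loglik_log):
    """D_t = -2*(log p(Z_{1:t}|theta, M_TS) - log p(Z_{1:t}|theta, M_Log));
    positive values favor the log-OU specification up to time t."""
    return -2.0 * (np.asarray(cum_loglik_ts) - np.asarray(cum_loglik_log))

# Toy cumulative log-likelihood paths (hypothetical numbers):
d = sequential_deviance([-10.0, -21.0, -30.5], [-10.5, -20.0, -29.0])
print(d)   # [-1.  2.  3.]
```

The sign of the increments shows which model fits the incoming observations better at each point in time.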
Our results indicate that the TS-OU specification is well suited for volatile periods
where the volatility of volatility is high, while the log-OU specification fits the tranquil
periods better. The differences could arise from two aspects: First, while the log-OU
volatility process is a continuous process, the TS-OU process is purely jump-driven,
allowing it to better explain large price changes. Second, from Figure 3.8, we see that
the autocorrelation function for the log-OU specification decays more slowly, implying
a more persistent volatility and providing a better fit to the empirical autocorrelation
function. This helps the log-OU specification outperform the TS-OU specification
in the last four years of the sample period. The theoretical autocorrelation function
for the TS-OU specification fits the first few lags of the empirical autocorrelation very
well, and its faster decay rate underlines its ability to better capture large
changes in the volatility, as seen in the first period of our data set. Figure 3.8 also
suggests extending the model to a multi-factor specification for the volatility process.
We also conduct LR tests on the nested models. The models SF-J, SF-SVTS, SF-SVJTS
and TF-J are all nested within TF-SVJTS, and we report the LR test statistics and p-values
in Table 3.5. At a 10% significance level, all of the restricted models are rejected.
The same conclusion applies to the log-OU specification at a 5% significance level.
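The LR statistics in Table 3.5 are twice the gap in log-likelihood between the full and the restricted model, with an asymptotic chi-square reference distribution. A sketch; the log-likelihood inputs are hypothetical, chosen so the statistic matches the SF-SVJTS entry, and the degrees of freedom are an assumption, as the table does not state them:

```python
from scipy.stats import chi2

def lr_test(loglik_full, loglik_restricted, df):
    """LR statistic 2*(l_full - l_restricted) and asymptotic chi2 p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2.sf(stat, df)

# Hypothetical log-likelihoods chosen so the statistic equals the SF-SVJTS
# entry of Table 3.5; df=1 assumes a single restriction (alpha_x = alpha_y):
stat, p = lr_test(-1000.0, -1002.155, df=1)
print(round(stat, 2), round(p, 3))   # 4.31 0.038
```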
[Figure omitted: sequential deviance, 2008–2014, panels SF-SV, SF-SVJ and TF-SVJ.]
Figure 3.7. The figure plots the sequential deviance between models with different volatility specifications.
[Figure omitted: theoretical and sample autocorrelation functions (lags 0–50) of the TS-OU and log-OU volatility processes.]
Figure 3.8. The figure displays the theoretical and empirical autocorrelation functions for the two stochastic volatility processes in the TF-SVJ model. The theoretical acf's are computed using the posterior mean of the parameters governing the volatility processes. The empirical acf's are computed in each MCMC iteration and then averaged.
Table 3.5. LR-tests for the full model.

          SF-J      SF-SVTS   SF-SVJTS  TF-J
LRmean    474.072   9.11      4.31      468.802
          (0.000)   (0.058)   (0.038)   (0.000)
LRmedian  477.453   11.239    5.69      472.157
          (0.000)   (0.024)   (0.017)   (0.000)

          SF-J      SF-SVLog  SF-SVJLog TF-J
LRmean    515.896   10.849    10.742    510.625
          (0.000)   (0.028)   (0.001)   (0.000)
LRmedian  516.477   10.520    10.908    511.181
          (0.000)   (0.033)   (0.001)   (0.000)

The table reports the LR-test statistics for comparing the models in the columns to the full TF-SVJ model. The p-values for accepting the null model are reported in parentheses.
3.5.3 Model Validation
We conclude our empirical investigation by checking that our proposed model is
able to reproduce the statistical properties of the data. For purposes of derivative
pricing and risk management, it is very important that the model implied price and
return distributions match the empirical distributions. For each estimated model, we
simulate 5000 artificial data sets of the same length as the observed spot prices. Then,
using these simulated paths, we calculate the empirical model implied distributions
of skewness and kurtosis. In Table 3.6, the 5th, 50th and 95th percentiles of the model
implied distributions for the skewness and kurtosis of the deseasonalized logarithmic
spot price, Z (t), and its returns are reported for each model. The table also reports
the values from the observed deseasonalized UK natural gas spot prices.
Table 3.6 shows that stochastic volatility is crucial for capturing the price skewness
and kurtosis. All the models, except for the SF-J and TF-J models, have distributions
covering the observed negative price skewness and small excess kurtosis of the
deseasonalized UK natural gas spot prices. For both the SF-J and TF-J models, the
probability that the observed sample values of the skewness and kurtosis are real-
izations from the model implied distributions is less than 5%. The best performing
model is the TF-SVJLog model, with the SF-SVJLog model being a close runner-up. In
fact, as we shall see in Figure 3.9, there is almost no difference in the distribution of
skewness and kurtosis implied by these two models.
Table 3.6. Distribution of skewness and kurtosis
prctile SF-J SF-SVTS SF-SVLog SF-SVJTS SF-SVJLog TF-J TF-SVJTS TF-SVJLog
Price Skewness -0.7818
5th -0.628 -0.827 -1.088 -0.813 -1.113 -0.505 -0.793 -1.129
50th -0.011 -0.004 0.005 -0.028 0.006 -0.014 -0.002 -0.026
95th 0.603 0.808 1.117 0.759 1.141 0.491 0.757 1.059
Price Kurtosis 3.8611
5th 2.065 2.106 2.242 2.023 2.271 2.238 2.245 2.296
50th 2.632 2.832 3.174 2.728 3.245 2.778 3.106 3.300
95th 3.651 4.445 5.855 4.201 5.972 3.655 4.908 5.867
Return Skewness -0.3703
5th -0.515 -0.875 -1.425 -1.297 -1.771 -0.622 -2.050 -1.999
50th -0.071 0.024 0.012 -0.377 0.039 -0.203 -0.224 -0.540
95th 0.376 0.850 1.387 0.508 2.411 0.206 1.493 0.798
Return Kurtosis 21.4775
5th 7.947 8.203 8.613 9.082 8.617 7.675 10.777 11.838
50th 9.058 11.523 14.391 12.796 15.502 8.710 71.515 43.733
95th 10.593 20.381 41.516 21.153 68.271 10.145 210.076 140.477
The table reports the skewness and kurtosis of the deseasonalized logarithmic spot price (Z (t )) and its return series. The table also reports the percentiles of the simulated sample skewness and kurtosis for each of the considered models. For each model, the percentiles are computed using 5000 simulated data sets with 1620 observations each. The simulations are performed using the parameter estimates from Table 3.3.
Turning our attention to the return distribution, Table 3.6 shows that the return
skewness can be captured by all the considered models. Once again, we find that
inclusion of stochastic volatility enables the model to produce more skewness. The
transition from a single factor model to a two-factor model also impacts the distri-
bution of return skewness, in contrast to the results found for the price skewness.
The model that captures the return skewness the best, in terms of producing the
highest probability of observing an outcome from the model implied distribution
that is more extreme than the sample skewness of the data, is the SF-SVJTS model. As
for the kurtosis of the return series, it follows from Table 3.6 that stochastic volatility
is essential in order to produce a high enough level of kurtosis. It is also clear that the
volatility specification impacts the distribution of the kurtosis of the returns. For the
single factor models, only the models with a log-OU volatility specification are able to
fully match the level of kurtosis observed in the data. With the TS-OU specification,
the probability of observing an outcome from the distribution of kurtosis that is
more extreme than the sample value from our data is only around 5%. Allowing the
spike process to have its own mean-reversion rate significantly impacts the model's
ability to generate high levels of kurtosis. In the two-factor models, a jump is now
followed by a quick reversion back to the base-signal process, instead of the much
slower reversion found in the single factor models, and this behavior increases the
kurtosis of the return series. The distribution that best captures the observed kurtosis
is the one implied by the SF-SVJLog model. The closest runner-up is the
TF-SVJLog model.
[Figure omitted: model implied distributions of sample skewness (TF-SVJLog: pval = 0.222; SF-SVJLog: pval = 0.222) and sample kurtosis (TF-SVJLog: pval = 0.566; SF-SVJLog: pval = 0.566) for Z(t).]
Figure 3.9. The figure plots the model implied distributions of skewness and kurtosis of the deseasonalized logarithmic spot price Z (t ), along with the sample values.
In Figure 3.9 and Figure 3.10, the model implied distributions of skewness
and kurtosis of the price and return series are plotted for our favoured model, the
TF-SVJLog model, and the closest contender, the SF-SVJLog model. The reported p-
values should be interpreted as the probability of observing an outcome from the
distribution that is more extreme than the sample value. Hence, if the sample value
matches the median of the model implied distribution, we report a p-value of 1.
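One way to formalize these p-values from the simulated distributions is to measure extremeness relative to the median; the exact construction used in the thesis may differ, so treat this as an illustrative sketch:

```python
import numpy as np

def tail_pvalue(simulated, observed):
    """Probability of a draw from the model-implied distribution being at
    least as extreme as the observed value, measured from the median, so a
    sample value at the median gets p = 1."""
    simulated = np.asarray(simulated, dtype=float)
    med = np.median(simulated)
    return float(np.mean(np.abs(simulated - med) >= abs(observed - med)))

print(tail_pvalue([1.0, 2.0, 3.0, 4.0, 5.0], 3.0))   # 1.0 (at the median)
print(tail_pvalue([1.0, 2.0, 3.0, 4.0, 5.0], 5.0))   # 0.4
```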
Following the discussion above, Figures 3.9 and 3.10 highlight the fact that allow-
ing for separate mean-reversion rates matters for the return distribution in terms of
matching skewness and kurtosis. Also, in the TF-SVJLog model, the estimated jump
intensity is about seven times larger than the jump intensity in the SF-SVJLog model,
which contributes to the increased spread of the kurtosis distribution. Both models
are, however, able to capture the statistical properties of the data at hand.
To sum up, we find that inclusion of stochastic volatility in the models is crucial
for matching the price and return distributions. In financial applications, such as risk
management, matching the skewness and kurtosis of the return series is often more
[Figure omitted: model implied distributions of sample skewness (TF-SVJLog: pval = 0.828; SF-SVJLog: pval = 0.545) and sample kurtosis (TF-SVJLog: pval = 0.444; SF-SVJLog: pval = 0.586) for r(t).]
Figure 3.10. The figure plots the model implied distributions of skewness and kurtosis of the returns from the deseasonalized logarithmic spot price Z (t ), along with the sample values.
important than matching the price distribution. In this context, we found that the
volatility specification and the inclusion of a separate mean-reversion rate for the
spike process greatly impact the model's ability to generate high levels of kurtosis.
3.6 Extensions
The economic importance of our findings remains to be tested. One approach could
be to consider how the different model specifications impact the forward prices by
considering the empirical risk premium, RP(t) = Fobserved(t ,T1,T2) − FP(t ,T1,T2),
defined as the difference between the observed market price and the predicted spot
price over the period of delivery [T1,T2]. The latter is computed using the theoretical
forward prices with Q = P, and then averaged over the delivery period. From the
forward prices derived in Benth and Vos (2013b), we suspect that the forward prices
will depend on the filtered factors, X (t ) and Y (t ), as well as the filtered spot volatility
σ2(t). Among other things, this has the implication that even without jumps in the
spot price model the forward price can still jump if the volatility process has jumps.
The study of forward price dynamics might therefore serve as a way of testing the
economic difference between the different volatility specifications. Derivation of
forward prices in our full model, TF-SVJ, is therefore a topic for future research.
It would also be of interest to see how the model performs on data from other
energy markets, in particular the electricity market, where spikes are larger and more
frequent and the overall volatility of the observed spot prices is higher than for
the gas spot prices. For such data it might also be important to consider other jump size
specifications, as most of the spikes are positive. Related to this, extending the model
to have several spike components, i.e. one component for the positive jumps
and one for capturing the negative jumps, would also be of relevance. It
should be noted that there are significant differences between electricity markets,
with e.g. spike occurrences and sizes becoming smaller in the EEX market. Our model
and estimation setup could therefore also be used to investigate these changes, and
the differences across the various electricity markets.
The clustered jumps in the SF-J model suggest that it could be interesting to
incorporate a time-varying jump intensity. For example, we can specify a stochastic
process for the jump intensity, or allow the jump intensity to depend on the spot
volatility or on exogenous variables such as weather. From the plot of the autocorrela-
tion functions for the filtered volatility processes, it also appears that a better model
fit might be obtained by extending the volatility specification to a multi-factor
specification with different decay rates for the entering factors.
Another extension of the model specification could be to incorporate leverage
effects. In Green and Nossman (2008) the authors introduce leverage in their model
by making the driving Brownian motions of the volatility process and base-signal
process correlated. This would of course only be possible with the log-OU volatility
specification. With the TS-OU specification, another approach could be to investigate
the implications of making the volatility and the spike process correlated.
Finally, it would be very interesting to try to adapt our estimation approach to
the multi-dimensional model from Benth and Vos (2013a) and investigate how the
different model characteristics impact the joint modeling of for instance gas and
electricity spot prices.
3.7 Conclusion
We proposed a two-factor geometric model with stochastic volatility and jumps for
the detrended and deseasonalized logarithmic UK natural gas spot price. We then
described how this model could be estimated by using a non-Markovian representa-
tion of the model, with the spike process and stochastic volatility process being latent
variables. In contrast to most estimation approaches found in the literature, the base-
signal and spike component of the model are estimated simultaneously, allowing us
to investigate the interplay between the specification of the two components. The es-
timation method employed uses the particle marginal Metropolis-Hastings sampler
from Andrieu et al. (2010) and has the advantage of being easy to adapt to different
volatility specifications, including pure jump-driven specifications. The estimation
method is very general and made it easy to estimate and compare our proposed
model to other benchmark models in a unified framework. This also made it possible
to answer questions such as: What matters the most, including stochastic volatility or
jumps? Is this conclusion affected by the volatility specification? And is it necessary to
allow for jumps in the volatility process? Our empirical application to logarithmic UK
natural gas spot prices showed that inclusion of stochastic volatility is much more
important than having jumps in the model. As in Green and Nossman (2008), we
found that neglecting to include stochastic volatility results in a severely overes-
timated jump intensity. From the results for the two-factor version of the model with
jumps and constant volatility, we saw that the inclusion of separate mean-reversion
rates did not make the jump intensity drop. The results for the single factor model
with jumps (SF-J), resembling the model from Cartea and Figueroa (2005), also re-
vealed the need for a time-varying jump intensity when stochastic volatility is not
included in the model.
Sticking with a single factor model, but allowing for both stochastic volatility
and jumps showed that the volatility specification can impact the estimated jump
intensity and jump size distribution. The TS-OU specification, where the volatility
process is allowed to jump, actually resulted in a higher jump intensity than with the
log-OU volatility specification. From fitting the full model (TF-SVJ), with separate
mean-reversion rates, jumps and stochastic volatility, we found that having different
mean-reversion rates justifies the inclusion of jumps in the model specification,
even though the spike process only accounts for a small part of the variations in
our data. The model with a log-OU volatility specification outperforms the model
with the TS-OU specification, even when jumps and different mean-reversion rates are
included in the latter model but not in the former. By tracking the likelihood ratio
sequentially, it became clear that the TS-OU volatility specification is well suited
for volatile periods where the volatility process is changing a lot, while the log-OU
specification outperforms the TS-OU specification during the more tranquil periods
where the volatility of volatility is low.
The estimation method based on particle MCMC is very general and it would
be interesting to extend our proposed model by incorporating a time-varying jump
intensity, to see if this changes the split between how much of the variations in the
data is captured by the base-signal and how much by the spike process.
Another path for future research is to investigate whether stochastic volatility also matters
the most when other energy spot prices, such as electricity prices, are considered.
In electricity prices, spikes are larger and more frequent so the inclusion of a time-
varying jump intensity might be more relevant in this setting. Furthermore, adapting
the estimation framework to models with several spike components, like one process
for modeling the negative spikes and one for the positive spikes, as well as considering
other jump size distributions would be interesting topics for future research. The ACFs
of our filtered volatility processes also indicate the need for multi-factor volatility
processes, something our estimation approach can easily be adapted to and an
obvious path for future research. As already noted, the spike dynamics in some
electricity markets are changing, with spike occurrences and sizes dropping. It might
therefore be interesting to use our proposed model and estimation technique to
further investigate the differences across various electricity markets. The PMCMC
method also has potential in applications to multi-dimensional cross-commodity
models, like the one considered in Benth and Vos (2013a), and it would be very
interesting to adapt the approach to this setup.
3.8 Appendix
3.8.1 Model Diagnostics
We present the model diagnostic plots for TF-SVJTS and TF-SVJLog in Figure 3.11 and
3.12. The plots for other models are similar and omitted to save space. The left panels
are trace plots of parameter draws against iterations, while the right panels report
the prior and empirical posterior distributions (histogram) of the parameters. We
first look at the trace plots. For the TF-SVJTS model, we ran four chains in parallel from
different starting points for speed considerations, and the vertical lines indicate
when a new chain is started. For TF-SVJLog, one long chain is used. Visual inspection
indicates convergence, although the mixing of some parameters is less satisfactory,
for example for σ j , the standard deviation of jump sizes. This is likely due to the very
low jump intensity in these two models, and hence the algorithms have a hard time
estimating the jump sizes.
From the posterior-prior plots, we see that for most of the parameters, prior
information is negligible compared with the posterior. To assist visual inspection, we
use different scales for the densities: ticks on the left axis give the density of the posterior,
while ticks on the right axis give the density of the prior. For example, for
αx in the TF-SVJTS model in Figure 3.11, the prior density at αx = 0.0087 is around 1, while the
posterior density is about 180. For αx in the TF-SVJLog model, the prior density at
αx = 0.0077 is around 1, while the posterior density is about 250. One exception is
σj , for which we choose a prior that places lower probability on jumps being small.
The posterior-prior plots in both models indicate that the prior for σj is informative
about the posterior.
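The visual convergence check across the four parallel chains can be complemented by a numerical diagnostic; the thesis relies on visual inspection, so the Gelman-Rubin statistic below is a standard add-on rather than part of the reported procedure:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter across m chains of
    n draws each (shape (m, n)); values near 1 indicate convergence."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    b = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * w + b / n
    return float(np.sqrt(var_hat / w))

rng = np.random.default_rng(1)
well_mixed = rng.normal(size=(4, 15000))      # four well-mixed chains
print(round(gelman_rubin(well_mixed), 3))     # approx 1.0
```

Values well above 1, by contrast, would flag that the chains have not converged to a common distribution.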
[Figure omitted: trace plots and posterior/prior densities for κ, δ, γ, λ, αx , αy , µj , σj and λj .]
Figure 3.11. Diagnostic plots for the TF-SVJ model with TS-OU volatility. Left panels are trace plots of parameters. In the right panels, blue lines are posterior densities, while green lines indicate priors.
[Figure omitted: trace plots and posterior/prior densities for αh , µh , σ2h , αx , αy , µj , σj and λj .]
Figure 3.12. Diagnostic plots for the TF-SVJ model with log-OU volatility. Left panels are trace plots of parameters. In the right panels, blue lines are posterior densities, while green lines indicate priors.
DEPARTMENT OF ECONOMICS AND BUSINESS AARHUS UNIVERSITY
SCHOOL OF BUSINESS AND SOCIAL SCIENCES www.econ.au.dk
PhD Theses since 1 July 2011

2011-4 Anders Bredahl Kock: Forecasting and Oracle Efficient Econometrics
2011-5 Christian Bach: The Game of Risk
2011-6 Stefan Holst Bache: Quantile Regression: Three Econometric Studies
2011:12 Bisheng Du: Essays on Advance Demand Information, Prioritization and Real Options in Inventory Management
2011:13 Christian Gormsen Schmidt: Exploring the Barriers to Globalization
2011:16 Dewi Fitriasari: Analyses of Social and Environmental Reporting as a Practice of Accountability to Stakeholders
2011:22 Sanne Hiller: Essays on International Trade and Migration: Firm Behavior, Networks and Barriers to Trade
2012-1 Johannes Tang Kristensen: From Determinants of Low Birthweight to Factor-Based Macroeconomic Forecasting
2012-2 Karina Hjortshøj Kjeldsen: Routing and Scheduling in Liner Shipping
2012-3 Soheil Abginehchi: Essays on Inventory Control in Presence of Multiple Sourcing
2012-4 Zhenjiang Qin: Essays on Heterogeneous Beliefs, Public Information, and Asset Pricing
2012-5 Lasse Frisgaard Gunnersen: Income Redistribution Policies
2012-6 Miriam Wüst: Essays on early investments in child health
2012-7 Yukai Yang: Modelling Nonlinear Vector Economic Time Series
2012-8 Lene Kjærsgaard: Empirical Essays of Active Labor Market Policy on Employment
2012-9 Henrik Nørholm: Structured Retail Products and Return Predictability
2012-10 Signe Frederiksen: Empirical Essays on Placements in Outside Home Care
2012-11 Mateusz P. Dziubinski: Essays on Financial Econometrics and Derivatives Pricing
2012-12 Jens Riis Andersen: Option Games under Incomplete Information
2012-13 Margit Malmmose: The Role of Management Accounting in New Public Management Reforms: Implications in a Socio-Political Health Care Context
2012-14 Laurent Callot: Large Panels and High-dimensional VAR
2012-15 Christian Rix-Nielsen: Strategic Investment
2013-1 Kenneth Lykke Sørensen: Essays on Wage Determination
2013-2 Tue Rauff Lind Christensen: Network Design Problems with Piecewise Linear Cost Functions
2013-3 Dominyka Sakalauskaite: A Challenge for Experts: Auditors, Forensic Specialists and the Detection of Fraud
2013-4 Rune Bysted: Essays on Innovative Work Behavior
2013-5 Mikkel Nørlem Hermansen: Longer Human Lifespan and the Retirement Decision
2013-6 Jannie H.G. Kristoffersen: Empirical Essays on Economics of Education
2013-7 Mark Strøm Kristoffersen: Essays on Economic Policies over the Business Cycle
2013-8 Philipp Meinen: Essays on Firms in International Trade
2013-9 Cédric Gorinas: Essays on Marginalization and Integration of Immigrants and Young Criminals – A Labour Economics Perspective
2013-10 Ina Charlotte Jäkel: Product Quality, Trade Policy, and Voter Preferences: Essays on International Trade
2013-11 Anna Gerstrøm: World Disruption - How Bankers Reconstruct the Financial Crisis: Essays on Interpretation
2013-12 Paola Andrea Barrientos Quiroga: Essays on Development Economics
2013-13 Peter Bodnar: Essays on Warehouse Operations
2013-14 Rune Vammen Lesner: Essays on Determinants of Inequality
2013-15 Peter Arendorf Bache: Firms and International Trade
2013-16 Anders Laugesen: On Complementarities, Heterogeneous Firms, and International Trade
2013-17 Anders Bruun Jonassen: Regression Discontinuity Analyses of the Disincentive Effects of Increasing Social Assistance
2014-1 David Sloth Pedersen: A Journey into the Dark Arts of Quantitative Finance
2014-2 Martin Schultz-Nielsen: Optimal Corporate Investments and Capital Structure
2014-3 Lukas Bach: Routing and Scheduling Problems - Optimization using Exact and Heuristic Methods
2014-4 Tanja Groth: Regulatory impacts in relation to a renewable fuel CHP technology: A financial and socioeconomic analysis
2014-5 Niels Strange Hansen: Forecasting Based on Unobserved Variables
2014-6 Ritwik Banerjee: Economics of Misbehavior
2014-7 Christina Annette Gravert: Giving and Taking – Essays in Experimental Economics
2014-8 Astrid Hanghøj: Papers in purchasing and supply management: A capability-based perspective
2014-9 Nima Nonejad: Essays in Applied Bayesian Particle and Markov Chain Monte Carlo Techniques in Time Series Econometrics
2014-10 Tine L. Mundbjerg Eriksen: Essays on Bullying: an Economist’s Perspective
2014-11 Sashka Dimova: Essays on Job Search Assistance
2014-12 Rasmus Tangsgaard Varneskov: Econometric Analysis of Volatility in Financial Additive Noise Models
2015-1 Anne Floor Brix: Estimation of Continuous Time Models Driven by Lévy Processes
ISBN: 9788793195097