by paul karapanagiotidis - university of toronto t-space · paul karapanagiotidis doctor of...

ESSAYS IN FINANCIAL AND MACRO ECONOMETRICS

by

Paul Karapanagiotidis

A thesis submitted in conformity with the requirementsfor the degree of Doctor of Philosophy

Graduate Department of EconomicsUniversity of Toronto

c© Copyright 2014 by Paul Karapanagiotidis

Abstract

Essays in Financial and Macro Econometrics

Paul KarapanagiotidisDoctor of Philosophy

Graduate Department of EconomicsUniversity of Toronto

2014

Theory suggests that physical commodity prices may exhibit nonlinear features such as bubbles and various

types of asymmetries. Chapter one investigates these claims empirically by introducing a new time series model

apt to capture such features. The data set is composed of 25 individual, continuous contract, commodity futures

price series, representative of a number of industry sectors including softs, precious metals, energy, and livestock.

It is shown that the linear causal ARMA model with Gaussian innovations is unable to adequately account for

the features of the data. In the purely descriptive time series literature, often a threshold autoregression (TAR) is

employed to model cycles or asymmetries. Rather than take this approach, we suggest a novel process which is

able to accommodate both bubbles and asymmetries in a flexible way. This process is composed of both causal

and noncausal components and is formalized as the mixed causal/noncausal autoregressive model of order (r, s).

Estimating the mixed causal/noncausal model with leptokurtic errors, by an approximated maximum likelihood

method, results in dramatically improved model fit according to the Akaike information criterion. Comparisons of

the estimated unconditional distributions of both the purely causal and mixed models also suggest that the mixed

causal/noncausal model is more representative of the data according to the Kullback-Leibler measure. Moreover,

these estimation results demonstrate that allowing for such leptokurtic errors permits identification of various types

of asymmetries. Finally, a strategy for computing the multiple steps ahead forecast of the conditional distribution

is discussed.

Chapter two considers a vector autoregressive model (VAR) model with stochastic volatility which appeals to

the Inverse Wishart distribution. Dramatic changes in macroeconomic time series volatility pose a challenge to

contemporary VAR forecasting models. Traditionally, the conditional volatility of such models had been assumed

constant over time or allowed for breaks across long time periods. More recent work, however, has improved

forecasts by allowing the conditional volatility to be completely time variant by specifying the VAR innovation

variance as a distinct discrete time process. For example, Clark (2011) specifies the elements of the covariance

matrix process of the VAR innovations as linear functions of independent nonstationary processes. Unfortunately,

there is no empirical reason to believe that the VAR innovation volatility processes of macroeconomic growth se-

ries are nonstationary, nor that the volatility dynamics of each series are structured in this way. This suggests that

a more robust specification on the volatility process—one that both easily captures volatility spill-over across time

ii

series and exhibits stationary behaviour—should improve density forecasts, especially over the long-run forecast-

ing horizon. In this respect, we employ a latent Inverse Wishart autoregressive stochastic volatility specification

on the conditional variance equation of a Bayesian VAR, with U.S. macroeconomic time series data, in evaluating

Bayesian forecast efficiency against a competing specification by Clark (2011).

iii

Dedication

This thesis is dedicated to my wife Amanda, for all your love and support over the years.

Acknowledgements

This thesis owes a debt of gratitude to many people. Specifically, I would like to thank my supervisor, ChristianGourieroux, for all of his guidance throughout each step of the process. I was very lucky to have the opportunityto work with Christian and despite his busy schedule he always made time for me. Christian’s love for teachingalways shone through each time he was given the opportunity, prompted often by some eager student seekingknowledge.

I am also greatly indebted to John M. Maheu, for not only his previous advising, but also for being supportiveduring critical periods in my academic life. I’d also like to thank both Martin Burda and Angelo Melino for theirgenerous advice, support, and comments on my work.

Finally, I’d like to give a special thanks to Allan Hynes, who taught me that, without an appreciation of context,our understanding of economics can be no better than superficial.

iv

Contents

Introduction 1

1 Dynamic Modeling of Commodity Futures Prices 31.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Description of the asset and data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.1 The forward contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.2 The futures contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.2.3 The futures contract without delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.2.4 Organization of the markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.2.5 Example of a futures contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.2.6 Data on the commodity futures contracts . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.2.7 Features of the price level series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.3 The linear causal ARMA model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.3.1 Test specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.3.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.4 The linear mixed causal/noncausal process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

1.4.1 The asymmetries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

1.4.2 The purely causal representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

1.4.3 Other bubble like processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

1.5 Estimation of the mixed causal/noncausal process . . . . . . . . . . . . . . . . . . . . . . . . . 33

1.5.1 The mixed causal/noncausal autoregressive model of order (r, s) . . . . . . . . . . . . . 34

1.5.2 ML estimation of the mixed causal/noncausal autoregressive model . . . . . . . . . . . 36

1.5.3 Estimation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

1.6 Comparison of the estimated unconditional distributions . . . . . . . . . . . . . . . . . . . . . 44

1.7 Forecasting the mixed causal/noncausal model . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

1.7.1 The predictive distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

1.7.2 Equivalence of information sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

1.7.3 Examples: the causal prediction problem of the noncausal process . . . . . . . . . . . . 49

1.7.4 A Look-Ahead estimator of the predictive distribution . . . . . . . . . . . . . . . . . . 51

1.7.5 Drawing from the predictive distribution by SIR method . . . . . . . . . . . . . . . . . 52

1.7.6 Application to commodity futures data . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

1.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

1.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Appendices 62

v

1.10 Appendix: Rolling over the futures contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621.11 Appendix: Mixed causal/noncausal process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

1.11.1 Strong moving average . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641.11.2 Identification of a strong moving average representation . . . . . . . . . . . . . . . . . 651.11.3 Probability distribution functions of the stationary strong form noncausal representation 651.11.4 The causal strong autoregressive representation . . . . . . . . . . . . . . . . . . . . . . 661.11.5 Distributions with fat tails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

1.12 Appendix: Approximation of the mixed causal/noncausal AR(r, s) likelihood . . . . . . . . . . 681.13 Appendix: Numerical algorithm for mixed causal/noncausal AR(r, s) forecasts . . . . . . . . . 701.14 Appendix: Tables and Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

2 Improving VAR forecasts through AR Inverse Wishart SV 842.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

2.1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 852.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 882.3 Model specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

2.3.1 Benchmark model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 902.3.2 Alternative volatility process specification . . . . . . . . . . . . . . . . . . . . . . . . . 922.3.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

2.4 Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 962.4.1 VAR(J) priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 972.4.2 Volatility model priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

2.5 Model estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 992.6 Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

2.6.1 Point and interval forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1012.6.2 Forecast comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

2.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1072.7.1 Monte-Carlo Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1072.7.2 Real data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1362.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

Appendices 1412.10 Appendix: Real data, LDL

′factorization of Σt . . . . . . . . . . . . . . . . . . . . . . . . . . 141

2.11 Appendix: Derivation of the posterior distributions needed for Gibbs sampling . . . . . . . . . . 1412.11.1 Definition of the parameters and priors . . . . . . . . . . . . . . . . . . . . . . . . . . 1432.11.2 Computation of the posterior distribution for the Clark (2011) volatility model . . . . . 1432.11.3 Computation of the posterior distribution for the IWSV model . . . . . . . . . . . . . . 149

2.12 Appendix: Tables and Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

vi

List of Tables

1.1 Commodity sectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.2 Commodity specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.3 Summary statistics - commodity futures price level series . . . . . . . . . . . . . . . . . . . 24

1.4 ARMA estimation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

1.5.i Estimation results of mixed causal/noncausal AR(r, s) models . . . . . . . . . . . . . . . . 42

1.5.ii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

1.5.iii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

1.6 Kullback-Leibler divergence measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

1.7.i Lag polynomial roots of the mixed and benchmark models . . . . . . . . . . . . . . . . . . 73

1.7.ii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

1.7.iii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

2.1 Simulated data, Sample moments of the LPLh,n metrics across N = 100 sample windows . 117

2.2 Real data, Sample moments of the LPLh,n metrics across N = 100 sample windows . . . . 129

2.3.i Section 2.7.1: Posterior distribution of the parameters for IWSV model, Simulated data, 1stsample, Rolling window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

2.3.ii Section 2.7.1: Distribution of the posterior mean of the parameters for IWSV model, Simu-lated data, across N = 100 samples, Rolling window . . . . . . . . . . . . . . . . . . . . . 152

2.4.i Section 2.7.1: Posterior distribution of the parameters for Clark model, Simulated data, 1stsample, Rolling window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

2.4.ii Section 2.7.1: Distribution of the posterior mean of the parameters for Clark model, Simu-lated data, across N = 100 samples, Rolling window . . . . . . . . . . . . . . . . . . . . . 154

2.5.i Section 2.7.2: Posterior distribution of the parameters for IWSV model, Real data, 1st sample,Recursive window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

2.5.ii Section 2.7.2: Distribution of the posterior mean of the parameters for IWSV model, Realdata, across N = 100 samples, Recursive window . . . . . . . . . . . . . . . . . . . . . . . 156

2.5.iii Section 2.7.2: Distribution of the posterior mean of the parameters for IWSV model, Realdata, across N = 100 samples, Rolling window . . . . . . . . . . . . . . . . . . . . . . . . 157

2.6.i Section 2.7.2: Posterior distribution of the parameters for Clark model, Real data, 1st sample,Recursive window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

2.6.ii Section 2.7.2: Distribution of the posterior mean of the parameters for Clark model, Realdata, across N = 100 samples, Recursive window . . . . . . . . . . . . . . . . . . . . . . . 159

vii

2.6.iii Section 2.7.2: Distribution of the posterior mean of the parameters for Clark model, Realdata, across N = 100 samples, Rolling window . . . . . . . . . . . . . . . . . . . . . . . . 160

List of Figures

1.1 Plots of daily continuous contract futures price level series, Coffee with zoom . . . . . . . . 8

1.2 Coffee futures contracts, ICE exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.3 Coffee futures with delivery in December 2013, ICE exchange . . . . . . . . . . . . . . . . 16

1.4 Plots of daily continuous contract futures price level series, Sugar and Lean hogs . . . . . . . 22

1.5 Soybean meal residuals from ARMA model . . . . . . . . . . . . . . . . . . . . . . . . . . 27

1.6 The mixed causal/noncausal model with Cauchy shocks . . . . . . . . . . . . . . . . . . . . 30

1.7 Plots of simulated bubble processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

1.8 Estimated unconditional densities, Cocoa and Coffee . . . . . . . . . . . . . . . . . . . . . 47

1.9 Forecast predictive density for Coffee futures price series . . . . . . . . . . . . . . . . . . . 55

1.10.i Plots of daily continuous contract futures price level series . . . . . . . . . . . . . . . . . . 76

1.10.ii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

1.10.iii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

1.10.iv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

1.11.i Histograms of daily continuous contract futures price level series . . . . . . . . . . . . . . . 80

1.11.ii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

1.11.iii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

1.11.iv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

2.1 Subsample sequence by recursive window . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

2.2 Subsample sequence by rolling window . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

2.3 Simulated sample paths yt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

2.4 Simulated stochastic volatilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

2.5 Simulated stochastic correlations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

2.6 IWSV, Posterior of ν across N = 100 sample windows . . . . . . . . . . . . . . . . . . . . 114

2.7.i IWSV, filtered latent volatility for 1st series, 1st sample window . . . . . . . . . . . . . . . . 115

2.7.ii IWSV, filtered latent volatility for 4th series, 1st sample window . . . . . . . . . . . . . . . 115

2.8 Simulated data, Forecast horizon term structure according to MLPLh,N metric, N = 100

sample windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

2.9 Simulated data, Sample window term structure according to difference of LPLh,n’s metric,h = 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

2.10 Histograms of the LPLh=10,n metrics across n = 1, . . . , N sample windows, 10th horizon . 117

2.11 Clark (2011) macroeconomic dataset, series and smoothed trends . . . . . . . . . . . . . . . 119

2.12 Clark (2011) macroeconomic dataset, detrended series . . . . . . . . . . . . . . . . . . . . . 120

2.13.i Largest eigenvalue of VAR(3) companion matrix, across N = 100 sample windows . . . . . 122

viii

2.13.ii Largest eigenvalue of the Υ matrix, across N = 100 sample windows . . . . . . . . . . . . 1222.14 IWSV and Clark, filtered latent stochastic volatilies, 100th sample window, Recursive window 1232.15.i Real data, IWSV model, filtered latent stochastic correlations for the complete sample, n = 100 1242.15.ii Real data, Clark model, filtered latent stochastic correlations for the complete sample, n = 100 1252.16 Real data, MSE comparison of VAR forecasts, % difference, both window types (below 0,

IWSV better) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1262.17.i Real data, Forecast horizon term structure according to MLPLh,N metric, N = 100 sample

windows, both Recursive and Rolling sample windows . . . . . . . . . . . . . . . . . . . . 1272.17.ii Real data, Forecast horizon term structure according to MLPLh,N metric, N = 100 sample

windows, includes homoskedastic vt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1272.18 Real data, Sample window structure of the difference of LPLh,n’s metric, h = 20 . . . . . . 1282.19 Histograms of the LPLh=20,n metrics across n = 1, . . . , N sample windows, 20th horizon . 1302.20 Real data, Sample window structure of the LPLh,n’s metric, h = 20 . . . . . . . . . . . . . 1312.21.i GDP growth series y1,t and forecast, IWSV and Clark models for vt, n = 100, Recursive

sample window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1322.21.ii Inflation growth series y2,t and forecast, IWSV andClarkmodels for vt, n = 100, Recursive

sample window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1332.21.iii Interest rate series y3,t and forecast, IWSV and Clark models for vt, n = 100, Recursive

sample window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1342.21.iv Unemployment rate series y4,t and forecast, IWSV and Clark models for vt, n = 100,

Recursive sample window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1352.22.i L21,t, time varying density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1422.22.ii L31,t, time varying density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1422.23.i IWSV, Posterior of parameters across N = 100 recursive sample windows . . . . . . . . . . 1612.23.ii IWSV, Posterior of parameters across N = 100 rolling sample windows . . . . . . . . . . . 1622.24.i Clark, Posterior of parameters across N = 100 recursive sample windows . . . . . . . . . . 1632.24.ii Clark, Posterior of parameters across N = 100 rolling sample windows . . . . . . . . . . . . 164

ix

Introduction

This thesis employs modern time series econometric techniques of nonlinear dynamic analysis to the fields of

both financial and macroeconomic data. In particular, we focus on some specific features of the data that include,

bubbles and cyclical features in the time evolution of financial data, as well as the mean reverting and co-dependent

nature of both macroeconomic indicator variables and their time varying volatility. This volatility can be described

as the so-called “Stochastic Volatility.”

We will examine some specific time series for which these features are important. For example, bubbles and

cyclical features in the time evolution of commodity prices are commonly modeled as either unit root or causal

nonlinear regime switching processes. Moreover, the time varying volatility of macroeconomic data, such as GDP

growth or inflation rates, is often considered as nonstationary.

Instead, we will consider some new types of dynamic models which are apt to capture these feature, but

which do not rely on the assumption of nonstationarity or direct time causality. For commodities data, these are

the class of “causal/noncausal” autoregressive models, which allow us to flexibly account for both longitudinal

and transversal asymmetry in the time evolution of both bubbles and cyclical features. This class of model

goes beyond the standard ARMA or ARCH models in number of ways. In fact, the possibility of longitudinal

asymmetry is unidentified in the standard linear causal ARMA model with Gaussian innovations. Moreover, the

causal/noncausal autoregressive model can exhibit a form of autoregressive conditional heteroskedasticity, for

example, the special case of the noncausal Cauchy autoregressive model exhibits these effects. Interestingly, the

mixed causal/noncausal model is linear in its mixed causality representation, but it is nonlinear in its equivalent

strictly causal form.

Moreover, we also propose a novel multivariate volatility specification for the time varying volatility of the

standard Vector Autoregressive (VAR) process common to macroeconomic time series analysis. This process

appeals to the properties of the Inverse Wishart distribution and allows us to capture co-dependence in the volatility

between series in a flexible way.

All these types of models involve nonlinear dynamics and latent factors. For the mixed causal/noncausal

model, the summaries of past and future paths form a nonlinear dynamic. Within the context of our macroe-

1

2

conomic VAR model, the latent factor is the volatility itself, which is unobserved given a stochastic volatility

specification driven by exogenous noise.

In general, we also face complicated causal forecast functions without closed form. There typically do not

exist closed forms for the likelihood function as well. Therefore, we apply appropriate numerical techniques

which are simulation based. For example, we generate forecasts from the causal predictive density of commodity

price data within a mixed causal/noncausal framework by appealing to the Look-Ahead estimator of the Markov

transition density and the Sampling Importance Resampling (SIR) algorithm. Alternatively, the macroeconomic

VAR forecasts under Inverse Wishart stochastic volatility are generated by employing a Bayesian approach which

relies on the Gibbs sampling method.

Chapter 1

Dynamic Modeling of Commodity Futures

Prices

1.1 Introduction

Financial theory has proposed general approaches for pricing financial assets and their derivatives, based on

arbitrage pricing theory [Ross (1976)], or equilibrium models: for example the Capital Asset Pricing Model

[Sharpe (1964)] or Consumption-Based Capital Asset Pricing Model [Breeden (1979)]. Traders have also relied

on technical analysis for insight into price movements [see e.g. Frost (1986)].

These approaches are generally applied separately on the different segments of the market, each segment

including a set of basic assets plus the derivatives written on these basic assets. These segments are used for

different purposes and can have very different characteristics. A standard example is the stock market, where the

basic assets are the stocks and the derivatives are both options written on the market index and futures written on

the index of implied volatility, called the VIX. These derivatives have been introduced to hedge and trade against

volatility risk. A large part of the theoretical and applied literature analyzes this stochastic volatility feature.

Another segment also widely studied is the bond market, including the sovereign bonds, but also the bonds

issued by corporations and the mortgage backed securities; the associated derivatives in this case are insurance

contracts on the default of the borrowers, such as Credit Default Swaps (CDS) or Collateralized Debt Obligations

(CDO). These derivatives have been introduced to manage the counterparty risks existing in the bond market.

This paper will focus on another segment; that is, the segment of commodities. This segment includes the spot

markets, derivatives such as the commodity futures with and without delivery, and derivatives such as options, puts

and calls, written on these futures.

3

CHAPTER 1. DYNAMIC MODELING OF COMMODITY FUTURES PRICES 4

This segment has special features compared to other segments, such as the stock market for instance. At least

three features make the commodity markets rather unique:

i) The basic assets are physical assets. There is a physical demand and a physical supply for these commodities

and by matching their demand and supply, we may define a “fundamental price” for each commodity. It

is known that the analysis of these fundamental prices can be rather complex even if it concerns the real

economy only. This is mainly a consequence of both shifts in demand and supply and of various interventions

to control the fundamental price of commodities. What follows are examples of such effects which differ

according to the commodity.

Cycles are often observed in commodity prices. They can be a consequence of costly, irreversible investment,

made to profit from high prices. For instance, farmers producing corn can substitute into producing cattle,

when grain prices are low. The production of milk (or meat) will increase and jointly the production of grain

will diminish. As a consequence the prices of milk (or meat) will decline, whereas the price of grain will

increase. This creates an incentive to substitute grain to cattle in the future and so forth, which introduces

cycles in the price evolution of both corn and cattle. Other substitutions between commodities can also

create a change of trend in prices. For example, the development of alternative fuel derived from soy created

a significant movement in soy prices.

These complicated movements can also be affected by different interventions to sustain and/or stabilize the

prices. The interventions can be done by governments (e.g. U.S., or European nations) for agricultural

commodities, as well as by (monopolistic or oligopolistic) producers such as the Organization of Petroleum

Exporting Countries (OPEC) for petroleum production or the De Beers company for diamonds. The real

demand and supply will affect the spot prices and futures contracts with delivery.

ii) Recently the commodity markets have also experienced additional demand and supply pressures by finan-

cial intermediaries. These intermediaries are not interested in taking delivery of the underlying products

upon maturity and are only interested in cashing in on favourable price changes in the futures contracts.

This behaviour betrays the original purpose of the futures markets which was to enable both producers and

consumers to hedge against the risk of future price fluctuations of the underlying commodity.

To try to separate the market for the physical commodity from simply gambling on their prices, purely

intangible assets have been introduced that are the commodity futures without delivery. Thus the market for

commodity derivatives has been enlarged. As usual, the speculative effect is proportional to the magnitude

and importance of the derivative market. This speculative effect is rather similar to what might be seen in the

markets for CDS or on the implied volatility index (VIX).


iii) The different spot and futures markets for commodities are not very organized and can involve a small number

of players and very often feature a lack of liquidity.

The economic literature mainly focuses on two features of commodity prices, that are their cross-sectional

and serial heterogeneity, respectively. Below, we will discuss the literature specific to each. The cross-sectional

analysis tries to understand how the prices of futures contracts with delivery are related with the spot prices, or to

explain the difference between the prices of futures with and without delivery. The analysis of the serial hetero-

geneity of prices focuses on the nonlinear dynamic features due to either the cycles and rationing effects coming

from the real part of the market, or the speculative bubbles created by the behaviour of financial arbitrageurs.

The questions above can be considered from either a structural, or a descriptive point of view. A “structural”

approach attempts to construct a theoretical model involving the relevant economic variables of interest which may

be important in explaining relationships which drive commodity spot and futures prices. The descriptive approach

does not explain “why” these series exhibit particular features, but rather provides a framework to estimate the

relationships between the prices, make forecasts, and price the derivatives.

What follows is a discussion on how these two approaches above have been addressed in the literature.

i) Cross-sectional heterogeneity

The study of cross-sectional heterogeneity of commodity futures prices has its roots in both the theory of

normal backwardation and the theory of storage. The Keynesian theory of normal backwardation implies a greater

expected future spot price than the current futures contract price, assuming that producers are on net hedgers and

that speculators, in order to take on the risk offered by producers, must be offered a positive risk premium.

Of the two theories, the theory of storage has probably had the greater influence. Instead of focusing on the net

balance of trader’s positions as in the theory of normal backwardation, the theory of storage focuses on how the

levels of inventory, that is the “stocks,” of the underlying commodities affect the decisions of market participants.

Inventories play an important role since it is known that both the consumption and supply of many commodities

are inelastic to price changes. For example, it is known that gasoline and petroleum products are everyday neces-

sities and both their consumption and production adjust slowly to price changes. Moreover, given real supply and

demand shocks the inelastic nature of these markets can lead to wild price fluctuations. Therefore, the role of in-

ventories is important in buffering market participants from price fluctuations, by avoiding disruptions in the flow

of the underlying commodities, and by allowing them to shift their consumption or production intertemporally.

The cost of storage is essentially a “no arbitrage” result. Let the difference of the current futures price and the

spot price be known as the basis. If the basis is positive, it must necessarily equal the cost of holding an inventory

into the future, known as the cost of carry, since otherwise a trader could purchase the good on the spot market,

enter into a futures contract for later delivery, and make a sure profit (or loss). From the reverse point of view, the


basis could never be negative since holders of inventories could always sell the good at the spot price, and enter a

futures contract to buy at the lower price, with no cost of carry.

However, empirical examination of the basis reveals that it is often negative. Kaldor (1939) was the first to

suggest a solution to this problem known as the convenience yield. The convenience yield measures the benefit of

owning physical inventories, rather than owning a futures contract written on them. When a good is in abundance,

an investor gains little by owning physical inventories. However, when the good is scarce, it is preferable to hold

inventories. Therefore, in equilibrium the basis should be equal to the difference between the cost of carry and

the convenience yield, permitting the basis to be negative when inventories are scarce.

Working (1933,1948,1949) used the theory of storage to describe the relationship between the price of storage

and inventories for the wheat market, called the “Working curve” or the storage function. The Working curve is

positively sloped and for some positive threshold storage level, relates inventories to the costs of storing them;

however, below this positive threshold of inventories, the function takes on negative values, illustrating that posi-

tive inventories can be held even when the returns from storage are negative, thereby incorporating the notion of

Kaldor’s convenience yield into the storage function.

Later work generalized these results in considering motivations for both storage behaviour and the convenience

yield. For example, Brennan (1958) considered storage from the speculative point of view, suggesting that on the

supply side, in addition to cost of storage, we expand the notion of the convenience yield to include a risk premium

to holders of inventories who may speculate upon, and benefit from, a possible rise of demand on short notice.

Modern structural models distinguish between what is the fundamental price connected with the underlying

physical supply and demand, from the cost of storage and any speculation. For example, in looking at oil price

speculation, Knittel and Pindyck (2013) address what is meant by the notion of “oil price speculation” and how

it relates to investment in oil reserves, inventories, or derivatives such as futures contracts. Although the price of

storage is not directly observed, it can be determined from the spread between futures and spot prices. In their

model there are two interrelated markets for a commodity: the cash market for immediate or “spot” purchase/sale,

and the “storage market” for inventories. The model attempts to distinguish between the physical supply and

demand market and the effect of speculators on both the futures and spot prices.

Other structural work on the basis has employed the CAPM model. For example Black (1976) studied the

nature of futures contracts on commodities, suggesting that the capital asset model of Sharpe (1964) could be

employed to study the expected price change of the futures contract. Dusak (1973) also studied the behaviour of

futures prices within a model of capital market equilibrium and found no risk premium for U.S. corn, soybeans,

and wheat futures between 1952 and 1967. Breeden (1979) developed the consumption CAPM model which

allowed us to consider the futures price as composed of both an expected risk premium and a forecast of the

future spot price.


Econometrically, Fama and French (1987) found evidence that the response of futures prices to storage-cost

variables was easier to detect than evidence that futures prices contain premiums or power to forecast spot prices.

Other econometric work has been purely descriptive in attempting to model the basis process itself. For

example, Gibson and Schwartz (1990) model the convenience yield as a mean reverting continuous time stochastic

process, where the unconditional mean represents the state of inventories which satisfy industry under normal

conditions.

The cost of storage also imposes a natural constraint on inventories in that they cannot be negative; this

has effects which show up empirically. For example, inventory levels and the basis tend to share a positive

relationship as the theory of storage and convenience yield would suggest. Brooks et al. (2011) employ actual

physical inventory levels data on 20 different commodities between 1993-2009 and show that inventory levels are

informative about the basis, so that when inventories are low the basis is possibly negative (and vice versa). They

also find that futures price level volatility is a decreasing linear function of inventories so that when the basis is

negative, price volatility is higher. Empirical evidence also suggests that the basis behaves differently when it is

positive versus when it is negative. For example, Brennan (1991) expanded the work of Gibson and Schwartz

(1990) by incorporating the non-negativity constraint of inventories and so the convenience yield is downward

limited.

Finally, there is econometric evidence that corroborates Brennan (1958) above. Sigl-Grub and Schiereck

(2010), employ commitment of traders information on 19 commodity futures contracts between 1986 and 2007

(using the commitment of traders information as a proxy for speculation) and find that the autoregressive persis-

tence of futures returns processes tend to increase with speculation.

ii) Price dynamics

Another part of the literature tries to understand the nonlinear dynamic patterns observed in futures prices

that can manifest as either cycles or speculative bubbles. Generally, we observe more or less frequent successive

peaks and troughs in the evolution of prices. These peaks and troughs have non standard patterns which can

be classified according to the terminology in Ramsey and Rothman (1996) where they distinguish the concepts

of “longitudinal” and “transversal” asymmetry. The notion of longitudinal asymmetry employed in Ramsey and

Rothman (1996) builds upon other previous work, for example the study of business cycle asymmetry from Neftci

(1984).

Longitudinal asymmetry refers to asymmetry where the process behaves differently when traveling in direct

time versus in reverse time. For example, longitudinal asymmetry may manifest as a process where the peaks

rise faster than then they decline (and behaves in the opposite way in reverse). Figure 1.1 provides a plot which

illustrates these features for the coffee price level, continuous futures contract without delivery. In the right panel

(which provides a zoom) we can see how the peaks tend to rise quickly, but take a long time to decline into the


trough.

Transversal asymmetry is characterized by different process dynamics above and below some horizontal plane

in the time direction; that is, in the vertical displacement of the series from its mean value. For example, the

coffee process also exhibits transversal asymmetry in that the peaks in the positive direction are very sharp and

prominent, while the troughs are very drawn out and shallow (again see Figure 1.1 right panel). So, a series can

be both longitudinally and transversely asymmetric.

Figure 1.1: Plots of daily continuous contract futures price level series, Coffee with zoom

0

50

100

150

200

250

300

350

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Coffee from 07/18/1977 to 02/08/2013

0

50

100

150

200

250

300

350

05/0

119

96

10/0

119

97

03/0

119

99

08/0

120

00

01/0

120

02

06/0

120

03

11/0

120

04

Coffee (zoom)

The theoretical literature has been able to derive price evolutions with such patterns as a consequence of self-

fulfilling prophecies. The initial rational expectation (RE) models were linear: the demand is a linear function

of the current expected future prices and exogenous shocks on demand, and the supply is a linear function of the

current price and of supply shocks. In this way we can consider the path of equilibrium prices. Muth (1961) was

the first to employ such a framework which incorporated expectations formation directly into the model.

Since the equilibrium in RE models is both with respect to prices and information, these models have an infi-

nite number of solutions, even if the exogenous shocks have only linear dynamic features. Some of these solutions

have nonlinear dynamic features which are similar to the asymmetric bubble patterns described above. Among

these solutions featuring bubbles, some can exhibit isolated bubbles and others can demonstrate a sequence of

repeating bubbles. For example, Blanchard (1979) and Blanchard and Watson (1982) derived RE bubble models

for the stock market which presumed the price process is composed of both the fundamental competitive market

solution for price 1 plus a nonstationary martingale component that admits a rational expectation representation

[Gourieroux, Laffont, and Monfort (1982)], but exhibits bubble like increases or decreases in price. Blanchard and

1That is, where price is the linear present value of future dividends.


Watson (1982) described a possible piecewise linear model for the martingale bubble component which spurred

later authors to test statistically for the presence of this component. Later, Evans (1991) suggested that such

econometric tests may be limited in their ability to detect a certain important class of rational bubbles which

exhibit repeating explosive periods.

Generally these basic modeling attempts were focused on the stock market and it is not clear what analog there

is (if any) of the “fundamental” price of the futures contract without delivery. Moreover, they take into account

only the expected prices, not the level of volatility, and they incorporate linear functions for the price, and so the

solution may not be unique.

More recent RE models have exhibited features consistent with the asymmetries discussed above with regards

to both Ramsey and Rothman (1996) and the cost of storage models, and the natural asymmetry which occurs since

inventories cannot be negative. For example, Deaton and Laroque (1996) construct a RE model of commodity

spot prices, in which they generate a “harvest” process2 which drives a competitive price in agricultural markets

composed of both final consumers and risk-neutral speculators. From an intertemporal equilibrium perspective,

when the price today is high (relative to tomorrow) nothing will be stored so there will be little speculation;

however, when the price tomorrow is high (relative to today), speculation will take place and storage will be

positive. Because inventories cannot be negative, the market price process under storage will follow a piecewise

linear dynamic stochastic process.

Moreover, both theory and evidence suggests that RE models might take the form of a noncausal process. For

example, Hansen and Sargent (1991) showed that if agents in the commodity futures market can be described by

a linear RE model, and have access to an information set strictly larger than that available to the econometrician

modeling them, then the true shocks of the moving average representation that describe the RE equilibrium process

will not represent the shocks the econometrician estimates given a purely causal linear model. In fact, the shocks

of the model will have a non-fundamental representation and we say that the model is at least partly “noncausal.”

Of course, modeling a process as partly noncausal does not imply that agents somehow “know the future.” Rather,

it simply represents another equivalent linear representation.

Through simulation studies, Lof (2011) also showed that if we simulate the market asset price from both an

RE model with homogenous agents and that from a model with boundedly rational agents with heterogenous

beliefs [based on the model by Brock and Hommes (1998)], and then estimate both a purely causal model and a

model with a noncausal component on this data (given that the econometrician has full information) we find that

on average the rational expectations model is better fit by the causal model, while the heterogenous agents model

is better fit by a noncausal model.

2The process may possibly be serially correlated. The authors discuss at length the major differences that occur in the model dynamicswhen harvests are i.i.d. versus serially correlated.


Given these features, the time series literature has rapidly realized that the standard linear dynamic models,

that is, the autoregressive moving average (ARMA) processes with Gaussian shocks, are not appropriate for

representing the evolution of either commodity spot or futures prices. Indeed, they are not able to capture the

nonlinear dynamic features due to asymmetric cycles and price bubbles described above. For describing the

cycles created through the dynamics of investment between two substitutable commodities among producers (see

the discussion of the example of cattle vs. grain above), it is rather natural to consider an autoregressive model

with a threshold, that is, the threshold autoregressive model (TAR) introduced by Tong and Lim (1980) in the

time series literature. Indeed, the cycles associated with substitutable products are in some ways analogous to

the predator-prey cycle for which the TAR model was initially introduced. The TAR model has been applied on

commodity prices to study the integration between corn and soybean markets in North Carolina by Goodwin and

Piggotts (2001) and U.S. soybeans and Brazilian coffee by Ramirez (2009) to compare the asymmetry of such

cycles.

Contribution of the paper

Our paper contributes to the empirical literature on commodity futures prices by implementing nonlinear dynamic

models apt to reproduce the patterns of speculative bubbles observed on the commodity price data. To focus on

speculative bubbles and not on the underlying cycles of the fundamental spot price, we consider the continuous

contract futures price series available from Bloomberg on which it is believed that the speculative effects will be

more pronounced. We propose to analyze such series by means of the mixed causal/noncausal models where the

underlying noise defining the process has fat tails. Indeed, it has been shown in Gourieroux and Zakoian (2012)

that such models can be used to mimic speculative bubbles, or more generally peaks and troughs with either

longitudinal or transversal asymmetry. The estimation of such mixed models will be performed on 25 different

physical commodities, across five different industrial sectors, to check for the robustness of this modeling.

The rest of the paper is as follows. Section 1.2 discusses the details of the futures contracts including the

underlying commodities, the markets they are traded in, and the features of the data series themselves includ-

ing summary statistics. Section 1.3 shows that the linear causal ARMA models with Gaussian innovations are

unable to adequately capture the structure of this commodity data. Section 1.4 introduces the theory of mixed

causal/noncausal processes, and discusses the special case of the noncausal Cauchy autoregressive process of

order 1. This section also demonstrates how the mixed causal/noncausal process can accommodate both asym-

metries and bubble type features. Section 1.5 then introduces the mixed causal/noncausal autoregressive model

of order (r, s) and discusses its estimation by approximated maximum likelihood. Section 1.5.2 then details the

results of estimating the mixed causal/noncausal autoregressive model to the commodity futures price level data.

Section 1.6 then compares the estimated unconditional distributions of both the purely causal and mixed models

according to the Kullback-Leibler measure. Section 1.7 then considers the appropriate method for forecasting the


mixed causal/noncausal model given data on the past values of the process and applies this method to forecast

the futures data. Finally, the technical proofs and the other material related to the data series are gathered in the

appendices.

1.2 Description of the asset and data

1.2.1 The forward contract

A forward contract on a commodity is a contract to trade, at a future date, a given quantity of the underlying good

at a price fixed in advance. Such a forward contract will stipulate:

The names of those entering into the contract, i.e. the buyers and sellers.

The date at which the contract is entered into at some time t.

The date at which the contract matures at some future time t+ h.

The forward delivery price ft,t+h, negotiated and set in the contract at time t to be paid at the future time

t+ h.

The monetary denomination of the contract.

The characteristics and quality of the underlying good, often categorized by pre-specified “grades.”

The amount and units of the underlying good; typically commodity contracts will stipulate a number of

predefined base units e.g. 40,000 lbs of lean hogs.

Whether the good is to be delivered to the buyers upon maturity at time t+h (otherwise the buyer will have

to pick up the good themselves).

It will also specify the location of delivery if applicable and the condition in which the good should be

received.

Historically, such forward contracts were introduced to serve an economic need for producers or consumers to

be able to hedge against the risk of price fluctuations in which they sell or purchase their products. For example,

a producer of wheat might be subject to future supply and demand conditions that are unpredictable. As such a

risk adverse producer would enter into a forward contract which would ensure a stable price at a certain date in

the future for their products. Therefore, despite whether the price of their product rises or falls they can be certain

of receiving the forward price. As another example, consider the consumer’s side of the problem, where an airline


company wishes to guarantee a stable future price for inputs, e.g. jet fuel, in order to provide customers with

relatively unchanging prices of their outputs i.e. airline tickets.

Such traditional forward contracts still exist as bilateral agreements between two parties, sold on so called

“over the counter” (OTC) markets. These contracts still fulfill an important role for certain groups, for example

large organizations such as national governments, since the parties involved are unlikely to default on their end

of the contract. However, if the investor is not sure of the financial integrity of the opposite party, such a forward

contract is by construction subject to counterparty risk. Therefore, as opposed to nations which have the power to

recover from counterparty loses and are self insured, contracts catering to other types of investors must somehow

incorporate an insurance scheme into the contract itself to accommodate counterparty risk.

Counterparty risk presents itself as the forward contract approaches maturity since if the forward price is

below (resp. above) the spot price, ft,t+h < (resp. >)pt+h, then the contract is profitable only to the buyer (resp.

seller), except if the seller (resp. buyer) defaults.

1.2.2 The futures contract

A futures contract on a commodity is a forward contract, but with an underlying insurance in place against possible

counterparty risk. The insurance is paid by means of insurance premia, called “margin” on the futures markets.

There is an initial premium or initial margin, and intermediary premia, or “margin calls.”

Therefore a futures contract with delivery contains the same information and contractual stipulations as the

forward contract. It still represents an agreement to either buy or sell some underlying good at a future date, given

a predetermined “futures price” Ft,t+h set at time t today. However, in addition it will also specify a margin

scheme which:

Stipulates the initial margin; that is the amount each trader must first put up as collateral to enter into futures

contracts.

Implements a mechanism whereby the margin account balance is maintained at a certain level sufficient to

cover potential losses. If the margin account balance drops below a threshold amount, the trader is obliged

to put up more collateral, known as the margin call.

Generally, the price of a futures contract with delivery, Ft,t+h, differs from the price of a similar forward

contract ft,t+h, since it must account for the price of the underlying insurance against counterparty risks.

A futures contract requires the presence of an “insurance provider” usually either a broker, or a clearing house.

This provider will fix the margin rules for both the buyer and seller and manage a reserve account to be able to


hedge the counterparty risks in case of default of either party unable to fulfill margin calls.3

Of course, the clearing house plays a second very important role: namely that of “clearing the market” by

trying to match demand and supply between buyers and sellers of contracts. As a consequence, the clearing house

facilitates the formation of futures prices Ft,t+h as equilibrium prices. Therefore, we must distinguish between

brokers themselves who act as intermediaries, and the clearing house and brokering platforms which also serve a

more central purpose.

Finally, if the date and magnitude of the margin calls were known at the date of the futures’ contract issue,

the contract with delivery would simply reflect a portfolio (or sequence) of forward contracts which are renewed

each day [Black (1976)]. However, the margin calls are fixed by the brokers or the clearing house according to

the evolution of the risk, i.e. to the observed evolution of the spot prices, but also to the margin rules followed by

their competitors and so the interpretation as a portfolio of forwards is no longer valid.

1.2.3 The futures contract without delivery

In the market for futures with delivery, historically some intermediaries or investors have demonstrated that they

are not in the market simply to buy or sell physical goods for future delivery and that they do not actually take

delivery of the underlying physical good. Rather these investors are in the market simply to speculate on the future

price of the contract.

Given this trend, futures contracts without delivery have been introduced where instead of taking delivery of

the commodity they receive cash. Without delivery of a physical good, the derivative product becomes a purely

“financial” asset. Therefore there has been an attempt to separate these two types of instruments: a financial

market designed purely for speculative purposes and a “real” market that provides a mechanism for both producers

and consumers to hedge against the risk of price fluctuations.

This trend towards differentiation of futures with and without delivery was designed to suppress the effect that

speculation may have on the spot price of the underlying good. For example, traders who are in a loss position

may be unable to offset their positions rapidly enough as maturity of the futures contract with delivery approaches.

Given this situation they are forced to purchase or sell the underlying good in the spot markets in order to meet

their contractual obligation. If many traders are in this situation simultaneously and on the same side of the

market, the effect could have a dramatic impact on the spot price.

3There also exists a counterparty risk of the insurance provider itself. For instance, in 1987 the clearing house for commodity futures inHong Kong defaulted. This “double default” counterparty risk is not considered in our analysis.


1.2.4 Organization of the markets

In recent years, the futures commodity markets have become more organized. There is standardization of the

financial products and the margin rules. For example the Standard Portfolio Analysis of Risk (SPAN) system has

become common place as an instrument to determine the margin levels (both the clearing houses associated with

the Chicago Mercantile Exchange (CME) and Intercontinental Exchange (ICE) have adopted its use). The system

represents a computational algorithm which determines each trading day the risk for each commodity future by

scanning over sixteen different possible price and volatility scenarios given the time to maturity of the contract.

The sixteen scenarios consider various possible gains or losses for each futures contract, with each gain or loss

classification representing a certain fraction of the margin ratio.4 The results of these tests are used to define the

appropriate margin call requirements for the different participants. Even if the SPAN methodology is a standard

one, the choice of the risk scenarios depends on the clearing house. Finally, the SPAN system is not perfect and

is likely to be modified in the near future. See for example, the “CoMargin” framework discussed in Cruz Lopez

et al. (2013).

Interestingly, the OTC forward markets are slowly becoming more organized like the futures markets. For

example the European Market Infrastructure Regulation (EMIR) that entered into force on August 16, 2012, was

designed to promote the trading of standardized forward contracts on exchanges or electronic trading platforms

which are cleared by central counterparties and non-centrally cleared contracts should be subject to higher capital

requirements. Generally there is concern that the clearing houses need to play a larger role in their function of

mitigating counterparty risk, especially as it pertains to large valued contracts which could effect the economic

base if they were left to default.5

1.2.5 Example of a futures contract

Figure 1.2 provides an example of a set of futures contracts with delivery written on coffee and traded on the

ICE exchange.6 There are different contracts available for different maturities, which are listed on the far left

column. Coffee production generally occurs in both the northern and southern hemispheres – there is a northern

harvest taking place between October and January and a southern harvest between May and September. Given

these differing harvests, coffee futures mature every two months from March to September and every three months

onward until the following March. Furthermore, there exist contracts currently available for purchase that mature

4See https://www.theice.com/publicdocs/clear_us/SPAN_Explanation.pdf available on the ICE exchange web-site.

5However, having the clearing house play a more predominant role also raises concerns over systemic risk – that is,could clearing houses themselves become “too big to fail” institutions? See the H. Plumridge (December 2nd, 2011) ,“What if a clearing house failed?,” Wall Street Journal, accessed Sept. 20, 2013 at http://online.wsj.com/article/SB10001424052970204397704577074023939710652.html.

6The chart is provided by TradingCharts.com at http://tfc-charts.w2d.com/marketquotes/KC.html.


quite far into the future. For example, the coffee future contract currently with the longest time to maturity is the

contract for March 2016 delivery.

The date this chart was accessed is also given as September 19th, 2013. Therefore, when we speak of the

futures price Ft,t+h, within the context of our model with daily data (see the data section below) the time t would

be the current date given above, and the period h would represent the number of trading days until the contract

matures. Such contracts with delivery stipulate a last trading day which is typically the last business day prior

to the 15th day of the given contract’s maturity month. For instance, given the December 2013 contract, the last

business day before December 15th will fall on Friday December 13th, 2013 (resp. Friday March 14th, 2014;

Thursday May 15th, 2014; etc; for the subsequent contracts).

The “open,” “high,” “low,” and “last,” describe the intraday trading activity of the current trading session; that

is, the opening price, the highest and lowest prices, and the last price paid, respectively. The table also displays the

last change in price, the current volume of trades, and the set price and open interest from the last trading session

of the prior day. “Open interest” (also known as open contracts or open commitments) refers to the total number

of contracts that have not yet been settled (or “liquidated”) in the immediately previous time period, either by an

offsetting contractual transaction or by delivery. Therefore, a larger open interest can complement the volume

measure in interpreting the level of liquidity in the market. As contracts approach maturity, both the volume and

open interest levels tend to rise; contracts with very distant times to maturity are not very liquid.

Figure 1.2: Coffee futures contracts, ICE exchange


Figure 1.3 provides a candlestick plot of the typical intraday trading activity between September 13th, 2013,

and September 19th, 2013, for the coffee future contract with delivery in December 2013. Note that trading

does not occur 24 hours a day (rather the trading day takes place between 8:30AM-7:00 PM BST7) and so there

are discontinuities in the price series. The thin top and bottom sections of the candlestick, called the shadows,

represent the high and low prices, and the thick section called the real body, denotes the opening and closing

prices. Each candlestick describes trading activity over a 30 minute period.8

Figure 1.3: Coffee futures with delivery in December 2013, ICE exchange, intraday price $ US

1.2.6 Data on the commodity futures contracts

The continuous contract

The discussion above illustrates some of the difficulties in analyzing price data for derivative products. For

example, many of the products are very thinly traded with low liquidity. Moreover, some products may only be

available on one trading platform and not another. For example, many futures contracts with delivery are available

mutually exclusively either on CME, or the ICE, and their associated clearing houses do not necessarily follow7British Summer Time as the ICE exchange is located in London, England.8There are 21 candlesticks each day, representing the 10.5 opening hours.


identical margin schemes. Also, OTC product data may only be available through certain brokers proprietary

trading platforms.

Perhaps the most consequential problem we face in attempting to analyze futures contracts data is that the

individual contracts of various maturities will eventually expire and so we need a method whereby we can “extend”

the futures price series indefinitely. However, even in accomplishing this task we must consider that the contracts

of various maturities, while written on the same underlying good are not quite the same “asset” and so the asset

itself is changing over time. Therefore, we need some method to, not only extend the series, but to standardize the

price measurements across time and maturity, and ensure that when we construct the series we are taking prices

which are relevant, e.g. with sufficient liquidity to be appropriately representative, deriving in essence a new asset

that no longer matures. In doing so we would also like to be able to bring together information on prices available

from different trading platforms in one place.

The Bloomberg console offers a solution to this problem by amalgamating futures data for delivery from both

the ICE and CME exchanges into one system. Bloomberg also offers what is called called a continuous contract

which mimics the behaviour of a typical trader who is said to “roll over” the futures contract as it approaches

maturity. “Rolling over” refers to the situation where a trader would close out, or “zero,” their account balance

upon the approach of a futures contract’s maturity, if they do not intend on taking delivery, by first purchasing an

offsetting futures contract and then simultaneously reinvesting in another future with a further expiration month.

In this way, an artificial asset is created which tracks this representative trader’s futures account holdings across

time indefinitely. Details on how this is accomplished, as well as other methods that can be employed, are outlined

in Appendix 1.10. Users of the Bloomberg console can customize criteria which define the rollover strategy, e.g.

volume of trades or open interest; in this paper we choose to employ the continuous contract that mimics the

rolling over of the futures contract with the shortest time to maturity known as the “front month” contract.

Industry sectors

We will consider a number of physical commodity futures contracts for a broad range of products. The commodi-

ties are divided into various industry sectors that are expected to behave similarly to each other. The industry

sectors are given in Table 1.1.

Within each futures contract itself there are specified a number of different product grades. At the exchange

level it is determined that any products which match pre-specified grade criteria are considered part of the same

futures contract. This is to promote standardization of contracts and volume of trades. For example, the coffee

future discussed above is specified on the ICE exchange as the “Coffee C” future with exchange code KC. This

future allows a number of grades and a “Notice of Certification” is issued based on testing the grade of the


Table 1.1: Commodity sectors

Energy Metals Softs Soy LivestockBrent crude oil Copper Corn Soybeans Lean hogsLight crude oil Gold Rice Soybean meal Live cattleHeating oil Palladium Wheat Soybean oilNatural gas Platinum SugarGas oil Silver Orange juiceGasoline RBOB Cocoa

CoffeeCottonLumber

beans and by cup testing for flavor. The Exchange uses certain coffees to establish the ”basis.” Coffees judged

better are at a premium; those judged inferior are at a discount. Moreover, these grades are established within

a framework of deliverable products, for example from the ICE product guide for this KC commodity future

we have that “Mexico, Salvador, Guatemala, Costa Rica, Nicaragua, Kenya, New Guinea, Panama, Tanzania,

Uganda, Honduras, and Peru all at par, Colombia at 200 point premium, Burundi, Venezuela and India at 100 point

discount, Rwanda at 300 point discount, and Dominican Republic and Ecuador at 400 point discount. Effective

with the March 2013 delivery, the discount for Rwanda will become 100 points, and Brazil will be deliverable at

a discount of 900 points.”

Energy

Brent crude oil is a class of sweet light crude oil (a “sweet” crude is classified as containing less than 0.42% sulfur,

otherwise it is known as “sour”). The term “light” crude oil characterizes how light or heavy a petroleum liquid

is compared to water. The standard measure of “lightness” is the American Petroleum Institute’s API gravity

measure. The New York Mercantile Exchange (NYMEX) defines U.S. light crude oil as having an API measure

between 37 (840 kg/m3) and 42 (816 kg/m3) and foreign as having between 32 (865 kg/m3) and 42 API.

Therefore, various grades are defined in the standardized contract. Both foreign and domestic light crude oil

products are required to admit various characteristics based on sulfur levels, API gravity, viscosity, Reid vapor

pressure, pour point, and basic sediments or impurities. Exact grade specifications are available in the CME Group

handbook, Chapter 200, 200101.A and B.

The price of Brent crude is used as a benchmark for most Atlantic basin crude oils, although Brent itself derives

from North Sea offshore production. Other important benchmarks also include North America’s West Texas

Intermediate and the middle east UAE Dubai Crude which together track the world’s internationally traded crude

oil supplies. The representative light crude oil future employed in this paper is written on West Texas Intermediate


and exchanged by the CME Group. The delivery point for (WTI) light crude oil is Cushing, Oklahoma, U.S.,

which is also accessible to the international spot markets via pipelines. Likewise, the Brent crude oil future is

exchanged by ICE and admits delivery at Sullom Voe, an island north of Scotland.

Heating oil is a low viscosity, liquid petroleum product used as a fuel for furnaces or boilers in both residential

and commercial buildings. Heating oil contracts take delivery in New York Harbor. Just as in crude oil contracts,

very detailed stipulations exist regarding product quality grades; see the CME handbook, Chapter 150, 150101.

Natural gas is a hydrocarbon gas mixture consisting primarily of methane, used as an important energy source in

generating both heating and electricity. It is also used as a fuel for vehicles and is employed in both the production

of plastics and other organic chemicals. Natural gas admits delivery at the Henry Hub, a distribution hub on the

natural gas pipeline system in Erath, Louisiana, U.S. Contract details are available in the CME handbook, Chapter

220, 220101. Gas oil (as it is known in Northern Europe) is Diesel fuel. Diesel fuel is very similar in its physical

properties to heating oil, although it has commonly been associated with combustion in Diesel engines. Gas oil

admits delivery in the Amsterdam-Rotterdam-Antwerp (ARA) area of the Netherlands and Belgium. Contract

grade specifications are available from the exchange, ICE.

The Gasoline RBOB classification stands for Reformulated Blendstock for Oxygenate Blending. RBOB is the

base gasoline mixture produced by refiners or blenders that is shipped to terminals, where ethanol is then added to

create the finished ethanol-blended reformulated gasoline (RFG). Gasoline RBOB admits delivery in New York

Harbor and quality grade details are outlined in the CME handbook, Chapter 191, 191101.

Metals

Gold and silver, have both traditionally been highly sought after precious metals for use in coinage, jewelry,

and other applications since before the beginning of recorded history. Both also have important applications in

electronics engineering and medicine. The CME exchange licenses storage facilities located within a 150 mile

radius of New York city, in which gold or silver may be stored for delivery on exchange contracts. The quality

grades for gold and silver are defined in the CME handbook, Chapters 113 and 112, respectively.

Platinum, while also considered a precious metal, also plays an important role, along with the metal Palladium

in the construction of catalytic converters. Catalytic converters are used in the exhaust systems of combustion

engines to render output gases less harmful to the environment. Palladium also plays a key role in the construction

of hydrogen fuel cells. Finally, copper is a common element used extensively in electrical cabling given its good

conductivity properties. Platinum, Palladium, and Copper offer a number of delivery options, including delivery

to warehouses in Zurich, Switzerland. See the CME handbook Chapters 105, 106 and 111 respectively.


Softs and Livestock

“Soft goods” are typically considered those that are either perishable or grown in an organic manner as opposed

to “hard goods” like metals which are extracted from the earth through mining techniques.

In the grains category we have corn, rice, and wheat which are all considered “cereal grains”; that is, they

represent grasses from which the seeds can be harvested as food. Sugar, derived from sugarcane, is also a grass

but the sugar is derived not from the seeds but from inside the stalks. Corn, rice, and wheat all admit a number

of standardized delivery points within the U.S. See the CME handbook chapters 10, 14, and 17 for grade specifi-

cations and delivery options. Sugar delivery point options and grade details are available online from ICE, under

the Sugar No.11 contract specification.

Orange juice is derived from oranges which grow as the fruit of citrus tree, typically flourishing in tropical

to subtropical climates. The juice traded is in frozen concentrated form. Orange juice is deliverable to a number

of points in the U.S., including California, Delaware, Florida, and New Jersey warehouses. See the ICE FCOJ

Rulebook available online for further information and quality grade details. Coffee is derived from the seeds of

the coffea plant, referred to commonly as coffee “beans.” Cocoa represents the dried and fully fermented fatty

seeds contained in the fruit of the cocoa tree. Finally, cotton is a fluffy fibre that grows around the seeds of the

cotton plant. Delivery point information and quality grade details for Coffee, Cocoa, and Cotton are also available

via the ICE Rulebook chapters available online.

In the soy category we have soybeans, a species of legume widely grown for its edible beans; soybean meal

which represents a fat-free, cheap source of protein for animal feed and many other pre-packaged meals; and

finally, soybean oil is derived from the seeds of the soy plant and represents one of the most widely consumed

cooking oils. All three soybean products admit a number of standardized delivery points within the U.S. See the

CME handbook chapters 11, 12, and 13 for grade specifications and delivery options.

Lean hogs refers to a common type of pork hog carcass used typically for consumption. A lean hog is

considered to be 51-52% lean, with 0.80-0.99 inches of back fat at the last rib, with a 170-191 lbs. dressed weight

(both “barrow” and “gilt” carcasses). Live cattle are considered 55% choice, 45% select, yield grade 3 live steers

(a castrated male cow). Finally, lumber is traded as random length 2×4’s between 8-20 feet long. Lean hogs

futures are not delivered but are cash settled based on the CME Lean Hog Index price. Cattle is to be delivered

to the buyer’s holding pen. Lumber shall be delivered on rail track to the buyer’s producing mill. See CME

handbook Chapters 152, 101, and 201, respectively for details.


Data sources

The following Table 1.2 outlines the dates for which there exists data for each commodity futures price series,

the time to maturity, currency denomination, commodity exchange and code, and basic unit/characteristics of the

product traded.

Table 1.2: Commodity specifications

Commodity Start date CEM Currency unit Exchange Code Basic unitSoybean meal 7/18/1977 FHKNZ U.S.$/st CME ZM/SM 100 st’sSoybean oil 7/18/1977 FHKNZ U.S.$/100lbs CME ZL/BO 60,000 lbsSoybeans 7/18/1977 FHKNX U.S.$/100bushel CME ZS/S 5,000 bushelsOrange juice 7/18/1977 FHKNUX U.S.$/100lbs ICE OJ 15,000 lbsSugar 7/18/1977 HKNV U.S.$/100lbs ICE SB 112,000 lbsWheat 7/18/1977 HKNUZ U.S.$/100bushel CME ZW/W 5,000 bushelsCocoa 7/18/1977 HKNUZ U.S.$/MT ICE CC 10 MTCoffee 7/18/1977 HKNUZ U.S.$/100lbs ICE KC 37,500 lbsCorn 7/18/1977 HKNUZ U.S.$/100bushel CME CZ/C 5,000 bushelsCotton 7/18/1977 HKNZ U.S.$/100lbs ICE CT 50,000 lbsRice 12/6/1988 FHKNUX U.S.$/100hw CME ZR/RR 2,000 hwLumber 4/7/1986 FHKNUX U.S.$/mbf CME LBS/LB 110 mbfGold 7/18/1977 GMQZ U.S.$/oz CME GC 100 troy ozSilver 7/18/1977 HKNUZ U.S.$/100oz CME SI 5,000 troy ozPlatinum 4/1/1986 FJNV U.S.$/oz CME PL 50 troy ozPalladium 4/1/1986 HMUZ U.S.$/oz CME PA 100 troy ozCopper 12/6/1988 HKNUZ U.S.$/100lbs CME HG 25,000 lbsLight crude oil 3/30/1983 All U.S.$/barrel CME CL 1,000 barrelsHeating oil 7/1/1986 All U.S.$/gallon CME HO 42,000 gallonsBrent crude oil 6/23/1988 All U.S.$/barrel ICE CO 1,000 barrelsGas oil 7/3/1989 All U.S.$/MT ICE QS? 100 MTNatural gas 4/3/1990 All U.S.$/mmBtu CME NG 10,000 mmBtuGasoline RBOB 10/4/2005 All U.S.$/gallon ICE HO 42,000 gallonsLive cattle 7/18/1977 GJMQVZ U.S.$/100lbs CME LE/LC 40,000 lbsLean hogs 4/1/1986 GJMQVZ U.S.$/100lbs CME HE/LH 40,000 lbs

The units are described as follows. A barrel is considered to be 42 U.S. gallons. An mmBtu is one million

British Thermal Units, a traditional unit of energy equal to about 1055 joules per Btu. An MT is one metric tonne,

which is a unit of mass approximately equal to 1,000 kilograms. Lbs and oz are the abbreviations for pounds and

ounces, respectively. A “Troy oz” is a slightly modified system whereby one troy oz is equal to approximately

1.09714 standard oz. A bushel is a customary unit of dry volume, equivalent to 8 gallons. An mbf is a specialized

unit of measure for the volume of lumber in the U.S, called a “board-foot.” A board-foot (or “bf”) is the volume

of a one-foot length of a wooden board, one foot wide and one inch thick. Therefore an mbf is one million

such board-feet. Finally, an “st” or short tonne is a unit of mass smaller than the metric tonne, equivalent to

approximately 907 kilograms.


The column CEM represents the range of “contract ending months” that each futures contract may be specified

for. The month codes are as follows: F - January, G - February, H - March, J - April, K - May, M - June, N -

July, Q - August, U - September, V - October, X - November, and Z - December. These are the standard codes

employed by the exchanges.

All series end on February 8th, 2013, and represent daily closing prices for those days the commodities are

traded on the exchange. In June 2007 the CBOT (Chicago Board of Trade) which acted as the exchange for soy

products, wheat corn, and rice, merged with the CME (Chicago Mercantile Exchange) to form the CME Group.

Moreover, most of the energy futures were originally traded on the NYMEX (New York Mercantile Exchange) and

the metals were traded on the COMEX (Commodity Exchange; a division of the NYMEX). However, on August

18, 2008, the NYMEX (along with the COMEX) also merged with the CME Group. Gas oil was originally traded

on the IPE (International Petroleum Exchange) which was acquired by ICE (IntercontinentalExchange) in 2001.

Therefore, care must be taken in interpreting the various exchange codes which have changed over time.

For most CME contracts, the last trading day is typically the 15th business day before the first day of the

contract month. The delivery date is then freely chosen as any day during the contract month.

1.2.7 Features of the price level series

When dealing with financial data we typically consider the continuously compounded returns series,

rt = ln(Pt/Pt−1), since the price level process is nonstationary and so we are obliged to transform the initial

price data. However, in the case of futures price data without delivery, an examination of the time evolution of the

price level processes does not necessarily suggest the presence of trends, either of the stochastic type (i.e. random

walk), or due to a deterministic increase or decrease.

Figure 1.4: Plots of daily continuous contract futures price level series, Sugar and Lean hogs

0

5

10

15

20

25

30

35

40

45

50

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Sugar from 07/18/1977 to 02/08/2013

20

30

40

50

60

70

80

90

100

110

04/0

119

86

05/0

119

91

05/0

119

96

06/0

120

01

07/0

120

06

08/0

120

11

Lean hogs from 04/01/1986 to 02/08/2013


For example, let us consider the two plots in Figure 1.4, that display the time evolution of the futures prices

of sugar and lean hogs. Both series do not exhibit obvious deterministic time trends and their dramatic bubbles

(especially in sugar) suggest that they cannot have been generated by a random walk. Interestingly, lean hogs

exhibits the well known “pork cycle,” or cyclical patterns related to pork production.

The price level series all exhibit a very high level of linear persistence in the sense that their estimated au-

tocorrelation function, ρ(s), takes on the value ρ(1) ≈ 1, with small, but significant, ρ(s) for some s > 1 (see

Table 1.3 for the autocorrelation at lag 1). Moreover, their normalized spectral densities exhibit extremely sharp

peaks at the zero frequency and are near zero elsewhere in the spectrum. Of course, this is suggestive of a unit

root process, however, augmented Dickey-Fuller unit root tests of the series are inconclusive in rejecting the null

of a unit root (including a constant, but no time trend).9

This is unsurprising given what we know about the properties of some exotic parametric processes which are

able to elude detection by traditional unit root testing (see for example the causal representation of the noncausal

AR(1) model with i.i.d. Cauchy innovations discussed later in Section 1.4.2). A linear unit root test is not of much

use if the causal representation of the process may be nonlinear and strictly stationary, with moments that do not

exist. Finally, linear unit root tests have been shown to have low power in the presence of nonlinearity (such as

multiple regimes, for example).

Since the continuous contract futures series are constructed through the “rolling over” mechanism, they re-

flect the price of a reconstituted futures contract in which the time to maturity, h, remains fixed throughout the

time evolution of the price level, despite the fact that the reconstitution is generated from individual contracts of

different maturities, each representing daily closing prices for those days these futures contracts are traded on the

exchange. The different starting dates for each of the series are given in Table 1.2 and all the continuous contract

series end on February 8th, 2013.

Summary statistics for the price levels series are given in Table 1.3 and plots and histograms of all the price

level series are available in Appendix 1.14 (Figures 1.10.i to 1.11.iv).

Note some of the salient features from the summary statistics in Table 1.3. If we are to interpret the series

as strictly stationary, the sample moments suggest highly leptokurtic unconditional distributions for most of the

series. Exceptions to this exist, however, in orange juice, lumber, platinum, copper, gasoline RBOB, and lean

hogs. Perhaps more importantly we should consider that most of the series are also positively skewed, again

with a few exceptions in gasoline RBOB and lean hogs (and possibly orange juice). Visual examination of the

histograms in Appendix 1.14 corroborate these statistics. Moreover, some of the histograms indicate a bimodal

structure, especially among those series that are highly skewed, suggesting the possibility of a mixture between

low price and high price regimes. A good example of this is the copper series.

9The estimated spectral density and Dickey-Fuller test results are available upon request.


Tabl

e1.

3:Su

mm

ary

stat

istic

s-c

omm

odity

futu

res

pric

ele

vels

erie

s

Lev

els

Qua

ntile

sSe

ries

10%

50%

90%

Mea

nSt

nd.D

ev.

Skew

ness

Kur

tosi

sA

CF(

1)Sa

mpl

esi

zeSo

ybea

nm

eal

149.

600

185.

800

314.

200

210.

347

70.1

511.

729

6.19

00.

998

9280

Soyb

ean

oil

16.6

4023

.750

39.9

9326

.399

10.4

491.

709

5.51

60.

999

9280

Soyb

eans

503.

750

629.

000

1057

.600

716.

563

249.

577

1.75

55.

735

0.99

892

80O

rang

eju

ice

79.2

5011

5.12

517

0.35

011

8.92

633

.531

0.59

22.

663

0.99

892

80Su

gar

6.04

09.

830

20.5

0311

.586

6.34

31.

946

7.28

30.

998

9280

Whe

at26

7.25

035

7.50

062

2.75

040

1.67

215

1.03

61.

878

6.65

60.

998

9280

Coc

oa99

1.00

016

21.0

0029

71.1

0018

35.2

6874

4.05

10.

926

3.46

60.

997

9280

Cof

fee

64.7

0012

4.45

019

2.00

012

6.32

548

.051

0.69

93.

495

0.99

792

80C

orn

203.

750

258.

250

435.

000

298.

578

126.

933

2.09

77.

126

0.99

892

80C

otto

n49

.059

65.1

5085

.720

67.6

6519

.798

2.68

816

.481

0.99

792

80R

ice

5.36

08.

440

14.6

019.

243

3.55

70.

844

3.50

30.

999

6309

Lum

ber

181.

700

261.

700

366.

920

267.

773

70.5

620.

463

2.45

80.

996

7005

Gol

d27

7.70

038

5.40

096

4.23

051

0.66

435

1.24

52.

202

7.13

90.

999

9280

Silv

er4.

400

6.03

718

.050

9.40

67.

680

2.27

27.

910

0.99

892

80Pl

atin

um36

7.20

053

4.00

015

55.4

2075

5.71

546

3.35

21.

169

3.09

60.

999

7009

Palla

dium

111.

000

206.

150

645.

140

286.

657

203.

778

1.30

33.

935

0.99

970

09C

oppe

r74

.000

115.

400

358.

860

168.

275

111.

428

1.06

02.

562

0.99

963

09L

ight

crud

eoi

l16

.400

26.7

4085

.712

38.1

0327

.475

1.37

13.

827

0.99

977

93H

eatin

goi

l45

.733

67.6

5526

4.86

511

2.31

686

.145

1.29

23.

484

0.99

969

44B

rent

crud

eoi

l15

.796

25.4

1010

0.12

841

.547

32.5

011.

205

3.19

90.

999

6427

Gas

oil

147.

000

226.

500

894.

875

375.

818

281.

273

1.16

13.

180

0.99

961

60N

atur

alga

s1.

631

3.14

27.

366

3.98

72.

478

1.37

04.

950

0.99

859

64G

asol

ine

RB

OB

153.

220

223.

895

304.

360

227.

116

57.8

770.

023

2.30

90.

995

1920

Liv

eca

ttle

60.5

0071

.488

95.1

0075

.023

15.8

711.

219

4.91

50.

998

9280

Lea

nho

gs46

.550

63.3

4581

.380

63.7

2613

.133

0.16

52.

830

0.99

570

09*

Not

eth

atA

CF(

1)r

epre

sent

sth

eau

toco

rrel

atio

nfu

nctio

nat

lag

1an

dT

isth

esa

mpl

esi

ze.A

lso

the

kurt

osis

mea

sure

empl

oyed

here

isno

tthe

exce

ssku

rtos

is.


1.3 The linear causal ARMA model

In this section we show that the causal linear ARMA model, with Gaussian innovations, is unable to adequately

capture the features of the futures price level data.

In order to assess the ARMA model’s ability to fit the price level data, we estimate a number of different

ARMA(p, q) specifications and choose among the best fitting according to the Akaike information criteria (AIC).

The software used to estimate the ARMA model is the popular “R project for statistical computing” available for

download at http://www.r-project.org/. In order to facilitate the (p, q) parameter search we employ the

auto.arima() function in the R forecast package due to Hyndman and Khandakar (2008). Given computational

constraints, maximum orders of p + q = 13, p ≤ 10 and q ≤ 3 are chosen. AIC’s are specified not to be

approximated and the “stepwise” selection procedure is avoided to make sure all possible model combinations are

tested.

The arima() routine called by auto.arima() obtains reasonable starting parameter values by conditional sum

of squares and then the parameter space is more thoroughly searched via a Nelder and Mead (1965) type algo-

rithm. The pseudo-likelihood function is computed via a state-space representation of the ARIMA process, and

the innovations and their variance found by a Kalman filter. Since the assumption of Gaussian shocks may be

misspecified, robust sandwich estimator standard errors are employed of the type introduced by White (1980).

If the ARMA model captures the nonlinear features of the data, the residuals (et) should be approximately

representative of a strong white noise series. Therefore, we test for this feature in two ways: 1) we employ the

Ljung-Box test with the null of weak white noise residuals [Ljung and Box (1978)] and 2) the BDS test with the

null of independent residuals [Brock, Dechert and Scheinkman, and LeBaron (1996)].

1.3.1 Test specifications

The Ljung-Box test statistic is given as

LB(S) = T

S∑s=1

T + 2

T − s%(s)2, (1.1)

where %(s) is the estimated autocorrelation function of the ARMA model residuals. The null hypothesis is that

the autocorrelations of the ARMA residuals are jointly 0 up to lag S. Finally, LB(S) ∼ χ2(S), if the residuals

are representative of the true theoretical (εt) which is a strong white noise (and neglecting the fact that %(s) is an

estimated quantity itself).

The BDS test was designed to be employed on the residuals of a best fitting linear model in order to look

for deterministic chaos in the residual nonlinear structure. This test involves the correlation dimension tech-


nique originally developed by Grassberger and Procaccia (1983) to detect the presence of chaotic structure by

embedding overlapping subsequences of the data in k-space. Given a k-dimensional time series vector xt,k =

(xt, xt+1, . . . , xt+k−1)′ called the k-history, the BDS test treats this k-history as a point in a k-dimensional space.

The BDS test statistic, called the correlation integral is given as

Ck(ε, T ) =2

Tk(Tk − 1)

∑i<j

Iε(xi,k,xj,k), where Tk = T − k + 1, (1.2)

and where Iε(u, v) is an indicator variable that equals one if ‖u − v‖ < ε and zero otherwise, where ‖·‖ is

the supnorm. The correlation integral estimates the fraction of data pairs of xt,k that are within ε distance from

each other in k-space. Despite the original purpose of the test, it is effectively a test for independence since

if we can reject the null hypothesis of correlation of (xt)Tkt=1 in every k-dimensional embedding space this is

equivalent to being i.i.d. That is, if the k-histories show no pattern in k-dimensional space, then we should have

that Ck(ε, T ) ≈ C1(ε, T )k.

It is shown that the BDS statistic√T[Ck(ε, T )− C1(ε, T )k

]is asymptotically Normal with mean zero and

finite variance under the null hypothesis [see Tsay (2010), Ch.4.2.1]. If we cannot reject the null hypothesis the

alternative is quite broad since, depending on the correlation structure in the k-dimensional spaces, the nonlinear-

ity could have come about due to either deterministic nonlinearity, i.e. chaos [see Blank (1991), Decoster et al.

(1992), and Yang and Brorsen (1993)], or stochastic nonlinearity.

For the Ljung-Box test we specify the number of lags as S = ln(T ) rounded to the nearest integer, where T

is the sample size given in Table 1.3. According to Tsay (2010), Ch.2.2, pg.33, simulation studies suggest that

this choice maximizes the power of the test. For the BDS test we consider embedding dimensions k up to k = 15,

which trades off number of dimensions for computational efficiency.

1.3.2 Results

Table 1.4 presents estimation results for the ARMA model. Generally, for all the series, the best fitting linear

ARMA model residuals reject the BDS null hypothesis of i.i.d. shocks at the 1% test significance level (in fact all

of the test statistic p-values are extremely close to 0). There is one exception in the lean hogs price levels series,

where for ε = 2.6 (the parameter that defines “near points” in the k-dimensional space, i.e. ‖u− v‖ < ε), we are

not able to reject the null hypothesis of i.i.d. residuals (however, we are able to reject for smaller ε = 1.95). The

p-values in this case decline monotonically from 0.731 at k = 2 down to 0.165 at k = 15.

Plots of all the residuals series also suggest ARCH effects (see Figure 1.5 for an example). Interestingly,

except in the case of coffee, the noises are still weak white according to the Ljung-Box test as we are unable to

reject the null hypothesis at the 10% level, although we are able to reject platinum at the 13% level and soybean


meal at the 15% level.

Interestingly, the ARMA estimation software is unable to fit an autoregressive model to the gold series, and

so we skip testing its residuals for whiteness.

Figure 1.5: Soybean meal residuals from ARMA model

Clearly, the causal linear ARMA model is not able to fully capture the structure of the data as the residuals

are often weak white noise, but not i.i.d. Therefore, the evidence presented in this section suggests that we need a

better model if we are to adequately capture the nonlinear dynamic features of the futures price level data.

1.4 The linear mixed causal/noncausal process

The linear mixed causal/noncausal process takes the form of a two sided infinite moving average representation,

Yt =

∞∑i=−∞

aiεt−i, (1.3)

where (εt) is a strong white noise, that is a sequence of independently and identically distributed (i.i.d.) variables,

that doesn’t necessarily admit finite moments. The mixed causal/noncausal process is composed of both a purely

causal component that depends only on past shocks, that is the sum of aiεt−i for all i > 0, and a purely noncausal

component that depends only on future shocks, that is the sum of aiεt−i for all i < 0. We have a unique

representation for (1.3), up to a scale factor on εt, except in the case where the white noise (εt) is Gaussian [see

e.g. Findley (1986) and Cheng (1992)]. For Gaussian white noise, there exists an equivalent purely causal linear


Table 1.4: ARMA estimation results

Series p qa AIC Log-likelihood Ljung-Box Pvalb BDS Pvalc

Soybean meal 6 2 52395.00 -26188.50 0.15 0Soybean oil 8 3 11859.05 -5917.52 0.92 0Soybeans 9 2 73548.21 -36762.11 0.52 0Orange juice 4 3 42121.61 -21052.80 0.40 0Sugar 10 2 7842.39 -3908.20 1.00 0Wheat 7 2 67069.47 -33524.74 1.00 0Cocoa 8 3 94368.76 -47172.38 0.72 0Coffee 4 2 48866.80 -24426.40 0.06 0Corn 7 3 59385.84 -29681.92 0.63 0Cotton 10 0 32760.78 -16369.39 1.00 0Rice 10 3 -4799.02 2413.51 0.96 0Lumber 8 3 44027.92 -22001.96 1.00 0Gold 0 3 102914.50 -51453.27 n/a 0Silver 9 3 7424.04 -3699.02 0.94 0Platinum 8 2 55936.82 -27957.41 0.13 0Palladium 9 3 48209.69 -24091.84 0.99 0Copper 10 0 34719.50 -17348.75 1.00 0Light crude oil 7 2 22244.11 -11112.06 0.95 0Heating oil 9 2 34465.28 -17220.64 1.00 0Brent crude oil 7 2 18807.92 -9393.96 0.90 0Gas oil 5 3 44142.24 -22062.12 0.92 0Natural gas 3 2 -4178.27 2095.13 0.23 0Gasoline RBOB 5 3 11715.32 -5848.66 0.99 0Live cattle 6 1 22771.40 -11377.70 0.99 0Lean hogs 3 2 23567.63 -11777.81 0.70 0

a The orders of the ARMA(p, q) model are given in the first and second columns.b The column denoted “Ljung-Box Pval” indicates the p-value statistic for this test – therefore we reject the null

hypothesis at x% probability of committing a type I error if Pval < x.c The BDS p-value provided is for the case where ε = 1.95. The p-values are 0 for all lags k = 2, . . . , 15.


representation where (ε∗t ) is another Gaussian white noise. This implies that for non-Gaussian (εt), a mixed linear

process including a noncausal component (i.e. ∃i < 0, ai 6= 0) will necessarily admit a nonlinear causal dynamic.

For more details see Appendix 1.11.1.

1.4.1 The asymmetries

As an example, let us consider the effect of shocks (εt) on the model above in (1.3). Let εt be distributed Cauchy

which admits no first and second-order finite moments. Moreover, let ai = ρi1 for i ≥ 0, ai = ρi2 for i ≤ 0, and

|ρk| < 1 for k = 1, 2, where we are free to choose ρ1 and ρ2 as such.

In choosing various values for ρk, k = 1, 2, we will see how the general linear causal/noncausal model is

able to exhibit bubble like phenomenon with asymmetries of the type discussed in Ramsey and Rothman (1996)

(see the Introduction, Section 1.1, (ii), Price dynamics). Consider the simulated sample path of the linear mixed

causal/noncausal model with standard Cauchy shocks as depicted in Figure 1.6, where we have zoomed in on a

bubble episode to focus on the dynamics.

Within Figure 1.6 we have an example of a positive shock εt > 0 around time t = 57. Depending on

the values chosen for ρ1 and ρ2, the bubble’s build up and subsequent crash exhibits different rates of ascent

and descent. For example, consider the parameter combination (ρ1 = 0.8, ρ2 = 0). This represents the purely

causal case where the shock occurs at time t = 57 and its effect dies off slowly, and so we have a quick rise

and a subsequently slow decline. Also consider the opposite case where (ρ1 = 0, ρ2 = 0.8). This is the purely

noncausal case where the bubble builds up slowly until time t = 57 and then quickly declines. The other cases

represent mixed causal/noncausal models where the bubble rises and falls at rates which depend on the ratio of

ρ1/ρ2 = α. If α > 1 the bubble rises quicker than it declines; if α < 1 then it rises slower than it declines,

and if α = 1 then it behaves symmetrically around time t = 57. These asymmetries can be classified within the

framework of Ramsey and Rothman (1996) as being longitudinally asymmetric in that the probabilistic behaviour

of the process is not the same in direct and reverse time.

Of course, for a negative shock εt < 0 the behaviour would be duplicated, but instead we would see a

crash instead of a bubble. This suggests that the mixed causal/noncausal process can also exhibit transversal

asymmetries, that is asymmetries in the vertical plane, by modifying the distribution of the shocks. For example,

if we were to only accept positive Cauchy shocks, εt > 0, this would induce a process that only exhibited positive

bubbles which would represent a transversally asymmetric process.

Therefore, by managing both the moving average coefficients, ai, and the distribution of the shocks εt in (1.3),

the mixed causal/noncausal model can exhibit both longitudinal and transversal asymmetries of the type discussed

by Ramsey and Rothman (1996).


Figure 1.6: The mixed causal/noncausal model with Cauchy shocks

-20

0

20

40

60

80

100

120

140

160

180

40 45 50 55 60 65 70 75

ρ1 = 0.8, ρ2 = 0.0ρ1 = 0.6, ρ2 = 0.2ρ1 = 0.4, ρ2 = 0.4ρ1 = 0.2, ρ2 = 0.6ρ1 = 0.0, ρ2 = 0.8

1.4.2 The purely causal representation

As discussed above, in general we have a unique linear representation as (1.3) except when the white noise process

is Gaussian. This implies that, for fat tailed distributions, such as the t-distribution or Cauchy distribution, the

purely causal strong form representation will necessarily admit a nonlinear representation [Rosenblatt (2000)].

Example: The noncausal autoregressive process with Cauchy shocks

Consider the noncausal autoregressive process of order 1 with Cauchy shocks,

xt = ρxt+1 + εt, (1.4)

where |ρ| < 1 and εt/σε follows a standard i.i.d. Cauchy distribution. The shocks can be interpreted as back-

ward innovations, defined as εt = xt −median(xt|xt+1), since, strictly speaking, the moments of the Cauchy

distribution do not exist.

This process admits both a strong purely causal representation which is necessarily nonlinear with i.i.d.

shocks, and a weak form purely causal representation which is linear, but where the shocks are weak white

noise and not i.i.d.

More precisely, the noncausal process (xt) is a Markov process in direct time with a causal transition p.d.f.

given as [Gourieroux and Zakoian (2012), Proposition 2, and Appendix 1.11.3 of this paper]:

ft+1|t(xt+1|xt) =1

σεπ

σ2ε

σ2ε + z2

t

σ2ε + (1− |ρ|)2x2

t

σ2ε + (1− |ρ|)2x2

t+1

. (1.5)


In particular the causal conditional moments associated with the equation above exist up to order three, whereas

the noncausal conditional moments associated with the forward autoregression in (1.4), and the unconditional

moments, do not exist.

i) The causal strong autoregressive representation

In order to represent (1.4) as a causal, direct time, process in strong form, we must appeal to the nonlinear (or

generalized) innovations of the process [see Rosenblatt (2000), Corollary 5.4.2. or Gourieroux and Jasiak (2005),

Section 2.1].

Intuitively, a nonlinear error term, (ηt), of the causal process (xt) is a strong white noise where we can write

the current value of the process xt as a nonlinear function of its own past value xt−1 and ηt, say,

xt = G(xt−1, ηt), ηt ∼ i.i.d., (1.6)

where xt and ηt satisfy a continuous one-to-one relationship given any xt−1. For more details see Appendix

1.11.4.

ii) The causal weak autoregressive representation

Only the Gaussian autoregressive processes possess both causal and noncausal strong form linear autoregres-

sive representations. The noncausal AR(1) Cauchy model therefore admits only a weak form linear representation

given as [Gourieroux and Zakoian (2012), Section 2.3]:

xt = Et|t−1[xt|xt−1] + η∗t

√V art|t−1[xt|xt−1]. (1.7)

The representation is weak since (η∗t ) is a weak white noise (not i.i.d.) and η∗t√V art|t−1[xt|xt−1] = ε∗t is

conditionally heteroskedastic. That is, the weak innovations also display GARCH type effects.

The conditional moments of xt are given as:

Et|t−1[xt|xt−1] = sign(ρ)xt−1 and (1.8a)

Et|t−1[x2t |xt−1] =

1

|ρ|x2t−1 +

σ2ε

|ρ|(1− |ρ|). (1.8b)

Interestingly, from equation (1.8a), we see that for ρ > 0 the process exhibits a unit root (this is the mar-

tingale property), but is still stationary; this unit root is expected since the unconditional moments of xt do not

exist. Usually when we consider the properties of a unit root model this is within the context of models with a

nonstationary stochastic trend. However, in the example above the causal process (xt) has a unit root when being

strongly stationary. So the unit root does not generate a stochastic trend, but can generate bubbles due to the


martingale interpretation [see Gourieroux and Zakoian (2012), and the discussion in Section 1.4.3].

1.4.3 Other bubble like processes

As described in Gourieroux and Zakoian (2012), several other examples of martingale processes with bubbles

have been introduced in the literature. However, none of these processes are as easy to introduce into a general

dynamic framework as the set of mixed causal/noncausal processes.

Interestingly, these previous bubble processes are piecewise linear, but still maintain the martingale property.

For example, the bubble process introduced in Blanchard and Watson (1982) is given by:

xt+1 =

1πxt + εt+1, with probability π,

εt+1, with probability (1− π),

(1.9a)

where εt is a Gaussian error term and π ∈ (0, 1). This is a martingale process, with a piecewise linear dynamic in

that given the latent state, the parameter on the autoregression switches between zero and 1/π.

Evans (1991) proposes to model the explosive rate parameter, (θt), say, as a Bernoulli random variable,

B(1, π). Again, this process represents one that is piecewise linear, but in this case is also a multiplicative

error term model, with (ut) representing an i.i.d. process with ut ≥ 0, Et[ut+1] = 1, and with parameters

0 < δ < (1 + r)α where r > 0, π ∈ (0, 1], and

xt+1 =

(δ + 1

π (1 + r)θt+1

(xt − δ

(1+r)

))ut+1 if xt > α

(1 + r)xtut+1 if xt ≤ α.(1.10a)

In this case the regime is not latent, but is a function of the observable xt. In this way, the process is an extension

of the self-exciting threshold autoregression of Tong and Lim (1980).

For illustration we have simulated sample paths from the two bubble processes above along with the causal

AR(1) Cauchy process (see Figure 1.7). The Blanchard and Watson process is simulated by choosing π = 0.8 and

εt ∼ IIN(0, 1). The Evans process is simulated in accordance to the parameters chosen in simulating bubbles

for Table 1, on page 925, of their paper; that is, we have α = 1, δ = 0.5, 1 + r = 1.05, π = 0.75 and a sample

path of length T = 100 is generated. Moreover, ut is log-normally distributed, where ut = exp [yt − τ2/2] and

yt ∼ IIN(0, 0.052). Finally, the causal AR(1) Cauchy is simulated by choosing ρ = 0.8 in equation (1.4) and

σ = 0.1 as the scale parameter of the Cauchy distribution.

The bubble processes above were constructed for very specific theoretical reasons. The Blanchard and Watson

(1982) process is given as an example of a bubble consistent with the rational expectation hypothesis and the Evans


Figure 1.7: Plots of simulated bubble processes

-60

-40

-20

0

20

40

60

80

100

0 50 100 150 200 250 300 350 400 450 500

Blanchard and Watson model

0

0.5

1

1.5

2

2.5

3

0 10 20 30 40 50 60 70 80 90 100

Evans model

-10

0

10

20

30

40

50

60

0 50 100 150 200 250 300 350 400 450 500

Gourieroux and Zakoian model

(1991) process is given as an example of a stationary process with periodically collapsing bubbles that defies

standard linear unit root testing. Alone, and without further modification, neither process should be considered a

serious candidate to model bubbles in commodity futures price levels. On the other hand, unlike these previous

bubble processes, the AR(1) Cauchy model is easily introduced in a mixed causal/noncausal framework.

1.5 Estimation of the mixed causal/noncausal process

In this section we introduce the mixed causal/noncausal autoregressive model which will be estimated in an

attempt to model the asymmetric bubble features exhibited by the commodity futures price level data. The


model is a linear parameterization of the general mixed causal/noncausal model in (1.3) and represents the mixed

causal/noncausal analog of the causal autoregressive model. The model is discussed in the next Section 1.5.1 and

estimation of the model via maximum likelihood is discussed in Section 1.5.2.

1.5.1 The mixed causal/noncausal autoregressive model of order (r, s)

Definition 1.5.1. The mixed causal/noncausal autoregressive process of order (r, s)

Let (xt) be a univariate stochastic process generated by a linear autoregressive mixed causal/noncausalmodel with order (r, s). The process is defined by

α(L)xt = ε∗t , ε∗t ∼ i.i.d., (1.11a)

where α(L) = 1− α1L− α2L2 − . . .− αpLp, (1.11b)

such that L is the lag operator (i.e. Lxt = xt−1 and L−1xt = xt+1), p = r + s, and the operatorα(z) can be factorized as α(z) = φ(z)ϕ∗(z). We have that φ(z) (of order r) contains all its rootsstrictly outside the complex unit circle and ϕ∗(z) (of order s) contains all its roots strictly insidethe unit circle.10

Therefore, φ(z) represents the purely causal autoregressive component and ϕ∗(z) represents the purely noncausal

autoregressive component [Breidt et al. (1991)].

Moving average representation of the stationary solution

If α(L) has no roots on the unit circle, and εt belongs to a Lν-space with ν > 0 (that is E[|εt|ν ] < ∞), then

a unique stationary solution to the difference equation defined in (1.11b) exists [see Appendix 1.11.1]. We can

write:

xt = α(L)−1ε∗t =

∞∑l=−∞

γlε∗t−l, (1.12)

where the series of moving average coefficients is absolutely summable,∑∞l=−∞|γl| <∞.

The strong stationary representation is derived as follows. Let us factorize φ(L) and ϕ∗(L) as

φ(L) =

r∏j=1

(1− λ1,jL), where |λ1,j | < 1, (1.13a)

and ϕ∗(L) =

s∏k=1

(1− 1

λ2,kL), where |λ2,k| < 1. (1.13b)

The noncausal component can also be written as

ϕ∗(L) =(−1)sLs∏sk=1 λ2,k

s∏k=1

(1− λ2,kL−1). (1.14)

10To ensure the existence of a stationary solution, we assume that all roots have a modulus strictly different from 1.


We get the Taylor series expansions

(1− λ1,jL)−1 =

∞∑l=0

λl1,jLl, (1.15a)

and (1− λ2,kL−1)−1 =

∞∑l=0

λl2,kL−l, (1.15b)

which are valid because the roots are such that |λ1,j | < 1, ∀j and |λ2,k| < 1, ∀k. Thus we get

xt = φ(L)−1ϕ∗(L)−1ε∗t =

∏sk=1 λ2,k

(−1)sLs1∏r

j=1(1− λ1,jL)∏sk=1(1− λ2,kL−1)

ε∗t

=

∏sk=1 λ2,k

(−1)sLs

r∏j=1

( ∞∑l=0

λl1,jLl

)s∏

k=1

( ∞∑l=0

λl2,kL−l

)ε∗t =

∞∑l=−∞

γlε∗t−l. (1.16)

An alternative representation

Since such a representation in (1.11a) is defined up to a scale factor on ε∗t , another equivalent representation is

given as

Φ(L)xt = φ(L)ϕ(L−1)xt = εt, (1.17a)

where ϕ(L−1) = 1− ϕ1L−1 − ϕ2L

−2 − . . .− ϕsL−s, (1.17b)

φ(L) = 1− φ1L− φ2L2 − . . .− φrLr, (1.17c)

and (εt) is the sequence of i.i.d. random variables defined as εt = −(1/ϕ∗sLs)ε∗t = −(1/ϕ∗s)ε

∗t+s.

We can always map the parameters from model (1.17) to (1.11) since we have −(1/ϕ∗sLs)ϕ∗(L) = ϕ(L−1),

where the coefficients of ϕ(L−1) are given as ϕi = −ϕ∗s−i/ϕ∗s for i = 1, . . . , s − 1, and ϕs = 1/ϕ∗s for i = s,

and the roots of ϕ∗(L) and ϕ(L−1) are inverses (in the sense that ϕ∗(z) = ϕ(1/z) = 0 for some complex z

where |z| < 1).

From the original representation in equation (1.11) we have that

α(L)xt = ε∗t ⇔ xt − α1xt−1 − · · · − αpxt−p = ε∗t , (1.18)

and so under this standardization, the autoregressive coefficient associated with the current time period, xt, is

normalized to one (i.e. α0 = 1).


However, given the alternative representation we have that

1

ϕ∗sxt+s +

ϕ∗1ϕ∗sxt+(s−1) + · · ·+ xt + · · ·+ φr−1xt−(r−1) + φrxt−r =

ϕsxt+s + ϕs−1xt+(s−1) + · · ·+ xt + · · ·+ φr−1xt−(r−1) + φrxt−r = εt, (1.19)

and so under this alternative standardization, the autoregressive coefficient chosen as equal to 1 does not coincide

with the most recent time period, t+ s; rather the standardization applies the autoregressive coefficient equal to 1

to the “intermediate” value xt, where the autoregression depends also on s future lags and r past lags.

1.5.2 ML estimation of the mixed causal/noncausal autoregressive model

In estimating the parameters of the mixed causal/noncausal autoregressive process in Section 1.5.1, we can apply

the usual maximum likelihood estimation (MLE) approach. The likelihood function represents the distribution

of the sample data, conditional on the parameters of the model. The maximum likelihood method estimates the

parameters of the model as the values of the parameters which maximize this likelihood function. Let θ represent

the vector of parameters, including the vectors of causal and noncausal autoregressive coefficients, φ and ϕ, and

the parameters characterizing the fat tailed, t-distributed, error term,11 that are its degree of freedom parameter λ

and scale σ. The maximum likelihood estimator is given as

θmle = argmaxθ

f(xT |θ), (1.20)

where

f(xT |θ) = f(xT , xT−1, xT−2, . . . , x1|θ)

= f(xT |xT−1, . . . , x1; θ)f(xT−1|xT−2, . . . , x1; θ) . . . f(x3|x2, x1; θ)f(x2|x1; θ)f(x1|θ), (1.21)

and xT = xT , xT−1, xT−2, . . . , x1 is the joint vector of sample data.

i) Approximation of the likelihood in the causal autoregressive model

In causal time series analysis of autoregressive models, say for the autoregressive model of order p, we know

that the likelihood function can be approximated by neglecting the effect of starting values. For example, the

11We employ either a t-distributed or skew t-distributed error term in order to identify the mixed causal/noncausal model. See Appendix1.11.5.


causal AR(p) model’s likelihood:

f(xT |θ) = f(xT , xT−1, xT−2, . . . , x1|θ)

= f(xT |xT−1, . . . , xT−p; θ)f(xT−1|xT−2, . . . , xT−p−1; θ) . . . f(x3|x2, x1; θ)f(x2|x1; θ)f(x1|θ), (1.22)

can be approximated by neglecting the conditional densities of the initial values xt for all t ≤ p. For large sample

size, T , this approximation error becomes negligible and the estimator obtained by maximizing the approximated

likelihood is still asymptotically efficient.

ii) Approximation of the likelihood in the mixed causal/noncausal autoregressive model

The maximum likelihood approach can also be used in the general framework of the mixed causal/noncausal

processes, that is the parameters estimated by:

θmle = argmaxθ

f(xT |θ), . (1.20)

Under standard regularity conditions, including the strong stationarity of the process and appropriate mixing

conditions, the ML estimator is consistent and its asymptotic properties, that is its speed of convergence and

asymptotic distribution, are easily derived [see Breidt et al. (1991)].

However, in practice the closed form expression of the likelihood, f(xT |θ), is difficult to derive and the

likelihood function has to be approximated, without loosening the asymptotic properties of the ML estimator.

Two approaches are typically suggested:

i) Take the autoregressive expression α(L)xt = εt, and approximate the likelihood by:

T∏t=p+1

fε(α(L)xt|β), (1.23)

where β are the parameters characterizing the distribution of the error. Such an approximation is wrong and

leads in general to an inconsistent estimator. The reason is as follows. Since the approximation is based on

the autoregression:

xt − α1xt−1 − α2xt−2 − · · · − αpxt−p = εt, (1.24)

the approximation above is valid if εt is independent of the explanatory variables, xt−1, . . . , xt−p. But in

a mixed model with a noncausal component, εt appears in the moving average representation of xt−1, . . . ,

and xt−p, which creates dependence. This is the well known error-in-variables model encountered in linear

models and usually solved by introducing instrumental variables, with, in general, a loss of efficiency.


ii) Consider the moving average expression of xt =∑∞l=−∞ aiεt−i with the identification restriction a0 = 1.

Set to zero the values of the noise corresponding to the indices outside the observation period, 1, 2, . . . , T,

that is, εt = 0, if t ≤ 0 and if t ≥ T + 1. Thus we truncate the moving average representation into:

xt ≈T−1∑

i=−T+t

ai(α)εt−i =

T∑τ=1

at−τ (α)ετ , for t = 1, . . . , T, (1.25)

where the dependence on the autoregressive parameters is explicitly indicated. We get a linear system of

equations, which relates the observations x1, . . . , xt and the errors ε1, . . . , εT in a one-to-one rela-

tionship. Therefore, the joint distribution of x1, . . . , xt can be deduced from the joint distribution of

ε1, . . . , εT , which has a closed form by applying the change of variables Jacobian formula. However, this

approach is difficult to implement numerically, since the matrix of the transformation, A(α), with generic

elements at−τ (α), t, τ = 1, . . . , T has a large T × T dimension. This makes difficult, first the inversion of

this matrix, and second the numerical computation of its determinant.

This explains why a methodology has been introduced to circumvent this numerical difficulty, which explains

how to approximately invert this matrix and compute the determinant, by using appropriately both the causal and

noncausal components [see Breidt et al. (1991), Lanne and Saikkonen (2008), and Appendix 1.12 of this paper].

This approximated likelihood is used in our application to commodity futures prices. The approximation requires

knowledge of the causal and noncausal orders (r, s) respectively. If they are unknown, the approach is applied to

all pairs of orders (r, s) such that r + s = p as given. The selected orders are the ones which minimize the AIC

criterion, based on the log-likelihood value.

More precisely, Lanne and Saikkonen (2008) note that the matrixA(α) can be approximately written as

A(α) ≈ Ac(φ)Anc(ϕ∗), (1.26)

where Ac(φ) (resp. Anc(ϕ∗)) depends on the causal (resp. noncausal) autoregressive coefficients only, and is

lower (resp. upper) triangular with only 1’s on the diagonal. Therefore, the Jacobian

|det (A(α))| ≈ |det (Anc(φ))||det (Anc(ϕ∗))| = 1 (1.27)

and can be neglected. Therefore, the likelihood function can be approximated by

T−s∏t=r+1

fε(ϕ(L−1)φ(L)xt;λ, σ), (1.28)


where θ = φ,ϕ, λ, σ represents the parameters of the model, that is, the vectors of causal and noncausal

autoregressive coefficients respectively, and the t-distribution degree of freedom and scale parameter assumed on

(εt).

They show that the only autoregressive representation which leads to consistent estimators is the representation

with the autoregressive coefficient equal to 1 for xt with r lagged values before, and s lagged values afterwards,

as given above in the autoregressive equation (1.19).

Example 1: Causal AR(1) process

Let us consider the stationary causal AR(1) process

xt = α1xt−1 + εt, where |α1| < 1. (1.29)

This is the usual case and so we can employ the MLE to estimate α1 by maximizing the approximate likelihood

function∏Tt=2 fε(xt − α1xt−1). This case does not present a problem since we already have the coefficient in

front of xt equal to 1.

Example 2: Noncausal AR(1) process

However, given the stationary noncausal AR(1) process

xt = α1xt−1 + εt, where |α1| > 1, (1.30)

the estimator which maximizes the approximate likelihood function∏Tt=2 fε(xt−α1xt−1) is now biased. Indeed,

since xt can be written as the noncausal moving average xt =∑∞j=0

(1α1

)jε∗t+1+j , there now exists a dependence

between xt and xt−1.

The methodology leading to consistent estimation consists in the case of regressing xt−1 on xt, instead of

regressing xt on xt−1. We can rewrite the noncausal regression above as

xt =1

α1xt+1 −

1

α1εt+1 =

1

α1xt+1 + ε∗t+1 where |α1| > 1, (1.31)

which now restores the independence between the regressand and the regressor, and so the MLE which maximizes∏Tt=2 fε∗(xt −

1α1xt+1) is asymptotically unbiased.

1.5.3 Estimation results

In this Section we will evaluate estimation results from the mixed autoregressive model of order (r, s) as applied

to the 25 commodity futures price level series. Estimation of the model parameters numerically optimizes the


approximated likelihood function discussed in the last section. As in Lanne and Saikkonen (2008) and Lanne,

Luoto, and Saikkonen (2012), we assume the regularity conditions of Andrews et al. (2006) are satisfied, which

require the likelihood to be twice differentiable with respect to both xT and θ. The approximated likelihood

algorithm is computed in Fortran and the optimization of the likelihood function is performed using a set of

Fortran optimization subroutines called the PORT library, designed by David M. Gay at Bell Laboratories [Gay

(1990)].

As in Section 1.3 where the linear causal ARMA model with Gaussian innovations was shown to inadequately

capture the features of the price level data, we will again employ the AIC criterion as a measure of model fit, along

with Ljung-Box statistics testing the hypothesis that the innovations exhibit no linear autocorrelation. In this way,

we will consider the best fitting linear causal ARMA model, with Gaussian innovations, from Section 1.3 as a

benchmark model.

Table 1.5.i presents the results of maximum likelihood estimation. The mixed AR model orders, (r, s), were

selected via AIC among a possible set of (r, s) values such that r ≤ 10 and s ≤ 10. The first row of the results

for each series represents the benchmark ARMA model, with Gaussian innovations, from Section 1.3, while the

second and third rows represent the mixed AR(r, s) model with both t-distributed and skew t-distributed errors,

respectively. Recall that the mixed causal/noncausal model is only identified for non-Gaussian error terms. The

lags column represents the number of lags included in the Ljung-Box statistic, where p-values are provided in

their respective columns. Finally, an ’x’ marks the model with the lowest normalized AIC.

The estimation results suggest that the mixed causal/noncausal model improves model fit over the baseline

causal ARMA model, with Gaussian innovations. When the models are nested, we employ likelihood ratio (LR)

tests. In every case the mixed causal/noncausal model improves model fit significantly at the 1% significance

level.

In comparing the skewed t-distributed error term mixed causal/noncausal model to the standard t-distribution

error term model, the results vary by series. In most cases the skewed t-distribution improves model fit and passes

a LR test at the 1% level. Moreover, orange juice, lumber, silver, copper, light crude oil, and gas oil also pass at

the 5% level and coffee passes at the 10% level. Series that do not pass LR tests at the 10% level are soybean

meal and oil, sugar, corn, cotton, rice, gold, palladium, natural gas, and lean cattle, suggesting that there is little

gain in employing a skewed t-distribution on the innovations of these mixed models.

Interestingly, the estimated t-distribution degree of freedom parameter, λ, for the mixed causal/noncausal

model error terms range between near 1 (i.e. Cauchy distributed) to around 3 or so in most cases, which suggests

bubble like behaviour as discussed in Gourieroux and Zakoian (2012). The only exceptions to this are found in

lumber (λ ≈ 3.88), gasoline RBOB (λ ≈ 4.93), and live cattle (λ ≈ 3.39).

Moreover, an examination of the roots of the lag polynomials implied by the estimated parameters also con-


firms the partly noncausal nature of the series. If we accept only the statistically significant estimated parameters

12 and solve for the roots of the implied causal and noncausal lag polynomials, φ(L) and ϕ(L−1) (from (1.17a)),

we find that the roots of both appropriately lie outside the unit circle.13 Of course, if the data generating process

was purely causal, none of the lags of the noncausal polynomial, ϕ(L−1), should be statistically significant.

Moreover, if we fit the best (according to the AIC criterion) purely causal ARMA model, with t-distributed

error terms instead of Gaussian ones, we often find that the estimated roots of the causal lag polynomial lie inside

the unit circle. This suggests misspecification of the noncausal component, as well as the fact that the noncausality

is not identified in the purely causal ARMA model with Gaussian innovations.

For reference we provide tables with all of the roots of the lag polynomials of both the causal ARMA models

of order (p, q), with t-distributed innovations, and the mixed causal/noncausal AR models of order (r, s) (see

Tables 1.7.i to 1.7.iii within Appendix 1.14).

For example, estimating purely causal ARMA models with t-distributed innovations suggest the following

results: wheat, coffee, rice, gold, platinum, all the energy series except natural gas, and lean hogs all share at least

one root with absolute value less than one in their α(L)β(L) = δ(L) lag polynomial (that is α(L)

β(L) = δ(L) in the ARMA

model δ(L)xt = εt, where α(L) and β(L) are the AR and MA lag polynomials respectively), suggesting that this

polynomial could be factorized and then estimated as a mixed causal/noncausal model (instead of the traditional

differencing technique employed).

Furthermore, the very large valued roots of the causal polynomial for light crude oil, gas oil, and heating oil,

suggest that these series may be better represented as purely noncausal since these large causal roots have little

effect on the causal impulse response. This result is confirmed by looking at the mixed causal/noncausal model

roots of light crude oil, but not for gas or heating oil which have causal polynomial roots relatively close to 1.

Finally, the mixed causal/noncausal representation for soybeans suggests that the process may be better modeled

as purely causal, while the results for cotton, live cattle, and lean hogs suggest they may be purely noncausal.

To summarize, our results suggest that most of the futures price series exhibit much better in-sample model

fit, according to the AIC criterion, when modeled by a mixed causal/noncausal autoregressive specification that

takes into account their possible noncausal components. Moreover, this noncausality is unidentified in the purely

causal ARMA model with Gaussian innovations. Finally, estimation of purely causal ARMA models with fat

tailed, t-distributed, innovations reinforces the series’ noncausal nature, as often the causal lag polynomial roots

lie inside the complex unit circle.

12Tested at the 5% level, assuming Normally distributed parameters and employing the inverse of the observed Hessian matrix at the MLEestimated value as the parameter covariance matrix.

13Which implies that (1.11a), α(L) = φ(L)ϕ∗(L), is such that the roots of φ(L) lie strictly outside the complex unit circle while those ofϕ∗(L) lie strictly inside.


Table 1.5.i: Estimation results of mixed causal/noncausal AR(r, s) models

Series p/r q/s AIC Log-likelihood Ljung-Box λ

Soybean meal 6 2 52395.000 -26188.500 0.152 ∞

x 10 10 48208.261 -24081.130 0.007 2.070

10 10 48210.118 -24081.059 0.007 2.072

Soybean oil 8 3 11859.050 -5917.523 0.919 ∞

x 10 10 9211.876 -4582.938 0.135 2.455

10 10 9213.688 -4582.844 0.138 2.455

Soybeans 9 2 73548.210 -36762.110 0.521 ∞

10 10 69444.844 -34699.422 0.000 2.073

x 10 10 69438.354 -34695.177 0.000 2.086

Orange juice 4 3 42121.610 -21052.800 0.395 ∞

10 10 38686.959 -19320.480 0.378 2.326

x 10 10 38683.919 -19317.960 0.389 2.331

Sugar 10 2 7842.392 -3908.196 0.999 ∞

x 2 2 1549.499 -767.750 0.000 1.702

2 2 1551.289 -767.645 0.000 1.702

Wheat 7 2 67069.470 -33524.740 0.998 ∞

5 5 61896.849 -30935.424 0.000 2.028

x 5 5 61880.290 -30926.145 0.000 2.047

Cocoa 8 3 94368.760 -47172.380 0.716 ∞

2 1 91804.882 -45896.441 0.000 2.558

x 10 10 91586.110 -45769.055 0.003 2.584

Coffee 4 2 48866.800 -24426.400 0.064 ∞

10 10 43731.886 -21842.943 0.014 1.923

x 10 10 43730.300 -21841.150 0.012 1.925

Corn 7 3 59385.840 -29681.920 0.625 ∞

x 2 3 53647.827 -26815.913 0.776 1.811

2 3 53649.243 -26815.622 0.783 1.811

Cotton 10 0 32760.780 -16369.390 1.000 ∞

x 1 3 27005.831 -13495.916 0.000 2.455

1 3 27007.812 -13495.906 0.000 2.455


Table 1.5.ii: Estimation results of mixed causal/noncausal AR(r, s) models

Series p/r q/s AIC Log-likelihood Ljung-Box λ

Platinum 8 2 55936.820 -27957.410 0.129 ∞

10 10 51667.822 -25810.911 0.000 1.572

x 10 10 51644.800 -25798.400 0.000 1.585

Rice 10 3 -4799.022 2413.511 0.958 ∞

x 1 3 -7173.685 3593.842 0.013 2.076

1 3 -7172.345 3594.173 0.013 2.075

Lumber 8 3 44027.920 -22001.960 1.000 ∞

10 10 42939.948 -21446.974 0.562 3.874

x 10 10 42937.244 -21444.622 0.546 3.876

Gold 0 3 102914.500 -51453.270 n/a ∞

x 10 10 56917.739 -28435.869 0.000 1.317

10 10 56919.621 -28435.811 0.000 1.318

Silver 9 3 7424.036 -3699.018 0.935 ∞

10 10 -7052.297 3549.149 0.000 1.063

x 10 10 -7056.283 3552.141 0.000 1.066

Palladium 9 3 48209.690 -24091.840 0.992 ∞

x 8 8 42569.544 -21265.772 0.000 1.225

8 8 42571.492 -21265.746 0.000 1.225

Copper 10 0 34719.500 -17348.750 1.000 ∞

10 10 30533.482 -15243.741 0.000 1.349

x 10 10 30529.777 -15240.889 0.000 1.354

Light crude oil 7 2 22244.110 -11112.060 0.949 ∞

1 3 17297.702 -8641.851 0.015 1.409

x 1 3 17295.206 -8639.603 0.014 1.415

Heating oil 9 2 34465.280 -17220.640 0.998 ∞

x 10 10 30808.001 -15381.000 0.042 1.535

8 8 30841.794 -15400.897 0.000 1.538

Brent crude oil 7 2 18807.920 -9393.960 0.901 ∞

10 10 15081.643 -7517.822 0.000 1.458

x 10 10 15073.528 -7512.764 0.000 1.462


Table 1.5.iii: Estimation results of mixed causal/noncausal AR(r, s) models

Series p/r q/s AIC Log-likelihood Ljung-Boxa λf

c Platinum 8 2 55936.820 -27957.410 0.129 ∞d 10 10 51667.822 -25810.911 0.000 1.572

e xb 10 10 51644.800 -25798.400 0.000 1.585

Gas oil 5 3 44142.240 -22062.120 0.922 ∞

10 10 41116.045 -20535.023 0.259 1.566

x 10 10 41112.456 -20532.228 0.261 1.574

Natural gas 3 2 -4178.268 2095.134 0.226 ∞

x 1 1 -7772.315 3891.158 0.017 1.666

1 1 -7771.454 3891.727 0.018 1.666

Gasoline RBOB 5 3 11715.320 -5848.658 0.988 ∞

2 1 11535.858 -5761.929 0.050 4.662

x 2 1 11526.267 -5756.133 0.056 4.925

Live cattle 6 1 22771.400 -11377.700 0.986 ∞

10 10 20427.885 -10190.943 0.915 3.331

x 8 8 20426.530 -10193.265 0.873 3.392

Lean hogs 3 2 23567.630 -11777.810 0.704 ∞

0 2 18929.149 -9459.574 0.572 2.728

x 0 2 18922.375 -9455.188 0.570 2.737

a The Ljung-Box statistics are given as p-values, where the lag parameter chosen is the log sample size, ln(T ).

b The ’x’ row for each series denotes the model with the lowest AIC.

c The first row in each series is the causal ARMA(p, q) model with Gaussian innovations estimated in Section

1.3.

d The second row is the mixed causal/noncausal AR(r, s) with t-distributed errors.

e The third row is the same model but with skew t-distributed errors.

f The λ column indicates the estimated degree of freedom parameter for the error term distribution–in the

skewed t-distributed case this value represents the sum of the two skew parameters. See Appendix 1.11.5.

1.6 Comparison of the estimated unconditional distributions

Another way to evaluate the mixed causal/noncausal autoregressive model is by comparing its model based un-

conditional distribution by sample histogram. Histograms are estimated for both the purely causal ARMA and


mixed causal/noncausal autoregressive models, both employing t-distributed error terms, by simulating long sam-

ple paths of length T = 200000, given the model parameters estimated by MLE in Section 1.5.3.

The mixed causal/noncausal autoregressive model seeks to capture both the asymmetries and bubble features

present in commodity futures prices. The transversal asymmetry and bubble features present in the series can

be examined visually by considering the sample histograms of the price series presented in Figures 1.11.i to

1.11.iv in Appendix 1.14. Note the long, positively-skewed, tails many of the series exhibit, illustrating how these

price series tend to spend most of the time in the shallow troughs, occasionally interrupted by brief, but dramatic,

positive bubbles.

The metric employed in comparing the estimated unconditional distributions is the Kullback-Leibler diver-

gence measure, which is a non-symmetric measure of the difference between two probability distributions P

and Q. Specifically, the Kullback-Leibler measure, from continuous distributions Q to P, denoted KL(Q,P ) =∫∞−∞ ln

(p(x)q(x)

)p(x)dx, is the measure of the information lost when we use Q to approximate P. 14 Since the

Kullback-Leibler measure is “information monotonic”, as an ordinal measure of making comparisons it is invari-

ant to the choice of histogram bin size. Table 1.6 reports the Kullback-Leibler measures of the sample histogram

densities for both KL(P,Q) and KL(Q,P ) where p(x) denotes the estimated p.d.f. of the sample data and the

q(x)’s are model based estimates from the simulated sample paths of the purely causal and mixed causal/noncausal

autoregressions.

Table 1.6 is broken into two sections: the two left columns report the Kullback-Leibler measure where the

estimated models are used to approximate the sample data. In this case if the sample path density has zero support

for some region in its domain, it does not punish the prospective model density for allocating too much (resp. too

little) probability to this region since this component of the Kullback-Leibler sum is zero. The two right columns

report the opposite case where the sample path density is used to approximate the estimated models; in this case,

if the model density has zero support in some region of its domain then the sample path density isn’t penalized

for allocated too much (resp. too little) probability to this region. Finally, smaller values indicate less information

lost by the approximation and are preferred.

The results of these comparisons suggest the following. First, the Kullback-Leibler measures show that the

unconditional distributions generated by the causal ARMA models represent a poor fit to the sample data. The

ARMA model seems unable to produce the sharp bubble like behaviour we see in most of the series and the

shape of its unconditional density is often much too uniform. It does not exhibit long, positively skewed, tails as

are present in many of the estimated histograms of the commodity futures prices as provided in Figures 1.11.i

to 1.11.iv in Appendix 1.14. Moreover, we often find that the sample paths from the causal ARMA models are

14In employing estimated sample histograms we use the discretized version of the Kullback-Leibler formula where areas of zero supportare padded with 1−315.


Table 1.6: Kullback-Leibler divergence measures

KL(Q,P)a KL(P,Q)Series ARMA MIXED ARMA MIXEDSoybean meal n.s.b 00.329 n.s. 97.216Soybean oil 01.965 00.316 495.751 55.752Soybeans n.s. 00.310 n.s. 49.584Orange juice 00.976 00.216 351.966 60.033Sugar 01.768 00.500 326.343 168.821Wheat 00.535 00.427 44.699 32.956Cocoa 00.625 01.247 230.260 37.961Coffee 04.519 00.216 703.097 81.218Corn 01.526 00.549 185.980 144.244Cotton 00.808 12.710 114.104 25.918Rice 00.429 00.311 59.220 123.030Lumber 00.149 00.136 07.610 08.477Gold n.s. uns.c n.s. uns.Silver n.s. uns. n.s. uns.Platinum n.s. 00.662 n.s. 96.789Palladium n.s. 01.368 n.s. 440.585Copper n.s. 00.832 n.s. 173.295Light crude oil n.s. 00.813 n.s. 202.916Heating oil n.s. 01.043 n.s. 326.858Brent crude oil n.s. 00.759 n.s. 118.503Gas oil n.s. 00.709 n.s. 132.528Natural gas 00.906 00.753 303.694 325.575Gasoline RBOB 01.429 00.261 483.674 08.649Live cattle 00.562 18.227 31.469 76.491Lean hogs 02.649 00.032 640.295 03.308average 01.346 01.858 284.154 121.335selective averaged 01.206 00.650

a P represents the sample data.b “n.s.” stands for nonstationary, i.e. the simulations from the causal linear model

were explosive.c “uns.” within the context of the mixed causal/noncausal models implies that the

simulated sample paths were, for a lack of better words, “unstable”: highly er-ratic with extremely long tails and extremely irregular, almost “chaotic” type be-haviour. In general, while stationary, models with “uns.” listed represented poorcandidates as having come from the data’s DGP.

d The selective average omits the extreme outlying cases highlighted in bold.


explosive, due to the noncausal root in their estimated causal lag polynomials.

The results from the left hand columns of Table 1.6 suggest a few distinct outlying Kullback-Leibler measures.

For example, cotton and live cattle are extremely large compared to the other series’ measures in the case of the

mixed causal/noncausal autoregressive model, and coffee represents an outlier in the case of the causal ARMA

models. Given the presence of these outliers, we calculate both the average Kullback-Leibler measure across all

series and the average omitting these outliers. Given this selective average, we find that, in both the left and right

columns (i.e. in the case of both KL(Q,P ) and KL(P,Q), respectively), the mixed causal/noncausal model

represents a better fit to the sample data than the purely causal ARMA model.

Finally, Figure 1.8 provides an example of the estimated unconditional densities for cocoa and coffee, respec-

tively.

Figure 1.8: Estimated unconditional densities, Cocoa and Coffee

1.7 Forecasting the mixed causal/noncausal model

This section will first consider the problem of computing the predictive conditional density of the mixed

causal/noncausal model, when the information set includes only the past values of the time series data up to some

time t, say Ft = xt, xt−1, . . . , x1. We then evaluate the ability of the mixed causal/noncausal model to not

only fit the training sample, but also its ability to forecast out of sample.


1.7.1 The predictive distribution

Let us consider the general stochastic process:

xt = h(. . . , εt−1, εt, εt+1, . . . ), where εt ∼ i.i.d. (1.32)

Moreover, let Ft = xt, xt−1, . . . , x1 represent the information set generated by the stochastic process up to and

including time t.

The best nonlinear forecasts, at date t, for a given horizon h, can be deduced from the conditional distribution

of xt+h, given Ft. More precisely, if a(xt+h) is a square integrable transformation of xt+h, then its best predictor

is simply E[a(xt+h)|Ft] =∫a(xt+h)ft+h|t(xt+h|Ft)dxt+h.

In our framework, the standard moments may not exist and so we cannot choose to predict a(xt+h) = xt+h

for example. An alternative approach can be to compute the prediction intervals by considering the quantiles of

the predictive distribution. This is the solution adopted here.

Lanne, Luoto, and Saikkonen (2012) suggest a means whereby we can simulate these quantiles. Their nu-

merical algorithm is discussed in Appendix 1.13. However, this method is computationally demanding and not

necessarily the most straightforward method. Therefore, we begin a discussion below that considers the problem

from first principles.

1.7.2 Equivalence of information sets

Consider the general mixed causal/noncausal model from (1.17), with causal order r and noncausal order s. It is

clear that given the information set Ft = x1, . . . , xt, this is equivalent to knowing,

Ft ≡ x1, . . . , xr, ur+1, . . . , ut, (1.33)

since ut = φ(L)xt [see the Appendix 1.12]. Note that ut represents a shock to the process xt, which is an autore-

gressive function of xt, since ut = φ(L)xt, but where the ut’s are not i.i.d. Rather ut is noncausal autoregressive

since we have that ϕ(L−1)ut = ϕ(L−1)φ(L)xt = εt, where εt is i.i.d.

Knowing the latter information set in (1.33) is also equivalent to knowing,

Ft ≡ x1, . . . , xr, εr+1, . . . , εt−s, ut−s+1, . . . , ut, (1.34)


since εt = ϕ(L−1)ut = ϕ(L−1)φ(L)xt. Moreover, this information is also equivalent to,

Ft ≡ v1, . . . , vr, εr+1, . . . , εt−s, ut−s+1, . . . , ut, (1.35)

where ϕ(L−1)xt = vt. Therefore, for the process ut that is noncausal of order s, predicting ut+1 based on the

information set Ft, is equivalent to predicting it based on the information subset ut−s+1, . . . , ut, since the

v1, . . . , vr, εr+1, . . . , εt−s elements are independent of the future values ut+1, . . . , ut+h, for some forecast

horizon h.

Therefore, in establishing the predictive density of the mixed causal/noncausal process xt, we can focus our

attention on the problem of predicting the noncausal component ut+1 conditional on the past information set

Ft = ut−s+1, . . . , ut, since there is a direct relationship between the predictive distributions of ut+1 and xt+1

in the sense that,

fxt+1|t(xt+1 − µt|xt, . . . , x1) = fxt+1|t(φ(L)xt+1|xt, . . . , x1) (1.36a)

= fut+1|t(ut+1|xt, . . . , x1) (1.36b)

= fut+1|t(ut+1|ut, . . . , ut−s+1, εt−s, . . . , εr+1, xr, . . . , x1) (1.36c)

= fut+1|t(ut+1|ut, . . . , ut−s+1), (1.36d)

where µt = φ1xt + φ2xt−1 + · · ·+ φrxt−(r−1). (1.36e)

Since the change of variables implies a Jacobian determinant of 1, the conditional density of xt+1 is just a relo-

cation of the conditional density of ut+1. Here, µt represents a location parameter and so ut+1 = xt+1 − µt =

φ(L)xt+1. Therefore, by simulating the quantiles of fut+h|t(ut+h|Ft), we are able to generate prediction intervals

for xt+h.

The prediction problem of the noncausal process ϕ(L−1)ut+1 = εt+1, based on past information set Ft,

must be considered with some care. In this way we first consider some simple examples. Note that while ut is a

noncausal autoregressive process, we desire the causal predictor which is based on the past information set Ft,

and this predictor is generally nonlinear for non-Gaussian εt.

1.7.3 Examples: the causal prediction problem of the noncausal process

Example 1: AR(0, 1) case

Let us consider the prediction problem for the purely noncausal model of order s = 1. We get, xt+1 = ut+1 =

ϕ1ut+2 + εt+1, and εt+1 ∼ i.i.d. In this case we desire the predictive density fxt+1|t(xt+1|Ft), based on the past


values of the process Ft = xt, . . . , x1, but where the process (xt) is noncausal.

Since xt = ut and s = 1, the predictive density fxt+1|t(xt+1|xt) depends only on the past information set

xt = ut, and by Bayes Theorem we get

fxt+1|t(xt+1|xt) = fxt|t+1(xt|xt+1)fx(xt+1)/fx(xt), (1.37)

where fx(·) denotes the stationary distribution of the process (xt). We already know the noncausal transition den-

sity fxt|t+1(xt|xt+1), since it is defined by our linear model and our assumption on the shocks εt, since ut = xt,

and ϕ(L−1)ut = εt, the conditional density of ut given ut+1 is the same as the density of εt, up to a location

parameter. However, what is not clear is how to deal with the stationary distribution fx(·) since its analytical

expression is unknown in the general case [although it has been derived where εt ∼ Cauchy(0, σ2) in Gourier-

oux and Zakoian (2012)]. Lanne, Luoto, and Saikkonen (2012) suggest a means whereby we can circumvent

this problem by “enlarging the space” of random variables (see the Appendix 1.13). This very computationally

intensive approach is not the most direct, as we shall show below.

One alternative that works quite well when the order of the noncausal polynomial is low, is to simply ap-

proximate the stationary distribution fx(xt+1) in (1.37) by means of a kernel smoother. For example, given the

stationary nature of the data, xt = ut, a consistent estimator of fx(·) is given by the kernel density estimator:

fx(xt) ≈1

Th

T∑τ=1

K

(xt − xτ

h

), (1.38)

where h > 0 is an appropriately chosen smoothing parameter defining the bandwidth andK(·) is a kernel function,

for instance a symmetric function that integrates to one. The Epanechnikov Kernel K(x) = 34 (1−x2)1|x|≤1, can

be shown to be efficient in the mean squared error sense [see e.g. Epanechnikov (1969)].

Example 2: AR(0, s) with s > 1

Let us now consider a larger noncausal autoregressive order, where we still face the purely noncausal prediction

problem ut+1 = xt+1. Let the noncausal lag polynomial be of order s: ϕ(L−1) = 1− ϕ1L−1 − · · · − ϕsL−s.

Again, let us express the predictive density in terms of Bayes theorem, where, since ut = xt is a noncausal

autoregressive process of arbitrary order s, the prediction depends only on the subset of information given by

Ft = xt, . . . , xt−s+1,

fxt+1|t,...,t−s+1(xt+1|xt, . . . , xt−s+1)

=fxt|t+1,...,t+s

(xt−s+1|xt−s+2, . . . , xt, xt+1)fx(xt+1, xt, . . . , xt−s+2)

fx(xt, xt−1, . . . , xt−s+1)(1.39)


Again, fx(xt−s+1|xt−s+2, . . . , xt, xt+1) is known from our linear noncausal autoregressive model of order s.

However, it remains unclear how to deal with the joint stationary density, fx(·), of a sequence of s successive

values of the process, especially for a larger dimension of s.

Indeed, the kernel estimator will prove problematic for large noncausal orders, s, since we now face a multidi-

mensional smoothing problem. As the dimension of the smoothing problem increases, much more data is required

in order to get a reliable estimate of this joint density.

1.7.4 A Look-Ahead estimator of the predictive distribution

Gourieroux and Jasiak (2013) suggest a direct solution to the problem of computing the predictive density

fxt+1|t(·) of the noncausal process when the dimension s is relatively large. The method relies on the “Look-

Ahead” estimator of the stationary density fx(·) [see Glynn and Henderson (1998) for the introduction of this

estimator and Garibotti (2004) for an application]. First we describe the estimator in the univariate framework

where the order of the noncausal polynomial is s = 1, and then provide an analog for the case where s > 1.

Markov process

The Look-Ahead estimator, introduced by Glynn and Henderson (1998), is a relatively simple method which

allows us to estimate the stationary distribution of a Markov process, if it exists. Take for example, the Markov

process, (xt), discussed in Example 1 above, with unique invariant density fx(·), and transition density fxt|t+1(·)

as expressed in (1.37). This Markov transition density satisfies the Kolmolgorov equation,

fx(x∗t ) =

∫fxt|t+1

(x∗t |xt+1)fx(xt+1)dxt+1,∀x∗t , (1.40)

where x∗t denotes the generic argument of the stationary density. Therefore, given a finite sample from the

stationary process, (xτ )tτ=1, we can approximate the stationary density by

fx(x∗t ) =

t−1∑τ=0

fxt|t+1(x∗t |xτ+1),∀x∗t , (1.41)

where fxt|t+1(x∗t |xt+1) is known explicitly from our linear noncausal autoregressive model.


Markov process of order s

For larger noncausal order s > 1, the result is analogous. The two stationary distributions in the numerator and

denominator, fx(·), can be estimated by the Look-Ahead estimator as

fx(x∗t ) =

t−s∑τ=0

lxt|t+1(x∗t |xτ+s), (1.42)

where xt = xt, xt−1, . . . , xt−s+1. The density above is more easily understood as the factorization of the joint

noncausal transition density,

lxt|t+1(xt, . . . , xt−s+1|xt+1, . . . , xt+s) =

s−1∏j=0

fxt|t+1,...,t+s(xt−j |xt+1−j , . . . , xt+s−j), (1.43)

whose terms are known for all j, given the linear noncausal autoregressive model (they are equal to the density of

εt, up to a location parameter).

1.7.5 Drawing from the predictive distribution by SIR method

Given the approximate expression for the stationary density functions, fx(·), of both the numerator and denomina-

tor in (1.39), provided by the Look-Ahead estimator, we are now free to draw samples from the entire predictive

density fxt+1|t(·) directly. One way this can be accomplished is by means of the (SIR) Sampling Importance

Resampling technique [see Rubin (1988), and Smith and Gelfand (1992)].

The SIR method is essentially a reweighted bootstrap simulation. Suppose we have access to some drawings

from the continuous probability density f(x), say x1, . . . , xN, but we are unable to draw samples ourselves.

The bootstrap procedure directs us to resample from the set x1, . . . , xN, each draw having probability 1/N . The

resulting resampled set is then an approximation to draws from f(x), with the approximation error approaching

zero as N →∞. Indeed, for any resampled draw x we have,

Pr(x ≤ a) =1

N

N∑i=1

1xi≤a →n→∞ Ef [1x≤a] =

∫ a

−∞f(x)dx. (1.44)

Of course the bootstrap is limited in that if our initial sample from f(x) is small, repeatedly resampling

from this limited sample will provide a poor approximation. The SIR allows us to circumvent this problem by

allowing us to draw our initial sample from some instrumental density g(x). Then by resampling from this sample,

according to the weights f(x)/g(x), we are able to approximate a sample from f(x), rather than a sample from

g(x). To show this note that


Pr(x ≤ a) =

N∑i=1

(f(xi)

g(xi)

)1xi≤a →n→∞ Eg[

(f(x)

g(x)

)1x≤a] =

∫ a

−∞f(x)dx. (1.45)

The closer is the target f(x) to the instrumental density g(x), the faster the rate of convergence.

Within the context of generating draws from the predictive density of the noncausal process, fxt+1|t(·), we

should therefore generate draws from some proposal g(·) which closely approximates the target. Indeed, we have

an analytic approximate expression for fxt+1|t(·) in terms of the product of the noncausal conditional density and

the Look-Ahead estimators of the stationary densities (see equation (1.39)), but we are unable to draw from this

density directly.

The SIR method is especially appealing since it can be easily parallelized with reduced computational costs.

That is, we can draw N samples from the predictive density in parallel as opposed to say a Metropolis Hastings

algorithm, which is inherently sequential in nature.

Moreover, the Basel III voluntary regulation standard on bank capital levels, stress testing, and market liquidity

risk, was agreed upon by the members of the Basel Committee on Banking Supervision between 2010 to 2011

and is scheduled to be introduced in 2018. Part of this regulation is the requirement that econometric models

employed by financial institutions must include the possibility to simulate future sample paths for asset prices.

Of course, this is a prerequisite for performing stress tests. In this respect, the proposed methodology of Lanne,

Luoto, and Saikkonen (2012) would be rejected by regulators.

Forecasts up to some horizon ’h’

Given the joint predictive density, conditional on Ft, but out to some horizon h > 1, we can use the same SIR

method to draw samples as in the case where h = 1, since we can factorize the joint density as the product of the

expressions given in equation (1.39) as,

g(xt+h, . . . , xt+1|Ft) =

h∏j=1

fxt|t,...,t−s+1(xt+j |xt+j−1, . . . , xt+j−s+1) (1.46a)

=

h∏j=1

(fxt|t+1,...,t+s

(xt−s+j |xt+j−s+1, . . . , xt+j)) fx(xt+h, xt+h−1, . . . , xt+h−s+1)

fx(xt, xt−1, . . . , xt−s+1).

(1.46b)

Therefore, since terms in the product cancel, as h gets large we need only estimate one term in both the numerator

and denominator by the Look-Ahead method. Of course, for the SIR simulation with horizon h, we require an

h-dimension proposal density g(·).


1.7.6 Application to commodity futures data

While the method described above is computationally intensive, it is clear that it is ripe for parallelization since we

can potentially draw each of the N samples from the h-dimensional predictive density, g(xt+h, . . . , xt+1|Ft), at

the same time. In this sense, we have implemented the algorithm in parallel using the CUDA development libraries

designed and freely available from Nvidia at http://www.nvidia.ca/object/cuda\_home\_new.

html. All that is required is a Nvidia GPU (graphics processing unit) and knowledge of the C programming

language.

In order to evaluate forecasts, we set aside an additional 107 sample data points beyond the most recent date

available within-sample, which is February 8th, 2013. Therefore, this out of sample period extends between

February 11th to July 15th, 2013. 15

As an example we now employ the Look-Ahead estimator of the stationary density, and the SIR method, to

generate draws from the predictive density of the mixed causal/noncausal model for the coffee futures series. The

parameters of the model are those estimated in section 1.5.3, where the shock is skew t-distributed.

In the implementation of the SIR approach, the instrumental distribution, that is the importance function has to

be chosen close to the conditional distribution used to simulate the future asset price paths, that is, the predictive

distribution outlined above. We select as the instrumental distribution a multivariate Gaussian distribution. Such

a Gaussian distribution is parametrized by the vector of means and by the variance-covariance matrix.

However, the first and second order moments of the conditional distribution do not necessarily exist. There-

fore, the matching of the two distributions has to be based on other existing moments. Among the possible

alternatives are calibrations based on the joint characteristic function, or calibration based on the first and second

order moments of the square root of the absolute values of future prices, which exist. We have followed the

second calibration, which has the advantage of leading to a number of moment restrictions equal to the number

of parameters to be matched. Finally note that both the square root marginal and cross moments of the con-

ditional distribution of interest, and of the Gaussian approximation, have no closed form expression and have

to be computed numerically; for instance by reapplying the modified Look-ahead estimator for the conditional

distribution.

The following plot in Figure 1.9 provides the forecasted conditional median, and 95% prediction intervals.

15Feburary 9th and 10th fall on a weekend.


Figure 1.9: Forecast predictive density for Coffee futures price series

1.8 Conclusion

The mixed causal/noncausal autoregressive model is able to capture asymmetries and bubble features present

in the data on commodity futures prices. It improves model fit over the causal ARMA model with Gaussian

innovations, according to the AIC criterion, since the mixed causal/noncausal autoregressive specification takes

into account possible noncausality. This noncausality is unidentified in the traditional time series model, that is

the purely causal ARMA model with Gaussian innovations. Estimation of the purely causal ARMA models with

fat tailed, t-distributed, innovations emphasizes the noncausal nature of most series, where often the causal lag

polynomial roots lie inside the complex unit circle.

Moreover, inspection of the causal and noncausal lag polynomial roots of the mixed causal/noncausal autore-

gressive models suggest that longitudinal asymmetries can be accounted for by varying the causal and noncausal

coefficient weights. Moreover, allowing for a low degree of freedom in the fat tailed t-distribution of the er-

ror term can account for bubble like phenomenon and these bubbles can induce transversal asymmetries if the


model’s shock, εt, admits a skewed distribution. In this way the model can account for both the longitudinal and

transversal asymmetries described in Ramsey and Rothman (1996).

Furthermore, a comparison of the unconditional distributions, by sample histogram and Kullback-Leibler

measure, suggest that the mixed causal/noncausal model with t-distributed shocks is a much closer approximation

to the data than the equivalent purely causal ARMA model.

Finally, taking into account noncausal components is especially important when producing forecasts. Indeed,

the standard Gaussian causal model will provide smooth term structure of linear forecasts with some long run

equilibria. These forecasts are misleading in the presence of a noncausal component. Moreover, in many cases,

including the energy and metals sectors, the causal polynomial admits explosive roots and so the forecasts do

not exist. Employing a mixed causal/noncausal model therefore permits us to forecast the occurrence of future

bubbles, including when they begin their build-up, when they crash, and what will be their magnitude.


1.9 References

ANDREWS, B., R.A. DAVIS AND F.J. BREIDT (2006): “Maximum Likelihood Estimation for All-Pass TimeSeries Models,” Journal of Multivariate Analysis, 97, 1638-1659.

AZZALINI, A., AND A. CAPITANIO (2003): “Distributions Generated by Perturbation of Symmetry with Em-phasis on a Multivariate Skew t-Distribution,” Journal of the Royal Statistical Society, Series B, 65, 2, 367-389.

BLACK, F. (1976): “The Pricing of Commodity Contracts,” The Journal of Financial Economics, 3, 167-179.

BLANCHARD, O.J. (1979): “Speculative Bubbles, Crashes, and Rational Expectations,” Economic Letters, 3,387-389.

BLANCHARD, O.J., AND M. WATSON (1982): “Bubbles, Rational Expectations and Financial Markets,” Na-

tional Bureau of Economic Research, Working Paper No. 945.

BLANK, S.C. (1991): “Chaos in Futures Markets? A Nonlinear Dynamical Analysis,” The Journal of Futures

Markets, 11, 6, 711-728.

BLOOMBERG L.P. (2013) ”Futures Price Data for Various Continuous Contracts,” Bloomberg database, Univer-sity of Toronto, Mississauga, Li Koon Chun Finance Learning Center.

BREEDEN, D. (1979): “An Intertemporal Asset Pricing Model with Stochastic Consumption and InvestmentOpportunities,” Journal of Financial Economics, 7, 3, 265-296.

BREIDT, J., R. DAVIS, K. LII, AND M. ROSENBLATT (1991): “Maximum Likelihood Estimation for Non-causal Autoregressive Processes,” Journal of Multivariate Analysis, 36, 175-198.

BRENNAN, M.J. (1958): “The Supply of Storage,” American Economic Review, 47, 50-72.

—————- (1991): “The Price of Convenience and the Valuation of Commodity Contingent Claims,” in Stochas-

tic Models and Options Values, ed. by D. Land, and B. Oksendal, Elsevier Science Publishers.

BROCK, W.A., W.D. DECHERT, J. SCHEINKMAN, AND B. LEBARON (1996): “A Test for IndependenceBased on the Correlation Dimension,” Econometric Reviews, 15, 3, 197-235.

BROCK, W.A., AND C.H. HOMMES (1998): “Heterogenous Beliefs and Routes to Chaos in a Simple AssetPricing Model,” Journal of Economic Dynamics and Control, 22, 8-9, 1235-1274.

BROOKS, C., E. LAZAR, M. PROKOPCZUK, AND L. SYMEONIDIS (2011): “Futures Basis, Scarcity andCommodity Price Volatility: An Empirical Analysis,” International Capital Markets Association Center, WorkingPaper, University of Reading.


CHENG, Q. (1992): “On the Unique Representation of Non Gaussian Linear Processes,” Annals of Statistics, 20,1143-1145.

CRUZ LOPEZ, J., J.H. HARRIS, C. HURLIN, AND C. PERIGNON (2013): “CoMargin,” Bank of Canada,

Working Paper.

DEATON, A., AND G. LAROQUE (1996): “Competitive Storage and Commodity Price Dynamics,” Journal of

Political Economy, 104, 5, 896-923.

DECOSTER, G.P., W.C. LABYS, AND D.W. MITCHELL (1992): “Evidence of Chaos in Commodity FuturesPrices,” The Journal of Futures Markets, 12, 3, 291-305.

DUSAK, K. (1973): “Futures Trading and Investor Returns: An Investigation of Commodity Market Risk Premi-ums,” Journal of Political Economy, 81, 1387-1406.

EPANECHNIKOV, V.A. (1969): “Non-Parametric Estimation of a Multivariate Probability Density,” Theory of

Probability and its Applications, 14, 153-158.

EVANS, G. (1991): “Pitfalls in Testing for Explosive Bubbles in Asset Prices,” The American Economic Review,81, 4, 922-930.

FAMA, E.F., AND K.R. FRENCH (1987): “Commodity Futures Prices: Some Evidence on Forecast Power, Pre-miums, and the Theory of Storage,” The Journal of Business, 60, 1, 55-73.

FINDLEY, D.F. (1986): “The Uniqueness of Moving Average Representations with Independent and IdenticallyDistributed Random Variables for Non-Gaussian Stationary Time Series,” Biometrika, 73, 2, 520-521.

FROST, R. (1986): Trading Tactics: A Livestock Futures Anthology, ed. by Todd Lofton, Chicago MercantileExchange.

FULKS, B. (2000): “Back-Adjusting Futures Contracts,” Trading Recipes DB, http://www.trade2win.

com/boards/attachments/commodities/

90556d1283158105-rolling-futures-contracts-cntcontr.pdf.

GARIBOTTI, G. (2013): “Estimation of the Stationary Distribution of Markov Chains,” PhD Dissertation, Uni-versity of Massachusetts, Amherst, Department of Mathematics and Statistics.

GAY, D.M. (1990): “Usage Summary for Selected Optimization Routines,” Computing Science Technical Report,

No. 153, https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/Rnlminb2/inst/doc/PORT.pdf?revision=4506&root=rmetrics, New Jersey: AT&T Bell Labs.

GIBSON, R., AND E.S. SCHWARTZ (1990): “Stochastic Convenience Yield and the Pricing of Oil ContingentClaims,” The Journal of Finance, 45, 3, 959-976.


GLYNN, P., AND S. HENDERSON (1998): Estimation of Stationary Densities for Markov Chains, Winter Sim-ulation Conference, ed. by D. Medeiros, E. Watson, J. Carson and M. Manivannan, Piscataway, NJ: Institute ofElectrical and Electronics Engineers.

GOODWIN, B.K., AND N.E. PIGGOTT (2001): “Spatial Market Integration in the Presence of Threshold Ef-fects,” The American Journal of Agricultural Economics, 83, 2, 302-317.

GOURIEROUX, C., AND J. JASIAK (2005): “Nonlinear Innovations and Impulse Responses with Applicationto VaR Sensitivity,” Annals of Economics and Statistics, 1-31.

—————- (2013), “Filtering, Prediction, and Estimation of Noncausal Processes,” CREST, DP.

GOURIEROUX, C., J.J. LAFFONT, AND A. MONFORT (1982): “Rational Expectations in Dynamic LinearModels: Analysis of the Solutions,” Econometrica, 50, 2, 409-425.

GOURIEROUX, C., AND J.M ZAKOIAN (2012): “Explosive Bubble Modelling by Noncausal Cauchy Autore-gressive Process,” Center for Research in Economics and Statistics, Working Paper.

GRASSBERGER, P., AND I. PROCACCIA (1983): “Measuring the Strangeness of Strange Attractors,” Physica

D: Nonlinear Phenomena, 9, 1, 189-208.

HALLIN, M., C. LEFEVRE, AND M. PURI (1988): “On Time-Reversibility and the Uniqueness of MovingAverage Representations for Non-Gaussian Stationary Time Series,” Biometrika, 71, 1, 170-171.

HANSEN, L.P, AND T.J. SARGENT (1991): “Two Difficulties in Interpreting Vector Autogressions,” in Rational

Expectations Econometrics, ed. by L.P. Hansen and T.J. Sargent, Boulder, CO: Westview Press Inc., 77-119.

HYNDMAN, R.J. AND Y. KHANDAKAR (2008): ”Automatic Time Series Forecasting: The Forecast Packagefor R”, Journal of Statistical Software, 27, 3.

JONES, M.C. (2001): “A Skew-t Distribution,” in Probability and Statistical Models with Applications, ed. by A.Charalambides, M.V. Koutras, and N. Balakrishnan, Chapman & Hall/CRC Press.

KALDOR, N. (1939): ”Speculation and Economic Stability,” Review of Economic Studies, October, 7, 1-27.

KNITTEL, C.R., AND R.S. PINDYCK (2013): “The Simple Economics of Commodity Price Speculation,” Na-

tional Bureau of Economic Research, Working Paper No. 18951.

LANNE, M., J. LUOTO, AND P. SAIKKONEN (2012): “Optimal Forecasting of Noncausal Autoregressive TimeSeries,” International Journal of Forecasting, 28, 3, 623-631.

LANNE, M., H. NYBERG, AND E. SAARINEN (2011): “Forecasting U.S. Macroeconomic and Financial Time


Series with Noncausal and Causal Autoregressive Models: a Comparison,” Helsinki Center of Economic Re-

search, Discussion Paper No. 319.

LANNE, M., AND P. SAIKKONEN (2008): “Modeling Expectations with Noncausal Autoregressions,” Helsinki

Center of Economic Research, Discussion Paper No. 212.

LJUNG, G., AND E.P. BOX (1978): “On a Measure of a Lack of Fit in Time Series Models,” Biometrika, 65, 2,297-303.

LOF, M. (2011): “Noncausality and Asset Pricing,” Helsinki Center of Economic Research, Discussion Paper No.323.

MASTEIKA, S., A.V. RUTKAUSKAS, J.A. ALEXANDER (2012): ““Continuous Futures Data Series for BackTesting and Technical Analysis,” International Conference on Economics, Business and Marketing Management,

29, Singapore: IACSIT Press.

MUTH, J. (1961): “Rational Expectations and the Theory of Price Movements,” Econometrica, 29, 315-335.

NEFTCI, S.N. (1984): “Are Economic Time Series Asymmetric Over the Business Cycle,” Journal of Political

Economy, 92, 307-328.

NELDER, J.A., AND R. MEAD (1965): “A Simplex Method for Function Minimization,” The Computer Journal,

7, 4, 308-313.

NOLAN, J. (2009) “Stable Distributions: Models for Heavy Tailed Data,” http://academic2.american.edu/˜jpnolan/stable/chap1.pdf, American University.

RAMIREZ, O.A. (2009): “The Asymmetric Cycling of U.S. Soybeans and Brazilian Coffee Prices: An Op-portunity for Improved Forecasting and Understanding of Price Behavior,” Journal of Agricultural and Applied

Economics, 41, 1, 253-270.

RAMSEY, J., AND P. ROTHMAN (1996): “Time Irreversibility and Business Cycle Asymmetry,” Journal of

Money and Banking, 28, 1-21.

ROSENBLATT, M. (2000): Gaussian and Non-Gaussian Linear Time Series and Random Fields, New York:Springer Verlag.

ROSS, S. (1976): “The Arbitrage Theory of Capital Asset Pricing,” Journal of Economic Theory, 13, 3, 341-360.

RUBIN, D.B. (1988): “Using the SIR Algorithm to Simulate Posterior Distributions,” Bayesian Statistics, 3, ed.by J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith, Cambridge, MA: Oxford University Press,395-402.


SHARPE, W.F. (1964): “Capital Asset Prices: A Theory of Market Equilibrium Under Conditions of Risk,” Jour-

nal of Finance, 19, 3, 425-442.

SIGL-GRUB, C., AND D. SCHIERECK (2010): “Speculation and Nonlinear Price Dynamics in CommodityFutures Markets,” Investment Management and Financial Innovations, 7, 1, 62-76.

SMITH, A.F.M, AND A.E. GELFAND (1992): “Bayesian Statistics Without Tears: A Sampling-ResamplingPerspective,” The American Statistician, 46, 2, 84-88.

TERASVIRTA, T. (1994): “Specification, Estimation, and Evaluation of Smooth Transition Autoregressive Mod-els,” Journal of the American Statistical Association, 89, 425, 208-218.

TONG, H., AND K.S. LIM (1980): “Threshold Autoregression, Limit Cycles, and Cyclical Data,” Journal of the

Royal Statistical Society, Series B, 42, 3, 245-292.

TSAY, R.S (2010): Analysis of Financial Time Series, 3rd ed., New Jersey: Wiley Press.

WHITE, H. (1980): ”A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Het-eroskedasticity”, Econometrica, 48, 4, 817-838.

WORKING, H. (1933): “Price Relations Between July and September Wheat Futures at Chicago Since 1885,”Wheat Studies of the Food Research Institute, 9, 6, 187-238.

—————- (1948): “Theory of the Inverse Carrying Charge in Futures Markets.” Journal of Farm Economics,

30, 1, 1-28.

—————- (1949): “The Theory of the Price of Storage,” American Economic Review, 39, 1254-1262.

YANG, S.R., AND W. BRORSEN (1993): “Nonlinear Dynamics of Daily Futures Prices: Conditional Het-eroskedasticity or Chaos?,” The Journal of Futures Markets, 13, 2, 175-191.


1.10 Appendix: Rolling over the futures contract

Consider first, the “fair price” of the futures contract implied by the spot-futures parity theorem. The theoremimplies that, given the assumption of well functioning competitive markets, a constant, annual, risk-free rate ofinterest rf and a cost of carry c, no arbitrage should ensure that the following relationship between the futures andspot price of the underlying commodity holds at time t:

Ft,t+k = St(1 +k

365(rf + c)), (1.47)

where c ∈ [0, 1]. That is, given the exploitation of arbitrage opportunities, we should have that the cost ofpurchasing the underlying good at price St today and holding it until t+ k (given opportunity cost of capital andcost of carry) should be equal to the current futures price Ft,t+k. Of course, this relationship implies that as thematurity date approaches (i.e. as k → 0) we have that Ft,t = St.

This relationship is an approximate one and will not hold exactly in reality: indeed, the risk-free rate and thecost of carry vary in time and are uncertain, and some goods are perishable and cannot be stored indefinitely.Nevertheless this relationship is useful for considering the rolling over of futures contracts, since if we keep agiven futures contract in a portfolio, its residual maturity will decrease. The formula in (1.47) demonstrates thiseffect and the need to adjust the futures price series level if we want it to maintain the same residual maturity.

Upon the approach of the futures’ maturity, we also wish to extend the price series and obtain price data foreach date. In order to do so we would have to close out our current position and then open a new position inthe futures contract of the next maturity. For example, suppose we are holding a futures contract that expires attime t + k and k is approaching 0. We could sell this futures contract and purchase a new contract on the sameunderlying good but that expires at time t + k + j. However in doing so we would clearly incur a loss since wehave that:

1 +k

365(rf + c) < 1 +

k + j

365(rf + c) (1.48)

by the spot-futures parity theorem. This is known as rollover risk and the difference in the two prices is called thecalendar spread.

However, this loss for the trader should not be considered as part of the overall price series historical data weuse for forecasting since it represents a predictable discontinuity in the series. Therefore typically futures priceseries are also adjusted for this calendar spread by the data provider. There are a few ways to go about doing this,each with their pros and cons: 16

1. Just append together prices without any adjustment. This will distort the series, by including spuriousautocorrelation.

2. Directly adjust the prices up or down according to either the new or old contract at the rollover time period.This can be done by simply subtracting the difference between the two price series, or multiplying one ofthe price series by ratio of the two (i.e. absolute difference or relative difference, respectively). This methodworks, but it causes either the newer or older contract prices to diverge further and further from their originalvalues as we append additional contracts. Moreover, it leaves the choice of adjustment a rather arbitraryone.

3. Continuously adjust the price series over time. This method melds together the futures contract prices of

16See Fulks (2000), a widely disseminated PDF document available on the world wide web. Alternatively, Masteika et al. (2012) providesa more recent treatment of the relevant issues.


both the “front month” contract (i.e. the contract with the shortest time-to-maturity) with the contracts oflonger times-to-maturity (called the “back month” contracts) in a continuous manner. This allows us thepotential to create a continuous contract price which reflects an “unobserved” futures contract which main-tains a fixed time-to-maturity as time progresses. Ultimately, we are free to choose a model whereby wecan reconstitute the unobserved futures contract price by employing information in the prices of observedcontracts of different maturities.

Example: Smooth transition model

Consider two futures contracts on the same underlying commodity, one with time-to-maturity k, the otherwith time-to-maturity k+ j, where we assume that their prices, Ft,t+k and Ft,t+k+j , approximately satisfythe no arbitrage condition of the spot-futures parity theorem. Moreover, let εi,t for i = 1, 2 be error termssatisfying the standard assumptions of a regression model. The price variables Ft,t+k, Ft,t+k+j , and Stare observable, as is the current risk free rate rf,t. The cost of carry ct, is unobservable since it includes aconvenience yield, and so we must estimate it. Either way, we can then write down the model:

Ft,t+k = St(1 +k

365(rf,t + ct)) + ε1,t (1.49a)

Ft,t+k+j = St(1 +k + j

365(rf,t + ct)) + ε2,t (1.49b)

Pt = αFt,t+k + (1− α)Ft,t+k+j (1.49c)

where εi,t represents a residual deviation away from the spot-futures parity fair value, α = kK , where K

is an upper bound on k + j (that is it represents the time to maturity when the future is first issued) and jis sufficiently large so that the difference in futures prices aren’t negligible (typically j ≥ 30 since futurescontracts of different maturities are indexed by month).

Pt, therefore, represents our estimate of the unobserved contract which incorporates the information in thefront and back month contracts. Since the spot-futures parity doesn’t hold exactly, Pt reflects not just thespot price St, the risk free rate rf,t, and the cost of carry ct; but also some residual error factors εi,t fori = 1, 2.

The Bloomberg console allows the user to specify various criteria which modify how the continuous contractprice series is constructed from the front and back month contracts. Any of the 3 methods above are available foruse. In constructing the price series data employed in this paper we use a method similar to (3) above but simplerin its weighting. The continuous contract futures price Pt is equal to the front month contract price Ft,t+k until thecontract has 30 days left to maturity, so that k = 30. At that point, the continuous contract reflects the weightedaverage between the front month and the next back month contract, with the weights reflecting the number of daysleft until maturity of the front month contract. That is,

Pt =

(k

d

)Ft,t+k +

(k − xd

)Ft,t+k+j (1.50)

where d = 30 represents the total number of days in the month and k is the number of days remaining in themonth. Once k = 0, the price is then Pt = Ft,t+j , until this new front month contract again has 30 days leftuntil maturity, or j = 30. If the difference in time-to-maturity for all contracts is fixed at 30 days (i.e. a differentcontract matures every month), then this scheme represents the reconstitution of an unobserved futures contractwith a fixed time to maturity of 30 days, as time progresses forward indefinitely.


1.11 Appendix: Mixed causal/noncausal process

In this appendix we provide the definitions of mixed causal/noncausal processes and review several of their prop-erties employed in the main part of the text.

1.11.1 Strong moving average

The infinite moving average Yt =∑∞i=−∞ aiεt−i, where (εt) is a sequence of i.i.d. variables, that is a strong

white noise, can be defined for a white noise without first and/or second order moments.

Let us consider the Banach space Lp of the real random variables such that ‖Y ‖p =√E[|Y |p] exists, for a

given p. For expository purposes we consider the Banach space which requires p ≥ 1. However, the existenceof the process can also be proved for 0 < p ≤ 1. If ‖εt‖p =

√E[|εt|p] exists and if the set of moving av-

erage coefficients is absolutely convergent,∑∞i=−∞|ai| < ∞, then the series with elements aiεt−i is such that∑∞

i=−∞‖aiεt−i‖p =∑∞i=−∞|ai|‖εt−i‖p =

(∑∞i=−∞|ai|

)‖εt−i‖p < ∞, since ‖εt‖p is independent of date t.

Thus the series with elements aiεt−i is normally convergent. In particular the variable Yt =∑∞i=−∞ aiεt−i has a

meaning for the ‖·‖p convergence in the sense that

Yt = limn→∞,m→∞

n∑i=−m

aiεt−i, (1.51)

where the limit is with respect to the Lp-norm. Moreover, the limit Yt has a finite Lp-norm, such that ‖Yt‖p ≤(∑∞i=−∞|ai|

)‖εt−i‖p <∞.

The Lp convergence implies the convergence in distribution. The distribution of the process (εt) is invariantwith respect to the lag of time, that is to the operator L which transforms the process (εt) into L(εt) = (εt−1).Since the process (Yt) is derived from the white noise (εt) by a time invariant function, we deduce that thedistribution of (Yt) is the same as the distribution of L(Yt) = (Yt−1), that is (Yt) is a strong stationary process.

Similar arguments apply to any moving average transformation of a strongly stationary process existing in Lp,that is to:

Xt =

∞∑i=−∞

bjYt−j , (1.52)

whenever∑∞j=−∞|bj | <∞, since ‖Yt‖p is finite and time independent. In particular, we can as usual compound

moving averages. From the equations:

Yt =

∞∑i=−∞

aiεt−i = a(L)εt, with a(L) =

∞∑i=−∞

aiLi, (1.53a)

Xt =

∞∑j=−∞

bjYt−j = b(L)Yt, with b(L) =

∞∑j=−∞

bjLj , (1.53b)

we can deduceXt = b(L)a(L)εt, (1.54)

that is, the moving average representation of process (Xt) in terms of the underlying strong white noise (εt). Thenew moving average operator

c(L) = b(L)a(L) =

∞∑k=−∞

ckLk (1.55)


admits moving average coefficients given by

ck =

∞∑i=−∞

aibk−i =

∞∑j=−∞

ak−jbj , ∀k. (1.56)

1.11.2 Identification of a strong moving average representation

The question of the identification of a strong moving average representation is as follows. Let us consider astrong moving average process in Lp, Yt =

∑∞i=−∞ aiεt−i. Is it possible to also write this process as Yt =∑∞

i=−∞ a∗i ε∗t−i, that is, with different noise and moving average coefficients? Of course the white noise is defined

up to a multiplicative positive scalar c, since

Yt =

∞∑i=−∞

a∗i ε∗t−i, with a∗i = ai/c, ε

∗t = cεt. (1.57)

The identification conditions below have been derived previously in Findley (1986), Hallin, Lefevre, and Puri(1988), and Cheng (1992).

Identification condition

i) The moving average representation is identifiable up to a multiplicative positive scalar and to a drift of thetime index for the noise process, if and only if the distribution of the white noise is not Gaussian.

ii) If the white noise is Gaussian, the process always admits a causal Gaussian representation,

Yt =

∞∑i=0

a∗i ε∗t−i, with ε∗t ∼ IIN(0, 1). (1.58)

As a consequence the general linear process which is not purely causal, that is which depends on at least onefuture shock (i.e. ai 6= 0 for at least one negative time index i) cannot admit a strong linear causal representation.Equivalently, its strong causal representation will automatically feature nonlinear dynamic features.

1.11.3 Probability distribution functions of the stationary strong form noncausal repre-sentation

It can be shown that the unconditional distribution of the process in equation (1.4) is given as

ft(xt) =1− |ρ|σεπ

σ2ε

σ2ε + (1− |ρ|)2x2

t

. (1.59)

[Gourieroux and Zakoian (2012), Proposition 1] This unconditional distribution is independent of date t by thestrong stationary property.

Moreover, the Markov transition distribution (conditional density) of the forward-looking process is given as

ft|t+1(xt|xt+1) =1

σεπ

σ2ε

σ2ε + z2

t

, where zt =xt − ρxt+1

σε, (1.60)

which follows from the definition of the standard Cauchy distribution.


Therefore, from Bayes theorem along with equations (1.59) and (1.60), we have that

ft+1|t(xt+1|xt) = ft|t+1(xt|xt+1)ft+1(xt+1)/ft(xt) (1.61a)

=1

σεπ

σ2ε

σ2ε + z2

t

1−|ρ|σεπ

σ2ε

σ2ε+(1−|ρ|)2x2

t+1

1−|ρ|σεπ

σ2ε

σ2ε+(1−|ρ|)2x2

t

(1.61b)

=1

σεπ

σ2ε

σ2ε + z2

t

σ2ε + (1− |ρ|)2x2

t

σ2ε + (1− |ρ|)2x2

t+1

, (1.61c)

which provides the causal transition density of the process [Gourieroux and Zakoian (2012), Proposition 2].

1.11.4 The causal strong autoregressive representation

A nonlinear causal innovation, (ηt), of the process (xt) is a strong white noise such that we can write the currentvalue of the process xt as a nonlinear function of its own past value xt−1 and ηt: xt = G(xt−1, ηt), say, wherext and ηt are in a continuous one-to-one relationship given any xt−1 [Rosenblatt (2000)].

Moreover, since the conditional cumulative distribution function of xt|xt−1 is strictly monotone increasingand continuous, it has an inverse. We can write

xt = F−1(Φ(ηt)|xt−1) where ηt ∼ IIN(0, 1) (1.62a)

⇔ ηt = Φ−1[F (xt|xt−1)], (1.62b)

and F (·|xt−1) is the c.c.d.f. of xt while Φ(·) is the c.d.f. of the standard Normal distribution. Therefore, bychoosingG(xt−1, ηt) = F−1(Φ(ηt)|xt−1), we can select a Gaussian causal innovation. The choice of a Gaussiancausal innovation is purely conventional.

1.11.5 Distributions with fat tails

Different distributions with fat tails can be used as the distribution of the baseline shocks (εt) to construct mixedcausal/noncausal linear processes. Below we provide three examples of fat tailed distributions that are employedin this paper, that are the student t-distribution, the skewed student t-distribution [see Jones (2001)], and the“stable” distributions [see Nolan (2009)], respectively.

i) Student t-distribution:This is a distribution on (−∞,+∞) with probability density function given as:

f(x) =1√νπ

Γ(ν+12 )

Γ(ν2 )

(1 +

x2

ν

)− ν+12

, (1.63)

where ν > 0 is the real degree of freedom parameter and Γ(·) is the Gamma function defined as Γ(z) =∫∞0tz−1e−tdt, if z > 0.

The p.d.f. is symmetric; it bears the same “bell” shape as the Normal distribution except that the t-distributionexhibits fat tails. As the number of degrees of freedom, ν goes to 1 the t-distribution approaches the Cauchydistribution and as the degree of freedom approaches∞, the t-distribution approaches the Normal distribu-tion.

Its tail behaviour is such that E[|x|p] <∞, if ν > p.


ii) Skewed t-distribution: [Jones (2001), Section 17.2]This is a distribution on (−∞,+∞) with probability density function given as:

f(x) =1

2ν−1β(a, b)√ν

(1 +

x√ν + x2

)a+1/2(1− x√

ν + x2

)b+1/2

, (1.64)

where ν = a+ b, a and b are two positive real valued degrees of freedom parameters and β(a, b) representsthe Beta function defined as β(a, b) = Γ(a)Γ(b)/Γ(a + b). If a > b the distribution is positively skewed,negatively skewed if a < b, and identical to the t-distribution above if a = b. This distribution allows fordifferent magnitudes for the left and right fat tails, respectively.

Another skewed t-distribution has been proposed in the literature as a generalization of the skewed Normaldistribution. This alternative skewed t-distribution is parameterized by only one skewness parameter insteadof two as in Jones (2001) [see Azzalini and Capitanio (2003), Section 4, for more details].

iii) Stable distribution:A random variable, x, is said to be “stable,” or to have a “stable distribution,” if a linear combination of twoindependent copies of x has the same distribution as x, up to location and scale parameters. That is, if x1 andx2 are independently drawn from the distribution of x, then x is stable if for any constants a > 0 and b > 0

the random variable z = ax1 + bx2 has the same distribution as cx+ d for some constants c > 0 and d. Thedistribution is said to be strictly stable if d = 0.

Generally, we cannot express the p.d.f. of the stable random variable x in an analytical form. However, thep.d.f is always expressable as the Fourier transform of the characteristic function, ϕ(t), which always exists,that is, f(x) = 1

2π

∫∞−∞ ϕ(t)e−ixtdt. The characteristic function is given as:

ϕ(t) = exp [itµ− |ct|α (1− iβsign(t) tan (πα/2))]. (1.65)

Therefore the distribution is parameterized by α, β, c, µ where α ∈ (0, 2] is the stability parameter, β ∈[−1, 1] is a skewness parameter, c ∈ (0,∞) is the scale parameter, and µ ∈ (−∞,∞) is the locationparameter.

The Normal, Cauchy, and Levy distributions are all stable continuous distributions. If α = 2 the stabledistribution reduces to the Normal distribution. If α = 1/2 and β = 1, it corresponds to the Levy distribution.Finally, if α = 1 and β = 0 the distribution is Cauchy and the p.d.f. is given analytically as:

f(x) =1

π (1 + x2). (1.66)

Even if the p.d.f. of a stable distribution has no explicit expression, its asymptotic behaviour is known. Wehave [see Nolan (2009), Th 1.12]:

f(x) ∼ cα (1 + sign(x)β)sin(πα/2)Γ(α+ 1)/π

|x|1+α, for large x. (1.67)

Therefore, E[|x|p] <∞, if α > p. In particular the mean does not exist if α ≤ 1.


1.12 Appendix: Approximation of the mixed causal/noncausal AR(r, s)

likelihood

This section describes the nature of the matrix transformations which ensure that the MLE estimator is consistent,by regressing both forward (noncausal) and backward (causal) lags on xt.

It will first be useful to define the following processes ut and vt. From (1.17a), let ut be defined as

ut = φ(L)xt = ϕ(L−1)−1εt =

∞∑j=0

ϕ∗j εt+j , (1.68)

where ϕ∗0 = 1 and the right hand side series of moving average coefficients are absolutely summable. We call(1.68) the forward looking moving average representation of xt.

Moreover, also from (1.17a) let vt be defined as

vt = ϕ(L−1)xt = φ(L)−1εt =

∞∑j=0

φ∗j εt−j , (1.69)

where φ∗0 = 1 and the right hand side series of moving average coefficients are absolutely summable. We call(1.69) the backward looking moving average representation of xt.

The changes of variables above can also be written in matrix form. Consider the time series xt for t =

1, . . . , T . From (1.68) and (1.69), we have ut = φ(L)xt and vt = ϕ(L−1)xt. Therefore, let us introduce thefollowing matrices, Φc and Φnc:

Φc =

Ir×r 0r×(T−r)

−φr −φr−1 . . . −φ1 1 0 . . . . . . . . . . . .

0 −φr −φr−1 . . . −φ1 1 0 . . . . . . . . .

. . .. . . . . . 0 −φr −φr−1 . . . −φ1 1 0 . . .

0s×(T−s) Is×s

(1.70)

and

Φnc =

1 −ϕ1 . . . −ϕs−1 −ϕs 0 . . . . . . . . . . . .

0 1 −ϕ1 . . . −ϕs−1 −ϕs 0 . . . . . . . . .

. . .. . .

. . .. . . . . . . . . . . . 0 1 −ϕ1 . . . −ϕs−1 −ϕs. . . 0 −φr −φr−1 . . . −φ1 1 0 . . . 0

. . .. . . . . . . . . 0 −φr −φr−1 . . . −φ1 1 0

. . . . . . . . . . . . 0 −φr −φr−1 . . . −φ1 1

(1.71)

where the lower partition of Φnc has s rows. Therefore, Φc will represent the causal transformation and Φnc thenoncausal transformation, respectively. Both matrices are of size T × T .


Applying the noncausal transformation to the vector of data, x, we have:

v1

...

...vT−s

uT−s+1

...uT

=

x1 − ϕ1x2 − · · · − ϕsx1+s

...

...xT−s − ϕ1xT−s+1 − · · · − ϕsxT

xT−s+1 − φ1xT−s − · · · − φrxT−s+1−r...

xT − φ1xT−1 − · · · − φrxT−r

= Φnc

x1

...

...xT−s

xT−s+1

...xT

(1.72)

Moreover, from εt = φ(L)ϕ(L−1)xt = φ(L)vt, we have:

e =

v1

...vr

εr+1

...εT−s

uT−s+1

...uT

=

v1

...vr

vr+1 − φ1vr − · · · − φrv1

...vT−s − φ1vT−s−1 − · · · − φrvT−s−r

uT−s+1

...uT

= Φc

v1

...vT−s

uT−s+1

...uT

(1.73)

So we have the transformation e = ΦcΦncx.

Thus the elements of e are mutually independent and the joint density of e is given as:

fe(e|θ) = fv(v1, . . . , vr)

(T−s∏t=r+1

fε(εt;λ, σ)

)fu(uT−s+1, . . . , uT ), (1.74)

where θ = φ,ϕ, λ, σ represents the parameters of the model.

The Φc matrix is lower triangular and its determinant is equal to 1. Therefore, using the change of variablesJacobian formula, we can express the joint density in terms of x as:

fx(x|θ) = fv(ϕ(L−1)x1, . . . , ϕ(L−1)xr)

(T−s∏t=r+1

fε(ϕ(L−1)φ(L)xt;λ, σ)

)fu(φ(L)xT−s+1, . . . , φ(L)xT )|det(Φnc)|. (1.75)

Since the determinant of Φnc is independent of sample size,17 we can approximate asymptotically the likelihood

17To show this we can employ the partitioned matrix determinant formula: det

([A11 A12

A21 A22

])=

det (A11) det(A22 −A21A11

−1A12), where it can be shown that A11 is (T − s) × (T − s) with determinant 1, and so the

second term in the factorization represents the determinant of an s× s matrix, for all T .


by using the second factor in the above expression, that is,

T−s∏t=r+1

fε(ϕ(L−1)φ(L)xt;λ, σ). (1.76)

For large samples, T will dwarf r + s = p and so the approximation will be consistent.

Asymptotic properties of the approximated maximum likelihood estimators are discussed in section 3.2 andconsistent estimation of the standard errors is detailed in section 3.3, both of Lanne et al. (2008).

1.13 Appendix: Numerical algorithm for mixed causal/noncausal AR(r, s)

forecasts

Solution proposed by Lanne, Luoto, and Saikkonen (2012)Lanne, Luoto, and Saikkonen (2012) propose to circumvent the problem presented by our ignorance of the sta-tionary distribution fx(·) by enlarging the space of random variables. They first rewrite (1.37) as:

fxt+1|t(xt+1|xt) = fxt,xt+1(xt, xt+1)/fx(xt). (1.77)

Then by using the fact that xt = ut = ϕ(L−1)εt =∑∞j=0 ϕ

j1εt+j , they choose to employ the mapping

(xt, xt+1, xt+2, . . . ) → (εt, εt+1, εt+2, . . . ). This suggests a linear relationship which, by approximating xt =

ut ≈∑Mj=0 ϕ

j1εt+j given a sufficiently large truncation lag M , we are able to invert, providing an approximate

expression for εt as a linear function of both xt and future εt+1, εt+2, . . . , εt+M . For example in this case wherethe noncausal polynomial is of order 1, we have that εt ≈ xt −

∑Mj=1 ϕ

j1εt+j .

Since, by assumption, the distribution of the shocks εt is known, the authors are able to compute the prob-ability of these approximated εt’s, and relying upon Monte-Carlo simulation methods, are able to approximatethe conditional C.D.F. function of xt+1 = ut+1. The conditional C.D.F. function at a given value α ∈ R can becomputed from (1.77) above by means of approximating the following integral by Monte-Carlo simulation, wherewe average across draws of sufficiently long future paths of ε+

t+1 = εt+1, . . . , εt+M:

Fxt+1|t(α|xt) =

∫1α>xt+1fxt+1|t(xt+1|xt)dxt+1 (1.78a)

≈∫

1

fε(εt)1α>xt+1

M−1∑j=0

ϕj1εt+1+j

fε(εt)

M∏j=1

fε(εt+j)dε+t+1 (1.78b)

This method has two drawbacks: first, we approximate the above integral by Monte-Carlo simulation of thelong future paths of ε+

t+1. Second, M has to be sufficiently large so that the approximation does not miss theeffect of far future shocks. The value of M required to obtain an accurate approximation will grow as the roots ofthe noncausal polynomial approach 1, and so will the computational requirements of the algorithm.

The numerical method proposed by Lanne, Luoto, and Saikkonen (2012) also works in the more general casewhere s > 1. However, now that the noncausal order is greater than 1 enlarging the space from(xt−s+1, . . . , xt, xt+1, . . . )→ (εt−s+1, . . . , εt, εt+1, εt+2, . . . ) requires us to invert a system of equations. There-fore, we may employ a matrix transformation between the two spaces and this matrix is inverted to provide anapproximation to εt, . . . , εt−s+1 in terms of both xt, . . . , xt−s+1 and future εt+1, . . . , εt+M . It is noted in theirpaper (and in the Appendix here) that the Jacobian determinant of this transformation is always 1. However, while


this matrix is sparse, for large s and M it is computationally costly.Below, we describe their method for the approximate simulation of the conditional c.d.f.,

Fut+h|t(α|Ft) =

∫ α

−∞fut+h|t(ut+h|Ft)dut+h (1.79)

for h = 1, when s > 1 (which they also generalized to the case where h > 1 in their paper). The method isbroken down into a number of discussion points as follows:

1. We require the density of ε+t+1 = εt+1, εt+2, . . . , conditional on the data

xt = xt, xt−1, . . . x1.

2. Since from (1.68), we have that ut+1 =∑∞j=0 ϕ

∗j εt+1+j , from equation (1.75) it can be shown that:

fx,ε+(xt, ε+t+1|θ)

fx(xt|θ)= p(ε+

t+1|xt; θ) =fu−,ε+(u−t (φ), ε+

t+1)

fu−(u−t (φ)), (1.80)

where θ represents the parameters of the mixed causal/noncausal AR(r, s) model andu−t (φ) = φ(L)xt−s+1, . . . , φ(L)xt = ut−s+1, . . . , ut.

3. Then, we can use Monte-Carlo simulations to approximate both the numerator and denominator of (1.80)in order to approximate the desired conditional c.d.f. as:

Fut+1|t(α|Ft) ≈1

fu(u−t (φ))

∫1α>ut+1

M−1∑j=0

ϕ∗j εt+1+j

fu,ε+(u−t (φ), ε+t+1)dε+

t+1, (1.81)

where under the assumption of some finite M (such that as M →∞, (ϕ∗j )→ 0), we can approximate ut+1

as ut+1 ≈∑M−1j=0 ϕ∗j εt+1+j .

4. In order to do this, however, we need to accomplish a change of variables between (u−t (φ), ε+t+1) and

(εt−s+1, . . . , εt, ε+t+1). Given (1.68), the approximate mapping between these two sets of variables is

given as:

1 ϕ∗1 . . . . . . . . . . . . ϕ∗M+s−1

0. . . . . .

......

. . . 1 ϕ∗1 . . . . . . ϕ∗M...

. . . 1 0 . . . 0...

. . . . . . . . ....

.... . . . . . 0

0 . . . . . . . . . . . . 0 1

εt−s+1

...εt

εt+1

...εt+M

≈

ut−s+1

...ut

εt+1

...εt+M

(1.82)

which can be written as Ce ≈ w. Therefore, by inverting C and noting that its determinant is 1, we canwrite the numerator in (1.80) as:

fu−,ε+(u−t (φ), ε+t+1) ≈

s∏j=1

fε(εt−s+j(u−t (φ), ε+

t+1))

t+M∏τ=t+1

fε(ετ ), (1.83)

where we write the elements εt−s+j(u−t (φ), ε+t+1) as such to indicate that they are functions of both u−t (φ)


and ε+t+1.

5. Therefore, if we simulate N i.i.d. draws of the M length vector ε+t+1,i (i.e. for i = 1, . . . , N ) according to

fε(·), an approximation to the desired conditional c.d.f. in (1.81) is given as:

Fut+1|t(α|Ft) ≈N−1

∑Ni=1 1α>ut+1

(∑M−1j=0 ϕ∗j εt+1+j,i

)∏sj=1 fε(εt−s+j(u

−t (φ), ε+

t+1,i))

N−1∑Ni=1

∏sj=1 fε(εt−s+j(u

−t (φ), ε+

t+1,i)). (1.84)

Then, given an appropriately chosen grid of αi’s, we can generate an approximation to the shape of thec.d.f. across its support.

1.14 Appendix: Tables and Figures


Table 1.7.i: Lag polynomial roots of the mixed and benchmark models

Model p/r,q/s Sig.p/r Sig.q/s cR cMC ncR ncMC #CCSoybean meal skew-t arma 10,0 1,8,9 1.010 1.571 4

1.5811.5821.583

t-dist mixed 10,10 1,3,5,7,9,10 1,2,3,4,6,9 1.385 1.354 -1.716 1.091 4/4-2.532 1.414 1.530

1.474 1.5301.500 1.561

Soybean oil skew-t arma 10,0 1,10 1.033 1.478 41.306 1.558

1.6001.619

t-dist mixed 10,10 1,2,4,9,10 1,2,3,4,8 1.373 1.341 1.009 1.666 4/3-1.797 1.359 1.285 1.669

1.390 1.4741.510

Soybeans skew-t arma 10,0 1,2,5,8,9 1.028 1.514 41.5511.5561.582

skew-t mixed 10,10 1,2,5,8,10 1 -1.559 1.358 0.944 4/01.749 1.464

1.4771.558

Orange juice skew-t arma 10,0 1,2,3,10 1.033 1.505 41.5721.6231.660

skew-t mixed 10,10 1,2,5,9 1,2,5 1.556 1.518 1.060 2.460 4/11.542 1.8431.555 -2.7501.608

Sugar skew-t arma 1,2 1 1,2 1.000 4.590 34.7565.0105.487

t-dist mixed 2,2 1,2 1,2 4.373 1.002 1/014.637

Wheat skew-t arma 5,0 1,5 0.992 2.350 22.655

skew-t mixed 5,5 1,2,3,5 1,3,4 1.006 1.814 1.789 2.046 2/12.071 -2.434

Cocoa skew-t arma 10,0 1 1.022 0skew-t mixed 10,10 1,6,9 1,2,4,9,10 1.436 1.417 -1.435 1.202 4/4

1.486 1.740 1.4081.499 1.4141.508 1.426


Table 1.7.ii: Lag polynomial roots of the mixed and benchmark models

Model p/r,q/sa Sig.p/rb Sig.q/sb cRc cMCc ncRd ncMCd #CCe

Coffee t-dist arma 10,0 1,3 0.995 4.740 1skew-t mixed 10,10 1,2,5,6,10 1,2,5,6,7 1.375 1.027 1.684 5/2

1.403 1.571 1.7621.428 -1.6451.4301.446

Corn skew-t arma 2,0 1,2 1.000 051.190

t-dist mixed 2,3 1 1,2,3 -32.542 1.002 5.484 0/1Cotton skew-t arma 10,0 1,2,6,7 1.007 1.738 3

1.7071.615

t-dist mixed 1,3 0 1,2,3 1.003 5.317 0/1Rice skew-t arma 2,2 1,2 1,2 0.997 3.099 3

2.917 3.332-3.552 3.493

t-dist mixed 1,3 1 1,2,3 -15.328 1.001 5.003 0/1Lumber skew-t arma 1,1 1 1 1.005 13.181 4

13.23713.31413.375

skew-t mixed 10,10 1,2,4-10 1,5 1.015 1.235 -1.862 1.218 4/2-1.454 1.247 1.752

1.3361.900

Gold t-dist arma 3,0 1,2,3 0.999 5.618 1t-dist mixed 10,10 1,2,6,10 1 -1.450 1.395 0.974 4/0

1.489 1.4161.4311.434

Silver skew-t arma 10,0 1,2,4,8 1.003 1.606 3-1.874 1.715

1.751skew-t mixed 10,10 1,3-6,9,10 1,4,5,7 1.479 1.424 0.996 4

-1.533 1.424 1.600 1.721 4/21.451 -2.070 1.6431.327

Platinum skew-t arma 10,0 1,4,7,8,9 0.957 1.493 41.5281.5721.582

skew-t mixed 10,10 1,2,3,5-9 1,2,6-8,10 -1.786 1.355 0.974 1.304 4/41.376 1.257 1.3281.385 1.4011.860 1.594

a (p,q) or (r,s) pairs for ARMA(p,q) and mixed causal/noncausal AR(r, s) models respectively.b Significant lags at the 5% level assuming Normal distributed parameters.c Causal lag polynomial; real roots and modulus of complex roots respectively.d Noncausal lag polynomial; real roots and modulus of complex roots respectively.e Number of complex conjugate roots with the same modulus (causal/noncausal).


Table 1.7.iii: Lag polynomial roots of the mixed and benchmark models

Model p/r,q/s Sig.p/r Sig.q/s cR cMC ncR ncMC #CCPalladium skew-t arma 5,0 1,2,4,5 1.006 2.431 1

-2.4343.525

t-dist mixed 8,8 1,2-7 1,2,3,7,8 -1.618 1.621 0.989 1.547 3/31.632 1.536 1.5741.884 1.619

Copper skew-t arma 10,0 1,2,6 1.055 2.020 21.696 2.101

skew-t mixed 10,10 1,2,3,6 1,6,7,8 1.728 0.952 1.352 3/31.737 -1.323 1.4821.831 1.751

Light crude oil t-dist arma 2,0 1,2 0.999 0-23.729

skew-t mixed 1,3 1 1,2,3 -14.222 1.002 6.144 0/1Heating oil t-dist arma 2,0 1,2 0.999 0

-27.213t-dist mixed 10,10 1-4,7,9,10 1-6,9,10 1.245 1.279 1.032 1.259 4/4

-2.553 1.307 -1.505 1.3031.349 1.3151.368 1.372

Brent crude oil t-dist arma 2,2 1,2 1,2 0.989 2.466 32.255 2.621

-2.716 2.695skew-t mixed 10,10 1,4,9,10 1,2,5,6,9 1.261 1.292 1.068 1.276 4/3

-1.527 1.331 1.101 1.3881.336 -1.723 1.5401.500

Gas oil skew-t arma 1,0 1 0.998 0skew-t mixed 10,10 3,7,9,10 1,4,7-10 1.230 1.324 0.925 1.346 4/4

-2.140 1.328 -1.264 1.4831.341 1.5421.508 1.563

Natural gas t-dist arma 1,2 1 1 1.001 34.697 434.76534.83934.886

t-dist mixed 1,1 1 1 -31.650 1.001 0/0Gasoline RBOB skew-t arma 3,0 1,3 0.972 4.452 1

skew-t mixed 2,1 2 1 4.390 1.005 1/0Live cattle skew-t arma 10,0 1,5 1.019 2.408 1

1.973-2.543

t-dist mixed 10,10 1 3,4,6 0.994 1.896 0/31.7281.891

Lean hogs skew-t arma 5,0 1,4,5 0.984 2.555 1-2.5252.744

skew-t mixed 0,2 1.004 0/055.339


Figure 1.10.i: Plots of daily continuous contract futures price level series

100

150

200

250

300

350

400

450

500

55008

/01

1977

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Soybean meal from 07/18/1977 to 02/08/2013

10

20

30

40

50

60

70

80

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Soybean oil from 07/18/1977 to 02/08/2013

400

600

800

1000

1200

1400

1600

1800

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Soybeans from 07/18/1977 to 02/08/2013

40

60

80

100

120

140

160

180

200

220

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Orange juice from 07/18/1977 to 02/08/2013

0

5

10

15

20

25

30

35

40

45

50

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Sugar from 07/18/1977 to 02/08/2013

200

400

600

800

1000

1200

1400

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Wheat from 07/18/1977 to 02/08/2013

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

5500

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Cocoa from 07/18/1977 to 02/08/2013

0

50

100

150

200

250

300

350

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Coffee from 07/18/1977 to 02/08/2013


Figure 1.10.ii: Plots of daily continuous contract futures price level series

100

200

300

400

500

600

700

800

90008

/01

1977

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Corn from 07/18/1977 to 02/08/2013

20

40

60

80

100

120

140

160

180

200

220

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Cotton from 07/18/1977 to 02/08/2013

0

5

10

15

20

25

12/0

119

88

09/0

119

92

07/0

119

96

05/0

120

00

02/0

120

04

12/0

120

07

10/0

120

11

Rice from 12/06/1988 to 02/08/2013

100

150

200

250

300

350

400

450

500

04/0

119

86

01/0

119

90

11/0

119

93

09/0

119

97

06/0

120

01

04/0

120

05

02/0

120

09

11/0

120

12

Lumber from 04/07/1986 to 02/08/2013

0

200

400

600

800

1000

1200

1400

1600

1800

2000

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Gold from 07/18/1977 to 02/08/2013

0

5

10

15

20

25

30

35

40

45

50

08/0

119

77

09/0

119

82

09/0

119

87

10/0

119

92

11/0

119

97

12/0

120

02

01/0

120

08

02/0

120

13

Silver from 07/18/1977 to 02/08/2013

200 400 600 800

1000 1200 1400 1600 1800 2000 2200 2400

04/0

119

86

01/0

119

90

11/0

119

93

09/0

119

97

06/0

120

01

04/0

120

05

01/0

120

09

11/0

120

12

Platinum from 04/01/1986 to 02/08/2013

0

200

400

600

800

1000

1200

04/0

119

86

01/0

119

90

11/0

119

93

09/0

119

97

06/0

120

01

04/0

120

05

01/0

120

09

11/0

120

12

Palladium from 04/01/1986 to 02/08/2013


Figure 1.10.iii: Plots of daily continuous contract futures price level series

50

100

150

200

250

300

350

400

450

50012

/01

1988

09/0

119

92

07/0

119

96

05/0

120

00

02/0

120

04

12/0

120

07

10/0

120

11

Copper from 12/06/1988 to 02/08/2013

0

20

40

60

80

100

120

140

160

04/0

119

83

01/0

119

87

11/0

119

90

09/0

119

94

06/0

119

98

04/0

120

02

01/0

120

06

11/0

120

09

Light crude oil from 03/30/1983 to 02/08/2013

0

50

100

150

200

250

300

350

400

450

07/0

119

86

04/0

119

90

02/0

119

94

12/0

119

97

09/0

120

01

07/0

120

05

04/0

120

09

02/0

120

13

Heating oil from 07/01/1986 to 02/08/2013

0

20

40

60

80

100

120

140

160

07/0

119

88

04/0

119

92

02/0

119

96

12/0

119

99

09/0

120

03

07/0

120

07

04/0

120

11

Brent crude oil from 06/23/1988 to 02/08/2013

0

200

400

600

800

1000

1200

1400

07/0

119

89

04/0

119

93

02/0

119

97

12/0

120

00

09/0

120

04

07/0

120

08

04/0

120

12

Gas oil from 07/03/1989 to 02/08/2013

0

2

4

6

8

10

12

14

16

04/0

119

90

01/0

119

94

11/0

119

97

09/0

120

01

06/0

120

05

04/0

120

09

01/0

120

13

Natural gas from 04/03/1990 to 02/08/2013

50

100

150

200

250

300

350

400

10/0

120

05

01/0

120

07

04/0

120

08

07/0

120

09

11/0

120

10

02/0

120

12

Gasoline RBOB from 10/04/2005 to 02/08/2013


Figure 1.10.iv: Plots of daily continuous contract futures price level series

30 40 50 60 70 80 90

100 110 120 130 140

08/0

119

77

05/0

119

81

03/0

119

85

01/0

119

89

10/0

119

92

08/0

119

96

05/0

120

00

03/0

120

04

01/0

120

08

10/0

120

11

Live cattle from 07/18/1977 to 02/08/2013

20

30

40

50

60

70

80

90

100

110

04/0

119

86

01/0

119

90

11/0

119

93

09/0

119

97

06/0

120

01

04/0

120

05

01/0

120

09

11/0

120

12

Lean hogs from 04/01/1986 to 02/08/2013


Figure 1.11.i: Histograms of daily continuous contract futures price level series

0

100

200

300

400

500

600

700

100 150 200 250 300 350 400 450 500 550

Soybean meal from 07/18/1977 to 02/08/2013

0

50

100

150

200

250

300

350

10 20 30 40 50 60 70 80

Soybean oil from 07/18/1977 to 02/08/2013

0

50

100

150

200

250

300

350

400

450

500

400 600 800 1000 1200 1400 1600 1800

Soybeans from 07/18/1977 to 02/08/2013

0

50

100

150

200

250

300

350

40 60 80 100 120 140 160 180 200 220

Orange juice from 07/18/1977 to 02/08/2013

0

100

200

300

400

500

600

700

0 5 10 15 20 25 30 35 40 45 50

Sugar from 07/18/1977 to 02/08/2013

0

100

200

300

400

500

600

200 400 600 800 1000 1200 1400

Wheat from 07/18/1977 to 02/08/2013

0

50

100

150

200

250

300

350

400

450

500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500

Cocoa from 07/18/1977 to 02/08/2013

0

50

100

150

200

250

300

25 50 75 100 125 150 175 200 225 250 275 300 325

Coffee from 07/18/1977 to 02/08/2013


Figure 1.11.ii: Histograms of daily continuous contract futures price level series

0

50

100

150

200

250

300

350

400

450

100 200 300 400 500 600 700 800 900

Corn from 07/18/1977 to 02/08/2013

0

100

200

300

400

500

600

700

20 40 60 80 100 120 140 160 180 200 220

Cotton from 07/18/1977 to 02/08/2013

0

50

100

150

200

250

0 5 10 15 20 25

Rice from 12/06/1988 to 02/08/2013

0

50

100

150

200

250

300

100 150 200 250 300 350 400 450 500

Lumber from 04/07/1986 to 02/08/2013

0

200

400

600

800

1000

1200

0 200 400 600 800 1000 1200 1400 1600 1800 2000

Gold from 07/18/1977 to 02/08/2013

0

200

400

600

800

1000

1200

1400

1600

1800

0 5 10 15 20 25 30 35 40 45 50

Silver from 07/18/1977 to 02/08/2013

0

100

200

300

400

500

600

700

200 400 600 800 1000 1200 1400 1600 1800 2000 2200 2400

Platinum from 04/01/1986 to 02/08/2013

0

100

200

300

400

500

600

700

0 200 400 600 800 1000 1200

Palladium from 04/01/1986 to 02/08/2013


Figure 1.11.iii: Histograms of daily continuous contract futures price level series

0

50

100

150

200

250

300

350

400

450

500

50 100 150 200 250 300 350 400 450 500

Copper from 12/06/1988 to 02/08/2013

0

100

200

300

400

500

600

0 20 40 60 80 100 120 140 160

Light crude oil from 03/30/1983 to 02/08/2013

0

100

200

300

400

500

600

700

800

900

1000

0 50 100 150 200 250 300 350 400 450

Heating oil from 07/01/1986 to 02/08/2013

0

50

100

150

200

250

300

350

400

450

500

0 20 40 60 80 100 120 140 160

Brent crude oil from 06/23/1988 to 02/08/2013

0

50

100

150

200

250

300

350

400

450

500

0 200 400 600 800 1000 1200 1400

Gas oil from 07/03/1989 to 02/08/2013

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Natural gas from 04/03/1990 to 02/08/2013

0

5

10

15

20

25

30

35

40

45

50 100 150 200 250 300 350 400

Gasoline RBOB from 10/04/2005 to 02/08/2013


Figure 1.11.iv: Histograms of daily continuous contract futures price level series

0

50

100

150

200

250

300

350

400

450

30 40 50 60 70 80 90 100 110 120 130 140

Live cattle from 07/18/1977 to 02/08/2013

0

50

100

150

200

250

300

20 30 40 50 60 70 80 90 100 110

Lean hogs from 04/01/1986 to 02/08/2013

Chapter 2

Improving Bayesian VAR density forecaststhrough autoregressive Inverse WishartStochastic Volatility

2.1 Introduction

Forecasts of macroeconomic time series have become a ubiquitous component of any policymaker’s toolkit. As

such, central banks like the Federal Reserve typically publish density forecasts for inflation, output, interest rates,

or other major indicators. This information helps both industry and consumers make decisions consistent with

economic fundamentals. However, forecasts themselves are not infallible. In fact, while major advances have

been made in the area of statistical forecasting, there remains much room for improvement.

This paper resolves some of the relevant issues by proposing a key change in the volatility process of Vector

Autoregressions (VAR) popular among macroeconomists. Instead of assuming that the time varying VAR inno-

vation covariance structure is driven by independent nonstationary processes, we employ a stationary multivariate

Inverse Wishart process where the scale matrix is a function of past covariance matrices. Furthermore, we employ

four major U.S. macroeconomic data series, that are, the rate of GDP growth, the inflation rate, the interest rate,

and the unemployment rate, respectively.1 A Bayesian approach, employing Markov Chain Monte Carlo meth-

ods (MCMC), is then taken in both estimation and in comparing forecasts between the benchmark model [Clark

(2011)] and our competing Inverse Wishart autoregressive volatility specification [Philipov and Glickman (2006)].

Results suggest that incorporating the more sophisticated Inverse Wishart autoregressive volatility process can im-

prove density forecasts in both the short and long-run, with larger improvements as the horizon increases, despite

1Note that the data is taken from the RTDSM database – the same dataset, in fact, as Clark (2011), our benchmark comparison model (withthe exception of the interest rate—see Section 2.2 on the data set, below). Moreover, all data is at the aggregate U.S. level. Finally the interestrate employed in our paper is the 3-month Federal Treasury Bill rate.

84

CHAPTER 2. IMPROVING VAR FORECASTS THROUGH AR INVERSE WISHART SV 85

a small data set and increased parameterization of the model. With this in mind, the following discussion aims to

provide a broader context surrounding the relevant forecasting issues precipitating this proposed modification to

the typical VAR process volatility specification.

2.1.1 Background

A fundamental issue facing the production of good forecasts has been that of how to deal with the changing mo-

ments of the conditional forecast distributions. For example, dramatic changes in U.S. economic volatility have

posed a modeling challenge to contemporary forecasters, specifically among macroeconomists where Gaussian

VAR models are popular. An analysis of major U.S. economic indicators, such as output growth over the past 100

years, illustrates that the economy goes through periods of changing volatility. For example, “The Great Moder-

ation,” which began in the 1980’s, represented a period of unusually low volatility vis-a-vis both a lengthy prior

period of erratic volatility and the more recent instability we’ve experienced since 2007. In this respect, both Sims

(2001) and Stock (2001), in separate discussions of Cogley and Sargent’s (2001) paper, criticized the assumption

of homoskedastic VAR variances, pointing to evidence analyzed by Bernanke and Mihov (1998a,1998b), 2 Kim

and Nelson (1999), or McConnell and Perez Quiros (2000). 3 Clark (2011) also finds significant changes in

conditional volatility across time when estimating the latent stochastic volatilities of a Bayesian VAR model.

It should not come as a surprise then, that while, traditionally, the volatility of forecasting models was assumed

constant over time (primarily for the sake of simplicity), it can be shown that this assumption leads to poor

conditional forecasts. For example, Jore, Mitchell, and Vahey (2010) employ a model averaging approach to

U.S. data, with both equal weights and recursively adapted weights, based on log predictive density scores across

a range of different specifications. Their results show strong support for a recursive weighting scheme across

specifications. More interestingly, however, they find that during periods of changing macroeconomic volatility,

for example when the U.S. economy transitioned into “The Great Moderation”, the weighting scheme tends to

place more weight on specifications which dynamically account for structural breaks in volatility. Moreover, they

find evidence of poor forecasting given a simple assumption of fixed volatility or equal weights across model

specifications. However, it worth noting that the specifications which do respond to structural change within Jore

et al.’s (2010) framework are limited in that they are restricted to a finite set of possible volatility states and breaks.

Consequently, it is important to account for changing volatility in any forecasting specification. Furthermore,

if such changes in volatility occurred relatively infrequently and could be extracted from the data with reasonable

statistical significance, then employing a regime switching specification such as in Jore et al. (2010) might prove

sufficient in drawing good forecasts. However, the truth is that, given the complexity of the economy, changes in

2In the case of monetary policy shocks between 1979 and 1982.3With respect to the growing stability of output around 1985.


volatility probably occur much more frequently and take on many more values than can be effectively captured

by a finite state-space model. For this reason forecasters have adopted a continuous state-space framework for

estimating the conditional volatility of VAR models as opposed to the finite state regime switching type model

applied to volatility, as popularized by Hamilton (1989), and employed by Jore et al (2010). Moreover, the use of

the so-called continuous state “Stochastic Volatility” model has also grown in popularity given its usefulness in

modeling a latent volatility process based on a filtration that includes more than just lagged VAR series shocks, as

for example in the case of a GARCH model or a volatility-in-mean model.4

Both Cogley and Sargent (2005), and Primiceri (2005), allow for time variation in the conditional covariance

matrix across VAR series shocks according to a Stochastic Volatility law of motion, where the conditional volatil-

ity can take on any value in a continuous positive real set (and covariances can be any real number). Moreover,

they also allow for time variation in the VAR parameters themselves, through another Stochastic Volatility law of

motion on their state across time. Clark’s (2011) model, which represents our benchmark, follows the same struc-

ture of the previous two studies, albeit without the time varying VAR parameters, which are dropped in favour of

tight Bayesian steady state priors on the deterministic trend parameters (which define the unconditional mean of

the VAR process) and a rolling sample window which re-estimates the parameters across time.

Villani (2009) showed that imposing Bayesian steady state prior distributions allow us to incorporate prior

beliefs about macroeconomic variable steady states into our model. Furthermore, our belief is that employing this

information probably reduces the need for time varying VAR parameters since much of the time variation in the

autoregressive parameters (which is not due to a lack of time variation in the shock covariance, as was the case

with Cogley and Sargent (2001)), may in fact be due to a lack of a well defined trend (see Cogley and Sargent

(2005), where they model their VAR intercepts5 as stochastic random walks). Moreover, given the quarterly

nature of most macroeconomic time series, small sample sizes are usually the norm. In this situation a tight prior

also plays the role of constraining VAR parameters to aid in the identification of trends that might not otherwise

be readily apparent.6 In this respect, Villani (2009) demonstrates that informative priors for trends can greatly

improve point forecasts, especially over the longer term horizon where correct specification of the trend of the

series is important—see Clements and Hendry (1998). All of this of course assumes that our prior beliefs on the

nature of the time series trends are accurate. In fact, whether or not the trends in macroeconomic data are better

modeled as stochastic (i.e. unit roots with drift) or deterministic is still an open question of debate.

However, most of these studies adopt certain features which could still be improved upon. For example, many

of these studies construct the VAR innovation covariance dynamics as driven by independent nonstationary pro-

4See also Sartore and Billio (2005) for a useful survey of Stochastic Volatility.5Noting of course that given their formulation, the VAR long-run mean µt is both time varying, stochastic, and a function of the VAR

intercepts, αt, as µt = (I− Φt)−1αt.

6In Clark (2011) for example, his rolling sample window is only of size T = 80.


cesses. Often, some form of fixed relationship is imposed between the elements of the VAR innovation covariance

matrix and the independent processes driving them. This is done in order to reduce the parameterization of the

model, but it may limit the richness of the covariance matrix dynamics.

The empirical rationale for this choice of specification is not entirely clear since an analysis of many macroe-

conomic time series suggest stationary volatility dynamics. Moreover, without explicitly parameterizing time

varying covariances matrices, it is extremely difficult to interpret any volatility spill-over effects, as have been

shown to be prevalent among financial time series from U.S. markets (see for example, Diebold and Yilmaz,

(2008)). Furthermore, studies such as Cogley and Sargent (2005) seem to provide little explicit justification for

the choice of a nonstationary process driving VAR innovation volatility, other than a brief comment that “the

random walk specification is designed for permanent shifts in the innovation variance, such as those emphasized

in the literature on the growing stability of the U.S. economy.” Ultimately, if the assumption of nonstationarity is

misspecified and there does in fact exist volatility spill-over across macroeconomic time series, then these existing

specifications leave something to be desired.

Given this, the multivariate volatility process can be constructed to directly model the time varying covariance

matrices without simply extending, in an ad hoc way, the traditional univariate specification to the multivariate

case. Moreover, any autoregressive persistence in volatility can be captured and a finite unconditional mean can

be specified. Philipov and Glickman (2006) [see also Chib, Omori, and Asai (2009)] apply such an autoregressive

Inverse Wishart process to analyze the conditional volatility of financial data and find that it improves volatility

forecasts over simpler formulations, where a number of Bayesian and frequentist measures are applied to compare

forecast accuracy given a variety of competing specifications. It is worth noting, however, that there exist problems

with the Philipov and Glickman (2006) implementation of the Inverse Wishart autoregressive volatility process as

it stands—see Rinnergschwentner et al. (2011) for more details and quite a few corrections. Therefore, in light

of Phillipov and Glickman’s idea of Inverse Wishart covariance modeling, we propose a modified version of their

process which we feel serves the purpose of multivariate forecasting better than Clark (2011)—see Section 2.3

below for more details, including differences between our model and that of Philipov and Glickman (2006).

The rest of the paper is organized as follows. Section 2.2 discusses the data and the methodology used to adjust

the data for trends. Section 2.3 discusses both the benchmark model based on Clark (2011) and the proposed

Inverse Wishart process modification based on Philipov and Glickman (2006). Section 2.4 then details the trend

specification and the conjugate priors we impose within the Bayesian framework. Section 2.5 then discussion

estimation of the model parameters by Gibbs sampler method. Section 2.6 discusses the method whereby we

generate forecast densities for both the VAR levels and covariance matrices across various horizons. Section 2.7

details the results of both the estimation process and the forecast comparisons based on Bayesian analysis of the

predictive likelihoods. Finally, Section 2.8 summarizes and concludes.


2.2 Data

We consider four macroeconomic time series generated from aggregate U.S. data, that are

1. the real output growth,

2. a measure of the inflation rate,

3. the unemployment rate,

4. and an interest rate.

The data source is the same as in Clark (2011): the so-called “real-time” 7 data from the Federal Reserve Bank

of Philadelphia’s Real-Time Data Set for Macroeconomics (or “RTDSM”). The total sample size is quite small:

only T = 252 data points extending from the 2nd quarter of 1948 (hereon denoted as 1948:Q2) until the 1st

quarter of 2011. Output from the RTDSM database is quarterly real data and measured as either Gross Domestic

Product (GDP) or Gross National Product (GNP) depending on the data vintage.8 Inflation from the RTDSM is

also measured quarterly and as either a GDP or GNP deflator or a price index, depending on the vintage. We

measure growth and inflation rates as annualized log changes.9 The unemployment rate, however, is available

on a monthly basis so we simply average across each quarter in matching the quarterly nature of output and

inflation. Moreover, it should be noted that the unemployment rate tends to differ much less dramatically across

vintages. Finally, while Clark (2011) employs the federal funds interest rate series, Primiceri (2005) recommends

the nominal annualized yield on 3-month Federal Treasury Bills, since this series goes back much further. We

therefore adopt the latter series, and again, average across quarters since the data is monthly.10 Finally, output,

inflation, and the unemployment rate are already seasonally adjusted by their source providers.

Clark and McCracken (2008,2010) also provide evidence that point forecasts of GDP growth, inflation, and

interest rates are improved by specifying the latter two series as deviations from some form of deterministic trend

simulating inflation expectations. Given this result Clark (2011) adopts the Blue Chip Consensus forecast pro-

duced from survey data and published by Aspen Publishers Ltd., as a form of long-term inflations expectations.

Unfortunately, as Clark mentions in his online appendix, the data for this Blue Chip forecast of inflation expecta-

tions only extends back to the fourth quarter of 1979 (i.e. 1979:Q4). Therefore, Clark appends an exponentially

smoothed trend from his inflation series to be beginning of the Blue Chip series in extending it back to 1964. Clark7That is data that is regenerated annually to conform to new changes in the way we measure macroeconomic indicators, or to take into

account flaws in some previous set, observed ex-post. Each new issue is deemed a “vintage.”8The RTDSM generates entirely new time series each quarter (deemed “vintages”) based on updated chain weighting techniques or other

improvements. Thus newer vintages represent larger samples than older ones which were generated at previous dates.9Since log differences are already continuously compounded, we simply multiply each quarterly value by 4.

10The 3-month Federal Treasury bill rate series employed is a combination of two very similar series joined together at June 2000,since the first vintage was discontinued. “H15/discontinued/H1.RIFSGFPIM03 N.M” is the unique ID for the discontinued seriesand “H15/H15/RIFLGFCM03 N.M” is the newer series. Both series are available at the Federal Reserve website: http://www.federalreserve.gov/releases/h15/data.htm.


mentions that despite his attempts at keeping the data “as real time as possible” by employing every quarterly vin-

tage of inflation data, in the end a trend based on his most recent vintage (2008:Q4) deviates little from the others.

Moreover, as Clark notes, Kozicki and Tinsley (2001a,2001b) and Clark and McCracken (2008), both suggest

that exponentially smoothed trends of the inflation rate match up reasonably well with survey-based measures of

long-run expectations in the data since the early 1980’s. Given both of these facts, we will simply employ an

exponentially smoothed trend of the inflation rate through the most recent vintage currently available (2011:Q4)

in generating a long-term inflation expectations series, skipping the Blue Chip survey data entirely and ignoring

the previous vintages of inflation data.11

Finally, the unemployment rate series is also detrended by an exponential smoother (in the same way the

inflation rate was detrended in order to generate the long-run inflation expectations [see footnote 11]).

Therefore, to summarize:

1. GDP growth is not detrended (but will be centered on a long-run constant mean of 3.0% through the prior

distribution).

2. The inflation rate is detrended by its exponentially smoothed trend (with a smoothing parameter of α =

0.05).

3. The interest rate (3-month Treasury bill) is detrended around the same trend as inflation (which is supposed

to simulate long-term inflation expectations), although we force a long-run constant mean of 2.5% above

trend through the prior on the unconditional mean.

4. The unemployment rate is detrended by its exponentially smoothed values lagged one period, with a

smoothing parameter of α = 0.02.

See the model and estimation Sections 2.3 and 2.5 respectively, for more details as to how these trends are

implemented into the model.

2.3 Model specifications

The benchmark model is the Bayesian VAR, steady-state prior, Stochastic Volatility specification (BVAR-SSP-

SV) as outlined in Clark (2011). This model employs a Bayesian V AR(J) formulation for the detrended series,

where the covariance matrix of the VAR innovations is driven by linear functions of separate univariate, indepen-

dent, geometric Brownian motions.

11The exponential smoother employed is as follows: y∗t = y∗t−1 + α(yt − y∗t−1), where yt is the actual data series and y∗t is theexponentially smoothed trend. α is a parameter which can be adjusted depending on how “tight” we want the trend to follow the data series.For the inflation rate trend used as long-term inflation expectations, Clark suggests a value of α = 0.05.


2.3.1 Benchmark model

We refer to this benchmark model from Clark (2011) as the Clark specification:

vt = Π (L) (yt −Ψdt) , (2.1a)

where Π (L) = Ip −J∑j=1

ΠjLj and L is the lag operator, (2.1b)

vt = B−1Λ0.5t εt where εt ∼MVNp (0, Ip) , (2.1c)

and B is lower triangular with 1’s along the main diagonal.

Moreover, Λt = diag(λ1,t, λ2,t, . . . , λp,t), (2.1d)

ln (λi,t) = ln (λi,t−1) + ξi,t, ∀i = 1, . . . , p, (2.1e)

and ξi,t ∼ i.i.d. N(0, ϕi), ∀i = 1, . . . , p. (2.1f)

The first equation, (2.1a), is a vector autoregressive model applied to the macroeconomic series, adjusted for

trends. The trends have to be estimated by means of the Ψ matrix. More precisely, we introduce the state variables

dt as in Villani (2009). The state variables can be chosen in a number of ways.12 The yt admits a time varying

unconditional mean, µt = Ψdt, where Ψ is a p× q matrix and dt is a q × 1 vector of deterministic trends.

Π(L) is the matrix lag-polynomial and vt denotes the innovations of the process (yt). The second equation,

(2.1c), highlights the form of the stochastic volatility of the innovations. The stochastic volatility is V ar[vt|Λt] =

B−1ΛtB−1′ = Γt, and conditionally standardized innovations (εt) are i.i.d. Gaussian, for any volatility history.

Thus these standardized innovations are independent of the volatility process Γt.

The dynamic of the volatility process is very constrained, since the serial dependence arises only through the

diagonal matrix Λt, not by means of B, which is unchanging across time. Finally, the natural logarithms of the

diagonal elements of the Λt matrix are assumed to follow independent Gaussian random walks.

A few points of discussion are worth mentioning. First, if λi,t = λi,t−1 for all t, then the underlying processes,

λi,t, that drive the volatility of vt cannot be identified independently of the B matrix. Moreover, the choice of B

being constrained to be lower-triangular solves the identification problem of identifying the elements of Λt from

those of B−1.

Note that the Clark specification is not invariant to permutations in the asset order within the VAR. Indeed,

12For example, if dt = 1, ∀t, i.e. takes on a single constant value for all time periods, then Ψ is a vector of regression constants, the valuesof which determine the time invariant long-run means of the autoregressive levels processes, yt. However, if for example, dt = t, then theelements of the vector Ψ represent the slope coefficients of a linear time-trend relationship shared by each of the yt series. Furthermore, ifdt = [t, f(t)] for example, where f(t) is perhaps some nonlinear function of t, then Ψ is now a matrix of “factor loadings,” the elementsof which reflect how the time varying long-run means of each process are expressed as a linear combination of both the linear and nonlineartime trend, simultaneously. This approach to modeling the unconditional levels first-order moment allows for greater flexibility. For example,we could incorporate a pre-exponentially smoothed trend as one possible nonlinear function of time f(t) as above. See Section 2.4 for moredetails.


without loss of generality, let us consider the bivariate case. The variance of the innovation vt is

E[vtvt′] = E[B−1Λt

0.5εtεt′Λt

0.5(B−1)′] = B−1Λt(B

−1)′

= Γt (2.2a)

=

b11 0

b21 b22

λ1,t 0

0 λ2,t

b11 b21

0 b22

(2.2b)

=

(b11)2λ1,t b11b21λ1,t

b11b21λ1,t (b21)2λ1,t + (b22)2λ2,t

, (2.2c)

where bij , i, j = 1, 2 denote the elements of the B−1 matrix.

Therefore, the variance of the innovation of the first series and the covariance between the two innovations

depend only on the process λ1,t. However, the variance of the second series depends on both processes λ1,t and

λ2,t. Therefore, shocks to the processes ξi,t have asymmetric effects on the variances of the innovations v1,t and

v2,t. This asymmetry is chosen arbitrarily by our order of assignment of the series to the VAR.

In the Clark specification, the volatility and covolatility processes are nonstationary. By the properties of the

Gaussian random walk, we get

ln(λi,t)| ln(λi,0) ∼ N [ln(λi,0), tϕi]. (2.3)

We deduce that

E[λi,t|λi,0] = λi,0 exp

(tϕi2

). (2.4)

On average we get an exponential rate of explosion of the diagonal elements of the matrix Λt. If ϕi > ϕj ,

say, the volatility of series i becomes, asymptotically, infinitely larger than the volatility of series j. And so as

t → ∞ we have that, conditional on past information, the process λi,t is divergent (i.e. explosive). This result

implies that forecasts of the VAR innovation covariance matrices will have explosive elements, which is not a

suitable property of Clark’s model.

However, all is not lost; a similar type of argument shows that if we respecify the λi,t process as

ln(λi,t) = zi,t = α+ βzi,t−1 + ξi,t, (2.5)

then for |β| < 1 the process λi,t is now convergent with unconditional mean

E[λi,t] = exp α

1− β+

ϕi2(1− β2)

. (2.6)


2.3.2 Alternative volatility process specification

As an alternative to the Clark specification, we propose respecifying the volatility process to more easily account

for spill-over effects in covariance across time, through the use of a multivariate Inverse Wishart specification.

While in the Clark specification the nonstationary λi,t processes drive the covariance dynamics, Γt, through

the lower-triangular B matrix, the Inverse Wishart model specifies the dynamics of the latent covariance matrices

directly. Fundamentally, the Inverse Wishart process implies stationary covariance matrix dynamics. Therefore,

the Inverse Wishart process also allows us to formulate the covariance matrix process as autoregressive with a

finite unconditional mean that exists under certain conditions defined below.

The Inverse Wishart Stochastic Volatility (IWSV ) model, is given as follows:

vt = Π (L) (yt −Ψdt) , (2.1a)

vt|Σt,yt−1∼MVNp(0,Σt), (2.7a)

Σt|Σt−1,yt−1∼ IWp(ν,St−1), (2.7b)

where St−1 =

(CC

′+

K∑k=1

AkΣt−kA′

k

)(ν − p− 1) , (2.7c)

where yt, for instance, denotes the set of current and lagged values of yt, and IWp(ν,S), denotes the Inverse

Wishart distribution with dimension p, degree of freedom (i.e. shape parameter) ν, and scale matrix S.13 The

specification of the scale matrix in (2.7c) is the same as in the multivariate ARCH models considered in Engle and

Kroner (1995). In particular, the p× p matrices, C and Ak, k = 1, . . . ,K, are identified if C is lower triangular

with strictly positive elements along the main diagonal, and if the top left element of each Ak are strictly positive.

The Inverse Wishart distribution is a continuous distribution for stochastic, symmetric, positive-definite matri-

ces [see e.g. Press (1982)]. The joint density function of the Inverse Wishart distribution has the simple analytic

expression given as:

f(Σ; ν,S) =|det(S)|ν/2|det(Σ)|−

ν+p+12

2vp2 Γp(ν/2)

exp [−1

2Tr(SΣ−1

)], (2.8)

where Tr(·) denotes the trace operator and Γp(·), the multivariate Gamma function.14

The dynamics of the stochastic volatility matrix given in (2.7c) do not involve lagged values of the series

variable (yt). Thus, the stochastic volatility is exogenous and the IWSV specification assumes no leverage

effects.

13A stochastic, symmetric, positive-definite matrix Σ follows the Inverse Wishart distribution: Σ ∼ IWp(ν,S), if and only if Σ−1

follows the Wishart distribution: Σ−1 ∼Wp(ν,S−1).14The multivariate Gamma function is defined as Γp(a) = πp(p−1)/4

∏pj=1 Γ(a + (1 − j)/2), and depends on the dimension p and

argument a.


From the properties of the Inverse Wishart distribution, we deduce the first and second-order conditional

moments of the volatilities [Press (1982)]:

E[Σt|Σt−1] =St−1

ν − p− 1= CC

′+

K∑k=1

AkΣt−kA′

k, (2.9a)

V ar[σij,t|Σt−1] =(ν − p+ 1) s2

ij,t−1 + (ν − p− 1) sii,t−1sjj,t−1

(ν − p)(ν − p− 1)2(ν − p− 3)

, (2.9b)

and Cov[σij,t, σkl,t|Σt−1] =2sij,tskl,t + (ν − p− 1) (sik,tsjl,t + sil,tskj,t)

(ν − p) (ν − p− 1)2

(ν − p− 3), (2.9c)

where σij is the ijth element of Σt and sij,t−1 is the ijth element of St−1.

This specification is similar to that in Philipov and Glickman (2006) although we modify it slightly. For

one, we add the constant matrix CC′

to the scale matrix expression (2.7c) in order to allow for a non-zero

unconditional mean of the volatility process. Secondly, we add a number of lags K instead of just one. Finally,

Philipov and Glickman (2006) employ an extra parameter, d, to allow for a geometric autoregressive recursion of

varying rates, as opposed to the fixed arithmitic average employed here. For instance, they consider the similar

model with autoregressive lag order set to 1, but allow for the lagged effect to be taken account by means of a

Σt−1d matrix:15

St−1 = νA−1/2′Σt−1dA−1/2 (2.10)

Note, the purpose here is not to improve on the Philipov and Glickman model, but rather to suggest something

similar as a useful alternative to Clark (2011) in terms of forecasting.

At this point we present weak stationarity conditions of the IWSV volatility process.

Proposition 2.3.1. Existence of the unconditional mean of the IWSV process

The unconditional mean of the IWSV process exists if and only if all the eigenvalues of the matrix

Υ =∑Kk=1 Ξk are less than 1 in modulus. In this case the unconditional mean is given by:

E [σt] =(Ig −Υ)−1c, (2.11)

where g = p(p+1)2 , c = vech

(CC

′), σt = vech (Σt) , and Ξi = L (Ai ⊗Ai) D. The

existence of the unconditional mean is a necessary condition for (weak) stationarity.

Note that in this case L and D are the elimination and duplication matrices respectively so that vec (X) =

Dvech(X) and vech (X) =Lvec(X).16

15Note that in Philipov and Glickman (2006) the use of notation is different. For example, they have that Σt−1|ν,St−1 ∼Wp(ν,St−1),

where St−1 = 1νA1/2

(Σt−1

−1)d

A1/2′ . Therefore, this implies that Σt|ν,S−1t−1 ∼ IWp(ν,S−1

t−1) and so depends on the inverse scalematrix instead of the scale matrix. Therefore our scale matix is the inverse of theirs.

16The duplication matrix is the unique n2 × n(n + 1)/2 matrix, D, which, for any n × n symmetric matrix X, transforms vech(X)


Proof of Proposition 2.3.1

From equation (2.9a) we have:

Σt = CC′+

K∑k=1

AkΣt−kAk

′+ Zt, (2.12)

where Zt is a mean zero matrix of weak white noises.

First, by recursive substitution of Σt−i, i = 1, . . . we can show that the right hand side of (2.12) converges in

expectation. Next, taking unconditional expectation Σ = E[Σt] we have:

Σ = CC′+

K∑k=1

AkΣAk

′. (2.13)

Vectorizing above we have:

vec(Σ) = vec(CC′) +

K∑k=1

vec(AkΣAk

′) (2.14a)

= vec(CC′) +

K∑k=1

(Ak ⊗Ak

′)vec(Σ) (2.14b)

⇒ Lvec(Σ) = Lvec(CC′) +

K∑k=1

L(Ak ⊗Ak

′)

Dvech(Σ) (2.14c)

⇒ vech(Σ) = vech(CC′) +

K∑k=1

Ξkvech(Σ). (2.14d)

And so Proposition 2.3.1 follows.

The condition in Proposition 2.3.1 is a necessary condition for stationarity, but not a sufficient condition. In a

Bayesian approach, we are interested in the whole distribution, not in the mean only. Thus strict stationarity, that

is concerning the entire distribution, has to be considered, not only weak stationarity. Unfortunately, necessary

and sufficient conditions for the strict stationarity of the autoregressive Inverse Wishart process have not yet been

derived in the literature.17

2.3.3 Comments

The advantages of such a change to the specification of the volatility process defined by the IWSV model are as

follows:

into vec(X), where vec(·) is the vectorization operator which maps from the n× n dimensional space to the n2 × 1 dimensional space andvech(·) is the operator that omits the lower (resp. upper) triangle of the symmetric matrix X so that it maps from the n×n dimensional spaceinto the n(n+ 1)/2× 1 dimensional space. The elimination matrix performs the inverse operation: it is the unique n(n+ 1)/2×n2 matrix,L, which, for any n× n symmetric matrix X, transforms vec(X) into vech(X). See Magnus and Neudecker (1980) for more details.

17Whereas they have been derived for the analogue Wishart autoregressive process (WAR) [see Gourieroux et al. (2009)].


1. The direct specification of the dynamics of the latent stochastic volatility process, Σt, precludes the need

to specify a B matrix.

2. These autoregressive dynamics between volatility series are more easily interpreted as volatility spill-over

effects, since we no longer need to disentangle the convoluted relationships implied by the B matrix and

the independent volatility driving processes, λi,t.

3. The model is now invariant to permutation of the order of the observed series.

4. As was shown, it is easy to derive conditions ensuring the existence of the unconditional mean of the

processes (Σt) and (yt). However, this condition is a weak one and the condition of strong stationarity

remains to be shown.

5. The assumption of stationary volatility dynamics resolves a problem with forecasts, since assuming nonsta-

tionary volatility would make our forecast density prediction intervals “blow up” as the horizon increases.

In both the Clark and IWSV specifications, the volatility processes are coupled to the vector autoregressive

process, with trends, for the observed variables yt. Both specifications have a state-space representation with the

V AR(J) as observation equation and the IWSV or Clark as the state equation, the covariance being the “state”

of the model. In control engineering, a state-space representation is a mathematical model of a physical system

as a set of input, output and state variables related by first-order differential equations. However, in this case the

model is not a linear state-space representation [see the system in (2.7)]. Therefore, the standard Kalman Filter

algorithm for extracting the state from the noise will not be optimal.

Note that our Inverse Wishart model can be inverted to show the precision matrix as Wishart distributed

(omitting the CC′

constant matrix for simplicity) as

Σt−1|Σt−1 ∼Wp(ν,

(K∑k=1

AkΣtAk

′

)−1

), (2.15)

or after a change of notation Ωt = Σt−1 as

Ωt|Ωt−1 ∼Wp(ν,

(K∑k=1

AkΩt−1Ak

′

)−1

). (2.16)

However, an autoregressive Wishart matrix process, Ωt, say, is usually written as [Gourieroux et al. (2009)]

Ωt|Ωt−1 ∼Wp(ν,

(K∑k=1

AkΩtAk

′

)). (2.17)


And so, the only difference between the two models is the specification of the scale matrix: as an arithmetic

average in the case of the standard Wishart autoregressive process, or as a harmonic average in our case. Therefore,

when the lag order of the autoregression K > 1, we have an asymmetry between the behaviour of the precision

matrix and covariance matrix processes. Of course, this issue does not affect Philipov and Glickman (2006) since

their autoregression is only of order 1.

Since we have that

E

[1

x

]≥ 1

E[x]⇔(E

[1

x

])−1

≤ E[x], (2.18)

we can expect that the harmonic average is smaller than an arithmetic average (even in the case of matrices, but we

omit the proof). It is possible that this inequality maybe useful in deriving sufficient conditions for strict stationary

for the IWSV autoregressive model.

As an aside, Fox and West (2011) also propose a novel class of stationary covariance matrix processes which

exploit properties of Inverse Wishart partitioned matrix theory. Specifically, by augmenting the parameter state-

space they show that we can easily obtain representations for the terms in a factorization of the joint density of

covariance matrices across time, f(ΣT , . . . ,Σ0) =∏Tt=1 f(Σt,Σt−1)

∏Tt=2 f(Σt−1). This expression defines a

stationary first-order Markov process on the covariance matrices across time, with the marginal distribution given

as Σt ∼ IWq(ν + 2, νS), and given the following augmented matrix

Σt φ′

t

φt Σt−1

∼ IW2q

ν + 2, ν

S SF′

FS S

, (2.19)

where φt = ΥtΣt−1, we have that by Inverse Wishart partitioned matrix theory the covariance process can be

written as an AR(1) process Σt = Ψt + ΥtΣt−1Υ′

t, with Υt representing a random coefficient matrix and Ψt

representing an innovation (note both Υt and Ψt are latent variables). Under this framework the conditional

density Σt|Σt−1 is not of an analytical form but can nonetheless be explored theoretically. See Fox and West

(2011) for more details.

2.4 Priors

The Bayesian estimation framework employed requires of us to specify certain prior beliefs upon the parameter

set and this is done through the specification of prior densities. In most cases the prior densities are chosen to be

conditionally conjugate—that is, they are chosen of a known family such that the conditional posterior density,

i.e. the density of a particular parameter, conditional on both the other parameters and the data, works out to be of

the same family as the prior. This facilitates estimation greatly since the need for arbitrarily choosing a suitable


proposal density, as in a Metropolis-Hastings algorithm (MH), is avoided completely—in fact, in this case the

proposal is always accepted and the MH algorithm is just a special case of the Gibbs sampler [Greenberg (2008),

pg.101]. The following Sections outline the specific families the prior densities take, as well as chosen values for

hyperparameters.

Clark and IWSV specifications share the same dynamic model for the observed macroeconomic time series,

yt, given the volatility path, with parameters θ1 = Ψ,Π1, . . . ,ΠJ, but different dynamics for the volatility

process, with parameters θ2 = B,Φ for the Clark model18 and θ2 = C, ν,A1, . . . ,AK for the IWSV

model. We assume that parameters θ1 and θ2 are independent under the prior distributions. We describe below in

greater detail the priors for both θ1 and θ2.

2.4.1 VAR(J) priors

i) Prior on Π

The prior for the VAR coefficients Πj, follow a modified Minnesota specification (see Litterman (1986)). In

this case we assume that the prior for the joint density of Π′= [Π1,Π2, . . . ,ΠJ] is Normal, Π ∼ N [µΠ,ΞΠ],

where the autoregressive order J is assumed known. Moreover, the prior mean of the joint density of the elements

of Π assumes that the VAR follows an AR(1) process, i.e. means of the prior density for all the elements of

autoregressive matrices beyond lag 1 are set to 0. Since GDP growth displays more autoregressive decay in

levels, we set its first-order autoregressive prior mean to 0.25 and set the others to 0.8. Cross equation prior

means, that is, the means for the prior density of the off-diagonal elements of Πik,1 for i 6= k, are also set to 0.

Let us now explain how the variances and covariances of the prior are chosen. Minnesota “own equation”

variances, that are, the variances of the prior density for the main diagonal elements of Πj, shrink as a harmonic

series for each additional lag (i.e. ωii,j = 0.2j for j = 1, . . . J) . Also, “cross equation” variances are typically set

to ωik,j = 0.5( 0.2j ×

σ∗iσ∗k

), where σ∗i is the estimated standard error of the residuals from a univariate autoregression

on the ith macroeconomic series with six lags, pre-fit to the data in advance. For simplicity, however, we will

employ ωik,j = 0.5( 0.2j ) instead, that is, the variance of the prior density for the i, kth element of Πj.

ii) Prior on Ψ

Priors on the deterministic parameters of Ψ defining the trend are extremely important given the modest

sample size employed and are chosen as to influence the series’ trends toward certain reasonable values. In the

case where the trend is assumed constant: dt = 1, and Ψ is a vector. This dramatically reduces the number

of parameters that need to be estimated as the number of series increases. However, it places a prior constraint

on the model by assuming that the selected constant trend chosen is correct. On the other hand, if we allow the

18Where Φ is the diagonal matrix with the variances of the λi,t volatility driving process shocks, ϕi, along its main diagonal.


trends to enter individually through the dt term where dt=[1, f(t− 1), g(t)]′, and f(t) and g(t) are exponentially

smoothed trends for the unemployment rate and inflation growth respectively, then Ψ becomes a p×3 matrix from

which we can statistically evaluate whether the relevant diagonal elements are indeed equal to 1 (which would

imply the trends are in fact correct).

In either case we assume that the prior for the joint density of the elements of Ψ is Normal, Ψ ∼ N [µΨ,ΞΨ].

Moreover, we assume that the priors for Ψ and Π are independent. GDP growth is influenced to have a constant

trend around 3.0% through its prior mean, while inflation and unemployment are influenced to center around their

trends, g(t) and f(t−1) which are exponentially smoothed values of inflation growth and the unemployment rate

respectively (see Section 2.2). Finally, the interest rate is centered around the same trend as inflation; however, we

also add to this the constant trend of 2.5% to reflect the real long-run rate. More precisely, for the macroeconomic

series taken in order as: GDP growth, the inflation rate, the interest rate, and unemployment rate, the prior mean

of Ψ will take the form

3.0

0

2.5

0

(2.20)

when the trends are constant, and

3.0 0 0

0 0 1

2.5 0 1

0 1 0

. (2.21)

when the trends are driven by the 3-dimensional dt.

The prior variances of Ψ are set as follows: GDP growth, 0.2 (0.3); inflation, 0.2 (0.3); the interest rate, 0.6

(0.75); and unemployment, 0.2 (0.3)—where these values have been adopted directly from Clark (2011). The first

values, not in parenthesis, represent those employed in the recursive estimation scheme and are tighter since the

gradually increasing sample size tends to limit the influence of the prior. Prior covariances for the elements of Ψ

are set to zero.

2.4.2 Volatility model priors

i) Clark model

For the Clark (2011) model, priors on the volatility components of the model are as follows. The prior density

for the elements of B is multivariate Normal and the prior for each of the ϕi, i = 1, . . . , p is Inverse Gamma, and


under the prior, the elements of B and the ϕi, i = 1, . . . , p, are independent. Finally, we borrow numerical values

of the hyperparameters directly from Clark’s paper [ Clark (2011) pg.331].

ii) IWSV model

For the Inverse Wishart autoregressive specification, we employ independent multivariate Normal priors on

both Ak, ∀k and C, and independently, a Gamma prior on (ν−p). The Gamma prior is set with hyperparameters

α = 30, β = 2 (shape and rate) as to represent ignorance of its value while the multivariate Normal priors for

the C and Ak’s are set somewhat loosely to let the data speak. In this respect, prior means for the main diagonal

of C are 0.3 and the main diagonal of A1 is set to 0.9 (both prior densities for the off-diagonals elements have

zero means, and the means of all other elements of the Ak, k = 1, . . . ,K matrices are set to 0). Variances are set

equal to 0.002 (i.e. standard deviation of about 0.045), and all covariances are set to zero.

2.5 Model estimation

Both Clark and IWSV model specifications are estimated within the Bayesian framework using Markov Chain

Monte-Carlo (MCMC) techniques, particularly the Gibbs sampler.

i) Gibbs sampler

Indeed, by selecting prior distributions in conjugate families, we can derive closed form expressions of con-

ditional posterior distributions. For expository purposes, let us consider a case in which the set of parameters can

be divided into two subsets, θ1 and θ2, such that we know the expression of conditional posterior distributions

p(θ1|θ2, y) and p(θ2|θ1, y). Let us also assume that it is easy to draw in these conditional posterior distributions.

In general, it is not possible to obtain the closed form expression for the joint posterior distribution p(θ1, θ2|y).

The Gibbs sampler is a method to derive numerically a good approximation of the joint posterior, while also

allowing us to draw in this posterior. The idea is to consider the Markov process θ(m), defined recursively by:

1. θ(m)1 is drawn in p(θ1|θ(m−1)

2 , y),

2. θ(m)2 is drawn in p(θ2|θ(m)

1 , y).

For large m ≥ M , the values θ(m) will approximately follow the invariant distribution of the Markov process

θ(m), that is, the joint posterior. In particular, θ(m), for large m, is a drawing in p(θ1, θ2|y).

This approach is easily extended when the set of parameters is divided into more than two subsets [see below

the sequence used for both the Clark and IWSV specifications].

ii) Augmented parameters

In a Bayesian framework there is little difference between parameter θ and latent volatilities Σt, t = 1, . . . , T .

They are both unobserved and stochastic. Therefore, the Gibbs sampler can be applied jointly to θ and ΣT =


Σ1, . . . ,ΣT to reconstitute the joint density p(θ,ΣT|yT). This joint density has two components, that are

p(θ|yT), which is the posterior distribution of the parameter, and p(ΣT|θ,yT), which is the filtering distribution

of the sequence of latent volatilities.

iii) Gibbs sampler steps - Clark (2011)

Specifically, given the parameters described in the Section 2.3.1 above, we have the following Gibbs sampling

steps for the Clark (2011) benchmark model, where the volatility driving process ΛT is introduced as an aug-

mented parameter to be estimated. All conditional posteriors below are conditional on the data, yT, left unstated

below for ease of exposition.

1. Draw the autoregressive coefficients Π′= [Π1,Π2, . . . ,ΠJ] of the VAR, conditional on Ψ, ΛT, B, and

Φ=diag(ϕ1, ϕ2, . . . , ϕp), given a conditionally conjugate multivariate Normal prior, Π ∼ N(µΠ,ΞΠ).

2. Draw the trend coefficients Ψ of the VAR, conditional on Π, ΛT, B, and Φ=diag(ϕ1, ϕ2, . . . , ϕp), given

a conditionally conjugate multivariate Normal prior, Ψ ∼ N(µΨ,ΞΨ).

3. Draw the elements of B (lower triangular with ones in the diagonal) conditional on Π, Ψ, ΛT, and

Φ=diag(ϕ1, ϕ2, . . . , ϕp), given Normal, independent, priors on each of the elements of the B matrix.

4. Draw the elements of the volatility driving process Λt for each time t = 1, . . . , T in sequence, each

conditional on Λ\t,Π, Ψ, B, and Φ=diag(ϕ1, ϕ2, . . . , ϕp), where the notation \t denotes the set of

all matrices except that at time t. A Metropolis-Hastings-within-Gibbs step is required here since the

posterior distribution is of an unknown family.

5. Draw the diagonal elements of Φ conditional on Π,, Ψ, B, and ΛT. We assume conditionally conjugate,

independent Inverse-Gamma priors on each ϕi ∼ IG(γ2 ,δ2 ).

iv) Gibbs sampler steps - IWSV model

Similarly, we have the following Gibbs sampling steps for the IWSV specification, where the covariance ma-

trices Σt are introduced as augmented variables. Again, all conditional posteriors below are implicitly conditional

on the data, yT.

1. Draw the slope coefficients Π′

= [Π1,Π2, . . . ,ΠJ] of the VAR, conditional on Ψ, ΣT, Ak, ∀k, C, and

ν, given multivariate Normal prior, Π ∼ N(µΠ,ΞΠ).

2. Draw the steady state coefficients Ψ of the VAR, conditional on Π, ΣT, Ak, ∀k, C, and ν, given multi-

variate Normal prior, Ψ ∼ N(µΨ,ΞΨ).


3. Draw the parameters Ak, ∀k,C, and ν jointly, conditional on Π, Ψ, and ΣT. Multivariate Normal priors

are imposed on the Ak matrices, and the C matrix. A Gamma prior is imposed on the degree of freedom,

ν. A Metropolis-Hastings-within-Gibbs step is required here since the posterior distribution is of an

unknown family.

4. Finally, draw Σt conditional on Σ\t,Ak, ∀k,C, ν, Π, and Ψ, ∀t in sequence. A Metropolis-Hastings-

within-Gibbs step is required here since the posterior distribution is of an unknown family.

The steps above are derived in greater detail within Appendix 2.11.

Under the IWSV process, the direct estimation of the latent stochastic volatility covariance matrix process

increases the number of latent parameters from Tp to Tp(p + 1)/2, raising quickly the curse of dimensionality

as an issue as p increases. Moreover, the number of regular parameters goes from p(p−1)2 + p to p(p+1)

2 +Kp2 +

1, although in this latter case many possible reparameterizations are possible to reduce the number of regular

parameters the IWSV must estimate.

Moreover, since conditionally conjugate priors are unknown at this point for the conditional posterior densities

of the IWSV regular parameters, a Metropolis-Hastings random walk sampler is employed. This additional

Metropolis-Within-Gibbs step requires some extra work to obtain reasonable draws.

Finally, the estimation methods employed for both specifications suffer from the fact that they draw the latent

stochastic volatility covariance matrices sequentially across time rather than jointly. Clearly, if the volatilities Σt

are significantly correlated across time, joint sampling would prove superior. This is since the conditional posterior

density may suffer from low variance, leading to draws which may fail to traverse the full parameter support in a

reasonable amount of time. See Greenburg (2008) pg.94 for a simple example illustrating the problem.

2.6 Forecasts

2.6.1 Point and interval forecasts

Given the Bayesian model estimation framework employed, forecasts can be easily obtained with little extra

computational overhead. Moreover, the Bayesian framework provides an intuitive way of comparing forecast

accuracy.

Generally, the desired predictive density of some forecasted value yf given data set y is

p (yf | y) =

∫p (yf | θ, y)π (θ | y) dθ (2.22)

where p(yf |θ, y) is the predictive distribution given parameter θ and data y, and π(θ|y) is the posterior distribution


of θ. When the value yf represents the true outcome of the data series known ex post, given the particular model

formulation estimated ex ante, the left hand side is known as the predictive likelihood of the given outcome value

yf [Geweke and Amisano (2010)].

In our specification, formula (2.22) can be applied to forecast a future path yT+1, . . . ,yT+h at date T . More-

over, we can introduce explicitly the stochastic volatility. We get:

p(yT+1, . . . ,yT+H|yT) =

∫. . .

∫p(yT+1, . . . ,yT+H,ΣT+1, . . . ,ΣT+H, θ,ΣT|yT)·

dθdΣT

H∏h=1

dΣT+h (2.23)

where again, yT = yT, . . . ,y0. By factorizing the joint density within the integral in (2.23), we get

p(yT+1, . . . ,yT+H|yT) =

∫. . .

∫p(yT+1, . . . ,yT+H|ΣT+H,yT, θ)p(ΣT+1, . . . ,ΣT+H|ΣT,yT, θ)·

p(ΣT, θ|yT)dθdΣT

H∏h=1

dΣT+h. (2.24)

We already possess draws from p(ΣT, θ|yT) = p(ΣT|θ,yT)p(θ|yT) by using the Gibbs sampler with aug-

mented parameter. Therefore, each time we draw an mth value from the Gibbs sampler for θ(m) and ΣT(m), we

can simultaneously draw sequences yT(m), . . . ,yT+H

(m) and ΣT(m), . . . ,ΣT+H

(m) given the parameter-

ization of the model and its implied conditional normality.

To summarize, we have the following steps:

1. Obtain a draw for θ(m) and ΣT(m) from the Gibbs sampler.

2. Conditioning on these values and the data, draw a covariance matrix ΣT+1(m) using the chosen volatility

specification, either Clark, or IWSV .

3. Draw yT+1(m) from the V AR(J) Gaussian level process.

4. Repeat again from step (1), drawing the next horizon, until we finish drawing for horizon T +H .

These steps above provide a draw from the joint conditional density

p(yT+1, . . . ,yT+H,ΣT+1, . . . ,ΣT+H, θ,ΣT|yT). (2.25)

These steps are repeated providing a sequence of draws for m sufficiently large, M0 +M > m ≥ M0, say, with

M0 and M large.


The sequence of draws can be used for approximating the joint predictive distribution, p(yT+1, . . . ,yT+H|yT+1),

or some of its moments. Let us focus on the point and interval forecasts.

For the purposes of forecasting, we wish to consider both the conditional mean and quantiles of the predictive

density

p(yT+1, . . . ,yT+H|yT) = p(yT+H|yT+H−1) . . . p(yT+1|yT). (2.26)

i) Point forecasts

For the short-term horizon a consistent estimator of the mean of E[yT+1|yT] is its sample counterpart

1

M

M0+M∑m=M0

yT+1(m), (2.27)

computed on the final iteration of the Gibbs sampler.

Moreover, by the Law of Iterated Expectations, we have:

E[yT+1|yT] = E[E[yT+1|ΣT,yT, θ]|yT]. (2.28)

Therefore, another consistent estimator of E[yT+1|yT] is

1

M

M0+M∑m=M0

E[yT+1|ΣT(m),yT, θ

(m)], (2.29)

as long as the conditional expectation E[yT+1|ΣT,yT, θ] has an analytical form.

Moreover, again by the Iterated Law of Expectations, the results above can be generalized to any horizon

h = 1, 2, . . . .,H since:

E[E[E[yT+h|yT+h−1, . . . ,yT+1,yT]|yT+h−2, . . . ,yT+1,yT] . . . ] = E[yT+h|yT]. (2.30)

ii) Interval forecasts

Interval forecasts at horizon h can be derived by estimating the quantiles of the predictive density p(yT+h|yT).

Let us look for a prediction interval with lower bound a α-quantile and upper bound the (1 − α)-quantile. This

forecast interval can be approximated as follows:

1. Rank in increasing order the yT+h(m), m = M0 + 1, . . . ,M .

2. The approximated confidence interval admits as lower bound the yT+h(m) at rank αM and an upper bound

as the yT+h(m) at rank (1− α)M .


2.6.2 Forecast comparison

i) Sample windows

The focus of this paper is to compare forecast performance. Forecast comparisons are made by employing a

limited subsample of the data for estimation purposes, thus leaving some latter part of the data available as the

“true” outcome of the macroeconomic series. Furthermore, we do not simply estimate the parameters of the model

once, rather we estimate the parameters a number of times in sequence, n = 1, . . . , N , which we call “sample

windows,” each sample window focusing on a different subsample of the entire data set.

As in Clark (2011) we employ both “recursive” and “rolling” schemes for these sample windows. Under each

scheme, sample windows are iterated upon: first, a subsample of the entire data set is isolated; we then use this

subsample to estimate the model parameters, forecasts are then generated, and comparisons are made according

to the chosen metrics discussed in this section. Both schemes employ the same subsample of the data for the

first sample window. However, the two schemes differ in how they deal with later sample windows after the first.

Under the recursive scheme, for each subsequent sample window, one future data point is appended to the end of

the subsample, and so the subsample grows larger as the sequence of sample windows progress. Under the rolling

scheme, the size of the subsample is fixed, and so the subsample shifts forward in time by one data point for each

iteration.

The following Figures 2.1 and Figures 2.2 illustrate both the recursive and rolling window schemes where

we have arbitrarily chosen a subsample size of 130 data points for the initial sample window. Notice how the time

index of the last value in the subsample, T , changes across the iterations of sample windows, n = 1, . . . , N , and

so we can denote them T (n) to show that they depend on the value of n.

ii) Comparing point forecasts

In comparing point forecasts we can employ the mean squared error (MSE) estimator for any forecast horizon

Figure 2.1: Subsample sequence by recursive window


Figure 2.2: Subsample sequence by rolling window

h = 1, . . . ,H . Forecast errors can be computed where y∗T+h is the “true” out-of-sample data point at forecast

horizon h and the point forecast E[yT+h|yT] is estimated as described above.

The mean squared error estimator is given as

MSEh,N =1

N

N∑n=1

uT(n)+huT(n)+h

′(2.31)

where the expected forecast error E[yT(n)+h|yT(n)] − y∗T(n)+h is computed for each iteration, n, and where

T (n) denotes the end of sample data point. In the case of the rolling window scheme, the set yT(n) denotes only

those data points in the rolling window and not the entire set of data starting from y1.

The off-diagonals of MSEh,N represent the squared forecast errors across macro series at horizon h, and the

main diagonal elements are the squared forecast errors for each individual series themselves.

iii) Model fit

The standard Bayesian tool for measuring the overall forecast performance is the predictive likelihood de-

scribed above in (2.22) [Geweke and Amisano (2010)]. The finite sample approximation, within the context of

our model, is given as:

p(y∗T(n)+h, . . . ,y∗T(n)+1|yT(n)) =

1

M

M0+M∑m=M0

p(y∗T(n)+h, . . . ,y

∗T(n)+1|ΣT(n)

(m),yT(n), θ(m))

≈ E[p(y∗T(n)+h, . . . ,y

∗T(n)+1|ΣT(n),yT(n), θ

)|yT(n)

]= p(y∗T(n)+h, . . . ,y

∗T(n)+1|yT(n)), (2.32)

by LLN, where y∗T(n)+h is the true out of sample data point at horizon h. This estimator gives us an idea of how

well the model and parameter estimates “fit” the true out-of-sample data. That is, given that

y∗T(n)+h, . . . ,y∗T(n)+1

are the actual values observed ex-post, what is the probability of their occurrence under our model and ex-ante


estimated parameters, given the subsample window at iteration n. If one model is more congruent with the actual

future outcomes than another, its predictive likelihood should be greater.

When the entire sample data set is known ex-post, we can interpret the sequence of one-step ahead predictive

densities as the marginal likelihood:

p(yT

)=

T∗−1∏t=1

p(y∗t+1|yt

). (2.33)

However, we would like to generalize (2.33) to account for our iterative sample window schemes discussed above.

Instead of assuming that the entire data set is known, rather we will assume that under each sample window, n,

only the data yT(n) is known ex-ante when we estimate the model parameters that define pn(·). Then, keeping

in mind the ex-post predictive likelihood as in (2.32), we can take the product across the N sample windows to

obtain a measure of ex-post fit at the first horizon h = 1:

N∏n=1

pn

(y∗T(n)+1|yT(n)

). (2.34)

Under the recursive window scheme, the expression (2.34) is the analog to (2.33), but where we take the product

across only a subset of the data, t = T (1), . . . , T (N), and where we employ the changing estimated predictive

distribution pn(·).

However, under the rolling sample window scheme, it is not clear how to interpret (2.34). Moreover, taking

the product across sample windows under changing densities and window schemes forces us to reinterpret (2.34)

not as a density, but purely as a product of forecast metrics. Therefore, taking logs we can transform the product

(2.34) into a sum and interpret this metric as an average across dependent forecast attempts. Moreover, we can

also consider robustness of the forecast attempts via sample moments such as the variance across sample window

forecasts. For example, we have the mean of the log-predictive likelihoods as:

MLPLh=1,N =1

N

N∑n=1

lnpn

(y∗T(n)+1|yT(n)

), (2.35)

where each term in the sum will hereon be denoted as LPLh=1,n, and the variance is given as,

V LPLh=1,N =1

N

N∑n=1

(lnpn

(y∗T(n)+1|yT(n)

)−MLPLh=1,N

)2

. (2.36)

Finally, we generalize the above two sample moments to any horizon by replacing the predictive likelihoods with

the ex-post one-step ahead predictive distribution at horizon h:

pn

(y∗T(n)+h|y

∗T(n)+(h−1), . . . ,y

∗T(n)+1,yT(n)

). (2.37)


That is, the last term in the factorization of the joint predictive likelihood given in (2.32).

We can compare the Clark and IWSV specifications by looking at how these sample moments differ across

forecast horizons. For example, we could compare the difference between the two model’s MLPLh,N ’s across

horizons h = 1, . . . ,H as a means of comparing the “term structure” of competing forecast performance across

the horizons.

Furthermore, this difference can be “decomposed” into a sum of N log-ratios which can be compared across

sample windows n = 1, . . . , N to suggest at which sample window, n, either model did better, or worse, at

forecasting given some fixed horizon h. This difference, given, say, models A ≡ IWSV and B ≡ Clark is

defined, for example given h = 1, as

N(MLPLAh=1,N −MLPLBh=1,N

)=

N∑n=1

ln

ˆpA,n

(y∗T(n)+1|yT(n)

)−

N∑n=1

ln

ˆpB,n

(y∗T(n)+1|yT(n)

)

=

N∑n=1

ln

ˆpA,n

(y∗T(n)+1|yT(n)

)ˆpB,n

(y∗T(n)+1|yT(n)

) . (2.38)

after post multiplication by N . In fact, each log-ratio term in the sum can be interpreted as the ex-post predictive

Bayes factor in favour of model A over model B, at sample window n.

Bayes factors are the standard Bayesian method of model comparison. The predictive likelihood method

represents an inherently Bayesian approach to forecast comparison and as such we do not require p-values, since

we obtain the finite sample distribution directly. For more details on Bayesian versus frequentist approaches to

forecast generation and analysis see Geweke and Amisano (2010).

2.7 Applications

This section will now discuss applications. The first subsection will evaluate the implementation of the Bayesian

estimation methodology via simulated data. The second subsection applies the Clark and IWSV VAR volatility

specifications to the macroeconomic data set. Specifically, we will endeavour to compare the two models along

a number of dimensions, including: point and interval forecast accuracy, posterior trend estimation, VAR rate of

decay, volatility process behaviour, and estimation of the other model parameters.

2.7.1 Monte-Carlo Analysis

We first perform a Monte-Carlo analysis to provide some insight on the implementation of the Bayesian method-

ology. We first generate an artificial data set, following a IWSV model. This dataset is used within a rolling

sample window scheme to iterate a sequence of estimations of both the Clark and IWSV specifications. The


experiment will provide information on the convergence and accuracy of the IWSV based estimation of the pa-

rameters. It will also be used to detect the misspecification of Clark’s model. Moreover, we will compare forecasts

from the misspecified model according to the metrics discussed in Section 2.6.2.

i) The Data Generating Process

We simulate 370 data points according to the IWSV model. The selected orders are 3 for the VAR component,

and 3 for the Inverse Wishart component. The parameter values are:

C = 0.3Ip, (2.39a)

A1 =

0.5 0 0 0

0 0.75 0 0

0 0 0.85 0

0 0 0 0.98

, A2 = A3 = 0, (2.39b)

Π1 =

0.25 0 0 0

0 0.8 0 0

0 0 0.8 0

0 0 0 0.8

, Π2 = Π3 = 0, (2.39c)

Ψ =

[3.0 0 2.5 0

]′, dt = 1,∀t, so no trend, and (2.39d)

ν = 30. (2.39e)

This model implies the following relationships among the series variables, yi,t, i = 1, . . . , 4:

y1,t = 3.0 + 0.25 (y1,t−1 − 3.0) + v1,t, (2.40a)

y2,t = 0.8y2,t−1 + v2,t, (2.40b)

y3,t = 2.5 + 0.8 (y3,t−1 − 2.5) + v3,t, (2.40c)

and y4,t = 0.8y4,t−1 + v4,t. (2.40d)

The following relationship on the conditional scale matrix, St−1, of the Inverse Wishart covariance matrices Σt,


is given as:

St−1 =

0.09 + 0.25σ211,t−1 0.375σ12,t−1 0.425σ13,t−1 0.49σ14,t−1

0.375σ21,t−1 0.09 + 0.5625σ222,t−1 0.6375σ23,t−1 0.735σ24,t−1

0.425σ31,t−1 0.6375σ32,t−1 0.09 + 0.7225σ233,t−1 0.833σ34,t−1

0.49σ41,t−1 0.735σ42,t−1 0.833σ43,t−1 0.09 + 0.96σ244,t−1

(30−4−1),

(2.41)

where the scale matrix is given as in equation (2.7c) above, and the σij,t−1 are the elements of Σt−1. Moreover,

from equation (2.9a), the conditional mean of the stochastic covariance matrix Σt is given as St−1/(30−4−1) =

CC′+ A1Σt−1A1

′. Finally, by applying Proposition 2.3.1, the unconditional means of the stochastic volatilies

are given as:

σ211 = 0.12, σ2

22 = 0.21, σ233 = 0.32, and σ2

44 = 2.25, (2.42)

where the stochastic covolatilities have zero unconditional mean.

From the proof of Proposition 2.3.1 the IWSV model can be written as:

Σt = CC′+ A1Σt−1A1

′+ Zt, (2.43)

where Zt is a zero mean matrix of weak white noises. Furthermore, this expression can be vectorized and rewritten

in terms of the autoregressive coefficient matrix Υ = L (A1 ⊗A1) D as:

vech(Σt) = vech(CC′) + Υvech(Σt−1) + vech(Zt). (2.44)

The persistence in the (co)volatility series are therefore measured by looking at the eigenvalues of the 10× 10

dimensional Υ matrix that determines the rate of reversion to the unconditional mean, or response, given a unit

impulse shock, Zτ , to the (co)volatilies, at time τ . Since, the matrix Υ is diagonal, we can easily solve for the

eigenvalues as:

0.250, 0.375, 0.425, 0.490, 0.563, 0.638, 0.723, 0.735, 0.833, 0.960 . (2.45)

The largest eigenvalue is close to a unit root, and so it will influence σ244,t to exhibit the slowest rate of autore-

gressive decay.

The following figures describe the simulated data set. Figure 2.3, provides the sample paths of the simulated

data series yt. Figure 2.4, provides a plot of the simulated stochastic volatilities σ2ii,t, i = 1, . . . , 4, that is,

the diagonal elements of the matrix Σt. Finally, Figure 2.5, provides the i, jth stochastic correlations ρij,t =

σij,t√σ2ii,t

√σ2jj,t

.


These artificial series reveal a number of features. For example, we can see that the values chosen for Ψ

influence the mean of the series yt in Figure 2.3. Moreover, since the shocks, vi,t, are driven by their conditional

volatilities, σ2ii,t, their conditional meanEt−1[σ2

ii,t] = 0.09+αiiσ2ii,t−1 plays a role in determining the magnitude

of the volatility of yi,t. Indeed, series yi,t, associated with stochastic volatilities with larger unconditional means,

0.09/(1− αii), tend to exhibit larger overall episodes of volatility spikes.

For example, from Figure 2.3, we see that the 4th series exhibits the largest volatility episodes as it is asso-

ciated with Et−1[σ244,t] = 0.09 + 0.96σ2

44,t−1, and the largest unconditional mean σ244 = 2.25. Moreover, by

examining Figure 2.3, we see that the IWSV process exhibits volatility clustering for the 4th series, y4,t. For

example, volatility between periods 1 to 100 is much smaller than between periods 200 to 300, and volatility

episodes tend to persist across time. Furthermore, these episodes tend to coincide with larger increases in the

volatility process σ244,t given in Figure 2.4, although this relationship is convoluted since it also involves the au-

toregressive behaviour of the equations given in (2.40). Finally, the stochastic correlations seen in Figure 2.5 tend

to grow larger in magnitude as the α2ij of the corresponding i, jth expected variance components increase, since it

represents a multiplicative constant in the conditional variance of the (co)volatility Vt−1[σij,t]. For example, the

ρij,t’s associated with the 3rd and 4th series are more variable than those associated with the 1st and 2nd series.


Figu

re2.

3:Si

mul

ated

sam

ple

path

sy

t

-6-4-2 0 2 4 6

110

020

030

0

y t,1 ψ1

-6-4-2 0 2 4 6

110

020

030

0

y t,2 ψ2

-6-4-2 0 2 4 6

110

020

030

0

y t,3 ψ3

-6-4-2 0 2 4 6

110

020

030

0

y t,4 ψ4


Figu

re2.

4:Si

mul

ated

stoc

hast

icvo

latil

ities

0

0.5 1

1.5 2

2.5 3

3.5

110

020

030

0

σ2

11,t

σ2

22,t

σ2

33,t

σ2

44,t


Figure 2.5: Simulated stochastic correlations

-0.8-0.6-0.4-0.2

0 0.2 0.4 0.6 0.8

1 100 200 300

ρ21,t

-0.8-0.6-0.4-0.2

0 0.2 0.4 0.6 0.8

1 100 200 300

ρ31,t

-0.8-0.6-0.4-0.2

0 0.2 0.4 0.6 0.8

1 100 200 300

ρ41,t

-0.8-0.6-0.4-0.2

0 0.2 0.4 0.6 0.8

1 100 200 300

ρ32,t

-0.8-0.6-0.4-0.2

0 0.2 0.4 0.6 0.8

1 100 200 300

ρ42,t

-0.8-0.6-0.4-0.2

0 0.2 0.4 0.6 0.8

1 100 200 300

ρ43,t

ii) Estimation

Given this simulated data, we estimate both the Clark and IWSV volatility specifications, according to the

rolling window scheme, described Figure 2.2, with fixed window size of 260 and 100 sample windows. The prior

densities for the Gibbs sampling are set with means equal to the true values and variances set as in Section 2.4.

The estimated parameters turn out to be quite close to their true values–see Tables 2.3.i to 2.4.ii in Appendix

2.12, which describe the posterior distribution of the parameters for both models under the 1st sample window, as

well as the distribution of the posterior means across allN = 100 further iterations. As expected, given the nature


of the simulated data, there is less variation in the distribution of the posterior means across sample windows than

in the posterior density itself, given the 1st sample window. In other words, there is little change in the posterior

means of the model parameters across sample windows.

The only parameter that seems systematically biased is the degree of freedom parameter of the IWSV pro-

cess, ν. Figure 2.6 plots the true value of ν = 30 against the posterior mean and 95% credibility region across

the N = 100 sample windows. As can be seen, its estimate is lower than the true value. However, as expected,

for larger simulated sample sizes the estimated value converges to the true value.

From Table 2.4.i, notice that the posterior means for the VAR parameters of theClark model are very close to

those of the IWSV . For theClark volatility parameters, the true values are omitted since the data were generated

using the IWSV .

Interestingly, under the IWSV process, the tracking of the latent stochastic volatility estimated posterior

means tends to be worse when tracking series associated with the smaller eigenvalues of Υ–see Figures 2.7.i

and 2.7.ii below. It appears as though there exists a lower bound on the tracking of the posterior mean estimates

across time for the 1st volatility series—see Figure 2.7.i. Multiple tests have revealed that the smaller the value

chosen for the associated diagonal element of A1, the more pronounced is this lower bound, and conversely, the

closer is the diagonal element of A1 to 1, the better the posterior mean is able to account for variation in the true

volatility sample path. More generally, we assume that the tracking improves as the eigenvalues of the stability

matrix Υ =∑Kk=1 Ξk approach 1. More investigation is needed, however, to establish definitively the theoretical

properties of this phenomenon.

Figure 2.6: IWSV, Posterior of ν across N = 100 sample windows

10

15

20

25

30

10 20 30 40 50 60 70 80 90 100

ν

Time

True valuePosterior mean

2.5% C.I.97.5% C.I.


Figure 2.7.i: IWSV, filtered latent volatility for 1st series, 1st sample window

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

50 100 150 200 250

Var

ianc

e

Time

Posterior mean2.5% C.I.

97.5% C.I.True value

Figure 2.7.ii: IWSV, filtered latent volatility for 4th series, 1st sample window

0

1

2

3

4

5

6

50 100 150 200 250

Var

ianc

e

Time

Posterior mean2.5% C.I.

97.5% C.I.True value

iii) Forecasts

In comparing the overall forecast performance, we appeal to the mean of the log predictive likelihoods,

MLPLh,N , term structure, across horizons h = 1, . . . , 10. Notice that the IWSV model fits the out-of-sample


data much better than the Clark, especially at large horizons–see Figure 2.8. At large horizons, such as h = 10,

the MLPLh,N metric is much larger for the IWSV model, which suggests that this model fits the out-of-sample

data better as the horizon increases.

Of course, we can also consider the term structure of the model specific log predictive likelihoods, LPLh,n,

across the individual sample windows. In this case we can interpret the log-ratios at each sample window, n, as

predictive Bayes factors (recall equation (2.38) from Section 2.6.2.iii). Figure 2.9 plots these Bayes factors across

sample windows where model A ≡ IWSV and model B ≡ Clark. Larger values suggest that the IWSV is

more representative of the out-of-sample outcomes than the Clark at each sample window n = 1, . . . , N .

Figure 2.9 also suggests that there is a variability in the forecasting performance according to the LPLh,n

metric. This Figure 2.9 provides the term structure of model performance, according to the LPLh,n metric,

across the sample windows n = 1, . . . , N , where as usual A ≡ IWSV and B ≡ Clark. The variability in the

LPLh,n forecast metrics for each model, can be measured by considering their sample moments across sample

windows, given in Table 2.1. Interestingly, while the IWSV fares better in terms of the sample mean of LPLh,n

metrics across sample windows, the Clark model metrics have the advantage of being less variable, skewed, and

leptokurtic. Figure 2.10 provides histograms of these distributions across sample windows, at the 10th horizon.

Figure 2.8: Simulated data, Forecast horizon term structure according to MLPLh,N metric, N = 100 samplewindows

-4.5

-4.4

-4.3

-4.2

-4.1

-4

-3.9

-3.8

-3.7

-3.6

-3.5

1 2 3 4 5 6 7 8 9 10

ML

PLh,

N=1

00

Horizon 'h'

IWSVClark


Figure 2.9: Simulated data, Sample window term structure according to difference of LPLh,n’s metric, h = 10

-2.5

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

2.5

0 10 20 30 40 50 60 70 80 90 100

sample window 'n'

LPLA,h=10,n-LPLB,h=10,n

Table 2.1: Simulated data, Sample moments of the LPLh,n metrics across N = 100 sample windowsHorizon h = 1 Horizon h = 10IWSV Clark IWSV Clark

mean -3.6816 -3.8461 -3.7208 -4.2147stnd. dev. 2.1310 1.9748 2.0290 1.9748skewness -1.005 -0.6926 -0.9314 -0.5453kurtosis 3.834 3.1393 3.5944 3.1594

Figure 2.10: Histograms of the LPLh=10,n metrics across n = 1, . . . , N sample windows, 10th horizon


2.7.2 Real data

i) The Clark (2011) data set

Let us now turn to the real-world Clark (2011) macroeconomic data set. In this case we do not know the true

model and so we will attempt to choose between the two misspecified models, i.e. the Clark and IWSV models,

according to out-of-sample forecast performance.

We first consider plots of the Clark (2011) data series provided in figures Figures 2.11 and 2.12. Figure

2.11 provides a plot of the raw data series, along with exponentially smoothed trends, as described in Section 2.2

above. Figure 2.12 presents the data series after detrending by the associated smoothed trend. Recall that the

trends applied are those from Clark (2011) in order to replicate the results from that paper, and not necessarily

because we believe these trends to provide the best fit.

ii) Estimation

We estimate both volatility models on an initial subsample data size of 130, across 100 sample windows. Both

rolling and recursive window schemes are estimated. Various orders of both the VAR and IWSV specification

were tested and three lags were ultimately chosen as a balance between parameterization and improvement in

model fit. The Gibbs sampling steps in Appendix 2.11 are performed for both models with a draw size of M =

100, 000, and a burn-in of M0 = 10, 000, and the priors are set as described in Section 2.4.

Summary statistics of the posterior distributions of the parameters are given in Tables 2.5.i to 2.6.iii, in Ap-

pendix 2.12. Tables 2.5.i and 2.6.i provide the posterior means and 95% credibility intervals for both the IWSV

and Clark model parameters, respectively, given the 1st sample window of a recursive window scheme. Tables

2.5.ii and 2.6.ii provide the sample means and 95% confidence intervals for the distribution of the posterior means

of both the IWSV and Clark model parameters, across the N = 100 sample windows, again given a recursive

window scheme. Finally, Tables 2.5.iii and 2.6.iii are the analogs to the previous described tables, except we

instead employ a rolling sampling window scheme.

These tables reveal a number of interesting features. First, irrespective of sample window size or scheme, the

posterior means of many parameters deviate from our assumptions on the prior means, suggesting that the data

are informative. For example, within the context of the IWSV model, the elements of the main diagonal of the

C matrix deviate from the assumption of 0.3, although the off-diagonal elements tend to stay close to zero.


Figu

re2.

11:C

lark

(201

1)m

acro

econ

omic

data

set,

seri

esan

dsm

ooth

edtr

ends

-15

-10-5 0 5 10

15

20

1948

-04

1962

-07

1976

-10

1991

-01

2005

-04

GD

P gr

owth

-4-2 0 2 4 6 8 10

12

14

1948

-04

1962

-07

1976

-10

1991

-01

2005

-04

Infl

atio

n ra

teIn

flat

ion

tren

d

0 2 4 6 8 10

12

14

16

1948

-04

1962

-07

1976

-10

1991

-01

2005

-04

Inte

rest

rate

Infl

atio

n tr

end

2 3 4 5 6 7 8 9 10

11

1948

-04

1962

-07

1976

-10

1991

-01

2005

-04

Une

mpl

oym

ent r

ate

Une

mpl

oym

ent t

rend


Figu

re2.

12:C

lark

(201

1)m

acro

econ

omic

data

set,

detr

ende

dse

ries

-15

-10-5 0 5 10

15

20

1948

-04

1962

-07

1976

-10

1991

-01

2005

-04

GD

P gr

owth

-8-6-4-2 0 2 4 6 8 10

1948

-04

1962

-07

1976

-10

1991

-01

2005

-04

Infl

atio

n ra

te

-6-4-2 0 2 4 6 8 1948

-04

1962

-07

1976

-10

1991

-01

2005

-04

Inte

rest

rate

-2-1 0 1 2 3 4 5 1948

-04

1962

-07

1976

-10

1991

-01

2005

-04

Une

mpl

oym

ent r

ate


Interestingly, the degree of freedom parameter ν exhibits posterior means in a range between approximately 15 to

19, higher than our assumption of 15 on the prior. As for the Clark model, the distribution of the posterior means

for the elements of the B matrix, suggests that we can reject the prior mean assumption that these elements are

zero. Within the context of both models, the 1st element, ψ1, of Ψ, associated with GDP growth, exhibits posterior

means slightly above the assumption of 3 and the 3rd value, ψ3, associated with the interest rate, exhibits posterior

means slightly below the assumption of 2.5. Moreover, ψ2 and ψ4 both exhibit the possibility of non-zero means,

despite our assumptions.

Moreover, there is a surprising level of consistency in the evolution of the posterior means across sample

windows and window types. For example, let us consider Figures 2.23.i to 2.24.ii, in Appendix 2.12, which plot

the posterior distributions of various model parameters from both the IWSV and Clark volatility specifications,

across the sample windows for both the recursive and rolling window schemes. The posterior distributions in-

volving the rolling sample windows are much more variable than those that employ the recursive sample window

scheme.

We can also check the stability properties of both the VAR and IWSV volatility process across the sample

windows. Figures 2.13.i and 2.13.ii plot the absolute value of the largest eigenvalues from both the VAR(3)

companion matrix given as Π1 Π2 Π3

I4 0 0

0 I4 0

(2.46)

and the Υ =∑3k=1 L(Ak ⊗ Ak)D matrix, which determines the IWSV stability (recall equation (2.44) and

Proposition 2.3.1), across the N = 100 runs. We employ the posterior mean of the relevant parameters in con-

structing these matrices and their associated eigenvalues. The VAR(3) processes are generally stable. However, it

appears that there may exist a unit root in the IWSV volatility process.

Finally, Figure 2.14 plots the posterior mean of the latent stochastic volatilities for both the IWSV andClark

models, for the complete sample, given augmented parameters filtered given a recursive sample window at the

N = 100th iteration. Moreover, Figure 2.15.i and Figure 2.15.ii plot the associated stochastic correlations for

both the IWSV and Clark models respectively. The Clark model exhibits stochastic correlations which are too

smooth, and, as expected, both models exhibit a negative stochastic correlation, ρ41,t, between shocks to GDP

growth and the unemployment rate.


Figure 2.13.i: Largest eigenvalue of VAR(3) companion matrix, across N = 100 sample windows

0.88

0.89

0.9

0.91

0.92

0.93

0.94

0.95

0.96

0.97

0 10 20 30 40 50 60 70 80 90 100

IWSV volatility - recursive windowIWSV volatility - rolling window

0.88

0.9

0.92

0.94

0.96

0.98

1

0 10 20 30 40 50 60 70 80 90 100

Clark volatility - recursive windowClark volatility - rolling window

Figure 2.13.ii: Largest eigenvalue of the Υ matrix, across N = 100 sample windows

0.97

0.975

0.98

0.985

0.99

0.995

1

1.005

1.01

1.015

1.02

0 10 20 30 40 50 60 70 80 90 100

Recursive window

0.955

0.96

0.965

0.97

0.975

0.98

0.985

0.99

0.995

1

1.005

0 10 20 30 40 50 60 70 80 90 100

Rolling window


Figu

re2.

14:I

WSV

and

Cla

rk,fi

ltere

dla

tent

stoc

hast

icvo

latil

ies,

100t

hsa

mpl

ew

indo

w,R

ecur

sive

win

dow

0 10

20

30

40

50

60

70

1948

-10

1963

-01

1977

-04

1991

-07

2005

-10

σ2

11,t,

IWSV

γ2

11,t,

Cla

rk

0 5 10

15

20

25

30

35

1948

-10

1963

-01

1977

-04

1991

-07

2005

-10

σ2

22,t,

IWSV

γ2

22,t,

Cla

rk

0 1 2 3 4 5 6 7 8 1948

-10

1963

-01

1977

-04

1991

-07

2005

-10

σ2

33,t,

IWSV

γ2

33,t,

Cla

rk

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7 19

48-1

019

63-0

119

77-0

419

91-0

720

05-1

0

σ2

44,t,

IWSV

γ2

44,t,

Cla

rk


Figure 2.15.i: Real data, IWSV model, filtered latent stochastic correlations for the complete sample, n = 100

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ21,t

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ31,t

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ41,t

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ32,t

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ42,t

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ43,t

iii) Forecasts

We would now like to establish the forecast properties of the VAR associated with both the IWSV and Clark

volatility models. We first compare the MSEh,N metrics which establish point forecast accuracy across the term

structure of forecast horizons, h = 1, . . . , 20. Figure 2.16 provides plots of the percentage difference in the main

diagonal elements of theMSEh,N matrix, across h, for both the recursive sample window scheme and the rolling

sample window scheme, respectively.

From these plots it is not clear that either model performs better in terms of point forecast performance. This


Figure 2.15.ii: Real data, Clark model, filtered latent stochastic correlations for the complete sample, n = 100

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ21,t

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ31,t

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ41,t

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ32,t

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ42,t

-1

-0.5

0

0.5

1

1948-10 1963-01 1977-04 1991-07 2005-10

ρ43,t


is to be expected of course, since our IWSV modification to the model is an alternative specification on the

volatility of the process, not the mean.

Figure 2.16: Real data, MSE comparison of VAR forecasts, % difference, both window types (below 0, IWSV

better)

-20-15-10-5 0 5

10 15 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Horizon 'h'

Recursive window

GDP growthInflation rateInterest rate

Unemployment rate

-20-15-10-5 0 5

10 15 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Horizon 'h'

Rolling window

GDP growthInflation rate

Interest rateUnemployment rate

Turning now to comparing overall forecast performance, we again appeal to the mean of the log predictive

likelihoods, MLPLh,N , term structure, across horizons h = 1, . . . , 20. This term structure should illustrate

improved out-of-sample predictive likelihood, given the alternative specification on the second moment of the

VAR process shocks.

From Figure 2.17.i we see how the mean of these LPLh,n measures is improved as the forecast horizon ex-

tends out toH . This result is irrespective of sample window scheme, although the recursive window, which grows

in size across the iterations, tends to fare slightly better than the rolling window at further horizons. Interestingly,

the Clark model does better according to this metric at the 1st horizon, across both sample window types. It is


Figure 2.17.i: Real data, Forecast horizon term structure according to MLPLh,N metric, N = 100 samplewindows, both Recursive and Rolling sample windows

-4.6

-4.4

-4.2

-4

-3.8

-3.6

-3.4

2 4 6 8 10 12 14 16 18 20

ML

PLh,

N=1

00

Horizon 'h'

IWSV - RecursiveClark - Recursive

IWSV - RollingClark - Rolling

Figure 2.17.ii: Real data, Forecast horizon term structure according to MLPLh,N metric, N = 100 samplewindows, includes homoskedastic vt

-7

-6.5

-6

-5.5

-5

-4.5

-4

-3.5

-3

2 4 6 8 10 12 14 16 18 20

ML

PLh,

N=1

00

Horizon 'h'

IWSV - RecursiveClark - Recursive

Homoskedastic


not clear why this is the case. For reference, Figure 2.17.ii duplicates the previous figure, but includes the case of

homoskedastic VAR innovations, vt, for reference.19

Again, we can also consider the term structure of the model specific log predictive likelihoods, LPLh,n,

across the individual sample windows. Figure 2.18 plots the difference in the LPLh=20,n metrics at horizon,

h = 20, across sample windows, where model A ≡ IWSV and model B ≡ Clark. Larger values suggest

that the IWSV is more representative of the out-of-sample outcomes than the Clark at each sample window

n = 1, . . . , N .

Figure 2.18: Real data, Sample window structure of the difference of LPLh,n’s metric, h = 20

-4-3-2-1 0 1 2 3 4

0 10 20 30 40 50 60 70 80 90 100

sample window 'n'

Recursive window


-4-3-2-1 0 1 2 3 4

0 10 20 30 40 50 60 70 80 90 100

sample window 'n'

Rolling window


Moreover, these LPLh,n metrics are changing across the sample windows according to their own sampling

distribution, given in Table 2.2. Again, we see similar results as in the simulated data case above in Section

19Estimating the VAR model with homoskedastic shocks vt is accomplished by replacing the Gibbs sampler steps for the volatility parame-ters with a single Gibbs step that employs an Inverse Wishart prior density. Since the Inverse Wishart prior is conditionally conjugate with themultivariate Normal, the conditional posterior is also Inverse Wishart. That is, if π (Σ) ∼ IW (a0,V0) ⇒ p (Σ | v) ∼ IW (a1,V1)

such that V1 =∑Tt=1 vtvt

′+ V0 and a1 = T +a0, where vt is the p×1 vector of VAR residuals. V0 is set to the unconditional sample

covariance matrix of simulated VAR residuals (generated with reasonable guesses on the VAR parameters) and a0 = 15.


2.7.1.iii. For the 20th forecast horizon, h = H = 20, we find that the mean of the log predictive likelihoods,

MLPLh,N=100, are larger under the IWSV , which initially suggests that the IWSV improves out of sample

fit at the longer horizons. However, for the 1st horizon, we see the opposite, that is, the Clark has a larger

MLPLh=1,N=100 metric associated with it. As for the the other sample moments, generally they suggest larger

deviations in forecast performance across the sample windows when using the IWSV model. The larger kurtosis

and negative skew imply that the IWSV is more sensitive to rare occurrences of poor forecasting fit, this sensi-

tivity increases as the horizon extends out, and it is worse under the rolling sample window scheme. Interestingly,

with the exception of the recursive sample window scheme at horizon, h = 1, the Clark model now exhibits

a larger standard deviation of LPLh,n, metrics, than the IWSV . Finally, to get an idea of the shape of these

distributions, Figure 2.19 provides histograms of the LPLh,n, metrics across the sample windows.

Figure 2.20 provides the analog to Figure 2.18, in scatter plot format. That is, it presents the LPLh=20,n

values, for each sample window n = 1, . . . , 100, at the largest horizon, h = 20. The shape of the scatter gives

us an intuition on how each model performs across the sample windows which complements the previous Figure

2.20 which presented the sample window LPLh=20,n differences in chronological order.

Finally, Figures 2.21.i to 2.21.iv plot the out-of-sample forecasts of the VAR series, yt, out H = 20 periods,

given the last recursive sample window n = 100 (this subsample includes nearly the entire data set). While

the forecasted conditional means are quite similar between the IWSV and Clark models, as expected, the

prediction intervals are distinctly shaped, reflecting the different underlying stochastic volatility processes. The

Clark prediction intervals tend to “bell” out and expand as the horizon grows large, while the IWSV prediction

intervals tend to stabilize. Interestingly, in referencing Figure 2.18 for this particular sample window, n = 100,

we see that this represented a subsample where the Clark model performed better. Clearly, in this case the 95%

prediction intervals encompass the true data outcome more accurately than the IWSV prediction intervals.

Table 2.2: Real data, Sample moments of the LPLh,n metrics across N = 100 sample windowsRecursive sample window

Horizon h = 1 Horizon h = 20IWSV Clark IWSV Clark


Rolling sample windowHorizon h = 1 Horizon h = 20IWSV Clark IWSV Clark



Figure 2.19: Histograms of the LPLh=20,n metrics across n = 1, . . . , N sample windows, 20th horizon


Figu

re2.

20:R

eald

ata,

Sam

ple

win

dow

stru

ctur

eof

theLPLh,n

’sm

etri

c,h

=20

-14

-12

-10-8-6-4-2 0 2 -1

4-1

2-1

0-8

-6-4

-2 0

2

Clark

IWSV

Rec

ursi

ve w

indo

w

LPL

h=20

,n-1

4

-12

-10-8-6-4-2 0 2 -1

4-1

2-1

0-8

-6-4

-2 0

2

Clark

IWSV

Rol

ling

win

dow

LPL

h=20

,n


Figure 2.21.i: GDP growth series y1,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window

-15

-10

-5

0

5

10

15

20

2001-04 2002-03 2003-02 2004-02 2005-01 2006-01 2006-12 2007-12 2008-11 2009-10 2010-10

IWSV

GDP growth2.5% C.I.

97.5% C.I.True outcome

Estimated trend

-15

-10

-5

0

5

10

15

20

2001-04 2002-03 2003-02 2004-02 2005-01 2006-01 2006-12 2007-12 2008-11 2009-10 2010-10

Clark

GDP growth2.5% C.I.


Estimated trend


Figure 2.21.ii: Inflation growth series y2,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window

-10

-5

0

5

10

2001-04 2002-03 2003-02 2004-02 2005-01 2006-01 2006-12 2007-12 2008-11 2009-10 2010-10

IWSV

Inflation growth2.5% C.I.


Estimated trend

-10

-5

0

5

10

2001-04 2002-03 2003-02 2004-02 2005-01 2006-01 2006-12 2007-12 2008-11 2009-10 2010-10

Clark

Inflation growth2.5% C.I.


Estimated trend


Figure 2.21.iii: Interest rate series y3,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window

-10

-5

0

5

10

15

2001-04 2002-03 2003-02 2004-02 2005-01 2006-01 2006-12 2007-12 2008-11 2009-10 2010-10

IWSV

Interest rate2.5% C.I.


Estimated trend

-10

-5

0

5

10

15

2001-04 2002-03 2003-02 2004-02 2005-01 2006-01 2006-12 2007-12 2008-11 2009-10 2010-10

Clark

Interest rate2.5% C.I.


Estimated trend


Figure 2.21.iv: Unemployment rate series y4,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window

-4

-2

0

2

4

2001-04 2002-03 2003-02 2004-02 2005-01 2006-01 2006-12 2007-12 2008-11 2009-10 2010-10

IWSV

Unemployment rate2.5% C.I.


Estimated trend

-4

-2

0

2

4

2001-04 2002-03 2003-02 2004-02 2005-01 2006-01 2006-12 2007-12 2008-11 2009-10 2010-10

Clark

Unemployment rate2.5% C.I.


Estimated trend


2.8 Conclusion

Dramatic changes in macroeconomic time series volatility have posed a challenge to contemporary VAR fore-

casting models. Traditionally, the conditional volatility of these models had been assumed constant over time or

allowed for structural breaks across long time periods. More recent work, however, has improved forecasts by

allowing the conditional volatility to be completely time variant by specifying the VAR innovation variance as a

distinct discrete time process. For example, Clark (2011) specified the elements of the covariance matrix process

of the VAR innovations as linear functions of independent nonstationary processes.

However, it is not clear that the choice of nonstationary driving processes is suitable. Moreover, in order to

reduce parameterization, some form of fixed relationship is imposed between the elements of the VAR innovation

covariance matrix and the independent processes driving them.

Ultimately, we would like to have an empirical rationale for this choice of specification. Given this, we have

proposed and tested both the Clark (2011) benchmark model, and the alternative multivariate volatility process,

IWSV , which is constructed in such a way to directly model the time varying covariance matrices by means of

the Inverse Wishart distribution. These models have been estimated, and forecasts been constructed, on a data set

as close to Clark (2011) as possible.

Motivating this study are also a number of theoretical advantages of the proposed IWSV specification. For

one, the direct specification of the dynamics of the latent stochastic volatility process, Σt, precludes the need to

specify convoluted relationships between the (co)volatility elements of Γt, γij,t, and the driving processes λi,t,

through the B matrix. Moreover, the autoregressive dynamics between volatility series are more easily interpreted

as volatility spill-over effects, since we no longer need to disentangle these relationships. The model is now also

invariant to permutation of the order of the observed series. Finally, it is easy to derive conditions ensuring the

existence of the unconditional mean of the processes (Σt) and (yt).

In applying both models to the data, we have chosen to evaluate their performance strictly in terms of fore-

casting ability. In doing so, we have chosen to evaluate both point and interval forecasts. Point forecasts consider

the mean squared error (MSE) of out-of-sample forecasts, while interval forecasts considered the Bayesian log

predictive likelihood measure, LPLh,n, along a number of forecast horizons. Moreover, we have computed these

metrics a number of times, across sample windows, which represent subsamples of the entire data set.

Estimating both models provides a number of interesting results. First, posterior means of the parameters

are much more stable across the sample windows, given a recursive window that grows larger, than a rolling

window of fixed size. Moreover, irrespective of sample window size or scheme, the posterior means of many

parameters deviate from our assumptions on the prior means, suggesting that the data are informative, despite the

small sample size.


Interestingly, the stationarity of the multivariate volatility process driving the VAR innovations may be ques-

tionable, as the IWSV model estimates seem to suggest the possibility of at least one unit root. We also find that

the filtered latent (co)volatilities of the Clark model are much too smooth and suggest too little variation across

time.

Forecasting performance is also mixed. For example, the MSE out-of-sample forecasts suggest that neither

model exhibits a strong advantage. Turning to interval forecasts, we consider the distribution of the log predictive

likelihood, LPLh,n, measures for both models. While the IWSV exhibits a strong advantage in the sense

that the mean of these measures are substantially improved, this model suffers from the fact that the measures

are more skewed in the negative direction, and prone to rare occurrences of dramatically poor forecast fit, with

this sensitivity becoming worse as the forecast horizon grows large. Finally, as expected, given the dramatic

nonstationarity of the Clark volatility process, its prediction intervals tend to grow exponentially.

Ultimately, we must emphasize that our methodology does not presume that one of the competing models is

well specified, instead we insist on the opposite. Rather, given these results we suggest an approach that might

make use jointly of both the Clark and IWSV specifications in practice. That is, use, say, the Clark in some

environments and the IWSV in others. Moreover, we could likely encompass both models in a better, also

misspecified, model. A natural idea is to introduce a model with endogenous switching regimes, where the Clark

model is employed in situations where it performs better, while the IWSV can be used otherwise. It is important

to note that this encompassing model would not represent a mixture of the Clark and IWSV specifications, with

unknown mixing weights, regularly updated by Bayesian techniques.


2.9 References

BERNANKE, B., AND I. MIHOV (1998a): “The Liquidity Effect and Long-Run Neutrality,” Carnegie-Rochester

Conference Series on Public Policy, 49, 149-194.

—————- (1998b): “Measuring Monetary Policy,” Quarterly Journal of Economics, 113, 869-902.

CHIB, S., Y. OMORI, AND M. ASAI (2009): “Multivariate Stochastic Volatility,” in Handbook of Financial Time

Series., ed. by T.G. Anderson, et al., Berlin: Springer-Verlag Publishing.

CLARK, T.E. (2011): “Real-Time Density Forecasts from Bayesian Vector Autoregressions with StochasticVolatility,” Journal of Business & Economic Statistics, 29, 3, 327-341.

CLARK, T.E., AND M.W. McCRACKEN (2001): “Tests of Equal Forecast Accuracy and Encompassing forNested Models,” Journal of Econometrics, 105, 85-110.

—————- (2008): “Forecasting with Small Macroeconomic VARs in the Presence of Instability”, in Forecast-

ing in the Presence of Structural Breaks and Model Uncertainty, ed. by D.E. Rapach and M.E. Wohar, Bingley,UK: Emerald Publishing.

—————- (2010): “Averaging Forecasts from VARs with Uncertain Instabilities”, Journal of Applied Econo-metrics, 25, 5-29.

CLEMENTS, M.P., AND D.F. HENDRY (1998): Forecasting Economic Time Series, Cambridge, U.K.: Cam-bridge University Press.

COGLEY, T., AND T.J. SARGENT (2001): “Evolving Post-World War II U.S. Inflation Dynamics,” NBER

Macroeconomics Annual, 16, 331-373.

—————- (2005) “Drifts and Volatilities: Monetary Policies and Outcomes in the Post-World War II U.S.,”Review of Economic Dynamics, 8, 262-302.

CROUSHORE, D. (2006): “Forecasting with Real-Time Macroeconomic Data,” in Handbook of Economic Fore-

casting, ed. by G. Elliott, C. Granger, and A. Timmermann, Amsterdam: North-Holland Publishers.

DIEBOLD, F.X., AND K. YILMAZ (2008): “Measuring Financial Asset Return and Volatility Spillovers withApplication to Global Equity Markets,” Working Paper 08-16, Research Department, Federal Reserve Bank of

Philadelphia.

ENGLE, R.F., AND K.F. KRONER (1995): “Multivariate Simultaneous Generalized ARCH,” Economic Theory,11, 122-150.

FOX, E.B., AND M. WEST (2011): ”Autoregressive Models for Variance Matrices: Stationary Inverse WishartProcesses,” arXiv:1107.5239v1.


GEWEKE, J., AND G. AMISANO (2010): “Comparing and Evaluating Bayesian Predictive Distributions of As-set Returns,” International Journal of Forecasting, 26, 216-230.

GOURIEROUX, C., J. JASIAK, AND R. SUFANA (2009): “The Wishart Autoregressive Process of MultivariateStochastic Volatility,” Journal of Econometrics, 150, 167-181.

GOLOSNOY, V., B. GRIBISCH, AND R. LIESENFELD (2010): “The Conditional Autoregressive WishartModel for Multivariate Stock Market Volatility,” Working Paper, Christian-Albrechts-Universitat zu Kiel, 07.

GREENBURG, E. (2008): Introduction to Bayesian Econometrics, Cambridge, U.K.: Cambridge UniversityPress.

HAMILTON, J.D. (1989): “A New Approach to the Economic Analysis of Nonstationary Time Series and theBusiness Cycle,” Econometrica, 57, 357-384.

JORE, A.S., J. MITCHELL, AND S.P. VAHEY (2010): “Combining Forecast Densities from VARs with Uncer-tain Instabilities,” Journal of Applied Econometrics, 25, 621-634.

KIM, C.J., AND C.R. NELSON (1999): “Has the U.S. Economy Become More Stable? A Bayesian ApproachBased on a Markov Switching Model of the Business Cycle,” Review of Economics and Statistics, 81, 608-661.

KOZICKI, S., AND P.A. TINSLEY (2001a): “Shifting Endpoints in the Term Structure of Interest Rates,” Jour-

nal of Monetary Economics, 47, 613-652.

—————- (2001b): “Term Structure Views of Monetary Policy Under Alternative Models of Agent Expecta-tions,” Journal of Economic Dynamics and Control, 25, 149-184.

LITTERMAN, R.B. (1986): “Forecasting with Bayesian Vector Autoregressions: Five Years of Experience,”Journal of Business and Economic Statistics, 4, 25-38.

MAGNUS, J.R., AND H. NEUDECKER (1980): “The Elimination Matrix: Some Lemmas and Applications,”SIAM Journal on Algebraic and Discrete Methods, 1, 422-449.

—————- (1988): Matrix Differential Calculus with Applications in Statistics and Econometrics, New York:John Wiley and Sons Publishers.

McCONNELL, M., AND G. PEREZ QUIROS (2000): “Output Fluctuations in the United States: What HasChanged Since the Early 1980s?,” American Economic Review, 90, 1464-1476.

PHILIPOV, A., AND M.E. GLICKMAN (2006): “Multivariate Stochastic Volatility via Wishart Processes,” Jour-

nal of Business and Economic Statistics, 24, 3, 313-328.


PRESS, S.J. (1982): Applied Multivariate Analysis, 2nd ed., New York: Dover Publications.

PRIMICERI, G. (2005): “Time Varying Structural Vector Autoregressions and Monetary Policy,” Review of Eco-

nomic Studies, 72, 821-852.

RINNERGSCHWENTNER, W., G. TAPPEINER, AND J. WALDE (2011): “Multivariate Stochastic Volatilityvia Wishart Processes – A Continuation,” Working Papers in Economics and Statistics: University of Innsbruck,19.

ROMER, C.D., AND D.H. ROMER (2000): “Federal Reserve Information and the Behaviour of Interest Rates,”American Economic Review, 90, 429-457.

SARTORE, D., AND M. BILLIO (2005): “Stochastic Volatility Models: A Survey with Applications to OptionPricing and Value at Risk,” in Applied Quantitative Methods for Trading and Investment, ed. by C.L. Dunis, J.Laws, and P. Naim, John Wiley and Sons Publishers.

SIMS, C.A. (2001): “Comment on Sargent and Cogley’s ‘Evolving Post-World War II U.S. Inflation Dynamics’,”NBER Macroeconomics Annual, 16, 373-379.

—————- (2002): “The Role of Models and Probabilities in the Monetary Policy Process,” Brookings Papers

on Economic Activity, 2, 1-40.

STOCK, J.H. (2001): “Discussion of Cogley and Sargent ‘Evolving Post-World War II U.S. Inflation Dynamics’,”NBER Macroeconomics Annual, 16, 379-387.

VILLANI, M. (2009): “Steady-State Priors for Vector Autoregressions,” Journal of Applied Econometrics, 24,630-650.


2.10 Appendix: Real data, LDL′ factorization of Σt

Further evidence for the IWSV model can be drawn by considering the form of the constraint which theClark model places on the time varying structure of the covariance matrix process of the VAR innovations, vt.Recall from Section 2.3.1 that the Clark model imposes the following parameterization on the VAR innovationsgiven as

vt=B−1Λ0.5t εt, where εt ∼MVNp (0, Ip) . (2.47)

Of course, the constraint above implies that

Γt=B−1Λt

(B−1

)′= var (vt) . (2.48)

The interesting point to note is that this parameterization is equivalent to imposing an LDL′

factorization on thecovariance matrix of the VAR innovations, where L is a lower triangular matrix with ones on the diagonal andD is a diagonal matrix. Note that this LDL

′factorization always exists for positive definite, real, symmetric

matrices and is unique.

This result implies a method for testing whether Clark’s parametric assumption on the volatility process iscorrect. Suppose that we estimate the VAR model under the IWSV specification, given the entire data set. Ateach point in time that we draw a covariance matrix Σt

(m), from the Gibbs sampler, we factorize this covariancematrix as Σt = LtDtL

′

t. Iterating in this way provides us with a finite sample distribution of the L(m)t matrices

implied by the IWSV specification for each time period t = 1, . . . , T . Since these Lt matrices are unique,their time varying distributions must suggest something about whether or not it is appropriate to assume that theelements of the B−1 matrix in the Clark specification should be specified as constant across time.

Of course, while there are other ways to check the validity of this assumption (e.g. estimate the Clark withtime varying B matrices and compare results according to some metric), the aforementioned test proves the mostimmediately applicable.

Figures 2.22.i and 2.22.ii illustrate the results of this test which suggest that there exists significant time vari-ation in the elements of the Lt matrix factors across time, especially with regards to the elements correspondingto the pairing of GDP growth with both the inflation rate and the interest rate (i.e. L21,t and L31,t).

2.11 Appendix: Derivation of the posterior distributions needed for Gibbssampling

The following appendix describes the steps required for generating the conditional posterior distribution of theparameters used in the Gibbs sampler. First, we consider the steps required for estimation of the V AR(J) modelwith Clark volatility, then we consider the steps required for estimating the IWSV parameters. Please refer toSection 2.5 for a summary of the steps outlined below.

The benchmark volatility specification, Clark, is estimated by Gibbs sampler, given estimation steps basedon those of Villani (2009) and Cogley and Sargent (2005). Clark (2011) provides expressions for the conditionalposterior distributions of the VAR parameters of his Gibbs sampler, obtained from Mattias Villani, who himselfderived them based on the constant variance sampler employed in Villani (2009). We have re-derived them herefor completeness.


Figure 2.22.i: L21,t, time varying density

-1.5

-1

-0.5

0

0.5

1

1 20 39 58 77 96 115 134 153 172 191 210 229

L21

,t

Time 't'

L21,t97.5% C.I.2.5% C.I.

Figure 2.22.ii: L31,t, time varying density

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

1 20 39 58 77 96 115 134 153 172 191 210 229

L31

,t

Time 't'

L31,t97.5% C.I.2.5% C.I.


2.11.1 Definition of the parameters and priors

What follows provides descriptions of the model parameters and their chosen prior distributions.

i) Parameters related to the V AR(J) process:

Πj , for j = 1, . . . , J , the autoregressive coefficient matrices on the V AR(J) specification. Each Πi is ofdimension p× p. Combining all the J coefficient matrices, Πj, into one larger matrix, Π, the conditionallyconjugate prior is multivariate Normal, Π ∼ N(µΠ,ΞΠ).

Ψ, the matrix which when multiplied by the deterministic trend vector dt, forms the unconditional meanvector of the VAR specification. Ψ is of dimension p× q. The conditionally conjugate prior is multivariateNormal, Ψ ∼ N(µΨ,ΞΨ).

ii) (Augmented) parameters specific to the Clark (2011) volatility specification:

B, the lower triangular matrix with ones along its main diagonal, which when inverted, and pre and postmultiplied by Λt, forms the covariance matrix of the VAR shocks, vt, given as Γt below. B is of dimensionp× p. The elements of the B matrix are assumed independent Normal, with details provided in the relevantsection below.

Λt, the diagonal matrix which contains the nonstationary, independent, driving processes λi,t, for i =

1, . . . , p, along its main diagonal. Λt is of dimension p × p. The prior for this augmented parameter isLog-normal with details provided in the relevant section below.

Φ, the diagonal matrix which contains the variances, ϕi, of the shocks of the driving processes, ξi, fori = 1, . . . , p. Φ is of dimension p × p. We assume conditionally conjugate, independent Inverse-Gammapriors on each ϕi ∼ IG(γ2 ,

δ2 ).

iii) (Augmented) parameters specific to the proposed IWSV volatility specification:

Ak, for k = 1, . . . ,K, the autoregressive coefficient matrices on the IWSV volatility specification. EachAk is of dimension p×p, and is not necessarily symmetric. We assume multivariate Normal priors for eachof the Ak, for k = 1, . . . ,K, matrices.

C the lower diagonal constant matrix in the IWSV volatility specification. C is of dimension p × p. Weassume a multivariate Normal prior for on the C matrix.

ν the degree of freedom parameter describing the shape of the Inverse Wishart distribution driving thevolatility shocks. The degree of freedom parameter is a scalar. The prior chosen is a Gamma distribution.

Σt, ∀t, the covariance matrices of the VAR process shocks under the IWSV model. The Σt matrices areall of dimension p× p. The prior is Inverse Wishart with details provided in the relevant section below.

2.11.2 Computation of the posterior distribution for the Clark (2011) volatility model

Let us now describe the different conditional posterior distributions involved in the sequence for Gibbs sampling.

1. Draw from the posterior density of the slope coefficients Π′

= [Π1,Π2, . . . ,ΠJ] of the VAR, conditionalon Ψ, ΛT, B,Φ=diag(ϕ1, ϕ2, . . . , ϕp) and the data, given multivariate Normal prior, Π ∼ N(µΠ,ΩTΠ).


For this step we rewrite the VAR as:

Yt = Π′Xt + vt, (2.49a)

where Yt = yt−Ψdt, (2.49b)

vt = B−1Λ0.5t εt, (2.49c)

and Xt =[(yt−1−Ψdt−1)

′, (yt−2−Ψdt−2)

′, . . . , (yt−J−Ψdt−J)

′]′. (2.49d)

This is a linear model with respect to the parameter elements of the matrix Π. To clearly illustrate thislinear model we can use the alternative expression [Magnus and Neudecker (1998)]:

Yt = vec(Π′Xt

)+ vt

=(Ip ⊗X

′

t

)· vec (Π) +vt, (2.50a)

where ⊗ denotes the tensor product. Eliminating the heteroskedasticity by pre-multiplication we have

Y∗t =Γ−0.5t Yt=Γ−0.5

t

(Ip ⊗X

′

t

)· vec (Π) +εt, (2.51)

where εt ∼ N (0, Ip) and Γ−0.5t =Λ−0.5

t B. Or equivalently with clear notation:

Y∗t = X∗tvec(Π) + εt, (2.52)

where X∗t = Γ−0.5t

(Ip ⊗Xt

′)

.

Thus we have the following Lemma [See Tsay (2005), Section 12.3.2, or Box and Tiao (1973)].

Lemma 2.11.1.

Let us consider the Gaussian regression model

Y∗t = X∗tvec(Π) + εt, (2.53)

with prior,

vec(Π) ∼ N(µΠ,ΞΠ). (2.54)

Then the posterior distribution is such that vec(Π) ∼ N(µ∗Π,Ξ∗Π), where

Ξ∗Π =[Ξ−1

Π + X∗′X∗]−1

(2.55a)

and µ∗Π = Ξ∗Π[Ξ−1Π µΠ+X∗

′Y∗], (2.55b)

(2.55c)

given X∗X∗′

=∑Tt=1 X∗

′

t X∗t and X∗Y∗′

=∑Tt=1 X∗

′

t Y∗t .


The sufficient summaries of the past, appearing in this posterior distribution, are given as

X∗′X∗ =

T∑t=1

[Γ−0.5

t

(Ip ⊗X

′

t

)]′Γ−0.5

t

(Ip ⊗X

′

t

)(2.56a)

=

T∑t=1

(Ip ⊗X

′

t

)′Γ−1

t

(Ip ⊗X

′

t

)(2.56b)

and X∗′Y∗ =

T∑t=1

[Γ−0.5

t

(Ip ⊗X

′

t

)]′(Γ−0.5

t Yt) (2.56c)

=

T∑t=1

(Ip ⊗X

′

t

)′Γ−1

t Yt. (2.56d)

2. Draw from the posterior density of the coefficients Ψ defining the trend, conditional on Π, ΛT, B, and

Φ=diag(ϕ1, ϕ2, . . . , ϕp) and the data, given a multivariate Normal prior, Ψ ∼ N(µΨ,ΞΨ).

The equation defining Yt can be rewritten as:

Yt = Π (L) yt = Π (L) Ψdt + vt. (2.57)

Let us check that this is still a linear model. We have

Yt =

Ip −J∑j=1

ΠjLj

Ψdt + vt (2.58a)

= IpΨdt −Π1Ψdt−1 − · · · −ΠJΨdt−J + vt (2.58b)

=(dt

′⊗ Ip

)· vec(Ψ)−

(dt−1

′⊗Π1

)· vec(Ψ)− · · · −

(dt−J

′⊗ΠJ

)· vec(Ψ) + vt (2.58c)

=((

dt

′⊗ Ip

)−(dt−1

′⊗Π1

)− · · · −

(dt−J

′⊗ΠJ

))· vec(Ψ) + vt (2.58d)

= Xt · vec(Ψ) + vt, (2.58e)

where Xt =((

dt

′⊗ Ip

)−(dt−1

′⊗Π1

)− · · · −

(dt−J

′⊗ΠJ

)).

Thus we can standardize by pre-multiplication and get

Y∗t = Γ−0.5t Yt=Γ−0.5

t (Π (L) yt) =Γ−0.5t Xt · vec (Ψ) +εt (2.59a)

≡ X∗tvec(Ψ) + εt, say. (2.59b)

We can reapply Lemma 2.11.1, with this new set of explanatory variables, to get the posterior mean and


variance for vec(Ψ) ∼ N(µ∗Ψ,Ξ∗Ψ) as:

Ξ∗Ψ =[Ξ−1

Ψ + X∗′X∗]−1

(2.60a)

and µ∗Ψ = Ξ∗Ψ[Ξ−1Ψ µΨ+X∗

′Y∗]. (2.60b)

The sufficient summaries of the past are now given as

X∗′X∗ =

T∑t=1

(Γ−0.5

t Xt

)′Γ−0.5

t Xt =

T∑t=1

Xt

′Γ−1

t Xt (2.61a)

and X∗′Y∗ =

T∑t=1

(Γ−0.5

t Xt

)′Γ−0.5

t Yt =

T∑t=1

Xt

′Γ−1

t Yt. (2.61b)

3. Draw from the posterior density of the elements of B (lower triangular with ones in the diagonal) conditional

on Π,Ψ, ΛT, Φ=diag(ϕ1, ϕ2, . . . , ϕp) and the data, given Normal, independent, priors on each of the

elements of the B matrix.

The system defining Yt can now be rewritten as

BΠ (L) (yt −Ψdt) = BYt = Λ0.5t εt (2.62)

Since B is lower triangular, this system of equations reduces to the following

Y1,t = λ0.51,t ε1,t (2.63a)

Y2,t = −b21Y1,t + λ0.52,t ε2,t (2.63b)

Y3,t = −b31Y1,t−b32Y2,t + λ0.53,t ε3,t (2.63c)

Y4,t = −b41Y1,t−b42Y2,t−b43Y3,t + λ0.54,t ε4,t (2.63d)

...

Yp,t = −bp1Y1,t−bp2Y2,t−bp3Y3,t − . . .−bp,(p−1)Y(p−1),t + λ0.5p,tεp,t (2.63e)

where Yi,t is the ith element of the p× 1 column vector Π (L) (yt −Ψdt) = Yt.

We can treat each of the i = 2, . . . , p equations above as linear regressions. Again, pre-multiplication of

each of the i equations by λ−0.5i,t , ∀t, removes the heteroskedasticity. Furthermore, given the assumption

of independent Normal prior densities, the conditional posterior for each row vector of B is also Normal:


N (βi∗,Gi

∗) , ∀i = 2, . . . , p, where

G∗i =[Gi−1 +X∗i

′X∗i

]−1

, (2.64a)

and β∗i = G∗i [G−1i βi+X∗i

′Y∗i ], (2.64b)

and

Y∗i =[λ−0.5i,1 Yi,1, . . . , λ

−0.5i,T Yi,T

]′(2.65a)

and X∗i =

−λ−0.5

i,1 Y1,1 −λ−0.5i,1 Y2,1 . . . −λ−0.5

i,1 Yi−1,1

. . . . . . . . . . . .

−λ−0.5i,T Y1,T −λ−0.5

i,T Y2,T . . . −λ−0.5i,T Yi−1,T

. (2.65b)

4. Draw from the posterior density of the elements of the time varying covariance matrix Λt for each time

t = 1, . . . , T in sequence, each conditional on Π, Ψ, B, and Φ=diag(ϕ1, ϕ2, . . . , ϕp) and the data.

Since the stochastic volatilities are independent of each other for all i = 1, . . . , p, we can estimate the

corresponding equation separately. In order to do so, we need an expression for the posterior density of

each augmented parameter λi,t conditional on everything else, including the entire macroeconomic series

values for all t = 1, . . . , T .

Since each volatility is Markov of order one, we can write for each i = 1, . . . , p

g(λi,t

∣∣ λi,\t, ϕi,Y∗i ) ∝ f (Y∗i | λi) g(λi,t

∣∣ λi,\t, ϕi) ∝ f (y∗i,t ∣∣ λi,t) g (λi,t ∣∣ λi,\t, ϕi)= f

(y∗i,t

∣∣ λi,t) g (λi,t | λi,t−1, ϕi) g (λi,t+1 | λi,t, ϕi) = f(y∗i,t

∣∣ λi,t) g (λi,t | λi,t−1, λi,t+1, ϕi)

(2.66)

where λi,\t denotes all elements of the λi vector except for the tth element, Y∗i =y∗i,1, . . . , y

∗i,T

, and

y∗i,t is the ith element of BΠ (L) (yt −Ψdt). Furthermore, since

λi,t|λi,t−1 ∼ LN(eln(λi,t−1) +

ϕi2 , (e

ϕi − 1)e2ln(λi,t−1) +ϕi), (2.67)


we have

f(y∗i,t

∣∣ λi,t) g (λi,t | λi,t−1, λi,t+1, ϕi)

∝ λ−0.5i,t exp

(−(y∗i,t)2

2λi,t

)λ−1i,t exp

(− (ln (λi,t) − µi,t)2

2σ2

), (2.68)

where we can solve for missing values according to Section 12.6.1 of Tsay (2005) and find that

µi,t =1

2(ln (λi,t+1) + ln (λi,t−1) ) , (2.69a)

and σ2 =1

2ϕi. (2.69b)

Therefore, in implementing a Metropolis-within-Gibbs step we can draw a proposal from the Gibbs sampler

for λ(m)i,t , from λ

(m)i,t ∼ LN(eµi,t+

σ2

2 , (eµi,t −1)e2µi,t+σ

2

), and accept it as the mth draw with probability

α(λ

(m−1)i,t , λ

(m)i,t

)= min1,

f(y∗i,t

∣∣∣ λ(m)i,t

)f(y∗i,t

∣∣∣ λ(m−1)i,t

), (2.70)

since the proposal densities cancel out in the ratio.

5. Draw from the posterior density of the diagonal elements of Φ conditional on Π, Ψ, B, and ΛT and the

data.

The Inverse Gamma prior is conjugate for the variance parameter of the Normal density. Therefore, the

conditional posterior of ϕi is also Inverse Gamma as

f(ϕi|λi) ∝ h (λi | ϕi) p(ϕi)

∝T∏t=1

1

ϕ0.5i

exp

− (ln (λi,t) − ln (λi,t−1) )

2

2ϕi

× ϕ−( γ2 +1)

i e− δ

2ϕi . (2.71)

Furthermore, the right hand side above is equal to

ϕ−( γ2 +1)−T2i exp

− δ

2ϕi− 1

2ϕi

T∑t=1

ln

(λi,tλi,t−1

) 2

= ϕ−( γ+T2 +1)i exp

−δ +

∑Tt=1 ln

(λi,tλi,t−1

) 2

2ϕi

(2.72)

Consequently, assuming identical Inverse Gamma priors on each ϕi ∼ IG(γ2 ,δ2 ), the conditional posterior


is also Inverse Gamma, or IG(γ∗

2 ,δ∗

2 ), where

γ∗ = γ + T, (2.73a)

and δ∗ = δ +

T∑t=1

(ln

(λi,tλi,t−1

) )2

. (2.73b)

2.11.3 Computation of the posterior distribution for the IWSV model

Let us now describe the sequence of conditional posterior distributions for the IWSV model.

1. First, we repeat step (1) above in Section 2.11.2, except that we replace Γt=B−1Λt

(B−1

)′= var (vt)

with Σt. That is, we no longer condition on B, ΛT, and Φ, but rather on Ak, for k = 1, . . . ,K, C, ν, andΣT.

2. Repeat step (2) above in Section 2.11.2, except this time replace Γt=B−1Λt

(B−1

)′= var (vt), with Σt.

3. Draw from the posterior density of the parameters Ak, ∀k,C, and ν jointly, conditional on Π, Ψ, and ΣT,and the data.

All of the individual elements of the parameter matrices Ak, ∀k,C and ν are drawn jointly by a Metropolis-within-Gibbs step employing a random walk proposal. The joint proposal is multivariate Normal, and weassume multivariate Normal priors on both Ak, ∀k and C and a Gamma prior on (ν − p). See Section 2.4on priors for more details.

The random walk multivariate Normal proposal is symmetric and conditioned on the last value in the pro-cess through its mean vector; therefore it drops out of the acceptance ratio. The variance of the proposal isinitially set to the inverse of the observed negative Hessian matrix at the mode of the conditional posteriorfor a first attempt, and then a second attempt is employed using the covariance matrix of the initial Markovprocess draws themselves for improved mixing.

Moreover, the likelihood of the IWSV model is now given as

f (v | θ) = L (θ) =

T∏t=1

f (vt | Σt) g (Σt | Σt−1, . . . ,Σt−K; θ)

=

T∏t=1

1

(2π)p2 |Σt|

12

exp

−1

2v′

tΣ−1t vt

× 2−( νp2 ) |St−1|

ν2 Γp

(ν2

)−1

|Σt|−(ν+p+1)/2exp

−1

2tr[St−1Σ−1

t

], (2.74)

where vt = Π (L) (yt −Ψdt) is a function of the data, y. Therefore, by Bayes Theorem we can considerthe conditional posterior of θ as proportional to the likelihood (which is really a function of the data) timesthe prior density for θ (where θ = A1, . . . ,Ak,C, ν) as follows

p(θ|yT,Π,Ψ,ΣT) ∝ L(θ)π(θ) = f(yT,ΣT|θ; Π,Ψ)π(θ) ∝ f(vT|θ)π(θ). (2.75)

Therefore, the Metropolis acceptance probability of the mth draw, θ(m), in the random walk sampler can


be expressed as

α(θ(m−1), θ(m)) = min

1,

p(θ(m)|yT,Π,Ψ,ΣT)

p(θ(m−1)|yT,Π,Ψ,ΣT)

. (2.76)

4. Similarly to step (4) above, we now draw from the posterior density of Σt conditional onΣ\t,Ak, ∀k,C, ν, Π, Ψ, and the data, in sequence for each time t = 1, . . . , T . We have:

P(Σt

∣∣ Σ\t,v)∝ P (vt | Σt)P (Σt | Σt−1)P (Σt+1 | Σt) (2.77a)

∝ |Σt|−12 |St|

ν2 |Σt|−(ν+p+1)/2

exp

−1

2tr[(

St−1 + vtv′

t

)Σ−1

t

]exp

−1

2tr[StΣ

−1t+1

], (2.77b)

whereSt−1

ν − p− 1= CC

′+

K∑k=1

AkΣ−1t−kA

′

k. (2.77c)

Therefore, by letting the proposal be Inverse Wishart Σt ∼ IWp

(ν,S∗t−1

)where S∗t−1 = St−1 + vtv

′

t,the proposal drops out of the Metropolis-Hastings ratio. Indeed, the probability of accepting the mth drawof Σ

(m)t , sequentially, for each time period t = 1, . . . , T , is now 20

α(Σ

(m−1)t ,Σ

(m)t

)= min

1,

∣∣∣Σ(m)t

∣∣∣− 12∣∣∣S(m)

t

∣∣∣ ν2 exp− 1

2 tr[S

(m)t Σ−1

t+1

]∣∣∣Σ(m−1)

t

∣∣∣− 12∣∣∣S(m−1)

t

∣∣∣ ν2 exp− 1

2 tr[S

(m−1)t Σ−1

t+1

] . (2.78)

2.12 Appendix: Tables and Figures

20To avoid numerical problems logs are taken of both the numerator and denominator, then differenced, before finally taking their exponen-tial. This avoids issues when the non-logged function values grow either too large or too small to be machine comparable.


Table 2.3.i: Section 2.7.1: Posterior distribution of the parameters for IWSV model, Simulated data, 1st sample, Rolling window

Parameter Pop. value 2.5% C.I. mean 97.5% C.I. Parameter Pop. value 2.5% C.I. mean 97.5% C.I.vech(C) 0.3 0.2778 0.2984 0.3208 vec(Π1) 0.2 0.0976 0.2185 0.3374

0 -0.0229 0.0046 0.0320 0 -0.1666 -0.0157 0.13600 -0.0413 -0.0069 0.0248 0 -0.2724 -0.0920 0.08820 -0.0407 -0.0048 0.0381 0 -0.2415 0.0110 0.2627

0.3 0.2775 0.3038 0.3337 0 -0.0806 0.0113 0.10250 -0.0375 -0.0057 0.0273 0.8 0.5904 0.7091 0.82770 -0.0392 -0.0003 0.0418 0 -0.0888 0.0456 0.1798

0.3 0.2736 0.3077 0.3427 0 -0.1756 0.0218 0.21880 -0.0468 -0.0039 0.0403 0 -0.1156 -0.0425 0.0309

0.3 0.2701 0.3138 0.3577 0 -0.0961 -0.0047 0.0870vec(A1) 0.5 0.4449 0.4943 0.5344 0.8 0.7345 0.8532 0.9721

0 -0.0417 0.0020 0.0486 0 -0.2937 -0.1246 0.04470 -0.0438 0.0019 0.0455 0 -0.0508 -0.0107 0.02910 -0.0468 0.0014 0.0503 0 0.0227 0.0744 0.12620 -0.0467 -0.0018 0.0487 0 -0.1318 -0.0693 -0.0065

0.75 0.7062 0.7440 0.7811 0.8 0.6551 0.7704 0.88630 -0.0477 -0.0029 0.0421 vec(Π2) 0 -0.2741 -0.1557 -0.03670 -0.0466 -0.0011 0.0451 0 -0.2246 -0.0768 0.06970 -0.0447 -0.0077 0.0338 0 -0.1544 0.0256 0.20420 -0.0444 -0.0030 0.0349 0 -0.2398 0.0054 0.2486

0.85 0.8052 0.8459 0.8804 0 -0.1330 -0.0199 0.09270 -0.0417 -0.0008 0.0415 0 -0.0688 0.0753 0.21750 -0.0200 0.0027 0.0293 0 -0.2405 -0.0768 0.08600 -0.0191 0.0039 0.0267 0 -0.1928 0.0324 0.25630 -0.0250 0.0024 0.0279 0 -0.0284 0.0688 0.1653

0.98 0.9477 0.9739 1.0006 0 -0.1910 -0.0714 0.0485vec(A2) 0 0.0007 0.0189 0.0508 0 -0.1478 0.0007 0.1505

0 -0.0455 -0.0005 0.0402 0 -0.1146 0.0918 0.29590 -0.0424 0.0001 0.0421 0 -0.0796 -0.0276 0.02440 -0.0445 0.0016 0.0456 0 -0.0540 0.0124 0.07810 -0.0416 0.0002 0.0460 0 0.0106 0.0902 0.16910 -0.0438 -0.0014 0.0400 0 -0.0797 0.0618 0.20050 -0.0458 -0.0024 0.0458 vec(Π3) 0 -0.1753 -0.0536 0.06760 -0.0440 -0.0007 0.0428 0 -0.0817 0.0704 0.22270 -0.0452 -0.0003 0.0453 0 -0.0947 0.0814 0.25650 -0.0409 0.0010 0.0412 0 -0.3865 -0.1355 0.11400 -0.0487 0.0017 0.0479 0 -0.0670 0.0241 0.11440 -0.0488 -0.0033 0.0466 0 -0.1501 -0.0357 0.07860 -0.0372 0.0016 0.0433 0 -0.1332 -0.0015 0.13190 -0.0421 -0.0007 0.0410 0 -0.1896 0.0080 0.20930 -0.0450 -0.0002 0.0431 0 -0.1138 -0.0421 0.02960 -0.0381 0.0018 0.0450 0 -0.0385 0.0546 0.1461

vec(A3) 0 0.0005 0.0158 0.0448 0 -0.1840 -0.0714 0.04010 -0.0382 0.0021 0.0477 0 -0.2107 -0.0451 0.12060 -0.0470 -0.0011 0.0464 0 -0.0360 0.0064 0.04880 -0.0499 -0.0001 0.0445 0 -0.1245 -0.0708 -0.01770 -0.0428 0.0005 0.0423 0 -0.0620 0.0039 0.07050 -0.0422 0.0001 0.0404 0 -0.1837 -0.0698 0.04390 -0.0434 -0.0046 0.0376 Ψ 3 2.9699 3.0083 3.04650 -0.0420 0.0006 0.0432 0 -0.0832 -0.0223 0.03880 -0.0478 -0.0032 0.0379 2.5 2.4380 2.4987 2.56000 -0.0452 -0.0037 0.0425 0 -0.0596 0.0016 0.06300 -0.0459 -0.0002 0.04830 -0.0424 -0.0003 0.04410 -0.0387 -0.0001 0.03880 -0.0396 0.0022 0.03990 -0.0396 0.0040 0.05440 -0.0395 0.0041 0.0477

ν 30 15.9899 19.9123 26.1828


Table 2.3.ii: Section 2.7.1: Distribution of the posterior mean of the parameters for IWSV model, Simulated data, across N = 100samples, Rolling window

Parameter Pop. value 2.5% C.I. mean 97.5% C.I. Parameter Pop. value 2.5% C.I. mean 97.5% C.I.vech(C) 0.3 0.2943 0.3003 0.3070 vec(Π1) 0.2 0.2145 0.2303 0.2498

0 -0.0003 0.0033 0.0086 0 -0.0441 -0.0107 0.02060 -0.0146 -0.0093 -0.0053 0 -0.1211 -0.0611 0.00060 -0.0105 -0.0051 0.0005 0 -0.0394 -0.0132 0.0140

0.3 0.3032 0.3098 0.3155 0 0.0114 0.0360 0.05220 -0.0133 -0.0068 -0.0001 0.8 0.6974 0.7248 0.74970 -0.0069 -0.0019 0.0031 0 -0.0833 -0.0031 0.0751

0.3 0.2924 0.2994 0.3064 0 -0.0148 0.0361 0.07510 -0.0041 0.0007 0.0059 0 -0.0377 -0.0055 0.0193

0.3 0.3101 0.3147 0.3192 0 -0.0301 -0.0124 -0.0005vec(A1) 0.5 0.4921 0.4970 0.5031 0.8 0.6782 0.7691 0.8529

0 -0.0052 -0.0004 0.0048 0 -0.1299 -0.1068 -0.06620 -0.0050 -0.0011 0.0028 0 -0.0074 -0.0009 0.00560 -0.0069 -0.0016 0.0029 0 0.0089 0.0418 0.07470 -0.0022 0.0038 0.0106 0 -0.0901 -0.0726 -0.0578

0.75 0.7377 0.7438 0.7492 0.8 0.7667 0.7887 0.81480 -0.0060 -0.0005 0.0046 vec(Π2) 0 -0.1492 -0.1286 -0.10640 -0.0098 -0.0041 0.0019 0 -0.0638 -0.0408 -0.02300 -0.0127 -0.0053 0.0031 0 -0.0440 0.0070 0.04230 -0.0105 -0.0024 0.0052 0 0.0107 0.0671 0.1271

0.85 0.8334 0.8408 0.8481 0 -0.0437 -0.0253 -0.01190 -0.0058 -0.0014 0.0034 0 0.0112 0.0404 0.06960 -0.0134 -0.0056 0.0040 0 -0.1043 -0.0603 0.00240 0.0000 0.0051 0.0120 0 -0.0987 -0.0476 0.00280 -0.0043 0.0018 0.0088 0 0.0368 0.0551 0.0869

0.98 0.9675 0.9743 0.9825 0 -0.0745 -0.0430 -0.0105vec(A2) 0 0.0156 0.0178 0.0208 0 -0.0068 0.0230 0.0750

0 -0.0058 0.0001 0.0050 0 -0.0123 0.0596 0.12910 -0.0044 0.0000 0.0044 0 -0.0298 -0.0116 0.00310 -0.0052 0.0001 0.0051 0 -0.0047 0.0103 0.02400 -0.0037 0.0000 0.0043 0 0.0596 0.0895 0.10910 -0.0055 -0.0003 0.0047 0 0.0423 0.0695 0.10540 -0.0039 0.0003 0.0051 vec(Π3) 0 -0.0437 -0.0268 -0.01000 -0.0042 0.0001 0.0049 0 -0.0723 -0.0117 0.06310 -0.0043 -0.0001 0.0041 0 0.0212 0.0502 0.07850 -0.0049 0.0004 0.0043 0 -0.1623 -0.1254 -0.09140 -0.0055 -0.0002 0.0047 0 0.0073 0.0316 0.06900 -0.0053 0.0001 0.0057 0 -0.0352 -0.0194 -0.00170 -0.0044 0.0000 0.0037 0 -0.0057 0.0414 0.07550 -0.0047 -0.0003 0.0043 0 -0.0033 0.0635 0.11900 -0.0040 0.0006 0.0052 0 -0.0634 -0.0495 -0.03900 -0.0042 0.0001 0.0052 0 0.0088 0.0386 0.0664

vec(A3) 0 0.0147 0.0178 0.0206 0 -0.0939 -0.0620 -0.02180 -0.0045 0.0000 0.0050 0 -0.1322 -0.0835 -0.04060 -0.0048 0.0001 0.0047 0 -0.0266 -0.0082 0.01240 -0.0045 0.0003 0.0053 0 -0.0705 -0.0422 -0.02590 -0.0044 0.0000 0.0054 0 -0.0268 -0.0144 0.00480 -0.0042 0.0001 0.0042 0 -0.0778 -0.0592 -0.04240 -0.0057 0.0000 0.0049 Ψ 3 3.0092 3.0289 3.04820 -0.0043 0.0000 0.0046 0 -0.0236 -0.0207 -0.01630 -0.0059 0.0002 0.0054 2.5 2.4865 2.4942 2.50110 -0.0040 0.0002 0.0052 0 0.0001 0.0015 0.00270 -0.0051 0.0002 0.00520 -0.0031 0.0002 0.00360 -0.0038 0.0001 0.00380 -0.0045 0.0000 0.00480 -0.0042 0.0002 0.00480 -0.0057 -0.0002 0.0050

ν 30 18.2853 19.8090 21.3300


Table 2.4.i: Section 2.7.1: Posterior distribution of the parameters for Clark model, Simulated data, 1st sample, Rolling window

Parameter Pop. value 2.5% C.I. mean 97.5% C.I. Parameter Pop. value 2.5% C.I. mean 97.5% C.I.b21 n/a -0.1449 0.0192 0.1800 vec(Π1) 0.2 0.0839 0.2117 0.3408b31 -0.0650 0.1480 0.3667 0 -0.1787 -0.0310 0.1214b32 -0.0766 0.0693 0.2143 0 -0.2839 -0.0989 0.0873b41 -0.3200 0.0221 0.3661 0 -0.2180 0.0422 0.3032b42 -0.2173 0.0609 0.3390 0 -0.0723 0.0170 0.1059b43 -0.3054 -0.0779 0.1498 0.8 0.5884 0.7131 0.8391ϕ1 0.1643 0.2869 0.4712 0 -0.1150 0.0198 0.1573ϕ2 0.1624 0.2783 0.4575 0 -0.2074 -0.0027 0.2066ϕ3 0.1625 0.2784 0.4718 0 -0.1155 -0.0420 0.0307ϕ4 0.1655 0.2857 0.4731 0 -0.1196 -0.0223 0.0741

0.8 0.7102 0.8354 0.96110 -0.3274 -0.1493 0.02910 -0.0555 -0.0175 0.02110 0.0135 0.0681 0.12370 -0.1441 -0.0806 -0.0168

0.8 0.6478 0.7690 0.8883vec(Π2) 0 -0.2772 -0.1491 -0.0207

0 -0.2267 -0.0713 0.08210 -0.1454 0.0432 0.22800 -0.2279 0.0210 0.26800 -0.1414 -0.0332 0.07640 -0.0729 0.0736 0.21880 -0.2103 -0.0430 0.12030 -0.1926 0.0414 0.27650 -0.0142 0.0742 0.16210 -0.1774 -0.0584 0.06190 -0.1553 -0.0034 0.14810 -0.0863 0.1282 0.34200 -0.0602 -0.0109 0.03850 -0.0536 0.0157 0.08420 0.0181 0.0985 0.17910 -0.0803 0.0655 0.2081

vec(Π3) 0 -0.1198 0.0034 0.12650 -0.0921 0.0618 0.21470 -0.1051 0.0763 0.25740 -0.3904 -0.1337 0.12350 -0.0537 0.0316 0.11620 -0.1479 -0.0305 0.08750 -0.1297 0.0010 0.13270 -0.2270 -0.0112 0.20510 -0.1037 -0.0370 0.02950 -0.0458 0.0465 0.13790 -0.1650 -0.0478 0.06950 -0.2126 -0.0398 0.13220 -0.0392 0.0020 0.04310 -0.1223 -0.0666 -0.01280 -0.0792 -0.0091 0.06070 -0.1899 -0.0723 0.0474

Ψ 3 2.9584 2.9990 3.03990 -0.0846 -0.0235 0.0378

2.5 2.4360 2.4970 2.55870 -0.0591 0.0024 0.0637


Table 2.4.ii: Section 2.7.1: Distribution of the posterior mean of the parameters for Clark model, Simulated data, across N = 100samples, Rolling window

Parameter Pop. value 2.5% C.I. mean 97.5% C.I. Parameter Pop. value 2.5% C.I. mean 97.5% C.I.b21 n/a -0.0622 -0.0161 0.0575 vec(Π1) 0.2 0.2074 0.2277 0.2505b31 0.0581 0.1111 0.1570 0 -0.0475 -0.0157 0.0198b32 0.0447 0.0737 0.0976 0 -0.1249 -0.0656 -0.0187b41 0.0136 0.1623 0.2286 0 -0.0248 0.0012 0.0265b42 -0.0571 -0.0100 0.0425 0 0.0155 0.0661 0.1033b43 -0.1218 -0.0750 0.0001 0.8 0.6895 0.7079 0.7328ϕ1 0.2860 0.2984 0.3196 0 -0.0761 -0.0090 0.0645ϕ2 0.2774 0.2914 0.3149 0 -0.0254 0.0169 0.0461ϕ3 0.2659 0.2872 0.3172 0 -0.0409 -0.0096 0.0230ϕ4 0.2744 0.2997 0.3161 0 -0.0643 -0.0407 -0.0207

0.8 0.6899 0.7621 0.83390 -0.1470 -0.1236 -0.09460 -0.0179 -0.0070 0.00220 0.0006 0.0443 0.07030 -0.1090 -0.0846 -0.0703

0.8 0.7640 0.7833 0.8128vec(Π2) 0 -0.1434 -0.1120 -0.0737

0 -0.0686 -0.0317 -0.00040 0.0121 0.0386 0.07040 0.0286 0.0782 0.14410 -0.0708 -0.0528 -0.02570 0.0084 0.0390 0.06890 -0.0699 -0.0414 0.01190 -0.0815 -0.0274 0.01870 0.0504 0.0684 0.09950 -0.0635 -0.0325 0.00100 -0.0191 0.0215 0.07540 0.0013 0.0799 0.14600 -0.0150 0.0037 0.02140 -0.0108 0.0104 0.02620 0.0854 0.1053 0.13030 0.0481 0.0778 0.1082

vec(Π3) 0 -0.0087 0.0169 0.03820 -0.0770 -0.0212 0.04840 0.0023 0.0451 0.08320 -0.1576 -0.1200 -0.08860 0.0166 0.0399 0.06800 -0.0290 -0.0019 0.02010 -0.0174 0.0305 0.05640 -0.0172 0.0443 0.09810 -0.0677 -0.0514 -0.03590 0.0155 0.0505 0.07380 -0.0816 -0.0649 -0.05200 -0.1280 -0.0889 -0.04450 -0.0359 -0.0135 0.01140 -0.0660 -0.0394 -0.02010 -0.0458 -0.0301 -0.00600 -0.0893 -0.0699 -0.0503

Ψ 3 2.9994 3.0256 3.04710 -0.0282 -0.0236 -0.0181

2.5 2.4833 2.4926 2.50090 -0.0003 0.0019 0.0034


Table 2.5.i: Section 2.7.2: Posterior distribution of the parameters for IWSV model, Real data, 1st sample, Recursive windowPosterior Posterior

Parameter Prior mean 2.5% C.I. mean 97.5% C.I. Parameter Prior mean 2.5% C.I. mean 97.5% C.I.vech(C) 0.3 0.3026 0.3668 0.4314 vec(Π1) 0.2 0.0998 0.2678 0.4365

0 -0.0924 -0.0101 0.0665 0 -0.0432 0.0129 0.06960 -0.0625 0.0089 0.0712 0 -0.0086 0.0187 0.04620 -0.0526 0.0009 0.0526 0 -0.0456 -0.0288 -0.0128

0.3 0.3004 0.3702 0.4730 0 0.2734 0.6964 1.13890 -0.0661 -0.0065 0.0591 0.8 0.3613 0.5683 0.77600 -0.0706 -0.0230 0.0256 0 -0.0551 0.0092 0.0750

0.3 0.1957 0.2553 0.3155 0 -0.0602 -0.0193 0.01900 -0.0541 -0.0050 0.0480 0 -0.6476 -0.1205 0.4230

0.3 0.0882 0.1251 0.1671 0 -0.0750 0.1870 0.4571vec(A1) 0.9 0.9828 1.0039 1.0259 0.8 1.0069 1.1873 1.3663

0 -0.0151 0.0101 0.0246 0 -0.0583 0.0105 0.07610 -0.0046 0.0046 0.0140 0 -0.6838 0.0451 0.77510 -0.0163 -0.0098 -0.0039 0 -0.7457 -0.2081 0.33900 -0.1494 -0.0655 0.0745 0 -0.3954 -0.0664 0.2640

0.9 0.8988 0.9514 0.9860 0.8 0.9701 1.1614 1.35340 -0.0364 -0.0029 0.0260 vec(Π2) 0 -0.0961 0.0635 0.22140 -0.0089 0.0083 0.0255 0 -0.0348 0.0229 0.07950 -0.0757 0.0047 0.0844 0 -0.0180 0.0107 0.03910 -0.0931 -0.0094 0.0740 0 -0.0378 -0.0224 -0.0072

0.9 0.8054 0.8714 0.9351 0 -0.9829 -0.5763 -0.15040 -0.0614 -0.0155 0.0244 0 -0.0972 0.0953 0.29250 -0.1045 -0.0133 0.0735 0 -0.0895 -0.0237 0.04320 -0.1191 -0.0446 0.0536 0 -0.0280 0.0114 0.05040 -0.0744 0.0077 0.0824 0 -0.9201 -0.2808 0.3699

0.9 0.7397 0.8091 0.8692 0 -0.4920 -0.1107 0.2704vec(A2) 0 0.0017 0.0405 0.1098 0 -0.7450 -0.4738 -0.2042

0 -0.0613 -0.0032 0.0616 0 -0.1357 -0.0376 0.06290 -0.0280 0.0015 0.0281 0 -0.3902 0.3363 1.06350 -0.0170 -0.0016 0.0141 0 -0.4375 0.1732 0.78280 -0.0974 -0.0062 0.0935 0 -0.3910 0.0464 0.47600 -0.0970 -0.0014 0.0931 0 -0.6903 -0.4303 -0.16410 -0.0621 0.0035 0.0640 vec(Π3) 0 -0.3406 -0.1860 -0.03560 -0.0439 0.0000 0.0385 0 -0.0329 0.0165 0.06650 -0.1015 -0.0019 0.0791 0 -0.0300 -0.0042 0.02140 -0.0829 -0.0003 0.0934 0 -0.0161 -0.0025 0.01150 -0.0854 -0.0009 0.0903 0 -0.6159 -0.2589 0.08890 -0.0677 -0.0023 0.0659 0 -0.1229 0.0435 0.21000 -0.1018 -0.0074 0.0855 0 -0.0574 -0.0055 0.04770 -0.1016 -0.0071 0.0795 0 -0.0161 0.0169 0.04900 -0.0804 0.0087 0.0984 0 -0.3772 0.2538 0.89650 -0.0952 -0.0038 0.0792 0 -0.2421 0.0972 0.4394

vec(A3) 0 0.0022 0.0427 0.1221 0 0.0530 0.2489 0.44100 -0.0603 0.0011 0.0560 0 -0.0582 0.0239 0.10640 -0.0282 0.0011 0.0300 0 -0.4226 0.2740 0.97740 -0.0144 0.0012 0.0171 0 -0.4789 0.0136 0.50480 -0.0911 -0.0016 0.0869 0 -0.3233 -0.0316 0.26550 -0.0893 0.0014 0.0916 0 -0.0173 0.1529 0.32120 -0.0596 -0.0047 0.0552 Ψ 3 2.9048 3.2533 3.58010 -0.0403 0.0043 0.0457 0 -0.1817 0.1979 0.57110 -0.0712 0.0140 0.1038 2.5 0.8292 1.8164 3.08350 -0.0958 -0.0006 0.0941 0 -0.1466 0.2088 0.54090 -0.0922 -0.0082 0.08070 -0.0632 0.0059 0.07770 -0.0802 0.0066 0.09230 -0.0924 0.0057 0.09110 -0.0755 0.0048 0.08950 -0.0822 0.0020 0.0853

ν 15 13.0737 15.9185 19.2311


Table 2.5.ii: Section 2.7.2: Distribution of the posterior mean of the parameters for IWSV model, Real data, across N = 100 samples,Recursive window

Distribution DistributionParameter Prior mean 2.5% C.I. mean 97.5% C.I. Parameter Prior mean 2.5% C.I. mean 97.5% C.I.vech(C) 0.3 0.3169 0.3570 0.3940 vec(Π1) 0.2 0.1783 0.2069 0.2417

0 -0.0472 -0.0101 0.0244 0 0.0188 0.0283 0.04010 -0.0102 0.0087 0.0266 0 0.0138 0.0169 0.02030 -0.0098 0.0029 0.0156 0 -0.0314 -0.0254 -0.0222

0.3 0.3078 0.3575 0.3972 0 0.2150 0.3770 0.61170 -0.0137 -0.0019 0.0135 0.8 0.5636 0.5971 0.63600 -0.0271 -0.0174 -0.0100 0 0.0000 0.0181 0.0254

0.3 0.1809 0.2132 0.2406 0 -0.0215 -0.0054 0.00130 -0.0202 -0.0078 0.0042 0 -0.0235 0.1114 0.2094

0.3 0.0946 0.1171 0.1378 0 0.0029 0.0524 0.1450vec(A1) 0.9 0.9962 1.0018 1.0068 0.8 1.1440 1.2516 1.3223

0 -0.0110 0.0013 0.0104 0 -0.0072 -0.0019 0.01040 0.0023 0.0047 0.0076 0 -0.0303 0.0498 0.12400 -0.0112 -0.0100 -0.0089 0 -0.2833 -0.2083 -0.13130 -0.0792 -0.0297 0.0369 0 -0.1228 -0.0924 -0.0575

0.9 0.9218 0.9332 0.9467 0.8 1.1078 1.1842 1.23440 -0.0033 0.0036 0.0091 vec(Π2) 0 0.0807 0.1457 0.18850 -0.0004 0.0040 0.0076 0 0.0078 0.0137 0.02080 -0.0436 -0.0006 0.0357 0 0.0071 0.0107 0.01550 -0.0238 0.0082 0.0414 0 -0.0234 -0.0186 -0.0150

0.9 0.8701 0.8866 0.8998 0 -0.5158 -0.3684 -0.27940 -0.0247 -0.0191 -0.0134 0 0.0854 0.1547 0.19280 -0.0335 -0.0008 0.0359 0 -0.0230 -0.0055 0.00480 -0.0429 -0.0184 0.0141 0 0.0070 0.0103 0.01550 -0.0244 -0.0019 0.0221 0 -0.4067 -0.2102 -0.0803

0.9 0.7822 0.7959 0.8100 0 -0.1261 0.0442 0.1253vec(A2) 0 0.0363 0.0452 0.0544 0 -0.5686 -0.5257 -0.4351

0 -0.0141 -0.0011 0.0110 0 -0.0389 -0.0268 -0.01870 -0.0026 0.0011 0.0056 0 0.1418 0.3322 0.47010 -0.0029 -0.0014 0.0005 0 0.0574 0.1723 0.23860 -0.0115 -0.0004 0.0103 0 -0.0120 0.0219 0.06370 -0.0182 -0.0062 0.0060 0 -0.4234 -0.3567 -0.30300 -0.0085 -0.0007 0.0072 vec(Π3) 0 -0.1889 -0.1567 -0.10480 -0.0021 0.0019 0.0054 0 -0.0003 0.0141 0.02290 -0.0127 0.0011 0.0144 0 -0.0102 -0.0065 -0.00270 -0.0109 0.0021 0.0172 0 -0.0061 0.0004 0.00430 -0.0202 -0.0054 0.0067 0 -0.3499 -0.2877 -0.21740 -0.0081 -0.0001 0.0064 0 0.0297 0.1074 0.14370 -0.0120 -0.0005 0.0097 0 -0.0171 -0.0132 -0.00630 -0.0109 0.0002 0.0121 0 0.0170 0.0217 0.02550 -0.0084 0.0005 0.0120 0 -0.0871 0.0283 0.19340 -0.0108 0.0000 0.0080 0 -0.1700 -0.0872 0.1153

vec(A3) 0 0.0388 0.0455 0.0536 0 0.1564 0.2198 0.26300 -0.0111 -0.0016 0.0083 0 0.0207 0.0286 0.04340 -0.0021 0.0017 0.0061 0 0.1454 0.2404 0.32740 -0.0034 -0.0014 0.0004 0 -0.1485 -0.0634 0.03840 -0.0119 -0.0009 0.0119 0 -0.0453 0.0107 0.04590 -0.0159 -0.0035 0.0081 0 0.0378 0.0787 0.15270 -0.0074 -0.0002 0.0074 Ψ 3 3.1668 3.2304 3.29540 -0.0020 0.0026 0.0071 0 -0.0393 0.0112 0.17000 -0.0098 0.0005 0.0122 2.5 1.8627 2.1368 2.47180 -0.0100 0.0001 0.0117 0 -0.0699 0.0350 0.23960 -0.0150 -0.0042 0.00820 -0.0070 0.0005 0.00880 -0.0097 -0.0004 0.01120 -0.0142 -0.0012 0.01060 -0.0100 0.0011 0.01210 -0.0098 0.0005 0.0130

ν 15 14.7539 15.7235 16.3972


Table 2.5.iii: Section 2.7.2: Distribution of the posterior mean of the parameters for IWSV model, Real data, across N = 100 samples,Rolling window

Distribution DistributionParameter Prior mean 2.5% C.I. mean 97.5% C.I. Parameter Prior mean 2.5% C.I. mean 97.5% C.I.vech(C) 0.3 0.3185 0.3531 0.3832 vec(Π1) 0.2 0.0400 0.1544 0.2780

0 -0.0371 -0.0066 0.0222 0 -0.0100 0.0065 0.02000 -0.0132 0.0068 0.0336 0 0.0165 0.0310 0.05840 -0.0169 -0.0063 0.0046 0 -0.0353 -0.0183 -0.0104

0.3 0.3064 0.3458 0.3836 0 -0.1644 0.1977 0.48250 -0.0209 -0.0008 0.0182 0.8 0.4511 0.5484 0.61190 -0.0208 -0.0110 0.0013 0 -0.0454 0.0289 0.0486

0.3 0.1913 0.2236 0.2495 0 -0.0092 0.0121 0.03320 -0.0312 -0.0093 0.0070 0 -0.2187 -0.0084 0.1595

0.3 0.0816 0.1003 0.1321 0 -0.0593 0.0253 0.1636vec(A1) 0.9 0.9910 0.9961 1.0023 0.8 1.0033 1.0719 1.1743

0 -0.0179 -0.0046 0.0081 0 -0.0058 0.0160 0.03170 -0.0027 0.0031 0.0101 0 -0.1244 0.0075 0.14310 -0.0110 -0.0077 -0.0054 0 -0.3676 -0.1666 0.06470 -0.0676 -0.0050 0.0482 0 -0.3851 -0.2110 -0.0573

0.9 0.8813 0.9029 0.9404 0.8 1.0663 1.1334 1.21860 -0.0137 0.0028 0.0188 vec(Π2) 0 0.0633 0.1755 0.28320 -0.0050 0.0027 0.0087 0 -0.0132 0.0032 0.01920 -0.0348 0.0104 0.0534 0 0.0151 0.0240 0.03210 -0.0157 0.0190 0.0502 0 -0.0240 -0.0170 -0.0113

0.9 0.8688 0.8838 0.8970 0 -0.5463 -0.2745 0.08460 -0.0285 -0.0216 -0.0121 0 0.0916 0.1512 0.25740 -0.0358 -0.0048 0.0249 0 -0.0128 0.0363 0.11000 -0.0418 -0.0103 0.0239 0 -0.0131 0.0066 0.02230 -0.0314 -0.0085 0.0217 0 -0.5758 -0.3562 -0.1639

0.9 0.7857 0.8007 0.8251 0 -0.1191 0.0599 0.1634vec(A2) 0 0.0331 0.0410 0.0511 0 -0.5216 -0.4447 -0.3696

0 -0.0321 -0.0015 0.0264 0 -0.0522 -0.0314 -0.01330 -0.0128 -0.0005 0.0094 0 0.0263 0.2298 0.49280 -0.0042 -0.0015 0.0012 0 0.0015 0.2249 0.34910 -0.0109 -0.0002 0.0082 0 -0.1277 -0.0247 0.08340 -0.0150 -0.0030 0.0086 0 -0.4164 -0.2914 -0.17170 -0.0107 -0.0012 0.0084 vec(Π3) 0 -0.2182 -0.0745 0.00150 -0.0036 0.0014 0.0050 0 0.0062 0.0231 0.04190 -0.0099 0.0005 0.0097 0 -0.0056 0.0027 0.01010 -0.0088 0.0004 0.0098 0 -0.0138 -0.0074 -0.00090 -0.0144 -0.0029 0.0110 0 -0.3420 -0.1868 -0.02810 -0.0086 -0.0011 0.0060 0 0.0505 0.1847 0.24790 -0.0095 0.0000 0.0097 0 -0.0340 -0.0042 0.02500 -0.0106 0.0004 0.0094 0 0.0027 0.0182 0.02980 -0.0093 0.0003 0.0088 0 -0.1851 0.0293 0.22040 -0.0079 0.0003 0.0094 0 -0.3163 -0.1617 0.0951

vec(A3) 0 0.0342 0.0414 0.0486 0 0.1535 0.2405 0.30260 -0.0263 0.0020 0.0289 0 0.0320 0.0602 0.09650 -0.0119 0.0006 0.0126 0 -0.0743 0.0807 0.29530 -0.0037 -0.0018 0.0003 0 -0.4598 -0.2316 0.14920 -0.0127 -0.0008 0.0116 0 -0.0681 0.0543 0.16600 -0.0135 -0.0021 0.0105 0 0.0391 0.0917 0.16400 -0.0102 -0.0010 0.0126 Ψ 3 2.9755 3.2153 3.43350 -0.0028 0.0012 0.0056 0 -0.0780 0.0484 0.25500 -0.0086 0.0000 0.0088 2.5 1.5758 1.8761 2.20940 -0.0084 0.0008 0.0093 0 -0.1144 -0.0045 0.20370 -0.0128 -0.0034 0.00410 -0.0060 0.0002 0.00560 -0.0117 -0.0014 0.00700 -0.0138 -0.0012 0.01030 -0.0110 0.0004 0.01040 -0.0080 0.0001 0.0088

ν 15 14.8556 17.0125 19.2372


Table 2.6.i: Section 2.7.2: Posterior distribution of the parameters for Clark model, Real data, 1st sample, Recursive windowPosterior Posterior

Parameter Prior mean 2.5% C.I. mean 97.5% C.I. Parameter Prior mean 2.5% C.I. mean 97.5% C.I.b21 0 -0.0035 0.0340 0.0706 vec(Π1) 0.2 0.0731 0.2563 0.4417b31 0 -0.0303 -0.0133 0.0013 0 -0.0431 0.0050 0.0535b32 0 0.0225 0.0537 0.0786 0 -0.0095 0.0040 0.0192b41 0 0.0298 0.0422 0.0543 0 -0.0445 -0.0275 -0.0113b42 0 0.0015 0.0285 0.0562 0 -0.1109 0.3320 0.7760b43 0 0.0221 0.0758 0.1242 0.8 0.3110 0.5186 0.7296ϕ1 doesn’t exist 0.2569 0.5142 0.9586 0 -0.0461 -0.0149 0.0154ϕ2 doesn’t exist 0.2716 0.5229 0.9416 0 -0.0440 -0.0043 0.0354ϕ3 doesn’t exist 0.3263 0.6611 1.2517 0 -0.8804 -0.3001 0.2954ϕ4 doesn’t exist 0.2419 0.4893 0.9293 0 -0.1741 0.1089 0.4020

0.8 1.1264 1.3160 1.50040 -0.0403 0.0288 0.08890 -0.7122 0.0270 0.76210 -0.8124 -0.2888 0.24110 -0.3009 -0.1286 0.0573

0.8 0.9702 1.1552 1.3412vec(Π2) 0 -0.1059 0.0762 0.2561

0 -0.0319 0.0255 0.08090 -0.0074 0.0065 0.02200 -0.0373 -0.0216 -0.00660 -0.7804 -0.2908 0.18380 -0.0849 0.1416 0.37130 -0.0565 -0.0275 0.00510 -0.0463 -0.0075 0.03050 -0.8763 -0.1833 0.51650 -0.4712 -0.0387 0.39790 -0.8819 -0.6029 -0.31660 -0.1286 -0.0536 0.02860 -0.4021 0.3384 1.07900 -0.4346 0.1610 0.75700 -0.0309 0.2028 0.43610 -0.7155 -0.4575 -0.1950

vec(Π3) 0 -0.3412 -0.1761 -0.00840 -0.0338 0.0140 0.06400 -0.0147 0.0006 0.01690 -0.0186 -0.0057 0.00690 -0.6458 -0.2553 0.15210 -0.0510 0.1404 0.31840 -0.0190 0.0039 0.03440 -0.0108 0.0225 0.05500 -0.4066 0.2528 0.90530 -0.3194 0.0160 0.36000 0.0828 0.2758 0.46620 -0.0340 0.0250 0.08530 -0.5465 0.1582 0.86850 -0.4387 0.0452 0.53770 -0.2472 -0.0772 0.08510 0.0155 0.1813 0.3413

Ψ 3 2.9053 3.2772 3.62300 -0.2167 0.1691 0.5383

2.5 1.0637 2.2502 3.52550 -0.0718 0.2747 0.6080


Table 2.6.ii: Section 2.7.2: Distribution of the posterior mean of the parameters for Clark model, Real data, across N = 100 samples,Recursive window

Distribution DistributionParameter Prior mean 2.5% C.I. mean 97.5% C.I. Parameter Prior mean 2.5% C.I. mean 97.5% C.I.b21 0 0.0271 0.0343 0.0413 vec(Π1) 0.2 0.2193 0.2416 0.2597b31 0 -0.0226 -0.0186 -0.0140 0 0.0055 0.0222 0.0398b32 0 0.0316 0.0400 0.0534 0 0.0032 0.0051 0.0073b41 0 0.0361 0.0403 0.0447 0 -0.0321 -0.0243 -0.0195b42 0 0.0185 0.0229 0.0317 0 0.0629 0.1623 0.3159b43 0 0.0807 0.0938 0.1072 0.8 0.5087 0.5298 0.5612ϕ1 doesn’t exist 0.3160 0.4001 0.5112 0 -0.0180 -0.0082 -0.0026ϕ2 doesn’t exist 0.3743 0.4450 0.5259 0 -0.0145 0.0080 0.0165ϕ3 doesn’t exist 0.5106 0.5640 0.6417 0 -0.2665 -0.1119 -0.0251ϕ4 doesn’t exist 0.3213 0.3985 0.5030 0 -0.0825 -0.0068 0.1260

0.8 1.2945 1.3557 1.40730 0.0015 0.0144 0.03710 -0.0477 -0.0011 0.04990 -0.3630 -0.3034 -0.23560 -0.1271 -0.1025 -0.0671

0.8 1.1025 1.1652 1.2153vec(Π2) 0 0.0829 0.1489 0.1932

0 0.0128 0.0194 0.02590 0.0060 0.0081 0.01020 -0.0233 -0.0178 -0.01380 -0.2946 -0.1851 -0.13300 0.1322 0.1765 0.21210 -0.0275 -0.0133 -0.00360 -0.0112 -0.0003 0.00700 -0.3450 -0.1816 -0.08260 -0.0759 0.0510 0.10850 -0.6551 -0.5973 -0.56030 -0.0594 -0.0419 -0.03440 0.2069 0.3600 0.47900 0.0497 0.2284 0.36070 0.1230 0.1734 0.20450 -0.4429 -0.3629 -0.2985

vec(Π3) 0 -0.1724 -0.1322 -0.09890 0.0030 0.0204 0.03410 -0.0018 0.0005 0.00290 -0.0083 -0.0024 0.00070 -0.2876 -0.2446 -0.19500 0.1273 0.1985 0.24130 0.0039 0.0079 0.01280 0.0165 0.0231 0.03160 0.0396 0.1384 0.27860 -0.1424 -0.0560 0.04110 0.1336 0.2178 0.27690 0.0236 0.0364 0.05590 0.0896 0.1705 0.26050 -0.1737 -0.0393 0.11130 -0.0852 -0.0694 -0.03890 0.0537 0.1045 0.1890

Ψ 3 3.1523 3.2340 3.34830 -0.1078 -0.0359 0.1655

2.5 1.9251 2.2429 2.62200 -0.0373 0.0878 0.3491


Table 2.6.iii: Section 2.7.2: Distribution of the posterior mean of the parameters for Clark model, Real data, across N = 100 samples,Rolling window

Distribution DistributionParameter Prior mean 2.5% C.I. mean 97.5% C.I. Parameter Prior mean 2.5% C.I. mean 97.5% C.I.b21 0 0.0116 0.0332 0.0508 vec(Π1) 0.2 0.1299 0.1831 0.2565b31 0 -0.0392 -0.0255 -0.0145 0 0.0010 0.0248 0.0539b32 0 -0.0440 -0.0130 0.0558 0 0.0062 0.0240 0.0516b41 0 0.0301 0.0360 0.0442 0 -0.0321 -0.0190 -0.0110b42 0 0.0053 0.0163 0.0315 0 -0.3065 0.0323 0.2289b43 0 0.0861 0.1064 0.1342 0.8 0.4393 0.5232 0.5880ϕ1 doesn’t exist 0.4483 0.4846 0.5617 0 -0.0245 0.0125 0.0449ϕ2 doesn’t exist 0.4900 0.5326 0.5793 0 -0.0093 0.0244 0.0372ϕ3 doesn’t exist 0.5823 0.6738 0.7938 0 -0.2616 -0.1561 -0.0367ϕ4 doesn’t exist 0.4382 0.4754 0.5246 0 -0.1113 -0.0291 0.1231

0.8 1.0559 1.1462 1.27490 -0.0004 0.0237 0.03500 -0.0360 0.0300 0.11740 -0.3795 -0.2548 -0.08740 -0.3983 -0.2249 -0.0638

0.8 1.0626 1.1299 1.2290vec(Π2) 0 0.0678 0.1641 0.2412

0 -0.0113 0.0101 0.03250 0.0004 0.0157 0.02580 -0.0215 -0.0162 -0.01290 -0.4334 -0.2454 -0.05040 0.1170 0.1688 0.26110 -0.0253 0.0368 0.07700 -0.0142 -0.0022 0.01720 -0.3906 -0.2491 -0.13710 -0.0668 0.0364 0.11100 -0.6243 -0.5301 -0.45090 -0.0531 -0.0336 -0.01880 0.0121 0.2567 0.45700 0.0297 0.2753 0.42410 -0.1188 0.0901 0.28900 -0.4588 -0.3234 -0.1831

vec(Π3) 0 -0.1717 -0.0732 -0.02880 0.0055 0.0223 0.04810 -0.0037 0.0067 0.01790 -0.0119 -0.0070 -0.00310 -0.2680 -0.1150 -0.00270 0.1423 0.2313 0.29250 -0.0007 0.0213 0.04960 0.0005 0.0146 0.02720 -0.0211 0.1319 0.26360 -0.1609 -0.0717 0.05000 0.1379 0.2394 0.32530 0.0311 0.0562 0.08330 0.0640 0.1231 0.19400 -0.4318 -0.1993 0.15930 -0.2045 -0.0389 0.16790 0.0603 0.1250 0.1950

Ψ 3 2.9878 3.2183 3.52110 -0.1435 0.0376 0.2415

2.5 1.4283 1.7693 2.09760 -0.1917 -0.0455 0.2607


Figure 2.23.i: IWSV, Posterior of parameters across N = 100 recursive sample windows

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0.55

10 20 30 40 50 60 70 80 90 100

c 11

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0.55

10 20 30 40 50 60 70 80 90 100

c 22

0.1

0.15

0.2

0.25

0.3

0.35

10 20 30 40 50 60 70 80 90 100

c 33

0

0.05

0.1

0.15

0.2

0.25

0.3

10 20 30 40 50 60 70 80 90 100

c 44

2.6

2.8

3

3.2

3.4

3.6

3.8

4

10 20 30 40 50 60 70 80 90 100

ψ1

0.5

1

1.5

2

2.5

3

3.5

4

10 20 30 40 50 60 70 80 90 100

ψ3

10

12

14

16

18

20

22

24

10 20 30 40 50 60 70 80 90 100

ν

Sample window 'n'

Prior meanPosterior mean

2.5% C.I.97.5% C.I.


Figure 2.23.ii: IWSV, Posterior of parameters across N = 100 rolling sample windows

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0.55

10 20 30 40 50 60 70 80 90 100

c 11

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0.55

10 20 30 40 50 60 70 80 90 100

c 22

0.1

0.15

0.2

0.25

0.3

0.35

10 20 30 40 50 60 70 80 90 100

c 33

0

0.05

0.1

0.15

0.2

0.25

0.3

10 20 30 40 50 60 70 80 90 100

c 44

2.6

2.8

3

3.2

3.4

3.6

3.8

4

10 20 30 40 50 60 70 80 90 100

ψ1

0.5

1

1.5

2

2.5

3

3.5

4

10 20 30 40 50 60 70 80 90 100

ψ3

10

12

14

16

18

20

22

24

10 20 30 40 50 60 70 80 90 100

ν

Sample window 'n'

Prior meanPosterior mean

2.5% C.I.97.5% C.I.


Figure 2.24.i: Clark, Posterior of parameters across N = 100 recursive sample windows

-0.02

0

0.02

0.04

0.06

0.08

0.1

10 20 30 40 50 60 70 80 90 100

b 21

-0.07-0.06-0.05-0.04-0.03-0.02-0.01

0 0.01 0.02

10 20 30 40 50 60 70 80 90 100

b 31

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

10 20 30 40 50 60 70 80 90 100

b 32

0

0.01

0.02

0.03

0.04

0.05

0.06

10 20 30 40 50 60 70 80 90 100

b 41

0

0.05

0.1

0.15

10 20 30 40 50 60 70 80 90 100

b 42

0 0.02 0.04 0.06 0.08 0.1

0.12 0.14 0.16 0.18

10 20 30 40 50 60 70 80 90 100

b 43

2.6

2.8

3

3.2

3.4

3.6

3.8

4

10 20 30 40 50 60 70 80 90 100

ψ1

0.5

1

1.5

2

2.5

3

3.5

4

10 20 30 40 50 60 70 80 90 100

ψ3


Figure 2.24.ii: Clark, Posterior of parameters across N = 100 rolling sample windows

-0.02

0

0.02

0.04

0.06

0.08

0.1

10 20 30 40 50 60 70 80 90 100

b 21

-0.07-0.06-0.05-0.04-0.03-0.02-0.01

0 0.01 0.02

10 20 30 40 50 60 70 80 90 100

b 31

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

10 20 30 40 50 60 70 80 90 100

b 32

0

0.01

0.02

0.03

0.04

0.05

0.06

10 20 30 40 50 60 70 80 90 100

b 41

0

0.05

0.1

0.15

10 20 30 40 50 60 70 80 90 100

b 42

0 0.02 0.04 0.06 0.08 0.1

0.12 0.14 0.16 0.18

10 20 30 40 50 60 70 80 90 100

b 43

2.6

2.8

3

3.2

3.4

3.6

3.8

4

10 20 30 40 50 60 70 80 90 100

ψ1

0.5

1

1.5

2

2.5

3

3.5

4

10 20 30 40 50 60 70 80 90 100

ψ3

by paul karapanagiotidis - university of toronto t-space · paul karapanagiotidis doctor of...

Documents