ESSAYS IN FINANCIAL AND MACRO ECONOMETRICS
by
Paul Karapanagiotidis
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Economics
University of Toronto
© Copyright 2014 by Paul Karapanagiotidis
Abstract
Essays in Financial and Macro Econometrics
Paul Karapanagiotidis
Doctor of Philosophy
Graduate Department of Economics
University of Toronto
2014
Theory suggests that physical commodity prices may exhibit nonlinear features such as bubbles and various
types of asymmetries. Chapter one investigates these claims empirically by introducing a new time series model
apt to capture such features. The data set is composed of 25 individual, continuous contract, commodity futures
price series, representative of a number of industry sectors including softs, precious metals, energy, and livestock.
It is shown that the linear causal ARMA model with Gaussian innovations is unable to adequately account for
the features of the data. In the purely descriptive time series literature, often a threshold autoregression (TAR) is
employed to model cycles or asymmetries. Rather than take this approach, we suggest a novel process which is
able to accommodate both bubbles and asymmetries in a flexible way. This process is composed of both causal
and noncausal components and is formalized as the mixed causal/noncausal autoregressive model of order (r, s).
Estimating the mixed causal/noncausal model with leptokurtic errors, by an approximated maximum likelihood
method, results in dramatically improved model fit according to the Akaike information criterion. Comparisons of
the estimated unconditional distributions of both the purely causal and mixed models also suggest that the mixed
causal/noncausal model is more representative of the data according to the Kullback-Leibler measure. Moreover,
these estimation results demonstrate that allowing for such leptokurtic errors permits identification of various types
of asymmetries. Finally, a strategy for computing the multiple steps ahead forecast of the conditional distribution
is discussed.
Chapter two considers a vector autoregressive (VAR) model with stochastic volatility which appeals to
the Inverse Wishart distribution. Dramatic changes in macroeconomic time series volatility pose a challenge to
contemporary VAR forecasting models. Traditionally, the conditional volatility of such models had been assumed
constant over time or allowed for breaks across long time periods. More recent work, however, has improved
forecasts by allowing the conditional volatility to be completely time variant by specifying the VAR innovation
variance as a distinct discrete time process. For example, Clark (2011) specifies the elements of the covariance
matrix process of the VAR innovations as linear functions of independent nonstationary processes. Unfortunately,
there is no empirical reason to believe that the VAR innovation volatility processes of macroeconomic growth se-
ries are nonstationary, nor that the volatility dynamics of each series are structured in this way. This suggests that
a more robust specification on the volatility process—one that both easily captures volatility spill-over across time
series and exhibits stationary behaviour—should improve density forecasts, especially over the long-run forecast-
ing horizon. In this respect, we employ a latent Inverse Wishart autoregressive stochastic volatility specification
on the conditional variance equation of a Bayesian VAR, with U.S. macroeconomic time series data, in evaluating
Bayesian forecast efficiency against a competing specification by Clark (2011).
Dedication
This thesis is dedicated to my wife Amanda, for all your love and support over the years.
Acknowledgements
This thesis owes a debt of gratitude to many people. Specifically, I would like to thank my supervisor, Christian Gourieroux, for all of his guidance throughout each step of the process. I was very lucky to have the opportunity to work with Christian and despite his busy schedule he always made time for me. Christian’s love for teaching always shone through each time he was given the opportunity, prompted often by some eager student seeking knowledge.
I am also greatly indebted to John M. Maheu, for not only his previous advising, but also for being supportive during critical periods in my academic life. I’d also like to thank both Martin Burda and Angelo Melino for their generous advice, support, and comments on my work.
Finally, I’d like to give a special thanks to Allan Hynes, who taught me that, without an appreciation of context, our understanding of economics can be no better than superficial.
Contents
Introduction 1
1 Dynamic Modeling of Commodity Futures Prices 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Description of the asset and data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.1 The forward contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.2 The futures contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.3 The futures contract without delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2.4 Organization of the markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2.5 Example of a futures contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2.6 Data on the commodity futures contracts . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.7 Features of the price level series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3 The linear causal ARMA model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3.1 Test specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.4 The linear mixed causal/noncausal process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.4.1 The asymmetries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.4.2 The purely causal representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.4.3 Other bubble like processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.5 Estimation of the mixed causal/noncausal process . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.5.1 The mixed causal/noncausal autoregressive model of order (r, s) . . . . . . . . . . . . . 34
1.5.2 ML estimation of the mixed causal/noncausal autoregressive model . . . . . . . . . . . 36
1.5.3 Estimation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.6 Comparison of the estimated unconditional distributions . . . . . . . . . . . . . . . . . . . . . 44
1.7 Forecasting the mixed causal/noncausal model . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.7.1 The predictive distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.7.2 Equivalence of information sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.7.3 Examples: the causal prediction problem of the noncausal process . . . . . . . . . . . . 49
1.7.4 A Look-Ahead estimator of the predictive distribution . . . . . . . . . . . . . . . . . . 51
1.7.5 Drawing from the predictive distribution by SIR method . . . . . . . . . . . . . . . . . 52
1.7.6 Application to commodity futures data . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
1.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
1.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Appendices 62
1.10 Appendix: Rolling over the futures contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
1.11 Appendix: Mixed causal/noncausal process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
1.11.1 Strong moving average . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
1.11.2 Identification of a strong moving average representation . . . . . . . . . . . . . . . . . 65
1.11.3 Probability distribution functions of the stationary strong form noncausal representation 65
1.11.4 The causal strong autoregressive representation . . . . . . . . . . . . . . . . . . . . . . 66
1.11.5 Distributions with fat tails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
1.12 Appendix: Approximation of the mixed causal/noncausal AR(r, s) likelihood . . . . . . . . . . 68
1.13 Appendix: Numerical algorithm for mixed causal/noncausal AR(r, s) forecasts . . . . . . . . . 70
1.14 Appendix: Tables and Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2 Improving VAR forecasts through AR Inverse Wishart SV 84
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.3 Model specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
2.3.1 Benchmark model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.3.2 Alternative volatility process specification . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.3.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
2.4 Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
2.4.1 VAR(J) priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
2.4.2 Volatility model priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.5 Model estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
2.6 Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.6.1 Point and interval forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.6.2 Forecast comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
2.7.1 Monte-Carlo Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
2.7.2 Real data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Appendices 141
2.10 Appendix: Real data, LDL′ factorization of Σt . . . . . . . . . . . . . . . . . . . . . . . . . . 141
2.11 Appendix: Derivation of the posterior distributions needed for Gibbs sampling . . . . . . . . . . 141
2.11.1 Definition of the parameters and priors . . . . . . . . . . . . . . . . . . . . . . . . . . 143
2.11.2 Computation of the posterior distribution for the Clark (2011) volatility model . . . . . 143
2.11.3 Computation of the posterior distribution for the IWSV model . . . . . . . . . . . . . . 149
2.12 Appendix: Tables and Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
List of Tables
1.1 Commodity sectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2 Commodity specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.3 Summary statistics - commodity futures price level series . . . . . . . . . . . . . . . . . . . 24
1.4 ARMA estimation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5.i Estimation results of mixed causal/noncausal AR(r, s) models . . . . . . . . . . . . . . . . 42
1.5.ii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.5.iii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.6 Kullback-Leibler divergence measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.7.i Lag polynomial roots of the mixed and benchmark models . . . . . . . . . . . . . . . . . . 73
1.7.ii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
1.7.iii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.1 Simulated data, Sample moments of the LPLh,n metrics across N = 100 sample windows . 117
2.2 Real data, Sample moments of the LPLh,n metrics across N = 100 sample windows . . . . 129
2.3.i Section 2.7.1: Posterior distribution of the parameters for IWSV model, Simulated data, 1st sample, Rolling window . . . . . . . . . . . . 151
2.3.ii Section 2.7.1: Distribution of the posterior mean of the parameters for IWSV model, Simulated data, across N = 100 samples, Rolling window . . . . . . . . . . . . 152
2.4.i Section 2.7.1: Posterior distribution of the parameters for Clark model, Simulated data, 1st sample, Rolling window . . . . . . . . . . . . 153
2.4.ii Section 2.7.1: Distribution of the posterior mean of the parameters for Clark model, Simulated data, across N = 100 samples, Rolling window . . . . . . . . . . . . 154
2.5.i Section 2.7.2: Posterior distribution of the parameters for IWSV model, Real data, 1st sample, Recursive window . . . . . . . . . . . . 155
2.5.ii Section 2.7.2: Distribution of the posterior mean of the parameters for IWSV model, Real data, across N = 100 samples, Recursive window . . . . . . . . . . . . 156
2.5.iii Section 2.7.2: Distribution of the posterior mean of the parameters for IWSV model, Real data, across N = 100 samples, Rolling window . . . . . . . . . . . . 157
2.6.i Section 2.7.2: Posterior distribution of the parameters for Clark model, Real data, 1st sample, Recursive window . . . . . . . . . . . . 158
2.6.ii Section 2.7.2: Distribution of the posterior mean of the parameters for Clark model, Real data, across N = 100 samples, Recursive window . . . . . . . . . . . . 159
2.6.iii Section 2.7.2: Distribution of the posterior mean of the parameters for Clark model, Real data, across N = 100 samples, Rolling window . . . . . . . . . . . . 160
List of Figures
1.1 Plots of daily continuous contract futures price level series, Coffee with zoom . . . . . . . . 8
1.2 Coffee futures contracts, ICE exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3 Coffee futures with delivery in December 2013, ICE exchange . . . . . . . . . . . . . . . . 16
1.4 Plots of daily continuous contract futures price level series, Sugar and Lean hogs . . . . . . . 22
1.5 Soybean meal residuals from ARMA model . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.6 The mixed causal/noncausal model with Cauchy shocks . . . . . . . . . . . . . . . . . . . . 30
1.7 Plots of simulated bubble processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.8 Estimated unconditional densities, Cocoa and Coffee . . . . . . . . . . . . . . . . . . . . . 47
1.9 Forecast predictive density for Coffee futures price series . . . . . . . . . . . . . . . . . . . 55
1.10.i Plots of daily continuous contract futures price level series . . . . . . . . . . . . . . . . . . 76
1.10.ii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
1.10.iii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
1.10.iv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
1.11.i Histograms of daily continuous contract futures price level series . . . . . . . . . . . . . . . 80
1.11.ii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
1.11.iii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
1.11.iv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.1 Subsample sequence by recursive window . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.2 Subsample sequence by rolling window . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
2.3 Simulated sample paths yt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.4 Simulated stochastic volatilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
2.5 Simulated stochastic correlations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
2.6 IWSV, Posterior of ν across N = 100 sample windows . . . . . . . . . . . . . . . . . . . . 114
2.7.i IWSV, filtered latent volatility for 1st series, 1st sample window . . . . . . . . . . . . . . . . 115
2.7.ii IWSV, filtered latent volatility for 4th series, 1st sample window . . . . . . . . . . . . . . . 115
2.8 Simulated data, Forecast horizon term structure according to MLPLh,N metric, N = 100
sample windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
2.9 Simulated data, Sample window term structure according to difference of LPLh,n’s metric, h = 10 . . . . . . . . . . . . 117
2.10 Histograms of the LPLh=10,n metrics across n = 1, . . . , N sample windows, 10th horizon . 117
2.11 Clark (2011) macroeconomic dataset, series and smoothed trends . . . . . . . . . . . . . . . 119
2.12 Clark (2011) macroeconomic dataset, detrended series . . . . . . . . . . . . . . . . . . . . . 120
2.13.i Largest eigenvalue of VAR(3) companion matrix, across N = 100 sample windows . . . . . 122
2.13.ii Largest eigenvalue of the Υ matrix, across N = 100 sample windows . . . . . . . . . . . . 122
2.14 IWSV and Clark, filtered latent stochastic volatilities, 100th sample window, Recursive window . . . . . . . . . . . . 123
2.15.i Real data, IWSV model, filtered latent stochastic correlations for the complete sample, n = 100 . . . . . . . . . . . . 124
2.15.ii Real data, Clark model, filtered latent stochastic correlations for the complete sample, n = 100 . . . . . . . . . . . . 125
2.16 Real data, MSE comparison of VAR forecasts, % difference, both window types (below 0, IWSV better) . . . . . . . . . . . . 126
2.17.i Real data, Forecast horizon term structure according to MLPLh,N metric, N = 100 sample windows, both Recursive and Rolling sample windows . . . . . . . . . . . . 127
2.17.ii Real data, Forecast horizon term structure according to MLPLh,N metric, N = 100 sample windows, includes homoskedastic vt . . . . . . . . . . . . 127
2.18 Real data, Sample window structure of the difference of LPLh,n’s metric, h = 20 . . . . . . . . 128
2.19 Histograms of the LPLh=20,n metrics across n = 1, . . . , N sample windows, 20th horizon . . . 130
2.20 Real data, Sample window structure of the LPLh,n’s metric, h = 20 . . . . . . . . . . . . 131
2.21.i GDP growth series y1,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window . . . . . . . . . . . . 132
2.21.ii Inflation growth series y2,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window . . . . . . . . . . . . 133
2.21.iii Interest rate series y3,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window . . . . . . . . . . . . 134
2.21.iv Unemployment rate series y4,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window . . . . . . . . . . . . 135
2.22.i L21,t, time varying density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.22.ii L31,t, time varying density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.23.i IWSV, Posterior of parameters across N = 100 recursive sample windows . . . . . . . . . . . . 161
2.23.ii IWSV, Posterior of parameters across N = 100 rolling sample windows . . . . . . . . . . . . 162
2.24.i Clark, Posterior of parameters across N = 100 recursive sample windows . . . . . . . . . . . . 163
2.24.ii Clark, Posterior of parameters across N = 100 rolling sample windows . . . . . . . . . . . . 164
Introduction
This thesis applies modern time series econometric techniques of nonlinear dynamic analysis to the fields of
both financial and macroeconomic data. In particular, we focus on some specific features of the data, including
bubbles and cyclical features in the time evolution of financial data, as well as the mean reverting and co-dependent
nature of both macroeconomic indicator variables and their time varying volatility. This volatility can be described
as the so-called “Stochastic Volatility.”
We will examine some specific time series for which these features are important. For example, bubbles and
cyclical features in the time evolution of commodity prices are commonly modeled as either unit root or causal
nonlinear regime switching processes. Moreover, the time varying volatility of macroeconomic data, such as GDP
growth or inflation rates, is often considered as nonstationary.
Instead, we will consider some new types of dynamic models which are apt to capture these features, but
which do not rely on the assumption of nonstationarity or direct time causality. For commodities data, these are
the class of “causal/noncausal” autoregressive models, which allow us to flexibly account for both longitudinal
and transversal asymmetry in the time evolution of both bubbles and cyclical features. This class of model
goes beyond the standard ARMA or ARCH models in a number of ways. In fact, the possibility of longitudinal
asymmetry is unidentified in the standard linear causal ARMA model with Gaussian innovations. Moreover, the
causal/noncausal autoregressive model can exhibit a form of autoregressive conditional heteroskedasticity, for
example, the special case of the noncausal Cauchy autoregressive model exhibits these effects. Interestingly, the
mixed causal/noncausal model is linear in its mixed causality representation, but it is nonlinear in its equivalent
strictly causal form.
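The bubble-generating mechanism of the noncausal Cauchy autoregression mentioned above can be sketched in a few lines; this is only an illustration, not the specification or estimation procedure developed in Chapter 1. A purely noncausal AR(1), y_t = φ y_{t+1} + ε_t with i.i.d. Cauchy shocks, can be simulated by iterating the recursion backwards in time, where the coefficient, sample size, and seed below are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_noncausal_ar1(phi, T, burn=500):
    """Simulate the purely noncausal AR(1), y_t = phi * y_{t+1} + eps_t,
    with i.i.d. standard Cauchy shocks, by iterating backwards in time
    from an arbitrary terminal value of zero."""
    eps = rng.standard_cauchy(T + burn)
    y = np.zeros(T + burn)
    for t in range(T + burn - 2, -1, -1):
        y[t] = phi * y[t + 1] + eps[t]
    # Keep the early segment, which is far (in backward time) from the
    # arbitrary terminal condition, so the truncation error is negligible.
    return y[:T]

path = simulate_noncausal_ar1(phi=0.8, T=1000)
# Typical paths build up gradually and then crash: the mirror image,
# in time, of the jump-then-decay behaviour of a causal AR(1).
print(path[:3])
```

Plotting such a path against a causal AR(1) driven by the same shocks makes the longitudinal asymmetry discussed above visually apparent.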
Moreover, we also propose a novel multivariate volatility specification for the time varying volatility of the
standard Vector Autoregressive (VAR) process common to macroeconomic time series analysis. This process
appeals to the properties of the Inverse Wishart distribution and allows us to capture co-dependence in the volatility
between series in a flexible way.
All these types of models involve nonlinear dynamics and latent factors. For the mixed causal/noncausal
model, the summaries of past and future paths form a nonlinear dynamic. Within the context of our macroeconomic VAR model, the latent factor is the volatility itself, which is unobserved given a stochastic volatility
specification driven by exogenous noise.
In general, we also face complicated causal forecast functions without closed form. There typically do not
exist closed forms for the likelihood function as well. Therefore, we apply appropriate numerical techniques
which are simulation based. For example, we generate forecasts from the causal predictive density of commodity
price data within a mixed causal/noncausal framework by appealing to the Look-Ahead estimator of the Markov
transition density and the Sampling Importance Resampling (SIR) algorithm. Alternatively, the macroeconomic
VAR forecasts under Inverse Wishart stochastic volatility are generated by employing a Bayesian approach which
relies on the Gibbs sampling method.
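The Sampling Importance Resampling step mentioned above can be sketched generically; the target and proposal below are toy Gaussian densities chosen for illustration, not the causal predictive density of the commodity price model:

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(x, mu, sigma):
    """Gaussian density, used here only for the toy target and proposal."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def sir_sample(target_pdf, proposal_draw, proposal_pdf, n_draws=10_000, n_keep=1_000):
    """Sampling Importance Resampling: draw from a proposal, weight each
    draw by target/proposal, then resample in proportion to the weights."""
    draws = proposal_draw(n_draws)
    w = target_pdf(draws) / proposal_pdf(draws)
    w = w / w.sum()
    idx = rng.choice(n_draws, size=n_keep, replace=True, p=w)
    return draws[idx]

# Toy check: recover a N(2, 1) target from an over-dispersed N(0, 3) proposal.
sample = sir_sample(
    target_pdf=lambda x: norm_pdf(x, 2.0, 1.0),
    proposal_draw=lambda n: rng.normal(0.0, 3.0, size=n),
    proposal_pdf=lambda x: norm_pdf(x, 0.0, 3.0),
)
print(sample.mean())  # close to 2
```

In the forecasting application the target density is itself approximated by the Look-Ahead estimator of the Markov transition density, so the weights are computed from that approximation rather than a closed-form pdf.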
Chapter 1
Dynamic Modeling of Commodity Futures
Prices
1.1 Introduction
Financial theory has proposed general approaches for pricing financial assets and their derivatives, based on
arbitrage pricing theory [Ross (1976)], or equilibrium models: for example the Capital Asset Pricing Model
[Sharpe (1964)] or Consumption-Based Capital Asset Pricing Model [Breeden (1979)]. Traders have also relied
on technical analysis for insight into price movements [see e.g. Frost (1986)].
These approaches are generally applied separately on the different segments of the market, each segment
including a set of basic assets plus the derivatives written on these basic assets. These segments are used for
different purposes and can have very different characteristics. A standard example is the stock market, where the
basic assets are the stocks and the derivatives are both options written on the market index and futures written on
the index of implied volatility, called the VIX. These derivatives have been introduced to hedge and trade against
volatility risk. A large part of the theoretical and applied literature analyzes this stochastic volatility feature.
Another segment also widely studied is the bond market, including the sovereign bonds, but also the bonds
issued by corporations and the mortgage backed securities; the associated derivatives in this case are insurance
contracts on the default of the borrowers, such as Credit Default Swaps (CDS) or Collateralized Debt Obligations
(CDO). These derivatives have been introduced to manage the counterparty risks existing in the bond market.
This paper will focus on another segment; that is, the segment of commodities. This segment includes the spot
markets, derivatives such as the commodity futures with and without delivery, and derivatives such as options, puts
and calls, written on these futures.
This segment has special features compared to other segments, such as the stock market for instance. At least
three features make the commodity markets rather unique:
i) The basic assets are physical assets. There is a physical demand and a physical supply for these commodities
and by matching their demand and supply, we may define a “fundamental price” for each commodity. It
is known that the analysis of these fundamental prices can be rather complex even if it concerns the real
economy only. This is mainly a consequence of both shifts in demand and supply and of various interventions
to control the fundamental price of commodities. What follows are examples of such effects which differ
according to the commodity.
Cycles are often observed in commodity prices. They can be a consequence of costly, irreversible investment,
made to profit from high prices. For instance, farmers producing corn can substitute into producing cattle,
when grain prices are low. The production of milk (or meat) will increase and jointly the production of grain
will diminish. As a consequence the prices of milk (or meat) will decline, whereas the price of grain will
increase. This creates an incentive to substitute grain for cattle in the future, and so forth, which introduces
cycles in the price evolution of both corn and cattle. Other substitutions between commodities can also
create a change of trend in prices. For example, the development of alternative fuel derived from soy created
a significant movement in soy prices.
These complicated movements can also be affected by different interventions to sustain and/or stabilize the
prices. The interventions can be done by governments (e.g. U.S., or European nations) for agricultural
commodities, as well as by (monopolistic or oligopolistic) producers such as the Organization of Petroleum
Exporting Countries (OPEC) for petroleum production or the De Beers company for diamonds. The real
demand and supply will affect the spot prices and futures contracts with delivery.
ii) Recently the commodity markets have also experienced additional demand and supply pressures by finan-
cial intermediaries. These intermediaries are not interested in taking delivery of the underlying products
upon maturity and are only interested in cashing in on favourable price changes in the futures contracts.
This behaviour betrays the original purpose of the futures markets which was to enable both producers and
consumers to hedge against the risk of future price fluctuations of the underlying commodity.
To try to separate the market for the physical commodity from simply gambling on their prices, purely
intangible assets have been introduced that are the commodity futures without delivery. Thus the market for
commodity derivatives has been enlarged. As usual, the speculative effect is proportional to the magnitude
and importance of the derivative market. This speculative effect is rather similar to what might be seen in the
markets for CDS or on the implied volatility index (VIX).
iii) The different spot and futures markets for commodities are not very organized and can involve a small number
of players and very often feature a lack of liquidity.
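The cycle mechanism described in point (i) is essentially the textbook cobweb logic: supply responds to last period's price while demand clears at the current price. A minimal linear sketch, with purely illustrative parameter values not estimated from any data:

```python
import numpy as np

# Linear cobweb model: demand D_t = a - b*p_t, supply S_t = c + d*p_{t-1}.
# Market clearing D_t = S_t gives p_t = (a - c)/b - (d/b) * p_{t-1}, so
# when d/b < 1 the price oscillates around equilibrium in damped cycles.
a, b, c, d = 10.0, 1.0, 1.0, 0.8     # hypothetical parameters
p_star = (a - c) / (b + d)           # steady-state price: 5.0 here

p = [p_star + 2.0]                   # start away from equilibrium
for _ in range(20):
    p.append((a - c) / b - (d / b) * p[-1])

prices = np.array(p)
# Successive deviations from p_star alternate in sign and shrink by d/b,
# mimicking the grain/cattle substitution cycle described in the text.
print(prices[:4])
```

With d/b > 1 the same recursion produces explosive oscillations, which is one stylized reason observed commodity cycles need richer dynamics than a fixed linear rule.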
The economic literature mainly focuses on two features of commodity prices, namely their cross-sectional
and serial heterogeneity, respectively. Below, we will discuss the literature specific to each. The cross-sectional
analysis tries to understand how the prices of futures contracts with delivery are related with the spot prices, or to
explain the difference between the prices of futures with and without delivery. The analysis of the serial hetero-
geneity of prices focuses on the nonlinear dynamic features due to either the cycles and rationing effects coming
from the real part of the market, or the speculative bubbles created by the behaviour of financial arbitrageurs.
The questions above can be considered from either a structural, or a descriptive point of view. A “structural”
approach attempts to construct a theoretical model involving the relevant economic variables of interest which may
be important in explaining relationships which drive commodity spot and futures prices. The descriptive approach
does not explain “why” these series exhibit particular features, but rather provides a framework to estimate the
relationships between the prices, make forecasts, and price the derivatives.
What follows is a discussion on how these two approaches above have been addressed in the literature.
i) Cross-sectional heterogeneity
The study of cross-sectional heterogeneity of commodity futures prices has its roots in both the theory of
normal backwardation and the theory of storage. The Keynesian theory of normal backwardation implies a greater
expected future spot price than the current futures contract price, assuming that producers are on net hedgers and
that speculators, in order to take on the risk offered by producers, must be offered a positive risk premium.
Of the two theories, the theory of storage has probably had the greater influence. Instead of focusing on the net
balance of traders’ positions as in the theory of normal backwardation, the theory of storage focuses on how the
levels of inventory, that is the “stocks,” of the underlying commodities affect the decisions of market participants.
Inventories play an important role since it is known that both the consumption and supply of many commodities
are inelastic to price changes. For example, it is known that gasoline and petroleum products are everyday neces-
sities and both their consumption and production adjust slowly to price changes. Moreover, given real supply and
demand shocks the inelastic nature of these markets can lead to wild price fluctuations. Therefore, the role of in-
ventories is important in buffering market participants from price fluctuations, by avoiding disruptions in the flow
of the underlying commodities, and by allowing them to shift their consumption or production intertemporally.
The cost of storage is essentially a “no arbitrage” result. Let the difference of the current futures price and the
spot price be known as the basis. If the basis is positive, it must necessarily equal the cost of holding an inventory
into the future, known as the cost of carry, since otherwise a trader could purchase the good on the spot market,
enter into a futures contract for later delivery, and make a sure profit. From the reverse point of view, the
CHAPTER 1. DYNAMIC MODELING OF COMMODITY FUTURES PRICES 6
basis could never be negative since holders of inventories could always sell the good at the spot price, and enter a
futures contract to buy at the lower price, with no cost of carry.
However, empirical examination of the basis reveals that it is often negative. Kaldor (1939) was the first to
suggest a solution to this problem known as the convenience yield. The convenience yield measures the benefit of
owning physical inventories, rather than owning a futures contract written on them. When a good is in abundance,
an investor gains little by owning physical inventories. However, when the good is scarce, it is preferable to hold
inventories. Therefore, in equilibrium the basis should be equal to the difference between the cost of carry and
the convenience yield, permitting the basis to be negative when inventories are scarce.
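In symbols, the equilibrium just described can be written as follows (the notation for the cost of carry and the convenience yield is ours; the futures and spot price notation anticipates Section 1.2):

```latex
% Storage-market equilibrium for the basis.
% F_{t,t+h}: futures price at t for maturity t+h;  p_t: spot price;
% c_{t,t+h}: cost of carry;  y_{t,t+h}: convenience yield.
\underbrace{F_{t,t+h} - p_t}_{\text{basis}}
  \;=\; \underbrace{c_{t,t+h}}_{\text{cost of carry}}
  \;-\; \underbrace{y_{t,t+h}}_{\text{convenience yield}} ,
```

so the basis is negative, i.e. the market is in backwardation, exactly when the convenience yield exceeds the cost of carry, which is the case when inventories are scarce.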
Working (1933, 1948, 1949) used the theory of storage to describe the relationship between the price of storage
and inventories in the wheat market, known as the “Working curve” or the storage function. The Working curve is
positively sloped: above some positive threshold level of inventories, it relates inventories to the costs of storing them;
however, below this positive threshold of inventories, the function takes on negative values, illustrating that posi-
tive inventories can be held even when the returns from storage are negative, thereby incorporating the notion of
Kaldor’s convenience yield into the storage function.
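As a stylized illustration of this shape, one can write down a storage function that is increasing in inventories but negative below a threshold. The functional form and parameter values below are hypothetical, chosen only to reproduce the qualitative features of the Working curve, not estimated from any data:

```python
import math

def working_curve(inventory, slope=0.5, convenience=2.0, scale=1.0):
    """Stylized price of storage as a function of the inventory level.

    Storage costs (slope * inventory) rise with stocks, while the
    convenience-yield term (convenience * exp(-inventory / scale)) is
    large when stocks are scarce, pulling the returns to storage below
    zero at low inventory levels -- Kaldor's point.
    """
    return slope * inventory - convenience * math.exp(-inventory / scale)
```

For low inventories the returns to storage are negative yet positive stocks are still held; past the threshold the curve turns positive and keeps rising with inventories.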
Later work generalized these results in considering motivations for both storage behaviour and the convenience
yield. For example, Brennan (1958) considered storage from the speculative point of view, suggesting that, on the
supply side, the notion of the convenience yield be expanded beyond the pure cost of storage to include a risk premium
for holders of inventories, who may speculate upon, and benefit from, a possible rise in demand at short notice.
Modern structural models distinguish the fundamental price, connected with the underlying physical supply and
demand, from the cost of storage and any speculation. For example, in looking at oil price
speculation, Knittel and Pindyck (2013) address what is meant by the notion of “oil price speculation” and how
it relates to investment in oil reserves, inventories, or derivatives such as futures contracts. Although the price of
storage is not directly observed, it can be determined from the spread between futures and spot prices. In their
model there are two interrelated markets for a commodity: the cash market for immediate or “spot” purchase/sale,
and the “storage market” for inventories. The model attempts to distinguish between the physical supply and
demand market and the effect of speculators on both the futures and spot prices.
Other structural work on the basis has employed the CAPM model. For example Black (1976) studied the
nature of futures contracts on commodities, suggesting that the capital asset model of Sharpe (1964) could be
employed to study the expected price change of the futures contract. Dusak (1973) also studied the behaviour of
futures prices within a model of capital market equilibrium and found no risk premium for U.S. corn, soybeans,
and wheat futures between 1952 and 1967. Breeden (1979) developed the consumption CAPM model which
allowed us to consider the futures price as composed of both an expected risk premium and a forecast of the
future spot price.
Econometrically, Fama and French (1987) found evidence that the response of futures prices to storage-cost
variables was easier to detect than evidence that futures prices contain premiums or power to forecast spot prices.
Other econometric work has been purely descriptive in attempting to model the basis process itself. For
example, Gibson and Schwartz (1990) model the convenience yield as a mean reverting continuous time stochastic
process, where the unconditional mean represents the state of inventories which satisfy industry under normal
conditions.
The cost of storage also imposes a natural constraint on inventories in that they cannot be negative; this
has effects which show up empirically. For example, inventory levels and the basis tend to share a positive
relationship as the theory of storage and convenience yield would suggest. Brooks et al. (2011) employ actual
physical inventory data on 20 different commodities between 1993 and 2009 and show that inventory levels are
informative about the basis, so that when inventories are low the basis is possibly negative (and vice versa). They
also find that futures price level volatility is a decreasing linear function of inventories so that when the basis is
negative, price volatility is higher. Empirical evidence also suggests that the basis behaves differently when it is
positive versus when it is negative. For example, Brennan (1991) expanded the work of Gibson and Schwartz
(1990) by incorporating the non-negativity constraint on inventories, so that the convenience yield is bounded below.
Finally, there is econometric evidence that corroborates Brennan (1958) above. Sigl-Grub and Schiereck
(2010) employ Commitments of Traders data on 19 commodity futures contracts between 1986 and 2007
(using this information as a proxy for speculation) and find that the autoregressive persistence of futures
returns tends to increase with speculation.
ii) Price dynamics
Another part of the literature tries to understand the nonlinear dynamic patterns observed in futures prices
that can manifest as either cycles or speculative bubbles. Generally, we observe more or less frequent successive
peaks and troughs in the evolution of prices. These peaks and troughs have nonstandard patterns which can
be classified according to the terminology in Ramsey and Rothman (1996) where they distinguish the concepts
of “longitudinal” and “transversal” asymmetry. The notion of longitudinal asymmetry employed in Ramsey and
Rothman (1996) builds upon other previous work, for example the study of business cycle asymmetry from Neftci
(1984).
Longitudinal asymmetry refers to asymmetry where the process behaves differently when traveling in direct
time versus in reverse time. For example, longitudinal asymmetry may manifest as a process whose peaks
rise faster than they decline (and which behaves in the opposite way in reverse). Figure 1.1 provides a plot which
illustrates these features for the coffee price level, continuous futures contract without delivery. In the right panel
(which provides a zoom) we can see how the peaks tend to rise quickly, but take a long time to decline into the
trough.
Transversal asymmetry is characterized by different process dynamics above and below some horizontal plane
in the time direction; that is, in the vertical displacement of the series from its mean value. For example, the
coffee process also exhibits transversal asymmetry in that the peaks in the positive direction are very sharp and
prominent, while the troughs are very drawn out and shallow (again see Figure 1.1 right panel). So, a series can
be both longitudinally and transversely asymmetric.
Figure 1.1: Plots of daily continuous contract futures price level series, Coffee with zoom
[Left panel: Coffee from 07/18/1977 to 02/08/2013; right panel: Coffee (zoom), 05/01/1996 to 11/01/2004; vertical axis: futures price level, 0 to 350.]
The theoretical literature has been able to derive price evolutions with such patterns as a consequence of self-
fulfilling prophecies. The initial rational expectation (RE) models were linear: the demand is a linear function
of the current expected future prices and exogenous shocks on demand, and the supply is a linear function of the
current price and of supply shocks. In this way we can consider the path of equilibrium prices. Muth (1961) was
the first to employ such a framework which incorporated expectations formation directly into the model.
Since the equilibrium in RE models is both with respect to prices and information, these models have an infi-
nite number of solutions, even if the exogenous shocks have only linear dynamic features. Some of these solutions
have nonlinear dynamic features which are similar to the asymmetric bubble patterns described above. Among
these solutions featuring bubbles, some can exhibit isolated bubbles and others can demonstrate a sequence of
repeating bubbles. For example, Blanchard (1979) and Blanchard and Watson (1982) derived RE bubble models
for the stock market which presumed the price process is composed of the fundamental competitive market
solution for price1 plus a nonstationary martingale component that admits a rational expectations representation
[Gourieroux, Laffont, and Monfort (1982)], but exhibits bubble like increases or decreases in price. Blanchard and
1That is, where price is the linear present value of future dividends.
Watson (1982) described a possible piecewise linear model for the martingale bubble component which spurred
later authors to test statistically for the presence of this component. Later, Evans (1991) suggested that such
econometric tests may be limited in their ability to detect a certain important class of rational bubbles which
exhibit repeating explosive periods.
Generally these basic modeling attempts focused on the stock market, and it is not clear what analog there
is (if any) of the “fundamental” price of the futures contract without delivery. Moreover, they take into account
only expected prices, not the level of volatility, and since they incorporate linear functions for the price, the
solution may not be unique.
More recent RE models have exhibited features consistent with the asymmetries discussed above in Ramsey and
Rothman (1996), as well as with the cost-of-storage models and the natural asymmetry which arises because
inventories cannot be negative. For example, Deaton and Laroque (1996) construct a RE model of commodity
spot prices, in which they generate a “harvest” process2 which drives a competitive price in agricultural markets
composed of both final consumers and risk-neutral speculators. From an intertemporal equilibrium perspective,
when the price today is high (relative to tomorrow) nothing will be stored so there will be little speculation;
however, when the price tomorrow is high (relative to today), speculation will take place and storage will be
positive. Because inventories cannot be negative, the market price process under storage will follow a piecewise
linear dynamic stochastic process.
Moreover, both theory and evidence suggests that RE models might take the form of a noncausal process. For
example, Hansen and Sargent (1991) showed that if agents in the commodity futures market can be described by
a linear RE model, and have access to an information set strictly larger than that available to the econometrician
modeling them, then the true shocks of the moving average representation that describe the RE equilibrium process
will not represent the shocks the econometrician estimates given a purely causal linear model. In fact, the shocks
of the model will have a non-fundamental representation and we say that the model is at least partly “noncausal.”
Of course, modeling a process as partly noncausal does not imply that agents somehow “know the future.” Rather,
it simply represents another equivalent linear representation.
Through simulation studies, Lof (2011) also showed the following. Simulate the market asset price both from an
RE model with homogeneous agents and from a model of boundedly rational agents with heterogeneous beliefs
[based on the model by Brock and Hommes (1998)]; then estimate both a purely causal model and a model with
a noncausal component on these data (given that the econometrician has full information). On average, the
rational expectations model is better fit by the causal model, while the heterogeneous agents model is better fit
by a noncausal model.
2The process may possibly be serially correlated. The authors discuss at length the major differences that occur in the model dynamics when harvests are i.i.d. versus serially correlated.
Given these features, the time series literature quickly recognized that the standard linear dynamic models,
that is, the autoregressive moving average (ARMA) processes with Gaussian shocks, are not appropriate for
representing the evolution of either commodity spot or futures prices. Indeed, they are not able to capture the
nonlinear dynamic features due to asymmetric cycles and price bubbles described above. For describing the
cycles created through the dynamics of investment between two substitutable commodities among producers (see
the discussion of the example of cattle vs. grain above), it is rather natural to consider an autoregressive model
with a threshold, that is, the threshold autoregressive model (TAR) introduced by Tong and Lim (1980) in the
time series literature. Indeed, the cycles associated with substitutable products are in some ways analogous to
the predator-prey cycle for which the TAR model was initially introduced. The TAR model has been applied to
commodity prices to study the integration between corn and soybean markets in North Carolina by Goodwin and
Piggott (2001), and to U.S. soybeans and Brazilian coffee by Ramirez (2009) to compare the asymmetry of such
cycles.
Contribution of the paper
Our paper contributes to the empirical literature on commodity futures prices by implementing nonlinear dynamic
models apt to reproduce the patterns of speculative bubbles observed on the commodity price data. To focus on
speculative bubbles and not on the underlying cycles of the fundamental spot price, we consider the continuous
contract futures price series available from Bloomberg on which it is believed that the speculative effects will be
more pronounced. We propose to analyze such series by means of the mixed causal/noncausal models where the
underlying noise defining the process has fat tails. Indeed, it has been shown in Gourieroux and Zakoian (2012)
that such models can be used to mimic speculative bubbles, or more generally peaks and troughs with either
longitudinal or transversal asymmetry. The estimation of such mixed models will be performed on 25 different
physical commodities, across five different industrial sectors, to check for the robustness of this modeling.
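The bubble-mimicking mechanism can be previewed with a minimal simulation sketch. A noncausal AR(1), x_t = phi * x_{t+1} + eps_t with |phi| < 1, is generated by recursing backward from a terminal value; with Cauchy-distributed errors the paths display the bubble-like build-ups and crashes discussed above. This is an illustrative sketch only, not the model or estimation method of the paper (developed in Sections 1.4 and 1.5):

```python
import random

def simulate_noncausal_ar1(phi, n, seed=0):
    """Simulate a noncausal AR(1), x_t = phi * x_{t+1} + eps_t, |phi| < 1.

    The stationary solution is forward-looking, x_t = sum_j phi^j eps_{t+j};
    we approximate it by truncating at the sample end and recursing backward.
    A standard Cauchy error is drawn as the ratio of two independent normals,
    so a single large future shock builds up geometrically in calendar time
    and then collapses -- a bubble-like episode.
    """
    rng = random.Random(seed)
    eps = [rng.gauss(0, 1) / rng.gauss(0, 1) for _ in range(n)]
    x = [0.0] * n
    x[-1] = eps[-1]
    for t in range(n - 2, -1, -1):
        x[t] = phi * x[t + 1] + eps[t]
    return x
```

Plotting such a path for phi near one shows occasional explosive runs followed by sharp reversals, the pattern the mixed causal/noncausal model is designed to capture.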
The rest of the paper is organized as follows. Section 1.2 discusses the details of the futures contracts including the
underlying commodities, the markets they are traded in, and the features of the data series themselves includ-
ing summary statistics. Section 1.3 shows that the linear causal ARMA models with Gaussian innovations are
unable to adequately capture the structure of this commodity data. Section 1.4 introduces the theory of mixed
causal/noncausal processes, and discusses the special case of the noncausal Cauchy autoregressive process of
order 1. This section also demonstrates how the mixed causal/noncausal process can accommodate both asym-
metries and bubble-type features. Section 1.5 introduces the mixed causal/noncausal autoregressive model
of order (r, s) and discusses its estimation by approximated maximum likelihood. Section 1.5.2 details the
results of fitting this model to the commodity futures price level data. Section 1.6 compares the estimated
unconditional distributions of the purely causal and mixed models according to the Kullback-Leibler measure.
Section 1.7 considers the appropriate method for forecasting the
mixed causal/noncausal model given data on the past values of the process and applies this method to forecast
the futures data. Finally, the technical proofs and the other material related to the data series are gathered in the
appendices.
1.2 Description of the asset and data
1.2.1 The forward contract
A forward contract on a commodity is a contract to trade, at a future date, a given quantity of the underlying good
at a price fixed in advance. Such a forward contract will stipulate:
- The names of those entering into the contract, i.e. the buyers and sellers.
- The date t at which the contract is entered into.
- The date t + h at which the contract matures.
- The forward delivery price ft,t+h, negotiated and set in the contract at time t, to be paid at the future time t + h.
- The monetary denomination of the contract.
- The characteristics and quality of the underlying good, often categorized by pre-specified “grades.”
- The amount and units of the underlying good; typically commodity contracts will stipulate a number of predefined base units, e.g. 40,000 lbs of lean hogs.
- Whether the good is to be delivered to the buyers upon maturity at time t + h (otherwise the buyer will have to pick up the good themselves).
- The location of delivery, if applicable, and the condition in which the good should be received.
Historically, such forward contracts were introduced to serve an economic need for producers or consumers to
be able to hedge against the risk of price fluctuations in which they sell or purchase their products. For example,
a producer of wheat might be subject to unpredictable future supply and demand conditions. As such, a
risk-averse producer would enter into a forward contract which ensures a stable price for their product at a
certain date in the future. Therefore, regardless of whether the price of their product rises or falls, they can be certain
of receiving the forward price. As another example, consider the consumer’s side of the problem, where an airline
company wishes to guarantee a stable future price for inputs, e.g. jet fuel, in order to provide customers with
relatively unchanging prices of their outputs i.e. airline tickets.
Such traditional forward contracts still exist as bilateral agreements between two parties, sold on so called
“over the counter” (OTC) markets. These contracts still fulfill an important role for certain groups, for example
large organizations such as national governments, since the parties involved are unlikely to default on their end
of the contract. However, if the investor is not sure of the financial integrity of the opposite party, such a forward
contract is by construction subject to counterparty risk. Therefore, as opposed to nations, which have the power to
recover from counterparty losses and are self-insured, contracts catering to other types of investors must somehow
incorporate an insurance scheme into the contract itself to accommodate counterparty risk.
Counterparty risk presents itself as the forward contract approaches maturity: if the forward price is
below (resp. above) the spot price, that is ft,t+h < pt+h (resp. ft,t+h > pt+h), then the contract is profitable
only to the buyer (resp. seller), unless the seller (resp. buyer) defaults.
1.2.2 The futures contract
A futures contract on a commodity is a forward contract, but with an underlying insurance in place against possible
counterparty risk. The insurance is paid by means of insurance premia, called “margin” on the futures markets.
There is an initial premium or initial margin, and intermediary premia, or “margin calls.”
Therefore a futures contract with delivery contains the same information and contractual stipulations as the
forward contract. It still represents an agreement to either buy or sell some underlying good at a future date, given
a predetermined “futures price” Ft,t+h set at time t today. However, in addition it will also specify a margin
scheme which:
- Stipulates the initial margin; that is, the amount each trader must first put up as collateral to enter into futures contracts.
- Implements a mechanism whereby the margin account balance is maintained at a certain level sufficient to cover potential losses. If the margin account balance drops below a threshold amount, the trader is obliged to put up more collateral; this demand is known as the margin call.
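The daily settlement and margin-call mechanism can be sketched as follows. The margin levels, the single long position, and the rule of topping the account back up to the initial margin are hypothetical simplifications; actual rules are set by the broker or clearing house:

```python
def simulate_margin_account(prices, units, initial_margin, maintenance_margin):
    """Mark a long futures position of `units` contracts to market each day.

    Daily price changes are settled into the margin account (variation
    margin); whenever the balance falls below the maintenance level, a
    margin call tops the account back up to the initial margin.
    Returns the final balance and the list of margin-call amounts.
    """
    balance = initial_margin
    calls = []
    for prev, curr in zip(prices[:-1], prices[1:]):
        balance += units * (curr - prev)  # daily settlement
        if balance < maintenance_margin:
            top_up = initial_margin - balance
            calls.append(top_up)
            balance += top_up
    return balance, calls
```

For instance, with prices falling from 100 to 97 over two days, a trader posting an initial margin of 5 with a maintenance level of 3 receives one margin call restoring the account to its initial level.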
Generally, the price of a futures contract with delivery, Ft,t+h, differs from the price of a similar forward
contract ft,t+h, since it must account for the price of the underlying insurance against counterparty risks.
A futures contract requires the presence of an “insurance provider” usually either a broker, or a clearing house.
This provider will fix the margin rules for both the buyer and seller and manage a reserve account to be able to
hedge the counterparty risks in case of default of either party unable to fulfill margin calls.3
Of course, the clearing house plays a second very important role: namely that of “clearing the market” by
trying to match demand and supply between buyers and sellers of contracts. As a consequence, the clearing house
facilitates the formation of futures prices Ft,t+h as equilibrium prices. Therefore, we must distinguish between
brokers themselves who act as intermediaries, and the clearing house and brokering platforms which also serve a
more central purpose.
Finally, if the date and magnitude of the margin calls were known at the date of the futures contract’s issue,
the contract with delivery would simply reflect a portfolio (or sequence) of forward contracts which are renewed
each day [Black (1976)]. However, the margin calls are fixed by the brokers or the clearing house according to
the evolution of the risk, i.e. to the observed evolution of the spot prices, but also to the margin rules followed by
their competitors and so the interpretation as a portfolio of forwards is no longer valid.
1.2.3 The futures contract without delivery
In the market for futures with delivery, historically some intermediaries or investors have demonstrated that they
are not in the market simply to buy or sell physical goods for future delivery and that they do not actually take
delivery of the underlying physical good. Rather these investors are in the market simply to speculate on the future
price of the contract.
Given this trend, futures contracts without delivery have been introduced where, instead of taking delivery of
the commodity, the holder receives a cash settlement. Without delivery of a physical good, the derivative product becomes a purely
“financial” asset. Therefore there has been an attempt to separate these two types of instruments: a financial
market designed purely for speculative purposes and a “real” market that provides a mechanism for both producers
and consumers to hedge against the risk of price fluctuations.
This trend towards differentiation of futures with and without delivery was designed to suppress the effect that
speculation may have on the spot price of the underlying good. For example, traders who are in a loss position
may be unable to offset their positions rapidly enough as maturity of the futures contract with delivery approaches.
Given this situation they are forced to purchase or sell the underlying good in the spot markets in order to meet
their contractual obligation. If many traders are in this situation simultaneously and on the same side of the
market, the effect could have a dramatic impact on the spot price.
3There also exists a counterparty risk of the insurance provider itself. For instance, in 1987 the clearing house for commodity futures in Hong Kong defaulted. This “double default” counterparty risk is not considered in our analysis.
1.2.4 Organization of the markets
In recent years, the futures commodity markets have become more organized. There is standardization of the
financial products and the margin rules. For example, the Standard Portfolio Analysis of Risk (SPAN) system has
become commonplace as an instrument for determining margin levels (both the clearing houses associated with
the Chicago Mercantile Exchange (CME) and Intercontinental Exchange (ICE) have adopted its use). The system
represents a computational algorithm which determines each trading day the risk for each commodity future by
scanning over sixteen different possible price and volatility scenarios given the time to maturity of the contract.
The sixteen scenarios consider various possible gains or losses for each futures contract, with each gain or loss
classification representing a certain fraction of the margin ratio.4 The results of these tests are used to define the
appropriate margin call requirements for the different participants. Even if the SPAN methodology is a standard
one, the choice of the risk scenarios depends on the clearing house. Finally, the SPAN system is not perfect and
is likely to be modified in the near future. See for example, the “CoMargin” framework discussed in Cruz Lopez
et al. (2013).
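The scan-over-scenarios logic can be sketched as follows. The sixteen price moves, the 6% scanning range, and the restriction to price risk only (real SPAN risk arrays also value volatility shifts, which matter for options) are illustrative assumptions, not ICE's or CME's actual parameters:

```python
def scan_risk_margin(units, price, scenario_moves):
    """SPAN-style scan: the margin requirement covers the worst projected
    one-day loss on the position across the hypothetical scenarios."""
    pnl = [units * price * move for move in scenario_moves]
    worst_loss = -min(pnl) if pnl else 0.0
    return max(worst_loss, 0.0)

# Sixteen illustrative scenarios: the price moves up or down by
# 1/8, 2/8, ..., 8/8 of a hypothetical 6% scanning range.
scan_range = 0.06
moves = [sign * k / 8 * scan_range for k in range(1, 9) for sign in (1, -1)]
```

Under these assumptions a long position of 10 contracts priced at 100 would be margined at its worst-case scenario loss, a 6% down move.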
Interestingly, the OTC forward markets are slowly becoming more organized like the futures markets. For
example, the European Market Infrastructure Regulation (EMIR), which entered into force on August 16, 2012, was
designed to promote the trading of standardized forward contracts on exchanges or electronic trading platforms
cleared by central counterparties, while subjecting non-centrally cleared contracts to higher capital
requirements. Generally there is concern that the clearing houses need to play a larger role in their function of
mitigating counterparty risk, especially as it pertains to large-valued contracts which could affect the economic
base if they were left to default.5
1.2.5 Example of a futures contract
Figure 1.2 provides an example of a set of futures contracts with delivery written on coffee and traded on the
ICE exchange.6 There are different contracts available for different maturities, which are listed on the far left
column. Coffee production generally occurs in both the northern and southern hemispheres – there is a northern
harvest taking place between October and January and a southern harvest between May and September. Given
these differing harvests, coffee futures mature every two months from March to September and every three months
onward until the following March. Furthermore, there exist contracts currently available for purchase that mature
4See https://www.theice.com/publicdocs/clear_us/SPAN_Explanation.pdf, available on the ICE exchange website.
5However, having the clearing house play a more predominant role also raises concerns over systemic risk – that is, could clearing houses themselves become “too big to fail” institutions? See H. Plumridge (December 2, 2011), “What if a clearing house failed?,” Wall Street Journal, accessed Sept. 20, 2013 at http://online.wsj.com/article/SB10001424052970204397704577074023939710652.html.
6The chart is provided by TradingCharts.com at http://tfc-charts.w2d.com/marketquotes/KC.html.
quite far into the future. For example, the coffee future contract currently with the longest time to maturity is the
contract for March 2016 delivery.
The date this chart was accessed is also given as September 19th, 2013. Therefore, when we speak of the
futures price Ft,t+h, within the context of our model with daily data (see the data section below) the time t would
be the current date given above, and the period h would represent the number of trading days until the contract
matures. Such contracts with delivery stipulate a last trading day which is typically the last business day prior
to the 15th day of the given contract’s maturity month. For instance, given the December 2013 contract, the last
business day before December 15th will fall on Friday December 13th, 2013 (resp. Friday March 14th, 2014;
Thursday May 15th, 2014; etc; for the subsequent contracts).
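The stated rule, the last business day strictly before the 15th of the maturity month, can be sketched as below. Exchange holiday calendars are ignored, so this is only an approximation of the full contract specification:

```python
from datetime import date, timedelta

def last_trading_day(year, month):
    """Last weekday strictly before the 15th of the maturity month.

    Starts at the 14th and steps backward over any weekend days;
    exchange holidays are not handled in this sketch.
    """
    d = date(year, month, 14)
    while d.weekday() >= 5:  # 5, 6 = Saturday, Sunday
        d -= timedelta(days=1)
    return d
```

For the December 2013 contract this returns Friday, December 13th, 2013, matching the example above.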
The “open,” “high,” “low,” and “last,” describe the intraday trading activity of the current trading session; that
is, the opening price, the highest and lowest prices, and the last price paid, respectively. The table also displays the
last change in price, the current volume of trades, and the set price and open interest from the last trading session
of the prior day. “Open interest” (also known as open contracts or open commitments) refers to the total number
of contracts that have not yet been settled (or “liquidated”) in the immediately previous time period, either by an
offsetting contractual transaction or by delivery. Therefore, a larger open interest can complement the volume
measure in interpreting the level of liquidity in the market. As contracts approach maturity, both the volume and
open interest levels tend to rise; contracts with very distant times to maturity are not very liquid.
Figure 1.2: Coffee futures contracts, ICE exchange
Figure 1.3 provides a candlestick plot of the typical intraday trading activity between September 13th, 2013,
and September 19th, 2013, for the coffee future contract with delivery in December 2013. Note that trading
does not occur 24 hours a day (the trading day runs from 8:30 AM to 7:00 PM BST7) and so there
are discontinuities in the price series. The thin top and bottom sections of the candlestick, called the shadows,
represent the high and low prices, and the thick section called the real body, denotes the opening and closing
prices. Each candlestick describes trading activity over a 30 minute period.8
Figure 1.3: Coffee futures with delivery in December 2013, ICE exchange, intraday price $ US
1.2.6 Data on the commodity futures contracts
The continuous contract
The discussion above illustrates some of the difficulties in analyzing price data for derivative products. For
example, many of the products are very thinly traded with low liquidity. Moreover, some products may only be
available on one trading platform and not another. For example, many futures contracts with delivery are available
mutually exclusively either on the CME or the ICE, and their associated clearing houses do not necessarily follow
identical margin schemes. Also, OTC product data may only be available through certain brokers’ proprietary
trading platforms.
7British Summer Time, as the ICE exchange is located in London, England.
8There are 21 candlesticks each day, representing the 10.5 opening hours.
Perhaps the most consequential problem we face in attempting to analyze futures contracts data is that the
individual contracts of various maturities will eventually expire and so we need a method whereby we can “extend”
the futures price series indefinitely. However, even in accomplishing this task we must consider that the contracts of various maturities, while written on the same underlying good, are not quite the same “asset,” and so the asset
itself is changing over time. Therefore, we need some method not only to extend the series, but also to standardize the
price measurements across time and maturity, and ensure that when we construct the series we are taking prices
which are relevant, e.g. with sufficient liquidity to be appropriately representative, deriving in essence a new asset
that no longer matures. In doing so we would also like to be able to bring together information on prices available
from different trading platforms in one place.
The Bloomberg console offers a solution to this problem by amalgamating futures data for delivery from both
the ICE and CME exchanges into one system. Bloomberg also offers what is called a continuous contract
which mimics the behaviour of a typical trader who is said to “roll over” the futures contract as it approaches
maturity. “Rolling over” refers to the situation where a trader would close out, or “zero,” their account balance
upon the approach of a futures contract’s maturity, if they do not intend on taking delivery, by first purchasing an
offsetting futures contract and then simultaneously reinvesting in another future with a further expiration month.
In this way, an artificial asset is created which tracks this representative trader’s futures account holdings across
time indefinitely. Details on how this is accomplished, as well as other methods that can be employed, are outlined
in Appendix 1.10. Users of the Bloomberg console can customize criteria which define the rollover strategy, e.g.
volume of trades or open interest; in this paper we choose to employ the continuous contract that mimics the
rolling over of the futures contract with the shortest time to maturity known as the “front month” contract.
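The front-month rollover just described can be sketched as follows; this is a minimal illustration, not Bloomberg's actual methodology (several construction methods are outlined in Appendix 1.10): the contract expiry days, prices, and one-day roll rule below are hypothetical, and no adjustment is made for the price gap at the roll.

```python
def continuous_contract(contracts, roll_days=1):
    """Splice individual futures into one continuous front-month series.

    `contracts` is a list of (expiry_day, {day: price}) pairs sorted by
    expiry.  We always hold the contract with the shortest time to
    maturity and "roll" into the next one `roll_days` days before expiry.
    """
    series = {}
    for expiry, prices in contracts:
        roll_point = expiry - roll_days
        for day in sorted(prices):
            if day > roll_point:
                continue  # past the roll: the next contract takes over
            if day not in series:  # front month has priority
                series[day] = prices[day]
    return [series[d] for d in sorted(series)]

# Two hypothetical overlapping contracts: expiry on day 5 and day 10
contracts = [(5, {1: 10.0, 2: 11.0, 3: 12.0, 4: 13.0, 5: 14.0}),
             (10, {3: 20.0, 4: 21.0, 5: 22.0, 6: 23.0, 7: 24.0, 8: 25.0})]
print(continuous_contract(contracts))
```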
Industry sectors
We will consider a number of physical commodity futures contracts for a broad range of products. The commodi-
ties are divided into various industry sectors that are expected to behave similarly to each other. The industry
sectors are given in Table 1.1.
Each futures contract specifies a number of different product grades. At the exchange level it is determined that any products which match pre-specified grade criteria are considered part of the same futures contract. This promotes standardization of contracts and volume of trades. For example, the coffee
future discussed above is specified on the ICE exchange as the “Coffee C” future with exchange code KC. This
future allows a number of grades and a “Notice of Certification” is issued based on testing the grade of the
Table 1.1: Commodity sectors
Energy           Metals     Softs         Soy           Livestock
Brent crude oil  Copper     Corn          Soybeans      Lean hogs
Light crude oil  Gold       Rice          Soybean meal  Live cattle
Heating oil      Palladium  Wheat         Soybean oil
Natural gas      Platinum   Sugar
Gas oil          Silver     Orange juice
Gasoline RBOB               Cocoa
                            Coffee
                            Cotton
                            Lumber
beans and by cup testing for flavor. The Exchange uses certain coffees to establish the “basis.” Coffees judged
better are at a premium; those judged inferior are at a discount. Moreover, these grades are established within
a framework of deliverable products, for example from the ICE product guide for this KC commodity future
we have that “Mexico, Salvador, Guatemala, Costa Rica, Nicaragua, Kenya, New Guinea, Panama, Tanzania,
Uganda, Honduras, and Peru all at par, Colombia at 200 point premium, Burundi, Venezuela and India at 100 point
discount, Rwanda at 300 point discount, and Dominican Republic and Ecuador at 400 point discount. Effective
with the March 2013 delivery, the discount for Rwanda will become 100 points, and Brazil will be deliverable at
a discount of 900 points.”
Energy
Brent crude oil is a class of sweet light crude oil (a “sweet” crude is classified as containing less than 0.42% sulfur,
otherwise it is known as “sour”). The term “light” crude oil characterizes how light or heavy a petroleum liquid
is compared to water. The standard measure of “lightness” is the American Petroleum Institute’s API gravity
measure. The New York Mercantile Exchange (NYMEX) defines U.S. light crude oil as having an API measure
between 37 (840 kg/m3) and 42 (816 kg/m3) and foreign as having between 32 (865 kg/m3) and 42 API.
Therefore, various grades are defined in the standardized contract. Both foreign and domestic light crude oil
products are required to meet various specifications regarding sulfur levels, API gravity, viscosity, Reid vapor
pressure, pour point, and basic sediments or impurities. Exact grade specifications are available in the CME Group
handbook, Chapter 200, 200101.A and B.
The price of Brent crude is used as a benchmark for most Atlantic basin crude oils, although Brent itself derives
from North Sea offshore production. Other important benchmarks include North America’s West Texas Intermediate and the Middle East’s Dubai Crude (UAE), which together track the world’s internationally traded crude
oil supplies. The representative light crude oil future employed in this paper is written on West Texas Intermediate
and exchanged by the CME Group. The delivery point for (WTI) light crude oil is Cushing, Oklahoma, U.S.,
which is also accessible to the international spot markets via pipelines. Likewise, the Brent crude oil future is exchanged by ICE and admits delivery at Sullom Voe, an oil terminal in the Shetland Islands, north of Scotland.
Heating oil is a low viscosity, liquid petroleum product used as a fuel for furnaces or boilers in both residential
and commercial buildings. Heating oil contracts take delivery in New York Harbor. Just as in crude oil contracts,
very detailed stipulations exist regarding product quality grades; see the CME handbook, Chapter 150, 150101.
Natural gas is a hydrocarbon gas mixture consisting primarily of methane, used as an important energy source in
generating both heating and electricity. It is also used as a fuel for vehicles and is employed in both the production
of plastics and other organic chemicals. Natural gas admits delivery at the Henry Hub, a distribution hub on the
natural gas pipeline system in Erath, Louisiana, U.S. Contract details are available in the CME handbook, Chapter
220, 220101. Gas oil (as it is known in Northern Europe) is Diesel fuel. Diesel fuel is very similar in its physical
properties to heating oil, although it has commonly been associated with combustion in Diesel engines. Gas oil
admits delivery in the Amsterdam-Rotterdam-Antwerp (ARA) area of the Netherlands and Belgium. Contract
grade specifications are available from the exchange, ICE.
The Gasoline RBOB classification stands for Reformulated Blendstock for Oxygenate Blending. RBOB is the
base gasoline mixture produced by refiners or blenders that is shipped to terminals, where ethanol is then added to
create the finished ethanol-blended reformulated gasoline (RFG). Gasoline RBOB admits delivery in New York
Harbor and quality grade details are outlined in the CME handbook, Chapter 191, 191101.
Metals
Gold and silver have both traditionally been highly sought-after precious metals for use in coinage, jewelry,
and other applications since before the beginning of recorded history. Both also have important applications in
electronics engineering and medicine. The CME exchange licenses storage facilities located within a 150 mile
radius of New York city, in which gold or silver may be stored for delivery on exchange contracts. The quality
grades for gold and silver are defined in the CME handbook, Chapters 113 and 112, respectively.
Platinum, while also considered a precious metal, plays an important role, along with Palladium, in the construction of catalytic converters. Catalytic converters are used in the exhaust systems of combustion
engines to render output gases less harmful to the environment. Palladium also plays a key role in the construction
of hydrogen fuel cells. Finally, copper is a common element used extensively in electrical cabling given its good
conductivity properties. Platinum, Palladium, and Copper offer a number of delivery options, including delivery
to warehouses in Zurich, Switzerland. See the CME handbook Chapters 105, 106 and 111 respectively.
Softs and Livestock
“Soft goods” are typically considered those that are either perishable or grown in an organic manner as opposed
to “hard goods” like metals which are extracted from the earth through mining techniques.
In the grains category we have corn, rice, and wheat which are all considered “cereal grains”; that is, they
represent grasses from which the seeds can be harvested as food. Sugar is derived from sugarcane, which is also a grass, but the sugar comes not from the seeds but from inside the stalks. Corn, rice, and wheat all admit a number
of standardized delivery points within the U.S. See the CME handbook chapters 10, 14, and 17 for grade specifi-
cations and delivery options. Sugar delivery point options and grade details are available online from ICE, under
the Sugar No.11 contract specification.
Orange juice is derived from oranges, which grow as the fruit of the citrus tree, typically flourishing in tropical to subtropical climates. The juice is traded in frozen concentrated form. Orange juice is deliverable to a number
of points in the U.S., including California, Delaware, Florida, and New Jersey warehouses. See the ICE FCOJ
Rulebook available online for further information and quality grade details. Coffee is derived from the seeds of
the coffea plant, referred to commonly as coffee “beans.” Cocoa represents the dried and fully fermented fatty
seeds contained in the fruit of the cocoa tree. Finally, cotton is a fluffy fibre that grows around the seeds of the
cotton plant. Delivery point information and quality grade details for Coffee, Cocoa, and Cotton are also available
via the ICE Rulebook chapters available online.
In the soy category we have soybeans, a species of legume widely grown for its edible beans; soybean meal, which represents a fat-free, cheap source of protein for animal feed and many other pre-packaged meals; and finally, soybean oil, which is derived from the seeds of the soy plant and represents one of the most widely consumed cooking oils. All three soybean products admit a number of standardized delivery points within the U.S. See the
CME handbook chapters 11, 12, and 13 for grade specifications and delivery options.
Lean hogs refers to a common type of pork hog carcass typically used for consumption. A lean hog is
considered to be 51-52% lean, with 0.80-0.99 inches of back fat at the last rib, with a 170-191 lbs. dressed weight
(both “barrow” and “gilt” carcasses). Live cattle are considered 55% choice, 45% select, yield grade 3 live steers (castrated male cattle). Finally, lumber is traded as random length 2×4’s between 8-20 feet long. Lean hogs
futures are not delivered but are cash settled based on the CME Lean Hog Index price. Cattle is to be delivered
to the buyer’s holding pen. Lumber shall be delivered on rail track to the buyer’s producing mill. See CME
handbook Chapters 152, 101, and 201, respectively for details.
Data sources
The following Table 1.2 outlines the dates for which there exists data for each commodity futures price series,
the time to maturity, currency denomination, commodity exchange and code, and basic unit/characteristics of the
product traded.
Table 1.2: Commodity specifications
Commodity        Start date  CEM     Currency unit    Exchange  Code    Basic unit
Soybean meal     7/18/1977   FHKNZ   U.S.$/st         CME       ZM/SM   100 st’s
Soybean oil      7/18/1977   FHKNZ   U.S.$/100lbs     CME       ZL/BO   60,000 lbs
Soybeans         7/18/1977   FHKNX   U.S.$/100bushel  CME       ZS/S    5,000 bushels
Orange juice     7/18/1977   FHKNUX  U.S.$/100lbs     ICE       OJ      15,000 lbs
Sugar            7/18/1977   HKNV    U.S.$/100lbs     ICE       SB      112,000 lbs
Wheat            7/18/1977   HKNUZ   U.S.$/100bushel  CME       ZW/W    5,000 bushels
Cocoa            7/18/1977   HKNUZ   U.S.$/MT         ICE       CC      10 MT
Coffee           7/18/1977   HKNUZ   U.S.$/100lbs     ICE       KC      37,500 lbs
Corn             7/18/1977   HKNUZ   U.S.$/100bushel  CME       CZ/C    5,000 bushels
Cotton           7/18/1977   HKNZ    U.S.$/100lbs     ICE       CT      50,000 lbs
Rice             12/6/1988   FHKNUX  U.S.$/100hw      CME       ZR/RR   2,000 hw
Lumber           4/7/1986    FHKNUX  U.S.$/mbf        CME       LBS/LB  110 mbf
Gold             7/18/1977   GMQZ    U.S.$/oz         CME       GC      100 troy oz
Silver           7/18/1977   HKNUZ   U.S.$/100oz      CME       SI      5,000 troy oz
Platinum         4/1/1986    FJNV    U.S.$/oz         CME       PL      50 troy oz
Palladium        4/1/1986    HMUZ    U.S.$/oz         CME       PA      100 troy oz
Copper           12/6/1988   HKNUZ   U.S.$/100lbs     CME       HG      25,000 lbs
Light crude oil  3/30/1983   All     U.S.$/barrel     CME       CL      1,000 barrels
Heating oil      7/1/1986    All     U.S.$/gallon     CME       HO      42,000 gallons
Brent crude oil  6/23/1988   All     U.S.$/barrel     ICE       CO      1,000 barrels
Gas oil          7/3/1989    All     U.S.$/MT         ICE       QS?     100 MT
Natural gas      4/3/1990    All     U.S.$/mmBtu      CME       NG      10,000 mmBtu
Gasoline RBOB    10/4/2005   All     U.S.$/gallon     ICE       HO      42,000 gallons
Live cattle      7/18/1977   GJMQVZ  U.S.$/100lbs     CME       LE/LC   40,000 lbs
Lean hogs        4/1/1986    GJMQVZ  U.S.$/100lbs     CME       HE/LH   40,000 lbs
The units are described as follows. A barrel is considered to be 42 U.S. gallons. An mmBtu is one million
British Thermal Units, a traditional unit of energy equal to about 1055 joules per Btu. An MT is one metric tonne,
which is a unit of mass approximately equal to 1,000 kilograms. Lbs and oz are the abbreviations for pounds and
ounces, respectively. A “Troy oz” is a slightly modified system whereby one troy oz is equal to approximately
1.09714 standard oz. A bushel is a customary unit of dry volume, equivalent to 8 dry gallons. An mbf is a specialized unit of measure for the volume of lumber in the U.S., based on the “board-foot.” A board-foot (or “bf”) is the volume of a one-foot length of a wooden board, one foot wide and one inch thick. An mbf is one thousand such board-feet, the “m” being the Roman numeral for one thousand. Finally, an “st” or short ton is a unit of mass smaller than the metric tonne, equivalent to approximately 907 kilograms.
The column CEM represents the range of “contract ending months” that each futures contract may be specified
for. The month codes are as follows: F - January, G - February, H - March, J - April, K - May, M - June, N -
July, Q - August, U - September, V - October, X - November, and Z - December. These are the standard codes
employed by the exchanges.
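As a small illustration of these codes, a ticker such as “KCZ13” (Coffee C, December 2013) can be decoded mechanically; the ticker layout assumed here (exchange code, then month letter, then two-digit year) is a common convention used for illustration, not an official specification:

```python
# Contract month letter codes, as listed above
MONTH_CODES = {"F": 1, "G": 2, "H": 3, "J": 4, "K": 5, "M": 6,
               "N": 7, "Q": 8, "U": 9, "V": 10, "X": 11, "Z": 12}

def decode_ticker(ticker):
    """Split a ticker like 'KCZ13' (assumed format: exchange code +
    month letter + two-digit year) into (code, month, year)."""
    code, month_letter, year = ticker[:-3], ticker[-3], ticker[-2:]
    return code, MONTH_CODES[month_letter], 2000 + int(year)

print(decode_ticker("KCZ13"))  # coffee, December 2013 -> ('KC', 12, 2013)
```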
All series end on February 8th, 2013, and represent daily closing prices for those days the commodities are
traded on the exchange. In June 2007 the CBOT (Chicago Board of Trade) which acted as the exchange for soy
products, wheat, corn, and rice, merged with the CME (Chicago Mercantile Exchange) to form the CME Group.
Moreover, most of the energy futures were originally traded on the NYMEX (New York Mercantile Exchange) and
the metals were traded on the COMEX (Commodity Exchange; a division of the NYMEX). However, on August
18, 2008, the NYMEX (along with the COMEX) also merged with the CME Group. Gas oil was originally traded
on the IPE (International Petroleum Exchange) which was acquired by ICE (IntercontinentalExchange) in 2001.
Therefore, care must be taken in interpreting the various exchange codes which have changed over time.
For most CME contracts, the last trading day is typically the 15th business day before the first day of the
contract month. The delivery date is then freely chosen as any day during the contract month.
1.2.7 Features of the price level series
When dealing with financial data we typically consider the continuously compounded returns series,
rt = ln(Pt/Pt−1), since the price level process is nonstationary and so we are obliged to transform the initial
price data. However, in the case of futures price data without delivery, an examination of the time evolution of the
price level processes does not necessarily suggest the presence of trends, either of the stochastic type (i.e. random
walk), or due to a deterministic increase or decrease.
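The log-return transformation mentioned above is, for reference (the price values below are purely illustrative):

```python
import math

def log_returns(prices):
    """Continuously compounded returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

prices = [100.0, 102.0, 101.0]  # illustrative closing prices
print(log_returns(prices))
```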
Figure 1.4: Plots of daily continuous contract futures price level series, Sugar and Lean hogs
[Left panel: Sugar from 07/18/1977 to 02/08/2013; right panel: Lean hogs from 04/01/1986 to 02/08/2013.]
For example, let us consider the two plots in Figure 1.4, which display the time evolution of the futures prices of sugar and lean hogs. Neither series exhibits an obvious deterministic time trend, and their dramatic bubbles (especially in sugar) suggest that they cannot have been generated by a random walk. Interestingly, lean hogs exhibits the well known “pork cycle,” or cyclical patterns related to pork production.
The price level series all exhibit a very high level of linear persistence in the sense that their estimated au-
tocorrelation function, ρ(s), takes on the value ρ(1) ≈ 1, with small, but significant, ρ(s) for some s > 1 (see
Table 1.3 for the autocorrelation at lag 1). Moreover, their normalized spectral densities exhibit extremely sharp
peaks at the zero frequency and are near zero elsewhere in the spectrum. Of course, this is suggestive of a unit
root process, however, augmented Dickey-Fuller unit root tests of the series are inconclusive in rejecting the null
of a unit root (including a constant, but no time trend).9
This is unsurprising given what we know about the properties of some exotic parametric processes which are
able to elude detection by traditional unit root testing (see for example the causal representation of the noncausal
AR(1) model with i.i.d. Cauchy innovations discussed later in Section 1.4.2). A linear unit root test is not of much
use if the causal representation of the process may be nonlinear and strictly stationary, with moments that do not
exist. Finally, linear unit root tests have been shown to have low power in the presence of nonlinearity (such as
multiple regimes, for example).
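The lag-1 autocorrelations reported in Table 1.3 are the usual sample quantities; a minimal sketch of the estimator (the trending series in the example is illustrative, not one of the futures series):

```python
def sample_acf(x, s):
    """Sample autocorrelation rho(s) = gamma(s) / gamma(0), where
    gamma(s) is the lag-s sample autocovariance about the sample mean."""
    n = len(x)
    mean = sum(x) / n
    gamma0 = sum((v - mean) ** 2 for v in x) / n
    gammas = sum((x[t] - mean) * (x[t + s] - mean) for t in range(n - s)) / n
    return gammas / gamma0

# A highly persistent (trending) toy series has rho(1) close to 1
x = [float(t) for t in range(100)]
print(sample_acf(x, 1))
```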
Since the continuous contract futures series are constructed through the “rolling over” mechanism, they re-
flect the price of a reconstituted futures contract in which the time to maturity, h, remains fixed throughout the
time evolution of the price level, despite the fact that the reconstitution is generated from individual contracts of
different maturities, each representing daily closing prices for those days these futures contracts are traded on the
exchange. The different starting dates for each of the series are given in Table 1.2 and all the continuous contract
series end on February 8th, 2013.
Summary statistics for the price levels series are given in Table 1.3 and plots and histograms of all the price
level series are available in Appendix 1.14 (Figures 1.10.i to 1.11.iv).
Note some of the salient features from the summary statistics in Table 1.3. If we are to interpret the series
as strictly stationary, the sample moments suggest highly leptokurtic unconditional distributions for most of the
series. Exceptions to this exist, however, in orange juice, lumber, platinum, copper, gasoline RBOB, and lean
hogs. Perhaps more importantly we should consider that most of the series are also positively skewed, again
with a few exceptions in gasoline RBOB and lean hogs (and possibly orange juice). Visual examination of the histograms in Appendix 1.14 corroborates these statistics. Moreover, some of the histograms indicate a bimodal
structure, especially among those series that are highly skewed, suggesting the possibility of a mixture between
low price and high price regimes. A good example of this is the copper series.
9The estimated spectral density and Dickey-Fuller test results are available upon request.
Table 1.3: Summary statistics - commodity futures price level series

                          Quantiles
Series           10%      50%       90%       Mean      Stnd.Dev.  Skewness  Kurtosis  ACF(1)  Sample size
Soybean meal     149.600  185.800   314.200   210.347   70.151     1.729     6.190     0.998   9280
Soybean oil      16.640   23.750    39.993    26.399    10.449     1.709     5.516     0.999   9280
Soybeans         503.750  629.000   1057.600  716.563   249.577    1.755     5.735     0.998   9280
Orange juice     79.250   115.125   170.350   118.926   33.531     0.592     2.663     0.998   9280
Sugar            6.040    9.830     20.503    11.586    6.343      1.946     7.283     0.998   9280
Wheat            267.250  357.500   622.750   401.672   151.036    1.878     6.656     0.998   9280
Cocoa            991.000  1621.000  2971.100  1835.268  744.051    0.926     3.466     0.997   9280
Coffee           64.700   124.450   192.000   126.325   48.051     0.699     3.495     0.997   9280
Corn             203.750  258.250   435.000   298.578   126.933    2.097     7.126     0.998   9280
Cotton           49.059   65.150    85.720    67.665    19.798     2.688     16.481    0.997   9280
Rice             5.360    8.440     14.601    9.243     3.557      0.844     3.503     0.999   6309
Lumber           181.700  261.700   366.920   267.773   70.562     0.463     2.458     0.996   7005
Gold             277.700  385.400   964.230   510.664   351.245    2.202     7.139     0.999   9280
Silver           4.400    6.037     18.050    9.406     7.680      2.272     7.910     0.998   9280
Platinum         367.200  534.000   1555.420  755.715   463.352    1.169     3.096     0.999   7009
Palladium        111.000  206.150   645.140   286.657   203.778    1.303     3.935     0.999   7009
Copper           74.000   115.400   358.860   168.275   111.428    1.060     2.562     0.999   6309
Light crude oil  16.400   26.740    85.712    38.103    27.475     1.371     3.827     0.999   7793
Heating oil      45.733   67.655    264.865   112.316   86.145     1.292     3.484     0.999   6944
Brent crude oil  15.796   25.410    100.128   41.547    32.501     1.205     3.199     0.999   6427
Gas oil          147.000  226.500   894.875   375.818   281.273    1.161     3.180     0.999   6160
Natural gas      1.631    3.142     7.366     3.987     2.478      1.370     4.950     0.998   5964
Gasoline RBOB    153.220  223.895   304.360   227.116   57.877     0.023     2.309     0.995   1920
Live cattle      60.500   71.488    95.100    75.023    15.871     1.219     4.915     0.998   9280
Lean hogs        46.550   63.345    81.380    63.726    13.133     0.165     2.830     0.995   7009

* Note that ACF(1) represents the autocorrelation function at lag 1 and T is the sample size. Also, the kurtosis measure employed here is not the excess kurtosis.
1.3 The linear causal ARMA model
In this section we show that the causal linear ARMA model, with Gaussian innovations, is unable to adequately
capture the features of the futures price level data.
In order to assess the ARMA model’s ability to fit the price level data, we estimate a number of different
ARMA(p, q) specifications and choose among the best fitting according to the Akaike information criterion (AIC).
The software used to estimate the ARMA model is the popular “R project for statistical computing” available for
download at http://www.r-project.org/. In order to facilitate the (p, q) parameter search we employ the
auto.arima() function in the R forecast package due to Hyndman and Khandakar (2008). Given computational
constraints, maximum orders of p + q = 13, p ≤ 10, and q ≤ 3 are chosen. AICs are specified not to be
approximated and the “stepwise” selection procedure is avoided to make sure all possible model combinations are
tested.
The arima() routine called by auto.arima() obtains reasonable starting parameter values by conditional sum
of squares and then the parameter space is more thoroughly searched via a Nelder and Mead (1965) type algo-
rithm. The pseudo-likelihood function is computed via a state-space representation of the ARIMA process, and
the innovations and their variance found by a Kalman filter. Since the assumption of Gaussian shocks may be
misspecified, robust sandwich estimator standard errors are employed of the type introduced by White (1980).
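The order selection step reduces to minimizing AIC = 2k − 2 ln L over the candidate (p, q) grid; a sketch of that step, with made-up log-likelihood values standing in for fitted models:

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik

def best_order(candidates):
    """candidates: {(p, q): loglik}.  An ARMA(p, q) is counted here as
    having p + q + 1 parameters (AR and MA coefficients plus the
    innovation variance) -- a simplifying convention for illustration."""
    return min(candidates,
               key=lambda pq: aic(candidates[pq], pq[0] + pq[1] + 1))

# Hypothetical log-likelihoods from three fitted specifications
fits = {(1, 0): -5010.2, (2, 1): -5001.0, (3, 3): -5000.5}
print(best_order(fits))  # the extra parameters of (3, 3) are not worth it
```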
If the ARMA model captures the nonlinear features of the data, the residuals (et) should be approximately
representative of a strong white noise series. Therefore, we test for this feature in two ways: 1) we employ the
Ljung-Box test with the null of weak white noise residuals [Ljung and Box (1978)] and 2) the BDS test with the
null of independent residuals [Brock, Dechert and Scheinkman, and LeBaron (1996)].
1.3.1 Test specifications
The Ljung-Box test statistic is given as
LB(S) = T Σ_{s=1}^{S} [(T + 2)/(T − s)] ϱ(s)²,  (1.1)

where ϱ(s) is the estimated autocorrelation function of the ARMA model residuals. The null hypothesis is that the autocorrelations of the ARMA residuals are jointly 0 up to lag S. Finally, LB(S) ∼ χ²(S), if the residuals are representative of the true theoretical (εt) which is a strong white noise (and neglecting the fact that ϱ(s) is an estimated quantity itself).
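Equation (1.1) is straightforward to compute from the residuals; a minimal sketch (the statistic would then be compared against a χ²(S) critical value, which is omitted here to keep the example dependency-free):

```python
def ljung_box(resid, S):
    """Ljung-Box statistic LB(S) = T * sum_{s=1..S} (T+2)/(T-s) * rho(s)**2,
    where rho(s) is the sample autocorrelation of the residuals."""
    T = len(resid)
    mean = sum(resid) / T
    gamma0 = sum((e - mean) ** 2 for e in resid) / T
    lb = 0.0
    for s in range(1, S + 1):
        rho = sum((resid[t] - mean) * (resid[t + s] - mean)
                  for t in range(T - s)) / T / gamma0
        lb += (T + 2) / (T - s) * rho ** 2
    return T * lb

# A strongly alternating "residual" series is far from white noise,
# so its statistic is very large relative to chi-squared(3) quantiles
alternating = [float((-1) ** t) for t in range(50)]
print(ljung_box(alternating, 3))
```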
The BDS test was designed to be employed on the residuals of a best fitting linear model in order to look
for deterministic chaos in the residual nonlinear structure. This test involves the correlation dimension technique originally developed by Grassberger and Procaccia (1983) to detect the presence of chaotic structure by
embedding overlapping subsequences of the data in k-space. Given a k-dimensional time series vector xt,k =
(xt, xt+1, . . . , xt+k−1)′ called the k-history, the BDS test treats this k-history as a point in a k-dimensional space.
The BDS test statistic, called the correlation integral is given as
C_k(ε, T) = [2 / (T_k(T_k − 1))] Σ_{i<j} I_ε(x_{i,k}, x_{j,k}),  where T_k = T − k + 1,  (1.2)
and where Iε(u, v) is an indicator variable that equals one if ‖u − v‖ < ε and zero otherwise, where ‖·‖ is
the supnorm. The correlation integral estimates the fraction of data pairs of xt,k that are within ε distance from
each other in k-space. Despite the original purpose of the test, it is effectively a test for independence, since rejecting the null hypothesis of correlation among the k-histories of (x_t), t = 1, . . . , T_k, in every k-dimensional embedding space is equivalent to the series being i.i.d. That is, if the k-histories show no pattern in k-dimensional space, then we should have that C_k(ε, T) ≈ C_1(ε, T)^k.
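The correlation integral in (1.2) can be sketched directly; this brute-force O(T²) version is for illustration only (published BDS implementations are considerably faster):

```python
def correlation_integral(x, k, eps):
    """C_k(eps, T): fraction of pairs of k-histories lying within eps of
    each other under the sup-norm, as in equation (1.2)."""
    Tk = len(x) - k + 1
    histories = [x[t:t + k] for t in range(Tk)]  # the k-histories
    close = 0
    for i in range(Tk):
        for j in range(i + 1, Tk):
            if max(abs(a - b) for a, b in zip(histories[i], histories[j])) < eps:
                close += 1
    return 2.0 * close / (Tk * (Tk - 1))

# A constant series: every pair of k-histories is within eps, so C_k = 1
print(correlation_integral([1.0] * 10, 2, 0.5))
```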
It is shown that the BDS statistic √T [C_k(ε, T) − C_1(ε, T)^k] is asymptotically Normal with mean zero and
finite variance under the null hypothesis [see Tsay (2010), Ch. 4.2.1]. If we reject the null hypothesis, the alternative is quite broad since, depending on the correlation structure in the k-dimensional spaces, the nonlinearity could have come about due to either deterministic nonlinearity, i.e. chaos [see Blank (1991), Decoster et al.
(1992), and Yang and Brorsen (1993)], or stochastic nonlinearity.
For the Ljung-Box test we specify the number of lags as S = ln(T ) rounded to the nearest integer, where T
is the sample size given in Table 1.3. According to Tsay (2010), Ch.2.2, pg.33, simulation studies suggest that
this choice maximizes the power of the test. For the BDS test we consider embedding dimensions k up to k = 15,
which trades off number of dimensions for computational efficiency.
1.3.2 Results
Table 1.4 presents estimation results for the ARMA model. Generally, for all the series, the best fitting linear
ARMA model residuals reject the BDS null hypothesis of i.i.d. shocks at the 1% test significance level (in fact all
of the test statistic p-values are extremely close to 0). There is one exception in the lean hogs price levels series,
where for ε = 2.6 (the parameter that defines “near points” in the k-dimensional space, i.e. ‖u− v‖ < ε), we are
not able to reject the null hypothesis of i.i.d. residuals (however, we are able to reject for smaller ε = 1.95). The
p-values in this case decline monotonically from 0.731 at k = 2 down to 0.165 at k = 15.
Plots of all the residual series also suggest ARCH effects (see Figure 1.5 for an example). Interestingly, except in the case of coffee, the residuals are still weak white noise according to the Ljung-Box test, as we are unable to reject the null hypothesis at the 10% level, although we are able to reject platinum at the 13% level and soybean meal at the 15% level.
Interestingly, the ARMA estimation software is unable to fit an autoregressive model to the gold series, and
so we skip testing its residuals for whiteness.
Figure 1.5: Soybean meal residuals from ARMA model
Clearly, the causal linear ARMA model is not able to fully capture the structure of the data as the residuals
are often weak white noise, but not i.i.d. Therefore, the evidence presented in this section suggests that we need a
better model if we are to adequately capture the nonlinear dynamic features of the futures price level data.
1.4 The linear mixed causal/noncausal process
The linear mixed causal/noncausal process takes the form of a two sided infinite moving average representation,
Y_t = Σ_{i=−∞}^{+∞} a_i ε_{t−i},  (1.3)
where (ε_t) is a strong white noise, that is, a sequence of independently and identically distributed (i.i.d.) variables that does not necessarily admit finite moments. The mixed causal/noncausal process is composed of both a purely
causal component that depends only on past shocks, that is the sum of aiεt−i for all i > 0, and a purely noncausal
component that depends only on future shocks, that is the sum of aiεt−i for all i < 0. We have a unique
representation for (1.3), up to a scale factor on εt, except in the case where the white noise (εt) is Gaussian [see
e.g. Findley (1986) and Cheng (1992)]. For Gaussian white noise, there exists an equivalent purely causal linear
Table 1.4: ARMA estimation results
Series           p   q^a  AIC        Log-likelihood  Ljung-Box Pval^b  BDS Pval^c
Soybean meal     6   2    52395.00   -26188.50       0.15              0
Soybean oil      8   3    11859.05   -5917.52        0.92              0
Soybeans         9   2    73548.21   -36762.11       0.52              0
Orange juice     4   3    42121.61   -21052.80       0.40              0
Sugar            10  2    7842.39    -3908.20        1.00              0
Wheat            7   2    67069.47   -33524.74       1.00              0
Cocoa            8   3    94368.76   -47172.38       0.72              0
Coffee           4   2    48866.80   -24426.40       0.06              0
Corn             7   3    59385.84   -29681.92       0.63              0
Cotton           10  0    32760.78   -16369.39       1.00              0
Rice             10  3    -4799.02   2413.51         0.96              0
Lumber           8   3    44027.92   -22001.96       1.00              0
Gold             0   3    102914.50  -51453.27       n/a               0
Silver           9   3    7424.04    -3699.02        0.94              0
Platinum         8   2    55936.82   -27957.41       0.13              0
Palladium        9   3    48209.69   -24091.84       0.99              0
Copper           10  0    34719.50   -17348.75       1.00              0
Light crude oil  7   2    22244.11   -11112.06       0.95              0
Heating oil      9   2    34465.28   -17220.64       1.00              0
Brent crude oil  7   2    18807.92   -9393.96        0.90              0
Gas oil          5   3    44142.24   -22062.12       0.92              0
Natural gas      3   2    -4178.27   2095.13         0.23              0
Gasoline RBOB    5   3    11715.32   -5848.66        0.99              0
Live cattle     6   1    22771.40   -11377.70       0.99              0
Lean hogs        3   2    23567.63   -11777.81       0.70              0
^a The orders of the ARMA(p, q) model are given in the first and second columns.
^b The column denoted “Ljung-Box Pval” indicates the p-value for this test; therefore we reject the null hypothesis at x% probability of committing a type I error if Pval < x.
^c The BDS p-value provided is for the case where ε = 1.95. The p-values are 0 for all lags k = 2, . . . , 15.
representation where (ε∗t ) is another Gaussian white noise. This implies that for non-Gaussian (εt), a mixed linear
process including a noncausal component (i.e. ∃ i < 0 such that a_i ≠ 0) will necessarily admit a nonlinear causal dynamic.
For more details see Appendix 1.11.1.
1.4.1 The asymmetries
As an example, let us consider the effect of the shocks (ε_t) on the model above in (1.3). Let ε_t be distributed Cauchy, which admits no finite first- or second-order moments. Moreover, let a_i = ρ_1^i for i ≥ 0, a_i = ρ_2^{−i} for i ≤ 0, and |ρ_k| < 1 for k = 1, 2, where we are free to choose ρ_1 and ρ_2 as such.
In choosing various values for ρ_k, k = 1, 2, we will see how the general linear causal/noncausal model is able to exhibit bubble-like phenomena with asymmetries of the type discussed in Ramsey and Rothman (1996) (see the Introduction, Section 1.1, (ii), Price dynamics). Consider the simulated sample path of the linear mixed
causal/noncausal model with standard Cauchy shocks as depicted in Figure 1.6, where we have zoomed in on a
bubble episode to focus on the dynamics.
Within Figure 1.6 we have an example of a positive shock εt > 0 around time t = 57. Depending on
the values chosen for ρ1 and ρ2, the bubble’s build up and subsequent crash exhibits different rates of ascent
and descent. For example, consider the parameter combination (ρ1 = 0.8, ρ2 = 0). This represents the purely
causal case where the shock occurs at time t = 57 and its effect dies off slowly, and so we have a quick rise
and a subsequently slow decline. Also consider the opposite case where (ρ1 = 0, ρ2 = 0.8). This is the purely
noncausal case where the bubble builds up slowly until time t = 57 and then quickly declines. The other cases
represent mixed causal/noncausal models where the bubble rises and falls at rates which depend on the ratio of
ρ1/ρ2 = α. If α > 1 the bubble rises quicker than it declines; if α < 1 then it rises slower than it declines,
and if α = 1 then it behaves symmetrically around time t = 57. These asymmetries can be classified within the
framework of Ramsey and Rothman (1996) as being longitudinally asymmetric in that the probabilistic behaviour
of the process is not the same in direct and reverse time.
Of course, for a negative shock ε_t < 0 the behaviour would be mirrored: we would see a crash instead of a
bubble. This suggests that the mixed causal/noncausal process can also exhibit transversal
asymmetries, that is asymmetries in the vertical plane, by modifying the distribution of the shocks. For example,
if we were to only accept positive Cauchy shocks, εt > 0, this would induce a process that only exhibited positive
bubbles which would represent a transversally asymmetric process.
Therefore, by managing both the moving average coefficients, ai, and the distribution of the shocks εt in (1.3),
the mixed causal/noncausal model can exhibit both longitudinal and transversal asymmetries of the type discussed
by Ramsey and Rothman (1996).
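The impulse-response logic above is easy to verify numerically. The sketch below is our own illustration (the truncation length and the single-spike shock are choices made here, not part of the thesis's setup): it builds the two-sided moving average with a_i = ρ_1^i on past shocks and a_i = ρ_2^{|i|} on future shocks, and feeds it one large positive shock at t = 57:

```python
import numpy as np

def mixed_ma_path(eps, rho1, rho2, trunc=100):
    """Two-sided MA from Section 1.4.1: a_i = rho1**i on present/past shocks (i >= 0)
    and a_i = rho2**|i| on future shocks (i < 0), truncated at `trunc` terms."""
    T = len(eps)
    x = np.zeros(T)
    for t in range(T):
        for i in range(0, min(trunc, t) + 1):            # causal side: eps_{t-i}
            x[t] += rho1**i * eps[t - i]
        for i in range(1, min(trunc, T - 1 - t) + 1):    # noncausal side: eps_{t+i}
            x[t] += rho2**i * eps[t + i]
    return x

# A single large positive shock at t = 57 traces out the bubble shape.
T = 120
eps = np.zeros(T)
eps[57] = 100.0

causal = mixed_ma_path(eps, 0.8, 0.0)      # quick rise at t = 57, slow decline after
noncausal = mixed_ma_path(eps, 0.0, 0.8)   # slow build-up before t = 57, quick crash
symmetric = mixed_ma_path(eps, 0.4, 0.4)   # rises and falls at the same rate
```

With (ρ_1, ρ_2) = (0.8, 0) the path jumps at t = 57 and decays slowly afterwards; with (0, 0.8) it builds up before t = 57 and crashes immediately after, reproducing the longitudinal asymmetry described above.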
Figure 1.6: The mixed causal/noncausal model with Cauchy shocks
[Simulated sample path, zoomed in on a bubble episode around t = 57, for the parameter combinations (ρ_1, ρ_2) = (0.8, 0.0), (0.6, 0.2), (0.4, 0.4), (0.2, 0.6), and (0.0, 0.8).]
1.4.2 The purely causal representation
As discussed above, in general we have a unique linear representation as (1.3) except when the white noise process
is Gaussian. This implies that, for fat tailed distributions, such as the t-distribution or Cauchy distribution, the
purely causal strong form representation will necessarily admit a nonlinear representation [Rosenblatt (2000)].
Example: The noncausal autoregressive process with Cauchy shocks
Consider the noncausal autoregressive process of order 1 with Cauchy shocks,
xt = ρxt+1 + εt, (1.4)
where |ρ| < 1 and ε_t/σ_ε follows a standard i.i.d. Cauchy distribution. The shocks can be interpreted as
backward innovations, defined as ε_t = x_t − median(x_t | x_{t+1}), since, strictly speaking, the moments of the Cauchy
distribution do not exist.
This process admits both a strong purely causal representation which is necessarily nonlinear with i.i.d.
shocks, and a weak form purely causal representation which is linear, but where the shocks are weak white
noise and not i.i.d.
More precisely, the noncausal process (xt) is a Markov process in direct time with a causal transition p.d.f.
given as [Gourieroux and Zakoian (2012), Proposition 2, and Appendix 1.11.3 of this paper]:
f_{t+1|t}(x_{t+1} | x_t) = (1/(σ_ε π)) · [σ_ε² / (σ_ε² + z_t²)] · [(σ_ε² + (1 − |ρ|)² x_t²) / (σ_ε² + (1 − |ρ|)² x_{t+1}²)], where z_t = x_t − ρ x_{t+1}. (1.5)
In particular the causal conditional moments associated with the equation above exist up to order three, whereas
the noncausal conditional moments associated with the forward autoregression in (1.4), and the unconditional
moments, do not exist.
i) The causal strong autoregressive representation
In order to represent (1.4) as a causal, direct time, process in strong form, we must appeal to the nonlinear (or
generalized) innovations of the process [see Rosenblatt (2000), Corollary 5.4.2. or Gourieroux and Jasiak (2005),
Section 2.1].
Intuitively, a nonlinear error term, (ηt), of the causal process (xt) is a strong white noise where we can write
the current value of the process xt as a nonlinear function of its own past value xt−1 and ηt, say,
xt = G(xt−1, ηt), ηt ∼ i.i.d., (1.6)
where xt and ηt satisfy a continuous one-to-one relationship given any xt−1. For more details see Appendix
1.11.4.
ii) The causal weak autoregressive representation
Only the Gaussian autoregressive processes possess both causal and noncausal strong form linear autoregressive
representations. The noncausal AR(1) Cauchy model therefore admits only a weak form linear representation
given as [Gourieroux and Zakoian (2012), Section 2.3]:
x_t = E_{t|t−1}[x_t | x_{t−1}] + η*_t √(Var_{t|t−1}[x_t | x_{t−1}]). (1.7)

The representation is weak since (η*_t) is a weak white noise (not i.i.d.) and ε*_t = η*_t √(Var_{t|t−1}[x_t | x_{t−1}]) is
conditionally heteroskedastic. That is, the weak innovations also display GARCH type effects.
The conditional moments of x_t are given as:

E_{t|t−1}[x_t | x_{t−1}] = sign(ρ) x_{t−1}, and (1.8a)

E_{t|t−1}[x_t² | x_{t−1}] = (1/|ρ|) x_{t−1}² + σ_ε² / (|ρ|(1 − |ρ|)). (1.8b)
Interestingly, from equation (1.8a), we see that for ρ > 0 the process exhibits a unit root (this is the martingale
property), but is still stationary; this unit root is expected since the unconditional moments of x_t do not
exist. Usually, when we consider the properties of a unit root model, this is within the context of models with a
nonstationary stochastic trend. However, in the example above the causal process (x_t) has a unit root while being
strongly stationary. So the unit root does not generate a stochastic trend, but can generate bubbles due to the
martingale interpretation [see Gourieroux and Zakoian (2012), and the discussion in Section 1.4.3].
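The backward-innovation interpretation above can be checked by simulation. The following sketch is our own illustration (the truncation length and seed are arbitrary choices): it simulates the noncausal Cauchy AR(1) of (1.4) through its truncated forward moving average x_t = Σ_{j≥0} ρ^j ε_{t+j}, and verifies that x_t − ρ x_{t+1} recovers the i.i.d. Cauchy shocks even though x_t itself has no finite moments:

```python
import numpy as np

rng = np.random.default_rng(1)
T, trunc, rho, sigma = 500, 400, 0.8, 0.1

# Simulate the noncausal AR(1) of (1.4), x_t = rho * x_{t+1} + eps_t, through its
# forward moving average solution x_t = sum_{j>=0} rho**j * eps_{t+j} (truncated).
eps = sigma * rng.standard_cauchy(T + trunc)
x = np.array([sum(rho**j * eps[t + j] for j in range(trunc)) for t in range(T)])

# The backward innovations eps_t = x_t - rho * x_{t+1} are recovered exactly
# (up to a rho**trunc truncation error), despite the Cauchy tails of the path.
resid = x[:-1] - rho * x[1:]
```

The recovered residuals match the generating shocks up to a ρ^trunc truncation error, which is numerically negligible here.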
1.4.3 Other bubble like processes
As described in Gourieroux and Zakoian (2012), several other examples of martingale processes with bubbles
have been introduced in the literature. However, none of these processes are as easy to introduce into a general
dynamic framework as the set of mixed causal/noncausal processes.
Interestingly, these previous bubble processes are piecewise linear, but still maintain the martingale property.
For example, the bubble process introduced in Blanchard and Watson (1982) is given by:
x_{t+1} = (1/π) x_t + ε_{t+1}, with probability π,
x_{t+1} = ε_{t+1}, with probability (1 − π), (1.9a)

where ε_t is a Gaussian error term and π ∈ (0, 1). This is a martingale process, with a piecewise linear dynamic in
that, given the latent state, the parameter on the autoregression switches between zero and 1/π.
Evans (1991) proposes to model the explosive rate parameter, (θt), say, as a Bernoulli random variable,
B(1, π). Again, this process represents one that is piecewise linear, but in this case is also a multiplicative
error term model, with (ut) representing an i.i.d. process with ut ≥ 0, Et[ut+1] = 1, and with parameters
0 < δ < (1 + r)α where r > 0, π ∈ (0, 1], and
x_{t+1} = (δ + (1/π)(1 + r) θ_{t+1} (x_t − δ/(1 + r))) u_{t+1}, if x_t > α,
x_{t+1} = (1 + r) x_t u_{t+1}, if x_t ≤ α. (1.10a)
In this case the regime is not latent, but is a function of the observable xt. In this way, the process is an extension
of the self-exciting threshold autoregression of Tong and Lim (1980).
For illustration we have simulated sample paths from the two bubble processes above along with the causal
AR(1) Cauchy process (see Figure 1.7). The Blanchard and Watson process is simulated by choosing π = 0.8 and
ε_t ∼ IIN(0, 1). The Evans process is simulated in accordance with the parameters chosen in simulating bubbles
for Table 1, on page 925, of their paper; that is, we have α = 1, δ = 0.5, 1 + r = 1.05, π = 0.75 and a sample
path of length T = 100 is generated. Moreover, ut is log-normally distributed, where ut = exp [yt − τ2/2] and
yt ∼ IIN(0, 0.052). Finally, the causal AR(1) Cauchy is simulated by choosing ρ = 0.8 in equation (1.4) and
σ = 0.1 as the scale parameter of the Cauchy distribution.
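The two simulation designs just described can be sketched as follows. This is an illustrative reimplementation with the parameter values quoted above (R stands for 1 + r; the random seed is our own choice):

```python
import numpy as np

rng = np.random.default_rng(42)

def blanchard_watson(T, pi=0.8):
    """Blanchard-Watson bubble (1.9a): survives and grows at rate 1/pi w.p. pi, else collapses."""
    x = np.zeros(T)
    for t in range(1, T):
        eps = rng.standard_normal()
        x[t] = x[t - 1] / pi + eps if rng.random() < pi else eps
    return x

def evans(T, alpha=1.0, delta=0.5, R=1.05, pi=0.75, tau=0.05):
    """Evans (1991) bubble (1.10a), with the Table 1 parameters quoted in the text."""
    x = np.empty(T)
    x[0] = delta
    for t in range(1, T):
        u = np.exp(rng.normal(0.0, tau) - tau**2 / 2)    # log-normal with E[u] = 1
        if x[t - 1] > alpha:
            theta = float(rng.random() < pi)             # Bernoulli B(1, pi) survival draw
            x[t] = (delta + (R / pi) * theta * (x[t - 1] - delta / R)) * u
        else:
            x[t] = R * x[t - 1] * u
    return x

bw = blanchard_watson(500)
ev = evans(100)
```

The Blanchard and Watson path survives and grows at rate 1/π with probability π and collapses otherwise, while the Evans path stays strictly positive and can erupt only after crossing the threshold α.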
The bubble processes above were constructed for very specific theoretical reasons. The Blanchard and Watson
(1982) process is given as an example of a bubble consistent with the rational expectation hypothesis and the Evans
Figure 1.7: Plots of simulated bubble processes
[Three panels of simulated sample paths: the Blanchard and Watson model, the Evans model, and the Gourieroux and Zakoian model.]
(1991) process is given as an example of a stationary process with periodically collapsing bubbles that defies
standard linear unit root testing. Alone, and without further modification, neither process should be considered a
serious candidate to model bubbles in commodity futures price levels. On the other hand, unlike these previous
bubble processes, the AR(1) Cauchy model is easily introduced in a mixed causal/noncausal framework.
1.5 Estimation of the mixed causal/noncausal process
In this section we introduce the mixed causal/noncausal autoregressive model which will be estimated in an
attempt to model the asymmetric bubble features exhibited by the commodity futures price level data. The
model is a linear parameterization of the general mixed causal/noncausal model in (1.3) and represents the mixed
causal/noncausal analog of the causal autoregressive model. The model is discussed in the next Section 1.5.1 and
estimation of the model via maximum likelihood is discussed in Section 1.5.2.
1.5.1 The mixed causal/noncausal autoregressive model of order (r, s)
Definition 1.5.1. The mixed causal/noncausal autoregressive process of order (r, s)
Let (x_t) be a univariate stochastic process generated by a linear autoregressive mixed causal/noncausal model of order (r, s). The process is defined by

α(L) x_t = ε*_t, ε*_t ∼ i.i.d., (1.11a)

where α(L) = 1 − α_1 L − α_2 L² − ... − α_p L^p, (1.11b)

such that L is the lag operator (i.e. L x_t = x_{t−1} and L^{−1} x_t = x_{t+1}), p = r + s, and the operator α(z) can be factorized as α(z) = φ(z) ϕ*(z). We have that φ(z) (of order r) contains all its roots strictly outside the complex unit circle and ϕ*(z) (of order s) contains all its roots strictly inside the unit circle.10
Therefore, φ(z) represents the purely causal autoregressive component and ϕ∗(z) represents the purely noncausal
autoregressive component [Breidt et al. (1991)].
Moving average representation of the stationary solution
If α(L) has no roots on the unit circle, and ε_t belongs to an L^ν-space with ν > 0 (that is, E[|ε_t|^ν] < ∞), then
a unique stationary solution to the difference equation defined in (1.11b) exists [see Appendix 1.11.1]. We can
write:
x_t = α(L)^{−1} ε*_t = Σ_{l=−∞}^{∞} γ_l ε*_{t−l}, (1.12)

where the series of moving average coefficients is absolutely summable, Σ_{l=−∞}^{∞} |γ_l| < ∞.
The strong stationary representation is derived as follows. Let us factorize φ(L) and ϕ∗(L) as
φ(L) = Π_{j=1}^{r} (1 − λ_{1,j} L), where |λ_{1,j}| < 1, (1.13a)

and ϕ*(L) = Π_{k=1}^{s} (1 − (1/λ_{2,k}) L), where |λ_{2,k}| < 1. (1.13b)

The noncausal component can also be written as

ϕ*(L) = [(−1)^s L^s / Π_{k=1}^{s} λ_{2,k}] Π_{k=1}^{s} (1 − λ_{2,k} L^{−1}). (1.14)
10To ensure the existence of a stationary solution, we assume that all roots have a modulus strictly different from 1.
We get the Taylor series expansions
(1 − λ_{1,j} L)^{−1} = Σ_{l=0}^{∞} λ_{1,j}^l L^l, (1.15a)

and (1 − λ_{2,k} L^{−1})^{−1} = Σ_{l=0}^{∞} λ_{2,k}^l L^{−l}, (1.15b)

which are valid because the roots are such that |λ_{1,j}| < 1, ∀j and |λ_{2,k}| < 1, ∀k. Thus we get

x_t = φ(L)^{−1} ϕ*(L)^{−1} ε*_t = [Π_{k=1}^{s} λ_{2,k} / ((−1)^s L^s)] · [1 / (Π_{j=1}^{r} (1 − λ_{1,j} L) Π_{k=1}^{s} (1 − λ_{2,k} L^{−1}))] ε*_t

= [Π_{k=1}^{s} λ_{2,k} / ((−1)^s L^s)] Π_{j=1}^{r} (Σ_{l=0}^{∞} λ_{1,j}^l L^l) Π_{k=1}^{s} (Σ_{l=0}^{∞} λ_{2,k}^l L^{−l}) ε*_t = Σ_{l=−∞}^{∞} γ_l ε*_{t−l}. (1.16)
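For the simplest mixed case r = s = 1, the expansion in (1.16) can be checked numerically. The sketch below uses hypothetical roots λ_1 = 0.7 and λ_2 = 0.5 and an arbitrary truncation M (our own choices): it computes the two-sided coefficients γ_l and verifies that convolving them with the coefficients of α(L) = (1 − λ_1 L)(1 − (1/λ_2) L) returns the unit impulse:

```python
import numpy as np

lam1, lam2, M = 0.7, 0.5, 200   # one causal root lam1, one noncausal root lam2

# gamma_l for r = s = 1 from (1.16): the prefactor -lam2 * L^{-1} convolved with the
# geometric expansions sum_m lam1**m L^m and sum_n lam2**n L^{-n}.
ls = np.arange(-M, M + 1)
gamma = np.zeros(2 * M + 1)
n = np.arange(M)
for idx, l in enumerate(ls):
    m = n + (l + 1)                       # pairs (m, n) with m - n = l + 1
    ok = (m >= 0) & (m < M)
    gamma[idx] = -lam2 * np.sum(lam1 ** m[ok] * lam2 ** n[ok])

# Sanity check: convolving gamma with the coefficients of
# alpha(L) = (1 - lam1 L)(1 - L / lam2) must return the unit impulse at lag 0.
alpha = np.array([1.0, -(lam1 + 1.0 / lam2), lam1 / lam2])
impulse = np.convolve(alpha, gamma)       # lag 0 sits at index M
```

The coefficients decay geometrically in both directions, so the absolute summability Σ|γ_l| < ∞ claimed above holds by construction.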
An alternative representation
Since such a representation in (1.11a) is defined up to a scale factor on ε∗t , another equivalent representation is
given as
Φ(L) x_t = φ(L) ϕ(L^{−1}) x_t = ε_t, (1.17a)

where ϕ(L^{−1}) = 1 − ϕ_1 L^{−1} − ϕ_2 L^{−2} − ... − ϕ_s L^{−s}, (1.17b)

φ(L) = 1 − φ_1 L − φ_2 L² − ... − φ_r L^r, (1.17c)

and (ε_t) is the sequence of i.i.d. random variables defined as ε_t = −(1/(ϕ*_s L^s)) ε*_t = −(1/ϕ*_s) ε*_{t+s}.
We can always map the parameters from model (1.17) to (1.11) since we have −(1/(ϕ*_s L^s)) ϕ*(L) = ϕ(L^{−1}),
where the coefficients of ϕ(L^{−1}) are given as ϕ_i = −ϕ*_{s−i}/ϕ*_s for i = 1, ..., s − 1, and ϕ_s = 1/ϕ*_s for i = s,
and the roots of ϕ*(L) and ϕ(L^{−1}) are inverses (in the sense that ϕ*(z) = ϕ(1/z) = 0 for some complex z
where |z| < 1).
From the original representation in equation (1.11) we have that
α(L) x_t = ε*_t ⇔ x_t − α_1 x_{t−1} − · · · − α_p x_{t−p} = ε*_t, (1.18)
and so under this standardization, the autoregressive coefficient associated with the current time period, xt, is
normalized to one (i.e. α0 = 1).
However, given the alternative representation we have that
(1/ϕ*_s) x_{t+s} + (ϕ*_1/ϕ*_s) x_{t+(s−1)} + · · · + x_t + · · · + φ_{r−1} x_{t−(r−1)} + φ_r x_{t−r}

= ϕ_s x_{t+s} + ϕ_{s−1} x_{t+(s−1)} + · · · + x_t + · · · + φ_{r−1} x_{t−(r−1)} + φ_r x_{t−r} = ε_t, (1.19)
and so under this alternative standardization, the autoregressive coefficient chosen as equal to 1 does not coincide
with the most recent time period, t+ s; rather the standardization applies the autoregressive coefficient equal to 1
to the “intermediate” value xt, where the autoregression depends also on s future lags and r past lags.
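The coefficient mapping between representations (1.11) and (1.17) is mechanical and easy to sanity-check. The sketch below uses a hypothetical noncausal polynomial of order s = 2 whose roots 0.4 and 0.5 lie inside the unit circle (our own example): it applies the stated mapping and confirms that the roots of ϕ(z) are the inverses of the roots of ϕ*(z):

```python
import numpy as np

# Noncausal polynomial phi*(L) = 1 - 4.5 L + 5 L^2, i.e. (phi*_1, phi*_2) = (4.5, -5),
# chosen so that its roots 0.4 and 0.5 lie strictly inside the unit circle.
phistar_coeffs = np.array([4.5, -5.0])
s = len(phistar_coeffs)

# Mapping stated in the text: phi_i = -phi*_{s-i} / phi*_s for i < s, phi_s = 1 / phi*_s.
phi = np.empty(s)
phi[: s - 1] = -phistar_coeffs[s - 2 :: -1] / phistar_coeffs[s - 1]
phi[s - 1] = 1.0 / phistar_coeffs[s - 1]

# Roots of phi*(z) = 1 - sum_i phi*_i z^i and of phi(z) = 1 - sum_i phi_i z^i.
roots_star = np.roots(np.concatenate(([1.0], -phistar_coeffs))[::-1])
roots_alt = np.roots(np.concatenate(([1.0], -phi))[::-1])
```

Here ϕ*(z) has roots inside the unit circle while the mapped ϕ(z) has roots strictly outside it, which is exactly why the alternative standardization is convenient for estimation.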
1.5.2 ML estimation of the mixed causal/noncausal autoregressive model
In estimating the parameters of the mixed causal/noncausal autoregressive process in Section 1.5.1, we can apply
the usual maximum likelihood estimation (MLE) approach. The likelihood function represents the distribution
of the sample data, conditional on the parameters of the model. The maximum likelihood method estimates the
parameters of the model as the values of the parameters which maximize this likelihood function. Let θ represent
the vector of parameters, including the vectors of causal and noncausal autoregressive coefficients, φ and ϕ, and
the parameters characterizing the fat tailed, t-distributed, error term,11 that are its degree of freedom parameter λ
and scale σ. The maximum likelihood estimator is given as
θ_mle = argmax_θ f(x_T | θ), (1.20)

where

f(x_T | θ) = f(x_T, x_{T−1}, x_{T−2}, ..., x_1 | θ)
= f(x_T | x_{T−1}, ..., x_1; θ) f(x_{T−1} | x_{T−2}, ..., x_1; θ) · · · f(x_3 | x_2, x_1; θ) f(x_2 | x_1; θ) f(x_1 | θ), (1.21)

and x_T = {x_T, x_{T−1}, x_{T−2}, ..., x_1} is the joint vector of sample data.
i) Approximation of the likelihood in the causal autoregressive model
In causal time series analysis of autoregressive models, say for the autoregressive model of order p, we know
that the likelihood function can be approximated by neglecting the effect of starting values. For example, the
11 We employ either a t-distributed or skew t-distributed error term in order to identify the mixed causal/noncausal model. See Appendix 1.11.5.
causal AR(p) model’s likelihood:
f(x_T | θ) = f(x_T, x_{T−1}, x_{T−2}, ..., x_1 | θ)
= f(x_T | x_{T−1}, ..., x_{T−p}; θ) f(x_{T−1} | x_{T−2}, ..., x_{T−p−1}; θ) · · · f(x_3 | x_2, x_1; θ) f(x_2 | x_1; θ) f(x_1 | θ), (1.22)
can be approximated by neglecting the conditional densities of the initial values xt for all t ≤ p. For large sample
size, T , this approximation error becomes negligible and the estimator obtained by maximizing the approximated
likelihood is still asymptotically efficient.
ii) Approximation of the likelihood in the mixed causal/noncausal autoregressive model
The maximum likelihood approach can also be used in the general framework of the mixed causal/noncausal
processes, that is the parameters estimated by:
θ_mle = argmax_θ f(x_T | θ). (1.20)
Under standard regularity conditions, including the strong stationarity of the process and appropriate mixing
conditions, the ML estimator is consistent and its asymptotic properties, that is its speed of convergence and
asymptotic distribution, are easily derived [see Breidt et al. (1991)].
However, in practice the closed form expression of the likelihood, f(x_T | θ), is difficult to derive and the
likelihood function has to be approximated, without losing the asymptotic properties of the ML estimator.
Two approaches are typically suggested:
i) Take the autoregressive expression α(L)xt = εt, and approximate the likelihood by:
Π_{t=p+1}^{T} f_ε(α(L) x_t | β), (1.23)
where β are the parameters characterizing the distribution of the error. Such an approximation is wrong and
leads in general to an inconsistent estimator. The reason is as follows. Since the approximation is based on
the autoregression:
x_t − α_1 x_{t−1} − α_2 x_{t−2} − · · · − α_p x_{t−p} = ε_t, (1.24)
the approximation above is valid if εt is independent of the explanatory variables, xt−1, . . . , xt−p. But in
a mixed model with a noncausal component, εt appears in the moving average representation of xt−1, . . . ,
and xt−p, which creates dependence. This is the well known error-in-variables model encountered in linear
models and usually solved by introducing instrumental variables, with, in general, a loss of efficiency.
ii) Consider the moving average expression of x_t = Σ_{i=−∞}^{∞} a_i ε_{t−i} with the identification restriction a_0 = 1.
Set to zero the values of the noise corresponding to the indices outside the observation period {1, 2, ..., T},
that is, ε_t = 0 if t ≤ 0 and if t ≥ T + 1. Thus we truncate the moving average representation into:

x_t ≈ Σ_{i=t−T}^{t−1} a_i(α) ε_{t−i} = Σ_{τ=1}^{T} a_{t−τ}(α) ε_τ, for t = 1, ..., T, (1.25)
where the dependence on the autoregressive parameters is explicitly indicated. We get a linear system of
equations, which relates the observations {x_1, ..., x_T} and the errors {ε_1, ..., ε_T} in a one-to-one
relationship. Therefore, the joint distribution of {x_1, ..., x_T} can be deduced from the joint distribution of
{ε_1, ..., ε_T}, which has a closed form by applying the change of variables Jacobian formula. However, this
approach is difficult to implement numerically, since the matrix of the transformation, A(α), with generic
elements at−τ (α), t, τ = 1, . . . , T has a large T × T dimension. This makes difficult, first the inversion of
this matrix, and second the numerical computation of its determinant.
This explains why a methodology has been introduced to circumvent this numerical difficulty: it approximately
inverts this matrix and computes the determinant by using both the causal and noncausal components appropriately
[see Breidt et al. (1991), Lanne and Saikkonen (2008), and Appendix 1.12 of this paper].
This approximated likelihood is used in our application to commodity futures prices. The approximation requires
knowledge of the causal and noncausal orders (r, s) respectively. If they are unknown, the approach is applied to
all pairs of orders (r, s) such that r + s = p as given. The selected orders are the ones which minimize the AIC
criterion, based on the log-likelihood value.
More precisely, Lanne and Saikkonen (2008) note that the matrix A(α) can be approximately written as

A(α) ≈ A_c(φ) A_nc(ϕ*), (1.26)

where A_c(φ) (resp. A_nc(ϕ*)) depends on the causal (resp. noncausal) autoregressive coefficients only, and is
lower (resp. upper) triangular with only 1's on the diagonal. Therefore, the Jacobian

|det(A(α))| ≈ |det(A_c(φ))| |det(A_nc(ϕ*))| = 1 (1.27)
and can be neglected. Therefore, the likelihood function can be approximated by
Π_{t=r+1}^{T−s} f_ε(ϕ(L^{−1}) φ(L) x_t; λ, σ), (1.28)
where θ = {φ, ϕ, λ, σ} represents the parameters of the model, that is, the vectors of causal and noncausal
autoregressive coefficients respectively, and the t-distribution degree of freedom and scale parameter assumed on
(εt).
They show that the only autoregressive representation which leads to consistent estimators is the representation
with the autoregressive coefficient equal to 1 on x_t, with r lagged values before and s future values after,
as given above in the autoregressive equation (1.19).
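As an illustration of this approximate ML recipe, the sketch below is our own minimal reimplementation (not the thesis's Fortran/PORT code; the sample size, coefficients, and starting values are assumptions made here). It simulates a mixed AR(1,1) with t(3) shocks and maximizes the approximate likelihood (1.28) over the causal coefficient, the noncausal coefficient, the degree of freedom λ, and the scale σ:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(7)
T, phi1, vphi1, dof = 3400, 0.5, 0.6, 3.0

# Simulate a mixed AR(1,1), (1 - phi1 L)(1 - vphi1 L^{-1}) x_t = eps_t, with t(3) shocks:
# first the causal step u_t = phi1 u_{t-1} + eps_t, then the backward (noncausal)
# recursion x_t = vphi1 x_{t+1} + u_t.
eps = rng.standard_t(dof, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = phi1 * u[t - 1] + eps[t]
x = np.zeros(T)
for t in range(T - 2, -1, -1):
    x[t] = vphi1 * x[t + 1] + u[t]
x = x[200:-200]                              # drop both edges (recursion start-up)

def negloglik(theta):
    p, q, lam, s = theta                     # causal coeff, noncausal coeff, dof, scale
    if abs(p) >= 1 or abs(q) >= 1 or lam <= 0.1 or s <= 0:
        return np.inf
    y = x[1:] - p * x[:-1]                   # apply (1 - p L)
    e = y[:-1] - q * y[1:]                   # then apply (1 - q L^{-1})
    return -np.sum(stats.t.logpdf(e, df=lam, scale=s))

res = optimize.minimize(negloglik, x0=[0.45, 0.55, 4.0, 1.5], method="Nelder-Mead")
```

With a simulated sample of a few thousand observations the estimates land close to the generating values; a production implementation would also scan the (r, s) pairs and compare AIC values as described in the text.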
Example 1: Causal AR(1) process
Let us consider the stationary causal AR(1) process
x_t = α_1 x_{t−1} + ε_t, where |α_1| < 1. (1.29)
This is the usual case and so we can employ the MLE to estimate α_1 by maximizing the approximate likelihood
function Π_{t=2}^{T} f_ε(x_t − α_1 x_{t−1}). This case does not present a problem since we already have the coefficient in
front of x_t equal to 1.
Example 2: Noncausal AR(1) process
However, given the stationary noncausal AR(1) process
x_t = α_1 x_{t−1} + ε_t, where |α_1| > 1, (1.30)
the estimator which maximizes the approximate likelihood function Π_{t=2}^{T} f_ε(x_t − α_1 x_{t−1}) is now biased. Indeed,
since x_t can be written as the noncausal moving average x_t = Σ_{j=0}^{∞} (1/α_1)^j ε*_{t+1+j}, there now exists a dependence
between x_t and x_{t−1}.
The methodology leading to consistent estimation consists of regressing x_{t−1} on x_t, instead of
regressing x_t on x_{t−1}. We can rewrite the noncausal regression above as
x_t = (1/α_1) x_{t+1} − (1/α_1) ε_{t+1} = (1/α_1) x_{t+1} + ε*_{t+1}, where |α_1| > 1, (1.31)

which now restores the independence between the regressand and the regressor, and so the MLE which maximizes
Π_{t=1}^{T−1} f_{ε*}(x_t − (1/α_1) x_{t+1}) is asymptotically unbiased.
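A quick numerical check of this example (our own sketch, with α_1 = 1.25 so that 1/α_1 = 0.8, standard Cauchy shocks, and an arbitrary seed): simulating the noncausal AR(1) and maximizing the Cauchy likelihood of the reversed regression of x_t on x_{t+1} recovers 1/α_1:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(3)
alpha1, T, trunc = 1.25, 4000, 300           # |alpha1| > 1, so rho = 1/alpha1 = 0.8
rho = 1.0 / alpha1

# Simulate the noncausal AR(1) by the backward recursion x_t = rho * x_{t+1} + e_t
# with standard Cauchy shocks; drop the last `trunc` points (recursion start-up).
e = rng.standard_cauchy(T + trunc)
x = np.zeros(T + trunc)
for t in range(T + trunc - 2, -1, -1):
    x[t] = rho * x[t + 1] + e[t]
x = x[:T]

# Consistent direction: Cauchy ML of the reversed regression of x_t on x_{t+1},
# i.e. maximize prod_t f(x_t - r * x_{t+1}) over the slope r and the scale.
def nll(theta):
    r, s = theta
    if s <= 0:
        return np.inf
    return -np.sum(stats.cauchy.logpdf(x[:-1] - r * x[1:], scale=s))

res = optimize.minimize(nll, x0=[0.5, 2.0], method="Nelder-Mead")
```

The estimated slope is close to 1/α_1 = 0.8 and the estimated scale is close to the true shock scale of 1, consistent with the asymptotic results cited above.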
1.5.3 Estimation results
In this Section we will evaluate estimation results from the mixed autoregressive model of order (r, s) as applied
to the 25 commodity futures price level series. Estimation of the model parameters numerically optimizes the
approximated likelihood function discussed in the last section. As in Lanne and Saikkonen (2008) and Lanne,
Luoto, and Saikkonen (2012), we assume the regularity conditions of Andrews et al. (2006) are satisfied, which
require the likelihood to be twice differentiable with respect to both xT and θ. The approximated likelihood
algorithm is computed in Fortran and the optimization of the likelihood function is performed using a set of
Fortran optimization subroutines called the PORT library, designed by David M. Gay at Bell Laboratories [Gay
(1990)].
As in Section 1.3 where the linear causal ARMA model with Gaussian innovations was shown to inadequately
capture the features of the price level data, we will again employ the AIC criterion as a measure of model fit, along
with Ljung-Box statistics testing the hypothesis that the innovations exhibit no linear autocorrelation. In this way,
we will consider the best fitting linear causal ARMA model, with Gaussian innovations, from Section 1.3 as a
benchmark model.
Table 1.5.i presents the results of maximum likelihood estimation. The mixed AR model orders, (r, s), were
selected via AIC among a possible set of (r, s) values such that r ≤ 10 and s ≤ 10. The first row of the results
for each series represents the benchmark ARMA model, with Gaussian innovations, from Section 1.3, while the
second and third rows represent the mixed AR(r, s) model with both t-distributed and skew t-distributed errors,
respectively. Recall that the mixed causal/noncausal model is only identified for non-Gaussian error terms. The
lags column represents the number of lags included in the Ljung-Box statistic, where p-values are provided in
their respective columns. Finally, an ’x’ marks the model with the lowest normalized AIC.
The estimation results suggest that the mixed causal/noncausal model improves model fit over the baseline
causal ARMA model, with Gaussian innovations. When the models are nested, we employ likelihood ratio (LR)
tests. In every case the mixed causal/noncausal model improves model fit significantly at the 1% significance
level.
In comparing the skewed t-distributed error term mixed causal/noncausal model to the standard t-distribution
error term model, the results vary by series. In most cases the skewed t-distribution improves model fit and passes
a LR test at the 1% level. Moreover, orange juice, lumber, silver, copper, light crude oil, and gas oil also pass at
the 5% level and coffee passes at the 10% level. Series that do not pass LR tests at the 10% level are soybean
meal and oil, sugar, corn, cotton, rice, gold, palladium, natural gas, and live cattle, suggesting that there is little
gain in employing a skewed t-distribution on the innovations of these mixed models.
Interestingly, the estimated t-distribution degree of freedom parameter, λ, for the mixed causal/noncausal
model error terms ranges from near 1 (i.e. Cauchy distributed) to around 3 in most cases, which suggests
bubble like behaviour as discussed in Gourieroux and Zakoian (2012). The only exceptions to this are found in
lumber (λ ≈ 3.88), gasoline RBOB (λ ≈ 4.93), and live cattle (λ ≈ 3.39).
Moreover, an examination of the roots of the lag polynomials implied by the estimated parameters also confirms
the partly noncausal nature of the series. If we accept only the statistically significant estimated parameters12
and solve for the roots of the implied causal and noncausal lag polynomials, φ(L) and ϕ(L^{−1}) (from (1.17a)),
we find that the roots of both appropriately lie outside the unit circle.13 Of course, if the data generating process
was purely causal, none of the lags of the noncausal polynomial, ϕ(L−1), should be statistically significant.
Moreover, if we fit the best (according to the AIC criterion) purely causal ARMA model, with t-distributed
error terms instead of Gaussian ones, we often find that the estimated roots of the causal lag polynomial lie inside
the unit circle. This suggests misspecification of the noncausal component, as well as the fact that the noncausality
is not identified in the purely causal ARMA model with Gaussian innovations.
For reference we provide tables with all of the roots of the lag polynomials of both the causal ARMA models
of order (p, q), with t-distributed innovations, and the mixed causal/noncausal AR models of order (r, s) (see
Tables 1.7.i to 1.7.iii within Appendix 1.14).
For example, estimating purely causal ARMA models with t-distributed innovations suggest the following
results: wheat, coffee, rice, gold, platinum, all the energy series except natural gas, and lean hogs all share at least
one root with absolute value less than one in their δ(L) lag polynomial (that is, δ(L) = α(L)/β(L) in the ARMA
model δ(L) x_t = ε_t, where α(L) and β(L) are the AR and MA lag polynomials respectively), suggesting that this
polynomial could be factorized and then estimated as a mixed causal/noncausal model (instead of the traditional
differencing technique employed).
Furthermore, the very large valued roots of the causal polynomial for light crude oil, gas oil, and heating oil,
suggest that these series may be better represented as purely noncausal since these large causal roots have little
effect on the causal impulse response. This result is confirmed by looking at the mixed causal/noncausal model
roots of light crude oil, but not for gas or heating oil which have causal polynomial roots relatively close to 1.
Finally, the mixed causal/noncausal representation for soybeans suggests that the process may be better modeled
as purely causal, while the results for cotton, live cattle, and lean hogs suggest they may be purely noncausal.
To summarize, our results suggest that most of the futures price series exhibit much better in-sample model
fit, according to the AIC criterion, when modeled by a mixed causal/noncausal autoregressive specification that
takes into account their possible noncausal components. Moreover, this noncausality is unidentified in the purely
causal ARMA model with Gaussian innovations. Finally, estimation of purely causal ARMA models with fat
tailed, t-distributed, innovations reinforces the series’ noncausal nature, as often the causal lag polynomial roots
lie inside the complex unit circle.
12 Tested at the 5% level, assuming Normally distributed parameters and employing the inverse of the observed Hessian matrix at the MLE estimated value as the parameter covariance matrix.
13 Which implies that (1.11a), α(L) = φ(L)ϕ*(L), is such that the roots of φ(L) lie strictly outside the complex unit circle while those of ϕ*(L) lie strictly inside.
Table 1.5.i: Estimation results of mixed causal/noncausal AR(r, s) models
Series p/r q/s AIC Log-likelihood Ljung-Box λ
Soybean meal 6 2 52395.000 -26188.500 0.152 ∞
x 10 10 48208.261 -24081.130 0.007 2.070
10 10 48210.118 -24081.059 0.007 2.072
Soybean oil 8 3 11859.050 -5917.523 0.919 ∞
x 10 10 9211.876 -4582.938 0.135 2.455
10 10 9213.688 -4582.844 0.138 2.455
Soybeans 9 2 73548.210 -36762.110 0.521 ∞
10 10 69444.844 -34699.422 0.000 2.073
x 10 10 69438.354 -34695.177 0.000 2.086
Orange juice 4 3 42121.610 -21052.800 0.395 ∞
10 10 38686.959 -19320.480 0.378 2.326
x 10 10 38683.919 -19317.960 0.389 2.331
Sugar 10 2 7842.392 -3908.196 0.999 ∞
x 2 2 1549.499 -767.750 0.000 1.702
2 2 1551.289 -767.645 0.000 1.702
Wheat 7 2 67069.470 -33524.740 0.998 ∞
5 5 61896.849 -30935.424 0.000 2.028
x 5 5 61880.290 -30926.145 0.000 2.047
Cocoa 8 3 94368.760 -47172.380 0.716 ∞
2 1 91804.882 -45896.441 0.000 2.558
x 10 10 91586.110 -45769.055 0.003 2.584
Coffee 4 2 48866.800 -24426.400 0.064 ∞
10 10 43731.886 -21842.943 0.014 1.923
x 10 10 43730.300 -21841.150 0.012 1.925
Corn 7 3 59385.840 -29681.920 0.625 ∞
x 2 3 53647.827 -26815.913 0.776 1.811
2 3 53649.243 -26815.622 0.783 1.811
Cotton 10 0 32760.780 -16369.390 1.000 ∞
x 1 3 27005.831 -13495.916 0.000 2.455
1 3 27007.812 -13495.906 0.000 2.455
Table 1.5.ii: Estimation results of mixed causal/noncausal AR(r, s) models
Series p/r q/s AIC Log-likelihood Ljung-Box λ
Platinum 8 2 55936.820 -27957.410 0.129 ∞
10 10 51667.822 -25810.911 0.000 1.572
x 10 10 51644.800 -25798.400 0.000 1.585
Rice 10 3 -4799.022 2413.511 0.958 ∞
x 1 3 -7173.685 3593.842 0.013 2.076
1 3 -7172.345 3594.173 0.013 2.075
Lumber 8 3 44027.920 -22001.960 1.000 ∞
10 10 42939.948 -21446.974 0.562 3.874
x 10 10 42937.244 -21444.622 0.546 3.876
Gold 0 3 102914.500 -51453.270 n/a ∞
x 10 10 56917.739 -28435.869 0.000 1.317
10 10 56919.621 -28435.811 0.000 1.318
Silver 9 3 7424.036 -3699.018 0.935 ∞
10 10 -7052.297 3549.149 0.000 1.063
x 10 10 -7056.283 3552.141 0.000 1.066
Palladium 9 3 48209.690 -24091.840 0.992 ∞
x 8 8 42569.544 -21265.772 0.000 1.225
8 8 42571.492 -21265.746 0.000 1.225
Copper 10 0 34719.500 -17348.750 1.000 ∞
10 10 30533.482 -15243.741 0.000 1.349
x 10 10 30529.777 -15240.889 0.000 1.354
Light crude oil 7 2 22244.110 -11112.060 0.949 ∞
1 3 17297.702 -8641.851 0.015 1.409
x 1 3 17295.206 -8639.603 0.014 1.415
Heating oil 9 2 34465.280 -17220.640 0.998 ∞
x 10 10 30808.001 -15381.000 0.042 1.535
8 8 30841.794 -15400.897 0.000 1.538
Brent crude oil 7 2 18807.920 -9393.960 0.901 ∞
10 10 15081.643 -7517.822 0.000 1.458
x 10 10 15073.528 -7512.764 0.000 1.462
Table 1.5.iii: Estimation results of mixed causal/noncausal AR(r, s) models
Series p/r q/s AIC Log-likelihood Ljung-Box(a) λ(f)

Platinum(c) 8 2 55936.820 -27957.410 0.129 ∞
(d) 10 10 51667.822 -25810.911 0.000 1.572
x(b)(e) 10 10 51644.800 -25798.400 0.000 1.585
Gas oil 5 3 44142.240 -22062.120 0.922 ∞
10 10 41116.045 -20535.023 0.259 1.566
x 10 10 41112.456 -20532.228 0.261 1.574
Natural gas 3 2 -4178.268 2095.134 0.226 ∞
x 1 1 -7772.315 3891.158 0.017 1.666
1 1 -7771.454 3891.727 0.018 1.666
Gasoline RBOB 5 3 11715.320 -5848.658 0.988 ∞
2 1 11535.858 -5761.929 0.050 4.662
x 2 1 11526.267 -5756.133 0.056 4.925
Live cattle 6 1 22771.400 -11377.700 0.986 ∞
10 10 20427.885 -10190.943 0.915 3.331
x 8 8 20426.530 -10193.265 0.873 3.392
Lean hogs 3 2 23567.630 -11777.810 0.704 ∞
0 2 18929.149 -9459.574 0.572 2.728
x 0 2 18922.375 -9455.188 0.570 2.737
a The Ljung-Box statistics are given as p-values, where the lag parameter chosen is the log sample size, ln(T).
b The 'x' row for each series denotes the model with the lowest AIC.
c The first row in each series is the causal ARMA(p, q) model with Gaussian innovations estimated in Section 1.3.
d The second row is the mixed causal/noncausal AR(r, s) with t-distributed errors.
e The third row is the same model but with skew t-distributed errors.
f The λ column indicates the estimated degrees of freedom parameter for the error term distribution; in the skew t-distributed case this value represents the sum of the two skew parameters. See Appendix 1.11.5.
1.6 Comparison of the estimated unconditional distributions
Another way to evaluate the mixed causal/noncausal autoregressive model is to compare its model-based unconditional distribution with the sample histogram. Histograms are estimated for both the purely causal ARMA and
mixed causal/noncausal autoregressive models, both employing t-distributed error terms, by simulating long sample paths of length T = 200000, given the model parameters estimated by MLE in Section 1.5.3.
The mixed causal/noncausal autoregressive model seeks to capture both the asymmetries and bubble features
present in commodity futures prices. The transversal asymmetry and bubble features present in the series can
be examined visually by considering the sample histograms of the price series presented in Figures 1.11.i to
1.11.iv in Appendix 1.14. Note the long, positively skewed tails that many of the series exhibit, illustrating how these price series tend to spend most of the time in shallow troughs, occasionally interrupted by brief but dramatic positive bubbles.
The metric employed in comparing the estimated unconditional distributions is the Kullback-Leibler divergence measure, which is a non-symmetric measure of the difference between two probability distributions P and Q. Specifically, the Kullback-Leibler measure from continuous distribution Q to P, denoted KL(Q,P) = ∫_{−∞}^{∞} ln(p(x)/q(x)) p(x) dx, is the measure of the information lost when we use Q to approximate P.^14 Since the
Kullback-Leibler measure is “information monotonic”, as an ordinal measure of making comparisons it is invari-
ant to the choice of histogram bin size. Table 1.6 reports the Kullback-Leibler measures of the sample histogram
densities for both KL(P,Q) and KL(Q,P ) where p(x) denotes the estimated p.d.f. of the sample data and the
q(x)’s are model based estimates from the simulated sample paths of the purely causal and mixed causal/noncausal
autoregressions.
Table 1.6 is broken into two sections: the two left columns report the Kullback-Leibler measure where the
estimated models are used to approximate the sample data. In this case if the sample path density has zero support
for some region in its domain, it does not punish the prospective model density for allocating too much (resp. too
little) probability to this region since this component of the Kullback-Leibler sum is zero. The two right columns
report the opposite case where the sample path density is used to approximate the estimated models; in this case,
if the model density has zero support in some region of its domain, then the sample path density is not penalized
for allocating too much (resp. too little) probability to this region. Finally, smaller values indicate less information
lost by the approximation and are preferred.
The results of these comparisons suggest the following. First, the Kullback-Leibler measures show that the
unconditional distributions generated by the causal ARMA models represent a poor fit to the sample data. The
ARMA model seems unable to produce the sharp bubble-like behaviour we see in most of the series, and the
shape of its unconditional density is often much too uniform. It does not exhibit the long, positively skewed tails that
are present in many of the estimated histograms of the commodity futures prices as provided in Figures 1.11.i
to 1.11.iv in Appendix 1.14. Moreover, we often find that the sample paths from the causal ARMA models are
14 In employing estimated sample histograms we use the discretized version of the Kullback-Leibler formula, where areas of zero support are padded with 10^−315.
Table 1.6: Kullback-Leibler divergence measures
                     KL(Q,P)^a             KL(P,Q)
Series               ARMA     MIXED        ARMA      MIXED
Soybean meal         n.s.^b   00.329       n.s.      97.216
Soybean oil          01.965   00.316       495.751   55.752
Soybeans             n.s.     00.310       n.s.      49.584
Orange juice         00.976   00.216       351.966   60.033
Sugar                01.768   00.500       326.343   168.821
Wheat                00.535   00.427       44.699    32.956
Cocoa                00.625   01.247       230.260   37.961
Coffee               04.519   00.216       703.097   81.218
Corn                 01.526   00.549       185.980   144.244
Cotton               00.808   12.710       114.104   25.918
Rice                 00.429   00.311       59.220    123.030
Lumber               00.149   00.136       07.610    08.477
Gold                 n.s.     uns.^c       n.s.      uns.
Silver               n.s.     uns.         n.s.      uns.
Platinum             n.s.     00.662       n.s.      96.789
Palladium            n.s.     01.368       n.s.      440.585
Copper               n.s.     00.832       n.s.      173.295
Light crude oil      n.s.     00.813       n.s.      202.916
Heating oil          n.s.     01.043       n.s.      326.858
Brent crude oil      n.s.     00.759       n.s.      118.503
Gas oil              n.s.     00.709       n.s.      132.528
Natural gas          00.906   00.753       303.694   325.575
Gasoline RBOB        01.429   00.261       483.674   08.649
Live cattle          00.562   18.227       31.469    76.491
Lean hogs            02.649   00.032       640.295   03.308
average              01.346   01.858       284.154   121.335
selective average^d  01.206   00.650
a P represents the sample data.
b "n.s." stands for nonstationary, i.e. the simulations from the causal linear model were explosive.
c "uns.", within the context of the mixed causal/noncausal models, implies that the simulated sample paths were, for lack of a better word, "unstable": highly erratic, with extremely long tails and extremely irregular, almost "chaotic" type behaviour. In general, while stationary, models with "uns." listed represented poor candidates as having come from the data's DGP.
d The selective average omits the extreme outlying cases highlighted in bold.
explosive, due to the noncausal root in their estimated causal lag polynomials.
The results from the left hand columns of Table 1.6 suggest a few distinct outlying Kullback-Leibler measures.
For example, the measures for cotton and live cattle are extremely large compared to those of the other series in the case of the
mixed causal/noncausal autoregressive model, and coffee represents an outlier in the case of the causal ARMA
models. Given the presence of these outliers, we calculate both the average Kullback-Leibler measure across all
series and the average omitting these outliers. Given this selective average, we find that, in both the left and right
columns (i.e. in the case of both KL(Q,P ) and KL(P,Q), respectively), the mixed causal/noncausal model
represents a better fit to the sample data than the purely causal ARMA model.
Finally, Figure 1.8 provides an example of the estimated unconditional densities for cocoa and coffee, respectively.
Figure 1.8: Estimated unconditional densities, Cocoa and Coffee
1.7 Forecasting the mixed causal/noncausal model
This section first considers the problem of computing the predictive conditional density of the mixed
causal/noncausal model when the information set includes only the past values of the time series data up to some
time t, say Ft = {xt, xt−1, . . . , x1}. We then evaluate the ability of the mixed causal/noncausal model not
only to fit the training sample, but also to forecast out of sample.
1.7.1 The predictive distribution
Let us consider the general stochastic process:
xt = h(. . . , εt−1, εt, εt+1, . . . ), where εt ∼ i.i.d. (1.32)
Moreover, let Ft = {xt, xt−1, . . . , x1} represent the information set generated by the stochastic process up to and including time t.
The best nonlinear forecasts, at date t, for a given horizon h, can be deduced from the conditional distribution
of xt+h, given Ft. More precisely, if a(xt+h) is a square integrable transformation of xt+h, then its best predictor
is simply E[a(xt+h)|Ft] = ∫ a(xt+h) ft+h|t(xt+h|Ft) dxt+h.
In our framework, the standard moments may not exist, and so we cannot choose to predict a(xt+h) = xt+h, for example. An alternative approach is to compute prediction intervals by considering the quantiles of the predictive distribution. This is the solution adopted here.
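The quantile idea can be sketched in a few lines. This is our own illustration, not the thesis code: the draws below are i.i.d. t-distributed stand-ins for draws simulated from the predictive distribution, and the 95% level is just an example.

```python
import numpy as np

def prediction_interval(draws, level=0.95):
    """Read a prediction interval and a point forecast (the median) off the
    empirical quantiles of draws from the predictive distribution."""
    alpha = (1.0 - level) / 2.0
    return np.quantile(draws, [alpha, 0.5, 1.0 - alpha])

# stand-in draws: i.i.d. t(3), mimicking a heavy-tailed predictive distribution
rng = np.random.default_rng(0)
lo, med, hi = prediction_interval(rng.standard_t(3, size=100_000))
```

Because the interval is read off quantiles rather than moments, it remains well defined even when the predictive distribution has no finite mean or variance.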
Lanne, Luoto, and Saikkonen (2012) suggest a means whereby we can simulate these quantiles. Their numerical algorithm is discussed in Appendix 1.13. However, this method is computationally demanding and not necessarily the most straightforward. Therefore, we begin a discussion below that considers the problem from first principles.
1.7.2 Equivalence of information sets
Consider the general mixed causal/noncausal model from (1.17), with causal order r and noncausal order s. It is
clear that knowledge of the information set Ft = {x1, . . . , xt} is equivalent to knowledge of

Ft ≡ {x1, . . . , xr, ur+1, . . . , ut}, (1.33)
since ut = φ(L)xt [see Appendix 1.12]. Note that ut represents a shock to the process xt and is an autoregressive function of xt, since ut = φ(L)xt, but the ut's are not i.i.d. Rather, ut is a noncausal autoregressive process, since ϕ(L−1)ut = ϕ(L−1)φ(L)xt = εt, where εt is i.i.d.
Knowing the latter information set in (1.33) is also equivalent to knowing,
Ft ≡ {x1, . . . , xr, εr+1, . . . , εt−s, ut−s+1, . . . , ut}, (1.34)
since εt = ϕ(L−1)ut = ϕ(L−1)φ(L)xt. Moreover, this information is also equivalent to,
Ft ≡ {v1, . . . , vr, εr+1, . . . , εt−s, ut−s+1, . . . , ut}, (1.35)
where ϕ(L−1)xt = vt. Therefore, for the process ut that is noncausal of order s, predicting ut+1 based on the information set Ft is equivalent to predicting it based on the information subset {ut−s+1, . . . , ut}, since the elements {v1, . . . , vr, εr+1, . . . , εt−s} are independent of the future values ut+1, . . . , ut+h, for some forecast horizon h.
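To make the filtering relations ut = φ(L)xt and εt = ϕ(L−1)ut concrete, here is a minimal sketch. It is our own illustration, not the thesis code, assuming a mixed AR(1,1) with hypothetical coefficients φ1 = 0.5 (causal) and ϕ1 = 0.3 (noncausal) and t-distributed errors.

```python
import numpy as np

def causal_residuals(x, phi):
    """u_t = phi(L) x_t = x_t - phi_1 x_{t-1} - ... - phi_r x_{t-r}."""
    x = np.asarray(x, float)
    r = len(phi)
    u = x[r:].copy()
    for i, c in enumerate(phi, start=1):
        u -= c * x[r - i:len(x) - i]
    return u                                  # u_{r+1}, ..., u_T

def noncausal_residuals(u, vphi):
    """eps_t = vphi(L^{-1}) u_t = u_t - vphi_1 u_{t+1} - ... - vphi_s u_{t+s}."""
    u = np.asarray(u, float)
    s = len(vphi)
    e = u[:len(u) - s].copy()
    for i, c in enumerate(vphi, start=1):
        e -= c * u[i:len(u) - s + i]
    return e                                  # loses the last s values

# quick check on a simulated AR(1,1)
rng = np.random.default_rng(1)
T, phi1, vphi1 = 500, 0.5, 0.3
eps = rng.standard_t(df=3, size=T)
u = np.zeros(T)
for t in range(T - 2, -1, -1):                # backward: u_t = vphi1*u_{t+1} + eps_t
    u[t] = vphi1 * u[t + 1] + eps[t]
x = np.zeros(T)
for t in range(1, T):                         # forward: x_t = phi1*x_{t-1} + u_t
    x[t] = phi1 * x[t - 1] + u[t]
u_hat = causal_residuals(x, [phi1])
e_hat = noncausal_residuals(u_hat, [vphi1])   # recovers the i.i.d. eps sequence
```

Applying the causal filter first and the noncausal filter second recovers the u's and then the ε's exactly (up to the edge observations), mirroring the chain of equivalent information sets above.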
Therefore, in establishing the predictive density of the mixed causal/noncausal process xt, we can focus our
attention on the problem of predicting the noncausal component ut+1 conditional on the past information set
Ft = {ut−s+1, . . . , ut}, since there is a direct relationship between the predictive distributions of ut+1 and xt+1
in the sense that,
fxt+1|t(xt+1 − µt|xt, . . . , x1) = fxt+1|t(φ(L)xt+1|xt, . . . , x1) (1.36a)
= fut+1|t(ut+1|xt, . . . , x1) (1.36b)
= fut+1|t(ut+1|ut, . . . , ut−s+1, εt−s, . . . , εr+1, xr, . . . , x1) (1.36c)
= fut+1|t(ut+1|ut, . . . , ut−s+1), (1.36d)
where µt = φ1xt + φ2xt−1 + · · ·+ φrxt−(r−1). (1.36e)
Since the change of variables implies a Jacobian determinant of 1, the conditional density of xt+1 is just a relocation of the conditional density of ut+1. Here, µt represents a location parameter and so ut+1 = xt+1 − µt =
φ(L)xt+1. Therefore, by simulating the quantiles of fut+h|t(ut+h|Ft), we are able to generate prediction intervals
for xt+h.
The prediction problem of the noncausal process ϕ(L−1)ut+1 = εt+1, based on the past information set Ft, must be considered with some care. To this end, we first consider some simple examples. Note that while ut is a noncausal autoregressive process, we desire the causal predictor, which is based on the past information set Ft; this predictor is generally nonlinear for non-Gaussian εt.
1.7.3 Examples: the causal prediction problem of the noncausal process
Example 1: AR(0, 1) case
Let us consider the prediction problem for the purely noncausal model of order s = 1. We get xt+1 = ut+1 = ϕ1ut+2 + εt+1, where εt+1 is i.i.d. In this case we desire the predictive density fxt+1|t(xt+1|Ft), based on the past
values of the process, Ft = {xt, . . . , x1}, but where the process (xt) is noncausal.
Since xt = ut and s = 1, the predictive density fxt+1|t(xt+1|xt) depends only on the most recent observation
xt = ut, and by Bayes' theorem we get
fxt+1|t(xt+1|xt) = fxt|t+1(xt|xt+1)fx(xt+1)/fx(xt), (1.37)
where fx(·) denotes the stationary distribution of the process (xt). We already know the noncausal transition density fxt|t+1(xt|xt+1), since it is defined by our linear model and our assumption on the shocks εt: since ut = xt and ϕ(L−1)ut = εt, the conditional density of ut given ut+1 is the same as the density of εt, up to a location parameter. However, what is not clear is how to deal with the stationary distribution fx(·), since its analytical expression is unknown in the general case [although it has been derived where εt ∼ Cauchy(0, σ2) in Gourieroux and Zakoian (2012)]. Lanne, Luoto, and Saikkonen (2012) suggest a means whereby we can circumvent this problem by "enlarging the space" of random variables (see Appendix 1.13). This very computationally intensive approach is not the most direct, as we shall show below.
One alternative that works quite well when the order of the noncausal polynomial is low is to simply approximate the stationary distribution fx(xt+1) in (1.37) by means of a kernel smoother. For example, given the stationary nature of the data, xt = ut, a consistent estimator of fx(·) is given by the kernel density estimator:
fx(xt) ≈ (1/Th) Σ_{τ=1}^{T} K((xt − xτ)/h), (1.38)
where h > 0 is an appropriately chosen smoothing parameter defining the bandwidth and K(·) is a kernel function, for instance a symmetric function that integrates to one. The Epanechnikov kernel, K(x) = (3/4)(1 − x^2) 1{|x| ≤ 1}, can be shown to be efficient in the mean squared error sense [see e.g. Epanechnikov (1969)].
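The kernel approach of Example 1 can be sketched as follows. This is an illustrative implementation, not the thesis code: it assumes a purely noncausal AR(0,1) with t-distributed errors, a hypothetical coefficient ϕ1 = 0.3, and an ad hoc bandwidth.

```python
import numpy as np
from scipy import stats

def epanechnikov(z):
    # K(z) = (3/4)(1 - z^2) on |z| <= 1, zero elsewhere
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z**2), 0.0)

def kde(grid, sample, h):
    """Kernel estimate of the stationary density f_x, as in (1.38)."""
    z = (grid[:, None] - sample[None, :]) / h
    return epanechnikov(z).mean(axis=1) / h

def predictive_density(grid, x_t, sample, vphi1, df, h):
    """f(x_{t+1}|x_t) proportional to f_eps(x_t - vphi1*x_{t+1}) * f_x(x_{t+1}),
    as in (1.37); the f_x(x_t) term is a constant absorbed by renormalization."""
    dens = stats.t.pdf(x_t - vphi1 * grid, df) * kde(grid, sample, h)
    dx = grid[1] - grid[0]
    return dens / (dens.sum() * dx)          # renormalize on the uniform grid

# simulate the noncausal AR(0,1) backwards: u_t = vphi1*u_{t+1} + eps_t, x_t = u_t
rng = np.random.default_rng(0)
T, vphi1, df = 2000, 0.3, 5
eps = rng.standard_t(df, size=T)
x = np.zeros(T)
for t in range(T - 2, -1, -1):
    x[t] = vphi1 * x[t + 1] + eps[t]

grid = np.linspace(-30, 30, 1201)
pred = predictive_density(grid, x[-1], x, vphi1, df, h=0.5)
```

The resulting grid of density values integrates to one by construction, and its quantiles give the prediction interval for xt+1.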
Example 2: AR(0, s) with s > 1
Let us now consider a larger noncausal autoregressive order, where we still face the purely noncausal prediction
problem ut+1 = xt+1. Let the noncausal lag polynomial be of order s: ϕ(L−1) = 1− ϕ1L−1 − · · · − ϕsL−s.
Again, let us express the predictive density in terms of Bayes' theorem where, since ut = xt is a noncausal
autoregressive process of arbitrary order s, the prediction depends only on the subset of information given by
Ft = {xt, . . . , xt−s+1}:
fxt+1|t,...,t−s+1(xt+1|xt, . . . , xt−s+1) = fxt|t+1,...,t+s(xt−s+1|xt−s+2, . . . , xt, xt+1) fx(xt+1, xt, . . . , xt−s+2) / fx(xt, xt−1, . . . , xt−s+1). (1.39)
Again, fxt|t+1,...,t+s(xt−s+1|xt−s+2, . . . , xt, xt+1) is known from our linear noncausal autoregressive model of order s. However, it remains unclear how to deal with the joint stationary density fx(·) of a sequence of s successive values of the process, especially for a larger dimension s.
Indeed, the kernel estimator will prove problematic for large noncausal orders s, since we now face a multidimensional smoothing problem. As the dimension of the smoothing problem increases, much more data is required in order to obtain a reliable estimate of this joint density.
1.7.4 A Look-Ahead estimator of the predictive distribution
Gourieroux and Jasiak (2013) suggest a direct solution to the problem of computing the predictive density
fxt+1|t(·) of the noncausal process when the dimension s is relatively large. The method relies on the “Look-
Ahead” estimator of the stationary density fx(·) [see Glynn and Henderson (1998) for the introduction of this
estimator and Garibotti (2004) for an application]. First we describe the estimator in the univariate framework
where the order of the noncausal polynomial is s = 1, and then provide an analog for the case where s > 1.
Markov process
The Look-Ahead estimator, introduced by Glynn and Henderson (1998), is a relatively simple method which
allows us to estimate the stationary distribution of a Markov process, if it exists. Take, for example, the Markov
process (xt) discussed in Example 1 above, with unique invariant density fx(·) and transition density fxt|t+1(·)
as expressed in (1.37). This Markov transition density satisfies the Kolmogorov equation,

fx(x∗t) = ∫ fxt|t+1(x∗t|xt+1) fx(xt+1) dxt+1, ∀x∗t, (1.40)
where x∗t denotes the generic argument of the stationary density. Therefore, given a finite sample from the stationary process, (xτ) for τ = 1, . . . , t, we can approximate the stationary density by

f̂x(x∗t) = (1/t) Σ_{τ=0}^{t−1} fxt|t+1(x∗t|xτ+1), ∀x∗t, (1.41)
where fxt|t+1(x∗t |xt+1) is known explicitly from our linear noncausal autoregressive model.
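A sketch of (1.41) for the AR(0,1) example, where the transition density is fε(xt − ϕ1 xt+1). This is our own illustration under assumed parameters (hypothetical ϕ1 = 0.3 and t-distributed errors), not the thesis code.

```python
import numpy as np
from scipy import stats

def look_ahead_density(grid, x, vphi1, df):
    """Look-Ahead estimate of the stationary density, as in (1.41): average
    the known transition densities f(x* | x_{tau+1}) over the sample."""
    # for the noncausal AR(0,1), f(x_t | x_{t+1}) = f_eps(x_t - vphi1 * x_{t+1})
    return stats.t.pdf(grid[:, None] - vphi1 * x[None, 1:], df).mean(axis=1)

# simulate the noncausal AR(0,1) backwards and estimate its stationary density
rng = np.random.default_rng(0)
T, vphi1, df = 2000, 0.3, 5
eps = rng.standard_t(df, size=T)
x = np.zeros(T)
for t in range(T - 2, -1, -1):
    x[t] = vphi1 * x[t + 1] + eps[t]

grid = np.linspace(-30, 30, 1201)
f_hat = look_ahead_density(grid, x, vphi1, df)
```

Since each term in the average is itself a proper density, the estimate integrates to one automatically; in contrast to the kernel approach, no bandwidth choice is required.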
Markov process of order s
For larger noncausal order s > 1, the result is analogous. The two stationary distributions in the numerator and
denominator, fx(·), can be estimated by the Look-Ahead estimator as
f̂x(x∗t) = (1/(t−s+1)) Σ_{τ=0}^{t−s} lxt|t+1(x∗t|xτ+s), (1.42)

where xt denotes the vector (xt, xt−1, . . . , xt−s+1). The density above is more easily understood as the factorization of the joint
noncausal transition density,
lxt|t+1(xt, . . . , xt−s+1|xt+1, . . . , xt+s) = Π_{j=0}^{s−1} fxt|t+1,...,t+s(xt−j|xt+1−j, . . . , xt+s−j), (1.43)
whose terms are known for all j, given the linear noncausal autoregressive model (they are equal to the density of
εt, up to a location parameter).
1.7.5 Drawing from the predictive distribution by SIR method
Given the approximate expression for the stationary density functions fx(·) in both the numerator and denominator of (1.39), provided by the Look-Ahead estimator, we are now free to draw samples from the entire predictive density fxt+1|t(·) directly. One way this can be accomplished is by means of the Sampling Importance Resampling (SIR) technique [see Rubin (1988), and Smith and Gelfand (1992)].
The SIR method is essentially a reweighted bootstrap simulation. Suppose we have access to some draws from the continuous probability density f(x), say {x1, . . . , xN}, but are unable to generate further samples ourselves. The bootstrap procedure directs us to resample from the set {x1, . . . , xN}, each draw having probability 1/N. The resulting resampled set is then an approximation to draws from f(x), with the approximation error approaching zero as N → ∞. Indeed, for any resampled draw x we have,
Pr(x ≤ a) = (1/N) Σ_{i=1}^{N} 1{xi ≤ a} →_{N→∞} Ef[1{x ≤ a}] = ∫_{−∞}^{a} f(x) dx. (1.44)
Of course, the bootstrap is limited in that if our initial sample from f(x) is small, repeatedly resampling from this limited sample will provide a poor approximation. The SIR method circumvents this problem by allowing us to draw our initial sample from some instrumental density g(x). By resampling from this sample according to the weights f(x)/g(x), we are able to approximate a sample from f(x), rather than a sample from g(x). To show this, note that
Pr(x ≤ a) = (1/N) Σ_{i=1}^{N} (f(xi)/g(xi)) 1{xi ≤ a} →_{N→∞} Eg[(f(x)/g(x)) 1{x ≤ a}] = ∫_{−∞}^{a} f(x) dx. (1.45)
The closer the target f(x) is to the instrumental density g(x), the faster the rate of convergence.
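A minimal SIR sketch (our illustration, not the thesis implementation): here the target is a t-density we pretend we cannot sample from, and the instrumental density is Gaussian with an inflated scale, chosen so its draws cover the target's central region reasonably well.

```python
import numpy as np
from scipy import stats

def sir_sample(target_pdf, n_draws, n_resample, loc=0.0, scale=1.0, rng=None):
    """Sampling Importance Resampling with a Gaussian instrumental density g."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(loc, scale, size=n_draws)            # draws from g
    w = target_pdf(x) / stats.norm.pdf(x, loc, scale)   # importance weights f/g
    w /= w.sum()                                        # self-normalize
    return rng.choice(x, size=n_resample, replace=True, p=w)

# example: approximate draws from a t(4) density via a N(0, 3^2) instrumental
rng = np.random.default_rng(1)
draws = sir_sample(lambda v: stats.t.pdf(v, 4), 200_000, 10_000, scale=3.0, rng=rng)
```

Note that with a very heavy-tailed target, a Gaussian instrumental gives unbounded weights in the far tails; this is precisely why the calibration of the Gaussian proposal discussed in Section 1.7.6 matters in practice.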
Within the context of generating draws from the predictive density of the noncausal process, fxt+1|t(·), we should therefore generate draws from some proposal g(·) which closely approximates the target. Indeed, we have an approximate analytic expression for fxt+1|t(·), in terms of the product of the noncausal conditional density and the Look-Ahead estimators of the stationary densities (see equation (1.39)), but we are unable to draw from this density directly.
The SIR method is especially appealing since it can easily be parallelized at reduced computational cost. That is, we can draw the N samples from the predictive density in parallel, as opposed to, say, a Metropolis-Hastings algorithm, which is inherently sequential in nature.
Moreover, the Basel III voluntary regulatory standard on bank capital levels, stress testing, and market liquidity risk was agreed upon by the members of the Basel Committee on Banking Supervision between 2010 and 2011 and is scheduled to be introduced in 2018. Part of this regulation is the requirement that econometric models employed by financial institutions must be able to simulate future sample paths for asset prices. Of course, this is a prerequisite for performing stress tests. In this respect, the proposed methodology of Lanne, Luoto, and Saikkonen (2012) would be rejected by regulators.
Forecasts up to some horizon h
Given the joint predictive density conditional on Ft, out to some horizon h > 1, we can use the same SIR method to draw samples as in the case where h = 1, since we can factorize the joint density as the product of the expressions given in equation (1.39):
g(xt+h, . . . , xt+1|Ft) = Π_{j=1}^{h} fxt+j|t+j−1,...,t+j−s(xt+j|xt+j−1, . . . , xt+j−s) (1.46a)

= [Π_{j=1}^{h} fxt|t+1,...,t+s(xt+j−s|xt+j−s+1, . . . , xt+j)] × fx(xt+h, xt+h−1, . . . , xt+h−s+1) / fx(xt, xt−1, . . . , xt−s+1). (1.46b)
Therefore, since successive terms in the product cancel, as h gets large we need only estimate one term in each of the numerator and denominator by the Look-Ahead method. Of course, for the SIR simulation with horizon h, we require an h-dimensional proposal density g(·).
1.7.6 Application to commodity futures data
While the method described above is computationally intensive, it is clear that it is ripe for parallelization, since we can potentially draw each of the N samples from the h-dimensional predictive density, g(xt+h, . . . , xt+1|Ft), at the same time. In this sense, we have implemented the algorithm in parallel using the CUDA development libraries freely available from Nvidia at http://www.nvidia.ca/object/cuda_home_new.html. All that is required is an Nvidia GPU (graphics processing unit) and knowledge of the C programming language.
In order to evaluate forecasts, we set aside an additional 107 sample data points beyond the most recent date available within-sample, which is February 8th, 2013. This out-of-sample period therefore extends from February 11th to July 15th, 2013.^15
As an example, we now employ the Look-Ahead estimator of the stationary density, together with the SIR method, to generate draws from the predictive density of the mixed causal/noncausal model for the coffee futures series. The parameters of the model are those estimated in Section 1.5.3, where the shock is skew t-distributed.
In implementing the SIR approach, the instrumental distribution, that is, the importance function, has to be chosen close to the conditional distribution used to simulate the future asset price paths, that is, the predictive distribution outlined above. We select as the instrumental distribution a multivariate Gaussian distribution, parametrized by its vector of means and its variance-covariance matrix.
However, the first and second order moments of the conditional distribution do not necessarily exist. Therefore, the matching of the two distributions has to be based on other existing moments. Among the possible alternatives are calibrations based on the joint characteristic function, or calibration based on the first and second order moments of the square root of the absolute values of future prices, which do exist. We have followed the second calibration, which has the advantage of leading to a number of moment restrictions equal to the number of parameters to be matched. Finally, note that the square-root marginal and cross moments, both of the conditional distribution of interest and of the Gaussian approximation, have no closed form expression and have to be computed numerically, for instance by reapplying the modified Look-Ahead estimator for the conditional distribution.
Figure 1.9 provides the forecasted conditional median and 95% prediction intervals.
15 February 9th and 10th fall on a weekend.
Figure 1.9: Forecast predictive density for Coffee futures price series
1.8 Conclusion
The mixed causal/noncausal autoregressive model is able to capture asymmetries and bubble features present
in the data on commodity futures prices. It improves model fit over the causal ARMA model with Gaussian
innovations, according to the AIC criterion, since the mixed causal/noncausal autoregressive specification takes
into account possible noncausality. This noncausality is unidentified in the traditional time series model, that is,
the purely causal ARMA model with Gaussian innovations. Estimation of the purely causal ARMA models with
fat-tailed, t-distributed innovations emphasizes the noncausal nature of most series, where the causal lag
polynomial roots often lie inside the complex unit circle.
Moreover, inspection of the causal and noncausal lag polynomial roots of the mixed causal/noncausal autoregressive models suggests that longitudinal asymmetries can be accounted for by varying the causal and noncausal coefficient weights. Furthermore, allowing for a low degrees of freedom parameter in the fat-tailed t-distribution of the error term can account for bubble-like phenomena, and these bubbles can induce transversal asymmetries if the model's shock, εt, admits a skewed distribution. In this way the model can account for both the longitudinal and transversal asymmetries described in Ramsey and Rothman (1996).
Furthermore, a comparison of the unconditional distributions, by sample histogram and Kullback-Leibler measure, suggests that the mixed causal/noncausal model with t-distributed shocks is a much closer approximation to the data than the equivalent purely causal ARMA model.
Finally, taking into account noncausal components is especially important when producing forecasts. Indeed, the standard Gaussian causal model will provide a smooth term structure of linear forecasts with some long run equilibria. These forecasts are misleading in the presence of a noncausal component. Moreover, in many cases, including the energy and metals sectors, the causal polynomial admits explosive roots and so the forecasts do not exist. Employing a mixed causal/noncausal model therefore permits us to forecast the occurrence of future bubbles, including when they begin their build-up, when they crash, and what their magnitude will be.
1.9 References
ANDREWS, B., R.A. DAVIS AND F.J. BREIDT (2006): “Maximum Likelihood Estimation for All-Pass TimeSeries Models,” Journal of Multivariate Analysis, 97, 1638-1659.
AZZALINI, A., AND A. CAPITANIO (2003): “Distributions Generated by Perturbation of Symmetry with Em-phasis on a Multivariate Skew t-Distribution,” Journal of the Royal Statistical Society, Series B, 65, 2, 367-389.
BLACK, F. (1976): “The Pricing of Commodity Contracts,” The Journal of Financial Economics, 3, 167-179.
BLANCHARD, O.J. (1979): “Speculative Bubbles, Crashes, and Rational Expectations,” Economic Letters, 3,387-389.
BLANCHARD, O.J., AND M. WATSON (1982): “Bubbles, Rational Expectations and Financial Markets,” Na-
tional Bureau of Economic Research, Working Paper No. 945.
BLANK, S.C. (1991): “Chaos in Futures Markets? A Nonlinear Dynamical Analysis,” The Journal of Futures
Markets, 11, 6, 711-728.
BLOOMBERG L.P. (2013): "Futures Price Data for Various Continuous Contracts," Bloomberg database, University of Toronto, Mississauga, Li Koon Chun Finance Learning Center.
BREEDEN, D. (1979): “An Intertemporal Asset Pricing Model with Stochastic Consumption and InvestmentOpportunities,” Journal of Financial Economics, 7, 3, 265-296.
BREIDT, J., R. DAVIS, K. LII, AND M. ROSENBLATT (1991): “Maximum Likelihood Estimation for Non-causal Autoregressive Processes,” Journal of Multivariate Analysis, 36, 175-198.
BRENNAN, M.J. (1958): “The Supply of Storage,” American Economic Review, 47, 50-72.
—————- (1991): “The Price of Convenience and the Valuation of Commodity Contingent Claims,” in Stochas-
tic Models and Options Values, ed. by D. Land, and B. Oksendal, Elsevier Science Publishers.
BROCK, W.A., W.D. DECHERT, J. SCHEINKMAN, AND B. LEBARON (1996): “A Test for IndependenceBased on the Correlation Dimension,” Econometric Reviews, 15, 3, 197-235.
BROCK, W.A., AND C.H. HOMMES (1998): “Heterogenous Beliefs and Routes to Chaos in a Simple AssetPricing Model,” Journal of Economic Dynamics and Control, 22, 8-9, 1235-1274.
BROOKS, C., E. LAZAR, M. PROKOPCZUK, AND L. SYMEONIDIS (2011): “Futures Basis, Scarcity andCommodity Price Volatility: An Empirical Analysis,” International Capital Markets Association Center, WorkingPaper, University of Reading.
CHENG, Q. (1992): “On the Unique Representation of Non Gaussian Linear Processes,” Annals of Statistics, 20,1143-1145.
CRUZ LOPEZ, J., J.H. HARRIS, C. HURLIN, AND C. PERIGNON (2013): “CoMargin,” Bank of Canada,
Working Paper.
DEATON, A., AND G. LAROQUE (1996): “Competitive Storage and Commodity Price Dynamics,” Journal of
Political Economy, 104, 5, 896-923.
DECOSTER, G.P., W.C. LABYS, AND D.W. MITCHELL (1992): “Evidence of Chaos in Commodity FuturesPrices,” The Journal of Futures Markets, 12, 3, 291-305.
DUSAK, K. (1973): “Futures Trading and Investor Returns: An Investigation of Commodity Market Risk Premi-ums,” Journal of Political Economy, 81, 1387-1406.
EPANECHNIKOV, V.A. (1969): “Non-Parametric Estimation of a Multivariate Probability Density,” Theory of
Probability and its Applications, 14, 153-158.
EVANS, G. (1991): “Pitfalls in Testing for Explosive Bubbles in Asset Prices,” The American Economic Review,81, 4, 922-930.
FAMA, E.F., AND K.R. FRENCH (1987): “Commodity Futures Prices: Some Evidence on Forecast Power, Pre-miums, and the Theory of Storage,” The Journal of Business, 60, 1, 55-73.
FINDLEY, D.F. (1986): “The Uniqueness of Moving Average Representations with Independent and IdenticallyDistributed Random Variables for Non-Gaussian Stationary Time Series,” Biometrika, 73, 2, 520-521.
FROST, R. (1986): Trading Tactics: A Livestock Futures Anthology, ed. by Todd Lofton, Chicago MercantileExchange.
FULKS, B. (2000): "Back-Adjusting Futures Contracts," Trading Recipes DB, http://www.trade2win.com/boards/attachments/commodities/90556d1283158105-rolling-futures-contracts-cntcontr.pdf.
GARIBOTTI, G. (2013): “Estimation of the Stationary Distribution of Markov Chains,” PhD Dissertation, Uni-versity of Massachusetts, Amherst, Department of Mathematics and Statistics.
GAY, D.M. (1990): “Usage Summary for Selected Optimization Routines,” Computing Science Technical Report,
No. 153, https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/Rnlminb2/inst/doc/PORT.pdf?revision=4506&root=rmetrics, New Jersey: AT&T Bell Labs.
GIBSON, R., AND E.S. SCHWARTZ (1990): “Stochastic Convenience Yield and the Pricing of Oil ContingentClaims,” The Journal of Finance, 45, 3, 959-976.
GLYNN, P., AND S. HENDERSON (1998): Estimation of Stationary Densities for Markov Chains, Winter Sim-ulation Conference, ed. by D. Medeiros, E. Watson, J. Carson and M. Manivannan, Piscataway, NJ: Institute ofElectrical and Electronics Engineers.
GOODWIN, B.K., AND N.E. PIGGOTT (2001): “Spatial Market Integration in the Presence of Threshold Ef-fects,” The American Journal of Agricultural Economics, 83, 2, 302-317.
GOURIEROUX, C., AND J. JASIAK (2005): “Nonlinear Innovations and Impulse Responses with Applicationto VaR Sensitivity,” Annals of Economics and Statistics, 1-31.
—————- (2013), “Filtering, Prediction, and Estimation of Noncausal Processes,” CREST, DP.
GOURIEROUX, C., J.J. LAFFONT, AND A. MONFORT (1982): “Rational Expectations in Dynamic LinearModels: Analysis of the Solutions,” Econometrica, 50, 2, 409-425.
GOURIEROUX, C., AND J.M ZAKOIAN (2012): “Explosive Bubble Modelling by Noncausal Cauchy Autore-gressive Process,” Center for Research in Economics and Statistics, Working Paper.
GRASSBERGER, P., AND I. PROCACCIA (1983): “Measuring the Strangeness of Strange Attractors,” Physica
D: Nonlinear Phenomena, 9, 1, 189-208.
HALLIN, M., C. LEFEVRE, AND M. PURI (1988): “On Time-Reversibility and the Uniqueness of MovingAverage Representations for Non-Gaussian Stationary Time Series,” Biometrika, 71, 1, 170-171.
HANSEN, L.P, AND T.J. SARGENT (1991): “Two Difficulties in Interpreting Vector Autogressions,” in Rational
Expectations Econometrics, ed. by L.P. Hansen and T.J. Sargent, Boulder, CO: Westview Press Inc., 77-119.
HYNDMAN, R.J., AND Y. KHANDAKAR (2008): “Automatic Time Series Forecasting: The Forecast Package for R,” Journal of Statistical Software, 27, 3.
JONES, M.C. (2001): “A Skew-t Distribution,” in Probability and Statistical Models with Applications, ed. by A. Charalambides, M.V. Koutras, and N. Balakrishnan, Chapman & Hall/CRC Press.
KALDOR, N. (1939): “Speculation and Economic Stability,” Review of Economic Studies, October, 7, 1-27.
KNITTEL, C.R., AND R.S. PINDYCK (2013): “The Simple Economics of Commodity Price Speculation,” National Bureau of Economic Research, Working Paper No. 18951.
LANNE, M., J. LUOTO, AND P. SAIKKONEN (2012): “Optimal Forecasting of Noncausal Autoregressive Time Series,” International Journal of Forecasting, 28, 3, 623-631.
LANNE, M., H. NYBERG, AND E. SAARINEN (2011): “Forecasting U.S. Macroeconomic and Financial Time Series with Noncausal and Causal Autoregressive Models: a Comparison,” Helsinki Center of Economic Research, Discussion Paper No. 319.
LANNE, M., AND P. SAIKKONEN (2008): “Modeling Expectations with Noncausal Autoregressions,” Helsinki
Center of Economic Research, Discussion Paper No. 212.
LJUNG, G.M., AND G.E.P. BOX (1978): “On a Measure of a Lack of Fit in Time Series Models,” Biometrika, 65, 2, 297-303.
LOF, M. (2011): “Noncausality and Asset Pricing,” Helsinki Center of Economic Research, Discussion Paper No. 323.
MASTEIKA, S., A.V. RUTKAUSKAS, AND J.A. ALEXANDER (2012): “Continuous Futures Data Series for Back Testing and Technical Analysis,” International Conference on Economics, Business and Marketing Management, 29, Singapore: IACSIT Press.
MUTH, J. (1961): “Rational Expectations and the Theory of Price Movements,” Econometrica, 29, 315-335.
NEFTCI, S.N. (1984): “Are Economic Time Series Asymmetric Over the Business Cycle?,” Journal of Political Economy, 92, 307-328.
NELDER, J.A., AND R. MEAD (1965): “A Simplex Method for Function Minimization,” The Computer Journal,
7, 4, 308-313.
NOLAN, J. (2009): “Stable Distributions: Models for Heavy Tailed Data,” http://academic2.american.edu/~jpnolan/stable/chap1.pdf, American University.
RAMIREZ, O.A. (2009): “The Asymmetric Cycling of U.S. Soybeans and Brazilian Coffee Prices: An Opportunity for Improved Forecasting and Understanding of Price Behavior,” Journal of Agricultural and Applied Economics, 41, 1, 253-270.
RAMSEY, J., AND P. ROTHMAN (1996): “Time Irreversibility and Business Cycle Asymmetry,” Journal of Money, Credit and Banking, 28, 1-21.
ROSENBLATT, M. (2000): Gaussian and Non-Gaussian Linear Time Series and Random Fields, New York: Springer Verlag.
ROSS, S. (1976): “The Arbitrage Theory of Capital Asset Pricing,” Journal of Economic Theory, 13, 3, 341-360.
RUBIN, D.B. (1988): “Using the SIR Algorithm to Simulate Posterior Distributions,” in Bayesian Statistics 3, ed. by J.M. Bernardo, M.H. DeGroot, D.V. Lindley, and A.F.M. Smith, Oxford: Oxford University Press, 395-402.
SHARPE, W.F. (1964): “Capital Asset Prices: A Theory of Market Equilibrium Under Conditions of Risk,” Jour-
nal of Finance, 19, 3, 425-442.
SIGL-GRUB, C., AND D. SCHIERECK (2010): “Speculation and Nonlinear Price Dynamics in Commodity Futures Markets,” Investment Management and Financial Innovations, 7, 1, 62-76.
SMITH, A.F.M., AND A.E. GELFAND (1992): “Bayesian Statistics Without Tears: A Sampling-Resampling Perspective,” The American Statistician, 46, 2, 84-88.
TERASVIRTA, T. (1994): “Specification, Estimation, and Evaluation of Smooth Transition Autoregressive Models,” Journal of the American Statistical Association, 89, 425, 208-218.
TONG, H., AND K.S. LIM (1980): “Threshold Autoregression, Limit Cycles, and Cyclical Data,” Journal of the
Royal Statistical Society, Series B, 42, 3, 245-292.
TSAY, R.S. (2010): Analysis of Financial Time Series, 3rd ed., New Jersey: Wiley Press.
WHITE, H. (1980): “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity,” Econometrica, 48, 4, 817-838.
WORKING, H. (1933): “Price Relations Between July and September Wheat Futures at Chicago Since 1885,” Wheat Studies of the Food Research Institute, 9, 6, 187-238.
—————- (1948): “Theory of the Inverse Carrying Charge in Futures Markets.” Journal of Farm Economics,
30, 1, 1-28.
—————- (1949): “The Theory of the Price of Storage,” American Economic Review, 39, 1254-1262.
YANG, S.R., AND W. BRORSEN (1993): “Nonlinear Dynamics of Daily Futures Prices: Conditional Heteroskedasticity or Chaos?,” The Journal of Futures Markets, 13, 2, 175-191.
1.10 Appendix: Rolling over the futures contract
Consider first the “fair price” of the futures contract implied by the spot-futures parity theorem. The theorem implies that, given the assumption of well-functioning competitive markets, a constant annual risk-free rate of interest $r_f$, and a cost of carry $c$, no arbitrage should ensure that the following relationship between the futures and spot price of the underlying commodity holds at time $t$:
$$F_{t,t+k} = S_t\Big(1 + \frac{k}{365}(r_f + c)\Big), \qquad (1.47)$$
where $c \in [0,1]$. That is, given the exploitation of arbitrage opportunities, the cost of purchasing the underlying good at price $S_t$ today and holding it until $t+k$ (given the opportunity cost of capital and the cost of carry) should equal the current futures price $F_{t,t+k}$. Of course, this relationship implies that as the maturity date approaches (i.e. as $k \to 0$) we have $F_{t,t} = S_t$.
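As a quick numerical illustration of (1.47), the sketch below computes the parity fair price; the rate, carry, and day counts are hypothetical.

```python
# Hedged sketch of the spot-futures parity relation in (1.47):
# F_{t,t+k} = S_t * (1 + (k/365) * (r_f + c)).
def parity_futures_price(spot, k_days, r_f, c):
    """Fair futures price k_days before maturity, given an annual
    risk-free rate r_f and cost of carry c (both as decimals)."""
    return spot * (1.0 + (k_days / 365.0) * (r_f + c))

# As maturity approaches (k -> 0) the futures price converges to spot.
print(parity_futures_price(100.0, 0, 0.05, 0.02))    # 100.0
print(parity_futures_price(100.0, 365, 0.05, 0.02))  # 107.0
```

At a full year to maturity the 7% combined carrying cost is earned in full; at maturity the futures and spot prices coincide.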
This relationship is an approximate one and will not hold exactly in reality: the risk-free rate and the cost of carry vary over time and are uncertain, and some goods are perishable and cannot be stored indefinitely. Nevertheless, the relationship is useful for thinking about the rolling over of futures contracts, since if we keep a given futures contract in a portfolio, its residual maturity will decrease. The formula in (1.47) demonstrates this effect and the need to adjust the level of the futures price series if we want it to maintain the same residual maturity.
As the future approaches maturity, we also wish to extend the price series and obtain price data for each date. To do so we would have to close out our current position and then open a new position in the futures contract of the next maturity. For example, suppose we are holding a futures contract that expires at time $t+k$, and $k$ is approaching 0. We could sell this futures contract and purchase a new contract on the same underlying good that expires at time $t+k+j$. However, in doing so we would clearly incur a loss, since:
$$1 + \frac{k}{365}(r_f + c) < 1 + \frac{k+j}{365}(r_f + c) \qquad (1.48)$$
by the spot-futures parity theorem. This is known as rollover risk, and the difference between the two prices is called the calendar spread.
However, this loss for the trader should not be counted as part of the historical price series we use for forecasting, since it represents a predictable discontinuity in the series. Therefore futures price series are typically also adjusted for this calendar spread by the data provider. There are a few ways to go about doing this, each with its pros and cons:16
1. Simply append the prices together without any adjustment. This will distort the series by introducing spurious autocorrelation.
2. Directly adjust the prices up or down according to either the new or the old contract at the rollover time period. This can be done by subtracting the difference between the two price series, or by multiplying one of the price series by the ratio of the two (i.e. the absolute or relative difference, respectively). This method works, but it causes either the newer or the older contract prices to diverge further and further from their original values as we append additional contracts. Moreover, it leaves the choice of adjustment rather arbitrary.
3. Continuously adjust the price series over time. This method melds together the futures contract prices of
16See Fulks (2000), a widely disseminated PDF document available on the world wide web. Alternatively, Masteika et al. (2012) provides a more recent treatment of the relevant issues.
both the “front month” contract (i.e. the contract with the shortest time-to-maturity) with the contracts of longer times-to-maturity (the “back month” contracts) in a continuous manner. This allows us to create a continuous contract price which reflects an “unobserved” futures contract maintaining a fixed time-to-maturity as time progresses. Ultimately, we are free to choose a model whereby we reconstitute the unobserved futures contract price by employing the information in the prices of observed contracts of different maturities.
Example: Smooth transition model
Consider two futures contracts on the same underlying commodity, one with time-to-maturity $k$, the other with time-to-maturity $k+j$, where we assume that their prices, $F_{t,t+k}$ and $F_{t,t+k+j}$, approximately satisfy the no-arbitrage condition of the spot-futures parity theorem. Moreover, let $\varepsilon_{i,t}$ for $i = 1, 2$ be error terms satisfying the standard assumptions of a regression model. The price variables $F_{t,t+k}$, $F_{t,t+k+j}$, and $S_t$ are observable, as is the current risk-free rate $r_{f,t}$. The cost of carry $c_t$ is unobservable, since it includes a convenience yield, and so we must estimate it. Either way, we can then write down the model:
$$F_{t,t+k} = S_t\Big(1 + \frac{k}{365}(r_{f,t} + c_t)\Big) + \varepsilon_{1,t} \qquad (1.49a)$$
$$F_{t,t+k+j} = S_t\Big(1 + \frac{k+j}{365}(r_{f,t} + c_t)\Big) + \varepsilon_{2,t} \qquad (1.49b)$$
$$P_t = \alpha F_{t,t+k} + (1-\alpha)F_{t,t+k+j} \qquad (1.49c)$$
where $\varepsilon_{i,t}$ represents a residual deviation away from the spot-futures parity fair value, $\alpha = k/K$, where $K$ is an upper bound on $k+j$ (that is, the time to maturity when the future is first issued), and $j$ is sufficiently large so that the difference in futures prices is not negligible (typically $j \geq 30$, since futures contracts of different maturities are indexed by month).
$P_t$, therefore, represents our estimate of the unobserved contract which incorporates the information in the front and back month contracts. Since the spot-futures parity does not hold exactly, $P_t$ reflects not just the spot price $S_t$, the risk-free rate $r_{f,t}$, and the cost of carry $c_t$, but also some residual error factors $\varepsilon_{i,t}$ for $i = 1, 2$.
The Bloomberg console allows the user to specify various criteria which modify how the continuous contract price series is constructed from the front and back month contracts. Any of the three methods above is available. In constructing the price series data employed in this paper we use a method similar to (3) above, but simpler in its weighting. The continuous contract futures price $P_t$ equals the front month contract price $F_{t,t+k}$ until the contract has 30 days left to maturity, i.e. until $k = 30$. From that point on, the continuous contract reflects a weighted average of the front month and the next back month contract, with the weights reflecting the number of days left until maturity of the front month contract. That is,
$$P_t = \Big(\frac{k}{d}\Big)F_{t,t+k} + \Big(\frac{d-k}{d}\Big)F_{t,t+k+j}, \qquad (1.50)$$
where $d = 30$ represents the total number of days in the month and $k$ is the number of days remaining until maturity of the front month contract. Once $k = 0$, the price is $P_t = F_{t,t+j}$, until this new front month contract again has 30 days left until maturity, i.e. $j = 30$. If the difference in time-to-maturity between all contracts is fixed at 30 days (i.e. a different contract matures every month), then this scheme represents the reconstitution of an unobserved futures contract with a fixed time-to-maturity of 30 days, as time progresses forward indefinitely.
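The weighting scheme in (1.50) can be sketched as follows; the prices and day counts below are hypothetical.

```python
# Rough sketch of the rollover weighting in (1.50): pure front-month
# price until 30 days to maturity, then a linear blend whose weight on
# the front month decays with the days remaining.
def continuous_price(front, back, k_days, d=30):
    """Continuous contract price from front/back month prices, where
    k_days is the number of days the front month has left to maturity."""
    if k_days >= d:
        return front
    w = k_days / d
    return w * front + (1 - w) * back

print(continuous_price(100.0, 102.0, 30))  # 100.0: still the front month
print(continuous_price(100.0, 102.0, 15))  # 101.0: equal-weight blend
print(continuous_price(100.0, 102.0, 0))   # 102.0: fully rolled over
```

At $k = d$ the continuous price equals the front month; at $k = 0$ it has rolled fully into the back month, with no discontinuity in between.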
1.11 Appendix: Mixed causal/noncausal process
In this appendix we provide the definitions of mixed causal/noncausal processes and review several of their properties employed in the main part of the text.
1.11.1 Strong moving average
The infinite moving average $Y_t = \sum_{i=-\infty}^{\infty} a_i\varepsilon_{t-i}$, where $(\varepsilon_t)$ is a sequence of i.i.d. variables, that is, a strong white noise, can be defined even for a white noise without first and/or second order moments.

Let us consider the Banach space $L^p$ of the real random variables such that $\|Y\|_p = \left(E[|Y|^p]\right)^{1/p}$ exists, for a given $p$. For expository purposes we consider the Banach space, which requires $p \geq 1$; however, the existence of the process can also be proved for $0 < p \leq 1$. If $\|\varepsilon_t\|_p$ exists and if the set of moving average coefficients is absolutely summable, $\sum_{i=-\infty}^{\infty}|a_i| < \infty$, then the series with elements $a_i\varepsilon_{t-i}$ is such that
$$\sum_{i=-\infty}^{\infty}\|a_i\varepsilon_{t-i}\|_p = \sum_{i=-\infty}^{\infty}|a_i|\,\|\varepsilon_{t-i}\|_p = \Big(\sum_{i=-\infty}^{\infty}|a_i|\Big)\|\varepsilon_t\|_p < \infty,$$
since $\|\varepsilon_t\|_p$ is independent of the date $t$. Thus the series with elements $a_i\varepsilon_{t-i}$ is normally convergent. In particular the variable $Y_t = \sum_{i=-\infty}^{\infty} a_i\varepsilon_{t-i}$ has a meaning for the $\|\cdot\|_p$ convergence, in the sense that
$$Y_t = \lim_{n\to\infty,\,m\to\infty}\; \sum_{i=-m}^{n} a_i\varepsilon_{t-i}, \qquad (1.51)$$
where the limit is with respect to the $L^p$-norm. Moreover, the limit $Y_t$ has a finite $L^p$-norm, with $\|Y_t\|_p \leq \big(\sum_{i=-\infty}^{\infty}|a_i|\big)\|\varepsilon_t\|_p < \infty$.
The $L^p$ convergence implies convergence in distribution. The distribution of the process $(\varepsilon_t)$ is invariant with respect to the time lag, that is, to the operator $L$ which transforms the process $(\varepsilon_t)$ into $L(\varepsilon_t) = (\varepsilon_{t-1})$. Since the process $(Y_t)$ is derived from the white noise $(\varepsilon_t)$ by a time-invariant function, we deduce that the distribution of $(Y_t)$ is the same as the distribution of $L(Y_t) = (Y_{t-1})$, that is, $(Y_t)$ is a strongly stationary process.
Similar arguments apply to any moving average transformation of a strongly stationary process existing in $L^p$, that is, to:
$$X_t = \sum_{j=-\infty}^{\infty} b_j Y_{t-j}, \qquad (1.52)$$
whenever $\sum_{j=-\infty}^{\infty}|b_j| < \infty$, since $\|Y_t\|_p$ is finite and time independent. In particular, we can as usual compound moving averages. From the equations:
$$Y_t = \sum_{i=-\infty}^{\infty} a_i\varepsilon_{t-i} = a(L)\varepsilon_t, \quad \text{with } a(L) = \sum_{i=-\infty}^{\infty} a_i L^i, \qquad (1.53a)$$
$$X_t = \sum_{j=-\infty}^{\infty} b_j Y_{t-j} = b(L)Y_t, \quad \text{with } b(L) = \sum_{j=-\infty}^{\infty} b_j L^j, \qquad (1.53b)$$
we can deduce
$$X_t = b(L)a(L)\varepsilon_t, \qquad (1.54)$$
that is, the moving average representation of the process $(X_t)$ in terms of the underlying strong white noise $(\varepsilon_t)$. The new moving average operator
$$c(L) = b(L)a(L) = \sum_{k=-\infty}^{\infty} c_k L^k \qquad (1.55)$$
admits moving average coefficients given by
$$c_k = \sum_{i=-\infty}^{\infty} a_i b_{k-i} = \sum_{j=-\infty}^{\infty} a_{k-j} b_j, \quad \forall k. \qquad (1.56)$$
1.11.2 Identification of a strong moving average representation
The question of the identification of a strong moving average representation is as follows. Let us consider a strong moving average process in $L^p$, $Y_t = \sum_{i=-\infty}^{\infty} a_i\varepsilon_{t-i}$. Is it possible to also write this process as $Y_t = \sum_{i=-\infty}^{\infty} a_i^*\varepsilon_{t-i}^*$, that is, with different noise and moving average coefficients? Of course the white noise is defined up to a multiplicative positive scalar $c$, since
$$Y_t = \sum_{i=-\infty}^{\infty} a_i^*\varepsilon_{t-i}^*, \quad \text{with } a_i^* = a_i/c, \;\; \varepsilon_t^* = c\,\varepsilon_t. \qquad (1.57)$$
The identification conditions below have been derived previously in Findley (1986), Hallin, Lefevre, and Puri(1988), and Cheng (1992).
Identification condition
i) The moving average representation is identifiable, up to a multiplicative positive scalar and a drift of the time index for the noise process, if and only if the distribution of the white noise is not Gaussian.
ii) If the white noise is Gaussian, the process always admits a causal Gaussian representation,
$$Y_t = \sum_{i=0}^{\infty} a_i^*\varepsilon_{t-i}^*, \quad \text{with } \varepsilon_t^* \sim IIN(0,1). \qquad (1.58)$$
As a consequence, a general linear process which is not purely causal, that is, which depends on at least one future shock (i.e. $a_i \neq 0$ for at least one negative index $i$), cannot admit a strong linear causal representation. Equivalently, its strong causal representation will automatically feature nonlinear dynamics.
1.11.3 Probability distribution functions of the stationary strong form noncausal representation
It can be shown that the unconditional distribution of the process in equation (1.4) is given as
$$f_t(x_t) = \frac{1-|\rho|}{\sigma_\varepsilon\pi}\,\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + (1-|\rho|)^2 x_t^2} \qquad (1.59)$$
[Gourieroux and Zakoian (2012), Proposition 1]. This unconditional distribution is independent of the date $t$ by the strong stationarity property.
Moreover, the Markov transition distribution (conditional density) of the forward-looking process is given as
$$f_{t|t+1}(x_t|x_{t+1}) = \frac{1}{\sigma_\varepsilon\pi}\,\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + z_t^2}, \quad \text{where } z_t = x_t - \rho x_{t+1}, \qquad (1.60)$$
which follows from the definition of the Cauchy distribution.
Therefore, from Bayes' theorem along with equations (1.59) and (1.60), we have that
$$f_{t+1|t}(x_{t+1}|x_t) = f_{t|t+1}(x_t|x_{t+1})\,f_{t+1}(x_{t+1})/f_t(x_t) \qquad (1.61a)$$
$$= \frac{1}{\sigma_\varepsilon\pi}\,\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2+z_t^2}\left[\frac{1-|\rho|}{\sigma_\varepsilon\pi}\,\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2+(1-|\rho|)^2x_{t+1}^2}\right]\Big/\left[\frac{1-|\rho|}{\sigma_\varepsilon\pi}\,\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2+(1-|\rho|)^2x_t^2}\right] \qquad (1.61b)$$
$$= \frac{1}{\sigma_\varepsilon\pi}\,\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2+z_t^2}\cdot\frac{\sigma_\varepsilon^2+(1-|\rho|)^2x_t^2}{\sigma_\varepsilon^2+(1-|\rho|)^2x_{t+1}^2}, \qquad (1.61c)$$
which provides the causal transition density of the process [Gourieroux and Zakoian (2012), Proposition 2].
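These closed-form densities can be verified numerically; the sketch below checks that the unconditional density (1.59) and the causal transition density (1.61c) each integrate to one, under illustrative values of $\rho$, $\sigma_\varepsilon$, and $x_t$.

```python
import numpy as np

# Numerical check of the noncausal Cauchy AR(1) densities (1.59) and
# (1.61c); rho, sig, and x_t are illustrative values only.
rho, sig, x_t = 0.5, 1.0, 1.0

def f_marginal(x):                                   # eq (1.59)
    return (1 - abs(rho)) / (sig * np.pi) * sig**2 / (sig**2 + (1 - abs(rho))**2 * x**2)

def f_causal(x_next):                                # eq (1.61c), given x_t
    z = x_t - rho * x_next
    return (1 / (sig * np.pi)) * sig**2 / (sig**2 + z**2) \
        * (sig**2 + (1 - abs(rho))**2 * x_t**2) / (sig**2 + (1 - abs(rho))**2 * x_next**2)

grid = np.linspace(-2000.0, 2000.0, 2_000_001)
dx = grid[1] - grid[0]
print(np.sum(f_marginal(grid)) * dx)  # ~1 (Cauchy tails make convergence slow)
print(np.sum(f_causal(grid)) * dx)    # ~1 (causal transition decays like x^{-4})
```

The causal transition integrates to one much faster because its tails decay polynomially faster than the Cauchy marginal's.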
1.11.4 The causal strong autoregressive representation
A nonlinear causal innovation $(\eta_t)$ of the process $(x_t)$ is a strong white noise such that we can write the current value of the process $x_t$ as a nonlinear function of its own past value $x_{t-1}$ and $\eta_t$: $x_t = G(x_{t-1}, \eta_t)$, say, where $x_t$ and $\eta_t$ are in a continuous one-to-one relationship given any $x_{t-1}$ [Rosenblatt (2000)].
Moreover, since the conditional cumulative distribution function of $x_t|x_{t-1}$ is strictly increasing and continuous, it has an inverse. We can write
$$x_t = F^{-1}(\Phi(\eta_t)|x_{t-1}), \quad \text{where } \eta_t \sim IIN(0,1) \qquad (1.62a)$$
$$\Leftrightarrow \quad \eta_t = \Phi^{-1}[F(x_t|x_{t-1})], \qquad (1.62b)$$
where $F(\cdot|x_{t-1})$ is the conditional c.d.f. of $x_t$ and $\Phi(\cdot)$ is the c.d.f. of the standard Normal distribution. Therefore, by choosing $G(x_{t-1}, \eta_t) = F^{-1}(\Phi(\eta_t)|x_{t-1})$, we can select a Gaussian causal innovation. The choice of a Gaussian causal innovation is purely conventional.
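The construction in (1.62) is the probability integral transform; a minimal sketch follows, in which a Cauchy law stands in for the conditional c.d.f. $F(\cdot|x_{t-1})$, purely for illustration.

```python
import numpy as np
from scipy.stats import cauchy, norm

# Sketch of the probability integral transform in (1.62b): for any
# continuous conditional c.d.f. F, eta_t = Phi^{-1}[F(x_t | x_{t-1})]
# is standard Normal.  A Cauchy law stands in for F here.
rng = np.random.default_rng(0)
x = cauchy.rvs(size=100_000, random_state=rng)  # draws from F
eta = norm.ppf(cauchy.cdf(x))                   # eq (1.62b)
print(eta.mean(), eta.std())                    # approximately 0 and 1
```

Even though the input draws are heavy tailed, the transformed innovations are exactly Gaussian, which is the point of the conventional normalization.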
1.11.5 Distributions with fat tails
Different distributions with fat tails can be used as the distribution of the baseline shocks $(\varepsilon_t)$ to construct mixed causal/noncausal linear processes. Below we provide the three examples of fat-tailed distributions employed in this paper: the Student t-distribution, the skewed Student t-distribution [see Jones (2001)], and the “stable” distributions [see Nolan (2009)].
i) Student t-distribution: This is a distribution on $(-\infty,+\infty)$ with probability density function:
$$f(x) = \frac{1}{\sqrt{\nu\pi}}\,\frac{\Gamma\!\big(\frac{\nu+1}{2}\big)}{\Gamma\!\big(\frac{\nu}{2}\big)}\Big(1+\frac{x^2}{\nu}\Big)^{-\frac{\nu+1}{2}}, \qquad (1.63)$$
where $\nu > 0$ is the real degree of freedom parameter and $\Gamma(\cdot)$ is the Gamma function, defined as $\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}dt$ for $z > 0$.
The p.d.f. is symmetric and bears the same “bell” shape as the Normal distribution, except that the t-distribution exhibits fat tails. As the degree of freedom $\nu$ goes to 1 the t-distribution approaches the Cauchy distribution, and as $\nu \to \infty$ it approaches the Normal distribution.
Its tail behaviour is such that $E[|x|^p] < \infty$ if $\nu > p$.
ii) Skewed t-distribution [Jones (2001), Section 17.2]: This is a distribution on $(-\infty,+\infty)$ with probability density function:
$$f(x) = \frac{1}{2^{\nu-1}\beta(a,b)\sqrt{\nu}}\Big(1+\frac{x}{\sqrt{\nu+x^2}}\Big)^{a+1/2}\Big(1-\frac{x}{\sqrt{\nu+x^2}}\Big)^{b+1/2}, \qquad (1.64)$$
where $\nu = a+b$, $a$ and $b$ are two positive real-valued degrees of freedom parameters, and $\beta(a,b)$ represents the Beta function, defined as $\beta(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$. The distribution is positively skewed if $a > b$, negatively skewed if $a < b$, and identical to the t-distribution above if $a = b$. It therefore allows for different magnitudes of the left and right fat tails, respectively.
Another skewed t-distribution has been proposed in the literature as a generalization of the skewed Normal distribution; this alternative is parameterized by only one skewness parameter instead of two as in Jones (2001) [see Azzalini and Capitanio (2003), Section 4, for more details].
iii) Stable distribution: A random variable $x$ is said to be “stable,” or to have a “stable distribution,” if a linear combination of two independent copies of $x$ has the same distribution as $x$, up to location and scale parameters. That is, if $x_1$ and $x_2$ are independently drawn from the distribution of $x$, then $x$ is stable if for any constants $a > 0$ and $b > 0$ the random variable $z = ax_1 + bx_2$ has the same distribution as $cx + d$ for some constants $c > 0$ and $d$. The distribution is said to be strictly stable if $d = 0$.
Generally, we cannot express the p.d.f. of a stable random variable in analytical form. However, the p.d.f. is always expressible as the Fourier transform of the characteristic function $\varphi(t)$, which always exists: $f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\varphi(t)e^{-ixt}dt$. The characteristic function is given as:
$$\varphi(t) = \exp\big[it\mu - |ct|^\alpha\big(1 - i\beta\,\mathrm{sign}(t)\tan(\pi\alpha/2)\big)\big]. \qquad (1.65)$$
Therefore the distribution is parameterized by $(\alpha, \beta, c, \mu)$, where $\alpha \in (0,2]$ is the stability parameter, $\beta \in [-1,1]$ is a skewness parameter, $c \in (0,\infty)$ is the scale parameter, and $\mu \in (-\infty,\infty)$ is the location parameter.
The Normal, Cauchy, and Levy distributions are all stable continuous distributions. If $\alpha = 2$ the stable distribution reduces to the Normal distribution; if $\alpha = 1/2$ and $\beta = 1$ it corresponds to the Levy distribution. Finally, if $\alpha = 1$ and $\beta = 0$ the distribution is Cauchy and the p.d.f. is given analytically as:
$$f(x) = \frac{1}{\pi(1+x^2)}. \qquad (1.66)$$
Even if the p.d.f. of a stable distribution has no explicit expression, its asymptotic behaviour is known. We have [see Nolan (2009), Theorem 1.12]:
$$f(x) \sim c^\alpha\big(1+\mathrm{sign}(x)\beta\big)\,\frac{\sin(\pi\alpha/2)\,\Gamma(\alpha+1)/\pi}{|x|^{1+\alpha}}, \quad \text{for large } x. \qquad (1.67)$$
Therefore $E[|x|^p] < \infty$ if $\alpha > p$; in particular the mean does not exist if $\alpha \leq 1$.
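As a numerical sanity check on the skewed-t density (1.64), the sketch below verifies that it reduces to the symmetric Student t when $a = b$ (with $\nu = a+b$ degrees of freedom) and still integrates to one when $a \neq b$; the parameter values are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import t as student_t
from scipy.special import beta as beta_fn

# Sketch of the Jones (2001) skewed-t density in (1.64), nu = a + b.
def skew_t_pdf(x, a, b):
    nu = a + b
    s = x / np.sqrt(nu + x**2)
    return (1 + s)**(a + 0.5) * (1 - s)**(b + 0.5) / (2**(nu - 1) * beta_fn(a, b) * np.sqrt(nu))

x = np.linspace(-60.0, 60.0, 200_001)
dx = x[1] - x[0]
# a = b recovers the symmetric Student t with nu = a + b degrees of freedom:
print(np.max(np.abs(skew_t_pdf(x, 2.0, 2.0) - student_t.pdf(x, df=4))))  # ~0
# a > b tilts mass into the right tail, but the density still integrates to 1:
print(np.sum(skew_t_pdf(x, 3.0, 1.5)) * dx)  # ~1
```

The parameters $a$ and $b$ separately control the heaviness of the right and left tails, which is what makes the family attractive for asymmetric leptokurtic shocks.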
1.12 Appendix: Approximation of the mixed causal/noncausal AR(r, s) likelihood
This section describes the nature of the matrix transformations which ensure that the approximate maximum likelihood estimator is consistent, obtained by regressing $x_t$ on both forward (noncausal) and backward (causal) lags.
It will first be useful to define the following processes $u_t$ and $v_t$. From (1.17a), let $u_t$ be defined as
$$u_t = \phi(L)x_t = \varphi(L^{-1})^{-1}\varepsilon_t = \sum_{j=0}^{\infty}\varphi_j^*\varepsilon_{t+j}, \qquad (1.68)$$
where $\varphi_0^* = 1$ and the moving average coefficients on the right-hand side are absolutely summable. We call (1.68) the forward looking moving average representation of $x_t$.

Moreover, also from (1.17a), let $v_t$ be defined as
$$v_t = \varphi(L^{-1})x_t = \phi(L)^{-1}\varepsilon_t = \sum_{j=0}^{\infty}\phi_j^*\varepsilon_{t-j}, \qquad (1.69)$$
where $\phi_0^* = 1$ and the moving average coefficients on the right-hand side are absolutely summable. We call (1.69) the backward looking moving average representation of $x_t$.
The changes of variables above can also be written in matrix form. Consider the time series $x_t$ for $t = 1,\dots,T$. From (1.68) and (1.69), we have $u_t = \phi(L)x_t$ and $v_t = \varphi(L^{-1})x_t$. Therefore, let us introduce the following $T \times T$ matrices, $\Phi_c$ and $\Phi_{nc}$. The matrix $\Phi_c$ coincides with the identity in its first $r$ rows and last $s$ rows, while its row $t$, for $t = r+1,\dots,T-s$, applies the causal polynomial $\phi(L)$:
$$(\Phi_c)_{t,\cdot} = \big(0,\dots,0,\,-\phi_r,\dots,-\phi_1,\,\underset{(t)}{1},\,0,\dots,0\big), \qquad (1.70)$$
where the 1 sits in column $t$. The matrix $\Phi_{nc}$ applies the noncausal polynomial $\varphi(L^{-1})$ in its first $T-s$ rows and the causal polynomial $\phi(L)$ in its last $s$ rows:
$$(\Phi_{nc})_{t,\cdot} = \begin{cases} \big(0,\dots,0,\,\underset{(t)}{1},\,-\varphi_1,\dots,-\varphi_s,\,0,\dots,0\big), & t = 1,\dots,T-s,\\[4pt] \big(0,\dots,0,\,-\phi_r,\dots,-\phi_1,\,\underset{(t)}{1},\,0,\dots,0\big), & t = T-s+1,\dots,T, \end{cases} \qquad (1.71)$$
where the lower partition of $\Phi_{nc}$ has $s$ rows. Therefore, $\Phi_c$ represents the causal transformation and $\Phi_{nc}$ the noncausal transformation, respectively. Both matrices are of size $T \times T$.
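A small sketch of these transformation matrices, here for $r = s = 1$ with illustrative coefficients, confirming that $e = \Phi_c\Phi_{nc}x$ reproduces the directly filtered residuals and that $\det(\Phi_c) = 1$:

```python
import numpy as np

# Sketch of the T x T matrices Phi_c (1.70) and Phi_nc (1.71) for a
# mixed AR(r, s); phi and varphi below are example coefficients.
def build_matrices(T, phi, varphi):
    r, s = len(phi), len(varphi)
    Pc, Pnc = np.eye(T), np.eye(T)
    for t in range(T - s):                 # upper rows of Phi_nc: varphi(L^{-1})
        for j, q in enumerate(varphi, 1):
            Pnc[t, t + j] = -q
    for t in range(T - s, T):              # lower s rows of Phi_nc: phi(L)
        for i, p in enumerate(phi, 1):
            Pnc[t, t - i] = -p
    for t in range(r, T - s):              # middle rows of Phi_c: phi(L)
        for i, p in enumerate(phi, 1):
            Pc[t, t - i] = -p
    return Pc, Pnc

T, phi, varphi = 8, [0.5], [0.3]
Pc, Pnc = build_matrices(T, phi, varphi)
x = np.arange(1.0, T + 1)
e = Pc @ Pnc @ x
# A middle entry reproduces eps_t = phi(L) varphi(L^{-1}) x_t directly:
v = lambda k: x[k] - 0.3 * x[k + 1]             # v_t = varphi(L^{-1}) x_t
print(np.isclose(e[4], v(4) - 0.5 * v(3)))      # True
# Phi_c is lower triangular with unit diagonal, so det(Phi_c) = 1:
print(np.isclose(np.linalg.det(Pc), 1.0))       # True
```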
Applying the noncausal transformation to the vector of data $x$, we have:
$$\begin{bmatrix} v_1 \\ \vdots \\ v_{T-s} \\ u_{T-s+1} \\ \vdots \\ u_T \end{bmatrix} = \begin{bmatrix} x_1 - \varphi_1 x_2 - \cdots - \varphi_s x_{1+s} \\ \vdots \\ x_{T-s} - \varphi_1 x_{T-s+1} - \cdots - \varphi_s x_T \\ x_{T-s+1} - \phi_1 x_{T-s} - \cdots - \phi_r x_{T-s+1-r} \\ \vdots \\ x_T - \phi_1 x_{T-1} - \cdots - \phi_r x_{T-r} \end{bmatrix} = \Phi_{nc}\begin{bmatrix} x_1 \\ \vdots \\ x_{T-s} \\ x_{T-s+1} \\ \vdots \\ x_T \end{bmatrix} \qquad (1.72)$$
Moreover, from $\varepsilon_t = \phi(L)\varphi(L^{-1})x_t = \phi(L)v_t$, we have:
$$e = \begin{bmatrix} v_1 \\ \vdots \\ v_r \\ \varepsilon_{r+1} \\ \vdots \\ \varepsilon_{T-s} \\ u_{T-s+1} \\ \vdots \\ u_T \end{bmatrix} = \begin{bmatrix} v_1 \\ \vdots \\ v_r \\ v_{r+1} - \phi_1 v_r - \cdots - \phi_r v_1 \\ \vdots \\ v_{T-s} - \phi_1 v_{T-s-1} - \cdots - \phi_r v_{T-s-r} \\ u_{T-s+1} \\ \vdots \\ u_T \end{bmatrix} = \Phi_c\begin{bmatrix} v_1 \\ \vdots \\ v_{T-s} \\ u_{T-s+1} \\ \vdots \\ u_T \end{bmatrix} \qquad (1.73)$$
So we have the transformation $e = \Phi_c\Phi_{nc}x$.
Thus the elements of $e$ are mutually independent and the joint density of $e$ is given as:
$$f_e(e|\theta) = f_v(v_1,\dots,v_r)\left(\prod_{t=r+1}^{T-s} f_\varepsilon(\varepsilon_t;\lambda,\sigma)\right)f_u(u_{T-s+1},\dots,u_T), \qquad (1.74)$$
where $\theta = \{\phi, \varphi, \lambda, \sigma\}$ represents the parameters of the model.
The matrix $\Phi_c$ is lower triangular with unit diagonal, so its determinant equals 1. Therefore, using the change of variables (Jacobian) formula, we can express the joint density in terms of $x$ as:
$$f_x(x|\theta) = f_v\big(\varphi(L^{-1})x_1,\dots,\varphi(L^{-1})x_r\big)\left(\prod_{t=r+1}^{T-s} f_\varepsilon\big(\varphi(L^{-1})\phi(L)x_t;\lambda,\sigma\big)\right)f_u\big(\phi(L)x_{T-s+1},\dots,\phi(L)x_T\big)\,|\det(\Phi_{nc})|. \qquad (1.75)$$
Since the determinant of $\Phi_{nc}$ is independent of the sample size,17 we can asymptotically approximate the likelihood by the second factor in the expression above, that is,
$$\prod_{t=r+1}^{T-s} f_\varepsilon\big(\varphi(L^{-1})\phi(L)x_t;\lambda,\sigma\big). \qquad (1.76)$$

17To show this we can employ the partitioned matrix determinant formula: $\det\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \det(A_{11})\det(A_{22} - A_{21}A_{11}^{-1}A_{12})$, where it can be shown that $A_{11}$ is $(T-s)\times(T-s)$ with determinant 1, so the second term in the factorization is the determinant of an $s \times s$ matrix, for all $T$.
For large samples, T will dwarf r + s = p and so the approximation will be consistent.
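A minimal sketch of the approximated likelihood (1.76) for a mixed AR(1,1) with scaled Student-t errors follows; the series and parameter values are placeholders for illustration only, not the estimates reported in the text.

```python
import numpy as np
from scipy.stats import t as student_t

# Sketch of the approximate log-likelihood (1.76) for a mixed AR(1,1):
# apply varphi(L^{-1}) then phi(L) to the data, dropping r + s = 2
# boundary terms, and evaluate the scaled Student-t density.
def approx_loglik(x, phi1, varphi1, nu, sigma):
    v = x[:-1] - varphi1 * x[1:]      # v_t = varphi(L^{-1}) x_t
    eps = v[1:] - phi1 * v[:-1]       # eps_t = phi(L) v_t, t = r+1,...,T-s
    return np.sum(student_t.logpdf(eps / sigma, df=nu) - np.log(sigma))

rng = np.random.default_rng(1)
x = rng.standard_t(df=3, size=500)    # placeholder series
ll = approx_loglik(x, 0.5, 0.3, nu=3.0, sigma=1.0)
print(ll)                             # the approximated log-likelihood
```

In practice this objective would be maximized over $(\phi_1, \varphi_1, \nu, \sigma)$; only $T - r - s$ residual terms enter the product, which is why the approximation is asymptotically innocuous.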
Asymptotic properties of the approximated maximum likelihood estimators are discussed in Section 3.2, and consistent estimation of the standard errors is detailed in Section 3.3, both of Lanne and Saikkonen (2008).
1.13 Appendix: Numerical algorithm for mixed causal/noncausal AR(r, s) forecasts
Solution proposed by Lanne, Luoto, and Saikkonen (2012)

Lanne, Luoto, and Saikkonen (2012) propose to circumvent the problem presented by our ignorance of the stationary distribution $f_x(\cdot)$ by enlarging the space of random variables. They first rewrite (1.37) as:
$$f_{x_{t+1}|t}(x_{t+1}|x_t) = f_{x_t,x_{t+1}}(x_t,x_{t+1})/f_x(x_t). \qquad (1.77)$$
Then, using the fact that $x_t = u_t = \varphi(L^{-1})\varepsilon_t = \sum_{j=0}^{\infty}\varphi_1^j\varepsilon_{t+j}$, they employ the mapping $(x_t, x_{t+1}, x_{t+2}, \dots) \to (\varepsilon_t, \varepsilon_{t+1}, \varepsilon_{t+2}, \dots)$. This suggests a linear relationship which, by approximating $x_t = u_t \approx \sum_{j=0}^{M}\varphi_1^j\varepsilon_{t+j}$ given a sufficiently large truncation lag $M$, we are able to invert, providing an approximate expression for $\varepsilon_t$ as a linear function of both $x_t$ and the future $\varepsilon_{t+1}, \varepsilon_{t+2}, \dots, \varepsilon_{t+M}$. For example, in this case where the noncausal polynomial is of order 1, we have $\varepsilon_t \approx x_t - \sum_{j=1}^{M}\varphi_1^j\varepsilon_{t+j}$.
Since, by assumption, the distribution of the shocks $\varepsilon_t$ is known, the authors are able to compute the probability of these approximated $\varepsilon_t$'s and, relying upon Monte-Carlo simulation methods, to approximate the conditional c.d.f. of $x_{t+1} = u_{t+1}$. The conditional c.d.f. at a given value $\alpha \in \mathbb{R}$ can be computed from (1.77) above by approximating the following integral by Monte-Carlo simulation, where we average across draws of sufficiently long future paths $\varepsilon_{t+1}^+ = \{\varepsilon_{t+1},\dots,\varepsilon_{t+M}\}$:
$$F_{x_{t+1}|t}(\alpha|x_t) = \int \mathbb{1}\{\alpha > x_{t+1}\}\,f_{x_{t+1}|t}(x_{t+1}|x_t)\,dx_{t+1} \qquad (1.78a)$$
$$\approx \frac{1}{f_x(x_t)}\int \mathbb{1}\Big\{\alpha > \sum_{j=0}^{M-1}\varphi_1^j\varepsilon_{t+1+j}\Big\}\,f_\varepsilon(\varepsilon_t)\prod_{j=1}^{M}f_\varepsilon(\varepsilon_{t+j})\,d\varepsilon_{t+1}^+ \qquad (1.78b)$$
This method has two drawbacks: first, we approximate the above integral by Monte-Carlo simulation of the long future paths $\varepsilon_{t+1}^+$. Second, $M$ has to be sufficiently large that the approximation does not miss the effect of far future shocks. The value of $M$ required to obtain an accurate approximation will grow as the roots of the noncausal polynomial approach 1, and so will the computational requirements of the algorithm.
The numerical method proposed by Lanne, Luoto, and Saikkonen (2012) also works in the more general case where $s > 1$. However, now that the noncausal order is greater than 1, enlarging the space from $(x_{t-s+1},\dots,x_t,x_{t+1},\dots) \to (\varepsilon_{t-s+1},\dots,\varepsilon_t,\varepsilon_{t+1},\varepsilon_{t+2},\dots)$ requires us to invert a system of equations. Therefore, we may employ a matrix transformation between the two spaces; this matrix is inverted to provide an approximation to $\varepsilon_t,\dots,\varepsilon_{t-s+1}$ in terms of both $x_t,\dots,x_{t-s+1}$ and the future $\varepsilon_{t+1},\dots,\varepsilon_{t+M}$. It is noted in their paper (and in the Appendix here) that the Jacobian determinant of this transformation is always 1. However, while this matrix is sparse, for large $s$ and $M$ it is computationally costly.

Below, we describe their method for the approximate simulation of the conditional c.d.f.,
$$F_{u_{t+h}|t}(\alpha|\mathcal{F}_t) = \int_{-\infty}^{\alpha} f_{u_{t+h}|t}(u_{t+h}|\mathcal{F}_t)\,du_{t+h}, \qquad (1.79)$$
for $h = 1$, when $s > 1$ (which they also generalized to the case where $h > 1$ in their paper). The method is broken down into a number of discussion points as follows:
1. We require the density of $\varepsilon_{t+1}^+ = \{\varepsilon_{t+1}, \varepsilon_{t+2}, \dots\}$, conditional on the data $x_t = \{x_t, x_{t-1}, \dots, x_1\}$.
2. Since from (1.68) we have $u_{t+1} = \sum_{j=0}^{\infty}\varphi_j^*\varepsilon_{t+1+j}$, from equation (1.75) it can be shown that:
$$\frac{f_{x,\varepsilon^+}(x_t,\varepsilon_{t+1}^+|\theta)}{f_x(x_t|\theta)} = p(\varepsilon_{t+1}^+|x_t;\theta) = \frac{f_{u^-,\varepsilon^+}(u_t^-(\phi),\varepsilon_{t+1}^+)}{f_{u^-}(u_t^-(\phi))}, \qquad (1.80)$$
where $\theta$ represents the parameters of the mixed causal/noncausal AR(r, s) model and $u_t^-(\phi) = \{\phi(L)x_{t-s+1},\dots,\phi(L)x_t\} = \{u_{t-s+1},\dots,u_t\}$.
3. Then, we can use Monte-Carlo simulation to approximate both the numerator and denominator of (1.80), in order to approximate the desired conditional c.d.f. as:
$$F_{u_{t+1}|t}(\alpha|\mathcal{F}_t) \approx \frac{1}{f_u(u_t^-(\phi))}\int \mathbb{1}\Big\{\alpha > \sum_{j=0}^{M-1}\varphi_j^*\varepsilon_{t+1+j}\Big\}\,f_{u,\varepsilon^+}(u_t^-(\phi),\varepsilon_{t+1}^+)\,d\varepsilon_{t+1}^+, \qquad (1.81)$$
where, under the assumption of some finite $M$ (such that as $M \to \infty$, $(\varphi_j^*) \to 0$), we can approximate $u_{t+1}$ as $u_{t+1} \approx \sum_{j=0}^{M-1}\varphi_j^*\varepsilon_{t+1+j}$.
4. In order to do this, however, we need to accomplish a change of variables between $(u_t^-(\phi),\varepsilon_{t+1}^+)$ and $(\varepsilon_{t-s+1},\dots,\varepsilon_t,\varepsilon_{t+1}^+)$. Given (1.68), the approximate mapping between these two sets of variables is given as:
$$\begin{bmatrix}
1 & \varphi_1^* & \cdots & \cdots & \cdots & \varphi_{M+s-1}^* \\
0 & \ddots & \ddots & & & \vdots \\
\vdots & \ddots & 1 & \varphi_1^* & \cdots & \varphi_M^* \\
\vdots & & & 1 & \ddots & \vdots \\
\vdots & & & & \ddots & 0 \\
0 & \cdots & \cdots & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix} \varepsilon_{t-s+1} \\ \vdots \\ \varepsilon_t \\ \varepsilon_{t+1} \\ \vdots \\ \varepsilon_{t+M} \end{bmatrix}
\approx
\begin{bmatrix} u_{t-s+1} \\ \vdots \\ u_t \\ \varepsilon_{t+1} \\ \vdots \\ \varepsilon_{t+M} \end{bmatrix} \qquad (1.82)$$
which can be written as $Ce \approx w$, where the lower $M \times M$ block of $C$ is the identity. Therefore, by inverting $C$ and noting that its determinant is 1, we can write the numerator in (1.80) as:
$$f_{u^-,\varepsilon^+}(u_t^-(\phi),\varepsilon_{t+1}^+) \approx \prod_{j=1}^{s} f_\varepsilon\big(\varepsilon_{t-s+j}(u_t^-(\phi),\varepsilon_{t+1}^+)\big)\prod_{\tau=t+1}^{t+M} f_\varepsilon(\varepsilon_\tau), \qquad (1.83)$$
where we write the elements $\varepsilon_{t-s+j}(u_t^-(\phi),\varepsilon_{t+1}^+)$ as such to indicate that they are functions of both $u_t^-(\phi)$ and $\varepsilon_{t+1}^+$.
5. Therefore, if we simulate $N$ i.i.d. draws of the $M$-length vector $\varepsilon_{t+1,i}^+$ (i.e. for $i = 1,\dots,N$) according to $f_\varepsilon(\cdot)$, an approximation to the desired conditional c.d.f. in (1.81) is given as:
$$F_{u_{t+1}|t}(\alpha|\mathcal{F}_t) \approx \frac{N^{-1}\sum_{i=1}^{N}\mathbb{1}\Big\{\alpha > \sum_{j=0}^{M-1}\varphi_j^*\varepsilon_{t+1+j,i}\Big\}\prod_{j=1}^{s}f_\varepsilon\big(\varepsilon_{t-s+j}(u_t^-(\phi),\varepsilon_{t+1,i}^+)\big)}{N^{-1}\sum_{i=1}^{N}\prod_{j=1}^{s}f_\varepsilon\big(\varepsilon_{t-s+j}(u_t^-(\phi),\varepsilon_{t+1,i}^+)\big)}. \qquad (1.84)$$
Then, given an appropriately chosen grid of $\alpha_i$'s, we can generate an approximation to the shape of the c.d.f. across its support.
1.14 Appendix: Tables and Figures
Table 1.7.i: Lag polynomial roots of the mixed and benchmark models
Model p/r,q/s Sig.p/r Sig.q/s cR cMC ncR ncMC #CC
Soybean meal skew-t arma 10,0 1,8,9 1.010 1.571 4
1.581
1.582
1.583
t-dist mixed 10,10 1,3,5,7,9,10 1,2,3,4,6,9 1.385 1.354 -1.716 1.091 4/4
-2.532 1.414 1.530
1.474 1.530
1.500 1.561
Soybean oil skew-t arma 10,0 1,10 1.033 1.478 4
1.306 1.558
1.600
1.619
t-dist mixed 10,10 1,2,4,9,10 1,2,3,4,8 1.373 1.341 1.009 1.666 4/3
-1.797 1.359 1.285 1.669
1.390 1.474
1.510
Soybeans skew-t arma 10,0 1,2,5,8,9 1.028 1.514 4
1.551
1.556
1.582
skew-t mixed 10,10 1,2,5,8,10 1 -1.559 1.358 0.944 4/0
1.749 1.464
1.477
1.558
Orange juice skew-t arma 10,0 1,2,3,10 1.033 1.505 4
1.572
1.623
1.660
skew-t mixed 10,10 1,2,5,9 1,2,5 1.556 1.518 1.060 2.460 4/1
1.542 1.843
1.555 -2.750
1.608
Sugar skew-t arma 1,2 1 1,2 1.000 4.590 3
4.756
5.010
5.487
t-dist mixed 2,2 1,2 1,2 4.373 1.002 1/0
14.637
Wheat skew-t arma 5,0 1,5 0.992 2.350 2
2.655
skew-t mixed 5,5 1,2,3,5 1,3,4 1.006 1.814 1.789 2.046 2/1
2.071 -2.434
Cocoa skew-t arma 10,0 1 1.022 0
skew-t mixed 10,10 1,6,9 1,2,4,9,10 1.436 1.417 -1.435 1.202 4/4
1.486 1.740 1.408
1.499 1.414
1.508 1.426
Table 1.7.ii: Lag polynomial roots of the mixed and benchmark models
Model p/r,q/s^a Sig.p/r^b Sig.q/s^b cR^c cMC^c ncR^d ncMC^d #CC^e
Coffee t-dist arma 10,0 1,3 0.995 4.740 1
skew-t mixed 10,10 1,2,5,6,10 1,2,5,6,7 1.375 1.027 1.684 5/2
1.403 1.571 1.762
1.428 -1.645
1.430
1.446
Corn skew-t arma 2,0 1,2 1.000 0
51.190
t-dist mixed 2,3 1 1,2,3 -32.542 1.002 5.484 0/1
Cotton skew-t arma 10,0 1,2,6,7 1.007 1.738 3
1.707
1.615
t-dist mixed 1,3 0 1,2,3 1.003 5.317 0/1
Rice skew-t arma 2,2 1,2 1,2 0.997 3.099 3
2.917 3.332
-3.552 3.493
t-dist mixed 1,3 1 1,2,3 -15.328 1.001 5.003 0/1
Lumber skew-t arma 1,1 1 1 1.005 13.181 4
13.237
13.314
13.375
skew-t mixed 10,10 1,2,4-10 1,5 1.015 1.235 -1.862 1.218 4/2
-1.454 1.247 1.752
1.336
1.900
Gold t-dist arma 3,0 1,2,3 0.999 5.618 1
t-dist mixed 10,10 1,2,6,10 1 -1.450 1.395 0.974 4/0
1.489 1.416
1.431
1.434
Silver skew-t arma 10,0 1,2,4,8 1.003 1.606 3
-1.874 1.715
1.751
skew-t mixed 10,10 1,3-6,9,10 1,4,5,7 1.479 1.424 0.996 4
-1.533 1.424 1.600 1.721 4/2
1.451 -2.070 1.643
1.327
Platinum skew-t arma 10,0 1,4,7,8,9 0.957 1.493 4
1.528
1.572
1.582
skew-t mixed 10,10 1,2,3,5-9 1,2,6-8,10 -1.786 1.355 0.974 1.304 4/4
1.376 1.257 1.328
1.385 1.401
1.860 1.594
a (p,q) or (r,s) pairs for ARMA(p,q) and mixed causal/noncausal AR(r, s) models respectively.b Significant lags at the 5% level assuming Normal distributed parameters.c Causal lag polynomial; real roots and modulus of complex roots respectively.d Noncausal lag polynomial; real roots and modulus of complex roots respectively.e Number of complex conjugate roots with the same modulus (causal/noncausal).
Table 1.7.iii: Lag polynomial roots of the mixed and benchmark models

[Table: as in Tables 1.7.i and 1.7.ii, for Palladium, Copper, Light crude oil, Heating oil, Brent crude oil, Gas oil, Natural gas, Gasoline RBOB, Live cattle, and Lean hogs.]
Figure 1.10.i: Plots of daily continuous contract futures price level series
[Figure: eight panels plotting daily price levels for Soybean meal, Soybean oil, Soybeans, Orange juice, Sugar, Wheat, Cocoa, and Coffee, each from 07/18/1977 to 02/08/2013.]
Figure 1.10.ii: Plots of daily continuous contract futures price level series
[Figure: eight panels plotting daily price levels for Corn (07/18/1977 to 02/08/2013), Cotton (07/18/1977 to 02/08/2013), Rice (12/06/1988 to 02/08/2013), Lumber (04/07/1986 to 02/08/2013), Gold (07/18/1977 to 02/08/2013), Silver (07/18/1977 to 02/08/2013), Platinum (04/01/1986 to 02/08/2013), and Palladium (04/01/1986 to 02/08/2013).]
Figure 1.10.iii: Plots of daily continuous contract futures price level series
[Figure: seven panels plotting daily price levels for Copper (12/06/1988 to 02/08/2013), Light crude oil (03/30/1983 to 02/08/2013), Heating oil (07/01/1986 to 02/08/2013), Brent crude oil (06/23/1988 to 02/08/2013), Gas oil (07/03/1989 to 02/08/2013), Natural gas (04/03/1990 to 02/08/2013), and Gasoline RBOB (10/04/2005 to 02/08/2013).]
Figure 1.10.iv: Plots of daily continuous contract futures price level series
[Figure: two panels plotting daily price levels for Live cattle (07/18/1977 to 02/08/2013) and Lean hogs (04/01/1986 to 02/08/2013).]
Figure 1.11.i: Histograms of daily continuous contract futures price level series
[Figure: histograms of the daily price level series for Soybean meal, Soybean oil, Soybeans, Orange juice, Sugar, Wheat, Cocoa, and Coffee, each from 07/18/1977 to 02/08/2013.]
Figure 1.11.ii: Histograms of daily continuous contract futures price level series
[Figure: histograms of the daily price level series for Corn (07/18/1977 to 02/08/2013), Cotton (07/18/1977 to 02/08/2013), Rice (12/06/1988 to 02/08/2013), Lumber (04/07/1986 to 02/08/2013), Gold (07/18/1977 to 02/08/2013), Silver (07/18/1977 to 02/08/2013), Platinum (04/01/1986 to 02/08/2013), and Palladium (04/01/1986 to 02/08/2013).]
Figure 1.11.iii: Histograms of daily continuous contract futures price level series
[Figure: histograms of the daily price level series for Copper (12/06/1988 to 02/08/2013), Light crude oil (03/30/1983 to 02/08/2013), Heating oil (07/01/1986 to 02/08/2013), Brent crude oil (06/23/1988 to 02/08/2013), Gas oil (07/03/1989 to 02/08/2013), Natural gas (04/03/1990 to 02/08/2013), and Gasoline RBOB (10/04/2005 to 02/08/2013).]
Figure 1.11.iv: Histograms of daily continuous contract futures price level series
[Figure: histograms of the daily price level series for Live cattle (07/18/1977 to 02/08/2013) and Lean hogs (04/01/1986 to 02/08/2013).]
Chapter 2
Improving Bayesian VAR density forecasts through autoregressive Inverse Wishart Stochastic Volatility
2.1 Introduction
Forecasts of macroeconomic time series have become a ubiquitous component of any policymaker’s toolkit. As
such, central banks like the Federal Reserve typically publish density forecasts for inflation, output, interest rates,
or other major indicators. This information helps both industry and consumers make decisions consistent with
economic fundamentals. However, forecasts themselves are not infallible. In fact, while major advances have
been made in the area of statistical forecasting, there remains much room for improvement.
This paper resolves some of the relevant issues by proposing a key change in the volatility process of Vector
Autoregressions (VAR) popular among macroeconomists. Instead of assuming that the time varying VAR innovation covariance structure is driven by independent nonstationary processes, we employ a stationary multivariate Inverse Wishart process where the scale matrix is a function of past covariance matrices. Furthermore, we employ four major U.S. macroeconomic data series, namely the rate of GDP growth, the inflation rate, the interest rate, and the unemployment rate.1 A Bayesian approach, employing Markov Chain Monte Carlo methods (MCMC), is then taken in both estimation and in comparing forecasts between the benchmark model [Clark (2011)] and our competing Inverse Wishart autoregressive volatility specification [Philipov and Glickman (2006)].
Results suggest that incorporating the more sophisticated Inverse Wishart autoregressive volatility process can im-
prove density forecasts in both the short and long-run, with larger improvements as the horizon increases, despite
1 Note that the data is taken from the RTDSM database, the same dataset, in fact, as Clark (2011), our benchmark comparison model (with the exception of the interest rate; see Section 2.2 on the data set, below). Moreover, all data is at the aggregate U.S. level. Finally, the interest rate employed in our paper is the 3-month Federal Treasury Bill rate.
CHAPTER 2. IMPROVING VAR FORECASTS THROUGH AR INVERSE WISHART SV 85
a small data set and increased parameterization of the model. With this in mind, the following discussion aims to
provide a broader context surrounding the relevant forecasting issues precipitating this proposed modification to
the typical VAR process volatility specification.
2.1.1 Background
A fundamental issue facing the production of good forecasts has been that of how to deal with the changing mo-
ments of the conditional forecast distributions. For example, dramatic changes in U.S. economic volatility have
posed a modeling challenge to contemporary forecasters, specifically among macroeconomists where Gaussian
VAR models are popular. An analysis of major U.S. economic indicators, such as output growth over the past 100
years, illustrates that the economy goes through periods of changing volatility. For example, “The Great Moder-
ation,” which began in the 1980’s, represented a period of unusually low volatility vis-a-vis both a lengthy prior
period of erratic volatility and the more recent instability we’ve experienced since 2007. In this respect, both Sims
(2001) and Stock (2001), in separate discussions of Cogley and Sargent’s (2001) paper, criticized the assumption
of homoskedastic VAR variances, pointing to evidence analyzed by Bernanke and Mihov (1998a, 1998b),2 Kim and Nelson (1999), or McConnell and Perez Quiros (2000).3 Clark (2011) also finds significant changes in
conditional volatility across time when estimating the latent stochastic volatilities of a Bayesian VAR model.
It should not come as a surprise, then, that while the volatility of forecasting models was traditionally assumed constant over time (primarily for the sake of simplicity), it can be shown that this assumption leads to poor
conditional forecasts. For example, Jore, Mitchell, and Vahey (2010) employ a model averaging approach to
U.S. data, with both equal weights and recursively adapted weights, based on log predictive density scores across
a range of different specifications. Their results show strong support for a recursive weighting scheme across
specifications. More interestingly, however, they find that during periods of changing macroeconomic volatility,
for example when the U.S. economy transitioned into “The Great Moderation”, the weighting scheme tends to
place more weight on specifications which dynamically account for structural breaks in volatility. Moreover, they
find evidence of poor forecasting given a simple assumption of fixed volatility or equal weights across model
specifications. However, it worth noting that the specifications which do respond to structural change within Jore
et al.’s (2010) framework are limited in that they are restricted to a finite set of possible volatility states and breaks.
Consequently, it is important to account for changing volatility in any forecasting specification. Furthermore,
if such changes in volatility occurred relatively infrequently and could be extracted from the data with reasonable
statistical significance, then employing a regime switching specification such as in Jore et al. (2010) might prove
sufficient in drawing good forecasts. However, the truth is that, given the complexity of the economy, changes in
2 In the case of monetary policy shocks between 1979 and 1982.
3 With respect to the growing stability of output around 1985.
volatility probably occur much more frequently and take on many more values than can be effectively captured
by a finite state-space model. For this reason forecasters have adopted a continuous state-space framework for
estimating the conditional volatility of VAR models as opposed to the finite state regime switching type model
applied to volatility, as popularized by Hamilton (1989), and employed by Jore et al (2010). Moreover, the use of
the so-called continuous state “Stochastic Volatility” model has also grown in popularity given its usefulness in
modeling a latent volatility process based on a filtration that includes more than just lagged VAR series shocks, as
for example in the case of a GARCH model or a volatility-in-mean model.4
Both Cogley and Sargent (2005), and Primiceri (2005), allow for time variation in the conditional covariance
matrix across VAR series shocks according to a Stochastic Volatility law of motion, where the conditional volatil-
ity can take on any value in a continuous positive real set (and covariances can be any real number). Moreover,
they also allow for time variation in the VAR parameters themselves, through another Stochastic Volatility law of
motion on their state across time. Clark’s (2011) model, which represents our benchmark, follows the same struc-
ture of the previous two studies, albeit without the time varying VAR parameters, which are dropped in favour of
tight Bayesian steady state priors on the deterministic trend parameters (which define the unconditional mean of
the VAR process) and a rolling sample window which re-estimates the parameters across time.
Villani (2009) showed that imposing Bayesian steady state prior distributions allows us to incorporate prior
beliefs about macroeconomic variable steady states into our model. Furthermore, our belief is that employing this
information probably reduces the need for time varying VAR parameters since much of the time variation in the
autoregressive parameters (which is not due to a lack of time variation in the shock covariance, as was the case
with Cogley and Sargent (2001)), may in fact be due to a lack of a well defined trend (see Cogley and Sargent
(2005), where they model their VAR intercepts5 as stochastic random walks). Moreover, given the quarterly
nature of most macroeconomic time series, small sample sizes are usually the norm. In this situation a tight prior
also plays the role of constraining VAR parameters to aid in the identification of trends that might not otherwise
be readily apparent.6 In this respect, Villani (2009) demonstrates that informative priors for trends can greatly
improve point forecasts, especially over the longer term horizon where correct specification of the trend of the
series is important—see Clements and Hendry (1998). All of this of course assumes that our prior beliefs on the
nature of the time series trends are accurate. In fact, whether or not the trends in macroeconomic data are better
modeled as stochastic (i.e. unit roots with drift) or deterministic is still an open question of debate.
However, most of these studies adopt certain features which could still be improved upon. For example, many
of these studies construct the VAR innovation covariance dynamics as driven by independent nonstationary pro-
4 See also Sartore and Billio (2005) for a useful survey of Stochastic Volatility.
5 Noting of course that, given their formulation, the VAR long-run mean $\mu_t$ is time varying, stochastic, and a function of the VAR intercepts, $\alpha_t$, as $\mu_t = (I - \Phi_t)^{-1}\alpha_t$.
6 In Clark (2011) for example, his rolling sample window is only of size T = 80.
cesses. Often, some form of fixed relationship is imposed between the elements of the VAR innovation covariance
matrix and the independent processes driving them. This is done in order to reduce the parameterization of the
model, but it may limit the richness of the covariance matrix dynamics.
The empirical rationale for this choice of specification is not entirely clear, since an analysis of many macroeconomic time series suggests stationary volatility dynamics. Moreover, without explicitly parameterizing time varying covariance matrices, it is extremely difficult to interpret any volatility spill-over effects, which have been
shown to be prevalent among financial time series from U.S. markets (see for example, Diebold and Yilmaz,
(2008)). Furthermore, studies such as Cogley and Sargent (2005) seem to provide little explicit justification for
the choice of a nonstationary process driving VAR innovation volatility, other than a brief comment that “the
random walk specification is designed for permanent shifts in the innovation variance, such as those emphasized
in the literature on the growing stability of the U.S. economy.” Ultimately, if the assumption of nonstationarity is
misspecified and there does in fact exist volatility spill-over across macroeconomic time series, then these existing
specifications leave something to be desired.
Given this, the multivariate volatility process can be constructed to directly model the time varying covariance
matrices without simply extending, in an ad hoc way, the traditional univariate specification to the multivariate
case. Moreover, any autoregressive persistence in volatility can be captured and a finite unconditional mean can
be specified. Philipov and Glickman (2006) [see also Chib, Omori, and Asai (2009)] apply such an autoregressive
Inverse Wishart process to analyze the conditional volatility of financial data and find that it improves volatility
forecasts over simpler formulations, where a number of Bayesian and frequentist measures are applied to compare
forecast accuracy given a variety of competing specifications. It is worth noting, however, that there exist problems
with the Philipov and Glickman (2006) implementation of the Inverse Wishart autoregressive volatility process as
it stands—see Rinnergschwentner et al. (2011) for more details and quite a few corrections. Therefore, in light
of Philipov and Glickman's idea of Inverse Wishart covariance modeling, we propose a modified version of their
process which we feel serves the purpose of multivariate forecasting better than Clark (2011)—see Section 2.3
below for more details, including differences between our model and that of Philipov and Glickman (2006).
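To make the idea concrete, the sketch below simulates a stationary autoregressive Inverse Wishart covariance process. It is an illustrative simplification, not the Philipov and Glickman (2006) specification nor the modification proposed in this chapter: the scale matrix is chosen so that the conditional mean follows a simple AR-type recursion toward a long-run matrix `S_bar`, using the Inverse Wishart mean identity E[IW(nu, Psi)] = Psi / (nu - p - 1). All names and parameter choices here are hypothetical.

```python
import numpy as np

def sample_inv_wishart(nu, Psi, rng):
    """Draw Sigma ~ IW(nu, Psi) via a Bartlett-decomposition Wishart draw of Sigma^{-1}."""
    p = Psi.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(Psi))   # Sigma^{-1} ~ Wishart(nu, Psi^{-1})
    A = np.zeros((p, p))
    A[np.diag_indices(p)] = np.sqrt(rng.chisquare(nu - np.arange(p)))
    A[np.tril_indices(p, -1)] = rng.standard_normal(p * (p - 1) // 2)
    W = L @ A @ A.T @ L.T
    return np.linalg.inv(W)

def simulate_iw_ar(T, nu, S_bar, d, rng=None):
    """Illustrative stationary AR Inverse Wishart volatility process.

    The scale is set so that E[Sigma_t | Sigma_{t-1}] = (1-d)*S_bar + d*Sigma_{t-1},
    giving volatility persistence with a finite unconditional mean S_bar. This mean
    recursion is a stand-in for the scale constructions discussed in the text.
    """
    rng = np.random.default_rng(rng)
    p = S_bar.shape[0]
    Sigma, out = S_bar.copy(), np.empty((T, p, p))
    for t in range(T):
        Psi = (nu - p - 1) * ((1 - d) * S_bar + d * Sigma)  # E[IW(nu,Psi)] = Psi/(nu-p-1)
        Sigma = sample_inv_wishart(nu, Psi, rng)
        out[t] = Sigma
    return out
```

In contrast to the log random walk specifications discussed above, long simulated paths of this process fluctuate around `S_bar` rather than drifting without bound.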
The rest of the paper is organized as follows. Section 2.2 discusses the data and the methodology used to adjust
the data for trends. Section 2.3 discusses both the benchmark model based on Clark (2011) and the proposed
Inverse Wishart process modification based on Philipov and Glickman (2006). Section 2.4 then details the trend
specification and the conjugate priors we impose within the Bayesian framework. Section 2.5 then discusses estimation of the model parameters by a Gibbs sampler method. Section 2.6 discusses the method whereby we
generate forecast densities for both the VAR levels and covariance matrices across various horizons. Section 2.7
details the results of both the estimation process and the forecast comparisons based on Bayesian analysis of the
predictive likelihoods. Finally, Section 2.8 summarizes and concludes.
2.2 Data
We consider four macroeconomic time series generated from aggregate U.S. data, namely:
1. real output growth,
2. a measure of the inflation rate,
3. the unemployment rate,
4. and an interest rate.
The data source is the same as in Clark (2011): the so-called “real-time” 7 data from the Federal Reserve Bank
of Philadelphia’s Real-Time Data Set for Macroeconomics (or “RTDSM”). The total sample size is quite small:
only T = 252 data points extending from the 2nd quarter of 1948 (hereon denoted as 1948:Q2) until the 1st
quarter of 2011. Output from the RTDSM database is quarterly real data and measured as either Gross Domestic
Product (GDP) or Gross National Product (GNP) depending on the data vintage.8 Inflation from the RTDSM is
also measured quarterly and as either a GDP or GNP deflator or a price index, depending on the vintage. We
measure growth and inflation rates as annualized log changes.9 The unemployment rate, however, is available
on a monthly basis so we simply average across each quarter in matching the quarterly nature of output and
inflation. Moreover, it should be noted that the unemployment rate tends to differ much less dramatically across
vintages. Finally, while Clark (2011) employs the federal funds interest rate series, Primiceri (2005) recommends
the nominal annualized yield on 3-month Federal Treasury Bills, since this series goes back much further. We
therefore adopt the latter series, and again, average across quarters since the data is monthly.10 Finally, output,
inflation, and the unemployment rate are already seasonally adjusted by their source providers.
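The growth and inflation transformations above reduce to annualized log changes, which can be sketched as follows. The factor of 4 for annualizing continuously compounded quarterly changes follows the text; reporting the result in percent via the additional factor of 100 is an illustrative convention, not from the thesis.

```python
import numpy as np

def annualized_log_change(level):
    """Quarterly annualized log change: 4 * 100 * (ln y_t - ln y_{t-1}).

    The factor of 4 annualizes continuously compounded quarterly changes;
    the factor of 100 (percent units) is an assumed reporting convention.
    """
    level = np.asarray(level, dtype=float)
    return 400.0 * np.diff(np.log(level))
```

For example, a level series growing by 0.5% (log) in a quarter yields an annualized rate of 2%.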
Clark and McCracken (2008,2010) also provide evidence that point forecasts of GDP growth, inflation, and
interest rates are improved by specifying the latter two series as deviations from some form of deterministic trend
simulating inflation expectations. Given this result, Clark (2011) adopts the Blue Chip Consensus forecast produced from survey data and published by Aspen Publishers Ltd., as a form of long-term inflation expectations.
Unfortunately, as Clark mentions in his online appendix, the data for this Blue Chip forecast of inflation expecta-
tions only extends back to the fourth quarter of 1979 (i.e. 1979:Q4). Therefore, Clark appends an exponentially
smoothed trend from his inflation series to the beginning of the Blue Chip series in extending it back to 1964. Clark
7 That is, data that is regenerated annually to conform to new changes in the way we measure macroeconomic indicators, or to take into account flaws in some previous set, observed ex-post. Each new issue is deemed a "vintage."
8 The RTDSM generates entirely new time series each quarter (deemed "vintages") based on updated chain weighting techniques or other improvements. Thus newer vintages represent larger samples than older ones which were generated at previous dates.
9 Since log differences are already continuously compounded, we simply multiply each quarterly value by 4.
10 The 3-month Federal Treasury bill rate series employed is a combination of two very similar series joined together at June 2000, since the first vintage was discontinued. "H15/discontinued/H1.RIFSGFPIM03 N.M" is the unique ID for the discontinued series and "H15/H15/RIFLGFCM03 N.M" is the newer series. Both series are available at the Federal Reserve website: http://www.federalreserve.gov/releases/h15/data.htm.
mentions that despite his attempts at keeping the data “as real time as possible” by employing every quarterly vin-
tage of inflation data, in the end a trend based on his most recent vintage (2008:Q4) deviates little from the others.
Moreover, as Clark notes, Kozicki and Tinsley (2001a,2001b) and Clark and McCracken (2008), both suggest
that exponentially smoothed trends of the inflation rate match up reasonably well with survey-based measures of
long-run expectations in the data since the early 1980’s. Given both of these facts, we will simply employ an
exponentially smoothed trend of the inflation rate through the most recent vintage currently available (2011:Q4)
in generating a long-term inflation expectations series, skipping the Blue Chip survey data entirely and ignoring
the previous vintages of inflation data.11
Finally, the unemployment rate series is also detrended by an exponential smoother (in the same way the
inflation rate was detrended in order to generate the long-run inflation expectations [see footnote 11]).
Therefore, to summarize:
1. GDP growth is not detrended (but will be centered on a long-run constant mean of 3.0% through the prior
distribution).
2. The inflation rate is detrended by its exponentially smoothed trend (with a smoothing parameter of α =
0.05).
3. The interest rate (3-month Treasury bill) is detrended around the same trend as inflation (which is supposed
to simulate long-term inflation expectations), although we force a long-run constant mean of 2.5% above
trend through the prior on the unconditional mean.
4. The unemployment rate is detrended by its exponentially smoothed values lagged one period, with a
smoothing parameter of α = 0.02.
See the model and estimation Sections 2.3 and 2.5 respectively, for more details as to how these trends are
implemented into the model.
2.3 Model specifications
The benchmark model is the Bayesian VAR, steady-state prior, Stochastic Volatility specification (BVAR-SSP-
SV) as outlined in Clark (2011). This model employs a Bayesian VAR(J) formulation for the detrended series, where the covariance matrix of the VAR innovations is driven by linear functions of separate univariate, independent, geometric Brownian motions.
11 The exponential smoother employed is as follows: $y^*_t = y^*_{t-1} + \alpha(y_t - y^*_{t-1})$, where $y_t$ is the actual data series and $y^*_t$ is the exponentially smoothed trend. $\alpha$ is a parameter which can be adjusted depending on how "tight" we want the trend to follow the data series. For the inflation rate trend used as long-term inflation expectations, Clark suggests a value of $\alpha = 0.05$.
2.3.1 Benchmark model
We refer to this benchmark model from Clark (2011) as the Clark specification:
\begin{align}
v_t &= \Pi(L)\,(y_t - \Psi d_t), \tag{2.1a}\\
\text{where } \Pi(L) &= I_p - \sum_{j=1}^{J} \Pi_j L^j \text{ and } L \text{ is the lag operator,} \tag{2.1b}\\
v_t &= B^{-1}\Lambda_t^{0.5}\varepsilon_t \quad\text{where } \varepsilon_t \sim \mathrm{MVN}_p(0, I_p), \tag{2.1c}
\end{align}
and $B$ is lower triangular with 1's along the main diagonal. Moreover,
\begin{align}
\Lambda_t &= \mathrm{diag}(\lambda_{1,t}, \lambda_{2,t}, \ldots, \lambda_{p,t}), \tag{2.1d}\\
\ln(\lambda_{i,t}) &= \ln(\lambda_{i,t-1}) + \xi_{i,t}, \quad \forall i = 1, \ldots, p, \tag{2.1e}\\
\text{and } \xi_{i,t} &\sim \text{i.i.d. } N(0, \varphi_i), \quad \forall i = 1, \ldots, p. \tag{2.1f}
\end{align}
The first equation, (2.1a), is a vector autoregressive model applied to the macroeconomic series, adjusted for
trends. The trends have to be estimated by means of the Ψ matrix. More precisely, we introduce the state variables
$d_t$ as in Villani (2009). The state variables can be chosen in a number of ways.12 The process $y_t$ admits a time varying unconditional mean, $\mu_t = \Psi d_t$, where $\Psi$ is a $p \times q$ matrix and $d_t$ is a $q \times 1$ vector of deterministic trends.
$\Pi(L)$ is the matrix lag-polynomial and $v_t$ denotes the innovations of the process $(y_t)$. The second equation, (2.1c), highlights the form of the stochastic volatility of the innovations. The stochastic volatility is $\mathrm{Var}[v_t \,|\, \Lambda_t] = B^{-1}\Lambda_t (B^{-1})' = \Gamma_t$, and the conditionally standardized innovations $(\varepsilon_t)$ are i.i.d. Gaussian, for any volatility history. Thus these standardized innovations are independent of the volatility process $\Gamma_t$.
The dynamics of the volatility process are very constrained, since the serial dependence arises only through the diagonal matrix $\Lambda_t$, not by means of $B$, which is unchanging across time. Finally, the natural logarithms of the diagonal elements of the $\Lambda_t$ matrix are assumed to follow independent Gaussian random walks.
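A minimal simulation of the law of motion (2.1c)-(2.1f) makes this constraint explicit: all serial dependence in the innovation covariance enters through the log-volatility random walks, while $B$ stays fixed. The sketch below is illustrative, not the thesis's estimation code, and its parameter names are hypothetical.

```python
import numpy as np

def simulate_clark_innovations(T, B, phi, lam0, rng=None):
    """Simulate v_t = B^{-1} Lambda_t^{0.5} eps_t, with each ln(lambda_{i,t})
    following an independent Gaussian random walk with innovation variance phi_i."""
    rng = np.random.default_rng(rng)
    p = len(lam0)
    Binv = np.linalg.inv(B)
    ln_lam = np.log(np.asarray(lam0, dtype=float))
    v, lam_path = np.empty((T, p)), np.empty((T, p))
    for t in range(T):
        ln_lam = ln_lam + rng.normal(0.0, np.sqrt(phi), size=p)       # (2.1e)-(2.1f)
        lam_path[t] = np.exp(ln_lam)
        v[t] = Binv @ (np.sqrt(lam_path[t]) * rng.standard_normal(p))  # (2.1c)
    return v, lam_path
```

Setting `phi` to zero freezes the volatilities, in which case the sample covariance of the simulated innovations should recover $\Gamma = B^{-1}\Lambda (B^{-1})'$.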
A few points of discussion are worth mentioning. First, if λi,t = λi,t−1 for all t, then the underlying processes,
λi,t, that drive the volatility of vt cannot be identified independently of the B matrix. Moreover, the choice of B
being constrained to be lower-triangular solves the problem of identifying the elements of $\Lambda_t$ from those of $B^{-1}$.
Note that the Clark specification is not invariant to permutations of the ordering of the series within the VAR. Indeed,
12 For example, if d_t = 1, \forall t, i.e. it takes on a single constant value for all time periods, then \Psi is a vector of regression constants, the values of which determine the time invariant long-run means of the autoregressive levels processes, y_t. However, if, for example, d_t = t, then the elements of the vector \Psi represent the slope coefficients of a linear time-trend relationship shared by each of the y_t series. Furthermore, if d_t = [t, f(t)]', for example, where f(t) is perhaps some nonlinear function of t, then \Psi is now a matrix of "factor loadings," the elements of which reflect how the time varying long-run means of each process are expressed as a linear combination of both the linear and nonlinear time trends simultaneously. This approach to modeling the unconditional first-order moment of the levels allows for greater flexibility. For example, we could incorporate a pre-exponentially smoothed trend as one possible nonlinear function of time f(t), as above. See Section 2.4 for more details.
CHAPTER 2. IMPROVING VAR FORECASTS THROUGH AR INVERSE WISHART SV 91
without loss of generality, let us consider the bivariate case. The variance of the innovation vt is
E[v_t v_t'] = E[B^{-1}\Lambda_t^{0.5}\varepsilon_t\varepsilon_t'\Lambda_t^{0.5}(B^{-1})'] = B^{-1}\Lambda_t(B^{-1})' = \Gamma_t (2.2a)

= \begin{bmatrix} b_{11} & 0 \\ b_{21} & b_{22} \end{bmatrix}\begin{bmatrix} \lambda_{1,t} & 0 \\ 0 & \lambda_{2,t} \end{bmatrix}\begin{bmatrix} b_{11} & b_{21} \\ 0 & b_{22} \end{bmatrix} (2.2b)

= \begin{bmatrix} b_{11}^2\lambda_{1,t} & b_{11}b_{21}\lambda_{1,t} \\ b_{11}b_{21}\lambda_{1,t} & b_{21}^2\lambda_{1,t} + b_{22}^2\lambda_{2,t} \end{bmatrix}, (2.2c)

where b_{ij}, i, j = 1, 2, denote the elements of the B^{-1} matrix.
Therefore, the variance of the innovation of the first series and the covariance between the two innovations
depend only on the process λ1,t. However, the variance of the second series depends on both processes λ1,t and
λ2,t. Therefore, shocks to the processes ξi,t have asymmetric effects on the variances of the innovations v1,t and
v_{2,t}. This asymmetry is determined arbitrarily by the order in which the series are assigned to the VAR.
In the Clark specification, the volatility and covolatility processes are nonstationary. By the properties of the
Gaussian random walk, we get
ln(λi,t)| ln(λi,0) ∼ N [ln(λi,0), tϕi]. (2.3)
We deduce that
E[\lambda_{i,t}|\lambda_{i,0}] = \lambda_{i,0}\exp\left(\frac{t\varphi_i}{2}\right). (2.4)
On average we get an exponential rate of explosion of the diagonal elements of the matrix Λt. If ϕi > ϕj ,
say, the volatility of series i becomes, asymptotically, infinitely larger than the volatility of series j. And so as
t → ∞ we have that, conditional on past information, the process λi,t is divergent (i.e. explosive). This result
implies that forecasts of the VAR innovation covariance matrices will have explosive elements, which is not a
suitable property of Clark’s model.
However, all is not lost; a similar type of argument shows that if we respecify the λi,t process as
ln(λi,t) = zi,t = α+ βzi,t−1 + ξi,t, (2.5)
then for |β| < 1 the process λi,t is now convergent with unconditional mean
E[\lambda_{i,t}] = \exp\left(\frac{\alpha}{1-\beta} + \frac{\varphi_i}{2(1-\beta^2)}\right). (2.6)
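A quick Monte Carlo check of (2.6) confirms the closed-form unconditional mean of the stationary log-AR(1) specification; the values of \alpha, \beta, and \varphi_i below are illustrative, not estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, phi = -0.1, 0.9, 0.04   # hypothetical values with |beta| < 1

# Closed-form unconditional mean, as in (2.6)
mean_analytic = np.exp(alpha / (1 - beta) + phi / (2 * (1 - beta**2)))

# Long simulation of ln(lambda_t) = alpha + beta ln(lambda_{t-1}) + xi_t
T = 200_000
z = np.empty(T)
z[0] = alpha / (1 - beta)            # start at the stationary mean of z
for t in range(1, T):
    z[t] = alpha + beta * z[t - 1] + rng.normal(0.0, np.sqrt(phi))

mean_mc = np.exp(z[T // 10:]).mean()  # discard an initial burn-in
```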
2.3.2 Alternative volatility process specification
As an alternative to the Clark specification, we propose respecifying the volatility process to more easily account
for spill-over effects in covariance across time, through the use of a multivariate Inverse Wishart specification.
While in the Clark specification the nonstationary λi,t processes drive the covariance dynamics, Γt, through
the lower-triangular B matrix, the Inverse Wishart model specifies the dynamics of the latent covariance matrices
directly. Fundamentally, the Inverse Wishart process implies stationary covariance matrix dynamics, which allows us to formulate the covariance matrix process as autoregressive, with a finite unconditional mean that exists under the conditions given below.
The Inverse Wishart Stochastic Volatility (IWSV) model is given as follows:

v_t = \Pi(L)\,(y_t - \Psi d_t), (2.1a)

v_t \mid \Sigma_t, \underline{y}_{t-1} \sim MVN_p(0, \Sigma_t), (2.7a)

\Sigma_t \mid \Sigma_{t-1}, \underline{y}_{t-1} \sim IW_p(\nu, S_{t-1}), (2.7b)

where S_{t-1} = \left(CC' + \sum_{k=1}^{K} A_k\Sigma_{t-k}A_k'\right)(\nu - p - 1), (2.7c)
where \underline{y}_t, for instance, denotes the set of current and lagged values of y_t, and IW_p(\nu, S) denotes the Inverse Wishart distribution with dimension p, degree of freedom (i.e. shape) parameter \nu, and scale matrix S.^{13} The specification of the scale matrix in (2.7c) is the same as in the multivariate ARCH models considered in Engle and Kroner (1995). In particular, the p \times p matrices C and A_k, k = 1, \ldots, K, are identified if C is lower triangular with strictly positive elements along the main diagonal, and if the top left element of each A_k is strictly positive.
The Inverse Wishart distribution is a continuous distribution for stochastic, symmetric, positive-definite matrices [see e.g. Press (1982)]. The joint density function of the Inverse Wishart distribution has the simple analytic expression:
f(\Sigma; \nu, S) = \frac{|\det(S)|^{\nu/2}\,|\det(\Sigma)|^{-\frac{\nu+p+1}{2}}}{2^{\nu p/2}\,\Gamma_p(\nu/2)}\,\exp\left[-\frac{1}{2}\mathrm{Tr}\left(S\Sigma^{-1}\right)\right], (2.8)
where Tr(·) denotes the trace operator and Γp(·), the multivariate Gamma function.14
The dynamics of the stochastic volatility matrix given in (2.7c) do not involve lagged values of the series
variable (yt). Thus, the stochastic volatility is exogenous and the IWSV specification assumes no leverage
effects.
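For intuition, one step of the IWSV transition (2.7b)-(2.7c) can be simulated with numpy alone, assuming an integer degree of freedom \nu and hypothetical C and A_1 matrices; the draw of \Sigma_t is obtained by inverting a Wishart draw, per the equivalence in footnote 13:

```python
import numpy as np

rng = np.random.default_rng(2)
p, nu = 2, 12                      # hypothetical dimension and (integer) dof

C = np.array([[0.3, 0.0],
              [0.1, 0.3]])         # lower triangular, positive diagonal
A1 = 0.9 * np.eye(p)               # single-lag (K = 1) loading matrix
Sigma_prev = np.eye(p)             # Sigma_{t-1}

# Scale matrix as in (2.7c): S_{t-1} = (CC' + A1 Sigma_{t-1} A1')(nu - p - 1)
S = (C @ C.T + A1 @ Sigma_prev @ A1.T) * (nu - p - 1)

def draw_inverse_wishart(nu, S, rng):
    """Sigma ~ IW_p(nu, S): draw W ~ W_p(nu, S^{-1}) and invert (integer nu)."""
    chol = np.linalg.cholesky(np.linalg.inv(S))
    X = rng.standard_normal((nu, S.shape[0])) @ chol.T   # rows ~ N(0, S^{-1})
    return np.linalg.inv(X.T @ X)

Sigma_t = draw_inverse_wishart(nu, S, rng)

# Conditional mean (2.9a): E[Sigma_t | Sigma_{t-1}] = S_{t-1}/(nu - p - 1)
cond_mean = S / (nu - p - 1)
```

Averaging many such draws should recover the conditional mean CC' + A_1\Sigma_{t-1}A_1'.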
13 A stochastic, symmetric, positive-definite matrix \Sigma follows the Inverse Wishart distribution, \Sigma \sim IW_p(\nu, S), if and only if \Sigma^{-1} follows the Wishart distribution: \Sigma^{-1} \sim W_p(\nu, S^{-1}).

14 The multivariate Gamma function is defined as \Gamma_p(a) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma(a + (1-j)/2), and depends on the dimension p and argument a.
From the properties of the Inverse Wishart distribution, we deduce the first and second-order conditional
moments of the volatilities [Press (1982)]:
E[\Sigma_t|\Sigma_{t-1}] = \frac{S_{t-1}}{\nu - p - 1} = CC' + \sum_{k=1}^{K} A_k\Sigma_{t-k}A_k', (2.9a)

\mathrm{Var}[\sigma_{ij,t}|\Sigma_{t-1}] = \frac{(\nu - p + 1)\,s_{ij,t-1}^2 + (\nu - p - 1)\,s_{ii,t-1}s_{jj,t-1}}{(\nu - p)(\nu - p - 1)^2(\nu - p - 3)}, (2.9b)

and \mathrm{Cov}[\sigma_{ij,t}, \sigma_{kl,t}|\Sigma_{t-1}] = \frac{2\,s_{ij,t-1}s_{kl,t-1} + (\nu - p - 1)\left(s_{ik,t-1}s_{jl,t-1} + s_{il,t-1}s_{kj,t-1}\right)}{(\nu - p)(\nu - p - 1)^2(\nu - p - 3)}, (2.9c)

where \sigma_{ij,t} is the ijth element of \Sigma_t and s_{ij,t-1} is the ijth element of S_{t-1}.
This specification is similar to that in Philipov and Glickman (2006), although we modify it slightly. First, we add the constant matrix CC' to the scale matrix expression (2.7c) in order to allow for a non-zero unconditional mean of the volatility process. Second, we allow for a number of lags K instead of just one. Finally, Philipov and Glickman (2006) employ an extra parameter, d, to allow for a geometric autoregressive recursion of varying rates, as opposed to the fixed arithmetic average employed here. For instance, they consider the similar model with autoregressive lag order set to 1, but allow for the lagged effect to be taken into account by means of a \Sigma_{t-1}^{d} matrix:^{15}

S_{t-1} = \nu A^{-1/2\prime}\Sigma_{t-1}^{d}A^{-1/2}. (2.10)
Note, the purpose here is not to improve on the Philipov and Glickman model, but rather to suggest something
similar as a useful alternative to Clark (2011) in terms of forecasting.
At this point we present weak stationarity conditions of the IWSV volatility process.
Proposition 2.3.1. Existence of the unconditional mean of the IWSV process

The unconditional mean of the IWSV process exists if and only if all the eigenvalues of the matrix \Upsilon = \sum_{k=1}^{K}\Xi_k are less than 1 in modulus. In this case the unconditional mean is given by:

E[\sigma_t] = (I_g - \Upsilon)^{-1}c, (2.11)

where g = \frac{p(p+1)}{2}, c = \mathrm{vech}(CC'), \sigma_t = \mathrm{vech}(\Sigma_t), and \Xi_i = L(A_i \otimes A_i)D. The existence of the unconditional mean is a necessary condition for (weak) stationarity.

Note that in this case L and D are the elimination and duplication matrices respectively, so that \mathrm{vec}(X) = D\,\mathrm{vech}(X) and \mathrm{vech}(X) = L\,\mathrm{vec}(X).^{16}
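The eigenvalue condition of Proposition 2.3.1 is easy to check numerically. The sketch below builds the elimination and duplication matrices from scratch and verifies the implied fixed point \bar{\Sigma} = CC' + A_1\bar{\Sigma}A_1' for hypothetical A_1 and C (with K = 1):

```python
import numpy as np

def vech_index(i, j, n):
    # position of X[i, j] (with i >= j) in vech(X), column-major ordering
    return j * n - j * (j + 1) // 2 + i

def duplication(n):
    """D: vec(X) = D vech(X) for symmetric X."""
    D = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(n):
            D[j * n + i, vech_index(max(i, j), min(i, j), n)] = 1.0
    return D

def elimination(n):
    """L: vech(X) = L vec(X)."""
    Lm = np.zeros((n * (n + 1) // 2, n * n))
    for j in range(n):
        for i in range(j, n):
            Lm[vech_index(i, j, n), j * n + i] = 1.0
    return Lm

# Hypothetical K = 1 example
p = 2
A1 = np.array([[0.7, 0.1],
               [0.1, 0.7]])
C = np.array([[0.3, 0.0],
              [0.1, 0.3]])

D, Lm = duplication(p), elimination(p)
Upsilon = Lm @ np.kron(A1, A1) @ D
stationary_in_mean = np.max(np.abs(np.linalg.eigvals(Upsilon))) < 1

# Unconditional mean (2.11): vech(Sigma_bar) = (I_g - Upsilon)^{-1} vech(CC')
g = p * (p + 1) // 2
c = Lm @ (C @ C.T).reshape(-1, order="F")
sigma_bar = np.linalg.solve(np.eye(g) - Upsilon, c)
Sigma_bar = (D @ sigma_bar).reshape(p, p, order="F")
```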
15 Note that in Philipov and Glickman (2006) the notation differs. For example, they have that \Sigma_t^{-1}|\nu, S_{t-1} \sim W_p(\nu, S_{t-1}), where S_{t-1} = \frac{1}{\nu}A^{1/2}\left(\Sigma_{t-1}^{-1}\right)^{d}A^{1/2\prime}. Therefore, this implies that \Sigma_t|\nu, S_{t-1}^{-1} \sim IW_p(\nu, S_{t-1}^{-1}), which depends on the inverse scale matrix instead of the scale matrix. Therefore our scale matrix is the inverse of theirs.

16 The duplication matrix is the unique n^2 \times n(n+1)/2 matrix, D, which, for any n \times n symmetric matrix X, transforms vech(X)
Proof of Proposition 2.3.1
From equation (2.9a) we have:
\Sigma_t = CC' + \sum_{k=1}^{K} A_k\Sigma_{t-k}A_k' + Z_t, (2.12)
where Zt is a mean zero matrix of weak white noises.
First, by recursive substitution of \Sigma_{t-i}, i = 1, 2, \ldots, we can show that the right hand side of (2.12) converges in expectation. Next, taking unconditional expectations and writing \Sigma = E[\Sigma_t], we have:
\Sigma = CC' + \sum_{k=1}^{K} A_k\Sigma A_k'. (2.13)
Vectorizing the above, we have:

\mathrm{vec}(\Sigma) = \mathrm{vec}(CC') + \sum_{k=1}^{K}\mathrm{vec}(A_k\Sigma A_k') (2.14a)

= \mathrm{vec}(CC') + \sum_{k=1}^{K}(A_k \otimes A_k)\,\mathrm{vec}(\Sigma) (2.14b)

\Rightarrow L\,\mathrm{vec}(\Sigma) = L\,\mathrm{vec}(CC') + \sum_{k=1}^{K} L(A_k \otimes A_k)D\,\mathrm{vech}(\Sigma) (2.14c)

\Rightarrow \mathrm{vech}(\Sigma) = \mathrm{vech}(CC') + \sum_{k=1}^{K}\Xi_k\,\mathrm{vech}(\Sigma). (2.14d)
And so Proposition 2.3.1 follows.
The condition in Proposition 2.3.1 is a necessary condition for stationarity, but not a sufficient one. In a Bayesian approach, we are interested in the whole distribution, not only in the mean. Thus strict stationarity, that is, stationarity of the entire distribution, has to be considered, not only weak stationarity. Unfortunately, necessary and sufficient conditions for the strict stationarity of the autoregressive Inverse Wishart process have not yet been derived in the literature.^{17}
2.3.3 Comments
The advantages of such a change to the specification of the volatility process defined by the IWSV model are as
follows:
into vec(X), where vec(·) is the vectorization operator which maps from the n \times n dimensional space to the n^2 \times 1 dimensional space, and vech(·) is the operator that omits the upper (resp. lower) triangle of the symmetric matrix X, so that it maps from the n \times n dimensional space into the n(n+1)/2 \times 1 dimensional space. The elimination matrix performs the inverse operation: it is the unique n(n+1)/2 \times n^2 matrix, L, which, for any n \times n symmetric matrix X, transforms vec(X) into vech(X). See Magnus and Neudecker (1980) for more details.
17Whereas they have been derived for the analogue Wishart autoregressive process (WAR) [see Gourieroux et al. (2009)].
1. The direct specification of the dynamics of the latent stochastic volatility process, Σt, precludes the need
to specify a B matrix.
2. These autoregressive dynamics between volatility series are more easily interpreted as volatility spill-over
effects, since we no longer need to disentangle the convoluted relationships implied by the B matrix and
the independent volatility driving processes, λi,t.
3. The model is now invariant to permutation of the order of the observed series.
4. As was shown, it is easy to derive conditions ensuring the existence of the unconditional mean of the
processes (Σt) and (yt). However, this condition is a weak one and the condition of strong stationarity
remains to be shown.
5. The assumption of stationary volatility dynamics resolves a problem with forecasts, since assuming nonsta-
tionary volatility would make our forecast density prediction intervals “blow up” as the horizon increases.
In both the Clark and IWSV specifications, the volatility processes are coupled to the vector autoregressive process, with trends, for the observed variables y_t. Both specifications have a state-space representation, with the VAR(J) as observation equation and the IWSV or Clark process as the state equation, the covariance being the "state" of the model. In control engineering, a state-space representation is a mathematical model of a physical system as a set of input, output and state variables related by first-order differential equations. However, in this case the model is not a linear state-space representation [see the system in (2.7)]. Therefore, the standard Kalman filter algorithm for extracting the state from the noise will not be optimal.
Note that our Inverse Wishart model can be inverted to show the precision matrix as Wishart distributed (omitting the CC' constant matrix for simplicity) as

\Sigma_t^{-1}\mid\Sigma_{t-1} \sim W_p\left(\nu, \left(\sum_{k=1}^{K} A_k\Sigma_{t-k}A_k'\right)^{-1}\right), (2.15)

or, after the change of notation \Omega_t = \Sigma_t^{-1}, as

\Omega_t\mid\Omega_{t-1} \sim W_p\left(\nu, \left(\sum_{k=1}^{K} A_k\Omega_{t-k}^{-1}A_k'\right)^{-1}\right). (2.16)

However, an autoregressive Wishart matrix process, \Omega_t, say, is usually written as [Gourieroux et al. (2009)]

\Omega_t\mid\Omega_{t-1} \sim W_p\left(\nu, \sum_{k=1}^{K} A_k\Omega_{t-k}A_k'\right). (2.17)
And so, the only difference between the two models is the specification of the scale matrix: as an arithmetic
average in the case of the standard Wishart autoregressive process, or as a harmonic average in our case. Therefore,
when the lag order of the autoregression K > 1, we have an asymmetry between the behaviour of the precision
matrix and covariance matrix processes. Of course, this issue does not affect Philipov and Glickman (2006) since
their autoregression is only of order 1.
Since we have that

E\left[\frac{1}{x}\right] \geq \frac{1}{E[x]} \;\Leftrightarrow\; \left(E\left[\frac{1}{x}\right]\right)^{-1} \leq E[x], (2.18)
we can expect that the harmonic average is smaller than the arithmetic average (even in the case of matrices, though we omit the proof). It is possible that this inequality may be useful in deriving sufficient conditions for strict stationarity of the IWSV autoregressive model.
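The matrix analogue of this inequality can be illustrated numerically: for simulated positive-definite matrices, the arithmetic average minus the harmonic average is positive semidefinite in the Loewner order. A minimal check:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 2000

# Random positive-definite matrices X_i = G_i G_i' + 0.1 I_p
G = rng.standard_normal((n, p, p))
X = G @ np.swapaxes(G, 1, 2) + 0.1 * np.eye(p)

arith = X.mean(axis=0)                                   # arithmetic average
harm = np.linalg.inv(np.linalg.inv(X).mean(axis=0))      # harmonic average

# The gap (arithmetic minus harmonic) should be positive semidefinite
gap_eigs = np.linalg.eigvalsh(arith - harm)
```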
As an aside, Fox and West (2011) also propose a novel class of stationary covariance matrix processes which exploit properties of Inverse Wishart partitioned matrix theory. Specifically, by augmenting the parameter state-space they show that we can easily obtain representations for the terms in a factorization of the joint density of covariance matrices across time, f(\Sigma_T, \ldots, \Sigma_0) = \prod_{t=1}^{T} f(\Sigma_t, \Sigma_{t-1}) \big/ \prod_{t=2}^{T} f(\Sigma_{t-1}). This expression defines a stationary first-order Markov process on the covariance matrices across time, with the marginal distribution given as \Sigma_t \sim IW_q(\nu + 2, \nu S), and given the following augmented matrix

\begin{bmatrix} \Sigma_t & \phi_t' \\ \phi_t & \Sigma_{t-1} \end{bmatrix} \sim IW_{2q}\left(\nu + 2,\; \nu\begin{bmatrix} S & SF' \\ FS & S \end{bmatrix}\right), (2.19)
where \phi_t = \Upsilon_t\Sigma_{t-1}, we have by Inverse Wishart partitioned matrix theory that the covariance process can be written as an AR(1) process, \Sigma_t = \Psi_t + \Upsilon_t\Sigma_{t-1}\Upsilon_t', with \Upsilon_t representing a random coefficient matrix and \Psi_t representing an innovation (note both \Upsilon_t and \Psi_t are latent variables). Under this framework the conditional density of \Sigma_t\mid\Sigma_{t-1} is not of an analytical form but can nonetheless be explored theoretically. See Fox and West (2011) for more details.
2.4 Priors
The Bayesian estimation framework employed requires us to specify prior beliefs about the parameter set, and this is done through the specification of prior densities. In most cases the prior densities are chosen to be conditionally conjugate: that is, they are chosen from a known family such that the conditional posterior density (the density of a particular parameter, conditional on both the other parameters and the data) is of the same family as the prior. This facilitates estimation greatly since the need for arbitrarily choosing a suitable
proposal density, as in a Metropolis-Hastings algorithm (MH), is avoided completely—in fact, in this case the
proposal is always accepted and the MH algorithm is just a special case of the Gibbs sampler [Greenberg (2008),
pg.101]. The following Sections outline the specific families the prior densities take, as well as chosen values for
hyperparameters.
The Clark and IWSV specifications share the same dynamic model for the observed macroeconomic time series, y_t, given the volatility path, with parameters \theta_1 = \{\Psi, \Pi_1, \ldots, \Pi_J\}, but different dynamics for the volatility process, with parameters \theta_2 = \{B, \Phi\} for the Clark model^{18} and \theta_2 = \{C, \nu, A_1, \ldots, A_K\} for the IWSV model. We assume that the parameters \theta_1 and \theta_2 are independent under the prior distributions. We describe below in greater detail the priors for both \theta_1 and \theta_2.
2.4.1 VAR(J) priors
i) Prior on Π
The prior for the VAR coefficients \Pi_j follows a modified Minnesota specification [see Litterman (1986)]. In this case we assume that the prior for the joint density of \Pi' = [\Pi_1, \Pi_2, \ldots, \Pi_J] is Normal, \Pi \sim N[\mu_\Pi, \Xi_\Pi], where the autoregressive order J is assumed known. Moreover, the prior mean of the joint density of the elements of \Pi assumes that the VAR follows an AR(1) process, i.e. the means of the prior density for all elements of the autoregressive matrices beyond lag 1 are set to 0. Since GDP growth displays more autoregressive decay in levels, we set its first-order autoregressive prior mean to 0.25 and set the others to 0.8. Cross equation prior means, that is, the means of the prior density for the off-diagonal elements of \Pi_{ik,1} for i \neq k, are also set to 0.
Let us now explain how the variances and covariances of the prior are chosen. The Minnesota "own equation" variances, that is, the variances of the prior density for the main diagonal elements of \Pi_j, shrink as a harmonic series with each additional lag (i.e. \omega_{ii,j} = 0.2/j for j = 1, \ldots, J). Also, "cross equation" variances are typically set to \omega_{ik,j} = 0.5\,(0.2/j)\,(\sigma_i^*/\sigma_k^*), where \sigma_i^* is the estimated standard error of the residuals from a univariate autoregression with six lags on the ith macroeconomic series, pre-fit to the data in advance. For simplicity, however, we will instead employ \omega_{ik,j} = 0.5\,(0.2/j) as the variance of the prior density for the i,kth element of \Pi_j.
ii) Prior on Ψ
Priors on the deterministic parameters of \Psi defining the trend are extremely important given the modest sample size employed, and are chosen so as to influence the series' trends toward certain reasonable values. In the case where the trend is assumed constant, d_t = 1 and \Psi is a vector. This dramatically reduces the number of parameters that need to be estimated as the number of series increases. However, it places a prior constraint on the model by assuming that the selected constant trend is correct. On the other hand, if we allow the
18Where Φ is the diagonal matrix with the variances of the λi,t volatility driving process shocks, ϕi, along its main diagonal.
trends to enter individually through the dt term where dt=[1, f(t− 1), g(t)]′, and f(t) and g(t) are exponentially
smoothed trends for the unemployment rate and inflation growth respectively, then Ψ becomes a p×3 matrix from
which we can statistically evaluate whether the relevant diagonal elements are indeed equal to 1 (which would
imply the trends are in fact correct).
In either case we assume that the prior for the joint density of the elements of Ψ is Normal, Ψ ∼ N [µΨ,ΞΨ].
Moreover, we assume that the priors for Ψ and Π are independent. GDP growth is influenced to have a constant
trend around 3.0% through its prior mean, while inflation and unemployment are influenced to center around their
trends, g(t) and f(t−1) which are exponentially smoothed values of inflation growth and the unemployment rate
respectively (see Section 2.2). Finally, the interest rate is centered around the same trend as inflation; however, we
also add to this the constant trend of 2.5% to reflect the real long-run rate. More precisely, for the macroeconomic
series taken in order as: GDP growth, the inflation rate, the interest rate, and unemployment rate, the prior mean
of \Psi will take the form

\mu_\Psi = \begin{bmatrix} 3.0 \\ 0 \\ 2.5 \\ 0 \end{bmatrix} (2.20)

when the trends are constant, and

\mu_\Psi = \begin{bmatrix} 3.0 & 0 & 0 \\ 0 & 0 & 1 \\ 2.5 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} (2.21)

when the trends are driven by the 3-dimensional d_t.
The prior variances of \Psi are set as follows: GDP growth, 0.2 (0.3); inflation, 0.2 (0.3); the interest rate, 0.6 (0.75); and unemployment, 0.2 (0.3), where these values have been adopted directly from Clark (2011). The first values, not in parentheses, are those employed in the recursive estimation scheme and are tighter, since the gradually increasing sample size tends to limit the influence of the prior. Prior covariances for the elements of \Psi are set to zero.
2.4.2 Volatility model priors
i) Clark model
For the Clark (2011) model, priors on the volatility components of the model are as follows. The prior density
for the elements of B is multivariate Normal and the prior for each of the ϕi, i = 1, . . . , p is Inverse Gamma, and
under the prior, the elements of B and the \varphi_i, i = 1, \ldots, p, are independent. Finally, we borrow the numerical values of the hyperparameters directly from Clark's paper [Clark (2011), pg. 331].
ii) IWSV model
For the Inverse Wishart autoregressive specification, we employ independent multivariate Normal priors on both A_k, \forall k, and C, and, independently, a Gamma prior on (\nu - p). The Gamma prior is set with hyperparameters \alpha = 30, \beta = 2 (shape and rate) so as to represent ignorance of its value, while the multivariate Normal priors for C and the A_k's are set somewhat loosely to let the data speak. In this respect, the prior means for the main diagonal of C are 0.3 and those for the main diagonal of A_1 are set to 0.9 (both prior densities for the off-diagonal elements have zero means, and the means of all other elements of the A_k, k = 1, \ldots, K, matrices are set to 0). Variances are set equal to 0.002 (i.e. a standard deviation of about 0.045), and all covariances are set to zero.
2.5 Model estimation
Both Clark and IWSV model specifications are estimated within the Bayesian framework using Markov Chain
Monte-Carlo (MCMC) techniques, particularly the Gibbs sampler.
i) Gibbs sampler
Indeed, by selecting prior distributions in conjugate families, we can derive closed form expressions of con-
ditional posterior distributions. For expository purposes, let us consider a case in which the set of parameters can
be divided into two subsets, θ1 and θ2, such that we know the expression of conditional posterior distributions
p(θ1|θ2, y) and p(θ2|θ1, y). Let us also assume that it is easy to draw in these conditional posterior distributions.
In general, it is not possible to obtain the closed form expression for the joint posterior distribution p(θ1, θ2|y).
The Gibbs sampler is a method to derive numerically a good approximation of the joint posterior, while also
allowing us to draw in this posterior. The idea is to consider the Markov process θ(m), defined recursively by:
1. \theta_1^{(m)} is drawn from p(\theta_1|\theta_2^{(m-1)}, y),

2. \theta_2^{(m)} is drawn from p(\theta_2|\theta_1^{(m)}, y).
For large m \geq M, the values \theta^{(m)} approximately follow the invariant distribution of the Markov process, that is, the joint posterior. In particular, \theta^{(m)}, for large m, is a draw from p(\theta_1, \theta_2|y).
This approach is easily extended when the set of parameters is divided into more than two subsets [see below
the sequence used for both the Clark and IWSV specifications].
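The two-block scheme can be illustrated on a toy conjugate model (Normal data with unknown mean and variance, under hypothetical Normal and Inverse-Gamma priors), where both conditional posteriors are known families:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(2.0, 1.5, size=400)          # simulated data
n, ybar = y.size, y.mean()

# Hypothetical priors: mu ~ N(0, 10^2), sigma^2 ~ Inverse-Gamma(2, 2)
mu0, tau2, a0, b0 = 0.0, 100.0, 2.0, 2.0

M, mu, sig2 = 5000, 0.0, 1.0
draws = np.empty((M, 2))
for m in range(M):
    # 1. mu | sigma^2, y ~ Normal (conjugate step)
    var = 1.0 / (n / sig2 + 1.0 / tau2)
    mu = rng.normal(var * (n * ybar / sig2 + mu0 / tau2), np.sqrt(var))
    # 2. sigma^2 | mu, y ~ Inverse-Gamma (conjugate step)
    a = a0 + n / 2.0
    b = b0 + 0.5 * np.sum((y - mu) ** 2)
    sig2 = 1.0 / rng.gamma(a, 1.0 / b)
    draws[m] = mu, sig2

post_mu = draws[1000:, 0].mean()            # posterior mean after burn-in
```

With a weak prior and n = 400, the posterior mean of \mu should sit close to the sample mean.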
ii) Augmented parameters
In a Bayesian framework there is little difference between the parameter \theta and the latent volatilities \Sigma_t, t = 1, \ldots, T: both are unobserved and stochastic. Therefore, the Gibbs sampler can be applied jointly to \theta and \Sigma_T = \{\Sigma_1, \ldots, \Sigma_T\} to reconstitute the joint density p(\theta, \Sigma_T|y_T). This joint density has two components, namely p(\theta|y_T), which is the posterior distribution of the parameter, and p(\Sigma_T|\theta, y_T), which is the filtering distribution of the sequence of latent volatilities.
iii) Gibbs sampler steps - Clark (2011)
Specifically, given the parameters described in Section 2.3.1 above, we have the following Gibbs sampling steps for the Clark (2011) benchmark model, where the volatility driving process \Lambda_T is introduced as an augmented parameter to be estimated. All conditional posteriors below are also conditional on the data, y_T, which is left unstated for ease of exposition.
1. Draw the autoregressive coefficients Π′= [Π1,Π2, . . . ,ΠJ] of the VAR, conditional on Ψ, ΛT, B, and
Φ=diag(ϕ1, ϕ2, . . . , ϕp), given a conditionally conjugate multivariate Normal prior, Π ∼ N(µΠ,ΞΠ).
2. Draw the trend coefficients Ψ of the VAR, conditional on Π, ΛT, B, and Φ=diag(ϕ1, ϕ2, . . . , ϕp), given
a conditionally conjugate multivariate Normal prior, Ψ ∼ N(µΨ,ΞΨ).
3. Draw the elements of B (lower triangular with ones in the diagonal) conditional on Π, Ψ, ΛT, and
Φ=diag(ϕ1, ϕ2, . . . , ϕp), given Normal, independent, priors on each of the elements of the B matrix.
4. Draw the elements of the volatility driving process Λt for each time t = 1, . . . , T in sequence, each
conditional on Λ\t,Π, Ψ, B, and Φ=diag(ϕ1, ϕ2, . . . , ϕp), where the notation \t denotes the set of
all matrices except that at time t. A Metropolis-Hastings-within-Gibbs step is required here since the
posterior distribution is of an unknown family.
5. Draw the diagonal elements of \Phi conditional on \Pi, \Psi, B, and \Lambda_T. We assume conditionally conjugate, independent Inverse-Gamma priors on each \varphi_i \sim IG(\frac{\gamma}{2}, \frac{\delta}{2}).
iv) Gibbs sampler steps - IWSV model
Similarly, we have the following Gibbs sampling steps for the IWSV specification, where the covariance matrices \Sigma_t are introduced as augmented variables. Again, all conditional posteriors below are implicitly conditional on the data, y_T.
1. Draw the slope coefficients \Pi' = [\Pi_1, \Pi_2, \ldots, \Pi_J] of the VAR, conditional on \Psi, \Sigma_T, A_k, \forall k, C, and \nu, given the multivariate Normal prior \Pi \sim N(\mu_\Pi, \Xi_\Pi).
2. Draw the steady state coefficients Ψ of the VAR, conditional on Π, ΣT, Ak, ∀k, C, and ν, given multi-
variate Normal prior, Ψ ∼ N(µΨ,ΞΨ).
3. Draw the parameters Ak, ∀k,C, and ν jointly, conditional on Π, Ψ, and ΣT. Multivariate Normal priors
are imposed on the Ak matrices, and the C matrix. A Gamma prior is imposed on the degree of freedom,
ν. A Metropolis-Hastings-within-Gibbs step is required here since the posterior distribution is of an
unknown family.
4. Finally, draw Σt conditional on Σ\t,Ak, ∀k,C, ν, Π, and Ψ, ∀t in sequence. A Metropolis-Hastings-
within-Gibbs step is required here since the posterior distribution is of an unknown family.
The steps above are derived in greater detail within Appendix 2.11.
Under the IWSV process, the direct estimation of the latent stochastic volatility covariance matrix process increases the number of latent parameters from Tp to Tp(p+1)/2, quickly raising the curse of dimensionality as an issue as p increases. Moreover, the number of regular parameters goes from \frac{p(p-1)}{2} + p to \frac{p(p+1)}{2} + Kp^2 + 1, although in this latter case many reparameterizations are possible to reduce the number of regular parameters the IWSV must estimate.
Moreover, since conditionally conjugate priors are unknown at this point for the conditional posterior densities
of the IWSV regular parameters, a Metropolis-Hastings random walk sampler is employed. This additional
Metropolis-Within-Gibbs step requires some extra work to obtain reasonable draws.
Finally, the estimation methods employed for both specifications suffer from the fact that they draw the latent stochastic volatility covariance matrices sequentially across time rather than jointly. Clearly, if the volatilities \Sigma_t are significantly correlated across time, joint sampling would prove superior. This is because the conditional posterior density may have low variance, leading to draws which fail to traverse the full parameter support in a reasonable amount of time. See Greenberg (2008), pg. 94, for a simple example illustrating the problem.
2.6 Forecasts
2.6.1 Point and interval forecasts
Given the Bayesian model estimation framework employed, forecasts can be easily obtained with little extra
computational overhead. Moreover, the Bayesian framework provides an intuitive way of comparing forecast
accuracy.
Generally, the desired predictive density of some forecasted value y_f given the data set y is

p(y_f|y) = \int p(y_f|\theta, y)\,\pi(\theta|y)\,d\theta, (2.22)

where p(y_f|\theta, y) is the predictive distribution given parameter \theta and data y, and \pi(\theta|y) is the posterior distribution
of θ. When the value yf represents the true outcome of the data series known ex post, given the particular model
formulation estimated ex ante, the left hand side is known as the predictive likelihood of the given outcome value
yf [Geweke and Amisano (2010)].
In our specification, formula (2.22) can be applied to forecast a future path y_{T+1}, \ldots, y_{T+H} at date T. Moreover, we can introduce explicitly the stochastic volatility. We get:

p(y_{T+1}, \ldots, y_{T+H}|y_T) = \int \cdots \int p(y_{T+1}, \ldots, y_{T+H}, \Sigma_{T+1}, \ldots, \Sigma_{T+H}, \theta, \Sigma_T|y_T)\, d\theta\, d\Sigma_T \prod_{h=1}^{H} d\Sigma_{T+h}, (2.23)

where again y_T = \{y_T, \ldots, y_0\}. By factorizing the joint density within the integral in (2.23), we get

p(y_{T+1}, \ldots, y_{T+H}|y_T) = \int \cdots \int p(y_{T+1}, \ldots, y_{T+H}|\Sigma_{T+H}, y_T, \theta)\, p(\Sigma_{T+1}, \ldots, \Sigma_{T+H}|\Sigma_T, y_T, \theta)\, p(\Sigma_T, \theta|y_T)\, d\theta\, d\Sigma_T \prod_{h=1}^{H} d\Sigma_{T+h}. (2.24)
We already possess draws from p(\Sigma_T, \theta|y_T) = p(\Sigma_T|\theta, y_T)\,p(\theta|y_T) by using the Gibbs sampler with augmented parameters. Therefore, each time we draw an mth value \theta^{(m)} and \Sigma_T^{(m)} from the Gibbs sampler, we can simultaneously draw sequences y_{T+1}^{(m)}, \ldots, y_{T+H}^{(m)} and \Sigma_{T+1}^{(m)}, \ldots, \Sigma_{T+H}^{(m)}, given the parameterization of the model and its implied conditional normality.
To summarize, we have the following steps:
1. Obtain a draw of \theta^{(m)} and \Sigma_T^{(m)} from the Gibbs sampler.

2. Conditioning on these values and the data, draw a covariance matrix \Sigma_{T+1}^{(m)} using the chosen volatility specification, either Clark or IWSV.

3. Draw y_{T+1}^{(m)} from the VAR(J) Gaussian levels process.

4. Repeat from step (2) for each subsequent horizon, until we finish drawing for horizon T + H.
The steps above provide a draw from the joint conditional density

p(y_{T+1}, \ldots, y_{T+H}, \Sigma_{T+1}, \ldots, \Sigma_{T+H}, \theta, \Sigma_T|y_T). (2.25)

These steps are repeated, providing a sequence of draws for m = M_0, \ldots, M_0 + M - 1, say, with M_0 (the burn-in) and M large.
The sequence of draws can be used to approximate the joint predictive distribution, p(y_{T+1}, \ldots, y_{T+H}|y_T), or some of its moments. Let us focus on point and interval forecasts.
For the purposes of forecasting, we wish to consider both the conditional mean and quantiles of the predictive
density
p(yT+1, . . . ,yT+H|yT) = p(yT+H|yT+H−1) . . . p(yT+1|yT). (2.26)
i) Point forecasts
For the short-term horizon, a consistent estimator of E[y_{T+1}|y_T] is its sample counterpart

\frac{1}{M}\sum_{m=M_0}^{M_0+M-1} y_{T+1}^{(m)}, (2.27)

computed from the final M iterations of the Gibbs sampler.
Moreover, by the Law of Iterated Expectations, we have:
E[yT+1|yT] = E[E[yT+1|ΣT,yT, θ]|yT]. (2.28)
Therefore, another consistent estimator of E[y_{T+1}|y_T] is

\frac{1}{M}\sum_{m=M_0}^{M_0+M-1} E[y_{T+1}|\Sigma_T^{(m)}, y_T, \theta^{(m)}], (2.29)
as long as the conditional expectation E[yT+1|ΣT,yT, θ] has an analytical form.
Moreover, again by the Law of Iterated Expectations, the results above can be generalized to any horizon h = 1, 2, \ldots, H, since:

E[E[\cdots E[y_{T+h}|y_{T+h-1}, \ldots, y_{T+1}, y_T]|y_{T+h-2}, \ldots, y_{T+1}, y_T] \cdots] = E[y_{T+h}|y_T]. (2.30)
ii) Interval forecasts
Interval forecasts at horizon h can be derived by estimating the quantiles of the predictive density p(y_{T+h}|y_T). Let us look for a prediction interval with the \alpha-quantile as lower bound and the (1-\alpha)-quantile as upper bound. This forecast interval can be approximated as follows:

1. Rank the y_{T+h}^{(m)}, m = M_0, \ldots, M_0 + M - 1, in increasing order.

2. The approximated confidence interval admits as lower bound the y_{T+h}^{(m)} at rank \alpha M and as upper bound the y_{T+h}^{(m)} at rank (1-\alpha)M.
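The point and interval recipes above amount to a few lines of numpy; in this sketch the Gibbs draws of y_{T+h} are replaced by stand-in Normal draws purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
M, alpha = 10_000, 0.05

# Stand-in for the M retained Gibbs draws of y_{T+h}
draws = rng.normal(1.0, 2.0, size=M)

# Point forecast: posterior predictive mean, as in (2.27)
point = draws.mean()

# Interval forecast: rank the draws and read off the alpha and 1-alpha quantiles
ranked = np.sort(draws)
lower = ranked[int(np.floor(alpha * M))]
upper = ranked[int(np.ceil((1 - alpha) * M)) - 1]
```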
2.6.2 Forecast comparison
i) Sample windows
The focus of this paper is to compare forecast performance. Forecast comparisons are made by employing a
limited subsample of the data for estimation purposes, thus leaving some latter part of the data available as the
“true” outcome of the macroeconomic series. Furthermore, we do not simply estimate the parameters of the model
once; rather, we estimate the parameters a number of times in sequence, n = 1, \ldots, N, over what we call "sample windows," each focusing on a different subsample of the entire data set.
As in Clark (2011), we employ both "recursive" and "rolling" schemes for these sample windows. Under each scheme, sample windows are iterated upon: first, a subsample of the entire data set is isolated; this subsample is then used to estimate the model parameters; forecasts are then generated; and comparisons are made according to the chosen metrics discussed in this section. Both schemes employ the same subsample of the data for the first sample window. However, the two schemes differ in how they deal with sample windows after the first. Under the recursive scheme, for each subsequent sample window, one future data point is appended to the end of the subsample, so the subsample grows larger as the sequence of sample windows progresses. Under the rolling scheme, the size of the subsample is fixed, so the subsample shifts forward in time by one data point at each iteration.
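The two schemes amount to simple index bookkeeping; a minimal sketch (the helper name `sample_windows` and argument `init_size`, the first-window length, are ours):

```python
def sample_windows(init_size, N, scheme="recursive"):
    """Return (start, end) estimation-subsample index pairs (end exclusive)
    for sample windows n = 1, ..., N.

    recursive: start fixed at 0, the window grows by one point per iteration.
    rolling:   window length fixed at init_size, both ends shift forward.
    """
    out = []
    for n in range(N):
        end = init_size + n                       # T(n) moves forward either way
        start = 0 if scheme == "recursive" else n
        out.append((start, end))
    return out
```

With init_size = 130, for example, window n = 2 is (0, 131) under the recursive scheme and (1, 131) under the rolling scheme.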
Figures 2.1 and 2.2 illustrate both the recursive and rolling window schemes, where
we have arbitrarily chosen a subsample size of 130 data points for the initial sample window. Notice how the time
index of the last value in the subsample, T , changes across the iterations of sample windows, n = 1, . . . , N , and
so we can denote them T (n) to show that they depend on the value of n.
ii) Comparing point forecasts
In comparing point forecasts we can employ the mean squared error (MSE) estimator for any forecast horizon h = 1, …, H. Forecast errors can be computed where y*_{T+h} is the "true" out-of-sample data point at forecast horizon h and the point forecast E[y_{T+h}|y_T] is estimated as described above.

[Figure 2.1: Subsample sequence by recursive window]

[Figure 2.2: Subsample sequence by rolling window]
The mean squared error estimator is given as
\[
MSE_{h,N} = \frac{1}{N}\sum_{n=1}^{N} u_{T(n)+h}\, u_{T(n)+h}', \tag{2.31}
\]
where the forecast error u_{T(n)+h} = E[y_{T(n)+h}|y_{T(n)}] − y*_{T(n)+h} is computed for each iteration, n, and where T(n) denotes the end-of-sample data point. In the case of the rolling window scheme, the set y_{T(n)} denotes only those data points in the rolling window and not the entire set of data starting from y_1.

The main diagonal elements of MSE_{h,N} are the mean squared forecast errors of each individual series at horizon h, while the off-diagonal elements are the mean cross-products of the forecast errors across the macro series.
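A minimal sketch of (2.31), assuming the N point forecasts and realized values are stacked row-wise:

```python
import numpy as np

def mse_matrix(forecasts, actuals):
    """Eq. (2.31): average outer product of forecast errors across N windows.

    forecasts, actuals : (N, p) arrays of E[y_{T(n)+h}|y_{T(n)}] and y*_{T(n)+h}.
    Main diagonal: mean squared errors per series; off-diagonals: mean
    cross-products of errors across series.
    """
    u = np.asarray(forecasts, dtype=float) - np.asarray(actuals, dtype=float)
    return u.T @ u / u.shape[0]
```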
iii) Model fit
The standard Bayesian tool for measuring the overall forecast performance is the predictive likelihood described above in (2.22) [Geweke and Amisano (2010)]. The finite sample approximation, within the context of our model, is given as:
\[
\hat p\big(y^*_{T(n)+h},\ldots,y^*_{T(n)+1}\mid y_{T(n)}\big) = \frac{1}{M}\sum_{m=M_0+1}^{M_0+M} p\big(y^*_{T(n)+h},\ldots,y^*_{T(n)+1}\mid \Sigma_{T(n)}^{(m)}, y_{T(n)}, \theta^{(m)}\big) \approx E\big[\,p\big(y^*_{T(n)+h},\ldots,y^*_{T(n)+1}\mid \Sigma_{T(n)}, y_{T(n)}, \theta\big)\mid y_{T(n)}\big] = p\big(y^*_{T(n)+h},\ldots,y^*_{T(n)+1}\mid y_{T(n)}\big), \tag{2.32}
\]
by the law of large numbers, where y*_{T(n)+h} is the true out-of-sample data point at horizon h. This estimator gives us an idea of how well the model and parameter estimates "fit" the true out-of-sample data. That is, given that y*_{T(n)+h}, …, y*_{T(n)+1} are the actual values observed ex-post, what is the probability of their occurrence under our model and ex-ante estimated parameters, given the subsample window at iteration n? If one model is more congruent with the actual
future outcomes than another, its predictive likelihood should be greater.
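In practice the Monte Carlo average in (2.32) is best computed from per-draw log densities with the log-sum-exp trick, since joint predictive densities can underflow; a sketch (the per-draw log densities are assumed precomputed, and the helper name is ours):

```python
import numpy as np

def log_pred_likelihood(log_dens):
    """Log of the Monte Carlo average in eq. (2.32).

    log_dens : length-M array of ln p(y* | Sigma^(m), y_T, theta^(m))
    evaluated at each retained Gibbs draw.
    """
    log_dens = np.asarray(log_dens, dtype=float)
    c = log_dens.max()                            # shift for numerical stability
    return c + np.log(np.mean(np.exp(log_dens - c)))
```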
When the entire sample data set is known ex-post, we can interpret the sequence of one-step ahead predictive
densities as the marginal likelihood:
\[
p(y_{T^*}) = \prod_{t=1}^{T^*-1} p\big(y^*_{t+1}\mid y_t\big). \tag{2.33}
\]
However, we would like to generalize (2.33) to account for our iterative sample window schemes discussed above.
Instead of assuming that the entire data set is known, rather we will assume that under each sample window, n,
only the data yT(n) is known ex-ante when we estimate the model parameters that define pn(·). Then, keeping
in mind the ex-post predictive likelihood as in (2.32), we can take the product across the N sample windows to
obtain a measure of ex-post fit at the first horizon h = 1:
\[
\prod_{n=1}^{N} \hat p_n\big(y^*_{T(n)+1}\mid y_{T(n)}\big). \tag{2.34}
\]
Under the recursive window scheme, the expression (2.34) is the analog to (2.33), but where we take the product
across only a subset of the data, t = T (1), . . . , T (N), and where we employ the changing estimated predictive
distribution pn(·).
However, under the rolling sample window scheme, it is not clear how to interpret (2.34). Moreover, taking
the product across sample windows under changing densities and window schemes forces us to reinterpret (2.34)
not as a density, but purely as a product of forecast metrics. Therefore, taking logs we can transform the product
(2.34) into a sum and interpret this metric as an average across dependent forecast attempts. Moreover, we can
also consider robustness of the forecast attempts via sample moments such as the variance across sample window
forecasts. For example, we have the mean of the log-predictive likelihoods as:
\[
MLPL_{h=1,N} = \frac{1}{N}\sum_{n=1}^{N} \ln \hat p_n\big(y^*_{T(n)+1}\mid y_{T(n)}\big), \tag{2.35}
\]
where each term in the sum will hereon be denoted as LPL_{h=1,n}, and the variance is given as
\[
VLPL_{h=1,N} = \frac{1}{N}\sum_{n=1}^{N} \Big(\ln \hat p_n\big(y^*_{T(n)+1}\mid y_{T(n)}\big) - MLPL_{h=1,N}\Big)^2. \tag{2.36}
\]
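Given the LPL_{h,n} values collected across windows, (2.35) and (2.36) are just the first two sample moments; a minimal sketch:

```python
import numpy as np

def mlpl_vlpl(lpl):
    """Eqs. (2.35)-(2.36): mean and variance of the log predictive
    likelihoods LPL_{h,n} across the N sample windows.

    lpl : length-N array of ln p_n(y*_{T(n)+1} | y_{T(n)}).
    """
    lpl = np.asarray(lpl, dtype=float)
    mlpl = lpl.mean()
    vlpl = np.mean((lpl - mlpl) ** 2)             # divides by N, as in (2.36)
    return mlpl, vlpl
```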
Finally, we generalize the above two sample moments to any horizon by replacing the predictive likelihoods with
the ex-post one-step ahead predictive distribution at horizon h:
\[
\hat p_n\big(y^*_{T(n)+h}\mid y^*_{T(n)+(h-1)},\ldots,y^*_{T(n)+1}, y_{T(n)}\big). \tag{2.37}
\]
That is, the last term in the factorization of the joint predictive likelihood given in (2.32).
We can compare the Clark and IWSV specifications by looking at how these sample moments differ across forecast horizons. For example, we could compare the difference between the two models' MLPL_{h,N}'s across horizons h = 1, …, H as a means of comparing the "term structure" of competing forecast performance across the horizons.
Furthermore, this difference can be “decomposed” into a sum of N log-ratios which can be compared across
sample windows n = 1, . . . , N to suggest at which sample window, n, either model did better, or worse, at
forecasting given some fixed horizon h. This difference, given, say, models A ≡ IWSV and B ≡ Clark is
defined, for example given h = 1, as
\[
N\big(MLPL^A_{h=1,N} - MLPL^B_{h=1,N}\big) = \sum_{n=1}^{N} \ln \hat p_{A,n}\big(y^*_{T(n)+1}\mid y_{T(n)}\big) - \sum_{n=1}^{N} \ln \hat p_{B,n}\big(y^*_{T(n)+1}\mid y_{T(n)}\big) = \sum_{n=1}^{N} \ln \frac{\hat p_{A,n}\big(y^*_{T(n)+1}\mid y_{T(n)}\big)}{\hat p_{B,n}\big(y^*_{T(n)+1}\mid y_{T(n)}\big)}, \tag{2.38}
\]
after multiplication by N. In fact, each log-ratio term in the sum can be interpreted as the ex-post predictive
Bayes factor in favour of model A over model B, at sample window n.
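The decomposition (2.38) amounts to differencing the per-window log predictive likelihoods; a sketch (helper name ours):

```python
import numpy as np

def log_bayes_factors(lpl_a, lpl_b):
    """Per-window ex-post predictive log Bayes factors in favour of model A
    over model B: the log-ratio terms in eq. (2.38)."""
    return np.asarray(lpl_a, dtype=float) - np.asarray(lpl_b, dtype=float)
```

By construction, the sum of these terms equals N times the difference of the two models' MLPL_{h,N}'s.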
Bayes factors are the standard Bayesian method of model comparison. The predictive likelihood method
represents an inherently Bayesian approach to forecast comparison and as such we do not require p-values, since
we obtain the finite sample distribution directly. For more details on Bayesian versus frequentist approaches to
forecast generation and analysis see Geweke and Amisano (2010).
2.7 Applications
This section now turns to applications. The first subsection evaluates the implementation of the Bayesian
estimation methodology via simulated data. The second subsection applies the Clark and IWSV VAR volatility
specifications to the macroeconomic data set. Specifically, we will endeavour to compare the two models along
a number of dimensions, including: point and interval forecast accuracy, posterior trend estimation, VAR rate of
decay, volatility process behaviour, and estimation of the other model parameters.
2.7.1 Monte-Carlo Analysis
We first perform a Monte-Carlo analysis to provide some insight into the implementation of the Bayesian methodology. We generate an artificial data set following an IWSV model. This dataset is used within a rolling sample window scheme to iterate a sequence of estimations of both the Clark and IWSV specifications. The experiment will provide information on the convergence and accuracy of the IWSV-based estimation of the parameters. It will also be used to detect the misspecification of Clark's model. Moreover, we will compare forecasts from the misspecified model according to the metrics discussed in Section 2.6.2.
i) The Data Generating Process
We simulate 370 data points according to the IWSV model. The selected orders are 3 for the VAR component,
and 3 for the Inverse Wishart component. The parameter values are:
\[
\begin{aligned}
C &= 0.3\, I_p, &(2.39a)\\
A_1 &= \begin{bmatrix} 0.5 & 0 & 0 & 0\\ 0 & 0.75 & 0 & 0\\ 0 & 0 & 0.85 & 0\\ 0 & 0 & 0 & 0.98 \end{bmatrix}, \quad A_2 = A_3 = 0, &(2.39b)\\
\Pi_1 &= \begin{bmatrix} 0.25 & 0 & 0 & 0\\ 0 & 0.8 & 0 & 0\\ 0 & 0 & 0.8 & 0\\ 0 & 0 & 0 & 0.8 \end{bmatrix}, \quad \Pi_2 = \Pi_3 = 0, &(2.39c)\\
\Psi &= \begin{bmatrix} 3.0 & 0 & 2.5 & 0 \end{bmatrix}', \quad d_t = 1\ \forall t \text{ (so no trend), and} &(2.39d)\\
\nu &= 30. &(2.39e)
\end{aligned}
\]
This model implies the following relationships among the series variables, y_{i,t}, i = 1, …, 4:
\[
\begin{aligned}
y_{1,t} &= 3.0 + 0.25\,(y_{1,t-1} - 3.0) + v_{1,t}, &(2.40a)\\
y_{2,t} &= 0.8\, y_{2,t-1} + v_{2,t}, &(2.40b)\\
y_{3,t} &= 2.5 + 0.8\,(y_{3,t-1} - 2.5) + v_{3,t}, &(2.40c)\\
\text{and } y_{4,t} &= 0.8\, y_{4,t-1} + v_{4,t}. &(2.40d)
\end{aligned}
\]
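A minimal simulation sketch of this data generating process, assuming SciPy's inverse Wishart parameterization IW(ν, S) with conditional mean S/(ν − p − 1) (variable names are ours, and the initial Σ is arbitrary):

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
p, nu, T = 4, 30, 370
C = 0.3 * np.eye(p)
A1 = np.diag([0.5, 0.75, 0.85, 0.98])
Pi1 = np.diag([0.25, 0.8, 0.8, 0.8])
psi = np.array([3.0, 0.0, 2.5, 0.0])

Sigma = np.eye(p)                    # arbitrary positive definite start value
y = psi.copy()
ys = np.empty((T, p))
for t in range(T):
    # scale chosen so that E[Sigma_t | Sigma_{t-1}] = C C' + A1 Sigma_{t-1} A1'
    S = (nu - p - 1) * (C @ C.T + A1 @ Sigma @ A1.T)
    Sigma = invwishart.rvs(df=nu, scale=S, random_state=rng)
    v = rng.multivariate_normal(np.zeros(p), Sigma)
    y = psi + Pi1 @ (y - psi) + v    # eqs. (2.40a)-(2.40d)
    ys[t] = y
```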
The conditional scale matrix, S_{t−1}, of the Inverse Wishart covariance matrices Σ_t then satisfies:
\[
S_{t-1} = \begin{bmatrix}
0.09 + 0.25\,\sigma^2_{11,t-1} & 0.375\,\sigma_{12,t-1} & 0.425\,\sigma_{13,t-1} & 0.49\,\sigma_{14,t-1}\\
0.375\,\sigma_{21,t-1} & 0.09 + 0.5625\,\sigma^2_{22,t-1} & 0.6375\,\sigma_{23,t-1} & 0.735\,\sigma_{24,t-1}\\
0.425\,\sigma_{31,t-1} & 0.6375\,\sigma_{32,t-1} & 0.09 + 0.7225\,\sigma^2_{33,t-1} & 0.833\,\sigma_{34,t-1}\\
0.49\,\sigma_{41,t-1} & 0.735\,\sigma_{42,t-1} & 0.833\,\sigma_{43,t-1} & 0.09 + 0.96\,\sigma^2_{44,t-1}
\end{bmatrix}(30 - 4 - 1), \tag{2.41}
\]
where the scale matrix is given as in equation (2.7c) above, and the σ_{ij,t−1} are the elements of Σ_{t−1}. Moreover, from equation (2.9a), the conditional mean of the stochastic covariance matrix Σ_t is given as S_{t−1}/(30 − 4 − 1) = CC' + A_1 Σ_{t−1} A_1'. Finally, by applying Proposition 2.3.1, the unconditional means of the stochastic volatilities are given as:
\[
\bar\sigma^2_{11} = 0.12, \quad \bar\sigma^2_{22} = 0.21, \quad \bar\sigma^2_{33} = 0.32, \quad \text{and} \quad \bar\sigma^2_{44} = 2.25, \tag{2.42}
\]
where the stochastic covolatilities have zero unconditional mean.
From the proof of Proposition 2.3.1, the IWSV model can be written as:
\[
\Sigma_t = CC' + A_1 \Sigma_{t-1} A_1' + Z_t, \tag{2.43}
\]
where Z_t is a zero mean matrix of weak white noises. Furthermore, this expression can be vectorized and rewritten in terms of the autoregressive coefficient matrix Υ = L(A_1 ⊗ A_1)D as:
\[
\operatorname{vech}(\Sigma_t) = \operatorname{vech}(CC') + \Upsilon \operatorname{vech}(\Sigma_{t-1}) + \operatorname{vech}(Z_t). \tag{2.44}
\]
The persistence in the (co)volatility series is therefore measured by looking at the eigenvalues of the 10 × 10 dimensional Υ matrix, which determine the rate of reversion to the unconditional mean, or response, given a unit impulse shock, Z_τ, to the (co)volatilities at time τ. Since the matrix Υ is diagonal, we can easily solve for the eigenvalues as:
\[
0.250,\ 0.375,\ 0.425,\ 0.490,\ 0.563,\ 0.638,\ 0.723,\ 0.735,\ 0.833,\ 0.960. \tag{2.45}
\]
The largest eigenvalue is close to a unit root, and so it will influence σ²_{44,t} to exhibit the slowest rate of autoregressive decay.
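Because A_1 is diagonal here, the eigenvalues of Υ = L(A_1 ⊗ A_1)D are simply the pairwise products a_{ii}a_{jj}, i ≥ j, which reproduces the list in (2.45); a quick check:

```python
import numpy as np

a = np.array([0.5, 0.75, 0.85, 0.98])             # diagonal of A1
# eigenvalues of Upsilon for a diagonal A1: products a_i * a_j over i >= j
eig = np.sort([a[i] * a[j] for i in range(len(a)) for j in range(i + 1)])
spectral_radius = eig[-1]                          # 0.98**2, close to a unit root
```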
The following figures describe the simulated data set. Figure 2.3 provides the sample paths of the simulated data series y_t. Figure 2.4 provides a plot of the simulated stochastic volatilities σ²_{ii,t}, i = 1, …, 4, that is, the diagonal elements of the matrix Σ_t. Finally, Figure 2.5 provides the i, jth stochastic correlations ρ_{ij,t} = σ_{ij,t} / √(σ²_{ii,t} σ²_{jj,t}).
These artificial series reveal a number of features. For example, we can see that the values chosen for Ψ influence the mean of the series y_t in Figure 2.3. Moreover, since the shocks v_{i,t} are driven by their conditional volatilities σ²_{ii,t}, the conditional mean E_{t−1}[σ²_{ii,t}] = 0.09 + α_{ii}σ²_{ii,t−1} plays a role in determining the magnitude of the volatility of y_{i,t}. Indeed, series y_{i,t} associated with stochastic volatilities with larger unconditional means, 0.09/(1 − α_{ii}), tend to exhibit larger overall episodes of volatility spikes.

For example, from Figure 2.3, we see that the 4th series exhibits the largest volatility episodes, as it is associated with E_{t−1}[σ²_{44,t}] = 0.09 + 0.96σ²_{44,t−1} and the largest unconditional mean, σ̄²_{44} = 2.25. Moreover, by examining Figure 2.3, we see that the IWSV process exhibits volatility clustering for the 4th series, y_{4,t}. For example, volatility between periods 1 to 100 is much smaller than between periods 200 to 300, and volatility episodes tend to persist across time. Furthermore, these episodes tend to coincide with larger increases in the volatility process σ²_{44,t} given in Figure 2.4, although this relationship is convoluted since it also involves the autoregressive behaviour of the equations given in (2.40). Finally, the stochastic correlations seen in Figure 2.5 tend to grow larger in magnitude as the α²_{ij} of the corresponding i, jth expected variance components increase, since it represents a multiplicative constant in the conditional variance of the (co)volatility, V_{t−1}[σ_{ij,t}]. For example, the ρ_{ij,t}'s associated with the 3rd and 4th series are more variable than those associated with the 1st and 2nd series.
[Figure 2.3: Simulated sample paths y_t (panels y_{t,1}, …, y_{t,4} with means ψ_1, …, ψ_4)]

[Figure 2.4: Simulated stochastic volatilities σ²_{11,t}, σ²_{22,t}, σ²_{33,t}, σ²_{44,t}]

[Figure 2.5: Simulated stochastic correlations ρ_{21,t}, ρ_{31,t}, ρ_{41,t}, ρ_{32,t}, ρ_{42,t}, ρ_{43,t}]
ii) Estimation
Given this simulated data, we estimate both the Clark and IWSV volatility specifications according to the rolling window scheme described in Figure 2.2, with a fixed window size of 260 and 100 sample windows. The prior densities for the Gibbs sampling are set with means equal to the true values and variances set as in Section 2.4.

The estimated parameters turn out to be quite close to their true values; see Tables 2.3.i to 2.4.ii in Appendix 2.12, which describe the posterior distribution of the parameters for both models under the 1st sample window, as well as the distribution of the posterior means across all N = 100 sample windows. As expected, given the nature of the simulated data, there is less variation in the distribution of the posterior means across sample windows than in the posterior density itself, given the 1st sample window. In other words, there is little change in the posterior means of the model parameters across sample windows.
The only parameter that seems systematically biased is the degrees-of-freedom parameter of the IWSV process, ν. Figure 2.6 plots the true value of ν = 30 against the posterior mean and 95% credibility region across the N = 100 sample windows. As can be seen, its estimate is lower than the true value. However, as expected, for larger simulated sample sizes the estimated value converges to the true value.
From Table 2.4.i, notice that the posterior means for the VAR parameters of the Clark model are very close to those of the IWSV. For the Clark volatility parameters, the true values are omitted since the data were generated using the IWSV.
Interestingly, under the IWSV process, the posterior mean estimates of the latent stochastic volatilities track the true sample paths worse for series associated with the smaller eigenvalues of Υ (see Figures 2.7.i and 2.7.ii below). It appears as though there exists a lower bound on the tracking of the posterior mean estimates across time for the 1st volatility series (see Figure 2.7.i). Multiple tests have revealed that the smaller the value chosen for the associated diagonal element of A_1, the more pronounced is this lower bound; conversely, the closer the diagonal element of A_1 is to 1, the better the posterior mean is able to account for variation in the true volatility sample path. More generally, we conjecture that the tracking improves as the eigenvalues of the stability matrix Υ = Σ_{k=1}^{K} Ξ_k approach 1. More investigation is needed, however, to establish definitively the theoretical properties of this phenomenon.
[Figure 2.6: IWSV, posterior of ν across N = 100 sample windows: true value, posterior mean, and 2.5%/97.5% credibility bounds]

[Figure 2.7.i: IWSV, filtered latent volatility for 1st series, 1st sample window: true value, posterior mean, and 2.5%/97.5% credibility bounds]

[Figure 2.7.ii: IWSV, filtered latent volatility for 4th series, 1st sample window: true value, posterior mean, and 2.5%/97.5% credibility bounds]
iii) Forecasts
In comparing the overall forecast performance, we appeal to the mean of the log predictive likelihoods,
MLPLh,N , term structure, across horizons h = 1, . . . , 10. Notice that the IWSV model fits the out-of-sample
CHAPTER 2. IMPROVING VAR FORECASTS THROUGH AR INVERSE WISHART SV 116
data much better than the Clark, especially at large horizons–see Figure 2.8. At large horizons, such as h = 10,
the MLPLh,N metric is much larger for the IWSV model, which suggests that this model fits the out-of-sample
data better as the horizon increases.
Of course, we can also consider the term structure of the model-specific log predictive likelihoods, LPL_{h,n}, across the individual sample windows. In this case we can interpret the log-ratios at each sample window, n, as predictive Bayes factors (recall equation (2.38) from Section 2.6.2.iii). Figure 2.9 plots these Bayes factors across sample windows, where model A ≡ IWSV and model B ≡ Clark. Larger values suggest that the IWSV is more representative of the out-of-sample outcomes than the Clark at each sample window n = 1, …, N.

Figure 2.9 also reveals variability in the forecasting performance according to the LPL_{h,n} metric. This variability, for each model, can be measured by considering the sample moments of the LPL_{h,n} metrics across sample windows, given in Table 2.1. Interestingly, while the IWSV fares better in terms of the sample mean of the LPL_{h,n} metrics across sample windows, the Clark model metrics have the advantage of being less variable, skewed, and leptokurtic. Figure 2.10 provides histograms of these distributions across sample windows, at the 10th horizon.
[Figure 2.8: Simulated data, forecast horizon term structure according to the MLPL_{h,N} metric, N = 100 sample windows, IWSV vs. Clark]

[Figure 2.9: Simulated data, sample window term structure according to the difference of the LPL_{h,n}'s metric, h = 10]
Table 2.1: Simulated data, sample moments of the LPL_{h,n} metrics across N = 100 sample windows

              | Horizon h = 1       | Horizon h = 10
              | IWSV      Clark     | IWSV      Clark
   mean       | -3.6816   -3.8461   | -3.7208   -4.2147
   stnd. dev. |  2.1310    1.9748   |  2.0290    1.9748
   skewness   | -1.005    -0.6926   | -0.9314   -0.5453
   kurtosis   |  3.834     3.1393   |  3.5944    3.1594
Figure 2.10: Histograms of the LPLh=10,n metrics across n = 1, . . . , N sample windows, 10th horizon
2.7.2 Real data
i) The Clark (2011) data set
Let us now turn to the real-world Clark (2011) macroeconomic data set. In this case we do not know the true
model and so we will attempt to choose between the two misspecified models, i.e. the Clark and IWSV models,
according to out-of-sample forecast performance.
We first consider plots of the Clark (2011) data series provided in Figures 2.11 and 2.12. Figure 2.11 provides a plot of the raw data series, along with exponentially smoothed trends, as described in Section 2.2 above. Figure 2.12 presents the data series after detrending by the associated smoothed trend. Recall that the trends applied are those from Clark (2011), used in order to replicate the results from that paper, and not necessarily because we believe these trends to provide the best fit.
ii) Estimation
We estimate both volatility models on an initial subsample of 130 data points, across 100 sample windows. Both rolling and recursive window schemes are estimated. Various orders of both the VAR and the IWSV specification were tested, and three lags were ultimately chosen as a balance between parameterization and improvement in model fit. The Gibbs sampling steps in Appendix 2.11 are performed for both models with a draw size of M = 100,000 and a burn-in of M_0 = 10,000, and the priors are set as described in Section 2.4.
Summary statistics of the posterior distributions of the parameters are given in Tables 2.5.i to 2.6.iii in Appendix 2.12. Tables 2.5.i and 2.6.i provide the posterior means and 95% credibility intervals for the IWSV and Clark model parameters, respectively, given the 1st sample window of a recursive window scheme. Tables 2.5.ii and 2.6.ii provide the sample means and 95% confidence intervals for the distribution of the posterior means of both the IWSV and Clark model parameters across the N = 100 sample windows, again given a recursive window scheme. Finally, Tables 2.5.iii and 2.6.iii are the analogs of the previously described tables, except that we instead employ a rolling sample window scheme.

These tables reveal a number of interesting features. First, irrespective of sample window size or scheme, the posterior means of many parameters deviate from our assumptions on the prior means, suggesting that the data are informative. For example, within the context of the IWSV model, the elements of the main diagonal of the C matrix deviate from the assumption of 0.3, although the off-diagonal elements tend to stay close to zero.
[Figure 2.11: Clark (2011) macroeconomic data set, series and smoothed trends: GDP growth, inflation rate, interest rate, and unemployment rate, 1948–2005]

[Figure 2.12: Clark (2011) macroeconomic data set, detrended series]
Interestingly, the degrees-of-freedom parameter ν exhibits posterior means in a range between approximately 15 and 19, higher than our prior assumption of 15. As for the Clark model, the distribution of the posterior means for the elements of the B matrix suggests that we can reject the prior mean assumption that these elements are zero. Within the context of both models, the 1st element, ψ_1, of Ψ, associated with GDP growth, exhibits posterior means slightly above the assumption of 3, and the 3rd element, ψ_3, associated with the interest rate, exhibits posterior means slightly below the assumption of 2.5. Moreover, ψ_2 and ψ_4 both exhibit the possibility of non-zero means, despite our assumptions.
Moreover, there is a surprising level of consistency in the evolution of the posterior means across sample windows and window types. For example, consider Figures 2.23.i to 2.24.ii in Appendix 2.12, which plot the posterior distributions of various model parameters from both the IWSV and Clark volatility specifications, across the sample windows for both the recursive and rolling window schemes. The posterior distributions involving the rolling sample windows are much more variable than those that employ the recursive sample window scheme.
We can also check the stability properties of both the VAR and the IWSV volatility process across the sample windows. Figures 2.13.i and 2.13.ii plot the absolute value of the largest eigenvalues from both the VAR(3) companion matrix, given as
\[
\begin{bmatrix} \Pi_1 & \Pi_2 & \Pi_3\\ I_4 & 0 & 0\\ 0 & I_4 & 0 \end{bmatrix}, \tag{2.46}
\]
and the Υ = Σ_{k=1}^{3} L(A_k ⊗ A_k)D matrix, which determines the IWSV stability (recall equation (2.44) and Proposition 2.3.1), across the N = 100 runs. We employ the posterior mean of the relevant parameters in constructing these matrices and their associated eigenvalues. The VAR(3) processes are generally stable. However, it appears that there may exist a unit root in the IWSV volatility process.
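The VAR stability check can be sketched as follows (a hypothetical helper, illustrated with the Monte-Carlo Π's from (2.39c), whose spectral radius is 0.8):

```python
import numpy as np

def companion(Pis):
    """Stack VAR(q) coefficient matrices Pi_1, ..., Pi_q (each p x p) into
    the (p*q) x (p*q) companion matrix of eq. (2.46)."""
    p, q = Pis[0].shape[0], len(Pis)
    top = np.hstack(Pis)
    below = np.hstack([np.eye(p * (q - 1)), np.zeros((p * (q - 1), p))])
    return np.vstack([top, below])

Pi1 = np.diag([0.25, 0.8, 0.8, 0.8])
F = companion([Pi1, np.zeros((4, 4)), np.zeros((4, 4))])
rho = np.abs(np.linalg.eigvals(F)).max()          # stable if rho < 1
```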
Finally, Figure 2.14 plots the posterior mean of the latent stochastic volatilities for both the IWSV and Clark models over the complete sample, with the augmented parameters filtered under the recursive sample window scheme at the N = 100th iteration. Moreover, Figures 2.15.i and 2.15.ii plot the associated stochastic correlations for the IWSV and Clark models, respectively. The Clark model exhibits stochastic correlations which are too smooth, and, as expected, both models exhibit a negative stochastic correlation, ρ_{41,t}, between shocks to GDP growth and the unemployment rate.
[Figure 2.13.i: Largest eigenvalue of the VAR(3) companion matrix, across N = 100 sample windows: IWSV and Clark volatility, recursive and rolling windows]

[Figure 2.13.ii: Largest eigenvalue of the Υ matrix, across N = 100 sample windows: recursive and rolling windows]
[Figure 2.14: IWSV and Clark, filtered latent stochastic volatilities (σ²_{ii,t}, IWSV vs. γ²_{ii,t}, Clark), 100th sample window, recursive window]
[Figure 2.15.i: Real data, IWSV model, filtered latent stochastic correlations for the complete sample, n = 100]
iii) Forecasts
We would now like to establish the forecast properties of the VAR associated with both the IWSV and Clark volatility models. We first compare the MSE_{h,N} metrics, which establish point forecast accuracy across the term structure of forecast horizons, h = 1, …, 20. Figure 2.16 provides plots of the percentage difference in the main diagonal elements of the MSE_{h,N} matrix, across h, for both the recursive and the rolling sample window schemes.
[Figure 2.15.ii: Real data, Clark model, filtered latent stochastic correlations for the complete sample, n = 100]

From these plots it is not clear that either model performs better in terms of point forecast performance. This is to be expected, of course, since our IWSV modification to the model is an alternative specification on the volatility of the process, not the mean.

[Figure 2.16: Real data, MSE comparison of VAR forecasts, % difference across horizons h = 1, …, 20, both window types (below 0, IWSV better): GDP growth, inflation rate, interest rate, unemployment rate]
Turning now to comparing overall forecast performance, we again appeal to the term structure of the mean of the log predictive likelihoods, MLPL_{h,N}, across horizons h = 1, …, 20. This term structure should illustrate improved out-of-sample predictive likelihood, given the alternative specification on the second moment of the VAR process shocks.

From Figure 2.17.i we see how the mean of these LPL_{h,n} measures improves as the forecast horizon extends out to H. This result holds irrespective of sample window scheme, although the recursive window, which grows in size across the iterations, tends to fare slightly better than the rolling window at further horizons. Interestingly, the Clark model does better according to this metric at the 1st horizon, across both sample window types. It is
[Figure 2.17.i: Real data, forecast horizon term structure according to the MLPL_{h,N} metric, N = 100 sample windows, both recursive and rolling sample windows]

[Figure 2.17.ii: Real data, forecast horizon term structure according to the MLPL_{h,N} metric, N = 100 sample windows, including homoskedastic v_t]
not clear why this is the case. Figure 2.17.ii duplicates the previous figure but includes, for reference, the case of homoskedastic VAR innovations, v_t.[19]
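The conjugate update used for the homoskedastic benchmark (footnote 19) can be sketched as follows, assuming SciPy's inverse Wishart and stand-in residuals in place of the actual VAR residuals:

```python
import numpy as np
from scipy.stats import invwishart

def iw_posterior(v, a0, V0):
    """Footnote 19: conjugate Inverse Wishart update for homoskedastic VAR
    residuals. Prior IW(a0, V0) -> posterior IW(a1, V1), where
    a1 = T + a0 and V1 = sum_t v_t v_t' + V0.

    v : (T, p) array of VAR residuals.
    """
    v = np.asarray(v, dtype=float)
    return v.shape[0] + a0, v.T @ v + V0

rng = np.random.default_rng(0)
v = rng.normal(size=(200, 4))                      # stand-in residuals
a1, V1 = iw_posterior(v, a0=15, V0=np.eye(4))
Sigma_draw = invwishart.rvs(df=a1, scale=V1, random_state=rng)  # one Gibbs step
```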
Again, we can also consider the term structure of the model specific log predictive likelihoods, LPLh,n,
across the individual sample windows. Figure 2.18 plots the difference in the LPLh=20,n metrics at horizon,
h = 20, across sample windows, where model A ≡ IWSV and model B ≡ Clark. Larger values suggest
that the IWSV is more representative of the out-of-sample outcomes than the Clark at each sample window
n = 1, . . . , N .
[Figure 2.18: Real data, sample window structure of the difference of the LPL_{h,n}'s metric, h = 20, recursive and rolling windows]
Moreover, these LPLh,n metrics change across the sample windows according to their own sampling distribution, given in Table 2.2. Again, we see results similar to the simulated data case above in Section
19 Estimating the VAR model with homoskedastic shocks vt is accomplished by replacing the Gibbs sampler steps for the volatility parameters with a single Gibbs step that employs an Inverse Wishart prior density. Since the Inverse Wishart prior is conditionally conjugate with the multivariate Normal, the conditional posterior is also Inverse Wishart. That is, if π(Σ) ∼ IW(a0, V0), then p(Σ | v) ∼ IW(a1, V1), where V1 = ∑_{t=1}^{T} vtvt′ + V0 and a1 = T + a0, and vt is the p × 1 vector of VAR residuals. V0 is set to the unconditional sample covariance matrix of simulated VAR residuals (generated with reasonable guesses on the VAR parameters) and a0 = 15.
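As a concrete sketch of the conjugate update in this footnote, the following draws Σ from the posterior IW(a1, V1), with V1 = ∑t vtvt′ + V0 and a1 = T + a0. The residual values are illustrative stand-ins (not the thesis data), and SciPy's `invwishart` is assumed available:

```python
import numpy as np
from scipy.stats import invwishart

def draw_sigma_homoskedastic(v, a0, V0, rng=None):
    """Single Gibbs step for a constant VAR innovation covariance:
    with prior Sigma ~ IW(a0, V0) and Normal residuals v_t, the
    conditional posterior is IW(a1, V1) with a1 = T + a0 and
    V1 = sum_t v_t v_t' + V0."""
    T, p = v.shape
    V1 = v.T @ v + V0          # sum of outer products plus prior scale
    a1 = T + a0
    return invwishart.rvs(df=a1, scale=V1, random_state=rng), a1, V1

# illustrative residuals: T = 200, p = 4, with a0 = 15 as in the footnote
rng = np.random.default_rng(0)
v = rng.standard_normal((200, 4))
Sigma, a1, V1 = draw_sigma_homoskedastic(v, a0=15, V0=np.eye(4), rng=rng)
```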
2.7.1.iii. For the 20th forecast horizon, h = H = 20, we find that the mean of the log predictive likelihoods, MLPLh,N=100, is larger under the IWSV, which initially suggests that the IWSV improves out-of-sample fit at the longer horizons. However, for the 1st horizon we see the opposite; that is, the Clark has a larger MLPLh=1,N=100 metric associated with it. As for the other sample moments, they generally suggest larger deviations in forecast performance across the sample windows when using the IWSV model. The larger kurtosis and negative skew imply that the IWSV is more sensitive to rare occurrences of poor forecasting fit; this sensitivity increases as the horizon extends out, and it is worse under the rolling sample window scheme. Interestingly, with the exception of the recursive sample window scheme at horizon h = 1, the Clark model now exhibits a larger standard deviation of the LPLh,n metrics than the IWSV. Finally, to get an idea of the shape of these distributions, Figure 2.19 provides histograms of the LPLh,n metrics across the sample windows.
Figure 2.20 provides the analog to Figure 2.18, in scatter plot format. That is, it presents the LPLh=20,n values, for each sample window n = 1, . . . , 100, at the largest horizon, h = 20. The shape of the scatter gives us an intuition on how each model performs across the sample windows, which complements the previous Figure 2.18, which presented the sample window LPLh=20,n differences in chronological order.
Finally, Figures 2.21.i to 2.21.iv plot the out-of-sample forecasts of the VAR series, yt, out to H = 20 periods, given the last recursive sample window n = 100 (this subsample includes nearly the entire data set). While the forecasted conditional means are quite similar between the IWSV and Clark models, as expected, the prediction intervals are distinctly shaped, reflecting the different underlying stochastic volatility processes. The Clark prediction intervals tend to "bell" out and expand as the horizon grows large, while the IWSV prediction intervals tend to stabilize. Interestingly, in referencing Figure 2.18 for this particular sample window, n = 100, we see that this represented a subsample where the Clark model performed better. Clearly, in this case the Clark 95% prediction intervals encompass the true data outcome more accurately than the IWSV prediction intervals.
Table 2.2: Real data, Sample moments of the LPLh,n metrics across N = 100 sample windows

Recursive sample window
                Horizon h = 1           Horizon h = 20
                IWSV      Clark         IWSV      Clark
mean           -3.6352   -3.5736       -3.7165   -4.3938
stnd. dev.      2.0322    1.9614        1.7725    2.0413
skewness       -0.9329   -0.2877       -1.3387   -0.4069
kurtosis        4.1266    2.401         6.5644    3.9734

Rolling sample window
                Horizon h = 1           Horizon h = 20
                IWSV      Clark         IWSV      Clark
mean           -3.5373   -3.4788       -3.8435   -4.501
stnd. dev.      2.0205    2.0533        2.1202    2.3618
skewness       -0.9310   -0.2463       -1.8423   -0.2015
kurtosis        4.1083    2.1766        8.0861    2.5614
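The moments reported in Table 2.2 can be reproduced from a vector of window-level LPLh,n values along the following lines. This is a sketch with simulated stand-in values, and it assumes the table's kurtosis is non-excess (i.e., a Gaussian would score near 3):

```python
import numpy as np
from scipy.stats import skew, kurtosis

def lpl_sample_moments(lpl):
    """Sample moments of the LPL_{h,n} metrics across the N sample
    windows, in the layout of Table 2.2 (kurtosis is non-excess)."""
    lpl = np.asarray(lpl, dtype=float)
    return {
        "mean": lpl.mean(),
        "stnd. dev.": lpl.std(ddof=1),
        "skewness": skew(lpl),
        "kurtosis": kurtosis(lpl, fisher=False),
    }

# illustrative draws standing in for the N = 100 window-level LPL values
rng = np.random.default_rng(1)
moments = lpl_sample_moments(rng.normal(-3.7, 2.0, size=100))
```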
Figure 2.19: Histograms of the LPLh=20,n metrics across n = 1, . . . , N sample windows, 20th horizon
Figure 2.20: Real data, Sample window structure of the LPLh,n's metric, h = 20. [Two scatter panels, recursive and rolling windows, plotting the Clark LPLh=20,n values against the IWSV LPLh=20,n values for each sample window.]
Figure 2.21.i: GDP growth series y1,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window. [Two panels, IWSV and Clark, each plotting GDP growth, the 2.5% and 97.5% prediction interval bounds, the true outcome, and the estimated trend over 2001-04 to 2010-10.]
Figure 2.21.ii: Inflation growth series y2,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window. [Two panels, IWSV and Clark, each plotting inflation growth, the 2.5% and 97.5% prediction interval bounds, the true outcome, and the estimated trend over 2001-04 to 2010-10.]
Figure 2.21.iii: Interest rate series y3,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window. [Two panels, IWSV and Clark, each plotting the interest rate, the 2.5% and 97.5% prediction interval bounds, the true outcome, and the estimated trend over 2001-04 to 2010-10.]
Figure 2.21.iv: Unemployment rate series y4,t and forecast, IWSV and Clark models for vt, n = 100, Recursive sample window. [Two panels, IWSV and Clark, each plotting the unemployment rate, the 2.5% and 97.5% prediction interval bounds, the true outcome, and the estimated trend over 2001-04 to 2010-10.]
2.8 Conclusion
Dramatic changes in macroeconomic time series volatility have posed a challenge to contemporary VAR forecasting models. Traditionally, the conditional volatility of these models was assumed either constant over time or subject to structural breaks across long time periods. More recent work, however, has improved forecasts by allowing the conditional volatility to be fully time-varying, specifying the VAR innovation variance as a distinct discrete time process. For example, Clark (2011) specified the elements of the covariance matrix process of the VAR innovations as linear functions of independent nonstationary processes.
However, it is not clear that the choice of nonstationary driving processes is suitable. Moreover, in order to
reduce parameterization, some form of fixed relationship is imposed between the elements of the VAR innovation
covariance matrix and the independent processes driving them.
Ultimately, we would like an empirical rationale for this choice of specification. Given this, we have proposed and tested both the Clark (2011) benchmark model and the alternative multivariate volatility process, IWSV, which is constructed in such a way as to directly model the time varying covariance matrices by means of the Inverse Wishart distribution. These models have been estimated, and forecasts have been constructed, on a data set as close to that of Clark (2011) as possible.
Motivating this study are also a number of theoretical advantages of the proposed IWSV specification. For
one, the direct specification of the dynamics of the latent stochastic volatility process, Σt, precludes the need to
specify convoluted relationships between the (co)volatility elements of Γt, γij,t, and the driving processes λi,t,
through the B matrix. Moreover, the autoregressive dynamics between volatility series are more easily interpreted
as volatility spill-over effects, since we no longer need to disentangle these relationships. The model is now also
invariant to permutation of the order of the observed series. Finally, it is easy to derive conditions ensuring the
existence of the unconditional mean of the processes (Σt) and (yt).
In applying both models to the data, we have chosen to evaluate their performance strictly in terms of forecasting ability, considering both point and interval forecasts. Point forecasts are evaluated by the mean squared error (MSE) of out-of-sample forecasts, while interval forecasts are evaluated by the Bayesian log predictive likelihood measure, LPLh,n, along a number of forecast horizons. Moreover, we have computed these metrics a number of times, across sample windows, which represent subsamples of the entire data set.
Estimating both models provides a number of interesting results. First, the posterior means of the parameters are much more stable across the sample windows under a recursive window that grows larger than under a rolling window of fixed size. Moreover, irrespective of sample window size or scheme, the posterior means of many parameters deviate from our assumptions on the prior means, suggesting that the data are informative, despite the small sample size.
Interestingly, the stationarity of the multivariate volatility process driving the VAR innovations may be ques-
tionable, as the IWSV model estimates seem to suggest the possibility of at least one unit root. We also find that
the filtered latent (co)volatilities of the Clark model are much too smooth and suggest too little variation across
time.
Forecasting performance is also mixed. For example, the MSE of the out-of-sample forecasts suggests that neither model exhibits a strong advantage. Turning to interval forecasts, we consider the distribution of the log predictive likelihood measures, LPLh,n, for both models. While the IWSV exhibits a strong advantage in the sense that the mean of these measures is substantially improved, it suffers from the fact that the measures are more negatively skewed and prone to rare occurrences of dramatically poor forecast fit, with this sensitivity becoming worse as the forecast horizon grows large. Finally, as expected given the dramatic nonstationarity of the Clark volatility process, its prediction intervals tend to grow exponentially.
Ultimately, we must emphasize that our methodology does not presume that one of the competing models is well specified; indeed, we insist on the opposite. Rather, given these results, we suggest an approach that might make joint use of both the Clark and IWSV specifications in practice, that is, using the Clark in some environments and the IWSV in others. Moreover, we could likely encompass both models in a better, though still misspecified, model. A natural idea is to introduce a model with endogenous switching regimes, where the Clark model is employed in situations where it performs better, while the IWSV is used otherwise. It is important to note that this encompassing model would not represent a mixture of the Clark and IWSV specifications with unknown mixing weights, regularly updated by Bayesian techniques.
2.9 References

BERNANKE, B., AND I. MIHOV (1998a): "The Liquidity Effect and Long-Run Neutrality," Carnegie-Rochester Conference Series on Public Policy, 49, 149-194.

—————- (1998b): "Measuring Monetary Policy," Quarterly Journal of Economics, 113, 869-902.

CHIB, S., Y. OMORI, AND M. ASAI (2009): "Multivariate Stochastic Volatility," in Handbook of Financial Time Series, ed. by T.G. Andersen, et al., Berlin: Springer-Verlag Publishing.

CLARK, T.E. (2011): "Real-Time Density Forecasts from Bayesian Vector Autoregressions with Stochastic Volatility," Journal of Business & Economic Statistics, 29, 3, 327-341.

CLARK, T.E., AND M.W. McCRACKEN (2001): "Tests of Equal Forecast Accuracy and Encompassing for Nested Models," Journal of Econometrics, 105, 85-110.

—————- (2008): "Forecasting with Small Macroeconomic VARs in the Presence of Instability," in Forecasting in the Presence of Structural Breaks and Model Uncertainty, ed. by D.E. Rapach and M.E. Wohar, Bingley, UK: Emerald Publishing.

—————- (2010): "Averaging Forecasts from VARs with Uncertain Instabilities," Journal of Applied Econometrics, 25, 5-29.

CLEMENTS, M.P., AND D.F. HENDRY (1998): Forecasting Economic Time Series, Cambridge, U.K.: Cambridge University Press.

COGLEY, T., AND T.J. SARGENT (2001): "Evolving Post-World War II U.S. Inflation Dynamics," NBER Macroeconomics Annual, 16, 331-373.

—————- (2005): "Drifts and Volatilities: Monetary Policies and Outcomes in the Post-World War II U.S.," Review of Economic Dynamics, 8, 262-302.

CROUSHORE, D. (2006): "Forecasting with Real-Time Macroeconomic Data," in Handbook of Economic Forecasting, ed. by G. Elliott, C. Granger, and A. Timmermann, Amsterdam: North-Holland Publishers.

DIEBOLD, F.X., AND K. YILMAZ (2008): "Measuring Financial Asset Return and Volatility Spillovers with Application to Global Equity Markets," Working Paper 08-16, Research Department, Federal Reserve Bank of Philadelphia.

ENGLE, R.F., AND K.F. KRONER (1995): "Multivariate Simultaneous Generalized ARCH," Economic Theory, 11, 122-150.

FOX, E.B., AND M. WEST (2011): "Autoregressive Models for Variance Matrices: Stationary Inverse Wishart Processes," arXiv:1107.5239v1.

GEWEKE, J., AND G. AMISANO (2010): "Comparing and Evaluating Bayesian Predictive Distributions of Asset Returns," International Journal of Forecasting, 26, 216-230.

GOURIEROUX, C., J. JASIAK, AND R. SUFANA (2009): "The Wishart Autoregressive Process of Multivariate Stochastic Volatility," Journal of Econometrics, 150, 167-181.

GOLOSNOY, V., B. GRIBISCH, AND R. LIESENFELD (2010): "The Conditional Autoregressive Wishart Model for Multivariate Stock Market Volatility," Working Paper, Christian-Albrechts-Universitat zu Kiel, 07.

GREENBERG, E. (2008): Introduction to Bayesian Econometrics, Cambridge, U.K.: Cambridge University Press.

HAMILTON, J.D. (1989): "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357-384.

JORE, A.S., J. MITCHELL, AND S.P. VAHEY (2010): "Combining Forecast Densities from VARs with Uncertain Instabilities," Journal of Applied Econometrics, 25, 621-634.

KIM, C.J., AND C.R. NELSON (1999): "Has the U.S. Economy Become More Stable? A Bayesian Approach Based on a Markov Switching Model of the Business Cycle," Review of Economics and Statistics, 81, 608-661.

KOZICKI, S., AND P.A. TINSLEY (2001a): "Shifting Endpoints in the Term Structure of Interest Rates," Journal of Monetary Economics, 47, 613-652.

—————- (2001b): "Term Structure Views of Monetary Policy Under Alternative Models of Agent Expectations," Journal of Economic Dynamics and Control, 25, 149-184.

LITTERMAN, R.B. (1986): "Forecasting with Bayesian Vector Autoregressions: Five Years of Experience," Journal of Business and Economic Statistics, 4, 25-38.

MAGNUS, J.R., AND H. NEUDECKER (1980): "The Elimination Matrix: Some Lemmas and Applications," SIAM Journal on Algebraic and Discrete Methods, 1, 422-449.

—————- (1988): Matrix Differential Calculus with Applications in Statistics and Econometrics, New York: John Wiley and Sons Publishers.

McCONNELL, M., AND G. PEREZ QUIROS (2000): "Output Fluctuations in the United States: What Has Changed Since the Early 1980s?," American Economic Review, 90, 1464-1476.

PHILIPOV, A., AND M.E. GLICKMAN (2006): "Multivariate Stochastic Volatility via Wishart Processes," Journal of Business and Economic Statistics, 24, 3, 313-328.

PRESS, S.J. (1982): Applied Multivariate Analysis, 2nd ed., New York: Dover Publications.

PRIMICERI, G. (2005): "Time Varying Structural Vector Autoregressions and Monetary Policy," Review of Economic Studies, 72, 821-852.

RINNERGSCHWENTNER, W., G. TAPPEINER, AND J. WALDE (2011): "Multivariate Stochastic Volatility via Wishart Processes – A Continuation," Working Papers in Economics and Statistics, University of Innsbruck, 19.

ROMER, C.D., AND D.H. ROMER (2000): "Federal Reserve Information and the Behaviour of Interest Rates," American Economic Review, 90, 429-457.

SARTORE, D., AND M. BILLIO (2005): "Stochastic Volatility Models: A Survey with Applications to Option Pricing and Value at Risk," in Applied Quantitative Methods for Trading and Investment, ed. by C.L. Dunis, J. Laws, and P. Naim, John Wiley and Sons Publishers.

SIMS, C.A. (2001): "Comment on Sargent and Cogley's 'Evolving Post-World War II U.S. Inflation Dynamics'," NBER Macroeconomics Annual, 16, 373-379.

—————- (2002): "The Role of Models and Probabilities in the Monetary Policy Process," Brookings Papers on Economic Activity, 2, 1-40.

STOCK, J.H. (2001): "Discussion of Cogley and Sargent 'Evolving Post-World War II U.S. Inflation Dynamics'," NBER Macroeconomics Annual, 16, 379-387.

VILLANI, M. (2009): "Steady-State Priors for Vector Autoregressions," Journal of Applied Econometrics, 24, 630-650.
2.10 Appendix: Real data, LDL′ factorization of Σt

Further evidence for the IWSV model can be drawn by considering the form of the constraint which the Clark model places on the time varying structure of the covariance matrix process of the VAR innovations, vt. Recall from Section 2.3.1 that the Clark model imposes the following parameterization on the VAR innovations:

v_t = B^{-1}Λ_t^{0.5}ε_t, where ε_t ∼ MVN_p(0, I_p). (2.47)

Of course, the constraint above implies that

Γ_t = B^{-1}Λ_t(B^{-1})′ = var(v_t). (2.48)

The interesting point to note is that this parameterization is equivalent to imposing an LDL′ factorization on the covariance matrix of the VAR innovations, where L is a lower triangular matrix with ones on the diagonal and D is a diagonal matrix. Note that this LDL′ factorization always exists for positive definite, real, symmetric matrices and is unique.

This result implies a method for testing whether Clark's parametric assumption on the volatility process is correct. Suppose that we estimate the VAR model under the IWSV specification, given the entire data set. At each point in time that we draw a covariance matrix Σ_t^{(m)} from the Gibbs sampler, we factorize this covariance matrix as Σ_t = L_tD_tL_t′. Iterating in this way provides us with a finite sample distribution of the L_t^{(m)} matrices implied by the IWSV specification for each time period t = 1, . . . , T. Since these L_t matrices are unique, their time varying distributions must suggest something about whether or not it is appropriate to assume that the elements of the B^{-1} matrix in the Clark specification are constant across time.

Of course, while there are other ways to check the validity of this assumption (e.g., estimate the Clark with time varying B matrices and compare results according to some metric), the aforementioned test proves the most immediately applicable.

Figures 2.22.i and 2.22.ii illustrate the results of this test, which suggest that there exists significant time variation in the elements of the L_t matrix factors across time, especially with regard to the elements corresponding to the pairing of GDP growth with both the inflation rate and the interest rate (i.e., L_{21,t} and L_{31,t}).
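The factorization step of this test can be sketched as follows, computing the unique LDL′ factors of a positive definite matrix from its Cholesky factor. The example Σ here is an arbitrary positive definite matrix standing in for a posterior draw Σ_t^{(m)}:

```python
import numpy as np

def ldl_factor(Sigma):
    """Unique LDL' factorization of a symmetric positive definite Sigma:
    L lower triangular with unit diagonal, D diagonal.  Obtained from the
    Cholesky factor C via L = C diag(C)^{-1} and D = diag(C)^2."""
    C = np.linalg.cholesky(Sigma)
    d = np.diag(C)
    L = C / d                 # divide column j by C[j, j], so diag(L) = 1
    D = np.diag(d ** 2)
    return L, D

# sketch of the proposed check: factor each draw Sigma_t^{(m)} and study
# the time-varying distribution of, e.g., L[1, 0] (the L21,t element)
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)   # illustrative positive definite draw
L, D = ldl_factor(Sigma)
```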
2.11 Appendix: Derivation of the posterior distributions needed for Gibbs sampling

The following appendix describes the steps required for generating the conditional posterior distributions of the parameters used in the Gibbs sampler. First, we consider the steps required for estimation of the VAR(J) model with Clark volatility; then we consider the steps required for estimating the IWSV parameters. Please refer to Section 2.5 for a summary of the steps outlined below.

The benchmark volatility specification, Clark, is estimated by a Gibbs sampler, with estimation steps based on those of Villani (2009) and Cogley and Sargent (2005). Clark (2011) provides expressions for the conditional posterior distributions of the VAR parameters of his Gibbs sampler, obtained from Mattias Villani, who himself derived them based on the constant variance sampler employed in Villani (2009). We have re-derived them here for completeness.
Figure 2.22.i: L21,t, time varying density. [Plot of L21,t with 2.5% and 97.5% C.I. bounds over time t = 1, . . . , 229.]

Figure 2.22.ii: L31,t, time varying density. [Plot of L31,t with 2.5% and 97.5% C.I. bounds over time t = 1, . . . , 229.]
2.11.1 Definition of the parameters and priors

What follows provides descriptions of the model parameters and their chosen prior distributions.

i) Parameters related to the VAR(J) process:

Πj, for j = 1, . . . , J, the autoregressive coefficient matrices of the VAR(J) specification. Each Πj is of dimension p × p. Combining all J coefficient matrices, Πj, into one larger matrix, Π, the conditionally conjugate prior is multivariate Normal, Π ∼ N(µΠ, ΞΠ).

Ψ, the matrix which, when multiplied by the deterministic trend vector dt, forms the unconditional mean vector of the VAR specification. Ψ is of dimension p × q. The conditionally conjugate prior is multivariate Normal, Ψ ∼ N(µΨ, ΞΨ).

ii) (Augmented) parameters specific to the Clark (2011) volatility specification:

B, the lower triangular matrix with ones along its main diagonal, which, when inverted and pre- and post-multiplied by Λt, forms the covariance matrix of the VAR shocks, vt, given as Γt below. B is of dimension p × p. The elements of the B matrix are assumed independent Normal, with details provided in the relevant section below.

Λt, the diagonal matrix which contains the nonstationary, independent driving processes λi,t, for i = 1, . . . , p, along its main diagonal. Λt is of dimension p × p. The prior for this augmented parameter is Log-normal, with details provided in the relevant section below.

Φ, the diagonal matrix which contains the variances, ϕi, of the shocks of the driving processes, ξi, for i = 1, . . . , p. Φ is of dimension p × p. We assume conditionally conjugate, independent Inverse-Gamma priors on each ϕi ∼ IG(γ/2, δ/2).

iii) (Augmented) parameters specific to the proposed IWSV volatility specification:

Ak, for k = 1, . . . , K, the autoregressive coefficient matrices of the IWSV volatility specification. Each Ak is of dimension p × p and is not necessarily symmetric. We assume multivariate Normal priors on each of the Ak matrices, for k = 1, . . . , K.

C, the lower triangular constant matrix in the IWSV volatility specification. C is of dimension p × p. We assume a multivariate Normal prior on the C matrix.

ν, the degrees of freedom parameter describing the shape of the Inverse Wishart distribution driving the volatility shocks. The degrees of freedom parameter is a scalar. The prior chosen is a Gamma distribution.

Σt, ∀t, the covariance matrices of the VAR process shocks under the IWSV model. The Σt matrices are all of dimension p × p. The prior is Inverse Wishart, with details provided in the relevant section below.
2.11.2 Computation of the posterior distribution for the Clark (2011) volatility model
Let us now describe the different conditional posterior distributions involved in the sequence for Gibbs sampling.
1. Draw from the posterior density of the slope coefficients Π′ = [Π1, Π2, . . . , ΠJ] of the VAR, conditional on Ψ, ΛT, B, Φ = diag(ϕ1, ϕ2, . . . , ϕp) and the data, given the multivariate Normal prior Π ∼ N(µΠ, ΞΠ).

For this step we rewrite the VAR as:

Y_t = Π′X_t + v_t, (2.49a)
where Y_t = y_t − Ψd_t, (2.49b)
v_t = B^{-1}Λ_t^{0.5}ε_t, (2.49c)
and X_t = [(y_{t-1} − Ψd_{t-1})′, (y_{t-2} − Ψd_{t-2})′, . . . , (y_{t-J} − Ψd_{t-J})′]′. (2.49d)

This is a linear model with respect to the parameter elements of the matrix Π. To clearly illustrate this linear model we can use the alternative expression [Magnus and Neudecker (1988)]:

Y_t = vec(Π′X_t) + v_t = (I_p ⊗ X_t′) · vec(Π) + v_t, (2.50a)

where ⊗ denotes the Kronecker product. Eliminating the heteroskedasticity by pre-multiplication we have

Y*_t = Γ_t^{-0.5}Y_t = Γ_t^{-0.5}(I_p ⊗ X_t′) · vec(Π) + ε_t, (2.51)

where ε_t ∼ N(0, I_p) and Γ_t^{-0.5} = Λ_t^{-0.5}B. Or equivalently, with clear notation:

Y*_t = X*_t vec(Π) + ε_t, (2.52)

where X*_t = Γ_t^{-0.5}(I_p ⊗ X_t′).
Thus we have the following Lemma [see Tsay (2005), Section 12.3.2, or Box and Tiao (1973)].

Lemma 2.11.1. Let us consider the Gaussian regression model

Y*_t = X*_t vec(Π) + ε_t, (2.53)

with prior

vec(Π) ∼ N(µΠ, ΞΠ). (2.54)

Then the posterior distribution is such that vec(Π) ∼ N(µ*Π, Ξ*Π), where

Ξ*Π = [ΞΠ^{-1} + X*′X*]^{-1} (2.55a)
and µ*Π = Ξ*Π[ΞΠ^{-1}µΠ + X*′Y*], (2.55b)

given X*′X* = ∑_{t=1}^{T} X*_t′X*_t and X*′Y* = ∑_{t=1}^{T} X*_t′Y*_t.
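Lemma 2.11.1 is the standard Normal-Normal update given the sufficient summaries X*′X* and X*′Y*. A minimal sketch, on illustrative simulated data (the regression and prior values here are hypothetical, not the thesis VAR):

```python
import numpy as np

def gaussian_posterior(XtX, XtY, mu0, Xi0):
    """Posterior moments of Lemma 2.11.1: vec(Pi) ~ N(mu*, Xi*) with
    Xi* = [Xi0^{-1} + X*'X*]^{-1} and mu* = Xi*[Xi0^{-1} mu0 + X*'Y*]."""
    Xi0_inv = np.linalg.inv(Xi0)
    Xi_star = np.linalg.inv(Xi0_inv + XtX)
    mu_star = Xi_star @ (Xi0_inv @ mu0 + XtY)
    return mu_star, Xi_star

# usage on a standardized regression Y*_t = X*_t beta + eps_t
rng = np.random.default_rng(3)
beta = np.array([0.5, -0.2, 0.8])
X = rng.standard_normal((500, 3))
y = X @ beta + rng.standard_normal(500)
mu_star, Xi_star = gaussian_posterior(X.T @ X, X.T @ y,
                                      mu0=np.zeros(3), Xi0=10.0 * np.eye(3))
```

With 500 observations the posterior mean lands close to the true coefficients, since the diffuse prior contributes little relative to X*′X*.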
The sufficient summaries of the past, appearing in this posterior distribution, are given as

X*′X* = ∑_{t=1}^{T} [Γ_t^{-0.5}(I_p ⊗ X_t′)]′ Γ_t^{-0.5}(I_p ⊗ X_t′) (2.56a)
      = ∑_{t=1}^{T} (I_p ⊗ X_t′)′ Γ_t^{-1} (I_p ⊗ X_t′) (2.56b)

and X*′Y* = ∑_{t=1}^{T} [Γ_t^{-0.5}(I_p ⊗ X_t′)]′ (Γ_t^{-0.5}Y_t) (2.56c)
      = ∑_{t=1}^{T} (I_p ⊗ X_t′)′ Γ_t^{-1} Y_t. (2.56d)
2. Draw from the posterior density of the coefficients Ψ defining the trend, conditional on Π, ΛT, B, Φ = diag(ϕ1, ϕ2, . . . , ϕp) and the data, given a multivariate Normal prior, Ψ ∼ N(µΨ, ΞΨ).

The equation defining Y_t can be rewritten as:

Y_t = Π(L)y_t = Π(L)Ψd_t + v_t. (2.57)

Let us check that this is still a linear model. We have

Y_t = (I_p − ∑_{j=1}^{J} Π_j L^j) Ψd_t + v_t (2.58a)
    = I_pΨd_t − Π_1Ψd_{t-1} − · · · − Π_JΨd_{t-J} + v_t (2.58b)
    = (d_t′ ⊗ I_p) · vec(Ψ) − (d_{t-1}′ ⊗ Π_1) · vec(Ψ) − · · · − (d_{t-J}′ ⊗ Π_J) · vec(Ψ) + v_t (2.58c)
    = ((d_t′ ⊗ I_p) − (d_{t-1}′ ⊗ Π_1) − · · · − (d_{t-J}′ ⊗ Π_J)) · vec(Ψ) + v_t (2.58d)
    = X_t · vec(Ψ) + v_t, (2.58e)

where X_t = ((d_t′ ⊗ I_p) − (d_{t-1}′ ⊗ Π_1) − · · · − (d_{t-J}′ ⊗ Π_J)).

Thus we can standardize by pre-multiplication and get

Y*_t = Γ_t^{-0.5}Y_t = Γ_t^{-0.5}(Π(L)y_t) = Γ_t^{-0.5}X_t · vec(Ψ) + ε_t (2.59a)
     ≡ X*_t vec(Ψ) + ε_t, say. (2.59b)

We can reapply Lemma 2.11.1, with this new set of explanatory variables, to get the posterior mean and variance for vec(Ψ) ∼ N(µ*Ψ, Ξ*Ψ) as:

Ξ*Ψ = [ΞΨ^{-1} + X*′X*]^{-1} (2.60a)
and µ*Ψ = Ξ*Ψ[ΞΨ^{-1}µΨ + X*′Y*]. (2.60b)

The sufficient summaries of the past are now given as

X*′X* = ∑_{t=1}^{T} (Γ_t^{-0.5}X_t)′ Γ_t^{-0.5}X_t = ∑_{t=1}^{T} X_t′Γ_t^{-1}X_t (2.61a)
and X*′Y* = ∑_{t=1}^{T} (Γ_t^{-0.5}X_t)′ Γ_t^{-0.5}Y_t = ∑_{t=1}^{T} X_t′Γ_t^{-1}Y_t. (2.61b)
3. Draw from the posterior density of the elements of B (lower triangular with ones on the diagonal) conditional on Π, Ψ, ΛT, Φ = diag(ϕ1, ϕ2, . . . , ϕp) and the data, given Normal, independent priors on each of the elements of the B matrix.

The system defining Y_t can now be rewritten as

BΠ(L)(y_t − Ψd_t) = BY_t = Λ_t^{0.5}ε_t. (2.62)

Since B is lower triangular, this system of equations reduces to the following:

Y_{1,t} = λ_{1,t}^{0.5}ε_{1,t} (2.63a)
Y_{2,t} = −b_{21}Y_{1,t} + λ_{2,t}^{0.5}ε_{2,t} (2.63b)
Y_{3,t} = −b_{31}Y_{1,t} − b_{32}Y_{2,t} + λ_{3,t}^{0.5}ε_{3,t} (2.63c)
Y_{4,t} = −b_{41}Y_{1,t} − b_{42}Y_{2,t} − b_{43}Y_{3,t} + λ_{4,t}^{0.5}ε_{4,t} (2.63d)
...
Y_{p,t} = −b_{p1}Y_{1,t} − b_{p2}Y_{2,t} − b_{p3}Y_{3,t} − . . . − b_{p,(p-1)}Y_{(p-1),t} + λ_{p,t}^{0.5}ε_{p,t} (2.63e)

where Y_{i,t} is the i-th element of the p × 1 column vector Π(L)(y_t − Ψd_t) = Y_t.

We can treat each of the i = 2, . . . , p equations above as linear regressions. Again, pre-multiplication of each of the i equations by λ_{i,t}^{-0.5}, ∀t, removes the heteroskedasticity. Furthermore, given the assumption of independent Normal prior densities, the conditional posterior for each row vector of B is also Normal: N(β*_i, G*_i), ∀i = 2, . . . , p, where

G*_i = [G_i^{-1} + X*_i′X*_i]^{-1}, (2.64a)
and β*_i = G*_i[G_i^{-1}β_i + X*_i′Y*_i], (2.64b)

and

Y*_i = [λ_{i,1}^{-0.5}Y_{i,1}, . . . , λ_{i,T}^{-0.5}Y_{i,T}]′ (2.65a)

and X*_i = [ −λ_{i,1}^{-0.5}Y_{1,1}   −λ_{i,1}^{-0.5}Y_{2,1}   . . .   −λ_{i,1}^{-0.5}Y_{i-1,1}
             . . .                    . . .                   . . .   . . .
             −λ_{i,T}^{-0.5}Y_{1,T}   −λ_{i,T}^{-0.5}Y_{2,T}   . . .   −λ_{i,T}^{-0.5}Y_{i-1,T} ]. (2.65b)
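The construction of the regressand (2.65a) and regressor matrix (2.65b) for one row of B can be sketched as follows. The arrays `Y` and `lam` are hypothetical stand-ins for the Y_{i,t} and λ_{i,t} values, and rows of B are 0-indexed here:

```python
import numpy as np

def b_row_regression(Y, lam, i):
    """Build the standardized regressand Y*_i (2.65a) and regressor
    matrix X*_i (2.65b) for the i-th equation of (2.63), given T x p
    arrays Y and lam (row t holds Y_{.,t} and lambda_{.,t})."""
    w = lam[:, i] ** -0.5                 # lambda_{i,t}^{-0.5}, t = 1..T
    Ystar = w * Y[:, i]                   # (2.65a)
    Xstar = -(w[:, None] * Y[:, :i])      # (2.65b): columns Y_{1,t}..Y_{i-1,t}
    return Ystar, Xstar

# the posterior for row i of B then follows from the same Normal update
# as in (2.64a-b); illustrative values below
rng = np.random.default_rng(4)
Y = rng.standard_normal((6, 3))
lam = rng.uniform(0.5, 2.0, size=(6, 3))
Ystar, Xstar = b_row_regression(Y, lam, i=2)
```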
4. Draw from the posterior density of the elements of the time varying covariance matrix Λ_t for each time t = 1, . . . , T in sequence, each conditional on Π, Ψ, B, Φ = diag(ϕ1, ϕ2, . . . , ϕp) and the data.

Since the stochastic volatilities are independent of each other for all i = 1, . . . , p, we can estimate each corresponding equation separately. In order to do so, we need an expression for the posterior density of each augmented parameter λ_{i,t} conditional on everything else, including the entire macroeconomic series values for all t = 1, . . . , T.

Since each volatility is Markov of order one, we can write, for each i = 1, . . . , p,

g(λ_{i,t} | λ_{i,\t}, ϕ_i, Y*_i) ∝ f(Y*_i | λ_i) g(λ_{i,t} | λ_{i,\t}, ϕ_i) ∝ f(y*_{i,t} | λ_{i,t}) g(λ_{i,t} | λ_{i,\t}, ϕ_i)
= f(y*_{i,t} | λ_{i,t}) g(λ_{i,t} | λ_{i,t-1}, ϕ_i) g(λ_{i,t+1} | λ_{i,t}, ϕ_i) = f(y*_{i,t} | λ_{i,t}) g(λ_{i,t} | λ_{i,t-1}, λ_{i,t+1}, ϕ_i), (2.66)

where λ_{i,\t} denotes all elements of the λ_i vector except for the t-th element, Y*_i = {y*_{i,1}, . . . , y*_{i,T}}, and y*_{i,t} is the i-th element of BΠ(L)(y_t − Ψd_t). Furthermore, since

λ_{i,t} | λ_{i,t-1} ∼ LN(e^{ln(λ_{i,t-1}) + ϕ_i/2}, (e^{ϕ_i} − 1)e^{2ln(λ_{i,t-1}) + ϕ_i}), (2.67)

we have

f(y*_{i,t} | λ_{i,t}) g(λ_{i,t} | λ_{i,t-1}, λ_{i,t+1}, ϕ_i) ∝ λ_{i,t}^{-0.5} exp(−(y*_{i,t})² / (2λ_{i,t})) · λ_{i,t}^{-1} exp(−(ln(λ_{i,t}) − µ_{i,t})² / (2σ²)), (2.68)

where we can solve for missing values according to Section 12.6.1 of Tsay (2005) and find that

µ_{i,t} = (1/2)(ln(λ_{i,t+1}) + ln(λ_{i,t-1})), (2.69a)
and σ² = (1/2)ϕ_i. (2.69b)

Therefore, in implementing a Metropolis-within-Gibbs step we can draw a proposal λ_{i,t}^{(m)} ∼ LN(e^{µ_{i,t} + σ²/2}, (e^{σ²} − 1)e^{2µ_{i,t} + σ²}), and accept it as the m-th draw with probability

α(λ_{i,t}^{(m-1)}, λ_{i,t}^{(m)}) = min{1, f(y*_{i,t} | λ_{i,t}^{(m)}) / f(y*_{i,t} | λ_{i,t}^{(m-1)})}, (2.70)

since the proposal densities cancel out in the ratio.
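A sketch of this Metropolis-within-Gibbs update for a single λ_{i,t}, using the equivalent log-scale form of the lognormal proposal (drawing log λ ∼ N(µ_t, σ²) is the same distribution as the LN proposal above). All numeric values are illustrative:

```python
import numpy as np

def mh_lambda_step(lam_old, ystar, mu_t, sig2, rng):
    """One update of lambda_{i,t} as in (2.70): propose from the
    conditional prior (log lambda ~ N(mu_t, sig2)), so the proposal
    cancels and only the ratio of measurement densities
    f(y* | lambda) = N(0, lambda) remains."""
    lam_new = float(np.exp(rng.normal(mu_t, np.sqrt(sig2))))
    def log_f(lam):                    # log of the N(0, lam) density of y*
        return -0.5 * np.log(lam) - ystar ** 2 / (2.0 * lam)
    accept = np.log(rng.uniform()) < log_f(lam_new) - log_f(lam_old)
    return (lam_new, True) if accept else (lam_old, False)

# illustrative chain for one (i, t) pair
rng = np.random.default_rng(5)
lam, n_acc = 1.0, 0
for _ in range(200):
    lam, acc = mh_lambda_step(lam, ystar=0.7, mu_t=0.0, sig2=0.5, rng=rng)
    n_acc += acc
```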
5. Draw from the posterior density of the diagonal elements of Φ conditional on Π, Ψ, B, ΛT and the data.

The Inverse Gamma prior is conjugate for the variance parameter of the Normal density. Therefore, the conditional posterior of ϕ_i is also Inverse Gamma:

f(ϕ_i | λ_i) ∝ h(λ_i | ϕ_i) p(ϕ_i) ∝ ∏_{t=1}^{T} ϕ_i^{-0.5} exp(−(ln(λ_{i,t}) − ln(λ_{i,t-1}))² / (2ϕ_i)) × ϕ_i^{-(γ/2+1)} e^{-δ/(2ϕ_i)}. (2.71)

Furthermore, the right hand side above is equal to

ϕ_i^{-(γ/2+1)-T/2} exp(−δ/(2ϕ_i) − (1/(2ϕ_i)) ∑_{t=1}^{T} [ln(λ_{i,t}/λ_{i,t-1})]²) = ϕ_i^{-((γ+T)/2+1)} exp(−[δ + ∑_{t=1}^{T} [ln(λ_{i,t}/λ_{i,t-1})]²] / (2ϕ_i)). (2.72)

Consequently, assuming identical Inverse Gamma priors on each ϕ_i ∼ IG(γ/2, δ/2), the conditional posterior is also Inverse Gamma, IG(γ*/2, δ*/2), where

γ* = γ + T, (2.73a)
and δ* = δ + ∑_{t=1}^{T} [ln(λ_{i,t}/λ_{i,t-1})]². (2.73b)
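The draw from IG(γ*/2, δ*/2) can be sketched via the reciprocal-Gamma representation. The random-walk log-volatilities below are illustrative, generated with a hypothetical true ϕ_i = 0.5:

```python
import numpy as np

def draw_phi(log_lam, gamma0, delta0, rng):
    """Conjugate draw phi_i ~ IG(gamma*/2, delta*/2) as in (2.73a-b):
    gamma* = gamma + T, delta* = delta + sum_t [ln(lam_t/lam_{t-1})]^2.
    An IG(a, b) variate is 1 / Gamma(shape=a, scale=1/b)."""
    d = np.diff(log_lam)                  # ln(lambda_t) - ln(lambda_{t-1})
    gamma_star = gamma0 + d.size
    delta_star = delta0 + np.sum(d ** 2)
    phi = 1.0 / rng.gamma(gamma_star / 2.0, 2.0 / delta_star)
    return phi, gamma_star, delta_star

# illustrative random walk in log-volatility, T = 2000 increments
rng = np.random.default_rng(6)
log_lam = np.cumsum(rng.normal(0.0, np.sqrt(0.5), size=2001))
phi, gamma_star, delta_star = draw_phi(log_lam, gamma0=2.0, delta0=1.0, rng=rng)
```

With this much data the posterior concentrates near the true innovation variance, here around 0.5.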
2.11.3 Computation of the posterior distribution for the IWSV model

Let us now describe the sequence of conditional posterior distributions for the IWSV model.

1. First, we repeat step (1) above in Section 2.11.2, except that we replace Γ_t = B^{-1}Λ_t(B^{-1})′ = var(v_t) with Σ_t. That is, we no longer condition on B, ΛT, and Φ, but rather on A_k, for k = 1, . . . , K, C, ν, and ΣT.

2. Repeat step (2) above in Section 2.11.2, except this time replace Γ_t = B^{-1}Λ_t(B^{-1})′ = var(v_t) with Σ_t.

3. Draw from the posterior density of the parameters A_k, ∀k, C, and ν jointly, conditional on Π, Ψ, ΣT, and the data.

All of the individual elements of the parameter matrices A_k, ∀k, C and the scalar ν are drawn jointly by a Metropolis-within-Gibbs step employing a random walk proposal. The joint proposal is multivariate Normal, and we assume multivariate Normal priors on both A_k, ∀k and C, and a Gamma prior on (ν − p). See Section 2.4 on priors for more details.

The random walk multivariate Normal proposal is symmetric and conditioned on the last value in the process through its mean vector; therefore it drops out of the acceptance ratio. The variance of the proposal is initially set to the inverse of the observed negative Hessian matrix at the mode of the conditional posterior for a first attempt, and then a second attempt is employed using the covariance matrix of the initial Markov process draws themselves for improved mixing.
Moreover, the likelihood of the IWSV model is now given as
f(v \mid \theta) = L(\theta) = \prod_{t=1}^{T} f(v_t \mid \Sigma_t)\, g(\Sigma_t \mid \Sigma_{t-1}, \ldots, \Sigma_{t-K}; \theta)

= \prod_{t=1}^{T} \frac{1}{(2\pi)^{p/2} |\Sigma_t|^{1/2}} \exp\left( -\frac{1}{2} v_t' \Sigma_t^{-1} v_t \right) \times 2^{-\frac{\nu p}{2}}\, |S_{t-1}|^{\frac{\nu}{2}}\, \Gamma_p\left(\frac{\nu}{2}\right)^{-1} |\Sigma_t|^{-(\nu+p+1)/2} \exp\left( -\frac{1}{2} \mathrm{tr}\left[ S_{t-1} \Sigma_t^{-1} \right] \right), \qquad (2.74)
where v_t = Π(L)(y_t − Ψ d_t) is a function of the data, y. Therefore, by Bayes' Theorem we can consider the conditional posterior of θ as proportional to the likelihood (which is really a function of the data) times the prior density for θ (where θ = A_1, ..., A_K, C, ν) as follows
p(θ | y^T, Π, Ψ, Σ^T) ∝ L(θ)π(θ) = f(y^T, Σ^T | θ; Π, Ψ)π(θ) ∝ f(v^T | θ)π(θ). \qquad (2.75)
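As an illustration of evaluating the log of the likelihood (2.74), the following sketch assumes the scale matrices S_{t−1} have already been built from the recursion in (2.77c); `iwsv_loglik` is a hypothetical name, and scipy's Normal and Inverse Wishart densities stand in for f and g:

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def iwsv_loglik(v, Sigma, S, nu):
    """Log-likelihood of eq. (2.74): the Normal measurement density times the
    Inverse Wishart transition density, summed over t in logs.

    v     -- (T, p) array of VAR innovations v_t
    Sigma -- (T, p, p) array of volatility matrices Sigma_t
    S     -- (T, p, p) array of scales, S[t] holding S_{t-1} for Sigma_t
    nu    -- Inverse Wishart degrees of freedom
    """
    T, p = v.shape
    ll = 0.0
    for t in range(T):
        ll += multivariate_normal.logpdf(v[t], mean=np.zeros(p), cov=Sigma[t])
        ll += invwishart.logpdf(Sigma[t], df=nu, scale=S[t])
    return ll
```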
Therefore, the Metropolis acceptance probability of the m-th draw, θ^{(m)}, in the random walk sampler can be expressed as

\alpha\left( \theta^{(m-1)}, \theta^{(m)} \right) = \min\left\{ 1,\; \frac{ p(\theta^{(m)} \mid y^T, \Pi, \Psi, \Sigma^T) }{ p(\theta^{(m-1)} \mid y^T, \Pi, \Psi, \Sigma^T) } \right\}. \qquad (2.76)
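The random walk step itself can be sketched as follows (a hypothetical helper, not the thesis code; `log_post` evaluates the log of the conditional posterior in (2.75) up to a constant, and `chol_V` is the Cholesky factor of whichever proposal covariance is currently in use):

```python
import numpy as np

def rw_metropolis_step(theta, log_post, chol_V, rng):
    """One random-walk Metropolis update of theta = (A_1, ..., A_K, C, nu),
    stacked as a vector, accepted with probability as in eq. (2.76)."""
    proposal = theta + chol_V @ rng.standard_normal(theta.size)
    # The proposal is symmetric, so only the posterior ratio remains
    log_alpha = min(0.0, log_post(proposal) - log_post(theta))
    if np.log(rng.uniform()) < log_alpha:
        return proposal, True
    return theta, False
```

Iterating this step with a well-scaled `chol_V` produces a Markov chain whose draws average to the posterior mean of θ.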
4. Similarly to step (4) above, we now draw from the posterior density of Σ_t conditional on Σ_{\t}, A_k, ∀k, C, ν, Π, Ψ, and the data, in sequence for each time t = 1, ..., T. We have:
P\left( \Sigma_t \mid \Sigma_{\setminus t}, v \right) \propto P(v_t \mid \Sigma_t)\, P(\Sigma_t \mid \Sigma_{t-1})\, P(\Sigma_{t+1} \mid \Sigma_t) \qquad (2.77a)

\propto |\Sigma_t|^{-\frac{1}{2}}\, |S_t|^{\frac{\nu}{2}}\, |\Sigma_t|^{-(\nu+p+1)/2} \exp\left( -\frac{1}{2} \mathrm{tr}\left[ \left( S_{t-1} + v_t v_t' \right) \Sigma_t^{-1} \right] \right) \exp\left( -\frac{1}{2} \mathrm{tr}\left[ S_t \Sigma_{t+1}^{-1} \right] \right), \qquad (2.77b)

where

\frac{S_{t-1}}{\nu - p - 1} = CC' + \sum_{k=1}^{K} A_k \Sigma_{t-k}^{-1} A_k'. \qquad (2.77c)
Therefore, by letting the proposal be Inverse Wishart, Σ_t ∼ IW_p(ν, S*_{t-1}) where S*_{t-1} = S_{t-1} + v_t v_t', the proposal drops out of the Metropolis-Hastings ratio. Indeed, the probability of accepting the m-th draw of Σ_t^{(m)}, sequentially, for each time period t = 1, ..., T, is now^20
\alpha\left( \Sigma_t^{(m-1)}, \Sigma_t^{(m)} \right) = \min\left\{ 1,\; \frac{ \left| \Sigma_t^{(m)} \right|^{-\frac{1}{2}} \left| S_t^{(m)} \right|^{\frac{\nu}{2}} \exp\left( -\frac{1}{2} \mathrm{tr}\left[ S_t^{(m)} \Sigma_{t+1}^{-1} \right] \right) }{ \left| \Sigma_t^{(m-1)} \right|^{-\frac{1}{2}} \left| S_t^{(m-1)} \right|^{\frac{\nu}{2}} \exp\left( -\frac{1}{2} \mathrm{tr}\left[ S_t^{(m-1)} \Sigma_{t+1}^{-1} \right] \right) } \right\}. \qquad (2.78)
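The log-space computation described in footnote 20 can be sketched as follows for the ratio in (2.78) (`log_accept_ratio` is a hypothetical helper; `numpy.linalg.slogdet` supplies the log determinants so the un-logged densities are never formed):

```python
import numpy as np

def log_accept_ratio(Sig_new, S_new, Sig_old, S_old, Sig_next_inv, nu):
    """Log acceptance probability for eq. (2.78): the logs of the numerator
    and denominator are computed and differenced, so the un-logged function
    values never over- or under-flow (footnote 20)."""
    def log_term(Sig, S):
        _, logdet_Sig = np.linalg.slogdet(Sig)
        _, logdet_S = np.linalg.slogdet(S)
        return (-0.5 * logdet_Sig + 0.5 * nu * logdet_S
                - 0.5 * np.trace(S @ Sig_next_inv))
    return min(0.0, log_term(Sig_new, S_new) - log_term(Sig_old, S_old))
```

The draw Σ_t^{(m)} is then accepted when log u is below this value, with u ∼ U(0, 1).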
2.12 Appendix: Tables and Figures
^20 To avoid numerical problems, logs are taken of both the numerator and denominator, then differenced, before finally taking their exponential. This avoids issues when the non-logged function values grow either too large or too small to be machine comparable.
CHAPTER 2. IMPROVING VAR FORECASTS THROUGH AR INVERSE WISHART SV
Table 2.3.i: Section 2.7.1: Posterior distribution of the parameters for IWSV model, Simulated data, 1st sample, Rolling window
[Table columns: Parameter, Pop. value, 2.5% C.I., mean, 97.5% C.I., reported for vech(C), vec(A1), vec(A2), vec(A3), vec(Π1), vec(Π2), vec(Π3), Ψ, and ν. Final row: ν, pop. value 30, 2.5% C.I. 15.9899, mean 19.9123, 97.5% C.I. 26.1828.]
Table 2.3.ii: Section 2.7.1: Distribution of the posterior mean of the parameters for IWSV model, Simulated data, across N = 100 samples, Rolling window
[Table columns: Parameter, Pop. value, 2.5% C.I., mean, 97.5% C.I., reported for vech(C), vec(A1), vec(A2), vec(A3), vec(Π1), vec(Π2), vec(Π3), Ψ, and ν. Final row: ν, pop. value 30, 2.5% C.I. 18.2853, mean 19.8090, 97.5% C.I. 21.3300.]
Table 2.4.i: Section 2.7.1: Posterior distribution of the parameters for Clark model, Simulated data, 1st sample, Rolling window
[Table columns: Parameter, Pop. value, 2.5% C.I., mean, 97.5% C.I., reported for b21, b31, b32, b41, b42, b43, ϕ1-ϕ4, vec(Π1), vec(Π2), vec(Π3), and Ψ.]
Table 2.4.ii: Section 2.7.1: Distribution of the posterior mean of the parameters for Clark model, Simulated data, across N = 100 samples, Rolling window
[Table columns: Parameter, Pop. value, 2.5% C.I., mean, 97.5% C.I., reported for b21, b31, b32, b41, b42, b43, ϕ1-ϕ4, vec(Π1), vec(Π2), vec(Π3), and Ψ.]
Table 2.5.i: Section 2.7.2: Posterior distribution of the parameters for IWSV model, Real data, 1st sample, Recursive window
[Table columns: Parameter, Prior mean, posterior 2.5% C.I., mean, 97.5% C.I., reported for vech(C), vec(A1), vec(A2), vec(A3), vec(Π1), vec(Π2), vec(Π3), Ψ, and ν. Final row: ν, prior mean 15, 2.5% C.I. 13.0737, mean 15.9185, 97.5% C.I. 19.2311.]
Table 2.5.ii: Section 2.7.2: Distribution of the posterior mean of the parameters for IWSV model, Real data, across N = 100 samples, Recursive window
[Table columns: Parameter, Prior mean, 2.5% C.I., mean, 97.5% C.I. of the distribution of posterior means, reported for vech(C), vec(A1), vec(A2), vec(A3), vec(Π1), vec(Π2), vec(Π3), Ψ, and ν. Final row: ν, prior mean 15, 2.5% C.I. 14.7539, mean 15.7235, 97.5% C.I. 16.3972.]
Table 2.5.iii: Section 2.7.2: Distribution of the posterior mean of the parameters for IWSV model, Real data, across N = 100 samples, Rolling window
[Table columns: Parameter, Prior mean, 2.5% C.I., mean, 97.5% C.I. of the distribution of posterior means, reported for vech(C), vec(A1), vec(A2), vec(A3), vec(Π1), vec(Π2), vec(Π3), Ψ, and ν. Final row: ν, prior mean 15, 2.5% C.I. 14.8556, mean 17.0125, 97.5% C.I. 19.2372.]
Table 2.6.i: Section 2.7.2: Posterior distribution of the parameters for Clark model, Real data, 1st sample, Recursive window
[Table columns: Parameter, Prior mean, posterior 2.5% C.I., mean, 97.5% C.I., reported for b21, b31, b32, b41, b42, b43, ϕ1-ϕ4 (prior "doesn't exist"), vec(Π1), vec(Π2), vec(Π3), and Ψ.]
Table 2.6.ii: Section 2.7.2: Distribution of the posterior mean of the parameters for Clark model, Real data, across N = 100 samples, Recursive window
[Table columns: Parameter, Prior mean, 2.5% C.I., mean, 97.5% C.I. of the distribution of posterior means, reported for b21, b31, b32, b41, b42, b43, ϕ1-ϕ4 (prior "doesn't exist"), vec(Π1), vec(Π2), vec(Π3), and Ψ.]
Table 2.6.iii: Section 2.7.2: Distribution of the posterior mean of the parameters for Clark model, Real data, across N = 100 samples, Rolling window
[Table columns: Parameter, Prior mean, 2.5% C.I., mean, 97.5% C.I. of the distribution of posterior means, reported for b21, b31, b32, b41, b42, b43, ϕ1-ϕ4 (prior "doesn't exist"), vec(Π1), vec(Π2), vec(Π3), and Ψ.]
Figure 2.23.i: IWSV, Posterior of parameters across N = 100 recursive sample windows
[Figure panels: c11, c22, c33, c44, ψ1, ψ3, and ν, each plotted against sample window n = 10, ..., 100; lines show prior mean, posterior mean, and 2.5%/97.5% C.I.]
Figure 2.23.ii: IWSV, Posterior of parameters across N = 100 rolling sample windows
[Figure panels: c11, c22, c33, c44, ψ1, ψ3, and ν, each plotted against sample window n = 10, ..., 100; lines show prior mean, posterior mean, and 2.5%/97.5% C.I.]
Figure 2.24.i: Clark, Posterior of parameters across N = 100 recursive sample windows
[Figure panels: b21, b31, b32, b41, b42, b43, ψ1, and ψ3, each plotted against sample window n = 10, ..., 100.]
Figure 2.24.ii: Clark, Posterior of parameters across N = 100 rolling sample windows
[Figure panels: b21, b31, b32, b41, b42, b43, ψ1, and ψ3, each plotted against sample window n = 10, ..., 100.]