Multivariate Discount Weighted Regression and Local
Level Models
Kostas Triantafyllopoulos
University of Newcastle, Newcastle upon Tyne, UK.
Abstract
In this paper we propose a multivariate discount weighted regression technique
to give a tractable solution to the problem of variance estimation and forecasting
for the multivariate local level model. We give the correspondence between discount
regression and matrix normal dynamic linear models and we show that the local level
model can be treated with discount regression techniques. We illustrate the proposed
methodology with London metal exchange data consisting of aluminium spot and fu-
ture contract closing prices. The proposed estimate of the noise covariance matrix
suggests these data exhibit high cross-correlation, which is discussed in some detail.
The performance of the weighted regression model is evaluated with a simple outlier
analysis. A sensitivity analysis shows that a low discount factor should be used and
practical guidelines are given for general future use.
Keywords: Time series, dynamic models, Kalman filter, state space models, regres-
sion, Bayesian forecasting.
1 Introduction
Multivariate time series have, recently, received significant development from both theoret-
ical and practical standpoints. Whittle (1963), Hannan (1970) and Lutkepohl (1993) exam-
ine the widely used stationary VARMA models for observation vectors, whilst Lutkepohl
(1993, ch. 13), West and Harrison (1997, ch. 16) and Durbin and Koopman (2001, ch. 3)
give detailed consideration to state space or dynamic linear models. These models provide
a sound statistical framework with inclusion of practical modelling mechanisms such as
feed-forward and feed-back intervention. However, in most multivariate models the noise
covariance matrices have to be assumed known, if tractability is desired. Local level models
constitute the simplest class of dynamic models with significant application. Enns et al.
(1982), Machak et al. (1983), Harvey (1986, 1989, ch. 8) and Durbin and Koopman (2001,
p. 44) discuss multivariate local level models, although the problem of the specification of
the noise covariance matrices does not receive much attention. Sequential Markov chain
Monte Carlo techniques (Doucet et al., 2001) as well as EM algorithm techniques (Durbin
and Koopman, 2001) are available in multivariate time series, but their efficiency needs to
be studied when the dimension of the vector of the observations is large and when online
forecasting is on demand. The only tractable multivariate dynamic model, which assumes
a prior inverted Wishart distribution for the observation noise covariance matrix, is the
so called matrix dynamic linear model, reported in West and Harrison (1997, p. 597) and
applied in Salvador et al. (2003) and Salvador and Gargallo (2004). Triantafyllopoulos and
Pikoulas (2002) proposed a new multivariate dynamic linear model, which is termed here
as multivariate discount weighted regression. The novelty of this model is that, although
it does not assume a prior Wishart distribution for the unknown noise covariance matrix,
it produces neat recurrence equations for estimation and forecasting.
In this paper, using discount weighted regression, we provide an efficient methodology
for estimation and forecasting for the multivariate local level model. The observation
covariance matrix of the noise vector is assumed unknown and it is estimated from the
data using a standard maximum likelihood procedure. The correspondence of the discount
regression and the matrix dynamic linear model is established and some properties of the
regression model are discussed. As a practical illustration of the proposed methodology we
consider data consisting of aluminium spot and future contract prices. It is our belief that
these data are interesting for two reasons: they have strong similarities with international
exchange rates data and they are not often discussed in the literature.
This paper is organized as follows. Section 2.1 gives the general description of dis-
count regression as developed in Triantafyllopoulos and Pikoulas (2002). In section 2.2 the
correspondence of discount regression and matrix normal dynamic models is established,
while sections 2.3 and 2.4 show that multivariate local level models with unknown noise
covariance matrices can be analyzed with discount regression techniques. The proposed
local level model is illustrated by analyzing London metal exchange data in section 3.
Conclusions follow in section 4 and the proofs are given in the appendix (section 5).
2 Multivariate discount weighted regression
2.1 The model
Let yt be a p-dimensional vector of observations, which become available at roughly
equal intervals of time. The multivariate discount weighted regression model (DWR) is
defined by
y′t = F ′tΘt + ν ′t, νt|Σ ∼ Np×1(0, Σ), (observation equation) (2.1a)
Θt = Θt−1 + Ωt, Ωt|Σ ∼ Nd×p(0,Wt, Σ), (transition equation) (2.1b)
where Ft is a d-dimensional design vector, Θt is a d × p state matrix, Σ is the p × p
covariance matrix of the innovations νt, and Wt is a d× d covariance matrix. Conditional
on Σ, the innovations νt and Ωt follow, respectively, multivariate and matrix-variate normal
distributions, the latter of which is typically defined as
Ωt|Σ ∼ Nd×p(0,Wt, Σ) ⇔ vec(Ωt)|Σ ∼ Ndp×1(0, Σ⊗Wt),
where vec(·) denotes the column stacking operator of a matrix,
⊗ denotes the Kronecker product of two matrices and Np×1(·) denotes the p-dimensional
normal distribution. For more details on matrix-variate distributions the reader is referred
to Gupta and Nagar (1999). For a positive integer T > 0, let $y^t = (y_1, \ldots, y_t)$ denote the
information set comprising the available data up to time t (t = 1, . . . , T ) and assume that
the innovation series (νt)t=1,...,T and (Ωt)t=1,...,T are internally and mutually uncorrelated
and also they are uncorrelated with the assumed prior
Θ0|Σ ∼ Nd×p(m0, C0, Σ), (2.2)
for some known m0 and C0. The design vector Ft is assumed known for all t and the
covariance matrix Wt is specified with a discount factor δ as Wt = (1 − δ)Ct−1/δ, where
Ct−1 is a known covariance matrix at time t− 1 (see below). If we define
$$X_t = \sum_{j=0}^{t-1} \delta^j F_{t-j} F_{t-j}' \quad\text{and}\quad H_t = \sum_{j=0}^{t-1} \delta^j F_{t-j}\, y_{t-j}',$$
then Triantafyllopoulos and Pikoulas (2002) show that the unbiased least squares estimate,
which is the same as the maximum likelihood estimate, of Θt based on the data $y^t$ is
$$m_t = X_t^{-1} H_t = C_t H_t = m_{t-1} + A_t e_t', \tag{2.3}$$
where $A_t = C_t F_t$ is the adaptive vector and $e_t = y_t - m_{t-1}' F_t$ is the one-step forecast error.
The estimate mt is optimal in the sense that it minimizes the discounted sum of squares
$$S_\delta(\Theta) = \sum_{j=0}^{t-1} \delta^j \left(y_{t-j}' - F_{t-j}'\Theta\right)\left(y_{t-j}' - F_{t-j}'\Theta\right)',$$
for all d× p state matrices Θ.
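The batch form of the estimate can be sketched in a few lines. The following is a minimal illustration, assuming d = 2 (so that a closed-form 2 × 2 inverse suffices) and using made-up toy data; the function names `discounted_ls` and `discounted_sse` are hypothetical and not part of the original paper. It builds Xt and Ht from their definitions, solves for mt, and evaluates the discounted sum of squares with the same weights δ^j, j = 0, ..., t − 1.

```python
from typing import List

Matrix = List[List[float]]


def discounted_ls(F: Matrix, Y: Matrix, delta: float) -> Matrix:
    """Batch estimate m_t = X_t^{-1} H_t for d = 2 design vectors.

    F[s] is the design vector and Y[s] the p-vector observed at time s + 1.
    """
    t, p = len(F), len(Y[0])
    X = [[0.0, 0.0], [0.0, 0.0]]
    H = [[0.0] * p for _ in range(2)]
    for j in range(t):  # j = 0 gives weight 1 to the most recent observation
        w = delta ** j
        f, y = F[t - 1 - j], Y[t - 1 - j]
        for a in range(2):
            for b in range(2):
                X[a][b] += w * f[a] * f[b]
            for c in range(p):
                H[a][c] += w * f[a] * y[c]
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    X_inv = [[X[1][1] / det, -X[0][1] / det],
             [-X[1][0] / det, X[0][0] / det]]
    return [[sum(X_inv[a][k] * H[k][c] for k in range(2)) for c in range(p)]
            for a in range(2)]


def discounted_sse(theta: Matrix, F: Matrix, Y: Matrix, delta: float) -> float:
    """Discounted sum of squares with weights delta^j, j = 0, ..., t - 1."""
    t = len(F)
    total = 0.0
    for j in range(t):
        f, y = F[t - 1 - j], Y[t - 1 - j]
        for c in range(len(y)):
            resid = y[c] - sum(theta[a][c] * f[a] for a in range(2))
            total += (delta ** j) * resid ** 2
    return total


# Toy data (hypothetical): intercept-plus-trend design, p = 2 responses.
F = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [[1.0, 2.0], [2.1, 1.9], [2.9, 2.2], [4.2, 1.8]]
m = discounted_ls(F, Y, delta=0.7)
print(m, discounted_sse(m, F, Y, 0.7))
```

Perturbing any entry of `m` and re-evaluating `discounted_sse` gives a larger value, which is the optimality property stated above.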
With $r_t = y_t - m_t' F_t$, the maximum likelihood estimate of the covariance matrix Σ is
$$n_t S_t = n_{t-1} S_{t-1} + r_t e_t' \quad\text{and}\quad n_t = n_{t-1} + 1, \tag{2.4}$$
which, for uninformative prior degrees of freedom n0 ≈ 0 and a bounded S0, reduces to
$$S_t = \frac{1}{t} \sum_{j=1}^{t} r_j e_j'.$$
The covariance matrix Ct is updated by
$$C_t = \frac{1}{\delta}\left(I_d - \frac{C_{t-1} F_t F_t'}{\delta + F_t' C_{t-1} F_t}\right) C_{t-1}, \tag{2.5}$$
where Id is the d × d identity matrix. Conditional on Σ = St and $y^t$, one can easily derive
the posterior distribution of Θt and the one-step ahead predictive distribution of yt+1, i.e.
$$\Theta_t \,|\, \Sigma = S_t, y^t \sim N_{d\times p}(m_t, C_t, S_t) \quad\text{and}\quad y_{t+1} \,|\, \Sigma = S_t, y^t \sim N_{p\times 1}\{m_t' F_t,\ (F_t' C_{t-1} F_t/\delta + 1) S_t\}.$$
Full details and derivation of these results appear in Triantafyllopoulos and Pikoulas (2002).
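One recursive step of equations (2.3), (2.4) and (2.5) can be sketched as follows. This is an illustrative transcription, not the authors' code: the function name `dwr_step` and the toy inputs are assumptions, matrices are plain nested lists, and Ct−1 is assumed symmetric so that F′C equals (CF)′.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dwr_step(m: Matrix, C: Matrix, S: Matrix, n: float,
             F: Vector, y: Vector, delta: float
             ) -> Tuple[Matrix, Matrix, Matrix, float]:
    """One update of (m_t, C_t, S_t, n_t) from (m_{t-1}, C_{t-1}, S_{t-1}, n_{t-1})."""
    d, p = len(F), len(y)
    CF = [sum(C[a][b] * F[b] for b in range(d)) for a in range(d)]  # C_{t-1} F_t
    q = delta + sum(F[a] * CF[a] for a in range(d))                 # delta + F' C F
    # (2.5): C_t = (C_{t-1} - C F F' C / q) / delta, using symmetry of C_{t-1}
    C_new = [[(C[a][b] - CF[a] * CF[b] / q) / delta for b in range(d)]
             for a in range(d)]
    A = [sum(C_new[a][b] * F[b] for b in range(d)) for a in range(d)]  # A_t = C_t F_t
    # One-step forecast error e_t = y_t - m_{t-1}' F_t
    e = [y[c] - sum(m[a][c] * F[a] for a in range(d)) for c in range(p)]
    # (2.3): m_t = m_{t-1} + A_t e_t'
    m_new = [[m[a][c] + A[a] * e[c] for c in range(p)] for a in range(d)]
    # Residual r_t = y_t - m_t' F_t
    r = [y[c] - sum(m_new[a][c] * F[a] for a in range(d)) for c in range(p)]
    # (2.4): n_t S_t = n_{t-1} S_{t-1} + r_t e_t', n_t = n_{t-1} + 1
    n_new = n + 1.0
    S_new = [[(n * S[i][j] + r[i] * e[j]) / n_new for j in range(p)]
             for i in range(p)]
    return m_new, C_new, S_new, n_new


# Toy inputs (hypothetical): d = 2 states, p = 2 responses.
m0 = [[0.0, 0.0], [0.0, 0.0]]
C0 = [[1.0, 0.0], [0.0, 1.0]]
S0 = [[1.0, 0.0], [0.0, 1.0]]
n0 = 1.0
F = [1.0, 0.5]
y = [2.0, -1.0]
m1, C1, S1, n1 = dwr_step(m0, C0, S0, n0, F, y, delta=0.8)
print(m1, C1, S1, n1)
```

Only matrix-vector and outer products are needed, so no matrix inversion is performed at any step; this is what makes the recursion cheap enough for online use.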
2.2 Relationship with matrix normal dynamic models
Model (2.1a) and (2.1b) corresponds to the matrix normal dynamic linear models (DLMs)
as described in West and Harrison (1997, p. 597). The matrix normal DLMs were developed
independently in Harvey (1986) and Quintana and West (1987) and they have been further
explored in Salvador et al. (2003) and Salvador and Gargallo (2004). The matrix DLM is
defined by
y′t = F ′tΘt + ν ′t, νt|Σ ∼ Np×1(0, Σ), (observation equation) (2.6a)
Θt = GtΘt−1 + Ωt, Ωt|Σ ∼ Nd×p(0,Wt, Σ), (transition equation) (2.6b)
where Gt is a d× d transition matrix and the remaining model components are as defined
in the model (2.1a) and (2.1b). In addition to the prior (2.2),
a prior inverted Wishart distribution is assumed for the covariance matrix Σ, namely $\Sigma \sim IW_p(n_0 + 2p,\ n_0 S_0)$, with density
$$p(\Sigma) = c(n_0)\, n_0^{(n_0+p-1)/2} \det(S_0)^{(n_0+p-1)/2} \det(\Sigma)^{-(p+n_0/2)} \exp\{-n_0\, \mathrm{trace}(S_0 \Sigma^{-1})/2\} \tag{2.7}$$
and
$$c(n_0) = \left\{ 2^{p(n_0+p-1)/2}\, \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\left(\frac{n_0+p-j}{2}\right) \right\}^{-1},$$
where n0 are the prior degrees of freedom and S0 is the prior variance estimate. Inference
and forecasting for model (2.6a) and (2.6b) then proceed with an algorithm similar to the Kalman
filter. This algorithm is fully Bayesian and it is summarized as
follows. For any time t > 0, conditional on Σ, the posterior distribution of Θt is $\Theta_t \,|\, \Sigma, y^t \sim N_{d\times p}(m_t, C_t, \Sigma)$ and the one-step predictive distribution is $y_{t+1} \,|\, \Sigma, y^t \sim N_{p\times 1}\{m_t' F_t,\ (F_t' R_t F_t + 1)\Sigma\}$, where
$$m_t = G_t m_{t-1} + A_t e_t' \quad\text{and}\quad C_t = R_t - (F_t' R_t F_t + 1) A_t A_t', \tag{2.8}$$
for $A_t = R_t F_t/(F_t' R_t F_t + 1)$, $R_t = G_t C_{t-1} G_t' + W_t$, and m0, C0 are taken from the prior
(2.2). The distribution of Σ given $y^t$ is the inverted Wishart $\Sigma \,|\, y^t \sim IW_p(n_t + 2p,\ n_t S_t)$,
where
$$n_t S_t = n_{t-1} S_{t-1} + \frac{e_t e_t'}{F_t' R_t F_t + 1} \quad\text{and}\quad n_t = n_{t-1} + 1. \tag{2.9}$$
Unconditionally of Σ, the posterior distribution of Θt and the predictive distribution of
yt+1 are, respectively, matrix-variate and multivariate Student t distributions.
Looking now at models (2.1a), (2.1b) and (2.6a), (2.6b), we see that if we set Gt = Id
and Wt = (1 − δ)Ct−1/δ, the two models coincide. We can see that the estimates mt, Ct
and St produced by each model are essentially the same. First note that the estimates mt
in equations (2.3) and (2.8) are the same. From equation (2.8) with Gt = Id, so that $R_t = C_{t-1} + W_t = C_{t-1}/\delta$, we have
$$C_t = \frac{C_{t-1}}{\delta} - \frac{C_{t-1} F_t F_t' C_{t-1}}{\delta^2 (F_t' C_{t-1} F_t/\delta + 1)} = \frac{1}{\delta}\left(I_d - \frac{C_{t-1} F_t F_t'}{\delta + F_t' C_{t-1} F_t}\right) C_{t-1},$$
which is equation (2.5). From the definition of $r_t = y_t - m_t' F_t$ and from the recursion of
mt, we have
$$r_t = y_t - m_{t-1}' F_t - e_t A_t' F_t = (1 - A_t' F_t)\, e_t.$$
Also
$$1 - A_t' F_t = 1 - \frac{F_t' C_{t-1} F_t}{\delta (F_t' C_{t-1} F_t/\delta + 1)} = \frac{1}{F_t' C_{t-1} F_t/\delta + 1}.$$
Then from equation (2.9)
$$n_t S_t = n_{t-1} S_{t-1} + \frac{e_t e_t'}{F_t' C_{t-1} F_t/\delta + 1} = n_{t-1} S_{t-1} + (1 - A_t' F_t)\, e_t e_t' = n_{t-1} S_{t-1} + r_t e_t',$$
which is equation (2.4).
The above show that the DWR is equivalent to the matrix normal DLM with identity
transition matrix. However, we need to note that the DWR does not make any assumption
for the distribution of Σ, while the DLM assumes an inverted Wishart prior.
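The algebraic equivalence argued in this section can also be checked numerically. The sketch below is illustrative only: the function names and the 2 × 2 test matrix are assumptions. It computes Ct once through the DLM update (2.8) with Gt = Id and Wt = (1 − δ)Ct−1/δ, and once through the DWR update (2.5), and also evaluates 1 − A′tFt for comparison with 1/(F′tCt−1Ft/δ + 1).

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dlm_update(C_prev: Matrix, F: Vector, delta: float
               ) -> Tuple[Matrix, Vector, float]:
    """DLM update (2.8) with G_t = I_d and W_t = (1 - delta) C_{t-1} / delta."""
    d = len(F)
    # R_t = C_{t-1} + W_t = C_{t-1} / delta
    R = [[C_prev[a][b] / delta for b in range(d)] for a in range(d)]
    RF = [sum(R[a][b] * F[b] for b in range(d)) for a in range(d)]
    Q = sum(F[a] * RF[a] for a in range(d)) + 1.0  # F' R F + 1
    A = [RF[a] / Q for a in range(d)]              # A_t = R_t F_t / Q
    C = [[R[a][b] - Q * A[a] * A[b] for b in range(d)] for a in range(d)]
    return C, A, Q


def dwr_update(C_prev: Matrix, F: Vector, delta: float) -> Matrix:
    """DWR update (2.5); C_prev is assumed symmetric."""
    d = len(F)
    CF = [sum(C_prev[a][b] * F[b] for b in range(d)) for a in range(d)]
    q = delta + sum(F[a] * CF[a] for a in range(d))
    return [[(C_prev[a][b] - CF[a] * CF[b] / q) / delta for b in range(d)]
            for a in range(d)]


# Hypothetical symmetric positive definite C_{t-1} and design vector.
C_prev = [[2.0, 0.3], [0.3, 1.0]]
F = [1.0, -0.5]
delta = 0.7

C_dlm, A, Q = dlm_update(C_prev, F, delta)
C_dwr = dwr_update(C_prev, F, delta)
shrink = 1.0 - sum(A[a] * F[a] for a in range(2))  # 1 - A_t' F_t

print(C_dlm)
print(C_dwr)
print(shrink, 1.0 / Q)
```

The two printed matrices agree up to rounding, and `shrink` matches `1.0 / Q`, which is the factor that turns e_t e_t′ in (2.9) into r_t e_t′ in (2.4).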
2.3 The invertibility assumption of Xt
The DWR development is based on the assumption of the non-singularity of the matrix
Xt. However, in some practical situations Xt is singular, e.g. when Ft = F is constant over
time. It is evident that we can make use of the generalized Moore-Penrose inverse matrix
to overcome this difficulty. In particular, let $X_t^+$ denote the Moore-Penrose inverse matrix
of Xt, which always exists and is unique (Harville, 2000, ch. 20). The next result states
that, under a rank condition, we can still have the same recurrences of mt and Ct as in section
2.1.
Theorem 1. Consider the DWR, defined by equations (2.1a) and (2.1b). With Xt and Ht as
defined in section 2.1, let $X_t^+$ denote the Moore-Penrose inverse of Xt and write $m_t = X_t^+ H_t$, $A_t = X_t^+ F_t$ and $C_t = X_t^+$. If the
following condition holds
$$\mathrm{rank}([X_{t-1}^+ : X_t^+]) = \mathrm{rank}(X_{t-1}) = \mathrm{rank}(X_t), \tag{2.10}$$
where $[X_{t-1}^+ : X_t^+]$ denotes the extended matrix of $X_{t-1}^+$ and $X_t^+$, then the recursive forms
of mt and Ct are given as in equations (2.3) and (2.5).
The proof of the recurrence of St does not make use of the matrix Xt and so the
updating of equation (2.4) holds, regardless of the singularity of Xt.
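The singular case mentioned above, a constant design vector, can be illustrated with a small sketch. It assumes F = [1 0]′ with d = 2, builds Xt from its definition in section 2.1, and uses the fact that the Moore-Penrose inverse of a diagonal matrix is obtained by inverting its non-zero diagonal entries. The helper names are hypothetical, and `pinv_diagonal` is only valid for diagonal input.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def x_matrix(F: Vector, t: int, delta: float) -> Matrix:
    """X_t = sum_{j=0}^{t-1} delta^j F F' for a constant design vector F."""
    d = len(F)
    X = [[0.0] * d for _ in range(d)]
    for j in range(t):
        for a in range(d):
            for b in range(d):
                X[a][b] += (delta ** j) * F[a] * F[b]
    return X


def pinv_diagonal(X: Matrix, tol: float = 1e-12) -> Matrix:
    """Moore-Penrose inverse of a DIAGONAL matrix (not valid for general X)."""
    d = len(X)
    return [[(1.0 / X[a][a]) if (a == b and abs(X[a][a]) > tol) else 0.0
             for b in range(d)] for a in range(d)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


F = [1.0, 0.0]
delta, t = 0.7, 5
X = x_matrix(F, t, delta)        # singular: second row and column are zero
X_plus = pinv_diagonal(X)
check = matmul(matmul(X, X_plus), X)  # Penrose condition X X^+ X = X

print(X)
print(X_plus)
print(check)
```

Here X has rank one, so the ordinary inverse does not exist, yet `X_plus` is well defined and `check` reproduces `X`.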
2.4 Local level models
The multivariate local level model is defined by
yt = ψt + νt, νt|Σ ∼ Np×1(0, Σ), (observation equation) (2.11a)
ψt = ψt−1 + ωt, ωt|Vt ∼ Np×1(0, Vt), (transition equation) (2.11b)
where yt is a p-dimensional vector of observations, ψt is a p-vector of states, Σ and Vt are
the p × p noise covariance matrices. Typically, the covariance matrices will be unknown
and we seek to obtain a fast algorithm which will allow estimation of these covariance
matrices as well as it will allow efficient forecasting of yt, for t = 1, . . . , T , for a positive
integer T > 0. When p = 1 and Vt = V is constant over time, the analysis of the local level
model follows by considering the signal noise ratio q = V/Σ (West and Harrison, 1997,
ch. 2; Franco and Souza, 2002). Enns et al. (1982) and Machak et al. (1983) suggest
using the signal noise ratio for p > 1 and these authors provide some details on the scalar
function q(·) with V = qΣ. Harvey (1986) provides an estimator for Σ, based on maximum
likelihood estimation of q, but that method cannot be used for online estimation, since each
time a new estimator for Σ is required a new estimate of q is needed, and clearly this cannot
be done for every consecutive t > 0. In addition, the log likelihood function used
to estimate q, and hence Σ, is a complicated non-linear function of q, which depends on
some reference priors, the choice of which has not been discussed. We believe that these
estimation problems have limited the widespread use of the above estimation method.
In this section we assume that the transition covariance matrix Vt depends on time t
and so we define the time-varying signal noise ratio as $q_t I_p = V_t \Sigma^{-1}$, where it is assumed
that the variance Σ is strictly positive definite. Then in the transition equation (2.11b) we
can write ωt|Σ ∼ Np×1(0, qtΣ). This may be seen as a restriction of the model, although
we should note that every conjugate state space model (univariate or multivariate), which
is able to model the observation variance or covariance matrix uses this setting for Vt.
West and Harrison (1997, p. 109) state that no practical problem arises, if the transition
covariance matrix is scaled by the observation covariance matrix. The next result shows
that local level models can be seen as special cases of DWR.
Theorem 2. Consider the DWR of equations (2.1a) and (2.1b) and suppose that Ft =
F = [1 0 · · · 0]′, for all t ≥ 1. Then the DWR reduces to the local level model of equations
(2.11a) and (2.11b) with Vt = qtΣ, where qt > 0 is an appropriate scalar and Σ the p × p
covariance matrix of the observation innovations νt. In addition, let the prior distribution
of ψ0 be
$$\psi_0 \,|\, \Sigma = S_0^* \sim N_{p\times 1}(m_0^*,\ C_0^* S_0^*),$$
for some known $m_0^*$, $C_0^*$ and $S_0^*$. Then, conditional on $\Sigma = S_t^*$, the posterior distribution
of ψt is
$$\psi_t \,|\, \Sigma = S_t^*, y^t \sim N_{p\times 1}(m_t^*,\ C_t^* S_t^*),$$
with
$$m_t^* = m_{t-1}^* + A_t^* e_t^*, \quad A_t^* = \frac{C_{t-1}^*}{\delta + C_{t-1}^*}, \quad C_t^* = \frac{1}{\delta + C_{t-1}^*}, \quad n_t S_t^* = n_{t-1} S_{t-1}^* + r_t^* e_t^{*\prime},$$
where $e_t^* = y_t - m_{t-1}^*$, $r_t^* = y_t - m_t^*$, $n_t = n_{t-1} + 1$ and 0 < δ ≤ 1.
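As an illustration, the recursions can be transcribed directly as they are stated in Theorem 2. The sketch below is not the authors' implementation: the function name `local_level_filter`, the toy observations and the prior values are assumptions chosen for readability. It stores the one-step forecast before each update, since the forecast of yt made at time t − 1 is the previous level estimate.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def local_level_filter(ys: List[Vector], m0: Vector, C0: float, S0: Matrix,
                       n0: float, delta: float
                       ) -> Tuple[Vector, float, Matrix, float, List[Vector]]:
    """Run the recursions as stated in Theorem 2 over the observations ys."""
    p = len(m0)
    m, C, n = list(m0), C0, n0
    S = [row[:] for row in S0]
    forecasts: List[Vector] = []
    for y in ys:
        forecasts.append(list(m))                 # one-step forecast m*_{t-1}
        e = [y[i] - m[i] for i in range(p)]       # e*_t = y_t - m*_{t-1}
        A = C / (delta + C)                       # A*_t uses C*_{t-1}
        m = [m[i] + A * e[i] for i in range(p)]   # m*_t
        r = [y[i] - m[i] for i in range(p)]       # r*_t = y_t - m*_t
        S = [[(n * S[i][j] + r[i] * e[j]) / (n + 1.0) for j in range(p)]
             for i in range(p)]                   # n_t S*_t update
        n += 1.0
        C = 1.0 / (delta + C)                     # C*_t as stated in Theorem 2
    return m, C, S, n, forecasts


# Toy bivariate series and priors (hypothetical values).
ys = [[1.0, 2.0], [1.5, 1.5], [0.5, 2.5]]
m0 = [0.0, 0.0]
S0 = [[1.0, 0.0], [0.0, 1.0]]
m, C, S, n, forecasts = local_level_filter(ys, m0, C0=1.0, S0=S0, n0=1.0, delta=0.5)
print(m, C, n)
print(forecasts)
```

Each step costs O(p²) operations for the update of S and involves no matrix inversion, which is what makes the scheme suitable for online use.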
For any positive integer k, the k-step predictive distribution follows from Triantafyllopoulos
and Pikoulas (2002) and from Theorem 2 as
$$y_{t+k} \,|\, \Sigma = S_t^*, y^t \sim N_{p\times 1}\{y_t(k),\ R_t(k) S_t^*\},$$
where
$$y_t(k) = m_t^*, \quad R_t(k) = \frac{k(1-\delta) + \delta}{\delta}\, C_t^* + 1.$$
We note that yt(k) does not depend on the lead time k, sometimes referred to as forecast
horizon. This fact has been well known and it is characteristic in local level models.
However, the covariance matrix Rt(k) does depend on k; in fact Rt(k) is an increasing
sequence in k. This is expected, since the bigger the forecast horizon the larger the variance.
We also note that in the static model, where δ = 1, we have Vt = 0 and Rt(k) = C∗t + 1 (all
uncertainty comes from Σ), while in the other extreme (totally unstable model) where
δ = 0, we have Vt = Rt(k) = ∞. Of course both these cases are to be avoided. In practice
discount factors in the range [0.1, 0.99] should be used.
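The dependence of the forecast variance factor on the horizon can be made concrete with a short sketch of Rt(k) = {k(1 − δ) + δ}C∗t/δ + 1. The function name and the input value C = 0.71 are illustrative assumptions.

```python
def forecast_variance_factor(k: int, delta: float, C: float) -> float:
    """R_t(k) = (k (1 - delta) + delta) / delta * C + 1 for horizon k >= 1."""
    if k < 1:
        raise ValueError("the forecast horizon k must be a positive integer")
    if not 0.0 < delta <= 1.0:
        raise ValueError("the discount factor must satisfy 0 < delta <= 1")
    return (k * (1.0 - delta) + delta) / delta * C + 1.0


# Illustrative state variance factor and discount factor.
C, delta = 0.71, 0.7
factors = [forecast_variance_factor(k, delta, C) for k in range(1, 6)]
static = [forecast_variance_factor(k, 1.0, C) for k in range(1, 6)]
print(factors)  # grows linearly with k when delta < 1
print(static)   # no longer depends on k when delta = 1
```

For δ < 1 the factor grows linearly in k with slope (1 − δ)C∗t/δ, so smaller discount factors widen the long-horizon forecast intervals faster.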
3 The London metal exchange data
3.1 Description of the LME data
The London metal exchange (LME) is the world’s premier non-ferrous metals market, with
highly liquid contracts. Its trading customers may be metal industries or individuals (sellers
or buyers). LMEX, the London metal exchange index, is a base metals index comprising
the six primary non-ferrous metals traded on the exchange: aluminium, copper grade A,
standard lead, primary nickel, tin, and zinc. More details about the LME may be found
on its web site: http://www.lme.co.uk.
Here we examine official prices for aluminium, which is the most important metal
traded in the exchange. There are two categories of prices which appear in the market:
the ask price and the bid price. The ask price refers to the price which occurs from the
relevant amount of aluminium required by each customer/trading company. The bid price
refers to the price that the exchange suggests to the customer companies and its evaluation
is based on complicated models, which are not widely available. After consultation with
Aluminium of Greece S.A.I.C., we found out that there was increasing interest in predicting
the ask prices. Therefore, in this study we concentrate on the ask prices. The data are
collected for every trading day from March 2000 to February 2001, and are plotted in
Figure 1. After excluding week-ends and bank holidays, there are T = 246 trading days.
There are four variables of interest, y1t: spot ask price, y2t: 3 months future contract ask
price, y3t: 15 months future contract ask price and y4t: 27 months future contract ask
price, where each price is in US dollars per tonne of aluminium. These four variables are
summarized in the vector time series yt = [y1t y2t y3t y4t]′, t = 1, . . . , 246. The data have
been kindly provided by Aluminium of Greece S.A.I.C., a member of the Pechiney Group
(http://www.pechiney.com/).
Figure 1: London metal exchange aluminium closing prices yt = [y1 y2 y3 y4]′ (four panels, y1 to y4, plotted against time).
Over the last 10 years there has been a noticeable interest in modelling the LME market.
Slade (1988) and Meyer (1994) discuss the problems of pricing and hedging in the non-
ferrous metal market. The efficiency of the LME is examined in Sephton and Cochrane
(1990), Agbeyegbe (1992) and Moore and Cullen (1995). The relationship of future and
spot prices and how the futures can be used to predict the spot prices are considered in
Gilbert (1997) and in Heaney (2002). The important subject of volatility for the LME
market is discussed by Hall (1991) and McKenzie et al. (2001). Panas (2001) suggests
long memory and chaos models to assess the metal prices when nonlinear structure is
evident. A good review of the London metal exchange literature can be found in Watkins
and McAleer (2004).
It is evident that modelling and forecasting LME data requires the adaptation of ef-
ficient multivariate time series techniques. As Figure 1 indicates the four variables of
interest, yit, are cross-correlated and therefore any sound modelling system should be able
to model this correlation. Although the economics literature has dealt with modelling
issues in the non-ferrous metal market, in particular in the LME market, we must submit
that an automatic forecasting procedure is currently unavailable. Triantafyllopoulos and
Montana (2004) have used standard Markov chain Monte Carlo techniques to model the
LME data, but they found that simulation was too slow and not appropriate for online
forecasting. In the next sections we use the local level procedure of section 2.4 in order to
effectively model the LME data.
Figure 2: Histogram and normal QQ plots for the difference series xt (panels x1, x2, x3, x4; histograms show density, QQ plots show sample quantiles against theoretical quantiles).
3.2 The model
In efficient markets (see, for instance, Sephton and Cochrane, 1990) a common practice
for commodity price short-term forecasting is to take as one-step forecast for time t just
the current observed value at time t − 1 (t = 2, . . . , 246). However, possible shocks can
be identified by modelling the difference yt − yt−1 instead of the actual series yt, and we
follow this approach here. We define the difference series xt = yt − yt−1 for 2 ≤ t ≤ 246.
Given yt−1, the difference xt = yt − yt−1 is expected to have zero mean (yt is predicted
as yt−1) and some unknown variance. Histograms and normal probability plots for the
4-dimensional time series xt are shown in Figure 2. Of course these are only marginal
histograms which do not show the sample cross correlation between the four time series.
However, such summary plots can validate the assumption of a normal distribution for the
first difference. A more detailed analysis involving multivariate normal distribution tests
can be done following the comprehensive review of Mecklin and Mundfrom (2004). Here,
we assume that given a state ψt and a variance matrix Σ, we have
xt|ψt, Σ ∼ N4×1(ψt, Σ),
where N4×1(·) denotes the 4-dimensional multivariate normal distribution.
The level of the series ψt may be considered constant, in which case we have a static
regression model with ψ1 = ψ2 = · · · = ψ246 = ψ. It seems more reasonable to assume that
ψt is expected to be static, but there is some uncertainty associated with it. This reasoning
leads to the adaptation of a local level model (2.11a) and (2.11b) for the time series xt, where p = 4.
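The differencing step xt = yt − yt−1 is simple to sketch for a vector series. The prices below are made-up numbers with two components, used only to show the shape of the computation; they are not LME data, and the function name `difference` is hypothetical.

```python
from typing import List

Vector = List[float]


def difference(series: List[Vector]) -> List[Vector]:
    """Return x_t = y_t - y_{t-1} for t = 2, ..., T, componentwise."""
    return [[cur[i] - prev[i] for i in range(len(cur))]
            for prev, cur in zip(series, series[1:])]


# Hypothetical prices in US dollars per tonne (not taken from the paper).
prices = [[1500.0, 1510.0], [1512.5, 1508.0], [1499.0, 1511.5]]
x = difference(prices)
print(x)
```

A series of T observations yields T − 1 differences, so the local level model is fitted to one observation fewer than the raw price series.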
3.3 Statistical analysis
We apply Theorem 2 to provide one-step forecasts for the difference time series xt = yt − yt−1, where yt is the LME time series of Figure 1. We are also interested in
estimating the covariance matrix Σ and hence looking at the cross-correlation structure
of the four component time series x1t, x2t, x3t, x4t. Figure 3 plots the one-step forecasts
xi,t−1(1) of xit against the actual values xit, for i = 1 and i = 4 (spot and 27 months
contract component time series). We observe that the forecasts are generally good except
for some outliers. These outliers can be identified by a simple time series plot of the
standard forecast errors, shown in Figure 4. The standard errors are plotted against ±1.96
confidence limits, identifying 19 outliers for x1t, 15 outliers for x2t, 11 outliers for
x3t and 8 outliers for x4t. Out of 246 total observations these figures give 7.7%, 6.1%,
4.5% and 3.2% outliers respectively. For this kind of data we believe that the above
proportions are relatively low. However, as the forecasts are produced, further analysis
can be done, e.g. the approach of Salvador and Gargallo (2004) for automatic intervention
can be applied. We also note that, comparing the forecasts of the four series, the future
contract series are predicted better, with x4t having the best forecast accuracy, while
the spot time series x1t has the lowest forecast accuracy. This is expected since the spot
variable is more volatile than the future contracts and this can be verified by looking at
the range of the four variables.
Figure 3: One-step forecasts for the difference spot and 27 months contract series x1t and x4t, where xt = [x1t x2t x3t x4t]′ (two panels plotted against time). The solid line shows the actual values, while the
dashed line shows the forecasts.
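The outlier count used in this analysis amounts to flagging the time points whose standardized one-step forecast error falls outside ±1.96. A minimal sketch follows; it assumes the errors have already been standardized, and both the function name `flag_outliers` and the toy error values are hypothetical.

```python
from typing import List, Tuple


def flag_outliers(std_errors: List[float], bound: float = 1.96
                  ) -> Tuple[List[int], float]:
    """Return the indices with |error| > bound and their share of the sample."""
    flagged = [t for t, z in enumerate(std_errors) if abs(z) > bound]
    return flagged, len(flagged) / len(std_errors)


# Toy standardized errors (hypothetical, not the LME results).
errors = [0.3, -2.4, 1.1, 1.96, -0.7, 3.2, 0.0, -1.2]
flagged, share = flag_outliers(errors)
print(flagged, share)
```

Under exact normality about 5% of the standardized errors would be expected to fall outside ±1.96, which gives a reference point for judging the reported proportions.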
For these results, the priors in Theorem 2 have been specified as: m∗0 = [0 0 0 0]′, C∗0 = 1/1000,
S∗0 = 1000I4, n0 = 1/1000 and δ = 0.7. These priors have been chosen with care in order
to provide a non-informative Bayesian prior specification. This is reflected by the high
values of the covariance matrix S∗0 and the low value of the degrees of freedom n0. In order
to judge and evaluate the correlation structure throughout the series we need to obtain
good estimates of Σ. Running several simulations we have found that the best performance
of the estimate of Σ, S∗t, is obtained when n0S∗0 = Ip, which in this case means n0S∗0 = I4. The
prior C∗0 has been chosen to be very low because, again, we wish the covariance matrix C∗0S∗0
of ψ0|Σ = S∗0 to be equal to I4. The prior m∗0 is set to zero, because the time series xt
fluctuates around zero, and a relatively low value for the discount factor δ is chosen so that
the shocks can be captured. A better forecast accuracy can be achieved if δ is chosen to
be even lower, e.g. δ = 0.6. A sensitivity analysis for δ is given in section 3.4.
Figure 4: One-step forecast standard errors for the difference series x1t and x4t, where
xt = [x1t x2t x3t x4t]′ (two panels plotted against time). The solid line shows the standard errors and the dashed lines plot
95% confidence limits at ±1.96.
As mentioned before, it is of particular modelling interest to estimate the covariance
matrix Σ efficiently. The respective correlation matrix will reflect upon the variance
throughout each of the series xit as well as the correlation among the four component
time series xit. Figure 5 plots the diagonal elements of the covariance matrix S∗t produced
by Theorem 2. We observe that the variance estimates for the spot and 3 months
contract time series are more volatile, while those of the remaining future contract time series are
less volatile. Figure 6 shows the cross correlation between the four scalar time series xit. If
CR = {ρ(ij)} is the correlation matrix of xt given ψt, then the solid line shows the estimate
of ρ(12), the dashed line shows the estimate of ρ(23), the dotted line shows the estimate of
ρ(34) and the dashed/dotted line shows the estimate of ρ(14). Of course it is ρ(ii) = 1, for
i = 1, 2, 3, 4. We observe that the spot and the 3 months contract time series are much
more correlated than the other future contracts. We also observe that the correlation of the
spot and the 27 months contract time series is much smaller than the relevant correlation
of the spot and the 3 months contract time series.
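The correlations discussed here are obtained from a covariance estimate by the usual scaling ρ(ij) = σ(ij)/√(σ(ii)σ(jj)). A small sketch, with a made-up 2 × 2 covariance matrix and a hypothetical function name, is:

```python
import math
from typing import List

Matrix = List[List[float]]


def cov_to_corr(S: Matrix) -> Matrix:
    """Convert a covariance matrix with positive diagonal to a correlation matrix."""
    p = len(S)
    sd = [math.sqrt(S[i][i]) for i in range(p)]
    return [[S[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


# Hypothetical covariance estimate (not taken from the LME analysis).
S = [[4.0, 2.0], [2.0, 9.0]]
CR = cov_to_corr(S)
print(CR)
```

Applying this conversion to the estimate of Σ at each time t would give time paths of correlation estimates of the kind plotted in Figure 6.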
Figure 5: Variance estimates of Σ = {σ(ij)} (i, j = 1, 2, 3, 4), plotted against time. The solid line shows the
estimate of σ(11); the dashed line shows the estimate of σ(22); the dotted line shows the
estimate of σ(33); and the dashed/dotted line shows the estimate of σ(44).
Figure 6: Correlation estimates of the series xt given ψt, plotted against time. If CR = {ρ(ij)} (i, j = 1, 2, 3, 4)
is the correlation matrix of xt (corresponding correlation matrix to the covariance matrix
Σ), then the solid line shows the estimate of ρ(12), the dashed line shows the estimate of
ρ(23), the dotted line shows the estimate of ρ(34) and the dashed/dotted line shows the
estimate of ρ(14).
Figure 7: The effect of the discount factor δ on the estimated state scalar variance C∗t, plotted against time. The
solid line shows C∗t for δ = 0.1 and the dashed line shows C∗t for δ = 0.7.
3.4 Sensitivity analysis of the discount factor δ
The design of a multivariate forecasting model, such as the one used above, requires the
specification of the starting values m∗0, n0, C∗0, S∗0 and δ. Specification of m∗0, n0, C∗0 and S∗0
follows from the discussion of section 3.3, but the sensitivity of δ is of particular interest.
Many authors have argued (e.g. West and Harrison, 1997) that in local level models and
generally in state space models the discount factor δ should be in the range [0.8, 0.99],
because otherwise the system variance Wt will have very large values, yielding an unstable
model with very large values in the estimated state covariance matrix. This argument may
be generally valid; however, we note that a high discount factor will not allow us to predict
accurately the shocks in time series like those in section 3. The estimated covariance matrix
of ψt is C∗tS∗t, where C∗t and S∗t are calculated from Theorem 2. From C∗t = 1/(δ + C∗t−1) it is
clear that the real sequence C∗t is neither increasing nor decreasing, but its limit exists and,
being the positive root of C² + δC − 1 = 0, it is
$C^* = \lim_{t\to\infty} C_t^* = (\sqrt{\delta^2 + 4} - \delta)/2$. Figure 7 plots the variance C∗t for δ = 0.1 and
δ = 0.7. We see that for different values of δ the difference in C∗t is not very big, with
limiting values C∗ = 0.95 (δ = 0.1) and C∗ = 0.71 (δ = 0.7). This means that δ does not
have a dramatic effect on the covariance matrix of ψt and this is an advantage since it gives
more flexibility to the modeller.

Table 1: Effect of the discount factor δ on the sum of squared forecast errors (SSE).

           SSE
δ       x1t     x2t     x3t     x4t
0.1     393.6   341.7   249.8   209.0
0.2     424.3   320.2   255.6   223.8
0.3     437.3   310.6   257.4   228.3
0.4     444.5   306.5   258.3   230.4
0.5     449.1   304.8   258.7   231.6
0.6     452.3   304.2   258.8   232.3
0.7     454.5   304.2   258.9   232.7
0.8     456.1   304.6   258.8   233.0
0.9     457.2   305.1   258.7   233.1
0.99    458.0   305.5   258.6   233.2
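The limiting values quoted in this section can be reproduced numerically. The sketch below iterates the recursion C∗t = 1/(δ + C∗t−1) exactly as it is written in Theorem 2, starting from the prior C∗0 = 1/1000 of section 3.3, and compares the result with the closed-form limit. The function names and the number of iterations are assumptions made for illustration.

```python
import math


def iterate_state_variance(C0: float, delta: float, steps: int) -> float:
    """Iterate C_t = 1 / (delta + C_{t-1}) for the given number of steps."""
    C = C0
    for _ in range(steps):
        C = 1.0 / (delta + C)
    return C


def limit_state_variance(delta: float) -> float:
    """Positive root of C^2 + delta * C - 1 = 0."""
    return (math.sqrt(delta ** 2 + 4.0) - delta) / 2.0


for delta in (0.1, 0.7):
    iterated = iterate_state_variance(C0=1.0 / 1000.0, delta=delta, steps=1000)
    print(delta, iterated, limit_state_variance(delta))
```

The iterates alternate above and below the limit, which is why the sequence is neither increasing nor decreasing, and they settle more slowly for δ = 0.1 than for δ = 0.7.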
In order to judge better the effect of δ on the forecasts we have looked at the change
in the sum of squared forecast errors (SSE) for several values of δ throughout the range
[0.1, 0.99]. Table 1 shows that small values of δ give smaller SSE for the variables x1t, x3t, x4t,
but for x2t the lowest SSE is achieved for δ = 0.6 and δ = 0.7. This is why we have used in
section 3.3 δ = 0.7, which we consider low enough to give good forecast accuracy. Although
Table 1 suggests that δ = 0.1 gives the smallest SSE for x1t, x3t, x4t, we noticed only slight
forecast improvement using δ = 0.1 as far as the outlier proportion is concerned. Table 1 also
suggests that perhaps two different discount factors should be used, i.e. δ1 = 0.1 for the
time series x1t, x3t, x4t and δ2 = 0.7 for x2t.
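The reading of Table 1 can be checked mechanically. The sketch below copies the SSE values from Table 1 and reports, for each component series, the discount factor with the smallest SSE; the names `TABLE_1`, `SERIES` and `best_delta` are hypothetical.

```python
from typing import Dict, List, Tuple

# Rows of Table 1: (delta, SSE of x1t, x2t, x3t, x4t).
TABLE_1: List[Tuple[float, float, float, float, float]] = [
    (0.1, 393.6, 341.7, 249.8, 209.0),
    (0.2, 424.3, 320.2, 255.6, 223.8),
    (0.3, 437.3, 310.6, 257.4, 228.3),
    (0.4, 444.5, 306.5, 258.3, 230.4),
    (0.5, 449.1, 304.8, 258.7, 231.6),
    (0.6, 452.3, 304.2, 258.8, 232.3),
    (0.7, 454.5, 304.2, 258.9, 232.7),
    (0.8, 456.1, 304.6, 258.8, 233.0),
    (0.9, 457.2, 305.1, 258.7, 233.1),
    (0.99, 458.0, 305.5, 258.6, 233.2),
]
SERIES = ["x1t", "x2t", "x3t", "x4t"]


def best_delta(table: List[Tuple[float, ...]], column: int) -> float:
    """Discount factor with the smallest SSE in the given column (first one on ties)."""
    return min(table, key=lambda row: row[column])[0]


best: Dict[str, float] = {name: best_delta(TABLE_1, i + 1)
                          for i, name in enumerate(SERIES)}
print(best)
```

Ties are resolved in favour of the smaller δ, so x2t is reported at δ = 0.6 although δ = 0.7 attains the same SSE of 304.2.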
4 Conclusions
This paper proposes the application of discount weighted regression (DWR) of Triantafyl-
lopoulos and Pikoulas (2002) in order to model multivariate time series. DWR models
are appropriately modified to accommodate multivariate local level models. The noise
covariance matrices are left unspecified and estimated with a maximum likelihood pro-
cedure, while the overall model exhibits maximum likelihood and weighted least squares
optimalities. The proposed methodology is applied to London metal exchange time series
data. Estimation and forecasting are performed with a fast and efficient online algorithm
and the cross-correlation of the component series is discussed in some detail. The starting
values of the algorithm as well as the discount factor are discussed, giving practical
guidelines which can be generally applied. A simple sensitivity analysis for the discount factor,
based on the sum of squared forecast errors, suggests employing several
discount factors in the models and it is expected that future research will be devoted to
this direction.
Acknowledgements
We thank Spiros Ikonomakos and Angelique Papageorgiou from the Commercial Department
of Aluminium of Greece S.A.I.C. for providing the LME data. Special thanks are due
to Giovanni Montana, who helped with the computational part of the paper. The work of
the author was supported by grant NAL/00642/G of the Nuffield Foundation.
5 Appendix
Proof of Theorem 1. The condition (2.10) implies that both linear systems X+t−1X = X+
t
(in X) and X+t X = X+
t−1 (in X) are consistent and so (Harville, 1997, p. 120) we have
X+t−1Xt−1X
+t = X+
t and X+t XtX
+t−1 = X+
t−1. Now see that
X+t−1 − δX+
t = X+t (Xt − δXt−1)X
+t−1 = X+
t FtF′tX
+t−1. (5.1)
Then using mt = X+t Ht we get
mt −mt−1 = X+t Ht −X+
t−1Ht−1 = X+t (δHt−1 + Fty
′t)−X+
t−1Ht−1
= (δX+t −X+
t−1)Ht−1 + X+t Fty
′t = X+
t Fty′t −X+
t FtF′tX
+t−1Ht−1
= X+t Ft(y
′t − F ′
tX+t−1Ht−1) = X+
t Ft(y′t − F ′
tmt−1) = Ate′t.
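From equation (5.1), the covariance matrix $C_t = X_t^+$ satisfies
$$\begin{aligned}
C_{t-1} = C_t (\delta I_d + F_t F_t' C_{t-1}) &\Rightarrow C_t F_t = (\delta + F_t' C_{t-1} F_t)^{-1} C_{t-1} F_t \\
&\Rightarrow C_{t-1} - \delta C_t = (\delta + F_t' C_{t-1} F_t)^{-1} C_{t-1} F_t F_t' C_{t-1} \\
&\Rightarrow C_t = \frac{1}{\delta} \left( I_d - \frac{C_{t-1} F_t F_t'}{\delta + F_t' C_{t-1} F_t} \right) C_{t-1}
\end{aligned}$$
and the proof is complete.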
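The recursions established in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the regressors and observations are simulated, $X_0 = 0$ and $H_0 = 0$ are assumed as starting values of the discounted sums, and the recursion is started at a time `t0` at which $X_t$ is nonsingular, so that the rank condition is taken to hold. All function names are hypothetical.

```python
import numpy as np

def dwr_step(m_prev, C_prev, F, y, delta):
    """Recursive update of (m_t, C_t) from (m_{t-1}, C_{t-1}).

    m_prev: (d, p), C_prev: (d, d) symmetric, F: (d,), y: (p,), 0 < delta <= 1.
    """
    CF = C_prev @ F                          # C_{t-1} F_t
    A = CF / (delta + F @ CF)                # A_t = C_t F_t
    e = y - m_prev.T @ F                     # e_t = y_t - m_{t-1}' F_t
    m = m_prev + np.outer(A, e)              # m_t = m_{t-1} + A_t e_t'
    C = (C_prev - np.outer(A, CF)) / delta   # recursion for C_t
    return m, C

def batch_moments(Fs, ys, delta):
    """X_t and H_t built from X_t = delta X_{t-1} + F_t F_t', H_t = delta H_{t-1} + F_t y_t'."""
    d, p = Fs.shape[1], ys.shape[1]
    X, H = np.zeros((d, d)), np.zeros((d, p))
    for F, y in zip(Fs, ys):
        X = delta * X + np.outer(F, F)
        H = delta * H + np.outer(F, y)
    return X, H

def max_recursion_gap(T=40, d=3, p=2, delta=0.9, t0=6, seed=0):
    """Largest absolute difference between recursive and batch (m_T, C_T)."""
    rng = np.random.default_rng(seed)
    Fs, ys = rng.normal(size=(T, d)), rng.normal(size=(T, p))
    X, H = batch_moments(Fs[:t0], ys[:t0], delta)
    C = np.linalg.pinv(X)                    # C_{t0} = X_{t0}^+
    m = C @ H                                # m_{t0} = X_{t0}^+ H_{t0}
    for t in range(t0, T):
        m, C = dwr_step(m, C, Fs[t], ys[t], delta)
    X, H = batch_moments(Fs, ys, delta)
    C_batch = np.linalg.pinv(X)
    return max(np.abs(C - C_batch).max(), np.abs(m - C_batch @ H).max())
```

A call such as `max_recursion_gap()` returns the discrepancy between the two routes, which should be at the level of floating point rounding when the rank condition holds.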
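Proof of Theorem 2. With $F_t = F = [1 \; 0 \; \cdots \; 0]'$ and the definition of $X_t$, we have
$$X_t = \frac{1 - \delta^t}{1 - \delta} \, \mathrm{diag}(1, 0, \ldots, 0)$$
and so we have $\mathrm{rank}(X_t) = \mathrm{rank}(X_{t-1}) = 1$. The Moore-Penrose inverse of $X_t$ is
$$X_t^+ = \frac{1 - \delta}{1 - \delta^t} \, \mathrm{diag}(1, 0, \ldots, 0)$$
and the row space of the extended matrix $[X_{t-1}^+ : X_t^+]$ is
$$\mathcal{R}([X_{t-1}^+ : X_t^+]) = \lambda \left[ \frac{1 - \delta}{1 - \delta^{t-1}} \;\; 0 \; \cdots \; 0 \;\; \frac{1 - \delta}{1 - \delta^t} \;\; 0 \; \cdots \; 0 \right]',$$
for any $\lambda \in \mathbb{R}$. Hence
$$\mathrm{rank}([X_{t-1}^+ : X_t^+]) = \dim \mathcal{R}([X_{t-1}^+ : X_t^+]) = 1 = \mathrm{rank}(X_t) = \mathrm{rank}(X_{t-1})$$
and so assumption (2.10) holds. Thus from Theorem 1, the estimates $m_t$, $C_t$ and $S_t$ are
updated as in equations (2.3), (2.5) and (2.4), respectively.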
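The rank argument above is easy to verify numerically. The following sketch assumes $X_0 = 0$ and builds $X_t$ from $X_t = \delta X_{t-1} + F F'$ with $F = [1 \; 0 \; \cdots \; 0]'$; the function names are hypothetical and the check is an illustration, not part of the proof.

```python
import numpy as np

def x_matrix(t, d, delta):
    """X_t = sum_{i=1}^t delta^(t-i) F F' for F = (1, 0, ..., 0)', assuming X_0 = 0."""
    F = np.zeros(d)
    F[0] = 1.0
    X = np.zeros((d, d))
    for _ in range(t):
        X = delta * X + np.outer(F, F)
    return X

def rank_condition(t, d, delta):
    """Ranks of [X_{t-1}^+ : X_t^+], X_t and X_{t-1}, for t >= 2."""
    X_prev, X_curr = x_matrix(t - 1, d, delta), x_matrix(t, d, delta)
    extended = np.hstack([np.linalg.pinv(X_prev), np.linalg.pinv(X_curr)])
    return (int(np.linalg.matrix_rank(extended)),
            int(np.linalg.matrix_rank(X_curr)),
            int(np.linalg.matrix_rank(X_prev)))

# Closed form of the only non-zero entry of X_t^+ for t = 5, delta = 0.8.
leading_entry = np.linalg.pinv(x_matrix(5, 4, 0.8))[0, 0]
closed_form = (1 - 0.8) / (1 - 0.8 ** 5)
```

All three ranks returned by `rank_condition` equal one, and `leading_entry` agrees with `closed_form` up to rounding.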
With the transformation
$$\psi_t = \Theta_t' F \quad \text{and} \quad \omega_t = \Omega_t' F,$$
model (2.1a) and (2.1b) reduces to the local level model (2.11a) and (2.11b) with $\Omega_t = q_t \Sigma$
and $q_t = F' W_t F$.
With $m_t$, $C_t$ and $S_t$ as in the equations (2.3), (2.5) and (2.4), define $m_t^* = m_t' F$,
$A_t^* = F' A_t$ and $C_t^* S_t^* = \mathrm{var}(\psi_t \mid \Sigma = S_t^*, y^t)$. Then we have
$$e_t = y_t - m_{t-1}' F = y_t - m_{t-1}^* = e_t^*, \qquad r_t = y_t - m_t' F = y_t - m_t^* = r_t^*,$$
$$m_t = m_{t-1} + A_t e_t' \implies m_t^* = m_{t-1}^* + A_t^* e_t^*.$$
Since $e_t^* = e_t$ and $r_t^* = r_t$, the estimate $S_t^* = S_t$, with $S_t$ as in equation (2.4), provides the
required updating recursion for $S_t^*$.
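Also
$$\begin{aligned}
\mathrm{var}(\psi_t \mid \Sigma = S_t^*, y^t) &= \mathrm{var}\{\mathrm{vec}(F' \Theta_t) \mid \Sigma = S_t, y^t\} \\
&= (I_p \otimes F') \, \mathrm{var}\{\mathrm{vec}(\Theta_t) \mid \Sigma = S_t, y^t\} \, (I_p \otimes F) \\
&= F' C_t F S_t^* = C_t^* S_t^*,
\end{aligned}$$
where $C_t$ is updated as in equation (2.5). From $C_t^* = F' C_t F$, it follows that
$$A_t^* = F' A_t = \frac{F' C_{t-1} F}{F' C_{t-1} F + \delta} = \frac{C_{t-1}^*}{\delta + C_{t-1}^*}.$$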
Pre-multiplying and post-multiplying equation (2.5) by $F'$ and $F$, respectively, we obtain
$$C_t^* = \frac{1}{\delta} \left( 1 - \frac{C_{t-1}^*}{\delta + C_{t-1}^*} \right) C_{t-1}^* = \frac{C_{t-1}^*}{\delta + C_{t-1}^*}.$$
The proof is complete by stating the prior
$$\psi_0 \mid \Sigma = S_0^* \sim N_{p \times 1}(m_0^*, C_0^* S_0^*),$$
where $m_0^* = m_0' F$.
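The reduction used in this proof can be illustrated numerically. The sketch below runs the matrix recursions of Theorem 1 with $F = [1 \; 0 \; \cdots \; 0]'$ next to the scalar-gain local level recursions and records the largest discrepancy between $m_t' F$ and $m_t^*$ and between $F' C_t F$ and $C_t^*$. It rests on assumptions that are not part of the paper: the observations are simulated, the prior is taken as $C_0 = \mathrm{diag}(c_0, 0, \ldots, 0)$ with an arbitrary $m_0$, and all function names are hypothetical.

```python
import numpy as np

def matrix_step(m_prev, C_prev, F, y, delta):
    """Matrix update of (m_t, C_t), following the recursions of Theorem 1."""
    CF = C_prev @ F
    A = CF / (delta + F @ CF)                    # A_t
    m = m_prev + np.outer(A, y - m_prev.T @ F)   # m_t = m_{t-1} + A_t e_t'
    C = (C_prev - np.outer(A, CF)) / delta       # C_t
    return m, C

def local_level_step(m_star, c_star, y, delta):
    """Scalar-gain update: A*_t = C*_{t-1} / (delta + C*_{t-1}) and C*_t = A*_t."""
    gain = c_star / (delta + c_star)
    return m_star + gain * (y - m_star), gain

def reduction_gap(T=200, d=3, p=2, delta=0.8, c0=2.0, seed=0):
    """Largest gap between the projected matrix recursion and the local level one."""
    rng = np.random.default_rng(seed)
    ys = rng.normal(size=(T, p))
    F = np.zeros(d)
    F[0] = 1.0
    m = rng.normal(size=(d, p))                  # arbitrary prior mean m_0
    C = np.zeros((d, d))
    C[0, 0] = c0                                 # assumed prior C_0 = diag(c0, 0, ..., 0)
    m_star, c_star = m.T @ F, c0                 # m*_0 = m_0' F and C*_0 = F' C_0 F
    gap = 0.0
    for y in ys:
        m, C = matrix_step(m, C, F, y, delta)
        m_star, c_star = local_level_step(m_star, c_star, y, delta)
        gap = max(gap, np.abs(m.T @ F - m_star).max(), abs(F @ C @ F - c_star))
    return gap, c_star
```

The scalar recursion is equivalent to $1/C_t^* = \delta / C_{t-1}^* + 1$, so $C_t^*$ and hence the gain $A_t^*$ tend to $1 - \delta$ as $t$ grows. In the limit the level update is an exponentially weighted moving average with smoothing constant $1 - \delta$.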
References
[1] Agbeyegbe, T.D. (1992) Common stochastic trends: evidence from the London Metal
Exchange. Bulletin of Economic Research, 44, 141-151.
[2] Doucet, A., de Freitas, N. and Gordon, N.J. (2001) Sequential Monte Carlo Methods
in Practice. Springer-Verlag, New York.
[3] Durbin, J. and Koopman, S.J. (2001) Time Series Analysis by State Space Methods.
Oxford University Press, Oxford.
[4] Enns, P.G., Machak, J.A., Spivey, W.A. and Wrobleski, W.J. (1982) Forecasting
applications of an adaptive multiple exponential smoothing model. Management Science,
28, 1035-1044.
[5] Franco, G.C. and Souza, R.C. (2002) A comparison of methods for bootstrapping in
the local level model. Journal of Forecasting, 21, 27-38.
[6] Gilbert, C.L. (1997) Manipulation of metals futures: lessons from Sumitomo. Centre
for Economic Policy Research, 26, Discussion Paper: 1537.
[7] Gupta, A.K. and Nagar, D.K. (1999) Matrix Variate Distributions. Chapman and
Hall/CRC Monographs and Surveys in Pure and Applied Mathematics 104, New York.
[8] Hall, S.G. (1991) An application of the stochastic GARCH-in-mean model to risk
premia in the London Metal Exchange. Manchester School of Economic and Social Studies
(Supplement), 59, 57-71.
[9] Hannan, E.J. (1970) Multiple Time Series. Wiley, New York.
[10] Harrison, P.J. and Stevens, S. (1976) Bayesian forecasting (with discussion). Journal
of the Royal Statistical Society (Series B), 38, 205-247.
[11] Harvey, A.C. (1986) Analysis and generalisation of a multivariate exponential
smoothing model. Management Science, 32, 374-380.
[12] Harvey, A.C. (1989) Forecasting, Structural Time Series Models and the Kalman
Filter. Cambridge University Press, Cambridge.
[13] Heaney, R. (2002) Does knowledge of the cost model improve commodity futures
price forecasting ability? A case study using the London Metal Exchange lead contract.
International Journal of Forecasting, 18, 45-65.
[14] Lutkepohl, H. (1993) Introduction to Multiple Time Series Analysis. Springer-Verlag,
Berlin.
[15] Machak, J.A., Spivey, W.A. and Wrobleski, W.J. (1983) Analyzing permanent and
transitory influences in multiple time series models. Journal of Business and Economic
Statistics, 1, 57-65.
[16] McKenzie, M., Mitchell, H., Brooks, R.D. and Faff, R.W. (2001) Power ARCH modelling of
commodity futures data on the London Metal Exchange. European Journal of Finance,
7, 22-38.
[17] Mecklin, C.J. and Mundfrom, D.J. (2004) An appraisal and bibliography of tests for
multivariate normality. International Statistical Review, 72, 123-138.
[18] Meyer, T.O. (1994) The difficulty in cross-hedging London Metal Exchange spot-price
risk using U.S. metal and British pound futures. Journal of Multinational Financial
Management, 4, 141-153.
[19] Moore, M.J. and Cullen, U. (1995) Speculative efficiency on the London Metal
Exchange. Manchester School of Economic and Social Studies, 63, 235-256.
[20] Panas, E. (2001) Long memory and chaotic models of prices on the London Metal
Exchange. Resources Policy, 27, 235-246.
[21] Quintana, J.M. and West, M. (1987) An analysis of international exchange rates using
multivariate DLMs. The Statistician, 36, 275-281.
[22] Salvador, M. and Gargallo, P. (2004) Automatic monitoring and intervention in
multivariate dynamic linear models. Computational Statistics and Data Analysis, 47, 401-431.
[23] Salvador, M., Gallizo, J.L. and Gargallo, P. (2003) A dynamic principal components
analysis based on multivariate matrix normal dynamic linear models. Journal of
Forecasting, 22, 457-478.
[24] Sephton, P.S. and Cochrane, D.K. (1990) A note on the efficiency of the London Metal
Exchange. Economics Letters, 33, 341-345.
[25] Shephard, N. (1993) Distribution of the ML estimator of an MA(1) and a local level
model. Econometric Theory, 9, 377-401.
[26] Shephard, N. and Harvey, A.C. (1990) On the probability of estimating a deterministic
component in the local level model. Journal of Time Series Analysis, 11, 339-347.
[27] Slade, M.E. (1988) Pricing of Metals. CRS Monograph series 22, Queen's University
Centre for Resource Studies, Kingston, Ontario.
[28] Triantafyllopoulos, K. and Montana, G. (2004) Forecasting the London metal
exchange with a dynamic model. In J. Antoch (ed.) Proceedings in Computational Statistics
2004, Physica-Verlag, Heidelberg, 1885-1892.
[29] Triantafyllopoulos, K. and Pikoulas, J. (2002) Multivariate regression applied to the
problem of network security. Journal of Forecasting, 21, 579-594.
[30] Watkins, C. and McAleer, M. (2004) Econometric modelling of non-ferrous metal
prices. Journal of Economic Surveys (to appear).
[31] West, M. and Harrison, P.J. (1997) Bayesian Forecasting and Dynamic Models. 2nd
edition. Springer-Verlag, New York.
[32] Whittle, P. (1963) Prediction and Regulation. English Universities Press, London.