Multivariate Discount Weighted Regression and Local

Level Models

Kostas Triantafyllopoulos

University of Newcastle, Newcastle upon Tyne, UK.

Abstract

In this paper we propose a multivariate discount weighted regression technique

to give a tractable solution to the problem of variance estimation and forecasting

for the multivariate local level model. We give the correspondence between discount

regression and matrix normal dynamic linear models and we show that the local level

model can be treated with discount regression techniques. We illustrate the proposed

methodology with London metal exchange data consisting of aluminium spot and future contract closing prices. The proposed estimate of the noise covariance matrix

suggests these data exhibit high cross-correlation, which is discussed in some detail.

The performance of the weighted regression model is evaluated with a simple outlier

analysis. A sensitivity analysis shows that a low discount factor should be used and

practical guidelines are given for general future use.

Keywords: Time series, dynamic models, Kalman filter, state space models, regression, Bayesian forecasting.


1 Introduction

Multivariate time series have recently received significant development from both theoretical and practical standpoints. Whittle (1963), Hannan (1970) and Lutkepohl (1993) examine the widely used stationary VARMA models for observation vectors, whilst Lutkepohl (1993, ch. 13), West and Harrison (1997, ch. 16) and Durbin and Koopman (2001, ch. 3) give detailed consideration to state space or dynamic linear models. These models provide

a sound statistical framework with inclusion of practical modelling mechanisms such as

feed-forward and feed-back intervention. However, in most multivariate models the noise

covariance matrices have to be assumed known, if tractability is desired. Local level models

constitute the simplest class of dynamic models with significant application. Enns et al.

(1982), Machak et al. (1983), Harvey (1986, 1989, ch. 8) and Durbin and Koopman (2001,

p. 44) discuss multivariate local level models, although the problem of the specification of

the noise covariance matrices does not receive much attention. Sequential Markov chain

Monte Carlo techniques (Doucet et al., 2001) as well as EM algorithm techniques (Durbin

and Koopman, 2001) are available in multivariate time series, but their efficiency needs to

be studied when the dimension of the vector of the observations is large and when online

forecasting is required. The only tractable multivariate dynamic model, which assumes

a prior inverted Wishart distribution for the observation noise covariance matrix, is the

so called matrix dynamic linear model, reported in West and Harrison (1997, p. 597) and

applied in Salvador et al. (2003) and Salvador and Gargallo (2004). Triantafyllopoulos and

Pikoulas (2002) proposed a new multivariate dynamic linear model, which is termed here

as multivariate discount weighted regression. The novelty of this model is that, although

it does not assume a prior Wishart distribution for the unknown noise covariance matrix,

it produces neat recurrence equations for estimation and forecasting.

In this paper, using discount weighted regression, we provide an efficient methodology

for estimation and forecasting for the multivariate local level model. The observation


covariance matrix of the noise vector is assumed unknown and it is estimated from the

data using a standard maximum likelihood procedure. The correspondence of the discount

regression and the matrix dynamic linear model is established and some properties of the

regression model are discussed. As a practical illustration of the proposed methodology we

consider data consisting of aluminium spot and future contract prices. It is our belief that

these data are interesting for two reasons: they have strong similarities with international

exchange rates data and they are not often discussed in the literature.

This paper is organized as follows. Section 2.1 gives the general description of dis-

count regression as developed in Triantafyllopoulos and Pikoulas (2002). In section 2.2 the

correspondence of discount regression and matrix normal dynamic models is established,

while sections 2.3 and 2.4 show that multivariate local level models with unknown noise

covariance matrices can be analyzed with discount regression techniques. The proposed

local level model is illustrated by analyzing London metal exchange data in section 3.

Conclusions follow in section 4 and the proofs are given in the appendix (section 5).

2 Multivariate discount weighted regression

2.1 The model

Let y_t be a p-dimensional vector of observations, which become available at roughly

equal intervals of time. The multivariate discount weighted regression model (DWR) is

defined by

y_t' = F_t' \Theta_t + \nu_t', \qquad \nu_t | \Sigma \sim N_{p \times 1}(0, \Sigma), \qquad \text{(observation equation)} \qquad (2.1a)

\Theta_t = \Theta_{t-1} + \Omega_t, \qquad \Omega_t | \Sigma \sim N_{d \times p}(0, W_t, \Sigma), \qquad \text{(transition equation)} \qquad (2.1b)

where Ft is a d-dimensional design vector, Θt is a d × p state matrix, Σ is the p × p

covariance matrix of the innovations νt, and Wt is a d× d covariance matrix. Conditional

on Σ, the innovations νt and Ωt follow, respectively, multivariate and matrix-variate normal


distributions, the latter of which is typically defined as

\Omega_t | \Sigma \sim N_{d \times p}(0, W_t, \Sigma) \iff \mathrm{vec}(\Omega_t) | \Sigma \sim N_{dp \times 1}(0, \Sigma \otimes W_t),

where vec(·) denotes the column stacking operator of a matrix, ⊗ denotes the Kronecker product of two matrices and N_{p×1}(·) denotes the p-dimensional

normal distribution. For more details on matrix-variate distributions the reader is referred

to Gupta and Nagar (1999). For a positive integer T > 0, let y^t = (y_1, \ldots, y_t) denote the information set comprising the available data up to time t (t = 1, \ldots, T) and assume that

the innovation series (νt)t=1,...,T and (Ωt)t=1,...,T are internally and mutually uncorrelated

and also they are uncorrelated with the assumed prior

Θ0|Σ ∼ Nd×p(m0, C0, Σ), (2.2)

for some known m0 and C0. The design vector Ft is assumed known for all t and the

covariance matrix Wt is specified with a discount factor δ as Wt = (1 − δ)Ct−1/δ, where

Ct−1 is a known covariance matrix at time t− 1 (see below). If we define

X_t = \sum_{j=0}^{t-1} \delta^j F_{t-j} F_{t-j}' \qquad \text{and} \qquad H_t = \sum_{j=0}^{t-1} \delta^j F_{t-j} y_{t-j}',

then Triantafyllopoulos and Pikoulas (2002) show that the unbiased least squares estimate, which is the same as the maximum likelihood estimate, of \Theta_t based on the data y^t is

m_t = X_t^{-1} H_t = C_t H_t = m_{t-1} + A_t e_t', \qquad (2.3)

where A_t = C_t F_t is the adaptive vector and e_t = y_t - m_{t-1}' F_t is the one-step forecast error.

The estimate mt is optimal in the sense that it minimizes the discounted sum of squares

S_\delta(\Theta) = \sum_{j=0}^{t-1} \delta^j \, (y_{t-j}' - F_{t-j}' \Theta)(y_{t-j}' - F_{t-j}' \Theta)',

for all d× p state matrices Θ.


With rt = yt −m′tFt, the maximum likelihood estimate of the covariance matrix Σ is

n_t S_t = n_{t-1} S_{t-1} + r_t e_t' \qquad \text{and} \qquad n_t = n_{t-1} + 1, \qquad (2.4)

which, for uninformative prior degrees of freedom n_0 \approx 0 and a bounded S_0, reduces to

S_t = \frac{1}{t} \sum_{j=1}^{t} r_j e_j'.

The covariance matrix C_t is updated by

C_t = \frac{1}{\delta} \left( I_d - \frac{C_{t-1} F_t F_t'}{\delta + F_t' C_{t-1} F_t} \right) C_{t-1}, \qquad (2.5)

where I_d is the d \times d identity matrix. Conditional on \Sigma = S_t and y^t, one can easily derive the posterior distribution of \Theta_t and the k-step ahead predictive distribution of y_{t+k}, i.e.

\Theta_t | \Sigma = S_t, y^t \sim N_{d \times p}(m_t, C_t, S_t) \qquad \text{and} \qquad y_{t+1} | \Sigma = S_t, y^t \sim N_{p \times 1}\big(m_t' F_t, \, (F_t' C_{t-1} F_t/\delta + 1) S_t\big).

Full details and derivation of these results appear in Triantafyllopoulos and Pikoulas (2002).
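To make the recursions concrete, the following is a minimal NumPy sketch of one DWR update, implementing equations (2.3), (2.5) and (2.4) for a single time step; the function name dwr_step and the array layout are our own illustrative choices, not part of the original development.

```python
import numpy as np

def dwr_step(y, F, m, C, S, n, delta):
    """One DWR update: equations (2.3), (2.5) and (2.4).

    y : (p,) observation, F : (d,) design vector, m : (d, p) state estimate,
    C : (d, d) scale matrix, S : (p, p) noise covariance estimate, n : degrees of freedom.
    """
    e = y - m.T @ F                               # one-step forecast error e_t
    q = F @ C @ F                                 # F_t' C_{t-1} F_t
    A = C @ F / (delta + q)                       # adaptive vector A_t = C_t F_t
    C_new = (np.eye(len(F)) - np.outer(C @ F, F) / (delta + q)) @ C / delta   # (2.5)
    m_new = m + np.outer(A, e)                    # (2.3)
    r = y - m_new.T @ F                           # r_t = y_t - m_t' F_t
    S_new = (n * S + np.outer(r, e)) / (n + 1.0)  # (2.4)
    return m_new, C_new, S_new, n + 1.0
```

Iterating dwr_step over t = 1, ..., T from some starting values m_0, C_0, S_0, n_0 reproduces the sequences m_t, C_t and S_t described above.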

2.2 Relationship with matrix normal dynamic models

Model (2.1a) and (2.1b) corresponds to the matrix normal dynamic linear models (DLMs) as described in West and Harrison (1997, p. 597). The matrix normal DLMs were developed independently in Harvey (1986) and Quintana and West (1987) and have been further explored in Salvador et al. (2003) and Salvador and Gargallo (2004). The matrix DLM is

defined by

y_t' = F_t' \Theta_t + \nu_t', \qquad \nu_t | \Sigma \sim N_{p \times 1}(0, \Sigma), \qquad \text{(observation equation)} \qquad (2.6a)

\Theta_t = G_t \Theta_{t-1} + \Omega_t, \qquad \Omega_t | \Sigma \sim N_{d \times p}(0, W_t, \Sigma), \qquad \text{(transition equation)} \qquad (2.6b)

where Gt is a d× d transition matrix and the remaining model components are as defined

in the model (2.1a) and (2.1b). It is further assumed that in addition to the prior (2.2),


a prior inverted Wishart distribution is assumed for the covariance matrix \Sigma, namely \Sigma \sim IW_p(n_0 + 2p, n_0 S_0), with density

p(\Sigma) = c(n_0)\, n_0^{(n_0+p-1)/2} \det(S_0)^{(n_0+p-1)/2} \det(\Sigma)^{-(p + n_0/2)} \exp\{-n_0\, \mathrm{trace}(S_0 \Sigma^{-1})/2\} \qquad (2.7)

and

c(n_0) = \left\{ 2^{p(n_0+p-1)/2}\, \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left( \frac{n_0 + p - j}{2} \right) \right\}^{-1},

where n0 are the prior degrees of freedom and S0 is the prior variance estimate. Then

inference and forecasting for model (2.6a) and (2.6b) proceed via an algorithm similar to the Kalman filter. This algorithm is fully Bayesian and is summarized as

follows. For any time t > 0, conditional on \Sigma, the posterior distribution of \Theta_t is \Theta_t | \Sigma, y^t \sim N_{d \times p}(m_t, C_t, \Sigma) and the one-step predictive distribution is y_{t+1} | \Sigma, y^t \sim N_{p \times 1}\big(m_t' F_t, \, (F_t' R_t F_t + 1)\Sigma\big), where

m_t = G_t m_{t-1} + A_t e_t' \qquad \text{and} \qquad C_t = R_t - (F_t' R_t F_t + 1) A_t A_t', \qquad (2.8)

for A_t = R_t F_t / (F_t' R_t F_t + 1), R_t = G_t C_{t-1} G_t' + W_t and m_0, C_0 taken from the prior (2.2). The distribution of \Sigma given y^t is the inverted Wishart \Sigma | y^t \sim IW_p(n_t + 2p, n_t S_t),

where

n_t S_t = n_{t-1} S_{t-1} + \frac{e_t e_t'}{F_t' R_t F_t + 1} \qquad \text{and} \qquad n_t = n_{t-1} + 1. \qquad (2.9)

Unconditionally of \Sigma, the posterior distribution of \Theta_t and the predictive distribution of y_{t+1} are, respectively, matrix-variate and multivariate Student t distributions.
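For comparison with the DWR recursions, here is a minimal NumPy sketch of one update of the matrix DLM, implementing equations (2.8) and (2.9); the name dlm_step and its interface are ours, not the paper's.

```python
import numpy as np

def dlm_step(y, F, G, m, C, S, n, W):
    """One matrix DLM update: equations (2.8) and (2.9).

    y : (p,), F : (d,), G, C, W : (d, d), m : (d, p), S : (p, p), n : degrees of freedom.
    """
    a = G @ m                         # prior mean of Theta_t
    R = G @ C @ G.T + W               # R_t = G_t C_{t-1} G_t' + W_t
    q = F @ R @ F + 1.0               # F_t' R_t F_t + 1
    A = R @ F / q                     # adaptive vector A_t
    e = y - a.T @ F                   # one-step forecast error e_t
    m_new = a + np.outer(A, e)        # (2.8)
    C_new = R - q * np.outer(A, A)    # (2.8)
    S_new = (n * S + np.outer(e, e) / q) / (n + 1.0)   # (2.9)
    return m_new, C_new, S_new, n + 1.0
```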

Looking now at models (2.1a), (2.1b) and (2.6a), (2.6b), we see that if we set Gt = Id

and Wt = (1 − δ)Ct−1/δ, the two models coincide. We can see that the estimates mt, Ct

and St produced by each model are essentially the same. First note that the estimates mt

in equations (2.3) and (2.8) are the same. From equation (2.8) with Gt = Id we have

C_t = \frac{C_{t-1}}{\delta} - \frac{C_{t-1} F_t F_t' C_{t-1}}{\delta^2 (F_t' C_{t-1} F_t/\delta + 1)} = \frac{1}{\delta} \left( I_d - \frac{C_{t-1} F_t F_t'}{\delta + F_t' C_{t-1} F_t} \right) C_{t-1},


which is equation (2.5). From the definition of rt = yt −m′tFt and from the recursion of

mt, we have

r_t = y_t - m_{t-1}' F_t - e_t A_t' F_t = (1 - A_t' F_t) e_t.

Also,

1 - A_t' F_t = 1 - \frac{F_t' C_{t-1} F_t}{\delta (F_t' C_{t-1} F_t/\delta + 1)} = \frac{1}{F_t' C_{t-1} F_t/\delta + 1}.

Then from equation (2.9)

n_t S_t = n_{t-1} S_{t-1} + \frac{e_t e_t'}{F_t' C_{t-1} F_t/\delta + 1} = n_{t-1} S_{t-1} + (1 - A_t' F_t) e_t e_t' = n_{t-1} S_{t-1} + r_t e_t',

which is equation (2.4).

The above shows that the DWR is equivalent to the matrix normal DLM with identity transition matrix. However, we need to note that the DWR does not make any assumption about the distribution of \Sigma, while the DLM assumes an inverted Wishart prior.
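The equivalence can also be checked numerically. The sketch below (reusing the illustrative dwr_step and dlm_step functions above, with synthetic data) runs both recursions with G_t = I_d and W_t = (1 − δ)C_{t−1}/δ and asserts that the resulting estimates agree.

```python
import numpy as np

rng = np.random.default_rng(1)
d, p, delta, T = 3, 2, 0.8, 100
m1 = np.zeros((d, p)); C1 = np.eye(d); S1 = np.eye(p); n1 = 1.0
m2, C2, S2, n2 = m1.copy(), C1.copy(), S1.copy(), n1

for _ in range(T):
    F, y = rng.normal(size=d), rng.normal(size=p)   # arbitrary design and data
    m1, C1, S1, n1 = dwr_step(y, F, m1, C1, S1, n1, delta)
    W = (1.0 - delta) * C2 / delta                  # discounted system variance
    m2, C2, S2, n2 = dlm_step(y, F, np.eye(d), m2, C2, S2, n2, W)

assert np.allclose(m1, m2) and np.allclose(C1, C2) and np.allclose(S1, S2)
```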

2.3 The invertibility assumption of Xt

The DWR development is based on the assumption of the non-singularity of the matrix X_t. However, in some practical situations X_t is singular, e.g. when F_t = F is constant over time. It is evident that we can make use of the Moore-Penrose generalized inverse to overcome this difficulty. In particular, let X_t^+ denote the Moore-Penrose inverse of X_t, which always exists and is unique (Harville, 2000, ch. 20). The next result states that, under a condition, we can still obtain the same recurrences for m_t and C_t as in section 2.1.

Theorem 1. Consider the DWR, defined by equations (2.1a) and (2.1b). With X_t as defined in section 2.1, let X_t^+ denote the Moore-Penrose inverse of X_t and write m_t = X_t^+ H_t, A_t = X_t^+ F_t and C_t = X_t^+, where X_t and H_t are as defined in section 2.1. If the following condition holds

\mathrm{rank}([X_{t-1}^+ : X_t^+]) = \mathrm{rank}(X_{t-1}) = \mathrm{rank}(X_t), \qquad (2.10)


where [X_{t-1}^+ : X_t^+] denotes the extended matrix of X_{t-1}^+ and X_t^+, then the recursive forms of m_t and C_t are given as in equations (2.3) and (2.5).

The proof of the recurrence of St does not make use of the matrix Xt and so the

updating of equation (2.4) holds, regardless of the singularity of Xt.
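As an illustration of how the Moore-Penrose inverse can be used in practice, the sketch below computes m_t = X_t^+ H_t with numpy.linalg.pinv and checks condition (2.10) numerically via matrix ranks; dwr_batch is a hypothetical name of ours and is not part of the paper.

```python
import numpy as np

def dwr_batch(Y, Fs, delta):
    """Batch DWR estimate via the Moore-Penrose inverse, usable when X_t is
    singular (e.g. a constant design vector F_t = F).
    Y : (T, p) observations, Fs : (T, d) design vectors."""
    d, p = Fs.shape[1], Y.shape[1]
    X, H = np.zeros((d, d)), np.zeros((d, p))
    Xp_prev, condition_ok = np.zeros((d, d)), True
    for t in range(Fs.shape[0]):
        X = delta * X + np.outer(Fs[t], Fs[t])      # X_t = delta X_{t-1} + F_t F_t'
        H = delta * H + np.outer(Fs[t], Y[t])       # H_t = delta H_{t-1} + F_t y_t'
        Xp = np.linalg.pinv(X)                      # X_t^+, always exists
        if t > 0:                                   # numerical check of condition (2.10)
            rank = np.linalg.matrix_rank
            ok = rank(np.hstack([Xp_prev, Xp])) == rank(X) == rank(Xp_prev)
            condition_ok = condition_ok and bool(ok)
        Xp_prev = Xp
    return Xp @ H, Xp, condition_ok                 # m_T = X_T^+ H_T, C_T = X_T^+
```

For a constant design F_t = F, as in the local level case below, X_t has rank one throughout and the condition holds, so the recursive forms (2.3) and (2.5) remain valid.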

2.4 Local level models

The multivariate local level model is defined by

y_t = \psi_t + \nu_t, \qquad \nu_t | \Sigma \sim N_{p \times 1}(0, \Sigma), \qquad \text{(observation equation)} \qquad (2.11a)

\psi_t = \psi_{t-1} + \omega_t, \qquad \omega_t | V_t \sim N_{p \times 1}(0, V_t), \qquad \text{(transition equation)} \qquad (2.11b)

where yt is a p-dimensional vector of observations, ψt is a p-vector of states, Σ and Vt are

the p × p noise covariance matrices. Typically, the covariance matrices will be unknown and we seek a fast algorithm which allows estimation of these covariance matrices as well as efficient forecasting of y_t, for t = 1, ..., T, for a positive integer T > 0. When p = 1 and V_t = V is constant over time, the analysis of the local level

model follows by considering the signal-to-noise ratio q = V/Σ (West and Harrison, 1997, ch. 2; Franco and Souza, 2002). Enns et al. (1982) and Machak et al. (1983) suggest using the signal-to-noise ratio for p > 1 and these authors provide some details on the scalar function q(·) with V = qΣ. Harvey (1986) provides an estimator for Σ, based on maximum

likelihood estimation of q, but that method cannot be used for online estimation, since each time a new estimate of Σ is required a new estimate of q is needed, and clearly this cannot be applied for every consecutive t > 0. In addition, the log-likelihood function used to estimate q, and hence Σ, is a complicated non-linear function of q, which depends on some reference priors whose choice has not been discussed. We believe that these estimation problems have limited the widespread use of the above estimation method.

In this section we assume that the transition covariance matrix V_t depends on time t and so we define the time-varying signal-to-noise ratio as q_t I_p = V_t Σ^{-1}, where it is assumed that the variance Σ is strictly positive definite. Then in the transition equation (2.11b) we can write ω_t | Σ ∼ N_{p×1}(0, q_t Σ). This may be seen as a restriction of the model, although we should note that every conjugate state space model (univariate or multivariate) that is able to model the observation variance or covariance matrix uses this setting for V_t.

West and Harrison (1997, p. 109) state that no practical problem arises if the transition

covariance matrix is scaled by the observation covariance matrix. The next result shows

that local level models can be seen as special cases of DWR.

Theorem 2. Consider the DWR of equations (2.1a) and (2.1b) and suppose that Ft =

F = [1 0 · · · 0]′, for all t ≥ 1. Then the DWR reduces to the local level model of equations

(2.11a) and (2.11b) with Vt = qtΣ, where qt > 0 is an appropriate scalar and Σ the p× p

covariance matrix of the observation innovations νt. In addition, let the prior distribution

of ψ0 be

\psi_0 | \Sigma = S_0^* \sim N_{p \times 1}(m_0^*, C_0^* S_0^*),

for some known m_0^*, C_0^* and S_0^*. Then, conditional on \Sigma = S_t^*, the posterior distribution

of ψt is

\psi_t | \Sigma = S_t^*, y^t \sim N_{p \times 1}(m_t^*, C_t^* S_t^*),

with

m_t^* = m_{t-1}^* + A_t^* e_t^*, \qquad A_t^* = \frac{C_{t-1}^*}{\delta + C_{t-1}^*}, \qquad C_t^* = \frac{1}{\delta + C_{t-1}^*}, \qquad n_t S_t^* = n_{t-1} S_{t-1}^* + r_t^* e_t^{*\prime},

where e_t^* = y_t - m_{t-1}^*, r_t^* = y_t - m_t^*, n_t = n_{t-1} + 1 and 0 < \delta \le 1.

For any positive integer k, the k-step predictive distribution follows from Triantafyllopoulos and Pikoulas (2002) and from Theorem 2 as

y_{t+k} | \Sigma = S_t^*, y^t \sim N_{p \times 1}\big(y_t(k), \, R_t(k) S_t^*\big),

where

y_t(k) = m_t^*, \qquad R_t(k) = \frac{k(1-\delta) + \delta}{\delta}\, C_t^* + 1.


We note that y_t(k) does not depend on the lead time k, sometimes referred to as the forecast horizon. This fact is well known and is characteristic of local level models. However, the covariance matrix R_t(k) does depend on k; in fact R_t(k) is increasing in k. This is expected, since the longer the forecast horizon the larger the variance. We also note that in the static model, where δ = 1, we have V_t = 0 and R_t(k) = C_t^* + 1 (all uncertainty comes from Σ), while in the other extreme (totally unstable model), where δ = 0, we have V_t = R_t(k) = ∞. Of course both of these cases are to be avoided. In practice discount factors in the range [0.1, 0.99] should be used.
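The scalar recursions of Theorem 2 are easy to code directly. The sketch below is a minimal NumPy implementation (the names and the returned quantities are our own choices); it also records the one-step predictive standard deviations, which are used for the outlier analysis of section 3.3, and implements the k-step scale R_t(k).

```python
import numpy as np

def local_level_filter(X, m0, c0, S0, n0, delta):
    """Run the Theorem 2 recursions on a (T, p) series X.

    Returns one-step forecasts, forecast errors, one-step predictive standard
    deviations (component-wise) and the final m*, C*, S* estimates."""
    m, c = np.array(m0, dtype=float), float(c0)
    S, n = np.array(S0, dtype=float), float(n0)
    fc, err, sd = [], [], []
    for x in X:
        fc.append(m.copy())
        sd.append(np.sqrt((c / delta + 1.0) * np.diag(S)))  # sqrt of diag(R_{t-1}(1) S*_{t-1})
        e = x - m                        # e*_t = y_t - m*_{t-1}
        err.append(e)
        A = c / (delta + c)              # A*_t
        c = 1.0 / (delta + c)            # C*_t, as stated in Theorem 2
        m = m + A * e                    # m*_t
        r = x - m                        # r*_t
        S = (n * S + np.outer(r, e)) / (n + 1.0)            # S*_t
        n += 1.0
    return np.array(fc), np.array(err), np.array(sd), m, c, S

def k_step_scale(c, delta, k):
    """R_t(k), so that y_{t+k} | y^t has covariance R_t(k) S*_t."""
    return (k * (1.0 - delta) + delta) / delta * c + 1.0
```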

3 The London metal exchange data

3.1 Description of the LME data

The London metal exchange (LME) is the world’s premier non-ferrous metals market, with

highly liquid contracts. Its trading customers may be metal industries or individuals (sellers

or buyers). LMEX, the London metal exchange index, is a base metals index comprising

the six primary non-ferrous metals traded on the exchange: aluminium, copper grade A,

standard lead, primary nickel, tin, and zinc. More details about the LME may be found

on its web site: http://www.lme.co.uk.

Here we examine official prices for aluminium, which is the most important metal traded on the exchange. There are two categories of prices which appear in the market:

the ask price and the bid price. The ask price refers to the price quoted for the amount of aluminium required by each customer or trading company. The bid price refers to the price that the exchange suggests to the customer companies; its evaluation is based on complicated models, which are not widely available. After consultation with

Aluminium of Greece S.A.I.C., we found out that there was increasing interest in predicting

the ask prices. Therefore, in this study we concentrate on the ask prices.

Figure 1: London metal exchange aluminium closing prices y_t = [y_{1t} y_{2t} y_{3t} y_{4t}]', plotted against time.

The data are collected for every trading day from March 2000 to February 2001, and are plotted in

Figure 1. After excluding week-ends and bank holidays, there are T = 246 trading days.

There are four variables of interest, y_{1t}: spot ask price, y_{2t}: 3 months future contract ask price, y_{3t}: 15 months future contract ask price and y_{4t}: 27 months future contract ask price, where each price is in US dollars per tonne of aluminium. These four variables are

summarized in the vector time series yt = [y1t y2t y3t y4t]′, t = 1, . . . , 246. The data have

been kindly provided by Aluminium of Greece S.A.I.C., a member of the Pechiney Group

(http://www.pechiney.com/).

Over the last 10 years there has been a noticeable interest in modelling the LME market.

Slade (1988) and Meyer (1994) discuss the problems of pricing and hedging in the non-

ferrous metal market. The efficiency of the LME is examined in Sephton and Cochrane

(1990), Agbeyegbe (1992) and Moore and Cullen (1995). The relationship of future and

spot prices and how the futures can be used to predict the spot prices are considered in

Gilbert (1997) and in Heaney (2002). The important subject of volatility for the LME

market is discussed by Hall (1991) and McKenzie et al. (2001). Panas (2001) suggests

long memory and chaos models to assess the metal prices when nonlinear structure is

evident. A good review of the London metal exchange literature can be found in Watkins

and McAleer (2004).

It is evident that modelling and forecasting LME data requires the adoption of efficient multivariate time series techniques. As Figure 1 indicates, the four variables of interest, y_{it}, are cross-correlated and therefore any sound modelling system should be able

to model this correlation. Although the economics literature has dealt with modelling

issues in the non-ferrous metal market, in particular in the LME market, we must submit

that an automatic forecasting procedure is currently unavailable. Triantafyllopoulos and

Montana (2004) have used standard Markov chain Monte Carlo techniques to model the

LME data, but they found that simulation was too slow and not appropriate for online

forecasting. In the next sections we use the local level procedure of section 2.4 in order to effectively model the LME data.

Figure 2: Histograms and normal QQ plots for the difference series x_t.

3.2 The model

In efficient markets (see, for instance, Sephton and Cochrane, 1990) a common practice for short-term commodity price forecasting is to take as the one-step forecast for time t just

the current observed value at time t − 1 (t = 2, . . . , 246). However, possible shocks can

be identified by modelling the difference yt − yt−1 instead of the actual series yt, and we

follow this approach here. We define the difference series xt = yt − yt−1 for 2 ≤ t ≤ 246.

Given yt−1, the difference xt = yt − yt−1 is expected to have zero mean (yt is predicted

as yt−1) and some unknown variance. Histograms and normal probability plots for the

4−dimensional time series xt are shown in Figure 2. Of course these are only marginal


histograms which do not show the sample cross correlation between the four time series.

However, such summary plots can validate the assumption of a normal distribution for the

first difference. A more detailed analysis involving multivariate normal distribution tests

can be done following the comprehensive review of Mecklin and Mundfrom (2004). Here,

we assume that given a state ψt and a variance matrix Σ, we have

xt|ψt, Σ ∼ N4×1(ψt, Σ),

where N4(·) denotes the 4-dimensional multivariate normal distribution.

The level of the series ψt may be considered constant, in which case we have a static

regression model with ψ1 = ψ2 = · · · = ψ246 = ψ. It seems more reasonable to assume that

ψ_t is expected to be static, but there is some uncertainty associated with it. This reasoning leads to the adoption of the local level model (2.11a) and (2.11b) for the time series x_t, where p = 4.
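Forming the difference series is a one-line operation. The sketch below uses a synthetic stand-in for the (246, 4) price array, since the LME data themselves are not reproduced here; the names Y and X are our own.

```python
import numpy as np

# Synthetic stand-in for the (246, 4) LME array Y = [y_1t, ..., y_4t]; replace
# with the real closing prices when available.
rng = np.random.default_rng(0)
Y = 1500.0 + np.cumsum(rng.normal(scale=10.0, size=(246, 4)), axis=0)

X = np.diff(Y, axis=0)                   # x_t = y_t - y_{t-1}, t = 2, ..., 246
print(X.mean(axis=0))                    # close to zero under the efficient-market view
print(np.corrcoef(X, rowvar=False))      # sample cross-correlations of the four series
```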

3.3 Statistical analysis

We apply Theorem 2 to provide one-step forecasts for the difference time series x_t = y_t − y_{t−1}, where y_t is the LME time series of Figure 1. We are also interested in

estimating the covariance matrix Σ and hence looking at the cross-correlation structure

of the four component time series x1t, x2t, x3t, x4t. Figure 3 plots the one-step forecasts

xi,t−1(1) of xit against the actual values xit, for i = 1 and i = 4 (spot and 27 months

contract component time series). We observe that the forecasts are generally good except for some outliers. These outliers can be identified by a simple time series plot of the standard forecast errors, shown in Figure 4. The standard errors are plotted against ±1.96 confidence limits, identifying 19 outliers for x_{1t}, 15 outliers for x_{2t}, 11 outliers for x_{3t} and 8 outliers for x_{4t}. Out of 246 total observations these figures give 7.7%, 6.1%, 4.5% and 3.2% outliers, respectively. For this kind of data we believe that the above

proportions are relatively low.

Figure 3: One-step forecasts for the difference spot and 27 months contract series x_{1t} and x_{4t}, where x_t = [x_{1t} x_{2t} x_{3t} x_{4t}]'. The solid line shows the actual values, while the dashed line shows the forecasts.

However, as the forecasts are produced, further analysis

can be done, e.g. the approach of Salvador and Gargallo (2004) for automatic intervention can be applied. We also note that, comparing the forecasts of the four series, the future contract series are predicted better, with x_{4t} having the best forecast accuracy, while the spot time series x_{1t} has the lowest forecast accuracy. This is expected since the spot

variable is more volatile than the future contracts and this can be verified by looking at

the range of the four variables.

For these results, the priors in Theorem 2 have been specified as m_0^* = [0 0 0 0]', C_0^* = 1/1000, S_0^* = 1000 I_4, n_0 = 1/1000 and δ = 0.7.

Figure 4: One-step forecast standard errors for the difference series x_{1t} and x_{4t}, where x_t = [x_{1t} x_{2t} x_{3t} x_{4t}]'. The solid lines show the standard errors and the dashed lines plot the 95% confidence intervals at ±1.96.

These priors have been chosen with care in order

to provide a non-informative Bayesian prior specification. This is reflected by the high values of the covariance matrix S_0^* and the low value of the degrees of freedom n_0. In order to judge and evaluate the correlation structure throughout the series we need to obtain good estimates of Σ. Running several simulations we have found that the best performance of the estimate S_t^* of Σ is obtained when n_0 S_0^* = I_p, and in this case n_0 S_0^* = I_4. The prior C_0^* has been chosen to be very low because, again, we wish the covariance matrix of ψ_0 | Σ = S_0^* to be equal to I_4. The prior m_0^* is set to zero, because the time series x_t fluctuates around zero, and a relatively low value of the discount factor δ is chosen so that shocks can be captured. Better forecast accuracy can be achieved if δ is chosen to be even lower, e.g. δ = 0.6. A sensitivity analysis for δ is given in section 3.4.
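The analysis of this section can be reproduced in outline with the local_level_filter sketch from section 2.4 applied to the difference array X of section 3.2 (both our own illustrative code); the priors below are the ones stated above, and the outlier counts come from flagging standardized one-step errors outside ±1.96.

```python
import numpy as np

# Priors as specified in this section; X is the (245, 4) difference array.
m0, c0 = np.zeros(4), 1.0 / 1000
S0, n0 = 1000.0 * np.eye(4), 1.0 / 1000
fc, err, sd, m, c, S = local_level_filter(X, m0, c0, S0, n0, delta=0.7)

z = err / sd                                  # standardized one-step forecast errors
counts = (np.abs(z) > 1.96).sum(axis=0)       # outliers per component series
print(counts, counts / len(X))                # counts and proportions
```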

As mentioned before, it is of particular modelling interest to estimate the covariance matrix Σ efficiently. The corresponding correlation matrix reflects the variability of each of the series x_{it} as well as the correlation among the four component time series. Figure 5 plots the diagonal elements of the covariance matrix S_t^* produced by Theorem 2. We observe that the variance estimates for the spot and 3 months contract time series are more volatile, while the remaining future contract time series are less volatile. Figure 6 shows the cross-correlation between the four scalar time series x_{it}. If CR = (ρ_{(ij)}) is the correlation matrix of x_t given ψ_t, then the solid line shows the estimate of ρ_{(12)}, the dashed line shows the estimate of ρ_{(23)}, the dotted line shows the estimate of ρ_{(34)} and the dashed/dotted line shows the estimate of ρ_{(14)}. Of course ρ_{(ii)} = 1, for i = 1, 2, 3, 4. We observe that the spot and the 3 months contract time series are much more correlated than the other future contracts. We also observe that the correlation of the spot and the 27 months contract time series is much smaller than the corresponding correlation of the spot and the 3 months contract time series.
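Figures 5 and 6 are read off the running estimate S_t^*; the sketch below shows the final-time version of the same computation (diagonal variances and the implied correlation matrix), using the S returned by the filter sketch above.

```python
import numpy as np

variances = np.diag(S)                    # estimates of sigma_(11), ..., sigma_(44)
d = np.sqrt(variances)
CR = S / np.outer(d, d)                   # correlation estimates rho_(ij), CR[i, i] = 1
print(np.round(variances, 1))
print(np.round(CR, 2))                    # e.g. CR[0, 1] is the spot / 3-months correlation
```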

Figure 5: Variance estimates of Σ = (σ_{(ij)}) (i, j = 1, 2, 3, 4), plotted against time. The solid line shows the estimate of σ_{(11)}; the dashed line shows the estimate of σ_{(22)}; the dotted line shows the estimate of σ_{(33)}; and the dashed/dotted line shows the estimate of σ_{(44)}.

Figure 6: Correlation estimates of the series x_t given ψ_t, plotted against time. If CR = (ρ_{(ij)}) (i, j = 1, 2, 3, 4) is the correlation matrix of x_t (the correlation matrix corresponding to the covariance matrix Σ), then the solid line shows the estimate of ρ_{(12)}, the dashed line shows the estimate of ρ_{(23)}, the dotted line shows the estimate of ρ_{(34)} and the dashed/dotted line shows the estimate of ρ_{(14)}.

Figure 7: The effect of the discount factor δ on the estimated state scalar variance C_t^*. The solid line shows C_t^* for δ = 0.1 and the dashed line shows C_t^* for δ = 0.7.

3.4 Sensitivity analysis of the discount factor δ

The design of a multivariate forecasting model, such as the one used above, requires the specification of the starting values m_0^*, n_0, C_0^*, S_0^* and δ. Specification of m_0^*, n_0, C_0^* and S_0^* follows from the discussion of section 3.3, but the sensitivity of δ is of particular interest.

Table 1: Effect of the discount factor δ on the sum of squared forecast errors (SSE).

 δ       x1t      x2t      x3t      x4t
 0.1     393.6    341.7    249.8    209.0
 0.2     424.3    320.2    255.6    223.8
 0.3     437.3    310.6    257.4    228.3
 0.4     444.5    306.5    258.3    230.4
 0.5     449.1    304.8    258.7    231.6
 0.6     452.3    304.2    258.8    232.3
 0.7     454.5    304.2    258.9    232.7
 0.8     456.1    304.6    258.8    233.0
 0.9     457.2    305.1    258.7    233.1
 0.99    458.0    305.5    258.6    233.2

Many authors have argued (e.g. West and Harrison, 1997) that in local level models, and generally in state space models, the discount factor δ should be in the range [0.8, 0.99], because otherwise the system variance W_t takes very large values, yielding an unstable model with very large values in the estimated state covariance matrix. This argument may be generally valid; however, we

note that a high discount factor will not allow us to predict accurately the shocks in time series such as those of section 3. The estimated covariance matrix of ψ_t is C_t^* S_t^*, where C_t^* and S_t^* are calculated from Theorem 2. From C_t^* = 1/(δ + C_{t-1}^*) it is clear that the real sequence C_t^* is neither increasing nor decreasing, but its limit exists and it is trivial to see that it is

C^* = \lim_{t \to \infty} C_t^* = \frac{\sqrt{\delta^2 + 4} - \delta}{2}.

Figure 7 plots the variance C_t^* for δ = 0.1 and δ = 0.7. We see that for different values of δ the difference in C_t^* is not very big, with limiting values C^* = 0.95 (δ = 0.1) and C^* = 0.71 (δ = 0.7). This means that δ does not have a dramatic effect on the covariance matrix of ψ_t; this is an advantage, since it gives more flexibility to the modeller.
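The limiting value quoted above can be checked numerically: iterating C_t^* = 1/(δ + C_{t−1}^*) from a small starting value converges to (√(δ² + 4) − δ)/2, giving roughly 0.95 for δ = 0.1 and 0.71 for δ = 0.7.

```python
import numpy as np

for delta in (0.1, 0.7):
    c = 1.0 / 1000                              # prior C*_0 from section 3.3
    for _ in range(500):
        c = 1.0 / (delta + c)                   # Theorem 2 recursion for C*_t
    limit = (np.sqrt(delta ** 2 + 4.0) - delta) / 2.0
    print(delta, round(c, 3), round(limit, 3))  # ~ (0.1, 0.951, 0.951), (0.7, 0.710, 0.710)
```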

In order to better judge the effect of δ on the forecasts, we have looked at the change in the sum of squared forecast errors (SSE) for several values of δ throughout the range [0.1, 0.99]. Table 1 shows that small values of δ give smaller SSE for the variables x_{1t}, x_{3t}, x_{4t}, but for x_{2t} the lowest SSE is achieved for δ = 0.6 and δ = 0.7. This is why in section 3.3 we have used δ = 0.7, which we consider low enough to give good forecast accuracy. Although Table 1 suggests that δ = 0.1 gives the smallest SSE for x_{1t}, x_{3t}, x_{4t}, we noticed only a slight forecast improvement using δ = 0.1 as far as the outlier proportion is concerned. Table 1 also suggests that perhaps two different discount factors should be used, i.e. δ_1 = 0.1 for the time series x_{1t}, x_{3t}, x_{4t} and δ_2 = 0.7 for x_{2t}.
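The sensitivity exercise behind Table 1 can be sketched as below, reusing the local_level_filter and the array X from the earlier sketches; the exact scaling of the SSE in Table 1 is not stated in the text, so raw one-step forecast errors are used here and the numbers are only indicative.

```python
import numpy as np

for delta in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99):
    fc, err, sd, m, c, S = local_level_filter(
        X, np.zeros(4), 1.0 / 1000, 1000.0 * np.eye(4), 1.0 / 1000, delta)
    sse = (err ** 2).sum(axis=0)          # sum of squared one-step errors per series
    print(delta, np.round(sse, 1))
```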

4 Conclusions

This paper proposes the application of the discount weighted regression (DWR) of Triantafyllopoulos and Pikoulas (2002) in order to model multivariate time series. DWR models are appropriately modified to accommodate multivariate local level models. The noise covariance matrices are left unspecified and are estimated with a maximum likelihood procedure, while the overall model exhibits maximum likelihood and weighted least squares optimalities. The proposed methodology is applied to London metal exchange time series data. Estimation and forecasting are performed with a fast and efficient online algorithm and the cross-correlation of the component series is discussed in some detail. The starting values of the algorithm as well as the discount factor are discussed, giving practical guidelines which can be generally applied. A simple sensitivity analysis for the discount factor, based on the sum of squared forecast errors, suggests employing several discount factors in the model, and it is expected that future research will be devoted to this direction.

Acknowledgements

We thank Spiros Ikonomakos and Angelique Papageorgiou from the Commercial Department of Aluminium of Greece S.A.I.C. for providing the LME data. Special thanks are due to Giovanni Montana, who helped with the computational part of the paper. The work of the author was supported by grant NAL/00642/G of the Nuffield Foundation.

5 Appendix

Proof of Theorem 1. The condition (2.10) implies that both linear systems X_{t-1}^+ X = X_t^+ (in X) and X_t^+ X = X_{t-1}^+ (in X) are consistent and so (Harville, 1997, p. 120) we have X_{t-1}^+ X_{t-1} X_t^+ = X_t^+ and X_t^+ X_t X_{t-1}^+ = X_{t-1}^+. Now see that

X_{t-1}^+ - \delta X_t^+ = X_t^+ (X_t - \delta X_{t-1}) X_{t-1}^+ = X_t^+ F_t F_t' X_{t-1}^+. \qquad (5.1)

Then using m_t = X_t^+ H_t we get

m_t - m_{t-1} = X_t^+ H_t - X_{t-1}^+ H_{t-1} = X_t^+ (\delta H_{t-1} + F_t y_t') - X_{t-1}^+ H_{t-1}
= (\delta X_t^+ - X_{t-1}^+) H_{t-1} + X_t^+ F_t y_t' = X_t^+ F_t y_t' - X_t^+ F_t F_t' X_{t-1}^+ H_{t-1}
= X_t^+ F_t (y_t' - F_t' X_{t-1}^+ H_{t-1}) = X_t^+ F_t (y_t' - F_t' m_{t-1}) = A_t e_t'.

From equation (5.1), the covariance matrix C_t = X_t^+ satisfies

C_{t-1} = C_t (\delta I_d + F_t F_t' C_{t-1}) \;\Rightarrow\; C_t F_t = (\delta + F_t' C_{t-1} F_t)^{-1} C_{t-1} F_t \;\Rightarrow\;
C_{t-1} - \delta C_t = (\delta + F_t' C_{t-1} F_t)^{-1} C_{t-1} F_t F_t' C_{t-1} \;\Rightarrow\; C_t = \frac{1}{\delta} \left( I_d - \frac{C_{t-1} F_t F_t'}{\delta + F_t' C_{t-1} F_t} \right) C_{t-1}

and the proof is complete.

Proof of Theorem 2. With F_t = F = [1\ 0\ \cdots\ 0]' and the definition of X_t, we have

X_t = \frac{1 - \delta^t}{1 - \delta}\, \mathrm{diag}(1, 0, \ldots, 0)

and so we have rank(X_t) = rank(X_{t-1}) = 1. The Moore-Penrose inverse of X_t is

X_t^+ = \frac{1 - \delta}{1 - \delta^t}\, \mathrm{diag}(1, 0, \ldots, 0)


and the row space of the extended matrix [X_{t-1}^+ : X_t^+] is

\mathcal{R}([X_{t-1}^+ : X_t^+]) = \lambda \left[ \frac{1-\delta}{1-\delta^{t-1}}\ 0 \cdots 0\ \ \frac{1-\delta}{1-\delta^t}\ 0 \cdots 0 \right]',

for any \lambda \in \mathbb{R}. Hence

\mathrm{rank}([X_{t-1}^+ : X_t^+]) = \dim \mathcal{R}([X_{t-1}^+ : X_t^+]) = 1 = \mathrm{rank}(X_t) = \mathrm{rank}(X_{t-1})

and so assumption (2.10) holds. Thus from Theorem 1, the estimates mt, Ct and St are

updated as in equations (2.3), (2.5) and (2.4), respectively.

With the transformation

\psi_t = \Theta_t' F \qquad \text{and} \qquad \omega_t = \Omega_t' F,

model (2.1a) and (2.1b) reduces to the local level model (2.11a) and (2.11b) with V_t = q_t \Sigma and q_t = F' W_t F.

With m_t, C_t and S_t as in equations (2.3), (2.5) and (2.4), define m_t^* = m_t' F, A_t^* = F' A_t and C_t^* S_t^* = \mathrm{var}(\psi_t | \Sigma = S_t^*, y^t). Then we have

e_t = y_t - m_{t-1}' F = y_t - m_{t-1}^* = e_t^*, \qquad r_t = y_t - m_t' F = y_t - m_t^* = r_t^*,

m_t = m_{t-1} + A_t e_t' \;\Longrightarrow\; m_t^* = m_{t-1}^* + A_t^* e_t^*.

Since e_t^* = e_t and r_t^* = r_t, the estimate S_t^* = S_t, with S_t as in equation (2.4), providing the required updating recursion for S_t^*.

Also,

\mathrm{var}(\psi_t | \Sigma = S_t^*, y^t) = \mathrm{var}\{\mathrm{vec}(F' \Theta_t) | \Sigma = S_t, y^t\}
= (I_p \otimes F')\, \mathrm{var}\{\mathrm{vec}(\Theta_t) | \Sigma = S_t, y^t\}\, (I_p \otimes F)
= F' C_t F\, S_t^* = C_t^* S_t^*,

where C_t is updated as in equation (2.5). From C_t^* = F' C_t F, it follows that

A_t^* = F' A_t = \frac{F' C_{t-1} F}{F' C_{t-1} F + \delta} = \frac{C_{t-1}^*}{\delta + C_{t-1}^*}.


By multiplying equation (2.5) on the left and right by F' and F, respectively, we obtain

C_t^* = \frac{1}{\delta} \left( 1 - \frac{C_{t-1}^*}{\delta + C_{t-1}^*} \right) C_{t-1}^* = \frac{1}{\delta + C_{t-1}^*}.

The proof is complete by stating the prior

\psi_0 | \Sigma = S_0^* \sim N_{p \times 1}(m_0^*, C_0^* S_0^*),

where m_0^* = m_0' F.

References

[1] Agbeyegbe T.D. (1992). Common stochastic trends: evidence from the London Metal

Exchange. Bulletin of Economic Research, 44, 141-151.

[2] Doucet, A., de Freitas, N. and Gordon, N.J. (2001) Sequential Monte Carlo Methods

in Practice. Springer-Verlag, New York.

[3] Durbin, J. and Koopman, S.J. (2001) Time Series Analysis by State Space Methods.

Oxford University Press, Oxford.

[4] Enns, P.G., Machack, J.A., Spivey, W.A. and Wrobleski, W.J. (1982) Forecasting ap-

plications of an adaptive multiple exponential smoothing model. Management Science,

28, 1035-1044.

[5] Franco, G.C. and Souza, R.C. (2002) A comparison of methods for bootstrapping in

the local level model. Journal of Forecasting, 21, 27-38.

[6] Gilbert C.L. (1997). Manipulation of metals futures: lessons from Sumitomo. Centre

for Economic Policy Research, 26, Discussion Paper: 1537.

[7] Gupta, A.K. and Nagar, D.K. (1999) Matrix Variate Distributions. Chapman and

Hall/CRC Monographs and Surveys in Pure and Applied Mathematics 104, New York.


[8] Hall S.G. (1991). An Application of the stochastic GARCH-in-mean model to risk pre-

mia in the London Metal Exchange. Manchester School of Economic and Social Studies

(Supplement), 59, 57-71.

[9] Hannan, E.J. (1970) Multiple Time Series. New York: Wiley.

[10] Harrison, P.J. and Stevens, S. (1976) Bayesian forecasting (with discussion). Journal

of the Royal Statistical Society (Series B), 38, 205-247.

[11] Harvey, A.C. (1986) Analysis and generalisation of a multivariate exponential smooth-

ing model. Management Science, 32, 374-380.

[12] Harvey, A.C. (1989) Forecasting, Structural Time Series Models and the Kalman Fil-

ter. Cambridge University Press, Cambridge.

[13] Heaney R. (2002). Does knowledge of the cost model improve commodity futures

price forecasting ability? A case study using the London Metal Exchange lead contract.

International Journal of Forecasting, 18, 45-65.

[14] Lutkepohl, H. (1993) Introduction to Multiple Time Series Analysis. Springer-Verlag,

Berlin.

[15] Machak, J.A., Spivey, W.A. and Wrobleski, W.J. (1983) Analyzing permanent and

transitory influences in multiple time series models. Journal of Business and Economic

Statistics, 1, 57-65.

[16] McKenzie M. Michell H. Brooks R.D. and Faff R.W. (2001) Power ARCH modelling of

commodity futures data on the London Metal Exchange. European Journal of Finance,

7, 22-38.

[17] Mecklin, C.J. and Mundfrom, D.J. (2004) An appraisal and bibliography of tests for

multivariate normality. International Statistical Review, 72, 123-138.


[18] Meyer T.O. (1994). The difficulty in cross-hedging London Metal Exchange spot-price

risk using U.S. metal and British pound futures. Journal of Multinational Financial

Management, 4, 141-153.

[19] Moore M.J. and Cullen U. (1995). Speculative efficiency on the London Metal Ex-

change. Manchester School of Economic and Social Studies, 63, 235-256.

[20] Panas E. (2001). Long memory and chaotic models of prices on the London Metal

Exchange. Resources Policy, 27, 235-246.

[21] Quintana, J.M. and West, M. (1987). An analysis of international exchange rates using

multivariate DLMs. The Statistician, 36, 275-281.

[22] Salvador, M. and Gargallo, P. (2004) Automatic monitoring and intervention in multi-

variate dynamic linear models. Computational Statistics and Data Analysis, 47, 401-431.

[23] Salvador, M., Gallizo, J.L. and Gargallo, P. (2003) A dynamic principal components

analysis based on multivariate matrix normal dynamic linear models. Journal of Fore-

casting, 22, 457-478.

[24] Sephton P.S. and Cochrane D.K. (1990). A note on the efficiency of the London Metal

Exchange. Economics Letters, 33, 341-345.

[25] Shephard, N. (1993) Distribution of the ML estimator of an MA(1) and a local level

model. Econometric Theory, 9, 377-401.

[26] Shephard, N. and Harvey, A.C. (1990) On the probability of estimating a deterministic

component in the local level model. Journal of Time Series Analysis, 11, 339-347.

[27] Slade M.E. (1988). Pricing of Metals. CRS Monograph series 22, Queens University

Centre for Resource Studies, Kingston Ontario.


[28] Triantafyllopoulos, K. and Montana, G. (2004) Forecasting the London metal ex-

change with a dynamic model. In J. Antoch (ed.) Proceedings in Computational Statistics

2004, Physica-Verlag, Heidelberg, 1885-1892.

[29] Triantafyllopoulos, K. and Pikoulas, J. (2002) Multivariate regression applied to the

problem of network security. Journal of Forecasting, 21, 579-594.

[30] Watkins, C. and McAleer, M. (2004) Econometric modelling of non-ferrous metal

prices. Journal of Economic Surveys (to appear).

[31] West, M. and Harrison, P.J. (1997) Bayesian Forecasting and Dynamic Models. 2nd

edition. Springer Verlag, New York.

[32] Whittle, P. (1963) Prediction and Regulation. English Universities Press, London.
