

Multivariate Time Series

Consider $n$ time series variables $\{y_{1t}\}, \dots, \{y_{nt}\}$. A multivariate time series is the $(n \times 1)$ vector time series $\{Y_t\}$ where the $i$th row of $\{Y_t\}$ is $\{y_{it}\}$. That is, for any time $t$, $Y_t = (y_{1t}, \dots, y_{nt})'$.

Multivariate time series analysis is used when one wants to model and explain the interactions and co-movements among a group of time series variables:

• Consumption and income

• Stock prices and dividends

• Forward and spot exchange rates

• Interest rates, money growth, income, and inflation


Stock and Watson state that macroeconometricians do four things with multivariate time series:

1. Describe and summarize macroeconomic data

2. Make macroeconomic forecasts

3. Quantify what we do or do not know about the true structure of the macroeconomy

4. Advise macroeconomic policymakers


Stationary and Ergodic Multivariate Time Series

A multivariate time series $Y_t$ is covariance stationary and ergodic if all of its component time series are stationary and ergodic.

$$E[Y_t] = \mu = (\mu_1, \dots, \mu_n)'$$

$$\mathrm{var}(Y_t) = \Gamma_0 = E[(Y_t - \mu)(Y_t - \mu)'] =
\begin{pmatrix}
\mathrm{var}(y_{1t}) & \mathrm{cov}(y_{1t}, y_{2t}) & \cdots & \mathrm{cov}(y_{1t}, y_{nt}) \\
\mathrm{cov}(y_{2t}, y_{1t}) & \mathrm{var}(y_{2t}) & \cdots & \mathrm{cov}(y_{2t}, y_{nt}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(y_{nt}, y_{1t}) & \mathrm{cov}(y_{nt}, y_{2t}) & \cdots & \mathrm{var}(y_{nt})
\end{pmatrix}$$

The correlation matrix of $Y_t$ is the $(n \times n)$ matrix

$$\mathrm{corr}(Y_t) = R_0 = D^{-1} \Gamma_0 D^{-1}$$

where $D$ is an $(n \times n)$ diagonal matrix with $j$th diagonal element $(\gamma^0_{jj})^{1/2} = \mathrm{var}(y_{jt})^{1/2}$.


The parameters $\mu$, $\Gamma_0$ and $R_0$ are estimated from data $(Y_1, \dots, Y_T)$ using the sample moments

$$\bar{Y} = \frac{1}{T} \sum_{t=1}^{T} Y_t \xrightarrow{p} E[Y_t] = \mu$$

$$\hat{\Gamma}_0 = \frac{1}{T} \sum_{t=1}^{T} (Y_t - \bar{Y})(Y_t - \bar{Y})' \xrightarrow{p} \mathrm{var}(Y_t) = \Gamma_0$$

$$\hat{R}_0 = \hat{D}^{-1} \hat{\Gamma}_0 \hat{D}^{-1} \xrightarrow{p} \mathrm{corr}(Y_t) = R_0$$

where $\hat{D}$ is the $(n \times n)$ diagonal matrix with the sample standard deviations of $y_{jt}$ along the diagonal. The Ergodic Theorem justifies convergence of the sample moments to their population counterparts.
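As an illustration, the sample moments above can be computed directly in NumPy. This is a sketch (the function name and simulated data are mine, not from the text); it uses the $1/T$ normalization of the formulas above rather than NumPy's default $1/(T-1)$:

```python
import numpy as np

def sample_moments(Y):
    """Sample mean, covariance matrix Gamma0-hat, and correlation matrix
    R0-hat for a (T x n) array Y of observations on Y_t, using the 1/T
    normalization from the formulas above."""
    T, n = Y.shape
    mu_hat = Y.mean(axis=0)                  # Ybar ->p E[Y_t] = mu
    dev = Y - mu_hat                         # stacked deviations (Y_t - Ybar)
    Gamma0_hat = dev.T @ dev / T             # sum of outer products over T
    D_inv = np.diag(1.0 / np.sqrt(np.diag(Gamma0_hat)))
    R0_hat = D_inv @ Gamma0_hat @ D_inv      # D^-1 Gamma0 D^-1
    return mu_hat, Gamma0_hat, R0_hat

# Example with simulated white noise (illustrative only)
rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 3))
mu_hat, Gamma0_hat, R0_hat = sample_moments(Y)
```

By construction $\hat{R}_0$ has unit diagonal and entries in $[-1, 1]$.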


Cross Covariance and Correlation Matrices

With a multivariate time series $Y_t$ each component has autocovariances and autocorrelations, but there are also cross lead-lag covariances and correlations between all possible pairs of components. The autocovariances and autocorrelations of $y_{jt}$ for $j = 1, \dots, n$ are defined as

$$\gamma^k_{jj} = \mathrm{cov}(y_{jt}, y_{jt-k}), \qquad \rho^k_{jj} = \mathrm{corr}(y_{jt}, y_{jt-k}) = \frac{\gamma^k_{jj}}{\gamma^0_{jj}}$$

and these are symmetric in $k$: $\gamma^k_{jj} = \gamma^{-k}_{jj}$, $\rho^k_{jj} = \rho^{-k}_{jj}$.


The cross lag covariances and cross lag correlations between $y_{it}$ and $y_{jt}$ are defined as

$$\gamma^k_{ij} = \mathrm{cov}(y_{it}, y_{jt-k}), \qquad \rho^k_{ij} = \mathrm{corr}(y_{it}, y_{jt-k}) = \frac{\gamma^k_{ij}}{\sqrt{\gamma^0_{ii}\,\gamma^0_{jj}}}$$

and they are not necessarily symmetric in $k$. In general,

$$\gamma^k_{ij} = \mathrm{cov}(y_{it}, y_{jt-k}) \neq \mathrm{cov}(y_{it}, y_{jt+k}) = \mathrm{cov}(y_{jt}, y_{it-k}) = \gamma^{-k}_{ij}$$

Defn:

• If $\gamma^k_{ij} \neq 0$ for some $k > 0$ then $y_{jt}$ is said to lead $y_{it}$.

• If $\gamma^{-k}_{ij} \neq 0$ for some $k > 0$ then $y_{it}$ is said to lead $y_{jt}$.


• It is possible that $y_{it}$ leads $y_{jt}$ and vice versa. In this case, there is said to be feedback between the two series.


All of the lag $k$ cross covariances and correlations are summarized in the $(n \times n)$ lag $k$ cross covariance and lag $k$ cross correlation matrices

$$\Gamma_k = E[(Y_t - \mu)(Y_{t-k} - \mu)'] =
\begin{pmatrix}
\mathrm{cov}(y_{1t}, y_{1t-k}) & \mathrm{cov}(y_{1t}, y_{2t-k}) & \cdots & \mathrm{cov}(y_{1t}, y_{nt-k}) \\
\mathrm{cov}(y_{2t}, y_{1t-k}) & \mathrm{cov}(y_{2t}, y_{2t-k}) & \cdots & \mathrm{cov}(y_{2t}, y_{nt-k}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(y_{nt}, y_{1t-k}) & \mathrm{cov}(y_{nt}, y_{2t-k}) & \cdots & \mathrm{cov}(y_{nt}, y_{nt-k})
\end{pmatrix}$$

$$R_k = D^{-1} \Gamma_k D^{-1}$$

The matrices $\Gamma_k$ and $R_k$ are not symmetric in $k$, but it is easy to show that $\Gamma_{-k} = \Gamma_k'$ and $R_{-k} = R_k'$.

The matrices $\Gamma_k$ and $R_k$ are estimated from data $(Y_1, \dots, Y_T)$ using

$$\hat{\Gamma}_k = \frac{1}{T} \sum_{t=k+1}^{T} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})', \qquad \hat{R}_k = \hat{D}^{-1} \hat{\Gamma}_k \hat{D}^{-1}$$
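These estimators are a few lines of NumPy. A minimal sketch (the function names `cross_cov` and `cross_corr` are illustrative, not library calls), with $k \ge 0$:

```python
import numpy as np

def cross_cov(Y, k):
    """Gamma_k-hat = (1/T) sum_{t=k+1}^T (Y_t - Ybar)(Y_{t-k} - Ybar)'
    for a (T x n) data array Y and lag k >= 0."""
    T, _ = Y.shape
    dev = Y - Y.mean(axis=0)
    return dev[k:].T @ dev[:T - k] / T       # align Y_t with Y_{t-k}

def cross_corr(Y, k):
    """R_k-hat = D^-1 Gamma_k-hat D^-1, with D the diagonal matrix of
    sample standard deviations."""
    D_inv = np.diag(1.0 / np.sqrt(np.diag(cross_cov(Y, 0))))
    return D_inv @ cross_cov(Y, k) @ D_inv

rng = np.random.default_rng(1)
Y = rng.standard_normal((1000, 2))           # white noise: R_1 should be near 0
G1 = cross_cov(Y, 1)
R1 = cross_corr(Y, 1)
```

For $k < 0$ one can use the identity $\hat{\Gamma}_{-k} = \hat{\Gamma}_k'$ rather than re-estimating.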


Multivariate Wold Representation

Any $(n \times 1)$ covariance stationary multivariate time series $Y_t$ has a Wold or linear process representation of the form

$$Y_t = \mu + \varepsilon_t + \Psi_1 \varepsilon_{t-1} + \Psi_2 \varepsilon_{t-2} + \cdots = \mu + \sum_{k=0}^{\infty} \Psi_k \varepsilon_{t-k}$$

$$\Psi_0 = I_n, \qquad \varepsilon_t \sim WN(0, \Sigma)$$

$\Psi_k$ is an $(n \times n)$ matrix with $(i,j)$th element $\psi^k_{ij}$. In lag operator notation, the Wold form is

$$Y_t = \mu + \Psi(L)\varepsilon_t, \qquad \Psi(L) = \sum_{k=0}^{\infty} \Psi_k L^k$$

where the elements of $\Psi(L)$ are 1-summable. The moments of $Y_t$ are given by

$$E[Y_t] = \mu, \qquad \mathrm{var}(Y_t) = \sum_{k=0}^{\infty} \Psi_k \Sigma \Psi_k'$$


Long Run Variance

Let $Y_t$ be an $(n \times 1)$ stationary and ergodic multivariate time series with $E[Y_t] = \mu$. Anderson's central limit theorem for stationary and ergodic processes states

$$\sqrt{T}(\bar{Y} - \mu) \xrightarrow{d} N\left(0, \sum_{j=-\infty}^{\infty} \Gamma_j\right)$$

or

$$\bar{Y} \overset{A}{\sim} N\left(\mu, \frac{1}{T} \sum_{j=-\infty}^{\infty} \Gamma_j\right)$$

Hence, the long-run variance of $Y_t$ is $T$ times the asymptotic variance of $\bar{Y}$:

$$\mathrm{LRV}(Y_t) = T \cdot \mathrm{avar}(\bar{Y}) = \sum_{j=-\infty}^{\infty} \Gamma_j$$


Since $\Gamma_{-j} = \Gamma_j'$, $\mathrm{LRV}(Y_t)$ may be alternatively expressed as

$$\mathrm{LRV}(Y_t) = \Gamma_0 + \sum_{j=1}^{\infty} (\Gamma_j + \Gamma_j')$$

Using the Wold representation of $Y_t$ it can be shown that

$$\mathrm{LRV}(Y_t) = \Psi(1) \Sigma \Psi(1)'$$

where $\Psi(1) = \sum_{k=0}^{\infty} \Psi_k$.


Non-parametric Estimate of the Long-Run Variance

A consistent estimate of $\mathrm{LRV}(Y_t)$ may be computed using non-parametric methods. A popular estimator is the Newey-West weighted autocovariance estimator

$$\widehat{\mathrm{LRV}}_{NW}(Y_t) = \hat{\Gamma}_0 + \sum_{j=1}^{M_T} w_{j,T} \cdot \left(\hat{\Gamma}_j + \hat{\Gamma}_j'\right)$$

where the $w_{j,T}$ are weights that decline toward zero as $j$ increases, and $M_T$ is a truncation lag parameter that satisfies $M_T = O(T^{1/3})$. Usually, the Bartlett weights are used:

$$w^{\mathrm{Bartlett}}_{j,T} = 1 - \frac{j}{M_T + 1}$$
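A sketch of the Newey-West estimator in NumPy. The default rule of thumb for $M_T$ below is one common choice consistent with $M_T = O(T^{1/3})$, not something mandated by the text:

```python
import numpy as np

def newey_west_lrv(Y, M=None):
    """Newey-West long-run variance estimate:
    Gamma0-hat + sum_{j=1}^{M} w_j (Gamma_j-hat + Gamma_j-hat'),
    with Bartlett weights w_j = 1 - j/(M+1)."""
    T, _ = Y.shape
    if M is None:
        M = int(np.floor(4 * (T / 100.0) ** (1.0 / 3.0)))  # M_T = O(T^{1/3})
    dev = Y - Y.mean(axis=0)
    def gamma(j):
        return dev[j:].T @ dev[:T - j] / T
    lrv = gamma(0)
    for j in range(1, M + 1):
        w = 1.0 - j / (M + 1.0)        # Bartlett weight
        Gj = gamma(j)
        lrv = lrv + w * (Gj + Gj.T)    # lag-j term plus its transpose
    return lrv

rng = np.random.default_rng(2)
Y = rng.standard_normal((2000, 2))     # iid data: LRV should be close to Gamma0 = I
lrv = newey_west_lrv(Y)
```

Adding $\hat{\Gamma}_j + \hat{\Gamma}_j'$ keeps the estimate symmetric, and the Bartlett weights guarantee it is positive semi-definite.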


Vector Autoregression Models

The vector autoregression (VAR) model is one of the

most successful, flexible, and easy to use models for

the analysis of multivariate time series.

• Made famous in Chris Sims's paper "Macroeconomics and Reality," ECTA 1980.

• It is a natural extension of the univariate autoregressive model to dynamic multivariate time series.

• Has proven to be especially useful for describing the dynamic behavior of economic and financial time series and for forecasting.

• It often provides superior forecasts to those from univariate time series models and elaborate theory-based simultaneous equations models.


• Used for structural inference and policy analysis. In structural analysis, certain assumptions about the causal structure of the data under investigation are imposed, and the resulting causal impacts of unexpected shocks or innovations to specified variables on the variables in the model are summarized. These causal impacts are usually summarized with impulse response functions and forecast error variance decompositions.


The Stationary Vector Autoregression Model

Let $Y_t = (y_{1t}, y_{2t}, \dots, y_{nt})'$ denote an $(n \times 1)$ vector of time series variables. The basic $p$-lag vector autoregressive (VAR($p$)) model has the form

$$Y_t = c + \Pi_1 Y_{t-1} + \Pi_2 Y_{t-2} + \cdots + \Pi_p Y_{t-p} + \varepsilon_t, \qquad \varepsilon_t \sim WN(0, \Sigma)$$

Example: Bivariate VAR(2)

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} =
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} +
\begin{pmatrix} \pi^1_{11} & \pi^1_{12} \\ \pi^1_{21} & \pi^1_{22} \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix} +
\begin{pmatrix} \pi^2_{11} & \pi^2_{12} \\ \pi^2_{21} & \pi^2_{22} \end{pmatrix}
\begin{pmatrix} y_{1t-2} \\ y_{2t-2} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

or

$$y_{1t} = c_1 + \pi^1_{11} y_{1t-1} + \pi^1_{12} y_{2t-1} + \pi^2_{11} y_{1t-2} + \pi^2_{12} y_{2t-2} + \varepsilon_{1t}$$

$$y_{2t} = c_2 + \pi^1_{21} y_{1t-1} + \pi^1_{22} y_{2t-1} + \pi^2_{21} y_{1t-2} + \pi^2_{22} y_{2t-2} + \varepsilon_{2t}$$

where $\mathrm{cov}(\varepsilon_{1t}, \varepsilon_{2s}) = \sigma_{12}$ for $t = s$ and $0$ otherwise.
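A short simulation sketch of such a bivariate VAR(2). The coefficient values below are made up for illustration (chosen to give a stable system); they are not from the text:

```python
import numpy as np

def simulate_var2(c, Pi1, Pi2, Sigma, T, rng):
    """Simulate Y_t = c + Pi1 Y_{t-1} + Pi2 Y_{t-2} + eps_t,
    eps_t ~ N(0, Sigma), starting from zeros."""
    n = len(c)
    eps = rng.multivariate_normal(np.zeros(n), Sigma, size=T)
    Y = np.zeros((T, n))
    for t in range(2, T):
        Y[t] = c + Pi1 @ Y[t - 1] + Pi2 @ Y[t - 2] + eps[t]
    return Y

# Illustrative (made-up) stable coefficients
c = np.array([0.1, 0.2])
Pi1 = np.array([[0.5, 0.1], [0.2, 0.4]])
Pi2 = np.array([[0.1, 0.0], [0.0, 0.1]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])   # sigma_12 = 0.3
Y = simulate_var2(c, Pi1, Pi2, Sigma, 500, np.random.default_rng(3))
```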


Remarks:

• Each equation has the same regressors: lagged values of $y_{1t}$ and $y_{2t}$.

• Endogeneity is avoided by using lagged values of $y_{1t}$ and $y_{2t}$.

• The VAR($p$) model is just a seemingly unrelated regression (SUR) model with lagged variables and deterministic terms as common regressors.


In lag operator notation, the VAR($p$) is written as

$$\Pi(L) Y_t = c + \varepsilon_t, \qquad \Pi(L) = I_n - \Pi_1 L - \cdots - \Pi_p L^p$$

The VAR($p$) is stable if the roots of

$$\det(I_n - \Pi_1 z - \cdots - \Pi_p z^p) = 0$$

lie outside the complex unit circle (have modulus greater than one), or, equivalently, if the eigenvalues of the companion matrix

$$F = \begin{pmatrix}
\Pi_1 & \Pi_2 & \cdots & \Pi_p \\
I_n & 0 & \cdots & 0 \\
\vdots & \ddots & & \vdots \\
0 & \cdots & I_n & 0
\end{pmatrix}$$

have modulus less than one. A stable VAR($p$) process is stationary and ergodic with time invariant means, variances, and autocovariances.
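The eigenvalue condition translates directly into a numerical stability check. A sketch (the names `companion_matrix` and `is_stable` are mine):

```python
import numpy as np

def companion_matrix(Pis):
    """Stack [Pi1, ..., Pip] into the (np x np) companion matrix F,
    with identity blocks below the first block row."""
    n = Pis[0].shape[0]
    p = len(Pis)
    F = np.zeros((n * p, n * p))
    F[:n, :] = np.hstack(Pis)                    # top block row: Pi1 Pi2 ... Pip
    if p > 1:
        F[n:, :n * (p - 1)] = np.eye(n * (p - 1))
    return F

def is_stable(Pis):
    """Stable iff every eigenvalue of F has modulus < 1."""
    return bool(np.max(np.abs(np.linalg.eigvals(companion_matrix(Pis)))) < 1.0)
```

For example, a diagonal VAR(1) with coefficients 0.5 is stable, while one with a coefficient of 1.1 is not.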


Example: Stability of bivariate VAR(1) model

$$Y_t = \Pi Y_{t-1} + \varepsilon_t, \qquad
\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} =
\begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

Then $\det(I_n - \Pi z) = 0$ becomes

$$(1 - \pi_{11} z)(1 - \pi_{22} z) - \pi_{12} \pi_{21} z^2 = 0$$

• The stability condition involves the cross terms $\pi_{12}$ and $\pi_{21}$.

• If $\pi_{12} = \pi_{21} = 0$ (diagonal VAR) then the bivariate stability condition reduces to the univariate stability conditions for each equation.


If $Y_t$ is covariance stationary, then the unconditional mean is given by

$$\mu = (I_n - \Pi_1 - \cdots - \Pi_p)^{-1} c$$

The mean-adjusted form of the VAR($p$) is then

$$Y_t - \mu = \Pi_1(Y_{t-1} - \mu) + \Pi_2(Y_{t-2} - \mu) + \cdots + \Pi_p(Y_{t-p} - \mu) + \varepsilon_t$$

The basic VAR($p$) model may be too restrictive to represent sufficiently the main characteristics of the data. The general form of the VAR($p$) model with deterministic terms and exogenous variables is given by

$$Y_t = \Pi_1 Y_{t-1} + \Pi_2 Y_{t-2} + \cdots + \Pi_p Y_{t-p} + \Phi D_t + G X_t + \varepsilon_t$$

where $D_t$ denotes deterministic terms and $X_t$ denotes exogenous variables ($E[X_t \varepsilon_t] = 0$).


Wold Representation

Consider the stationary VAR($p$) model

$$\Pi(L) Y_t = c + \varepsilon_t, \qquad \Pi(L) = I_n - \Pi_1 L - \cdots - \Pi_p L^p$$

Since $Y_t$ is stationary, $\Pi(L)^{-1}$ exists so that

$$Y_t = \Pi(L)^{-1} c + \Pi(L)^{-1} \varepsilon_t = \mu + \sum_{k=0}^{\infty} \Psi_k \varepsilon_{t-k}$$

with $\Psi_0 = I_n$ and $\lim_{k \to \infty} \Psi_k = 0$. Note that

$$\Pi(L)^{-1} = \Psi(L) = \sum_{k=0}^{\infty} \Psi_k L^k$$


The Wold coefficients $\Psi_k$ may be determined from the VAR coefficients $\Pi_k$ by solving

$$\Pi(L) \Psi(L) = I_n$$

which implies

$$\Psi_1 = \Pi_1$$
$$\Psi_2 = \Pi_1 \Psi_1 + \Pi_2$$
$$\vdots$$
$$\Psi_s = \Pi_1 \Psi_{s-1} + \cdots + \Pi_p \Psi_{s-p}$$

with $\Psi_0 = I_n$ and $\Psi_j = 0$ for $j < 0$. Since $\Pi(L)^{-1} = \Psi(L)$, the long-run variance for $Y_t$ has the form

$$\mathrm{LRV}_{\mathrm{VAR}}(Y_t) = \Psi(1) \Sigma \Psi(1)' = \Pi(1)^{-1} \Sigma \, \Pi(1)^{-1\prime} = (I_n - \Pi_1 - \cdots - \Pi_p)^{-1} \Sigma \, (I_n - \Pi_1 - \cdots - \Pi_p)^{-1\prime}$$
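The recursion for the $\Psi_k$ is a few lines of code. A sketch (the function name is mine); for a VAR(1) the recursion collapses to $\Psi_k = \Pi_1^k$, which gives a quick sanity check:

```python
import numpy as np

def wold_coefficients(Pis, n_terms):
    """Psi_0 = I_n; Psi_s = Pi_1 Psi_{s-1} + ... + Pi_p Psi_{s-p},
    treating Psi_j = 0 for j < 0. Returns [Psi_0, ..., Psi_{n_terms-1}]."""
    n = Pis[0].shape[0]
    p = len(Pis)
    Psi = [np.eye(n)]
    for s in range(1, n_terms):
        acc = np.zeros((n, n))
        for j in range(1, min(s, p) + 1):    # only lags with Psi_{s-j} defined
            acc += Pis[j - 1] @ Psi[s - j]
        Psi.append(acc)
    return Psi

Pi1 = np.array([[0.5, 0.2], [0.1, 0.4]])
Psi = wold_coefficients([Pi1], 4)            # VAR(1): Psi_k = Pi1^k
```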


Estimation

Assume that the VAR($p$) model is covariance stationary, and there are no restrictions on the parameters of the model. In SUR notation, each equation in the VAR($p$) may be written as

$$y_i = Z \pi_i + e_i, \qquad i = 1, \dots, n$$

• $y_i$ is a $(T \times 1)$ vector of observations on the $i$th equation

• $Z$ is a $(T \times k)$ matrix with $t$th row given by $Z_t' = (1, Y_{t-1}', \dots, Y_{t-p}')$

• $k = np + 1$

• $\pi_i$ is a $(k \times 1)$ vector of parameters and $e_i$ is a $(T \times 1)$ error with covariance matrix $\sigma^2_i I_T$.


Since the VAR($p$) is in the form of a SUR model where each equation has the same explanatory variables, each equation may be estimated separately by ordinary least squares without losing efficiency relative to generalized least squares. Let

$$\hat{\Pi} = [\hat{\pi}_1, \dots, \hat{\pi}_n]$$

denote the $(k \times n)$ matrix of least squares coefficients for the $n$ equations, and let

$$\mathrm{vec}(\hat{\Pi}) = \begin{pmatrix} \hat{\pi}_1 \\ \vdots \\ \hat{\pi}_n \end{pmatrix}$$

Under standard assumptions regarding the behavior of stationary and ergodic VAR models (see Hamilton (1994)), $\mathrm{vec}(\hat{\Pi})$ is consistent and asymptotically normally distributed with estimated asymptotic covariance matrix

$$\widehat{\mathrm{avar}}(\mathrm{vec}(\hat{\Pi})) = \hat{\Sigma} \otimes (Z'Z)^{-1}$$

where

$$\hat{\Sigma} = \frac{1}{T - k} \sum_{t=1}^{T} \hat{\varepsilon}_t \hat{\varepsilon}_t', \qquad \hat{\varepsilon}_t = Y_t - \hat{\Pi}' Z_t$$
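Equation-by-equation OLS is straightforward to sketch in NumPy (the function name is mine; in practice one would use a library such as statsmodels):

```python
import numpy as np

def estimate_var(Y, p):
    """Equation-by-equation OLS for a VAR(p); efficient here because every
    equation shares the same regressors (SUR with common X). Returns
    Pi_hat of shape (np+1, n), intercept in the first row, and residuals."""
    T, n = Y.shape
    Z = np.array([np.concatenate([[1.0]] + [Y[t - j] for j in range(1, p + 1)])
                  for t in range(p, T)])     # rows: (1, Y_{t-1}', ..., Y_{t-p}')
    Ylhs = Y[p:]                             # left-hand-side observations
    Pi_hat, *_ = np.linalg.lstsq(Z, Ylhs, rcond=None)
    resid = Ylhs - Z @ Pi_hat
    return Pi_hat, resid

# Sanity check: recover the coefficients of a simulated bivariate VAR(1)
rng = np.random.default_rng(4)
Pi_true = np.array([[0.5, 0.1], [0.0, 0.4]])
Y = np.zeros((2000, 2))
for t in range(1, 2000):
    Y[t] = Pi_true @ Y[t - 1] + rng.standard_normal(2)
Pi_hat, resid = estimate_var(Y, p=1)
```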


Lag Length Selection

The lag length for the VAR($p$) model may be determined using model selection criteria. The general approach is to fit VAR($p$) models with orders $p = 0, \dots, p_{\max}$ and choose the value of $p$ which minimizes some model selection criterion

$$\mathrm{MSC}(p) = \ln|\tilde{\Sigma}(p)| + c_T \cdot \varphi(n, p)$$

$$\tilde{\Sigma}(p) = T^{-1} \sum_{t=1}^{T} \hat{\varepsilon}_t \hat{\varepsilon}_t'$$

where $c_T$ is a function of the sample size and $\varphi(n, p)$ is a penalty function. The three most common information criteria are the Akaike (AIC), Schwarz-Bayesian (BIC) and Hannan-Quinn (HQ):

$$\mathrm{AIC}(p) = \ln|\tilde{\Sigma}(p)| + \frac{2}{T} p n^2$$

$$\mathrm{BIC}(p) = \ln|\tilde{\Sigma}(p)| + \frac{\ln T}{T} p n^2$$

$$\mathrm{HQ}(p) = \ln|\tilde{\Sigma}(p)| + \frac{2 \ln \ln T}{T} p n^2$$
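A sketch of lag-order selection by these criteria, fitting all candidate orders on a common sample so the criteria are comparable (the function name is mine):

```python
import numpy as np

def select_var_order(Y, pmax):
    """Fit VAR(p) for p = 1..pmax by OLS on a common sample and return the
    order minimizing each of AIC, BIC, HQ:
    criterion(p) = ln|Sigma_tilde(p)| + c_T * p * n^2."""
    T, n = Y.shape
    crits = {"AIC": [], "BIC": [], "HQ": []}
    for p in range(1, pmax + 1):
        Z = np.array([np.concatenate([[1.0]] + [Y[t - j] for j in range(1, p + 1)])
                      for t in range(pmax, T)])   # drop pmax obs for every p
        Ylhs = Y[pmax:]
        B, *_ = np.linalg.lstsq(Z, Ylhs, rcond=None)
        resid = Ylhs - Z @ B
        Teff = len(Ylhs)
        ld = np.log(np.linalg.det(resid.T @ resid / Teff))
        crits["AIC"].append(ld + 2.0 / Teff * p * n ** 2)
        crits["BIC"].append(ld + np.log(Teff) / Teff * p * n ** 2)
        crits["HQ"].append(ld + 2.0 * np.log(np.log(Teff)) / Teff * p * n ** 2)
    return {name: int(np.argmin(vals)) + 1 for name, vals in crits.items()}

# Data generated from a VAR(1): BIC should typically pick p = 1
rng = np.random.default_rng(5)
Y = np.zeros((800, 2))
for t in range(1, 800):
    Y[t] = np.array([[0.6, 0.2], [0.1, 0.5]]) @ Y[t - 1] + rng.standard_normal(2)
best = select_var_order(Y, pmax=4)
```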


Remarks:

• The AIC criterion asymptotically overestimates the order with positive probability.

• The BIC and HQ criteria estimate the order consistently under fairly general conditions if the true order $p$ is less than or equal to $p_{\max}$.


Granger Causality

One of the main uses of VAR models is forecasting.

The following intuitive notion of a variable's forecasting ability is due to Granger (1969).

• If a variable, or group of variables, $y_1$ is found to be helpful for predicting another variable, or group of variables, $y_2$, then $y_1$ is said to Granger-cause $y_2$; otherwise it is said to fail to Granger-cause $y_2$.

• Formally, $y_1$ fails to Granger-cause $y_2$ if for all $s > 0$ the MSE of a forecast of $y_{2,t+s}$ based on $(y_{2,t}, y_{2,t-1}, \dots)$ is the same as the MSE of a forecast of $y_{2,t+s}$ based on $(y_{2,t}, y_{2,t-1}, \dots)$ and $(y_{1,t}, y_{1,t-1}, \dots)$.

• The notion of Granger causality does not imply true causality. It only implies forecasting ability.


Example: Bivariate VAR Model

In a bivariate VAR($p$), $y_2$ fails to Granger-cause $y_1$ if all of the $p$ VAR coefficient matrices $\Pi_1, \dots, \Pi_p$ are lower triangular:

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} =
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} +
\begin{pmatrix} \pi^1_{11} & 0 \\ \pi^1_{21} & \pi^1_{22} \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \end{pmatrix} + \cdots +
\begin{pmatrix} \pi^p_{11} & 0 \\ \pi^p_{21} & \pi^p_{22} \end{pmatrix}
\begin{pmatrix} y_{1t-p} \\ y_{2t-p} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

Similarly, $y_1$ fails to Granger-cause $y_2$ if all of the coefficients on lagged values of $y_1$ are zero in the equation for $y_2$. Notice that if $y_2$ fails to Granger-cause $y_1$ and $y_1$ fails to Granger-cause $y_2$, then the VAR coefficient matrices $\Pi_1, \dots, \Pi_p$ are diagonal.


The $p$ linear coefficient restrictions implied by Granger non-causality may be tested using the Wald statistic

$$\mathrm{Wald} = (R \cdot \mathrm{vec}(\hat{\Pi}) - r)' \left\{ R \left[ \widehat{\mathrm{avar}}(\mathrm{vec}(\hat{\Pi})) \right] R' \right\}^{-1} (R \cdot \mathrm{vec}(\hat{\Pi}) - r)$$

Remark: In the bivariate model, testing $H_0$: $y_2$ does not Granger-cause $y_1$ reduces to testing $H_0: \pi^1_{12} = \cdots = \pi^p_{12} = 0$ in the linear regression

$$y_{1t} = c_1 + \pi^1_{11} y_{1t-1} + \cdots + \pi^p_{11} y_{1t-p} + \pi^1_{12} y_{2t-1} + \cdots + \pi^p_{12} y_{2t-p} + \varepsilon_{1t}$$

The test statistic is a simple F-statistic.
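The F-test version can be coded directly. A sketch (`granger_f` is an illustrative name; the statistic should be compared to an $F(p, T - 2p - 1)$ distribution):

```python
import numpy as np

def granger_f(y1, y2, p):
    """F statistic for H0: y2 does not Granger-cause y1, comparing the
    restricted regression (lags of y1 only) with the unrestricted one
    (lags of y1 and y2)."""
    T = len(y1)
    ones = np.ones(T - p)
    lags = lambda y: [y[p - j:T - j] for j in range(1, p + 1)]  # y_{t-1},...,y_{t-p}
    X_r = np.column_stack([ones] + lags(y1))             # restricted
    X_u = np.column_stack([ones] + lags(y1) + lags(y2))  # unrestricted
    y = y1[p:]
    ssr = lambda X: np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    ssr_r, ssr_u = ssr(X_r), ssr(X_u)
    df = len(y) - X_u.shape[1]
    return ((ssr_r - ssr_u) / p) / (ssr_u / df)

# y2 Granger-causes y1 by construction; y3 is independent noise
rng = np.random.default_rng(6)
T = 1000
y2 = rng.standard_normal(T)
y3 = rng.standard_normal(T)
y1 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.3 * y1[t - 1] + 0.5 * y2[t - 1] + rng.standard_normal()
f_causal = granger_f(y1, y2, p=2)    # should be large
f_null = granger_f(y1, y3, p=2)      # should be small
```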


Example: Trivariate VAR Model

In a trivariate VAR($p$), $y_2$ and $y_3$ fail to Granger-cause $y_1$ if $\pi^j_{12} = \pi^j_{13} = 0$ for all $j$:

$$\begin{pmatrix} y_{1t} \\ y_{2t} \\ y_{3t} \end{pmatrix} =
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} +
\begin{pmatrix} \pi^1_{11} & 0 & 0 \\ \pi^1_{21} & \pi^1_{22} & \pi^1_{23} \\ \pi^1_{31} & \pi^1_{32} & \pi^1_{33} \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \\ y_{3t-1} \end{pmatrix} + \cdots +
\begin{pmatrix} \pi^p_{11} & 0 & 0 \\ \pi^p_{21} & \pi^p_{22} & \pi^p_{23} \\ \pi^p_{31} & \pi^p_{32} & \pi^p_{33} \end{pmatrix}
\begin{pmatrix} y_{1t-p} \\ y_{2t-p} \\ y_{3t-p} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix}$$

Note: One can also use a simple F-statistic from a linear regression in this situation:

$$y_{1t} = c_1 + \pi^1_{11} y_{1t-1} + \cdots + \pi^p_{11} y_{1t-p} + \pi^1_{12} y_{2t-1} + \cdots + \pi^p_{12} y_{2t-p} + \pi^1_{13} y_{3t-1} + \cdots + \pi^p_{13} y_{3t-p} + \varepsilon_{1t}$$


Example: Trivariate VAR Model

In a trivariate VAR($p$), $y_3$ fails to Granger-cause $y_1$ and $y_2$ if all of the $p$ VAR coefficient matrices $\Pi_1, \dots, \Pi_p$ are lower triangular:

$$\begin{pmatrix} y_{1t} \\ y_{2t} \\ y_{3t} \end{pmatrix} =
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} +
\begin{pmatrix} \pi^1_{11} & 0 & 0 \\ \pi^1_{21} & \pi^1_{22} & 0 \\ \pi^1_{31} & \pi^1_{32} & \pi^1_{33} \end{pmatrix}
\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \\ y_{3t-1} \end{pmatrix} + \cdots +
\begin{pmatrix} \pi^p_{11} & 0 & 0 \\ \pi^p_{21} & \pi^p_{22} & 0 \\ \pi^p_{31} & \pi^p_{32} & \pi^p_{33} \end{pmatrix}
\begin{pmatrix} y_{1t-p} \\ y_{2t-p} \\ y_{3t-p} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix}$$

Note: We cannot use a simple F-statistic in this case. We must use the general Wald statistic.


Forecasting Algorithms

Forecasting from a VAR($p$) is a straightforward extension of forecasting from an AR($p$). The multivariate Wold form is

$$Y_t = \mu + \varepsilon_t + \Psi_1 \varepsilon_{t-1} + \Psi_2 \varepsilon_{t-2} + \cdots$$

$$Y_{t+h} = \mu + \varepsilon_{t+h} + \Psi_1 \varepsilon_{t+h-1} + \cdots + \Psi_{h-1} \varepsilon_{t+1} + \Psi_h \varepsilon_t + \cdots, \qquad \varepsilon_t \sim WN(0, \Sigma)$$

Note that (with $\Psi_0 = I_n$)

$$E[Y_t] = \mu, \qquad \mathrm{var}(Y_t) = E[(Y_t - \mu)(Y_t - \mu)'] = E\left[ \left( \sum_{k=0}^{\infty} \Psi_k \varepsilon_{t-k} \right) \left( \sum_{k=0}^{\infty} \Psi_k \varepsilon_{t-k} \right)' \right] = \sum_{k=0}^{\infty} \Psi_k \Sigma \Psi_k'$$


The minimum MSE linear forecast of $Y_{t+h}$ based on $I_t$ (the information available at time $t$) is

$$Y_{t+h|t} = \mu + \Psi_h \varepsilon_t + \Psi_{h+1} \varepsilon_{t-1} + \cdots$$

The forecast error is

$$\varepsilon_{t+h|t} = Y_{t+h} - Y_{t+h|t} = \varepsilon_{t+h} + \Psi_1 \varepsilon_{t+h-1} + \cdots + \Psi_{h-1} \varepsilon_{t+1}$$

The forecast error MSE is

$$\mathrm{MSE}(\varepsilon_{t+h|t}) = E[\varepsilon_{t+h|t} \varepsilon_{t+h|t}'] = \Sigma + \Psi_1 \Sigma \Psi_1' + \cdots + \Psi_{h-1} \Sigma \Psi_{h-1}' = \sum_{s=0}^{h-1} \Psi_s \Sigma \Psi_s'$$


Chain-rule of Forecasting

The best linear predictor, in terms of minimum mean squared error (MSE), of $Y_{T+1}$ (the 1-step forecast) based on information available at time $T$ is

$$Y_{T+1|T} = c + \Pi_1 Y_T + \cdots + \Pi_p Y_{T-p+1}$$

Forecasts for longer horizons $h$ ($h$-step forecasts) may be obtained using the chain-rule of forecasting as

$$Y_{T+h|T} = c + \Pi_1 Y_{T+h-1|T} + \cdots + \Pi_p Y_{T+h-p|T}, \qquad Y_{T+j|T} = Y_{T+j} \text{ for } j \le 0$$

Note: The chain-rule may be derived from the state-space framework (assume $c = 0$):

$$\begin{pmatrix} Y_t \\ Y_{t-1} \\ \vdots \\ Y_{t-p+1} \end{pmatrix} =
\begin{pmatrix} \Pi_1 & \Pi_2 & \cdots & \Pi_p \\ I_n & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & I_n & 0 \end{pmatrix}
\begin{pmatrix} Y_{t-1} \\ Y_{t-2} \\ \vdots \\ Y_{t-p} \end{pmatrix} +
\begin{pmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

that is, $\xi_t = F \xi_{t-1} + v_t$ with $\xi_t = (Y_t', Y_{t-1}', \dots, Y_{t-p+1}')'$.


Then

$$\xi_{t+1|t} = F \xi_t, \qquad \xi_{t+2|t} = F \xi_{t+1|t} = F^2 \xi_t, \qquad \dots, \qquad \xi_{t+h|t} = F \xi_{t+h-1|t} = F^h \xi_t$$
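The chain-rule is a short loop in code. A sketch with an illustrative `var_forecast` helper (not a library function); for a VAR(1) with $c = 0$ the $h$-step forecast collapses to $\Pi_1^h Y_T$, which the example verifies:

```python
import numpy as np

def var_forecast(Y, c, Pis, h):
    """Chain-rule forecasts Y_{T+1|T}, ..., Y_{T+h|T} for a VAR(p):
    Y_{T+h|T} = c + Pi1 Y_{T+h-1|T} + ... + Pip Y_{T+h-p|T},
    with Y_{T+j|T} = Y_{T+j} for j <= 0."""
    p = len(Pis)
    hist = [Y[-j] for j in range(1, p + 1)]   # [Y_T, Y_{T-1}, ..., Y_{T-p+1}]
    out = []
    for _ in range(h):
        f = c + sum(Pi @ y for Pi, y in zip(Pis, hist))
        out.append(f)
        hist = [f] + hist[:-1]                # newest forecast becomes lag 1
    return np.array(out)

Pi1 = np.array([[0.5, 0.2], [0.1, 0.3]])
Y = np.array([[1.0, 2.0], [0.5, -1.0]])       # last row is Y_T
fc = var_forecast(Y, np.zeros(2), [Pi1], h=3) # fc[h-1] = Pi1^h @ Y_T here
```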