
apps.eui.eu/Personal/Canova/Teachingmaterial/Bvar_eui2013.pdf

Bayesian methods for VAR Models

Fabio Canova

EUI and CEPR

September 2013


Outline

• Bayesian preliminaries and posterior simulators.

• Likelihood function for an M-variable VAR(q).

• Priors for BVARs (diffuse, Minnesota (Litterman), general, hierarchical, DSGE-based).

• Forecasting with BVARs.

• Structural (overidentified) BVAR.

• BFAVAR.

• Univariate dynamic panels; endogenous grouping, partial pooling of VARs.


References

Korobilis, D. and Bauwens, L. (2012), "Bayesian Methods", in N. Hashimzade and M. Thornton (eds.), Handbook of Research Methods and Applications in Empirical Macroeconomics, Edward Elgar Publishing.

Lutkepohl, H. (1991), Introduction to Multiple Time Series Analysis, Springer-Verlag.

Ballabriga, F. (1997), "Bayesian Vector Autoregressions", ESADE, manuscript.

Canova, F. (1992), "An Alternative Approach to Modelling and Forecasting Seasonal Time Series", Journal of Business and Economic Statistics, 10, 97-108.

Canova, F. (1993a), "Forecasting Time Series with Common Seasonal Patterns", Journal of Econometrics, 55, 173-200.

Canova, F. (2004), "Testing for Convergence Clubs in Income per Capita: A Predictive Density Approach", International Economic Review, 45(1), 49-77.

Canova, F. and Forero, F. (2012), "Estimating Non-recursive, Overidentified, TVC Structural VARs", UPF, manuscript.

Del Negro, M. and Schorfheide, F. (2004), "Priors from General Equilibrium Models for VARs", International Economic Review, 45, 643-673.

Del Negro, M. and Schorfheide, F. (2012), "Bayesian Macroeconometrics", in J. Geweke, G. Koop and H. van Dijk (eds.), The Oxford Handbook of Bayesian Econometrics, Oxford University Press.

Favero, C. (2001), Applied Macroeconometrics, Oxford University Press.

Giannone, D., Primiceri, G. and Lenza, M. (2012), "Prior Selection for Vector Autoregressions", Northwestern University, manuscript.

Gilchrist, S. and Gertler, M. (1994), "Monetary Policy, Business Cycles and the Behavior of Small Manufacturing Firms", Quarterly Journal of Economics, 109, 309-340.

Kadiyala, R. and Karlsson, S. (1997), "Numerical Methods for Estimation and Inference in Bayesian VAR Models", Journal of Applied Econometrics, 12, 99-132.

Kilian, L. (2011), "Structural Vector Autoregressions", University of Michigan, manuscript.

Ingram, B. and Whiteman, C. (1994), "Supplanting the Minnesota Prior: Forecasting Macroeconomic Time Series Using Real Business Cycle Model Priors", Journal of Monetary Economics, 34, 497-510.

Lindley, D. V. and Smith, A. F. M. (1972), "Bayes Estimates for the Linear Model", Journal of the Royal Statistical Society, Series B, 34, 1-18.

Robertson, J. and Tallman, E. (1999), "Vector Autoregressions: Forecasting and Reality", Federal Reserve Bank of Atlanta Economic Review, First Quarter, 4-18.

Sims, C. and Zha, T. (1998), "Bayesian Methods for Dynamic Multivariate Models", International Economic Review, 39, 949-968.

Waggoner, D. and Zha, T. (2003), "A Gibbs Simulator for Restricted VAR Models", Journal of Economic Dynamics and Control, 26, 349-366.

Zellner, A. and Hong, C. (1989), "Forecasting International Growth Rates Using Bayesian Shrinkage and Other Procedures", Journal of Econometrics, 40, 183-202.

Zha, T. (1999), "Block Recursion and Structural Vector Autoregressions", Journal of Econometrics, 90, 291-316.


1 Preliminaries

Classical and Bayesian analysis differ on a number of issues.

Classical analysis:

• Probabilities = limit of the relative frequency of an event.

• Parameters are fixed, unknown quantities.

• Unbiased estimators are useful because the average value of the sample estimator converges to the true value via some LLN. Efficient estimators are preferable because they yield values closer to the true parameter.

• Estimators and tests are evaluated in repeated samples (to give the correct result with high probability).


Bayesian analysis:

• Probabilities = degree of (typically subjective) belief of a researcher in an event.

• Parameters are random variables with probability distributions.

• Properties of estimators and tests in repeated samples are uninteresting: beliefs are not necessarily related to the relative frequency of an event in a large number of hypothetical experiments.

• Estimators are chosen to minimize expected loss functions (expectations taken with respect to the posterior distribution), conditional on the data. Probability is used to quantify uncertainty.


In large samples (under appropriate regularity conditions):

• Posterior mode α* → α₀ in probability (consistency).

• The posterior distribution converges to a normal with mean α₀ and variance (T · I(α₀))⁻¹, where I(α) is Fisher's information matrix (asymptotic normality).

Classical and Bayesian analyses differ in small samples and in dealing with unit-root processes.


Bayesian analysis requires:

• Initial information → prior distribution.

• Data → likelihood.

• Prior and likelihood → Bayes theorem → posterior distribution.

• Can proceed recursively (mimics economic learning).


2 Bayes Theorem and Prior Selection

Parameters of interest α ∈ A, A compact. Prior information g(α). Sample information f(y|α) ≡ L(α|y).

• Bayes Theorem:

g(α|y) = f(y|α)g(α)/f(y) ∝ f(y|α)g(α) = L(α|y)g(α) ≡ ḡ(α|y)

f(y) = ∫ f(y|α)g(α) dα is the unconditional sample density (marginal likelihood), and it is constant from the point of view of g(α|y); g(α|y) is the posterior density, ḡ(α|y) is the posterior kernel, and g(α|y) = ḡ(α|y) / ∫ ḡ(α|y) dα.


• f(y) is a measure of fit. It tells us how good the model is at reproducing the data, not at a single point, but on average over the parameter space.

• α collects regression coefficients, structural parameters, etc.; g(α|y) is the conditional probability of α given what we observe, y.

• The theorem uses P(A, B) = P(A|B)P(B) = P(B|A)P(A). It says that if we start with some beliefs about α, we may modify them if we observe y. It does not say what the initial beliefs are, but how they should change as data are observed.


To use Bayes theorem we need:

a) Formulate prior beliefs, i.e. choose g(α).

b) Formulate a model for the data (the conditional probability f(y|α)).

After observing the data, we treat the model as the likelihood of α conditional on y, and update beliefs about α.
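To make steps a) and b) concrete, here is a minimal conjugate sketch (the Beta-binomial setup and all numbers are illustrative assumptions, not from the slides): with a Beta prior on a success probability and a binomial likelihood, Bayes theorem delivers the posterior in closed form.

```python
# Illustrative conjugate update (assumed numbers): Beta(a, b) prior on a
# success probability, binomial likelihood with k successes in n trials.
# Bayes theorem gives a Beta(a + k, b + n - k) posterior in closed form.
a, b = 2.0, 2.0          # prior beliefs
n, k = 20, 14            # observed data
a_post, b_post = a + k, b + n - k

prior_mean = a / (a + b)                 # 0.5
post_mean = a_post / (a_post + b_post)   # (2 + 14) / (4 + 20) = 2/3
```

The data pull the prior mean 0.5 toward the sample frequency 0.7, illustrating how observing y changes the initial beliefs.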


• Bayes theorem with nuisance parameters (e.g. α₁ long-run coefficients and α₂ short-run coefficients; or α₁ regression coefficients and α₂ the serial correlation coefficient of the errors).

Let α = [α₁, α₂] and suppose interest is in α₁. Then g(α₁, α₂|y) ∝ f(y|α₁, α₂)g(α₁, α₂) and

g(α₁|y) = ∫ g(α₁, α₂|y) dα₂ = ∫ g(α₁|α₂, y) g(α₂|y) dα₂   (1)

The posterior of α₁ averages the conditional of α₁ with weights given by the posterior of α₂.


• Bayes Theorem with two (N) samples.

Suppose y_t = [y_1t, y_2t] and that y_1t is independent of y_2t. Then

ḡ ≡ f(y₁, y₂|α)g(α) = f₂(y₂|α)f₁(y₁|α)g(α) ∝ f₂(y₂|α)g(α|y₁)   (2)

The posterior for α is obtained by first finding the posterior using y_1t and then, treating it as a prior, finding the posterior using y_2t.

- Sequential learning.
- Can use data from different regimes.
- Can use data from different countries.
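The two-sample logic in (2) can be sketched in a toy conjugate setting (an assumption for illustration: normal data with known unit variance and a normal prior on the mean): updating on y₁ and then, treating the result as a prior, updating on y₂ reproduces the posterior from the pooled sample.

```python
import numpy as np

# Sequential Bayes sketch for a normal mean, data variance known (= 1),
# prior N(m0, v0).  All numbers are illustrative.
def update(m, v, y):
    v_post = 1.0 / (1.0 / v + len(y))        # posterior variance
    m_post = v_post * (m / v + y.sum())      # posterior mean
    return m_post, v_post

rng = np.random.default_rng(0)
y1 = rng.normal(1.0, 1.0, 30)
y2 = rng.normal(1.0, 1.0, 40)
m0, v0 = 0.0, 10.0

m_seq, v_seq = update(*update(m0, v0, y1), y2)                # y1, then y2
m_joint, v_joint = update(m0, v0, np.concatenate([y1, y2]))   # pooled
```

The two routes give identical posterior moments, which is what makes sequential learning across regimes or countries coherent.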


2.1 Likelihood Selection

• It should reflect an economic model.

• It must represent the data well. Misspecification is problematic since it spills across equations and makes estimates uninterpretable.


2.2 Prior Selection

• Three basic methods to choose priors in theory.

1) Non-informative (subjective). Choose reference priors because they are invariant to the parametrization.

- Location-invariant prior: g(α) = constant (= 1 for convenience). Scale-invariant prior: g(σ) = σ⁻¹.
- Location-scale invariant prior: g(α, σ) = σ⁻¹.

• Non-informative priors are useful because many classical estimators (OLS, ML) are Bayesian estimators with non-informative priors.


2) Conjugate priors.

A prior is conjugate if the posterior has the same form as the prior. Hence the form of the posterior is analytically available; one only needs to figure out its posterior moments.

• Important result in linear models with conjugate priors: posterior moments = weighted average of sample and prior information. Weights = relative precision of sample and prior information.
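The weighted-average result can be sketched in the simplest conjugate case (assumed for illustration: iid N(α, σ²) data with σ² known and a N(m₀, v₀) prior on α):

```python
import numpy as np

# Posterior mean as a precision-weighted average of prior and sample
# information (illustrative numbers).
y = np.array([1.2, 0.8, 1.5, 1.1, 0.9])
sigma2, m0, v0 = 1.0, 0.0, 2.0

prec_prior = 1.0 / v0            # precision of the prior information
prec_sample = len(y) / sigma2    # precision of the sample information
w = prec_prior / (prec_prior + prec_sample)

post_mean = w * m0 + (1 - w) * y.mean()
post_var = 1.0 / (prec_prior + prec_sample)
```

As v₀ shrinks (a tighter prior), w → 1 and the posterior mean moves toward m₀; as v₀ grows, the sample mean dominates.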


[Figure: prior and posterior densities under a tight prior (left panel) and a loose prior (right panel).]


3) Objective priors and the ML-II approach. Based on:

f(y) = ∫ L(α|y)g(α) dα ≡ L(y|g)   (3)

Since L(α|y) is fixed, L(y|g) reflects the plausibility of g in the data. If g₁ and g₂ are two priors and L(y|g₁) > L(y|g₂), there is better support for g₁. Hence, one can estimate the "best" g using L(y|g).

In practice, set g(α) = g(α|λ), where λ = hyperparameters (e.g. the mean and the variance of the prior). Then L(y|g) ≡ L(y|λ).

The λ that maximizes L(y|λ) is called the ML-II estimator, and g(α|λ_ML) is the ML-II based prior.


Important:

- y₁, ..., y_T should not be the same sample used for inference.

- y₁, ..., y_T could represent past time series information, or cross-sectional/cross-country information.

- Typically y₁, ..., y_T is called the "training sample".


Summary

Inputs of the analysis: g(α), f(y|α).

Outputs of the analysis:

- g(α|y) ∝ f(y|α)g(α) (posterior),
- f(y) = ∫ f(y|α)g(α) dα (marginal likelihood), and
- f(y_{T+τ}|y_T) (predictive density of future observations).

The likelihood should reflect the data/economic theory.

The prior could be non-informative, conjugate, or data-based (objective).


- In simple setups, f(y), g(α|y) and f(y_{T+τ}|y_T) can be computed analytically.

- In general, they can only be computed numerically, by Monte Carlo methods.

- If the likelihood is a non-linear function of the parameters, numerical computations are always needed.


3 Posterior simulators

Objects of interest for Bayesian analysis: E(h(α)) = ∫ h(α)g(α|y) dα. Occasionally, the integral can be evaluated analytically. In general, it is impossible.

If g(α|y) were available, we could compute E(h(α)) with MC methods:

- Draw α^l from g(α|y). Compute h(α^l).
- Repeat the draw L times. Average h(α^l) over draws.

Example 1. Suppose we are interested in computing Pr(α > 0). Draw α^l from g(α|y). If α^l > 0, set h(α^l) = 1; else set h(α^l) = 0. Draw L times and average h(α^l) over draws. The result is an estimate of Pr(α > 0).

• The approach works because with iid draws the law of large numbers (LLN) ensures that sample averages converge to population averages (ergodicity).
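Example 1 in code; since the slides leave g(α|y) abstract, a N(0.5, 1) posterior is assumed purely for illustration.

```python
import numpy as np

# MC estimate of Pr(alpha > 0): draw from the (assumed) posterior,
# set h = 1 when the draw is positive, and average over the L draws.
rng = np.random.default_rng(1)
L = 100_000
draws = rng.normal(0.5, 1.0, L)    # stand-in for draws from g(alpha|y)
h = (draws > 0).astype(float)
prob_positive = h.mean()           # close to Phi(0.5), about 0.69
```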


• By a central limit theorem (CLT), the difference between sample and population averages has a normal distribution with zero mean and some variance as L grows (numerical standard errors can be used as a measure of accuracy).

- Since g(α|y) is not analytically available, we need a g_AP(α|y) which is close to g(α|y) and easy to draw from:

• Normal approximation.

• Basic posterior simulators (acceptance and importance sampling).

• Markov Chain Monte Carlo (MCMC) methods.


3.1 Normal posterior analysis

If T is large, g(α|y) ≈ f(α|y) (the likelihood dominates the prior). If f(α|y) is unimodal, roughly symmetric, and α* (the mode) is in the interior of A:

log g(α|y) ≈ log g(α*|y) + 0.5 (α − α*)′ [∂² log g(α|y)/∂α∂α′ |_{α=α*}] (α − α*)   (4)

Since g(α*|y) is constant, letting Σ* = −[∂² log g(α|y)/∂α∂α′ |_{α=α*}]⁻¹,

g(α|y) ≈ N(α*, Σ*)   (5)

- An approximate 100(1−κ)% highest credible set is α* ± Φ⁻¹(1 − κ/2) (Σ*)^{0.5}, where Φ(·) is the CDF of a standard normal.
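A sketch of (4)-(5) on a toy posterior; the Beta(15, 7) kernel is an assumption chosen so that the exact mode, 0.7, is known in closed form.

```python
import numpy as np

# Normal (Laplace) approximation: evaluate the curvature of the log
# kernel at the mode and set g_AP = N(mode, -1 / second derivative).
a, b = 15.0, 7.0
logk = lambda t: (a - 1) * np.log(t) + (b - 1) * np.log(1 - t)

mode = (a - 1) / (a + b - 2)      # analytic mode of a Beta(a, b): 0.7
h = 1e-5                          # finite-difference step
d2 = (logk(mode + h) - 2 * logk(mode) + logk(mode - h)) / h**2
var_ap = -1.0 / d2                # approximate posterior variance

# approximate 95% highest credible set under g_AP:
lo, hi = mode - 1.96 * var_ap**0.5, mode + 1.96 * var_ap**0.5
```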


• The approximation is valid under regularity conditions when T → ∞ or when the posterior kernel is roughly normal. It is highly inappropriate when:

- the likelihood function is flat in some dimension (Σ* badly estimated);
- the likelihood function is unbounded (no posterior mode exists);
- the likelihood function has multiple peaks;
- α* is on the boundary of A (quadratic approximation wrong);
- g(α) = 0 in a neighborhood of α* (quadratic approximation wrong).


How do we construct a normal approximation?

A) Find the mode of the posterior:

max log g(α|y) = max (log f(y|α) + log g(α))

- The problem is identical to that of finding the maximum of a likelihood; only the objective function differs.

Two mode-finding algorithms:


i) Newton algorithm

- Let L = log g(α|y) (or L = log ḡ(α|y)). Choose α₀.
- Calculate L′ = ∂L/∂α (α₀) and L″ = ∂²L/∂α∂α′ (α₀).
- Set α^l = α^{l−1} − κ (L″(α^{l−1}|y))⁻¹ (L′(α^{l−1}|y)), with damping κ ∈ (0, 1].
- Iterate until convergence, i.e. until ||α^l − α^{l−1}|| < ε, ε small.

Fast and good if α₀ is good and L is close to quadratic. Bad if −L″ is not positive definite.
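A one-dimensional sketch of the Newton iterations, on an assumed Beta(15, 7) log kernel whose analytic mode is (a−1)/(a+b−2) = 0.7; derivatives are numerical and the full step (κ = 1) is used.

```python
import numpy as np

# Newton iterations for the posterior mode (one dimension, numerical
# first and second derivatives, full step kappa = 1).
a, b = 15.0, 7.0
logk = lambda t: (a - 1) * np.log(t) + (b - 1) * np.log(1 - t)
h, eps = 1e-5, 1e-10

alpha = 0.5                        # starting value alpha_0
for _ in range(100):
    d1 = (logk(alpha + h) - logk(alpha - h)) / (2 * h)                  # L'
    d2 = (logk(alpha + h) - 2 * logk(alpha) + logk(alpha - h)) / h**2   # L''
    alpha_new = alpha - d1 / d2    # Newton step
    if abs(alpha_new - alpha) < eps:
        alpha = alpha_new
        break                      # convergence
    alpha = alpha_new
```

Because this log kernel is globally concave, −L″ > 0 everywhere and the iterations converge from any interior starting value.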


ii) Conditional maximization algorithm

Let α = (α₁, α₂). Start from some (α₁₀, α₂₀). Then:

- Maximize L(α₁, α₂) with respect to α₁, keeping α₂₀ fixed. Let α₁* be the maximizer.
- Maximize L(α₁, α₂) with respect to α₂, keeping α₁ = α₁* fixed. Let α₂* be the maximizer.
- Iterate on the two previous steps until convergence.
- Start from different (α₁₀, α₂₀) to check whether the maximum is global.


B) Compute the variance-covariance matrix at the mode.

- Use the Hessian: Σ* = −[∂² log g(α|y)/∂α∂α′ |_{α=α*}]⁻¹.

C) Set g_AP(α|y) = N(α*, Σ*).

- If multiple modes are present, find an approximation to each mode and set g_AP(α|y) = Σ_i ϱ_i N(α*_i, Σ*_i), where 0 ≤ ϱ_i ≤ 1. If the modes are clearly separated, select ϱ_i ∝ g(α*_i|y)|Σ*_i|^{0.5}.

- If the sample is small, use a t-approximation, i.e. g_AP(α|y) = Σ_i ϱ_i g(α*_i|y)[ν + (α − α*_i)′(Σ*_i)⁻¹(α − α*_i)]^{−0.5(k+ν)}.

(If ν = 1 the t-distribution is the Cauchy distribution, with large overdispersion. Typically ν = 4 or 5 is appropriate. As ν increases, t → N.)


D) To conduct inference, draw α^l from g_AP(α|y).

If draws are iid, E(h(α)) ≈ (1/L) Σ_l h(α^l). Use the LLN to approximate any posterior probability contours of h(α); e.g. a 16-84 range is [h(α)_{16}, h(α)_{84}].

E) Check the accuracy of the approximation.

Compute the importance ratio IR^l = ḡ(α^l|y)/g_AP(α^l|y). Accuracy is good if IR^l is constant across l. If not, other techniques are needed.
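Step E in code, for the t/normal example that follows: the target kernel is t(0, 1, 2), the approximation is N(0, c), and the wildly varying importance ratios reveal the normal's thin tails (L = 5000 draws per value of c is an arbitrary choice).

```python
import numpy as np

# Importance-ratio accuracy check: IR^l = t-kernel / normal-kernel at
# draws from the N(0, c) approximation.  Roughly constant ratios would
# signal a good approximation; here they are far from constant.
rng = np.random.default_rng(2)

def ir_weights(c, L=5000):
    draws = rng.normal(0.0, np.sqrt(c), L)
    log_t = -1.5 * np.log(1.0 + draws**2 / 2.0)   # t(0,1,2) kernel
    log_n = -0.5 * draws**2 / c                   # N(0,c) kernel
    w = np.exp(log_t - log_n)
    return w / w.mean()                           # scale-free comparison

spread = {c: ir_weights(c).std() for c in (3, 5, 10, 100)}
```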


Example. True: g(α|y) is t(0, 1, 2). Approximation: N(0, c), where c = 3, 5, 10, 100.

[Figure: histograms of the importance ratio weights for each value of c; horizontal axis = importance ratio weights, vertical axis = frequency of the weights.]

- The posterior has fat tails relative to a normal (poor approximation).


3.2 Basic Posterior Simulators

• Draw from a general g_AP(α|y) (not necessarily normal).

• Non-iterative methods: g_AP(α|y) is fixed across draws.

• Work well when IR^l is roughly constant across draws.

A) Acceptance sampling

B) Importance sampling

3.3 Markov Chain Monte Carlo Methods

• Problem with basic simulators: the approximating density is selected once and for all. If mistakes are made, they stay. With MCMC, the location of the approximating density changes as iterations progress.


• Idea: suppose there are n states (x₁, ..., x_n). Let P(i, j) = Pr(x_{t+1} = x_j | x_t = x_i) and let π(t) = (π_{1t}, ..., π_{nt}) be the unconditional probability at t of each state. Then π(t+1) = π(t)P = π(0)P^t, and π is an equilibrium (ergodic, steady-state, invariant) distribution if π = πP.

Set π = g(α|y), choose some initial density π(0) and some transition P across states. If conditions are right, iterating from π(0), the limiting distribution is g(α|y), the unknown posterior.


[Figure: successive approximating densities g_MC(0), g_MC(1), ... moving toward the posterior g(α|y).]

• Under general conditions, the ergodicity of P ensures consistency and asymptotic normality of estimates of any h(α).


We need a transition P(α, A), where A is some set, such that ||P(α, A) − π(α)|| → 0 in the limit. For this, the chain associated with P must:

• be irreducible, i.e. have no absorbing state;

• be aperiodic, i.e. not cycle across a finite number of states;

• be Harris recurrent, i.e. visit each cell an infinite number of times with probability one.


[Figure: a chain producing bad draws (stuck in one region) versus good draws (covering sets A and B).]

Result 1 (existence): A reversible Markov chain has an ergodic distribution. (If π_i P(i, j) = π_j P(j, i), then (πP)_j = Σ_i π_i P(i, j) = Σ_i π_j P(j, i) = π_j Σ_i P(j, i) = π_j.)

Result 2 (Tierney (1994), uniqueness): If a Markov chain is Harris recurrent and has a proper invariant distribution π(·), then π(·) is unique.

Result 3 (Tierney (1994), convergence): If a Markov chain with invariant distribution π(·) is Harris recurrent and aperiodic, then for all α₀ ∈ A and all A, as L → ∞, ||P^L(α₀, A) − π(α)|| → 0, where ||·|| is the total variation distance.


- For all h(α) absolutely integrable with respect to π(·):

lim_{L→∞} (1/L) Σ_{l=1}^{L} h(α^l) → ∫ h(α)π(α) dα almost surely.

If the chain has a finite number of states, it is sufficient for the chain to be irreducible, Harris recurrent and aperiodic that P(α^l ∈ A₁ | α^{l−1} = α₀, y) > 0 for all α₀ and all A₁ ∈ A.

• One can dispense with the finite-number-of-states assumption.

• One can dispense with the first-order Markov assumption.


General simulation strategy:

• Choose starting values α₀ and a P with the right properties.

• Run MCMC simulations.

• Check convergence.

• Summarize results, i.e. compute h(α).


1) MCMC methods generate draws which are correlated (with normal/basic simulators, posterior draws are iid).

2) MCMC methods generate draws from the posterior only after a burn-in period (with normal/basic simulators, the first draw is from the posterior).

3) MCMC methods only need the kernel ḡ(α|y) (no knowledge of the normalizing constants is needed).

Two main algorithms in this class: Metropolis-Hastings and the Gibbs sampler.


3.3.1 Gibbs sampler

Partition α = (α₁, α₂, ..., α_K) so that the conditionals g(α_k | α_{k′}, y, k′ ≠ k) are analytically available. Then:

- Choose initial values α₁^(0), α₂^(0), ..., α_K^(0).
- For l = 1, 2, ..., draw α_k^l as follows:
  i) α₁^(l) from g(α₁ | α₂^(l−1), ..., α_K^(l−1), y).
  ii) α₂^(l) from g(α₂ | α₁^(l), α₃^(l−1), ..., α_K^(l−1), y).
  iii) ...
  iv) α_K^(l) from g(α_K | α₁^(l), ..., α_{K−1}^(l), y).
- Repeat L times.


Step 2 defines a transition from α^{l−1} to α^l. Step 3 produces a sequence which is the realization of a Markov chain with transition

P(α^l, α^{l−1}) = ∏_{k=1}^{K} g(α_k^l | α_{k′}^{l−1} (k′ > k), α_{k′}^l (k′ < k), y)   (6)

If L is large, α^L = (α₁^L, α₂^L, ..., α_K^L) is a draw from g(α|y), and α_k^L, k = 1, ..., K, is a draw from g(α_k|y).


• Main value of the algorithm: g(α_k | α_{k′}, y) is typically easy to compute (use conjugate analysis) and cheap to sample from.

• The Gibbs sampler works well when α_k is independent of α_{k′}: group highly correlated components together (e.g. the regression parameters, or the variance parameters).

Intuition for the Gibbs sampler comes from integration by parts:

Choose an α₂^0. Draw α₁^1 from g(α₁ | α₂^0, y), then draw α₂^1 from g(α₂ | α₁^1, y), α₁^2 from g(α₁ | α₂^1, y), etc. Each step can be thought of as a draw by parts from g(α₁ | α₂, y)g(α₂ | y). Since we start from an arbitrary initial condition, we eliminate this arbitrariness by dropping the first L̄ observations.


Example 2. Suppose f(x, y) ∝ [n! / (x!(n − x)!)] y^{x+α₀−1} (1 − y)^{n−x+α₁−1}, x = 0, 1, ..., n, 0 ≤ y ≤ 1 (a joint binomial-beta density for (x, y)), and consider the marginal of x. Direct integration leads to

f(x) ∝ [n! / (x!(n − x)!)] [Γ(α₀ + α₁) / (Γ(α₀)Γ(α₁))] [Γ(x + α₀)Γ(n − x + α₁) / Γ(α₀ + α₁ + n)]

which is the beta-binomial distribution. Hence, f(x|y) is binomial with parameters (n, y), and f(y|x) is Beta with parameters (x + α₀, n − x + α₁).
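A sketch of the Gibbs sampler for Example 2, using the settings of the figure below (n = 100, α₀ = 2, α₁ = 4, burn-in L̄ = 20), though with more retained draws so the marginal of x is estimated precisely; under the beta-binomial marginal, E(x) = nα₀/(α₀ + α₁) ≈ 33.3.

```python
import numpy as np

# Gibbs sampler for Example 2: alternate x | y ~ Bin(n, y) and
# y | x ~ Beta(x + a0, n - x + a1); keep draws of x after burn-in.
rng = np.random.default_rng(3)
n, a0, a1 = 100, 2.0, 4.0
L, burn = 5000, 20

y = 0.5                                   # arbitrary starting value
xs = []
for l in range(L + burn):
    x = rng.binomial(n, y)                # draw from f(x | y)
    y = rng.beta(x + a0, n - x + a1)      # draw from f(y | x)
    if l >= burn:
        xs.append(x)
xs = np.array(xs)                         # draws from the marginal f(x)
```

Only the two conditionals are ever sampled; the beta-binomial marginal is never evaluated directly.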


Figure: true versus Gibbs-sampling marginal distribution for x, with L = 500, n = 100, α₀ = 2, α₁ = 4 and L̄ = 20.


Example 3 (seemingly unrelated regression). Let y_it = x_it′ β_i + e_it, with e_t = (e_1t, ..., e_mt)′ ~ N(0, Σ_e), i = 1, ..., m, t = 1, ..., T, and β_i a k × 1 vector. Stacking observations, y_t = x_t β + e_t, where y_t = (y_1t, ..., y_mt)′, x_t = diag(x_1t′, ..., x_mt′), and β = (β₁′, ..., β_m′)′ is an mk × 1 vector. Suppose g(β, Σ_e⁻¹) = g(β)g(Σ_e⁻¹). The posterior kernel is:

ḡ(β, Σ_e⁻¹ | y) = g(β) g(Σ_e⁻¹) |Σ_e⁻¹|^{0.5T} exp{−0.5 Σ_t (y_t − x_t β)′ Σ_e⁻¹ (y_t − x_t β)}   (7)

• The target density is g(β, Σ_e⁻¹ | y) = ḡ(β, Σ_e⁻¹) / ∫ ḡ(β, Σ_e⁻¹) dβ dΣ_e⁻¹.

• Assume a conjugate Normal-Wishart prior for β and Σ_e⁻¹.

• The conditional posteriors are (β | y, Σ_e⁻¹) ~ N(β̃, Σ̃_β) and (Σ_e⁻¹ | β, y) ~ W(T + ν₀, Σ̃), where β̃ = Σ̃_β (Σ̄_β⁻¹ β̄ + Σ_t x_t′ Σ_e⁻¹ y_t), Σ̃_β = (Σ̄_β⁻¹ + Σ_t x_t′ Σ_e⁻¹ x_t)⁻¹, and Σ̃ = (Σ̄⁻¹ + Σ_t (y_t − x_t β_ols)(y_t − x_t β_ols)′)⁻¹; here (β̄, Σ̄_β) are the prior mean and variance of β, Σ̄ is the scale matrix of the prior for Σ_e⁻¹, and β_ols is the OLS estimator of β.

• Use β and Σ_e⁻¹ as two Gibbs-sampler blocks. When L is large, we obtain a sample such that β^L ~ g(β | y₁, ..., y_T), (Σ_e⁻¹)^L ~ g(Σ_e⁻¹ | y₁, ..., y_T), and (β^L, (Σ_e⁻¹)^L) is a draw from g(β, Σ_e⁻¹ | y).


3.3.2 Metropolis-Hastings algorithm

MH is a general-purpose MCMC algorithm that can be used when the Gibbs sampler is either not usable or difficult to implement.

Start from an arbitrary transition function q(α†, α^{l−1}), where α^{l−1}, α† ∈ A, and an arbitrary α⁰ ∈ A. For each l = 1, 2, ..., L:

- Draw α† from q(α†, α^{l−1}) and draw ϖ ~ U(0, 1).
- If ϖ < E(α^{l−1}, α†) = [ḡ(α†|y) q(α†, α^{l−1})] / [ḡ(α^{l−1}|y) q(α^{l−1}, α†)], set α^l = α†.
- Else set α^l = α^{l−1}.

These iterations define a mixture of continuous and discrete transitions:

P(α^{l−1}, α^l) = q(α^{l−1}, α^l) E(α^{l−1}, α^l)            if α^l ≠ α^{l−1}
P(α^{l−1}, α^l) = 1 − ∫_A q(α^{l−1}, α) E(α^{l−1}, α) dα     if α^l = α^{l−1}   (8)


P(α^{l−1}, α^l) satisfies the conditions needed for existence, uniqueness and convergence.

• Idea: we want to sample from the highest-probability region, but we also want to visit as much of the parameter space as possible. How? Choose an initial vector and a candidate, and compute the posterior kernel at the two vectors. If the move is uphill, keep the draw; otherwise, keep the draw with some probability.


If q(α^{l−1}, α†) = q(α†, α^{l−1}) (the Metropolis version of the algorithm), E(α^{l−1}, α†) = ḡ(α†|y)/ḡ(α^{l−1}|y).

If E(α^{l−1}, α†) > 1, the chain moves to α†. Hence, keep the draw if you move uphill. If the draw moves you downhill, stay at α^{l−1} with probability 1 − E(α^{l−1}, α†), and explore new areas with probability equal to E(α^{l−1}, α†).

Important: q(α^{l−1}, α†) is not necessarily equal (or proportional) to the posterior, so histograms of the draws do not equal the posterior. This is why we use a scheme which accepts more in the regions of high probability.
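The accept/reject rule as a random-walk Metropolis sketch; the t(0, 1, 2) target kernel and the proposal scale σ_v = 2 are assumptions for illustration.

```python
import numpy as np

# Random-walk Metropolis: candidate = current + N(0, sigma_v^2) noise,
# so q is symmetric and E reduces to kernel(candidate)/kernel(current).
rng = np.random.default_rng(4)
kernel = lambda a: (1.0 + a**2 / 2.0) ** -1.5   # t(0,1,2) kernel

alpha, sigma_v = 0.0, 2.0
draws, accepted = [], 0
for _ in range(20_000):
    cand = alpha + rng.normal(0.0, sigma_v)     # symmetric proposal
    if rng.uniform() < kernel(cand) / kernel(alpha):
        alpha, accepted = cand, accepted + 1    # uphill, or lucky downhill
    draws.append(alpha)                         # rejected moves repeat alpha

draws = np.array(draws[2_000:])                 # drop burn-in
accept_rate = accepted / 20_000
```

Tuning σ_v trades off how far each move travels in the parameter space against how often it is rejected.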


How do you choose q(θ^{l-1}, θ†) (the transition density)?

- Typical choice: random walk chain. q(θ†, θ^{l-1}) = q(θ† − θ^{l-1}) and θ† = θ^{l-1} + v, where v ~ N(0, σ²_v). To get "reasonable" acceptance rates, adjust σ²_v. Often σ²_v = c·Σ̄ with Σ̄ = [−ğ''(θ̄|y)]^{-1}, the inverse Hessian at the mode θ̄; choose c.

- Reflecting random walk: θ† = θ̄ − (θ^{l-1} − θ̄) + v.

- Independent chain: q(θ†, θ^{l-1}) = q̄(θ†) and E(θ^{l-1}, θ†) = min[w(θ†)/w(θ^{l-1}), 1], where w(θ) = ğ(θ|Y)/q̄(θ). Monitor both the location and the shape of q̄ to get reasonable acceptance rates. Standard choices for q̄ are normal and t densities.
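The uphill/downhill logic above can be sketched with a random-walk Metropolis sampler for a toy scalar target: a standard normal stands in for the posterior kernel ğ(θ|Y), and all names and tuning values are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_kernel(theta):
    # log posterior kernel of a N(0,1) target (toy stand-in for g(theta|Y))
    return -0.5 * theta ** 2

def rw_metropolis(n_draws, sigma_v, theta0=0.0):
    draws = np.empty(n_draws)
    theta, n_accept = theta0, 0
    for l in range(n_draws):
        cand = theta + sigma_v * rng.standard_normal()   # candidate = theta + v
        # symmetric proposal, so E = min[g(cand)/g(theta), 1]; compare on log scale
        if np.log(rng.uniform()) <= log_kernel(cand) - log_kernel(theta):
            theta, n_accept = cand, n_accept + 1         # move uphill (or lucky downhill)
        draws[l] = theta                                  # else stay at theta^{l-1}
    return draws, n_accept / n_draws

draws, acc = rw_metropolis(20000, sigma_v=2.5)
```

Scaling σ_v trades off step size against the acceptance rate, exactly as in point c) of the next slide.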


• General rule for selecting q. A good q must:

a) be easy to sample from;

b) make E easy to compute;

c) move a reasonable distance in the parameter space at each step without rejecting too frequently (ideal rejection rate: 30-50%).


Implementation issues

A) How to draw samples?

- Produce one sample of dimension n·L + L̄. Throw away the initial L̄ observations and keep only the elements (L, 2L, ..., n·L) (to eliminate the serial correlation of the draws).

- Produce n samples of L̄ + L elements each. Use the last L observations of each sample for inference.

B) How long should L̄ be? How do you check convergence?

- Start from different θ^0. Check whether the samples you keep, for a given L̄, have the same properties (Dynare approach).

- Choose two points, L̄_1 < L̄_2; compute distributions/moments of θ after these points. If they are visually similar, the algorithm has converged at L̄_1. Doing this recursively leads to the CUMSUM statistic for the mean, variance, etc. (it checks whether the statistic settles down; no formal testing required).

For simple problems L̄ ≈ 50 and L ≈ 200. For DSGEs L̄ ≈ 100,000-200,000 and L ≈ 500,000. If multiple modes are present, L could be even larger.
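A minimal sketch of the CUMSUM-type check: an AR(1) chain is a hypothetical stand-in for MCMC output, and the thresholds and sample sizes are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for MCMC output: a stationary AR(1) chain with mean 2,
# started from a deliberately bad value so the burn-in matters
n = 50000
chain = np.empty(n)
chain[0] = 10.0
for t in range(1, n):
    chain[t] = 0.2 + 0.9 * chain[t - 1] + rng.standard_normal()

# CUMSUM-type statistic: the recursive mean of the draws
rec_mean = np.cumsum(chain) / np.arange(1, n + 1)

# convergence check: the recursive mean settles down (stops drifting)
drift = abs(rec_mean[-1] - rec_mean[n // 2])
```

The same recursion can be run for variances or any other moment of interest.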


C) Inference: easy.

- Weak Law of Large Numbers: E(h(θ)) ≈ (1/n) Σ_{j=1}^{n} h(θ^{jL}), where θ^{jL} is the (j·L)-th observation drawn after L̄ iterations are performed.

- E(h(θ)h(θ)') = Σ_{τ=−J(L)}^{J(L)} w(τ) ACF_h(τ), where ACF_h(τ) is the autocovariance of h(θ) for draws separated by τ periods, J(L) is a function of L, and w(τ) a set of weights.

- Marginal density from the draws (θ^1_k, ..., θ^L_k): g(θ_k|y) ≈ (1/L) Σ_{j=1}^{L} g(θ_k | y, θ^j_{k'}, k' ≠ k).

- Predictive inference: f(y_{t+τ}|y_t) = ∫ f(y_{t+τ}|y_t, θ) g(θ|y_t) dθ.

- Model comparisons: compute the marginal likelihood numerically.


Example 4 Consider a joint bivariate normal distribution for z = (x, y) with mean z̄ = (1, 2), variance 1 and covariance 0.8. A scatter plot (using 4000 draws) from this bivariate distribution is in the first box of the left column of the figure: the ellipsoids are very thin and positively inclined.

Use a MH algorithm with reflecting random walk generating density z† = z̄ − (z^{l-1} − z̄) + v, where v is uniformly distributed on the interval [−0.5, 0.5] for both coordinates. The probability of accepting the draw is min( exp[−0.5 (z† − z̄)' Σ^{-1} (z† − z̄)] / exp[−0.5 (z^{l-1} − z̄)' Σ^{-1} (z^{l-1} − z̄)], 1).

Use a Gibbs sampler based on (x|y) ~ N(1 + ρ(y − 2), 1 − ρ²), (y|x) ~ N(2 + ρ(x − 1), 1 − ρ²), where ρ is the correlation coefficient.

Both work. The Gibbs sampler does slightly better (the acceptance rate of the MH algorithm is 35%); g(x) is calculated by averaging over values of y.
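Example 4's Gibbs sampler can be coded directly from the two conditionals. A sketch, with illustrative starting point and number of draws:

```python
import numpy as np

rng = np.random.default_rng(0)

rho, n_draws, burn = 0.8, 4000, 500
x, y = 0.0, 0.0                        # arbitrary starting point
sd = np.sqrt(1 - rho ** 2)             # conditional standard deviation
draws = np.empty((n_draws, 2))
for l in range(n_draws):
    x = 1 + rho * (y - 2) + sd * rng.standard_normal()   # draw x | y
    y = 2 + rho * (x - 1) + sd * rng.standard_normal()   # draw y | x
    draws[l] = x, y

means = draws[burn:].mean(axis=0)      # should approach (1, 2)
corr = np.corrcoef(draws[burn:].T)[0, 1]
```

Every sweep is accepted by construction, which is why Gibbs sampling edges out the MH chain here.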


4 Why BVAR?

- VARs have lots of parameters to estimate. When they are used for forecasting, their performance is poor.

- Even when they are used for structural analysis, parameter uncertainty is a concern.

- It is impossible to incorporate the prior views of a client into classical analysis.

- BVARs are a flexible way to incorporate extraneous (client) information. They can also help reduce the dimensionality of the parameter space.


4.1 Likelihood function of an M variable VAR(q)

Consider an M × 1 VAR model with q lags in each equation (k = Mq coefficients per equation, Mk coefficients in total), no constant:

y_t = B(L) y_{t-1} + e_t    e_t ~ N(0, Σ_e)

Letting B = [B_1, ..., B_q], X_t = [y_{t-1}, ..., y_{t-q}], α = vec(B), the VAR is:

y = (I_M ⊗ X) α + e    e ~ (0, Σ_e ⊗ I_T)    (9)

where y, e are MT × 1 vectors, I_M is the identity matrix, and α is an Mk × 1 vector. Conditioning on the initial observations y^p = [y_{-1}, ..., y_{-q}]:

L(α, Σ_e | y, y^p) = (2π)^{-0.5MT} |Σ_e ⊗ I_T|^{-0.5} exp{−0.5 (y − (I_M ⊗ X)α)' (Σ_e^{-1} ⊗ I_T) (y − (I_M ⊗ X)α)}

Some manipulations of the likelihood function:

(y − (I_M ⊗ X)α)' (Σ_e^{-1} ⊗ I_T) (y − (I_M ⊗ X)α) =
[(Σ_e^{-0.5} ⊗ I_T)(y − (I_M ⊗ X)α)]' [(Σ_e^{-0.5} ⊗ I_T)(y − (I_M ⊗ X)α)] =


[(Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α]' [(Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α]

Also (Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α = (Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α̂ + (Σ_e^{-0.5} ⊗ X)(α̂ − α), where α̂ = (Σ_e^{-1} ⊗ X'X)^{-1}(Σ_e^{-1} ⊗ X')y. Therefore:

(y − (I_M ⊗ X)α)' (Σ_e^{-1} ⊗ I_T) (y − (I_M ⊗ X)α) =
((Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α̂)' ((Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α̂) + (α − α̂)' (Σ_e^{-1} ⊗ X'X) (α − α̂)


Putting the pieces together:

L(α, Σ_e) ∝ |Σ_e ⊗ I_T|^{-0.5} exp{−0.5 (α − α̂)' (Σ_e^{-1} ⊗ X'X) (α − α̂)
    − 0.5 [(Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α̂]' [(Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α̂]}
= |Σ_e|^{-0.5k} exp{−0.5 (α − α̂)' (Σ_e^{-1} ⊗ X'X) (α − α̂)}
    × |Σ_e|^{-0.5(T−k)} exp{−0.5 tr[((Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α̂)' ((Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α̂)]}
∝ N(α | α̂, Σ_e, X, y, y^p) × iW(Σ_e | α̂, X, y, y^p, T − ν)    (10)

where tr denotes the trace of a matrix.


• The conditional likelihood of a VAR(q) is the product of a Normal density for α, conditional on α̂ and Σ_e, and an inverted Wishart density for Σ_e, conditional on α̂, with scale (y − (I_M ⊗ X)α̂)'(y − (I_M ⊗ X)α̂) and (T − ν) degrees of freedom, where ν = k + M + 1.

• Bayesian inference: combine the likelihood with a prior.

i) If the prior is conjugate and the hyperparameters (the parameters of the prior) are known (or estimated): closed-form solutions for the conditional and marginal posteriors of α and the marginal of Σ_e are available.

ii) If the hyperparameters are random, numerical MC methods are needed to get conditional and marginal distributions, even if the prior is conjugate.

Which priors are conjugate with (10)?


5 Conjugate priors for VARs

1. Diffuse prior for both α and Σ_e.

2. Normal prior for α with Σ_e fixed.

3. Normal prior for α, diffuse prior for Σ_e (semi-conjugate).

4. Normal for α|Σ_e, inverted Wishart for Σ_e (conjugate).


Case 1: p(α, Σ_e) ∝ |Σ_e|^{-0.5(M+1)}. This is typically called Jeffreys' (flat) prior.

Joint posterior: p(α, Σ_e|Y) ∝ L(α, Σ_e|Y) p(α, Σ_e).

The posterior is similar to the likelihood: there is only an extra term in the normalizing constant. Thus

p(α|Σ_e, Y) = N(α | α̂, Σ_e, X, y, y^p)    (11)

p(Σ_e|Y) = iW(Σ_e | α̂, X, y, y^p, T − k)    (12)

where k is the number of parameters in each equation.

Note: g(α|y) (the marginal of α) is a t-distribution with parameters ((X'X), (y − XB̂)'(y − XB̂), B̂, T − k), where B̂ = (X'X)^{-1}(X'Y) and α̂ = vec(B̂).


• If the prior is diffuse, the mean of the posterior is the OLS estimator. Classical analysis is equivalent to Bayesian analysis with a flat prior and a quadratic loss function (so that the posterior mean is the optimal point estimator).

• Simulations can be done in two steps:

1. Draw Σ_e from the posterior inverted Wishart.

2. Conditional on the draw for Σ_e, draw α from a multivariate normal.

Alternatively:

1'. Draw α from a multivariate t-distribution.
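The two-step simulation can be sketched for a toy bivariate VAR(1); the inverted Wishart draw is obtained by inverting a Wishart draw, and all data, dimensions and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bivariate VAR(1): M = 2, q = 1, so k = Mq = 2
T, M, k = 200, 2, 2
B_true = np.array([[0.5, 0.1],
                   [0.0, 0.3]])
Y = np.zeros((T + 1, M))
for t in range(T):
    Y[t + 1] = Y[t] @ B_true.T + rng.standard_normal(M)
X, y = Y[:-1], Y[1:]

# OLS quantities entering (11)-(12)
B_ols = np.linalg.solve(X.T @ X, X.T @ y)                 # k x M
S = (y - X @ B_ols).T @ (y - X @ B_ols)                   # iW scale matrix
XtX_inv = np.linalg.inv(X.T @ X)

def draw_inverse_wishart(scale, df):
    # Sigma ~ iW(scale, df)  <=>  Sigma^{-1} ~ Wishart(df, scale^{-1})
    G = rng.multivariate_normal(np.zeros(scale.shape[0]),
                                np.linalg.inv(scale), size=df)
    return np.linalg.inv(G.T @ G)

def posterior_draw():
    Sigma = draw_inverse_wishart(S, T - k)                # step 1: Sigma_e | Y
    V = np.kron(Sigma, XtX_inv)                           # step 2: alpha | Sigma_e, Y
    return rng.multivariate_normal(B_ols.T.ravel(), V)    # ~ N(vec(B_ols), Sigma kron (X'X)^-1)

alphas = np.array([posterior_draw() for _ in range(500)])
```

The draws are centered at the OLS estimate, consistent with the flat-prior result above.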


Case 2: α = ᾱ + v, v ~ N(0, Σ_b), where ᾱ, Σ_b are known.

Then the prior is:

g(α) ∝ |Σ_b|^{-0.5} exp[−0.5 (α − ᾱ)' Σ_b^{-1} (α − ᾱ)]
     = |Σ_b|^{-0.5} exp[−0.5 (Σ_b^{-0.5}(α − ᾱ))' (Σ_b^{-0.5}(α − ᾱ))]    (13)


Posterior:

g(α|y) ∝ g(α) L(α|y)
∝ |Σ_b|^{-0.5} exp{−0.5 (Σ_b^{-0.5}(α − ᾱ))' (Σ_b^{-0.5}(α − ᾱ))} × |Σ_e ⊗ I_T|^{-0.5}
    × exp{−0.5 ((Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α)' ((Σ_e^{-0.5} ⊗ I_T)y − (Σ_e^{-0.5} ⊗ X)α)}
∝ exp{−0.5 (z − Zα)'(z − Zα)}
= exp{−0.5 [(α − α̃)' Z'Z (α − α̃) + (z − Zα̃)'(z − Zα̃)]}    (14)

where z = [Σ_b^{-0.5}ᾱ; (Σ_e^{-0.5} ⊗ I_T)y]', Z = [Σ_b^{-0.5}; (Σ_e^{-0.5} ⊗ X)]' and

α̃ = (Z'Z)^{-1}(Z'z) = [Σ_b^{-1} + (Σ_e^{-1} ⊗ X'X)]^{-1} [Σ_b^{-1}ᾱ + (Σ_e^{-1} ⊗ X)'y]    (15)


Since Σ_e and Σ_b are fixed, the second term in (14) is a constant and

g(α|y) ∝ exp[−0.5 (α − α̃)' Z'Z (α − α̃)]    (16)

       ∝ exp[−0.5 (α − α̃)' Σ̃_b^{-1} (α − α̃)]    (17)

Conclusion: g(α|y) is N(α̃, Σ̃_b), where Σ̃_b = [Σ_b^{-1} + (Σ_e^{-1} ⊗ X'X)]^{-1}.

- If Σ_e is unknown, use Σ̂_e = (1/T) Σ_t ê_t ê_t' in the formulas, where ê_t = y_t − (I ⊗ X_t)α̂ and α̂ = α_OLS.

- The α̃ obtained with this prior is related to the classical least squares estimator under uncertain linear restrictions.
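A scalar sketch of (15)/(17): as the prior variance grows the posterior mean approaches OLS, and as it shrinks it collapses to the prior mean. Numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar regression keeps the algebra of (15) transparent
T, sigma2 = 100, 0.25                  # sigma2 plays the role of Sigma_e, treated as known
x = rng.standard_normal((T, 1))
y = 0.7 * x[:, 0] + np.sqrt(sigma2) * rng.standard_normal(T)

beta_bar = np.array([0.0])             # prior mean (alpha_bar)

def posterior_mean(prior_var):
    # beta~ = [Sb^-1 + (1/sigma2) X'X]^-1 [Sb^-1 beta_bar + (1/sigma2) X'y]
    Sb_inv = np.array([[1.0 / prior_var]])
    A = Sb_inv + x.T @ x / sigma2
    b = Sb_inv @ beta_bar + x.T @ y / sigma2
    return np.linalg.solve(A, b)

beta_ols = np.linalg.solve(x.T @ x, x.T @ y)
```

The posterior mean is a precision-weighted average of the prior mean and the OLS estimate.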


Model:

y_t = x_t B + e_t    e_t ~ (0, σ²)
B̄ = B − η    η ~ (0, Σ_b)    (18)

where B = [B_1, ..., B_q]', x_t = [y_{t-1}, ..., y_{t-q}]. Set z_t = [y_t; B̄]', Z_t = [x_t; I]', E_t = [e_t; η]'.

Then z_t = Z_t B + E_t, where E_t ~ (0, Σ_E), Σ_E known, t = 1, ..., T. Thus:

B_GLS = (Z'Σ_E^{-1}Z)^{-1}(Z'Σ_E^{-1}z) = B̃ (Theil's mixed estimator).

• The prior on the VAR coefficients can be treated as dummy observations added to the system of VAR equations.

• The prior can also be treated as an initial condition. If we write the initial observation as y_0 = x_0 B + e_0, then y_0 = σW^{-1}B̄, x_0 = σW^{-1}, e_0 = −σW^{-1}η, with WW' = Σ_b.
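The dummy-observation logic can be checked numerically: appending the prior as extra observations and running GLS reproduces the posterior-mean formula. A sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)

T, k = 80, 3
X = rng.standard_normal((T, k))
B_true = np.array([0.5, -0.2, 0.1])
y = X @ B_true + rng.standard_normal(T)

sigma2_e, sigma2_b = 1.0, 0.5          # known error variance and prior variance
B_bar = np.zeros(k)                    # prior mean

# Treat the prior as k dummy observations: z = [y; B_bar], Z = [X; I]
z = np.concatenate([y, B_bar])
Z = np.vstack([X, np.eye(k)])
w = np.concatenate([np.full(T, 1 / sigma2_e), np.full(k, 1 / sigma2_b)])

# Theil's mixed estimator: GLS on the augmented system
B_gls = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * z))

# The same number from the posterior-mean formula (15)
B_post = np.linalg.solve(np.eye(k) / sigma2_b + X.T @ X / sigma2_e,
                         B_bar / sigma2_b + X.T @ y / sigma2_e)
```

The two routes give identical answers, which is the point of the dummy-observation interpretation.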


Special case 1: Ridge estimator

Consider a univariate model. If B̄ = 0, Σ_e = I·σ²_e, Σ_b = I·σ²_v, then

B̃ = (I_q + λ(X'X)^{-1})^{-1} B̂    (19)

where λ = σ²_e/σ²_v and B̂ = (X'X)^{-1}(X'Y).

- The prior reflects the belief that all the coefficients of an AR(q) are small.

- The posterior estimator increases the smallest eigenvalues of the data matrix by a factor λ (useful when q is large and the (X'X) matrix is ill-conditioned).
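A sketch of (19), using the algebraically equivalent form B̃ = (X'X + λI)^{-1}X'y (data and λ are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

T, q = 60, 4
X = rng.standard_normal((T, q))
y = X @ np.array([0.3, 0.0, -0.1, 0.0]) + rng.standard_normal(T)

B_ols = np.linalg.solve(X.T @ X, X.T @ y)

def ridge(lam):
    # (19): B~ = (I_q + lam (X'X)^-1)^-1 B_ols, equivalently (X'X + lam I_q)^-1 X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(q), X.T @ y)
```

Setting λ = 0 recovers OLS; any λ > 0 shrinks the estimate toward the zero prior mean.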


Special case 2: Litterman (Minnesota) setup

Multivariate setup. Now ᾱ, Σ_b have a special structure: ᾱ = 0 except ᾱ_{i1} = 1, and Σ_b = Σ_b(φ), where:

σ_{ij,ℓ} = φ_0/h(ℓ)                       if i = j
        = (φ_0 φ_1/h(ℓ)) · (σ_i/σ_j)²     otherwise    (20)
        = φ_0 · φ_2                       for exogenous variables    (21)

φ_0 = tightness on the variance of the first lag; φ_1 = relative tightness on other variables; h(ℓ) = tightness of the variance of lags other than the first one (decay parameter); (σ_i/σ_j)² = scaling factor.

Typically, h(ℓ) is regulated by one (decay) parameter. Useful structures: harmonic decay h(ℓ) = ℓ^{φ_3}; geometric decay h(ℓ) = φ_3^{−ℓ+1}; linear decay h(ℓ) = ℓ.


[Figure: Minnesota prior densities for own-lag coefficients (lags 1, 2, 3) and other-lag coefficients (lags 1, 2, 3).]


Logic of this (shrinkage) prior:

- The mean is chosen so that, a priori, the VAR is M random walks (good for forecasting).

- Σ_b is very big. Decrease dimensionality by setting Σ_b = Σ_b(φ).

- Σ_b is a priori diagonal (no expected relationship among equations and coefficients); φ_0 is the relative importance of the prior with respect to the data.

- The variance of lags of the LHS variables shrinks to zero as the lag increases. The variance of lags of other RHS variables shrinks to zero at a different rate (governed by φ_1). φ_1 ≤ 1 captures the relative importance of other variables.

- The variance of the exogenous variables is regulated by φ_2. If φ_2 is large, prior information on the exogenous variables is diffuse.


Example 5 Bivariate VAR(2) with h(ℓ) = ℓ:

ᾱ = [1, 0, 0, 0, 0, 1, 0, 0]'

Σ_b = diag( φ_0,  φ_0 φ_1 (σ_1/σ_2)²,  φ_0/2,  (φ_0/2) φ_1 (σ_1/σ_2)²,
            φ_0 φ_1 (σ_2/σ_1)²,  φ_0,  (φ_0/2) φ_1 (σ_2/σ_1)²,  φ_0/2 )

where the first four entries refer to equation 1 (coefficients on y_{1,t-1}, y_{2,t-1}, y_{1,t-2}, y_{2,t-2}) and the last four to equation 2.
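The diagonal of Σ_b(φ) in (20) and Example 5 can be generated mechanically. A sketch: the function name and the ordering convention (equations first; within an equation, all variables at lag 1, then lag 2, ...) are assumptions for illustration.

```python
import numpy as np

def minnesota_variance(M, q, phi0, phi1, sigma, h=lambda l: l):
    # Diagonal Minnesota prior covariance for an M-variable VAR(q),
    # no exogenous variables, following (20) with decay function h
    v = []
    for i in range(M):                 # equation i
        for l in range(1, q + 1):      # lag l
            for j in range(M):         # coefficient on variable j
                if i == j:
                    v.append(phi0 / h(l))
                else:
                    v.append(phi0 * phi1 / h(l) * (sigma[i] / sigma[j]) ** 2)
    return np.diag(v)

# Example 5 with phi0 = 0.2, phi1 = 0.5, sigma1 = 1, sigma2 = 2, h(l) = l
Sb = minnesota_variance(M=2, q=2, phi0=0.2, phi1=0.5, sigma=[1.0, 2.0])
```

With these numbers the diagonal matches the pattern in Example 5 entry by entry.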


- If Σ_b is diagonal, φ_1 = 1 and the same variables belong to all equations, then α̃ = vec(α̃_i), where the α̃_i are computed equation by equation. In other setups Σ_b is not diagonal and this result does not hold.

- Let θ = (α, vech(Σ_b)). The Minnesota prior makes θ = θ(φ), with φ of small dimension. Better estimates of φ than of θ can be obtained from the data. Better forecasts than univariate ARIMA models or traditional multivariate SES (see e.g. Robertson and Tallman (1999)).

- Standard approaches: "unimportant" lags purged using t-tests (see e.g. Favero (2001)). Strong a priori restrictions on which variables and which lags enter the VAR. Unpalatable.

- The Minnesota prior instead imposes probability distributions on the VAR coefficients (uncertain linear restrictions). It gives a reasonable account of the uncertainty faced by an investigator.


• How do we choose φ = (φ_0, φ_1, φ_2, ...) and (σ_i/σ_j)²?

1) Use rules of thumb. Typical default values: φ_0 = 0.2, φ_1 = 0.5, φ_2 = 10^5, a harmonic specification for h(ℓ) with φ_3 = 1 or 2, implying a loose prior on lagged coefficients and an uninformative prior for the exogenous variables.

2) Estimate them using the ML-II approach. That is, maximize L(φ|y) = ∫ f(y|α, φ) g(α|φ) dα on a training sample.

3) Set up a prior g(φ) and produce hierarchical posterior estimates. For this we need MCMC methods.


Example 6 Consider y_t = B x_t + u_t, B scalar, u_t ~ N(0, σ²_u), σ²_u known, and let B = B̄ + η, where η ~ N(0, σ²_η), B̄ fixed and σ²_η = q(φ)², with φ a set of hyperparameters.

Then y_t = B̄ x_t + ν_t, where ν_t = u_t + η x_t, and the posterior kernel is:

ğ(B, φ|y) = (1/((2π)^{0.5} σ_u σ_η)) exp{−0.5 (y − Bx)²/σ²_u − 0.5 (B − B̄)²/σ²_η}    (22)

where y = [y_1, ..., y_T]', x = [x_1, ..., x_T]'. Integrating B out of (22):

g̃(φ|y) = (1/(2π(q(φ)² tr(X'X) + σ²_u))^{0.5}) exp{−0.5 (y − B̄x)²/(σ²_u + q(φ)² tr(X'X))}    (23)

Maximize (23) using gradient or grid methods. Alternative: compute the prediction-error decomposition of ğ(φ|y) with the Kalman filter and find modal estimates of φ.
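A grid-search sketch of the ML-II step: with B integrated out, y ~ N(B̄x, σ²_u I + q² xx'), which is the exact-likelihood analogue of (23). Here B is pinned to a hypothetical realization so the example is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar model y_t = B x_t + u_t with B = B_bar + eta, eta ~ N(0, q^2);
# B is fixed at a hypothetical draw (q_true = 2) for reproducibility
T, sigma_u, B_bar, B = 50, 1.0, 0.0, 2.0
x = rng.standard_normal(T)
y = B * x + sigma_u * rng.standard_normal(T)

def log_ml(q):
    # log marginal likelihood of y with B integrated out (constant dropped):
    # y ~ N(B_bar x, sigma_u^2 I + q^2 x x')
    cov = sigma_u ** 2 * np.eye(T) + q ** 2 * np.outer(x, x)
    r = y - B_bar * x
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + r @ np.linalg.solve(cov, r))

grid = np.linspace(0.1, 5.0, 100)
q_hat = grid[int(np.argmax([log_ml(q) for q in grid]))]
```

The grid maximizer lands close to the value of q that generated B, which is the logic behind estimating hyperparameters by ML-II.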


- Recent applications of this method:

i) Giannone, Primiceri, Lenza (2012): employ the marginal likelihood to choose the informativeness of the prior restrictions. Idea: α ~ N(ᾱ, λ Σ ⊗ Ω), where λ is a scalar, Σ the covariance matrix of the VAR shocks and Ω a known scale matrix. Problem: choose λ in an optimal way.

ii) Belmonte, Koop, Korobilis (2012): employ the marginal likelihood to choose the informativeness of the prior distribution for time variations in coefficients and in the variance.

iii) Carriero, Kapetanios, Marcellino (2011): employ the marginal likelihood to select the variance of the prior from a grid.


The hierarchical VAR model used in these cases is:

y = (I ⊗ X)α + e    e ~ N(0, Σ)    (24)
α = ᾱ + v    v ~ N(0, λ Σ ⊗ Ω)    (25)
λ = λ̄ + η    η ~ N(0, σ̄_λ)    (26)

ᾱ, Ω, λ̄, σ̄_λ known (or estimable).

- Need to compute the joint posterior of (α, λ, Σ).

- Interest is in g(α|y, X, y^p) = ∫ g(α, λ, Σ|y, X, y^p) dλ dΣ.

- Typically it is impossible to compute g(α|y, X, y^p) analytically. One example where this is possible is in Canova (2007, chapter 9). Otherwise, use MCMC methods to get draws from this distribution.


Simulation

How do you sample from the posterior with case 2 priors?

Easy, since Σ_e is fixed. Thus:

1. Draw α from the normal posterior, keeping Σ_e fixed.


Example 7 (Forecasting inflation rates in Italy)

- Large changes in the persistence of inflation: the AR(1) coefficient is 0.85 in the 1980s and 0.48 in the 1990s.

- Which model to use? A univariate ARIMA; a VAR(4) with annualized three-month inflation, rent inflation and the unemployment rate; two trivariate BVAR(4)s (one with arbitrary hyperparameters (0.2, 1, 0.5); one with optimal ones, (0.15, 2.0, 1.0)). Report one-year-ahead Theil-U statistics.

Sample           ARIMA   VAR    BVAR1         BVAR2
1996:1-2000:4    1.04    1.47   1.09 (0.03)   0.97 (0.02)
1990:1-1995:4    0.99    1.24   1.04 (0.04)   0.94 (0.03)

- Inflation is difficult to forecast; the VAR does poorly, the BVARs do better.

- Results are robust to changes of the forecasting sample.


Results for other prior structures (Kadiyala and Karlsson (1997)):

Case 3): g(α, Σ_e) is Normal-diffuse, i.e. g(α) ~ N(ᾱ, Σ̄_b), ᾱ and Σ̄_b known, and g(Σ_e) ∝ |Σ_e|^{-0.5(M+1)}. This prior is semi-conjugate. The conditional posteriors have the same form as in case 2) (the moments are different), but the marginal posterior

g(α|y) ∝ exp{−0.5(α − α̃)'Σ̄_b^{-1}(α − α̃)} × |(y − XB)'(y − XB) + (B − B̂)'(X'X)(B − B̂)|^{-0.5T}

has an unknown format.

Case 4): g(α|Σ_e) ~ N(ᾱ, Σ_e ⊗ Ω) and g(Σ_e) ~ iW(Ω̄, ν̄). Then g(α|Σ_e, y) ~ N(α̃, Σ_e ⊗ Ω̃), g(Σ_e|y) ~ iW(Σ̃, T + ν̄), where Ω̃ = (Ω^{-1} + X'X)^{-1}; Σ̃ = B̂'X'XB̂ + B̄'Ω^{-1}B̄ + Ω̄ + (y − XB̂)'(y − XB̂) − B̃'(Ω^{-1} + X'X)B̃; B̃ = Ω̃(Ω^{-1}B̄ + X'XB̂) and α̃ = vec(B̃). The marginal of α is t(Ω̃^{-1}, Σ̃, B̃, T + ν̄).

- In cases 3)-4) there is posterior dependence among the equations (even with prior independence and φ_1 = 1).


• Any additional uncertain restrictions on the coefficients can be tacked onto the system in exactly the same way as in case 1).

i) Quasi-deterministic seasonality

Example 8 In quarterly data, a prior for a bivariate VAR(2) with 4 seasonal dummies has mean ᾱ = [1, 0, 0, 0, 0, 0, 0, 0 | 0, 0, 1, 0, 0, 0, 0, 0] and the block of Σ_a corresponding to the seasonal dummies has diagonal elements σ_dd = φ_0 φ_s, where φ_s is the tightness of the seasonal information (large φ_s means little prior information).


ii) Stochastic seasonality: there is a peak in the spectrum at ω_q = π/2 or π or both (quarterly data).

Let y_t = D(ℓ)e_t. If there is a peak at ω_q, then |D(ω_q)|² is large, or |B(ω_q)|² is small, where B(ℓ) = D(ℓ)^{-1}.

A small |B(ω_q)|² implies Σ_{j=1}^∞ B_j cos(jω_q) ≈ −1. This is a "sum-of-coefficients" restriction.

In a VAR model: 1 + Σ_{j=1}^∞ B_j cos(jω_q) ≈ 0, where the B_j are the AR coefficients of equation j (see Canova, 1992).

Set Rα = r + v, with r = [−1, ..., −1]' and R a 2 × Mk matrix. For quarterly data, if the first variable of the VAR displays seasonality at π/2 and π:


R = [  0  −1   0   1   0  −1  ...  0
      −1   1  −1   1  −1   1  ...  0 ]

Add these restrictions to the original prior. Use Theil's mixed estimator.

iii) Trend restrictions on variable i: Σ_{j=1}^∞ B_{ji} ≈ −1.

iv) Cyclical peak restriction: Σ_{j=1}^∞ B_{ji} cos(jω) ≈ −1 for all ω ∈ (2π/d ± ε), some d, ε small, i = 1, 2, ...

v) High coherence at frequency π/2 in series i and i' of a VAR implies Σ_{j=1}^∞ (−1)^j B_{i'i'}(2j) + Σ_{j=1}^∞ (−1)^j B_{ii}(2j) ≈ −2.


Some tips

- If the hyperparameters are treated as fixed, we need some sensitivity analysis. Rule-of-thumb parameters work well for forecasting. Do they work well in structural estimation?

- You can set the prior mean or prior variance as you wish (after all, this is a prior!). In all the cases we consider, the covariance matrix has a Kronecker product form (easy to compute).

- What are the gains from using fully hierarchical methods (relative to empirically based methods or rules of thumb)? Not much is known (see Giannone et al. (2012), Carriero et al. (2012)).


General Hierarchical Bayesian VARs

y_t = X_t α + e_t    e_t ~ N(0, Σ)    (27)
α = M_0 θ + v    v ~ N(0, D_0)    (28)
θ = M_1 μ + η    η ~ N(0, D_1)    (29)

M_0, M_1, D_1 known; X_t = (I ⊗ X_t). Priors: p(Σ) ~ iW(S̄, s); p(D_0) ~ iW(D̄_0, ν̄); p(μ) ∝ 1.

Conditional posteriors:

1) (α | ···, Y, X) ~ N(α̃, Ω̃)

2) (Σ | ···, Y, X) ~ iW(Σ̃, s + T)

3) (θ | ···, Y, X) ~ N(D̃_1(D_1^{-1}M_1μ + M_0'D_0^{-1}α), D̃_1)

4) (D_0 | ···, Y, X) ~ iW(D̃_0, ν̄ + 1)

5) (μ | ···, Y, X) ~ N(μ̄, Σ_μ)

where


Ω̃ = (D_0^{-1} + Σ_t X_t'Σ^{-1}X_t)^{-1}

α̃ = Ω̃(D_0^{-1}M_0θ + Σ_t X_t'Σ^{-1}y_t)

Σ̃ = S̄ + Σ_t (y_t − X_tα)(y_t − X_tα)'

D̃_1 = (D_1^{-1} + M_0'D_0^{-1}M_0)^{-1}

D̃_0 = D̄_0 + Σ_{g=1}^{M} (α_g − θ)(α_g − θ)'

μ̄ = (M_1'M_1)^{-1}(M_1'θ)

Σ_μ = (θ − M_1μ̄)'(θ − M_1μ̄)

Use these conditional posteriors in the Gibbs sampler (see later).


5.1 DSGE priors for VARs

The log-linearized solution of a DSGE model is:

y_{1t+1} = Φ(ϑ) y_{1t} + e_{t+1}    (30)
y_{2t} = Λ(ϑ) y_{1t}    (31)

where y_{1t} are the exogenous and endogenous states (d_1 × 1), y_{2t} are the endogenous controls (d_2 × 1), e_{t+1} are the innovations in the shocks, and Φ(ϑ), Λ(ϑ) are functions of ϑ, the structural parameters. Letting y_t = [y_{2t}, y_{1t}]', the system is:

[0 0; 0 I_{d1}] y_{t+1} = [−I_{d2} Λ(ϑ); 0 Φ(ϑ)] y_t + [0; e_{t+1}]    (32)

or

B_0 y_{t+1} = B_1(ϑ) y_t + u_{t+1}

• The (log-)linear DSGE solution is a restricted VAR!


Given g(ϑ), the model implies priors g(Φ(ϑ)) and g(Λ(ϑ)) for the decision-rule coefficients, and thus a prior for α = [B(ℓ)].

• A DSGE model implies restrictions on the VAR coefficients. It can be used to link (in a hierarchical fashion) the VAR coefficients α and the DSGE parameters ϑ.

Note: if ϑ ~ N(ϑ̄, Σ_ϑ), then vec(Φ(ϑ)) ≈ N(vec(Φ(ϑ̄)), [∂vec(Φ(ϑ))/∂ϑ'] Σ_ϑ [∂vec(Φ(ϑ))/∂ϑ']') and vec(Λ(ϑ)) ≈ N(vec(Λ(ϑ̄)), [∂vec(Λ(ϑ))/∂ϑ'] Σ_ϑ [∂vec(Λ(ϑ))/∂ϑ']').

Example 9 Consider a VAR(q): y_{t+1} = B(ℓ)y_t + u_t. From (32), g(B_1) is normal with mean B_0^G B_1(ϑ̄), where B_0^G is the generalized inverse of B_0, and with variance Σ_b = β_0^G Σ_{b1} β_0^{G'}, β_0^G = vec(B_0^G), where Σ_{b1} is the variance of vec(B_1(ϑ)). A DSGE prior on B_ℓ, ℓ ≥ 2, has a dogmatic form: mean zero and zero variance.


If there are unobservables, we want a prior for a VAR in the observables only.

Example 10 (RBC prior: Ingram and Whiteman, 1994). An RBC model with utility function U(c, n) = log(c_t) + log(1 − n_t) implies:

[K_{t+1}; ln A_{t+1}] = [a_kk a_ka; 0 ρ] [K_t; ln A_t] + [0; e_{t+1}] ≡ Φ [K_t; ln A_t] + u_{t+1}    (33)

[c_t, n_t, y_t, i_t]' = Λ [K_t; ln A_t]    (34)

K_t is the capital stock, A_t a technology disturbance; c_t is consumption, n_t hours, y_t output and i_t investment.

Here Φ and Λ are functions of ϑ = (α, β, δ, ρ): α, the share of labor in production; β, the discount factor; δ, the depreciation rate; ρ, the AR parameter of the technology shock. Let y_{1t} = [c_t, n_t, y_t, i_t]' and y_{2t} = [K_t, ln A_t]'.


A VAR for y_{1t} only is y_{1t} = H(ϑ) y_{1t-1} + η_{1t}, where H(ϑ) = Λ(ϑ)Φ(ϑ)(Λ(ϑ)'Λ(ϑ))^{-1}Λ(ϑ)', η_{1t} = Λ(ϑ)u_t, and (Λ(ϑ)'Λ(ϑ))^{-1}Λ(ϑ)' is the generalized inverse of Λ(ϑ).

If ϑ ~ N([0.58, 0.988, 0.025, 0.95]', diag(0.0006, 0.0005, 0.0006, 0.00015)), the model implies that the prior mean of H(ϑ) is

H̄(ϑ) = [ 0.19  0.33  0.13  −0.02
          0.45  0.67  0.29  −0.10
          0.49  1.32  0.40   0.17
          1.35  4.00  1.18   0.64 ]

(Note the substantial feedback from C, Y, N to I in the last row.)


The prior variance of H(ϑ̄) is Σ_H = [∂H/∂ϑ'] Σ_ϑ [∂H/∂ϑ']'.

• A Minnesota-style prior for y_{1t} consistent with the RBC model:

- Coefficients on y_{1t-1} ~ N(H̄(ϑ), φ_0 · Σ_H).

- Coefficients on y_{1t-j} ~ N(0, (φ_0/h(j)) · Σ_H), j > 1, where φ_0 is a tightness parameter and h(j) a decay function. Note that here φ_1 = 1.

• Move from statistical to economic priors.


Del Negro and Schorfheide (2004):

- A DSGE model provides more than a "form" for the prior restrictions (zero mean on lags greater than one, etc.). It gives quantitative information.

- Exploit the idea that the prior is an additional set of equations that can be appended to a model.

- Can make the DSGE prior more or less informative for the VAR depending on how much DSGE-simulated data is appended to the actual data.

- Set up a hierarchical model that allows us to compute the posterior of the DSGE and VAR parameters jointly.


Idea of the approach:

- Given ϑ, simulate data from the model. Append the simulated data to the actual data and estimate a VAR on the extended data set.

- Estimates of the VAR coefficients and of the covariance matrix will reflect both sample and model information. The weights are given by the precision of the two types of information.

- The precision of the data information depends on T (which is fixed). The precision of the simulated information depends on T_1, which can be chosen by the investigator. By varying λ = T_1/T, one can make the prior more or less informative and thus assess how important the model is for the data.

- The model imposes restrictions. If a large λ is optimal, the restrictions imposed by the model are not violated. If λ is small, the restrictions are violated (a test of the model).


• Let g(ϑ) = Π_{i=1}^{k} g(ϑ_i) be the prior on the DSGE parameters.

• The DSGE model implies a prior g(α|ϑ) ~ N(ᾱ(ϑ), Σ̄_b(ϑ)), Σ_e ~ iW(T_1 Σ̄(ϑ), T_1 − k), on the VAR representation of the decision rules, where

ᾱ(ϑ) = (X^{s'}X^s)^{-1}(X^{s'}y^s)
Σ̄_b(ϑ) = Σ_e(ϑ) ⊗ (T_1 X^{s'}X^s)^{-1}
Σ̄(ϑ) = y^{s'}y^s − (y^{s'}X^s)ᾱ(ϑ)    (35)

with y^s the simulated data, X^s the lags of the VAR in the simulated data, and T_1 the length of the simulated sample.

Let λ = T_1/T control the relative importance of the two types of information. As λ → 0 (λ → ∞) the actual (simulated) data dominates.

• The VAR implies a density f(α, Σ_e|y).


The model has a hierarchical structure: f(α, Σ_e|y) g(α|ϑ) g(Σ_e|ϑ) g(ϑ). Since the likelihood and the prior are conjugate (see the Normal-iW assumption above), the conditional posteriors of the VAR parameters are available in analytical form:

• g(α|ϑ, y, Σ_e) ~ N(α̃(ϑ), Σ̃_b(ϑ)); g(Σ_e|ϑ, y) ~ iW((T_1 + T)Σ̃(ϑ), T_1 + T − k), where

α̃(ϑ) = (T_1 X^{s'}X^s + X'X)^{-1}(T_1 X^{s'}y^s + X'y)
Σ̃_b(ϑ) = Σ_e(ϑ) ⊗ (T_1 X^{s'}X^s + X'X)^{-1}
Σ̃(ϑ) = [1/((1 + λ)T)] [(T_1 y^{s'}y^s + y'y) − (T_1 y^{s'}X^s + y'X)α̃(ϑ)]    (36)

• If we pick a ϑ, we can immediately construct these posteriors.
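The role of λ = T_1/T can be illustrated with scalar AR(1) "actual" and "model-simulated" data: as λ grows, the posterior mean (36) moves from the data-implied coefficient toward the model-implied one. All numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(rho, T):
    y = np.zeros(T + 1)
    for t in range(T):
        y[t + 1] = rho * y[t] + rng.standard_normal()
    return y[1:], y[:-1]               # (dependent variable, regressor)

# "actual" data: AR(1) with rho = 0.5; the "DSGE" says rho = 0.9
T = 200
y, X = simulate_ar1(0.5, T)

betas = {}
for lam in (0.05, 1.0, 20.0):
    T1 = int(lam * T)                  # length of the simulated sample
    ys, Xs = simulate_ar1(0.9, T1)     # data simulated from the model
    # posterior mean (36): OLS on the stacked actual + simulated data
    betas[lam] = (Xs @ ys + X @ y) / (Xs @ Xs + X @ X)
```

Small λ leaves the estimate near the data value; large λ pulls it toward the model value, which is why λ works as a measure of model fit.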


• g(ϑ|y) ∝ g(ϑ) × |Σ_e|^{-0.5(T−M−1)} exp{−0.5 tr[Σ_e^{-1}(Y − XB)'(Y − XB)]} × |Σ_e(ϑ)|^{-0.5(T_1−M−1)} exp{−0.5 tr[Σ_e(ϑ)^{-1}(Y^s − X^sB(ϑ))'(Y^s − X^sB(ϑ))]}. This conditional posterior is non-standard: a Metropolis-Hastings step is needed to sample from it.

- Use g(ϑ|y), g(α|ϑ, y, Σ_e) and g(Σ_e|ϑ, y) in the Gibbs sampler to obtain a marginal for α.

- All posterior moments in (36) are conditional on λ. How do we select it? i) Use rules of thumb (e.g. λ = 1, i.e. T observations added). ii) Maximize the marginal likelihood.


Example 11 (sticky price model) In a basic NK sticky price-sticky wage economy, set η = 0.66, π^ss = 1.005, N^ss = 0.33, c/gdp = 0.8, β = 0.99, ζ_p = ζ_w = 0.75, a₀ = 0, a₁ = 0.5, a₂ = −1.0, a₃ = 0.1. Run a VAR with output, interest rates, money and inflation using actual quarterly data from 1973:1 to 1993:4 and data simulated from the model conditional on these parameters. Overall, only a modest amount of simulated data (roughly 20 observations) should be used to set up a DSGE prior.

ML: sticky price-sticky wage model
λ = 0      λ = 0.1    λ = 0.25   λ = 0.5    λ = 1      λ = 2
-1228.08   -828.51    -693.49    -709.13    -913.51    -1424.61


6 Unconditional Forecasting: Fan Charts

Let the VAR be written in companion form:

Y_t = B Y_{t−1} + E_t   (37)

where Y_t and E_t are Mq × 1 vectors and B is an Mq × Mq matrix.

Repeatedly substituting: Y_t = B^τ Y_{t−τ} + Σ_{j=0}^{τ−1} B^j E_{t−j}, or

y_t = J B^τ Y_{t−τ} + Σ_{j=0}^{τ−1} J B^j J′ e_{t−j}   (38)

where J is such that JY_t = y_t, JE_t = e_t and J′JE_t = E_t.


• Unconditional point forecast for y_{t+τ}:

y_t(τ) = J B̃^τ Y_t   (39)

Use the posterior mean, median or mode B̃, depending on the loss function. Recall that if E_t is normal, mean, mode and median coincide.

The forecast error is y_{t+τ} − y_t(τ) = Σ_{j=0}^{τ−1} J B^j J′ e_{t+τ−j} + [J B^τ Y_t − J B̃^τ Y_t] (future shocks plus parameter uncertainty).
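A minimal numpy sketch of the companion-form point forecast in (37) and (39), assuming a bivariate VAR(2) with made-up coefficient matrices:

```python
import numpy as np

# Companion form of a bivariate VAR(2): Y_t = B Y_{t-1} + E_t, with
# J selecting the first M rows, so y_t(tau) = J B^tau Y_t as in (39).
M, q = 2, 2
B1 = np.array([[0.5, 0.1], [0.0, 0.4]])
B2 = np.array([[0.2, 0.0], [0.1, 0.1]])
B = np.zeros((M * q, M * q))
B[:M, :M] = B1
B[:M, M:] = B2
B[M:, :M] = np.eye(M)          # identity block shifts y_t down to the y_{t-1} slot

J = np.hstack([np.eye(M), np.zeros((M, M * (q - 1)))])

Yt = np.array([1.0, -0.5, 0.8, 0.2])   # [y_t; y_{t-1}] stacked

def forecast(tau):
    """Point forecast y_t(tau) = J B^tau Y_t."""
    return J @ np.linalg.matrix_power(B, tau) @ Yt

f1 = forecast(1)
# The one-step forecast equals B1 y_t + B2 y_{t-1} computed directly:
direct = B1 @ Yt[:M] + B2 @ Yt[M:]
```

The equality of `f1` and `direct` checks that the companion matrix was assembled correctly.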


• Unconditional probability distributions for forecasts (fan charts).

Algorithm 6.1 Assume α ~ N(α̃, Σ̃_b). Set P̃P̃′ = Σ̃_b.

- Draw a N(0, I) random vector v_t and set α^ℓ = α̃ + P̃v_t.

- Construct point forecasts y_t(τ), τ = 1, 2, ..., using α^ℓ.

- Repeat the previous steps L times.

- Construct distributions at each τ using kernel methods and extract percentiles (fan charts).

The algorithm can also be used for recursive forecast charts; the only difference is that α̃ and Σ̃_b depend on t (they are recursively estimated).
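Algorithm 6.1 can be sketched as follows for a univariate AR(1); the posterior moments α̃ and Σ̃_b are hypothetical numbers, not estimates from any actual model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fan chart sketch: alpha ~ N(alpha_tilde, sigma^2); draw alpha^l,
# iterate the forecast forward, collect percentiles per horizon.
alpha_tilde, sigma = 0.8, 0.05        # hypothetical posterior mean and std
y_T, L, H = 1.0, 5000, 8

paths = np.empty((L, H))
for l in range(L):
    a = alpha_tilde + sigma * rng.standard_normal()   # alpha^l = alpha_tilde + P v
    y = y_T
    for tau in range(H):
        y = a * y                                      # point forecast given alpha^l
        paths[l, tau] = y

fan = np.percentile(paths, [5, 50, 95], axis=0)        # percentile bands -> fan chart
```

Each row of `fan` traces one band of the chart across horizons τ = 1, ..., H.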


• "Average" τ-step ahead forecasts.

Construct f(y_{t+τ}|y_t) = ∫ f(y_{t+τ}|y_t, α) g(α|y_t) dα, where f(y_{t+τ}|y_t, α) is the conditional density of y_{t+τ} and g(α|y_t) the posterior of α.

- This can be calculated numerically: draw α^ℓ from g(α|y_t), compute f(y_{t+τ}|y_t, α^ℓ), and average over the y_{t+τ} paths.


- The above algorithm can also be used to calculate turning point probabilities:

i) A downturn at τ in y_t(τ) (typically, GDP) if y_t(τ−2) < y_t(τ−1) < y_t(τ) > y_t(τ+1) > y_t(τ+2).

ii) An upturn at τ in y_t(τ) if y_t(τ−2) > y_t(τ−1) > y_t(τ) < y_t(τ+1) < y_t(τ+2).

Implementation: draw α^ℓ, construct (y_t(τ))^ℓ, ℓ = 1, ..., L, and apply the above rules for each τ. The fraction of draws for which the condition is satisfied at each τ is an estimate of the probability of a downturn (upturn).
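A sketch of the implementation; here a hypothetical hump-shaped forecast path with noise stands in for draws from an actual BVAR posterior:

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo turning-point probabilities: for each draw, simulate a
# forecast path and check the peak/trough rules at each horizon.
def is_peak(p, t):          # rising into t, falling after
    return p[t-2] < p[t-1] < p[t] > p[t+1] > p[t+2]

def is_trough(p, t):        # falling into t, rising after
    return p[t-2] > p[t-1] > p[t] < p[t+1] < p[t+2]

L, H = 2000, 12
prob_peak = np.zeros(H)
for l in range(L):
    # hypothetical forecast path: hump-shaped deterministic part plus noise
    path = np.sin(np.arange(H) / 2.0) + 0.1 * rng.standard_normal(H)
    for t in range(2, H - 2):
        prob_peak[t] += is_peak(path, t)
prob_peak /= L   # fraction of draws satisfying the rule = estimated probability
```

With this path the deterministic peak sits near τ = 3, so the estimated probability should concentrate there.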


Example 12 Use a BVAR to construct one-year-ahead bands for inflation, recursively updating posterior estimates over 1995:4-1998:2.

- The bands are relatively tight; errors occur at the beginning. The distribution of one-year-ahead forecasts (based on 1995:4) is also tight.

- Sample 1996:1-2002:4: 4 downturns. Median of forecasted downturns: 3; Pr(n ≥ 3) = 0.9, Pr(n > 4) = 0.0.


[Figure: recursive forecasts; one-year-ahead bands for inflation (percent, 1997-1998; 2.5, 16, 84, 97.5 percentiles and actual data). Second panel: density of one-year-ahead forecasts based on data up to 1995:4.]


7 Structural BVARs

What kind of prior can be used for structural VARs?

B₀y_t − B(ℓ)y_{t−1} = e_t,  e_t ~ (0, I)   (40)
y_t − B̄(ℓ)y_{t−1} = u_t,  u_t ~ (0, Σ)   (41)

B(ℓ) = B₁ℓ + ... + B_qℓ^q, B₀ nonsingular, B̄(ℓ) = B₀^{-1}B(ℓ), Σ = B₀^{-1}(B₀^{-1})′.

- (40) is a structural system, while (41) is the corresponding VAR.

- Why do we want a prior for (40)?

i) We may have a-priori restrictions on the structural dynamics (output responses to a monetary shock have a hump).

ii) We may have a-priori restrictions on the structural impact effects (output responses to a monetary shock take time to materialize).


What priors can we use for B₀ and B(ℓ)? How do we draw from their posterior?

• Standard approach (Canova (1991), Gordon and Leeper (1994)): use a Normal-inverted Wishart prior for (B̄(ℓ), Σ). This implies a Normal-inverted Wishart posterior. Draw (B̄(ℓ)^l, Σ^l) and use the identification restrictions to obtain draws for B(ℓ), i.e. Σ^l = (B₀^{-1})^l((B₀^{-1})′)^l and B_j^l = B₀^l B̄_j^l.

- This procedure is fine if the model is just-identified. If it is overidentified, the above sampling scheme does not take the restrictions into account.

• Sims and Zha (1998) and Waggoner and Zha (2003) work directly with (40) (valid for both just-identified and over-identified systems). Stacking the observations in (40):


Y B₀ − X B₊ = E   (42)

where Y is T × M, X is a T × k matrix of lagged variables, and E is a T × M matrix. Setting Z = [Y, −X] and B = [B₀′, B₊′]′, the likelihood is:

L(B|y) ∝ |B₀|^T exp{−0.5 tr[(ZB)′(ZB)]} ∝ |B₀|^T exp{−0.5 b′(I ⊗ Z′Z)b}   (43)

where b = vec(B) is an M(k+M) × 1 vector, b₀ = vec(B₀) an M² × 1 vector, b₊ = vec(B₊) an Mk × 1 vector, and I an M × M identity matrix.

Suppose the priors are:

- g(b) = g(b₀)g(b₊|b₀).

- g(b₊|b₀) ~ N(h(b₀), Ψ(b₀)).

• Make the prior on the dynamics conditional on the prior for the contemporaneous effects.


Posterior kernel:

g(b|y) ∝ g(b₀)|B₀|^T |Ψ(b₀)|^{−0.5} exp{−0.5[b′(I ⊗ Z′Z)b + (b₊ − h(b₀))′Ψ(b₀)^{-1}(b₊ − h(b₀))]}   (44)

- Since b′(I ⊗ Z′Z)b = b₀′(I ⊗ Y′Y)b₀ + b₊′(I ⊗ X′X)b₊ − 2b₊′(I ⊗ X′Y)b₀, conditional on b₀ the quantity in the exponent is quadratic in b₊; thus

- g(b₊|b₀, y) ~ N(b̃₊, Ψ̃(b₀)^{-1}), where b̃₊ = ((I ⊗ X′X) + Ψ(b₀)^{-1})^{-1}((I ⊗ X′Y)b₀ + Ψ(b₀)^{-1}h(b₀)) and Ψ̃(b₀) = (I ⊗ X′X) + Ψ(b₀)^{-1}.

In general dim(b₊) = M(Mq + 1), so obtaining and drawing from g(b₊|b₀, y) is complicated.


- g(b₀|y) ∝ g(b₀)|B₀|^T |(I ⊗ X′X)Ψ(b₀) + I|^{−0.5} exp{−0.5[b₀′(I ⊗ Y′Y)b₀ + h(b₀)′Ψ(b₀)^{-1}h(b₀) − b̃₊′Ψ̃(b₀)b̃₊]}, where b̃₊ and Ψ̃(b₀) are the conditional posterior moments of b₊.

- g(b₀|y) has an unknown format even if g(b₀) is known (the |B₀|^T term makes the likelihood nonstandard).


To simplify the computation of g(b₊|y, b₀):

1) Set Ψ(b₀) = Ψ₁ ⊗ Ψ₂ and restrict Ψ₁ = φ · I. Then, even if φ_i² ≠ φ_j², independence across equations is guaranteed, since (I ⊗ X′X) + Ψ(b₀)^{-1} ∝ (I ⊗ X′X) + diag{φ₁², ..., φ_m²} = diag{φ₁² + X′X, ..., φ_m² + X′X} is block diagonal.

• We can then estimate the equations one by one, without worrying about simultaneity.


2) Structural Minnesota priors for b₊.

Given B₀, let y_t = B̄(ℓ)y_{t−1} + C + e_t and let α = vec[B̄₁, ..., B̄_q, C]. Since α = vec[B₊B₀^{-1}], E(α) = [I_m, 0, ..., 0] and var(α) = Σ_b imply

E(B₊|B₀) = [B₀, 0, ..., 0]
var(B₊|B₀) = diag(σ₊(i, j, ℓ)) = φ₀φ₁ h(ℓ)/σ_j,  i, j = 1, ..., m, ℓ = 1, ..., p
           = φ₀φ₂ otherwise   (45)

where i stands for the equation, j for the variable and ℓ for the lag.


(i) No distinction between own and other coefficients (in a simultaneous equation system there is no normalization with respect to one RHS variable).

(ii) The scale factors differ from the reduced-form Minnesota prior since var(e_t) = I.

(iii) The prior for the constant is independently parametrized.

(iv) Because α = vec[B₊B₀^{-1}], there is a-priori correlation in the coefficients across equations. For example, if φ_i² = φ² ∀i, g(α|B₀) is normal with covariance matrix Σ_e ⊗ φ² (see Kadiyala and Karlsson (1997)).
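A sketch of how the diagonal of a Minnesota-style var(B₊|B₀) could be assembled. The decay function h(ℓ) = ℓ^{−φ₁}, the scaling by σ_j, and the coefficient ordering are all assumptions made for illustration, not the exact parametrization in (45):

```python
import numpy as np

# Hypothetical settings: m equations, p lags; h(l) = 1/l**phi1 decays with
# the lag; sigma_j are per-variable scale statistics (e.g. residual stds
# from a training sample). The constant gets its own loose variance.
m, p = 3, 2
phi0, phi1, phi2 = 0.05, 1.0, 1000.0
sigma = np.array([1.0, 2.0, 0.5])

def prior_var(i, j, l):
    """Assumed prior variance of the lag-l coefficient on variable j in equation i."""
    return phi0 * (1.0 / l ** phi1) / sigma[j]

# Stack into the diagonal of var(B+ | B0); coefficients are ordered
# (lag-1 vars, ..., lag-p vars, constant) within each equation.
rows = []
for i in range(m):
    v = [prior_var(i, j, l) for l in range(1, p + 1) for j in range(m)]
    v.append(phi0 * phi2)    # constant term, independently parametrized
    rows.append(v)
V = np.diag(np.concatenate(rows))
```

The resulting matrix is diagonal, with variances shrinking in the lag length, as the prior intends.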


• Additional interesting restrictions:

- The average value of lagged y_i's (say ȳ_{0i}) is a good predictor of y_{it} for each equation. Then Y_d B₀ − X_d B₊ = V, where Y_d = {y_{ij}} = φ₃ȳ_{0i} if i = j and zero otherwise, i, j = 1, ..., M; X_d = {x_{is}} = φ₃ȳ_{0i} if i = j, s < k and zero otherwise, i = 1, ..., M, s = 1, ..., k.

Note that as φ₃ → ∞, this restriction implies a model in first differences.

- Initial dummy restriction: suppose Y_dc B₀ − X_dc B₊ = E, where Y_dc = {y_j} = φ₄ȳ_{0j}, j = 1, ..., M; X_dc = {x_s} = φ₄ȳ_{0j} if s < k − 1 and = φ₄ if s = k.

If φ₄ → ∞, the dummy observation becomes [I − B̄(1)]ȳ₀ + B₀^{-1}C = 0. If C ≠ 0, this implies cointegration.
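The two dummy-observation blocks can be sketched as artificial data rows to be appended to (Y, X) before estimation. The setup below (a VAR(1) with a constant, hypothetical φ₃, φ₄ and initial averages) is only an illustration of the bookkeeping:

```python
import numpy as np

# Sum-of-coefficients dummies for an M-variable VAR(1) with a constant:
# one artificial observation per variable, ybar0_i scaled by phi3,
# appearing on both sides; as phi3 grows the unit-root restriction binds.
M = 2
phi3, phi4 = 5.0, 5.0
ybar0 = np.array([1.2, 0.7])           # averages of the initial observations

Yd = phi3 * np.diag(ybar0)             # M x M block of left-hand dummies
Xd = np.hstack([phi3 * np.diag(ybar0), np.zeros((M, 1))])  # lag block + constant col

# Initial ("co-persistence") dummy: a single row with every variable at
# phi4 * ybar0 and phi4 in the constant column.
Ydc = phi4 * ybar0.reshape(1, -1)
Xdc = np.hstack([phi4 * ybar0.reshape(1, -1), [[phi4]]])

# Append (Yd, Xd) and (Ydc, Xdc) to the actual (Y, X) before estimating.
```

Stacking these rows under the data and running the usual posterior formulas is equivalent to imposing the soft restrictions.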


How do we choose g(b₀)?

- Need to make a distinction between soft and hard restrictions.

- Hard restrictions give you identification (possibly of blocks of equations). Soft restrictions are implied by the prior on the other parameters.

- Select the prior for the non-zero coefficients as non-informative, i.e. if b₀ⁿ are the non-zero elements of b₀, g(b₀ⁿ) ∝ 1, or normal.


Example 13 Suppose there are M(M−1)/2 restrictions, e.g. B₀ upper triangular. One option is to let g(b̄₀) be independent normal with zero mean, so that E(b̄₀(ij) b̄₀(kh)) = 0: no relationship across equations. The variance σ²(b̄₀(ij)) = (φ₅σ_i)², i.e. all the elements of equation i have the same variance.

The alternative is a Wishart prior for Σ_e^{-1}, i.e. g(Σ_e^{-1}) ~ W(ν̄, Σ̄), where ν̄ are the degrees of freedom and Σ̄ the scale. If ν̄ = M + 1 and Σ̄ = diag(φ₅σ_i)², the prior for b̄₀ is the same as before except for the Jacobian |∂Σ_e^{-1}/∂B₀| = 2^m ∏_{j=1}^m b_{jj}^j. Since the likelihood contains the term |B₀|^T = ∏_{j=1}^m b_{jj}^T, ignoring the Jacobian makes no difference if T ≫ m.


How do we draw samples from g(b₀|y)?

i) Normal approximation:

1. Calculate the mode of g(b₀|y) and the Hessian at the mode.

2. Draw b₀ from a normal centered at the mode with covariance equal to the inverse Hessian at the mode, or from a t-distribution with the same mean and covariance and ν̄ = M + 1 degrees of freedom.

3. Compute the importance ratio IR^l = g(b₀^l|y)/g^{AP}(b₀^l), where g^{AP} is the approximating density, and check the magnitude of IR^l over l = 1, ..., L.

ii) Restricted Gibbs sampling (Waggoner and Zha (2003)).

iii) Metropolis-Hastings step (Canova and Pérez Forero (2012)).
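The importance-ratio check in step 3 can be sketched as follows. The target kernel here is a made-up stand-in for g(b₀|y); the effective sample size summarizes whether a few weights dominate (a small value signals that the normal approximation is poor):

```python
import numpy as np

rng = np.random.default_rng(3)

# Importance sampling sketch: draw from a normal approximation, weight
# each draw by target kernel / proposal density, inspect the weights.
def log_target(b):          # hypothetical posterior kernel (N(0, 1.2^2) stand-in)
    return -0.5 * (b / 1.2) ** 2

mode, scale = 0.0, 1.0
L = 5000
draws = mode + scale * rng.standard_normal(L)
log_prop = -0.5 * ((draws - mode) / scale) ** 2 - np.log(scale * np.sqrt(2 * np.pi))
log_w = log_target(draws) - log_prop
w = np.exp(log_w - log_w.max())
w /= w.sum()                # self-normalized importance weights

ess = 1.0 / np.sum(w ** 2)  # effective sample size; small ess = a few huge weights
```

Posterior moments are then weighted averages, e.g. `(w * draws).sum()` estimates the posterior mean.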


Restricted Gibbs sampling

Write the VAR as

y_t′A = x_t′F + ε_t′,  ε_t ~ N(0, I)   (46)

where x_t′ = [y_{t−1}′, ..., y_{t−p}′, z_t′] is a 1 × k vector, z_t is an h × 1 vector of exogenous variables, F = [A₁′, ..., A_p′, D′]′ is a k × n matrix with k = np + h, and Σ = (AA′)^{-1}.

The identification restrictions can be written as

Q_i a_i = 0,  R_i f_i = 0,  i = 1, ..., n   (47)

where a_i, f_i are the i-th columns of A and F.

Let the prior on a_i and f_i be

a_i ~ N(0, S̄_i),  f_i|a_i ~ N(P̄_i a_i, H̄_i)   (48)

We want a prior distribution that combines the identification restrictions and (48). Let U_i be an n × q_i matrix whose columns form an orthonormal basis for the null space of Q_i


and V_i a matrix whose columns form an orthonormal basis for the null space of R_i. Then a_i, f_i satisfy (47) if they can be written as linear transformations of b_i, g_i:

a_i = U_i b_i,  f_i = V_i g_i   (49)

and the prior for b_i, g_i is

b_i ~ N(0, S̃_i),  g_i|b_i ~ N(P̃_i b_i, H̃_i)   (50)

where H̃_i = (V_i′H̄_i^{-1}V_i)^{-1}, P̃_i = H̃_i V_i′H̄_i^{-1}P̄_i U_i, and S̃_i = (U_i′S̄_i^{-1}U_i + U_i′P̄_i′H̄_i^{-1}P̄_i U_i − P̃_i′H̃_i^{-1}P̃_i)^{-1}.

The likelihood of b_i, g_i is proportional to

|det[U₁b₁, ..., U_n b_n]|^T exp(−0.5 Σ_{i=1}^n (b_i′U_i′Y′Y U_i b_i − 2g_i′V_i′X′Y U_i b_i + g_i′V_i′X′X V_i g_i))   (51)

The posterior can be written as p(b₁, ..., b_n|X, Y) ∏_{i=1}^n p(g_i|b_i, X, Y), where

p(b₁, ..., b_n|X, Y) ∝ |det[U₁b₁, ..., U_n b_n]|^T exp(−(T/2) Σ_{i=1}^n b_i′S_i^{-1}b_i)   (52)

p(g_i|b_i, X, Y) ~ N(P_i b_i, H_i)   (53)


where H_i = (V_i′X′X V_i + H̃_i^{-1})^{-1}, P_i = H_i(V_i′X′Y U_i + H̃_i^{-1}P̃_i), and S_i = ((1/T)(U_i′Y′Y U_i + S̃_i^{-1} + P̃_i′H̃_i^{-1}P̃_i − P_i′H_i^{-1}P_i))^{-1}.

As mentioned, (52) is non-standard. However, draws from this posterior can be obtained with the Gibbs sampler. In fact, for fixed 1 ≤ i* ≤ n, the posterior of b_{i*}, conditional on b_{−i*} (all the columns of b except the i*-th), is equivalent to drawing independently from a number of univariate normal distributions and one univariate Wishart distribution.

To do this, one needs to choose a matrix w whose columns form an orthonormal basis for R^{q_{i*}}. Then b_{i*} = T_{i*} Σ_{j=1}^{q_{i*}} β_j w_j, where T_{i*}T_{i*}′ = S_{i*} and

- β₁ ~ UW(T^{-1}, k + 1) (univariate Wishart)

- for 2 ≤ j ≤ q_{i*}, β_j ~ N(0, T^{-1}).

To implement the algorithm one needs to: a) draw the β's from the above distributions; b) construct w; c) recover b_i (and a_i) using the transformations above (see the package with computer programs).
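The transformation a_i = U_i b_i in (49) can be sketched with an SVD-based null-space basis; the restriction matrix Q_i below (forcing the second and third elements of a_i to zero in a 3-equation system) is hypothetical:

```python
import numpy as np

# Build U_i, an orthonormal basis for the null space of Q_i, via SVD, so
# that a_i = U_i b_i automatically satisfies Q_i a_i = 0 for any b_i.
Qi = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])

_, s, Vt = np.linalg.svd(Qi)
rank = int(np.sum(s > 1e-10))
Ui = Vt[rank:].T               # columns span null(Q_i); here a 3 x 1 matrix

bi = np.array([2.5])           # unrestricted coordinates
ai = Ui @ bi                   # restricted structural column
```

Any draw of b_i mapped through U_i satisfies the hard restrictions exactly, which is what makes the reparametrized Gibbs sampler work.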


MH algorithm

Consider first the static model:

A(θ)′y_t = ε_t,  ε_t ~ N(0, I)   (54)

y_t and ε_t are N × 1 vectors, A(θ) is a non-singular N × N matrix, θ is a vector of structural parameters, and y_t here stands for the VAR innovation y_t − B(ℓ)y_{t−1} = y_t − Fx_t.

The likelihood function is

L(y^T|θ) = (2π)^{−NT/2} det(A(θ)′)^T exp{−(1/2) Σ_{t=1}^T (A(θ)′y_t)′(A(θ)′y_t)}   (55)

• Again, because of the Jacobian det(A(θ))^T, the likelihood is highly non-linear in the θ parameters. Thus, the posterior of θ will be non-standard.


Reparameterization of the SVAR

Vectorizing (54) produces vec(A(θ)′y_t) = vec(ε_t) = ε_t. The model (54) can be expressed as:

ỹ_t = Z_t θ + ε_t   (56)

where ỹ_t ≡ (y_t′ ⊗ I_M)s_A, Z_t ≡ −(y_t′ ⊗ I_M)S_A and vec(A(θ)′) = S_A θ + s_A, with S_A a matrix and s_A a vector of zeros and ones.

The likelihood function of (56) is

L̃(y^T|θ) = (2π)^{−NT/2} (det D)^T exp{−(1/2) Σ_{t=1}^T [ỹ_t − Z_t θ]′[ỹ_t − Z_t θ]}   (57)

Since D ≡ ∂[vec(A(θ)′y_t)]/∂y_t′ = D_y + D_z, with vec(D_y) = s_A and vec(D_z) = S_A θ, we have vec(D) = vec(A(θ)), so that (57) and (55) are equivalent.


The proposal distribution and the MH algorithm

- From (56), a proposal distribution to be used in a Metropolis routine is obtained as follows.

- Get OLS estimates of θ in (56):

θ* = [Σ_{t=1}^T Z_t′Z_t]^{-1} [Σ_{t=1}^T Z_t′ỹ_t]   (58)

and of the covariance matrix

P*(θ*) = [Σ_{t=1}^T Z_t′(SSE)^{-1}Z_t]^{-1}   (59)

where SSE = Σ_{t=1}^T (ỹ_t − Z_t θ*)(ỹ_t − Z_t θ*)′.

• Set θ⁰ = θ* and for i = 1, 2, ..., G:

1) Draw a candidate z ~ p*(z|θ^{i−1}) = t(θ^{i−1}, rP*(θ^{i−1}), ν), where r > 0 is a constant and ν ≥ 4.


2) Compute α = [p̃(z|y^T) p*(θ^{i−1}|z)] / [p̃(θ^{i−1}|y^T) p*(z|θ^{i−1})], where p̃(·|y^T) = L̃(y^T|·)p(·) is the posterior kernel and θ^{i−1} the current draw. Draw u ~ U(0, 1). Set θ^i = z if u < ω and θ^i = θ^{i−1} otherwise, where ω = min{α, 1} and G is the total number of draws.

• The θ vector is jointly sampled.

• The covariance matrix of the proposal, P*(θ*), is non-diagonal.

• The Metropolis step is needed because the proposal does not take the Jacobian into account.
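Steps 1)-2) can be sketched for a scalar θ. The posterior kernel below is a made-up normal stand-in for L̃(y^T|θ)p(θ); because the t proposal is centered at the current draw it is symmetric, so the proposal densities cancel in the acceptance ratio:

```python
import numpy as np

rng = np.random.default_rng(4)

# MH sketch with a t proposal centered at the current draw.
def log_post(th):                     # hypothetical N(1, 0.5^2) posterior kernel
    return -0.5 * ((th - 1.0) / 0.5) ** 2

r, v, G = 0.1, 4, 20000
theta = 0.0
draws = np.empty(G)
acc = 0
for i in range(G):
    z = theta + np.sqrt(r) * rng.standard_t(v)     # candidate from t(theta, r, v)
    log_alpha = log_post(z) - log_post(theta)      # symmetric proposal cancels
    if np.log(rng.uniform()) < min(log_alpha, 0.0):
        theta, acc = z, acc + 1
    draws[i] = theta

accept_rate = acc / G
```

After a burn-in, the chain's mean and standard deviation should recover the stand-in posterior's moments.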


A numerical example

A(θ)′ = [ 1    0    0.5
          0.8  1    0
          0    0.5  1 ]   (60)

- Simulate data according to (54) for t = 1, ..., 500, re-parametrize the model as in (56), and estimate θ* and P* using (58) and (59).

- Let p(θ_i) ∝ 1, i = 1, 2, 3. Set G = 150,000, discard the first 100,000 draws, and keep 1 of every 100 of the remaining 50,000.

- The acceptance rate is 24%.


[Figure: posterior estimates of θ]


Nesting the MH step in a general Gibbs sampler

The model is

A(θ)′y_t = A(ℓ)y_{t−1} + η_t,  η_t ~ N(0, I)   (61)

- To run the Gibbs sampler we need the conditional posteriors of α = vec(A(ℓ)) and of θ (the parameters of A₀).

- If g(α) ~ N(ᾱ, Σ̄_α), then g(α|θ, y_t) ~ N(α̃, Σ̃_α), where α̃ and Σ̃_α have the usual weighted-average structure.

- If g(θ) is selected as above and candidates are generated with the MH step previously defined, g(θ|α, y_t) can be numerically computed.

- Put these two blocks into the Gibbs sampler and iterate.


What kind of identification restrictions are allowed?

1. Short-run linear restrictions. Suppose

A(θ)′ = [ 1   0   −θ₂
          θ₁  1   0
          0   θ₂  1 ]

The reparametrized model is:

vec(A(θ)′) = (1, θ₁, 0, 0, 1, θ₂, −θ₂, 0, 1)′ = S_A (θ₁, θ₂)′ + s_A

where S_A is the 9 × 2 matrix with rows (0,0), (1,0), (0,0), (0,0), (0,0), (0,1), (0,−1), (0,0), (0,0), and s_A = (1, 0, 0, 0, 1, 0, 0, 0, 1)′.
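The mapping vec(A(θ)′) = S_A θ + s_A for this example can be verified numerically (column-major vectorization):

```python
import numpy as np

# Verify the reparameterization for the short-run linear example above.
def A_prime(t1, t2):
    return np.array([[1.0, 0.0, -t2],
                     [t1, 1.0, 0.0],
                     [0.0, t2, 1.0]])

SA = np.zeros((9, 2))
SA[1, 0] = 1.0      # theta1 enters position 2 of vec(A')
SA[5, 1] = 1.0      # theta2 enters position 6
SA[6, 1] = -1.0     # -theta2 enters position 7
sA = np.eye(3).flatten(order="F")   # the fixed ones on the diagonal

theta = np.array([0.7, -0.3])
lhs = A_prime(*theta).flatten(order="F")   # vec stacks columns
rhs = SA @ theta + sA
```

`lhs` and `rhs` coincide for any θ, confirming that S_A and s_A encode the restrictions.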


2. Short-run non-linear restrictions. Suppose now

A(θ)′ = [ 1   0        θ₃
          θ₁  1        0
          0   (θ₂+1)²  1 ]   (62)

The model is re-parametrized as

vec(A(θ)′) = (1, θ₁, 0, 0, 1, (θ₂+1)², θ₃, 0, 1)′ = S_A F(θ) + s_A

where S_A is the 9 × 3 matrix with rows (0,0,0), (1,0,0), (0,0,0), (0,0,0), (0,0,0), (0,1,0), (0,0,1), (0,0,0), (0,0,0), F(θ) = (θ₁, (θ₂+1)², θ₃)′, and s_A = (1, 0, 0, 0, 1, 0, 0, 0, 1)′.

Define θ̃₂ ≡ (θ₂+1)² as a new parameter; the procedure then applies to the vector θ̃ = (θ₁, θ̃₂, θ₃), provided θ̃₂ > 0. Adding this restriction avoids the need to linearize F(θ).


Short-run non-linear restrictions that do not fit. Consider now a third example:

A(θ)′ = [ 1   0   θ₁θ₂ − 1
          θ₁  1   0
          0   θ₂  1 ]   (63)

The reparametrized model is

vec(A(θ)′) = (1, θ₁, 0, 0, 1, θ₂, θ₁θ₂ − 1, 0, 1)′ = S_A F(θ) + s_A

where S_A and s_A are as in the previous example and F(θ) = (θ₁, θ₂, θ₁θ₂ − 1)′.

Linearity is lost, since the third component of F(θ) depends on the other two. One needs the non-linear approach of Canova and Pérez Forero (2012).

• If a closed-form solution for combinations of θ is available, the procedure allows for both linear and non-linear restrictions.

3. Long-run restrictions

A(θ)′y_t = B y_{t−1} + ε_t,  ε_t ~ N(0, I_M)   (64)

Letting B̄ ≡ [A(θ)′]^{-1}B, the VAR is:

y_t = B̄ y_{t−1} + [A(θ)′]^{-1}ε_t   (65)

The (long-run) cumulative matrix is:

D ≡ (I_M − B̄)^{-1}[A(θ)′]^{-1}   (66)

Given draws of B̄ and θ, one can immediately construct D using (66) and check whether the required restrictions are satisfied.

For example, suppose

D = [ D₁₁  D₁₂  0
      0    D₂₂  D₂₃
      D₃₁  0    D₃₃ ]


This set of restrictions can be summarized as

R′ vec(D) = 0   (67)

with

R′ = [ 0 1 0 0 0 0 0 0 0
       0 0 0 0 0 1 0 0 0
       0 0 0 0 0 0 1 0 0 ]

From (65), setting ŷ_t ≡ y_t − B̄y_{t−1}, we have that

A(θ)′ ŷ_t = ε_t

where

A(θ)′ = [ 1   θ₃  θ₅
          θ₁  1   θ₆
          θ₂  θ₄  1 ]   (68)

• Draw B̄ from its posterior, then draw candidate θ's using the suggested reparameterization, and for each draw use an accept-reject step to make sure the long-run restrictions (67) are satisfied.
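The accept-reject check on (66)-(67) can be sketched as follows; the draws of B̄ and A(θ) below are hypothetical matrices chosen so that one satisfies the zero restrictions and one does not:

```python
import numpy as np

# Given draws of the reduced-form lag matrix Bbar and of A(theta), form
# D = (I - Bbar)^{-1} A^{-1} and keep the draw only if the restricted
# entries of vec(D) (positions 2, 6, 7, column-major) are zero.
M = 3
R = np.zeros((3, M * M))
R[0, 1] = R[1, 5] = R[2, 6] = 1.0      # selects D(2,1), D(3,2), D(1,3)

def D_of(Bbar, A):
    return np.linalg.solve(np.eye(M) - Bbar, np.linalg.inv(A))

def satisfies(D, tol=1e-8):
    return bool(np.all(np.abs(R @ D.flatten(order="F")) < tol))

Bbar = 0.1 * np.eye(M)                  # hypothetical stable draw

Ainv_ok = np.array([[1.0, 0.5, 0.0],    # inverse with the required zeros
                    [0.0, 1.0, 0.3],
                    [0.2, 0.0, 1.0]])
A_ok = np.linalg.inv(Ainv_ok)
A_bad = np.eye(M) + 0.2                 # dense inverse violates the zeros

keep_ok = satisfies(D_of(Bbar, A_ok))
keep_bad = satisfies(D_of(Bbar, A_bad))
```

Only draws with `keep_* == True` would be retained in the accept-reject step.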


• When partial multipliers or the structural lagged coefficients A₊ are restricted, the same acceptance/rejection framework can be used.

• Sign restrictions can be dealt with in the same way.


Example 14 (Transmission of monetary shocks) Use US data, 1960:1 to 2003:1, on log IP, log CPI, the Fed funds rate and log M2.

Overidentify the system: the central bank only looks at money when manipulating the nominal rate, i.e. the contemporaneous impact matrix is of Choleski form except that the (3,1) and (3,2) elements are zero.

g(b̄_i) ~ N(0, 1). Use importance sampling: draw from a normal centered at the mode and with dispersion equal to the inverse Hessian at the mode. Importance ratio: in 17 out of 1000 draws the weight is large.


[Figure: impulse responses to a monetary shock over a 20-quarter horizon, with bands: Real GDP, Price level, M2, Federal funds rate.]


• Both output and money persistently decline in response to an increase in interest rates. The response of prices is initially close to zero, but turns positive and significant after about 5 months (price puzzle?).

• Monetary shocks explain 4-18% of var(Y) at the 48-month horizon and 0-7% of var(P).


8 Heterogeneous dynamic panels

8.1 Bayesian pooling

- Often in cross-country studies we have only a few data points for a moderate number of countries.

- If dynamic heterogeneities are suspected, exact pooling of cross-sectional information leads to biases and inconsistencies.

- Is there any way to do partial cross-sectional pooling to improve over single-unit estimators?

- How do you compute "average" effects in dynamic models which are heterogeneous in the cross section?


• A simple univariate model to fix ideas:

y_it = X_it β_i + e_it,  e_it ~ iid(0, σ²I)   (69)

where X_it = [1, y_it−1, ..., y_it−p] and β_i = [a_i0, A_i1, A_i2, ..., A_ip]. Assume that T is short. Suppose

β_i = β̄ + v_i,  v_i ~ iid(0, Σ_v)   (70)

• The coefficients of the dynamic model are drawn from the same distribution (they are different realizations of the same process).

• Σ_v controls the degree of dispersion: with Σ_v = 0 the coefficients are equal; as Σ_v → ∞ there is no relationship between the coefficients.

- Two interpretations of (70): i) an uncertain linear restriction (classical approach); ii) a prior shrinking the coefficients of units i and j toward a common mean.


• Bayesian random coefficient estimator.

If e_i and v_i are normally distributed, and β̄ and Σ_v are known, g(β_i|y) is normal with mean

((1/σ_i²) x_i′x_i + Σ_v^{-1})^{-1} ((1/σ_i²) x_i′x_i β_i,ols + Σ_v^{-1} β̄)

where β_i,ols is the OLS estimator of β_i, and variance

((1/σ_i²) x_i′x_i + Σ_v^{-1})^{-1}

• A weighted average of prior and sample information, with weights given by the relative precision of the two types of information!


- σ² unknown: use σ²_i,ols in the formulas.

- If Σ_v is large, β̃_i → β_i,ols.

• Average cross-sectional estimator: β̃ = (1/n) Σ_{i=1}^n β̃_i = GLS applied to the linear model with uncertain restrictions, using Theil's mixed estimator.
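The random coefficient posterior mean can be sketched for a single AR(1) unit; the data are simulated, and β̄ and Σ_v are hypothetical known quantities:

```python
import numpy as np

rng = np.random.default_rng(6)

# Posterior mean of beta_i as a precision-weighted average of the OLS
# estimate and the common mean beta_bar, with known sigma_i and Sigma_v.
T = 40
beta_true, sigma_i, beta_bar, Sigma_v = 0.6, 1.0, 0.5, np.array([[0.05]])

y = np.zeros(T + 1)
for t in range(T):
    y[t + 1] = beta_true * y[t] + sigma_i * rng.standard_normal()
x = y[:-1].reshape(-1, 1)
yy = y[1:]

beta_ols = np.linalg.solve(x.T @ x, x.T @ yy)
prec_data = (x.T @ x) / sigma_i ** 2
prec_prior = np.linalg.inv(Sigma_v)
V_post = np.linalg.inv(prec_data + prec_prior)
b_post = V_post @ (prec_data @ beta_ols + prec_prior @ np.array([beta_bar]))
```

By construction, the posterior mean lies between the OLS estimate and the prior mean, with weights given by the two precisions.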


- If β̄ and Σ_v are unknown, we need a prior for these parameters. No analytical solution for the posterior mean of β_i exists.

- Approximate posterior modal estimates (see Smith (1973)):

β̄* = (1/n) Σ_{i=1}^n β_i*   (71)
(σ_i*)² = (1/(T+2)) (y_i − x_i β_i*)′(y_i − x_i β_i*)   (72)
Σ_v* = (1/(n − dim(β) − 1)) [Σ_i (β_i* − β̄*)(β_i* − β̄*)′ + ρ]   (73)

where the starred quantities are modal estimates from a training sample and ρ = diag[0.001].

- Plug these estimates into the posterior mean/variance formulas. This underestimates uncertainty (the parameters are treated as fixed when they are random).

- A two-step estimator.


- Alternative estimator (see Rao (1975)) using a training sample:

β̄_EB = (1/n) Σ_{i=1}^n β_i,ols   (74)
σ²_i,EB = (1/(T − dim(β))) (y_i′y_i − y_i′x_i β_i,ols)   (75)
Σ_v,EB = (1/(n−1)) Σ_{i=1}^n (β_i,ols − β̄_EB)(β_i,ols − β̄_EB)′ − (1/n) Σ_{i=1}^n (x_i′x_i)^{-1} σ²_i,ols   (76)

• The two estimators of β̄ are similar, but the first averages posterior modes while the second averages OLS estimates.

The procedure can be used to partially pool subsets of the cross-sectional units: assume (70) within each subset but not across subsets (see e.g. Ciccarelli et al. (2012)).
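A sketch of the two-step estimator in (74)-(76) on simulated panel data (all settings hypothetical); each unit is an AR(1) whose coefficient is drawn around a common mean:

```python
import numpy as np

rng = np.random.default_rng(7)

# Step 1: OLS unit by unit. Step 2: recover the common mean and the
# dispersion of the coefficients, net of OLS sampling noise, as in (74)-(76).
n, T, k = 50, 60, 1
beta_bar_true, Sigma_v_true, sigma_e = 0.5, 0.02, 1.0

beta_ols = np.empty(n)
s2 = np.empty(n)
xx_inv = np.empty(n)
for i in range(n):
    b_i = beta_bar_true + np.sqrt(Sigma_v_true) * rng.standard_normal()
    y = np.zeros(T + 1)
    for t in range(T):
        y[t + 1] = b_i * y[t] + sigma_e * rng.standard_normal()
    x, yy = y[:-1], y[1:]
    beta_ols[i] = x @ yy / (x @ x)
    resid = yy - beta_ols[i] * x
    s2[i] = resid @ resid / (T - k)               # eq. (75)
    xx_inv[i] = 1.0 / (x @ x)

beta_bar_eb = beta_ols.mean()                      # eq. (74)
Sigma_v_eb = ((beta_ols - beta_bar_eb) ** 2).sum() / (n - 1) \
             - (xx_inv * s2).mean()                # eq. (76): subtract sampling noise
```

The second term in (76) removes the part of the cross-sectional dispersion that is due to estimation error rather than true heterogeneity.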


8.2 Univariate dynamic panels

y_it = ρ_i + B_1i(ℓ)y_it−1 + B_2i(ℓ)x_t + e_it,  e_it ~ (0, σ_i²)   (77)

B_ji(ℓ) = B_ji1 ℓ + B_ji2 ℓ² + ... + B_jiq_j ℓ^{q_j}, B_jil is a scalar, ρ_i is the unit-specific fixed effect, and x_t are exogenous variables, common to all units. Assume E(e_it e_jτ) = 0 ∀i ≠ j, ∀t, τ.

- Interesting quantities that can be computed: b_i1(1) = (1 − B_1i(1))^{-1} and b_i2(1) = (1 − B_1i(1))^{-1}B_2i(1) (long-run effects); b_i1(ℓ), b_i2(ℓ) (impulse responses).


- Stack the T observations to create y_i, x, e_i, 1. Let X_i = (y_i, x, 1), X = diag{X_i}, β = [B₁, ..., B_N]′, B_i = (ρ_i, B_1i1, ..., B_1iq₁, B_2i1, ..., B_2iq₂), Σ_i = σ_i² · I_T and Σ = diag{Σ_i}; then:

y = Xβ + e,  e ~ (0, Σ)   (78)

y = (y₁′, ..., y_N′)′, e = (e₁′, ..., e_N′)′.

- Comparing (78) with (9), one sees that the dynamic panel has the same structure as a VAR, but the X_i are unit-specific and the covariance matrix has a (block) heteroskedastic structure.

- The likelihood of (78) is still the product of a normal for β, conditional on Σ, and N inverted gammas for the σ_i². Note that since var(e) is diagonal, ML = OLS equation by equation.

What kind of priors could be used?

- Semi-conjugate prior: g(β) ~ N(β̄, Σ̄_b) and g(σ_i²) ~ IG(0.5a₁, 0.5a₂).


- Exchangeable prior: g(β) = ∏_i g_i(β), β_i ~ N(β̄, Σ_b), where Σ_b measures a-priori heterogeneity. With exchangeability, β̃ can be computed equation by equation.

- Exchangeable prior on the differences (Canova and Marcet (1998)): β_i − β_j ~ N(0, Σ_b), where Σ_b has a special structure.

- Depending on the choice of prior, the posterior will reflect prior and sample information, or prior and pooled information (see Zellner and Hong, 1989).


Example 15 (Growth and convergence)

Y_it = ϱ_i + B_i Y_it−1 + e_it,   e_it ∼ N(0, σ²_i)   (79)

where Y_it = log(y_it/y_t) and y_t is the average EU GDP.

Let α_i = (ϱ_i, B_i)' = ᾱ + v_i, where v_i ∼ N(0, σ²_b). Assume σ²_i given, ᾱ known (if not, get it from a pooled regression) and treat σ²_b as fixed.

Let θ_{i,j} = σ²_i / σ²_{b,jj}, j = 1, 2, be the relative importance of prior and sample information. Choose a loose prior (θ_{i,j} = 0.5).

Use income per-capita for 144 EU regions from 1980 to 1996 to construct

SS_i = ϱ̃_i (1 − B̃_i^T)/(1 − B̃_i) + B̃_i^{T+1} z_i0

where ϱ̃_i, B̃_i are posterior means, and CV_i = 1 − B̃_i (the convergence rate).
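As a small numerical sketch of these two summary statistics (the parameter values below are hypothetical, not the ones estimated in the notes):

```python
import numpy as np

def steady_state_and_cv(rho_i, B_i, z_i0, T):
    """Long-run summaries for the AR(1) Y_it = rho_i + B_i * Y_it-1 + e_it.

    SS_i = rho_i * (1 - B_i**T) / (1 - B_i) + B_i**(T+1) * z_i0
    CV_i = 1 - B_i  (the convergence rate)
    """
    ss = rho_i * (1.0 - B_i**T) / (1.0 - B_i) + B_i**(T + 1) * z_i0
    cv = 1.0 - B_i
    return ss, cv

# Hypothetical posterior means for one region over a 17-year sample
ss, cv = steady_state_and_cv(rho_i=-0.02, B_i=0.91, z_i0=-0.4, T=17)
```

With B̃_i = 0.91 the convergence rate is CV_i = 0.09, close to the mode reported on the next slide.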


- The mode of the CV_i distribution is 0.09: fast catch-up. The highest 95% credible set is large (from 0.03 to 0.45).

- The distribution of SS has many modes (at least 2).

What can we say about the posterior of the cross-sectional mean SS? Suppose g(SS_i) ∼ N(μ, σ²). Assume g(μ) ∝ 1 and σ = 0.4.

- g(μ|y) combines the prior and the data, and the posterior g(SS_i|y) combines unit-specific and pooled information.

- μ̃ = −0.14 (highly left-skewed distribution); the variance is 0.083; the 95 percent credible interval is (−0.30, 0.02).


[Figure: posterior densities of the convergence rate and of the steady state.]


8.3 Endogenous grouping

• Are there groups in the cross section? Convergence clubs; credit-constrained vs. non-credit-constrained consumers; large vs. small firms, etc. Classifications are typically exogenous (see e.g. Gertler and Gilchrist (1991)).

- Want an approach that simultaneously allows for endogenous cross-sectional grouping and Bayesian estimation of the parameters.

- Idea: if units i and j belong to a group, the coefficients α_i and α_j have the same distribution. If not, they have different distributions.

- Basic problem: what ordering of the cross section gives the grouping? There are ℘ = 1, 2, ..., N! orderings. How do you find groups?


• Suppose there are ς = 1, 2, ..., ς̄ breaks, ς̄ given. For each set of ς + 1 groups let the model be:

y_it = ϱ_i + B_1i(ℓ) y_it−1 + B_2i(ℓ) x_t−1 + e_it   (80)

α^j_i = ᾱ^j + v^j   (81)

where i = 1, ..., n_j(℘); n_j(℘) is the number of units in group j given the ℘-th ordering, Σ_j n_j(℘) = N for each ℘, e_it ∼ (0, σ²_ei), v^j ∼ (0, Σ̄_j), and α_i = [ϱ_i, B_1i1, ..., B_1iq1, B_2i1, ..., B_2iq2]. Let h_j(℘) be the location of the break for group j = 1, ..., ς + 1.

Alternative to (81): ς̄ = 0 and an exchangeable structure ∀i, i.e.

α_i = ᾱ + v_i,   i = 1, ..., N,   v_i ∼ N(0, Σ̄_i)   (82)


Want to evaluate (80)-(81) against (80)-(82) and estimate (α, σ_ei) jointly with the optimal (℘, ς, h_j(℘)) (ordering, number of breaks, location of breaks).

- Given an ordering ℘, the number of breaks ς, and the location of the break points h_j(℘), rewrite (80)-(81) as:

Y = Xα + E,   E ∼ (0, Σ_E)   (83)

α = ᾱ_0 + V,   V ∼ (0, Σ_V)   (84)

where Σ_E is (NTM) × (NTM) and Σ_V = diag{Σ_i} is (Nk) × (Nk).

- Specify priors for (ᾱ_0, Σ_E, Σ_V). Construct posterior estimates for (α, Σ_E), (ᾱ_0, Σ_V) jointly with posterior estimates of (℘, ς, h_j(℘)). Problem complicated!

- Split the problem into three steps. Use Empirical Bayes techniques to construct posterior estimates of α, conditional on the optimal (℘, ς, h_j(℘)) and estimates of (ᾱ_0, Σ_V, Σ_E).


• Step 1: How do you compute (℘, ς, h_j(℘)) optimally?

a) Given (ᾱ_0, Σ_V, Σ_E) and an ordering ℘, examine how many groups are present (select ς).

b) Given ℘ and ς, check for the location of the break points (select h_j(℘)).

c) Iterate on the first two steps, altering ℘.

Conclusion: the selected submodel maximizes the predictive density over orderings ℘, groups ς + 1 and break points h_j(℘).

Let f(Y|H_0) be the predictive density under cross-sectional homogeneity.

Let f(Y|H_ς, h_j(℘), ℘) = ∏_{j=1}^{ς+1} f(Y^j|H_ς, h_j(℘), ℘) be the predictive density for group j, with ς break points at locations h_j(℘), using ordering ℘.

Define:
- I_ς: the set of possible break points when there are ς groups.
- J: the set of possible orderings of the cross section.


- π_{jh}(℘): (diffuse) prior of a break at location h for group j of ordering ℘.

• f*(Y|H_ς, ℘) ≡ sup_{h_j(℘) ∈ I_ς} f(Y|H_ς, h_j(℘), ℘)   (max with respect to breaks)

• f†(Y|H_ς) ≡ sup_{℘ ∈ J} f*(Y|H_ς, ℘)   (max with respect to breaks and ordering)

• f⁰(Y|H_ς, ℘) ≡ Σ_{h_j(℘) ∈ I_ς} π_{jh}(℘) f(Y|H_ς, h_j(℘), ℘)   (average).


To test for breaks (set ς̄ ≪ (N/2)^{0.5}):

1) Given ℘, H(0): no breaks, H(1): ς breaks:

PO(℘) = π_0 f⁰(Y|H_0) / [Σ_ς π_ς f⁰(Y|H_ς, ℘)]   (85)

where π_0 (π_ς) is the prior probability that there are 0 (ς) breaks.

2) Given ℘, H(0): ς − 1 breaks, H(1): ς breaks:

PO(℘, ς − 1) = π_{ς−1} f⁰_{(ς−1)}(Y|H_{ς−1}, ℘) / [π_ς f⁰_{(ς)}(Y|H_ς, ℘)]   (86)

Given ς, assign units to groups j, i.e. find f*(Y|H_ς̄, ℘). Alter ℘ to get f†(Y|H_ς̄).
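The sequential comparison in (86) can be sketched as follows. This is an illustrative helper, not the authors' code: it assumes the log predictive densities log f⁰(Y|H_ς, ℘) have already been computed for each candidate number of breaks ς, and it stops at the first ς whose posterior odds favour ς − 1 breaks.

```python
import numpy as np

def select_breaks(log_pred, log_prior=None):
    """Sequentially compare (zeta-1) vs zeta breaks via the posterior odds (86).

    log_pred[z] is log f0(Y | H_z, ordering) for z = 0, ..., zbar
    (these predictive densities must be computed elsewhere).
    Returns the selected number of breaks.
    """
    zbar = len(log_pred) - 1
    if log_prior is None:                 # flat prior over 0, ..., zbar breaks
        log_prior = np.zeros(zbar + 1)
    for z in range(1, zbar + 1):
        log_po = (log_prior[z - 1] + log_pred[z - 1]) - (log_prior[z] + log_pred[z])
        if log_po > 0:                    # PO > 1: evidence against adding a break
            return z - 1
    return zbar

# Hypothetical log predictive densities: one break is clearly favoured
n_breaks = select_breaks([4850.0, 4975.0, 4974.5, 4974.0])
```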


Questions:

i) Can we proceed sequentially to test for (cross-sectional) breaks? Bai (1997): yes, it is consistent. But the estimated break point is consistent for any of the existing break points; its location depends on the "strength" of the break.

ii) How to maximize the predictive density over ℘ when N is large? Do we need all N! permutations? No, far fewer. Plus, use economic theory to suggest interesting orderings.


• Step 2: Given (℘, ς, h_j(℘)), estimate [ᾱ'_0, vech(Σ_V)', vech(Σ_E)']' using f† on a training sample.

If the e's are normally distributed, then

ᾱ^j_0 = (1/n_j(℘)) Σ_{i=1}^{n_j(℘)} α_{i,ols}

Σ_j = (1/(n_j(℘) − 1)) Σ_{i=1}^{n_j(℘)} (α_{i,ols} − ᾱ^j_0)(α_{i,ols} − ᾱ^j_0)' − (1/n_j(℘)) Σ_{i=1}^{n_j(℘)} (X'_i X_i)^{−1} σ²_i

σ²_i = (1/(T − k)) (Y'_i Y_i − Y'_i X_i α_{i,ols})   (87)

for j = 1, ..., ς + 1, where x_i collects the regressors and y_i the dependent variable for unit i of group j, and α_{i,ols} = (x'_i x_i)^{−1} x'_i y_i is the OLS estimator for unit i (in group j).
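A minimal NumPy sketch of the moment estimators in (87) for one group, under the assumptions above (simulated AR(1) units with hypothetical parameters; not the EU regional data):

```python
import numpy as np

def eb_hyperparameters(X_list, Y_list):
    """Empirical-Bayes estimates (87) for one group from unit-level OLS.

    X_list[i] is the (T x k) regressor matrix of unit i,
    Y_list[i] the length-T dependent variable.
    Returns the group mean abar_j, the coefficient dispersion Sigma_j,
    and the vector of residual variances sigma2_i.
    """
    n = len(X_list)
    T, k = X_list[0].shape
    alphas, sig2, noise = [], [], []
    for X, Y in zip(X_list, Y_list):
        a = np.linalg.solve(X.T @ X, X.T @ Y)            # unit-level OLS
        s2 = float(Y.T @ Y - Y.T @ X @ a) / (T - k)      # sigma2_i in (87)
        alphas.append(a)
        sig2.append(s2)
        noise.append(np.linalg.inv(X.T @ X) * s2)        # (X_i'X_i)^(-1) sigma2_i
    A = np.column_stack(alphas)                          # k x n
    abar = A.mean(axis=1)
    dev = A - abar[:, None]
    Sigma_j = dev @ dev.T / (n - 1) - sum(noise) / n     # dispersion net of sampling noise
    return abar, Sigma_j, np.array(sig2)

# Hypothetical group: 5 units, AR(1) with a constant
rng = np.random.default_rng(0)
T = 120
Xs, Ys = [], []
for _ in range(5):
    y = np.zeros(T + 1)
    rho, b = rng.normal(0.0, 0.1), 0.5 + rng.normal(0.0, 0.05)
    for t in range(T):
        y[t + 1] = rho + b * y[t] + 0.1 * rng.standard_normal()
    Xs.append(np.column_stack([np.ones(T), y[:-1]]))
    Ys.append(y[1:])
abar, Sigma_j, s2 = eb_hyperparameters(Xs, Ys)
```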


• Step 3: Construct posterior estimates of α conditional on all the other parameters.

- EB posterior point estimate: α̃ = (X'Σ_E^{−1}X + Σ_V^{−1})^{−1}(X'Σ_E^{−1}Y + Σ_V^{−1}Aᾱ_0).

- Alternatively, joint estimation of prior and posterior if the e's and the v's are normal and the prior on the hyperparameters is diffuse (see Smith 1973).


Example 16 (Convergence clubs). The posterior in example 15 is multimodal. Are there at least two convergence clubs? Where is the break point? How different are convergence rates across groups?

- Examine several orderings. More or less they give the same result. Best: use initial conditions of relative income per-capita.

- Set ς̄ = 4 and sequentially examine ς against ς + 1 breaks starting from ς = 0. Three breaks, with PO ratios of 0.06, 0.52, 0.66 respectively. Evidence in favour of two groups.

- The figure reports the predictive density as a function of the break point (for ς = 1 and ς = 0). Units up to 23 (poor, Mediterranean and peripheral regions of the EU) belong to the first group; units 24 to 144 to the second.

- The average CVs of the two groups are 0.78 and 0.20: faster convergence to a below-average steady state in the first group. The posterior distributions of the steady states for the two groups are distinct.


[Figure: predictive density as a function of the break point (one break vs. no break); steady-state distributions for the two clubs.]


8.4 Bayesian pooling for VARs

- We can maintain the same setup and the same ideas. The model is

y_it = (I ⊗ X_t) α_i + e_it,   e_it ∼ iid (0, Σ_e)   (88)

where X_t = [1, y_t−1, ..., y_t−p] and α_i = [a_i0, A_i1, A_i2, ..., A_ip]. Suppose

α_i = ᾱ + v_i,   v_i ∼ (0, Σ_v)   (89)

Case 1: ᾱ, Σ_v known. The posterior for α_i is normal, with mean and variance given by

α̃_i = ((1/σ²_i) x'_i x_i + Σ_v^{−1})^{−1} ((1/σ²_i) x'_i x_i α_{i,ols} + Σ_v^{−1} ᾱ)   (90)

Σ̃_α = ((1/σ²_i) x'_i x_i + Σ_v^{−1})^{−1}   (91)

Case 2: ᾱ, Σ_v unknown fixed quantities, estimable on a training sample.
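The shrinkage in (90)-(91) can be sketched directly in NumPy. This is an illustrative Case-1 computation with simulated data and hypothetical hyperparameters (ᾱ = 0, Σ_v = 0.1·I), not an estimate from the notes:

```python
import numpy as np

def posterior_unit(x_i, y_i, sigma2_i, abar, Sigma_v):
    """Posterior mean (90) and variance (91) for unit i, with abar, Sigma_v known."""
    prec_data = (x_i.T @ x_i) / sigma2_i              # (1/sigma2_i) x_i'x_i
    a_ols = np.linalg.solve(x_i.T @ x_i, x_i.T @ y_i)
    Sv_inv = np.linalg.inv(Sigma_v)
    V = np.linalg.inv(prec_data + Sv_inv)             # (91)
    mean = V @ (prec_data @ a_ols + Sv_inv @ abar)    # (90): precision-weighted average
    return mean, V

# Hypothetical unit: posterior mean lies between OLS and the prior mean
rng = np.random.default_rng(1)
x = np.column_stack([np.ones(200), rng.standard_normal(200)])
y = x @ np.array([0.2, 0.8]) + 0.5 * rng.standard_normal(200)
m, V = posterior_unit(x, y, sigma2_i=0.25, abar=np.zeros(2), Sigma_v=0.1 * np.eye(2))
```

With a long sample the data precision dominates and the posterior mean is close to OLS; shrinking Σ_v toward zero pulls it toward ᾱ instead.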


- (90)-(91) are still applicable with estimates of ᾱ, Σ_v in place of the true ones.

Case 3: ᾱ, Σ_v unknown random quantities with a prior distribution.

- Use MCMC to derive the posterior marginals of the parameters.

- The cross-sectional prior can be used in addition to, or as an alternative to, a time-series prior. Both have shrinkage features.


- The same logic can be applied if one expects impulse responses (rather than VAR coefficients) to be similar. The model in this case is

y_it = Σ_j ψ_ij e_it−j,   e_it ∼ iid (0, Σ_e)   (92)

ψ_i = ψ̄ + v_i,   v_i ∼ (0, Σ_v)   (93)

where ψ_i = [ψ_i1, ψ_i2, ...].

- The posterior distribution of the impulse responses will reflect unit-specific (sample) information and prior information. The weights will depend on the relative precision of the two sources of information.

- Note that we treat Σ_e as a fixed (known or estimable) quantity. If it is a random variable, we need a conjugate format to derive the posterior analytically; otherwise we need MCMC methods.


Appendix 1

Methods to sample from the posterior g(α|y), if available.

• Direct sampling (see example 1).

• Sampling by parts. If g(α|y) has a complicated structure, one can partition α = (α_1, α_2), write g(α|y) = g(α_1|y, α_2) g(α_2|y), and sample separately from the two pieces (see the Bayes theorem with multiple parameters algorithm).

Example 17 We use sampling by parts when we construct the predictive distribution of the forecasts. In fact f(y_{T+τ}|y_T) = ∫ f(y_{T+τ}|y_T, α) g(α|y) dα. Hence sample α^l from the posterior, use the model to forecast y_{T+τ} given α^l, and average over draws.
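Example 17 can be sketched for an AR(1) without a constant. The "posterior draws" below are a stand-in normal sample (in practice they would come from one of the simulators above); each draw is propagated τ steps ahead with fresh shocks and the draws are then averaged:

```python
import numpy as np

def predictive_draws(posterior_draws, y_T, tau, sigma, rng):
    """Draws from f(y_{T+tau} | y_T) for an AR(1) y_t = b y_{t-1} + e_t.

    posterior_draws: array of draws b^l from g(b | y) (obtained elsewhere);
    each draw is propagated tau steps ahead with fresh shocks e ~ N(0, sigma^2).
    """
    out = []
    for b in posterior_draws:
        y = y_T
        for _ in range(tau):
            y = b * y + sigma * rng.standard_normal()
        out.append(y)
    return np.array(out)

rng = np.random.default_rng(2)
b_draws = rng.normal(0.9, 0.02, size=5000)     # stand-in posterior draws
fc = predictive_draws(b_draws, y_T=1.0, tau=4, sigma=0.1, rng=rng)
point = fc.mean()                              # Monte Carlo estimate of E(y_{T+4} | y_T)
```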


- Sampling by parts is typically used to obtain the marginal posterior of μ in a linear regression model.

Example 18 Suppose g(μ|y, σ²) is N(ȳ, σ²/T) and g(σ²|y) is IG(0.5(T − 1), 0.5(T − 1)s²), where ȳ and s² are the sample mean and variance of y_t. Since g(μ|y) = ∫ g(μ|y, σ²) g(σ²|y) dσ², draw (σ²)^l from g(σ²|y) and then draw μ from g(μ|y, (σ²)^l). As L goes to infinity, we obtain a sample from g(μ|y).
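A direct NumPy translation of Example 18, with simulated data (any y series would do). An IG(a, b) draw is obtained as b divided by a Gamma(a, 1) draw:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.0, size=50)                # hypothetical data
T, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

# Sampling by parts: (sigma2)^l from IG(0.5(T-1), 0.5(T-1)s2), then mu | sigma2
L = 20000
sig2 = (0.5 * (T - 1) * s2) / rng.gamma(0.5 * (T - 1), 1.0, size=L)
mu = rng.normal(ybar, np.sqrt(sig2 / T))         # one mu draw per sigma2 draw
```

The resulting `mu` sample is from the marginal posterior g(μ|y) (a Student-t centered at ȳ).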

• Sampling by inversion. If y = F(x) and y ∼ U(0, 1), a draw for x can be obtained by drawing y from a uniform and applying x = F^{−1}(y).
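A standard textbook instance of inversion sampling (my illustration, using the exponential distribution): F(x) = 1 − exp(−λx), so F^{−1}(y) = −log(1 − y)/λ.

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 2.0
u = rng.uniform(size=100_000)       # y ~ U(0, 1)
x = -np.log(1.0 - u) / lam          # x = F^{-1}(y): Exponential(lam) draws
```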


Appendix 2: Matrix algebra results

1) A_{m×n} ⊗ B_{p×q} =
[ a_11 B  a_12 B  ...  a_1n B
   ...     ...    ...   ...
  a_m1 B  a_m2 B  ...  a_mn B ]

2) vec(A) stacks the columns of A: vec(A)' = [a_11, a_21, ..., a_m1, ..., a_1n, a_2n, ..., a_mn].

3) vec(A')' vec(B) = tr(AB) = tr(BA) = vec(B')' vec(A).

4) vec(ABC) = (C' ⊗ A) vec(B).
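Identities 3) and 4) are easy to verify numerically with NumPy (`order="F"` makes `reshape` stack columns, matching the vec operator):

```python
import numpy as np

rng = np.random.default_rng(5)
vec = lambda M: M.reshape(-1, order="F")   # stack the columns of M

# 3) vec(A')' vec(B) = tr(AB) for square, conformable A and B
A3 = rng.standard_normal((3, 3))
B3 = rng.standard_normal((3, 3))
lhs3 = vec(A3.T) @ vec(B3)
rhs3 = np.trace(A3 @ B3)

# 4) vec(ABC) = (C' kron A) vec(B) for conformable rectangular matrices
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((2, 5))
lhs4 = vec(A @ B @ C)
rhs4 = np.kron(C.T, A) @ vec(B)
```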


5)

tr(ABC) = vec(A')'(C' ⊗ I) vec(B)
        = vec(A')'(I ⊗ B) vec(C)
        = vec(B')'(A' ⊗ I) vec(C)
        = vec(B')'(I ⊗ C) vec(A)
        = vec(C')'(B' ⊗ I) vec(A)
        = vec(C')'(I ⊗ A) vec(B)


Some Distributions

1) Multivariate normal: x_{(M×1)} ∼ N(μ, Σ)

p(x) = (2π)^{−0.5M} |Σ|^{−0.5} exp{−0.5 (x − μ)'Σ^{−1}(x − μ)}   (94)

2) Multivariate t: x_{(M×1)} ∼ t_ν(μ, Σ)

p(x) = [Γ(0.5(ν + M)) / (Γ(0.5ν)(νπ)^{0.5M})] |Σ|^{−0.5} [1 + (1/ν)(x − μ)'Σ^{−1}(x − μ)]^{−0.5(ν+M)}   (95)

3) Inverse Wishart: A_{(M×M)} ∼ W(S^{−1}, ν)

p(A) = [2^{0.5νM} π^{0.25M(M−1)} ∏_{i=1}^M Γ(0.5(ν + 1 − i))]^{−1} |S|^{0.5ν} |A|^{−0.5(ν+M+1)} exp{−0.5 tr(SA^{−1})}   (96)
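As a quick sanity check on (94), a direct NumPy evaluation of the multivariate normal density (my illustration, with an arbitrary 2×2 Σ): at x = μ the exponent vanishes and the density equals (2π)^{−M/2}|Σ|^{−1/2}.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density (94)."""
    M = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)        # (x - mu)' Sigma^{-1} (x - mu)
    return ((2 * np.pi) ** (-0.5 * M)
            * np.linalg.det(Sigma) ** (-0.5)
            * np.exp(-0.5 * quad))

Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
mu = np.array([0.5, -0.5])
p_at_mode = mvn_pdf(mu, mu, Sigma)              # density at the mode
```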