
Forecasting inflation using time-varying Bayesian model averaging

Jordi van der Maas*

Erasmus University Rotterdam, Burgemeester Oudlaan 50, 3062 PA Rotterdam, Netherlands

This paper presents a Bayesian model averaging regression framework for forecasting US inflation, in which the set of predictors included in the model is automatically selected from a large pool of potential predictors and the set of regressors is allowed to change over time. Using real-time data on the 1960–2011 period, this model is applied to forecast personal consumption expenditures and gross domestic product deflator inflation. The results of this forecasting exercise show that, although it is not able to beat a simple random-walk model in terms of point forecasts, it does produce superior density forecasts compared with a range of alternative forecasting models. Moreover, a sensitivity analysis shows that the forecasting results are relatively insensitive to prior choices and the forecasting performance is not affected by the inclusion of a very large set of potential predictors.

Keywords and Phrases: Bayesian, Bayesian model averaging, BMA, Time varying BMA, Forecasting inflation.

1 Introduction

Inflation forecasting is an essential element of monetary policymaking. A frequently used building block for inflation forecasting is the Phillips curve, which exploits the negative correlation between inflation and unemployment. Various extensions using other measures of aggregated activity, past inflation and variables related to the current state of the economy have been used.

There are two main problems regarding the application of Phillips curve type models. First, the dynamics of inflation measures, and therefore the Phillips curve correlations, have changed over time. STOCK and WATSON (1999) find small but significant changes in the parameters of the conventional Phillips curve, mostly due to changes in the contribution of lagged inflation. Furthermore, ATKESON and OHANIAN (2001) point out that the Phillips curve correlation between inflation and unemployment breaks down in the 1970s. GROEN, PAAP and RAVAZZOLO (2013) report several structural breaks in the parameters of an extended Phillips curve.

A second problem is the question of which of the many potential explanatory variables can be used to improve the forecasting performance of the Phillips curve. STOCK and WATSON (1999) consider univariate and multivariate forecasts based

*[email protected]

© 2014 The Authors. Statistica Neerlandica © 2014 VVS. Published by Wiley Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.

Statistica Neerlandica (2014) Vol. 68, nr. 3, pp. 149–182 doi:10.1111/stan.12027


on 168 different economic indicators. Their results show that variables other than unemployment (mainly measures of aggregate activity) can improve upon traditional Phillips curve forecasts. However, many of the forecasting relations appear to be unstable; variables yielding superior forecasting results in their first subsample fail to do so in the second, and vice versa. Similar results are obtained by CECCHETTI, CHU and STEINDEL (2000).

It is empirically likely that certain relations between variables only hold at certain points in time, as a result of changes in inflation targeting. DAVIG and DOH (2009), for example, find evidence for a more aggressive monetary policy regime after the Volcker disinflation in the early 1980s and before 1970 than during the Great Inflation in the 1970s. Therefore, forecasting performance can be improved by accounting for the time-varying relations between inflation measures and predictors. Moreover, explicitly modeling restrictions that hold only in some subperiods will limit the risk of misspecifying the model. This feature is not present in models that allow for structural breaks in the parameters (like time-varying parameter (TVP) models): in those cases, relations between variables that break down at certain points in time lead to coefficients being estimated close to zero, but never exactly zero, which leads to a loss of efficiency. In this research, I use an extended Phillips curve that allows for a time-varying selection of predictor variables, so predictors that do not have explanatory power over certain subperiods can be temporarily left out of the model or can be left out completely.

This paper contributes to the literature by being among the first studies to apply a Bayesian variable selection that allows for different selections of variables for different periods. Other studies have modeled time-varying relationships with varying parameters, where the model itself does not change. In this research, the set of regressors changes over time, whereas the regression parameters are kept constant.

The setup of this paper is as follows. Section 2 gives an overview of the relevant literature on time variation in econometric models. The model developed in this paper is introduced in section 3, which also describes the prior settings and the Bayesian estimation procedure. Section 4 describes the data and the prior choices. The results of estimating the model's parameters over the full 1960Q1–2011Q2 sample are presented in section 5. In section 6, the procedure for forecasting inflation over the 1986Q1–2011Q2 sample is introduced, together with the forecast evaluation metrics used to compare the forecasts with forecasts obtained from a range of alternative models. The forecasting results are subsequently presented in section 7.

2 Literature review

A frequently used method to model time variation in inflation dynamics is the TVP model. GROEN et al. (2013) model inflation using a TVP specification, where the parameters of an extended Phillips curve are allowed (but not forced) to change over time. Model uncertainty is explicitly incorporated in their model by including indicator variables for each potential explanatory variable. Out-of-sample forecasts over different horizons show that incorporating model uncertainty and structural breaks can improve forecasting performance over several benchmarks. However, the approach of GROEN et al. (2013) assumes that the same set of predictors holds for the complete sample period, which may be restrictive.

As an extension to the TVP models, CHAN et al. (2012) introduce the time-varying dimension (TVD) model, where not only the parameters are allowed to change over time but the set of explanatory variables as well. The TVP models are estimated using standard methods for estimating dynamic mixture models. A forecasting exercise of US inflation using five predictors shows that their TVD models exhibit better forecasting performance than a range of benchmarks. A possible drawback of the TVD approach is that both model changes and parameter changes may lead to overfitting when a larger set of predictors is considered.

BELMONTE and KOOP (2013) use switching Gaussian state-space models to forecast annual inflation. Their framework allows different prespecified models to hold at each point in time. Switching between models over time is regulated by a Markov process. Their results using four different TVP models with one predictor show that the model using the Michigan Consumer Survey as a predictor is selected most of the time, with a simple AR(1) forecast being the most valuable during crises and volatile times. However, when using a larger model space of 15 possible models, the forecasting performance of their framework breaks down.

The model presented in this paper is similar to the TVP model of GROEN et al. (2013) as both approaches take uncertainty in variable selection into account. The main difference is that in their approach, the regression parameters change, and the set of predictors stays the same over time, whereas in my approach, the parameters do not change, but the model does. Compared with the TVD model of CHAN et al. (2012), the model presented here is somewhat less flexible, as their approach allows for both changes in the model and changes in the parameters over time, whereas my approach only allows for the first feature. However, the model presented here has the property that a predictor can be completely left out by setting a single parameter to zero, whereas in the model of CHAN et al. (2012) a complete range of parameters needs to be set to zero to exclude a variable. The model of BELMONTE and KOOP (2013) is the least comparable, as it asks for the researcher to prespecify a set of models that can be selected at different points in time, thereby being less flexible than the TVD approach or my approach, which in principle allows for all combinations of predictors to be selected in the model at every point in time.

3 Methods

3.1 Model specification

As a starting point for modeling h-period-ahead inflation, I use the following extended specification of the Phillips curve:

$$y_{t+h} = \alpha + \sum_{i=1}^{k} \beta_i x_{it} + \varepsilon_t, \qquad t = 1, \dots, T-h, \tag{1}$$

with $\varepsilon = (\varepsilon_1, \dots, \varepsilon_{T-h})' \sim N(0, \sigma^2 I)$. $y_{t+h}$ is the inflation measure at time $t+h$, defined as $y_{t+h} = 100 \ln(P_{t+h}/P_{t+h-1})$, with $P_t$ a price index. $\{x_{it}\}_{i=1}^{k}$ is a set of explanatory variables and $\beta = (\beta_1, \dots, \beta_k)'$ its corresponding coefficient vector. $\{x_{it}\}_{i=1}^{k}$ can include aggregated activity measures, lags of inflation, variables measuring the current state of the economy and inflation expectations. As the set of potential explanatory variables is large, including all variables will lead to overfitting, which means that we have to decide which variables are to be included in the model. To accomplish this, I adapt Equation (1) to incorporate uncertainty regarding the selection of variables.

To account for the model selection problem, we can apply the method of KUO and MALLICK (1998) to Equation (1), starting with the model with all possible explanatory variables included. Each variable in Equation (1) is multiplied by an indicator variable $\gamma_i$ that determines whether that variable is included in the model. $\gamma_i = 1$ implies that explanatory variable $x_i$ is included in the model, with

$$\Pr[\gamma_i = 1] = \lambda. \tag{2}$$

If $\gamma_i = 0$, $x_i$ is omitted from the model. The intercept term $\alpha$ is always included in the model.

As mentioned before, there is sufficient evidence from the literature that the relation between inflation and its predictors is not constant over time, meaning that the set of variables that contain useful information for forecasting changes over time. Traditionally, structural changes are modeled as structural breaks in the model's parameters (e.g., GROEN et al., 2013; KOOP and POTTER, 2007). In this research, I deviate from this by allowing the set of explanatory variables to change over time, while keeping the parameters constant. I extend the model in Equation (1) to allow for a different set of variables for each period. That is, for each time $t \in \{1, \dots, T-h\}$, a variable is included in the regression with a certain probability specific to that variable. This yields the specification

$$y_{t+h} = \alpha + \sum_{i=1}^{k} \gamma_i s_{it} \beta_i x_{it} + \varepsilon_t, \tag{3}$$

where $s_{it} \in \{0, 1\}$ indicates whether variable $i$ is included at time $t$. As before, $\gamma_i$ regulates the inclusion of variable $i$ in the model. This means that $\gamma_i$ determines if a variable is included and $s_{it}$ determines when it is included. It is possible that $s_{it} = 1$ for some $t$, but $\gamma_i = 0$, so that the variable is never included. Moreover, a variable $x_i$ can be excluded from the model if $s_{it} = 0$ for all $t$, $\gamma_i = 0$ or both. Neither situation causes identification problems, as there is always a positive probability of $s_{it} = 1$ if $\gamma_i = 0$ and of $\gamma_i = 1$ if all $s_{it} = 0$.

Equation (3) implies $2^{k(T-h)}$ possible models. Even for moderately large $k$ and $T$, an extremely large number of iterations is needed to cover the model space. Hence, we have to impose restrictions on $s_{it}$. I allow $s_{it}$ to switch from 0 to 1 only after it has been 0 for at least four periods, and vice versa, which will greatly reduce the number of possible models:


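$$\Pr[s_{it} = 0 \mid s_{i,t-1} = 0, s_{i,t-2}, s_{i,t-3}, s_{i,t-4}] = \begin{cases} q_i & \text{if } s_{i,t-2} = s_{i,t-3} = s_{i,t-4} = 0 \\ 1 & \text{elsewhere} \end{cases} \tag{4}$$

and

$$\Pr[s_{it} = 1 \mid s_{i,t-1} = 1, s_{i,t-2}, s_{i,t-3}, s_{i,t-4}] = \begin{cases} p_i & \text{if } s_{i,t-2} = s_{i,t-3} = s_{i,t-4} = 1 \\ 1 & \text{elsewhere.} \end{cases} \tag{5}$$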
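The restricted switching rule in Equations (4) and (5) can be illustrated with a small simulation. This is only a sketch, not the estimation code of the paper: the function names `simulate_inclusion` and `run_lengths` are hypothetical, and the sketch assumes that the indicator has already been in its starting state for four periods before the sample begins.

```python
import random

def simulate_inclusion(n_obs, p_stay_in, q_stay_out, start=0, seed=0):
    """Simulate one indicator path s_1, ..., s_n under Equations (4)-(5)."""
    rng = random.Random(seed)
    state = start
    run = 4  # assumption: the starting state was already held for four periods
    path = []
    for _ in range(n_obs):
        if run >= 4:
            # Free to move: stay with probability p_i (state 1) or q_i (state 0).
            stay = p_stay_in if state == 1 else q_stay_out
            if rng.random() >= stay:
                state, run = 1 - state, 0
        # Otherwise the last four states are not all equal and the state
        # is carried over with probability 1.
        run += 1
        path.append(state)
    return path

def run_lengths(path):
    """Lengths of consecutive runs of equal values in the path."""
    runs, count = [], 1
    for prev, cur in zip(path, path[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

path = simulate_inclusion(40, p_stay_in=0.8, q_stay_out=0.8, seed=1)
print("".join(str(s) for s in path))
print(run_lengths(path))
```

Apart from the first and last run, which are cut off by the sample boundaries, every run in the printed path has a length of at least four.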

If, for example, we have $s_{it} = s_{i,t-1} = s_{i,t-2} = 1$ and $s_{i,t-3} = 0$, $s_{i,t+1}$ will equal 1 with a probability of 1. If predictor $i$ is included at times $t$, $t-1$, $t-2$ and $t-3$, it will also be included at $t+1$ with probability $p_i$. Similarly, $q_i$ is the probability that $x_i$ is excluded at time $t+1$, given that it was excluded at times $t$, $t-1$, $t-2$ and $t-3$. Therefore, sequences of 1s or 0s of a length smaller than four will never be observed, although longer sequences are possible.

Other features of inflation that are commonly found in the literature are breaks in the variance and level shifts. To incorporate breaks in this framework, we can introduce a time-varying intercept $\alpha_t$ and variance $\sigma_t^2$ in the model specification, so that we have

$$y_{t+h} = \alpha_t + \sum_{i=1}^{k} \gamma_i s_{it} \beta_i x_{it} + \varepsilon_t, \tag{6}$$

with

$$\varepsilon_t \sim \text{NID}(0, \sigma_t^2) \tag{7}$$

and where $\alpha_t$ follows the process

$$\alpha_t = \alpha_{t-1} + K_t \xi_t, \tag{8}$$

with $K_t \in \{0, 1\}$ and $\xi_t \sim N(0, \sigma_\xi^2)$. This implies that a level shift at time $t$ of size $\xi_t$ occurs if $K_t = 1$, with

$$\Pr[K_t = 1] = \pi. \tag{9}$$

Another possibility to allow for level shifts is to set $K_t = 1$ for all $t$. The specification in Equation (8) allows for an occasional large structural break in the intercept, whereas the case of $K_t = 1$ for all $t$ implies a random walk for $\alpha_t$ with a small break every period. The specification in Equation (8) is more flexible in that sense, as $\alpha_t$ is not forced to change. Moreover, a random-walk specification will force $\alpha_t$ to closely follow $y_{t+h}$, decreasing the influence of explanatory variables. Rather, the intercept should pick up large shifts in the level of inflation as a result of changes in monetary policy.
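The difference between occasional level shifts and a random-walk intercept can be made concrete with a short simulation of Equations (8) and (9). This is a sketch under illustrative assumptions: the function name `simulate_intercept`, the starting level and the parameter values are hypothetical and are not taken from the paper.

```python
import random

def simulate_intercept(n_obs, pi, sigma_xi, alpha0=0.0, seed=0):
    """Simulate alpha_t = alpha_{t-1} + K_t * xi_t with Pr[K_t = 1] = pi."""
    rng = random.Random(seed)
    level = alpha0
    alpha, breaks = [], []
    for _ in range(n_obs):
        k_t = 1 if pi > rng.random() else 0
        if k_t:
            # A level shift of size xi_t ~ N(0, sigma_xi^2) occurs.
            level += rng.gauss(0.0, sigma_xi)
        alpha.append(level)
        breaks.append(k_t)
    return alpha, breaks

# Occasional breaks (pi = 0.05) versus a random walk (pi = 1).
alpha_occasional, k_occasional = simulate_intercept(100, 0.05, 1.0, seed=1)
alpha_random_walk, k_random_walk = simulate_intercept(100, 1.0, 1.0, seed=1)
print(sum(k_occasional), "breaks with pi = 0.05")
print(sum(k_random_walk), "breaks with pi = 1")
```

With a small value of `pi`, the simulated intercept stays flat for long stretches and moves only when a break occurs, which is the behavior Equation (8) is meant to allow.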

$\sigma_t^2$ is allowed to change once to accommodate a shift like the Great Moderation often found in macroeconomic time series. Hence,

$$\sigma_t^2 = \begin{cases} \sigma_1^2 & \text{for } t \le \tau \\ \sigma_2^2 & \text{for } t > \tau, \end{cases} \tag{10}$$

where $\tau$ is the breakpoint. Another possibility for change points in the variance is a specification like Equation (8) for $\ln \sigma_t^2$ (e.g., GROEN et al., 2013; GIORDANI and KOHN, 2008). To nest the case that there is no variance break, I take

$$\sigma_2^2 = \sigma_1^2 \delta^{2d}, \tag{11}$$

where $d$ is a binary indicator for a variance break. If $d = 0$, there is no break in the variance and $\sigma_2^2 = \sigma_1^2$ also for $t > \tau$. For $d = 1$, the variance changes by a factor $\delta^2$ after $t = \tau$. The probability of a variance break is

$$\Pr[d = 1] = \kappa. \tag{12}$$

For convenience, the notation $\sigma_2^2$ is used in the remainder of this paper instead of $\sigma_1^2 \delta^{2d}$.

In short, the model allows for (i) time-varying selection of variables, (ii) the possibility to completely leave out certain variables, (iii) level shifts and (iv) one possible break in conditional variance.
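The following sketch shows how the pieces of the specification fit together for given values of the indicators and parameters. It is illustrative only: the function names `variance_path` and `conditional_mean` and all numbers are hypothetical.

```python
def variance_path(n_obs, tau, sigma1_sq, delta_sq, d):
    """Variance sigma_t^2 for t = 1, ..., n_obs, following Equations (10)-(11)."""
    sigma2_sq = sigma1_sq * delta_sq ** d  # equals sigma1_sq when d = 0
    return [sigma1_sq if tau >= t else sigma2_sq for t in range(1, n_obs + 1)]

def conditional_mean(alpha_t, gamma, s_t, beta, x_t):
    """Mean of y_{t+h} in Equation (6): alpha_t + sum_i gamma_i s_it beta_i x_it."""
    return alpha_t + sum(g * s * b * x for g, s, b, x in zip(gamma, s_t, beta, x_t))

# Two predictors: the second is switched off globally (gamma_2 = 0),
# so it does not contribute even though s_2t = 1.
mean = conditional_mean(1.0, gamma=[1, 0], s_t=[1, 1], beta=[0.5, 2.0], x_t=[2.0, 3.0])
print(mean)
print(variance_path(5, tau=3, sigma1_sq=2.0, delta_sq=0.5, d=1))
```

A predictor contributes to the conditional mean only when both $\gamma_i = 1$ and $s_{it} = 1$, and the variance changes after $\tau$ only when $d = 1$.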

3.2 Prior specification

The parameters in the model are the vector of regression coefficients $\beta$; the transition probabilities $Q = (q_1, \dots, q_k)'$ and $P = (p_1, \dots, p_k)'$; the variable inclusion probability $\lambda$; the structural break probability $\pi$; the variance parameters $\sigma_1^2$, $\delta^2$ and $\sigma_\xi^2$; the variance breakpoint $\tau$; and the variance break probability $\kappa$. Estimation and inference on the $(T-h+4)k+7$ vector of parameters $\theta = (\beta', Q', P', \lambda, \pi, \sigma_1^2, \delta^2, \kappa, \sigma_\xi^2, \tau)'$ can be performed relatively straightforwardly in a Bayesian way.

In order to use a Bayesian approach, prior distributions for the parameters in $\theta$ have to be specified. For the regression parameters $\beta$, a conjugate choice is

$$\beta \mid \sigma_1^2, \sigma_2^2, \tau \sim N_k(B_0, V_0), \tag{13}$$

a multivariate normal distribution. $B_0$ reflects prior belief about the size of the $\beta$ coefficients. $V_0$ is the prior covariance matrix, which depends on the variance parameters $\sigma_1^2$ and $\sigma_2^2$ and the breakpoint $\tau$. As $V_0$, I take $V_0 = g\left(\sigma_1^{-2} X_1'X_1 + \sigma_2^{-2} X_2'X_2\right)^{-1}$, where $X_1$ and $X_2$ are the rows of the matrix of explanatory variables before and after $\tau$, respectively.¹ This can be seen as a generalization of the g-prior of ZELLNER (1986), as it reduces to $g\sigma^2(X'X)^{-1}$ for $\sigma_1^2 = \sigma_2^2 = \sigma^2$. The g-prior is commonly used in the Bayesian model averaging (BMA) literature (e.g. KOOP and POTTER, 2004) as it is proportional to the OLS covariance matrix and reduces the problem of specifying $k(k+1)/2$ elements of a covariance matrix to the choice of a single hyperparameter $g$. Here, I treat $g$ as a hyperparameter, but one can also specify a prior for $g$ (e.g. LEY and STEEL, 2012; LIANG et al., 2008).
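The reduction to the g-prior of ZELLNER (1986) follows in two steps. The derivation below is a short check rather than part of the original text; it assumes that $X$ stacks the rows of $X_1$ on top of those of $X_2$, so that $X'X = X_1'X_1 + X_2'X_2$.

$$
\begin{aligned}
V_0 &= g\left(\sigma_1^{-2} X_1'X_1 + \sigma_2^{-2} X_2'X_2\right)^{-1} \\
    &= g\left(\sigma^{-2}\left(X_1'X_1 + X_2'X_2\right)\right)^{-1} && \text{for } \sigma_1^2 = \sigma_2^2 = \sigma^2 \\
    &= g\sigma^{2}\left(X'X\right)^{-1}.
\end{aligned}
$$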


For the transition probabilities, standard choices for the prior distribution are beta distributions

$$q_i \sim \text{Beta}(a_{0i}, b_{0i}) \quad \text{for } i = 1, \dots, k \tag{14}$$

and

$$p_i \sim \text{Beta}(a_{1i}, b_{1i}) \quad \text{for } i = 1, \dots, k. \tag{15}$$

The mean prior probability of a transition from 0 to 1 is then $b_{0i}/(a_{0i} + b_{0i})$. A transition from 1 to 0 occurs with prior expectation $b_{1i}/(a_{1i} + b_{1i})$. The break probability $\pi$ and the variable inclusion probability $\lambda$ also follow beta distributions

$$\pi \sim \text{Beta}(\upsilon_\pi, \omega_\pi) \tag{16}$$

and

$$\lambda \sim \text{Beta}(\upsilon_\lambda, \omega_\lambda), \tag{17}$$

so that the prior mean probability of variable inclusion equals $\upsilon_\lambda/(\upsilon_\lambda + \omega_\lambda)$ and the expected model size is $k[\upsilon_\lambda/(\upsilon_\lambda + \omega_\lambda)]$. A level shift occurs with prior mean probability $\upsilon_\pi/(\upsilon_\pi + \omega_\pi)$. For the variance parameters, I take inverted gamma-2 distributions

$$\sigma_1^2 \sim \text{IG-2}(\nu_1, \eta_1), \tag{18}$$

$$\delta^2 \sim \text{IG-2}(\nu_\delta, \eta_\delta) \tag{19}$$

and

$$\sigma_\xi^2 \sim \text{IG-2}(\nu_\xi, \eta_\xi). \tag{20}$$

The parameters $\nu_j$ and $\eta_j$ can be chosen to reflect prior belief about the size of the variances. For $(\nu_j, \eta_j) \to (0, 0)$, Equation (18) is a diffuse prior. $\nu_\xi$ and $\eta_\xi$ can be chosen to regulate the occurrence of structural breaks, where the expected variance of the break size is $\nu_\xi/(\eta_\xi - 2)$ for $\eta_\xi > 2$. Large values of $\nu_\xi/(\eta_\xi - 2)$ will lead to few large breaks, whereas small values of $\nu_\xi/(\eta_\xi - 2)$ will induce more frequent but smaller breaks. Finally, I take a discrete uniform prior for the breakpoint $\tau$

$$p(\tau) = \begin{cases} \dfrac{1}{T-h-2a} & \text{for } \tau \in \{h+a+1, \dots, T-a\} \\ 0 & \text{elsewhere,} \end{cases} \tag{21}$$

so that a break can occur at each time point with an equal prior probability, except for the first and last $a$ observations.

For the variance break probability, I take a beta distribution

$$\kappa \sim \text{Beta}(\upsilon_\kappa, \omega_\kappa). \tag{22}$$

The joint prior distribution of $\theta$ is given by

$$p(\theta) = p(\beta \mid \sigma_1^2, \delta^2, d, \tau)\, p(\sigma_1^2)\, p(\delta^2)\, p(\kappa)\, p(\tau)\, p(\pi)\, p(\lambda)\, p(\sigma_\xi^2) \prod_{i=1}^{k} p(q_i)\, p(p_i), \tag{23}$$

where the terms on the right-hand side of the equation are given by the density functions corresponding to Equations (13)–(22).


3.3 Estimation

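Posterior distributions can be obtained from Markov chain Monte Carlo (MCMC) simulation, where the latent variables $S = (s_{11}, s_{21}, \dots, s_{k,T-h})'$, $\Gamma = (\gamma_1, \dots, \gamma_k)'$, $K = (K_1, \dots, K_{T-h})'$, $A = (\alpha_1, \dots, \alpha_{T-h})'$ and $d$ are sampled alongside the parameters in $\theta$. In order to use the MCMC sampler, we need to construct the complete data likelihood function, which is given by

$$p(y, \Gamma, S, A, K, d \mid \theta) = p(y \mid \beta, \sigma_1^2, \delta^2, d, \Gamma, S, A)\, p(\Gamma \mid \lambda)\, p(S \mid Q, P)\, p(A \mid K, \sigma_\xi^2)\, p(K \mid \pi)\, p(d \mid \kappa), \tag{24}$$

where the density of the data is given by

$$p(y \mid \beta, \sigma_1^2, \delta^2, d, \Gamma, S, A) \propto \sigma_1^{-\tau} \sigma_2^{-(T-h-\tau)} \exp\left(-\frac{\sum_{t=1}^{\tau}\left(y_{t+h} - \alpha_t - \sum_{i=1}^{k} \gamma_i s_{it} \beta_i x_{it}\right)^2}{2\sigma_1^2}\right) \exp\left(-\frac{\sum_{t=\tau+1}^{T-h}\left(y_{t+h} - \alpha_t - \sum_{i=1}^{k} \gamma_i s_{it} \beta_i x_{it}\right)^2}{2\sigma_2^2}\right) \tag{25}$$

and the density functions of the latent variables are given by

$$p(\Gamma \mid \lambda) \propto \prod_{i=1}^{k} \lambda^{\gamma_i} (1-\lambda)^{1-\gamma_i}, \tag{26}$$

$$p(S \mid Q, P) = \prod_{i=1}^{k} \prod_{t=1}^{T-h} \left[p_i^{s_{it}} (1-p_i)^{1-s_{it}}\right]^{I_1} \left[(1-q_i)^{s_{it}} q_i^{1-s_{it}}\right]^{I_0} \delta(s_{i,t-1})^{1-I_1-I_0}, \tag{27}$$

$$p(A \mid K, \sigma_\xi^2) \propto \prod_{t=1}^{T-h} \left[\sigma_\xi^{-1} \exp\left(-\frac{(\alpha_t - \alpha_{t-1})^2}{2\sigma_\xi^2}\right)\right]^{K_t} \delta(\alpha_{t-1})^{1-K_t}, \tag{28}$$

$$p(d \mid \kappa) \propto \kappa^{d} (1-\kappa)^{1-d} \tag{29}$$

and

$$p(K \mid \pi) \propto \prod_{t=1}^{T-h} \pi^{K_t} (1-\pi)^{1-K_t}, \tag{30}$$

with $I_1 = I[s_{i,t-1} = s_{i,t-2} = s_{i,t-3} = s_{i,t-4} = 1]$ and $I_0 = I[s_{i,t-1} = s_{i,t-2} = s_{i,t-3} = s_{i,t-4} = 0]$. $\delta(x)$ denotes a point mass at $x$.

In each iteration of the sampler, the parameters and latent variables are drawn from their full conditional distributions. The sampler consists of the following steps: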

1. Draw $\beta$ conditional on $\theta$, $\Gamma$, $S$, $A$ and $y$.
2. Draw $P$ and $Q$ conditional on $S$ and $y$.
3. Draw $\pi$ conditional on $K$ and $y$.
4. Draw $\lambda$ conditional on $\Gamma$ and $y$.
5. Draw $\sigma_1^2$ conditional on $\theta$, $\Gamma$, $S$, $A$ and $y$.
6. Draw $\delta^2$ conditional on $\theta$, $\Gamma$, $S$, $A$ and $y$.
7. Draw $\kappa$ conditional on $d$ and $y$.
8. Draw $\sigma_\xi^2$ conditional on $A$, $K$ and $y$.
9. Draw $\tau$ conditional on $\theta$, $\Gamma$, $S$, $A$ and $y$.
10. Draw $S$ conditional on $\theta$, $\Gamma$, $A$ and $y$.
11. Draw $A$ conditional on $\theta$, $\Gamma$, $S$, $K$ and $y$.
12. Draw $K$ conditional on $\theta$, $\Gamma$, $S$ and $y$.
13. Draw $d$ conditional on $\theta$ and $y$.
14. Draw $\Gamma$ conditional on $\theta$, $S$, $A$ and $y$.

Appendix A describes the steps in more detail.
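As an illustration of the simplest steps, the draws of $\lambda$, $\pi$ and $\kappa$ can be sketched with the standard Beta-Bernoulli conjugate update: a Beta prior as in Equations (16), (17) and (22), combined with the Bernoulli-type densities in Equations (26), (29) and (30), gives a Beta full conditional. The sketch below is not taken from Appendix A; the function names `beta_posterior` and `draw_probability` and the example indicator values are hypothetical.

```python
import random

def beta_posterior(prior_a, prior_b, indicators):
    """Parameters of the Beta full conditional given 0/1 indicators."""
    successes = sum(indicators)
    failures = len(indicators) - successes
    return prior_a + successes, prior_b + failures

def draw_probability(prior_a, prior_b, indicators, rng):
    """Draw a probability from its Beta full conditional."""
    a, b = beta_posterior(prior_a, prior_b, indicators)
    return rng.betavariate(a, b)

rng = random.Random(0)
gamma = [1, 0, 0, 1, 0, 0]   # hypothetical inclusion indicators, k = 6
breaks = [0] * 19 + [1]      # hypothetical level-shift indicators K_t
d = [1]                      # hypothetical variance-break indicator

lam = draw_probability(5, 10, gamma, rng)   # step 4: lambda given Gamma
pi = draw_probability(5, 95, breaks, rng)   # step 3: pi given K
kappa = draw_probability(10, 1, d, rng)     # step 7: kappa given d
print(lam, pi, kappa)
```

The prior parameters passed here are the hyperparameter values chosen in section 4.2; the other steps of the sampler involve the data and are not covered by this sketch.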

3.4 Convergence

In order to assess whether the sampler has converged, I use two convergence diagnostics that are calculated separately for each set of draws for all parameters of the model: GEWEKE (1992) convergence tests and inefficiency factors. For a sample of draws $g^{(1)}, \dots, g^{(N)}$, the GEWEKE (1992) test statistic tests for equality of the means of two subsets of draws, one at the beginning and one at the end of the sequence. If the first subsample contains $N_1$ draws and the second subsample $N_2$, the statistic can be computed as

$$t_G = \frac{\dfrac{1}{N_1} \sum_{i=1}^{N_1} g^{(i)} - \dfrac{1}{N_2} \sum_{i=N-N_2+1}^{N} g^{(i)}}{\sqrt{\dfrac{1}{N_1}\sigma_1^2 + \dfrac{1}{N_2}\sigma_2^2}}, \tag{31}$$

where $\sigma_1$ and $\sigma_2$ are the heteroskedasticity and autocorrelation consistent estimators of the standard deviations in the first and last subsamples, respectively, calculated as in NEWEY and WEST (1986) using a Bartlett kernel and a bandwidth of 4% of the number of draws in the sample. Calculated values of the $t_G$ statistic can be compared with critical values of the standard normal distribution. Following GROEN et al. (2013), I take $N_1 = 0.2N$ and $N_2 = 0.4N$. Likewise, I calculate the inefficiency factors for all parameters as

$$1 + 2\sum_{i=1}^{\infty} \rho_i, \tag{32}$$

with $\rho_i$ the $i$th-order autocorrelation of the sample of draws. The inefficiency factor equals the variance of the mean of the draws, divided by the variance that would be obtained with independent draws. The variance of the samples is calculated as the NEWEY and WEST (1986) variance using a Bartlett kernel and a bandwidth of 4% of the number of draws in the sample. The inefficiency factor gives an indication of the number of draws needed for an accurate analysis of a certain parameter. An inefficiency factor of, for example, 20 means that at least 2000 draws are needed if the variance of the mean of the set of draws should be at most 1% of the variation owing to the data.
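Both diagnostics can be sketched in a few lines of plain Python. This is one possible reading of the description above, not the author's code: the function names are hypothetical, the 4% bandwidth is applied to whichever (sub)sample is passed in, and the Geweke statistic is computed as the difference of the two subsample means divided by its estimated standard error.

```python
import math

def long_run_variance(draws, bandwidth_share=0.04):
    """Newey-West variance with a Bartlett kernel, plus the plain variance."""
    n = len(draws)
    mean = sum(draws) / n
    dev = [x - mean for x in draws]
    bandwidth = max(1, int(bandwidth_share * n))
    gamma0 = sum(v * v for v in dev) / n
    lrv = gamma0
    for lag in range(1, bandwidth + 1):
        autocov = sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n
        lrv += 2.0 * (1.0 - lag / (bandwidth + 1)) * autocov
    return lrv, gamma0

def geweke_statistic(draws, first=0.2, last=0.4):
    """Compare the mean of the first 20% of draws with that of the last 40%."""
    n = len(draws)
    n1, n2 = int(first * n), int(last * n)
    head, tail = draws[:n1], draws[n - n2:]
    var1, _ = long_run_variance(head)
    var2, _ = long_run_variance(tail)
    diff = sum(head) / n1 - sum(tail) / n2
    return diff / math.sqrt(var1 / n1 + var2 / n2)

def inefficiency_factor(draws):
    """Ratio of the long-run variance to the variance under independence."""
    lrv, gamma0 = long_run_variance(draws)
    return lrv / gamma0
```

For independent draws the inefficiency factor should be close to 1, whereas strongly autocorrelated chains give much larger values; a Geweke statistic that is large in absolute value signals that the early and late parts of the chain have different means.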

4 Data and prior choices

4.1 Data

In this paper, I consider two measures of quarterly US inflation over the period 1960Q1–2011Q2 to forecast in real time, being the growth in the personal consumption expenditure (PCE) deflator and the gross domestic product (GDP) deflator. Both inflation measures move relatively close together, but as their composition differs, it is likely that their behavior can differ substantially. For both inflation measures, the original vintages of real-time data are available.

As it is the goal of this paper to forecast inflation in real time, the possible predictors have to be selected on the basis of the availability of the original real-time vintages. This leaves a set of seven real activity measures, being real GDP in volume terms (ROUTP), real durable PCE in volume terms (RCONS), real residential investment in volume terms (RINVR), the import deflator (PIMP), the unemployment ratio (UNEMP), non-farm payroll data on employment (NFPR) and housing starts (HSTS).

Next to real activity measures, I also use nominal variables such as the M2 monetary aggregate and term structure data. As the term structure information can be approximated by three factors, I use the level factor (YL), slope factor (TS) and curvature factor (CS) based on Treasury Bill rates and zero-coupon bond yields. Finally, I use the one-year-ahead inflation expectations from the Reuters/Michigan Survey of Consumers (MS) as a possible predictor.

Using the same data set,² GROEN et al. (2013) conducted stationarity tests on each series and transformed those to stationarity where necessary by computing the percentage change. In this paper, I use the same transformations as GROEN et al. (2013). The transformations used are shown in Table 1.

All variables are centered and scaled by subtracting their respective means and dividing by their standard deviations in each vintage. Subtracting the means will remove the constant part of the predictors, as an intercept is always included in the model. Dividing the variables by their standard deviation will eliminate possible bias toward including variables that have a high variance.

4.2 Prior choice

In order to use the MCMC sampler, we have to specify the hyperparameters for the prior distributions given by Equations (13)–(22). With regard to the prior mean of $\beta$, $B_0$, I follow KUO and MALLICK (1998) and set $B_0 = 0$ after centering and scaling all predictor variables. For the choice of $g$, we have to take into account that, in general, small values of $g$ lead to saturated models with small $\beta$ coefficients, whereas large values lead to parsimonious models with a few large coefficients (GEORGE and FOSTER, 2000). However, too large values of $g$ put more emphasis on the null model (CHIPMAN et al., 2001). FERNANDEZ, LEY and STEEL (2001) examine a range of specifications for $g$ and conclude that $g = T$ is the optimal choice for small $T$ when the dependent variable is assumed to be generated from a model with relatively few predictors. Hence, I set $g = T$, which corresponds to assigning the amount of information of one observation to the prior.

The parameters $a_{1i}$, $b_{1i}$, $a_{0i}$ and $b_{0i}$ can be chosen to reflect the belief on the persistence of the explanatory power of each variable. It seems reasonable to assume that if a variable has explanatory power at some time, it will also have explanatory power in the next period with a large probability. The same goes for variables that are not useful at a certain period. Therefore, I set $a_{1i} = a_{0i} = 30$ and $b_{1i} = b_{0i} = 30/19$ for all $i$, so that the prior mean probability of remaining in the current state equals 0.95 (Table 2).

I set $\upsilon_\lambda = 5$ and $\omega_\lambda = 10$, so that the expected model size is $k/3$. The parameters $\upsilon_\pi$ and $\omega_\pi$ can be set to reflect beliefs on the frequency of level shifts, whereas $\nu_\xi$ and $\eta_\xi$ regulate the size of the breaks. The intercept should exhibit relatively large shifts of the level of inflation, caused by changes in monetary policy and crises, so I set $\upsilon_\pi = 5$, $\omega_\pi = 95$, $\nu_\xi = 1$ and $\eta_\xi = 10$ (the values listed in Table 2). The prior mean probability of a level shift is then 0.05 per quarter, which means that a break occurs on average every 20 quarters. For the variance $\sigma_1^2$, I use an informative prior with $\nu_1 = 2$ and $\eta_1 = 10$, so that $E[\sigma_1^2] = 0.25$. For $\delta^2$, I take $\nu_\delta = 1$ and $\eta_\delta = 3$, so that $\delta^2$ is centered around 1, as $\sigma_2^2$ can be expected to be close to $\sigma_1^2$. $\upsilon_\kappa$ and $\omega_\kappa$ are set to 10 and 1, respectively. Finally, I set $a = 10$, so that a break in the variance cannot occur in the first or last 10 observations.
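The prior means implied by these hyperparameters can be checked with a few lines of arithmetic. The sketch below is illustrative: the helper names `beta_mean` and `ig2_mean` are hypothetical, and the inverted gamma-2 mean $\nu/(\eta - 2)$ follows the convention stated in section 3.2.

```python
import math

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def ig2_mean(nu, eta):
    """Mean nu / (eta - 2), the convention used in the text for IG-2(nu, eta)."""
    if not eta > 2:
        raise ValueError("the mean exists only for eta > 2")
    return nu / (eta - 2)

prior_means = {
    "stay_probability": beta_mean(30, 30 / 19),    # p_i and q_i
    "inclusion_probability": beta_mean(5, 10),     # lambda
    "level_shift_probability": beta_mean(5, 95),   # pi
    "variance_break_probability": beta_mean(10, 1),  # kappa
    "sigma1_sq": ig2_mean(2, 10),
    "delta_sq": ig2_mean(1, 3),
}

for name, value in prior_means.items():
    print(f"{name}: {value:.4f}")
```

The printed values reproduce the figures quoted in the text: a persistence of 0.95, an expected model size of one third of $k$, and $E[\sigma_1^2] = 0.25$.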

5 Full-sample results

This section presents the results from estimating the model described in section 3 overthe full 1960Q1–2011Q2 sample for both the PCE deflator and GDP deflator inflationmeasures and for forecast horizons h=1 and h=5. The model is estimated using thecomplete set of variables described in section 4 and four lags of the relevant inflationmeasure and using only four lags of inflation, which is essentially an autoregressive(AR) model with a variable lag length. The model using the full set of variables willbe referred to as the BMA–time-varying variable selection (BMA-TVS) model, and

Table 1. Transformations of variables to en-sure stationarity based on GROEN et al. (2013)

Variable Transformation

ROUTP Percentage changeRCONS Percentage changeRINVR Percentage changePIMP Percentage changeUNEMP LevelNFPR Percentage changeHSTS LogarithmM2 Percentage changeYL LevelTS LevelCS LevelMS Level

Note: CS, curvature factor; HSTS, housing starts;MS, one-year ahead inflation expectations fromthe Reuters/Michigan Survey of Consumers;NFPR, non-farm payroll data on employment;PIMP, import deflator; RCONS, real durablepersonal consumption expenditure in volumeterms; RINVR, real residential investment in vol-ume terms; ROUTP, real gross domestic productin volume terms; TS, slope factor; UNEMP, un-employment ratio; YL, level factor.

Forecasting inflation using time-varying BMA 159

© 2014 The Authors. Statistica Neerlandica © 2014 VVS.

the model using only lags as the BMA-TVS-AR model. Convergence statistics for estimation of the models can be found in Appendix B.

Tables 3 and 4 show the marginal posterior inclusion probabilities for each variable, averaged over time for the BMA-TVS and BMA-TVS-AR models. An interesting feature of the marginal posterior inclusion probabilities is that the approach tends to select quite parsimonious models. For each inflation measure and forecasting horizon, a small set of predictors is often selected, whereas the inclusion probabilities of other variables are shrunk toward 0 – below 5% on average. It is also striking that for the longer forecast horizon h=5, the model tends to select fewer predictors than for h=1. Indeed, the mean inclusion probabilities for h=1 for the BMA-TVS specification are 14% for both PCE and GDP deflator inflation but drop to 11% and 9%, respectively, for h=5. The same pattern is visible for the BMA-TVS-AR model. Hence, the full-sample results show that, given a large set of potential explanatory variables, the BMA-TVS specification averages over parsimonious models, thereby reducing the risk of overfitting, which can be beneficial for out-of-sample forecasting.

The marginal posterior inclusion probabilities can also be used to gain some insight into which predictors are most useful for predicting a certain inflation measure at a certain forecasting horizon. When looking at the probabilities for the current quarter nowcasts in Table 3, one may notice some similarities between both inflation measures. For both measures, UNEMP, level factor and PIMP are often selected. However, the PCE deflator inflation seems to be strongly correlated to inflation lags, whereas the most important predictors for the GDP deflator inflation are the macroeconomic factors UNEMP and PIMP. It is interesting to see that the models for h=5 select completely different predictors than the h=1 models. For one-year-ahead predictions, the most valuable information seems to be present in the yield curve factors and HSTS.

Table 2. Hyperparameter specification for the model described in section 3

Parameter    Value
B_0          0
g            T
a_{1i}       30 for i=1, …, k
b_{1i}       30 for i=1, …, k
a_{0i}       30/19 for i=1, …, k
b_{0i}       30/19 for i=1, …, k
υ_π          5
ω_π          95
υ_λ          5
ω_λ          10
υ_κ          10
ω_κ          1
ν_1          2
η_1          10
ν_δ          1
η_δ          3
ν_ξ          1
η_ξ          10
a            10

160 J. van der Maas


Table 4 indicates that for h=1, the current quarter is most informative for forecasting inflation. An interesting result is that the third lag is often selected for PCE deflator inflation, whereas it is almost never selected in the case of the GDP deflator inflation. This could be an indication that there is some seasonality present in the GDP deflator series, but not in the PCE deflator. For h=5, we see that the inclusion probabilities decrease relative to the h=1 case, and the model only selects one predictor with a probability larger than 0.2.

We can further zoom into the marginal posterior inclusion probabilities by graphing them over time, as in Figures 1 and 2.

Table 3. Variable inclusion probabilities calculated as posterior means of $\sum_{t=1}^{T-h} \Pr[\gamma_i s_{it} = 1 \mid y]/(T-h)$ for the Bayesian model averaging–time-varying variable selection model

             GDP deflator inflation    PCE deflator inflation
             h=1        h=5            h=1        h=5
MS           0.29       0.05           0.07       0.07
UNEMP        0.61       0.09           0.22       0.07
HSTS         0.05       0.25           0.07       0.61
YL           0.11       0.12           0.17       0.12
TS           0.08       0.29           0.08       0.25
CS           0.03       0.05           0.04       0.03
M2           0.04       0.05           0.03       0.04
ROUTP        0.03       0.04           0.02       0.07
NFPR         0.07       0.04           0.09       0.05
RCONS        0.02       0.03           0.03       0.05
PIMP         0.46       0.04           0.35       0.12
RINVR        0.04       0.03           0.04       0.04
y_t          0.07       0.04           0.41       0.03
y_{t−1}      0.04       0.16           0.06       0.11
y_{t−2}      0.33       0.05           0.08       0.04
y_{t−3}      0.04       0.07           0.47       0.04

Note: CS, curvature factor; GDP, gross domestic product; HSTS, housing starts; MS, one-year ahead inflation expectations from the Reuters/Michigan Survey of Consumers; NFPR, non-farm payroll data on employment; PCE, personal consumption expenditure; PIMP, import deflator; RCONS, real durable personal consumption expenditure in volume terms; RINVR, real residential investment in volume terms; ROUTP, real gross domestic product in volume terms; TS, slope factor; UNEMP, unemployment ratio; YL, level factor.
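The average inclusion probabilities discussed in the text can be recomputed from the entries of Table 3. The short sketch below simply takes column means of the tabulated values (the array name `table3` and the column labels are illustrative choices); it gives roughly 14% for both measures at h=1 and values of about 9–11% at h=5.

```python
import numpy as np

# Rows follow Table 3 (MS, UNEMP, ..., y_{t-3});
# columns: GDP h=1, GDP h=5, PCE h=1, PCE h=5.
table3 = np.array([
    [0.29, 0.05, 0.07, 0.07],  # MS
    [0.61, 0.09, 0.22, 0.07],  # UNEMP
    [0.05, 0.25, 0.07, 0.61],  # HSTS
    [0.11, 0.12, 0.17, 0.12],  # YL
    [0.08, 0.29, 0.08, 0.25],  # TS
    [0.03, 0.05, 0.04, 0.03],  # CS
    [0.04, 0.05, 0.03, 0.04],  # M2
    [0.03, 0.04, 0.02, 0.07],  # ROUTP
    [0.07, 0.04, 0.09, 0.05],  # NFPR
    [0.02, 0.03, 0.03, 0.05],  # RCONS
    [0.46, 0.04, 0.35, 0.12],  # PIMP
    [0.04, 0.03, 0.04, 0.04],  # RINVR
    [0.07, 0.04, 0.41, 0.03],  # y_t
    [0.04, 0.16, 0.06, 0.11],  # y_{t-1}
    [0.33, 0.05, 0.08, 0.04],  # y_{t-2}
    [0.04, 0.07, 0.47, 0.04],  # y_{t-3}
])
columns = ["GDP h=1", "GDP h=5", "PCE h=1", "PCE h=5"]

# Mean inclusion probability per inflation measure and horizon.
mean_inclusion = dict(zip(columns, table3.mean(axis=0)))
for name, value in mean_inclusion.items():
    print(f"{name}: {value:.3f}")
```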

Table 4. Variable inclusion probabilities calculated as posterior means of $\sum_{t=1}^{T-h} \Pr[\gamma_i s_{it} = 1 \mid y]/(T-h)$ for the Bayesian model averaging–time-varying variable selection–autoregressive model

             GDP deflator inflation    PCE deflator inflation
             h=1        h=5            h=1        h=5
y_t          0.44       0.05           0.54       0.08
y_{t−1}      0.06       0.07           0.13       0.27
y_{t−2}      0.19       0.04           0.08       0.11
y_{t−3}      0.04       0.50           0.40       0.04

Note: GDP, gross domestic product; PCE, personal consumption expenditure.


From Figure 1, we can see that the relations between inflation measures and predictors can differ substantially over time and per forecast horizon. For h=1, the models include a few variables with a moderately high probability – never higher than 80% – which are roughly equally important and which evolve gradually over time. We can also see that inclusion probabilities are generally higher in volatile times like the 1980s. For the h=5 horizon, the models select one or two predictors with high probabilities, as high as 100% in the case of HSTS for the PCE deflator inflation. The other variables are moderately important and are relatively stable over time. Just like for h=1, the plots show peaks in inclusion probabilities during volatile or high-inflation periods. For the PCE deflator inflation, a sudden increase in HSTS inclusion coincides with the 1973–1975 recession, which includes the oil crisis. The HSTS inclusion probability drops until the 1979 energy crisis, where it starts to rise again and drops at the end of the recession in 1982. The later peaks in HSTS inclusion coincide with the early 2000s recession and the recent Great Recession. For the GDP deflator inflation, a similar pattern is visible, although less clearly.

Fig. 1. Marginal posterior means of γ_i s_{it} plotted over time for personal consumption expenditure (PCE) and gross domestic product (GDP) deflator inflation for h=1 and h=5 for the Bayesian model averaging–time-varying variable selection model for the five variables with the highest marginal posterior inclusion. Panels: (a) PCE, h=1; (b) PCE, h=5; (c) GDP, h=1; (d) GDP, h=5. HSTS, housing starts; MS, one-year ahead inflation expectations from the Reuters/Michigan Survey of Consumers; PIMP, import deflator; TS, slope factor; UNEMP, unemployment ratio; YL, level factor.


Above all, Figure 1 shows the flexibility of the BMA-TVS specification in capturing time-varying relations between inflation and its predictors. The specification allows for gradual changes in inclusion probabilities, such as the inclusion of UNEMP for PCE deflator inflation and h=1, as well as sudden shifts in probabilities, like for HSTS for PCE and h=5. This flexibility allows the model to capture any type of changing relationships between inflation and predictors that traditional approaches would not be able to.

Figure 2 shows that the relations between one-quarter-ahead inflation and present quarter inflation are quite similar for both PCE and GDP deflator inflations. Both plots indicate peaks of the inclusion probabilities in the 1970s and 1980s and a period of high inclusion between 1990 and 2005. However, for the PCE inflation, y_{t−1} and y_{t−3} are almost never selected, which is not the case for the GDP deflator inflation. The h=5 results also show an interesting picture. The PCE deflator inflation model only selects y_{t−2} with a high probability, especially during volatile times such as the 1970s and the 1980s, whereas the other lags are almost never selected. For the GDP deflator inflation, the variable inclusion is even lower. For some periods, all inclusion probabilities are below 10%.

Fig. 2. Marginal posterior means of γ_i s_{it} plotted over time for personal consumption expenditure (PCE) and gross domestic product (GDP) deflator inflation for h=1 and h=5 for the Bayesian model averaging–time-varying variable selection–autoregressive model. Panels: (a) PCE, h=1; (b) PCE, h=5; (c) GDP, h=1; (d) GDP, h=5.


6 Forecasting

6.1 Forecasting procedure and evaluation

To evaluate the forecasting performance of the framework described in section 3, I obtain one-quarter-ahead nowcasts (h=1) and one-year-ahead forecasts (h=5) over the sample 1986Q1–2011Q2. The forecasting procedure consists of re-estimating the model using an expanding window of real-time data. Suppose the first forecast is made in quarter $t_0$. This boils down to estimating the model for h=1 and h=5 using data available in quarter $t_0$, which are data on the periods $t = 1, 2, \ldots, t_0 - h$ as it is known at $t_0$. These models are used to obtain density forecasts for the inflation at quarters $t_0$ and $t_0 + 4$, which are compared with the realizations at quarters $t_0 + 1$ and $t_0 + 5$, respectively. Subsequently, the model is re-estimated using data on the periods $t = 1, 2, \ldots, t_0 - h + 1$ to produce forecasts for $t_0 + h$. This procedure continues until the end of the sample.

To obtain draws from the predictive density of $y_{T+h+1}$ given data available in period $T+1$, we need out-of-sample draws of $(s_{i,T+1}, \ldots, s_{i,T+h+1})$, $(K_{T+1}, \ldots, K_{T+h+1})$ and $(\alpha_{T+1}, \ldots, \alpha_{T+h+1})$. In each step of the sampler, draws of $(s_{i,T+1}, \ldots, s_{i,T+h+1})$ can be obtained given the current sampled values of $(s_{i,T-3}, \ldots, s_{i,T})$ by sampling from $p(s_{it} \mid s_{i,t-1}, \ldots, s_{i,t-4})$ (see, e.g., ALBERT and CHIB, 1993, for a similar approach). Values of $(K_{T+1}, \ldots, K_{T+h+1})$ can be sampled using the current value of $\pi$. Given draws of $(K_{T+1}, \ldots, K_{T+h+1})$ and $\alpha_T$, $(\alpha_{T+1}, \ldots, \alpha_{T+h+1})$ can be sampled from Equation (8). Given draws of $(s_{i,T+1}, \ldots, s_{i,T+h+1})$, $\theta$, $\Gamma$ and $(\alpha_{T+1}, \ldots, \alpha_{T+h+1})$, a value of $y_{T+h+1}$ can be sampled from the predictive density

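$$
y_{T+h+1} \mid \alpha_{T+h+1}, \{\gamma_i\}_{i=1}^{k}, s_{i,T+h+1}, \beta, \sigma_2^2 \sim N\Big(\alpha_{T+h+1} + \sum_{i=1}^{k} \gamma_i s_{i,T+1} \beta_i x_{i,T+1},\; \sigma_2^2\Big). \tag{33}
$$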
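As a minimal sketch of the last step, one draw from the normal predictive density in Equation (33) can be generated as follows, given one set of sampled quantities. The function name `draw_predictive` and its arguments are hypothetical, and the out-of-sample draws of the states, breaks and intercepts are assumed to have been produced already by the sampler.

```python
import numpy as np

def draw_predictive(alpha, gamma, s, beta, x, sigma2, rng):
    """Draw y from N(alpha + sum_i gamma_i * s_i * beta_i * x_i, sigma2).

    alpha  : sampled intercept for the forecast period (scalar)
    gamma  : 0/1 inclusion indicators, length k
    s      : 0/1 time-varying selection states, length k
    beta   : regression coefficients, length k
    x      : predictor values, length k
    sigma2 : conditional variance (scalar)
    """
    mean = alpha + np.sum(gamma * s * beta * x)
    return rng.normal(mean, np.sqrt(sigma2))

rng = np.random.default_rng(0)
y_draw = draw_predictive(
    alpha=1.0,
    gamma=np.array([1, 0]),
    s=np.array([1, 1]),
    beta=np.array([2.0, 5.0]),
    x=np.array([0.5, 3.0]),
    sigma2=0.25,
    rng=rng,
)
```

Repeating this for every retained MCMC iteration yields a sample from the predictive density, from which means, medians and density scores can be computed.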

Point forecasts can be obtained by taking the means and medians of the predictive densities of $y_{T+h+1}$, which can be compared with the realized inflation rates with the RMSE or the mean absolute error (MAE), respectively. The RMSE is an appropriate measure in case the mean is used as a point forecast, whereas the MAE is appropriate when using the median. The forecast performance will be measured by comparing the RMSE and MAE resulting from forecasting from (33) to forecast measures resulting from forecasting with a range of benchmark models, which will be described in the next subsection. The measures can be computed as

$$
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\sum_{t=t_0-1}^{T-h} \left(\hat{y}_{t+h} - y_{t+h}\right)^2}{T - h - t_0}} \tag{34}
$$

and

$$
\mathrm{MAE} = \frac{\sum_{t=t_0-1}^{T-h} \left|\hat{y}_{t+h} - y_{t+h}\right|}{T - h - t_0}, \tag{35}
$$

where $\hat{y}_{t+h}$ is the point forecast made using data available at time $t$ and $y_{t+h}$ is the inflation measure of quarter $t+h$ as known at quarter $t+h+1$.
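A small sketch of these two measures is given below. It assumes the forecasts and realizations are already aligned in two arrays and normalizes by the number of forecasts, which is a simplification relative to the exact denominators written above; the function names are illustrative.

```python
import numpy as np

def rmse(forecasts, realizations):
    """Root mean squared error of point forecasts (suited to posterior means)."""
    errors = np.asarray(forecasts, dtype=float) - np.asarray(realizations, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))

def mae(forecasts, realizations):
    """Mean absolute error of point forecasts (suited to posterior medians)."""
    errors = np.asarray(forecasts, dtype=float) - np.asarray(realizations, dtype=float)
    return float(np.mean(np.abs(errors)))

# Point forecasts from predictive draws: one row per forecast origin.
draws = np.array([[1.0, 2.0, 3.0],
                  [2.0, 2.0, 5.0]])
mean_forecasts = draws.mean(axis=1)          # used with the RMSE
median_forecasts = np.median(draws, axis=1)  # used with the MAE
realized = np.array([2.5, 2.0])
print(rmse(mean_forecasts, realized), mae(median_forecasts, realized))
```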


Following GROEN et al. (2013), I use the DIEBOLD and MARIANO (1995) test with the HARVEY, LEYBOURNE and NEWBOLD (1997) small sample correction to evaluate the significance of the difference of the RMSE or MAE of the model relative to any of the benchmarks used. The test statistic can be calculated as

$$
t_{HLN} = \sqrt{\frac{(T - t_0 - h) + 1 - 2h + (T - t_0 - h)^{-1} h (h - 1)}{T - t_0 - h}}\; t_{DM}, \tag{36}
$$

where $t_{DM}$ is the DIEBOLD and MARIANO (1995) test statistic:

$$
t_{DM} = \sqrt{T - t_0 - h}\, \left(\frac{d_l - d_{BENCH}}{\sigma}\right). \tag{37}
$$

Here $d_l$ is either the MSE or MAE for model $l$. Likewise, $d_{BENCH}$ is the MSE or MAE for a benchmark model. $\sigma$ is the square root of the heteroskedasticity and autocorrelation consistent variance, as computed by DIEBOLD and MARIANO (1995), of the difference in either squared forecast errors (for MSE) or absolute errors (for MAE). The $t_{HLN}$ statistic is computed for all variants of the model described in section 3 versus all benchmark models, after which it can be compared with critical values of the standard normal distribution to test the null hypothesis of equal forecasting accuracy.

The RMSE and MAE only evaluate the point forecasts and ignore the rest of the predictive density. As unusual or extreme events will be located away from the mean or median of the predictive density, it is also useful to evaluate the density forecasts. A measure to evaluate density forecasts is the continuous ranked probability score (CRPS), defined as

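$$
\mathrm{CRPS}(t+h) = \int_{-\infty}^{\infty} \left(F(x) - I[y_{t+h} \le x]\right)^2 dx \tag{38}
$$

$$
= E\left|Y_{t+h} - y_{t+h}\right| - \tfrac{1}{2} E\left|Y_{t+h} - Y'_{t+h}\right|, \tag{39}
$$

where $F$ is the cumulative distribution function corresponding to the density forecast made at time $t$, $I$ is an indicator function and $Y_{t+h}$ and $Y'_{t+h}$ are independent random variables with a distribution function equal to the predictive density. $Y_{t+h}$ can be obtained from the MCMC draws, and $Y'_{t+h}$ can be obtained by resampling of the draws. For models with a normally distributed error term, the CRPS can be computed analytically as

$$
\mathrm{CRPS}(t+h) = \sigma \left[ \frac{y_{t+h} - E[y_{t+h}]}{\sigma} \left( 2\Phi\!\left(\frac{y_{t+h} - E[y_{t+h}]}{\sigma}\right) - 1 \right) + 2\phi\!\left(\frac{y_{t+h} - E[y_{t+h}]}{\sigma}\right) - \frac{1}{\sqrt{\pi}} \right], \tag{40}
$$

where $\phi$ and $\Phi$ denote the standard normal density and distribution function, so that (40) is non-negative and agrees in sign with (38) and (39).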
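The two routes to the CRPS can be sketched as follows. The sample-based version evaluates the expectation form (39) on a set of predictive draws, using all pairs of draws for the second term rather than a single resampled copy (a variant chosen here for convenience). The closed form for a normal predictive density is written in the orientation in which lower values are better, matching (38) and (39). Function names are illustrative.

```python
import math
import numpy as np

def crps_from_draws(draws, y):
    """CRPS estimate E|Y - y| - 0.5 * E|Y - Y'| from predictive draws."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    term1 = np.mean(np.abs(x - y))
    # Mean of |x_i - x_j| over all n^2 ordered pairs, via order statistics.
    ranks = np.arange(1, n + 1)
    term2 = 2.0 * np.sum((2 * ranks - n - 1) * x) / n ** 2
    return float(term1 - 0.5 * term2)

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast; lower is better."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

rng = np.random.default_rng(42)
draws = rng.normal(0.0, 1.0, size=20000)
print(crps_from_draws(draws, 0.5), crps_normal(0.0, 1.0, 0.5))
```

With a large number of draws from a normal predictive density, the two numbers should be close to each other.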

As the CRPS measures the distance between the predictive density and the empirical cumulative distribution function of realized inflation, a low value means a good density forecast. As a measure of forecasting performance over the entire forecasting sample, we can take the average of the CRPS over all predictions:


$$
\mathrm{CRPS} = \frac{1}{T - t_0 - h} \sum_{t=t_0-1}^{T-h} \mathrm{CRPS}(t+h). \tag{41}
$$

Following GROEN et al. (2013), differences in CRPS will be tested for statistical significance using the $t_{HLN}$ statistic.
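A hedged sketch of the test statistic in Equations (36) and (37) follows. It takes two aligned series of per-period losses (squared errors, absolute errors or CRPS values) and assumes that the long-run variance is estimated with a rectangular window over autocovariances up to lag h−1, which is one common reading of the DIEBOLD and MARIANO (1995) estimator; the function name and this windowing choice are assumptions of the sketch.

```python
import numpy as np

def hln_statistic(loss_model, loss_bench, h):
    """Diebold-Mariano statistic with the Harvey-Leybourne-Newbold correction.

    Negative values indicate lower average loss for the model than for
    the benchmark.
    """
    d = np.asarray(loss_model, dtype=float) - np.asarray(loss_bench, dtype=float)
    n = d.size
    d_bar = d.mean()
    centered = d - d_bar
    # Long-run variance: autocovariances up to lag h-1 (rectangular window).
    lrv = np.mean(centered ** 2)
    for lag in range(1, h):
        lrv += 2.0 * np.sum(centered[lag:] * centered[:-lag]) / n
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    t_dm = np.sqrt(n) * d_bar / np.sqrt(lrv)
    correction = np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return float(correction * t_dm)

model_losses = [1.0, 2.0, 0.5, 1.5, 1.0]
bench_losses = [1.5, 2.5, 1.5, 1.0, 2.0]
print(hln_statistic(model_losses, bench_losses, h=1))
```

For h=1 the corrected statistic coincides with the usual one-sample t-statistic of the loss differences, which is a convenient sanity check.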

6.2 Alternative models

The forecasting performance of the model of section 3 will be compared with the forecasting performance of a set of benchmark models that have proven to be hard to beat. The benchmark models include the standard random-walk model, which predicts h-period-ahead inflation as being equal to present inflation:

$$
y_{t+h} = y_t + \sigma \varepsilon_t, \qquad \varepsilon_t \sim N(0, 1). \tag{42}
$$

The second benchmark under consideration is the ATKESON and OHANIAN (2001) random-walk (AORW) model, which models h-period-ahead inflation as the average quarterly inflation over the previous year:

$$
y_{t+h} = \frac{1}{4} \sum_{i=0}^{3} y_{t-i} + \sigma \varepsilon_t, \qquad \varepsilon_t \sim N(0, 1). \tag{43}
$$

Furthermore, I compare the forecast performance to simple AR(p) models, for p=1, …, 4:

$$
y_{t+h} = \alpha + \sum_{i=1}^{p} \beta_i y_{t-i+1} + \sigma \varepsilon_t. \tag{44}
$$

Finally, I compute forecasts using an AR(p*) model, where $p^* \in \{1, \ldots, 4\}$ is selected on the basis of the Bayesian information criterion (BIC), which is often used in practice. This model will be referred to as the AR BIC model:

$$
y_{t+h} = \alpha + \sum_{i=1}^{p^*} \beta_i y_{t-i+1} + \sigma \varepsilon_t. \tag{45}
$$

Forecasts from the model described in section 3 are produced in two versions: one using the complete set of variables described in section 4 together with four lags of the relevant inflation measure, and one using only four lags of inflation.
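The point forecasts implied by the benchmark models (42) to (45) can be sketched as below. The AR models are fitted here by ordinary least squares as direct h-step regressions, and the BIC is computed as n·log(RSS/n) + (p+1)·log(n) on each model's own estimation sample; both are assumptions of this sketch rather than details taken from the text, and the function names are illustrative.

```python
import numpy as np

def rw_forecast(y):
    """Random walk: h-period-ahead inflation equals the latest observation."""
    return float(y[-1])

def aorw_forecast(y):
    """Atkeson-Ohanian random walk: average of the last four quarters."""
    return float(np.mean(y[-4:]))

def fit_direct_ar(y, p, h):
    """OLS fit of y[t+h] on a constant and y[t], ..., y[t-p+1]."""
    y = np.asarray(y, dtype=float)
    rows = range(p - 1, len(y) - h)
    X = np.array([[1.0] + [y[t - i] for i in range(p)] for t in rows])
    target = np.array([y[t + h] for t in rows])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef, target - X @ coef

def ar_forecast(y, p, h):
    """Direct h-step forecast from the fitted AR(p) regression."""
    y = np.asarray(y, dtype=float)
    coef, _ = fit_direct_ar(y, p, h)
    latest = np.concatenate(([1.0], y[::-1][:p]))  # 1, y[T], ..., y[T-p+1]
    return float(latest @ coef)

def select_p_by_bic(y, h, max_p=4):
    """Pick the lag order in 1..max_p with the smallest BIC."""
    best_p, best_bic = None, np.inf
    for p in range(1, max_p + 1):
        _, resid = fit_direct_ar(y, p, h)
        n = resid.size
        bic = n * np.log(np.mean(resid ** 2)) + (p + 1) * np.log(n)
        if bic < best_bic:
            best_p, best_bic = p, bic
    return best_p

rng = np.random.default_rng(3)
series = np.zeros(120)
for t in range(1, 120):
    series[t] = 0.5 + 0.6 * series[t - 1] + rng.normal(0.0, 0.3)
p_star = select_p_by_bic(series, h=1)
print(rw_forecast(series), aorw_forecast(series), ar_forecast(series, p_star, 1))
```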

7 Forecasting results

7.1 Forecast evaluation

In this section, the out-of-sample forecasting results for the 1986Q1–2011Q2 sample are presented. The forecasts are compared against forecasts made by the competing models from section 6.2 by applying the forecast evaluation methodology described in section 6.1. The forecasting metrics for the BMA-TVS model are shown in Table 5 as a ratio relative to the competing models, where a ratio smaller than 1 means the BMA-TVS forecasts are superior to the competing model’s forecasts.


Table 5. Ratios of RMSE, MAE and CRPS for the BMA-TVS model for GDP and PCE deflator inflation for h=1 and h=5 relative to competing models

                h=1                                                   h=5
                RMSE              MAE               CRPS              RMSE              MAE               CRPS
PCE deflator
RW              0.85* (−1.80)     0.87** (−1.87)    0.87*** (−2.91)   0.83** (−2.10)    0.86** (−2.01)    0.89* (−1.43)
AORW            0.95 (−1.05)      0.97 (−0.66)      0.92** (−2.21)    1.01 (0.13)       0.99 (−0.44)      0.94** (−2.10)
AR1             0.90* (−1.30)     0.91* (−1.34)     0.92* (−1.37)     0.85*** (−2.71)   0.83*** (−2.54)   0.83*** (−2.81)
AR2             0.93 (−1.16)      0.96 (−0.72)      0.95 (−1.12)      0.87*** (−2.75)   0.83*** (−2.80)   0.85*** (−2.90)
AR3             0.97 (−0.60)      1.01 (0.16)       0.91* (−1.41)     0.87*** (−2.77)   0.83*** (−2.85)   0.83*** (−2.82)
AR4             0.95 (−0.97)      0.99 (−0.14)      0.91* (−1.41)     0.84*** (−3.50)   0.79*** (−3.62)   0.83*** (−2.81)
AR BIC          0.93 (−1.22)      0.96 (−1.00)      0.95 (−1.22)      0.87*** (−2.69)   0.84*** (−2.39)   0.85** (−2.05)
GDP deflator
RW              0.82*** (−3.22)   0.85*** (−2.37)   0.95 (−0.98)      0.93 (−1.17)      0.96 (−0.60)      0.94* (−1.29)
AORW            1.04 (0.84)       1.05 (1.21)       0.93** (−1.78)    1.07 (1.57)       1.07 (1.67)       0.94* (−1.61)
AR1             0.84*** (−3.15)   0.86** (−2.68)    0.86*** (−3.26)   0.76*** (−3.91)   0.74*** (−3.70)   0.76*** (−4.23)
AR2             0.90** (−2.30)    0.91** (−2.15)    0.92** (−2.22)    0.78*** (−3.95)   0.75*** (−3.87)   0.78*** (−4.34)
AR3             0.94* (−1.42)     0.96 (−1.13)      0.87*** (−2.97)   0.79*** (−3.78)   0.76*** (−3.73)   0.76*** (−4.20)
AR4             0.95 (−1.24)      0.96 (−1.10)      0.87*** (−2.97)   0.78*** (−3.85)   0.76*** (−3.79)   0.76*** (−4.20)
AR BIC          0.94** (−1.65)    0.94* (−1.43)     0.96* (−1.38)     0.76*** (−3.91)   0.74*** (−3.70)   0.76*** (−4.23)

Notes: Bold numbers indicate a superior forecast by the BMA-TVS model. The numbers in parentheses are the t_HLN-statistics, where negative values indicate superior forecasts by the BMA-TVS model. AR, autoregressive model; AORW, ATKESON and OHANIAN (2001) random walk; BIC, Bayesian information criterion; BMA-TVS, Bayesian model averaging–time-varying variable selection; CRPS, average continuous ranked probability score; GDP, gross domestic product; MAE, mean absolute error; PCE, personal consumption expenditure; RW, random walk.
*Statistically significant better forecasts at the 10% significance level.
**Statistically significant better forecasts at the 5% significance level.
***Statistically significant better forecasts at the 1% significance level.


Regarding point forecasts, it seems that the BMA-TVS is not able to outperform the AR models for the current quarter horizon in most cases. The same holds for the AORW model. For the one-year-ahead horizon, however, the BMA-TVS model significantly outperforms the AR models in terms of point forecasts, although the AORW is still too difficult to beat.

The density forecast evaluation, on the other hand, shows that the BMA-TVS is able to beat the competing models when it comes to density forecasting for the current quarter. However, the results are still a bit poor for the PCE deflator inflation. For h=5, the BMA-TVS density forecasts seem to significantly outperform the density forecasts of all competing models, including the AORW model. The results show that the BMA-TVS model can be especially useful for density forecasting inflation over the one-year-ahead horizon.

Table 6 reports the results for the BMA-TVS-AR model. Surprisingly, the AR version of the BMA-TVS seems to make more accurate point forecasts than the BMA-TVS specification. For all forecasts except PCE deflator inflation at h=5, the RMSE and MAE ratios of the BMA-TVS-AR are consistently smaller than the BMA-TVS ratios. As a result, BMA-TVS-AR forecasts significantly outperform most AR forecasts, and even the AORW model for the PCE inflation at h=1. Moreover, the AR specification also produces better density forecasts than the BMA-TVS model, again excluding PCE inflation for h=5.

Overall, the BMA-TVS specifications can be a valuable tool for real-time out-of-sample inflation forecasting, especially when it comes to density forecasting. Even though the AORW model forecasts better in terms of point forecasts, it is not significantly better in most cases, whereas the BMA-TVS produces significantly better density forecasts.

7.2 Sensitivity analysis

The results obtained from the BMA-TVS specifications are obtained conditional on the prior specifications set in section 2. In order to assess the sensitivity of the results to the prior settings, I obtain forecasts from the BMA-TVS model under different prior settings and compare the predictive accuracy with that of the basic model. Next to that, I also run the MCMC sampler for the BMA-TVS model with the basic prior settings using a set of predictors that includes an extra one-quarter lag for all predictors (except for lagged inflation). This allows us to assess the performance of the model in case of a larger pool of predictors.

For the first alternative, I set all a1i = b1i = 30 and a2i = b2i = 90/17, so all probabilities in P and Q have a prior mean of 0.85. The second alternative has a1i = b1i = 30 and a2i = b2i = 10/33, so all probabilities in P and Q have a prior mean of 0.99. Furthermore, I obtain draws from versions of the model that have ν1 = 2 and η1 = 20, so that E(σ1) = 1/9, and from a version with λ = 0.5. The forecasting metrics of the alternative specifications are shown in Table 7 relative to the forecasts of the BMA-TVS model with prior specifications as in section 2, where a value smaller than 1 means more accurate forecasts for the alternative model.

Table 7 shows that forecasting accuracy does not change significantly for the second alternative set of priors. For the first alternative, however, the performance breaks down for the GDP deflator inflation forecasts for h=5. Unreported results indicate that lowering the prior mean of the transition probabilities in P and Q results in less-stable relations


Table 6. Ratios of RMSE, MAE and CRPS for the BMA-TVS-AR model for GDP and PCE deflator inflation for h=1 and h=5 relative to competing models

                h=1                                                   h=5
                RMSE              MAE               CRPS              RMSE              MAE               CRPS
PCE deflator
RW              0.85*** (−2.44)   0.85*** (−2.50)   0.86*** (−2.86)   0.84** (−1.90)    0.88** (−1.67)    0.91** (−1.76)
AORW            0.95** (−1.68)    0.95 (−1.14)      0.91** (−2.30)    1.02 (0.70)       1.00 (0.18)       0.96 (−0.92)
AR1             0.90** (−2.00)    0.90** (−2.01)    0.91** (−1.82)    0.86** (−2.16)    0.84** (−1.99)    0.86** (−2.18)
AR2             0.92** (−1.98)    0.94* (−1.31)     0.94* (−1.58)     0.88** (−2.23)    0.85** (−2.16)    0.87** (−2.18)
AR3             0.96 (−1.19)      0.99 (−0.38)      0.91** (−1.84)    0.88** (−2.25)    0.84** (−2.20)    0.86** (−2.17)
AR4             0.95* (−1.43)     0.98 (−0.64)      0.91** (−1.87)    0.85*** (−2.92)   0.80*** (−2.93)   0.86** (−2.18)
AR BIC          0.93** (−2.10)    0.94* (−1.60)     0.94* (−1.59)     0.88** (−2.09)    0.86** (−1.85)    0.87** (−2.05)
GDP deflator
RW              0.81*** (−3.08)   0.85** (−2.24)    0.94 (−1.16)      0.88** (−1.94)    0.93 (−1.05)      0.91** (−2.29)
AORW            1.03 (1.20)       1.05 (1.45)       0.92** (−2.03)    1.02 (1.06)       1.04 (0.73)       0.90*** (−2.95)
AR1             0.83*** (−3.16)   0.86*** (−2.47)   0.86*** (−3.12)   0.72*** (−4.87)   0.72*** (−4.25)   0.73*** (−5.30)
AR2             0.89*** (−2.97)   0.91** (−2.09)    0.92*** (−2.59)   0.74*** (−5.21)   0.73*** (−4.61)   0.75*** (−5.69)
AR3             0.94** (−1.90)    0.96 (−1.08)      0.86*** (−2.89)   0.75*** (−5.04)   0.74*** (−4.49)   0.73*** (−5.27)
AR4             0.95** (−1.80)    0.96 (−1.07)      0.86*** (−2.89)   0.75*** (−5.07)   0.74*** (−4.55)   0.73*** (−5.28)
AR BIC          0.93** (−2.07)    0.94* (−1.34)     0.95* (−1.54)     0.72*** (−4.87)   0.72*** (−4.25)   0.73*** (−5.30)

Notes: Bold numbers indicate a superior forecast by the BMA-TVS-AR model. The numbers in parentheses are the t_HLN-statistics, where negative values indicate superior forecasts by the BMA-TVS-AR model. AR, autoregressive model; AORW, ATKESON and OHANIAN (2001) random walk; BIC, Bayesian information criterion; BMA-TVS-AR, Bayesian model averaging–time-varying variable selection–autoregressive model; CRPS, average continuous ranked probability score; GDP, gross domestic product; MAE, mean absolute error; PCE, personal consumption expenditure; RW, random walk.
*Statistically significant better forecasts at the 10% significance level.
**Statistically significant better forecasts at the 5% significance level.
***Statistically significant better forecasts at the 1% significance level.


Table 7. Ratios of RMSE, MAE and CRPS for alternative prior specifications and additional explanatory variables for the BMA-TVS model for GDP and PCE deflator inflation for h=1 and h=5 relative to the BMA-TVS model with base prior settings

                        h=1                                             h=5
                        RMSE            MAE             CRPS            RMSE              MAE               CRPS
PCE deflator
E(p_i)=E(q_i)=0.85      0.99 (0.70)     0.98 (1.40)     0.99 (0.80)     1.02 (−1.23)      1.01 (−1.08)      1.01 (−1.12)
E(p_i)=E(q_i)=0.99      1.00 (−0.39)    1.03 (−1.62)    1.01 (−0.70)    0.99 (0.61)       1.00 (−1.16)      1.00 (0.36)
λ=0.5                   1.00 (0.00)     0.99 (0.40)     0.99 (0.16)     1.00 (−0.16)      1.00 (0.24)       1.01 (−0.30)
E(σ)=1/9                1.01 (−0.24)    1.00 (0.25)     1.00 (−0.14)    1.01 (−0.17)      1.00 (−0.22)      1.01 (−0.26)
Additional lag          1.00 (−0.09)    1.00 (−0.18)    1.00 (−0.33)    0.99 (0.99)       0.98 (1.31)       0.99 (1.22)
GDP deflator
E(p_i)=E(q_i)=0.85      1.01 (−1.10)    1.01 (−0.67)    1.01 (−1.23)    1.05*** (−2.82)   1.05*** (−2.81)   1.04*** (−2.99)
E(p_i)=E(q_i)=0.99      0.99 (0.79)     1.00 (0.36)     1.00 (1.14)     0.98 (1.34)       0.99 (1.19)       0.99 (0.80)
λ=0.5                   1.00 (−0.01)    1.00 (−0.10)    0.99 (0.34)     0.94 (1.46)       0.95 (1.36)       0.95* (1.70)
E(σ)=1/9                1.00 (−0.04)    1.00 (−0.10)    0.99 (0.41)     0.94 (1.37)       0.96 (1.31)       0.95 (1.61)
Additional lag          0.97 (0.76)     0.95 (1.37)     0.97 (1.07)     0.99 (0.96)       0.99 (1.22)       0.99 (0.78)

Notes: The numbers in parentheses are the t_HLN-statistics, where negative values indicate superior forecasts by the base model. BMA-TVS, Bayesian model averaging–time-varying variable selection; CRPS, average continuous ranked probability score; GDP, gross domestic product; MAE, mean absolute error; PCE, personal consumption expenditure.
*Statistically significant better forecasts at the 10% significance level.
**Statistically significant better forecasts at the 5% significance level.
***Statistically significant better forecasts at the 1% significance level.


between inflation and its predictors, which can cause more uncertainty for out-of-sample forecasting. The alternatives with E(σ1)=1/9 and λ=0.5 do not perform significantly better than the base case for the PCE deflator inflation and for the GDP deflator inflation at h=1. However, both alternatives can improve the GDP deflator inflation for h=5 somewhat relative to the basic prior settings. Furthermore, we see that adding additional explanatory variables to the model never makes its predictive accuracy significantly worse. Instead, it seems that adding additional variables to an already substantial number of predictors can even slightly improve the forecasting performance. The conclusion that we can draw from Table 7 is that, although the results are generally not sensitive to prior choices, the priors do have to be chosen carefully to ensure good out-of-sample forecasts.
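The prior means quoted for the transition probabilities in this sensitivity analysis follow from the mean of a Beta(a, b) distribution, a/(a+b). The sketch below checks the arithmetic under the assumption that 30 is the first shape parameter and the fractional value is the second; this pairing is my reading of the hyperparameter values and is not spelled out in the text. The base-case value is the one implied by the entries 30 and 30/19 of Table 2 under the same reading.

```python
from fractions import Fraction

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

base = beta_mean(Fraction(30), Fraction(30, 19))   # base prior settings
alt1 = beta_mean(Fraction(30), Fraction(90, 17))   # first alternative
alt2 = beta_mean(Fraction(30), Fraction(10, 33))   # second alternative
print(float(base), float(alt1), float(alt2))
```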

8 Conclusion

This paper presented a BMA framework that allows for time variation in the set of variables that is included in the model, next to structural breaks in the intercept and conditional variance. This framework is subsequently used for real-time forecasting of PCE and GDP deflator inflation for the current quarter and one-year-ahead horizon, using both a large pool of predictors and a version of the model with only lags of the relevant inflation measure included. A full-sample estimation of the model shows its flexibility in capturing both slow and abrupt changes in the relation between inflation measures and a set of predictors. A point forecast evaluation shows that versions of the model perform better than simple AR models. The AORW model – which is computationally much less intensive – is too difficult to beat in most cases. However, the AORW model does not perform significantly better than the BMA-TVS versions. Moreover, the framework presented here can be a valuable tool for density forecasting of inflation, as the results show that especially the AR variant of the model significantly outperforms the competing models for both inflation measures and both forecasting horizons, including the AORW model. Another attractive property is the fact that the model’s forecasting accuracy is not affected in a negative way when a large number of additional predictors is added. This shows that the modeling framework can effectively filter out valuable information from a very large set of potential predictors.

There is a range of possibilities to extend the BMA-TVS model presented in this paper. Among the possibilities is the implementation of time-varying regression parameters, as the assumption that the parameters are constant over a sample that contains around 200 quarters can possibly be too restrictive. This could be incorporated by allowing the regression parameters to evolve according to a similar specification as Equation (8) or a Markov switching specification with a certain number of regimes for the parameters (for example, a set of parameters for recessions and another set for expansions). Furthermore, the single break in conditional variance could be restrictive, as there is evidence from literature (e.g., GROEN et al., 2013) that the conditional variance decreased after the Great Moderation but is increasing again as a result of the recent crisis. This could be incorporated by including multiple possible discrete breaks or a Markov switching model with high-variance and low-variance regimes.


NOTES

1. Technically, this specification violates the rules of conditional probability as it makes the prior depend on y if lags of inflation are used as predictors. However, a specification like this can be justified by viewing it as an ‘empirical Bayes’ approach where data are used to specify hyperparameters, which is commonly used in literature.

2. The data set can be found at the website of the Journal of Business & EconomicStatistics. For the original data sources, I refer to GROEN et al. (2013).

Appendix A. MCMC sampling scheme

Appendix A.1 Sampling the regression coefficients β

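We can rewrite the prior on β as

$$
p(\beta \mid \sigma_1^2, \sigma_2^2, \tau) \propto |L'L|^{1/2} \exp\left(-(LB_0 - L\beta)'(LB_0 - L\beta)/2\right), \tag{A1}
$$

where $L'L = (1/g)\left(\sigma_1^{-2} X_1'X_1 + \sigma_2^{-2} X_2'X_2\right)$. Furthermore, write

$$
X^* = \begin{pmatrix} \gamma_1 s_{11} x_{11} & \cdots & \gamma_k s_{1k} x_{1k} \\ \vdots & \ddots & \vdots \\ \gamma_1 s_{T-h,1} x_{T-h,1} & \cdots & \gamma_k s_{T-h,k} x_{T-h,k} \end{pmatrix}
$$

and

$$
y^* = \begin{pmatrix} y_1 - \alpha_1 \\ \vdots \\ y_{T-h} - \alpha_{T-h} \end{pmatrix}.
$$

Let $X_1^*$ be the first $\tau$ rows of $X^*$ and $X_2^*$ the remaining $T - h - \tau$ rows. Similarly, $y_1^*$ and $y_2^*$ are the first $\tau$ and last $T - h - \tau$ elements of $y^*$. The part of the likelihood function containing β can be written as

$$
p(y \mid \theta, \Gamma, S, A, K) \propto \exp\left(-(y_1^* - X_1^*\beta)'(y_1^* - X_1^*\beta)/2\sigma_1^2\right) \times \exp\left(-(y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)/2\sigma_2^2\right). \tag{A2}
$$

Combining Equations (A1) and (A2) yields the full conditional distribution of β

$$
p(\beta \mid \sigma_1^2, \sigma_2^2, \Gamma, \tau, S, y) \propto \exp\left(-(z - Q\beta)'(z - Q\beta)/2\right), \tag{A3}
$$

with

$$
z = \begin{pmatrix} LB_0 \\ y_1^*/\sigma_1 \\ y_2^*/\sigma_2 \end{pmatrix}
\quad \text{and} \quad
Q = \begin{pmatrix} L \\ X_1^*/\sigma_1 \\ X_2^*/\sigma_2 \end{pmatrix}.
$$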


Hence, β can be sampled from a normal distribution with mean $(Q'Q)^{-1}Q'z$ and covariance matrix $(Q'Q)^{-1}$.
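A draw from this normal distribution can be sketched as follows, assuming the stacked matrix Q and vector z have already been built. The Cholesky-based construction is one standard way to obtain the required covariance and is not taken from the text; the function name is illustrative.

```python
import numpy as np

def draw_beta(Q, z, rng):
    """Draw beta ~ N((Q'Q)^{-1} Q'z, (Q'Q)^{-1})."""
    QtQ = Q.T @ Q
    mean = np.linalg.solve(QtQ, Q.T @ z)
    chol = np.linalg.cholesky(QtQ)  # QtQ = C C'
    eps = rng.standard_normal(Q.shape[1])
    # Solving C' u = eps gives u with covariance (C C')^{-1} = (Q'Q)^{-1}.
    return mean + np.linalg.solve(chol.T, eps)

rng = np.random.default_rng(7)
Q = np.vstack([np.eye(2), np.eye(2)])   # toy example: Q'Q = 2 I
z = np.array([1.0, 2.0, 3.0, 4.0])      # so the mean is (2, 3)
beta_draws = np.array([draw_beta(Q, z, rng) for _ in range(4000)])
print(beta_draws.mean(axis=0))
```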

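Appendix A.2 Sampling the transition probabilities in P and Q

The full conditional distributions of $p_i$ and $q_i$ for i=1, …, k are given by

$$
\begin{aligned}
p(p_i \mid S, y) &\propto p_i^{a_{i1}-1} (1 - p_i)^{b_{i1}-1} \prod_{t=5}^{T-h} \left(p_i^{s_{it}} (1 - p_i)^{1 - s_{it}}\right)^{I_1} \left((1 - q_i)^{s_{it}} q_i^{1 - s_{it}}\right)^{I_0} \delta(s_{i,t-1})^{1 - I_1 - I_0} \\
&\propto p_i^{a_{i1} + N_{i11} - 1} (1 - p_i)^{b_{i1} + N_{i10} - 1}
\end{aligned} \tag{A4}
$$

and

$$
p(q_i \mid S, y) \propto q_i^{a_{i0} + N_{i00} - 1} (1 - q_i)^{b_{i0} + N_{i01} - 1}, \tag{A5}
$$

where $I_j = I[s_{i,t-1} = s_{i,t-2} = s_{i,t-3} = s_{i,t-4} = j]$ for j=0, 1 and $N_{ijk}$ denotes the number of transitions from j to k for predictor i. Hence, $p_i$ and $q_i$ can be sampled from Beta($a_{i1} + N_{i11}$, $b_{i1} + N_{i10}$) and Beta($a_{i0} + N_{i00}$, $b_{i0} + N_{i01}$) distributions, respectively.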
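The Beta updates for the transition probabilities can be sketched as below. The counting rule, in which a transition is only counted when the previous four states are all equal, is my reading of the indicators in (A4); the function names and the example state path are hypothetical.

```python
import numpy as np

def count_transitions(s):
    """counts[j, k]: moves from j to k at times where the last four states all equal j."""
    s = np.asarray(s, dtype=int)
    counts = np.zeros((2, 2), dtype=int)
    for t in range(4, len(s)):
        window = s[t - 4:t]
        if np.all(window == window[0]):
            counts[window[0], s[t]] += 1
    return counts

def draw_p_q(s, a1, b1, a0, b0, rng):
    """Draw p (stay in state 1) and q (stay in state 0) from their Beta conditionals."""
    n = count_transitions(s)
    p = rng.beta(a1 + n[1, 1], b1 + n[1, 0])
    q = rng.beta(a0 + n[0, 0], b0 + n[0, 1])
    return p, q

rng = np.random.default_rng(11)
states = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
transition_counts = count_transitions(states)
p_draw, q_draw = draw_p_q(states, 30.0, 30.0 / 19.0, 30.0, 30.0 / 19.0, rng)
print(transition_counts, p_draw, q_draw)
```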

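Appendix A.3 Sampling the break probability π

The full conditional distribution of π is equal to

$$
p(\pi \mid K, y) \propto \pi^{\upsilon_\pi + \sum_{t=1}^{T-h} K_t - 1} (1 - \pi)^{\omega_\pi + T - h - \sum_{t=1}^{T-h} K_t - 1}, \tag{A6}
$$

so π can be sampled from a Beta$\left(\upsilon_\pi + \sum_{t=1}^{T-h} K_t,\; \omega_\pi + T - h - \sum_{t=1}^{T-h} K_t\right)$ distribution.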

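Appendix A.4 Sampling the inclusion probability λ

The full conditional distribution of λ is equal to

$$
p(\lambda \mid \gamma, y) \propto \lambda^{\upsilon_\lambda + \sum_{i=1}^{k} \gamma_i - 1} (1 - \lambda)^{\omega_\lambda + k - \sum_{i=1}^{k} \gamma_i - 1}, \tag{A7}
$$

so λ can be sampled from a Beta$\left(\upsilon_\lambda + \sum_{i=1}^{k} \gamma_i,\; \omega_\lambda + k - \sum_{i=1}^{k} \gamma_i\right)$ distribution.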

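Appendix A.5 Sampling the variance break probability κ

The full conditional distribution of κ is equal to

$$
p(\kappa \mid d, y) \propto \kappa^{\upsilon_\kappa + d - 1} (1 - \kappa)^{\omega_\kappa + 1 - d - 1}, \tag{A8}
$$

so κ can be sampled from a Beta($\upsilon_\kappa + d$, $\omega_\kappa + 1 - d$) distribution.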

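Appendix 6. Sampling the variance parameter $\sigma_1^2$

Combining the prior for $\sigma_1^2$

$$
p(\sigma_1^2) \propto \sigma_1^{-(\nu_1+2)} \exp\!\left(-\frac{\eta_1}{2\sigma_1^2}\right), \tag{A9}
$$

with the likelihood function and dropping terms that do not depend on $\sigma_1$ yields the kernel

$$
\begin{aligned}
p(\sigma_1^2 \mid y, \theta \backslash \sigma_1^2, \Gamma, S, A, K)
&\propto \sigma_1^{-\tau}\,\sigma_1^{-(T-\tau)} \exp\!\left(-\frac{(y_1^* - X_1^*\beta)'(y_1^* - X_1^*\beta)}{2\sigma_1^2}\right) \exp\!\left(-\frac{(y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)}{2\sigma_1^2\delta^{2d}}\right) \\
&\quad\times \left|\tfrac{1}{g}\left(\sigma_1^{-2} X_1'X_1 + \sigma_1^{-2}\delta^{-2d} X_2'X_2\right)\right|^{1/2} \exp\!\left(-(B_0 - \beta)'\,\frac{1}{2g\sigma_1^2}\left(X_1'X_1 + \delta^{-2d}X_2'X_2\right)(B_0 - \beta)\right) \\
&\quad\times \sigma_1^{-(\nu_1+2)} \exp\!\left(-\frac{\eta_1}{2\sigma_1^2}\right) \\
&\propto \sigma_1^{-(T+\nu_1+k+2)} \\
&\quad\times \exp\!\left(-\frac{(y_1^* - X_1^*\beta)'(y_1^* - X_1^*\beta) + (y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)/\delta^{2d} + (B_0 - \beta)'\tfrac{1}{g}\left(X_1'X_1 + \delta^{-2d}X_2'X_2\right)(B_0 - \beta) + \eta_1}{2\sigma_1^2}\right),
\end{aligned} \tag{A10}
$$

which is the kernel of an inverted gamma-2 distribution with $T + \nu_1 + k$ degrees of freedom and scale parameter $(y_1^* - X_1^*\beta)'(y_1^* - X_1^*\beta) + (y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)/\delta^{2d} + (B_0 - \beta)'(1/g)\left(X_1'X_1 + \delta^{-2d}X_2'X_2\right)(B_0 - \beta) + \eta_1$.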
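A draw from an inverted gamma-2 distribution with scale $s$ and $\nu$ degrees of freedom can be obtained as $s/\chi^2_\nu$. The Python sketch below applies this to the scale parameter stated above; all function and argument names are assumptions introduced for illustration (`res1` and `res2` are the residual vectors of the two regimes and `b_dev` is $B_0 - \beta$).

```python
import numpy as np

def draw_inverted_gamma2(scale, df, rng):
    """Draw x with kernel x^{-(df+2)/2} exp(-scale / (2x)), i.e. scale / chi2(df)."""
    return scale / rng.chisquare(df)

def sigma1_sq_scale(res1, res2, delta2, d, b_dev, X1, X2, g, eta1):
    """Scale parameter of the full conditional of sigma_1^2."""
    prior_precision = (X1.T @ X1 + X2.T @ X2 / delta2 ** d) / g
    return (res1 @ res1
            + (res2 @ res2) / delta2 ** d
            + b_dev @ prior_precision @ b_dev
            + eta1)
```

With `df` equal to $T + \nu_1 + k$, `draw_inverted_gamma2(sigma1_sq_scale(...), df, rng)` returns one draw of $\sigma_1^2$.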

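Appendix 7. Sampling the variance parameter $\delta^2$

The full conditional distribution of $\delta^2$ can be written as

$$
\begin{aligned}
p(\delta^2 \mid y, \theta \backslash \delta^2, \Gamma, S, A, K)
&\propto \delta^{-d(T-\tau)} \exp\!\left(-\frac{(y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)}{2\sigma_1^2\delta^{2d}}\right) \\
&\quad\times \left|\tfrac{1}{g}\left(\sigma_1^{-2} X_1'X_1 + \sigma_1^{-2}\delta^{-2d} X_2'X_2\right)\right|^{1/2} \exp\!\left(-(B_0 - \beta)'\,\frac{1}{2g\sigma_1^2\delta^{2d}}\,X_2'X_2\,(B_0 - \beta)\right) \\
&\quad\times \delta^{-(\nu_2+2)} \exp\!\left(-\frac{\eta_2}{2\delta^2}\right) \\
&\propto \left|\tfrac{1}{g}\left(X_1'X_1 + \delta^{-2d} X_2'X_2\right)\right|^{1/2} \delta^{-(d(T-\tau)+\nu_2+2)} \\
&\quad\times \exp\!\left(-\frac{(y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)/\sigma_1^2 + (B_0 - \beta)'X_2'X_2(B_0 - \beta)/g\sigma_1^2}{2\delta^{2d}} - \frac{\eta_2}{2\delta^2}\right).
\end{aligned} \tag{A11}
$$

If $d = 0$, $\delta^2$ can be sampled from the prior

$$
p(\delta^2 \mid y, \theta \backslash \delta^2, \Gamma, S, A, K) \propto \delta^{-(\nu_2+2)} \exp\!\left(-\frac{\eta_2}{2\delta^2}\right); \tag{A12}
$$

if $d = 1$,

$$
\begin{aligned}
p(\delta^2 \mid y, \theta \backslash \delta^2, \Gamma, S, A, K)
&\propto \left|\tfrac{1}{g}\left(X_1'X_1 + \delta^{-2} X_2'X_2\right)\right|^{1/2} \delta^{-(T-\tau+\nu_2+2)} \\
&\quad\times \exp\!\left(-\frac{(y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)/\sigma_1^2 + (B_0 - \beta)'X_2'X_2(B_0 - \beta)/g\sigma_1^2 + \eta_2}{2\delta^2}\right),
\end{aligned} \tag{A13}
$$

which does not correspond to a known distribution, so we have to resort to using a Metropolis–Hastings sampler. An obvious choice for a proposal distribution is an inverted gamma-2 distribution with $T - \tau + \nu_2$ degrees of freedom and parameter $(y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)/\sigma_1^2 + (B_0 - \beta)'X_2'X_2(B_0 - \beta)/g\sigma_1^2 + \eta_2$. Given the current draw $\delta^{2,(m)}$, a generated value $\delta^{2*}$ is accepted with probability

$$
\phi = \min\!\left(\frac{\left|X_1'X_1 + \delta^{-2,*} X_2'X_2\right|^{1/2}}{\left|X_1'X_1 + \delta^{-2,(m)} X_2'X_2\right|^{1/2}},\; 1\right). \tag{A14}
$$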
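The Metropolis–Hastings step for the case $d = 1$ can be sketched as follows. Because the inverted gamma-2 proposal matches every factor of the target kernel except the determinant term, the acceptance probability reduces to a ratio of square-rooted determinants, which is evaluated here in logs for numerical stability. Function names are hypothetical, and `scale` and `df` are assumed to hold the proposal parameter and degrees of freedom given in the text.

```python
import numpy as np

def half_logdet(delta2, X1, X2):
    """0.5 * log|X1'X1 + X2'X2 / delta2| (the factor 1/g cancels in the ratio)."""
    return 0.5 * np.linalg.slogdet(X1.T @ X1 + X2.T @ X2 / delta2)[1]

def mh_step_delta2(delta2_cur, X1, X2, scale, df, rng):
    """One independence Metropolis-Hastings step; returns (value, accepted)."""
    proposal = scale / rng.chisquare(df)  # inverted gamma-2 proposal
    log_ratio = half_logdet(proposal, X1, X2) - half_logdet(delta2_cur, X1, X2)
    accepted = np.log(rng.random()) < min(0.0, log_ratio)
    return (proposal, True) if accepted else (delta2_cur, False)
```

If the proposal is rejected, the chain keeps the current value $\delta^{2,(m)}$ for the next iteration.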

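Appendix 8. Sampling the intercept variance $\sigma_\xi^2$

The full conditional density of $\sigma_\xi^2$ is given by

$$
\begin{aligned}
p(\sigma_\xi^2 \mid y, A, K)
&\propto \sigma_\xi^{-(\nu_\xi+2)} \exp\!\left(-\frac{\eta_\xi}{2\sigma_\xi^2}\right) \prod_{t=1}^{T-h} \left(\sigma_\xi^{-1} \exp\!\left(-\frac{(\alpha_t - \alpha_{t-1})^2}{2\sigma_\xi^2}\right)\right)^{K_t} \delta(\alpha_{t-1})^{1-K_t} \\
&\propto \sigma_\xi^{-\left(\nu_\xi + \sum_{t=1}^{T-h} K_t + 2\right)} \exp\!\left(-\frac{\eta_\xi + \sum_{t=1}^{T-h} K_t(\alpha_t - \alpha_{t-1})^2}{2\sigma_\xi^2}\right),
\end{aligned} \tag{A15}
$$

so $\sigma_\xi^2$ can be sampled from an inverted gamma-2 distribution with $\nu_\xi + \sum_{t=1}^{T-h} K_t$ degrees of freedom and scale parameter $\eta_\xi + \sum_{t=1}^{T-h} K_t(\alpha_t - \alpha_{t-1})^2$.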

Appendix 9. Sampling the break indicator d

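The full conditional distribution of $d$ is proportional to

$$
p(d \mid y, \theta) \propto p^d (1-p)^{1-d}\, \delta^{-(T-h-\tau)d} \left|X_1'X_1 + \delta^{-2d} X_2'X_2\right|^{1/2} \exp\!\left(-\frac{(B_0 - \beta)'\tfrac{1}{g}(B_0 - \beta) + (y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)}{2\sigma_1^2\delta^{2d}}\right). \tag{A16}
$$

By normalization, we obtain

$$
\Pr(d = 1 \mid y, \theta) = \frac{a}{a + b}, \tag{A17}
$$

where

$$
a = p\,\delta^{-(T-h-\tau)} \left|X_1'X_1 + \delta^{-2} X_2'X_2\right|^{1/2} \exp\!\left(-\frac{(B_0 - \beta)'\tfrac{1}{g}(B_0 - \beta) + (y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)}{2\sigma_1^2\delta^2}\right)
$$

and

$$
b = (1-p) \left|X'X\right|^{1/2} \exp\!\left(-\frac{(B_0 - \beta)'\tfrac{1}{g}(B_0 - \beta) + (y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)}{2\sigma_1^2}\right).
$$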
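In practice the weights $a$ and $b$ involve exponentials of large negative numbers, so the normalization in (A17) is safer in logs. A small Python sketch, with hypothetical function names and with `log_a` and `log_b` assumed to be the logarithms of the two weights:

```python
import numpy as np

def prob_first_of_two(log_a, log_b):
    """Return a / (a + b) computed from log a and log b without underflow."""
    return float(np.exp(log_a - np.logaddexp(log_a, log_b)))

def draw_indicator(log_a, log_b, rng):
    """Draw a binary indicator that equals 1 with probability a / (a + b)."""
    return int(rng.random() < prob_first_of_two(log_a, log_b))
```

The same two-outcome normalization applies to any binary indicator in the sampler whose full conditional is known up to a constant.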


Appendix 10. Sampling the breakpoint τ

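The full conditional distribution of $\tau$ is given by

$$
\begin{aligned}
p(\tau \mid y, \theta \backslash \tau, \Gamma, S, A, K)
&\propto \sigma_1^{-\tau}\,\sigma_2^{-(T-h-\tau)} \left|\tfrac{1}{g}\left(\sigma_1^{-2} X_1'X_1 + \sigma_2^{-2} X_2'X_2\right)\right|^{1/2} \\
&\quad\times \exp\!\left(-\frac{(LB_0 - L\beta)'(LB_0 - L\beta)}{2}\right) \exp\!\left(-\frac{(y_1^* - X_1^*\beta)'(y_1^* - X_1^*\beta)}{2\sigma_1^2}\right) \\
&\quad\times \exp\!\left(-\frac{(y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)}{2\sigma_2^2}\right).
\end{aligned} \tag{A18}
$$

As $\tau$ can only take a finite number of discrete values, $p(\tau \mid y, \theta \backslash \tau, \Gamma, S, A, K)$ can be obtained by normalization. For d=0, τ.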
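Sampling a discrete parameter by normalization can be sketched as follows: evaluate the log of the kernel at every admissible value, subtract the maximum before exponentiating, and draw from the resulting probabilities. The function name and the `support` argument (the list of candidate breakpoints) are assumptions for this example.

```python
import numpy as np

def draw_from_log_kernel(log_kernel, support, rng):
    """Normalize a log kernel over a finite support and draw one value."""
    log_kernel = np.asarray(log_kernel, dtype=float)
    weights = np.exp(log_kernel - log_kernel.max())  # guard against underflow
    probs = weights / weights.sum()
    index = rng.choice(len(support), p=probs)
    return support[index], probs
```

Here `log_kernel[j]` would be the logarithm of the right-hand side of (A18) evaluated at `support[j]`.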

Appendix 11. Sampling the inclusion parameters in Γ

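The full conditional distribution of $\gamma_i$ is given by

$$
p(\gamma_i \mid y, \theta, \Gamma \backslash \gamma_i, S, A, K) \propto \lambda^{\gamma_i}(1-\lambda)^{1-\gamma_i} \exp\!\left(-\frac{(y_1^* - X_1^*\beta)'(y_1^* - X_1^*\beta)}{2\sigma_1^2}\right) \exp\!\left(-\frac{(y_2^* - X_2^*\beta)'(y_2^* - X_2^*\beta)}{2\sigma_2^2}\right). \tag{A19}
$$

By normalization, we obtain

$$
\Pr(\gamma_i = 1 \mid y, \theta, \Gamma \backslash \gamma_i, S, A, K) = \frac{c}{c + d}, \tag{A20}
$$

with

$$
c = \lambda \exp\!\left(-\frac{(y_1^* - X_1^{*1}\beta)'(y_1^* - X_1^{*1}\beta)}{2\sigma_1^2}\right) \exp\!\left(-\frac{(y_2^* - X_2^{*1}\beta)'(y_2^* - X_2^{*1}\beta)}{2\sigma_2^2}\right)
$$

and

$$
d = (1-\lambda) \exp\!\left(-\frac{(y_1^* - X_1^{*0}\beta)'(y_1^* - X_1^{*0}\beta)}{2\sigma_1^2}\right) \exp\!\left(-\frac{(y_2^* - X_2^{*0}\beta)'(y_2^* - X_2^{*0}\beta)}{2\sigma_2^2}\right),
$$

where $X_1^{*1}$ and $X_2^{*1}$ are $X_1^*$ and $X_2^*$ with $\gamma_i = 1$, and $X_1^{*0}$ and $X_2^{*0}$ are $X_1^*$ and $X_2^*$ with $\gamma_i = 0$. The $\gamma_i$'s are sampled in random order in each iteration.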
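One sweep over the inclusion indicators might look like the Python sketch below. It assumes, for illustration only, that the effective regressor matrix is the elementwise product of `X`, the time-varying indicators `S` and `gamma`, that `y` has already been adjusted for the intercept, and that `sigma` holds the standard deviation applying to each observation ($\sigma_1$ before the break, $\sigma_2$ after). None of these names come from the paper.

```python
import numpy as np

def gibbs_sweep_gamma(gamma, y, X, S, beta, sigma, lam, rng):
    """Update every gamma_i once, visiting the predictors in random order."""
    gamma = np.array(gamma, dtype=int)
    for i in rng.permutation(len(gamma)):
        loglik = np.empty(2)
        for value in (0, 1):
            gamma[i] = value
            resid = y - (X * S * gamma) @ beta
            loglik[value] = -0.5 * np.sum((resid / sigma) ** 2)
        log_c = np.log(lam) + loglik[1]      # weight for gamma_i = 1
        log_d = np.log1p(-lam) + loglik[0]   # weight for gamma_i = 0
        prob_one = np.exp(log_c - np.logaddexp(log_c, log_d))
        gamma[i] = int(rng.random() < prob_one)
    return gamma
```

Working with log weights keeps the ratio $c/(c+d)$ well defined even when both exponentials underflow.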

Appendix 12. Sampling the time-varying inclusion parameters in S

$s_{it}$ is only allowed to switch if variable $i$ has been included or excluded for at least four periods, which means that $s_{it}$ follows a fourth-order Markov process. In order to use the method of CHIB (1996), we can define $s_{it}^* = (s_{it}, s_{i,t-1}, s_{i,t-2}, s_{i,t-3})$. We then have $p(s_{it}^* \mid s_{i,t-1}^*, s_{i,t-2}^*, s_{i,t-3}^*, \ldots) = p(s_{it}^* \mid s_{i,t-1}^*)$, because all relevant information for $s_{it}^*$ contained in $s_{i,t-2}^*, s_{i,t-3}^*, \ldots$ is also contained in $s_{i,t-1}^*$, which means $s_{it}^*$ follows a first-order Markov process with states (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 0) and (1, 0, 0, 0) and transition matrix

$$
P_i = \begin{pmatrix}
q_i & 0 & 0 & 0 & 0 & 0 & 0 & 1-q_i \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1-p_i & p_i & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}. \tag{A21}
$$
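To make the structure of this matrix concrete, the Python sketch below builds it for given $p_i$ and $q_i$, reading row $j$ as the distribution of the next state given current state $j$ (in the order listed above), and computes its invariant distribution numerically. The function names are hypothetical.

```python
import numpy as np

STATES = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1),
          (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 0), (1, 0, 0, 0)]

def transition_matrix(p, q):
    """8 x 8 row-stochastic transition matrix of the expanded state."""
    P = np.zeros((8, 8))
    P[0, 0], P[0, 7] = q, 1.0 - q   # excluded for 4+ periods: stay or switch in
    P[4, 4], P[4, 3] = p, 1.0 - p   # included for 4+ periods: stay or switch out
    for row, col in [(1, 0), (2, 1), (3, 2), (5, 4), (6, 5), (7, 6)]:
        P[row, col] = 1.0           # waiting states move on deterministically
    return P

def invariant_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to one."""
    values, vectors = np.linalg.eig(P.T)
    v = np.real(vectors[:, np.argmin(np.abs(values - 1.0))])
    return v / v.sum()
```

For any $0 < p_i, q_i < 1$ the six waiting states share the same invariant probability, which matches the closed-form expressions used to initialize the recursion later in this appendix.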

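$S_i^* = \{s_{it}^*\}_{t=4}^{T-h}$ can therefore be sampled as a block using CHIB's (1996) method by decomposing its full conditional distribution as

$$
p(S_i^* \mid y, \theta, \Gamma, S_{-i}, A) = p(s_{i,T}^* \mid y, \theta, \Gamma, S_{-i}, A) \times p(s_{i,T-1}^* \mid y, S_i^{*,T-2}, \theta, \Gamma, S_{-i}, A) \times \cdots \times p(s_{i,4-h}^* \mid y, S_i^{*,5}, \theta, \Gamma, S_{-i}, A), \tag{A22}
$$

with $S_i^{*,n} = (s_{i1}^*, s_{i2}^*, \ldots, s_{in}^*)$. $S_i^*$ can be sampled from the probability

$$
p(s_{it}^* \mid y, S_i^{*,t+1}, \theta, \Gamma, S_{-i}, A) = \frac{p(s_{it}^* \mid Y_{t+h}, \theta, \Gamma, S_{-i}, A)\, p(s_{i,t+1}^* \mid s_{it}^*)}{\sum_{s_{it}^*} p(s_{it}^* \mid Y_{t+h}, \theta, \Gamma, S_{-i}, A)\, p(s_{i,t+1}^* \mid s_{it}^*)}, \tag{A23}
$$

where $p(s_{it}^* \mid s_{i,t-1}^*)$ can be found in $P_i$. The $p(s_{it}^* \mid Y_{t+h}, \theta, \Gamma, S_{-i}, A)$ term can be obtained from a recursion, starting with the 'prediction step' for $t = 5$:

$$
p(s_{it}^* \mid Y_{t+h-1}, \theta, \Gamma, S_{-i}, A) = \sum_{s_{i,t-1}^*} p(s_{i,t-1}^* \mid Y_{t+h-1}, \theta, \Gamma, S_{-i}, A)\, p(s_{it}^* \mid s_{i,t-1}^*). \tag{A24}
$$

The next step is the 'updating step'

$$
p(s_{it}^* \mid Y_{t+h}, \theta, \Gamma, S_{-i}, A) = \frac{p(s_{it}^* \mid Y_{t+h-1}, \theta, \Gamma, S_{-i}, A)\, p(y_{t+h} \mid Y_{t+h-1}, s_{it}^*, \theta, \Gamma, S_{-i}, A)}{\sum_{s_{it}^*} p(s_{it}^* \mid Y_{t+h-1}, \theta, \Gamma, S_{-i}, A)\, p(y_{t+h} \mid Y_{t+h-1}, s_{it}^*, \theta, \Gamma, S_{-i}, A)}. \tag{A25}
$$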

We can recursively go through the prediction and updating steps and sample a value for $s_{i,T-h}^*$ from $p(s_{i,T-h}^* \mid y, \theta, \Gamma, S_{-i}, A)$. Given this value, we can sample $s_{i,T-h-1}^*$ from Equation (A23). This backward recursion continues until we have sampled all of $S_i^*$. As input for the first prediction step, we can use the invariant distribution at $t = 4$,

$$
p(s_{i4}^* = (0, 0, 0, 0) \mid \theta) = \frac{1 - p_i}{2 - q_i - p_i + 6(1 - q_i)(1 - p_i)} \tag{A26}
$$

and

$$
p(s_{i4}^* = (1, 1, 1, 1) \mid \theta) = \frac{1 - q_i}{2 - q_i - p_i + 6(1 - q_i)(1 - p_i)} \tag{A27}
$$

and the probability of all other outcomes equal to

$$
\frac{(1 - p_i)(1 - q_i)}{2 - q_i - p_i + 6(1 - q_i)(1 - p_i)}, \tag{A28}
$$

from which we can compute $p(s_{i5}^* \mid Y_{4+h}, \theta, \Gamma, S_{-i}, A)$ by Equation (A24) and $p(s_{i5}^* \mid Y_{5+h}, \theta, \Gamma, S_{-i}, A)$ by Equation (A25) and continue this recursion until $t = T - h$.
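The prediction, updating and backward-sampling steps can be sketched generically in Python for any finite-state Markov chain. In this sketch `loglik[t, j]` is assumed to hold the log predictive density of the observation at step `t` given state `j`, `P` is a row-stochastic transition matrix and `init` is the distribution of the state just before the first step; these names and the array layout are assumptions for the example, not the paper's notation.

```python
import numpy as np

def ffbs_discrete(loglik, P, init, rng):
    """Forward filtering, backward sampling for a discrete-state Markov chain."""
    n, m = loglik.shape
    filtered = np.empty((n, m))
    previous = np.asarray(init, dtype=float)
    for t in range(n):
        predicted = previous @ P                                  # prediction step
        weights = predicted * np.exp(loglik[t] - loglik[t].max())
        filtered[t] = weights / weights.sum()                     # updating step
        previous = filtered[t]
    states = np.empty(n, dtype=int)
    states[-1] = rng.choice(m, p=filtered[-1])
    for t in range(n - 2, -1, -1):
        # Combine the filtered probability with the transition into the
        # state already drawn for step t + 1, then renormalize.
        weights = filtered[t] * P[:, states[t + 1]]
        states[t] = rng.choice(m, p=weights / weights.sum())
    return states
```

Drawing the whole path in one block, rather than one period at a time, typically reduces the serial correlation of the resulting Markov chain Monte Carlo output.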

Appendix 13. Sampling the time-varying intercept αt

To sample the time-varying intercept $A = \{\alpha_t\}_{t=1}^{T-h}$, we can write the model in state-space form

$$
y_{t+h} = g_t + h_t' x_t + \omega_t u_t \tag{A29}
$$

$$
x_t = f_t + F_t x_{t-1} + \Omega_t v_t, \tag{A30}
$$

with $u_t \sim N(0, 1)$ and $v_t \sim N(0, I)$ by taking $g_t = \sum_{i=1}^{k} \gamma_i s_{it} \beta_i$, $h_t = 1$, $\omega_t = \sigma_t$, $x_t = \alpha_t$, $f_t = 0$, $F_t = 1$ and $\Omega_t = K_t \sigma_\xi$. $A$ can be sampled using a forward-filtering–backwards-sampling algorithm (FRÜHWIRTH-SCHNATTER, 2006, p. 419), where the conditional mean and variance of $\alpha_t$ are computed by running the Kalman filter for $t = 1, \ldots, T - h$.
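For this scalar local-level form, forward filtering and backward sampling can be written compactly. The Python sketch below assumes `y_adj` holds $y_{t+h} - g_t$, `sigma` the observation standard deviations $\sigma_t$, and `a0`, `p0` the mean and variance of the initial intercept; these names and the treatment of the initial condition are assumptions for illustration.

```python
import numpy as np

def ffbs_intercept(y_adj, sigma, K, sigma_xi, a0, p0, rng):
    """Draw the path of alpha_t given break indicators K (scalar Kalman FFBS)."""
    n = len(y_adj)
    mean, var = np.empty(n), np.empty(n)
    a, p = a0, p0
    for t in range(n):
        p_pred = p + K[t] * sigma_xi ** 2          # state moves only if K_t = 1
        gain = p_pred / (p_pred + sigma[t] ** 2)
        a = a + gain * (y_adj[t] - a)
        p = (1.0 - gain) * p_pred
        mean[t], var[t] = a, p
    alpha = np.empty(n)
    alpha[-1] = rng.normal(mean[-1], np.sqrt(var[-1]))
    for t in range(n - 2, -1, -1):
        q = K[t + 1] * sigma_xi ** 2
        if q == 0:
            alpha[t] = alpha[t + 1]                # no break: intercept unchanged
        else:
            b = var[t] / (var[t] + q)
            alpha[t] = rng.normal(mean[t] + b * (alpha[t + 1] - mean[t]),
                                  np.sqrt(var[t] * (1.0 - b)))
    return alpha
```

Between breaks the sampled intercept is constant by construction, which reflects the degenerate state equation when $K_t = 0$.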

Appendix 14. Sampling the break indicators K

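The break indicators in $K$ can be drawn using the efficient algorithm of GERLACH, CARTER and KOHN (2000), which draws $K_t$ from

$$
\begin{aligned}
p(K_t \mid \theta, \Gamma, S, A, \{K_s\}_{s \neq t}, y)
&\propto p(y_{t+h+1}, \ldots, y_{T+h} \mid y_{h+1}, \ldots, y_{t+h}, \theta, \Gamma, S, A, \{K_s\}_{s \neq t}) \\
&\quad\times p(y_{t+h} \mid y_{h+1}, \ldots, y_{t+h-1}, \theta, \Gamma, S, A, \{K_s\}_{s \neq t}) \\
&\quad\times p(K_t \mid \{K_s\}_{s \neq t}),
\end{aligned} \tag{A31}
$$

where the integrating constant can be computed by normalization as $K_t$ can take only the values 0 and 1. $p(K_t \mid \{K_s\}_{s \neq t})$ is given by the prior on $K_t$. $p(y_{t+h+1}, \ldots, y_{T+h} \mid y_{h+1}, \ldots, y_{t+h}, \theta, \Gamma, S, A, \{K_s\}_{s \neq t})$ and $p(y_{t+h} \mid y_{h+1}, \ldots, y_{t+h-1}, \theta, \Gamma, S, A, \{K_s\}_{s \neq t})$ can be computed by the algorithm described by GERLACH et al. (2000).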


Appendix B. Convergence

This appendix reports the convergence diagnostics calculated as described in Section 4 for the BMA-TVS and BMA-TVS-AR specifications. The diagnostics were applied to a range of options for the number of draws, the burn-in sample and the thinning factor. From those results, I chose to obtain 90,000 draws for each forecast horizon and inflation measure for the BMA-TVS model. For h=1 for the PCE deflator and h=5 for the GDP deflator inflation, I discarded the first 10,000 draws and used a thinning value of 2, yielding 40,000 posterior draws. For h=1 for the GDP deflator and h=5 for the PCE deflator inflation, I used a burn-in sample of 30,000 draws and a thinning factor of 6, yielding a set of 10,000 posterior draws.

Table B1 shows the summarizing statistics of the inefficiency factors and the 5% rejection rate of the GEWEKE (1992) test, calculated as the percentage of null hypotheses rejected at the 5% significance level, based on the retained draws for the BMA-TVS

Table B1. Summary statistics of inefficiency factors $2 + \sum_{i=1}^{\infty} \rho_i$, based on the Bartlett kernel as described by Newey and West (1986), for retained draws of the parameters of the Bayesian model averaging–time-varying variable selection model and the predictive densities $p(y_{T+h+1} \mid y, x)$, calculated for each quarter of the prediction sample

                        Mean    Median  Minimum  Maximum  5% rejection
Personal consumption expenditure deflator
h=1  β                  5.46    2.05    0.51     101.09   0.04
     A                  38.75   32.66   1.69     191.43   0.06
     σ1                 21.53   12.16   2.62     89.27    0.07
     τ                  2.50    1.15    0.57     20.86    0.09
     P                  3.82    2.24    0.89     62.59    0.06
     Q                  2.76    2.27    0.83     20.90    0.06
     p(yT+h+1|y, x)     3.12    2.14    0.78     30.92    0.05
h=5  β                  2.39    1.30    0.46     67.97    0.07
     A                  7.89    6.20    0.80     81.25    0.06
     σ1                 2.29    2.08    1.04     5.08     0.07
     τ                  1.17    1.00    0.60     6.90     0.05
     P                  1.25    1.13    0.46     4.67     0.06
     Q                  1.24    1.15    0.46     5.99     0.06
     p(yT+h+1|y, x)     1.20    1.11    0.58     2.59     0.04
Gross domestic product deflator
h=1  β                  3.18    1.53    0.51     43.58    0.06
     A                  15.34   12.56   0.85     67.33    0.07
     σ1                 5.48    5.52    0.92     17.84    0.04
     τ                  0.95    0.90    0.56     3.06     0.03
     P                  1.57    1.22    0.49     10.88    0.05
     Q                  1.43    1.21    0.53     12.84    0.06
     p(yT+h+1|y, x)     1.53    1.33    0.57     4.46     0.04
h=5  β                  6.95    2.39    0.51     72.71    0.05
     A                  19.86   14.97   1.41     160.40   0.06
     σ1                 3.79    3.69    1.44     7.19     0.07
     τ                  1.05    1.00    0.52     2.04     0.06
     P                  2.90    2.13    0.75     38.85    0.05
     Q                  2.79    2.22    0.85     26.63    0.07
     p(yT+h+1|y, x)     1.89    1.51    0.69     10.06    0.04

Note: The rightmost column shows the 5% rejection rate of the GEWEKE (1992) test for equality of the mean of the first 20% and the last 40% of the retained draws.


specification. Clearly, the regression parameters in β and A and the variance parameter σ1 are the most troublesome when looking at the inefficiency factors. However, the number of retained draws proved to be more than enough for reasonable inference. The rejection rates for the GEWEKE (1992) test are also satisfactory, as a rejection rate of 5% could be expected even if all hypotheses were true.

For the BMA-TVS-AR model, I started with 50,000 draws for all models, except for the PCE deflator for h=1, which suffered from high serial correlations in the chain. In that case, I used 65,000 draws. For the different combinations of horizon and inflation measure, I chose 10,000 burn-in draws and a thinning factor of 4 (GDP, h=1), 10,000 burn-in draws and a thinning factor of 5 (GDP, h=5), 5000 burn-in draws and a thinning factor of 5 (PCE, h=1) and 10,000 burn-in draws and a thinning factor of 2 (PCE, h=5). This yields 10,000, 8000, 7500 and 20,000 draws, respectively.

Table B2. Summary statistics of inefficiency factors $2 + \sum_{i=1}^{\infty} \rho_i$, based on the Bartlett kernel as described by Newey and West (1986), for retained draws of the parameters of the Bayesian model averaging–time-varying variable selection–autoregressive model and the predictive densities $p(y_{T+h+1} \mid y, x)$, calculated for each quarter of the prediction sample

                        Mean    Median  Minimum  Maximum  5% rejection
Personal consumption expenditure deflator
h=1  β                  3.09    1.51    0.57     15.86    0.09
     A                  6.04    3.89    0.47     60.84    0.09
     σ1                 4.65    3.33    1.26     17.79    0.08
     τ                  1.08    0.97    0.54     3.50     0.04
     P                  1.51    1.22    0.59     5.88     0.06
     Q                  1.27    1.16    0.55     2.98     0.07
     p(yT+h+1|y, x)     1.19    1.09    0.63     3.32     0.07
h=5  β                  5.95    2.12    0.59     260.21   0.08
     A                  9.99    5.40    0.58     175.16   0.08
     σ1                 2.52    2.40    1.16     4.87     0.08
     τ                  1.69    1.13    0.52     20.41    0.04
     P                  2.55    2.23    0.78     16.16    0.07
     Q                  2.96    2.51    1.00     22.76    0.07
     p(yT+h+1|y, x)     1.29    1.18    0.62     3.38     0.07
Gross domestic product deflator
h=1  β                  8.06    5.11    0.72     53.01    0.08
     A                  9.35    6.71    0.55     59.10    0.07
     σ1                 3.28    3.21    1.43     5.81     0.04
     τ                  1.05    0.94    0.51     3.52     0.09
     P                  3.01    2.04    0.57     11.65    0.04
     Q                  2.40    1.84    0.62     11.11    0.06
     p(yT+h+1|y, x)     1.33    1.19    0.65     3.26     0.07
h=5  β                  8.68    2.41    0.41     64.14    0.06
     A                  7.45    2.62    0.45     93.61    0.08
     σ1                 2.13    1.88    0.62     5.02     0.08
     τ                  0.99    0.93    0.48     2.00     0.06
     P                  2.88    1.35    0.54     15.01    0.05
     Q                  2.78    1.36    0.57     22.97    0.05
     p(yT+h+1|y, x)     1.35    1.08    0.59     10.59    0.02

Note: The rightmost column shows the 5% rejection rate of the GEWEKE (1992) test for equality of the mean of the first 20% and the last 40% of the retained draws.


Table B2 shows the summarizing statistics and the 5% rejection rate of the GEWEKE (1992) test based on the retained draws for the BMA-TVS-AR model. In most cases, the number of retained draws proved to be more than enough for reasonable inference. The rejection rates for the GEWEKE (1992) test are also satisfactory.
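For readers who want to reproduce this kind of diagnostic, the Python sketch below estimates an inefficiency factor from a single chain of retained draws. It uses the common definition $1 + 2\sum_k w_k \rho_k$ with Bartlett weights $w_k = 1 - k/(B+1)$; this normalization and the choice of the bandwidth $B$ are assumptions of the sketch and need not coincide exactly with the expression in the table captions.

```python
import numpy as np

def inefficiency_factor(draws, bandwidth):
    """Bartlett-weighted estimate of 1 + 2 * sum of autocorrelations."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = len(x)
    variance = x @ x / n
    total = 1.0
    for k in range(1, bandwidth + 1):
        rho_k = (x[:-k] @ x[k:]) / (n * variance)   # lag-k autocorrelation
        total += 2.0 * (1.0 - k / (bandwidth + 1)) * rho_k
    return total
```

Values close to 1 indicate draws that behave almost like an independent sample, whereas large values indicate strong serial correlation in the chain.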

References

ALBERT, J. H. and S. CHIB (1993), Bayes inference via Gibbs sampling of autoregressive time series subject to Markov mean and variance shifts, Journal of Business & Economic Statistics 11, 1–15.

ATKESON, A. and L. E. OHANIAN (2001), Are Phillips curves useful for forecasting inflation?, Federal Reserve Bank of Minneapolis Quarterly Review 25, 2–11.

BELMONTE, M. and G. KOOP (2013, January), Model switching and model averaging in time-varying parameter regression models, Working Papers No. 1302, Department of Economics, University of Strathclyde Business School.

CECCHETTI, S. G., R. S. CHU and C. STEINDEL (2000), The unreliability of inflation indicators, Current Issues in Economics and Finance 4, 1–6.

CHAN, J. C., G. KOOP, R. LEON-GONZALEZ and R. W. STRACHAN (2012), Time varying dimension models, Journal of Business & Economic Statistics 30, 358–367.

CHIB, S. (1996), Calculating posterior distributions and modal estimates in Markov mixture models, Journal of Econometrics 75, 79–97.

CHIPMAN, H., E. I. GEORGE, R. E. MCCULLOCH, M. CLYDE, D. P. FOSTER and R. A. STINE (2001), The practical implementation of Bayesian model selection, Lecture Notes – Monograph Series 38, 65–134.

DAVIG, T. and T. DOH (2009), Monetary policy regime shifts and inflation persistence. The Federal Reserve Bank of Kansas City Research Working Paper, 08–16.

DIEBOLD, F. X. and R. S. MARIANO (1995), Comparing predictive accuracy, Journal of Business & Economic Statistics 13, 253–263.

FERNANDEZ, C., E. LEY and M. F. STEEL (2001), Benchmark priors for Bayesian model averaging, Journal of Econometrics 100, 381–427.

FRÜHWIRTH-SCHNATTER, S. (2006), Finite mixture and Markov switching models, Springer Science + Business Media, New York.

GEORGE, E. and D. P. FOSTER (2000), Calibration and empirical Bayes variable selection, Biometrika 87, 731–747.

GERLACH, R., C. CARTER and R. KOHN (2000), Efficient Bayesian inference for dynamic mixture models, Journal of the American Statistical Association 95, 819–828.

GEWEKE, J. (1992), Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, in: Bayesian statistics, Oxford University Press, Oxford, pp. 169–193.

GIORDANI, P. and R. KOHN (2008), Efficient Bayesian inference for multiple change-point and mixture innovation models, Journal of Business & Economic Statistics 26, 66–77.

GROEN, J. J., R. PAAP and F. RAVAZZOLO (2013), Real-time inflation forecasting in a changing world, Journal of Business & Economic Statistics 31, 29–44.

HARVEY, D., S. LEYBOURNE and P. NEWBOLD (1997, June), Testing the equality of prediction mean squared errors, International Journal of Forecasting 13, 281–291.

KOOP, G. and S. M. POTTER (2004), Forecasting in dynamic factor models using Bayesian model averaging, The Econometrics Journal 7, 550–565.

KOOP, G. and S. M. POTTER (2007), Estimation and forecasting in models with multiple breaks, The Review of Economic Studies 74, 763–789.

KUO, L. and B. MALLICK (1998), Variable selection for regression models, Sankhyā: The Indian Journal of Statistics, Series B 60, 65–81.

LEY, E. and M. F. STEEL (2012), Mixtures of g-priors for Bayesian model averaging with economic applications, Journal of Econometrics 171, 251–266.


LIANG, F., R. PAULO, G. MOLINA, M. A. CLYDE and J. O. BERGER (2008), Mixtures of g-priors for Bayesian variable selection, Journal of the American Statistical Association 103, 410–423.

NEWEY, W. K. and K. D. WEST (1986), A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix, Econometrica 55, 703–708.

STOCK, J. H. and M. W. WATSON (1999), Forecasting inflation, Journal of Monetary Economics 44, 293–335.

ZELLNER, A. (1986), On assessing prior distributions and Bayesian regression analysis with g-prior distributions. Bayesian inference and decision techniques: Essays in Honor of Bruno De Finetti 6, 233–243.

Received: 14 March 2014.
