forecasting us inflation by bayesian model averaging

14
Copyright © 2008 John Wiley & Sons, Ltd. Forecasting US Inflation by Bayesian Model Averaging JONATHAN H. WRIGHT* Department of Economics, Johns Hopkins University, Baltimore, MD, USA ABSTRACT Recent empirical work has considered the prediction of inflation by combining the information in a large number of time series. One such method that has been found to give consistently good results consists of simple equal-weighted averaging of the forecasts from a large number of different models, each of which is a linear regression relating inflation to a single predictor and a lagged dependent variable. In this paper, I consider using Bayesian model averaging for pseudo out-of-sample prediction of US inflation, and find that it generally gives more accurate forecasts than simple equal-weighted averaging. This superior performance is consistent across subsamples and a number of inflation measures. Copyright © 2008 John Wiley & Sons, Ltd. key words shrinkage; model averaging; Phillips curve; model uncertainty; forecasting; inflation INTRODUCTION Forecasting inflation is clearly of critical importance to the conduct of monetary policy, as central banks attempt to conduct forecast-based monetary policy. A simple Phillips curve, which uses a single measure of economic slack such as unemployment to predict future inflation, is probably the most common econometric basis of inflation forecasting. The usefulness of the Phillips curve as a means of predicting inflation has, however, been questioned by several authors. For example, Atkeson and Ohanian (2001) found that Phillips curve based forecasts of inflation give larger out- of-sample prediction errors than a simple random walk forecast of inflation, although this specific result is sensitive to the sample period and to the choice of inflation measure (Sims, 2002). Cecchetti et al. (2000) consider inflation prediction with individual indicators, including unemployment, and argue that none of these gives reliable inflation forecasts. Stock and Watson (2003, 2004) consider prediction of inflation in each of the G7 countries using a large number of possible models. Each model has a single predictor (plus lagged inflation). They find that most of the models they consider give larger out-of-sample root mean square prediction error than a simple naive time series forecast based on fitting an autoregression to inflation. When a model does have predictive power relative to Journal of Forecasting J. Forecast. 28, 131–144 (2009) Published online 24 September 2008 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/for.1088 * Correspondence to: Jonathan Wright, Department of Economics, Johns Hopkins University, 3400N. Charles St, Baltimore, MD 21218, USA. E-mail: [email protected]

Upload: jonathan-h-wright

Post on 13-Jun-2016

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Forecasting US inflation by Bayesian model averaging

Copyright © 2008 John Wiley & Sons, Ltd.

Forecasting US Infl ation by Bayesian Model Averaging

JONATHAN H. WRIGHT*Department of Economics, Johns Hopkins University, Baltimore, MD, USA

ABSTRACT

Recent empirical work has considered the prediction of infl ation by combining the information in a large number of time series. One such method that has been found to give consistently good results consists of simple equal-weighted averaging of the forecasts from a large number of different models, each of which is a linear regression relating infl ation to a single predictor and a lagged dependent variable. In this paper, I consider using Bayesian model averaging for pseudo out-of-sample prediction of US infl ation, and fi nd that it generally gives more accurate forecasts than simple equal-weighted averaging. This superior performance is consistent across subsamples and a number of infl ation measures. Copyright © 2008 John Wiley & Sons, Ltd.

key words shrinkage; model averaging; Phillips curve; model uncertainty; forecasting; infl ation

INTRODUCTION

Forecasting infl ation is clearly of critical importance to the conduct of monetary policy, as central banks attempt to conduct forecast-based monetary policy. A simple Phillips curve, which uses a single measure of economic slack such as unemployment to predict future infl ation, is probably the most common econometric basis of infl ation forecasting. The usefulness of the Phillips curve as a means of predicting infl ation has, however, been questioned by several authors. For example, Atkeson and Ohanian (2001) found that Phillips curve based forecasts of infl ation give larger out-of-sample prediction errors than a simple random walk forecast of infl ation, although this specifi c result is sensitive to the sample period and to the choice of infl ation measure (Sims, 2002). Cecchetti et al. (2000) consider infl ation prediction with individual indicators, including unemployment, and argue that none of these gives reliable infl ation forecasts. Stock and Watson (2003, 2004) consider prediction of infl ation in each of the G7 countries using a large number of possible models. Each model has a single predictor (plus lagged infl ation). They fi nd that most of the models they consider give larger out-of-sample root mean square prediction error than a simple naive time series forecast based on fi tting an autoregression to infl ation. When a model does have predictive power relative to

Journal of ForecastingJ. Forecast. 28, 131–144 (2009)Published online 24 September 2008 in Wiley InterScience(www.interscience.wiley.com) DOI: 10.1002/for.1088

* Correspondence to: Jonathan Wright, Department of Economics, Johns Hopkins University, 3400N. Charles St, Baltimore, MD 21218, USA. E-mail: [email protected]

Page 2: Forecasting US inflation by Bayesian model averaging

132 J. H. Wright

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

the naive time series forecast, this tends to be unstable. That is, a model with good predictive power in one subperiod has little or no propensity to have good predictive power in another subperiod.

In recent years, researchers have, however, made substantial progress in forecasting infl ation using large datasets (i.e., a large number of predictive variables), but where the information in these dif-ferent variables is combined in a judicious way that avoids the estimation of a large number of unrestricted parameters. Bayesian VARs have been found to be useful in forecasting: these often use many time series, but impose a prior that many of the coeffi cients in the VAR are close to zero. Approaches in which the researcher estimates a small number of factors from a large dataset and forecasts using these estimated factors have also been shown to be capable of superior predictive performance (Stock and Watson, 1999, 2002; Bernanke and Boivin, 2003). Stock and Watson (2003, 2004), however, fi nd that the best predictive performance is obtained by constructing forecasts from a very large number of models and simply averaging these forecasts. Stock and Watson report that this gives the best predictive performance of international infl ation (and also output growth), and that this is remarkably consistent across subperiods and across countries. Stock and Watson explore other methods for pooling the different forecasts, but fi nd that none does better than simply averag-ing them, i.e., giving them all equal weights.

Stock and Watson (2003, 2004) do not offer a defi nitive explanation for why simple averaging of forecasts does so well, but it is a very strong and general fi nding for the forecasting of infl ation, and other variables. It is indeed part of the folklore of economic forecasting going back to Bates and Granger (1969). It is unlikely that the true optimal weights on many different forecasting models will be exactly equal, but even if they are not the error introduced by estimating those weights may more than offset any benefi t, leading equal-weighted averaging to work better than methods in which the weights are estimated. The closer are the variances of the different forecast errors, and the lower is their correlation, the more likely it is that equal weighting will work best (Bunn, 1985). Research-ers have found some other circumstances under which equal weighting may be optimal, or at least very hard to beat. One is if the optimal weights are unstable over time (e.g., Clemen and Winkler, 1986). Another, considered by Hendry and Clements (2002), is if there is a large number of forecast-ing models, which are all misspecifi ed by omitting deterministic intercept shifts. They argue—both analytically and in Monte Carlo simulations—that averaging such forecasts may work better than using any individual forecast, and that equal-weighted averaging may dominate methods in which the weights are estimated, owing to the uncertainty created by estimation of the weights. Clemen (1989) surveys some of the reasons for the robustness of simple model averaging.

Bayesian model averaging (BMA) is another method for pooling forecasts that has received con-siderable recent attention in both the statistics and econometrics literatures. It can be used to shrink the weights on different forecasts towards equality and thereby mitigate the diffi culties created by estimating weights. This paper considers the prediction of US infl ation by BMA, a technique which was not considered by Stock and Watson (2003, 2004).

The idea of BMA is to take forecasts from many different models, and to assume that one of them is the true model, but that the researcher does not know which one this is. The researcher starts from a prior about which model is true and computes the posterior probabilities that each model is the true one. The forecasts from all the models are then weighted by these posterior probabilities. In this paper I argue that BMA generally does better than simple equal-weighted model averaging for pre-dicting US infl ation. The result is remarkably consistent across measures of infl ation, and across different time periods. BMA also fares well relative to a number of simple univariate time series forecasts.

Page 3: Forecasting US inflation by Bayesian model averaging

Forecasting US Infl ation by Bayesian Model Averaging 133

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

One does not have to be a subjectivist Bayesian to believe in the usefulness of BMA, or of Bayesian shrinkage techniques more generally. A frequentist econometrician can interpret these methods as pragmatic devices that may be useful for out-of-sample forecasting in the face of model and parameter uncertainty. Recently Hjort and Claeskens (2003) and other authors have considered frequentist approaches to model averaging.

The plan for the remainder of the paper is as follows. In the next section I describe the idea of BMA. The out-of-sample infl ation prediction exercise is described in the third section. The fourth section concludes.

BAYESIAN MODEL AVERAGING

The idea of BMA was set out by Leamer (1978), and has more recently been used by many authors in the statistics literature, including Raftery et al. (1997) and Chipman et al. (2001). It has also been used in a number of econometric applications, including output growth forecasting (Min and Zellner, 1993; Koop and Potter, 2003), cross-country growth regressions (Doppelhofer et al., 2000; Fernandez et al., 2001), exchange rate forecasting (Wright, 2003) and stock return prediction (Avramov, 2002; Cremers, 2002).

Consider a set of n models M1, . . . , Mn. The ith model is indexed by a parameter vector qi. The researcher knows that one of these models is the true model, but does not know which one.1 The researcher has prior beliefs about the probability that the ith model is the true model which we write as P(Mi), observes data D, and updates her beliefs to compute the posterior probability that the ith model is the true model:

P M DP D M P M

P D M P Mi

i i

j jj

n( ) =

( ) ( )( ) ( )

=∑ 1

(1)

where

P D M P D M P Mi i i i i i( ) = ( ) ( )∫ θ θ θ, d (2)

is the marginal likelihood of the ith model, P(qi Mi) is the prior density of the parameter vector in this model and P(Dqi, Mi) is the likelihood. Each model implies a forecast. In the presence of model uncertainty, our overall forecast weights each of these forecasts by the posterior for that model. The researcher needs only specify the set of models, the model priors, P(Mi), and the parameter priors, P(qiMi).2

1 For recent work considering BMA when none of the models is in fact true, see Key et al. (1998) and Fernández-Villaverde and Rubio-Ramírez (2004). The BMA procedure is consistent in the sense that the weight on the model that is best in the Kullback–Leibler metric (which is the true model if that exists, but is still well defi ned if none of the models is true) goes to one asymptotically as the sample size increases.2 The distinction between model uncertainty and parameter uncertainty is a little artifi cial in the sense that one could write all the models as nested by a suffi ciently general model, and, within this nesting model, the only uncertainty is then parameter uncertainty. The BMA approach would, however, imply very particular and rather peculiar priors for these parameters.

Page 4: Forecasting US inflation by Bayesian model averaging

134 J. H. Wright

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

The models do not have to be linear regression models, but I shall henceforth assume that they are. The ith model then specifi es that

y Xi i i= +β ε (3)

where y is a T × 1 vector of observations on a variable that the researcher wishes to forecast, Xi is a T × pi matrix of strictly exogenous predictors, bi is a pi × 1 parameter vector, ei is the disturbance vector, the disturbances are i.i.d. with mean zero and variance s2, and qi = (b′i, s2)′.

For the model priors, I shall assume that all models are equally likely, so that P(Mi) = 1/n. For the parameter priors, I shall take the natural conjugate g-prior specifi cation for bi, so that the prior for bi conditional on s2 is N(bi,fs2)(X′iXi)−1). For s2, I assume the improper prior that is proportional to 1/s2. It is well known (see, for example, Zellner, 1977) that one can then calculate the required likelihood of the model analytically as

P D MT

Si T

pi

Ti( ) =( )

+( )− −1

2

21

2

2Γπ

φ (4)

where S y X y X y X X X X X y Xi i i i i i i i i i i i i2 1

1= −( )′ −( ) − −( )′ ′( ) ′ −( )

+−β β β β φ

φ. The OLS estimate of bi is

βi i i iX X X y= ′( ) ′−1 (5)

while the posterior mean is

�β β φφ

βφi

i=+

++

ˆ

1 1 (6)

The prior for bi is a proper prior. In BMA, one cannot use improper priors for model-specifi c parameters, because improper priors are unique only up to an arbitrary multiplicative constant and so their use would lead to an indeterminacy of the model posterior probabilities (Kass and Raftery, 1995). Besides, an informative prior may improve forecasting performance. One way of thinking about the role of f is that it controls the relative weight of the data and our prior beliefs in computing the posterior probabilities of different models. If f = 0, then P(D Mi) = 1/n for all models and so the posterior probabilities of each model being true are equal (equal weights). The larger is f, the more we are willing to move away from the model priors in response to what is observed in the data. The prior for s2 is an inverse-gamma (0, 0) prior, and is improper. An improper prior can be used for s2, because this parameter is common to all models, and thus the arbitrary multiplicative constant in this prior does not affect the posterior probabilities for the different models. This parameter prior has been used by Fernandez et al. (2001) and many others in the BMA literature.3

3 Alternatively, a conjugate inverse-gamma (c0, d0) prior could have been used for s2 with c0 > 0 and d0 > 0. This is a proper prior, but has the disadvantage that is not invariant under scale transformations. Moreover, it seems appropriate to have a prior for s2 that is as uninformative as possible.

Page 5: Forecasting US inflation by Bayesian model averaging

Forecasting US Infl ation by Bayesian Model Averaging 135

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

US INFLATION FORECASTING

The paper applies BMA to forecasting US infl ation in a standard recursive out-of-sample prediction exercise. Following Stock and Watson (2003, 2004), each forecasting model is of the form

π α γ ρ π εt t h i i i t i t h t i tZ, , , ,+ −= + + + (7)

where π t t h t h th

P P, log+ += ( )100denotes the infl ation rate from time t to time t + h, Pt is the price level

at time t, h is the forecasting horizon, Zi,t is a scalar predictor, and ei,t is the error term in the ith model. Each model has a different scalar predictor.

The data are quarterly from 1960Q1 to 2006Q1. The results shown in this paper are for the total Consumer Price Index (CPI) measure of infl ation, but the working paper version contains results for three other infl ation measures: core CPI, the chain-weighted Gross Domestic Product (GDP) price index and the chain-weighted Personal Consumption Expenditures (PCE) price index, which show qualitatively similar patterns. For the predictor Zi,t in equation (7), I consider a total of 107 possible variables, as listed in Table I. All of the variables are available at a quarterly frequency for the whole sample period, yielding a balanced panel covering a very wide range of potentially relevant measures of economic slack and fi nancial asset prices. The predictors used are similar to those considered by Stock and Watson (1999, 2003, 2004).

I consider pseudo out-of-sample prediction of infl ation using equal-weighted averaging across the models defi ned by the different predictors in equation (7). The equal-weighted averaging h-step-ahead forecast is given by

ˆ ˆ ˆ ˆ, , , , ,π α γ ρ πt t hEW

i t i t t i t t h ti

n

nZ+ −=

= + +( )∑11

(8)

where a i,t, g i,t and r i,t are the OLS estimates of the parameters of equation (7) obtained using only data from date t and earlier. I also consider a forecast constructed by BMA of these OLS forecasts given by

ˆ ˆ ˆ ˆ, , , , ,π α γ ρ πt t h t i i t i t t i t t h ti

nP M D Z+ −=

= ( ) + +( )∑BMA-OLS

1 (9)

where Pt(Mi D) is the posterior probability that model i is the true model, again computed using only data from date t and earlier. This will be referred to as the BMA-OLS forecast. In equation (9), the Bayesian approach is only used to determine the posterior probabilities for each model. Within each individual model, the forecast is constructed using OLS estimates of ai, g i and ri, not the posterior means. It is thus a hybrid Bayesian–classical forecast which has the advantage that it nests equal-weighted averaging as a special case, since it reduces to the equal-weighted averaging forecast in (8) when f = 0. But the more standard BMA forecast would instead use the posterior means of ai, g i and ri. This forecast is given by

ˆ , , , , ,π α γ ρ πt t h t i i t i t t i t t h ti

nP M D Z+ −=

= ( ) + +( )∑BMA � � �1

(10)

where a i,t, g i,t and r i,t are the posterior means of the parameters of equation (7) obtained using only data from date t and earlier and will be referred to as the BMA forecast.

Page 6: Forecasting US inflation by Bayesian model averaging

136 J. H. Wright

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

Table I. Predictors used

Series Transform(s) Series Transform(s)

Employees on nonfarm payrolls dln Inventories index levelPrivate nonfarm payrolls: total dln New orders index level NRM dln Production index level Construction dln Purchasing managers’ index level Manufacturing dln Prices paid index level Durable goods dln Michigan index of consumer sentiment level Nondurable goods dln Help-wanted index dln TTU dln Number of unemployed: total dln Information dln less than 5 weeks dln Financial activities dln 5 to 14 weeks dln Prof. and business services dln 15 weeks and over dln Education and health services dln 15 to 26 weeks dln Leisure and hospitality dln Number of employed dln Other services dln Number employed in nonag. industries dln Government dln Unemployment rate, 16+ levelWeekly hours of PWs: NRM level Manufacturing capacity utilization level Construction level Real PCE: total dln Manufacturing level Durables dln Durable goods level Nondurables dln Nondurable goods level Services dlnWeekly overtime of PWs level New cars dlnHourly earnings: NRM ddln Labor productivity: NB dln Construction ddln Labor productivit: NCB dln Manufacturing ddln Unit labor cost: NB dln Durable goods ddln Unit labor cost: NCB dln Nondurable goods ddln Compensation per hour: NB dlnMonetary base dln,ddln Compensation per hour: NCB dlnMoney stock (M1) dln,ddln Real GDP dlnMoney stock (M2) dln,ddln Real fi nal sales dlnTotal reserves dln,ddln Real disposable personal income dlnIndustrial production: total dln Corporate profi ts before tax dln Business equipment dln NIPA compensation of employees dln Consumer goods dln NYSE composite index dln Durable consumer goods dln S&P composite index dln Final products dln Prime rate diff, s Materials dln Three-month Treasury bill rate diff, s Manufacturing dln Six-month Treasury bill rate diff, s Nondurable goods materials dln One-year CM Treasury yield diff, s Nondurable consumer goods dln Five-year CM Treasury yield diff, s Total products dln Ten-year CM Treasury yield diff, sHousing starts: total ln Moody’s AAA corporate yield diff, s Midwest ln Moody’s BAA corporate yield diff, s Northeast ln Effective federal funds rate diff South ln S&P dividend yield level West ln S&P price–earnings ratio levelMobile homes shipments ln Price of gold dlnNAPM: Supplier deliveries index level Price of oil dln Manufacturing employment index level

Abbreviations for series: PW, Production workers; NRM, natural resources and mining; TTU, trade, transportation and utilities; NB, nonfarm business; NCB, nonfi nancial corporate business; CM, constant maturity. Transformations: dln, fi rst difference of logs; ddln, second difference of logs; ln, logs; level, level; diff, fi rst differences; s, level of spread over effec-tive federal funds rate. The predictors in italics are classed as asset prices.

Page 7: Forecasting US inflation by Bayesian model averaging

Forecasting US Infl ation by Bayesian Model Averaging 137

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

The posterior model probabilities are calculated as described above. The assumptions used in deriving equation (4) are clearly false in this application in two respects. Firstly, the regressors are assumed to be strictly exogenous. Many methods for combining a large number of variables in forecasting exercises (including bagging and empirical Bayes methods) likewise have a theoretical justifi cation that relies on strict exogeneity of the regressors. Secondly, the forecasts are overlapping h-step-ahead forecasts and so forecast errors less than h periods apart are bound to be serially cor-related, even though in the previous section it was assumed that they are i.i.d.4 But BMA, like other methods for combining many predictors for forecasting, may still have good forecasting properties even if the premises underlying their theoretical justifi cation are false (see Stock and Watson, 2005). Ability to provide good out-of-sample forecasts is a stringent test of the practical usefulness of BMA in forecasting.

The only parameters that must be specifi ed are f and the prior mean bi. The prior mean of g i is set to zero, meaning that the prior is that Zi,t is not useful for prediction. On the other hand, zero does not seem like a good choice for the prior mean for ai and ri. Therefore I use the OLS estimates of the pure autoregression

pt,t+h = a + rpt−h,t +et (11)

fi tted to CPI data over the pre-sample period 1913Q1–1959Q4 as the prior means for ai and ri.For each quarter from 1971Q1 on, the pseudo-out-of-sample root mean square prediction error

(RMSPE) of the equal-weighted averaging forecast in equation (8) and the BMA-OLS and BMA forecasts in equations (9) and (10) were computed with f = 1 (see below for discussion of the choice of f). Because of the possibility of forecast instability and because of the runup in infl ation in the late 1970s, and the subsequent disinfl ation was quite atypical of more recent outcomes, these relative out-of-sample RMSPEs were also computed for the more recent subsample from 1987Q1 to 2006Q1. All the predictors in Table I were used. The RMSPEs were also computed using only asset prices as predictors, which is useful because these are available at high frequency, in real time and are not subject to revision, although there are only 23 such regressors.

The results are reported in Table II. This table includes the out-of-sample RMPSE from a number of benchmark forecasting methods as a comparison. The benchmarks considered are:

• TS1: A random walk forecast p TSt,t+h

1 =pt−h,t.• TS2: A forecast obtained by the OLS estimation of equation (11).• TS3: A forecast obtained by the MLE of an ARMA(1,1) model fi tted to pt,t+1, which is then used

to construct the forecast of pt,t+h by iterating forward.5

• S1: The Survey of Professional Forecasters (SPF) median prediction for CPI infl ation from quar-ters t to t + h as reported in the survey conducted in the middle of quarter t. This is available only for the more recent subsample, and only at horizons out to four quarters.

• S2: The Blue Chip median prediction for CPI infl ation from quarters t to t + h as reported in the survey conducted in the middle of quarter t. Again, this is available only for the more recent subample, and only at horizons out to four quarters.

4 One way round this is to use only every hth observation in computing the posterior probabilities, but I do not do this in the baseline results. Results dropping all but every hth observation are not shown, so as to conserve space, but are similar and are available from the author.5 The MLE is obtained conditioning on the error at time zero being fi xed at zero.

Page 8: Forecasting US inflation by Bayesian model averaging

138 J. H. Wright

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

From comparison of the recursive out-of-sample RMPSEs from the model averaging and bench-mark forecasts in Table II, a number of points emerge:

1. Both BMA-OLS and BMA generally outperform equal-weighted model averaging at longer horizons while at the shortest horizons the methods all have virtually identical RMSPEs. BMA-OLS represents a modest improvement over equal-weighted model averaging, corresponding to a reduction in RMSPE of 0–5%. However, it is consistent in the sense that there is no case in

Table II. Out-of-sample RMSPE of forecasts of CPI infl ation

Horizon (quarters) 1 2 3 4 5 6 7 8

1971Q1–2006Q1Model averaging (using all predictors)EWA 1.73 1.62 1.66 1.89 2.12 2.31 2.47 2.60BMA-OLS 1.73 1.62 1.61 1.81 2.06 2.25 2.44 2.59BMA 1.75 1.65 1.65 1.82 2.01 2.15 2.26 2.31

Model averaging (using asset prices only)EWA 1.72 1.62 1.68 1.91 2.14 2.33 2.50 2.63BMA-OLS 1.71 1.60 1.62 1.84 2.08 2.29 2.48 2.62BMA 1.74 1.65 1.65 1.83 2.02 2.17 2.28 2.32

Benchmark forecastsRW 1.81 1.71 1.73 1.94 2.14 2.29 2.41 2.48AR 1.77 1.72 1.84 2.12 2.38 2.58 2.74 2.86ARMA 1.76 1.74 1.71 1.84 1.98 2.08 2.20 2.33

1987Q1–2006Q1Model averaging (using all predictors)EWA 1.32 1.07 0.98 0.93 0.91 0.93 0.99 1.06BMA-OLS 1.32 1.07 1.00 0.91 0.86 0.89 0.96 1.04BMA 1.26 1.03 0.95 0.88 0.82 0.80 0.80 0.80

Model averaging (using asset prices only)EWA 1.32 1.07 0.99 0.94 0.92 0.94 1.00 1.07BMA-OLS 1.31 1.07 0.99 0.91 0.87 0.88 0.95 1.05BMA 1.25 1.02 0.95 0.88 0.82 0.80 0.80 0.80

Benchmark forecastsRW 1.40 1.17 1.11 1.07 1.04 1.03 1.04 1.06AR 1.33 1.11 1.05 1.03 1.04 1.09 1.16 1.24ARMA 1.25 1.10 1.06 1.09 1.10 1.08 1.10 1.13SPF 1.17 0.97 0.86 0.82Blue Chip 1.15 0.96 0.86 0.81

Notes: The table shows the root mean square prediction errors (RMSPEs) of selected out-of-sample forecasts of h-quarter-ahead CPI infl ation, expressed at an annualized rate, over the perods shown. The forecasts considered are the random walk forecast (RW), an AR(1) fi tted to h-quarter infl ation (AR), an ARMA process fi tted to one-quarter infl ation and iterated forward to generate an h-quarter forecast, the Survey of Professional Forecasters (SPF) and Blue Chip predictions and some model averaging forecasts. The survey forecasts are given for the later subperiod only because CPI predictions were not collected by the SPF before 1981. The surveys only go out four quarters. The model averaging forecasts use 107 models, each of which regresses h-period infl ation on lagged h-period infl ation and one of 107 predictors, as listed in Table I. The equal-weighted averaging (EWA) forecasts simply average the resulting predictions, as in equation (8). The BMA-OLS forecasts instead weight them by their posterior probabilities, as in equation (9). The BMA forecasts instead use these pos-terior probabilities to weight the forecasts based on the posterior means of the parameters, as in equation (10). Both the BMA-OLS and BMA forecasts use the prior j = 1.

Page 9: Forecasting US inflation by Bayesian model averaging

Forecasting US Infl ation by Bayesian Model Averaging 139

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

which equal-weighted model averaging substantially outperforms BMA-OLS. The improvement can be larger for the BMA forecasts, especially at longer horizons. For example, in the 1987Q1–2006Q1 subsample, the out-of-sample RMSPE of 8-quarter-ahead forecasts of CPI infl ation is 1.06 percentage points (at an annualized rate) for equal-weighted averaging, 1.04 for BMA-OLS but 0.80 for BMA—a reduction of over 20% in RMSPE.

2. The RMSPEs from any of the forecasting methods are smaller in the 1987Q1–2006Q1 period than over the longer forecast evaluation sample, consistent with Stock and Watson (2007).

3. BMA generally outperforms the simple time series benchmarks. In the 1987Q1–2006Q1 sub-sample, BMA gives uniformly smaller RMSPE than any of the three time series benchmarks. In the full sample, the comparison is not quite as clear-cut, but it is still true that BMA is still typically better than any of the time series benchmarks. BMA does better relative to the time series benchmarks in the recent subsample than in the full sample. More recent out-of-sample forecasts use a larger sample size for estimation of the parameters and model posterior probabil-ities, so this is perhaps not surprising. However, it contrasts with Atkeson and Ohanian (2001) and Stock and Watson (2007), who found that the incremental improvement from the multivari-ate infl ation forecasting methods that they considered over univariate benchmarks was smaller more recently than in earlier decades.

4. BMA-OLS mostly outperforms the simple time series benchmarks, both in the full sample and in the 1987Q1–2006Q1 subsample.

5. Where available, surveys have good forecast performance. This is entirely consistent with the fi ndings of Ang et al. (2007). However, surveys exist only for a few infl ation variables and for particular horizons. BMA has the important advantage that it can be implemented for any infl a-tion measure and at any forecast horizon.

6. Equal-weighted averaging, BMA-OLS and BMA all generally give slightly better forecasts when using all predictors than when using asset prices only. This is, however, not uniformly true. Meanwhile, infl ation prediction using asset prices alone can be implemented in real time and can be updated at arbitrarily high frequency.

Dependence on the hyperparameter fThe results in Table II evaluate the performance of BMA-OLS and BMA using one particular value of the key shrinkage parameter, f. To illustrate the sensitivity of the results to varying this param-eter, Figure 1 plots the ratio of the out-of-sample RMSPE from BMA-OLS to that from equal-weighted averaging against f using all predictors over the full sample period. Because BMA-OLS reduces to equal-weighted averaging when f = 0, the ratio of the RMSPE from BMA-OLS to that from equal-weighted averaging is by construction equal to 1 when f = 0. As shrinkage becomes less intense (f goes up), this ratio generally initially declines but then typically starts to rise again. Raising f thus usually improves the forecasting power of BMA up to a point, but a suffi ciently large value of f usually leads to less good forecasting performance.

How the weights vary across models and over timeBy construction, when f = 0, all of the models will have equal posterior probability. On the other hand, with a suffi ciently large value of f, it is to be expected that the posterior weights will cluster on one or a very small number of models, effectively undoing the effect of any forecast combination. For 2- and 4-quarter horizon forecast of infl ation using all predictors, Figure 2 plots the cumulative posterior probability on the top fi ve models and the posterior probability associated with the top single model, for different values of f, using the full sample period. At the 2-quarter horizon, with

Page 10: Forecasting US inflation by Bayesian model averaging

140 J. H. Wright

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

Figure 1. Out-of-sample RMSPE of BMA-OLS forecasts relative to equal-weighted averaging plotted against f (all predictors). This fi gure plots the out-of-sample RMSPE from BMA-OLS forecasts of CPI relative to that of equal-weighted averaging of OLS forecasts against f. By construction the ratio is equal to 1 when f = 0. Ratios smaller than one mean that BMA-OLS is giving more accurate forecasts

f = 1, the value used in the results in Table II, the model with the single highest weight gets a pos-terior probability of 11%6 and the top fi ve models together get a weight of 31%. While these weights are high, they are not so high as to amount to using just one or a very small number of models. Figure 2 clearly shows that the concentration of the weights is an increasing function of f, as is to be expected.

Figure 3 shows the time series of the recursive posterior probabilities associated with all of the 107 different models in forecasting infl ation at the 2-quarter horizon. They are shown on a semi-logarithmic scale. While they fl uctuate over time, they do so fairly slowly. It can be seen that the posterior probabilities tend to drift apart over time, which makes sense, because the prior probabil-ities are all equal, but the weight of the data relative to the prior increases as the sample period used for estimation gets longer.

6 The model with the highest posterior probability uses the spread of 6-month Treasury bill yields over the funds rate as the predictor, consistent with research showing that term spreads have predictive power for future infl ation (see, for example, Mishkin, 1990).

Page 11: Forecasting US inflation by Bayesian model averaging

Forecasting US Infl ation by Bayesian Model Averaging 141

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

Robustness to outliersTo investigate whether the results in Table II are being heavily driven by outliers, I consider a less outlier-sensitive metric, namely the fraction of the time BMA-OLS or BMA gives a forecast that turns out to be better than that from equal-weighted model averaging, in the sense of being closer to the subsequent ex post realized infl ation rate. Table III reports the proportion of quarters where BMA-OLS/BMA gives a more accurate out-of-sample forecast than equal-weighted model averag-ing, both over the whole period 1971Q1–2006Q1 and over the more recent subsample 1987Q1–2006Q1, and both when using all possible predictors, and when using only asset prices. The elements of Table III are nearly all above 0.5, substantially so at longer horizons, confi rming that BMA-OLS and BMA are both usually more accurate than equal-weighted averaging. In some cases BMA-OLS outperforms equal-weighted averaging about 70% of the time. The comparatively good performance of the Bayesian model averaging schemes does not appear to be the result of outliers.

Figure 2. Posterior probabilities on the models with the highest and top fi ve weights plotted against j (all predictors). The top two panels plot the sum of the posterior probabilities on the models that have the top fi ve posterior probabilities, for forecasting CPI at alternative horizons, against f. The bottom two panels show the largest posterior probability of any single model. These probabilities are estimated using the full sample 1971Q1–2006Q1. By construction, when f = 0, the weights are all equal. The baseline results in this paper set f = 1

Page 12: Forecasting US inflation by Bayesian model averaging

142 J. H. Wright

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

Figure 3. Recursive posterior probabilities on all models for 2-quarter-ahead forecasting (j = 1). This shows the time series of recursive posterior probabilities on all models. A logarithmic scale was used on the vertical axis

Table III. Fraction of quarters when BMA-OLS /BMA gives ex post better prediction than equal-weighted averaging of CPI

Horizon All Predictors Asset Prices Only

1971Q1–2006Q1 1987Q1–2006Q1 1971Q1–2006Q1 1987Q1–2006Q1

BMA-OLS BMA BMA-OLS BMA BMA-OLS BMA BMA-OLS BMA

1 0.55 0.54 0.53 0.61 0.55 0.55 0.58 0.642 0.51 0.50 0.48 0.55 0.56 0.50 0.56 0.563 0.58 0.55 0.49 0.52 0.57 0.57 0.49 0.554 0.58 0.55 0.48 0.52 0.62 0.55 0.52 0.555 0.56 0.55 0.58 0.53 0.57 0.52 0.57 0.526 0.65 0.60 0.70 0.62 0.67 0.59 0.71 0.617 0.67 0.61 0.66 0.62 0.67 0.62 0.71 0.648 0.62 0.64 0.65 0.68 0.67 0.64 0.69 0.68

Notes: The shrinkage parameter j is set to 1.

Page 13: Forecasting US inflation by Bayesian model averaging

Forecasting US Infl ation by Bayesian Model Averaging 143

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

CONCLUSIONS AND FUTURE RESEARCH

A theme of much recent empirical work on infl ation forecasting is that the judicious pooling of information from a large number of variables provides the best approach to predicting infl ation. One method that has been particularly successful is to simply average the forecasts from a large number of models, each of which has a single predictor variable. The result that equal-weighted averaging gives the best forecasts of infl ation (and other variables) has been found by many authors, and the robustness of this result is rather surprising. In this paper, I have considered instead using BMA for US infl ation forecasting and found that it typically outperforms equal-weighted forecast averaging. This result is consistent across different subperiods and across different infl ation measures. BMA also fares well relative to a number of simple time series benchmarks and is competitive with survey forecasts of infl ation, but it has the important advantage that it can be implemented for any infl ation measure and any forecast horizon, while surveys exist only for certain infl ation measures and at some horizons. All in all, I conclude that BMA should be taken very seriously as a method for forecasting infl ation. In future work, it may be possible to get still better forecasts by incorporating nonlinear models in the exercise, by incorporating Greenbook and private sector survey forecasts of infl ation, by considering models with more than one predictor.

ACKNOWLEDGEMENTS

I am grateful to Ben Bernanke, David Bowman, Derek Bunn, Jon Faust, Kirstin Hubrich, Lutz Kilian, Mike McCracken, Matt Pritsker and Pietro Veronesi for helpful comments on earlier versions of this manuscript and to Sergey Chernenko for excellent research assistance. The views in this paper are solely the responsibility of the author and should not be interpreted as refl ecting the views of the Board of Governors of the Federal Reserve System or of any person associated with the Federal Reserve System.

REFERENCES

Ang A, Bekaert G, Wei M. 2007. Do macro variables, asset markets or surveys forecast infl ation better?, Journal of Monetary Economics 54: 1163–1212.

Atkeson A, Ohanian LE. 2001. Are Phillips curves useful for forecasting infl ation? Federal Reserve Bank of Minneapolis Quarterly Review 25: 2–11.

Avramov D. 2002. Stock return predictability and model uncertainty. Journal of Financial Economics 64: 423–458.

Bates JM, Granger CWJ. 1969. The combination of forecasts. Operations Research Quarterly 20: 451–468.Bernanke BS, Boivin J. 2003. Monetary policy in a data-rich environment. Journal of Monetary Economics 50:

525–546.Bunn DW. 1985. Statistical effi ciency in the linear combination of forecasts. International Journal of Forecasting

1: 151–163.Cecchetti S, Chu R, Steindel C. 2000. The unreliability of infl ation indicators. Federal Reserve Bank of New York

Current Issues in Economics and Finance 6: 1–6.Chipman H, George EI, McCulloch RE. 2001. The practical implementation of Bayesian model selection. Working

paper. Wharton School, University of Pennsylvania, Philadelphia.Clemen RT. 1989. Combining forecasts: a review and annotated bibliography. International Journal of

Forecasting 5: 559–583.Clemen RT, Winkler RL. 1986. Combining economic forecasts. Journal of Business and Economic Statistics 4:

39–46.

Page 14: Forecasting US inflation by Bayesian model averaging

144 J. H. Wright

Copyright © 2008 John Wiley & Sons, Ltd. J. Forecast. 28, 131–144 (2009) DOI: 10.1002/for

Cremers KJM. 2002. Stock return predictability: a bayesian model selection perspective. Review of Financial Studies 15: 1223–1249.

Doppelhofer G, Miller RI, Sala-i-Martin X. 2000. Determinants of long-term growth: a Bayesian averaging of classical estimates (BACE) approach. NBER Working Paper 7750.

Fernandez C, Ley E, Steel MFJ. 2001. Model uncertainty in cross-country growth regressions. Journal of Applied Econometrics 16: 563–576.

Fernández-Villaverde J, Rubio-Ramírez JF. 2004. Comparing dynamic equilibrium models to data: a Bayesian approach. Journal of Econometrics 123: 153–187.

Hendry DF, Clements MP. 2002. Pooling of forecasts. Econometrics Journal 5: 1–26.Hjort NL, Claeskens G. 2003. Frequentist model average estimators. Journal of the American Statistical Associa-

tion 98: 879–899.Kass RE, Raftery AE. 1995. Bayes factors. Journal of the American Statistical Association 90: 773–795.Key JT, Pericchi LR, Smith AFM. 1998. Choosing among models when none of them are true. In Proceedings of

the Workshop on Model Selection: Special Issue of Rassegna di Metodi Statistici ed Applicazioni, Racugno W (ed.). Pitagore Editrice: Bologna.

Koop G, Potter S. 2003. Forecasting in large macroeconomic panels using Bayesian model averaging. Federal Reserve Bank of New York Staff Report 163.

Leamer EE. 1978. Specifi cation Searches. Wiley: New York.Min C, Zellner A. 1993. Bayesian and non-Bayesian methods for combining models and forecasts with applica-

tions to forecasting international growth rates. Journal of Econometrics 56: 89–118.Mishkin FS. 1990. What does the term structure tell us about future infl ation? Journal of Monetary Economics

25: 77–95.Raftery AE, Madigan D, Hoeting JA. 1997. Bayesian model averaging for linear regression models. Journal of

the American Statistical Association 92: 179–191.Sims CA. 2002. The role of models and probabilities in the monetary policy process. Brookings Papers on Eco-

nomic Activity 2: 1–40.Stock JH, Watson MW. 1999. Forecasting Infl ation. Journal of Monetary Economics 44: 293–335.Stock JH, Watson MW. 2002. Forecasting using principal components from a large number of predictors. Journal

of the American Statistical Association 97: 1167–1179.Stock JH, Watson MW. 2003. Forecasting output and infl ation: the role of asset prices. Journal of Economic

Literature 41: 788–829.Stock JH, Watson MW. 2004. Combination forecasts of output growth in a seven country dataset. Journal of

Forecasting 23: 204–430.Stock JH, Watson MW. 2005. An empirical comparison of methods for forecasting using many predictors. Harvard

University, Cambridge.Stock JH, Watson MW. 2007. Has infl ation become harder to forecast? Journal of Money, Credit and Banking

39: 3–34.Wright JH. 2003. Bayesian model averaging and exchange rate forecasts. Journal of Ecometrics Forthcoming.Zellner A. 1971. An Introduction to Bayesian Inference in Econometrics. Wiley: New York.

Author’s biography:Jonathan H. Wright is Professor of Economics at Johns Hopkins University. He earned his PhD in 1997 from Harvard University, taught at the University of Virginia for two years and worked for nine years at the Federal Reserve Board. He is an Associate Editor of the Journal of Business and Economic Statistics and Journal of Applied Econometrics. His work is at the intersection of macroeconomics, fi nance and time series econometrics and he has published in journals including Econometrica, the Journal of Econometrics, the Review of Economics and Statistics, the Journal of Monetary Economics, the Journal of International Economics, the Journal of Busi-ness and Economic Statistics and the Journal of Applied Econometrics.

Author’s address:Jonathan H. Wright, Department of Economics, Johns Hopkins University, 3400N. Charles St, Baltimore, MD 21218, USA.