
Forecasting Lecture 2: Forecast Combination, Multi-Step Forecasts

Bruce E. Hansen

Central Bank of Chile, October 29-31, 2013

Bruce Hansen (University of Wisconsin) Forecast Combination and Multi-Step Forecasts October 29-31, 2013 1 / 82

Today’s Schedule

Combination Forecasts

Multi-Step Forecasting

Fan Charts

Iterated Forecasts

If time (optional): Threshold models or Nonlinear/Nonparametric Time Series

Review

Optimal point forecast of $y_{n+1}$ given information $I_n$ is the conditional mean $E(y_{n+1} \mid I_n)$

Linear model $E(y_{n+1} \mid I_n) \simeq \beta'x_n$ is an approximation

Estimate linear projections by least-squares

Model selection should focus on performance, not “truth”
  - Best forecast has smallest MSFE
  - Unknown, but MSFE can be estimated
  - CV is a good estimator of MSFE

Good forecasts rely on selection of leading indicators

Combination Forecasts

Diversity of Forecasts

Model choice is critical
  - Classic approach: Selection
  - Modern approach: Combination

Issues:
  - How to select from a wide set of models/forecasts?
      - Model selection criteria
  - How to combine a wide set of models/forecasts?
      - Weight selection criteria

Foundation

The ideal point forecast minimizes the MSFE

The goal of a good combination forecast is to minimize the MSFE

Forecast Selection

M forecasts: $f = \{f(1), f(2), \ldots, f(M)\}$

Selection picks $m$ to determine the forecast $f = f(m)$

M weights: $w = \{w(1), w(2), \ldots, w(M)\}$

A combination forecast is the weighted average

$$f(w) = \sum_{m=1}^{M} w(m) f(m) = w'f$$

Combination generalizes selection

Possible restrictions on the weight vector

$\sum_{m=1}^{M} w(m) = 1$
  - Unbiasedness
  - Typically improves performance

$w(m) \ge 0$
  - Nonnegativity
  - Regularization
  - Often critical for good performance

$w(m) \in \{0, 1\}$
  - Equivalent to forecast selection
  - $f(w) = f(m)$
  - Selection is a special case of combination
  - Strong restriction

OOS Forecast Combination

Sequence of true out-of-sample forecasts $f_t$ for $y_{t+1}$

Combination forecast is $f(w) = w'f$

OOS empirical MSFE

$$\sigma^2(w) = \frac{1}{P} \sum_{t=n-P}^{n} \left(y_{t+1} - w'f_t\right)^2$$

PLS selected the model with the smallest OOS MSFE

Granger-Ramanathan combination: select $w$ to minimize the OOS MSFE

Minimization over $w$ is equivalent to the least-squares regression of $y_{t+1}$ on the forecasts

$$y_{t+1} = w'f_t + \varepsilon_{t+1}$$

Granger-Ramanathan (1984)

Unrestricted least-squares

$$\hat w = \left(\sum_{t=n-P}^{n} f_t f_t'\right)^{-1} \sum_{t=n-P}^{n} f_t y_{t+1}$$

This can produce weights that lie far outside [0, 1] and do not sum to one

Granger-Ramanathan’s intuition was that this flexibility is good
  - But they provided no theory to support this conjecture

Unrestricted weights are not regularized
  - This results in poor sampling performance
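As a minimal sketch of the unrestricted least-squares weights above, the following solves the two-forecast normal equations by hand. The function name `gr_weights_two` and the toy data are illustrative assumptions, not taken from the lecture; with more than two forecasts a linear-algebra routine would be the natural choice.

```python
def gr_weights_two(f1, f2, y_next):
    """Unrestricted Granger-Ramanathan weights for two forecasts.

    Regresses y_{t+1} on (f1_t, f2_t) without an intercept by solving
    the 2x2 normal equations directly.
    """
    s11 = sum(a * a for a in f1)
    s12 = sum(a * b for a, b in zip(f1, f2))
    s22 = sum(b * b for b in f2)
    r1 = sum(a * y for a, y in zip(f1, y_next))
    r2 = sum(b * y for b, y in zip(f2, y_next))
    det = s11 * s22 - s12 * s12  # zero if the two forecasts are collinear
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det


# Hypothetical out-of-sample forecasts from two models.
f1 = [1.0, 2.0, 0.5, 1.5, 3.0]
f2 = [1.2, 1.8, 0.9, 1.1, 2.5]

# Outcomes that are an exact 0.3/0.7 blend: the weights are recovered.
y_blend = [0.3 * a + 0.7 * b for a, b in zip(f1, f2)]
w_blend = gr_weights_two(f1, f2, y_blend)

# Outcomes that extrapolate beyond f1: the weights leave [0, 1].
y_extra = [1.5 * a - 0.5 * b for a, b in zip(f1, f2)]
w_extra = gr_weights_two(f1, f2, y_extra)
```

The second case illustrates the point made above: nothing in the unrestricted regression keeps the weights nonnegative or inside the unit interval.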

Alternative Representation

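Take $y_{t+1} = w'f_t + \varepsilon_{t+1}$ and subtract $y_{t+1}$ from each side

$$0 = w'f_t - y_{t+1} + \varepsilon_{t+1}$$

Impose the restriction that the weights sum to one, so that $y_{t+1}$ can be moved inside the weighted sum

$$0 = w'(f_t - y_{t+1}) + \varepsilon_{t+1}$$

Define $e_{t+1} = f_t - y_{t+1}$ (with $y_{t+1}$ subtracted from each element of $f_t$), the vector of (negative) forecast errors. Then

$$0 = w'e_{t+1} + \varepsilon_{t+1}$$

This is the regression of 0 on the forecast errors

But it is still better to also impose non-negativity $w(m) \ge 0$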

Constrained Granger-Ramanathan

The constrained GR weights solve the problem

$$\min_w \; w'Aw$$

subject to

$$\sum_{m=1}^{M} w(m) = 1, \qquad 0 \le w(m) \le 1$$

where

$$A = \sum_t e_{t+1} e_{t+1}'$$

is the M × M matrix of forecast error empirical variances/covariances

Quadratic Programming (QP)

The weights lie on the unit simplex

The constrained GR weights minimize a quadratic over the unit simplex

QP algorithms easily solve this problem
  - Gauss (qprog)
  - Matlab (quadprog)
  - R (quadprog)

Corner solutions are typical
  - Many forecasts will receive zero weight

Bates-Granger (1969)

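Assume $A = \sum_t e_{t+1} e_{t+1}'$ is diagonal.

Then the regression with the coefficients constrained to sum to one

$$0 = w'e_{t+1} + \varepsilon_{t+1}$$

has solution

$$w(m) = \frac{\sigma^{-2}(m)}{\sum_{j=1}^{M} \sigma^{-2}(j)}$$

where $\sigma^2(m)$ is the OOS forecast error variance of forecast $m$ (the $m$-th diagonal element of $A$, up to scale).

These are the Bates-Granger weights.

In many cases, they are close to equality, since OOS forecast variances can be quite similar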
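The inverse-variance formula above can be sketched in a few lines. The function name `bates_granger_weights` and the error series are hypothetical; each model's variance is taken here to be its mean squared OOS forecast error.

```python
def bates_granger_weights(errors_by_model):
    """Inverse-MSFE weights: w(m) proportional to 1 / sigma^2(m).

    errors_by_model holds one list of out-of-sample forecast errors
    per model.
    """
    inverse_msfe = []
    for errors in errors_by_model:
        msfe = sum(e * e for e in errors) / len(errors)
        inverse_msfe.append(1.0 / msfe)
    total = sum(inverse_msfe)
    return [v / total for v in inverse_msfe]


# Hypothetical OOS errors: the second model's MSFE is a quarter of the first's.
oos_errors = [
    [1.0, -1.0, 2.0, -2.0],   # MSFE = 2.5
    [0.5, -0.5, 1.0, -1.0],   # MSFE = 0.625
]
bg_weights = bates_granger_weights(oos_errors)
```

Because only the diagonal of the error covariance is used, correlation between the forecast errors plays no role in these weights.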

Bayesian Model Averaging (BMA)

Put priors on individual models, and priors on the probability that model m is the true model

Compute posterior probabilities w(m) that m is the true model

Forecast combination using w(m)

Advantages
  - Conceptually simple
  - No theoretical analysis required
  - Applies in broad contexts

Disadvantages
  - Not designed to minimize forecast risk
  - Similar to BIC: asymptotically picks “true” finite models
  - Does not distinguish between 1-step and multi-step forecast horizons

BMA Approximation

BIC weights

$$w(m) \propto \exp\left(-BIC(m)\right)$$

Simple approximation to full BMA method

Smoothed version of BIC selection

Works better than BIC selection in simulations

AIC Weights

Smoothed AIC

$$w(m) \propto \exp\left(-AIC(m)\right)$$

Proposed by Buckland, Burnham and Augustin (1997)

Not theoretically motivated, but works better than AIC selection in simulations
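A small sketch of how such smoothed weights might be computed from a list of criterion values. The helper name `smoothed_ic_weights` and the BIC values are made up for illustration. The code follows the proportionality as written above, so it assumes the criterion is already on the scale that formula expects; if the criterion is defined on the common $-2\log L$ scale, half of it would be passed instead.

```python
import math


def smoothed_ic_weights(criteria):
    """Weights proportional to exp(-criterion), normalized to sum to one.

    The minimum is subtracted before exponentiating; this leaves the
    normalized weights unchanged but avoids numerical underflow.
    """
    best = min(criteria)
    raw = [math.exp(-(c - best)) for c in criteria]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical criterion values for three candidate models.
bic_values = [100.0, 101.0, 104.0]
bic_weights = smoothed_ic_weights(bic_values)
```

Selection would put all weight on the first model; the smoothed version still favors it but leaves some weight on close competitors.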

Comments

Combination methods typically work better (lower MSFE) than comparable selection methods

BIC and BMA not optimal for MSFE

Granger-Ramanathan is as sensitive as PLS to the choice of P

Bates-Granger and weighted AIC have no theoretical grounding

Forecast Combination

$$\hat y_{n+1}(w) = \sum_{m=1}^{M} w(m)\,\hat y_{n+1}(m) = \sum_{m=1}^{M} w(m)\,x_n(m)'\hat\beta(m) = x_n'\hat\beta(w)$$

where

$$\hat\beta(w) = \sum_{m=1}^{M} w(m)\,\hat\beta(m)$$

In linear models, the combination forecast is the same as the forecast based on the weighted average of the parameter estimates across the different models

Computationally, it is easiest to calculate the M individual forecasts $\hat y_{n+1}(m)$, then take the weighted average to obtain $\hat y_{n+1}(w)$

Combination Residuals

$$\hat e_{t+1}(w) = y_{t+1} - x_t'\hat\beta(w) = \sum_{m=1}^{M} w(m)\left(y_{t+1} - x_t'\hat\beta(m)\right) = \sum_{m=1}^{M} w(m)\,\hat e_{t+1}(m)$$

In linear models, the residual from the combination model is the same as the weighted average of the model residuals.

Mallows Averaging Criterion

$$C_n(w) = \hat\sigma^2(w) + \frac{2\tilde\sigma^2}{n} \sum_{m=1}^{M} w(m)\,k(m)$$

with $\tilde\sigma^2$ an estimate of the error variance from a “large” model

$C_n(w)$ is an estimate of the MSFE (assuming homoskedasticity)

Hansen (2007, Econometrica) Mallows Model Averaging (MMA)

Hansen (Journal of Econometrics, 2008) Forecast Model Averaging (FMA)

Combination weights found by constrained minimization

$$\hat w = \arg\min_w C_n(w)$$

subject to

$$\sum_{m=1}^{M} w(m) = 1, \qquad 0 \le w(m) \le 1$$

Solution by Quadratic Programming (QP)

Theory of Optimal Weights

Hansen (2007, Econometrica)

Mallows weight selection is asymptotically optimal under homoskedasticity

In large samples, equivalent to using MSFE-minimizing weights

Comparison of Granger-Ramanathan and FMA

Both are solved by Quadratic Programming (QP)

Both typically yield corner solutions: many forecasts will receive zero weight

GR uses empirical (OOS) forecast errors, FMA uses sample residuals

GR uses no penalty, FMA uses “average # of parameters” penalty

FMA is an estimate of MSFE for homoskedastic one-step forecasts, GR has no optimality

Cross-Validation

Leave-one-out estimator

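$$\tilde\beta_{-t}(w) = \sum_{m=1}^{M} w(m)\,\tilde\beta_{-t}(m)$$

Leave-one-out prediction residual

$$\tilde e_{t+1}(w) = y_{t+1} - \sum_{m=1}^{M} w(m)\,\tilde\beta_{-t}(m)'x_t(m) = \sum_{m=1}^{M} w(m)\,\tilde e_{t+1}(m)$$

Cross-validation (CV) criterion for regression combination/averaging

$$CV_n(w) = \frac{1}{n} \sum_{t=0}^{n-1} \tilde e_{t+1}(w)^2$$

$CV_n(w)$ is an estimate of $MSFE_n(w)$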

Cross-validation Weights

Combination weights found by constrained minimization of CVn(w)

$$\min_w \; CV_n(w) = w'Sw$$

subject to

$$\sum_{m=1}^{M} w(m) = 1, \qquad 0 \le w(m) \le 1$$

Cross-validation for combination forecasts (theory)

Theorem: $E\,CV_n(w) \simeq C_n(w)$

For heteroskedastic forecasts, CV is a valid estimate of the one-step MSFE from a combination forecast

Hansen and Racine (Journal of Econometrics, 2012) show that the CV weights are asymptotically optimal for cross-section data under heteroskedasticity

Summary: Forecast Combination Methods

Granger-Ramanathan (GR), forecast model averaging (FMA) and cross-validation (CV) all pick weight vectors by quadratic minimization

GR only needs actual forecasts, the method can be unknown or a black box

CV can be computed for a wide variety of estimation methods
  - Optimality theory for linear estimation

FMA limited to homoskedastic one-step-ahead models

Smoothed AIC (SAIC) and BMA have no forecast optimality, and are designed for homoskedastic one-step-ahead forecasts.

Example: AR models for GDP Growth

Fit AR(1) and AR(2) only

Leave-one-out residuals $\tilde e_{1t}$ and $\tilde e_{2t}$

Covariance matrix

$$S = \begin{bmatrix} 10.72 & 10.44 \\ 10.44 & 10.52 \end{bmatrix}$$

The best-fitting single model is AR(2)

The best combination is w = (.22, .78)′

CV = 10.50
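With only two models the quadratic program has a closed form, which makes it easy to check the numbers in this example. The sketch below uses the covariance entries reported above; the function names are illustrative, and the formula assumes the two residual series are not identical (so the denominator is positive).

```python
def two_model_cv_weights(s11, s12, s22):
    """Minimize w'Sw over w1 + w2 = 1, 0 <= w1 <= 1 for a 2x2 matrix S.

    Setting the derivative with respect to w1 to zero gives
    w1 = (s22 - s12) / (s11 + s22 - 2*s12); clipping to [0, 1]
    enforces the simplex constraint because the objective is convex.
    """
    w1 = (s22 - s12) / (s11 + s22 - 2.0 * s12)
    w1 = min(1.0, max(0.0, w1))
    return w1, 1.0 - w1


def quadratic_form(w1, w2, s11, s12, s22):
    """Value of w'Sw for a symmetric 2x2 matrix S."""
    return w1 * w1 * s11 + 2.0 * w1 * w2 * s12 + w2 * w2 * s22


# Entries of S from the AR(1)/AR(2) example.
s11, s12, s22 = 10.72, 10.44, 10.52
w_ar1, w_ar2 = two_model_cv_weights(s11, s12, s22)
cv_combined = quadratic_form(w_ar1, w_ar2, s11, s12, s22)
```

The computed weights round to (.22, .78) and the criterion to 10.50, slightly below the 10.52 of AR(2) alone.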

Example: AR models for GDP Growth

Fit AR(0) through AR(12)

AR(0) is constant only

Models with positive weight are AR(0), AR(1), AR(2)

w = (.06, .16, .78)′

$$S = \begin{bmatrix} 12.0 & 10.6 & 10.4 \\ 10.6 & 10.7 & 10.4 \\ 10.4 & 10.5 & 10.5 \end{bmatrix}$$

CV = 10.50 (essentially unchanged)

Example: Leading Indicator Forecasts

Fit AR(1), AR(2) with leading indicators

Models with positive weight

Model                                          w
AR(1), Spread, Housing                         0.13
AR(1), Spread, High-Yield, Housing             0.16
AR(1), Spread, High-Yield, Housing, Building   0.52
AR(2)                                          0.18
AR(2), Spread                                  0.01

CV = 9.81

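Summary: Forecast Combination by CV

M forecasts $f_{n+1}(m)$ from n observations

For each estimate m
  - Define the leave-one-out prediction error
    $$\tilde e_{t+1}(m) = y_{t+1} - \tilde\beta_{(-t)}(m)'x_t(m) = \frac{\hat e_{t+1}(m)}{1 - h_{tt}(m)}$$
    where $\hat e_{t+1}(m)$ is the least-squares residual and $h_{tt}(m)$ is the leverage value
  - Store the n × 1 vector $\tilde e(m)$

Construct the M × M matrix

$$S = \frac{1}{n}\tilde e'\tilde e$$

Find the M × 1 weight vector w which minimizes w′Sw subject to the weights being nonnegative and summing to one
  - Use quadratic programming (quadprog) to find solution

The combination forecast is $f_{n+1} = \sum_{m=1}^{M} w(m) f_{n+1}(m)$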
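The leverage shortcut for the leave-one-out error can be checked numerically. The sketch below does this for the simplest possible case, a single regressor with no intercept, where the leverage is $h_{tt} = x_t^2 / \sum_j x_j^2$. The function names and data are illustrative assumptions; a real application would use a matrix routine for multi-regressor models.

```python
def ols_slope(x, y_next):
    """Least-squares slope for a regression of y_{t+1} on x_t (no intercept)."""
    return sum(a * b for a, b in zip(x, y_next)) / sum(a * a for a in x)


def loo_errors_shortcut(x, y_next):
    """Leave-one-out errors via residual / (1 - leverage)."""
    beta = ols_slope(x, y_next)
    sxx = sum(a * a for a in x)
    return [(yt - beta * xt) / (1.0 - xt * xt / sxx)
            for xt, yt in zip(x, y_next)]


def loo_errors_brute_force(x, y_next):
    """Leave-one-out errors by re-estimating without observation t."""
    errors = []
    for t in range(len(x)):
        beta_minus_t = ols_slope(x[:t] + x[t + 1:], y_next[:t] + y_next[t + 1:])
        errors.append(y_next[t] - beta_minus_t * x[t])
    return errors


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_next = [1.2, 1.9, 3.2, 3.8, 5.1]
shortcut = loo_errors_shortcut(x, y_next)
brute = loo_errors_brute_force(x, y_next)
max_gap = max(abs(a - b) for a, b in zip(shortcut, brute))
```

The two routes agree to rounding error, which is why the full set of leave-one-out errors costs only one regression per model.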

Forecast Combination Criticisms

There has been considerable skepticism about formal forecast combination methods in the forecast literature

Many researchers have found that equal weighting ($w(m) = 1/M$) works as well as formal methods

However, the formal methods which have been investigated are
  - Bates-Granger simple weights
      - Not expected by theory to work well
  - Unconstrained Granger-Ramanathan
      - Without imposing [0, 1] weights, works terribly!

Furthermore, most investigations examine pseudo out-of-sample performance
  - Identical to comparing models by PLS criterion
  - This is NOT an investigation of performance
  - Just a ranking by PLS

Another Example - 10-Year Bond Rate

Estimated AR(1) through AR(24) models

CV Selection picked AR(2)

CV weight selection: models with positive weight
  - AR(0): w = 0.04
  - AR(1): w = 0.04
  - AR(2): w = 0.47
  - AR(6): w = 0.23
  - AR(22): w = 0.22

Minimizing CV = 0.0761 (slightly lower than 0.0768 from AR(2))

Point forecast 1.96 (same as from AR(2))

Forecast horizon: h

We say the forecast is “multi-step” if h > 1

Forecasting $y_{n+h}$ given $I_n$
  - e.g., forecasting GDP growth for 2012:3, 2012:4, 2013:1, 2013:2

The forecast distribution is $y_{n+h} \mid I_n \sim F_h(y_{n+h} \mid I_n)$

Point Forecast

$f_{n+h|n}$ minimizes expected squared loss

$$f_{n+h|n} = \arg\min_f E\left((y_{n+h} - f)^2 \mid I_n\right) = E(y_{n+h} \mid I_n)$$

Optimal point forecasts are h-step conditional means

Relationship Between Forecast Horizons

Take an AR(1) model

$$y_{t+1} = \alpha y_t + u_{t+1}$$

Iterate

$$y_{t+1} = \alpha(\alpha y_{t-1} + u_t) + u_{t+1} = \alpha^2 y_{t-1} + \alpha u_t + u_{t+1}$$

or

$$y_{t+2} = \alpha^2 y_t + e_{t+2}, \qquad e_{t+2} = \alpha u_{t+1} + u_{t+2}$$

Repeat h times

$$y_{t+h} = \alpha^h y_t + e_{t+h}, \qquad e_{t+h} = u_{t+h} + \alpha u_{t+h-1} + \alpha^2 u_{t+h-2} + \cdots + \alpha^{h-1} u_{t+1}$$

h-step forecast

$$E(y_{n+h} \mid I_n) = \alpha^h y_n$$

h-step point forecast is linear in $y_n$

h-step forecast error $e_{n+h}$ is a MA(h − 1)

AR(2) Model

1-step AR(2) model

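$$y_{t+1} = \alpha_0 + \alpha_1 y_t + \alpha_2 y_{t-1} + u_{t+1}$$

2-steps ahead

$$y_{t+2} = \alpha_0 + \alpha_1 y_{t+1} + \alpha_2 y_t + u_{t+2}$$

Taking conditional expectations

$$\begin{aligned}
E(y_{t+2} \mid I_t) &= \alpha_0 + \alpha_1 E(y_{t+1} \mid I_t) + \alpha_2 E(y_t \mid I_t) + E(u_{t+2} \mid I_t) \\
&= \alpha_0 + \alpha_1(\alpha_0 + \alpha_1 y_t + \alpha_2 y_{t-1}) + \alpha_2 y_t \\
&= \alpha_0 + \alpha_1\alpha_0 + (\alpha_1^2 + \alpha_2)\, y_t + \alpha_1\alpha_2\, y_{t-1}
\end{aligned}$$

which is linear in $(y_t, y_{t-1})$

In general, a 1-step linear model implies an h-step approximate linear model in the same variables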
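The two-step algebra above can be verified numerically: iterating the one-step AR(2) equation twice gives the same number as plugging into the collapsed two-step coefficients. The parameter values below are arbitrary illustrations and the function names are hypothetical.

```python
def two_step_by_iteration(a0, a1, a2, y_t, y_tm1):
    """Apply the one-step AR(2) conditional mean twice."""
    one_step = a0 + a1 * y_t + a2 * y_tm1
    return a0 + a1 * one_step + a2 * y_t


def two_step_collapsed(a0, a1, a2, y_t, y_tm1):
    """Two-step conditional mean written as a linear function of (y_t, y_{t-1})."""
    intercept = a0 + a1 * a0
    coef_y_t = a1 ** 2 + a2
    coef_y_tm1 = a1 * a2
    return intercept + coef_y_t * y_t + coef_y_tm1 * y_tm1


# Arbitrary illustrative parameter values and starting points.
params = (1.6, 0.30, 0.16)
iterated = two_step_by_iteration(*params, y_t=1.8, y_tm1=2.9)
collapsed = two_step_collapsed(*params, y_t=1.8, y_tm1=2.9)
```

Both routes use the same regressors $(y_t, y_{t-1})$; only the coefficients change with the horizon.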

AR(k) h-step forecasts

If

$$y_{t+1} = \alpha_0 + \alpha_1 y_t + \alpha_2 y_{t-1} + \cdots + \alpha_k y_{t-k+1} + u_{t+1}$$

then

$$y_{t+h} = \beta_0 + \beta_1 y_t + \beta_2 y_{t-1} + \cdots + \beta_k y_{t-k+1} + e_{t+h}$$

where $e_{t+h}$ is a MA(h − 1)

Leading Indicator Models

If

$$y_{t+1} = x_t'\beta + u_{t+1}$$

then

$$E(y_{t+h} \mid I_t) = E(x_{t+h-1} \mid I_t)'\beta$$

If $E(x_{t+h-1} \mid I_t)$ is itself (approximately) a linear function of $x_t$, then

$$E(y_{t+h} \mid I_t) = x_t'\gamma$$

$$y_{t+h} = x_t'\gamma + e_{t+h}$$

Common Structure: the h-step conditional mean is similar to the 1-step structure, but the error is a MA.

Forecast Variable

We should think carefully about the variable we want to report in our forecast

The choice will depend on the context

What do we want to forecast?
  - Future level: $y_{n+h}$
      - interest rates, unemployment rates
  - Future differences: $\Delta y_{t+h}$
  - Cumulative change: $y_{t+h} - y_t$
      - Cumulative GDP growth

Forecast Transformation

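$f_{n+h|n} = E(y_{n+h} \mid I_n)$ = expected future level
  - Level specification
    $$y_{t+h} = x_t'\beta + e_{t+h}, \qquad f_{n+h|n} = x_n'\beta$$
  - Difference specification
    $$\Delta y_{t+h} = x_t'\beta_h + e_{t+h}, \qquad f_{n+h|n} = y_n + x_n'\beta_1 + \cdots + x_n'\beta_h$$
  - Multi-step difference specification
    $$y_{t+h} - y_t = x_t'\beta + e_{t+h}, \qquad f_{n+h|n} = y_n + x_n'\beta$$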

Direct and Iterated

There are two methods of multistep (h > 1) forecasts

Direct Forecast
  - Model and estimate $E(y_{n+h} \mid I_n)$ directly

Iterated Forecast
  - Model and estimate one-step $E(y_{n+1} \mid I_n)$
  - Iterate forward h steps
  - Requires full model for all variables

Both have advantages and disadvantages
  - For now, we will focus on the direct method.

Direct Multi-Step Forecasting

Markov approximation
  - $E(y_{n+h} \mid I_n) = E(y_{n+h} \mid x_n, x_{n-1}, \ldots) \approx E(y_{n+h} \mid x_n, \ldots, x_{n-p})$

Linear approximation
  - $E(y_{n+h} \mid x_n, \ldots, x_{n-p}) \approx \beta'x_n$

Projection definition
  - $\beta = \left(E(x_t x_t')\right)^{-1} E(x_t y_{t+h})$

Forecast error
  - $e_{t+h} = y_{t+h} - \beta'x_t$

Multi-Step Forecast Model

$$y_{t+h} = \beta'x_t + e_{t+h}$$

$$\beta = \left(E(x_t x_t')\right)^{-1} E(x_t y_{t+h})$$

$$E(x_t e_{t+h}) = 0$$

$$\sigma^2 = E(e_{t+h}^2)$$

Least Squares Estimation

$$\hat\beta = \left(\sum_{t=0}^{n-1} x_t x_t'\right)^{-1} \left(\sum_{t=0}^{n-1} x_t y_{t+h}\right)$$

$$\hat y_{n+h|n} = \hat f_{n+h|n} = \hat\beta'x_n$$

Residuals

Least-squares residuals
  - $\hat e_{t+h} = y_{t+h} - \hat\beta'x_t$
  - Standard, but overfit

Leave-one-out residuals
  - $\tilde e_{t+h} = y_{t+h} - \tilde\beta_{-t}'x_t$
  - Does not correct for MA errors

Leave h out residuals

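$$\tilde e_{t+h} = y_{t+h} - \tilde\beta_{-t,h}'x_t$$

$$\tilde\beta_{-t,h} = \left(\sum_{|j-t| \ge h} x_j x_j'\right)^{-1} \left(\sum_{|j-t| \ge h} x_j y_{j+h}\right)$$

The summation is over all observations outside h − 1 periods of t + h, that is, all pairs $(x_j, y_{j+h})$ whose target date $j+h$ is at least h periods away from $t+h$.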
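A minimal sketch of leave-h-out residuals for a single-regressor, no-intercept direct forecast, assuming the index rule described in the sentence above (drop every pair whose target date is within h − 1 periods of the one being predicted). The function name and data are hypothetical; with h = 1 it reduces to ordinary leave-one-out.

```python
def leave_h_out_residuals(x, y_lead, h):
    """Leave-h-out residuals for y_{t+h} = beta * x_t + e_{t+h}.

    x[j] is the regressor at date j and y_lead[j] is the target y_{j+h}.
    For each t, beta is re-estimated using only pairs with |j - t| >= h.
    """
    n = len(x)
    residuals = []
    for t in range(n):
        kept = [j for j in range(n) if abs(j - t) >= h]
        sxx = sum(x[j] * x[j] for j in kept)
        sxy = sum(x[j] * y_lead[j] for j in kept)
        beta_minus = sxy / sxx
        residuals.append(y_lead[t] - beta_minus * x[t])
    return residuals


# Hypothetical regressor and h-step-ahead target.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_lead = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

res_h1 = leave_h_out_residuals(x, y_lead, h=1)   # ordinary leave-one-out
res_h3 = leave_h_out_residuals(x, y_lead, h=3)   # drops a window of 5 pairs

# Sanity check on exactly linear data: every residual should be zero.
res_exact = leave_h_out_residuals(x, [2.0 * v for v in x], h=3)
```

Dropping the neighbouring observations matters because the MA(h − 1) error makes adjacent targets overlap, so a plain leave-one-out fit would still have seen part of the error it is asked to predict.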

Example: GDP Forecast

$y_t = 400 \log(GDP_t)$

Forecast Variable: GDP growth over next h quarters, at annual rate

$$\frac{y_{t+h} - y_t}{h} = \beta_0 + \beta_1 \Delta y_t + \beta_2 \Delta y_{t-1} + \beta_3 Spread_t + \beta_4 HighYield_t + \beta_5 HS_t + e_{t+h}$$

$HS_t$ = Housing Starts$_t$

              h = 1          h = 2          h = 3          h = 4
β0           −0.33 (1.0)    −0.38 (1.3)    −0.01 (1.6)     0.47 (1.8)
∆yt           0.16 (.10)     0.18 (.09)     0.13 (.08)     0.13 (.09)
∆yt−1         0.09 (.10)     0.04 (.05)     0.05 (.07)     0.02 (.06)
Spreadt       0.61 (.23)     0.65 (.19)     0.65 (.22)     0.65 (.25)
HighYieldt   −1.10 (.75)    −0.68 (.70)    −0.48 (.90)    −0.41 (1.01)
HSt           1.86 (.65)     1.64 (.70)     1.31 (.80)     1.01 (.94)

Example: GDP Forecast

Cumulative Annualized Growth

            Forecast   Actual
2012:2      1.3        1.2
2012:3      1.6        2.0
2012:4      2.9        1.4
2013:1      2.2        1.3
2013:2      2.4        1.5
2013:3      2.7
2013:4      2.9
2014:1      3.2

Selection and Combination for h step forecasts

AIC routinely used for model selection

PLS (OOS MSFE) routinely used for model evaluation

Neither well justified

Not well studied problem

I recommend “leave h out” cross-validation.

Topic of afternoon seminar

Minimize sum of squared “leave h out” residuals, separately for each forecast horizon.

Example: GDP Forecast Weights by Horizon

               h = 1   h = 2   h = 3   h = 4   h = 5   h = 6   h = 7
AR(1)                  .15     .19     .28     .18     .16     .11
AR(2)          .30
AR(1)+HS       .66     .70     .22
AR(1)+HS+BP            .14     .58     .72     .82     .84     .89
AR(2)+HS       .04

yn+h|n         1.7     2.0     1.9     2.0     2.1     2.3     2.6

h-step Variance Forecasting

Not well developed using direct methods

h-step Interval Forecasts

Similar to 1-step interval forecasts
  - But calculated from h-step residuals

Use constant variance specification

Let $q^e(\alpha)$ and $q^e(1-\alpha)$ be the α’th and (1 − α)’th quantiles of the residuals $e_{t+h}$

Forecast Interval:

$$\left[\mu_n + q^e(\alpha),\; \mu_n + q^e(1-\alpha)\right]$$
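The interval above amounts to shifting two residual quantiles by the point forecast. A minimal sketch follows, with a simple linear-interpolation quantile rule (one of several common conventions, chosen here as an assumption) and made-up residuals.

```python
def empirical_quantile(values, p):
    """Empirical p-quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] * (1.0 - fraction) + ordered[upper] * fraction


def interval_forecast(point_forecast, residuals, alpha):
    """Interval [mu + q(alpha), mu + q(1 - alpha)] from h-step residuals."""
    return (point_forecast + empirical_quantile(residuals, alpha),
            point_forecast + empirical_quantile(residuals, 1.0 - alpha))


# Hypothetical h-step residuals and point forecast.
h_step_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
low_80, high_80 = interval_forecast(2.0, h_step_residuals, alpha=0.10)
```

Here α = 0.10 gives an 80% interval; because the quantiles come from the residuals themselves, the interval need not be symmetric around the point forecast.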

Fan Charts

Plots of a set of interval forecasts for multiple horizons
  - Pick a set of horizons, h = 1, ..., H
  - Pick a set of quantiles, e.g. α = .10, .25, .75, .90
  - Recall the quantiles of the conditional distribution are $q_n(\alpha, h) = \mu_n(h) + \sigma_n(h)\, q^\varepsilon(\alpha, h)$
  - Plot $q_n(.1, h)$, $q_n(.25, h)$, $\mu_n(h)$, $q_n(.75, h)$, $q_n(.9, h)$ against h

Graphs easier to interpret than tables

Illustration

I’ve been making monthly forecasts of the Wisconsin unemployment rate

Forecast horizon h = 1, ..., 12 (one year)

Quantiles: α = .1, .25, .75, .90

This corresponds to plotting 50% and 80% forecast intervals

50% intervals show “likely” region (equal odds)

Comments

Showing the recent history gives perspective

Some published fan charts use colors to indicate regions, but do not label the colors

Labels important to infer probabilities

I like clean plots, not cluttered

Illustration: GDP Growth

[Figure: GDP Average Growth Fan Chart; horizontal axis 2011.0 to 2014.0]

It doesn’t “fan” because we are plotting average growth

[Figure: Fan Chart with Actuals; horizontal axis 2011.0 to 2014.0]

Iterated Forecasts

Estimate one-step forecast

Iterate to obtain multi-step forecasts

Only works in complete systems
  - Autoregressions
  - Vector autoregressions

Vector Autoregressive Models

$y_t$ is a p vector

$x_t$ are other variables (including lags)

Ideal point forecast $E(y_{n+1} \mid I_n)$

Linear approximation

$$E(y_{n+1} \mid I_n) \simeq A_1 y_n + A_2 y_{n-1} + \cdots + A_k y_{n-k+1} + B x_n$$

Vector Autoregression (VAR)

$$y_{t+1} = A_1 y_t + A_2 y_{t-1} + \cdots + A_k y_{t-k+1} + B x_t + e_{t+1}$$

Estimation: Least squares

$$y_{t+1} = \hat A_1 y_t + \hat A_2 y_{t-1} + \cdots + \hat A_k y_{t-k+1} + \hat B x_t + \hat e_{t+1}$$

One-Step-Ahead Point forecast

$$\hat y_{n+1} = \hat A_1 y_n + \hat A_2 y_{n-1} + \cdots + \hat A_k y_{n-k+1} + \hat B x_n$$

Vector Autoregressive versus Univariate Models

Let $x_t = (y_t, y_{t-1}, \ldots, x_t)$

Then a VAR is a set of p regression models

$$y_{1,t+1} = \beta_1'x_t + e_{1,t+1}$$

$$\vdots$$

$$y_{p,t+1} = \beta_p'x_t + e_{p,t+1}$$

All variables $x_t$ enter symmetrically in each equation

Sims (1980) argued that there is no a priori reason to include or exclude an individual variable from an individual equation.

Model Selection

Do not view selection as identification of “truth”

Rather, inclusion/exclusion is to improve finite sample performance
  - Minimize MSFE

Use selection methods, equation-by-equation

Example: VAR with 2 variables

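$$y_{1,t+1} = \beta_{11} y_{1t} + \beta_{12} y_{1,t-1} + \beta_{13} y_{2t} + e_{1,t+1}$$

$$y_{2,t+1} = \beta_{21} y_{1t} + \beta_{22} y_{2t} + \beta_{23} y_{2,t-1} + e_{2,t+1}$$

Selection picks $y_{1t}, y_{1,t-1}, y_{2t}$ for the equation for $y_{1,t+1}$

Selection picks $y_{1t}, y_{2t}, y_{2,t-1}$ for the equation for $y_{2,t+1}$

The two equations have different variables

Same as system

$$y_{t+1} = A_1 y_t + A_2 y_{t-1} + e_{t+1}$$

$$A_1 = \begin{bmatrix} \beta_{11} & \beta_{13} \\ \beta_{21} & \beta_{22} \end{bmatrix}, \qquad A_2 = \begin{bmatrix} \beta_{12} & 0 \\ 0 & \beta_{23} \end{bmatrix}$$

The VAR system notation is still quite useful for many purposes (including multi-step forecasting)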

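Iterative Forecast Relationships in Linear VAR

vector $y_t$

$$y_{t+1} = A_0 + A_1 y_t + A_2 y_{t-1} + \cdots + A_k y_{t-k+1} + u_{t+1}$$

1-step conditional mean

$$\begin{aligned}
E(y_{t+1} \mid I_t) &= A_0 + A_1 E(y_t \mid I_t) + \cdots + A_k E(y_{t-k+1} \mid I_t) \\
&= A_0 + A_1 y_t + A_2 y_{t-1} + \cdots + A_k y_{t-k+1}
\end{aligned}$$

2-step conditional mean

$$\begin{aligned}
E(y_{t+1} \mid I_{t-1}) &= E\left(E(y_{t+1} \mid I_t) \mid I_{t-1}\right) \\
&= A_0 + A_1 E(y_t \mid I_{t-1}) + A_2 y_{t-1} + \cdots + A_k y_{t-k+1}
\end{aligned}$$

h-step conditional mean

$$\begin{aligned}
E(y_{t+1} \mid I_{t-h+1}) &= E\left(E(y_{t+1} \mid I_t) \mid I_{t-h+1}\right) \\
&= A_0 + A_1 E(y_t \mid I_{t-h+1}) + \cdots + A_k E(y_{t-k+1} \mid I_{t-h+1})
\end{aligned}$$

Linear in lower-order (up to h − 1 step) conditional means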

Iterative Least Squares Forecasts

Estimate 1-step VAR(k) by least-squares

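$$y_{t+1} = \hat A_0 + \hat A_1 y_t + \hat A_2 y_{t-1} + \cdots + \hat A_k y_{t-k+1} + \hat u_{t+1}$$

Gives 1-step point forecast

$$\hat y_{n+1|n} = \hat A_0 + \hat A_1 y_n + \hat A_2 y_{n-1} + \cdots + \hat A_k y_{n-k+1}$$

2-step iterative forecast

$$\hat y_{n+2|n} = \hat A_0 + \hat A_1 \hat y_{n+1|n} + \hat A_2 y_n + \cdots + \hat A_k y_{n-k+2}$$

h-step iterative forecast

$$\hat y_{n+h|n} = \hat A_0 + \hat A_1 \hat y_{n+h-1|n} + \hat A_2 \hat y_{n+h-2|n} + \cdots + \hat A_k \hat y_{n+h-k|n}$$

where $\hat y_{n+j|n} = y_{n+j}$ for $j \le 0$

This is (numerically) different from the direct LS forecast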

Illustration 1: GDP Growth

AR(2) Model

$$y_{t+1} = 1.6 + 0.30\,y_t + 0.16\,y_{t-1}$$

$y_n = 1.8$, $y_{n-1} = 2.9$

yn+1 = 1.6 + 0.30 × 1.8 + 0.16 × 2.9 = 2.6
yn+2 = 1.6 + 0.30 × 2.6 + 0.16 × 1.8 = 2.7
yn+3 = 1.6 + 0.30 × 2.7 + 0.16 × 2.6 = 2.9
yn+4 = 1.6 + 0.30 × 2.9 + 0.16 × 2.7 = 3.0

Point Forecasts

2012:2    2.65
2012:3    2.72
2012:4    2.87
2013:1    2.93
2013:2    2.97
2013:3    2.99
2013:4    3.00
2014:1    3.01
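The recursion behind this illustration can be sketched as follows, using the rounded coefficients printed above. Because those coefficients are rounded to two digits, the output is close to, but not identical with, the tabulated point forecasts (which presumably come from the unrounded estimates). The function name is an illustrative assumption.

```python
def iterate_ar2(a0, a1, a2, y_n, y_nm1, steps):
    """Iterated point forecasts from an AR(2): each forecast feeds the next."""
    path = []
    latest, previous = y_n, y_nm1
    for _ in range(steps):
        forecast = a0 + a1 * latest + a2 * previous
        path.append(forecast)
        previous, latest = latest, forecast
    return path


# Rounded coefficients and starting values from the illustration.
ar2_path = iterate_ar2(1.6, 0.30, 0.16, y_n=1.8, y_nm1=2.9, steps=8)

# Far-horizon forecasts settle at the implied long-run mean a0 / (1 - a1 - a2).
long_run_mean = 1.6 / (1.0 - 0.30 - 0.16)
far_path = iterate_ar2(1.6, 0.30, 0.16, y_n=1.8, y_nm1=2.9, steps=200)
```

The path rises monotonically from its first value toward the long-run mean, the same qualitative pattern as the table.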

Illustration 2: GDP Growth+Housing Starts

VAR(2) Model

$y_{1t}$ = GDP Growth, $y_{2t}$ = Housing Starts

$x_t$ = (GDP Growth$_t$, Housing Starts$_t$, GDP Growth$_{t-1}$, Housing Starts$_{t-1}$)

$$y_{t+1} = A_0 + A_1 y_t + A_2 y_{t-1} + u_{t+1}$$

$$y_{1,t+1} = 0.43 + 0.15\,y_{1t} + 11.2\,y_{2t} + 0.18\,y_{1,t-1} - 10.1\,y_{2,t-1}$$

$$y_{2,t+1} = 0.07 - 0.001\,y_{1t} + 1.2\,y_{2t} - 0.001\,y_{1,t-1} - 0.26\,y_{2,t-1}$$

Illustration 2: GDP Growth+Housing Starts

$y_{1n} = 1.8$, $y_{2n} = 0.71$, $y_{1,n-1} = 2.9$, $y_{2,n-1} = 0.68$

y1,n+1 = 0.43 + 0.15 × 1.8 + 11.2 × 0.71 + 0.18 × 2.9 − 10.1 × 0.68 = 2.3
y2,n+1 = 0.07 − 0.001 × 1.8 + 1.2 × 0.71 − 0.001 × 2.9 − 0.26 × 0.68 = 0.76

y1,n+2 = 0.43 + 0.15 × 2.3 + 11.2 × 0.76 + 0.18 × 1.8 − 10.1 × 0.71 = 2.4
y2,n+2 = 0.07 − 0.001 × 2.3 + 1.2 × 0.76 − 0.001 × 1.8 − 0.26 × 0.71 = 0.80

Point Forecasts

          GDP    Housing
2012:2    2.36   0.76
2012:3    2.38   0.80
2012:4    2.53   0.84
2013:1    2.58   0.88
2013:2    2.64   0.92
2013:3    2.66   0.95
2013:4    2.69   0.98
2014:1    2.71   1.01
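A sketch of the same iteration for the two-variable VAR(2), written with plain lists so that it is self-contained. The coefficient matrices are the rounded values printed above, so the results differ slightly from the slide's figures (for example, the first housing forecast computes to about 0.74 rather than 0.76). The function name is hypothetical.

```python
def iterate_var2(a0, A1, A2, y_n, y_nm1, steps):
    """Iterated forecasts from y_{t+1} = a0 + A1 y_t + A2 y_{t-1}."""
    size = len(a0)
    current, lagged = list(y_n), list(y_nm1)
    path = []
    for _ in range(steps):
        forecast = [
            a0[i]
            + sum(A1[i][j] * current[j] for j in range(size))
            + sum(A2[i][j] * lagged[j] for j in range(size))
            for i in range(size)
        ]
        path.append(forecast)
        lagged, current = current, forecast
    return path


# Rounded coefficients from the illustration (row 1: GDP growth, row 2: housing).
a0 = [0.43, 0.07]
A1 = [[0.15, 11.2], [-0.001, 1.2]]
A2 = [[0.18, -10.1], [-0.001, -0.26]]

var_path = iterate_var2(a0, A1, A2, y_n=[1.8, 0.71], y_nm1=[2.9, 0.68], steps=2)
```

Note that both variables must be forecast at every step, even if only GDP growth is of interest, because next period's GDP equation needs next period's housing forecast.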

Model Selection

It is typical to select the 1-step model and use this to make all h-step forecasts

However, the theory to support this is incomplete

(It is not obvious that the best 1-step estimate produces the best h-step estimate)

For now, I recommend selecting based on the 1-step estimates

Model Combination

There is no theory about how to apply model combination to h-step iterated forecasts

Can select model weights based on 1-step, and use these for all forecast horizons

Variance, Distribution, Interval Forecast

While point forecasts can be simply iterated, the other features cannot

Multi-step forecast distributions are convolutions of the 1-step forecast distribution.
  - Explicit calculation computationally costly beyond 2 steps

Instead, simple simulation methods work well

The method is to use the estimated conditional distribution to simulate each step, and iterate forward. Then repeat the simulation many times.

Multi-Step Forecast Simulation

Let $\mu(x)$ and $\sigma(x)$ denote the models for the conditional one-step mean and standard deviation as a function of the conditioning variables x

Let $\hat\mu(x)$ and $\hat\sigma(x)$ denote the estimates of these functions, and let $\{\hat\varepsilon_1, \ldots, \hat\varepsilon_n\}$ be the normalized residuals

$x_n = (y_n, y_{n-1}, \ldots, y_{n-p})$ is known. Set $x_n^* = x_n$

To create one h-step realization:
  - Draw $\varepsilon^*_{n+1}$ iid from normalized residuals $\{\hat\varepsilon_1, \ldots, \hat\varepsilon_n\}$
  - Set $y^*_{n+1} = \hat\mu(x^*_n) + \hat\sigma(x^*_n)\,\varepsilon^*_{n+1}$
  - Set $x^*_{n+1} = (y^*_{n+1}, y_n, \ldots, y_{n-p+1})$
  - Draw $\varepsilon^*_{n+2}$ iid from normalized residuals $\{\hat\varepsilon_1, \ldots, \hat\varepsilon_n\}$
  - Set $y^*_{n+2} = \hat\mu(x^*_{n+1}) + \hat\sigma(x^*_{n+1})\,\varepsilon^*_{n+2}$
  - Set $x^*_{n+2} = (y^*_{n+2}, y^*_{n+1}, \ldots, y_{n-p+2})$
  - Repeat until you obtain $y^*_{n+h}$
  - $y^*_{n+h}$ is a draw from the h step ahead distribution

Repeat this B times, and let $y^*_{n+h}(b)$, b = 1, ..., B denote the B repetitions

Multi-Step Forecast Simulation

The simulation has produced y ∗n+h(b), b = 1, ...,B

For forecast intervals, calculate the empirical quantiles of $y^*_{n+h}(b)$
  - For an 80% interval, calculate the 10% and 90% quantiles

For a fan chart
  - Calculate a set of empirical quantiles (10%, 25%, 75%, 90%)
  - For each horizon h = 1, ..., H

As the calculations are linear they are numerically quick
  - Set B large
  - For a quick application, B = 1000
  - For a paper, B = 10,000 (minimum)
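The simulation steps above can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the AR(1) mean function, the constant standard deviation, the residual values, and the names `simulate_paths` and `empirical_quantile`. The structure (draw a residual, form the next value, shift the state, repeat B times, then take quantiles by horizon) follows the description in the text.

```python
import random


def simulate_paths(mu, sigma, residuals, x_n, horizon, reps, seed=0):
    """Simulate `reps` paths of length `horizon` by resampling residuals.

    `x_n` lists the conditioning values, most recent first; `mu` and
    `sigma` map that state to the one-step mean and standard deviation.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(reps):
        state = list(x_n)
        path = []
        for _ in range(horizon):
            shock = rng.choice(residuals)          # iid draw with replacement
            y_star = mu(state) + sigma(state) * shock
            path.append(y_star)
            state = [y_star] + state[:-1]          # newest value enters, oldest drops
        paths.append(path)
    return paths


def empirical_quantile(values, p):
    """Empirical p-quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] * (1.0 - fraction) + ordered[upper] * fraction


def ar1_mean(state):
    """Hypothetical estimated one-step mean: 1.0 + 0.5 * y_t."""
    return 1.0 + 0.5 * state[0]


def constant_sd(state):
    """Hypothetical homoskedastic standard deviation."""
    return 1.0


normalized_residuals = [-1.5, -0.5, 0.0, 0.5, 1.5]
paths = simulate_paths(ar1_mean, constant_sd, normalized_residuals,
                       x_n=[2.0], horizon=4, reps=1000)

# One row of fan-chart quantiles per horizon h = 1, ..., 4.
fan = []
for h in range(4):
    draws_h = [path[h] for path in paths]
    fan.append([empirical_quantile(draws_h, q) for q in (0.10, 0.25, 0.75, 0.90)])
```

Each row of `fan` holds the 10%, 25%, 75% and 90% quantiles for one horizon, which is exactly what a fan chart plots against h. Fixing the seed makes the output reproducible between runs.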

VARs and Variance Simulation

The simulation method requires a method to simulate the conditional variances

In a VAR setting, you can:
  - Treat the errors as iid (homoskedastic)
      - Easiest
  - Treat the errors as independent GARCH errors
      - Also easy
  - Treat the errors as multivariate GARCH
      - Allows volatility to transmit across variables
      - Probably not necessary with aggregate data
