
This report is available on wellsfargo.com/economics and on Bloomberg WFRE.

June 12, 2018

Economics Group

“If you only do what you can do, you’ll never be more than what you are now.”

- Kung Fu Panda III

Executive Summary

One major challenge for decision making today is the abundance of information, rather than its scarcity, and how to use that information efficiently to design effective policies. The challenge is amplified because traditional econometric and statistical tools cannot efficiently handle thousands of variables, with millions of observations, to produce an accurate forecast. We propose a new forecasting tool that could help decision makers forecast thousands of variables accurately. Our proposed method produced a more accurate forecast than the traditional forecasting techniques in a simulated real-time analysis.

Our forecasting framework has the potential to add value to the field of risk management by forecasting variables such as delinquency rates (mortgage/auto loans and credit cards, for example). More explicitly, our approach can benefit analysts who forecast high frequency data (daily Treasury yields, equity and commodity indices, for instance) and those who want to forecast economic/financial variables accurately. Essentially, our method would improve the accuracy of multi-period ahead forecasts compared to traditional forecasting tools. It is worth mentioning that, on average, multi-period ahead forecasts are inherently riskier than one-period ahead forecasts; that is, they tend to have larger forecast errors. However, decision makers need to forecast the near-term outlook (multi-period ahead forecasting) to incorporate impending risks/rewards in policy planning. Therefore, our approach has the ability to improve decision making by producing more accurate forecasts than traditional forecasting methods.

Recently, we participated in a forecasting contest called the M4-Competition (also known as the Makridakis or M-Competitions).1 The M4-Competition asked participants to forecast 100,000 variables. Those variables were collected from all over the world and represent different sectors of an economy/country. The data set covered different time frequencies (from yearly to hourly) as well as varied forecast horizons (from a minimum of six periods ahead to a maximum of 48 periods ahead). We created a new framework, which utilized a system of equations to forecast those 100,000 variables.

Typically, an analyst would know what she/he is going to forecast, such as U.S. GDP or the S&P 500 index for the next few periods. However, the added fun of the M4-Competition was that we did not know what variables we were forecasting. The available information was very limited, as we only knew the data frequency and the number of variables associated with each broader category per

1 The M4-Competition is the continuation of the forecasting competitions started by Spyros Makridakis. The first competition was held in 1982, the second in 1993 and the third in 2000. At present, 216 forecasters from 45 countries have registered for the M4-Competition, according to the M4-Competition’s management. For more detail about the M4-Competition, please see the M4 website: https://www.m4.unic.ac.cy/

Special Commentary

John E. Silvia, Chief Economist [email protected] ● (704) 410-3275

Azhar Iqbal, Econometrician [email protected] ● (704) 410-3270

Shannon Seery, Economic Analyst [email protected] ● (704) 410-1681

Forecasting 100,000 Variables: A New Approach


Forecasting 100,000 Variables: A New Approach WELLS FARGO SECURITIES June 12, 2018 ECONOMICS GROUP


frequency, such as the number of variables related to demographics or finance. Therefore, we did not know whether we were building a model to forecast Chinese GDP or the population of Brazil. This lack of information denied us the opportunity to use predictors within our model.

In such scenarios, when an analyst has to forecast a series and is unable to find predictors, single-equation forecasting methods, such as AR(1) or ARMA(1,1), are traditionally the answer.2 However, a single-equation method is not the most efficient approach for multi-period ahead forecasting. One major issue is that most single-equation methods just extend the trend of the series (either upward or downward) over the forecast horizon. For example, suppose we use an AR(1) model to forecast credit card delinquencies for the next 24 months and the estimated coefficient of the model is positive. That model would produce a forecast that moves steadily in one direction over the forecast horizon (steadily declining delinquency rates, for instance), which may not be realistic, as delinquency rates fluctuate over time.3
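The one-directional behavior described above can be illustrated with a minimal sketch. The coefficient, long-run mean and starting delinquency rate below are hypothetical numbers chosen for illustration, not estimates from this report.

```python
def ar1_forecast(last_value, intercept, phi, horizon):
    """Iterate y[t+1] = intercept + phi * y[t] forward `horizon` steps."""
    path = []
    y = last_value
    for _ in range(horizon):
        y = intercept + phi * y
        path.append(y)
    return path


# Hypothetical inputs: a delinquency rate of 4.0% today, a long-run mean of
# 2.5% and a positive AR(1) coefficient of 0.9.
phi = 0.9
long_run_mean = 2.5
start = 4.0
intercept = long_run_mean * (1.0 - phi)

path = ar1_forecast(start, intercept, phi, horizon=24)

# Each step closes a fixed share (1 - phi) of the remaining gap to the mean,
# so the 24-month path can only move in one direction.
steps = [b - a for a, b in zip([start] + path, path)]
print("all steps negative:", all(s < 0 for s in steps))
print("first and last forecast:", round(path[0], 3), round(path[-1], 3))
```

With a starting value below the long-run mean, the same model would produce a steadily rising path instead; in neither case does the forecast fluctuate.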

Our proposed approach is based on a system of equations (or multiple equations model), which consists of more than one left-hand-side variable (dependent variables) along with more than one right-hand-side variable (independent variables). The estimation method is the Bayesian Vector Autoregressive (BVAR) method.4

We utilized the BVAR approach to generate forecasts for all 100,000 variables. However, to demonstrate our framework in this report, we have chosen one variable from each frequency category (six variables in total) and set the forecast horizon to 12 periods ahead. The BVAR model produces more accurate forecasts than the AR(1) and ARMA(1,1) models in a simulated real-time analysis.

Our proposed method would add value to the decision making process by accurately producing multi-period ahead forecasts. Furthermore, forecasting a larger number of variables (thousands or millions) would not pose a challenge for our method. An additional benefit of the proposed forecasting framework is that it estimates forecast intervals (upper and lower limits of the forecast), which can be utilized to gauge the risk to the forecast.

Necessity is the Mother of Invention: The M4-Forecast Competition

The motivation behind our newly proposed forecasting framework was the M4-Competition. Before we share greater detail about our method, we will shed light on traditional forecasting tools and the issues we faced during the M4-Competition. As mentioned earlier, the target dataset consisted of 100,000 variables and the available information associated with the data was very limited. We only truly knew the frequency of the data; for example, 23,000 variables are yearly and 48,000 are monthly. Another piece of given information was the categories: for example, 8,708 of those variables represent demographic series, while 24,534 variables belong to the financial world. However, we did not know the specific variables associated with each category. Essentially, we were tasked with forecasting unknown variables.

Typically, when an analyst builds a forecasting model, she/he at least knows what the target variable is, such as the daily closing value of the S&P 500 index for the next 30 days. If we know the target variable, then that information can help us select predictors for the model. Sometimes,

2 A single-equation approach utilizes one dependent variable (left-hand-side variable) with one or more right-hand-side variables. Typically, due to the lack of predictors, a forecaster includes lags of the dependent variable as right-hand-side variables (known as the Autoregressive, AR, approach) and sometimes also adds lags of the error term to the right-hand side of the equation (Autoregressive Moving Average, ARMA, models). For more detail about single-equation forecasting techniques, see Silvia, John, Azhar Iqbal, Sam Bullard, Sarah Watt and Kaylyn Swankoski. (2014). Economic and Business Forecasting: Analyzing and Interpreting Econometric Results. Wiley.

3 For more detail, see Silvia et al. (2014).

4 Litterman (1980, 1986) presented the BVAR approach. See Litterman, R. (1980). Techniques for Forecasting with Vector Autoregressions. University of Minnesota, Ph.D. Thesis. See also Litterman, R. (1986). Forecasting with Bayesian Vector Autoregressions: Five Years of Experience. Journal of Business and Economic Statistics, 4, 25-38.



however, due to the high frequency of particular target variables, such as daily closing values of the S&P 500 index or the U.S. 10-Year Treasury yield, analysts may choose not to include predictors. Major economic variables are traditionally not available at a daily frequency, which typically leads analysts to use a single-equation model to forecast such variables. But, as mentioned earlier, most single-equation methods are not very effective for multi-period ahead forecasting, as these methods just extend the current trend (upward or downward) of the series over the forecast horizon.

Forecasting the Unknown: How to Build a Multiple-equations Model

The M4-Competition asked for multi-period ahead forecasts (from a minimum of six periods to a maximum of 48 periods ahead). Because the 100,000 variables were unknown, we were unable to find predictors to include in our model. With limited information preventing us from adding predictors, the obvious option would be a single-equation method. However, single-equation approaches are not an effective tool for multi-period ahead forecasts.5 Multiple-equation methods would be a better way to approach this forecast, as these methods tend to produce better predictions and show more realistic paths for a target variable (fluctuations in the forecast instead of just an upward or downward trend).6 However, without predictors one would be unable to build a multiple-equations model, because we need at least two variables to build such a model (two equations for two variables, for instance). The challenge therefore became how to build a multiple-equations model given that we only have one variable (Y1t, the first variable of the yearly frequency).

To solve this mystery, we created another variable for each of the 100,000 target variables. That is, we have an original (target) variable, Y1t for example, and we generated the log-difference of the original variable (let’s label it LDY1t). As the log-difference approximates the growth rate, the LDY1t variable may represent the momentum of the series Y1t. Therefore, our model incorporates the level as well as the momentum of each series, which is how we were able to build a two-equation model (for two variables, Y1t and LDY1t). Now we have two dependent variables, and we utilize lags of each of those two variables as right-hand-side (independent) variables. One important note here is that in a multiple-equations model (also known as a system of equations) we must select variables that are at least correlated with, if not causing, each other. These two variables should not move independently of each other but should be highly connected to one another. In the present case, the Y1t and LDY1t variables explain the movements in each other, as one is the level (Y1t) and the other is the momentum/growth rate (LDY1t).
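As a minimal sketch of the step described above, the second variable can be built from the first as follows. The series values are hypothetical stand-ins for Y1t, and the helper name is ours for illustration only.

```python
import math


def log_difference(series):
    """Return log(y[t]) - log(y[t-1]); requires strictly positive values."""
    if any(v <= 0 for v in series):
        raise ValueError("log-difference needs strictly positive values")
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]


# Hypothetical yearly observations standing in for Y1t.
y1 = [100.0, 104.0, 103.0, 108.0, 115.0]
ldy1 = log_difference(y1)

# The log-difference starts one period later than the level, so the two
# variables are aligned on the dates where both exist.
level = y1[1:]
pairs = list(zip(level, ldy1))

# For small changes the log-difference is close to the simple growth rate.
simple_growth = [(b - a) / a for a, b in zip(y1, y1[1:])]
for (lvl, ld), growth in zip(pairs, simple_growth):
    print(f"level={lvl:7.1f}  log-diff={ld:+.4f}  growth={growth:+.4f}")
```

Note that the log-difference is only defined for strictly positive series; a series with zeros or negative values would need a different momentum measure.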

Once we have a two-equation model, we estimate that model and generate forecasts using the BVAR approach. As outlined above, the BVAR model for Y1t consists of two variables, which are Y1t (the original and target variable) and LDY1t (the log-difference, or growth rate, of Y1t).7
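The following is a rough sketch of the idea, not the estimation used for this report (which relied on SAS). It assumes one lag of each variable, a Minnesota-style prior in which the level follows a random walk and all other lag coefficients shrink toward zero, a hypothetical tightness parameter lam, and sample standard deviations as crude scale factors. None of these settings are stated in the report, and the input series is artificial.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sample_std(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def fit_bvar1(level, growth, lam=0.2):
    """Posterior-mean coefficients [intercept, lag level, lag growth] per equation."""
    rows = [[1.0, level[t - 1], growth[t - 1]] for t in range(1, len(level))]
    # Assumed prior precision: none on the intercept, (s_j / lam)^2 on lag j.
    penalty = [0.0, (sample_std(level) / lam) ** 2, (sample_std(growth) / lam) ** 2]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    lhs = [[xtx[i][j] + (penalty[i] if i == j else 0.0) for j in range(3)]
           for i in range(3)]
    coefs = []
    # Prior means: own lag of 1 for the level equation, 0 everywhere else.
    for target, prior_mean in ((level, [0.0, 1.0, 0.0]), (growth, [0.0, 0.0, 0.0])):
        z = target[1:]
        rhs = [sum(r[i] * zt for r, zt in zip(rows, z)) + penalty[i] * prior_mean[i]
               for i in range(3)]
        coefs.append(solve(lhs, rhs))
    return coefs


def forecast(coefs, level_last, growth_last, horizon):
    """Iterate both equations forward jointly and return the level path."""
    path = []
    for _ in range(horizon):
        level_next = coefs[0][0] + coefs[0][1] * level_last + coefs[0][2] * growth_last
        growth_next = coefs[1][0] + coefs[1][1] * level_last + coefs[1][2] * growth_last
        level_last, growth_last = level_next, growth_next
        path.append(level_next)
    return path


# Artificial series: 2% trend growth with a small cyclical swing.
raw = [100.0 * 1.02 ** t * (1.0 + 0.05 * math.sin(t)) for t in range(51)]
growth = [math.log(b / a) for a, b in zip(raw, raw[1:])]  # 50 log-differences
level = raw[1:]                                            # 50 aligned levels

coefs = fit_bvar1(level, growth)
path = forecast(coefs, level[-1], growth[-1], horizon=12)
print([round(v, 1) for v in path])
```

A smaller lam pulls the level equation toward a pure random walk with drift, while a larger lam lets the data dominate; a full implementation would also produce forecast intervals, which this sketch omits.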

Results: Who is the Winner?

For purposes of this report, we randomly pick one series to represent each frequency (hourly, daily, weekly, monthly, quarterly and yearly), resulting in six variables in total. We then set a 12-period ahead forecast horizon, as we are interested in multi-period ahead prediction accuracy. For the sake of consistency, we utilize 50 observations of each variable to estimate the model. That is, for each of the six variables, the first 50 observations are the model input, and we request a 12-period output. Because we have the actual values for those 12 periods, we can compare each model’s forecasts with the true values. We utilized three competing models

5 It is worth mentioning that the M4 management utilized 10 different forecasting methods as benchmarks, which include the seasonal and trend approaches, and those techniques are not eligible for the competition. Basically, the competition forced us to utilize advanced or innovative ways to forecast those 100,000 variables. For more detail, please see the M4-Competition website: https://www.m4.unic.ac.cy/

6 For more detail about the multiple-equations method, please see Silvia, John and Azhar Iqbal. (2012). A Comparison of Consensus and BVAR Macroeconomic Forecasts. Business Economics, Vol. 47, No. 4.

7 The BVAR model is an extension of the Sims (1980) Vector Autoregression (VAR). For a detailed discussion of VAR modeling, please see Sims, C. A. (1980). Macroeconomics and Reality. Econometrica, Vol. 48, No. 1, 1-48.



which include the BVAR (our proposed framework) and two methods representing single-equation models, AR(1) and ARMA(1,1). The results are reported in Table 1 below.

Table 1: Forecast Results

Source: Wells Fargo Securities

Our proposed forecasting method, BVAR, produced the smallest forecast error for each of the six variables, thereby outperforming both the AR and ARMA models in average forecast error. Since we utilize SAS software to estimate the BVAR model, the software provides forecast intervals (upper and lower limits) in addition to the forecast for each variable. Forecast intervals (also sometimes known as confidence intervals) are very useful to gauge risk to a forecast.
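As an illustration of how such a comparison can be scored, the sketch below computes a root mean squared error (RMSE) on a 50-observation estimation window followed by a 12-period hold-out, and then expresses the BVAR errors from Table 1 relative to the better of the two single-equation models. The helper names, the toy series and the naive forecast are assumptions for illustration; only the RMSE values in the dictionary are taken from Table 1.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over a hold-out window."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def split_holdout(series, n_train=50, horizon=12):
    """First n_train points estimate the model; the next `horizon` score it."""
    if len(series) < n_train + horizon:
        raise ValueError("series is too short for the requested split")
    return series[:n_train], series[n_train:n_train + horizon]


# Toy series and a naive "last value" forecast, purely to exercise the helpers.
series = [float(v) for v in range(1, 63)]
train, test = split_holdout(series)
naive = [train[-1]] * len(test)
print("naive RMSE on toy data:", round(rmse(test, naive), 3))

# RMSE values copied from Table 1, in the order AR(1), ARMA(1,1), BVAR.
table1 = {
    "Hourly": (156.7, 144.4, 69.2),
    "Daily": (27.3, 27.2, 15.0),
    "Weekly": (151.0, 193.3, 133.3),
    "Monthly": (138.0, 150.2, 127.7),
    "Quarterly": (1580.6, 1621.3, 1525.9),
    "Yearly": (950.9, 1119.9, 411.2),
}

reductions = {}
for name, (ar1, arma11, bvar) in table1.items():
    best_single = min(ar1, arma11)
    reductions[name] = 100.0 * (best_single - bvar) / best_single
    print(f"{name:<10} BVAR RMSE is {reductions[name]:.1f}% below the better "
          f"single-equation model")
```

Because RMSE is measured in the units of each series, the raw values in Table 1 are not comparable across frequencies; the percentage reduction is one simple way to put the six results on a common footing.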

Although in this report we reported results for only six variables, we are hoping to present results for all 100,000 variables once the M4-Competition announces the final results.8 We also plan to publish an additional report later this year with some potential applications of our proposed forecasting framework. In the present case, we included only the log-difference form of the original variable, due to the nature of the target variables as well as the available information set (the variables are mainly unknown, and some have only 13 observations). Depending on the target variable and the available information set, there are additional options to build a multiple-equation model from one variable.

For example, suppose we are interested in multi-period ahead (12 months or so) forecasts of retail sales or gasoline prices, and we have decided not to include any additional predictors. Assuming we are using a monthly dataset, we can include the year-over-year percentage change of the variable, to counteract seasonality issues, in addition to the log-difference form (a three-variable model). Similarly, an analyst working with high frequency data (daily S&P 500 index forecasting, for example) may have reason to believe some trading days are more volatile than others (a simple example may be Mondays, as the first business day of the week, or the first Friday of the month, as employment data is released on that day). The analyst can include those properties of the data in the model.
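The additional derived variables mentioned above could be built along the following lines. This is a minimal sketch with hypothetical data; the function names are ours, and the first-Friday rule simply follows the example in the text.

```python
from datetime import date, timedelta


def yoy_pct_change(monthly, periods_per_year=12):
    """Year-over-year percentage change; the first year has no value."""
    return [100.0 * (monthly[t] / monthly[t - periods_per_year] - 1.0)
            for t in range(periods_per_year, len(monthly))]


def monday_dummy(days):
    """1 for Mondays, 0 otherwise (date.weekday() returns 0 for Monday)."""
    return [1 if d.weekday() == 0 else 0 for d in days]


def first_friday_dummy(days):
    """1 when the date is a Friday falling on days 1-7 of its month."""
    return [1 if d.weekday() == 4 and d.day <= 7 else 0 for d in days]


# Hypothetical monthly series: flat at 100 for a year, then flat at 110.
monthly = [100.0] * 12 + [110.0] * 12
print([round(v, 1) for v in yoy_pct_change(monthly)])

# Calendar days in June 2018, restricted to weekdays as a rough stand-in for
# trading days (market holidays are ignored in this sketch).
days = [date(2018, 6, 1) + timedelta(days=i) for i in range(30)]
weekdays = [d for d in days if d.weekday() < 5]
print(sum(monday_dummy(weekdays)), "Mondays;",
      sum(first_friday_dummy(weekdays)), "first Friday")
```

Each derived series can then enter the system as an additional variable or as an indicator, depending on how the analyst chooses to set up the model.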

Summing up, we suggest utilizing a system of equations for multi-period ahead forecasting as, typically, it would provide better forecasts than single-equation methods.

Concluding Remarks and Potential Applications of Our Proposed Framework

Multi-period ahead forecasting is an essential element of decision making. Our proposed forecasting framework could help decision makers generate more accurate forecasts than with traditional forecasting tools. In addition, our approach is handy for those analysts who either are unable to find predictors or have decided not to include predictors, as they can still build a multiple-equation model for forecasting. For example, an analyst who wants to forecast delinquency rates by demographic group, gender and/or MSA can benefit from our approach by producing more accurate forecasts than traditional methods.

8 The M4-Competition management plans to announce winners of the competition in October 2018.

Average Forecast Error (RMSE)

Variable        AR(1)    ARMA(1,1)       BVAR
Hourly          156.7        144.4       69.2
Daily            27.3         27.2       15.0
Weekly          151.0        193.3      133.3
Monthly         138.0        150.2      127.7
Quarterly     1,580.6      1,621.3    1,525.9
Yearly          950.9      1,119.9      411.2



Wells Fargo Securities Economics Group

Diane Schumaker-Krieg Global Head of Research, Economics & Strategy

(704) 410-1801 (212) 214-5070

[email protected]

John E. Silvia, Ph.D. Chief Economist (704) 410-3275 [email protected]

Mark Vitner Senior Economist (704) 410-3277 [email protected]

Jay H. Bryson, Ph.D. Global Economist (704) 410-3274 [email protected]

Sam Bullard Senior Economist (704) 410-3280 [email protected]

Nick Bennenbroek Currency Strategist (212) 214-5636 [email protected]

Eugenio J. Alemán, Ph.D. Senior Economist (704) 410-3273 [email protected]

Azhar Iqbal Econometrician (704) 410-3270 [email protected]

Tim Quinlan Senior Economist (704) 410-3283 [email protected]

Eric Viloria, CFA Currency Strategist (212) 214-5637 [email protected]

Sarah House Senior Economist (704) 410-3282 [email protected]

Michael A. Brown Economist (704) 410-3278 [email protected]

Charlie Dougherty Economist (704) 410-6542 [email protected]

Jamie Feik Economist (704) 410-3291 [email protected]

Erik Nelson Currency Strategist (212) 214-5652 [email protected]

Michael Pugliese Economist (212) 214-5058 [email protected]

Harry Pershing Economic Analyst (704) 410-3034 [email protected]

Hank Carmichael Economic Analyst (704) 410-3059 [email protected]

Ariana Vaisey Economic Analyst (704) 410-1309 [email protected]

Abigail Kinnaman Economic Analyst (704) 410-1570 [email protected]

Shannon Seery Economic Analyst (704) 410-1681 [email protected]

Donna LaFleur Executive Assistant (704) 410-3279 [email protected]

Dawne Howes Administrative Assistant (704) 410-3272 [email protected]

Wells Fargo Securities Economics Group publications are produced by Wells Fargo Securities, LLC, a U.S. broker-dealer registered with the U.S. Securities and Exchange Commission, the Financial Industry Regulatory Authority, and the Securities Investor Protection Corp. Wells Fargo Securities, LLC, distributes these publications directly and through subsidiaries including, but not limited to, Wells Fargo & Company, Wells Fargo Bank N.A., Wells Fargo Clearing Services, LLC, Wells Fargo Securities International Limited, Wells Fargo Securities Asia Limited and Wells Fargo Securities (Japan) Co. Limited. Wells Fargo Securities, LLC. is registered with the Commodities Futures Trading Commission as a futures commission merchant and is a member in good standing of the National Futures Association. Wells Fargo Bank, N.A. is registered with the Commodities Futures Trading Commission as a swap dealer and is a member in good standing of the National Futures Association. Wells Fargo Securities, LLC. and Wells Fargo Bank, N.A. are generally engaged in the trading of futures and derivative products, any of which may be discussed within this publication. Wells Fargo Securities, LLC does not compensate its research analysts based on specific investment banking transactions. Wells Fargo Securities, LLC’s research analysts receive compensation that is based upon and impacted by the overall profitability and revenue of the firm which includes, but is not limited to investment banking revenue. The information and opinions herein are for general information use only. Wells Fargo Securities, LLC does not guarantee their accuracy or completeness, nor does Wells Fargo Securities, LLC assume any liability for any loss that may result from the reliance by any person upon any such information or opinions. 
Such information and opinions are subject to change without notice, are for general information only and are not intended as an offer or solicitation with respect to the purchase or sales of any security or as personalized investment advice. Wells Fargo Securities, LLC is a separate legal entity and distinct from affiliated banks and is a wholly owned subsidiary of Wells Fargo & Company © 2018 Wells Fargo Securities, LLC.

Important Information for Non-U.S. Recipients

For recipients in the EEA, this report is distributed by Wells Fargo Securities International Limited ("WFSIL"). WFSIL is a U.K. incorporated investment firm authorized and regulated by the Financial Conduct Authority. The content of this report has been approved by WFSIL a regulated person under the Act. For purposes of the U.K. Financial Conduct Authority’s rules, this report constitutes impartial investment research. WFSIL does not deal with retail clients as defined in the Markets in Financial Instruments Directive 2007. The FCA rules made under the Financial Services and Markets Act 2000 for the protection of retail clients will therefore not apply, nor will the Financial Services Compensation Scheme be available. This report is not intended for, and should not be relied upon by, retail clients. This document and any other materials accompanying this document (collectively, the "Materials") are provided for general informational purposes only.

SECURITIES: NOT FDIC-INSURED/NOT BANK-GUARANTEED/MAY LOSE VALUE