bowei zhang scholarly potential_ goodyear project

8/14/2019 Bowei Zhang Scholarly Potential_ Goodyear Project


2009

Written by: Bowei Zhang

Proofread by: Steven Miller, Steven Subichin

09/30/2009

Last Revision Date: 11/24/2009



Table of Contents

PROJECT INTRODUCTION
    12 Months Rolling Sum and Lagged Leading Indicators
    Correlation Verified To Be Linear
    Market Share Forecast
    Modeling Data Geography-US Models extended to include Canada
MODELING EFFORT 1-SIMPLE LINEAR REGRESSION MODELS
    Modeling Assumptions and Limitations
    How to Obtain Monthly Forecast from 12 Months Rolling Sum Forecasts
    Seasonality-Removed By 12 Months Rolling Sum
    Outliers-Strike Consideration
    Outliers Reduction-Smooth Economic Leading Indicators
    Model Utility and Residual Analysis
MODELING EFFORT 2- MULTIPLE LINEAR REGRESSION MODELS
    How to Build Multiple Linear Regression Models in Minitab
    Reason for Not Using Multiple Regression Models Monthly Forecasts
MODELING EFFORT 3- TIME SERIES MODELS
DATA RANGE AND SOURCE
EXPLANATION OF THE STANDARD LINEAR REGRESSION SPREADSHEET
    Common Tabs
    Unique Tabs
    Steps of Searching for New Leading Indicators
MODEL REFRESH AND UPDATE ISSUES
FILES LOCATION AND NAME
FUTURE LOOK
APPENDIX

Goodyear and Industry North America Commercial Replacement Tire

Causative Forecasting Models User



PROJECT INTRODUCTION

Lean inventory and efficient demand planning are two weapons that are especially important for any business to survive a recession. To achieve these two goals, a powerful demand forecasting system with a relatively high level of accuracy is necessary. The aim of the project is to build forecasting models that reveal the relationship between leading economic variables and Goodyear's business so that, by looking at the trend of those economic variables, Goodyear can tell the future highs and lows of its relevant business segments.

Key members of the project include Steven Miller, Steven Subichin, Mike Ryan, Greg Tomsho and

Bowei Zhang.

This project focuses on the performance of Goodyear and the total industry in the US/North America commercial replacement tire markets. We split the commercial replacement tire market into four segments by tire application and aimed to forecast the demand for each segment as well as for the total market.

The raw data for this project are:

- Monthly data for 74 leading economic variables (US) that may relate to the commercial replacement tire market, from 01/1996 to 06/2009. (Multiple data sources)
- Industry's monthly shipment data for each segment of the US/North America commercial replacement tire market (Urban, Regional, Long Haul, Mixed Service), from 01/1996 to 06/2009. (Data source: RMA)
- Goodyear's monthly billed sales and shipment data for each segment of the US commercial replacement tire market (Urban, Regional, Long Haul, Mixed Service), from 01/2003 to 07/2009. (Data source: EDW)

One thing worth noting is that RMA's and Goodyear's classifications of the four market segments are slightly different. We kept Goodyear's billed sales data for each market segment under Goodyear's own market classification criteria and regrouped Goodyear's shipment data using RMA's criteria. We did it this way because Goodyear's billed sales forecast will be used to assist DP, which uses Goodyear's market classification criteria, while Goodyear's shipment forecast will be used together with the Industry shipment forecast, which uses RMA's criteria, to calculate Goodyear's future market share.

12 Months Rolling Sum and Lagged Leading Indicators

Initially we wished to find any potential linkage between external economic variables and the replacement tire business, whether linear or non-linear. To reduce the modeling noise in the relatively small monthly billed sales and shipment values, and to identify their correlation with leading economic variables more easily, we replaced each monthly tire sales/shipment data point with the sum of the data for that month and the previous 11 months. Hereinafter this moving yearly value is called the 12 months rolling sum. We calculated the correlation coefficients between the 12 months rolling sums of billed sales/shipments of Goodyear and Industry for each market segment and the monthly data of the 74 economic variables. We assumed that some of the variables lead the commercial replacement market. To test that, we simply lagged those variables by a certain number of months when calculating the correlation coefficients. For example, if we think it takes the replacement tire market 2 months to respond to the movement of a leading indicator, then we use that variable's data lagged by 2 months to calculate the correlation coefficient. For Goodyear's billed sales data, we calculated correlation coefficients with the 74 variables lagged by up to 24 months. The numbers supported our assumption to some degree, because some variables have high correlation coefficients at short lags and some at long lags.
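The rolling sum and lagged correlation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the two sample series are made up for the example and are not taken from the project spreadsheets.

```python
from math import sqrt


def rolling_sum_12(monthly):
    """Return 12 months rolling sums; the first 11 months have no full window."""
    return [sum(monthly[i - 11:i + 1]) for i in range(11, len(monthly))]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(indicator, sales, lag):
    """Correlate sales rolling sums with the indicator value `lag` months earlier.

    Both series are monthly and are assumed to start in the same month.
    """
    rolling = rolling_sum_12(sales)  # rolling[k] ends at month index k + 11
    pairs = [
        (indicator[k + 11 - lag], rolling[k])
        for k in range(len(rolling))
        if k + 11 - lag >= 0
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Made-up monthly data (36 months), for illustration only.
indicator = [100 + 3 * m + (m % 5) for m in range(36)]
sales = [40 + 2 * m + (m % 4) for m in range(36)]

# Correlation coefficient for each candidate lag, here 0 to 3 months.
correlations = {lag: lagged_correlation(indicator, sales, lag) for lag in range(0, 4)}
```

Repeating the last line for every indicator and every lag up to 24 months would give a table of coefficients similar in spirit to the one used to pick leading indicators.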

Correlation Verified To Be Linear

Although the correlation coefficient depicts the strength of a linear relationship between two variables, interpreting its value can be quite arbitrary. There is no set rule about what number counts as high or low, and a high number does not necessarily mean a purely linear relationship. So we also drew scatter plots to study the true relationship between tire sales/shipment and the leading economic variables, and used correlation coefficients as a second reference.

Using these two tools, we were able to identify some regular relationship patterns between external variables and sales/shipment data. By regular I mean that those patterns can mostly be captured by certain mathematical models. After careful consideration and comprehensive tests, we decided to build only simple linear regression models (one leading indicator matched to one market segment) for ease of understanding and use in practice.

We have now built simple linear regression causative models, with some level of confidence, for Goodyear's billed sales, Goodyear's shipment and Industry shipment to forecast 2 months and 12 months out for each of the four segments in the US-only and North American commercial replacement tire markets. For Industry shipment, we also built time series models to provide alternative views, and they all achieved decent forecast accuracy. (Monthly forecast ex-post errors for the US-only time series models range from 7.75% for Urban tires to 22.24% for Mixed Service tires; monthly forecast ex-post errors for the North America time series models range from 7.05% for Urban tires to 17.75% for Mixed Service tires.)

Market Share Forecast

Since we applied the same market classification criteria (by vehicle application code) [1] to both Goodyear and Industry shipment data when grouping the data for each market segment, we are able to forecast (calculate) Goodyear's future market shares.

However, a word of caution: even though we re-grouped Goodyear's shipment data using RMA's criteria, there are still some differences between RMA's definitions of certain market segments and Goodyear's. One piece of evidence is that Goodyear's re-grouped shipment data (North America) for the Regional and Long Haul segments differ significantly from RMA's adjusted interpretation of the shipment data reported by Goodyear. The differences within each market segment can, however, offset each other to a certain extent. Also, due to a re-statement issue, Goodyear's total shipment data in EDW differs from the data sent back by RMA, on average by 7.5% during the period from 06/2007 to 04/2008.

The above two facts mean that using different sources for Goodyear shipment data can lead to different causative models. RMA's official Goodyear shipment data has its value for other analysis endeavors. However, we chose to use EDW's Goodyear shipment data to build causative models and calculate Goodyear's future market share because the modeling results from this project are intended for internal use only.
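The market share calculation described in this section amounts to dividing the Goodyear shipment forecast by the Industry shipment forecast for the same segment and month. A minimal sketch, with a hypothetical function name and made-up numbers, assuming both forecasts use the same (RMA) segment grouping:

```python
def market_share(goodyear_forecast, industry_forecast):
    """Forecasted market share per month: Goodyear shipment / Industry shipment."""
    return [g / i for g, i in zip(goodyear_forecast, industry_forecast)]


# Made-up monthly shipment forecasts for one segment (same units in both lists).
goodyear = [20, 30]
industry = [100, 120]
shares = market_share(goodyear, industry)  # one share per forecast month
```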



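$$
\begin{aligned}
f_{13} &= \left(\sum_{i=2}^{13} F_i - \sum_{i=1}^{12} F_i\right) + H_1 \\
&= F_{13} + \sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i - F_1 + H_1 \\
&= H_1 + (F_{13} - F_1) + \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)
\end{aligned}
$$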

$H_1$ is the true history value of January in the first year. $(F_{13} - F_1)$ is deemed the forecasted year-over-year monthly increase/decrease, which in our example is the change from January of the first year to January of the second year. The two sums over months 2 to 12 come from two consecutive 12 months rolling sum forecasts that cover the same 11 months. We assume those two forecasts give almost the same values for these months, namely that the artificial error term $\left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$ is close to 0. If this assumption does not hold, our forecasted monthly value will deviate from the true monthly forecast, which we wish to obtain but cannot get directly from 12 months rolling sum forecasts. This is likely to happen when the absolute percentage forecast errors of the two related 12 months rolling sum forecasts change dramatically, because that violates the $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$ assumption. Multiple regression models violate this assumption more easily and thus generate inaccurate forecasts. A more detailed discussion is given in the section on multiple linear regression.
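As an illustration of the transformation, the sketch below converts consecutive 12 months rolling sum forecasts into monthly forecasts by adding the change between two consecutive rolling sum forecasts to the history value of the same month one year earlier. The function name and the numbers are hypothetical; the error term is assumed to be zero, as in the text.

```python
def monthly_from_rolling(rolling_forecasts, monthly_history):
    """Convert consecutive 12 months rolling sum forecasts into monthly forecasts.

    rolling_forecasts[0] is the rolling sum for the window ending in month 12,
    rolling_forecasts[1] for the window ending in month 13, and so on.
    monthly_history[0] is the actual value of month 1.
    Returns the monthly forecasts for months 13, 14, ...
    """
    forecasts = []
    for k in range(1, len(rolling_forecasts)):
        change = rolling_forecasts[k] - rolling_forecasts[k - 1]
        base = monthly_history[k - 1]  # same calendar month one year earlier
        forecasts.append(base + change)
    return forecasts


# Made-up example: 12 months of history whose rolling sum is 136.
history = [10, 12, 9, 11, 13, 14, 12, 10, 9, 11, 12, 13]
rolling = [136, 137, 140]  # windows ending in months 12, 13 and 14
monthly = monthly_from_rolling(rolling, history)
```

Here month 13 is forecast as 10 + (137 - 136) and month 14 as 12 + (140 - 137). Forecasting beyond month 24 would need the month 13 value as a base, which in practice is itself a forecast.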

Seasonality-Removed By 12 Months Rolling Sum

One benefit of using the 12 months rolling sum history as the dependent variable for linear regression models is that the data has no seasonality. Appendix 2 is a comparison plot of monthly and 12 months rolling sum industry commercial replacement tire shipment data. As can be seen, the monthly data is more volatile and shows some seasonality across the history. The 12 months rolling sum, on the other hand, has no seasonal pattern at all (this applies to every market segment in our analysis). Over the long term, however, the 12 months rolling sum may show a regular business cyclical pattern, which can be treated as a sort of seasonality when building time series models. This topic is covered in more detail later.

Normally, for statistical modeling purposes, if the raw data has strong seasonality, we would have to deseasonalize it first, then build the model, and in the end reseasonalize the forecast. In our models, the transformation formula introduced above, $f_{13} = H_1 + (F_{13} - F_1) + \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$, adds seasonality back to the monthly forecasts through the monthly history term ($H_1$ in the formula). Hence by using this formula we avoided the seasonality issue in the raw data while keeping seasonality in the monthly forecast. The leading indicators we picked for this project are all free of seasonality. However, if in the future we want to bring in new economic variables with a seasonal pattern, we have to deseasonalize them before use.

Outliers-Strike Consideration

Outlying data points in either the independent or dependent variables of regression and time series models can heavily skew forecast results and hence forecast accuracy. Among the many possible causes of outliers, an unusual one-time event can generate abnormal history data. For example, the strike that started on Oct 5th, 2006 and ended in early 2007 made Goodyear's commercial replacement tire sales in every market segment extremely low from 11-2006 to 01-2007. In particular, sales for 12-2006 are lower than the lowest points during the recent recession. Hence for Goodyear's total replacement tire market, we over-forecasted by about 63% and 117% for 11-2006 and 12-2006 respectively. To fix the problem, we replaced the monthly sales data from 11-2006 to 01-2007 with the average of the same month from 2003 to 2005 and reran the linear regression model. The total cumulative absolute forecast error during the model building period (12-2003 to 12-2007) decreased from the original 20.07% to 14.57%. However, the model validation error (ex-post error over the period from 01-2008 to 06-2009) increased a little, from the original 9.37% to 10.30%. One possible explanation is that, without the outlier clean-up, the model learned from the dampened sales during the strike period and applied that learning to forecasts over the ex-post period, during which the recession also dampened sales. Hence data clean-up in this case did not improve the model's ex-post forecast accuracy. For details about the test, please refer to the tab named "Industry outlier fix" in the Excel file named "Goodyear Billed Sales Causative Models 2 months out (US only)". After careful thought, we think our Goodyear models are robust enough to tolerate some outlying raw data in the model building period without deteriorating forecast results. Hence we kept our models as they are.
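The clean-up tested above, replacing each strike month with the average of the same calendar month in the reference years, can be sketched as follows. The function name, the dictionary layout and all sales numbers are assumptions made for illustration; they are not the project's data.

```python
def replace_with_same_month_average(sales, outlier_months, reference_years):
    """Return a copy of `sales` with the outlier months replaced.

    sales maps (year, month) to a monthly value. Each outlier month is replaced
    by the average of the same calendar month over `reference_years`.
    """
    cleaned = dict(sales)
    for year, month in outlier_months:
        reference = [sales[(ref_year, month)] for ref_year in reference_years]
        cleaned[(year, month)] = sum(reference) / len(reference)
    return cleaned


# Made-up sales values; only the months needed for the example are listed.
sales = {
    (2003, 11): 90, (2004, 11): 100, (2005, 11): 110, (2006, 11): 40,
    (2003, 12): 80, (2004, 12): 85, (2005, 12): 90, (2006, 12): 30,
    (2003, 1): 70, (2004, 1): 75, (2005, 1): 80, (2007, 1): 45,
}
strike_months = [(2006, 11), (2006, 12), (2007, 1)]
cleaned = replace_with_same_month_average(sales, strike_months, [2003, 2004, 2005])
```

The original dictionary is left untouched, so the regression can be run on both versions and the errors compared, as was done in the test described above.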

Outliers Reduction-Smooth Economic Leading Indicators

However, there is one easy way to reduce, at least partially, the outlying forecast points. Before we dive into that, let's first look at how we calculate the monthly forecast residual. Assuming $H_{13}$ is the monthly history value of the 13th month and using the transformation formula mentioned earlier, we get the following residual calculation formula:

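$$
H_{13} - f_{13} = H_{13} - H_1 - (F_{13} - F_1) - \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right) = (H_{13} - F_{13}) - (H_1 - F_1) - \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)
$$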

As can be seen, an outlying monthly forecast residual can result from abnormal values of $H_{13}$ or $H_1$, or from an abnormal change in the leading indicator's monthly values. (Such a change may make two consecutive 12 months rolling sum forecasts differ dramatically, which will very possibly violate our assumption that $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$.) Hence one reasonable remedy for outlying monthly forecasts is to smooth the leading indicator's monthly values by replacing each with the average of the values for that month and the previous 11 months. This transformation of leading indicators reduces the monthly forecasts' volatility. For an example, please refer to Appendix 3. We tested this transformation on the industry shipment 12 months out model for the North America region. As shown by the plot, the transformation makes the forecasts smoother and closer to the history values. In fact, the ex-post error during the period from 01/2008 to 06/2009 dropped from 21.59% to only 9.35% after we took the 12 months rolling average of the leading indicator's monthly values.

This smoothing technique will not always generate more accurate forecast results. However, it will help make monthly forecasts less volatile if the leading indicators used are volatile in nature. We applied this technique to the Industry shipment 12 months out forecast models for both US-only and North America data.
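The smoothing step is a plain 12 months rolling average of the indicator. A minimal sketch with a hypothetical function name and a made-up indicator series containing one spike:

```python
def rolling_average_12(values):
    """Replace each monthly value with the mean of that month and the previous 11."""
    return [sum(values[i - 11:i + 1]) / 12 for i in range(11, len(values))]


# Made-up indicator: flat at 10 with a single spike of 130 in the 13th month.
raw_indicator = [10] * 12 + [130] + [10] * 5
smoothed = rolling_average_12(raw_indicator)
```

The spike of 130 is spread over twelve smoothed values of 20 each, so the indicator fed to the regression moves far less from one month to the next. The first 11 months are lost because they have no full 12-month window.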

Model Utility and Residual Analysis

Appendix 4 is a utility comparison table for the causative models. Key metrics include R-square (both original and adjusted) and the cumulative absolute percentage errors (ex-post period) of both causative and naive models. The naive models simply assume that what happened yesterday will happen again tomorrow. Hence for the naive models we take the current month's sales/shipment as the forecast for 2 months and 12 months out. As coded in blue in the rightmost column of this table, the ex-post forecast errors of the naive models are all higher than those of the causative models, except for the Regional market. We think this could be either a coincidence or a sign that the Regional market is, relatively speaking, stable enough to repeat its history values over time. Neither explanation invalidates the effectiveness of our causative models, though.

Ex-post errors are cumulative absolute percentage errors. To be specific, this heuristic metric is calculated by dividing the sum of the absolute values of all monthly forecast residuals by the sum of the monthly history values over a certain period. We prefer this metric because it evaluates our forecasts' absolute deviation from history rather than from a constant, as in R-square.
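The metric and the naive benchmark can be written down compactly. The sketch below uses hypothetical function names and made-up numbers; it is meant only to make the definition concrete.

```python
def cumulative_abs_pct_error(history, forecast):
    """Sum of absolute monthly residuals divided by the sum of history values."""
    total_abs_residual = sum(abs(h - f) for h, f in zip(history, forecast))
    return total_abs_residual / sum(history)


def naive_forecast(history, horizon):
    """Naive forecast: the value for month t is the actual value of month t - horizon."""
    return history[:-horizon]


# Made-up monthly history; evaluate the naive model 2 months out.
history = [100, 110, 90, 120, 100, 105]
horizon = 2
actual = history[horizon:]  # months that have a naive forecast
naive_error = cumulative_abs_pct_error(actual, naive_forecast(history, horizon))
```

With these numbers the absolute residuals are 10, 10, 10 and 15, and the actual values sum to 415, so the naive error is 45/415, or about 10.8%. A causative model would be scored the same way on the same months and compared against this figure.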

The R-square value is another tool to indicate the effectiveness of regression models. The higher the R-square (including the adjusted one), the larger the share of the total variation in the n observed values of the dependent variable that is explained by the overall regression model. However, there is no absolute standard for what a good value is. As can be seen from the table (color coded in yellow), Goodyear models have relatively high R-square and Industry models lower, despite the fact that the Industry models are as accurate as the Goodyear models in terms of monthly forecast ex-post errors. This brings up two questions. First, how long can we keep using the causative models before we have to revise them? (This question is addressed in the last section of this document.) Second, can we build multiple regression models that generate small monthly forecast ex-post errors and a high (adjusted) R-square value? (Addressed below.)

MODELING EFFORT 2- MULTIPLE LINEAR REGRESSION MODELS

How to Build Multiple Linear Regression Models in Minitab

Since our Industry models in general have low R-square, the industry total market shipment 2 months out forecast model was picked for this test. What we wish to build is a multiple regression model with a high R-square and a low cumulative absolute percentage error for the monthly forecast during the ex-post period. Minitab's automatic model selection function is used to perform the test.

Ideally, we would dump as many variables' data into Minitab as possible and let the computer generate an optimal solution for us. However, Minitab can only process a limited number of variables using the Stepwise [5] and Best Subsets [6] selection methods. So some variables need to be screened out of the candidate pool as follows:

1. Select the 12 months rolling sum of total industry shipment and the 2 months lagged data of the 74 variables.

2. Calculate correlation coefficients between the rolling sum values and the 74 variables, and keep the variables with a correlation coefficient higher than 40% or lower than -40%. This step reduced the number of potential variables from 74 to 30.

3. Use Stat > Basic Statistics > Correlation in Minitab to generate the Correlation Matrix, which includes a P-value for each correlation coefficient between any pair of variables, including the dependent variable (shipment). If the correlation coefficient between two variables is higher than 0.9 or lower than -0.9, one of them can be considered redundant in the model. If more than two variables are highly correlated with one another, first compare the P-values of their correlations with the dependent variable and screen out those with higher P-values; if the P-values are the same, keep those with a higher absolute correlation coefficient with the dependent variable. Use the Correlation Matrix to eliminate redundant variables in this way. This step reduced the number of variables from 30 to 20.

4. Use the Stepwise and Best Subsets methods in Minitab to generate the best multiple linear regression models.
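The screening in steps 2 and 3 can be approximated outside Minitab. The sketch below is an assumption-laden illustration, not the procedure actually run: it ranks variables by the absolute correlation with the dependent variable instead of by P-value (for series of equal length the two rankings agree), and the indicator names and data are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def screen_candidates(target, candidates, keep_threshold=0.4, redundancy_threshold=0.9):
    """Keep variables correlated with the target, then drop redundant ones."""
    # Step 2: keep variables whose |correlation| with the target exceeds 0.4.
    strength = {name: abs(pearson(series, target)) for name, series in candidates.items()}
    kept = [name for name in candidates if strength[name] > keep_threshold]
    # Step 3: visit the strongest variables first and skip any variable that is
    # nearly collinear (|correlation| > 0.9) with one already selected.
    kept.sort(key=lambda name: strength[name], reverse=True)
    selected = []
    for name in kept:
        redundant = any(
            abs(pearson(candidates[name], candidates[other])) > redundancy_threshold
            for other in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Made-up data: a rolling sum target and four hypothetical lagged indicators.
target = [1, 2, 3, 4, 5, 6, 7, 8]
candidates = {
    "indicator_a": [2, 4, 6, 8, 10, 12, 14, 16],  # strongly related to the target
    "indicator_b": [1, 2, 3, 4, 5, 6, 7, 9],      # nearly a copy of indicator_a
    "indicator_c": [4, 1, 3, 2, 1, 4, 2, 3],      # unrelated to the target
    "indicator_d": [3, 1, 2, 6, 4, 8, 5, 7],      # moderately related
}
selected = screen_candidates(target, candidates)
```

In this example indicator_c fails the 0.4 threshold and indicator_b is dropped as redundant with indicator_a, leaving indicator_a and indicator_d for the stepwise or best subsets search.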

The best two models generated by the Best Subsets method are a 10-variable linear regression model [7] and a 9-variable linear regression model [8].

The best model from the Stepwise method is a 5-variable linear regression model [9].

Even though these models all have high R-squares (around 90%) and low ex-post forecast errors for the 12 months rolling sum forecasts, both better than our original one-variable linear regression models, their ex-post errors for the monthly forecast are especially high (around 40%, versus around 9% for our original model). However, the multiple regression models' cumulative absolute percentage errors of monthly forecasts during the model building period (around 12%) are not very far from those of our original models (8%).

Reason for Not Using Multiple Regression Models Monthly Forecasts

We found that the reason the multiple regression models did not perform well for monthly forecasts during the ex-post period is related to the assumption behind our formula for transforming 12 months rolling sum forecasts into monthly forecasts. As mentioned before, we assume that the forecasted values of the same 11 months from two consecutive 12 months rolling sum forecasts are almost the same, namely $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$.

However, this assumption is not always true and is more easily violated by multiple linear regression models than by single linear regression models. The multiple linear regression models in our case all have high R-square values, which means that the variation of the dependent variable (shipment/sales of tires) is explained to a large extent by the variables we included (bearing in mind that, mathematically speaking, adding variables to a multiple regression model never lowers its R-square). The downside is that multiple regression models have more external factors to control, and each one's fluctuation can affect our final transformed monthly forecasts.

Look at the data and plots in Appendix 10. The vertical axis of the plots is the cumulative absolute percentage error of the 12 months rolling sum forecast. The blue line represents the 12 months rolling sum forecast from our original single linear regression model. The red and green lines are forecasts from two multiple linear regression models selected by the Best Subsets method. As can be seen, before period 49, which is 12-2007, the multiple regression models are more accurate than the single linear regression model in terms of the 12 months rolling sum forecast.

During the ex-post period, from 01-2008 to 06-2009, the forecast accuracy of the multiple regression models fluctuates more heavily than that of the single linear regression model. That is because multiple regression models contain more variables, so it is more likely that the recession's impact on those variables skews the 12 months rolling sum forecast. More fluctuation between two consecutive 12 months rolling sum forecasts violates the $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$ assumption and causes the related monthly forecast to have a high forecast error.

To sum up, because of the technique we used to transform 12 months rolling sum forecasts into monthly forecasts, and because multiple regression models are more difficult to control and maintain, we think simple linear regression models are better for our modeling purposes, even though they have a smaller R-square than multiple regression models.

It is natural to think that if we could use monthly shipment/sales directly as dependent variables to build multiple regression models, we could have both high forecast accuracy and high R-square. However, the monthly data is too volatile compared with the 12 months rolling sum values and, as tested, we can barely find well-correlated external indicators for monthly shipment/sales data.



MODELING EFFORT 3- TIME SERIES MODELS

Other than causative models, we also tested time series models for tire sales/shipment data. The forecasting method we used is exponential smoothing, which weights the observed time series values unequally: more recent observations are weighted more heavily than more remote ones. This modeling method studies the level, trend (optional) and seasonality (optional) of the time series history and projects them into the future to make forecasts. As mentioned earlier, Goodyear's history data dates back to 2003 and Industry's back to 1996. In contrast to causative models, for time series studies the more data we have, the easier it is to capture trend and seasonality, if there are any. As shown by the plots in Appendix 11, Goodyear's history is too short to show obvious trend and seasonality, while the industry's history is long enough to be a good candidate for the Multiplicative Holt Winters method [12]. Actually, the seasonality indicated in the industry data is business cyclicality over the long term, because the 12 months rolling sum values have no seasonality in themselves. But this cyclical pattern can be modeled as a sort of seasonality.

Due to data availability, we built time series models for industry shipment only, using monthly data from 01/1996 to 12/2007, and tested each model over the period from 01/2008 to 06/2009. The monthly history data is very volatile even though it shows trend and seasonality over the history. To make sure we built the best time series models we could, we tested four models on both the monthly shipment data and the 12 months rolling sum shipment data for each market segment. The four models are: Level only; Level + Trend; Level + Trend + Increasing Seasonality (Multiplicative Holt Winters method); and Level + Trend + Constant Seasonality (Additive Holt Winters method [13]). Hence, for each market segment, 8 time series models in total were tested using Minitab. As expected, the model that generates the smallest monthly forecast ex-post error for all the market segments except Mixed Service is the Multiplicative Holt Winters method on 12 months rolling sum history data. The Mixed Service exception did not surprise us because of its relatively complex business structure. Mixed Service's history plot does not show typical, easily recognizable trend and seasonality patterns either. The best time series model for this market segment is level only using the 12 months rolling sum history. This model generates a monthly forecast ex-post error of 22.24%, which is higher than those of all the Multiplicative Holt Winters models for the other market segments. The level only model means that if we forecast multiple periods into the future after 06/2009, we get the same 12 months rolling sum forecast for every future period. In that case, according to the transformation formula introduced previously ($f_{13} = \left(\sum_{i=2}^{13} F_i - \sum_{i=1}^{12} F_i\right) + H_1$), all future monthly forecasts will be the same as the monthly history values one year back. This is a kind of naive model too.
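The consequence of a level only model can be checked with a small sketch. It assumes a basic single exponential smoothing recursion that starts from the first observation, which is a simplification and may differ from Minitab's initialization; the smoothing constant, the function name and all numbers are made up for illustration.

```python
def single_exponential_smoothing(series, alpha):
    """Return the final smoothed level of `series` (level only model)."""
    level = series[0]  # simplification: start from the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Made-up 12 months rolling sum history and the last 12 monthly history values.
rolling_history = [130, 132, 135, 136]
last_12_months = [10, 12, 9, 11, 13, 14, 12, 10, 9, 11, 12, 13]

level = single_exponential_smoothing(rolling_history, alpha=0.2)
future_rolling = [level] * 4  # a level only model forecasts the same value every period

# Transformation: monthly forecast = value one year back + change in the rolling sum forecast.
monthly = [
    last_12_months[k] + (future_rolling[k + 1] - future_rolling[k])
    for k in range(3)
]
```

Because consecutive rolling sum forecasts are identical, the change term is zero and each monthly forecast equals the history value of the same month one year earlier.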

All time series models are run in Minitab. After loading the history shipment data into Minitab, simply go to Stat > Time Series and choose Single Exp Smoothing, Double Exp Smoothing or Winters' Method for the level only, level + trend and level + trend + seasonality models, respectively. For the level only and level + trend models, Minitab can generate optimal models by automatically searching for the smoothing constants for the level and trend components that minimize the sum of squared errors. For the Holt Winters method, we have to define all three smoothing constants (level, trend and seasonality) manually, although the default value of 0.2 for all three works well for our project most of the time.

All the Industry shipment modeling results are stored in an Excel file named "Industry Time Series modeling". For each market segment, there are three tabs in this Excel file. Take the Urban market for example. In the "Urban forecast results" tab, the monthly forecasts, the monthly forecasts' cumulative absolute percentage errors and the history vs. forecast plots for each of the eight time series models are listed for comparison. In the "Rolling to monthly transform- Urb" tab, the 12 months rolling sum forecasts generated by Minitab can be copied to the column named "Urban 12 months rolling sum forecast" to generate the monthly forecast in the rightmost column. The transformation formula defined previously is already embedded in this calculation. In the last tab, called "Error Calculator- Urban", transformed monthly forecasts or monthly forecasts generated directly by Minitab can be copied to the related column to get the forecast error statistics, color coded in blue.

DATA RANGE AND SOURCE

As the old saying goes, "garbage in, garbage out." To avoid this cliché in our project, we have to carefully maintain and process the raw data. All Goodyear billed sales data is available from EDW. Goodyear's market group names are slightly different from the RMA names for the four market segments; for a detailed transformation table, please refer to Appendix 14. All Goodyear shipment data has to be manually processed in order to apply RMA's market classification. This job was previously done by Greg Tomsho using the Materials number Vehicle Application code table generated by Steven D. Miller. For a copy of this table, please see the file named pbu03_all. All Industry shipment data by market segment is available from RMA. Contact Krista Liem for the latest industry data.

All the key leading indicators used for our modeling purposes are summarized in the table named Key Leading Indicators [15]. In general our leading indicators come from three sources: the Federal Reserve Bank of St. Louis; the Energy Information Administration, US Dept. of Energy; and Freight Transportation Research (FTR) Associate. The FTR database is updated monthly and can be accessed by Krista Liem.

Another thing worth noting is the data range. For Goodyear and Industry's causative models, we used sales/shipment history data from 2003 to 2009. For Industry's time series models, we used data from 1996 to 2009. It makes sense to use more history data to study purely the time series trend and seasonality pattern. However, since so many macro-economic factors can affect tire sales/shipment dramatically over a long period of time, it would be risky to use, say, 12 years of tire sales/shipment history to build single linear regression models. As a matter of fact, at the initial stage of our project, we built causative models for Industry shipment using the past 12.5 years of history data, then reduced the data range to the past 6.5 years and re-ran the models. It turned out that with less shipment history we got lower monthly forecast ex-post errors, and we had to change some of the leading indicators selected previously.

Hence, to build effective causative models, we may have to consider dropping some of the oldest data in the modeling period when new data becomes available, whereas for time series models it is fine to add new data points while keeping the old data. Also, most of the organizations that publish our economic indicators revise their published data periodically. As new monthly data becomes available, the data for past periods may also have changed. If that is the case, all revised data within our modeling data range should be used to re-run the model to get a new sales/shipment forecast.

EXPLANATION OF THE STANDARD LINEAR REGRESSION SPREADSHEET

All the causative models developed so far have the same standardized Excel spreadsheet structure.



There are six files in total for each category of modeling and they are named as:

Goodyear Billed Sales Causative Models 2 months out

Goodyear Billed Sales Causative Models 12 months out

Goodyear Shipment Causative Models 2 months out

Goodyear Shipment Causative Models 12 months out

Industry Shipment Causative Models 2 months out

Industry Shipment Causative Models 12 months out

There are two sets of models: one set for US-only data and another set for North America data. Hence
there are 12 files in total. Every file contains the following 11 tabs. Take Goodyear Billed Sales
Causative Models 2 months out (US only) as an example.

Common Tabs

1. ReadMe:

It contains a description of the models within the Excel file, a description of each indexed tab, and
how to use them.

2. Scatter Plots:

For each market segment and each of the 74 economic variables, there is a matching scatter plot
in this tab. All the data used comes from the tab "x-months Lagging Data Set". As long as the
structure of the data in that tab does not change, the scatter plots update automatically
as the data changes. However, if new data is added, the plots have to be changed manually to
reflect the new data points. To do that, right-click on a plot and choose Select Data;
you will be directed to the tab "x-months Lagging Data Set", where you can re-select
the raw data.

3. x-months Lagging Data Set:

x is either 2 or 12, depending on the purpose of the model. Our causative models can forecast
future values of the dependent variable because we lagged the independent variables while
constructing the linear regression relationships. If we wish to forecast 2 months out, we lag
the leading indicators by 2 months; if we wish to forecast 12 months out, we lag the leading
indicators by 12 months. Hence this tab lists the billed sales 12 months rolling sum data and
the 74 economic variables lagged by 2 months, from 12-2003 to 06-2009, which includes both the
model building period and the validation (ex-post) period.
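The construction of this tab can be sketched in a few lines of Python. This is only an illustration of the rolling sum and the lag alignment described above, not the spreadsheet's actual formulas; the function names and the numbers are made up for the example.

```python
def rolling_sum(values, window=12):
    """Rolling sum aligned to the input: None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i - window + 1 : i + 1]))
    return out


def lagged_pairs(indicator, target, lag):
    """Pair the indicator observed at month t - lag with the target at month t."""
    pairs = []
    for t in range(lag, len(target)):
        if target[t] is not None:
            pairs.append((indicator[t - lag], target[t]))
    return pairs


# Made-up illustrative numbers, not project data.
sales = [100, 110, 90, 120, 130, 95, 105, 115, 125, 100, 110, 120, 105, 115]
indicator = [50 + i for i in range(len(sales))]

rs = rolling_sum(sales)                      # first full window ends at index 11
pairs = lagged_pairs(indicator, rs, lag=2)   # 2 months out model
print(pairs[0])                              # (59, 1320)
```

Because the indicator value from month t - 2 is matched with the rolling sum of month t, an indicator value known today corresponds to a rolling sum two months ahead, which is what makes the regression usable for forecasting.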

4. All Data:

This tab lists the monthly history of all 74 variables from 01-1996 to 06-2009. Some variables
may have missing data points for the most recent history. This tab was set up to store any history
data used for the project.

5. Correlation Coefficients:

This tab contains the monthly history of tire sales and the automatically calculated 12 months
rolling sum values. The monthly history data of all 74 variables is also listed here. The table
outlined with a red dotted line at the bottom of this tab lists the correlation coefficients
(calculated using the =CORREL() function in Excel) between the 12 months rolling sum for each
market segment and each of the 74 variables lagged by 2 months, over the period from 12-2003 to
06-2009 (the same period as used in the scatter plots). All correlation coefficients whose
absolute values are above 0.80 are highlighted in color using Conditional Formatting in Excel.
To obtain updated correlation coefficients as new data comes in, you may have to add the new
monthly sales, drag down the Excel cells to get the 12 months rolling sums, add the new monthly
data for the 74 variables, and reset the embedded formulas to include the new 12 months rolling
sums and leading indicators.

This tool is used together with the scatter plots to detect potential linear relationships
between leading indicators and tire sales data.
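As a rough equivalent of the =CORREL() screening in this tab, the sketch below computes the Pearson correlation coefficient and keeps the variables whose absolute correlation exceeds 0.80. The variable names and values are hypothetical and serve only to show the mechanics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen(target, candidates, threshold=0.80):
    """Return {name: r} for candidates whose |r| with the target exceeds the threshold."""
    flagged = {}
    for name, series in candidates.items():
        r = pearson(series, target)
        if abs(r) > threshold:
            flagged[name] = r
    return flagged


# Hypothetical rolling sums and already-lagged indicators (illustrative only).
rolling_sum_sales = [10, 12, 14, 16, 18]
candidates = {
    "var_a": [1, 2, 3, 4, 5],   # moves with sales
    "var_b": [5, 4, 3, 2, 1],   # moves against sales
    "var_c": [3, 1, 4, 1, 5],   # weak relationship
}
flagged = screen(rolling_sum_sales, candidates)
print(sorted(flagged))          # ['var_a', 'var_b']
```

Strongly negative correlations are kept as well as strongly positive ones, since the tab highlights coefficients by absolute value.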

6. Forecast errors:

This tab lists all the selected variables (chosen using the scatter plots and correlation
coefficients in the previous tabs) and their cumulative absolute percentage errors during the
period 01/2008 to 06/2009 (ex-post errors) for both the 12 months rolling sum forecasts and the
transformed monthly forecasts. The ex-post errors of the 12 months rolling sum forecasts are used
to monitor how effectively our simple linear regression models capture the potential linear
relationship between 12 months rolling sum sales and the leading indicators. If the relationship
is close to linear, this ex-post error should be small. The monthly forecast ex-post error is
used to check whether our model can generate a decent monthly forecast for the near future.
Normally, the ex-post error for the 12 months rolling sum forecast should be smaller than that
of the monthly forecast.
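The exact formula behind the cumulative absolute percentage error is not restated in this section, so the sketch below is an assumption rather than the spreadsheet's definition: it shows the per-month absolute percentage error and one plausible cumulative version, namely the percentage error of the summed forecast against the summed actuals over the ex-post period. The function names and numbers are illustrative.

```python
def absolute_percentage_errors(actual, forecast):
    """Per-period absolute percentage error, in percent."""
    return [100 * abs(f - a) / abs(a) for a, f in zip(actual, forecast)]


def cumulative_ape(actual, forecast):
    """ASSUMPTION: percentage error of the cumulated forecast vs. the cumulated actuals."""
    return 100 * abs(sum(forecast) - sum(actual)) / abs(sum(actual))


# Illustrative ex-post period of three months (made-up numbers).
actual = [100, 120, 80]
forecast = [110, 114, 84]

monthly_errors = absolute_percentage_errors(actual, forecast)
total_error = cumulative_ape(actual, forecast)
print(monthly_errors)   # [10.0, 5.0, 5.0]
```

Under this assumed definition, over- and under-forecasts in different months partly cancel, so the cumulative figure can be smaller than any single monthly error.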

7. Urban-x:

Tabs 7 to 11 contain the models we used to generate monthly forecasts. All of these tabs have
the same structure and are self-explanatory. For illustration, a detailed explanation is provided
here only for the Urban-2 tab of the Goodyear Billed Sales 2 months out model.

The only two columns that need to be updated from external data sources are the monthly history
of tire sales and the column named after the selected leading indicator. You can drag down the
column named "12 months rolling sum history" to get the 12 months rolling sums needed for
modeling.

Then use the Regression function in the Excel add-in called Data Analysis [16] to select the
dependent variable, which is the 12 months rolling sum history of tire sales, and the independent
variable, which is the monthly history of the leading indicator lagged by 2 months, both over the
modeling period. The Regression function generates a detailed ANOVA analysis as shown in
Appendix [17]. The two numbers color-coded in orange are the coefficients for the constant and
the leading indicator in the simple linear regression model. Copy those two numbers into the
corresponding locations at the top of the table; the monthly forecasts (at the far right of the
table) and the forecast errors (at the top right of the table) are then generated automatically.
If the data currently selected for model building is still used to forecast future monthly sales,
you do not have to change the coefficients previously entered at the top of the table. When new
monthly sales data and leading indicator data become available, you can add them and change the
formula for the new ex-post error calculation. If, after a certain period of time, new data needs
to be added to the modeling period, you have to rerun the Data Analysis add-in and reselect the
corresponding 12 months rolling sum tire sales and leading indicator monthly data.

Most of the data for new cells can be obtained by dragging down the cells in Excel.
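The calculation behind a market segment tab can be sketched as follows. The least-squares fit gives the same kind of constant and slope that the Regression tool reports for a single predictor. The conversion from a rolling sum forecast to a monthly forecast uses the identity that follows from the definition of a 12 months rolling sum (this month's sum equals last month's sum plus this month's value minus the value from 12 months earlier); the workbook's own transformation is described in an earlier section and may be arranged differently. All names and numbers here are illustrative.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return intercept, slope


def monthly_from_rolling(rs_forecast, rs_previous, monthly_12_ago):
    """Rolling-sum identity: m_t = RS_t - RS_(t-1) + m_(t-12)."""
    return rs_forecast - rs_previous + monthly_12_ago


# Made-up modeling period: lagged indicator vs. 12 months rolling sum of sales.
lagged_indicator = [1, 2, 3, 4]
rolling_sum_sales = [1210, 1420, 1630, 1840]

intercept, slope = fit_line(lagged_indicator, rolling_sum_sales)
rs_next = intercept + slope * 5          # rolling sum forecast for the next month
month_next = monthly_from_rolling(rs_next, rolling_sum_sales[-1], monthly_12_ago=95)
print(intercept, slope, rs_next, month_next)
```

In a 2 months out model, the previous month's rolling sum is itself a forecast, so errors in the rolling sum forecasts carry into the monthly forecast; this is consistent with the monthly ex-post error normally being the larger of the two.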

8. Regional-x:

See tab 7 for instructions.

9. Long haul-x:

See tab 7 for instructions.

10. Mixed service-x:

See tab 7 for instructions.

11. Total Market-x:

See tab 7 for instructions.




Steps of Searching for New Leading Indicators

The logical steps for using the tabs in each model/Excel file to search for the best leading indicators
for each market segment are described below. Take Goodyear Billed Sales Causative Models 2
months out (US only) as an example.

1. Update the tire sales and leading indicators monthly data in the tab "Correlation Coefficients".
2. Adjust the formulas to include the new data when refreshing the correlation coefficient calculation table in this tab.
3. Copy and paste the new tire sales 12 months rolling sum data (including the ex-post period) and the leading indicators monthly data to the tab "2-months Lagging Data Set".
4. Go to the tab "Scatter Plots" and, if necessary, update the scatter plots one by one to include the new data added in the tab "2-months Lagging Data Set".
5. Observe the scatter plots. If a linear relationship is found, consider that variable a candidate for testing.
6. If a linear relationship is not obvious, use the tab "Correlation Coefficients" to search for variables with high correlation coefficients with the 12 months rolling sum tire sales data.
7. To test all the candidate variables for a specific market segment, copy their data to the corresponding market segment tab one by one, then perform the following test starting from step 8.
8. Update the monthly data for both tire sales and the selected leading indicator in the specific market segment tab.
9. Click Data > Data Analysis > Regression in Excel, select the matching 12 months rolling sum sales and the lagged monthly data for the leading indicator (lagged by 2 months in this case), and perform the ANOVA analysis.
10. Copy the coefficients for the constant and the variable in the linear regression model from the ANOVA analysis generated by Excel to the corresponding positions at the top of the market segment tab.
11. Drag down the columns called "12 months rolling sum forecast" and "monthly forecast" if necessary. All formulas are already embedded.
12. Copy and paste the ex-post forecast errors for both the 12 months rolling sum and monthly forecasts, automatically generated at the top right of the table in the market segment tab, to the corresponding positions in the tab named "Forecast errors".
13. Repeat steps 7 to 12 until the ex-post errors generated by every potential leading indicator are recorded in the "Forecast errors" tab.
14. Select the one variable that does not generate negative monthly forecasts and gives a low monthly forecast ex-post error.
15. If outlying monthly forecasts are generated by a chosen leading indicator, either a manual adjustment of the forecast is required or a backup leading indicator can be selected from the tab "Forecast errors".
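The selection rule in steps 14 and 15 can be sketched as a small ranking routine. It assumes the ex-post errors and monthly forecasts for each candidate have already been recorded, as in the "Forecast errors" tab; the indicator names, the data layout, and the numbers are hypothetical.

```python
def rank_indicators(results):
    """Rank candidates by monthly ex-post error, excluding any with negative forecasts.

    results maps a candidate name to a dict with:
      "monthly_forecasts": list of monthly forecasts produced by that candidate
      "ex_post_error": monthly forecast ex-post error in percent
    """
    eligible = {
        name: r for name, r in results.items()
        if min(r["monthly_forecasts"]) >= 0      # step 14: no negative monthly forecasts
    }
    # Lowest ex-post error first; the remaining entries serve as backups (step 15).
    return sorted(eligible, key=lambda name: eligible[name]["ex_post_error"])


# Hypothetical entries (illustrative only).
results = {
    "indicator_a": {"monthly_forecasts": [300, 280, 310], "ex_post_error": 6.2},
    "indicator_b": {"monthly_forecasts": [290, -15, 305], "ex_post_error": 4.1},
    "indicator_c": {"monthly_forecasts": [295, 300, 298], "ex_post_error": 8.9},
}
ranked = rank_indicators(results)
print(ranked)   # ['indicator_a', 'indicator_c']
```

Here indicator_b has the lowest error but is excluded because it produces a negative monthly forecast, so indicator_a becomes the primary choice and indicator_c the backup.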

MODEL REFRESH AND UPDATE ISSUES

To use linear regression models to forecast, one important underlying assumption is that the linear
relationship between the independent variable (the leading indicators in our models) and the dependent
variable (tire sales/shipment) will last into the future. The corresponding assumption for exponential
smoothing models is that the trend and seasonality will last into the future. In practice, however,
these assumptions won't hold forever, which raises the question of when to revisit the models. The
suggested re-modeling cycle for our project is 6 months. Every six months, when we have 6 more months
of new tire sales/shipment data, we can evaluate the effectiveness of each model. If the leading
indicator still works well, the only thing to do might be to add the new data to the modeling period
and drop an equal amount of old data, if necessary. If the chosen external economic variable loses its
power to lead tire sales/shipment, a backup leading indicator may be found in the "Forecast errors"
tab of each model/Excel file, or a completely new leading indicator should be brought in using the
15-step approach described above.
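The "add new data and drop an equal amount of old data" refresh amounts to sliding a fixed-length modeling window forward. A minimal sketch, assuming the modeling period is held as a list of monthly observations; the function name is hypothetical and the 78-month length simply mirrors the 6.5-year range mentioned earlier.

```python
def refresh_window(modeling_data, new_points):
    """Append new observations and drop the same number of the oldest ones."""
    combined = list(modeling_data) + list(new_points)
    return combined[-len(modeling_data):]


# Illustrative: months numbered 1..78 (6.5 years), then 6 new months arrive.
window = list(range(1, 79))
refreshed = refresh_window(window, range(79, 85))
print(len(refreshed), refreshed[0], refreshed[-1])   # 78 7 84
```

After the refresh, the regression would be re-run on the new window and the coefficients at the top of each market segment tab replaced.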

All the update information about the leading indicators chosen for this project is stored in the file
named Key Leading Indicators. Some of the economic variables for our 2 months out models have a
delivery lag of around 45 to 60 days. That means that, to use some of our causative models effectively,
we first need to obtain forecast values of the leading indicators. Sometimes these forecast values are
provided by the data source organizations; sometimes we need to produce the forecasts ourselves using
time series modeling techniques.
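When a leading indicator has to be projected in-house, any time series technique could be used; the text does not say which one was chosen. Purely as an illustration, the sketch below uses simple exponential smoothing as a stand-in, with a smoothing constant picked arbitrarily for the example.

```python
def ses_forecast(series, alpha=0.3):
    """One-step-ahead forecast by simple exponential smoothing.

    alpha is an illustrative smoothing constant, not a value from the project.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Made-up indicator history; the result fills in the month not yet published.
indicator_history = [10, 12, 11, 13]
next_value = ses_forecast(indicator_history, alpha=0.5)
print(next_value)   # 12.0
```

Simple exponential smoothing carries no trend or seasonality, so an indicator with a clear trend or seasonal pattern would call for a richer method such as the Holt Winters variants listed in the appendix.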

FILES LOCATION AND NAME

All the files related to this project are stored at the following location:

T:\NAT\703 Commercial Demand Planning\Commercial Modeling

For details about all the folders and their contents, please see Appendix [18].

FUTURE LOOK

Depending on the effectiveness of the causative models developed for this project as new data becomes
available, we can

Revise and maintain our current models

Transfer the modeling technique to Goodyear's other business segments

Automate the modeling procedures in Excel using an advanced programming language

APPENDIX

[1] RMA Commercial Truck Tire Classification

Market segment | Vehicle Application Code | Description
Urban | 220 | Light, Medium, and Wide-Base Truck Tires marketed to operate specifically in pickup and delivery service in a local area (e.g. retail and wholesale pick-up and delivery, emergency vehicles, and intracity bus fleets).
Regional | 230 | Medium, Wide Base and Heavy Truck Tires marketed to operate in a limited (150 mile radius) delivery or service related vocation (e.g. State & local government, emergency vehicles, public utility, school bus, food, petroleum and manufacturing goods distribution, and inter-modal piggy-back trailers).
Long haul | 240 | Medium, Wide Base and Heavy Truck Tires marketed to operate in long distance, high annual mileage operations (e.g. Less-Than-Trailer-Load, Trailer-Load, and Lease/Rental Fleets, Common Contract Carriers, and Inter-City Bus Fleets).
On-Off/Off Highway (Mixed service) | 250 | All Light, Medium, Wide Base, Heavy and Large-off-the-Road Truck Tires marketed to operate in off and on-off highway applications (e.g. construction, mining, sanitation, and logging)

[2] Comparison of monthly data with 12 months rolling sum data

[3] Using 12 months rolling average to smooth leading indicator will sometimes improve forecast results

[Figure: Industry Regional Market Segment Shipment Forecast. X-axis: months from Jan-08 to Jun-10. Y-axis: 0 to 700,000. Series: Actual History; Forecast using leading indicator's monthly data; Forecast using leading indicator's 12 months moving average.]

[4] Causative Models Utility Comparison Table

[5] Stepwise regression removes and adds variables to the regression model for the purpose of identifying a useful subset of the predictors. Minitab provides three commonly used procedures: standard stepwise regression (adds and removes variables), forward selection (adds variables), and backward elimination (removes variables).

When you choose the stepwise method, you can enter a starting set of predictor variables in Predictors in initial model. These variables are removed if their p-values are greater than the Alpha to enter value. If you want to keep variables in the model regardless of their p-values, enter them in Predictors to include in every model in the main dialog box.

When you choose the stepwise or forward selection method, you can set the value of Alpha for entering a new variable in the model in Alpha to enter.

When you choose the stepwise or backward elimination method, you can set the value of Alpha for removing a variable from the model in Alpha to remove.

[6] Best subsets regression identifies the best-fitting regression models that can be constructed with the predictor variables you specify. Best subsets regression is an efficient way to identify models that achieve your goals with as few predictors as possible. Subset models may actually estimate the regression coefficients and predict future responses with smaller variance than the full model using all predictors.

Minitab examines all possible subsets of the predictors, beginning with all models containing one predictor, and then all models containing two predictors, and so on. By default, Minitab displays the two best models for each number of predictors.

For example, suppose you conduct a best subsets regression with three predictors. Minitab will report the best and second best one-predictor models, followed by the best and second best two-predictor models, followed by the full model containing all three predictors.

[7] Best multiple regression models by Minitab Best-subsets method-10 variables

The regression equation is
TOTAL = 16972 - 65... - 171...

[8] Best multiple regression models by Minitab Best-subsets method-9 variables

The regression equation is
TOTAL = 12404 - 265... + 10...

[9] The best multiple regression model by Minitab Step-wise method-5 variables

The regression equation is
TOTAL = - 283 +...

[10] 12 months rolling sum forecasts absolute percentage errors comparison table and plot for single and multiple linear regression models

[Table and plot: monthly rows (Aug-04 onward) with columns including Date, Time, and 12 months.]

[11] Goodyear's 12 months rolling sum history plot for Total Market billed sales

[Figure: Time Series Plot of Gyt 12-month rolling sales. X-axis: Month/Year, Dec 2003 to Dec 2007. Y-axis: Gyt 12-month rolling sales, 3,000,000 to 4,000,000.]

Industry's 12 months rolling sum history plot for Total Market shipment

[Figure: Time Series Plot of Industry 12-month rolling Ship. X-axis: Month/Year, Dec 1996 to Dec 2007. Y-axis: Industry 12-month rolling Ship, 12,000,000 to 19,000,000.]

[12] Multiplicative Holt Winters method

A time series modeling technique that is able to capture increasing seasonal variation.

[13] Additive Holt Winters method

A time series modeling technique that is able to capture constant seasonal variation.

[14] Goodyear's Market Group and RMA name transformation

[15] Key Leading Indicators and their sources


N.O. | External Eco
Var 1 | Industrial
Var 2 | C
Var 3 | Real Retail
Var 4 | Ho
Var 5 | 2-4 Uni
Var 6 | Conference Board In
Var 7 | UM Index of
Var 8 | Di
Var 9 | WTI
Var 11 | M1 Mon
Var 12 | ISM

[16] The Data Analysis add-in in Excel 2007 can be activated as follows:

1. Click the Microsoft Office Button, and then click Excel Options.
2. Click Add-Ins, and then in the Manage box, select Excel Add-ins.
3. Click Go.
4. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.

Tip: If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.

If you get prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.

After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.

[17] ANOVA analysis generated by the Regression function of the Data Analysis add-in in Excel 2007

SUMMARY OUTPUT

Regression Statistics
Multiple R
R Square 0
Adjusted R Square 0
Standard Error 3
Observations

[18] Project Folders and their contents

2 months and 12 months out Ca

Greg Tomsho

2 months and 12 months ou

JamesKrein

Commercial Replacement Indust

Folder Name: 2 month

Goodyear Billed Sales Causative Mo

Goodyear Billed Sales Causative Mo
