bowei zhang scholarly potential_ goodyear project


  • 8/14/2019 Bowei Zhang Scholarly Potential_ Goodyear Project

    Written by: Bowei Zhang

    Proofread by: Steven Miller, Steven Subichin

    09/30/2009

    Last Revision Date: 11/24/2009


    Table of Contents

    PROJECT INTRODUCTION
    12 Months Rolling Sum and Lagged Leading Indicators
    Correlation Verified To Be Linear
    Market Share Forecast
    Modeling Data Geography-US Models extended to include Canada
    MODELING EFFORT 1-SIMPLE LINEAR REGRESSION MODELS
    Modeling Assumptions and Limitations
    How to Obtain Monthly Forecast from 12 Months Rolling Sum Forecasts
    Seasonality-Removed By 12 Months Rolling Sum
    Outliers-Strike Consideration
    Outliers Reduction-Smooth Economic Leading Indicators
    Model Utility and Residual Analysis
    MODELING EFFORT 2- MULTIPLE LINEAR REGRESSION MODELS
    How to Build Multiple Linear Regression Models in Minitab
    Reason for Not Using Multiple Regression Models Monthly Forecasts
    MODELING EFFORT 3- TIME SERIES MODELS
    DATA RANGE AND SOURCE
    EXPLANATION OF THE STANDARD LINEAR REGRESSION SPREADSHEET
    Common Tabs
    Unique Tabs
    Steps of Searching for New Leading Indicators
    MODEL REFRESH AND UPDATE ISSUES
    FILES LOCATION AND NAME
    FUTURE LOOK
    APPENDIX

    Goodyear and Industry North America Commercial Replacement Tire

    Causative Forecasting Models User


    PROJECT INTRODUCTION

    Lean inventory and efficient demand planning are two weapons that are especially important for any business to survive a recession. To achieve these two goals, a powerful demand forecasting system with a relatively high level of accuracy is necessary. The aim of the project is to build forecasting models that reveal the relationship between leading economic variables and Goodyear's business, so that by looking at the trend of those economic variables, Goodyear can tell the future highs and lows of its relevant business segments.

    Key members of the project include Steven Miller, Steven Subichin, Mike Ryan, Greg Tomsho and

    Bowei Zhang.

    This project is focused on Goodyear's and the total industry's performance in the US/North America commercial replacement tire markets. We split the commercial replacement tire market into four segments by tire application and wished to forecast the demand for each segment as well as for the total market.

    Raw data we have for this project are:

    • Monthly data of 74 leading economic variables (US) that may potentially relate to the commercial replacement tire market, from 01/1996 to 06/2009. (Multiple data sources)

    • Industry's monthly shipment data for each segment of the US/North America commercial replacement tire market (Urban/Regional/Long haul/Mixed service), from 01/1996 to 06/2009. (Data source: RMA)

    • Goodyear's monthly billed sales and shipment data for each segment of the US commercial replacement tire market (Urban/Regional/Long haul/Mixed service), from 01/2003 to 07/2009. (Data source: EDW)

    One thing worth noting is that RMA's and Goodyear's classifications of the four market segments are slightly different. We kept Goodyear's billed sales data for each market segment under Goodyear's own market classification criteria and regrouped Goodyear's shipment data using RMA's criteria. We did it this way because Goodyear's billed sales forecast will be used to assist DP, which uses Goodyear's market classification criteria, while Goodyear's shipment forecast will be used, together with the Industry shipment forecast, which uses RMA's criteria, to calculate Goodyear's future market share.

    12 Months Rolling Sum and Lagged Leading Indicators

    Initially we wished to find the potential linkage between external economic variables and the replacement tire business, be it a linear or a non-linear relationship. To reduce the modeling noise in the relatively small monthly billed sales and shipment values, and to identify their correlation with leading economic variables more easily, we substituted each monthly tire sales/shipment data point with the sum of the data for that month and the previous 11 months. Hereinafter this moving yearly data will be called the 12 months rolling sum. We calculated the correlation coefficients between the 12 months rolling sums of billed sales/shipments of Goodyear and Industry for each market segment and the monthly data of the 74 economic variables. We assumed that some of the variables have leading capabilities for the commercial replacement market. To test that, we simply lagged those variables by a certain number of months when we calculated the correlation coefficients. For example, if we think it takes the replacement tire market 2 months to respond to the movement of a leading indicator, then we would use the 2 months lagged data of that variable to calculate the correlation coefficients. For Goodyear's billed sales data, we calculated the correlation coefficients with up to 24 months of lag for the 74 variables. The numbers supported our assumption, because some variables have high correlation coefficients when lagged by a few months and others when lagged by many months.
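    The rolling sum and lagged correlation steps described above can be sketched as follows. This is a minimal illustration only, not the spreadsheet logic used in the project: the function names, the use of None for months without a full 12-month window, and the list-based data layout are all assumptions made for the example.

```python
from math import sqrt


def rolling_sum_12(monthly):
    """Return the 12 months rolling sum of a monthly series.

    Entry t is the sum of months t-11 .. t. The first 11 entries have no
    full window and are returned as None (an assumption of this sketch).
    """
    return [sum(monthly[t - 11:t + 1]) if t >= 11 else None
            for t in range(len(monthly))]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(indicator, rolling, lag):
    """Correlate rolling[t] with indicator[t - lag].

    Only months where both the rolling sum and the lagged indicator
    value exist are used.
    """
    pairs = [(indicator[t - lag], rolling[t])
             for t in range(len(rolling))
             if rolling[t] is not None and t - lag >= 0]
    xs, ys = zip(*pairs)
    return pearson(list(xs), list(ys))
```

    Calling lagged_correlation for every lag from 0 to 24 and every candidate variable would reproduce the kind of correlation table described in this section.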

    Correlation Verified To Be Linear

    Although the correlation coefficient is a tool to depict the strength of a linear relationship between two variables, the interpretation of its value can be very arbitrary. There is no set rule about what number is high or low, and sometimes high numbers don't necessarily mean purely linear relationships. So we also drew scatter plots to study the true relationship between tire sales/shipment and leading economic variables, and used correlation coefficients as a second reference.

    Using these two tools, we were able to identify some regular relationship patterns between external variables and sales/shipment data. By regular I mean that those relationship patterns can mostly be depicted by certain mathematical models. After careful consideration and comprehensive tests, we decided to build only simple linear regression models (which means one leading indicator matches one market segment) for ease of understanding and use in practice.

    We have now built simple linear regression causative models, with some level of confidence, for Goodyear's billed sales, Goodyear's shipment and Industry shipment to forecast 2 months and 12 months out for each of the four segments in the US only and North American commercial replacement tire markets. For Industry shipment models, we also built time series models to provide alternative views, and they all achieved decent forecast accuracy rates. (Monthly forecast ex-post errors for US only time series models range from 7.75% for Urban tires to 22.24% for Mixed Service tires; monthly forecast ex-post errors for North America market time series models range from 7.05% for Urban tires to 17.75% for Mixed Service tires.)

    Market Share Forecast

    Since we applied the same market classification criteria (by vehicle application code) [1] to Goodyear and Industry shipment data when grouping the data for each market segment, we are able to forecast (calculate) Goodyear's future market shares.

    However, a word of caution: even though we re-grouped Goodyear's shipment data using RMA's criteria, there are still some differences between RMA's definition of certain market segments and Goodyear's. One verification is that Goodyear's re-grouped shipment data (North America) for the Regional and Long haul segments are significantly different from RMA's adjustment and interpretation of the shipment data reported by Goodyear. But the differences within each market segment can offset each other to a certain extent. Also, due to a re-statement issue, Goodyear's total shipment data in EDW is different from the data sent back by RMA, on average by 7.5% during the period from 06/2007 to 04/2008.

    The above two facts mean that using different sources for Goodyear shipment data can lead to different causative models. RMA's official Goodyear shipment data has its value for other analysis endeavors. However, we chose to use EDW's Goodyear shipment data to build causative models and calculate Goodyear's future market share because the modeling results from this project are intended for internal use only.


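    $f_{13} = \left(\sum_{i=2}^{13} F_i - \sum_{i=1}^{12} F_i\right) + H_1$

    $= F_{13} + \sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i - F_1 + H_1$

    $= H_1 + (F_{13} - F_1) + \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$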

    $H_1$ is the true history value of Jan in the first year. $(F_{13} - F_1)$ is deemed the forecasted monthly increase/decrease year-over-year, which in our example is the change from Jan in the first year to Jan in the second year. We assume the forecasted values from the two rolling sum forecasts for the same 11 months are almost the same, namely that the artificial error term $\left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$, whose two sums come from the two different rolling sum forecasts, would be close to 0. If this assumption does not hold, our forecasted monthly value will deviate from the true monthly forecast that we wish to, but cannot, get directly from 12 months rolling sum forecasts. This is likely to happen to a monthly forecast value when the absolute percentage forecast errors of the two related 12 months rolling sum forecasts change dramatically, because that violates the $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$ assumption. Multiple regression models violate this assumption more easily and thus generate inaccurate forecasts. A more detailed discussion is given in the section Multiple Linear Regression.
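    As a minimal sketch of the transformation, the monthly forecast is the history value from one year earlier plus the change between two consecutive 12 months rolling sum forecasts. The function name, argument layout and the numbers below are hypothetical and only serve to illustrate the arithmetic.

```python
def monthly_from_rolling(rolling_forecast, history_year_ago):
    """Turn consecutive 12 months rolling sum forecasts into monthly forecasts.

    rolling_forecast: consecutive rolling sum forecasts, e.g. for the
        periods ending in month 12, 13, 14, ...
    history_year_ago: actual monthly values H_1, H_2, ..., where entry k is
        the month 12 periods before the month forecast by entry k + 1 of
        rolling_forecast.

    Each monthly forecast is H + (current rolling sum - previous rolling sum),
    which relies on the assumption that the 11 overlapping months of the two
    rolling sum forecasts cancel out.
    """
    return [h + (curr - prev)
            for h, prev, curr in zip(history_year_ago,
                                     rolling_forecast,
                                     rolling_forecast[1:])]


# Hypothetical numbers, for illustration only.
rolling = [1200.0, 1210.0, 1195.0]   # rolling sums ending in months 12, 13, 14
history = [100.0, 90.0]              # actual values of months 1 and 2
print(monthly_from_rolling(rolling, history))  # [110.0, 75.0]
```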

    Seasonality-Removed By 12 Months Rolling Sum

    One benefit of using 12 months rolling sum history as the dependent variable for linear regression models is that we don't have seasonality in the data. Appendix 2 is a comparison plot of monthly and 12 months rolling sum industry commercial replacement tire shipment data. As can be seen, the monthly data is more volatile and has some seasonality across the history. The 12 months rolling sum, on the other hand, does not have a seasonal pattern at all (this applies to every market segment in our analysis). But over the long term, the 12 months rolling sum may indicate some regular business cyclical pattern, which can be deemed a sort of seasonality when building time series models. This topic will be covered in more detail later.

    Normally for statistical modeling purposes, if the raw data has strong seasonality, we would have to deseasonalize it first, then build the model, and in the end reseasonalize the forecast. In our models, the transformation formula introduced above, $f_{13} = H_1 + (F_{13} - F_1) + \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$, adds seasonality back to the monthly forecasts through the monthly history term ($H_1$ in the formula). Hence by using this formula we avoided the seasonality issue in the raw data and kept seasonality in the monthly forecast. The leading indicators we picked for this project are all free of seasonality. However, in the future, if we want to bring in new economic variables with a seasonal pattern, we have to deseasonalize them before use.

    Outliers-Strike Consideration

    Outlying data points in either the independent or the dependent variables of regression and time series models can heavily skew forecast results and hence forecast accuracy. Among the many possible causes of outliers, an unusual one-time event can generate abnormal history data. For example, the strike that began on Oct 5th, 2006 and ended in early 2007 made Goodyear's commercial replacement tire sales in each market segment from 11-2006 to 01-2007 extremely low. In particular, sales for 12-2006 are lower than the lowest points during the recent recession. Hence for Goodyear's total replacement tire market, we overforecasted by about 63% and 117% for 11-2006 and 12-2006 respectively. To fix the problem, we replaced the monthly sales data from 11-2006 to 01-2007 with the average of the same month from 2003 to 2005 and reran the linear regression model. The total cumulative absolute forecast error during the model building period (12-2003 to 12-2007) decreased from the original 20.07% to 14.57%. However, the model validation error (ex-post error over the period from 01-2008 to 06-2009) increased a little, from the original 9.37% to 10.30%. One possible explanation is that without the outlier clean-up, the model learned from the dampened sales during the strike period and applied that learning to forecasts over the ex-post period, during which recession and dampened sales also exist. Hence data clean-up in this case did not improve the model's ex-post forecast accuracy. For details about the test, please refer to the tab named "Industry outlier fix" in the Excel file named "Goodyear Billed Sales Causative Models 2months out (US only)". After careful thought, we think our Goodyear models are robust enough to tolerate some outlying raw data in the model building period without deteriorating forecast results. Hence we kept our models as they are.
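    The clean-up rule tested above (replace each strike month with the average of the same calendar month over reference years) can be sketched as below. The dictionary layout keyed by (year, month), the function name and the sales figures are assumptions for illustration; they are not the project's data.

```python
def replace_with_same_month_average(sales, months_to_fix, reference_years):
    """Return a copy of `sales` with selected months replaced.

    sales: dict mapping (year, month) -> monthly value
    months_to_fix: list of (year, month) keys treated as outliers
    reference_years: years whose same-month values are averaged
    """
    cleaned = dict(sales)
    for year, month in months_to_fix:
        reference = [sales[(y, month)] for y in reference_years]
        cleaned[(year, month)] = sum(reference) / len(reference)
    return cleaned


# Hypothetical values: November sales for 2003-2005 and a depressed 2006.
sales = {(2003, 11): 100.0, (2004, 11): 110.0,
         (2005, 11): 120.0, (2006, 11): 40.0}
cleaned = replace_with_same_month_average(
    sales, months_to_fix=[(2006, 11)], reference_years=[2003, 2004, 2005])
print(cleaned[(2006, 11)])  # 110.0
```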

    Outliers Reduction-Smooth Economic Leading Indicators

    However, there is one easy way to reduce, at least partially, the outlying forecast points. Before we dive into that, let's first look at how we calculate the monthly forecast residual. Assuming $H_{13}$ is the monthly history value of the 13th month and using the transformation formula mentioned earlier, we get the following residual calculation formula:

    $H_{13} - f_{13} = H_{13} - H_1 - (F_{13} - F_1) - \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right) = (H_{13} - F_{13}) - (H_1 - F_1) - \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$

    As can be seen, an outlying residual can be caused by abnormal values of $H_{13}$ or $H_1$, or by an abnormal change in the leading indicator's monthly values (which may cause two consecutive 12 months rolling sum forecasts to change dramatically and thus very possibly violate our assumption that $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$). Hence one reasonable remedy for outlying monthly forecasts is to smooth the leading indicator's monthly values by replacing each of them with the average of the values for that month and the previous 11 months. This transformation of leading indicators will reduce the monthly forecasts' volatility. For an example, please refer to Appendix 3. We tested the transformation technique on the industry shipment 12 months out model for the North America region. As shown by the plot, the transformation makes the forecasts smoother and closer to the history values. In fact, the ex-post error during the period from 01/2008 to 06/2009 dropped from 21.59% to only 9.35% after we took the 12 months rolling average of the leading indicator's monthly values.

    This smoothing technique will not always generate more accurate forecast results. However, it will

    definitely help make monthly forecasts less volatile if the leading indicators used are volatile in nature.

    We applied this technique for Industry shipment 12 months out forecast models for US only and North

    America data.

    Model Utility and Residual Analysis

    Appendix 4 is a utility comparison table for the causative models. Key metrics used include R-square (both original and adjusted) and cumulative absolute percentage errors (ex-post period) of both causative and naive models. The naive models simply assume that what happened yesterday will happen again tomorrow. Hence for the naive models we take the current month's sales/shipment as the forecast for 2 months and 12 months out. As coded in blue in the rightmost column of this table, the ex-post forecast errors of the naive models are all higher than those of the causative models, except for the Regional market. We think this could be either a coincidence or a sign that the Regional market is, relatively speaking, stable enough to repeat its history values over time. Neither explanation invalidates the effectiveness of our causative models, though.

    Ex-post errors are cumulative absolute percentage errors. To be specific, this heuristic metric is calculated by dividing the sum of the absolute values of all monthly forecast residuals by the sum of the monthly history values over a certain period. We prefer to use this metric to evaluate our forecasts' absolute deviation from history rather than from a constant, as in R-square.
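    A minimal sketch of this metric and of the naive benchmark it is compared against is given below. The function names and the sample numbers are hypothetical; the naive forecast simply reuses the actual value observed a fixed number of months earlier, as described above.

```python
def cumulative_ape(history, forecast):
    """Cumulative absolute percentage error over a period.

    Sum of absolute monthly residuals divided by the sum of the monthly
    history values, returned as a fraction (0.1 means 10%).
    """
    total_abs_residual = sum(abs(h - f) for h, f in zip(history, forecast))
    return total_abs_residual / sum(history)


def naive_error(history, horizon):
    """Error of the naive model that forecasts month t with the actual
    value of month t - horizon (horizon = 2 or 12 in this document)."""
    actual = history[horizon:]
    forecast = history[:-horizon]
    return cumulative_ape(actual, forecast)


# Hypothetical numbers, for illustration only.
history = [100.0, 120.0, 80.0, 100.0]
forecast = [110.0, 110.0, 90.0, 90.0]
print(cumulative_ape(history, forecast))  # 0.1
```

    Because the denominator is the sum of history values rather than a per-month value, months with small volumes do not dominate the error, which suits the comparison of causative and naive models over the same ex-post period.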

    The R-square value is another tool to indicate the effectiveness of regression models. The higher the R-square (including the adjusted one) is, the larger the proportion of the total variation in the n observed values of the dependent variable that is explained by the overall regression model. However, there is no absolute standard for what a good value is. As can be seen from the table (color coded in yellow), Goodyear models have relatively high R-square and Industry models lower, despite the fact that Industry models are as accurate as Goodyear models in terms of monthly forecast ex-post errors. This brings up two questions. One is how long we can keep using the causative models before we have to revise them. (This question is addressed in the last section of this document.) The other is whether we can build multiple regression models that generate small monthly forecast ex-post errors and a high R-square (adjusted) value. (Addressed below.)

    MODELING EFFORT 2- MULTIPLE LINEAR REGRESSION MODELS

    How to Build Multiple Linear Regression Models in Minitab

    Since in general our Industry models have low R-square, the industry total market shipment 2 months out forecast model is picked for this test. What we wish to build is a multiple regression model which has a high R-square and a low cumulative absolute percentage error for the monthly forecast during the ex-post period. Minitab's automatic model selection function is used to perform the test.

    Ideally, it would be great if we could dump as many variables' data into Minitab as possible and let the computer generate an optimal solution for us. However, Minitab can only process a limited number of variables using the Stepwise [5] and Best subsets [6] selection methods. So some variables need to be screened out of the candidate pool as follows:

    • Select the 12 months rolling sum of total industry shipment and the 2 months lagged data of the 74 variables.

    • Calculate correlation coefficients between the rolling sum values and the 74 variables, and keep the variables that have a correlation coefficient higher than 0.4 or lower than -0.4. This step reduced the number of potential variables from 74 to 30.

    • Use Stat > Basic Statistics > Correlation in Minitab to generate the Correlation Matrix, which includes a P-value for each correlation coefficient between any pair of variables, including the dependent variable (shipment). (If the correlation coefficient between two variables is higher than 0.9 or lower than -0.9, then one of them can be considered redundant for the dependent variable in the model. If there are more than two mutually correlated variables, first compare their P-values with the dependent variable to screen out those with higher P-values; if the P-values are the same, keep those with a higher absolute value of correlation coefficient with the dependent variable.)

    • Use the Correlation Matrix to eliminate redundant variables. This step reduced the number of variables from 30 to 20.

    • Use the Stepwise and Best subsets methods in Minitab to generate the best multiple linear regression models.
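    The two screening steps that precede the Minitab run can be sketched as below. This is an illustration under stated assumptions, not the procedure actually run in Minitab: the function names are hypothetical, and instead of comparing P-values the sketch ranks candidates by the absolute correlation with the dependent variable, which gives the same ordering when every candidate has the same number of observations. The greedy strongest-first order of elimination is also an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_candidates(candidates, target,
                      keep_threshold=0.4, redundant_threshold=0.9):
    """Screen lagged indicators before model selection.

    candidates: dict mapping variable name -> list of lagged values
    target: list of 12 months rolling sum values (dependent variable)
    """
    # Step 1: keep variables whose correlation with the target is
    # above 0.4 or below -0.4.
    strength = {name: abs(pearson(values, target))
                for name, values in candidates.items()}
    kept = [name for name in candidates if strength[name] > keep_threshold]

    # Step 2: walk from the strongest to the weakest variable and drop any
    # variable that is nearly collinear (|r| > 0.9) with one already selected.
    selected = []
    for name in sorted(kept, key=lambda n: strength[n], reverse=True):
        if all(abs(pearson(candidates[name], candidates[s])) <= redundant_threshold
               for s in selected):
            selected.append(name)
    return selected
```

    The surviving variables would then be passed to a stepwise or best subsets routine, which this sketch does not attempt to reproduce.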

    The best two models generated by the Best subsets method are one 10-variable linear regression model [7] and one 9-variable linear regression model. [8]


    The best model from Step-wise method is a 5 variables linear regression model. [9]

    Even though these models all have high R-squares (around 90%) and low ex-post forecast errors for 12 months rolling sum forecasts, both of which are better than those of our original one-variable linear regression models, their ex-post forecast errors for monthly forecasts are especially high (around 40%, while that of our original model is around 9%). However, the multiple regression models' cumulative absolute percentage errors of monthly forecasts during the model building period (around 12%) are not very far from those of our original models (8%).

    Reason for Not Using Multiple Regression Models Monthly Forecasts

    We found that the reason why multiple regression models did not perform well for monthly forecasts during the ex-post period is related to the assumption behind our transformation formula from 12 months rolling sum forecast to monthly forecast. As mentioned before, we assume the forecasted values from the two 12 months rolling sum forecasts for the same 11 months are almost the same, namely $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$.

    However, this assumption is not always true and can be more easily violated by multiple linear regression models than by single linear regression models. The multiple linear regression models in our case all have high R-square values, which means that the variation of the dependent variable (shipment/sales of tires) is explained to a large extent by the multiple variables we included (despite the fact that, mathematically speaking, the more variables we add to a multiple regression model, the higher its R-square). The downside is that for multiple regression models we have more external factors to control, and each one's fluctuation can affect our final transformed monthly forecasts.

    Look at the data and plots in Appendix 10. The vertical axis of the plots in Appendix 10 is the cumulative absolute percentage error of the 12 months rolling sum forecast. The blue line represents the 12 months rolling sum forecast obtained by our original single linear regression model. The red and green lines are forecasts from two multiple linear regression models selected by the Best subsets method. As can be seen, before period 49, which is 12-2007, the multiple regression models are more accurate than the single linear regression model in terms of the 12 months rolling sum forecast.

    During the ex-post period, from 01-2008 to 06-2009, the forecast accuracy of the multiple regression models fluctuates more heavily than that of the single linear regression model. That is because there are more variables in multiple regression models, so it is more likely that the recession's impact on those variables will skew the 12 months rolling sum forecast. More fluctuation between two consecutive 12 months rolling sum forecasts will violate the $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$ assumption and cause the related monthly forecast to have a high forecast error.

    To sum up, due to the technique we used to transform 12 months rolling sum forecasts into monthly forecasts, and the fact that multiple regression models are more difficult to control and maintain, we think simple linear regression models are better for our modeling purposes even though they will have a relatively smaller R-square than multiple regression models.

    It is natural to think that if we can use monthly shipment/sales as dependent variables directly to build

    multiple regression models then we can have both high forecast accuracy and high R-square. However,

    the monthly data is too volatile compared with 12 months rolling sum values, and as tested, we can barely

    find well correlated external indicators for monthly shipment/sales data.


    MODELING EFFORT 3- TIME SERIES MODELS

    Other than causative models, we also tested time series models for tire sales/shipment data. The forecasting method we used is called exponential smoothing, which weights the observed time series values unequally. More recent observations are weighted more heavily than more remote observations. This modeling method studies the time series history's level, trend (optional) and seasonality (optional) and projects them into the future to make forecasts. As mentioned earlier, Goodyear's history data dates back to 2003 and Industry's back to 1996. Unlike causative models, for time series studies, the more data we have, the easier it is for us to capture trend and seasonality, if there are any. As shown by the plots in Appendix 11, Goodyear's history is too short to show obvious trend and seasonality, while the industry's history is long enough to be a good candidate for the Multiplicative Holt Winters method [12]. Actually, the seasonality indicated in the industry data is business cyclicality over the long term, because the 12 months rolling sum values don't have seasonality in themselves. But this cyclical pattern can be modeled as a sort of seasonality.

    Due to data availability, we built time series models for industry shipment only, using monthly data from 01/1996 to 12/2007, and tested each model over the period from 01/2008 to 06/2009. The monthly history data is very volatile even though it has trend and seasonality over history. To make sure we build the best time series models we can, we tested four models using both the monthly shipment data and the 12 months rolling sum shipment data for each market segment. The four models are: Level only; Level + Trend; Level + Trend + Increasing Seasonality (Multiplicative Holt Winters method); Level + Trend + Constant Seasonality (Additive Holt Winters method [13]). Hence in total, for each market segment, 8 time series models were tested using Minitab. As expected, the model that generates the smallest monthly forecast ex-post error for all the market segments except Mixed service is the Multiplicative Holt Winters method on 12 months rolling sum history data. The fact that Mixed service is an exception did not surprise us because of its relatively complex business structure. Mixed service's history plot does not show very typical and easily recognizable trend and seasonality patterns either. The best time series model for this market segment is level only using 12 months rolling sum history. This model generates a monthly forecast ex-post error of 22.24%, which is higher than those of all the Multiplicative Holt Winters models for the other market segments. The level only model means that if we wish to forecast multiple periods into the future after 06/2009, we would get the same 12 months rolling sum forecast for every future period. In that case, according to the transformation formula introduced previously ($f_{13} = \left(\sum_{i=2}^{13} F_i - \sum_{i=1}^{12} F_i\right) + H_1$), all future monthly forecasts will be the same as the monthly history values one year back. This is a kind of naive model too.

    All time series models are executed using Minitab. After uploading the history shipment data into Minitab, simply go to Stat > Time Series and choose Single Exp Smoothing, Double Exp Smoothing or Winters' Method for the level only, level + trend and level + trend + seasonality models, respectively. For the level only and level + trend models, Minitab can generate optimal models by automatically searching for the smoothing constants for the level and trend components that minimize the Sum of Square Errors. For the Holt Winters method, we have to define all three smoothing constants for level, trend and seasonality manually, although the default value of 0.2 for all three smoothing constants works well for our project most of the time.
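    To make the level only model concrete, the sketch below implements single exponential smoothing and picks the smoothing constant that minimizes the sum of squared one-step-ahead errors. It is an illustration only: the grid search is a simple stand-in and is not claimed to be the search Minitab performs, and initializing the level with the first observation is an assumption of this sketch.

```python
def single_exp_smoothing(series, alpha):
    """Level only exponential smoothing.

    Returns (fitted, level): the one-step-ahead fitted values and the final
    level. The final level is the forecast for every future period, which is
    why a level only model gives a flat forecast.
    """
    level = series[0]          # assumed initialization: first observation
    fitted = []
    for y in series:
        fitted.append(level)   # forecast made before seeing y
        level = alpha * y + (1 - alpha) * level
    return fitted, level


def best_alpha(series, grid=None):
    """Pick the smoothing constant with the smallest sum of squared errors,
    searching a simple grid of candidate values (0.01 to 0.99 by default)."""
    grid = grid or [i / 100 for i in range(1, 100)]

    def sse(alpha):
        fitted, _ = single_exp_smoothing(series, alpha)
        return sum((y - f) ** 2 for y, f in zip(series, fitted))

    return min(grid, key=sse)


# Hypothetical series, for illustration only.
fitted, level = single_exp_smoothing([10.0, 20.0, 30.0], alpha=0.5)
print(fitted, level)  # [10.0, 10.0, 15.0] 22.5
```

    Because the forecast for every future period equals the final level, two consecutive 12 months rolling sum forecasts are identical and the transformed monthly forecast reduces to the history value one year back.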

    All the Industry shipment modeling results are stored in an Excel file named "Industry Time Series modeling". For each market segment, there are three tabs in this Excel file. Take the urban market for example. In the "Urban forecast results" tab, the monthly forecasts, the monthly forecasts' cumulative absolute percentage errors and the history vs. forecast plots for each of the eight time series models are listed for comparison. In the "Rolling to monthly transform- Urb" tab, 12 months rolling sum forecasts generated by Minitab can be copied to the column named "Urban 12 months rolling sum forecast" to generate the monthly forecast in the rightmost column. The transformation formula previously defined is already embedded in this calculation. In the last tab, called "Error Calculator- Urban", transformed monthly forecasts or monthly forecasts generated directly by Minitab can be copied to the related column to get the forecast error statistics color coded in blue.

    DATA RANGE AND SOURCE

    As the old saying goes, "Garbage in, garbage out." To avoid this cliché in our project, we have to carefully maintain and process the raw data. All Goodyear billed sales data is available from EDW. Goodyear's market group names are slightly different from the RMA names for the four market segments; for a detailed transformation table, please refer to Appendix 14. All Goodyear shipment data has to be manually processed in order to apply RMA's market classification. This job was previously done by Greg Tomsho using the Materials number - Vehicle Application code table generated by Steven D. Miller. For a copy of this table, please see the file named "pbu03_all". All Industry shipment data by market segment is available from RMA. Contact Krista Liem for the latest industry data.

    All the key leading indicators used for our modeling purposes are summarized in the table named Key
    Leading Indicators [15]. In general, our leading indicators come from three sources: the Federal Reserve
    Bank of St. Louis; the Energy Information Administration, US Dept. of Energy; and Freight Transportation
    Research (FTR) Associates. The FTR database is updated monthly and can be accessed by Krista Liem.

    Another thing worth noting is the data range issue. For Goodyear's and Industry's causative models, we
    used sales/shipment history data from 2003 to 2009. For Industry's time series models, we used data from
    1996 to 2009. It makes sense to use more history data when studying purely the time series trend and
    seasonality pattern. However, since so many macro-economic factors can affect tire sales/shipment
    dramatically over a long period of time, it would be risky to use, say, 12 years of tire sales/shipment
    history data to build single linear regression models. As a matter of fact, at the initial stage of our
    project, we built causative models for Industry shipment using history data for the past 12.5 years, and
    then we reduced the data range to the past 6.5 years and re-ran the models. It turned out that with less
    shipment history data we got lower monthly forecast ex-post errors, and we had to change some of the
    leading indicators selected previously.

    Hence, to build effective causative models, we may have to consider dropping some of the oldest data in
    the modeling period when new data becomes available, whereas for time series models it is acceptable to
    include new data points while keeping the old data. Also, most of the organizations that supply our
    economic indicator data revise their published data periodically. When new monthly data becomes
    available, the data for past periods may also have changed. If that is the case, all revised data within
    our modeling data range should be used to re-run the model and obtain a new sales/shipment forecast.

    EXPLANATION OF THE STANDARD LINEAR REGRESSION SPREADSHEET

    All the causative models developed so far have the same standardized Excel spreadsheet structure.


    There are six files in total for each category of modeling and they are named as:

    Goodyear Billed Sales Causative Models 2 months out

    Goodyear Billed Sales Causative Models 12 months out

    Goodyear Shipment Causative Models 2 months out

    Goodyear Shipment Causative Models 12 months out

    Industry Shipment Causative Models 2 months out

    Industry Shipment Causative Models 12 months out

    There are two sets of models. One set for US only data and another set for North America data. Hence in

    total there are 12 files. Every file contains the following 11 tabs. Take Goodyear Billed Sales Causative

    Models 2 months out (US only) for example.

    Common Tabs

    1. ReadMe:

    It contains description of the models within the Excel file and description of each indexed tab and

    how to use them.

    2. Scatter Plots:
    For each market segment and each of the 74 economic variables, there is a matching scatter plot
    generated in this tab. All the data used comes from the tab "x-months Lagging Data Set". If the
    current structure of the data in that tab does not change, the scatter plots will update automatically
    as the data changes. However, if new data is added, we have to manually change the plots to
    reflect the new data points. To do that, right click on a plot and choose "Select data";
    you will then be directed to the tab "x-months Lagging Data Set", where you can re-select the
    raw data.

    3. x-months Lagging Data Set:
    x can be either 2 or 12, depending on the purpose of the model. Our causative models are able to
    forecast the dependent variable's future values because we lagged the independent variables
    while constructing the linear regression relationships. If we wish to forecast 2 months out, we
    lag the leading indicators by 2 months; if we wish to forecast 12 months out, we lag the leading
    indicators by 12 months. Hence, in this tab of the 2 months out file, billed sales 12 months rolling
    sum data and the 74 economic variables lagged by 2 months are listed from 12-2003 to
    06-2009, which includes both the model building period and the validation (ex-post) period.
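    The lagging described above can be sketched in a few lines. This is only an illustration of the
    alignment, assuming both series are lists ordered by month; the function name and the sample
    values are hypothetical and do not come from the project files.

```python
def build_lagged_pairs(rolling_sum, indicator, lag):
    """Pair the rolling sum at month t with the indicator at month t - lag.

    Both lists must cover the same months, oldest first.
    Returns a list of (lagged_indicator_value, rolling_sum_value) tuples.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(rolling_sum) != len(indicator):
        raise ValueError("series must cover the same months")
    return [(indicator[t - lag], rolling_sum[t])
            for t in range(lag, len(rolling_sum))]


# Illustrative values only: a 2 months out model uses lag=2.
pairs = build_lagged_pairs([10, 20, 30, 40], [1, 2, 3, 4], lag=2)
print(pairs)
```

    With a lag of 2, the indicator value observed in month t is already known when the rolling sum
    for month t + 2 has to be forecast, which is what makes the model usable 2 months out.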

    4. All Data:
    This tab lists the monthly history data of all 74 variables from 01-1996 to 06-2009. Some variables
    may have missing data points for the most recent history. This tab was set up to store any history
    data used for the project.

    5. Correlation Coefficients:
    This tab contains the monthly history of tire sales and the automatically calculated 12 months rolling
    sum values. The monthly history data of all 74 variables is also listed here. The table outlined with a
    red dotted line at the bottom of this tab lists the correlation coefficients (calculated using the
    =CORREL() function in Excel) between the 12 months rolling sum for each market segment, over the
    period from 12-2003 to 06-2009 (the same period as used in the scatter plots), and the 74 variables
    lagged by 2 months. All correlation coefficients whose absolute values are above 0.80 (80%) are
    highlighted in color using Conditional Formatting in Excel. To obtain updated correlation coefficients
    as new data comes in, you may have to add new monthly sales, drag down Excel cells to get 12 months rolling


    sums, add new monthly data for the 74 variables, and reset the embedded formula to include the new
    12 months rolling sum and leading indicators.
    This tool is used together with the scatter plots to detect potential linear relationships between
    leading indicators and tire sales data.
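    The screening done in this tab can be sketched as follows. The correl function below computes the
    Pearson correlation coefficient, which is what Excel's CORREL returns; the 0.8 threshold is the one
    stated above. The function names, the dictionary layout and the sample data are assumptions made
    for illustration only.

```python
from math import sqrt


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)  # undefined if either series is constant


def screen_indicators(rolling_sum, indicators, lag, threshold=0.8):
    """Return {name: r} for indicators whose lagged |r| exceeds threshold."""
    target = rolling_sum[lag:]
    selected = {}
    for name, series in indicators.items():
        lagged = series[:len(series) - lag]  # indicator at month t - lag
        r = correl(lagged, target)
        if abs(r) > threshold:
            selected[name] = r
    return selected


# Illustrative values only.
rolling = [1, 2, 3, 4, 5, 6]
candidates = {"a": [1, 2, 3, 4, 5, 6], "b": [3, 1, 4, 1, 5, 9]}
print(screen_indicators(rolling, candidates, lag=2))
```

    A high coefficient only flags a candidate; as the text notes, it is meant to be read together with
    the scatter plot before a variable is taken forward for testing.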

    6. Forecast errors:
    This tab lists all the selected variables (chosen using the scatter plots and correlation coefficients
    in previous tabs) and their cumulative absolute percentage errors during the period 01/2008 to 06/2009
    (ex-post errors) for both the 12 months rolling sum forecasts and the transformed monthly forecasts.
    The ex-post errors of the 12 months rolling sum forecasts are used to monitor how effectively our
    simple linear regression models capture the potential linear relationship between 12 months rolling
    sum sales and leading indicators. If the relationship is close to linear, this ex-post error should be
    small. The monthly forecast ex-post error is used to check whether our model can generate decent
    monthly forecasts in the near future. Normally, the ex-post error for the 12 months rolling sum
    forecast should be smaller than that of the monthly forecast.

    7. Urban-x:
    Tabs 7 to 11 are the models we used to generate monthly forecasts. All of these tabs have the same
    structure and are self-explanatory. For illustration purposes, a detailed explanation is provided
    here only for the Urban-2 tab of the Goodyear Billed Sales 2 months out model.
    The only two columns that need to be updated from external data sources are the monthly history
    of tire sales and the column named after the selected leading indicator. You can drag down the
    column named "12 months rolling sum history" to get the 12 months rolling sum needed for
    modeling.
    Then use the Regression function in the Excel add-in called Data Analysis [16] to select the
    dependent variable, which is the 12 months rolling sum history of tire sales, and the independent
    variable, which is the 2 months lagged monthly history of the leading indicator, over the modeling
    period. The Regression function in Data Analysis will generate a detailed ANOVA analysis as shown in
    Appendix 17. The two numbers color coded in orange are the coefficients for the constant and for the
    leading indicator in the simple linear regression model. Copy those two numbers to the corresponding
    locations at the top of the table; the monthly forecasts (in the rightmost column of the table) and the
    forecast errors (at the top right of the table) will then be generated automatically. If the data
    currently selected for model building is still used for forecasting future monthly sales, you don't
    have to change the coefficients previously entered at the top of the table. When new monthly sales
    data and leading indicator data become available, you can add them in and change the formula for the
    new ex-post error calculation. If, after a certain period of time, new data needs to be added to the
    modeling period, you have to rerun the Data Analysis add-in and reselect the corresponding 12 months
    rolling sum tire sales and leading indicator monthly data.
    Most of the data for new cells can be obtained by dragging down the cells in Excel.
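    For readers who want to check the two coefficients outside Excel, the following is a minimal sketch
    of an ordinary least squares fit with one predictor, which is the model form the Regression tool
    estimates here. The function names and sample numbers are hypothetical; the sketch does not
    reproduce the spreadsheet layout or the ANOVA output.

```python
def fit_simple_regression(x, y):
    """Least squares fit of y = intercept + slope * x.

    x: lagged leading indicator values over the modeling period.
    y: 12 months rolling sum of tire sales for the same months.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_rolling_sum(intercept, slope, lagged_indicator):
    """Apply the fitted line to new lagged indicator values."""
    return [intercept + slope * value for value in lagged_indicator]


# Illustrative values only.
intercept, slope = fit_simple_regression([1, 2, 3, 4], [5, 7, 9, 11])
print(intercept, slope)
print(forecast_rolling_sum(intercept, slope, [5, 6]))
```

    The intercept and slope correspond to the two orange numbers copied to the top of the table; the
    resulting rolling sum forecasts still have to be transformed into monthly forecasts.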

    8. Regional-x:
    See tab 7 for instructions.

    9. Long haul-x:
    See tab 7 for instructions.

    10. Mixed service-x:
    See tab 7 for instructions.

    11. Total Market-x:
    See tab 7 for instructions.


    Steps of Searching for New Leading Indicators

    The logical steps of using multiple tabs in each model/excel file to search for the best leading indicators

    for each market segment can be described as follows. Take Goodyear Billed Sales Causative Models 2

    months out (US only) for example.

    1. Update the tire sales and leading indicators monthly data in the tab Correlation Coefficients.
    2. Adjust the formulas to include the new data when refreshing the correlation coefficient
    calculation table in this tab.
    3. Copy and paste the new tire sales 12 months rolling sum data (including the ex-post period) and
    leading indicators monthly data to the tab 2-month lagging data sets.
    4. Go to the tab Scatter Plots and update the scatter plots one by one, if necessary, to include the
    new data added in the tab 2-month lagging data sets.
    5. Observe the scatter plots. If a linear relationship is found, consider that variable a candidate
    for testing.
    6. If a linear relationship is not obvious, use the tab Correlation Coefficients to search for
    variables with high correlation coefficients with the 12 months rolling sum tire sales data.
    7. To test all the candidate variables for a specific market segment, copy their data to the
    corresponding market segment tab one by one, then perform the following test starting from step 8.
    8. Update the monthly data for both tire sales and the selected leading indicator in the specific
    market segment tab.
    9. Click Data, then Data Analysis, then Regression in Excel to select the matching 12 months rolling
    sum sales and the lagged monthly data for the leading indicator (lagged by 2 months in this case), and
    perform the ANOVA analysis.
    10. Copy the coefficients for the constant and the variable in the linear regression model from the
    ANOVA analysis generated by Excel to the corresponding positions at the top of the market segment tab.
    11. Drag down the columns called 12 months rolling sum forecast and monthly forecast if
    necessary. All formulas are already embedded.
    12. Copy and paste the ex-post forecast errors for both the 12 months rolling sum and monthly
    forecasts, automatically generated at the top right of the table in the market segment tab, to the
    corresponding positions in the tab named Forecast errors.
    13. Repeat steps 7 to 12 until the ex-post errors generated by every potential leading indicator are
    recorded in the Forecast Errors tab.
    14. Select the one variable that does not generate negative monthly forecasts and gives a low monthly
    forecast ex-post error.
    15. If outlying monthly forecasts are generated by a chosen leading indicator, either a manual
    adjustment of the forecast is required or a backup leading indicator can be selected from the tab
    Forecast Errors.
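    The selection rule in steps 14 and 15 can be sketched as below. This is an illustration under stated
    assumptions: each candidate is represented by its monthly forecasts and a single ex-post error
    figure, and "low" is read as "lowest among the usable candidates". The function name, the data
    layout and the sample figures are hypothetical.

```python
def choose_indicator(candidates):
    """Pick a leading indicator and rank the backups.

    candidates: {name: (monthly_forecasts, monthly_ex_post_error)}
    Indicators producing any negative monthly forecast are excluded.
    Returns (best_name, backup_names sorted by increasing error).
    """
    usable = {
        name: error
        for name, (forecasts, error) in candidates.items()
        if all(value >= 0 for value in forecasts)
    }
    if not usable:
        return None, []
    ranked = sorted(usable, key=usable.get)
    return ranked[0], ranked[1:]


# Illustrative values only.
candidates = {
    "A": ([100, -5], 0.02),   # rejected: negative monthly forecast
    "B": ([90, 80], 0.06),
    "C": ([95, 85], 0.04),
}
print(choose_indicator(candidates))
```

    The backups returned here play the role of the alternative indicators kept in the Forecast Errors
    tab; detecting outlying forecasts, as in step 15, is left to manual review.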

    MODEL REFRESH AND UPDATE ISSUES

    To use linear regression models to forecast, one important underlying assumption is that the linear
    relationship between the independent variable (leading indicators in our models) and the dependent
    variable (tire sales/shipment) will last into the future. The corresponding underlying assumption for
    exponential smoothing models is that the trend and seasonality will last into the future. In practice,
    however, these assumptions won't hold forever. That raises the question of when to revisit the models.
    The suggested re-modeling cycle is 6 months for our project. Every six months, when we have 6 more months of


    new tire sales/shipment data, we can evaluate the effectiveness of each model. If the leading indicator
    still works well, then the only thing to do might be to add the new data to the modeling period and drop
    an equal amount of the oldest data, if necessary. If the chosen external economic variable loses its
    power to lead tire sales/shipment, then a backup leading indicator may be found in the Forecast Errors
    tab of each model/Excel file, or a completely new leading indicator should be brought in using the
    15-step approach described above.

    All the update information about the leading indicators chosen for this project is stored in the file
    named Key Leading Indicators. Some of the economic variables for our 2 months out models have a
    delivery lag of around 45 to 60 days. That means that, to use some of our causative models effectively,
    we first need to obtain forecast values of the leading indicators. Sometimes these forecast values are
    provided by the data source organizations. Sometimes we need to produce the forecasts ourselves using
    time series modeling techniques.

    FILES LOCATION AND NAME

    All the files related to this project are stored at the following location:

    T:\NAT\703 Commercial Demand Planning\Commercial Modeling

    For details about all the folders and their contents, please see Appendix 18.

    FUTURE LOOK

    Depending on the effectiveness of the causative models developed for this project as new data becomes
    available, we can:

    Revise and maintain our current models

    Transfer the modeling technique to Goodyear's other business segments

    Automate the modeling procedures in Excel using an advanced programming language

    APPENDIX

    [1] RMA Commercial Truck Tire Classification

    Market segment | Vehicle Application Code | Description

    Urban | 220 | Light, Medium, and Wide-Base Truck Tires marketed to operate specifically in pickup and delivery service in a local area (e.g. retail and wholesale pick-up and delivery, emergency vehicles, and intracity bus fleets).

    Regional | 230 | Medium, Wide Base and Heavy Truck Tires marketed to operate in a limited (150 mile radius) delivery or service related vocation (e.g. State & local government, emergency vehicles, public utility, school bus, food, petroleum and manufacturing goods distribution, and inter-modal piggy-back trailers).

    Long haul | 240 | Medium, Wide Base and Heavy Truck Tires marketed to operate in long distance, high annual mileage operations (e.g. Less-Than-Trailer-Load, Trailer-Load, and Lease/Rental Fleets, Common Contract Carriers, and Inter-City Bus Fleets).

    On-Off/Off Highway (Mixed service) | 250 | All Light, Medium, Wide Base, Heavy and Large-off-the-Road Truck Tires marketed to operate in off and on-off highway applications (e.g. construction, mining, sanitation, and logging)

    [2] Comparison of monthly data with 12 months rolling sum data

    [3] Using 12 months rolling average to smooth leading indicator will sometimes improve forecast results

    [Figure: Industry Regional Market Segment Shipment Forecast. Monthly values from Jan-08 to Jun-10 on a vertical scale of 0 to 700,000. Series: Actual History; Forecast using leading indicator's monthly data; Forecast using leading indicator's 12 months moving average.]

    [4] Causative Models Utility Comparison Table

    [5] Stepwise regression removes and adds variables to the regression model for the purpose of identifying a useful subset of the predictors. Minitab provides three commonly used procedures: standard stepwise regression (adds and removes variables), forward selection (adds variables), and backward elimination (removes variables).

    When you choose the stepwise method, you can enter a starting set of predictor variables in Predictors in initial model. These variables are removed if their p-values are greater than the Alpha to enter value. If you want to keep variables in the model regardless of their p-values, enter them in Predictors to include in every model in the main dialog box.

    When you choose the stepwise or forward selection method, you can set the value of Alpha for entering a new variable in the model in Alpha to enter.

    When you choose the stepwise or backward elimination method, you can set the value of Alpha for removing a variable from the model in Alpha to remove.

    [6] Best subsets regression identifies the best-fitting regression models that can be constructed with the predictor variables you specify. Best subsets regression is an efficient way to identify models that achieve your goals with as few predictors as possible. Subset models may actually estimate the regression coefficients and predict future responses with smaller variance than the full model using all predictors.


    Minitab examines all possible subsets of the predictors, beginning with all models containing one predictor, and then all models containing two predictors, and so on. By default, Minitab displays the two best models for each number of predictors.

    For example, suppose you conduct a best subsets regression with three predictors. Minitab will report the best and second best one-predictor models, followed by the best and second best two-predictor models, followed by the full model containing all three predictors.

    [7] Best multiple regression models by Minitab Best-subsets method-10 variables

    The regressi

    TOTAL = 16972

    - 65

    - 171

    [8] Best multiple regression models by Minitab Best-subsets method-9 variables


    The regressi

    TOTAL = 12404

    - 265

    + 10.

    [9] The best multiple regression model by Minitab Step-wise method-5 variables

    The regressi

    TOTAL = - 283+

    [10] 12 months rolling sum forecasts absolute percentage errors comparison table and plot for single and multiple linear regression models


    [11] Goodyear's 12 months rolling sum history plot for Total Market billed sales

    [Figure: Time Series Plot of Gyt 12-month rolling sales. Horizontal axis: Month and Year, 2003 to 2007. Vertical axis: Gyt 12-month rolling sales, 3,000,000 to 4,000,000.]

    Industry's 12 months rolling sum history plot for Total Market shipment

    [Figure: Time Series Plot of Industry 12-month rolling Ship. Horizontal axis: Month and Year, 1996 to 2007. Vertical axis: Industry 12-month rolling Ship, 12,000,000 to 19,000,000.]

    [12] Multiplicative Holt Winters method

    A time series modeling technique that is able to capture increasing seasonal variation.

    [13] Additive Holt Winters method

    A time series modeling technique that is able to capture constant seasonal variation.
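    As an illustration of [13], the additive Holt Winters recursions can be sketched as follows, using
    0.2 for all three smoothing constants as mentioned in the time series section. This is a minimal
    sketch, not Minitab's implementation: the way the level, trend and seasonal terms are initialized
    here is an assumption, and the function name and sample data are hypothetical. The multiplicative
    form [12] divides by the seasonal term where this version subtracts it.

```python
def holt_winters_additive(y, period, horizon, alpha=0.2, beta=0.2, gamma=0.2):
    """Additive Holt Winters forecast for the next `horizon` periods.

    y: observations, oldest first (at least two full seasons).
    period: season length, e.g. 12 for monthly data.
    """
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    # Assumed initialization from the first two seasons.
    first = sum(y[:period])
    second = sum(y[period:2 * period])
    level = first / period
    trend = (second - first) / period ** 2
    seasonal = [y[i] - level for i in range(period)]

    for t, value in enumerate(y):
        s = seasonal[t % period]
        prev_level = level
        level = alpha * (value - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (value - level) + (1 - gamma) * s

    n = len(y)
    return [level + h * trend + seasonal[(n + h - 1) % period]
            for h in range(1, horizon + 1)]


# Illustrative data only: a constant seasonal pattern with no trend.
print(holt_winters_additive([10, 20, 30, 40] * 3, period=4, horizon=4))
```

    On a series with a constant seasonal pattern and no trend, the forecasts simply repeat that
    pattern, which matches the description of constant seasonal variation above.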

    [14] Goodyear's Market Group and RMA name transformation

    [15] Key Leading Indicators and their sources


    N.O.

    Var 1 Industrial

    Var 2 C

    Var 3 Real Retail

    Var 4 Ho

    Var 5 2-4 Uni

    Var 6 Conference Board In

    Var 7 UM Index of

    Var 8 Di

    Var 9 WTI

    Var 11 M1 Mon

    Var12 ISM

    External Eco

    [16] Add-in Data Analysis in Excel 2007 can be activated as follows:


    1. Click the Microsoft Office Button, and then click Excel Options.

    2. Click Add-Ins, and then in the Manage box, select Excel Add-ins.

    3. Click Go.

    4. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.

    Tip: If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.

    If you get prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.

    After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.

    [17] ANOVA analysis generated by Regression function of Data analysis add-in in Excel 2007

    SUMMARY OUTPUT

    Regression Sta

    Multiple R

    R Square 0

    Adjusted R Square 0

    Standard Error 3

    Observations

    [18] Project Folders and their contents

    2 months and 12 months out Ca

    Greg Tomsho

    2 months and 12 months ou

    JamesKrein

    Commercial Replacement Indust

    Folder Name: 2 month

    Goodyear Billed Sales Causative Mo

    Goodyear Billed Sales Causative Mo