Booze’d Trees¹: Predicting Beer Sales in Mexico
Ludwig Schubert & Dan Zylberglejd for Anheuser-Busch InBev
Task & Dataset | Methodology | Results & Discussion
CS 229 Poster Session | 2016-12-05 | Finance & Commerce | Dan Zylberglejd (DZYLBER) & Ludwig Schubert (SCHUBERL)
¹ We should have let data guide our search for a project name.
We predict monthly sales volume of beer brands in Mexico using regional sales data from Anheuser-Busch InBev.
Why is accurate sales prediction so important?
Large-scale production chains are slow to react to volume adjustments, leading to what supply chain management calls the bullwhip effect:
Small changes in consumer demand trigger outsized effects within the supply chain, exacerbating prediction inaccuracies.
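To make the amplification concrete, here is a toy simulation that is not taken from the poster. Each stage (Retail, Wholesale, Manufacturer) is assumed to follow a hypothetical trend-chasing ordering rule: it orders what it just saw plus half of the latest change. The consumer demand numbers are made up.

```python
import statistics


def upstream_orders(demand, alpha=0.5):
    """Orders a stage passes upstream under a hypothetical trend-chasing policy.

    Each period the stage orders what it just received as demand plus
    alpha times the latest change in that demand, i.e. it over-reacts.
    """
    orders = [demand[0]]
    for t in range(1, len(demand)):
        orders.append(demand[t] + alpha * (demand[t] - demand[t - 1]))
    return orders


# Toy consumer demand: small fluctuations around 100 units (made-up numbers).
consumer = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98]

levels = {"Consumer": consumer}
previous = consumer
for stage in ["Retail", "Wholesale", "Manufacturer"]:
    # Each stage sees the orders of the stage below it as its demand.
    previous = upstream_orders(previous)
    levels[stage] = previous

for name, series in levels.items():
    print(f"{name:<12} std of orders = {statistics.pstdev(series):6.2f}")
```

The standard deviation of orders grows at every step up the chain, even though consumer demand only moves by a few units.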
We use this to motivate optimizing MSE instead of MAPE:
What data do you use?
Our dataset contains roughly 70,000 data points (25,000 sales months), broken down by region, brand and unit size, leaving us with only about 36 monthly sales numbers per combination.
We augmented this with a variety of environmental data:
We fit all models on sales data up to December 2015, spanning multiple years to capture seasonality.
Predictions are made one month at a time, two months into the future, as dictated by supply chain requirements.
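A minimal sketch of the two-month-ahead setup, assuming one monthly series per brand × region × size combination: when forecasting month t, only values up to month t − 2 may be used as features. The function name `make_supervised` and the choice of three lags are illustrative assumptions, not the authors' feature set.

```python
def make_supervised(series, horizon=2, n_lags=3):
    """Turn a monthly sales series into (features, target) pairs.

    For target month t the features are the n_lags most recent values that
    are already known `horizon` months earlier: series[t - horizon],
    series[t - horizon - 1], ...; months t - horizon + 1 .. t are never used.
    """
    rows = []
    for t in range(horizon + n_lags - 1, len(series)):
        features = [series[t - horizon - k] for k in range(n_lags)]
        rows.append((features, series[t]))
    return rows


print(make_supervised([10, 11, 12, 13, 14, 15]))
# [([12, 11, 10], 14), ([13, 12, 11], 15)]
```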
The seemingly chaotic changes in sales volume over time motivate using ML rather than estimates based on constant year-over-year (cYOY) or simple year-over-year (YOY) trends.
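The poster does not define its YOY baselines precisely. One plausible reading, assumed here, is sketched below: a simple YOY forecast repeats the same month of the previous year, and a hypothetical growth-adjusted variant scales that value by the latest year-over-year ratio still known two months ahead.

```python
def yoy_forecast(series, t):
    """Simple YOY forecast (assumed definition): same month last year."""
    return series[t - 12]


def growth_adjusted_yoy(series, t, horizon=2):
    """Hypothetical variant: last year's value times the most recent YOY ratio
    that is available `horizon` months before month t (requires t >= 12 + horizon)."""
    recent = t - horizon
    return series[t - 12] * series[recent] / series[recent - 12]


monthly = [100] * 12 + [110] * 12   # made-up: a flat 10% jump after one year
print(yoy_forecast(monthly, 20))          # 100
print(growth_adjusted_yoy(monthly, 20))   # 110.0
```

Both rules extrapolate last year's pattern mechanically, which is why irregular month-to-month swings hurt them.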
Which models do you use?
Initial training-error distributions indicated a high-bias problem, so we moved to the more complex models listed below.
All models were hyperparameter-tuned with 5-fold cross-validation on the training data only, optimizing MSE.
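The tuning procedure (5-fold cross-validation on the training data, keeping the hyperparameter with the lowest mean validation MSE) can be sketched generically. The poster does not give model internals, so this sketch swaps in a one-dimensional ridge regression as a stand-in model; the data, the penalty grid and the use of contiguous folds are assumptions.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def kfold_cv_mse(xs, ys, lam, k=5):
    """Average validation MSE of ridge_1d over k contiguous folds."""
    n = len(xs)
    fold_size = n // k
    errors = []
    for f in range(k):
        start = f * fold_size
        stop = n if f == k - 1 else start + fold_size   # last fold takes the remainder
        train_x, train_y = xs[:start] + xs[stop:], ys[:start] + ys[stop:]
        w = ridge_1d(train_x, train_y, lam)
        fold = [(w * x - y) ** 2 for x, y in zip(xs[start:stop], ys[start:stop])]
        errors.append(sum(fold) / len(fold))
    return sum(errors) / k


def tune(xs, ys, grid, k=5):
    """Pick the hyperparameter with the lowest k-fold CV MSE (training data only)."""
    scores = {lam: kfold_cv_mse(xs, ys, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Made-up training data: y is roughly 2x with a small alternating wiggle.
xs = [float(i) for i in range(1, 21)]
ys = [2 * x + (1 if i % 2 else -1) for i, x in enumerate(xs)]
best_lam, cv_scores = tune(xs, ys, grid=[0.0, 1.0, 10.0, 100.0, 1000.0])
print(best_lam, cv_scores[best_lam])
```

With real models the loop is the same; only the fit and predict steps inside each fold change.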
Bagging + LASSO gives the most robust result on validation
Our model improves on the state of the art on the test set
ABI currently employs a team of expert analysts to predict sales. Our predictions are slightly more precise, and combining both predictions makes them even better:
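The poster does not say how the two forecasts were combined. A minimal sketch, assuming an equal-weight average of the analysts' forecast and the model forecast, on made-up numbers:

```python
def mse(truth, pred):
    """Mean squared error between actual and predicted values."""
    return sum((p - y) ** 2 for y, p in zip(truth, pred)) / len(truth)


def combine(pred_a, pred_b, weight=0.5):
    """Weighted average of two forecasts; equal weights are an assumption."""
    return [weight * a + (1 - weight) * b for a, b in zip(pred_a, pred_b)]


truth = [100, 200, 300, 400]
analysts = [110, 190, 320, 390]   # made-up analyst forecasts
model = [95, 215, 290, 405]       # made-up model forecasts
blend = combine(analysts, model)

print(mse(truth, analysts), mse(truth, model), mse(truth, blend))
# 175.0 93.75 10.9375
```

Averaging helps only to the extent that the two forecasts' errors are not perfectly correlated; in this toy data they partly cancel by construction.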
This resulted in ABI management greenlighting these models for implementation in their forecasting pipeline.
References & Thanks
We want to thank Jose Eduardo Villarreal Perez and his team at Anheuser-Busch InBev for guiding us through their data set. Icons licensed from Noun Project. All other charts were created by us. Beer bottle images used with permission from ABI.
[Figure: data timeline from January ’14 to August ’16. 2014 to 2015: 19,424 samples, split 4:1 into training and validation; 2016 to August ’16: 6,933 samples.]
[Figure: the bullwhip effect. Orders over time at four levels (Consumer, Retail, Wholesale, Manufacturer); each level places orders with the next level upstream and holds stock.]
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2 \tag{1}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{Y}_i - Y_i}{Y_i}\right| \tag{2}$$
MSE (1): captures outsized effects on supply
MAPE (2): overemphasizes errors on smaller brands
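A small worked example of the trade-off between the two metrics, using two hypothetical brands with made-up volumes: a 10% miss on a large brand dominates MSE, while a 50% miss on a small brand dominates MAPE even though it is far fewer units.

```python
def mse(actual, predicted):
    """Mean squared error, equation (1)."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, equation (2); undefined if any actual is 0."""
    return sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


# Two hypothetical brands: a large one and a small one (made-up volumes).
actual = [10_000, 100]
predicted = [9_000, 50]   # 1,000 units (10%) off vs. 50 units (50%) off

print(mse(actual, predicted))   # 501250.0, dominated by the large brand
print(mape(actual, predicted))  # 0.3, dominated by the small brand
```

Since supply chain costs scale with absolute volume, the large-brand miss is the one that matters most, which is what MSE rewards.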
Model comparison (bar length proportional to MSE; numbers show RMSE in hectolitres):
Baseline (YOY): 1877
Linear Regression: 1601
Boosting: 1564
Random Forest: 1512
LASSO: 1492
Bagging: 1476
RF + LASSO: 1455
Bagging + LASSO: 1437
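Because the bars encode MSE while the labels give RMSE, relative gains look different depending on which one you read. A quick arithmetic check for Bagging + LASSO against the YOY baseline, using the two numbers from the chart:

```python
# RMSE in hectolitres, taken from the model comparison chart.
baseline_rmse = 1877   # Baseline (YOY)
best_rmse = 1437       # Bagging + LASSO

rmse_reduction = 1 - best_rmse / baseline_rmse
mse_ratio = (best_rmse / baseline_rmse) ** 2   # MSE is RMSE squared

print(f"RMSE reduction: {rmse_reduction:.1%}")       # 23.4%
print(f"MSE relative to baseline: {mse_ratio:.1%}")  # 58.6%
```

So a roughly 23% cut in RMSE corresponds to removing about 41% of the baseline's MSE.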
[Figure: ground truth (examples) vs. 2-month leading prediction, Oct 2015 to Jul 2016, for Corona Extra and Victoria.]
Sales volume: Brand × Region × Size, from ABI
Discounts: in-store price cuts, from ABI
Econometrics: employment, inflation, GDP
Soccer games: scraped from Foxsports.com
Holidays: weekdays, workdays & official holidays
Weather: average temperature, Month × Region
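A minimal sketch of attaching the environmental features to sales rows, assuming they are keyed by month and region (the Month × Region grain listed for weather). All field names, region labels and values here are hypothetical, not ABI's schema.

```python
# Hypothetical records: keys and field names are illustrative only.
sales = [
    {"brand": "Corona Extra", "region": "North", "size": "355ml", "month": "2015-06", "volume": 1200.0},
    {"brand": "Victoria", "region": "North", "size": "355ml", "month": "2015-06", "volume": 800.0},
]

# Environmental features indexed by (month, region); values are made up.
environment = {
    ("2015-06", "North"): {"avg_temp_c": 31.5, "soccer_games": 4, "holidays": 0},
}


def add_features(rows, env):
    """Left-join month × region features onto each sales row.

    Rows without a matching (month, region) key keep only their own fields.
    """
    joined = []
    for row in rows:
        feats = env.get((row["month"], row["region"]), {})
        joined.append({**row, **feats})
    return joined


for row in add_features(sales, environment):
    print(row)
```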
Models: Boosting, Bagging, LASSO, Random Forest, RF + LASSO (ensemble), Bagging + LASSO (ensemble), Constant YOY, Linear Regression
[Chart legend: Ground Truth, Prediction]
Test error (lower is better):
ABI analysts: 1883
Bagging + LASSO: 1742
Combined: 1709 (9.2% below ABI)
[Figure: log change in sales volume by date, 2013 to 2016 (y-axis from 0.1 to 10.0).]