because it's the cup: predicting the stanley cup...

Because It's the Cup: Predicting the Stanley Cup Playoffs

Mason Swofford, Shuvam Chakraborty, Vineet Kosaraju

The Stanley Cup playoffs have long been known for their drama: most games are close, upsets are common, and teams not considered among the best can win the Cup. However, current predictions are based on traditional NHL statistics, which are not indicative of success because outcomes are heavily influenced by luck. Our project has two main goals: 1) predict regular season and playoff game results, and 2) construct a gambling agent to optimize returns.

Background Features Models Results & Error Analysis

Data Collection

Two main datasets:

• Regular season & playoff games

• Training set amalgamated from data sources #2 (stats: features) and #3 (game results: labels)

• Each training example is 1 game, label = winning team (0 = A, 1 = H)

• Features for each training example consist of team statistics averaged over past games in the season (see right)
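The averaging step described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the dictionary keys (`home`, `away`, `home_stats`, `away_stats`, `home_win`) are hypothetical, and reading the labels A and H as Away and Home is an assumption. Each example uses only games played before the one being predicted, so the features do not leak the outcome.

```python
from collections import defaultdict

def season_average_features(games):
    """Return one (features, label) pair per game, using only prior games.

    `games` is a chronologically ordered list of dicts with hypothetical keys:
    'home', 'away', 'home_stats', 'away_stats' (stat name -> value),
    and 'home_win' (bool).
    """
    totals = defaultdict(lambda: defaultdict(float))  # team -> stat -> running sum
    played = defaultdict(int)                         # team -> games played so far
    examples = []
    for g in games:
        home, away = g["home"], g["away"]
        # Build an example only once both teams have a history to average over.
        if played[home] > 0 and played[away] > 0:
            features = {}
            for side, team in (("home", home), ("away", away)):
                for stat, total in totals[team].items():
                    features[f"{side}_{stat}"] = total / played[team]
            label = 1 if g["home_win"] else 0  # assumed: 0 = away (A), 1 = home (H)
            examples.append((features, label))
        # Update running sums after building the example to avoid leakage.
        for team, stats in ((home, g["home_stats"]), (away, g["away_stats"])):
            for stat, value in stats.items():
                totals[team][stat] += value
            played[team] += 1
    return examples
```

The first game of each team produces no training example here, because there is nothing yet to average; other choices (for example, seeding with the previous season) are equally plausible.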

“Basic” Features

Feature   Description
CF        Corsi For: shot attempts for a team, including blocked shots and shots not on goal
CA        Corsi Against: shot attempts against a team, including blocked and not-on-goal shots
GF        Goals For
GA        Goals Against
xGF       Expected Goals For, based on quality of shot attempts for
xGA       Expected Goals Against, based on quality of shot attempts against
PENT      Penalties Taken
PEND      Penalties Drawn

“Advanced” Features

Feature   Description
PDO       Shooting % + Save % (rough measure of luck)
FF        Fenwick For (unblocked shot attempts)
FA        Fenwick Against
SF        Shots For (on goal)
SA        Shots Against
xPDO      Expected PDO
dPDO      PDO difference
OZS       Offensive Zone Starts
DZS       Defensive Zone Starts
NZS       Neutral Zone Starts
ZSR       Zone Start Ratio
FOW       Faceoffs Won
FOL       Faceoffs Lost
GVA       Giveaways
TKA       Takeaways
HF        Hits For
HA        Hits Against
%Win      Winning Percentage

Regarding Goal 1, classification models attempted include:

• Logistic, softmax regression

• SVM (rbf, linear, poly, sigmoid)

• ANNs (varying hidden layers, activation functions)

Features were chosen using basic feature selection and PCA. Whether team A wins a playoff series is predicted with a binomial distribution, where p is the probability that A wins a single game.
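The poster's own formula did not survive extraction, so the following is only a sketch of one standard binomial form. It assumes a best-of-seven series and that games are independent with a constant per-game probability p; the function name and the `n_games` parameter are illustrative. Team A wins the series if it would win at least 4 of 7 games played out in full.

```python
from math import comb

def series_win_probability(p, n_games=7):
    """Probability that team A wins a best-of-n_games series.

    Assumes independent games with a constant per-game win probability p.
    Playing out all n_games and requiring a majority of wins gives the same
    probability as stopping the series once a team clinches.
    """
    wins_needed = n_games // 2 + 1
    return sum(
        comb(n_games, k) * p**k * (1.0 - p) ** (n_games - k)
        for k in range(wins_needed, n_games + 1)
    )
```

Under these assumptions a small per-game edge is amplified over a series: `series_win_probability(0.55)` is larger than 0.55, while an even matchup stays at 0.5.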

Regarding Goal 2, the gambling problem was formulated as a Markov Decision Process.

• State: (currentMoney, game).

• Start state: (initialMoney, 0).

• Action: (money, team). Can bet up to current money on Home/Away; betting amounts are discretized.

• T(s, a, s’): Probabilities of transitions are given by our ML model.

• isEnd(s): If we run out of money, or we have reached the last game.

• R(s, a, s’): 1 if we have reached an end state and have greater than or equal to DesiredAmount, and 0 otherwise.

• Discount: Set to 1.

User Parameters:

• Payoff: A number greater than 1 that corresponds to how much you get back for each dollar bet.

• BucketSize: Discretization size for betting.

• DesiredAmount: Minimum money we want to finish with.
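The MDP defined above can be solved in several ways, and the poster does not say which solver was used. The sketch below uses memoized backward induction, which fits because the horizon is finite and the discount is 1. The function name, argument names, and the single fixed `payoff` for both teams are assumptions for illustration; `home_win_probs` stands in for the per-game probabilities supplied by the ML model.

```python
from functools import lru_cache

def solve_betting_mdp(home_win_probs, initial_money, desired_amount,
                      payoff=2.0, bucket_size=100):
    """Return (probability of finishing with >= desired_amount, best first action).

    An action is (bet, team); a winning bet returns payoff * bet.
    """
    n_games = len(home_win_probs)

    @lru_cache(maxsize=None)
    def value(money, game):
        # isEnd(s): out of money, or past the last game. Reward is 1 only if
        # the end state holds at least desired_amount.
        if money <= 0 or game == n_games:
            return (1.0 if money >= desired_amount else 0.0), None
        p_home = home_win_probs[game]
        best_value, best_action = -1.0, None
        bet = 0
        while bet <= money:  # bets are multiples of bucket_size, up to current money
            for team, p_win in (("home", p_home), ("away", 1.0 - p_home)):
                win_money = round(money - bet + payoff * bet, 2)
                lose_money = round(money - bet, 2)
                expected = (p_win * value(win_money, game + 1)[0]
                            + (1.0 - p_win) * value(lose_money, game + 1)[0])
                if expected > best_value:
                    best_value, best_action = expected, (bet, team)
            bet += bucket_size
        return best_value, best_action

    return value(initial_money, 0)
```

For example, `solve_betting_mdp([0.6, 0.3], 100, 400)` has to double the bankroll twice, so the best plan bets everything on the favored side of each game. Because the reward is an indicator of reaching DesiredAmount, the value of a state is the probability of hitting the target, not the expected bankroll.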

[Plot: Game Result Prediction Accuracy using Basic Features. Training Set and Validation Set accuracy (%, axis 44 to 62) for Baseline, Logistic, Softmax, SVM (RBF), SVM (Poly), SVM (Sigmoid), SVM (Linear).]

[Plot: Game Result Prediction Accuracy using Basic + Advanced Features. Training Set and Validation Set accuracy (%, axis 44 to 62) for Baseline, Logistic, SVM (RBF), SVM (Linear), ANN (h=5, relu), ANN (h=15, logistic), ANN (h=5/10, tanh), ANN (h=5/10, identity).]

Figure 1, 2: Training and validation accuracies reported using 10-fold cross-validation. For the best model, the accuracy on the test set of playoffs was 54.66% (for reference, ESPN experts were ~51% accurate).

Ablative analysis of basic features demonstrated that more advanced ones were needed; however, even advanced features didn’t help. The literature mentions a theoretical limit for predicting the result of a single hockey game due to luck/variability.

This limit of 60-63% was confirmed in a Monte Carlo simulation running 1000 trials, suggesting that games can’t be directly predicted (right). This model is similar to those used in the NFL.
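A simulation of this kind might look like the sketch below. It is not the authors' code: the league size (30 teams), season length (82 games), and the choice to simulate each team's games as independent coin flips are all assumptions, and treating the luck share as a ratio of variances is one common convention rather than something the poster states. The observed standard deviation must come from real standings and is left as an input.

```python
import random
from statistics import mean, pstdev

def luck_only_sd(n_teams=30, n_games=82, n_trials=1000, seed=0):
    """Average SD of team win percentages when every game is a coin flip."""
    rng = random.Random(seed)
    sds = []
    for _ in range(n_trials):
        # Each team's season is n_games independent 50/50 games (simplification:
        # opponents are not paired, so team records are independent).
        win_pcts = [
            sum(rng.random() < 0.5 for _ in range(n_games)) / n_games
            for _ in range(n_teams)
        ]
        sds.append(pstdev(win_pcts))
    return mean(sds)

def luck_share(sd_luck, sd_observed):
    """Fraction of observed variance in win percentage attributable to luck."""
    return (sd_luck / sd_observed) ** 2
```

With these assumed settings the luck-only SD lands close to the analytic value sqrt(0.25 / 82), roughly 0.055; comparing it with the SD observed in actual standings gives the luck share.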

Conclusions

• Hockey is a very challenging sport to predict due to the variability inherent to the sport.

• Perfect stats could allow reaching the theoretical accuracy limit, but incremental progress is needed.

• Reached 70% using an SVM on playoff data, so the model could be fine-tuned.

• Applications in other leagues (other than the NHL) or sports (baseball, etc.).

References

Pischedda, Gianni. Predicting NHL Match Outcomes with ML Models. citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.735.795&rep=rep1&type=pdf.

Weissbock, Joshua, et al. Use of Performance Metrics to Forecast Success in the National Hockey League. ceur-ws.org/Vol-1969/paper-06.pdf.

1. Playoff Results

2. Daily Team Stats

3. Betting Odds

[Plot: Monte Carlo Simulation of Season. Series: SD from Luck, Observed SD. Axes: 0 to 0.35 and 0 to 100.]

Figure 3: Luck accounts for 73% of results, so when making predictions we can accurately predict 27% of games and guess with 50% accuracy on the remaining 73%, which gives us a 63.5% ceiling.
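The ceiling in the caption is a weighted average, which the short sketch below spells out. The function and parameter names are illustrative; the only inputs taken from the poster are the 73% luck share and the 50% guess rate.

```python
def accuracy_ceiling(luck_fraction, guess_accuracy=0.5):
    """Best achievable accuracy when a fraction of outcomes is pure luck.

    Skill-determined games are assumed to be predicted perfectly; luck-determined
    games are guessed at `guess_accuracy`.
    """
    skill_fraction = 1.0 - luck_fraction
    return skill_fraction * 1.0 + luck_fraction * guess_accuracy

ceiling = accuracy_ceiling(0.73)  # 0.27 * 1.0 + 0.73 * 0.5, approximately 0.635
```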

MDP: Example

[Diagram labels: $200, $0, $300; + $675, - $200, +/- $0.]

Total: $1475 in 3 games.

“Basic” features: shot attempt/quality features.

“Advanced” features: include shot attempt features and add overall team-based metrics.

[Plot: Accuracy vs. “Closeness” of Game. Series: Samples (axis 0 to 1500), Accuracy (axis 0 to 0.8). Closeness values: 0.01, 0.02, 0.03, 0.04, 0.05, 0.075, 0.1, 0.2, 0.5.]

Team Type     Number of Games   Accuracy
Both Good     1088              0.5588
Both Bad      1306              0.5628
Good & Bad    3146              0.5950

Baseline Predictions

Team Type                 Accuracy
Both Good Teams           0.524
Both Bad Teams            0.547
One Good Team, One Bad    0.572
