Using Performance Metrics to Forecast Success in the National Hockey League

Josh Weissbock


Outline

• Introduction to Hockey
• Introduction to Performance Metrics
• Predicting the Outcome of a Single Game
• Exploration of Performance Metrics

J. Weissbock (2013)


Aim

• Performance metrics (or “advanced stats”) have been shown by online hockey analysts to correlate much more strongly with wins and standings points in the National Hockey League than the traditional statistics published by the NHL.

• Can we use these advanced stats to predict success in the NHL?


Introduction

• Lack of academic attention to hockey.
• Hard to analyze due to the lack of events (goals).
• We attempt to use Machine Learning to predict a single game in the National Hockey League:
  – Using Traditional Statistics;
  – Using Performance Metrics; and
  – “Tuning Performance Metrics”.


Sports in Machine Learning

• Chen et al. (1994) used Neural Networks to predict greyhound races.

• 2006 Soccer World Cup: prediction accuracy of 75%.
• NFL: accuracy of 78.6% with neural networks.
• NCAA Football: prediction accuracy of 76%.
• NBA basketball: accuracy of 76%.


Background

• Hockey is played on a rectangular sheet of ice measuring 60-61 m x 25-30 m.
• Two teams, each with five players and one goalkeeper (goalie).
• The team that scores the most goals in 60 minutes wins.
• The NHL is the top league in the world.
  – Other major leagues: KHL, SHL, ELH.
• The top 16 teams at the end of the season compete in an elimination tournament for the Stanley Cup.
  – Four rounds of best-of-seven series.


Background

• Traditional statistics are “Real Time Scoring System” (RTSS) statistics: usually simple, goal-based stats that rest on few events and are subject to rink bias.
  – e.g. Goals, Assists, +/-, Giveaways, Takeaways, etc.
  – Bias has been demonstrated amongst RTSS statistics across NHL arenas.
• Advanced statistics are based on more events (all shots, misses, blocks, and goals) and have been shown to be highly correlated with wins and points.


Advanced vs Traditional Statistics

Statistic           Home/Points r-squared   Road/Points r-squared
Fenwick Close       0.623                   0.218
Goals Against       0.358                   -0.119
Goal Differential   0.297                   0.196
Goals               0.233                   0.084
Wins                0.186                   0.199
Points              0.177                   0.177
Blocked Shots       0.157                   -0.117
Hits                0.116                   -0.021
Giveaways           -0.001                  -0.002
Takeaways           -0.001                  -0.003

Source: http://blogs.thescore.com/nhl/2013/02/25/breaking-news-puck-possession-is-important-and-nobody-told-the-cbc/
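An r-squared of this kind is usually obtained from a simple linear regression of standings points on one team statistic. The following is a minimal sketch of that calculation, assuming one value per team; the function name and the sample numbers are illustrative and are not taken from the cited post.

```python
def r_squared(xs, ys):
    """Coefficient of determination of a one-variable linear fit (squared Pearson r)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Made-up example: possession share (%) and standings points for five teams.
possession_share = [46.0, 48.5, 50.0, 52.0, 54.5]
standings_points = [70, 82, 88, 95, 104]
print(round(r_squared(possession_share, standings_points), 3))
```

A value computed this way always lies between 0 and 1.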


Advanced vs Traditional Statistics

Team Stat Relationships

Stat      vs Wins R^2   vs Points R^2
5-5 F/A   0.605         0.655
GA/G      0.472         0.510
PP+PK     0.372         0.390
G/G       0.352         0.360
Sv%       0.227         0.263
PP%       0.221         0.231
SA/G      0.198         0.191
S/G       0.170         0.203
S%        0.160         0.145
PK%       0.152         0.160
FO%       0.097         0.109

Source: http://www.nucksmisconduct.com/2013/2/13/3987546/exploring-marginal-save-percentage-and-if-the-canucks-should-trade-a


Advanced Statistics

• Fenwick Close: a possession statistic, the sum of shots on goal (goals included) and missed shots; blocked shots are excluded. Correlates with zone time.
  – “Close” counts only play while the score is within one goal in the 1st/2nd period, or tied in the 3rd period/OT, to eliminate “Score Effects”.
• PDO: a statistic of “luck” (or random chance), the sum of Shooting % and Save %. Regresses to 100% +/- 2% over an 82-game season.
• 5/5 Goals For/Against: the ratio of goals scored for and against during even-strength play.
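The "Close" rule, PDO, and the 5/5 ratio above can be sketched in a few lines. This is only an illustration: the function names, the period encoding (4 and above meaning overtime), and the sample totals are assumptions, not the author's code.

```python
def is_score_close(period, score_diff):
    """Return True if play at this game state counts toward a "Close" statistic.

    period: 1, 2, 3, or 4 and above for overtime (hypothetical encoding).
    score_diff: goals for minus goals against at that moment.
    """
    if period <= 2:
        return abs(score_diff) <= 1  # within one goal in the 1st/2nd period
    return score_diff == 0  # tied in the 3rd period or overtime


def pdo(goals_for, shots_for, goals_against, shots_against):
    """Shooting % plus save %, on a scale where 100 is break-even."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1.0 - goals_against / shots_against)
    return shooting_pct + save_pct


def goals_ratio_5v5(goals_for, goals_against):
    """Even-strength goals for divided by even-strength goals against."""
    return goals_for / goals_against


print(is_score_close(period=2, score_diff=-1))  # one goal down in the 2nd: close
print(is_score_close(period=3, score_diff=1))   # leading in the 3rd: not close
print(round(pdo(goals_for=60, shots_for=700, goals_against=55, shots_against=690), 1))
```

A team whose PDO sits well above 100 is shooting or stopping pucks at a rate that is unlikely to last, which is why the statistic is read as luck.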


Experiment 1

• Predicting a single game in the NHL using both advanced and traditional statistics.
  – To see how high an accuracy we can obtain.
  – To see whether advanced or traditional statistics are better for prediction at the micro scale.


Data

• 517 games of the 2012-2013 NHL season (72% of the season).
  – Python script to collect data daily.
• 14 features per team collected, including:
  – Location, Goals For & Against, Season Goals For & Against, PP%, PK%, Sh%, Sv%, 5v5 Goals For/Against, Win Streak, Conference Standing, Fenwick Close, PDO.
• Data collected before and after each game:
  – Goals scored for & against, shots for & against.
  – To assist in calculating statistics for future games.
• Sources: NHL.com, BehindTheNet.com, TSN.ca/NHL.


Example of data


Experiment

• Data represented as differentials between both teams.
• Two entries for each game, one for each team.
• Labelled as either “win” or “loss”.
• Weka’s implementations of SMO (Support Vector Machines), Neural Networks, J48 (Decision Tree), and NaiveBayes.
• Binary classification using 10-fold cross-validation.
• Compared datasets of only traditional statistics, only advanced statistics, and both.
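The data representation and the evaluation split described above can be sketched as follows. The feature names, the handling of location as a separate field, and the plain shuffled 10-fold split are assumptions made for illustration; they do not reproduce the author's script or Weka's own cross-validation procedure.

```python
import random


def game_to_instances(home_stats, away_stats, home_won):
    """Turn one game into two labelled rows of feature differentials."""
    # Both dicts are assumed to hold the same numeric features.
    home_row = {name: home_stats[name] - away_stats[name] for name in home_stats}
    away_row = {name: -diff for name, diff in home_row.items()}
    home_row.update(location="home", label="win" if home_won else "loss")
    away_row.update(location="away", label="loss" if home_won else "win")
    return [home_row, away_row]


def k_fold_splits(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Made-up pre-game numbers for one game.
home = {"goals_for": 60, "goals_against": 48, "fenwick_close": 52.0}
away = {"goals_for": 55, "goals_against": 58, "fenwick_close": 49.5}
for row in game_to_instances(home, away, home_won=True):
    print(row)

# 517 games with two entries each give 1,034 instances.
for train, test in k_fold_splits(517 * 2):
    print(len(train), len(test))
```

Each fold serves once as the test set while the other nine are used for training, so every instance is predicted exactly once.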


Experiment

Classifier   Traditional   Advanced   Mixed
Baseline     50%           50%        50%
SMO          58.61%        54.55%     58.61%
NB           57.25%        54.93%     56.77%
J48          55.42%        50.29%     55.51%
NN           57.06%        52.42%     59.38%
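With 517 games and two entries per game, each accuracy in the table is measured on 1,034 instances, which limits how finely the classifiers can be told apart. A rough sketch of a 95% normal-approximation interval follows. It is an assumption-laden check, not the author's significance test: it ignores that the two rows of a game are not independent and that all classifiers were evaluated on the same data.

```python
import math


def accuracy_interval(accuracy, n, z=1.96):
    """Normal-approximation confidence interval for an accuracy measured on n instances."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width


n_instances = 517 * 2
for name, acc in [("SMO", 0.5861), ("NN", 0.5938)]:
    low, high = accuracy_interval(acc, n_instances)
    print(f"{name}: {acc:.2%} (95% interval {low:.2%} to {high:.2%})")
```

The half-width comes out at roughly three percentage points, so gaps of a point or less between classifiers are well inside the sampling noise.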


Experiment

• Best results from Neural Networks:
  – With additional tuning, accuracy of 59.38%;
  – Not statistically different from SMO.
• Splitting the data into training/testing sets (66%/33%) gave an accuracy of 57.83%:
  – For pairs the algorithm labelled Win/Win or Loss/Loss, keeping the label with the highest confidence and inverting the other raised accuracy to 59%.
• Ensemble learning with stacking and voting returned similar accuracy.
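The pair correction for Win/Win and Loss/Loss predictions can be sketched as below. The `(label, confidence)` tuple format and the tie rule (the first team keeps its label) are assumptions for illustration; the slides do not say how ties were handled.

```python
def resolve_pair(pred_a, pred_b):
    """Make the two predictions for one game consistent.

    Each prediction is a (label, confidence) tuple with label "win" or "loss".
    Returns the final labels for team A and team B.
    """
    (label_a, conf_a), (label_b, conf_b) = pred_a, pred_b
    if label_a != label_b:
        return label_a, label_b  # already one win and one loss
    flipped = "loss" if label_a == "win" else "win"
    if conf_a >= conf_b:
        return label_a, flipped  # trust A, invert B
    return flipped, label_b  # trust B, invert A


print(resolve_pair(("win", 0.80), ("win", 0.60)))
print(resolve_pair(("loss", 0.55), ("loss", 0.90)))
print(resolve_pair(("win", 0.70), ("loss", 0.60)))
```

Because exactly one team wins every game, this post-processing step can only remove impossible outcomes; whether it helps depends on how well the confidences are calibrated.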


Experiment

• Using Consistency Subset Evaluation, the top three features were:
  – Location
  – Goals against
  – Goal differential


Experiment

• In the second half of our experimental evaluation, we consider restricting PDO to the last n games to see how “lucky” a team has been recently.

Classifier   PDO1     PDO3     PDO5     PDO10    PDO25    PDOAll
Baseline     50%      50%      50%      50%      50%      50%
SMO          58.61%   58.61%   58.61%   58.61%   58.61%   58.61%
NB           56.38%   56.96%   56.38%   56.58%   56.58%   56.77%
J48          54.93%   55.71%   55.42%   55.90%   55.61%   55.51%
NN           57.64%   56.67%   58.03%   58.03%   57.74%   58.41%
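A windowed PDO of this kind can be sketched as follows. The per-game dictionary keys and the choice to pool goals and shots over the window before taking percentages are assumptions; the slides do not state how the windowed value was computed.

```python
def pdo_last_n(games, n=None):
    """PDO over a team's last n games; n=None uses every game played so far.

    games: chronological list of dicts with goals_for, shots_for,
    goals_against, and shots_against (hypothetical keys).
    Returns None when the window holds no shots for or against.
    """
    if n is not None and n < 1:
        raise ValueError("n must be at least 1")
    window = games if n is None else games[-n:]
    goals_for = sum(g["goals_for"] for g in window)
    shots_for = sum(g["shots_for"] for g in window)
    goals_against = sum(g["goals_against"] for g in window)
    shots_against = sum(g["shots_against"] for g in window)
    if shots_for == 0 or shots_against == 0:
        return None
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1.0 - goals_against / shots_against)
    return shooting_pct + save_pct


# Made-up game log for one team, oldest game first.
game_log = [
    {"goals_for": 3, "shots_for": 31, "goals_against": 2, "shots_against": 28},
    {"goals_for": 1, "shots_for": 27, "goals_against": 4, "shots_against": 33},
    {"goals_for": 5, "shots_for": 35, "goals_against": 1, "shots_against": 24},
    {"goals_for": 2, "shots_for": 29, "goals_against": 2, "shots_against": 30},
]
for n in (1, 3, 5, 10, 25, None):
    print("all" if n is None else n, round(pdo_last_n(game_log, n), 1))
```

When fewer than n games have been played, the slice simply returns what is available, so early-season values for long windows coincide with the season-to-date figure.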


Discussion

• Altering PDO does not appear to have a significant effect on accuracy.
• Can predict ~60% of games correctly.
• Although possession has been shown to be more useful in long-term predictions, the traditional statistics are better for predicting a single game.
• In a single game the most valued features are goals against, goal differential, and location.


Future Work

• Collecting additional features:
  – Rest days, days of travel, time-zone shifts, altitude shifts, change in weather at arena, gambling odds, injuries, score-adjusted Fenwick, possession over the last n games.
• Collecting a full season of data (normally 1,230 games).
• Training on past seasons of data.
• Comparing prediction of a single game in other leagues with the same features.
• Predicting the playoffs with best-of-seven series.


Conclusion

• ~60% accuracy in predicting a single game.
• Traditional statistics are more effective than advanced statistics in predicting a single game.
• Predicting a single game is difficult due to large variance in the standings.
• The theoretical limit for machine-learning prediction of a single NHL game appears to be 62%.
  – Changes based on the parity of the league and the number of events.


Questions?

Joshua [email protected]

Follow my hockey analysis on twitter: @joshweissbock
