Optimizing Baseball Performance and Player Salary Michael Grenon CS378 Data Mining Spring 2018


Page 1: Michael Grenon CS378 Data Mining Spring 2018 Optimizing ...lxiong/cs378/share/project/11_late_24558… · Optimizing Baseball Performance and Player Salary Michael Grenon CS378 Data

Optimizing Baseball Performance and Player Salary

Michael Grenon, CS378 Data Mining

Spring 2018


Baseball ♥ Stats


North America ♥ Baseball


Franchise / Entertainment organization

Money, data, wins

2017*

Optimization


Salary optimization

How did teams optimize their player salaries?

● No salary cap!
● Similarity problem
  ○ Linear correlation
  ○ Pearson correlation coefficient
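A minimal sketch of the Pearson correlation coefficient named above, in plain Python. The helper name `pearson_r` and the payroll and win figures are hypothetical illustration values, not the 2017 data behind these slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical payrolls (USD millions) and win totals for four teams
payroll = [200, 150, 120, 80]
wins = [93, 85, 90, 70]
print(round(pearson_r(payroll, wins), 3))
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear association, though a nonlinear relationship may still exist.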


How to Play to Win?

Which aspects of play most strongly correlate with winning?

● Similarity problem
  ○ (Linear) association
  ○ Pearson correlation coefficient


How do the best teams use their players?

How frequently are certain players used in games?

● Frequent itemset problem
  ○ Apriori algorithm
  ○ Support threshold?
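A minimal Apriori sketch, assuming each game is treated as a "basket" holding the IDs of the players who appeared in it. Apriori grows frequent itemsets one level at a time and prunes any candidate that has an infrequent subset. The support threshold is a free parameter (the open question on the slide); the 0.5 used below is arbitrary, and the player IDs are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support,
    the fraction of transactions containing it, is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    if not baskets:
        return {}
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets (games) that contain every item in itemset
        return sum(itemset <= b for b in baskets) / n

    frequent = {}
    # Level-1 candidates: every individual player seen in any game
    candidates = {frozenset([item]) for b in baskets for item in b}
    k = 1
    while candidates:
        level = {c: support(c) for c in candidates}
        current = {c for c, s in level.items() if s >= min_support}
        frequent.update((c, level[c]) for c in current)
        k += 1
        # Join step: union two frequent (k-1)-itemsets that yield exactly k items
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have any infrequent (k-1)-subset
        candidates = {
            c for c in joined
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
    return frequent

# Hypothetical games: each set holds made-up IDs of players who appeared
games = [
    {"p1", "p2", "p3"},
    {"p1", "p2", "p4"},
    {"p1", "p3", "p4"},
    {"p1", "p2", "p3"},
]
for itemset, s in sorted(apriori(games, min_support=0.5).items(),
                         key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(itemset), s)
```

Raising `min_support` shrinks the output toward core regulars; lowering it surfaces rarer player combinations at the cost of many more candidates.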


R = 0.253


Price Per Win


E = win% * (ppw rank + win rank)

R = -0.555
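A sketch of the price-per-win and efficiency calculation E = win% * (ppw rank + win rank). The slides do not say which direction the ranks run, so this code ASSUMES larger rank numbers are better: the team with the lowest price per win and the team with the most wins each receive rank n, so every factor in E points the same way. It also assumes a 162-game season for win% and breaks ties by input order. All function names and payroll and win values are hypothetical.

```python
def price_per_win(payroll, wins):
    """Payroll dollars spent per win."""
    return payroll / wins

def ascending_ranks(values):
    """Ordinal ranks 1..n, rank 1 for the smallest value (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def efficiency_scores(payrolls, wins, games_played=162):
    """E = win% * (ppw rank + win rank).

    ASSUMPTION: larger rank numbers are better, so the lowest price per win
    and the highest win total each receive rank n. win% assumes every team
    played games_played games.
    """
    ppw = [price_per_win(p, w) for p, w in zip(payrolls, wins)]
    # Negate so the cheapest price per win sorts last and gets the top rank
    ppw_rank = ascending_ranks([-x for x in ppw])
    win_rank = ascending_ranks(wins)
    return [
        (w / games_played) * (pr + wr)
        for w, pr, wr in zip(wins, ppw_rank, win_rank)
    ]

# Hypothetical payrolls (USD millions) and win totals for three teams
payrolls = [200, 100, 150]
wins = [90, 81, 72]
for team, e in zip(["A", "B", "C"], efficiency_scores(payrolls, wins)):
    print(team, round(e, 3))
```

If the original ranks instead run 1 = best, both rankings would need to be flipped before computing E.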


BsR ≈ Baserunning WAR

Taken from team_batting(2017)

R = 0.548


wSL ≈ weighted Slider

Taken from team_batting(2017)

R = 0.500


¯\_(ツ)_/¯


What’s next?

How frequently are certain players used in games?

● Frequent itemset problem
  ○ Apriori algorithm
  ○ Support threshold?


What’s next?

To what extent are win-loss record and attendance related?

● Extending Pearson correlation analysis


Preliminary Conclusions

● Hitting coaches: teach how to hit a slider
  ○ The slider is the most “cost-efficient” pitch to excel at hitting
  ○ ...but not by much
● Fielding coaches: emphasize speed and baserunning skill
  ○ Baserunning is more closely associated with salary efficiency than any other aspect of performance
    ■ Even batting and pitching


Preliminary Conclusions

● Chess match: all pieces are important
● 2-sided game
● Predictive (vs. descriptive) statistics
  ○ Time-series analysis


Preliminary Conclusions

● Too much data
  ○ batting_stats(): 287 attributes?
● Statcast data
  ○ More complex mining techniques
  ○ Neural nets
● Data warehouses incomplete, disorganized
  ○ Private sector