
Senior Project – Computer Science - 2008

Machine Learning in Football

Andrew Finley

Advisor – Prof. Striegnitz

Research Question:

Every year there are players who move from collegiate football to professional football with high expectations and never meet them. Likewise, there are players with low expectations who exceed them. This raises the question: is it possible to accurately predict the success of NFL players based on their collegiate performance? A player is generally considered successful if he is starting a majority of his games by his third season. The goal of this project is to build a program that predicts a player’s professional statistics given his collegiate statistics. For the sake of time, I am only looking at quarterbacks and running backs.
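The success criterion can be written as a small labeling rule. The sketch below is a hypothetical helper, not code from the project. It makes three assumptions:

* "A majority" means strictly more than half of the games played (GS > G/2).
* The third NFL season decides the label.
* A player with fewer than three NFL seasons counts as unsuccessful.

The rule agrees with the TRUE values in the sample record's Starting column: Ronnie Brown started 14 of 15, 12 of 13 and 7 of 7 games.

```python
def started_majority(games: int, games_started: int) -> bool:
    """True if the player started more than half of the games he played.

    Assumption: "a majority" means strictly more than half; a season with no
    games played counts as not starting.
    """
    if games <= 0:
        return False
    return games_started * 2 > games


def is_successful(nfl_seasons: list[tuple[int, int]]) -> bool:
    """Label a player from (G, GS) pairs ordered by NFL season.

    Assumption: the third season decides the label, and players with fewer
    than three NFL seasons are labeled unsuccessful.
    """
    if len(nfl_seasons) < 3:
        return False
    games, games_started = nfl_seasons[2]
    return started_majority(games, games_started)


# Ronnie Brown's (G, GS) for 2005, 2006 and 2007 from the sample record.
print(is_successful([(15, 14), (13, 12), (7, 7)]))  # True
```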

Player: Ronnie Brown    School: Auburn    Height: 6'-1''    Weight: 230 lb

Collegiate statistics by season (in the data, each column name carries the season number as a suffix, e.g. Year1, Rush Yds2, TotYds/G3):

Stat         Season 1  Season 2  Season 3
Year         2002      2003      2004
Pos          RB        RB        RB
Cl           So        Jr        Sr
G            12        6         12
Rush Yds     1008      446       913
Car          175       95        153
Rush TD      13        5         8
Yds/Car      5.76      4.7       5.97
RushYds/G    84        74.3      76.1
Rec Yds      166       80        313
Rec          9         8         34
Rec TD       1         0         1
Yds/Rec      18.4      10        9.2
Rec/G        0         1         2
RecYds/G     13.8      13.3      26.1
PR           0         0         0
PR Yds       0         0         0
PR TD        0         0         0
Yds/PR       0         0         0
PR/G         0         0         0
KR           0         0         0
KR Yds       0         0         0
KR TD        0         0         0
Yds/KR       0         0         0
KR/G         0         0         0
Ret TD       0         0         0
Tot Yds      1174      526       1226
Tot TD       14        5         9
TotYds/G     97.8      87.6      102.2

NFL statistics by season (suffixed the same way, e.g. Season1, RushYds2, Lost3):

Stat         Season 1       Season 2       Season 3
Season       2005           2006           2007
Team         MiamiDolphins  MiamiDolphins  MiamiDolphins
G            15             13             7
GS           14             12             7
Att          207            241            119
RushYds      907            1008           602
RushAvg      4.4            4.2            5.1
RushLng      65             47             60
RushTD       4              5              4
Rec          32             33             39
RecYds       232            276            389
RecAvg       7.3            8.4            10
RecLng       38             24             43
RecTD        1              0              1
FUM          4              4              0
Lost         4              2              0
Starting     TRUE           TRUE           TRUE
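Several collegiate columns in the sample record are derived from raw counts. Here is a minimal sketch of recomputing them, with these assumptions:

* Tot Yds sums rushing, receiving, punt-return and kick-return yards.
* Tot TD sums rushing, receiving and return touchdowns.

The sample has no return yardage, so it cannot confirm either assumption. The function names and dictionary keys are hypothetical. Given the 2002 season's raw counts, the sketch reproduces that season's Yds/Car, RushYds/G, Yds/Rec, RecYds/G, Tot Yds, Tot TD and TotYds/G values.

```python
def ratio(numerator: float, denominator: float, digits: int) -> float:
    """Rounded ratio; 0 when the denominator is 0 (e.g. no punt returns)."""
    if denominator == 0:
        return 0.0
    return round(numerator / denominator, digits)


def derived_stats(s: dict[str, float]) -> dict[str, float]:
    """Recompute derived columns for one college season (hypothetical helper).

    Assumption: totals include return yards and return TDs.
    """
    tot_yds = s["Rush Yds"] + s["Rec Yds"] + s["PR Yds"] + s["KR Yds"]
    tot_td = s["Rush TD"] + s["Rec TD"] + s["Ret TD"]
    return {
        "Yds/Car": ratio(s["Rush Yds"], s["Car"], 2),
        "RushYds/G": ratio(s["Rush Yds"], s["G"], 1),
        "Yds/Rec": ratio(s["Rec Yds"], s["Rec"], 1),
        "RecYds/G": ratio(s["Rec Yds"], s["G"], 1),
        "Tot Yds": tot_yds,
        "Tot TD": tot_td,
        "TotYds/G": ratio(tot_yds, s["G"], 1),
    }


# Ronnie Brown, Auburn, 2002 (raw counts from the sample record).
season_2002 = {"G": 12, "Rush Yds": 1008, "Car": 175, "Rush TD": 13,
               "Rec Yds": 166, "Rec": 9, "Rec TD": 1,
               "PR Yds": 0, "KR Yds": 0, "Ret TD": 0}
print(derived_stats(season_2002))
```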

Data:
• Step 1: Gather data by parsing it from websites (NFL.com, NCAA.org) with Python scripts and from Collegio Football (a database program).
• Step 2: Use more Python scripts to combine the data into two large .csv files, one for quarterbacks and one for running backs.
• Step 3: Fix any leftover formatting errors and fill in missing statistics where possible.
• Step 4: Load the data into Weka (machine learning software) and predict the desired statistics.
• Step 5: Evaluate accuracy using cross-validation.
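Step 2 can be sketched with Python's standard csv module. This is an illustration, not the project's scripts, with these assumptions:

* combine_player and write_csv are hypothetical names.
* The "NFL " prefix on professional columns is added so that names do not collide. In the sample record, both the college and NFL blocks use G1.
* Players with fewer seasons leave their missing cells empty.

The same code would be run once for quarterbacks and once for running backs.

```python
import csv
import io


def suffix_columns(seasons, prefix=""):
    """Flatten per-season dicts into one row, appending the season number
    to each column name (e.g. "Rush Yds" in season 2 -> "Rush Yds2")."""
    row = {}
    for number, season in enumerate(seasons, start=1):
        for name, value in season.items():
            row[f"{prefix}{name}{number}"] = value
    return row


def combine_player(player, school, college, nfl):
    """One wide row per player: identity, college seasons, then NFL seasons.
    The "NFL " prefix is a hypothetical choice to keep names such as G1 unique."""
    row = {"Player": player, "School": school}
    row.update(suffix_columns(college))
    row.update(suffix_columns(nfl, prefix="NFL "))
    return row


def write_csv(rows, out):
    """Write rows with the union of their columns; missing cells stay empty."""
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Abbreviated sample record (only a few of the columns).
row = combine_player(
    "Ronnie Brown", "Auburn",
    college=[{"Year": 2002, "Pos": "RB", "G": 12, "Rush Yds": 1008}],
    nfl=[{"Season": 2005, "Team": "MiamiDolphins", "G": 15, "GS": 14}],
)
buf = io.StringIO()
write_csv([row], buf)
print(buf.getvalue())
```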

Preliminary Results:
• Building trees was difficult with large sets of training data; better trees were produced when attributes were selected by hand.
• The baseline accuracy is 68%. This is the accuracy obtained, without constructing any tree, by predicting false for “starting third season” for every player. It means that about 68% of the players in the data were not starting in their third season, so a useful tree has to do better than 68%.
• Accuracy varies significantly across feature sets, so feature selection is very important.
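The 68% baseline is majority-class accuracy. Always predicting the most common label is right exactly as often as that label occurs; Weka's ZeroR classifier reports this kind of baseline. The sketch uses made-up labels (68 False, 32 True), not the project's data, purely to show the arithmetic.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    if not labels:
        raise ValueError("need at least one label")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical labels for "starting third season": 68 False, 32 True.
labels = [False] * 68 + [True] * 32
print(majority_baseline(labels))  # (False, 0.68)
```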

Classification using Decision Trees:

The idea behind this project is to use classification algorithms to train a program to predict NFL statistics from collegiate statistics. Classification is the process of training a program on a set of instances whose classes are known, so that it can predict the classes of new, unlabeled instances. I am using a decision tree algorithm to train the program. A decision tree algorithm:
• Creates a graph (tree) from the training data.
• Tests an attribute at each internal node; the branches are attribute values and the leaves are the classes.
• Aims to build the smallest tree possible that covers all instances.
• Uses the tree to make a set of classification rules, one per path from the root to a leaf.
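A minimal ID3-style sketch of the decision tree process described above:

* At each node it picks the attribute with the highest information gain.
* It recurses until each node is pure.
* It then reads off one rule per root-to-leaf path.

The method is greedy, so it favors small trees without guaranteeing the smallest one. Weka's J48 (its C4.5 implementation) adds numeric splits and pruning, which this sketch omits. The attribute names and the six training rows are hypothetical, already-discretized values, not the project's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction from splitting the rows on one attribute."""
    remainder = 0.0
    for value in sorted({row[attr] for row in rows}):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Return a leaf label or a node (attribute, {value: subtree})."""
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not attrs:                      # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], rest)
    return (best, branches)


def to_rules(tree, conditions=()):
    """Turn each root-to-leaf path into a (conditions, label) rule."""
    if not isinstance(tree, tuple):
        return [(conditions, tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += to_rules(subtree, conditions + ((attr, value),))
    return rules


# Hypothetical, already-discretized training rows; label = starting third season.
rows = [
    {"YdsPerCar": "high", "Games": "many"},
    {"YdsPerCar": "high", "Games": "few"},
    {"YdsPerCar": "low", "Games": "many"},
    {"YdsPerCar": "low", "Games": "few"},
    {"YdsPerCar": "high", "Games": "many"},
    {"YdsPerCar": "low", "Games": "many"},
]
labels = [True, False, False, False, True, False]

tree = build_tree(rows, labels, ["YdsPerCar", "Games"])
for conditions, label in to_rules(tree):
    print(" AND ".join(f"{a} = {v}" for a, v in conditions), "=>", label)
```

On this toy data, YdsPerCar has the higher gain and becomes the root. Only its "high" branch needs a second split on Games, which gives three rules.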

Next Step:
• Continue trying different feature selections to raise accuracy above the baseline.
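Comparing feature selections fairly means scoring each subset with the same cross-validation folds and checking it against the majority-class baseline. The sketch is illustrative only:

* To stay short, it uses a tiny lookup-table classifier instead of a decision tree. It predicts the majority label for each combination of selected feature values and falls back to the overall majority label for unseen combinations.
* Instance i is assigned to fold i mod k.
* It runs on hypothetical toy data.

Weka performs cross-validation itself; this only shows the procedure.

```python
from collections import Counter, defaultdict


def train_lookup(rows, labels, features):
    """Stand-in classifier (not a decision tree): majority label per
    combination of feature values, else the overall majority label."""
    default = Counter(labels).most_common(1)[0][0]
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[tuple(row[f] for f in features)][label] += 1
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    return lambda row: table.get(tuple(row[f] for f in features), default)


def cross_val_accuracy(rows, labels, features, k=3):
    """k-fold cross-validation; instance i goes to fold i % k."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        test = [i for i in range(len(rows)) if i % k == fold]
        model = train_lookup([rows[i] for i in train],
                             [labels[i] for i in train], features)
        correct += sum(model(rows[i]) == labels[i] for i in test)
    return correct / len(rows)


# Hypothetical toy data; label = starting third season.
rows = [
    {"YdsPerCar": "high", "Games": "many"},
    {"YdsPerCar": "high", "Games": "few"},
    {"YdsPerCar": "low", "Games": "many"},
    {"YdsPerCar": "low", "Games": "few"},
    {"YdsPerCar": "high", "Games": "many"},
    {"YdsPerCar": "low", "Games": "many"},
]
labels = [True, True, False, False, True, False]

baseline = Counter(labels).most_common(1)[0][1] / len(labels)
for features in (["YdsPerCar"], ["Games"], ["YdsPerCar", "Games"]):
    acc = cross_val_accuracy(rows, labels, features)
    print(features, f"{acc:.2f}",
          "beats baseline" if acc > baseline else "does not beat baseline")
```

On this toy data, the single hand-picked feature beats the baseline, while adding a second feature lowers accuracy. This is the kind of effect that makes feature selection matter.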

Sample input for running back data: blue cells are inputs, red cells are possible outputs.