msis5633 cfb presentation
Post on 14-Jul-2015
Predictive Measures in College Football
By David Affentranger
MSIS 5633, OSU, Fall 2012
Project Outline
Project Research
Data Sources and Tools
Data Transformation
Decision Tree
Decision Tree Results
Classification Predictive Modeling
Prediction Results
Future Efforts

Project Research
Projecting Point Spreads
http://cs229.stanford.edu/proj2010/LiuLai-BeatingTheNCAAFootballPointSpread.pdf
Predicting individual games
http://cfbpredictions.com/
Predicting BCS Rankings
http://harvardsportsanalysis.wordpress.com/2011/11/24/making-sense-of-the-chaos-a-bcs-prediction-model/
The Stanford project used three different models: logistic regression, support vector machines, and AdaBoost. It included momentum as a factor in the models and reported an accuracy of 73% on straight bets and 56% on point spread bets.
CFBPredictions uses a model to predict games week to week and provides news and blogs about college football.
Sites like harvardsportsanalysis are dedicated to predicting BCS rankings #1

Data Sources and Tools
Data Sources
http://www.cfbstats.com
Tools
Microsoft Excel
Rapid Miner

* College Football Stats.com has kept data sets since 2005.
* Updated weekly, with 94 statistical fields covering everything from kickoff returns to stadiums.
* Fairly large data set, because the data was pulled in week 11 of the football season.
* Because of the data size, the data was processed offline in Excel for manipulation and then fed into RapidMiner.

Data Transformation
Data Challenges
Home and away team data were listed in separate spreadsheets
Data rows lacked visiting team statistics
Data rows lacked who won the game
Many numerical fields were difficult to use for classification
Data Solutions
Used Excel VLOOKUPs to pull visiting team statistics into the tuples
Used the Points field to compare home vs away and create a Winner classification field
Built formula fields to transform numerical values into a text classification (TOP, Special Teams, Penalty Yards)
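
The three data solutions above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Excel formulas used in the project: the field names, the sample rows, and the labels `HOME_MORE` / `VISIT_MORE` are hypothetical, and the default wash band of 20 yards is taken from the speaker notes later in the deck (which mention +/- 20 or +/- 30 yards depending on the data).

```python
# Hypothetical rows; field names and numbers are made up for illustration.
home_rows = [
    {"game": 1, "home_points": 31, "home_penalty_yards": 45},
    {"game": 2, "home_points": 17, "home_penalty_yards": 80},
    {"game": 3, "home_points": 24, "home_penalty_yards": 60},
]
visit_rows = [
    {"game": 1, "visit_points": 20, "visit_penalty_yards": 90},
    {"game": 2, "visit_points": 27, "visit_penalty_yards": 30},
    {"game": 3, "visit_points": 10, "visit_penalty_yards": 50},
]


def join_games(home_rows, visit_rows):
    """Attach visiting-team fields to each home row by game id (VLOOKUP-style)."""
    visit_by_game = {row["game"]: row for row in visit_rows}
    joined = []
    for home in home_rows:
        visit = visit_by_game.get(home["game"])
        if visit is None:
            continue  # no matching visiting-team row, so skip the game
        joined.append({**home, **visit})
    return joined


def winner(row):
    """Label the winner by comparing points (assumes no tied games)."""
    return "Home" if row["home_points"] > row["visit_points"] else "Visit"


def classify_margin(home_value, visit_value, wash_band=20):
    """Turn a numeric home/visit difference into a text class.

    Differences within +/- wash_band are treated as a wash and labeled "NA".
    """
    diff = home_value - visit_value
    if abs(diff) <= wash_band:
        return "NA"
    return "HOME_MORE" if diff > 0 else "VISIT_MORE"


games = join_games(home_rows, visit_rows)
for game in games:
    game["winner"] = winner(game)
    game["penalty_class"] = classify_margin(
        game["home_penalty_yards"], game["visit_penalty_yards"]
    )
```

The dictionary keyed on the game id plays the role of the VLOOKUP table, and `classify_margin` could be reused for any field where a small difference should count as a wash.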

CRISP Methodology
Data Consolidation: Collecting data from the website.
Data Cleaning: Removed missing values from the data, 1331 down to 665.
Data Transformation: Created new fields for Winner, TOP, Special Teams, Pena
Data Reduction: Changed the 94 fields down to 85

Data Transformation
Fields like Home/Visit Rush and Pass yards were kept as integers because they were difficult to classify and carried high weight.
Turnover margin was calculated by comparing home and visiting turnovers.
Fields like Penalty Yards, Special Teams, and TOP were built using formula fields that combine home and visiting data, with a value of NA for differences deemed a wash (within +/- 20 yards or +/- 30 yards, depending on the data).

Decision Tree
Using Rapid Miner and a Decision Tree (Gini Index), built a model to determine key factors for winning
The first step was to build a decision tree using Rapid Miner and the Gini index. The Gini index was chosen because of its ability to measure diversity.
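
To make the diversity idea concrete, here is a minimal sketch of the textbook Gini impurity and of how a candidate split is scored with it. It is not RapidMiner's internal implementation; the function names, the `visit_rush_yards` field, and the sample rows are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions.

    0.0 means every label is the same; higher values mean a more mixed set.
    """
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def split_impurity(rows, field, threshold, label="winner"):
    """Weighted Gini impurity after splitting rows at field >= threshold."""
    total = len(rows)
    if total == 0:
        return 0.0
    above = [r[label] for r in rows if r[field] >= threshold]
    below = [r[label] for r in rows if r[field] < threshold]
    return (len(above) / total) * gini_impurity(above) + (
        len(below) / total
    ) * gini_impurity(below)


# Hypothetical rows, made up for illustration only.
sample_rows = [
    {"visit_rush_yards": 200, "winner": "Visit"},
    {"visit_rush_yards": 180, "winner": "Visit"},
    {"visit_rush_yards": 100, "winner": "Home"},
    {"visit_rush_yards": 90, "winner": "Home"},
]

before = gini_impurity([r["winner"] for r in sample_rows])
after = split_impurity(sample_rows, "visit_rush_yards", 165)
```

A decision tree learner using this criterion prefers the split that lowers the weighted impurity the most, which is why a field that separates winners cleanly ends up near the top of the tree.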
Find factors to determine the classification field Winner

Decision Tree Results
Visit Rush Yards: most important factor in the model
>= 165 Yards: Visit Winner 66% of the time