msis5633 cfb presentation


Post on 14-Jul-2015






Predictive Measures in College Football

By David Affentranger
MSIS 5633, OSU, Fall 2012

Project Outline
Project Research
Data Sources and Tools
Data Transformation
Decision Tree
Decision Tree Results
Classification Predictive Modeling
Prediction Results
Future Efforts

Project Research
Projecting point spreads, individual games, and BCS rankings

Stanford used 3 different models: logistic regression, support vector machines, and AdaBoost. Momentum was included as a factor in the models. Accuracy was 73% on straight bets and 56% on point spread bets.

CFBPredictions uses a model to predict week to week games and provides news and blogs around college football.

Sites like harvardsportsanalysis are dedicated to predicting BCS rankings #1

Data Sources and Tools
Data Sources

Tools
Microsoft Excel
RapidMiner

* College football data sets kept since 2005.
* Updated weekly. 94 statistical fields, covering everything from kickoff returns to stadiums.
* Fairly large data set because it was pulled in week 11 of the football season. Because of its size, the data was processed offline in Excel for manipulation and then fed into RapidMiner.

Data Transformation

Data Challenges
Data for home and away teams was listed in separate spreadsheets
Data rows lacked visiting team statistics
Data rows lacked who won the game
Many numerical fields were difficult to use for classification

Data Solutions
Used Excel VLOOKUPs to pull visiting team statistics into the tuples
Used the Points field to compare home vs. away and create a Winner classification field
Built formula fields to transform numerical values into a text classification (TOP, Special Teams, Penalty Yards)
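The data solutions above were done with Excel formulas, which the slides do not show. As a rough sketch of the same idea in Python, the snippet below joins visiting team rows onto home team rows, derives a Winner field from points, and turns one numeric field into a text class. All field names, the sample rows, the `Home`/`Visit`/`NA` labels, and the `wash` threshold are assumptions made for illustration, not the project's actual data or formulas.

```python
# Toy rows standing in for the two spreadsheets (hypothetical fields and values).
home_rows = [
    {"game_id": 1, "team": "A", "points": 31, "penalty_yards": 45},
    {"game_id": 2, "team": "C", "points": 17, "penalty_yards": 90},
]
visit_rows = [
    {"game_id": 1, "team": "B", "points": 24, "penalty_yards": 80},
    {"game_id": 2, "team": "D", "points": 20, "penalty_yards": 75},
]


def winner(home_points, visit_points):
    """Compare the Points fields to label the winning side."""
    if home_points > visit_points:
        return "Home"
    if visit_points > home_points:
        return "Visit"
    return "Tie"


def band(home_value, visit_value, wash):
    """Name the side with the larger value, or 'NA' if the gap is within `wash`."""
    diff = home_value - visit_value
    if abs(diff) <= wash:
        return "NA"
    return "Home" if diff > 0 else "Visit"


def build_tuples(home_rows, visit_rows, wash=20):
    # Index visiting rows by game so each home row can look up its opponent;
    # this plays the role of the Excel VLOOKUP.
    visit_by_game = {row["game_id"]: row for row in visit_rows}
    tuples = []
    for home in home_rows:
        visit = visit_by_game.get(home["game_id"])
        if visit is None:
            continue  # no matching visiting row, so skip rather than guess
        tuples.append({
            "game_id": home["game_id"],
            "home_points": home["points"],
            "visit_points": visit["points"],
            "winner": winner(home["points"], visit["points"]),
            "penalty_yards": band(home["penalty_yards"], visit["penalty_yards"], wash),
        })
    return tuples


games = build_tuples(home_rows, visit_rows)
```

Keying the lookup on a shared game identifier keeps each tuple to one game with both teams' statistics side by side, which is the shape a classifier needs.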

CRISP Methodology
Data Consolidation: Collecting data from the website.
Data Cleaning: Remove missing values from the data (1331 down to 665).
Data Transformation: Created new fields for Winner, TOP, Special Teams, Penalty Yards.
Data Reduction: Changed the 94 fields down to 85.

Data Transformation

Fields like Home/Visit Rush and Pass yards were kept as integers due to their difficulty in classification and high weight.
Turnover margin was calculated by comparing home and visiting turnovers.
Fields like Penalty Yards, Special Teams, and TOP were built using formula fields that combine home and visiting data, adding a value of NA when the difference was deemed a wash (within ±20 or ±30 yards, depending on the data).

Decision Tree
Using RapidMiner and a decision tree (Gini index), built a model to determine key factors for winning.

The first step was to build a decision tree using RapidMiner and the Gini index. The Gini index was chosen because of its ability to measure diversity.
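To make the diversity measure concrete, here is a minimal sketch of the textbook Gini impurity and of scoring a threshold split with it. This is not RapidMiner's implementation, which the slides do not show; the function names, the field name `visit_rush_yards`, and the sample rows are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 = pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_gini(rows, field, threshold, target="winner"):
    """Weighted Gini impurity of the two groups formed by field >= threshold."""
    total = len(rows)
    if total == 0:
        return 0.0
    above = [row[target] for row in rows if row[field] >= threshold]
    below = [row[target] for row in rows if row[field] < threshold]
    return (len(above) / total) * gini(above) + (len(below) / total) * gini(below)


# Made-up rows for illustration only; not the project's data.
rows = [
    {"visit_rush_yards": 210, "winner": "Visit"},
    {"visit_rush_yards": 180, "winner": "Visit"},
    {"visit_rush_yards": 120, "winner": "Home"},
    {"visit_rush_yards": 90, "winner": "Home"},
]

# A lower weighted impurity means the split separates winners more cleanly.
score_at_165 = split_gini(rows, "visit_rush_yards", 165)
score_at_100 = split_gini(rows, "visit_rush_yards", 100)
```

A decision tree learner typically tries many candidate fields and thresholds and keeps the split with the lowest weighted impurity, which is how one field can surface as the most important factor.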

Find factors to determine the classification field Winner.

Decision Tree Results
Visit Rush Yards: most important factor in the model
>= 165 yards: Visit winner 66% of the time