2015 Sport Analysis for March Madness
March Data Crunch Madness
Team Coach K.
Yi Chun Chien, Xiayu Zeng, Feifei Chen, Xiaoshan Jin
March 2015
Introduction
❖ Background: The NCAA Men’s Basketball Tournament is a single-elimination tournament, currently featuring 68 college teams.
❖ Objective: Create an effective model that examines the factors contributing to a team’s performance, based on data from 2001-2014.
❖ Result: According to the model, box score statistics have a large effect on a team’s 2015 results, which helps to predict:
➢ Win/Lose
➢ Winning Probability
➢ Sweet Sixteen
Independent & Dependent Variables
Independent Variables
● Box Score: Assist, Steal, Block Shot, % 2/3 Point Field Goals, % Free Throws, Tempo
● Seed: Seed #, If this team is Top 5, If this team is 15/16
● Location: Latitude, Longitude, Distance Difference
Dependent Variable: Win/Lose
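The seed and location variables above could be derived roughly as follows. This is only a sketch: the slides do not say how "Top 5" or "Distance Difference" were computed, so the sketch assumes "Top 5" means a seed of 5 or better and that the distance difference is a team's great-circle (haversine) distance to the venue minus its opponent's. The function names are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def seed_features(seed):
    """Seed number plus the two indicator variables (assumed definitions)."""
    return {
        "seed": seed,
        "is_top5": int(seed <= 5),            # assumption: Top 5 = seeds 1 to 5
        "is_15_or_16": int(seed in (15, 16)),
    }

def distance_difference(team, opponent, venue):
    """Team's distance to the venue minus the opponent's (assumed definition).

    Each argument is a (latitude, longitude) pair in degrees.
    """
    return haversine_km(*team, *venue) - haversine_km(*opponent, *venue)
```

Under this convention, a negative distance difference means the team plays closer to the venue than its opponent.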
Analysis Process
● Define Data Group: Training (80%), Validation (20%), Testing (2015)
● Build 5 Models: Decision Tree, Bootstrap Forest, Boosted Tree, Neural Network, Nominal Logistic
● Performance Validation: Accuracy, ROC Curve, AUC, RMSE
● Prediction: Probability, Win/Lose, Top 16
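The data grouping step can be sketched as below. It assumes the 80%/20% split was a random shuffle of the 2001-2014 games, with 2015 held out for testing. The slides do not state how the split was drawn, and the `season` field and the toy records are hypothetical.

```python
import random

def split_games(games, train_frac=0.8, seed=0):
    """Hold out 2015 for testing; shuffle earlier seasons into training/validation."""
    history = [g for g in games if g["season"] < 2015]
    test = [g for g in games if g["season"] == 2015]
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(history)
    cut = int(round(train_frac * len(history)))
    return history[:cut], history[cut:], test

# Toy records only: 10 placeholder games per season, 2001 through 2015.
games = [{"season": 2001 + i % 15, "win": i % 2} for i in range(150)]
train, valid, test = split_games(games)
print(len(train), len(valid), len(test))
```

Each of the five models would then be fit on `train`, compared on `valid`, and applied once to `test`.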
Distribution and Correlation
● Distribution Review: Most variables are normally distributed
● Scatter Matrix: Few variables have a linear correlation
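One way to put a number on what the scatter matrix shows is the Pearson correlation coefficient for each pair of variables. The helper below is a minimal sketch, not the tool used for the slides, and the sample values are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical values: a perfectly linear pair and a perfectly inverse pair.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 correspond to the pairs with no visible linear trend.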
Performance Validation
● 5 Models Performance: Training and Validation
● ROC Curve for Validation
● Nominal Logistic Regression Accuracy: 72%
● Nominal Logistic Regression has the best performance
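The three numeric measures used to compare the models (accuracy, AUC, and RMSE) can be computed from predicted win probabilities as sketched below. The definitions are the common textbook ones and may differ in detail from the software used for the slides. The 0.5 threshold and the sample labels and probabilities are assumptions for illustration.

```python
import math

def accuracy(y_true, p_win, threshold=0.5):
    """Share of games where the thresholded probability matches the outcome."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_win))
    return hits / len(y_true)

def rmse(y_true, p_win):
    """Root mean squared error between predicted probability and 0/1 outcome."""
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, p_win)) / len(y_true))

def auc(y_true, p_win):
    """Area under the ROC curve, computed as the probability that a random
    win is scored above a random loss (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, p_win) if y == 1]
    neg = [p for y, p in zip(y_true, p_win) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = win) and predicted win probabilities.
y = [1, 0, 1, 1, 0]
p = [0.9, 0.4, 0.6, 0.3, 0.2]
print(accuracy(y, p), auc(y, p), rmse(y, p))
```

Higher accuracy and AUC and lower RMSE indicate a better model.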
Prediction
● 2015 Forecast Top 16 team
● 2015 Forecast Result: 73% accuracy

Result   Lose   Win
Lose     6      6
Win      5      24
Total    11     30
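The 73% figure follows from the result table: the two cells where the row and column labels agree (6 and 24) are the correct forecasts, out of 41 games in total. The slide does not say which axis holds the actual results and which the predictions, but accuracy is the same either way. A small check:

```python
# Cell counts from the 2015 result table, keyed by (row label, column label).
confusion = {
    ("Lose", "Lose"): 6,
    ("Lose", "Win"): 6,
    ("Win", "Lose"): 5,
    ("Win", "Win"): 24,
}

correct = sum(n for (row, col), n in confusion.items() if row == col)
total = sum(confusion.values())
forecast_accuracy = correct / total

print(f"{correct}/{total} = {forecast_accuracy:.0%}")  # prints "30/41 = 73%"
```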
Model Explanation
● Defensive efficiency, offensive efficiency, opponent’s blocked shots and assists are the most important attributes, based on their individual p-values
● According to our analysis results, good offensive efficiency contributes more to a team’s success than defensive efficiency
● The closer a team is to the stadium, the better it performs
Interesting Analysis
● Average score difference is narrowing
● The score pattern for the Top 5 seeds is less volatile than the one for the bottom 2 seeds
● 9 out of 16 are predicted correctly
● Only Georgetown shows a declining pattern of winning probability