2015 Sport Analysis for March Madness

March Data Crunch Madness Team Coach K. Yi Chun Chien, Xiayu Zeng, Feifei Chen, Xiaoshan Jin March 2015

Upload: yi-chun-nancy-chien

Post on 10-Aug-2015


March Data Crunch Madness

Team Coach K

Yi Chun Chien, Xiayu Zeng, Feifei Chen, Xiaoshan Jin

March 2015

Introduction

❖ Background: The NCAA Men’s Basketball Tournament is a single-elimination tournament currently featuring 68 college teams.
❖ Objective: Build an effective model of the factors that contribute to a team’s performance, based on data from 2001-2014.
❖ Result: The model indicates that box score statistics have a large effect on a team’s result in 2015, which helps predict:
➢ Win/Lose
➢ Winning Probability
➢ Sweet Sixteen

Independent & Dependent Variables

Independent Variables:
● Box Score: Assists, Steals, Blocked Shots, 2-Point and 3-Point Field Goal %, Free Throw %, Tempo
● Seed: Seed #, whether the team is a Top 5 seed, whether the team is a 15/16 seed
● Location: Latitude, Longitude, Distance Difference

Dependent Variable: Win/Lose
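The slides do not say how Distance Difference is derived from latitude and longitude. As a minimal sketch, assuming it is the team's great-circle distance to the game venue minus the opponent's, it could be computed with the haversine formula. The function names and the sign convention below are illustrative assumptions, not the team's actual method.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_difference(team, opponent, venue):
    """Assumed definition: team-to-venue distance minus opponent-to-venue
    distance. A negative value means the team is closer to the venue."""
    return haversine_km(*team, *venue) - haversine_km(*opponent, *venue)


# Hypothetical (latitude, longitude) pairs, for illustration only.
team_campus = (36.0, -79.0)
opponent_campus = (43.0, -89.0)
venue = (39.8, -86.2)
diff_km = distance_difference(team_campus, opponent_campus, venue)
```

Under this convention, a negative `diff_km` marks the team that travels less, which is the direction the Location variables are meant to capture.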

Analysis Process

● Define Data Group: Training (80%), Validation (20%), Testing (2015)
● Build 5 Models: Decision Tree, Bootstrap Forest, Boosted Tree, Neural Network, Nominal Logistic
● Performance Validation: Accuracy, ROC Curve, AUC, RMSE
● Prediction: Win/Lose, Probability, Top 16
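The validation metrics named in the process (accuracy, AUC, RMSE) can be sketched in plain Python. This is an illustration under stated assumptions rather than the team's tooling: the 0.5 decision threshold, the rank-based AUC computation, and the sample labels and probabilities are all hypothetical.

```python
import math


def accuracy(y_true, p_win, threshold=0.5):
    """Share of games where the thresholded probability matches the outcome."""
    preds = [1 if p >= threshold else 0 for p in p_win]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def rmse(y_true, p_win):
    """Root mean squared error between predicted probability and 0/1 outcome."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(p_win, y_true)) / len(y_true))


def auc(y_true, p_win):
    """Area under the ROC curve: the probability that a randomly chosen win
    is scored above a randomly chosen loss (ties count as one half)."""
    wins = [p for p, y in zip(p_win, y_true) if y == 1]
    losses = [p for p, y in zip(p_win, y_true) if y == 0]
    score = sum(1.0 if w > l else 0.5 if w == l else 0.0
                for w in wins for l in losses)
    return score / (len(wins) * len(losses))


# Hypothetical validation outcomes (1 = win) and predicted win probabilities.
y_val = [1, 0, 1, 1, 0]
p_val = [0.9, 0.4, 0.6, 0.3, 0.2]
print(accuracy(y_val, p_val), rmse(y_val, p_val), auc(y_val, p_val))
```

Accuracy depends on the chosen threshold, while AUC and RMSE use the probabilities directly, which is one reason to report all three when comparing models.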

Distribution and Correlation

● Distribution Review: Most variables are normally distributed
● Scatter Matrix: Few variables are linearly correlated
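A scatter-matrix review of linear correlation can be backed by pairwise Pearson coefficients. The sketch below is a minimal pure-Python version; the column names and values are made up for illustration and do not come from the team's data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict mapping column name to list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical per-team values, for illustration only.
sample = {
    "assists": [12.0, 15.0, 9.0, 14.0, 11.0],
    "steals": [6.0, 7.0, 5.0, 8.0, 6.0],
    "tempo": [68.0, 64.0, 70.0, 66.0, 65.0],
}
matrix = correlation_matrix(sample)
```

Coefficients near 0 correspond to the scatter plots with no visible linear trend; values near 1 or -1 flag pairs of variables that may be redundant in a regression model.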

5 Models Performance

● Nominal Logistic Regression has the best performance
● Nominal Logistic Regression Accuracy: 72%

[Figures: Performance Validation tables for Training and Validation; ROC Curve for Validation]
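For context on how a logistic regression model yields both a winning probability and a Win/Lose call, here is a minimal sketch. The coefficients, intercept, and feature names are hypothetical placeholders, not the fitted values from the team's model, and the 0.5 cutoff is an assumption.

```python
import math


def win_probability(features, coefficients, intercept=0.0):
    """Logistic model: sigmoid of a weighted sum of the feature values."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def predict_win(features, coefficients, intercept=0.0, threshold=0.5):
    """Win/Lose call obtained by thresholding the probability."""
    return win_probability(features, coefficients, intercept) >= threshold


# Hypothetical coefficients and standardized feature values.
coefs = {"offensive_efficiency": 0.8, "defensive_efficiency": -0.5, "assists": 0.2}
team = {"offensive_efficiency": 1.1, "defensive_efficiency": -0.4, "assists": 0.3}
p = win_probability(team, coefs)
```

The same probability can feed all three outputs listed in the introduction: the raw value is the winning probability, thresholding gives Win/Lose, and ranking teams by it is one possible way to pick a projected Sweet Sixteen.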

Prediction

● 2015 Forecast Top 16 teams
● 2015 Forecast Result: 73% accuracy

Result   Lose   Win
Lose     6      6
Win      5      24
Total    11     30
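The 73% figure can be checked against the counts in the result table on this slide: the diagonal cells (6 and 24) are the correctly predicted games out of 41. The dictionary layout below is only an illustrative encoding. The slide does not state which axis holds the actual result and which the prediction, and accuracy is the same either way.

```python
# Counts from the 2015 result table, keyed by row label, then column label.
confusion = {
    "Lose": {"Lose": 6, "Win": 6},
    "Win": {"Lose": 5, "Win": 24},
}

correct = sum(confusion[label][label] for label in confusion)  # diagonal cells
total = sum(sum(row.values()) for row in confusion.values())
forecast_accuracy = correct / total

# Column totals should match the "Total" row of the table (11 and 30).
column_totals = {col: sum(row[col] for row in confusion.values())
                 for col in ("Lose", "Win")}

print(f"{correct}/{total} = {forecast_accuracy:.1%}")
```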

Model Explanation

● Defensive efficiency, offensive efficiency, opponent’s blocked shots, and assists are the most important attributes, based on their individual p-values
● According to our analysis, good offensive efficiency contributes more to a team’s success than defensive efficiency
● The closer a team is to the stadium, the better it performs

Interesting Analysis

● The average score difference is narrowing
● The score pattern for Top 5 seeds is less volatile than that for the bottom 2 seeds
● 9 out of 16 are predicted correctly
● Only Georgetown shows a declining pattern of winning probability

Result and Conclusion

❖ Whether a team wins or loses is positively related to four primary factors:
➢ offensive efficiency
➢ defensive efficiency
➢ blocked shots
➢ assists
❖ Accuracy: Our model is 72.19% accurate in predicting a team’s result for 2015.