INDIAN INSTITUTE OF MANAGEMENT, KOZHIKODE
QUANTITATIVE ANALYSIS IN FOOTBALL
Quantitative Methods Project
9/18/2013
This report shows the various ways quantitative analysis can be done in football to determine the performance of a team and predict the result of a match.
Quantitative Analysis in Football
Contents
INTRODUCTION
DESCRIPTIVE STATISTICS
PROBABILITY ANALYSIS
INTERVAL ESTIMATION
HYPOTHESIS TESTING
REFERENCES
Group 27 Page 2
INTRODUCTION:
Football has had a clear impact on the financial world. The money generated by the game has grown steadily since 1990, and there have been record-breaking financial deals between football clubs and players. Beyond these deals, a huge amount of money changes hands in the form of betting. According to the Nevada Gaming Commission, $3.2 billion was wagered on sports in the state's casinos in 2011; of that amount, $1.34 billion, or 41 percent, was wagered on football alone. Thirty-three million Americans participate in fantasy football, according to the Fantasy Sports Trade Association (FSTA), which found that $1.18 billion changes hands between players through pools each year.
Hence there is a need to quantitatively evaluate not only the players but also the performance of the team as a whole. The result of an individual match is random, but the outcomes of games can be predicted using statistical analysis. In this project we show how quantitative analysis can be used to analyse the performance of a team and, in turn, to predict the results of a match.
In football betting there are only three possible outcomes at half-time and at full-time (home win, draw, away win). We have used the results of the league matches played by two teams, Real Madrid and Manchester United, from 1998 to 2013 to analyse and predict their performance.
For each season, the data used for the analysis contain the number of matches played by the team, the position in which it finished, the points it gained, and its home and away records (matches won, drawn and lost, goals scored for the team and goals scored against it).
Below is the data used for our analysis:
Real Madrid (La Liga)

                              Home                            Away
Season     Position  Played   Win  Draw  Loss  For  Against   Win  Draw  Loss  For  Against   Points
2012-2013      2       38      17     2     0   67       21     9     5     5   36       21       85
2011-2012      1       38      16     2     1   70       19    16     2     1   51       13      100
2010-2011      2       38      16     1     2   61       12    13     4     2   41       21       92
2009-2010      2       38      18     0     1   60       18    13     3     3   42       17       96
2008-2009      2       38      14     2     3   49       29    11     1     7   34       23       78
2007-2008      1       38      17     0     2   53       18    10     4     5   31       18       85
2006-2007      1       38      12     4     3   32       18    11     3     5   34       22       76
2005-2006      2       38      11     4     4   40       21     9     6     4   30       19       70
2004-2005      2       38      15     1     3   43       12    10     4     5   28       20       80
2003-2004      4       38      13     2     4   43       26     8     5     6   29       28       70
2002-2003      1       38      13     5     1   52       22     9     7     3   34       20       78
2001-2002      3       38      14     5     0   48       14     5     4    10   21       30       66
2000-2001      1       38      15     3     1   53       15     9     5     5   28       25       80
1999-2000      5       38       9     4     6   31       27     7    10     2   27       21       62
1998-1999      2       38      14     2     3   46       24     7     3     9   31       38       68

Manchester United (EPL)

                              Home                            Away
Season     Position  Played   Win  Draw  Loss  For  Against   Win  Draw  Loss  For  Against   Points
2012-2013      1       38      16     0     3   45       19    12     5     2   41       24       89
2011-2012      2       38      15     2     2   52       19    13     3     3   37       14       89
2010-2011      1       38      18     1     0   49       12     5    10     4   29       25       80
2009-2010      2       38      16     1     2   52       12    11     3     5   34       16       85
2008-2009      1       38      16     2     1   43       13    12     4     3   25       11       90
2007-2008      1       38      17     1     1   47        7    10     5     4   33       15       87
2006-2007      1       38      15     2     2   46       12    13     3     3   37       15       89
2005-2006      2       38      13     5     1   37        8    12     3     4   35       26       83
2004-2005      3       38      12     6     1   31       12    10     5     4   27       14       77
2003-2004      3       38      12     4     3   37       15    11     2     6   27       20       75
2002-2003      1       38      16     2     1   42       12     9     6     4   32       22       83
2001-2002      3       38      11     2     6   40       17    13     3     3   47       28       77
2000-2001      1       38      15     2     2   49       12     9     6     4   30       19       80
1999-2000      1       38      15     4     0   59       16    13     3     3   38       29       91
1998-1999      1       38      14     4     1   45       18     8     9     2   35       19       79

DESCRIPTIVE STATISTICS:
Descriptive statistics is the discipline of describing the main features of a collection of data. The measures commonly used to describe a data set are measures of central tendency and measures of variability (dispersion). Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance) and the minimum and maximum values of the variable. Kurtosis and skewness describe the shape of the distribution.
Based on the positions held by the teams in the various seasons, we can conclude that Manchester United has remained in the top three for the past 15 seasons, finishing first in 9 of them.
[Figure: Positions profile of Manchester United (finishing positions 1, 2 and 3)]
Real Madrid has remained in the top five for the past 15 seasons, finishing first in 5 of them and second in 7.
[Figure: Positions profile of Real Madrid (finishing positions 1 to 5)]
Stacked column charts show the relationship of individual items to the whole, comparing the contribution of each value to a total across categories. The number of wins, draws and losses at home or away in each season can be depicted using a stacked column chart, with each stack made up of the number of wins, the number of draws and the number of losses.
[Figure: Home - Manchester United. Stacked chart of wins, draws and losses per season, 1998-1999 to 2012-2013; scale 0 to 20 matches]
[Figure: Home - Real Madrid. Stacked chart of wins, draws and losses per season, 1998-1999 to 2012-2013; scale 0 to 20 matches]
[Figure: Away - Manchester United. Stacked chart of wins, draws and losses per season, 1998-1999 to 2012-2013; scale 0 to 20 matches]
[Figure: Away - Real Madrid. Stacked chart of wins, draws and losses per season, 1998-1999 to 2012-2013; scale 0 to 20 matches]
The summary statistics for the number of home and away wins per season of each team are as follows:

                      Manchester United            Real Madrid
Statistic             Home wins    Away wins       Home wins    Away wins
Mean                  14.73333     10.73333        14.26667     9.8
Standard Error        0.511456     0.589323        0.628427     0.711805217
Median                15           11              14           9
Mode                  16           13              14           9
Standard Deviation    1.980861     2.282438        2.433888     2.75680975
Sample Variance       3.92381      5.209524        5.92381      7.6
Kurtosis              -0.44462     1.366206        0.111816     0.715807738
Skewness              -0.46411     -1.16011        -0.52951     0.57022685
Range                 7            8               9            11
Minimum               11           5               9            5
Maximum               18           13              18           16
Sum                   221          161             214          147
Count                 15           15              15           15
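
As an illustration, the summary statistics for Manchester United's home wins can be recomputed from the season data with Python's standard library. This is a minimal sketch, not the method the report used: the names `describe` and `sample_skewness` are hypothetical, and the adjusted Fisher-Pearson skewness formula is an assumption about how the reported skewness was obtained.

```python
import math
import statistics

# Manchester United home wins per season, 2012-2013 back to 1998-1999,
# copied from the data table in this report.
MAN_UTD_HOME_WINS = [16, 15, 18, 16, 16, 17, 15, 13, 12, 12, 16, 11, 15, 15, 14]


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (assumed formula, see lead-in)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)


def describe(values):
    """Return the main descriptive statistics of a sample as a dict."""
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": statistics.mean(values),
        "standard_error": sd / math.sqrt(len(values)),
        "median": statistics.median(values),
        # With tied modes, Python 3.8+ returns the first one encountered.
        "mode": statistics.mode(values),
        "standard_deviation": sd,
        "sample_variance": statistics.variance(values),
        "skewness": sample_skewness(values),
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "count": len(values),
    }


if __name__ == "__main__":
    for name, value in describe(MAN_UTD_HOME_WINS).items():
        print(f"{name}: {value}")
```

The same function can be applied to the other three series (Manchester United away wins, Real Madrid home and away wins) by swapping in the corresponding column of the data table.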

Box Plot
This plot is used to show the dispersion of the values around the median and to judge the skewness of the values.
Real Madrid
Wins per season, sorted in ascending order:
Home wins: 9, 11, 12, 13, 13, 14, 14, 14, 15, 15, 16, 16, 17, 17, 18
Away wins: 5, 7, 7, 8, 9, 9, 9, 9, 10, 10, 11, 11, 13, 13, 16

                  Home   Away
Minimum (x1)         9      5
Q1                  13      8
Median              14      9
Q3                  16     11
Maximum (x2)        18     16

[Figure: Box plot of Real Madrid home wins, marked at 9, 13, 14, 16 and 18]
The distribution of home wins is left skewed (skewness -0.53): most seasons lie towards the high end of the range, indicating that a high number of matches are won at home.
The distribution of away wins is right skewed (skewness 0.57): most seasons lie towards the low end of the range, indicating that fewer matches are won away from home.
Manchester United
Wins per season, sorted in ascending order:
Home wins: 11, 12, 12, 13, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 18
Away wins: 5, 8, 9, 9, 10, 10, 11, 11, 12, 12, 12, 13, 13, 13, 13

                  Home   Away
Minimum (x1)        11      5
Q1                  13      9
Median              15     11
Q3                  16     13
Maximum (x2)        18     13
The distribution of home wins is left skewed (skewness -0.46), indicating that a high number of matches are won at home.
The distribution of away wins is also left skewed (skewness -1.16), indicating that a high number of matches are won away from home as well. The shape of the distribution is therefore similar in the two cases, although the mean number of wins per season is lower away (10.73) than at home (14.73).
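
The five values that define each box plot (minimum, lower quartile, median, upper quartile, maximum) can be sketched in Python as follows. The helper name `five_number_summary` is hypothetical, and the "exclusive" quantile method is an assumption; it happens to reproduce the quartiles listed above for 15 observations, but other quartile conventions can give slightly different values.

```python
import statistics

# Real Madrid wins per season, 2012-2013 back to 1998-1999,
# copied from the data table in this report.
REAL_MADRID_HOME_WINS = [17, 16, 16, 18, 14, 17, 12, 11, 15, 13, 13, 14, 15, 9, 14]
REAL_MADRID_AWAY_WINS = [9, 16, 13, 13, 11, 10, 11, 9, 10, 8, 9, 5, 9, 7, 7]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sample."""
    # quantiles() sorts the data itself; n=4 yields the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)


if __name__ == "__main__":
    print("Home:", five_number_summary(REAL_MADRID_HOME_WINS))
    print("Away:", five_number_summary(REAL_MADRID_AWAY_WINS))
```

A median closer to the upper quartile than to the lower one suggests a left-skewed sample, and the reverse suggests a right-skewed one.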

PROBABILITY ANALYSIS
Determining the distribution of the number of home wins of both teams

Let X be the random variable that denotes the number of home wins in a season.
X is assumed to follow a normal distribution with parameters µ and σ.
The standard normal variable is
$$z = \frac{X - \mu}{\sigma}$$
and the standard normal density function is
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}$$
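
The two formulas above can be evaluated directly. The sketch below uses hypothetical function names (`z_score`, `standard_normal_pdf`) and, as an example input, the Real Madrid parameters reported later in this section (µ = 14.26666667, σ = 2.433887739). Values read from a printed normal table, with Z rounded to two decimals, may differ slightly in the later decimal places.

```python
import math


def z_score(x, mu, sigma):
    """Standardise x: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def standard_normal_pdf(z):
    """Standard normal density: f(z) = exp(-z^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    mu, sigma = 14.26666667, 2.433887739  # Real Madrid home wins, from this report
    for wins in (9, 14, 17):
        z = z_score(wins, mu, sigma)
        print(f"wins={wins}  z={z:.6f}  f(z)={standard_normal_pdf(z):.4f}")
```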

Manchester United
µ = 14.733; σ = 1.980860804

Season      Win    Z         f(Z)
2012-2013   16      0.639    0.3252
2011-2012   15      0.135    0.3953
2010-2011   18      1.649    0.1024
2009-2010   16      0.639    0.3252
2008-2009   16      0.639    0.3252
2007-2008   17      1.144    0.2073
2006-2007   15      0.135    0.3953
2005-2006   13     -0.875    0.2720
2004-2005   12     -1.380    0.1540
2003-2004   12     -1.380    0.1540
2002-2003   16      0.639    0.3252
2001-2002   11     -1.885    0.0675
2000-2001   15      0.135    0.3953
1999-2000   15      0.135    0.3953
1998-1999   14     -0.370    0.3725
Hence, the standard normal distribution of home wins is given by the graph:
[Figure: Standard normal density f(Z) plotted against Z for Manchester United home wins]

Real Madrid
µ = 14.26666667; σ = 2.433887739

Season      Win    Z               f(Z)
2012-2013   17      1.123031802    0.2131
2011-2012   16      0.712166509    0.3101
2010-2011   16      0.712166509    0.3101
2009-2010   18      1.533897096    0.1238
2008-2009   14     -0.109564078    0.397
2007-2008   17      1.123031802    0.2131
2006-2007   12     -0.931294665    0.2589
2005-2006   11     -1.342159959    0.1625
2004-2005   15      0.301301215    0.3814
2003-2004   13     -0.520429372    0.3485
2002-2003   13     -0.520429372    0.3485
2001-2002   14     -0.109564078    0.397
2000-2001   15      0.301301215    0.3814
1999-2000    9     -2.163890546    0.0387
1998-1999   14     -0.109564078    0.397
Hence, the standard normal distribution of home wins is given by the graph:
[Figure: Standard normal density f(Z) plotted against Z for Real Madrid home wins; Z axis from -2.5 to 2, f(Z) axis from 0 to 0.45]
Under the standard normal distribution, all 15 seasons of Manchester United's home wins lie within µ ± 2σ, and all of Real Madrid's do except 1999-2000 (9 wins, Z = -2.16). The means (14.73 and 14.27) and standard deviations (1.98 and 2.43) of the two distributions are close, indicating similar performance on home ground.

Calculating the expected number of points gained by a team in a match

Points gained if the match is won = 3
Points gained if the match is drawn = 1
Points gained if the match is lost = 0

Manchester United (probabilities estimated from the 285 home matches in the data above)

Outcome   x (points)   p(x)          x·p(x)
Win       3            0.775438596   2.326316
Draw      1            0.133333333   0.133333
Loss      0            0.09122807    0
Total                                2.459649

Hence, the expected number of points gained by Manchester United in a home match is 2.46.

Real Madrid (probabilities estimated from the 285 away matches in the data above)

Outcome   x (points)   p(x)       x·p(x)
Win       3            0.515789   1.547368
Draw      1            0.231579   0.231579
Loss      0            0.252632   0
Total                             1.778947

Hence, the expected number of points gained by Real Madrid in an away match is 1.778947.
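
The expected-points calculation is a probability-weighted sum, E(X) = Σ x·p(x). A minimal sketch follows; the function name `expected_points` is hypothetical, and the win/draw/loss counts are column totals taken from the data table (Manchester United's 15 seasons of home matches and Real Madrid's 15 seasons of away matches), which reproduce the probabilities listed above.

```python
# Points awarded for each outcome, as stated in the text.
POINTS = {"win": 3, "draw": 1, "loss": 0}


def expected_points(wins, draws, losses):
    """Expected points per match, using relative frequencies as probabilities."""
    total = wins + draws + losses
    probabilities = {
        "win": wins / total,
        "draw": draws / total,
        "loss": losses / total,
    }
    return sum(POINTS[outcome] * p for outcome, p in probabilities.items())


if __name__ == "__main__":
    # Column totals over the 15 seasons in the data table.
    print("Manchester United (home):", round(expected_points(221, 38, 26), 6))
    print("Real Madrid (away):", round(expected_points(147, 66, 72), 6))
```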

Determining the expected amount of money that a team will make in a future season

Manchester United
Considering a sample of 15 English Premier League seasons:

Position   Number of times in 15 years
1          9
2          3
3          3
Total      15

Event          x (in million dollars)   P(X)   xP(X)
Finishes 1st   15.1                     0.60   9.06
Finishes 2nd   7.3                      0.20   1.46
Finishes 3rd   4.5                      0.20   0.90
E(X)                                           11.42

Thus, for the next Premier League season the team can be expected to make $11.42 million. The management can therefore afford a maintenance cost of at most $11.42 million to break even; a higher cost will result in an expected loss. (Currently the maintenance cost for Manchester United stands at around $9 million yearly.)

Real Madrid
Considering a sample of 15 Spanish La Liga seasons:

Position   Number of times in 15 years
1          5
2          7
3          1
4          1
5          1
Total      15

Event          x (in million dollars)   P(X)   xP(X)
Finishes 1st   8.6                      0.33   2.87
Finishes 2nd   5.2                      0.47   2.43
Finishes 3rd   4.1                      0.07   0.27
Finishes 4th   3.3                      0.07   0.22
Finishes 5th   2.1                      0.07   0.14
E(X)                                           5.93

Thus, for the next La Liga season the team can be expected to make $5.93 million. The management can therefore afford a maintenance cost of at most $5.93 million to break even; a higher cost will result in an expected loss. (Currently the maintenance cost for Real Madrid stands at around $4 million yearly.)
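
The expected earnings are again a sum of x·P(X) over all finishing positions, with P(X) estimated as the share of the 15 seasons in which the team finished in that position. The sketch below uses a hypothetical helper, `expected_value`, with the payouts and position counts taken from the tables above; it sums every listed position.

```python
def expected_value(outcomes):
    """outcomes: list of (payout, count) pairs; P(X) is count / total count."""
    total = sum(count for _, count in outcomes)
    return sum(payout * count / total for payout, count in outcomes)


# (payout in million dollars, number of seasons finishing in that position)
MAN_UTD = [(15.1, 9), (7.3, 3), (4.5, 3)]
REAL_MADRID = [(8.6, 5), (5.2, 7), (4.1, 1), (3.3, 1), (2.1, 1)]

if __name__ == "__main__":
    print("Manchester United E(X):", round(expected_value(MAN_UTD), 2))
    print("Real Madrid E(X):", round(expected_value(REAL_MADRID), 2))
```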

INTERVAL ESTIMATION
Manchester United
Estimating the mean number of goals scored per season by Manchester United

A sample of the past 15 seasons shows a mean of 78.73 goals per season and a standard deviation of 10.22. Assuming the number of goals scored per season is normally distributed, we construct a 95% confidence interval for the mean.

Data and Analysis:

Sample size                       15
Mean                              78.73
Standard deviation                10.22
Confidence level                  95%
Standard error of the mean (Sx)   2.64
Degrees of freedom                14
t value                           2.145

Using the t distribution, the interval is $\bar{x} \pm t_{0.025,\,14}\, S_{\bar{x}}$ with $S_{\bar{x}} = s / \sqrt{n}$, which gives the maximum and minimum values:

Max   84.39
Min   73.06

Real Madrid
Estimating the mean number of goals scored per season by Real Madrid

A sample of the past 15 seasons shows a mean of 83 goals per season and a standard deviation of 17.24. Assuming the number of goals scored per season is normally distributed, we construct a 95% confidence interval for the mean.

Data and Analysis:

Sample size                       15
Mean                              83
Standard deviation                17.23783215
Confidence level                  95%
Standard error of the mean (Sx)   4.450789122
Degrees of freedom                14
t value                           2.145

Max   92.54
Min   73.45

Conclusion:
With 95% confidence, the mean number of goals scored per season by Manchester United lies in the range of 73 to 84.
With 95% confidence, the mean number of goals scored per season by Real Madrid lies in the range of 73 to 92.
Comparing the two teams' statistics, it can be concluded that Manchester United is expected to perform more consistently, with less variation, than Real Madrid.
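
The interval calculation used in this section can be sketched as follows. The function name `t_confidence_interval` is hypothetical; the critical value 2.145 (14 degrees of freedom, 95% confidence) is taken from the tables above rather than computed, and the inputs are the rounded summary figures quoted in the text, so the last decimal place may differ slightly from the reported limits.

```python
import math


def t_confidence_interval(mean, sd, n, t_critical):
    """Confidence interval for a mean: mean +/- t * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = t_critical * standard_error
    return mean - margin, mean + margin


if __name__ == "__main__":
    T_CRITICAL = 2.145  # t value for 14 degrees of freedom, from this report
    low, high = t_confidence_interval(78.73, 10.22, 15, T_CRITICAL)
    print(f"Manchester United: {low:.2f} to {high:.2f}")
    low, high = t_confidence_interval(83, 17.23783215, 15, T_CRITICAL)
    print(f"Real Madrid: {low:.2f} to {high:.2f}")
```

The wider interval for Real Madrid comes entirely from its larger standard deviation, since the sample size and critical value are the same for both teams.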

HYPOTHESIS TESTING:
Manchester United
One sample hypothesis

Problem: A random sample of 570 English Premier League matches featuring Manchester United showed that the average number of goals scored by them was x̄ = 1.182 per match, with standard deviation s = 0.1851. Is the average number of goals scored by Manchester United in a match greater than 1? (Level of significance = 1%)

EIGHT STEP PROCEDURE:

Step 1. The parameter of interest is the mean number of goals scored by Manchester United per match, µ. (σ is not given.)

Step 2. H0: µ ≤ 1

Step 3. Ha: µ > 1

Step 4. α = 0.01

Step 5. The test statistic is
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Step 6. Given that n = 570, d.f. = 569 (as d.f. > 100, it can be approximated as infinity and the critical value read from the table accordingly). For α = 0.01 and d.f. = 569, the critical value is t(0.01, 569) = 2.326. Hence, reject H0 if t0 > 2.326.

Step 7. Computations: Since x̄ = 1.182, s = 0.1851, µ0 = 1 and n = 570, we have
$$t_0 = \frac{1.182 - 1}{0.1851 / \sqrt{570}} = 23.47$$

Step 8. Conclusion: Since t0 = 23.47 > 2.326 (t(0.01, 569)), we reject the null hypothesis (H0: µ ≤ 1) at the 0.01 level of significance. Therefore, we conclude that the mean number of goals scored by Manchester United per match exceeds 1, based on hypothesis testing using the sample of 570 Manchester United EPL matches and a 1% level of significance.

Real Madrid
One sample hypothesis

Problem: A random sample of 570 Spanish La Liga matches featuring Real Madrid showed that the average number of goals scored by them was x̄ = 1.31 per match, with standard deviation s = 0.301. Is the average number of goals scored by Real Madrid in a match greater than 1? (Level of significance = 5%)

EIGHT STEP PROCEDURE:

Step 1. The parameter of interest is the mean number of goals scored by Real Madrid per match, µ. (σ is not given.)

Step 2. H0: µ ≤ 1

Step 3. Ha: µ > 1

Step 4. α = 0.05

Step 5. The test statistic is
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Step 6. Given that n = 570, d.f. = 569 (as d.f. > 100, it can be approximated as infinity and the critical value read from the table accordingly). For α = 0.05 and d.f. = 569, the critical value is t(0.05, 569) = 1.645. Hence, reject H0 if t0 > 1.645.

Step 7. Computations: Since x̄ = 1.31, s = 0.301, µ0 = 1 and n = 570, we have
$$t_0 = \frac{1.31 - 1}{0.301 / \sqrt{570}} = 24.59$$

Step 8. Conclusion: Since t0 = 24.59 > 1.645 (t(0.05, 569)), we reject the null hypothesis (H0: µ ≤ 1) at the 0.05 level of significance. Therefore, we conclude that the mean number of goals scored by Real Madrid per match exceeds 1, based on hypothesis testing using the sample of 570 Real Madrid La Liga matches and a 5% level of significance.
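
Steps 5 to 8 of the procedure can be sketched in a few lines of Python. The function names `one_sample_t` and `reject_upper_tail` are hypothetical, and the critical values (2.326 and 1.645) are the table values quoted in the text for large degrees of freedom rather than values computed by the code.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Test statistic t0 = (x_bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))


def reject_upper_tail(t0, critical_value):
    """Upper-tailed test: reject H0 (mu <= mu0) when t0 exceeds the critical value."""
    return t0 > critical_value


if __name__ == "__main__":
    cases = [
        # (team, x_bar, s, n, critical value from the table)
        ("Manchester United", 1.182, 0.1851, 570, 2.326),
        ("Real Madrid", 1.31, 0.301, 570, 1.645),
    ]
    for team, x_bar, s, n, critical in cases:
        t0 = one_sample_t(x_bar, 1.0, s, n)
        print(f"{team}: t0 = {t0:.2f}, reject H0: {reject_upper_tail(t0, critical)}")
```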
REFERENCES:
Source of data:
http://www.statto.com/football/teams/real-madrid/history/modern
http://www.statto.com/football/teams/manchester-united/history/modern