Generalized Linear Mixed Model
English Premier League Soccer – 2003/2004 Season
Introduction
• English Premier League Soccer (Football)
  - 20 Teams: each plays all others twice (home/away)
  - Games consist of two halves (45 minutes each)
  - No overtime
  - Each team is on offense and defense for 38 games (38 first and second halves)
• Response Variable: Goals in a half
• Potential Independent Variables
  - Fixed Factors: Home Dummy, Half2 Dummy, Game# (1-38)
  - Random Factors: Offensive Team, Defensive Team
• Distribution of Response: Poisson?
Preliminary Summary

Team                 Off Goals  Def Goals    Team                 Off Goals  Def Goals
Arsenal                     73         26    Southampton                 44         45
Aston Villa                 48         44    Wolverhampton               38         77
Blackburn                   51         59    Birmingham                  43         48
Charlton                    51         51    Bolton                      48         56
Everton                     45         57    Chelsea                     67         30
Leeds United                40         79    Fulham                      52         46
Liverpool                   55         37    Leicester City              48         65
Manchester United           64         35    Manchester City             55         54
Newcastle                   52         40    Middlesbrough               44         52
Tottenham                   47         57    Portsmouth                  47         54

Half2   Goals
0         461
1         551

Home    Goals
0         440
1         572
Goals by Game Order
[Figure: Total Goals (vertical axis, 0 to 45) plotted against Game Order (horizontal axis, 0 to 40). Durbin-Watson statistic: DW = 2.03335]
Summary of Previous Slide
• Teams vary extensively on offense and defense
  - Offense: min=38, max=73, mean=50.6, SD=8.85
  - Defense: min=26, max=79, mean=50.6, SD=13.75
  - Strong negative correlation between off/def: r=-0.80
• Home Teams outscore Away Teams 1.3:1
• Second Half outscores First Half 1.2:1
• No evidence of autocorrelation in total goals scored over weeks, Durbin-Watson Stat = 2.03
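The summary figures above can be rechecked directly from the Preliminary Summary table. The following is a minimal sketch, assuming the SDs on the slide are sample SDs (divisor n-1) and that r is the Pearson correlation; the helper names are illustrative and not part of the original analysis.

```python
from math import sqrt

# (offense goals, defense goals) per team, copied from the Preliminary Summary table
goals = {
    "Arsenal": (73, 26), "Aston Villa": (48, 44), "Blackburn": (51, 59),
    "Charlton": (51, 51), "Everton": (45, 57), "Leeds United": (40, 79),
    "Liverpool": (55, 37), "Manchester United": (64, 35), "Newcastle": (52, 40),
    "Tottenham": (47, 57), "Southampton": (44, 45), "Wolverhampton": (38, 77),
    "Birmingham": (43, 48), "Bolton": (48, 56), "Chelsea": (67, 30),
    "Fulham": (52, 46), "Leicester City": (48, 65), "Manchester City": (55, 54),
    "Middlesbrough": (44, 52), "Portsmouth": (47, 54),
}

def mean(xs):
    return sum(xs) / len(xs)

def sample_sd(xs):
    # Sample standard deviation (divisor n-1); assumed to match the slide
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

off = [o for o, d in goals.values()]
dfn = [d for o, d in goals.values()]

home_ratio = 572 / 440   # home goals : away goals
half_ratio = 551 / 461   # second-half goals : first-half goals

print(f"Offense: mean={mean(off):.1f}, SD={sample_sd(off):.2f}")
print(f"Defense: mean={mean(dfn):.1f}, SD={sample_sd(dfn):.2f}")
print(f"r(off, def)={pearson_r(off, dfn):.2f}")
print(f"Home:Away={home_ratio:.2f}, Half2:Half1={half_ratio:.2f}")
```

Every goal is one team's offensive goal and another team's defensive goal, so both columns total 1012 and share the mean 50.6.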
“Marginal Analysis” – No Team Effects
• Break Down Goals by Home/Half2 (380 Games)

Goals       Home1    Road1    Home2    Road2
Mean       0.6921   0.5211   0.8132   0.6368
Variance   0.6886   0.5141   0.9122   0.6277

Obs freqs   Home1    Road1    Home2    Road2
0             192      223      175      198
1             127      124      130      133
2              48       26       56       41
3              12        6       10        6
4               1        1        8        1
5               0        0        1        1
6+              0        0        0        0

Exp freqs   Home1    Road1    Home2    Road2
0          190.20   225.68   168.51   201.00
1          131.64   117.59   137.03   128.01
2           45.55    30.64    55.71    40.76
3+          12.61     6.09    18.75    10.23

Chi-Sq      Home1    Road1    Home2    Road2
0          0.0171   0.0318   0.2497   0.0449
1          0.1633   0.3493   0.3604   0.1946
2          0.1314   0.7014   0.0015   0.0014
3+         0.0120   0.1350   0.0034   0.4846
Sum        0.3238   1.2175   0.6151   0.7256
df              2        2        2        2
CV(.05)     5.991    5.991    5.991    5.991
P-value    0.8505   0.5440   0.7353   0.6957

Corr        Home1    Road1    Home2    Road2
Home1      1.0000  -0.0445   0.0970   0.1184
Road1     -0.0445   1.0000   0.1079   0.0460
Home2      0.0970   0.1079   1.0000  -0.0794
Road2      0.1184   0.0460  -0.0794   1.0000
Summary of Previous Slide
• Means (Variances) for 4 Half Types:
  - Home/1st Half: Mean = 0.692, Variance = 0.689
  - Away/1st Half: Mean = 0.521, Variance = 0.514
  - Home/2nd Half: Mean = 0.813, Variance = 0.912
  - Away/2nd Half: Mean = 0.637, Variance = 0.628
  - Thus, means and variances are in strong agreement
• Chi-Square Statistics for testing for Poisson:
  - Df = (4 categories - 1) - (1 parameter estimated) = 2
  - P-values all exceed 0.50 (.8505, .5440, .7353, .6957)
  - Goals scored are consistent with a Poisson distribution
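The Poisson goodness-of-fit calculation summarized here can be sketched as follows. This is an illustrative recomputation, assuming the observed frequencies from the marginal analysis (380 halves of each type), categories 0, 1, 2, 3+, and the sample mean as the Poisson rate estimate. The function name `poisson_gof` is hypothetical. With 2 degrees of freedom the chi-square upper tail has the closed form exp(-x/2), so no statistics library is needed.

```python
from math import exp

# Observed counts of halves with 0, 1, 2, 3, 4, 5 goals (380 halves per type)
observed = {
    "Home/1st": [192, 127, 48, 12, 1, 0],
    "Away/1st": [223, 124, 26, 6, 1, 0],
    "Home/2nd": [175, 130, 56, 10, 8, 1],
    "Away/2nd": [198, 133, 41, 6, 1, 1],
}

def poisson_gof(counts):
    """Return (mean, chi-square, p-value) for a Poisson fit with cells 0, 1, 2, 3+."""
    n = sum(counts)
    mean = sum(k * c for k, c in enumerate(counts)) / n
    p0 = exp(-mean)
    p1 = p0 * mean
    p2 = p1 * mean / 2
    expected = [n * p0, n * p1, n * p2]
    expected.append(n - sum(expected))        # 3+ cell takes the remaining mass
    obs = counts[:3] + [sum(counts[3:])]      # pool 3 or more goals
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    p_value = exp(-chi2 / 2)                  # exact upper tail for df = 2
    return mean, chi2, p_value

results = {name: poisson_gof(c) for name, c in observed.items()}
for name, (m, x2, p) in results.items():
    print(f"{name}: mean={m:.4f}  X2={x2:.4f}  P={p:.4f}")
```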
Observed & Expected Counts
[Figure: bar chart of observed and expected frequencies (vertical axis: Frequency, 0 to 250) for 0, 1, 2, and 3+ goals, in four panels: Home/1st Half, Away/1st Half, Home/2nd Half, Away/2nd Half]
Generalized Linear Models
• Dependent Variable: Goals Scored
• Distribution: Poisson
• Link Function: log
• Independent Variables: Home, Half2 Dummy Variables
• Models:

$$\text{Model 1:}\quad \log(E(Y)) = \beta_0 + \beta_{Home}\,Home + \beta_{Half2}\,Half2$$

$$\text{Model 2:}\quad \log(E(Y)) = \beta_0 + \beta_{Home}\,Home + \beta_{Half2}\,Half2 + \beta_{HomeHalf2}\,Home \cdot Half2$$

Model fit using generalized linear model software packages
Parameter Estimates / Model Fit – Model 1

Distribution                     Poisson
Link Function                    Log
Dependent Variable               goals
Number of Observations Read      1520
Number of Observations Used      1520

Criteria For Assessing Goodness Of Fit
Criterion               DF        Value    Value/DF
Deviance              1517    1650.4574      1.0880
Scaled Deviance       1517    1650.4574      1.0880
Pearson Chi-Square    1517    1549.2570      1.0213
Scaled Pearson X2     1517    1549.2570      1.0213
Log Likelihood               -1411.0226

Algorithm converged.

Analysis Of Parameter Estimates
                            Standard    Wald 95% Confidence      Chi-
Parameter    DF  Estimate      Error          Limits           Square   Pr > ChiSq
Intercept     1   -0.6397     0.0588    -0.7549   -0.5245      118.48       <.0001
home          1    0.2624     0.0634     0.1381    0.3866       17.12       <.0001
half2         1    0.1783     0.0631     0.0546    0.3020        7.98       0.0047
Scale         0    1.0000     0.0000     1.0000    1.0000

NOTE: The scale parameter was held fixed.
Parameter Estimates / Model Fit – Model 2

Criteria For Assessing Goodness Of Fit
Criterion               DF        Value    Value/DF
Deviance              1516    1650.3613      1.0886
Scaled Deviance       1516    1650.3613      1.0886
Pearson Chi-Square    1516    1549.7072      1.0222
Scaled Pearson X2     1516    1549.7072      1.0222
Log Likelihood               -1410.9745

Algorithm converged.

Analysis Of Parameter Estimates
                            Standard    Wald 95% Confidence      Chi-
Parameter    DF  Estimate      Error          Limits           Square   Pr > ChiSq
Intercept     1   -0.6519     0.0711    -0.7912   -0.5126       84.15       <.0001
home          1    0.2839     0.0941     0.0995    0.4683        9.10       0.0026
half2         1    0.2007     0.0958     0.0129    0.3885        4.39       0.0363
home*half2    1   -0.0395     0.1274    -0.2891    0.2101        0.10       0.7566
Scale         0    1.0000     0.0000     1.0000    1.0000

NOTE: The scale parameter was held fixed.
Testing for Home/Half2 Interaction
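• H0: No Home x Half2 Interaction ($\beta_{HomeHalf2} = 0$)
• HA: Home x Half2 Interaction ($\beta_{HomeHalf2} \neq 0$)
• Test 1 – Wald Test
• Test 2 – Likelihood Ratio Test

Wald Test:

$$\text{T.S.: } X^2_{obs} = \left(\frac{\hat{\beta}_{HomeHalf2}}{SE(\hat{\beta}_{HomeHalf2})}\right)^2 = \left(\frac{-0.0395}{0.1274}\right)^2 = 0.0961 \qquad P = P(\chi^2_1 \ge 0.0961) = 0.7566$$

Likelihood Ratio Test:

$$\text{T.S.: } X^2_{obs} = -2\log(\text{likelihood}(H_0)) - \left(-2\log(\text{likelihood}(H_A))\right) = \left(-2(-1411.0226)\right) - \left(-2(-1410.9745)\right) = 0.0962 \qquad P = P(\chi^2_1 \ge 0.0962) = 0.7564$$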
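Both interaction tests can be reproduced from the reported Model 1 and Model 2 output. The sketch below assumes only the printed (rounded) estimate, standard error, and log likelihoods; `chi2_sf_1df` is an illustrative helper that uses the identity P(chi-square with 1 df >= x) = erfc(sqrt(x/2)).

```python
from math import erfc, sqrt

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square with 1 df, via the normal tail."""
    return erfc(sqrt(x / 2.0))

# Wald test: squared ratio of the interaction estimate to its standard error
beta_hat, se = -0.0395, 0.1274          # home*half2 row of the Model 2 output
wald = (beta_hat / se) ** 2
p_wald = chi2_sf_1df(wald)

# Likelihood ratio test: difference in -2 log likelihood between the models
loglik_model1 = -1411.0226              # H0: no interaction
loglik_model2 = -1410.9745              # HA: interaction included
lr = (-2 * loglik_model1) - (-2 * loglik_model2)
p_lr = chi2_sf_1df(lr)

print(f"Wald: X2={wald:.4f}, P={p_wald:.4f}")
print(f"LR:   X2={lr:.4f}, P={p_lr:.4f}")
```

The two statistics nearly coincide here, as expected when the estimate is close to zero and the sample is large.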
Testing for Main Effects for Home & Half2
• Wald tests only reported here (both effects are very significant)
• Tests based on Model 1 (no interaction model)
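Home Effect:

$$H_0: \beta_{Home} = 0 \qquad H_A: \beta_{Home} \neq 0$$

$$\text{T.S.: } X^2_{obs} = \left(\frac{0.2624}{0.0634}\right)^2 = 17.13 \qquad P = P(\chi^2_1 \ge 17.13) < .0001$$

Half2 Effect:

$$H_0: \beta_{Half2} = 0 \qquad H_A: \beta_{Half2} \neq 0$$

$$\text{T.S.: } X^2_{obs} = \left(\frac{0.1783}{0.0631}\right)^2 = 7.98 \qquad P = P(\chi^2_1 \ge 7.98) = .0047$$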
Interpreting the GLM
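Model:

$$E(Y \mid Home, Half2) = e^{\beta_0 + \beta_{Home} Home + \beta_{Half2} Half2}$$

$$\begin{aligned}
\text{Away/Half1 } (Home=0, Half2=0):\quad & E(Y) = e^{\beta_0} \\
\text{Home/Half1 } (Home=1, Half2=0):\quad & E(Y) = e^{\beta_0 + \beta_{Home}} = e^{\beta_0} e^{\beta_{Home}} \\
\text{Away/Half2 } (Home=0, Half2=1):\quad & E(Y) = e^{\beta_0 + \beta_{Half2}} = e^{\beta_0} e^{\beta_{Half2}} \\
\text{Home/Half2 } (Home=1, Half2=1):\quad & E(Y) = e^{\beta_0 + \beta_{Home} + \beta_{Half2}} = e^{\beta_0} e^{\beta_{Home}} e^{\beta_{Half2}}
\end{aligned}$$

Estimated Means:

$$\begin{aligned}
\text{Away/Half1:}\quad & e^{\hat{\beta}_0} = e^{-0.6397} = 0.5275 \\
\text{Home/Half1:}\quad & e^{\hat{\beta}_0 + \hat{\beta}_{Home}} = e^{-0.6397 + 0.2624} = 0.53(1.30) = 0.686 \\
\text{Away/Half2:}\quad & e^{\hat{\beta}_0 + \hat{\beta}_{Half2}} = e^{-0.6397 + 0.1783} = 0.53(1.20) = 0.630 \\
\text{Home/Half2:}\quad & e^{\hat{\beta}_0 + \hat{\beta}_{Home} + \hat{\beta}_{Half2}} = e^{-0.6397 + 0.2624 + 0.1783} = 0.686(1.20) = 0.820
\end{aligned}$$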
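The four estimated means follow from exponentiating the Model 1 linear predictor. A minimal sketch, assuming the rounded Model 1 estimates (intercept -0.6397, home 0.2624, half2 0.1783); the function name `expected_goals` is illustrative.

```python
from math import exp

# Model 1 estimates on the log scale
B0, B_HOME, B_HALF2 = -0.6397, 0.2624, 0.1783

def expected_goals(home, half2):
    """Estimated mean goals in a half for given Home and Half2 dummies."""
    return exp(B0 + B_HOME * home + B_HALF2 * half2)

means = {
    "Away/Half1": expected_goals(0, 0),
    "Home/Half1": expected_goals(1, 0),
    "Away/Half2": expected_goals(0, 1),
    "Home/Half2": expected_goals(1, 1),
}

# Multiplicative effects: exp(coefficient) is a rate ratio
home_multiplier = exp(B_HOME)    # about 1.30
half2_multiplier = exp(B_HALF2)  # about 1.20

for label, m in means.items():
    print(f"{label}: {m:.3f}")
```

Because the link is the log, each dummy multiplies the baseline rate: playing at home scales expected goals by about 1.30 and the second half by about 1.20, matching the raw ratios in the preliminary summary.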
Incorporating Random (Team) Effects
• Teams clearly vary in terms of offensive and defensive skills (see slide 3)
• Since many factors are inputs into team abilities (players, coaches, chemistry), we will treat team offensive and defensive effects as Random
• There will be 20 random offensive effects (one per team) and 20 defensive effects
Random Team Effects
• All effects are on log scale for goals scored
• Offense Effects: $o_i \sim NID(0, \sigma_o^2)$
• Defense Effects: $d_i \sim NID(0, \sigma_d^2)$
• In the estimation process, assume $COV(o_i, d_i) = 0$, which seems a stretch (but we can still "observe" the covariance of the estimated random effects)
Mixed Effects Model
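• Fixed Effects: Intercept, Home, Half2 ($\beta_0, \beta_{Home}, \beta_{Half2}$)
• Random Effects: Offteam, Defteam ($\beta_{Off,k}, \beta_{Def,l}$)
• Conditional Model (on Random Effects)

$$\log\left(\mu_{ijkl} \mid \beta_{Off,k}, \beta_{Def,l}\right) = \beta_0 + \beta_{Home} Home_i + \beta_{Half2} Half2_j + \beta_{Off,k} + \beta_{Def,l}$$

$$i = 1,2 \qquad j = 1,2 \qquad k = 1,\ldots,20 \qquad l = 1,\ldots,20$$

$$Home_1 = 0,\ Home_2 = 1 \qquad Half2_1 = 0,\ Half2_2 = 1$$

$$\beta_0 \equiv \text{Intercept} \qquad \beta_{Home} \equiv \text{Home Effect} \qquad \beta_{Half2} \equiv \text{2nd Half effect}$$

$$\beta_{Off,k} \equiv \text{Offense Effect for Team } k \qquad \beta_{Def,l} \equiv \text{Defense Effect for Team } l$$

$$\beta_{Off,k} \sim NID(0, \sigma_o^2) \qquad \beta_{Def,l} \sim NID(0, \sigma_d^2) \qquad COV(\beta_{Off,k}, \beta_{Def,l}) = 0$$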
Model in Matrix Notation - Example
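$$Y = \mu + e \qquad g(\mu) = \log(\mu) = X\alpha + Z\beta = X\alpha + Z_O\beta_O + Z_D\beta_D$$

League has 3 Teams: A, B, C
Order of Entry of Games: A@B, A@C, B@C, B@A, C@A, C@B
Order of Entry of Scores within Game: Home/1st, Away/1st, Home/2nd, Away/2nd
3 Offense Effects, 3 Defense Effects, 24 Observations

$$\alpha = \begin{bmatrix} \beta_0 \\ \beta_{Home} \\ \beta_{Half2} \end{bmatrix} \qquad \beta_O = \begin{bmatrix} \beta_{OA} \\ \beta_{OB} \\ \beta_{OC} \end{bmatrix} \qquad \beta_D = \begin{bmatrix} \beta_{DA} \\ \beta_{DB} \\ \beta_{DC} \end{bmatrix}$$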
Model – Based on 3 Teams
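$$y = \mu + e \qquad g(\mu) = X\alpha + Z\beta = X\alpha + Z_O\beta_O + Z_D\beta_D$$

Rows are in order of entry. X columns: Intercept, Home, Half2. Z_O and Z_D columns: Teams A, B, C.

Game  Row        X         Z_O        Z_D
A@B   Home/1st   1 1 0     0 1 0      1 0 0
A@B   Away/1st   1 0 0     1 0 0      0 1 0
A@B   Home/2nd   1 1 1     0 1 0      1 0 0
A@B   Away/2nd   1 0 1     1 0 0      0 1 0
A@C   Home/1st   1 1 0     0 0 1      1 0 0
A@C   Away/1st   1 0 0     1 0 0      0 0 1
A@C   Home/2nd   1 1 1     0 0 1      1 0 0
A@C   Away/2nd   1 0 1     1 0 0      0 0 1
B@C   Home/1st   1 1 0     0 0 1      0 1 0
B@C   Away/1st   1 0 0     0 1 0      0 0 1
B@C   Home/2nd   1 1 1     0 0 1      0 1 0
B@C   Away/2nd   1 0 1     0 1 0      0 0 1
B@A   Home/1st   1 1 0     1 0 0      0 1 0
B@A   Away/1st   1 0 0     0 1 0      1 0 0
B@A   Home/2nd   1 1 1     1 0 0      0 1 0
B@A   Away/2nd   1 0 1     0 1 0      1 0 0
C@A   Home/1st   1 1 0     1 0 0      0 0 1
C@A   Away/1st   1 0 0     0 0 1      1 0 0
C@A   Home/2nd   1 1 1     1 0 0      0 0 1
C@A   Away/2nd   1 0 1     0 0 1      1 0 0
C@B   Home/1st   1 1 0     0 1 0      0 0 1
C@B   Away/1st   1 0 0     0 0 1      0 1 0
C@B   Home/2nd   1 1 1     0 1 0      0 0 1
C@B   Away/2nd   1 0 1     0 0 1      0 1 0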
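The three design matrices for the 3-team example can be generated from the stated order of entry rather than typed by hand. This is a sketch under the assumption that "A@B" means Team A plays at Team B (so B is the home team), which is consistent with the first rows of the matrices; the variable names are illustrative.

```python
TEAMS = ["A", "B", "C"]

# (away, home) pairs in the stated order of entry: A@B, A@C, B@C, B@A, C@A, C@B
GAMES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "A"), ("C", "A"), ("C", "B")]

def indicator(team):
    """Row of 0/1 dummies marking one team among A, B, C."""
    return [1 if t == team else 0 for t in TEAMS]

X, Z_O, Z_D = [], [], []
for away, home in GAMES:
    # Within a game: Home/1st, Away/1st, Home/2nd, Away/2nd
    for half2 in (0, 1):
        for is_home in (1, 0):
            offense, defense = (home, away) if is_home else (away, home)
            X.append([1, is_home, half2])      # intercept, Home, Half2
            Z_O.append(indicator(offense))     # offense team dummies
            Z_D.append(indicator(defense))     # defense team dummies

for x_row, o_row, d_row in zip(X, Z_O, Z_D):
    print(x_row, o_row, d_row)
```

Each team appears on offense in 8 of the 24 rows and on defense in 8, so every column of Z_O and Z_D sums to 8.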
Sequence of Potential Models
1. No fixed or random effects (common mean)
2. Fixed home and second half effects, no random effects
3. Fixed home and second half effects, random offense team effects
4. Fixed home and second half effects, random defense team effects
5. Fixed home and second half effects, random offense and defense team effects
Results – Estimates (P-Values)
Model  $\beta_0$         $\beta_{Home}$    $\beta_{Half2}$   $\sigma_o^2$      $\sigma_d^2$      $\sigma_{Res}^2$   -2lnL    AIC      BIC
1      -.407 (.0001)     N/A               N/A               N/A               N/A               1.044              5001.9   5003.9   5009.3
2      -.6397 (.0001)    .2624 (.0001)     .1783 (.0052)     N/A               N/A               1.0213             4992.3   4994.3   4999.6
3      -.6413 (.0001)    .2624 (.0001)     .1783 (.0050)     .01004 (.143*)    N/A               1.0099             4985.6   4989.6   4991.6
4      -.6592 (.0001)    .2624 (.0001)     .1783 (.0040)     N/A               .0588 (.012*)     0.9630             4958.6   4962.6   4964.6
5      -.6605 (.0001)    .2624 (.0001)     .1783 (.0039)     .0084 (.162*)     .0549 (.012*)     0.9531             4951.9   4957.9   4960.9

* Based on Z-test, not the preferred Likelihood Ratio Test
• Likelihood Ratio Test for the offense variance (Model 4 vs Model 5): H0: $\sigma_o^2 = 0$ vs HA: $\sigma_o^2 > 0$; TS: 4958.6 - 4951.9 = 6.7; P = 0.5 P($\chi^2_1 \ge 6.7$) = .005
• Based on AIC, BIC, the model with both offense and defense effects is best
• No interaction found between team effects and home or half2
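The likelihood ratio test for the offense variance component can be sketched as below. It assumes the -2lnL values reported for Models 4 and 5 and the halved chi-square(1) p-value shown on the slide, which reflects that the null value of a variance lies on the boundary of the parameter space. The 1-df tail uses erfc, a standard identity rather than anything specific to the original software.

```python
from math import erfc, sqrt

neg2lnl_model4 = 4958.6   # defense effects only
neg2lnl_model5 = 4951.9   # offense and defense effects

# Test statistic: drop in -2 ln L when the offense variance is added
ts = neg2lnl_model4 - neg2lnl_model5

# P(chi-square_1 >= ts), then halved for the one-sided variance test
p_chi2_1 = erfc(sqrt(ts / 2.0))
p_value = 0.5 * p_chi2_1

print(f"TS = {ts:.1f}, P = {p_value:.4f}")
```

The test rejects H0 at the .05 level, whereas the Z-test p-value for the same variance in the table (.162) does not, which illustrates why the slide calls the likelihood ratio test preferred.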
Goodness of Fit
• We test whether the Poisson GLMM is an appropriate model by means of the Scaled Deviance
• H0: Model Fits   HA: Model Lacks Fit
• Deviance = 1570.7
• DF = N - #fixed parms = 1520 - 3 = 1517
• P-value = P($\chi^2_{1517} \ge 1570.7$) = 0.1646
• No Evidence of Lack-of-Fit*
* If we use Scaled Deviance, we do reject, where scaled deviance = 1570.7/0.9531 = 1647.9
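The deviance p-value can be approximated without a statistics library. The sketch below uses the Wilson-Hilferty normal approximation to the chi-square distribution, which is a substitute for the exact tail probability the original software would report; it is accurate for large degrees of freedom such as 1517. The function name is illustrative.

```python
from math import erfc, sqrt

def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(chi-square_df >= x) using the Wilson-Hilferty cube-root transform."""
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / sqrt(c)
    return 0.5 * erfc(z / sqrt(2.0))   # upper tail of the standard normal

deviance = 1570.7
df = 1520 - 3                          # N minus number of fixed parameters

p_deviance = chi2_sf_wilson_hilferty(deviance, df)

# Scaled deviance divides by the estimated residual scale from Model 5
scaled_deviance = deviance / 0.9531
p_scaled = chi2_sf_wilson_hilferty(scaled_deviance, df)

print(f"Deviance: P = {p_deviance:.4f}")
print(f"Scaled deviance = {scaled_deviance:.1f}: P = {p_scaled:.4f}")
```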
Best Linear Unbiased Predictors (BLUPs)
Estimated Team (Random) Effects
(Teams with High Defense values Allow More Goals)

Team                 Off Effect  Def Effect    Team                 Off Effect  Def Effect
Arsenal                  0.1284     -0.4016    Leicester City          -0.0120      0.2112
Aston Villa             -0.0170     -0.0873    Liverpool                0.0240     -0.2018
Birmingham              -0.0469     -0.0262    Manchester City          0.0281      0.0649
Blackburn                0.0049      0.1333    Manchester United        0.0775     -0.2348
Bolton                  -0.0142      0.0914    Middlesbrough           -0.0398      0.0335
Charlton                 0.0030      0.0205    Newcastle                0.0065     -0.1516
Chelsea                  0.0941     -0.3255    Portsmouth              -0.0208      0.0630
Everton                 -0.0325      0.1046    Southampton             -0.0414     -0.0724
Fulham                   0.0079     -0.0549    Tottenham               -0.0201      0.1050
Leeds United            -0.0582      0.3758    Wolverhampton           -0.0712      0.3529

Estimated Fixed Effects

Parameter    Estimate
Intercept     -0.6605
Home           0.2624
Half2          0.1783

For each Half_ijkl, compute exp{-0.6605 + 0.2624 HOME_i + 0.1783 HALF2_j + o_k + d_l} as the BLUP, where o_k is the offense effect of the scoring team and d_l is the defense effect of its opponent
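A predicted value for a single team-half combines the estimated fixed effects with the two relevant team effects. The sketch below assumes the fixed effects enter with their estimated coefficients (0.2624 for Home, 0.1783 for Half2) and copies only four teams from the table for brevity; `predicted_goals` is an illustrative name.

```python
from math import exp

# Estimated fixed effects from the model with both random effects
INTERCEPT, HOME, HALF2 = -0.6605, 0.2624, 0.1783

# (offense effect, defense effect) for a few teams, copied from the table
blups = {
    "Arsenal": (0.1284, -0.4016),
    "Chelsea": (0.0941, -0.3255),
    "Leeds United": (-0.0582, 0.3758),
    "Wolverhampton": (-0.0712, 0.3529),
}

def predicted_goals(off_team, def_team, home, half2):
    """Predicted goals in a half for off_team scoring against def_team."""
    o_k = blups[off_team][0]   # offense effect of the scoring team
    d_l = blups[def_team][1]   # defense effect of the opponent
    return exp(INTERCEPT + HOME * home + HALF2 * half2 + o_k + d_l)

# Strongest offense at home in the second half against the weakest defense
high = predicted_goals("Arsenal", "Leeds United", home=1, half2=1)
# Weakest offense away in the first half against the strongest defense
low = predicted_goals("Wolverhampton", "Arsenal", home=0, half2=0)

print(f"Arsenal vs Leeds United (home, 2nd half): {high:.3f}")
print(f"Wolverhampton at Arsenal (away, 1st half): {low:.3f}")
```

These two cases bracket the range of predictions: roughly 1.33 goals per half at one extreme and 0.32 at the other.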
Comparison of BLUPs with Actual Scores
• For Each Team Half, we have Actual and BLUP
• Correlation Between Actual & BLUP = 0.2655
• Concordant Pairs of Halves (one scores higher on both Actual and BLUP than the other) = 452471
• Discordant Pairs of Halves = 355617
• "Gamma" = (452471-355617)/(452471+355617) = 0.1199
• Evidence of Some Positive Association Between actual and predicted scores
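The "Gamma" statistic here is the Goodman-Kruskal gamma: the difference between concordant and discordant pair counts divided by their sum, with tied pairs ignored. The sketch below recomputes it from the reported counts and shows, on a small made-up data set (hypothetical values, not from the season data), how the pair counts themselves could be obtained.

```python
def goodman_kruskal_gamma(concordant, discordant):
    """Gamma = (C - D) / (C + D); tied pairs do not enter."""
    return (concordant - discordant) / (concordant + discordant)

def count_pairs(actual, predicted):
    """Count concordant and discordant pairs by brute force, skipping ties."""
    conc = disc = 0
    n = len(actual)
    for i in range(n):
        for j in range(i + 1, n):
            s = (actual[i] - actual[j]) * (predicted[i] - predicted[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    return conc, disc

# Reported counts over all pairs of the 1520 team-halves
gamma = goodman_kruskal_gamma(452471, 355617)
print(f"Gamma = {gamma:.4f}")

# Hypothetical toy example: 4 halves with actual goals and predicted values
toy_pairs = count_pairs([0, 1, 2, 0], [0.4, 0.6, 0.9, 0.7])
print(toy_pairs, goodman_kruskal_gamma(*toy_pairs))
```

With 1520 halves there are 1520*1519/2 = 1,154,440 pairs in total, so about 30% of pairs are tied on actual goals or on the predicted value and drop out of the statistic.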
"Distribution" of BLUPs by Actual Goals Scored
[Figure: normal density curves (vertical axis: Normal Density, 0 to 3) of the BLUP (horizontal axis, 0 to 1.4), one curve for each level of actual goals scored: 0, 1, 2, 3+]
Sources:
Data: SoccerPunter.com
Methods:
Littell, Milliken, Stroup, Wolfinger(1996). “SAS System for Mixed Models”
Wolfinger, R. and M. O’Connell(1993). “Generalized Linear Mixed Models: A Pseudo-Likelihood Approach,” J. Statist. Comput. Simul., Vol. 48, pp. 233-243.
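SAS Code

/* Read one record per team-half */
data one;
  infile 'engl2003d.dat';
  input hteam $ 1-20 rteam $ 21-40 goals 47-48 half2 56 home 64 round 71-73;
  /* The scoring side is the offense; its opponent is the defense */
  if home=1 then do; offteam=hteam; defteam=rteam; end;
  else do; offteam=rteam; defteam=hteam; end;
run;

/* Fit the Poisson GLMM with the GLIMMIX macro.
   The original listing passed data=two, which is not created above;
   data=one is used here so that the code matches the data step. */
%include 'glmm800.sas';
%glimmix(data=one, procopt=method=reml,
  stmts=%str(
    class offteam defteam;
    model goals = home half2 / s;
    random offteam defteam / s;
  ),
  error=poisson, link=log);
run;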