modeling a multinomial response - purdue universitybacraig/notes526/topic11a.pdf · 2020. 10....
Modeling a Multinomial Response
Bruce A Craig
Department of Statistics
Purdue University
Reading: Faraway Ch. 7, Agresti Ch. 7, KNNL Ch. 14
STAT 526 Topic 11 1
Multinomial Distribution
Model for discrete variable with two or more categories
Probability distribution:
$$Y = (Y_1, \ldots, Y_c) \sim \text{Multinomial}(n, p_1, \ldots, p_{c-1})$$
n is considered known (number of trials)
$$p_c = 1 - \sum_{j=1}^{c-1} p_j$$
$$p(y_1, y_2, \ldots, y_c) = \left(\frac{n!}{y_1!\, y_2! \cdots y_c!}\right) p_1^{y_1} p_2^{y_2} \cdots p_c^{y_c}$$
$$E(Y_j) = n p_j, \quad \text{Var}(Y_j) = n p_j (1 - p_j), \quad \text{Cov}(Y_j, Y_k) = -n p_j p_k$$
Marginal dist for each $Y_j$ is $B(n, p_j)$
Log-likelihood:
$$l(p) = \sum_{j=1}^{c} y_j \log p_j$$
Maximum Likelihood Estimator: $\hat{p}_j = y_j / n$
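As a small illustration of the estimator above, the following base-R sketch computes the MLE and plug-in moments for a single multinomial sample. The counts are the placebo row of the dose-response table used later in these notes; the variable names are illustrative only.

```r
# Observed counts for one multinomial sample (placebo row of the dose example)
y <- c(59, 25, 46, 48, 32)
n <- sum(y)

# MLE of the category probabilities: p_hat_j = y_j / n
p_hat <- y / n

# Plug-in estimates of E(Y_j) = n p_j and Var(Y_j) = n p_j (1 - p_j)
mean_hat <- n * p_hat
var_hat  <- n * p_hat * (1 - p_hat)

# Log-likelihood sum_j y_j log p_j evaluated at the MLE ...
loglik_hat <- sum(y * log(p_hat))

# ... and at a competing value (equal probabilities), which should be smaller
loglik_equal <- sum(y * log(rep(1 / length(y), length(y))))
```

Comparing `loglik_hat` with `loglik_equal` is a quick check that the sample proportions do maximize the log-likelihood.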
Multinomial GLM Models
Now consider a set of I Multinomial($n_i$, $p_i$) observations
Goal now is to link predictors $X_i$ to $p_i$
As with binomial setting, can encounter data that are
  Grouped: $n_i > 1$
  Ungrouped: $n_i = 1$
Predictors $X_i$ may be continuous or discrete
Unlike binomial setting, need to distinguish between
  Ordered categories for Y → cumulative logit model
  Nominal categories for Y → multinomial logit model
Example 1: Math Aptitude
Predicting a college freshman’s math aptitude given their mathematics PSAT score in 10th grade.
  Response: Aptitude Grade: 4 ordered levels
  Predictor: PSAT score: continuous (10-pt increments)
Example 1: Math Aptitude
Aptitude grade (Y) positively related to math score (X)
Overlap in math scores across grades means there is some uncertainty in predicting Y
In this example, $n_i = 1$ for i = 1, 2, ..., I = 500 students
Interested in conditional probs $P(Y = j \mid x_i)$
With ordered response, often easier to work with the cumulative probabilities
$$P(Y \le j \mid x_i) = \sum_{k \le j} P(Y = k \mid x_i)$$
Proportional Odds Model
Also called the cumulative logit model
$$\log\left(\frac{P(Y_i \le j \mid X_i)}{1 - P(Y_i \le j \mid X_i)}\right) = \theta_j - X_i\beta$$
Only parameters $\theta_j$ depend on level j
They are monotonically increasing with j
Parameters $\beta$ describe cumulative log-odds
  Odds($X_i$)/Odds($X_{i'}$) does not depend on level j
  A $\beta > 0$ means a larger x increases probability of larger response j (positive association)
Like fitting logistic model for each j but β same
As with binary setting, can consider other link functions
Latent Variable Motivation
Similar to binary setting, can consider a latent variable to motivate the GLM model
For the math aptitude example, we could consider there to be a latent continuous variable Z associated with the aptitude grade that is linearly related to their math score
$$Z_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
Instead of observing $Z_i$, we observe
$$Y_i = \begin{cases} A & Z_i > c_3 \\ B & c_2 < Z_i < c_3 \\ C & c_1 < Z_i < c_2 \\ D & Z_i < c_1 \end{cases}$$
Can compute the $P(Y_i = j \mid x_i)$ using specified dist of $\varepsilon$
Motivation Continued
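Using $c_J = \infty$,
$$\begin{aligned}
P(Y_i \le j \mid x_i) &= P(Z_i \le c_j) \\
&= P(\beta_0 + \beta_1 x_i + \varepsilon_i \le c_j) \\
&= P(\varepsilon_i \le c_j - \beta_0 - \beta_1 x_i) \\
&= F_\varepsilon(\theta_j - \beta_1 x_i), \quad \text{where } \theta_j = c_j - \beta_0
\end{aligned}$$
If F is the CDF of the logistic distribution,
$$P(Y_i \le j \mid x_i) = \frac{\exp\{\theta_j - \beta_1 x_i\}}{1 + \exp\{\theta_j - \beta_1 x_i\}}$$
Can use Normal or Gumbel distributions to motivate probit or complementary log-log link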
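The latent-variable argument can be checked by simulation. The sketch below is illustrative only: the coefficients, cutpoints, and predictor value are hypothetical, and the categories are coded 1 to 4 in increasing order of Z. It draws logistic errors, cuts Z at the thresholds, and compares the empirical cumulative proportions with the logistic CDF formula.

```r
set.seed(526)
beta0 <- 0                  # hypothetical intercept
beta1 <- 1                  # hypothetical slope
cuts  <- c(-1, 0.5, 2)      # hypothetical cutpoints c_1 < c_2 < c_3
x     <- 0.3                # one fixed predictor value
n     <- 100000

# Latent variable with standard logistic errors
z <- beta0 + beta1 * x + rlogis(n)

# Observed category: 1 if z < c_1, 2 if c_1 <= z < c_2, ..., 4 if z >= c_3
y <- findInterval(z, cuts) + 1

# Empirical P(Y <= j) versus F(c_j - beta0 - beta1 * x), j = 1, 2, 3
emp_cum   <- sapply(1:3, function(j) mean(y <= j))
model_cum <- plogis(cuts - beta0 - beta1 * x)
round(cbind(emp_cum, model_cum), 3)
```

Replacing `rlogis` and `plogis` with `rnorm` and `pnorm` gives the analogous check for the probit link.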
Interpreting the Model Parameters
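Using the logit link, the cumulative odds
$$\log\left(\frac{P(Y \le j \mid X)}{P(Y > j \mid X)}\right) = \theta_j - X\beta$$
Interpretation of a $\beta$ (holding all other x constant)
$$\begin{aligned}
\log\left(\frac{P(Y \le j \mid x + \delta)}{P(Y > j \mid x + \delta)} \div \frac{P(Y \le j \mid x)}{P(Y > j \mid x)}\right)
&= \log\left(\frac{P(Y \le j \mid x + \delta)}{P(Y > j \mid x + \delta)}\right) - \log\left(\frac{P(Y \le j \mid x)}{P(Y > j \mid x)}\right) \\
&= \theta_j^* - \beta(x + \delta) - (\theta_j^* - \beta x) = -\beta\delta
\end{aligned}$$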
Change is proportional to the change in x for all j
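A quick numerical check of this result, using hypothetical values for the cutpoints, slope, and predictor: the log cumulative odds ratio computed from the probabilities equals −βδ at every level j.

```r
theta <- c(-1, 0.5, 2)   # hypothetical cutpoints theta_1 < theta_2 < theta_3
beta  <- 0.8             # hypothetical slope
x     <- 1
delta <- 2

# Cumulative odds P(Y <= j | x) / P(Y > j | x) for j = 1, 2, 3
cum_odds <- function(x) {
  p <- plogis(theta - beta * x)
  p / (1 - p)
}

# Log odds ratio for a change of delta in x, one value per level j
log_or <- log(cum_odds(x + delta) / cum_odds(x))
log_or
```

All three entries of `log_or` agree with `-beta * delta`, which is the proportional odds property.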
Inference
Similar inference as in logistic regression when focusing on cumulative probs
  Wald / LR / Score tests for model parameters
  Pearson $\chi^2$, Deviance for goodness of fit
$\beta$ invariant to the number of response categories
Predicting Y now involves a vector of probabilities
  Easiest to first compute cumulative probabilities and then use subtraction to get the probability vector
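The subtraction step can be sketched in a few lines of base R. The parameter values here are hypothetical, and `category_probs` is an illustrative helper, not part of any package.

```r
theta <- c(-1, 0.5, 2)   # hypothetical cutpoints for J = 4 categories
beta  <- 0.8             # hypothetical slope

category_probs <- function(x) {
  # Cumulative probabilities P(Y <= j | x), padded with 0 and 1
  cum <- c(0, plogis(theta - beta * x), 1)
  # P(Y = j | x) = P(Y <= j | x) - P(Y <= j - 1 | x)
  diff(cum)
}

p_low  <- category_probs(-1)
p_high <- category_probs(2)
rbind(p_low, p_high)
```

With β > 0, the larger x shifts probability toward the higher categories, in line with the sign interpretation given for the proportional odds model.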
Maximum Likelihood Estimation
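Let $p_j(X) = P(Y \le j \mid X) - P(Y \le j - 1 \mid X)$
$Y_i$ is a vector of length J with one 1 and remainder 0’s
Log-likelihood for the ith observation is:
$$\begin{aligned}
l_i &= \log \prod_{j=1}^{J} p_j(x_i)^{y_{ij}} \\
&= \log \prod_{j=1}^{J} \left[P(Y \le j \mid x_i) - P(Y \le j - 1 \mid x_i)\right]^{y_{ij}} \\
&= \log \prod_{j=1}^{J} \left[\frac{\exp\{\theta_j - x_i\beta\}}{1 + \exp\{\theta_j - x_i\beta\}} - \frac{\exp\{\theta_{j-1} - x_i\beta\}}{1 + \exp\{\theta_{j-1} - x_i\beta\}}\right]^{y_{ij}}
\end{aligned}$$
with the conventions $P(Y \le 0 \mid x_i) = 0$ and $P(Y \le J \mid x_i) = 1$ (that is, $\theta_0 = -\infty$ and $\theta_J = \infty$)
Maximize $\sum_{i=1}^{I} l_i$ with respect to $\theta_j$ and $\beta$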
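To make the maximization concrete, here is a minimal base-R sketch that fits the model to simulated ungrouped data with `optim`. It is not how `polr` is implemented. The true parameter values are hypothetical, and the cutpoints are kept increasing through a log-increment reparameterization chosen for this illustration.

```r
set.seed(11)
n     <- 2000
x     <- rnorm(n)
theta <- c(-1, 0.5, 2)   # hypothetical true cutpoints
beta  <- 1               # hypothetical true slope

# Simulate Y via the latent-variable form, so P(Y <= j | x) = plogis(theta_j - beta * x)
y <- findInterval(beta * x + rlogis(n), theta) + 1

# Negative log-likelihood; par = (theta_1, log increments of theta, beta)
negloglik <- function(par) {
  th  <- cumsum(c(par[1], exp(par[2:3])))              # increasing cutpoints
  b   <- par[4]
  cum <- cbind(0, plogis(outer(-b * x, th, "+")), 1)   # columns: P(Y <= 0), ..., P(Y <= 4)
  p   <- cum[cbind(seq_len(n), y + 1)] - cum[cbind(seq_len(n), y)]
  -sum(log(pmax(p, 1e-12)))                            # guard against log(0)
}

fit       <- optim(c(-1, 0, 0, 0), negloglik, method = "BFGS")
theta_hat <- cumsum(c(fit$par[1], exp(fit$par[2:3])))
beta_hat  <- fit$par[4]
```

In practice a dedicated routine such as `polr` would be used; the point here is only that the likelihood above can be coded and maximized directly.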
Calculating the Residual Deviance
Residual deviance
$$G^2 = 2\sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij} \log\left(\frac{1}{\hat{p}_j(x_i)}\right) = -2\sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij} \log \hat{p}_j(x_i)$$
Degrees of freedom are the difference between model dfs
  # params in saturated model (= # observations)
  # params in reduced model (= # of intercepts + # predictors)
Residual deviance degrees of freedom for math aptitude study are 500 − 4 = 496
Example: Math Aptitude
In R: Use polr in library MASS
> library(MASS)
> fit = polr(grade1 ~ psat,mathapt)
> summary(fit)
Coefficients:
     Value    Std. Error t value
psat -0.01792 0.00124    -14.46
*** polr parameterizes the model as int_j − XB (see ?polr), so be wary of sign
Intercepts:
    Value    Std. Error t value
A|B -11.5391 0.6912     -16.6944
B|C  -8.8652 0.5930     -14.9492
C|D  -6.3311 0.5196     -12.1834
Residual Deviance: 981.8668
AIC: 989.8668
Grade A associated with j = 1 and Grade D with j = 4. That is why these are negative coefficients (see note above)
Results: Math Aptitude
Deviance is 981.9 on 496 df (from fit$df.residual)
Similar to Bernoulli distribution (ungrouped), this deviance should not be used to assess goodness of fit
  Better to use extension of H-L test or Lipsitz test
Assessment of $\beta$: For each 10-pt increase in score, the odds of being > j versus ≤ j decrease 16.4% ($1 - e^{-0.1792}$)
> exp(10*confint(fit))
Waiting for profiling to be done...
Re-fitting to get Hessian
2.5 % 97.5 %
0.8154889 0.8558026
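The 16.4% figure follows directly from the reported coefficient. A short check, using only the estimate and interval printed above:

```r
beta_psat <- -0.01792               # polr estimate for psat reported above
or_10     <- exp(10 * beta_psat)    # odds multiplier for a 10-point increase
pct_drop  <- 100 * (1 - or_10)      # percentage decrease in the odds of Y > j

# Profile-likelihood interval for the same odds multiplier, as printed above
ci_10 <- c(0.8154889, 0.8558026)
c(or_10 = or_10, pct_drop = pct_drop)
```

The point estimate `or_10` lies inside the profile interval, and the interval itself corresponds to a decrease of roughly 14% to 18% in the odds.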
Extension of Hosmer-Lemeshow
Score each observation and then group on these scores
$$s_i = \hat{p}_{i1} + 2\hat{p}_{i2} + \cdots + J\hat{p}_{iJ}$$
$$\sum_{k=1}^{C}\sum_{j=1}^{J} \frac{(O_{kj} - E_{kj})^2}{E_{kj}} \sim \chi^2_{df}$$
C represents the number of (equal-sized) groups
For this model, df = (C − 2)(J − 1) + (J − 2)
The additional J − 2 df are due to the reduced number of parameters relative to a multinomial model
Note that our Bernoulli version considers J = 2
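The degrees-of-freedom formula is easy to wrap in a helper. The function name `hl_df` is made up for this sketch; C = 10 and J = 4 correspond to the decile grouping and four aptitude grades used in the example.

```r
# Degrees of freedom for the ordinal Hosmer-Lemeshow statistic
hl_df <- function(C, J) (C - 2) * (J - 1) + (J - 2)

df_ordinal <- hl_df(C = 10, J = 4)   # math aptitude: 10 groups, 4 grades
df_binary  <- hl_df(C = 10, J = 2)   # Bernoulli case reduces to C - 2
c(df_ordinal = df_ordinal, df_binary = df_binary)
```

With J = 2 the formula reduces to the familiar C − 2 of the binary Hosmer-Lemeshow test.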
Using R
Can still use logitgof function
> library(generalhoslem)
> logitgof(mathapt$grade1,ord=TRUE,fitted(fit))
Hosmer and Lemeshow test (ordinal model)
data: grade1, fitted(fit)
X-squared = 29.356, df = 26, p-value = 0.2952
Warning message:
In logitgof(grade1, ord = TRUE, fitted(fit)) :
At least one cell in the expected frequencies table is < 1.
Chi-square approximation may be incorrect.
This will be a problem in this example because there is little overlap in math scores between A and D aptitude students
Grouped Table - Using Deciles
         Y = 1        Y = 2        Y = 3        Y = 4
Group   O     E      O     E      O     E      O     E     Total
1      25  21.34    21  23.81     3   4.42     1   0.43     50
2      11   9.75    28  32.38    18  13.28     0   1.59     57
3       3   3.85    23  21.93    16  16.69     3   2.53     45
4       2   2.72    21  19.78    20  22.30     6   4.20     49
5       0   2.46    26  21.26    33  34.73     8   8.56     67
6       0   0.75     8   7.55    21  18.38     4   6.33     33
7       1   0.88     6   9.66    31  32.08    20  15.38     58
8       0   0.39     5   4.68    17  21.79    21  16.14     43
9       0   0.25     4   3.19    23  21.63    30  31.93     57
10      0   0.06     0   0.78    13   7.35    28  32.81     41
Lipsitz Test
As with H-L test, sort data into C groups
Define C − 1 group indicator variables
Fit new ordinal logistic regression
$$\log\left(\frac{P(Y \le j \mid X)}{P(Y > j \mid X)}\right) = \theta_j - X\beta + \gamma_1 I_1 + \cdots + \gamma_{C-1} I_{C-1}$$
Use the likelihood ratio test to test $H_0: \gamma_1 = \cdots = \gamma_{C-1} = 0$
Recommend C be such that 6 ≤ C < N/(5J)
Using R
Can use lipsitz.test function in generalhoslem
> library(generalhoslem)
> lipsitz.test(fit)
Lipsitz goodness of fit test for ordinal response models
data: formula: grade1 ~ psat
LR statistic = 11.226, df = 9, p-value = 0.2605
Tends to have rejection rates > α in small samples
Works best when covariates are continuous
Example 2: Dose Response
Effect of intravenous medication doses on patients with subarachnoid hemorrhage trauma (p. 207, OrdCDA)
                    Glasgow Outcome Scale (Y)
Treatment              Veget.   Major    Minor    Good
Group (X)     Death    State    Disab.   Disab.   Recov.
Placebo         59       25       46       48       32
Low dose        48       21       44       47       30
Medium dose     44       14       54       64       31
High dose       43        4       49       58       41
Response: Glasgow Outcome Scale - Ordered
Predictor: Dose level - Ordered
Similar to setting for linear-by-linear association model
Focus, however, is on predicting Y
So how do we treat the levels of X? Score them?
Notation for Ordinal Predictor
Back to contingency table summary (grouped data)
                    Y
X        1      2     ···     J     Total
1       y11    y12    ···    y1J     y1.
2       y21    y22    ···    y2J     y2.
⋮        ⋮      ⋮             ⋮       ⋮
I       yI1    yI2    ···    yIJ     yI.
Total   y.1    y.2    ···    y.J     n
Interested in cond probs $P(Y = j \mid X = i) = p_{j|i}$
Proportional-odds model focuses on cumulative probs
$$P(Y \le j \mid X = i) = \sum_{k \le j} p_{k|i}$$
Ordinal Odds Ratios
Local odds ratios
$$\theta^L_{ij} = \frac{P(X = i, Y = j) \,/\, P(X = i, Y = j + 1)}{P(X = i + 1, Y = j) \,/\, P(X = i + 1, Y = j + 1)}$$
Global odds ratios
$$\theta^G_{ij} = \frac{P(X \le i, Y \le j) \,/\, P(X \le i, Y > j)}{P(X > i, Y \le j) \,/\, P(X > i, Y > j)}$$
Cumulative odds ratios (conditional on X)
$$\theta^C_{ij} = \frac{P(Y \le j \mid X = i) \,/\, P(Y > j \mid X = i)}{P(Y \le j \mid X = i + 1) \,/\, P(Y > j \mid X = i + 1)}$$
Analogues to correlations, but for categorical variables
Ordinal Odds Ratio Estimates
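Local odds ratios
$$\hat\theta^L_{ij} = \frac{y_{ij} \,/\, y_{i,j+1}}{y_{i+1,j} \,/\, y_{i+1,j+1}}$$
Global odds ratios
$$\hat\theta^G_{ij} = \frac{\sum_{a \le i}\sum_{b \le j} y_{ab} \,\big/\, \sum_{a \le i}\sum_{b > j} y_{ab}}{\sum_{a > i}\sum_{b \le j} y_{ab} \,\big/\, \sum_{a > i}\sum_{b > j} y_{ab}}$$
Cumulative odds ratios (conditional on X)
$$\hat\theta^C_{ij} = \frac{\sum_{b \le j} y_{ib} \,\big/\, \sum_{b > j} y_{ib}}{\sum_{b \le j} y_{i+1,b} \,\big/\, \sum_{b > j} y_{i+1,b}}$$
Alternative: testing for association with Pearson $X^2$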
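The three estimators can be computed directly from the dose-response table. The sketch below uses base R only; the matrix `y` holds the counts from Example 2, and the dimnames and loop structure are choices made for this illustration.

```r
# Counts from the dose-response table (rows = dose group, columns = outcome)
y <- matrix(c(59, 25, 46, 48, 32,
              48, 21, 44, 47, 30,
              44, 14, 54, 64, 31,
              43,  4, 49, 58, 41),
            nrow = 4, byrow = TRUE,
            dimnames = list(dose = c("Placebo", "Low", "Medium", "High"),
                            outcome = c("Death", "Veget", "MajDis", "MinDis", "GoodRec")))
I <- nrow(y)
J <- ncol(y)

local_or <- global_or <- cum_or <- matrix(NA_real_, I - 1, J - 1)
for (i in 1:(I - 1)) {
  for (j in 1:(J - 1)) {
    # Local: adjacent rows and adjacent columns
    local_or[i, j] <- (y[i, j] / y[i, j + 1]) / (y[i + 1, j] / y[i + 1, j + 1])
    # Global: collapse rows and columns into a 2x2 table at the cutpoints (i, j)
    n11 <- sum(y[1:i, 1:j]);        n12 <- sum(y[1:i, (j + 1):J])
    n21 <- sum(y[(i + 1):I, 1:j]);  n22 <- sum(y[(i + 1):I, (j + 1):J])
    global_or[i, j] <- (n11 / n12) / (n21 / n22)
    # Cumulative: adjacent rows, columns collapsed at cutpoint j
    cum_or[i, j] <- (sum(y[i, 1:j]) / sum(y[i, (j + 1):J])) /
                    (sum(y[i + 1, 1:j]) / sum(y[i + 1, (j + 1):J]))
  }
}
round(cum_or, 3)
```

Under a proportional odds model with equally spaced dose scores, the entries of `cum_or` would be roughly constant across both rows and columns.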
Example 2 Analysis : Dose Scored
> library(MASS)
> fit1 = polr(outcome~dose,weights=count,data=prob2)
> summary(fit1)
Coefficients:
Value Std. Error t value
dose 0.1755 0.05671 3.094
Intercepts:
Value Std. Error t value
1|2 -0.8946 0.1144 -7.8233
2|3 -0.4941 0.1107 -4.4638
3|4 0.5162 0.1118 4.6150
4|5 1.8815 0.1311 14.3565
Residual Deviance: 2461.349
AIC: 2471.349
Degrees of freedom are 797; cannot use deviance to assess fit (ungrouped)
> exp(confint(fit1))
   2.5 %   97.5 %
1.066619 1.332269
Increasing dose by 1 level increases the odds of a higher outcome category (Y > j versus Y ≤ j) by between 6.6% and 33.2%
Plot of Predicted Probabilities
> matplot(predProb, type="l", xlab="Dose+1", ylab="Predicted Probability", cex=3.5)
> legend(x=3,y=0.15, lty=c(1:5), col=c(1:5), paste("Outcome =", c(1:5)))
STAT 526 Topic 11 25
Example 2 Analysis : Dose Categorical
> fit2 = polr(outcome~as.factor(dose),weights=count,data=prob2)
> summary(fit2)
Coefficients:
                 Value  Std. Error t value
as.factor(dose)1 0.1176 0.1791     0.6564
as.factor(dose)2 0.3174 0.1740     1.8240
as.factor(dose)3 0.5208 0.1794     2.9029
Intercepts:
    Value   Std. Error t value
1|2 -0.9188 0.1322     -6.9488
2|3 -0.5183 0.1291     -4.0154
3|4  0.4922 0.1298      3.7925
4|5  1.8579 0.1462     12.7072
Residual Deviance: 2461.216
AIC: 2475.216
*** Two additional parameters
*** Test below does not suggest they add much to the fit
> anova(fit1,fit2)
Model           R. df Resid. Dev Test   Df LR stat. Pr(Chi)
dose            797   2461.349
as.factor(dose) 795   2461.216   1 vs 2 2  0.1328   0.9357261
Plot of Predicted Probabilities
> matplot(predProb1, type="l", xlab="Dose+1", ylab="Predicted Probability", cex=3.5)
> legend(x=3,y=0.15, lty=c(1:5), col=c(1:5), paste("Outcome =", c(1:5)))
STAT 526 Topic 11 27
Summary
Moving from scoring the ordinal variable to treating it as a nominal factor allows a test of the linearity assumption.
Result can depend on how one scores the different levels of the dose variable
  Equally spaced
  Unequally spaced
Visual comparison can be made via plots of the predicted probabilities like the ones on Slides #25 and #27.
Need to look at grouped goodness of fit statistics
Multiple reasons for a poor fit
  violation of proportional odds; wrong link; wrong func. form or missing predictors; overdispersion
Goodness of Fit: Grouped Data
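Let the rows represent each of the groups
Expected cell frequency $\hat\mu_{ij}$ in row i and col j:
$$\hat\mu_{ij} = y_{i.}\,\hat{P}(Y = j \mid X = i) = y_{i.}\left[\hat{P}(Y \le j \mid X = i) - \hat{P}(Y \le j - 1 \mid X = i)\right]$$
Pearson $\chi^2$
$$X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(y_{ij} - \hat\mu_{ij})^2}{\hat\mu_{ij}} \overset{H_0}{\sim} \chi^2_{df}$$
Deviance
$$G^2 = 2\sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij} \log\left(\frac{y_{ij}}{\hat\mu_{ij}}\right) \overset{H_0}{\sim} \chi^2_{df}$$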
Dose scored: df = [I (J − 1)]− [(J − 1) + 1] = 11
Dose categorical: df = [I (J − 1)]− [(J − 1) + (I − 1)] = 9
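A sketch of the grouped goodness-of-fit calculation for the dose-scored model, in base R. It plugs in the rounded polr estimates printed earlier for this model and assumes dose is scored 0 to 3, as the factor labels in the categorical fit suggest. Because the estimates are rounded, the statistics are approximate.

```r
# Observed counts (rows = dose 0..3, columns = outcome 1..5)
y <- matrix(c(59, 25, 46, 48, 32,
              48, 21, 44, 47, 30,
              44, 14, 54, 64, 31,
              43,  4, 49, 58, 41), nrow = 4, byrow = TRUE)
dose  <- 0:3                                   # assumed scoring of the dose levels
theta <- c(-0.8946, -0.4941, 0.5162, 1.8815)   # polr intercepts (dose scored)
beta  <- 0.1755                                # polr dose coefficient

# P(Y <= j | dose) = plogis(theta_j - beta * dose); last column is P(Y <= 5) = 1
cum  <- cbind(plogis(outer(-beta * dose, theta, "+")), 1)
# Category probabilities by subtraction, then expected counts mu_ij = y_i. * p_ij
prob <- cbind(cum[, 1], t(apply(cum, 1, diff)))
mu   <- rowSums(y) * prob

X2 <- sum((y - mu)^2 / mu)                     # Pearson statistic
G2 <- 2 * sum(y * log(y / mu))                 # deviance statistic
df <- nrow(y) * (ncol(y) - 1) - ((ncol(y) - 1) + 1)
p_values <- pchisq(c(X2 = X2, G2 = G2), df, lower.tail = FALSE)
```

The grouped deviance `G2` should be close to the gap between the ungrouped residual deviance of this model and that of the saturated fit, since both are −2 log-likelihoods on the same data.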
Visual Assessment of Proportional Odds : Grouped Data
Focus on each predictor (holding other predictors fixed)
According to the model, for all j and δ:
$$\log\left(\frac{P(Y \le j \mid X + \delta)}{P(Y > j \mid X + \delta)}\right) - \log\left(\frac{P(Y \le j \mid X)}{P(Y > j \mid X)}\right) = -\beta\delta$$
Can plot these differences in cumulative odds using estimates from the saturated model
When proportional odds are appropriate, the differences should be roughly the same for all values of X and levels j
Using R
> mat = xtabs(count~dose+outcome,prob2)
> cumProb <- apply( mat/apply(mat, 1, sum), 1, cumsum)
> cumProb
0 1 2 3
0 0.2809524 0.2526316 0.2125604 0.2205128
1 0.4000000 0.3631579 0.2801932 0.2410256
2 0.6190476 0.5947368 0.5410628 0.4923077
3 0.8476190 0.8421053 0.8502415 0.7897436
4 1.0000000 1.0000000 1.0000000 1.0000000
> logit <- function(x) {log(x/(1-x))}
> plot(0:3, logit(cumProb[-5,2])-logit(cumProb[-5,1]), type="l",
ylim=c(-1, 1), xlab="Dose", ylab="Empirical log(OR)", cex=3.5)
> for (i in 3:4) {lines(0:3, logit(cumProb[-5,i])-
logit(cumProb[-5,i-1]),col=i, lty=i)
}
abline(h=-coef(fit1), col="red", lwd=2)
legend("topleft", lty=c(1,3,4), col=c(1,3,4),
paste("Cum prob cutoff =", c(1:3)), cex=1)
legend("topright", lty=c(1), col=c("red"), "Model-based")
Are They Relatively Constant?
Formal Test for Proportional Odds
Testing
$$H_0: \log\left(\frac{P_j}{1 - P_j}\right) = \theta_j - X\beta$$
$$H_a: \log\left(\frac{P_j}{1 - P_j}\right) = \theta_j - X\beta_j$$
Model under $H_a$ specifies cumulative logit, but not proportional odds, since log(OR) depends on j
The model under H0 is nested within the model under Ha
Thus can compare residual deviances
Formal Test in R
Must use vglm function in VGAM package
First fit proportional-odds model
> library(VGAM)
> fit.vgam <- vglm(as.numeric(outcome) ~ dose,
+ cumulative(parallel=TRUE, reverse=FALSE),
+ weights=count,prob2)
> summary(fit.vgam)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 -0.89466 0.11456 -7.809 5.74e-15 ***
(Intercept):2 -0.49410 0.11059 -4.468 7.91e-06 ***
(Intercept):3 0.51615 0.11067 4.664 3.10e-06 ***
(Intercept):4 1.88151 0.13020 14.451 < 2e-16 ***
dose -0.17548 0.05632 -3.116 0.00183 **
Residual deviance: 2461.349 on 75 degrees of freedom
**df based on ungrouped multinomial logit model**
Formal Test in R
Now fit relaxed model
> fit.vgam3 <- vglm(as.numeric(outcome) ~ dose,
+ cumulative(parallel=FALSE, reverse=FALSE),
+ weights=count,prob2)
> summary(fit.vgam3)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 -0.97749 0.13194 -7.408 1.28e-13 ***
(Intercept):2 -0.36265 0.12034 -3.014 0.00258 **
(Intercept):3 0.52391 0.12011 4.362 1.29e-05 ***
(Intercept):4 1.78941 0.16415 10.901 < 2e-16 ***
dose:1 -0.11292 0.07288 -1.549 0.12130
dose:2 -0.26889 0.06832 -3.936 8.29e-05 ***
dose:3 -0.18234 0.06385 -2.856 0.00430 **
dose:4 -0.11925 0.08470 -1.408 0.15916
Residual deviance: 2447.018 on 72 degrees of freedom
Results
Full residual deviance = 2447.018 on 72 df
Reduced residual deviance = 2461.349 on 75 df
Difference is 2461.349− 2447.018 = 14.331 on 3 df
> pchisq(14.331,3,lower=F)
[1] 0.002487536
Cannot accept that reduced model gives adequate fit
Proportional odds not reasonable
However, full cumulative odds model has issues too
Non-parallel lines means there is eventual crossing
Multinomial Logit Model
We now shift to case where categories are unordered
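Therefore, cannot work with cumulative probabilities
Instead declare one category as a reference and link the covariates to probs through J − 1 relative prob ratios
$$\eta_{ij} = \log\left(\frac{p_{ij}}{p_{i1}}\right) = x_i\beta_j, \quad j = 2, 3, \ldots, J$$
This model implies
$$p_{ij} = \exp\{x_i\beta_j\}\, p_{i1}, \quad j = 2, 3, \ldots, J$$
and because $\sum_{j=1}^{J} p_{ij} = 1$, this means
$$p_{i1} = \frac{1}{1 + \sum_{k=2}^{J} \exp\{x_i\beta_k\}} \quad \text{and} \quad p_{ij} = \frac{\exp\{x_i\beta_j\}}{1 + \sum_{k=2}^{J} \exp\{x_i\beta_k\}}$$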
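These formulas translate into a short base-R function. As an illustration, the sketch plugs in the coefficients that appear later in these notes for the dose-scored multinom fit (categories 2 to 5 relative to category 1); `multinom_probs` is a made-up helper name.

```r
# Intercepts and dose slopes for categories j = 2..5 (reference category j = 1)
b0 <- c(-0.6999134, -0.2194566, -0.1772963, -0.6544057)
b1 <- c(-0.3544346,  0.1470232,  0.1945578,  0.1914772)

multinom_probs <- function(dose) {
  eta <- c(0, b0 + b1 * dose)   # linear predictor is 0 for the reference category
  exp(eta) / sum(exp(eta))      # p_1 = 1 / (1 + sum exp), p_j = exp(eta_j) / (1 + sum exp)
}

p0 <- multinom_probs(0)
p1 <- multinom_probs(1)

# Change in log(p_j / p_1) for a one-unit increase in dose, j = 2..5
log_ratio_change <- log(p1[-1] / p1[1]) - log(p0[-1] / p0[1])
```

The vector `log_ratio_change` reproduces the slopes `b1`, which is the sense in which each slope describes a change in the log relative probability ratio.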
Multinomial Logit Model
The baseline, or reference, category is arbitrary
Common choices by software are j = 1 or j = J
Separate set of parameters βj for each ratio
Values of βj depend on the choice of baseline
Because all sets of $\beta_j$ are relative to a common category, they jointly define the probs
More flexible model than proportional odds but more difficult to interpret (?)
Can be used as classification model using category with highest predicted probability
Parameter Interpretation
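In logistic regression and proportional-odds model, a $\beta_j$ represents a log odds ratio
In this model, a $\beta_j$ describes the log change in relative prob ratio
$$\begin{aligned}
\log\left(\frac{p_j(x + 1)/p_1(x + 1)}{p_j(x)/p_1(x)}\right)
&= \log\frac{p_j(x + 1)}{p_1(x + 1)} - \log\frac{p_j(x)}{p_1(x)} \\
&= \beta^*_{0j} + \beta_{1j}(x + 1) - (\beta^*_{0j} + \beta_{1j}x) \\
&= \beta_{1j}
\end{aligned}$$
$$\begin{aligned}
\log\left(\frac{p_j(x + 1)/p_k(x + 1)}{p_j(x)/p_k(x)}\right)
&= \log\frac{p_j(x + 1)}{p_k(x + 1)} - \log\frac{p_j(x)}{p_k(x)} \\
&= \log\frac{p_j(x + 1)}{p_1(x + 1)} - \log\frac{p_k(x + 1)}{p_1(x + 1)} - \log\frac{p_j(x)}{p_1(x)} + \log\frac{p_k(x)}{p_1(x)} \\
&= \beta_{1j} - \beta_{1k}
\end{aligned}$$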
Maximum Likelihood Estimation
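The log-likelihood for observation i is:
$$\begin{aligned}
l_i &= \log \prod_{j=1}^{J} p_j(x_i)^{y_{ij}} \\
&= \sum_{j=2}^{J} y_{ij} \log p_j(x_i) + \left(1 - \sum_{j=2}^{J} y_{ij}\right) \log p_1(x_i) \\
&= \sum_{j=2}^{J} y_{ij} \log\frac{p_j(x_i)}{1 - \sum_{k=2}^{J} p_k(x_i)} + \log p_1(x_i) \\
&= \sum_{j=2}^{J} y_{ij}(x_i\beta_j) - \log\left(1 + \sum_{j=2}^{J} \exp\{x_i\beta_j\}\right)
\end{aligned}$$
Maximize $\sum_{i=1}^{I} l_i$ with respect to $\beta_j$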
Example 2: Dose Response
Let’s revisit our dose response study but use the multinomial logit model
Let’s consider dose
  As a categorical predictor
    There are 3 indicator variables per level + intercept
    Total of 4(4) = 16 parameters
  As a continuous predictor
    Will assign scores to the categories
    Total of 4(2) = 8 parameters
Previous proportional-odds models had 7 and 5 parameters, respectively
Example 2: Dose Categorical
> library(nnet)
> fit1 <- multinom(outcome ~ as.factor(dose), weights=count, prob2)
> summary(fit1)
Coefficients:
(Intercept) as.factor(dose)1 as.factor(dose)2 as.factor(dose)3
2 -0.8586335 0.03194971 -0.2864809 -1.5161958
3 -0.2488754 0.16185828 0.4536705 0.3794879
4 -0.2063195 0.18526707 0.5810140 0.5055581
5 -0.6117850 0.14178037 0.2615807 0.5641507
Std. Errors:
(Intercept) as.factor(dose)1 as.factor(dose)2 as.factor(dose)3
2 0.2386396 0.3541205 0.3887204 0.5746170
3 0.1966936 0.2867909 0.2827264 0.2869711
4 0.1943777 0.2826526 0.2759257 0.2797853
5 0.2195434 0.3199468 0.3212239 0.3095891
Residual Deviance: 2443.166
AIC: 2475.166
Dose Categorical - Predicted Probs
> predProb <- unique(fit1$fitted.values)
> matplot(predProb,las=1,type="l")
> legend("bottomleft", lty=c(1:5), col=c(1:5),
paste("Response =", c(0:4)),cex=0.75)
Calculation of Residual Deviance
Saturated model when the data are treated as grouped:Model-based predicted probs = sample proportions> m / apply(m, 1, sum)
0 1 2 3 4
0 0.2809524 0.11904762 0.2190476 0.2285714 0.1523810
1 0.2526316 0.11052632 0.2315789 0.2473684 0.1578947
2 0.2125604 0.06763285 0.2608696 0.3091787 0.1497585
3 0.2205128 0.02051282 0.2512821 0.2974359 0.2102564
Deviance for grouped data
$$G^2 = 2\sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij}\log\left(\frac{y_{ij}}{\hat{\mu}_{ij}}\right) = 2\sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij}\log\left(\frac{y_{ij}}{y_{ij}}\right) = 0$$
Deviance for ungrouped data
$$G^2 = 2\sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij}\log\left(\frac{1}{\hat{p}_j(x_i)}\right) = -2\sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij}\log \hat{p}_j(x_i) = 2443.166$$
with I × J × (J − 1)− I · (J − 1) = 4 · 5 · 4− 4 · 4 = 64 df
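The ungrouped expression follows because, with one observation per subject, the saturated model assigns probability 1 to each observed category, so its log-likelihood is zero. A brief sketch of that step in the slide's notation, where $\ell$ denotes the log-likelihood (a symbol introduced here for illustration):

```latex
\begin{aligned}
G^2 &= 2\left[\ell(\text{saturated}) - \ell(\text{fitted})\right] \\
    &= 2\left[0 - \sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij}\log \hat{p}_j(x_i)\right] \\
    &= -2\sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij}\log \hat{p}_j(x_i)
\end{aligned}
```

This is why the residual deviance reported by multinom is large even when the fitted probabilities equal the sample proportions.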
Example 2: Dose Scored
> library(nnet)
> fit2 <- multinom(outcome ~ dose, weights=count, prob2)
> summary(fit2)
Coefficients:
  (Intercept)       dose
2  -0.6999134 -0.3544346
3  -0.2194566  0.1470232
4  -0.1772963  0.1945578
5  -0.6544057  0.1914772
Std. Errors:
  (Intercept)       dose
2   0.2051749 0.13796048
3   0.1676773 0.09130087
4   0.1649761 0.08894654
5   0.1896008 0.10105460
Residual Deviance: 2449.145
AIC: 2465.145
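With dose treated as a score, each non-baseline outcome has its own intercept and slope, and the fitted probabilities at any dose come from the same inverse link. A minimal base R sketch using the coefficients printed above; the names `b0`, `b1`, `eta`, and `prob` are illustrative assumptions, not objects from the original code:

```r
# Coefficients from summary(fit2); the leading 0 is the baseline outcome
b0 <- c(0, -0.6999134, -0.2194566, -0.1772963, -0.6544057)  # intercepts
b1 <- c(0, -0.3544346,  0.1470232,  0.1945578,  0.1914772)  # dose slopes

dose <- 0:3

# Linear predictors: one row per dose, one column per outcome
eta <- outer(dose, b1) +
  matrix(b0, nrow = length(dose), ncol = length(b0), byrow = TRUE)

# Normalise within each row to get the fitted probabilities
prob <- exp(eta) / rowSums(exp(eta))
rownames(prob) <- paste("dose", dose)
round(prob, 3)
```

The negative slope for the first non-baseline outcome means its fitted probability falls as dose increases, which agrees with the direction seen in the sample proportions.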
Dose Scored - Predicted Probs
> predProb <- unique(fit2$fitted.values)
> matplot(predProb,las=1,type="l")
> legend("bottomleft", lty=c(1:5), col=c(1:5),
+ paste("Response =", c(0:4)),cex=0.75)
Conclusions
Can compare the two models to test for linearity
> anova(fit1,fit2)
            Model Res. df Resid. Dev Df LR stat.   Pr(Chi)
1            dose      72   2449.145
2 as.factor(dose)      64   2443.166  8  5.97846 0.6496448
Large p-value (0.65): conclude that the linear (scored) dose effect is sufficient
Can do grouped goodness of fit test to assess fit
G^2 = 5.98 on 8 df (same because grouped Model 2 saturated)
This model does not fit as well as the relaxed cumulative-odds model
G^2 = 2447.018 on 72 df (relaxed cumulative-odds) versus G^2 = 2449.145 on 72 df (this model)
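The likelihood-ratio comparison reported by anova can be reproduced by hand from the two residual deviances and their degrees of freedom. A minimal base R sketch using only the numbers printed on the slides; the variable names are illustrative:

```r
# Residual deviances and residual df from the two multinom fits
dev_scored      <- 2449.145; df_scored      <- 72  # dose as a score (fit2)
dev_categorical <- 2443.166; df_categorical <- 64  # dose as a factor (fit1)

# LR statistic: drop in deviance from the smaller to the larger model
lr_stat <- dev_scored - dev_categorical
lr_df   <- df_scored - df_categorical

# Upper-tail chi-square probability
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(LR = lr_stat, df = lr_df, p = p_value)
```

Because the deviances are rounded to three decimals, the statistic agrees with the anova output only to about that precision.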
Test for Equality of βj
Can test if different slope needed for each class j
$$H_0:\ \log\left(\frac{p_j(X)}{p_1(X)}\right) = \beta_{0j} + \beta X, \quad j = 2, \ldots, J$$
$$H_a:\ \log\left(\frac{p_j(X)}{p_1(X)}\right) = \beta_{0j} + \beta_j X, \quad j = 2, \ldots, J$$
# -----separate beta_j for each response category-----
# ------the last category is the baseline in VGAM------
> library(VGAM)
> fit3 <- vglm(outcome~dose, multinomial(parallel=FALSE),
+ weights=count,data=prob2)
> summary(fit3)
# -------same beta_j for each response category-------
> fit3.parallel <- vglm(outcome~dose,multinomial(parallel=TRUE),
+ weights=count,data=prob2)
> summary(fit3.parallel)
# -------likelihood-ratio test of H0 (common slope)-------
> 1 - pchisq(2*(logLik(fit3)-logLik(fit3.parallel)),
+ df=length(coef(fit3))-length(coef(fit3.parallel)))
[1] 0.0001767769
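The degrees of freedom in this test come from counting coefficients in the two models. A short worked version, assuming J = 5 response categories (as in the dose example) and assuming that parallel=TRUE constrains only the dose slope while leaving the J − 1 intercepts free; $\ell$ denotes the maximised log-likelihood:

```latex
\begin{aligned}
G^2 &= 2\left[\ell(H_a) - \ell(H_0)\right] \\
\text{df} &= \underbrace{2(J-1)}_{\text{separate slopes}} - \underbrace{\left[(J-1) + 1\right]}_{\text{common slope}} = J - 2 = 3 \quad \text{for } J = 5
\end{aligned}
```

With a p-value of about 0.0002 on these 3 df, the common-slope hypothesis is rejected, so a separate dose slope for each response category is needed.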
Example 4: Housing Satisfaction
1681 Copenhagen residents in study (housing in MASS)
Three categorical predictors (1 nominal, 2 ordered)
                             Contact: Low          Contact: High
                             Satisfaction          Satisfaction
Housing          Influence   Low  Medium  High     Low  Medium  High
Tower blocks     Low          21      21    28      14      19    37
                 Medium       34      22    36      17      23    40
                 High         10      11    36       3       5    23
Apartments       Low          61      23    17      78      46    43
                 Medium       43      35    40      48      45    86
                 High         26      18    54      15      25    62
Atrium houses    Low          13       9    10      20      23    20
                 Medium        8       8    12      10      22    24
                 High          6       7     9       7      10    21
Terraced houses  Low          18       6     7      57      23    13
                 Medium       15      13    13      31      21    13
                 High          7       5    11       5       6    13
Mosaic Plot
Multinomial Logit Null Model Fit
Distribution of satisfaction same for all residents
> fit.null <- multinom(Sat~1,weights=Freq,housing)
> summary(fit.null)
Call:
multinom(formula = Sat ~ 1, data = housing, weights = Freq)
Coefficients:
       (Intercept)
Medium  -0.2400404
High     0.1639289
Std. Errors:
       (Intercept)
Medium  0.06329155
High    0.05710232
Residual Deviance: 3648.878
AIC: 3652.878
Implied probabilities:
Low: 1/(1+exp(-0.2400404)+exp(0.1639289)) = 0.3372992
Medium: exp(-0.2400404)/(1+exp(-0.2400404)+exp(0.1639289)) = 0.2653183
High: exp(0.1639289)/(1+exp(-0.2400404)+exp(0.1639289)) = 0.3973825
DF = 4*3*2*3*(3-1) - 2 = 142 (ungrouped)
DF = 4*3*2*(3-1) - 2 = 46 (grouped)
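The null-model probabilities can be checked directly against the marginal distribution of satisfaction. A minimal sketch, assuming the `housing` data frame from the MASS package with its `Sat` and `Freq` columns; the names `marg`, `p_hat`, `b`, and `p_model` are illustrative:

```r
library(MASS)  # provides the housing data frame

# Marginal counts and proportions of satisfaction over all 1681 residents
marg  <- xtabs(Freq ~ Sat, data = housing)
p_hat <- as.numeric(marg / sum(marg))

# Intercepts printed by summary(fit.null); Low is the baseline category
b <- c(Medium = -0.2400404, High = 0.1639289)

# Inverse baseline-category logit link
p_model <- c(1, exp(b)) / (1 + sum(exp(b)))

round(rbind(sample = p_hat, model = p_model), 4)
```

The two rows should agree, since an intercept-only multinomial logit model reproduces the sample proportions.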
Multinomial Logit Fit
Consider influence as nominal variable
> fit.multinom <- multinom(Sat~Infl+Type+Cont,weights=Freq,housing)
> summary(fit.multinom)
Coefficients:
(Intercept) InflMedium InflHigh TypeApartment
Medium -0.4192316 0.4464003 0.6649367 -0.4356851
High -0.1387453 0.7348626 1.6126294 -0.7356261
TypeAtrium TypeTerrace ContHigh
Medium 0.1313663 -0.6665728 0.3608513
High -0.4079808 -1.4123333 0.4818236
Std. Errors:
(Intercept) InflMedium InflHigh TypeApartment
Medium 0.1729344 0.1415572 0.1863374 0.1725327
High 0.1592295 0.1369380 0.1671316 0.1552714
TypeAtrium TypeTerrace ContHigh
Medium 0.2231065 0.2062532 0.1323975
High 0.2114965 0.2001496 0.1241371
Residual Deviance: 3470.084
AIC: 3498.084
Should we consider interactions among predictors?
Surrogate Log-Linear Models
Again focusing on satisfaction as multinomial response with other three variables as predictors
Will use associations between variables to develop predictive model
Model #1: Satisfaction is indep of the three predictors
If true, conditional distribution of satisfaction is the same for all predictor combinations
In other words, conditional probs do not vary with predictors
This is the same as the multinomial null model
Can express as log-linear model using
> fit <- glm(Freq~Infl*Type*Cont+Sat,family=poisson,housing)
Model #1 Results
> summary(fit)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.162e+00 1.243e-01 25.433 < 2e-16 ***
InflMedium 2.733e-01 1.586e-01 1.723 0.084868 .
InflHigh -2.054e-01 1.784e-01 -1.152 0.249511
TypeApartment 3.666e-01 1.555e-01 2.357 0.018403 *
TypeAtrium -7.828e-01 2.134e-01 -3.668 0.000244 ***
TypeTerrace -8.145e-01 2.157e-01 -3.775 0.000160 ***
ContHigh -1.190e-15 1.690e-01 0.000 1.000000
Sat1Medium -2.400e-01 6.329e-02 -3.793 0.000149 ***
Sat1High 1.639e-01 5.710e-02 2.871 0.004094 **
InflMedium:TypeApartment -1.177e-01 2.086e-01 -0.564 0.572571
InflHigh:TypeApartment 1.753e-01 2.279e-01 0.769 0.441783
InflMedium:TypeAtrium -4.068e-01 3.035e-01 -1.340 0.180118
InflHigh:TypeAtrium -1.692e-01 3.294e-01 -0.514 0.607433
InflMedium:TypeTerrace 6.292e-03 2.860e-01 0.022 0.982450
InflHigh:TypeTerrace -9.305e-02 3.280e-01 -0.284 0.776633
InflMedium:ContHigh -1.398e-01 2.279e-01 -0.613 0.539715
InflHigh:ContHigh -6.091e-01 2.800e-01 -2.176 0.029585 *
TypeApartment:ContHigh 5.029e-01 2.109e-01 2.385 0.017083 *
TypeAtrium:ContHigh 6.774e-01 2.751e-01 2.462 0.013811 *
TypeTerrace:ContHigh 1.099e+00 2.675e-01 4.106 4.02e-05 ***
InflMedium:TypeApartment:ContHigh 5.359e-02 2.862e-01 0.187 0.851450
InflHigh:TypeApartment:ContHigh 1.462e-01 3.380e-01 0.432 0.665390
InflMedium:TypeAtrium:ContHigh 1.555e-01 3.907e-01 0.398 0.690597
InflHigh:TypeAtrium:ContHigh 4.782e-01 4.441e-01 1.077 0.281619
InflMedium:TypeTerrace:ContHigh -4.980e-01 3.671e-01 -1.357 0.174827
InflHigh:TypeTerrace:ContHigh -4.470e-01 4.545e-01 -0.984 0.325326
Null deviance: 833.66 on 71 degrees of freedom
Residual deviance: 217.46 on 46 degrees of freedom
AIC: 610.43
Large deviance suggests probs vary with predictors
Residual deviance based on Poisson dist here
Coefs for Sat1 are the same as null multinomial intercepts
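The match between the Sat1 coefficients and the null multinomial intercepts can be verified numerically. A minimal sketch, assuming the `housing` data from MASS and assuming that `Sat1` (not defined in this section) is an unordered copy of the ordered factor `Sat`, so that treatment contrasts are used; the object name `h` is illustrative:

```r
library(MASS)  # provides the housing data frame

# Assumption: Sat1 is Sat with the ordering dropped (treatment contrasts)
h <- transform(housing, Sat1 = factor(Sat, ordered = FALSE))

# Surrogate log-linear model #1: satisfaction independent of the predictors
fit <- glm(Freq ~ Infl * Type * Cont + Sat1, family = poisson, data = h)

# Sat1 coefficients versus log ratios of the marginal satisfaction counts
marg <- xtabs(Freq ~ Sat1, data = h)
b    <- unname(coef(fit)[c("Sat1Medium", "Sat1High")])
cbind(loglinear = b, log_ratio = as.numeric(log(marg[-1] / marg[1])))

# Residual deviance and df, for comparison with the slide
c(deviance = deviance(fit), df = df.residual(fit))
```

The Infl*Type*Cont terms only fix the predictor margins; all information about satisfaction sits in the Sat1 terms, which is why they coincide with the multinomial intercepts.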
Additive Contributions of Predictors
Can assess whether Sat1 depends on each of the 3 predictors individually by adding interactions with it
> addterm(fit, ~. + Sat1:(Infl+Type+Cont), test="Chisq")
Single term additions
Model:
Freq ~ Infl * Type * Cont + Sat
Df Deviance AIC LRT Pr(Chi)
<none> 217.46 610.43
Infl:Sat1 4 111.08 512.05 106.371 < 2.2e-16 ***
Type:Sat1 6 156.79 561.76 60.669 3.292e-11 ***
Cont:Sat1 2 212.33 609.30 5.126 0.07708 .
Infl: max reduction in resid. deviance & AIC
Even though Cont:Sat1 not significant, let’s look at model with all three interactions
Model #2: Interactions with Sat
> fit2 <- glm(Freq~Infl*Type*Cont+Sat1:Infl+Sat1*Type+Sat1*Cont,
+ family=poisson,housing)
> summary(fit2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.32106 0.14761 22.498 < 2e-16 ***
InflMedium -0.14543 0.17855 -0.814 0.415369
InflHigh -1.17183 0.21803 -5.375 7.68e-08 ***
TypeApartment 0.68296 0.17522 3.898 9.71e-05 ***
TypeAtrium -0.70064 0.24137 -2.903 0.003698 **
TypeTerrace -0.32511 0.23230 -1.400 0.161652
ContHigh -0.28230 0.18441 -1.531 0.125814
Sat1Medium -0.41923 0.17293 -2.424 0.015342 *
Sat1High -0.13874 0.15923 -0.871 0.383570
InflMedium:TypeApartment -0.01788 0.21050 -0.085 0.932302
InflHigh:TypeApartment 0.38687 0.23330 1.658 0.097263 .
InflMedium:TypeAtrium -0.36031 0.30498 -1.181 0.237432
InflHigh:TypeAtrium -0.03679 0.33479 -0.110 0.912503
InflMedium:TypeTerrace 0.18515 0.28889 0.641 0.521580
InflHigh:TypeTerrace 0.31075 0.33482 0.928 0.353345
InflMedium:ContHigh -0.20006 0.22875 -0.875 0.381799
InflHigh:ContHigh -0.72579 0.28235 -2.571 0.010155 *
TypeApartment:ContHigh 0.56969 0.21215 2.685 0.007247 **
TypeAtrium:ContHigh 0.70211 0.27606 2.543 0.010979 *
TypeTerrace:ContHigh 1.21593 0.26997 4.504 6.67e-06 ***
InflMedium:Sat1Medium 0.44640 0.14156 3.153 0.001613 **
InflHigh:Sat1Medium 0.66494 0.18634 3.568 0.000359 ***
InflMedium:Sat1High 0.73486 0.13694 5.366 8.03e-08 ***
InflHigh:Sat1High 1.61263 0.16713 9.649 < 2e-16 ***
TypeApartment:Sat1Medium -0.43569 0.17253 -2.525 0.011562 *
TypeAtrium:Sat1Medium 0.13137 0.22311 0.589 0.555980
TypeTerrace:Sat1Medium -0.66657 0.20625 -3.232 0.001230 **
TypeApartment:Sat1High -0.73563 0.15527 -4.738 2.16e-06 ***
TypeAtrium:Sat1High -0.40798 0.21150 -1.929 0.053730 .
TypeTerrace:Sat1High -1.41233 0.20015 -7.056 1.71e-12 ***
ContHigh:Sat1Medium 0.36085 0.13240 2.726 0.006420 **
ContHigh:Sat1High 0.48183 0.12414 3.881 0.000104 ***
InflMedium:TypeApartment:ContHigh 0.04690 0.28621 0.164 0.869837
InflHigh:TypeApartment:ContHigh 0.12623 0.33821 0.373 0.708979
InflMedium:TypeAtrium:ContHigh 0.15724 0.39072 0.402 0.687364
InflHigh:TypeAtrium:ContHigh 0.47861 0.44424 1.077 0.281320
InflMedium:TypeTerrace:ContHigh -0.50016 0.36713 -1.362 0.173091
InflHigh:TypeTerrace:ContHigh -0.46310 0.45471 -1.018 0.308467
Null deviance: 833.657 on 71 degrees of freedom
Residual deviance: 38.662 on 34 degrees of freedom
AIC: 455.63
Model #2: Interactions with Sat
Same model as our main-effects multinomial model
Different deviances due to different saturated models.
In multinom the saturated model is for subjects
In surrogate log-linear model, it is for cells (grouped)
Comparison with null model
multinom: 3648.9 − 3470.1 = 178.8 and 142 − 130 = 12 df
log-linear: 217.5 − 38.7 = 178.8 and 46 − 34 = 12 df
Could also consider higher-order interactions
Represent non-additive effects of predictors on Sat
addterm(fit1, .~.+Sat:(Infl+Type+Cont)^2, test="Chisq")
None are found significant
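The equality of the two likelihood-ratio statistics can be confirmed by fitting all four models. A minimal sketch, assuming the `housing` data from MASS, the nnet package for multinom, and an unordered copy `Sat1` of `Sat` (an assumption, since `Sat1` is not defined in this section); the object names `m0`, `m1`, `g0`, and `g1` are illustrative:

```r
library(MASS)  # housing data
library(nnet)  # multinom

h <- transform(housing, Sat1 = factor(Sat, ordered = FALSE))

# Multinomial logit: null model and main-effects model
m0 <- multinom(Sat ~ 1, weights = Freq, data = h, trace = FALSE)
m1 <- multinom(Sat ~ Infl + Type + Cont, weights = Freq, data = h, trace = FALSE)

# Surrogate log-linear models with the same structure for satisfaction
g0 <- glm(Freq ~ Infl * Type * Cont + Sat1, family = poisson, data = h)
g1 <- glm(Freq ~ Infl * Type * Cont + Sat1 * (Infl + Type + Cont),
          family = poisson, data = h)

# Deviances differ across the two families, but their differences agree
lr_multinom <- m0$deviance - m1$deviance
lr_loglin   <- deviance(g0) - deviance(g1)
df_loglin   <- df.residual(g0) - df.residual(g1)

c(multinom = lr_multinom, loglinear = lr_loglin, df = df_loglin)
pchisq(lr_loglin, df = df_loglin, lower.tail = FALSE)
```

The saturated-model term cancels in each difference, so both routes give the same test of the predictors' additive effects on satisfaction.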
Summary
Models using the Poisson distribution
Consider E (count response) as a function of predictors
  Poisson regression
  Quasipoisson or negative binomial regression
  Surrogate log-linear model
Multivariate associations of categorical variables
  Nominal random variables: Log-linear models
  Ordinal random variables: Linear-by-linear model, column-effect models
Models using the multinomial distribution
Consider E (count response) as a function of predictors
  Ordinal response: cumulative logit model
  Nominal response: multinomial logit model