![Page 1: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/1.jpg)
Chapter 3: Generalized Linear Models
3.1 The Generalization
3.2 Logistic Regression Revisited
3.3 Poisson Regression
![Page 2: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/2.jpg)
3.1 The Generalization
![Page 3: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/3.jpg)
Objectives
Review the linear model.
Generalize the linear model.
Describe several common generalized linear models.
![Page 4: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/4.jpg)
Review Linear Models
A model is linear in the parameters when each term contains only one parameter and that parameter is a multiplicative constant.
– Linearity here does not refer to a linear response.
The response is modeled as a linear combination of terms.
This model is linear:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2$$
This model is not linear:
$$y = \beta_0 e^{\beta_1 x}$$
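A minimal sketch of why the quadratic model counts as linear: each term becomes one column of a design matrix, so ordinary least squares recovers the parameters directly. The data and coefficient values below are made up for illustration and are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 50)
x2 = rng.uniform(-1, 1, 50)
beta_true = np.array([1.0, 2.0, -1.0, 0.5, 0.3, -0.7])  # made-up coefficients

# One column per term; every term carries exactly one multiplicative parameter.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
y = X @ beta_true  # noise-free response, so the fit should reproduce beta_true

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The exponential model cannot be written this way because its second parameter sits inside the exponent rather than multiplying a term.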
![Page 5: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/5.jpg)
Linear Model Error
The linear model is for the expected value or the mean of the response.
The linear model includes the response errors as normally distributed deviations with a mean of 0 and a constant variance.
– The variance does not depend on any explanatory variable.
The errors are added to the expectation.
![Page 6: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/6.jpg)
Generalized Linear Model
The linear model can be generalized to cases with nonnormal responses that are functions of the mean.
– A random component uses any distribution in the natural exponential family.
– A systematic component relates the predictors to the response.
– A link function relates the mean response to the systematic component.
![Page 7: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/7.jpg)
Random Component
The random component uses any distribution in the natural exponential family.
The PMF or PDF is in this form:
$$f(y_i; \theta_i) = a(\theta_i)\, b(y_i)\, e^{y_i Q(\theta_i)}$$
– a(θi) is a function of the distribution parameter.
– b(yi) is a function of the response.
– Q(θi) is the natural parameter.
![Page 8: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/8.jpg)
Random Component in JMP
The following distributions are available to serve as the random component of a GLM in JMP:
– Normal
– Binomial
– Poisson
– Exponential
![Page 9: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/9.jpg)
Systematic Component
The systematic component uses a linear model.
$$\eta_i = \sum_j \beta_j x_{ij}$$
![Page 10: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/10.jpg)
Systematic Component in JMP
Use Fit Model to specify the systematic component, as you would for ordinary least squares regression.
Create linear combinations of effects by adding terms made from data columns.
![Page 11: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/11.jpg)
Link Component
The link function g relates the random component and the systematic component.
$$\mu_i = E(Y_i)$$
$$g(\mu_i) = \eta_i = \sum_j \beta_j x_{ij}$$
The link is a monotonic and differentiable function.
It is the canonical link function if it transforms the mean to the natural parameter Q(θ).
![Page 12: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/12.jpg)
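Link Component in JMP
The following functions are available to serve as the link component in JMP:
– Identity
– Log
– Logit
– Reciprocal
– Probit
– Power: $g(\mu) = \mu^{\lambda}$ if $\lambda \neq 0$, and $g(\mu) = \log(\mu)$ if $\lambda = 0$
– Complementary log-log: $g(\mu) = \log(-\log(1 - \mu))$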
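The listed links can be written as plain functions of the mean. This is an illustrative sketch in Python using only the standard library; the function names are hypothetical and this is not JMP code.

```python
import math
from statistics import NormalDist

def identity(mu):
    return mu

def log_link(mu):
    return math.log(mu)

def logit(mu):
    return math.log(mu / (1 - mu))

def reciprocal(mu):
    return 1 / mu

def probit(mu):
    # Inverse of the standard normal CDF.
    return NormalDist().inv_cdf(mu)

def power(mu, lam):
    # The power link falls back to the log when lambda is 0.
    return math.log(mu) if lam == 0 else mu ** lam

def cloglog(mu):
    return math.log(-math.log(1 - mu))
```

Each function is monotonic and differentiable on its domain, which is what the link component requires.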
![Page 13: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/13.jpg)
![Page 14: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/14.jpg)
3.01 Quiz
Match the component of a GLM on the top with its representation or an example on the bottom.
A. Random component
B. Systematic component
C. Link component
1. $\log\left(\frac{\pi}{1 - \pi}\right)$
2. $\sum_i \beta_i x_i$
3. $f(y_i; \theta_i) = a(\theta_i)\, b(y_i)\, e^{y_i Q(\theta_i)}$
![Page 15: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/15.jpg)
3.01 Quiz – Correct Answer
The correct answer is A-3, B-2, and C-1.
![Page 16: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/16.jpg)
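Binary Logistic Regression
A binary response can also be modeled with a GLM.
$$f(y_i; \theta_i) = a(\theta_i)\, b(y_i)\, e^{y_i Q(\theta_i)}$$
$$f(y_i; \pi_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} = (1 - \pi_i)\, e^{y_i \log\left(\frac{\pi_i}{1 - \pi_i}\right)}$$
$$a(\pi_i) = 1 - \pi_i$$
$$b(y_i) = 1$$
$$Q(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right)$$
The canonical link function is the logit.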
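As a quick numerical check, the sketch below compares the Bernoulli PMF for a single binary outcome (y equal to 0 or 1) with the exponential-family form that uses a(π) = 1 − π, b(y) = 1, and Q(π) = log(π / (1 − π)). The function names are hypothetical.

```python
import math

def bernoulli_pmf(y, pi):
    return pi**y * (1 - pi) ** (1 - y)

def exp_family_form(y, pi):
    a = 1 - pi                      # function of the parameter only
    b = 1.0                         # function of the response only
    Q = math.log(pi / (1 - pi))     # natural parameter: the logit
    return a * b * math.exp(y * Q)

checks = [
    (bernoulli_pmf(y, pi), exp_family_form(y, pi))
    for y in (0, 1)
    for pi in (0.1, 0.5, 0.9)
]
```

Every pair in `checks` should agree up to floating-point rounding, which is why the logit is the canonical link for a binary response.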
![Page 17: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/17.jpg)
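Poisson Regression
A simple model of counts is the Poisson distribution.
$$f(y_i; \theta_i) = a(\theta_i)\, b(y_i)\, e^{y_i Q(\theta_i)}$$
$$f(y_i; \mu_i) = \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!} = e^{-\mu_i}\, \frac{1}{y_i!}\, e^{y_i \log \mu_i}$$
$$a(\mu_i) = e^{-\mu_i}$$
$$b(y_i) = \frac{1}{y_i!}$$
$$Q(\mu_i) = \log \mu_i$$
The canonical link function is the log.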
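The same kind of check works for the Poisson case. This sketch compares the usual Poisson PMF with the product a(μ) b(y) exp(y Q(μ)), using a(μ) = exp(−μ), b(y) = 1 / y!, and Q(μ) = log μ. The function names and test values are illustrative only.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu**y / math.factorial(y)

def exp_family_form(y, mu):
    a = math.exp(-mu)               # function of the parameter only
    b = 1 / math.factorial(y)       # function of the response only
    Q = math.log(mu)                # natural parameter: the log of the mean
    return a * b * math.exp(y * Q)

checks = [
    (poisson_pmf(y, mu), exp_family_form(y, mu))
    for y in (0, 1, 4, 10)
    for mu in (0.5, 2.0, 7.5)
]
```

The two forms should match up to rounding for every count and mean tried here.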
![Page 18: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/18.jpg)
Poisson Loglinear Model with Offset
The opportunity for the counts might not be constant for all observations.
– The opportunity N might be a period of time, a length, an area, or a volume.
$$\log\left(\frac{\mu_i}{N_i}\right) = \sum_j \beta_j x_{ij}$$
$$\log \mu_i - \log N_i = \sum_j \beta_j x_{ij}$$
$$\mu_i = N_i\, e^{\sum_j \beta_j x_{ij}}$$
log(N_i) is the offset.
Common Generalized Linear Models

| Distribution | Link | Predictors | Method |
| --- | --- | --- | --- |
| Normal | Identity | Continuous | OLS Regression |
| Normal | Identity | Categorical | ANOVA |
| Binomial | Logit | Both | Logistic Regression |
| Binomial | Complementary Log-Log | Continuous | Asymmetric |
| Poisson | Log | Both | Poisson Regression |
| Binomial | Inverse Normal | Both | Probit Analysis |
| Exponential | Reciprocal | Both | Nonlinear Regression |
![Page 20: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/20.jpg)
Deviance
The deviance is a measure of goodness of fit.
The deviance assesses the difference between the observed and the predicted response.
– Differences should be random (chi-square).
The deviance assesses the value of explanatory variables in the model.
– Deviance aids model selection.
The deviance is twice the difference in log-likelihood between the saturated model and the full model.
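A minimal sketch of the deviance for a Poisson response, assuming that "full model" means the fitted model and that the saturated model sets each mean equal to its observed count. The counts and fitted means are hypothetical.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood; a term with y = 0 and mu = 0 contributes 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += -mi - math.lgamma(yi + 1)
        if yi > 0:
            total += yi * math.log(mi)
    return total

y = [2, 0, 5, 3]                 # hypothetical observed counts
mu_hat = [1.5, 0.5, 4.0, 4.0]    # hypothetical fitted means

loglik_saturated = poisson_loglik(y, y)    # saturated model: mu_i = y_i
loglik_fitted = poisson_loglik(y, mu_hat)
deviance = 2 * (loglik_saturated - loglik_fitted)

# Closed form for the Poisson case, with 0 * log(0) taken as 0.
deviance_closed = 2 * sum(
    (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    for yi, mi in zip(y, mu_hat)
)
```

Both routes give the same number, and the value is zero only when the fitted means reproduce the data exactly.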
![Page 21: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/21.jpg)
Over-Dispersion
Lack of fit can result from more variance than expected from the model distribution.
An over-dispersion parameter can be used to account for the excess in the case of a binomial or Poisson distribution.
– The parameter equals 1 when there is no over-dispersion.
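One common way to estimate an over-dispersion parameter is the Pearson chi-square divided by its degrees of freedom. The sketch below applies that estimator to a Poisson response; it is an assumption that this matches the estimator used in any particular software, and the counts are hypothetical.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square / (n - p) for a Poisson response, where Var(Y) = mu."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

y = [0, 4, 1, 9, 2, 8]       # hypothetical counts
mu_hat = [4.0] * len(y)      # intercept-only fit: every mean equals the sample mean
phi = pearson_dispersion(y, mu_hat, n_params=1)
print(phi)  # 3.5
```

A value near 1 is consistent with the Poisson variance; a value well above 1, as in this made-up example, suggests over-dispersion.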
![Page 22: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/22.jpg)
![Page 23: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/23.jpg)
3.02 Multiple Answer Poll
Which of the following statements are true of the deviance associated with a GLM?
a. The deviance is the difference between the predicted response and the observed response.
b. The deviance is twice the difference in log-likelihood between the saturated model and the full model.
c. The deviance is a measure of goodness of fit.
d. The deviance measures the variance of the response.
![Page 24: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/24.jpg)
3.02 Multiple Answer Poll – Correct Answer
Which of the following statements are true of the deviance associated with a GLM?
a. The deviance is the difference between the predicted response and the observed response.
b. The deviance is twice the difference in log-likelihood between the saturated model and the full model.
c. The deviance is a measure of goodness of fit.
d. The deviance measures the variance of the response.
The correct answers are b and c.
![Page 25: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/25.jpg)
![Page 26: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/26.jpg)
3.2 Logistic Regression Revisited
![Page 27: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/27.jpg)
Objectives
Review binary logistic regression models.
Model binary responses with a GLM.
![Page 28: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/28.jpg)
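Binary Logistic Regression Model
$$\operatorname{logit}(\pi(x)) = \log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 x$$
$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$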
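The two equations are inverses of each other, which a short sketch can confirm numerically. The coefficient values are made up for illustration.

```python
import math

def inv_logit(eta):
    """Probability pi(x) from the linear predictor."""
    return math.exp(eta) / (1 + math.exp(eta))

def logit(p):
    """Log odds from a probability."""
    return math.log(p / (1 - p))

beta0, beta1 = -1.0, 0.5            # made-up coefficients
xs = [-2.0, 0.0, 2.0, 4.0]
probs = [inv_logit(beta0 + beta1 * x) for x in xs]
log_odds = [logit(p) for p in probs]
```

The recovered log odds equal β0 + β1 x at each x, and every probability stays strictly between 0 and 1 no matter how large the linear predictor becomes.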
![Page 29: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/29.jpg)
Advantage of Using Logistic Regression
JMP provides the following when you use logistic regression:
– Likelihood ratio test for lack of fit
– Many measures of goodness of fit
– Profiles of probability for all levels of the predictor
– Odds ratios
– ROC curve
– Lift curve
– Confusion matrix
![Page 30: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/30.jpg)
Advantage of Using Binary GLM
JMP provides the following when you use a GLM for a binary response:
– Deviance for lack of fit
– Over-dispersion model parameter
– Likelihood ratio test for over-dispersion
– Four residual plots
– Prediction profiler for probability of target level
![Page 31: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/31.jpg)
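GLM for a Binary Response
A binary response can also be modeled with a GLM.
$$f(y_i; \theta_i) = a(\theta_i)\, b(y_i)\, e^{y_i Q(\theta_i)}$$
$$f(y_i; \pi_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} = (1 - \pi_i)\, e^{y_i \log\left(\frac{\pi_i}{1 - \pi_i}\right)}$$
$$a(\pi_i) = 1 - \pi_i$$
$$b(y_i) = 1$$
$$Q(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right)$$
The canonical link function is the logit.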
![Page 32: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/32.jpg)
Separation Problem
In a given sample, the binary outcomes might be completely separated by the explanatory variable.
This separation causes a problem with estimating the logistic regression or GLM parameters.
Firth’s penalized maximum likelihood estimation method can avoid this problem and reduce bias in the parameter estimates in the case of rare outcomes.
![Page 33: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/33.jpg)
Pearson Residuals
Pearson chi-square for goodness of fit is the sum of the squared Pearson residuals.
$$e_i = \frac{y_i - n_i \hat{\pi}_i}{\sqrt{n_i \hat{\pi}_i (1 - \hat{\pi}_i)}}$$
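A short sketch of the Pearson residual for grouped binomial data, where y is the number of events out of n trials and π̂ is the fitted probability. The counts and fitted probabilities are hypothetical.

```python
import numpy as np

def pearson_residuals(y, n, pi_hat):
    y, n, pi_hat = (np.asarray(a, dtype=float) for a in (y, n, pi_hat))
    # Raw residual divided by the binomial standard deviation at the fitted value.
    return (y - n * pi_hat) / np.sqrt(n * pi_hat * (1 - pi_hat))

y = [3, 5, 8]                 # hypothetical event counts
n = [10, 10, 10]              # trials per group
pi_hat = [0.3, 0.5, 0.7]      # hypothetical fitted probabilities

e = pearson_residuals(y, n, pi_hat)
pearson_chi2 = float(np.sum(e**2))
```

Only the third group deviates from its fitted count here, so it contributes the whole chi-square statistic.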
![Page 34: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/34.jpg)
Deviance Residuals
The deviance chi-square for goodness of fit is the sum of the squared deviance residuals.
$$d_i = s_i \left[ 2 y_i \log\left(\frac{y_i}{\hat{y}_i}\right) + 2 (n_i - y_i) \log\left(\frac{n_i - y_i}{n_i - \hat{y}_i}\right) \right]^{1/2}$$
$$s_i = \operatorname{sign}(y_i - \hat{y}_i)$$
Studentized residuals provide a common scale for inspection.
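The deviance residual can be sketched the same way, assuming ŷ = n π̂ and the usual convention that 0 · log(0) is taken as 0 when y is 0 or n. The helper names are hypothetical.

```python
import math

def xlog_ratio(a, b):
    """a * log(a / b), with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def deviance_residual(y, n, pi_hat):
    y_hat = n * pi_hat
    inner = 2 * xlog_ratio(y, y_hat) + 2 * xlog_ratio(n - y, n - y_hat)
    # Guard against a tiny negative value from rounding before the square root.
    magnitude = math.sqrt(max(inner, 0.0))
    return math.copysign(magnitude, y - y_hat)

d_fit = deviance_residual(5, 10, 0.5)    # observed equals fitted
d_high = deviance_residual(8, 10, 0.7)   # observed above fitted
d_low = deviance_residual(0, 10, 0.3)    # no events observed
```

The sign follows the direction of the raw residual, and squaring and summing these values over all observations gives the deviance chi-square.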
![Page 35: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/35.jpg)
![Page 36: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/36.jpg)
3.03 Quiz
What are the three GLM components for a binary response?
![Page 37: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/37.jpg)
3.03 Quiz – Correct Answer
What are the three GLM components for a binary response?
Random component is the binomial distribution.
Systematic component is a polynomial function.
Link component is the logit function.
![Page 38: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/38.jpg)
GLM for Binary Response Example
Use GLM with the Titanic Passengers data set to relate Survived to Siblings and Spouses, Parents and Children, and Fare.
![Page 39: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/39.jpg)
GLM for a Binary Response
This demonstration illustrates the concepts discussed previously.
![Page 40: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/40.jpg)
![Page 41: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/41.jpg)
Exercise
This exercise reinforces the concepts discussed previously.
![Page 42: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/42.jpg)
3.3 Poisson Regression
![Page 43: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/43.jpg)
Objectives
Identify a categorical response of counts.
Use a GLM that is also known as Poisson loglinear regression.
![Page 44: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/44.jpg)
Response Is Counts
In many cases, the response is simply the count of a particular event.
– Occurrence of a disease
– Road accidents
– Mold colonies
– Number of non-conforming items
![Page 45: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/45.jpg)
Response Is Counts, Constant Opportunity
The response can be the count of a particular event in the same span of time, linear dimension, area, or volume.
– Occurrence of a disease per annum
– Road accidents each month on the same highway
– Mold colonies in a standard Petri dish
– Number of non-conforming items in a standard lot size
![Page 46: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/46.jpg)
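Poisson Regression
A simple model of counts is the Poisson distribution.
$$f(y_i; \theta_i) = a(\theta_i)\, b(y_i)\, e^{y_i Q(\theta_i)}$$
$$f(y_i; \mu_i) = \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!} = e^{-\mu_i}\, \frac{1}{y_i!}\, e^{y_i \log \mu_i}$$
$$a(\mu_i) = e^{-\mu_i}$$
$$b(y_i) = \frac{1}{y_i!}$$
$$Q(\mu_i) = \log \mu_i$$
The canonical link function is the log.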
![Page 47: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/47.jpg)
Response Is Counts, Opportunity Varies
The response can be the count of a particular event in spans of time, linear dimensions, areas, or volumes that differ among observations.
– Occurrence of a disease in different hospitals
– Road accidents on different highways
– Mold colonies in nonstandard field cases
– Number of non-conforming items in lots of different sizes
This case requires the use of an offset in the model.
– The offset acts like an intercept in the linear model, but it is known for each observation rather than estimated.
![Page 48: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/48.jpg)
Poisson Loglinear Model with Offset
The opportunity for the counts might not be constant for all observations.
– The opportunity N might be a period of time, a length, an area, or a volume.
$$\log\left(\frac{\mu_i}{N_i}\right) = \sum_j \beta_j x_{ij}$$
$$\log \mu_i - \log N_i = \sum_j \beta_j x_{ij}$$
$$\mu_i = N_i\, e^{\sum_j \beta_j x_{ij}}$$
log(N_i) is the offset.
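To show how an offset enters the fitting, here is a rough sketch of Poisson regression with a log link fit by iteratively reweighted least squares in NumPy. It is an illustration under stated assumptions, not the algorithm of any particular package: the data are synthetic, the coefficients are made up, and the iteration count is fixed rather than tested for convergence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (not from the slides): opportunity N and one predictor x.
n_obs = 40
N = rng.integers(50, 500, size=n_obs).astype(float)
x = rng.uniform(0.0, 1.0, size=n_obs)
X = np.column_stack([np.ones(n_obs), x])
beta_true = np.array([-2.0, 0.8])                   # made-up coefficients
y = rng.poisson(N * np.exp(X @ beta_true)).astype(float)

offset = np.log(N)                                  # coefficient fixed at 1
beta = np.array([np.log(y.sum() / N.sum()), 0.0])   # crude starting value

for _ in range(25):
    eta = X @ beta + offset
    mu = np.exp(eta)
    z = (eta - offset) + (y - mu) / mu              # working response without the offset
    XtW = X.T * mu                                  # Poisson with log link: weights equal mu
    beta = np.linalg.solve(XtW @ X, XtW @ z)

mu = np.exp(X @ beta + offset)
score = X.T @ (y - mu)                              # near zero at the maximum likelihood estimate
```

The offset is added to the linear predictor but never multiplied by an estimated coefficient, so the fitted coefficients describe the rate per unit of opportunity rather than the raw count.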
![Page 49: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/49.jpg)
![Page 50: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/50.jpg)
3.04 Multiple Answer Poll
How is the logarithm always used in Poisson regression with GLM?
a. Transform the response variable
b. Transform the explanatory variable
c. Transform the offset variable
d. Link the systematic and random components
e. Increase the over-dispersion
![Page 51: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/51.jpg)
3.04 Multiple Answer Poll – Correct Answer
How is the logarithm always used in Poisson regression with GLM?
a. Transform the response variable
b. Transform the explanatory variable
c. Transform the offset variable
d. Link the systematic and random components
e. Increase the over-dispersion
![Page 52: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/52.jpg)
Contrasts
The effect tests in the GLM analysis provide a sufficient test of the difference in the expected value of the response between the levels of a categorical explanatory variable with just two levels.
Another level of tests is possible when a categorical predictor has more than two levels: a contrast.
The contrast can test one level against another.
The contrast can test a combination of levels against another level or another combination of levels.
Contrasts are based on a likelihood ratio test.
![Page 53: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/53.jpg)
Poisson Regression Example
The number of new melanoma cases was reported from 1969-1971 for white males in two areas.
– Region is Northern or Southern.
– Age Group is <35, 35-44, 45-54, 55-64, 65-74, and >75.
– Cases is the number of patients with a new melanoma.
– Total is the number of patients in each region and age group. Offset is log(Total).
The total number of patients varies.
![Page 54: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/54.jpg)
GLM for Counts
This demonstration illustrates the concepts discussed previously.
![Page 55: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/55.jpg)
![Page 56: Chapter 3: Generalized Linear Models](https://reader033.vdocuments.mx/reader033/viewer/2022061503/568143d5550346895db062d0/html5/thumbnails/56.jpg)
Exercise
This exercise reinforces the concepts discussed previously.