Post on 12-Jan-2016


Chapter 3: Generalized Linear Models

3.1 The Generalization

3.2 Logistic Regression Revisited

3.3 Poisson Regression

3.1 The Generalization

Objectives

Review the linear model.
Generalize the linear model.
Describe several common generalized linear models.

Review Linear Models

A model is linear in the parameters when there is only one parameter per term and it is a multiplicative constant.
– It is not a matter of a linear response.
The response is modeled as a linear combination of terms.

This model is linear:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2$$

This model is not linear:

$$y = \beta_0 e^{\beta_1 x}$$
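The distinction between the two models can be illustrated with a small Python sketch. The parameter values are hypothetical and chosen only for illustration. A model that is linear in the parameters scales with them, so doubling every parameter doubles the prediction; the exponential model does not behave this way because one parameter sits inside the exponent.

```python
import math

def quadratic_terms(x1, x2):
    """Terms of the second-order model: 1, x1, x2, x1*x2, x1^2, x2^2."""
    return [1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2]

def linear_model(beta, x1, x2):
    """Response as a linear combination: one multiplicative parameter per term."""
    return sum(b * t for b, t in zip(beta, quadratic_terms(x1, x2)))

def exponential_model(beta0, beta1, x):
    """y = beta0 * exp(beta1 * x): beta1 is inside the exponent."""
    return beta0 * math.exp(beta1 * x)

beta = [1.0, 0.5, -0.25, 0.1, 0.05, 0.02]  # hypothetical parameter values
doubled = [2.0 * b for b in beta]

# Linear in the parameters: doubling every parameter doubles the prediction.
linear_scales = math.isclose(linear_model(doubled, 2.0, 3.0),
                             2.0 * linear_model(beta, 2.0, 3.0))

# Not linear: doubling both parameters does not double the prediction.
exponential_scales = math.isclose(exponential_model(2.0, 1.0, 1.5),
                                  2.0 * exponential_model(1.0, 0.5, 1.5))

print(linear_scales, exponential_scales)
```

Note that the linear model still contains squared and cross-product terms in x; linearity refers to the parameters only.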

Linear Model Error

The linear model is for the expected value or the mean of the response.
The linear model includes the response errors as normally distributed deviations with a mean of 0 and a constant variance.
– The variance does not depend on any explanatory variable.
The errors are added to the expectation.

Generalized Linear Model

The linear model can be generalized to cases with nonnormal responses that are functions of the mean.
– A random component uses any distribution in the natural exponential family.
– A systematic component relates the predictors to the response.
– A link function relates the mean response to the systematic component.

Random Component

The random component uses any distribution in the natural exponential family.
The PMF or PDF is in this form:

$$f(y_i;\theta_i) = a(\theta_i)\,b(y_i)\,\exp[y_i Q(\theta_i)]$$

– a(θi) is a function of the distribution parameter.
– b(yi) is a function of the response.
– Q(θi) is the natural parameter.

Random Component in JMP

The following distributions are available to serve as the random component of a GLM in JMP:
– Normal
– Binomial
– Poisson
– Exponential

Systematic Component

The systematic component uses a linear model.

$$\eta_i = \sum_j \beta_j x_{ij}$$

Systematic Component in JMP

Use Fit Model to specify the systematic component, as you would for ordinary least squares regression.
Create linear combinations of effects by adding terms made from data columns.

Link Component

The link function g relates the random component and the systematic component.

$$\mu_i = E(Y_i)$$

$$g(\mu_i) = \eta_i = \sum_j \beta_j x_{ij}$$

The link is a monotonic and differentiable function.
It is the canonical link function if it transforms the mean to the natural parameter Q(θ).

Link Component in JMP

The following functions are available to serve as the link component in JMP:
– Identity
– Log
– Logit
– Reciprocal
– Probit
– Power: $g(\mu) = \mu^{\lambda}$ if $\lambda \neq 0$, and $g(\mu) = \log(\mu)$ if $\lambda = 0$
– Complementary log-log: $g(\mu) = \log(-\log(1-\mu))$
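As a rough illustration of these link functions, the Python sketch below pairs each link with its inverse and checks that applying the link and then the inverse recovers the mean. The names `LINKS` and `power_link` are hypothetical helpers, not JMP functions, and the power link is assumed to take the log as its λ = 0 case.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, used by the probit link

def power_link(mu, lam):
    """Power link: mu**lam, with the log as the lam == 0 case (assumed form)."""
    return math.log(mu) if lam == 0 else mu ** lam

# name -> (link g, inverse link g^-1)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "reciprocal": (lambda mu: 1 / mu, lambda eta: 1 / eta),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
}

mu = 0.3  # a mean that is valid for every link above
round_trips = {name: inverse(link(mu)) for name, (link, inverse) in LINKS.items()}
print(round_trips)
```

Each link is monotonic on its domain, which is what makes the inverse well defined.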

3.01 Quiz

Match the component of a GLM on the top with its representation or an example on the bottom.

A. Random component
B. Systematic component
C. Link component

1. $\log\dfrac{\pi_i}{1-\pi_i}$
2. $\alpha + \beta x_i$
3. $f(y_i;\theta_i) = a(\theta_i)\,b(y_i)\,\exp[y_i Q(\theta_i)]$

3.01 Quiz – Correct Answer

The correct answer is A-3, B-2, and C-1.

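Binary Logistic Regression

A binary response can also be modeled with a GLM.

$$f(y_i;\theta_i) = a(\theta_i)\,b(y_i)\,\exp[y_i Q(\theta_i)]$$

$$f(y_i;\pi_i) = \pi_i^{y_i}(1-\pi_i)^{1-y_i} = (1-\pi_i)\exp\left[y_i \log\frac{\pi_i}{1-\pi_i}\right]$$

$$a(\pi_i) = 1-\pi_i, \qquad b(y_i) = 1, \qquad Q(\pi_i) = \log\frac{\pi_i}{1-\pi_i}$$

The canonical link function is the logit.

$$g(\pi_i) = \log\frac{\pi_i}{1-\pi_i}$$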
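The exponential-family form for a binary response can be checked numerically. The sketch below assumes the single-trial (Bernoulli) case with a(π) = 1 − π, b(y) = 1, and Q(π) equal to the logit; the function names are illustrative only.

```python
import math

def bernoulli_pmf(y, pi):
    """Direct form: pi^y * (1 - pi)^(1 - y) for y in {0, 1}."""
    return pi ** y * (1 - pi) ** (1 - y)

def exponential_family_form(y, pi):
    """a(pi) * b(y) * exp(y * Q(pi)) with the components assumed above."""
    a = 1 - pi
    b = 1.0
    q = math.log(pi / (1 - pi))  # natural parameter: the logit
    return a * b * math.exp(y * q)

# Compare both forms over a few probabilities and both outcomes.
comparisons = [
    (bernoulli_pmf(y, pi), exponential_family_form(y, pi))
    for y in (0, 1)
    for pi in (0.1, 0.5, 0.9)
]
print(all(math.isclose(direct, factored) for direct, factored in comparisons))
```

Because Q(π) is the logit, using the logit as the link makes it the canonical link for this distribution.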

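Poisson Regression

A simple model of counts is the Poisson distribution.

$$f(y_i;\theta_i) = a(\theta_i)\,b(y_i)\,\exp[y_i Q(\theta_i)]$$

$$f(y_i;\mu_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} = e^{-\mu_i}\,\frac{1}{y_i!}\,\exp(y_i \log\mu_i)$$

$$a(\mu_i) = e^{-\mu_i}, \qquad b(y_i) = \frac{1}{y_i!}, \qquad Q(\mu_i) = \log\mu_i$$

The canonical link function is the log.

$$g(\mu_i) = \log\mu_i$$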
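The same kind of numerical check works for the Poisson distribution. This sketch assumes a(μ) = exp(−μ), b(y) = 1/y!, and Q(μ) = log μ, and compares the factored form with the usual Poisson PMF; the function names are illustrative.

```python
import math

def poisson_pmf(y, mu):
    """Direct form: exp(-mu) * mu^y / y!."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def exponential_family_form(y, mu):
    """a(mu) * b(y) * exp(y * Q(mu)) with the components assumed above."""
    a = math.exp(-mu)
    b = 1 / math.factorial(y)
    q = math.log(mu)  # natural parameter: the log of the mean
    return a * b * math.exp(y * q)

# Compare both forms over several counts and means.
comparisons = [
    (poisson_pmf(y, mu), exponential_family_form(y, mu))
    for y in range(6)
    for mu in (0.5, 2.0, 7.5)
]
print(all(math.isclose(direct, factored) for direct, factored in comparisons))
```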

Poisson Loglinear Model with Offset

The opportunity for the counts might not be constant for all observations.
– The opportunity N might be a period of time, a length, an area, or a volume.

$$\log\frac{\mu_i}{N_i} = \sum_j \beta_j x_{ij}$$

$$\log\mu_i - \log N_i = \sum_j \beta_j x_{ij}$$

$$\mu_i = N_i \exp\Bigl(\sum_j \beta_j x_{ij}\Bigr)$$

log(N_i) is the offset.
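To make the role of the offset concrete, here is a small Python sketch. The coefficients and exposures are hypothetical. It computes the expected count as exp(log N + η) and shows that doubling the opportunity N doubles the expected count while the rate μ/N stays the same.

```python
import math

def expected_count(exposure, beta, x):
    """Mean of the Poisson loglinear model with offset log(exposure)."""
    eta = sum(b * xj for b, xj in zip(beta, x))  # systematic component
    return math.exp(math.log(exposure) + eta)    # equals exposure * exp(eta)

beta = [-4.0, 0.3]   # hypothetical intercept and slope
x = [1.0, 2.0]       # leading 1.0 multiplies the intercept

mu_small = expected_count(100.0, beta, x)
mu_large = expected_count(200.0, beta, x)

# The rate per unit of opportunity does not depend on the exposure.
print(mu_small / 100.0, mu_large / 200.0)
```

The offset enters the linear predictor with a coefficient fixed at 1, so it is not estimated from the data.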

Common Generalized Linear Models

Distribution   Link             Predictors    Method
Normal         Identity         Continuous    OLS Regression
Normal         Identity         Categorical   ANOVA
Binomial       Logit            Both          Logistic Regr.
Binomial       Compl. Log-Log   Continuous    Asymmetric
Poisson        Log              Both          Poisson Regr.
Binomial       Inverse Normal   Both          Probit Analysis
Exponential    Reciprocal       Both          Nonlinear Regr.

Deviance

The deviance is a measure of goodness of fit.
The deviance assesses the difference between the observed and the predicted response.
– Differences should be random (chi-square).
The deviance assesses the value of explanatory variables in the model.
– Deviance aids model selection.
The deviance is twice the difference in log-likelihood between the saturated model and the full model.
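A minimal Python sketch of the deviance as twice a log-likelihood difference, assuming a Poisson response. The counts and fitted means are made-up illustration values, not data from this course. The saturated model sets each fitted mean equal to the observed count.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y under means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    """Twice the log-likelihood gap between the saturated and fitted models."""
    saturated = poisson_loglik(y, y)  # assumes every count is positive
    return 2 * (saturated - poisson_loglik(y, mu))

y = [2, 5, 9, 14]              # hypothetical observed counts
mu = [3.0, 5.5, 8.0, 13.5]     # hypothetical fitted means

deviance = poisson_deviance(y, mu)

# The same quantity in closed form: 2 * sum(y * log(y / mu) - (y - mu)).
closed_form = 2 * sum(yi * math.log(yi / mi) - (yi - mi) for yi, mi in zip(y, mu))
print(deviance, closed_form)
```

A count of zero would need the usual convention that 0 · log 0 = 0, which this sketch does not handle.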

Over-Dispersion

Lack of fit can result from more variance than expected from the model distribution.
An over-dispersion parameter can be used to account for the excess in the case of a binomial or Poisson distribution.
– The parameter equals 1 when there is no over-dispersion.
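One common way to estimate an over-dispersion parameter is the Pearson chi-square divided by its residual degrees of freedom. The sketch below assumes that estimator and a Poisson response; whether a given package uses exactly this estimator is an assumption here. The counts and fitted means are hypothetical.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom (Poisson case)."""
    chi_square = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi_square / (len(y) - n_params)

y = [1, 7, 2, 12, 4, 15]               # hypothetical observed counts
mu = [3.0, 5.0, 4.0, 9.0, 6.0, 11.0]   # hypothetical fitted means
phi = pearson_dispersion(y, mu, n_params=2)

# Values well above 1 suggest more variance than the Poisson model expects.
print(phi)
```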

3.02 Multiple Answer Poll

Which of the following statements are true of the deviance associated with a GLM?

a. The deviance is the difference between the predicted response and the observed response.
b. The deviance is twice the difference in log-likelihood between the saturated model and the full model.
c. The deviance is a measure of goodness of fit.
d. The deviance measures the variance of the response.

3.02 Multiple Answer Poll – Correct Answer

The correct answers are b and c.

3.2 Logistic Regression Revisited

Objectives

Review binary logistic regression models.
Model binary responses with a GLM.

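Binary Logistic Regression Model

$$\operatorname{logit}(\pi(x)) = \log\frac{\pi(x)}{1-\pi(x)} = \beta_0 + \beta_1 x$$

$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$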
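The two forms of the binary logistic regression model can be tied together with a short Python sketch. The coefficients are hypothetical. It confirms that the logit of the modeled probability is the linear predictor, and that a one-unit increase in x multiplies the odds by exp(β1).

```python
import math

def probability(x, b0, b1):
    """pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    eta = b0 + b1 * x
    return math.exp(eta) / (1 + math.exp(eta))

def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))

def odds(p):
    return p / (1 - p)

b0, b1 = -1.5, 0.8  # hypothetical intercept and slope

p_at_2 = probability(2.0, b0, b1)
p_at_3 = probability(3.0, b0, b1)

linear_predictor_recovered = logit(p_at_2)   # should equal b0 + b1 * 2
odds_ratio = odds(p_at_3) / odds(p_at_2)     # should equal exp(b1)
print(linear_predictor_recovered, odds_ratio)
```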

Advantage of Using Logistic Regression

JMP provides the following when you use logistic regression:
– Likelihood ratio test for lack of fit
– Many measures of goodness of fit
– Profiles of probability for all levels of the predictor
– Odds ratios
– ROC curve
– Lift curve
– Confusion matrix

Advantage of Using Binary GLM

JMP provides the following when you use a GLM for a binary response:
– Deviance for lack of fit
– Over-dispersion model parameter
– Likelihood ratio test for over-dispersion
– Four residual plots
– Prediction profiler for probability of target level

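GLM for a Binary Response

A binary response can also be modeled with a GLM.

$$f(y_i;\theta_i) = a(\theta_i)\,b(y_i)\,\exp[y_i Q(\theta_i)]$$

$$f(y_i;\pi_i) = \pi_i^{y_i}(1-\pi_i)^{1-y_i} = (1-\pi_i)\exp\left[y_i \log\frac{\pi_i}{1-\pi_i}\right]$$

$$a(\pi_i) = 1-\pi_i, \qquad b(y_i) = 1, \qquad Q(\pi_i) = \log\frac{\pi_i}{1-\pi_i}$$

The canonical link function is the logit.

$$g(\pi_i) = \log\frac{\pi_i}{1-\pi_i}$$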

Separation Problem

It might happen in any given sample that the binary outcomes are completely separated by the explanatory variable.
This separation causes a problem with estimating the logistic regression or GLM parameters.
Firth’s penalized maximum likelihood estimation method can avoid this problem and reduce bias in the parameter estimates in the case of rare outcomes.
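The estimation problem can be seen in a small Python sketch with one predictor and made-up data. Under complete separation, the log-likelihood keeps increasing as the slope grows, so no finite maximum likelihood estimate exists. The helper names and the cut point are hypothetical; this sketch does not implement Firth's method.

```python
import math

def completely_separated(x, y):
    """True when one outcome lies entirely below the other on x."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcomes must be present")
    return max(x0) < min(x1) or max(x1) < min(x0)

def log_likelihood(slope, cut, x, y):
    """Bernoulli log-likelihood for logit(pi) = slope * (x - cut)."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = slope * (xi - cut)
        signed = eta if yi == 1 else -eta
        total += -math.log1p(math.exp(-signed))  # log-probability of observed yi
    return total

x = [1, 2, 3, 4, 5, 6]
y_separated = [0, 0, 0, 1, 1, 1]   # hypothetical, perfectly split at 3.5
y_overlapping = [0, 0, 1, 0, 1, 1]  # hypothetical, outcomes overlap

separated = completely_separated(x, y_separated)
overlapping = completely_separated(x, y_overlapping)

# With separation, a steeper slope always fits better.
likelihoods = [log_likelihood(s, 3.5, x, y_separated) for s in (1.0, 5.0, 25.0)]
print(separated, overlapping, likelihoods)
```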

Pearson Residuals

Pearson chi-square for goodness of fit is the sum of the squared Pearson residuals.

$$e_i = \frac{y_i - n_i\hat{\pi}_i}{\sqrt{n_i\hat{\pi}_i(1-\hat{\pi}_i)}}$$
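A short Python sketch of the Pearson residual for grouped binomial data, with y successes out of n trials and fitted probability π̂. The grouped data are hypothetical.

```python
import math

def pearson_residual(y, n, pi_hat):
    """(observed - fitted) / binomial standard deviation of the count."""
    return (y - n * pi_hat) / math.sqrt(n * pi_hat * (1 - pi_hat))

# Hypothetical groups: (successes, trials, fitted probability).
groups = [(7, 10, 0.5), (3, 12, 0.3), (15, 20, 0.8)]

residuals = [pearson_residual(y, n, p) for y, n, p in groups]

# Pearson chi-square: the sum of the squared Pearson residuals.
pearson_chi_square = sum(e ** 2 for e in residuals)
print(residuals, pearson_chi_square)
```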

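Deviance Residuals

The deviance chi-square for goodness of fit is the sum of the squared deviance residuals.

$$d_i = s_i\left[2 y_i \log\frac{y_i}{\hat{y}_i} + 2(n_i - y_i)\log\frac{n_i - y_i}{n_i - \hat{y}_i}\right]^{1/2}$$

$$s_i = \operatorname{sign}(y_i - \hat{y}_i)$$

Here $\hat{y}_i = n_i\hat{\pi}_i$ is the fitted count.

Studentized residuals provide a common scale for inspection.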
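The deviance residual for grouped binomial data can be sketched in Python as follows. The fitted count is taken to be n·π̂, and a term with an observed count of zero is treated as zero (the usual 0 · log 0 = 0 convention); the example values are hypothetical.

```python
import math

def deviance_residual(y, n, pi_hat):
    """Signed square root of one observation's contribution to the deviance."""
    fitted = n * pi_hat

    def term(observed, expected):
        # 2 * observed * log(observed / expected), with 0 * log(0) taken as 0.
        return 0.0 if observed == 0 else 2 * observed * math.log(observed / expected)

    squared = term(y, fitted) + term(n - y, n - fitted)
    # Guard against a tiny negative value from rounding before the square root.
    return math.copysign(math.sqrt(max(squared, 0.0)), y - fitted)

above = deviance_residual(7, 10, 0.5)   # observed above fitted: positive
below = deviance_residual(3, 10, 0.5)   # observed below fitted: negative
exact = deviance_residual(5, 10, 0.5)   # observed equals fitted: zero
print(above, below, exact)
```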

3.03 Quiz

What are the three GLM components for a binary response?

3.03 Quiz – Correct Answer

Random component is the binomial distribution.
Systematic component is a polynomial function.
Link component is the logit function.

GLM for Binary Response Example

Use GLM with the Titanic Passengers data set to relate Survived to Siblings and Spouses, Parents and Children, and Fare.

This demonstration illustrates the concepts discussed previously.

GLM for a Binary Response


Exercise

This exercise reinforces the concepts discussed previously.

3.3 Poisson Regression

Objectives

Identify a categorical response of counts.
Use a GLM that is also known as Poisson loglinear regression.

Response Is Counts

The response can be simply the count of a particular event in many cases.
– Occurrence of a disease
– Road accidents
– Mold colonies
– Number of non-conforming items

Response Is Counts, Constant Opportunity

The response can be the count of a particular event in the same span of time, linear dimension, area, or volume.
– Occurrence of a disease per annum
– Road accidents each month on the same highway
– Mold colonies in a standard Petri dish
– Number of non-conforming items in a standard lot size

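Poisson Regression

A simple model of counts is the Poisson distribution.

$$f(y_i;\theta_i) = a(\theta_i)\,b(y_i)\,\exp[y_i Q(\theta_i)]$$

$$f(y_i;\mu_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} = e^{-\mu_i}\,\frac{1}{y_i!}\,\exp(y_i \log\mu_i)$$

$$a(\mu_i) = e^{-\mu_i}, \qquad b(y_i) = \frac{1}{y_i!}, \qquad Q(\mu_i) = \log\mu_i$$

The canonical link function is the log.

$$g(\mu_i) = \log\mu_i$$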

Response Is Counts, Opportunity Varies

The response can be the count of a particular event over differing spans of time, linear dimension, area, or volume.
– Occurrence of a disease in different hospitals
– Road accidents on different highways
– Mold colonies in nonstandard field cases
– Number of non-conforming items in lots of different sizes
This case requires the use of an offset parameter in the model.
– The offset acts like an intercept in the linear model, with its coefficient fixed at 1.

Poisson Loglinear Model with Offset

The opportunity for the counts might not be constant for all observations.
– The opportunity N might be a period of time, a length, an area, or a volume.

$$\log\frac{\mu_i}{N_i} = \sum_j \beta_j x_{ij}$$

$$\log\mu_i - \log N_i = \sum_j \beta_j x_{ij}$$

$$\mu_i = N_i \exp\Bigl(\sum_j \beta_j x_{ij}\Bigr)$$

log(N_i) is the offset.
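For readers who want to see how such a model could be fit outside JMP, here is a hedged Python sketch of Newton-Raphson iterations for a Poisson loglinear model with an offset, an intercept, and one predictor. For the canonical log link these iterations coincide with Fisher scoring. The data are invented for illustration, and the function `fit_poisson_offset` is a hypothetical helper, not how JMP computes its estimates.

```python
import math

def fit_poisson_offset(x, y, exposure, iterations=25):
    """Fit log(mu_i) = log(N_i) + b0 + b1 * x_i by Newton-Raphson."""
    b0 = math.log(sum(y) / sum(exposure))  # start from the overall rate
    b1 = 0.0
    for _ in range(iterations):
        mu = [n * math.exp(b0 + b1 * xi) for xi, n in zip(x, exposure)]
        # Score vector: derivatives of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix entries (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: predictor, event counts, and opportunity for each row.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [10, 21, 16, 33, 35]
exposure = [1000.0, 1500.0, 800.0, 1200.0, 900.0]

b0, b1 = fit_poisson_offset(x, y, exposure)
fitted = [n * math.exp(b0 + b1 * xi) for xi, n in zip(x, exposure)]
print(b0, b1, fitted)
```

At the solution the fitted counts sum to the observed counts, which is a property of the canonical link with an intercept in the model.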

3.04 Multiple Answer Poll

How is the logarithm always used in Poisson regression with GLM?

a. Transform the response variable
b. Transform the explanatory variable
c. Transform the offset variable
d. Link the systematic and random components
e. Increase the over-dispersion

3.04 Multiple Answer Poll – Correct Answer

The correct answer is d. The log is the canonical link for the Poisson distribution, whereas an offset is needed only when the opportunity varies.

Contrasts

For a categorical explanatory variable with just two levels, the effect tests in the GLM analysis are sufficient to test the difference in the expected value of the response between the levels.
Another level of tests is possible when a categorical predictor has more than two levels: a contrast.
The contrast can test one level against another.
The contrast can test a combination of levels against another level or another combination of levels.
Contrasts are based on a likelihood ratio test.
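The idea of a likelihood-ratio contrast can be sketched in Python for the simplest case: Poisson counts with a single three-level factor and equal opportunity. The counts are hypothetical. The full model gives each level its own mean; the reduced model forces levels A and B to share one mean. The p-value uses the chi-square distribution with 1 degree of freedom, written with the complementary error function.

```python
import math

def poisson_loglik(counts, mu):
    """Poisson log-likelihood of counts that share a common mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)

def mean(values):
    return sum(values) / len(values)

# Hypothetical counts for a three-level categorical predictor.
counts = {"A": [4, 6, 5], "B": [9, 11, 10], "C": [5, 7, 6]}

# Full model: a separate mean for each level (the MLE is the level average).
full = sum(poisson_loglik(ys, mean(ys)) for ys in counts.values())

# Reduced model for the contrast A versus B: A and B share a pooled mean.
pooled = counts["A"] + counts["B"]
reduced = (poisson_loglik(pooled, mean(pooled))
           + poisson_loglik(counts["C"], mean(counts["C"])))

g2 = 2 * (full - reduced)                  # likelihood ratio statistic, 1 df
p_value = math.erfc(math.sqrt(g2 / 2))     # upper tail of chi-square with 1 df
print(g2, p_value)
```

With other effects in the model the two fits would need an iterative algorithm, but the test statistic is formed the same way.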

Poisson Regression Example

The number of new melanoma cases was reported from 1969-1971 for white males in two areas.

– Region is Northern or Southern.

– Age Group is <35, 35-44, 45-54, 55-64, 65-74, and >75.

– Cases is the number of patients with a new melanoma.

– Total is the number of patients in each region and age group. Offset is log(Total).

The total number of patients varies.


This demonstration illustrates the concepts discussed previously.

GLM for Counts


Exercise

This exercise reinforces the concepts discussed previously.
