# Binary and Multinomial Logistic Regression

Post on 10-Mar-2018


Binary and Multinomial Logistic Regression

Stat 557, Heike Hofmann

Outline

Logistic Regression: model checking by grouping

Model selection scores

Intro to Multinomial Regression

Example: Happiness Data

> summary(happy)
           happy            year           age            sex
 not too happy: 5629   Min.   :1972   Min.   : 18.00   female:28581
 pretty happy :25874   1st Qu.:1982   1st Qu.: 31.00   male  :22439
 very happy   :14800   Median :1990   Median : 43.00
 NA's         : 4717   Mean   :1990   Mean   : 45.43
                       3rd Qu.:2000   3rd Qu.: 58.00
                       Max.   :2006   Max.   : 89.00
                                      NA's   :184.00
          marital              degree                  finrela            health
 divorced     : 6131   bachelor      : 6918   above average    : 8536   excellent:11951
 married      :27998   graduate      : 3253   average          :23363   fair     : 7149
 never married:10064   high school   :26307   below average    :10909   good     :17227
 separated    : 1781   junior college: 2601   far above average:  898   poor     : 2164
 widowed      : 5032   lt high school:11777   far below average: 2438   NA's     :12529
 NA's         :   14   NA's          :  164   NA's             : 4876

Only consider the extremes: very happy and not too happy individuals

[Figure: spine plot of happy by sex (female, male)]

prodplot(data=happy, ~ happy+sex, c("vspine", "hspine"), na.rm=T, subset=level==2)
# almost perfect independence
# try a model

happy.sex <- [...]
[...] Pr(>|z|)
(Intercept) 0.96613 0.02075 46.551 [...]

Deviance difference is asymptotically $\chi^2$ distributed

Null hypothesis of independence cannot be rejected

> anova(happy.sex)
Analysis of Deviance Table

Model: binomial, link: logit

Response: happy

Terms added sequentially (first to last)

     Df  Deviance Resid. Df Resid. Dev
NULL                  20428      24053
sex   1 0.0016906     20427      24053

> confint(happy.sex)
Waiting for profiling to be done...
                  2.5 %     97.5 %
(Intercept)  0.92557962 1.00693875
sexmale     -0.06064378 0.06332427
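The deviance test above can be reproduced by hand. The following is a minimal sketch, not the original analysis: the happiness data is not available here, so the first part uses simulated data (the names `very_happy`, `fit_null` and `fit_sex` are assumptions), and the last line only reuses the deviance difference 0.0016906 on 1 degree of freedom reported in the table.

```r
# Simulated stand-in for the happiness data (assumption: NOT the original data).
set.seed(557)
n <- 2000
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
# The response is generated independently of sex.
very_happy <- rbinom(n, size = 1, prob = 0.72)

fit_null <- glm(very_happy ~ 1,   family = binomial())
fit_sex  <- glm(very_happy ~ sex, family = binomial())

# Deviance difference between the nested models and its degrees of freedom.
dev_diff <- deviance(fit_null) - deviance(fit_sex)
df_diff  <- df.residual(fit_null) - df.residual(fit_sex)

# Upper-tail chi-squared probability of the deviance difference.
p_value <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
print(c(deviance_difference = dev_diff, df = df_diff, p_value = p_value))

# The same calculation for the deviance difference reported in the slide.
p_slide <- pchisq(0.0016906, df = 1, lower.tail = FALSE)
print(p_slide)
```

A large p-value means the data give no reason to reject independence, which matches the confidence interval for `sexmale` covering zero.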

Age and Happiness

[Figure: histogram of age (20 to 80) filled by happy (not too happy, very happy), bars scaled to proportions 0.0 to 1.0]

[Figure: histogram of age (20 to 80) filled by happy (not too happy, very happy), counts 0 to 400]

qplot(age, geom="histogram", fill=happy, binwidth=1, data=happy)

qplot(age, geom="histogram", fill=happy, binwidth=1, position="fill", data=happy)

# research paper claims that happiness is u-shaped
happy.age <- glm(happy ~ poly(age, 2), family = binomial(), data = na.omit(happy[, c("age", "happy")]))

> summary(happy.age)

Call:
glm(formula = happy ~ poly(age, 2), family = binomial(), data = na.omit(happy[, c("age", "happy")]))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6400  -1.5480   0.7841   0.8061   0.8707

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     0.96850    0.01571  61.660  < 2e-16 ***
poly(age, 2)1   6.41183    2.22171   2.886  0.00390 **
poly(age, 2)2  -7.81568    2.21981  -3.521  0.00043 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 23957  on 20351  degrees of freedom
Residual deviance: 23936  on 20349  degrees of freedom
AIC: 23942

Number of Fisher Scoring iterations: 4
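The numbers in this summary can be checked with a little arithmetic. The sketch below uses only the rounded deviances and degrees of freedom printed above (so the result is approximate), and the variable names are made up for illustration.

```r
# Values copied from the summary output above (rounded as printed).
null_dev  <- 23957; null_df  <- 20351
resid_dev <- 23936; resid_df <- 20349

# Likelihood ratio statistic for adding the two polynomial terms.
lr_stat <- null_dev - resid_dev   # 21
lr_df   <- null_df - resid_df     # 2

# Upper-tail chi-squared probability; for 2 df this equals exp(-lr_stat / 2).
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(p_value)

# For ungrouped binary data, AIC = residual deviance + 2 * (number of coefficients).
aic <- resid_dev + 2 * 3
print(aic)   # 23942, as reported
```

The small p-value says the quadratic age model improves on the intercept-only model, even though the drop in deviance is tiny relative to its overall size.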

[Figure: proportions of happy (not too happy, very happy) by age; x-axis age 20 to 80, y-axis 0.0 to 1.0]

# effect of age
X <- [...]

[Figure: pred3 (0.0 to 1.0) against age (20 to 80), by sex (female, male)]

Problems with Deviance

If X is continuous, the deviance no longer has a $\chi^2$ distribution. The approximation assumptions are violated in two ways:

Regard X as categorical (with lots of categories): we might end up with a contingency table that has lots of small cells, which means that the $\chi^2$ approximation does not hold.

Increases in sample size most likely increase the number of different values of X. The corresponding contingency table changes size (an asymptotic distribution for the smaller contingency table doesn't exist).

... but

Differences in deviances between models that are only a few degrees of freedom apart still have an asymptotic $\chi^2$ distribution.
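Comparing two nested models with a continuous covariate can be sketched as follows. This uses simulated data with a hypothetical inverted-U age effect (the coefficients and the names `fit_lin` and `fit_quad` are assumptions, not taken from the happiness data). The two models differ by a single degree of freedom, so the deviance difference is referred to a $\chi^2_1$ distribution.

```r
# Simulated data (assumption): log odds follow an inverted U in age.
set.seed(1)
n   <- 5000
age <- sample(18:89, n, replace = TRUE)
eta <- 1 - 0.001 * (age - 50)^2
y   <- rbinom(n, size = 1, prob = plogis(eta))

# Nested models: linear in age versus quadratic in age.
fit_lin  <- glm(y ~ age,          family = binomial())
fit_quad <- glm(y ~ poly(age, 2), family = binomial())

# Analysis of deviance: the difference in deviances is compared to chi-squared
# with df equal to the difference in residual degrees of freedom.
cmp <- anova(fit_lin, fit_quad, test = "Chisq")
print(cmp)
```

Neither residual deviance on its own is used as a goodness-of-fit statistic here. Only their difference is tested.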


Model Checking by Grouping

Group the data along the estimates, e.g. such that the groups are approximately equal in size.

Partition the smallest $n_1$ estimates into group 1, the second smallest batch of $n_2$ estimates into group 2, ... If we assume g groups, we get the Hosmer-Lemeshow test statistic.

Problem with deviance: if X is continuous, the deviance no longer has a $\chi^2$ distribution. The approximation assumptions are violated in two ways. First, even if we regard X as categorical (with lots of categories), we end up with a contingency table that has lots of small cells, which means that the $\chi^2$ approximation does not hold. Secondly, if we increase the sample size, most likely the number of different values of X increases, too, which makes the corresponding contingency table change size (so we cannot even talk about an asymptotic distribution for the smaller contingency table, as it doesn't exist anymore once the sample size is larger).

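Model Checking by Grouping. To get around the problems with the distribution assumption of $G^2$, we can group the data along the estimates, e.g. by partitioning on the estimates such that the groups are approximately equal in size. Partitioning the estimates is done by size: we group the smallest $n_1$ estimates into group 1, the second smallest batch of $n_2$ estimates into group 2, ... If we assume $g$ groups, we get the Hosmer-Lemeshow test statistic

$$\sum_{i=1}^{g} \frac{\left(\sum_{j=1}^{n_i} y_{ij} - \sum_{j=1}^{n_i} \hat{\pi}_{ij}\right)^2}{\left(\sum_{j=1}^{n_i} \hat{\pi}_{ij}\right)\left(1 - \sum_{j} \hat{\pi}_{ij}/n_i\right)} \sim \chi^2_{g-2},$$

where $y_{ij}$ is the $j$th response in group $i$, $\hat{\pi}_{ij}$ is its estimated probability, and $n_i$ is the size of group $i$.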
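The statistic can be computed directly from its definition. The following is a minimal sketch on simulated data. The function `hosmer_lemeshow()` and its arguments are hypothetical names written for this illustration, not a function from any package, and the groups are formed by cutting the sorted estimates into `g` batches of nearly equal size.

```r
# Hypothetical helper: Hosmer-Lemeshow statistic for 0/1 responses y and
# fitted probabilities pihat, using g groups of (nearly) equal size.
hosmer_lemeshow <- function(y, pihat, g = 10) {
  ord   <- order(pihat)                 # sort observations by their estimate
  y     <- y[ord]
  pihat <- pihat[ord]
  # Group 1 holds the smallest estimates, group g the largest.
  grp <- ceiling(seq_along(pihat) * g / length(pihat))
  observed <- tapply(y, grp, sum)       # sum_j y_ij
  expected <- tapply(pihat, grp, sum)   # sum_j pihat_ij
  n_i      <- tapply(pihat, grp, length)
  stat <- sum((observed - expected)^2 / (expected * (1 - expected / n_i)))
  list(statistic = stat, df = g - 2,
       p.value = pchisq(stat, df = g - 2, lower.tail = FALSE))
}

# Simulated example (assumption): the fitted model is the true model.
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.3 + 0.8 * x))
fit <- glm(y ~ x, family = binomial())

hl <- hosmer_lemeshow(y, fitted(fit), g = 10)
print(hl)
```

Changing `g` changes how the observations are partitioned and can change the verdict, so the result is best reported together with the grouping used.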

4.4 Effects of Coding

Let X be a nominal variable with I categories. An appropriate model would then be:

$$\log\frac{\pi(x)}{1-\pi(x)} = \alpha + \beta_i,$$

where $\beta_i$ is the effect of the $i$th category of X on the log odds, i.e. one effect is estimated for each category. This means that the above model is overparameterized (the last category can be explained in terms of the others). To make the solution unique again, we have to use an additional constraint. In R, $\beta_1 = 0$ by default. Whenever one of the effects is fixed to be zero, this is called contrast coding, as it allows a comparison of all the other effects to the baseline effect. For effect coding the constraint is on the sum of all effects of a variable: $\sum_i \beta_i = 0$. In a binary variable the effects are then the negatives of each other.

Predictions and inference are independent of the specific coding used and are not affected by changes made in the coding.

Example: Alcohol and Malformation

Alcohol during pregnancy is believed to be associated with congenital malformation. The following numbers are from an observational study: after three months of pregnancy, questions on the average number of daily alcoholic beverages were asked; at birth the infant was checked for malformations:

|   | Alcohol | malformed | absent | P(malformed) |
|---|---------|-----------|--------|--------------|
| 1 | 0       | 48        | 17066  | 0.0028       |
| 2 | < 1     | 38        | 14464  | 0.0026       |
| 3 | 1-2     | 5         | 788    | 0.0063       |
| 4 | 3-5     | 1         | 126    | 0.0079       |
| 5 | ≥ 6     | 1         | 37     | 0.0263       |

Models m1 and m2 are the same in terms of statistical behavior: deviance, predictions and inference will yield the same numbers. The variable Alcohol is recoded for the second model, giving different estimates for the levels.
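The code for m1 and m2 did not survive in this transcript. The following sketch is therefore an assumed stand-in rather than the original models: it fits the table above once with R's default treatment contrasts (first effect fixed at zero) and once with sum-to-zero effect coding, and shows that only the coefficients differ.

```r
# Counts from the table above; ">=6" stands for the highest consumption level.
malf <- data.frame(
  Alcohol   = factor(c("0", "<1", "1-2", "3-5", ">=6"),
                     levels = c("0", "<1", "1-2", "3-5", ">=6")),
  malformed = c(48, 38, 5, 1, 1),
  absent    = c(17066, 14464, 788, 126, 37)
)

# m1: contrast (treatment) coding, baseline level "0" has effect zero.
m1 <- glm(cbind(malformed, absent) ~ Alcohol, family = binomial(), data = malf,
          contrasts = list(Alcohol = "contr.treatment"))
# m2: effect coding, the effects sum to zero (an assumed recoding).
m2 <- glm(cbind(malformed, absent) ~ Alcohol, family = binomial(), data = malf,
          contrasts = list(Alcohol = "contr.sum"))

print(coef(m1))                        # effects relative to level "0"
print(coef(m2))                        # effects relative to the mean log odds
print(c(deviance(m1), deviance(m2)))   # identical up to numerical error
print(cbind(m1 = fitted(m1), m2 = fitted(m2)))
```

Under effect coding the effect of the last level is not printed. It is the negative sum of the other four, `-sum(coef(m2)[-1])`.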

Alcohol

Problems with Grouping

Different groupings might (and will) lead to different decisions w.r.t. model fit

Hosmer et al. (1997): A comparison of goodness-of-fit tests for the logistic regression model (on Blackboard)

Model Selection

Ideal Situation: Theory for the relationship between response and outcome is well developed; the model is fitted because we want to fine-tune the dependency structure.

Exploratory Modelling: After an initial data check, visually inspect the relationship between the response and potential covariates. Include the strongest covariates first, build up from there, and check whether additions are significant improvements.

Stepwise Modelling (not recommended by itself): Include or exclude variables based on goodness-of-fit criteria such as AIC, adjusted $R^2$, ...

In Practice: combination of all three methods

(Forward) Selection

Results are often not easy to interpret - questionable value?

Step:  AIC=18176
cbind(happy, not) ~ sex + poly(age, 4) + marital + degree + finrela +
    degree:finrela + poly(age, 4):degree + poly(age, 4):finrela +
    sex:finrela + sex:degree

                       Df Deviance   AIC
<none>                       16714 18176
+ sex:marital           4    16707 18177
+ marital:degree       16    16688 18182
+ poly(age, 4):marital 16    16688 18182
+ sex:poly(age, 4)      4    16714 18184
+ marital:finrela      16    16693 18187
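Output of this kind is produced by `step()`. The following is a minimal sketch of AIC-based forward selection on simulated data. The variables `x1`, `x2` and `f` and the data frame `d` are made up for the illustration, and only `x1` truly affects the response.

```r
# Simulated data (assumption): only x1 has an effect on the log odds.
set.seed(1)
n <- 500
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  f  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
d$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * d$x1))

# Start from the intercept-only model and add terms while AIC decreases.
null_fit <- glm(y ~ 1, family = binomial(), data = d)
fwd <- step(null_fit, scope = ~ x1 + x2 + f, direction = "forward", trace = 0)

print(formula(fwd))
print(c(AIC_null = AIC(null_fit), AIC_selected = AIC(fwd)))
```

Setting `trace = 1` prints a table for each step like the one above, with one row per candidate term and a `<none>` row for the current model.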

(Forward) Selection

Step:  AIC=18176
cbind(happy, not) ~ sex + poly(age, 4) + marital + degree + finrela +
    degree:finrela + poly(age, 4):degree + poly(age, 4):finrela +
    sex:finrela + sex:degree

                       Df Deviance   AIC
<none>                       16714 18176
- sex:degree