
Upload: pearl-casey

Post on 03-Jan-2016


STA 617 – Chp9 Loglinear/Logit Models

Loglinear / Logit Models

Chapters 5-7 – logistic regression: GLM with logit link, binomial / multinomial response

Chapter 8 – loglinear models for contingency tables: log link, Poisson cell counts

Chapter 9 (here):
  graphs that show a model’s association and conditional independence patterns
  selection and comparison of loglinear models
  diagnostics for checking models, such as residuals
  association between ordinal variables


9.1 ASSOCIATION GRAPHS AND COLLAPSIBILITY

Darroch et al. (1980) used mathematical graph theory to represent loglinear models that have a conditional independence structure.

An association graph has a set of vertices, each vertex representing a variable.

An edge connecting two variables represents a conditional association between them.


Example: the loglinear model (WX, WY, WZ, YZ), which has no XY or XZ terms. Its association graph has edges W-X, W-Y, W-Z, and Y-Z.

It assumes independence between X and Y and between X and Z, conditional on the remaining two variables.

Two loglinear models with the same pairwise associations have the same association graph.

For instance, this association graph is also the one for model (WX,WYZ), which adds a three-factor WYZ interaction.


A path in an association graph is a sequence of edges leading from one variable to another.

Two variables X and Y are said to be separated by a subset of variables if all paths connecting X and Y intersect that subset.

For instance, in the figure above, W separates X and Y, since any path connecting X and Y goes through W.

The subset {W, Z} also separates X and Y.

A fundamental result states that two variables are conditionally independent given any subset of variables that separates them.
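As a small worked illustration of the separation result, consider the graph of model (WX, WY, WZ, YZ) from the earlier slide, in which X is joined only to W. Writing \perp for independence (a notational choice made here, not taken from the slides), the result implies:

```latex
X \perp Y \mid W, \qquad
X \perp Z \mid W, \qquad
X \perp Y \mid \{W, Z\}, \qquad
X \perp Z \mid \{W, Y\}
```

Each statement follows because every path from X to Y or Z must pass through W.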


9.1.2 Collapsibility in Three-Way Contingency Tables

Conditional associations in partial tables usually differ from marginal associations. However, under certain collapsibility conditions, they are the same.

The fitted XY odds ratio is identical in the partial tables and the marginal table for models with the association graphs shown in the slide's figure (figure not reproduced in this transcript).
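Since the figure is missing, the following states the standard textbook collapsibility condition for a three-way table as a hedged reminder. It is assumed, not confirmed by the transcript, that the missing graphs correspond to these two cases. Here \theta_{XY(k)} denotes the XY odds ratio in the partial table at level k of Z, and \theta_{XY} the marginal XY odds ratio:

```latex
\theta_{XY(k)} = \theta_{XY} \ \text{ for all } k
\quad \text{if} \quad
Z \perp X \mid Y \quad \text{or} \quad Z \perp Y \mid X
```

In loglinear terms these cases are models (XY, YZ) and (XY, XZ), whose graphs have Y or X, respectively, separating the other two variables.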


9.1.4 Collapsibility and Association Graphs for Multiway Tables

Bishop et al. (1975, p. 47) provided a parametric collapsibility condition for multiway tables:


9.2 MODEL SELECTION AND COMPARISON

Key:

A model should be complex enough to fit well but also relatively simple to interpret, smoothing rather than overfitting the data.

The potentially useful models are usually a small subset of the possible models.


9.2.1 Considerations in Model Selection

Inclusion of certain terms: a study may be designed to answer certain questions, such as comparing treatment groups.

Models should recognize distinctions between response and explanatory variables.

The modeling process should concentrate on terms linking responses and terms linking explanatory variables to responses.

The model should contain the most general interaction term relating the explanatory variables. From the likelihood equations, this has the effect of equating the fitted totals to the sample totals at combinations of their levels.


Automobile example (Table 8.8)

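Two responses: I (injury) and S (seat-belt use)

Two explanatory variables: G (gender) and L (location)

The loglinear model should then include the GL term (G*L), which implies, from the likelihood equations, that the fitted GL marginal totals equal the sample totals.

If S is also explanatory and only I is a response, the totals at each combination of G, L, and S should be fixed, so the model should include the GLS term.

With a single response, we should use logit rather than loglinear models when the main focus is describing effects on that response.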
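The effect of these terms on the fitted values can be written out explicitly. The sketch below assumes the cell counts are indexed as n_{g i \ell s} (gender, injury, location, seat-belt use). That ordering is an assumption made for illustration, and a + subscript denotes summation over that index:

```latex
\hat\mu_{g+\ell+} = n_{g+\ell+} \quad \text{(model contains the } GL \text{ term)}, \qquad
\hat\mu_{g+\ell s} = n_{g+\ell s} \quad \text{(model contains the } GLS \text{ term)}
```

In words, including the most general term among the explanatory variables forces the fitted totals to match the observed totals at every combination of their levels.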


For exploratory studies, a search among potential models may provide clues about associations and interactions.

One approach first fits the model having single-factor terms, then the model having two-factor and single-factor terms, then the model having three-factor and lower terms, and so on.

Fitting such models often reveals a restricted range of good-fitting models.


Automatic model selection

Automatic procedures such as backward elimination, forward selection, or stepwise selection may also be useful but should be used with care and skepticism.

Such a strategy need not yield a meaningful model.


Example: the Dayton Student Survey

Variables: alcohol use (A), cigarette use (C), marijuana use (M), gender (G), and race (R)


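SAS Code

/* Read the survey counts: each input row holds the 8 counts for one (A, C) combination */
data table9_1;
  input A $ C $ x1-x8;
  array x{*} x1-x8;
  retain i; i=0; drop i x1-x8;
  /* unpack the 8 counts in the order R (outermost), G, M (innermost) */
  do R='White', 'Other'; do G='Female','Male'; do M='Yes', 'No';
    i=i+1; count=x{i}; output;
  end; end; end;
cards;
Yes Yes 405 268 453 228 23 23 30 19
Yes No 13 218 28 201 2 19 1 18
No Yes 1 17 1 17 0 1 1 8
No No 1 117 1 133 0 12 0 17
;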

A    C    R      G       M    count
Yes  Yes  White  Female  Yes  405
Yes  Yes  White  Female  No   268
Yes  Yes  White  Male    Yes  453
Yes  Yes  White  Male    No   228
Yes  Yes  Other  Female  Yes  23
Yes  Yes  Other  Female  No   23
Yes  Yes  Other  Male    Yes  30
Yes  Yes  Other  Male    No   19
Yes  No   White  Female  Yes  13
Yes  No   White  Female  No   218
Yes  No   White  Male    Yes  28
Yes  No   White  Male    No   201
Yes  No   Other  Female  Yes  2
Yes  No   Other  Female  No   19
Yes  No   Other  Male    Yes  1
Yes  No   Other  Male    No   18
No   Yes  White  Female  Yes  1
No   Yes  White  Female  No   17
No   Yes  White  Male    Yes  1
No   Yes  White  Male    No   17
No   Yes  Other  Female  Yes  0
No   Yes  Other  Female  No   1
No   Yes  Other  Male    Yes  1
No   Yes  Other  Male    No   8
No   No   White  Female  Yes  1
No   No   White  Female  No   117
No   No   White  Male    Yes  1
No   No   White  Male    No   133
No   No   Other  Female  Yes  0
No   No   Other  Female  No   12
No   No   Other  Male    Yes  0
No   No   Other  Male    No   17


Responses: alcohol (A), cigarettes (C), marijuana (M)

Explanatory: gender (G) and race (R), so always include GR

Model selection:
  Mutual independence + GR
  Homogeneous association (all two-factor terms)
  All three-factor terms

Backward selection


%let maineffect=A C M R G; %let data=table9_1;

data allfit; run;

/*STEP 1 main effects + GR*/

%modelbuild(G*R ,model1);

proc print data=modelfit; run;

/*STEP 2 main effects + 2fis*/

%modelbuild(A*C A*M A*R A*G C*M C*R C*G M*R M*G G*R ,model2);

proc print data=modelfit; run;

/*STEP 3 main effects + 2fis +3fis (not necessary for this example)*/

%modelbuild(A|C|M A|C|R A|C|G A|M|R A|M|G A|R|G C|M|R C|M|G C|R|G M|R|G ,model3);

proc print data=modelfit; run;

model    G2       chi2     DF  delta    pvalue
model1   1325.14  1454.14  25  0.2888   0
model2   15.34    18.68    16  0.01253  0.49987
model3   5.27     4.8      6   0.00825  0.50943
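The %modelbuild macro is not defined in these slides. As a rough sketch of what each call presumably does, the main-effects-plus-GR model (model1) and the all-two-factor model (model2) could be fitted directly with PROC GENMOD, assuming Poisson cell counts with a log link and the table9_1 data set created earlier. This is an illustration under those assumptions, not the author's macro.

```sas
/* Sketch only: direct PROC GENMOD fits, not the %modelbuild macro itself */

/* model1: main effects + GR */
proc genmod data=table9_1;
  class A C M R G;
  model count = A C M R G G*R / dist=poisson link=log;
run;

/* model2: main effects + all two-factor interactions */
proc genmod data=table9_1;
  class A C M R G;
  model count = A C M R G
                A*C A*M A*R A*G C*M C*R C*G M*R M*G G*R
                / dist=poisson link=log;
run;
```

In the GENMOD output, the Deviance row of the goodness-of-fit table plays the role of G2 and the Pearson Chi-Square row that of chi2.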


/*STEP 4 Backward selection starting from Model 2*/

%modelbuild( A*M A*R A*G C*M C*R C*G M*R M*G G*R ,model4a);

%modelbuild(A*C A*R A*G C*M C*R C*G M*R M*G G*R ,model4b);

%modelbuild(A*C A*M A*G C*M C*R C*G M*R M*G G*R ,model4c);

%modelbuild(A*C A*M A*R C*M C*R C*G M*R M*G G*R ,model4d);

%modelbuild(A*C A*M A*R A*G C*R C*G M*R M*G G*R ,model4e);

%modelbuild(A*C A*M A*R A*G C*M C*G M*R M*G G*R ,model4f);

%modelbuild(A*C A*M A*R A*G C*M C*R M*R M*G G*R ,model4g);

%modelbuild(A*C A*M A*R A*G C*M C*R C*G M*G G*R ,model4h);

%modelbuild(A*C A*M A*R A*G C*M C*R C*G M*R G*R ,model4i);

proc print data=allfit; run;

/*Thus we delete CR*/

model    G2      chi2    DF  delta    pvalue
model2   15.34   18.68   16  0.01253  0.49987
model4a  201.2   190.6   17  0.09132  0
model4b  106.96  108.11  17  0.03763  0
model4c  20.32   30.32   17  0.01404  0.25815
model4d  18.72   23.14   17  0.01742  0.34502
model4e  513.47  474.26  17  0.18236  0
model4f  15.78   20.12   17  0.01205  0.53923
model4g  16.32   19.16   17  0.01534  0.50147
model4h  18.93   22.83   17  0.01642  0.33263
model4i  25.16   27.97   17  0.03209  0.09116
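The slides do not define the columns of these tables. Presumably G2 and chi2 are the likelihood-ratio and Pearson goodness-of-fit statistics. The delta column may be the dissimilarity index, but that reading is an assumption. With observed counts n_i, fitted values \hat\mu_i, and total sample size n, the standard definitions are:

```latex
G^2 = 2 \sum_i n_i \log\frac{n_i}{\hat\mu_i}, \qquad
X^2 = \sum_i \frac{(n_i - \hat\mu_i)^2}{\hat\mu_i}, \qquad
\hat\Delta = \frac{\sum_i \lvert n_i - \hat\mu_i \rvert}{2n}
```

For example, dropping CR (model4f) raises G2 only from 15.34 to 15.78, an increase of 0.44 on 1 df, which is why CR is the first term removed.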

/*STEP 5 Backward selection starting from Model 4f above*/

%modelbuild( A*M A*R A*G C*M C*G M*R M*G G*R ,model5a);

%modelbuild(A*C A*R A*G C*M C*G M*R M*G G*R ,model5b);

%modelbuild(A*C A*M A*G C*M C*G M*R M*G G*R ,model5c);

%modelbuild(A*C A*M A*R C*M C*G M*R M*G G*R ,model5d);

%modelbuild(A*C A*M A*R A*G C*G M*R M*G G*R ,model5e);

%modelbuild(A*C A*M A*R A*G C*M M*R M*G G*R ,model5);

%modelbuild(A*C A*M A*R A*G C*M C*G M*G G*R ,model5g);

%modelbuild(A*C A*M A*R A*G C*M C*G M*R G*R ,model5h);

proc print data=allfit; run;

/*Thus we delete CG*/

model    G2      chi2    DF  delta    pvalue
model4f  15.78   20.12   17  0.01205  0.53923
model5a  201.22  190.59  18  0.09137  0
model5b  107.79  113.4   18  0.03719  0
model5c  20.34   30.13   18  0.01405  0.31408
model5d  19.18   24.86   18  0.01727  0.38079
model5e  513.5   473.58  18  0.18232  0
model5   16.74   20.51   18  0.01508  0.54139
model5g  18.96   22.57   18  0.01681  0.39448
model5h  25.57   29.34   18  0.03213  0.11002


/*STEP 6 Backward selection starting from Model 5 above*/

%modelbuild( A*M A*R A*G C*M M*R M*G G*R ,model6a);

%modelbuild(A*C A*R A*G C*M M*R M*G G*R ,model6b);

%modelbuild(A*C A*M A*G C*M M*R M*G G*R ,model6c);

%modelbuild(A*C A*M A*R C*M M*R M*G G*R ,model6d);

%modelbuild(A*C A*M A*R A*G M*R M*G G*R ,model6e);

%modelbuild(A*C A*M A*R A*G C*M M*G G*R ,model6f);

%modelbuild(A*C A*M A*R A*G C*M M*R G*R,model6g);

proc print data=allfit; run;

/*Thus we delete MR*/

model    G2      chi2    DF  delta    pvalue
model5   16.74   20.51   18  0.01508  0.54139
model6a  204.12  192.29  19  0.0936   0
model6b  108.78  113.12  19  0.03719  0
model6c  21.29   30.15   19  0.01621  0.32098
model6d  22.02   26.91   19  0.02245  0.28319
model6e  513.73  474.24  19  0.18261  0
model6f  19.91   23.02   19  0.01841  0.4001
model6g  25.81   29.83   19  0.03424  0.13553


/*STEP 7 Backward selection starting from Model 6 above*/

%modelbuild( A*M A*R A*G C*M M*G G*R ,model7a);

%modelbuild(A*C A*R A*G C*M M*G G*R ,model7b);

%modelbuild(A*C A*M A*G C*M M*G G*R ,model7c);

%modelbuild(A*C A*M A*R C*M M*G G*R ,model7d);

%modelbuild(A*C A*M A*R A*G M*G G*R ,model7e);

%modelbuild(A*C A*M A*R A*G C*M G*R ,model7f);

proc print data=allfit; run;

/*STOP Model selection, final model Model 6 above*/

model    G2      chi2    DF  delta    pvalue
model6f  19.91   23.02   19  0.01841  0.4001
model7a  207.29  195.42  20  0.09606  0
model7b  112.93  119.15  20  0.04365  0
model7c  28.57   39.01   20  0.02256  0.09663
model7d  25.17   29.34   20  0.02555  0.19507
model7e  516.9   473.33  20  0.18257  0
model7f  28.81   32.13   20  0.03604  0.09167

Model 7d vs. model 6f (dropping AG): G2 difference = 25.17 - 19.91 = 5.26, DF = 1, p = 0.02
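The p-value for this comparison can be checked with a short DATA step. This is a minimal sketch that assumes the G2 difference is referred to a chi-squared distribution with 1 df; the variable names are illustrative.

```sas
/* Sketch: p-value for the G2 difference between model7d and model6f */
data _null_;
  g2diff = 25.17 - 19.91;               /* G2(7d) - G2(6f) from the fit table */
  dfdiff = 20 - 19;                     /* difference in residual df */
  pvalue = 1 - probchi(g2diff, dfdiff); /* upper-tail chi-squared probability */
  put g2diff= dfdiff= pvalue=;          /* writes the three values to the SAS log */
run;
```

A small p-value here means that removing AG significantly worsens the fit, which is why the backward elimination stops at Model 6.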


Final model

Model 6, denoted by (AC, AM, CM, AG, AR, GM, GR), has an association graph with edges A-C, A-M, C-M, A-G, A-R, G-M, and G-R.

Every path between C and {G, R} involves a variable in {A, M}. Given the outcome on alcohol use and marijuana use, the model states that cigarette use is independent of both gender and race.

Collapsing over the explanatory variables race and gender, the conditional associations between C and A and between C and M are the same as with the model (AC, AM, CM) fitted in Section 8.2.4.
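One way to see this collapsibility numerically is to sum the counts over gender and race and refit the three-variable model. The sketch below assumes the table9_1 data set from the earlier slide. The data set name acm is a hypothetical choice.

```sas
/* Sketch: collapse over G and R, then fit (AC, AM, CM) to the 2x2x2 table */
proc summary data=table9_1 nway;
  class A C M;
  var count;
  output out=acm(drop=_type_ _freq_) sum=count;  /* one row per A*C*M cell */
run;

proc genmod data=acm;
  class A C M;
  model count = A C M A*C A*M C*M / dist=poisson link=log;
run;
```

The A*C and C*M estimates from this fit can then be compared with those from the five-variable model.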


Model

Removing the GM term gives the model (AC, AM, CM, AG, AR, GR), with G2 = 28.8 (DF = 20), p-value = 0.09167.

It does not fit poorly. With this model, one might collapse over gender and race in studying associations among the primary variables.

An advantage of the full five-variable model is that it estimates effects of gender and race on these responses, in particular the effects of race and gender on alcohol use and the effect of gender on marijuana use.


9.2.3 Loglinear Model Comparison Statistics


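The likelihood-ratio statistic for comparing two nested loglinear models, with M0 a special case of M1, is

$G^2(M_0 \mid M_1) = G^2(M_0) - G^2(M_1)$

It is asymptotically chi-squared with df equal to the difference between df for M0 and M1.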
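The equations on this slide did not survive extraction. As a hedged reconstruction from the standard textbook treatment, not necessarily what the slide showed, the comparison statistic can also be written in terms of the two sets of fitted values \hat\mu_{0i} and \hat\mu_{1i}, alongside a Pearson-type analogue:

```latex
G^2(M_0 \mid M_1) = G^2(M_0) - G^2(M_1)
                  = 2 \sum_i \hat\mu_{1i} \log\frac{\hat\mu_{1i}}{\hat\mu_{0i}},
\qquad
X^2(M_0 \mid M_1) = \sum_i \frac{(\hat\mu_{1i} - \hat\mu_{0i})^2}{\hat\mu_{0i}}
```

For the student survey, taking M0 = model7d and M1 = model6f gives G2(M0 | M1) = 25.17 - 19.91 = 5.26 with df = 20 - 19 = 1.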


9.3 DIAGNOSTICS FOR CHECKING MODELS

The model comparison test using $G^2(M_0 \mid M_1)$ is useful for detecting whether an extra term improves a model fit.

Cell residuals provide a cell-specific indication of model lack of fit.


9.3.1 Residuals for Loglinear Models

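The Pearson residual is

$e_i = \dfrac{n_i - \hat\mu_i}{\sqrt{\hat\mu_i}}$

Haberman (1973) defined the standardized Pearson residual

$r_i = \dfrac{e_i}{\sqrt{1 - \hat h_i}}$

where $\hat h_i$ is the leverage of cell i.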
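In SAS, these residuals can be requested from PROC GENMOD through its OUTPUT statement. The sketch below assumes the table9_1 data set and the final model (AC, AM, CM, AG, AR, GM, GR). The output data set and variable names (resid, fitted, std_pearson) are illustrative.

```sas
/* Sketch: fitted values and standardized Pearson residuals for the
   model (AC, AM, CM, AG, AR, GM, GR); output names are illustrative */
proc genmod data=table9_1;
  class A C M R G;
  model count = A C M R G A*C A*M C*M A*G A*R G*M G*R
                / dist=poisson link=log;
  output out=resid pred=fitted stdreschi=std_pearson;
run;

proc print data=resid;
  var count A C R G M fitted std_pearson;
run;
```

Cells whose standardized residual is large in absolute value (roughly beyond 2 or 3) are the ones where the model fits least well.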


9.3.2 Student Survey Example Revisited

Model

count  A    C    R      G       M    Std Pearson Residual
405    Yes  Yes  White  Female  Yes   1.9922
268    Yes  Yes  White  Female  No    0.069282
453    Yes  Yes  White  Male    Yes   0.139215
228    Yes  Yes  White  Male    No   -0.73878
23     Yes  Yes  Other  Female  Yes  -1.13851
23     Yes  Yes  Other  Female  No    1.200293
30     Yes  Yes  Other  Male    Yes  -1.35011
19     Yes  Yes  Other  Male    No    0.132164
13     Yes  No   White  Female  Yes  -1.94454
218    Yes  No   White  Female  No   -1.00056
28     Yes  No   White  Male    Yes   1.766758
201    Yes  No   White  Male    No    0.390424
2      Yes  No   Other  Female  Yes   0.574365
19     Yes  No   Other  Female  No    0.943273
1      Yes  No   Other  Male    Yes  -0.58706
18     Yes  No   Other  Male    No    0.677777
1      No   Yes  White  Female  Yes  -0.2664
17     No   Yes  White  Female  No   -0.06123
1      No   Yes  White  Male    Yes  -0.86193
17     No   Yes  White  Male    No   -1.01856
0      No   Yes  Other  Female  Yes  -0.40597
1      No   Yes  Other  Female  No   -0.84717
1      No   Yes  Other  Male    Yes   1.424921
8      No   Yes  Other  Male    No    3.263854
1      No   No   White  Female  Yes   0.796366
117    No   No   White  Female  No    0.943539
1      No   No   White  Male    Yes   0.336513
133    No   No   White  Male    No    0.027217
0      No   No   Other  Female  Yes  -0.24821
12     No   No   Other  Female  No   -0.88548
0      No   No   Other  Male    Yes  -0.32933
17     No   No   Other  Male    No   -0.74538


two-factor associations model

Both models fit well.

count  A    C    R      G       M    Predicted Value  Std Pearson Residual
405    Yes  Yes  White  Female  Yes  400.0949          1.146781
268    Yes  Yes  White  Female  No   269.8484         -0.39405
453    Yes  Yes  White  Male    Yes  456.0996         -0.72055
228    Yes  Yes  White  Male    No   223.9048          0.883284
23     Yes  Yes  Other  Female  Yes  23.37187         -0.12383
23     Yes  Yes  Other  Female  No   22.8918           0.036171
30     Yes  Yes  Other  Male    Yes  30.8181          -0.26772
19     Yes  Yes  Other  Male    No   21.97047         -0.99847
13     Yes  No   White  Female  Yes  18.60475         -1.83973
218    Yes  No   White  Female  No   219.0132         -0.2211
28     Yes  No   White  Male    Yes  23.66513          1.397977
201    Yes  No   White  Male    No   202.7692         -0.38805
2      Yes  No   Other  Female  Yes  0.949145          1.125917
19     Yes  No   Other  Female  No   16.22589          1.01609
1      Yes  No   Other  Male    Yes  1.396475         -0.35578
18     Yes  No   Other  Male    No   17.37625          0.225985
1      No   Yes  White  Female  Yes  1.328092         -0.33611
17     No   Yes  White  Female  No   17.82462         -0.28596
1      No   Yes  White  Male    Yes  1.940758         -0.87511
17     No   Yes  White  Male    No   18.95878         -0.67406
0      No   Yes  Other  Female  Yes  0.128852         -0.36556
1      No   Yes  Other  Female  No   2.511385         -1.0673
1      No   Yes  Other  Male    Yes  0.217796          1.726947
8      No   Yes  Other  Male    No   3.089721          3.193727
1      No   No   White  Female  Yes  0.481553          0.793779
117    No   No   White  Female  No   112.8044          1.131121
1      No   No   White  Male    Yes  0.785192          0.267953
133    No   No   White  Male    No   133.8766         -0.23314
0      No   No   Other  Female  Yes  0.040802         -0.20327
12     No   No   Other  Female  No   13.88025         -0.73144
0      No   No   Other  Male    Yes  0.076954         -0.28059
17     No   No   Other  Male    No   19.05424         -0.76584