  • Logistic Regression for Nominal Response Variables

    Edpsy/Psych/Soc 589

    Carolyn J. Anderson

    Department of Educational Psychology

    University of Illinois at Urbana-Champaign

    © Board of Trustees, University of Illinois

    Spring 2017

  • Introduction Multinomial/Baseline SAS Inference Grouped Data Latent Variable Conditional Model Mixed model

    Outline

    Introduction and Extending binary model

    Nominal Responses (baseline model)

    SAS

    Inference

    Grouped Data

    Latent variable interpretation

    Discrete choice model (conditional model)

    C.J. Anderson (Illinois) Logistic Regression for Nominal Responses Spring 2017 2.1/ 98

  • Additional References

    General References:

    Agresti, A. (2013). Categorical Data Analysis, 3rd edition. NY: Wiley.

    Long, J.S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.

    Powers, D.A. & Xie, Y. (2000). Statistical Methods for Categorical Data Analysis. San Diego, CA: Academic Press.

    Fitting (Conditional) Multinomial Models using SAS:

    SAS Institute (1995). Logistic Regression Examples Using the SAS System, (version 6). Cary, NC: SAS Institute.

    Kuhfeld, W.F. (2001). Marketing Research Methods in the SAS System, Version 8.2 Edition, TS-650. Cary, NC: SAS Institute. (reports TS-650A TS-560I).

  • Additional References (continued)

    Some on my web-site,

    http://faculty.education.illinois.edu/cja/Handbookof Quantitative Psychology

    http://faculty.education.illinois.edu/cja/BestPractices/index.html

    Course web-site is most up-to-date.

  • Situation

    Situation: One response variable Y with J levels. One or more explanatory or predictor variables. The predictor variables may be quantitative, qualitative or both.

    Model: Multinomial Logistic regression.

    What if you have multiple predictor or explanatory variables?

    Describe individuals? Descriptors of categories? or Both?

  • Differences with Respect to Binary Logistic Regression

    There are 3 basic differences.

    Forming logits.

    The Distribution.

    Connections with other models (not mentioned before).

  • Forming Logits

    When J = 2, Y is dichotomous and we can model logs of odds that an event occurs or does not occur. There is only 1 logit that we can form:

    $$\text{logit}(\pi) = \log\left(\frac{\pi}{1 - \pi}\right)$$

    When J > 2, ...

    We have a multicategory or polytomous or polychotomous response variable.

    There are $J(J - 1)/2$ logits (odds) that we can form, but only $(J - 1)$ are non-redundant.

    There are different ways to form a set of $(J - 1)$ non-redundant logits.
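
    The counting argument can be checked numerically. The sketch below is illustrative only: the probabilities are hypothetical, and the choice of the last category as the reference is just one of the possible non-redundant sets. It forms all pairwise logits and then recovers each of them from the J − 1 logits taken against the reference category.

```python
import math
from itertools import combinations

# Hypothetical category probabilities for a J = 4 response (illustrative values only).
pi = [0.1, 0.2, 0.3, 0.4]
J = len(pi)

# Every pair of categories gives one logit: J(J - 1)/2 in total.
pair_logits = {(a, b): math.log(pi[a] / pi[b]) for a, b in combinations(range(J), 2)}

# One non-redundant set: each category versus the last one (used here as the reference).
baseline_logits = [math.log(pi[j] / pi[J - 1]) for j in range(J - 1)]


def logit_from_baseline(a, b):
    """Recover log(pi_a / pi_b) as a difference of two reference-category logits."""
    la = baseline_logits[a] if a < J - 1 else 0.0
    lb = baseline_logits[b] if b < J - 1 else 0.0
    return la - lb


n_pairs = len(pair_logits)
max_error = max(
    abs(value - logit_from_baseline(a, b)) for (a, b), value in pair_logits.items()
)
print(n_pairs, len(baseline_logits), max_error)
```

    Any other category could serve as the reference; the recovered pairwise logits would be the same.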

  • How to Dichotomize the Response Y?

    The most common ones:

    Nominal Y:

      Baseline logit models or Multinomial logistic regression.

      Conditional or Multinomial logit models.

    Ordinal Y:

      Cumulative logits (Proportional Odds).

      Adjacent categories.

      Continuation ratios.

  • The Multinomial Distribution

    $Y_j \sim \text{Multinomial}(\pi_1, \pi_2, \ldots, \pi_J)$ where

    $\sum_j \pi_j = 1$.

    $Y_j$ = number of cases in the $j$th category ($Y_j = 0, 1, \ldots, n$).

    $n = \sum_j Y_j$, the number of trials.

    Mean: $E(Y_j) = n\pi_j$

    Variance: $\text{var}(Y_j) = n\pi_j(1 - \pi_j)$

    Covariance: $\text{cov}(Y_j, Y_k) = -n\pi_j\pi_k$, for $j \neq k$.

    Probability mass function,

    $$P(y_1, y_2, \ldots, y_J) = \left(\frac{n!}{y_1!\, y_2! \cdots y_J!}\right) \pi_1^{y_1} \pi_2^{y_2} \cdots \pi_J^{y_J}$$

    Binomial distribution is a special case.
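
    As a sanity check on these formulas, the following sketch enumerates a small multinomial distribution exactly and compares its moments with the expressions for the mean, variance and covariance. The probabilities and the number of trials are hypothetical values chosen only for illustration; the last lines check the binomial special case (J = 2).

```python
from itertools import product
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """P(y_1, ..., y_J) = n! / (y_1! ... y_J!) * prod_j pi_j ** y_j."""
    n = sum(counts)
    coef = factorial(n)
    for y in counts:
        coef //= factorial(y)  # stays an integer at every step
    return coef * prod(p ** y for p, y in zip(probs, counts))


# Hypothetical probabilities and number of trials (illustrative values only).
probs = [0.2, 0.3, 0.5]
n = 4

# Enumerate every outcome (y_1, y_2, y_3) with y_1 + y_2 + y_3 = n.
outcomes = [y for y in product(range(n + 1), repeat=len(probs)) if sum(y) == n]
total = sum(multinomial_pmf(y, probs) for y in outcomes)

# Moments computed from the exact distribution.
mean_1 = sum(y[0] * multinomial_pmf(y, probs) for y in outcomes)
mean_2 = sum(y[1] * multinomial_pmf(y, probs) for y in outcomes)
var_1 = sum((y[0] - mean_1) ** 2 * multinomial_pmf(y, probs) for y in outcomes)
cov_12 = sum(
    (y[0] - mean_1) * (y[1] - mean_2) * multinomial_pmf(y, probs) for y in outcomes
)

# With J = 2 the multinomial pmf is the binomial pmf.
binomial_value = comb(n, 1) * 0.3 ** 1 * 0.7 ** 3
multinomial_value = multinomial_pmf((1, 3), (0.3, 0.7))
```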

  • Example of Multinomial

    High School & Beyond program types: General, Academic, Vo/Tech.

    US 2006 Progress in International Reading Literacy Study (PIRLS) responses to the item "How often do you use the Internet as a source of information for school-related work?" with responses:

    Every day or almost every day (y1 = 746, p1 = .1494)

    Once or twice a week (y2 = 1,240, p2 = .2483)

    Once or twice a month (y3 = 1,377, p3 = .2757)

    Never or almost never (y4 = 1,631, p4 = .3266)
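
    The sample proportions follow from the counts as p_j = y_j / n. A short Python sketch that recomputes them from the four PIRLS counts given above (the dictionary layout is only an illustrative choice):

```python
# Counts for the four PIRLS response categories, as listed above.
counts = {
    "Every day or almost every day": 746,
    "Once or twice a week": 1240,
    "Once or twice a month": 1377,
    "Never or almost never": 1631,
}

n = sum(counts.values())  # total number of responses
proportions = {label: y / n for label, y in counts.items()}

for label, p in proportions.items():
    print(f"{label}: {p:.4f}")
```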

  • Graph of PIRLS Distribution

  • Connections with Other Models

    Some are equivalent to Poisson regression or loglinear models.

    Some can be derived from (equivalent to) discrete choice models (e.g., Luce, McFadden).

    Some can be derived from latent variable models.

    Those that are equivalent to conditional multinomial models are equivalent to proportional hazards models (models for survival data), which in turn are equivalent to Poisson regression models.

    Some multicategory logit models are very similar to IRT models in terms of their parametric form. The difference between them is that in the IRT models, the predictor is unobserved (latent), and in the models we discuss here, the predictor variable is observed.

    Others.
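
    One standard route to the Poisson connection is the fact that independent Poisson counts, conditioned on their total, follow a multinomial distribution with probabilities proportional to the Poisson means. The sketch below checks this numerically for a single outcome; the means and counts are hypothetical, and this is offered as an illustration of the general result rather than as the derivation used in the course.

```python
from math import exp, factorial, prod


def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu)."""
    return exp(-mu) * mu ** y / factorial(y)


def multinomial_pmf(counts, probs):
    """n! / (y_1! ... y_J!) * prod_j pi_j ** y_j."""
    coef = factorial(sum(counts))
    for y in counts:
        coef //= factorial(y)
    return coef * prod(p ** y for p, y in zip(probs, counts))


mu = [1.5, 2.0, 0.5]  # hypothetical Poisson means, one per category
counts = (2, 3, 1)    # one possible outcome
n = sum(counts)

# P(Y = counts | total = n): the joint Poisson probability divided by
# P(total = n), where the total is Poisson with mean sum(mu).
joint = prod(poisson_pmf(y, m) for y, m in zip(counts, mu))
conditional = joint / poisson_pmf(n, sum(mu))

# Multinomial probability with pi_j = mu_j / sum(mu).
multinomial = multinomial_pmf(counts, [m / sum(mu) for m in mu])
print(conditional, multinomial)
```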

  • Multicategory Logit Models for Nominal Responses

    Baseline or Multinomial logistic regression model. Use characteristics of individuals as predictor variables.

    The parameters differ for each category of the response variable.

    Conditional Logit model. Use characteristics of the categories of the response variable as the predictors.

    The model parameters are the same for each category of the response variable.

    Conditional or Mixed logit model. Uses characteristics or attributes of the individuals and the categories as predictor variables.
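
    The contrast between the first two models can be sketched in a few lines of Python. The functional forms used here are the usual textbook ones and are an assumption on my part, since this part of the slides does not write them out; the function names and all numeric values are hypothetical.

```python
from math import exp


def baseline_probs(x, alphas, betas):
    """Baseline-category logit: one individual-level predictor x and a
    separate (alpha_j, beta_j) per category. The last category is taken as
    the baseline, with its parameters fixed at zero."""
    etas = [a + b * x for a, b in zip(alphas, betas)] + [0.0]
    denom = sum(exp(e) for e in etas)
    return [exp(e) / denom for e in etas]


def conditional_probs(z, beta):
    """Conditional logit: one attribute z_j per category and a single beta
    shared by all categories."""
    denom = sum(exp(beta * zj) for zj in z)
    return [exp(beta * zj) / denom for zj in z]


# Hypothetical values for J = 3 categories (illustrative only).
p_base = baseline_probs(x=1.2, alphas=[0.5, -0.3], betas=[0.8, 0.1])
p_cond = conditional_probs(z=[2.0, 3.5, 1.0], beta=-0.4)
print(p_base)
print(p_cond)
```

    In the first function the parameters vary over categories while the predictor belongs to the individual; in the second the predictor varies over categories while the parameter is shared.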

  • Confusion

    There is no standard terminology for these models.

    Agresti (90), Conditional Logit model: "Originally referred to by McFadden as a conditional logit model, it is now usually called the multinomial logit model."

    Long (97): Refers to the Baseline or Multinomial logistic regression model as a multinomial logit model and calls the Conditional Logit model the conditional logit model.

    Powers & Xie (00) on the Conditional and Multinomial models: "However, it is often called a multinomial logit model, leading to a great deal of confusion."

    Agresti (2013) calls all of them multinomial models and refers to the Baseline or Multinomial logistic regression model as the Baseline-category model.

  • Further Contribution to Confusion

    The models are related (connections):

    Baseline model is a special case of conditional model.

    Conditional Model can be fit as a proportional hazards model (have to do this in R).

    All are special cases of Poisson log-linear models.
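
    The first connection can be illustrated numerically. In the sketch below, a baseline model with category-specific parameters is rewritten as a conditional model with one shared parameter vector, by building category-level predictors from category dummies and their products with the individual's x. The parameter values and the J = 3 setup are hypothetical, and this dummy coding is one common construction, not necessarily the one used later in the course.

```python
from math import exp


def softmax(etas):
    """Turn linear predictors into category probabilities."""
    denom = sum(exp(e) for e in etas)
    return [exp(e) / denom for e in etas]


# Hypothetical baseline-model parameters for J = 3 (category 3 is the baseline).
alphas = [0.5, -0.3]
betas = [0.8, 0.1]
x = 1.2  # one individual's predictor value

# Baseline model: a separate (alpha_j, beta_j) for each non-baseline category.
p_baseline = softmax([alphas[0] + betas[0] * x, alphas[1] + betas[1] * x, 0.0])

# Conditional model: one shared parameter vector applied to category-level
# predictors (dummy_1, dummy_2, dummy_1 * x, dummy_2 * x).
theta = alphas + betas
z = [
    [1.0, 0.0, x, 0.0],    # category 1
    [0.0, 1.0, 0.0, x],    # category 2
    [0.0, 0.0, 0.0, 0.0],  # category 3 (baseline)
]
p_conditional = softmax([sum(t * zk for t, zk in zip(theta, zj)) for zj in z])

max_gap = max(abs(a - b) for a, b in zip(p_baseline, p_conditional))
print(max_gap)
```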

  • Baseline Category Logit Model

    The models give a simultaneous representation (summary, description) of the odds of being in one category relative to being in another category for all pairs of categories.

    We need a set of $(J - 1)$ non-redundant odds (logits). All others can be found from this set.

    The binary logistic regression model is a special case of this model (J = 2).
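
    The relationship to binary logistic regression can be checked directly: with J = 2 there is a single baseline logit, and the category-1 probability has the binary logistic form. A minimal sketch, assuming the usual baseline-category parameterization with the last category as reference; the function names and parameter values are hypothetical.

```python
from math import exp


def baseline_probs(x, alphas, betas):
    """Baseline-category probabilities; the last category has parameters fixed at zero."""
    etas = [a + b * x for a, b in zip(alphas, betas)] + [0.0]
    denom = sum(exp(e) for e in etas)
    return [exp(e) / denom for e in etas]


def binary_logistic(x, alpha, beta):
    """P(Y = 1 | x) under binary logistic regression."""
    return 1.0 / (1.0 + exp(-(alpha + beta * x)))


alpha, beta = -0.7, 1.3  # hypothetical parameter values
xs = [-2.0, 0.0, 0.5, 3.0]

# With J = 2 the lists of alphas and betas each hold one entry.
max_gap = max(
    abs(baseline_probs(x, [alpha], [beta])[0] - binary_logistic(x, alpha, beta))
    for x in xs
)
print(max_gap)
```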

    Consider the HSB data: Program types are General, Academic and Vocational/Technical Exp
