logistic regression for dichotomous response variables ... logreg_beamer... · pdf...

Click here to load reader

Post on 13-Jul-2018

226 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • Logistic Regression for Dichotomous ResponseVariables

    Edpsy/Psych/Soc 589

    Carolyn J. Anderson

    Department of Educational Psychology

    I L L I N O I Suniversity of illinois at urbana-champaign

    c Board of Trustees, University of Illinois

    Spring 2017

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    OutlineIn this set of notes:

    Review and Some Uses & Examples. Interpreting logistic regression models. Inference for logistic regression. Model checking. The Tale of the Titanic

    Next set of notes will cover: Logit models for qualitative explanatory variables. Multiple logistic regression. Sample size & power.

    (Logit models for multi-category and ordinal (polytomous)responses covered later)C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 2.1/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    Additional References & Data

    Collett, D. (1991). Analysis of Binary Data.

    Hosmer, D.W., & Lemeshow, S. (1989). Applied LogisticRegression.

    McCullagh, P, & Nelder, J.A., (1989). Generalized LinearModels, 2nd Edition.

    SAS Institute (1995). Logistic Regression Examples Using theSAS System, Version 6.

    Example data sets from SAS book are available via

    Anonymous ftp to ftp.sas.com.

    World wide web http://www.sas.com

    C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 3.1/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    Review of Logistic RegressionThe logistic regression model is a generalized linear model with

    Random component: The response variable is binary. Yi = 1or 0 (an event occurs or it doesnt).

    We are interesting in probability that Yi = 1, (xi ).

    The distribution of Yi is Binomial. Systematic component: A linear predictor such as

    + 1x1i + . . .+ jxji

    The explanatory or predictor variables may be quantitative(continuous), qualitative (discrete), or both (mixed).

    Link Function: The log of the odds that an event occurs,otherwise known as the logit:

    logit() = log

    (

    1

    )

    Putting this all together, the logistic regression model is

    logit((xi )) = log

    ((xi )

    1 (xi )

    )= + 1x1i + . . . + jxji

    C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 4.1/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    Some Uses of Logistic Regression

    To model the probabilities of certain conditions or states as afunction of some explanatory variables.

    To identify Risk factors for certain conditions (e.g., divorce, welladjusted, disease, etc.).

    Diabetes Example (I got these data from SAS Logistic RegressionExamples who got it from Friendly (1991) who got it from Reaven& Miller, 1979).

    C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 5.1/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    (1) Example: Risk Factors

    In a study of the relationship between various blood chemistrymeasures and diabetic status, data were collected from 145nonobese adults who were diagnosed as Subclinical diabetic, Overtdiabetic, or Normal

    The possible explanatory variables:

    Relative weight (persons weight/expected weight given heightor BMI).

    Fasting plasma glucose.

    Test plasma glucose intolerence.

    Plasma insulin during test (measure of insulin response to oralglucose).

    Steady state glucose (measure of insulin resistance).

    C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 6.1/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    (2) Descriptive Discriminate AnalysisTo describe differences between individuals from separate groups asa function of some explanatory variables descriptive discriminate analysis.

    High School and Beyond data: The response variable is whether astudent attended an academic program or a non-academic program(i.e., general or vocational/techincal).

    Possible explanatory variables include

    Achievement test scores (continuous) reading, writing,math, science, and/or civics.

    Desired occupation (discretenominal) 17 of them.

    Socio-Economic status (discreteordinal) low, middle, high.

    Goal/Purpose: Describe differences between those who attendedacademic versus non-academic programs.C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 7.1/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    (3) Adjust for bias

    To Adjust for bias in comparing 2 groups in observationalstudies (Rosenbaum & Rubin, 1983)

    Propensity = Prob(one group given explanatory variables)

    where exclude variables that want to compare groups on.

    C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 8.1/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    (4) Predict ProbabilitiesTo predict probabilities that individuals fall into one of 2 categories on adichotomous response variable as a function of some set of explanatoryvariables.

    This covers lots of studies (from epidemiological to educationalmeasurement).

    Example: ESR Data from Collett (1991).

    A healthy individual should have an erythrocyte sedimentation rate(ESR) less than 20 mm/hour. The value of ESR isnt that important, sothe response variable is just

    Yi =

    {1 if ESR < 20 or healthy0 if ESR 20 or unhealthy

    The possible explanatory variables were

    Level of plasma fibrinogen (gm/liter).

    Level of gamma-globulin (gm/liter).

    C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 9.1/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    (4) Predict Probabilities

    An example from Anderson, Kim & Keller (2013): PIRLs datafrom US

    Response variable: Response of student to a question about howoften they look up information on the computer for school(Every day or almost every day, Once or twice a week, Onceor twice a month, Never or almost never)

    Explanatory variables: gender, how much time they spend per dayreading for homework, screen time per day, availability ofcomputers in their school, location of school, percent of studentsat school that get free or reduced price lunch, school climate).

    Complications: multilevel structure, design/sampling weights, andmissing data.

    C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 10.2/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    (5) Classify IndividualsTo classify individuals into one of 2 categories on the basis of theexplanatory variables.

    Effron (1975), Press & Wilson (1978), and Amemiy & Powell(1980) compared logistic regression to discriminant analysis (whichassumes the explanatory variables are multivariate normal at eachlevel of the response variable).

    Eshan Bokhari (2014): Compared logistic regressions & discriminantanalysis for identifying who will commit violent act. (Bokari &Hubert method seems to be best).

    C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 11.1/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    (6) Discrete ChoiceTo analyze responses from discrete choice studies (estimate choiceprobabilities).

    From SAS Logistic Regression Examples (hypothetical).

    Chocolate Candy: 10 subjects presented 8 different chocolateschoose which one of the 8 is the one that they like the best. The 8chocolates consisted of 23 combinations of

    Type of chocloate (milk or dark). Center (hard or soft). Whether is had nuts or not.

    The response is which chocolate most preferred.

    C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 12.2/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    (6) Discrete Choice (continued)

    The different names for this particular logit model are

    The multinomial logit model.

    McFaddens model.

    Conditional logit model.

    This model is related to Bradley-Terry-Luce choice model.

    This model is used

    To analyze choice data and use characteristics of the objectsor attributes of the subject as predictors of choice behavior.

    In marketing research to predict consumer behavior.

    As an alternative to conjoint analysis.

    C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 13.3/ 129

  • Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic

    (7) Social Network Analysis Data often consist of individuals (people, organizations,

    countries, etc.) within a group or network upon which relationsare recorded (e.g., is friends with, talks to, does business with,trades, etc).

    The relations can be Undirected (e.g., is biologically related to) Directed (e.g., asks advise from, gives money to)

    Example: Data from Parker & Asher (1993). Children in 35different classes were asked

View more