logistic regression for dichotomous response variables ... logreg_beamer... · pdf...
Post on 13-Jul-2018
226 views
Embed Size (px)
TRANSCRIPT
Logistic Regression for Dichotomous ResponseVariables
Edpsy/Psych/Soc 589
Carolyn J. Anderson
Department of Educational Psychology
I L L I N O I Suniversity of illinois at urbana-champaign
c Board of Trustees, University of Illinois
Spring 2017
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
OutlineIn this set of notes:
Review and Some Uses & Examples. Interpreting logistic regression models. Inference for logistic regression. Model checking. The Tale of the Titanic
Next set of notes will cover: Logit models for qualitative explanatory variables. Multiple logistic regression. Sample size & power.
(Logit models for multi-category and ordinal (polytomous)responses covered later)C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 2.1/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
Additional References & Data
Collett, D. (1991). Analysis of Binary Data.
Hosmer, D.W., & Lemeshow, S. (1989). Applied LogisticRegression.
McCullagh, P, & Nelder, J.A., (1989). Generalized LinearModels, 2nd Edition.
SAS Institute (1995). Logistic Regression Examples Using theSAS System, Version 6.
Example data sets from SAS book are available via
Anonymous ftp to ftp.sas.com.
World wide web http://www.sas.com
C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 3.1/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
Review of Logistic RegressionThe logistic regression model is a generalized linear model with
Random component: The response variable is binary. Yi = 1or 0 (an event occurs or it doesnt).
We are interesting in probability that Yi = 1, (xi ).
The distribution of Yi is Binomial. Systematic component: A linear predictor such as
+ 1x1i + . . .+ jxji
The explanatory or predictor variables may be quantitative(continuous), qualitative (discrete), or both (mixed).
Link Function: The log of the odds that an event occurs,otherwise known as the logit:
logit() = log
(
1
)
Putting this all together, the logistic regression model is
logit((xi )) = log
((xi )
1 (xi )
)= + 1x1i + . . . + jxji
C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 4.1/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
Some Uses of Logistic Regression
To model the probabilities of certain conditions or states as afunction of some explanatory variables.
To identify Risk factors for certain conditions (e.g., divorce, welladjusted, disease, etc.).
Diabetes Example (I got these data from SAS Logistic RegressionExamples who got it from Friendly (1991) who got it from Reaven& Miller, 1979).
C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 5.1/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
(1) Example: Risk Factors
In a study of the relationship between various blood chemistrymeasures and diabetic status, data were collected from 145nonobese adults who were diagnosed as Subclinical diabetic, Overtdiabetic, or Normal
The possible explanatory variables:
Relative weight (persons weight/expected weight given heightor BMI).
Fasting plasma glucose.
Test plasma glucose intolerence.
Plasma insulin during test (measure of insulin response to oralglucose).
Steady state glucose (measure of insulin resistance).
C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 6.1/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
(2) Descriptive Discriminate AnalysisTo describe differences between individuals from separate groups asa function of some explanatory variables descriptive discriminate analysis.
High School and Beyond data: The response variable is whether astudent attended an academic program or a non-academic program(i.e., general or vocational/techincal).
Possible explanatory variables include
Achievement test scores (continuous) reading, writing,math, science, and/or civics.
Desired occupation (discretenominal) 17 of them.
Socio-Economic status (discreteordinal) low, middle, high.
Goal/Purpose: Describe differences between those who attendedacademic versus non-academic programs.C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 7.1/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
(3) Adjust for bias
To Adjust for bias in comparing 2 groups in observationalstudies (Rosenbaum & Rubin, 1983)
Propensity = Prob(one group given explanatory variables)
where exclude variables that want to compare groups on.
C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 8.1/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
(4) Predict ProbabilitiesTo predict probabilities that individuals fall into one of 2 categories on adichotomous response variable as a function of some set of explanatoryvariables.
This covers lots of studies (from epidemiological to educationalmeasurement).
Example: ESR Data from Collett (1991).
A healthy individual should have an erythrocyte sedimentation rate(ESR) less than 20 mm/hour. The value of ESR isnt that important, sothe response variable is just
Yi =
{1 if ESR < 20 or healthy0 if ESR 20 or unhealthy
The possible explanatory variables were
Level of plasma fibrinogen (gm/liter).
Level of gamma-globulin (gm/liter).
C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 9.1/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
(4) Predict Probabilities
An example from Anderson, Kim & Keller (2013): PIRLs datafrom US
Response variable: Response of student to a question about howoften they look up information on the computer for school(Every day or almost every day, Once or twice a week, Onceor twice a month, Never or almost never)
Explanatory variables: gender, how much time they spend per dayreading for homework, screen time per day, availability ofcomputers in their school, location of school, percent of studentsat school that get free or reduced price lunch, school climate).
Complications: multilevel structure, design/sampling weights, andmissing data.
C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 10.2/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
(5) Classify IndividualsTo classify individuals into one of 2 categories on the basis of theexplanatory variables.
Effron (1975), Press & Wilson (1978), and Amemiy & Powell(1980) compared logistic regression to discriminant analysis (whichassumes the explanatory variables are multivariate normal at eachlevel of the response variable).
Eshan Bokhari (2014): Compared logistic regressions & discriminantanalysis for identifying who will commit violent act. (Bokari &Hubert method seems to be best).
C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 11.1/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
(6) Discrete ChoiceTo analyze responses from discrete choice studies (estimate choiceprobabilities).
From SAS Logistic Regression Examples (hypothetical).
Chocolate Candy: 10 subjects presented 8 different chocolateschoose which one of the 8 is the one that they like the best. The 8chocolates consisted of 23 combinations of
Type of chocloate (milk or dark). Center (hard or soft). Whether is had nuts or not.
The response is which chocolate most preferred.
C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 12.2/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
(6) Discrete Choice (continued)
The different names for this particular logit model are
The multinomial logit model.
McFaddens model.
Conditional logit model.
This model is related to Bradley-Terry-Luce choice model.
This model is used
To analyze choice data and use characteristics of the objectsor attributes of the subject as predictors of choice behavior.
In marketing research to predict consumer behavior.
As an alternative to conjoint analysis.
C.J. Anderson (Illinois) Logistic Regression for Dichotomous Spring 2017 13.3/ 129
Review Interpretation Inference Model checking. GOF Grouping Hosmer-Lemeshow LR ROC Residuals Influence Titanic
(7) Social Network Analysis Data often consist of individuals (people, organizations,
countries, etc.) within a group or network upon which relationsare recorded (e.g., is friends with, talks to, does business with,trades, etc).
The relations can be Undirected (e.g., is biologically related to) Directed (e.g., asks advise from, gives money to)
Example: Data from Parker & Asher (1993). Children in 35different classes were asked