logistic regression and odds ratios 818 - lecture 0… · odds ratio used to compare two...

Logistic Regression and Odds Ratios

Psych 818 - DeShon


Post on 20-Jun-2020




Dichotomous Response

Used when the outcome or DV is a dichotomous random variable

Can only take one of two possible values (1, 0):
- Pass/Fail
- Disease/No Disease
- Agree/Disagree
- True/False
- Present/Absent

This data structure causes problems for OLS regression


Dichotomous Response

Properties of dichotomous response variables (Y):
- POSITIVE RESPONSE (Success = 1): probability p
- NEGATIVE RESPONSE (Failure = 0): probability q = (1 - p)
- Observed proportion of successes: Ȳ = p
- Var(Y) = p*q = p(1 - p)

Oops! The variance depends on the mean.
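A minimal Python sketch (the lecture itself uses R) of the point above: for a 0/1 variable the variance is a fixed function of the mean, so observations with different means cannot share a constant variance. The function name is illustrative.

```python
def bernoulli_variance(p):
    """Var(Y) = p * q for a 0/1 variable with P(Y = 1) = p and q = 1 - p."""
    q = 1.0 - p
    return p * q

# Variance rises toward p = .5 and falls symmetrically on either side,
# so it changes whenever the mean changes.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(bernoulli_variance(p), 2))
```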


Dichotomous Response

Let's generate some (0,1) data:

Y <- rbinom(n = 1000, size = 1, prob = .3)
mean(Y)   # 0.295, close to μ = .3
var(Y)    # 0.208, close to σ² = .3 * .7 = .21
hist(Y)

[Figure: Histogram of Y; x-axis Y from 0.0 to 1.0, y-axis frequency from 0 to 700]
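The same simulation can be sketched with Python's standard library. This is an analogue of the R call, not a reproduction of its output: the seed is arbitrary, so the sample mean and variance will differ slightly from the 0.295 and 0.208 on the slide.

```python
import random
import statistics

# Analogue of rbinom(n = 1000, size = 1, prob = .3):
# each draw is 1 with probability 0.3, otherwise 0.
random.seed(818)  # arbitrary seed, only for reproducibility
y = [1 if random.random() < 0.3 else 0 for _ in range(1000)]

sample_mean = statistics.mean(y)     # estimates mu = .3
sample_var = statistics.variance(y)  # n - 1 denominator, like R's var()
print(sample_mean, round(sample_var, 3), 0.3 * 0.7)
```

For 0/1 data the sample variance is fully determined by the sample mean: it equals n·Ȳ(1 - Ȳ)/(n - 1).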


Describing Dichotomous Data

Proportion of successes (p)

Odds
- The odds of an event are the probability that it occurs divided by the probability that it does not occur: p/(1 - p)
- If p = .53, odds = .53/.47 = 1.13
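A minimal Python version of the odds formula (the function name is illustrative); p = 1 is excluded because the odds would be infinite.

```python
def odds(p):
    """Odds of an event: P(occurs) / P(does not occur) = p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)

print(round(odds(0.53), 2))  # 1.13
print(odds(0.5))             # 1.0, i.e. "even odds"
```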


Modeling Y (Categorical X)

Odds Ratio
Used to compare two proportions across groups
- Odds for males = .53/(1 - .53) = 1.13
- Odds for females = .62/(1 - .62) = 1.63
- Odds ratio = 1.63/1.13 = 1.44: a female's odds of getting a 1 are 1.44 times a male's odds (this is not the same as being 1.44 times as likely; the ratio of probabilities is .62/.53 = 1.17)
- Or… 1.13/1.63 = 0.69: a male's odds of getting a 1 are .69 times a female's odds

OR > 1: increased odds for group 1 relative to group 2
OR = 1: no difference in odds for group 1 relative to group 2
OR < 1: lower odds for group 1 relative to group 2
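A short Python check of the arithmetic above. Working from unrounded odds gives an odds ratio of about 1.45 rather than 1.44; the slide's 1.44 comes from dividing the rounded odds 1.63/1.13.

```python
def odds(p):
    """Odds p / (1 - p) for a probability p."""
    return p / (1.0 - p)

p_males, p_females = 0.53, 0.62

or_females_vs_males = odds(p_females) / odds(p_males)  # > 1: higher odds for females
or_males_vs_females = odds(p_males) / odds(p_females)  # < 1: lower odds for males

# The two directions are reciprocals of each other.
print(round(or_females_vs_males, 3), round(or_males_vs_females, 3))
```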


Modeling Y (Categorical X)

Odds ratio for a 2 x 2 table

                         Heart Disease
Cholesterol in Diet      Y      N      Total
Hi                       11     4      15
Lo                       2      6      8
Total                    13     10     23

Odds(Hi) = 11/4
Odds(Lo) = 2/6
O.R. = (11/4)/(2/6) = 8.25

The odds of heart disease are 8.25 times larger for the high-cholesterol group
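For a 2 x 2 table the odds ratio reduces to the cross-product (a·d)/(b·c). A Python sketch using the cell counts from the cholesterol table (Hi: 11 with heart disease, 4 without; Lo: 2 with, 6 without); the function name is hypothetical.

```python
def odds_ratio_2x2(a, b, c, d):
    """Odds ratio for a 2 x 2 table of counts laid out as
                 Y   N
        group 1  a   b
        group 2  c   d
    (a / b) / (c / d), which simplifies to the cross-product (a * d) / (b * c)."""
    return (a * d) / (b * c)

# Hi cholesterol: 11 with heart disease, 4 without; Lo: 2 with, 6 without
print(odds_ratio_2x2(a=11, b=4, c=2, d=6))  # 8.25
```

Swapping the two rows gives the reciprocal, 1/8.25, which compares low to high cholesterol instead.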


Odds-Ratio

Ranges from 0 to infinity: 0 ... 1 ... ∞
Tends to be skewed: odds ratios below 1 are squeezed into the interval (0, 1), while those above 1 can grow without bound
Often transformed to log-odds to get symmetry
- The log-OR comparing females to males = log(1.44) = 0.36
- The log-OR comparing males to females = log(1/1.44) = log(0.694) = -0.36
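A Python check of the symmetry claim, using the rounded odds ratio of 1.44 and its reciprocal:

```python
import math

or_females_vs_males = 1.44
or_males_vs_females = 1 / or_females_vs_males  # about 0.694

log_or_fm = math.log(or_females_vs_males)
log_or_mf = math.log(or_males_vs_females)

# On the OR scale the two values sit 0.44 above and about 0.31 below 1;
# on the log scale they are equidistant from 0.
print(round(log_or_fm, 2), round(log_or_mf, 2))
```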


Modeling Y (Continuous X)

We need to form a general prediction model

Standard OLS regression won't work
- The errors of a dichotomous variable cannot be normally distributed with constant variance
- Also, the estimated parameters don't make much sense (e.g., predicted values can fall below 0 or above 1)

Let's look at a scatterplot of dichotomous data…


Dichotomous Scatterplot

What smooth function can we use to model something that looks like this?


Dichotomous Scatterplot

OLS regression? Smooth, but…


Dichotomous Scatterplot

Could break X into groups to form a more continuous scale for Y
- proportion or percentage scale


Dichotomous Scatterplot

Now, plot the categorized data

Notice the "S" shape? = sigmoid

Notice that we just shifted to a continuous scale?
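A minimal Python sketch of the grouping idea: split X into intervals and compute the proportion of 1s in each. The data, interval edges, and function name below are made up for illustration.

```python
def group_proportions(xs, ys, edges):
    """Proportion of 1s in ys for x values in each interval [edges[k], edges[k + 1])."""
    proportions = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        group = [y for x, y in zip(xs, ys) if lo <= x < hi]
        # An empty interval has no proportion to report.
        proportions.append(sum(group) / len(group) if group else None)
    return proportions

# Hypothetical 0/1 outcomes that become more common as x increases
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1]
print(group_proportions(xs, ys, edges=[1, 5, 9, 13]))
```

The grouped proportions now live on a continuous 0 to 1 scale, which is what lets the S-shape appear.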


Dichotomous Scatterplot

We can fit a smooth function by modeling the probability of success ("1") directly

Model the probability of a '1' rather than the (0,1) data directly


Another Example


Another Example (cont)


Logistic Equation

E(y|x) = π(x) = probability that a person with a given x-score will have a score of '1' on Y

$$\pi(x) = \frac{e^{u}}{1 + e^{u}}, \qquad u = \alpha + \beta_1 x$$

Could just expand u to include more predictors for a multiple logistic regression
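A minimal Python sketch of the logistic equation, with u expanded to several predictors as suggested above. The function names and the coefficient values in the usage line are hypothetical.

```python
import math

def linear_predictor(alpha, betas, xs):
    """u = alpha + beta_1 * x_1 + ... + beta_k * x_k."""
    return alpha + sum(b * x for b, x in zip(betas, xs))

def logistic(u):
    """pi = e^u / (1 + e^u), evaluated so that large |u| does not overflow."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))  # same value, divided through by e^u
    e_u = math.exp(u)
    return e_u / (1.0 + e_u)

# Hypothetical two-predictor model: alpha = 1.0, beta_1 = 2.0, beta_2 = -0.5
u = linear_predictor(1.0, [2.0, -0.5], [3.0, 4.0])  # 1 + 6 - 2 = 5.0
print(u, logistic(u))
```

Splitting on the sign of u keeps `math.exp` from overflowing for large positive or negative inputs while returning the same probability.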


Logistic Regression

- α shifts the distribution (the value of x where π = .5)
- β₁ reflects the steepness of the transition (slope)
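Setting π(x) = .5 in the logistic equation, with u = α + β₁x, shows where these descriptions come from: the midpoint sits at x = -α/β₁, so for a fixed β₁ changing α slides the curve along the x-axis, and the slope of the curve is largest at that midpoint.

$$\pi(x) = \tfrac{1}{2} \iff e^{u} = 1 \iff \alpha + \beta_1 x = 0 \iff x = -\frac{\alpha}{\beta_1}$$

$$\frac{d\pi(x)}{dx} = \beta_1\,\pi(x)\bigl(1 - \pi(x)\bigr), \qquad \text{which equals } \frac{\beta_1}{4} \text{ at } \pi(x) = \tfrac{1}{2}$$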


Features of Logistic Regression

Change in probability is not constant (linear) with constant changes in X
- the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function

Can rewrite the logistic equation in terms of odds:

$$\frac{\hat{P}(Y=1 \mid X_i)}{1 - \hat{P}(Y=1 \mid X_i)} = \frac{\hat{\pi}_i}{1 - \hat{\pi}_i} = e^{b_0 + b_1 X_{1i}}$$
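The odds form follows in two steps from π = e^u/(1 + e^u), here with u = b₀ + b₁X₁ᵢ:

$$1 - \pi = 1 - \frac{e^{u}}{1 + e^{u}} = \frac{1}{1 + e^{u}}$$

$$\frac{\pi}{1 - \pi} = \frac{e^{u}/(1 + e^{u})}{1/(1 + e^{u})} = e^{u} = e^{b_0 + b_1 X_{1i}}$$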


Logit Transform

Can linearize the logistic equation by using the "logit" transformation
- apply the natural log to both sides of the equation
- yields the logit or log-odds:

$$\ln\left[\frac{\hat{P}(Y=1 \mid X_i)}{1 - \hat{P}(Y=1 \mid X_i)}\right] = \ln\left[\frac{\hat{\pi}_i}{1 - \hat{\pi}_i}\right] = b_0 + b_1 X_{1i}$$


Logit Transformation

The logit transformation puts the interpretation of the regression estimates back on familiar footing
- α = expected value of the logit (log-odds) when X = 0
- β = "logit difference" = the amount the logit (log-odds) changes with a one-unit change in X


Logit

- the natural log of the odds
- often called a log odds
- the logit scale is continuous, linear, and functions much like a z-score scale
- p = 0.50, then logit = 0
- p = 0.70, then logit = 0.85
- p = 0.30, then logit = -0.85
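A minimal Python check of these example values. Computed directly, ln(.7/.3) ≈ 0.847, and logits of complementary probabilities are equal and opposite.

```python
import math

def logit(p):
    """Natural log of the odds: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

for p in (0.5, 0.7, 0.3):
    print(p, round(logit(p), 3))
```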


Odds-Ratios and Logistic Regression

The slope may also be interpreted as the log odds-ratio associated with a unit increase in x
- exp(β) = odds-ratio

Compare the log odds (logit) of a person with a score of x to a person with a score of x + 1:

$$\text{logit}(\pi(x)) = \alpha + \beta x$$

$$\text{logit}(\pi(x+1)) = \alpha + \beta(x+1) = \alpha + \beta x + \beta$$

The two logits differ by β, so the two odds differ by a factor of exp(β).
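A small Python check that exp(β) is the same odds ratio no matter where x starts. The coefficients are the age-model estimates from the heart disease example in these notes; any values would do, and the function name is illustrative.

```python
import math

def model_odds(x, alpha, beta):
    """Model-implied odds pi / (1 - pi) = exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

alpha, beta = -5.30945, 0.11092  # age model from the heart disease example

# Ratio of odds for x + 1 versus x at three different starting points
ratios = [model_odds(x + 1, alpha, beta) / model_odds(x, alpha, beta) for x in (30, 45, 60)]
print([round(r, 4) for r in ratios], round(math.exp(beta), 4))
```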


There and back again…

If the data are consistent with a logistic function, then the relationship between the model and the logit is linear

The logit scale is somewhat difficult to understand

Could interpret as odds, but people seem to prefer probability as the natural scale, so…

$$\log\left(\frac{p}{1-p}\right) = \text{logit}(p) = \alpha + \beta x$$


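There and back again…

Logit:
$$\log\left(\frac{p}{1-p}\right) = \text{logit}(p) = \alpha + \beta x$$

Odds:
$$\frac{p}{1-p} = e^{\alpha + \beta x}$$

Probability:
$$p = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$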
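The three scales can be chained in Python as a sketch; the values of α, β and x are hypothetical, chosen so that the logit is exactly 1, and the function names are illustrative.

```python
import math

def logit_to_odds(log_odds):
    """Odds = e^logit."""
    return math.exp(log_odds)

def odds_to_probability(odds):
    """p = odds / (1 + odds), equivalent to e^u / (1 + e^u)."""
    return odds / (1.0 + odds)

def probability_to_logit(p):
    """logit = ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

alpha, beta, x = -1.0, 0.5, 4.0  # hypothetical values
u = alpha + beta * x             # logit scale: 1.0
o = logit_to_odds(u)             # odds scale: e, about 2.718
p = odds_to_probability(o)       # probability scale: about 0.731
print(u, round(o, 3), round(p, 3))
```

Going back with `probability_to_logit(p)` recovers the original logit, which is the "there and back again" round trip.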


Estimation

The data don't meet OLS assumptions, so some variant of maximum likelihood estimation (MLE) is used

Let's develop the likelihood, assuming observations are independent…

$$p(y_i = 1) = \pi_i$$

$$p(y_i = 0) = 1 - \pi_i$$

$$\text{pdf: } f_i(y_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}; \quad y_i = 0, 1; \quad i = 1, 2, \ldots, n$$

$$\text{joint pdf: } \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}$$


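Estimation

Likelihood

Recall…
$$\prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}$$

Log transform:
$$\log l = \sum_{i=1}^{n} y_i \log\left(\frac{\pi_i}{1 - \pi_i}\right) + \sum_{i=1}^{n} \log(1 - \pi_i)$$

where
$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \alpha + \beta x_i, \qquad 1 - \pi_i = \frac{1}{1 + \exp(\alpha + \beta x_i)}$$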


Estimation

Upon substitution…
$$\log l = l(\alpha, \beta) = \sum_{i=1}^{n} y_i(\alpha + \beta x_i) - \sum_{i=1}^{n} \log\left[1 + \exp(\alpha + \beta x_i)\right]$$
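A minimal pure-Python sketch of maximizing l(α, β) with Newton-Raphson. For the logit link, Newton-Raphson coincides with Fisher scoring, the method named in R's glm() output, but this is an illustration, not R's implementation. The data are made up and the function names are hypothetical.

```python
import math

def log_likelihood(alpha, beta, xs, ys):
    """l(alpha, beta) = sum y_i (alpha + beta x_i) - sum log[1 + exp(alpha + beta x_i)]."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = alpha + beta * x
        total += y * eta - math.log1p(math.exp(eta))
    return total

def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson for simple logistic regression, starting from (0, 0)."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score (gradient of l)
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian of l)
        for x, y in zip(xs, ys):
            pi = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = pi * (1.0 - pi)
            g0 += y - pi
            g1 += (y - pi) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2 x 2 system [h00 h01; h01 h11] [d0, d1] = [g0, g1]
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        alpha, beta = alpha + d0, beta + d1
    return alpha, beta

# Hypothetical data with overlapping 0s and 1s (no perfect separation)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
alpha_hat, beta_hat = fit_logistic(xs, ys)
print(round(alpha_hat, 3), round(beta_hat, 3), round(log_likelihood(alpha_hat, beta_hat, xs, ys), 3))
```

At the maximum the score equations Σ(yᵢ - πᵢ) = 0 and Σ(yᵢ - πᵢ)xᵢ = 0 hold. If x perfectly separated the 0s from the 1s, the estimates would diverge and no finite MLE would exist.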


Example

Heart Disease & Age
- 100 participants
- DV = presence of heart disease
- IV = Age


Heart Disease Example

[Figure: heart disease example plot; vertical axis 0.0 to 1.0]


Heart Disease Example

library(MASS)  # optional: glm() itself comes from base R's stats package
fit <- glm(formula = y ~ age, family = binomial, data = mydata)
summary(fit)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.30945    1.13365  -4.683 2.82e-06 ***
age          0.11092    0.02406   4.610 4.02e-06 ***

    Null deviance: 136.66  on 99  degrees of freedom
Residual deviance: 107.35  on 98  degrees of freedom
AIC: 111.35

Number of Fisher Scoring iterations: 4
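Several numbers in this output can be reproduced by hand. A Python sketch, with the values copied from the output above: each z value is Estimate / Std. Error, the drop in deviance is the likelihood-ratio chi-square for age, and AIC is the residual deviance plus twice the number of estimated parameters (for 0/1 data the deviance equals -2 log L).

```python
# Values copied from the glm() summary output
est = {"(Intercept)": -5.30945, "age": 0.11092}
se = {"(Intercept)": 1.13365, "age": 0.02406}
null_dev, null_df = 136.66, 99
resid_dev, resid_df = 107.35, 98
n_params = 2  # intercept + age

z = {k: est[k] / se[k] for k in est}  # Wald z = Estimate / Std. Error
lr_chisq = null_dev - resid_dev       # likelihood-ratio chi-square for adding age
lr_df = null_df - resid_df            # 1 degree of freedom
aic = resid_dev + 2 * n_params        # -2 log L + 2k

print({k: round(v, 3) for k, v in z.items()}, round(lr_chisq, 2), lr_df, round(aic, 2))
```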


Heart Disease Example

Logistic regression:

$$\hat{\pi}(x) = \frac{e^{-5.31 + .111x}}{1 + e^{-5.31 + .111x}}$$

Odds-ratio: exp(.111) = 1.117

[Figure: fitted logistic curve for the heart disease data; vertical axis 0.1 to 0.7]
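A Python sketch of using the fitted equation. The coefficients are the unrounded estimates from the glm() output, the ages are arbitrary, and the function name is illustrative. It also uses the fact that the odds ratio for a c-unit difference in x is exp(c·β), so a 10-year age gap multiplies the odds by about 3.

```python
import math

# Unrounded estimates copied from the glm() output
ALPHA, BETA = -5.30945, 0.11092

def prob_heart_disease(age):
    """Fitted probability pi(age) = e^u / (1 + e^u) with u = ALPHA + BETA * age."""
    u = ALPHA + BETA * age
    return math.exp(u) / (1.0 + math.exp(u))

midpoint_age = -ALPHA / BETA         # age where the fitted probability is .5 (about 47.9)
or_per_year = math.exp(BETA)         # about 1.117
or_per_decade = math.exp(10 * BETA)  # about 3.03 for a 10-year age difference

for age in (30, 50, 70):
    print(age, round(prob_heart_disease(age), 3))
```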


Heart Disease Example

In terms of logits…

[Figure: fitted model on the logit scale; vertical axis ticks -3, -2, -1, 0]