Logit model, logistic regression, and log-linear model: a comparison

Post on 30-Dec-2015



Row variable A (index i), column variable B (index j):

$\ln \mu_{ij} = u + u_i^A + u_j^B + u_{ij}^{AB}$

or

$\ln \mu_{ij} = \lambda + \alpha_i + \beta_j + \gamma_{ij}$

or

$\ln \mu_{ij} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$

with A = TIME [early = 0; late = 1] and B = SEX [female = 0; male = 1]

EARLY is the reference category

Leaving home

Models of counts: log-linear model

Model 1: null model

$\ln \mu_{ij} = \lambda$ with $\lambda = 4.887$, so $\mu_{ij} = \exp(4.887) = 132.5$ for all i and j (= 530/4)

Model 2: + TIME

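$\ln \mu_{ij} = \lambda + \alpha_i$ with $\lambda = 4.649$ and $\alpha_2 = 0.4291$ ($\alpha_1 = 0$)

$\mu_{ij} = \exp[4.649 + 0.4291\,t]$ = 104.5 for 'early' (t = 0) and 160.5 for 'late' (t = 1)

or

$\mu_{1j} = \exp[4.649] = 104.5$ for early

$\mu_{2j} = \exp[4.649 + 0.4291] = 160.5$ for late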
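Models 1 and 2 depend only on the margins of the table, so their estimates can be recomputed by hand. The following is a minimal Python sketch. It assumes only the row totals 209 (early) and 321 (late) and uses 'early' as the reference. The closed forms it relies on hold for these two Poisson log-linear models: the null model gives all four cells the same expected count, and the TIME-only model splits each row total equally over the two sexes. Variable names are illustrative.

```python
import math

# Row totals of the leaving-home table (timing of leaving home)
early, late = 209, 321
n = early + late  # 530 young adults

# Model 1 (null model): every cell has the same expected count N/4
lam_null = math.log(n / 4)

# Model 2 (+TIME): each row total is shared equally by the two sexes,
# so lambda = ln(early / 2) and alpha_2 = ln(late / early) ('early' is reference)
lam_time = math.log(early / 2)
alpha2 = math.log(late / early)

fitted_early = math.exp(lam_time)           # t = 0
fitted_late = math.exp(lam_time + alpha2)   # t = 1

print(f"Model 1: lambda = {lam_null:.3f}, mu = {n / 4:.1f}")
print(f"Model 2: lambda = {lam_time:.3f}, alpha2 = {alpha2:.4f}")
print(f"fitted: early = {fitted_early:.1f}, late = {fitted_late:.1f}")
```

The printed values reproduce λ = 4.887 and μ = 132.5 for Model 1, and λ = 4.649, α2 = 0.4291 and the fitted counts 104.5 and 160.5 for Model 2.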


Model 3: TIME and SEX

$\ln \mu_{ij} = \lambda + \alpha_i + \beta_j$

$\lambda = 4.697$; $\alpha_2 = 0.4291$; $\beta_2 = -0.0982$

Reference categories: 'early' [$\alpha_1 = 0$] and 'females' [$\beta_1 = 0$]

Table: Predicted number of young adults leaving home by age and sex (unsaturated log-linear model)

Age      Females   Males   Total
< 20       109.6    99.4     209
≥ 20       168.4   152.6     321
Total      278     252       530


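$\mu_{11} = \exp[4.697] = 109.6$

$\mu_{21} = \exp[4.697 + 0.4291] = 168.4$

$\mu_{12} = \exp[4.697 - 0.0982] = 99.4$

$\mu_{22} = \exp[4.697 + 0.4291 - 0.0982] = 152.6$

Model 3: TIME and SEX (unsaturated log-linear model)

$\ln \mu_{ij} = \lambda + \alpha_i + \beta_j$

$\mu_{ij} = \exp[\lambda + \alpha_i + \beta_j]$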
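Model 3 is the independence model, so its fitted counts have a closed form: row total × column total / N. The parameters follow from ratios of margins. The following is a small Python sketch, assuming the margins of the observed table (209/321 by timing, 278/252 by sex) and the reference categories 'early' and 'females':

```python
import math

rows = {"early": 209, "late": 321}     # timing totals
cols = {"female": 278, "male": 252}    # sex totals
n = sum(rows.values())                 # 530

# Independence model: fitted count = row total * column total / N
fitted = {(r, c): rows[r] * cols[c] / n for r in rows for c in cols}

# Parameters with 'early' and 'female' as reference categories
lam = math.log(fitted[("early", "female")])
alpha2 = math.log(rows["late"] / rows["early"])
beta2 = math.log(cols["male"] / cols["female"])

for (r, c), mu in fitted.items():
    print(r, c, round(mu, 1))
print(round(lam, 3), round(alpha2, 4), round(beta2, 4))
```

The printed counts match the table of predicted numbers (109.6, 99.4, 168.4, 152.6), together with λ = 4.697, α2 = 0.4291 and β2 = -0.0982.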


Model 4: TIME and SEX and TIME*SEX interaction (saturated log-linear model)

$\lambda = 4.905$ (overall effect); $\alpha_2 = 0.0576$ (TIME); $\beta_2 = -0.6012$ (GENDER); $\gamma_{22} = 0.8201$ (TIME*GENDER)

$\ln \mu_{ij} = \lambda + \alpha_i + \beta_j + \gamma_{ij}$

or

$\ln \mu_{ij} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$

with

$x_1 = 0$ for < 20; $x_1 = 1$ for ≥ 20

$x_2 = 0$ for females; $x_2 = 1$ for males

$x_3 = 0$ for < 20 and females; $x_3 = 0$ for < 20 and males; $x_3 = 0$ for ≥ 20 and females; $x_3 = 1$ for ≥ 20 and males

The saturated model predicts the observed counts perfectly.


Model 4: TIME and SEX and TIME*SEX interaction

$\lambda = 4.905$ (overall effect); $\alpha_2 = 0.05757$ TIME(2); $\beta_2 = -0.6012$ SEX(2); $\gamma_{22} = 0.8201$ TIME(2)*SEX(2)

$\ln \mu_{ij} = \lambda + \alpha_i + \beta_j + \gamma_{ij}$

Table: Predicted number of young adults leaving home by age and sex (saturated log-linear model)

Age      Females   Males   Total
< 20         135      74     209
≥ 20         143     178     321
Total        278     252     530


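Model 4: TIME and SEX and TIME*SEX interaction

$\mu_{11} = \exp[4.905] = 135$

$\mu_{21} = \exp[4.905 + 0.0576] = 143$

$\mu_{12} = \exp[4.905 - 0.6012] = 74$

$\mu_{22} = \exp[4.905 + 0.0576 - 0.6012 + 0.8201] = 178$

$\ln \mu_{ij} = \lambda + \alpha_i + \beta_j + \gamma_{ij}$

$\mu_{ij} = \exp[\lambda + \alpha_i + \beta_j + \gamma_{ij}]$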
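In the saturated model each parameter is a simple contrast of log counts, so the estimates can be recovered directly from the four observed cells. The following is a sketch in plain Python. It assumes the observed counts with 'early' and 'females' as reference categories; the helper `fitted` is illustrative.

```python
import math

# Observed counts indexed by (timing i, sex j): i = 1 early, 2 late; j = 1 female, 2 male
counts = {(1, 1): 135, (1, 2): 74, (2, 1): 143, (2, 2): 178}
log = {cell: math.log(n) for cell, n in counts.items()}

lam = log[1, 1]                              # overall effect
alpha2 = log[2, 1] - log[1, 1]               # TIME: late vs early, among females
beta2 = log[1, 2] - log[1, 1]                # SEX: males vs females, among early leavers
gamma22 = log[2, 2] - log[2, 1] - log[1, 2] + log[1, 1]  # TIME*SEX (log odds ratio)


def fitted(i, j):
    """Fitted count exp(lambda + alpha_i + beta_j + gamma_ij); reference effects are 0."""
    eta = lam
    if i == 2:
        eta += alpha2
    if j == 2:
        eta += beta2
    if (i, j) == (2, 2):
        eta += gamma22
    return math.exp(eta)


# The saturated model reproduces every observed count
for (i, j), n in counts.items():
    assert abs(fitted(i, j) - n) < 1e-9

print(round(lam, 3), round(alpha2, 5), round(beta2, 4), round(gamma22, 4))
```

The printed values (4.905, 0.05757, -0.6012, 0.8201) are the Model 4 estimates.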


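Log-linear and logit model

Log-linear model: $\ln \mu_{ij} = \lambda + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}$

Select one variable as the dependent (response) variable, e.g. does voting behaviour differ by sex?

Are females more likely to vote conservative than males?

Logit model: $\ln(\mu_{1j}/\mu_{2j}) = \gamma + \gamma_j^B$, with $\gamma = \lambda_1^A - \lambda_2^A$ and $\gamma_j^B = \lambda_{1j}^{AB} - \lambda_{2j}^{AB}$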

Political attitudes

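Males voting conservative rather than labour:

$\ln(\mu_{11}/\mu_{21}) = (\lambda + \lambda_1^A + \lambda_1^B + \lambda_{11}^{AB}) - (\lambda + \lambda_2^A + \lambda_1^B + \lambda_{21}^{AB}) = \lambda_1^A - \lambda_2^A + \lambda_{11}^{AB} - \lambda_{21}^{AB}$

Females voting conservative rather than labour:

$\ln(\mu_{12}/\mu_{22}) = (\lambda + \lambda_1^A + \lambda_2^B + \lambda_{12}^{AB}) - (\lambda + \lambda_2^A + \lambda_2^B + \lambda_{22}^{AB}) = \lambda_1^A - \lambda_2^A + \lambda_{12}^{AB} - \lambda_{22}^{AB}$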

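Are females more likely to vote conservative than males?

Log odds = logit. With effect coding ($\lambda_2^A = -\lambda_1^A$ and $\lambda_{2j}^{AB} = -\lambda_{1j}^{AB}$):

$\ln(\mu_{11}/\mu_{21}) = \lambda_1^A - \lambda_2^A + \lambda_{11}^{AB} - \lambda_{21}^{AB} = 2\lambda_1^A + 2\lambda_{11}^{AB}$

$\ln(\mu_{12}/\mu_{22}) = \lambda_1^A - \lambda_2^A + \lambda_{12}^{AB} - \lambda_{22}^{AB} = 2\lambda_1^A + 2\lambda_{12}^{AB}$

Effect coding (1)

$\ln\theta_1 = \gamma + \gamma_1^B$

$\ln\theta_2 = \gamma + \gamma_2^B$

with $\theta_j = \mu_{1j}/\mu_{2j}$ (odds of voting conservative rather than labour for sex j), $\gamma = 2\lambda_1^A$ and $\gamma_j^B = 2\lambda_{1j}^{AB}$

A = Party; B = Sex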
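The effect-coding identities can be checked numerically. The following is a minimal sketch. It assumes a saturated model fitted to the party-by-sex counts used in this example (i = 1 Conservative, 2 Labour; j = 1 male, 2 female). Under effect coding the parameters are deviations of the log counts from their means, which gives the closed forms below. The variable names are illustrative.

```python
import math

# Party (i: 1 = Conservative, 2 = Labour) by sex (j: 1 = male, 2 = female)
counts = {(1, 1): 279, (1, 2): 352, (2, 1): 335, (2, 2): 291}
l = {cell: math.log(n) for cell, n in counts.items()}

# Effect-coded saturated 2 x 2 model: lambda_2^A = -lambda_1^A, lambda_2j^AB = -lambda_1j^AB
lam_A1 = (l[1, 1] + l[1, 2] - l[2, 1] - l[2, 2]) / 4
lam_AB11 = (l[1, 1] - l[1, 2] - l[2, 1] + l[2, 2]) / 4
lam_AB12 = -lam_AB11

gamma = 2 * lam_A1
gamma_B = {1: 2 * lam_AB11, 2: 2 * lam_AB12}

for j in (1, 2):
    log_odds = l[1, j] - l[2, j]            # ln(mu_1j / mu_2j) = ln(theta_j)
    assert abs(log_odds - (gamma + gamma_B[j])) < 1e-12
    print(j, round(log_odds, 4))

log_odds_ratio = gamma_B[2] - gamma_B[1]    # ln(theta_2 / theta_1), females vs males
print(round(log_odds_ratio, 5))
```

The asserted identity is $\ln\theta_j = \gamma + \gamma_j^B$. The log odds ratio printed at the end (about 0.3732) is the difference $\gamma_2^B - \gamma_1^B$.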


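Are women more conservative than men? Do women vote more conservative than men? The odds ratio:

$\ln(\theta_2/\theta_1) = \ln\theta_2 - \ln\theta_1 = (\gamma + \gamma_2^B) - (\gamma + \gamma_1^B) = \gamma_2^B - \gamma_1^B$

If the log odds ratio is positive (odds ratio greater than 1), then the odds of voting conservative rather than labour are larger for women than for men. In that case, women vote more conservative than men.

$\ln\theta_1 = (\gamma + \gamma_1^B) + (\gamma_2^B - \gamma_1^B) \times 0$

$\ln\theta_2 = (\gamma + \gamma_1^B) + (\gamma_2^B - \gamma_1^B) \times 1$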

Logit model:

$\eta = \operatorname{logit}(p) = \ln\frac{p}{1-p} = \ln\frac{p_1}{p_2} = a + bx$

with $a = \gamma + \gamma_1^B$: log odds of the reference category (males)

and $b = \gamma_2^B - \gamma_1^B$: log odds ratio (odds females / odds males)

with x = 0, 1


The logit model as a regression model

• Select a response variable (a proportion)

• The dependent variable of the logit model is the log odds of being in one category rather than in another.

• The number of observations in each subpopulation (males, females) is assumed to be fixed.

• Intercept (a) = log odds of the reference category

• Slope (b) = log odds ratio

DATA (party by sex)

Party          Male   Female   Total
Conservative    279      352     631
Labour          335      291     626
Total           614      643    1257

Logit model: descriptive statistics. Counts in terms of odds and odds ratio

Odds of voting conservative rather than labour, by sex:

               Male     Female   Total
Odds           0.8328   1.2096   1.0080

Odds ratio (ref. cat.: males): 1.4524

Reference categories: Labour; Males

Odds of females rather than males, by party:

Party          Odds
Conservative   1.2616
Labour         0.8687
Total          1.0472

Odds ratio (ref. cat.: Labour): 1.4524

F11 = 279

F21 = 335 = 279 * 335/279 = 279 / 0.8328

F12 = 352 = 279 * 352/279 = 279 * 1.2616

F22 = 291 = 279 * 352/279 * 291/352 = 279 * 1.2616 * [1/1.2096]
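The odds, the odds ratio and the reconstruction of the table from F11 can be reproduced in a few lines. The following is a sketch assuming the party-by-sex counts above; the dictionary names are illustrative.

```python
# Party by sex counts (reference categories: Labour, males)
cons = {"male": 279, "female": 352}
lab = {"male": 335, "female": 291}

odds = {sex: cons[sex] / lab[sex] for sex in cons}   # Conservative rather than Labour
odds_ratio = odds["female"] / odds["male"]

# Rebuild the table from F11 = 279 and the odds
f11 = cons["male"]
f21 = f11 / odds["male"]                      # 279 / 0.8328
f12 = f11 * cons["female"] / cons["male"]     # 279 * 1.2616
f22 = f12 / odds["female"]                    # 279 * 1.2616 / 1.2096

print(round(odds["male"], 4), round(odds["female"], 4), round(odds_ratio, 4))
print(round(f21), round(f12), round(f22))
```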


Proportion voting conservative and odds of voting conservative rather than labour:

               Male     Female
Proportion     0.454    0.547
Odds           0.8328   1.2096

Are females more likely to vote conservative than males? Logit model: logit(p) = a + bX (males reference category)

        v = ln(odds)   exp(v)                 p
a       -0.18292       0.8328 (odds)          0.454   males = 0.833/(1 + 0.833)
b        0.37323       1.4524 (odds ratio)
a + b    0.19031       1.2096 (odds)          0.547   females = 1.2096/(1 + 1.2096)

LOGIT MODEL: logit(p) = -0.18292 + 0.37323X (with X = 0 for males and X = 1 for females)

If the numbers of males and females are known, the counts can be calculated.
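The table of a, b and a + b and the back-transformed proportions can be reproduced from the counts. The following is a short sketch assuming the party-by-sex counts above, with X = 0 for males and X = 1 for females. `prob_conservative` is an illustrative helper name.

```python
import math

a = math.log(279 / 335)                    # log odds for males (X = 0)
b = math.log((352 / 291) / (279 / 335))    # log odds ratio, females vs males


def prob_conservative(x):
    """Inverse logit of a + b*x: proportion voting Conservative."""
    return 1 / (1 + math.exp(-(a + b * x)))


print(round(a, 5), round(b, 5), round(a + b, 5))
print(round(prob_conservative(0), 3), round(prob_conservative(1), 3))
```

Multiplying these proportions by the numbers of males (614) and females (643) returns the Conservative counts 279 and 352.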


Logistic regression SPSS

Variable    Param    S.E.    Exp(param)
SEX(1)      .3732    .1133   1.4524
Constant    -.1903   .0792

Reference category: females (X = 1 for males and X = 0 for females); p = probability of voting labour

Females voting labour: 1/[1+exp[-(-0.1903)]] = 45% (291/643)

Males voting labour: 1/[1+exp[-(-0.1903+0.3732)]] = 55% (335/614)

Different parameter coding: X = -0.5 for males and X = 0.5 for females

Variable    Param    S.E.    Exp(param)
SEX(1)      -.3732   .1133   0.6885
Constant    -.0037   .0567

Females voting labour: 1/[1+exp[-(-0.0037 + 0.5*(-0.3732))]] = 45% (291/643)

Males voting labour: 1/[1+exp[-(-0.0037 - 0.5*(-0.3732))]] = 55% (335/614)
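The two SPSS codings can be reconstructed from the two observed logits, because with one binary covariate the fit is saturated. The following is a minimal sketch. It assumes p = probability of voting Labour and uses the usual large-sample standard errors (square roots of sums of reciprocal cell counts). This closed form is a hand calculation, not SPSS's iterative algorithm.

```python
import math

# p = probability of voting Labour rather than Conservative
logit_f = math.log(291 / 352)   # females
logit_m = math.log(335 / 279)   # males

# Coding 1: X = 0 for females (reference), X = 1 for males
const_dummy = logit_f
slope_dummy = logit_m - logit_f

# Coding 2: X = -0.5 for males, X = +0.5 for females
slope_half = logit_f - logit_m
const_half = (logit_f + logit_m) / 2      # average of the two logits

# Large-sample standard errors
se_slope = math.sqrt(1 / 279 + 1 / 352 + 1 / 335 + 1 / 291)
se_const_dummy = math.sqrt(1 / 352 + 1 / 291)
se_const_half = se_slope / 2

print(round(const_dummy, 4), round(slope_dummy, 4), round(math.exp(slope_dummy), 4))
print(round(const_half, 4), round(slope_half, 4), round(math.exp(slope_half), 4))
print(round(se_slope, 4), round(se_const_dummy, 4), round(se_const_half, 4))
```

Recoding X changes the intercept and the sign and scale of the slope, but the fitted proportions stay the same.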


Observation from a binomial distribution with parameter p and index m

The logit model and the logistic regression

Leaving parental home

Logit model and logistic regression

Number of young adults leaving home early: 209
Total number of young adults leaving home: 530
Probability of leaving home early: 209/530 = 0.394

REFERENCE CATEGORY: leaving home late (late = 0; early = 1)

ODDS of leaving home early versus late: 209/(530 - 209) = 0.6511
Logit of leaving home early: ln 0.6511 = -0.4291

Specify a model:

Logit model

$\operatorname{logit}(p) = \ln\frac{p}{1-p} = \ln\frac{0.394}{1-0.394} \approx \ln\frac{209}{321} = -0.4291$


Logistic regression

$p = \frac{1}{1 + \exp[-(-0.4291)]} = 0.394$

Standard error:

$\sqrt{\frac{1}{209} + \frac{1}{321}} = 0.0889$

Confidence interval: -0.4291 ± 1.96 * 0.0889 = (-0.603, -0.255) ON LOGIT SCALE

and

$\left(\frac{1}{1 + \exp[-(-0.6033)]}, \frac{1}{1 + \exp[-(-0.2549)]}\right) = (0.3536, 0.4366)$ ON PROBABILITY SCALE
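The interval is built on the logit scale and then mapped back to probabilities with the inverse logit. The following is a minimal sketch, assuming the 209 early and 321 late leavers and the normal multiplier 1.96; `inv_logit` is an illustrative helper.

```python
import math

early, late = 209, 321
logit = math.log(early / late)            # logit of leaving home early
se = math.sqrt(1 / early + 1 / late)      # large-sample standard error
lo, hi = logit - 1.96 * se, logit + 1.96 * se


def inv_logit(v):
    """Map a logit back to a probability."""
    return 1 / (1 + math.exp(-v))


print(round(logit, 4), round(se, 4))
print((round(lo, 3), round(hi, 3)), (round(inv_logit(lo), 4), round(inv_logit(hi), 4)))
```

Transforming the endpoints keeps the interval inside (0, 1), which a symmetric interval computed directly on the probability scale does not guarantee.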


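Relation logit and log-linear model: the unsaturated model

Log-linear model: $\ln \mu_{ij} = \lambda + \alpha_i + \beta_j$

with $\alpha_i$ the effect of timing and $\beta_j$ the effect of sex

Odds of leaving parental home late rather than early, females:

$\text{ODDS}_F = \frac{\mu_{21}}{\mu_{11}} = \frac{168.4}{109.6} = 1.536$

$\frac{\mu_{21}}{\mu_{11}} = \frac{\exp(\lambda + \alpha_2 + \beta_1)}{\exp(\lambda + \alpha_1 + \beta_1)} = \exp(\alpha_2 - \alpha_1) = \exp(0.4291 - 0) = 1.536$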


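Relation logit and log-linear model: the unsaturated model

Odds of leaving parental home late rather than early, males:

$\text{ODDS}_M = \frac{\mu_{22}}{\mu_{12}} = \frac{152.6}{99.4} = 1.536$

$\frac{\mu_{22}}{\mu_{12}} = \frac{\exp(\lambda + \alpha_2 + \beta_2)}{\exp(\lambda + \alpha_1 + \beta_2)} = \exp(\alpha_2 - \alpha_1) = \exp(0.4291 - 0) = 1.536$

Logit $\ln(p_{late}/p_{early}) = 0.4291$ for females and males.

Output of the logit model gives the same result (s.e. 0.0889).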
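Without an interaction term, the odds of leaving late rather than early are the same for both sexes. A quick Python check on the fitted counts of the independence model, assuming the margins 209/321 (timing) and 278/252 (sex):

```python
import math

rows = (209, 321)   # early, late
cols = (278, 252)   # females, males
n = sum(rows)

# Fitted counts of the independence (TIME + SEX) log-linear model
mu = [[r * c / n for c in cols] for r in rows]

odds_late = {}
for j, sex in enumerate(("females", "males")):
    odds_late[sex] = mu[1][j] / mu[0][j]   # late rather than early

print({sex: round(o, 3) for sex, o in odds_late.items()})
print(round(math.log(odds_late["females"]), 4))
```

Both odds equal 321/209 = 1.536, so the logit is 0.4291 for each sex, matching $\alpha_2$.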


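Relation logit and log-linear model: the saturated model

Log-linear model: $\ln \mu_{ij} = \lambda + \alpha_i + \beta_j + \gamma_{ij}$

with $\alpha_i$ the effect of timing, $\beta_j$ the effect of sex and $\gamma_{ij}$ the effect of the interaction between timing and sex

Odds of leaving parental home late rather than early, females (ref.):

$\text{ODDS}_F = \frac{\mu_{21}}{\mu_{11}} = \frac{143}{135} = 1.059$

$\frac{\mu_{21}}{\mu_{11}} = \frac{\exp(\lambda + \alpha_2 + \beta_1 + \gamma_{21})}{\exp(\lambda + \alpha_1 + \beta_1 + \gamma_{11})} = \exp[(\alpha_2 - \alpha_1) + (\gamma_{21} - \gamma_{11})] = \exp[(0.0576 - 0) + (0 - 0)] = 1.059$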


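Relation logit and log-linear model: the saturated model

Odds of leaving parental home late rather than early, males:

$\text{ODDS}_M = \frac{\mu_{22}}{\mu_{12}} = \frac{178}{74} = 2.405$

$\frac{\mu_{22}}{\mu_{12}} = \frac{\exp(\lambda + \alpha_2 + \beta_2 + \gamma_{22})}{\exp(\lambda + \alpha_1 + \beta_2 + \gamma_{12})} = \exp[(\alpha_2 - \alpha_1) + (\gamma_{22} - \gamma_{12})] = \exp[(0.0576 - 0) + (0.8201 - 0)] = 2.405$

ln(143/135) = 0.0576 is the overall effect (intercept) of the logit model (females ref. cat.)

ln(178/74) = 0.8777 is the log odds for males

0.8777 - 0.0576 = 0.8201 is the log ODDS RATIO (odds males / odds females [ref.])

Logit model: logit(p) = 0.0576 + 0.8201X (with X = 0 for females and 1 for males)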
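The mapping from the saturated log-linear model to the logit model can be checked directly: the logit intercept is $\alpha_2$ and the logit slope is $\gamma_{22}$. The following is a small sketch assuming the observed counts, indexed by (timing, sex) with early and females as reference categories.

```python
import math

mu = {(1, 1): 135, (2, 1): 143, (1, 2): 74, (2, 2): 178}   # (timing, sex)

alpha2 = math.log(mu[2, 1] / mu[1, 1])                          # TIME effect
gamma22 = math.log(mu[2, 2] * mu[1, 1] / (mu[2, 1] * mu[1, 2]))  # TIME*SEX effect

# Logit of leaving late: intercept = alpha2 (females, X = 0), slope = gamma22
logit_f = alpha2
logit_m = alpha2 + gamma22

print(round(math.exp(logit_f), 3), round(math.exp(logit_m), 3), round(gamma22, 4))
```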


Logistic regression: probability of leaving home late (X = 0 for males, X = 1 for females)

Logit model: $\operatorname{logit}(p) = \ln\frac{p}{1-p} = 0.8777 - 0.8201X$

females: $p = \frac{1}{1 + \exp[-(0.8777 - 0.8201)]} = 0.514 = 143/278$

males: $p = \frac{1}{1 + \exp[-0.8777]} = 0.706 = 178/252$
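With males as the reference category, the same two probabilities follow from the inverse logit of 0.8777 - 0.8201X. The following is a sketch assuming the observed counts; `p_late` is an illustrative helper name.

```python
import math

# Probability of leaving home LATE; X = 0 for males (reference), X = 1 for females
a = math.log(178 / 74)                    # log odds for males: 0.8777
b = math.log((143 / 135) / (178 / 74))    # log odds ratio, females vs males: -0.8201


def p_late(x):
    """Inverse logit of a + b*x."""
    return 1 / (1 + math.exp(-(a + b * x)))


print(round(p_late(0), 3), round(p_late(1), 3))
```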


Table: Number of young adults leaving home by age and sex

Age      Females   Males   Total
< 20         135      74     209
≥ 20         143     178     321
Total        278     252     530

Dummy coding: reference category: (i) females; (ii) leaving home late

Logit model: $\operatorname{logit}(p_i) = \ln\frac{p_i}{1-p_i} = b_0 + b_1 x_i = -0.05757 - 0.8201 x_i$

$x_i$ is 0 for females and 1 for males

LOGIT p is -0.05757 for females and -0.05757 - 0.8201 = -0.8777 for males

ODDS: females (reference): exp[-0.05757] = 0.9440 = 135/143; males: exp[-0.8777] = 0.4157 = 74/178

ODDS RATIO: ODDS males / ODDS females = exp[-0.8201] = 0.4404 = 0.4157/0.9440

Are males more likely to leave home early than females?
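With 'late' as the reference outcome, the signs of the previous model flip. The intercept is now the log odds of leaving early for females, and the slope is the log odds ratio for males versus females. The following is a sketch assuming the observed counts above.

```python
import math

# Leaving home EARLY rather than late (late is the reference outcome)
odds_f = 135 / 143                    # females (reference category)
odds_m = 74 / 178                     # males
b0 = math.log(odds_f)                 # intercept: log odds for females
b1 = math.log(odds_m / odds_f)        # slope: log odds ratio, males vs females

print(round(b0, 5), round(b1, 4), round(b0 + b1, 4), round(odds_m / odds_f, 4))
```

An odds ratio below 1 (about 0.44) means males are less likely than females to leave home early.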


Logistic regression

$p_f = \frac{1}{1 + \exp[-(-0.05757)]} = 0.486$

$p_m = \frac{1}{1 + \exp[-(-0.05757 - 0.8201)]} = 0.294$

Dummy coding (ref. cat.: females, late): $\operatorname{logit}(p_i) = \ln\frac{p_i}{1-p_i} = b_0 + b_1 x_i = -0.05757 - 0.8201 x_i$

Effect coding or marginal coding (females +1; males -1): $\operatorname{logit}(p_i) = \ln\frac{p_i}{1-p_i} = b_0 + b_1 x_i = -0.4676 + 0.4101 x_i$, where $x_i$ is 1 for females and -1 for males

Logit p is -0.4676 + 0.4101 = -0.0576 for females and -0.4676 + 0.4101 * (-1) = -0.8777 for males
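Both codings are linear re-expressions of the same two logits. Under effect coding (+1 females, -1 males), the intercept is the mean of the two logits and the slope is half their difference. The following is a minimal sketch assuming the observed counts; `inv_logit` is an illustrative helper.

```python
import math

# Logits of leaving home EARLY rather than late, from the observed counts
logit_f = math.log(135 / 143)   # females
logit_m = math.log(74 / 178)    # males

# Dummy coding (x = 0 females, 1 males)
b0_dummy, b1_dummy = logit_f, logit_m - logit_f

# Effect coding (x = +1 females, -1 males)
b0_effect = (logit_f + logit_m) / 2
b1_effect = (logit_f - logit_m) / 2


def inv_logit(v):
    return 1 / (1 + math.exp(-v))


p_f = inv_logit(b0_effect + b1_effect * 1)
p_m = inv_logit(b0_effect + b1_effect * -1)
print(round(b0_effect, 4), round(b1_effect, 4), round(p_f, 3), round(p_m, 3))
```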


The logistic regression in SPSS

Micro data and tabulated data

SPSS: Micro-data

• Micro-data: age at leaving home in months

• Crosstabs: number leaving home by reason (row) and sex (column)

• Create variable: age in years: Age = TRUNC[(month - 1)/12]

• Create variable TIMING2 based on MONTH:

  • TIMING2 = 1 (early) if month ≤ 240 & reason < 4

  • TIMING2 = 2 (late) if month > 240 & reason < 4

• For analysis, select the cases that are NOT censored: SELECT CASES with reason < 4
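The recoding steps can be mirrored outside SPSS. The following is a minimal Python sketch on a few hypothetical records. The (month, reason) values are made up for illustration and are not from the survey. The rules follow the bullets above: uncensored if reason < 4, age = TRUNC[(month - 1)/12], early if month ≤ 240.

```python
# Hypothetical micro-data records: (month at leaving home, reason code).
# Values are invented for illustration; reason < 4 marks an observed
# (uncensored) departure, following the SPSS selection rule above.
records = [(230, 1), (241, 2), (300, 3), (180, 4), (240, 1), (265, 5)]


def age_in_years(month):
    """Age = TRUNC[(month - 1) / 12] for a positive month count."""
    return (month - 1) // 12


def timing2(month):
    """TIMING2: 1 = early (month <= 240), 2 = late (month > 240)."""
    return 1 if month <= 240 else 2


# SELECT CASES with reason < 4, then derive the new variables
uncensored = [(month, reason) for month, reason in records if reason < 4]
coded = [(month, age_in_years(month), timing2(month)) for month, _ in uncensored]

for row in coded:
    print(row)
```

For positive month counts, floor division gives the same result as truncation, so month 240 falls in age 19 (early) and month 241 in age 20 (late).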

SPSS: tabulated data

• Number of observations: WEIGHT cases (in data)

• No difference between the model for tabulated data and the model for micro-data

The logistic regression in SPSS

SPSS: Regression/Logistic. Note: dependent variable: TIMING2 (p = probability of leaving home LATE)

Covariate: sex (CATEGORICAL)

logit(p) = ln[p/(1-p)] = 0.8777 – 0.8201X with males as reference category: males coded 0, hence X is 1 for females

OUTPUT SPSS:

---------------------- Variables in the Equation -----------

Variable    B        S.E.    Wald      df   Sig     R        Exp(B)
SEX(1)      -.8201   .1831   20.0598   1    .0000   -.1594   .4404
Constant    .8777    .1383   40.2681   1    .0000
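The claim that tabulated (weighted) data and micro-data give the same model can be illustrated outside SPSS. The sketch below is not SPSS's own routine. It is a plain Newton-Raphson fit of logit(p) = a + bX written for a single covariate, applied once to the four weighted cells and once to the 530 expanded records. Coding follows the SPSS run above: y = 1 for leaving home late, X = 1 for females. Function names are illustrative.

```python
import math


def _score_and_info(rows, a, b):
    """Score vector and information matrix of the binomial log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y, w in rows:
        p = 1 / (1 + math.exp(-(a + b * x)))
        resid, var = w * (y - p), w * p * (1 - p)
        g0 += resid
        g1 += resid * x
        h00 += var
        h01 += var * x
        h11 += var * x * x
    return g0, g1, h00, h01, h11


def fit_logit(rows, iterations=25):
    """Newton-Raphson fit of logit(p) = a + b*x; rows are (x, y, weight), y in {0, 1}."""
    a = b = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _score_and_info(rows, a, b)
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _score_and_info(rows, a, b)
    det = h00 * h11 - h01 * h01
    # Standard errors: square roots of the diagonal of the inverse information
    return a, b, math.sqrt(h11 / det), math.sqrt(h00 / det)


# y = 1: leaving home late (TIMING2 = 2); x = 1 for females, 0 for males
table = [(0, 1, 178), (0, 0, 74), (1, 1, 143), (1, 0, 135)]   # tabulated, weighted
micro = [(x, y, 1) for x, y, w in table for _ in range(w)]    # one record per person

tab_fit = fit_logit(table)
micro_fit = fit_logit(micro)
print([round(v, 4) for v in tab_fit])
print([round(v, 4) for v in micro_fit])
```

Both fits return a = 0.8777, b = -0.8201 and standard errors 0.1383 and 0.1831, the values in the SPSS output. Weighting a cell by its count contributes exactly what that many identical records would.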


Related models

• Poisson distribution: counts have a Poisson distribution (total number not fixed)

• Poisson regression

• Log-linear model: model of count data (log of counts)

• Binomial and multinomial distributions: counts follow a multinomial distribution (total number is fixed)

• Logit model: model of proportions [and odds (log of odds)]

• Logistic regression

• Log-rate model: log-linear model with an OFFSET (a term whose coefficient is fixed at 1, e.g. the log of exposure)

The parameters of these models are related.

Construct your own logistic regression model

Specify the logistic regression for this observation:

• School leavers: 50% are males and 50% are females

• 70% of school leavers find a job within a year

• 60% of those who find a job are females

1. Construct table

Table: Duration of job search among school leavers, by sex

Duration of search    Females   Males   Total
Less than 1 year           42      28      70
1 year and more             8      22      30
Total                      50      50     100

84% of females find a job within a year against 56% of males

2. Determine reference categories

• Duration of job search: One year or more

• Sex: Males

3. Odds ratios

• Males (ref. cat.): 28/22 = 1.273

• Females: 42/8 = 5.250

• Odds ratio: 5.250/1.273 = 4.125
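The odds and the odds ratio follow directly from the constructed table. The following is a sketch assuming the counts 42/8 (females) and 28/22 (males); the dictionary names are illustrative.

```python
# Job search within a year, by sex (counts from the constructed table)
job = {"females": 42, "males": 28}       # found a job within a year
no_job = {"females": 8, "males": 22}     # searched one year or more

odds = {sex: job[sex] / no_job[sex] for sex in job}
odds_ratio = odds["females"] / odds["males"]   # males are the reference

print({sex: round(o, 3) for sex, o in odds.items()}, round(odds_ratio, 3))
```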

Logit model

• p = probability of finding a job within a year

• Logit(p) = ln[p/(1-p)] = a + bx, with x = sex (0 for males and 1 for females)

  – a = ln 1.273 = 0.241

  – b = ln 4.125 = 1.417

• Logit model for these data:

logit(p) = 0.241 + 1.417x

Logistic regression

• For males: $p = \frac{1}{1 + \exp[-(0.241 + 1.417 \times 0)]} = 0.56$

• For females: $p = \frac{1}{1 + \exp[-(0.241 + 1.417 \times 1)]} = 0.84$

• 84% of females find a job within a year against 56% of males
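The logit model built from the table reproduces the two observed proportions exactly, because one binary covariate makes it saturated. The following is a sketch assuming the same counts; `p_job` is an illustrative helper name.

```python
import math

a = math.log(28 / 22)                 # log odds for males (x = 0)
b = math.log((42 / 8) / (28 / 22))    # log odds ratio, females vs males


def p_job(x):
    """Probability of finding a job within a year: inverse logit of a + b*x."""
    return 1 / (1 + math.exp(-(a + b * x)))


print(round(a, 3), round(b, 3), round(p_job(0), 2), round(p_job(1), 2))
```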

Confidence interval

• S.e. saturated model:

  – s.e. of a [0.2412] = $\sqrt{\frac{1}{28} + \frac{1}{22}} = 0.2849$

  – s.e. of b [1.417] = $\sqrt{\frac{1}{42} + \frac{1}{8} + \frac{1}{28} + \frac{1}{22}} = 0.4796$
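For a 2 x 2 table these standard errors are square roots of sums of reciprocal cell counts, the usual large-sample formula. The following is a minimal sketch that also forms the Wald ratio b/s.e.(b). Its square (about 8.73) is close to the Wald value of 8.7297 that SPSS reports for this example. The small difference presumably comes from SPSS's iterative estimation.

```python
import math

# Cell counts: job within a year / one year or more, for females and males
f_job, f_no, m_job, m_no = 42, 8, 28, 22

b = math.log((f_job / f_no) / (m_job / m_no))
se_a = math.sqrt(1 / m_job + 1 / m_no)                          # intercept (males)
se_b = math.sqrt(1 / f_job + 1 / f_no + 1 / m_job + 1 / m_no)   # log odds ratio
z = b / se_b                                                     # Wald z statistic

print(round(se_a, 4), round(se_b, 4), round(z, 2), round(z ** 2, 2))
```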

Confidence interval

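• S.e. null model: s.e. of ln[0.7/(1-0.7)] = s.e. of 0.8473 = $\sqrt{\frac{1}{70} + \frac{1}{30}} = 0.2182$

• Conf. interval: 0.8473 +/- 1.96 * 0.2182 = (0.420, 1.275) on logit scale, or (0.603, 0.782) on probability scale

• The p for males and females are significantly different: the observed 0.56 (males) and 0.84 (females) both lie outside this interval, and b/s.e.(b) = 1.417/0.4796 = 2.95 > 1.96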
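The null-model interval uses one proportion (70 of 100) for everybody. The following is a sketch that builds it on the logit scale, transforms it back, and lists which observed sex-specific proportions fall outside it. That check is only an informal illustration; the formal test is the Wald ratio for b.

```python
import math

found, not_found = 70, 30
logit0 = math.log(found / not_found)          # null model: one proportion for all
se0 = math.sqrt(1 / found + 1 / not_found)
lo, hi = logit0 - 1.96 * se0, logit0 + 1.96 * se0


def inv_logit(v):
    return 1 / (1 + math.exp(-v))


p_lo, p_hi = inv_logit(lo), inv_logit(hi)
# Observed proportions for males (0.56) and females (0.84) relative to the interval
outside = [p for p in (0.56, 0.84) if not p_lo <= p <= p_hi]
print(round(logit0, 4), round(se0, 4), (round(lo, 3), round(hi, 3)),
      (round(p_lo, 3), round(p_hi, 3)), outside)
```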

SPSS output: logistic regression

Parameters of logistic regression

Variable    B         S.E.     Wald     df   Sig (p-value)   R
SEX(1)      -1.4168   0.4795   8.7297   1    0.0031          -0.2347
Constant    -0.2412   0.2849   0.7165   1    0.3973

p = probability that duration of search is one year or more

Simple coding (SPSS): reference categories:

• Dependent variable: duration of search less than one year

• Factor: sex: males
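Because p now refers to the longer search (one year or more), the SPSS estimates are the negatives of the hand-computed a and b. The following is a sketch of that check using closed-form logits from the table. The slope agrees with SPSS to about three decimals; the small gap presumably reflects SPSS's iterative convergence criterion.

```python
import math

# p = probability that the job search takes one year or more
logit_m = math.log(22 / 28)    # males (reference category)
logit_f = math.log(8 / 42)     # females

const = logit_m                # Constant
slope = logit_f - logit_m      # SEX(1): females vs males
se_const = math.sqrt(1 / 22 + 1 / 28)
wald_const = (const / se_const) ** 2

print(round(const, 4), round(slope, 4), round(wald_const, 4))
```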

SPSS output: logistic regression

Parameters of logistic regression

p = probability that duration of search is one year or more

Deviation coding (SPSS):

• Dependent variable (reference category): duration of search less than one year

• Factor: females (+1); males (-1)

Variable    B         S.E.     Wald      df   Sig (p-value)   R
SEX(1)      -0.7084   0.2398   8.7297    1    0.0031          -0.2347
Constant    -0.9496   0.2398   15.6849   1    0.0001
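With deviation coding (+1 for females, -1 for males), the constant is the mean of the two logits and the slope is half their difference. In a 2 x 2 table both standard errors are half the standard error of the log odds ratio. The following is a sketch assuming the same counts. Agreement with SPSS is to about three decimals.

```python
import math

logit_m = math.log(22 / 28)    # males
logit_f = math.log(8 / 42)     # females

# Deviation coding: SEX(1) = +1 for females, -1 for males
const = (logit_f + logit_m) / 2
slope = (logit_f - logit_m) / 2
se = 0.5 * math.sqrt(1 / 42 + 1 / 8 + 1 / 28 + 1 / 22)   # same for constant and slope here

print(round(const, 4), round(slope, 4), round(se, 4))
```

The Wald statistic for the slope does not change between simple and deviation coding. Halving both the estimate and its standard error leaves their ratio unchanged.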

SPSS and GLIM: a comparison

TIMING2 * SEX Crosstabulation (count)

TIMING2    Females   Males   Total
Early          135      74     209
Late           143     178     321
Total          278     252     530

SPSS: UNSATURATED LOG-LINEAR MODEL: Parameter Estimates

                                                  Asymptotic 95% CI
Parameter   Term      Estimate   SE      Z-value   Lower   Upper
1                     5.0280     .0721   69.75     4.89    5.17
2           SEX(1)    .0982      .0870   1.13      -.07    .27
3                     .0000      .       .         .       .
4           TIMI(1)   -.4291     .0889   -4.83     -.60    -.25
5                     .0000      .       .         .       .

GLIM: UNSATURATED LOG-LINEAR MODEL

[o]     estimate    s.e.       parameter
[o] 1   4.697       0.08058    1
[o] 2   0.4291      0.08887    TIMI(2)
[o] 3   -0.09819    0.08697    SEX(2)
[o] scale parameter taken as 1.000

SPSS: SATURATED MODEL

                                                   Asymptotic 95% CI
Parameter   Term       Estimate   SE      Z-value   Lower   Upper
1                      5.1846     .0748   69.27     5.04    5.33
2           SEX(1)     -.2183     .1121   -1.95     -.44    1.497E-03
3                      .0000      .       .         .       .
4           TIMI(1)    -.8738     .1379   -6.33     -1.14   -.60
5                      .0000      .       .         .       .
6           TIMI*SEX   .8164      .1827   4.47      .46     1.17
7                      .0000      .       .         .       .
8                      .0000      .       .         .       .
9                      .0000      .       .         .       .

GLIM: SATURATED MODEL

$d e$
[o]     estimate    s.e.       parameter
[o] 1   4.905       0.08607    1
[o] 2   0.05757     0.1200     TIMI(2)
[o] 3   -0.6012     0.1446     SEX(2)
[o] 4   0.8201      0.1831     TIMI(2).SEX(2)
[o] scale parameter taken as 1.000
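The two programs differ mainly in which category they treat as reference. GLIM sets the first category (early, females) to zero; the SPSS tables set the last one (late, males) to zero. The SPSS saturated estimates are reproduced if 0.5 is added to every cell before taking logs. That is consistent with a default cell delta of 0.5 in SPSS's log-linear procedure for saturated models, but treat it as an inference from the numbers rather than documented behaviour of this run. The helper `saturated` below is illustrative.

```python
import math

counts = {("early", "F"): 135, ("early", "M"): 74,
          ("late", "F"): 143, ("late", "M"): 178}


def saturated(ref_t, ref_s, other_t, other_s, delta=0.0):
    """Dummy-coded saturated 2 x 2 log-linear parameters for a given reference cell.

    Returns (constant, timing effect, sex effect, interaction); delta is added
    to every cell count before taking logs.
    """
    l = {cell: math.log(n + delta) for cell, n in counts.items()}
    const = l[ref_t, ref_s]
    timing = l[other_t, ref_s] - const
    sex = l[ref_t, other_s] - const
    inter = l[other_t, other_s] - l[other_t, ref_s] - l[ref_t, other_s] + const
    return const, timing, sex, inter


glim = saturated("early", "F", "late", "M")              # first categories as reference
spss = saturated("late", "M", "early", "F", delta=0.5)   # last categories as reference, +0.5

print([round(v, 4) for v in glim])
print([round(v, 4) for v in spss])
```

The GLIM line gives 4.905, 0.05757, -0.6012 and 0.8201. The SPSS line gives 5.1846 (constant), -0.8738 (timing, early vs late), -0.2183 (sex, females vs males) and 0.8164 (interaction).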
