multinomial logistic regression


Upload: yair

Post on 05-Jan-2016

Page 1: Multinomial Logistic Regression

Multinomial Logistic Regression

“Inanimate objects can be classified scientifically into three major categories; those that don't work, those that break down and those that get lost” (Russell Baker)

Page 2: Multinomial Logistic Regression

Multinomial Logistic Regression

Also known as “polytomous” or “nominal logistic” or “logit regression” or the “discrete choice model”
Generalization of binary logistic regression to a polytomous DV
When applied to a dichotomous DV, identical to binary logistic regression

Page 3: Multinomial Logistic Regression

Polytomous Variables

Three or more unordered categories
Categories mutually exclusive and exhaustive
Sometimes called “multicategorical” or sometimes “multinomial” variables

Page 4: Multinomial Logistic Regression

Polytomous DVs

Reason for leaving welfare: marriage, stable employment, move to another state, incarceration, or death
Status of foster home application: licensed to foster, discontinued application process prior to licensure, or rejected for licensure
Changes in living arrangements of the elderly: newly co-residing with their children, no longer co-residing, or residing in institutions

Page 5: Multinomial Logistic Regression

Single (Dichotomous) IV Example
DV = interview tracking effort
  easy-to-interview and track mothers (Easy);
  difficult-to-track mothers who required more telephone calls (MoreCalls);
  difficult-to-track mothers who required more unscheduled home visits (MoreVisits)
IV = race, 0 = European-American, 1 = African-American
N = 246 mothers
What is the relationship between race and interview tracking effort?

Page 6: Multinomial Logistic Regression

Crosstabulation

Table 3.1

Relationship between race and tracking effort is statistically significant [χ²(2, N = 246) = 8.69, p = .013]
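Table 3.1 itself is not reproduced in this transcript, but its cell counts can be read off the probability calculations on the later slides (96, 30, 17 for European-American mothers; 53, 24, 26 for African-American mothers). As a minimal sketch, assuming those counts are the full crosstabulation, the Pearson chi-square can be recomputed by hand. The variable names are illustrative, and the closed-form p-value used here holds only for 2 degrees of freedom.

```python
import math

# Counts recovered from the probability slides
# (columns: Easy, MoreCalls, MoreVisits). Assumed to be the full Table 3.1.
observed = {
    "European-American": [96, 30, 17],
    "African-American": [53, 24, 26],
}

row_totals = {race: sum(counts) for race, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = 0.0
for race, counts in observed.items():
    for j, obs in enumerate(counts):
        expected = row_totals[race] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)

# For df = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"chi2({df}, N = {n}) = {chi_square:.2f}, p = {p_value:.3f}")
```

Under these assumptions the result agrees with the reported statistic to two decimals.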

Page 7: Multinomial Logistic Regression

Reference Category

In binary logistic regression, the category of the DV coded 0 implicitly serves as the reference category
Known as “baseline,” “base,” or “comparison” category
In multinomial logistic regression it is necessary to explicitly select the reference category
  “Easy” selected

Page 8: Multinomial Logistic Regression

Probabilities

Table 3.1
More Calls (vs. Easy)
  European-American: .24 = 30 / (30 + 96)
  African-American: .31 = 24 / (24 + 53)
More Visits (vs. Easy)
  European-American: .15 = 17 / (17 + 96)
  African-American: .33 = 26 / (26 + 53)

Page 9: Multinomial Logistic Regression

Odds & Odds Ratio

More Calls (vs. Easy)
  European-American: .3125 (.2098 / .6713)
  African-American: .4528 (.2330 / .5146)
  Odds Ratio = 1.45 (.4528 / .3125)
  • 45% increase in the odds
More Visits (vs. Easy)
  European-American: .1771 (.1189 / .6713)
  African-American: .4905 (.2524 / .5146)
  Odds Ratio = 2.77 (.4905 / .1771)
  • 177% increase in the odds
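The odds and odds ratios on this slide can be reproduced directly from the cell counts. The sketch below assumes the counts implied by the probability slide (they are not printed as a table in this transcript); the function and variable names are illustrative.

```python
# Cell counts implied by the probability slide (assumed to be Table 3.1).
counts = {
    "European-American": {"Easy": 96, "MoreCalls": 30, "MoreVisits": 17},
    "African-American": {"Easy": 53, "MoreCalls": 24, "MoreVisits": 26},
}

def odds_vs_reference(row, category, reference="Easy"):
    """Odds of `category` relative to the reference category.

    The row total cancels, so p(category) / p(reference) is a ratio of counts.
    """
    return row[category] / row[reference]

odds_ratios = {}
for category in ("MoreCalls", "MoreVisits"):
    odds_ea = odds_vs_reference(counts["European-American"], category)
    odds_aa = odds_vs_reference(counts["African-American"], category)
    odds_ratios[category] = odds_aa / odds_ea
    print(f"{category}: EA odds = {odds_ea:.4f}, AA odds = {odds_aa:.4f}, "
          f"OR = {odds_ratios[category]:.2f}")
```

Working from counts rather than from rounded proportions avoids small rounding differences in the fourth decimal place.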

Page 10: Multinomial Logistic Regression

Question & Answer

What is the relationship between race and interview tracking effort?

The odds of requiring more calls, compared to being easy-to-track, are higher for African-Americans by a factor of 1.45 (45%). The odds of requiring more visits, compared to being easy-to-track, are higher for African-Americans by a factor of 2.77 (177%).

Page 11: Multinomial Logistic Regression

Multinomial Logistic Regression

Set of binary logistic regression models estimated simultaneously
Number of non-redundant binary logistic regression equations equals the number of categories of the DV minus one
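Written out, the set of equations looks as follows. The symbols K (number of DV categories, with category K as the reference), j, a_j, and B_j are notation assumed here for illustration; they follow the a + BX form used for the estimated logits in this deck.

```latex
L_j = \ln\!\left(\frac{p_j}{p_K}\right) = a_j + B_j X, \qquad j = 1, \dots, K - 1
```

With K = 3 and Easy as the reference category, this gives two equations: More Calls vs. Easy and More Visits vs. Easy.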

Page 12: Multinomial Logistic Regression

Statistical Significance

Table 3.2
  β(Race, More Calls vs. Easy) = β(Race, More Visits vs. Easy) = 0
  • Reject
Table 3.3
  β(Race, More Calls vs. Easy) = β(Race, More Visits vs. Easy) = 0
  • Reject
Table 3.4
  β(Race, More Calls vs. Easy) = 0
  • Don’t Reject
  β(Race, More Visits vs. Easy) = 0
  • Reject

Page 13: Multinomial Logistic Regression

Odds Ratios

OR(More Calls vs. Easy) = 1.45
  The odds of requiring more calls, compared to being easy-to-track, are not significantly different for European- and African-Americans.
OR(More Visits vs. Easy) = 2.77
  The odds of requiring more visits, compared to being easy-to-track, are higher for African-Americans by a factor of 2.77 (177%).

Page 14: Multinomial Logistic Regression

Estimated Logits (L)

Table 3.4

$L(\text{More Calls vs. Easy}) = a + B_{Race} X_{Race}$

$L(\text{More Calls vs. Easy}) = -1.163 + (.371)(X_{Race})$

$L(\text{More Visits vs. Easy}) = a + B_{Race} X_{Race}$

$L(\text{More Visits vs. Easy}) = -1.731 + (1.019)(X_{Race})$

Page 15: Multinomial Logistic Regression

Logits to Odds

African-Americans (X = 1)

$L(\text{More Calls vs. Easy}) = -.792 = -1.163 + (.371)(1)$

$\text{Odds} = e^{-.792} = .45$

$L(\text{More Visits vs. Easy}) = -.712 = -1.731 + (1.019)(1)$

$\text{Odds} = e^{-.712} = .49$

Page 16: Multinomial Logistic Regression

Logits to Probabilities

African-Americans, L(More Calls vs. Easy) = -.792

African-Americans, L(More Visits vs. Easy) = -.712

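$$\hat{p}_{(\text{More Calls vs. Easy})} = \frac{e^{-.792}}{1 + e^{-.792}} = .31$$

$$\hat{p}_{(\text{More Visits vs. Easy})} = \frac{e^{-.712}}{1 + e^{-.712}} = .33$$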
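The chain from estimated logits to odds to probabilities can be sketched in a few lines. The coefficients are the Table 3.4 values quoted on the Estimated Logits slide; the dictionary layout and function names are illustrative. The probability here is the "vs. Easy" probability used throughout this deck, that is, odds / (1 + odds) within the pair of categories being compared.

```python
import math

# Intercepts (a) and race slopes (B) from Table 3.4, one equation per
# non-reference category. Easy is the reference category.
coefficients = {
    "MoreCalls": {"a": -1.163, "b_race": 0.371},
    "MoreVisits": {"a": -1.731, "b_race": 1.019},
}

def logit(category, x_race):
    """Estimated logit L = a + B * X for one comparison with Easy."""
    c = coefficients[category]
    return c["a"] + c["b_race"] * x_race

def odds(category, x_race):
    """Odds of the category relative to Easy: e^L."""
    return math.exp(logit(category, x_race))

def prob_vs_easy(category, x_race):
    """Probability of the category within the (category, Easy) pair."""
    o = odds(category, x_race)
    return o / (1 + o)

for x_race, label in ((0, "European-American"), (1, "African-American")):
    for category in coefficients:
        print(f"{label}, {category}: L = {logit(category, x_race):.3f}, "
              f"odds = {odds(category, x_race):.2f}, "
              f"p = {prob_vs_easy(category, x_race):.2f}")
```

The resulting probabilities agree with the ones computed from the raw counts on the Probabilities slide.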

Page 17: Multinomial Logistic Regression

Question & Answer

What is the relationship between race and interview tracking effort?

The odds of requiring more calls, compared to being easy-to-track, are not significantly different for European- and African-Americans.

The odds of requiring more visits, compared to being easy-to-track, are higher for African-Americans by a factor of 2.77 (177%).

Page 18: Multinomial Logistic Regression

Single (Quantitative) IV Example
DV = interview tracking effort
  easy-to-interview and track mothers (Easy);
  difficult-to-track mothers who required more telephone calls (MoreCalls);
  difficult-to-track mothers who required more unscheduled home visits (MoreVisits)
IV = years of education
N = 246 mothers
What is the relationship between education and interview tracking effort?

Page 19: Multinomial Logistic Regression

Statistical Significance

Table 3.6
  β(Education, More Calls vs. Easy) = β(Education, More Visits vs. Easy) = 0
  • Reject
Table 3.7
  β(Education, More Calls vs. Easy) = 0
  • Don’t Reject
  β(Education, More Visits vs. Easy) = 0
  • Reject

Page 20: Multinomial Logistic Regression

Odds Ratios

OR(More Calls vs. Easy) = .88
  The odds of requiring more calls, compared to being easy-to-track, are not significantly associated with education.
OR(More Visits vs. Easy) = .76
  For every additional year of education the odds of needing more visits, compared to being easy-to-track, decrease by a factor of .76 (i.e., -24.1%).
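The percentages quoted with each odds ratio follow from (OR - 1) × 100. A small sketch, using the education slopes that appear with Table 3.7 later in the deck (-.130 and -.276); the helper name is illustrative.

```python
import math

def percent_change_in_odds(odds_ratio):
    """Percent change in the odds for a one-unit increase in the IV."""
    return (odds_ratio - 1) * 100

# Education slopes (B) quoted with Table 3.7.
slopes = {"More Calls": -0.130, "More Visits": -0.276}

# The odds ratio is the exponentiated slope.
odds_ratios = {name: math.exp(b) for name, b in slopes.items()}
for name, odds_ratio in odds_ratios.items():
    print(f"{name}: OR = {odds_ratio:.2f}, "
          f"change = {percent_change_in_odds(odds_ratio):.1f}%")

# Odds ratios compound multiplicatively: k extra years multiply the odds
# by OR ** k, not by k * OR.
four_year_or = odds_ratios["More Visits"] ** 4
```

The same helper gives 45% and 177% for the race odds ratios of 1.45 and 2.77.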

Page 21: Multinomial Logistic Regression

Figures

Education.xls

Page 22: Multinomial Logistic Regression

Estimated Logits (L)

Table 3.7

X = 12 (high school education)

L(More Calls vs. Easy) = -.977 = .583 + (-.130)(12)

L(More Visits vs. Easy) = -1.235 = 2.077 + (-.276)(12)

Page 23: Multinomial Logistic Regression

Effect of Education on Tracking Effort (Logits)

[Chart: logits (y-axis) by years of education (x-axis); plotted values below]

Years of Education      8      9     10     11     12     13     14     15     16     17
More Calls          -0.46  -0.58  -0.71  -0.84  -0.97  -1.10  -1.23  -1.36  -1.49  -1.62
More Visits         -0.13  -0.41  -0.69  -0.96  -1.24  -1.51  -1.79  -2.07  -2.34  -2.62

Page 24: Multinomial Logistic Regression

Logits to Odds

X = 12 (high school education)

$\text{Odds(More Calls vs. Easy)} = e^{-.977} = .38$

$\text{Odds(More Visits vs. Easy)} = e^{-1.235} = .29$

Page 25: Multinomial Logistic Regression

Effect of Education on Tracking Effort (Odds)

[Chart: odds (y-axis) by years of education (x-axis); plotted values below]

Years of Education      8      9     10     11     12     13     14     15     16     17
More Calls           0.63   0.56   0.49   0.43   0.38   0.33   0.29   0.26   0.22   0.20
More Visits          0.88   0.66   0.50   0.38   0.29   0.22   0.17   0.13   0.10   0.07

Page 26: Multinomial Logistic Regression

Logits to Probabilities

X = 12 (high school education)

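$$\hat{p}_{(\text{More Calls vs. Easy})} = \frac{e^{-.977}}{1 + e^{-.977}} \approx .273$$

$$\hat{p}_{(\text{More Visits vs. Easy})} = \frac{e^{-1.235}}{1 + e^{-1.235}} \approx .225$$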

Page 27: Multinomial Logistic Regression

Effect of Education on Tracking Effort (Probabilities)

[Chart: probabilities (y-axis) by years of education (x-axis); plotted values below]

Years of Education      8      9     10     11     12     13     14     15     16     17
More Calls           0.39   0.36   0.33   0.30   0.27   0.25   0.23   0.20   0.18   0.16
More Visits          0.47   0.40   0.34   0.28   0.22   0.18   0.14   0.11   0.09   0.07
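The three education charts (logits, odds, probabilities) are all derived from the two Table 3.7 equations. The sketch below recomputes them for 8 to 17 years of education using the rounded coefficients quoted on the Estimated Logits slide, so the last digit may occasionally differ from the charted values, which presumably come from unrounded estimates. The names are illustrative.

```python
import math

# Intercepts and education slopes from Table 3.7 (Easy is the reference).
coefficients = {
    "More Calls": (0.583, -0.130),
    "More Visits": (2.077, -0.276),
}

def curve(intercept, slope, years):
    """Return (logit, odds, probability vs. Easy) for one education level."""
    logit = intercept + slope * years
    odds = math.exp(logit)
    return logit, odds, odds / (1 + odds)

# One row per comparison, one entry per year of education from 8 to 17.
table = {
    name: {years: curve(a, b, years) for years in range(8, 18)}
    for name, (a, b) in coefficients.items()
}

for name, rows in table.items():
    for years, (logit, odds, prob) in rows.items():
        print(f"{name:11s} X={years:2d}  logit={logit:6.2f}  "
              f"odds={odds:.2f}  p={prob:.2f}")
```

Both curves fall as education rises, and the More Visits curve falls faster because its slope is larger in absolute value.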

Page 28: Multinomial Logistic Regression

Question & Answer

What is the relationship between education and interview tracking effort?

The odds of requiring more calls, compared to being easy-to-track, are not significantly associated with education. For every additional year of education the odds of needing more visits, compared to being easy-to-track, decrease by a factor of .76 (i.e., -24.1%).

Page 29: Multinomial Logistic Regression

Multiple IV Example

DV = interview tracking effort
  easy-to-interview and track mothers (Easy);
  difficult-to-track mothers who required more telephone calls (MoreCalls);
  difficult-to-track mothers who required more unscheduled home visits (MoreVisits)
IV = race, 0 = European-American, 1 = African-American
IV = years of education
N = 246 mothers

Page 30: Multinomial Logistic Regression

Multiple IV Example (cont’d)

What is the relationship between race and interview tracking effort, when controlling for education?

Page 31: Multinomial Logistic Regression

Statistical Significance

Table 3.8
  β(Race, More Calls vs. Easy) = β(Race, More Visits vs. Easy) = β(Ed, More Calls vs. Easy) = β(Ed, More Visits vs. Easy) = 0
  • Reject
Table 3.9
  β(Race, More Calls vs. Easy) = β(Race, More Visits vs. Easy) = 0
  • Reject
  β(Ed, More Calls vs. Easy) = β(Ed, More Visits vs. Easy) = 0
  • Reject

Page 32: Multinomial Logistic Regression

Statistical Significance (cont’d)
Table 3.10
  β(Race, More Calls vs. Easy) = 0
  • Don’t reject
  β(Race, More Visits vs. Easy) = 0
  • Reject
  β(Ed, More Calls vs. Easy) = 0
  • Don’t reject
  β(Ed, More Visits vs. Easy) = 0
  • Reject

Page 33: Multinomial Logistic Regression

Odds Ratios: Race

OR(More Calls vs. Easy) = 1.36
  The odds of requiring more calls, compared to being easy-to-track, are not significantly different for European- and African-Americans.
OR(More Visits vs. Easy) = 2.48
  The odds of requiring more visits, compared to being easy-to-track, are higher for African-Americans by a factor of 2.48 (148%).

Page 34: Multinomial Logistic Regression

Odds Ratios: Education

OR(More Calls vs. Easy) = .89
  The odds of requiring more calls, compared to being easy-to-track, are not significantly associated with education.
OR(More Visits vs. Easy) = .77
  For every additional year of education the odds of needing more visits, compared to being easy-to-track, decrease by a factor of .77 (i.e., -23%), when controlling for race.

Page 35: Multinomial Logistic Regression

Figures

Race & Education.xls

Page 36: Multinomial Logistic Regression

Effect of Education on Tracking Effort for African-Americans (Odds)

[Chart: odds (y-axis) by years of education (x-axis), African-Americans; plotted values below]

Years of Education      8      9     10     11     12     13     14     15     16     17
More Calls           0.73   0.65   0.58   0.51   0.45   0.40   0.36   0.32   0.28   0.25
More Visits          1.30   1.01   0.78   0.60   0.46   0.36   0.28   0.21   0.17   0.13

Page 37: Multinomial Logistic Regression

Effect of Education on Tracking Effort for African-Americans (Probabilities)

[Chart: probabilities (y-axis) by years of education (x-axis), African-Americans; plotted values below]

Years of Education      8      9     10     11     12     13     14     15     16     17
More Calls           0.42   0.39   0.37   0.34   0.31   0.29   0.26   0.24   0.22   0.20
More Visits          0.57   0.50   0.44   0.38   0.32   0.26   0.22   0.18   0.14   0.11

Page 38: Multinomial Logistic Regression

Question & Answer

What is the relationship between race and interview tracking effort, when controlling for education?

The odds of requiring more calls, compared to being easy-to-track, are not significantly different for European- and African-Americans, when controlling for education. The odds of requiring more visits, compared to being easy-to-track, are higher for African-Americans by a factor of 2.48 (148%), when controlling for education.

Page 39: Multinomial Logistic Regression

Assumptions Necessary for Testing Hypotheses
Assumptions discussed in GZLM lecture
Independence of irrelevant alternatives (IIA)
  Odds of one outcome (e.g., More Calls) relative to another (e.g., Easy) are not influenced by other alternatives (e.g., More Visits)
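IIA is a property built into the multinomial logit form: the odds of any category relative to the reference depend only on that category's own logit. The sketch below illustrates the property using the African-American logits from the race-only model (-.792 and -.712); it is not a test of whether IIA holds in the data. The function name is illustrative.

```python
import math

def choice_probabilities(logits_vs_reference, reference="Easy"):
    """Multinomial logit probabilities for the reference and each alternative.

    `logits_vs_reference` maps each non-reference category to its logit.
    """
    exp_logits = {cat: math.exp(l) for cat, l in logits_vs_reference.items()}
    denominator = 1 + sum(exp_logits.values())
    probs = {reference: 1 / denominator}
    probs.update({cat: e / denominator for cat, e in exp_logits.items()})
    return probs

# All three alternatives available.
full = choice_probabilities({"MoreCalls": -0.792, "MoreVisits": -0.712})
# Same logit for MoreCalls, with MoreVisits removed from the alternatives.
restricted = choice_probabilities({"MoreCalls": -0.792})

odds_full = full["MoreCalls"] / full["Easy"]
odds_restricted = restricted["MoreCalls"] / restricted["Easy"]
print(f"odds with three alternatives: {odds_full:.4f}")
print(f"odds with two alternatives:   {odds_restricted:.4f}")
```

The individual probabilities change when an alternative is dropped, but the More Calls vs. Easy odds do not, which is exactly what the assumption requires of the real decision process.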

Page 40: Multinomial Logistic Regression

Model Evaluation

Create a set of binary DVs from the polytomous DV
  recode TrackCat (1=0) (2=1) (3=sysmis) into MoreCalls.
  recode TrackCat (1=0) (2=sysmis) (3=1) into MoreVisits.
Run separate binary logistic regressions
Use binary logistic regression methods to detect outliers and influential observations
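For readers working outside SPSS, the two recode statements amount to the mapping sketched below. It assumes TrackCat is coded 1 = Easy, 2 = MoreCalls, 3 = MoreVisits (inferred from the recodes, not stated on the slide) and uses None as a stand-in for system-missing; the function name is hypothetical.

```python
SYSMIS = None  # stand-in for SPSS system-missing

def recode_track_cat(track_cat):
    """Return (MoreCalls, MoreVisits) binary DVs for one TrackCat value.

    Each binary DV compares one category with Easy (coded 0) and drops
    the remaining category by setting it to missing.
    """
    more_calls = {1: 0, 2: 1}.get(track_cat, SYSMIS)
    more_visits = {1: 0, 3: 1}.get(track_cat, SYSMIS)
    return more_calls, more_visits

for value in (1, 2, 3):
    print(value, recode_track_cat(value))
```

Each binary logistic regression is then run only on the cases that are non-missing for its DV, so the More Calls model never sees the More Visits cases and vice versa.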

Page 41: Multinomial Logistic Regression

Model Evaluation (cont’d)

Index plots
  Leverage values
  Standardized or unstandardized deviance residuals
  Cook’s D
Graph and compare observed and estimated counts

Page 42: Multinomial Logistic Regression

Analogs of R²

None in standard use and each may give different results

Typically much smaller than R² values in linear regression

Difficult to interpret

Page 43: Multinomial Logistic Regression

Multicollinearity

SPSS multinomial logistic regression doesn’t compute multicollinearity statistics
Use SPSS linear regression
Problematic levels
  Tolerance < .10 or VIF > 10
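Tolerance is 1 - R² from regressing one IV on the others, and VIF is its reciprocal. With exactly two IVs, as in the race and education model, that R² is simply the squared Pearson correlation between them, so the statistics can be sketched without a regression routine. The data values below are hypothetical and only illustrate the calculation; they are not the study data.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two IVs.

    With two IVs, regressing one on the other gives R^2 = r^2.
    """
    tolerance = 1 - pearson_r(x1, x2) ** 2
    return tolerance, 1 / tolerance

# Hypothetical values, for illustration only (not the study data).
race = [0, 0, 1, 1, 0, 1, 0, 1]
education = [12, 14, 10, 12, 16, 11, 13, 12]

tolerance, vif = tolerance_and_vif(race, education)
problematic = tolerance < 0.10 or vif > 10
print(f"tolerance = {tolerance:.3f}, VIF = {vif:.2f}, problematic = {problematic}")
```

With more than two IVs, each IV would need its own regression on all the others, which is what the linear regression procedure reports.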

Page 44: Multinomial Logistic Regression

Additional Topics

Polytomous IVs
Curvilinear relationships
Interactions

Page 45: Multinomial Logistic Regression

Additional Regression Models for Polytomous DVs
Multinomial probit regression
  Substantive results essentially indistinguishable from multinomial logistic regression
  Choice between this and multinomial logistic regression largely one of convenience and discipline-specific convention
  Many researchers prefer multinomial logistic regression because it provides odds ratios whereas probit regression does not, and multinomial logistic regression comes with a wider variety of fit statistics

Page 46: Multinomial Logistic Regression

Additional Regression Models for Polytomous DVs (cont’d)
Discriminant analysis
  Limited to continuous IVs