multinomial logistic regression


Post on 05-Jan-2016


Multinomial Logistic Regression

“Inanimate objects can be classified scientifically into three major categories; those that don't work, those that break down and those that get lost” (Russell Baker)

Multinomial Logistic Regression

Also known as “polytomous” or “nominal logistic” or “logit regression” or the “discrete choice model”

Generalization of binary logistic regression to a polytomous DV

When applied to a dichotomous DV, identical to binary logistic regression

Polytomous Variables

Three or more unordered categories

Categories mutually exclusive and exhaustive

Sometimes called “multicategorical” or sometimes “multinomial” variables

Polytomous DVs

Reason for leaving welfare:
• marriage, stable employment, move to another state, incarceration, or death

Status of foster home application:
• licensed to foster, discontinued application process prior to licensure, or rejected for licensure

Changes in living arrangements of the elderly:
• newly co-residing with their children, no longer co-residing, or residing in institutions

Single (Dichotomous) IV Example

DV = interview tracking effort
• easy-to-interview and track mothers (Easy);
• difficult-to-track mothers who required more telephone calls (MoreCalls);
• difficult-to-track mothers who required more unscheduled home visits (MoreVisits)

IV = race, 0 = European-American, 1 = African-American

N = 246 mothers

What is the relationship between race and interview tracking effort?

Crosstabulation

Table 3.1

Relationship between race and tracking effort is statistically significant [χ²(2, N = 246) = 8.69, p = .013]
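As a rough check on the crosstabulation result, the Pearson chi-square statistic can be recomputed from the cell counts. The sketch below assumes the counts implied by the probability calculations later in these slides (European-American: 96 Easy, 30 MoreCalls, 17 MoreVisits; African-American: 53, 24, 26); the function name is illustrative only. With 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Cell counts assumed from the probability calculations in the slides
# (rows: race; columns: Easy, MoreCalls, MoreVisits).
counts = {
    "European-American": [96, 30, 17],
    "African-American": [53, 24, 26],
}

def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom) for a two-way table of counts."""
    rows = list(table.values())
    n = sum(sum(row) for row in rows)
    col_totals = [sum(col) for col in zip(*rows)]
    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df

chi2, df = pearson_chi_square(counts)
# For df = 2 only, the upper-tail probability is exactly exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi2({df}) = {chi2:.2f}, p = {p_value:.3f}")
```

The closed-form p-value applies only to df = 2; other table sizes would need a chi-square distribution function from a statistics package.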

Reference Category

In binary logistic regression, the category of the DV coded 0 implicitly serves as the reference category

Known as “baseline,” “base,” or “comparison” category

In multinomial logistic regression, necessary to explicitly select the reference category
• “Easy” selected

Probabilities

Table 3.1

More Calls (vs. Easy)
• European-American: .24 = 30 / (30 + 96)
• African-American: .31 = 24 / (24 + 53)

More Visits (vs. Easy)
• European-American: .15 = 17 / (17 + 96)
• African-American: .33 = 26 / (26 + 53)

Odds & Odds Ratio

More Calls (vs. Easy)
• European-American: .3125 (.2098 / .6713)
• African-American: .4528 (.2330 / .5146)
• Odds Ratio = 1.45 (.4528 / .3125)

• 45% increase in the odds

More Visits (vs. Easy)
• European-American: .1771 (.1189 / .6713)
• African-American: .4905 (.2524 / .5146)
• Odds Ratio = 2.77 (.4905 / .1771)

• 177% increase in the odds
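The odds and odds ratios can be reproduced directly from the cell counts, because the ratio of two probabilities within a race group equals the ratio of the corresponding counts. This is a minimal sketch using the counts that appear in the probability calculations above; the dictionary layout and function names are illustrative assumptions.

```python
# Counts taken from the probability calculations in the slides.
counts = {
    "European-American": {"Easy": 96, "MoreCalls": 30, "MoreVisits": 17},
    "African-American": {"Easy": 53, "MoreCalls": 24, "MoreVisits": 26},
}

def odds_vs_reference(group, outcome, reference="Easy"):
    """Odds of `outcome` relative to the reference category within one group."""
    return counts[group][outcome] / counts[group][reference]

def odds_ratio(outcome, reference="Easy"):
    """Odds for African-Americans divided by odds for European-Americans."""
    return (odds_vs_reference("African-American", outcome, reference)
            / odds_vs_reference("European-American", outcome, reference))

for outcome in ("MoreCalls", "MoreVisits"):
    or_value = odds_ratio(outcome)
    pct = (or_value - 1) * 100  # percentage change in the odds
    print(f"{outcome}: OR = {or_value:.2f}, change in odds = {pct:.0f}%")
```

The percentage change is (OR - 1) × 100, which is how an odds ratio of 1.45 becomes a 45% increase and 2.77 becomes a 177% increase.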

What is the relationship between race and interview tracking effort?

The odds of requiring more calls, compared to being easy-to-track, are higher for African-Americans by a factor of 1.45 (45%). The odds of requiring more visits, compared to being easy-to-track, are higher for African-Americans by a factor of 2.77 (177%).

Multinomial Logistic Regression

Set of binary logistic regression models estimated simultaneously

Number of non-redundant binary logistic regression equations equals the number of categories of the DV minus one

Statistical Significance

Table 3.2
H0: β(Race, More Calls vs. Easy) = β(Race, More Visits vs. Easy) = 0
• Reject

Table 3.3
H0: β(Race, More Calls vs. Easy) = β(Race, More Visits vs. Easy) = 0
• Reject

Table 3.4
H0: β(Race, More Calls vs. Easy) = 0
• Don’t reject
H0: β(Race, More Visits vs. Easy) = 0
• Reject

Odds Ratios

OR(More Calls vs. Easy) = 1.45
• The odds of requiring more calls, compared to being easy-to-track, are not significantly different for European- and African-Americans.

OR(More Visits vs. Easy) = 2.77
• The odds of requiring more visits, compared to being easy-to-track, are higher for African-Americans by a factor of 2.77 (177%).

Estimated Logits (L)

Table 3.4

L(More Calls vs. Easy) = a + B_Race X_Race

L(More Calls vs. Easy) = -1.163 + (.371)(X_Race)

L(More Visits vs. Easy) = a + B_Race X_Race

L(More Visits vs. Easy) = -1.731 + (1.019)(X_Race)
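With a single dichotomous IV the model reproduces the crosstabulation exactly, so the intercepts and slopes can be recovered by hand: the intercept is the log odds for the group coded 0, and the slope is the log of the odds ratio. The sketch below assumes the cell counts from the probability calculations earlier in the slides; the variable and function names are illustrative.

```python
import math

# Counts assumed from the probability calculations in the slides
# (EA = European-American, coded 0; AA = African-American, coded 1).
easy = {"EA": 96, "AA": 53}
more_calls = {"EA": 30, "AA": 24}
more_visits = {"EA": 17, "AA": 26}

def logit_coefficients(outcome, reference):
    """Intercept and slope of the logit for a single 0/1 predictor.

    a = ln(odds when X = 0); B = ln(odds ratio).
    """
    odds_x0 = outcome["EA"] / reference["EA"]
    odds_x1 = outcome["AA"] / reference["AA"]
    a = math.log(odds_x0)
    b = math.log(odds_x1 / odds_x0)
    return a, b

a_calls, b_calls = logit_coefficients(more_calls, easy)
a_visits, b_visits = logit_coefficients(more_visits, easy)
print(f"More Calls vs. Easy:  a = {a_calls:.3f}, B = {b_calls:.3f}")
print(f"More Visits vs. Easy: a = {a_visits:.3f}, B = {b_visits:.3f}")
```

This shortcut only works because the model has as many parameters as the table has free cell proportions; with a quantitative IV or several IVs the coefficients have to be estimated by maximum likelihood.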

Logits to Odds

African-Americans (X = 1)

L(More Calls vs. Easy) = -.792 = -1.163 + (.371)(1)

Odds = e^(-.792) = .45

L(More Visits vs. Easy) = -.712 = -1.731 + (1.019)(1)

Odds = e^(-.712) = .49

Logits to Probabilities

African-Americans, L(More Calls vs. Easy) = -.792

African-Americans, L(More Visits vs. Easy) = -.712

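$$\hat{p}_{(\text{More Calls vs. Easy})} = \frac{e^{-.792}}{1 + e^{-.792}} = .31$$

$$\hat{p}_{(\text{More Visits vs. Easy})} = \frac{e^{-.712}}{1 + e^{-.712}} = .33$$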
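The conversion from a logit to odds and then to a probability (of the outcome versus the reference category) can be sketched in a few lines. The two logits are the African-American values from the slides; the function names are illustrative assumptions.

```python
import math

def logit_to_odds(logit):
    """Odds are e raised to the power of the logit."""
    return math.exp(logit)

def logit_to_probability(logit):
    """Probability of the outcome versus the reference category."""
    odds = logit_to_odds(logit)
    return odds / (1 + odds)

# Estimated logits for African-Americans (X = 1), taken from the slides.
logits = {"More Calls vs. Easy": -0.792, "More Visits vs. Easy": -0.712}

for label, logit in logits.items():
    print(f"{label}: odds = {logit_to_odds(logit):.2f}, "
          f"p = {logit_to_probability(logit):.2f}")
```

These probabilities match the ones computed directly from the crosstabulation (.31 and .33), since each compares one outcome with Easy only.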

What is the relationship between race and interview tracking effort?

The odds of requiring more calls, compared to being easy-to-track, are not significantly different for European- and African-Americans.

The odds of requiring more visits, compared to being easy-to-track, are higher for African-Americans by a factor of 2.77 (177%).

Single (Quantitative) IV Example

DV = interview tracking effort
• easy-to-interview and track mothers (Easy);
• difficult-to-track mothers who required more telephone calls (MoreCalls);
• difficult-to-track mothers who required more unscheduled home visits (MoreVisits)

IV = years of education

N = 246 mothers

What is the relationship between education and interview tracking effort?

Statistical Significance

Table 3.6
H0: β(Education, More Calls vs. Easy) = β(Education, More Visits vs. Easy) = 0
• Reject

Table 3.7
H0: β(Education, More Calls vs. Easy) = 0
• Don’t reject
H0: β(Education, More Visits vs. Easy) = 0
• Reject

Odds Ratios

OR(More Calls vs. Easy) = .88
• The odds of requiring more calls, compared to being easy-to-track, are not significantly associated with education.

OR(More Visits vs. Easy) = .76
• For every additional year of education the odds of needing more visits, compared to being easy-to-track, decrease by a factor of .76 (i.e., -24.1%).
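For a quantitative IV the odds ratio is e raised to the slope, and the percentage change in the odds is (OR - 1) × 100. The sketch below uses the education slopes that appear in the estimated logit equations in these slides (-.130 and -.276); the function names and the four-year example are illustrative assumptions.

```python
import math

def odds_ratio(slope, units=1):
    """Multiplicative change in the odds for a change of `units` in the IV."""
    return math.exp(slope * units)

def percent_change_in_odds(slope, units=1):
    """Percentage change in the odds for a change of `units` in the IV."""
    return (odds_ratio(slope, units) - 1) * 100

# Education slopes from the estimated logit equations in the slides.
slopes = {"More Calls vs. Easy": -0.130, "More Visits vs. Easy": -0.276}

for label, b in slopes.items():
    print(f"{label}: OR per year = {odds_ratio(b):.2f} "
          f"({percent_change_in_odds(b):.1f}%), "
          f"OR per 4 years = {odds_ratio(b, 4):.2f}")
```

Because the effect is multiplicative, a four-year difference multiplies the odds by the one-year odds ratio raised to the fourth power rather than by four times the one-year change.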

Figures

Education.xls

Estimated Logits (L)

Table 3.7

X = 12 (high school education)

L(More Calls vs. Easy) = -.977 = .583 + (-.130)(12)

L(More Visits vs. Easy) = -1.235 = 2.077 + (-.276)(12)

Effect of Education on Tracking Effort (Logits)

[Figure: logits by years of education; plotted values below]

Years of Education        8      9     10     11     12     13     14     15     16     17
More Calls            -0.46  -0.58  -0.71  -0.84  -0.97  -1.10  -1.23  -1.36  -1.49  -1.62
More Visits           -0.13  -0.41  -0.69  -0.96  -1.24  -1.51  -1.79  -2.07  -2.34  -2.62

Logits to Odds

X = 12 (high school education)

Odds(More Calls vs. Easy) = e^(-.977) = .38

Odds(More Visits vs. Easy) = e^(-1.235) = .29

Effect of Education on Tracking Effort (Odds)

[Figure: odds by years of education; plotted values below]

Years of Education        8      9     10     11     12     13     14     15     16     17
More Calls             0.63   0.56   0.49   0.43   0.38   0.33   0.29   0.26   0.22   0.20
More Visits            0.88   0.66   0.50   0.38   0.29   0.22   0.17   0.13   0.10   0.07

Logits to Probabilities

X = 12 (high school education)

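$$\hat{p}_{(\text{More Calls vs. Easy})} = \frac{e^{-.977}}{1 + e^{-.977}} \approx .27$$

$$\hat{p}_{(\text{More Visits vs. Easy})} = \frac{e^{-1.235}}{1 + e^{-1.235}} \approx .22$$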

Effect of Education on Tracking Effort (Probabilities)

[Figure: probabilities by years of education; plotted values below]

Years of Education        8      9     10     11     12     13     14     15     16     17
More Calls             0.39   0.36   0.33   0.30   0.27   0.25   0.23   0.20   0.18   0.16
More Visits            0.47   0.40   0.34   0.28   0.22   0.18   0.14   0.11   0.09   0.07
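The plotted logits, odds, and probabilities can be approximated by evaluating the two estimated logit equations over the range of education. The sketch below uses the rounded intercepts and slopes from the slides (.583 and -.130; 2.077 and -.276), so some values may differ from the charted ones in the last digit; the function name is an illustrative assumption.

```python
import math

# Rounded intercepts and education slopes from the estimated logit equations.
equations = {
    "More Calls": (0.583, -0.130),
    "More Visits": (2.077, -0.276),
}

def predict(intercept, slope, years):
    """Return (logit, odds, probability vs. Easy) for a given education level."""
    logit = intercept + slope * years
    odds = math.exp(logit)
    return logit, odds, odds / (1 + odds)

print("years  outcome        logit   odds   prob")
for years in range(8, 18):
    for outcome, (a, b) in equations.items():
        logit, odds, prob = predict(a, b, years)
        print(f"{years:5d}  {outcome:<12} {logit:7.2f} {odds:6.2f} {prob:6.2f}")
```

The logits fall along straight lines, while the odds and probabilities curve, which is why the three charts look different even though they describe the same model.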

What is the relationship between education and interview tracking effort?

The odds of requiring more calls, compared to being easy-to-track, are not significantly associated with education. For every additional year of education the odds of needing more visits, compared to being easy-to-track, decrease by a factor of .76 (i.e., -24.1%).

Multiple IV Example

DV = interview tracking effort
• easy-to-interview and track mothers (Easy);
• difficult-to-track mothers who required more telephone calls (MoreCalls);
• difficult-to-track mothers who required more unscheduled home visits (MoreVisits)

IV = race, 0 = European-American, 1 = African-American

IV = years of education

N = 246 mothers

Multiple IV Example (cont’d)

What is the relationship between race and interview tracking effort, when controlling for education?

Statistical Significance

Table 3.8
H0: β(Race, More Calls vs. Easy) = β(Race, More Visits vs. Easy) = β(Ed, More Calls vs. Easy) = β(Ed, More Visits vs. Easy) = 0
• Reject

Table 3.9
H0: β(Race, More Calls vs. Easy) = β(Race, More Visits vs. Easy) = 0
• Reject
H0: β(Ed, More Calls vs. Easy) = β(Ed, More Visits vs. Easy) = 0
• Reject

Statistical Significance (cont’d)

Table 3.10
H0: β(Race, More Calls vs. Easy) = 0
• Don’t reject
H0: β(Race, More Visits vs. Easy) = 0
• Reject
H0: β(Ed, More Calls vs. Easy) = 0
• Don’t reject
H0: β(Ed, More Visits vs. Easy) = 0
• Reject

Odds Ratios: Race

OR(More Calls vs. Easy) = 1.36
• The odds of requiring more calls, compared to being easy-to-track, are not significantly different for European- and African-Americans.

OR(More Visits vs. Easy) = 2.48
• The odds of requiring more visits, compared to being easy-to-track, are higher for African-Americans by a factor of 2.48 (148%).

Odds Ratios: Education

OR(More Calls vs. Easy) = .89
• The odds of requiring more calls, compared to being easy-to-track, are not significantly associated with education.

OR(More Visits vs. Easy) = .77
• For every additional year of education the odds of needing more visits, compared to being easy-to-track, decrease by a factor of .77 (i.e., -23%), when controlling for race.

Figures

Race & Education.xls

Effect of Education on Tracking Effort for African-Americans (Odds)

[Figure: odds by years of education; plotted values below]

Years of Education        8      9     10     11     12     13     14     15     16     17
More Calls             0.73   0.65   0.58   0.51   0.45   0.40   0.36   0.32   0.28   0.25
More Visits            1.30   1.01   0.78   0.60   0.46   0.36   0.28   0.21   0.17   0.13
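In a logit model each additional year of education multiplies the odds by the same odds ratio, so the charted odds for African-Americans should shrink from one year to the next by roughly the education odds ratios reported in the slides (.89 for More Calls, .77 for More Visits). The sketch below checks this with the charted values; the function name is an illustrative assumption, and the charted odds are rounded to two decimals, so the match is only approximate.

```python
# Charted odds for African-Americans at 8 through 17 years of education.
odds_calls = [0.73, 0.65, 0.58, 0.51, 0.45, 0.40, 0.36, 0.32, 0.28, 0.25]
odds_visits = [1.30, 1.01, 0.78, 0.60, 0.46, 0.36, 0.28, 0.21, 0.17, 0.13]

def average_yearly_ratio(odds):
    """Geometric mean of the ratios between odds in successive years."""
    steps = len(odds) - 1
    return (odds[-1] / odds[0]) ** (1 / steps)

print(f"More Calls:  average yearly ratio = {average_yearly_ratio(odds_calls):.2f}")
print(f"More Visits: average yearly ratio = {average_yearly_ratio(odds_visits):.2f}")
```

The geometric mean is used because the yearly changes are multiplicative; averaging the ratios arithmetically would give a slightly different figure.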

Effect of Education on Tracking Effort for African-Americans (Probabilities)

[Figure: probabilities by years of education; plotted values below]

Years of Education        8      9     10     11     12     13     14     15     16     17
More Calls             0.42   0.39   0.37   0.34   0.31   0.29   0.26   0.24   0.22   0.20
More Visits            0.57   0.50   0.44   0.38   0.32   0.26   0.22   0.18   0.14   0.11

What is the relationship between race and interview tracking effort, when controlling for education?

The odds of requiring more calls, compared to being easy-to-track, are not significantly different for European- and African-Americans, when controlling for education. The odds of requiring more visits, compared to being easy-to-track, are higher for African-Americans by a factor of 2.48 (148%), when controlling for education.

Assumptions Necessary for Testing Hypotheses

Assumptions discussed in GZLM lecture

Independence of irrelevant alternatives (IIA)
• Odds of one outcome (e.g., More Calls) relative to another (e.g., Easy) are not influenced by other alternatives (e.g., More Visits)
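The IIA property is built into the form of the multinomial logit model. In the standard formulation, the probabilities of all categories share one denominator, so the ratio of any two of them depends only on their own logits. The sketch below illustrates this with the More Calls logit from the slides and two different More Visits logits (the second is a hypothetical value chosen for contrast); the function name is an illustrative assumption. Note that these three-category probabilities are not the same as the "vs. Easy" probabilities used elsewhere in the slides.

```python
import math

def category_probabilities(logits):
    """Probabilities for the reference category and each other category.

    `logits` maps each non-reference category to its logit versus Easy.
    """
    denominator = 1 + sum(math.exp(value) for value in logits.values())
    probabilities = {"Easy": 1 / denominator}
    for category, value in logits.items():
        probabilities[category] = math.exp(value) / denominator
    return probabilities

# -0.712 is the More Visits logit from the slides; 1.5 is hypothetical.
for visits_logit in (-0.712, 1.5):
    probs = category_probabilities(
        {"MoreCalls": -0.792, "MoreVisits": visits_logit})
    ratio = probs["MoreCalls"] / probs["Easy"]
    print(f"visits logit {visits_logit:+.3f}: "
          f"P(MoreCalls) / P(Easy) = {ratio:.4f}")
```

Making More Visits far more attractive changes every individual probability, yet the odds of More Calls relative to Easy stay at e^(-.792); whether that is realistic for a given set of outcomes is what the IIA assumption asks.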

Model Evaluation

Create a set of binary DVs from the polytomous DV

recode TrackCat (1=0) (2=1) (3=sysmis) into MoreCalls.
recode TrackCat (1=0) (2=sysmis) (3=1) into MoreVisits.

Run separate binary logistic regressions

Use binary logistic regression methods to detect outliers and influential observations
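For readers working outside SPSS, the same recoding can be sketched in Python. This assumes the coding implied by the recode statements (1 = Easy, 2 = MoreCalls, 3 = MoreVisits) and uses None for system-missing; the sample values and helper names are hypothetical.

```python
# Assumed coding, inferred from the recode statements:
# 1 = Easy, 2 = MoreCalls, 3 = MoreVisits.
MORE_CALLS_MAP = {1: 0, 2: 1}    # 3 becomes missing
MORE_VISITS_MAP = {1: 0, 3: 1}   # 2 becomes missing

def recode(value, mapping):
    """Return the recoded value, or None (missing) if it is not in the mapping."""
    return mapping.get(value)

track_cat = [1, 2, 3, 1, 3]  # hypothetical TrackCat values for illustration
more_calls = [recode(v, MORE_CALLS_MAP) for v in track_cat]
more_visits = [recode(v, MORE_VISITS_MAP) for v in track_cat]
print(more_calls)   # [0, 1, None, 0, None]
print(more_visits)  # [0, None, 1, 0, 1]
```

Each binary DV drops the cases in the third category, so the two separate binary regressions are run on different subsets of the 246 mothers.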

Model Evaluation (cont’d)

Index plots
• Leverage values
• Standardized or unstandardized deviance residuals
• Cook’s D

Graph and compare observed and estimated counts

Analogs of R²

None in standard use and each may give different results

Typically much smaller than R² values in linear regression

Difficult to interpret
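To give a sense of how small these analogs tend to be, the sketch below computes one of them, McFadden's pseudo-R², for the race-only model. McFadden's measure is chosen here only as an example and is not necessarily the one reported in the tables for these slides. The counts are assumed from the probability calculations earlier in the slides, and the race model's log-likelihood is obtained from the observed proportions within each race group, which is valid because a single dichotomous IV reproduces the crosstabulation.

```python
import math

# Counts assumed from the probability calculations in the slides
# (columns: Easy, MoreCalls, MoreVisits).
table = [
    [96, 30, 17],  # European-American
    [53, 24, 26],  # African-American
]

def log_likelihood(rows):
    """Log-likelihood when each row is fitted by its own observed proportions."""
    total = 0.0
    for row in rows:
        n = sum(row)
        total += sum(count * math.log(count / n) for count in row if count > 0)
    return total

# Null model: one set of proportions for everyone (rows pooled).
pooled = [[sum(col) for col in zip(*table)]]
ll_null = log_likelihood(pooled)
# Race model: fitted proportions equal observed proportions within each group.
ll_model = log_likelihood(table)

mcfadden_r2 = 1 - ll_model / ll_null
print(f"LL(null) = {ll_null:.2f}, LL(race) = {ll_model:.2f}, "
      f"McFadden R2 = {mcfadden_r2:.3f}")
```

Even though race is a statistically significant predictor in this example, the value comes out very close to zero, which is part of why such measures are hard to interpret.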

Multicollinearity

SPSS multinomial logistic regression doesn’t compute multicollinearity statistics

Use SPSS linear regression

Problematic levels
• Tolerance < .10 or VIF > 10
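Tolerance and VIF are two views of the same quantity: tolerance is 1 minus the R² from regressing one IV on the other IVs, and VIF is its reciprocal, so the two cutoffs above are equivalent. With only two IVs that R² is simply their squared correlation. The sketch below uses hypothetical race and education values purely for illustration; the function names are assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tolerance_and_vif(r_squared):
    """Tolerance = 1 - R^2 of one IV on the others; VIF = 1 / tolerance."""
    tolerance = 1 - r_squared
    return tolerance, 1 / tolerance

# Hypothetical data for illustration only (not the study data).
race = [0, 0, 0, 0, 1, 1, 1, 1]
education = [12, 14, 16, 13, 10, 12, 11, 13]

r = pearson_r(race, education)
tolerance, vif = tolerance_and_vif(r ** 2)
flag = "problematic" if tolerance < 0.10 else "acceptable"
print(f"r = {r:.2f}, tolerance = {tolerance:.2f}, VIF = {vif:.2f} ({flag})")
```

With more than two IVs the R² has to come from a regression of each IV on all the others, which is what the linear regression procedure reports.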

Polytomous IVs

Curvilinear relationships

Interactions

Additional Regression Models for Polytomous DVs

Multinomial probit regression
• Substantive results essentially indistinguishable from multinomial logistic regression
• Choice between this and multinomial logistic regression largely one of convenience and discipline-specific convention
• Many researchers prefer multinomial logistic regression because it provides odds ratios whereas probit regression does not, and multinomial logistic regression comes with a wider variety of fit statistics

Additional Regression Models for Polytomous DVs (cont’d)

Discriminant analysis
• Limited to continuous IVs