INTRODUCTION TO CATEGORICAL DATA ANALYSIS: ODDS RATIO, MEASURE OF ASSOCIATION, TEST OF INDEPENDENCE, LOGISTIC REGRESSION AND POLYTOMOUS LOGISTIC REGRESSION

Upload: luke-hacker

Post on 14-Dec-2015


TRANSCRIPT

Page 1

INTRODUCTION TO CATEGORICAL DATA ANALYSIS

ODDS RATIO, MEASURE OF ASSOCIATION, TEST OF INDEPENDENCE, LOGISTIC REGRESSION AND POLYTOMOUS LOGISTIC REGRESSION

Page 2

DEFINITION

• Categorical data are data whose measurement scale consists of a set of categories.

• E.g. marital status: never married, married, divorced, widowed (nominal)

• E.g. attitude toward some policy: strongly disapprove, disapprove, approve, strongly approve (ordinal)

• Some visualization techniques: jittering, mosaic plots, bar plots, etc.

• Correlation between ordinal or nominal measurements is usually referred to as association.

Page 3

MEASURE OF ASSOCIATION

Page 4

ODDS RATIO

Page 5

ODDS RATIO - EXAMPLE

• Chinook salmon captured in 1999

• VARIABLES:

- SEX: M or F (nominal)
- MODE OF CAPTURE: Hook&line or Net (nominal)
- RUN: Early run (before July 1) or Late run (after July 1) (ordinal)
- AGE: interval (continuous variable)
- LENGTH (eye to fork of tail, in mm): interval (continuous variable)

• What are the odds that a captured fish is a female? Consider Success = Female (because they are heavier).

Page 6

CHINOOK SALMON EXAMPLE

For Hook&Line:

For Net:

$$\text{Estimated OR} = \widehat{OR} = \frac{\text{Odds of capturing a female fish with hook\&line}}{\text{Odds of capturing a female fish with net}} = \frac{1.45}{0.82} = 1.77$$

The odds that a captured fish is female are 77% ((1.77-1)=0.77) higher with hook&line compared to net.

Page 7

ODDS RATIO

• In general

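$$\widehat{OR} = \frac{\dfrac{Y_{11}/(Y_{11}+Y_{12})}{Y_{12}/(Y_{11}+Y_{12})}}{\dfrac{Y_{21}/(Y_{21}+Y_{22})}{Y_{22}/(Y_{21}+Y_{22})}} = \frac{Y_{11}/Y_{12}}{Y_{21}/Y_{22}} = \frac{Y_{11}Y_{22}}{Y_{12}Y_{21}}$$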

Page 8

INTERPRETATION OF OR

• What does OR=1 mean?

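$$OR = \frac{P(\text{Success}\mid\text{Condition}_1)/P(\text{Failure}\mid\text{Condition}_1)}{P(\text{Success}\mid\text{Condition}_2)/P(\text{Failure}\mid\text{Condition}_2)} = 1 \;\Rightarrow\; \frac{P(\text{Success}\mid C_1)}{P(\text{Failure}\mid C_1)} = \frac{P(\text{Success}\mid C_2)}{P(\text{Failure}\mid C_2)} \;\Rightarrow\; \text{independent events}$$

The odds of success are equal under both conditions, e.g. no matter which mode of capture is used.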

Page 9

INTERPRETATION OF OR

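• OR>1

$$\frac{P(\text{Success}\mid C_1)}{P(\text{Failure}\mid C_1)} > \frac{P(\text{Success}\mid C_2)}{P(\text{Failure}\mid C_2)}$$

The odds of success are higher with condition 1.

• OR<1

$$\frac{P(\text{Success}\mid C_1)}{P(\text{Failure}\mid C_1)} < \frac{P(\text{Success}\mid C_2)}{P(\text{Failure}\mid C_2)}$$

The odds of success are lower with condition 1.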

Page 10

SHAPE OF OR

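• The range of OR is: $0 \le OR < \infty$

• ln(OR) has a more symmetric distribution than OR (i.e., it is closer to a normal distribution).

• $OR = 1 \Leftrightarrow \ln(OR) = 0$

• $(1-\alpha)100\%$ confidence interval for ln(OR):

$$\ln(\widehat{OR}) \pm z_{\alpha/2}\sqrt{\frac{1}{Y_{11}}+\frac{1}{Y_{12}}+\frac{1}{Y_{21}}+\frac{1}{Y_{22}}} = (A, B), \text{ say}$$

• $(1-\alpha)100\%$ confidence interval for OR (non-symmetric):

$$(\exp(A), \exp(B))$$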

Page 11

CHINOOK SALMON EXAMPLE (Contd.)

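$$\ln(\widehat{OR}) = \ln(1.77) = 0.571$$

$$SE\left(\ln(\widehat{OR})\right) = \sqrt{\frac{1}{172}+\frac{1}{119}+\frac{1}{165}+\frac{1}{202}} = 0.1589$$

$$95\%\ \text{CI for } \ln(OR):\quad 0.571 \pm 1.96(0.1589) = (0.2590,\ 0.8820)$$

$$95\%\ \text{CI for } OR:\quad \left(e^{0.2590},\ e^{0.8820}\right) = (1.3,\ 2.42)$$

• The odds that a captured fish is female are about 30 to 140% greater with hook&line than with using a net.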
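The interval arithmetic in this example can be checked with a short Python sketch. It assumes the four counts in the standard-error term are females and males caught by hook&line (172, 119) and by net (165, 202); that assignment is an inference, chosen because it reproduces the odds 1.45 and 0.82 quoted earlier. The function name `odds_ratio_ci` is illustrative only.

```python
import math

# Assumed cell assignment (inferred, not stated in the slides):
# hook&line: 172 females, 119 males; net: 165 females, 202 males.
y11, y12 = 172, 119   # hook&line: female, male
y21, y22 = 165, 202   # net: female, male

def odds_ratio_ci(y11, y12, y21, y22, z=1.96):
    """Return (OR, lower, upper) using the normal interval on the log scale."""
    or_hat = (y11 * y22) / (y12 * y21)
    se_log = math.sqrt(1 / y11 + 1 / y12 + 1 / y21 + 1 / y22)
    log_or = math.log(or_hat)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return or_hat, lower, upper

or_hat, lower, upper = odds_ratio_ci(y11, y12, y21, y22)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around the estimated odds ratio.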

Page 12

OTHER MEASURES OF ASSOCIATION FOR 2X2 TABLES

• Relative Risk:

$$RR = \frac{Y_{11}/(Y_{11}+Y_{12})}{Y_{21}/(Y_{21}+Y_{22})}$$

• Yule's Q:

$$Q = \frac{OR-1}{OR+1}, \qquad -1 \le Q \le 1$$

When Q = 0, events are independent. Interpretation is close to correlation.

Page 13

MEASURE OF ASSOCIATION FOR IxJ TABLES

• Pearson χ² in contingency tables:

$$\chi^2 = \sum_{i=1}^{I}\sum_{j=1}^{J}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$

• EXAMPLE: Instrument Failure

                         Location of Failure
                         L1     L2     L3     TOTAL
Type of Failure    T1    50     16     31     97
                   T2    61     26     16     103
TOTAL                    111    42     47     200

Page 14

PEARSON χ² IN CONTINGENCY TABLES

• Question: Are the type of failure and location of failure independent?

H0: Type and location are independent

H1: They are not independent

• We will compare sample frequencies (i.e. observed values) with the expected frequencies under H0.

• Remember that if events A and B are independent, then P(A∩B)=P(A)P(B).

• If type and location are independent, then P(T1 and L1)=P(T1)P(L1)=(97/200)(111/200)

Page 15

PEARSON χ² IN CONTINGENCY TABLES

• Cells ~ Multinomial(n, p1, …, p6) ⇒ E(Xi) = n·pi

• Expected Frequency=E11=n.Prob.=200(97/200)(111/200)=53.84

• E12=200(97/200)(42/200)=20.37

• E13=(97*47/200)=22.8

• E21=(103*111/200)=57.17

• E22=(103*42/200)=21.63

• E23=(103*47/200)=24.2

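$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{3}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = \frac{(50-53.84)^2}{53.84} + \cdots + \frac{(16-24.2)^2}{24.2} = 8.086$$

$$\text{with } df = (\#\text{row}-1)(\#\text{column}-1) = (2-1)(3-1) = 2$$

$$\chi^2 = 8.086 > \chi^2_{0.05,\,2} = 5.99 \;\Rightarrow\; \text{Reject } H_0. \text{ They are not independent.}$$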
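The expected counts and the test statistic for the instrument-failure table can be reproduced with a few lines of Python. This is only a sketch: the function name `pearson_chi_square` is made up for illustration, and the critical value 5.99 is taken from the worked example rather than computed.

```python
# Observed counts: rows are failure types T1, T2; columns are locations L1..L3.
observed = [[50, 16, 31],
            [61, 26, 16]]

def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence, E_ij = (row total * column total) / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

chi2, df, expected = pearson_chi_square(observed)
CRITICAL_5PCT_DF2 = 5.99  # tabulated value quoted in the example, not computed here
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {chi2 > CRITICAL_5PCT_DF2}")
```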

Page 16

CRAMER’S V

• It adjusts the Pearson χ² with n, I and J.

$$\text{Cramer's } V = \sqrt{\frac{\chi^2/n}{\min(I-1,\ J-1)}}, \qquad 0 \le V \le 1$$

In the previous example,

$$V = \sqrt{\frac{8.086/200}{\min(1,\ 2)}} = 0.2$$
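As a quick check of this adjustment, the sketch below plugs the example's χ² = 8.086, n = 200 and a 2x3 table into the formula. The helper name `cramers_v` is hypothetical.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt((chi2 / n) / min(I - 1, J - 1))."""
    return math.sqrt((chi2 / n) / min(n_rows - 1, n_cols - 1))

v = cramers_v(chi2=8.086, n=200, n_rows=2, n_cols=3)
print(f"Cramer's V = {v:.2f}")
```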

Page 17

CORRELATION BETWEEN ORDINAL VARIABLES

• Correlation coefficients are used to quantitatively describe the strength and direction of a relationship between two variables.

• When both variables are at least interval measurements, one may report the Pearson product-moment coefficient of correlation, also known as the correlation coefficient and denoted by ‘r’.

• Pearson correlation coefficient is only appropriate to describe linear correlation. The appropriateness of using this coefficient could be examined through scatter plots.

• A statistic that measures the correlation between two ‘rank’ measurements is Spearman’s ρ , a nonparametric analog of Pearson’s r.

• Spearman’s ρ is appropriate for skewed continuous or ordinal measurements. It can also be used to determine the relationship between one continuous and one ordinal variable.

• Statistical tests are available to test hypotheses on ρ. H0: There is no correlation between the two variables (H0: ρ = 0).

Page 18

MEASURES OF ASSOCIATION FOR IxJ TABLES FOR TWO ORDINAL VARIABLES

Page 19

• Why are there multiple measures of association?

• Statisticians over the years have thought of varying ways of characterizing what a perfect relationship is:

[Two example tables: one labelled tau-b = 1, gamma = 1 and one labelled tau-b < 1, gamma = 1. Only their cell counts survive in this transcript (55, 10, 25, 3, 7, 30 and 55, 35, 40); the table layout does not.]

Either of these might be considered a perfect relationship, depending on one’s reasoning about what relationships between variables look like.

Page 20

I’m so confused!!

Page 21

Rule of Thumb

• Gamma tends to overestimate strength but gives an idea of upper boundary.

• If table is square use tau-b; if rectangular, use tau-c.

• Pollock (and we agree):

τ <.1 is weak; .1<τ<.2 is moderate; .2<τ<.3 moderately strong; .3< τ<1 strong.
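To see why gamma can overstate strength relative to tau-b, the sketch below counts concordant and discordant pairs in two 3x3 tables. Both layouts are assumptions made for illustration (a purely diagonal table and a hypothetical "staircase" table), not the tables from the slides. Gamma ignores tied pairs, while tau-b penalizes them.

```python
import math

def concordance_measures(table):
    """Return (gamma, tau_b) for a two-way table of ordered categories."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):        # only rows below row i
                for l in range(n_cols):
                    if l > j:                     # higher on both variables
                        concordant += table[i][j] * table[k][l]
                    elif l < j:                   # higher on one, lower on the other
                        discordant += table[i][j] * table[k][l]
    n = sum(sum(row) for row in table)
    total_pairs = n * (n - 1) / 2
    row_ties = sum(r * (r - 1) / 2 for r in (sum(row) for row in table))
    col_ties = sum(c * (c - 1) / 2 for c in (sum(col) for col in zip(*table)))
    gamma = (concordant - discordant) / (concordant + discordant)
    tau_b = (concordant - discordant) / math.sqrt(
        (total_pairs - row_ties) * (total_pairs - col_ties))
    return gamma, tau_b

# Hypothetical layouts, chosen only to illustrate the two notions of "perfect".
diagonal = [[55, 0, 0], [0, 35, 0], [0, 0, 40]]
staircase = [[55, 10, 0], [0, 25, 7], [0, 0, 30]]

gamma_d, tau_d = concordance_measures(diagonal)
gamma_s, tau_s = concordance_measures(staircase)
print(f"diagonal:  gamma = {gamma_d:.3f}, tau-b = {tau_d:.3f}")
print(f"staircase: gamma = {gamma_s:.3f}, tau-b = {tau_s:.3f}")
```

In the staircase table no pair is discordant, so gamma reaches 1, but the tied pairs keep tau-b below 1.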

Page 22

MEASUREMENT OF AGREEMENT FOR IxI TABLES

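[Table: I×I cross-classification of ratings, with rows Judge 1 (A, B, C) and columns Judge 2 (A, B, C)]

$$\hat\kappa = \frac{\sum_i \hat\pi_{ii} - \sum_i \hat\pi_{i\cdot}\hat\pi_{\cdot i}}{1 - \sum_i \hat\pi_{i\cdot}\hat\pi_{\cdot i}}, \qquad \hat\kappa = 1 \text{ (perfect agreement)}$$

where $\sum_i \hat\pi_{ii}$ is the probability of agreement.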

Page 23

EXAMPLE (COHEN’S KAPPA or Index of Inter-rater Reliability)

• Two pathologists examined 118 samples and categorized them into 4 groups. Below is the 4x4 table for their decisions.

Page 24

EXAMPLE (Contd.)

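$$\sum_{i=1}^{4}\hat\pi_{ii} = \frac{22+7+36+10}{118} = 0.636$$

$$\sum_{i=1}^{4}\hat\pi_{i\cdot}\hat\pi_{\cdot i} = \frac{26\cdot 27 + 26\cdot 12 + 38\cdot 69 + 28\cdot 10}{118^2} = 0.281$$

$$\hat\kappa = \frac{0.636-0.281}{1-0.281} = 0.493$$

The difference between the observed agreement and that expected under independence is about 50% of the maximum possible difference.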
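The kappa arithmetic can be reproduced from the diagonal counts and marginal totals that appear in the calculation. In the sketch below, which totals belong to which pathologist is an assumption (it does not affect the result), and `cohens_kappa` is an illustrative helper name.

```python
n = 118
agree = [22, 7, 36, 10]          # samples both pathologists placed in category i
row_totals = [26, 26, 38, 28]    # assumed: category totals for the first pathologist
col_totals = [27, 12, 69, 10]    # assumed: category totals for the second pathologist

def cohens_kappa(agree, row_totals, col_totals, n):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    p_observed = sum(agree) / n
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

kappa = cohens_kappa(agree, row_totals, col_totals, n)
print(f"kappa = {kappa:.3f}")
```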

Page 25

EVALUATION OF KAPPA

• If the obtained K is less than .70 -- conclude that the inter-rater reliability is not satisfactory.

• If the obtained K is greater than .70 -- conclude that the inter-rater reliability is satisfactory.

• Interpretation of kappa, after Landis and Koch (1977)

Page 26

PROBABILITY MODELS FOR CATEGORICAL DATA

• Bernoulli/Binomial
• Multinomial
• Poisson
• …

Page 27

TEST ON PROPORTIONS AND CONFIDENCE INTERVALS

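• You are already familiar with tests for proportions:

[Equation not recoverable from the transcript: three alternative null hypotheses on proportions, of the form "H0: … or H0: … or H0: …"]

• Pearson χ² or deviance G² test

• CI for Y=0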

Page 28

CONFIDENCE INTERVAL FOR A PROPORTION

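• For large sample sizes, we can use the normal approximation to the binomial (np ≥ 5 and n(1−p) ≥ 5).

$$\text{Point estimate: } \hat p = \frac{Y}{n}$$

$$\text{CI for } p:\quad \hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}$$

• If np<5 or n(1−p)<5, the normal approximation is not realistic.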

Page 29

CONFIDENCE INTERVAL FOR A PROPORTION

• Consider Y=0 in n trials. Then, $\hat p = Y/n = 0$.

• Normal-approximation CI:

$$0 \pm 1.96\sqrt{\frac{0(1-0)}{n}} = (0,\ 0)$$

No matter what n is! But observing 0 successes in 1 trial or in 100 trials is different. Note that np=0<5.
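A small Python sketch makes the degenerate behaviour concrete: with zero successes the normal-approximation interval collapses to a single point regardless of the number of trials. The function name `normal_approx_ci` is illustrative.

```python
import math

def normal_approx_ci(y, n, z=1.96):
    """Normal-approximation interval for a binomial proportion."""
    p_hat = y / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Zero successes: the interval has zero width for every sample size.
for n in (1, 5, 100):
    print(n, normal_approx_ci(0, n))
```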

Page 30

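EXACT CONFIDENCE INTERVALS (Collett, 1991, Modelling Binary Data)

• Lower Limit:

$$P_L = \frac{v_1}{v_1 + v_2\,F_{v_2,\,v_1,\,\alpha/2}}, \quad\text{where } v_1 = 2Y,\; v_2 = 2(n-Y+1)$$

• Upper Limit:

$$P_U = \frac{v_3\,F_{v_3,\,v_4,\,\alpha/2}}{v_4 + v_3\,F_{v_3,\,v_4,\,\alpha/2}}, \quad\text{where } v_3 = 2(Y+1),\; v_4 = 2(n-Y)$$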

Page 31

EXACT CONFIDENCE INTERVALS

• Going back to the example with Y=0.

• Let n=5.

• Y=0 ⇒ v1=0, v2=2(5+1)=12, v3=2, v4=2(5)=10

$$P_L = \frac{0}{0 + 12\,F_{12,\,0,\,0.025}} = 0$$

$$P_U = \frac{2\,F_{2,\,10,\,0.025}}{10 + 2\,F_{2,\,10,\,0.025}} = \frac{2(5.46)}{10 + 2(5.46)} = 0.5218$$
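The same limits can be obtained without F tables. The sketch below uses the equivalent definition of the exact interval through binomial tail probabilities (the lower limit solves P(X ≥ Y) = α/2 and the upper limit solves P(X ≤ Y) = α/2) and finds the roots by bisection. This is a swapped-in numerical route rather than the F-based formula from the slides, and the helper names are made up for illustration.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))

def bisect(func, lo=0.0, hi=1.0, iterations=100):
    """Root of a function that is positive at lo and negative at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(y, n, alpha=0.05):
    """Exact interval for a binomial proportion from tail probabilities."""
    if y == 0:
        lower = 0.0
    else:
        # Lower limit: P(X >= y | p) = alpha / 2
        lower = bisect(lambda p: alpha / 2 - (1 - binom_cdf(y - 1, n, p)))
    if y == n:
        upper = 1.0
    else:
        # Upper limit: P(X <= y | p) = alpha / 2
        upper = bisect(lambda p: binom_cdf(y, n, p) - alpha / 2)
    return lower, upper

lower5, upper5 = exact_ci(0, 5)
lower100, upper100 = exact_ci(0, 100)
print(f"n = 5:   ({lower5:.4f}, {upper5:.4f})")
print(f"n = 100: ({lower100:.4f}, {upper100:.4f})")
```

Unlike the normal approximation, this interval narrows as n grows even when no successes are observed.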

Page 32

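LOGISTIC REGRESSION

• Used to analyze the relationship between a binary outcome Y and a set of explanatory variables.

• Assumptions of linear models do not hold.

• Assume $Y_i \sim Ber(\pi_i)$. Then, $E(Y_i) = \pi_i = P(Y_i=1)$ and $P(Y_i=0) = 1-\pi_i$.

• Logistic regression is defined as:

$$\text{logit}(\pi_i) = \log\frac{\pi_i}{1-\pi_i} = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$$

where $\pi_i/(1-\pi_i)$ is the odds of $P(Y_i=1)$ to $P(Y_i=0)$, so the log odds is expressed as a function of the x’s.

$$E(Y_i) = \pi_i = \frac{\exp(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi})}$$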

Page 33

Binary Logistic Regression

• Logistic Distribution

• Transformed, however, the “log odds” are linear.

[Figure: two panels, P(Y=1) plotted against x and ln[p/(1-p)] plotted against x]

Page 34

INTERPRETATION OF PARAMETERS

• Consider p=1. Let X*=X+1 (i.e., a one-unit increase in X). Then, the odds ratio is:

$$\frac{\pi_i^*/(1-\pi_i^*)}{\pi_i/(1-\pi_i)} = \frac{e^{\beta_0+\beta_1(x_i+1)}}{e^{\beta_0+\beta_1 x_i}} = e^{\beta_1}$$

• exp(β1): the odds ratio for a 1-unit change in X

• β1: the log-odds ratio for a 1-unit change in X

Page 35

MULTIPLE LOGISTIC REGRESSION

Page 36

ESTIMATION OF PARAMETERS

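$Y_i \sim Ber(\pi_i)$.

$$L = \prod_{i=1}^{n} f(y_i) = \prod_{i=1}^{n} \pi_i^{y_i}(1-\pi_i)^{1-y_i}$$

$$\ln L = \sum_{i=1}^{n}\left[y_i\ln\pi_i + (1-y_i)\ln(1-\pi_i)\right] = \sum_{i=1}^{n}\left[y_i\ln\frac{\pi_i}{1-\pi_i} + \ln(1-\pi_i)\right]$$

$$\ln L = \sum_{i=1}^{n} y_i\left(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}\right) - \sum_{i=1}^{n}\ln\left(1 + e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}}\right)$$

$$\frac{\partial \ln L}{\partial \beta_0} = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n}\frac{e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}}}{1 + e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}}} = 0$$

$$\frac{\partial \ln L}{\partial \beta_k} = \sum_{i=1}^{n} x_{ki}\,y_i - \sum_{i=1}^{n} x_{ki}\,\frac{e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}}}{1 + e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}}} = 0, \qquad k = 1, 2, \ldots, p$$

Nonlinear equations in the β’s. No closed form. Need iterative methods on a computer!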
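One common iterative method is Newton-Raphson, sketched below for a single predictor. The data are hypothetical (not from the lecture), and the slides do not say which algorithm a given package uses, so treat this as one possible illustration of solving the score equations numerically.

```python
import math

# Hypothetical data: one predictor x and a binary response y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

def score_and_information(b0, b1):
    """Gradient of ln L and the information matrix (negative Hessian) at (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        pi = 1 / (1 + math.exp(-(b0 + b1 * xi)))
        g0 += yi - pi                 # d lnL / d b0
        g1 += xi * (yi - pi)          # d lnL / d b1
        w = pi * (1 - pi)
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)

def fit_logistic(max_iter=25, tol=1e-10):
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = score_and_information(b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

b0_hat, b1_hat = fit_logistic()
(g0, g1), _ = score_and_information(b0_hat, b1_hat)
print(f"b0 = {b0_hat:.4f}, b1 = {b1_hat:.4f}, score = ({g0:.2e}, {g1:.2e})")
```

At the solution both score equations are (numerically) zero, which is exactly the condition written out above.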

Page 37

MODEL CHECK

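• Since the errors, $\varepsilon_i$, take only two values in logistic regression, “usual” residuals will not help with model checks. But there are “deviance residuals” in this case.

Deviance for subject i:

$$dev_i = \pm\sqrt{-2\,(\text{log-likelihood contribution of subject } i)} = \pm\sqrt{-2\left[y_i\ln\hat\pi_i + (1-y_i)\ln(1-\hat\pi_i)\right]}$$

where the sign is that of $y_i - \hat\pi_i$.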

Page 38

MODEL CHECK

• You can plot $dev_i$ vs. i, which is called the index plot of deviance residuals, to identify outlying residuals. But this plot does not indicate whether these residuals should be treated as outliers.

• There are also analogues of common methods used for linear regression, such as leverage values and influence diagnostics (DFFITS, Cook’s distance)…

• NOTE: An alternative for predicting binary response is discriminant analysis. However, this approach assumes X’s are jointly distributed as multivariate normal distribution. So, it is more reasonable when X’s are continuous. Otherwise, logistic regression should be preferred.

Page 39

Binary Logistic Regression

• A researcher is interested in the likelihood of gun ownership in the US, and what would predict that.

• He uses the 2002 GSS to test the following research hypotheses:

1. Men are more likely to own guns than women
2. The older persons are, the more likely they are to own guns
3. White people are more likely to own guns than those of other races
4. The more educated persons are, the less likely they are to own guns

Page 40

Binary Logistic Regression

• Variables are measured as such:

Dependent:

Havegun: no gun = 0, own gun(s) = 1

Independent:

1. Sex: men = 0, women = 1
2. Age: entered as number of years
3. White: all other races = 0, white = 1
4. Education: entered as number of years

SPSS: Analyze → Regression → Binary Logistic. Enter your variables and, for the output below, under Options, I checked “iteration history”.

Page 41

Binary Logistic Regression

SPSS Output: Some descriptive information first…

Page 42

Binary Logistic Regression

SPSS Output: Some descriptive information first…

Maximum likelihood process stops at third iteration and yields an intercept (-.625) for a model with no predictors. A measure of fit, -2 Log likelihood, is generated. The equation producing this:

$$-2\sum_i\left(Y_i\ln[P(Y_i)] + (1-Y_i)\ln[1-P(Y_i)]\right)$$

This is simply the relationship between observed values for each case in your data and the model’s prediction for each case. The “negative 2” makes this number distribute as a χ² distribution. In a perfect model, -2 log likelihood would equal 0. Therefore, lower numbers imply better model fit.

Page 43

Binary Logistic Regression

Originally, the “best guess” for each person in the data set is 0, have no gun!

This is the model for log odds when any other potential variable equals zero (null model). It predicts: P = .651, like above: $1/(1+e^{a})$, or 1/(1 + .535).

Real P = .349

If you added each…
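The null-model numbers on this slide follow directly from the reported intercept of -.625, as the short check below shows. Variable names are illustrative.

```python
import math

intercept = -0.625  # intercept reported for the model with no predictors

odds_gun = math.exp(intercept)        # odds of owning a gun
p_gun = odds_gun / (1 + odds_gun)     # P(own gun)
p_no_gun = 1 / (1 + odds_gun)         # P(no gun)
print(f"odds = {odds_gun:.3f}, P(gun) = {p_gun:.3f}, P(no gun) = {p_no_gun:.3f}")
```

Since P(no gun) exceeds one half, the "best guess" for every case under the null model is 0, no gun.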

Page 44

Binary Logistic Regression

Next are iterations for our full model…

Page 45

Binary Logistic Regression

Goodness-of-fit statistics for new model come next…

Test of new model vs. intercept-only model (the null model), based on the difference of -2LL of each. The difference has a χ² distribution. Is new -2LL significantly smaller?

The -2LL number is “ungrounded,” but it has a χ² distribution. Smaller is better. In a perfect model, -2 log likelihood would equal 0.

These are attempts to replicate R² using information based on -2 log likelihood (Cox & Snell, C&S, cannot equal 1).

$$-2\sum_i\left(Y_i\ln[P(Y_i)] + (1-Y_i)\ln[1-P(Y_i)]\right)$$

Assessment of new model’s predictions

Page 46

Binary Logistic Regression

Interpreting Coefficients…

ln[p/(1-p)] = a + b1X1 + b2X2 + b3X3 + b4X4

[Figure: SPSS coefficient table with callouts marking a, b1 to b4, X1 to X4 and $e^b$]

Being male, getting older, and being white have a positive effect on likelihood of owning a gun. On the other hand, education does not affect owning a gun.

Which b’s are significant?

Page 47

Binary Logistic Regression

• ln[p/(1-p)] = a + b1X1 + … + bkXk is the power to which you need to raise e to get p/(1-p). So…

$$\frac{p}{1-p} = e^{a + b_1X_1 + \cdots + b_kX_k}$$

• Plug in values of x to get the odds ( = p/(1-p)).

The coefficients can be manipulated as follows:

$$\text{Odds} = \frac{p}{1-p} = e^{a+b_1X_1+b_2X_2+b_3X_3+b_4X_4} = e^{a}(e^{b_1})^{X_1}(e^{b_2})^{X_2}(e^{b_3})^{X_3}(e^{b_4})^{X_4}$$

$$\text{Odds} = \frac{p}{1-p} = e^{a+.898X_1+.008X_2+1.249X_3-.056X_4} = e^{-1.864}(e^{.898})^{X_1}(e^{.008})^{X_2}(e^{1.249})^{X_3}(e^{-.056})^{X_4}$$

Page 48

Binary Logistic Regression

The coefficients can be manipulated as follows:

$$\text{Odds} = \frac{p}{1-p} = e^{a+b_1X_1+b_2X_2+b_3X_3+b_4X_4} = e^{a}(e^{b_1})^{X_1}(e^{b_2})^{X_2}(e^{b_3})^{X_3}(e^{b_4})^{X_4}$$

$$\text{Odds} = \frac{p}{1-p} = e^{-2.246-.780X_1+.020X_2+1.618X_3-.023X_4} = e^{-2.246}(e^{-.780})^{X_1}(e^{.020})^{X_2}(e^{1.618})^{X_3}(e^{-.023})^{X_4}$$

Each coefficient changes the odds by a multiplicative amount; the amount is $e^b$. “Every unit increase in X multiplies the odds by $e^b$.”

In the example above, $e^b$ = Exp(B) in the last column.

Page 49

Binary Logistic Regression

Each coefficient changes the odds by a multiplicative amount; the amount is $e^b$. “Every unit increase in X multiplies the odds by $e^b$.”

In the example above, $e^b$ = Exp(B) in the last column.

For Sex: $e^{-.780} = .458$ … If you subtract 1 from this value, you get the proportional increase (or decrease) in the odds associated with being female, -.542. In percent terms, the odds of owning a gun decrease 54.2% for women.

Age: $e^{.020} = 1.020$ … A year increase in age increases the odds of owning a gun by 2%.

White: $e^{1.618} = 5.044$ … Being white increases the odds of owning a gun by 404%.

Educ: $e^{-.023} = .977$ … Not significant

Page 50

Binary Logistic Regression

Age: $e^{.020} = 1.020$ … A year increase in age increases the odds of owning a gun by 2%.

How would 10 years’ increase in age affect the odds? Recall $(e^b)^X$ is the equation component for a variable. For 10 years, $(1.020)^{10} = 1.219$. The odds jump by 22% for ten years’ increase in age.

Note: You’d have to know the current prediction level for the dependent variable to know if this percent change is actually making a big difference or not!
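The conversions from coefficients to odds ratios and percent changes can be scripted as below. The coefficient values are those quoted in the example; the dictionary labels are illustrative. Because the code exponentiates the rounded coefficients, the last digit can differ slightly from the Exp(B) column of the SPSS output.

```python
import math

# Coefficients quoted in the example (sex is coded women = 1).
coefficients = {"sex (women = 1)": -0.780, "age": 0.020,
                "white": 1.618, "education": -0.023}

for name, b in coefficients.items():
    odds_ratio = math.exp(b)
    percent_change = (odds_ratio - 1) * 100
    print(f"{name}: exp(b) = {odds_ratio:.3f}, change in odds = {percent_change:+.1f}%")

# Ten-year change in age: (e^b)^10 is the same as e^(10 b).
ten_year_factor = math.exp(0.020) ** 10
print(f"10 years of age multiply the odds by {ten_year_factor:.3f}")
```

Working from the rounded value 1.020 gives 1.219, as on the slide; the unrounded exponential gives about 1.221. Either way the ten-year increase is roughly 22%.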

Page 51

Binary Logistic Regression

For our problem,

$$P = \frac{e^{-2.246-.780X_1+.020X_2+1.618X_3-.023X_4}}{1 + e^{-2.246-.780X_1+.020X_2+1.618X_3-.023X_4}}$$

For a man, 30, Latino, with 12 years of education, what does P equal?

Let’s solve for

$$e^{-2.246-.780X_1+.020X_2+1.618X_3-.023X_4} = e^{-2.246-.780(0)+.020(30)+1.618(0)-.023(12)}$$

$$e^{-2.246-0+.6+0-.276} = e^{-1.922} = 2.71828^{-1.922} = .146$$

Therefore,

$$P = \frac{.146}{1.146} = .127$$

The probability that the 30-year-old Latino man with 12 years of education will own a gun is .127! Or you could say there is a 12.7% chance.
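The same prediction can be written as a small function, which makes it easy to try other covariate profiles. The coefficients are the ones used in the example; the function and argument names are illustrative.

```python
import math

# Coefficients from the example: intercept, sex (women = 1), age, white, education.
a, b_sex, b_age, b_white, b_educ = -2.246, -0.780, 0.020, 1.618, -0.023

def predicted_probability(sex, age, white, educ):
    """P(own gun) from the fitted logistic equation."""
    linear_predictor = a + b_sex * sex + b_age * age + b_white * white + b_educ * educ
    odds = math.exp(linear_predictor)
    return odds / (1 + odds)

# Man (sex = 0), 30 years old, not white (white = 0), 12 years of education.
p = predicted_probability(sex=0, age=30, white=0, educ=12)
print(f"P(own gun) = {p:.3f}")
```

Without intermediate rounding the probability comes out at roughly .128; the slide's .127 results from rounding the odds to .146 before dividing.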

Page 52

Binary Logistic Regression

Inferential statistics are as before:

• In model fit, if χ2 test is significant, the expanded model (with your variables), improves prediction.

• This Chi-squared test tells us that as a set, the variables improve classification.

Page 53

Binary Logistic Regression

Inferential statistics are as before:

• The significance of the coefficients is determined by a “Wald test.” The Wald statistic is χ² with 1 df and equals the square of a two-tailed t (z) statistic, with exactly the same p-value.

Page 54

Binary Logistic Regression

So how would I do hypothesis testing? An Example:

1. Significance test for α-level = .05
2. Critical χ² (df = 1) = 3.84
3. To find if there is a significant slope in the population, H0: β = 0; Ha: β ≠ 0
4. Collect data
5. Calculate Wald, like t (z): $t = (b - \beta_0)/\text{s.e.}$ (1.96 × 1.96 = 3.84)
6. Make decision about the null hypothesis
7. Find p-value

Reject the null for male, age, and white. Fail to reject the null for education. If the education coefficient were 0 in the population, there would be a 24.2% chance of obtaining a sample coefficient at least as far from 0 as the one observed.
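The link between the z statistic, the Wald χ² and the p-value can be sketched as follows. The coefficient and standard error passed in are hypothetical values chosen so that z = 1.96; the slides do not list the standard errors.

```python
import math

def wald_test(b, se, beta_null=0.0):
    """Return (z, Wald chi-square, two-sided p-value) for one coefficient."""
    z = (b - beta_null) / se
    wald = z ** 2
    # With 1 df, P(chi-square > z^2) equals the two-sided normal tail probability.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, wald, p_value

# Hypothetical inputs giving z = 1.96, the boundary case at the .05 level.
z, wald, p_value = wald_test(b=1.96, se=1.0)
print(f"z = {z:.2f}, Wald = {wald:.2f}, p = {p_value:.3f}")
```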

Page 55

EXTENSIONS OF LOGISTIC REGRESSION

Page 56

MULTINOMIAL LOGISTIC REGRESSION

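• There are many ways of constructing polytomous regression.

1. Logistic regression with respect to a baseline category (e.g. last category). For nominal response:

$$\log\frac{\pi_{1i}}{\pi_{Ji}} = \beta_{01} + \beta_{11}x_{1i} + \cdots + \beta_{p1}x_{pi}$$

$$\log\frac{\pi_{2i}}{\pi_{Ji}} = \beta_{02} + \beta_{12}x_{1i} + \cdots + \beta_{p2}x_{pi}$$

$$\vdots$$

$$\log\frac{\pi_{J-1,i}}{\pi_{Ji}} = \beta_{0,J-1} + \beta_{1,J-1}x_{1i} + \cdots + \beta_{p,J-1}x_{pi}$$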
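Given the J−1 baseline-category logits, the category probabilities follow by exponentiating and normalizing. The sketch below uses made-up coefficients for J = 3 categories and one predictor; none of the numbers come from the lecture.

```python
import math

def baseline_category_probs(linear_predictors):
    """Probabilities for categories 1..J from logits of 1..J-1 against baseline J."""
    exp_terms = [math.exp(eta) for eta in linear_predictors]
    denominator = 1 + sum(exp_terms)
    probs = [term / denominator for term in exp_terms]
    probs.append(1 / denominator)   # baseline category J
    return probs

# Hypothetical coefficients (beta_0j, beta_1j) for j = 1, 2 and a single predictor.
betas = [(0.5, -0.2), (-1.0, 0.3)]
x_value = 2.0
etas = [b0 + b1 * x_value for b0, b1 in betas]
probs = baseline_category_probs(etas)
print([round(p, 3) for p in probs])
```

Taking the log of any probability divided by the baseline probability recovers the corresponding linear predictor, which is the defining property of this parameterization.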

Page 57

MULTINOMIAL LOGISTIC REGRESSION

2. Adjacent categories logits (for ordinal data):

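$$\log\frac{\pi_{1i}}{\pi_{2i}} = \beta_{01} + \beta_{11}x_{1i} + \cdots + \beta_{p1}x_{pi}$$

$$\log\frac{\pi_{2i}}{\pi_{3i}} = \beta_{02} + \beta_{12}x_{1i} + \cdots + \beta_{p2}x_{pi}$$

$$\vdots$$

$$\log\frac{\pi_{J-1,i}}{\pi_{Ji}} = \beta_{0,J-1} + \beta_{1,J-1}x_{1i} + \cdots + \beta_{p,J-1}x_{pi}$$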

Page 58

MULTINOMIAL LOGISTIC REGRESSION

3. Cumulative logits for ordinal variables.
4. Continuation-ratio logits for ordinal variables.
5. Proportional odds model for ordinal variables.

(See Agresti!)