Exact Logistic Regression Larry Cook

Upload: miranda-sparks

Post on 18-Dec-2015

Exact Logistic Regression

Larry Cook

Outline

• Review the logistic regression model

• Explore an example where model assumptions fail
– Brief algebraic interlude

• Explore an example with a different issue where logistic regression fails

• Computational considerations

• Example SAS code

Logistic Regression

• Model a binary outcome, Y, with one or more predictors
– Success/failure
– Disease/not disease

• Model outcome in terms of the log odds of a success

• $\log(\text{odds of } Y_i) = \beta_0 + \beta_1 x_i + \cdots$

Why Log Odds?

• Canonical link function
• Makes a binary outcome continuous
• Solves this problem
– Probability is constrained to [0,1]
– Odds are constrained to [0, ∞)
• Log odds are in (-∞, ∞)
• Exponentiating coefficients gives us estimates of odds ratios
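The three scales on this slide can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for this illustration.

```python
import math

def odds(p):
    """Odds of success for a probability p in [0, 1)."""
    return p / (1.0 - p)

def log_odds(p):
    """Logit: maps a probability in (0, 1) onto the whole real line."""
    return math.log(odds(p))

def inverse_logit(z):
    """Map a log odds value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities near 0 or 1 give large negative or positive log odds.
for p in (0.1, 0.5, 0.9):
    print(p, round(odds(p), 3), round(log_odds(p), 3))
```

Because the log odds are unbounded, a linear predictor can take any value and still map back to a valid probability through `inverse_logit`.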

Example: Motor Vehicle Crash Fatalities

• What are the odds of being hospitalized or killed in a motor vehicle crash for drivers using safety restraints vs. those who are not?
– Outcome: Hospitalized/killed or not
– Covariate: safety belt use

Hospital/Killed * Restraint Use

OR = 0.22, p-value < 0.001

Example: Motor Vehicle Crash Fatalities

• What are the odds of being hospitalized or killed in a motor vehicle crash for drivers using safety restraints vs. those who are not?
– Outcome: Hospitalized/killed or not
– Covariates: safety belt use, gender, age, alcohol, rural area

Logistic Regression Output

Parameter       Estimate   Odds Ratio   P-value
Intercept       -0.261                  < 0.001
Male            -0.576     0.56         < 0.001
Restraint Use   -1.430     0.24         < 0.001
Alcohol          1.065     2.90         < 0.001
Night            0.194     1.21         0.011
Rural            0.135     1.14         < 0.001
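As a quick check of how the two numeric columns relate, the following sketch exponentiates each estimate from the output above. The dictionary is simply a transcription of the slide's values, not a re-fit of the model.

```python
import math

# Coefficient estimates transcribed from the slide (log odds scale).
estimates = {
    "Male": -0.576,
    "Restraint Use": -1.430,
    "Alcohol": 1.065,
    "Night": 0.194,
    "Rural": 0.135,
}

# Exponentiating a coefficient gives the estimated odds ratio.
odds_ratios = {name: math.exp(beta) for name, beta in estimates.items()}

for name, value in odds_ratios.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these reproduce the Odds Ratio column of the output.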

Assumptions

• Conditional probabilities follow a logistic function of the independent variables

• Observations are independent

• Asymptotics
– Sample size is large enough
– Minimum of 50 to 100 observations
– 10 successes/failures per variable

Corneal Graft Rejections

• What if studying a rare disease?

• Data for eight kids in young age group and eight in the older age group

• Hypothesis is that rejection is more likely in older children

Graft Rejections

                       Young (< 4 y.o., X = 0)   Older (> 4 y.o., X = 1)   Total
No Rejection (Y = 0)   7                         2                         9
Rejection (Y = 1)      1                         6                         7
Total                  8                         8                         16

OR = 21, p-value = 0.012, 100% of cells have expected counts < 5!!!

Fisher’s Exact Test p-value (2-sided) = 0.0406; (1-sided) = 0.0203
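The summary numbers for this table can be reproduced with the standard library alone. This is a sketch under two assumptions: the reported p-value of 0.012 is taken to be an uncorrected Pearson chi-square test, and the two-sided Fisher p-value is taken to sum all tables no more likely than the observed one.

```python
import math

# Observed 2x2 table: rows are outcome, columns are age group.
a, b = 7, 2   # no rejection: young, older
c, d = 1, 6   # rejection:    young, older
n = a + b + c + d

odds_ratio = (a * d) / (b * c)

# Pearson chi-square statistic for a 2x2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# Upper tail of a chi-square distribution with 1 degree of freedom.
p_chi2 = math.erfc(math.sqrt(chi2 / 2.0))

def hypergeom(k):
    """P[k rejections among the older children | all margins fixed]."""
    return math.comb(b + d, k) * math.comb(a + c, (c + d) - k) / math.comb(n, c + d)

support = range(max(0, (c + d) - (a + c)), min(b + d, c + d) + 1)
p_one_sided = sum(hypergeom(k) for k in support if k >= d)
p_two_sided = sum(hypergeom(k) for k in support
                  if hypergeom(k) <= hypergeom(d) * (1 + 1e-9))

# Smallest expected cell count under independence.
min_expected = min(r * s / n for r in (a + b, c + d) for s in (a + c, b + d))
```

Every expected count is below 5 here, which is why the chi-square approximation is suspect and the exact test is preferred.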

Let’s Tackle the Graft Rejection Example as Logistic Regression

Graft Rejections

               Young (< 4 y.o.)   Older (> 4 y.o.)   Total
No Rejection   7                  2                  9
Rejection      1                  6                  7
Total          8                  8                  16

Sample size << 50!

Don’t have 10 successes or 10 failures!

Exact (Conditional) Logistic Regression

• Rather than using the unconditional logistic regression, we will condition on nuisance parameters

• Use conditional maximum likelihood for estimation and inference

Warning Algebra Ahead

Proceed with Caution

Logistic Model

Likelihood of a Sample

Sufficient Statistics
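The equations from the three algebra slides above did not survive in this transcript. A standard form of what they most likely showed, for a single covariate, is sketched below; the symbols $\beta_0$ (intercept), $\beta_1$ (slope) and $\pi_i$ are notation assumed here rather than taken from the slides.

```latex
\begin{align*}
\pi_i &= P(Y_i = 1 \mid x_i)
       = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \\[4pt]
L(\beta_0, \beta_1) &= \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}
       = \frac{\exp\left(\beta_0 \sum_i y_i + \beta_1 \sum_i x_i y_i\right)}
              {\prod_{i=1}^{n} \left(1 + e^{\beta_0 + \beta_1 x_i}\right)} \\[4pt]
t_0 &= \sum_i y_i, \qquad t_1 = \sum_i x_i y_i
\end{align*}
```

The data enter the likelihood only through $t_0$ and $t_1$, which is what makes them sufficient statistics for $\beta_0$ and $\beta_1$.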

Conditioning

• If we are only trying to describe the relationship between rejection and age, do we care about the value of the intercept?

• Remove the intercept, $\beta_0$, from the likelihood by conditioning on its sufficient statistic, $t_0 = \sum_i y_i$.

• Let $S(t_0)$ = the set of all tables with $\sum_i y_i = t_0$ and the observed sample sizes

Conditional Likelihood
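The slide's formula is missing from this transcript. The usual conditional likelihood for one covariate is sketched below, with $\beta_1$ denoting the slope and $c(t_0, u)$ denoting the number of outcome vectors with $\sum_i y_i = t_0$ and $\sum_i x_i y_i = u$; both symbols are assumptions made for this sketch.

```latex
\begin{align*}
P(T_1 = t_1 \mid T_0 = t_0)
  &= \frac{c(t_0, t_1)\, e^{\beta_1 t_1}}{\sum_{u} c(t_0, u)\, e^{\beta_1 u}}
\end{align*}
```

The intercept cancels from the ratio, so this distribution depends on $\beta_1$ alone. Maximizing it in $\beta_1$ gives the conditional maximum likelihood estimate, and setting $\beta_1 = 0$ gives the null distribution $c(t_0, t_1) / \sum_u c(t_0, u)$ used for exact p-values.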

Estimation

Inference

End of Algebra

Back to Example

Graft Rejections

                       Young (< 4 y.o., X = 0)   Older (> 4 y.o., X = 1)   Total
No Rejection (Y = 0)   7                         2                         9
Rejection (Y = 1)      1                         6                         7
Total                  8                         8                         16

Sufficient Statistics

$t_0 = \sum_i y_i$ = # of rejections = 7

$t_1 = \sum_i x_i y_i$ = 0*(# of rejections in young) + 1*(# of rejections in old) = 0*1 + 1*6 = 6

Conditional Distribution for Graft Rejection

• Need to calculate all possible tables that have exactly 7 rejections

• Calculate how often each of the tables occurs

• Calculate CMLE

• Calculate how rare our table is to obtain p-value
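The first two steps can be sketched in a few lines of Python. With 8 children per age group and 7 rejections in total, each table is identified by the number of rejections among the older children, and its count is a product of two binomial coefficients. Variable names are illustrative only.

```python
from math import comb

n_young, n_old, t0 = 8, 8, 7   # group sizes and total number of rejections

# Enumerate every table with exactly t0 rejections, indexed by
# t1 = number of rejections among the older children.
reference_set = []
for old_r in range(max(0, t0 - n_young), min(n_old, t0) + 1):
    young_r = t0 - old_r
    count = comb(n_young, young_r) * comb(n_old, old_r)
    reference_set.append((old_r, count))

total = sum(count for _, count in reference_set)

for t1, count in reference_set:
    print(t1, count, round(count / total, 4))
```

The total number of tables equals `comb(16, 7)`, since conditioning on 7 rejections among 16 children fixes only the row margin.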

Reference Set

Yng_NR   Yng_R   Old_NR   Old_R   t0   t1   Count    P[Table]
1        7       8        0       7    0    8        0.0007
2        6       7        1       7    1    224      0.0196
3        5       6        2       7    2    1,568    0.1371
4        4       5        3       7    3    3,920    0.3427
5        3       4        4       7    4    3,920    0.3427
6        2       3        5       7    5    1,568    0.1371
7        1       2        6       7    6    224      0.0196
8        0       1        7       7    7    8        0.0007
Total                             7         11,440   1.000

Estimate and Find a p-value

t1 Count P[Table]

0 8 0.0007

1 224 0.0196

2 1,568 0.1371

3 3,920 0.3427

4 3,920 0.3427

5 1,568 0.1371

6 224 0.0196

7 8 0.0007
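A minimal sketch of both calculations, using the counts in the table and the observed $t_1 = 6$. The two-sided p-value is obtained here by doubling the one-sided value, and the conditional MLE is found by bisection on the conditional score equation $E_\beta[T_1] = t_1$; the function names are hypothetical.

```python
import math

# Number of tables for each value of t1, given t0 = 7 (from the table above).
counts = {0: 8, 1: 224, 2: 1568, 3: 3920, 4: 3920, 5: 1568, 6: 224, 7: 8}
t1_obs = 6
total = sum(counts.values())

# Exact p-value under beta = 0: tables at least as extreme as the observed one.
p_one_sided = sum(c for t, c in counts.items() if t >= t1_obs) / total
p_two_sided = min(1.0, 2.0 * p_one_sided)

def conditional_mean(beta):
    """E[T1 | t0] under the conditional distribution with slope beta."""
    weights = {t: c * math.exp(beta * t) for t, c in counts.items()}
    norm = sum(weights.values())
    return sum(t * w for t, w in weights.items()) / norm

def conditional_mle(lo=-20.0, hi=20.0, tol=1e-10):
    """Solve E_beta[T1] = t1_obs; the mean is increasing in beta."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if conditional_mean(mid) < t1_obs:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

beta_hat = conditional_mle()
print(round(p_one_sided, 4), round(p_two_sided, 4), beta_hat, math.exp(beta_hat))
```

The score equation has a finite solution only when the observed $t_1$ lies strictly inside the range of possible values, as it does here.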

Confidence Interval

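• Lower bound, $\beta_1^-$
– If $t_1 = t_{1,\min}$, then $\beta_1^- = -\infty$
– Otherwise $\beta_1^-$ is the value of $\beta_1$ that produces an upper p-value of $\alpha/2$

• Upper bound, $\beta_1^+$
– If $t_1 = t_{1,\max}$, then $\beta_1^+ = \infty$
– Otherwise $\beta_1^+$ is the value of $\beta_1$ that produces a lower p-value of $\alpha/2$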
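The rule above can be sketched for the graft rejection data (observed $t_1 = 6$, significance level assumed to be 0.05). Each finite bound is found by bisection on the tail probability of the conditional distribution; the helper names are hypothetical.

```python
import math

# Number of tables for each t1 given t0 = 7 (graft rejection example).
counts = {0: 8, 1: 224, 2: 1568, 3: 3920, 4: 3920, 5: 1568, 6: 224, 7: 8}
t1_obs, alpha = 6, 0.05

def tail_probability(beta, side):
    """Upper tail P(T1 >= t1_obs) or lower tail P(T1 <= t1_obs) at slope beta."""
    # Exponents are centered at t1_obs to keep the weights well scaled.
    weights = {t: c * math.exp(beta * (t - t1_obs)) for t, c in counts.items()}
    norm = sum(weights.values())
    if side == "upper":
        kept = sum(w for t, w in weights.items() if t >= t1_obs)
    else:
        kept = sum(w for t, w in weights.items() if t <= t1_obs)
    return kept / norm

def solve_bound(side, target, lo=-30.0, hi=30.0):
    """Bisection for the beta at which the chosen tail equals target."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        value = tail_probability(mid, side)
        # The upper tail grows with beta; the lower tail shrinks with beta.
        beta_too_small = value < target if side == "upper" else value > target
        if beta_too_small:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

t_min, t_max = min(counts), max(counts)
beta_lower = -math.inf if t1_obs == t_min else solve_bound("upper", alpha / 2)
beta_upper = math.inf if t1_obs == t_max else solve_bound("lower", alpha / 2)

print(beta_lower, beta_upper, math.exp(beta_lower), math.exp(beta_upper))
```

Exponentiating the two bounds gives the corresponding interval on the odds ratio scale.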

Final Stats for Graft Rejection

Example 2

PECARN C-Spine Study

Case Control Study

Not Present Present Total

Control 1,057 2 1,059

Case 540 0 540

Total 1,597 2 1,599

Any problems estimating the odds ratio?

Could exact logistic regression help?

What sufficient statistics are needed?

                  Not Present (X = 0)   Present (X = 1)   Total
Control (Y = 0)   1,057                 2                 1,059
Case (Y = 1)      540                   0                 540
Total             1,597                 2                 1,599

• $\sum y = 2$
• $\sum xy = 0$

Conditional Density

Case P   Case NP   Ctrl P   Ctrl NP   t0   t1   Count       P[Table]
0        540       2        1,057     2    0    560,211     0.438
1        539       1        1,058     2    1    571,860     0.448
2        538       0        1,059     2    2    145,530     0.114
Total                                 2         1,277,601   1.000

One-sided p-value = 0.438

Two-sided p-value = 2*0.438 = 0.876

95% confidence interval (-∞, 2.345)

Point estimate?

Median Unbiased Estimate
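The slide's formula is not in this transcript. One common definition, assumed here, takes the median unbiased estimate to be the average of the two confidence bounds computed with both tail probabilities set to 0.5; when the observed statistic is at its minimum, as in the C-Spine table, only the upper bound is finite and it serves as the estimate. A sketch under that assumption:

```python
import math

# Reference set for the C-Spine example: tables by number of cases with the factor present.
counts = {0: 560211, 1: 571860, 2: 145530}
t1_obs = 0

def lower_tail(beta):
    """P(T1 <= t1_obs) under the conditional distribution with slope beta."""
    weights = {t: c * math.exp(beta * t) for t, c in counts.items()}
    return sum(w for t, w in weights.items() if t <= t1_obs) / sum(weights.values())

# t1_obs is the smallest possible value, so solve lower_tail(beta) = 0.5.
lo, hi = -20.0, 20.0
for _ in range(200):
    mid = (lo + hi) / 2.0
    if lower_tail(mid) > 0.5:
        lo = mid      # tail still too large, so beta must increase
    else:
        hi = mid
median_unbiased = (lo + hi) / 2.0

print(median_unbiased, math.exp(median_unbiased))
```

Unlike the conditional MLE, this estimate stays finite when the observed statistic is at the boundary of its range.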

One More Example

Dose Response

Toxicology Experiment

Dose    0     1     2     3     Total
Lived   99    97    95    90    381
Died    1     3     5     10    19
Total   100   100   100   100   400

• 400 mice randomized to one of four levels of a drug
• Drug administered to each animal
• Outcome is the number of deaths in each dose level

$\sum y = 19$

$\sum xy = 3 + 10 + 30 = 43$
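For this table the reference set is far too large to list, but the conditional distribution of $\sum xy$ given $\sum y = 19$ can still be built up one dose group at a time. The sketch below does this with a small dynamic program over partial sums; it is an illustration of the idea, not the implementation used by any particular package.

```python
from math import comb
from collections import defaultdict

doses = [0, 1, 2, 3]
group_sizes = [100, 100, 100, 100]
deaths = [1, 3, 5, 10]

t0 = sum(deaths)                                   # total deaths
t1 = sum(x * y for x, y in zip(doses, deaths))     # dose-weighted deaths

# paths[(s, t)] = number of ways to reach s deaths and partial sum t so far.
paths = {(0, 0): 1}
for x, n in zip(doses, group_sizes):
    nxt = defaultdict(int)
    for (s, t), weight in paths.items():
        for y in range(min(n, t0 - s) + 1):
            nxt[(s + y, t + x * y)] += weight * comb(n, y)
    paths = nxt

# Keep only the tables with exactly t0 deaths.
distribution = {t: w for (s, t), w in paths.items() if s == t0}
total = sum(distribution.values())

# One-sided exact p-value under no dose effect.
p_upper = sum(w for t, w in distribution.items() if t >= t1) / total
print(t0, t1, p_upper)
```

Python integers have arbitrary precision, so the very large table counts are handled exactly before the final division.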


Exact vs. Unconditional

Exact

• Estimate = 0.710
• SE = 0.246
• OR = 2.03
• CI = (1.26, 3.52)
• p-value = 0.002

Unconditional

• Estimate = 0.712
• SE = 0.246
• OR = 2.04
• CI = (1.26, 3.30)
• p-value = 0.004

Computational Issues

Counting All the Tables

• One of the main hurdles for conditional logistic regression is counting all the tables in the sample space
– Graft rejections: 11,440 possibilities
– PECARN C-Spine: 1,277,601
– Toxicology: 2.79 x 10^33

• Obviously don’t want to generate tables one at a time

Network Algorithm

• Graphical representation of the sample space

• Nodes represent a partial sum of the sufficient statistic

• Arcs have combinatorial weighting value

• One path through the graph represents a table in the sample space

Example

        X = 1   X = 2   X = 3   X = 4   Total
Y = 0   3       2       2       1       8
Y = 1   0       1       1       2       4
Total   3       3       3       3       12

Sufficient Statistics

$t_0 = \sum_i y_i$ = 4

$t_1 = \sum_i x_i y_i$ = 1*0 + 2*1 + 3*1 + 4*2 = 13

(1,0) (2,0)

(1,1) (2,1) (3,1)

(0,0) (1,2) (2,2) (3,2) (4,4)

(1,3) (2,3) (3,3)

(2,4) (3,4)

X = 1 X = 2 X = 3 X = 4 Total

Y = 0 1 3 1 3 8

Y = 1 2 0 2 0 4

(1,0) (2,0)

(1,1) (2,1) (3,1)

(0,0) (1,2) (2,2) (3,2) (4,4)

(1,3) (2,3) (3,3)

(2,4) (3,4)

X = 1 X = 2 X = 3 X = 4 Total

Y = 0 3 2 2 1 8

Y = 1 0 1 1 2 4

Network Representation of the Sample Space

(1,0) (2,0)

(1,1) (2,1) (3,1)

(0,0) (1,2) (2,2) (3,2) (4,4)

(1,3) (2,3) (3,3)

(2,4) (3,4)
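The node labels above read as (column, partial sum of y). A minimal sketch that rebuilds these nodes and sums the arc weights for the 4-column example (3 subjects per column, 4 successes in total) is given below. It is a simplified illustration of the idea, with hypothetical function names, and not a full network algorithm implementation.

```python
from math import comb
from collections import defaultdict

sizes = [3, 3, 3, 3]     # subjects at X = 1, 2, 3, 4
scores = [1, 2, 3, 4]    # covariate value for each column
t0 = 4                   # total number of successes (Y = 1)

def build_nodes(sizes, t0):
    """Feasible partial sums of y after each column (the network's nodes)."""
    stages = [{0}]
    for k, n in enumerate(sizes):
        remaining = sum(sizes[k + 1:])
        nxt = set()
        for s in stages[-1]:
            for y in range(n + 1):
                partial = s + y
                # Keep a node only if the remaining columns can still reach t0.
                if partial <= t0 and t0 - partial <= remaining:
                    nxt.add(partial)
        stages.append(nxt)
    return stages

def t1_distribution(sizes, scores, t0):
    """Total arc weight of all paths, grouped by t1 = sum of x * y."""
    paths = {(0, 0): 1}
    for k, (n, x) in enumerate(zip(sizes, scores)):
        remaining = sum(sizes[k + 1:])
        nxt = defaultdict(int)
        for (s, t), weight in paths.items():
            for y in range(n + 1):
                partial = s + y
                if partial <= t0 and t0 - partial <= remaining:
                    # Arc weight: ways to choose y successes among n subjects.
                    nxt[(partial, t + x * y)] += weight * comb(n, y)
        paths = nxt
    return {t: w for (s, t), w in sorted(paths.items())}

stages = build_nodes(sizes, t0)
distribution = t1_distribution(sizes, scores, t0)
print(stages)
print(distribution)
```

Merging paths that arrive at the same node is what avoids generating the tables one at a time.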

What About Multiple Covariates?

More Conditioning!

Osteogenic Sarcoma (LogXact Manual)

• 46 patients surgically treated for osteogenic sarcoma and then observed for disease recurrence within 3 years

• Covariates
– Sex: Male = 1, Female = 0
– Any Osteoid Pathology (AOP): Present = 1, not = 0

• Interested in the effect of AOP

Osteogenic Sarcoma

Covariate Group   No Recurrence (y = 0)   Recurrence (y = 1)   Group Size (ni)   Sex (x1)   AOP (x2)
1                 8                       0                    8                 0          0
2                 5                       2                    7                 0          1
3                 9                       4                    13                1          0
4                 7                       11                   18                1          1
Total             29                      17                   46

Estimating the Effect of AOP

• New statistics to condition on
– Group sizes
– Sufficient statistic for the intercept, $\sum y = 17$
– Sufficient statistic for the coefficient for sex, $\sum x_1 y = 15$

• Calculate the conditional distribution of $\sum x_2 y$
– Sufficient statistic for the coefficient for AOP
– Number of cases with AOP in recurrence (= 13)
– Given exactly 17 with recurrence, 15 of which are males

Network Algorithm

• The network algorithm uses two passes
– First pass conditions on the intercept
  • All tables with exactly 17 cases in recurrence
– Second pass removes arcs that don’t produce the sufficient statistic for sex
  • All tables that don’t have 15 males in recurrence

• Proceed with estimation & inference as before
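The two passes can be mimicked on the osteogenic sarcoma data with brute-force enumeration, which is feasible here because there are only four covariate groups. This sketch swaps in plain enumeration for the network itself and is only meant to show what each pass keeps.

```python
from math import comb
from itertools import product

group_sizes = [8, 7, 13, 18]
sex = [0, 0, 1, 1]        # x1 for each covariate group
aop = [0, 1, 0, 1]        # x2 for each covariate group
observed = [0, 2, 4, 11]  # recurrences in each group

t0 = sum(observed)
t_sex = sum(x * y for x, y in zip(sex, observed))
t_aop = sum(x * y for x, y in zip(aop, observed))

# Pass 1: keep recurrence vectors with exactly t0 recurrences (intercept).
all_vectors = product(*(range(n + 1) for n in group_sizes))
pass_one = [y for y in all_vectors if sum(y) == t0]

# Pass 2: drop vectors that do not reproduce the sufficient statistic for sex.
pass_two = [y for y in pass_one
            if sum(x * v for x, v in zip(sex, y)) == t_sex]

# Conditional distribution of the AOP statistic over the surviving tables.
distribution = {}
for y in pass_two:
    weight = 1
    for n, v in zip(group_sizes, y):
        weight *= comb(n, v)
    t2 = sum(x * v for x, v in zip(aop, y))
    distribution[t2] = distribution.get(t2, 0) + weight

total = sum(distribution.values())
for t2 in sorted(distribution):
    print(t2, distribution[t2] / total)
```

Estimation and inference for the AOP coefficient then use `distribution` exactly as the single-covariate examples used their reference sets.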

P[$\sum x_2 y = t_2$ | 17 in recurrence and 15 males]

Results

LR Test for Both Variables

• To test whether the coefficients for sex and AOP are both zero simultaneously, we need the joint conditional density
– All possible combinations of males and patients with AOP in recurrence, given exactly 17 patients in recurrence
– Determine how rare it is to have 15 recurrent males AND 13 recurrent AOP patients

SAS Examples

Conclusion

• Exact (conditional) logistic regression
– Useful method when asymptotic assumptions are not met or with separation
– Utilizes conditioning to remove nuisance parameters from the likelihood
– Very computationally intensive method
– Network algorithm speeds up calculations

Questions?