logistic regression

Upload: hewitt

Post on 08-Jan-2016

Page 1: LOGISTIC REGRESSION

LOGISTIC REGRESSION

A statistical procedure that relates the probability of an event to explanatory variables.

Used in epidemiology to describe and evaluate the effect of a risk factor on the occurrence of a disease event.

Example: Framingham Heart Study

Coronary heart disease and blood pressure

LOGISTIC REGRESSION: AN EXAMPLE

Event: Coronary Heart Disease

Occurrence of CHD is the dependent variable, which takes two values: Yes or No.

Risk factor: Blood pressure

Systolic blood pressure is the independent variable X, a continuous measurement.

The probability of getting coronary heart disease depends on blood pressure.

DATA

MAN       SYSTOLIC BP (mmHg)   DEVELOPED CHD   CHD (0/1)
John      130                  NO              0
Steven    140                  NO              0
Sean      145                  NO              0
Brian     150                  NO              0
Michael   155                  YES             1
Terry     160                  NO              0
Joseph    165                  YES             1
Patrick   170                  YES             1
Teddy     175                  YES             1
Ryan      180                  YES             1
. . .     . . .                . . .           . . .

SCATTER PLOT

[Figure: CHD (0 = No, 1 = Yes) plotted against systolic blood pressure, 120 to 200 mmHg]

LINEAR REGRESSION FOR Prob(CHD): NOT A GOOD IDEA!

[Figure: straight line fitted to Prob(CHD) against systolic blood pressure (120 to 200 mmHg); the vertical axis runs from -0.4 to 1.2, beyond the 0 to 1 range a probability can take]
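To see the problem concretely, here is a minimal Python sketch that fits ordinary least squares of the 0/1 CHD code on SBP. It uses only the ten rows listed in the DATA table; the elided rows are unavailable, so the numbers are illustrative. The names `SBP`, `CHD`, and `ols_fit` are hypothetical.

```python
# Ten rows from the DATA table (0/1 CHD column); the elided rows are not included.
SBP = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
CHD = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


a, b = ols_fit(SBP, CHD)
for sbp in (120, 160, 200):
    # Nothing keeps a straight line between 0 and 1.
    print(sbp, round(a + b * sbp, 3))
```

On these ten rows the fitted line gives roughly -0.46 at 120 mmHg and 1.62 at 200 mmHg. No probability can take either value.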

PROPORTION WITH CHD BY SBP GROUP

Systolic BP range   Men with CHD / total   Proportion
130-149 mmHg        0/3                    0.00
150-169 mmHg        2/4                    0.50
170-189 mmHg        3/3                    1.00
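The grouped proportions can be reproduced from the ten listed rows. The sketch below is a minimal example; the helper name `proportions_by_group` and the inclusive bin bounds are assumptions read off the table. Grouping shows the observed proportion of CHD rising with SBP, which an ungrouped 0/1 scatter hides.

```python
# Ten rows from the DATA table (0/1 CHD column); the elided rows are not included.
SBP = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
CHD = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# SBP groups from the table, as (low, high) inclusive bounds in mmHg.
GROUPS = [(130, 149), (150, 169), (170, 189)]


def proportions_by_group(sbp, chd, groups):
    """Return {(low, high): (cases, total)} for each SBP group."""
    result = {}
    for low, high in groups:
        outcomes = [c for s, c in zip(sbp, chd) if low <= s <= high]
        result[(low, high)] = (sum(outcomes), len(outcomes))
    return result


for (low, high), (cases, total) in proportions_by_group(SBP, CHD, GROUPS).items():
    print(f"{low}-{high} mmHg  {cases}/{total}  {cases / total:.2f}")
```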

LOGISTIC REGRESSION PROBABILITY MODEL

$$p(X) = \frac{1}{1 + \exp(-\beta_0 - \beta_1 X)}$$

The probability of the event varies as an S-shaped function of the risk factor X: the logistic curve.
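The following minimal sketch evaluates p(X) for arbitrary β₀ and β₁ to show the S shape. The coefficients -10 and 0.1 are hypothetical, chosen only so that the curve passes 0.5 at X = 100, where β₀ + β₁X = 0. Whatever the inputs, the output stays strictly between 0 and 1.

```python
import math


def logistic_prob(x, beta0, beta1):
    """p(X) = 1 / (1 + exp(-beta0 - beta1 * X))."""
    return 1.0 / (1.0 + math.exp(-beta0 - beta1 * x))


# Hypothetical coefficients chosen only to display the S shape.
beta0, beta1 = -10.0, 0.1
for x in (0, 50, 100, 150, 200):
    print(x, round(logistic_prob(x, beta0, beta1), 4))
```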

LOGISTIC CURVE MODEL: OCCURRENCE OF CHD AS A FUNCTION OF SBP

[Figure: fitted logistic curve, probability of CHD (0 to 1) against systolic blood pressure (0 to 300 mmHg)]

$$\text{prob.} = \frac{1}{1 + \exp\{-(-6.08 + 0.0243\,\text{SBP})\}}$$
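As a sketch, the fitted curve can be evaluated using the coefficients from the log-odds form Log(odds of CHD) = -6.08 + 0.0243(SBP). The function name `prob_chd` is hypothetical. The curve crosses 0.5 where -6.08 + 0.0243·SBP = 0, which is at about 250 mmHg.

```python
import math

# Fitted coefficients reported on the slides.
B0, B1 = -6.08, 0.0243


def prob_chd(sbp):
    """Fitted logistic curve: 1 / (1 + exp(-(B0 + B1*SBP)))."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * sbp)))


# The curve crosses p = 0.5 where B0 + B1*SBP = 0.
midpoint = -B0 / B1
print(f"p = 0.5 at SBP of about {midpoint:.0f} mmHg")
for sbp in (100, 150, 200, 250, 300):
    print(sbp, round(prob_chd(sbp), 3))
```

Because the midpoint lies near 250 mmHg, fitted probabilities over the 120 to 200 mmHg range of the data stay on the lower part of the S, below about 0.23.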

LOGISTIC MODEL: LOG ODDS

$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$$

The log of the odds of the event is a linear function of X.

Log(odds of CHD) = - 6.08 + 0.0243(SBP)
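A minimal sketch of the claim that the log odds are linear in X. Taking the logit of a fitted probability recovers the straight line -6.08 + 0.0243·SBP exactly, because the logit is the inverse of the logistic function. The function names are hypothetical.

```python
import math

B0, B1 = -6.08, 0.0243  # fitted coefficients from the slides


def logit(p):
    """Log odds: log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def prob_from_log_odds(z):
    """Inverse of logit: the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


for sbp in (120, 150, 180):
    p = prob_from_log_odds(B0 + B1 * sbp)
    # The logit of the fitted probability equals the linear predictor.
    print(sbp, round(logit(p), 4), round(B0 + B1 * sbp, 4))
```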

ODDS

The odds of an event are the probability that the event occurs divided by the probability that it does not occur:

Odds = p/(1 - p) = p/q, where q = 1 - p
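Here is a small sketch of converting between probability and odds; the function names are hypothetical. For example, p = 0.2 gives q = 0.8 and odds of 0.2/0.8 = 0.25. The conversion runs backwards as p = odds/(1 + odds).

```python
def odds_from_prob(p):
    """Odds = p / q, where q = 1 - p."""
    return p / (1.0 - p)


def prob_from_odds(odds):
    """Invert odds = p / (1 - p) to get p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


print(odds_from_prob(0.2))   # 0.2 / 0.8
print(odds_from_prob(0.5))   # even odds
print(prob_from_odds(0.25))  # back to a probability
```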

β₁: KEY PARAMETER OF THE LOGISTIC MODEL

$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$$

The parameter β₁ plays the role of the slope in a linear regression model.

β₁ = 0 indicates that X has no effect on the probability; e.g., a man’s chance of CHD does not depend on his SBP.

β₁: KEY PARAMETER

$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$$

The coefficient β₁ measures the change in the log of the odds per unit change in X.

β₁: KEY PARAMETER

$$\log \text{odds}(X+1) = \beta_0 + \beta_1 (X+1) = \beta_0 + \beta_1 X + \beta_1$$

$$\log \text{odds}(X) = \beta_0 + \beta_1 X$$

$$\text{Difference in log odds} = \beta_1$$

E.g., the log of the odds of getting CHD increases by 0.0243 for an increase of 1 mmHg of systolic blood pressure.

(Hard to explain to a patient!)
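The derivation can be checked numerically with a minimal sketch. It computes the log odds from the fitted probability at X and at X + 1 and shows that their difference equals β₁ = 0.0243 wherever X starts. The helper names are hypothetical.

```python
import math

B0, B1 = -6.08, 0.0243  # fitted coefficients from the slides


def log_odds(sbp):
    """Log odds of CHD under the fitted model."""
    return B0 + B1 * sbp


def log_odds_from_prob(sbp):
    """Same quantity, computed the long way: via the probability p(X)."""
    p = 1.0 / (1.0 + math.exp(-log_odds(sbp)))
    return math.log(p / (1.0 - p))


# Going from X to X + 1 adds B1 to the log odds, whatever the starting X.
for x in (120, 150, 180):
    print(x, round(log_odds_from_prob(x + 1) - log_odds_from_prob(x), 6))
```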

THE COEFFICIENT β₁ AND THE ODDS RATIO

The difference in log odds given by β₁ translates into the odds ratio (OR):

$$\exp(\beta_1) = \text{OR} = \frac{\text{odds at risk level } X+1}{\text{odds at risk level } X}$$

β₁ = 0 corresponds to OR = 1.

THE COEFFICIENT β₁ AND THE ODDS RATIO

For example, the odds of CHD are multiplied by the factor exp(0.0243) = 1.025 for every increase of 1 mmHg in SBP.

A difference of 10 mmHg multiplies the odds of CHD by exp(10 × 0.0243) = (1.0246)^10 ≈ 1.275.
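The odds-ratio arithmetic is written out below as a minimal sketch. The helper `odds` is hypothetical; it uses the slides' intercept -6.08 only to show that a ratio of model odds 10 mmHg apart equals exp(10β₁) whatever the intercept. The last line compounds the rounded factor 1.025 instead.

```python
import math

B1 = 0.0243  # per-mmHg coefficient from the slides


def odds(sbp, b0=-6.08, b1=B1):
    """Model odds of CHD: exp(b0 + b1*SBP)."""
    return math.exp(b0 + b1 * sbp)


or_1 = math.exp(B1)          # OR for a 1 mmHg increase
or_10 = math.exp(10 * B1)    # OR for a 10 mmHg increase
print(round(or_1, 4), round(or_10, 4))
print(round(odds(160) / odds(150), 4))   # ratio of model odds, same as or_10
print(round(1.025 ** 10, 4))             # compounding the rounded OR drifts
```

Compounding the rounded 1.025 gives about 1.280 rather than 1.275. This is why the 10 mmHg factor is safer computed as exp(10β₁).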

ESTIMATION OF THE PARAMETERS

Technique:

Maximum likelihood estimation

For large sample sizes, the normal distribution is used to put a confidence interval around the estimate of the coefficient β₁.
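Below is a minimal sketch of maximum likelihood estimation using Newton-Raphson, one standard way to maximize the logistic log-likelihood; the slides do not say which algorithm was used. The 95% interval is β₁ ± 1.96·SE, with SE taken from the inverse information matrix. The sketch runs on only the ten listed rows, so its estimates will not match the slides' -6.08 and 0.0243, which presumably come from the full data set. All names are hypothetical.

```python
import math

# Illustration only: the ten rows shown in the DATA table, not the full study.
SBP = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
CHD = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and the 2x2 information matrix."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood by Newton-Raphson; returns (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step = inverse(information) @ score.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    _, (i00, i01, i11) = score_and_information(b0, b1, x, y)
    # Large-sample variance of b1 is the (1, 1) entry of inverse(information).
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return b0, b1, se_b1


b0, b1, se = fit_logistic(SBP, CHD)
lo, hi = b1 - 1.96 * se, b1 + 1.96 * se   # normal-approximation 95% CI
print(f"b1 = {b1:.3f}, 95% CI ({lo:.3f}, {hi:.3f}), OR = {math.exp(b1):.3f}")
```

Newton-Raphson is used here because the log-likelihood is concave and two parameters make the matrix inverse trivial. Packaged routines, such as iteratively reweighted least squares in a GLM library, do the same job at scale.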

HYPOTHESIS TESTING

H0: β₁ = 0

No difference in risk at different levels of the risk factor X.

No association between risk factor X and probability of occurrence.

HYPOTHESIS TESTING

Ha: β₁ ≠ 0, or

β₁ > 0 (risk increases with X), or

β₁ < 0 (risk decreases as X increases)

HYPOTHESIS TESTING

H0: OR = 1

Ha: OR ≠ 1, or

OR > 1 (risk increases with X), or

OR < 1 (X is protective)

RESULTS OF LOGISTIC REGRESSION

The OR, with its confidence interval and p-value, indicates whether there is a significant association between the level of the risk factor and the chance of occurrence:

OR = 1.025 (1.015, 1.034), p < 0.001
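The reported interval and p-value can be checked against each other with a minimal sketch. It assumes the interval was built as estimate ± 1.96·SE on the log-odds scale, which is the normal approximation mentioned under ESTIMATION OF THE PARAMETERS. Under that assumption, the sketch backs out SE from the interval and computes a two-sided Wald p-value using `math.erfc`.

```python
import math

# Reported on the slide: OR = 1.025 (1.015, 1.034), p < 0.001.
B1 = 0.0243
OR_LOW, OR_HIGH = 1.015, 1.034

# Assuming a normal-approximation CI on the log scale: log(OR) +/- 1.96 * SE.
se_b1 = (math.log(OR_HIGH) - math.log(OR_LOW)) / (2 * 1.96)
z = B1 / se_b1
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
print(f"SE = {se_b1:.4f}, z = {z:.2f}, p = {p_two_sided:.1e}")
```

The implied z is a little above 5, so the p-value falls well below 0.001, consistent with the slide.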

RESULTS OF LOGISTIC REGRESSION

Can be used to predict an individual’s risk:

prob. of CHD when SBP = 180:

odds = p/q = exp{-6.082 + 0.0243(180)} = exp(-1.708) ≈ 0.181

Solve for p: p = odds/(1 + odds)

prob. of CHD ≈ 0.181/1.181 ≈ 0.153
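The individual prediction is written out below as a minimal sketch using this slide's coefficients (-6.082 and 0.0243). The function name `predicted_risk` is hypothetical.

```python
import math

B0, B1 = -6.082, 0.0243  # coefficients used on this slide


def predicted_risk(sbp):
    """Predicted probability of CHD: odds = exp(B0 + B1*SBP), p = odds / (1 + odds)."""
    odds = math.exp(B0 + B1 * sbp)
    return odds / (1.0 + odds)


print(round(predicted_risk(180), 3))
```

For SBP = 180 mmHg this gives about 0.153.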

MULTIVARIATE LOGISTIC REGRESSION

Model with additional risk factors:

$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

Log(odds of CHD) = β₀ + β₁(SBP) + β₂(CHOL) + β₃(smoker)
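The model with several risk factors can be sketched as follows. It assumes covariates are passed in the same order as the coefficients, for example SBP, CHOL, and smoker coded 0/1. The slides give no fitted values for β₂ or β₃, so none appear here, and the function names are hypothetical. In a model without interaction terms, exp(βⱼ) is the odds ratio for a one-unit change in Xⱼ with the other risk factors held fixed.

```python
import math


def predicted_risk(betas, covariates):
    """p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))).

    betas: [b0, b1, ..., bk]; covariates: [x1, ..., xk] in the same order.
    """
    if len(betas) != len(covariates) + 1:
        raise ValueError("need one coefficient per covariate plus an intercept")
    log_odds = betas[0] + sum(b * x for b, x in zip(betas[1:], covariates))
    return 1.0 / (1.0 + math.exp(-log_odds))


def adjusted_odds_ratios(betas):
    """exp(b_j) for each covariate: OR per unit change, others held fixed."""
    return [math.exp(b) for b in betas[1:]]
```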