logistic regression
LOGISTIC REGRESSION
A statistical procedure to relate the probability of an event to explanatory variables
Used in epidemiology to describe and evaluate the effect of a risk on the occurrence of a disease event.
Example: Framingham Heart Study
Coronary heart disease and blood pressure
LOGISTIC REGRESSION: AN EXAMPLE
Event: Coronary Heart Disease
Occurrence is the dependent variable,
which takes 2 values: Yes or No.
Risk factor: Blood pressure
Systolic blood pressure is the independent variable X, a continuous measurement.
The probability of getting coronary heart disease depends on blood pressure.
DATA
MAN       SYSTOLIC BP   DEVELOPED CHD   0/1
John      130           NO              0
Steven    140           NO              0
Sean      145           NO              0
Brian     150           NO              0
Michael   155           YES             1
Terry     160           NO              0
Joseph    165           YES             1
Patrick   170           YES             1
Teddy     175           YES             1
Ryan      180           YES             1
.         .             .               .
SCATTER PLOT
[Figure: scatter plot of CHD (vertical axis, 0.0 to 1.0) against systolic blood pressure (horizontal axis, 120 to 200).]
LINEAR REGRESSION FOR Prob.(CHD): NOT A GOOD IDEA!
[Figure: straight-line fit of Prob(CHD) (vertical axis, -0.4 to 1.2) against systolic blood pressure (horizontal axis, 120 to 200).]
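One way to see the problem, as a minimal sketch: fit an ordinary least-squares line to the ten men listed on the DATA slide (only the rows shown, using the 0/1 column; the full data set is not reproduced here) and evaluate it at the ends of the plotted SBP range. Function and variable names are illustrative.

```python
# Ten rows listed on the DATA slide (0/1 column); the full data set is not shown.
sbp = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
chd = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = least_squares_line(sbp, chd)

def linear_prob(x):
    """'Probability' predicted by the straight line at SBP = x."""
    return intercept + slope * x

for x in (120, 160, 200):
    print(x, round(linear_prob(x), 3))
```

On these ten rows the fitted line falls below 0 at SBP 120 and rises above 1 at SBP 200, values that cannot be probabilities.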
PROPORTION WITH CHD BY SBP GROUP
Systolic BP Range   CHD / Total   Proportion
130-149 mmHg        0/3           0.00
150-169 mmHg        2/4           0.50
170-189 mmHg        3/3           1.00
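The grouped proportions can be reproduced from the ten listed rows. A small sketch, using the 0/1 column of the DATA slide and the three SBP ranges above (names are illustrative):

```python
# Ten rows listed on the DATA slide (0/1 column).
sbp = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
chd = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
groups = [(130, 149), (150, 169), (170, 189)]

def proportion_by_group(x, y, ranges):
    """Map each (low, high) range to (cases, total, proportion)."""
    result = {}
    for low, high in ranges:
        events = [yi for xi, yi in zip(x, y) if low <= xi <= high]
        result[(low, high)] = (sum(events), len(events), sum(events) / len(events))
    return result

table = proportion_by_group(sbp, chd, groups)
for (low, high), (cases, total, prop) in table.items():
    print(f"{low}-{high} mmHg  {cases}/{total}  {prop:.2f}")
```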
LOGISTIC REGRESSION PROBABILITY MODEL
$$p(X) = \frac{1}{1 + \exp(-\beta_0 - \beta_1 X)}$$
The probability of the event varies as an S-shaped function of the risk factor X: the logistic curve.
LOGISTIC CURVE MODEL: OCCURRENCE OF CHD AS A FUNCTION OF SBP
[Figure: logistic curve of the probability of CHD (vertical axis, 0 to 1) against systolic blood pressure (horizontal axis, 0 to 300).]
$$\text{prob.} = \frac{1}{1 + \exp(6.08 - 0.0243\,\text{SBP})}$$
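A minimal sketch of the curve, assuming the fitted log odds quoted in this deck, -6.08 + 0.0243(SBP). The function name and the grid of SBP values are illustrative choices.

```python
import math

def logistic_prob(x, beta0, beta1):
    """p(X) = 1 / (1 + exp(-beta0 - beta1 * X))."""
    return 1.0 / (1.0 + math.exp(-beta0 - beta1 * x))

# Coefficients taken from the fitted log-odds equation in the slides.
BETA0, BETA1 = -6.08, 0.0243

# Evaluate the curve over the plotted range of SBP.
curve = [(s, logistic_prob(s, BETA0, BETA1)) for s in range(0, 301, 50)]
for s, p in curve:
    print(s, round(p, 3))
```

Every value stays strictly between 0 and 1 and increases with SBP, which is what the S-shaped curve is meant to guarantee.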
LOGISTIC MODEL: LOG ODDS
$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$$
The log of the odds of the event is a linear function of X.
Log(odds of CHD) = - 6.08 + 0.0243(SBP)
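The step from the probability model to the log odds can be written out as a short derivation, a sketch using the same symbols as the model:

```latex
\begin{align*}
p(X) &= \frac{1}{1 + \exp(-\beta_0 - \beta_1 X)} \\
1 - p(X) &= \frac{\exp(-\beta_0 - \beta_1 X)}{1 + \exp(-\beta_0 - \beta_1 X)} \\
\frac{p(X)}{1 - p(X)} &= \exp(\beta_0 + \beta_1 X) \\
\log \frac{p(X)}{1 - p(X)} &= \beta_0 + \beta_1 X
\end{align*}
```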
ODDS
The odds of an event is the chance that the event occurs divided by the chance of its not occurring:
Odds = p/(1 - p) = p/q
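As a small illustration, the conversion between probability and odds in both directions (function names are illustrative):

```python
def odds_from_prob(p):
    """Odds = p / (1 - p), for 0 <= p < 1."""
    return p / (1.0 - p)

def prob_from_odds(odds):
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)

print(odds_from_prob(0.5))   # a 50% chance corresponds to even odds
print(odds_from_prob(0.8))   # an 80% chance corresponds to odds of about 4 to 1
print(prob_from_odds(4.0))   # and odds of 4 to 1 convert back to 0.8
```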
$\beta_1$: KEY PARAMETER OF THE LOGISTIC MODEL
$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$$
The parameter $\beta_1$ is like the slope of a linear regression model.
$\beta_1 = 0$ indicates that X has no effect on the probability, e.g., a man’s chance of CHD does not depend on his SBP.
$\beta_1$: KEY PARAMETER
$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$$
The coefficient $\beta_1$ measures the amount of change in the log of the odds per unit change in X.
$\beta_1$: KEY PARAMETER
$$\log \text{odds}(X+1) = \beta_0 + \beta_1 (X+1) = \beta_0 + \beta_1 X + \beta_1$$
$$\log \text{odds}(X) = \beta_0 + \beta_1 X$$
Difference in log odds = $\beta_1$
E.g., the log of the odds of getting CHD increases by 0.0243 for an increase of 1 mmHg of systolic blood pressure.
(Hard to explain to a patient!)
THE COEFFICIENT $\beta_1$ AND THE ODDS RATIO
The difference in log odds given by $\beta_1$ translates into the odds ratio (OR).
$\exp(\beta_1)$ = OR = ratio of the odds at risk level X+1 to the odds at risk level X
$\beta_1 = 0$ corresponds to OR = 1.
THE COEFFICIENT $\beta_1$ AND THE ODDS RATIO
For example, the odds of CHD are multiplied by the factor exp(0.0243) = 1.025 for every increase of 1 mmHg in SBP.
A difference of 10 mmHg multiplies the odds of CHD by exp(10 × 0.0243) = exp(0.243), or about 1.275.
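The arithmetic can be checked directly. A short sketch using the coefficient 0.0243 quoted in the slides (the function name is illustrative):

```python
import math

BETA1 = 0.0243  # change in log odds per 1 mmHg of SBP, from the slides

def odds_ratio(beta, delta=1.0):
    """Odds ratio for an increase of `delta` units in X: exp(beta * delta)."""
    return math.exp(beta * delta)

or_1 = odds_ratio(BETA1)        # 1 mmHg difference
or_10 = odds_ratio(BETA1, 10)   # 10 mmHg difference
print(round(or_1, 3), round(or_10, 3))  # 1.025 1.275
```

The 10 mmHg odds ratio is the 1 mmHg odds ratio raised to the 10th power, computed here from the unrounded coefficient.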
ESTIMATION OF THE PARAMETERS
Technique:
Maximum likelihood estimation
For large sample sizes, the normal distribution is used to put a confidence interval around the estimate of the coefficient $\beta_1$.
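A minimal sketch of maximum likelihood estimation by Newton-Raphson, fitted to the ten rows listed on the DATA slide only. Because those rows are a small excerpt, the estimates will not match the coefficients quoted in the slides. The function names, the centering of SBP, and the fixed iteration count are choices made for this illustration, not the method used by any particular software.

```python
import math

# Ten rows listed on the DATA slide (0/1 column); NOT the full study data.
sbp = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
chd = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def score_and_information(a, b, x, y):
    """Gradient of the log-likelihood and entries of the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11

def fit_logistic(x, y, iterations=25):
    """Return (beta0, beta1, standard error of beta1, final score vector)."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centering X helps numerical stability
    a = b = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = score_and_information(a, b, xc, y)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson update: estimate + inverse(information) * score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    g0, g1, h00, h01, h11 = score_and_information(a, b, xc, y)
    det = h00 * h11 - h01 * h01
    se_slope = math.sqrt(h00 / det)  # large-sample standard error of the slope
    return a - b * mean_x, b, se_slope, (g0, g1)

beta0_hat, beta1_hat, se_beta1, score = fit_logistic(sbp, chd)
print(beta0_hat, beta1_hat, se_beta1)
```

The standard error returned here is what the large-sample normal approximation uses to build a confidence interval for the slope.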
HYPOTHESIS TESTING
$H_0$: $\beta_1 = 0$
No difference in risk at different levels of the risk factor X.
No association between risk factor X and probability of occurrence.
HYPOTHESIS TESTING
$H_a$: $\beta_1 \neq 0$ or
$\beta_1 > 0$ (risk increases with X) or
$\beta_1 < 0$ (risk goes down as X increases)
HYPOTHESIS TESTING
$H_0$: OR = 1
$H_a$: OR $\neq$ 1 or
OR > 1 (risk increases with X) or
OR < 1 (X is protective)
RESULTS OF LOGISTIC REGRESSION
The OR, with its confidence interval and p value, indicates whether there is a significant association between the level of the risk factor and the chance of occurrence.
OR = 1.025 (1.015, 1.034), p < 0.001
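A sketch of how such a summary is typically computed from a coefficient and its standard error using the normal approximation. The slides do not report the standard error. The value 0.0047 below is an assumption, back-calculated on the further assumption that the quoted interval is a 95% interval on the log-odds scale.

```python
import math

def wald_summary(beta, se, z_crit=1.96):
    """Odds ratio, confidence limits, and two-sided p value (normal approximation)."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return odds_ratio, lower, upper, p_value

BETA1 = 0.0243       # coefficient quoted in the slides
ASSUMED_SE = 0.0047  # ASSUMPTION: not given in the slides; back-calculated

or_hat, ci_low, ci_high, p_val = wald_summary(BETA1, ASSUMED_SE)
print(round(or_hat, 3), round(ci_low, 3), round(ci_high, 3), p_val < 0.001)
```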
RESULTS OF LOGISTIC REGRESSION
Can be used to predict an individual’s risk:
prob. of CHD when SBP = 180:
p/q = exp{-6.082 + 0.0243(180)} = exp(-1.708) ≈ 0.181
Solve for p: p = 0.181/(1 + 0.181)
prob. of CHD ≈ 0.153
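The two steps, computing the odds and then solving for p, can be sketched as follows, using the coefficients from the slide (the function name is illustrative):

```python
import math

BETA0, BETA1 = -6.082, 0.0243  # coefficients quoted in the slides

def predicted_risk(sbp):
    """Predicted probability of CHD at a given systolic blood pressure."""
    odds = math.exp(BETA0 + BETA1 * sbp)  # p/q
    return odds / (1.0 + odds)            # solve p/(1 - p) = odds for p

risk_180 = predicted_risk(180)
print(round(risk_180, 3))
```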
MULTIVARIATE LOGISTIC REGRESSION
Model with additional risk factors:
$$\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$
$$\text{Log(odds of CHD)} = \beta_0 + \beta_1(\text{SBP}) + \beta_2(\text{CHOL}) + \beta_3(\text{smoker})$$
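A sketch of how a model with several risk factors turns into a predicted probability. The slides give no fitted values for this model, so every number below is hypothetical and chosen only for illustration, as is the 1/0 coding of smoking status.

```python
import math

def predicted_prob(intercept, coefficients, values):
    """Probability from log odds = intercept + sum of coefficient * value."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-log_odds))

# HYPOTHETICAL numbers for illustration only; not estimates from any study.
example_intercept = -7.0
example_coefficients = [0.02, 0.005, 0.5]  # SBP, CHOL, smoker (1 = yes, 0 = no)

nonsmoker = predicted_prob(example_intercept, example_coefficients, [150, 220, 0])
smoker = predicted_prob(example_intercept, example_coefficients, [150, 220, 1])
print(round(nonsmoker, 3), round(smoker, 3))
```

With the other risk factors held fixed, the odds for the smoker divided by the odds for the non-smoker equal exp of the smoker coefficient, the same odds-ratio interpretation as in the single-variable model.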