Motivation Overview One binary covariate Two binary covariates One continuous covariate Mutually adjusted Interaction Final words
Basic statistics: logistic regression
Paul Blanche¹
Department of Biostatistics, University of Copenhagen
13/11/2019
¹ Material based on previous (wonderful) work from Thomas A Gerds.
TOPICS OF TODAY
Logistic regression, especially:
- model
- parameter interpretation
- examples
- categorical and continuous covariates
- 95% CI and test
- interaction
- fitting the model with R and a few computational "tricks"
CARPENTER ET AL. (THE LANCET, 2004)
REGRESSION
The type of the outcome variable determines which kind of model is relevant:
Quantitative (continuous) outcome
- Linear regression
- Association parameters: differences between mean values
0-1 (binary) outcome
- Logistic regression
- Association parameters: odds ratio, differences between log(odds)
CATEGORICAL EXPLANATORY VARIABLE
Group 1, ..., K (especially binary: K = 2)
Linear regression, continuous outcome Y
mean(Y|group k) - mean(Y|reference group)
E.g., the average systolic blood pressure was higher in males than in females
Logistic regression, binary outcome
$$\text{Odds ratio} = \frac{\text{odds}(\text{group } k)}{\text{odds}(\text{reference group})}$$
E.g., the risk (or the odds²) of coronary heart disease was higher in males than in females
² Remember: odds(p) = p/(1 − p), and "higher odds" is equivalent to "higher risk".
QUANTITATIVE (CONTINUOUS) EXPLANATORY VARIABLES
Linear regression, continuous outcome Y
Differences in mean values per unit of X:
mean(Y|x+1)-mean(Y|x)
E.g., the average systolic blood pressure increased with age
Logistic regression, binary outcome
Differences in log(odds) per unit of X:
$$\text{Odds ratio} = \frac{\text{odds}(x+1)}{\text{odds}(x)}$$
E.g., the risk (odds) of coronary heart disease increased with age
LINEARITY IN REGRESSION MODELS
For a continuous explanatory variable X, linearity means that the effect of a unit change of X on the outcome does not depend on the value of X.
Linear regression, continuous outcome Y
$$\text{mean}(Y \mid 45+1) - \text{mean}(Y \mid 45) = \text{mean}(Y \mid 46+1) - \text{mean}(Y \mid 46) = \dots = \text{mean}(Y \mid 61+1) - \text{mean}(Y \mid 61)$$
Logistic regression, binary outcome
$$\frac{\text{odds}(45+1)}{\text{odds}(45)} = \frac{\text{odds}(46+1)}{\text{odds}(46)} = \dots = \frac{\text{odds}(61+1)}{\text{odds}(61)}$$
Linearity is a model assumption which should be checked!³
³ Categorization of the continuous covariate can be useful when linearity does not hold.
BINARY OUTCOME REGRESSION
If the outcome variable is binary:
$$Y_i = \begin{cases} 1 & \text{if } i \text{ is diseased} \\ 0 & \text{if } i \text{ is not diseased} \end{cases}$$
then linear regression
$$Y_i = \beta_0 + \beta_1 X_i$$
is not good for many reasons.
One reason is that the regression line will go below 0 and above 1.
[Figure: binary outcome plotted against age (x-axis: 20 to 80) together with the fitted regression line (y-axis: −25% to 125%). Fitted linear model: lm(Y ~ AGE, data = framingham).]
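The problem can be reproduced on a very small scale. The sketch below fits a linear model to a toy binary outcome; the data are hypothetical (ten made-up subjects, not the Framingham data), and the names `toy`, `x`, `y` and `fit.lin` are chosen for illustration only.

```r
# Toy data (hypothetical, not from the Framingham study):
# the outcome is 0 for small x and 1 for large x.
toy <- data.frame(x = 1:10, y = rep(c(0, 1), each = 5))

# Ordinary linear regression applied to the 0-1 outcome
fit.lin <- lm(y ~ x, data = toy)

# Fitted "risks" at the smallest and largest observed x
pred <- predict(fit.lin, newdata = data.frame(x = c(1, 10)))
pred
range(pred)  # the range extends below 0 and above 1
```

The fitted values at the two ends of the observed range are not valid probabilities, which is one motivation for modelling the log(odds) instead.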
(MULTIPLE) LOGISTIC REGRESSION
We denote the probability of the event Yi = 1 for a subject with explanatory variables Xi, Zi, ... as
$$P(Y_i = 1 \mid X_i, Z_i, \dots) = p_i.$$
The idea is to use the logit function. Instead of modelling pi, which is bounded between 0 and 1, we apply linear regression to log(odds):
$$\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = a + b_1 Z_i + b_2 X_i + \dots$$
$\log\left(\frac{p_i}{1 - p_i}\right)$ can take both negative and positive values.
We will see that exp(b1), exp(b2) can be interpreted as odds ratios, and exp(a) as the odds in the reference group.
Equivalently, the logistic model is:
$$p_i = \frac{\exp(a + b_1 Z_i + b_2 X_i + \dots)}{1 + \exp(a + b_1 Z_i + b_2 X_i + \dots)}$$
where a + b1 Zi + b2 Xi + ... is the linear predictor, equal to logit(pi).
[Figure: risk (y-axis: 0% to 100%) as a function of the linear predictor = logit(risk) (x-axis: −10 to 10).]
WARM-UP EXERCISES
Complete the following table

pi       oddsi    logit(pi)
0.001%
                  -7.0
                  -4.5
2.1%
8%
50%
                  3.8
99%
                  11.5

Hint: the following functions take a vector as argument
logit <- function(p){log(p/(1 - p))}
expit <- function(x){exp(x)/(1 + exp(x))}
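As a usage sketch for the two helper functions, the block below converts a few risks to odds and log(odds) and back again. The three risk values are chosen here for illustration and are deliberately not the ones from the exercise table; the comparison with base R's `qlogis()` and `plogis()` is an addition, not part of the original hint.

```r
# The two helper functions from the hint
logit <- function(p){log(p/(1 - p))}
expit <- function(x){exp(x)/(1 + exp(x))}

# Illustrative risks (chosen here, not taken from the exercise table)
p <- c(0.10, 0.25, 0.80)
odds <- p/(1 - p)   # risk -> odds
lo <- logit(p)      # risk -> log(odds)

# Going back: expit() inverts logit()
back <- expit(lo)
cbind(p, odds, logit = lo, back)

# Base R offers the same transformations as qlogis() and plogis()
all.equal(lo, qlogis(p))
all.equal(back, plogis(lo))
```

Both functions work element-wise on vectors, so a whole column of the table can be filled in with one call.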
EXAMPLE: FRAMINGHAM STUDY
- SEX: 1 for males, 2 for females
- AGE: age (years) at baseline (45-62)
- FRW: "Framingham relative weight" (pct.) at baseline (52-222; 11 persons have missing values)
- SBP: systolic blood pressure at baseline (mmHg) (90-300)
- DBP: diastolic blood pressure at baseline (mmHg) (50-160)
- CHOL: cholesterol at baseline (mg/100ml) (96-430)
- CIG: cigarettes per day at baseline (0-60; 1 person has missing value)
- CHD: 0 if no "coronary heart disease" during follow-up, 1 if "coronary heart disease" at baseline (prevalent cases), x = 2-10 if "coronary heart disease" was diagnosed at follow-up no. x
FRAMINGHAM STUDY: DATA PREPARATION
library(data.table)
framingham <- fread("data/Framingham.csv")
## remove prevalent cases
framingham <- framingham[CHD!=1,]
## define factor levels/labels
framingham[,Smoke:=factor(CIG>0,levels=c(FALSE,TRUE),labels=c("No","Yes"))]
framingham[,Sex:=factor(SEX,levels=c(1,2),labels=c("Male","Female"))]
## define binary outcome variable
framingham[,Y:=factor(CHD>1,levels=c(FALSE,TRUE),labels=c("no CHD","CHD"))]
framingham

        ID SEX AGE FRW SBP DBP CHOL CIG CHD Smoke    Sex      Y
   1: 1070   2  45  93 100  62  220   0   0    No Female no CHD
   2: 1081   1  48  93 108  70  340   0   0    No   Male no CHD
   3: 1123   2  45  91 160 100  171   0   0    No Female no CHD
   4: 1215   1  50 110 110  70  224   0   0    No   Male no CHD
   5: 1267   1  48  85 110  70  229  25   0   Yes   Male no CHD
  ---
1359: 6432   1  47 113 155 105  175   5   5   Yes   Male    CHD
1360: 6434   1  59  98 124  84  227  20   2   Yes   Male    CHD
1361: 6437   2  55 111 108  74  231   0   0    No Female no CHD
1362: 6440   1  49 114 110  80  218  20   0   Yes   Male no CHD
1363: 6442   2  51  95 152  90  199   1   0   Yes Female no CHD
FRAMINGHAM OUTCOME
$$Y_i = \begin{cases} 1 & \text{subject } i \text{ develops coronary heart disease (CHD)} \\ 0 & \text{subject } i \text{ does not develop CHD} \end{cases}$$
Probability of CHD for any subject similar to subject i, with respect to variables X, Z, V, ...
$$p_i = P(Y_i = 1 \mid X_i, Z_i, V_i, \dots)$$
Corresponding odds
$$\frac{p_i}{1 - p_i} = \text{odds of CHD of subject } i$$
For example:
- Xi: age of subject i
- Zi: gender of subject i
- Vi: smoking status of subject i
A BINARY EXPLANATORY VARIABLE
$$Z_i = \begin{cases} 1 & \text{if } i \text{ is a man} \\ 0 & \text{if } i \text{ is a woman} \end{cases}$$
Simple logistic regression:
$$\log\left(\frac{p_i}{1 - p_i}\right) = a + bZ_i = \begin{cases} a & \text{females} \\ a + b & \text{males} \end{cases}$$
That means,
$$b = (a + b) - a = \log(\text{odds for males}) - \log(\text{odds for females}) = \log\left(\frac{\text{odds for males}}{\text{odds for females}}\right)$$
and
$$-b = a - (a + b) = \log\left(\frac{\text{odds for females}}{\text{odds for males}}\right)$$
Note: remember that exp(−b) = 1/exp(b).
EXERCISE: 2 BY 2 CONTINGENCY TABLE
framingham[,table(Sex,Y)]

        no CHD CHD
Male       479 164
Female     616 104

- use the tools for 2x2 tables⁴
- compute the odds ratio with 95% confidence limits and corresponding p-value
- report and interpret the result in a sentence
⁴ Note: pay attention to the order of rows and columns before using any formula or software.
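One possible way to approach the exercise by hand is sketched below, using only the four counts printed in the table. The choice of comparing females to males and the use of a Wald interval on the log(odds ratio) scale are assumptions made for this sketch; other 2x2 tools may use a different interval or test.

```r
# Counts copied from the 2 by 2 table of Sex and Y
tab <- matrix(c(479, 164,
                616, 104),
              nrow = 2, byrow = TRUE,
              dimnames = list(Sex = c("Male", "Female"),
                              Y = c("no CHD", "CHD")))

# Odds of CHD within each sex, then odds ratio (female vs. male)
odds <- tab[, "CHD"]/tab[, "no CHD"]
or <- unname(odds["Female"]/odds["Male"])

# Wald 95% confidence limits and p-value on the log(odds ratio) scale
se <- sqrt(sum(1/tab))
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se)
p.value <- 2 * pnorm(-abs(log(or)/se))

round(c(OR = or, lower = ci[1], upper = ci[2]), 2)
p.value
```

Swapping the rows (male vs. female) gives the reciprocal odds ratio, which is why the order of rows and columns matters.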
LOGISTIC REGRESSION IN R
fit1 <- glm(Y~Sex, data=framingham, family=binomial)
- Y ~ Sex tells R that Y is the outcome and Sex the explanatory variable
- data=framingham tells R where to find Y and Sex
- glm means generalized linear model
- family=binomial tells R that the outcome is binary and the logit link should be used
LOGISTIC REGRESSION IN R
fit1 <- glm(Y~Sex,data=framingham,family=binomial)
summary(fit1)

Call:
glm(formula = Y ~ Sex, family = binomial, data = framingham)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.7674  -0.7674  -0.5586  -0.5586   1.9672

Coefficients:
            Estimate Std. Error z value    Pr(>|z|)
(Intercept) -1.07183    0.09047 -11.847     < 2e-16 ***
SexFemale   -0.70702    0.13937  -5.073 0.000000392 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1351.2 on 1362 degrees of freedom
Residual deviance: 1324.9 on 1361 degrees of freedom
AIC: 1328.9

Number of Fisher Scoring iterations: 4

Note: pay attention to the default reference group.
CONFIDENCE INTERVALS FOR THE ODDS RATIO
library(Publish)
fit1 <- glm(Y~Sex,data=framingham,family=binomial)
publish(fit1)

Variable  Units OddsRatio       CI.95 p-value
     Sex   Male      1.00 [1.00;1.00]       1
         Female      0.49 [0.38;0.65] <0.0001

Note: 0.49 = exp(−0.71)
Women have a significantly lower risk of developing coronary heart disease than men (odds ratio: 0.49, 95% CI: [0.38; 0.65], p-value <0.0001).
CHANGING THE REFERENCE LEVEL
framingham[,sex:=relevel(Sex,"Female")]
fit1a <- glm(Y~sex,data=framingham,family=binomial)
publish(fit1a)

Variable  Units OddsRatio       CI.95 p-value
     sex Female      1.00 [1.00;1.00]       1
           Male      2.03 [1.54;2.66] <0.0001

Note: 2.03 = exp(0.71)
Men have a significantly higher risk of developing coronary heart disease than women (odds ratio: 2.03, 95% CI: [1.54; 2.66], p-value <0.0001).
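The effect of changing the reference level can be checked without the raw data. The sketch below refits the model on the aggregated counts from the 2 by 2 table of Sex and Y, using the `cbind(successes, failures)` form of the outcome in `glm()`. The data frame `agg` and its column names are constructed here for illustration and are not part of the original analysis.

```r
# Aggregated counts copied from the 2 by 2 table of Sex and Y
agg <- data.frame(Sex = factor(c("Male", "Female"), levels = c("Male", "Female")),
                  chd = c(164, 104),
                  nochd = c(479, 616))

# Reference level "Male" (first factor level)
fitM <- glm(cbind(chd, nochd) ~ Sex, data = agg, family = binomial)

# Same model with "Female" as reference level
agg$sex <- relevel(agg$Sex, "Female")
fitF <- glm(cbind(chd, nochd) ~ sex, data = agg, family = binomial)

bM <- unname(coef(fitM)["SexFemale"])  # log OR, female vs. male
bF <- unname(coef(fitF)["sexMale"])    # log OR, male vs. female

# The log odds ratios differ only in sign, so the odds ratios are reciprocal
c(bM = bM, bF = bF)
c(exp(bM), 1/exp(bF))
```

The two fits describe the same model; only the parameterization, and therefore the default presentation of the results, changes.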
TWO EXPLANATORY VARIABLES:
$$Z_i = \begin{cases} 1 & \text{if } i \text{ male} \\ 0 & \text{female} \end{cases} \quad \text{and} \quad V_i = \begin{cases} 1 & \text{if } i \text{ smokes} \\ 0 & \text{otherwise} \end{cases}$$
Data can be summarized as two 2 by 2 tables in two ways⁵

          Males (Z=1)        Females (Z=0)
          Y = 1  Y = 0       Y = 1  Y = 0
V = 1       107    288          27    192
V = 0        57    191          77    423

          Smokers (V = 1)    Non-smokers (V = 0)
          Y = 1  Y = 0       Y = 1  Y = 0
Males       107    288          57    191
Females      27    192          77    423

⁵ Usually, one option is more interesting than the other. Here, the first option is more suitable to study the association between the risk of CHD and smoking.
COCHRAN-MANTEL-HAENSZEL TEST
In this way, we can study the effect of smoking adjusted for sex:
$$\text{OR}_{\text{Mantel-Haenszel}} = 1.03; \quad p > 0.05$$
and also study the effect of sex (male vs female) adjusted for smoking:
$$\text{OR}_{\text{Mantel-Haenszel}} = 2.03; \quad p < 0.05$$
Conclusions
- there is no significant effect of smoking adjusted for sex
- there is a significant effect of sex adjusted for smoking
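A minimal sketch of how these two adjusted odds ratios could be obtained in R with `mantelhaen.test()` from the stats package is given below. The counts are copied from the stratified 2 by 2 tables; the array layout and the object names are assumptions made for this sketch, and the default settings of the test (continuity correction, no exact test) are used.

```r
# Counts copied from the stratified 2 by 2 tables
# (dimensions: smoking status x CHD x sex)
tab <- array(c(107, 57, 288, 191,    # males:   with CHD (smokers, non-smokers), then without
               27, 77, 192, 423),    # females: same order
             dim = c(2, 2, 2),
             dimnames = list(Smoke = c("Yes", "No"),
                             CHD = c("Yes", "No"),
                             Sex = c("Male", "Female")))

# Effect of smoking adjusted for sex (strata = sex)
mh.smoke <- mantelhaen.test(tab)

# Effect of sex adjusted for smoking (strata = smoking status)
mh.sex <- mantelhaen.test(aperm(tab, c(3, 2, 1)))

round(c(smoke = unname(mh.smoke$estimate), sex = unname(mh.sex$estimate)), 2)
c(smoke = mh.smoke$p.value, sex = mh.sex$p.value)
```

`aperm()` only reorders the dimensions of the same array, so that the stratifying variable is always the third dimension, as `mantelhaen.test()` expects.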
LOGISTIC REGRESSION MODEL: TWO BINARY VARIABLES
$$\log\left(\frac{p_i}{1 - p_i}\right) = a + b_1 Z_i + b_2 V_i = \begin{cases} a & \text{female non-smoker} \\ a + b_1 & \text{male non-smoker} \\ a + b_2 & \text{female smoker} \\ a + b_1 + b_2 & \text{male smoker} \end{cases}$$
Note:
$$\begin{aligned} b_1 &= (a + b_1) - a && \text{(non-smoker)} \\ &= (a + b_1 + b_2) - (a + b_2) && \text{(smoker)} \\ &= \log \text{OR} && \text{(males vs. females for given smoking status)} \end{aligned}$$
and
$$\begin{aligned} b_2 &= (a + b_2) - a && \text{(female)} \\ &= (a + b_1 + b_2) - (a + b_1) && \text{(male)} \\ &= \log \text{OR} && \text{(smokers vs. non-smokers for given gender)} \end{aligned}$$
LOGISTIC REGRESSION RESULTS
fit2=glm(Y~Sex+Smoke,data=framingham,family=binomial)
summary(fit2)

Call:
glm(formula = Y ~ Sex + Smoke, family = binomial, data = framingham)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.7716  -0.7607  -0.5564  -0.5564   1.9708

Coefficients:
            Estimate Std. Error z value   Pr(>|z|)
(Intercept) -1.09215    0.12717  -8.588    < 2e-16 ***
SexFemale   -0.69521    0.14635  -4.750 0.00000203 ***
SmokeYes     0.03296    0.14457   0.228       0.82
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1350.8 on 1361 degrees of freedom
Residual deviance: 1324.5 on 1359 degrees of freedom
  (1 observation deleted due to missingness)
AIC: 1330.5

Number of Fisher Scoring iterations: 4
EXTRACTING ODDS RATIOS WITH CONFIDENCE INTERVALS
publish(fit2)

Variable  Units Missing OddsRatio       CI.95 p-value
     Sex   Male       0       Ref
         Female              0.50 [0.37;0.66] <0.0001
   Smoke     No       1       Ref
            Yes              1.03 [0.78;1.37]  0.8196

Logistic regression adjusted for smoking status showed a 50% decrease in the odds of CHD in women compared to men (odds ratio: 0.50, 95% CI: [0.37; 0.66], p<0.0001).
Exercise: Based on this fitted model, compute the risk of CHD for a non-smoking woman, a non-smoking man, a smoking woman and a smoking man.
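One possible way to approach this exercise is sketched below. It plugs the estimated coefficients printed by summary(fit2) into the inverse logit, using base R's `plogis()`. The object names (`a`, `b1`, `b2`, `grid`) are chosen here for illustration; with the fitted object available, `predict(fit2, newdata = ..., type = "response")` would be the more direct route.

```r
# Estimated coefficients copied from summary(fit2)
a  <- -1.09215   # intercept: log(odds) for a non-smoking man
b1 <- -0.69521   # SexFemale
b2 <-  0.03296   # SmokeYes

# All four combinations of sex and smoking status
grid <- expand.grid(Female = c(0, 1), Smoker = c(0, 1))

# Linear predictor, then risk via the inverse logit:
# plogis(x) = exp(x)/(1 + exp(x))
grid$lp <- a + b1 * grid$Female + b2 * grid$Smoker
grid$risk <- plogis(grid$lp)
grid
```

Because the model is additive on the log(odds) scale, the odds ratio between women and men is the same for smokers and non-smokers, while the risk differences are not.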
SIMPLE LOGISTIC REGRESSION: CATEGORICAL EXPLANATORY VARIABLE
Categorize age into 4 intervals:
45-48, 49-52, 53-56, 57-62
Summarize in 2 by 4 table

        X = 0  X = 1  X = 2  X = 3
        45-48  49-52  53-56  57-62  Total
Y = 0     308    298    254    235   1095
Y = 1      51     61     64     92    268
Total     359    359    318    327   1363

(Note: both males and females)
ANOVA: χ² TEST
We may test whether the risk of CHD differs between the 4 age groups using a chi-square test statistic, in this case with 3 degrees of freedom:
Null hypothesis:
$$\text{odds}(\text{age } 45\text{-}48) = \text{odds}(\text{age } 49\text{-}52) = \text{odds}(\text{age } 53\text{-}56) = \text{odds}(\text{age } 57\text{-}62)$$
Pearson χ² test:
$$\sum \frac{(\text{OBS} - \text{EXP})^2}{\text{EXP}} = 23.29 \sim \chi^2_3, \quad p < 0.001$$
Conclusion: the CHD risks in the 4 age groups are (significantly) not all equal.
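The test statistic can be reproduced from the 2 by 4 table alone. The sketch below uses `chisq.test()` from the stats package on the counts copied from the table; the object names are chosen for illustration.

```r
# Counts copied from the 2 by 4 table (rows: Y = 0 and Y = 1; columns: age groups)
tab <- matrix(c(308, 298, 254, 235,
                 51,  61,  64,  92),
              nrow = 2, byrow = TRUE,
              dimnames = list(Y = c("0", "1"),
                              Age = c("45-48", "49-52", "53-56", "57-62")))

# Pearson chi-square test of equal CHD risk in the four age groups
test <- chisq.test(tab)
test$statistic   # sum of (OBS - EXP)^2 / EXP
test$parameter   # degrees of freedom
test$p.value

# The same statistic computed by hand from the expected counts
sum((tab - test$expected)^2 / test$expected)
```

The expected counts in `test$expected` are those obtained under the null hypothesis of the same risk in all four age groups.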
LOGISTIC REGRESSION: CATEGORICAL VARIABLE WITH 4 LEVELS
$$\log\left(\frac{p_i}{1 - p_i}\right) = \begin{cases} a & \text{age } 45\text{-}48 \\ a + b_1 & \text{age } 49\text{-}52 \\ a + b_2 & \text{age } 53\text{-}56 \\ a + b_3 & \text{age } 57\text{-}62 \end{cases}$$
Reference category 45-48
$$a = \log(\text{Odds}(45\text{-}48))$$
$$b_1 = \log\left(\frac{\text{Odds}(49\text{-}52)}{\text{Odds}(45\text{-}48)}\right) \quad b_2 = \log\left(\frac{\text{Odds}(53\text{-}56)}{\text{Odds}(45\text{-}48)}\right) \quad b_3 = \log\left(\frac{\text{Odds}(57\text{-}62)}{\text{Odds}(45\text{-}48)}\right)$$
RESULTS
framingham[,AgeCut:=cut(AGE,c(40,48,52,56,99),labels=c("45-48","49-52","53-56","57-62"))]
fit3=glm(Y~AgeCut,data=framingham,family=binomial)
publish(fit3)

Variable Units OddsRatio       CI.95  p-value
  AgeCut 45-48       Ref
         49-52      1.24 [0.82;1.85]  0.30425
         53-56      1.52 [1.02;2.28]  0.04151
         57-62      2.36 [1.61;3.46] < 0.0001

Notes:
1. The interpretation depends on the cut-off values.
2. Not all (six) comparisons are in the table; for example, the odds ratio for group 57-62 vs 53-56 is not. But you should now be able to "easily" compute other odds ratio estimates from the above results.
3. Running similar code after changing the reference group is a convenient "trick" to easily obtain any odds ratio estimate, with corresponding 95% confidence interval and p-value.
EQUIVALENT RESULTS
Let us change the reference group to 53-56 and re-fit the model.
framingham[,AgeCutb:=relevel(AgeCut,"53-56")]
fit3b=glm(Y~AgeCutb,data=framingham,family=binomial)
publish(fit3b)

Variable Units OddsRatio       CI.95 p-value
 AgeCutb 53-56       Ref
         45-48      0.66 [0.44;0.98] 0.04151
         49-52      0.81 [0.55;1.20] 0.29468
         57-62      1.55 [1.08;2.24] 0.01798

As expected, we could have computed these estimates from the previous (equivalent) model. E.g.:
- 0.66 = 1/1.52, that is OR(45-48 vs 53-56) = 1/OR(53-56 vs 45-48)
- 1.55 = 2.36/1.52, that is OR(57-62 vs 53-56) = OR(57-62 vs 45-48)/OR(53-56 vs 45-48)
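With a single categorical covariate, the odds ratios estimated by logistic regression coincide with those computed directly from the 2 by 4 table of age group and CHD. The sketch below illustrates the change of reference group using only those counts; the object names are chosen here for illustration.

```r
# CHD cases and non-cases per age group, copied from the 2 by 4 table
chd   <- c("45-48" = 51,  "49-52" = 61,  "53-56" = 64,  "57-62" = 92)
nochd <- c("45-48" = 308, "49-52" = 298, "53-56" = 254, "57-62" = 235)
odds <- chd/nochd

# Odds ratios with reference group 45-48 and with reference group 53-56
or.ref1 <- odds/odds["45-48"]
or.ref3 <- odds/odds["53-56"]
round(rbind(or.ref1, or.ref3), 2)

# Changing the reference only rescales:
# OR(k vs 53-56) = OR(k vs 45-48)/OR(53-56 vs 45-48)
all.equal(or.ref3, or.ref1/or.ref1["53-56"])
```

This gives the point estimates only; confidence intervals and p-values for a new comparison are most easily obtained by re-fitting the model with the new reference group.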
QUANTITATIVE EXPLANATORY FACTOR
It is sometimes more natural / better to include the variable AGE (in years) as a quantitative explanatory factor in the model (i.e., no grouping)⁶
$$\log\left(\frac{p_i}{1 - p_i}\right) = a + b \cdot \text{age}_i$$
$$a = \log(\text{odds}(\text{age} = 0))$$
$$b = \log(\text{odds}(\text{age} = x + 1)) - \log(\text{odds}(\text{age} = x))$$
Interpretation: exp(b) = odds ratio is the factor by which the odds of CHD are multiplied for each one-unit increase of age (here 1 year).
⁶ Sometimes better but not always, due to the linearity assumption.
RESULTS
fit5=glm(Y~AGE,data=framingham,family=binomial)
summary(fit5)

Call:
glm(formula = Y ~ AGE, family = binomial, data = framingham)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8600  -0.7052  -0.6082  -0.5224   2.0294

Coefficients:
            Estimate Std. Error z value       Pr(>|z|)
(Intercept) -4.88431    0.77372  -6.313 0.000000000274 ***
AGE          0.06581    0.01446   4.550 0.000005374208 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1351.2 on 1362 degrees of freedom
Residual deviance: 1330.2 on 1361 degrees of freedom
AIC: 1334.2

Number of Fisher Scoring iterations: 4
COMPARING MODELS (MODEL CHECK)
[Figure: estimated risk of CHD (%) (y-axis: 10 to 40) against age (x-axis: 45 to 62, with ticks at 45, 48, 52, 56 and 62), comparing the model with no categorization (linear) and the model with categorization.]
REPORTING RESULTS
One-year change in age (not very good)
fit5=glm(Y~AGE,data=framingham,family=binomial)
publish(fit5)

Variable Units OddsRatio       CI.95  p-value
     AGE            1.07 [1.04;1.10] < 0.0001

10-year change in age (probably better)
framingham[,age10:=AGE/10]
fit5=glm(Y~age10,data=framingham,family=binomial)
publish(fit5)

Variable Units OddsRatio       CI.95  p-value
   age10            1.93 [1.45;2.56] < 0.0001

These results are completely equivalent: 1.93 is the one-year odds ratio (1.07 after rounding) raised to the power 10, that is, 1.93 = exp(10 × 0.0658). The fitted models are the same, but the "default" way of presenting the results is different.
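The equivalence can be checked numerically from the estimate and standard error for AGE printed by summary(fit5) for the one-year model. The sketch below assumes Wald confidence limits on the log(odds ratio) scale; the helper function `or.per()` is hypothetical and defined here only for illustration.

```r
# Estimate and standard error for AGE (per one year), copied from the summary output
b  <- 0.06581
se <- 0.01446

# Odds ratio and Wald 95% confidence limits per k years of age
or.per <- function(k) {
  exp(k * (b + c(estimate = 0, lower = -1, upper = 1) * qnorm(0.975) * se))
}

round(or.per(1), 2)    # one-year change
round(or.per(10), 2)   # 10-year change
```

Multiplying the coefficient and its standard error by 10 before exponentiating is the same as raising the one-year odds ratio and its confidence limits to the power 10.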
EXERCISES
If we subtract the value 50 from each person's age and then divide by 10:
framingham[,Age50:=(AGE-50)/10]
fit5a=glm(Y~Age50,data=framingham,family=binomial)
summary(fit5a)

Call:
glm(formula = Y ~ Age50, family = binomial, data = framingham)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8600  -0.7052  -0.6082  -0.5224   2.0294

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.59382    0.08347  -19.09  < 2e-16 ***
Age50        0.65810    0.14465    4.55 5.37e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1351.2 on 1362 degrees of freedom
Residual deviance: 1330.2 on 1361 degrees of freedom
AIC: 1334.2

Number of Fisher Scoring iterations: 4

1. Report the coronary heart disease risk of a person aged 50.
2. Report the association between age and risk of coronary heart disease in a sentence with confidence interval and p-value.
MULTIPLE LOGISTIC REGRESSION
Additive effects of several explanatory variables:
$$\log\left(\frac{p_i}{1 - p_i}\right) = a + b_1 Z_i + b_2 X_i + \dots$$
Multiple logistic regression is a way to control confounding:
The effect on the outcome (odds ratio) of each explanatory variable is mutually adjusted for the other explanatory variables.
- The model assumes that the effect (odds ratio) of Z on Y is the same for all values of X.
- In other words: the effect of Z on Y is not modified by the values of X (no statistical interaction).
ILLUSTRATION OF WHAT "MUTUALLY ADJUSTED" MEANS
Additive model (no statistical interactions)
$$\log\left(\frac{p_i}{1 - p_i}\right) = a + b_1 Z_i + b_2 X_i$$
Effect of sex Zi (0 = female, 1 = male) adjusted for age (Xi)
$$\frac{\text{odds}(\text{age}=50, \text{male})}{\text{odds}(\text{age}=50, \text{female})} = \frac{\exp(a + b_1 + b_2 \cdot 50)}{\exp(a + b_2 \cdot 50)} = \exp(a + b_1 + b_2 \cdot 50 - a - b_2 \cdot 50) = \exp(b_1).$$
The result is the same for age 46 and age 61 and all other ages.
ILLUSTRATION OF WHAT "MUTUALLY ADJUSTED" MEANS (CONTINUED)
Effect of age (Xi) for males:
$$\frac{\text{odds}(\text{age}=51, \text{male})}{\text{odds}(\text{age}=50, \text{male})} = \frac{\exp(a + b_1 + b_2 \cdot 51)}{\exp(a + b_1 + b_2 \cdot 50)} = \exp(a + b_1 + b_2 \cdot 51 - a - b_1 - b_2 \cdot 50) = \exp(b_2).$$
The result is the same for females:
$$\frac{\text{odds}(\text{age}=51, \text{female})}{\text{odds}(\text{age}=50, \text{female})} = \frac{\exp(a + b_2 \cdot 51)}{\exp(a + b_2 \cdot 50)} = \exp(a + b_2 \cdot 51 - a - b_2 \cdot 50) = \exp(b_2).$$
Linearity means that the result is the same for a comparison of age 63 and age 62 and all other one-year differences.
RESULTS (RAW)
fit.add=glm(Y ~ AGE + Sex, family = binomial, data = framingham)
summary(fit.add)

Call:
glm(formula = Y ~ AGE + Sex, family = binomial, data = framingham)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9910  -0.6927  -0.5958  -0.4500   2.1913

Coefficients:
            Estimate Std. Error z value      Pr(>|z|)
(Intercept) -4.59208    0.78019  -5.886 0.00000000396 ***
AGE          0.06672    0.01458   4.575 0.00000475151 ***
SexFemale   -0.71613    0.14052  -5.096 0.00000034612 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1351.2 on 1362 degrees of freedom
Residual deviance: 1303.5 on 1360 degrees of freedom
AIC: 1309.5

Number of Fisher Scoring iterations: 4
RESULTS (FORMATTED FOR PUBLICATION)
fit.add=glm(Y ~ AGE + Sex, family = binomial, data = framingham)
publish(fit.add)

Variable  Units   OddsRatio  CI.95         p-value
AGE               1.07       [1.04; 1.10]  < 0.0001
Sex       Male    Ref
          Female  0.49       [0.37; 0.64]  < 0.0001
Logistic regression was used to investigate gender differences in odds (risks) of CHD adjusted for age.
The age-adjusted odds ratio was 0.49 (95%-CI: [0.37;0.64]), showing that the risks of CHD were significantly lower for women compared to men (p<0.0001).
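The published odds ratios and intervals can be traced back to the raw coefficient table. The sketch below assumes Wald-type intervals, exp(estimate ± 1.96 × standard error). The slides do not state which interval type `publish()` uses, so this is an assumption, but the numbers agree with the table to two decimals.

```r
# Estimates and standard errors copied (rounded) from the raw summary() output
est <- c(AGE = 0.06672, SexFemale = -0.71613)
se  <- c(AGE = 0.01458, SexFemale =  0.14052)

# Wald-type 95% confidence limits on the log-odds scale, then exponentiated
# (assumption: this is how the published table was obtained)
z <- qnorm(0.975)
or_table <- cbind(OddsRatio = exp(est),
                  lower     = exp(est - z * se),
                  upper     = exp(est + z * se))

print(round(or_table, 2))
```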
PREDICTED RISKS BASED ON LOGISTIC REGRESSION MODEL
A logistic regression model can be used to predict personalized risks:
$$\log\left(\frac{p_i}{1-p_i}\right) = a + b_1 Z_i + b_2 X_i + \dots$$
is equivalent to
$$p_i = \frac{\exp(a + b_1 Z_i + b_2 X_i + \dots)}{1 + \exp(a + b_1 Z_i + b_2 X_i + \dots)}$$
The risks (and risk ratios) depend on all explanatory variables simultaneously.
We can predict a risk for any value of the covariates Z, X, ... once we have estimated the model parameters [7].
[7] However, utmost caution is needed when using covariate values beyond the range of those observed (e.g. age=110). Usually we do not want to extrapolate beyond the observed data.
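The equivalence between the log-odds formula and the risk formula on this slide is a short piece of algebra. It is sketched here with the shorthand $\eta_i$ for the linear predictor, a symbol introduced only for this note:

```latex
\begin{align*}
\eta_i &= a + b_1 Z_i + b_2 X_i + \dots \\
\frac{p_i}{1-p_i} &= \exp(\eta_i) \\
p_i &= \exp(\eta_i) - p_i \exp(\eta_i) \\
p_i \{1 + \exp(\eta_i)\} &= \exp(\eta_i) \\
p_i &= \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}
\end{align*}
```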
PREDICTED RISKS BASED ON LOGISTIC REGRESSION MODEL
mydata=expand.grid(AGE=c(50,55),Sex=factor(c("Female","Male")))
setDT(mydata)
mydata

   AGE    Sex
1:  50 Female
2:  55 Female
3:  50   Male
4:  55   Male

mydata[,risk:=predict(fit.add,newdata=mydata,type="response")]
mydata

   AGE    Sex      risk
1:  50 Female 0.1221381
2:  55 Female 0.1626353
3:  50   Male 0.2216284
4:  55   Male 0.2844255
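The predicted risks can be reproduced by hand from the coefficients, which also shows that risk ratios, unlike odds ratios, change with age. This is a minimal sketch assuming the rounded estimates from the raw summary output, so the results agree with `predict()` only up to rounding. It uses base R only, and `plogis()` is the inverse logit.

```r
# Rounded coefficients of the additive model (Male is the reference level)
a        <- -4.59208
b_age    <-  0.06672
b_female <- -0.71613

# Same covariate combinations as in mydata (AGE varies fastest)
grid <- expand.grid(AGE = c(50, 55), Sex = c("Female", "Male"),
                    stringsAsFactors = FALSE)

# Linear predictor, then inverse logit: p = exp(eta) / (1 + exp(eta))
eta <- a + b_age * grid$AGE + b_female * (grid$Sex == "Female")
grid$risk <- plogis(eta)
print(grid)

# Risk ratio male vs female at age 50 and at age 55
rr <- grid$risk[grid$Sex == "Male"] / grid$risk[grid$Sex == "Female"]
names(rr) <- c("age50", "age55")
print(rr)  # the two risk ratios differ, although the odds ratio is constant
```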
VISUALIZATION OF PREDICTED RISKS
mydata2 <- setDT(expand.grid(AGE=seq(45,62,1),Sex=factor(c("Female","Male"))))
mydata2[,risk:=predict(fit.add,newdata=mydata2,type="response")]
library(ggplot2)
ggplot(mydata2,aes(x=AGE,y=risk,group=Sex,colour=Sex))+geom_line()+ylim(c(0,1))+xlab("Age (years)")+ylab("Risk of CHD")
[Figure: risk of CHD (y-axis, 0 to 1) against age in years (x-axis, 45 to 60), one line per sex (Female, Male).]
EXAMPLE WITH MORE VARIABLES
framingham[,Chol10:=CHOL/10]
fit.multi <- glm(Y ~ AGE + Sex + Chol10 + SBP + Smoke, family = binomial, data = framingham)
publish(fit.multi)

Variable  Units   Missing  OddsRatio  CI.95        p-value
AGE               0        1.06       [1.02;1.09]  0.0004181
Sex       Male    0        Ref
          Female           0.38       [0.28;0.52]  < 0.0001
Chol10            0        1.05       [1.02;1.08]  0.0026086
SBP               0        1.02       [1.01;1.02]  < 0.0001
Smoke     No      1        Ref
          Yes              1.19       [0.88;1.60]  0.2510447
EXERCISE
1. Report the effect of cholesterol on coronary heart disease from the multiple logistic regression model.
2. Predict the coronary heart disease risks of four smoking females, all aged 50 and with SBP 150, but with different cholesterol values:
   - person 1: 235, person 2: 245, person 3: 351, person 4: 361
mydata=data.frame(AGE=50,
                  Sex=factor("Female",levels(framingham$Sex)),
                  Smoke=factor("Yes",levels(framingham$Smoke)),
                  SBP=150,
                  Chol10=c(23.5,24.5,35.1,36.1))
3. Compute the risk ratios for 10 unit cholesterol changes from 245 to 235 and from 361 to 351. Are they similar? Did you expect that?
4. Repeat 2. and 3. for a male person.
Statistical interaction = Effect modification
The effect of X on Y depends on Z
EFFECT MODIFICATION
Setting: 3 variables
- two explanatory variables X, Z
- one outcome Y
Meaning: In logistic regression, an interaction means that the odds ratio which describes the effect of X on the odds of Y=1 depends on the value of Z.
Symmetry: If the effect of variable X on Y is modified by Z, then the effect of Z on Y is also modified by X.
Example: The age (Z) effect on the CHD risk (Y) may depend on sex (X).
A MODEL WITH AN INTERACTION
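$$\log\left(\frac{p_i}{1-p_i}\right) = a + b_1 Z_i + b_2 X_i + b_3 X_i \cdot Z_i$$
The effect of sex $Z_i$ (0 = female, 1 = male) depends on age ($X_i$):
$$\frac{\text{odds}(\text{age}=50,\ \text{male})}{\text{odds}(\text{age}=50,\ \text{female})} = \frac{\exp(a + b_1 + b_2 \cdot 50 + b_3 \cdot 50)}{\exp(a + b_2 \cdot 50)} = \exp(b_1 + b_3 \cdot 50).$$
The effect of age ($X_i$) depends on sex $Z_i$ (0 = female, 1 = male):
$$\frac{\text{odds}(\text{age}=50,\ \text{male})}{\text{odds}(\text{age}=45,\ \text{male})} = \exp(b_2 \cdot 5 + b_3 \cdot 5).$$
$$\frac{\text{odds}(\text{age}=50,\ \text{female})}{\text{odds}(\text{age}=45,\ \text{female})} = \exp(b_2 \cdot 5).$$
Note: $\exp(b_2)$ describes the odds ratio for age in the reference group for sex (female) only, while it is $\exp(b_2 + b_3)$ in the other group (male).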
STATISTICAL INTERACTION IN R
First option (more transparent):
glm(Y ~ AGE + Sex + AGE:Sex, family = binomial, data = framingham)
Shorter syntax:
glm(Y ~ AGE * Sex, family = binomial, data = framingham)
RESULTS
summary(glm(Y ~ AGE * Sex, family = binomial, data = framingham))

Call:
glm(formula = Y ~ AGE * Sex, family = binomial, data = framingham)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9171  -0.7284  -0.6074  -0.4010   2.3029

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -3.45290    1.00008  -3.453 0.000555 ***
AGE            0.04523    0.01883   2.402 0.016288 *
SexFemale     -3.54459    1.60431  -2.209 0.027146 *
AGE:SexFemale  0.05297    0.02987   1.773 0.076194 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1351.2 on 1362 degrees of freedom
Residual deviance: 1300.4 on 1359 degrees of freedom
AIC: 1308.4

Number of Fisher Scoring iterations: 4
STATISTICAL INTERACTION
publish(glm(Y ~ AGE * Sex, family = binomial, data = framingham))

Variable          Units  OddsRatio  CI.95        p-value
AGE: Sex(Male)           1.05       [1.01;1.09]  0.01629
AGE: Sex(Female)         1.10       [1.05;1.15]  < 0.0001

Notes:
- The main effects for AGE and Sex have no interpretation (and are therefore not shown).
- One year more in age increases the odds by 5% in males and by 10% in females.
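The two age odds ratios in the table can be recovered from the raw coefficients of the interaction model. The sketch below assumes the rounded estimates from the `summary()` output shown earlier. Because R uses Male as the reference level, the AGE coefficient alone gives the male odds ratio and the interaction term is added for females. The helper `or_female_vs_male` is a hypothetical name used only here.

```r
# Rounded coefficients of the interaction model (Male is the reference level)
b_age    <-  0.04523   # AGE
b_female <- -3.54459   # SexFemale
b_int    <-  0.05297   # AGE:SexFemale

# Odds ratio per one year of age, separately for each sex
or_age <- c(Male = exp(b_age), Female = exp(b_age + b_int))
print(round(or_age, 2))

# Hypothetical helper: odds ratio female vs male, which now depends on age
or_female_vs_male <- function(age) exp(b_female + b_int * age)
print(round(or_female_vs_male(c(50, 60)), 2))
```

The second printout illustrates the symmetry of effect modification: once the age effect differs by sex, the sex effect also differs by age.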
PREDICTED RISK OF THE MODEL WITH AN ADDITIVE EFFECT OF AGE AND SEX (NO EFFECT MODIFICATION)
[Figure, two panels with curves for males and females over ages 45 to 60. Left panel: estimated CHD risk (%) against age (years). Right panel: logit of estimated CHD risk against age (years).]
PREDICTED RISK OF THE MODEL WITH AN INTERACTION BETWEEN AGE AND SEX (WITH EFFECT MODIFICATION)
[Figure, two panels with curves for males and females over ages 45 to 60. Left panel: estimated CHD risk (%) against age (years). Right panel: logit of estimated CHD risk against age (years).]
TESTING FOR STATISTICAL INTERACTION
fit.add=glm(Y ~ AGE + Sex, family = binomial, data = framingham)
fit.int=glm(Y ~ AGE * Sex, family = binomial, data = framingham)
anova(fit.add,fit.int,test="Chisq")

Analysis of Deviance Table

Model 1: Y ~ AGE + Sex
Model 2: Y ~ AGE * Sex
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1      1360     1303.5
2      1359     1300.4  1   3.1676  0.07511 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
There is no statistically significant modification of the age effect by gender.
Note: this method and code also work when neither of the two variables is binary.
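The p-value in the deviance table comes from comparing the drop in residual deviance with a chi-squared distribution. As a minimal sketch, assuming the deviance difference and degrees of freedom printed in the table above, it can be recomputed with base R:

```r
# Likelihood ratio statistic: drop in residual deviance when the interaction
# term is added (value copied from the anova() output)
lr_stat <- 3.1676

# One extra parameter in the interaction model (1360 - 1359 residual df)
df_diff <- 1360 - 1359

# Upper tail of the chi-squared distribution gives the p-value
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
print(p_value)
```

With two continuous variables the interaction still adds a single parameter, so the same computation applies. A categorical variable with more than two levels would add several parameters, and `df_diff` would grow accordingly.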
TAKE HOME MESSAGES
- (Multiple) logistic regression describes associations between one or several explanatory variables and the risk of an event (binary outcome).
- Analysis of an exposure of interest can be adjusted for potential confounders.
- In an additive model (no interactions), odds ratios do not depend on the other explanatory variables.
- Risks and risk ratios predicted by the model depend on the other explanatory variables.
- Linearity and absence of interaction are assumptions which should be investigated.
- Models with interactions are flexible and useful but need more concentration to be interpreted correctly.
- Several models can be fitted from the same data, but some are more relevant than others for a given research question (e.g. in terms of adjustments and interactions).