Assessing Binary Outcomes: Logistic Regression

Peter T. Donnan, Professor of Epidemiology and Biostatistics
Statistics for Health Research


Post on 13-Feb-2016


Page 1: Assessing Binary Outcomes: Logistic Regression

Assessing Binary Outcomes: Logistic Regression

Peter T. Donnan

Professor of Epidemiology and Biostatistics

Statistics for Health Research

Page 2: Assessing Binary Outcomes: Logistic Regression

Objectives of Session

• Understand what is meant by a binary outcome

• How analyses of binary outcomes are implemented in a logistic regression model

• Understand when a logistic model is appropriate

• Be able to implement in SPSS

• Interpret logistic model output

Page 3: Assessing Binary Outcomes: Logistic Regression

Binary Outcome

Extremely common in health research:

• Dead / Alive

• Hospitalisation (Yes / No)

• Diagnosis of diabetes (Yes / No)

• Met target e.g. total cholesterol < 5.0 mmol/l (Yes / No)

n.b. Can use any code such as 1 / 2 but mathematically easier to use 0 / 1

Page 4: Assessing Binary Outcomes: Logistic Regression

How is relationship formulated?

For a linear model the simplest equation is:

$$y = a + bx + e_i$$

y is the outcome; a is the intercept; b is the slope related to x, the explanatory variable; and e is the error term or random 'noise'

Page 5: Assessing Binary Outcomes: Logistic Regression

Can we fit y as a probability range 0 to 1?

$$y = a + bx + e_i$$

Not quite!

Y as continuous can take any value from -∞ to +∞

Outcome is a probability of event, Π (or p), on scale 0 – 1

Certain transformations of p can give the required scale

Probit is a normal transformation of p

But not easy to interpret results

Page 6: Assessing Binary Outcomes: Logistic Regression

We can now fit p as a probability range 0 to 1

And y in range -∞ to +∞

$$y = \mathrm{logit}(p) = a + bx + e_i$$

The logit transformation works!

$$\log\left(\frac{p}{1-p}\right) = a + bx + e_i$$
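A minimal sketch of the logit transformation, to show how probabilities between 0 and 1 are stretched onto the whole real line. The function name `logit` and the example probabilities are illustrative choices, not part of the original slides.

```python
import math

def logit(p):
    """Log odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Probabilities near 0 map to large negative values, near 1 to large
# positive values, and p = 0.5 maps to 0.
for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    print(f"p = {p:.2f}  logit(p) = {logit(p):+.3f}")
```

The transformed value is symmetric about p = 0.5, which is one reason the logit scale is convenient for a linear model.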

Page 7: Assessing Binary Outcomes: Logistic Regression

Logistic Regression Model

$$\log\left(\frac{p}{1-p}\right) = a + bx + e_i$$

This has very useful properties

The term p/(1-p) is called the 'Odds' of an event

Note: not the same as the probability of an event p

If x is binary coded 0/1 then

exp (b) = ODDS RATIO

for the outcome in those coded 1 relative to code 0

e.g. Odds of death in men (1) vs. women (0)

Page 8: Assessing Binary Outcomes: Logistic Regression

Logistic Regression Model

Consider the LDL data. It has two binary outcomes –

1) LDL target achieved

2) Chol target achieved

For example consider gender as a predictor – Male = 1 & Female = 2

Page 9: Assessing Binary Outcomes: Logistic Regression

For a binary x we can express results as odds ratios (available in crosstabs)

LDL target achieved

            No     Yes
Male        140    563
Female      149    531

Odds yes (Male) = 563/140

Odds yes (Female) = 531/149

Page 10: Assessing Binary Outcomes: Logistic Regression

Odds ratio = 3.56 / 4.02

OR = 0.886 Female cf Male

LDL target achieved

            No     Yes
Male        140    563
Female      149    531

Odds yes (Male) = 563/140 = 4.02

Odds yes (Female) = 531/149 = 3.56

N.b. Odds is different to prob – Men p = 563/(140+563) = 0.80 or 80%
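The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch using the four counts from the LDL crosstab; the variable names are illustrative.

```python
# Counts from the LDL crosstab: (No, Yes) for each gender
male_no, male_yes = 140, 563
female_no, female_yes = 149, 531

# Odds of achieving the LDL target = number of Yes / number of No
odds_male = male_yes / male_no
odds_female = female_yes / female_no

# Odds ratio for females relative to males
odds_ratio_f_vs_m = odds_female / odds_male

# Probability is Yes / total, which is not the same quantity as the odds
prob_male = male_yes / (male_no + male_yes)

print(f"Odds (male)      = {odds_male:.2f}")
print(f"Odds (female)    = {odds_female:.2f}")
print(f"OR female vs male = {odds_ratio_f_vs_m:.3f}")
print(f"P(target | male) = {prob_male:.2f}")
```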

Page 11: Assessing Binary Outcomes: Logistic Regression

Odds ratio from Crosstabs

Obtain odds ratios for 2 x 2 tables from crosstabs and select option 'risk'

Page 12: Assessing Binary Outcomes: Logistic Regression

Results from Crosstabs

Odds ratios for achieving LDL target in females vs. males

n.b. OR given for Female vs male = 0.886

Page 13: Assessing Binary Outcomes: Logistic Regression

Fit Logistic Regression Model

Dependent is binary outcome – LDL target met (Yes = 1, No = 0)

Independent – Gender 1 = M, 2 = F

Should get same as the crosstabs result

Select Analyze / Regression / Binary Logistic

Select option of 95% CI for exp (b)
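Outside SPSS, the same check can be sketched by fitting the model directly. The code below is a hand-rolled Newton-Raphson fit on the grouped LDL counts, not the SPSS procedure itself; the function name `fit_logistic` and the 0/1 recoding of gender (male = 0, female = 1) are assumptions made for the illustration. A one-unit step from 1 to 2 in the original coding gives the same exp(b).

```python
import math

# Grouped data: (x, total, number achieving LDL target)
# x = 0 for male, x = 1 for female (recoded from 1 = M, 2 = F)
groups = [(0, 140 + 563, 563), (1, 149 + 531, 531)]

def fit_logistic(groups, iterations=25):
    """Fit log(p/(1-p)) = a + b*x by Newton-Raphson on grouped counts."""
    a, b = 0.0, 0.0
    se_b = float("nan")
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # information matrix entries
        for x, n, yes in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = yes - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += inverse(information) * gradient
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
        # Standard error of b from the inverse information matrix
        se_b = math.sqrt(h00 / det)
    return a, b, se_b

a, b, se_b = fit_logistic(groups)
low = math.exp(b - 1.96 * se_b)
high = math.exp(b + 1.96 * se_b)
print(f"exp(b) = {math.exp(b):.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

With a single binary predictor the fitted exp(b) should reproduce the crosstabs odds ratio, which is the point the slide makes.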

Page 14: Assessing Binary Outcomes: Logistic Regression

Regression / Binary logistic…

Page 15: Assessing Binary Outcomes: Logistic Regression

Odds ratio from logistic model results for a binary predictor

EXP (B) = Odds ratio F vs. M

Note that OR for Men vs Women = 1/0.886 = 1.13

Page 16: Assessing Binary Outcomes: Logistic Regression

Fit Logistic Regression Model – continuous predictor

Dependent is binary outcome – LDL target met

Independent – Continuous predictor – Adherence

B represents the change in the log odds (the log ODDS RATIO) for a 1 unit increase in adherence; exp (B) is the ODDS RATIO

B x 10 represents the log ODDS RATIO for a 10 unit increase in adherence; exp (B x 10) is the corresponding ODDS RATIO

Page 17: Assessing Binary Outcomes: Logistic Regression

Odds ratio from logistic model results for a continuous predictor

EXP (B) = Odds ratio for 1% increase in Adherence

OR for 10% increase is exp(10 x 0.010) = 1.105

i.e. a 10.5% increase in odds of meeting LDL target for each 10% increase in adherence
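A short sketch of the scaling rule, using the coefficient B = 0.010 quoted on the slide. The variable names are illustrative.

```python
import math

b = 0.010  # B for adherence, per 1% increase (value quoted on the slide)

or_per_1 = math.exp(b)        # odds ratio for a 1% increase
or_per_10 = math.exp(10 * b)  # odds ratio for a 10% increase

# Multiply B by the size of the increase BEFORE exponentiating;
# equivalently, raise the 1-unit odds ratio to that power.
print(f"OR per 1%  = {or_per_1:.3f}")
print(f"OR per 10% = {or_per_10:.3f}")
print(f"(OR per 1%)^10 = {or_per_1 ** 10:.3f}")
```

Note that the 10-unit odds ratio is not ten times the 1-unit odds ratio; the effect is multiplicative on the odds scale.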

Page 18: Assessing Binary Outcomes: Logistic Regression

Fit Logistic Regression Model – categorical predictor

Dependent is binary outcome – LDL target met

Independent – APOE genotype (1 – 6)

Choose a reference category; in this case the worst outcome is genotype 6, so choose 6 to give ORs > 1

exp (B) represents the OR for each category relative to the reference category

Page 19: Assessing Binary Outcomes: Logistic Regression

Regression / Binary logistic…

Choose Categorical

Page 20: Assessing Binary Outcomes: Logistic Regression

Odds ratios from logistic model results for a categorical predictor

EXP (B) = Odds ratio for APOE (2) vs APOE (6)

OR = 4.381 (95% CI 1.742, 11.021)
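The link between B, its standard error and the reported interval can be sketched as follows. This assumes the interval is a Wald interval, exp(B ± 1.96 x SE), which is an assumption about how the output was produced; only the three reported numbers come from the slide.

```python
import math

# Reported on the slide for APOE (2) vs APOE (6)
or_est, ci_low, ci_high = 4.381, 1.742, 11.021

# On the log scale the interval is symmetric about B = log(OR)
b = math.log(or_est)
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Rebuild the interval from B and SE as a consistency check
rebuilt_low = math.exp(b - 1.96 * se)
rebuilt_high = math.exp(b + 1.96 * se)

print(f"B = {b:.3f}, SE = {se:.3f}")
print(f"Rebuilt 95% CI: ({rebuilt_low:.3f}, {rebuilt_high:.3f})")
```

The interval is wide because it is symmetric on the log scale, not on the odds ratio scale.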

Page 21: Assessing Binary Outcomes: Logistic Regression

Epidemiological Designs

• Logistic model common in epidemiological research

• In case-control designs, case is coded 1 and controls as 0 and used as dependent variable

• In cohort study outcome (e.g. death) is used as binary outcome in logistic model

• Note in cohort study exp(b) is still an OR; it approximates the Relative Risk (RR) only when the outcome is rare

Page 22: Assessing Binary Outcomes: Logistic Regression

Definition – Clinical Prediction Rule

• Clinical tool that quantifies contribution of:
– History
– Examination
– Diagnostic tests

• Stratify patients according to probability of having target disorder

• Outcome can be in terms of diagnosis, prognosis, referral or treatment

Page 23: Assessing Binary Outcomes: Logistic Regression

Thresholds for decision making

[Figure: derived probability of disease on a scale from 0% to 100%, divided by two thresholds. Above the diagnosis / test threshold: Treatment. Between the two thresholds: Further diagnostic testing. Below the test / reassurance threshold: Reassurance.]
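The two-threshold idea can be sketched as a small function. The threshold values 0.10 and 0.60 and the name `decide` are hypothetical, chosen only for illustration; real thresholds depend on the condition and on the costs and benefits of testing and treatment.

```python
def decide(prob, test_threshold=0.10, treat_threshold=0.60):
    """Map a derived probability of disease to an action.

    Both thresholds are hypothetical illustration values.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie between 0 and 1")
    if prob >= treat_threshold:
        return "Treatment"
    if prob >= test_threshold:
        return "Further diagnostic testing"
    return "Reassurance"

for p in (0.05, 0.30, 0.90):
    print(f"p = {p:.2f}: {decide(p)}")
```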

Page 24: Assessing Binary Outcomes: Logistic Regression

Ottawa ankle rule

Page 25: Assessing Binary Outcomes: Logistic Regression

Risk Stratification: Kaiser-Permanente Pyramid

Identify high risk through 'risk stratification' and intervene through case management at highest risk

Page 26: Assessing Binary Outcomes: Logistic Regression

Framingham Risk Algorithm

• Prediction of risk: Cardiovascular (Framingham)

55 yr-old woman 15-20% 5 yr risk

Page 27: Assessing Binary Outcomes: Logistic Regression

Increasing appearance of “prediction models” in literature (ISI Web of Knowledge v3)

Page 28: Assessing Binary Outcomes: Logistic Regression

Stages of development and assessment of a CPR

Step 1 Derivation: Identification of factors with predictive power

Step 2 Validation: Evidence of reproducible accuracy. Application of a rule in similar clinical settings and population, or better still multiple clinical settings and different populations with varying prevalence and outcomes of disease

Step 3 Impact Analysis: Evidence that rule changes physician behaviour and improves patient outcomes and/or reduces costs

Study designs shown in the diagram: cross-sectional or cohort (shown twice); randomized controlled trial

Page 29: Assessing Binary Outcomes: Logistic Regression

How to derive a CPR?

1. Toss a coin to make decision?

2. Individual opinion and experience?

3. Huddle of wise ones – Delphi technique to reach consensus?

4. Statistical prediction models!

Page 30: Assessing Binary Outcomes: Logistic Regression

Regression Models for prediction

• In all of these models we combine a set of factors:
– Usually between 2-20 predictors
– Occam’s razor suggests smaller is better

• Fit a multiple regression model

• Extract probabilities of outcome or diagnosis

• Create CPR

Page 31: Assessing Binary Outcomes: Logistic Regression

Regression Models for prediction

• Linear if outcome continuous

• Binary Outcomes
– Logistic regression model
– Survival models – Cox PH, Weibull, log logistic, etc

• Ordinal or nominal outcomes
– Ordinal logistic regression

Page 32: Assessing Binary Outcomes: Logistic Regression

We can now fit p as a probability range 0 to 1

And y in range -∞ to +∞

$$y = \mathrm{logit}(p) = a + bx + e_i$$

The logit transformation

$$\log\left(\frac{p}{1-p}\right) = a + bx + e_i$$

Page 33: Assessing Binary Outcomes: Logistic Regression

Statistical prediction Models

Logistic regression model:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$$

p = probability of the Event, and the effects of the factors (x) increase or decrease the risk of this event

Page 34: Assessing Binary Outcomes: Logistic Regression

Derivation of probability of events

Logistic regression model:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$$

Call the Linear Predictor a linear function of the predictors x1, x2, x3, etc.:

$$\beta X = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$$

Page 35: Assessing Binary Outcomes: Logistic Regression

Derivation of probability of events

Then:

$$\log\left(\frac{p}{1-p}\right) = \beta X$$

Take exp of both sides:

$$\frac{p}{1-p} = \exp(\beta X)$$

Page 36: Assessing Binary Outcomes: Logistic Regression

Derivation of probability of events

Then rearrange:

$$p = \frac{1}{1 + \exp(-\beta X)}$$

Or:

$$p = \frac{\exp(\beta X)}{1 + \exp(\beta X)}$$
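A brief sketch confirming that the two rearranged forms give the same probability and that taking the log odds recovers the linear predictor. The function names and the example linear predictor values are illustrative.

```python
import math

def prob_form1(lp):
    """p = 1 / (1 + exp(-lp)), where lp is the linear predictor."""
    return 1.0 / (1.0 + math.exp(-lp))

def prob_form2(lp):
    """p = exp(lp) / (1 + exp(lp)), the equivalent second form."""
    return math.exp(lp) / (1.0 + math.exp(lp))

for lp in (-3.0, 0.0, 2.5):
    p1, p2 = prob_form1(lp), prob_form2(lp)
    back = math.log(p1 / (1.0 - p1))  # log odds returns the linear predictor
    print(f"lp = {lp:+.1f}  p = {p1:.4f}  (second form {p2:.4f})  logit(p) = {back:+.4f}")
```

A linear predictor of 0 corresponds to a probability of 0.5, i.e. odds of 1.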

Page 37: Assessing Binary Outcomes: Logistic Regression

Risk Stratification based on derived probabilities

Example:

PEONY model to predict risk of emergency admission to hospital over the next year

Now implemented in NHS Tayside as part of Virtual Wards management of LTC

PEONY II model developed – watch this space!

Donnan et al Arch Int Med 2008

Page 38: Assessing Binary Outcomes: Logistic Regression

Other binary models

The logistic model is only applicable when the length of follow-up is the same for each individual, e.g. 5-yr follow-up of a cohort

For binary outcomes where censoring occurs, i.e. people leave the cohort through death or migration, the length of follow-up varies and we need to use survival models such as the Cox Proportional Hazards model

Page 39: Assessing Binary Outcomes: Logistic Regression

Summary

• Logistic model easily fitted in SPSS

• Clear link with ODDS RATIOS

• Common model for case-control, cohort studies as well as development of clinical prediction models

Page 40: Assessing Binary Outcomes: Logistic Regression

General References

• Campbell MJ, Machin D. Medical Statistics. A commonsense approach. 3rd ed. Wiley, New York, 1999.

• Hosmer DW and Lemeshow S. Applied logistic regression. John Wiley & Sons, New Jersey, 2000.

• Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991.

• Armitage P and Berry G. Statistical Methods in Medical research. 3rd ed. Oxford: Blackwell Scientific, 1994.

• Agresti A. An introduction to Categorical Data Analysis. Wiley, New York, 1996.

Page 41: Assessing Binary Outcomes: Logistic Regression

Practical: Fit Multiple Logistic Regression Model

Dependent is binary outcome – LDL target met (Yes = 1, No = 0)

Independent – Gender 1 = M, 2 = F, add APOE, adherence, etc

Remember:

Select Analyze / Regression / Binary Logistic

Select option of 95% CI for exp (b)

Page 42: Assessing Binary Outcomes: Logistic Regression

3) Screening for variables to eliminate

• Consider screening procedures to eliminate a number of variables under consideration

• Test each variable separately

• If p > 0.3 then they would have to be very strong confounders to become significant on adjustment in a multiple regression so could be discarded

• Hosmer-Lemeshow criteria
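The screening step can be sketched as a simple filter on univariable p-values. The p-values below are hypothetical placeholders, not results from the LDL data, and the function name `screen` is illustrative; only the 0.3 cut-off comes from the slide.

```python
# Hypothetical univariable p-values, one per candidate predictor.
# These numbers are invented for illustration only.
univariable_p = {
    "gender": 0.36,
    "adherence": 0.001,
    "apoe": 0.02,
}

def screen(p_values, cutoff=0.3):
    """Keep predictors whose univariable p-value is at or below the cutoff."""
    kept = [name for name, p in p_values.items() if p <= cutoff]
    dropped = [name for name, p in p_values.items() if p > cutoff]
    return kept, dropped

kept, dropped = screen(univariable_p)
print("Carry forward to multiple regression:", kept)
print("Candidates to discard:", dropped)
```

A discarded variable can still be added back by hand if it is considered clinically important.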

Page 43: Assessing Binary Outcomes: Logistic Regression

4) A mixture of automatic procedures and self selection

• Use automatic procedures as a guide

• Compare stepwise and backward elimination

• Think about what factors are important

• Add ‘important’ factors

• Do not blindly follow statistical significance

Page 44: Assessing Binary Outcomes: Logistic Regression

Remember Occam’s Razor

‘Entia non sunt multiplicanda praeter necessitatem’

‘Entities must not be multiplied beyond necessity’

William of Ockham, 14th-century friar and logician, 1288-1347