introduction to logistic regression. content simple and multiple linear regression simple logistic...

48
Introduction to Logistic Regression

Upload: evangeline-eaton

Post on 13-Jan-2016

271 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Introduction to Logistic Regression

Page 2: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Content

• Simple and multiple linear regression

• Simple logistic regression– The logistic function

– Estimation of parameters

– Interpretation of coefficients

• Multiple logistic regression– Interpretation of coefficients

– Coding of variables

Page 3: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Simple linear regression

Age SBP Age SBP Age SBP

22 131 41 139 52 128 23 128 41 171 54 105 24 116 46 137 56 145 27 106 47 111 57 141 28 114 48 115 58 153 29 123 49 133 59 157 30 117 49 128 63 155 32 122 50 183 67 176 33 99 51 130 71 172 35 121 51 133 77 178 40 147 51 144 81 217

Table 1 Age and systolic blood pressure (SBP) among 33 adult women

Page 4: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

80

100

120

140

160

180

200

220

20 30 40 50 60 70 80 90

SBP (mm Hg)

Age (years)

adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974

Page 5: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Simple linear regression

• Relation between 2 continuous variables (SBP and age)

• Regression coefficient 1– Measures association between y and x– Amount by which y changes on average when x changes by one

unit– Least squares method

y

x

xβαy 11Slope

Page 6: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Multiple linear regression

• Relation between a continuous variable and a set ofi continuous variables

• Partial regression coefficients i

– Amount by which y changes on average when xi changes by one unit and all the other xis remain constant

– Measures association between xi and y adjusted for all other xi

• Example– SBP versus age, weight, height, etc

xβ ... xβ xβαy ii2211

Page 7: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Multiple linear regression

Dependent Independent variables

Predicted Predictor variables

Response variable Explanatory variables

Outcome variable Covariables

xβ ... xβ xβα y ii2211

Page 8: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Multivariate analysis

Model Outcome

Linear regression continous

Poisson regression counts

Cox model survival

Logistic regression binomial

......

• Choice of the tool according to study, objectives, and the variables

– Control of confounding

– Model building, prediction

Page 9: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Logistic regression

• Models the relationship between a set of variables xi

– dichotomous (eat : yes/no)

– categorical (social class, ... )

– continuous (age, ...)

and

– dichotomous variable Y

• Dichotomous (binary) outcome most common situation in biology and epidemiology

Page 10: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Logistic regression (1)

Table 2 Age and signs of coronary heart disease (CD)

Page 11: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

How can we analyse these data?

• Comparison of the mean age of diseased and non-diseased women

– Non-diseased: 38.6 years

– Diseased: 58.7 years (p<0.0001)

• Linear regression?

Page 12: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Dot-plot: Data from Table 2

Page 13: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Logistic regression (2)

Table 3 Prevalence (%) of signs of CD according to age group

Page 14: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Dot-plot: Data from Table 3

0

20

40

60

80

100

0 2 4 6 8

Diseased

Age (years)

P1-P

Page 15: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Dot-plot: Data from Table 3

-100

-80

-60

-40

-20

0

20

40

60

80

100

0 2 4 6 8

Diseased %

Age (years)

Page 16: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

The logistic function (2)

logit of P(y|x)

{

Page 17: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

The logistic function (3)

• Advantages of the logit– Simple transformation of P(y|x)

– Linear relationship with x

– Can be continuous (Logit between - to + )– Known binomial distribution (P between 0 and 1)

– Directly related to the notion of odds of disease

βxαP-1

P ln

eP-1

P βxα

Page 18: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Interpretation of (1)

eP-1

P βxα

Page 19: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Interpretation of (2)

• = increase in log-odds for a one unit increase in x

• Test of the hypothesis that =0 (Wald test)

• Interval testing

df) (1 Variance(

2β)

β2

Page 20: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Example

• Age (<55 and 55+ years) and risk of developing coronary heart disease (CD)

Page 21: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

جمع پسر دختر

183 124 59 دارد

5316 2874 2442 ندارد

5499 2998 2501 جمع

فراوانی مطلق ابتال به آسم بر حسب جنس در •1374دانش آموزان شهر زنجان

Page 22: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

پسر دختر

شيوع

Odds

لگاريتم Odds

Page 23: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters
Page 24: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

• Results of fitting Logistic Regression Model

Age 2.094 0.841- Age βαP-1

P ln 1

Page 25: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Interpretation of (1)

eP-1

P βxα

Page 26: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Multiple logistic regression

• More than one independent variable– Dichotomous, ordinal, nominal, continuous …

• Interpretation of i – Increase in log-odds for a one unit increase in xi with all

the other xis constant– Measures association between xi and log-odds adjusted

for all other xi

ii2211 xβ ... xβ xβαP-1

P ln

Page 27: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Multiple logistic regression

• Effect modification– Can be modelled by including interaction terms

1132211 xxβ xβ xβαP-1

P ln

Page 28: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Statistical testing

• Question– Does model including a given independent variable

provide more information about dependent variable than model without this variable?

• Three tests– Likelihood ratio statistic (LRS)

– Wald test

Page 29: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Fitting equation to the data

• Linear regression: Least squares

• Logistic regression: Maximum likelihood

• Likelihood function– Estimates parameters and with property that

likelihood (probability) of observed data is higher than for any other values

– Practically easier to work with log-likelihood

Page 30: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Likelihood ratio statistic

• Compares two nested models Log(odds) = + 1x1 + 2x2 + 3x3 + 4x4 (model 1)

Log(odds) = + 1x1 + 2x2 (model 2)

Page 31: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Example

0.2664) (SE 0.2614) (SE

Smk 0.7005 Exc 1.0047 0.7102

Smk β Exc βαP-1

P ln 21

P Probability for cardiac arrestExc 1= lack of exercise, 0 = exerciseSmk 1= smokers, 0= non-smokers

adapted from Kerr, Handbook of Public Health Methods, McGraw-Hill, 1998

Page 32: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

• Interactive effect between smoking and exercise?

• Product term 3 = -0.4604 (SE 0.5332)

Wald test = 0.75 (1df)

-2log(L) = 342.092 with interaction term = 342.836 without interaction term

LR statistic = 0.74 (1df), p = 0.39 No evidence of any interaction

Exc Smk β Smk β Exc βαP-1

P ln 321

Page 33: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Coding of variables (1)

• Dichotomous variables: yes = 1, no = 0

• Continuous variables– Increase in OR for a one unit change in exposure

variable

– Logistic model is multiplicative OR increases exponentially with x

» If OR = 2 for a one unit change in exposure and x increases from 2 to 5: OR = 2 x 2 x 2 = 23 = 8

– Verify that OR increases exponentially with x. When in doubt, treat as qualitative variable

Page 34: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Indicator variables: Type of tobacco

• Neutralises artificial hierarchy between classes in the variable "type of tobacco"

• No assumptions made

• 3 variables (3 df) in model using same reference

• OR for each type of tobacco adjusted for the others in reference to non-smoking

Page 35: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

) بعنوان يکي از LBWتولد با وزن پايين (•شاخصهاي مهم سالمتي پيامدي است که شرايط اقتصادي و بهداشتي تاثير زيادي بر روي آن دارد.

اين شرايط که به نوبه خود در مکانهاي مختلف متفاوتند باعث تنوع در الگوي مکاني رخداد تولد با

وزن پايين مي شوند. توجه به اين جنبه و نقش پراهميت مکان در تنوع پيامدهاي بهداشتي حيطه

اي است که در هر منطقه بايد جداگانه صورت گيرد تا مناطق پرخطر براي تخصيص مداخالت الزم شناسايي شوند. لذا اين بررسي با هدف

در روستاهاي LBWتعيين مناطق پرخطر شهرستان رشت انجام شد.

Page 36: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

روش كار:•براي جمع آوري داده هاي اين مطالعه، تولدهاي با •

گرم به تفکيک واحدهاي روستايي 2500وزن زير 1381 و 1380شهرستان رشت در فاصله زماني

از زيجهاي حياتي روستاها استخراج گرديد. براي تعيين توزيع مکاني تولدهاي با وزن پايين ابتدا

"نقشه هاي مسطح" براي ميزان تولدهاي با وزن پايين تهيه شد.

Page 37: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

كل شهرستان LBWتعداد روستاهاي درزنده 295 زايمان هاي تعداد و 5987مورد

موارد. تعداد آيا روستاهاي LBWاست درشماست انتظار حد از بيش زير

كد روستاتعداد تولد

زنده LBW8800 80 08806 32 28803 36 18815 36 18801 24 4

Page 38: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

)!()( xexP

Page 39: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

را LBWمحقق بر روي نقشه مناطق پر خطر •مشخص مي كند. حال بنظرش مناسب مي رسد كه ارتباط برخي ازشاخصهاي

بهداشتي و اقتصادي روستاها و تولد با وزن پايين را مورد سنجش قرار دهد. به همين دليل

(، دسترسي به 3/82ميزان باروري عمومي) درصد( و دسترسي به وسيله 96خانه بهداشت)

درصد( را براي ساكنين 52نقليه در خانواده)روستاهاي شهرستان مشخص مي كند.

Page 40: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Poisson Regression

•Poisson Regression Analysis is used when the outcome variable comprises counts, usually of rather rare events e.g. number of cases of cancer over a defined period in a cohort of subjects.

log (rate) = β0 + β1X1 + β2X2 + … +βnXn

log (event/person-time) = β0 + β1X1 + …

log (event) – log (person-time) = β0 + β1X1 + …

log (event) = log (person-time) + β0 + β1X1 + …

log (event) = β0* + β1X1 + …

Page 41: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

P value ضريب درصد95محدوده اطمينان شاخص

96/0 37/0- 35/0 008/0- نداشتن خانه بهداشت

52/0 17/0- 34/0 08/0 نداشتن آب لوله كشي

06/0 01/0- 48/0 23/0 عدم دسترسي به وسايل نقليه

01/0 85/0- 09/0- 47/0- نداشتن مركز بهداشتي درماني

60/0 22/0- 38/0 08/0 66-82 = GFR

05/0 42/0- 22/0 09/0 100-83= GFR

Page 42: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Reference

• Hosmer DW, Lemeshow S. Applied logistic regression.Wiley & Sons, New York, 1989

Page 43: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Example 1: Low Birth Weight Study

• 198 observations

• Low Birth Weigth [LBW] – 1= Birth weight < 2500g

– 0= Birth weight >= 2500g

• Age of mother in years

• Weight of mother in pounds [LWT]

• Race (1,2,3)

• Number of doctor’s visit in last trimester [FTV]

Page 44: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Example 2: Risk of death from bacterial meningitis according to

treatment• 161 observations

• Death (0,1)

• Treatment– 1=Chloramphenicol, 2=Ampicillin)

• Delay before treatment (onset, in days)

• Convulsions (1,0)

• Level of consciousness (1-3)

• Severity of dehydration (1-3)

• Age in years

• Pathogen– 1 Others, 2 HiB, 3 Streptococcus pneumoniae

Page 45: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

The logistic function (1)

0.0

0.2

0.4

0.6

0.8

1.0

Probability of disease

x

Page 46: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

The logistic function (2)

logit of P(y|x)

{

Page 47: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Fitting equation to the data

• Linear regression: Least squares

• Logistic regression: Maximum likelihood

• Likelihood function– Estimates parameters and with property that

likelihood (probability) of observed data is higher than for any other values

– Practically easier to work with log-likelihood

n

iiiii xyxylL

1

)(1ln)1()(ln)(ln)(

Page 48: Introduction to Logistic Regression. Content Simple and multiple linear regression Simple logistic regression –The logistic function –Estimation of parameters

Maximum likelihood

• Iterative computing– Choice of an arbitrary value for the coefficients (usually 0)

– Computing of log-likelihood

– Variation of coefficients’ values

– Reiteration until maximisation (plateau)

• Results– Maximum Likelihood Estimates (MLE) for and – Estimates of P(y) for a given value of x