Logistic Regression

Upload: samuel-sims

Post on 21-Jan-2016


Page 1: Logistic Regression. Linear regression – numerical response Logistic regression – binary categorical response eg. has the disease, or unaffected by the

Logistic Regression


Theory

• Linear regression – numerical response

• Logistic regression – binary categorical response

• e.g. has the disease, or unaffected by the disease

• Interested to find the attributes that are associated with the onset of the disease

• Or interested to predict the probability of getting the disease, given a set of attributes

• Fits the model:

$$\log\frac{p}{1-p} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots$$

• Effectively a linear model for log odds


Lung Cancer

• An age effect?

• Associated with smoking?


Logistic Regression

• Assess whether a variable is significantly associated with the response

• Quantify the association, in terms of odds ratio

• Consider the equation

$$\log\frac{p}{1-p} = 0.03 + 1.4\,\text{Smoke} + 0.02\,\text{Gender} + 0.01\,(\text{Age} - 20)$$

where p = probability of getting lung cancer, with baseline of a non-smoking female of age 20

• Keep everything else constant to interpret the effects of each variable
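To make the role of the baseline concrete, the equation can be evaluated directly. This is a sketch only: the function names are hypothetical, and the 0/1 encoding (Smoke = 1 for a smoker, Gender = 1 for male) is an assumption inferred from the stated baseline of a non-smoking female of age 20.

```python
import math

def lung_cancer_log_odds(smoke, gender, age):
    """Log odds from the slide's equation.

    Assumed encoding (inferred from the stated baseline, not given explicitly):
    smoke = 1 for a smoker, 0 otherwise; gender = 1 for male, 0 for female.
    """
    return 0.03 + 1.4 * smoke + 0.02 * gender + 0.01 * (age - 20)

def probability(log_odds):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# At the baseline every term except the intercept vanishes.
baseline = lung_cancer_log_odds(smoke=0, gender=0, age=20)
print(baseline)
print(round(probability(baseline), 4))
```

At the baseline the log odds equal the intercept, so each remaining coefficient measures a shift away from that reference profile.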


$$\log\frac{p}{1-p} = 0.03 + 1.4\,\text{Smoke} + 0.02\,\text{Gender} + 0.01\,(\text{Age} - 20)$$

• Non-smoking male of age 20 has exp(0.02) = 1.02 times the odds of getting lung cancer of a non-smoking female of age 20

• Smoking female of age 20 has exp(1.4) = 4.06 times the odds of a non-smoking female of age 20

• Non-smoking female of age 50 has exp(30 × 0.01) = 1.35 times the odds of a non-smoking female of age 20

Combining the effects

• Smoking male of age 50 has exp(1.4 + 0.02 + 0.01 × 30) = 5.58 times the odds of a non-smoking female of age 20
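The figures on this slide can be checked numerically: exponentiating a difference in log odds gives an odds ratio. This is a minimal sketch using the slide's coefficients. The dictionary keys and the `odds_ratio` helper are hypothetical names introduced here.

```python
import math

# Coefficients from the slide's equation (log-odds scale).
coef = {"smoke": 1.4, "gender": 0.02, "age_per_year": 0.01}

def odds_ratio(d_smoke=0, d_gender=0, d_age=0):
    """Odds ratio for a change in attributes relative to a reference profile."""
    delta = (coef["smoke"] * d_smoke
             + coef["gender"] * d_gender
             + coef["age_per_year"] * d_age)
    return math.exp(delta)

print(round(odds_ratio(d_gender=1), 2))                       # 1.02
print(round(odds_ratio(d_smoke=1), 2))                        # 4.06
print(round(odds_ratio(d_age=30), 2))                         # 1.35
print(round(odds_ratio(d_smoke=1, d_gender=1, d_age=30), 2))  # 5.58
```

Effects add on the log-odds scale, so they multiply on the odds scale: the combined value 5.58 is the product of the three individual odds ratios, up to rounding.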


Note the encodings!


Interpret based on encodings
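The following sketch shows why the encoding matters, reusing the lung cancer equation from earlier in the slides. The first function assumes Gender = 1 means male. The second is a hypothetical recoding with 1 meaning female, introduced here for illustration and not taken from the slides. Both describe the same model, but the gender coefficient changes sign and the intercept shifts.

```python
import math

def log_odds_male_coded(smoke, male, age):
    # Slide's equation, assuming Gender = 1 means male.
    return 0.03 + 1.4 * smoke + 0.02 * male + 0.01 * (age - 20)

def log_odds_female_coded(smoke, female, age):
    # Same model under a hypothetical recoding, Gender = 1 meaning female:
    # the intercept absorbs the male effect and the gender coefficient flips sign.
    return 0.05 + 1.4 * smoke - 0.02 * female + 0.01 * (age - 20)

# Both encodings give identical log odds for every person.
for smoke in (0, 1):
    for male in (0, 1):
        a = log_odds_male_coded(smoke, male, age=50)
        b = log_odds_female_coded(smoke, 1 - male, age=50)
        assert math.isclose(a, b)

# The reported odds ratio for gender depends on which level is coded as 1.
print(round(math.exp(0.02), 2), round(math.exp(-0.02), 2))  # 1.02 0.98
```

An odds ratio of 1.02 for male versus female and one of 0.98 for female versus male express the same association, so software output has to be read against the encoding actually used.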


Summary

• Large suite of statistical tools for analysing data

• Important to choose the appropriate tools for the kind of data available.

• Most statistical tests require particular assumptions to be valid – need to check these assumptions.