# Logistic Regression

Post on 21-Jan-2016

Logistic Regression

- Linear regression: numerical response
- Logistic regression: binary categorical response, e.g. has the disease, or unaffected by the disease
- Goal: find the attributes that are associated with the onset of the disease
- Or: predict the probability of getting the disease, given a set of attributes

Theory

Fits the model:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

where p is the probability of the response (e.g. having the disease) and x_1, ..., x_k are the attributes.

Effectively a linear model for the log odds.
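The link between probability and log odds can be sketched in a few lines of Python. This is a minimal illustration, not a fitting routine; the function names `logit`, `inv_logit` and `predict_prob` are chosen here for illustration and do not come from the slides.

```python
import math

def logit(p):
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Probability corresponding to a log-odds value eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict_prob(intercept, coefs, x):
    """Probability implied by a linear model on the log-odds scale."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return inv_logit(eta)

# Log odds of 0 correspond to a probability of one half.
print(inv_logit(0.0))  # 0.5
```

Because the model is linear in the log odds, a one-unit change in an attribute adds its coefficient to the log odds, which is why coefficients are later read as odds ratios after exponentiation.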

Theory: Lung Cancer

- An age effect?
- Associated with smoking?

Logistic Regression

- Assess whether a variable is significantly associated with the response
- Quantify the association, in terms of the odds ratio

Consider the equation

$$\log\left(\frac{p}{1-p}\right) = 0.03 + 1.4\,\text{Smoke} + 0.02\,\text{Gender} + 0.01\,(\text{Age} - 20)$$

where p = probability of getting lung cancer, with a baseline of a non-smoking female of age 20 (Smoke = 0, Gender = 0, Age = 20).

Keep everything else constant to interpret the effect of each variable.

- A non-smoking male of age 20 has odds of getting lung cancer exp(0.02) = 1.02 times those of a non-smoking female of age 20
- A smoking female of age 20 has odds exp(1.4) = 4.06 times those of a non-smoking female of age 20
- A non-smoking female of age 50 has odds exp(30 × 0.01) = 1.35 times those of a non-smoking female of age 20
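These single-variable odds ratios can be reproduced with a short Python sketch. The coefficients are copied from the equation above; the dictionary keys and the helper name `odds_ratio` are illustrative, and the encodings Smoke = 1 for a smoker and Gender = 1 for a male are assumed from the interpretations given.

```python
import math

# Coefficients taken from the slide equation (log-odds scale).
coef = {"Smoke": 1.4, "Gender": 0.02, "Age": 0.01}

def odds_ratio(beta, delta=1.0):
    """Odds ratio for a change of `delta` units in one variable,
    holding all other variables constant."""
    return math.exp(beta * delta)

or_male = odds_ratio(coef["Gender"])          # male vs female
or_smoker = odds_ratio(coef["Smoke"])         # smoker vs non-smoker
or_age50 = odds_ratio(coef["Age"], delta=30)  # age 50 vs age 20

print(round(or_male, 2), round(or_smoker, 2), round(or_age50, 2))
# 1.02 4.06 1.35
```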

Combining the effects

- A smoking male of age 50 has odds exp(1.4 + 0.02 + 0.01 × 30) = 5.58 times those of a non-smoking female of age 20
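The combined effect can be checked the same way. The sketch below also converts the log odds back to probabilities, to show that the factor 5.58 is a ratio of odds rather than a ratio of probabilities. The function names are illustrative, and the encodings (smoke = 1 for a smoker, gender = 1 for a male) are assumptions inferred from the slide's interpretations.

```python
import math

def log_odds(smoke, gender, age):
    """Linear predictor from the slide equation
    (assumed encodings: smoke=1 smoker, gender=1 male)."""
    return 0.03 + 1.4 * smoke + 0.02 * gender + 0.01 * (age - 20)

def prob(eta):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

eta_base = log_odds(0, 0, 20)  # non-smoking female of age 20
eta_case = log_odds(1, 1, 50)  # smoking male of age 50

combined_or = math.exp(eta_case - eta_base)
p_base, p_case = prob(eta_base), prob(eta_case)

print(round(combined_or, 2))  # 5.58
# The same ratio is recovered from the odds of the two probabilities.
print(round(odds(p_case) / odds(p_base), 2))
# The ratio of the probabilities themselves is smaller than the odds ratio.
print(round(p_case / p_base, 2))
```

Effects add on the log-odds scale, so they multiply on the odds scale: exp(1.4) × exp(0.02) × exp(0.3) equals exp(1.72).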

Note the encodings!

Interpret based on encodings
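To see why the encoding matters, the sketch below recodes gender in the lung cancer equation. It assumes the original equation uses Gender = 1 for a male (inferred from the interpretation of exp(0.02)); the alternative coding with 1 for a female is hypothetical. Under the recoding the coefficient changes sign and the intercept absorbs the difference, while every predicted log odds stays the same.

```python
import math

def log_odds_male_coded(smoke, male, age):
    """Slide equation, assuming Gender = 1 means male."""
    return 0.03 + 1.4 * smoke + 0.02 * male + 0.01 * (age - 20)

def log_odds_female_coded(smoke, female, age):
    """Same model with a hypothetical recoding, Gender = 1 means female.
    The baseline is now a male, so the intercept becomes 0.03 + 0.02
    and the gender coefficient flips sign."""
    return 0.05 + 1.4 * smoke - 0.02 * female + 0.01 * (age - 20)

# Both codings give the same log odds for every person.
for smoke in (0, 1):
    for male in (0, 1):
        for age in (20, 50):
            a = log_odds_male_coded(smoke, male, age)
            b = log_odds_female_coded(smoke, 1 - male, age)
            assert abs(a - b) < 1e-12

# The odds ratio is inverted: female vs male instead of male vs female.
print(round(math.exp(-0.02), 2))  # 0.98
```

The fitted model is unchanged, but a reader who assumes the wrong coding would report the odds ratio for the wrong group.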

Summary

- Large suite of statistical tools for analysing data.
- Important to choose the appropriate tools for the kind of data available.
- Most statistical tests require particular assumptions to be valid; these assumptions need to be checked.