  • BIOSTATS 640 – Spring 2017 5. Logistic Regression Stata Illustration

    ….1. Teaching\stata\stata version 14\Stata for Logistic Regression.docx Page 1 of 30

    5. Logistic Regression Illustration – Stata version 14

    March 2017

    1. Tip: “1/2” Variables versus “0/1” Variables
    2. Tip: How to Create Quartile Groupings of Continuous Variable
    3. Fit a Logistic Regression Model
    4. Likelihood Ratio Test for 2 “Hierarchical” Models
    5. Regression Diagnostics for Logistic Regression: Numerical
       a. Numerical Measures of Fit Using fitstat
       b. Test of Model Adequacy Using linktest
       c. Test of Overall Goodness-of-Fit Using lfit
    6. Regression Diagnostics for Logistic Regression: Graphical
       a. Plot of ROC Curve Using lroc
       b. Plot of Standardized Residuals versus Observation Number
       c. Plot of Influential Observations Using Cook’s Distances
    7. Tip: Save Your Commands to a DO File for Later Use

    Preliminary: Download the Stata data set illeetvilaine.dta. You can read it directly from the internet or download it from the course website.

    (a) In Stata, read the data directly from the internet with the command use:
        . use "http://people.umass.edu/biep640w/datasets/illeetvilaine.dta", clear
    (b) From the course website, right click to download. Afterwards, in Stata, use FILE > OPEN.
        See http://people.umass.edu/biep640w/webpages/demonstrations.html

    1. Tip - “1/2” Variables versus “0/1” Variables

    Why the fuss? Answer: Sometimes the arrangement of rows and columns in a 2x2 table is not what you expected, because different Stata commands order them differently.

    tab2
        Stata orders the rows and columns according to the numeric values of the row and column variables.
        For a 0/1 variable, row 1 is the value “0” row and row 2 is the value “1” row.
        For a 1/2 variable, row 1 is the value “1” row and row 2 is the value “2” row.
        Columns are ordered similarly.

    cc, cs
        Stata assumes that you are using 0/1 variables here, with 1=event and 0=non-event.
        Stata orders the rows and columns according to event, with event being the first row (or column).
        Thus, row 1 is the value “1=event” row and row 2 is the value “0=non-event” row.
        Columns are ordered similarly.

    Ille-et-Vilaine Data: Illustration

    Suppose we are interested in the 2x2 table cross-classification of heavy smoking (30+ gm/day versus other) and case status (esophageal cancer case versus control):

                                  Disease (Esophageal Cancer)
    Exposure (Heavy Smoking)        Yes        No     Total
    Yes (30+ gm/day)                 31        51        82
    No                              169       724       893
    Total                           200       775       975

    Preliminary: Introduction to the command recode

    Use recode to re-set the values of a variable. This is especially handy when creating a new variable. You can recode a single old value to a new value, or you can recode a whole range of values to a new value. For example:

    . use "http://people.umass.edu/biep640w/datasets/illeetvilaine.dta", clear
    . * recode variablename (oldvalue=newvalue) (rangelower/rangeupper=newvalue) etc.
    . generate age12=age
    . recode age12 (18=1) (19/max=2)
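    The generate-then-recode pattern above can also be written as a single step. The following is a minimal sketch, not part of the original handout: it assumes the data set contains tobgp, the grouped tobacco variable coded 1 to 4 that is used later in this section, and heavy01 is a hypothetical variable name chosen only for illustration.

```stata
* Sketch only: heavy01 is a hypothetical variable name, not one from the handout.
use "http://people.umass.edu/biep640w/datasets/illeetvilaine.dta", clear

* With generate(), recode leaves tobgp untouched and writes the result to a new variable.
* 1/3 is a range rule; the quoted strings become value labels on the new variable.
recode tobgp (1/3 = 0 "other") (4 = 1 "heavy"), generate(heavy01)

* Cross-check the new variable against the original.
tabulate tobgp heavy01, missing
```

    Writing to a new variable this way keeps the original coding available for checking, which is the same safeguard the separate generate step provides.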

    . * Create "1/2" variables when you want to use command tab2
    . * "1/2" measure of heavy smoking (1=30+ gm/day versus 2=other)
    . * Exposure will be heavy smoking defined as tobgp=4 (30+ gm/day)
    . generate exposure12=tobgp
    . recode exposure12 (1=2) (2=2) (3=2) (4=1)
    (exposure12: 739 changes made)
    . label define exposure12f 2 "other" 1 "heavy"
    . label values exposure12 exposure12f

    . * "1/2" variable for case status (1=case versus 2=other)
    . generate case12=case
    . recode case12 (0=2)
    (case12: 775 changes made)
    . label define case12f 2 "control" 1 "case"
    . label values case12 case12f

    . * Check variable creations
    . tab2 tobgp exposure12

    -> tabulation of tobgp by exposure12

       Grouped |
       tobacco |      exposure12
       consum. |     heavy      other |     Total
    -----------+----------------------+----------
    0-9 gm/day |         0        526 |       526
         10-19 |         0        236 |       236
         20-29 |         0        131 |       131
           30+ |        82          0 |        82
    -----------+----------------------+----------
         Total |        82        893 |       975

    . tab2 case case12

    -> tabulation of case by case12

          Case |
        status |
      (1=case, |        case12
    0=control) |      case    control |     Total
    -----------+----------------------+----------
             0 |         0        775 |       775
             1 |       200          0 |       200
    -----------+----------------------+----------
         Total |       200        775 |       975

    . * Create "0/1" variables when you want to use commands cc, cs
    . * "0/1" measure of heavy smoking (1=30+ gm/day versus 0=other)
    . * Exposure will be heavy smoking defined as tobgp=4 (30+ gm/day)
    . generate exposure01=tobgp
    . recode exposure01 (1=0) (2=0) (3=0) (4=1)
    (exposure01: 975 changes made)
    . label define exposure01f 0 "other" 1 "heavy"
    . label values exposure01 exposure01f

    . * "0/1" variable for case status (1=case versus 0=other)
    . * This already exists as the variable case

    . * Check variable creations
    . tab2 tobgp exposure01

    -> tabulation of tobgp by exposure01

       Grouped |
       tobacco |      exposure01
       consum. |     other      heavy |     Total
    -----------+----------------------+----------
    0-9 gm/day |       526          0 |       526
         10-19 |       236          0 |       236
         20-29 |       131          0 |       131
           30+ |         0         82 |        82
    -----------+----------------------+----------
         Total |       893         82 |       975

    . * The command cc works fine with 0/1 variables
    . cc case exposure01
                                                             Proportion
                     |   Exposed   Unexposed  |      Total     Exposed
    -----------------+------------------------+------------------------
               Cases |        31         169  |        200       0.1550
            Controls |        51         724  |        775       0.0658
    -----------------+------------------------+------------------------
               Total |        82         893  |        975       0.0841
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
          Odds ratio |         2.604014       |    1.557944      4.2894 (exact)
     Attr. frac. ex. |         .6159775       |    .3581283    .7668672 (exact)
     Attr. frac. pop |         .0954765       |
                     +-------------------------------------------------
                                   chi2(1) =    16.42  Pr>chi2 = 0.0001

    The commands cc and cs are designed for epidemiological analyses of 2x2 tables, where the convention is to put cases in row 1 (controls in row 2) and the exposed in column 1 (non-exposed in column 2).
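    As a check on the cc point estimates, the odds ratio and the two attributable fractions can be recomputed by hand from the four cell counts. The working below is a sketch that assumes the standard case-control formulas; the symbols a, b, c, d are introduced here for illustration and denote exposed cases (31), unexposed cases (169), exposed controls (51), and unexposed controls (724).

```latex
\[
\widehat{OR} = \frac{a\,d}{b\,c}
             = \frac{31 \times 724}{169 \times 51}
             = \frac{22444}{8619}
             \approx 2.604
\]
\[
AF_{e} = \frac{\widehat{OR} - 1}{\widehat{OR}}
       \approx \frac{1.604}{2.604}
       \approx 0.616,
\qquad
AF_{p} = AF_{e} \times \frac{a}{a + b}
       \approx 0.616 \times 0.155
       \approx 0.0955
\]
```

    These agree, to the digits shown, with the values 2.604014, .6159775, and .0954765 in the cc output.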

    . * tab2 with 0/1 variables
    . tab2 exposure01 case

    -> tabulation of exposure01 by case

               | Case status (1=case,
               |      0=control)
    exposure01 |         0          1 |     Total
    -----------+----------------------+----------
         other |       724        169 |       893
         heavy |        51         31 |        82
    -----------+----------------------+----------
         Total |       775