Linear Regression with R (2)
TRANSCRIPT

[Slide 1]
Linear Regression with R
2: Model selection
2012-12-10 @HSPH
Kazuki Yoshida, M.D., MPH-CLE student
FREEDOM TO KNOW

[Slide 2]
Group Website is at:
http://rpubs.com/kaz_yos/useR_at_HSPH
[Slide 3]
Previously in this group
- Introduction
- Reading Data into R (1)
- Reading Data into R (2)
- Descriptive, continuous
- Descriptive, categorical
- Deducer
- Graphics
- Groupwise, continuous
- Linear regression
[Slide 4]
Menu
- Linear regression: Model selection
[Slide 5]
Ingredients
- Selection methods
  - step()
  - drop1()
  - add1()
  - leaps::regsubsets()
[Slide 6]
Open RStudio
[Slide 7]
Open the saved script that we created last time. See also the Linear Regression with R (1) slides.
[Slide 8]
Create full & null models

lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm,
              data = lbw)
lm.null <- lm(bwt ~ 1, data = lbw)  # Intercept-only
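The `lbw` data frame is assumed by these slides but never built in them; it matches the Hosmer-Lemeshow low birth weight data shipped with R as `MASS::birthwt`. A plausible reconstruction follows; the derived variable names (`ftv.cat`, `race.cat`, `preterm`) and the chosen cut points are assumptions inferred from the slide code, not taken from the slides themselves:

```r
## Sketch: rebuild an lbw-like data frame from MASS::birthwt.
## Column names and codings below are assumptions based on the slides.
library(MASS)

lbw <- within(birthwt, {
  ## First-trimester physician visits, categorized as 0 / 1 / 2+
  ftv.cat  <- cut(ftv, breaks = c(-Inf, 0, 1, Inf), labels = c("0", "1", "2+"))
  ## Race recoded as a labeled factor
  race.cat <- factor(race, levels = 1:3, labels = c("white", "black", "other"))
  ## Any history of premature labor
  preterm  <- factor(ifelse(ptl >= 1, "yes", "no"), levels = c("no", "yes"))
  ## Binary covariates as factors
  smoke    <- factor(smoke, labels = c("no", "yes"))
  ht       <- factor(ht, labels = c("no", "yes"))
  ui       <- factor(ui, labels = c("no", "yes"))
})

## The slides' full model then fits directly:
lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm,
              data = lbw)
```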
[Slide 9]
Compare two models

anova(lm.full, lm.null)
Model 1 = lm.full; Model 2 = lm.null
[Slide 10]
Partial F-test
The anova() table lists both models, their residual degrees of freedom, and their residual sums of squares, along with the difference in residual SS; here the partial F-test is significant.
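The partial F statistic that anova() reports can be written in terms of the two residual sums of squares (subscript 0 for the null/smaller model, 1 for the full model, df denoting residual degrees of freedom); this is the standard textbook form, not shown on the slide:

```latex
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1) / (\mathrm{df}_0 - \mathrm{df}_1)}{\mathrm{RSS}_1 / \mathrm{df}_1}
```

Under the null hypothesis that the extra coefficients are all zero, F follows an F distribution with (df0 − df1, df1) degrees of freedom.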
[Slide 11]
Backward elimination
lm.step.bw <- step(lm.full, direction = "backward")
Final model object
Specify full model
[Slide 12]
Initial AIC for full model
Removing ftv.cat makes AIC smallest
Removing age makes AIC smallest
Doing nothing makes AIC smallest
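The AIC bookkeeping that step() performs at each stage can be inspected directly with extractAIC() and drop1(); a small sketch on the built-in mtcars data (the lbw data from the slides is not used here):

```r
## step() mechanics: at each step it computes the AIC of every candidate
## deletion and keeps the change that yields the smallest AIC.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

## AIC (up to an additive constant) of the current model:
## returns c(equivalent df, AIC)
extractAIC(fit)

## AIC of each single-term deletion; step(fit, direction = "backward")
## removes the term whose row shows the smallest AIC, and stops once
## the "<none>" row is smallest.
drop1(fit)
```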
[Slide 13]
Forward selection

lm.step.fw <- step(lm.null,
                   scope = ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm,
                   direction = "forward")
Final model object
Specify null model
formula for possible variables
[Slide 14]
Initial AIC for null model
Adding ui makes AIC smallest
Adding race.cat makes AIC smallest
Adding smoke makes AIC smallest
Still goes on ...
[Slide 15]
Stepwise selection/elimination

lm.step.both <- step(lm.null,
                     scope = ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm,
                     direction = "both")
Final model object
Specify null model
formula for possible variables
[Slide 16]
Initial AIC for null model
Adding ui makes AIC smallest
Adding race.cat makes AIC smallest
Adding smoke makes AIC smallest
Still goes on ...
Removing is also considered at each step
[Slide 17]
F-test using drop1()

## age is the least significant by partial F-test
drop1(lm.full, test = "F")

## After elimination, ftv.cat is the least significant
drop1(update(lm.full, ~ . -age), test = "F")

## After elimination, preterm is the least significant, at p = 0.12
drop1(update(lm.full, ~ . -age -ftv.cat), test = "F")

## After elimination, all variables are significant at p < 0.1
drop1(update(lm.full, ~ . -age -ftv.cat -preterm), test = "F")

## Show summary for final model
summary(update(lm.full, ~ . -age -ftv.cat -preterm))
[Slide 18]
Updating models

## Remove age from full model
lm.age.less <- update(lm.full, ~ . -age)

## Add ui to null model
lm.ui.only <- update(lm.null, ~ . +ui)

"~ . -age" means all current variables (.) minus age; "~ . +ui" means all current variables (.) plus ui.
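How update() rewrites a formula can be checked on any fitted model; a small sketch on the built-in mtcars data:

```r
## update() re-fits a model with a modified formula; "." stands for
## the formula as it currently is.
fit      <- lm(mpg ~ wt + hp, data = mtcars)
fit.less <- update(fit, ~ . - hp)    # drops hp:  mpg ~ wt
fit.more <- update(fit, ~ . + qsec)  # adds qsec: mpg ~ wt + hp + qsec

formula(fit.less)  # mpg ~ wt
formula(fit.more)  # mpg ~ wt + hp + qsec
```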
[Slide 19]
age least significant
ftv.cat least significant
remove age, and test
test full model
remove age, ftv.cat
F-test comparing age-in model to age-out model
[Slide 20]
F-test using add1()

## ui is the most significant variable
add1(lm.null, scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")

## After inclusion, race.cat is the most significant
add1(update(lm.null, ~ . +ui), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")

## After inclusion, smoke is the most significant
add1(update(lm.null, ~ . +ui +race.cat), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")

## After inclusion, ht is the most significant
add1(update(lm.null, ~ . +ui +race.cat +smoke), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
...
[Slide 21]
ui most significant
race.cat most significant
add ui, and test
test null model
add ui and race.cat
F-test comparing ui-out model to ui-in model
[Slide 22]
All-subset regression using the leaps package
[Slide 23]
library(leaps)
regsubsets.out <-
    regsubsets(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm,
               data = lbw, nbest = 1, nvmax = NULL,
               force.in = NULL, force.out = NULL,
               method = "exhaustive")
summary(regsubsets.out)
[Slide 24]
Same code, annotated:
- regsubsets.out: result object
- the formula: full model
- nbest: how many best models per size?
- nvmax: max model size
- force.in / force.out: forced variables
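For a runnable variant, the same call pattern works on the built-in mtcars data, assuming the leaps package is installed (the lbw data from the slides is not used here):

```r
## All-subset search over five mtcars predictors, keeping the single
## best model of each size.
library(leaps)

regsubsets.mt <- regsubsets(mpg ~ wt + hp + qsec + drat + am,
                            data = mtcars, nbest = 1,
                            method = "exhaustive")
rs <- summary(regsubsets.mt)

## One row per model size; TRUE marks the variables in the best model
rs$which
## Adjusted R^2 of the best model of each size
rs$adjr2
```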
[Slide 25]
Forced variables
Best 1-predictor model
Best 7-predictor model
Best 10-predictor model
Variable combination
[Slide 26]
plot(regsubsets.out, scale = "adjr2", main = "Adjusted R^2")
~ lwt + smoke + ht + ui + race.cat + preterm
~ ui
~ smoke + ht + ui + race
the higher the better
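For reference, the adjusted R² used for this ranking is the usual penalized version, with n observations and p predictors (standard formula, not shown on the slide):

```latex
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

Unlike raw R², it can decrease when a useless predictor is added, which is what makes it usable for comparing models of different sizes.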
[Slide 27]
library(car)
subsets(regsubsets.out, statistic = "adjr2", legend = FALSE,
        min.size = 5, main = "Adjusted R^2")
~ lwt + smoke + ht + ui + race.cat + preterm
[Slide 28]
subsets(regsubsets.out, statistic = "cp", legend = FALSE,
        min.size = 5, main = "Mallows' Cp")
~ lwt + smoke + ht + ui + race.cat + preterm
First model for which Mallows' Cp is less than the number of regressors + 1
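Mallows' Cp compares each subset model's residual sum of squares against the full model's error variance estimate; with p parameters (regressors plus intercept) in the subset model, the standard definition is:

```latex
C_p = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2_{\mathrm{full}}} - n + 2p
```

An adequate model has Cp close to p, which motivates the rule of thumb on this slide: prefer the first model whose Cp drops below the number of regressors + 1.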
[Slide 29]