Multiple Regression

Page 1: Multiple Regression

Multiple Regression

SPH 247

Statistical Analysis of Laboratory Data

April 23, 2010

Page 2: Multiple Regression

Cystic Fibrosis Data

Cystic fibrosis lung function data: lung function data for cystic fibrosis patients (7-23 years old)

age     a numeric vector. Age in years.
sex     a numeric vector code. 0: male, 1: female.
height  a numeric vector. Height (cm).
weight  a numeric vector. Weight (kg).
bmp     a numeric vector. Body mass (% of normal).
fev1    a numeric vector. Forced expiratory volume.
rv      a numeric vector. Residual volume.
frc     a numeric vector. Functional residual capacity.
tlc     a numeric vector. Total lung capacity.
pemax   a numeric vector. Maximum expiratory pressure.

Page 3: Multiple Regression

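cf <- read.csv("cystfibr.csv")   # load the cystic fibrosis data
pairs(cf)                        # scatterplot matrix of all variables
attach(cf)                       # make the columns visible by name
# Full model: pemax regressed on all nine predictors
cf.lm <- lm(pemax ~ age+sex+height+weight+bmp+fev1+rv+frc+tlc)
print(summary(cf.lm))            # coefficient table and overall fit
print(anova(cf.lm))              # sequential ANOVA
print(drop1(cf.lm,test="F"))     # F-test for deleting each term in turn
plot(cf.lm)                      # diagnostic plots
step(cf.lm)                      # stepwise selection by AIC
detach(cf)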

Page 4: Multiple Regression


Page 5: Multiple Regression

> source("cystfibr.r")
> cf.lm <- lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc)
> print(summary(cf.lm))
…

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 176.0582   225.8912   0.779    0.448
age          -2.5420     4.8017  -0.529    0.604
sex          -3.7368    15.4598  -0.242    0.812
height       -0.4463     0.9034  -0.494    0.628
weight        2.9928     2.0080   1.490    0.157
bmp          -1.7449     1.1552  -1.510    0.152
fev1          1.0807     1.0809   1.000    0.333
rv            0.1970     0.1962   1.004    0.331
frc          -0.3084     0.4924  -0.626    0.540
tlc           0.1886     0.4997   0.377    0.711

Residual standard error: 25.47 on 15 degrees of freedom
Multiple R-Squared: 0.6373, Adjusted R-squared: 0.4197
F-statistic: 2.929 on 9 and 15 DF, p-value: 0.03195
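The summary lines at the bottom of this output are linked by simple identities, which gives a quick sanity check. The sketch below recomputes the adjusted R-squared and the overall F-statistic from the reported R-squared, taking n = 25 observations (15 residual degrees of freedom plus 10 estimated coefficients) and 9 predictors. The variable names are illustrative only and do not come from the slides.

```r
# Values as reported in the summary() output above
r2 <- 0.6373   # multiple R-squared
n  <- 25       # observations: 15 residual df + 10 coefficients
p  <- 9        # predictors, excluding the intercept

# Adjusted R-squared penalizes for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic for H0: all nine slopes are zero
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_val  <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

print(round(c(adj_r2 = adj_r2, F = f_stat, p = p_val), 4))
```

The printed values should agree with the reported 0.4197, 2.929, and 0.03195 up to the rounding of the inputs.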

Page 6: Multiple Regression

> print(anova(cf.lm))
Analysis of Variance Table

Response: pemax
          Df  Sum Sq Mean Sq F value   Pr(>F)
age        1 10098.5 10098.5 15.5661 0.001296 **
sex        1   955.4   955.4  1.4727 0.243680
height     1   155.0   155.0  0.2389 0.632089
weight     1   632.3   632.3  0.9747 0.339170
bmp        1  2862.2  2862.2  4.4119 0.053010 .
fev1       1  1549.1  1549.1  2.3878 0.143120
rv         1   561.9   561.9  0.8662 0.366757
frc        1   194.6   194.6  0.2999 0.592007
tlc        1    92.4    92.4  0.1424 0.711160
Residuals 15  9731.2   648.7
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Performs sequential ANOVA: each term is tested after only the terms listed above it, so the results depend on the order of the terms.
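Because the table is sequential, age looks highly significant here (F = 15.57) only because it enters first; in the single-term-deletion table for the same model its F value is 0.28. The order dependence can be seen in a minimal sketch on simulated data. The data, seed, and variable names below are assumptions made for illustration and are not the cystic fibrosis data.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)     # x2 is strongly correlated with x1
y  <- 1 + x1 + x2 + rnorm(n)

a12 <- anova(lm(y ~ x1 + x2))     # x1 entered first
a21 <- anova(lm(y ~ x2 + x1))     # x2 entered first

# The sequential sum of squares for x1 depends on its position in the formula
print(a12["x1", "Sum Sq"])
print(a21["x1", "Sum Sq"])

# The residual sum of squares is the same either way
print(c(a12["Residuals", "Sum Sq"], a21["Residuals", "Sum Sq"]))
```

Entered first, x1 is credited with the variation it shares with x2; entered second, it is credited only with what x2 cannot explain.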

Page 7: Multiple Regression

> print(drop1(cf.lm, test = "F"))

Single term deletions

Model:
pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc
       Df Sum of Sq     RSS   AIC F value  Pr(F)
<none>               9731.2 169.1
age     1     181.8  9913.1 167.6  0.2803 0.6043
sex     1      37.9  9769.2 167.2  0.0584 0.8123
height  1     158.3  9889.6 167.5  0.2440 0.6285
weight  1    1441.2 11172.5 170.6  2.2215 0.1568
bmp     1    1480.1 11211.4 170.6  2.2815 0.1517
fev1    1     648.4 10379.7 168.7  0.9995 0.3333
rv      1     653.8 10385.0 168.7  1.0077 0.3314
frc     1     254.6  9985.8 167.8  0.3924 0.5405
tlc     1      92.4  9823.7 167.3  0.1424 0.7112

Performs Type III ANOVA: each term is tested by deleting it from the full model, so the order of the terms does not matter.
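Each F value in this table is the increase in RSS from deleting the term, divided by the residual mean square of the full model. The sketch below recomputes three of them from the numbers in the table and checks that, for a single-degree-of-freedom term, F matches the square of the t value reported by summary(). The variable names are illustrative assumptions.

```r
# Values as reported in the drop1() table above
rss_full <- 9731.2                       # RSS of the full model (<none> row)
df_resid <- 15                           # residual degrees of freedom
ss_drop  <- c(age = 181.8, weight = 1441.2, bmp = 1480.1)  # "Sum of Sq" column

# Partial F-test: increase in RSS from deleting one term,
# divided by the full model's residual mean square
f_drop <- ss_drop / (rss_full / df_resid)
p_drop <- pf(f_drop, 1, df_resid, lower.tail = FALSE)
print(round(rbind(F = f_drop, p = p_drop), 4))

# For a single-df term, F equals the square of the t value in summary()
t_vals <- c(age = -0.529, weight = 1.490, bmp = -1.510)
print(round(t_vals^2, 3))
```

Small differences between the two printed rows of F values come only from the rounding of the t values.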

Page 8: Multiple Regression

[Figure: Residuals vs Fitted, for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x-axis: Fitted values; y-axis: Residuals; labeled observations: 16, 21, 24.]

Page 9: Multiple Regression

[Figure: Normal Q-Q, for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x-axis: Theoretical Quantiles; y-axis: Standardized residuals; labeled observations: 14, 16, 24.]

Page 10: Multiple Regression

[Figure: Scale-Location, for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x-axis: Fitted values; y-axis: square root of |Standardized residuals|; labeled observations: 14, 16, 24.]

Page 11: Multiple Regression

[Figure: Residuals vs Leverage, for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x-axis: Leverage; y-axis: Standardized residuals; Cook's distance contours at 0.5; labeled observations: 14, 16, 24.]

Page 12: Multiple Regression

> step(cf.lm)
Start: AIC=169.11
pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc

         Df Sum of Sq     RSS   AIC
- sex     1      37.9  9769.2 167.2
- tlc     1      92.4  9823.7 167.3
- height  1     158.3  9889.6 167.5
- age     1     181.8  9913.1 167.6
- frc     1     254.6  9985.8 167.8
- fev1    1     648.4 10379.7 168.7
- rv      1     653.8 10385.0 168.7
<none>                 9731.2 169.1
- weight  1    1441.2 11172.5 170.6
- bmp     1    1480.1 11211.4 170.6

Step: AIC=167.2
pemax ~ age + height + weight + bmp + fev1 + rv + frc + tlc

……………

Page 13: Multiple Regression

Step: AIC=160.66
pemax ~ weight + bmp + fev1 + rv

         Df Sum of Sq     RSS   AIC
<none>                10354.6 160.7
- rv      1    1183.6 11538.2 161.4
- bmp     1    3072.6 13427.2 165.2
- fev1    1    3717.1 14071.7 166.3
- weight  1   10930.2 21284.8 176.7

Call:
lm(formula = pemax ~ weight + bmp + fev1 + rv)

Coefficients:
(Intercept)       weight          bmp         fev1           rv
    63.9467       1.7489      -1.3772       1.5477       0.1257
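The AIC values that step() prints for a linear model are, up to an additive constant, n log(RSS/n) plus twice the number of estimated coefficients. As a rough check, the sketch below reproduces three of the reported values from the RSS column, taking n = 25. The helper step_aic is a hypothetical name introduced here, not an R function.

```r
# AIC as printed by step() for a linear model, up to an additive constant:
# n * log(RSS / n) + 2 * (number of estimated coefficients)
step_aic <- function(rss, n, k) n * log(rss / n) + 2 * k

n <- 25  # observations in the cystic fibrosis data

aic_full  <- step_aic(9731.2,  n, k = 10)  # intercept + 9 predictors
aic_nosex <- step_aic(9769.2,  n, k = 9)   # after dropping sex
aic_final <- step_aic(10354.6, n, k = 5)   # intercept + weight, bmp, fev1, rv

print(round(c(full = aic_full, no_sex = aic_nosex, final = aic_final), 2))
```

Dropping a term raises RSS but saves 2 on the penalty; step() removes the term whose deletion gives the lowest AIC and stops when no deletion improves on the current model.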

Page 14: Multiple Regression

> cf.lm2 <- lm(pemax ~ rv+bmp+fev1+weight)
> summary(cf.lm2)

Call:
lm(formula = pemax ~ rv + bmp + fev1 + weight)

Residuals:
   Min     1Q Median     3Q    Max
-39.77 -11.74   4.33  15.66  35.07

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.94669   53.27673   1.200 0.244057
rv           0.12572    0.08315   1.512 0.146178
bmp         -1.37724    0.56534  -2.436 0.024322 *
fev1         1.54770    0.57761   2.679 0.014410 *
weight       1.74891    0.38063   4.595 0.000175 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.75 on 20 degrees of freedom
Multiple R-Squared: 0.6141, Adjusted R-squared: 0.5369
F-statistic: 7.957 on 4 and 20 DF, p-value: 0.000523

Page 15: Multiple Regression

Cautionary Notes

The significance levels are not necessarily believable after variable selection.

The original full-model F-statistic is significant, indicating that pemax is related to at least some of the predictors: F(9,15) = 2.93, p = 0.0320.

After variable selection, F(3,21) = 9.28, p = 0.0004, which is biased: the same data were used both to choose the variables and to test them.
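The degrees of freedom in F(3,21) imply a model with three predictors. The RSS of 11538.2 reported in the "- rv" row of the final step() table reproduces the statistic, which suggests the model in question is pemax ~ weight + bmp + fev1. The sketch below is that check, using only sums of squares printed in the earlier output; the variable names are illustrative assumptions.

```r
# Sums of squares as reported in the anova() output for the full model
ss_resid_full <- 9731.2
ss_terms <- c(10098.5, 955.4, 155.0, 632.3, 2862.2, 1549.1, 561.9, 194.6, 92.4)
ss_total <- sum(ss_terms) + ss_resid_full   # total sum of squares about the mean

# RSS of pemax ~ weight + bmp + fev1 ("- rv" row of the last step() table)
rss_sel <- 11538.2
k <- 3    # predictors in the selected model
n <- 25   # observations

f_sel <- ((ss_total - rss_sel) / k) / (rss_sel / (n - k - 1))
p_sel <- pf(f_sel, k, n - k - 1, lower.tail = FALSE)
print(round(c(F = f_sel, p = p_sel), 4))
```

The p-value computed this way is the nominal one: it treats the three predictors as if they had been chosen before seeing the data, which is why it overstates the evidence.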

Page 16: Multiple Regression

* Simulate 25 observations of pure noise: nine predictors and a response,
* all independent standard normal, so there is no true relationship
set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())
* Fit the full model, then run backward elimination (remove terms with p >= 0.1)
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

Page 17: Multiple Regression

. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    0.91
       Model |  12.3235639     9  1.36928488           Prob > F      =  0.5397
    Residual |  22.5105993    15  1.50070662           R-squared     =  0.3538
-------------+------------------------------           Adj R-squared = -0.0340
       Total |  34.8341632    24  1.45142347           Root MSE      =   1.225

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.0441858   .2998066    -0.15   0.885    -.6832085     .594837
          x2 |  -.9078136   .4347798    -2.09   0.054    -1.834525    .0188976
          x3 |   .2076754   .3789522     0.55   0.592    -.6000421    1.015393
          x4 |  -.0056383   .3319125    -0.02   0.987    -.7130931    .7018166
          x5 |   -.330546   .3854497    -0.86   0.405    -1.152113    .4910207
          x6 |   .0202964   .3470704     0.06   0.954    -.7194666    .7600594
          x7 |   -.073401   .3135234    -0.23   0.818    -.7416603    .5948583
          x8 |  -.0552909   .3026913    -0.18   0.858    -.7004621    .5898803
          x9 |  -.3190092   .3137931    -1.02   0.325    -.9878434     .349825
       _cons |  -.2490392   .3078424    -0.81   0.431    -.9051898    .4071113
------------------------------------------------------------------------------

Page 18: Multiple Regression

. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                      begin with full model
p = 0.9867 >= 0.1000  removing x4
p = 0.9545 >= 0.1000  removing x6
p = 0.8456 >= 0.1000  removing x1
p = 0.8165 >= 0.1000  removing x7
p = 0.7506 >= 0.1000  removing x8
p = 0.5023 >= 0.1000  removing x3
p = 0.2866 >= 0.1000  removing x5
p = 0.2081 >= 0.1000  removing x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) =    7.23
       Model |  8.33379862     1  8.33379862           Prob > F      =  0.0131
    Residual |  26.5003646    23  1.15218977           R-squared     =  0.2392
-------------+------------------------------           Adj R-squared =  0.2062
       Total |  34.8341632    24  1.45142347           Root MSE      =  1.0734

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.6644002   .2470417    -2.69   0.013    -1.175445   -.1533555
       _cons |  -.1523124    .214703    -0.71   0.485    -.5964594    .2918346
------------------------------------------------------------------------------
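The Stata run above is a single draw: on pure noise, the full model is not significant (p = 0.54), yet the model left after stepwise selection reports p = 0.013. To see how often this happens, the experiment can be repeated many times. The sketch below does so in R, with one deliberate substitution: it uses step(), which eliminates terms by AIC, rather than Stata's p-value threshold of 0.1. The seed, the number of replicates, and the helper selected_p are assumptions made for illustration.

```r
set.seed(247)   # arbitrary seed, chosen only for reproducibility
n_obs  <- 25    # observations per data set, as in the Stata example
n_pred <- 9     # pure-noise predictors
n_sim  <- 200   # number of simulated data sets (an assumption)

# Simulate one noise data set, run backward elimination by AIC, and return
# the overall F-test p-value of the selected model (1 if every predictor
# was dropped, so that no F-statistic exists).
selected_p <- function() {
  dat <- as.data.frame(matrix(rnorm(n_obs * (n_pred + 1)), nrow = n_obs))
  names(dat) <- c("y", paste0("x", 1:n_pred))
  fit <- step(lm(y ~ ., data = dat), direction = "backward", trace = 0)
  fs <- summary(fit)$fstatistic
  if (is.null(fs)) return(1)
  unname(pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE))
}

p_vals <- replicate(n_sim, selected_p())

# Share of noise data sets whose selected model looks "significant" at 5%
print(mean(p_vals < 0.05))
```

If the post-selection p-values were trustworthy, this share would be close to 0.05. In runs of this kind it typically comes out well above that, which is the bias the cautionary notes describe.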