Lecture 6: Multiple Linear Regression Adjusted Variable Plots BMTRY 701 Biostatistical Methods II

Upload: kareem

Post on 12-Jan-2016





Graphical Displays in MLR

No more single simple scatterplot: we need to look at multiple pairs of variables (e.g., with pairs() in R).

But pairwise scatterplots cannot show the relationships in the way the variables enter the model.

Solution: the adjusted variable plot.


Adjusted Variable Plots

Adjusted variable plots (also called added-variable plots or partial regression plots) are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model.

With two covariates: shows the association between X and Y, adjusted for another variable, Z.

With more than two covariates: shows the association between X and Y, adjusted for all of the other covariates in the model.

In our example: the association between logLOS and the number of nurses, adjusted for the number of beds.


Approach

Assume we want to look at the association of Y and X, adjusted for Z.

Step 1: Regress Y on Z and save the residuals (called res.xy in the R code).

Step 2: Regress X on Z and save the residuals (res.xz).

Step 3: Plot res.xy (vertical axis) versus res.xz (horizontal axis).

Optional Step 4:

• perform a regression of res.xy on res.xz
• compare its slope to the coefficient of X in the MLR of Y on X and Z (the two slopes are identical)
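Steps 1 to 4 can be sketched in R as follows. This is a minimal illustration on simulated data, not the SENIC analysis: the variables y, x, z and the simulation settings are hypothetical, and the residual names follow the res.xy / res.xz convention of the slides' code.

```r
# Adjusted variable plot for x, adjusting for z (simulated, hypothetical data)
set.seed(701)
n <- 100
z <- rnorm(n)                              # adjustment variable
x <- 0.6 * z + rnorm(n)                    # covariate of interest, correlated with z
y <- 1 + 0.5 * x + 0.8 * z + rnorm(n, sd = 0.5)

res.xy <- residuals(lm(y ~ z))             # Step 1: y adjusted for z
res.xz <- residuals(lm(x ~ z))             # Step 2: x adjusted for z

plot(res.xz, res.xy, pch = 16,             # Step 3: the adjusted variable plot
     xlab = "x adjusted for z", ylab = "y adjusted for z")

reg.res <- lm(res.xy ~ res.xz)             # Step 4: residuals on residuals
abline(reg.res, lwd = 2)

# The Step 4 slope should equal the coefficient of x in the MLR of y on x and z
slope.av  <- unname(coef(reg.res)["res.xz"])
slope.mlr <- unname(coef(lm(y ~ x + z))["x"])
print(c(av = slope.av, mlr = slope.mlr))
```

The two printed slopes should agree to numerical precision; that equality is what Optional Step 4 checks.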


SENIC


R

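pairs(~INFRISK+BEDS+logLOS, data=data, pch=16)

# adjusted variable plot approach:
# look at the association between INFRISK and logLOS,
# adjusting for BEDS
# (assumes data holds the SENIC variables, with logLOS = log(LOS))

reg.xy <- lm(logLOS ~ BEDS, data=data)    # Step 1: Y on Z
res.xy <- reg.xy$residuals

reg.xz <- lm(INFRISK ~ BEDS, data=data)   # Step 2: X on Z
res.xz <- reg.xz$residuals

plot(res.xz, res.xy, pch=16)              # Step 3: adjusted variable plot
reg.res <- lm(res.xy ~ res.xz)            # Step 4: residuals on residuals
abline(reg.res, lwd=2)

# compare: the INFRISK coefficient in this MLR equals the slope of reg.res
reg.infrisk.beds <- lm(logLOS ~ BEDS + INFRISK, data=data)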


[Figure: scatterplot matrix (pairs plot) of INFRISK, BEDS, and logLOS]


[Figure: adjusted variable plot of res.xy versus res.xz, INFRISK and logLOS adjusted for BEDS]


Why is this important or interesting?

It shows us the ‘adjusted’ relationship. It can help us determine:

• whether X is an important variable (at all)
• whether another form of X is more appropriate
• whether the correlation is high or low after adjustment
• whether we need to (or want to) adjust for this variable

It also shows us why a variable ‘loses’ significance after adjustment.

Most important: check for non-linearity. Example: logLOS ~ NURSE.
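One way to look for non-linearity is to add a scatterplot smoother to the adjusted variable plot and see whether it bends away from the straight line. A sketch with simulated data, using base R's lowess() as the smoother; the variables x, z, y and the deliberately quadratic effect of x are hypothetical.

```r
# Checking an adjusted variable plot for curvature (simulated, hypothetical data)
set.seed(2)
n <- 200
x <- runif(n, 0, 10)                           # covariate of interest
z <- rnorm(n)                                  # adjustment variable
y <- 1 + 0.5 * z + 0.3 * (x - 5)^2 + rnorm(n)  # y is quadratic in x by construction

res.xy <- residuals(lm(y ~ z))                 # y adjusted for z
res.xz <- residuals(lm(x ~ z))                 # x adjusted for z

plot(res.xz, res.xy, pch = 16)
abline(lm(res.xy ~ res.xz), lwd = 2)           # straight-line fit
smooth <- lowess(res.xz, res.xy)               # nonparametric smoother
lines(smooth, lwd = 2, lty = 2)                # a U shape suggests x enters non-linearly
```

If the dashed smoother follows the solid line, a linear term for X looks adequate; systematic bending suggests trying another form of X (e.g., a transformation or a quadratic term).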


What about BEDS and NURSE?

# why is NURSE not associated with logLOS after adjustment for BEDS?

reg.nurse <- lm(logLOS ~ NURSE, data=data)
reg.nurse.beds <- lm(logLOS ~ NURSE + BEDS, data=data)

reg.xy <- lm(logLOS ~ BEDS, data=data)
res.xy <- reg.xy$residuals

reg.xz <- lm(NURSE ~ BEDS, data=data)
res.xz <- reg.xz$residuals

plot(res.xz, res.xy, pch=16)
reg.res <- lm(res.xy ~ res.xz)
abline(reg.res, lwd=2)
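A simulated sketch of one way a variable can ‘lose’ significance after adjustment. Here a hypothetical nurse variable is built to track beds closely, and y depends on beds only (all names and settings are assumptions, not SENIC values). Because beds explains most of nurse, res.xz has much less spread than nurse itself, so the adjusted variable plot has little horizontal range and little remaining signal for the slope.

```r
# Collinearity and loss of significance (simulated, hypothetical data)
set.seed(3)
n <- 100
beds  <- runif(n, 50, 800)
nurse <- 0.8 * beds + rnorm(n, sd = 40)           # strongly correlated with beds
y     <- 2 + 0.0005 * beds + rnorm(n, sd = 0.15)  # y depends on beds only

res.xz <- residuals(lm(nurse ~ beds))             # nurse adjusted for beds
c(sd.nurse = sd(nurse), sd.res.xz = sd(res.xz))   # spread before and after adjustment

# nurse coefficient without and with adjustment for beds
summary(lm(y ~ nurse))$coefficients["nurse", ]
summary(lm(y ~ nurse + beds))$coefficients["nurse", ]
```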


[Figure: adjusted variable plot of res.xy versus res.xz, NURSE and logLOS adjusted for BEDS]


What about the other way around?

########################
# what about the other way around: why is BEDS associated with logLOS
# after adjustment for NURSE?

reg.xy <- lm(logLOS ~ NURSE, data=data)
res.xy <- reg.xy$residuals

reg.xz <- lm(BEDS ~ NURSE, data=data)
res.xz <- reg.xz$residuals

plot(res.xz, res.xy, pch=16)
reg.res <- lm(res.xy ~ res.xz)
abline(reg.res, lwd=2)
reg.nurse.beds <- lm(logLOS ~ NURSE + BEDS, data=data)


[Figure: adjusted variable plot of res.xy versus res.xz, BEDS and logLOS adjusted for NURSE]


Interpretation in MLR

• “Adjusted for”
• “Controlled for”
• “Holding all else constant”

In MLR, you need to include one of these phrases (or something like one of them) when interpreting a regression coefficient.


LOS ~ INFRISK + BEDS

> reg.infrisk.beds <- lm(LOS ~ BEDS + INFRISK, data=data)
> summary(reg.infrisk.beds)

Call:
lm(formula = LOS ~ BEDS + INFRISK, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8624 -0.9904 -0.1996  0.6671  8.4219

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.2703521  0.5038751  12.444  < 2e-16 ***
BEDS        0.0024747  0.0008236   3.005  0.00329 **
INFRISK     0.6323812  0.1184476   5.339 5.08e-07 ***
---


Hard to interpret with so many decimal places!

> data$beds100 <- data$BEDS/100
> reg.infrisk.beds <- lm(LOS ~ beds100 + INFRISK, data=data)
> summary(reg.infrisk.beds)

Call:
lm(formula = LOS ~ beds100 + INFRISK, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8624 -0.9904 -0.1996  0.6671  8.4219

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.27035    0.50388  12.444  < 2e-16 ***
beds100      0.24747    0.08236   3.005  0.00329 **
INFRISK      0.63238    0.11845   5.339 5.08e-07 ***
---
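Dividing BEDS by 100 only changes the units: the estimate and its standard error are both multiplied by 100, so the t value and p-value are unchanged (compare the BEDS and beds100 rows of the two outputs). A quick check with simulated data; the variables and settings are hypothetical.

```r
# Rescaling a predictor rescales its coefficient but not its t value (simulated data)
set.seed(4)
n <- 100
beds    <- runif(n, 50, 800)
infrisk <- rnorm(n, mean = 4, sd = 1)
los     <- 6 + 0.002 * beds + 0.5 * infrisk + rnorm(n, sd = 1.5)
beds100 <- beds / 100                              # beds in units of 100

coef.raw    <- summary(lm(los ~ beds + infrisk))$coefficients
coef.scaled <- summary(lm(los ~ beds100 + infrisk))$coefficients

coef.raw["beds", ]        # per 1 bed
coef.scaled["beds100", ]  # per 100 beds: estimate and SE x 100, same t and p
```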


logLOS ~ INFRISK + BEDS

> reg.infrisk.beds <- lm(logLOS ~ BEDS + INFRISK, data=data)
> summary(reg.infrisk.beds)

Call:
lm(formula = logLOS ~ BEDS + INFRISK, data = data)

Residuals:
      Min        1Q    Median        3Q       Max
-0.314377 -0.079979 -0.008026  0.072108  0.580675

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.926e+00  4.611e-02  41.767  < 2e-16 ***
BEDS        2.407e-04  7.538e-05   3.194  0.00183 **
INFRISK     6.048e-02  1.084e-02   5.579 1.75e-07 ***
---


Hard to interpret with so many decimal places!

> data$beds100 <- data$BEDS/100
> reg.infrisk.beds100 <- lm(logLOS ~ beds100 + INFRISK, data=data)
> summary(reg.infrisk.beds100)

Call:
lm(formula = logLOS ~ beds100 + INFRISK, data = data)

Residuals:
      Min        1Q    Median        3Q       Max
-0.314377 -0.079979 -0.008026  0.072108  0.580675

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.926040   0.046114  41.767  < 2e-16 ***
beds100     0.024075   0.007538   3.194  0.00183 **
INFRISK     0.060477   0.010840   5.579 1.75e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1435 on 110 degrees of freedom
Multiple R-squared:  0.3612,  Adjusted R-squared:  0.3496
F-statistic:  31.1 on 2 and 110 DF,  p-value: 1.971e-11


How to interpret?

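Pick two values of BEDS, e.g.:

• 100 to 200
• 400 to 500

Estimate the difference in predicted logLOS between the two values.

What do we plug in for INFRISK?

$$\widehat{\log \text{LOS}} = \hat\beta_0 + \hat\beta_1 \frac{\text{beds}}{100} + \hat\beta_2 \, \text{INFRISK} = 1.93 + 0.024 \times \frac{\text{beds}}{100} + 0.060 \times \text{INFRISK}$$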


How to interpret?

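Remember that our inferences are “holding all else constant.”

To compare two hospitals with the same INFRISK, it doesn’t matter what value you plug in for INFRISK: it cancels.

$$\widehat{\log \text{LOS}}\,[\text{beds}/100 = 1] = \hat\beta_0 + \hat\beta_1 \cdot 1 + \hat\beta_2 \cdot \text{INFRISK} = 1.93 + 0.024 \cdot 1 + 0.060 \cdot \text{INFRISK}$$

$$\widehat{\log \text{LOS}}\,[\text{beds}/100 = 2] = \hat\beta_0 + \hat\beta_1 \cdot 2 + \hat\beta_2 \cdot \text{INFRISK} = 1.93 + 0.024 \cdot 2 + 0.060 \cdot \text{INFRISK}$$

$$\widehat{\log \text{LOS}}\,[\text{beds}/100 = 2] - \widehat{\log \text{LOS}}\,[\text{beds}/100 = 1] = (1.93 + 0.024 \cdot 2 + 0.060 \cdot \text{INFRISK}) - (1.93 + 0.024 \cdot 1 + 0.060 \cdot \text{INFRISK}) = 0.024 \cdot 2 - 0.024 \cdot 1 = 0.024$$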
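The cancellation can be checked numerically with the estimates printed in the logLOS ~ beds100 + INFRISK output. The helper pred() and the INFRISK values plugged in are arbitrary choices for illustration.

```r
# Estimates from the logLOS ~ beds100 + INFRISK output
b0        <- 1.926040
b.beds100 <- 0.024075
b.infrisk <- 0.060477

# Hypothetical helper: predicted logLOS for given beds100 and INFRISK
pred <- function(beds100, infrisk) b0 + b.beds100 * beds100 + b.infrisk * infrisk

infrisk.values <- c(2, 4, 6)                                   # arbitrary INFRISK values
diff.100.200 <- pred(2, infrisk.values) - pred(1, infrisk.values)
diff.400.500 <- pred(5, infrisk.values) - pred(4, infrisk.values)
diff.100.200   # each element equals b.beds100
diff.400.500   # same again: depends on neither INFRISK nor the starting BEDS
```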


How to interpret?

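$$\exp\left(\log \widehat{\text{LOS}}(2) - \log \widehat{\text{LOS}}(1)\right) = \exp\left(\log \frac{\widehat{\text{LOS}}(2)}{\widehat{\text{LOS}}(1)}\right) = \frac{\widehat{\text{LOS}}(2)}{\widehat{\text{LOS}}(1)} = \exp(0.024) = 1.02$$

Here $\log \widehat{\text{LOS}}(k)$ is the predicted logLOS at beds/100 = k (same INFRISK), so $\widehat{\text{LOS}}(k)$ is its value back on the original LOS scale.

Comparing two hospitals whose numbers of beds differ by 100, and assuming the infection risk in the two hospitals is the same, the estimated ratio of (geometric) mean LOS in the two hospitals is 1.02, with the hospital with more beds having the longer stay.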


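What about a difference of 400 beds (beds/100 = 5 versus 1, same INFRISK)?

$$\exp\left(\log \widehat{\text{LOS}}(5) - \log \widehat{\text{LOS}}(1)\right) = \exp\left(\log \frac{\widehat{\text{LOS}}(5)}{\widehat{\text{LOS}}(1)}\right) = \frac{\widehat{\text{LOS}}(5)}{\widehat{\text{LOS}}(1)} = \exp(4 \times 0.024) = 1.10$$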
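Both ratios can be computed directly by exponentiating the beds100 coefficient times the number of 100-bed units. A short sketch using the estimate from the output; bed.diff is an illustrative input.

```r
# Ratio of (geometric) mean LOS for a given difference in beds, INFRISK held fixed
b.beds100 <- 0.024075                     # beds100 estimate from the logLOS model
bed.diff  <- c(100, 400)                  # differences in number of beds
ratio     <- exp(b.beds100 * bed.diff / 100)
round(ratio, 2)                           # 1.02 and 1.10, matching the slides
```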


When the outcome is log transformed

The coefficients can be interpreted as RATIOS instead of DIFFERENCES.

Exponentiate the coefficient: exp(β) is the ratio for a one-unit difference in the predictor, holding the other covariates constant.