Sections 11-7 and 11-8 · homepage.stat.uiowa.edu/~rdecook/stat2020/notes/ch11_pt4.pdf


Post on 26-Jul-2020


Chapter 11: SIMPLE LINEAR REGRESSION (SLR) AND CORRELATION

Part 4: Adequacy of the regression model, Correlation
Sections 11-7 and 11-8

• Is a linear model the correct model? (Is simple linear regression complex enough to capture the relationship between X and Y?)

• Are the assumptions we’re making for our model reasonable, or are they violated?

• To answer these questions, we will use the residuals of the model.

The residual for observation i is the observed value minus the fitted value:

$e_i = y_i - \hat{y}_i$
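As a minimal sketch of this definition, the following fits a least-squares line and computes each residual as observed minus fitted. The data values and the helper name `fit_slr` are hypothetical and chosen only for illustration; they are not from the notes.

```python
# Hypothetical data, chosen only to illustrate the arithmetic.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_slr(x, y):
    """Return the least-squares (intercept, slope) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

b0, b1 = fit_slr(x, y)
fitted = [b0 + b1 * xi for xi in x]                   # y-hat_i
residuals = [yi - fi for yi, fi in zip(y, fitted)]    # e_i = y_i - y-hat_i

for xi, yi, fi, ei in zip(x, y, fitted, residuals):
    print(f"x={xi:.1f}  y={yi:.1f}  fitted={fi:.3f}  residual={ei:+.3f}")
```

When the model includes an intercept, least-squares residuals sum to zero (up to rounding), which is why residual plots are read relative to a horizontal reference line at 0.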


Residuals are informative

• Plot residuals vs. the explanatory variable (or vs. the fitted values $\hat{y}$ in SLR)

• If the plot is a random scatter of points above and below the horizontal reference line, then the linear model is reasonable and adequate.


• If not (i.e. if there is a pattern in the residual plot), then there may be issues with our linearity assumption or perhaps other assumptions in our model.

• Example showing inadequacy: Kentucky Derby data set on year of race and speed of horse.

[Figure: "Kentucky Derby: Horse speed vs. Year", a scatterplot of Speed (MPH), y-axis 32 to 37, against Year, x-axis 1880 to 2000]


The form of the scatterplot looks a bit non-linear, but we’ll go ahead and fit a straight-line model first to get the following residual plot...

Residual Plot of ‘residuals vs. fitted values’

• Residuals have a bit of a pattern (e.g. below the line, above the line, below the line), not randomly scattered around the horizontal line


• Linear form may not be reasonable or adequate.

⇒ Quadratic may fit better.
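The "below, above, below" pattern can be reproduced with a small sketch. The data here are synthetic and noise-free, generated from a made-up concave curve; they are NOT the Kentucky Derby data, and the helper names are hypothetical. Fitting a straight line to curved data leaves residuals whose average is negative early, positive in the middle, and negative late.

```python
def fit_slr(x, y):
    """Return the least-squares (intercept, slope) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def mean(values):
    return sum(values) / len(values)

# Synthetic curved relationship (an assumption for illustration only).
years = [1880 + 5 * k for k in range(25)]            # 1880, 1885, ..., 2000
speed = [30.0 + 0.1 * (yr - 1880) - 0.0005 * (yr - 1880) ** 2 for yr in years]

b0, b1 = fit_slr(years, speed)
residuals = [s - (b0 + b1 * yr) for yr, s in zip(years, speed)]

# Average residual in the early, middle, and late thirds of the x-range.
third = len(residuals) // 3
early = mean(residuals[:third])
middle = mean(residuals[third:-third])
late = mean(residuals[-third:])

print(f"early: {early:+.3f}  middle: {middle:+.3f}  late: {late:+.3f}")
```

A sign pattern like this in the averages is the numerical counterpart of the curved band seen in the residual plot; with real, noisy data the plot itself remains the primary diagnostic.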


Beyond Adequacy

• Besides checking that our model fits the general (linear) relationship between X and Y, we also need to consider the assumptions we made in our model.

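• The basic model

$$Y_i = \underbrace{\beta_0 + \beta_1 x_i}_{\text{linear relationship}} + \underbrace{\varepsilon_i}_{\text{random error term}}$$

with $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$

– Constant variance of errors (only one $\sigma^2$ for all errors)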

– Normality of errors

– Independence of errors
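To make the model concrete, here is a small simulation sketch that generates data exactly as the model describes: a linear part plus independent normal errors sharing one variance. The parameter values, sample size, and seed are hypothetical choices for illustration.

```python
import random

random.seed(2020)  # fixed seed so the sketch is reproducible

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0

n = 2000
x = [random.uniform(0.0, 10.0) for _ in range(n)]

# Independent draws from N(0, sigma^2): the same sigma for every observation.
errors = [random.gauss(0.0, sigma) for _ in range(n)]

# Y_i = beta0 + beta1 * x_i + epsilon_i
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, errors)]

mean_err = sum(errors) / n
var_err = sum((e - mean_err) ** 2 for e in errors) / (n - 1)
print(f"mean of errors     = {mean_err:+.3f}  (model says 0)")
print(f"variance of errors = {var_err:.3f}  (model says {sigma ** 2:.3f})")
```

With real data the errors are never observed directly, so the residuals stand in for them when the three assumptions are checked.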


Constant Variance Assumption

• We’ll check this assumption by plotting the residuals vs. the fitted values (or vs. the explanatory variable in SLR)

• Look for a constant ‘spread’ above and below the horizontal reference line.

[Figure: "Residuals vs. Fitted", residuals (y-axis, −0.15 to 0.15) plotted against fitted values (x-axis, 0.5 to 2.0)]

• NOTE: This same residual plot was also used to check linearity.


• Constant Variance and Adequacy are both checked with the same residual plot in SLR

• Plot residuals against x or against the fitted values $\hat{y}$.
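As a rough numerical companion to the visual check (not a formal test, and not a method from the notes), one can order the residuals by fitted value and compare their spread in the lower and upper halves. The data below are synthetic and built so that the deviations grow with x, which violates constant variance by construction; all names are hypothetical.

```python
import statistics

def fit_slr(x, y):
    """Return the least-squares (intercept, slope) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Synthetic data (assumption): deviations alternate in sign and grow with x.
x = list(range(1, 21))
y = [2.0 + 3.0 * xi + ((-1) ** xi) * 0.1 * xi for xi in x]

b0, b1 = fit_slr(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Order residuals by fitted value, then compare the spread in each half.
ordered = [e for _, e in sorted(zip(fitted, residuals))]
half = len(ordered) // 2
sd_low = statistics.stdev(ordered[:half])
sd_high = statistics.stdev(ordered[half:])

print(f"SD of residuals, lower fitted values: {sd_low:.3f}")
print(f"SD of residuals, upper fitted values: {sd_high:.3f}")
```

Clearly unequal spreads, as here, correspond to a "fan" shape in the residual plot; roughly equal spreads are consistent with a single $\sigma^2$.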


Normality Assumption

• Use a normal probability plot of the residuals to check normality of errors (see section 6-6 for non-normal patterns like those below).
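The coordinates of a normal probability plot can be sketched as follows: sort the residuals and pair each with a standard normal quantile. The residual values are hypothetical, and the plotting positions (i − 0.5)/n are one common convention assumed here; the notes do not specify which convention is used.

```python
from statistics import NormalDist

# Hypothetical residuals, for illustration only.
residuals = [-1.3, -0.8, -0.5, -0.2, -0.1, 0.1, 0.3, 0.5, 0.8, 1.2]

n = len(residuals)
ordered = sorted(residuals)

# Theoretical standard normal quantiles at plotting positions (i - 0.5) / n.
std_normal = NormalDist()
quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(u, v):
    """Pearson correlation between two equal-length lists."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

# Points (quantile, ordered residual) are what the plot displays.
for q, e in zip(quantiles, ordered):
    print(f"{q:+.3f}  {e:+.2f}")

r = correlation(quantiles, ordered)
print(f"correlation of plot points: {r:.3f}")
```

Points falling close to a straight line (a correlation near 1) are consistent with normal errors; systematic curvature or outlying tails suggest non-normality.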


Independence Assumption

• Verify that the observations are independent.

• Check how the data was collected (talk to the researcher or client).

• If data was collected over time, plot residuals against time to make sure there isn’t a dependence (or trend) across time.
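Alongside the residuals-vs-time plot, a lag-1 autocorrelation of the residuals in time order gives a quick numerical summary. This is an added illustration rather than a method from the notes, and the residual values are hypothetical, chosen to drift steadily upward over time.

```python
def lag1_autocorrelation(e):
    """Correlation between each residual and the one immediately before it."""
    n = len(e)
    e_bar = sum(e) / n
    num = sum((e[t] - e_bar) * (e[t - 1] - e_bar) for t in range(1, n))
    den = sum((v - e_bar) ** 2 for v in e)
    return num / den

# Hypothetical residuals listed in the order the data were collected.
residuals_in_time_order = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4]

r1 = lag1_autocorrelation(residuals_in_time_order)
print(f"lag-1 autocorrelation = {r1:.3f}")
```

A value far from 0, as with this trending series, suggests neighbouring observations are related; a value near 0 is consistent with independence but does not prove it.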


• Predictions and Extrapolation

– We can use our fitted model to make predictions.

– e.g. What is the expected longevity of a fruit fly with a thorax of length 0.80 mm?

[Figure: scatterplot of ff.data$Longevity (y-axis, 20 to 100) against ff.data$Thorax (x-axis, 0.65 to 0.95) with the fitted line]

$\hat{Y} = -61.05 + 144.33x$


Prediction:

$\hat{Y}_{x=0.80} = -61.05 + 144.33(0.80) = 54.414$ days

– If we try to predict Y outside of the range of observed x-values, we are using the model to extrapolate (predict outside the range of the observed data).

– You should be very careful when using extrapolation. In general it should be avoided, as we don’t have a feel for what is going on outside the observed range.

– Predicting Y for x = 1.50 mm (which is not a value near the observed x-values) would be an extrapolation in this fruit fly example.
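The prediction and the extrapolation warning can be combined in a short sketch. The coefficients are the fitted values from the notes; the observed range of 0.65 to 0.95 mm is an assumption read off the x-axis of the scatterplot, since the exact data range is not given. The function name is hypothetical.

```python
B0 = -61.05   # fitted intercept from the notes
B1 = 144.33   # fitted slope from the notes

# ASSUMPTION: observed thorax lengths lie roughly in [0.65, 0.95] mm,
# taken from the plot's x-axis; the exact data range is not stated.
X_MIN, X_MAX = 0.65, 0.95

def predict_longevity(thorax_mm):
    """Return (predicted longevity in days, True if this is an extrapolation)."""
    y_hat = B0 + B1 * thorax_mm
    is_extrapolation = not (X_MIN <= thorax_mm <= X_MAX)
    return y_hat, is_extrapolation

for x in (0.80, 1.50):
    y_hat, extrapolating = predict_longevity(x)
    note = "EXTRAPOLATION, interpret with caution" if extrapolating else "within observed range"
    print(f"x = {x:.2f} mm -> predicted longevity {y_hat:.3f} days ({note})")
```

The model will return a number for any x, so the range check has to be made explicitly; the arithmetic itself gives no warning when a prediction is an extrapolation.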
