Sections 11-7 and 11-8 (homepage.stat.uiowa.edu/~rdecook/stat2020/notes/ch11_pt4.pdf)

Post on 26-Jul-2020


Chapter 11: SIMPLE LINEAR REGRESSION (SLR) AND CORRELATION

Part 4: Adequacy of the regression model, Correlation

Sections 11-7 and 11-8

• Is a linear model the correct model? (Is simple linear regression complex enough to capture the relationship between X & Y?)

• Are the assumptions we’re making for our model reasonable, or are they violated?

• To answer these questions, we will use the residuals of the model.

The residual for observation i:

$e_i = y_i - \hat{y}_i$
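As a minimal sketch of the definition above, the following fits a least-squares line and computes each residual as the observed value minus the fitted value. The five data points and the helper name `fit_line` are invented for illustration and do not come from the notes.

```python
# Toy data, invented purely for illustration (not from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]                 # fitted values (y-hat)
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # e_i = y_i - y-hat_i

for xi, yi, fi, ei in zip(x, y, fitted, residuals):
    print(f"x={xi:.1f}  y={yi:.1f}  fitted={fi:.3f}  residual={ei:+.3f}")
```

With an intercept in the model, least-squares residuals always sum to zero, which is why the horizontal reference line in a residual plot sits at 0.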

Residuals are informative

• Plot residuals vs. the explanatory variable (or vs. the fitted values $\hat{y}$ in SLR)

• If the plot is a random scatter of points above and below the horizontal reference line, then the linear model is reasonable and adequate.

• If not (i.e. if there is a pattern in the residual plot), then there may be issues with our linearity assumption or perhaps other assumptions in our model.

• Example showing inadequacy: Kentucky Derby data set on year of race and speed of horse.

[Figure: scatterplot "Kentucky Derby: Horse speed vs. Year"; x-axis: Year (1880 to 2000); y-axis: Speed (MPH), 32 to 37]

The form of the scatterplot looks a bit non-linear, but we’ll go ahead and fit a straight-line model first to get the following residual plot...

Residual Plot of ‘residuals vs. fitted values’

• Residuals have a bit of a pattern (e.g. below the line, above the line, below the line), not randomly scattered around the horizontal line

• Linear form may not be reasonable or adequate.

⇒ Quadratic may fit better.
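The "below, above, below" pattern can also be seen numerically. The sketch below fits a straight line to deliberately curved data and averages the residuals within the left, middle, and right third of the x-range. The data are synthetic (a concave curve chosen for illustration, not the Kentucky Derby data), and splitting into thirds is just one rough way to summarize a pattern that is normally judged by eye from the plot.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def mean(values):
    return sum(values) / len(values)


# Synthetic, deliberately curved data (NOT the Kentucky Derby data).
x = [float(i) for i in range(30)]
y = [10.0 + 2.0 * xi - 0.05 * xi ** 2 for xi in x]

# Fit a straight line anyway and look at what is left over.
b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Average residual in the left, middle, and right third of the x-range.
k = len(x) // 3
third_means = [
    mean(residuals[:k]),
    mean(residuals[k:2 * k]),
    mean(residuals[2 * k:]),
]
signs = ["above" if m > 0 else "below" for m in third_means]
print(signs)
```

A systematic sign change like this across the x-range is the kind of pattern that suggests adding curvature, such as a quadratic term, to the model.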

Beyond Adequacy

• Besides checking that our model fits the general (linear) relationship between X and Y, we also need to consider the assumptions we made in our model.

• The basic model

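$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

where $\beta_0 + \beta_1 x_i$ is the linear relationship and $\varepsilon_i$ is the random error term,

with $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$

– Constant variance of errors (only one $\sigma^2$ for all errors)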

– Normality of errors

– Independence of errors
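To make the three assumptions concrete, here is a small sketch that generates data exactly as the basic model describes: each response is the linear part plus an independent normal error with the same standard deviation. The function name `simulate_slr` and the parameter values (intercept 1, slope 2, sigma 0.5) are arbitrary choices for illustration.

```python
import random


def simulate_slr(x, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * x_i + eps_i with eps_i iid N(0, sigma^2)."""
    rng = random.Random(seed)
    # One independent draw per observation, all with the same sigma:
    # independence, normality, and constant variance in a single line.
    return [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]


# Hypothetical parameter values chosen only for this demonstration.
x = [i / 10 for i in range(1, 201)]
y = simulate_slr(x, beta0=1.0, beta1=2.0, sigma=0.5, seed=1)

# Because the true line is known here, the true errors can be recovered.
errors = [yi - (1.0 + 2.0 * xi) for xi, yi in zip(x, y)]
mean_err = sum(errors) / len(errors)
sd_err = (sum((e - mean_err) ** 2 for e in errors) / (len(errors) - 1)) ** 0.5
print(f"mean of errors: {mean_err:.3f}, sd of errors: {sd_err:.3f}")
```

With real data the true errors are never observed, which is why the residuals stand in for them in the checks that follow.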

Constant Variance Assumption

• We’ll check this assumption by plotting the residuals vs. the fitted values (or vs. the explanatory variable in SLR)

• Look for a constant ‘spread’ above and below the horizontal reference line.

[Figure: residual plot "Residuals vs. Fitted"; x-axis: fitted values (0.5 to 2.0); y-axis: residuals (−0.15 to 0.15)]

• NOTE: This same residual plot was also used to check linearity.

• Constant Variance and Adequacy are both checked with the same residual plot in SLR

• Plot residuals against x or $\hat{y}$.
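The residual plot is the primary tool here, but a rough numeric companion can help show what a non-constant spread looks like. The sketch below compares the typical size of the residuals in the upper half of the fitted values with that in the lower half. This ratio is an illustrative device, not a method from the notes, and both data sets are simulated: one with constant error variance and one whose error standard deviation grows with x.

```python
import random


def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def rms(values):
    """Root mean square: typical distance from the zero reference line."""
    return (sum(v * v for v in values) / len(values)) ** 0.5


def spread_ratio(x, y):
    """Residual spread for large fitted values divided by that for small ones."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    ordered = [e for _, e in sorted(zip(fitted, residuals))]
    half = len(ordered) // 2
    return rms(ordered[half:]) / rms(ordered[:half])


# Simulated data for illustration only.
rng = random.Random(42)
x = [1.0 + 9.0 * i / 199 for i in range(200)]
y_constant = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]     # one sigma
y_funnel = [2.0 + 0.5 * xi + rng.gauss(0.0, 0.3 * xi) for xi in x]  # sigma grows

ratio_constant = spread_ratio(x, y_constant)
ratio_funnel = spread_ratio(x, y_funnel)
print(f"constant variance: ratio = {ratio_constant:.2f}")
print(f"funnel shape:      ratio = {ratio_funnel:.2f}")
```

A ratio near 1 is consistent with constant spread, while a ratio well above or below 1 corresponds to the funnel shape that would be visible in the residual plot.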

Normality Assumption

• Use a normal probability plot of residuals to check normality of errors (see section 6-6 for non-normal patterns like those below).
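A normal probability plot pairs each ordered residual with the standard normal quantile expected at its rank. The sketch below computes those pairs using only the Python standard library. The plotting position (i − 0.5)/n is one common convention, assumed here, and the nine residual values are made up for illustration.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Return (theoretical normal quantile, ordered residual) pairs."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are an assumed convention; others exist.
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Made-up residuals, for illustration only.
residuals = [0.12, -0.05, 0.03, -0.11, 0.07, -0.02, 0.01, -0.08, 0.04]
points = normal_plot_points(residuals)

for q, e in points:
    print(f"normal quantile {q:+.3f}   residual {e:+.2f}")
```

Plotting the second coordinate against the first should give a roughly straight line when the errors are approximately normal.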

Independence Assumption

• Verify that the observations are independent.

• Check how the data was collected (talk to the researcher or client).

• If data was collected over time, plot residuals against time to make sure there isn’t a dependence (or trend) across time.
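As a rough numeric companion to the residuals-versus-time plot, the sketch below computes the correlation between the residuals and the order in which the observations were collected. Everything here is hypothetical: the data are constructed so that the response drifts upward over time independently of x, and the correlation is only a simple summary of a linear trend, not a replacement for looking at the plot.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def pearson(a, b):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


# Hypothetical data: 'order' is the collection order; x is scrambled with
# respect to time, and the response drifts upward by 0.1 per time step.
order = list(range(20))
x = [float((7 * t) % 20) for t in order]
y = [3.0 + 0.5 * xi + 0.1 * t for xi, t in zip(x, order)]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

trend_corr = pearson(order, residuals)
print(f"correlation of residuals with collection order: {trend_corr:.2f}")
```

A correlation far from zero indicates a trend across time, which is a sign that the independence assumption may be violated.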

• Predictions and Extrapolation

– We can use our fitted model to make predictions.

– e.g. What is the expected longevity of a fruitfly with a thorax of length 0.80 mm?

[Figure: scatterplot of ff.data$Longevity (y-axis, 20 to 100) vs. ff.data$Thorax (x-axis, 0.65 to 0.95) with the fitted line]

$\hat{Y} = -61.05 + 144.33x$

Prediction:

$\hat{Y}_{x=0.80} = -61.05 + 144.33(0.80) = 54.414$ days

– If we try to predict Y outside of the range of observed x-values, we are using the model to extrapolate (predict outside the range of the observed data).

– You should be very careful when using extrapolation. In general it should be avoided as we don’t have a feel for what is going on outside the observed range.

– Predicting Y for x = 1.50 mm (which is not a value near the observed x-values) would be an extrapolation in this fruitfly example.
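The prediction and the extrapolation warning can be combined in a short sketch. The intercept and slope are the fitted values from the notes. The observed range of thorax lengths is an assumption: it is taken from the x-axis of the scatterplot (0.65 to 0.95 mm), since the exact minimum and maximum are not stated. The function name `predict_longevity` is hypothetical.

```python
B0 = -61.05   # intercept of the fitted line in the notes
B1 = 144.33   # slope of the fitted line in the notes

# ASSUMPTION: observed thorax lengths are taken to span the plot's x-axis
# (0.65 to 0.95 mm); the exact observed range is not given in the notes.
X_MIN, X_MAX = 0.65, 0.95


def predict_longevity(thorax_mm):
    """Return (predicted longevity in days, True if this is an extrapolation)."""
    y_hat = B0 + B1 * thorax_mm
    is_extrapolation = not (X_MIN <= thorax_mm <= X_MAX)
    return y_hat, is_extrapolation


for thorax in (0.80, 1.50):
    days, extrapolated = predict_longevity(thorax)
    note = "EXTRAPOLATION, interpret with caution" if extrapolated else "within observed range"
    print(f"thorax = {thorax:.2f} mm -> {days:.3f} days ({note})")
```

The arithmetic for x = 1.50 mm works just as easily as for x = 0.80 mm, which is exactly the danger: the formula returns a number either way, and only the range check signals that the second prediction has no data behind it.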
