
Chapter 11: SIMPLE LINEAR REGRESSION AND CORRELATION

Part 1: Simple Linear Regression (SLR)

Introduction

Sections 11-1 and 11-2

[Figure: Scatterplot of Abrasion Loss vs. Hardness]

[Figure: Scatterplot of Price Sold at Auction vs. Age of Clock (yrs), with the number of Bidders (5.0 to 15.0) indicated]


• Regression is a method for studying the relationship between two or more quantitative variables.

• Simple linear regression (SLR):

One quantitative dependent variable

- response variable

- dependent variable

- Y

One quantitative independent variable

- explanatory variable

- predictor variable

- X

• Multiple linear regression:

One quantitative dependent variable

Many quantitative independent variables

– You’ll see this in STAT:3200/IE:3760 Applied Linear Regression, if you take it. (The general form is sketched below.)
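For reference, a multiple linear regression model with $k$ predictors has the general form (a sketch; these notes do not develop it further):

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i \]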


• SLR Examples:

– predict salary from years of experience

– estimate effect of lead exposure on school testing performance

– predict force at which a metal alloy rod bends based on iron content


• Example: Health data

Variables: Percent of Obese Individuals, Percent of Active Individuals

Data from CDC. Units are regions of U.S. in 2014.

   PercentObesity  PercentActive
1            29.7           55.3
2            28.9           51.9
3            35.9           41.2
4            24.7           56.3
5            21.3           60.4
6            26.3           50.9
...

[Figure: Scatterplot of Percent Obese vs. Percent Active]


A scatterplot or scatter diagram can give us a general idea of the relationship between obesity and activity...

[Figure: the same scatterplot of Percent Obese vs. Percent Active]

The points are plotted as the pairs $(x_i, y_i)$ for $i = 1, \ldots, 25$.

Inspection suggests a linear relationship between obesity and activity (i.e. a straight line would go through the bulk of the points, and the points would look randomly scattered around this line).
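As a minimal sketch, such a plot could be drawn with matplotlib using the six rows shown above (the full data set is not reproduced here):

```python
import matplotlib.pyplot as plt

# First six regions from the CDC health data table above
percent_active = [55.3, 51.9, 41.2, 56.3, 60.4, 50.9]  # x: Percent Active
percent_obese = [29.7, 28.9, 35.9, 24.7, 21.3, 26.3]   # y: Percent Obese

plt.scatter(percent_active, percent_obese)
plt.xlabel("Percent Active")
plt.ylabel("Percent Obese")
plt.title("Obesity vs. Activity (CDC regions, 2014)")
plt.show()
```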


Simple Linear Regression
The model

• The basic model:

\[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]

– $Y_i$ is the observed response or dependent variable for observation $i$

– $x_i$ is the observed predictor, regressor, explanatory variable, independent variable, covariate

– $\varepsilon_i$ is the error term

– $\varepsilon_i$ are iid $N(0, \sigma^2)$ (iid means independently and identically distributed)


– So, $E[Y_i \mid x_i] = \beta_0 + \beta_1 x_i + 0 = \beta_0 + \beta_1 x_i$

The conditional mean (i.e. the expected value of $Y_i$ given $x_i$, or after conditioning on $x_i$) is $\beta_0 + \beta_1 x_i$ (a point on the regression line).

– Or, in another notation, $E[Y \mid x] = \mu_{Y \mid x}$

– The random scatter around the mean (i.e. around the line) follows a $N(0, \sigma^2)$ distribution.
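A minimal sketch of this model as a data-generating process, with hypothetical parameter values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values (not from any example in these notes)
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 25

x = rng.uniform(0, 10, size=n)        # observed predictor values x_i
eps = rng.normal(0.0, sigma, size=n)  # errors eps_i, iid N(0, sigma^2)
y = beta0 + beta1 * x + eps           # responses Y_i = beta0 + beta1*x_i + eps_i
```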


Example: Consider the model that regresses Oxygen purity on Hydrocarbon level in a distillation process, with

\[ \beta_0 = 75 \quad \text{and} \quad \beta_1 = 15 \]

For each $x_i$ there is a different Oxygen purity mean (which is the center of a normal distribution of Oxygen purity values).

Plugging $x_i$ into $(75 + 15x_i)$ gives the conditional mean at $x_i$.


The conditional mean for $x = 1$:

\[ E[Y \mid x] = 75 + 15 \cdot 1 = 90 \]

The conditional mean for $x = 1.25$:

\[ E[Y \mid x] = 75 + 15 \cdot 1.25 = 93.75 \]


The deviations of the observed values from their conditional mean are called errors.

The random error of observation $i$ is denoted $\varepsilon_i$. The errors around a conditional mean are normally distributed, centered at 0, and have a variance of $\sigma^2$, i.e. $\varepsilon_i \sim N(0, \sigma^2)$.

Here, we assume all the conditional distributions of the errors are the same, so we’re using a constant variance model.

\[ V[Y_i \mid x_i] = V(\beta_0 + \beta_1 x_i + \varepsilon_i) = V(\varepsilon_i) = \sigma^2 \]


• The model can also be written as:

\[ Y_i \mid x_i \sim N(\underbrace{\beta_0 + \beta_1 x_i}_{\text{conditional mean}},\ \sigma^2) \]

– mean of $Y$ given $x$ is $\beta_0 + \beta_1 x$ (known as the conditional mean)

– $\beta_0 + \beta_1 x_i$ is the mean value of all the $Y$’s for the given value of $x_i$

The regression line itself represents all the conditional means.

Not all the observed points will fall on the line; there is some random noise around the mean (we model this part with an error term).

Usually, we will not know $\beta_0$, $\beta_1$, or $\sigma^2$, so we will estimate them from the data.


• Some interpretation of parameters (illustrated below with the oxygen purity example):

– $\beta_0$ is the conditional mean when $x = 0$

– $\beta_1$ is the slope, also stated as the change in the mean of $Y$ per 1 unit change in $x$

– $\sigma^2$ is the variability of responses about the conditional mean
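For instance, in the oxygen purity model with $\beta_0 = 75$ and $\beta_1 = 15$: the mean purity at $x = 0$ would be 75, and each 1-unit increase in hydrocarbon level increases the mean purity by 15. (Interpreting $\beta_0$ literally assumes $x = 0$ lies within the range of the observed data.)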


Simple Linear Regression
Assumptions

• Key assumptions

– a linear relationship exists between $Y$ and $x$

∗ we say the relationship between $Y$ and $x$ is linear if the means of the conditional distributions of $Y \mid x$ lie on a straight line

– independent errors (this essentially equates to independent observations in the case of SLR)

– constant variance of errors

– normally distributed errors


Simple Linear Regression
Estimation

We wish to use the sample data to estimate the population parameters: the slope $\beta_1$ and the intercept $\beta_0$.

• Least squares estimation

– To choose the ‘best fitting line’ using least squares estimation, we minimize the sum of the squared vertical distances of each point to the fitted line.


– We let ‘hats’ denote predicted values or estimates of parameters, so we have:

\[ \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i \]

where $\hat{y}_i$ is the estimated conditional mean for $x_i$, $\hat\beta_0$ is the estimator for $\beta_0$, and $\hat\beta_1$ is the estimator for $\beta_1$.

– We wish to choose $\hat\beta_0$ and $\hat\beta_1$ such that we minimize the sum of the squared vertical distances of each point to the fitted line, i.e. minimize $\sum_{i=1}^n (y_i - \hat{y}_i)^2$

– Or, minimize the function $g$:

\[ g(\beta_0, \beta_1) = \sum_{i=1}^n \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2 \]


– This vertical distance of a point from the fitted line is called a residual. The residual for observation $i$ is denoted $e_i$, and

\[ e_i = y_i - \hat{y}_i \]

– So, in least squares estimation, we wish to minimize the sum of the squared residuals (or error sum of squares, SSE).

– To minimize

\[ g(\beta_0, \beta_1) = \sum_{i=1}^n \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2 \]

we take the derivative of $g$ with respect to $\beta_0$ and $\beta_1$, set each equal to zero, and solve:

\[ \frac{\partial g}{\partial \beta_0} = -2 \sum_{i=1}^n \left( y_i - (\hat\beta_0 + \hat\beta_1 x_i) \right) = 0 \]

\[ \frac{\partial g}{\partial \beta_1} = -2 \sum_{i=1}^n \left( y_i - (\hat\beta_0 + \hat\beta_1 x_i) \right) x_i = 0 \]


Simplifying the above gives:

\[ n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^n x_i = \sum_{i=1}^n y_i \]

\[ \hat\beta_0 \sum_{i=1}^n x_i + \hat\beta_1 \sum_{i=1}^n x_i^2 = \sum_{i=1}^n y_i x_i \]

These two equations are known as the least squares normal equations.

Solving the normal equations gets us our estimators $\hat\beta_0$ and $\hat\beta_1$...
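To sketch the algebra the notes skip: the first normal equation gives

\[ \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}, \]

and substituting this into the second equation and solving for $\hat\beta_1$ gives

\[ \hat\beta_1 = \frac{\sum_{i=1}^n x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^n x_i^2 - n \bar{x}^2} = \frac{S_{xy}}{S_{xx}} \]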


Simple Linear Regression
Estimation

– Estimate of the slope:

\[ \hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \]

– Estimate of the $Y$-intercept:

\[ \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \]

The point $(\bar{x}, \bar{y})$ will always be on the least squares line.

Alternative formulas for $\hat\beta_0$ and $\hat\beta_1$ are also given in the book.
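A minimal sketch implementing these formulas directly (the function name is ours, not from the notes):

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    sxy = np.sum((x - xbar) * (y - ybar))  # S_xy
    sxx = np.sum((x - xbar) ** 2)          # S_xx
    b1 = sxy / sxx                         # slope estimate
    b0 = ybar - b1 * xbar                  # intercept estimate
    return b0, b1
```

Applied to the raw data of the next example, this should reproduce the software estimates shown there.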


• Example: Cigarette data (Nicotine vs. Tar content)

[Figure: Scatterplot of Nic vs. Tar, $n = 25$]

Least squares estimates from software:

$\hat\beta_0 = 0.1309$ and $\hat\beta_1 = 0.0610$

Summary statistics:

\[ \sum_{i=1}^n x_i = 305.4, \quad \bar{x} = 12.216, \quad \sum_{i=1}^n y_i = 21.91, \quad \bar{y} = 0.8764 \]


\[ \sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x}) = 47.01844, \quad \sum_{i=1}^n (x_i - \bar{x})^2 = 770.4336 \]

\[ \sum_{i=1}^n x_i^2 = 4501.2, \quad \sum_{i=1}^n y_i^2 = 22.2105 \]

Using the previous formulas and the summary statistics...

\[ \hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{47.01844}{770.4336} = 0.061029 \]

and

\[ \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 0.8764 - 0.061029(12.216) = 0.130870 \]

(Same estimates as software)
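A quick arithmetic check of these estimates from the reported summary statistics:

```python
# Summary statistics reported above for the cigarette data
sxy, sxx = 47.01844, 770.4336
xbar, ybar = 12.216, 0.8764

b1 = sxy / sxx          # slope, about 0.061029
b0 = ybar - b1 * xbar   # intercept, about 0.1309
print(b0, b1)
```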


Simple Linear Regression
Estimating $\sigma^2$

• One of the assumptions of simple linear regression is that the variance for each of the conditional distributions of $Y \mid x$ is the same at all $x$-values (i.e. constant variance).

• In this case, it makes sense to pool all the observed error information (in the residuals) to come up with a common estimate for $\sigma^2$.


Recall the model:

\[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad \text{with} \quad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2) \]

– We use the error sum of squares (SSE) to estimate $\sigma^2$...

\[ \hat\sigma^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-2} = MSE \]

∗ $SSE = \sum_{i=1}^n (y_i - \hat{y}_i)^2$ is the error sum of squares

∗ $MSE$ is the mean squared error

∗ $E[MSE] = E[\hat\sigma^2] = \sigma^2$ (unbiased estimator)

∗ $\hat\sigma = \sqrt{\hat\sigma^2} = \sqrt{MSE}$


∗ ‘2’ is subtracted from $n$ in the denominator because we’ve used 2 degrees of freedom for estimating the slope and intercept (i.e. there were 2 parameters estimated when modeling the conditional mean)

∗ When we estimated $\sigma^2$ in a single normal population, we divided $\sum_{i=1}^n (y_i - \bar{y})^2$ by $(n-1)$ because we only estimated 1 mean-structure parameter, $\mu$; now we estimate two parameters for our mean structure, $\beta_0$ and $\beta_1$.
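A minimal sketch of this estimate in code (hypothetical helper name; `b0` and `b1` are the fitted intercept and slope):

```python
import numpy as np

def estimate_sigma2(x, y, b0, b1):
    """Return sigma^2-hat = SSE / (n - 2), the mean squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    resid = y - (b0 + b1 * x)   # residuals e_i = y_i - yhat_i
    sse = np.sum(resid ** 2)    # error sum of squares
    return sse / (len(y) - 2)   # n - 2: two mean parameters estimated
```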
