
Chapter 11: SIMPLE LINEAR REGRESSION AND CORRELATION
Part 1: Simple Linear Regression (SLR)
Introduction
Sections 11-1 and 11-2
Abrasion Loss vs. Hardness

Price of clock vs. Age of clock

[Scatterplot: Price Sold at Auction vs. Age of Clock (yrs), with points marked by number of Bidders]
1

• Regression is a method for studying the relationship between two or more quantitative variables.

• Simple linear regression (SLR):
– One quantitative dependent variable
∗ response variable
∗ dependent variable
∗ Y
– One quantitative independent variable
∗ explanatory variable
∗ predictor variable
∗ X

• Multiple linear regression:
– One quantitative dependent variable
– Many quantitative independent variables
– You’ll see this in STAT:3200/IE:3760 Applied Linear Regression, if you take it.
2

• SLR Examples:
– predict salary from years of experience
– estimate effect of lead exposure on school testing performance
– predict force at which a metal alloy rod bends based on iron content
3

• Example: Health data

Variables:
– Percent of Obese Individuals
– Percent of Active Individuals

Data from CDC. Units are regions of U.S. in 2014.
   PercentObesity  PercentActive
1  29.7            55.3
2  28.9            51.9
3  35.9            41.2
4  24.7            56.3
5  21.3            60.4
6  26.3            50.9
.
.
.
[Scatterplot: Percent Obese vs. Percent Active]
4

A scatterplot or scatter diagram can give us a general idea of the relationship between obesity and activity...
[Scatterplot: Percent Obese vs. Percent Active]
The points are plotted as the pairs (xi, yi) for i = 1, . . . , 25.

Inspection suggests a linear relationship between obesity and activity (i.e. a straight line would go through the bulk of the points, and the points would look randomly scattered around this line).
5

Simple Linear Regression
The Model
• The basic model
Yi = β0 + β1xi + εi
– Yi is the observed response or dependent variable for observation i
– xi is the observed predictor, regressor, explanatory variable, independent variable, covariate
– εi is the error term
– εi are iid N(0, σ²)
(iid means independent and identically distributed)
6

– So, E[Yi | xi] = β0 + β1xi + 0 = β0 + β1xi

The conditional mean (i.e. the expected value of Yi given xi, or after conditioning on xi) is “β0 + β1xi” (a point on the regression line).

– Or, as another notation, E[Y | x] = µ_{Y|x}

– The random scatter around the mean (i.e. around the line) follows a N(0, σ²) distribution.
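The model above can be simulated directly. A minimal sketch, where the parameter values (β0 = 2, β1 = 0.5, σ = 1) and the data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, chosen only for illustration
beta0, beta1, sigma = 2.0, 0.5, 1.0

n = 100
x = rng.uniform(0, 10, size=n)         # predictor values
eps = rng.normal(0, sigma, size=n)     # iid N(0, sigma^2) errors
y = beta0 + beta1 * x + eps            # simulated responses

# The conditional mean of Y given x is the line beta0 + beta1*x;
# the simulated points scatter around it with variance sigma^2.
cond_mean = beta0 + beta1 * x
scatter = y - cond_mean                # these are exactly the eps values
print(scatter.mean(), scatter.std())   # near 0 and near sigma = 1
```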
7

Example: Consider the model that regresses Oxygen purity on Hydrocarbon level in a distillation process with...

β0 = 75 and β1 = 15

For each xi there is a different Oxygen purity mean (which is the center of a normal distribution of Oxygen purity values).

Plugging xi into (75 + 15xi) gives you the conditional mean at xi.
8

The conditional mean for x = 1:

E[Y | x] = 75 + 15 · 1 = 90

The conditional mean for x = 1.25:

E[Y | x] = 75 + 15 · 1.25 = 93.75
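These two conditional means can be computed with a small helper; the function name `conditional_mean` is mine, and the parameter values are the ones from the example (β0 = 75, β1 = 15):

```python
def conditional_mean(x, beta0=75.0, beta1=15.0):
    """E[Y | x] = beta0 + beta1 * x for the oxygen-purity example."""
    return beta0 + beta1 * x

print(conditional_mean(1.0))    # 90.0
print(conditional_mean(1.25))   # 93.75
```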
9

These values that randomly scatter around a conditional mean are called errors.

The random error of observation i is denoted εi. The errors around a conditional mean are normally distributed, centered at 0, and have a variance of σ², i.e. εi ∼ N(0, σ²).

Here, we assume all the conditional distributions of the errors are the same, so we’re using a constant variance model.

V[Yi | xi] = V(β0 + β1xi + εi) = V(εi) = σ²
10

• The model can also be written as:

Yi | xi ∼ N(β0 + β1xi, σ²)

where β0 + β1xi is the conditional mean.

– mean of Y given x is β0 + β1x (known as the conditional mean)
– β0 + β1xi is the mean value of all the Y’s for the given value of xi

The regression line itself represents all the conditional means.

Not all the observed points will fall on the line; there is some random noise around the mean (we model this part with an error term).
Usually, we will not know β0, β1, or σ², so we will estimate them from the data.

11

• Some interpretation of parameters:

– β0 is the conditional mean when x = 0
– β1 is the slope, also stated as the change in the mean of Y per 1-unit change in x
– σ² is the variability of responses about the conditional mean
12

Simple Linear Regression
Assumptions
• Key assumptions
– a linear relationship exists between Y and x
∗ we say the relationship between Y and x is linear if the means of the conditional distributions of Y | x lie on a straight line
– independent errors (this essentially equates to independent observations in the case of SLR)
– constant variance of errors
– normally distributed errors
13

Simple Linear Regression
Estimation

We wish to use the sample data to estimate the population parameters: the slope β1 and the intercept β0.

• Least squares estimation

– To choose the ‘best fitting line’ using least squares estimation, we minimize the sum of the squared vertical distances of each point to the fitted line.
14

– We let ‘hats’ denote predicted values or estimates of parameters, so we have:

ŷi = β̂0 + β̂1xi

where ŷi is the estimated conditional mean for xi, β̂0 is the estimator for β0, and β̂1 is the estimator for β1.

– We wish to choose β̂0 and β̂1 such that we minimize the sum of the squared vertical distances of each point to the fitted line, i.e. minimize

∑_{i=1}^n (yi − ŷi)²

– Or minimize the function g:

g(β̂0, β̂1) = ∑_{i=1}^n (yi − ŷi)² = ∑_{i=1}^n (yi − (β̂0 + β̂1xi))²

15

– This vertical distance of a point from the fitted line is called a residual. The residual for observation i is denoted ei and

ei = yi − ŷi

– So, in least squares estimation, we wish to minimize the sum of the squared residuals (or error sum of squares, SSE).

– To minimize

g(β̂0, β̂1) = ∑_{i=1}^n (yi − (β̂0 + β̂1xi))²

we take the derivative of g with respect to β̂0 and β̂1, set each equal to zero, and solve:

∂g/∂β̂0 = −2 ∑_{i=1}^n (yi − (β̂0 + β̂1xi)) = 0

∂g/∂β̂1 = −2 ∑_{i=1}^n (yi − (β̂0 + β̂1xi)) xi = 0
16

Simplifying the above gives:

n β̂0 + β̂1 ∑_{i=1}^n xi = ∑_{i=1}^n yi

β̂0 ∑_{i=1}^n xi + β̂1 ∑_{i=1}^n xi² = ∑_{i=1}^n yi xi

And these two equations are known as the least squares normal equations.

Solving the normal equations gets us our estimators β̂0 and β̂1...
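As a numerical sanity check, the least squares estimates should satisfy both normal equations to machine precision. A sketch with a small made-up data set:

```python
import numpy as np

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Least squares estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# First normal equation:  n*b0 + b1*sum(xi)        == sum(yi)
eq1 = n * b0 + b1 * x.sum() - y.sum()
# Second normal equation: b0*sum(xi) + b1*sum(xi^2) == sum(yi*xi)
eq2 = b0 * x.sum() + b1 * np.sum(x ** 2) - np.sum(x * y)
print(eq1, eq2)   # both essentially zero
```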
17

Simple Linear Regression
Estimation

– Estimate of the slope:

β̂1 = ∑_{i=1}^n (xi − x̄)(yi − ȳ) / ∑_{i=1}^n (xi − x̄)² = Sxy/Sxx

– Estimate of the Y-intercept:

β̂0 = ȳ − β̂1x̄

The point (x̄, ȳ) will always be on the least squares line.

Alternative formulas for β̂0 and β̂1 are also given in the book.
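A minimal sketch of these estimation formulas in Python; the function name `least_squares` and the five-point data set are made up for illustration:

```python
import numpy as np

def least_squares(x, y):
    """Return (b0_hat, b1_hat): SLR least squares estimates.

    b1_hat = Sxy / Sxx, b0_hat = ybar - b1_hat * xbar.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = sxy / sxx
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Made-up data for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = least_squares(x, y)

# The point (xbar, ybar) always lies on the fitted line
assert abs((b0 + b1 * np.mean(x)) - np.mean(y)) < 1e-9
```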
18

• Example: Cigarette data (Nicotine vs. Tar content)

[Scatterplot: Nicotine (Nic) vs. Tar content, n = 25 cigarette brands]

Least squares estimates from software:

β̂0 = 0.1309 and β̂1 = 0.0610

Summary statistics:

∑_{i=1}^n xi = 305.4, x̄ = 12.216
∑_{i=1}^n yi = 21.91, ȳ = 0.8764
19

∑_{i=1}^n (yi − ȳ)(xi − x̄) = 47.01844
∑_{i=1}^n (xi − x̄)² = 770.4336
∑_{i=1}^n xi² = 4501.2
∑_{i=1}^n yi² = 22.2105

Using the previous formulas and the summary statistics...

β̂1 = Sxy/Sxx = 47.01844/770.4336 = 0.061029

and

β̂0 = ȳ − β̂1x̄ = 0.8764 − 0.061029(12.216) = 0.130870
(Same estimates as software)
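The hand computation above can be reproduced directly from the stated summary statistics:

```python
# Summary statistics for the cigarette data (from the slides)
sxy = 47.01844            # sum of (yi - ybar)(xi - xbar)
sxx = 770.4336            # sum of (xi - xbar)^2
xbar, ybar = 12.216, 0.8764

b1 = sxy / sxx            # slope estimate
b0 = ybar - b1 * xbar     # intercept estimate
print(round(b1, 4), round(b0, 4))   # 0.061 0.1309
```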
20

Simple Linear Regression
Estimating σ²

• One of the assumptions of simple linear regression is that the variance of each of the conditional distributions of Y | x is the same at all x-values (i.e. constant variance).

• In this case, it makes sense to pool all the observed error information (in the residuals) to come up with a common estimate for σ².
21

Recall the model:

Yi = β0 + β1xi + εi with εi iid∼ N(0, σ²)

– We use the error sum of squares (SSE) to estimate σ²...

σ̂² = SSE/(n − 2) = ∑_{i=1}^n (yi − ŷi)² / (n − 2) = MSE

∗ SSE = error sum of squares = ∑_{i=1}^n (yi − ŷi)²
∗ MSE is the mean squared error
∗ E[MSE] = E[σ̂²] = σ² (unbiased estimator)
∗ σ̂ = √σ̂² = √MSE
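A short sketch of estimating σ² via MSE; the five-point data set is made up for illustration:

```python
import numpy as np

# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)      # residuals e_i = y_i - yhat_i
sse = np.sum(resid ** 2)       # error sum of squares
mse = sse / (len(x) - 2)       # sigma_hat^2 = SSE / (n - 2)
sigma_hat = np.sqrt(mse)       # estimate of sigma
print(sse, mse, sigma_hat)
```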
22

∗ ‘2’ is subtracted from n in the denominator because we’ve used 2 degrees of freedom for estimating the slope and intercept (i.e. there were 2 parameters estimated when modeling the conditional mean).
∗ When we estimated σ² in a single normal population, we divided ∑_{i=1}^n (yi − ȳ)² by (n − 1) because we only estimated 1 mean-structure parameter, which was µ; now we estimate two parameters for our mean structure, β0 and β1.
23