TRANSCRIPT
Applied Regression
Chapter 2 Simple Linear Regression
Hongcheng Li
April 6, 2013
Outline
1 Introduction of simple linear regression
2 Scatter plot
3 Simple linear regression model
4 Test of Hypothesis
5 Prediction
Introduction of simple linear regression
Linearity
Use a scatter plot OR the correlation coefficient. Before applying linear regression, make sure that X and Y have a linear relationship.
Figure: Scatter Plot to identify Linear Relationship (Minutes versus Units)
Covariance and correlation coefficients
Covariance and correlation coefficient I
1 Sample mean
ȳ = ∑_{i=1}^n y_i / n,  x̄ = ∑_{i=1}^n x_i / n
Covariance and correlation coefficient II
2 Sample S.D.
s_x = √( ∑_{i=1}^n (x_i − x̄)² / (n − 1) ),  s_y = √( ∑_{i=1}^n (y_i − ȳ)² / (n − 1) )
Covariance and correlation coefficient III
3 Covariance
Cov(X, Y) = ∑_{i=1}^n (y_i − ȳ)(x_i − x̄) / (n − 1)
4 Cov(Y, X) only tells us the direction of the linear relationship between X and Y. It does not indicate the strength of such a relationship. Its magnitude is affected by changes in the units of measurement.
Matrix of correlation coefficients
Correlation Table
Figure: Covariance and Coefficients of Correlation
Correlation coefficients I
Cor(X, Y) = (1/(n − 1)) ∑_{i=1}^n ((y_i − ȳ)/s_y)((x_i − x̄)/s_x)
= Cov(Y, X) / (s_y s_x)
= ∑_{i=1}^n (y_i − ȳ)(x_i − x̄) / √( ∑_{i=1}^n (y_i − ȳ)² · ∑_{i=1}^n (x_i − x̄)² )
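The quantities above can be computed directly from their definitions. A minimal sketch in Python, on toy data (the numbers and variable names are illustrative, not from the textbook):

```python
import math

# Toy data (illustrative, not the textbook's computer repair data).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations (divide by n - 1).
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample covariance.
cov_xy = sum((xi - x_bar) * (yi - y_bar)
             for xi, yi in zip(x, y)) / (n - 1)

# Correlation coefficient: Cov(Y, X) / (s_y * s_x).
r = cov_xy / (s_y * s_x)
```

For this nearly linear toy data, r comes out close to 1, and it is always bounded by ±1.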
Properties
1 −1 ≤ Cor(X ,Y ) ≤ 1
2 What does it mean if Cor(X ,Y ) = 0?
3 It does not necessarily mean there is no relationship; check the scatter plot!
4 The correlation coefficient can be substantially influenced by one or a few outliers in the data (see Anscombe's Quartet, p. 24).
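Point 4 is easy to see numerically. A small sketch (toy numbers, not Anscombe's actual quartet) showing how a single outlier changes the correlation:

```python
import math

def cor(x, y):
    # Pearson correlation coefficient from the definition.
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Perfectly linear toy data: correlation is 1.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = cor(x, y)

# Appending a single outlier point (6, 0) weakens it drastically.
r_outlier = cor(x + [6], y + [0])
```

One point is enough to pull the correlation from 1 down to well under 0.2, which is why the scatter plot should always be inspected.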
Nonlinear relationship
Nonlinear relationship I
Figure: Scatter Plot to identify Linear Relationship
Scatter plot
Computer repair data
Example 2.3 (p. 26): Computer repair data
1 The relationship between the length of service calls and the number of electronic components in the computer that must be repaired or replaced.
2 scatter plot
3 Cor(Y, X) = 0.996
Simple linear regression model
The simple linear regression model I
1 The relationship between a response variable Y and a predictor variable X is postulated as a linear model.
Y = β0 + β1X + ε
Each observation in the data set can be written as
yi = β0 + β1xi + εi , i = 1, 2, · · · , n.
The simple linear regression model II
2 The relationship between the length of service calls and the number of electronic components in the computer that must be repaired or replaced.
Minutes = β0 + β1 · Units + ε
Parameter estimation
Parameter estimation I
1 The least squares method gives the line that minimizes the sum of squares of the vertical distances from each point to the line. The vertical distances are the errors in the response variable, as follows:
εi = yi − (β0 + β1xi), i = 1, 2, · · · , n.
Parameter estimation II
2 The sum of squares of vertical distances
S(β0, β1) = ∑_{i=1}^n ε_i² = ∑_{i=1}^n (y_i − (β0 + β1 x_i))²
Formula I
1
β̂1 = ∑_{i=1}^n (y_i − ȳ)(x_i − x̄) / ∑_{i=1}^n (x_i − x̄)²
= Cov(Y, X) / Var(X)
= Cor(Y, X) · (s_y / s_x)
β̂0 = ȳ − β̂1 x̄
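The closed-form estimates are easy to verify on data that lies exactly on a line. A minimal sketch in Python (toy data; the names are illustrative):

```python
# Toy data lying exactly on y = 1 + 2x (names are illustrative).
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Closed-form least squares estimates.
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
        / sum((xi - x_bar) ** 2 for xi in x)
beta0 = y_bar - beta1 * x_bar
# beta1 is 2.0 and beta0 is 1.0, recovering the true line.
```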
Formula II
2 Fitted values
ŷ_i = β̂0 + β̂1 x_i, i = 1, 2, · · · , n.
Test of Hypothesis
Test of Hypothesis I
The linear relationship between Y and X can be checked by:
1 Scatter plot
2 Correlation coefficients
3 A formal way of measuring the usefulness of X as a predictor of Y is to test the hypothesis β1 = 0
Assumptions to perform the test of the hypotheses I
1 For every fixed X, the ε's are assumed to be independent random quantities, normally distributed with mean zero and a common variance σ²
2 With the above assumption, β̂0 and β̂1 are unbiased estimators of β0 and β1
Assumptions to perform the test of the hypotheses II
3 Their variances are (see P33-37):
Var(β̂0) = σ² [ 1/n + x̄² / ∑(x_i − x̄)² ]  (Var-0)
And
Var(β̂1) = σ² / ∑(x_i − x̄)²  (Var-1)
The sampling distributions of the least squares estimates β̂0 and β̂1 are normal, with means β0 and β1 and variances as given in (Var-0) and (Var-1), respectively.
Assumptions to perform the test of the hypotheses III
4 An unbiased estimator of σ², the variance of the ε's, is:
σ̂² = ∑ e_i² / (n − 2) = SSE / (n − 2)
Standard deviation and standard error (s.e.) I
1 An estimate of the standard deviation of an estimator is called the standard error of the estimator.
2 The s.e. of β̂0 and β̂1: see p. 33
Testing H0 : β1 = β1⁰
1 Test statistic
t1 = (β̂1 − β1⁰) / s.e.(β̂1)
2 (p. 35) H0 is rejected at the significance level α if
|t1| ≥ t(n−2, α/2)
OR
p(|t(n−2)| ≥ |t1|) ≤ α
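Putting the estimates, σ̂², and (Var-1) together, a sketch of the test of H0: β1 = 0 on toy data. The data, the variable names, and the critical value t(3, 0.025) ≈ 3.182 (read from a t table) are illustrative assumptions, not from the textbook:

```python
import math

# Toy data (illustrative).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates.
beta1 = Sxy / Sxx
beta0 = y_bar - beta1 * x_bar

# Residuals and the unbiased estimate of sigma^2: SSE / (n - 2).
resid = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in resid)
sigma2_hat = sse / (n - 2)

# Standard error of beta1-hat: sqrt(sigma^2 / Sxx), from (Var-1).
se_beta1 = math.sqrt(sigma2_hat / Sxx)

# Test statistic for H0: beta1 = 0.
t1 = (beta1 - 0) / se_beta1

# Two-sided test at alpha = 0.05 with n - 2 = 3 df: t(3, 0.025) ~ 3.182.
reject_h0 = abs(t1) >= 3.182
```

For this strongly linear toy data, |t1| is far above the critical value, so H0: β1 = 0 is rejected.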
Testing H0 : β0 = β0⁰ I
1 Test statistic
t0 = (β̂0 − β0⁰) / s.e.(β̂0)
2 (p. 35) H0 is rejected at the significance level α if
|t0| ≥ t(n−2, α/2)
OR
p(|t(n−2)| ≥ |t0|) ≤ α
Testing H0 : ρ = 0 I
1 Test statistic
t1 = Cor(Y, X) · √(n − 2) / √(1 − [Cor(Y, X)]²)
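In simple linear regression this statistic is algebraically identical to the t statistic for H0: β1 = 0, which gives a quick sanity check. A sketch on toy data (illustrative numbers and names):

```python
import math

# Toy data (illustrative).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Syy = sum((yi - y_bar) ** 2 for yi in y)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation test statistic: r * sqrt(n - 2) / sqrt(1 - r^2).
r = Sxy / math.sqrt(Sxx * Syy)
t_cor = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Same statistic via the slope test for H0: beta1 = 0.
beta1 = Sxy / Sxx
beta0 = y_bar - beta1 * x_bar
sse = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
t_slope = beta1 / math.sqrt((sse / (n - 2)) / Sxx)
```

The two statistics agree up to floating-point rounding, so testing ρ = 0 and testing β1 = 0 are equivalent here.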
Confidence intervals of β0 and β1 I
1
β̂0 ± t(n−2, α/2) × s.e.(β̂0)
2
β̂1 ± t(n−2, α/2) × s.e.(β̂1)
Prediction
Prediction I
1 (p. 38) To predict the value of the response variable Y which corresponds to any chosen value x0 of the predictor variable, the predicted value ŷ0 is
ŷ0 = β̂0 + β̂1 x0.
The standard error of this prediction is
s.e.(ŷ0) = σ̂ √( 1 + 1/n + (x0 − x̄)² / ∑(x_i − x̄)² )
Prediction II
The confidence limits for the predicted value, with the level of confidence (1 − α), are given by
ŷ0 ± t(n−2, α/2) × s.e.(ŷ0)
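These limits can be sketched numerically. The toy data, names, and critical value t(3, 0.025) ≈ 3.182 (from a t table) are illustrative assumptions:

```python
import math

# Toy data (illustrative).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1 = Sxy / Sxx
beta0 = y_bar - beta1 * x_bar
sse = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
sigma_hat = math.sqrt(sse / (n - 2))

# Predicted value at a chosen x0.
x0 = 4.0
y0_hat = beta0 + beta1 * x0

# s.e. of an individual prediction: note the leading 1 under the root.
se_pred = sigma_hat * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / Sxx)

t_crit = 3.182  # t(3, alpha/2) for alpha = 0.05, from a t table
lower = y0_hat - t_crit * se_pred
upper = y0_hat + t_crit * se_pred
```

The leading 1 under the root makes se_pred larger than σ̂ itself: an individual observation is harder to pin down than the mean response.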
Prediction III
2 The estimation of the mean response µ0 when X = x0:
µ̂0 = β̂0 + β̂1 x0
Prediction IV
3 Care should be taken in employing fitted regression lines for prediction far outside the range of observations.
Measuring the quality of fit I
1 Test of hypotheses
2 scatter plot of Y versus X
3 scatter plot of Y versus Ŷ
4 Coefficient of determination R2
SST = SSR + SSE
R² = SSR / SST = 1 − SSE / SST = [Cor(Y, X)]² = [Cor(Y, Ŷ)]²
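The identity R² = 1 − SSE/SST = [Cor(Y, X)]² can be checked numerically. A sketch on toy data (illustrative numbers and names):

```python
import math

# Toy data (illustrative).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Syy = sum((yi - y_bar) ** 2 for yi in y)   # this is SST
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares fit and SSE.
beta1 = Sxy / Sxx
beta0 = y_bar - beta1 * x_bar
sse = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))

# Coefficient of determination, two ways.
r2 = 1 - sse / Syy
r = Sxy / math.sqrt(Sxx * Syy)
```

Up to floating-point rounding, r2 equals r squared, as the identity above asserts.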
Regression through the origin
1 The so-called no-intercept model
Y = β1X + ε (no-intercept)
2 The R² value obtained from the (no-intercept) model is not strictly comparable with the R² value obtained from the (intercept) model. Compared with the (intercept) model, the R² from the (no-intercept) model may even be negative.
Y = β0 + β1X + ε (intercept)
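The warning about a negative R² can be demonstrated. A sketch under an illustrative assumption: Y is nearly constant with a large intercept, the line is forced through the origin, and R² is computed as 1 − SSE/SST with the centered SST (toy data and names):

```python
# Toy data: nearly constant y, i.e. large intercept and tiny slope.
x = [1, 2, 3, 4, 5]
y = [9.9, 9.8, 9.7, 9.6, 9.5]
n = len(x)

# No-intercept least squares fit: beta1 = sum(x*y) / sum(x^2).
beta1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# Residual sum of squares for the forced-through-origin line.
sse = sum((yi - beta1 * xi) ** 2 for xi, yi in zip(x, y))

# Centered total sum of squares, as in the intercept model.
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)

r2 = 1 - sse / sst   # can be (very) negative for the no-intercept fit
```

Because the true intercept is large, the origin-constrained line fits far worse than the mean of y alone, so 1 − SSE/SST drops below zero.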
H.W. I
1 P45 2.3 2.4 2.6 2.7 2.9
2 2.10 ∼ 2.12