TRANSCRIPT
Applied Regression
Chapter 2 Simple Linear Regression
Hongcheng Li
April 6, 2013
Outline
1 Introduction of simple linear regression
2 Scatter plot
3 Simple linear regression model
4 Test of Hypothesis
5 Prediction
Introduction of simple linear regression
Linearity
Use a scatter plot OR the correlation coefficient. Before applying linear regression, make sure that X and Y have a linear relationship.
Figure: Scatter Plot to identify Linear Relationship (Minutes versus Units)
Covariance and correlation coefficients
Covariance and correlation coefficient I
1 Sample mean
ȳ = ∑_{i=1}^n y_i / n,  x̄ = ∑_{i=1}^n x_i / n
Covariance and correlation coefficient II
2 Sample S.D.
s_x = √( ∑_{i=1}^n (x_i − x̄)² / (n − 1) ),  s_y = √( ∑_{i=1}^n (y_i − ȳ)² / (n − 1) )
Covariance and correlation coefficient III
3 Covariance
Cov(X, Y) = ∑_{i=1}^n (y_i − ȳ)(x_i − x̄) / (n − 1)
4 Cov(Y, X) only tells us the direction of the linear relationship between X and Y. It does not indicate the strength of such a relationship. Its magnitude is affected by changes in the units of measurement.
Matrix of correlation coefficients
Correlation Table
Figure: Covariance and Coefficients of Correlation
Correlation coefficients I
Cor(X, Y) = (1/(n − 1)) ∑_{i=1}^n ((y_i − ȳ)/s_y)((x_i − x̄)/s_x)
= Cov(Y, X) / (s_y s_x)
= ∑_{i=1}^n (y_i − ȳ)(x_i − x̄) / √( ∑_{i=1}^n (y_i − ȳ)² · ∑_{i=1}^n (x_i − x̄)² )
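The quantities above can be computed directly from their definitions. A minimal sketch in Python, on toy data (the numbers and variable names are illustrative, not from the textbook):

```python
import math

# Toy data (illustrative, not the textbook's computer repair data).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations (divide by n - 1).
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample covariance.
cov_xy = sum((xi - x_bar) * (yi - y_bar)
             for xi, yi in zip(x, y)) / (n - 1)

# Correlation coefficient: Cov(Y, X) / (s_y * s_x).
r = cov_xy / (s_y * s_x)
```

For this nearly linear toy data, r comes out close to 1, and it is always bounded by ±1.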
Properties
1 −1 ≤ Cor(X ,Y ) ≤ 1
2 What does it mean if Cor(X ,Y ) = 0?
3 It does not necessarily mean there is no relationship; check the scatter plot!
4 The correlation coefficient can be substantially influenced by one or a few outliers in the data (see Anscombe's Quartet, p. 24).
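Point 4 is easy to see numerically. A small sketch (toy numbers, not Anscombe's actual quartet) showing how a single outlier changes the correlation:

```python
import math

def cor(x, y):
    # Pearson correlation coefficient from the definition.
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Perfectly linear toy data: correlation is 1.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = cor(x, y)

# Appending a single outlier point (6, 0) weakens it drastically.
r_outlier = cor(x + [6], y + [0])
```

One point is enough to pull the correlation from 1 down to well under 0.2, which is why the scatter plot should always be inspected.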
Nonlinear relationship
Nonlinear relationship I
Figure: Scatter Plot to identify Linear Relationship
Scatter plot
Computer repair data
Example 2.3 (p. 26): Computer repair data
1 The relationship between the length of service calls and the number of electronic components in the computer that must be repaired or replaced.
2 scatter plot
3 Cor(Y, X) = 0.996
Simple linear regression model
The simple linear regression model I
1 The relationship between a response variable Y and a predictor variable X is postulated as a linear model.
Y = β0 + β1X + ε
Each observation in the data set can be written as
yi = β0 + β1xi + εi , i = 1, 2, · · · , n.
The simple linear regression model II
2 The relationship between the length of service calls and the number of electronic components in the computer that must be repaired or replaced.
Minutes = β0 + β1 · Units + ε
Parameter estimation
Parameter estimation I
1 The least squares method gives the line that minimizes the sum of squares of the vertical distances from each point to the line. The vertical distances are the errors in the response variable, as follows:
εi = yi − (β0 + β1xi), i = 1, 2, · · · , n.
Parameter estimation II
2 The sum of squares of vertical distances
S(β0, β1) = ∑_{i=1}^n ε_i² = ∑_{i=1}^n (y_i − (β0 + β1 x_i))²
Formula I
1
β̂1 = ∑_{i=1}^n (y_i − ȳ)(x_i − x̄) / ∑_{i=1}^n (x_i − x̄)²
= Cov(Y, X) / Var(X)
= Cor(Y, X) · (s_y / s_x)
β̂0 = ȳ − β̂1 x̄
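The closed-form estimates are easy to verify on data that lies exactly on a line. A minimal sketch in Python (toy data; the names are illustrative):

```python
# Toy data lying exactly on y = 1 + 2x (names are illustrative).
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Closed-form least squares estimates.
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
        / sum((xi - x_bar) ** 2 for xi in x)
beta0 = y_bar - beta1 * x_bar
# beta1 is 2.0 and beta0 is 1.0, recovering the true line.
```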
Formula II
2 Fitted values
ŷ_i = β̂0 + β̂1 x_i, i = 1, 2, · · · , n.
Test of Hypothesis
Test of Hypothesis I
The linear relationship between Y and X can be checked by:
1 Scatter plot
2 Correlation coefficients
3 A formal way of measuring the usefulness of X as a predictor of Y is to test the hypothesis β1 = 0
Assumptions to perform the test of the hypotheses I
1 For every fixed X, the ε's are assumed to be independent random quantities, normally distributed with mean zero and a common variance σ²
2 With the above assumption, β̂0 and β̂1 are unbiased estimators of β0 and β1
Assumptions to perform the test of the hypotheses II
3 Their variances are (see P33-37):
Var(β̂0) = σ² [ 1/n + x̄² / ∑(x_i − x̄)² ]  (Var-0)
And
Var(β̂1) = σ² / ∑(x_i − x̄)²  (Var-1)
The sampling distributions of the least squares estimates β̂0 and β̂1 are normal, with means β0 and β1 and variances as given in (Var-0) and (Var-1), respectively.
Assumptions to perform the test of the hypotheses III
4 An unbiased estimator of σ², the variance of the ε's, is:
σ̂² = ∑ e_i² / (n − 2) = SSE / (n − 2)
Standard deviation and standard error (s.e.) I
1 An estimate of the standard deviation of an estimator is called the standard error of the estimator.
2 The s.e. of β̂0 and β̂1: see p. 33
Testing H0 : β1 = β1⁰
1 Test statistic
t1 = (β̂1 − β1⁰) / s.e.(β̂1)
2 (p. 35) H0 is rejected at the significance level α if
|t1| ≥ t(n−2, α/2)
OR
p(|t(n−2)| ≥ |t1|) ≤ α
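Putting the estimates, σ̂², and (Var-1) together, a sketch of the test of H0: β1 = 0 on toy data. The data, the variable names, and the critical value t(3, 0.025) ≈ 3.182 (read from a t table) are illustrative assumptions, not from the textbook:

```python
import math

# Toy data (illustrative).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates.
beta1 = Sxy / Sxx
beta0 = y_bar - beta1 * x_bar

# Residuals and the unbiased estimate of sigma^2: SSE / (n - 2).
resid = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in resid)
sigma2_hat = sse / (n - 2)

# Standard error of beta1-hat: sqrt(sigma^2 / Sxx), from (Var-1).
se_beta1 = math.sqrt(sigma2_hat / Sxx)

# Test statistic for H0: beta1 = 0.
t1 = (beta1 - 0) / se_beta1

# Two-sided test at alpha = 0.05 with n - 2 = 3 df: t(3, 0.025) ~ 3.182.
reject_h0 = abs(t1) >= 3.182
```

For this strongly linear toy data, |t1| is far above the critical value, so H0: β1 = 0 is rejected.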
Testing H0 : β0 = β0⁰ I
1 Test statistic
t0 = (β̂0 − β0⁰) / s.e.(β̂0)
2 (p. 35) H0 is rejected at the significance level α if
|t0| ≥ t(n−2, α/2)
OR
p(|t(n−2)| ≥ |t0|) ≤ α
Testing H0 : ρ = 0 I
1 Test statistic
t1 = Cor(Y, X) · √(n − 2) / √(1 − [Cor(Y, X)]²)
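In simple linear regression this statistic is algebraically identical to the t statistic for H0: β1 = 0, which gives a quick sanity check. A sketch on toy data (illustrative numbers and names):

```python
import math

# Toy data (illustrative).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Syy = sum((yi - y_bar) ** 2 for yi in y)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation test statistic: r * sqrt(n - 2) / sqrt(1 - r^2).
r = Sxy / math.sqrt(Sxx * Syy)
t_cor = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Same statistic via the slope test for H0: beta1 = 0.
beta1 = Sxy / Sxx
beta0 = y_bar - beta1 * x_bar
sse = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
t_slope = beta1 / math.sqrt((sse / (n - 2)) / Sxx)
```

The two statistics agree up to floating-point rounding, so testing ρ = 0 and testing β1 = 0 are equivalent here.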
Confidence intervals of β0 and β1 I
1
β̂0 ± t(n−2, α/2) × s.e.(β̂0)
2
β̂1 ± t(n−2, α/2) × s.e.(β̂1)
Prediction
Prediction I
1 (p. 38) To predict the value of the response variable Y which corresponds to any chosen value x0 of the predictor variable, the predicted value ŷ0 is
ŷ0 = β̂0 + β̂1 x0.
The standard error of this prediction is
s.e.(ŷ0) = σ̂ √( 1 + 1/n + (x0 − x̄)² / ∑(x_i − x̄)² )
Prediction II
The confidence limits for the predicted value, with the level of confidence (1 − α), are given by
ŷ0 ± t(n−2, α/2) × s.e.(ŷ0)
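These limits can be sketched numerically. The toy data, names, and critical value t(3, 0.025) ≈ 3.182 (from a t table) are illustrative assumptions:

```python
import math

# Toy data (illustrative).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1 = Sxy / Sxx
beta0 = y_bar - beta1 * x_bar
sse = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
sigma_hat = math.sqrt(sse / (n - 2))

# Predicted value at a chosen x0.
x0 = 4.0
y0_hat = beta0 + beta1 * x0

# s.e. of an individual prediction: note the leading 1 under the root.
se_pred = sigma_hat * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / Sxx)

t_crit = 3.182  # t(3, alpha/2) for alpha = 0.05, from a t table
lower = y0_hat - t_crit * se_pred
upper = y0_hat + t_crit * se_pred
```

The leading 1 under the root makes se_pred larger than σ̂ itself: an individual observation is harder to pin down than the mean response.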
Prediction III
2 The estimation of the mean response µ0 when X = x0:
µ̂0 = β̂0 + β̂1 x0
Prediction IV
3 Care should be taken in employing fitted regression lines for prediction far outside the range of observations.
Measuring the quality of fit I
1 Test of hypotheses
2 scatter plot of Y versus X
3 scatter plot of Y versus Ŷ
4 Coefficient of determination R2
SST = SSR + SSE
R² = SSR / SST = 1 − SSE / SST = [Cor(Y, X)]² = [Cor(Y, Ŷ)]²
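The identity R² = 1 − SSE/SST = [Cor(Y, X)]² can be checked numerically. A sketch on toy data (illustrative numbers and names):

```python
import math

# Toy data (illustrative).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Syy = sum((yi - y_bar) ** 2 for yi in y)   # this is SST
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares fit and SSE.
beta1 = Sxy / Sxx
beta0 = y_bar - beta1 * x_bar
sse = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))

# Coefficient of determination, two ways.
r2 = 1 - sse / Syy
r = Sxy / math.sqrt(Sxx * Syy)
```

Up to floating-point rounding, r2 equals r squared, as the identity above asserts.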
Regression through the origin
1 The so-called no-intercept model
Y = β1X + ε (no-intercept)
2 The R² value obtained from the (no-intercept) model is not strictly comparable with the R² value obtained from the (intercept) model. Compared with the (intercept) model, the R² from the (no-intercept) model may even be negative.
Y = β0 + β1X + ε (intercept)
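The warning about a negative R² can be demonstrated. A sketch under an illustrative assumption: Y is nearly constant with a large intercept, the line is forced through the origin, and R² is computed as 1 − SSE/SST with the centered SST (toy data and names):

```python
# Toy data: nearly constant y, i.e. large intercept and tiny slope.
x = [1, 2, 3, 4, 5]
y = [9.9, 9.8, 9.7, 9.6, 9.5]
n = len(x)

# No-intercept least squares fit: beta1 = sum(x*y) / sum(x^2).
beta1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# Residual sum of squares for the forced-through-origin line.
sse = sum((yi - beta1 * xi) ** 2 for xi, yi in zip(x, y))

# Centered total sum of squares, as in the intercept model.
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)

r2 = 1 - sse / sst   # can be (very) negative for the no-intercept fit
```

Because the true intercept is large, the origin-constrained line fits far worse than the mean of y alone, so 1 − SSE/SST drops below zero.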
H.W. I
1 P45 2.3 2.4 2.6 2.7 2.9
2 2.10 ∼ 2.12