Chapter 6 Simple Regression

Upload: mulan

Post on 07-Jan-2016

6.1 - Introduction

Fundamental questions
– Is there a relationship between two random variables, and how strong is it?
– Can we predict the value of one if we know the value of the other?

Example
– The author had ten of his students measure their shoe length and height.

Scatterplot

6.2 – Covariance and Correlation

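Definition 6.2.1 Let $X$ and $Y$ be two random variables with respective means $\mu_X$ and $\mu_Y$. The covariance of $X$ and $Y$ is

$$\operatorname{Cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right]$$

Alternatively,

$$\operatorname{Cov}(X, Y) = E(XY) - \mu_X \mu_Y$$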
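The two expressions for the covariance can be checked against each other numerically. The sketch below does this in Python with exact fractions. The joint probability table `pmf` is hypothetical (it is not the table from Example 6.2.1), and the function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); not the table from Example 6.2.1.
pmf = {
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
    (2, 1): Fraction(1, 8), (2, 2): Fraction(3, 8),
}

def expectation(g, pmf):
    """E[g(X, Y)] for a discrete joint distribution."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def cov_by_definition(pmf):
    """Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]."""
    mu_x = expectation(lambda x, y: x, pmf)
    mu_y = expectation(lambda x, y: y, pmf)
    return expectation(lambda x, y: (x - mu_x) * (y - mu_y), pmf)

def cov_by_shortcut(pmf):
    """Cov(X, Y) = E(XY) - mu_X * mu_Y."""
    mu_x = expectation(lambda x, y: x, pmf)
    mu_y = expectation(lambda x, y: y, pmf)
    return expectation(lambda x, y: x * y, pmf) - mu_x * mu_y

print(cov_by_definition(pmf), cov_by_shortcut(pmf))  # 1/16 1/16
```

Using `Fraction` keeps the arithmetic exact, so the two results can be compared with `==` rather than a tolerance.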

Example 6.2.1

$$\mu_X = 1(1/9) + 2(1/3) + 3(5/9) = 22/9$$

$$\mu_Y = 1(5/9) + 2(1/3) + 3(1/9) = 14/9$$

$$E(XY) = 1 \cdot 1(1/9) + 2 \cdot 1(2/9) + \cdots + 3 \cdot 3(1/9) = 4$$

$$\operatorname{Cov}(X, Y) = 4 - (22/9)(14/9) = 16/81 \approx 0.1975$$

Correlation Coefficient

Definition 6.2.2 Let $X$ and $Y$ be random variables with standard deviations $\sigma_X$ and $\sigma_Y$, respectively. The correlation coefficient of $X$ and $Y$ is

$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E(XY) - \mu_X \mu_Y}{\sigma_X \sigma_Y}$$

Theorem 6.2.2
1. $|\rho(X, Y)| \le 1$.
2. If $X$ and $Y$ are independent, then $\rho(X, Y) = 0$.
3. $|\rho(X, Y)| = 1$ if and only if $P(Y = mX + b) = 1$ for some constants $m$ and $b$.

Sample Correlation Coefficient

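Definition 6.2.3 The sample correlation coefficient of $n$ pairs of data values $(x_1, y_1), \ldots, (x_n, y_n)$ is

$$r = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y}}{\sqrt{\frac{1}{n}\sum x_i^2 - \bar{x}^2}\,\sqrt{\frac{1}{n}\sum y_i^2 - \bar{y}^2}}$$

Alternatively,

$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$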
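Both forms of the sample correlation coefficient give the same value, which is easy to confirm in code. The following is a minimal Python sketch; the data values are hypothetical (they are not the shoe-length data), and the function names are illustrative.

```python
import math

def r_from_means(xs, ys):
    """Sample correlation using the (1/n)-sums and the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    den = (math.sqrt(sum(x * x for x in xs) / n - x_bar ** 2)
           * math.sqrt(sum(y * y for y in ys) / n - y_bar ** 2))
    return num / den

def r_from_sums(xs, ys):
    """Sample correlation using only raw sums (the alternative form)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))

# Hypothetical data, chosen only so the arithmetic is easy to follow.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
r1, r2 = r_from_means(xs, ys), r_from_sums(xs, ys)
```

For these values the raw sums give $r = 38/\sqrt{1500} \approx 0.981$, a strong positive linear relationship.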

r measures the strength of a linear relationship

Bivariate Normal Distribution

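Definition 6.2.4 Let

$$h(x, y) = \frac{1}{1 - \rho^2}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]$$

Two variables $X$ and $Y$ are said to have a bivariate normal distribution if their joint p.d.f. is

$$f(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}}\, e^{-h(x, y)/2}$$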

Theorem 6.2.3 Two random variables $X$ and $Y$ with a bivariate normal distribution are independent if and only if $\rho = 0$.
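One direction of Theorem 6.2.3 can be illustrated numerically: when $\rho = 0$, the joint p.d.f. of Definition 6.2.4 factors into the product of two univariate normal densities. The Python sketch below is an illustration under arbitrary, hypothetical parameter values, not a proof.

```python
import math

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Joint p.d.f. f(x, y) from Definition 6.2.4 (requires |rho| < 1)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    h = (zx ** 2 - 2 * rho * zx * zy + zy ** 2) / (1 - rho ** 2)
    norm = 2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho ** 2)
    return math.exp(-h / 2) / norm

def normal_pdf(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

# Arbitrary illustrative parameters and evaluation point.
mu_x, mu_y, sigma_x, sigma_y = 1.0, -2.0, 1.5, 0.5
x, y = 0.3, -1.6

joint_rho0 = bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho=0.0)
joint_rho5 = bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho=0.5)
product = normal_pdf(x, mu_x, sigma_x) * normal_pdf(y, mu_y, sigma_y)
```

With `rho=0.0` the joint density matches `product` up to rounding; with `rho=0.5` it does not, so the joint density no longer factors.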

T-test of $\rho$ for Bivariate Random Variables

Purpose: To test the null hypothesis H0: $\rho = 0$ where $X$ and $Y$ have a bivariate normal distribution.
– Test statistic

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

– Critical value: t-score with $n - 2$ degrees of freedom

Example 6.2.4

For the shoe length vs height data, $n = 10$, $r = 0.974$
– Test the claim that $\rho \neq 0$

H0: $\rho = 0$, H1: $\rho \neq 0$

– Test statistic

$$t = 0.974\sqrt{\frac{10 - 2}{1 - (0.974)^2}} = 12.16$$
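The test statistic in this example can be reproduced with a few lines of Python, using the values $r = 0.974$ and $n = 10$ that appear in the computation. The function name is illustrative.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); compare with a t-score on n - 2 d.f."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t = correlation_t_statistic(0.974, 10)
print(round(t, 2))  # 12.16
```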

– Critical value: [not preserved in this transcript]
– Critical region: [not preserved in this transcript]
– P-value = twice the area to the right of $t = 12.16$, which is approximately 0
– Reject H0

Final conclusion:
– There is a statistically significant linear relationship between shoe length and height.

6.3 – Method of Least-Squares

We want to find $\hat{m}$ and $\hat{b}$ that minimize

$$S = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n} \left(y_i - (\hat{m} x_i + \hat{b})\right)^2$$

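Setting the partial derivatives of $S$ to zero,

$$\frac{\partial S}{\partial \hat{m}} = \sum 2\left(y_i - \hat{m} x_i - \hat{b}\right)(-x_i) = 0$$

$$\frac{\partial S}{\partial \hat{b}} = \sum 2\left(y_i - \hat{m} x_i - \hat{b}\right)(-1) = 0$$

and solving gives

$$\hat{m} = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} \quad\text{and}\quad \hat{b} = \bar{y} - \hat{m}\,\bar{x}$$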
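The closed-form estimates translate directly into code. The Python sketch below computes $\hat{m}$ and $\hat{b}$ and then checks that both derivative conditions hold for the fitted line. The data are hypothetical and the function name is illustrative.

```python
def least_squares_line(xs, ys):
    """Return (m_hat, b_hat) minimising sum((y_i - (m*x_i + b))**2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    m_hat = (n * sxy - sx * sy) / denom
    b_hat = sy / n - m_hat * sx / n   # b_hat = y_bar - m_hat * x_bar
    return m_hat, b_hat

# Hypothetical data, not the data from the slides.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
m_hat, b_hat = least_squares_line(xs, ys)

# The two conditions dS/dm = 0 and dS/db = 0, up to rounding error.
res = [y - (m_hat * x + b_hat) for x, y in zip(xs, ys)]
dS_dm = sum(2 * e * (-x) for e, x in zip(res, xs))
dS_db = sum(2 * e * (-1) for e in res)
```

For this data set the fitted line is $\hat{y} = 1.9x + 0$, and both `dS_dm` and `dS_db` are zero up to floating-point rounding.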

Example 6.3.1

Suppose a crime scene investigator finds a shoe print outside a window that measures 11.25 in long and would like to estimate the height of the person who made the print.

$$\hat{y} = 3.878(11.25) + 25.84 = 69.47$$

Cautions
1. If there is no linear correlation, do not use a linear regression equation to make predictions.
2. Only use a linear regression equation to make predictions within the range of the x-values of the data.
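The prediction and the second caution can be combined in one small helper. In the Python sketch below, the slope 3.878 and intercept 25.84 come from the example. The range limits `x_min` and `x_max` are hypothetical placeholders, because the observed shoe lengths are not listed in this transcript.

```python
def predict_height(shoe_length, x_min, x_max, m_hat=3.878, b_hat=25.84):
    """Predict height from shoe length, refusing to extrapolate."""
    if not x_min <= shoe_length <= x_max:
        raise ValueError("shoe length is outside the range of the observed x-values")
    return m_hat * shoe_length + b_hat

# x_min and x_max are assumed values, used only to illustrate the range check.
height = predict_height(11.25, x_min=9.0, x_max=13.0)
```

Here `height` is 69.4675 up to floating-point rounding, which rounds to the 69.47 quoted in the example. Raising an error for out-of-range inputs is one possible design; returning a flagged value would be another.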

6.4 – The Simple Linear Model

Definition 6.4.1 Two random variables $X$ and $Y$ are said to be described by a simple linear model if

$$Y = mX + b + \varepsilon$$

where $m$ and $b$ are constants and $\varepsilon$ is a random variable independent of $X$ that is $N(0, \sigma^2)$, where $\sigma$ is a constant.

Residuals

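Definition 6.4.2 For a set of data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ the residuals are

$$\varepsilon_i = y_i - \hat{y}_i = y_i - (\hat{m} x_i + \hat{b}) \quad\text{for } i = 1, \ldots, n$$

where $\hat{m}$ and $\hat{b}$ are the least-squares estimates of $m$ and $b$ as calculated in Section 6.3
– Observed values of $\varepsilon$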

Example 6.4.1

Standard Error of Estimate

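Definition 6.4.3 Let $X$ and $Y$ be described by a simple linear model. The standard error of estimate is

$$s_e = \sqrt{\frac{1}{n - 2}\sum_{i=1}^{n}\left(y_i - (\hat{m} x_i + \hat{b})\right)^2}$$

– $s_e^2$ is an unbiased estimate of $\sigma^2$, the variance of $\varepsilon$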
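The residuals and the standard error of estimate can be computed together. A minimal Python sketch follows; the data are hypothetical, and for them the least-squares line works out to $\hat{y} = 1.9x + 0$.

```python
import math

def residuals(xs, ys, m_hat, b_hat):
    """Observed minus fitted values, y_i - (m_hat * x_i + b_hat)."""
    return [y - (m_hat * x + b_hat) for x, y in zip(xs, ys)]

def standard_error_of_estimate(xs, ys, m_hat, b_hat):
    """s_e = sqrt(sum of squared residuals / (n - 2))."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than two data points")
    ss_res = sum(e ** 2 for e in residuals(xs, ys, m_hat, b_hat))
    return math.sqrt(ss_res / (n - 2))

# Hypothetical data; its least-squares estimates are m_hat = 1.9, b_hat = 0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
res = residuals(xs, ys, m_hat=1.9, b_hat=0.0)
s_e = standard_error_of_estimate(xs, ys, m_hat=1.9, b_hat=0.0)
```

The residuals are approximately 0.1, 0.2, −0.7 and 0.4, so the sum of squares is 0.70 and $s_e = \sqrt{0.70/2} \approx 0.592$. The divisor is $n - 2$ because two parameters were estimated from the data.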

Prediction Interval

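Definition 6.4.4 Let $X$ and $Y$ be described by a simple linear model. Given a value of $X$, say $x_0$, a prediction interval estimate for the corresponding value of $Y$ is

$$\hat{y} - E < Y < \hat{y} + E$$

where $E$, the margin of error, is

$$E = t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$

and $t_{\alpha/2}$ is a critical t-value with $n - 2$ d.f.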
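A short Python sketch of the prediction interval is given here. The critical value is passed in by the caller (for example, read from a t-table) rather than computed, and all numbers are hypothetical: the x-values, the fitted line $\hat{y} = 1.9x$, $s_e = \sqrt{0.35}$, and $t_{0.025} = 4.303$ for 2 d.f.

```python
import math

def prediction_interval(x0, xs, m_hat, b_hat, s_e, t_crit):
    """Return (lower, upper) = y_hat -/+ E for a new observation at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    margin = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    y_hat = m_hat * x0 + b_hat
    return y_hat - margin, y_hat + margin

# Hypothetical inputs (n = 4 points, so n - 2 = 2 degrees of freedom).
xs = [1.0, 2.0, 3.0, 4.0]
s_e = math.sqrt(0.35)
t_crit = 4.303  # t-table value for alpha/2 = 0.025 with 2 d.f.

near = prediction_interval(2.5, xs, 1.9, 0.0, s_e, t_crit)  # x0 at the mean
far = prediction_interval(4.0, xs, 1.9, 0.0, s_e, t_crit)   # x0 at the edge
```

The interval is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away from the mean, because of the $(x_0 - \bar{x})^2$ term.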

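Confidence Interval for $m$

Definition 6.4.5 Let $X$ and $Y$ be described by a simple linear model $Y = mX + b + \varepsilon$. A confidence interval estimate of $m$ is

$$\hat{m} - E < m < \hat{m} + E$$

where the margin of error is

$$E = t_{\alpha/2}\, \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}}$$

and $t_{\alpha/2}$ is a critical t-value with $n - 2$ d.f.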
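The confidence interval for the slope follows the same pattern. In this Python sketch the inputs are hypothetical ($\hat{m} = 1.9$, $s_e = \sqrt{0.35}$, and a table value $t_{0.025} = 4.303$ for 2 d.f.), and the function name is illustrative.

```python
import math

def slope_confidence_interval(m_hat, xs, s_e, t_crit):
    """Return (lower, upper) = m_hat -/+ t_crit * s_e / sqrt(sum (x_i - x_bar)^2)."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    margin = t_crit * s_e / math.sqrt(sxx)
    return m_hat - margin, m_hat + margin

# Hypothetical inputs; t_crit would normally come from a t-table or software.
xs = [1.0, 2.0, 3.0, 4.0]
lower, upper = slope_confidence_interval(1.9, xs, math.sqrt(0.35), 4.303)
```

Spreading the x-values further apart increases $\sum (x_i - \bar{x})^2$ and therefore narrows the interval.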

T-Test of the Slope

Let $X$ and $Y$ be described by a simple linear model $Y = mX + b + \varepsilon$. To test the null hypothesis

H0: $m = m_0$,

the test statistic is

$$t = \frac{1}{s_e}\left(\hat{m} - m_0\right)\sqrt{\sum (x_i - \bar{x})^2},$$

the critical value is a t-score with $n - 2$ degrees of freedom, and the P-value is the area under the corresponding density curve.
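As a sketch of the slope test statistic in Python, again with hypothetical inputs ($\hat{m} = 1.9$, $s_e = \sqrt{0.35}$, and four x-values):

```python
import math

def slope_t_statistic(m_hat, m0, xs, s_e):
    """t = (m_hat - m0) * sqrt(sum (x_i - x_bar)^2) / s_e, on n - 2 d.f."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return (m_hat - m0) * math.sqrt(sxx) / s_e

# Hypothetical inputs, testing H0: m = 0.
xs = [1.0, 2.0, 3.0, 4.0]
t = slope_t_statistic(m_hat=1.9, m0=0.0, xs=xs, s_e=math.sqrt(0.35))
```

For these inputs $t \approx 7.18$. Setting `m0` equal to `m_hat` gives $t = 0$, as expected when the estimate matches the hypothesized slope exactly.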

6.5 – Sums of Squares and ANOVA

Variation

$$SS_{Tot} = \sum (y_i - \bar{y})^2 \qquad SS_{Reg} = \sum (\hat{y}_i - \bar{y})^2 \qquad SS_{Res} = \sum (y_i - \hat{y}_i)^2$$

Coefficient of Determination

$$r^2 = \frac{SS_{Tot} - SS_{Res}}{SS_{Tot}}$$

– The square of the sample correlation coefficient

Interpretation
– “The proportion of the total variation in the $y$-values from $\bar{y}$ explained (or accounted for) by the regression equation.”
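The three sums of squares and the coefficient of determination can be computed side by side. The Python sketch below uses hypothetical data whose least-squares line is $\hat{y} = 1.9x + 0$. For a least-squares line the total variation splits as $SS_{Tot} = SS_{Reg} + SS_{Res}$, which the example also checks.

```python
def sums_of_squares(xs, ys, m_hat, b_hat):
    """Return (SS_Tot, SS_Reg, SS_Res) for the line y_hat = m_hat * x + b_hat."""
    y_bar = sum(ys) / len(ys)
    fitted = [m_hat * x + b_hat for x in xs]
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    ss_reg = sum((f - y_bar) ** 2 for f in fitted)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return ss_tot, ss_reg, ss_res

# Hypothetical data; m_hat = 1.9 and b_hat = 0 are its least-squares estimates.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
ss_tot, ss_reg, ss_res = sums_of_squares(xs, ys, 1.9, 0.0)
r_squared = (ss_tot - ss_res) / ss_tot
```

Here $SS_{Tot} = 18.75$, $SS_{Reg} = 18.05$ and $SS_{Res} = 0.70$, so $r^2 \approx 0.963$: about 96% of the variation in the y-values is accounted for by the line.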

F-Test of the Slope

Let $X$ and $Y$ be described by a simple linear model $Y = mX + b + \varepsilon$. To test the hypotheses

H0: $m = 0$ vs. H1: $m \neq 0$,

the test statistic is

$$f = \frac{SS_{Reg}}{SS_{Res}/(n - 2)}$$

The critical value is $f_{\alpha}(1, n - 2)$. The P-value is the area under the corresponding density curve to the right of the test statistic.
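The F statistic is a one-line computation once the sums of squares are known. In the Python sketch below the inputs are hypothetical ($SS_{Reg} = 18.05$, $SS_{Res} = 0.70$, $n = 4$).

```python
def slope_f_statistic(ss_reg, ss_res, n):
    """f = SS_Reg / (SS_Res / (n - 2)), with 1 and n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than two data points")
    return ss_reg / (ss_res / (n - 2))

# Hypothetical sums of squares for a data set with n = 4 points.
f = slope_f_statistic(ss_reg=18.05, ss_res=0.70, n=4)
```

For these inputs $f \approx 51.57$. In simple regression this equals the square of the t statistic for H0: $m = 0$, so the two tests lead to the same conclusion.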

6.6 – Nonlinear Regression

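Example: $X$ and $Y$ are described by a nonlinear model [model equation not preserved in this transcript]
– Use the data to estimate the model's parameters
– The model is linear with respect to a transformed variable
– “Transform” the data values accordingly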

Transformations

Example 6.6.1

• People/physician ($x$)
• Male life expectancy ($y$)

(World Almanac Book of Facts, 1992, Pharos Books)

• Fit Power and Exponential models to the data
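Power and exponential models are commonly fitted by transforming the data so that ordinary least squares applies. The Python sketch below assumes the usual parameterizations $y = a x^b$ (regress $\ln y$ on $\ln x$) and $y = a e^{bx}$ (regress $\ln y$ on $x$); the slides' own model forms are not preserved, so these are assumptions. The data are synthetic, not the World Almanac values.

```python
import math

def fit_line(us, vs):
    """Least-squares (slope, intercept) for v = slope * u + intercept."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    slope = (n * suv - su * sv) / (n * suu - su ** 2)
    return slope, sv / n - slope * su / n

def fit_power(xs, ys):
    """Fit y = a * x**b via ln(y) = ln(a) + b * ln(x). Needs x, y > 0."""
    b, ln_a = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_a), b

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) via ln(y) = ln(a) + b * x. Needs y > 0."""
    b, ln_a = fit_line(list(xs), [math.log(y) for y in ys])
    return math.exp(ln_a), b

# Synthetic data generated from known models, to show the fits recover them.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
a_pow, b_pow = fit_power(xs, [3.0 * x ** 2 for x in xs])
a_exp, b_exp = fit_exponential(xs, [2.0 * math.exp(0.5 * x) for x in xs])
```

Note that least squares on the transformed values minimizes error in $\ln y$, not in $y$ itself, so the result can differ from a direct nonlinear fit on noisy data.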

6.7 – Multiple Regression

Goal: Predict the value of a variable $Y$ in terms of two or more other variables $X_1, \ldots, X_k$
– $Y$ – response variable
– $X_1, \ldots, X_k$ – predictor variables

Assume a relation of the form

$$Y = m_1 X_1 + \cdots + m_k X_k + b + \varepsilon$$

– Use software to estimate coefficients
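To make "use software" concrete, here is one way the coefficients could be estimated: a dependency-free Python sketch that solves the least-squares normal equations by Gauss-Jordan elimination. It is an illustration only (statistical packages typically use more numerically robust methods), and the data are synthetic, generated from $y = 2x_1 + 3x_2 + 5$.

```python
def fit_multiple(rows, ys):
    """Return [m_1, ..., m_k, b] solving the normal equations (A^T A) c = A^T y."""
    A = [list(row) + [1.0] for row in rows]  # trailing 1 carries the intercept b
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(A, ys))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]

# Synthetic data from y = 2*x1 + 3*x2 + 5 (no noise), for illustration.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (0.0, 1.0)]
ys = [2 * x1 + 3 * x2 + 5 for x1, x2 in rows]
m1, m2, b = fit_multiple(rows, ys)
```

Because the synthetic data contain no noise, the fit recovers $m_1 = 2$, $m_2 = 3$ and $b = 5$ up to rounding.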

Example

Predict Selling Price in terms of Area, Acres, and Bedrooms

Outputs

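Coefficients: Yield the multiple regression equation

$$\hat{y} = 20.9 x_1 + 10339.6 x_2 + 2641.1 x_3 + 41510.9$$

(the signs between terms are not preserved in this transcript; the coefficient magnitudes are as given)

Standard error: Use to calculate confidence interval estimate of the coefficients

$$\hat{m}_i - t_{\alpha/2}\, s_i < m_i < \hat{m}_i + t_{\alpha/2}\, s_i$$

where $t_{\alpha/2}$ is a critical t-value with $n - k - 1$ d.f.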

t Stat: Test statistic for the hypotheses

H0: $m_i = 0$, H1: $m_i \neq 0$

in the presence of the other predictor variables
– Small P-value indicates that the variable is “statistically significant”

ANOVA Results

F – Test statistic for the hypotheses
H0: $m_1 = \cdots = m_k = 0$, H1: at least one $m_i$ is not 0

Significance F – Corresponding P-value
– Measures the “overall significance” of the set of predictor variables
– Small P-value: The set is “statistically significant”

Regression Statistics

Multiple R – Multiple regression equivalent of the sample correlation coefficient r

R Squared – Multiple coefficient of determination

Adjusted R Square – Calculated with the formula

$$\text{Adjusted } R^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}$$

– The higher the value, the better the overall quality of the model

Standard Error – Estimate of the standard deviation of the random variable $\varepsilon$ in the multiple regression model
– Also called the standard error of estimate
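The adjusted R Square formula is simple to apply by hand or in code. A minimal Python sketch, with hypothetical inputs ($R^2 = 0.9$, $n = 21$ observations, $k = 3$ predictors):

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical values, for illustration only.
adj = adjusted_r_squared(0.9, n=21, k=3)
```

Here the adjusted value is $1 - 0.1 \cdot 20/17 \approx 0.882$, slightly below $R^2$. Unlike $R^2$, it can decrease when a predictor that adds little is included, which is why it is useful for comparing models with different numbers of predictors.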

Which Set of Variables is “Best?”

• Very complicated to answer
• A very simple approach:
– Compare $R^2$, Adjusted $R^2$, and P-values
– Area and Acres are “best”