Simple Linear Regression

Upload: emerald-watts

Post on 28-Dec-2015


Correlation

Correlation (\(\rho\)) measures the strength of the linear relationship between two sets of data (X, Y).

The value of \(\rho\) is always between -1 and +1.

Correlation helps answer the questions:

If X is above its average value, does Y tend to be above or below its average value?

If X increases, does Y tend to increase or decrease?
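For reference, the usual sample formula for the correlation is written out below. The slides do not state it, so the notation is an assumption: \(\bar{X}\) and \(\bar{Y}\) denote the averages of X and Y, and n the number of observations. Each product in the numerator is positive when X and Y are on the same side of their averages, which is exactly the question asked above.

```latex
\rho = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
            {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```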

Scatter Plot

Individual   Height   Weight
1            60.0     137.4
2            60.5     130.9
3            61.0     146.1
4            61.5     157.1
5            62.0     158.4
6            62.5     165.0
7            63.0     133.1
8            63.5     152.0
9            64.0     165.5
. . .        . . .    . . .

[Figure: scatter plot of Weight (vertical axis) against Height (horizontal axis)]
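As a rough illustration, the correlation can be computed by hand from deviations around the two averages. The sketch below uses only the nine rows listed in the table above, not the full data set, so its result is not expected to match the full-sample correlation quoted on the later slides. The function name `pearson_correlation` is a hypothetical helper, not something from the slides.

```python
from math import sqrt

# First nine (height, weight) rows listed on the slide; the full data set is not shown.
heights = [60.0, 60.5, 61.0, 61.5, 62.0, 62.5, 63.0, 63.5, 64.0]
weights = [137.4, 130.9, 146.1, 157.1, 158.4, 165.0, 133.1, 152.0, 165.5]


def pearson_correlation(xs, ys):
    """Sample correlation built from deviations around the two averages."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Positive terms: X and Y are on the same side of their averages.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rho = pearson_correlation(heights, weights)
print(f"correlation on the nine listed rows: {rho:.3f}")
```

The result is always between -1 and +1, whatever units X and Y are measured in.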

Positive Correlation

If the correlation between two variables is positive (greater than 0):

When X is above its average, Y tends to be above its average.

When X increases, Y tends to increase.

[Figure: scatter plot of Weight against Height, \(\rho = 0.94\)]

Negative Correlation

If the correlation between two variables is negative (less than 0):

When X is above its average, Y tends to be below its average.

When X increases, Y tends to decrease.

[Figure: scatter plot of Sales against Price, \(\rho = -0.87\)]

Perfect Correlation

\(\rho = -1\) or \(\rho = 1\)

If you know X, you know Y.

[Figure: two scatter plots, one with \(\rho = -1\) and one with \(\rho = 1\)]

No Correlation

\(\rho = -0.059\)

Knowing X does not help predict Y.

[Figure: scatter plot with no visible relationship between X and Y]

Returns and Assets Managed

What is the correlation between the return of a mutual fund and the amount of assets it manages?

Annual returns for 385 US equity mutual funds, year ending July 1998. Data provided by Lipper Analytic.

Returns and Assets Managed

Sample of the Data

Mutual Fund Name          Assets ($ Mill) (X)   Annual Return (Y)
. . .                                   . . .               . . .
EQUITRUST:VAL GRO R                     111.5               -4.09
FPA PARAMOUNT                           572.2               -1.21
FRANKLIN VAL:VALUE;I                    133.9                5.20
IMS CAPITAL VALUE FUND                   13.0                6.31
HERITAGE:VALUE EQTY;A                    20.0                6.65
ADVANTUS CORNERSTONE;A                  113.9                6.69
PUTNAM NEW VALUE;B R                    461.2                8.46
YACKTMAN FUND                           796.0                8.59
GREENSPRING FUND                        182.9                9.22
PIONEER II;A                           7239.0                9.54
FRANKLIN ALL:MODERT;I                    21.1               10.03
. . .                                   . . .               . . .

Average: \(\bar{X} = 1413.16\), \(\bar{Y} = 23.20\)

Standard Deviation: \(S_X = 4908.93\), \(S_Y = 6.73\)

Correlation: \(\rho = 0.0339\)

Does the size of a mutual fund tell you anything about expected returns?
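One way to put a number on this question is sketched below. It uses only the summary statistics quoted above and two standard identities for a least-squares line with one predictor: the slope equals the correlation times S_Y / S_X, and the share of variation explained equals the squared correlation. The variable names are illustrative, and the return units are whatever the slide's Annual Return column uses.

```python
# Summary statistics quoted on the slide.
mean_x, mean_y = 1413.16, 23.20   # average assets ($ millions), average annual return
s_x, s_y = 4908.93, 6.73          # standard deviations of X and Y
rho = 0.0339                      # correlation between assets and return

# Least-squares slope and intercept implied by the summary statistics.
slope = rho * s_y / s_x
intercept = mean_y - slope * mean_x

# Share of the variation in returns associated with fund size.
r_squared = rho ** 2

print(f"slope per $1 million of assets: {slope:.6f}")
print(f"intercept: {intercept:.2f}")
print(f"R-squared: {r_squared:.4f}")
```

With a correlation this close to zero, the implied slope is tiny and almost none of the variation in returns is associated with fund size.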

Characteristics of Correlation

Positive/Negative: Increase/Decrease

Positive: if X increases, then Y tends to increase.

Negative: if X increases, then Y tends to decrease.

Perfect Correlation: if you know X, then you know Y. All observations are on a straight line.

No Correlation: no linear relationship between X and Y.

Correlation Quiz

Imagine that the correlation between the price of a product and its weekly sales is -0.8. The average price for the product was $1 and the average weekly sales were $200 per week. If the price for the product is set at $1.50, which of the following average weekly sales would be reasonable?

-100    200    240    160
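The reasoning behind the quiz can be sketched as a simple filter. This is an illustration, not a method given on the slides, and the function name `is_reasonable` is hypothetical. It applies two checks: sales cannot be negative, and with a negative correlation a price above its average should go with sales below their average.

```python
avg_price, avg_sales = 1.0, 200.0   # averages given in the quiz
correlation = -0.8                  # given in the quiz
new_price = 1.5
candidates = [-100, 200, 240, 160]  # answer options from the slide


def is_reasonable(sales):
    """Keep an option only if it is feasible and moves in the expected direction."""
    if sales < 0:
        return False  # weekly sales cannot be negative
    price_above_average = new_price > avg_price
    if correlation < 0 and price_above_average:
        return sales < avg_sales  # negative correlation: expect below-average sales
    if correlation > 0 and price_above_average:
        return sales > avg_sales  # positive correlation: expect above-average sales
    return True  # cases this simple sketch does not cover


reasonable = [s for s in candidates if is_reasonable(s)]
print(reasonable)
```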

Regression Questions

Three questions:

What is the best estimate of \(a\) and \(\beta\)? Which line fits best?

Are \(a\) and \(\beta\) different from zero? Is there anything going on?

How much of Y is explained by X? How much of the total variation of Y is explained by X?

Best Guess?

If we knew X, what would our guess for Y be? For example, X = 78.

A first guess: Y = average value of Y.

A better guess: draw a line that "describes" Y in terms of X.

[Figure: scatter plot of Weight against Height]

Equation of a Straight Line

\[ Y = a + \beta X \]

Intercept \(a\): value of Y when X = 0

Weight when height = 0

Sales when price = 0

Slope \(\beta\): change in Y / change in X

Expected change in weight when height increases by 1

Expected change in sales when price increases by 1

Statistical Notation (Language)

Y is known as the dependent (or response) variable.

Typically we want to have some control over Y.

X is known as the independent (or predictor) variable.

Often we have some control over X, e.g. price.

Best Straight Line?

\[ Y = a + \beta X \]

Choose \(a\) and choose \(\beta\).

[Figure: scatter plot of Weight against Height]

Minimize Forecast Error

Forecast: \(\hat{Y}_i = \hat{a} + \hat{\beta} X_i\)

Observation: \(Y_i\)

Error: \(Y_i - \hat{Y}_i\)

[Figure: scatter plot of Weight against Height]

What is the best estimate of \(a\) and \(\beta\)?

Choose \(a\) and \(\beta\) so that they minimize the total error. In particular, minimize the total sum of squared errors:

\[ SSE = \sum_{i=1}^{n} \varepsilon_i^2, \qquad \varepsilon_i = Y_i - \hat{Y}_i, \qquad \hat{Y}_i = \hat{a} + \hat{\beta} X_i \]

\(Y_i\): observed value (weight for person i)

\(\hat{Y}_i\): expected value (weight for person i) based on the estimates of \(\beta\) and \(a\)
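The slides leave this minimization to software. For reference, the sum of squared errors has a well-known closed-form minimizer, obtained by setting its two partial derivatives to zero. The notation \(\bar{X}\), \(\bar{Y}\) for the averages of X and Y is an assumption here.

```latex
\begin{aligned}
SSE(a, \beta) &= \sum_{i=1}^{n} (Y_i - a - \beta X_i)^2 \\
\frac{\partial SSE}{\partial a} = 0 \;&\Rightarrow\; \hat{a} = \bar{Y} - \hat{\beta}\,\bar{X} \\
\frac{\partial SSE}{\partial \beta} = 0 \;&\Rightarrow\; \hat{\beta} =
  \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{aligned}
```

The first condition also says that the fitted line always passes through the point of averages \((\bar{X}, \bar{Y})\).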

Best Line for Height vs Weight

Choose \(a\) and \(\beta\) so that they minimize the total error.

Use statistical software (e.g. SPSS): \(\hat{a} = -100.35\), \(\hat{\beta} = 3.986\)

Coefficients (dependent variable: WEIGHT)

             Unstandardized Coefficients   Standardized Coefficients
Model 1      B           Std. Error        Beta                        t        Sig.
(Constant)   -100.350    14.586                                        -6.880   .000
HEIGHT       3.986       .200              .943                        19.915   .000
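What the software does for a single predictor can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the SPSS procedure: it uses the standard closed-form least-squares estimates and only the nine (height, weight) rows listed earlier in the deck, so its estimates will differ from the full-sample values in the table. The names `fit_line` and `sse` are hypothetical helpers.

```python
# First nine (height, weight) rows listed on the slides; illustrative only.
heights = [60.0, 60.5, 61.0, 61.5, 62.0, 62.5, 63.0, 63.5, 64.0]
weights = [137.4, 130.9, 146.1, 157.1, 158.4, 165.0, 133.1, 152.0, 165.5]


def fit_line(xs, ys):
    """Return (a_hat, beta_hat) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta_hat = sxy / sxx
    a_hat = mean_y - beta_hat * mean_x  # line passes through the averages
    return a_hat, beta_hat


def sse(xs, ys, a, beta):
    """Total sum of squared forecast errors for the line Y = a + beta * X."""
    return sum((y - (a + beta * x)) ** 2 for x, y in zip(xs, ys))


a_hat, beta_hat = fit_line(heights, weights)
print(f"a_hat = {a_hat:.3f}, beta_hat = {beta_hat:.3f}")
print(f"SSE at the fitted line: {sse(heights, weights, a_hat, beta_hat):.2f}")
# Any other line does worse on these rows, e.g. one with a nudged slope.
print(f"SSE with slope + 0.1:   {sse(heights, weights, a_hat, beta_hat + 0.1):.2f}")
```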

Interpreting Coefficients

\(\hat{\beta} = 3.986\): increase height by 1 inch, and weight increases by 3.986 pounds.

\(\hat{a} = -100.35\): Height = 0, Weight = -100.

Caution: forecasting outside of the range of the observed data is dangerous!

The Best Line

y = -100.38 + 3.9867x

[Figure: scatter plot of Weight against Height with the fitted line]
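To connect the fitted line back to the earlier "best guess" question (X = 78), here is a minimal sketch of a forecast with a simple extrapolation check. The coefficients are the ones in the SPSS table. The observed height range is not stated on the slides, so the chart's axis limits (58 to 88) are used as a stand-in, which is an assumption. The function names are hypothetical.

```python
A_HAT = -100.350   # intercept from the SPSS table
BETA_HAT = 3.986   # slope from the SPSS table (pounds per inch)

# Assumption: the chart's horizontal axis limits stand in for the observed
# height range, which the slides do not state exactly.
OBSERVED_MIN, OBSERVED_MAX = 58.0, 88.0


def predict_weight(height):
    """Forecast weight from height using the fitted line."""
    return A_HAT + BETA_HAT * height


def is_extrapolation(height):
    """True when the height lies outside the assumed observed range."""
    return not (OBSERVED_MIN <= height <= OBSERVED_MAX)


for h in (78.0, 0.0):
    flag = "outside assumed range" if is_extrapolation(h) else "inside assumed range"
    print(f"height {h:5.1f} -> predicted weight {predict_weight(h):8.3f} ({flag})")
```

The forecast at a height of 0 is the intercept itself, a negative weight, which is why forecasts far outside the observed data should not be trusted.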

Best Line for Sales vs Price

Choose \(a\) and \(\beta\) so that they minimize the total error.

Use statistical software (e.g. SPSS): \(\hat{a} = 303.85\), \(\hat{\beta} = -53.84\)

Coefficients (dependent variable: SALES)

             Unstandardized Coefficients   Standardized Coefficients
Model 1      B           Std. Error        Beta                        t         Sig.
(Constant)   303.854     6.245                                         48.658    .000
PRICE        -53.839     5.234             -.870                       -10.286   .000

Interpreting Coefficients

\(\hat{\beta} = -53.84\): increase price by $1, and sales decrease by 53.84 units.

\(\hat{a} = 303.85\): Price = 0, Sales = $303.85.

Caution: does this make sense?

The Best Line

y = 303.85 - 53.834x

[Figure: scatter plot of Sales against Price with the fitted line]

Are \(a\) and \(\beta\) different from zero?

Hypothesis Tests

Null hypothesis: \(a = 0\), \(\beta = 0\). If \(\beta = 0\) is true, there is no linear relationship between X and Y!

Statistical software calculates:

t-statistic (far from zero, i.e. very large or very small: reject the null hypothesis)

Significance level (Sig.) = p-value (Sig. < 0.05: reject the null hypothesis)

Coefficients (dependent variable: WEIGHT)

             Unstandardized Coefficients   Standardized Coefficients
Model 1      B           Std. Error        Beta                        t        Sig.
(Constant)   -100.350    14.586                                        -6.880   .000
HEIGHT       3.986       .200              .943                        19.915   .000

Is \(\beta\) different from zero?

t-statistic (HEIGHT row of the coefficients table): 19.915

Significance level = p-value: reported as .000, i.e. less than 0.001

Reject the null hypothesis: reject the idea that \(\beta = 0\)!
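The t column in these tables is simply the coefficient divided by its standard error. The sketch below recomputes it from the B and Std. Error values printed in the two coefficient tables of this deck. Small differences from the reported t values come from the rounding of the printed inputs. The cutoff of 2 is a common rule of thumb for a 5% two-sided test with a reasonably large sample, not something stated on the slides.

```python
# (B, Std. Error, reported t) copied from the two SPSS coefficient tables.
rows = {
    "WEIGHT ~ (Constant)": (-100.350, 14.586, -6.880),
    "WEIGHT ~ HEIGHT": (3.986, 0.200, 19.915),
    "SALES ~ (Constant)": (303.854, 6.245, 48.658),
    "SALES ~ PRICE": (-53.839, 5.234, -10.286),
}


def t_statistic(coefficient, std_error):
    """How many standard errors the estimate lies away from zero."""
    return coefficient / std_error


for name, (b, se, reported_t) in rows.items():
    t = t_statistic(b, se)
    # Rule of thumb (an assumption, not from the slides): |t| > 2 is "far from zero".
    verdict = "reject null" if abs(t) > 2 else "cannot reject null"
    print(f"{name:20s} t = {t:8.3f} (reported {reported_t:8.3f}) -> {verdict}")
```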

How much of Y is explained by X?

R-square = % of the variation of Y explained by X

For simple linear regression only: R = |correlation|

Model Summary (predictors: (Constant), HEIGHT)

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .943   .890       .888                10.5204
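For reference, R-square can be written in terms of the sum of squared errors that the regression minimizes. The symbol SST (the total variation of Y around its average \(\bar{Y}\)) is not used on the slides and is introduced here as an assumption.

```latex
R^2 = 1 - \frac{SSE}{SST}, \qquad
SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

\text{Height and weight: } R^2 \approx 0.943^2 \approx 0.889
```

The table shows .890 rather than .889, presumably because the software squares the unrounded value of R.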

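Managerial Insight

What are the expected average sales for a week if the price is set at $1?

\[ \hat{Y} = \hat{a} + \hat{\beta} X, \qquad X = \$1, \qquad \hat{a} = \$303.85, \qquad \hat{\beta} = -53.84 \]

\[ \hat{Y} = \$303.85 - 53.84(\$1) = \$250.01 \]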
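The same calculation can be sketched as a small function, which makes it easy to try other prices. The coefficients are the rounded estimates used on the slide. The function name `expected_sales` and the extra prices in the loop are illustrative choices.

```python
A_HAT = 303.85     # estimated intercept used on the slide
BETA_HAT = -53.84  # estimated slope used on the slide


def expected_sales(price):
    """Expected average weekly sales at a given price, from the fitted line."""
    return A_HAT + BETA_HAT * price


# The slide's question (price = $1) plus two other illustrative prices.
for price in (0.8, 1.0, 1.6):
    print(f"price ${price:.2f} -> expected sales ${expected_sales(price):.2f}")
```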

Managerial Insight

What price would you have to set in order to get average sales of $300 per store?

\[ \hat{Y} = \hat{a} + \hat{\beta} X, \qquad \hat{Y} = \$300, \qquad X = \,? \]

\[ \$300 = \$303.85 - 53.84\,X \]

\[ X = \frac{\$3.85}{53.84} \approx \$0.0715 \]

Nonsense?
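One way to see why the answer looks like nonsense is to solve the fitted line for the price and then compare the result with the prices that were actually observed. In the sketch below the observed price range is not taken from the data. It is assumed to be the span of the chart's price axis (0.8 to 1.6). The function name is hypothetical.

```python
A_HAT = 303.85     # estimated intercept used on the slide
BETA_HAT = -53.84  # estimated slope used on the slide

# Assumption: the chart's price axis limits stand in for the observed
# price range, which the slides do not state exactly.
PRICE_MIN, PRICE_MAX = 0.8, 1.6


def price_for_target_sales(target_sales):
    """Invert Y = a + beta * X to find the price that gives the target sales."""
    return (target_sales - A_HAT) / BETA_HAT


price = price_for_target_sales(300.0)
inside = PRICE_MIN <= price <= PRICE_MAX
print(f"price needed for $300 in sales: ${price:.4f}")
print("inside assumed observed range" if inside else "outside assumed observed range: extrapolation")
```

The implied price of roughly seven cents is far below any price in the assumed range, so the line is being used well outside the data that produced it.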