Simple Linear Regression
Correlation
Correlation (\(\rho\)) measures the strength of the linear relationship between two sets of data (X, Y).
The value of \(\rho\) is always between -1 and +1.
Correlation helps answer two questions:
If X is above its average value, does Y tend to be above or below its average value?
If X increases, does Y tend to increase or decrease?
Scatter Plot
Individual   Height   Weight
1            60.0     137.4
2            60.5     130.9
3            61.0     146.1
4            61.5     157.1
5            62.0     158.4
6            62.5     165.0
7            63.0     133.1
8            63.5     152.0
9            64.0     165.5
...          ...      ...
[Figure: scatter plot of Weight (vertical axis, 100 to 280) against Height (horizontal axis, 58 to 88)]
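As a rough illustration of how the correlation is computed, the sketch below applies the standard sample (Pearson) correlation formula to the nine rows listed on the scatter-plot slide. The function name `pearson_correlation` is an illustrative choice, not something from the slides, and because only nine of the observations are shown, the result is not expected to match the correlation reported for the full data set.

```python
from math import sqrt

# The nine (height, weight) rows shown on the scatter-plot slide.
heights = [60.0, 60.5, 61.0, 61.5, 62.0, 62.5, 63.0, 63.5, 64.0]
weights = [137.4, 130.9, 146.1, 157.1, 158.4, 165.0, 133.1, 152.0, 165.5]


def pearson_correlation(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of X and Y around their averages.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own average.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rho_sample = pearson_correlation(heights, weights)
print(round(rho_sample, 3))
```

The numerator is positive when above-average X values tend to come with above-average Y values, which is exactly the question correlation is meant to answer.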
Positive Correlation
If the correlation between two variables is positive (greater than 0):
When X is above its average, Y tends to be above its average.
When X increases, Y tends to increase.
[Figure: scatter plot of Weight against Height, \(\rho = 0.94\)]
Negative Correlation
If the correlation between two variables is negative (less than 0):
When X is above its average, Y tends to be below its average.
When X increases, Y tends to decrease.
[Figure: scatter plot of Sales (vertical axis, $200 to $280) against Price (horizontal axis, $0.8 to $1.6), \(\rho = -0.87\)]
Perfect Correlation
\(\rho = -1\) or \(\rho = 1\)
If you know X, you know Y.
[Figure: two scatter plots, one with \(\rho = -1\) and one with \(\rho = 1\); both axes run from -15 to 15]
No Correlation
\(\rho = -0.059\)
Knowing X does not help in predicting Y.
[Figure: scatter plot with no visible pattern; both axes run from -15 to 15]
Returns and Assets Managed
Is there a correlation between the return of a mutual fund and the amount of assets it manages?
Data: annual returns for 385 US equity mutual funds, year ending July 1998. Data provided by Lipper Analytic.
Returns and Assets Managed
Sample of the data:
Mutual Fund Name           Assets ($ Mill) (X)   Annual Return (Y)
...                        ...                   ...
EQUITRUST:VAL GRO R        111.5                 -4.09
FPA PARAMOUNT              572.2                 -1.21
FRANKLIN VAL:VALUE;I       133.9                 5.20
IMS CAPITAL VALUE FUND     13.0                  6.31
HERITAGE:VALUE EQTY;A      20.0                  6.65
ADVANTUS CORNERSTONE;A     113.9                 6.69
PUTNAM NEW VALUE;B R       461.2                 8.46
YACKTMAN FUND              796.0                 8.59
GREENSPRING FUND           182.9                 9.22
PIONEER II;A               7239.0                9.54
FRANKLIN ALL:MODERT;I      21.1                  10.03
...                        ...                   ...
Average                    \(\bar{X} = 1413.16\)   \(\bar{Y} = 23.20\)
Standard Deviation         \(S_X = 4908.93\)       \(S_Y = 6.73\)
Returns and Assets Managed
Correlation: \(\rho = 0.0339\)
Does the size of a mutual fund tell you anything about expected returns?
Characteristics of Correlation
Positive/Negative: Increase/Decrease
  Positive: if X increases, then Y tends to increase.
  Negative: if X increases, then Y tends to decrease.
Perfect Correlation
  If you know X, then you know Y.
  All observations lie on a straight line.
No Correlation
  No linear relationship between X and Y.
Correlation Quiz
Imagine that the correlation between the price of a product and its weekly sales is -0.8. The average price for the product was $1 and the average weekly sales were $200 per week. If the price for the product is set at $1.5, which of the following average weekly sales would be reasonable?
  -100
  200
  240
  160
Regression Questions
Three questions:
1. What is the best estimate of a and \(\beta\)? Which line fits best?
2. Are a and \(\beta\) different from zero? Is there anything going on?
3. How much of Y is explained by X? How much of the total variation of Y is explained by X?
Best Guess?
If you knew X, what would your guess for Y be?
X = 78
Y = average value of Y
[Figure: scatter plot of Weight against Height]
Best Guess?
If you knew X, what would your guess for Y be?
X = 78
Draw a line that "describes" Y in terms of X.
[Figure: scatter plot of Weight against Height]
Equation of a Straight Line
\(Y = a + \beta X\)
Intercept (a): value of Y when X = 0.
  Weight when height = 0
  Sales when price = 0
Slope (\(\beta\)): change in Y / change in X.
  Expected change in weight when height increases by 1
  Expected change in sales when price increases by 1
Statistical Notation (Language)
Y is known as the dependent (or response) variable.
  Typically we want to have some control over Y.
X is known as the independent (or predictor) variable.
  Often we have some control over X, e.g. price.
Best Straight Line?
\(Y = a + \beta X\)
Choose a. Choose \(\beta\).
[Figure: scatter plot of Weight against Height]
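Minimize Forecast Error
Forecast: \(\hat{Y}_i = \hat{a} + \hat{\beta} X_i\)
Observation: \(Y_i\)
Error: \(\hat{\varepsilon}_i = Y_i - \hat{Y}_i\)
[Figure: scatter plot of Weight against Height with a fitted line, marking the forecast, the observation, and the error for one point]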
Minimize Forecast Error
Choose a and \(\beta\) so that they minimize the total error.
In particular, minimize the total sum of squared errors, where each error is
\(\hat{\varepsilon}_i = Y_i - \hat{Y}_i\)
\(Y_i\): observed value (weight for person i)
\(\hat{Y}_i\): expected value (weight for person i) based on the estimates of \(\beta\) and a
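What is the best estimate of a and \(\beta\)?
Choose a and \(\beta\) so that they minimize the total error.
In particular, minimize the total sum of squared errors:
\[ SSE = \sum_{i=1}^{n} \hat{\varepsilon}_i^{2}, \qquad \hat{\varepsilon}_i = Y_i - \hat{Y}_i, \qquad \hat{Y}_i = \hat{a} + \hat{\beta} X_i \]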
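The minimization described above has a well-known closed-form solution for a single predictor: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept makes the line pass through the point of averages. The sketch below applies it to the nine height and weight rows shown on the scatter-plot slide; the function names `sse` and `least_squares` are illustrative, and the estimates will differ from the full-sample values because only a small part of the data is listed.

```python
# The nine (height, weight) rows shown on the scatter-plot slide.
heights = [60.0, 60.5, 61.0, 61.5, 62.0, 62.5, 63.0, 63.5, 64.0]
weights = [137.4, 130.9, 146.1, 157.1, 158.4, 165.0, 133.1, 152.0, 165.5]


def sse(a, beta, xs, ys):
    """Sum of squared errors for the candidate line y = a + beta * x."""
    return sum((y - (a + beta * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Closed-form estimates (a_hat, beta_hat) that minimize the SSE."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta_hat = sxy / sxx
    a_hat = mean_y - beta_hat * mean_x  # line passes through the averages
    return a_hat, beta_hat


a_hat, beta_hat = least_squares(heights, weights)
best_sse = sse(a_hat, beta_hat, heights, weights)

# Any other line should do no better than the least-squares line.
print(best_sse <= sse(a_hat + 1.0, beta_hat, heights, weights))
print(best_sse <= sse(a_hat, beta_hat + 0.1, heights, weights))
```

Statistical software performs the same minimization and also reports standard errors and test statistics for the estimates.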
Best Line for Height vs Weight
Choose a and \(\beta\) so that they minimize the total error.
Use statistical software (e.g. SPSS): \(\hat{a} = -100.35\), \(\hat{\beta} = 3.986\)
Coefficients (dependent variable: WEIGHT)
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B          Std. Error         Beta                        t        Sig.
(Constant)   -100.350   14.586                                         -6.880   .000
HEIGHT       3.986      .200               .943                        19.915   .000
Interpreting Coefficients
\(\hat{\beta} = 3.986\): increase height by 1 inch, and weight increases by 3.986 pounds.
\(\hat{a} = -100.35\): Height = 0, Weight = -100.
Caution: forecasting outside of the range of the observed data is dangerous!
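To make the interpretation concrete, here is a minimal sketch that plugs the reported estimates (-100.35 and 3.986) into the fitted line. The function name `predict_weight` is an illustrative assumption; heights are taken to be in inches and weights in pounds, as in the interpretation above.

```python
# Estimates reported by the statistical software for WEIGHT on HEIGHT.
A_HAT = -100.35    # intercept
BETA_HAT = 3.986   # slope: pounds per additional inch of height


def predict_weight(height_in):
    """Forecast weight (pounds) from height (inches) using the fitted line."""
    return A_HAT + BETA_HAT * height_in


# A height inside the plotted range (X = 78 is the example used earlier).
print(round(predict_weight(78), 3))

# One extra inch changes the forecast by exactly the slope.
print(round(predict_weight(79) - predict_weight(78), 3))

# Far outside the observed range the line gives a nonsensical answer:
# a negative weight at height 0.
print(predict_weight(0))
```

The last line is the reason for the caution: the fitted line is only a description of the data in the range where heights were actually observed.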
The Best Line
y = -100.38 + 3.9867x
[Figure: scatter plot of Weight against Height with the fitted line]
Best Line for Sales vs Price
Choose a and \(\beta\) so that they minimize the total error.
Use statistical software (e.g. SPSS): \(\hat{a} = 303.86\), \(\hat{\beta} = -53.84\)
Coefficients (dependent variable: SALES)
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B          Std. Error         Beta                        t         Sig.
(Constant)   303.854    6.245                                          48.658    .000
PRICE        -53.839    5.234              -.870                       -10.286   .000
Interpreting Coefficients
\(\hat{\beta} = -53.84\): increase price by $1, and sales decrease by 53.84 units.
\(\hat{a} = 303.86\): Price = 0, Sales = $303.86.
Caution: does this make sense?
The Best Line
y = 303.85 - 53.834x
[Figure: scatter plot of Sales against Price with the fitted line]
Are a and \(\beta\) different from zero?
Hypothesis tests
Null hypothesis: a = 0, \(\beta = 0\). If \(\beta = 0\) is true, there is no linear relationship between X and Y!
Statistical software calculates:
  t-statistic (far from zero, that is, very large or very small: reject the null hypothesis)
  Significance level = p-value (Sig. < 0.05: reject the null hypothesis)
Is \(\beta\) different from zero?
t-statistic: 19.915
Significance level = p-value: reported as .000 (smaller than 0.001)
Reject the null hypothesis: reject the idea that \(\beta = 0\)!
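The t-statistic in the coefficients table is the estimated coefficient divided by its standard error. As a quick check, the sketch below recomputes it from the B and Std. Error values printed in the two coefficient tables of this deck. The dictionary layout and the function name `t_statistic` are illustrative; small differences from the printed t values are expected because the tables show rounded numbers.

```python
# (B, Std. Error, reported t) as printed in the two coefficient tables.
reported = {
    "WEIGHT: (Constant)": (-100.350, 14.586, -6.880),
    "WEIGHT: HEIGHT": (3.986, 0.200, 19.915),
    "SALES: (Constant)": (303.854, 6.245, 48.658),
    "SALES: PRICE": (-53.839, 5.234, -10.286),
}


def t_statistic(estimate, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


recomputed = {
    name: t_statistic(b, se) for name, (b, se, _) in reported.items()
}

for name, (_, _, t_reported) in reported.items():
    # Rounded inputs reproduce the reported t only approximately.
    print(f"{name:20s} recomputed t = {recomputed[name]:8.3f}, "
          f"reported t = {t_reported:8.3f}")
```

A t-statistic far from zero means the estimate is many standard errors away from zero, which is why the null hypothesis is rejected for the height coefficient.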
How much of Y is explained by X?
R-square = % of the variation of Y explained by X
R = correlation in absolute value (for simple linear regression only!)
Model Summary (predictors: (Constant), HEIGHT)
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .943   .890       .888                10.5204
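The link between R-square and the correlation can be verified directly. The sketch below fits the least-squares line to the nine height and weight rows shown on the scatter-plot slide, computes R-square as one minus the unexplained share of the variation in Y, and compares it with the squared correlation. Variable names are illustrative, and the numbers differ from the Model Summary because only nine observations are listed in the deck.

```python
# The nine (height, weight) rows shown on the scatter-plot slide.
heights = [60.0, 60.5, 61.0, 61.5, 62.0, 62.5, 63.0, 63.5, 64.0]
weights = [137.4, 130.9, 146.1, 157.1, 158.4, 165.0, 133.1, 152.0, 165.5]


def mean(values):
    return sum(values) / len(values)


mean_x, mean_y = mean(heights), mean(weights)
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in weights)  # total variation of Y
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))

# Least-squares line.
beta_hat = sxy / sxx
a_hat = mean_y - beta_hat * mean_x

# Variation of Y left unexplained by the line.
sse = sum((y - (a_hat + beta_hat * x)) ** 2 for x, y in zip(heights, weights))

r_squared = 1 - sse / syy          # share of variation explained by X
rho = sxy / (sxx * syy) ** 0.5     # sample correlation

print(round(r_squared, 4), round(rho ** 2, 4))
```

The two printed values agree, which holds for regression with a single predictor and an intercept; with several predictors, R-square no longer equals the square of any single pairwise correlation.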
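Managerial Insight
What are the expected average sales for a week if price is set at $1?
\(\hat{Y} = \hat{a} + \hat{\beta} X\), with X = $1
\(\hat{a} = 303.85\), \(\hat{\beta} = -53.84\)
\(\hat{Y} = 303.85 - 53.84 \times 1\)
\(\hat{Y} = 250.01\), that is, $250.01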
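The same calculation can be repeated for other candidate prices. This is a small sketch using the estimates from the calculation above (303.85 and -53.84); the function name `expected_sales` and the list of candidate prices are illustrative assumptions, chosen to stay within the $0.8 to $1.6 range shown on the price axis.

```python
# Estimates used in the managerial-insight calculation.
A_HAT = 303.85     # intercept
BETA_HAT = -53.84  # change in expected sales per $1 increase in price


def expected_sales(price):
    """Expected average weekly sales at a given price, from the fitted line."""
    return A_HAT + BETA_HAT * price


# Hypothetical candidate prices within the plotted price range.
candidate_prices = [0.8, 1.0, 1.2, 1.4, 1.6]

for price in candidate_prices:
    print(f"price ${price:.2f} -> expected sales {expected_sales(price):.2f}")
```

Staying inside the observed price range matters here for the same reason as with height and weight: the line is not reliable for prices that were never tried.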