Download - 8. Simple Linear Regression
-
8/4/2019 8. Simple Linear Regression
1/49
The McGraw-Hill Companies, Inc. 2008McGraw-Hill/Irwin
Module 200: Statistics of Finance
Linear Regression andCorrelation
Victor A. Abola, Ph.D.School of Economics
DBPManagementAssociatesProgram
-
8/4/2019 8. Simple Linear Regression
2/49
2
Outline
Dependent and independent variable. Calculate and interpret the coefficient of correlation,
Conduct a test of hypothesis to determine whetherthe coefficient of correlation in the population is zero.
Calculate the least squares regression line. Goodness of fit: R2
Testing hypothesis on coefficients
Confidence and prediction intervals for thedependent variable.
Non-linear regressions
-
8/4/2019 8. Simple Linear Regression
3/49
3
Regression Analysis - Introduction
Recall in Chapter 4 the idea of showing therelationship between twovariables with a scatterdiagram was introduced.
In that case we showed that, as the age of the buyerincreased, the amount spent for the vehicle also
increased. In this chapter we carry this idea further. Numerical
measures to express the strength of relationshipbetween two variables are developed.
In addition, an equation is used to express therelationship. between variables, allowing us toestimate one variable on the basis of another.
-
8/4/2019 8. Simple Linear Regression
4/49
4
Regression Analysis - Uses
Some examples.
Is there a relationship between the amount KFC spends permonth on advertising and its sales in the month? (predictionand cost-effectiveness)
Can we base an estimate of the eiectricity cost of air-con based
on the number of sq. m. of the office? (prediction) Is there a relationship between the number of hours that
students studied for an exam and the score earned?(prediction)
If we can establish a relationship between sales by call center
agent and the number of calls made, then we can know howeach agent is performing. (benchmarking)
-
8/4/2019 8. Simple Linear Regression
5/49
5
Correlation Analysis
Correlation Analysisis the study of the
relationship between variables. It is alsodefined as group of techniques to measurethe association between two variables.
A Scatter Diagram is a chart that portraysthe relationship between the two variables. Itis the usual first step in correlations analysis
The Dependent Variable is the variable beingpredicted or estimated.
The Independent Variable provides the basis forestimation. It is the predictor variable.
-
8/4/2019 8. Simple Linear Regression
6/49
6
Regression Example
The sales manager of HPPhil., wants todetermine whetherthere is a relationshipbetween the number
of sales calls made ina month and thenumber of copiers soldthat month.
The manager selects arandom sample of 10sales reps and obtainsthe ff. data
-
8/4/2019 8. Simple Linear Regression
7/49
7
The Coefficient of Correlation, r
The Coefficient of Correlation (r) is a measure of thestrength of the relationship between two variables. Itrequires interval or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect and strongcorrelation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship and
positive values indicate a direct relationship.
-
8/4/2019 8. Simple Linear Regression
8/49
8
Correlation Coefficient - Interpretation
-
8/4/2019 8. Simple Linear Regression
9/49
9
Correlation Coefficient - Formula
-
8/4/2019 8. Simple Linear Regression
10/49
10
How do we interpret a correlation of 0.759?First, it is positive, there is a direct relationship between the number ofsales calls and the number of copiers sold. The value of 0.759 is fairlyclose to 1.00, so we may conclude that the association is strong.
However, does this mean that more sales calls causemore sales?No, we have not demonstrated cause and effect here, only that the twovariablessales calls and copiers soldare related or associated..
Correlation Coefficient - Example
-
8/4/2019 8. Simple Linear Regression
11/49
11
Testing the Significance ofthe Correlation Coefficient
H0: = 0 (the correlation in the population is 0)H1: 0 (the correlation in the population is not 0)
Reject H0 if:
t> t/2,n-2 or t< -t/2,n-2
df = n - 2
-
8/4/2019 8. Simple Linear Regression
12/49
12
Testing the Significance of
the Correlation Coefficient - Example
H0
: = 0 (the correlation in the population is 0)
H1: 0 (the correlation in the population is not 0)
Reject H0 if:
t> t/2,n-2 or t< -t/2,n-2
t> t0.025,8 or t< -t0.025,8t> 2.306 or t< -2.306
-
8/4/2019 8. Simple Linear Regression
13/49
13
Testing the Significance of
the Correlation Coefficient - Example
The computed t(3.297) is within the rejection region, thus, we reject H0. Thismeans the correlation in the population is not zero.
From a practical standpoint, it indicates to the sales manager that there iscorrelation with respect to the number of sales calls made and the number of
copiers sold in the population of salespeople.
-
8/4/2019 8. Simple Linear Regression
14/49
14
Linear Regression Model
-
8/4/2019 8. Simple Linear Regression
15/49
15
Regression Analysis
In regression analysis we use the independent variable(X) to estimate the dependent variable (Y).
The relationship between the variables is linear.
Both variables must be at least interval scale.
The least squares criterion is used to determine theequation.
-
8/4/2019 8. Simple Linear Regression
16/49
16
Scatter Diagram
-
8/4/2019 8. Simple Linear Regression
17/49
17
Illustration of the Least Squares
Regression Principle
-
8/4/2019 8. Simple Linear Regression
18/49
18
Regression Analysis Least Squares
Principle
The least squares principle is used toobtain aand b.
The equations to determine aand b
are:
bn XY X Y
n X X
a Yn
b Xn
=
=
( ) ( )( )
( ) ( )
2 2
-
8/4/2019 8. Simple Linear Regression
19/49
19
Regression Equation - Example
Recall the example involving
Copier Sales of America. Thesales manager gatheredinformation on the number ofsales calls made and thenumber of copiers sold for arandom sample of 10 salesrepresentatives. Use the leastsquares method to determine alinear equation to express therelationship between the twovariables.
What is the expected number ofcopiers sold by a representativewho made 20 calls?
-
8/4/2019 8. Simple Linear Regression
20/49
20
Finding the Regression Equation - Example
6316.42
)20(1842.19476.18
1842.19476.18
:isequationregressionThe
^
^
^
^
=
+=
+=
+=
Y
Y
XY
bXaY
-
8/4/2019 8. Simple Linear Regression
21/49
21
Computing the Estimates of Y
Step 1 Using the regression equation, substitute the
value of each X to solve for the estimated sales
4736.54
)30(1842.19476.18
1842.19476.18
JonesSoni
^
^
^
=
+=
+=
Y
Y
XY
6316.42
)20(1842.19476.18
1842.19476.18
KellerTom
^
^
^
=
+=
+=
Y
Y
XY
-
8/4/2019 8. Simple Linear Regression
22/49
22
)(^
YYGraphical Illustration of the Differences between ActualY Estimated Y
-
8/4/2019 8. Simple Linear Regression
23/49
23
Goodness-of-Fit
The coefficient of determination (r2) is the
proportion of the total variance in thedependent variable (Y) that is explained oraccounted for by the variation in the
independent variable (X). It is the square of the coefficient of
correlation.
It ranges from 0 to 1. It does not give any information on the
direction of the relationship between thevariables.
-
8/4/2019 8. Simple Linear Regression
24/49
24
Plotting the Estimated and the Actual Ys
(Ya -
Yc - Ya or e
Regression line Yc
-
8/4/2019 8. Simple Linear Regression
25/49
25
Coefficient of Determination (r2) - Example
The coefficient of determination, r2 ,is 0.576,found by (0.759)2
This is a proportion or a percent; we can say that
57.6 percent of the variation in the number ofcopiers sold is explained, or accounted for, by thevariation in the number of sales calls.
-
8/4/2019 8. Simple Linear Regression
26/49
26
The Standard Error of Estimate
The standard error of estimate measures thescatter, or dispersion, of the observed valuesaround the line of regression
The formulas that are used to compute the
standard error:
2
)( 2^
.=
n
YYs xy 2
2
.
= n
XYbYaYs xy
-
8/4/2019 8. Simple Linear Regression
27/49
27
Standard Error of the Estimate - Example
Recall the example involving
Copier Sales of America.The sales managerdetermined the leastsquares regressionequation is given below.
Determine the standard errorof estimate as a measureof how well the values fitthe regression line.
XY 1842.19476.18^
+=
901.9210
211.784
2
)( 2^
.
=
=
=
n
YYs xy
-
8/4/2019 8. Simple Linear Regression
28/49
28
Assumptions Underlying LinearRegression
For each value of X, there is a group of Yvalues, and these
Yvalues are normally distributed. The meansof these normaldistributions of Yvalues all lie on the straight line of regression.
The standard deviationsof these normal distributions are equal.
The Yvalues are statistically independent. This means that in
the selection of a sample, the Yvalues chosen for a particular Xvalue do not depend on the Yvalues for any other Xvalues.
-
8/4/2019 8. Simple Linear Regression
29/49
29
Confidence Interval and Prediction
Interval Estimates of Y
A confidence interval reports the meanvalue of Yfor a given X.A prediction interval reports the range of valuesof Yfor a particularvalue of X.
-
8/4/2019 8. Simple Linear Regression
30/49
30
Confidence Interval Estimate - Example
We return to the Copier Sales of Americaillustration. Determine a 95 percent confidenceinterval for all sales representatives who make25 calls.
-
8/4/2019 8. Simple Linear Regression
31/49
31
Step 1 Compute the point estimate of Y
In other words, determine the number of copiers we expect a sales
representative to sell if he or she makes 25 calls.
5526.48
)25(1842.19476.18
1842.19476.18
:isequationregressionThe
^
^
^
=
+=
+=
Y
Y
XY
Confidence Interval Estimate - Example
-
8/4/2019 8. Simple Linear Regression
32/49
32
Step 2 Find the value of t
To find the tvalue, we need to first know the number
of degrees of freedom. In this case the degrees offreedom is n -2 = 10 2 = 8.
We set the confidence level at 95 percent. To findthe value of t, move down the left-hand column ofAppendix B.2 to 8 degrees of freedom, then moveacross to the column with the 95 percent level ofconfidence.
The value of tis 2.306.
Confidence Interval Estimate - Example
-
8/4/2019 8. Simple Linear Regression
33/49
33
Confidence Interval Estimate - Example
-
8/4/2019 8. Simple Linear Regression
34/49
34
Confidence Interval Estimate - Example
Step 4 Use the formula above by substituting the numbers computed
in previous slides
Thus, the 95 percent confidence interval for the average sales of allsales representatives who make 25 calls is from 40.9170 up to
56.1882 copiers.
-
8/4/2019 8. Simple Linear Regression
35/49
35
Prediction Interval Estimate - Example
We return to the Copier Sales of Americaillustration. Determine a 95 percentprediction interval for Sheila Baker, a West
Coast sales representative who made 25calls.
-
8/4/2019 8. Simple Linear Regression
36/49
36
Step 1 Compute the point estimate of Y
In other words, determine the number of copiers we
expect a sales representative to sell if he or shemakes 25 calls.
5526.48
)25(1842.19476.18
1842.19476.18
:isequationregressionThe
^
^
^
=
+=
+=
Y
Y
XY
Prediction Interval Estimate - Example
-
8/4/2019 8. Simple Linear Regression
37/49
37
Step 2 Using the information computedearlier in the confidence interval estimation
example, use the formula above.
Prediction Interval Estimate - Example
If Sheila Baker makes 25 sales calls, the number of copiers she
will sell will be between about 24 and 73 copiers.
-
8/4/2019 8. Simple Linear Regression
38/49
38
Transforming Data
The coefficient of correlation describes thestrength of the linearrelationship betweentwo variables. It could be that two variablesare closely related, but there relationship is
not linear. Be cautious when you are interpreting the
coefficient of correlation. A value of rmayindicate there is no linear relationship, but itcould be there is a relationship of some othernonlinear or curvilinear form.
-
8/4/2019 8. Simple Linear Regression
39/49
39
Transforming Data - Example
On the right is a listing of 22 professionalgolfers, the number of events in
which they participated, the amountof their winnings, and their meanscore for the 2004 season. In golf,the objective is to play 18 holes inthe least number of strokes. So, wewould expect that those golfers withthe lower mean scores would have
the larger winnings. To put it anotherway, score and winnings should beinversely related. In 2004 TigerWoods played in 19 events, earned$5,365,472, and had a mean scoreper round of 69.04. Fred Couplesplayed in 16 events, earned
$1,396,109, and had a mean scoreper round of 70.92. The data for the22 golfers follows.
-
8/4/2019 8. Simple Linear Regression
40/49
40
Scatterplot of Golf Data
The correlation between the
variables Winnings andScore is 0.782. This is afairly strong inverserelationship.
However, when we plot thedata on a scatter diagramthe relationship does notappear to be linear; it doesnot seem to follow a straight
line.
-
8/4/2019 8. Simple Linear Regression
41/49
41
What can we do to explore other (nonlinear)relationships?
One possibility is to transform one of thevariables. For example, instead of using Yas
the dependent variable, we might use its log,reciprocal, square, or square root. Anotherpossibility is to transform the independentvariable in the same way. There are othertransformations, but these are the mostcommon.
-
8/4/2019 8. Simple Linear Regression
42/49
42
In the golf winnings
example, changing thescale of the dependentvariable is effective. Wedetermine the log of each
golfers winnings andthen find the correlationbetween the log ofwinnings and score. That
is, we find the log to thebase 10 of Tiger Woodsearnings of $5,365,472,which is 6.72961.
Transforming Data - Example
-
8/4/2019 8. Simple Linear Regression
43/49
43
Using the Transformed Equation for
Estimation
Based on the regression equation, a golfer witha mean score of 70 could expect to earn:
The value 6.4372 is the log to the base 10 of winnings.The antilog of 6.4372 is 2.736So a golfer that had a mean score of 70 could expect toearn $2,736,528.
-
8/4/2019 8. Simple Linear Regression
44/49
44
Appendix
-
8/4/2019 8. Simple Linear Regression
45/49
45
Computing the Slope of the Line
-
8/4/2019 8. Simple Linear Regression
46/49
46
Computing the Y-Intercept
-
8/4/2019 8. Simple Linear Regression
47/49
47
Standard Error of the Estimate - Excel
-
8/4/2019 8. Simple Linear Regression
48/49
48
Scatter Plot of Transformed Y
Li R i U i th
-
8/4/2019 8. Simple Linear Regression
49/49
49
Linear Regression Using the
Transformed Y