# 8. simple linear regression

Post on 07-Apr-2018

220 views

Embed Size (px)

TRANSCRIPT

8/4/2019 8. Simple Linear Regression

1/49

The McGraw-Hill Companies, Inc. 2008McGraw-Hill/Irwin

Module 200: Statistics of Finance

Linear Regression andCorrelation

Victor A. Abola, Ph.D.School of Economics

DBPManagementAssociatesProgram

8/4/2019 8. Simple Linear Regression

2/49

2

Outline

Dependent and independent variable. Calculate and interpret the coefficient of correlation,

Conduct a test of hypothesis to determine whetherthe coefficient of correlation in the population is zero.

Calculate the least squares regression line. Goodness of fit: R2

Testing hypothesis on coefficients

Confidence and prediction intervals for thedependent variable.

Non-linear regressions

8/4/2019 8. Simple Linear Regression

3/49

3

Regression Analysis - Introduction

Recall in Chapter 4 the idea of showing therelationship between twovariables with a scatterdiagram was introduced.

In that case we showed that, as the age of the buyerincreased, the amount spent for the vehicle also

increased. In this chapter we carry this idea further. Numerical

measures to express the strength of relationshipbetween two variables are developed.

In addition, an equation is used to express therelationship. between variables, allowing us toestimate one variable on the basis of another.

8/4/2019 8. Simple Linear Regression

4/49

4

Regression Analysis - Uses

Some examples.

Is there a relationship between the amount KFC spends permonth on advertising and its sales in the month? (predictionand cost-effectiveness)

Can we base an estimate of the eiectricity cost of air-con based

on the number of sq. m. of the office? (prediction) Is there a relationship between the number of hours that

students studied for an exam and the score earned?(prediction)

If we can establish a relationship between sales by call center

agent and the number of calls made, then we can know howeach agent is performing. (benchmarking)

8/4/2019 8. Simple Linear Regression

5/49

5

Correlation Analysis

Correlation Analysisis the study of the

relationship between variables. It is alsodefined as group of techniques to measurethe association between two variables.

A Scatter Diagram is a chart that portraysthe relationship between the two variables. Itis the usual first step in correlations analysis

The Dependent Variable is the variable beingpredicted or estimated.

The Independent Variable provides the basis forestimation. It is the predictor variable.

8/4/2019 8. Simple Linear Regression

6/49

6

Regression Example

The sales manager of HPPhil., wants todetermine whetherthere is a relationshipbetween the number

of sales calls made ina month and thenumber of copiers soldthat month.

The manager selects arandom sample of 10sales reps and obtainsthe ff. data

8/4/2019 8. Simple Linear Regression

7/49

7

The Coefficient of Correlation, r

The Coefficient of Correlation (r) is a measure of thestrength of the relationship between two variables. Itrequires interval or ratio-scaled data.

It can range from -1.00 to 1.00.

Values of -1.00 or 1.00 indicate perfect and strongcorrelation.

Values close to 0.0 indicate weak correlation.

Negative values indicate an inverse relationship and

positive values indicate a direct relationship.

8/4/2019 8. Simple Linear Regression

8/49

8

Correlation Coefficient - Interpretation

8/4/2019 8. Simple Linear Regression

9/49

9

Correlation Coefficient - Formula

8/4/2019 8. Simple Linear Regression

10/49

10

How do we interpret a correlation of 0.759?First, it is positive, there is a direct relationship between the number ofsales calls and the number of copiers sold. The value of 0.759 is fairlyclose to 1.00, so we may conclude that the association is strong.

However, does this mean that more sales calls causemore sales?No, we have not demonstrated cause and effect here, only that the twovariablessales calls and copiers soldare related or associated..

Correlation Coefficient - Example

8/4/2019 8. Simple Linear Regression

11/49

11

Testing the Significance ofthe Correlation Coefficient

H0: = 0 (the correlation in the population is 0)H1: 0 (the correlation in the population is not 0)

Reject H0 if:

t> t/2,n-2 or t< -t/2,n-2

df = n - 2

8/4/2019 8. Simple Linear Regression

12/49

12

Testing the Significance of

the Correlation Coefficient - Example

H0

: = 0 (the correlation in the population is 0)

H1: 0 (the correlation in the population is not 0)

Reject H0 if:

t> t/2,n-2 or t< -t/2,n-2

t> t0.025,8 or t< -t0.025,8t> 2.306 or t< -2.306

8/4/2019 8. Simple Linear Regression

13/49

13

Testing the Significance of

the Correlation Coefficient - Example

The computed t(3.297) is within the rejection region, thus, we reject H0. Thismeans the correlation in the population is not zero.

From a practical standpoint, it indicates to the sales manager that there iscorrelation with respect to the number of sales calls made and the number of

copiers sold in the population of salespeople.

8/4/2019 8. Simple Linear Regression

14/49

14

Linear Regression Model

8/4/2019 8. Simple Linear Regression

15/49

15

Regression Analysis

In regression analysis we use the independent variable(X) to estimate the dependent variable (Y).

The relationship between the variables is linear.

Both variables must be at least interval scale.

The least squares criterion is used to determine theequation.

8/4/2019 8. Simple Linear Regression

16/49

16

Scatter Diagram

8/4/2019 8. Simple Linear Regression

17/49

17

Illustration of the Least Squares

Regression Principle

8/4/2019 8. Simple Linear Regression

18/49

18

Regression Analysis Least Squares

Principle

The least squares principle is used toobtain aand b.

The equations to determine aand b

are:

bn XY X Y

n X X

a Yn

b Xn

=

=

( ) ( )( )

( ) ( )

2 2

8/4/2019 8. Simple Linear Regression

19/49

19

Regression Equation - Example

Recall the example involving

Copier Sales of America. Thesales manager gatheredinformation on the number ofsales calls made and thenumber of copiers sold for arandom sample of 10 salesrepresentatives. Use the leastsquares method to determine alinear equation to express therelationship between the twovariables.

What is the expected number ofcopiers sold by a representativewho made 20 calls?

8/4/2019 8. Simple Linear Regression

20/49

20

Finding the Regression Equation - Example

6316.42

)20(1842.19476.18

1842.19476.18

:isequationregressionThe

^

^

^

^

=

+=

+=

+=

Y

Y

XY

bXaY

8/4/2019 8. Simple Linear Regression

21/49

21

Computing the Estimates of Y

Step 1 Using the regression equation, substitute the

value of each X to solve for the estimated sales

4736.54

)30(1842.19476.18

1842.19476.18

JonesSoni

^

^

^

=

+=

+=

Y

Y

XY

6316.42

)20(1842.19476.18

1842.19476.18

KellerTom

^

^

^

=

+=

+=

Y

Y

XY

8/4/2019 8. Simple Linear Regression

22/49

22

)(^

YYGraphical Illustration of the Differences between ActualY Estimated Y

8/4/2019 8. Simple Linear Regression

23/49

23

Goodness-of-Fit

The coefficient of determination (r2) is the

proportion of the total variance in thedependent variable (Y) that is explained oraccounted for by the variation in the

independent variable (X). It is the square of the coefficient of

correlation.

It ranges from 0 to 1. It does not give any information on the

direction of the relationship between thevariables.

8/4/2019 8. Simple Linear Regression

24/49

24

Plotting the Estimated and the Actual Ys

(Ya -

Yc - Ya or e

Regression line Yc

8/4/2019 8. Simple Linear Regression

25/49

25

Coefficient of Determination (r2) - Example

The coefficient of determination, r2 ,is 0.576,found by (0.759)2

This is a proportion or a percent; we can say that

57.6 percent of the variation in the number ofcopiers sold is explained, or accounted for, by thevariation in the number of sales calls.

8/4/2019 8. Simple Linear Regression

26/49

26

The Standard Error of Estimate

The standard error of estimate measures thescatter, or dispersion, of the observed valuesaround the line of regression

The formulas that are used to compute the

standard error:

2

)( 2^

.=

n

YYs xy 2

2

.

= n

XYbYaYs xy

8/4/2019 8. Simple Linear Regression

27/49

27

Standard Error of the Estimate - Example

Recall the example involving

Copier Sales of America.The sales managerdetermined the leastsquares regressionequation is given below.

Determine the standard errorof estimate as a measureof how well the values fitthe regression line.

XY 1842.19476.18^

+=

901.9210

211.784

2

)( 2^

.

=

=

=

n

YYs xy

8/4/2019 8. Simple Linear Regression

28/49

28

Assumptions Underlying LinearRegression

For each value of X, there is a group of Yvalues, and these

Yvalues are normally distributed. The meansof these normaldistributions of Yvalues all lie on the straight line of regression.

The standard deviationsof these normal distributions are equal.

The Yvalues are statistically independent. This means that in

the selection of a sample, the Yvalues chosen for a particular Xvalue do not depend on the Yvalues for any other Xvalues.

8/4/2019 8. Simple Linear Regression

29/49

29

Confidence Interval and Prediction

Interval Estimates of Y

A confidence interval reports the meanvalue of Yfor a given X.A prediction interval reports the range of valuesof Yfor a particularvalue of X.

8/4/2019 8. Simple Linear Regression

30/49

30

Confidence Interval Estimate - Example

We return to the Copier Sales of Americaillustration. Determine a 95 percent confidenceinterval for all sales representatives who make25 calls.

8/4/2019 8. Simple Linear Regression

31/49

31

Step 1 Compute the point estimate of Y

In other words, determine the number of copiers we expect a sales

representative to sell if he or she makes 25 calls.

5526.48

)25(1842.19476.18

1842.19476.18

:isequationregressionThe

^

^

^

=

+=

+=

Y

Y

XY

Confidence Interval Estimate - Example

8/4/2019 8. Simple Linear Regression

32/49

32

Step 2 Find the value of t

To find the tvalue, we need to first know the number

of degrees of freedom. In this case the degrees offreedom is n -2 = 10 2 = 8.

We set the confidence level at 95 percent. To findthe value of t, move down the left-hand column ofAppendix B.2 to 8 degrees of freedom, then moveacross to the column with the 95 percent level ofconfidence.

The value of tis 2.306.

Confidence Interval Estimate - Example

8/4/2019 8. Simple Linear Regression

33/49

33

Confidence Interval Estimate - Example

8/4/2019 8. Simple Linear Regression

34/49

34

Confidence Interval Estimate - Example

Step 4 Use the formula above by substituting the numbers computed

in previous slides

Thus, the 95 percent confidence interval for the average sales of allsales representatives who make 25 calls is from 40.9170 up to

56.1882 copiers.

8/4/2019 8. Simple Linear Regression

35/49

35

Prediction Interval Estimate - Example

We return to the Copier Sales of Americaillustration. Determine a 95 percentprediction interval for Sheila Baker, a West

Coast sales representative who made 25calls.

8/4/2019 8. Simple Linear Regression

36/49

36

Step 1 Compute the point estimate of Y

In other words, determine the number of copiers we

expect a sales representative to sell if he or shemakes 25 calls.

5526.48

)25(1842.19476.18

1842.19476.18

:isequationregressionThe

^

^

^

=

+=

+=

Y

Y

XY

Prediction Interval Estimate - Example

8/4/2019 8. Simple Linear Regression

37/49

37

Step 2 Using the information computedearlier in the confidence interval estimation

example, use the formula above.

Prediction Interval Estimate - Example

If Sheila Baker makes 25 sales calls, the number of copiers she

will sell will be between about 24 and 73 copiers.

8/4/2019 8. Simple Linear Regression

38/49

38

Transforming Data

The coefficient of correlation describes thestrength of the linearrelationship betweentwo variables. It could be that two variablesare closely related, but there relationship is

not linear. Be cautious when you are interpreting the

coefficient of correlation. A value of rmayindicate there is no linear relationship, but itcould be there is a relationship of some othernonlinear or curvilinear form.

8/4/2019 8. Simple Linear Regression

39/49

39

Transforming Data - Example

On the right is a listing of 22 professionalgolfers, the number of events in

which they participated, the amountof their winnings, and their meanscore for the 2004 season. In golf,the objective is to play 18 holes inthe least number of strokes. So, wewould expect that those golfers withthe lower mean scores would have

the larger winnings. To put it anotherway, score and winnings should beinversely related. In 2004 TigerWoods played in 19 events, earned$5,365,472, and had a mean scoreper round of 69.04. Fred Couplesplayed in 16 events, earned

$1,396,109, and had a mean scoreper round of 70.92. The data for the22 golfers follows.

8/4/2019 8. Simple Linear Regression

40/49

40

Scatterplot of Golf Data

The correlation between the

variables Winnings andScore is 0.782. This is afairly strong inverserelationship.

However, when we plot thedata on a scatter diagramthe relationship does notappear to be linear; it doesnot seem to follow a straight

line.

8/4/2019 8. Simple Linear Regression

41/49

41

What can we do to explore other (nonlinear)relationships?

One possibility is to transform one of thevariables. For example, instead of using Yas

the dependent variable, we might use its log,reciprocal, square, or square root. Anotherpossibility is to transform the independentvariable in the same way. There are othertransformations, but these are the mostcommon.

8/4/2019 8. Simple Linear Regression

42/49

42

In the golf winnings

example, changing thescale of the dependentvariable is effective. Wedetermine the log of each

golfers winnings andthen find the correlationbetween the log ofwinnings and score. That

is, we find the log to thebase 10 of Tiger Woodsearnings of $5,365,472,which is 6.72961.

Transforming Data - Example

8/4/2019 8. Simple Linear Regression

43/49

43

Using the Transformed Equation for

Estimation

Based on the regression equation, a golfer witha mean score of 70 could expect to earn:

The value 6.4372 is the log to the base 10 of winnings.The antilog of 6.4372 is 2.736So a golfer that had a mean score of 70 could expect toearn $2,736,528.

8/4/2019 8. Simple Linear Regression

44/49

44

Appendix

8/4/2019 8. Simple Linear Regression

45/49

45

Computing the Slope of the Line

8/4/2019 8. Simple Linear Regression

46/49

46

Computing the Y-Intercept

8/4/2019 8. Simple Linear Regression

47/49

47

Standard Error of the Estimate - Excel

8/4/2019 8. Simple Linear Regression

48/49

48

Scatter Plot of Transformed Y

Li R i U i th

8/4/2019 8. Simple Linear Regression

49/49

49

Linear Regression Using the

Transformed Y