8. simple linear regression


Post on 07-Apr-2018


  • 8/4/2019 8. Simple Linear Regression

    1/49

    The McGraw-Hill Companies, Inc. 2008
    McGraw-Hill/Irwin

    Module 200: Statistics of Finance

    Linear Regression and Correlation

    Victor A. Abola, Ph.D.
    School of Economics

    DBP Management Associates Program

    Outline

    Dependent and independent variables.

    Calculate and interpret the coefficient of correlation.

    Conduct a test of hypothesis to determine whether the coefficient of correlation in the population is zero.

    Calculate the least squares regression line.

    Goodness of fit: R²

    Testing hypotheses on coefficients.

    Confidence and prediction intervals for the dependent variable.

    Non-linear regressions.

    Regression Analysis - Introduction

    Recall that Chapter 4 introduced the idea of showing the relationship between two variables with a scatter diagram.

    In that case we showed that, as the age of the buyer increased, the amount spent for the vehicle also increased.

    In this chapter we carry this idea further. Numerical measures to express the strength of the relationship between two variables are developed.

    In addition, an equation is used to express the relationship between variables, allowing us to estimate one variable on the basis of another.

    Regression Analysis - Uses

    Some examples:

    Is there a relationship between the amount KFC spends per month on advertising and its sales in the month? (prediction and cost-effectiveness)

    Can we estimate the electricity cost of air-conditioning from the number of square meters of the office? (prediction)

    Is there a relationship between the number of hours that students studied for an exam and the score earned? (prediction)

    If we can establish a relationship between sales by call center agent and the number of calls made, then we can know how each agent is performing. (benchmarking)

    Correlation Analysis

    Correlation Analysis is the study of the relationship between variables. It is also defined as a group of techniques to measure the association between two variables.

    A Scatter Diagram is a chart that portrays the relationship between the two variables. It is the usual first step in correlation analysis.

    The Dependent Variable is the variable being predicted or estimated.

    The Independent Variable provides the basis for estimation. It is the predictor variable.

    Regression Example

    The sales manager of HP Phil. wants to determine whether there is a relationship between the number of sales calls made in a month and the number of copiers sold that month.

    The manager selects a random sample of 10 sales reps and obtains the following data.

    The Coefficient of Correlation, r

    The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables. It requires interval or ratio-scaled data.

    It can range from -1.00 to 1.00.

    Values of -1.00 or 1.00 indicate perfect and strong correlation.

    Values close to 0.0 indicate weak correlation.

    Negative values indicate an inverse relationship and positive values indicate a direct relationship.


    Correlation Coefficient - Interpretation


    Correlation Coefficient - Formula

    Correlation Coefficient - Example

    How do we interpret a correlation of 0.759? First, it is positive, so there is a direct relationship between the number of sales calls and the number of copiers sold. The value of 0.759 is fairly close to 1.00, so we may conclude that the association is strong.

    However, does this mean that more sales calls cause more sales? No, we have not demonstrated cause and effect here, only that the two variables, sales calls and copiers sold, are related or associated.
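
The correlation calculation can be sketched in a few lines of Python. The data table from the slide did not survive in this transcript, so the ten (calls, copiers) pairs below are an assumed data set, chosen only because it reproduces the r = 0.759 quoted above. The function name `pearson_r` is likewise illustrative.

```python
import math

# Assumed data (not in the transcript): sales calls (X) and copiers sold (Y)
# for 10 representatives, chosen to reproduce the quoted r = 0.759.
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]


def pearson_r(x, y):
    """Sample coefficient of correlation for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(calls, copiers)
print(round(r, 3))  # 0.759
```

The result is positive and fairly close to 1.00, which matches the interpretation given for this example.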

    Testing the Significance of the Correlation Coefficient

    $H_0: \rho = 0$ (the correlation in the population is 0)

    $H_1: \rho \neq 0$ (the correlation in the population is not 0)

    Reject $H_0$ if:

    $t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$

    $df = n - 2$

    Testing the Significance of the Correlation Coefficient - Example

    $H_0: \rho = 0$ (the correlation in the population is 0)

    $H_1: \rho \neq 0$ (the correlation in the population is not 0)

    Reject $H_0$ if:

    $t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$

    $t > t_{0.025,\,8}$ or $t < -t_{0.025,\,8}$

    $t > 2.306$ or $t < -2.306$

    Testing the Significance of the Correlation Coefficient - Example

    The computed t (3.297) is within the rejection region; thus, we reject H0. This means the correlation in the population is not zero.

    From a practical standpoint, it indicates to the sales manager that there is correlation between the number of sales calls made and the number of copiers sold in the population of salespeople.
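
The transcript does not show how the computed t of 3.297 was obtained. A minimal sketch, assuming the usual test statistic for a correlation coefficient, t = r·sqrt(n − 2) / sqrt(1 − r²), reproduces it from the quoted r = 0.759 and n = 10. The critical value 2.306 is taken from the slides rather than computed, and the function name is illustrative.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.759, 10      # values quoted in the slides
t_critical = 2.306    # t(0.025, 8), as quoted in the slides

t = correlation_t(r, n)
reject_h0 = abs(t) > t_critical  # two-tailed decision rule

print(round(t, 3), reject_h0)  # 3.297 True
```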


    Linear Regression Model

    Regression Analysis

    In regression analysis we use the independent variable (X) to estimate the dependent variable (Y).

    The relationship between the variables is linear.

    Both variables must be at least interval scale.

    The least squares criterion is used to determine the equation.


    Scatter Diagram

    Illustration of the Least Squares Regression Principle

    Regression Analysis - Least Squares Principle

    The least squares principle is used to obtain a and b.

    The equations to determine a and b are:

    $$b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$

    $$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$
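
The two least squares formulas translate directly into code. The sketch below is illustrative only: the function name is hypothetical, and the data are an assumed set (the slide's table is missing from this transcript) chosen because it reproduces the coefficients quoted in the example that follows.

```python
def least_squares(x, y):
    """Return (a, b) for the line Y-hat = a + bX using the summation formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


# Assumed data (not in the transcript): sales calls and copiers sold.
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

a, b = least_squares(calls, copiers)
print(round(a, 4), round(b, 4))  # 18.9474 1.1842
```

With full precision the intercept comes out as 18.9474. The slides' 18.9476 is what results when b is first rounded to 1.1842 before computing a.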

    Regression Equation - Example

    Recall the example involving Copier Sales of America. The sales manager gathered information on the number of sales calls made and the number of copiers sold for a random sample of 10 sales representatives. Use the least squares method to determine a linear equation to express the relationship between the two variables.

    What is the expected number of copiers sold by a representative who made 20 calls?

    Finding the Regression Equation - Example

    The regression equation is:

    $$\hat{Y} = a + bX$$

    $$\hat{Y} = 18.9476 + 1.1842X$$

    $$\hat{Y} = 18.9476 + 1.1842(20) = 42.6316$$

    Computing the Estimates of Y

    Step 1 Using the regression equation, substitute the value of each X to solve for the estimated sales.

    Tom Keller:

    $$\hat{Y} = 18.9476 + 1.1842X = 18.9476 + 1.1842(20) = 42.6316$$

    Soni Jones:

    $$\hat{Y} = 18.9476 + 1.1842X = 18.9476 + 1.1842(30) = 54.4736$$
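
The substitution step can be sketched as a small function. The coefficients are the ones quoted in the slides; the function name `estimate` is illustrative.

```python
# Coefficients of the fitted line as quoted in the slides.
A = 18.9476
B = 1.1842


def estimate(calls):
    """Estimated copiers sold for a given number of sales calls."""
    return A + B * calls


# The two representatives worked through in the slides.
for name, calls in [("Tom Keller", 20), ("Soni Jones", 30)]:
    print(name, round(estimate(calls), 4))
```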

    Graphical Illustration of the Differences between Actual Y and Estimated Y, $(Y - \hat{Y})$

    Goodness-of-Fit

    The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).

    It is the square of the coefficient of correlation.

    It ranges from 0 to 1.

    It does not give any information on the direction of the relationship between the variables.

    Plotting the Estimated and the Actual Ys

    [Figure: actual values Ya plotted around the regression line Yc; the vertical distance Yc - Ya is the error e.]

    Coefficient of Determination (r²) - Example

    The coefficient of determination, r², is 0.576, found by (0.759)².

    This is a proportion or a percent; we can say that 57.6 percent of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls.
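
The "explained variation" reading can be checked by computing r² as 1 − SSE/SST instead of squaring r. This is a sketch under the assumption that the data are the ten pairs below; they are not in the transcript and were chosen because they reproduce the quoted figures.

```python
# Assumed data (not in the transcript): sales calls and copiers sold.
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

n = len(calls)
mean_x = sum(calls) / n
mean_y = sum(copiers) / n

# Least squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calls, copiers))
sxx = sum((x - mean_x) ** 2 for x in calls)
b = sxy / sxx
a = mean_y - b * mean_x

# Unexplained variation (SSE) and total variation (SST) in Y.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(calls, copiers))
sst = sum((y - mean_y) ** 2 for y in copiers)

r_squared = 1 - sse / sst
print(round(r_squared, 3))  # 0.576
```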

    The Standard Error of Estimate

    The standard error of estimate measures the scatter, or dispersion, of the observed values around the line of regression.

    The formulas that are used to compute the standard error:

    $$s_{y \cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

    $$s_{y \cdot x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$$

    Standard Error of the Estimate - Example

    Recall the example involving Copier Sales of America. The sales manager determined the least squares regression equation given below.

    $$\hat{Y} = 18.9476 + 1.1842X$$

    Determine the standard error of estimate as a measure of how well the values fit the regression line.

    $$s_{y \cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{784.211}{10 - 2}} = 9.901$$
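
Both forms of the standard error formula (the squared-residual form and the summation form using ΣY², ΣY and ΣXY) should give the same value. The sketch below checks this on an assumed data set, which is not in the transcript and was chosen because it reproduces the quoted 784.211 and 9.901.

```python
import math

# Assumed data (not in the transcript): sales calls and copiers sold.
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

n = len(calls)
sum_x, sum_y = sum(calls), sum(copiers)
sum_xy = sum(x * y for x, y in zip(calls, copiers))
sum_x2 = sum(x * x for x in calls)
sum_y2 = sum(y * y for y in copiers)

# Least squares coefficients from the summation formulas.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Form 1: sum of squared residuals.
sse_direct = sum((y - (a + b * x)) ** 2 for x, y in zip(calls, copiers))
# Form 2: computational shortcut.
sse_shortcut = sum_y2 - a * sum_y - b * sum_xy

s_direct = math.sqrt(sse_direct / (n - 2))
s_shortcut = math.sqrt(sse_shortcut / (n - 2))
print(round(s_direct, 3), round(s_shortcut, 3))  # 9.901 9.901
```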

    Assumptions Underlying Linear Regression

    For each value of X, there is a group of Y values, and these Y values are normally distributed.

    The means of these normal distributions of Y values all lie on the straight line of regression.

    The standard deviations of these normal distributions are equal.

    The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.

    Confidence Interval and Prediction Interval Estimates of Y

    A confidence interval reports an interval for the mean value of Y for a given X.

    A prediction interval reports the range of values of Y for a particular value of X.

    Confidence Interval Estimate - Example

    We return to the Copier Sales of America illustration. Determine a 95 percent confidence interval for all sales representatives who make 25 calls.

    Confidence Interval Estimate - Example

    Step 1 Compute the point estimate of Y

    In other words, determine the number of copiers we expect a sales representative to sell if he or she makes 25 calls.

    The regression equation is:

    $$\hat{Y} = 18.9476 + 1.1842X$$

    $$\hat{Y} = 18.9476 + 1.1842(25) = 48.5526$$

    Confidence Interval Estimate - Example

    Step 2 Find the value of t

    To find the t value, we need to first know the number of degrees of freedom. In this case the degrees of freedom is n - 2 = 10 - 2 = 8.

    We set the confidence level at 95 percent. To find the value of t, move down the left-hand column of Appendix B.2 to 8 degrees of freedom, then move across to the column with the 95 percent level of confidence.

    The value of t is 2.306.


    Confidence Interval Estimate - Example

    Confidence Interval Estimate - Example

    Step 4 Use the formula above by substituting the numbers computed in previous slides.

    Thus, the 95 percent confidence interval for the average sales of all sales representatives who make 25 calls is from 40.9170 up to 56.1882 copiers.
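
The interval formula itself is missing from this transcript. A sketch, assuming the standard confidence interval for the mean of Y at a given X, Ŷ ± t·s·sqrt(1/n + (X − X̄)²/Σ(X − X̄)²), reproduces the quoted limits. The values X̄ = 22 and Σ(X − X̄)² = 760 are assumptions taken from a data set that matches the slides' other figures; they do not appear in the transcript.

```python
import math


def confidence_interval(y_hat, t, s, n, x, mean_x, sxx):
    """Interval for the mean of Y at a given x (standard textbook form)."""
    margin = t * s * math.sqrt(1 / n + (x - mean_x) ** 2 / sxx)
    return y_hat - margin, y_hat + margin


low, high = confidence_interval(
    y_hat=48.5526,  # point estimate at 25 calls (from the slides)
    t=2.306,        # t value for 8 degrees of freedom (from the slides)
    s=9.901,        # standard error of estimate (from the slides)
    n=10,
    x=25,
    mean_x=22,      # assumed mean number of calls
    sxx=760,        # assumed sum of squared deviations of X
)
print(round(low, 3), round(high, 3))
```

Under these assumptions the limits come out at roughly 40.917 and 56.188, in line with the interval stated in the slide.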

    Prediction Interval Estimate - Example

    We return to the Copier Sales of America illustration. Determine a 95 percent prediction interval for Sheila Baker, a West Coast sales representative who made 25 calls.

    Prediction Interval Estimate - Example

    Step 1 Compute the point estimate of Y

    In other words, determine the number of copiers we expect a sales representative to sell if he or she makes 25 calls.

    The regression equation is:

    $$\hat{Y} = 18.9476 + 1.1842X$$

    $$\hat{Y} = 18.9476 + 1.1842(25) = 48.5526$$

    Prediction Interval Estimate - Example

    Step 2 Using the information computed earlier in the confidence interval estimation example, use the formula above.

    If Sheila Baker makes 25 sales calls, the number of copiers she will sell will be between about 24 and 73 copiers.
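
The prediction interval formula is also missing from the transcript. A sketch, assuming the standard form Ŷ ± t·s·sqrt(1 + 1/n + (X − X̄)²/Σ(X − X̄)²), gives limits close to the "about 24 and 73" quoted above. As before, X̄ = 22 and Σ(X − X̄)² = 760 are assumed values, not figures from the transcript.

```python
import math


def prediction_interval(y_hat, t, s, n, x, mean_x, sxx):
    """Interval for an individual Y at a given x (standard textbook form)."""
    # The extra "1 +" accounts for the scatter of a single observation.
    margin = t * s * math.sqrt(1 + 1 / n + (x - mean_x) ** 2 / sxx)
    return y_hat - margin, y_hat + margin


low, high = prediction_interval(
    y_hat=48.5526,  # point estimate at 25 calls (from the slides)
    t=2.306,        # t value for 8 degrees of freedom (from the slides)
    s=9.901,        # standard error of estimate (from the slides)
    n=10,
    x=25,
    mean_x=22,      # assumed mean number of calls
    sxx=760,        # assumed sum of squared deviations of X
)
print(round(low, 2), round(high, 2))
```

The prediction interval is much wider than the confidence interval for the same X because it covers one representative's sales rather than the average over all representatives who make 25 calls.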

    Transforming Data

    The coefficient of correlation describes the strength of the linear relationship between two variables. It could be that two variables are closely related, but their relationship is not linear.

    Be cautious when you are interpreting the coefficient of correlation. A value of r may indicate there is no linear relationship, but it could be there is a relationship of some other nonlinear or curvilinear form.

    Transforming Data - Example

    On the right is a listing of 22 professional golfers, the number of events in which they participated, the amount of their winnings, and their mean score for the 2004 season. In golf, the objective is to play 18 holes in the least number of strokes. So, we would expect that those golfers with the lower mean scores would have the larger winnings. To put it another way, score and winnings should be inversely related.

    In 2004 Tiger Woods played in 19 events, earned $5,365,472, and had a mean score per round of 69.04. Fred Couples played in 16 events, earned $1,396,109, and had a mean score per round of 70.92. The data for the 22 golfers follows.

    Scatterplot of Golf Data

    The correlation between the variables Winnings and Score is -0.782. This is a fairly strong inverse relationship.

    However, when we plot the data on a scatter diagram the relationship does not appear to be linear; it does not seem to follow a straight line.

    What can we do to explore other (nonlinear) relationships?

    One possibility is to transform one of the variables. For example, instead of using Y as the dependent variable, we might use its log, reciprocal, square, or square root. Another possibility is to transform the independent variable in the same way. There are other transformations, but these are the most common.

    Transforming Data - Example

    In the golf winnings example, changing the scale of the dependent variable is effective. We determine the log of each golfer's winnings and then find the correlation between the log of winnings and score. That is, we find the log to the base 10 of Tiger Woods' earnings of $5,365,472, which is 6.72961.

    Using the Transformed Equation for Estimation

    Based on the regression equation, a golfer with a mean score of 70 could expect to earn:

    The value 6.4372 is the log to the base 10 of winnings. The antilog of 6.4372 is approximately 2,736,528, so a golfer who had a mean score of 70 could expect to earn $2,736,528.
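
The log transform and its inverse (the antilog) can be sketched with the two figures quoted in the slides. The fitted equation itself is not in the transcript, so the code below only converts between dollars and base-10 logs; the variable names are illustrative.

```python
import math

# Forward transform: winnings in dollars to base-10 log.
tiger_winnings = 5_365_472
log_winnings = math.log10(tiger_winnings)
print(round(log_winnings, 5))  # 6.72961

# Back-transform: predicted log winnings (quoted in the slides) to dollars.
predicted_log = 6.4372
predicted_winnings = 10 ** predicted_log
print(int(predicted_winnings))
```

Because the regression is fitted to log winnings, any estimate it produces has to be raised as a power of 10 before it can be read as a dollar amount.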


    Appendix


    Computing the Slope of the Line


    Computing the Y-Intercept


    Standard Error of the Estimate - Excel


    Scatter Plot of Transformed Y

    Linear Regression Using the Transformed Y