forecasting using the simple linear regression model and correlation

45
Forecasting Using the Simple Linear Regression Model and Correlation

Upload: gianna-wellington

Post on 14-Dec-2015

276 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Forecasting Using the Simple Linear Regression Model and Correlation

Forecasting Using the Simple Linear Regression

Model and Correlation

Page 2: Forecasting Using the Simple Linear Regression Model and Correlation

What is a forecast?

Using a statistical method on past data to predict the future.

Using experience, judgment and surveys to predict the future.

Page 3: Forecasting Using the Simple Linear Regression Model and Correlation

Why forecast?

to enhance planning.to force thinking about the future.to fit corporate strategy to future conditions. to coordinate departments to the same future.to reduce corporate costs.

Page 4: Forecasting Using the Simple Linear Regression Model and Correlation

Kinds of Forecasts

Causal forecasts are when changes in a variable (Y) you wish to predict are caused by changes in other variables (X's).

Time series forecasts are when changes in a variable (Y) are predicted based on prior values of itself (Y).

Regression can provide both kinds of forecasts.

Page 5: Forecasting Using the Simple Linear Regression Model and Correlation

Types of Relationships

Positive Linear Relationship Negative Linear Relationship

Page 6: Forecasting Using the Simple Linear Regression Model and Correlation

Types of Relationships

Relationship NOT Linear No Relationship

(continued)

Page 7: Forecasting Using the Simple Linear Regression Model and Correlation

Relationships

If the relationship is not linear, the forecaster often has to use math transformations to make the relationship linear.

Page 8: Forecasting Using the Simple Linear Regression Model and Correlation

Correlation Analysis

•Correlation measures the strength of the linear relationship between variables.

•It can be used to find the best predictor variables.

• It does not assure that there is a causal relationship between the variables.

Page 9: Forecasting Using the Simple Linear Regression Model and Correlation

The Correlation Coefficient

• Ranges between -1 and 1.

• The Closer to -1, The Stronger Is The Negative Linear Relationship.

• The Closer to 1, The Stronger Is The Positive Linear Relationship.

• The Closer to 0, The Weaker Is Any Linear Relationship.

Page 10: Forecasting Using the Simple Linear Regression Model and Correlation

Y

X

Y

X

Y

X

Y

X

Y

X

Graphs of Various Correlation (r) Values

r = -1 r = -.6 r = 0

r = .6 r = 1

Page 11: Forecasting Using the Simple Linear Regression Model and Correlation

The Scatter Diagram

0

20

40

60

80

0 20 40 60

X

Y

Plot of all (Xi , Yi) pairs

Page 12: Forecasting Using the Simple Linear Regression Model and Correlation

The Scatter Diagram

Is used to visualize the relationship and to assess its linearity. The scatter diagram can also be used to identify outliers.

Page 13: Forecasting Using the Simple Linear Regression Model and Correlation

Regression Analysis

Regression Analysis can be used to model causality and make predictions.

Terminology: The variable to be predicted is called the dependent or response variable. The variables used in the prediction model are called independent, explanatory or predictor variables.

Page 14: Forecasting Using the Simple Linear Regression Model and Correlation

Simple Linear Regression Model

•The relationship between variables is described by a linear function.•A change of one variable causes the other variable to change.

Page 15: Forecasting Using the Simple Linear Regression Model and Correlation

• Population Regression Line Is A Straight Line that Describes The Dependence of One Variable on The Other

Population Y intercept

Population SlopeCoefficient

Random Error

Dependent (Response) Variable

Independent (Explanatory) Variable

Population Linear Regression

ii iY X PopulationRegressionLine

Page 16: Forecasting Using the Simple Linear Regression Model and Correlation

= Random Error

Y

X

How is the best line found?

Observed Value

Observed Value

YX iX

i

ii iY X

Page 17: Forecasting Using the Simple Linear Regression Model and Correlation

Sample Linear Regression

Sample Y Intercept

SampleSlopeCoefficient

Sample Regression Line Provides an Estimate of The Population Regression Line

0 1i iib bY X e

provides an estimate of

provides an estimate of

0b

1b

Sample Regression Line

Residual

Y

Page 18: Forecasting Using the Simple Linear Regression Model and Correlation

Simple Linear Regression: An Example

You wish to examine the relationship between the square footage of produce stores and their annual sales. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best

Annual Store Square Sales

Feet ($1000)

1 1,726 3,681

2 1,542 3,395

3 2,816 6,653

4 5,555 9,543

5 1,292 3,318

6 2,208 5,563

7 1,313 3,760

Page 19: Forecasting Using the Simple Linear Regression Model and Correlation

The Scatter Diagram

0

2000

4000

6000

8000

10000

12000

0 1000 2000 3000 4000 5000 6000

Square Feet

An

nu

al

Sa

les

($00

0)

Excel Output

Page 20: Forecasting Using the Simple Linear Regression Model and Correlation

The Equation for the Regression Line

i

ii

X..

XbbY

4871415163610

From Excel Printout:

CoefficientsIntercept 1636.414726X Variable 1 1.486633657

Page 21: Forecasting Using the Simple Linear Regression Model and Correlation

Graph of the Regression Line

0

2000

4000

6000

8000

10000

12000

0 1000 2000 3000 4000 5000 6000

Square Feet

An

nu

al

Sa

les

($00

0)

Y i = 1636.415 +1.487X i

Page 22: Forecasting Using the Simple Linear Regression Model and Correlation

Interpreting the Results

Yi = 1636.415 +1.487Xi

The slope of 1.487 means that each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.

The model estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1487.

Page 23: Forecasting Using the Simple Linear Regression Model and Correlation

The Coefficient of Determination

SSR regression sum of squares

SST total sum of squaresr2 = =

The Coefficient of Determination (r2 ) measures the proportion of variation in Y explained by the independent variable X.

Page 24: Forecasting Using the Simple Linear Regression Model and Correlation

Coefficients of Determination (R2) and Correlation (R)

r2 = 1,

Y

Yi = b0 + b1Xi

X

^

r = +1

Page 25: Forecasting Using the Simple Linear Regression Model and Correlation

Coefficients of Determination (R2) and Correlation (R)

r2 = .81, r = +0.9

Y

Yi = b0 + b1Xi

X

^

(continued)

Page 26: Forecasting Using the Simple Linear Regression Model and Correlation

Coefficients of Determination (R2) and Correlation (R)

r2 = 0, r = 0

Y

Yi = b0 + b1Xi

X

^

(continued)

Page 27: Forecasting Using the Simple Linear Regression Model and Correlation

Coefficients of Determination (R2) and Correlation (R)

r2 = 1, r = -1

Y

Yi = b0 + b1Xi

X

^

(continued)

Page 28: Forecasting Using the Simple Linear Regression Model and Correlation

Correlation: The Symbols

• Population correlation coefficient (‘rho’) measures the strength between two variables.

• Sample correlation coefficient r estimates based on a set of sample observations.

Page 29: Forecasting Using the Simple Linear Regression Model and Correlation

Regression StatisticsMultiple R 0.9705572R Square 0.94198129Adjusted R Square 0.93037754Standard Error 611.751517Observations 7

Example: Produce Stores

From Excel Printout

Page 30: Forecasting Using the Simple Linear Regression Model and Correlation

Inferences About the Slope

• t Test for a Population SlopeIs There A Linear Relationship between X and Y ?

1

11

bS

bt

•Test Statistic:

n

ii

YXb

)XX(

SS

1

21

and df = n - 2

• Null and Alternative Hypotheses

H0: 1 = 0 (No Linear Relationship) H1: 1 0 (Linear Relationship)

Where

Page 31: Forecasting Using the Simple Linear Regression Model and Correlation

Example: Produce Stores

Data for 7 Stores: Estimated Regression Equation:

The slope of this model is 1.487.

Is Square Footage of the store affecting its Annual Sales?

Annual Store Square Sales

Feet ($000)

1 1,726 3,681

2 1,542 3,395

3 2,816 6,653

4 5,555 9,543

5 1,292 3,318

6 2,208 5,563

7 1,313 3,760

Yi = 1636.415 +1.487Xi

Page 32: Forecasting Using the Simple Linear Regression Model and Correlation

t Stat P-valueIntercept 3.6244333 0.0151488X Variable 1 9.009944 0.0002812

H0: 1 = 0

H1: 1 0

.05

df 7 - 2 = 5

Critical value(s):

Test Statistic:

Decision:

Conclusion:

There is evidence of a linear relationship.t0 2.5706-2.5706

.025

Reject Reject

.025

From Excel Printout

Reject H0

Inferences About the Slope: t Test Example

Page 33: Forecasting Using the Simple Linear Regression Model and Correlation

Inferences About the Slope Using A Confidence Interval

Confidence Interval Estimate of the Slopeb1 tn-2 1bS

Excel Printout for Produce Stores

At 95% level of Confidence The confidence Interval for the slope is (1.062, 1.911). Does not include 0.

Conclusion: There is a significant linear relationship between annual sales and the size of the store.

Lower 95% Upper 95%Intercept 475.810926 2797.01853X Variable 11.06249037 1.91077694

Page 34: Forecasting Using the Simple Linear Regression Model and Correlation

Residual Analysis

Is used to evaluate validity of assumptions. Residual analysis uses numerical measures and plots to assure the validity of the assumptions.

Page 35: Forecasting Using the Simple Linear Regression Model and Correlation

Linear Regression Assumptions

1. X is linearly related to Y.

2. The variance is constant for each value of Y (Homoscedasticity).

3. The Residual Error is Normally Distributed.

4. If the data is over time, then the errors must be independent.

Page 36: Forecasting Using the Simple Linear Regression Model and Correlation

Residual Analysis for Linearity

Not Linear Linear

X

e eX

Y

X

Y

X

Page 37: Forecasting Using the Simple Linear Regression Model and Correlation

Residual Analysis for Homoscedasticity

Heteroscedasticity Homoscedasticity

e

X

e

X

Y

X X

Y

Page 38: Forecasting Using the Simple Linear Regression Model and Correlation

Residual Analysis for Independence: The Durbin-Watson Statistic

It is used when data is collected over time.

It detects autocorrelation; that is, the residuals in one time period are related to residuals in another time period.

It measures violation of independence assumption.

n

ii

n

iii

e

)ee(D

1

2

2

21 Calculate D and

compare it to the value in Table E.8.

Page 39: Forecasting Using the Simple Linear Regression Model and Correlation

Preparing Confidence Intervals for Forecasts

Page 40: Forecasting Using the Simple Linear Regression Model and Correlation

Interval Estimates for Different Values of X

X

Y

X

Confidence Interval for a individual Yi

A Given X

Confidence Interval for the mean of Y

Y i = b0 + b1X i

_

Page 41: Forecasting Using the Simple Linear Regression Model and Correlation

Estimation of Predicted Values

Confidence Interval Estimate for YX

The Mean of Y given a particular Xi

n

ii

iyxni

)XX(

)XX(

nStY

1

2

2

21

t value from table with df=n-2

Standard error of the estimate

Size of interval vary according to distance away from mean, X.

Page 42: Forecasting Using the Simple Linear Regression Model and Correlation

Estimation of Predicted Values

Confidence Interval Estimate for Individual Response Yi at a Particular Xi

n

ii

iyxni

)XX(

)XX(

nStY

1

2

2

21

1

Addition of 1 increases width of interval from that for the mean of Y

Page 43: Forecasting Using the Simple Linear Regression Model and Correlation

Example: Produce Stores

Yi = 1636.415 +1.487Xi

Data for 7 Stores:

Regression Model Obtained:

Predict the annual sales for a store with

2000 square feet.

Annual Store Square Sales

Feet ($000)

1 1,726 3,681

2 1,542 3,395

3 2,816 6,653

4 5,555 9,543

5 1,292 3,318

6 2,208 5,563

7 1,313 3,760

Page 44: Forecasting Using the Simple Linear Regression Model and Correlation

Estimation of Predicted Values: Example

Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet

n

ii

iyxni

)XX(

)XX(

nStY

1

2

2

21

Predicted Sales Yi = 1636.415 +1.487Xi = 4610.45 ($000)

X = 2350.29 SYX = 611.75 tn-2 = t5 = 2.5706

= 4610.45 612.66

Confidence interval for mean Y

Confidence Interval Estimate for YX

Page 45: Forecasting Using the Simple Linear Regression Model and Correlation

Estimation of Predicted Values: Example

Find the 95% confidence interval for annual sales of one particular store of 2,000 square feet

Predicted Sales Yi = 1636.415 +1.487Xi = 4610.45 ($000)

X = 2350.29 SYX = 611.75 tn-2 = t5 = 2.5706

= 4610.45 1687.68

Confidence interval for individual Y

n

ii

iyxni

)XX(

)XX(

nStY

1

2

2

21

1

Confidence Interval Estimate for Individual Y