slides 13c: causal models and regression analysis mgs3100 chapter 13 forecasting

Slides 13c: Causal Models and

Regression Analysis

MGS3100 Chapter 13Forecasting

In a In a causal forecastingcausal forecasting model, the forecast model, the forecast for the quantity of interest “rides for the quantity of interest “rides piggyback” on another quantity or set of piggyback” on another quantity or set of quantities.quantities.In other words, our knowledge of the value In other words, our knowledge of the value of one variable (or perhaps several of one variable (or perhaps several variables) enables us to forecast the value variables) enables us to forecast the value of another variable.of another variable.In this model, letIn this model, let

yy denote the true value of some denote the true value of some variable of interest andvariable of interest and

yy denote a predicted or forecast value denote a predicted or forecast value for that variable.for that variable.

^̂

Then, in a causal model, Then, in a causal model,

wherewhere

ff is a forecasting rule, or function, and is a forecasting rule, or function, and

xx11, , xx2 2 , … , … xxi i , is a set of variables, is a set of variables

yy = = ff((xx11, , xx22, … , … xxnn))^̂

In this representation, the In this representation, the xx variables are variables are often called often called independent variablesindependent variables, whereas , whereas yy is the is the dependent dependent or or response variableresponse variable..

^̂

We either know the independent variables We either know the independent variables in advance or can forecast them more easily in advance or can forecast them more easily than than yy. .

^̂

Then the independent variables will be used Then the independent variables will be used in the forecasting model to forecast the in the forecasting model to forecast the dependent variable.dependent variable.

Companies often find by looking at past Companies often find by looking at past performance that their monthly sales are performance that their monthly sales are directly related to the monthly GDP, and directly related to the monthly GDP, and thus figure that a good forecast could be thus figure that a good forecast could be made using next month’s GDP figure.made using next month’s GDP figure.

The only problem is that this quantity is not The only problem is that this quantity is not known, or it may just be a forecast and thus known, or it may just be a forecast and thus not a truly independent variable.not a truly independent variable.

To use a causal forecasting model, requires To use a causal forecasting model, requires two conditions:two conditions:

1.1. There must be a relationship between There must be a relationship between values of the independent and values of the independent and dependent variables such that the dependent variables such that the former provides information about the former provides information about the latter.latter.

2.2. The values for the independent The values for the independent variables must be known and variables must be known and available to the forecaster at the time available to the forecaster at the time the forecast is made.the forecast is made.

Simply because there is a mathematical Simply because there is a mathematical relationship does not guarantee that there relationship does not guarantee that there is really cause and effect. is really cause and effect.

One commonly used approach in creating a One commonly used approach in creating a causal forecasting model is called causal forecasting model is called curve curve fittingfitting..

Consider an oil company that is planning to Consider an oil company that is planning to expand its network of modern self-service expand its network of modern self-service gasoline stations. gasoline stations.

CURVE FITTING: CURVE FITTING: AN OIL COMPANY EXPANSIONAN OIL COMPANY EXPANSION

The company plans to use traffic flow The company plans to use traffic flow (measured in the average number of cars (measured in the average number of cars per hour) to forecast sales (measured in per hour) to forecast sales (measured in average dollar sales per hour).average dollar sales per hour).

The firm has had five stations in operation The firm has had five stations in operation for more than a year and has used historical for more than a year and has used historical data to calculate the following averages:data to calculate the following averages:

The averages are plotted in a scatter The averages are plotted in a scatter diagram.diagram.

$-

$50.00

$100.00

$150.00

$200.00

$250.00

$300.00

0 50 100 150 200 250

Cars/hour

Sal

es

/ho

ur

($)

Now, these data will be used to construct a Now, these data will be used to construct a function that will be used to forecast sales function that will be used to forecast sales at any proposed location by measuring the at any proposed location by measuring the traffic flow at that location and plugging its traffic flow at that location and plugging its value into the constructed function.value into the constructed function.

Least Squares FitsLeast Squares Fits The method of least The method of least squares is a formal procedure for curve squares is a formal procedure for curve fitting. It is a two-step process.fitting. It is a two-step process.

1.1. Select a specific functional form (e.g., Select a specific functional form (e.g., a straight line or quadratic curve).a straight line or quadratic curve).

2.2. Within the set of functions specified Within the set of functions specified in step 1, choose the specific function in step 1, choose the specific function that minimizes the sum of the squared that minimizes the sum of the squared deviations between the data points deviations between the data points and the function values.and the function values.

To demonstrate the process, consider the To demonstrate the process, consider the sales-traffic flow example.sales-traffic flow example.

1.1. Assume a straight line; that is, Assume a straight line; that is, functions of the form functions of the form y = a + bxy = a + bx..

2.2. Draw the line in the scatter diagram Draw the line in the scatter diagram and indicate the deviations between and indicate the deviations between observed points and the function as observed points and the function as ddi i ..

dd11 = = yy11 – [a +b – [a +bxx11] = 220 – [a + 150b] ] = 220 – [a + 150b] For example,For example,

wherewhereyy1 1 = actual sales/hr at location 1= actual sales/hr at location 1xx11 = actual traffic flow at location 1 = actual traffic flow at location 1aa = = yy-axis intercept for the function-axis intercept for the functionbb = slope for the function = slope for the function

The value The value dd1122 is one measure of how close is one measure of how close

the value of the function the value of the function [a +b[a +bxx11] ] is to the is to the observed value, observed value, yy11; that is it indicates how ; that is it indicates how well the function fits at this one point.well the function fits at this one point.

$-

$50.00

$100.00

$150.00

$200.00

$250.00

$300.00

0 50 100 150 200 250

Cars/hour

Sal

es

/ho

ur

($)

dd22

dd55

dd44

dd11

dd33

y y == a a + + bxbx

yy

xx

One measure of how well the function fits One measure of how well the function fits overall is the sum of the squared overall is the sum of the squared deviations:deviations:

ddii22

i=1i=1

55

Consider a general model with Consider a general model with nn as opposed as opposed to five observations. Since each to five observations. Since each ddi i = = yyii – (a – (a +b+bxxii)), the sum of the squared deviations , the sum of the squared deviations can be written as:can be written as:

i=1i=1

nn

((yyii – [a +b – [a +bxxii])])22

Using the method of least squares, select Using the method of least squares, select aa and and bb so as to minimize the sum in the so as to minimize the sum in the equation above.equation above.

Now, take the partial derivative of the sum Now, take the partial derivative of the sum with respect to with respect to aa and set the resulting and set the resulting expression equal to zero.expression equal to zero.

i=1i=1

nn

-2(-2(yyii – [a +b – [a +bxxii]) = 0]) = 0

A second equation is derived by following A second equation is derived by following the same procedure with the same procedure with bb..

i=1i=1

nn

-2-2xxii ( (yyii – [a +b – [a +bxxii]) = 0]) = 0

Recall that the values for Recall that the values for xxii and and yyii are the are the observations, and our goal is to find the observations, and our goal is to find the values of values of aa and and bb that satisfy these two that satisfy these two equations. equations.

The solution is:The solution is:

xxii

i=1i=1

nn

xxiiyyi i --b =b =

11nn

i=1i=1

nn

xxii i=1i=1

nn

yyii

i=1i=1

nn

xxii22 --

11nn

i=1i=1

nn22

aa == 11nn

i=1i=1

nn

yyii - b- b 11nn

i=1i=1

nn

xxii

The next step is to determine the values The next step is to determine the values for:for:

i=1i=1

nn

xxii22

i=1i=1

nn

yyiii=1i=1

nn

xxii i=1i=1

nn

xxiiyyii

Note that these quantities depend only on Note that these quantities depend only on observed data and can be found with simple observed data and can be found with simple arithmetic operations or automatically using arithmetic operations or automatically using Excel’s predefined functions.Excel’s predefined functions.

Using Excel, click on Using Excel, click on Tools – Data Analysis …Tools – Data Analysis …

In the resulting In the resulting dialog, choose dialog, choose

RegressionRegression..

In the In the RegressionRegression dialog, enter the dialog, enter the YY-range -range and and XX-range.-range.

Choose Choose to place to place the the output in output in

a new a new worksheeworksheet called t called ResultsResultsSelect Select Residual Plots Residual Plots and and Normal Normal Probability PlotsProbability Plots to be created along with to be created along with the output.the output.

Click Click OKOK to produce the following results: to produce the following results:

Note that Note that a a ((InterceptIntercept) and ) and bb ( (X Variable 1X Variable 1) ) are reported as are reported as 57.10457.104 and and 0.929970.92997, , respectively.respectively.

To add the resulting least squares line, first To add the resulting least squares line, first click on the worksheet click on the worksheet Chart 1Chart 1 which which contains the original scatter plot.contains the original scatter plot.

Next, click on the data series so that they Next, click on the data series so that they are highlighted and then choose are highlighted and then choose Add Add Trendline …Trendline … from the from the ChartChart pull-down menu. pull-down menu.

Choose Choose Linear TrendLinear Trend in the resulting dialog in the resulting dialog and click and click OKOK..

A linear trend is fit to the data:A linear trend is fit to the data:

$-

$50.00

$100.00

$150.00

$200.00

$250.00

$300.00

0 50 100 150 200 250

Cars/hour

Sal

es

/ho

ur

($)

Series1

Linear (Series1)

One of the other summary output values One of the other summary output values that is given in Excel is: that is given in Excel is: R Square = 69.4%R Square = 69.4%

This is a “goodness of fit” measure which This is a “goodness of fit” measure which represents the represents the RR22 statistic discussed in statistic discussed in introductory statistics classes.introductory statistics classes.

RR22 ranges in value from ranges in value from 00 to to 11 and gives an and gives an indication of how much of the total variation indication of how much of the total variation in in YY from its mean is explained by the new from its mean is explained by the new trend line.trend line.In fact, there are three different sums of In fact, there are three different sums of errors:errors:

TSSTSS (Total Sum of Squares) (Total Sum of Squares)

ESSESS (Error Sum of Squares) (Error Sum of Squares)

RSSRSS (Regression Sum of Squares) (Regression Sum of Squares)

The basic relationship between them is:The basic relationship between them is:

TSS = ESS + RSSTSS = ESS + RSS

They are defined as follows:They are defined as follows:

TSS TSS ==

i=1i=1

nn

((YYii – – Y Y ))22––

ESS ESS ==

i=1i=1

nn

((YYii – – YYii ))22^̂

i=1i=1

nn

((YYii – – Y Y ))22^̂ ––RSS RSS ==

Essentially, the Essentially, the ESSESS is the amount of is the amount of variation that can’t be explained by the variation that can’t be explained by the regression.regression.The The RSSRSS quantity is effectively the amount quantity is effectively the amount of the original, total variation (of the original, total variation (TSSTSS) that ) that could be removed using the regression line.could be removed using the regression line.

If the regression line fits perfectly, then If the regression line fits perfectly, then ESS ESS = 0= 0 and and RSS = TSSRSS = TSS, resulting in , resulting in RR22 = 1= 1..

RR22 = =RSSRSSTSSTSS

RR22 is defined as: is defined as:

In this example, In this example, RR22 = .694 = .694 which means that which means that approximately approximately 70%70% of the variation in the of the variation in the YY values is explained by the one explanatory values is explained by the one explanatory variable (variable (XX), cars per hour.), cars per hour.

Now, returning to the original question: Now, returning to the original question: Should we build a station at Buffalo Grove Should we build a station at Buffalo Grove where traffic is 183 cars/hour?where traffic is 183 cars/hour?

The best guess at what the corresponding The best guess at what the corresponding sales volume would be is found by placing sales volume would be is found by placing this this XX value into the new regression value into the new regression equation:equation:

Sales/hour = 57.104 + 0.92997 * (183 Sales/hour = 57.104 + 0.92997 * (183 cars/hour)cars/hour)

However, it would be nice to be able to However, it would be nice to be able to state a 95% confidence interval around state a 95% confidence interval around this best guess. this best guess.

yy = a + b * = a + b * xx^̂

= $227.29= $227.29

Excel reports that the Excel reports that the standard error (standard error (SSee) is ) is 44.1844.18. . This quantity This quantity represents the represents the amount of scatter in amount of scatter in the actual data the actual data around the regression around the regression line.line.

We can get the information to do this from We can get the information to do this from Excel’s Excel’s Summary OutputSummary Output..

The formula for The formula for SSee is: is:

SSee = =i=1i=1

nn

((YYii – – YYii ))22^̂

nn – – kk - -11

Where Where nn is the is the number of data number of data points (e.g., points (e.g., 55) and ) and kk is the number of is the number of independent independent variables (e.g., variables (e.g., 11).).

This equation is equivalent to:This equation is equivalent to:nn – – kk - -11

ESSESS

Once we know Once we know SSee and based on the normal and based on the normal distribution, we can state that distribution, we can state that

• We have We have 68%68% confidence that the confidence that the actual value of sales/hour is within actual value of sales/hour is within ++ 11 SSee of the predicted value of the predicted value (($277.29$277.29).).• We have We have 95%95% confidence that the confidence that the actual value of sales/hour is within actual value of sales/hour is within ++ 22 SSee of the predicted value of the predicted value (($277.29$277.29). ).

[[277.29 – 2(44.18)277.29 – 2(44.18);; 227.29 + 2(44.18) 227.29 + 2(44.18)]]

[[$138.93$138.93;; $315.65 $315.65]]

The The 95%95% confidence interval is: confidence interval is:

Another value of interest in the Another value of interest in the SummarySummary report is the report is the tt-statistic for the -statistic for the XX variable variable and its associated values.and its associated values.

The The tt-statistic is -statistic is 2.612.61 and the and the PP-value is -value is 0.07980.0798. . A A PP-value less than -value less than 0.050.05 represents that we represents that we have at least have at least 95%95% confidence that the slope confidence that the slope parameter (parameter (bb) is statistically significantly ) is statistically significantly than than 00 (zero). (zero). A slope of A slope of 00 results in a flat trend line and results in a flat trend line and indicates no relationship between indicates no relationship between YY and and XX..

The The 95%95% confidence limit for confidence limit for bb is [ is [-0.205-0.205;; 2.0642.064]]Thus, we can’t exclude the possibility that Thus, we can’t exclude the possibility that the true value of the true value of bb might be might be 00..

Also given in the Also given in the SummarySummary report is the report is the F F –significance. Since there is only one –significance. Since there is only one independent variable, the independent variable, the F F –significance is –significance is identical to the identical to the PP-value for the -value for the t-t-statistic.statistic.

In the case of more than one In the case of more than one XX variable, the variable, the FF –significance tests the hypothesis that all –significance tests the hypothesis that all the the XX variable parameters as a group are variable parameters as a group are statistically significantly different than zero.statistically significantly different than zero.

Concerning multiple regression models, as Concerning multiple regression models, as you add other you add other XX variables, the variables, the RR22 statistic statistic will always increase, meaning the will always increase, meaning the RSSRSS has has increased.increased. In this case, the In this case, the

Adjusted Adjusted RR22 statistic is a statistic is a reliable indicator of the reliable indicator of the

true goodness of fit true goodness of fit because it compensates because it compensates for the reduction in the for the reduction in the ESSESS due to the addition due to the addition

of more independent of more independent variables.variables.Thus, it may report a decreased adjusted Thus, it may report a decreased adjusted RR22

value even though value even though RR22 has increased, unless has increased, unless the improvement in the improvement in RSSRSS is more than is more than compensated for by the addition of the new compensated for by the addition of the new independent variables.independent variables.

If, for example, a quadratic function fits If, for example, a quadratic function fits better than a linear function, why not better than a linear function, why not choose a more general form, thereby choose a more general form, thereby getting an even better fit?getting an even better fit?

WHICH CURVE TO FIT?WHICH CURVE TO FIT?

In practice, functions of the form (with only In practice, functions of the form (with only a single independent variable for a single independent variable for illustrative purposes) are often suggested:illustrative purposes) are often suggested:

yy = a = a00 + a + a11xx + a + a22xx2 2 + … + a+ … + annxxnn

Such a function is called a Such a function is called a polynomial of polynomial of degree degree nn, and it represents a broad and , and it represents a broad and flexible class of functions.flexible class of functions.

n n = 2= 2quadraticquadraticn n = 3= 3 cubiccubicn n = 4= 4 quarticquartic……

One must proceed with caution when fitting One must proceed with caution when fitting data with a polynomial function.data with a polynomial function.

For example, it is possible to find a (For example, it is possible to find a (kk – 1 – 1)-)-degree polynomial that will perfectly fit degree polynomial that will perfectly fit kk data points.data points.To be more specific, suppose we have seven To be more specific, suppose we have seven historical observations, denoted historical observations, denoted

((xxi i , , yyii), ), ii = 1, 2, …, 7 = 1, 2, …, 7

It is possible to find a sixth-degree It is possible to find a sixth-degree polynomialpolynomialyy = a = a00 + a + a11xx + a + a22xx2 2 + … + a+ … + a66xx66

that exactly passes through each of these that exactly passes through each of these seven data points.seven data points.

A perfect fit gives zero for the sum of A perfect fit gives zero for the sum of squared deviations.squared deviations.

However, However, this is this is

deceptive, deceptive, for it does for it does not imply not imply

much much about the about the predictive predictive

value of value of the model the model for use in for use in

future future forecastinforecastin

g. g.

Despite the perfect fit of the polynomial Despite the perfect fit of the polynomial function, the forecast is very inaccurate. function, the forecast is very inaccurate. The linear fit might provide more realistic The linear fit might provide more realistic forecasts.forecasts. Also, note Also, note

that the that the polynomial polynomial

fit has fit has hazardous hazardous

extrapolatiextrapolation on

properties properties (i.e., the (i.e., the

polynomial polynomial “blows up” “blows up”

at its at its extremes).extremes).

Reliability and Validity • Does the model make intuitive sense? Is the

model easy to understand and interpret?• Are the coefficients statistically significant

(p-values less than .05)?• Are the signs associated with the coefficients

as expected?• Does the model predict values that are

reasonably close to the actual values?• Is the model sufficiently sound (high R2, low

standard error, etc.)?

Correlation Coefficient and Coefficient of Determination

• Coefficient of determination = r2.

• Correlation coefficient = r.

• Where: Yi = dependent variable.

• Xi = independent variable.

• n = number of observations.

2 2 2 2[ ( ) ][ ( ) ]

i i i i

i i i i

n X Y X Yr

n X X Y Y

Correlation Coefficient and Coefficient of Determination

Summary: Causal Forecasting Models

• The goal of causal forecasting model is to develop the best statistical relationship between a dependent variable and one or more independent variables.

• The most common model approach used in practice is regression analysis. Only linear regression models are examined in this course.

• In causal forecasting models, when one tries to predict a dependent variable using a single independent variable, it is called a simple regression model.

• When one uses more than one independent variable to forecast the dependent variable, it is called a multiple regression model.

slides 13c: causal models and regression analysis mgs3100 chapter 13 forecasting

Documents