
Upload: vanxuyen

Post on 07-Feb-2018


4. Simple regression

QBUS6840 Predictive Analytics

https://www.otexts.org/fpp/4

QBUS6840 Predictive Analytics 4. Simple regression 1/41


Outline

The simple linear model

Least squares estimation

Forecasting with regression

Non-linear functional forms

Regression with time series data
    Scenario based forecasting
    Ex-ante versus ex-post forecasts
    Linear trend
    Residual autocorrelation
    Spurious regression


The simple linear model I

- In this chapter, the forecast and predictor variables are assumed to be related by the simple linear model:

y = β0 + β1x + ε.

- An example of data from such a model is shown in Figure 4.1. The parameters β0 and β1 determine the intercept and the slope of the line respectively. The intercept β0 represents the predicted value of y when x = 0. The slope β1 represents the predicted increase in y resulting from a one unit increase in x.


The simple linear model II

Figure 4.1: An example of data from a linear regression model.


The simple linear model III

- Notice that the observations do not lie on the straight line but are scattered around it. We can think of each observation yi as consisting of the systematic or explained part of the model, β0 + β1xi, and the random error, εi. The error term does not imply a mistake, but a deviation from the underlying straight line model. It captures anything that may affect yi other than xi. We assume that these errors:

  1. have mean zero; otherwise the forecasts will be systematically biased.
  2. are not autocorrelated; otherwise the forecasts will be inefficient as there is more information to be exploited in the data.
  3. are unrelated to the predictor variable; otherwise there would be more information that should be included in the systematic part of the model.

- It is also useful to have the errors normally distributed with constant variance in order to produce prediction intervals and to perform statistical inference. While these additional conditions make the calculations simpler, they are not necessary for forecasting.


The simple linear model IV

- Another important assumption in the simple linear model is that x is not a random variable. If we were performing a controlled experiment in a laboratory, we could control the values of x (so they would not be random) and observe the resulting values of y. With observational data (including most data in business and economics) it is not possible to control the value of x, and hence we make this an assumption.
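The split of each observation into a systematic part and a random error can be made concrete with a small simulation. The sketch below is illustrative only: the parameter values, the evenly spaced design for x, and the function name `simulate_linear_model` are assumptions, not taken from the slides. It keeps x fixed (non-random) and draws normally distributed errors with mean zero and constant variance.

```python
import random
from statistics import fmean


def simulate_linear_model(n, beta0, beta1, sigma, seed=0):
    """Generate data from y = beta0 + beta1 * x + error with fixed x values."""
    rng = random.Random(seed)
    # x is chosen by the experimenter (evenly spaced on [0, 10]), so it is not random.
    xs = [10 * i / (n - 1) for i in range(n)]
    # Errors: independent draws with mean zero and constant variance sigma**2.
    errors = [rng.gauss(0.0, sigma) for _ in range(n)]
    # Each observation is the systematic part plus its error.
    ys = [beta0 + beta1 * x + e for x, e in zip(xs, errors)]
    return xs, ys, errors


# Hypothetical parameter values, chosen only for illustration.
xs, ys, errors = simulate_linear_model(n=200, beta0=2.0, beta1=0.5, sigma=1.0)
print("mean of simulated errors:", round(fmean(errors), 3))
```

Plotting `ys` against `xs` would give a scatter around the line 2.0 + 0.5x, similar in spirit to Figure 4.1.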


Least squares estimation I

- In practice, of course, we have a collection of observations but we do not know the values of β0 and β1. These need to be estimated from the data. We call this fitting a line through the data.

- There are many possible choices for β0 and β1, each choice giving a different line. The least squares principle provides a way of choosing β0 and β1 effectively by minimizing the sum of the squared errors. That is, we choose the values of β0 and β1 that minimize

$$\sum_{i=1}^{N} \varepsilon_i^2 = \sum_{i=1}^{N} (y_i - \beta_0 - \beta_1 x_i)^2.$$


Least squares estimation II

- Using calculus, it can be shown that the resulting least squares estimators are

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},$$

where $\bar{x}$ is the average of the x observations and $\bar{y}$ is the average of the y observations. The estimated line is known as the regression line and is shown in Figure 4.2.
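The two estimator formulas translate almost directly into code. The following is a minimal sketch using only the Python standard library; the function name `least_squares` and the five data points are made up for illustration.

```python
from statistics import fmean


def least_squares(xs, ys):
    """Return (beta0_hat, beta1_hat) for the simple linear model."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Numerator: sum of (y_i - y_bar) * (x_i - x_bar).
    sxy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    # Denominator: sum of (x_i - x_bar)^2.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

For these points the sample means are 3 and 6.02, the numerator is 19.9 and the denominator is 10, so the slope is 1.99 and the intercept is 6.02 − 1.99 × 3 = 0.05.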


Least squares estimation III

Figure 4.2: Estimated regression line for a random sample of size N.


Least squares estimation IV

- We imagine that there is a true line denoted by y = β0 + β1x (shown as the dashed green line in Figure 4.2), but we do not know β0 and β1 so we cannot use this line for forecasting. Therefore we obtain estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ from the observed data to give the regression line (the solid purple line in Figure 4.2).

- The regression line is used for forecasting. For each value of x, we can forecast a corresponding value of y using $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.


Fitted values and residuals

- The forecast values of y obtained from the observed x values are called fitted values. We write these as $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, for i = 1, ..., N. Each $\hat{y}_i$ is the point on the regression line corresponding to observation $x_i$.

- The differences between the observed y values and the corresponding fitted values are the residuals:

$$e_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i.$$

- The residuals have some useful properties including the following two:

$$\sum_{i=1}^{N} e_i = 0 \quad \text{and} \quad \sum_{i=1}^{N} x_i e_i = 0.$$

- As a result of these properties, the average of the residuals is zero, and the correlation between the residuals and the observations of the predictor variable is also zero.
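Both residual properties can be checked numerically. The sketch below fits a line by least squares to a handful of made-up points (the data and the helper name `fit_line` are assumptions for illustration) and confirms that the two sums vanish up to floating-point rounding.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least squares intercept and slope for the simple linear model."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    slope = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


# Hypothetical observations.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
ys = [3.0, 2.5, 6.0, 8.5, 15.0]

b0, b1 = fit_line(xs, ys)
fitted = [b0 + b1 * x for x in xs]                # fitted values
residuals = [y - f for y, f in zip(ys, fitted)]   # e_i = y_i - fitted value

sum_e = sum(residuals)
sum_xe = sum(x * e for x, e in zip(xs, residuals))
print(f"sum of residuals      = {sum_e:.2e}")
print(f"sum of x_i * residual = {sum_xe:.2e}")
```

Because both sums are zero, the sample covariance between the residuals and x is zero, which is why the correlation is zero as well.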


Forecasting with regression I

- Forecasts from a simple linear model are easily obtained using the equation

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

where x is the value of the predictor for which we require a forecast. That is, if we input a value of x in the equation we obtain a corresponding forecast $\hat{y}$.

- When this calculation is done using an observed value of x from the data, we call the resulting value of $\hat{y}$ a fitted value. This is not a genuine forecast as the actual value of y for that predictor value was used in estimating the model, and so the value of $\hat{y}$ is affected by the true value of y. When the value of x is new (i.e., not part of the data that were used to estimate the model), the resulting value of $\hat{y}$ is a genuine forecast.


Forecasting with regression II

- Assuming that the regression errors are normally distributed, an approximate 95% forecast interval (also called a prediction interval) associated with this forecast is given by

$$\hat{y} \pm 1.96\, s_e \sqrt{1 + \frac{1}{N} + \frac{(x - \bar{x})^2}{(N - 1)s_x^2}}, \qquad (4.2)$$

where N is the total number of observations, $\bar{x}$ is the mean of the observed x values, $s_x$ is the standard deviation of the observed x values and $s_e$ is the standard error of the residuals. Similarly, an 80% forecast interval can be obtained by replacing 1.96 by 1.28 in equation (4.2).

- Equation (4.2) shows that the forecast interval is wider when x is far from $\bar{x}$. That is, we are more certain about our forecasts when considering values of the predictor variable close to its sample mean.
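Equation (4.2) can be sketched in a few lines of Python. Two things here are assumptions rather than statements from the slides: the data are made up, and the residual standard error is computed with an N − 2 divisor, which is a common convention that these slides do not spell out. The function name `forecast_interval` is likewise only illustrative.

```python
import math
from statistics import fmean, stdev


def forecast_interval(xs, ys, x_new, z=1.96):
    """Approximate forecast interval from equation (4.2): (lower, point, upper)."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    b1 = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    # Assumption: residual standard error uses N - 2 degrees of freedom.
    s_e = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    s_x = stdev(xs)  # sample standard deviation (N - 1 divisor)
    half_width = z * s_e * math.sqrt(
        1 + 1 / n + (x_new - x_bar) ** 2 / ((n - 1) * s_x ** 2)
    )
    point = b0 + b1 * x_new
    return point - half_width, point, point + half_width


# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.7]

near = forecast_interval(xs, ys, x_new=4.5)           # at the sample mean of x
far = forecast_interval(xs, ys, x_new=12.0)           # far from the sample mean
near_80 = forecast_interval(xs, ys, x_new=4.5, z=1.28)
print("95% width at mean of x:", round(near[2] - near[0], 3))
print("95% width at x = 12   :", round(far[2] - far[0], 3))
print("80% width at mean of x:", round(near_80[2] - near_80[0], 3))
```

The interval at x = 12 comes out wider than the one at the sample mean, and the 80% interval is narrower than the 95% interval, as equation (4.2) implies.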


Forecasting with regression III

- The estimated regression line in the Car data example is

$$\hat{y} = 12.53 - 0.22x.$$

For the Chevrolet Aveo (the first car in the list) $x_1 = 25$ mpg and $y_1 = 6.6$ tons of CO2 per year. The model returns a fitted value of $\hat{y}_1 = 7.00$, i.e., $e_1 = y_1 - \hat{y}_1 = -0.4$. For a car with City driving fuel economy x = 30 mpg, the average footprint forecasted is $\hat{y} = 5.90$ tons of CO2 per year. The corresponding 95% and 80% forecast intervals are [4.95, 6.84] and [5.28, 6.51] respectively.


Forecasting with regression IV

Figure 4.6: Forecast with 80% and 95% forecast intervals for a car with x = 30 mpg in city driving.


Non-linear functional forms I

- Although the linear relationship assumed at the beginning of this chapter is often adequate, there are cases for which a non-linear functional form is more suitable. The scatter plot of Carbon versus City in Figure 4.3 is an example where a non-linear functional form is required.

- Simply transforming variables y and/or x and then estimating a regression model using the transformed variables is the simplest way of obtaining a non-linear specification. The most commonly used transformation is the (natural) logarithm (see Section 2/4).

- A log-log functional form is specified as

$$\log y_i = \beta_0 + \beta_1 \log x_i + \varepsilon_i.$$

In this model, the slope β1 can be interpreted as an elasticity: β1 is the average percentage change in y resulting from a 1% change in x.


Non-linear functional forms II

- Figure 4.7 shows a scatter plot of Carbon versus City and the fitted log-log model in both the original scale and the logarithmic scale. The plot shows that in the original scale the slope of the fitted regression line using a log-log functional form is non-constant. The slope depends on x and can be calculated for each point (see Table 4.1). In the logarithmic scale the slope of the line, which is now interpreted as an elasticity, is constant. So estimating a log-log functional form produces a constant elasticity estimate, in contrast to the linear model, which produces a constant slope estimate.
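The contrast between a constant elasticity and a non-constant slope can be seen on synthetic data. The sketch below does not use the Car data; it generates hypothetical points exactly from y = 40·x^(−0.6) (an assumed relationship, with no noise), fits a line to the logged variables, and then evaluates the original-scale slope $\hat{\beta}_1 y / x$ at each point.

```python
import math
from statistics import fmean


def fit_line(xs, ys):
    """Least squares intercept and slope."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    slope = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


# Hypothetical data following y = 40 * x**(-0.6) exactly.
xs = [10.0, 15.0, 20.0, 25.0, 30.0, 40.0]
ys = [40.0 * x ** -0.6 for x in xs]

# Log-log form: regress log(y) on log(x).
b0, b1 = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])

# Slope in the original scale for the log-log form: b1 * y / x.
slopes = [b1 * y / x for x, y in zip(xs, ys)]

print(f"estimated elasticity (constant): {b1:.3f}")
for x, s in zip(xs, slopes):
    print(f"x = {x:>4.0f}  slope in original scale = {s:.4f}")
```

The fitted elasticity recovers the exponent used to generate the data, while the slope in the original scale shrinks in magnitude as x grows.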


Non-linear functional forms III

Figure 4.7: Fitting a log-log functional form to the Car data example. Plots show the estimated relationship both in the original and the logarithmic scales.


Non-linear functional forms IV

Figure 4.8: Residual plot from estimating a log-log functional form for the Car data example.


Non-linear functional forms V

- Other useful forms are the log-linear form and the linear-log form:

Model        Functional form          Slope       Elasticity
linear       y = β0 + β1 x            β1          β1 x / y
log-log      log y = β0 + β1 log x    β1 y / x    β1
linear-log   y = β0 + β1 log x        β1 / x      β1 / y
log-linear   log y = β0 + β1 x        β1 y        β1 x

Table 4.1: Summary of selected functional forms. Elasticities that depend on the observed values of y and x are commonly calculated for the sample means of these.
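The slope and elasticity columns of Table 4.1 can be checked numerically, using the definitions slope = dy/dx and elasticity = (dy/dx)·(x/y). The sketch below compares each closed-form entry with a finite-difference estimate; the coefficient values, the evaluation point x = 2, and all names are assumptions chosen for illustration.

```python
import math

B0, B1 = 1.0, 0.5  # hypothetical coefficients

# For each form: y as a function of x, then the slope and elasticity formulas.
FORMS = {
    "linear": (lambda x: B0 + B1 * x,
               lambda x, y: B1,
               lambda x, y: B1 * x / y),
    "log-log": (lambda x: math.exp(B0 + B1 * math.log(x)),
                lambda x, y: B1 * y / x,
                lambda x, y: B1),
    "linear-log": (lambda x: B0 + B1 * math.log(x),
                   lambda x, y: B1 / x,
                   lambda x, y: B1 / y),
    "log-linear": (lambda x: math.exp(B0 + B1 * x),
                   lambda x, y: B1 * y,
                   lambda x, y: B1 * x),
}


def numerical_slope(f, x, h=1e-6):
    """Central finite-difference approximation to dy/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


def check_table(x=2.0):
    """Return {form: (table slope, numeric slope, table elasticity, numeric elasticity)}."""
    rows = {}
    for name, (f, slope, elasticity) in FORMS.items():
        y = f(x)
        num_slope = numerical_slope(f, x)
        rows[name] = (slope(x, y), num_slope, elasticity(x, y), num_slope * x / y)
    return rows


rows = check_table()
for name, (s, ns, e, ne) in rows.items():
    print(f"{name:<11} slope {s:.4f} vs {ns:.4f}   elasticity {e:.4f} vs {ne:.4f}")
```

Each pair of columns agrees to several decimal places, including the linear-log elasticity β1 / y.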


Non-linear functional forms VI

Figure 4.9: The four non-linear forms shown in Table 4.1.


Regression with time series data I

- When using regression for prediction, we are often considering time series data and we are aiming to forecast the future. There are a few issues that arise with time series data but not with cross-sectional data that we will consider in this section.

Figure 4.10: Percentage changes in personal consumption expenditure for the US.


Regression with time series data II

- Figure 4.10 shows time series plots of quarterly percentage changes (growth rates) of real personal consumption expenditure (C) and real personal disposable income (I) for the US for the period March 1970 to Dec 2010. Also shown is a scatter plot including the estimated regression line

$$\hat{C} = 0.52 + 0.32 I,$$

with the estimation results shown below the graphs. These show that when personal disposable income grows by 1%, personal consumption expenditure is predicted to grow by 0.84% on average (0.52 + 0.32 × 1), and that each additional percentage point of income growth adds 0.32 percentage points to predicted consumption growth. We are interested in forecasting consumption for the four quarters of 2011.

- Using a regression model to forecast time series data poses a challenge in that future values of the predictor variable (Income in this case) are needed as input to the estimated model, but these are not known in advance. One solution to this problem is to use scenario based forecasting.


Scenario based forecasting I

- In this setting the forecaster assumes possible scenarios for the predictor variable that are of interest. For example, a US policy maker may want to forecast consumption if there is a 1% growth in income for each of the quarters in 2011. Alternatively, a 1% decline in income for each of the quarters may be of interest. The resulting forecasts are calculated and shown in Figure 4.11.
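The two scenarios amount to plugging assumed income growth rates into the estimated line Ĉ = 0.52 + 0.32 I. A minimal sketch follows; the coefficients come from the slides, while the function name, dictionary layout and quarter labels are illustrative assumptions.

```python
def consumption_forecast(income_growth, b0=0.52, b1=0.32):
    """Predicted % change in consumption for a given % change in income."""
    return b0 + b1 * income_growth


# Assumed income growth (in %) for every quarter of 2011 under each scenario.
scenarios = {"1% growth": 1.0, "1% decline": -1.0}
quarters = ["2011 Q1", "2011 Q2", "2011 Q3", "2011 Q4"]

forecasts = {
    name: {q: consumption_forecast(growth) for q in quarters}
    for name, growth in scenarios.items()
}

for name, by_quarter in forecasts.items():
    for quarter, value in by_quarter.items():
        print(f"{name:<11} {quarter}: {value:.2f}%")
```

Since income growth is held constant within each scenario, the point forecast is the same in all four quarters: 0.52 + 0.32 = 0.84% under growth and 0.52 − 0.32 = 0.20% under decline.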


Scenario based forecasting II

Figure 4.11: Forecasting percentage changes in personal consumption expenditure for the US.


Scenario based forecasting III

- Forecast intervals for scenario based forecasts do not include the uncertainty associated with the future values of the predictor variables. They assume the value of the predictor is known in advance.

- An alternative approach is to use genuine forecasts for the predictor variable. For example, a pure time series based approach can be used to generate forecasts for the predictor variable (more on this in Chapter 9) or forecasts published by some other source such as a government agency can be used.


Ex-ante versus ex-post forecasts I

- When using regression models with time series data, we need to distinguish between two different types of forecasts that can be produced, depending on what is assumed to be known when the forecasts are computed.

- Ex ante forecasts are those that are made using only the information that is available in advance. For example, ex ante forecasts of consumption for the four quarters in 2011 should only use information that was available before 2011. These are the only genuine forecasts, made in advance using whatever information is available at the time.

- Ex post forecasts are those that are made using later information on the predictors. For example, ex post forecasts of consumption for each of the 2011 quarters may use the actual observations of income for each of these quarters, once these have been observed. These are not genuine forecasts, but are useful for studying the behaviour of forecasting models.


Ex-ante versus ex-post forecasts II

- The model from which ex-post forecasts are produced should not be estimated using data from the forecast period. That is, ex-post forecasts can assume knowledge of the predictor variable (the x variable), but should not assume knowledge of the data that are to be forecast (the y variable).

- A comparative evaluation of ex ante forecasts and ex post forecasts can help to separate out the sources of forecast uncertainty. This will show whether forecast errors have arisen due to poor forecasts of the predictor or due to a poor forecasting model.
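One way to picture this comparison is to split an ex ante forecast error into a part due to the model and a part due to the predictor forecast. The sketch below reuses the consumption model coefficients from the slides; the three observed and forecast values for a single quarter are invented purely for illustration, and the decomposition shown is a simple arithmetic identity rather than a method stated in the slides.

```python
B0, B1 = 0.52, 0.32  # consumption model coefficients from the slides


def predict(income_growth):
    """Predicted % change in consumption for a given % change in income."""
    return B0 + B1 * income_growth


# Hypothetical values for one quarter (all in % change).
actual_consumption = 0.9   # observed after the fact
actual_income = 1.2        # observed after the fact
forecast_income = 0.5      # what was predicted for income in advance

ex_ante = predict(forecast_income)  # uses only information available in advance
ex_post = predict(actual_income)    # uses the later-observed predictor value

ex_ante_error = actual_consumption - ex_ante
ex_post_error = actual_consumption - ex_post     # error attributable to the model
predictor_part = ex_ante_error - ex_post_error   # error from the income forecast

print(f"ex ante error            : {ex_ante_error:+.3f}")
print(f"  due to model (ex post) : {ex_post_error:+.3f}")
print(f"  due to income forecast : {predictor_part:+.3f}")
```

The part due to the income forecast equals the slope times the income forecast error, so a large value there points to the predictor forecast rather than the regression model.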


Linear trend I

- A common feature of time series data is a trend. Using regression we can model and forecast the trend in time series data by including t = 1, ..., T as a predictor variable:

$$y_t = \beta_0 + \beta_1 t + \varepsilon_t.$$

- Figure 4.12 shows a time series plot of aggregate tourist arrivals to Australia over the period 1980 to 2010 with the fitted linear trend line $\hat{y}_t = 0.3375 + 0.1761t$. Also plotted are the point forecasts and forecast intervals for the years 2011 to 2015.
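Point forecasts from the fitted trend line follow by extending the time index past the sample. The sketch below assumes that t = 1 corresponds to 1980, so that 2011 is t = 32; the slides give the sample period but not the indexing, so treat that mapping as an assumption. The function name is illustrative.

```python
def trend_forecast(t, b0=0.3375, b1=0.1761):
    """Point forecast from the fitted linear trend reported in the slides."""
    return b0 + b1 * t


FIRST_YEAR = 1980  # assumption: t = 1 corresponds to 1980

forecasts = {
    year: trend_forecast(year - FIRST_YEAR + 1)
    for year in range(2011, 2016)
}

for year, value in forecasts.items():
    print(f"{year}: t = {year - FIRST_YEAR + 1}, forecast = {value:.4f}")
```

Each successive forecast rises by the slope, 0.1761, which is what a linear trend implies.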


Linear trend II

Figure 4.12: Forecasting international tourist arrivals to Australia for the period 2011-2015 using a linear trend. 80% and 95% forecast intervals are shown.


Residual autocorrelation I

- With time series data it is highly likely that the value of a variable observed in the current time period will be influenced by its value in the previous period, or even the period before that, and so on. Therefore when fitting a regression model to time series data, it is very common to find autocorrelation in the residuals. In this case, the estimated model violates the assumption of no autocorrelation in the errors, and our forecasts may be inefficient: there is some information left over which should be utilized in order to obtain better forecasts. The forecasts from a model with autocorrelated errors are still unbiased, and so are not wrong, but they will usually have larger prediction intervals than they need to.

- Figure 4.13 plots the residuals from Examples 4.3 and 4.4, and the ACFs of the residuals (see Section 2/2 for an introduction to the ACF). The ACF of the consumption residuals shows a significant spike at lag 2 and the ACF of the tourism residuals shows significant spikes at lags 1 and 2. Usually plotting the ACF of the residuals is adequate to reveal any potential autocorrelation in the residuals. More formal tests for autocorrelation are discussed in Section 5/4.
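A residual ACF is straightforward to compute by hand. The sketch below uses the usual sample autocorrelation formula and flags lags outside ±1.96/√T, a common rule of thumb for significance that is an assumption here, since the slides do not state which bounds they use. The "residuals" are a made-up smooth series, not the consumption or tourism residuals.

```python
import math
from statistics import fmean


def acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(series)
    mean = fmean(series)
    denom = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation lies outside +/- 1.96 / sqrt(T)."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1) if abs(r) > bound]


# Hypothetical residuals with obvious persistence (a slow sine wave).
residuals = [math.sin(2 * math.pi * t / 20) for t in range(100)]

r = acf(residuals, max_lag=5)
flagged = significant_lags(residuals, max_lag=5)
print("first autocorrelations:", [round(v, 2) for v in r])
print("lags flagged as significant:", flagged)
```

For residuals from a well-specified model, few or no lags would be expected to fall outside the bounds.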


Residual autocorrelation II

Figure 4.13: Residuals from the regression models for Consumption and Tourism. Because these involved time series data, it is important to look at the ACF of the residuals to see if there is any remaining information not accounted for by the model. In both these examples, there is some remaining autocorrelation in the residuals.


Spurious regression I

- More often than not, time series data are non-stationary; that is, the values of the time series do not fluctuate around a constant mean or with a constant variance. We will deal with time series stationarity in more detail in Chapter 8, but here we need to address the effect non-stationary data can have on regression models.

- For example, consider the two variables plotted in Figure 4.14, which appear to be related simply because they both trend upwards in the same manner. However, air passenger traffic in Australia has nothing to do with rice production in Guinea. Selected output obtained from regressing the number of air passengers transported in Australia on rice production in Guinea (in metric tons) is also shown in Figure 4.14.


Spurious regression II

Figure 4.14: Trending time series data can appear to be related, as shown in this example in which air passengers in Australia are regressed against rice production in Guinea.


Spurious regression III

- Regressing non-stationary time series can lead to spurious regressions. A high R² combined with high residual autocorrelation can be a sign of spurious regression. We discuss the issues surrounding non-stationary data and spurious regressions in detail in Chapter 9.

- Cases of spurious regression might appear to give reasonable short-term forecasts, but they will generally not continue to work into the future.
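The effect is easy to reproduce with synthetic data. In the sketch below, two series are generated independently of each other; the only thing they share is an upward trend. The series, their parameters and the random seed are all assumptions for illustration and have nothing to do with the air passenger or rice data. For a simple regression, R² equals the squared sample correlation, which is what `r_squared` computes.

```python
import random
from statistics import fmean


def r_squared(xs, ys):
    """R-squared of a simple regression of ys on xs (squared correlation)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


rng = random.Random(1)
T = 50

# Two unrelated series that both trend upwards (independent noise).
trending_a = [2.0 * t + rng.gauss(0, 1) for t in range(1, T + 1)]
trending_b = [5.0 + 0.5 * t + rng.gauss(0, 1) for t in range(1, T + 1)]

# Two unrelated series without any trend, for comparison.
flat_a = [rng.gauss(0, 1) for _ in range(T)]
flat_b = [rng.gauss(0, 1) for _ in range(T)]

r2_trending = r_squared(trending_a, trending_b)
r2_flat = r_squared(flat_a, flat_b)
print(f"R^2, unrelated trending series    : {r2_trending:.3f}")
print(f"R^2, unrelated non-trending series: {r2_flat:.3f}")
```

The trending pair produces a very high R² even though neither series carries any information about the other, while the non-trending pair does not.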
