forecastit 2. linear regression & model statistics
DESCRIPTION
This lesson begins by explaining the linear regression method, its characteristics, and its uses. The linear regression method attempts to fit the best line through the data. Using an example and the forecasting process, we apply the linear regression method to create a model and forecast from it.
TRANSCRIPT
Copyright 2010 DeepThought, Inc. 1
Linear Regression and Model Statistics
Lesson #2
Linear Regression Method
Method Introduction
• One of the simpler methods to use for forecasting
• Estimates a line through the data
• Uses the estimated line equation to forecast future values
• Method format:
– Y = a + b × t
Model Characteristics
• Method characteristics
– Fits a line to the data
– Estimates the line that minimizes the errors between actual data points and model estimates
• When to use the method
– Estimate trend
– Estimate trend magnitude
• When not to use the method
– Estimating anything beyond a simple linear relationship
Forecasting Steps
1. Set an objective
2. Build model
3. Evaluate model
4. Use model
Objective Setting
• Simpler is better
• Linear regression lets us test whether a line fitted to the data works as a model. Objectives should take that principle into consideration
• Example objectives for M2 Money Stock (see next slide):
– Test if M2 has a linear trend over time
– If M2 exhibits a statistically significant trend, review and interpret the results
– If the model looks good, create a forecast based on the model
Example: M2 Money Stock
[Chart: M2 Money Stock (Billions of $'s); horizontal axis May-79 to Apr-12, vertical axis 0.0 to 9000.0]
Method Selection
• Observe time series qualities: trend, seasonality, cyclicality, and randomness
• Adjust time frame, units, periods to forecast as needed
• Determine if linear regression is a possible candidate based on method characteristics
– Determine if transforming the units will enable use of the model
▪ Eight different unit transformation techniques
Build Model
• Software finds the best-fit line to the data by minimizing the sum of squared errors
[Chart: M2 Money Stock (Billions of $'s); horizontal axis May-79 to Apr-12, vertical axis 0.0 to 9000.0]
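As a minimal sketch of what the software does in this step, the closed-form least-squares formulas for a line Y = a + b × t fit in a few lines of Python. The function name `fit_line` and the five sample values are hypothetical and made up for illustration; they are not the M2 data, and the lesson does not say which software or algorithm was actually used.

```python
def fit_line(t, y):
    """Return (a, b) for the least-squares line y = a + b * t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Slope: co-variation of t and y divided by the variation of t.
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    b = s_ty / s_tt
    # Intercept: makes the line pass through the point of means.
    a = y_mean - b * t_mean
    return a, b


# Hypothetical series: time periods 1..5 and made-up observations.
periods = [1, 2, 3, 4, 5]
values = [12.0, 14.1, 15.9, 18.2, 19.8]
a, b = fit_line(periods, values)
print(f"Y = {a:.2f} + {b:.2f} * t")
```

These two formulas give the line with the smallest sum of squared errors for a single explanatory variable, which is the criterion the slide names.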
Evaluate Model
• Descriptive Statistics
– Mean
– Variance & Standard Deviation
• Accuracy / Error
– SSE
– RMSE
– MAPE
– R2; Adjusted R2
• Statistical Significance
– F-Test
– P-Value F-Test
Descriptive Statistics
Mean
• The average value of the data set
Variance & Standard Deviation
• Variance is the average of the squared deviations of the data from the mean
– Estimates the variation the data exhibits around the mean
• Standard deviation is the square root of the variance
– Used to measure the spread of the variable around the mean; for roughly normally distributed data, most observations will be within ± 3 standard deviations of the mean
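The following sketch computes the three descriptive statistics for a small made-up sample. The function name `describe` and the sample values are hypothetical. Dividing by n − 1 (the sample variance) is an assumption; the lesson does not say whether its software divides by n or n − 1.

```python
import math


def describe(data):
    """Return (mean, sample variance, standard deviation)."""
    n = len(data)
    mean = sum(data) / n
    # Sample variance: squared deviations from the mean, averaged over n - 1.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    std_dev = math.sqrt(variance)
    return mean, variance, std_dev


# Hypothetical sample, not the M2 data.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance, std_dev = describe(sample)
print(mean, variance, std_dev)

# Cross-check with the lesson's M2 figures: the reported standard
# deviation should be the square root of the reported variance.
m2_std = math.sqrt(3346475.10)
print(round(m2_std, 2))
```

The last line should reproduce the 1829.34 reported for M2, since it depends only on the reported variance.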
M2 Example
• Mean
– 4214.38
• Variance
– 3346475.10
• Standard Deviation
– 1829.34
[Chart: M2 Money Stock (Billions of $'s); horizontal axis May-79 to Apr-12, vertical axis 0.0 to 9000.0]
Accuracy/Error
SSE
• Sum of Squared Errors (SSE)
– Sums the squared differences between the actual values and the model values
• Measures the total error of the model
• M2 Example:
– SSE: 316778645.89
[Chart: M2 Money Stock (Billions of $'s); horizontal axis May-79 to Apr-12, vertical axis 0.0 to 9000.0]
RMSE
• The square root of the mean squared error: the sum of squared errors is divided by the number of observations, then the square root is taken
• A typical error size, in the units of the data, averaged over the number of observations
• Simple way to compare models based on error
• M2 Example:
– RMSE: 456.82
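A short sketch of the RMSE calculation follows. The function name `rmse` and the three-point series are hypothetical. The second part checks the definition against the lesson's reported SSE, assuming 1518 observations; that count is not stated in the lesson and is inferred from its final forecast slide, which calls period 1519 the next period.

```python
import math


def rmse(actual, fitted):
    """Root mean squared error between actual and fitted values."""
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return math.sqrt(sse / len(actual))


# Hypothetical actual and fitted values, not the M2 data.
toy_rmse = rmse([10.0, 12.0, 15.0], [11.0, 12.0, 13.0])
print(toy_rmse)

# Check against the reported M2 SSE.
sse_m2 = 316778645.89  # reported in the lesson
n_m2 = 1518            # assumption: inferred, not stated in the lesson
rmse_m2 = math.sqrt(sse_m2 / n_m2)
print(round(rmse_m2, 2))
```

Under that assumption the second value rounds to the 456.82 shown on the slide, which suggests the software divides the SSE by the number of observations rather than by the degrees of freedom.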
MAPE
• The average absolute percentage error of the model
• Describes the average percentage difference between actual and forecasted values
• M2 Example:
– MAPE: 10.09%
[Chart: M2 Money Stock (Billions of $'s); horizontal axis May-79 to Apr-12, vertical axis 0.0 to 9000.0]
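A minimal sketch of the MAPE calculation, using the common definition (the mean of the absolute errors expressed as a percentage of the actual values). The function name `mape` and the data are hypothetical, and the lesson does not spell out the exact formula its software uses.

```python
def mape(actual, fitted):
    """Mean absolute percentage error, returned as a percentage.

    Assumes no actual value is zero, since each error is divided
    by the corresponding actual value.
    """
    ratios = [abs((a - f) / a) for a, f in zip(actual, fitted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical values: errors of 10%, 10%, and 0%.
actual = [100.0, 200.0, 400.0]
fitted = [110.0, 180.0, 400.0]
result = mape(actual, fitted)
print(f"MAPE: {result:.2f}%")
```

Because each error is scaled by the actual value, MAPE can be compared across series with different units, which RMSE cannot.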
R-Squared & Adjusted R-Squared
• Compares the unexplained variation (errors) with the total variation in the data: R2 = 1 – (unexplained / total)
• Measures the percentage of variation captured by the model
• Adjusted R2 incorporates the number of variables used and the sample size to adjust the R2 value
M2 Example
• R2
– 93.76%
• Adjusted R2
– 93.76%
[Chart: M2 Money Stock (Billions of $'s); horizontal axis May-79 to Apr-12, vertical axis 0.0 to 9000.0]
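The R2 values above can be approximately reproduced from figures reported elsewhere in the lesson. This is a sketch under stated assumptions: 1518 observations (inferred from the lesson's statement that the next period is 1519), one explanatory variable (time), and a reported variance that is a sample variance (divided by n − 1). None of these is stated explicitly in the lesson.

```python
# Figures reported in the lesson.
sse = 316778645.89     # sum of squared errors of the model
variance = 3346475.10  # variance of the M2 series

# Assumptions, not stated in the lesson.
n = 1518  # number of observations
k = 1     # number of explanatory variables (time only)

# Total variation, treating the variance as a sample variance.
sst = variance * (n - 1)

# R2: share of the total variation that the model explains.
r2 = 1 - sse / sst

# Adjusted R2: penalizes R2 for the number of variables used.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R2 = {r2:.2%}, adjusted R2 = {adj_r2:.2%}")
```

With one variable and about 1500 observations the adjustment is tiny, which would explain why both values display as 93.76%.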
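Statistical Significance
F-Test
• A ratio of the explained to the unexplained variation of the model, each scaled by its degrees of freedom
• Used to test whether the model built is statistically different from a model with no effect (a slope equal to zero)
• The larger the F-test value, the stronger the evidence that the model is significant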
F-Test P-Value
• The F-Test P-Value is the probability of getting an F-test value at least this large if the model truly had no impact (the blue area on the graph)
• The higher the value of the F-test, the smaller the shaded blue area. As the blue area decreases, confidence that our model is statistically significant increases
• 1 – p-value = Significance Level of the Model (%)
• The significance level of the model (%) represents the amount of confidence we have that our model is different from a model with no impact, or zero impact
M2 Example
[Chart: M2 Money Stock (Billions of $'s); horizontal axis May-79 to Apr-12, vertical axis 0.0 to 9000.0]
• F-Test
– 22778.98
• F-Test P-Value
– 0.00
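As a rough check, the F-test value can be rebuilt from the SSE and variance reported earlier in the lesson. The sketch assumes 1518 observations (inferred from the lesson's statement that the next period is 1519), one explanatory variable, and a sample variance divided by n − 1. These are assumptions, not facts stated in the lesson.

```python
# Figures reported in the lesson.
sse = 316778645.89     # unexplained variation (sum of squared errors)
variance = 3346475.10  # variance of the M2 series

# Assumptions, not stated in the lesson.
n = 1518  # number of observations
k = 1     # number of explanatory variables (time only)

sst = variance * (n - 1)  # total variation
ssr = sst - sse           # explained variation

# F statistic: explained variation per model degree of freedom,
# divided by unexplained variation per residual degree of freedom.
f_stat = (ssr / k) / (sse / (n - k - 1))
print(round(f_stat, 2))
```

Under these assumptions the result lands within rounding distance of the reported 22778.98. An F value this large with roughly 1500 residual degrees of freedom corresponds to a p-value far below 0.005, so it displays as 0.00.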
Compare Multiple Models
• Skip this step until you have knowledge of multiple methods
• Accuracy/error statistics will be used to compare multiple models and find the best ones
Use Model
• Understand the limitations of the model
– Only measures a trend
– A long-term average
• Answer objectives
– Does M2 have a linear trend?
– If a trend exists, what is its magnitude?
– If the model is statistically significant, forecast
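M2 Example
• M2 = 1145.31 + 4.04 × Time
• Next period is 1519
• Forecast for that period is:
– Y = 1145.31 + 4.04 × 1519
– Y = 7283.446866 (presumably computed from unrounded coefficients; the rounded coefficients shown here give 7282.07)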
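The forecast arithmetic can be sketched as follows. The function name `forecast` is hypothetical. The coefficients are the rounded values shown on the slide, so the result differs slightly from the slide's 7283.446866. The implied slope at the end is only a rough back-calculation that treats the rounded intercept as exact.

```python
def forecast(a, b, t):
    """Evaluate the fitted line Y = a + b * t at time period t."""
    return a + b * t


# Rounded coefficients as displayed in the lesson.
a, b = 1145.31, 4.04
next_period = 1519

y_hat = forecast(a, b, next_period)
print(round(y_hat, 2))  # 7282.07 with the rounded coefficients

# Slope implied by the slide's forecast value, assuming the
# intercept 1145.31 is exact (it is probably rounded as well).
implied_b = (7283.446866 - a) / next_period
print(round(implied_b, 4))
```

The implied slope still rounds to 4.04, consistent with the slide's forecast having been produced from coefficients carried at higher precision.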