2 ordinary least squares 2012a
TRANSCRIPT
7/27/2019 2 Ordinary Least Squares 2012A
Ordinary Least Squares
Clas Eriksson (HST, MDH)
October 29, 2012
Clas Eriksson (HST, MDH) Ordinary Least Squares October 29, 2012 1 / 29
Introduction
OLS is the most well-known technique to estimate the regression coefficients
The chapter also discusses the fit of the equation
Outline:
1 Estimating Single-Independent-Variable Models with OLS
2 Estimating Multivariate Regression Models with OLS
3 Evaluating the Quality of a Regression Equation
4 Describing the Overall Fit of the Estimated Model
5 An Example of the Misuse of R²
Estimating Single-Independent-Variable Models with OLS II
The OLS method starts from the Sum of the Squares of all Residuals:

SSR = \sum_{i=1}^{N} e_i^2   (3)

The method calculates the β̂'s so as to minimize this sum.
Using the definition of e_i, (3) becomes

\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2

Using also (2), to eliminate \hat{Y}_i, we have

\sum_{i=1}^{N} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2

We are looking for the values of \hat{\beta}_0 and \hat{\beta}_1 that minimize this sum.
Estimating Single-Independent-Variable Models with OLS III

Why Use Ordinary Least Squares?

There are many other techniques, but there are at least three good reasons to use OLS:
1 Simple calculations.
2 Easy to understand and reasonable: we want the regression equation to be as close as possible to the observed data.
3 OLS estimates have two useful characteristics:
  - The sum of the residuals is exactly zero
  - OLS is the best estimator under some specific conditions ("best" is explained in ch. 4)

OLS is an estimator (a mathematical technique); a given β̂ produced by OLS is an estimate.
Estimating Single-Variable Models with OLS IV
How Does OLS Work?
To find the \hat{\beta}_0 and \hat{\beta}_1 that minimize the sum of squared residuals, we differentiate with respect to \hat{\beta}_0 and \hat{\beta}_1, and put these derivatives equal to zero:

-2 \sum_{i=1}^{N} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0   (a1)

-2 \sum_{i=1}^{N} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) X_i = 0   (a2)

These equations can be solved for

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}   (a)

\hat{\beta}_1 = \frac{N \sum_{i=1}^{N} X_i Y_i - \sum_{i=1}^{N} Y_i \sum_{i=1}^{N} X_i}{N \sum_{i=1}^{N} X_i^2 - \left( \sum_{i=1}^{N} X_i \right)^2}   (b)
Estimating Single-Variable Models with OLS V
How Does OLS Work? (contd)
We have here defined

\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i \quad \text{and} \quad \bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i

as the average values of Y and X, respectively.
In the book, we equivalently have:

\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}   (4)

and

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}   (5)
(For each different sample we get different estimates.)
Estimating Single-Independent-Variable Models with OLS VI
An Example
Assume the following data:

 i | Xi | Yi | Xi-X̄ | Yi-Ȳ | (Xi-X̄)² | (Xi-X̄)(Yi-Ȳ) | Ŷi | ei
 1 |  1 |  3 |      |      |         |               |    |
 2 |  3 |  5 |      |      |         |               |    |
 3 |  3 |  4 |      |      |         |               |    |
 4 |  5 |  8 |      |      |         |               |    |
 S | 12 | 20 |      |      |         |               |    |
 A |  3 |  5 |      |      |         |               |    |

(S = column sums, A = column averages)

Note 1
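The table's blank columns and the resulting estimates can be computed directly from equations (4) and (5). A minimal sketch in plain Python (the variable names are mine):

```python
# OLS on the example data above, via equations (4) and (5).
X = [1, 3, 3, 5]
Y = [3, 5, 4, 8]
N = len(X)

x_bar = sum(X) / N                  # X-bar = 3  (row A)
y_bar = sum(Y) / N                  # Y-bar = 5  (row A)

num = sum((X[i] - x_bar) * (Y[i] - y_bar) for i in range(N))  # sum of (Xi-X̄)(Yi-Ȳ)
den = sum((X[i] - x_bar) ** 2 for i in range(N))              # sum of (Xi-X̄)²

b1 = num / den                      # equation (4)
b0 = y_bar - b1 * x_bar             # equation (5)
print(b0, b1)                       # -> 1.25 1.25

Y_hat = [b0 + b1 * x for x in X]         # fitted values, column Ŷi
e = [Y[i] - Y_hat[i] for i in range(N)]  # residuals, column ei
print(sum(e))                       # -> 0.0: the residuals sum to exactly zero
```

The zero residual sum illustrates the first of the two useful OLS characteristics listed earlier.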
Estimating Multivariate Regression Models with OLS I

One variable can rarely explain everything. E.g. consumer demand depends on several prices, income, advertising etc.
The meaning of Multivariate Regression Coefficients
A general multivariate regression (or multiple regression) model with K independent variables (1.13):

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_K X_{Ki} + \epsilon_i   (6)

where i = 1, 2, ..., N, i.e. we have N observations of each variable.

A slope coefficient now indicates the change in the dependent variable associated with a one-unit increase in the explanatory variable, holding the other explanatory variables constant.
(Confer the partial derivative: \partial Y_i / \partial X_{Ki} = \beta_K)

NB: Omitted (and relevant!) variables are not held constant.

The intercept term, β₀, is the value of Y when all the Xs and the error term equal zero.
Estimating Multivariate Regression Models with OLS II
The meaning of Multivariate Regression Coefficients (contd)
Example: an estimated model of per capita demand for beef in year t, CB_t:

CB_t = 37.54 - 0.88 P_t + 11.9 Yd_t   (7)

where
- P_t = price of beef in year t (cents per pound)
- Yd_t = per capita disposable income in year t (in thousands of dollars)

The estimated coefficient 11.9 means that beef consumption will increase by 11.9 pounds per person if per capita income rises by $1,000 and the price of beef is held constant.

The latter is crucial, but not entirely likely: the higher demand may tend to raise the price.
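The ceteris paribus reading can be made concrete by evaluating (7) at two income levels while the price is fixed; the price and income values below are hypothetical, chosen only for illustration:

```python
# Evaluating the estimated beef-demand equation (7) at made-up sample values.
def beef_demand(P, Yd):
    """Per capita beef demand (pounds), from equation (7)."""
    return 37.54 - 0.88 * P + 11.9 * Yd

P = 60.0     # price of beef, cents per pound (hypothetical value)
Yd = 5.0     # per capita disposable income, thousands of $ (hypothetical value)

# Raise income by $1,000 (one unit of Yd) while holding the price constant:
change = beef_demand(P, Yd + 1.0) - beef_demand(P, Yd)
print(round(change, 2))   # -> 11.9 pounds per person: the income coefficient
```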
Estimating Multivariate Regression Models with OLS III
OLS Estimation of Multivariate Regression Models
Consider the model with two explanatory variables:
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i   (8)

The estimated regression equation now is

\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}

and thus the residual of observation i, e_i = Y_i - \hat{Y}_i, can be written as

e_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}

The sum of the squared residuals therefore is

\sum_{i=1}^{N} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i})^2
Estimating Multivariate Regression Models with OLS IV
OLS Estimation of Multivariate Regression Models (contd)
Like in the simpler case, we find the values of \hat{\beta}_0, \hat{\beta}_1 and \hat{\beta}_2 that minimize this sum by computing the derivatives w.r.t. \hat{\beta}_0, \hat{\beta}_1 and \hat{\beta}_2 and putting them equal to zero.
This system of equations can be solved for the estimated coefficients. Using the notation y = Y_i - \bar{Y}, x_1 = X_{1i} - \bar{X}_1 and x_2 = X_{2i} - \bar{X}_2, they are:

\hat{\beta}_1 = \frac{(\sum y x_1)(\sum x_2^2) - (\sum y x_2)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}

\hat{\beta}_2 = \frac{(\sum y x_2)(\sum x_1^2) - (\sum y x_1)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2
Luckily, we can let a computer do the calculations
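As a check on the formulas above, here is a direct transcription in plain Python. The data set is made up so that Y = 1 + 2·X1 + 3·X2 holds exactly, which the formulas should then recover:

```python
# Two-regressor OLS, transcribing the formulas for b1, b2, b0 above.
X1 = [1.0, 2.0, 3.0, 4.0]
X2 = [1.0, 0.0, 1.0, 0.0]
Y  = [6.0, 5.0, 10.0, 9.0]   # constructed as 1 + 2*X1 + 3*X2
N = len(Y)

mean = lambda v: sum(v) / len(v)
Y_bar, X1_bar, X2_bar = mean(Y), mean(X1), mean(X2)

# Deviations from means: y, x1, x2 in the slide's notation
y  = [Y[i]  - Y_bar  for i in range(N)]
x1 = [X1[i] - X1_bar for i in range(N)]
x2 = [X2[i] - X2_bar for i in range(N)]

dot = lambda a, b: sum(a[i] * b[i] for i in range(N))

den = dot(x1, x1) * dot(x2, x2) - dot(x1, x2) ** 2
b1 = (dot(y, x1) * dot(x2, x2) - dot(y, x2) * dot(x1, x2)) / den
b2 = (dot(y, x2) * dot(x1, x1) - dot(y, x1) * dot(x1, x2)) / den
b0 = Y_bar - b1 * X1_bar - b2 * X2_bar
print(b0, b1, b2)   # -> 1.0 2.0 3.0
```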
Estimating Multivariate Regression Models with OLS V
Example
Assume a college where FINAID_i measures the financial aid ($/year) that the ith applicant gets.

This aid is supposed to depend on
- the amount of money ($/year) that the parents can contribute: PARENT_i (need)
- the student's GPA rank in high school (percentage): HSRANK_i (merit)

We can expect the following qualitative relation:

FINAID_i = f( (-)PARENT_i, (+)HSRANK_i )   (9)

i.e. more aid can be anticipated if the parents have limited possibilities to contribute and if the student has performed well in high school.
Figure 2.1
Figure 2.2
Estimating Multivariate Regression Models with OLS VII
Total, Explained and Residual Sums of Squares
How much of the variation in Y is explained by the regression equation?

We use the total sum of squares,

TSS = \sum_{i=1}^{N} (Y_i - \bar{Y})^2   (12)

to describe the total variation in Y (around its mean).

For OLS, the TSS is decomposed according to

\underbrace{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}_{TSS} = \underbrace{\sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2}_{ESS} + \underbrace{\sum_{i=1}^{N} e_i^2}_{RSS}   (13)
Estimating Multivariate Regression Models with OLS VIII
Total, Explained and Residual Sums of Squares (contd)

(Equation (13) is usually called the decomposition of variance)

We here define
- ESS = Explained Sum of Squares (attributable to the estimated regression equation)
- RSS = Residual Sum of Squares (not explained by the estimated regression equation)

Figure 2.3

Since ESS is the part of the variation in Y that is explained by the model, the regression line fits the data better if ESS/TSS is high (and thus RSS/TSS is low; recall that OLS minimizes RSS).
Figure 2.3
Evaluating the Quality of a Regression Equation
When the estimates have been produced, they must be evaluated, for instance by asking the following questions:
1 Is the equation supported by sound theory?
2 How well does the estimated regression fit the data?
3 Is the data set reasonably large and accurate?
4 Is OLS the best estimator to be used for this equation?
5 How well do the estimated coefficients correspond to the expectations developed by the researcher before the data were collected?
6 Are all the obviously important variables included in the equation?
7 Has the most theoretically logical functional form been used?
8 Does the regression appear to be free of major econometric problems?

These numbers roughly correspond to the relevant chapters in the book.
Describing the Overall Fit of the Estimated Model II
R² (contd)

Using (13) again, we have

R^2 = 1 - \frac{\sum_{i=1}^{N} e_i^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}   (14)

The higher R² (the goodness of fit) is, the closer the estimated equation fits the sample data. We have

0 \le R^2 \le 1
Figures 2.4-2.6
For time series data (e.g. yearly observations for one country), the R² is often quite high. But this can be because there are significant time trends on both sides of the equation.

For cross-sectional data (e.g. observations during one year for many countries), the R² is often lower. R² = 0.50 might be considered good.
Figure 2.4
Figure 2.5
Describing the Overall Fit of the Estimated Model III
The Simple Correlation Coefficient, r
The correlation coefficient between two variables X_1 and X_2 is

r_{1,2} = \frac{\sum (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sqrt{\sum (X_{1i} - \bar{X}_1)^2} \sqrt{\sum (X_{2i} - \bar{X}_2)^2}}

It measures the strength and direction of a linear relationship between two variables. We have

-1 \le r \le 1

If two variables are
- perfectly positively correlated, then r = 1
- perfectly negatively correlated, then r = -1
- totally uncorrelated, then r = 0

If an estimated regression has only one explanatory variable, then R² = r².
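Both properties can be checked on the four-observation example data from the earlier slide (X = 1, 3, 3, 5; Y = 3, 5, 4, 8). A minimal plain-Python sketch:

```python
# r between X and Y, and the check R² = r² for a one-regressor fit.
X = [1, 3, 3, 5]
Y = [3, 5, 4, 8]
N = len(X)

x_bar, y_bar = sum(X) / N, sum(Y) / N
x = [X[i] - x_bar for i in range(N)]   # deviations from the means
y = [Y[i] - y_bar for i in range(N)]

r = sum(x[i] * y[i] for i in range(N)) / (
    sum(v ** 2 for v in x) * sum(v ** 2 for v in y)) ** 0.5
print(round(r, 3))   # -> 0.945: a strong positive linear relationship

# R² via (14); 1.25, 1.25 are the OLS estimates eqs. (4)-(5) give for this data
e = [Y[i] - (1.25 + 1.25 * X[i]) for i in range(N)]
R2 = 1 - sum(ei ** 2 for ei in e) / sum(v ** 2 for v in y)
print(abs(R2 - r ** 2) < 1e-12)   # -> True
```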
Describing the Overall Fit of the Estimated Model IV
R̄², The Adjusted R²

Adding one more explanatory variable will never lower R², but will probably raise it.
It is therefore tempting to increase the number of explanatory variables, but this is not necessarily good econometrics.

In an example model in the book, weight is explained by height.
Then the post box number is included as an additional explanatory variable, and this raises R².
But it is a nonsense variable: there is no reason that the post box number should explain your weight.

The inclusion of one more explanatory variable requires the estimation of one more coefficient.
This reduces the degrees of freedom: the excess of the number of observations (N) over the number of coefficients estimated (K + 1).
The cost of this is that the estimates are likely to be less reliable.
Describing the Overall Fit of the Estimated Model V
R̄², The Adjusted R² (contd)

We therefore have an alternative measure of the quality of the fit:

\bar{R}^2 = 1 - \frac{\sum_{i=1}^{N} e_i^2 / (N - K - 1)}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2 / (N - 1)} = 1 - \frac{\sum_{i=1}^{N} e_i^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} \cdot \frac{N - 1}{N - K - 1}   (15)

This is R² adjusted for degrees of freedom.

When one more explanatory variable is introduced, the last fraction increases and thus lowers R̄².
On the other hand, the other fraction is likely to decrease, which raises R̄². What is the net result?

In the book example, R̄² falls as the post box number is included.
(We have R̄² ≤ 1, but R̄² can be slightly negative)
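Equation (15) applied to the four-observation example (N = 4, K = 1), as a minimal plain-Python sketch:

```python
# R² and adjusted R² for the earlier example data, eqs. (14) and (15).
X = [1, 3, 3, 5]
Y = [3, 5, 4, 8]
N, K = len(Y), 1

y_bar = sum(Y) / N
# 1.25, 1.25 are the OLS estimates eqs. (4)-(5) give for this data
e = [Y[i] - (1.25 + 1.25 * X[i]) for i in range(N)]

RSS = sum(ei ** 2 for ei in e)             # = 1.5
TSS = sum((yi - y_bar) ** 2 for yi in Y)   # = 14.0

R2     = 1 - RSS / TSS                               # equation (14)
R2_adj = 1 - (RSS / (N - K - 1)) / (TSS / (N - 1))   # equation (15)
print(round(R2, 3), round(R2_adj, 3))                # -> 0.893 0.839
```

With only four observations the degrees-of-freedom correction is large, so R̄² sits noticeably below R².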
An Example of the Misuse of R²

Read on your own.

Adding the price of water as an explanatory variable for the use of water seems reasonable, and it raises R².
But the sign of the estimated coefficient for the price is positive, opposite to the expected.
After all, it may not be good to include the price variable, because the suspect coefficient estimates are likely to lead to flawed forecasts.
(Since the expenditures on water are such small shares of household expenditures, water demand may be very insensitive to the price)

So, once again, do not only look at the goodness of fit.