
Chapter 3. Two-Variable Regression Model: The Problem of Estimation

Ordinary Least Squares Method (OLS)

Recall the PRF: $Y_i = \beta_1 + \beta_2 X_i + u_i$

Since the PRF is not directly observable, it is estimated by the SRF; that is,

$Y_i = \hat\beta_1 + \hat\beta_2 X_i + \hat u_i$

And,

$Y_i = \hat Y_i + \hat u_i$

More on the Error Term

If $Y_i = \hat Y_i + \hat u_i$, then

$\hat u_i = Y_i - \hat Y_i$

and

$\hat u_i = Y_i - \hat\beta_1 - \hat\beta_2 X_i$

More on the Error Term

We need to choose the SRF in such a way that the error terms are as small as possible. That is, the sum of the residuals,

$\sum \hat u_i = \sum (Y_i - \hat Y_i)$

should be as small as possible.

More on the Error Terms

Therefore, the essential task is to find a criterion that minimizes the error disturbances in the SRF, so that all observations lie as close as possible to the fitted regression line. The plain sum of residuals is a poor criterion, however: large positive and negative residuals can cancel each other, leaving $\sum \hat u_i$ near zero even for a line that fits badly.

The least-squares criterion then comes as the solution.

The least-squares criterion is based on:

$\sum \hat u_i^2 = \sum (Y_i - \hat Y_i)^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2$

Thus,

$\sum \hat u_i^2 = f(\hat\beta_1, \hat\beta_2)$
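To connect this criterion to the estimator formulas that follow, here is a brief sketch of the standard derivation step: differentiate $\sum \hat u_i^2$ with respect to each estimator and set the derivatives to zero, which yields the two normal equations.

```latex
% Minimize S(\hat\beta_1, \hat\beta_2) = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_i)^2.
\begin{align*}
\frac{\partial S}{\partial \hat\beta_1}
  &= -2 \sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_i\right) = 0
  \quad\Longrightarrow\quad \sum Y_i = n\hat\beta_1 + \hat\beta_2 \sum X_i \\
\frac{\partial S}{\partial \hat\beta_2}
  &= -2 \sum X_i \left(Y_i - \hat\beta_1 - \hat\beta_2 X_i\right) = 0
  \quad\Longrightarrow\quad \sum X_i Y_i = \hat\beta_1 \sum X_i + \hat\beta_2 \sum X_i^2
\end{align*}
% Solving these two normal equations simultaneously gives the OLS estimators below.
```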

An Example of the Least-Squares Criterion

Is the first model better? Why? No: the sum of squared error disturbances of the second model is lower, so by the least-squares criterion the second model is the better fit.

Regression Equation

$Y_i = \hat\beta_1 + \hat\beta_2 X_i + \hat u_i$

$\hat\beta_2 = \dfrac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} = \dfrac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2} = \dfrac{\sum x_i y_i}{\sum x_i^2}$

$\hat\beta_1 = \dfrac{\sum X_i^2 \sum Y_i - \sum X_i \sum X_i Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} = \bar Y - \hat\beta_2 \bar X$

where $\bar Y$ and $\bar X$ are the sample means of Y and X, and $x_i = X_i - \bar X$ and $y_i = Y_i - \bar Y$ are deviations from those means.
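A minimal numerical sketch of these formulas in Python (the income and consumption figures below are made up for illustration):

```python
# Made-up sample: X = weekly family income, Y = weekly family consumption.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]

n = len(X)
x_bar = sum(X) / n   # sample mean of X
y_bar = sum(Y) / n   # sample mean of Y

# beta2_hat = sum(x_i * y_i) / sum(x_i^2), using deviations from the means.
beta2_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
             / sum((xi - x_bar) ** 2 for xi in X))

# beta1_hat = Y_bar - beta2_hat * X_bar
beta1_hat = y_bar - beta2_hat * x_bar

print(f"SRF: Y_hat = {beta1_hat:.4f} + {beta2_hat:.4f} * X")
```

Working in deviations from the sample means keeps the arithmetic simple and matches the last form of the $\hat\beta_2$ formula above.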

The Classical Linear Regression Model (CLRM): The Assumptions Underlying The Method of Least Squares

Inferences about the true β1 and β2 are important because the estimated values need to be as close as possible to the population values.

Therefore the CLRM, which is the cornerstone of most econometric theory, makes 10 assumptions.

Assumptions of the CLRM

Assumption 1. Linear Regression Model

The regression model is linear in the parameters, that is:

$Y_i = \beta_1 + \beta_2 X_i + u_i$

Assumption 2. X values are fixed in repeated sampling. More technically, X is assumed to be non-stochastic.

X: $80 income level → Y: $60 weekly consumption of one family
X: $80 income level → Y: $75 weekly consumption of another family

Because of Assumption 2, our analysis is known as conditional regression analysis, that is, conditional on the given values of the regressor(s) X.

Assumption 3. Zero Mean value of disturbance ui

$E(u_i \mid X_i) = 0$

Assumption 4. Homoscedasticity or Equal Variance of ui

$\operatorname{var}(u_i \mid X_i) = E\left[u_i - E(u_i \mid X_i)\right]^2 = E(u_i^2 \mid X_i) \quad \text{(because of Assumption 3)} \quad = \sigma^2$

where var stands for variance.

Homoscedasticity vs Heteroscedasticity

$\operatorname{var}(u_i \mid X_i) = \sigma^2$

Assumption 5. No autocorrelation between the disturbances: $\operatorname{cov}(u_i, u_j \mid X_i, X_j) = 0$ for $i \neq j$.

Autocorrelation

If the PRF is $Y_t = \beta_1 + \beta_2 X_t + u_t$, and $u_t$ and $u_{t-1}$ are correlated, then $Y_t$ depends not only on $X_t$ but also on $u_{t-1}$.

Autocorrelation in Graphs

Assumption 6. Zero covariance between $u_i$ and $X_i$; that is, $\operatorname{cov}(u_i, X_i) = E(u_i X_i) = 0$.

Assumption 7. The number of observations n must be greater than the number of parameters to be estimated.

Assumption 8. Variability in X values: the X values in a given sample must not all be the same.

Assumption 9. The regression model is correctly specified; there is no specification bias or error.

Assumption 10. There is No Perfect Multicollinearity

That is, there is no perfect linear relationship among the explanatory variables.

$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \cdots + \beta_n X_{nt} + u_t$

High correlation among the independent variables causes multicollinearity, which inflates standard errors and produces low t values, making hypothesis tests unreliable, as illustrated in the sketch below.
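A minimal sketch of what perfect multicollinearity does to the normal equations (Python with NumPy; the regressors below are made up): when one regressor is an exact linear function of another, the X'X matrix is singular, so the OLS estimates are not uniquely determined.

```python
import numpy as np

# Made-up regressors: x2 is an exact linear function of x1 (perfect collinearity).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 3.0 * x1 + 2.0

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x1), x1, x2])
XtX = X.T @ X

print(np.linalg.matrix_rank(XtX))   # 2, not 3: X'X is rank-deficient
print(np.linalg.det(XtX))           # ~0: X'X is singular, so (X'X)^-1 does not exist
```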

Properties of the Least-Squares Estimators: The Gauss-Markov Theorem

The Gauss-Markov theorem combines the least-squares approach of Gauss (1821) with the minimum-variance approach of Markov (1900).

The standard error of estimate is simply the standard deviation of the Y values about the estimated regression line; it is often used as a summary measure of the “goodness of fit” of the estimated regression line.

BLUE (Best Linear Unbiased Estimator)

1. An estimator is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model.

2. An estimator is unbiased, that is, its average or expected value, E(β̂2), is equal to the true value, β2.

3. An estimator has minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is known as an efficient estimator.

Therefore, in the regression context it can be proved that the OLS estimators are BLUE; this result is the substance of the Gauss-Markov theorem. A small simulation sketch of the unbiasedness property follows.
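A minimal Monte Carlo sketch of unbiasedness (Python with NumPy; the true parameter values, sample design, and error distribution below are made-up assumptions): drawing many samples from a PRF that satisfies the CLRM assumptions, the OLS estimates of β2 should average out to the true value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up "true" PRF: Y = 24.0 + 0.5*X + u, with u ~ N(0, sigma^2).
beta1_true, beta2_true, sigma = 24.0, 0.5, 10.0
X = np.linspace(80, 260, 10)   # X fixed in repeated sampling (Assumption 2)

estimates = []
for _ in range(10_000):
    # u has zero mean, equal variance, and no autocorrelation (Assumptions 3-5).
    u = rng.normal(0.0, sigma, size=X.size)
    Y = beta1_true + beta2_true * X + u
    x, y = X - X.mean(), Y - Y.mean()
    estimates.append((x @ y) / (x @ x))   # beta2_hat = sum(x*y) / sum(x^2)

# The mean of the estimates should be very close to beta2_true = 0.5.
print(np.mean(estimates))
```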

The Coefficient of Determination, r²: A Measure of “Goodness of Fit”

The coefficient of determination, r² (two-variable case) or R² (multiple regression), is a summary measure that tells how well the sample regression line fits the data.

The Ballentine View of R2

See Peter Kennedy, “Ballentine: A Graphical Aid for Econometrics”, Australian Economic Papers, Vol. 20, 1981, pp. 414-416. The name Ballentine is derived from the emblem of the well-known Ballantine beer, with its circles.

Coefficient of Determination, r2

TSS = ESS + RSS

where:
TSS = total sum of squares
ESS = explained sum of squares
RSS = residual sum of squares

$\sum (Y_i - \bar Y)^2 = \sum (\hat Y_i - \bar Y)^2 + \sum \hat u_i^2$

that is, TSS = ESS + RSS. Dividing both sides by TSS:

$1 = \dfrac{ESS}{TSS} + \dfrac{RSS}{TSS}$

Given that TSS = ESS + RSS, we can now define r².

More on r²

r² measures the proportion of the total variation in Y that is explained by the regression model; therefore,

$r^2 = \dfrac{ESS}{TSS}$

And,

$r^2 = \dfrac{\sum (\hat Y_i - \bar Y)^2}{\sum (Y_i - \bar Y)^2} = \dfrac{ESS}{TSS}$

Alternatively,

$r^2 = 1 - \dfrac{\sum \hat u_i^2}{\sum (Y_i - \bar Y)^2} = 1 - \dfrac{RSS}{TSS}$
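A minimal sketch of the decomposition in code (Python; this reuses X, Y, y_bar, beta1_hat, and beta2_hat from the made-up example in the earlier OLS sketch):

```python
# Fitted values from the estimated regression line.
Y_hat = [beta1_hat + beta2_hat * xi for xi in X]

TSS = sum((yi - y_bar) ** 2 for yi in Y)               # total sum of squares
ESS = sum((yh - y_bar) ** 2 for yh in Y_hat)           # explained sum of squares
RSS = sum((yi - yh) ** 2 for yi, yh in zip(Y, Y_hat))  # residual sum of squares

# TSS = ESS + RSS, and both r^2 formulas agree.
print(f"TSS = {TSS:.2f}, ESS + RSS = {ESS + RSS:.2f}")
print(f"r^2 = ESS/TSS = {ESS/TSS:.4f} = 1 - RSS/TSS = {1 - RSS/TSS:.4f}")
```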


HW # 1:

Problem 3.20 (Chapter 3)

Consumer Prices and Money Supply in Japan

1982 to 2001
