Class 4 Ordinary Least Squares SKEMA Ph.D programme 2010-2011 Lionel Nesta Observatoire Français des Conjonctures Economiques Lionel.nesta@ofce.sciences-po.fr

Upload: arron-cole

Post on 25-Dec-2015


Class 4

Ordinary Least Squares

SKEMA Ph.D programme 2010-2011

Lionel Nesta

Observatoire Français des Conjonctures Economiques

Lionel.nesta@ofce.sciences-po.fr


Introduction to Regression

Ideally, the social scientist wants to know not only the intensity of a relationship, but also the magnitude of the change in one variable that is associated with a one-unit change in another variable.

Regression analysis is a technique that examines the relation of a dependent variable to one or more independent (explanatory) variables.

Simple regression y = f(X)

Multiple regression y = f(X,Z)

Let us start with simple regressions

Scatter Plot of Fertilizer and Production

[Figure, built up over several slides. For observation i, the labels read:]

Prediction: $\hat{y}_i$

Error: $y_i - \hat{y}_i$


Objective of Regression

It is time to ask: “What is a good fit?”

“A good fit is what makes the error small”

“The best fit is what makes the error smallest”

Three candidates

1. To minimize the sum of all errors

2. To minimize the sum of absolute values of errors

3. To minimize the sum of squared errors


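To minimize the sum of all errors

$$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)$$

[Figure: two candidate lines through a scatter of Y against X, each with positive (+) and negative (-) errors]

Problem of sign: positive and negative errors cancel out, so a badly fitting line can still have a sum of errors equal to zero.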
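To see the sign problem concretely, here is a minimal Python sketch on hypothetical toy data (five made-up points, not the fertilizer data). Any line forced through the middle point $(\bar{x}, \bar{y})$ has errors that sum to zero whatever its slope, so the sum of errors cannot tell a good line from a bad one.

```python
# Hypothetical toy data (not the fertilizer data from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)


def sum_of_errors(slope):
    """Sum of errors y_i - yhat_i for the line through (x_bar, y_bar) with this slope."""
    intercept = y_bar - slope * x_bar
    return sum(y - (intercept + slope * x) for x, y in zip(xs, ys))


for slope in (-10.0, 0.0, 0.6, 10.0):
    # Positive and negative errors cancel: the sum is zero (up to rounding) for every slope.
    print(slope, sum_of_errors(slope))
```

The OLS slope for these points is 0.6, yet a slope of -10 or 10 scores exactly as well on this criterion.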

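To minimize the sum of absolute values of errors

$$\min \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$

[Figure: two candidate lines through the points in the (X, Y) plane. One line leaves a single error of +3; the other passes through the middle of the points, with errors -1, -1 and +2.]

Problem of middle point: the absolute-value criterion prefers the first line (total 3) to the line through the middle of the points (total 4).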

To minimize the sum of squared errors

$$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

[Figure: a candidate line through the scatter of Y against X, with positive (+) and negative (-) errors]

Squaring solves both problems: squared errors cannot cancel out, and the minimizing line goes through the middle point $(\bar{x}, \bar{y})$.
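The three candidate criteria can be compared on the errors drawn in the two panels of the absolute-value slide. This is a minimal sketch under an assumed reading of that figure: the line labelled +3 fits the other two points exactly (errors 0, 0, +3), and the line through the middle has errors -1, -1, +2.

```python
# Errors read off the figure (assumed reading, see lead-in).
errors_a = [0.0, 0.0, 3.0]    # line that misses a single point by +3
errors_b = [-1.0, -1.0, 2.0]  # line through the middle point


def criteria(errors):
    """Return (sum of errors, sum of absolute errors, sum of squared errors)."""
    return (
        sum(errors),
        sum(abs(e) for e in errors),
        sum(e * e for e in errors),
    )


for name, errs in (("A", errors_a), ("B", errors_b)):
    total, total_abs, total_sq = criteria(errs)
    print(f"line {name}: sum={total}, sum|e|={total_abs}, sum e^2={total_sq}")
```

Absolute errors prefer line A (3 against 4) even though it misses the middle point; squared errors prefer line B (6 against 9), because squaring penalizes the single large error. Line B's errors also sum to zero, which is exactly the condition for a line to pass through $(\bar{x}, \bar{y})$.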

To minimize the sum of squared errors

$$\min \sum_{i=1}^{n} \varepsilon_i^2 = \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Overcomes the sign problem

Goes through the middle point

Squaring emphasizes large errors

Easily manageable: the criterion is differentiable, so its minimum follows from first-order conditions

Has a unique minimum (as long as x is not constant)

Has a unique and, under the classical assumptions, best solution: the best linear unbiased estimator (Gauss-Markov theorem)
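A quick numerical check of the "unique minimum" property, as a minimal sketch on hypothetical toy data. For any slope, the best intercept puts the line through $(\bar{x}, \bar{y})$, so scanning candidate slopes is enough: the sum of squared errors falls, reaches a single minimum, then rises.

```python
# Hypothetical toy data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)


def sse(slope):
    """Sum of squared errors for the line through (x_bar, y_bar) with this slope."""
    intercept = y_bar - slope * x_bar
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


# Scan slopes from -1.0 to 2.0 in steps of 0.1 and keep the best one.
grid = [i / 10 for i in range(-10, 21)]
best = min(grid, key=sse)
print(best, sse(best))  # the grid minimum sits at the OLS slope, 0.6
```

For these points the criterion is the parabola $6 - 12\beta + 10\beta^2$, whose single minimum is at $\beta = 0.6$.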

Scatter Plot of Fertilizer and Production

Scatter Plot of R&D and Patents (log)

[Figure, built up over several slides]


The Simple Regression Model

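$$y_i = \alpha + \beta x_i + \varepsilon_i$$

$$E(y_i \mid x_i) = \alpha + \beta x_i$$

$y_i$: dependent variable (to be explained)

$x_i$: independent variable (explanatory)

$\alpha$: first parameter of interest

$\beta$: second parameter of interest

$\varepsilon_i$: error term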


The Simple Regression Model

$$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$$

$\hat{\alpha}$ and $\hat{\beta}$ are estimates of the true (but unknown) $\alpha$ and $\beta$.

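To minimize the sum of squared errors

$$\min \sum_{i=1}^{n} \varepsilon_i^2 = \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min_{\hat{\alpha},\, \hat{\beta}} \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$$

The first-order conditions are:

$$\frac{\partial}{\partial \hat{\alpha}} \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 = 0 \qquad \frac{\partial}{\partial \hat{\beta}} \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 = 0$$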
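Working through the two first-order conditions gives the OLS estimators. The intermediate steps are not on the slides; this is a sketch of the standard algebra:

```latex
\begin{aligned}
\frac{\partial}{\partial \hat\alpha}:\quad & -2\sum_{i=1}^{n}(y_i-\hat\alpha-\hat\beta x_i)=0
  \;\Rightarrow\; \hat\alpha=\bar{y}-\hat\beta\,\bar{x}\\
\frac{\partial}{\partial \hat\beta}:\quad & -2\sum_{i=1}^{n}x_i\,(y_i-\hat\alpha-\hat\beta x_i)=0\\
\text{substituting } \hat\alpha:\quad & \sum_{i=1}^{n}x_i\big[(y_i-\bar{y})-\hat\beta\,(x_i-\bar{x})\big]=0\\
\text{since } \textstyle\sum_i \bar{x}(y_i-\bar{y})=\sum_i \bar{x}(x_i-\bar{x})=0:\quad &
  \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})=\hat\beta\sum_{i=1}^{n}(x_i-\bar{x})^2\\
\Rightarrow\quad & \hat\beta=\frac{\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}
\end{aligned}
```

The last step requires $\sum_i (x_i - \bar{x})^2 > 0$, i.e. x must vary; this is the condition behind the unique minimum. The first condition also shows that the fitted line passes through $(\bar{x}, \bar{y})$.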

To minimize the sum of squared errors

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$
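The two formulas translate directly into code. A minimal Python sketch of closed-form simple OLS; the function name `ols_simple` and the toy data are hypothetical, chosen for illustration:

```python
def ols_simple(xs, ys):
    """Closed-form OLS for y_i = alpha + beta * x_i + e_i.

    Returns (alpha_hat, beta_hat). Requires x to vary, otherwise the
    denominator sum (x_i - x_bar)^2 is zero and beta is not identified.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    num = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of x from its mean.
    den = sum((x - x_bar) ** 2 for x in xs)
    if den == 0:
        raise ValueError("x has no variation; beta_hat is undefined")
    beta_hat = num / den
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical toy data (not the SKEMA_BIO sample).
alpha_hat, beta_hat = ols_simple([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(alpha_hat, beta_hat)  # approximately 2.2 and 0.6
```

The Excel spreadsheet on the next slide performs the same computation column by column.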

Application to SKEMA_BIO Data using Excel

Excerpt of the spreadsheet (the middle columns are deviations from the mean):

    lnpat_assets  lnrd_assets   y - ȳ    x - x̄    Numerator of β̂    Denominator of β̂
        (y)           (x)                          (y - ȳ)(x - x̄)      (x - x̄)²
      -12.77        -2.28       -0.61     0.01      -0.01               0.00
      -12.51        -2.24       -0.35     0.05      -0.02               0.00
      -12.74        -2.20       -0.58     0.09      -0.05               0.01
      -12.52        -2.31       -0.36    -0.02       0.01               0.00
      -12.12        -2.25        0.04     0.04       0.00               0.00
      -12.53        -2.26       -0.37     0.03      -0.01               0.00
      -12.09        -2.25        0.07     0.04       0.00               0.00

Mean of y = -12.16    Mean of x = -2.29    Sum of numerators = 448.75    Sum of denominators = 256.55

Alpha_hat = -8.148

Beta_hat = 1.749


$$\ln\left(\frac{\text{Patent}}{\text{Assets}}\right)_i = -8.148 + 1.748\,\ln\left(\frac{\text{R\&D}}{\text{Assets}}\right)_i$$
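The spreadsheet figures can be checked by hand from the column totals and means shown on the slide. A minimal Python sketch, with the numbers copied from the Excel slide:

```python
# Column totals and means as reported on the Excel slide (rounded to 2 decimals).
sum_numerator = 448.75    # sum of (x_i - x_bar) * (y_i - y_bar)
sum_denominator = 256.55  # sum of (x_i - x_bar) ** 2
y_bar = -12.16            # mean of lnpat_assets
x_bar = -2.29             # mean of lnrd_assets

beta_hat = sum_numerator / sum_denominator
alpha_hat = y_bar - beta_hat * x_bar
print(f"beta_hat = {beta_hat:.3f}, alpha_hat = {alpha_hat:.3f}")
```

$\hat{\beta}$ matches the slide's 1.749. $\hat{\alpha}$ comes out near -8.154 rather than -8.148 because the means shown are rounded to two decimals; Stata, working at full precision, reports -8.150657.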

Interpretation

$$\ln\left(\frac{\text{Patent}}{\text{Assets}}\right)_i = -8.148 + 1.748\,\ln\left(\frac{\text{R\&D}}{\text{Assets}}\right)_i$$

When the log of R&D per asset increases by one unit, the log of patents per asset increases by 1.748.

Remember! A change in the log of x approximates a relative (percentage) change in x itself.

A 1% increase in R&D per asset is therefore associated with an increase of about 1.748% in the number of patents per asset.
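Because the model is linear in logs, the predicted ratio y changes by a factor $(1 + \Delta x/100)^{\hat\beta}$ exactly, and the "1% gives 1.748%" reading is a first-order approximation. A minimal Python sketch comparing the two, using the slide's elasticity; the helper name `pct_change_in_y` is hypothetical:

```python
def pct_change_in_y(pct_change_in_x, beta):
    """Exact % change in y for a given % change in x when ln y = a + beta * ln x."""
    return ((1.0 + pct_change_in_x / 100.0) ** beta - 1.0) * 100.0


beta_hat = 1.748  # elasticity estimated on the slide

for dx in (1.0, 10.0):
    exact = pct_change_in_y(dx, beta_hat)
    approx = beta_hat * dx  # log approximation used on the slide
    print(f"+{dx:.0f}% R&D per asset -> exact {exact:.2f}%, approx {approx:.2f}% patents per asset")
```

The approximation is very close for a 1% change (about 1.75% either way) and drifts for larger ones (about 18.1% exact against 17.5% approximate for a 10% increase).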


OLS with STATA

Stata command: regress (abbreviated reg)

reg y x1 x2 x3 … xk [if] [weight] [, options]

noconstant (option): suppresses the constant term

robust (option): estimates heteroskedasticity-robust (Huber/White) variances

if (qualifier): selects the observations used in the estimation

weight: weighted least squares


Application to Data using STATA

reg lpat_assets lrdi

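    ------------------------------------------------------------------------------
     lpat_assets |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            lrdi |   1.748129   .1009131    17.32   0.000     1.549784    1.946475
           _cons |  -8.150657   .2440936   -33.39   0.000    -8.630425   -7.670889
    ------------------------------------------------------------------------------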

$$\ln\left(\frac{\text{Patent}}{\text{Assets}}\right)_i = -8.148 + 1.748\,\ln\left(\frac{\text{R\&D}}{\text{Assets}}\right)_i$$

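predict newvar [, statistic]

After reg, the statistic is xb for the predictions (the default) or residuals for the residuals, e.g. predict yhat, xb and predict e, residuals.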
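As a rough Python counterpart of predict with the xb and residuals options (plain Python on hypothetical toy data, not Stata), the sketch below computes fitted values and residuals and checks two properties OLS guarantees when the model includes a constant: the residuals sum to zero and are orthogonal to x.

```python
# Hypothetical toy data (not the SKEMA_BIO sample).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
alpha_hat = y_bar - beta_hat * x_bar

# Fitted values (Stata: predict yhat, xb) and residuals (Stata: predict e, residuals).
yhat = [alpha_hat + beta_hat * x for x in xs]
resid = [y - f for y, f in zip(ys, yhat)]

print("sum of residuals:", sum(resid))                                # ~0
print("sum of x * residuals:", sum(x * e for x, e in zip(xs, resid)))  # ~0
```

These two zero sums are the first-order conditions of the least-squares problem, and they are what makes the variance decomposition on the following slides work.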


Assessing the Goodness of Fit

It is important to ask whether a specification provides a good prediction of the dependent variable, given the values of the independent variable.

Ideally, we want an indicator of the proportion of the variance of the dependent variable that is accounted for (or explained) by the statistical model.

We obtain it by splitting the variance of the dependent variable (y) into the variance of the predictions (ŷ) and the variance of the residuals (ε): when the model includes a constant, OLS residuals are uncorrelated with the predictions by construction, so the two add up to the overall variance of y.


Overall Variance


Decomposing the overall variance (1)


Decomposing the overall variance (2)


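Coefficient of determination R²

R² is a statistic that provides information on the goodness of fit of the model.

$$SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad SS_{fit} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \qquad SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$SS_{tot} = SS_{fit} + SS_{res}$$

$$R^2 = \frac{SS_{fit}}{SS_{tot}}, \qquad 0 \le R^2 \le 1$$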
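The sums of squares that Stata reports for reg lpat_assets lrdi on the SKEMA_BIO sample (431 observations) give a quick arithmetic check of these definitions. A minimal Python sketch with the figures copied from that output:

```python
# Sums of squares from the Stata output of `reg lpat_assets lrdi`.
ss_fit = 784.131733   # "Model" row
ss_res = 1120.97039   # "Residual" row
ss_tot = 1905.10212   # "Total" row

# The decomposition SS_tot = SS_fit + SS_res holds up to rounding of the printout.
print("SS_fit + SS_res - SS_tot:", ss_fit + ss_res - ss_tot)

r_squared = ss_fit / ss_tot
print(f"R-squared = {r_squared:.4f}")  # Stata reports 0.4116
```

About 41% of the variance of the log of patents per asset is accounted for by the log of R&D per asset.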


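Fisher’s F Statistic

Fisher’s F statistic is a form of analysis of variance (ANOVA) on $SS_{fit}$: it tells us whether the regression model brings statistically significant information.

    Source      SS                                          df           MSS
    Fitted      $SS_{fit} = \sum_i (\hat{y}_i - \bar{y})^2$   $p$          $MSS_{fit} = SS_{fit} / p$
    Residual    $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$       $N-p-1$      $MSS_{res} = SS_{res} / (N-p-1)$
    Total       $SS_{tot} = \sum_i (y_i - \bar{y})^2$         $N-1$

$$F = \frac{MSS_{fit}}{MSS_{res}}$$

p: number of explanatory variables (parameters other than the constant)

N: number of observations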
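The ANOVA quantities can be recomputed from the sums of squares in the Stata output of reg lpat_assets lrdi (N = 431, p = 1). A minimal Python sketch; the adjusted R² uses the usual formula $1 - MSS_{res} / (SS_{tot}/(N-1))$, which is not on the slides:

```python
import math

# ANOVA figures from the Stata output of `reg lpat_assets lrdi`.
n_obs = 431
p = 1                       # one explanatory variable (lrdi), constant excluded
ss_fit = 784.131733
ss_res = 1120.97039
ss_tot = 1905.10212

mss_fit = ss_fit / p                  # mean square of the fit
mss_res = ss_res / (n_obs - p - 1)    # mean square of residuals, 429 df
f_stat = mss_fit / mss_res

adj_r2 = 1 - mss_res / (ss_tot / (n_obs - 1))
root_mse = math.sqrt(mss_res)         # estimate of the error standard deviation
print(f"F({p}, {n_obs - p - 1}) = {f_stat:.2f}")
print(f"Adj R-squared = {adj_r2:.4f}, Root MSE = {root_mse:.4f}")
```

All three values reproduce the Stata header: F(1, 429) = 300.09, Adj R-squared = 0.4102 and Root MSE = 1.6165.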


STATA output

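    . reg lpat_assets lrdi

          Source |       SS       df       MS              Number of obs =     431
    -------------+------------------------------           F(  1,   429) =  300.09
           Model |  784.131733     1  784.131733           Prob > F      =  0.0000
        Residual |  1120.97039   429  2.61298459           R-squared     =  0.4116
    -------------+------------------------------           Adj R-squared =  0.4102
           Total |  1905.10212   430  4.43047005           Root MSE      =  1.6165

    ------------------------------------------------------------------------------
     lpat_assets |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            lrdi |   1.748129   .1009131    17.32   0.000     1.549784    1.946475
           _cons |  -8.150657   .2440936   -33.39   0.000    -8.630425   -7.670889
    ------------------------------------------------------------------------------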


What the R² does not tell us

Whether the independent variables are a true cause of the changes in the dependent variable

Whether the correct regression was used

Whether the most appropriate set of independent variables has been chosen

Whether there is collinearity in the data

Whether the model could be improved by using transformed versions of the existing independent variables


Inference on β

We have estimated $\hat{\beta}$.

We must therefore test whether the estimated parameter is significantly different from 0 and, as a consequence, say something about the distribution (the mean and variance) of $\hat{\beta}$ around the true but unobserved $\beta^*$.

If $\beta^* \neq 0$, $E(y_i) = \alpha + \beta x_i$; if $\beta^* = 0$, $E(y_i) = \alpha$, and x carries no information about the expected value of y.


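The mean and variance of β

It is possible to show that $\hat{\beta}$ is a good approximation, i.e. an unbiased estimator, of the true parameter $\beta^*$:

$$E(\hat{\beta}) = \beta^*$$

$$VAR(\hat{\beta}) = \frac{\hat{\sigma}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{where} \quad \hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The (estimated) variance of $\hat{\beta}$ is the ratio of the mean square of errors to the sum of squared deviations of the explanatory variable from its mean.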
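Both properties can be illustrated by simulation: draw many samples from a known model, estimate $\hat\beta$ in each, and compare the average and spread of the estimates with $\beta^*$ and $\sigma^2 / \sum_i (x_i - \bar{x})^2$. A minimal Monte Carlo sketch; the true parameters, sample size and seed are hypothetical choices, the x values are held fixed across samples, and the true $\sigma$ is used in place of $\hat\sigma$:

```python
import random
import statistics

random.seed(12345)  # fixed seed so the run is reproducible

# Hypothetical "true" parameters and design, chosen for illustration only.
ALPHA_TRUE, BETA_TRUE, SIGMA = 1.0, 2.0, 1.0
N, REPS = 50, 2000

# The x values are drawn once and held fixed across replications.
xs = [random.uniform(0.0, 10.0) for _ in range(N)]
x_bar = sum(xs) / N
sxx = sum((x - x_bar) ** 2 for x in xs)


def beta_hat_for(ys):
    """Closed-form OLS slope for the fixed xs."""
    y_bar = sum(ys) / N
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx


estimates = []
for _ in range(REPS):
    # New errors, hence a new sample of y, drawn from y = alpha + beta * x + e.
    ys = [ALPHA_TRUE + BETA_TRUE * x + random.gauss(0.0, SIGMA) for x in xs]
    estimates.append(beta_hat_for(ys))

# Unbiasedness: the average estimate should be close to BETA_TRUE.
print("mean of beta_hat:", statistics.mean(estimates))
# Variance: simulated spread against sigma^2 / sum (x_i - x_bar)^2.
print("simulated variance:", statistics.variance(estimates))
print("formula variance:  ", SIGMA ** 2 / sxx)
```

In applied work $\sigma$ is unknown, which is why the slide replaces it with $\hat\sigma^2 = SS_{res}/(n-2)$.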


The confidence interval of β

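We must now define the 95% confidence interval of β. To do so, we use the mean and variance of $\hat{\beta}$ and define the t value as follows:

$$t = \frac{\hat{\beta} - \beta^*}{s_{\hat{\beta}}}, \qquad s_{\hat{\beta}} = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

where $\hat{\sigma}^2$ is the mean square of errors. With normally distributed errors, t follows a Student t distribution with n - 2 degrees of freedom.

Therefore, the 95% confidence interval of β is:

$$\hat{\beta} - t_{.025}\, \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \;\le\; \beta^* \;\le\; \hat{\beta} + t_{.025}\, \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

If the 95% confidence interval does not include 0, then β is significantly different from 0 at the 5% level.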
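Applied to the lrdi coefficient from the Stata output, the interval is $\hat\beta \pm t_{.025}\, s_{\hat\beta}$. A minimal Python sketch; the critical value 1.9655 for 429 degrees of freedom is an assumed input taken from a t table (in Python, `scipy.stats.t.ppf(0.975, 429)` gives the same value), and the helper name `confidence_interval` is hypothetical:

```python
def confidence_interval(coef, se, t_crit):
    """Return (lower, upper) = coef -/+ t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


# lrdi coefficient and standard error, from the Stata output of `reg lpat_assets lrdi`.
beta_hat = 1.748129
se_beta = 0.1009131
# Assumed critical value t_.025 with n - 2 = 429 degrees of freedom.
t_crit = 1.9655

lower, upper = confidence_interval(beta_hat, se_beta, t_crit)
print(f"95% CI for beta: [{lower:.4f}, {upper:.4f}]")
print("0 inside the interval:", lower <= 0.0 <= upper)
```

This reproduces Stata's interval [1.549784, 1.946475] up to rounding. Since 0 lies far outside it, the R&D coefficient is significantly different from 0.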


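Student t Test for β

We are also in a position to test hypotheses on β:

H0: β* = 0

H1: β* ≠ 0

$$t = \frac{\hat{\beta} - \beta^*}{s_{\hat{\beta}}} = \frac{\hat{\beta}}{s_{\hat{\beta}}} \quad \text{under } H_0$$

Rule of decision

Accept (do not reject) H0 if $|t| < t_{\alpha/2}$

Reject H0 if $|t| \ge t_{\alpha/2}$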
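The t column of the Stata output is simply each coefficient divided by its standard error. A minimal Python sketch of the decision rule at the 5% level; the critical value 1.9655 (429 degrees of freedom) is an assumed input from a t table, and `t_test` is a hypothetical helper name:

```python
def t_test(coef, se, t_crit):
    """t statistic for H0: true coefficient = 0, and whether H0 is rejected."""
    t = coef / se
    return t, abs(t) >= t_crit


# Coefficients and standard errors from the Stata output of `reg lpat_assets lrdi`.
estimates = {"lrdi": (1.748129, 0.1009131), "_cons": (-8.150657, 0.2440936)}
t_crit = 1.9655  # assumed t_.025 with 429 degrees of freedom

for name, (coef, se) in estimates.items():
    t, reject = t_test(coef, se, t_crit)
    print(f"{name}: t = {t:.2f}, {'reject H0' if reject else 'accept H0'}")
```

The statistics match Stata's 17.32 and -33.39, and both are far beyond the critical value, so H0 is rejected for the slope and for the constant.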
