Introductory Econometrics - Session 5 - The linear model
TRANSCRIPT
Sections: The simple regression model, The multiple regression model, Inference
Introductory Econometrics
Session 5 - The linear model
Roland Rathelot
Sciences Po
July 2011
Rathelot
Introductory Econometrics
Multivariate econometrics

- Outcome
- Covariate(s)
- Model

In the simple regression: 1 outcome and 1 regressor
Assumption: random sample

- The population of interest should be defined
- $(y_i, x_i)_{i=1\dots n}$ are assumed to be iid
- Note that here $y_i$ and $x_i$ are random variables
Assumption: A linear model

$$y = \alpha + \beta x + u$$

- $y$ is the outcome (explained variable)
- $x$ is the explanatory variable (covariate)
- $\alpha$ is the constant (intercept)
- $\beta$ is the coefficient on $x$ (slope)
- $u$ is the error
Correlation and causality

- Comovement of $\Delta x$ and $\Delta y$
- Interpreting $\beta$ in a causal sense: when is it possible?
- In a causal framework, $u$ are the unobserved determinants of $y$
Assumption: Zero expectation of the error

$$E(u) = 0$$

When an intercept is introduced in the model, this assumption comes at no cost
Assumption: Zero conditional expectation

$$E(u \mid x) = 0$$

- This means that the error is not correlated with any function of $x$
- Crucial assumption
- As a consequence: $E(y \mid x) = \alpha + \beta x$
What is the right estimator?

- Based on these assumptions, how can we estimate $\beta$ (and $\alpha$)?
- By the method of moments
- By least squares
The OLS estimator

$$\hat\beta = \frac{\sum_{i=1}^n (y_i - \bar y)(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2}$$

$$\hat\alpha = \bar y - \hat\beta \bar x$$
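The closed-form formulas above can be sketched numerically. A minimal NumPy example (sample size, seed, and true coefficients below are illustrative assumptions, not from the slides):

```python
import numpy as np

# Illustrative synthetic sample: true alpha = 1, beta = 2
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u

# OLS estimators from the closed-form expressions
beta_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

print(alpha_hat, beta_hat)  # close to the true values 1 and 2
```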
Algebraic properties of the OLS estimator

Let's define

- the residual $\hat u_i = y_i - \hat\alpha - \hat\beta x_i$
- the predicted value $\hat y_i = \hat\alpha + \hat\beta x_i$

Then

1. $\sum_i \hat u_i = 0$
2. $\sum_i x_i \hat u_i = 0$
3. $(\bar x, \bar y)$ is on the regression line
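These three properties hold exactly by construction, not just approximately. A quick check on arbitrary illustrative data:

```python
import numpy as np

# Arbitrary illustrative data
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.5 + 1.5 * x + rng.normal(size=50)

beta_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
u_hat = y - alpha_hat - beta_hat * x  # residuals
y_hat = alpha_hat + beta_hat * x      # predicted values

print(u_hat.sum())                                 # ~0 (property 1)
print((x * u_hat).sum())                           # ~0 (property 2)
print(alpha_hat + beta_hat * x.mean() - y.mean())  # ~0 (property 3)
```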
Decomposition of the variance

We define:

$$SST = \sum_i (y_i - \bar y)^2, \quad SSE = \sum_i (\hat y_i - \bar y)^2, \quad SSR = \sum_i \hat u_i^2$$

Then $SST = SSE + SSR$
Goodness of fit

The R-squared is usually used to assess the goodness of fit of the linear regression

$$R^2 = SSE/SST = 1 - SSR/SST$$

It is the share of the explained variance in the total variance
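The decomposition and the two expressions for $R^2$ can be checked numerically (the data below are illustrative):

```python
import numpy as np

# Illustrative sample
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 2.0 - 1.0 * x + rng.normal(size=100)

beta_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
y_hat = alpha_hat + beta_hat * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum(u_hat ** 2)               # residual sum of squares

r2 = sse / sst
print(r2, 1 - ssr / sst)  # the two expressions coincide
```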
Assumptions: Summary

- (A1) Linear model
- (A2) Random sample in the population
- (A3) Variability of the covariate in the sample
- (A4) Zero conditional expectation of the error
Statistical properties

Under assumptions (A1) to (A4), two important properties hold for the OLS estimator $(\hat\alpha, \hat\beta)$:

- The OLS estimator is consistent
- The OLS estimator is unbiased
An additional assumption

- So far, nothing about precision
- (A5) Homoskedasticity: $V(u \mid x) = \sigma^2$
- This assumption means that the variance of the error does not depend on the value of the covariate

Assumptions (A1)-(A5) are sometimes called the Gauss-Markov assumptions
The variance of the OLS estimator

Conditional on the sample $\{x_1 \dots x_n\}$, the variance of the OLS estimator is

$$V(\hat\beta) = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}$$

$$V(\hat\alpha) = \frac{\sum_i x_i^2}{n} \cdot \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}$$
Estimating the variance of $\hat\beta$

- What is unknown and needed is $\sigma^2$
- The usual estimator is $\hat\sigma^2 = \frac{\sum_i \hat u_i^2}{n-2}$
- This estimator is unbiased
Regression through the origin

It is possible to estimate the model with no intercept

$$y = \beta x + u$$

In this case, $\hat\beta = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$

When should this estimator be used instead of the one with an intercept?

- When there is a strong a priori or theoretical reason to believe that $\alpha = 0$
- When variables have been centered before the regression: $\tilde x_i = x_i - \bar x$
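The second bullet can be verified directly: after centering both variables, the no-intercept formula reproduces the slope of the regression with an intercept. A sketch on illustrative data:

```python
import numpy as np

# Illustrative data with a nonzero mean for x
rng = np.random.default_rng(4)
x = rng.normal(loc=3.0, size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)

# Slope from the regression with an intercept
beta_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)

# Regression through the origin on centered variables: same slope
xc, yc = x - x.mean(), y - y.mean()
beta_origin = np.sum(yc * xc) / np.sum(xc ** 2)

print(beta_hat, beta_origin)  # identical
```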
The multiple regression model

- Multiple: not just one but several covariates
- $k$ covariates: $1, x_1, x_2, \dots, x_{k-1}$
- Ceteris paribus: principle and examples
Scalar notations

The linear model now writes:

$$y_i = \beta_0 + \beta_1 x_{1,i} + \dots + \beta_{k-1} x_{k-1,i} + u_i$$

where, for the sake of clarity, the subscript $i$ is usually omitted
Matrix notations

$$y = X\beta + u$$

- $y$ is the vector $(y_1 \dots y_n)$, of length $n$
- $X$ is a matrix with $k$ columns and $n$ rows
- $\beta$ is a vector of length $k$
- $u$ is a vector of length $n$
The Gauss-Markov assumptions in the multiple case

- (A1) Linear model
- (A2) Random sample in the population
- (A3) No collinearity between covariates
- (A4) Zero conditional expectation of the error
- (A5) Homoskedasticity: $Var(u_i \mid x_i = x) = \sigma^2$
Obtaining the OLS

Just as before, both approaches:

- Least squares
- Method of moments

provide the same expression for the OLS estimator

$$\hat\beta = (\hat\beta_0, \hat\beta_1, \dots, \hat\beta_{k-1})$$
A compact expression

$$\hat\beta = (X'X)^{-1}(X'y)$$

- Under A1 to A4, the estimator is consistent and unbiased
- Under A1 to A5, its variance is equal to $(X'X)^{-1}\sigma^2$
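The compact formula translates directly to code. A sketch on illustrative data; `np.linalg.solve` is used rather than an explicit inverse for numerical stability, but it computes the same $(X'X)^{-1} X'y$:

```python
import numpy as np

# Illustrative design: a constant plus 2 covariates, n = 300
rng = np.random.default_rng(5)
n, k = 300, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# OLS: solve (X'X) beta = X'y, i.e. beta-hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true
```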
Residuals and predicted values

The residuals and the predicted values are defined as before

$$\hat u = y - X\hat\beta, \quad \hat y = X\hat\beta$$
Goodness of fit

The R-squared is still used to assess the model's goodness of fit

$$R^2 = SSE/SST = 1 - SSR/SST$$
Estimating the variance of the OLS estimator

- As in the simple case, $\sigma^2$ has to be estimated
- An unbiased estimator for $\sigma^2$ is:

$$\hat\sigma^2 = \frac{1}{n - k} \sum_i \hat u_i^2$$
Projections

To interpret the meaning of OLS estimators, it is useful to introduce:

$$P_X = X(X'X)^{-1}X'$$
$$M_X = I - X(X'X)^{-1}X'$$

- $P_X$ and $M_X$ are symmetric
- $P_X$ and $M_X$ are projectors

so that $\hat y = P_X y$ and $\hat u = M_X y$
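A numerical sketch of these projector properties (symmetry, idempotence, and the fitted-value/residual identities) on illustrative data:

```python
import numpy as np

# Illustrative design matrix and outcome
rng = np.random.default_rng(6)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T   # projects onto the column space of X
M = np.eye(n) - P       # projects onto its orthogonal complement

beta_hat = XtX_inv @ (X.T @ y)
print(np.allclose(P, P.T), np.allclose(P @ P, P))        # symmetric, idempotent
print(np.allclose(P @ y, X @ beta_hat))                  # y-hat = P_X y
print(np.allclose(M @ y, y - X @ beta_hat))              # u-hat = M_X y
```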
The Frisch-Waugh theorem

Split the covariates into two groups: $X = (X_1, X_2)$, $\beta = (\beta_1, \beta_2)$

- First regress $y$ on $X_1$ and $X_2$ on $X_1$, and keep the residuals $M_{X_1} y$ and $M_{X_1} X_2$
- Now regress $M_{X_1} y$ on $M_{X_1} X_2$: the obtained estimator is equal to the OLS estimator $\hat\beta_2$
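The two-step procedure can be checked numerically against the full regression (the data and coefficients below are illustrative):

```python
import numpy as np

# Illustrative data: X1 = constant + one covariate, X2 = one covariate
rng = np.random.default_rng(7)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 1)) + 0.5 * X1[:, [1]]  # correlated with X1
X = np.hstack([X1, X2])
y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(size=n)

# Full regression coefficient on X2
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Frisch-Waugh: partial X1 out of both y and X2, then regress residuals
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
y_t, X2_t = M1 @ y, M1 @ X2
beta2_fw = np.linalg.solve(X2_t.T @ X2_t, X2_t.T @ y_t)

print(beta_full[2], beta2_fw[0])  # equal up to floating-point error
```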
Frisch-Waugh and ceteris paribus

- Suppose we are especially interested in $\beta_j$, the coefficient on $x_j$
- First regress $x_j$ on all the other covariates and keep the residual $M_{-j} x_j$
- $\hat\beta_j$ may be obtained by the regression of $y$ on $M_{-j} x_j$
Frisch-Waugh and the variance of $\hat\beta_j$

$$Var(\hat\beta_j) = \frac{\sigma^2}{(1 - R_j^2)\, SST_j}$$

- $SST_j = \sum_i (x_{ij} - \bar x_j)^2$
- $R_j^2$ is the R-squared from regressing $x_j$ on all other independent variables
Misspecification

Suppose the true model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$

- $\hat\beta_1$ is the OLS estimator of the coefficient on $x_1$ in the regression of $y$ on $x_1$ and $x_2$
- $\tilde\beta_1$ is the OLS estimator from the regression of $y$ on $x_1$ alone
Misspecification (2)

- $\tilde\beta_1$ is biased iff:
  1. $\beta_2 \neq 0$
  2. $Cov(x_1, x_2) \neq 0$
- In terms of variance, $\tilde\beta_1$ is always more precise than $\hat\beta_1$:

$$Var(\tilde\beta_1) \leq Var(\hat\beta_1)$$
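A simulation sketch of omitted-variable bias: with $\beta_2 \neq 0$ and $Cov(x_1, x_2) \neq 0$, the short regression converges to $\beta_1 + \beta_2\, Cov(x_1, x_2)/Var(x_1)$ rather than $\beta_1$. All numbers below are illustrative assumptions:

```python
import numpy as np

# Large illustrative sample so the bias dominates the sampling noise
rng = np.random.default_rng(8)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)                  # Cov(x1, x2) = 0.8
y = 1.0 + 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)  # beta2 = 0.5 != 0

# Long regression (both covariates): roughly unbiased for beta1 = 1
X = np.column_stack([np.ones(n), x1, x2])
beta_long = np.linalg.solve(X.T @ X, X.T @ y)

# Short regression (x1 only): converges to 1 + 0.5 * 0.8 = 1.4
beta_short = np.sum((y - y.mean()) * (x1 - x1.mean())) / np.sum((x1 - x1.mean()) ** 2)
print(beta_long[1], beta_short)
```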
BLUE

Among the linear estimators $\tilde\beta_j = \sum_{i=1}^n w_i y_i$ that are unbiased, the OLS estimator is the one with the smallest variance.

It is said to be the Best Linear Unbiased Estimator
Normality

- Even under the Gauss-Markov assumptions, the distribution of $\hat\beta$ may still have any form
- To be able to make inference, we need to add a normality assumption

(A6) $u$ is independent from $x_1 \dots x_k$ and is distributed as $N(0, \sigma^2)$
Distribution of the estimator

Under A1 to A6, the OLS estimator is distributed as:

$$\hat\beta_j \sim N(\beta_j, Var(\hat\beta_j))$$

and

$$\frac{\hat\beta_j - \beta_j}{\sqrt{Var(\hat\beta_j)}} \sim N(0, 1)$$
The t-stat

- The distribution of its empirical counterpart is a Student with $n - k$ degrees of freedom:

$$\frac{\hat\beta_j - \beta_j}{\sqrt{\hat V(\hat\beta_j)}} \sim t_{n-k}$$

- When $n$ is not too small, this distribution is very close to a standard normal
- To test the significance of the coefficient $\beta_j$, the t-statistic is usually used:

$$t = \frac{\hat\beta_j}{\sqrt{\hat V(\hat\beta_j)}}$$
Testing any linear restriction

This test may be used in any case where there is one linear restriction:

- the equality of a coefficient to 0
- the equality of a coefficient to any number
- the equality of two coefficients
- any linear relationship between two or more coefficients
Testing more restrictions

To test several linear restrictions jointly, the Fisher (F) test is used
What if normality is not likely?

- In many cases, the normality of the errors is a strong assumption
- How can we do inference without this assumption?
- We replace A6 by another assumption:
- (A6') $n$ is sufficiently large so that we can use the asymptotic properties of the OLS estimator
Consistency of the OLS estimator

- The OLS estimator is consistent:

$$\operatorname{plim} \hat\beta = \beta$$
Asymptotic normality

Using the Central Limit Theorem, under the Gauss-Markov assumptions:

- $\sqrt{n}(\hat\beta - \beta) \rightsquigarrow N(0, \sigma^2 A^{-1})$
- where $A = \operatorname{plim}(X'X)/n$
- $\hat\sigma^2$ is a consistent estimator of $\sigma^2$
- Finally, $\dfrac{\hat\beta_j - \beta_j}{\sqrt{\hat V(\hat\beta_j)}} \rightsquigarrow N(0, 1)$
Asymptotic inference

When $n$ is large, one may, without the normality assumption A6:

- use the t-test for one linear restriction
- use the Fisher test for several linear restrictions
The asymptotic behavior of the variance

We already know that

$$\widehat{Var}(\hat\beta_j) = \frac{\hat\sigma^2}{(1 - R_j^2)\, SST_j}$$

- $\hat\sigma^2$ is consistent for $\sigma^2$
- $R_j^2$ converges to some value between 0 and 1
- $SST_j/n$ converges to $Var(x_j)$

So the variance is $O(1/n)$ and the standard error $O(1/\sqrt{n})$