Outline: Normality; The t Test; The p-value; CI; The F test

Econometrics
The Multiple Regression Model: Inference

João Valle e Azevedo

Faculdade de Economia, Universidade Nova de Lisboa

Spring Semester

João Valle e Azevedo (FEUNL), Econometrics, Lisbon, March 2011

Inference

Inference in the Multiple Linear Regression Model

Suppose you want to test whether a variable is important in explaining variation in the dependent variable:

  - E.g., is the effect of tenure on wages statistically significant (i.e., different from zero)? Is the effect of height on wages statistically significant?

Or suppose you want to test whether a coefficient has a particular value:

  - E.g., is the effect of one additional year of schooling on expected monthly wages equal to 200?

We need to take into account the sampling distribution of our estimators.

We will check whether, under the maintained hypothesis (or null hypothesis), the observed values of certain test statistics are likely.

  - If they are not, we reject the null.


$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$

Assumption MLR.6 (Normality): The population error $u$ is independent of $x_1, x_2, \dots, x_k$ and is normally distributed with mean 0 and variance $\sigma^2$; we write $u \sim \text{Normal}(0, \sigma^2)$.

  - The independence assumption is stronger than the MLR.4 (Zero Conditional Mean) assumption. In fact, it implies MLR.4.

  - Also, normality and independence imply MLR.5, so that all the results regarding unbiasedness and variance of the estimators remain valid.


Classical Linear Model

Assumptions MLR.1 through MLR.6 are the Classical Linear Model (CLM) assumptions.

Under the CLM assumptions, OLS is not only BLUE but also the minimum variance unbiased estimator: no other unbiased estimator has a smaller variance than OLS.

We can summarize the population assumptions of the CLM as follows:

$$y \mid x \sim \text{Normal}(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k,\ \sigma^2)$$

Normality is unrealistic in many cases (e.g., wages cannot be negative, but under the normality assumption on $u$ the model allows negative wages).

However, most results would hold in large samples without the normality assumption.


Normal Sampling Distribution

Figure: The homoskedastic normal distribution with a single explanatory variable. The conditional densities f(y|x), drawn at x1 and x2, are normal distributions centered on the line E(y|x) = b0 + b1x.


Because the OLS estimators are linear functions of the error term $u$, the following holds (conditional on the $x$'s):

Theorem

Under the CLM assumptions, conditional on the sample values of the independent variables,

$$\hat{\beta}_j \sim \text{Normal}\left[\beta_j,\ \text{Var}(\hat{\beta}_j)\right]$$

  - Therefore,

$$\frac{\hat{\beta}_j - \beta_j}{\text{sd}(\hat{\beta}_j)} \sim \text{Normal}(0, 1)$$

  - where sd stands for standard deviation (the square root of the variance, derived in previous classes).


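Now, the $\sigma^2$ that appears in the expression for the standard deviation of the estimators must be estimated.

Also, conditional on the $x$'s, $(n - k - 1)\hat{\sigma}^2 / \sigma^2 \sim \chi^2_{n-k-1}$, which implies:

$$\frac{\hat{\beta}_j - \beta_j}{\text{se}(\hat{\beta}_j)} = \frac{\hat{\beta}_j - \beta_j}{\text{sd}(\hat{\beta}_j)} \cdot \frac{\text{sd}(\hat{\beta}_j)}{\text{se}(\hat{\beta}_j)} = \frac{\hat{\beta}_j - \beta_j}{\text{sd}(\hat{\beta}_j)} \cdot \frac{\sigma}{\hat{\sigma}} \equiv \frac{\text{Normal}(0, 1)}{\sqrt{\chi^2_{n-k-1} / (n - k - 1)}} \sim t_{n-k-1}$$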
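The final step of this derivation uses the standard definition of Student's t distribution, which the slide leaves implicit. As a sketch (the symbols $Z$, $W$ and $m$ are introduced here for illustration and do not appear in the slides):

```latex
Z \sim \text{Normal}(0, 1), \quad W \sim \chi^2_m, \quad Z \text{ and } W \text{ independent}
\;\Longrightarrow\; \frac{Z}{\sqrt{W / m}} \sim t_m

% In the derivation above:
Z = \frac{\hat{\beta}_j - \beta_j}{\text{sd}(\hat{\beta}_j)}, \qquad
W = \frac{(n - k - 1)\,\hat{\sigma}^2}{\sigma^2}, \qquad
m = n - k - 1, \qquad
\sqrt{W / m} = \frac{\hat{\sigma}}{\sigma}
```

The independence of $Z$ and $W$ is a standard result under the CLM assumptions (conditional on the $x$'s); it is needed for the ratio to follow a t distribution.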


Theorem

Under the CLM assumptions MLR.1 through MLR.6,

$$\frac{\hat{\beta}_j - \beta_j}{\text{se}(\hat{\beta}_j)} \sim t_{n-k-1},$$

where $k + 1$ is the number of unknown parameters in the population model $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$ ($k$ slope parameters and the intercept $\beta_0$).


Performing a test on a coefficient

Set the null hypothesis (and the alternative)

  - E.g., $H_0: \beta_j = 0$ (coefficient on experience in our wage regression) and $H_1: \beta_j > 0$

Choose a significance level (the probability of rejecting the null when the null is actually true)

  - E.g., $\alpha = 0.05$

Look at the sampling distribution of the "test statistic" $t$ (a random variable) involving the parameter:

$$t = \frac{\hat{\beta}_j - \beta_j}{\text{se}(\hat{\beta}_j)} \sim t_{n-k-1}$$

  - Under the null hypothesis, the test statistic should be "small" across samples. Reject the null if the observed value of the test statistic is very unlikely (very large).


One-Sided Tests

  - For one-sided tests where the alternative is favored if $t_{obs}$ is large and positive (e.g., $H_1: \beta_j > 0$), reject the null if the observed test statistic, $t_{obs}$, is larger than $c$, where $c$ is implicitly given by $\text{Prob}[t > c \mid H_0 \text{ is true}] = \alpha$.

  - For one-sided tests where the alternative is favored if $t_{obs}$ is large and negative (e.g., $H_1: \beta_j < 0$), reject the null if the observed test statistic, $t_{obs}$, is smaller than $-c$, where $c$ is implicitly given by $\text{Prob}[t < -c \mid H_0 \text{ is true}] = \alpha$.

For two-sided tests, where the alternative is favored if $t_{obs}$ is large in absolute value (e.g., $H_1: \beta_j \neq 0$), reject the null if the absolute value of the observed test statistic, $t_{obs}$, is larger than $c$, where $c$ is implicitly given by $\text{Prob}[|t| > c \mid H_0 \text{ is true}] = \alpha$.
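The critical values defined implicitly by these probability statements can be computed numerically. The sketch below is only an illustration: it assumes a large sample, so the standard normal stands in for the $t_{n-k-1}$ distribution, and the function name `critical_value` and its `alternative` labels are made up for this example.

```python
from statistics import NormalDist


def critical_value(alpha: float, alternative: str) -> float:
    """Return c for a test of size alpha.

    Assumption: n is large, so the standard normal is used in place
    of the t distribution with n - k - 1 degrees of freedom.
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if alternative in ("greater", "less"):
        # One-sided test: all of alpha sits in a single tail.
        return z.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        # Two-sided test: alpha is split evenly between the two tails.
        return z.inv_cdf(1 - alpha / 2)
    raise ValueError(f"unknown alternative: {alternative!r}")


c_one = critical_value(0.05, "greater")    # reject if t_obs > c_one
c_two = critical_value(0.05, "two-sided")  # reject if |t_obs| > c_two
print(round(c_one, 3), round(c_two, 3))    # prints 1.645 1.96
```

For the "less" alternative the same `c` is used with the sign flipped: reject when the observed statistic falls below `-c`.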


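One-Sided Alternative: $H_0: \beta_j = 0$, $H_1: \beta_j > 0$

Figure: Rejection region for a 5% significance level for alternative $H_1: \beta_j > 0$. The area $1 - \alpha$ is the "fail to reject the null" region; the area $\alpha$ in the right tail is the "reject the null" region.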


Two-Sided Alternative: $H_0: \beta_j = 0$, $H_1: \beta_j \neq 0$

Figure: Rejection region for a 5% significance level for alternative $H_1: \beta_j \neq 0$. The central area $1 - \alpha$ is the "fail to reject the null" region; each tail, with area $\alpha/2$, is a "reject the null" region.


Example: Hypothesis Testing

Dependent Variable: Log of Wages

Independent Variable                            Coefficient Estimate   Standard Error   t ratio
Intercept                                        5.33815               0.01218          438.36
Education (in years)                             0.07614               0.00079           96.75
Labor Market Experience (in years)               0.03093               0.00087           35.38
Square of Labor Market Experience (in years)    -0.00038               0.000018         -20.64

n = 11064
R2 = 0.4774

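The "t ratios" are the observed values of the test statistic for testing $\beta_j = 0$.

  - E.g., $96.75 \approx 0.07614 / 0.00079$. The ratio of the displayed figures is about 96.38; the small gap presumably comes from rounding in the displayed estimate and standard error.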
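As a quick check of how the t ratios relate to the other two columns, the sketch below recomputes each ratio from the displayed (rounded) estimates and standard errors. The row labels and the helper name `t_ratio` are illustrative; the numbers are copied from the table.

```python
# Rows of the wage regression table:
# (label, coefficient estimate, standard error, reported t ratio)
rows = [
    ("Intercept", 5.33815, 0.01218, 438.36),
    ("Education", 0.07614, 0.00079, 96.75),
    ("Experience", 0.03093, 0.00087, 35.38),
    ("Experience squared", -0.00038, 0.000018, -20.64),
]


def t_ratio(estimate: float, std_error: float) -> float:
    """t statistic for H0: beta_j = 0, i.e. estimate / standard error."""
    return estimate / std_error


# Recompute each ratio from the displayed figures.
recomputed = {label: t_ratio(b, se) for label, b, se, _ in rows}

for label, _, _, reported in rows:
    print(f"{label:20s} recomputed={recomputed[label]:9.2f} reported={reported:9.2f}")
```

The recomputed values are close to, but not identical with, the reported ones, which is what one would expect if the reported ratios were computed before the estimates and standard errors were rounded for display.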


Example: Hypothesis Testing (Cont.)

Choose $\alpha = 0.05$

Test $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$ (coefficient on education)

$$t_{obs} = \frac{0.07614 - 0}{0.00079} = 96.75$$

  - $|t_{obs}| > 1.96 \Rightarrow$ Reject the null: the coefficient for education is significant at the 5% significance level

  - We use the Normal approximation since $n$ is large

Figure: Two-sided rejection regions, with "reject the null" in both tails beyond $-c = -1.96$ and $c = 1.96$ and "fail to reject the null" in between.


Example: Hypothesis Testing (Cont.)

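Choose $\alpha = 0.05$

Test $H_0: \beta_j = 0$ against $H_1: \beta_j > 0$ (clearly more reasonable...)

$$t_{obs} = \frac{0.07614 - 0}{0.00079} = 96.75$$

  - $t_{obs} > 1.645 \Rightarrow$ Reject the null: the coefficient for education is significant at the 5% significance level

  - We use the Normal approximation since $n$ is large

Figure: One-sided rejection region, with "reject the null" in the right tail beyond $c = 1.645$ and "fail to reject the null" to its left.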


Example: Hypothesis Testing (Cont.)

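Choose $\alpha = 0.05$

Test $H_0: \beta_j = 0.07$ against $H_1: \beta_j \neq 0.07$ (coefficient on education)

$$t_{obs} = \frac{0.07614 - 0.07}{0.00079} = 7.772$$

  - $|t_{obs}| > 1.96 \Rightarrow$ Reject the null: the coefficient for education is significantly different from 0.07 at the 5% significance level

  - We use the Normal approximation since $n$ is large

Figure: Two-sided rejection regions, with "reject the null" in both tails beyond $-c = -1.96$ and $c = 1.96$ and "fail to reject the null" in between.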
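The three education examples differ only in the hypothesized value and in the cut-off. A minimal sketch of that calculation follows; the function name `t_statistic` is illustrative, the estimate and standard error are the displayed table values, and the cut-offs are the ones quoted in the slides under the normal approximation.

```python
def t_statistic(estimate: float, std_error: float, null_value: float = 0.0) -> float:
    """Observed t statistic for H0: beta_j = null_value."""
    return (estimate - null_value) / std_error


# Education row of the wage regression (values as displayed in the table).
beta_hat, se = 0.07614, 0.00079

t_zero = t_statistic(beta_hat, se)        # H0: beta_j = 0
t_007 = t_statistic(beta_hat, se, 0.07)   # H0: beta_j = 0.07

# Cut-offs quoted in the slides (normal approximation, alpha = 0.05).
C_TWO_SIDED, C_ONE_SIDED = 1.96, 1.645

reject_two_sided_zero = abs(t_zero) > C_TWO_SIDED  # H1: beta_j != 0
reject_one_sided_zero = t_zero > C_ONE_SIDED       # H1: beta_j > 0
reject_two_sided_007 = abs(t_007) > C_TWO_SIDED    # H1: beta_j != 0.07

print(round(t_007, 3))  # prints 7.772
print(reject_two_sided_zero, reject_one_sided_zero, reject_two_sided_007)
```

Note that the one-sided rule compares the statistic itself, not its absolute value, with the cut-off.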


p-value

p-value: Given the observed value of the t statistic, what would be the smallest significance level at which the null $H_0: \beta_j = 0$ would be rejected against the alternative $H_1: \beta_j \neq 0$?

  - It is given by:

$$\text{Prob}[|t| > |t_{obs}| \mid H_0 \text{ true}]$$

Figure: The p-value is the total area in the two tails beyond $-t_{obs}$ and $t_{obs}$ (p-value/2 in each tail), with area 1 - p-value in between. If $\alpha >$ p-value, we would reject the null!
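A two-sided p-value of this form can be computed directly. The sketch below again assumes a large sample, so the standard normal replaces the $t_{n-k-1}$ distribution; the function name `two_sided_p_value` is illustrative. It uses the complementary error function because subtracting a CDF value from 1 loses precision far out in the tails.

```python
from math import erfc, sqrt


def two_sided_p_value(t_obs: float) -> float:
    """P(|Z| > |t_obs|) for Z ~ Normal(0, 1).

    Uses the identity 2 * (1 - Phi(x)) = erfc(x / sqrt(2)),
    where Phi is the standard normal CDF.
    Assumption: n is large enough for the normal approximation.
    """
    return erfc(abs(t_obs) / sqrt(2))


p_at_cutoff = two_sided_p_value(1.96)   # statistic sitting on the 5% cut-off
p_education = two_sided_p_value(7.772)  # test of beta_j = 0.07 for education

print(round(p_at_cutoff, 3))  # prints 0.05
print(p_education < 0.05)     # prints True
```

Comparing the p-value with $\alpha$ gives the same decision as comparing $|t_{obs}|$ with the critical value.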


Confidence Intervals

A $100(1 - \alpha)\%$ confidence interval is defined as:

$$\hat{\beta}_j \pm c \times \text{se}(\hat{\beta}_j)$$

  - where $c$ is the $(1 - \frac{\alpha}{2})$ quantile of a $t_{n-k-1}$ distribution

If the hypothesized value of a parameter ($b_j$) is inside the confidence interval, we would not reject the null $\beta_j = b_j$ against $\beta_j \neq b_j$ at the significance level $\alpha$.
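To illustrate the link between the interval and the two-sided test, the sketch below builds a 95% interval for the education coefficient from the displayed table values. It assumes the normal approximation for $c$ (reasonable here since $n = 11064$); the function name `confidence_interval` is illustrative.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, std_error: float, alpha: float = 0.05):
    """Return (lower, upper) for estimate +/- c * std_error.

    Assumption: c is taken from the standard normal rather than
    from the t distribution with n - k - 1 degrees of freedom.
    """
    c = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = c * std_error
    return estimate - half_width, estimate + half_width


low, high = confidence_interval(0.07614, 0.00079)
print(round(low, 5), round(high, 5))

# The hypothesized value 0.07 lies outside the interval, which matches
# rejecting H0: beta_j = 0.07 in the two-sided test at the 5% level.
print(low <= 0.07 <= high)  # prints False
```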


Testing multiple exclusion restrictions

Unrestricted model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + u$

$H_0: \beta_{k-q+1} = \beta_{k-q+2} = \dots = \beta_k = 0$, $H_1$: not $H_0$

Restricted model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_{k-q} x_{k-q} + u$

Under the null:

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} \sim F_{q,\,n-k-1}$$

  - $r$ stands for restricted and $ur$ for unrestricted; $q$ is the number of restrictions

  - Does $SSR_{ur}$ decrease enough compared to $SSR_r$? If $F_{obs}$ is "too" large, we reject the null


$H_0: \beta_{k-q+1} = \beta_{k-q+2} = \dots = \beta_k = 0$, $H_1$: not $H_0$

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} \sim F_{q,\,n-k-1}$$

$$F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)} \sim F_{q,\,n-k-1}$$

The second form is obtained by dividing the numerator and the denominator of the first by SST.

This is different from testing the significance of each coefficient individually! It is a test of joint significance.
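The equivalence of the SSR form and the $R^2$ form can be checked numerically. The sketch below uses made-up sums of squares (hypothetical values, not taken from the lecture) and illustrative function names; it relies on both models sharing the same dependent variable, and hence the same SST.

```python
def f_from_ssr(ssr_r: float, ssr_ur: float, n: int, k: int, q: int) -> float:
    """F statistic from restricted and unrestricted residual sums of squares."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))


def f_from_r2(r2_r: float, r2_ur: float, n: int, k: int, q: int) -> float:
    """Same statistic written in terms of the two R-squared values."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))


# Hypothetical numbers, chosen only to illustrate the algebra.
sst, ssr_r, ssr_ur = 100.0, 60.0, 40.0
n, k, q = 50, 4, 2

# R-squared = 1 - SSR / SST, with the same SST for both models.
r2_r = 1 - ssr_r / sst
r2_ur = 1 - ssr_ur / sst

f_ssr = f_from_ssr(ssr_r, ssr_ur, n, k, q)
f_r2 = f_from_r2(r2_r, r2_ur, n, k, q)
print(f_ssr, f_r2)  # the two values agree up to floating-point error
```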


Testing multiple exclusion restrictions: F test

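Reject the null if the observed test statistic, $F_{obs}$, is larger than $c$, where $c$ is implicitly given by $\text{Prob}[F > c \mid H_0 \text{ is true}] = \alpha$.

Figure: Rejection region for the F test. The area $1 - \alpha$ to the left of $c$ is the "fail to reject the null" region; the area $\alpha$ to the right of $c$ is the "reject the null" region.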


Example

$H_0: \beta_2 = \beta_3 = 0$

Dependent Variable: Log of monthly wage, n = 11064

Unrestricted model

Independent Variable                            Coefficient Estimate   Standard Error   t ratio
Intercept                                        5.33815               0.01218          438.36
Education (in years)                             0.07614               0.00079           96.75
Labor Market Experience (in years)               0.03093               0.00087           35.38
Square of Labor Market Experience (in years)    -0.00038               0.000018         -20.64

Mean Square Error = 0.11342
R2 = 0.4774

Restricted model

Independent Variable                            Coefficient Estimate   Standard Error   t ratio
Intercept                                        5.88400               0.00729          807.45
Education (in years)                             0.06046               0.00081           75.05

Mean Square Error = 0.14379
R2 = 0.3374


Example (Cont.)

$\alpha = 0.05$

$H_0: \beta_2 = \beta_3 = 0$

$$F = \frac{(0.4774 - 0.3374)/2}{(1 - 0.4774)/(11064 - 3 - 1)} \approx 1481.4 > 3.00 \Rightarrow \text{Reject } H_0$$
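The expression can be evaluated directly from the two $R^2$ values in the tables. The sketch below does so and also recovers the 3.00 cut-off under a large-sample assumption: with $n - k - 1$ very large, $q \cdot F$ is approximately $\chi^2_q$, and for $q = 2$ the chi-square tail probability is $e^{-x/2}$, so the 5% cut-off for $F$ is $-\ln(0.05)$. The function name `f_from_r2` is illustrative.

```python
from math import log


def f_from_r2(r2_ur: float, r2_r: float, n: int, k: int, q: int) -> float:
    """F statistic for q exclusion restrictions, R-squared form."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))


# Values from the unrestricted and restricted wage regressions.
f_obs = f_from_r2(r2_ur=0.4774, r2_r=0.3374, n=11064, k=3, q=2)

# Large-sample cut-off (assumption: n - k - 1 is effectively infinite).
# For q = 2: P(chi2_2 > x) = exp(-x / 2), and F is roughly chi2_2 / 2,
# so P(F > c) = exp(-c) = alpha gives c = -ln(alpha).
alpha = 0.05
c = -log(alpha)

print(round(f_obs, 1), round(c, 2))
print(f_obs > c)  # prints True: reject H0
```

The closed-form cut-off is specific to $q = 2$; other values of $q$ need a chi-square or F quantile from statistical tables or software.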


Overall significance of the model

$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$, $H_1$: not $H_0$

Under the null use:

$$F = \frac{(SST - SSR)/k}{SSR/(n - k - 1)} = \frac{SSE/k}{SSR/(n - k - 1)} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} \sim F_{k,\,n-k-1}$$
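The chain of equalities rests on the decomposition of the total sum of squares and on the definition of $R^2$. Spelled out as a sketch, with SSE read as the explained sum of squares (which is what the first equality implies):

```latex
SST = SSE + SSR, \qquad R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}

\frac{SSE/k}{SSR/(n - k - 1)}
  = \frac{(SSE/SST)/k}{(SSR/SST)/(n - k - 1)}
  = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}
```

This is the exclusion-restrictions F statistic with $q = k$: the restricted model contains only the intercept, so $SSR_r = SST$ and $R^2_r = 0$.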

Testing general linear restrictions: in the practice sessions!
