econ 113: solutions for homework 4 - ucsc …cdobkin/solution key hw4.pdf · econ 113: solutions...

Econ 113: Solutions for Homework 4

Professor Carlos Dobkin

Spring 2013

1. Derive the formula for the variance of β̂0

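We have β̂0 = µ̂Y − β̂1µ̂X. Since Yi = β0 + β1Xi + ui,

$$
\hat\mu_Y = \frac{1}{n}\sum_{i=1}^{n} Y_i
= \frac{1}{n}\sum_{i=1}^{n}\left(\beta_0 + \beta_1 X_i + u_i\right)
= \beta_0 + \beta_1\hat\mu_X + \hat\mu_u
$$

Therefore β̂0 becomes β̂0 = β0 + β1µ̂X + µ̂u − β̂1µ̂X, and taking its variance gives

$$
\begin{aligned}
\operatorname{Var}(\hat\beta_0)
&= \operatorname{Var}\left(\beta_0 + \beta_1\hat\mu_X + \hat\mu_u - \hat\beta_1\hat\mu_X\right) \\
&= \operatorname{Var}\left(\hat\mu_u - \hat\beta_1\hat\mu_X\right)
 && \text{(since } \hat\mu_u \text{ and } \hat\beta_1 \text{ are the only stochastic parts)} \\
&= \operatorname{Var}(\hat\mu_u) + \hat\mu_X^2\operatorname{Var}(\hat\beta_1) - 2\hat\mu_X\operatorname{Cov}(\hat\mu_u, \hat\beta_1)
 && \text{(} \operatorname{Cov}(\hat\mu_u, \hat\beta_1) = 0 \text{ by assumption)} \\
&= \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} u_i\right) + \frac{\sigma^2\hat\mu_X^2}{\sum_{i=1}^{n}(X_i - \hat\mu_X)^2}
 && \text{(using what we found for } \operatorname{Var}(\hat\beta_1)\text{)} \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(u_i) + \frac{\sigma^2\hat\mu_X^2}{\sum_{i=1}^{n}(X_i - \hat\mu_X)^2}
 && \text{(suppose } \operatorname{Cov}(u_i, u_j) = 0,\ i \neq j\text{)} \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 + \frac{\sigma^2\hat\mu_X^2}{\sum_{i=1}^{n}(X_i - \hat\mu_X)^2}
 && \text{(by SR6)} \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2\hat\mu_X^2}{\sum_{i=1}^{n}(X_i - \hat\mu_X)^2} \\
&= \sigma^2\left(\frac{1}{n} + \frac{\hat\mu_X^2}{\sum_{i=1}^{n}(X_i - \hat\mu_X)^2}\right)
\end{aligned}
$$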
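As a rough numerical check of the final formula, the R sketch below holds X fixed, redraws the error term many times, and compares the empirical variance of β̂0 with σ²(1/n + µ̂X²/Σ(Xi − µ̂X)²). Every value in it (n, beta0, beta1, sigma, the X draws, the number of replications) is a made-up assumption for illustration and has nothing to do with the course data.

```r
set.seed(113)                           # arbitrary seed, for reproducibility only
n     <- 50
beta0 <- 2; beta1 <- 0.5; sigma <- 3    # hypothetical true parameters
x     <- runif(n, 0, 20)                # regressor, held fixed across replications

# Analytic variance from the last line of the derivation
sxx          <- sum((x - mean(x))^2)
var_analytic <- sigma^2 * (1 / n + mean(x)^2 / sxx)

# Monte Carlo: redraw u, re-estimate the intercept each time
b0_hat <- replicate(5000, {
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  coef(lm(y ~ x))[1]
})
var_simulated <- var(b0_hat)

# The same quantity from the matrix expression sigma^2 * (X'X)^{-1}
X          <- cbind(1, x)
var_matrix <- sigma^2 * solve(crossprod(X))[1, 1]

c(analytic = var_analytic, simulated = var_simulated, matrix = var_matrix)
```

The analytic and matrix values should agree up to rounding, while the simulated value should only be close, since it carries Monte Carlo noise.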


2. You want to determine if attending class leads to improved performance on the final exam (use the data from the course web page). Let's assume that assumptions SR1-SR6 hold and that final = β0 + β1 attendance + u

(a) Use R to estimate β̂0 and β̂1

> data <- read.csv("http://cdobkin.org/attendance.csv")

> attach(data)

> reg <- lm(finalexam ~ class_atten)

> print(reg)

We have the following result:

Call:
lm(formula = finalexam ~ class_atten)

Coefficients:
(Intercept)  class_atten
     82.949        1.098

Here we find that β̂0 = 82.949 and β̂1 = 1.098.
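The numbers printed by lm() are the usual OLS formulas, β̂1 = Σ(Xi − µ̂X)(Yi − µ̂Y)/Σ(Xi − µ̂X)² and β̂0 = µ̂Y − β̂1µ̂X. The sketch below reproduces them by hand on simulated stand-in data. The column names mirror the course file, but the values and the data-generating parameters are invented, so the output will not match 82.949 and 1.098.

```r
set.seed(1)
n <- 173
class_atten <- sample(0:25, n, replace = TRUE)           # hypothetical attendance counts
finalexam   <- 80 + 1 * class_atten + rnorm(n, sd = 25)  # hypothetical exam scores
dat <- data.frame(finalexam, class_atten)

# Slope: sample covariance of X and Y over sample variance of X
b1 <- with(dat,
           sum((class_atten - mean(class_atten)) * (finalexam - mean(finalexam))) /
           sum((class_atten - mean(class_atten))^2))
# Intercept: the formula used in Question 1
b0 <- with(dat, mean(finalexam) - b1 * mean(class_atten))

# Passing data = dat avoids attach(), which can mask other objects
fit <- lm(finalexam ~ class_atten, data = dat)
rbind(by_hand = c(b0, b1), lm = coef(fit))
```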

(b) Use R to estimate the se(β̂0) and se(β̂1)

> summary(reg)

We have the following result:

Call:
lm(formula = finalexam ~ class_atten)

Residuals:
    Min      1Q  Median      3Q     Max
 -74.72  -16.33    0.18   15.88   58.88

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  82.9485     9.2116   9.005 4.12e-16 ***
class_atten   1.0985     0.4257   2.580   0.0107 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.64 on 171 degrees of freedom
Multiple R-squared:  0.03747, Adjusted R-squared:  0.03184
F-statistic: 6.657 on 1 and 171 DF,  p-value: 0.01071

We can read the figures under the Std. Error column: se(β̂0) = 9.2116 and se(β̂1) = 0.4257. The coefficient estimates for part (a) also appear under the Estimate column (82.9485 and 1.0985).
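The standard errors in that column can be rebuilt from the variance formulas, with σ² replaced by the estimate Σûi²/(n − 2): se(β̂1) = sqrt(σ̂²/Σ(Xi − µ̂X)²), and se(β̂0) uses the expression derived in Question 1. The following is a minimal sketch on simulated data (all values hypothetical, not the course file) that checks the hand calculation against summary().

```r
set.seed(2)
n <- 173
x <- sample(0:25, n, replace = TRUE)     # hypothetical attendance counts
y <- 80 + 1 * x + rnorm(n, sd = 25)      # hypothetical exam scores
fit <- lm(y ~ x)

sigma2_hat <- sum(resid(fit)^2) / (n - 2)  # residual variance, n - 2 degrees of freedom
sxx        <- sum((x - mean(x))^2)

se_b1 <- sqrt(sigma2_hat / sxx)
se_b0 <- sqrt(sigma2_hat * (1 / n + mean(x)^2 / sxx))  # formula from Question 1

se_lm <- summary(fit)$coefficients[, "Std. Error"]
rbind(by_hand = c(se_b0, se_b1), summary_lm = se_lm)
```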

(c) Conduct a significance test at the 5 percent level that β1 is equal to 0.

• The null and alternative hypotheses are:

H0 : β1 = 0

H1 : β1 ≠ 0

• To test the null against the alternative, we use a t-test. The relevant formula in general is:

$$
t = \frac{\hat\beta_1 - \beta_1}{se(\hat\beta_1)}
$$

and if we apply what we found in parts (a) and (b), with β1 = 0 under the null, we have

$$
t_{\hat\beta_1} = \frac{\hat\beta_1}{se(\hat\beta_1)} = \frac{1.098}{0.4257} = 2.579
$$

• We reject the null hypothesis if |tβ̂1| > c, where c is the critical value.

Since we have more than 120 observations (n = 173), we can use the normal distribution. The critical value for the 5% significance level in the normal distribution is 1.96, and therefore

|2.579| > 1.96

so we reject the null hypothesis, i.e., we have statistical evidence that β1 is not zero.
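The test can be reproduced in a few lines of R. This sketch plugs in the estimate and standard error reported above and computes the critical value both from the normal distribution and, for comparison, from the t distribution with n − 2 = 171 degrees of freedom. The variable names are illustrative only.

```r
b1_hat <- 1.098    # slope estimate from part (a)
se_b1  <- 0.4257   # standard error from part (b)
n      <- 173      # observations (171 residual degrees of freedom + 2)

t_stat      <- (b1_hat - 0) / se_b1       # test statistic under H0: beta1 = 0
crit_normal <- qnorm(0.975)               # two-sided 5% critical value, normal
crit_t      <- qt(0.975, df = n - 2)      # two-sided 5% critical value, t(171)

c(t = t_stat, crit_normal = crit_normal, crit_t = crit_t)
abs(t_stat) > crit_normal   # TRUE means reject H0 with the normal critical value
abs(t_stat) > crit_t        # TRUE means reject H0 with the t critical value
```

The t critical value is slightly larger than 1.96, but with 171 degrees of freedom the difference is small and the conclusion is the same.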

(d) Compute the p-value for the hypothesis that β1 is equal to 0.

We found a t-value of 2.579 in part (c). Looking this value up in the normal distribution table gives 0.99506, and therefore

p-value = 2 × (1 − 0.99506) = 0.00988

We can also use R to find the p-value with the following code:

> 2*pnorm(-abs(2.579))

[1] 0.009908679

which gives a similar result.
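Both of these values are a little below the 0.0107 shown in the summary(reg) output, because summary() uses the t distribution with the residual degrees of freedom rather than the normal approximation. A short sketch of the comparison, using the t-value from part (c) and the 171 degrees of freedom reported above:

```r
t_stat <- 2.579   # t-value from part (c)
df     <- 171     # residual degrees of freedom from summary(reg)

p_normal <- 2 * pnorm(-abs(t_stat))        # normal approximation used in the text
p_t      <- 2 * pt(-abs(t_stat), df = df)  # t-based p-value, as in summary()

c(normal = p_normal, t = p_t)
```

Either way the p-value is below 0.05, so the conclusion from part (c) does not change.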

(e) How do you interpret the results from the prior two questions? Does attendance matter?

We can infer that attendance is a statistically significant determinant of the final exam score at the 5 percent level. So yes, attendance does matter.


(f) Do you think the zero conditional mean assumption really is met in this context? Why or why not?

No. Factors other than attendance also affect final scores, and the impact of those omitted variables is absorbed into the error term. If any of them is correlated with attendance, the error term is no longer mean-independent of attendance, so it seems that the zero conditional mean assumption would not hold.

(g) If the zero conditional mean assumption is not met, what does that imply for the estimates above (which way does the bias go)?

When we have an omitted variable X2, with coefficient β2 in the true model, the expected value of the estimate is

E[β̂1] = β1 + β2α1

where X2 = α0 + α1X1.

Suppose the omitted variable is hours of studying. Assume that students who put more time into studying attend more classes (α1 > 0) and that hours of studying has a positive relation with the final score (β2 > 0). Then β2α1 > 0, which is an upward bias: we overestimate β1.
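The bias formula has an exact in-sample counterpart: the slope from the short regression equals the slope from the long regression plus the estimated β2 times the estimated α1. The sketch below illustrates this on simulated data. The variable names, sample size, and parameter values (β1 = 1, β2 = 2, α1 = 0.4) are assumptions chosen for illustration only.

```r
set.seed(3)
n <- 1000
attendance  <- sample(0:25, n, replace = TRUE)
study_hours <- 2 + 0.4 * attendance + rnorm(n)                      # alpha1 = 0.4 > 0
final       <- 50 + 1 * attendance + 2 * study_hours + rnorm(n, sd = 5)  # beta1 = 1, beta2 = 2

short <- coef(lm(final ~ attendance))                # omits study_hours
long  <- coef(lm(final ~ attendance + study_hours))  # includes study_hours
aux   <- coef(lm(study_hours ~ attendance))          # auxiliary regression for alpha1

bias_hat <- long["study_hours"] * aux["attendance"]  # estimated beta2 * alpha1

c(short_slope          = unname(short["attendance"]),
  long_slope_plus_bias = unname(long["attendance"] + bias_hat),
  long_slope           = unname(long["attendance"]))
```

With these assumed parameters the short-regression slope should land near 1 + 2 × 0.4 = 1.8 rather than the true value of 1, which is the upward bias described above.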

3. Can you think of an approach to estimating the return to attending class for which the five assumptions under which we get unbiased estimates of the causal effect of class attendance are likely to be met?

A randomized trial run on a randomly selected subsample of the population, with careful measurement. In this case the zero conditional mean assumption and the no measurement error assumption are met. These are the two assumptions we are most worried about.
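To see why random assignment helps with the zero conditional mean assumption, the simulation sketch below compares two hypothetical worlds: one where attendance rises with study hours (observational) and one where attendance is assigned at random. The data-generating process, parameter values, and variable names are all invented for illustration, and the sketch does not model measurement error.

```r
set.seed(4)
n <- 10000
study_hours <- rnorm(n, mean = 10, sd = 2)   # unobserved by the researcher

# Observational world: students who study more also attend more
att_obs <- 10 + 1.5 * (study_hours - 10) + rnorm(n, sd = 2)
# Experimental world: attendance assigned at random, independent of study hours
att_rct <- sample(0:20, n, replace = TRUE)

# Hypothetical true model: the causal effect of attendance is 1
make_score <- function(att) 50 + 1 * att + 2 * study_hours + rnorm(n, sd = 5)
score_obs <- make_score(att_obs)
score_rct <- make_score(att_rct)

slope_obs <- unname(coef(lm(score_obs ~ att_obs))[2])
slope_rct <- unname(coef(lm(score_rct ~ att_rct))[2])

c(true = 1, observational = slope_obs, randomized = slope_rct)
```

Under these assumptions the randomized slope should sit close to the true effect of 1, while the observational slope is pushed upward because study hours end up in the error term and are correlated with attendance.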
