ECON 482 / WH Hong Answer Key
Answer Key: Problem Set 3
1. In a study relating college grade point average to time spent in various activities, you
distribute a survey to several students. The students are asked how many hours they
spend each week in four activities: studying, sleeping, working, and leisure. Any activity
is put into one of the four categories, so that for each student, the sum of hours in the four
activities must be 168.
i. In the model
$$GPA = \beta_0 + \beta_1 study + \beta_2 sleep + \beta_3 work + \beta_4 leisure + u,$$
does it make sense to hold sleep, work, and leisure fixed while changing study?
(Ans)
No. By definition, study + sleep + work + leisure = 168. Therefore, if we change
study, we must change at least one of the other categories so that the sum is still 168.
ii. Explain why this model violates Assumption MLR.3.
(Ans)
From part (i), we can write, say, study as an exact linear function of the other
independent variables: study = 168 − sleep − work − leisure. This holds for every
observation, so MLR.3 (no perfect collinearity) is violated.
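A quick numerical illustration of the violation, using synthetic hours (made-up values, not survey data): once leisure is defined as 168 minus the other three activities, the design matrix with an intercept and all four activities has rank 4 rather than 5, so X'X is singular and OLS has no unique solution. Dropping leisure restores full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Synthetic weekly hours (hypothetical, for illustration only)
study = rng.uniform(10, 40, n)
sleep = rng.uniform(40, 60, n)
work = rng.uniform(0, 30, n)
leisure = 168 - study - sleep - work      # the adding-up constraint

X_full = np.column_stack([np.ones(n), study, sleep, work, leisure])
X_drop = np.column_stack([np.ones(n), study, sleep, work])

rank_full = np.linalg.matrix_rank(X_full)
rank_drop = np.linalg.matrix_rank(X_drop)
print(rank_full, X_full.shape[1])   # rank is one less than the number of columns
print(rank_drop, X_drop.shape[1])   # full column rank after dropping leisure
```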
iii. How could you reformulate the model so that its parameters have a useful
interpretation and it satisfies Assumption MLR.3?
(Ans)
Simply drop one of the independent variables, say leisure:
$$GPA = \beta_0 + \beta_1 study + \beta_2 sleep + \beta_3 work + u.$$
Now, for example, $\beta_1$ is interpreted as the change in GPA when study increases by
one hour, where sleep, work, and u are all held fixed. If we are holding sleep and
work fixed but increasing study by one hour, then we must be reducing leisure by one
hour. The other slope parameters have a similar interpretation.
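A minimal simulation sketch of this interpretation, assuming made-up per-hour effects for all four activities (b_study, b_sleep, b_work, b_leisure are hypothetical). Substituting leisure = 168 − study − sleep − work shows that the coefficient on study in the reformulated model equals b_study − b_leisure, the effect of moving one hour from leisure into study; OLS on the three-activity model should recover that difference, and likewise for sleep and work.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
study = rng.uniform(10, 40, n)
sleep = rng.uniform(40, 60, n)
work = rng.uniform(0, 30, n)
leisure = 168 - study - sleep - work

# Hypothetical per-hour effects, used only to generate synthetic GPA data
b_study, b_sleep, b_work, b_leisure = 0.05, 0.02, -0.01, 0.01
u = rng.normal(0, 0.3, n)
gpa = 1.0 + b_study * study + b_sleep * sleep + b_work * work + b_leisure * leisure + u

# Estimable model: leisure dropped
X = np.column_stack([np.ones(n), study, sleep, work])
coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)
print(coef[1], b_study - b_leisure)   # both close to 0.04
```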
2. Suppose that average worker productivity at manufacturing firms (avgprod) depends on
two factors, average hours of training (avgtrain) and average worker ability (avgabil):
$$avgprod = \beta_0 + \beta_1 avgtrain + \beta_2 avgabil + u.$$
Assume that this equation satisfies the Gauss-Markov assumptions. If grants have been
given to firms whose workers have less than average ability, so that avgtrain and avgabil
are negatively correlated, what is the likely bias in $\tilde\beta_1$ obtained from the
simple regression of avgprod on avgtrain?
(Ans)
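We know that $\tilde\beta_1 = \hat\beta_1 + \hat\delta_1\hat\beta_2$, where $x_1 = avgtrain$,
$x_2 = avgabil$, and $\hat\delta_1$ is the slope from the regression $x_2 = \hat\delta_0 + \hat\delta_1 x_1 + \hat e$.
Conditional on the sample values of $x_1$ and $x_2$, this gives $E(\tilde\beta_1) = \beta_1 + \beta_2\hat\delta_1$,
and $\hat\delta_1$ has the same sign as the sample correlation between $x_1$ and $x_2$. We expect
$\beta_2 > 0$, because more able workers should be more productive, and by assumption
$\mathrm{Corr}(x_1, x_2) < 0$. Therefore, there is a negative bias in $\tilde\beta_1$: $E(\tilde\beta_1) < \beta_1$.
This means that, on average across different random samples, the simple regression
estimator underestimates the effect of the training program. It is even possible that
$E(\tilde\beta_1)$ is negative even though $\beta_1 > 0$.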
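A Monte Carlo sketch of the sign of the bias. All parameter values and the data-generating process below are hypothetical: training is built to be negatively related to ability, and both β1 and β2 are positive. With these particular numbers the bias β2·δ1 is large enough that the average simple-regression slope is negative, illustrating the last sentence above.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, beta2 = 1.0, 0.5, 1.5   # hypothetical: beta1 > 0, beta2 > 0
n, reps = 200, 2000

def slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    xc = x - x.mean()
    return np.sum(xc * y) / np.sum(xc * xc)

tilde_b1 = np.empty(reps)
for r in range(reps):
    avgabil = rng.normal(0, 1, n)
    # Grants go to low-ability firms: training is negatively related to ability
    avgtrain = 2.0 - 0.8 * avgabil + rng.normal(0, 1, n)
    u = rng.normal(0, 1, n)
    avgprod = beta0 + beta1 * avgtrain + beta2 * avgabil + u
    tilde_b1[r] = slope(avgtrain, avgprod)   # omits avgabil

# Population slope of avgabil on avgtrain here is -0.8 / 1.64
print(tilde_b1.mean(), beta1 + beta2 * (-0.8 / 1.64))   # both roughly -0.23
```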
3.
i. Consider the simple regression model $y = \beta_0 + \beta_1 x + u$ under the first four Gauss-Markov
assumptions. For some function $g(x)$, for example $g(x) = x^2$ or $g(x) = \log(1 + x^2)$,
define $z_i = g(x_i)$. Define a slope estimator as
$$\tilde\beta_1 = \left(\sum_{i=1}^n (z_i - \bar z)\, y_i\right) \Big/ \left(\sum_{i=1}^n (z_i - \bar z)\, x_i\right).$$
Show that $\tilde\beta_1$ is linear (in $y_i$) and unbiased. Remember, because $E(u \mid x) = 0$, you
can treat both $x_i$ and $z_i$ as nonrandom in your derivation.
(Ans)
For notational simplicity, define $s_{zx} = \sum_{i=1}^n (z_i - \bar z)\, x_i$; this is not quite the sample
covariance between z and x because we do not divide by n − 1, but we are only using
it to simplify notation. Then we can write $\tilde\beta_1$ as
$$\tilde\beta_1 = \frac{\sum_{i=1}^n (z_i - \bar z)\, y_i}{s_{zx}}.$$
This is clearly a linear function of the $y_i$: take the weights to be $w_i = (z_i - \bar z)/s_{zx}$.
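To make the linearity concrete, here is a small sketch with synthetic x and the choice g(x) = x², where the true β0 = 2 and β1 = 0.7 are arbitrary. It builds the weights w_i and checks the two properties the unbiasedness argument relies on, Σ w_i = 0 and Σ w_i x_i = 1, which together imply that the estimation error is exactly Σ w_i u_i.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(0, 5, n)
z = x ** 2                                   # g(x) = x^2
u = rng.normal(0, 1, n)
y = 2.0 + 0.7 * x + u                        # arbitrary beta0 = 2, beta1 = 0.7

zc = z - z.mean()
s_zx = np.sum(zc * x)
w = zc / s_zx                                # w_i = (z_i - zbar) / s_zx

beta_tilde = np.sum(w * y)                   # a linear combination of the y_i
print(np.sum(w), np.sum(w * x))              # approximately 0 and 1
print(beta_tilde - 0.7, np.sum(w * u))       # the same number: error = sum of w_i * u_i
```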
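To show unbiasedness, as usual we plug $y_i = \beta_0 + \beta_1 x_i + u_i$ into this equation and
simplify:
$$\begin{aligned}
\tilde\beta_1 &= \frac{\sum_{i=1}^n (z_i - \bar z)(\beta_0 + \beta_1 x_i + u_i)}{s_{zx}} \\
&= \frac{\beta_0 \sum_{i=1}^n (z_i - \bar z) + \beta_1 s_{zx} + \sum_{i=1}^n (z_i - \bar z)\, u_i}{s_{zx}} \\
&= \beta_1 + \frac{\sum_{i=1}^n (z_i - \bar z)\, u_i}{s_{zx}},
\end{aligned}$$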
where we use the fact that $\sum_{i=1}^n (z_i - \bar z) = 0$ always. Now $s_{zx}$ is a function of the $z_i$
and $x_i$, and the expected value of each $u_i$ is zero conditional on all $z_i$ and $x_i$ in the
sample. Therefore, conditional on these values,
$$E(\tilde\beta_1) = \beta_1 + \frac{\sum_{i=1}^n (z_i - \bar z)\, E(u_i)}{s_{zx}} = \beta_1$$
because $E(u_i) = 0$ for all i.
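A Monte Carlo check of unbiasedness, holding the x_i fixed across replications in the spirit of conditioning on them, with g(x) = log(1 + x²). The parameter values are arbitrary choices for illustration; the average of the estimates should sit close to β1.

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1 = 2.0, 0.7            # arbitrary true parameters
n, reps = 100, 5000
x = rng.uniform(0, 5, n)           # held fixed across replications
z = np.log(1 + x ** 2)             # g(x) = log(1 + x^2)
zc = z - z.mean()
s_zx = np.sum(zc * x)

u = rng.normal(0, 1, (reps, n))    # each row is a fresh draw with E(u|x) = 0
y = beta0 + beta1 * x + u          # broadcasting: one simulated sample per row
estimates = (y @ zc) / s_zx        # beta_tilde for every replication

print(estimates.mean())            # close to beta1 = 0.7
```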
ii. Add the homoskedasticity assumption, MLR.5. Show that
$$\mathrm{Var}(\tilde\beta_1) = \sigma^2 \left(\sum_{i=1}^n (z_i - \bar z)^2\right) \Big/ \left(\sum_{i=1}^n (z_i - \bar z)\, x_i\right)^2.$$
(Ans)
From the final expression for $\tilde\beta_1$ in part (i) we have (again conditional on the $z_i$ and $x_i$ in the
sample),
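$$\mathrm{Var}(\tilde\beta_1) = \frac{\mathrm{Var}\left(\sum_{i=1}^n (z_i - \bar z)\, u_i\right)}{s_{zx}^2} = \frac{\sum_{i=1}^n (z_i - \bar z)^2\, \mathrm{Var}(u_i)}{s_{zx}^2} = \frac{\sigma^2 \sum_{i=1}^n (z_i - \bar z)^2}{s_{zx}^2}$$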
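because the $u_i$ are independent across i under random sampling (so the variance of the sum is the
sum of the variances) and because of the homoskedasticity assumption [$\mathrm{Var}(u_i) = \sigma^2$ for all i].
Given the definition of $s_{zx}$, this is what we wanted to show.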
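A sketch that compares the variance formula with the spread of simulated estimates, again holding x fixed and using g(x) = x². The values of β0, β1, and σ are arbitrary; with many replications the two numbers should agree up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1, sigma = 2.0, 0.7, 1.5    # arbitrary values
n, reps = 100, 20000
x = rng.uniform(0, 5, n)               # fixed design
z = x ** 2
zc = z - z.mean()
s_zx = np.sum(zc * x)

# Formula from part (ii)
var_formula = sigma ** 2 * np.sum(zc ** 2) / s_zx ** 2

u = rng.normal(0, sigma, (reps, n))    # homoskedastic, independent errors
y = beta0 + beta1 * x + u
estimates = (y @ zc) / s_zx
var_mc = estimates.var()               # simulated variance of beta_tilde

print(var_formula, var_mc)             # should agree to within about 1 to 2 percent
```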
iii. Show directly that, under the Gauss-Markov assumptions, $\mathrm{Var}(\hat\beta_1) \le \mathrm{Var}(\tilde\beta_1)$, where
$\hat\beta_1$ is the OLS estimator. [Hint: The Cauchy-Schwarz inequality implies that
$$\left(n^{-1}\sum_{i=1}^n (z_i - \bar z)(x_i - \bar x)\right)^2 \le \left(n^{-1}\sum_{i=1}^n (z_i - \bar z)^2\right)\left(n^{-1}\sum_{i=1}^n (x_i - \bar x)^2\right);$$
notice that we can drop $\bar x$ from the sample covariance.]
(Ans)
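We know that $\mathrm{Var}(\hat\beta_1) = \sigma^2 / \left[\sum_{i=1}^n (x_i - \bar x)^2\right]$. Now we can rearrange the inequality in
the hint: drop $\bar x$ from the sample covariance (because $\sum_{i=1}^n (z_i - \bar z)\,\bar x = 0$, the sum on the left
equals $s_{zx}$), cancel the factors of $n^{-1}$ on both sides, and divide both sides by
$s_{zx}^2 \sum_{i=1}^n (x_i - \bar x)^2 > 0$, to get
$$\frac{\sum_{i=1}^n (z_i - \bar z)^2}{s_{zx}^2} \ge \frac{1}{\sum_{i=1}^n (x_i - \bar x)^2}.$$
When we multiply through by $\sigma^2$ we get
$\mathrm{Var}(\tilde\beta_1) \ge \mathrm{Var}(\hat\beta_1)$, which is what we wanted to show.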
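A numerical check of the inequality across a few random designs, using the two choices of g(x) from part (i). The helper functions var_ols and var_alt are hypothetical names for the two variance formulas derived above; σ² cancels from the ratio, so it is set to 1. With z = x the alternative estimator is OLS itself, so the ratio should equal 1, which is the equality case of Cauchy-Schwarz.

```python
import numpy as np

def var_ols(x, sigma2=1.0):
    """Variance of the OLS slope, conditional on x: sigma^2 / SST_x."""
    xc = x - x.mean()
    return sigma2 / np.sum(xc ** 2)

def var_alt(x, z, sigma2=1.0):
    """Variance of the z-based slope estimator from part (ii), conditional on x and z."""
    zc = z - z.mean()
    return sigma2 * np.sum(zc ** 2) / np.sum(zc * x) ** 2

rng = np.random.default_rng(6)
ratios = []
for _ in range(5):
    x = rng.uniform(0, 5, 50)
    for z in (x ** 2, np.log(1 + x ** 2)):
        ratios.append(var_alt(x, z) / var_ols(x))
print(min(ratios))            # never below 1

# Equality case: z = x reproduces OLS
x = rng.uniform(0, 5, 50)
eq_ratio = var_alt(x, x) / var_ols(x)
print(eq_ratio)               # 1 up to floating-point rounding
```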
Computer Exercises
4. Confirm the partialling out interpretation of the OLS estimates by explicitly doing the
partialling out for Example 3.2 in the textbook, using the data set in WAGE1.dta. This
first requires regressing educ on exper and tenure and saving the residuals, $\hat r_1$.
Then, regress $\log(wage)$ on $\hat r_1$. Compare the coefficient on $\hat r_1$ with the coefficient on
educ in the regression of $\log(wage)$ on educ, exper, and tenure.
(Ans)
The regression of educ on exper and tenure yields
$$educ = 13.57 - .074\, exper + .048\, tenure + \hat r_1,$$
$n = 526$, $R^2 = .101$.
Now, when we regress log(wage) on $\hat r_1$, we obtain
$$\widehat{\log(wage)} = 1.62 + .092\, \hat r_1,$$
$n = 526$, $R^2 = .207$.
As expected, the coefficient on $\hat r_1$ in the second regression is identical to the coefficient
on educ in equation (3.19). Notice that the R-squared from the above regression is
below that in (3.19). In effect, the regression of log(wage) on $\hat r_1$ explains log(wage)
using only the part of educ that is uncorrelated with exper and tenure; separate effects of
exper and tenure are not included.
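The partialling-out result is an algebraic identity (the Frisch-Waugh-Lovell theorem), so it can be checked on any data. The sketch below uses synthetic variables named after those in WAGE1 but with made-up values and coefficients, so its estimates are not the ones reported above. The helper functions ols and r_squared are hypothetical. It also confirms that the R-squared from the partialled-out regression cannot exceed that of the full regression.

```python
import numpy as np

def ols(y, *regressors):
    """OLS with an intercept; returns (coefficients, residuals)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def r_squared(y, resid):
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(7)
n = 500
# Synthetic stand-ins for the WAGE1 variables (not the real data)
exper = rng.uniform(0, 40, n)
tenure = rng.uniform(0, 20, n)
educ = 13 - 0.05 * exper + 0.05 * tenure + rng.normal(0, 2.5, n)
lwage = 0.3 + 0.08 * educ + 0.005 * exper + 0.02 * tenure + rng.normal(0, 0.4, n)

full, resid_full = ols(lwage, educ, exper, tenure)   # log(wage) on educ, exper, tenure
_, r1_hat = ols(educ, exper, tenure)                  # step 1: residuals of educ
partial, resid_partial = ols(lwage, r1_hat)           # step 2: log(wage) on r1_hat

r2_full = r_squared(lwage, resid_full)
r2_partial = r_squared(lwage, resid_partial)
print(full[1], partial[1])      # same coefficient up to floating-point error
print(r2_full, r2_partial)      # the partialled-out R-squared is smaller
```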
5. Use the data set in WAGE2.dta for this problem. As usual, be sure all of the following
regressions contain an intercept.
i. Run a simple regression of IQ on educ to obtain the slope coefficient, say, $\tilde\delta_1$.
(Ans)
The slope coefficient from the regression of IQ on educ is (rounded to five decimal
places) $\tilde\delta_1 = 3.53383$.
ii. Run the simple regression of $\log(wage)$ on educ, and obtain the slope coefficient, $\tilde\beta_1$.
(Ans)
The slope coefficient from log(wage) on educ is $\tilde\beta_1 = .05984$.
iii. Run the multiple regression of $\log(wage)$ on educ and IQ, and obtain the slope
coefficients, $\hat\beta_1$ and $\hat\beta_2$, respectively.
(Ans)
The slope coefficients from log(wage) on educ and IQ are
$\hat\beta_1 = .03912$ and $\hat\beta_2 = .00586$, respectively.
iv. Verify that $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\tilde\delta_1$.
(Ans)
We have $\hat\beta_1 + \tilde\delta_1\hat\beta_2 = .03912 + 3.53383(.00586) \approx .05983$, which is very close
to $\tilde\beta_1 = .05984$; the small difference is due to rounding error.
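Both parts of this check can be reproduced in a few lines. The first line redoes the arithmetic with the rounded estimates reported above. The second part uses synthetic data (not WAGE2, and with arbitrary coefficients; the helper ols_coef is hypothetical) to show that, without rounding, the relationship holds exactly in-sample, so rounding is the only source of the discrepancy.

```python
import numpy as np

# Arithmetic with the rounded estimates reported above
check = round(0.03912 + 3.53383 * 0.00586, 5)
print(check)                                  # 0.05983

def ols_coef(y, *regressors):
    """OLS coefficients with an intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Synthetic stand-ins for the WAGE2 variables (not the real data)
rng = np.random.default_rng(8)
n = 900
educ = rng.integers(9, 19, n).astype(float)
iq = 60 + 3.0 * educ + rng.normal(0, 12, n)
lwage = 5.0 + 0.04 * educ + 0.006 * iq + rng.normal(0, 0.4, n)

delta1 = ols_coef(iq, educ)[1]                # IQ on educ
beta1_tilde = ols_coef(lwage, educ)[1]        # log(wage) on educ
b = ols_coef(lwage, educ, iq)                 # log(wage) on educ and IQ
print(beta1_tilde, b[1] + b[2] * delta1)      # equal up to floating-point error
```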