answer key: problem set 3 -...

ECON 482 / WH Hong Answer Key

Answer Key: Problem Set 3

1. In a study relating college grade point average to time spent in various activities, you distribute a survey to several students. The students are asked how many hours they spend each week in four activities: studying, sleeping, working, and leisure. Any activity is put into one of the four categories, so that for each student, the sum of hours in the four activities must be 168.

i. In the model

$$\text{GPA} = \beta_0 + \beta_1\,\text{study} + \beta_2\,\text{sleep} + \beta_3\,\text{work} + \beta_4\,\text{leisure} + u,$$

does it make sense to hold sleep, work, and leisure fixed while changing study?

(Ans)

No. By definition, study + sleep + work + leisure = 168 for every student. Therefore, if we change study, we must change at least one of the other categories so that the sum is still 168. Holding sleep, work, and leisure fixed while changing study is impossible.

ii. Explain why this model violates Assumption MLR.3.

(Ans)

From part (i), we can write, say, study as an exact linear function of the other independent variables: study = 168 − sleep − work − leisure. This holds for every observation, so the regressors are perfectly collinear and MLR.3 (no perfect collinearity) is violated.
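The violation can be seen numerically. The sketch below uses made-up weekly hours for a hypothetical sample of 50 students. Leisure is set to whatever remains of 168 hours. The design matrix with an intercept and all four activities then has 5 columns but only rank 4, because 168 times the intercept column equals the sum of the activity columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical weekly hours: draw three activities; leisure is what is left of 168.
study = rng.uniform(5, 40, n)
sleep = rng.uniform(40, 60, n)
work = rng.uniform(0, 30, n)
leisure = 168 - study - sleep - work

# Design matrix for the model in part (i): intercept plus all four activities.
X_full = np.column_stack([np.ones(n), study, sleep, work, leisure])

# Exact linear dependence: study + sleep + work + leisure - 168 * intercept = 0,
# so the five columns span only a four-dimensional space.
rank_full = np.linalg.matrix_rank(X_full)
print(X_full.shape[1], rank_full)  # 5 columns, rank 4
```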

iii. How could you reformulate the model so that its parameters have a useful interpretation and it satisfies Assumption MLR.3?

(Ans)

Simply drop one of the independent variables, say leisure:

$$\text{GPA} = \beta_0 + \beta_1\,\text{study} + \beta_2\,\text{sleep} + \beta_3\,\text{work} + u.$$

Now, for example, $\beta_1$ is interpreted as the change in GPA when study increases by one hour, where sleep, work, and u are all held fixed. If we are holding sleep and work fixed but increasing study by one hour, then we must be reducing leisure by one hour. The other slope parameters have a similar interpretation.
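The "study instead of leisure" interpretation can be checked with a sketch. The per-hour effects below (0.04 for study, 0.01 for sleep, 0 for work, 0.02 for leisure) and the noiseless data are hypothetical, chosen only for illustration. Substituting leisure = 168 − study − sleep − work shows that each slope in the reduced model equals that activity's effect minus the leisure effect. The intercept absorbs 168 times the leisure effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
study = rng.uniform(5, 40, n)
sleep = rng.uniform(40, 60, n)
work = rng.uniform(0, 30, n)
leisure = 168 - study - sleep - work

# Hypothetical per-hour effects of each activity (no noise, so the fit is exact).
gpa = 1.0 + 0.04 * study + 0.01 * sleep + 0.00 * work + 0.02 * leisure

# Reformulated model: drop leisure; the design matrix now has full column rank.
X = np.column_stack([np.ones(n), study, sleep, work])
coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)

# Coefficients implied by substituting leisure = 168 - study - sleep - work.
expected = np.array([1.0 + 168 * 0.02, 0.04 - 0.02, 0.01 - 0.02, 0.00 - 0.02])
print(np.round(coef, 4), expected)
```

For instance, the estimated coefficient on study (0.02) is the gain from moving one hour out of leisure and into studying. It is not the "pure" effect of studying.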



2. Suppose that average worker productivity at manufacturing firms (avgprod) depends on two factors, average hours of training (avgtrain) and average worker ability (avgabil):

$$\text{avgprod} = \beta_0 + \beta_1\,\text{avgtrain} + \beta_2\,\text{avgabil} + u.$$

Assume that this equation satisfies the Gauss-Markov assumptions. If grants have been given to firms whose workers have less than average ability, so that avgtrain and avgabil are negatively correlated, what is the likely bias in $\tilde\beta_1$ obtained from the simple regression of avgprod on avgtrain?

(Ans)

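Let $x_1$ = avgtrain and $x_2$ = avgabil. We know that $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\tilde\delta_1$. Here $\hat\beta_1$ and $\hat\beta_2$ come from the multiple regression, and $\tilde\delta_1$ is the slope from the simple regression $\hat x_2 = \tilde\delta_0 + \tilde\delta_1 x_1$. Conditional on the sample values of $x_1$ and $x_2$, this gives $E(\tilde\beta_1) = \beta_1 + \beta_2\tilde\delta_1$. Because higher ability should raise productivity, $\beta_2 > 0$. By assumption, $\mathrm{Corr}(x_1, x_2) < 0$, so $\tilde\delta_1$ will typically be negative. Therefore, there is a negative bias in $\tilde\beta_1$: $E(\tilde\beta_1) < \beta_1$. This means that, on average across different random samples, the simple regression estimator underestimates the effect of the training program. It is even possible that $E(\tilde\beta_1)$ is negative even though $\beta_1 > 0$.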
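A small simulation illustrates both the algebraic identity and the sign reversal mentioned in the last sentence. It assumes hypothetical true values $\beta_1 = 0.3$ and $\beta_2 = 1$, and it builds avgtrain as −0.8·avgabil plus noise, so training and ability are negatively correlated. The helper `ols` is a minimal least-squares routine written for this sketch, not a library function.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n = 10_000
beta0, beta1, beta2 = 1.0, 0.3, 1.0  # hypothetical true parameters, beta2 > 0

avgabil = rng.normal(0, 1, n)
# Grants go to low-ability firms, so training is negatively correlated with ability.
avgtrain = -0.8 * avgabil + rng.normal(0, 1, n)
avgprod = beta0 + beta1 * avgtrain + beta2 * avgabil + rng.normal(0, 1, n)

b_tilde = ols(avgprod, avgtrain)[1]      # simple regression slope (tilde beta1)
b_hat = ols(avgprod, avgtrain, avgabil)  # multiple regression (hat beta0, hat beta1, hat beta2)
d_tilde = ols(avgabil, avgtrain)[1]      # slope from regressing avgabil on avgtrain

# In-sample identity: tilde beta1 = hat beta1 + hat beta2 * tilde delta1.
identity_gap = b_tilde - (b_hat[1] + b_hat[2] * d_tilde)
print(round(b_tilde, 3), round(d_tilde, 3), identity_gap)
```

With these settings the population value of $\tilde\delta_1$ is −0.8/1.64 ≈ −0.49. So the simple-regression slope centers near 0.3 − 0.49 ≈ −0.19, which is negative even though $\beta_1 = 0.3$.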

3.

i. Consider the simple regression model $y = \beta_0 + \beta_1 x + u$ under the first four Gauss-Markov assumptions. For some function $g(x)$, for example $g(x) = x^2$ or $g(x) = \log(1 + x^2)$, define $z_i = g(x_i)$. Define a slope estimator as

$$\tilde\beta_1 = \left(\sum_{i=1}^{n} (z_i - \bar{z})\, y_i\right) \Big/ \left(\sum_{i=1}^{n} (z_i - \bar{z})\, x_i\right).$$

Show that $\tilde\beta_1$ is linear (in $y_i$) and unbiased. Remember, because $E(u \mid x) = 0$, you can treat both $x_i$ and $z_i$ as nonrandom in your derivation.

(Ans)

For notational simplicity, define $s_{zx} = \sum_{i=1}^{n} (z_i - \bar{z})\, x_i$. This is not quite the sample covariance between z and x, because we do not divide by n − 1, but we are only using it to simplify notation. Then we can write $\tilde\beta_1$ as

$$\tilde\beta_1 = \frac{\sum_{i=1}^{n} (z_i - \bar{z})\, y_i}{s_{zx}}.$$

This is clearly a linear function of the $y_i$: take the weights to be $w_i = (z_i - \bar{z})/s_{zx}$.

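To show unbiasedness, as usual we plug $y_i = \beta_0 + \beta_1 x_i + u_i$ into this equation and simplify:

$$\begin{aligned}
\tilde\beta_1 &= \frac{\sum_{i=1}^{n} (z_i - \bar{z})(\beta_0 + \beta_1 x_i + u_i)}{s_{zx}} \\
&= \frac{\beta_0 \sum_{i=1}^{n} (z_i - \bar{z}) + \beta_1 s_{zx} + \sum_{i=1}^{n} (z_i - \bar{z})\, u_i}{s_{zx}} \\
&= \beta_1 + \frac{\sum_{i=1}^{n} (z_i - \bar{z})\, u_i}{s_{zx}},
\end{aligned}$$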

where we use the fact that $\sum_{i=1}^{n} (z_i - \bar{z}) = 0$ always. Now $s_{zx}$ is a function of the $z_i$ and $x_i$, and the expected value of each $u_i$ is zero conditional on all $z_i$ and $x_i$ in the sample. Therefore, conditional on these values,

$$E(\tilde\beta_1) = \beta_1 + \frac{\sum_{i=1}^{n} (z_i - \bar{z})\, E(u_i)}{s_{zx}} = \beta_1$$

because $E(u_i) = 0$ for all i.
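A Monte Carlo sketch can check both properties. It holds a hypothetical set of $x_i$ fixed (mirroring the conditioning argument), uses $g(x) = x^2$, and redraws only the errors. The weights $w_i = (z_i - \bar z)/s_{zx}$ sum to zero and satisfy $\sum_i w_i x_i = 1$, which is exactly what makes the estimator unbiased. The true values $\beta_0 = 2$ and $\beta_1 = 0.5$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 100, 5_000
beta0, beta1 = 2.0, 0.5          # hypothetical true parameters
x = rng.uniform(0, 4, n)         # held fixed across replications (conditioning on x)
z = x ** 2                       # z_i = g(x_i) with g(x) = x^2

# Linear weights: beta_tilde = sum(w_i * y_i).
s_zx = np.sum((z - z.mean()) * x)
w = (z - z.mean()) / s_zx

# Each row of Y is one simulated sample with fresh errors u ~ N(0, 1).
U = rng.normal(0.0, 1.0, size=(reps, n))
Y = beta0 + beta1 * x + U
estimates = Y @ w                # one beta_tilde per replication

print(w.sum(), w @ x, estimates.mean())  # ~0, ~1, close to beta1
```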

ii. Add the homoskedasticity assumption, MLR.5. Show that

$$\mathrm{Var}(\tilde\beta_1) = \sigma^2 \left(\sum_{i=1}^{n} (z_i - \bar{z})^2\right) \Big/ \left(\sum_{i=1}^{n} (z_i - \bar{z})\, x_i\right)^2.$$

(Ans)

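From the final expression for $\tilde\beta_1$ in part (i), we have (again conditional on the $z_i$ and $x_i$ in the sample)

$$\mathrm{Var}(\tilde\beta_1) = \frac{\mathrm{Var}\left(\sum_{i=1}^{n} (z_i - \bar{z})\, u_i\right)}{s_{zx}^2} = \frac{\sum_{i=1}^{n} (z_i - \bar{z})^2\, \mathrm{Var}(u_i)}{s_{zx}^2} = \frac{\sigma^2 \sum_{i=1}^{n} (z_i - \bar{z})^2}{s_{zx}^2}$$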

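This holds because the $u_i$ are uncorrelated across observations (random sampling, MLR.2) and because of the homoskedasticity assumption [$\mathrm{Var}(u_i) = \sigma^2$ for all i]. Given the definition of $s_{zx}$, this is what we wanted to show.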
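The variance formula can be compared with the spread of simulated estimates. The sketch below assumes homoskedastic, independent normal errors with a hypothetical $\sigma = 2$ and fixed $x_i$, again with $g(x) = x^2$. With 20,000 replications the simulated variance should land within a few percent of the formula.

```python
import numpy as np

rng = np.random.default_rng(11)
n, reps, sigma = 100, 20_000, 2.0
beta0, beta1 = 2.0, 0.5          # hypothetical true parameters
x = rng.uniform(0, 4, n)         # fixed regressor values
z = x ** 2
zc = z - z.mean()
s_zx = np.sum(zc * x)

# Analytical variance from part (ii), conditional on x and z.
var_formula = sigma**2 * np.sum(zc**2) / s_zx**2

# Simulate with homoskedastic, independent errors: Var(u_i) = sigma^2 for all i.
U = rng.normal(0.0, sigma, size=(reps, n))
Y = beta0 + beta1 * x + U
estimates = Y @ zc / s_zx        # beta_tilde for each replication
var_sim = estimates.var()

print(var_formula, var_sim)
```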

iii. Show directly that, under the Gauss-Markov assumptions, $\mathrm{Var}(\hat\beta_1) \le \mathrm{Var}(\tilde\beta_1)$, where $\hat\beta_1$ is the OLS estimator. [Hint: The Cauchy-Schwarz inequality implies that

$$\left(n^{-1}\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})\right)^2 \le \left(n^{-1}\sum_{i=1}^{n} (z_i - \bar{z})^2\right)\left(n^{-1}\sum_{i=1}^{n} (x_i - \bar{x})^2\right);$$

notice that we can drop $\bar{x}$ from the sample covariance.]

(Ans)

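We know that $\mathrm{Var}(\hat\beta_1) = \sigma^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2$. We can drop $\bar{x}$ from the sample covariance because $\sum_{i=1}^{n} (z_i - \bar{z})\,\bar{x} = 0$, so the cross-product sum in the hint equals $s_{zx}$. Rearranging the inequality in the hint and cancelling $n^{-1}$ everywhere then gives

$$\frac{\sum_{i=1}^{n} (z_i - \bar{z})^2}{s_{zx}^2} \ge \frac{1}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

When we multiply through by $\sigma^2$ we get $\mathrm{Var}(\tilde\beta_1) \ge \mathrm{Var}(\hat\beta_1)$, which is what we wanted to show.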
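Because $\sigma^2$ cancels, the efficiency comparison depends only on the $x_i$ and $z_i$. The sketch below computes $\mathrm{Var}(\tilde\beta_1)/\mathrm{Var}(\hat\beta_1)$ for a hypothetical sample of x values. It tries three choices of z: the two examples from part (i), and a linear function of x, where the inequality should hold with equality.

```python
import numpy as np

def variance_ratio(x, z):
    """Var(beta_tilde) / Var(beta_hat) under homoskedasticity; sigma^2 cancels."""
    zc = z - z.mean()
    xc = x - x.mean()
    s_zx = np.sum(zc * x)
    var_tilde = np.sum(zc**2) / s_zx**2   # Var(beta_tilde) / sigma^2
    var_hat = 1.0 / np.sum(xc**2)         # Var(beta_hat) / sigma^2 (OLS)
    return var_tilde / var_hat

rng = np.random.default_rng(3)
x = rng.uniform(0, 4, 100)

r_square = variance_ratio(x, x**2)            # g(x) = x^2
r_log = variance_ratio(x, np.log1p(x**2))     # g(x) = log(1 + x^2)
r_linear = variance_ratio(x, 3 * x + 5)       # z linear in x reproduces OLS
print(r_square, r_log, r_linear)
```

The ratio equals one over the squared sample correlation between x and z. So it is at least 1, and exactly 1 only when z is a linear function of x, in which case $\tilde\beta_1$ coincides with OLS.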

Computer Exercises

4. Confirm the partialling out interpretation of the OLS estimates by explicitly doing the partialling out for Example 3.2 in the textbook, using the data set in WAGE1.dta. This first requires regressing educ on exper and tenure and saving the residuals, $\hat r_1$. Then, regress log(wage) on $\hat r_1$. Compare the coefficient on $\hat r_1$ with the coefficient on educ in the regression of log(wage) on educ, exper and tenure.

(Ans)

The regression of educ on exper and tenure yields

$$\text{educ} = 13.57 - .074\,\text{exper} + .048\,\text{tenure} + \hat r_1$$

n = 526, $R^2$ = .101.

Now, when we regress log(wage) on $\hat r_1$ we obtain

$$\widehat{\log(\text{wage})} = 1.62 + .092\,\hat r_1$$

n = 526, $R^2$ = .207.

As expected, the coefficient on $\hat r_1$ in the second regression is identical to the coefficient on educ in equation (3.19). Notice that the R-squared from the above regression is below that in (3.19). In effect, the regression of log(wage) on $\hat r_1$ explains log(wage) using only the part of educ that is uncorrelated with exper and tenure; separate effects of exper and tenure are not included.
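The two-step procedure can be sketched in numpy. The data below are synthetic stand-ins with the same variable roles and n = 526; they are NOT the WAGE1 data, and the coefficients used to generate them are made up. The real file could be loaded with `pandas.read_stata`, but its column names would need checking. The helper `ols` is hypothetical and written for this sketch.

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept prepended; returns (coefficients, residuals)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return coef, y - Xc @ coef

# Synthetic stand-in for WAGE1 (hypothetical values, not the real data set).
rng = np.random.default_rng(0)
n = 526
exper = rng.uniform(0, 40, n)
tenure = rng.uniform(0, 20, n)
educ = 13 - 0.07 * exper + 0.05 * tenure + rng.normal(0, 2.5, n)
lwage = 0.3 + 0.09 * educ + 0.004 * exper + 0.02 * tenure + rng.normal(0, 0.4, n)

# Full multiple regression of log(wage) on educ, exper, tenure.
coef_full, _ = ols(lwage, np.column_stack([educ, exper, tenure]))

# Step 1: partial exper and tenure out of educ and keep the residuals r1.
_, r1 = ols(educ, np.column_stack([exper, tenure]))
# Step 2: simple regression of log(wage) on r1.
coef_partial, _ = ols(lwage, r1)

print(coef_full[1], coef_partial[1])  # identical up to floating-point error
```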

5. Use the data set in WAGE2.dta for this problem. As usual, be sure all of the following regressions contain an intercept.

i. Run a simple regression of IQ on educ to obtain the slope coefficient, say, $\tilde\delta_1$.

(Ans)

The slope coefficient from the regression of IQ on educ is (rounded to five decimal places) $\tilde\delta_1 = 3.53383$.

ii. Run the simple regression of log(wage) on educ, and obtain the slope coefficient, $\tilde\beta_1$.

(Ans)

The slope coefficient from log(wage) on educ is $\tilde\beta_1 = .05984$.

iii. Run the multiple regression of log(wage) on educ and IQ, and obtain the slope coefficients, $\hat\beta_1$ and $\hat\beta_2$, respectively.

(Ans)

The slope coefficients from log(wage) on educ and IQ are $\hat\beta_1 = .03912$ and $\hat\beta_2 = .00586$, respectively.

iv. Verify that $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\tilde\delta_1$.

(Ans)

We have $\hat\beta_1 + \tilde\delta_1\hat\beta_2 = .03912 + 3.53383(.00586) \approx .05983$, which is very close to .05984; the small difference is due to rounding error.
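The arithmetic and the rounding-error claim can be checked directly from the reported five-decimal estimates. The bound below is a rough worst case. It assumes each reported number is off by at most half a unit in the fifth decimal place, and it ignores the product of two rounding errors, which is negligible.

```python
# Reported (rounded) estimates from parts (i)-(iii).
delta1_tilde = 3.53383
beta1_tilde = 0.05984
beta1_hat = 0.03912
beta2_hat = 0.00586

implied = beta1_hat + beta2_hat * delta1_tilde
gap = beta1_tilde - implied

# Worst-case first-order rounding error: half a unit in the 5th decimal for each input,
# propagated through hat beta1 + hat beta2 * tilde delta1, plus the error in tilde beta1 itself.
half_unit = 0.5e-5
bound = half_unit * (1 + delta1_tilde + beta2_hat) + half_unit

print(round(implied, 5), gap, bound)
```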
