montana state university | montana state university - pa nj ...1. there are different ways to...
Post on 18-Sep-2020
1. (25 points) In 1992, there was an increase in the (state) minimum wage in one U.S.
state (New Jersey) but not in a neighboring location (eastern Pennsylvania). The
study provides you with the following information,
                         PA      NJ
FTE Employment before    23.33   20.44
FTE Employment after     21.17   21.03
Where FTE is “full time equivalent” and the table reports the average FTEs per fast food
restaurant.
(a) Calculate the change in the treatment group, the change in the control group, and
finally the difference-in-differences (diff-in-diff) estimate.
(b) Since minimum wages represent a price floor, did you expect the diff-in-diff estimate to be
positive or negative?
(c) The standard error for the diff-in-diff estimate is 1.36. Test whether or not the coefficient is
statistically significant, given that there are 410 observations.
(d) If you believed that the benefit from small minimum wage increases outweighed
the cost in terms of employment loss, would finding that this coefficient was not
statistically significant discourage you?
(e) Do you have any concerns about forming a conclusion about the effect of the
minimum wage law using this approach? (Stronger answers will make specific
reference to the information presented.)
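As a rough sketch of the arithmetic behind parts (a) and (c), the table values can be plugged in directly. Treating New Jersey as the treatment group follows the question; the 1.96 critical value is an assumption (a two-sided 5% test using the normal approximation, which seems reasonable with 410 observations).

```python
# Average FTE employment per restaurant, copied from the table in question 1.
fte = {
    "PA": {"before": 23.33, "after": 21.17},  # control: no minimum wage change
    "NJ": {"before": 20.44, "after": 21.03},  # treatment: minimum wage increased
}

# Before/after change within each group.
change_treatment = fte["NJ"]["after"] - fte["NJ"]["before"]
change_control = fte["PA"]["after"] - fte["PA"]["before"]

# Difference-in-differences: treatment change net of the control change.
diff_in_diff = change_treatment - change_control

se = 1.36              # standard error given in part (c)
t_stat = diff_in_diff / se
critical_5pct = 1.96   # assumption: normal approximation, two-sided 5% test

print(f"change in NJ (treatment): {change_treatment:.2f}")
print(f"change in PA (control):   {change_control:.2f}")
print(f"diff-in-diff:             {diff_in_diff:.2f}")
print(f"t statistic:              {t_stat:.3f}")
print(f"exceeds {critical_5pct} in absolute value: {abs(t_stat) > critical_5pct}")
```

With these numbers the statistic lands very close to the 5% cutoff, so the conclusion is sensitive to the significance level chosen.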
2. (20 points) You have estimated the effect of X on Y using a number of different
models:
Parameter Estimates and Standard Errors (standard errors in parentheses)
Variable   Pooled     Between    First Difference   Fixed      Random
Xit        0.3201     0.1201     0.0543             0.0271     0.0735
           (0.0842)   (0.0542)   (0.0321)           (0.0242)   (0.0223)
(a) What is the Hausman test statistic for the null hypothesis that ui and Xit are
uncorrelated?
(b) Can you reject the null?
(c) What does that mean about which estimator you would prefer to use?
(d) Explain the intuition for this test. [Why does the answer to (b) imply the
estimator you provided in (c)?]
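For a single coefficient, one common form of the Hausman statistic is the squared gap between the fixed and random effects estimates divided by the difference of their estimated variances. The sketch below applies that form to the table above; the chi-squared(1) critical value of 3.841 at the 5% level is a standard table value, and using the one-coefficient form here is an assumption about what the question intends.

```python
# Fixed effects and random effects estimates (with standard errors) from the table.
b_fe, se_fe = 0.0271, 0.0242
b_re, se_re = 0.0735, 0.0223

# Under the null, Var(b_FE - b_RE) = Var(b_FE) - Var(b_RE),
# because random effects is the efficient estimator when the null holds.
var_diff = se_fe**2 - se_re**2

# One-coefficient Hausman statistic, chi-squared with 1 degree of freedom under H0.
hausman = (b_fe - b_re) ** 2 / var_diff

chi2_1_crit_5pct = 3.841  # 5% critical value of chi-squared(1)

print(f"Hausman statistic: {hausman:.2f}")
print(f"exceeds the 5% critical value: {hausman > chi2_1_crit_5pct}")
```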
3. (20 points) A classmate is interested in estimating the variance of the error term in the
following equation
yi =β0 + β1xi + ui
where i denotes entities, y is the dependent variable, and x is an explanatory variable
for each entity and z is an instrument.
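Suppose that she uses the following estimator for $\sigma_u^2$ from the second-stage
regression of TSLS:
$$\hat{\sigma}_u^2 = \frac{1}{n-2} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0^{TSLS} - \hat{\beta}_1^{TSLS} \hat{x}_i \right)^2$$
where $\hat{x}_i$ is the fitted value from the first-stage regression. Is this a consistent
estimator for $\sigma_u^2$?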
4. Evans and Schwab (1995) studied the effect of attending Catholic high school on the
probability of attending college. Let college be a binary variable equal to 1 if a
student attends college and zero otherwise. Let CathHS be a binary variable equal to
1 if a student attends a Catholic high school and zero otherwise. A linear probability
model is
collegei = β0 + β1CathHSi + ui
(i) (5 pts) Interpret the magnitudes of the OLS coefficient estimates of β0 and
β1 (I am looking for more substantive answers than “β0 is the constant.”)
(ii) (5 pts) Why might CathHS be correlated with u?
(iii) (10 pts) Let CathRel be a binary variable equal to 1 if a student is
Catholic. How would you interpret the 2SLS estimate of β1, using this
variable as an IV for CathHS? Use the definition of the Wald estimator in
your answer.
(iv) (5 pts) Do you think CathRel is a convincing instrument for CathHS?
Explain.
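With a binary instrument, the Wald estimator referred to in part (iii) is the difference in mean outcomes across instrument groups divided by the difference in mean treatment rates across the same groups. The sketch below only illustrates that definition: the eight observations are made-up toy data, not the Evans and Schwab sample, and the function name `wald_estimate` is hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def wald_estimate(y, d, z):
    """Wald estimator for outcome y, treatment d, and binary instrument z."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    d1 = [di for di, zi in zip(d, z) if zi == 1]
    d0 = [di for di, zi in zip(d, z) if zi == 0]
    reduced_form = mean(y1) - mean(y0)  # effect of the instrument on the outcome
    first_stage = mean(d1) - mean(d0)   # effect of the instrument on the treatment
    return reduced_form / first_stage


# Made-up toy data for illustration only (NOT from the study).
cath_rel = [1, 1, 1, 1, 0, 0, 0, 0]  # instrument: student is Catholic
cath_hs = [1, 1, 1, 0, 1, 0, 0, 0]   # treatment: attends Catholic high school
college = [1, 1, 0, 0, 1, 0, 0, 0]   # outcome: attends college

print(wald_estimate(college, cath_hs, cath_rel))
```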
5. You would like to study the impact of price on demand. By luck you have access to
sales and prices for fish sales for 97 days. Since fish prices change often this seems
like a good market to look at. Given the data at hand you write down the following
model:
qi = β0 + β1 pi + β2 Moni + β3 Tuesi + β4 Wedi + β5 Thursi + β6 ti + ui,   i = 1, ..., 97
where q is the quantity (in pounds) purchased on a particular day
p is the price on a particular day (per pound)
t is a trend variable (1 for the 1st day, 2 for the 2nd day, etc.)
Mon =1 if the day is Monday
Tues =1 if the day is Tuesday
Wed =1 if the day is Wednesday
Thurs =1 if the day is Thursday
The omitted variable is Friday; no fish are sold on the weekend (the market is closed).
OLS Estimates: Dependent variable is q
Number of observations = 97    R2 = 0.23
            Coefficient   Standard Error
p           -2484.70      742.05
Mon         -1110.75      776.06
Tues        -2326.37      763.37
Wed         -1981.41      753.49
Thurs       -39.46        754.01
t           -4.24         9.07
Constant    7519.11       1030.33
(a) (5 pts) Based on the OLS results above, is price an important determinant of quantity
demanded? (Comment on both the magnitude and statistical significance.)
(b) (5 pts) Do you have any concerns about the OLS estimate of β1?
(c) (10 pts) Weather conditions can be assumed to affect supply while having a
negligible effect on demand. Average wind speed (wind) and average wave height
(wave) over the past two days are proposed as instrumental variable for price in the
demand equation. What needs to be true for these variables to be valid instruments?
(d) (10 pts) Given the attached regression results reported below, discuss the validity of
wind and wave. You may need to use the provided F or chi-squared tables. (Note
that not all of the information provided may be relevant.)
(e) (5 pts) Given what you found in (d), explain what these results imply about the TSLS
estimates.
OLS Model
pi = π 0 + π 1 windi + π 2 wavei+ π 3Moni + π 4 Tuesi + π 5 Wedi + π 6Thursi + π 7 ti +vi
Dependent Variable is p    Number of observations = 97    R2 = 0.33
            Coefficient   Standard Error
wind        -0.005        0.009
wave        0.105         0.021
Mon         -0.058        0.097
Tues        -0.009        0.094
Wed         0.038         0.093
Thurs       0.0877        0.093
t           -0.002        0.001
Constant    0.450         0.147
H0: π1 = 0                   H1: π1 ≠ 0           Test Statistic: F(1, 89) = 0.31
H0: π2 = 0                   H1: π2 ≠ 0           Test Statistic: F(1, 89) = 25.76
H0: π1 = π2 = 0              H1: H0 is not true   Test Statistic: F(2, 89) = 15.86
H0: π3 = π4 = π5 = π6 = 0    H1: H0 is not true   Test Statistic: F(4, 89) = 0.63
TSLS Model
qi = β0 + β1 p̂i + β2 Moni + β3 Tuesi + β4 Wedi + β5 Thursi + β6 ti + ui
pi = π0 + π1 windi + π2 wavei + π3 Moni + π4 Tuesi + π5 Wedi + π6 Thursi + π7 ti + vi
ûi(TSLS) = θ0 + θ1 windi + θ2 wavei + θ3 Moni + θ4 Tuesi + θ5 Wedi + θ6 Thursi + θ7 ti + ei
Dependent Variable = q    Instruments = wind and wave
Number of observations = 97
            Coefficient   Standard Error
p           -3400.917     1459.99
Mon         -1150.396     784.484
Tues        -2349.607     770.464
Wed         -1987.691     759.896
Thurs       0.203         762.305
t           -7.565        10.211
Constant    8463.304      1657.892
H0: β1 = 0                   H1: β1 ≠ 0           Test Statistic: F(1, 90) = 5.43
H0: β2 = β3 = β4 = β5 = 0    H1: H0 is not true   Test Statistic: F(4, 90) = 4.02
Dependent Variable = ûi(TSLS) (the TSLS residual)
Number of observations = 97    R2 = 0.02
            Coefficient   Standard Error
wind        -94.234       76.397
wave        91.137        169.682
Mon         206.940       797.556
Tues        21.025        767.865
Wed         21.498        760.581
Thurs       0.629         760.174
t           -0.411        9.102
Constant    637.768       1208.353
H0: θ1 = 0         H1: θ1 ≠ 0           Test Statistic: F(1, 94) = 1.52
H0: θ2 = 0         H1: θ2 ≠ 0           Test Statistic: F(1, 94) = 0.29
H0: θ1 = θ2 = 0    H1: H0 is not true   Test Statistic: F(2, 94) = 0.77
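The instrument checks asked for in question 5(d) can be sketched from the reported F statistics. Two things in this sketch are assumptions rather than part of the exam: the first-stage benchmark of 10 (a common rule of thumb for weak instruments) and the overidentification statistic J = mF, which relies on homoskedastic errors and is compared with a chi-squared distribution with m - k degrees of freedom.

```python
# Relevance: joint F statistic on wind and wave in the first-stage regression for p.
first_stage_f = 15.86   # F(2, 89) for H0: pi1 = pi2 = 0
rule_of_thumb = 10.0    # assumption: common benchmark for weak instruments
instruments_relevant = first_stage_f > rule_of_thumb

# Exogeneity: overidentifying restrictions test from the TSLS residual regression.
m = 2                   # number of instruments (wind, wave)
k = 1                   # number of endogenous regressors (p)
f_overid = 0.77         # F(2, 94) for H0: theta1 = theta2 = 0
j_stat = m * f_overid   # J = mF, assuming homoskedastic errors
overid_df = m - k       # degrees of freedom of the chi-squared reference distribution
chi2_1_crit_5pct = 3.841
reject_exogeneity = j_stat > chi2_1_crit_5pct

print(f"first-stage F = {first_stage_f}, above {rule_of_thumb}: {instruments_relevant}")
print(f"J = {j_stat:.2f} with {overid_df} df, reject at 5%: {reject_exogeneity}")
```

Note that the J test can only be informative when there are more instruments than endogenous regressors, which is the case here.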
1. There are different ways to combine features of the Breusch-Pagan and White tests
for heteroskedasticity. One possibility that we did not cover is to run the regression:
ûi² on x1i, x2i, x3i, ..., xki, and ŷi²
where the ûi² are the squares of the OLS residuals (estimated under homoskedasticity) and
the ŷi² are the squares of the predicted values from the OLS regression.
a) (6pts) How would you construct the LM statistic for this test? What are its
degrees of freedom?
b) (5pts) How will the R2 from this regression compare with the R2 from the usual
BP test and the White test?
c) (5pts) Does this imply that the new test statistic always delivers a smaller/larger
p-value than the BP or White tests? Explain.
d) (5pts) Suppose someone suggested also adding ŷi to this regression. What do
you think of this idea?
2. For one semester, you collect data on a random sample of college juniors and seniors
for each class taken. In other words, you have multiple observations for each student,
with a different observation for each class the student takes. This information
includes measures of a standardized final exam score, percentage of lectures attended,
a dummy variable indicating whether the class is within the student’s major,
cumulative GPA prior to the semester, and SAT score.
a) (6pts) Write a model for final exam performance in terms of attendance and other
characteristics. Use subscripts for students and classes.
b) (6pts) Suppose that your main interest is the effect of attendance on final exam
performance. If you pool all of the data and use OLS, what are you assuming?
Explain this in the context of the other variables in your model.
c) (10 pts) What other models (other than OLS) might you consider using?
Carefully describe the circumstances under which you would use those models.
3. Yet another attempt to identify the returns to education! Based on 1991 data from the
NLSY, you run a regression of log wages (lwage) on the highest year of education
completed (educ):
. reg lwage educ
Source | SS df MS Number of obs = 1230
-------------+------------------------------ F( 1, 1228) = 236.62
Model | 69.9901106 1 69.9901106 Prob > F = 0.0000
Residual | 363.229151 1228 .295789211 R-squared = 0.1616
-------------+------------------------------ Adj R-squared = 0.1609
Total | 433.219262 1229 .352497365 Root MSE = .54387
------------------------------------------------------------------------------
lwage | Coef. Std. Err.
-------------+----------------------------------------------------------------
educ | .1013613 .0065894
_cons | 1.092319 .0872969
------------------------------------------------------------------------------
a) (5pts) Interpret the coefficient on educ. Comment on its magnitude and statistical
significance.
b) (5pts) Of course, having taken ECON562, you know that this return may not
represent the causal effect of education. Why not?
c) (6pts) You are considering using the change in college tuition for an individual
from age 17 to age 18, ctuit, as a potential instrument. This variable is measured
in thousands of dollars.
What conditions should ctuit satisfy in order to be a valid instrument?
d) You use ctuit as an instrument to predict education, and use the predicted
values of education (educhat1) to estimate the return to education. Your second
stage results are the following:
. reg lwage educhat1
Source | SS df MS Number of obs = 1230
-------------+------------------------------ F( 1, 1228) = 3.72
Model | 1.30777496 1 1.30777496 Prob > F = 0.0541
Residual | 431.911487 1228 .351719452 R-squared = 0.0030
-------------+------------------------------ Adj R-squared = 0.0022
Total | 433.219262 1229 .352497365 Root MSE = .59306
------------------------------------------------------------------------------
lwage | Coef. Std. Err.
-------------+----------------------------------------------------------------
educhat1 | .8202563 .4253841
_cons | -8.280202 5.545928
------------------------------------------------------------------------------
Assume for this part of the question that ctuit is a valid instrument.
e) (3pts) Interpret the coefficient on educ. What is the 95% confidence interval for
the return based on these results?
f) (8pts) What do the results thus far indicate about whether education is exogenous?
(Your answer should include a test statistic and your interpretation of that test
statistic.)
g) (5pts) After puzzling over this, you remember that you were supposed to check
your first stage results. Those results are Education Regression #1 on the
attached sheet. Based on these, comment on the validity of ctuit as an instrument.
h) (6pts) Does your answer in (g) affect your conclusions in (f) about the exogeneity
of education? Explain.
i) (7pts) After thinking about this some more, you decide other control variables are
warranted. You include a quadratic in years of experience (exper), dummy
variables for region of the country where a person resides, and a dummy variable
for whether or not the person lives in an urban area. You aren’t exactly sure what
to do here, so you estimate a bunch of regressions and make several predictions
for education (See the attached sheet—Education regressions #1 and #2 and Wage
Regressions #1- #4).
Which of the wage regressions gives you the best estimate of the return to
education and why? What is that estimated return?
j) (5pts) Comment on the validity of ctuit as an instrument based on the education
regression you choose.
k) (7pts) Explain why your conclusions about (1) the validity of the instrument and
(2) the return to education changed in the way that they did after adding control
variables. What exactly changed that affected your conclusions (coefficients?
Standard errors?) Can you propose an explanation for why those parameters
changed in the way that they did?
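The interval in part (e) and one candidate test statistic for part (f) can be sketched from the two regressions of lwage shown above. This sketch takes the reported second-stage standard error at face value, as the question does; standard errors from a manually run second stage are generally not the correct 2SLS standard errors. The 1.96 critical value and the Hausman-style contrast are assumptions about the intended approach.

```python
import math

b_ols, se_ols = 0.1013613, 0.0065894  # from: reg lwage educ
b_iv, se_iv = 0.8202563, 0.4253841    # from: reg lwage educhat1 (manual second stage)

z_crit = 1.96  # assumption: normal approximation for a 95% interval

# 95% confidence interval for the return to education from the second stage.
ci_low = b_iv - z_crit * se_iv
ci_high = b_iv + z_crit * se_iv

# Hausman-style contrast: OLS is efficient if educ is exogenous,
# so Var(b_IV - b_OLS) = Var(b_IV) - Var(b_OLS) under the null.
hausman_t = (b_iv - b_ols) / math.sqrt(se_iv**2 - se_ols**2)

print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f})")
print(f"Hausman-style t statistic: {hausman_t:.3f}")
print(f"exceeds {z_crit} in absolute value: {abs(hausman_t) > z_crit}")
```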
Education Regression #1 and Prediction educhat1
. reg educ ctuit
Source | SS df MS Number of obs = 1230
-------------+------------------------------ F( 1, 1228) = 0.35
Model | 1.94372373 1 1.94372373 Prob > F = 0.5540
Residual | 6810.33595 1228 5.54587618 R-squared = 0.0003
-------------+------------------------------ Adj R-squared = -0.0005
Total | 6812.27967 1229 5.54294522 Root MSE = 2.355
------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ctuit | -.0494466 .0835227
_cons | 13.03836 .0671676
------------------------------------------------------------------------------
. predict educhat1, xb
. test ctuit
( 1) ctuit = 0
F( 1, 1228) = 0.35
Prob > F = 0.5540
Education Regression #2 and Prediction educhat2
. reg educ ctuit exper expersq ne nc west urban
Source | SS df MS Number of obs = 1230
-------------+------------------------------ F( 7, 1222) = 163.72
Model | 3296.87961 7 470.982801 Prob > F = 0.0000
Residual | 3515.40006 1222 2.87675946 R-squared = 0.4840
-------------+------------------------------ Adj R-squared = 0.4810
Total | 6812.27967 1229 5.54294522 Root MSE = 1.6961
------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ctuit | -.1952777 .0604338 -3.23 0.001 -.3138431 -.0767122
exper | -.8877036 .0816872 -10.87 0.000 -1.047966 -.727441
expersq | .0171023 .0037967 4.50 0.000 .0096535 .024551
ne | .1141177 .1458075 0.78 0.434 -.1719431 .4001784
nc | -.0081578 .1269651 -0.06 0.949 -.2572517 .240936
west | .2501201 .1558156 1.61 0.109 -.0555757 .555816
urban | -.2010627 .1320998 -1.52 0.128 -.4602303 .0581049
_cons | 20.53855 .4470007 45.95 0.000 19.66158 21.41552
------------------------------------------------------------------------------
. test ctuit
( 1) ctuit = 0
F( 1, 1222) = 10.44
Prob > F = 0.0013
. predict educhat2, xb
Wage Regression #1:
. reg lwage educ exper expersq ne nc west urban
Source | SS df MS Number of obs = 1230
-------------+------------------------------ F( 7, 1222) = 46.79
Model | 91.5734225 7 13.0819175 Prob > F = 0.0000
Residual | 341.64584 1222 .279579247 R-squared = 0.2114
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1348698 .0088801 15.19 0.000 .1174479 .1522917
exper | .111761 .026586 4.20 0.000 .0596018 .1639202
expersq | -.0032241 .0011917 -2.71 0.007 -.0055621 -.0008861
ne | .1278787 .045432 2.81 0.005 .0387454 .217012
nc | -.0120877 .0395779 -0.31 0.760 -.0897358 .0655605
west | -.0023592 .0486249 -0.05 0.961 -.0977568 .0930384
urban | .2230756 .04121 5.41 0.000 .1422253 .3039259
_cons | -.3462225 .2287505 -1.51 0.130 -.7950098 .1025648
Wage Regression #2:
. reg lwage ctuit exper expersq ne nc west urban
Source | SS df MS Number of obs = 1230
-------------+------------------------------ F( 7, 1222) = 12.51
Model | 28.9618019 7 4.13740028 Prob > F = 0.0000
Residual | 404.25746 1222 .330816252 R-squared = 0.0669
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ctuit | -.0488464 .0204938 -2.38 0.017 -.0890533 -.0086396
exper | -.0095934 .027701 -0.35 0.729 -.0639402 .0447534
expersq | -.0008583 .0012875 -0.67 0.505 -.0033842 .0016676
ne | .1410388 .0494449 2.85 0.004 .0440325 .2380451
nc | -.0137673 .0430552 -0.32 0.749 -.0982377 .0707031
west | .0315301 .0528388 0.60 0.551 -.0721347 .1351948
urban | .1972751 .0447965 4.40 0.000 .1093886 .2851616
_cons | 2.433934 .1515828 16.06 0.000 2.136543 2.731325
Wage Regression #3:
. reg lwage educhat1 exper expersq ne nc west urban
Source | SS df MS Number of obs = 1230
-------------+------------------------------ F( 7, 1222) = 12.51
Model | 28.9617815 7 4.13739736 Prob > F = 0.0000
Residual | 404.257481 1222 .330816269 R-squared = 0.0669
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educhat1 | .9878573 .4144624 2.38 0.017 .1747204 1.800994
exper | -.0095934 .027701 -0.35 0.729 -.0639402 .0447534
expersq | -.0008583 .0012875 -0.67 0.505 -.0033842 .0016676
ne | .1410389 .0494449 2.85 0.004 .0440326 .2380452
nc | -.0137673 .0430553 -0.32 0.749 -.0982377 .0707031
west | .03153 .0528388 0.60 0.551 -.0721347 .1351948
urban | .1972751 .0447965 4.40 0.000 .1093885 .2851616
_cons | -10.4461 5.396812 -1.94 0.053 -21.03415 .1419391
Wage Regression #4:
. reg lwage educhat2 exper expersq ne nc west urban
Source | SS df MS Number of obs = 1230
-------------+------------------------------ F( 7, 1222) = 12.51
Model | 28.9618029 7 4.13740041 Prob > F = 0.0000
Residual | 404.257459 1222 .330816251 R-squared = 0.0669
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educhat2 | .2501384 .1049467 2.38 0.017 .0442427 .4560342
exper | .2124554 .0957597 2.22 0.027 .0245838 .4003271
expersq | -.0051362 .0021646 -2.37 0.018 -.009383 -.0008895
ne | .1124936 .0513506 2.19 0.029 .0117486 .2132386
nc | -.0117267 .0430533 -0.27 0.785 -.0961932 .0727398
west | -.0310346 .0589366 -0.53 0.599 -.1466627 .0845935
urban | .2475686 .0500256 4.95 0.000 .1494229 .3457143
_cons | -2.703547 2.15156 -1.26 0.209 -6.924708 1.517615
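One consistency check on the attached output: with a single instrument, the IV estimate equals the reduced-form coefficient on the instrument divided by its first-stage coefficient. The sketch below uses the ctuit coefficients from the wage regression on ctuit with controls and from the education regression with controls, and compares the ratio with the reported coefficient on educhat2. It also lines up the two first-stage F statistics against a benchmark of 10, which is an assumed rule of thumb and not part of the exam.

```python
# Coefficients on ctuit, both from regressions that include the same controls.
first_stage_ctuit = -0.1952777    # reg educ ctuit exper expersq ne nc west urban
reduced_form_ctuit = -0.0488464   # reg lwage ctuit exper expersq ne nc west urban

# Indirect least squares: reduced form divided by first stage.
iv_return = reduced_form_ctuit / first_stage_ctuit

reported_educhat2 = 0.2501384     # coefficient on educhat2 in Wage Regression #4

# First-stage F statistics on ctuit, without and with the control variables.
first_stage_f = {"without controls": 0.35, "with controls": 10.44}
rule_of_thumb = 10.0              # assumption: common weak-instrument benchmark

print(f"reduced form / first stage = {iv_return:.7f}")
print(f"reported educhat2 coefficient = {reported_educhat2}")
for label, f_value in first_stage_f.items():
    print(f"first-stage F {label}: {f_value} (above {rule_of_thumb}: {f_value > rule_of_thumb})")
```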
1. For a large university, you are asked to estimate the demand for tickets to
women’s basketball games. You can collect time series data over 10 seasons, for
a total of about 150 observations (assuming about 15 games a season). One
possible model is
lATTENDt= β0+ β1lPRICEt+ β2 WINPERCt +β3 RIVALt +β4WEEKENDt + ut
where lATTEND is the natural log of attendance at a game, lPRICE is the natural log of
the real price of admissions, WINPERC is the team’s winning percentage, RIVAL is a
dummy variable indicating whether the game is with the major rival, and WEEKEND is a
dummy variable for whether or not the game is on a weekend.
a. Do you think a time trend should be included in the equation? Why or why
not?
b. The supply of tickets is fixed by stadium capacity which has not changed over
this time period. Does this mean that price is necessarily exogenous
(uncorrelated with ut) in the equation? Explain.
c. Suppose that the nominal price of admission changes slowly—say at the
beginning of each season. The athletic office chooses the price based partly
on last season’s average attendance, as well as the last season’s team success.
If lPRICEt is in fact endogenous, under what assumptions is last season’s
winning percentage a valid instrument for lPRICEt?
d. Suppose you are worried that some of the series, particularly lATTEND and
lPRICE might have unit roots. Why would this be a concern? How might
you change the estimated equation if these series did in fact have unit roots?
e. If some games are sold out, what problems does this cause for estimating the
demand function?
2. One type of partial adjustment model is
y*t = β0 + β1xt + et
yt - yt-1 = λ(y*t - yt-1) + ut
where y*t is the desired or optimal level of y and yt is the actual observed level. For
example, y*t is the desired growth in firm inventories and xt is growth in firm sales. The
parameter β1 measures the effect of xt on y*t. The second equation describes how the
actual y depends on the relationship between the desired y in time t and the actual y in
time t-1. The parameter λ measures the speed of adjustment and satisfies 0 < λ < 1.
a. Show that we can rewrite the model as a regression of yt on xt and yt-1. Show
how the coefficients in this model relate to the parameters in the two equation
model above, and show how the error term relates to the error terms above.
b. If E(et|xt, yt-1, xt-1, yt-2, . . . .) = E(ut|xt, yt-1, xt-1, yt-2, . . . .) = 0 and all series are
weakly dependent, if we estimate the model you specified in (a) by OLS will
we get consistent estimates? Explain why or why not.
c. Will the errors in your model in (a) be serially correlated? Explain why or
why not.
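One way to see the algebra behind part (a) is to substitute the first equation of the partial adjustment model into the second. The symbol names β0 and β1 (coefficients in the desired-level equation) and λ (speed of adjustment) are a notational assumption of this sketch.

```latex
\begin{aligned}
y_t - y_{t-1} &= \lambda\,(\beta_0 + \beta_1 x_t + e_t - y_{t-1}) + u_t \\
y_t &= \lambda\beta_0 + \lambda\beta_1 x_t + (1-\lambda)\,y_{t-1} + (\lambda e_t + u_t)
\end{aligned}
```

In this form the intercept is λβ0, the coefficient on xt is λβ1, the coefficient on yt-1 is 1-λ, and the composite error is λet + ut.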
3. Let hy6t denote the three-month holding yield (in percent) from buying a 6-month
T-bill at time (t-1) and selling it at time t (three months later) as a three-month
T-bill. Let hy3t-1 be the three-month holding yield (in percent) from buying a 3-month
T-bill at time (t-1). At time (t-1), then, hy3t-1 is known, whereas hy6t is
unknown because the price in time t of three-month T-bills is unknown at time t-1.
The expectations hypothesis says that, of course, these two different three month
investments should be the same, on average. In other words,
E(hy6t|all information up to time t-1) = hy3t-1
Suppose you were to estimate the model
hy6t = β0 + β1 hy3t-1 + ut
a. 4 pts How would you test the expectations hypothesis using this model? What is
the null hypothesis?
b. 4pts Suppose your estimates of that equation produced the following
^hy6t = -.058 + 1.104 hy3t-1
        (.070)  (.039)
N = 123, R2 = .866
Do you reject the null hypothesis in (a) at the 5% significance level?
c. 4pts Another implication of the expectations hypothesis is that no other variables
dated as t-1 or earlier should explain hy6t after controlling for hy3t-1. Suppose
you were to test this implication by estimating the following equation, which
includes the lagged spread between the 6 and 3 month T-bill rates:
^hy6t = -.123+ 1.053 hy3t-1 + .480(r6t-1 – r3t-1)
(.067) (.039) (.109)
According to this, is the lagged spread term significant at the 5% level? What
do the results imply about whether you should invest in 6-month or 3-month
T-bills if at time t-1, r6 is greater than r3?
d. 3pts Conduct the test you specified in (a) using the results in (c). Do you
conclude anything different from your results in (b)?
e. 6pts The sample correlation between hy3t and hy3t-1 is .914. Does this raise any
concerns with your previous analysis? Be specific.
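The t statistics needed in parts (b) through (d) follow directly from the reported estimates and standard errors. This sketch assumes the slope hypothesis being tested is that the coefficient on hy3t-1 equals one, and it uses a two-sided 5% critical value of about 1.98 (roughly 120 degrees of freedom); both are assumptions about the intended test, not statements from the exam.

```python
def t_stat(estimate, se, null_value=0.0):
    """t statistic for H0: coefficient equals null_value."""
    return (estimate - null_value) / se


crit_5pct = 1.98  # assumption: two-sided 5% critical value, about 120 df

# Part (b): slope on hy3(t-1) in the simple regression, tested against 1.
t_slope_simple = t_stat(1.104, 0.039, null_value=1.0)

# Part (c): lagged spread coefficient, tested against 0.
t_spread = t_stat(0.480, 0.109)

# Part (d): slope on hy3(t-1) once the lagged spread is included, tested against 1.
t_slope_with_spread = t_stat(1.053, 0.039, null_value=1.0)

for name, value in [
    ("slope = 1, simple model", t_slope_simple),
    ("spread = 0", t_spread),
    ("slope = 1, with spread", t_slope_with_spread),
]:
    print(f"{name}: t = {value:.2f}, reject at 5%: {abs(value) > crit_5pct}")
```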
4. For the US economy, let gprice be the monthly growth in prices and gwage be the
monthly growth in wages (gprice = ∆log(price); gwage=∆log(wage))
Using monthly data, suppose we estimate a distributed lag model:
Dependent var: gprice (standard errors in parentheses)
Variable     Coefficient   Std. Error
Constant     -.00093       (.00057)
gwage        .119          (.052)
gwaget-1     .097          (.039)
gwaget-2     .040          (.039)
gwaget-3     .038          (.039)
gwaget-4     .081          (.039)
gwaget-5     .107          (.039)
gwaget-6     .095          (.039)
gwaget-7     .104          (.039)
gwaget-8     .103          (.039)
gwaget-9     .159          (.039)
gwaget-10    .110          (.039)
gwaget-11    .103          (.039)
gwaget-12    .016          (.052)
N = 273, R2 = .317, Adj. R2 = .283
a. 4 pts What is the long run propensity? Is it much different from one? Explain what
the LRP tells us.
b. 6 pts What regression would you run to obtain the standard error of the LRP directly?
c. 6pts How would you test the joint significance of 6 more lags of gwage? Be specific
about the distribution of your test statistic (i.e., if the distribution of the test statistic
depends on degrees of freedom, be sure to state these).
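In a distributed lag model the long run propensity is the sum of the coefficients on the current and lagged values of the explanatory variable. A minimal sketch of that sum for part (a), using the thirteen gwage coefficients reported above (lags 0 through 12):

```python
# Coefficients on gwage at lags 0 through 12, copied from the table.
gwage_coefs = [
    0.119, 0.097, 0.040, 0.038, 0.081, 0.107, 0.095,
    0.104, 0.103, 0.159, 0.110, 0.103, 0.016,
]

# Long run propensity: total effect on gprice of a permanent one-unit rise in gwage.
lrp = sum(gwage_coefs)

print(f"number of coefficients: {len(gwage_coefs)}")
print(f"long run propensity:    {lrp:.3f}")
```

The standard error of this sum cannot be read off the table because the coefficient estimates are correlated, which is what part (b) is getting at.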
5. The Report of the Presidential Commission on the Space Shuttle Challenger
Accident in 1986 shows a plot of the calculated joint temperature in Fahrenheit
and the number of O-rings that had some thermal distress. You collect the data for
the seven flights for which thermal distress was identified before the fatal flight
and produce the accompanying plot.
[Figure: scatter plot of the number of incidents (vertical axis, 0 to 3.5) against temperature (horizontal axis, 50 to 80).]
a. 5 pts Do you see any problems other than sample size in fitting a linear model to
these data?
b. 6pts You look at all successful launches before Challenger, even those with no
incidents. You simplify the problem by specifying a binary variable, Ofail, which
equals 1 if there was some O-ring failure and is 0 otherwise. You fit a linear
probability model:
^OFail = 2.858 – 0.037×Temperature
(0.496) (0.007)
The numbers in parentheses are heteroskedasticity-robust standard errors.
Interpret the equation. What is your prediction for some O-ring thermal distress when the
temperature is 31°F, the temperature on January 28, 1986? Above which temperature do
you predict values of less than zero? Below which temperature do you predict values of
greater than one?
c. 5pts Why do you think that heteroskedasticity-robust standard errors were used?
d. 5pts To fix the problem encountered in (b), you re-estimate the relationship
using a logit regression:
Pr(OFail=1|Temperature)=G (15.297 – 0.236× Temperature)
(7.329) (0.107)
Recall that the logistic function is G(z) = e^z / (1 + e^z).
Calculate the effect of a decrease in temperature from 80° to 70°, and from 60° to 50°.
Why is the change in probability not constant? How does this compare to the linear
probability model?
e. 5pts You want to see how sensitive the results are to using the logit, rather than
the probit estimation method. The probit regression is as follows:
Pr(OFail=1|Temperature)=Φ (8.900 – 0.137× Temperature)
(3.983) (0.058)
Why is the slope coefficient in the probit so different from the logit coefficient?
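The numerical parts of (b), (d) and (e) can be sketched with the reported coefficients. The function names below are hypothetical, and the logistic function is written as 1 / (1 + e^(-z)), which is algebraically the same as e^z / (1 + e^z).

```python
import math


def lpm(temp):
    """Fitted value of the linear probability model at a given temperature."""
    return 2.858 - 0.037 * temp


def logistic(z):
    """Logistic cdf, equal to e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_prob(temp):
    """Fitted logit probability of some O-ring failure."""
    return logistic(15.297 - 0.236 * temp)


# Part (b): LPM prediction at 31 degrees and the temperatures where it hits 0 and 1.
lpm_at_31 = lpm(31)
temp_where_zero = 2.858 / 0.037
temp_where_one = (2.858 - 1.0) / 0.037

# Part (d): change in the logit probability for two 10-degree temperature drops.
change_80_to_70 = logit_prob(70) - logit_prob(80)
change_60_to_50 = logit_prob(50) - logit_prob(60)
lpm_change_10_degrees = lpm(70) - lpm(80)  # constant for any 10-degree drop

# Part (e): ratio of the logit slope to the probit slope.
slope_ratio = 0.236 / 0.137

print(f"LPM at 31 degrees: {lpm_at_31:.3f}")
print(f"LPM equals 0 at {temp_where_zero:.1f} degrees and 1 at {temp_where_one:.1f} degrees")
print(f"logit change, 80 to 70: {change_80_to_70:.3f}")
print(f"logit change, 60 to 50: {change_60_to_50:.3f}")
print(f"LPM change per 10-degree drop: {lpm_change_10_degrees:.3f}")
print(f"logit slope / probit slope: {slope_ratio:.2f}")
```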
6. Suppose an author has data on outcomes (Y) for two periods, 1 and 2, for a
number of cross-sectional units (e.g., firms). Suppose also that a subgroup of
firms were hit with an intervention (e.g., minimum wage hike) sometime between
period 1 and 2.
a. 6 pts Write an equation for the difference-in-difference estimate (e.g., one that
controls for group and time effects) that provides a consistent estimate of the
intervention by using the first differences in outcomes between two periods.
That is, the dependent variable would be ΔYi = Y2i-Y1i.
b. 5 pts What is the advantage of the difference-in-difference approach in this
context?
c. 5pts Are there potential problems with inference that the difference-in-difference
approach does not resolve?
7. Consider the bivariate regression model
yi = α + βx*i + ui
We do not observe the true x*i, but instead estimate
yi = α + βxi + vi
where the covariate of interest xi is subject to classical measurement error.
In particular, assume that the measured value xi is related to the true (unobserved) value
x*i according to the linear equation: xi = x*i + ei
where ei is an iid random error, with E[ei] = 0, var(ei) = σ²e, and Cov(ei, x*i) = 0. (Also
assume X and u are uncorrelated in expectation.) We demonstrated in class that with
measurement error, the OLS estimator of β is biased and inconsistent.
a. 5pts Suppose there is an instrument Zi where Cov(Z, v) = 0, Cov(Z, e) = 0 but
Cov(Z, X) ≠ 0. In the presence of measurement error in X, is the IV estimate of β
consistent? Explain.
b. 5pts Now consider this in a time series context:
yt = α + βx*t + ut;
xt = x*t + et
To simplify the algebra for the rest of this question, assume that the mean of x*t is zero.
In addition to the previous assumptions, assume that ut and et are uncorrelated with all
past values of x*t and et.
Show that E(xt-1 vt) = 0, where vt is the error term in the model yt = α + βxt + vt.
c. 5pts Are xt and xt-1 likely to be correlated? Explain.
d. 6 pts What does all of this suggest as a useful strategy for consistently
estimating β?