practice questions multiple choice questions chapter...

46
Practice Questions Multiple Choice Questions Chapter 5 1) The confidence interval for a single coefficient in a multiple regression a. makes little sense because the population parameter is unknown. b. should not be computed because there are other coefficients present in the model. c. contains information from a large number of hypothesis tests. d. should only be calculated if the regression R 2 is identical to the adjusted R 2 . Answer : c 2) In the multiple regression model, the adjusted R 2 , 2 R a. cannot be negative. b. will never be greater than the regression R 2 . c. equals the square of the correlation coefficient r. d. cannot decrease when an additional explanatory variable is added. Answer : b 3) The following linear hypothesis can be tested using the F-test with the exception of a. 2 3 4 5 1 and / β β β β = = . b. 0 2 = β . c. 1 2 3 4 1 and 2 β β β β + = =− . d. β 0 = β 1 and β 1 = 0. Answer : a 4) When there are omitted variables in the regression, which are determinants of the dependent variable, then a. you cannot measure the effect of the omitted variable, but the estimator of your included variable(s) is (are) unaffected. b. this has no effect on the estimator of your included variable because the other variable is not included.

Upload: hanguyet

Post on 30-Jan-2018

274 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Practice Questions Multiple Choice Questions Chapter 5 1) The confidence interval for a single coefficient in a multiple regression

a. makes little sense because the population parameter is unknown. b. should not be computed because there are other coefficients present in the

model. c. contains information from a large number of hypothesis tests. d. should only be calculated if the regression R2 is identical to the adjusted

R2.

Answer: c 2) In the multiple regression model, the adjusted R2, 2R

a. cannot be negative. b. will never be greater than the regression R2. c. equals the square of the correlation coefficient r. d. cannot decrease when an additional explanatory variable is added.

Answer: b

3) The following linear hypothesis can be tested using the F-test with the exception

of a. 2 3 4 51 and /β β β β= = . b. 02 =β . c. 1 2 3 41 and 2β β β β+ = = − . d. β0 = β1 and β1 = 0.

Answer: a 4) When there are omitted variables in the regression, which are determinants of the

dependent variable, then

a. you cannot measure the effect of the omitted variable, but the estimator of your included variable(s) is (are) unaffected.

b. this has no effect on the estimator of your included variable because the other variable is not included.

Page 2: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

c. this will always bias the OLS estimator of the included variable. d. the OLS estimator is biased if the omitted variable is correlated with the

included variable.

Answer: d

5) Imagine you regressed earnings of individuals on a constant, a binary variable (“Male”) which takes on the value 1 for males and is 0 otherwise, and another binary variable (“Female”) which takes on the value 1 for females and is 0 otherwise. Because females typically earn less than males, you would expect

a. the coefficient for Male to have a positive sign, and for Female a negative

sign. b. both coefficients to be the same distance from the constant, one above and

the other below. c. none of the OLS estimators to exist because there is perfect

multicollinearity. d. this to yield a difference in means statistic.

Answer: c

7) When you have an omitted variable problem, the assumption that E(ui | Xi) = 0 is violated. This implies that

a. the sum of the residuals is no longer zero. b. there is another estimator called weighted least squares, which is BLUE. c. the sum of the residuals times any of the explanatory variables is no longer

zero. d. the OLS estimator is no longer consistent.

Answer: d

8) (Requires Calculus) In the multiple regression model you estimate the effect on Yi

of a unit change in one of the Xi while holding all other regressors constant. This

a. makes little sense, because in the real world all other variables change. b. corresponds to the economic principle of mutatis mutandis. c. leaves the formula for the coefficient in the single explanatory variable

case unaffected. d. corresponds to taking a partial derivative in mathematics.

Answer: d

8) You have to worry about perfect multicollinearity in the multiple regression

model because

Page 3: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

a. many economic variables are perfectly correlated. b. the OLS estimator is no longer BLUE. c. the OLS estimator cannot be computed in this situation. d. in real life, economic variables change together all the time.

Answer: c

9) In a two regressor regression model, if you exclude one of the relevant variables then

a. it is no longer reasonable to assume that the errors are homoskedastic. b. OLS is no longer unbiased, but still consistent. c. you are no longer controlling for the influence of the other variable. d. the OLS estimator no longer exists.

Answer: c

10) The intercept in the multiple regression model

a. should be excluded if one explanatory variable has negative values. b. determines the height of the regression line. c. should be excluded because the population regression function does not go

through the origin. d. is statistically significant if it is larger than 1.96.

Answer: b

11) In the multiple regression model, the t-statistic for testing that the slope is significantly different from zero is calculated

a. by dividing the estimate by its standard error. b. from the square root of the F-statistic. c. by multiplying the p-value by 1.96. d. using the adjusted R2 and the confidence interval.

Answer: a

12) In the multiple regression model, the least squares estimator is derived by

a. minimizing the sum of squared prediction mistakes. b. setting the sum of squared errors equal to zero. c. minimizing the absolute difference of the residuals. d. forcing the smallest distance between the actual and fitted values.

Answer: a

Page 4: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

13) To test joint linear hypotheses in the multiple regression model, you need to

a. compare the sums of squared residuals from the restricted and unrestricted model. b. use the heteroskedasticity-robust F-statistic. c. use several t-statistics and perform tests using the standard normal

distribution. d. compare the adjusted R2 for the model which imposes the restrictions, and

the unrestricted model.

Answer: b 14) The sample regression line estimated by OLS

a. has an intercept that is equal to zero. b. is the same as the population regression line. c. cannot have negative and positive slopes. d. is the line that minimizes the sum of squared prediction mistakes.

Answer: d

15) The OLS residuals in the multiple regression model

a. cannot be calculated because there is more than one explanatory variable. b. can be calculated by subtracting the fitted values from the actual values. c. are zero because the predicted values are another name for forecasted

values. d. are typically the same as the population regression function errors.

Answer: b

16) If you wanted to test, using a 5% significance level, whether or not a specific slope coefficient is equal to one, then you should

a. subtract 1 from the estimated coefficient, divide the difference by the standard error, and check if the resulting ratio is larger than 1.96.

b. add and subtract 1.96 from the slope and check if that interval includes 1. c. see if the slope coefficient is between 0.95 and 1.05. d. check if the adjusted R2 is close to 1.

Answer: a

17) If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal distribution you can

Page 5: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

a. safely assume that your regression results are significant. b. reject the null hypothesis. c. reject the assumption that the error terms are homoskedastic. d. conclude that most of the actual values are very close to the regression

line.

Answer: b 18) Under the least squares assumptions for the multiple regression problem (zero conditional mean for the error term, all Xi and Yi being i.i.d., all Xi and ui having finite fourth moments, no perfect multicollinearity), the OLS estimators for the slopes and intercept

a. have an exact normal distribution for n > 25. b. are BLUE. c. have a normal distribution in small samples as long as the errors are

homoskedastic. d. are unbiased and consistent.

Answer: d

19) The main advantage of using multiple regression analysis over differences in means testing is that the regression technique

a. allows you to calculate p-values for the significance of your results. b. provides you with a measure of your goodness of fit. c. gives you quantitative estimates of a unit change in X. d. assumes that the error terms are generated from a normal distribution.

Answer: c

20) In a multiple regression framework, the slope coefficient on the regressor X2i

a. takes into account the scale of the error term. b. is measured in the units of Yi divided by units of X2i. c. is usually positive. d. is larger than the coefficient on X1i.

Answer: b

Page 6: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Chapter 6 1) In nonlinear models, the expected change in the dependent variable for a change in

one of the explanatory variables is given by

a. 1 1 2( , ,..., )kY f X X X X∆ ∆= + . b. 1 1 2 2 1 2( , ,..., ) ( , ,... )k k kY f X X X X X X f X X X∆ ∆ ∆ ∆= + + + − . c. 1 1 2 1 2( , ,..., ) ( , ,... )k kY f X X X X f X X X∆ ∆= + − . d. 1 1 2 1 2( , ,..., ) ( , ,... )k kY f X X X X f X X X∆ = + − .

Answer: c

2) The interpretation of the slope coefficient in the model 0 1 ln( )i i iY X uβ β= + + is as

follows: a

a. 1% change in X is associated with a 1β % change in Y. b. 1% change in X is associated with a change in Y of 0.01 1β . a. change in X by one unit is associated with a 100 1β % change in Y. b. change in X by one unit is associated with a 1β change in Y.

Answer: b

3) The interpretation of the slope coefficient in the model 0 1ln( )i i iY X uβ β= + + is as

follows: a

a. 1% change in X is associated with a 1β % change in Y. b. change in X by one unit is associated with a 100 1β % change in Y. c. 1% change in X is associated with a change in Y of 0.01 1β . d. change in X by one unit is associated with a 1β change in Y.

Answer: b

4) The interpretation of the slope coefficient in the model 0 1ln( ) ln( )i i iY X uβ β= + + is

as follows: a

a. 1% change in X is associated with a 1β % change in Y. b. change in X by one unit is associated with a 1β change in Y. c. change in X by one unit is associated with a 100 1β % change in Y. d. 1% change in X is associated with a change in Y of 0.01 1β .

Page 7: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: a

5) In the case of regression with interactions, the coefficient of a binary variable should

be interpreted as follows:

a. there are really problems in interpreting these, since the ln(0) is not defined.

b. for the case of interacted regressors, the binary variable coefficient represents the various intercepts for the case when the binary variable equals one.

c. first set all explanatory variables to one, with the exception of the binary variables. Then allow for each of the binary variables to take on the value of one sequentially. The resulting predicted value indicates the effect of the binary variable.

d. first compute the expected values of Y for each possible case described by the set of binary variables. Next compare these expected values. Each coefficient can then be expressed either as an expected value or as the difference between two or more expected values.

Answer: d

6) The following interactions between binary and continuous variables are possible, with

the exception of

a. 0 1 2 3( )i i i i i iY X D X D uβ β β β= + + + × + . b. 0 1 2 ( )i i i i iY X X D uβ β β= + + × + . c. 0 1( )i i i iY D X uβ β= + + + . d. 0 1 2i i i iY X D uβ β β= + + + .

Answer: c

7) An example of the interaction term between two independent, continuous variables is

a. 0 1 2 3( )i i i i i iY X D X D uβ β β β= + + + × + . b. 0 1 1 2 2i i i iY X X uβ β β= + + + . c. 0 1 1 2 2 3 1 2( )i i i i i iY D D D D uβ β β β= + + + × + . d. 0 1 1 2 2 3 1 2( )i i i i i iY X X X X uβ β β β= + + + × + .

Answer: d

8) Including an interaction term between two independent variables, 1X and 2X , allows

for the following, except that: the interaction term

Page 8: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

a. lets the effect on Y of a change in 1X depend on the value of 2X . b. coefficient is the effect of a unit increase in 1X and 2X above and beyond the sum of the individual effects of a unit increase in the two variables alone. c. coefficient is the effect of a unit increase in 1 2( )X X× . d. lets the effect on Y of a change in 2X depend on the value of 1X .

Answer: c

9) An example of a quadratic regression model is

a. 2

0 1 2i iY X Y uβ β β= + + + . b. 0 1 ln( )i iY X uβ β= + + . c. 2

0 1 2i iY X X uβ β β= + + + . d. 2

0 1i iY X uβ β= + + .

Answer: c 10) (Requires Calculus) In the equation

2607.3 3.85 0.0423TestScore Income Income= + − , the following income level results in the maximum test score:

a. 607.3. b. 91.02. c. 45.50. d. cannot be determined without a plot of the data.

Answer: c

11) To decide whether 0 1i iY X uβ β= + + or 0 1ln( )i iY X uβ β= + + fits the data better,

you cannot consult the regression 2R because

a. ln(Y) may be negative for 0<Y<1. b. the TSS are not measured in the same units between the two models. c. the slope no longer indicates the effect of a unit change of X on Y in the

log-linear model. d. the regression 2R can be greater than one in the second model.

Answer: b

12) You have estimated the following equation:

2607.3 3.85 0.0423TestScore Income Income= + − ,

Page 9: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

where TestScore is the average of the reading and math scores on the Stanford 9

standardized test administered to 5th grade students in 420 California school districts in 1998 and 1999. Income is the average annual per capita income in the school district, measured in thousands of 1998 dollars. The equation

a. suggests a positive relationship between test scores and income for most

of the sample. b. is positive until a value of Income of 610.81. c. does not make much sense since the square of income is entered. d. suggests a positive relationship between test scores and income for all of

the sample.

Answer: a 13) A polynomial regression model is specified as:

a. 2

0 1 2r

i i i r i iY X X X uβ β β β= + + + + + . b. 2

0 1 1 1r

i i i i iY X X X uβ β β β= + + + + + . c. 2

0 1 2r

i i i r i iY X Y Y uβ β β β= + + + + + . d. 0 1 1 2 2 3 1 2( )i i i i iY X X X X uβ β β β= + + + × + .

Answer: a

14) For the polynomial regression model,

a. you need new estimation techniques since the OLS assumptions do not apply any longer.

b. the techniques for estimation and inference developed for multiple regression can be applied.

c. you can still use OLS estimation techniques, but the t-statistics do not have an asymptotic normal distribution.

d. the critical values from the normal distribution have to be changed to 1.962, 1.963, etc.

Answer: b

15) To test whether or not the population regression function is linear rather than a

polynomial of order r,

a. check whether the regression 2R for the polynomial regression is higher than that of the linear regression.

b. compare the TSS from both regressions. c. look at the pattern of the coefficients: if they change from positive

to negative to positive, etc., then the polynomial regression should

Page 10: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

be used. d. use the test of (r-1) restrictions using the F-statistic.

Answer: d

16) The best way to interpret polynomial regressions is to

a. take a derivative of Y with respect to the relevant X. b. plot the estimated regression function and to calculate the estimated effect

on Y associated with a change in X for one or more values of X. c. look at the t-statistics for the relevant coefficients. d. analyze the standard error of estimated effect.

Answer: b

17) The exponential function

a. is the inverse of the natural logarithm function. b. does not play an important role in modeling nonlinear regression functions

in econometrics. c. can be written as exp( )xe . d. is xe , where e is 3.1415….

Answer: a

18) The following are properties of the logarithm function with the exception of a. ln(1/ ) ln( )x x= − . b. ln( ) ln( ) ln( )a x a x+ = + . c. ln( ) ln( ) ln( )ax a x= + . d. ln( ) ln( )ax a x .

Answer: b

19) The binary variable interaction regression a. can only be applied when there are two binary variables, but not three or more. b. is the same as testing for differences in means. c. cannot be used with logarithmic regression functions because ln(0) is not defined. d. allows the effect of changing one of the binary independent variables to depend on

the value of the other binary variable. Answer: d

Page 11: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

20) In the regression model 0 1 2 3( )i i i i i iY X D X D uβ β β β= + + + × + , where X is a continuous variable and D is a binary variable, to test that the two regressions are identical, you must use the

a. t-statistic separately for 2 30, 0β β= = .

b. F-statistic for the joint hypothesis that 0 10, 0β β= = . c. t-statistic separately for 3 0β = . d. F-statistic for the joint hypothesis that 2 30, 0β β= = .

Answer: d

Analytical Questions Chapter 5 1. Females, on average, are shorter and weigh less than males. One of your friends,

who is a pre-med student, tells you that in addition, females will weigh less for a given height. To test this hypothesis, you collect height and weight of 29 female and 81 male students at your university. A regression of the weight on a constant, height, and a binary variable, which takes a value of one for females and is zero otherwise, yields the following result:

Studentw = –229.21 – 6.36×Female + 5.58×Height , R2=0.50, SER = 20.99

(43.39) (5.74) (0.62)

where Studentw is weight measured in pounds and Height is measured in inches (heteroskedasticity-robust standard errors in parentheses).

(a) Interpret the results. Does it make sense to have a negative intercept?

Answer: For every additional inch in height, weight increases by roughly 5.5 pounds. Female students weigh approximately 6.5 pounds less than male students, controlling for height. The regression explains 50 percent of the weight variation among students. It does not make sense to interpret the intercept, since there are no observations close to the origin, or, put differently, there are no individuals who are zero inches tall.

(b) You decide that in order to give an interpretation to the intercept you should

rescale the height variable. One possibility is to subtract 5 ft. or 60 inches from your Height, because the minimum height in your data set is 62 inches. The resulting new intercept is now 105.58. Can you interpret this number now? What effect do you think the rescaling had on the two slope coefficients and their t-statistic? Do you thing that the regression R2 has changed? What about the

Page 12: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

standard error of the regression?

Answer: There are now observations close to the origin and you can therefore interpret the intercept. A student who is 5ft. tall will weight roughly 105.5 pounds, on average. The two slopes will be unaffected, as will be the regression R2 . Since the explanatory power of the regression is unaffected by rescaling, and the dependent variable and the total sums of squares have remained unchanged, the sums of squared residuals, and hence the SER, must also remain the same.

(c) Calculate t-statistics and carry out the hypothesis test that females weigh the same

as males, on average, for a given height, using a 10% significance level. What is the alternative hypothesis? What is the p-value? What critical value did you use?

Answer: The t-statistics for the intercept, the gender binary variable, and the height variable are -5.28, -1.11, and 9.00, respectively. For a one-sided alternative hypothesis, 0Femaleβ < , the critical value from the standard normal table is –1.28. Hence you cannot reject the null hypothesis at the 10% level. The p-value is 13.4%.

(d) You have learned that correlation does not imply causation. Although this is true

mathematically, does this always apply?

Answer: Although true in general, there are cases where Y cannot cause X, as is the case here. Gaining weight is not a good way for becoming taller, or put differently, weighing 250 pounds will not make students over 7 ft. tall.

2) The cost of attending your college has once again gone up. Although you have

been told that education is investment in human capital, which carries a return of roughly 10% a year, you (and your parents) are not pleased. One of the administrators at your university/college does not make the situation better by telling you that you pay more because the reputation of your institution is better than that of others. To investigate this hypothesis, you collect data randomly for 100 national universities and liberal arts colleges from the 2000-2001 U.S. News and World Report annual rankings. Next you perform the following regression

Cost = 7,311.17 + 3,985.20×Reputation – 0.20×Size

(2,058.63) (664.58) (0.13) + 8,406.79×Dpriv – 416.38×Dlibart – 2,376.51×Dreligion (2,154.85) (1,121.92) (1,007.86)

Page 13: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

R2=0.72, SER = 3,773.35 where Cost is Tuition, Fees, Room and Board in dollars, Reputation is the index

used in U.S. News and World Report (based on a survey of university presidents and chief academic officers), which ranges from 1 (“marginal”) to 5 (“distinguished”), Size is the number of undergraduate students, and Dpriv, Dlibart, and Dreligion are binary variables indicating whether the institution is private, a liberal arts college, and has a religious affiliation. The numbers in parentheses are heteroskedasticity-robust standard errors.

(a) Interpret the results and indicate whether or not the coefficients are significantly

different from zero. Do the coefficients have the expected sign?

Answer: An increase in reputation by one category, increases the cost by roughly $3,985. The larger the size of the college/university, the lower the cost. An increase of 10,000 students results in a $2,000 lower cost. Private schools charge roughly $8,406 more than public schools. A school with a religious affiliation is approximately $2,376 cheaper, presumably due to subsidies, and a liberal arts college also charges roughly $416 less. There are no observations close to the origin, so there is no direct interpretation of the intercept. Other than perhaps the coefficient on liberal arts colleges, all coefficients have the expected sign, although that coefficient is not significantly different from zero. All other coefficients are statistically significant at conventional levels, with the exception of the size coefficient, which carries a t-statistic of 1.54, and hence is not statistically significant at the 5% level (using a one-sided alternative hypothesis).

(b) What is the forecasted cost for a liberal arts college, which has no religious

affiliation, a size of 1,500 students and a reputation level of 4.5? (All liberal arts colleges are private.)

Answer: $ 32,935.

(c) To save money, you are willing to switch from a private university to a public

university, which has a ranking of 0.5 less and 10,000 more students. What is the effect on your cost? Is it substantial?

Answer: Roughly $ 12,4.00. Since over the four years of education, this implies

approximately $50,000, it is a substantial amount of money for the average household.

(d) What is the p-value for the null hypothesis that the coefficient on Size is equal to

zero? Based on this, should you eliminate the variable from the regression? Why or why not?

Page 14: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: Using a one-sided alternative hypothesis, the p-value is 6.2 percent. Variables should not be eliminated simply on grounds of a statistical test. The sign of the coefficient is as expected, and its magnitude makes it important. It is best to leave the variable in the regression and let the reader decide whether or not this is convincing evidence that the size of the university matters.

(e) You want to test simultaneously the hypotheses that 0 and 0size Dlibartβ β= = . Your

regression package returns the F-statistic of 1.23. Can you reject the null hypothesis?

Answer: The critical value for 2,F ∞ is 3.00 (5% level) and 4.61 (1% level). Hence

you cannot reject the null hypothesis in this case.

(f) Eliminating the Size and Dlibart variables from your regression, the estimation regression becomes

Cost = 5,450.35 + 3,538.84×Reputation + 10,935.70×Dpriv – 2,783.31×Dreligion;

(1,772.35) (590.49) (875.51) (1,180.57)

R2=0.72, SER = 3,792.68 Why do you think that the effect of attending a private institution has increased now?

Answer: Private institutions are smaller, on average, and some of these are liberal arts colleges. Both of these variables had negative coefficients.

(g) You give a final attempt to bring the effect of Size back into the equation by

forcing the assumption of homoskedasticity onto your estimation. The results are as follows:

Cost = 7,311.17 + 3,985.20×Reputation – 0.20×Size (1,985.17) (593.65) (0.07) + 8,406.79×Dpriv – 416.38×Dlibart – 2,376.51×Dreligion (1,423.59) (1,096.49) (989.23)

R2=0.72, SER = 3,682.02 Calculate the t-statistic on the Size coefficient and perform the hypothesis test that its coefficient is zero. Is this test reliable? Explain.

Page 15: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: Although the coefficient would be statistically significant in this case,

the test is unreliable and should not be used for statistical inference. There is no theoretical suggestion here that the errors might be homoskedastic. Since the standard errors are quite different here, you should use the more reliable ones, i.e. the heteroskedasticity-robust.

(h) What can you say about causation in the above relationship? Is it possible that

Cost affects Reputation rather than the other way around?

Answer: It is very possible that the university president and chief academic officer are influenced by the cost variable in answering the U.S. News and World Report survey. If this were the case, then the above equation suffers from simultaneous causality bias, a topic that will be covered in a later chapter. However, this poses a serious threat to the internal validity of the study.

3) In the multiple regression model with two explanatory variables

iiii uXXY +++= 22110 βββ

the OLS estimators for the three parameters are as follows (small letters refer to deviations from means as in i iz Z Z= − ):

0 1 1 2 2ˆ ˆ ˆY X Xβ β β= − −

2

1 2 2 1 21 1 1 1

12 2 21 2 1 2

1 1 1

ˆ( )

n n n n

i i i i i i ii i i i

n n n

i i i ii i i

y x x y x x x

x x x xβ = = = =

= = =

−=

∑ ∑ ∑ ∑

∑ ∑ ∑

2

2 1 1 1 21 1 1 1

22 2 21 2 1 2

1 1 1

ˆ( )

n n n n

i i i i i i ii i i i

n n n

i i i ii i i

y x x y x x x

x x x xβ = = = =

= = =

−=

∑ ∑ ∑ ∑

∑ ∑ ∑

You have collected data for 104 countries of the world from the Penn World Tables and want to estimate the effect of the population growth rate (X1i) and the saving rate (X2i) (average investment share of GDP from 1980 to 1990) on GDP per worker (relative to the U.S.) in 1990. The various sums needed to calculate the OLS estimates are given below:

1 21 1 1

33.33; 2.025; 17.313n n n

i i ii i i

Y X X= = =

= = =∑ ∑ ∑

Page 16: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

2 2 2

1 21 1 1

8.3103; .0122; 0.6422n n n

i i ii i i

y x x= = =

= = =∑ ∑ ∑

1 2 1 21 1 1

0.2304; 1.5676; 0.0520n n n

i i i i i ii i i

y x y x x x= = =

= − = = −∑ ∑ ∑

(a) What are your expected signs for the regression coefficient? Calculate the

coefficients and see if their signs correspond to your intuition.

Answer: You expect 1 0β < and 2 0β > with no prior expectation on the intercept. Substituting the above numbers into the equations for the regression coefficients results in 1β = -12.95, 2β = 1.39, and 0β = 0.34.

(b) Find the regression R2, and interpret it. What other factors can you think of that

might have an influence on productivity?

Answer: 1 21 2

2 1 1

2

1

n n

i i i ii i

n

ii

y x y xR

y

β β= =

=

+=

∑ ∑

∑= 0.62. 62 percent of the variation in

relative productivity is explained by the regression. There is a vast literature on the subject and students’ answers will obviously vary. Some may focus on additional economic variables such as the initial level of productivity and the inflation rate during the sample period. Others may emphasize institutional variables such as whether or not the country was democratic over the sample period, or had political stability, etc.

(c) The heteroskedasticity-robust standard errors of the two slope coefficients are

1.99 (for population growth) and 0.23 (for the saving rate). Calculate the 95% confidence interval for both coefficients. How many standard deviations are the coefficients away from zero?

Answer: The 95% confidence interval for the population growth is (–16.85, -

9.05), and the 95% confidence interval for the saving rate is (0.94, 1.84). The population growth coefficient has a t-statistic of -6.51, and the saving rate coefficient of 6.04. These represent standard deviations away from zero.

4) A subsample from the Current Population Survey is taken, on weekly earnings of

individuals, their age, and their gender. You have read in the news that women make 70 cents to the $1 that men earn. To test this hypothesis, you first regress earnings on a constant and a binary variable, which takes on a value of 1 for females and is 0 otherwise. The results were:

Page 17: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Earn = 570.70 - 170.72×Female, R2=0.084, SER = 282.12.

(9.44) (13.52) (a) There are 850 females in your sample and 894 males. What are the mean earnings

of males and females in this sample? What is the percentage of average female income to male income?

Answer: Males earn $570.70, females $399.98. Percentage of average female

income to male income is 70.1% in the sample. (b) Perform a difference in means test and indicate whether or not the difference in

the mean salaries is significantly different. Justify your choice of a one-sided or two-sided alternative test. Are these results evidence enough to argue that there is discrimination against females? Why or why not? Is it likely that the errors are normally distributed in this case? If not, does that present a problem to your test?

Answer: The t-statistic is -12.63, while the critical value is –1.64. The difference

is therefore statistically significant. A one-sided alternative was chosen since the claim is that females make less than males. This represents little evidence of discrimination, since attributes of males and females have not been included. Given that earnings distributions are not normally distributed, the errors will also not be distributed normally, and assuming that they are, results in problematic inference.

(c) You decide to control for age (in years) in your regression results because older

people, up to a point, earn more on average than younger people. This regression output is as follows (using heteroskedasticity-robust standard errors):

Earn = 323.70 – 169.78×Female + 5.15×Age, R2=0.135, SER = 274.45.

(21.18) (13.06) (0.55) Interpret these results carefully. How much, on average, does a 40-year-old female make per year in your sample? What about a 20-year-old male? Does this represent stronger evidence of discrimination against females?

Answer: As individuals become one year older, they earn $5.15 more, on average.

Females earn significantly less money on average and for a given age. 13.5 percent of the earnings variation is explained by the regression. A 40-year-old female earns $359.92, while a 20-year-old male makes $426.70. There is somewhat more evidence here, since age has been added as a regressor. However, many attributes, which could potentially explain this difference, are still omitted.

(d) Test for the significance of the age and gender coefficients. Why do you think that

age plays a role in earnings determination?

Page 18: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: The t-statistics are 9.36 for the age coefficient, and -13.00 for the gender

coefficient. Both of these values are greater than the (absolute) critical value from the standard normal distribution (1.64). Hence you can reject the null hypothesis that these coefficients are zero. Age proxies “on the job training.” A better proxy that has been used frequently in the past is the Mincer experience variable (Age-Education-6). Obviously this is a better proxy for some subsample of individuals than for others.

5) You have collected data from Major League Baseball (MLB) to find the determinants

of winning. You have a general idea that both good pitching and strong hitting are needed to do well. However, you do not know how much each of these contributes separately. To investigate this problem, you collect data for all MLB during 1999 season. Your strategy is to first regress the winning percentage on pitching quality (“Team ERA”), second to regress the same variable on some measure of hitting (“OPS – On-base Plus Slugging percentage”), and third to regress the winning percentage on both.

Summary of the Distribution of Winning Percentage, On Base plus Slugging

percentage, and Team Earned Run Average for MLB in 1999

Average Standard deviation

Percentile

10% 25% 40% 50% (median)

60% 75% 90%

Team ERA

4.71 0.53 3.84 4.35 4.72 4.78 4.91 5.06 5.25

OPS

0.778 0.034 0.720 0.754 0.769 0.780 0.790 0.798 0.820

Winning Percentage

0.50 0.08 0.40 0.43 0.46 0.48 0.49 0.59 0.60

The results are as follows:

Winpct = 0.94 – 0.100×teamera , R2 = 0.49, SER = 0.06.

(0.08) (0.017)

Winpct = –0.68 + 1.513×ops , R2=0.45, SER = 0.06. (0.17) (0.221)

Winpct = –0.19 – 0.099×teamera + 1.490×ops , R2=0.92, SER = 0.02. (0.08) (0.008) (0.126)

(a) Interpret the multiple regression. What is the effect of a one point increase in team

ERA? Given that the Atlanta Braves had the most wins that year, wining 103

Page 19: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

games out of 162, do you find this effect important? Use the t-statistic to test for the statistical significance of the coefficient. Next analyze the importance and statistical significance for the OPS coefficient. (The Minnesota Twins had the minimum OPS of 0.712, while the Texas Rangers had the maximum with 0.840.) Since the intercept is negative, and since winning percentages must lie between zero and one, should you rerun the regression through the origin?

Answer: A single point increase in team ERA lowers the winning percentage by

approximately 10 percent. A 0.1 increase in OPS results roughly in an increase of 15 percent. Given that there are no observations close to the origin, you should not interpret the intercept. The multiple regression explains 92 percent of the variation in winning percentage. The Atlanta Braves only won 63.6 percent of their games. Given that this represents the best record during that season, a 10 percentage point drop is important. The t-statistics for team ERA and OPS are -12.38 and 11.83. Both of these are highly significant. Although the intercept cannot be interpreted, it anchors the regression at a certain level and should therefore not be omitted.

(b) You remember from your textbook, that having omitted variables can have a

devastating effect on the coefficient of the included variable. In the above regression, the coefficient of Team ERA and OPS hardly changed when the other variable was excluded. What can you conclude from that? Does it make sense? What could you do to confirm your suspicion?

Answer: There are two potential reasons for the coefficient of the included

variable not to be biased. One is that the omitted variable is not relevant in determining the dependent variable. That can be comfortably ruled out in the above example. The second reason is that there is no correlation between the included and omitted variable. This is the obvious candidate here. To confirm the suspicion, you could run a regression of Team ERA on OPS. The regression 2R turns out to be 0.0002.

(c) There are 30 teams in MLB. Does the small sample size worry you here when

testing for significance?

Answer: The t-statistic is only normally distributed in large samples. As a result, inference is problematic here.

(d) What are some of the omitted variables in your analysis? Are they likely to affect

the coefficient on Team ERA and OPS given the size of the R2 and their potential correlation with the included variables?

Answer: The quality of the management and coaching comes to mind, although

both may be reflected in the performance statistics, as are salaries. There

Page 20: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

are other aspects of baseball performance that are missing, such as the fielding percentage of the team. However, none of these are likely to be statistically significant given the high regression R2.

6) In the process of collecting weight and height data from 29 female and 81 male

students at your university, you also asked the students for the number of siblings they have. Although it was not quite clear to you initially what you would use that variable for, you construct a new theory that suggests that children who have more siblings come from poorer families and will have to share the food on the table. Although a friend tells you that this theory does not pass the “straight-face” test, you decide to hypothesize that peers with many siblings will weigh less, on average, for a given height. In addition, you believe that the muscle/fat tissue composition of male bodies suggests that females will weigh less, on average, for a given height. To test these theories, you perform the following regression:

Studentw = –229.92 – 6.52×Female + 0.51×Sibs+ 5.58×Height,

(44.01) (5.52) (2.25) (0.62)

R2=0.50, SER = 21.08

where Studentw is in pounds, Height is in inches, Female takes a value of 1 for females and is 0 otherwise, Sibs is the number of siblings (heteroskedasticity-robust standard errors in parentheses).

(a) Interpret the regression results.

Answer: For every additional inch in height, students weigh, on average, roughly 5.5 pounds more. For a given height and number of siblings, female students weigh approximately 6.5 pounds less. For every additional sibling, the weight of students increases by half a pound. Since there are no observations close to the origin, you cannot interpret the intercept. The regression explains half of the variation in student weight.

(b) Carrying out hypotheses tests using the relevant t-statistics to test your two claims

separately, is there strong evidence in favor of your hypotheses? Is it appropriate to use two separate tests in this situation?

Answer: The t-statistics for gender and number of siblings are -1.18 and 0.23

respectively. Neither coefficient is statistically significant at conventional levels. If you wanted to test the two hypothesis simultaneously, then you should use an F-test.

(c) You also perform an F-test on the joint hypothesis that the two coefficients for

females and siblings are zero. The calculated F-statistic is 0.84. Find the critical value from the F-table. Can you reject the null hypothesis? Is it possible that one of the two parameters is zero in the population, but not the other?

Page 21: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: The critical value is 3.00 at the 5% level, and 4.61 at the 1% level.

Hence you cannot reject the null hypothesis. The hypothesis is that both coefficients are zero, and this cannot be rejected. Had you rejected the null hypothesis, then the alternative hypothesis states that one or both of the restrictions do not hold.

d) You are now a bit worried that the entire regression does not make sense and

therefore also test for the height coefficient to be zero. The resulting F-statistic is 57.25. Does that prove that there is a relationship between weight and height?

Answer: Although you cannot prove anything in this context with certainty, there

is a very high probability that there is a relationship between height and weight in the population, given the sample result. The critical value from the F-table is 3.78 at the 1% level.

7) Attendance at sports events depends on various factors. Teams typically do not

change ticket prices from game to game to attract more spectators to less attractive games. However, there are other marketing tools used, such as fireworks, free hats, etc., for this purpose. You work as a consultant for a sports team, the Los Angeles Dodgers, to help them forecast attendance, so that they can potentially devise strategies for price discrimination. After collecting data over two years for every one of the 162 home games of the 2000 and 2001 season, you run the following regression: Attend = 15,005 + 201×Temperat + 465×DodgNetWin + 82×OppNetWin (8,770) (121) (169) (26) + 9647×DFSaSu + 1328×Drain + 1609×D150m + 271×DDiv – 978×D2001; (1505) (3355) (1819) (1,184) (1,143) R2=0.416, SER = 6983

where Attend is announced stadium attendance, Temperat it the average temperature on game day, DodgNetWin are the net wins of the Dodgers before the game (wins-losses), OppNetWin is the opposing team’s net wins at the end of the previous season, and DFSaSu, Drain, D150m, Ddiv, and D2001 are binary variables, taking a value of 1 if the game was played on a weekend, it rained during that day, the opposing team was within a 150 mile radius, the opposing team plays in the same division as the Dodgers, and the game was played during 2001, respectively. Numbers in parentheses are heteroskedasticity- robust standard errors.

(a) Interpret the regression results. Do the coefficients have the expected signs?

Page 22: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: 10 degree warmer temperature increases attendance by roughly 2,000. A

10 game net increase in wins results in approximately 4,600 more spectators. If the opponents’ net win is 10 games higher when compared to another team, then roughly 800 more people attend. Weekend games attract almost 10,000 more people on average. Rain during the day of the game brings out close to 1,300 more fans. A team from closer by, such as the Angels or the Diamondbacks, attract a bit more than 1,600 more people, and a team from the same division results in close to 270 more fans in the stadium. On average, there were approximately 1,000 fewer spectators per game in 2001 than in 2000, holding all other factors constant. With the exception of the rain variable, the signs correspond to prior expectation. The regression explains 41.6 percent of the variation in Dodger attendance.

(b) Are the slope coefficients statistically significant?

Answer: The t-statistics for Temperat, DodgNewWin, OppNetWin, and DFSaSu are all statistically significant at the 5% level, using a one-sided test. The constant is insignificant using a two-sided test. All the other coefficients are not statistically significant at the 5% level.

(c) To test whether the effect of the last four binary variables is significant, you have

your regression program calculate the relevant F-statistic, which is 0.295. What is the critical value? What is your decision about excluding these variables?

Answer: The critical value at the 5% level is 2.37. Hence you cannot reject the

null hypothesis that all four coefficients are simultaneously zero. (d) Excluding the last four binary variables results in the following regression

result:

Attend = 14,838 + 202×Temperat + 435×DodgNetWin + 90×OppNetWin (8,770) (121) (169) (26) + 10,472×DFSaSu, R2=0.410, SER = 6925 (1505) According to this regression, what is your forecast of the change in attendance if the temperature increases by 30 degrees? Is it likely that people attend more games if the temperature increases? Is it possible that Temperat picks up the effect of an omitted variable? Answer: For an increase in 30 degrees, there will be roughly 6,000 more people

in attendance. Although people prefer 75 degrees over 45 degrees, it is

Page 23: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

unlikely that they prefer 105 degrees over 75 degrees. Temperature rises during the baseball season in Los Angeles. There are typically fewer people in attendance during the earlier parts of the season than during the latter parts. Binary variables for the month of the year would pick up such an effect.

(e) Assuming that ticket sales depend on prices, what would your policy advice be for

the Dodgers to increase attendance?

Answer: The only variable that management has limited control over is the performance of the team. The policy advice would therefore be to assure a superior team performance, which, in turn, increases attendance. (Stating the obvious is not going to keep the consultant on the payroll much longer.)

(f) Dodger stadium is large and is not often sold out. The Boston Red Sox play in a

much smaller stadium, Fenway Park, which often reaches capacity. If you did the same analysis for the Red Sox, what problems would you foresee in your analysis?

Answer: If there was a serious capacity constraint, then estimating the equation in

the above way would not yield sensible results. Imagine that Fenway Park was basically sold out and the Red Sox would now improve their net wins. Since you would not observe an increase in the dependent variable, the coefficient for net wins would necessarily have to be zero.

9) The administration of your university/college is thinking about implementing a

policy of coed floors only in dormitories. Currently there are only single gender floors. One reason behind such a policy might be to generate an atmosphere of better “understanding” between the sexes. The Dean of Students (DoS) has decided to investigate if such a behavior results in more “togetherness” by attempting to find the determinants of the gender composition at the dinner table in your main dining hall, and in that of a neighboring university, which only allows for coed floors in their dorms. The survey includes 176 students, 63 from your university/college, and 113 from a neighboring institution.

(a) The Dean’s first problem is how to define gender composition. To begin with, the

survey excludes single persons’ tables, since the study is to focus on group behavior. The Dean also eliminates sports teams from the analysis, since a large number of single-gender students will sit at the same table. Finally, the Dean decides to only analyze tables with three or more students, since she worries about “couples” distorting the results. The Dean finally settles for the following specification of the dependent variable:

GenderComp=|(50%-% of Male Students at Table)|

Page 24: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Where “|Z|” stands for absolute value of Z. The variable can take on values from zero to fifty. Briefly analyze some of the possible values. What are the implications for gender composition as more female students join a given number of males at the table? Why would you choose the absolute value here? Discuss some other possible specifications for the dependent variable. Answer: 3 females, 0 males: 50; 0 females, 3 males: 50; 2 females, 2 males: 0; 1

female, 3 males: 30; 4 females, 3 males: 7.143. For a given number of males, say 3, the gender composition will first decrease as the number of females increases from 0 to 3. After that, the gender composition will decrease again. You need to choose the absolute value because having many individuals from one gender relative to the other is equally bad for a balanced gender composition. Another possibility would be to use the squared difference.

(b) After considering various explanatory variables, the Dean settles for an initial list

of eight, and estimates the following relationship, using heteroskedasticity-robust standard errors (this Dean obviously has taken an econometrics course earlier in her career and/or has an able research assistant):

GenderComp = 30.90 – 3.78×Size – 8.81×DCoed + 2.28×DFemme

+2.06×DRoommate (7.73) (0.63) (2.66) (2.42) (2.39)

- 0.17×DAthlete + 1.49×DCons – 0.81 SAT + 1.74×SibOther, R2=0.24, SER = 15.50 (3.23) (1.10) (1.20) (1.43)

where Size is the number of persons at the table minus 3, DCoed is a binary variable, which takes on the value of 1 if you live on a coed floor, DFemme is a binary variable, which is 1 for females and zero otherwise, DRoommate is a binary variable which equals 1 if the person at the table has a roommate and is zero otherwise, DAthlete is a binary variable which is 1 if the person at the table is a member of an athletic varsity team, DCons is a variable which measures the political tendency of the person at the table on a seven-point scale, ranging from 1 being “liberal” to 7 being “conservative,” SAT is the SAT score of the person at the table measured on a seven-point scale, ranging from 1 for the category “900-1000” to 7 for the category “1510 and above,” and increasing by one for 100 point increases, and SibOther is the number of siblings from the opposite gender in the family the person at the table grew up with.

Interpret the above equation carefully, justifying the inclusion of the explanatory variables along the way. Does it make sense to interpret the constant in the above regression? Indicate which of the coefficients are statistically significant.

Page 25: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: The larger the size at the table, the more balanced the gender composition. Consider a table of 6, where you find two more males than females (4 females, 2 males, gender composition = 16.7) versus a table of 14, where you have two more males than females (gender composition = 7.1). Obviously, if males and females increased in the same proportion, then gender composition would not change. This has not happened here. Students from a coed floor are more likely to sit at a more balanced table in terms of gender composition. This is likely to happen if students knock on neighbors’ doors to see who is willing to join them for lunch. Females are less likely to sit at gender balanced tables, and there is no prior on the coefficient of this variable. Having a roommate increases the likelihood of gender imbalance. Roommates are from the same gender, and joining the roommate for a meal results in a more imbalanced gender composition. Being a member of a varsity team decreases the gender imbalance. Recall that sports teams sitting together are excluded from the sample. Although there is no strong prior here, the result suggests that varsity team members have more friends, on average, from the other sex than does the general student body. Having a more conservative view, holding other factors constant, results in sitting at meals with more people from the same sex. More intelligent students, or at least those with a higher SAT score, sit more frequently with students from the other sex. Having had more siblings from the other gender at home results in a more imbalanced gender composition: the female student who had four brothers when she grew up has had enough of this sort of experience (although, given the specification of the dependent variable, it is also possible that she continues to sit with four males). There are no observations close to the origin, so it is best not to interpret the dependent variable. 24 percent of the variation in gender composition is explained by the regression.

Only the constant, Size, and DCoed are statistically significant at the 5% level.

(c) Based on the above results, the Dean decides to specify a more parsimonious form

by eliminating the least significant variables. Using the F-statistic for the null hypothesis that there is no relationship between the gender composition at the table and DFemme, DRoommate, DAthlete, and SAT, the regression package returns a value of 1.10. What are the degrees of freedom for the statistic? Look up the 1% and 5% critical values from the F- table and make a decision about the exclusion of these variables based on the critical values.

Answer: The 4,F ∞ is 2.37 at the 5% level, and 3.32 at the 1% level. Hence you

cannot reject the null hypothesis that all four coefficients are zero. (d) The Dean decides to estimate the following specification next:

Page 26: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

GenderComp = 29.07 – 3.80×Size – 9.75×DCoed + 1.50×DCons + 1.97×SibOther, (3.75) (0.62) (1.04) (1.04) (1.44)

R2=0.22 SER = 15.44 Calculate the t-statistics for the coefficients and discuss whether or not the Dean

should attempt to simplify the specification further. Based on the results, what might some of the comments be that she will write up for the other senior administrators of your college? What are some of the potential flaws in her analysis? What other variables do you think she should have considered as explanatory factors?

Answer: The t-statistics for the five coefficients are as follows: 7.75, -6.13, -9.38,

1.44 and 1.37. The Dean should leave the specification as is and allow readers to decide if they want to place much weight on the insignificant coefficients. The variable of interest is DCoed and she will most likely focus on that, concluding that having coed floors in dormitories will increase the gender balance at dining hall tables. She will most likely go further in her report and suggest that communication between the sexes will improve as a result of coed floors.

One of the major flaws in the analysis is that students from one college do not have coed floors in dormitories while students from the other college do not have single gender floors. Ideally you would like to survey students from the same college where some of the students lived on single gender floors while others did not. Answers on omitted variables will obviously vary. Ideally some survey question should be included which would indicate the student’s attitude towards the other sex.

(e) Had the Dean used the number of people sitting at the table instead of Number-3,

what effect would that have had on the above specification?

Answer: The only change would be in the intercept. (f) If you believe that going down the hallway and knocking on doors is one of the

major determinants of who goes to eat with whom, then why would it not be a good idea to survey students at lunch tables?

Answer: Many students attend lectures before lunch, and may ask some of the

students attending the same lecture to join them for lunch. 10) The Solow growth model suggests that countries with identical saving rates and

population growth rates should converge to the same per capita income level. This result has been extended to include investment in human capital (education) as

Page 27: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

well as investment in physical capital. This hypothesis is referred to as the “conditional convergence hypothesis,” since the convergence is dependent on countries obtaining the same values in the driving variables. To test the hypothesis, you collect data from the Penn World Tables on the average annual growth rate of GDP per worker (g6090) for the 1960-1990 sample period, and regress it on the (i) initial starting level of GDP per worker relative to the United States in 1960 (RelProd60), (ii) average population growth rate of the country (n), (iii) average investment share of GDP from 1960 to1990 (sK - remember investment equals savings), and (iv) educational attainment in years for 1985 (Educ). The results for close to 100 countries is as follows (numbers in parentheses are for heteroskedasticity-robust standard errors):

6090g = 0.004 - 0.172×n + 0.133×sK + 0.002×Educ – 0.044×RelProd60,

(0.007) (0.209) (0.015) (0.001) (0.008) R2=0.537, SER = 0.011 (a) Interpret the results. Do the coefficients have the expected signs? Why does a

negative coefficient on the initial level of per capita income indicate conditional convergence (“beta-convergence”)? Is the coefficient on this variable significantly different from zero at the 5% level? At the 1% level?

Answer: All slope coefficients have the expected sign given the economic theory

behind the equation. The negative coefficient implies that countries which were further behind grew relatively faster, or, put differently, countries which had a higher relative per capita income in 1960 grew relatively slower. The coefficient has a t-statistic of 5.50 and is therefore statistically significant at both the 5% and the 1% level.

(b) Equations of the above type have been labeled “determinants of growth”

equations in the literature. You recall from your intermediate macroeconomics course that growth in the Solow growth model is determined by technological progress. Yet the above equation does not contain technological progress. Is that inconsistent?

Answer: The equation only determines growth relative to a given starting point,

namely per capita income in 1960. Compare this to runners placed on a track where the starting blocks are at various points of the first 100 m. Let the race last for perhaps 10 seconds and let the runners stop at that point on the track. In essence, you measure where the runners ended up given their starting point, or you can also measure how far they ran given their starting point. In many ways, the above equation is therefore meant to predict the per capita income level in 1990 rather than the growth.

(c) Test for the significance of the other slope coefficients. Should you use a one-

Page 28: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

sided alternative hypothesis or a two-sided test? Will the decision for one or the other influence the decision about the significance of the parameters? Should you always eliminate variables which carry insignificant coefficients?

Answer: The t-statistics are –0.82. 8.87, and 2.00. Hence the coefficient on

population growth is not statistically significant. You should use a one-sided alternative hypothesis test since economic theory gives you information about the expected sign on these variables. In the above case, the decision will not be influenced by the choice of a one-sided or two-sided test, since the (absolute value of the) critical value is 1.64 or 1.96 at the 5% significance level. If there is a strong prior on the sign of the coefficient, then the variable should not be eliminated based on the significance test. Instead it should be left in the equation, but the low p-value should be flagged to the reader, and the reader should decide herself how convincing the evidence is in favor of the theory.

Chapter 6 1. Females, it is said, make 70 cents to the dollar in the United States. To investigate

this phenomenon, you collect data on weekly earnings from 1,744 individuals, 850 females and 894 males. Next, you calculate their average weekly earnings and find that the females in your sample earned $346.98, while the males made $517.70.

(a) Calculate the female earnings in percent of the male earnings. How would you

test whether or not this difference is statistically significant? Give two approaches.

Answer: Female earnings are at 67 percent of male earnings. The difference in means test described in section 3.4 of the text. The t-statistic for comparison of two means is given in equation (3.20), which is one way to test for statistical significance. The alternative is to run a regression of earnings on a constant and a binary variable, which takes on the value of one for females and is zero otherwise. Using a t-test on the slope of the binary variable amounts to the same test as the difference in means (section 4.7 in the text).

(b) A peer suggests that this is consistent with the idea that there is discrimination against

females in the labor market. What is your response?

Answer: Differences in attributes of the individuals, such as education, ability, and tenure with an employer, have not been taken into account. Hence, in itself, this is weak evidence, at best, for discrimination.

(c) You recall from your textbook that additional years of experience are supposed to

result in higher earnings. You reason that this is because experience is related to “on the job training.” One frequently used measure for experience is “Age-Education-6.”

Page 29: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Explain the underlying rationale. Assuming, heroically, that education is constant across the 1,744 individuals, you consider regressing earnings on age and a binary variable for gender. You estimate two specifications initially:

Earn = 323.70 + 5.15×Age – 169.78×Female, 2R =0.13, SER=274.75 (21.18) (0.55) (13.06) ( )Ln Earn = 5.44 + 0.015×Age – 0.421×Female, 2R =0.17, SER=0.75 (0.08) (0.002) (0.036)

where Earn are weekly earnings in dollars, Age is measured in years, and Female

is a binary variable, which takes on the value of one if the individual is a female and is zero otherwise. Interpret each regression carefully. For a given age, how much less do females earn on average? Should you choose the second specification on grounds of the higher regression 2R ?

Answer: The Mincer experience variable is a reasonable proxy for “on the job

training” if the individual started to work after completing her or his education, and stayed employed thereafter. Hence this is a better proxy for some than for others.

The linear specification suggests that for every additional year the individual receives $5.15 of additional weekly earnings on average. Females make $167.78 less than males at a given age. There is no data close to the origin, so the intercept should not be interpreted. The regression explains 13 percent of the variation in earnings. The log-linear specification says that earnings increase by 1.5 percent for every additional year in an individual’s life. Females earn approximately 42.1 percent less than males at a given age. Again, the intercept should not be interpreted. The regression explains 17 percent of the variation in the log of earnings. You should not prefer this specification over the linear one on grounds of the higher regression 2R since these cannot be compared as a result of the difference in the units of measurement of the dependent variable.

(d) Your peer points out to you that age-earning profiles typically take on an inverted U-

shape. To test this idea, you add the square of age to your log-linear regression. ( )Ln Earn = 3.04 + 0.147×Age – 0.421×Female – 0.0016 2Age , (0.18) (0.009) (0.033) (0.0001)

2R =0.28, SER=0.68

Page 30: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Interpret the results again. Are there strong reasons to assume that this specification is superior to the previous one? Why is the increase of the Age coefficient so large relative to its value in (c)?

Answer: The coefficient on the added variable is statistically significant and has

resulted in a substantial increase in the regression 2R . The increase in the Age coefficient is due to the fact that earnings increase more initially than later in life or, mathematically speaking, it compensates for the negative coefficient on 2Age , which lowers earnings as individuals become older.

(e) What other factors may play a role in earnings determination?

Answer: Students’ answers will differ, but education, ability, regional differences, race, and professional choice are often mentioned.

2) An extension of the Solow growth model that includes human capital in addition to physical capital, suggests that investment in human capital (education) will increase the wealth of a nation (per capita income). To test this hypothesis, you collect data for 104 countries and perform the following regression:

RelPersInc = 0.046 – 5.869×gpop + 0.738×sK + 0.055×Educ, R2=0.775, SER = 0.1377

(0.079) (2.238) (0.294) (0.010)

where RelPersInc is GDP per worker relative to the United States, gpop is the average population growth rate, 1980 to1990, sK is the average investment share of GDP from 1960 to1990, and Educ is the average educational attainment in years for 1985. Numbers in parentheses are for heteroskedasticity-robust standard errors.

(a) Interpret the results and indicate whether or not the coefficients are

significantly different from zero. Do the coefficients have the expected sign?

Answer: A one percentage point decrease in the population growth rate increases GDP per worker relative to the United States by roughly 0.06. An increase in the investment share of 0.1 results in an increase of GDP per worker relative to the United States by approximately 0.07. For every additional year of average educational attainment, the increase is 0.055. The intercept should not be interpreted. The regression explains 77.5 percent of the variation in relative productivity. All coefficients are significantly different from zero at conventional levels. All coefficients carry the expected sign.

(b) To test for equality of the coefficients between the OECD and other countries,

you introduce a binary variable (DOECD), which takes on the value of one for

Page 31: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

the OECD countries and is zero otherwise. To conduct the test for equality of the coefficients, you estimate the following regression:

RelPersInc = -0.068 – 0.063×gpop + 0.719×sK + 0.044×Educ,

(0.072) (2.271) (0.365) (0.012)

0.381×DOECD – 8.038×(DOECD×gpop)- 0.430×(DOECD×sK) (0.184) (5.366) (0.768)

+0.003×(DOECD×Educ), R2=0.845, SER = 0.116 (0.018)

Write down the two regression functions, one for the OECD countries, the other for the non-OECD countries. The F- statistic that all coefficients involving DOECD are zero, is 6.76. Find the corresponding critical value from the F table and decide whether or not the coefficients are equal across the two sets of countries.

Answer: The regression for the non-OECD countries is

RelPersInc = -0.068 – 0.063×gpop + 0.719×sK + 0.044×Educ.

For the OECD countries we get RelPersInc = 0.313 – 8.101×gpop + 0.289×sK + 0.047×Educ.

The critical value is 3.32 at the 1% level and hence you can reject the null hypothesis that the coefficients are equal.

(c) Given your answer in the previous question, you want to investigate further.

You first force the same slopes across all countries, but allow the intercept to differ. That is, you reestimate the above regression but set

0kDOECD gpop DOECD s DOECD Educβ β β× × ×= = = . The t-statistic for DOECD is 4.39. Is

the coefficient, which was 0.241, statistically significant?

Answer: Given the critical value, the coefficient is statistically significant, that is, you can reject 0DOECDβ = .

(d) Your final regression allows the slopes to differ in addition to the intercept. The F-statistic for 0

kDOECD gpop DOECD s DOECD Educβ β β× × ×= = = is 1.05. What is your decision? Each one of the t-statistics is also smaller than the critical value from the standard normal table. Which test should you use?

Page 32: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: Given the critical value of 3.78 at the 1% level, you cannot reject the null hypothesis that the additional coefficients are all zero. The F-test is the proper procedure to use when testing for simultaneous restrictions.

(e) Looking at the tests in the two previous questions, what is your conclusion?

Answer: There is evidence that the slopes can be set equal. However, there seems to be a level difference between the two groups of countries.

3) You have been asked by your younger sister to help her with a science fair project.

During the previous years she already studied why objects float and there also was the inevitable volcano project. Having learned regression techniques recently, you suggest that she investigate the weight-height relationship of 4th to 6th graders. Her presentation topic will be to explain how people at carnivals predict weight. You collect data for roughly 100 boys and girls between the ages of nine and twelve and estimate for her the following relationship:

Weight = 45.59 + 4.32×Height4, 2R = 0.55, SER = 15.69

(3.81) (0.46) where Weight is in pounds, and Height4 is inches above 4 feet.

(a) Interpret the results.

Answer: For every inch above 4 feet, children of that age group gain roughly 4

pounds. A student who is 4 feet tall, weighs approximately 45.5 pounds. The regression explains 55 percent of the weight variation in children of that age group.

(b) You remember from the medical literature that females in the adult population

are, on average, shorter than males and weigh less. You also seem to have heard that females, controlling for height, are supposed to weigh less than males. To see if this relationship holds for children, you add a binary variable (DFY) that takes on the value one for girls and is zero otherwise. You estimate the following regression function:

Weight = 36.27 + 17.33×DFY + 5.32×Height4 – 1.83×(DFY×Height4),

(5.99) (7.36) (0.80) (0.90) 2R = 0.58, SER = 15.41 Are the signs on the new coefficients as expected? Are the new coefficients

individually statistically significant? Write down and sketch the regression function for boys and girls separately.

Page 33: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: Shorter girls weigh more than boys, and taller boys weigh more than

girls on average. Given your prior expectations, this is somewhat unexpected. The coefficients involving the binary variable are statistically significant at conventional levels. The regressions for boys is

Weight = 36.27 + 5.32×Height4. For girls it is Weight = 53.60 + 3.49×Height4.

Predicted Height/Weight Relationship for Children, Age 9-12

0

20

40

60

80

100

120

140

160

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Height above 4 Inches

Wei

ght i

n Po

unds

Boys Girls

(c) The medical literature provides you with the following information for median

height and weight of nine to twelve year olds:

Page 34: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Median Height and Weight for Children, Age 9-12

Boys’ Weight Boys’ Height Girls’ Weight Girls’ Height 9 Year Old 60 52 60 49 10 Year Old 70 54 70 52 11 Year Old 77 56 80 57 12 Year Old 87 58.5 92 60

Insert two height/weight measures each for boys and girls and see how accurate

your predictions are.

Answer: The “XX” points mark a female, and the “XY” a male. The regression line predicts a 9 year old boy to weigh 57.2 pounds, an 11 year old boy to weigh 78.8 pounds, a 10 year old girl to weigh 67.6 pounds, and a 12 year old girl to weigh 95.5 pounds. Hence the weights are quite close.

Predicted Height/Weight Relationship for Children, Age 9-12

0

20

40

60

80

100

120

140

160

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Height above 4 Inches

Wei

ght i

n Po

unds

Boys Girls

XY

XXYXX

(d) The F-statistic for testing that the intercept and slope for boys and girls are

identical is 2.92. Find the critical values at the 5% and 1% level, and make a decision. Allowing for a different intercept with an identical slope results in a t-statistic for DFY of (–0.35). Having identical intercepts but different slopes gives a t-statistic on (DFY×Height4) of

(–0.35) also. Does this affect your previous conclusion?

Page 35: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: The critical value is 3.00 at the 5% level, and 4.61 at the 1% level. Hence you cannot reject equality of the two coefficients. The previous conclusion is unaffected since the test was for both hypotheses to hold simultaneously. The t-statistics indicate that imposing the equality and testing for either the slope or the intercept to be significantly different between boys and girls, does not result in a different coefficient either.

(e) Assume that you also wanted to test if the relationship changes by age. Briefly

outline how you would specify the regression including the gender binary variable and an age binary variable (Older) that takes on a value of one for eleven to twelve year olds and is zero otherwise. Indicate in a table of two rows and two columns how the estimated relationship would vary between younger girls, older girls, younger boys, and older boys.

Answer: 0 1 2 34 ( 4)Weight DFY Height DFY Heightβ β β β= + + + × + 4 5 ( 4)Older Older Height uβ β+ × +

Younger Older Boys

0 2 4Heightβ β+ 0 4 2 5( ) ( ) 4Heightβ β β β+ + + Girls

0 1 2 3( ) ( ) 4Heightβ β β β+ + + 0 1 4 2 3 5( ) ( ) 4Heightβ β β β β β+ + + + +

4) You have learned that earnings functions are one of the most investigated relationships in economics. These typically relate the logarithm of earnings to a series of explanatory variables such as education, work experience, gender, race, etc.

(a) Why do you think that researchers have preferred a log-linear specification over a

linear specification? In addition to the interpretation of the slope coefficients, also think about the distribution of the error term.

Answer: The error variance and the variance of the dependent variable are related.

Given that the dependent variable (earnings) is not normally distributed, it is difficult to postulate that the error variance is normally distributed. Using logarithms results in a distribution that is closer to a normal. In addition, there seems to be a better fit for the log-linear specification, and the coefficients can be interpreted as percentage changes.

(b) To establish age-earnings profiles, you regress ln(Earn) on Age, where Earn is

weekly earnings in dollars, and Age is in years. Plotting the residuals of the regression against age for 1,744 individuals looks as shown in the figure:

Page 36: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

-6

-4

-2

0

2

0 20 40 60 80 100

AGE

RES

Do you sense a problem?

Answer: There seems to be a pattern in the residuals when sorted by age. This suggests a misspecified functional form.

(c) You decide, given your knowledge of age-earning profiles, to allow the regression

line to differ for the below and above 40 years age category. Accordingly you create a binary variable, Dage, that takes the value one for age 39 and below, and is zero otherwise. Estimating the earnings equation results in the following output (using heteroskedasticity-robust standard errors):

LnEarn = 6.92 – 3.13×Dage - 0.019×Age + 0.085×(Dage×Age), R2=0.20, SER

=0.721. (38.33) (0.22) (0.004) (0.005)

Sketch both regression lines: one for the age category 39 years and under, and one for 40 and above. Does it make sense to have a negative sign on the Age coefficient? Predict the ln(earnings) for a 30 year old and a 50 year old. What is the percentage difference between these two?

Answer: According to the specification, earnings increase with age until the

individual is 39 years old. It is only from age 40 onwards that the regression predicts a negative relationship between earnings and age. According to the estimates, a 30 year old would have ln(earnings) of 5.77, while the predicted value for a 50 year old would be 5.97. The difference between the two is approximately 20 percent.

Page 37: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Predicted (log) Earnings

55.25.45.65.8

66.26.46.6

20 30 40 50 60 70 80

Age

ln(E

arn)

LnEarn<40 LnEarn>40

(d) The F-statistic for the hypothesis that both slopes and intercepts are the same is 124.43. Can you reject the null hypothesis?

Answer: The critical value from the F-table is 4.61 at the 1% level. Hence the

null hypothesis is rejected.

(e) What other functional forms should you consider?

Answer: Instead of the inverted V-shape for the above regression, an inverted U-shape would most likely produce a better fit. This can be generated through the use of a polynomial regression model of degree 2.

6) Sports economics typically looks at winning percentages of sports teams as one of

various outputs, and estimates production functions by analyzing the relationship between the winning percentage and inputs. In Major League Baseball (MLB), the determinants of winning are quality pitching and batting. The table shows all 30 MLB teams for the 1999 season. Pitching quality is approximated by “Team Earned Run Average” (ERA), and hitting quality by “On Base Plus Slugging Percentage” (OPS).

Page 38: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Summary of the Distribution of Winning Percentage, On Base Plus Slugging Percentage, and Team Earned Run Average for MLB in 1999

Average Standard

deviation Percentile

10% 25% 40% 50% (median)

60% 75% 90%

Team ERA

4.71 0.53 3.84 4.35 4.72 4.78 4.91 5.06 5.25

OPS

0.778 0.034 0.720 0.754 0.769 0.780 0.790 0.798 0.820

Winning Percentage

0.50 0.08 0.40 0.43 0.46 0.48 0.49 0.59 0.60

Your regression output is:

Winpct = –0.19 – 0.099×teamera + 1.490×ops , R2=0.92, SER = 0.02. (0.08) (0.008) (0.126)

(a) Interpret the regression. Are the results statistically significant and important?

Answer: Lowering the team ERA by one results in a winning percentage increase of roughly ten percent. Increasing the OPS by 0.1 generates a higher winning percentage of approximately 15 percent. The regression explains 92 percent of the variation in winning percentages. Both slope coefficients are statistically significant, and given the small differences in winning percentage, they are also important.

(b) There are two leagues in MLB, the American League (AL) and the National League

(NL). One major difference is that the pitcher in the AL does not have to bat. Instead there is a “designated hitter” in the hitting line-up. You are concerned that, as a result, there is a different effect of pitching and hitting in the AL from the NL. To test this hypothesis, you allow the AL regression to have a different intercept and different slopes from the NL regression. You therefore create a binary variable for the American League (DAL) and estimate the following specification:

Winpct = – 0.29 + 0.10×DAL – 0.100×teamera + 0.008×(DAL×teamera)

(0.12) (0.24) (0.008) (0.018) + 1.622*ops – 0.187 *(DAL×ops) , R2=0.92, SER = 0.02.

(0.163) (0.160)

Page 39: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

What is the regression for winning percentage in the AL and NL? Next, calculate the t-statistics and say something about the statistical significance of the AL variables. Since you have allowed all slopes and the intercept to vary between the two leagues, what would the results imply if all coefficients involving DAL were statistically significant?

Answer: NL: Winpct = – 0.29 – 0.100×teamera + 1.622×ops.

AL : Winpct = – 0.19 – 0.092×teamera + 1.435×ops. The t-statistics for all variables involving DAL are, in order of appearance in the above regression, 0.42, 0.44, and –1.17. None of the coefficients is statistically significant individually. If these were statistically significant, then this would indicate that the coefficients vary between the two leagues. Hence it would suggest that the introduction of the designated hitter might have changed the relationship.

(c) You remember that sequentially testing the significance of slope coefficients is not

the same as testing for their significance simultaneously. Hence you ask your regression package to calculate the F-statistic that all three coefficients involving the binary variable for the AL are zero. Your regression package gives a value of 0.35. Looking at the critical value from you F-table, can you reject the null hypothesis at the 1% level? Should you worry about the small sample size?

Answer: The critical value of the F-statistic is 3.78 at the 1% level, and hence

you cannot reject the null hypothesis, that all three coefficients are zero. However, the F-statistic is not really distributed as 3,F ∞ , and, as a result, inference is problematic here.

7) There has been much debate about the impact of minimum wages on employment and unemployment. While most of the focus has been on the employment-to-population ratio of teenagers, you decide to check if aggregate state unemployment rates have been affected. Your idea is to see if state unemployment rates for the 48 contiguous U.S. states in 1985 can predict the unemployment rate for the same states in 1995, and if this prediction can be improved upon by entering a binary variable for “high impact” minimum wage states. One labor economist labeled states as high impact if a large fraction of teenagers was affected by the 1990 and 1991 federal minimum wage increases. Your first regression results in the following output:

95

iUr = 3.19 + 0.27× 85iUr , 2R = 0.21, SER=1.031

(0.56) (0.07) (a) Sketch the regression line and add a 450 line to the graph. Interpret the regression

Page 40: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

results. What would the interpretation be if the fitted line coincided with the 450 line?

Answer: An increase in the 1985 unemployment rate results in an increase in the unemployment rate in 1995 of 0.27 percent. Put differently, if one state had a one percent higher unemployment rate in 1985 than another state, then this difference would shrink, on average, to 0.27 percent in 1995. 21 percent of the variation in 1995 state unemployment rates is explained by the regression. If the fitted line coincided with the 450 line, then the unemployment rates in 1995 would remain unchanged when compared to 1985. The estimated regression implies, unrealistically, mean reversion in the unemployment rates.

State Unemployment Rates, 1995 (predicted) and 1985

1.5

3.5

5.5

7.5

9.5

11.5

1.5

2.2

2.9

3.6

4.3 5

5.7

6.4

7.1

7.8

8.5

9.2

9.9

10.6

11.3 12

Unemployment Rate 1985

Pred

icte

d 19

95

Une

mpl

oym

ent R

ate

Predicted Unemployment Rate 45 Degree Line

(b) Adding the binary variable DhiImpact by allowing the slope and intercept to differ,

results in the following fitted line:

95iUr = 4.02 + 0.16× 85

iUr – 3.25 ×DhiImpact + 0.38×(DhiImpact× 85iUr ),

(0.66) (0.09) (0.89) (0.11)

2R = 0.31, SER=0.987 The F-statistic for the null hypothesis that both parameters involving the high impact minimum wage variable are zero, is 42.16. Can you reject the null

Page 41: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

hypothesis that both coefficients are zero? Sketch the two regression lines together with the 450 line and interpret the results again. Answer: The critical value for the F-statistic is 4.61 at the 1% level and hence the

null hypothesis that both coefficients are zero in the population is rejected. (The sample size is small, however, so the distribution of the test statistic is not really known.) The intercept for the high-impact states is smaller and the slope is steeper. This suggests that for high-impact states there is less of a mean reversion effect present: if high-impact states had high 1985 unemployment rates, then they are expected to have higher unemployment rates in 1995 when compared to a low-impact state. High and low unemployment rates are thereby more persistent for high-impact states.

U.S. State Unemployment Rates

1.5

3.5

5.5

7.5

9.5

11.5

1.5 3.5 5.5 7.5 9.5 11.5

State Unemployment Rate 1985

Pred

icte

d 19

95

Une

mpl

oym

ent R

ate

Low Impact High Impact 45 Degree Line

(c) To check the robustness of these results, you repeat the exercise using a new binary variable for the so-called mining state (Dmining), i.e. the eleven states that have at least three percent of their total state earnings derived from oil, gas extraction, and coal mining, in the 1980s. This results in the following output:

95

iUr = 4.04 + 0.15× 85iUr – 2.92 ×Dmining + 0.37×(Dmining× 85

iUr ), (0.65) (0.09) (0.90) (0.10)

2R = 0.31, SER=0.997

Page 42: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

How confident are you that the previously found effect is due to minimum wages?

Answer: The results here are similar to those in (b) in that the regression for the mining states is steeper than the one for the other states. Perhaps omitted variables play a role here, such as relative (oil) price shocks that affect some states more than others. Oil prices fell considerably over the time period and it is possible that the high-impact binary variable coefficient picks up the effect of omitted variables. Including more explanatory variables would be desirable.

8) Labor economists have extensively researched the determinants of earnings.

Investment in human capital, measured in years of education, and on the job training are some of the most important explanatory variables in this research. You decide to apply earnings functions to the field of sports economics by finding the determinants for baseball pitcher salaries. You collect data on 455 pitchers for the 1998 baseball season and estimate the following equation using OLS and heteroskedasticity-robust standard errors:

( )iLn Earn = 12.45 + 0.052×Years + 0.00089×Innings + 0.0032×Saves

(0.08) (0.026) (0.00020) (0.0018)

– 0.0085×ERA, 2R =0.45, SER=0.874 (0.0168)

where Earn is annual salary in dollars, Years is number of years in the major leagues, Innings is number of innings pitched during the career before the 1998 season, Saves is number of saves during the career before the 1998 season, and ERA is the earned run average before the 1998 season.

(a) What happens to earnings when the pitcher stays in the league for one additional year? Compare the salaries of two relievers, one with 10 more saves than the other. What effect does pitching 100 more innings have on the salary of the pitcher? What effect does reducing his ERA by 1.5 have? Do the signs correspond to your expectations? Explain.

Answer: For staying an additional year in the league, the pitcher receives a 5.2

percent increase in earnings. On average, the reliever with 10 more saves ends up with 3.2 percent higher earnings. Pitching100 additional innings results in 8.9 percent higher earnings, and lowering the ERA by 1.5 increases earnings by 1.3 percent. ERA, innings pitched, and number of saves are all quality of input indicators and should therefore have the signs as in the regression above. Years in the major leagues stands as a proxy for on the job training and should therefore carry a positive sign.

Page 43: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

(b) Are the individual coefficients statistically significant? Indicate the level of

significance you used and the type of alternative hypothesis you considered.

Answer: Given that there is prior expectation on the sign of the coefficients, you should conduct a one-sided hypothesis test. All variables with the exception of ERA carry statistically significant coefficients at the 5% level.

(c) Although you are quite impressed with the fit of the regression, someone suggests

that you should include the square of years and innings as additional explanatory variables. Your results change as follows:

( )iLn Earn = 12.15 + 0.160×Years + 0.00268×Innings + 0.0063×Saves

(0.05) (0.039) (0.00030) (0.0010)

1.0.0584×ERA – 0.0165×Years2 - 0.00000045× Innings2 (0.0165) (0.0026) (0.00000012)

2R =0.69, SER=0.666 What is her reasoning? Are the coefficients of the quadratic terms statistically

significant? Are they meaningful?

Answer: Allowing for the quadratic terms to enter results in an inverted U-shape for the relationship between the log of earnings, and both years in the league and innings pitched. Both coefficients are highly significant and have resulted also in a significant ERA coefficient.

(d) Calculate the effect of moving from two to three years, as opposed to from 12 to

13 years.

Answer: Having played for two years and staying for one more year in the league results in an earnings increase of 7.8 percent, while staying for an additional year after 12 years in the majors results in a predicted decrease of 25.3 percent.

(e) You also decide to test the specification for stability across leagues (National

League and American League) by including a dummy variable for the National League and allowing the intercept and all slopes to differ. The resulting F-statistic for restricting all coefficients that involve the National League dummy variable to zero, is 0.40. Compare this to the relevant critical value from the table and decide whether or not these additional variables should be included.

Page 44: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

Answer: 7, 2.01F ∞ = at the 5% level. Hence you cannot reject the null hypothesis of equality of coefficients across leagues.

9) One of the most frequently estimated equations in the macroeconomics growth

literature is so-called convergence regressions. In essence the average per capita income growth rate is regressed on the beginning-of-period per capita income level to see if countries that were further behind initially, grew faster. Some macroeconomic models make this prediction, once other variables are controlled for. To investigate this matter, you collect data from 104 countries for the sample period 1960-1990 and estimate the following relationship (numbers in parentheses are for heteroskedasticity-robust standard errors):

6090g = 0.020 - 0.360×gpop + 0.004×Educ – 0.053×RelProd60, R2=0.332, SER

= 0.013 (0.009) (0.241) (0.001) (0.009) where g6090 is the growth rate of GDP per worker for the 1960-1990 sample

period, RelProd60 is the initial starting level of GDP per worker relative to the United States in 1960, gpop is the average population growth rate of the country, and Educ is educational attainment in years for 1985.

(a) What is the effect of an increase of 5 years in educational attainment? What

would happen if a country could implement policies to cut population growth by one percent? Are all coefficients significant at the 5% level? If one of the coefficients is not significant, should you automatically eliminate its variable from the list of explanatory variables?

Answer: Increasing educational attainment by 5 years results in an increase of

productivity growth of 2 percent. Decreasing the population growth rate by one percent increases productivity growth by 0.4 percent. All coefficients are statistically significant at the 5% level with the exception of population growth. You should not eliminate a variable simply because it is not statistically significant. It is better to report the statistics and let the reader decide.

(b) The coefficient on the initial condition has to be significantly negative to suggest

conditional convergence. Furthermore, the larger this coefficient, in absolute terms, the faster the convergence will take place. It has been suggested to you to interact education with the initial condition to test for additional effects of education on growth. To test for this possibility, you estimate the following regression:

6090g = 0.015 -0.323×gpop + 0.005×Educ –0.051×RelProd60

(0.009) (0.238) (0.001) (0.013)

Page 45: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

–0.0028×(Educ×RelProd60), R2=0.346, SER = 0.013 (0.0015)

Write down the effect of an additional year of education on growth. West Germany has a value for RelProd60 of 0.57, while Brazil’s value is 0.23. What is the predicted growth rate effect of adding one year of education in both countries? Does this predicted growth rate make sense?

Answer: 606090 0.005 0.0028Re Prg l od

Educ∆ = −∆

. For West Germany, the effect is

0.3 percent, while for Brazil it is 0.4 percent. These are small gains, but they accumulate over time.

(c) What is the implication for the speed of convergence? Is the interaction effect

statistically significant?

Answer: 60

6090 0.051 0.0028Re Pr

g Educl od

∆ = − −∆

, which therefore depends on

educational attainment. Countries with higher educational attainment will converge faster. The coefficient has a t-statistic of 1.87 and is therefore statistically significant at the 5% level using a one-sided hypothesis test.

(d) Convergence regressions are basically of the type

0 1 0ln lntY Y∆ β β= − where ∆ might be the change over a longer time period, 30 years, say, and the

average growth rate is used on the left-hand side. You note that the equation can be rewritten as

0 1 0ln (1 ) lntY Yβ β= − −

Over a century ago, Sir Francis Galton first coined the term “regression” by

analyzing the relationship between the height of children and the height of their parents. Estimating a function of the type above, he found a positive intercept and a slope between zero and one. He therefore concluded that heights would revert to the mean. Since ultimately this would imply the height of the population being the same, his result has become known as “Galton’s Fallacy.” Your estimate of 1β above is approximately 0.05. Do you see a parallel to Galton’s Fallacy?

Answer: The above regressions generate a mean reversion outcome. Interpreted

literally, the implication is that all countries end up with the same productivity or per capita income, just as all persons would be of the

Page 46: Practice Questions Multiple Choice Questions Chapter 5faculty.econ.ucdavis.edu/faculty/jorda/class/140/S03/midterm2... · Practice Questions Multiple Choice Questions Chapter 5 1)

same height. It can be shown that Galton’s Fallacy is the result of errors-in-variables which biases the slope coefficient downward. This topic is covered in Chapter 7. The solution is to use instrumental variable techniques, also discussed in Chapter 10. The literature in this area has done so, and the convergence result persists.