

BEE2006

UNIVERSITY OF EXETER

BUSINESS SCHOOL

May/June 2012

STATISTICS AND ECONOMETRICS

Module Convenors: Dr. Paulo M.D.C. Parente

Dr. Ana Fernandes

Duration: TWO HOURS

Answer ONLY ONE question from SECTION A, ONLY ONE question from SECTION B and BOTH questions from SECTION C.

Use a separate answer booklet for each section.

Materials to be supplied: Statistical Tables

Instructions (please read before starting): Write in a clear legible manner in ink/ballpoint. Do not use pencils or erasable pens. Approved calculators are permitted. Only one sheet (2 sides A4) of notes made exclusively by the student may be consulted (no material distributed by the teacher in any form is allowed). Whenever conducting a test use a 5% significance level unless stated otherwise. Also be sure to state null and alternative hypotheses, null distribution (with degrees of freedom), rejection criterion (critical values and rejection region) and outcome. If you are asked to derive something, give all intermediate steps also. Do not answer questions with a “yes” or “no” only, but carefully justify your answer.


Section A - Answer only one question

Question 1

Consider the following model to explain child birth weight in terms of various factors:

$$\mathit{bwght} = \beta_0 + \beta_1 \mathit{cigs} + \beta_2 \mathit{parity} + \beta_3 \mathit{faminc} + \beta_4 \mathit{motheduc} + \beta_5 \mathit{fatheduc} + u,$$

where $u \sim N(0, \sigma^2)$ and the variables in the model are:

bwght = birth weight in pounds;

cigs = average number of cigarettes the mother smoked per day during pregnancy;

parity = the birth order of the child;

faminc = annual family income;

motheduc = years of schooling for the mother;

fatheduc = years of schooling for the father.

(a) (6 Marks) Does this regression model necessarily imply a causal relationship between child’s birth weight and the regressors cigs, parity, faminc, motheduc and fatheduc? Justify your answer.

(b) (5 Marks) Interpret $\beta_3$.

(c) (6 Marks) Using data from the US 1988 National Health Interview Survey the following results were obtained

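$$
\begin{aligned}
\widehat{\mathit{bwght}} ={}& \underset{(3.7285)}{114.524} - \underset{(0.1104)}{0.5959}\,\mathit{cigs} + \underset{(0.6594)}{1.7876}\,\mathit{parity} + \underset{(0.0366)}{0.0560}\,\mathit{faminc} \\
& - \underset{(0.3199)}{0.3705}\,\mathit{motheduc} + \underset{(0.2826)}{0.4724}\,\mathit{fatheduc},
\end{aligned}
\tag{1}
$$

$$n = 1191, \quad TSS = 482722.355, \quad SSR = 464040.052,$$

where TSS is the Total Sum of Squares, SSR is the Sum of Squared Residuals, and the standard errors of the estimated coefficients are reported in brackets. Test the individual significance of motheduc and fatheduc at the 10% level.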

(d) (6 Marks) Test the significance of the overall regression.

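(e) (6 Marks) Let $\hat{u}$ denote the residual from regressing bwght on cigs, parity and faminc, and consider the following regression:

$$
\begin{aligned}
\hat{u} ={}& \underset{(3.7285)}{-0.9456} - \underset{(0.1104)}{0.0019}\,\mathit{cigs} - \underset{(0.6594)}{0.0447}\,\mathit{parity} - \underset{(0.0366)}{0.011}\,\mathit{faminc} \\
& - \underset{(0.3199)}{0.3705}\,\mathit{motheduc} + \underset{(0.2826)}{0.4724}\,\mathit{fatheduc},
\end{aligned}
$$

$$R^2 = 0.0024.$$

Are motheduc and fatheduc jointly significant?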


(f) (6 Marks) The $R^2$ of the regression of the squared residuals of (1) on cigs, parity, faminc, motheduc and fatheduc and their respective squares is 0.0029. Test for heteroskedasticity.
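As a rough illustration of the arithmetic behind parts (c) to (f), the sketch below recomputes the test statistics from the figures printed in the question. It assumes the LM form (n times the auxiliary R-squared) for parts (e) and (f), which is one common approach and not necessarily the only one accepted. The variable names are illustrative, and critical values still have to be read from the supplied statistical tables.

```python
# Illustrative arithmetic only: the inputs are the figures printed in Question 1.
n, k = 1191, 5                      # sample size, number of slope coefficients
tss, ssr = 482722.355, 464040.052   # total and residual sums of squares

# (c) t-ratios: coefficient divided by its standard error
estimates = {"motheduc": (-0.3705, 0.3199), "fatheduc": (0.4724, 0.2826)}
t_ratios = {name: b / se for name, (b, se) in estimates.items()}

# (d) overall F statistic, F(k, n - k - 1) under the null of no joint effect
r2 = 1 - ssr / tss
f_overall = (r2 / k) / ((1 - r2) / (n - k - 1))

# (e) LM statistic n * R^2 from the auxiliary regression of the restricted
#     residuals on all regressors; ASSUMPTION: chi-square with 2 df (two exclusions)
lm_joint = n * 0.0024

# (f) n * R^2 from regressing squared residuals on the 5 regressors and their
#     5 squares; ASSUMPTION: chi-square with 10 df
lm_het = n * 0.0029

for name, t in t_ratios.items():
    print(f"t({name}) = {t:.4f}")
print(f"R^2 = {r2:.4f}, overall F = {f_overall:.4f}")
print(f"LM (joint exclusion) = {lm_joint:.4f}, LM (heteroskedasticity) = {lm_het:.4f}")
```

Each statistic is then compared with the appropriate tabulated critical value: t for (c), F for (d), and chi-square for (e) and (f).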

Question 2

We are interested in investigating how the price of a house depends on the characteristics of the house in Boston, US. We consider the model

$$\log(\mathit{price}) = \beta_0 + \beta_1 \mathit{sqrft} + \beta_2 \mathit{bdrms} + u,$$

where $u \sim N(0, \sigma^2)$ and the variables in the model are:

price = house price, in thousands of dollars;

sqrft = size of house in square feet;

bdrms = number of bedrooms.

(a) (5 Marks) Interpret $\beta_2$.

(b) (6 Marks) Using data collected from the Boston Globe during 1990 the following results were obtained

$$\widehat{\log(\mathit{price})} = \underset{(0.09704)}{4.76603} + \underset{(0.000040)}{0.00038}\,\mathit{sqrft} + \underset{(0.02964)}{0.02888}\,\mathit{bdrms},$$

$$n = 88, \quad R^2 = 0.5883.$$

(Standard errors of estimated coefficients are reported in brackets.) Test whether the size of the house in square feet has a significant positive effect on log(price).

(c) (6 Marks) Test the overall significance of the regression.

(d) (6 Marks) We are interested in estimating and obtaining a confidence interval for the percentage change in price when a 150-square-foot bedroom is added to a house. In decimal form, this is $\theta_1 = 150\beta_1 + \beta_2$. Estimate and construct a 95% confidence interval for $\theta_1$ given that the estimated covariance between the OLS estimators for $\beta_1$ and $\beta_2$ is −0.000000681.

(e) (6 Marks) We now include the square of bdrms in the regression model:

$$\widehat{\log(\mathit{price})} = \underset{(0.27108)}{5.07139} + \underset{(0.000040)}{0.00038}\,\mathit{sqrft} - \underset{(0.13573)}{0.13086}\,\mathit{bdrms} + \underset{(0.01657)}{0.01999}\,\mathit{bdrms}^2, \tag{2}$$

$$n = 88, \quad R^2 = 0.5883, \quad SSR = 3.2434.$$

Test whether the number of bedrooms affects the price of the house, taking into account that the $R^2$ of the restricted model is 0.568.


(f) (6 Marks) Now we are interested in studying whether the regression model differs between colonial houses and non-colonial houses. The regression for non-colonial houses yields

$$\widehat{\log(\mathit{price})} = \underset{(0.63578)}{6.12642} + \underset{(0.00008)}{0.00033}\,\mathit{sqrft} - \underset{(0.37576)}{0.76368}\,\mathit{bdrms} + \underset{(0.05902)}{0.11269}\,\mathit{bdrms}^2,$$

$$n = 27, \quad R^2 = 0.6366, \quad SSR = 0.94035.$$

Running a regression for colonial houses we obtain

$$\widehat{\log(\mathit{price})} = \underset{(0.39637)}{4.7786} + \underset{(0.00005)}{0.0004}\,\mathit{sqrft} + \underset{(0.18493)}{0.01041}\,\mathit{bdrms} + \underset{(0.02126)}{0.00229}\,\mathit{bdrms}^2,$$

$$n = 61, \quad R^2 = 0.6090, \quad SSR = 2.021.$$

Test whether the regression function is identical for colonial and non-colonial houses.
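The calculations behind parts (d) and (f) can be sketched as follows, using only the numbers printed in the question. The t critical value is an assumed approximate table value for 85 degrees of freedom, and the Chow statistic takes the SSR of regression (2) as the pooled (restricted) sum of squared residuals. All names are illustrative.

```python
import math

# (d) theta1 = 150*beta1 + beta2, so
#     Var(theta1_hat) = 150^2 Var(b1) + Var(b2) + 2*150*Cov(b1, b2)
b1, se1 = 0.00038, 0.000040
b2, se2 = 0.02888, 0.02964
cov12 = -0.000000681

theta_hat = 150 * b1 + b2
var_theta = 150**2 * se1**2 + se2**2 + 2 * 150 * cov12
se_theta = math.sqrt(var_theta)

# ASSUMPTION: approximate 97.5% quantile of t with 88 - 3 = 85 df, read from tables
T_CRIT = 1.988
ci = (theta_hat - T_CRIT * se_theta, theta_hat + T_CRIT * se_theta)

# (f) Chow test: pooled regression (2) against separate regressions by house type
n, params = 88, 4                     # params = intercept + 3 slopes per regression
ssr_pooled = 3.2434                   # SSR reported for regression (2)
ssr_unrestricted = 0.94035 + 2.021    # non-colonial SSR + colonial SSR
chow_f = ((ssr_pooled - ssr_unrestricted) / params) / (
    ssr_unrestricted / (n - 2 * params)
)

print(f"theta1_hat = {theta_hat:.5f}, se = {se_theta:.5f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"Chow F({params}, {n - 2 * params}) = {chow_f:.4f}")
```

The Chow statistic is compared with the tabulated F critical value for (4, 80) degrees of freedom.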

Section B - Answer only one question

Question 1

(a) To study the effect of women’s education (schooling) on fertility we estimate model (3) below, where the dependent variable, kids, is the number of children born to women aged between 35 and 54, and educ denotes the years of schooling. We also include as regressors age and its squared term agesq; a binary variable that takes the value of one if the individual is black and zero otherwise, black; a binary variable that takes the value of one if the individual lived in a rural area at the age of 16, othrural; and a binary variable taking the value of one if the individual lived in a small city at the age of sixteen and zero otherwise, smcity.

$$\mathit{kids} = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{age} + \beta_3 \mathit{age}^2 + \beta_4 \mathit{black} + \beta_5 \mathit{othrural} + \beta_6 \mathit{smcity} + u \tag{3}$$

One could argue that education, educ, is not an exogenous determinant of fertility. Women’s education could be correlated with unobservable characteristics that are jointly determined with fertility. We have two instrumental variable candidates for education: the individual’s father’s years of education, feduc, and the individual’s mother’s years of education, meduc. We estimate a number of models, provided below, using OLS and Two Stage Least Squares (2SLS).

Model 1: OLS, using observations 1-1129

Dependent variable: kids

| Variable | Coefficient | Std. Error | t-test | p-value |
|---|---|---|---|---|
| constant | −8.11296 | 3.06963 | −2.6430 | 0.0083 |
| educ | −0.134841 | 0.0181137 | −7.4442 | 0.0000 |
| age | 0.551360 | 0.139837 | 3.9429 | 0.0001 |
| agesq | −0.00596589 | 0.00158168 | −3.7719 | 0.0002 |
| black | 0.862121 | 0.168723 | 5.1097 | 0.0000 |
| othrural | −0.207259 | 0.158015 | −1.3116 | 0.1899 |
| smcity | 0.186718 | 0.143372 | 1.3023 | 0.1931 |

R² = 0.092255, Adjusted R² = 0.087400, F(6, 1122) = 19.00493, P-value(F) = 3.65e-21

Model 2: OLS, using observations 1-1129

Dependent variable: educ

| Variable | Coefficient | Std. Error | t-ratio | p-value |
|---|---|---|---|---|
| constant | 14.0525 | 4.36629 | 3.2184 | 0.0013 |
| age | −0.237050 | 0.199523 | −1.1881 | 0.2351 |
| agesq | 0.00267332 | 0.00225641 | 1.1848 | 0.2364 |
| black | 0.431187 | 0.242351 | 1.7792 | 0.0755 |
| othrural | −0.463964 | 0.225901 | −2.0538 | 0.0402 |
| smcity | 0.186039 | 0.204700 | 0.9088 | 0.3636 |
| meduc | 0.182272 | 0.0219009 | 8.3226 | 0.0000 |
| feduc | 0.218522 | 0.0251017 | 8.7055 | 0.0000 |

R² = 0.275637, Adjusted R² = 0.271114, F(7, 1121) = 60.93821, P-value(F) = 2.91e-74


Model 3: OLS, using observations 1-1129

Dependent variable: kids

| Variable | Coefficient | Std. Error | t-ratio | p-value |
|---|---|---|---|---|
| constant | −7.63497 | 3.15309 | −2.4214 | 0.0156 |
| educ | −0.155673 | 0.0361375 | −4.3078 | 0.0000 |
| age | 0.542543 | 0.140496 | 3.8616 | 0.0001 |
| agesq | −0.00587699 | 0.00158769 | −3.7016 | 0.0002 |
| black | 0.859666 | 0.168805 | 5.0927 | 0.0000 |
| othrural | −0.229508 | 0.161544 | −1.4207 | 0.1557 |
| smcity | 0.195752 | 0.144047 | 1.3589 | 0.1744 |
| Model 2 Residuals | 0.0278269 | 0.0417661 | 0.6663 | 0.5054 |

R² = 0.092614, Adjusted R² = 0.086948, F(7, 1121) = 16.34528, P-value(F) = 1.35e-20

Model 4: 2SLS, using observations 1-1129

Dependent variable: kids

Instrumented: educ

Instruments: constant age agesq black othrural smcity meduc feduc

| Variable | Coefficient | Std. Error | z | p-value |
|---|---|---|---|---|
| constant | −7.63497 | 3.15417 | −2.4206 | 0.0155 |
| educ | −0.155673 | 0.0361498 | −4.3063 | 0.0000 |
| age | 0.542543 | 0.140544 | 3.8603 | 0.0001 |
| agesq | −0.00587699 | 0.00158824 | −3.7003 | 0.0002 |
| black | 0.859666 | 0.168863 | 5.0909 | 0.0000 |
| othrural | −0.229508 | 0.161599 | −1.4202 | 0.1555 |
| smcity | 0.195752 | 0.144096 | 1.3585 | 0.1743 |

R² = 0.091781, Adjusted R² = 0.086924, F(6, 1122) = 12.84828, P-value(F) = 4.45e-14

Sargan over-identification test. Null hypothesis: all instruments are valid.

Test statistic for over-identification: LM = 0.0582575 with p-value = 0.809272

(i) (5 Marks) Specify the equation for educ and explain why the parameters of that equation can be estimated by OLS.

(ii) (6 Marks) Use the relevant output from above to test for instrumental variable relevance and assess whether meduc and feduc are suitable instruments for educ.

(iii) (6 Marks) What do you conclude regarding Sargan's over-identification test result (provided at the end of the output for Model 4)?

(iv) (6 Marks) Using the relevant output from above, conduct Hausman's endogeneity test. Provide the null hypothesis, the alternative hypothesis and the numerical value of the test statistic. What do you conclude regarding the endogeneity of educ?


(v) (6 Marks) Since there is no heteroskedasticity, the usual standard errors are reported in all estimated models. Taking this into consideration, and given your decision regarding Hausman's endogeneity test, which is your preferred estimate of the parameter $\beta_1$? Why?

(b) (6 Marks) Consider a simple model to estimate the effect of computer ownership on the average mark of graduating students at a large UK university:

$$\mathit{MARK} = \beta_0 + \beta_1 \mathit{PC} + u.$$

Is it reasonable to assume that PC ownership is uncorrelated with u? Explain.
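For parts (a)(iii) and (a)(iv), the reported figures can be cross-checked with a few lines of standard-library Python. This is a sketch under two assumptions: the regression-based form of Hausman's test (the t-ratio on the Model 2 residuals in Model 3), and a standard normal approximation to the t distribution, which is reasonable with 1129 observations. The variable names are illustrative.

```python
import math

# Regression-based Hausman test: t-ratio on the first-stage (Model 2) residuals
# added to the structural equation (Model 3).
coef_resid, se_resid = 0.0278269, 0.0417661
t_hausman = coef_resid / se_resid

# Two-sided p-value, 2 * (1 - Phi(|t|)), written with the complementary
# error function. ASSUMPTION: normal approximation to t(1121).
p_hausman = math.erfc(abs(t_hausman) / math.sqrt(2))

# Sargan test: two instruments for one endogenous regressor gives one
# over-identifying restriction, so LM is chi-square with 1 df under the null.
# For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
lm_sargan = 0.0582575
p_sargan = math.erfc(math.sqrt(lm_sargan / 2))

print(f"Hausman t = {t_hausman:.4f}, approximate p-value = {p_hausman:.4f}")
print(f"Sargan LM = {lm_sargan}, p-value = {p_sargan:.6f}")
```

Both p-values should agree closely with those printed in the Model 3 and Model 4 output.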

Question 2

(a) (5 Marks) Consider the multiple regression model:

$$y_t = \beta_0 + \beta_1 x_{t1} + \ldots + \beta_k x_{tk} + u_t.$$

Assume that the explanatory variables, $x_{tj}$, are strictly exogenous. Further, $u_t$ follows an AR(q) process:

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \ldots + \rho_q u_{t-q} + e_t.$$

Explain how you would test for serial correlation.

(b) (6 Marks) Specify and explain the meaning of the contemporaneous exogeneity assumption for explanatory variables in time series analysis.

(c) Consider the following partial adjustment model:

$$y_t^* = \gamma_0 + \gamma_1 x_t + e_t,$$

$$y_t - y_{t-1} = \lambda (y_t^* - y_{t-1}) + a_t, \quad 0 < \lambda < 1,$$

where $y_t^*$ is the desired growth in firm inventories and $y_t$ is the actual (observed) growth. $x_t$ represents the growth in firm sales. The parameter $\gamma_1$ measures the effect of $x_t$ on $y_t^*$.

(i) (6 Marks) Explain what the second equation describes and how you would interpret the parameter $\lambda$.

(ii) (6 Marks) Show that we can write:

$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 x_t + u_t.$$

In particular, provide the expressions for the $\beta$’s in terms of the $\gamma$’s and $\lambda$, and find $u_t$ in terms of $e_t$ and $a_t$.

(iii) (6 Marks) If $E(e_t \mid x_t, y_{t-1}, x_{t-1}, \ldots) = 0$ and $E(a_t \mid x_t, y_{t-1}, x_{t-1}, \ldots) = 0$, and all series are weakly dependent, how would you estimate the $\beta$’s in the model of part (ii) above? Explain.

(iv) (6 Marks) If $\beta_1 = 0.7$ and $\beta_2 = 0.2$, what are the estimates of $\gamma_1$ and $\lambda$?
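The substitution behind parts (c)(ii) and (c)(iv) can be sketched as follows. This is one way to lay out the algebra, not a model answer. Substituting the first equation into the second and moving $y_{t-1}$ to the right-hand side gives

$$y_t = \lambda\gamma_0 + (1 - \lambda)\, y_{t-1} + \lambda\gamma_1 x_t + \lambda e_t + a_t,$$

so that

$$\beta_0 = \lambda\gamma_0, \quad \beta_1 = 1 - \lambda, \quad \beta_2 = \lambda\gamma_1, \quad u_t = \lambda e_t + a_t.$$

With the values given in (iv), inverting these relations yields

$$\lambda = 1 - \beta_1 = 1 - 0.7 = 0.3, \qquad \gamma_1 = \frac{\beta_2}{\lambda} = \frac{0.2}{0.3} \approx 0.667.$$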


Section C - Answer both questions

Question 1

Are the following statements correct? (Justify your answers carefully.)

(a) (5 Marks) From asymptotic theory we learn that, under appropriate conditions, the error terms in a regression model will be approximately normally distributed if the sample size is sufficiently large.

(b) (5 Marks) In a random sample under the assumption of homoskedasticity the generalised least squares estimator and the ordinary least squares estimator are identical.

(c) (5 Marks) Suppose that we want to estimate the effect of several variables on annual saving and that we have a panel data set on individuals collected on January 20, 2000, and January 20, 2002. If we include a year dummy for 2002 and use first differencing, we can also include age in the original model.

(d) (5 Marks) We can use first differences when we have independent cross sections in two years.

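Question 2 (10 Marks)

Consider the linear regression model

$$y_i = \beta + u_i, \quad i = 1, \ldots, n,$$

$$E(u_i \mid x_i) = 0, \quad \mathrm{var}(u_i \mid x_i) = \sigma^2,$$

where the observations $\{(y_i, x_i),\ i = 1, \ldots, n\}$ are independent. Let

$$b = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2}.$$

Show that

$$\mathrm{var}(b \mid x_1, \ldots, x_n) = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2},$$

justifying all the steps of the derivation.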
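A possible outline of the derivation is given here as a sketch, with the weights $w_i$ introduced only for illustration. Conditional on $x_1, \ldots, x_n$, the estimator is a linear combination of the $y_i$ with fixed weights $w_i = x_i / \sum_{j=1}^{n} x_j^2$. Because the observations are independent, the conditional covariances between different $y_i$ vanish, and because $\beta$ is a constant, $\mathrm{var}(y_i \mid x_1, \ldots, x_n) = \mathrm{var}(u_i \mid x_i) = \sigma^2$. Hence

$$
\mathrm{var}(b \mid x_1, \ldots, x_n)
= \sum_{i=1}^{n} w_i^2 \, \mathrm{var}(y_i \mid x_1, \ldots, x_n)
= \sigma^2 \, \frac{\sum_{i=1}^{n} x_i^2}{\left(\sum_{j=1}^{n} x_j^2\right)^2}
= \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2}.
$$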

[end of paper]
