McGill University

Part A Examination in Statistics

Methodology Paper

Department of Mathematics & Statistics

Date: Thursday, August 23rd 2007 Time: 13:00–17:00

Instructions

• Answer two questions out of Section 1. Only two questions will be marked.

• Answer two questions out of Section 2. Only two questions will be marked.

• If you do not indicate which questions you wish to have marked, the questions will be marked in the order in which they appear in the answer book until the quota has been reached.

• All questions are weighted equally (20 marks each).

• Good luck!

This exam comprises the cover, questions on pages 1 to 8, and tables on pages 41 to 45.

v 5-20070427

Part A Examination August 2007 Methodology Paper

Section 1: Answer two of questions SM1 to SM3.

SM1. This dataset is courtesy of Dr Waldon Garris, University of Virginia School of Medicine. Dr Garriss collected the data in a pilot study during his work in the Dominican Republic in 1997. The subjects are persons who came to medical clinics in several villages; the variables age, gender, village name, systolic blood pressure, and diastolic blood pressure were collected.

The primary research question of interest is to determine the extent to which we can use the first three covariates to predict the systolic blood pressure.

The output for question SM1 begins on page 9.

(a) [3 marks] Test for significance of each of the three covariates individually. Refer clearly to the part(s) of the output that you are using for your tests. Is there evidence to include only one (if any) of the covariates in the model? Or should one include both covariates? Explain.

(b) [3 marks] Test for the significance of each of the three covariates in the presence of both of the others. What covariates should be included in the model for systolic blood pressure? Explain.

(c) [4 marks] State and comment on the appropriateness of the assumptions of the linear regression model that you’ve selected.

(d) [10 marks] Refer to the code and figures for Question #1, Part (d) for the following questions. We will only examine models that include all three covariates for conciseness.

i. Interpret the Box-Cox transformation plot. Do the plot and the Box-Cox procedure suggest that a transformation of the data is necessary?

ii. Assume now that one would transform the data (regardless of your answer in part (i)). What transformation would you propose for this dataset? Briefly explain why you choose your transformation. Briefly discuss the advantages and disadvantages of transforming the data the way you’ve proposed.

iii. A researcher wants to use the AIC and BIC to select a “best” regression model for the data. Can the researcher use the AIC and BIC (not shown) to choose amongst transformations? Why or why not? If not, be sure to suggest a possible adjustment to the AIC/BIC values that would make such a comparison more reasonable.

1

SM2. These data give sugar cane yields for each paddock in the Mulgrave area of North Queensland for the 1997 sugar cane season. They were obtained by David Gregory and Nick Denman for their MS305 data project at The University of Queensland in 1998. There are 3775 observations in the dataset.

Mulgrave is a region in North Queensland around the Mulgrave river and the city of Cairns. Sugar cane is the primary industry in Mulgrave, and all sugar cane from the area is processed through the Mulgrave Central Mill. The data were provided by the Bureau of Sugar Experimental Stations (BSES) on behalf of the Mulgrave Central Mill and were obtained from the OzDASL repository.

The response variable of interest is the commercial sugar content per rake produced (Sugar). The goal of the analysis is to discover predictors of sugar content and the best regression model using the following predictor variables:

• DistrictPosition: The Mulgrave area has been divided by the BSES into fifteen districts, but the statistical authors grouped them further by location into 5 groups (Central, North, South, East, and West).

• Age: Cane planted the year before may be regarded as having age zero. Cane left to grow for one year after being cut (that is, the cane is first ratoon) can be considered to have an age of one.

• HarvestMonth: The sugar cane cutting season usually begins in June and concludes in mid-November, the finishing date depending on how the season has gone with respect to rainfall and mill breakdowns. Months are labelled by their numerical equivalent (June = 6, July = 7, etc.).

The output for question SM2 begins on page 22.

(a) [6 marks] Refer only to the code and plots for part (a) for the following two parts.

i. Give an interpretation for the coefficient DistrictPositionN with respect to the mean of the response variable.

ii. Test for a significant effect of District Position Group on the sugar content yield. Comment on the model fit and validity of the model assumptions.

(b) [8 marks] Refer only to the code and plots for parts (a) and (b) for the following four parts.

i. Test for a significant effect of Age by itself.

ii. Test for significant effects of both DistrictPosition and Age, each in the presence of the other.

2

iii. Comment on the model fit and validity of the model assumptions for both models in b(i) and b(ii).

iv. Should both covariates be included in the model? Explain why or why not.

(c) [6 marks] Refer only to the code for part (c) for the following part.

i. Assume that we will include Age and DistrictPosition in the model. There are three suggested modelling choices for HarvestMonth:

• Modeling HarvestMonth as a single quantitative variable

• Modeling HarvestMonth with a linear and quadratic term

• Modeling HarvestMonth as a factor (or categorical) variable

Discuss the relative merits of each of the three models from a statistical perspective and choose what you would consider the “best” model. Be sure your discussion makes clear your reasons for selecting one model over the other two.
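The three codings of HarvestMonth are nested (the linear model sits inside the quadratic one, which sits inside the factor model when there are six distinct months), so one possible way to compare them is with sequential F-tests and an information criterion. The following is only a minimal sketch on simulated data, not the Mulgrave data; the names month and sugar and the generating coefficients are assumptions made for illustration.

```r
# Simulated stand-in data (hypothetical; NOT the Mulgrave dataset)
set.seed(42)
month <- sample(6:11, 300, replace = TRUE)              # harvest months June..November
sugar <- 2 + 2.2 * month - 0.12 * month^2 + rnorm(300)  # assumed curved trend plus noise

m_lin  <- lm(sugar ~ month)                # single quantitative variable (1 df)
m_quad <- lm(sugar ~ month + I(month^2))   # linear and quadratic terms (2 df)
m_fact <- lm(sugar ~ factor(month))        # one mean per month (5 df)

# Sequential F-tests: does each richer model reduce the residual sum of squares?
comparison <- anova(m_lin, m_quad, m_fact)
print(comparison)

# Information criteria penalise the extra parameters of the richer models
print(AIC(m_lin, m_quad, m_fact))
```

A significant step from the linear to the quadratic model, with no further gain from the factor coding, would favour the quadratic form as the more parsimonious description.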

3

SM3. The mean shift outlier model is given by:

$$y = X\beta + \phi z_n + \varepsilon$$

where $z_n$ is a given $(n \times 1)$ vector with zeroes in all positions except the $n$-th position, which contains a 1, $\phi$ is an unknown scalar parameter, and $\varepsilon$ is an $(n \times 1)$ vector of independent Normal$(0, \sigma^2)$ random variables.

(a) [2 marks] Write out the expected value for the $n$-th observation. What is the intercept for this model (i.e. what is the expected value for an observation with all covariate values equal to 0)?

(b) [4 marks] If

$$d = [z_n'(I - H)z_n]^{-1} z_n'(I - H)y$$

then show that

$$d = \frac{y_n - \hat{y}_n}{1 - h_{nn}},$$

where $h_{nn}$ is the $n$-th diagonal element of the hat matrix for the covariates in $X$.

(c) [4 marks] Show that the increase in regression sum of squares after fitting $\phi$ is given by

$$\mathrm{SSR}_2 = d^2 z_n'(I - H)z_n = e_n^2/(1 - h_{nn}).$$

(d) [4 marks] From part (c), deduce the corresponding reduction in SSE.

(e) [6 marks] What statistical test would you do to test the hypothesis $H_0: \phi = 0$? What influence diagnostic is this test statistic equivalent to?
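The identities in parts (b) to (d) can be checked numerically. The following is a minimal sketch on simulated data; the sample size, coefficients, and variable names are assumptions made for illustration and are not part of the exam. It fits the model with and without the indicator z_n and compares the fitted shift with e_n/(1 − h_nn).

```r
# Simulated data (hypothetical values chosen only for illustration)
set.seed(1)
n <- 20
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[n] <- y[n] + 5                      # shift the last observation

z_n <- c(rep(0, n - 1), 1)            # indicator for the n-th observation

fit_base  <- lm(y ~ x)                # model without the shift term
fit_shift <- lm(y ~ x + z_n)          # mean shift outlier model

e_n  <- unname(residuals(fit_base)[n])   # n-th residual from the base model
h_nn <- unname(hatvalues(fit_base)[n])   # n-th diagonal element of the hat matrix

d_closed_form <- e_n / (1 - h_nn)               # closed form from part (b)
d_fitted      <- unname(coef(fit_shift)["z_n"]) # least squares estimate of phi

# deviance() of an lm fit is its residual sum of squares
sse_drop <- deviance(fit_base) - deviance(fit_shift)

print(c(d_closed_form = d_closed_form, d_fitted = d_fitted))
print(c(sse_drop = sse_drop, formula = e_n^2 / (1 - h_nn)))
```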

4

Section 2: Answer two of questions SM4 to SM6.

SM4. Dr P. J. Solomon of the Australian National Centre in HIV Epidemiology and Clinical Research collected data on 2843 patients diagnosed with AIDS in Australia before 1 July 1991:

state: Grouped state of origin: NSW, Other, QLD or VIC

sex: Sex of patient

diag: (Julian) date of diagnosis (days)

death: (Julian) date of death or end of observation (days)

status: "A" (alive) or "D" (dead) at end of observation

T.categ: Reported transmission category (8 categories)

age: Age (years) at diagnosis.

The survival time (time) was assumed to have an exponential distribution with a log link to a linear model in the regressors.

The output for question SM4 begins on page 36.

Choose suitable models to decide if the survival time is related to

(a) [6 marks]

i. state

ii. sex

iii. transmission category

(b) [6 marks]

i. age

ii. date of diagnosis.

Does the survival time increase or decrease with age? With date of diagnosis?

(c) [8 marks] Choose a suitable model to estimate the mean survival time of a 25 year old male patient diagnosed with AIDS in NSW on July 1 2004 (diag=16253) who reported transmission by heterosexual contact (T.categhet), and the probability that such a patient would survive more than three years (365×3=1095 days). How reliable do you think this estimator is?
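Under the stated assumptions (exponential survival time, log link), the mean is the exponential of the linear predictor, and the survival probability follows from the exponential survivor function P(T > t) = exp(−t/mean). The following minimal sketch uses a made-up linear predictor value; eta is a hypothetical number and is not taken from the exam output.

```r
# Hypothetical linear predictor on the log scale (NOT from the exam output)
eta <- log(500)

mean_time <- exp(eta)                 # log link: mean survival time = exp(linear predictor)
surv_3yr  <- exp(-1095 / mean_time)   # P(T > 1095 days) for an exponential with this mean

# Same probability from R's built-in exponential distribution function
surv_check <- pexp(1095, rate = 1 / mean_time, lower.tail = FALSE)

print(c(mean_time = mean_time, surv_3yr = surv_3yr, surv_check = surv_check))
```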

5

SM5. A breast cancer database was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg. He assessed biopsies of breast tumors for 699 patients up to 15 July 1992; each of nine attributes has been scored on a scale of 1 to 10, and the outcome is also known: benign (Y=0) or malignant (Y=1). This data frame contains the following columns:

V1 Clump thickness

V2 Uniformity of cell size

V3 Uniformity of cell shape

V4 Marginal adhesion

V5 Single epithelial cell size

V6 Bare nuclei (16 values are missing)

V7 Bland chromatin

V8 Normal nucleoli

V9 Mitoses

class "benign" or "malignant"

The output for question SM5 begins on page 38.

(a) [6 marks] To relate the probability that a tumor is malignant to the first variable, clump thickness, two sets of models were fitted, the first assuming a normal family, the second assuming a binomial family. Choose suitable models to test for a linear effect of V1.

(b) [3 marks] A factor fV1 was created taking the values of V1 as levels. Use this to test if the effect of clump thickness is linear in V1 (as opposed to non-linear).

(c) [3 marks] Look at Figure 17 (a) and (b) on page 41. Why is it that the plots of the fitted values using the model with V1 (triangles) are different in Figures (a) and (b), yet the plots of the fitted values using the model with fV1 (circles) are the same in Figures (a) and (b)? Explain.

(d) [6 marks] Which attributes are related to the malignancy of breast tumors?

(e) [2 marks] Do you think a goodness of fit test for the last model is valid? If so, do it; if not, say why not.

6

SM6. Carl Morris (see next page) showed that there are only six families of distributions in the exponential family with quadratic variance functions: normal, Poisson, gamma, binomial, negative binomial, and a sixth distribution which he called the hyperbolic secant distribution. Its variance function is $V(\mu) = \mu^2 + 1$, it is continuous on $(-\infty, \infty)$ (like the normal distribution), but it is not symmetric. The deviance parameter is $\phi > 0$. (If $m = 1/\phi$ is an integer and $\mu = 0$, then the hyperbolic secant random variable is $Y = (2/\pi)\sum_{i=1}^{m} \log|C_i|$, where $C_1, \ldots, C_m$ are independent Cauchy random variables.)

(a) [4 marks] Find the canonical link. [Hint: make the substitution $\mu = \tan\theta$.] Is this a good choice for a generalized linear model?

(b) [4 marks] What is the variance function of the inverse hyperbolic secant distribution?

(c) [4 marks] Find an expression for the deviance as a function of the observations $Y_1, Y_2, \ldots, Y_n$ and their fitted values $\mu_1, \mu_2, \ldots, \mu_n$.

(d) [4 marks] Suppose we have 4 observations from this distribution with values 0.2, 0.5, 0.4, 0.9. If the mean $\mu$ and the deviance parameter $\phi$ are the same for each observation, find the maximum likelihood estimate of $\mu$, and any good estimate of $\phi$.

(e) [4 marks] We suspect that the data in (d) have a hyperbolic secant distribution with $\phi = 0.05$. Do you think a goodness of fit test for this model with $\phi = 0.05$ is valid? If so, do it (approximately); if not, say why not.
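As a sketch of the calculation that the hint in part (a) points to (and not a full answer), recall that in an exponential family the canonical parameter $\theta$ and the mean $\mu$ are related through the variance function by $d\mu/d\theta = V(\mu)$. With the variance function given above, this can be written as:

```latex
\frac{d\theta}{d\mu} = \frac{1}{V(\mu)} = \frac{1}{1 + \mu^2}
\quad\Longrightarrow\quad
\theta = \int \frac{d\mu}{1 + \mu^2} = \arctan\mu ,
\qquad\text{equivalently}\qquad \mu = \tan\theta .
```

The substitution is consistent with the variance function, since $d(\tan\theta)/d\theta = \sec^2\theta = 1 + \tan^2\theta = 1 + \mu^2$. Note that $\arctan\mu$ takes values only in $(-\pi/2, \pi/2)$, which is relevant when judging whether a linear predictor on this scale is convenient.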

7

8

Code and output for Question SM1

###Code and output for Question SM1 (a)

> age.mod<-lm(sbp ~ age)

> summary(age.mod)

Call:

lm(formula = sbp ~ age)

Residuals:

Min 1Q Median 3Q Max

-63.080 -16.688 -2.787 11.961 96.815

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 108.30786 4.19013 25.848 < 2e-16 ***

age 0.51462 0.08332 6.176 1.69e-09 ***

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 24.54 on 379 degrees of freedom

Multiple R-Squared: 0.09145, Adjusted R-squared: 0.08905

F-statistic: 38.15 on 1 and 379 DF, p-value: 1.692e-09

############

> gen.mod<-lm(sbp~gender)

> summary(gen.mod)

Call:

lm(formula = sbp ~ gender)

Residuals:

Min 1Q Median 3Q Max

-53.198 -15.198 -3.198 16.802 103.431

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 133.1977 1.6030 83.093 <2e-16 ***

genderMale -0.6286 2.8213 -0.223 0.824

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 25.75 on 379 degrees of freedom

Multiple R-Squared: 0.000131, Adjusted R-squared: -0.002507

F-statistic: 0.04964 on 1 and 379 DF, p-value: 0.8238

9

##########

> vill.mod<-lm(sbp~village)

> summary(vill.mod)

Call:

lm(formula = sbp ~ village)

Residuals:

Min 1Q Median 3Q Max

-51.625 -18.456 -4.714 14.390 104.375

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 129.625 4.558 28.440 <2e-16 ***

villageBatey Verde 5.985 6.082 0.984 0.326

villageCarmona 3.375 5.582 0.605 0.546

villageCojobal 3.172 5.660 0.560 0.576

villageJuan Sanchez 5.733 5.772 0.993 0.321

villageLa Altagracia 2.000 6.115 0.327 0.744

villageLos Gueneos -1.169 5.695 -0.205 0.838

villageSan Antonio 9.089 6.306 1.441 0.150

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 25.78 on 373 degrees of freedom

Multiple R-Squared: 0.01328, Adjusted R-squared: -0.005241

F-statistic: 0.717 on 7 and 373 DF, p-value: 0.6577

############

########### Code and output for Question SM1 (b)

> all.mod<-lm(sbp~age+village+gender)

> summary(all.mod)

Call:

lm(formula = sbp ~ age + village + gender)

Residuals:

Min 1Q Median 3Q Max

-63.142 -15.789 -3.318 12.842 101.729

10

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 105.63285 5.89386 17.923 < 2e-16 ***

age 0.55251 0.08717 6.339 6.73e-10 ***

villageBatey Verde 4.94172 5.80636 0.851 0.3953

villageCarmona 2.47623 5.31836 0.466 0.6418

villageCojobal 2.24775 5.42089 0.415 0.6786

villageJuan Sanchez 4.87772 5.51608 0.884 0.3771

villageLa Altagracia 1.07375 5.82694 0.184 0.8539

villageLos Gueneos -0.55883 5.43606 -0.103 0.9182

villageSan Antonio 7.15539 6.01397 1.190 0.2349

genderMale -5.58559 2.84672 -1.962 0.0505 .

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 24.56 on 371 degrees of freedom

Multiple R-Squared: 0.1098, Adjusted R-squared: 0.08823

F-statistic: 5.086 on 9 and 371 DF, p-value: 1.699e-06

######

> anova(glm(sbp~age+gender+village))

Analysis of Deviance Table

Model: gaussian, link: identity

Response: sbp

Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev

NULL 380 251294

age 1 22980 379 228314

gender 1 2518 378 225796

village 7 2100 371 223696

#####

> anova(glm(sbp~age+village+gender))

Analysis of Deviance Table

Model: gaussian, link: identity

Response: sbp

11

Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev

NULL 380 251294

age 1 22980 379 228314

village 7 2297 372 226017

gender 1 2321 371 223696

######

> anova(glm(sbp~village+gender+age))

Analysis of Deviance Table

Model: gaussian, link: identity

Response: sbp

Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev

NULL 380 251294

village 7 3336 373 247958

gender 1 37 372 247921

age 1 24225 371 223696
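For a Gaussian model with identity link, each row of a sequential deviance table can be turned into an F-test by dividing the deviance per degree of freedom by the residual mean square of the full model. The following minimal sketch does this arithmetic for gender added last, using only numbers copied from the table for sbp ~ age + village + gender above; the variable names are illustrative choices.

```r
# Values copied from the sequential table for sbp ~ age + village + gender
dev_gender <- 2321      # deviance explained by gender, added last
df_gender  <- 1
resid_dev  <- 223696    # residual deviance of the full model
resid_df   <- 371

sigma2_hat <- resid_dev / resid_df                  # residual mean square
F_gender   <- (dev_gender / df_gender) / sigma2_hat # F statistic on 1 and 371 df
p_gender   <- pf(F_gender, df_gender, resid_df, lower.tail = FALSE)

print(c(resid_se = sqrt(sigma2_hat), F = F_gender, p = p_gender))
```

Because gender has one degree of freedom and enters last, this F statistic is the square of the t value reported for genderMale in summary(all.mod), and the square root of the residual mean square reproduces the residual standard error reported there.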

12

[Four-panel diagnostic plot: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Figure 1: Regression diagnostics for sbp ~ age

13

[Four-panel diagnostic plot: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Figure 2: Regression diagnostics for sbp ~ gender

14

[Four-panel diagnostic plot: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Figure 3: Regression diagnostics for sbp ~ village

15

[Four-panel diagnostic plot: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Figure 4: Regression diagnostics for sbp ~ village + age

16

[Four-panel diagnostic plot: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Figure 5: Regression diagnostics for sbp ~ village + gender

17

[Four-panel diagnostic plot: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Figure 6: Regression diagnostics for sbp ~ age + gender

18

[Four-panel diagnostic plot: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Figure 7: Regression diagnostics for sbp ~ age + gender + village

19

####

#### Output and code for Question SM1 (d)

> mybox<-boxcox(lm(sbp~age+gender+village))

> mybox$x[order(mybox$y,decreasing=T)[1]]

[1] -0.5454545
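The object returned by boxcox (from the MASS package) is a list whose x component holds the grid of λ values and whose y component holds the profile log-likelihood, so the command above picks out the λ with the largest log-likelihood. The following is a minimal sketch of that step plus the usual 95% interval (all λ whose log-likelihood lies within qchisq(0.95, 1)/2 of the maximum). The helper name boxcox_summary and the synthetic log-likelihood curve are assumptions for illustration, not the exam's data.

```r
# Hypothetical helper: summarise a Box-Cox profile given the lambda grid and log-likelihood
boxcox_summary <- function(lambda, loglik, level = 0.95) {
  best   <- which.max(loglik)                          # same point as order(..., decreasing = TRUE)[1]
  cutoff <- loglik[best] - qchisq(level, df = 1) / 2   # likelihood-ratio cutoff for the interval
  inside <- lambda[loglik >= cutoff]
  list(lambda_hat = lambda[best], lower = min(inside), upper = max(inside))
}

# Synthetic profile peaked near -0.5 (made-up curve, for illustration only)
lambda <- seq(-2, 2, length.out = 100)
loglik <- -2330 - 5 * (lambda + 0.5)^2

res <- boxcox_summary(lambda, loglik)
print(res)
```

If the interval excludes λ = 1, the profile suggests that leaving the response untransformed is not supported by the data.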

20

[Plot: Box-Cox profile log-likelihood against λ over the range −2 to 2, with the 95% level marked]

Figure 8: Box-Cox transformation diagnostic plot for systolic blood pressure dataset

21

Code and output for Question SM2

### Code for Question SM2, part (a)

> districtonly.mod<-lm(Sugar~DistrictPosition)

> summary(districtonly.mod)

Call:

lm(formula = Sugar ~ DistrictPosition)

Residuals:

Min 1Q Median 3Q Max

-5.5326 -0.8103 0.0716 0.8912 4.2147

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 11.42840 0.06007 190.248 < 2e-16 ***

DistrictPositionE 0.35294 0.08566 4.120 3.86e-05 ***

DistrictPositionN 1.72268 0.08709 19.781 < 2e-16 ***

DistrictPositionS 0.18416 0.08281 2.224 0.026222 *

DistrictPositionW 0.23187 0.06792 3.414 0.000646 ***

----

Residual standard error: 1.341 on 3770 degrees of freedom

Multiple R-Squared: 0.1227,Adjusted R-squared: 0.1217

F-statistic: 131.8 on 4 and 3770 DF, p-value: < 2.2e-16
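With R's default treatment coding, the intercept is the fitted mean of the omitted baseline level and each remaining coefficient is a difference from that baseline. The following minimal sketch converts the coefficients printed above into fitted group means. It assumes that the baseline level is C (Central), which is an inference from the fact that C is the only group without its own coefficient; the vector names are illustrative.

```r
# Coefficients copied from summary(districtonly.mod)
coefs <- c("(Intercept)" = 11.42840,
           E = 0.35294, N = 1.72268, S = 0.18416, W = 0.23187)

# Assumption: C (Central) is the baseline level absorbed into the intercept
baseline <- coefs[["(Intercept)"]]
group_means <- c(C = baseline, baseline + coefs[c("E", "N", "S", "W")])

print(round(group_means, 3))
```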

22

[Plot: boxplots of Sugar for the district position groups C, E, N, S, W]

Figure 9: Boxplots of Sugar by DistrictPosition

23

[Four-panel diagnostic plot: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Figure 10: Regression diagnostics for model for Sugar including only DistrictPosition

24

### Code for Question SM2, part (b)

###### Age Only

> age.mod <- lm(Sugar~Age)

> summary(age.mod)

Call:

lm(formula = Sugar ~ Age)

Residuals:

Min 1Q Median 3Q Max

-5.49281 -0.87684 0.03579 0.89816 5.21414

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 12.14586 0.03776 321.66 <2e-16 ***

Age -0.15305 0.01395 -10.97 <2e-16 ***

---

Residual standard error: 1.408 on 3773 degrees of freedom

Multiple R-Squared: 0.03092,Adjusted R-squared: 0.03066

F-statistic: 120.4 on 1 and 3773 DF, p-value: < 2.2e-16

########### District Position + Age

> summary(district.age.mod)

Call:

lm(formula = Sugar ~ DistrictPosition + Age)

Residuals:

Min 1Q Median 3Q Max

-5.44238 -0.80695 0.08379 0.90526 4.44009

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 11.66672 0.06424 181.606 < 2e-16 ***

DistrictPositionE 0.40397 0.08478 4.765 1.96e-06 ***

DistrictPositionN 1.70123 0.08606 19.767 < 2e-16 ***

DistrictPositionS 0.25396 0.08213 3.092 0.002 **

DistrictPositionW 0.28142 0.06729 4.182 2.95e-05 ***

Age -0.12831 0.01324 -9.687 < 2e-16 ***

---

25

Residual standard error: 1.324 on 3769 degrees of freedom

Multiple R-Squared: 0.144,Adjusted R-squared: 0.1429

F-statistic: 126.8 on 5 and 3769 DF, p-value: < 2.2e-16

#############

> anova(age.mod,district.age.mod)

Analysis of Variance Table

Model 1: Sugar ~ Age

Model 2: Sugar ~ DistrictPosition + Age

Res.Df RSS Df Sum of Sq F Pr(>F)

1 3773 7483.5

2 3769 6610.3 4 873.2 124.47 < 2.2e-16 ***

> anova(districtonly.mod,district.age.mod)

Analysis of Variance Table

Model 1: Sugar ~ DistrictPosition

Model 2: Sugar ~ DistrictPosition + Age

Res.Df RSS Df Sum of Sq F Pr(>F)

1 3770 6774.9

2 3769 6610.3 1 164.6 93.846 < 2.2e-16 ***

---

26

[Plot: scatterplot of Sugar (vertical axis) against Age (horizontal axis, 0 to 8)]

Figure 11: Plot of Sugar by Age

27

[Four-panel diagnostic plot: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Figure 12: Regression diagnostics for model for Sugar including only Age

28

[Four-panel diagnostic plot: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Figure 13: Regression diagnostics for model for Sugar including only Age and DistrictPosition

29

### Code for Question SM2, part (c)

###### HarvestMonth as linear

> harvest.mod<-lm(Sugar~DistrictPosition+Age+HarvestMonth)

> summary(harvest.mod)

Call:

lm(formula = Sugar ~ DistrictPosition + Age + HarvestMonth)

Residuals:

Min 1Q Median 3Q Max

-5.70976 -0.76860 0.07636 0.88184 4.04782

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 10.16050 0.13535 75.068 < 2e-16 ***

DistrictPositionE 0.42889 0.08309 5.161 2.58e-07 ***

DistrictPositionN 1.74824 0.08441 20.712 < 2e-16 ***

DistrictPositionS 0.28550 0.08051 3.546 0.000396 ***

DistrictPositionW 0.28567 0.06593 4.333 1.51e-05 ***

Age -0.13841 0.01300 -10.645 < 2e-16 ***

HarvestMonth 0.17588 0.01399 12.570 < 2e-16 ***

-----

Residual standard error: 1.298 on 3768 degrees of freedom

Multiple R-Squared: 0.1784, Adjusted R-squared: 0.1771

F-statistic: 136.4 on 6 and 3768 DF, p-value: < 2.2e-16

###### HarvestMonth with linear and quadratic terms

> harvest.quad.mod<-lm(Sugar~DistrictPosition+Age+HarvestMonth+I(HarvestMonth^2))

> summary(harvest.quad.mod)

Call:

lm(formula = Sugar ~ DistrictPosition + Age + HarvestMonth +

I(HarvestMonth^2))

Residuals:

Min 1Q Median 3Q Max

-5.69731 -0.76383 0.07553 0.85751 4.38852

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 1.77596 0.73698 2.410 0.016010 *


DistrictPositionE 0.47098 0.08175 5.761 9.01e-09 ***

DistrictPositionN 1.80860 0.08312 21.758 < 2e-16 ***

DistrictPositionS 0.30759 0.07915 3.886 0.000104 ***

DistrictPositionW 0.29750 0.06481 4.591 4.56e-06 ***

Age -0.08725 0.01352 -6.452 1.24e-10 ***

HarvestMonth 2.16681 0.17267 12.549 < 2e-16 ***

I(HarvestMonth^2) -0.11630 0.01006 -11.567 < 2e-16 ***

---

Residual standard error: 1.275 on 3767 degrees of freedom

Multiple R-Squared: 0.2066, Adjusted R-squared: 0.2051

F-statistic: 140.2 on 7 and 3767 DF, p-value: < 2.2e-16

###### HarvestMonth as factor

> harvest.fact.mod<-lm(Sugar~DistrictPosition+Age+factor(HarvestMonth))

> summary(harvest.fact.mod)

Call:

lm(formula = Sugar ~ DistrictPosition + Age + factor(HarvestMonth))

Residuals:

Min 1Q Median 3Q Max

-5.53426 -0.75585 0.06586 0.84534 4.18221

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 10.43768 0.09790 106.618 < 2e-16 ***

DistrictPositionE 0.47509 0.08133 5.841 5.62e-09 ***

DistrictPositionN 1.82553 0.08277 22.055 < 2e-16 ***

DistrictPositionS 0.32759 0.07891 4.151 3.38e-05 ***

DistrictPositionW 0.31851 0.06463 4.928 8.65e-07 ***

Age -0.08761 0.01346 -6.510 8.50e-11 ***

factor(HarvestMonth)7 0.82974 0.08483 9.781 < 2e-16 ***

factor(HarvestMonth)8 1.34458 0.08659 15.528 < 2e-16 ***

factor(HarvestMonth)9 1.40121 0.08659 16.183 < 2e-16 ***

factor(HarvestMonth)10 1.15057 0.08423 13.659 < 2e-16 ***

factor(HarvestMonth)11 1.28707 0.09230 13.944 < 2e-16 ***

----

Residual standard error: 1.268 on 3764 degrees of freedom

Multiple R-Squared: 0.216, Adjusted R-squared: 0.214


F-statistic: 103.7 on 10 and 3764 DF, p-value: < 2.2e-16
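
The linear and quadratic HarvestMonth models are both special cases of the factor(HarvestMonth) model, so they can be compared with partial F tests. The sketch below is not part of the original exam output. It rebuilds approximate residual sums of squares from the printed residual standard errors and degrees of freedom; because those standard errors are rounded to three decimals, the F statistics are approximate. The names `rse`, `res_df`, `rss` and `partial_f` are illustrative.

```r
# Summary statistics copied from the three printed lm summaries.
rse    <- c(linear = 1.298, quadratic = 1.275, factor = 1.268)  # residual standard errors
res_df <- c(linear = 3768,  quadratic = 3767,  factor = 3764)   # residual degrees of freedom

# Residual sum of squares: RSS = RSE^2 * residual df (approximate, RSE is rounded).
rss <- rse^2 * res_df

# Partial F statistic for a reduced model nested in a full model.
partial_f <- function(reduced, full) {
  num_df <- res_df[[reduced]] - res_df[[full]]
  f <- ((rss[[reduced]] - rss[[full]]) / num_df) / (rss[[full]] / res_df[[full]])
  c(F = f, num_df = num_df, den_df = res_df[[full]],
    crit_5pct = qf(0.95, num_df, res_df[[full]]))
}

lin_vs_fact  <- partial_f("linear", "factor")
quad_vs_fact <- partial_f("quadratic", "factor")
print(round(rbind(lin_vs_fact, quad_vs_fact), 2))
```

With the fitted model objects available, `anova(harvest.mod, harvest.fact.mod)` would give the exact test without the rounding error.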

#### Model selection criteria

> AIC(harvest.mod)

[1] 12688.75

> AIC(harvest.quad.mod)

[1] 12559.00

> AIC(harvest.fact.mod)

[1] 12519.90

> AIC(harvest.mod,k=log(length(Sugar)))

[1] 12738.64

> AIC(harvest.quad.mod,k=log(length(Sugar)))

[1] 12615.13

> AIC(harvest.fact.mod,k=log(length(Sugar)))

[1] 12594.74
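
Calling AIC() with k = log(n) replaces the penalty of 2 per parameter with log(n) per parameter, which gives the BIC. The sketch below is not part of the original output. It checks that relationship against the printed values, assuming n = 3775 observations (3768 residual degrees of freedom plus the 7 coefficients of harvest.mod) and counting the error variance as one extra parameter, as AIC() does for lm fits.

```r
n <- 3768 + 7   # assumed sample size: residual df + coefficients of harvest.mod

# Parameters counted in the penalty: regression coefficients plus the error variance.
p   <- c(linear = 7 + 1, quadratic = 8 + 1, factor = 11 + 1)
aic <- c(linear = 12688.75, quadratic = 12559.00, factor = 12519.90)

# BIC = AIC + p * (log(n) - 2), since only the penalty term changes.
bic <- aic + p * (log(n) - 2)
print(round(bic, 2))

# Model preferred (smallest value) under each criterion.
best <- c(AIC = names(which.min(aic)), BIC = names(which.min(bic)))
print(best)
```

The reconstructed values should agree with the printed AIC(..., k = log(length(Sugar))) output up to rounding.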


[Diagnostic plot panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Cook's distance]

Figure 14: Regression diagnostics for model using Harvest


[Diagnostic plot panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Cook's distance]

Figure 15: Regression diagnostics for model using Harvest2


[Diagnostic plot panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Cook's distance]

Figure 16: Regression diagnostics for model using factor(Harvest)


Code and output for Question SM4

> data(Aids2)
> attach(Aids2)
> time<-death-diag+1
> c<-codes(status)-1
> rate<-c/time
> summary(glm(rate~state+sex+diag+T.categ+age, family=poisson, weight=time))

Call:
glm(formula = rate ~ state + sex + diag + T.categ + age, family = poisson,
    weights = time)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-4.37597  -0.77433   0.04455   0.91472   3.42263

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)   -3.6728465  0.4716294  -7.788 6.83e-15 ***
stateOther    -0.0944785  0.0895655  -1.055  0.29149
stateQLD       0.1860238  0.0878128   2.118  0.03414 *
stateVIC      -0.0018092  0.0613208  -0.030  0.97646
sexM          -0.0369529  0.1757609  -0.210  0.83348
diag          -0.0003179  0.0000421  -7.552 4.29e-14 ***
T.categhsid   -0.1211765  0.1520374  -0.797  0.42544
T.categid     -0.3799289  0.2459986  -1.544  0.12248
T.categhet    -0.7307894  0.2652388  -2.755  0.00587 **
T.categhaem    0.3462834  0.1881367   1.841  0.06568 .
T.categblood   0.1393095  0.1374007   1.014  0.31063
T.categmother  0.4603228  0.5893405   0.781  0.43475
T.categother   0.1200160  0.1636915   0.733  0.46345
age            0.0139496  0.0024987   5.583 2.37e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 4407.2 on 2842 degrees of freedom
Residual deviance: 4283.0 on 2829 degrees of freedom
AIC: Inf

Number of Fisher Scoring iterations: 8

> glm(rate~state+sex+diag+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2836 Residual
Null Deviance: 4407
Residual Deviance: 4302 AIC: Inf

> glm(rate~sex+diag+T.categ+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2832 Residual
Null Deviance: 4407
Residual Deviance: 4289 AIC: Inf


> glm(rate~state+diag+T.categ+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2830 Residual
Null Deviance: 4407
Residual Deviance: 4283 AIC: Inf

> glm(rate~diag+T.categ+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2833 Residual
Null Deviance: 4407
Residual Deviance: 4289 AIC: Inf

> glm(rate~sex+diag+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2839 Residual
Null Deviance: 4407
Residual Deviance: 4308 AIC: Inf
There were 50 or more warnings (use warnings() to see the first 50)

> glm(rate~state+diag+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2837 Residual
Null Deviance: 4407
Residual Deviance: 4303 AIC: Inf
There were 50 or more warnings (use warnings() to see the first 50)

> summary(glm(rate~diag+age, family=poisson, weight=time))

Call:
glm(formula = rate ~ diag + age, family = poisson, weights = time)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-4.19449  -0.77350   0.04316   0.91967   3.40510

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.681e+00  4.352e-01  -8.457  < 2e-16 ***
diag        -3.251e-04  4.122e-05  -7.888 3.07e-15 ***
age          1.521e-02  2.411e-03   6.308 2.83e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 4407.2 on 2842 degrees of freedom
Residual deviance: 4309.4 on 2840 degrees of freedom
AIC: Inf

Number of Fisher Scoring iterations: 8

There were 50 or more warnings (use warnings() to see the first 50)
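
The reduced Poisson models above are nested in the full model, so one way to compare them is through differences in residual deviance, referred to a chi-squared distribution with degrees of freedom equal to the difference in residual degrees of freedom. The sketch below is not part of the original output. It uses only the deviances printed above (some are rounded to the nearest integer), assumes the usual large-sample chi-squared approximation, and uses illustrative names (`full`, `reduced`, `lrt`).

```r
# Residual deviance and residual df of the full model
# (state + sex + diag + T.categ + age), copied from the summary output.
full <- c(dev = 4283.0, df = 2829)

# Models with one term dropped, copied from the printed glm fits.
reduced <- list(
  drop_T.categ = c(dev = 4302, df = 2836),
  drop_state   = c(dev = 4289, df = 2832),
  drop_sex     = c(dev = 4283, df = 2830)
)

# Deviance difference, its df, the 5% critical value and the p-value.
lrt <- t(sapply(reduced, function(m) {
  stat <- m[["dev"]] - full[["dev"]]
  ddf  <- m[["df"]] - full[["df"]]
  c(stat = stat, df = ddf, crit_5pct = qchisq(0.95, ddf),
    p_value = pchisq(stat, ddf, lower.tail = FALSE))
}))
print(round(lrt, 3))
```

The critical values produced by `qchisq` can be cross-checked against the χ2 table at the end of the paper.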


Code and output for Question SM5

> data(biopsy)
> attach(biopsy)
> Y<-codes(class)-1
> fV1<-factor(V1)
> par(mfrow=c(2,2))
> glm0<-glm(Y~V1)
> summary(glm0)

Call:
glm(formula = Y ~ V1)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.77804  -0.17331  -0.01994   0.06859   1.06859

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.189535   0.023395  -8.102 2.43e-15 ***
V1           0.120947   0.004467  27.078  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.1104095)

    Null deviance: 157.908 on 698 degrees of freedom
Residual deviance:  76.955 on 697 degrees of freedom
AIC: 447.39

Number of Fisher Scoring iterations: 2

> glm1<-glm(Y~fV1)
> summary(glm1)

Call:
glm(formula = Y ~ fV1)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-9.565e-01  -1.111e-01  -2.069e-02   3.331e-15   9.793e-01

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.02069    0.02647   0.782  0.43466
fV12         0.05931    0.05227   1.135  0.25689
fV13         0.09042    0.04051   2.232  0.02593 *
fV14         0.12931    0.04439   2.913  0.00369 **
fV15         0.32546    0.03850   8.455  < 2e-16 ***
fV16         0.50872    0.06073   8.377 3.04e-16 ***
fV17         0.93583    0.07153  13.083  < 2e-16 ***
fV18         0.89235    0.05393  16.546  < 2e-16 ***
fV19         0.97931    0.08920  10.979  < 2e-16 ***
fV110        0.97931    0.04661  21.010  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.1015776)

    Null deviance: 157.908 on 698 degrees of freedom
Residual deviance:  69.987 on 689 degrees of freedom
AIC: 397.04

Number of Fisher Scoring iterations: 2


> plot(V1,fitted(glm1))
> points(V1,fitted(glm0),pch=2)
> title('Figure 2.1: Normal family')
> glm0<-glm(Y~V1,family=binomial)
> summary(glm0)

Call:
glm(formula = Y ~ V1, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1986  -0.4261  -0.1704   0.1730   2.9118

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.16017    0.37772  -13.66   <2e-16 ***
V1           0.93546    0.07372   12.69   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 900.53 on 698 degrees of freedom
Residual deviance: 464.05 on 697 degrees of freedom
AIC: 468.05

Number of Fisher Scoring iterations: 5

> glm1<-glm(Y~fV1, family=binomial)
> summary(glm1)

Call:
glm(formula = Y ~ fV1, family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.50419  -0.48535  -0.20448   0.01184   2.78500

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.8572     0.5834  -6.611 3.81e-11 ***
fV12          1.4149     0.7824   1.808  0.07054 .
fV13          1.7778     0.6589   2.698  0.00697 **
fV14          2.1226     0.6621   3.206  0.00135 **
fV15          3.2212     0.6119   5.265 1.40e-07 ***
fV16          3.9750     0.6771   5.871 4.34e-09 ***
fV17          6.9483     1.1772   5.902 3.58e-09 ***
fV18          6.2086     0.7837   7.922 2.33e-15 ***
fV19         13.4232    19.3753   0.693  0.48844
fV110        13.4232     8.7430   1.535  0.12471
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 900.53 on 698 degrees of freedom
Residual deviance: 450.21 on 689 degrees of freedom
AIC: 470.21

Number of Fisher Scoring iterations: 8
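
To see how the two link functions behave, the fitted values for V1 = 1, ..., 10 can be computed directly from the printed coefficients of the two models with V1 as a numeric covariate. The sketch below is not part of the original output, and the names `v1`, `fit_normal`, `fit_binom` and `odds_ratio` are illustrative. It shows that the gaussian (identity link) fit can leave the interval [0, 1], whereas the binomial (logit link) fit cannot.

```r
v1 <- 1:10

# Gaussian family, identity link: E(Y) = -0.189535 + 0.120947 * V1
fit_normal <- -0.189535 + 0.120947 * v1

# Binomial family, logit link: logit P(Y = 1) = -5.16017 + 0.93546 * V1
fit_binom <- plogis(-5.16017 + 0.93546 * v1)

print(round(cbind(V1 = v1, normal = fit_normal, binomial = fit_binom), 3))

# Multiplicative change in the odds of Y = 1 per unit increase in V1 (logit fit).
odds_ratio <- exp(0.93546)
print(round(odds_ratio, 2))
```

In the gaussian factor fit, the coefficients for fV19 and fV110 add to the intercept to give a fitted mean of 1. That is consistent with the very large estimates and standard errors for those two levels in the binomial factor fit.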


> plot(V1,fitted(glm1))
> points(V1,fitted(glm0),pch=2)
> title('Figure 2.2: Binomial family')
> summary(glm(Y~V1+V2+V3+V4+V5+V6+V7+V8+V9, family=binomial))

Call:
glm(formula = Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9,
    family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.48404  -0.11529  -0.06192   0.02221   2.46983

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.103859   1.170793  -8.630  < 2e-16 ***
V1            0.535008   0.141838   3.772 0.000162 ***
V2           -0.006278   0.208786  -0.030 0.976011
V3            0.322705   0.230224   1.402 0.161005
V4            0.330634   0.123318   2.681 0.007337 **
V5            0.096634   0.156467   0.618 0.536836
V6            0.383024   0.093741   4.086 4.39e-05 ***
V7            0.447184   0.171156   2.613 0.008982 **
V8            0.213030   0.112757   1.889 0.058855 .
V9            0.534817   0.328105   1.630 0.103098
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 884.35 on 682 degrees of freedom
Residual deviance: 102.89 on 673 degrees of freedom
AIC: 122.89

Number of Fisher Scoring iterations: 7

> summary(glm(Y~V1+V4+V6+V7, family=binomial))

Call:
glm(formula = Y ~ V1 + V4 + V6 + V7, family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.69637  -0.14510  -0.06093   0.02317   2.44758

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.11370    1.03190  -9.801  < 2e-16 ***
V1            0.81166    0.12579   6.453 1.10e-10 ***
V4            0.43412    0.11399   3.808  0.00014 ***
V6            0.48136    0.08813   5.462 4.72e-08 ***
V7            0.70154    0.15190   4.619 3.87e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 884.35 on 682 degrees of freedom
Residual deviance: 125.77 on 678 degrees of freedom
AIC: 135.77

Number of Fisher Scoring iterations: 7
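
Two pairs of the binomial fits are nested: V1 as a numeric covariate within V1 as a factor, and the four-variable model within the nine-variable model. The sketch below is not part of the original output. It computes the corresponding deviance differences from the printed residual deviances and compares them with chi-squared critical values, assuming the usual large-sample approximation. The object names `cmp` and `result` are illustrative.

```r
cmp <- rbind(
  # reduced deviance, reduced df, full deviance, full df
  V1_linear_vs_factor = c(464.05, 697, 450.21, 689),
  four_vs_nine_vars   = c(125.77, 678, 102.89, 673)
)
colnames(cmp) <- c("dev_reduced", "df_reduced", "dev_full", "df_full")

stat <- cmp[, "dev_reduced"] - cmp[, "dev_full"]   # deviance difference
ddf  <- cmp[, "df_reduced"] - cmp[, "df_full"]     # number of extra parameters

result <- cbind(stat = stat, df = ddf,
                crit_5pct = qchisq(0.95, ddf),
                p_value = pchisq(stat, ddf, lower.tail = FALSE))
print(round(result, 4))
```

Both multi-variable fits report 682 null degrees of freedom, so they use the same observations and their deviances are directly comparable. The V1-only fits report 698 and should be compared only with each other.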


[Two panels plotting fitted(glm1) against V1: (a) Normal family; (b) Binomial family]

Figure 17: Figure for question SM5


Upper tail probabilities of the standard Normal distribution

z    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0  0.5000  0.4960  0.4920  0.4880  0.4840  0.4801  0.4761  0.4721  0.4681  0.4641
0.1  0.4602  0.4562  0.4522  0.4483  0.4443  0.4404  0.4364  0.4325  0.4286  0.4247
0.2  0.4207  0.4168  0.4129  0.4090  0.4052  0.4013  0.3974  0.3936  0.3897  0.3859
0.3  0.3821  0.3783  0.3745  0.3707  0.3669  0.3632  0.3594  0.3557  0.3520  0.3483
0.4  0.3446  0.3409  0.3372  0.3336  0.3300  0.3264  0.3228  0.3192  0.3156  0.3121
0.5  0.3085  0.3050  0.3015  0.2981  0.2946  0.2912  0.2877  0.2843  0.2810  0.2776
0.6  0.2743  0.2709  0.2676  0.2643  0.2611  0.2578  0.2546  0.2514  0.2483  0.2451
0.7  0.2420  0.2389  0.2358  0.2327  0.2296  0.2266  0.2236  0.2206  0.2177  0.2148
0.8  0.2119  0.2090  0.2061  0.2033  0.2005  0.1977  0.1949  0.1922  0.1894  0.1867
0.9  0.1841  0.1814  0.1788  0.1762  0.1736  0.1711  0.1685  0.1660  0.1635  0.1611
1.0  0.1587  0.1562  0.1539  0.1515  0.1492  0.1469  0.1446  0.1423  0.1401  0.1379
1.1  0.1357  0.1335  0.1314  0.1292  0.1271  0.1251  0.1230  0.1210  0.1190  0.1170
1.2  0.1151  0.1131  0.1112  0.1093  0.1075  0.1056  0.1038  0.1020  0.1003  0.0985
1.3  0.0968  0.0951  0.0934  0.0918  0.0901  0.0885  0.0869  0.0853  0.0838  0.0823
1.4  0.0808  0.0793  0.0778  0.0764  0.0749  0.0735  0.0721  0.0708  0.0694  0.0681
1.5  0.0668  0.0655  0.0643  0.0630  0.0618  0.0606  0.0594  0.0582  0.0571  0.0559
1.6  0.0548  0.0537  0.0526  0.0516  0.0505  0.0495  0.0485  0.0475  0.0465  0.0455
1.7  0.0446  0.0436  0.0427  0.0418  0.0409  0.0401  0.0392  0.0384  0.0375  0.0367
1.8  0.0359  0.0351  0.0344  0.0336  0.0329  0.0322  0.0314  0.0307  0.0301  0.0294
1.9  0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233
2.0  0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183
2.1  0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
2.2  0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
2.3  0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
2.4  0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
2.5  0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
2.6  0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
2.7  0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
2.8  0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
2.9  0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
3.0  0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
3.1  0.0010  0.0009  0.0009  0.0009  0.0008  0.0008  0.0008  0.0008  0.0007  0.0007
3.2  0.0007  0.0007  0.0006  0.0006  0.0006  0.0006  0.0006  0.0005  0.0005  0.0005
3.3  0.0005  0.0005  0.0005  0.0004  0.0004  0.0004  0.0004  0.0004  0.0004  0.0003
3.4  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0002
3.5  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002
3.6  0.0002  0.0002  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
3.7  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
3.8  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
3.9  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
4.0  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
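
For readers working outside the exam setting, the entries of this table can be regenerated in R. The sketch below is not part of the original paper. It builds the same layout with `pnorm(..., lower.tail = FALSE)`, with rows giving z to one decimal place and columns giving the second decimal place. The object names `rows`, `cols` and `tab` are illustrative.

```r
rows <- seq(0, 4, by = 0.1)      # z to one decimal place
cols <- seq(0, 0.09, by = 0.01)  # second decimal place of z

# Upper tail probability P(Z > z) for every row/column combination.
tab <- round(outer(rows, cols, function(r, c) pnorm(r + c, lower.tail = FALSE)), 4)
dimnames(tab) <- list(sprintf("%.1f", rows), sprintf("%.2f", cols))

# Example lookup: P(Z > 1.96)
print(tab["1.9", "0.06"])
```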


Quantiles of the t and χ2 distributions

Degrees of  Quantiles of the t distribution      Quantiles of the χ2 distribution
freedom     Upper tail probability               Upper tail probability
            0.1    0.05   0.025  0.01   0.005    0.1     0.05    0.025   0.01    0.005

1           3.08   6.31   12.71  31.82  63.66    2.71    3.84    5.02    6.63    7.88
2           1.89   2.92   4.30   6.96   9.92     4.61    5.99    7.38    9.21    10.60
3           1.64   2.35   3.18   4.54   5.84     6.25    7.81    9.35    11.34   12.84
4           1.53   2.13   2.78   3.75   4.60     7.78    9.49    11.14   13.28   14.86
5           1.48   2.02   2.57   3.36   4.03     9.24    11.07   12.83   15.09   16.75
6           1.44   1.94   2.45   3.14   3.71     10.64   12.59   14.45   16.81   18.55
7           1.41   1.89   2.36   3.00   3.50     12.02   14.07   16.01   18.48   20.28
8           1.40   1.86   2.31   2.90   3.36     13.36   15.51   17.53   20.09   21.95
9           1.38   1.83   2.26   2.82   3.25     14.68   16.92   19.02   21.67   23.59
10          1.37   1.81   2.23   2.76   3.17     15.99   18.31   20.48   23.21   25.19
11          1.36   1.80   2.20   2.72   3.11     17.28   19.68   21.92   24.72   26.76
12          1.36   1.78   2.18   2.68   3.05     18.55   21.03   23.34   26.22   28.30
13          1.35   1.77   2.16   2.65   3.01     19.81   22.36   24.74   27.69   29.82
14          1.35   1.76   2.14   2.62   2.98     21.06   23.68   26.12   29.14   31.32
15          1.34   1.75   2.13   2.60   2.95     22.31   25.00   27.49   30.58   32.80
16          1.34   1.75   2.12   2.58   2.92     23.54   26.30   28.85   32.00   34.27
17          1.33   1.74   2.11   2.57   2.90     24.77   27.59   30.19   33.41   35.72
18          1.33   1.73   2.10   2.55   2.88     25.99   28.87   31.53   34.81   37.16
19          1.33   1.73   2.09   2.54   2.86     27.20   30.14   32.85   36.19   38.58
20          1.33   1.72   2.09   2.53   2.85     28.41   31.41   34.17   37.57   40.00
21          1.32   1.72   2.08   2.52   2.83     29.62   32.67   35.48   38.93   41.40
22          1.32   1.72   2.07   2.51   2.82     30.81   33.92   36.78   40.29   42.80
23          1.32   1.71   2.07   2.50   2.81     32.01   35.17   38.08   41.64   44.18
24          1.32   1.71   2.06   2.49   2.80     33.20   36.42   39.36   42.98   45.56
25          1.32   1.71   2.06   2.49   2.79     34.38   37.65   40.65   44.31   46.93
26          1.31   1.71   2.06   2.48   2.78     35.56   38.89   41.92   45.64   48.29
27          1.31   1.70   2.05   2.47   2.77     36.74   40.11   43.19   46.96   49.64
28          1.31   1.70   2.05   2.47   2.76     37.92   41.34   44.46   48.28   50.99
29          1.31   1.70   2.05   2.46   2.76     39.09   42.56   45.72   49.59   52.34
30          1.31   1.70   2.04   2.46   2.75     40.26   43.77   46.98   50.89   53.67
35          1.31   1.69   2.03   2.44   2.72     46.06   49.80   53.20   57.34   60.27
40          1.30   1.68   2.02   2.42   2.70     51.81   55.76   59.34   63.69   66.77
45          1.30   1.68   2.01   2.41   2.69     57.51   61.66   65.41   69.96   73.17
50          1.30   1.68   2.01   2.40   2.68     63.17   67.50   71.42   76.15   79.49
55          1.30   1.67   2.00   2.40   2.67     68.80   73.31   77.38   82.29   85.75
60          1.30   1.67   2.00   2.39   2.66     74.40   79.08   83.30   88.38   91.95
80          1.29   1.66   1.99   2.37   2.64     96.58   101.88  106.63  112.33  116.32
100         1.29   1.66   1.98   2.36   2.63     118.50  124.34  129.56  135.81  140.17
120         1.29   1.66   1.98   2.36   2.62     140.23  146.57  152.21  158.95  163.65
∞           1.28   1.64   1.96   2.33   2.58
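
The quantiles in this table correspond to R's `qt` and `qchisq` functions evaluated at upper tail probabilities. The sketch below is not part of the original paper. It regenerates the row for 10 degrees of freedom; the names `p`, `nu`, `t_row` and `chisq_row` are illustrative.

```r
p  <- c(0.1, 0.05, 0.025, 0.01, 0.005)  # upper tail probabilities (column headers)
nu <- 10                                # degrees of freedom (row label)

t_row     <- round(qt(p, nu, lower.tail = FALSE), 2)
chisq_row <- round(qchisq(p, nu, lower.tail = FALSE), 2)

print(t_row)
print(chisq_row)
```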


Quantiles of the F distribution, P = 0.05

Denominator
degrees of   Numerator degrees of freedom
freedom      1      2      3      4      5      6      7      8      9      10

1            161.4  199.5  215.7  224.6  230.2  233.0  236.8  238.9  240.5  241.9
2            18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40
3            10.13  9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79
4            7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96
5            6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74
6            5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06
7            5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64
8            5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35
9            5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14
10           4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98
11           4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.85
12           4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80   2.75
13           4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71   2.67
14           4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65   2.60
15           4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54
16           4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54   2.49
17           4.45   3.59   3.20   2.96   2.81   2.70   2.61   2.55   2.49   2.45
18           4.41   3.55   3.16   2.93   2.77   2.66   2.58   2.51   2.46   2.41
19           4.38   3.52   3.13   2.90   2.74   2.63   2.54   2.48   2.42   2.38
20           4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35
21           4.32   3.47   3.07   2.84   2.68   2.57   2.49   2.42   2.37   2.32
22           4.30   3.44   3.05   2.82   2.66   2.55   2.46   2.40   2.34   2.30
23           4.28   3.42   3.03   2.80   2.64   2.53   2.44   2.37   2.32   2.27
24           4.26   3.40   3.01   2.78   2.62   2.51   2.42   2.36   2.30   2.25
25           4.24   3.39   2.99   2.76   2.60   2.49   2.40   2.34   2.28   2.24
26           4.23   3.37   2.98   2.74   2.59   2.47   2.39   2.32   2.27   2.22
27           4.21   3.35   2.96   2.73   2.57   2.46   2.37   2.31   2.25   2.20
28           4.20   3.34   2.95   2.71   2.56   2.45   2.36   2.29   2.24   2.19
29           4.18   3.33   2.93   2.70   2.55   2.43   2.35   2.28   2.22   2.18
30           4.17   3.32   2.92   2.69   2.53   2.42   2.33   2.27   2.21   2.16
35           4.12   3.27   2.87   2.64   2.49   2.37   2.29   2.22   2.16   2.11
40           4.08   3.23   2.84   2.61   2.45   2.34   2.25   2.18   2.12   2.08
45           4.06   3.20   2.81   2.58   2.42   2.31   2.22   2.15   2.10   2.05
50           4.03   3.18   2.79   2.56   2.40   2.29   2.20   2.13   2.07   2.03
55           4.02   3.16   2.77   2.54   2.38   2.27   2.18   2.11   2.06   2.01
60           4.00   3.15   2.76   2.53   2.37   2.25   2.17   2.10   2.04   1.99
80           3.96   3.11   2.72   2.49   2.33   2.21   2.13   2.06   2.00   1.95
100          3.94   3.09   2.70   2.46   2.31   2.19   2.10   2.03   1.97   1.93
120          3.92   3.07   2.68   2.45   2.29   2.18   2.09   2.02   1.96   1.91
∞            3.84   3.00   2.60   2.37   2.21   2.10   2.01   1.94   1.88   1.83


Quantiles of the Bonferroni t distribution, P (T > t) = 0.025/n

     Degrees of freedom
n    5      10     15     20     25     30     40     50     60     80     100    120    ∞

5    4.03   3.17   2.95   2.85   2.79   2.75   2.70   2.68   2.66   2.64   2.63   2.62   2.58
10   4.77   3.58   3.29   3.15   3.08   3.03   2.97   2.94   2.91   2.89   2.87   2.86   2.81
15   5.25   3.83   3.48   3.33   3.24   3.19   3.12   3.08   3.06   3.03   3.01   3.00   2.94
20   5.60   4.00   3.62   3.46   3.36   3.30   3.23   3.18   3.16   3.12   3.10   3.09   3.02
25   5.89   4.14   3.73   3.55   3.45   3.39   3.31   3.26   3.23   3.20   3.17   3.16   3.09
30   6.14   4.26   3.82   3.63   3.52   3.45   3.37   3.32   3.29   3.25   3.23   3.22   3.14
35   6.35   4.36   3.90   3.70   3.58   3.51   3.43   3.38   3.34   3.30   3.28   3.26   3.19
40   6.54   4.44   3.96   3.75   3.64   3.56   3.47   3.42   3.39   3.35   3.32   3.31   3.23
45   6.71   4.52   4.02   3.80   3.68   3.61   3.51   3.46   3.43   3.38   3.36   3.34   3.26
50   6.87   4.59   4.07   3.85   3.73   3.65   3.55   3.50   3.46   3.42   3.39   3.37   3.29
55   7.01   4.65   4.12   3.89   3.76   3.68   3.58   3.53   3.49   3.45   3.42   3.40   3.32
60   7.15   4.71   4.16   3.93   3.80   3.71   3.61   3.56   3.52   3.47   3.45   3.43   3.34
70   7.39   4.81   4.24   3.99   3.86   3.77   3.67   3.61   3.57   3.52   3.49   3.47   3.38
80   7.60   4.90   4.31   4.05   3.91   3.82   3.71   3.65   3.61   3.56   3.53   3.51   3.42
90   7.80   4.98   4.36   4.10   3.96   3.86   3.75   3.69   3.65   3.60   3.57   3.55   3.45
100  7.98   5.05   4.42   4.15   4.00   3.90   3.79   3.72   3.68   3.63   3.60   3.58   3.48

Quantiles of the Bonferroni t distribution, P (T > t) = 0.005/n

     Degrees of freedom
n    5      10     15     20     25     30     40     50     60     80     100    120    ∞

5    5.89   4.14   3.73   3.55   3.45   3.39   3.31   3.26   3.23   3.20   3.17   3.16   3.09
10   6.87   4.59   4.07   3.85   3.73   3.65   3.55   3.50   3.46   3.42   3.39   3.37   3.29
15   7.50   4.85   4.27   4.02   3.88   3.80   3.69   3.63   3.59   3.54   3.51   3.49   3.40
20   7.98   5.05   4.42   4.15   4.00   3.90   3.79   3.72   3.68   3.63   3.60   3.58   3.48
25   8.36   5.20   4.53   4.24   4.08   3.98   3.86   3.79   3.75   3.70   3.66   3.64   3.54
30   8.69   5.33   4.62   4.32   4.15   4.05   3.92   3.85   3.81   3.75   3.72   3.69   3.59
35   8.98   5.44   4.70   4.39   4.21   4.11   3.98   3.90   3.85   3.80   3.76   3.74   3.63
40   9.24   5.53   4.77   4.44   4.27   4.15   4.02   3.94   3.89   3.83   3.80   3.78   3.66
45   9.47   5.62   4.83   4.49   4.31   4.20   4.06   3.98   3.93   3.87   3.83   3.81   3.69
50   9.68   5.69   4.88   4.54   4.35   4.23   4.09   4.01   3.96   3.90   3.86   3.84   3.72
55   9.87   5.76   4.93   4.58   4.39   4.27   4.13   4.04   3.99   3.93   3.89   3.86   3.74
60   10.05  5.83   4.97   4.62   4.42   4.30   4.15   4.07   4.02   3.95   3.91   3.89   3.76
70   10.38  5.94   5.05   4.68   4.48   4.35   4.20   4.12   4.06   4.00   3.96   3.93   3.80
80   10.67  6.04   5.12   4.74   4.53   4.40   4.25   4.16   4.10   4.03   3.99   3.97   3.84
90   10.94  6.13   5.18   4.79   4.58   4.44   4.29   4.20   4.14   4.07   4.02   4.00   3.86
100  11.18  6.21   5.24   4.84   4.62   4.48   4.32   4.23   4.17   4.10   4.05   4.03   3.89
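
The Bonferroni quantiles are ordinary t quantiles evaluated at an upper tail probability divided by the number of comparisons n: 0.025/n for the first table and 0.005/n for the second. The sketch below is not part of the original paper. The helper name `bonferroni_t` and its `alpha` argument (the two-sided family-wise level, so that the tail probability is alpha/(2n)) are illustrative choices.

```r
# Bonferroni-adjusted t quantile for n simultaneous two-sided comparisons:
# the value t with P(T > t) = alpha / (2 * n) on df degrees of freedom.
bonferroni_t <- function(n, df, alpha = 0.05) {
  qt(alpha / (2 * n), df, lower.tail = FALSE)
}

# First table, P(T > t) = 0.025/n: n = 5 with 5 and with infinite df.
print(round(bonferroni_t(5, 5), 2))
print(round(bonferroni_t(5, Inf), 2))

# Second table, P(T > t) = 0.005/n: n = 5 with 5 df.
print(round(bonferroni_t(5, 5, alpha = 0.01), 2))
```

With `df = Inf`, `qt` returns the corresponding standard Normal quantile, which matches the ∞ column.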
