BIOSTATS 640 Exam 1 – Spring 2016 Name ________________________________________________

Z:\bigelow\...\2016\...\BE640 Exam 1 2016.doc Page 1 of 16

BIOSTATS 640 Intermediate Biostatistics

Spring 2016 Examination 1

Units 1 and 2 – Review of Introductory Biostatistics & Regression and Correlation

Due: Wednesday March 2, 2016

Before you begin: This is a “take-home” exam. You are welcome to use any reference materials you wish. You are welcome to use the computer as you wish, too. However, you MUST work this exam by yourself and you may not consult with anyone.

Instructions and Checklist:
__ 1. Start each problem on a new page.
__ 2. Write your name on every page.
__ 3. Make a photocopy of your exam for safekeeping prior to submission.
__ 4. Complete the signature page.
__ 5. Please DO NOT submit a copy of the exam questions!! I have them….

How to submit your exam (sorry, faxed exams are NOT permitted):

(1) ONLINE Students. Please be sure your name is somewhere on your submission. Next, save it as a SINGLE FILE pdf using the naming convention lastname_exam1.pdf. Email it to me at: [email protected]

(2) Worcester Section. Please be sure your name is somewhere on your submission. Next, save it as a SINGLE FILE pdf using the naming convention lastname_exam1.pdf. Email it to me at: [email protected]

(3) Amherst Section. Please put your exam (stapled please) in my mailbox, located in the mail room on the 4th floor of Arnold House. If you are unable to come to Arnold House on Wednesday March 2, 2016, I will accept a pdf (see instructions for online students).

(4) ALL. I will also accept exams sent by U.S. Post. Please mail with postmark no later than March 2, 2016 to:

Carol Bigelow
School of Public Health/402 Arnold House
University of Massachusetts/Amherst
715 North Pleasant Street
Amherst, MA 01003-9304
Tel. 413-545-1319

Signature

This is to confirm that in completing this exam, I worked independently and did not consult with anyone.

Name: ___________________________________________________________ Date: ___________________________

Thank you!

1. (10 points, total) Diagnostic related groups (DRG’s) are used in the payment for the health care of Medicare-funded patients. The following are lengths of stay LOS (days) for 50 patients with a specific DRG.

 1   2   3   5   6   8  13  18  26  43
 1   2   4   5   7   9  15  19  29  49
 2   2   4   5   7   9  15  19  31  52
 2   3   4   6   8  10  17  20  34  67
 2   3   5   6   8  12  17  23  36  96

1a. (1 point) Calculate the sample mean and the sample standard deviation.

1b. (5 points) Calculate the five percentiles: minimum, first quartile, median, third quartile, maximum.

1c. (2 points) By any means you like, construct a box plot graphical summary.

1d. (2 points) In no more than 1-2 sentences, in your assessment, is this distribution skewed? Explain.
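The summary statistics asked for in Problem 1 can be sketched in Python using only the standard library. This is one possible approach, not an official solution. The quartiles use the "inclusive" interpolation rule, which is an assumption: other textbook and software conventions can give slightly different quartiles. All variable names are illustrative.

```python
import statistics

# Lengths of stay (days) for the 50 patients, as listed in Problem 1.
los = [1, 2, 3, 5, 6, 8, 13, 18, 26, 43,
       1, 2, 4, 5, 7, 9, 15, 19, 29, 49,
       2, 2, 4, 5, 7, 9, 15, 19, 31, 52,
       2, 3, 4, 6, 8, 10, 17, 20, 34, 67,
       2, 3, 5, 6, 8, 12, 17, 23, 36, 96]

mean = statistics.mean(los)
sd = statistics.stdev(los)  # sample standard deviation (n - 1 denominator)

# Five-number summary; method="inclusive" is an assumed quartile convention.
q1, median, q3 = statistics.quantiles(los, n=4, method="inclusive")
five_number = (min(los), q1, median, q3, max(los))

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print("min, Q1, median, Q3, max =", five_number)
```

Comparing the mean with the median is one simple way to start the skewness discussion in 1d.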

2. (10 points, total)

2a. (2 points) If 25% of 11 year-old children have no decayed, missing or filled (DMF) teeth, what is the probability that in a random sample of 20 11 year-old children, there will be exactly 3 with no DMF teeth?

2b. (2 points) (Same setting as for question #2a). If 25% of 11 year-old children have no decayed, missing or filled (DMF) teeth, what is the probability that in a random sample of 20 11 year-old children, there will be fewer than 3 with no DMF teeth?

2c. (3 points) Suppose that, among all persons 17 years of age and older, half the males and one third of the females are current smokers. What is the probability that a random sample of 10 males and 15 females includes exactly 4 male and 6 female smokers? You may assume independence of the males and females.

2d. (3 points) (Same setting as for question #2c). Suppose that among all persons 17 years of age and over, half the males and one third of the females are current smokers. In a random sample of 10 males and 15 females, what is the probability that none smoke?
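The calculations in Problem 2 are binomial probabilities, and a short Python sketch can make the setup concrete. It assumes each sampled person is an independent Bernoulli trial, and, as the problem allows, that the male and female counts are independent so their probabilities multiply. The helper name `binom_pmf` is illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 2a: exactly 3 of 20 children with no DMF teeth, p = 0.25
p_exactly_3 = binom_pmf(3, 20, 0.25)

# 2b: fewer than 3 means 0, 1, or 2
p_fewer_than_3 = sum(binom_pmf(k, 20, 0.25) for k in range(3))

# 2c: exactly 4 of 10 males (p = 1/2) AND exactly 6 of 15 females (p = 1/3)
p_4m_6f = binom_pmf(4, 10, 0.5) * binom_pmf(6, 15, 1 / 3)

# 2d: no smokers among the 10 males and 15 females
p_none = binom_pmf(0, 10, 0.5) * binom_pmf(0, 15, 1 / 3)

for name, value in [("2a", p_exactly_3), ("2b", p_fewer_than_3),
                    ("2c", p_4m_6f), ("2d", p_none)]:
    print(name, value)
```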

3. (10 points total) For questions 3a and 3b: According to a recent Census Bureau report, 59% of Americans have private health insurance, 25% have government health insurance (meaning: Medicare or Medicaid or military health care) and 16% have no health insurance.

3a. (3 points) Estimate the probability that a randomly selected American has health insurance.

3b. (3 points) Given that a randomly selected American is known to have health insurance, estimate the probability that it is private.

For question 3c: In a jury trial, suppose the probability that the defendant is convicted, given guilt, is 0.95, and the probability that the defendant is acquitted, given innocence, is 0.95. Suppose that 90% of all defendants truly are guilty:

3c. (4 points) Given that the defendant is convicted, what is the probability that he or she was actually innocent?
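Problem 3 combines the addition rule, conditional probability, and Bayes' rule. The sketch below is illustrative only. It assumes the three insurance categories are mutually exclusive, which is consistent with the percentages summing to 100%, and all variable names are made up for the example.

```python
# 3a and 3b: insurance categories treated as mutually exclusive (assumption).
p_private, p_government, p_uninsured = 0.59, 0.25, 0.16
p_insured = p_private + p_government
p_private_given_insured = p_private / p_insured

# 3c: Bayes' rule for P(innocent | convicted).
p_guilty = 0.90
p_convict_given_guilty = 0.95
p_acquit_given_innocent = 0.95

p_innocent = 1 - p_guilty
p_convict_given_innocent = 1 - p_acquit_given_innocent

# Law of total probability for the denominator.
p_convict = (p_convict_given_guilty * p_guilty
             + p_convict_given_innocent * p_innocent)
p_innocent_given_convict = p_convict_given_innocent * p_innocent / p_convict

print(p_insured, p_private_given_insured, p_innocent_given_convict)
```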

4. (10 points total)

4a. (5 points) Suppose that the weight W of male patients registered at a certain diet clinic is distributed normal with mean $\mu = 190$ and variance $\sigma^2 = 100$. For a random sample of n=25, find the values “a” and “b” such that

$$P\left[\, a \le \bar{W}_{n=25} \le b \,\right] = 0.80$$

Recall: $\bar{W}_{n=25}$ is the sample mean of 25 observations of W.

4b. (5 points) A random sample of 32 persons attending a certain diet clinic was found to have lost, over a three-week period, an average of 30 pounds with a sample standard deviation of 11 pounds. Calculate a 99% confidence interval estimate of the true mean weight loss, over a three-week period, experienced by all persons attending the clinic. You may assume that the distribution of weight loss is normal.
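Both parts of Problem 4 rest on the sampling distribution of a mean, sketched below under stated assumptions. For 4a the interval is taken to be symmetric about the mean, which is an assumption since many pairs (a, b) capture 80% probability. For 4b the Student t critical value for 31 degrees of freedom is entered as a table lookup (approximately 2.744, also an assumption to be checked against your own table or software). The function and constant names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# 4a: the mean of n = 25 draws is Normal(mu, sigma / sqrt(n)).
mu, sigma, n = 190, sqrt(100), 25
se_mean = sigma / sqrt(n)
z = NormalDist().inv_cdf(0.90)  # leaves 10% in each tail (symmetric choice)
a, b = mu - z * se_mean, mu + z * se_mean

def t_interval(mean, sd, n, t_crit):
    """Confidence interval mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / sqrt(n)
    return mean - half_width, mean + half_width

# 4b: 99% interval; critical value taken from a t table (assumed value).
T_CRIT_995_DF31 = 2.744
lo, hi = t_interval(mean=30, sd=11, n=32, t_crit=T_CRIT_995_DF31)

print(f"4a: a = {a:.3f}, b = {b:.3f}")
print(f"4b: ({lo:.2f}, {hi:.2f})")
```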

5. (10 points total) An environmentalist is interested in determining if the pH of the creek water behind his house is affected by the new development upstream. He knows that a neutral stream has a pH of 7. He draws 16 samples of water and measures for each the pH. Is the pH of the creek statistically significantly different from neutral? Following are the data:

Sample    1    2    3    4    5    6    7    8
pH      7.5  7.6  7.1  6.2  6.3  6.9  7.1  7.3

Sample    9   10   11   12   13   14   15   16
pH      6.3  6.6  7.1  7.1  6.3  6.9  6.7  6.9

5a. (5 points) Using the critical region approach with type I error = 0.05, conduct a statistical significance test to evaluate this claim. You may assume normality. In reporting your answer, please provide
- The null and alternative hypotheses (1 point)
- The name of the test statistic used to develop the correct critical region (1 point)
- The values defining the critical region (1 point)
- The value of the test statistic (1 point)
- Interpretation of your findings (1 point)

5b. (5 points) What is the achieved level of significance (the p-value)?
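A one-sample, two-sided t-test is one natural reading of Problem 5, and the sketch below follows that reading. Because the Python standard library has no Student t distribution, the CDF is approximated by Simpson's rule applied to the t density; that numerical shortcut is a choice made for this sketch, not something from the exam. The critical value 2.131 for 15 degrees of freedom is a table lookup, and all names are illustrative.

```python
import math
import statistics

ph = [7.5, 7.6, 7.1, 6.2, 6.3, 6.9, 7.1, 7.3,
      6.3, 6.6, 7.1, 7.1, 6.3, 6.9, 6.7, 6.9]
mu0 = 7.0  # null value: neutral pH

n = len(ph)
df = n - 1
se = statistics.stdev(ph) / math.sqrt(n)
t_stat = (statistics.mean(ph) - mu0) / se

def t_pdf(x, df):
    """Density of the Student t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Approximate CDF: Simpson's rule on [0, |x|], then symmetry about 0."""
    h = abs(x) / steps
    total = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

p_value = 2 * (1 - t_cdf(abs(t_stat), df))

T_CRIT_975_DF15 = 2.131  # two-sided 0.05 critical value from a t table
reject_null = abs(t_stat) > T_CRIT_975_DF15

print(f"t = {t_stat:.4f}, df = {df}, p = {p_value:.3f}, reject = {reject_null}")
```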

6. (20 points total) It has been suggested that, compared to older children and adults, very young children have a higher metabolism that gives them more energy. Suppose we wish to investigate this hypothesis in a simple linear regression analysis. The data are pulse rates (Y) and age (X) for a sample of n=22 children. The following statistics have been calculated for you.

$$\sum_{i=1}^{22} x_i = 233 \qquad \sum_{i=1}^{22} x_i^2 = 3345$$

$$\sum_{i=1}^{22} y_i = 1725 \qquad \sum_{i=1}^{22} y_i^2 = 140{,}933$$

$$\sum_{i=1}^{22} x_i y_i = 16{,}748 \qquad n = 22$$

6a. (2 points) State the assumptions necessary for a simple linear model relating pulse rate (Y) and age (X).

6b. (2 points) Calculate the least squares estimate of the slope and intercept.

6c. (2 points) Complete the following analysis of variance table.

Source             DF      Sum of Squares   Mean Square   F-Ratio   p-value
Regression         ?___    ?___             ?___          ?___      ?___
Residual           ?___    ?___             ?___
Total, corrected   ?___    ?___

Tip!

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\,\bar{y}^2$$
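The least squares estimates and the analysis of variance entries in 6b and 6c can be built directly from the summary sums given above, using the same shortcut as the tip. The following is a minimal sketch with illustrative variable names. The p-value column is left to an F table or statistical software.

```python
n = 22
sum_x, sum_x2 = 233, 3345
sum_y, sum_y2 = 1725, 140933
sum_xy = 16748

x_bar, y_bar = sum_x / n, sum_y / n

# Corrected sums of squares and cross-products (shortcut form from the tip).
sxx = sum_x2 - n * x_bar**2
syy = sum_y2 - n * y_bar**2
sxy = sum_xy - n * x_bar * y_bar

# 6b: least squares slope and intercept.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# 6c: analysis of variance entries.
ss_total = syy
ss_regression = slope * sxy
ss_residual = ss_total - ss_regression
df_regression, df_residual, df_total = 1, n - 2, n - 1
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"SSR = {ss_regression:.2f}, SSE = {ss_residual:.2f}, F = {f_ratio:.2f}")
```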

6. (20 points total) - continued

6d. (2 points) Test the fitted model for statistical significance. In reporting your answer, please include statements of: the null and alternative hypotheses, the formula for the test statistic, the value of the test statistic, the p-value and most importantly, your interpretation!

6e. (2 points) Using the answer you obtained in #6b, what is the predicted pulse for an average 12 year-old child?

6f. (2 points) Consider your answer to #6e. What is the estimated standard error of the estimated mean prediction you obtained?

6g. (2 points) Using the answer you obtained in #6b, what is the predicted pulse for the individual John Smith who is 12 years old?

6h. (2 points) Consider your answer to #6g. What is the estimated standard error of the estimated individual prediction you obtained?

6i. (2 points) How does John Smith compare with other children his age if his actual pulse is 75?

6j. (2 points) Comment on the difference between the two predictions (questions #6e and #6g) and the two standard errors (questions #6f and #6h).
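Questions 6e through 6i turn on the difference between predicting a mean response and predicting an individual response at the same age. The sketch below recomputes the fit from the Problem 6 summary sums and then applies the two standard error formulas. It is illustrative rather than an official solution, and the variable names are assumptions.

```python
from math import sqrt

n = 22
sum_x, sum_x2, sum_y, sum_y2, sum_xy = 233, 3345, 1725, 140933, 16748
x_bar, y_bar = sum_x / n, sum_y / n
sxx = sum_x2 - n * x_bar**2
syy = sum_y2 - n * y_bar**2
sxy = sum_xy - n * x_bar * y_bar

slope = sxy / sxx
intercept = y_bar - slope * x_bar
ms_residual = (syy - slope * sxy) / (n - 2)

x0 = 12  # age of interest
predicted = intercept + slope * x0  # same point estimate for 6e and 6g

# 6f: standard error of the estimated MEAN pulse at age 12.
se_mean = sqrt(ms_residual * (1 / n + (x0 - x_bar) ** 2 / sxx))

# 6h: standard error for an INDIVIDUAL child at age 12 (adds the extra "1").
se_individual = sqrt(ms_residual * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx))

# 6i: John Smith's observed pulse relative to his prediction.
standardized_gap = (75 - predicted) / se_individual

print(f"predicted = {predicted:.2f}")
print(f"SE(mean) = {se_mean:.3f}, SE(individual) = {se_individual:.3f}")
print(f"standardized gap for pulse 75 = {standardized_gap:.3f}")
```

The point predictions coincide, while the individual standard error is larger because it also carries the variability of a single child about the line.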

7. (10 points total) A multiple linear regression analysis of n=19 cases of coronary artery disease investigated three predictors in relationship to Y = VO2 max.

X1 = maximal ejection fraction
X2 = maximal heart rate
X3 = maximal systolic blood pressure

Preliminary descriptive statistics on the 19 values of Y = VO2 max yielded a sample mean of 37.052 and a sample standard deviation s = 8.7017. Suppose several multiple predictor models are fit and you are given the following.

Predictors in the model    Sum of Squares Residual (due error)
X1, X2, X3                  790.76
X1, X2                      791.49
X1, X3                     1270.24
X2, X3                      814.16
X1                         1357.48
X2                          814.41
X3                         1281.19

7a. (2 points) Complete the following analysis of variance table by completing the 10 cells with “?___”

Source                         DF      SSQ     MSQ     F-Ratio   R2
Regression {X1, X2, X3}        ?___    ?___    ?___    ?___      ?___
Residual                       ?___    ?___    ?___
Total, corrected               ?___    ?___

7b. (3 points) Complete the following analysis of variance table by completing the 7 cells with “?___”

Source                         DF      SSQ
Regression   (X3)              1       ?___
             (X2 | X3)         1       ?___
             (X1 | X2, X3)     1       ?___
Residual                       ?___    ?___
Total, corrected               ?___    ?___
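Both tables in 7a and 7b can be filled in from the residual sums of squares listed for Problem 7 together with the sample standard deviation of Y. The sketch below is illustrative and uses the identity that the corrected total sum of squares equals (n - 1) times the sample variance. Dictionary keys and variable names are made up for the example.

```python
n = 19
s_y = 8.7017  # sample standard deviation of Y = VO2 max

# Residual sums of squares from the Problem 7 table (only those needed here).
sse = {
    ("X1", "X2", "X3"): 790.76,
    ("X2", "X3"): 814.16,
    ("X3",): 1281.19,
}

ss_total = (n - 1) * s_y**2
df_total = n - 1

# 7a: overall analysis of variance for the three-predictor model.
sse_full = sse[("X1", "X2", "X3")]
df_regression, df_residual = 3, n - 4
ss_regression = ss_total - sse_full
ms_regression = ss_regression / df_regression
ms_residual = sse_full / df_residual
f_ratio = ms_regression / ms_residual
r_squared = ss_regression / ss_total

# 7b: sequential sums of squares, entering X3, then X2, then X1.
ss_x3 = ss_total - sse[("X3",)]
ss_x2_given_x3 = sse[("X3",)] - sse[("X2", "X3")]
ss_x1_given_x2_x3 = sse[("X2", "X3")] - sse_full

print(f"SST = {ss_total:.2f}, SSR = {ss_regression:.2f}, F = {f_ratio:.3f}, "
      f"R2 = {r_squared:.4f}")
print(f"SS(X3) = {ss_x3:.2f}, SS(X2|X3) = {ss_x2_given_x3:.2f}, "
      f"SS(X1|X2,X3) = {ss_x1_given_x2_x3:.2f}")
```

The three sequential sums of squares add up to the regression sum of squares in the first table, which is a useful arithmetic check.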

7 - CONTINUED

7c. (5 points) Carry out the appropriate test to compare the following two models

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$$

versus

$$Y = \beta_0 + \beta_3 X_3 + E$$

In your answer, please indicate
7c (i). (1 point) The null and alternative hypotheses.
7c (ii). (1 point) The test statistic formula and its value for these data.
7c (iii). (1 point) The achieved level of significance (p-value).
7c (iv). (2 points) Interpretation of your findings.
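The comparison in 7c is a partial F-test of a full model against a nested reduced model. The sketch below uses the residual sums of squares from the Problem 7 table. For the p-value it relies on a closed form for the upper tail of the F distribution that holds only when the numerator has 2 degrees of freedom, which happens to be the case here. The function names are illustrative, and this is a sketch rather than an official solution.

```python
def partial_f(sse_reduced, sse_full, df_extra, df_residual_full):
    """F = [(SSE_reduced - SSE_full) / df_extra] / [SSE_full / df_residual_full]."""
    return ((sse_reduced - sse_full) / df_extra) / (sse_full / df_residual_full)

def f_upper_tail_numdf2(f, df_den):
    """P(F > f) for an F(2, df_den) variable; valid only for numerator df = 2."""
    return (1 + 2 * f / df_den) ** (-df_den / 2)

n = 19
sse_full = 790.76      # model with X1, X2, X3
sse_reduced = 1281.19  # model with X3 only

df_extra = 2               # beta1 and beta2 are set to zero under the null
df_residual_full = n - 4   # 19 observations, 4 estimated coefficients

f_stat = partial_f(sse_reduced, sse_full, df_extra, df_residual_full)
p_value = f_upper_tail_numdf2(f_stat, df_residual_full)

print(f"F({df_extra}, {df_residual_full}) = {f_stat:.3f}, p = {p_value:.4f}")
```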

8. (20 points total) Low birth weight is of concern because of its association with infant mortality and birth defects. A woman’s behavior during pregnancy (including diet, smoking habits, prenatal care) can affect the chances of carrying a baby to term and of delivering a baby of normal birth weight. The following is a code sheet of variables that were investigated in a multivariable regression analysis of n=189 birth weight outcomes. In these analyses the dependent variable is Y=BWT. The predictors of interest are AGE, LWT, SMOKE, PTL, HT, UI, and FTV.

Variable    Variable Name and Coding
Y = BWT     Birth weight (grams)
AGE         Age of mother (years)
LWT         Weight at last menstrual period (pounds)
SMOKE       Indicator smoked during pregnancy (1=yes, 0=no)
PTL         Indicator history of premature labor (1=yes, 0=no)
HT          Indicator of hypertension (1=yes, 0=no)
UI          Indicator of uterine irritability (1=yes, 0=no)
FTV         Number of visits to doctor during first trimester (integer, 0, 1, 2, etc)

Selected calculations have been performed for you.

                      LWT        Y = BWT
N of Cases            189        189
Minimum               80.000     709.000
Maximum               250.00     4990.000
Mean                  129.15     2944.656
Standard deviation    30.579     729.022

DEP VAR: BWT   N: 189   MULTIPLE R: .420   SQUARED MULTIPLE R: .176   ADJUSTED SQUARED MULTIPLE R: .144

Variable    Coefficient   Standard Error   T        P (2 tail)
CONSTANT    2508.341      294.508           8.517   0.000
AGE            4.692        9.722           0.483   0.630
LWT            4.276        1.725           2.479   0.014
SMOKE       -226.379      102.517          -2.208   0.028
PTL          -72.254      105.526          -0.685   0.494
HT          -642.251      209.345          -3.068   0.002
UI          -526.027      143.902          -3.655   0.000
FTV           -7.987       48.118          -0.166   0.868

8a (5 points) Complete the following analysis of variance table.

Source             Sum of Squares   DF      Mean Square      F-Ratio      P
Regression         ?___________     ?____   2,516,982.109    ?_________   ?_____
Residual           ?___________     ?____   ?____________
Total, corrected   ?___________

Hint: Notice that the standard deviation of Y = BWT = 729.022.

8b (5 points) What is the test statistic and p-value for the test of the global hypothesis that the fit of the multiple linear model explains a statistically significant greater proportion of the variability in BWT than is explained by the average BWT alone?

8c (10 points) On the next page are the results of two hierarchical multivariable models fit to the same data. The smaller model is a simple linear regression model with LWT as the predictor variable. The larger model is a 4 predictor model that contains LWT plus SMOKE, HT, and UI as predictor variables. Carry out an appropriate hypothesis test to determine which model should be reported.
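The table in 8a can be reconstructed from three pieces of information already given: the regression mean square, the number of predictors, and the standard deviation of BWT from the hint. The sketch below shows one way to do the arithmetic. The variable names are illustrative, and the p-value is left to an F table or statistical software.

```python
n = 189
n_predictors = 7          # AGE, LWT, SMOKE, PTL, HT, UI, FTV
sd_bwt = 729.022          # from the hint
ms_regression = 2516982.109

df_regression = n_predictors
df_residual = n - n_predictors - 1
df_total = n - 1

# Corrected total sum of squares = (n - 1) * sample variance of Y.
ss_total = df_total * sd_bwt**2
ss_regression = ms_regression * df_regression
ss_residual = ss_total - ss_regression
ms_residual = ss_residual / df_residual

# 8b: global F statistic, on (df_regression, df_residual) degrees of freedom.
f_ratio = ms_regression / ms_residual
r_squared = ss_regression / ss_total  # should agree with the printed .176

print(f"SSR = {ss_regression:,.1f}, SSE = {ss_residual:,.1f}, SST = {ss_total:,.1f}")
print(f"F({df_regression}, {df_residual}) = {f_ratio:.3f}, R2 = {r_squared:.3f}")
```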

Model 1.

DEP VAR: BWT   N: 189   MULTIPLE R: .186   SQUARED MULTIPLE R: .035   ADJUSTED SQUARED MULTIPLE R: .029

Variable    Coefficient   Standard Error   T        P (2 tail)
CONSTANT    2369.672      228.431          10.374   0.000
LWT            4.429        1.713           2.586   0.010

Source       Sum of Squares   DF    Mean Square      F-Ratio   P
Regression   3,448,881.301    1     3,448,881.301    6.686     .010
Residual     .964682E+08      187   515872.574

Model 2.

DEP VAR: BWT   N: 189   MULTIPLE R: .416   SQUARED MULTIPLE R: .173   ADJUSTED SQUARED MULTIPLE R: .155

Variable    Coefficient   Standard Error   T        P (2 tail)
CONSTANT    2575.769      226.819          11.356   0.000
LWT            4.510        1.660           2.716   0.007
SMOKE       -240.081      100.139          -2.397   0.018
HT          -649.271      206.349          -3.145   0.002
UI          -548.924      139.440          -3.937   0.000

Source       Sum of Squares   DF    Mean Square      F-Ratio   P
Regression   .173312E+08      4     4,332,798.603    9.653     .000
Residual     .825859E+08      184   448,836.186
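One way to compare Model 1 with Model 2 in 8c is a partial F-test for the three added predictors (SMOKE, HT, UI), using the residual sums of squares printed in the two outputs. The sketch below is illustrative. The sums of squares are entered at the precision shown in the output, and the p-value is left to an F table or software with (3, 184) degrees of freedom.

```python
# Residual sums of squares and degrees of freedom as printed in the output.
sse_model1, df_model1 = 0.964682e8, 187   # LWT only
sse_model2, df_model2 = 0.825859e8, 184   # LWT + SMOKE + HT + UI

df_extra = df_model1 - df_model2          # number of added predictors
ms_residual_full = sse_model2 / df_model2

# Partial F: reduction in residual SS per added predictor, over full-model MSE.
f_partial = ((sse_model1 - sse_model2) / df_extra) / ms_residual_full

print(f"F({df_extra}, {df_model2}) = {f_partial:.2f}")
```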

EXTRA CREDIT Up to 10 points, up to a maximum total exam score of 100

Radial keratotomy is a type of surgery performed to reduce myopia in nearsighted patients. The Prospective Evaluation of Radial Keratotomy (PERK) study was initiated in 1983 with the goal of investigating the effects of radial keratotomy. In one study the outcome of interest was Y = 5-year post surgical change in refractive error (diopters) in relationship to a hypothesized predictor X1 = baseline refractive error. A sample of n=54 was studied.

Now suppose we want to investigate whether the relationship of Y = 5-year post surgical change in refractive error (diopters) to X1 = baseline refractive error is different, depending on gender. To address this, two new variables are created, Z and X1Z:

Z = 1 if patient is male
  = 0 if patient is female

X1Z = (Z) * (X1)

Recall from class what this kind of new variable does:

X1Z = (Z) * (X1) = (1) * X1 if patient is male
                 = 0 if patient is female

The following two models were fit and yielded the following output.

Model 1: Y regressed on X1 and Z

y = 2.752647 - 0.309731*x1 - 0.412878*z

Source             df    Sum of squares   Mean square   F       p-value
Model              2     15.30101         7.65207       6.009   0.0045
Error              51    64.94880         1.27351
Total, corrected   53    80.25294

Model 2: Y regressed on X1 and Z and X1Z

y = 3.178210 - 0.201008*x1 - 1.995126*z - 0.383826*x1z

Source             df    Sum of squares   Mean square   F       p-value
Model              3     19.65170         6.55057       5.405   0.0027
Error              50    60.60124         1.21202
Total, corrected   53    80.25294

(2 points) State a single multiple linear regression model that defines straight-line models relating Y = 5-year post surgical change in refractive error (diopters) to X1 = baseline refractive error for both males and females. Be sure to define all terms.

(3 points) Using the output provided on the previous page, carry out the appropriate statistical test to test whether the lines for males and females coincide. In reporting your answer, be sure to state the null and alternative hypotheses, show your work, and interpret your results.

(3 points) Again using the output provided on the previous page, carry out the appropriate statistical test to test whether the lines for males and females are parallel. In reporting your answer, be sure to state the null and alternative hypotheses, show your work, and interpret your results.

(2 points) In 1-2 sentences at most, comment on the comparison of the straight-line models relating Y = 5-year post surgical change in refractive error (diopters) to X1 = baseline refractive error for both males and females.
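The interaction model implies one fitted line per gender, and the parallelism question is a partial F-test on the X1Z term. The sketch below illustrates both with the printed output. The helper `partial_f` is a hypothetical name. The coincidence test has the same form, but its reduced model contains X1 alone, and that model's error sum of squares is not among the printed output, so only the parallelism test is evaluated here.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """Partial F statistic for nested models, from error SS and error df."""
    df_extra = df_reduced - df_full
    return ((sse_reduced - sse_full) / df_extra) / (sse_full / df_full), df_extra

# Fitted interaction model: y = b0 + b1*x1 + b2*z + b3*x1z, with z = 1 for males.
b0, b1, b2, b3 = 3.178210, -0.201008, -1.995126, -0.383826

female_intercept, female_slope = b0, b1            # z = 0
male_intercept, male_slope = b0 + b2, b1 + b3      # z = 1

# Parallel lines: H0 is that the X1Z coefficient is zero.
# Reduced model = Model 1 (X1, Z); full model = Model 2 (X1, Z, X1Z).
f_parallel, df_num = partial_f(sse_reduced=64.94880, df_reduced=51,
                               sse_full=60.60124, df_full=50)

print(f"females: y = {female_intercept:.4f} + ({female_slope:.4f}) * x1")
print(f"males:   y = {male_intercept:.4f} + ({male_slope:.4f}) * x1")
print(f"parallelism: F({df_num}, 50) = {f_parallel:.3f}")
```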