

Page 1: AMS394.01 Practice Midterm Fall 2015 Name: ID: Signature: …zhu/ams394/PracticeMidterm_SAS.pdf · 2015-10-29 · The following SAS data step inputs a two-way ANOVA data set examining

AMS394.01 Practice Midterm Fall 2015

Name: ____________________________ ID: ___________________ Signature: ___________________

Instruction: This is an open book exam. However, no communication is allowed between students. Please provide complete solutions for full credit. Good luck!

1. We want to test the relative durability of 4 different surface coatings for optical lenses. The durability test involves subjecting a coated lens to 150 cycles of abrasion. The response variable is a measure of the increase in lens haziness. Please write up the SAS code and the R code to do the following. In addition, please provide the output and a summary of your tests/plots using one of these two programs:

(1) We are testing the two hypotheses

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ vs. $H_a$: At least one of the means differs from the others.

(2) Please include the follow-up tests for detecting specific differences among the means.

(3) Please also include the side-by-side boxplot to check for homogeneity of variances, and a residual plot.

(4) Please conduct a usual t-test to compare the mean haziness between coatings 1 and 2.

Solution:

data one;

input coating haziness;

label coating = "Lens Surface Coating"

haziness = "Lens Haziness after Abrasion";

datalines;

1 8.52

1 9.21

1 10.45

1 10.23

1 8.75

1 9.32

1 9.65

2 12.50

2 11.84

2 12.69

2 12.43

2 12.78

2 13.15

2 12.89

3 8.45

3 10.89

3 11.49

3 12.87

3 14.52

3 13.94

3 13.16

4 10.73

4 8.00

4 9.75

4 8.71

4 10.45

4 11.38

4 11.35

;

Run;

proc boxplot data=one;

plot haziness*coating;

title "Side-by-Side Boxplots of Response Variable";

title2 "by Levels of Treatment";

Run;

Proc glm data=one;

class coating;

model haziness = coating;

lsmeans coating /out=outmns;

means coating / cldiff bon;

output out=resout p=preds rstudent=exstdres;

title "Analysis of Variance for Optical Lens Surface Coatings";

title2 "With Follow-Up Tests";

Run;

Quit;

title 'Profile Plot';

symbol i=j;

proc gplot data=outmns;

where coating ne .;

plot lsmean*coating;

run;

quit;

goptions reset=all;

title 'Residual Plot';

proc gplot data=resout;

plot exstdres*preds;

run; quit;

data two;

input coating haziness;

label coating = "Lens Surface Coating"

haziness = "Lens Haziness after Abrasion";

datalines;

1 8.52

1 9.21

1 10.45

1 10.23

1 8.75

1 9.32

1 9.65

2 12.50

2 11.84

2 12.69

2 12.43

2 12.78

2 13.15

2 12.89

;

Run;

Proc univariate data=two normal;

Class coating;

Var haziness;

Title 'check for normality';

Run;

Proc ttest data=two;

Class coating;

Var haziness;

Title 'Independent samples t-test';

Run;

Selected output and summary:

(1) The GLM Procedure

Dependent Variable: haziness Lens Haziness after Abrasion

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3       51.06744286    17.02248095      10.12    0.0002
Error              24       40.35205714     1.68133571
Corrected Total    27       91.41950000

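Summary: Since F = 10.12 with p-value 0.0002 < 0.05, we reject the ANOVA null hypothesis and conclude that at least one coating mean differs from the others.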
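The sums of squares and the F value in the ANOVA table can be cross-checked by hand. The following is a minimal sketch in Python (not one of the two programs the exam asks for), using only the standard library. The function name `one_way_anova` and the dictionary `haziness` are illustrative names, not part of the original solution; the p-value is not recomputed here.

```python
# Cross-check of the one-way ANOVA table for the lens coating data.
# Data are copied from the SAS data step "one".
haziness = {
    1: [8.52, 9.21, 10.45, 10.23, 8.75, 9.32, 9.65],
    2: [12.50, 11.84, 12.69, 12.43, 12.78, 13.15, 12.89],
    3: [8.45, 10.89, 11.49, 12.87, 14.52, 13.94, 13.16],
    4: [10.73, 8.00, 9.75, 8.71, 10.45, 11.38, 11.35],
}


def one_way_anova(groups):
    """Return (ss_model, ss_error, df_model, df_error, f_value)."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(values) / len(values)
    ss_model = 0.0
    ss_error = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        # Between-group part: group size times squared distance to the grand mean.
        ss_model += len(ys) * (group_mean - grand_mean) ** 2
        # Within-group part: squared deviations from the group's own mean.
        ss_error += sum((y - group_mean) ** 2 for y in ys)
    df_model = len(groups) - 1
    df_error = len(values) - len(groups)
    f_value = (ss_model / df_model) / (ss_error / df_error)
    return ss_model, ss_error, df_model, df_error, f_value


ss_model, ss_error, df_model, df_error, f_value = one_way_anova(haziness)
print(f"Model: SS = {ss_model:.4f}, df = {df_model}")
print(f"Error: SS = {ss_error:.4f}, df = {df_error}")
print(f"F = {f_value:.2f}")
```

The printed values should agree with the Model and Error rows of the GLM output above.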

(2) The GLM Procedure

Bonferroni (Dunn) t Tests for haziness

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type

II error rate than Tukey's for all pairwise comparisons.

Alpha 0.05

Error Degrees of Freedom 24

Error Mean Square 1.681336

Critical Value of t 2.87509

Minimum Significant Difference 1.9927

Comparisons significant at the 0.05 level are indicated by ***.

Difference

coating Between Simultaneous 95%

Comparison Means Confidence Limits

2 - 3 0.4229 -1.5699 2.4156

2 - 4 2.5586 0.5659 4.5513 ***

2 - 1 3.1643 1.1716 5.1570 ***

3 - 2 -0.4229 -2.4156 1.5699

3 - 4 2.1357 0.1430 4.1284 ***

3 - 1 2.7414 0.7487 4.7341 ***

4 - 2 -2.5586 -4.5513 -0.5659 ***

4 - 3 -2.1357 -4.1284 -0.1430 ***

4 - 1 0.6057 -1.3870 2.5984

1 - 2 -3.1643 -5.1570 -1.1716 ***

1 - 3 -2.7414 -4.7341 -0.7487 ***

1 - 4 -0.6057 -2.5984 1.3870

Summary: The pairwise comparisons show that the coating pairs 1/2, 1/3, 2/4, and 3/4 are significantly different at the familywise error rate of 0.05, while the pairs 2/3 and 1/4 are not. Note that although we used the Bonferroni method here as an example, the Tukey method is less conservative for all pairwise comparisons and is generally preferred.
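The minimum significant difference in the Bonferroni output follows from the critical t value and the error mean square: MSD = t * sqrt(2 * MSE / n) for equal group sizes n. Below is a small Python sketch of that arithmetic, assuming the critical value 2.87509 and the error mean square 1.681336 are taken directly from the SAS output rather than recomputed (the standard library has no t quantile function). The variable names are illustrative.

```python
import math
from itertools import combinations

# Data copied from the SAS data step "one".
haziness = {
    1: [8.52, 9.21, 10.45, 10.23, 8.75, 9.32, 9.65],
    2: [12.50, 11.84, 12.69, 12.43, 12.78, 13.15, 12.89],
    3: [8.45, 10.89, 11.49, 12.87, 14.52, 13.94, 13.16],
    4: [10.73, 8.00, 9.75, 8.71, 10.45, 11.38, 11.35],
}

MSE = 1.681336    # "Error Mean Square" from the GLM output
T_CRIT = 2.87509  # "Critical Value of t" from the GLM output (assumed, not recomputed)
N_PER_GROUP = 7   # every coating has 7 lenses

# Minimum significant difference for equal group sizes.
msd = T_CRIT * math.sqrt(2 * MSE / N_PER_GROUP)

means = {coating: sum(ys) / len(ys) for coating, ys in haziness.items()}

# A pair is flagged when the absolute mean difference exceeds the MSD.
significant = [
    (a, b)
    for a, b in combinations(sorted(means), 2)
    if abs(means[a] - means[b]) > msd
]

print(f"MSD = {msd:.4f}")
print("Significant pairs:", significant)
```

The flagged pairs correspond to the rows marked `***` in the SAS comparison table.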

(3)

[Figure: Side-by-Side Boxplots of Response Variable by Levels of Treatment. x-axis: Lens Surface Coating (1 to 4); y-axis: Lens Haziness after Abrasion (axis ticks 8 to 16).]

[Figure: Residual Plot. x-axis: preds (predicted values, axis ticks 9 to 13); y-axis: exstdres (studentized residuals from rstudent=, axis ticks -4 to 3).]

Summary: The boxplots raise concern about the equal-variance assumption, since the spread of haziness differs visibly across coatings. The residual plot shows some sign of unequal variance too.
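The visual impression from the boxplots can be backed by the per-coating sample standard deviations. The sketch below is a Python illustration, not part of the original solution. Comparing the largest and smallest standard deviation against a ratio of about 2 is only a common rule of thumb, assumed here for illustration, and is not a formal test.

```python
from statistics import stdev

# Data copied from the SAS data step "one".
haziness = {
    1: [8.52, 9.21, 10.45, 10.23, 8.75, 9.32, 9.65],
    2: [12.50, 11.84, 12.69, 12.43, 12.78, 13.15, 12.89],
    3: [8.45, 10.89, 11.49, 12.87, 14.52, 13.94, 13.16],
    4: [10.73, 8.00, 9.75, 8.71, 10.45, 11.38, 11.35],
}

# Sample standard deviation (n - 1 denominator) within each coating.
sds = {coating: stdev(ys) for coating, ys in haziness.items()}

# Ratio of the largest to the smallest standard deviation.
sd_ratio = max(sds.values()) / min(sds.values())

for coating, sd in sds.items():
    print(f"coating {coating}: sd = {sd:.4f}")
print(f"largest / smallest sd = {sd_ratio:.2f}")
```

The standard deviations for coatings 1 and 2 should match the Std Dev column of the TTEST output in part (4).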

(4)

The UNIVARIATE Procedure

Variable: haziness (Lens Haziness after Abrasion)

coating = 1

Tests for Normality

Test --Statistic--- -----p Value------

Shapiro-Wilk W 0.953597 Pr < W 0.7623

Kolmogorov-Smirnov D 0.148529 Pr > D >0.1500

Cramer-von Mises W-Sq 0.027158 Pr > W-Sq >0.2500

Anderson-Darling A-Sq 0.196002 Pr > A-Sq >0.2500

The UNIVARIATE Procedure

Variable: haziness (Lens Haziness after Abrasion)

coating = 2

Tests for Normality

Test --Statistic--- -----p Value------

Shapiro-Wilk W 0.949828 Pr < W 0.7281

Kolmogorov-Smirnov D 0.188846 Pr > D >0.1500

Cramer-von Mises W-Sq 0.036567 Pr > W-Sq >0.2500

Anderson-Darling A-Sq 0.251295 Pr > A-Sq >0.2500

Summary: The Shapiro-Wilk test does not reject normality for either sample (p = 0.7623 for coating 1 and p = 0.7281 for coating 2), so we can proceed with the independent samples t-test.

The TTEST Procedure

Variable: haziness (Lens Haziness after Abrasion)

coating N Mean Std Dev Std Err Minimum Maximum

1 7 9.4471 0.7162 0.2707 8.5200 10.4500

2 7 12.6114 0.4169 0.1576 11.8400 13.1500

Diff (1-2) -3.1643 0.5860 0.3132

coating Method Mean 95% CL Mean Std Dev 95% CL Std Dev

1 9.4471 8.7848 10.1095 0.7162 0.4615 1.5771

2 12.6114 12.2259 12.9970 0.4169 0.2686 0.9180

Diff (1-2) Pooled -3.1643 -3.8467 -2.4818 0.5860 0.4202 0.9673

Diff (1-2) Satterthwaite -3.1643 -3.8657 -2.4629

Method Variances DF t Value Pr > |t|

Pooled Equal 12 -10.10 <.0001

Satterthwaite Unequal 9.6468 -10.10 <.0001

Equality of Variances

Method Num DF Den DF F Value Pr > F

Folded F 6 6 2.95 0.2135

Summary: The folded F test does not reject the equality of variances (F = 2.95, p = 0.2135). Therefore we adopt the pooled-variance t-test (t = -10.10, df = 12, p < .0001) and find a significant difference in mean lens haziness between coatings 1 and 2.
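The test statistics in the TTEST output can be reproduced from the two samples. The following Python sketch (standard library only; the function name `two_sample_t` is illustrative) computes the pooled t statistic, the Satterthwaite degrees of freedom, and the folded F statistic. The p-values are not recomputed.

```python
import math
from statistics import mean, variance

# Data copied from the SAS data step "two".
coating1 = [8.52, 9.21, 10.45, 10.23, 8.75, 9.32, 9.65]
coating2 = [12.50, 11.84, 12.69, 12.43, 12.78, 13.15, 12.89]


def two_sample_t(x, y):
    """Pooled and Satterthwaite t statistics plus the folded F statistic."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    diff = mean(x) - mean(y)

    # Pooled-variance (equal variances) t test.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Satterthwaite (unequal variances) t test and its approximate df.
    t_unequal = diff / math.sqrt(v1 / n1 + v2 / n2)
    df_unequal = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )

    # Folded F: larger sample variance over the smaller one.
    folded_f = max(v1, v2) / min(v1, v2)
    return {
        "diff": diff,
        "t_pooled": t_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_unequal": t_unequal,
        "df_unequal": df_unequal,
        "folded_f": folded_f,
    }


result = two_sample_t(coating1, coating2)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Because both groups have 7 observations, the pooled and Satterthwaite t statistics coincide; only the degrees of freedom differ.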

2. The following SAS data step inputs a two-way ANOVA data set examining the relationship between crop density, amount of fertilizer, and crop yield. Please write up the SAS code and the R code to do the following. In addition, please provide the output and a summary of your tests/plots using one of these two programs:

(1) We are testing the ANOVA hypotheses of (a) no interaction, (b) density main effect, and (c)

fertilizer main effect.

(2) Please include the follow-up tests for detecting specific differences among the means.

(3) Please also include the side-by-side boxplot to check for homogeneity of variances, and a residual plot.

PROC FORMAT;

VALUE den 1='regular' 2='thick';

VALUE fert 1='low' 2='medium' 3='high';

RUN;

DATA soybean(DROP=rep);

FORMAT density den. fertilizer fert.;

DO fertilizer = 1 TO 3;

DO density = 1 TO 2;

DO rep = 1 TO 4;

INPUT yield @@;

OUTPUT;

END;

END;

END;

DATALINES;

37.5 36.5 38.6 36.5 37.4 35.0 38.1 36.5

48.1 48.3 48.6 46.4 36.7 36.4 39.3 37.5

48.5 46.1 49.1 48.2 45.7 45.7 48.0 46.4

;

Run;

Proc sort data=soybean;

By fertilizer;

Run;

proc boxplot data=soybean;

plot yield*fertilizer;

title "Side-by-Side Boxplot of Response Variable";

title2 "by Levels of fertilizer";

Run;

Proc sort data=soybean;

By density;

Run;

proc boxplot data=soybean;

plot yield*density;

title "Side-by-Side Boxplot of Response Variable";

title2 "by Levels of density";

Run;

TITLE3 'Tests for Interaction & Main Effects';

PROC GLM DATA=soybean ORDER=INTERNAL;

CLASS density fertilizer;

MODEL yield = density | fertilizer;

lsmeans density fertilizer density*fertilizer /out=outmns;

means density fertilizer /cldiff bon;

output out=resout p=preds rstudent=exstdres;

RUN;

Quit;

title 'Profile/Interaction Plots';

symbol i=j;

proc gplot data=outmns;

where fertilizer ne . and density ne .;

plot lsmean*density=fertilizer;

plot lsmean*fertilizer=density;

run; quit;

goptions reset=all; *resets PROC GPLOT options;

title 'Residual Plot';

proc gplot data=resout;

plot exstdres*preds;

run; quit;

Selected output and summary:

(1)

Source DF Type III SS Mean Square F Value Pr > F

density 1 102.9204167 102.9204167 74.01 <.0001

fertilizer 2 417.7733333 208.8866667 150.20 <.0001

density*fertilizer 2 117.5633333 58.7816667 42.27 <.0001

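Summary: The interaction and both main effects are significant (all p < .0001). Because the density*fertilizer interaction is significant, the main effects should be interpreted with care.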
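The Type III table can be cross-checked by hand because the design is balanced (4 replicates in each of the 6 cells), so the sums of squares reduce to the usual textbook formulas. The Python sketch below relies on that balance assumption and would not reproduce Type III sums of squares for unbalanced data. The cell labels follow the nested DO loops and PROC FORMAT in the data step; the variable names are illustrative.

```python
# Cells keyed by (fertilizer, density), in the order read by the DATA step.
cells = {
    ("low", "regular"): [37.5, 36.5, 38.6, 36.5],
    ("low", "thick"): [37.4, 35.0, 38.1, 36.5],
    ("medium", "regular"): [48.1, 48.3, 48.6, 46.4],
    ("medium", "thick"): [36.7, 36.4, 39.3, 37.5],
    ("high", "regular"): [48.5, 46.1, 49.1, 48.2],
    ("high", "thick"): [45.7, 45.7, 48.0, 46.4],
}


def mean(xs):
    return sum(xs) / len(xs)


all_values = [y for ys in cells.values() for y in ys]
grand_mean = mean(all_values)


def marginal_ss(index):
    """Sum of squares for one factor (index 0: fertilizer, 1: density)."""
    levels = {}
    for key, ys in cells.items():
        levels.setdefault(key[index], []).extend(ys)
    ss = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in levels.values())
    return ss, len(levels) - 1


ss_fert, df_fert = marginal_ss(0)
ss_density, df_density = marginal_ss(1)

# Between-cell sum of squares; the interaction is what the main effects leave over.
ss_cells = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in cells.values())
ss_inter = ss_cells - ss_fert - ss_density
df_inter = df_fert * df_density

# Within-cell (error) sum of squares and mean square.
ss_error = sum((y - mean(ys)) ** 2 for ys in cells.values() for y in ys)
df_error = len(all_values) - len(cells)
mse = ss_error / df_error

f_density = (ss_density / df_density) / mse
f_fert = (ss_fert / df_fert) / mse
f_inter = (ss_inter / df_inter) / mse

print(f"density:            SS = {ss_density:.4f}, F = {f_density:.2f}")
print(f"fertilizer:         SS = {ss_fert:.4f}, F = {f_fert:.2f}")
print(f"density*fertilizer: SS = {ss_inter:.4f}, F = {f_inter:.2f}")
print(f"MSE = {mse:.6f} on {df_error} df")
```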

(2)

Bonferroni (Dunn) t Tests for yield

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type

II error rate than Tukey's for all pairwise comparisons.

Alpha 0.05

Error Degrees of Freedom 18

Error Mean Square 1.390694

Critical Value of t 2.10092

Minimum Significant Difference 1.0115

Comparisons significant at the 0.05 level are indicated by ***.

Difference

density Between Simultaneous 95%

Comparison Means Confidence Limits

regular - thick 4.1417 3.1302 5.1531 ***

thick - regular -4.1417 -5.1531 -3.1302 ***

Bonferroni (Dunn) t Tests for yield

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type

II error rate than Tukey's for all pairwise comparisons.

Alpha 0.05

Error Degrees of Freedom 18

Error Mean Square 1.390694

Critical Value of t 2.63914

Minimum Significant Difference 1.5561

Comparisons significant at the 0.05 level are indicated by ***.

Difference

fertilizer Between Simultaneous 95%

Comparison Means Confidence Limits

high - medium 4.5500 2.9939 6.1061 ***

high - low 10.2000 8.6439 11.7561 ***

medium - high -4.5500 -6.1061 -2.9939 ***

medium - low 5.6500 4.0939 7.2061 ***

low - high -10.2000 -11.7561 -8.6439 ***

low - medium -5.6500 -7.2061 -4.0939 ***

Summary: the pairwise comparisons show that all pairs are significantly

different from each other in means.

(3)

[Figure: Side-by-Side Boxplot of Response Variable by Levels of fertilizer. x-axis: fertilizer (low, medium, high); y-axis: yield (axis ticks 35.0 to 50.0).]

[Figure: Side-by-Side Boxplot of Response Variable by Levels of density. x-axis: density (regular, thick); y-axis: yield (axis ticks 35.0 to 50.0).]

[Figure: Residual Plot. x-axis: preds (predicted values, axis ticks 36 to 48); y-axis: exstdres (studentized residuals from rstudent=, axis ticks -2 to 2).]

Summary: The boxplots raise concern about the equal-variance assumption across fertilizer levels, but not across density levels. The residual plot shows no obvious problem.

3. The following dataset examines the relationship between time to

headache relief, and the three brands of pain killers. Please use the

REGRESSION procedures in SAS and R to analyze this data set.

(1) Please write down the program for both SAS and R, and use one of

these two programs to analyze the data.

(2) Please include necessary plots and analyses to verify the

underlying model assumptions.

(3) Please include your output and summary of results.

Data three;

Input BRAND RELIEF;

Dummy1= 0;

Dummy2= 0;

If brand=1 then dummy1=1;

If brand=2 then dummy2=1;

Datalines;

1 24.5

1 23.5

1 26.4

1 27.1

1 29.9

2 28.4

2 34.2

2 29.5

2 32.2

2 30.1

3 26.1

3 28.3

3 24.3

3 26.2

3 27.8

;

Run;

Proc print data=three;

Run;

proc boxplot data=three;

plot relief*brand;

title "Side-by-Side Boxplots of Response Variable";

title2 "by brands of Treatment";

Run;

Proc glm data=three;

class brand;

model relief = brand;

lsmeans brand /out=outmns;

means brand / cldiff bon;

output out=resout p=preds rstudent=exstdres;

title "Analysis of Variance for Pain Relief by Drug Brands";

title2 "With Follow-Up Tests";

Run;

Quit;

Proc reg data=three;

model relief = dummy1 dummy2;

Run;

Quit;

title 'Profile Plot';

symbol i=j;

proc gplot data=outmns;

where brand ne .;

plot lsmean*brand;

run;

quit;

goptions reset=all;

title 'Residual Plot';

proc gplot data=resout;

plot exstdres*preds;

run; quit;

Selected output and summary:

Dear students, the only difference between what is required in this problem and what is required in Problem 1 is that I need you to write down the general linear model. This can be accomplished by setting up the dummy variables and then running the regression on the dummy variables directly. There are other approaches, but we show the easiest one here. To save time, I will only show the part that differs.

Obs BRAND RELIEF Dummy1 Dummy2

1 1 24.5 1 0

2 1 23.5 1 0

3 1 26.4 1 0

4 1 27.1 1 0

5 1 29.9 1 0

6 2 28.4 0 1

7 2 34.2 0 1

8 2 29.5 0 1

9 2 32.2 0 1

10 2 30.1 0 1

11 3 26.1 0 0

12 3 28.3 0 0

13 3 24.3 0 0

14 3 26.2 0 0

15 3 27.8 0 0

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              26.54000           0.96720      27.44      <.0001
Dummy1        1              -0.26000           1.36782      -0.19      0.8524
Dummy2        1               4.34000           1.36782       3.17      0.0080

Summary: Here you see the dataset with the two dummy variables. The

estimated general linear model is:

$\hat{Y} = 26.54 - 0.26 \cdot \text{dummy1} + 4.34 \cdot \text{dummy2}$
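The coefficients in the Parameter Estimates table can be cross-checked by solving the least-squares normal equations for the dummy-coded design. The Python sketch below uses only the standard library; the helper `solve` (Gaussian elimination) and the variable names are illustrative and not part of the original solution. With brand 3 as the baseline, the intercept should equal the brand 3 mean, and each dummy coefficient should equal that brand's mean minus the brand 3 mean.

```python
# Data copied from the SAS data step "three".
relief = {
    1: [24.5, 23.5, 26.4, 27.1, 29.9],
    2: [28.4, 34.2, 29.5, 32.2, 30.1],
    3: [26.1, 28.3, 24.3, 26.2, 27.8],
}

# Design matrix rows: [intercept, dummy1, dummy2], brand 3 is the baseline.
rows, y = [], []
for brand, times in relief.items():
    for t in times:
        rows.append([1.0, float(brand == 1), float(brand == 2)])
        y.append(t)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Normal equations: (X'X) beta = X'y.
p = 3
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
intercept, b_dummy1, b_dummy2 = solve(xtx, xty)

brand_means = {brand: sum(ts) / len(ts) for brand, ts in relief.items()}

print(f"intercept = {intercept:.2f}")
print(f"dummy1    = {b_dummy1:.2f}")
print(f"dummy2    = {b_dummy2:.2f}")
```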