homepage.divms.uiowa.edu/~rdecook/stat5201/notes/2-11a_modeling_trends.pdf - STAT:5201 Applied Statistics II


Post on 17-Oct-2020


[Pages 1 to 5 of the source are handwritten lecture notes that did not survive text extraction. The legible fragments suggest they motivate treating factor levels as doses (referring to OLRT), set up the full vs. reduced model F-test used in the printed examples, and end by pointing to the SAS example "1-way ANOVA or dosage model?".]

STAT:5201 Applied Statistics II (1-way ANOVA or dosage model?)

One-Factor experiment with levels defined as doses
Factor: Power (160 W, 180 W, 200 W, 220 W)

Response: Etch rate in Å/minute

Five circuit wafers are randomly assigned to each power level. This is a completely randomized design (CRD).

SAS Program Part 1:

data etch;
input power rate;
cards;
160 575
160 542
160 530
160 539
160 570
180 565
180 593
180 590
180 579
180 610
200 600
200 651
200 610
200 637
200 629
220 725
220 700
220 715
220 685
220 710
;

/* Plot the observed data with side-by-side box plots, manually requesting. */
SYMBOL1 i=box bwidth=4 c=black;
proc gplot data=etch;
plot rate*power/haxis=155 to 225;
run;

Fit a 1-way ANOVA, the most complex model to be fitted:

/* PROC GLM provides the boxplot for you when 'power' is a class variable. */
proc glm data=etch;
class power;
model rate=power;
run;


The GLM Procedure

Dependent Variable: rate

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3       66870.55000    22290.18333      66.80    <.0001
Error              16        5339.20000      333.70000
Corrected Total    19       72209.75000
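As a cross-check on the ANOVA table above, the sums of squares can be recomputed directly from the 20 observations. The following is a minimal Python sketch, not part of the original SAS workflow; the function name `one_way_anova` and the dictionary layout are illustrative choices.

```python
# Etch-rate data from the SAS data step: power (W) -> etch rates
etch = {
    160: [575, 542, 530, 539, 570],
    180: [565, 593, 590, 579, 610],
    200: [600, 651, 610, 637, 629],
    220: [725, 700, 715, 685, 710],
}


def one_way_anova(groups):
    """Return (ss_model, ss_error, df_model, df_error, f_value) for a one-way layout."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_obs) / len(all_obs)
    ss_model = 0.0
    ss_error = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        # Between-group part: each group mean vs. the grand mean
        ss_model += len(ys) * (mean - grand_mean) ** 2
        # Within-group part: each observation vs. its own group mean
        ss_error += sum((y - mean) ** 2 for y in ys)
    df_model = len(groups) - 1
    df_error = len(all_obs) - len(groups)
    f_value = (ss_model / df_model) / (ss_error / df_error)
    return ss_model, ss_error, df_model, df_error, f_value


ss_model, ss_error, df_model, df_error, f_value = one_way_anova(etch)
print(f"Model SS = {ss_model:.2f} on {df_model} df")
print(f"Error SS = {ss_error:.2f} on {df_error} df")
print(f"F = {f_value:.2f}")
```

The printed values should agree with the Model and Error rows of the PROC GLM table.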

We can compare the 1-way ANOVA model (whose fitted values are the same as the fitted cubic polynomial in this case) to the quadratic (next simpler polynomial model) using a full vs. reduced test and the SSE for each respective model.

H0: quadratic model is sufficient
HA: the cubic model is justified

Fit a model that is quadratic in power:

proc glm data=etch; /*notice that power is no longer in the class statement.*/

model rate=power power*power; /*to square a variable in SAS use the ‘*’. */

output out=diagnostics p=predicted r=residual;

run;

The GLM Procedure

Dependent Variable: rate

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2       66433.74000    33216.87000      97.76    <.0001
Error              17        5776.01000      339.76529
Corrected Total    19       72209.75000


For both models, the Total Sum of Squares (corrected) is the same at 72209.75. We will use the SSE from each model to perform the ‘lack of fit (LOF)’ F-test.

$$F_0 = \frac{\left(SSE_{\text{quadratic}} - SSE_{\text{1-way ANOVA}}\right)/1}{MSE_{\text{1-way ANOVA}}} = \frac{5776.01 - 5339.2}{333.7} = 1.31$$

This F ratio is very close to 1 and is not significant (p-value = 0.2692) when considering the null distribution of the test statistic, F(1,16). Thus, we do not reject H0: we keep the reduced (quadratic) model rather than the overly complex 1-way ANOVA model (four separately estimated means).
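The arithmetic of this full vs. reduced comparison can be written as a small helper. This is a sketch in Python rather than SAS, and the function name `full_vs_reduced_f` is hypothetical. It returns only the F statistic; a p-value would need an F distribution function from a statistics library, which is not used here.

```python
def full_vs_reduced_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic comparing a reduced model to a full model that contains it.

    df_reduced and df_full are the ERROR degrees of freedom of each model.
    """
    extra_params = df_reduced - df_full   # numerator degrees of freedom
    mse_full = sse_full / df_full         # denominator: MSE of the full model
    return ((sse_reduced - sse_full) / extra_params) / mse_full


# Quadratic (reduced) vs. 1-way ANOVA (full), SSE values taken from the SAS output
f_lof = full_vs_reduced_f(5776.01, 17, 5339.20, 16)
print(f"F0 = {f_lof:.2f} on (1, 16) df")
```

Passing error degrees of freedom rather than parameter counts keeps the helper usable for any pair of nested models, since the numerator degrees of freedom are just their difference.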

Fit a model that is linear in power:

proc glm data=etch;

model rate=power; /*notice power is not listed as a class variable.*/

run;

The GLM Procedure

Dependent Variable: rate

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1       63857.29000    63857.29000     137.62    <.0001
Error              18        8352.46000      464.02556
Corrected Total    19       72209.75000

Comparing the linear and quadratic fit: For both models, the Total Sum of Squares (corrected) is the same at 72209.75. We can use the SSE from each model to perform an F-test for

H0: linear model is sufficient
HA: inclusion of a quadratic term is justified

$$F_0 = \frac{\left(SSE_{\text{linear}} - SSE_{\text{quadratic}}\right)/1}{MSE_{\text{quadratic}}} = \frac{8352.46 - 5776.01}{339.77} = 7.58$$

The p-value is P(F(1,17) > 7.58) = 0.0136, so we reject H0 and choose the quadratic model for the data. The use of the extra parameter in the mean structure is worth it.
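Because the four power levels are equally spaced and equally replicated, the SSE differences used in the two F-tests can also be obtained from orthogonal polynomial contrasts. The notes do not use this technique; the Python sketch below, with illustrative names, is offered only as a cross-check. The group means are computed from the data in the SAS data step, and the contrast coefficients are the standard ones for four equally spaced levels.

```python
# Group means of etch rate at 160, 180, 200, 220 W (5 wafers each)
means = [551.2, 587.4, 625.4, 707.0]
n_per_group = 5

# Orthogonal polynomial contrast coefficients for 4 equally spaced levels
contrasts = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}


def contrast_ss(coefs, group_means, n):
    """Sum of squares (1 df) for a contrast among equally replicated group means."""
    estimate = sum(c * m for c, m in zip(coefs, group_means))
    return n * estimate ** 2 / sum(c * c for c in coefs)


ss = {name: contrast_ss(c, means, n_per_group) for name, c in contrasts.items()}
for name, value in ss.items():
    print(f"{name:9s} SS = {value:.2f}")
```

The linear SS matches the Model SS of the linear fit (63857.29), the quadratic SS equals 8352.46 - 5776.01 = 2576.45, and the cubic SS equals 5776.01 - 5339.20 = 436.81, so the three pieces add up to the 1-way ANOVA Model SS.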


Diagnostic plots from quadratic fit.

And it looks like PROC GLM will also automatically give you a fitted curve with confidence bands (perceiving ‘power’ as a continuous variable).

[Handwritten notes on the next page of the source are not legible in this transcript.]

STAT:5201 Applied Statistics II (Transformation in a dose-response setting)

One-Factor experiment with levels as doses
Factor: Number of passes (0, 25, 75, 200, 500)

Response: Height (cm)

To study how resistant different types of vegetation are to 'trampling', park researchers randomly assigned 20 trails to one of five levels of 'trampling'. 'Trampling' was quantified based on the number of walking passes that were taken over the trail. Trails either received 0, 25, 75, 200, or 500 walking passes. One year later, the average height of the vegetation on the trail was measured as the response. This is a completely randomized design (CRD).

(This is Problem 3.3 on p.62 in OLRT).

SAS Part 1:

proc import datafile="Y:\Iowa_classes\Stat_5201_Design\2-18_dose-response\passes_height.csv"
out=passht
dbms=CSV
replace;
run;

proc print data=passht;
run;

Obs    Passes    Height
  1         0      20.7
  2         0      15.9
  3         0      17.8
  4         0      17.6
  5        25      12.9
  6        25      13.4
  7        25      12.7
  8        25       9
  9        75      11.8
 10        75      12.6
 11        75      11.4
 12        75      12.1
 13       200       7.6
 14       200       9.5
 15       200       9.9
 16       200       9
 17       500       7.8
 18       500       9
 19       500       8.5
 20       500       6.7

proc gplot data=passht;
plot Height*Passes;
run;

[Figure: scatter plot of Height vs. Passes.]

Fit a 1-way ANOVA, the most complex model (same results as a quartic polynomial):

proc glm data=passht plot=diagnostics;
class Passes;
model Height=Passes;
output out=anovaout p=predicted r=residual;
lsmeans Passes; /* Gives the mean for each level of factor Passes. */
run;

SAS output:

The GLM Procedure
Class Level Information

Class     Levels    Values
Passes         5    0 25 75 200 500

The GLM Procedure
Dependent Variable: Height

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4       243.1620000     60.7905000      29.48    <.0001
Error              15        30.9275000      2.0618333
Corrected Total    19       274.0895000


The GLM Procedure
Least Squares Means

Passes    Height LSMEAN
     0       18.0000000
    25       12.0000000
    75       11.9750000
   200        9.0000000
   500        8.0000000

Bonus plot from the LSMEANS statement:

[Figure: LS-Means for Passes (Height LS-mean plotted against Passes).]

SAS Part 2: Fit a quartic polynomial (same fitted model as 1-way ANOVA)

data passht; set passht;
Pass2=Passes*Passes;
Pass3=Passes*Pass2;
Pass4=Pass2*Pass2;
run;

proc print data=passht;
run;

Obs    Passes    Height     Pass2        Pass3          Pass4
  1         0      20.7         0            0              0
  2         0      15.9         0            0              0
  3         0      17.8         0            0              0
  4         0      17.6         0            0              0
  5        25      12.9       625        15625         390625
  6        25      13.4       625        15625         390625
  7        25      12.7       625        15625         390625
  8        25       9         625        15625         390625
  9        75      11.8      5625       421875       31640625
 10        75      12.6      5625       421875       31640625
 11        75      11.4      5625       421875       31640625
 12        75      12.1      5625       421875       31640625
 13       200       7.6     40000      8000000     1600000000
 14       200       9.5     40000      8000000     1600000000
 15       200       9.9     40000      8000000     1600000000
 16       200       9       40000      8000000     1600000000
 17       500       7.8    250000    125000000    62500000000
 18       500       9      250000    125000000    62500000000
 19       500       8.5    250000    125000000    62500000000
 20       500       6.7    250000    125000000    62500000000

proc glm data=passht plot=diagnostics;
model Height=Passes Pass2 Pass3 Pass4;
output out=quarticout p=predicted;
run;

The GLM Procedure
Dependent Variable: Height

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4       243.1620000     60.7905000      29.48    <.0001
Error              15        30.9275000      2.0618333
Corrected Total    19       274.0895000

The above SS for the quartic model look exactly the same as for the 1-way ANOVA.

Source    DF      Type I SS    Mean Square    F Value    Pr > F
Passes     1    141.2953228    141.2953228      68.53    <.0001
Pass2      1     54.3022721     54.3022721      26.34    0.0001
Pass3      1     16.0233017     16.0233017       7.77    0.0138
Pass4      1     31.5411034     31.5411034      15.30    0.0014

Parameter       Estimate    Standard Error    t Value    Pr > |t|
Intercept    18.00000000        0.71795427      25.07      <.0001
Passes       -0.36377960        0.06814183      -5.34      <.0001
Pass2         0.00560094        0.00131285       4.27      0.0007
Pass3        -0.00002684        0.00000671      -4.00      0.0012
Pass4         0.00000003        0.00000001       3.91      0.0014

The estimates of the 5 parameters used to fit the quartic are shown in the last part of the above output.

Type I SS and tests are computed based on 'sequential sums'. In Type I SS, the order in which terms are entered into the model affects the SS and tests. Above, Passes is entered first, and the test for significance is done using ONLY Passes in the model (and an intercept). Next, Pass2 is entered into the model, and the Type I SS test is done given Passes is accounted for, but none of the other terms. For most of our models, these Type I SS and tests are not of interest, but in the case of a polynomial, they do provide some useful information. For instance, given that we've fit a cubic to the data, should we fit a more complex model and include a quartic term? The final p-value tests for sufficiency of the cubic model in this case.

Based on the Type I SS output, it looks like we need the complexity of a quartic to describe the data.
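The F values in the Type I table can be reproduced by dividing each sequential SS by the error mean square of the full quartic model. The following Python sketch (variable names are illustrative, values copied from the SAS output) also checks that the four sequential pieces add up to the Model SS.

```python
# Type I (sequential) SS for the quartic fit, copied from the SAS output
type1_ss = {
    "Passes": 141.2953228,
    "Pass2": 54.3022721,
    "Pass3": 16.0233017,
    "Pass4": 31.5411034,
}

# Error mean square of the full quartic model: Error SS / error df
mse_full = 30.9275 / 15

# Each term has 1 df, so its mean square equals its SS
f_values = {term: ss / mse_full for term, ss in type1_ss.items()}
model_ss = sum(type1_ss.values())

for term, f in f_values.items():
    print(f"{term:7s} F = {f:.2f}")
print(f"Sum of Type I SS = {model_ss:.4f}")
```

The sequential SS partition the Model SS (243.162), which is why the last line of the Type I table can be read as a test of the cubic model against the quartic.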

NOTE: The 1-way ANOVA model and quartic polynomial provide the same fitted values:

data apreds; set anovaout;
anovapred=predicted;
keep anovapred;

data qpreds; set quarticout;
quarticpreds=predicted;
keep quarticpreds;

data bothpreds; merge apreds qpreds;
proc print data=bothpreds;
run;

Obs    anovapred    quarticpreds
  1       18.000          18.000
  2       18.000          18.000
  3       18.000          18.000
  4       18.000          18.000
  5       12.000          12.000
  6       12.000          12.000
  7       12.000          12.000
  8       12.000          12.000
  9       11.975          11.975
 10       11.975          11.975
 11       11.975          11.975
 12       11.975          11.975
 13        9.000           9.000
 14        9.000           9.000
 15        9.000           9.000
 16        9.000           9.000
 17        8.000           8.000
 18        8.000           8.000
 19        8.000           8.000
 20        8.000           8.000
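One way to see why the two sets of fitted values agree: with five distinct dose levels, a degree-4 polynomial has enough freedom to pass exactly through all five group means, and no curve can have a smaller SSE than one that hits every group mean. The Python sketch below illustrates this with Lagrange interpolation, which is a stand-in for the least-squares quartic and not the method SAS uses; the function name `lagrange_eval` is hypothetical.

```python
# Passes levels and the group means (the LS-means reported by PROC GLM)
doses = [0, 25, 75, 200, 500]
means = [18.0, 12.0, 11.975, 9.0, 8.0]


def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs)-1 through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# At the design points the quartic returns the group means themselves
quartic_fitted = [lagrange_eval(doses, means, d) for d in doses]
print(quartic_fitted)

# Between design points the quartic is unconstrained by the data
print(lagrange_eval(doses, means, 350))
```

The second print is a reminder that agreement at the five observed doses says nothing about how sensibly the quartic behaves between them.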

Can we transform the dosages in order to create a linear trend?

We will first try a square root transformation, then a quarter power transformation.

SAS Part 3:

data passht; set passht;
sqrtPass=sqrt(Passes);
run;

proc gplot data=passht;
plot Height*sqrtPass;
run;

Looks like we need to bring the right tail in further to get a linear trend.

data passht; set passht;
qrootPass=Passes**(1/4); /* SAS statement for a power transformation. */
run;

proc gplot data=passht;
plot Height*qrootPass;
run;

[Figure: scatter plot of Height vs. qrootPass.]
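To see how much each transformation pulls in the right tail, it can help to compare where the five dose levels fall on each scale after rescaling to the interval from 0 to 1. This is a small Python illustration, not part of the SAS program; `relative_positions` is a hypothetical helper.

```python
import math

doses = [0, 25, 75, 200, 500]
sqrt_doses = [math.sqrt(d) for d in doses]
qroot_doses = [d ** 0.25 for d in doses]


def relative_positions(values):
    """Rescale values to [0, 1] so spacings on different scales can be compared."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


for label, values in [("raw", doses), ("sqrt", sqrt_doses), ("quarter root", qroot_doses)]:
    positions = ", ".join(f"{p:.2f}" for p in relative_positions(values))
    print(f"{label:12s} {positions}")
```

On the raw scale the 200-pass level sits at 40% of the range, while each successive root moves the lower doses further apart and the upper doses closer together.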

data passht; set passht;
qrootPass2=qrootPass*qrootPass;
qrootPass3=qrootPass*qrootPass2;
qrootPass4=qrootPass2*qrootPass2;
run;

Fit a quartic polynomial to the above transformed data and look at Type I SS:

proc glm data=passht;
model Height=qrootPass qrootPass2 qrootPass3 qrootPass4;
run;

The GLM Procedure
Dependent Variable: Height

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4       243.1620000     60.7905000      29.48    <.0001
Error              15        30.9275000      2.0618333
Corrected Total    19       274.0895000

Source        DF      Type I SS    Mean Square    F Value    Pr > F
qrootPass      1    235.7993988    235.7993988     114.36    <.0001
qrootPass2     1      1.7977902      1.7977902       0.87    0.3652
qrootPass3     1      0.0546985      0.0546985       0.03    0.8728
qrootPass4     1      5.5101125      5.5101125       2.67    0.1229

Based on the Type I SS, it looks like a simple linear regression on the transformed data is sufficient.
Fit that model:

proc glm data=passht plot=diagnostics;
model Height=qrootPass;
run;

The GLM Procedure
Dependent Variable: Height

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1       235.7993988    235.7993988     110.85    <.0001
Error              18        38.2901012      2.1272278
Corrected Total    19       274.0895000

The GLM Procedure
Dependent Variable: Height

Parameter       Estimate    Standard Error    t Value    Pr > |t|
Intercept    17.66169730        0.64564642      27.36      <.0001
qrootPass    -2.14611031        0.20383920     -10.53      <.0001

[Figure: Fit Plot for Height vs. qrootPass, showing the fitted line with 95% confidence limits and 95% prediction limits.]

Using transformed data always makes interpretation a little more difficult, but if it simplifies the relationship between the dose and response, it may be worth it.
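The simple linear regression on the quarter-root scale can be reproduced from the raw data, and predictions can be reported on the original dose scale by transforming the dose inside the prediction function. This Python sketch is only a cross-check of the PROC GLM fit; the names `pairs` and `predicted_height` are illustrative.

```python
# Trampling data: number of passes -> vegetation heights (cm)
heights = {
    0: [20.7, 15.9, 17.8, 17.6],
    25: [12.9, 13.4, 12.7, 9.0],
    75: [11.8, 12.6, 11.4, 12.1],
    200: [7.6, 9.5, 9.9, 9.0],
    500: [7.8, 9.0, 8.5, 6.7],
}

# One (x, y) pair per trail, with x the quarter root of the number of passes
pairs = [(passes ** 0.25, h) for passes, hs in heights.items() for h in hs]
n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n
sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)

# Ordinary least squares estimates for a straight line
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)

print(f"intercept = {intercept:.4f}, slope = {slope:.4f}, SSE = {sse:.4f}")


def predicted_height(passes):
    """Predicted mean height (cm) for a given number of passes on the original scale."""
    return intercept + slope * passes ** 0.25


print(f"Predicted height at 100 passes: {predicted_height(100):.2f} cm")
```

Wrapping the transformation inside `predicted_height` is one way to keep the interpretation in terms of passes: the slope is per unit of quarter-root passes, but the function is called with the dose itself.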


In R you can quickly get the fit of the polynomials using the following code:

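# Assumes Passes and Height are already available as vectors in the workspace.
x <- Passes
plot(x, Height, main = "Height vs. passes(x)")
d <- seq(0, 500, length.out = 200)  # grid of dose values for drawing the curves
for (degree in 1:4) {
  fm <- lm(Height ~ poly(x, degree))  # polynomial fit of the given degree
  lines(d, predict(fm, data.frame(x = d)), col = degree)
}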

[Figure: "Height vs. passes(x)", Height plotted against x with the fitted polynomials of degree 1 to 4 overlaid.]