Analysis of Variance

Analysis of variance and regression course

http://staff.pubhealth.ku.dk/~lts/regression11_2

Marc Andersen, [email protected]

Analysis of variance and regression for health researchers, November 24, 2011

Outline

Comparison of several groups

Model checking

Two-way ANOVA

Interaction

Advanced designs

Acknowledgements

Written by Lene Theil Skovgaard (1), 2006, 2007

Updated by Julie Lyng Forman (1), 2008, November 2009

Updated by Marc Andersen (2), April 2009, April 2010, November 2010, April 2011, November 2011

(1) Dept. of Biostatistics
(2) StatGroup

Comparison of 2 or more groups

Number of groups   Different individuals          Same individual
2                  unpaired t-test                paired t-test
≥2                 one-way analysis of variance   two-way analysis of variance

One-way analysis of variance

◮ Do the distributions differ between the groups?
◮ Do the levels differ between the groups?

Example: ventilation during anaesthesia

Data: 22 bypass patients randomised to 3 different kinds of ventilation during anaesthesia
Outcome: measurement of red cell folate (µg/L)

Group I     50% N2O, 50% O2 for 24 hours
Group II    50% N2O, 50% O2 during operation
Group III   30–50% O2 (no N2O) for 24 hours

        Gr.I    Gr.II   Gr.III
n       8       9       5
Mean    316.6   256.4   278.0
SD      58.7    37.1    33.8

Example: ventilation during anaesthesia

[Figure: red cell folate (µg/L), axis from 200 to 400, plotted by group (I, II, III)]

One-way ANOVA

One-way
because we only have one criterion for classifying the observations, here ventilation method

ANalysis Of VAriance
because we compare the variance between groups with the variance within groups

The one-way ANOVA model

Notation
The j'th observation from group i is described by:

$Y_{ij} = \mu_i + \varepsilon_{ij}$

where $Y_{ij}$ is the j'th observation in group i, $\mu_i$ is the mean of group i, and $\varepsilon_{ij}$ is the individual deviation,

i.e. as consisting of the group mean plus an individual deviation, with $\varepsilon_{ij} \sim N(0, \sigma^2)$ or equivalently $Y_{ij} \sim N(\mu_i, \sigma^2)$.

Assumptions
Observations are assumed to be independent and to follow a normal distribution with mean $\mu_i$ within group i, with the same variance $\sigma^2$ in all groups.

Model assumptions should be investigated!

Hypothesis testing

Investigate difference between groups

◮ Null hypothesis: all group means are equal, $H_0: \mu_i = \mu$ for all i
◮ Alternative hypothesis: the group means are not all equal
◮ We conclude that the means are not equal when we reject the null hypothesis of equality (ref DGA, 8.5 Hypothesis Testing)

ANOVA math: Sums of squares

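Decomposition of 'deviation from grand mean'

$y_{ij} - \bar{y}_{\cdot} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y}_{\cdot})$

Decomposition of variation (sums of squares)

$\underbrace{\sum_{i,j} (y_{ij} - \bar{y}_{\cdot})^2}_{\text{total variation}} = \underbrace{\sum_{i,j} (y_{ij} - \bar{y}_i)^2}_{\text{within groups}} + \underbrace{\sum_{i,j} (\bar{y}_i - \bar{y}_{\cdot})^2}_{\text{between groups}}$

$y_{ij}$: j'th observation in i'th group
$\bar{y}_i$: average in i'th group
$\bar{y}_{\cdot}$: overall average, or 'grand mean'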

Decomposition of variation

total = between + within

$SS_{total} = SS_{between} + SS_{within}$

Degrees of freedom: (n − 1) = (k − 1) + (n − k)

F-test statistic

$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/(k - 1)}{SS_{within}/(n - k)}$

Hypothesis test
Reject the null hypothesis if F is large, i.e. if the variation between groups is too large compared to the variation within groups.
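A minimal Python sketch of the decomposition and the F-test statistic, computed from the group summary statistics (n, mean, SD) of the ventilation example. It is not part of the course's SAS material; the function name is made up for illustration, and because the summary values are rounded, the sums of squares come out slightly different from an analysis of the raw data.

```python
# One-way ANOVA from group summary statistics (n, mean, SD).
# The function name is hypothetical; the inputs are the rounded summary
# values from the ventilation example, so results are approximate.

def oneway_anova_from_summary(groups):
    """groups: list of (n, mean, sd) tuples, one per group."""
    n_total = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total

    # Between groups: group means around the grand mean, weighted by group size
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within groups: (n - 1) * SD^2 recovers each group's own sum of squares
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_stat,
    }


result = oneway_anova_from_summary(
    [(8, 316.6, 58.7), (9, 256.4, 37.1), (5, 278.0, 33.8)]
)
print(result)
```

The degrees of freedom come out as 2 and 19, and F lands close to the value SAS computes from the raw data.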

Analysis of variance table

ANOVA table

Variation   df      SS      MS        F         P
Between     k − 1   SSb     SSb/dfb   MSb/MSw   P(F(dfb, dfw) > Fobs)
Within      n − k   SSw     SSw/dfw
Total       n − 1   SStot

F test statistic
Under the null hypothesis, the F test statistic follows an F-distribution with dfb and dfw degrees of freedom: Fobs ∼ F(dfb, dfw).

Analysis of variance table - anaesthesia example

ANOVA table

           df   SS         MS       F      P
Between     2   15515.77   7757.9   3.71   0.04
Within     19   39716.09   2090.3
Total      21   55231.86

F test statistic

F = 3.71 ∼ F(2, 19) ⇒ P = 0.04

Interpretation
Weak evidence of non-equality of the three means
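The P-value can be checked by hand in this particular case. With exactly 2 numerator degrees of freedom, the upper tail of the F-distribution has the closed form P(F(2, d) > f) = (1 + 2f/d)^(−d/2). The Python sketch below relies on that special case only (the function name is hypothetical); for other numerator degrees of freedom a statistics package is needed.

```python
# Upper-tail probability of an F(2, d) distribution.
# Assumption: numerator df is exactly 2, which is what makes this
# closed form valid; it does not generalise to other numerator df.

def f_upper_tail_df1_2(f_obs, df_within):
    """P(F(2, df_within) > f_obs)."""
    return (1.0 + 2.0 * f_obs / df_within) ** (-df_within / 2.0)


p_value = f_upper_tail_df1_2(3.71, 19)
print(round(p_value, 4))
```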

Analysis of variance in SAS

To define the anaesthesia data in SAS, we write

  data ex_redcell;
    input grp redcell;
    cards;
  1 243
  1 251
  1 275
  . .
  . .
  . .
  3 293
  3 328
  ;
  run;

The variable redcell contains all the measurements of the outcome and grp contains the method of ventilation for each individual.

Descriptive statistics

  proc tabulate data=ex_redcell missing;
    title "Table of means and standard deviation and standard error";
    class grp;
    var redcell;
    table redcell*(N*f=f5.0 mean*f=f5.1 std*f=f5.1), grp;
  run;
  title;

  ------------------------------------------
  |                      |      Group      |
  |                      |-----------------|
  |                      |  I  | II  | III |
  |----------------------+-----+-----+-----|
  |Red cell  |N          |    8|    9|    5|
  |folate    |-----------+-----+-----+-----|
  |          |Mean       |316.6|256.4|278.0|
  |          |-----------+-----+-----+-----|
  |          |Std        | 58.7| 37.1| 33.8|
  ------------------------------------------

Analysis of variance program

  proc glm data=ex_redcell;
    class grp;
    model redcell=grp / solution;
  run;

General Linear Models Procedure
Dependent Variable: REDCELL

                          Sum of        Mean
  Source            DF    Squares       Square      F Value   Pr > F
  Model              2    15515.7664    7757.8832   3.71      0.0436
  Error             19    39716.0972    2090.3209
  Corrected Total   21    55231.8636

  R-Square   C.V.       Root MSE   REDCELL Mean
  0.280921   16.14252   45.7200    283.227

  Source   DF   Type I SS     Mean Square   F Value   Pr > F
  GRP       2   15515.7664    7757.8832     3.71      0.0436

  Source   DF   Type III SS   Mean Square   F Value   Pr > F
  GRP       2   15515.7664    7757.8832     3.71      0.0436
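The summary line of the listing (R-Square, C.V., Root MSE) follows directly from the sums of squares. A small Python sketch, using the numbers printed by PROC GLM and assuming the usual definitions (R-square = SS model / SS total, Root MSE = square root of the error mean square, C.V. = 100 × Root MSE / mean):

```python
import math

# Values copied from the PROC GLM listing
ss_model = 15515.7664
ss_error = 39716.0972
ss_total = 55231.8636
df_error = 19
overall_mean = 283.227

r_square = ss_model / ss_total             # share of variation explained by group
root_mse = math.sqrt(ss_error / df_error)  # pooled within-group standard deviation
coeff_var = 100 * root_mse / overall_mean  # Root MSE as a percentage of the mean

print(round(r_square, 6), round(root_mse, 4), round(coeff_var, 3))
```

Root MSE is the pooled standard deviation used for the confidence limits and comparisons that follow.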

Parameter estimates

The option solution outputs parameter estimates

                                  T for H0:               Std Error of
  Parameter       Estimate        Parameter=0  Pr > |T|   Estimate
  INTERCEPT       278.0000000 B   13.60        0.0001     20.44661784
  GRP       1      38.6250000 B    1.48        0.1548     26.06442584
            2     -21.5555556 B   -0.85        0.4085     25.50141290
            3       0.0000000 B     .           .           .

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.

◮ Group 3 (the last group) is the reference group
◮ The estimates for the other groups are differences from this reference group

Parameter estimates with confidence intervals

The ods output statement stores parameter estimates

  proc glm data=ex_redcell;
    class grp;
    ods output ParameterEstimates=ParamEstim;
    model redcell=grp / noint solution CLPARM;
  run;

  proc print data=ParamEstim;
  run;

                          Lower     Upper
  Parameter   Estimate    CL        CL
  grp I       316.63      282.79    350.46
  grp II      256.44      224.55    288.34
  grp III     278.00      235.20    320.80
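These limits can be reproduced by hand as mean ± t × Root MSE / √n, where Root MSE = 45.72 is the pooled standard deviation from the ANOVA listing. The Python sketch below hard-codes t = 2.093, the 97.5% quantile of the t-distribution with 19 degrees of freedom taken from standard tables; this constant and the function name are assumptions, not SAS output.

```python
import math

T_CRIT = 2.093    # assumed: 97.5% t quantile, 19 df, from standard tables
ROOT_MSE = 45.72  # pooled SD from the PROC GLM listing

# (n, estimated mean) per group, from the slides
groups = {"I": (8, 316.63), "II": (9, 256.44), "III": (5, 278.00)}


def confidence_limits(n, mean):
    """95% limits for a group mean based on the pooled SD."""
    half_width = T_CRIT * ROOT_MSE / math.sqrt(n)
    return mean - half_width, mean + half_width


limits = {name: confidence_limits(n, mean) for name, (n, mean) in groups.items()}
for name, (lower, upper) in limits.items():
    print(name, round(lower, 2), round(upper, 2))
```

The smallest group (III, n = 5) gets the widest interval, since all groups share the same pooled SD.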

PROC glm box plot

Interpreting the estimates

◮ What is the scientific question?
◮ Clinical significance
◮ Statistical significance
◮ Provide confidence interval
◮ Does it make sense?

Multiple comparisons

The F-test shows that there is a difference, but where?

Pairwise t-tests are not suitable due to the risk of mass significance

A significance level of α = 0.05 means a 5% chance of wrongfully rejecting a true hypothesis (type I error)

The chance of at least one type I error goes up with the number of tests.

(For k groups we have m = k(k − 1)/2 possible pairwise tests; the actual significance level can be as bad as $1 - (1 - \alpha)^m$, e.g. 0.40 for k = 5.)
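The formula is easy to evaluate for other numbers of groups. A short Python sketch (the function name is made up for illustration; the bound treats the m tests as independent, which pairwise comparisons sharing groups are not, so it is a worst-case guide rather than an exact level):

```python
def familywise_error_rate(k, alpha=0.05):
    """Number of pairwise tests for k groups and the chance of at
    least one type I error if the tests were independent."""
    m = k * (k - 1) // 2
    return m, 1 - (1 - alpha) ** m


for k in (3, 5, 10):
    m, rate = familywise_error_rate(k)
    print(k, m, round(rate, 2))
```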

Addressing multiplicity

There is no completely satisfactory solution.

Approximate solutions

1. Select a (small) number of relevant comparisons in the planning stage.

2. Make a graph of the average ± 2 × SEM and judge visually (!), perhaps supplemented with F-tests on subsets of groups.

3. Modify the t-tests by multiplying the P-values by the number of tests, the so-called Bonferroni correction (conservative)

4. Use a correction for multiple testing (Dunnett, Tukey) or a (prespecified) multiple testing procedure
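The Bonferroni correction in item 3 amounts to one line of arithmetic. The Python sketch below applies it to the two unadjusted P-values shown in the parameter estimates (group I vs III: 0.1548, group II vs III: 0.4085), assuming all m = 3 pairwise comparisons among the three groups are of interest; the function name is hypothetical.

```python
def bonferroni(p_values, n_tests):
    """Multiply each P-value by the number of tests, capping at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


# Unadjusted P-values for I vs III and II vs III, with m = 3 pairwise tests
adjusted = bonferroni([0.1548, 0.4085], n_tests=3)
print(adjusted)
```

The cap at 1 matters for the second comparison, where 3 × 0.4085 would otherwise exceed 1.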

Tukey: multiple comparisons in SAS

  proc glm data=ex_redcell;
    class grp;
    model redcell=grp / solution;
    lsmeans grp / adjust=tukey pdiff cl;
  run;

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

Least Squares Means for effect grp
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: redcell

  i/j        1         2         3
  1                 0.0355    0.3215
  2       0.0355              0.6802
  3       0.3215    0.6802

Least Squares Means for Effect grp

           Difference       Simultaneous 95%
           Between          Confidence Limits for
  i   j    Means            LSMean(i)-LSMean(j)
  1   2     60.180556         3.742064   116.619047
  1   3     38.625000       -27.590379   104.840379
  2   3    -21.555556       -86.340628    43.229517

Visual assessment (1/3)

The bars represent 95% confidence intervals for the means using the standard deviation for each group (std2mjt in symbol1 statement).

  proc gplot data=ex_redcell;
    plot redcell*grp
         / haxis=axis1 vaxis=axis2 frame;
    axis1 order=(1 to 3 by 1)
          offset=(8,8)
          label=(H=3)
          value=(H=2) minor=NONE;
    axis2
          offset=(1,1) value=(H=2) minor=NONE
          label=(A=90 R=0 H=3);
    symbol1 v=circle i=std2mjt l=1 h=2 w=2;
  run;

[Figure: red cell folate, axis from 200 to 400, by group (I, II, III), with group means and error bars]

Visual assessment (2/3)

The bars represent 95% confidence intervals for the means using the pooled standard deviation (std2mpjt in symbol1 statement).

  proc gplot data=ex_redcell;
    plot redcell*grp
         / haxis=axis1 vaxis=axis2 frame;
    axis1 order=(1 to 3 by 1)
          offset=(8,8)
          label=(H=3)
          value=(H=2) minor=NONE;
    axis2
          offset=(1,1) value=(H=2) minor=NONE
          label=(A=90 R=0 H=3);
    symbol1 v=circle i=std2mpjt l=1 h=2 w=2;
  run;

[Figure: red cell folate, axis from 200 to 400, by group (I, II, III), with group means and error bars]

Visual assessment (3/3)

The bars represent 95% confidence intervals for the means using the pooled standard deviation obtained from PROC glm.

Model checking

Check if the assumptions are reasonable: (If not, the analysis is unreliable!)

◮ Variance homogeneity may be checked by performing Levene's test (or Bartlett's test).

◮ In case of variance inhomogeneity, we may also perform a weighted analysis (Welch's test), just as in the t-test

◮ Normality may be checked through probability plots (or histograms) of residuals, or by a numerical test on the residuals.

◮ In case of non-normality, we may use the nonparametric Kruskal-Wallis test

Transformation (often logarithms) may help to achieve variance homogeneity as well as normality

Check of variance homogeneity and normality in SAS

  proc glm data=ex_redcell;
    class grp;
    model redcell=grp;
    means grp / hovtest=levene welch;
    output out=model p=predicted r=residual;
  run;

Store residuals in a dataset for further model checking

  proc univariate data=model normal;
    var residual;
    histogram residual / normal(mu=0);
    ppplot residual / normal(mu=0) square;
  run;

Output from proc glm: Test for variance homogeneity

Levene's Test for Homogeneity of redcell Variance
ANOVA of Squared Deviations from Group Means

                 Sum of      Mean
  Source   DF    Squares     Square     F Value   Pr > F
  grp       2    18765720    9382860    4.14      0.0321
  Error    19    43019786    2264199

Weighted anova in case of variance heterogeneity:

Welch's ANOVA for redcell

  Source   DF        F Value   Pr > F
  grp       2.0000   2.97      0.0928
  Error    11.0646

So we are not too sure concerning the group differences...

Test for normality

Output from proc univariate

Tests for Normality

  Test                  --Statistic---     -----p Value-----
  Shapiro-Wilk          W      0.965996    Pr < W       0.6188
  Kolmogorov-Smirnov    D      0.107925    Pr > D      >0.1500
  Cramer-von Mises      W-Sq   0.043461    Pr > W-Sq   >0.2500
  Anderson-Darling      A-Sq   0.263301    Pr > A-Sq   >0.2500

The 4 tests focus on different aspects of non-normality.

◮ For small data sets, we rarely get significance
◮ For large data sets, we almost always get significance
◮ Could look at a probability plot instead

Output from proc univariate: Histogram and probability plot

Non-parametric ANOVA, the Kruskal-Wallis test

SAS code

  proc npar1way wilcoxon;
    exact;
    class grp;
    var redcell;
  run;

Wilcoxon Scores (Rank Sums) for Variable redcell
Classified by Variable grp

              Sum of    Expected    Std Dev      Mean
  grp   N     Scores    Under H0    Under H0     Score
  ------------------------------------------------------
  1     8     120.0      92.00      14.651507    15.000000
  2     9      77.0     103.50      14.974979     8.555556
  3     5      56.0      57.50      12.763881    11.200000

Kruskal-Wallis Test
  Chi-Square                      4.1852
  DF                              2
  Asymptotic Pr > Chi-Square      0.1234
  Exact Pr >= Chi-Square          0.1233

Again, we have 'lost' the significance...
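The Kruskal-Wallis statistic can be recomputed from the rank sums in the listing as H = 12/(N(N+1)) × Σ R_i²/n_i − 3(N+1). The Python sketch below uses this textbook form without a correction for ties (an assumption; it happens to agree with the SAS value here). With 2 degrees of freedom the chi-square upper tail is simply exp(−H/2), which is used for the asymptotic P-value; the function name is hypothetical.

```python
import math


def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from per-group rank sums (no tie correction)."""
    n_total = sum(sizes)
    weighted = sum(r ** 2 / n for r, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Rank sums and group sizes from the PROC NPAR1WAY listing
h_stat = kruskal_wallis_h([120.0, 77.0, 56.0], [8, 9, 5])

# Chi-square upper tail; exp(-h/2) is valid for exactly 2 df only
p_asymptotic = math.exp(-h_stat / 2)

print(round(h_stat, 4), round(p_asymptotic, 4))
```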

Two-way analysis of variance

Two criteria for subdividing observations, A and B

Data in two-way layout:

               B
  A     1   2   · · ·   c
  1             · · ·
  2             · · ·
  ⋮
  r             · · ·

◮ Effect of both factors
◮ Perhaps even interaction (effect modification)

One factor may be 'individuals' or "experimental units" (e.g. different treatments tried on the same person)

Repeated measurements

Example: Short term effect of enalaprilate on heart rate (beats per minute) (DGM, Section 12.3.1)

                     Time
  Subject    0        30       60       120      average
  1          96       92       86       92       91.50
  2          110      106      108      114      109.50
  3          89       86       85       83       85.75
  4          95       78       78       83       83.50
  5          128      124      118      118      122.00
  6          100      98       100      94       98.00
  7          72       68       67       71       69.50
  8          79       75       74       74       75.50
  9          100      106      104      102      103.00
  average    96.56    92.56    91.11    92.33    93.14

Line plot (“Spaghettiogram”)

Ideally the time courses are parallel.

The additive model

The two effects (subject s and time t) work in an additive way.

$Y_{st} = \mu + \alpha_s + \beta_t + \varepsilon_{st}$

The $\varepsilon_{st}$'s are assumed to be independent and normally distributed with mean 0 and identical variances, $\varepsilon_{st} \sim N(0, \sigma^2)$.
(This assumption should be investigated!)

Decomposition of variation:

$SS_{total} = SS_{subject} + SS_{time} + SS_{residual}$
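The decomposition can be carried out directly on the heart rate table from the enalaprilate example. The Python sketch below is a hand calculation for a balanced layout with one observation per subject and time point (variable names are illustrative, not from the course material): subject and time sums of squares come from the row and column means, and the residual is what is left of the total.

```python
# Heart rate (beats per minute): one row per subject, columns = time 0, 30, 60, 120
heart_rate = [
    [96, 92, 86, 92],
    [110, 106, 108, 114],
    [89, 86, 85, 83],
    [95, 78, 78, 83],
    [128, 124, 118, 118],
    [100, 98, 100, 94],
    [72, 68, 67, 71],
    [79, 75, 74, 74],
    [100, 106, 104, 102],
]

n_subjects = len(heart_rate)
n_times = len(heart_rate[0])
grand_mean = sum(sum(row) for row in heart_rate) / (n_subjects * n_times)

subject_means = [sum(row) / n_times for row in heart_rate]
time_means = [sum(col) / n_subjects for col in zip(*heart_rate)]

ss_total = sum((y - grand_mean) ** 2 for row in heart_rate for y in row)
ss_subject = n_times * sum((m - grand_mean) ** 2 for m in subject_means)
ss_time = n_subjects * sum((m - grand_mean) ** 2 for m in time_means)
ss_residual = ss_total - ss_subject - ss_time

df_subject = n_subjects - 1
df_time = n_times - 1
df_residual = df_subject * df_time

# Both effects are tested against the residual mean square
ms_residual = ss_residual / df_residual
f_subject = (ss_subject / df_subject) / ms_residual
f_time = (ss_time / df_time) / ms_residual

print(round(ss_subject, 2), round(ss_time, 2), round(ss_residual, 2))
print(round(f_subject, 2), round(f_time, 2))
```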

Analysis of variance table - enalaprilate example

             df   SS       MS       F       P
  Subjects    8   8966.6   1120.8   90.64   <0.0001
  Times       3    151.0     50.3    4.07    0.0180
  Residual   24    296.8     12.4
  Total      35   9414.3

◮ Highly significant difference between subjects (not very interesting)

◮ Significant time differences.

Two-way ANOVA in SAS

  proc glm data=ex_pulse;
    class subject times;
    model hrate=subject times / solution;
  run;

General Linear Models Procedure
Class Level Information

  Class      Levels   Values
  SUBJECT    9        1 2 3 4 5 6 7 8 9
  TIMES      4        0 30 60 120

Number of observations in data set = 36

Two-way ANOVA output

General Linear Models Procedure

Dependent Variable: HRATE

                          Sum of        Mean
  Source            DF    Squares       Square       F Value   Pr > F
  Model             11    9117.52778    828.86616    67.03     0.0001
  Error             24     296.77778     12.36574
  Corrected Total   35    9414.30556

  R-Square   C.V.       Root MSE   HRATE Mean
  0.968476   3.775539   3.51650    93.1389

  Source    DF   Type I SS     Mean Square   F Value   Pr > F
  SUBJECT    8   8966.55556    1120.81944    90.64     0.0001
  TIMES      3    150.97222      50.32407     4.07     0.0180

  Source    DF   Type III SS   Mean Square   F Value   Pr > F
  SUBJECT    8   8966.55556    1120.81944    90.64     0.0001
  TIMES      3    150.97222      50.32407     4.07     0.0180

Parameter estimates

                                   T for H0:                Std Error of
  Parameter        Estimate        Parameter=0   Pr > |T|   Estimate
  INTERCEPT        102.1944444 B    50.34        0.0001     2.03024963
  SUBJECT   1      -11.5000000 B    -4.62        0.0001     2.48653783
            2        6.5000000 B     2.61        0.0152     2.48653783
            3      -17.2500000 B    -6.94        0.0001     2.48653783
            4      -19.5000000 B    -7.84        0.0001     2.48653783
            5       19.0000000 B     7.64        0.0001     2.48653783
            6       -5.0000000 B    -2.01        0.0557     2.48653783
            7      -33.5000000 B   -13.47        0.0001     2.48653783
            8      -27.5000000 B   -11.06        0.0001     2.48653783
            9        0.0000000 B      .           .           .
  TIMES     0        4.2222222 B     2.55        0.0177     1.65769189
            30       0.2222222 B     0.13        0.8945     1.65769189
            60      -1.2222222 B    -0.74        0.4681     1.65769189
            120      0.0000000 B      .           .           .

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.

◮ Subject 9 at time 120 minutes is the reference

Expected values and residuals

Expected values for subject=3, times=30

$\hat{y}_{st} = \hat{\mu} + \hat{\alpha}_s + \hat{\beta}_t = 102.19 - 17.25 + 0.22 = 85.16$

Residuals

$r_{st} = \text{observed} - \text{expected} = y_{st} - \hat{y}_{st} \approx \varepsilon_{st}$

Residual for subject 3, time 30: $r_{32} = 86 - 85.16 = 0.84$
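The same calculation in Python, using the unrounded parameter estimates from the SAS listing (subject 9 and time 120 are the reference levels, so their effects are 0). With unrounded estimates the expected value is 85.17 and the residual 0.83; the slide's 85.16 and 0.84 come from adding the rounded terms. The function and dictionary names are made up for this sketch.

```python
# Parameter estimates from the two-way PROC GLM listing
INTERCEPT = 102.1944444
SUBJECT_EFFECT = {
    1: -11.5, 2: 6.5, 3: -17.25, 4: -19.5, 5: 19.0,
    6: -5.0, 7: -33.5, 8: -27.5, 9: 0.0,
}
TIME_EFFECT = {0: 4.2222222, 30: 0.2222222, 60: -1.2222222, 120: 0.0}


def expected_heart_rate(subject, time):
    """Fitted value under the additive model: mu + alpha_s + beta_t."""
    return INTERCEPT + SUBJECT_EFFECT[subject] + TIME_EFFECT[time]


observed = 86  # subject 3 at 30 minutes, from the data table
fitted = expected_heart_rate(3, 30)
residual = observed - fitted

print(round(fitted, 2), round(residual, 2))
```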

Model checking

Look for:

◮ Differences in variances (systematic?)
◮ Non-normality
◮ Lack of additivity (interaction).
  Can only be tested if there is more than one observation for each combination
◮ Serial correlation?
  (Neighboring observations look more alike)

Residual based diagnostics

Use the residuals for model checking

◮ Probability plot of residuals.
◮ Plot residuals vs expected values.
◮ Plot residuals vs group.
◮ Look for outliers (a large residual means observed and expected values deviate a lot).

Enalaprilate example

No systematic patterns should be present.

Interaction

Example of two criteria for subdividing individuals: sex and smoking habits

Outcome: FEV1

Here, we see an interaction between sex and smoking.

Possible explanations for interaction

◮ Biologically different effects of smoking on males and females
◮ Perhaps the women do not smoke as much as the men
◮ Perhaps the effect is relative (to be expressed in %)

Example: The effect of smoking on birth weight

Example: The effect of smoking on birth weight

Interpreting interaction

◮ There is an effect of smoking, but only for those who have been smoking for a long time.

◮ There is an effect of duration, and this effect increases with the amount of smoking

The effect of duration depends upon .... amount of smoking

and the effect of amount depends upon .... duration of smoking

Example: Fibrinogen after spleen operation

34 rats are randomized, in 2 ways

◮ 17 have their spleen removed (splenectomy=yes/no)
◮ 8/17 in each group are kept in altitude chambers (corresponding to 15,000 ft) (place=altitude/control)

Outcome
Fibrinogen level in mg% at day 21

Source: Rupert G. Miller: Beyond ANOVA, p. 161-162

Example: Fibrinogen after spleen operation

[Figure: fibrinogen (mg%), axis from 100 to 600, by group: no_altitude, no_control, yes_altitude, yes_control]

ANOVA model with interaction

The usual additive model:

$Y_{spr} = \mu + \alpha_s + \beta_p + \varepsilon_{spr}, \quad \varepsilon_{spr} \sim N(0, \sigma^2)$

splenectomy (s=yes/no) and place (p=altitude/control) have an additive effect.

Model with interaction

$Y_{spr} = \mu + \alpha_s + \beta_p + \gamma_{sp} + \varepsilon_{spr}, \quad \varepsilon_{spr} \sim N(0, \sigma^2)$

Here, we specify an interaction between splenectomy and place, i.e. the effect of living at high altitude may be thought to depend upon whether or not you have an intact spleen,

and vice versa.

Two-way ANOVA with interaction in SAS

  proc glm data=ex_fibrinogen;
    class splenectomy place;
    model fibrinogen=place splenectomy
                     place*splenectomy / solution;
    output out=model p=predicted r=residual;
  run;

The GLM Procedure

Class Level Information

  Class          Levels   Values
  splenectomy    2        no yes
  place          2        altitude control

Number of observations 34

Output: two-way ANOVA table

Dependent Variable: fibrinogen

                          Sum of
  Source            DF    Squares        Mean Square   F Value   Pr > F
  Model              3    139439.2067    46479.7356    8.32      0.0004
  Error             30    167573.7639     5585.7921
  Corrected Total   33    307012.9706

  R-Square   Coeff Var   Root MSE   fibrinogen Mean
  0.454180   20.99213    74.73816   356.0294

  Source               DF   Type I SS      Mean Square    F Value   Pr > F
  place                 1   67925.25531    67925.25531    12.16     0.0015
  splenectomy           1   69662.38235    69662.38235    12.47     0.0014
  splenectomy*place     1    1851.56904     1851.56904     0.33     0.5691

  Source               DF   Type III SS    Mean Square    F Value   Pr > F
  place                 1   67925.25531    67925.25531    12.16     0.0015
  splenectomy           1   68093.92198    68093.92198    12.19     0.0015
  splenectomy*place     1    1851.56904     1851.56904     0.33     0.5691

Output: Parameter estimates

                                                       Standard
  Parameter                          Estimate          Error         t Value   Pr > |t|
  Intercept                          261.6666667 B     24.91271904   10.50     <.0001
  place altitude                     104.3333333 B     36.31621657    2.87     0.0074
  place control                        0.0000000 B       .             .         .
  splenectomy no                     104.4444444 B     35.23190514    2.96     0.0059
  splenectomy yes                      0.0000000 B       .             .         .
  splenectomy*place no altitude      -29.5694444 B     51.35888601   -0.58     0.5691
  splenectomy*place no control         0.0000000 B       .             .         .
  splenectomy*place yes altitude       0.0000000 B       .             .         .
  splenectomy*place yes control        0.0000000 B       .             .         .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

Computing expected values

The reference levels are place=control, splenectomy=yes (SAS chooses the last level in alphabetical order as the reference level)

so the expected fibrinogen level for these animals is the intercept, 261.67

For all other groups, we have to add one or more extraestimates, as shown in the table below:

Expected fibrinogen levels

                           place
  splenectomy    control          altitude

  yes            261.67             261.67
                                  + 104.33
                                  = 366.00

  no               261.67           261.67
                 + 104.44         + 104.44
                                  + 104.33
                                  -  29.57
                 = 366.11         = 440.87

Note: the expected value for splenectomy=no, place=altitude is affected by rounding (the rounded terms sum to 440.87, the unrounded estimates to 440.875)
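The four expected values can be assembled from the unrounded parameter estimates in the SAS listing. A small Python sketch (dictionary and function names are illustrative): each cell is the intercept plus whichever main effects and interaction term apply, and every term involving a reference level is zero.

```python
# Estimates from the PROC GLM listing for the model with interaction
INTERCEPT = 261.6666667
PLACE = {"altitude": 104.3333333, "control": 0.0}
SPLENECTOMY = {"no": 104.4444444, "yes": 0.0}
# Only one interaction term is non-zero; the others involve a reference level
INTERACTION = {("no", "altitude"): -29.5694444}


def expected_fibrinogen(splenectomy, place):
    """Expected fibrinogen level (mg%) for one combination of factors."""
    return (
        INTERCEPT
        + PLACE[place]
        + SPLENECTOMY[splenectomy]
        + INTERACTION.get((splenectomy, place), 0.0)
    )


cell_means = {
    (s, p): expected_fibrinogen(s, p)
    for s in ("yes", "no")
    for p in ("control", "altitude")
}
for cell, value in cell_means.items():
    print(cell, round(value, 3))
```

Because the model with interaction has one parameter per cell, these expected values are simply the four observed group means.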

Model checking

Variance homogeneity may be judged from a one-way anova

The GLM Procedure
Class Level Information

Class    Levels   Values
group    4        no_altitude no_control yes_altitude yes_control

Number of observations 34

Levene’s Test for Homogeneity of fibrinogen Variance
ANOVA of Squared Deviations from Group Means

                  Sum of       Mean
Source    DF      Squares      Square      F Value   Pr > F

group      3      1.9078E8     63594756    1.55      0.2222
Error     30      1.2314E9     41045352

No reason to suspect variance inhomogeneity (P=0.22)
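Levene's test, in the form reported here, is a one-way anova on the squared deviations from the group means. The sketch below shows that calculation in plain Python. The function name `levene_squared_f` and the toy data are hypothetical, since the individual fibrinogen measurements are not listed on the slides; the last lines only recompute the F value from the two mean squares in the table.

```python
from statistics import mean


def levene_squared_f(groups):
    """F statistic of a one-way anova on squared deviations from group means."""
    # Replace every observation by its squared deviation from its own group mean.
    z = [[(x - mean(g)) ** 2 for x in g] for g in groups]
    all_z = [v for g in z for v in g]
    grand = mean(all_z)
    k, n = len(z), len(all_z)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy data: the second group is clearly more variable than the first.
toy = [[10.0, 11.0, 12.0, 13.0], [5.0, 10.0, 15.0, 20.0]]
print("toy F:", levene_squared_f(toy))

# F value recomputed from the mean squares printed in the SAS table.
f_table = 63594756 / 41045352
print("table F:", round(f_table, 2))
```

A large F (small p-value) would indicate that the spread differs between groups; the p-value itself requires the F distribution, which is left out of this sketch.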


Normality assumption for residuals

Result from proc univariate (with the option normal)

Tests for Normality

Test                  --Statistic---     -----p Value------
Shapiro-Wilk          W      0.964518    Pr < W       0.3276
Kolmogorov-Smirnov    D      0.126665    Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.091627    Pr > W-Sq    0.1424
Anderson-Darling      A-Sq   0.490958    Pr > A-Sq    0.2140

Conclusion: No reason to suspect non-normality


Model simplification

In the two-way anova, the interaction was not significant (P=0.57), so we omit it from the model:

proc glm data=ex_fibrinogen;
  class splenectomy place;
  model fibrinogen=place splenectomy / solution clparm;
run;

Dependent Variable: fibrinogen

                          Sum of
Source             DF     Squares        Mean Square    F Value   Pr > F

Model               2     137587.6377    68793.8188     12.59     <.0001
Error              31     169425.3329     5465.3333
Corrected Total    33     307012.9706

R-Square    Coeff Var    Root MSE    fibrinogen Mean
0.448149    20.76455     73.92789    356.0294

Source          DF    Type III SS    Mean Square    F Value   Pr > F
place            1    67925.25531    67925.25531    12.43     0.0013
splenectomy      1    69662.38235    69662.38235    12.75     0.0012
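The derived quantities in the output follow from the sums of squares by simple arithmetic. As a rough check (a Python sketch outside the SAS analysis; the variable names are chosen here for illustration), the numbers can be recomputed from the sums of squares and degrees of freedom printed by SAS:

```python
from math import sqrt

# Sums of squares and degrees of freedom copied from the SAS output.
ss_model, df_model = 137587.6377, 2
ss_error, df_error = 169425.3329, 31
ss_total = 307012.9706
mean_response = 356.0294

ms_model = ss_model / df_model              # mean square = SS / DF
ms_error = ss_error / df_error
f_value = ms_model / ms_error               # overall F test of the model
r_square = ss_model / ss_total              # proportion of variation explained
root_mse = sqrt(ms_error)                   # estimated residual standard deviation
coeff_var = 100 * root_mse / mean_response  # Root MSE in percent of the mean

print(round(f_value, 2), round(r_square, 6), round(root_mse, 5), round(coeff_var, 5))
```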


Assessing the main effects

                                        Standard
Parameter            Estimate           Error          t Value   Pr > |t|

Intercept            268.6241830 B      21.54935559    12.47     <.0001
place altitude        89.5486111 B      25.40104253     3.53     0.0013
place control          0.0000000 B        .              .        .
splenectomy no        90.5294118 B      25.35705800     3.57     0.0012
splenectomy yes        0.0000000 B        .              .        .

Parameter            95% Confidence Limits

Intercept            224.6739825    312.5743835
place altitude        37.7428433    141.3543789
place control           .              .
splenectomy no        38.8133510    142.2454725
splenectomy yes         .              .

◮ Removal of spleen leads to a decrease in fibrinogen of approx 90.53 mg% at day 21 (95% CI 38.81 to 142.25)

◮ Placing in altitude leads to an increase in fibrinogen of approx 89.55 mg% at day 21 (95% CI 37.74 to 141.35)
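The confidence limits and the expected group values in the additive model can be reproduced from the estimates and standard errors. This is a minimal Python sketch, not part of the SAS analysis. The t quantile is an assumption taken from a standard t table (31 error degrees of freedom), so the limits agree with the SAS output only to about two decimals.

```python
# Estimates and standard errors copied from the SAS output of the additive model.
intercept = 268.6241830
effects = {
    "place altitude": (89.5486111, 25.40104253),
    "splenectomy no": (90.5294118, 25.35705800),
}

# Assumption: 0.975 quantile of the t distribution with 31 degrees of freedom,
# read from a standard t table (about 2.0395).
T_QUANTILE = 2.0395

for name, (estimate, std_error) in effects.items():
    lower = estimate - T_QUANTILE * std_error
    upper = estimate + T_QUANTILE * std_error
    print(f"{name}: {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Expected values for the four groups: without interaction, effects simply add.
for splenectomy, s_effect in (("yes", 0.0), ("no", effects["splenectomy no"][0])):
    for place, p_effect in (("control", 0.0), ("altitude", effects["place altitude"][0])):
        print(splenectomy, place, round(intercept + s_effect + p_effect, 2))
```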


Residual plots

[Figure: panels "Normality" and "Variance homogeneity"; axes Residual (-200 to 200) against Expected (260 to 460)]


More complicated analyses of variance

◮ Three-way or higher-order analysis of variance
◮ Latin squares

        1  2  3
   I    A  B  C
   II   B  C  A
   III  C  A  B

  (Cochran & Cox (1957): Experimental Designs, 2.ed., Wiley)

◮ Cross-over designs
◮ Variance component models
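The 3x3 Latin square shown above is of the cyclic type: each row is the previous row shifted one step. A small Python sketch (the function names are chosen here for illustration) builds such a square and checks the defining property that every treatment occurs exactly once in each row and each column.

```python
def cyclic_latin_square(treatments):
    """Build a Latin square by shifting the treatment list one step per row."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """Check that every treatment occurs exactly once in each row and column."""
    symbols = set(square[0])
    if len(square) != len(symbols):
        return False
    rows_ok = all(len(row) == len(symbols) and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


square = cyclic_latin_square(["A", "B", "C"])
for label, row in zip(["I", "II", "III"], square):
    print(label, " ".join(row))
print("valid Latin square:", is_latin_square(square))
```

In practice rows, columns and treatment labels would be randomised before use; the cyclic construction is only one convenient starting point.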


Example of a Latin square: A rabbit experiment


◮ 6 rabbits
◮ Vaccination at 6 different spots on the back
◮ 6 different orders of vaccination
◮ Swelling is area of blister (cm²)

spot  rabbit  order  swelling

1     1       3      7.9
1     2       5      8.7
1     3       4      7.4
1     4       1      7.4
.     .       .      .
.     .       .      .
6     4       4      5.8
6     5       1      6.4
6     6       3      7.7


Illustrations

[Figure: two plots of swelling (axis from 5 to 10), one against spot (a to f) and one against order (1 to 6); points are labelled 1 to 6]


3-way analysis of variance, with additive effects

proc glm;
  class rabbit spot order;
  model swelling=rabbit spot order;
run;

The GLM Procedure

Class Level Information

Class     Levels   Values

rabbit    6        1 2 3 4 5 6
spot      6        a b c d e f
order     6        1 2 3 4 5 6

Number of observations 36


3-way analysis of variance

Dependent Variable: swelling

                          Sum of
Source             DF     Squares        Mean Square    F Value   Pr > F

Model              15     17.23000000    1.14866667     1.75      0.1205
Error              20     13.13000000    0.65650000
Corrected Total    35     30.36000000

R-Square    Coeff Var    Root MSE    swelling Mean

0.567523    10.99883     0.810247    7.366667

Source     DF    Type III SS    Mean Square    F Value   Pr > F

rabbit      5    12.83333333    2.56666667     3.91      0.0124
spot        5     3.83333333    0.76666667     1.17      0.3592
order       5     0.56333333    0.11266667     0.17      0.9701

The design is balanced, so the test of the effect of one variable (covariate) does not depend on which of the others are still in the model.
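One visible consequence of the balance is that the three Type III sums of squares add up to the model sum of squares. The Python sketch below (an illustration only, using numbers copied from the SAS output) checks this and recomputes the F values as mean square divided by the error mean square.

```python
# Type III sums of squares from the SAS output; each factor has 6 levels, so 5 DF.
ss = {"rabbit": 12.83333333, "spot": 3.83333333, "order": 0.56333333}
df_factor = 5
ss_error, df_error = 13.13, 20
ss_model = 17.23

ms_error = ss_error / df_error

# In this balanced design the factor sums of squares add up to the model
# sum of squares.
print("sum of factor SS:", round(sum(ss.values()), 2), "model SS:", ss_model)

# F value for each factor: (SS / DF) divided by the error mean square.
f_values = {name: (value / df_factor) / ms_error for name, value in ss.items()}
for name, f in f_values.items():
    print(name, round(f, 2))
```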


How about possible interactions?

proc glm;
  class rabbit spot order;
  model swelling=rabbit spot order spot*order;
run;

Dependent Variable: swelling

                          Sum of
Source             DF     Squares        Mean Square    F Value   Pr > F

Model              35     30.36000000    0.86742857     .         .
Error               0      0.00000000    .
Corrected Total    35     30.36000000

Source         DF    Type I SS      Mean Square    F Value   Pr > F

rabbit          5    12.83333333    2.56666667     .         .
spot            5     3.83333333    0.76666667     .         .
order           5     0.56333333    0.11266667     .         .
spot*order     20    13.13000000    0.65650000     .         .

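There is no room for interaction, since there is only one observation for each combination of spot and order: the spot*order term takes up all 20 remaining degrees of freedom, leaving none for the error term!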
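The degrees of freedom can be tallied to see why nothing is left for the error term. This is a small Python sketch of the bookkeeping for this particular design (variable names are chosen here for illustration), not a general rule for how SAS assigns degrees of freedom.

```python
n_obs = 36                      # 6 x 6 Latin square, one observation per cell
df_total = n_obs - 1            # corrected total: 35

# Each main effect has 6 levels and therefore 5 degrees of freedom.
df_main = {"rabbit": 5, "spot": 5, "order": 5}

# Degrees of freedom left after the main effects (error DF in the additive model).
df_left = df_total - sum(df_main.values())

# A full spot*order interaction would nominally need 5 * 5 = 25 degrees of
# freedom, but only df_left are available in this design.
df_interaction_nominal = df_main["spot"] * df_main["order"]
df_interaction = min(df_interaction_nominal, df_left)

df_error = df_left - df_interaction
print("left after main effects:", df_left)
print("used by spot*order:", df_interaction)
print("error DF:", df_error)
```

This matches the output: the 20 degrees of freedom and the sum of squares 13.13 that formed the error line of the additive model now appear in the spot*order line.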
