Copyright (c) Bani K. Mallick: STAT 651, Lecture #13
Post on 21-Dec-2015
Copyright (c) Bani K. Mallick 1
STAT 651
Lecture #13
Topics in Lecture #13
Multiple comparisons, especially Fisher’s Least Significant Difference
Residuals as a means of checking the normality assumption
Book Sections Covered in Lecture #13
Chapter 8.4 (Residuals)
Chapter 9.4 (Fisher’s)
Chapter 9.1 (the idea of multiple comparisons)
Lecture 12 Review: ANOVA
Suppose we form three populations on the basis of body mass index (BMI):
BMI < 22, 22 <= BMI < 28, BMI >= 28
This forms 3 populations
We want to know whether the three populations have the same mean caloric intake, or if their food composition differs.
Lecture 12 Review: ANOVA
One procedure that is often followed is to do a preliminary test to see whether there are any differences among the populations
Then, once you conclude that some differences exist, you allow somewhat more informality in deciding where those differences manifest themselves
The first step is the ANOVA F-test
Lecture 12 Review: ANOVA
The distance of the data to the overall mean is the (Corrected) Total Sum of Squares:
$\mathrm{TSS} = \sum_{ij} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$
This has $n_T - 1$ degrees of freedom
Lecture 12 Review: ANOVA
The sum of squares between groups (Corrected Model) is
$\sum_{i} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$
It has t-1 degrees of freedom, so the number of populations is the degrees of freedom between groups + 1.
Lecture 12 Review: ANOVA
The distance of the observations to their sample means is
$\mathrm{SSE} = \sum_{ij} (Y_{ij} - \bar{Y}_{i\cdot})^2$
This is the Sum of Squares for Error
It has $n_T - t$ degrees of freedom
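The three sums of squares above can be checked numerically. The sketch below uses a small hypothetical data set (the group names and values are invented for illustration, not taken from the lecture) and confirms that TSS = between-groups sum of squares + SSE.

```python
# Hypothetical data: three groups, values invented purely for illustration.
groups = {
    "low": [20.0, 22.0, 24.0],
    "mid": [25.0, 27.0, 29.0],
    "high": [30.0, 32.0, 37.0],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def anova_sums_of_squares(groups):
    """Return (TSS, SSB, SSE) for a dict mapping group name -> observations."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    # Distance of every observation to the overall mean.
    tss = sum((y - grand_mean) ** 2 for y in all_values)
    # Distance of each group mean to the overall mean, weighted by group size.
    ssb = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Distance of every observation to its own group mean.
    sse = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    return tss, ssb, sse


tss, ssb, sse = anova_sums_of_squares(groups)
n_total = sum(len(ys) for ys in groups.values())
t = len(groups)
# F-statistic: mean square between groups over mean square for error.
f_stat = (ssb / (t - 1)) / (sse / (n_total - t))
print(tss, ssb, sse, f_stat)
```

The degrees of freedom add up the same way: (n_T - 1) = (t - 1) + (n_T - t).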
Lecture 12 Review: ANOVA
Next comes the F-statistic
It is the ratio of the mean square for the corrected model to the mean square for error
Large values indicate rejection of the null hypothesis
Tests of Between-Subjects Effects
Dependent Variable: Baseline FFQ
Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   960.287 (a)               2     480.143       5.689      .004
Intercept         196009.919                1     196009.919    2322.508   .000
BMIGROUP          960.287                   2     480.143       5.689      .004
Error             15275.639                 181   84.396
Total             226223.216                184
Corrected Total   16235.925                 183
a. R Squared = .059 (Adjusted R Squared = .049)
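As a quick check on how the columns of this table relate, the following sketch recomputes the mean squares, the F-statistic, and R Squared from the sums of squares and degrees of freedom printed in the output. The variable names are mine; the numbers are copied from the table.

```python
# Values copied from the SPSS "Tests of Between-Subjects Effects" table.
ss_model, df_model = 960.287, 2        # Corrected Model (BMIGROUP)
ss_error, df_error = 15275.639, 181    # Error
ss_corrected_total = 16235.925         # Corrected Total

ms_model = ss_model / df_model         # mean square for the corrected model
mse = ss_error / df_error              # mean square for error
f_stat = ms_model / mse                # F = MS(model) / MSE
r_squared = ss_model / ss_corrected_total

print(round(mse, 3), round(f_stat, 3), round(r_squared, 3))
```

The model and error sums of squares also add up to the corrected total, apart from rounding in the printed output.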
Lecture 12 Review: ANOVA
The F-statistic is compared to the F-distribution with t-1 and $n_T - t$ degrees of freedom.
See Table 8, which lists the cutoff points in terms of $\alpha$. If the F-statistic exceeds the cutoff, you reject the hypothesis of equality of all the means.
SPSS gives you the p-value (significance level) for this test
Lecture 12 Review: ANOVA
The F-statistic is compared to the F-distribution with df1 = t-1 and $df_2 = n_T - t$ degrees of freedom.
For example, if you have 3 populations and 6 observations for each population, then there are 18 total observations.
The degrees of freedom are 2 and 15. If you want a type I error of 5%, look at df1 = 2, df2 = 15, $\alpha$ = .05 to get a critical value of 3.68: try this out!
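One way to "try this out" without the table: when the numerator degrees of freedom equal 2 (that is, t = 3 populations), the upper tail of the F-distribution has the closed form P(F > x) = (1 + 2x/df2)^(-df2/2), which can be inverted for the cutoff. The sketch below relies on that special case only; it is not a general F-table replacement, and the function name is my own.

```python
def f_critical_df1_2(alpha, df2):
    """Upper-alpha cutoff of the F(2, df2) distribution.

    Valid only for numerator df = 2, where
    P(F > x) = (1 + 2 * x / df2) ** (-df2 / 2).
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


t = 3                 # number of populations
n_per_group = 6       # observations in each population
n_total = t * n_per_group
df1, df2 = t - 1, n_total - t

cutoff = f_critical_df1_2(0.05, df2)
print(df1, df2, round(cutoff, 2))
```

For other values of df1 a statistical table or statistical software is needed.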
Lecture 12 Review: ANOVA
If the populations have a common variance $\sigma^2$, the mean squared error (MSE) estimates it.
You take the square root of the MSE to estimate $\sigma$
In the Baseline FFQ output, the Error mean square is MSE = 84.396, so the estimate of $\sigma$ is $\sqrt{84.396} \approx 9.19$
Lecture 12 Review: ANOVA
The critical value for an F-test with 2 and 181 df at Type I error 0.05 is about 3.05
In the Baseline FFQ output, F = 5.689 > 3.05, so the p-value is < 0.05 (SPSS reports Sig. = .004)
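The p-value that SPSS reports for the Baseline FFQ example can be reproduced by hand because the numerator degrees of freedom are 2. In that special case the F tail area is (1 + 2F/df2)^(-df2/2). The sketch below is an illustration under that assumption, with a function name of my choosing.

```python
def f_pvalue_df1_2(f_stat, df2):
    """P(F > f_stat) for an F(2, df2) distribution (numerator df = 2 only)."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


# F and error df taken from the Baseline FFQ ANOVA table.
p_value = f_pvalue_df1_2(5.689, 181)
# Tail area at the approximate 5% cutoff quoted on the slide.
p_at_cutoff = f_pvalue_df1_2(3.05, 181)

print(round(p_value, 3), round(p_at_cutoff, 3))
```

The tail area at 3.05 comes out very close to 0.05, which is what "critical value at Type I error 0.05" means.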
ANOVA in SPSS
“Analyze”, “General Linear Model”, “Univariate”
“Fixed factor” = the variable defining the populations
Always “Save” unstandardized residuals
“Posthoc”: Move factor to right and click on LSD
ANOVA Table
Tests of Between-Subjects Effects
Dependent Variable: Baseline FFQ
Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   960.287 (a)               2     480.143       5.689      .004
Intercept         196009.919                1     196009.919    2322.508   .000
BMIGROUP          960.287                   2     480.143       5.689      .004
Error             15275.639                 181   84.396
Total             226223.216                184
Corrected Total   16235.925                 183
a. R Squared = .059 (Adjusted R Squared = .049)
Fisher’s Least Significant Difference (LSD)
Suppose that we determine that there are at least some differences among t population means.
Fisher’s Least Significant Difference is one way to tell which ones are different
The main reason to use it is convenience: all comparisons can be done with the click of a mouse
It does not guarantee longer or shorter confidence intervals than the usual two-population comparisons
Fisher’s Least Significant Difference (LSD)
For example, suppose there are t = 3 populations.
The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$
The alternative is $H_a$: the null hypothesis is false
But this does not tell you which populations are different, only that some are
Fisher’s Least Significant Difference (LSD)
The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$
The alternative is $H_a$: the null hypothesis is false
There are 4 possibilities:
$\mu_1 = \mu_2 \neq \mu_3$
$\mu_1 = \mu_3 \neq \mu_2$
$\mu_2 = \mu_3 \neq \mu_1$
$\mu_1$, $\mu_2$, $\mu_3$ all different
Fisher’s LSD is a way of getting this directly
Fisher’s LSD
We have done an ANOVA, and now we want to compare two specific populations.
Fisher’s LSD differs from our usual 2-population comparisons in two features:
The degrees of freedom: $n_T - t$, not $n_1 + n_2 - 2$
The pooled standard deviation: the square root of MSE = SSE/($n_T - t$), not $s_p$
Review: Comparing Two Populations
If you can reasonably believe that the population sd’s are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by
$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
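Comparing Two Populations: Usual and Fisher LSD
Usual: $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Fisher: $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_T - t)\, \sqrt{\mathrm{MSE}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$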
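It may help to see why $\sqrt{\mathrm{MSE}}$ is the natural replacement for the pooled standard deviation. Writing $s_i^2$ for the sample variance of group $i$ (notation assumed here, matching the two-population review), the MSE pools the variances of all $t$ groups in the same way that the usual pooled variance pools two:

```latex
\mathrm{MSE} = \frac{\mathrm{SSE}}{n_T - t}
             = \frac{\sum_{i=1}^{t} (n_i - 1)\, s_i^2}{n_T - t},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}.
```

With $t = 2$ the two expressions coincide, so the Fisher interval reduces to the usual one; with $t > 2$ the remaining groups also contribute to the variance estimate and to the degrees of freedom.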
ROS Data
ROS data has three groups: fish oil diet, fish-like oil diet (labeled FAEE oil diet in the output), and corn oil diet
We want to compare their responses to butyrate
Between-Subjects Factors
Diet Group   Value Label     N
1.00         FAEE oil diet   10
2.00         Fish oil diet   10
3.00         Corn oil diet   10
ANOVA
ROS data, log scale. What do you see?
[Figure: box plots of ROS Response After Butyrate Exposure, log(Butyrate) - log(Control), by Diet Group (FAEE oil diet, Fish oil diet, Corn oil diet), N = 10 per group]
ANOVA
ROS data, log scale. What do you see? Maybe different variances, but sample sizes are small
[Figure: the same box plots, repeated]
ANOVA
ROS data, log scale. No major changes in means?
[Figure: the same box plots, repeated]
ANOVA
ROS data has three groups: Fish oil diet, Fish-like oil diet, and Corn Oil
What was the total sample size? n = 30
Tests of Between-Subjects Effects
Dependent Variable: log(Butyrate) - log(Control)
Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   5.188E-02 (a)             2    2.594E-02     .203     .818
Intercept         5.957                     1    5.957         46.542   .000
DIETGRP           5.188E-02                 2    2.594E-02     .203     .818
Error             3.456                     27   .128
Total             9.465                     30
Corrected Total   3.508                     29
a. R Squared = .015 (Adjusted R Squared = -.058)
ANOVA
ROS data: any evidence that the population means are different in their change after butyrate exposure? No, the p-value is 0.818!
This matches the box plots
ROS Data
Testing for Normality in ANOVA
I use the General Linear Model to define the residuals
Form the residuals, which are simply the differences between each observation and its group sample mean
Then do a q-q plot of the residuals
Useful if you have many groups with a small number of observations per group
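The two steps above (form residuals, then build a q-q plot) can be sketched outside SPSS as follows. The data are hypothetical, and the plotting positions (rank - 3/8)/(n + 1/4) are a Blom-type convention that I am assuming here; other software may use a slightly different rule.

```python
from statistics import NormalDist, mean

# Hypothetical observations for three groups (invented for illustration only).
groups = {
    "A": [1.2, 0.8, 1.0],
    "B": [0.4, 0.9, 0.5],
    "C": [1.5, 1.1, 1.3],
}


def residuals_by_group(groups):
    """Residual = observation minus the sample mean of its own group."""
    out = []
    for values in groups.values():
        group_mean = mean(values)
        out.extend(y - group_mean for y in values)
    return out


def qq_points(residuals):
    """Pair each sorted residual with an expected normal value.

    Plotting positions (rank - 3/8) / (n + 1/4) are an assumed convention.
    Expected values are scaled by the standard deviation of the residuals.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    # Residuals sum to zero, so no mean needs to be subtracted here.
    scale = (sum(r ** 2 for r in ordered) / (n - 1)) ** 0.5
    std_normal = NormalDist()
    expected = [
        scale * std_normal.inv_cdf((rank - 0.375) / (n + 0.25))
        for rank in range(1, n + 1)
    ]
    return list(zip(ordered, expected))


residuals = residuals_by_group(groups)
points = qq_points(residuals)
for observed, expected in points:
    print(round(observed, 3), round(expected, 3))
```

If the residuals are roughly normal, the (observed, expected) pairs fall close to a straight line. Pooling residuals across groups is what makes this usable when each group has only a few observations.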
ANOVA
Here is the Q-Q plot. How’s it look?
[Figure: normal Q-Q plot of the residuals, ROS: log scale; horizontal axis Observed Value, vertical axis Expected Normal Value]
ROS Data
Testing for Normality in ANOVA:
Illustrate saving residuals: “general linear model”, “univariate”, “save” (select “unstandardized” to create the residual variable)
Illustrate q-q plot on residuals
Illustrate editing a chart object to change titles and the like
ROS Data
Fisher’s LSD. Note how all p-values are > 0.10.
Multiple Comparisons
Dependent Variable: log(Butyrate) - log(Control)
LSD
(I) Diet Group   (J) Diet Group   Mean Difference (I-J)   Std. Error   Sig. (p-value)   95% CI Lower Bound   95% CI Upper Bound
FAEE oil diet    Fish oil diet    6.825E-02               .1600        .673             -.2600               .3965
FAEE oil diet    Corn oil diet    9.960E-02               .1600        .539             -.2287               .4279
Fish oil diet    FAEE oil diet    -6.8255E-02             .1600        .673             -.3965               .2600
Fish oil diet    Corn oil diet    3.135E-02               .1600        .846             -.2969               .3596
Corn oil diet    FAEE oil diet    -9.9605E-02             .1600        .539             -.4279               .2287
Corn oil diet    Fish oil diet    -3.1350E-02             .1600        .846             -.3596               .2969
Based on observed means.
ROS Data: Compare Fish to Corn oil
Mean for fish – mean for corn = 0.03135
Standard error = 0.1600
CI (95%) = -.2969 to .3596
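The standard error and interval for the fish versus corn comparison can be rebuilt from the ROS ANOVA table using Fisher’s formula. In this sketch the MSE, sample sizes, and mean difference come from the SPSS output; the t critical value 2.0518 for 27 df is my assumption, taken from a standard t-table rather than from the slides.

```python
from math import sqrt

mse = 0.128            # Error mean square from the ROS ANOVA table (27 df)
n_fish, n_corn = 10, 10
mean_diff = 0.03135    # mean for fish minus mean for corn, from the LSD output
t_crit = 2.0518        # assumed: two-sided 5% t critical value for 27 df

# Fisher's LSD uses sqrt(MSE) in place of the two-sample pooled sd.
std_error = sqrt(mse * (1 / n_fish + 1 / n_corn))
half_width = t_crit * std_error
lower = mean_diff - half_width
upper = mean_diff + half_width

print(round(std_error, 4), round(lower, 4), round(upper, 4))
```

Because every diet group has 10 observations, all pairs share the same standard error, which is why the Std. Error column is constant in the output.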
Concho Water Snake Illustration
A numerical example will help illustrate this idea. I’ll consider comparing tail lengths of female Concho Water Snakes with age classes 2, 3, and 4.
Sample sizes: $n_1 = 11,\ n_2 = 17,\ n_3 = 9,\ n_T = 37$.
Sample sd: $s_1 = 17.90,\ s_2 = 10.95,\ s_3 = 13.58$.
Sample means: $\bar{X}_1 = 153.82,\ \bar{X}_2 = 173.24,\ \bar{X}_3 = 194.67$.
Female Concho Water Snakes, Ages 2-4, Tail Length
Between-Subjects Factors
Age    N
2.00   11
3.00   17
4.00   9
Female Concho Water Snakes, Ages 2-4, Tail Length
[Figure: box plots of Tail Length by Age (2.00, 3.00, 4.00), with N = 11, 17, 9]
Female Concho Water Snakes, Ages 2-4, Tail Length: are they different in population means?
Tests of Between-Subjects Effects
Dependent Variable: Tail Length
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   8269.413 (a)              2    4134.706      21.304     .000
Intercept         1043505.649               1    1043505.649   5376.698   .000
AGE               8269.413                  2    4134.706      21.304     .000
Error             6598.695                  34   194.079
Total             1118093.000               37
Corrected Total   14868.108                 36
a. R Squared = .556 (Adjusted R Squared = .530)
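The Error line of this table can be approximated from the summary statistics alone, since SSE equals the sum over groups of (n_i - 1) times the group sample variance. The sketch below uses the sample sizes and rounded sample sd’s quoted for the three age classes, so it matches the SPSS values only up to rounding.

```python
# Sample sizes and (rounded) sample standard deviations for ages 2, 3, 4.
n = [11, 17, 9]
s = [17.90, 10.95, 13.58]

# SSE pools the within-group variation of all groups.
sse = sum((ni - 1) * si ** 2 for ni, si in zip(n, s))
df_error = sum(n) - len(n)     # n_T - t
mse = sse / df_error

print(round(sse, 2), df_error, round(mse, 2))
```

The small gap from the table values (SSE 6598.695, MSE 194.079) comes from using sd’s rounded to two decimals.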
Concho Water Snake Example
Multiple Comparisons
Dependent Variable: Tail Length
LSD
(I) Age   (J) Age   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
2.00      3.00      -19.4171 *              5.3907       .001   -30.3724             -8.4618
2.00      4.00      -40.8485 *              6.2616       .000   -53.5736             -28.1233
3.00      2.00      19.4171 *               5.3907       .001   8.4618               30.3724
3.00      4.00      -21.4314 *              5.7429       .001   -33.1023             -9.7604
4.00      2.00      40.8485 *               6.2616       .000   28.1233              53.5736
4.00      3.00      21.4314 *               5.7429       .001   9.7604               33.1023
Based on observed means.
*. The mean difference is significant at the .05 level.
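Unlike the ROS example, the age groups here have unequal sample sizes, so each pair gets its own standard error. A short sketch, using the MSE from the ANOVA table and the group sizes from the output, reproduces the Std. Error column (the dictionary layout is my own choice):

```python
from itertools import combinations
from math import sqrt

mse = 194.079                 # Error mean square from the ANOVA table
n = {2: 11, 3: 17, 4: 9}      # age class -> sample size

# Fisher LSD standard error for each pair: sqrt(MSE * (1/n_a + 1/n_b)).
std_errors = {
    (a, b): sqrt(mse * (1 / n[a] + 1 / n[b]))
    for a, b in combinations(sorted(n), 2)
}

for pair, se in std_errors.items():
    print(pair, round(se, 4))
```

The pair with the two smallest groups (ages 2 and 4) has the largest standard error.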
Concho Water Snake Illustration: Hand Calculations
Sample size factor for comparing the age groups: $\sqrt{\frac{1}{n_2} + \frac{1}{n_3}} = 0.41$
Sample mean difference: $\bar{X}_3 - \bar{X}_2 = 21.43$
Concho Water Snake Illustration
$n_T - t$ = 34 degrees of freedom for error
MSE = 194.08, $\sqrt{\mathrm{MSE}} = 13.93$
$\alpha$ = 0.05, $t_{\alpha/2}(n_T - t) = 2.03$
$\bar{X}_3 - \bar{X}_2 \pm t_{\alpha/2}(n_T - t)\, \sqrt{\mathrm{MSE}} \sqrt{\frac{1}{n_2} + \frac{1}{n_3}}$ = 9.76 to 33.10: compare with output
Female Concho Water Snakes, Ages 2-4, Tail Length
[Figure: Normal Q-Q Plot of Residual for TAILL; horizontal axis Observed Value, vertical axis Expected Normal Value]
We need a method that allows for non-normal data!