ANOVA Demo Part 2: Analysis
Psy 320, Cal State Northridge
Andrew Ainsworth, PhD
Review: Sample Variance
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$$
This can be rewritten as:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1} = \frac{SS}{df}$$
Data Set
Group   Scores        Group Mean
G1      7, 1, 10      6.00
G2      11, 7, 8      8.67
G3      14, 13, 16    14.33
Mean    9.667         9.667
FYI: N = 9
Total Sums of Squares
Let’s calculate the Sums of Squares (SS) for this data set as it is:
$$SS_T = \sum (X_i - \bar{X}_{GM})^2$$
As we can see, the mean of 9.67 has already been calculated for us, and we are going to treat that 9.67 as a Grand Mean (i.e., the ungrouped mean).
$$SS_T = (7 - 9.667)^2 + (1 - 9.667)^2 + (10 - 9.667)^2 + (11 - 9.667)^2 + (7 - 9.667)^2 + (8 - 9.667)^2 + (14 - 9.667)^2 + (13 - 9.667)^2 + (16 - 9.667)^2$$
$$SS_T = 7.111 + 75.111 + 0.111 + 1.778 + 7.111 + 2.778 + 18.778 + 11.111 + 40.111$$
$$SS_T = 164$$
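The total SS calculation above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `sum_of_squares` is made up for illustration, and the scores are the nine values from the demo data set.

```python
# Scores from the demo data set, listed group by group (G1, G2, G3).
scores = [7, 1, 10, 11, 7, 8, 14, 13, 16]


def sum_of_squares(values, center):
    """Sum of squared deviations of each value from `center`."""
    return sum((x - center) ** 2 for x in values)


# The grand mean ignores group membership entirely.
grand_mean = sum(scores) / len(scores)
ss_total = sum_of_squares(scores, grand_mean)

print(f"Grand mean = {grand_mean:.3f}")
print(f"SS_T       = {ss_total:.3f}")
```

Because the code keeps the grand mean unrounded, the result matches the hand calculation of 164 up to floating-point error.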
Between Groups Sums of Squares
So, the Total Sums of Squares applies to this data if all 9 data points were collected as part of a single 9-member group.
However, what if the data were collected in groups of 3 instead?
And let’s imagine that each group is receiving a different form of treatment (e.g., a level of an independent variable) that we think will affect the subjects’ scores and, with them, each individual group’s mean.
Note that each group has its own mean that describes the central tendency of the participants in that group (e.g., 6 is the mean of 7, 1, and 10).
Also note that, with equal numbers of subjects in each group, the average of the 3 group means is the “Grand Mean” from earlier (i.e., 9.667 is the average of 6, 8.667, and 14.333).
So, if we want to understand the effect that the different treatments are having on the groups via the participants (i.e., how the treatments are moving the participants away from the grand mean), we can pretend for a second that every participant scored exactly at their own group mean.
Note: the group means and the grand mean stay the same.
Group   Scores                    Group Mean
G1      6, 6, 6                   6.000
G2      8.667, 8.667, 8.667       8.667
G3      14.333, 14.333, 14.333    14.333
Mean    9.667                     9.667
Let’s ignore the group means for a second and calculate the SS pretending that every participant scored at their group mean:
$$SS_{BG} = \sum (\bar{X}_{g_i} - \bar{X}_{GM})^2$$
$$SS_{BG} = (6 - 9.667)^2 + (6 - 9.667)^2 + (6 - 9.667)^2 + (8.667 - 9.667)^2 + (8.667 - 9.667)^2 + (8.667 - 9.667)^2 + (14.333 - 9.667)^2 + (14.333 - 9.667)^2 + (14.333 - 9.667)^2$$
$$SS_{BG} = 108.655$$
Let’s take a look at that last formula and see that, when calculated, there is a lot of redundancy within each group:
$$SS_{BG} = (6 - 9.667)^2 + (6 - 9.667)^2 + (6 - 9.667)^2 + \dots$$
For instance, for the first group we are subtracting and squaring the same number 3 times (i.e., once for each participant).
Couldn’t we come to the same answer by simply doing it once and multiplying by the number of participants in that group (i.e., 3)?
$$SS_{BG} = 3(6 - 9.667)^2 + \dots$$
This is why we typically don’t substitute the mean for every person’s score but just weight the difference by the number of scores in each group (i.e., $n_g$):
$$SS_{BG} = \sum n_g (\bar{X}_g - \bar{X}_{GM})^2$$
$$SS_{BG} = [3(6 - 9.667)^2] + [3(8.667 - 9.667)^2] + [3(14.333 - 9.667)^2]$$
$$SS_{BG} = 108.655$$
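To see that the "substitute the group mean for every score" version and the weighted version give the same answer, here is a small Python sketch. The variable names are illustrative only. Note that the code keeps the means unrounded, so it yields about 108.667 rather than the 108.655 obtained by hand from means rounded to three decimals.

```python
groups = {"G1": [7, 1, 10], "G2": [11, 7, 8], "G3": [14, 13, 16]}

all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = {name: sum(g) / len(g) for name, g in groups.items()}

# Way 1: pretend every participant scored exactly at their group mean,
# then take the squared deviation from the grand mean for each of them.
substituted = [group_means[name] for name, g in groups.items() for _ in g]
ss_bg_long = sum((m - grand_mean) ** 2 for m in substituted)

# Way 2: one squared deviation per group, weighted by the group size n_g.
ss_bg_weighted = sum(
    len(g) * (group_means[name] - grand_mean) ** 2 for name, g in groups.items()
)

print(f"SS_BG (substituted) = {ss_bg_long:.3f}")
print(f"SS_BG (weighted)    = {ss_bg_weighted:.3f}")
```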
Within Groups Sums of Squares
Looking back at the original data, we can see that in fact the subjects rarely, if ever, scored exactly at their group mean.
So, something else, besides our hypothesized treatment, is causing our subjects to differ within each of the groups.
We haven’t hypothesized it, and therefore we can’t explain why it’s there, so we are going to assume that it is just random variation. But we still need to identify it.
Group   Scores        Group Mean
G1      7, 1, 10      6.000
G2      11, 7, 8      8.667
G3      14, 13, 16    14.333
Mean    9.667         9.667
To identify the random variation, let’s look inside each group separately to see if we can find an average degree of variation.
Remembering that variance is SS/df, let’s identify the within-group SS values for each group.
For Group 1 (scores 7, 1, 10; mean 6):
$$SS_{WG_1} = \sum (X_i - \bar{X}_{g_1})^2$$
$$SS_{WG_1} = (7 - 6)^2 + (1 - 6)^2 + (10 - 6)^2$$
$$SS_{WG_1} = 42$$
For Group 2 (scores 11, 7, 8; mean 8.667):
$$SS_{WG_2} = \sum (X_i - \bar{X}_{g_2})^2$$
$$SS_{WG_2} = (11 - 8.667)^2 + (7 - 8.667)^2 + (8 - 8.667)^2$$
$$SS_{WG_2} = 8.667$$
Note: this equals the group mean coincidentally.
For Group 3 (scores 14, 13, 16; mean 14.333):
$$SS_{WG_3} = \sum (X_i - \bar{X}_{g_3})^2$$
$$SS_{WG_3} = (14 - 14.333)^2 + (13 - 14.333)^2 + (16 - 14.333)^2$$
$$SS_{WG_3} = 4.667$$
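The three within-group SS values can be reproduced with one small helper applied to each group. This is a sketch only; `within_ss` is a name chosen here for illustration.

```python
groups = {"G1": [7, 1, 10], "G2": [11, 7, 8], "G3": [14, 13, 16]}


def within_ss(scores):
    """Sum of squared deviations of each score from its own group mean."""
    group_mean = sum(scores) / len(scores)
    return sum((x - group_mean) ** 2 for x in scores)


ss_wg_by_group = {name: within_ss(g) for name, g in groups.items()}

for name, ss in ss_wg_by_group.items():
    print(f"SS_WG for {name} = {ss:.3f}")
```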
Within Groups Variance
These Within Groups Sums of Squares can be used to tell us how people just “randomly” spread out within each of the groups.
Remembering that variance is SS/df, let’s divide each group’s SS by its degrees of freedom (df):
$$\sigma^2_{WG_j} = \frac{\sum (X_i - \bar{X}_{g_j})^2}{n_j - 1}$$
where $n_j$ is the number of participants in each group (e.g., for our example $n_j = 3$ for all of the groups).
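The Variance Within Each Group
$$\sigma^2_{WG_j} = \frac{\sum (X_i - \bar{X}_{g_j})^2}{n_j - 1} = \frac{SS_{WG_j}}{df_{WG_j}}$$
For group 1:
$$\sigma^2_{WG_1} = \frac{SS_{WG_1}}{df_{WG_1}} = \frac{42}{3 - 1} = 21$$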
For group 2:
$$\sigma^2_{WG_2} = \frac{SS_{WG_2}}{df_{WG_2}} = \frac{8.667}{3 - 1} = 4.334$$
For group 3:
$$\sigma^2_{WG_3} = \frac{SS_{WG_3}}{df_{WG_3}} = \frac{4.667}{3 - 1} = 2.334$$
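As a cross-check on the three group variances, the sketch below computes SS/df by hand and compares it with `statistics.variance` from the Python standard library, which also divides by n - 1. The helper name `ss_over_df` is an assumption for illustration. With unrounded group means, the values for groups 2 and 3 come out as 4.333 and 2.333 rather than 4.334 and 2.334.

```python
from statistics import variance

groups = {"G1": [7, 1, 10], "G2": [11, 7, 8], "G3": [14, 13, 16]}


def ss_over_df(scores):
    """Within-group variance computed by hand as SS / (n - 1)."""
    group_mean = sum(scores) / len(scores)
    ss = sum((x - group_mean) ** 2 for x in scores)
    return ss / (len(scores) - 1)


by_hand = {name: ss_over_df(g) for name, g in groups.items()}
from_library = {name: variance(g) for name, g in groups.items()}

for name in groups:
    print(f"{name}: SS/df = {by_hand[name]:.3f}, "
          f"statistics.variance = {from_library[name]:.3f}")
```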
Average Within Groups Variance
Now that we have the variances within each of the groups, we can calculate an average within groups variance that is an extension of the pooled variance from the independent samples t-test.
Because the values for $n_j$ are equal, this is just a simple average. (Note: if the $n_j$ values are not equal, you can perform a weighted average or just calculate the WG variance directly, as in the next slide.)
$$\sigma^2_{WG} = \frac{21 + 4.334 + 2.334}{3} = 9.223$$
Note: there is no subscript because this is not for any particular group but an average across the groups.
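The weighted average mentioned in the note can be sketched as follows, weighting each group variance by its degrees of freedom (n_j - 1), as in the pooled variance of the independent samples t-test. The function name `pooled_variance` is illustrative. With equal group sizes, as here, it reduces to the simple average.

```python
from statistics import variance


def pooled_variance(groups):
    """Average of the group variances, each weighted by its df (n_j - 1)."""
    dfs = [len(g) - 1 for g in groups]
    weighted_sum = sum(df * variance(g) for df, g in zip(dfs, groups))
    return weighted_sum / sum(dfs)


groups = [[7, 1, 10], [11, 7, 8], [14, 13, 16]]

simple_average = sum(variance(g) for g in groups) / len(groups)
pooled = pooled_variance(groups)

print(f"Simple average of variances = {simple_average:.3f}")
print(f"df-weighted (pooled) value  = {pooled:.3f}")
```

Both lines print 9.222 here; the hand value of 9.223 differs only because the group variances were rounded before averaging.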
Within Groups Sums of Squares
The value for the overall WG Sums of Squares could have been calculated directly by simply combining the $SS_{WG}$ formula across groups.
Note again that there is no subscript on the SS value because it is done for all groups at the same time.
$$SS_{WG} = \sum (X_i - \bar{X}_{g_j})^2$$
Remembering that the means for the groups are 6, 8.667, and 14.333 respectively, we can simply take every individual score and subtract the mean of the group the score belongs to.
All together now:
$$SS_{WG} = (7 - 6)^2 + (1 - 6)^2 + (10 - 6)^2 + (11 - 8.667)^2 + (7 - 8.667)^2 + (8 - 8.667)^2 + (14 - 14.333)^2 + (13 - 14.333)^2 + (16 - 14.333)^2$$
$$SS_{WG} = 55.333$$
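With all three sums of squares in hand, a short Python sketch can confirm that the total splits exactly into a between-groups part and a within-groups part. The local `mean` helper is defined here only to keep the example self-contained.

```python
groups = [[7, 1, 10], [11, 7, 8], [14, 13, 16]]


def mean(values):
    return sum(values) / len(values)


all_scores = [x for g in groups for x in g]
grand_mean = mean(all_scores)

# Every score against the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
# Every group mean against the grand mean, weighted by group size.
ss_bg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Every score against the mean of the group it belongs to.
ss_wg = sum((x - mean(g)) ** 2 for g in groups for x in g)

print(f"SS_T  = {ss_total:.3f}")
print(f"SS_BG = {ss_bg:.3f}")
print(f"SS_WG = {ss_wg:.3f}")
print(f"SS_BG + SS_WG = {ss_bg + ss_wg:.3f}")
```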
ANOVA Summary Table
Source            SS        df   MS (σ²)   F
Between Groups    108.655   2    54.328    5.89
Within Groups     55.333    6    9.223
Total             164       8
Let’s take what we know and see if we can put it together and summarize it using the table above.
We know that SS Total = 164.
We know that SS BG = 108.655.
We know that SS WG = 55.333.
We also know that the WG variance (i.e., MS WG above) = 9.223, from the average of the 3 group variances.
Note: the SS for Between and Within add up to the SS Total, as they should. (The small gap, 108.655 + 55.333 = 163.988, comes from rounding the means to three decimals before calculating SS BG.)
We have the SS value for the BG source of variability, but we need to convert it to a variance. Remembering that variance is SS/df, we just need to figure out the BG degrees of freedom:
$$df_{BG} = \#\text{groups} - 1 = 3 - 1 = 2$$
Remembering that variance is SS/df, we now just need to divide the SS value by the df value for the Between Groups source:
$$\sigma^2_{BG} = MS_{BG} = \frac{SS_{BG}}{df_{BG}} = \frac{108.655}{2} = 54.328$$
We have the SS value and the MS value for the WG source of variability, but these two values should be connected somehow.
Remembering that variance is SS/df, we just need to figure out the Within Groups degrees of freedom and see whether dividing in the same way as with the BG source gives the same value (i.e., 9.223).
When we calculated the Within Groups variance, we did so by averaging over the three individual group variances, each of which had n – 1 degrees of freedom.
So, that’s an n – 1 for each group, or g * (n – 1).
Or you can think of it this way: you need to calculate a mean for each group, so you simply take the total number of scores (i.e., N) and subtract 1 for every group (i.e., g), and that’s N – g.
Note that if all of the $n_j$ values are equal, then g * (n – 1) = N – g.
We have 9 total subjects and 3 groups, so that’s:
$$df_{WG} = g(n - 1) = 3(3 - 1) = 3 \times 2 = 6$$
OR
$$df_{WG} = N - g = 9 - 3 = 6$$
If we divide the SS value by the df value for the Within Groups source of variance, we in fact get the same value (within rounding) that we calculated earlier using the pooling method:
$$\sigma^2_{WG} = MS_{WG} = \frac{SS_{WG}}{df_{WG}} = \frac{55.333}{6} = 9.222$$
In order to calculate the total SS we needed to estimate a single Grand Mean; because of this, we lose one degree of freedom.
The total degrees of freedom is simply the total number of participants (i.e., N) minus 1:
$$df_{Total} = N - 1 = 9 - 1 = 8$$
Note: the degrees of freedom for BG and WG sum to the df Total, as they should.
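The three degrees-of-freedom rules can be collected into one small function. This is a sketch with an illustrative name, `anova_df`; it takes the list of group sizes so that it also covers unequal groups, where the within-groups df is still N - g.

```python
def anova_df(group_sizes):
    """Return (df_bg, df_wg, df_total) for a one-way ANOVA."""
    g = len(group_sizes)                      # number of groups
    n_total = sum(group_sizes)                # N, total number of scores
    df_bg = g - 1
    df_wg = sum(n - 1 for n in group_sizes)   # one n - 1 per group, i.e. N - g
    df_total = n_total - 1
    return df_bg, df_wg, df_total


df_bg, df_wg, df_total = anova_df([3, 3, 3])
print(f"df_BG = {df_bg}, df_WG = {df_wg}, df_Total = {df_total}")
```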
In ANOVA demo #1 we talked about how the Between Groups variance is a measure of how far apart the groups are from the Grand Mean, which in turn tells us how far apart they are from each other on average.
In order for us to know whether the groups vary far enough from each other (i.e., whether they are significantly different), we need a measure of random variability to see if our groups differ by more than random variation alone.
The Within Groups variance tells us how much individuals vary from one another on average across the groups. This is our best estimate of random variability, so we use it to see if the groups are different by creating the F-ratio.
The F-ratio is simply the ratio of the Between Groups variance over the Within Groups variance:
$$F = \frac{\sigma^2_{BG}}{\sigma^2_{WG}} = \frac{MS_{BG}}{MS_{WG}} = \frac{54.328}{9.223} = 5.89$$
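Putting the pieces together, the sketch below builds the whole summary table from the raw scores. The function name `one_way_anova` and the dictionary keys are assumptions made for this example, not part of the slides. Because nothing is rounded along the way, SS_BG and MS_BG come out slightly higher than the hand values (108.667 and 54.333), while F still rounds to 5.89.

```python
def one_way_anova(groups):
    """Compute SS, df, MS and F for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]

    ss_bg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_wg = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_bg = len(groups) - 1
    df_wg = len(all_scores) - len(groups)
    ms_bg = ss_bg / df_bg
    ms_wg = ss_wg / df_wg
    return {
        "ss_bg": ss_bg, "df_bg": df_bg, "ms_bg": ms_bg,
        "ss_wg": ss_wg, "df_wg": df_wg, "ms_wg": ms_wg,
        "ss_total": ss_bg + ss_wg, "df_total": df_bg + df_wg,
        "f": ms_bg / ms_wg,
    }


table = one_way_anova([[7, 1, 10], [11, 7, 8], [14, 13, 16]])

print(f"Between Groups  SS={table['ss_bg']:.3f}  df={table['df_bg']}  "
      f"MS={table['ms_bg']:.3f}  F={table['f']:.2f}")
print(f"Within Groups   SS={table['ss_wg']:.3f}  df={table['df_wg']}  "
      f"MS={table['ms_wg']:.3f}")
print(f"Total           SS={table['ss_total']:.3f}  df={table['df_total']}")
```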
The Between Groups variance contains both real and random variability, while the Within Groups variance contains only random variability (at least that’s what we are assuming).
So, in order for an F-ratio to be large, the real group differences have to be large enough for us to see them through the random differences:
$$F = \frac{\sigma^2_{BG}}{\sigma^2_{WG}} = \frac{\sigma^2_{Real} + \sigma^2_{Random}}{\sigma^2_{Random}}$$
If no real differences exist, then you are left with:
$$F = \frac{\sigma^2_{BG}}{\sigma^2_{WG}} = \frac{\sigma^2_{Random}}{\sigma^2_{Random}} = 1$$
The critical values found in the F-table indicate how much larger the between-groups variability must be than the random variability before we treat the group differences as “real”, given the number of groups (i.e., $df_{BG}$) and the number of people in each group (i.e., $df_{WG}$).
For our example:
$$F_{crit}(df_{BG}, df_{WG}) = F_{crit}(2, 6) = 5.143$$
The value of 5.143 tells us that an F of 5.143 or larger is not likely to occur by accident when no real group differences exist (i.e., it has a .05 or lower probability), given the number of groups and the number of subjects per group.
Since our F-value is 5.89, and that is larger than 5.143, we can conclude that a significant difference exists somewhere among our group means (i.e., at least two of them differ).
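The critical value of 5.143 normally comes from a printed F-table, but for the special case of 2 numerator degrees of freedom (3 groups, as in this example) the F distribution has a simple closed-form tail probability, P(F > x) = (1 + 2x / df_WG)^(-df_WG / 2). The sketch below relies on that special case only; it is not a general F-table replacement, and the function names are made up for illustration.

```python
def f_tail_prob_df1_2(f_value, df_wg):
    """P(F > f_value) for an F distribution with (2, df_wg) df.

    Valid ONLY when the numerator (between-groups) df equals 2.
    """
    return (1 + 2 * f_value / df_wg) ** (-df_wg / 2)


def f_crit_df1_2(alpha, df_wg):
    """Critical F for (2, df_wg) df, found by inverting the tail formula."""
    return (df_wg / 2) * (alpha ** (-2 / df_wg) - 1)


f_obs = 5.89      # observed F from the summary table
df_wg = 6

f_crit = f_crit_df1_2(0.05, df_wg)
p_value = f_tail_prob_df1_2(f_obs, df_wg)

print(f"F_crit(2, {df_wg}) = {f_crit:.3f}")
print(f"p-value for F = {f_obs}: {p_value:.4f}")
print("Significant at .05" if f_obs > f_crit else "Not significant at .05")
```

For other numbers of groups, a statistics package or the printed table is the safer route, since this shortcut does not apply there.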