
Post on 28-Mar-2015


TRANSCRIPT

1

Contact details • Colin Gray

• Room S16 (occasionally)

• E-mail address: psy045@abdn.ac.uk

• Telephone: (27) 2233

• Don’t hesitate to get in touch with me by e-mail if you have any queries or encounter any difficulties. I shall respond quickly.

2

SESSION 1

The one-way Analysis of Variance

(ANOVA)

3

The session

• 2.05 – 3.00 A short talk.

• 3.00 – 3.20 A break for coffee.

• 3.20 – 4.00 Running the analysis with SPSS 16.

4

The analysis of variance (ANOVA)

• The analysis of variance (ANOVA) is used to analyse data from complex experiments with three or more conditions or groups.

• There are many different kinds of ANOVA experimental plans or DESIGNS.

5

A one-factor, between subjects experiment

6

The one-way ANOVA

• The ANOVA of a one-factor BETWEEN SUBJECTS experiment is known as the ONE-WAY ANOVA.

• The one-way ANOVA must be sharply distinguished from the one-factor WITHIN SUBJECTS (or REPEATED MEASURES) ANOVA, which is appropriate when each participant is tested under every condition.

• The between subjects and within subjects ANOVAs are predicated upon different statistical MODELS.

7

Results of a one-factor, between subjects experiment

raw scores

grand mean

8

Statistics of the results

group (cell) means

group (cell) standard deviations

Group (cell) variances

9

The null hypothesis

• The null hypothesis states that, in the population, all the means have the same value.

• In words, the null hypothesis states that none of the drugs has any effect.

• It’s as if everyone were performing under the Placebo condition.

• The values of the means vary considerably, casting doubt upon the null hypothesis.

10

Deviation scores

• A DEVIATION is a score from which the mean has been subtracted.

• Deviations about the mean sum to zero.
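A minimal sketch of this point, using made-up scores (not the lecture's data): the deviations about the mean always sum to zero, so knowing n − 1 of them fixes the nth.

```python
# Made-up scores for illustration only.
scores = [6, 8, 5, 9, 7]
mean = sum(scores) / len(scores)            # 35 / 5 = 7.0
deviations = [x - mean for x in scores]     # [-1.0, 1.0, -2.0, 2.0, 0.0]
total = sum(deviations)                     # always 0 (up to rounding)
```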

11

12

Partition of the total sum of squares

13

Interpretation of the two components

• The between groups deviation (and sum of squares) partly reflects any differences that there may be among the population means.

• It also reflects random influences such as individual differences and experimental error, that is, ERROR VARIANCE.

• The within groups deviation (and sum of squares) reflects only error variance, or DATA NOISE.
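The partition can be checked numerically. The sketch below uses three small hypothetical groups (invented numbers, not the lecture's Drugs data) and verifies that the between groups and within groups sums of squares account for all the variation.

```python
# Hypothetical equal-n data for three groups (illustrative only).
groups = {
    "Placebo": [5, 7, 6, 8],
    "Drug A":  [9, 11, 10, 12],
    "Drug B":  [7, 6, 8, 7],
}

all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Total sum of squares: squared deviations from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Between groups: squared deviations of group means from the grand mean.
ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2
                 for g in groups.values())

# Within groups: squared deviations of scores from their own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2
                for g in groups.values() for x in g)

# The partition: SS_total = SS_between + SS_within.
assert abs(ss_total - (ss_between + ss_within)) < 1e-9
```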

14

How the one-way ANOVA works

• The variability BETWEEN the treatment means is compared with the average spread of scores around their means WITHIN the treatment groups.

• The comparison is made with a statistic called the F-RATIO.

15

Mean squares

• In the ANOVA, a variance estimate is known as a MEAN SQUARE, which is expressed as a SUM OF SQUARES (SS), divided by the appropriate DEGREES OF FREEDOM (df).

16

17

Degrees of freedom

• The term DEGREES OF FREEDOM is borrowed from physics.

• The degrees of freedom of a system is the number of independent values needed to determine its state completely.

• Deviations of n values about their mean sum to zero. So if you know (n – 1) deviations, you know the nth deviation.

• The sum of squares of the n deviations has only (n – 1) degrees of freedom.

18

19

Calculating MSwithin

• In the equal-n case, we can simply take the mean of the cell variance estimates.

• MSwithin = (3.33 + 4.54 + 6.22 + 20.27 + 14.00)/5 = 48.36/5 = 9.67
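The slide's equal-n calculation can be reproduced directly: MSwithin is just the mean of the five cell variance estimates.

```python
# The five cell variances from the slide (equal-n case).
cell_variances = [3.33, 4.54, 6.22, 20.27, 14.00]

# MSwithin is their mean.
ms_within = sum(cell_variances) / len(cell_variances)  # 48.36 / 5
```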

20

21

Degrees of freedom of the between groups mean square

• Although there were 50 participants, the between groups deviation has only five different values.

• But deviations about the mean sum to zero, so if you know FOUR deviations, you know the fifth.

• The between groups mean square has FOUR degrees of freedom.

22

Finding MSbetween

23

What F is measuring

• If there are differences among the population means, the numerator will be inflated and F will increase.

• If there are no differences, the two MS’s will have similar values and F will be close to 1.

• Schematically, F = MSbetween/MSwithin = (error + real differences)/(error only).

24

Range of variation of F

• The F statistic is the ratio of two sample variances.

• A variance can take only non-negative values.

• So the lower limit for F is zero.

• There is no upper limit for F.

25

Repeated sampling

• Suppose the null hypothesis is true.

• Imagine the experiment were to be repeated thousands and thousands of times, with fresh samples of participants each time.

• There would be thousands and thousands of data sets, from each of which a value of F could be calculated.

• The distribution of F with repeated sampling is known as its SAMPLING DISTRIBUTION.
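The idea can be sketched by simulation. Below, a true null hypothesis is imposed by drawing all five groups from the same normal population; the group size (10), number of replications, and population parameters are illustrative assumptions, not values from the lecture.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def f_ratio(groups):
    """One-way ANOVA F for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Repeat the "experiment" many times under the null hypothesis:
# five groups of 10, all sampled from the SAME population.
fs = []
for _ in range(2000):
    groups = [[random.gauss(0, 1) for _ in range(10)] for _ in range(5)]
    fs.append(f_ratio(groups))

mean_f = sum(fs) / len(fs)  # under the null, F clusters near 1
```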

26

Specifying the sampling distribution

• To test the null hypothesis, you must be able to locate the value of F in its sampling distribution.

• To specify the correct distribution of F (or any other test statistic), you must assign values to properties known as PARAMETERS.

27

Parameters of F

• Recall that the t distribution has ONE parameter: the DEGREES OF FREEDOM (df).

• The F distribution has TWO parameters: the degrees of freedom of the between groups and within groups mean squares, which we shall denote by dfbetween and dfwithin, respectively.

28

The correct F distribution

• We shall specify an F distribution with the notation F(dfbetween, dfwithin).

• We have seen that in our example, dfbetween = 4 and dfwithin = 45.

• The correct F distribution for our test of the null hypothesis is therefore F(4, 45).

29

The distribution of F(4, 45)

30

31

The ANOVA summary table

• The results of the ANOVA are arrayed in the ANOVA summary table.

• The table contains:

1. a list of sources of variance;
2. their degrees of freedom;
3. their sums of squares;
4. their mean squares;
5. the value of F;
6. the p-value of F.

32

Reporting the result

• There is a correct format for reporting the results of a statistical test. This is described in the APA Publication Manual.

• Never report a p-value as ‘.000’ – that’s not acceptable in a scientific report.

• Write, ‘With an alpha-level of .05, F is significant: F(4, 45) = 9.09; p < .01.’

• You are now expected to include a measure of EFFECT SIZE as well.

33

Lisa DeBruine’s guidelines

• Lisa DeBruine has recently compiled a very useful document describing the most important of the APA guidelines for the reporting of the results of statistical tests.

• I strongly recommend this document, which is readily available on the Web.

• http://www.facelab.org/debruine/Teaching/Meth_A/

• Sometimes the APA manual is unclear. In such cases, Lisa has opted for what seemed to be the most reasonable of several possible interpretations.

• If you follow Lisa’s guidelines, your submitted paper won’t draw fire on account of poor presentation of your statistics!

34

A two-group, between subjects experiment

35

ANOVA or t-test?

We can compare the two means by using an independent-samples t-test.

But what would happen if, instead of making a t test, we were to run an ANOVA to test the null hypothesis of equality of the means?

36

The two-group case: comparison of ANOVA with the t-test

• Observe that F = t².

• Observe also that the p-value is the same for both tests.

• The ANOVA and the independent-samples t test are EXACTLY EQUIVALENT and result in the same decision about the null hypothesis.
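The equivalence can be verified numerically. The sketch below uses two made-up groups of five scores (not the lecture's data) and computes both statistics from scratch.

```python
import math

# Made-up scores for two hypothetical groups.
g1 = [5.0, 7.0, 6.0, 9.0, 8.0]
g2 = [10.0, 12.0, 9.0, 13.0, 11.0]

n1, n2 = len(g1), len(g2)
m1, m2 = sum(g1) / n1, sum(g2) / n2
ss1 = sum((x - m1) ** 2 for x in g1)
ss2 = sum((x - m2) ** 2 for x in g2)

# Independent-samples t with the pooled variance estimate.
sp2 = (ss1 + ss2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# One-way ANOVA F for the same two groups (df_between = 1).
grand = (sum(g1) + sum(g2)) / (n1 + n2)
ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
ms_within = (ss1 + ss2) / (n1 + n2 - 2)
F = ss_between / ms_within

# The two tests are exactly equivalent: F = t².
assert abs(F - t ** 2) < 1e-9
```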

37

Implications of a significant F

• Our F test has shown a significant effect of the Drug factor. What can we conclude?

• We can say that the null hypothesis is false.

• That means that, in the population, there is at least one difference among the array of five means.

• But it does not tell us WHICH differences are significant.

38

Only the starting point

• In ANOVA, the rejection of the null hypothesis leaves many questions unanswered.

• Further analysis is needed to pinpoint the crucial patterns in the data.

• So, unlike the t test, the ANOVA is often just the first step in what may be quite an extensive statistical analysis.

39

Comparisons among the five treatment means

40

Planned comparisons

• Suppose that, before you ran your experiment, you had planned to make a set of specified comparisons among the treatment means.

• For example, you might have planned to compare the mean of the Placebo group with the mean for each of the drug conditions.

• I am going to describe how such comparisons are made.

• I am also going to compare the properties of different comparison sets.

41

Simple and complex comparisons

• A comparison between any two of the array of 5 means is known as a SIMPLE comparison.

• Comparisons between the Placebo mean and each of the others are simple comparisons.

• You might want to compare, not single means, but aggregates (means) of means. Such comparisons between aggregates are known as COMPLEX comparisons.

• For example, you might want to compare the Placebo mean with the mean of the four drug means.

42

Notation for the five means

• Let M1, M2, M3, M4, and M5 be the means for the Placebo, Drug A, Drug B, Drug C, and Drug D groups, respectively.

43

44

Non-independence of comparisons

• The simple comparison of M5 with M1 and the complex comparison are not independent or ORTHOGONAL.

• The value of M5 feeds into the value of the average of the means for the drug groups.

45

Systems of comparisons

• With a complex experiment, interest centres on the properties of SYSTEMS of comparisons.

• Which comparisons are independent or ORTHOGONAL?

• How much VARIANCE can we attribute to different comparisons?

• How can we test a comparison for SIGNIFICANCE?

46

Linear functions

• Y is a linear function of X if the graph of Y upon X is a straight line.

• For example, temperature in degrees Fahrenheit is a linear function of temperature in degrees Celsius.

47

F is a linear function of C

[Figure: straight-line graph of degrees Fahrenheit (vertical axis) against degrees Celsius (horizontal axis), showing F = (9/5)C + 32, with the intercept 32 and the slope 9/5 marked between two points P and Q.]

48

49

50

Linear contrasts

• Any comparison can be expressed as a sum of terms, each of which is a product of a treatment mean and a coefficient such that the coefficients sum to zero.

• When so expressed, the comparison is a LINEAR CONTRAST, because it has the form of a linear function.

• It looks artificial at first, but this notation enables us to study the properties of systems of comparisons among the treatment means.

51

The complex comparison of the Placebo mean with the mean of the means of the four drug conditions can be expressed as a linear function of the five treatment means …

52

Notice that the coefficients sum to zero

53

54

More compactly, if there are k treatment groups, we can write

55

56

57

58

Helmert contrasts

• Compare the first mean with the mean of the other means.

• Drop the first mean and compare the second mean with the mean of the remaining means.

• Drop the second mean, and continue until you arrive at a comparison between the last two means.

59

Helmert contrasts…

• Our first contrast is 1, −¼, −¼, −¼, −¼.

• Our second contrast is 0, 1, −⅓, −⅓, −⅓.

• Our third contrast is 0, 0, 1, −½, −½.

• Our fourth is 0, 0, 0, 1, −1.
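The recipe on the slide can be coded directly. The sketch below generates Helmert coefficients for k means with exact fractions and checks both defining properties: each row sums to zero, and every pair of rows is orthogonal.

```python
from fractions import Fraction

def helmert(k):
    """Helmert contrast coefficients for k means: contrast i compares
    mean i with the mean of the remaining means, dropping earlier means."""
    rows = []
    for i in range(k - 1):
        row = [Fraction(0)] * i                       # dropped means
        row.append(Fraction(1))                       # the mean compared
        row += [Fraction(-1, k - i - 1)] * (k - i - 1)  # the remainder
        rows.append(row)
    return rows

contrasts = helmert(5)

# Each row sums to zero (it is a linear contrast) ...
assert all(sum(row) == 0 for row in contrasts)

# ... and the sum of products of corresponding coefficients in any
# pair of rows is zero (the set is orthogonal).
for a in range(len(contrasts)):
    for b in range(a + 1, len(contrasts)):
        assert sum(x * y for x, y in zip(contrasts[a], contrasts[b])) == 0
```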

60

61

Orthogonal contrasts

• The first contrast in no way constrains the value of the second, because the first mean has been dropped.

• The first two contrasts do not affect the third, because the first two means have been dropped.

• This is a set of four independent or ORTHOGONAL contrasts.

62

The orthogonal property

• As with all contrasts, the coefficients in each row sum to zero.

• In addition, the sum of the products of corresponding coefficients in any pair of rows is zero.

• This means that we have an ORTHOGONAL contrast set.

63

Size of an orthogonal set

• In general, for an array of k means, you can construct a set of, at most, k-1 orthogonal contrasts.

• In the present ANOVA example, k = 5, so the rule tells us that there can be no more than 4 orthogonal contrasts in the set.

• Several different orthogonal sets, however, can often be constructed for the same set of means.

64

Contrast sums of squares

• We have seen that in the one-way ANOVA, the value of SSbetween reflects the sizes of the differences among the treatment means.

• In the same way, it is possible to measure the importance of a contrast by calculating a sum of squares which reflects the variation attributable to that contrast alone.

• We can use an F statistic to test each contrast for significance.
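In the equal-n case the contrast sum of squares is n(Σcⱼ Mⱼ)² / Σcⱼ². A small sketch with illustrative means and group size (invented numbers, not the lecture's data):

```python
def contrast_ss(means, coeffs, n):
    """Equal-n contrast sum of squares: n * psi**2 / sum(c**2),
    where psi is the value of the contrast."""
    psi = sum(c * m for c, m in zip(coeffs, means))
    return n * psi ** 2 / sum(c ** 2 for c in coeffs)

means = [8.0, 10.0, 12.0, 9.0, 13.0]      # hypothetical group means
coeffs = [1, -0.25, -0.25, -0.25, -0.25]  # Placebo vs mean of drug means
n = 10                                     # hypothetical scores per group

ss = contrast_ss(means, coeffs, n)
```

Dividing this sum of squares (which has one degree of freedom) by MSwithin gives the F statistic for the contrast.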

65

Formula for a contrast sum of squares

66

67

Here, once again, is our set of Helmert contrasts, to which I have added the values of the five treatment means

68

69

70

71

72

73

Non-orthogonal contrasts

• Contrasts don’t have to be independent.

• For example, you might wish to compare each of the four drug groups with the Placebo group.

• What you want are SIMPLE CONTRASTS.

74

Simple contrasts

• These are linear contrasts – each row sums to zero.

• But they are not orthogonal – with some pairings, the sum of products of corresponding coefficients is not zero.

• Their sums of squares will not sum to the between groups sum of squares.

75

Testing a contrast sum of squares for significance

76

Two approaches

• A contrast is a comparison between two means.

• You can therefore make an F test or you can make a t test.

• The two tests are equivalent.

77

Degrees of freedom of a contrast sum of squares

• A contrast sum of squares compares two means.

• A contrast sum of squares, therefore, has ONE degree of freedom, because the two deviations from the grand mean sum to zero.

78

79

80

81

82

83

84

Summary

• A contrast is a comparison between two means, so its sum of squares has ONE degree of freedom.

• The contrasts can therefore be tested with either F or t. (F = t².)

• If the contrasts form an orthogonal set, the contrast sums of squares sum to the value of SSbetween.
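The last claim can be checked numerically. The sketch below uses made-up equal-n data for three groups and the two Helmert contrasts for k = 3; the contrast sums of squares add up to SSbetween exactly.

```python
# Hypothetical equal-n data for three groups (illustrative only).
groups = [
    [5, 7, 6, 8],
    [9, 11, 10, 12],
    [7, 6, 8, 7],
]
n = 4
means = [sum(g) / n for g in groups]
grand = sum(sum(g) for g in groups) / (3 * n)
ss_between = sum(n * (m - grand) ** 2 for m in means)

# Helmert contrasts for k = 3 means.
helmert_rows = [
    [1, -0.5, -0.5],
    [0, 1, -1],
]

# Equal-n contrast SS: n * psi**2 / sum(c**2) for each contrast.
ss_contrasts = [
    n * sum(c * m for c, m in zip(row, means)) ** 2 / sum(c ** 2 for c in row)
    for row in helmert_rows
]

# Orthogonal set: the contrast sums of squares exhaust SS_between.
assert abs(sum(ss_contrasts) - ss_between) < 1e-9
```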

85

Coffee break

86

SPSS 16

87

Comparison with SPSS 15

• SPSS 16 is essentially similar to SPSS 15.

• When using SPSS 16, the same GENERAL PRINCIPLES apply.

• Several changes, but these are mostly minor.

• Some commands now appear in different menus than they did in SPSS 15.

• Some old problems remain.

88

SPSS Data Editor

• In SPSS, there are two display modes:

1. Variable View. This contains information about the variables in your data set.

2. Data View. This contains your numerical data (referred to as ‘values’ by SPSS).

• WORK IN VARIABLE VIEW FIRST, because

1. that will make it much easier to enter and view your data in Data View;

2. you can improve the quality of the output.

89

Assistance

• I assume that most (if not all) of you are familiar with SPSS.

• If there are any problems, you’re welcome to e-mail me at any time throughout the year and we can quickly arrange an appointment to try to sort them out.

• I shall begin with an analysis of the data from the Drugs experiment.

90

91

Points

• As always, each row contains data on one person only.

• Only two columns are needed.

• One column contains code numbers (values) indicating group membership.

• The other contains the scores that the participants achieved.

92

Value labels

• In SPSS 15, Value Labels was in the Data menu; but in SPSS 16, it’s in the View menu.

93

Variable View

94

Levels of measurement

SPSS classifies data according to the LEVEL OF MEASUREMENT. There are 3 levels:

1. SCALE data, which are measures on an independent scale with units. Heights, weights, performance scores, counts and IQs are scale data. Each score has ‘stand-alone’ meaning. Equivalent terms are CONTINUOUS and INTERVAL.

2. ORDINAL data, which are RANKS. A rank has meaning only in relation to the other individuals in the sample. It has no ‘stand-alone’ meaning.

3. NOMINAL data, which are assignments to categories. (So-many males, so-many females.) Nominal data are records of CATEGORICAL or QUALITATIVE variables.

95

96

Specifying the level of measurement

97

98

99

Beware the means plot!

100

Output graph of the results

101

A false picture!

• The table of means shows minuscule differences among the five group means.

• The p-value of F is very high – unity to two places of decimals.

• Nothing going on here!

102

A microscopic vertical scale

• Only a microscopically small section of the scale is shown on the vertical axis: 10.9 to 11.4!

• Even small differences among the group means look huge.

103

Putting things right

• Double-click on the image to get into the Graph Editor.

• Double-click on the vertical axis to access the scale specifications.

104

Putting things right …

• Uncheck the minimum value box and enter zero as the desired minimum point.

• Click Apply.

105

Look for the zero point

• The effect is dramatic!

• The profile is now as flat as a pancake.

• The graph now accurately depicts the results.

• Always be suspicious of graphs that do not show the ZERO POINT on the VERTICAL SCALE.

106

Simple contrasts with SPSS

• Here are the entries for the first contrast, which is between the Placebo and Drug A groups.

• Below that are the entries for the final contrast between the Placebo and Drug D groups.

107

The results

• In the column headed ‘Value of Contrast’ are the differences between pairs of treatment means.

• For example, Drug A mean minus Placebo mean = 7.90 - 8.00 = -.10. Drug D – Placebo = 13.00 – 8.00 = 5.00.

108

Clicking buttons

• The basic ANOVA doesn’t take long to order.

• But there are several other options that can be added, such as descriptives and tests for multiple comparisons.

• You can spend quite a lot of time filling in slots in dialog boxes and clicking buttons.

• Suppose you wanted to run ANOVAs on other DVs as well. If you are still in the same session, it’s easy to transfer the new DVs to the box in the ANOVA dialog.

• But when you log off, you would have to do it all again!

109

The answer: Syntax!

• A computing package can be told to run specified analyses by means of a system of written instructions known as CONTROL LANGUAGE.

• In SPSS, the language is known as Syntax.

• If you have to run the same technique again and again, you should create the appropriate syntax file and save it.

• At your next session, you need only open the data file and run the whole analysis using the syntax file, instead of filling in all the boxes and pressing all the buttons again.

110

The Paste button

• In the ANOVA dialog, simply transfer the group and score variables. That orders the basic analysis.

• Click the Paste button at the foot of the One-way ANOVA dialog.

111

112

113

114

Choose some options

• Observe what happens to the syntax file when we press the appropriate buttons to choose a means plot, descriptives, Tukey multiple comparisons, and Helmert planned contrasts.

115

I press buttons to order Descriptives and a profile plot.

116

117

I click the Post Hoc button and select Tukey multiple pairwise comparisons.

118

119

Helmert contrasts

• Let’s order a set of Helmert contrasts.

• Click the Contrasts button and proceed as follows:

120

121

To enter the whole set …

• Instead of returning to the One-way ANOVA dialog, click the Next button and enter the next row of contrast coefficients.

• Continue until the whole set of contrasts has been entered, then click the Continue button to return to the One-way ANOVA dialog.

• Now look at the syntax file.

122

123

Don’t write, just paste!

• Notice that I haven’t written anything.

• I have produced all this syntax merely by pressing the Paste button.

• But I can easily adapt the analysis for other variables and data sets.

124

In summary

• The one-way ANOVA compares the variance between groups with the variance within groups.

• Further analysis is necessary for the testing of comparisons among individual group means.

• Report the results of your analysis in APA format, following Lisa DeBruine’s guidelines.

• In SPSS, work in Variable View first.

• In the ANOVA output, watch out for the profile plots.

• Save time with syntax!

125

Appendix

RELATIONSHIP BETWEEN F AND T IN THE TWO-GROUP CASE

126
