Handout Nine: Repeated Measures – Design, Analysis, & Assumptions. EPSE 592 Experimental Designs and Analysis in Educational Research. Instructor: Dr. Amery Wu

Upload: jocelin-norton

Post on 08-Jan-2018


DESCRIPTION

Today, we will introduce the analysis and assumptions of a within-subjects design. This design explains or predicts quantitative data using one (implicit) independent variable, with two or more conditions under which the DV is measured on the same individuals.

TRANSCRIPT

Page 1: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Handout Nine: Repeated Measures – Design, Analysis, & Assumptions

EPSE 592: Experimental Designs and Analysis in Educational Research

Instructor: Dr. Amery Wu

Page 2: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Where We Are Today

One-way Within-Subjects Factorial Design with Interaction

Today, we will introduce the analysis and assumptions of a within-subjects design. This design explains or predicts quantitative data using one (implicit) independent variable, with two or more conditions under which the DV is measured on the same individuals.

                       Measurement of Data
Type of the Inference  Quantitative  Categorical
Descriptive            A             B
Inferential            C             D

Page 3: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

An Example to Contextualize the Learning: One-Way Within-Subjects Design of ANOVA

Eight Canadian students participated in a food dislike study where four different edible foods from different cultures were compared. The time until retching of the same 8 students was recorded after they were given each of the four treatment conditions (eating parts of 4 animals; Field, 2007).

Student  Stick Insect  Kangaroo Testicle  Fish Eye  Witchetty Grub
1        8             7                  1         6
2        9             5                  2         5
3        6             2                  3         8
4        5             3                  1         9
5        8             4                  5         8
6        7             5                  6         7
7        10            2                  7         2
8        12            6                  8         1

Page 4: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

An Example to Contextualize the Learning: One-Way Within-Subjects Design of ANOVA

The researcher would like to find out whether there is an effect of animal parts on the students’ food dislike (time till retch).

Page 5: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Quantitative Methodology Network

Research Question
Design: Experimental, Observational
Data: Quantitative, Categorical
Model: Descriptive/Summative, Explanatory/Predictive
Inference: Descriptive vs. Inferential, Relational vs. Causal

Page 6: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

What makes a causal inference valid when it is based on a randomized experimental research design?

Page 7: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Between- vs. Within-Subjects Designs

Between-Subjects Design: 12 individuals were randomly assigned to one of three groups. Each group receives a different treatment (A, B, or C).

Treatment A, Treatment B, Treatment C
1 2 3 2 4 5 3 4 6 8 5 5

Within-Subjects Design: The same 4 individuals receive all three treatments (A, B, and C).

Treatment A, Treatment B, Treatment C
4 4 3 5 1 2 3 2 4 6 6 8

Page 8: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

What are the major differences between the designs of One-Way Between-Subjects and One-Way Within-Subjects ANOVA?

Page 9: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Between- vs. Within-Subjects Designs

In a between-subjects design, groups separate individuals such that individuals in one group are different from those in another. In a within-subjects design, individuals are not grouped. Each individual is a group of his or her own, so there are as many groups as individuals.

In a between-subjects design, each individual receives one condition of the treatment. In a within-subjects design, each individual receives all conditions of the treatment.

For a between-subjects design, the DV is measured only once for each individual. For a within-subjects design, the same DV is measured repeatedly (under various conditions).

The data of the two designs are also laid out differently (see next slide).

Page 10: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Layout of the Data Files: Between- vs. Within-Subjects Designs

Within-subjects design (one row per subject, one column per condition):

Subject  Condition 1  Condition 2  Condition 3
1        114          102          121
2        92           97           101
3        85           82           89
4        93           99           97
5        89           89           110
6        102          98           93

Between-subjects design (one row per subject, with the condition recorded as a variable):

Subject  Condition  Score
1        1          114
2        1          79
3        1          89
4        1          91
5        1          94
6        1          95
7        2          92
8        2          94
9        2          95
10       2          94
11       2          96
12       2          96
13       3          81
14       3          85
15       3          87
16       3          88
17       3          90
18       3          96

Note that there are 18 data values for both designs.
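The difference between the two layouts can be made concrete with a small sketch. It takes the six-subject within-subjects scores from this slide and stacks them into one row per measurement. The function name `wide_to_long` and the variable names are illustrative assumptions, not part of the handout or of SPSS.

```python
# Wide layout (within-subjects): one row per subject, one column per condition.
# Scores are the six-subject within-subjects example from this slide.
wide = {
    1: [114, 102, 121],
    2: [92, 97, 101],
    3: [85, 82, 89],
    4: [93, 99, 97],
    5: [89, 89, 110],
    6: [102, 98, 93],
}


def wide_to_long(wide_rows):
    """Return (subject, condition, score) rows, one row per measurement."""
    long_rows = []
    for subject, scores in wide_rows.items():
        for condition, score in enumerate(scores, start=1):
            long_rows.append((subject, condition, score))
    return long_rows


long_rows = wide_to_long(wide)
for row in long_rows[:4]:
    print(row)
print(len(long_rows), "data values from", len(wide), "subjects")
```

In the stacked form each subject number appears three times, whereas in the between-subjects file every subject number appears exactly once. That repetition is what marks the scores as dependent.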

Page 11: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Within-Subjects Design & Analysis

In a between-subjects (independent samples) experimental design, different treatments are given to different groups of individuals; e.g., individuals receiving treatment A will not receive treatment B or C. Their scores on the DV are then recorded once, after the single treatment is given. An individual’s group membership is recorded as an IV.

In a within-subjects (repeated measures or dependent samples) experimental design, the same individuals receive all the treatments, one treatment at a time, and the same DV is measured multiple times, once after each treatment.

Recording the treatment IV as a variable (i.e., control, treatment A, or treatment B) is unnecessary since everyone is in all treatment groups.

Page 12: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Within-Subjects Design & Analysis

In a between-subjects ANOVA, the treatment effect is investigated by comparing the mean DV scores among the different treatment groups (e.g., control vs. treatment).

In a within-subjects ANOVA, the treatment effect is investigated by comparing the mean DV scores of the same individuals, that is, by comparing their own repeated DV scores under different treatment conditions.

Repeated measures ANOVA is analogous to the dependent (paired) sample t-test, which compares the means of two sets of scores from the same individuals. The repeated measures ANOVA extends this comparison to two or more dependent (related) samples; with exactly two, its F equals the square of the paired t.
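The analogy with the paired t-test can be checked numerically. The sketch below uses two of the four conditions from the animal-parts example (stick insect and kangaroo testicle) and computes both a paired t statistic and a one-way within-subjects F. The helper names `paired_t` and `rm_anova_f` are hypothetical and written for this illustration only.

```python
from math import sqrt

# Time-to-retch scores for two of the four conditions in the handout's example.
stick_insect = [8, 9, 6, 5, 8, 7, 10, 12]
kangaroo = [7, 5, 2, 3, 4, 5, 2, 6]


def paired_t(x, y):
    """Paired-samples t statistic: mean difference over its standard error."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / sqrt(var_d / n)


def rm_anova_f(columns):
    """Omnibus F for a one-way within-subjects design (one list per condition)."""
    k, n = len(columns), len(columns[0])
    grand = sum(map(sum, columns)) / (k * n)
    cond_means = [sum(c) / n for c in columns]
    subj_means = [sum(c[i] for c in columns) / k for i in range(n)]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for c in columns for v in c)
    ss_error = ss_total - ss_subj - ss_cond
    return (ss_cond / (k - 1)) / (ss_error / ((k - 1) * (n - 1)))


t = paired_t(stick_insect, kangaroo)
f = rm_anova_f([stick_insect, kangaroo])
print(f"t = {t:.3f}, t squared = {t * t:.3f}, F = {f:.3f}")
```

With only two conditions the two statistics carry the same information, which is why the repeated measures ANOVA is usually reserved for three or more conditions.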

Page 13: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Within-Subjects Design & Analysis

Like the between-subjects ANOVA, the within-subjects ANOVA is an inferential statistical technique used to test the effect of a treatment (mean differences among two or more treatment conditions).

However, unlike the between-subjects ANOVA, where the groups are independent, individuals’ DV scores across the IV conditions are dependent, because the same (or matched) individuals are measured on the DV repeatedly, once after each experimental condition.

Page 14: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

How Repeated Measures ANOVA Works

In a between-subjects design, each individual is given one of the treatments of the experiment and measured on the DV once. Cross-individual differences (within each group) can only be regarded as random errors (sample-to-sample errors due to chance). Remember that random variation cannot be explained, although it can be quantified as error variance.

With a repeated measures experimental design, systematic variation can be due to differences in the treatments and differences across the individuals.

Another way of understanding a repeated measures design is that each individual forms a group, so the number of groups is equal to the sample size.

Page 15: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

How Repeated Measures ANOVA Works

For example, if 30 people were asked to rate 8 ethnic dishes, the variation among the 240 scores (30 × 8) may be due to:

1. systematic treatment condition differences (e.g., differences in the taste of the dishes)
2. systematic individual differences (e.g., Tony and Kiki tended to give higher ratings but Chen and Wu tended to give lower ratings on western cuisine)
3. random differences (sample-to-sample variation due to chance)

If each individual is measured K times on the DV, the systematic variation attributable to each individual (his or her overall level across the K repeated measures) can be calculated and partitioned out of the total sum of squares. That is, consistent individual differences can be explained and removed from the error variance.
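The three sources of variation can be separated numerically. The following is a minimal sketch using the handout's animal-parts data (8 students by 4 conditions) rather than the 30-by-8 ratings example, for which no data are given. All variable names are illustrative assumptions.

```python
# Animal-parts example: rows = 8 students, columns = 4 conditions
# (stick insect, kangaroo testicle, fish eye, witchetty grub).
scores = [
    [8, 7, 1, 6],
    [9, 5, 2, 5],
    [6, 2, 3, 8],
    [5, 3, 1, 9],
    [8, 4, 5, 8],
    [7, 5, 6, 7],
    [10, 2, 7, 2],
    [12, 6, 8, 1],
]
n, k = len(scores), len(scores[0])
grand = sum(map(sum, scores)) / (n * k)

ss_total = sum((v - grand) ** 2 for row in scores for v in row)

# 1. Systematic condition differences: spread of the k condition means.
cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
ss_conditions = n * sum((m - grand) ** 2 for m in cond_means)

# 2. Systematic individual differences: spread of the n subject means.
subj_means = [sum(row) / k for row in scores]
ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)

# 3. Whatever remains after removing both systematic sources is error.
ss_error = ss_total - ss_conditions - ss_subjects

print(f"SS total      = {ss_total:.3f}")
print(f"SS conditions = {ss_conditions:.3f}")
print(f"SS subjects   = {ss_subjects:.3f}")
print(f"SS error      = {ss_error:.3f}")
```

If the subject term were not removed, the same 17.375 units of individual-difference variation would remain inside the error term, as they would in a between-subjects analysis.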

Page 16: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Advantages of a Within-Subjects Design

With a repeated measures design, relatively fewer participants are needed. It is preferred when the recruitment of participants is difficult.

A repeated measures design allows the study of changes in the same participants’ behavior over the conditions (temporal changes or changes due to the conditions of the treatment).

In essence, with a repeated measures design, participants serve as their own comparison groups. This type of design may increase power by controlling for cross-individual differences. However, an increase in power is not guaranteed; the actual observed power will depend on how large the individual differences are and on the degrees of freedom.

Page 17: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Limitations of a Within-Subjects Design

This design will not work well if there is a carry-over effect, i.e., if the effect of a treatment on the DV is affected by practice and learning due to earlier exposure under other treatment conditions, or by participants’ natural maturation. For example, it is hard to conclude that a new intern training program leads to better nursing practices if the same interns were exposed to the tasks/questions at the first exam and are already familiar with the contents and procedures of the exam when taking it again after completing the second and third training programs.

Because there is only one person in each group, the “subject by condition interaction effect” cannot be computed (modeled) separately and is treated as error.

Page 18: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Error Variance in a One-Way Within-Subjects Design

The pattern of dislike of animal parts differs among the individuals. Variation due to individuals’ differences in the pattern over the repeated measures (subject by animal_parts interaction) is treated as the error variance.

Page 19: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

One-way Within-Subjects Design: Partitioning the Total Sum of Squares by the 1 Factor

[Figure labels: 1 Factor, SS w-s, SS error, SS b-s]
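The partition shown on this slide can be written out as formulas. This is a sketch in notation chosen here, not taken from the handout: Y_ij is the score of subject i under condition j, n is the number of subjects, and K is the number of conditions. Reading "SS b-s" as the between-subjects term and "SS w-s" as the within-subjects factor term is an assumption about the slide's labels.

```latex
\begin{aligned}
SS_{\text{total}} &= \sum_{i=1}^{n}\sum_{j=1}^{K} \left(Y_{ij} - \bar{Y}_{..}\right)^2 \\
SS_{\text{subjects}} &= K \sum_{i=1}^{n} \left(\bar{Y}_{i.} - \bar{Y}_{..}\right)^2 \\
SS_{\text{conditions}} &= n \sum_{j=1}^{K} \left(\bar{Y}_{.j} - \bar{Y}_{..}\right)^2 \\
SS_{\text{error}} &= SS_{\text{total}} - SS_{\text{subjects}} - SS_{\text{conditions}}
\end{aligned}
```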

Page 20: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

One-way Within-Subjects Design: Calculating the Mean Squares for the F Summary Table

See Excel spreadsheet “Animal Parts.xls”.
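The spreadsheet itself is not reproduced in this transcript. As a stand-in, the sketch below computes an F summary table for the animal-parts data from first principles. It is not the content of "Animal Parts.xls", and the variable names are illustrative assumptions.

```python
# Rows = 8 students; columns = stick insect, kangaroo testicle,
# fish eye, witchetty grub (time until retching).
scores = [
    [8, 7, 1, 6],
    [9, 5, 2, 5],
    [6, 2, 3, 8],
    [5, 3, 1, 9],
    [8, 4, 5, 8],
    [7, 5, 6, 7],
    [10, 2, 7, 2],
    [12, 6, 8, 1],
]
n, k = len(scores), len(scores[0])
grand = sum(map(sum, scores)) / (n * k)

ss_total = sum((v - grand) ** 2 for row in scores for v in row)
ss_conditions = n * sum(
    (sum(row[j] for row in scores) / n - grand) ** 2 for j in range(k)
)
ss_subjects = k * sum((sum(row) / k - grand) ** 2 for row in scores)
ss_error = ss_total - ss_conditions - ss_subjects

# Degrees of freedom for the within-subjects factor and its error term.
df_conditions = k - 1
df_error = (k - 1) * (n - 1)

# Mean square = sum of squares divided by its degrees of freedom.
ms_conditions = ss_conditions / df_conditions
ms_error = ss_error / df_error
f_ratio = ms_conditions / ms_error

print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Conditions':<12}{ss_conditions:>10.3f}{df_conditions:>5}"
      f"{ms_conditions:>10.3f}{f_ratio:>8.3f}")
print(f"{'Error':<12}{ss_error:>10.3f}{df_error:>5}{ms_error:>10.3f}")
```

The p-value for this F would be read from an F distribution with 3 and 21 degrees of freedom, which is the step left to SPSS in the lab activity.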

Page 21: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Data Assumptions about Repeated Measures ANOVA

Assumption 1: Normal Distribution

The difference scores between any pair of the repeatedly measured DVs are normally distributed. With K conditions, there are K(K-1)/2 sets of difference scores.

This assumption is analogous to that for the dependent-sample t-test, where the difference scores between the pair of DVs are normally distributed.

This assumption can be checked by examining whether each set of paired difference scores is normally distributed, using the skewness statistic and histograms.
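A minimal sketch of this check for the animal-parts data follows. It forms all K(K-1)/2 sets of difference scores and computes a simple moment-based skewness for each. The `skewness` helper is written for this illustration; statistical packages may report a bias-adjusted version, so values need not match SPSS output exactly.

```python
from itertools import combinations

# Time until retching for the same 8 students under each condition.
conditions = {
    "stick insect": [8, 9, 6, 5, 8, 7, 10, 12],
    "kangaroo testicle": [7, 5, 2, 3, 4, 5, 2, 6],
    "fish eye": [1, 2, 3, 1, 5, 6, 7, 8],
    "witchetty grub": [6, 5, 8, 9, 8, 7, 2, 1],
}


def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Every unordered pair of conditions gives one set of difference scores.
pairs = list(combinations(conditions, 2))
for a, b in pairs:
    diffs = [x - y for x, y in zip(conditions[a], conditions[b])]
    print(f"{a} - {b}: diffs = {diffs}, skewness = {skewness(diffs):.2f}")

print(len(pairs), "sets of difference scores")
```

With only 8 students per set, skewness values are unstable, so they are best read alongside histograms, as the slide suggests.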

Page 22: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Data Assumptions about Repeated Measures ANOVA

Assumption 2: Sphericity

The variances of the difference scores between all pairs of repeatedly measured DVs are equal (sphericity).

This assumption has traditionally been tested with Mauchly’s W test, whose test statistic approximately follows a chi-square distribution. A significant result signals that the sphericity assumption is violated. However, using an inferential test to check sphericity is considered too strict (the assumption is almost always rejected).

Epsilon, ε, is an alternative measure of sphericity; its maximum value of 1 indicates perfect sphericity. In SPSS, three epsilon indices are reported.

It has been suggested that a Greenhouse-Geisser ε of less than 0.75 signals a violation of sphericity.
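Both ideas on this slide can be sketched for the animal-parts data: the variances of the pairwise difference scores, and an epsilon estimate. The `greenhouse_geisser` function below uses the commonly cited covariance-matrix form of the Greenhouse-Geisser estimate. That it matches SPSS's computation exactly is an assumption, and all names are illustrative.

```python
from itertools import combinations

# One list per condition, same 8 students in the same order.
conditions = [
    [8, 9, 6, 5, 8, 7, 10, 12],   # stick insect
    [7, 5, 2, 3, 4, 5, 2, 6],     # kangaroo testicle
    [1, 2, 3, 1, 5, 6, 7, 8],     # fish eye
    [6, 5, 8, 9, 8, 7, 2, 1],     # witchetty grub
]
k = len(conditions)


def cov(x, y):
    """Sample covariance (n - 1 in the denominator)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


# Covariance matrix of the repeated measures.
S = [[cov(a, b) for b in conditions] for a in conditions]

# Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B); sphericity says these are equal.
diff_vars = {
    (i, j): S[i][i] + S[j][j] - 2 * S[i][j]
    for i, j in combinations(range(k), 2)
}


def greenhouse_geisser(S):
    """Greenhouse-Geisser epsilon estimated from a k x k covariance matrix."""
    k = len(S)
    mean_diag = sum(S[i][i] for i in range(k)) / k
    grand = sum(map(sum, S)) / k ** 2
    row_means = [sum(row) / k for row in S]
    num = (k * (mean_diag - grand)) ** 2
    den = (k - 1) * (
        sum(v * v for row in S for v in row)
        - 2 * k * sum(m * m for m in row_means)
        + k ** 2 * grand ** 2
    )
    return num / den


for pair, var in diff_vars.items():
    print(pair, round(var, 2))
eps = greenhouse_geisser(S)
print(f"Greenhouse-Geisser epsilon = {eps:.3f}")
```

The difference-score variances range from about 4.3 to about 26.6, and the resulting epsilon falls below the 0.75 guideline mentioned above.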

Page 23: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Interpretation of the Repeated Measures ANOVA: The omnibus F test

The interpretation of repeated measures ANOVA results is similar to that of an independent ANOVA.

The focus is to conclude, using the F test, whether there is any statistically significant difference between the means of the DV under the different IV conditions.

If the sphericity assumption is violated (Greenhouse-Geisser ε less than 0.75), pick one of the alternative F tests that correct for lack of sphericity (i.e., Greenhouse-Geisser, Huynh-Feldt, or Lower-bound). Specifically, the numerator and denominator degrees of freedom of the F test are multiplied by the epsilon of the corresponding method.

The Greenhouse-Geisser F test is more conservative, especially when the sample size is small. The Huynh-Feldt F test is less conservative; its epsilon estimate may exceed 1.0, in which case it is set to 1.0. The lower-bound method is the most conservative.
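How the corrections change the degrees of freedom can be sketched as follows. The function `corrected_dfs` is hypothetical. The Huynh-Feldt line uses the commonly cited formula for a single-group design, which is an assumption about what SPSS computes. The input 0.533 is the Greenhouse-Geisser estimate for the animal-parts data (k = 4 conditions, n = 8 students), used here as an illustrative value.

```python
def corrected_dfs(k, n, eps_gg):
    """Return {method: (epsilon, numerator df, denominator df)}."""
    df1 = k - 1                 # uncorrected numerator df
    df2 = (k - 1) * (n - 1)     # uncorrected denominator df
    # Huynh-Feldt estimate derived from the Greenhouse-Geisser estimate.
    eps_hf = (n * df1 * eps_gg - 2) / (df1 * (n - 1 - df1 * eps_gg))
    epsilons = {
        "Sphericity assumed": 1.0,
        "Greenhouse-Geisser": eps_gg,
        "Huynh-Feldt": min(eps_hf, 1.0),   # values above 1.0 are set to 1.0
        "Lower-bound": 1.0 / df1,          # smallest possible epsilon
    }
    # Both degrees of freedom are multiplied by the same epsilon.
    return {name: (e, e * df1, e * df2) for name, e in epsilons.items()}


table = corrected_dfs(k=4, n=8, eps_gg=0.533)
for name, (eps, df1, df2) in table.items():
    print(f"{name:<20} epsilon = {eps:.3f}  df = ({df1:.2f}, {df2:.2f})")
```

The F ratio itself is unchanged by these corrections. Only the reference distribution changes, so a smaller epsilon yields a larger p-value for the same F.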

Page 24: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Lab Activity: One-Way Within-Subjects Design Using SPSS for Omnibus F Test

Remember to hit “Add”

Page 25: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Lab Activity: One-Way Within-Subjects Design SPSS Outputs for Omnibus F Test

Since the Greenhouse-Geisser ε is less than 0.75, signaling a violation of sphericity, one should choose one of the three alternative F tests. Note that they reached different conclusions.

Page 26: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Interpretation of the Repeated Measures ANOVA: The post-hoc comparisons

If the omnibus test is significant, the next task is to examine which paired differences are statistically significant.

These post hoc tests are analogous to dependent (paired) sample t-tests.

Note that violation of sphericity can also complicate the post hoc paired t-tests. In SPSS, there are three options:

1. Fisher’s LSD test makes no adjustment for multiple comparisons and does not correct for violation of sphericity, hence is not recommended.
2. Bonferroni’s test corrects for violation of sphericity. It is the most conservative of the three and is recommended in general.
3. Sidak’s test corrects for violation of sphericity. It is slightly more powerful than Bonferroni’s test and is considered when power is a concern.
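The three options differ in the significance level applied to each paired comparison. The sketch below computes the paired t statistic for all six pairs in the animal-parts data, together with the per-comparison alpha implied by each option for a familywise alpha of 0.05. SPSS reports adjusted p-values rather than adjusted alphas. Treating the two as leading to the same decisions, and the helper name `paired_t`, are assumptions of this illustration.

```python
from itertools import combinations
from math import sqrt

conditions = {
    "stick insect": [8, 9, 6, 5, 8, 7, 10, 12],
    "kangaroo testicle": [7, 5, 2, 3, 4, 5, 2, 6],
    "fish eye": [1, 2, 3, 1, 5, 6, 7, 8],
    "witchetty grub": [6, 5, 8, 9, 8, 7, 2, 1],
}


def paired_t(x, y):
    """Paired-samples t statistic with len(x) - 1 degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / sqrt(var_d / n)


pairs = list(combinations(conditions, 2))
m = len(pairs)          # number of pairwise comparisons
alpha = 0.05            # familywise significance level

alpha_lsd = alpha                                # no adjustment
alpha_bonferroni = alpha / m                     # most conservative
alpha_sidak = 1 - (1 - alpha) ** (1 / m)         # slightly less conservative

for a, b in pairs:
    t = paired_t(conditions[a], conditions[b])
    print(f"{a} vs {b}: t(7) = {t:.2f}")

print(f"per-comparison alpha: LSD = {alpha_lsd}, "
      f"Bonferroni = {alpha_bonferroni:.5f}, Sidak = {alpha_sidak:.5f}")
```

The Sidak threshold is only marginally larger than the Bonferroni threshold, so the two rarely disagree, while the unadjusted LSD threshold is six times larger here.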

Page 27: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Lab Activity: One-Way Within-Subjects Design Using SPSS for Post Hoc Comparisons

Page 28: Handout Nine: Repeated Measures –Design, Analysis, & Assumptions

Lab Activity: One-Way Within-Subjects Design SPSS Outputs for Post hoc Comparisons.