repeated measures

22
Repeated Measures The term repeated measures refers to data sets with multiple measurements of a response variable on the same experimental unit or subject. In repeated measurements designs, we are often concerned with two types of variability: Between-subjects - Variability associated with different groups of subjects who are treated differently (equivalent to between groups effects in one-way ANOVA) Within-subjects - Variability associated with measurements made on an individual subject. Examples of Repeated Measures designs: A. Two groups of subjects treated with different drugs for whom responses are measured at six-hour increments for 24 hours. Here, DRUG treatment is the between-subjects factor and TIME is the within-subjects factor. B. Students in three different statistics classes (taught by different instructors) are given a test with five problems and scores on each problem are recorded separately. Here CLASS is a between-subjects factor and PROBLEM is a within- subjects factor. C. Volunteer subjects from a linguistics class each listen to 12 sentences produced by 3 text to speech synthesis systems and rate the naturalness of each sentence. This is a completely within-subjects design with factors SENTENCE (1-12) and SYNTHESIZER.

Upload: hester

Post on 26-Jan-2016

107 views

Category:

Documents


4 download

DESCRIPTION

Repeated Measures. The term repeated measures refers to data sets with multiple measurements of a response variable on the same experimental unit or subject. In repeated measurements designs, we are often concerned with two types of variability: - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Repeated Measures

Repeated Measures The term repeated measures refers to data sets with multiple

measurements of a response variable on the same experimental unit or subject.

In repeated measurements designs, we are often concerned with two types of variability: Between-subjects - Variability associated with different groups of subjects

who are treated differently (equivalent to between groups effects in one-way ANOVA)

Within-subjects - Variability associated with measurements made on an individual subject.

Examples of Repeated Measures designs:A. Two groups of subjects treated with different drugs for whom responses are measured at six-

hour increments for 24 hours. Here, DRUG treatment is the between-subjects factor and TIME is the within-subjects factor.

B. Students in three different statistics classes (taught by different instructors) are given a test with five problems and scores on each problem are recorded separately. Here CLASS is a between-subjects factor and PROBLEM is a within-subjects factor.

C. Volunteer subjects from a linguistics class each listen to 12 sentences produced by 3 text to speech synthesis systems and rate the naturalness of each sentence. This is a completely within-subjects design with factors SENTENCE (1-12) and SYNTHESIZER.

Page 2: Repeated Measures

Repeated Measures When measures are made over time as in example

A we want to assess: how the dependent measure changes over time

independent of treatment (i.e. the main effect of time) how treatments differ independent of time (i.e., the main

effect of treatment) how treatment effects differ at different times (i.e. the

treatment by time interaction). Repeated measures require special treatment

because: Observations made on the same subject are not

independent of each other. Adjacent observations in time are likely to be more

correlated than non-adjacent observations

Page 3: Repeated Measures

Repeated Measures

Methods of repeated measures ANOVA Univariate - Uses a single outcome measure. Multivariate - Uses multiple outcome

measures. Mixed Model Analysis - One or more factors

(other than subject) are random effects. We will discuss only univariate approach

Page 4: Repeated Measures

Repeated Measures

Subjects Trt 1 Trt2 Trt3

1 y11 y12 y13

2 y21 y22 y23

3 y31 y32 y33...

.

.

....

.

.

.n yn1 yn2 yn3

A layout of simple repeated measures with three treatments on n subjects.

Page 5: Repeated Measures

Repeated Measures

Assumptions: Subjects are independent. The repeated observations for each subject

follows a multivariate normal distribution The correlation between any pair of within

subjects levels are equal. This assumption is known as sphericity.

Page 6: Repeated Measures

Repeated Measures Test for Sphericity:

Mauchley’s test Violation of sphericity assumption leads to inflated F

statistics and hence inflated type I error. Three common Corrections for violation of sphericity:

Greenhouse-Geisser correction Huynh-Feldt correction Lower Bound Correction

All these three methods adjust the degrees of freedom using a correction factor called Epsilon.

Epsilon lies between 1/k-1 to 1, where k is the number of levels in the within subject factor.

Page 7: Repeated Measures

Repeated Measures - SPSS Demo Analyze > General Linear model > Repeated

Measures Within-Subject Factor Name: e.g. time, Number

of Levels: number of measures of the factors, e.g. we have two measurements PLUC.pre and PLUC.post. So, number of level is 2 for our example. > Click on Add

Click on Define and select Within-Subjects Variables (time): e.g. PLUC.pre(1) and PLUC.pre(2)

Select Between-Subjects Factor(s): e.g. grp

Page 8: Repeated Measures

Repeated Measures ANOVA SPSS output:

Within-Subjects Factors Measure: MEASURE_1

time Dependent

Variable 1 PLUC.pre 2 PLUC.post

Between-Subjects Factors

N 1 20 2 20

grp

3 20

Mauchly's Test of Sphericity(b) Measure: MEASURE_1

Epsilon(a)

Within Subjects Effect Mauchly's W Approx. Chi-

Square df Sig. Greenhouse-

Geisser Huynh-Feldt Lower-bound time 1.000 .000 0 . 1.000 1.000 1.000

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix. a May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table. b Design: Intercept+grp Within Subjects Design: time

Page 9: Repeated Measures

Repeated measures ANOVA SPSS output

Tests of Within-Subjects Effects Measure: MEASURE_1

Source Type III Sum of Squares df Mean Square F Sig.

Sphericity Assumed 172.952 1 172.952 33.868 .000 Greenhouse-Geisser 172.952 1.000 172.952 33.868 .000 Huynh-Feldt 172.952 1.000 172.952 33.868 .000

time

Lower-bound 172.952 1.000 172.952 33.868 .000 Sphericity Assumed 46.818 2 23.409 4.584 .014 Greenhouse-Geisser 46.818 2.000 23.409 4.584 .014 Huynh-Feldt 46.818 2.000 23.409 4.584 .014

time * grp

Lower-bound 46.818 2.000 23.409 4.584 .014 Sphericity Assumed 291.083 57 5.107 Greenhouse-Geisser 291.083 57.000 5.107 Huynh-Feldt 291.083 57.000 5.107

Error(time)

Lower-bound 291.083 57.000 5.107

Tests of Within-Subjects Contrasts Measure: MEASURE_1

Source time Type III Sum of Squares df Mean Square F Sig.

time Linear 172.952 1 172.952 33.868 .000 time * grp Linear 46.818 2 23.409 4.584 .014 Error(time) Linear 291.083 57 5.107

Tests of Between-Subjects Effects Measure: MEASURE_1 Transformed Variable: Average

Source Type III Sum of Squares df Mean Square F Sig.

Intercept 14371.484 1 14371.484 3858.329 .000 grp 121.640 2 60.820 16.328 .000 Error 212.313 57 3.725

Page 10: Repeated Measures

Repeated Measures - R Demo Arrangement of the data can be done in MS-

Excel. However the following R commands will arrange the data in default.csv for the ANOVA of repeated measures-

attach(x)

sid1<- c(sid,sid) grp1<-c(grp,grp)plc<- c(PLUC.pre, Pluc.post)time<-c(rep(1,60),rep(2,60))

The following code is for repeated measures ANOVA

summary(aov(plc~factor(grp1)*factor(time) + Error(factor(sid1))))

Page 11: Repeated Measures

Repeated Measures R output:Error: factor(sid1) Df Sum Sq Mean Sq F value Pr(>F) factor(grp1) 2 121.640 60.820 16.328 2.476e-06 ***Residuals 57 212.313 3.725 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within Df Sum Sq Mean Sq F value Pr(>F) factor(time) 1 172.952 172.952 33.8676 2.835e-07 ***factor(grp1):factor(time) 2 46.818 23.409 4.5839 0.01426 * Residuals 57 291.083 5.107 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Page 12: Repeated Measures

Odds Ratio Statistical analysis of binary responses leads to a

conclusion about two population proportions or probabilities.

The odds in favor of an event is the quantity p/1-p, where p is the probability (proportion) of the event.

The odds ratio of an event is the ratio of the odds of the event occurring in one group to the odds of it occurring in another group.

Let p1 be the probability of an event in one group and p2 be the probability of the same event in the other group. Then the odds ratio,

)1/(

)1/(

22

11

pp

ppOR

Page 13: Repeated Measures

Odds Ratio

Cancer No Cancer

Smokers a b a+bNon- Smokers

c d c+d

a+c b+d N=a+b+c+d

The odds of cancer among smokers is a/b and the odds of cancer in non-smokers is c/d and the ratio of two odds is,

bc

ad

dc

baOR

/

/

Estimating odds ratio in 2x2 table

Page 14: Repeated Measures

Odds Ratio The odds ratio must be greater than or equal to zero. As

the odds of the first group approaches zero, the odds ratio approaches zero. As the odds of the second group approaches zero, the odds ratio approaches positive infinity

An odds ratio of 1 indicates that the condition or event under study is equally likely in both groups.

An odds ratio greater than 1 indicates that the condition or event is more likely in the first group.

An odds ratio less than 1 indicates that the condition or event is less likely in the first group.

Log odds …

Page 15: Repeated Measures

Odds Ratio-Demo

MS-Excel: No default function SPSS: Analyze > Descriptive Statistics >

Crosstabs > Select Variables: Row(s): Two groups we want to compare e.g. shades, Column(s): Response variable e.g (pc)> Click on Statistics > Select Risk.

R: Odds ratio are calculated from Logistic regression model.

Page 16: Repeated Measures

Odds ratio SPSS output:

pc * Shades Crosstabulation Count

Shades

1 2 Total

0 23 7 30 pc

1 7 23 30 Total 30 30 60

Risk Estimate

95% Confidence Interval

Value Lower Upper Odds Ratio for pc (0 / 1) 10.796 3.263 35.718 For cohort Shades = 1 3.286 1.668 6.473 For cohort Shades = 2 .304 .154 .600 N of Valid Cases 60

Page 17: Repeated Measures

Logistic Regression Consider a binary response variable Y with two

outcomes (1 and 0). Let p be the proportion of the event with outcome 1.

The logit(p) can be defined as log of odds of the event with outcome 1. Then the logistic regression for the response variable Y on a set of explanatory variables X1, X2, …., Xr can be written as

logit(p) = log(p/1-p)= b0 +b1X1+ … +brXr

Page 18: Repeated Measures

Logistic Regression From the model of logistic regression, the odds in favor of

Y=1, (p/1-p), can be expressed interms of some r independent variables,

p/1-p = exp( b0 + b1X1 + +brXr ) If there is no influence of the explanatory variables (i.e.

X1=0, … Xr =0), the odds in favor of Y=1, (p/1-p), is exp(b0).

If an independent variable (e. g. X1 in the model) is categorical with two categories, then exp(b1) implies the ratio of odds (in favor of Y=1) in two categories of the variable X1.

If an independent variable (e. g. X2 in the model) is numeric, then exp(b2) implies the change in odds (p/1-p) for 1 unit change in X2.

Page 19: Repeated Measures

Logistic Regression-Demo

MS-Excel: No default functions SPSS: Analyze > Regression > Binary

Logistic > Select Dependent variable: > Select independent variable (covariate)

R: uses the ‘glm’ function for Logistic regression analysis. Codes are in the R-output slide.

Page 20: Repeated Measures

Logistic Regression SPSS outputDependent Variable Encoding

Original Value Internal Value 0 0 1 1

Categorical Variables Codings

Parameter coding

Frequency (1) 1 30 1.000 Shades

2 30 .000

Classification Table(a,b)

Predicted

pc

Observed 0 1 Percentage

Correct

0 0 30 .0 pc

1 0 30 100.0

Step 0

Overall Percentage 50.0

a Constant is included in the model. b The cut value is .500 Variables in the Equation

B S.E. Wald df Sig. Exp(B) Step 0 Constant .000 .258 .000 1 1.000 1.000

Variables not in the Equation

Score df Sig. Variables Shades(1) 17.067 1 .000 Step 0

Overall Statistics 17.067 1 .000

Page 21: Repeated Measures

Logistic Regression SPSS outputOmnibus Tests of Model Coefficients

Chi-square df Sig. Step 17.985 1 .000 Block 17.985 1 .000

Step 1

Model 17.985 1 .000

Model Summary

Step -2 Log

likelihood Cox & Snell R Square

Nagelkerke R Square

1 65.193(a) .259 .345

a Estimation terminated at iteration number 4 because parameter estimates changed by less than .001. Classification Table(a)

Predicted

pc

Observed 0 1 Percentage

Correct

0 23 7 76.7 pc

1 7 23 76.7

Step 1

Overall Percentage 76.7

a The cut value is .500 Variables in the Equation

B S.E. Wald df Sig. Exp(B) Shades(1) -2.379 .610 15.189 1 .000 .093 Step

1(a) Constant 1.190 .432 7.594 1 .006 3.286

a Variable(s) entered on step 1: Shades.

Page 22: Repeated Measures

Logistic Regression R outputShades[Shades==2]<-0> ftc<-glm(pc~Shades, family=binomial(link = "logit"))> summary(ftc)

Call:glm(formula = pc ~ Shades, family = binomial(link = "logit"))

Deviance Residuals: Min 1Q Median 3Q Max -1.706e+00 -7.290e-01 -1.110e-16 7.290e-01 1.706e+00

Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.1896 0.4317 2.756 0.00585 ** Shades -2.3792 0.6105 -3.897 9.73e-05 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 83.178 on 59 degrees of freedomResidual deviance: 65.193 on 58 degrees of freedomAIC: 69.193

Number of Fisher Scoring iterations: 4

> confint(ftc)Waiting for profiling to be done... 2.5 % 97.5 %(Intercept) 0.3944505 2.114422Shades -3.6497942 -1.237172> exp(ftc$coefficients)(Intercept) Shades 3.2857143 0.0926276 > exp(confint(ftc))Waiting for profiling to be done... 2.5 % 97.5 %(Intercept) 1.48356873 8.2847933Shades 0.02599648 0.2902037>