repeated measures
DESCRIPTION
Repeated Measures. The term repeated measures refers to data sets with multiple measurements of a response variable on the same experimental unit or subject. In repeated measurements designs, we are often concerned with two types of variability: - PowerPoint PPT PresentationTRANSCRIPT
Repeated Measures The term repeated measures refers to data sets with multiple
measurements of a response variable on the same experimental unit or subject.
In repeated measurements designs, we are often concerned with two types of variability: Between-subjects - Variability associated with different groups of subjects
who are treated differently (equivalent to between groups effects in one-way ANOVA)
Within-subjects - Variability associated with measurements made on an individual subject.
Examples of Repeated Measures designs:A. Two groups of subjects treated with different drugs for whom responses are measured at six-
hour increments for 24 hours. Here, DRUG treatment is the between-subjects factor and TIME is the within-subjects factor.
B. Students in three different statistics classes (taught by different instructors) are given a test with five problems and scores on each problem are recorded separately. Here CLASS is a between-subjects factor and PROBLEM is a within-subjects factor.
C. Volunteer subjects from a linguistics class each listen to 12 sentences produced by 3 text to speech synthesis systems and rate the naturalness of each sentence. This is a completely within-subjects design with factors SENTENCE (1-12) and SYNTHESIZER.
Repeated Measures When measures are made over time as in example
A we want to assess: how the dependent measure changes over time
independent of treatment (i.e. the main effect of time) how treatments differ independent of time (i.e., the main
effect of treatment) how treatment effects differ at different times (i.e. the
treatment by time interaction). Repeated measures require special treatment
because: Observations made on the same subject are not
independent of each other. Adjacent observations in time are likely to be more
correlated than non-adjacent observations
Repeated Measures
Methods of repeated measures ANOVA Univariate - Uses a single outcome measure. Multivariate - Uses multiple outcome
measures. Mixed Model Analysis - One or more factors
(other than subject) are random effects. We will discuss only univariate approach
Repeated Measures
Subjects Trt 1 Trt2 Trt3
1 y11 y12 y13
2 y21 y22 y23
3 y31 y32 y33...
.
.
....
.
.
.n yn1 yn2 yn3
A layout of simple repeated measures with three treatments on n subjects.
Repeated Measures
Assumptions: Subjects are independent. The repeated observations for each subject
follows a multivariate normal distribution The correlation between any pair of within
subjects levels are equal. This assumption is known as sphericity.
Repeated Measures Test for Sphericity:
Mauchley’s test Violation of sphericity assumption leads to inflated F
statistics and hence inflated type I error. Three common Corrections for violation of sphericity:
Greenhouse-Geisser correction Huynh-Feldt correction Lower Bound Correction
All these three methods adjust the degrees of freedom using a correction factor called Epsilon.
Epsilon lies between 1/k-1 to 1, where k is the number of levels in the within subject factor.
Repeated Measures - SPSS Demo Analyze > General Linear model > Repeated
Measures Within-Subject Factor Name: e.g. time, Number
of Levels: number of measures of the factors, e.g. we have two measurements PLUC.pre and PLUC.post. So, number of level is 2 for our example. > Click on Add
Click on Define and select Within-Subjects Variables (time): e.g. PLUC.pre(1) and PLUC.pre(2)
Select Between-Subjects Factor(s): e.g. grp
Repeated Measures ANOVA SPSS output:
Within-Subjects Factors Measure: MEASURE_1
time Dependent
Variable 1 PLUC.pre 2 PLUC.post
Between-Subjects Factors
N 1 20 2 20
grp
3 20
Mauchly's Test of Sphericity(b) Measure: MEASURE_1
Epsilon(a)
Within Subjects Effect Mauchly's W Approx. Chi-
Square df Sig. Greenhouse-
Geisser Huynh-Feldt Lower-bound time 1.000 .000 0 . 1.000 1.000 1.000
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix. a May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table. b Design: Intercept+grp Within Subjects Design: time
Repeated measures ANOVA SPSS output
Tests of Within-Subjects Effects Measure: MEASURE_1
Source Type III Sum of Squares df Mean Square F Sig.
Sphericity Assumed 172.952 1 172.952 33.868 .000 Greenhouse-Geisser 172.952 1.000 172.952 33.868 .000 Huynh-Feldt 172.952 1.000 172.952 33.868 .000
time
Lower-bound 172.952 1.000 172.952 33.868 .000 Sphericity Assumed 46.818 2 23.409 4.584 .014 Greenhouse-Geisser 46.818 2.000 23.409 4.584 .014 Huynh-Feldt 46.818 2.000 23.409 4.584 .014
time * grp
Lower-bound 46.818 2.000 23.409 4.584 .014 Sphericity Assumed 291.083 57 5.107 Greenhouse-Geisser 291.083 57.000 5.107 Huynh-Feldt 291.083 57.000 5.107
Error(time)
Lower-bound 291.083 57.000 5.107
Tests of Within-Subjects Contrasts Measure: MEASURE_1
Source time Type III Sum of Squares df Mean Square F Sig.
time Linear 172.952 1 172.952 33.868 .000 time * grp Linear 46.818 2 23.409 4.584 .014 Error(time) Linear 291.083 57 5.107
Tests of Between-Subjects Effects Measure: MEASURE_1 Transformed Variable: Average
Source Type III Sum of Squares df Mean Square F Sig.
Intercept 14371.484 1 14371.484 3858.329 .000 grp 121.640 2 60.820 16.328 .000 Error 212.313 57 3.725
Repeated Measures - R Demo Arrangement of the data can be done in MS-
Excel. However the following R commands will arrange the data in default.csv for the ANOVA of repeated measures-
attach(x)
sid1<- c(sid,sid) grp1<-c(grp,grp)plc<- c(PLUC.pre, Pluc.post)time<-c(rep(1,60),rep(2,60))
The following code is for repeated measures ANOVA
summary(aov(plc~factor(grp1)*factor(time) + Error(factor(sid1))))
Repeated Measures R output:Error: factor(sid1) Df Sum Sq Mean Sq F value Pr(>F) factor(grp1) 2 121.640 60.820 16.328 2.476e-06 ***Residuals 57 212.313 3.725 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Within Df Sum Sq Mean Sq F value Pr(>F) factor(time) 1 172.952 172.952 33.8676 2.835e-07 ***factor(grp1):factor(time) 2 46.818 23.409 4.5839 0.01426 * Residuals 57 291.083 5.107 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Odds Ratio Statistical analysis of binary responses leads to a
conclusion about two population proportions or probabilities.
The odds in favor of an event is the quantity p/1-p, where p is the probability (proportion) of the event.
The odds ratio of an event is the ratio of the odds of the event occurring in one group to the odds of it occurring in another group.
Let p1 be the probability of an event in one group and p2 be the probability of the same event in the other group. Then the odds ratio,
)1/(
)1/(
22
11
pp
ppOR
Odds Ratio
Cancer No Cancer
Smokers a b a+bNon- Smokers
c d c+d
a+c b+d N=a+b+c+d
The odds of cancer among smokers is a/b and the odds of cancer in non-smokers is c/d and the ratio of two odds is,
bc
ad
dc
baOR
/
/
Estimating odds ratio in 2x2 table
Odds Ratio The odds ratio must be greater than or equal to zero. As
the odds of the first group approaches zero, the odds ratio approaches zero. As the odds of the second group approaches zero, the odds ratio approaches positive infinity
An odds ratio of 1 indicates that the condition or event under study is equally likely in both groups.
An odds ratio greater than 1 indicates that the condition or event is more likely in the first group.
An odds ratio less than 1 indicates that the condition or event is less likely in the first group.
Log odds …
Odds Ratio-Demo
MS-Excel: No default function SPSS: Analyze > Descriptive Statistics >
Crosstabs > Select Variables: Row(s): Two groups we want to compare e.g. shades, Column(s): Response variable e.g (pc)> Click on Statistics > Select Risk.
R: Odds ratio are calculated from Logistic regression model.
Odds ratio SPSS output:
pc * Shades Crosstabulation Count
Shades
1 2 Total
0 23 7 30 pc
1 7 23 30 Total 30 30 60
Risk Estimate
95% Confidence Interval
Value Lower Upper Odds Ratio for pc (0 / 1) 10.796 3.263 35.718 For cohort Shades = 1 3.286 1.668 6.473 For cohort Shades = 2 .304 .154 .600 N of Valid Cases 60
Logistic Regression Consider a binary response variable Y with two
outcomes (1 and 0). Let p be the proportion of the event with outcome 1.
The logit(p) can be defined as log of odds of the event with outcome 1. Then the logistic regression for the response variable Y on a set of explanatory variables X1, X2, …., Xr can be written as
logit(p) = log(p/1-p)= b0 +b1X1+ … +brXr
Logistic Regression From the model of logistic regression, the odds in favor of
Y=1, (p/1-p), can be expressed interms of some r independent variables,
p/1-p = exp( b0 + b1X1 + +brXr ) If there is no influence of the explanatory variables (i.e.
X1=0, … Xr =0), the odds in favor of Y=1, (p/1-p), is exp(b0).
If an independent variable (e. g. X1 in the model) is categorical with two categories, then exp(b1) implies the ratio of odds (in favor of Y=1) in two categories of the variable X1.
If an independent variable (e. g. X2 in the model) is numeric, then exp(b2) implies the change in odds (p/1-p) for 1 unit change in X2.
Logistic Regression-Demo
MS-Excel: No default functions SPSS: Analyze > Regression > Binary
Logistic > Select Dependent variable: > Select independent variable (covariate)
R: uses the ‘glm’ function for Logistic regression analysis. Codes are in the R-output slide.
Logistic Regression SPSS outputDependent Variable Encoding
Original Value Internal Value 0 0 1 1
Categorical Variables Codings
Parameter coding
Frequency (1) 1 30 1.000 Shades
2 30 .000
Classification Table(a,b)
Predicted
pc
Observed 0 1 Percentage
Correct
0 0 30 .0 pc
1 0 30 100.0
Step 0
Overall Percentage 50.0
a Constant is included in the model. b The cut value is .500 Variables in the Equation
B S.E. Wald df Sig. Exp(B) Step 0 Constant .000 .258 .000 1 1.000 1.000
Variables not in the Equation
Score df Sig. Variables Shades(1) 17.067 1 .000 Step 0
Overall Statistics 17.067 1 .000
Logistic Regression SPSS outputOmnibus Tests of Model Coefficients
Chi-square df Sig. Step 17.985 1 .000 Block 17.985 1 .000
Step 1
Model 17.985 1 .000
Model Summary
Step -2 Log
likelihood Cox & Snell R Square
Nagelkerke R Square
1 65.193(a) .259 .345
a Estimation terminated at iteration number 4 because parameter estimates changed by less than .001. Classification Table(a)
Predicted
pc
Observed 0 1 Percentage
Correct
0 23 7 76.7 pc
1 7 23 76.7
Step 1
Overall Percentage 76.7
a The cut value is .500 Variables in the Equation
B S.E. Wald df Sig. Exp(B) Shades(1) -2.379 .610 15.189 1 .000 .093 Step
1(a) Constant 1.190 .432 7.594 1 .006 3.286
a Variable(s) entered on step 1: Shades.
Logistic Regression R outputShades[Shades==2]<-0> ftc<-glm(pc~Shades, family=binomial(link = "logit"))> summary(ftc)
Call:glm(formula = pc ~ Shades, family = binomial(link = "logit"))
Deviance Residuals: Min 1Q Median 3Q Max -1.706e+00 -7.290e-01 -1.110e-16 7.290e-01 1.706e+00
Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.1896 0.4317 2.756 0.00585 ** Shades -2.3792 0.6105 -3.897 9.73e-05 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 83.178 on 59 degrees of freedomResidual deviance: 65.193 on 58 degrees of freedomAIC: 69.193
Number of Fisher Scoring iterations: 4
> confint(ftc)Waiting for profiling to be done... 2.5 % 97.5 %(Intercept) 0.3944505 2.114422Shades -3.6497942 -1.237172> exp(ftc$coefficients)(Intercept) Shades 3.2857143 0.0926276 > exp(confint(ftc))Waiting for profiling to be done... 2.5 % 97.5 %(Intercept) 1.48356873 8.2847933Shades 0.02599648 0.2902037>