7/29/2019 Analysis of Repeated Measurement Data
ANALYSIS OF REPEATED MEASURES DATA USING SAS
Krishan Lal
I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 012
1. INTRODUCTION
The term repeated measures refers broadly to the data in which the response of each
experimental unit or subject is observed on multiple occasions or under multiple conditions.
Thus repeated measurements refer to the situation in which multiple measurements of the
response variable are obtained, over several time periods, from each experimental unit, such
as an animal. Usually, the responses are taken over time, as when the weights of growing
animals are measured weekly or monthly, or when the production of fruit from the same tree is
recorded over the years. Repeated measurement data arise in animal science, horticulture,
clinical trials, medical science, physiological and psychological experiments, etc.
Repeated measures experiments are a type of factorial experiment, with group and time as the
two factors. They have been used commonly in animal, plant, and human research for several
decades, but only in recent years have statistical and computing methodologies been available to
analyze them effectively and efficiently. The objectives of repeated measures data analysis
are to examine and compare response trends over time. This can involve comparisons of
groups at specific times, or averaged over time. It also can involve comparisons of times
within a group. These are objectives common to any factorial experiment. The important
feature of repeated measures experiments that requires special attention in data analysis is the
correlation pattern among the responses on the same individual (animal) over time.
2. METHODS FOR ANALYZING REPEATED MEASURES
Responses measured on the same animal are correlated because they contain a common
contribution from the animal. Moreover, measures on the same animal close in time tend to
be more highly correlated than measures far apart in time. Also, variances of repeated
measures often change with time. These potential patterns of correlation and variation may
combine to produce a complicated covariance structure of repeated measures. Special
methods of statistical analysis are needed for repeated measures data because of the
covariance structure. Standard regression and analysis of variance methods may produce
invalid results because they require mathematical assumptions that do not hold with repeated
measures data. In repeated measures analysis of variance, the effects of interest are
i) between-subject effects such as GROUP,
ii) within-subject effects such as TIME, and
iii) interactions between the two types of effects such as GROUP*TIME.
There are several statistical methods used for analyzing repeated measures data. Here we
describe methods, from basic to sophisticated, for the analysis of repeated measures data using
SAS software. These include:
i) Separate analyses at each time point,
ii) Univariate analysis of variance,
iii) Univariate and multivariate analyses of time variables, and
iv) Mixed model methodology.
Separate analyses at each time point do not require special methods for repeated measures
and do not directly address the objectives of examining and comparing trends over time. The
other three approaches require special methodology and software. Development of statistical
methods for repeated measures data has been an active area of research in the past two
decades because of advancements in computing hardware and software. Enhancements in the
SAS System reflect the advancements in methodology and hardware. In the SAS System, the
GLM procedure enabled users to perform univariate analysis of variance but did not provide
valid standard errors for most estimates. Moreover, conclusions derived from univariate
analysis of variance are often invalid because the methodology does not adequately address
the covariance structure of repeated measures. The REPEATED statement is now available in
SAS in both the GLM procedure and the MIXED procedure. PROC GLM provides both univariate
and multivariate tests for repeated measures for one response. Another approach to analysis
of repeated measures is via general mixed models. This approach can handle balanced as well
as unbalanced or missing within-subject data, and it offers more options for modeling the
within-subject covariance. The main drawback of the mixed models approach is that it
generally requires iteration and, thus, may be less computationally efficient. The results
provided by the REPEATED statement of PROC GLM are based on univariate and multivariate
analyses of contrast variables computed from the repeated measures variables. This approach
basically bypasses the problems of covariance structure rather than addressing them directly.
The REPEATED statement enables users to obtain statistical tests for effects involving time
trends. However, the tests are inefficient and the problem of incorrect standard errors
remains. In addition, missing data on even one measure of an animal causes all the data for
that animal to be ignored. The MIXED procedure provides the capabilities of mixed model
methodology for analysis of repeated measures data. Use of mixed model methodology
enables the user to address the covariance structure directly and greatly enhances the user's
ability to analyze repeated measures data by providing valid standard errors and efficient
statistical tests.
Here we shall illustrate the univariate and multivariate methods of analysis and their
respective advantages and shortcomings. The statistical analysis methods illustrated focus on
group (sex) comparisons at specific times, group comparisons averaged over times, and on
changes over time in specific groups. Differences between groups (male and female) are
computed at individual times and averaged across times.
Separate analyses at each time and the GLM REPEATED statement require the data to be
organized in multivariate mode. That is, there is one row per experimental unit in the data
set, and the measurements at each time are considered separate response variables. The
univariate ANOVA and MIXED procedure require that the data be organized in univariate
mode, that is, one row per experimental unit at each time.
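The two layouts can be illustrated with a small sketch. It reads two rows of Table 1 (animals 4 and 12) in multivariate mode and rearranges them into univariate mode. The names BW1, BW2, sex, an, wk, wt and t1-t6 follow the SAS statements used in this note; reading the animal number together with the weights, and reading sex as a character variable, are assumptions made here for illustration only.

```sas
/* Multivariate mode: one row per animal, one variable per week. */
/* Assumption: the animal number (an) is read with the weights   */
/* and sex is read as a character variable.                      */
DATA BW1;
   INPUT an sex $ t1-t6;
   CARDS;
4  Male   0.8 4   6  9 13 21
12 Female 0.8 4.2 8 11 13 18
;
RUN;

/* Univariate mode: one row per animal at each week. */
DATA BW2;
   SET BW1;
   ARRAY t{6} t1-t6;
   DO i = 1 TO 6;
      wk = 4 * (i - 1);   /* weeks 0, 4, 8, 12, 16, 20 */
      wt = t{i};          /* body weight (kg) at that week */
      OUTPUT;
   END;
   KEEP sex an wk wt;
RUN;

PROC PRINT DATA=BW2;
RUN;
```

Each row of BW1 produces six rows of BW2, so the two animals read here give twelve observations in univariate mode.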
We use data on the body weight (kg) of male and female pigs. The body weights were recorded
at intervals of 4 weeks from birth to 20 weeks of age and are given in Table 1. Here the
factor sex has two levels.
Table 1: Body weights (kg) of pigs maintained at Jabalpur during 1988-89
Anim                         Week
No.  Sex        0     4     8    12    16    20
 1   Male       1   4.8  12.6    16    21   1.6
 2   Male       1   4.2     7    10    14    22
 3   Male     0.8     4     6   6.4    10    15
 4   Male     0.8     4     6     9    13    21
 5   Male     0.8     5   9.4    11    14    23
 6   Male     0.8   3.2     7    10    15    22
 7   Male     0.8   3.2   5.5   7.4    12    17
 8   Male     0.8   3.4     7   8.7  12.4  19.2
 9   Female     1   5.4    10    13  17.4  26.4
10   Female   1.2   4.8  12.6    16    20    21
11   Female     1   4.6    13    18    22    24
12   Female   0.8   4.2     8    11    13    18
13   Female   0.8   3.8     7   7.2    12    19
14   Female     1   5.4    11    14    19    22
15   Female     1     6   5.4    10    17  26.8
16   Female     1   3.4   7.8    10    13  17.8
The analysis of these data by different methods, with the use of software, is given below.
I) Analysis at Individual Time Points
Analysis of data at each time point examines group effects separately at individual
observation times and makes no statistical comparisons among times. This analysis can be
carried out even in Microsoft Excel (easily available software). To do so, we make a file in
Microsoft Excel with the levels of the group as columns and then use the Anova: Single Factor
command under Data Analysis in the Tools menu. This process is repeated for each time point.
In SAS, the data are organized in the "multivariate mode". The statements to obtain analyses at
each time point are:
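DATA BW1;
INPUT SEX T1-T6;   /* one row per animal: sex, then weights at weeks 0 to 20 */
CARDS;
DATA
;
PROC GLM;
CLASS SEX;
MODEL T1-T6 = SEX/SS3;   /* a separate ANOVA for each time point */
MEANS SEX/LSD;
ESTIMATE 'GP 1 vs GP 2' SEX 1 -1;   /* difference between the two sexes */
RUN;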
No inference is drawn about trends over time, so this method is not truly a repeated measures
analysis. Use of analysis at each time point is usually at a preliminary stage of data analysis
and is not a preferred method because it does not address time effects. The only advantage in
this method is that if we do not have any statistical software the data can be analyzed in
Microsoft Excel.
II) Univariate ANOVA when the data follow a trend
Some repeated measures data, such as growth and lactation data, follow a trend. Such data
can be analyzed by fitting an appropriate curve (linear, quadratic, etc.) to the data of each
animal, which gives a set of parameter estimates for each animal. These estimates are then
analyzed further to determine the effects of the factors. Such data can easily be analyzed
using the SAS system. The drawback of this method is that the parameter estimates being
analyzed may not be normally distributed.
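A minimal sketch of this two-stage approach is given below, assuming a straight-line trend. It uses four animals from Table 1 (numbers 4, 5, 12 and 13) entered in univariate mode. The data set names EST and SLOPES and the variable slope are hypothetical names introduced only for this illustration, and a linear fit is an assumption rather than a recommendation for these data.

```sas
/* Four animals from Table 1, one observation per animal per week. */
DATA BW2;
   INPUT sex $ an wk wt @@;
   CARDS;
Male 4 0 0.8  Male 4 4 4  Male 4 8 6  Male 4 12 9  Male 4 16 13  Male 4 20 21
Male 5 0 0.8  Male 5 4 5  Male 5 8 9.4  Male 5 12 11  Male 5 16 14  Male 5 20 23
Female 12 0 0.8  Female 12 4 4.2  Female 12 8 8  Female 12 12 11  Female 12 16 13  Female 12 20 18
Female 13 0 0.8  Female 13 4 3.8  Female 13 8 7  Female 13 12 7.2  Female 13 16 12  Female 13 20 19
;
RUN;

PROC SORT DATA=BW2;
   BY sex an;
RUN;

/* Stage 1: one straight-line fit per animal.               */
/* OUTEST= saves the estimated coefficients for each animal. */
PROC REG DATA=BW2 OUTEST=EST NOPRINT;
   BY sex an;
   MODEL wt = wk;
RUN;
QUIT;

/* In EST, the variable wk holds the fitted slope (kg per week). */
DATA SLOPES;
   SET EST;
   slope = wk;
   KEEP sex an slope Intercept;
RUN;

/* Stage 2: compare the fitted slopes between sexes. */
PROC GLM DATA=SLOPES;
   CLASS sex;
   MODEL slope = sex / SS3;
   MEANS sex;
RUN;
QUIT;
```

For a quadratic trend, a squared term (for example wk2 = wk*wk, created in a DATA step) could be added to the MODEL statement of PROC REG and its coefficient analyzed in the same way.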
III) Univariate Analysis of Variance Using the GLM Procedure
Univariate analysis of variance (ANOVA) is the method most commonly applied to repeated
measures data that makes comparisons between times. It treats the data as if they were from a
split-plot design with the animals as whole-plot units and animals at particular times as
sub-plot units. This approach is also referred to as a split-plot-in-time analysis. If measurements
have equal variance at all times, and if pairs of measurements on the same animal are equally
correlated, regardless of the time lag between the measurements, then the univariate ANOVA
is valid from a statistical point of view and, in fact, yields an optimal method of analysis.
However, measurements close in time are often more highly correlated than measurements far
apart in time, which will invalidate tests for effects involving time. For this procedure the data
must be arranged in univariate mode. The SAS code using PROC GLM for this analysis is given
below:
DATA BW2;
INPUT sex an wk wt;
CARDS;
DATA
;
PROC GLM;
CLASS sex an wk;
MODEL wt = sex an(sex) wk sex*wk;
RANDOM an(sex)/TEST;
LSMEANS sex/STDERR E = an(sex);
LSMEANS sex*wk/PDIFF;
RUN;
The MODEL statement specifies sources of variation for the ANOVA. The RANDOM
statement produces a table of expected mean squares which, in a true split-plot experiment,
can be used to determine appropriate denominators of F-statistics for all terms in the
MODEL statement. These tests are produced by the TEST option at the end of the RANDOM
statement. In this case, the test statistic for SEX is F = MS[SEX]/MS[AN(SEX)]. Tests for effects
of WK and SEX*WK use F-statistics with MS[ERROR] as the denominator mean square. The
first LSMEANS statement computes means for each sex, averaged over weeks, with standard
errors based on AN(SEX) as the error term. The second LSMEANS statement computes means
for combinations of sex and week, and its PDIFF option requests p-values for their pairwise
differences. In addition to the potential problems of statistical validity with
univariate ANOVA of repeated measures, there are potential shortcomings with the
capabilities of the GLM procedure. The LSMEANS statement in PROC GLM does not
compute correct standard errors for the SEX*WK means, even if the correlation structure of the
repeated measures is not a problem, that is, even if variances are equal and correlations are
equal. Also, comparisons of LSMEANS between sexes at specific weeks are not valid because the
standard errors of the differences are calculated incorrectly. Moreover, we are using the model
of a split-plot design, but the sub-plot levels (time points) cannot be randomly allocated
within an animal.
IV) Analysis of Contrast Variables Using the GLM REPEATED Statement
Contrast variables in repeated measures data are linear combinations of the responses over
time for individual animals. A familiar example from basic statistical methodology is given
by the orthogonal polynomials (Snedecor and Cochran, 1980), which represent linear,
quadratic, cubic, etc., trends over time. Another example is the set of differences between
responses at consecutive time points, that is, changes from time 1 to time 2, time 2 to time 3,
and so forth. A set of contrast variables can be used to analyze trends over time and to make
comparisons between times in repeated measures data. The original repeated measures data
for each animal are transformed into a new set of variables given by a set of contrast
variables. Then, multivariate and univariate analyses can be applied to these new variables.
This provides a method for analyzing repeated measures data that evades some of the
covariance structure problems that invalidate univariate ANOVA analyses, as discussed in
the previous section. The REPEATED statement in GLM provides automatic computation
and analyses for several common choices of contrast variables. Data must be in a multivariate
mode for use of the GLM REPEATED statement.
Using the SAS system, the GLM statements are:
DATA BW1;
INPUT sex t1-t6;
CARDS;
DATA
;
PROC GLM;
CLASS sex;
MODEL t1-t6 = sex/SS3;
REPEATED time 6 CONTRAST(1) / SUMMARY;   /* SUMMARY prints an ANOVA for each contrast */
TITLE 'Repeated measures analysis using REPEATED statement';
RUN;
Note that TIME is not a variable in the SAS data set BW1. Rather, it is only a
name attached to the set of contrasts to be analyzed.
The REPEATED statement produces results from several statistical methods to obtain tests
for effects involving week. If there were the same number of animals per group and no
missing data on any animal, then all four multivariate tests would have equal results. If all
animals had complete data, the univariate ANOVA results would agree exactly with those
given in Section III. With CONTRAST(1), each contrast variable is the difference between the
response at one of the later times and the response t1 at week 0, that is, t2 - t1, t3 - t1, ...,
t6 - t1. (Contrasts of each time with the mean of the other times, such as
t1 - (t2 + ... + t6)/5, are obtained with the MEAN transformation instead.) When the SUMMARY
option is specified, the REPEATED statement causes PROC GLM to compute an ANOVA for each
of the contrast variables.
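The two examples of contrast variables mentioned at the start of this section, orthogonal polynomial trends and differences between consecutive times, can be requested with the POLYNOMIAL and PROFILE transformations of the REPEATED statement. The sketch below is illustrative only: it enters the values as printed in Table 1, reads the animal number and a character sex variable (assumptions made here), and supplies the actual week values so that the polynomial contrasts reflect the 4-week spacing.

```sas
/* Table 1 in multivariate mode: one row per animal. */
DATA BW1;
   INPUT an sex $ t1-t6;
   CARDS;
1  Male   1   4.8 12.6 16   21    1.6
2  Male   1   4.2  7   10   14   22
3  Male   0.8 4    6    6.4 10   15
4  Male   0.8 4    6    9   13   21
5  Male   0.8 5    9.4 11   14   23
6  Male   0.8 3.2  7   10   15   22
7  Male   0.8 3.2  5.5  7.4 12   17
8  Male   0.8 3.4  7    8.7 12.4 19.2
9  Female 1   5.4 10   13   17.4 26.4
10 Female 1.2 4.8 12.6 16   20   21
11 Female 1   4.6 13   18   22   24
12 Female 0.8 4.2  8   11   13   18
13 Female 0.8 3.8  7    7.2 12   19
14 Female 1   5.4 11   14   19   22
15 Female 1   6    5.4 10   17   26.8
16 Female 1   3.4  7.8 10   13   17.8
;
RUN;

/* Linear, quadratic, ... trends over weeks 0 to 20.                */
/* NOUNI suppresses the separate ANOVA of each of t1-t6.            */
PROC GLM DATA=BW1;
   CLASS sex;
   MODEL t1-t6 = sex / NOUNI;
   REPEATED time 6 (0 4 8 12 16 20) POLYNOMIAL / SUMMARY;
RUN;
QUIT;

/* Differences between responses at consecutive time points. */
PROC GLM DATA=BW1;
   CLASS sex;
   MODEL t1-t6 = sex / NOUNI;
   REPEATED time 6 PROFILE / SUMMARY;
RUN;
QUIT;
```

In each run the SUMMARY option prints one ANOVA per contrast variable, so the SEX line in each table tests whether that particular trend component or change differs between the sexes.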
V) Mixed Model Analysis Using the MIXED Procedure
As noted above, analysis of repeated measures data requires special attention to the
covariance structure due to the sequential nature of the data on each animal. Procedures
discussed previously either avoid the issue (analysis of contrast variables) or ignore it
(univariate analysis of variance). Ignoring the covariance issues may result in incorrect
conclusions from the statistical analysis. Avoiding the issues may result in inefficient
analyses, which is tantamount to wasting data. The general linear mixed model provides the
capability to address the issue directly by modeling the covariance structure. This capability
is implemented in the MIXED procedure of the SAS System.
There are two basic steps in performing a repeated measures analysis using mixed model
methodology. The first step is to model the covariance structure. The second step is to
analyze time trends for groups by estimating and comparing means.
Measures on different animals are independent, so covariance concern is only with measures
on the same animal. The covariance structure refers to variances at individual times and to
correlation between measures at different times on the same animal. There are basically two
aspects of the correlation. First, two measures on the same animal are correlated simply
because they share common contributions from the animal. This is due to variation between
animals. Second, measures on the same animal close in time are often more highly correlated
than measures far apart in time. This is covariation within animals. Usually, when using
PROC MIXED, the variation between animals is specified by the RANDOM statement, and
covariation within animals is specified by the REPEATED statement. Numerous structures
are available as options on the REPEATED and RANDOM statements in the MIXED
procedure. Three different structures will be shown here and one will be chosen as best
among the three. First, a structure known as compound symmetry (CS) will be fitted. This
structure specifies that measures at all times have the same variance, and that all pairs of
measures on the same animal have the same correlation. The implication is that the only
source of covariance between repeated measures is the animal contribution,
irrespective of proximity in time. If this structure holds, then the univariate ANOVA of
Section III would have valid tests, although the standard errors and tests of LSMEANS from
that analysis would not necessarily be valid. Compound symmetric structure can be fitted in
two ways with PROC MIXED. One way is with the RANDOM statement:
DATA BW2;
INPUT sex an wk wt;
CARDS;
DATA
;
PROC MIXED;
CLASS sex an wk;
MODEL wt = sex wk sex*wk;
RANDOM an(sex);
RUN;
This RANDOM statement specifies that there is a contribution common to all measures on
the same animal, which results in equal variances at all times and equal correlations between
all pairs of times. Only fixed effects are included in the PROC MIXED MODEL statement.
Statements for fitting the compound symmetric structure with the REPEATED statement are:
DATA BW2;
INPUT sex an wk wt;
CARDS;
DATA
;
PROC MIXED;
CLASS sex an wk;
MODEL wt = sex wk sex*wk;
REPEATED wk / SUB=an(sex) TYPE=CS R RCORR;
RUN;
Here, the REPEATED statement indicates via SUB=an(sex) that data are correlated on the
same animal. All animals are assumed to have the same covariance matrix, although
heterogeneity of variances between animals can be accommodated by the MIXED procedure.
Second, a general structure will be fitted. As an option in PROC MIXED, this is indicated as
UN for unstructured. This structure makes no assumptions regarding equal variances or
correlations. Statements for fitting this structure with the REPEATED statement are:
DATA BW2;
INPUT sex an wk wt;
CARDS;
DATA
;
PROC MIXED;
CLASS sex an wk;
MODEL wt = sex wk sex*wk;
REPEATED wk / SUB=an(sex) TYPE= UN R RCORR;
RUN;
Again, no RANDOM statement is used because inter-animal variance is absorbed into the
general structure. All animals are again assumed to have the same covariance matrix.
Third, a combination structure, autoregressive plus random effect, will be fitted. This
combination structure specifies an inter-animal random effect of differences
between animals, and a correlation structure within animals that decreases with increasing lag
between measures. The MIXED procedure statements using both the RANDOM and
REPEATED statements are given below:
DATA BW2;
INPUT sex an wk wt;
CARDS;
DATA
;
PROC MIXED;
CLASS sex an wk;
MODEL wt = sex wk sex*wk;
RANDOM an(sex);
REPEATED wk / SUB=an(sex) TYPE=AR(1);
RUN;
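One common way to choose among the three structures is to compare information criteria such as AIC and BIC from the three fits, and then to carry out the second step (estimating and comparing means) under the chosen structure. The sketch below illustrates this under several assumptions: the values are entered as printed in Table 1, the data set names FIT_CS, FIT_UN, FIT_AR and FIT_ALL and the variable structure are hypothetical, and the LSMEANS statements are attached to the autoregressive plus random effect fit purely as an example, not because that structure is known to be best for these data.

```sas
/* Read Table 1 and write one observation per animal per week. */
DATA BW2;
   INPUT an sex $ t1-t6;
   ARRAY t{6} t1-t6;
   DO i = 1 TO 6;
      wk = 4 * (i - 1);
      wt = t{i};
      OUTPUT;
   END;
   KEEP sex an wk wt;
   CARDS;
1  Male   1   4.8 12.6 16   21    1.6
2  Male   1   4.2  7   10   14   22
3  Male   0.8 4    6    6.4 10   15
4  Male   0.8 4    6    9   13   21
5  Male   0.8 5    9.4 11   14   23
6  Male   0.8 3.2  7   10   15   22
7  Male   0.8 3.2  5.5  7.4 12   17
8  Male   0.8 3.4  7    8.7 12.4 19.2
9  Female 1   5.4 10   13   17.4 26.4
10 Female 1.2 4.8 12.6 16   20   21
11 Female 1   4.6 13   18   22   24
12 Female 0.8 4.2  8   11   13   18
13 Female 0.8 3.8  7    7.2 12   19
14 Female 1   5.4 11   14   19   22
15 Female 1   6    5.4 10   17   26.8
16 Female 1   3.4  7.8 10   13   17.8
;
RUN;

/* Step 1: fit each covariance structure and save its fit statistics. */
ODS OUTPUT FitStatistics=FIT_CS;
PROC MIXED DATA=BW2;
   CLASS sex an wk;
   MODEL wt = sex wk sex*wk;
   REPEATED wk / SUB=an(sex) TYPE=CS;
RUN;

ODS OUTPUT FitStatistics=FIT_UN;
PROC MIXED DATA=BW2;
   CLASS sex an wk;
   MODEL wt = sex wk sex*wk;
   REPEATED wk / SUB=an(sex) TYPE=UN;
RUN;

/* Step 2 is shown with this fit only as an example. */
ODS OUTPUT FitStatistics=FIT_AR;
PROC MIXED DATA=BW2;
   CLASS sex an wk;
   MODEL wt = sex wk sex*wk;
   RANDOM an(sex);
   REPEATED wk / SUB=an(sex) TYPE=AR(1);
   LSMEANS sex / PDIFF;          /* sexes compared, averaged over weeks */
   LSMEANS sex*wk / SLICE=wk;    /* sexes compared at each week */
RUN;

/* Stack the three fit-statistics tables for side-by-side comparison. */
DATA FIT_ALL;
   LENGTH structure $ 8;
   SET FIT_CS (IN=a) FIT_UN (IN=b) FIT_AR (IN=c);
   IF a THEN structure = 'CS';
   ELSE IF b THEN structure = 'UN';
   ELSE structure = 'AR(1)+RE';
RUN;

PROC PRINT DATA=FIT_ALL;
RUN;
```

In current SAS releases smaller values of AIC and BIC indicate a better fit. The comparison is meaningful here because all three models have the same fixed effects. The unstructured fit estimates 21 covariance parameters from 16 animals, so it may converge slowly or not at all.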
Implications
Computer software is currently available that enables researchers to analyze repeated
measures data using mixed model methodology. This methodology provides more valid and
efficient statistical analyses of repeated measures. Implementation of this methodology
requires the data analyst to model the variance and correlation structure of the data as a first
step. Then, comparisons of groups and trends over time can be analyzed.
REFERENCES
Damon, R. A., and W. R. Harvey (1987). Experimental Design, ANOVA, and Regression. p
320. Harper and Row, New York.
SAS (1989). SAS/STAT User's Guide (Version 6, 4th Ed.). SAS Inst. Inc., Cary, NC.
SAS (1996). SAS/STAT Software: Changes and Enhancements through Release 6.11. SAS
Inst. Inc., Cary, NC.
Snedecor, G. W., and W. G. Cochran (1980). Statistical Methods (7th Ed.). Iowa State
University Press, Ames.
Searle, S. R. (1971). Linear Models. John Wiley & Sons, New York.