ANALYSIS OF REPEATED MEASURES DATA USING SAS

Krishan Lal

I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 012

1. INTRODUCTION

The term repeated measures refers broadly to data in which the response of each

experimental unit or subject is observed on multiple occasions or under multiple conditions.

Thus repeated measurements refer to the situation in which multiple measurements of the

response variable are obtained, over several time periods, from each experimental unit, such

as an animal. Usually, the responses are taken over time, as when the weights of growing animals are

measured weekly or monthly, or when fruit production is recorded over the years from the same tree. Repeated

measurement data are obtained in animal science, horticulture, clinical trials, medical science,

physiological, psychological experiments, etc.

Repeated measures experiments are a type of factorial experiment, with group and time as the

two factors. They have been used commonly in animal, plant, and human research for several

decades, but only in recent years have statistical and computing methodologies been available to

analyze them effectively and efficiently. The objectives of repeated measures data analysis

are to examine and compare response trends over time. This can involve comparisons of

groups at specific times, or averaged over time. It also can involve comparisons of times

within a group. These are objectives common to any factorial experiment. The important

feature of repeated measures experiments that requires special attention in data analysis is the

correlation pattern among the responses on the same individual (animal) over time.

2. METHODS FOR ANALYZING REPEATED MEASURES

Responses measured on the same animal are correlated because they contain a common

contribution from the animal. Moreover, measures on the same animal close in time tend to

be more highly correlated than measures far apart in time. Also, variances of repeated

measures often change with time. These potential patterns of correlation and variation may

combine to produce a complicated covariance structure of repeated measures. Special

methods of statistical analysis are needed for repeated measures data because of the

covariance structure. Standard regression and analysis of variance methods may produce

invalid results because they require mathematical assumptions that do not hold with repeated

measures data. In repeated measures analysis of variance, the effects of interest are

i) between-subject effects such as GROUP,

ii) within-subject effects such as TIME, and

iii) interactions between the two types of effects such as GROUP*TIME.

There are several statistical methods used for analyzing repeated measures data. Here we

present methods, from basic to sophisticated, for the analysis of repeated measures data using

SAS software. These include:

i) Separate analyses at each time point,

ii) Univariate analysis of variance,

iii) Univariate and multivariate analyses of time variables, and

iv) Mixed model methodology.



Separate analyses at each time point do not require special methods for repeated measures

and do not directly address the objectives of examining and comparing trends over time. The

other three approaches require special methodology and software. Development of statistical

methods for repeated measures data has been an active area of research in the past two

decades because of advancements in computing hardware and software. Enhancements in the

SAS System reflect the advancements in methodology and hardware. In the SAS System, the

GLM procedure enabled users to perform univariate analysis of variance but did not provide

valid standard errors for most estimates. Moreover, conclusions derived from univariate

analysis of variance are often invalid because the methodology does not adequately address

the covariance structure of repeated measures. The REPEATED statement is now available in

both the GLM and MIXED procedures of SAS. PROC GLM provides both univariate

and multivariate tests for repeated measures for one response. Another approach to analysis

of repeated measures is via general mixed models. This approach can handle balanced as well

as unbalanced or missing within-subject data, and it offers more options for modeling the

within-subject covariance. The main drawback of the mixed models approach is that it

generally requires iteration and, thus, may be less computationally efficient. The results

provided by the REPEATED statement in PROC GLM are based on univariate and multivariate analyses of

contrast variables computed from the repeated measures variables. This approach basically

bypassed the problems of covariance structure rather than addressing them directly. The

REPEATED statement enabled users to obtain statistical tests for effects involving time

trends. However, the tests were inefficient and the problem of incorrect standard errors

remained. In addition, missing data on even one measure of an animal caused all the data for

that animal to be ignored. The MIXED procedure provided the capabilities of mixed model

methodology for analysis of repeated measures data. Use of mixed model methodology

enabled the user to directly address the covariance structure and greatly enhanced the user's

ability to analyze repeated measures data by providing valid standard errors and efficient

statistical tests.

Here we shall illustrate the univariate and multivariate methods of analysis and their

respective advantages and shortcomings. The statistical analysis methods illustrated focus on

group (sex) comparisons at specific times, group comparisons averaged over times, and on

changes over time in specific groups. Differences between groups (male and female) are

computed at individual times and averaged across times.

Separate analyses at each time and the GLM REPEATED statement require the data to be

organized in multivariate mode. That is, there is one row per experimental unit in the data

set, and the measurements at each time are considered separate response variables. The

univariate ANOVA and MIXED procedure require that the data be organized in univariate

mode, that is, one row per experimental unit at each time.
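The two layouts can be converted into each other with a DATA step. The following is a minimal sketch, assuming a multivariate-mode data set named BW1 with variables sex and t1-t6 (the names used in the SAS statements of this document). The animal number an is assumed here to be the row number of BW1, which is an illustrative choice and not part of the original statements.

```sas
/* Reshape BW1 (one row per animal: sex t1-t6) into BW2        */
/* (one row per animal per week: sex an wk wt).                */
DATA BW2;
  SET BW1;
  ARRAY t{6} t1-t6;
  an = _N_;                /* assumed animal identifier: row number */
  DO i = 1 TO 6;
    wk = 4*(i - 1);        /* weeks 0, 4, 8, 12, 16, 20 */
    wt = t{i};
    OUTPUT;                /* write one row per animal and week */
  END;
  KEEP sex an wk wt;
RUN;
```

Each animal then contributes six rows to BW2, one for each week of measurement.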

We use data on the body weight (kg) of male and female pigs. The body

weights were recorded at intervals of 4 weeks from birth to 20 weeks of age and are

given in Table 1. Here sex has two levels.



Table 1: Body weights (kg) of pigs maintained at Jabalpur during 1988-89

Anim.                          Week
No.   Sex          0      4      8     12     16     20
  1   Male         1    4.8   12.6     16     21    1.6
  2   Male         1    4.2      7     10     14     22
  3   Male       0.8      4      6    6.4     10     15
  4   Male       0.8      4      6      9     13     21
  5   Male       0.8      5    9.4     11     14     23
  6   Male       0.8    3.2      7     10     15     22
  7   Male       0.8    3.2    5.5    7.4     12     17
  8   Male       0.8    3.4      7    8.7   12.4   19.2
  9   Female       1    5.4     10     13   17.4   26.4
 10   Female     1.2    4.8   12.6     16     20     21
 11   Female       1    4.6     13     18     22     24
 12   Female     0.8    4.2      8     11     13     18
 13   Female     0.8    3.8      7    7.2     12     19
 14   Female       1    5.4     11     14     19     22
 15   Female       1      6    5.4     10     17   26.8
 16   Female       1    3.4    7.8     10     13   17.8

The analysis of these data by different methods, with the corresponding software statements, is given

below:

I) Analysis at Individual Time Points

Analysis of data at each time point examines group effects separately at individual

observation times and makes no statistical comparisons among times. This analysis can be

carried out even in Microsoft Excel, which is easily available. The data are entered in a

Microsoft Excel file with the levels of the group as columns, and the Anova: Single Factor

command under Data Analysis in the Tools menu is used. This process is repeated for each time point.

In SAS, the data are organized in the "multivariate mode". The statements to obtain analyses at

each time point are:

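/* SEX is read as a numeric group code; T1-T6 are the six weights */
DATA BW1;
INPUT SEX T1-T6;
CARDS;
... data lines from Table 1, one row per animal ...
;
PROC GLM DATA=BW1;
CLASS SEX;
MODEL T1-T6 = SEX / SS3;
MEANS SEX / LSD;
/* the label of an ESTIMATE statement must be quoted */
ESTIMATE 'GP 1 - GP 2' SEX 1 -1;
RUN;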

No inference is drawn about trends over time, so this method is not truly a repeated measures

analysis. Analysis at each time point is usually used at a preliminary stage of data analysis

and is not a preferred method because it does not address time effects. The only advantage of



this method is that if we do not have any statistical software the data can be analyzed in

Microsoft Excel.

II) Univariate ANOVA when the data follow a trend

Some repeated measures data, such as growth and lactation data, follow a trend. Such data

can be analyzed by fitting an appropriate curve (linear, quadratic, etc.) to the data of each

animal. This gives a set of parameter estimates for each animal. These estimates are then

analyzed further to determine the effects of the factors. Such an analysis is easily carried out

in SAS. The drawback of this method is that the parameter estimates being analyzed may not

be normally distributed.
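The two stages can be sketched as follows. This is only an illustration, assuming a univariate-mode data set BW2 with variables sex, an, wk and wt, with wk numeric, and assuming a straight-line trend; the data set name COEF is hypothetical. PROC REG writes the estimated coefficients to the OUTEST= data set, where the slope is stored in a variable with the same name as the regressor (here wk).

```sas
/* Stage 1: fit wt = b0 + b1*wk separately for each animal. */
PROC SORT DATA=BW2;
  BY sex an;
RUN;

PROC REG DATA=BW2 OUTEST=COEF NOPRINT;
  BY sex an;
  MODEL wt = wk;
RUN;

/* Stage 2: analyse the estimated slopes (variable wk in COEF) */
/* for a difference between the sexes.                          */
PROC GLM DATA=COEF;
  CLASS sex;
  MODEL wk = sex;
  MEANS sex;
RUN;
```

A quadratic trend could be handled in the same way by adding a squared week variable to BW2 and to the MODEL statement of PROC REG.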

III) Univariate Analysis of Variance Using the GLM Procedure

Univariate analysis of variance (ANOVA) is the method most commonly applied to repeated

measures data that makes comparisons between times. It treats the data as if they were from a

split-plot design with the animals as whole-plot units and animals at particular times as

sub-plot units. This approach is also referred to as a split-plot in time analysis. If measurements

have equal variance at all times, and if pairs of measurements on the same animal are equally

correlated, regardless of the time lag between the measurements, then the univariate ANOVA

is valid from a statistical point of view and, in fact, yields an optimal method of analysis.

However, measurements close in time are often more highly correlated than measurements far

apart in time, which will invalidate tests for effects involving time. For this procedure the data

must be arranged in univariate mode. The SAS code using PROC GLM for this analysis is

given below:

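DATA BW2;
INPUT sex an wk wt;
CARDS;
... data lines, one row per animal per week ...
;
PROC GLM DATA=BW2;
CLASS sex an wk;
MODEL wt = sex an(sex) wk sex*wk;
/* expected mean squares and F-tests with appropriate denominators */
RANDOM an(sex) / TEST;
/* sex means tested against the animal-within-sex mean square */
LSMEANS sex / STDERR E=an(sex);
LSMEANS sex*wk / PDIFF;
RUN;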

The MODEL statement specifies sources of variation for the ANOVA. The RANDOM

statement produces a table of expected mean squares which, as in a true split-plot experiment,

can be used to determine appropriate denominators of F-statistics for all terms in the

MODEL statement. These tests are produced by the TEST option at the end of the RANDOM

statement. In this case, the test statistic for SEX is F = MS[SEX]/MS[AN(SEX)]. Tests for effects

of WK and SEX*WK use F-statistics with MS[ERROR] as the denominator mean square. The

first LSMEANS statement computes means for each sex, averaged over weeks, with standard

errors. The second LSMEANS statement computes means for combinations of sex and

week, with p-values for their pairwise differences. In addition to the potential problems of

statistical validity with univariate ANOVA of repeated measures, there are potential shortcomings in the

capabilities of the GLM procedure. The LSMEANS statement in PROC GLM does not

compute correct standard errors for the SEX*WK means, even if the correlation structure of the

repeated measures is not a problem, that is, even if variances are equal and correlations are



equal. Also, comparisons of LSMEANS between sexes at specific weeks are not valid because the

standard errors of the differences are calculated incorrectly. Moreover, the model used is that of a

split-plot design, but the sub-plot levels (time points) are not randomized.
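As a worked illustration of the F-tests in this split-plot in time analysis, assume complete data for the pigs of Table 1, that is, 2 sexes, 8 animals per sex and 6 weeks, giving 96 observations and 95 total degrees of freedom. Under that assumption the degrees of freedom (df) and test statistics are:

```latex
\begin{align*}
\mathrm{df}[\mathrm{SEX}]        &= 2 - 1 = 1, \\
\mathrm{df}[\mathrm{AN(SEX)}]    &= 2\,(8 - 1) = 14, \\
\mathrm{df}[\mathrm{WK}]         &= 6 - 1 = 5, \\
\mathrm{df}[\mathrm{SEX{*}WK}]   &= (2 - 1)(6 - 1) = 5, \\
\mathrm{df}[\mathrm{ERROR}]      &= 95 - 1 - 14 - 5 - 5 = 70, \\[6pt]
F_{\mathrm{SEX}}      &= \frac{\mathrm{MS[SEX]}}{\mathrm{MS[AN(SEX)]}} \quad \text{on } (1,\,14) \text{ df}, \\
F_{\mathrm{WK}}       &= \frac{\mathrm{MS[WK]}}{\mathrm{MS[ERROR]}} \quad \text{on } (5,\,70) \text{ df}, \\
F_{\mathrm{SEX{*}WK}} &= \frac{\mathrm{MS[SEX{*}WK]}}{\mathrm{MS[ERROR]}} \quad \text{on } (5,\,70) \text{ df}.
\end{align*}
```

The reference F distributions for WK and SEX*WK hold only if variances are equal at all times and all pairs of times are equally correlated.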

IV) Analysis of Contrast Variables Using the GLM REPEATED Statement

Contrast variables in repeated measures data are linear combinations of the responses over

time for individual animals. A familiar example from basic statistical methodology is given

by the orthogonal polynomials (Snedecor and Cochran, 1980), which represent linear,

quadratic, cubic, etc., trends over time. Another example is the set of differences between

responses at consecutive time points, that is, changes from time 1 to time 2, time 2 to time 3,

and so forth. A set of contrast variables can be used to analyze trends over time and to make

comparisons between times in repeated measures data. The original repeated measures data

for each animal are transformed into a new set of variables given by a set of contrast

variables. Then, multivariate and univariate analyses can be applied to these new variables.

This provides a method for analyzing repeated measures data that evades some of the

covariance structure problems that invalidate univariate ANOVA analyses, as discussed in

the previous section. The REPEATED statement in GLM provides automatic computation

and analyses for several common choices of contrast variables. Data must be in a multivariate

mode for use of the GLM REPEATED statement.
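To make the idea of contrast variables concrete, the differences between consecutive weeks can also be computed by hand in a DATA step and analyzed directly. The sketch below assumes a multivariate-mode data set BW1 with variables sex and t1-t6; the data set name CONTR and the variable names d1-d5 are hypothetical.

```sas
/* Contrast variables: changes between consecutive time points. */
DATA CONTR;
  SET BW1;
  d1 = t2 - t1;   /* change from week 0 to week 4   */
  d2 = t3 - t2;   /* change from week 4 to week 8   */
  d3 = t4 - t3;   /* change from week 8 to week 12  */
  d4 = t5 - t4;   /* change from week 12 to week 16 */
  d5 = t6 - t5;   /* change from week 16 to week 20 */
RUN;

/* Univariate ANOVA of each change, and a multivariate test */
/* of the sex effect on all five changes together.          */
PROC GLM DATA=CONTR;
  CLASS sex;
  MODEL d1-d5 = sex / SS3;
  MANOVA H=sex;
RUN;
```

A sex effect on these changes means that the two sexes do not follow parallel trends over time. The REPEATED statement automates this kind of computation.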

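The GLM statements using the REPEATED statement are:

DATA BW1;
INPUT sex t1-t6;
CARDS;
... data lines from Table 1, one row per animal ...
;
PROC GLM DATA=BW1;
CLASS sex;
MODEL t1-t6 = sex / SS3;
/* SUMMARY prints an ANOVA for each contrast variable */
REPEATED time 6 CONTRAST(1) / SUMMARY;
TITLE 'Repeated measures analysis using the REPEATED statement';
RUN;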

Note that TIME is not a variable in the SAS data set named BW1. Rather, it is only a

name attached to the set of contrasts to be analyzed.

The REPEATED statement produces results from several statistical methods to obtain tests

for effects involving week. If there were the same number of animals per group and no

missing data on any animal, then all four multivariate tests would have equal results. If all

animals had complete data, the univariate ANOVA results would agree exactly with those

given in Section III. The transformation named in the REPEATED statement determines the

contrast variables. With CONTRAST(1), as specified above, each contrast variable is the

difference between the response at a later week and the response t1 at week 0, that is,

t2 - t1, t3 - t1, ..., t6 - t1. With the MEAN transformation, each contrast variable is instead the

difference between one response and the mean of the other five, for example

t1 - (t2 + ... + t6)/5, t2 - (t1 + t3 + ... + t6)/5, and so forth. With the SUMMARY option,

PROC GLM computes an ANOVA for each of the contrast variables.

V) Mixed Model Analysis Using the MIXED Procedure

As noted above, analysis of repeated measures data requires special attention to the

covariance structure due to the sequential nature of the data on each animal. Procedures

discussed previously either avoid the issue (analysis of contrast variables) or ignore it

(univariate analysis of variance). Ignoring the covariance issues may result in incorrect



conclusions from the statistical analysis. Avoiding the issues may result in inefficient

analyses, which is tantamount to wasting data. The general linear mixed model allows the

capability to address the issue directly by modeling the covariance structure. This capability

is implemented in the MIXED procedure of the SAS System.

There are two basic steps in performing a repeated measures analysis using mixed model

methodology. The first step is to model the covariance structure. The second step is to

analyze time trends for groups by estimating and comparing means.

Measures on different animals are independent, so covariance concern is only with measures

on the same animal. The covariance structure refers to variances at individual times and to

correlation between measures at different times on the same animal. There are basically two

aspects of the correlation. First, two measures on the same animal are correlated simply

because they share common contributions from the animal. This is due to variation between

animals. Second, measures on the same animal close in time are often more highly correlated

than measures far apart in time. This is covariation within animals. Usually, when using

PROC MIXED, the variation between animals is specified by the RANDOM statement, and

covariation within animals is specified by the REPEATED statement. Numerous structures

are available as options on the REPEATED and RANDOM statements in the MIXED

procedure. Three different structures will be shown here and one will be chosen as best

among the three. First, a structure known as compound symmetry (CS) will be fitted. This

structure specifies that measures at all times have the same variance, and that all pairs of

measures on the same animal have the same correlation. The implication is that the only

aspect of the covariance between repeated measures is due to the animal contribution,

irrespective of proximity of time. If this structure holds, then the univariate ANOVA of

Section III would have valid tests, although the standard errors and tests of LSMEANS from

the PROC GLM statements would not necessarily be valid. Compound symmetric structure can be fitted in

two ways with PROC MIXED. One way is with the RANDOM statement:

DATA BW2;
INPUT sex an wk wt;
CARDS;
... data lines, one row per animal per week ...
;
PROC MIXED DATA=BW2;
CLASS sex an wk;
/* only fixed effects appear in the MODEL statement */
MODEL wt = sex wk sex*wk;
RANDOM an(sex);
RUN;

This RANDOM statement specifies that there is a contribution common to all measures on

the same animal, which results in equal variances at all times and equal correlations between

all pairs of times. Only fixed effects are included in the PROC MIXED MODEL statement.

Statements for fitting the compound symmetric structure with the REPEATED statement are:



DATA BW2;
INPUT sex an wk wt;
CARDS;
... data lines, one row per animal per week ...
;
PROC MIXED DATA=BW2;
CLASS sex an wk;
MODEL wt = sex wk sex*wk;
REPEATED wk / SUB=an(sex) TYPE=CS R RCORR;
RUN;

Here, the REPEATED statement indicates via SUB=an(sex) that data on the same animal are

correlated. All animals are assumed to have the same covariance matrix, although

heterogeneity of variances between animals can be accommodated by the MIXED procedure.
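The compound symmetric structure can be written out explicitly. In the sketch below, the symbols are notation introduced here for illustration and do not appear in the SAS statements: y_{ij} is the weight of animal i at time j, sigma_a^2 is the variance between animals and sigma_e^2 is the residual variance within animals.

```latex
\begin{align*}
\operatorname{Var}(y_{ij}) &= \sigma_a^2 + \sigma_e^2
  && \text{for every time } j, \\
\operatorname{Cov}(y_{ij},\, y_{ik}) &= \sigma_a^2
  && \text{for } j \neq k, \\
\operatorname{Corr}(y_{ij},\, y_{ik}) &= \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}
  && \text{for } j \neq k .
\end{align*}
```

The correlation is the same for every pair of times, whatever the lag between them, which is the assumption that growth data often violate.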

Second, a general structure will be fitted. As an option in PROC MIXED, this is indicated as

UN for unstructured. This structure makes no assumptions regarding equal variances or

correlations. The statements for fitting this structure with the REPEATED statement are:

DATA BW2;
INPUT sex an wk wt;
CARDS;
... data lines, one row per animal per week ...
;
PROC MIXED DATA=BW2;
CLASS sex an wk;
MODEL wt = sex wk sex*wk;
REPEATED wk / SUB=an(sex) TYPE=UN R RCORR;
RUN;

Again, no RANDOM statement is used because inter-animal variance is absorbed into the

general structure. All animals have the same covariance matrix.

Third, a combination of a random animal effect and a first-order autoregressive, AR(1),

structure will be fitted. This combination structure specifies an inter-animal random effect of

differences between animals, and a correlation structure within animals that decreases with

increasing lag between measures. The MIXED procedure statements using both the RANDOM and

REPEATED statements are given below:

DATA BW2;
INPUT sex an wk wt;
CARDS;
... data lines, one row per animal per week ...
;
PROC MIXED DATA=BW2;
CLASS sex an wk;
MODEL wt = sex wk sex*wk;
RANDOM an(sex);
REPEATED wk / SUB=an(sex) TYPE=AR(1);
RUN;
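The covariance implied by combining a random animal effect with an AR(1) structure can be sketched as follows. The symbols are notation assumed here for illustration: sigma_a^2 is the variance between animals, sigma^2 is the within-animal variance, rho is the autoregressive parameter, and j and k index the equally spaced measurement times.

```latex
\begin{align*}
\operatorname{Var}(y_{ij}) &= \sigma_a^2 + \sigma^2, \\
\operatorname{Cov}(y_{ij},\, y_{ik}) &= \sigma_a^2 + \sigma^2 \rho^{\,|j-k|}, \\
\operatorname{Corr}(y_{ij},\, y_{ik}) &=
  \frac{\sigma_a^2 + \sigma^2 \rho^{\,|j-k|}}{\sigma_a^2 + \sigma^2}.
\end{align*}
```

For 0 < rho < 1 the correlation decreases as the lag |j - k| increases but does not fall below sigma_a^2 / (sigma_a^2 + sigma^2), the part due to the animal. Setting rho = 0 gives the compound symmetric structure.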



Implications

Computer software is currently available that enables researchers to analyze repeated

measures data using mixed model methodology. This methodology provides more valid and

efficient statistical analyses of repeated measures. Implementation of this methodology

requires the data analyst to model the variance and correlation structure of the data as a first

step. Then, comparisons of groups and trends over time can be analyzed.
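The second step can be sketched as follows. This is only an illustration, assuming the univariate-mode data set BW2 (sex an wk wt) and assuming, purely as an example, that the AR(1) plus random animal effect structure has been chosen in the first step.

```sas
PROC MIXED DATA=BW2;
  CLASS sex an wk;
  MODEL wt = sex wk sex*wk;
  RANDOM an(sex);
  REPEATED wk / SUBJECT=an(sex) TYPE=AR(1);
  /* sex means averaged over weeks, with their difference */
  LSMEANS sex / PDIFF;
  /* test of the sex effect separately at each week */
  LSMEANS sex*wk / SLICE=wk;
RUN;
```

The standard errors and tests of these means are based on the fitted covariance structure, which is why that structure is modeled first.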

REFERENCES

Damon, R. A., and W. R. Harvey (1987). Experimental Design, ANOVA, and Regression, p. 320.

Harper and Row, New York.

SAS (1989). SAS/STAT User's Guide (Version 6, 4th Ed.). SAS Inst. Inc., Cary, NC.

SAS (1996). SAS/STAT Software: Changes and Enhancements through Release 6.11. SAS

Inst. Inc., Cary, NC.

Searle, S. R. (1971). Linear Models. John Wiley & Sons, New York.

Snedecor, G. W., and W. G. Cochran (1980). Statistical Methods (7th Ed.). Iowa State

University Press, Ames.
