analysis of repeated measurement data

  • 7/29/2019 Analysis of Repeated Measurement Data

    ANALYSIS OF REPEATED MEASURES DATA USING SAS

    Krishan Lal

    I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 012

    1. INTRODUCTION

    The term repeated measures refers broadly to data in which the response of each

    experimental unit or subject is observed on multiple occasions or under multiple conditions.

    Thus repeated measurements refer to the situation in which multiple measurements of the

    response variable are obtained, over several time periods, from each experimental unit, such

    as an animal. Usually, the responses are taken over time, as when the body weights of growing

    animals are measured weekly or monthly, or when the fruit production of the same tree is

    recorded over several years. Repeated measurement data arise in animal science, horticulture,

    clinical trials, medical science, and physiological and psychological experiments, among others.

    Repeated measures experiments are a type of factorial experiment, with group and time as the

    two factors. They have been used commonly in animal, plant, and human research for several

    decades, but only in recent years have statistical and computing methodologies been available to

    analyze them effectively and efficiently. The objectives of repeated measures data analysis

    are to examine and compare response trends over time. This can involve comparisons of

    groups at specific times, or averaged over time. It also can involve comparisons of times

    within a group. These are objectives common to any factorial experiment. The important

    feature of repeated measures experiments that requires special attention in data analysis is the

    correlation pattern among the responses on the same individual (animal) over time.

    2. METHODS FOR ANALYZING REPEATED MEASURES

    Responses measured on the same animal are correlated because they contain a common

    contribution from the animal. Moreover, measures on the same animal close in time tend to

    be more highly correlated than measures far apart in time. Also, variances of repeated

    measures often change with time. These potential patterns of correlation and variation may

    combine to produce a complicated covariance structure of repeated measures. Special

    methods of statistical analysis are needed for repeated measures data because of the

    covariance structure. Standard regression and analysis of variance methods may produce

    invalid results because they require mathematical assumptions that do not hold with repeated

    measures data. In repeated measures analysis of variance, the effects of interest are

    i) between-subject effects such as GROUP,

    ii) within-subject effects such as TIME, and

    iii) interactions between the two types of effects such as GROUP*TIME.

    There are several statistical methods used for analyzing repeated measures data. Here we

    describe methods, ranging from basic to sophisticated, for the analysis of repeated measures

    data using SAS software. These include:

    i) Separate analyses at each time point,

    ii) Univariate analysis of variance,

    iii) Univariate and multivariate analyses of time variables, and

    iv) Mixed model methodology.

    Separate analyses at each time point do not require special methods for repeated measures

    and do not directly address the objectives of examining and comparing trends over time. The

    other three approaches require special methodology and software. Development of statistical

    methods for repeated measures data has been an active area of research in the past two

    decades because of advancements in computing hardware and software. Enhancements in the

    SAS System reflect the advancements in methodology and hardware. In the SAS System, the

    GLM procedure enabled users to perform univariate analysis of variance but did not provide

    valid standard errors for most estimates. Moreover, conclusions derived from univariate

    analysis of variance are often invalid because the methodology does not adequately address

    the covariance structure of repeated measures. The REPEATED statement is now available in

    SAS in both the GLM and MIXED procedures. PROC GLM provides both univariate

    and multivariate tests for repeated measures for one response. Another approach to analysis

    of repeated measures is via general mixed models. This approach can handle balanced as well

    as unbalanced or missing within-subject data, and it offers more options for modeling the

    within-subject covariance. The main drawback of the mixed models approach is that it

    generally requires iteration and, thus, may be less computationally efficient. The results

    provided by the REPEATED statement are based on univariate and multivariate analyses of

    contrast variables computed from the repeated measures variables. This approach basically

    bypassed the problems of covariance structure rather than addressing them directly. The

    REPEATED statement enabled users to obtain statistical tests for effects involving time

    trends. However, the tests were inefficient and the problem of incorrect standard errors

    remained. In addition, missing data on even one measure of an animal caused all the data for

    that animal to be ignored. The MIXED procedure provides the capabilities of mixed model

    methodology for analysis of repeated measures data. Use of mixed model methodology

    enabled the user to directly address the covariance structure and greatly enhanced the user's

    ability to analyze repeated measures data by providing valid standard errors and efficient

    statistical tests.

    Here we shall illustrate the univariate and multivariate methods of analysis and their

    respective advantages and shortcomings. The statistical analysis methods illustrated focus on

    group (sex) comparisons at specific times, group comparisons averaged over times, and on

    changes over time in specific groups. Differences between groups (male and female) are

    computed at individual times and averaged across times.

    Separate analyses at each time and the GLM REPEATED statement require the data to be

    organized in multivariate mode. That is, there is one row per experimental unit in the data

    set, and the measurements at each time are considered separate response variables. The

    univariate ANOVA and MIXED procedure require that the data be organized in univariate

    mode, that is, one row per experimental unit at each time.
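
    The relation between the two layouts can be illustrated with a short DATA step. The
    following is only a minimal sketch: it assumes a data set BW1 in multivariate mode with a
    group variable sex and six weight variables t1-t6 recorded at weeks 0 to 20, and it takes the
    row number as the animal number. The output name BW2 and the variable names an, wk and wt
    are chosen for illustration.

```sas
/* Sketch: convert multivariate mode (one row per animal) to
   univariate mode (one row per animal per week). */
DATA BW2;
  SET BW1;                     /* assumed input: sex t1-t6 */
  ARRAY t{6} t1-t6;
  an = _N_;                    /* assumption: row order gives the animal number */
  DO i = 1 TO 6;
    wk = 4 * (i - 1);          /* weeks 0, 4, 8, 12, 16, 20 */
    wt = t{i};
    OUTPUT;                    /* one output row per animal per week */
  END;
  KEEP sex an wk wt;
RUN;
```

    With 16 animals and 6 weighings, BW1 would have 16 rows and BW2 would have 96 rows.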

    We use data on the body weight (kg) of male and female pigs. The body weights were

    recorded at intervals of 4 weeks from birth to 20 weeks of age and are

    given in Table 1. Here the factor sex has two levels.

    Table 1: Body weights of pigs maintained at Jabalpur during 1988-89

    Anim No.  Sex      Week
                       0     4     8     12    16    20
    1         Male     1     4.8   12.6  16    21    1.6
    2         Male     1     4.2   7     10    14    22
    3         Male     0.8   4     6     6.4   10    15
    4         Male     0.8   4     6     9     13    21
    5         Male     0.8   5     9.4   11    14    23
    6         Male     0.8   3.2   7     10    15    22
    7         Male     0.8   3.2   5.5   7.4   12    17
    8         Male     0.8   3.4   7     8.7   12.4  19.2
    9         Female   1     5.4   10    13    17.4  26.4
    10        Female   1.2   4.8   12.6  16    20    21
    11        Female   1     4.6   13    18    22    24
    12        Female   0.8   4.2   8     11    13    18
    13        Female   0.8   3.8   7     7.2   12    19
    14        Female   1     5.4   11    14    19    22
    15        Female   1     6     5.4   10    17    26.8
    16        Female   1     3.4   7.8   10    13    17.8

    The analysis of these data by the different methods, with the corresponding software statements,

    is given below.

    I) Analysis at Individual Time Points

    Analysis of data at each time point examines group effects separately at individual

    observation times and makes no statistical comparisons among times. This analysis can be

    carried out even in Microsoft Excel, which is widely available. In Excel, we enter the data with

    the levels of the group as columns and then use the "Anova: Single Factor" command under

    Data Analysis in the Tools menu. This process is repeated for each time point.

    In SAS, the data are organized in the "multivariate mode". The statements to obtain analyses at

    each time point are:

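    DATA BW1;
      INPUT SEX $ T1-T6;   /* SEX read as character (Male/Female); T1-T6 are the weights at weeks 0 to 20 */
      CARDS;
    [data lines, one row per animal]
    ;
    PROC GLM DATA=BW1;
      CLASS SEX;
      MODEL T1-T6 = SEX / SS3;            /* one ANOVA for each time point */
      MEANS SEX / LSD;
      ESTIMATE 'GP 1 - GP 2' SEX 1 -1;    /* difference between the two group means */
    RUN;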

    No inference is drawn about trends over time, so this method is not truly a repeated measures

    analysis. Use of analysis at each time point is usually at a preliminary stage of data analysis

    and is not a preferred method because it does not address time effects. The only advantage in

    this method is that if we do not have any statistical software the data can be analyzed in

    Microsoft Excel.

    II) Univariate ANOVA when the data follow a trend

    Some repeated measures data, such as growth and lactation data, follow a trend. Such data

    can be analyzed by fitting an appropriate curve (linear, quadratic, etc.) to the data of each

    animal, which yields a set of parameter estimates for every animal. These estimates are then

    analyzed further to determine the effects of the factors. This analysis is easily carried out

    in the SAS System. The drawback of this method is that the parameter estimates being

    analyzed may not be normally distributed.
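
    The two stages described above could be sketched as follows. This is an illustration
    under stated assumptions, not a prescribed analysis: it assumes a data set BW2 in univariate
    mode with variables sex, an, wk (numeric) and wt, fits a straight line to each animal, and
    then compares the fitted intercepts and slopes between sexes. The data set name COEF is
    hypothetical; a quadratic term could be added in the same way if the trend calls for it.

```sas
/* Stage 1: fit wt = a + b*wk separately for each animal. */
PROC SORT DATA=BW2;
  BY sex an;
RUN;

PROC REG DATA=BW2 OUTEST=COEF NOPRINT;
  BY sex an;
  MODEL wt = wk;      /* OUTEST= stores the intercept in Intercept and the slope in wk */
RUN;
QUIT;

/* Stage 2: analyze the per-animal estimates as new response variables. */
PROC GLM DATA=COEF;
  CLASS sex;
  MODEL Intercept wk = sex / SS3;
  MEANS sex;
RUN;
QUIT;
```

    In the second stage each animal contributes one row, so the observations analyzed are
    independent across animals.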

    III) Univariate Analysis of Variance Using the GLM Procedure

    Univariate analysis of variance (ANOVA) is the method most commonly applied to repeated

    measures data that makes comparisons between times. It treats the data as if they were from a

    split-plot design with the animals as whole-plot units and animals at particular times as

    sub-plot units. This approach is also referred to as a split-plot-in-time analysis. If measurements

    have equal variance at all times, and if pairs of measurements on the same animal are equally

    correlated, regardless of the time lag between the measurements, then the univariate ANOVA

    is valid from a statistical point of view and, in fact, yields an optimal method of analysis.

    However, measurements close in time are often more highly correlated than measures far

    apart in time, which will invalidate tests for effects involving time. For this procedure the data

    must be arranged in univariate mode. The SAS code using PROC GLM for this analysis is

    given below:

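    DATA BW2;
      INPUT sex $ an wk wt;   /* one row per animal per week; sex read as character */
      CARDS;
    [data lines]
    ;
    PROC GLM DATA=BW2;
      CLASS sex an wk;
      MODEL wt = sex an(sex) wk sex*wk;
      RANDOM an(sex) / TEST;             /* expected mean squares and F-tests with matching error terms */
      LSMEANS sex / STDERR E=an(sex);    /* sex means, using the animal-within-sex mean square as error */
      LSMEANS sex*wk / PDIFF;
    RUN;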

    The MODEL statement specifies sources of variation for the ANOVA. The RANDOM

    statement produces a table of expected mean squares which, in a true split-plot experiment,

    can be used to determine appropriate denominators of F-statistics for all terms in the

    MODEL statement. These tests are produced by the TEST option at the end of the RANDOM

    statement. In this case, the test statistic for SEX is F = MS[SEX]/MS[AN(SEX)]. Tests for effects

    of WK and SEX*WK use F-statistics with MS[ERROR] as the denominator mean square. The

    first LSMEANS statement computes means for each sex, averaged over weeks, with standard

    errors. The second LSMEANS statement computes means for combinations of sex and

    weeks, with standard errors. In addition to the potential problems of statistical validity with

    univariate ANOVA analysis of repeated measures, there are potential shortcomings with

    capabilities of the GLM procedure. The LSMEANS statement in PROC GLM does not

    compute correct standard errors for the SEX*WK means, even if correlation structure of the

    repeated measures is not a problem, that is, even if variances are equal and correlations are

    equal. Also, comparisons of LSMEANS between sexes at specific weeks are not valid due to

    incorrect calculation of standard errors of differences. Moreover, we are using the model of a

    split-plot design, but the sub-plot levels (time points) cannot be randomly allocated within an animal.
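
    For reference, the split-plot-in-time model behind these statements and the resulting
    F-ratios can be written out as below. The notation is introduced here for illustration and
    is not taken from the SAS output: i indexes sex (2 levels), j animals within sex (8 per sex
    in Table 1) and k weeks (6 levels). The degrees of freedom shown assume complete data on
    all 16 animals.

```latex
\begin{align*}
y_{ijk} &= \mu + \alpha_i + d_{j(i)} + \tau_k + (\alpha\tau)_{ik} + e_{ijk},
  \qquad d_{j(i)} \sim N(0,\sigma_d^2), \quad e_{ijk} \sim N(0,\sigma_e^2), \\[4pt]
F_{\mathrm{SEX}} &= \frac{MS[\mathrm{SEX}]}{MS[\mathrm{AN(SEX)}]}
  \quad \text{on } (1,\ 14) \text{ d.f.}, \\[4pt]
F_{\mathrm{WK}} &= \frac{MS[\mathrm{WK}]}{MS[\mathrm{ERROR}]}
  \quad \text{on } (5,\ 70) \text{ d.f.}, \\[4pt]
F_{\mathrm{SEX*WK}} &= \frac{MS[\mathrm{SEX*WK}]}{MS[\mathrm{ERROR}]}
  \quad \text{on } (5,\ 70) \text{ d.f.}
\end{align*}
```

    The degrees of freedom add up to 1 + 14 + 5 + 5 + 70 = 95, one less than the 96 observations.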

    IV) Analysis of Contrast Variables Using the GLM REPEATED Statement

    Contrast variables in repeated measures data are linear combinations of the responses over

    time for individual animals. A familiar example from basic statistical methodology is given

    by the orthogonal polynomials (Snedecor and Cochran, 1980), which represent linear,

    quadratic, cubic, etc., trends over time. Another example is the set of differences between

    responses at consecutive time points, that is, changes from time 1 to time 2, time 2 to time 3,

    and so forth. A set of contrast variables can be used to analyze trends over time and to make

    comparisons between times in repeated measures data. The original repeated measures data

    for each animal are transformed into a new set of variables given by a set of contrast

    variables. Then, multivariate and univariate analyses can be applied to these new variables.

    This provides a method for analyzing repeated measures data that evades some of the

    covariance structure problems that invalidate univariate ANOVA analyses, as discussed in

    the previous section. The REPEATED statement in GLM provides automatic computation

    and analyses for several common choices of contrast variables. Data must be in a multivariate

    mode for use of the GLM REPEATED statement.
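
    To make the idea of contrast variables concrete, they can also be computed by hand in a
    DATA step and then analyzed as ordinary response variables. The sketch below assumes a data
    set BW1 in multivariate mode with variables sex and t1-t6 at six equally spaced weeks. The
    names CONTR, d1-d5, lin and quad are hypothetical. The linear and quadratic coefficients are
    the standard orthogonal polynomial coefficients for six equally spaced points.

```sas
/* Sketch: contrast variables computed explicitly for each animal. */
DATA CONTR;
  SET BW1;
  /* successive differences: change from one weighing to the next */
  d1 = t2 - t1;
  d2 = t3 - t2;
  d3 = t4 - t3;
  d4 = t5 - t4;
  d5 = t6 - t5;
  /* orthogonal polynomial contrasts for 6 equally spaced times */
  lin  = -5*t1 - 3*t2 - 1*t3 + 1*t4 + 3*t5 + 5*t6;
  quad =  5*t1 - 1*t2 - 4*t3 - 4*t4 - 1*t5 + 5*t6;
RUN;

/* Each contrast variable is analyzed as a separate response. */
PROC GLM DATA=CONTR;
  CLASS sex;
  MODEL d1-d5 lin quad = sex / SS3;
RUN;
QUIT;
```

    In such an analysis, the test of sex for lin asks whether the linear trend over weeks differs
    between the sexes, which is one component of the sex by week interaction.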

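    The SAS statements using PROC GLM are:

    DATA BW1;
      INPUT sex $ t1-t6;   /* one row per animal; sex read as character */
      CARDS;
    [data lines]
    ;
    PROC GLM DATA=BW1;
      CLASS sex;
      MODEL t1-t6 = sex / SS3;
      REPEATED time 6 CONTRAST(1);   /* six time levels, each contrasted with level 1 (week 0) */
      TITLE 'repeated measures analysis using REPEATED Statement';
    RUN;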

    Note that TIME is not a variable in the SAS data set named BW1. Rather, it is only a

    name attached to the set of contrasts to be analyzed.

    The REPEATED statement produces results from several statistical methods to obtain tests

    for effects involving week. If there were the same number of animals per group and no

    missing data on any animal, then all four multivariate tests would have equal results. If all

    animals had complete data, the univariate ANOVA results would agree exactly with those

    given in Section III. With CONTRAST(1), each contrast variable is the difference between the

    response at a later week and the response at the reference level, week 0: t2 - t1, t3 - t1, ...,

    t6 - t1. If contrasts with the mean of the other weeks are wanted instead, for example

    t1 - (t2 + ... + t6)/5, the MEAN transformation can be requested in place of CONTRAST. Six

    time levels give five contrast variables, and PROC GLM computes an ANOVA for each of them

    when the SUMMARY option is added to the REPEATED statement.

    V) Mixed Model Analysis Using the MIXED Procedure

    As noted above, analysis of repeated measures data requires special attention to the

    covariance structure due to the sequential nature of the data on each animal. Procedures

    discussed previously either avoid the issue (analysis of contrast variables) or ignore it

    (univariate analysis of variance). Ignoring the covariance issues may result in incorrect

    conclusions from the statistical analysis. Avoiding the issues may result in inefficient

    analyses, which is tantamount to wasting data. The general linear mixed model allows the

    capability to address the issue directly by modeling the covariance structure. This capability

    is implemented in the MIXED procedure of the SAS System.

    There are two basic steps in performing a repeated measures analysis using mixed model

    methodology. The first step is to model the covariance structure. The second step is to

    analyze time trends for groups by estimating and comparing means.

    Measures on different animals are independent, so covariance concern is only with measures

    on the same animal. The covariance structure refers to variances at individual times and to

    correlation between measures at different times on the same animal. There are basically two

    aspects of the correlation. First, two measures on the same animal are correlated simply

    because they share common contributions from the animal. This is due to variation between

    animals. Second, measures on the same animal close in time are often more highly correlated

    than measures far apart in time. This is covariation within animals. Usually, when using

    PROC MIXED, the variation between animals is specified by the RANDOM statement, and

    covariation within animals is specified by the REPEATED statement. Numerous structures

    are available as options on the REPEATED and RANDOM statements in the MIXED

    procedure. Three different structures will be shown here and one will be chosen as best

    among the three. First, a structure known as compound symmetry (CS) will be fitted. This

    structure specifies that measures at all times have the same variance, and that all pairs of

    measures on the same animal have the same correlation. The implication is that the only

    aspect of the covariance between repeated measures is due to the animal contribution,

    irrespective of proximity of time. If this structure holds, then the univariate ANOVA of

    Section III would have valid tests, although the standard errors and tests of LSMEANS from

    the GLM statements there would not necessarily be valid. Compound symmetric structure can be fitted in

    two ways with PROC MIXED. One way is with the RANDOM statement:

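    DATA BW2;
      INPUT sex $ an wk wt;   /* one row per animal per week; sex read as character */
      CARDS;
    [data lines]
    ;
    PROC MIXED DATA=BW2;
      CLASS sex an wk;
      MODEL wt = sex wk sex*wk;   /* fixed effects only */
      RANDOM an(sex);             /* random animal effect, which induces compound symmetry */
    RUN;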

    This RANDOM statement specifies that there is a contribution common to all measures on

    the same animal, which results in equal variances at all times and equal correlations between

    all pairs of times. Only fixed effects are included in the PROC MIXED MODEL statement.

    Statements for fitting the compound symmetric structure with the REPEATED statement are:

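    DATA BW2;
      INPUT sex $ an wk wt;   /* one row per animal per week; sex read as character */
      CARDS;
    [data lines]
    ;
    PROC MIXED DATA=BW2;
      CLASS sex an wk;
      MODEL wt = sex wk sex*wk;
      REPEATED wk / SUB=an(sex) TYPE=CS R RCORR;   /* R and RCORR print the fitted covariance and correlation matrices */
    RUN;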

    Here, the REPEATED statement indicates via SUB=an(sex) that measures on the

    same animal are correlated. All animals are assumed to have the same covariance matrix, although

    heterogeneity of variances between animals can be accommodated by the MIXED procedure.
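
    Written out, the compound symmetric structure for the six measures on one animal has
    the form below. The symbols are notation assumed for this illustration: \sigma_d^2 is the
    between-animal variance component and \sigma_e^2 the residual variance.

```latex
\[
\Sigma_{\mathrm{CS}} =
\begin{pmatrix}
\sigma_d^2 + \sigma_e^2 & \sigma_d^2 & \cdots & \sigma_d^2 \\
\sigma_d^2 & \sigma_d^2 + \sigma_e^2 & \cdots & \sigma_d^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_d^2 & \sigma_d^2 & \cdots & \sigma_d^2 + \sigma_e^2
\end{pmatrix}_{6 \times 6},
\qquad
\mathrm{Corr}(y_{jk}, y_{jk'}) = \frac{\sigma_d^2}{\sigma_d^2 + \sigma_e^2}
\quad (k \neq k').
\]
```

    Every diagonal element is the same and every off-diagonal element is the same, so the
    structure has only two parameters regardless of the number of time points.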

    Second, a general structure will be fitted. As an option in PROC MIXED, this is indicated as

    UN for unstructured. This structure makes no assumptions regarding equal variances or

    correlations. The statements for fitting this structure with the REPEATED statement are:

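    DATA BW2;
      INPUT sex $ an wk wt;   /* one row per animal per week; sex read as character */
      CARDS;
    [data lines]
    ;
    PROC MIXED DATA=BW2;
      CLASS sex an wk;
      MODEL wt = sex wk sex*wk;
      REPEATED wk / SUB=an(sex) TYPE=UN R RCORR;   /* unstructured: separate variance per week, separate covariance per pair */
    RUN;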

    Again, no RANDOM statement is used because inter-animal variance is absorbed into the

    general structure. All animals are assumed to have the same covariance matrix.

    Third, a combination of a random animal effect and a first-order autoregressive, AR(1), structure

    will be fitted. This combination structure specifies an inter-animal random effect of differences

    between animals, and a correlation structure within animals that decreases with increasing lag

    between measures. The MIXED procedure statements using both RANDOM and

    REPEATED statements are given below:

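    DATA BW2;
      INPUT sex $ an wk wt;   /* one row per animal per week; sex read as character */
      CARDS;
    [data lines]
    ;
    PROC MIXED DATA=BW2;
      CLASS sex an wk;
      MODEL wt = sex wk sex*wk;
      RANDOM an(sex);                          /* variation between animals */
      REPEATED wk / SUB=an(sex) TYPE=AR(1);    /* covariation within animals, decaying with lag */
    RUN;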
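
    Under this combination, the covariance between two measures on the same animal can be
    written as below. The notation is assumed for illustration: \sigma_d^2 is the between-animal
    variance, \sigma_e^2 the within-animal variance, \rho the AR(1) correlation between
    consecutive weighings, and k, k' index the six weighing occasions.

```latex
\[
\mathrm{Cov}(y_{jk}, y_{jk'}) = \sigma_d^2 + \sigma_e^2\,\rho^{|k - k'|},
\qquad
\mathrm{Var}(y_{jk}) = \sigma_d^2 + \sigma_e^2 .
\]
```

    For |\rho| < 1 the second term shrinks as the lag |k - k'| grows, while the first term keeps
    a common covariance at every lag. Setting \rho = 0 leaves covariance \sigma_d^2 at every
    nonzero lag, which is the compound symmetric structure.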

    Implications

    Computer software is currently available that enables researchers to analyze repeated

    measures data using mixed model methodology. This methodology provides more valid and

    efficient statistical analyses of repeated measures. Implementation of this methodology

    requires the data analyst to model the variance and correlation structure of the data as a first

    step. Then, comparisons of groups and trends over time can be analyzed.
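
    One possible way to carry out the first step is to fit each candidate covariance structure
    with the same fixed effects and compare the fit statistics that PROC MIXED reports. The
    sketch below is an assumption about workflow rather than a method given in this article. It
    assumes a data set BW2 in univariate mode and uses the ODS table name FitStatistics. The
    data set names FIT_CS, FIT_UN, FIT_AR and FIT_ALL are hypothetical.

```sas
/* Sketch: fit three covariance structures and collect their fit statistics. */
ODS OUTPUT FitStatistics=FIT_CS;
PROC MIXED DATA=BW2;
  CLASS sex an wk;
  MODEL wt = sex wk sex*wk;
  REPEATED wk / SUB=an(sex) TYPE=CS;
RUN;

ODS OUTPUT FitStatistics=FIT_UN;
PROC MIXED DATA=BW2;
  CLASS sex an wk;
  MODEL wt = sex wk sex*wk;
  REPEATED wk / SUB=an(sex) TYPE=UN;
RUN;

ODS OUTPUT FitStatistics=FIT_AR;
PROC MIXED DATA=BW2;
  CLASS sex an wk;
  MODEL wt = sex wk sex*wk;
  RANDOM an(sex);
  REPEATED wk / SUB=an(sex) TYPE=AR(1);
RUN;

/* Stack the three tables and label each row with its structure. */
DATA FIT_ALL;
  LENGTH structure $ 16;
  SET FIT_CS (IN=in_cs) FIT_UN (IN=in_un) FIT_AR (IN=in_ar);
  IF in_cs THEN structure = 'CS';
  ELSE IF in_un THEN structure = 'UN';
  ELSE structure = 'AR(1) + random';
RUN;

PROC PRINT DATA=FIT_ALL NOOBS;
  VAR structure Descr Value;
RUN;
```

    Current SAS releases label information criteria such as AIC as "smaller is better". Because
    the three models share the same fixed effects, their criteria can be compared directly.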

    REFERENCES

    Damon, R. A., and W. R. Harvey (1987). Experimental Design, ANOVA, and Regression. p

    320. Harper and Row, New York.

    SAS (1989). SAS/STAT User's Guide (Version 6, 4th Ed.). SAS Inst. Inc., Cary, NC.

    SAS (1996). SAS/STAT Software: Changes and Enhancements through Release 6.11. SAS

    Inst. Inc., Cary, NC.

    Snedecor, G. W., and W. G. Cochran (1980). Statistical Methods (7th Ed.). Iowa State

    University Press, Ames.

    Searle, S. R. (1971). Linear Models. John Wiley & Sons, New York.