repeated measures and mixed effect models repeated measures analysis using aov function analysis...

Repeated measures and mixed effect models

• Repeated measures

• Analysis using aov function

• Analysis using mixed effect model

• Various types of models and their relation

• Mixed effect models: simple case

• Repeated measures as mixed effect models.

• R functions for mixed effect models

Repeated measures: motivationsThere is a large class of problems when the same subject is treated under different conditions. In

these cases simple crossed ANOVA may not be appropriate to do analysis.

For example: “Orthodont” data in R package nlme has 108 orthodontic measurements over time. Measurements are done for males and females. We can expect that the results for female and male will be different. It would not be correct to pull all data together and analyse them as a single variable regression model. Better way of analysing such data is to fit two regression lines: one for males and one for females. This way we can analyse if there is difference between responses of males and females as well as dependence of responses over time. In this case male/female is a factor variable and time is continuous variable.

Another example: People are assigned to different diets and exercise groups. Their pulses are measured over time: 1min, 15min and 30min. Here diets and exercises are factors and time is a continues variable. We want to know if diets have different effects, exercise groups have different effects and is there change of pulses over time. Moreover each person has an id. So we have for each person 3 measurements. We can use this fact in building error model. We can assume that variation of pulses for different people will be different and in our model we can estimate one error model per person.

Repeated measures: interaction plotBefore starting to analyse such data it is a good idea to look at the

interaction plot. For the second example we have two factors and one time variable. We can plot two interaction plots. One of them is dependence of pulse on time for different diets and another one is dependence of pulse on time for different exercise type.

In R it can be done using the following command:

interaction.plot(time,diet,pulse,lwd=5,col='red')

interaction.plot(time,exertype,pulse,lwd=5,col='red')

From these plots we can conclude that interaction between time and diet is not significant (lines are more or less parallel). Interaction between time and exercise type is significant (lines are not parallel).

Repeated measures: analysisThere are several ways of analysing repeated measures data. One of them is using aov command

in R. The simples way is using the command aov. For example for full model the command is:

both.aov = aov(pulse ~ factor(exertype)*factor(diet)*factor(time) + Error(factor(id)), exer) summary(both.aov)

This command will give: between subjects and within subjects analysis. It will give results on exertype, diet and interaction between exertype and diet as between subject (It is defined by Error(factor(id)) component of the model in aov command. It will also give within subject analysis: time and interaction between diet and time as well as three way interaction between diet, exertype and time. If Error component in the aov command is not specified the results would be very different.

Repeated measures: analysisLet consider data set ToothGrowth. Before going into analysis let

us plot this data in a little bit different way – using coplot.

coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth, xlab = "ToothGrowth data: length vs dose, given type of supplement")

From this plot we see that there is difference between different suppliments, moreover doses of the supplements also affect tooth growth.

Repeated measures: exampleToothGrowth data could be analysed as a nested model using lm or aov. Using lm:

attach(ToothGrowth)

lm1 = lm(len~supp+supp:dose)

summary(lm1)

anova(lm1)

Or using aov:

t.aov = aov(len~supp+supp:dose)

summary(t.aov)

In both cases we make conclusion that there is significant differences between effects of supplements as well as different doses of supplements.

To design confidence intervals for differences between effects of supplements as well as doses we should make sure that both variables are factors (supp is already factor).

t.aov = aov(len~supp+supp:factor(dose))

summary(t.aov)

TukeyHSD(t.aov)

Repeated measures using mixed effect models

Repeated measures can be analysed using mixed effect models. In this cases different treatments are considered as fixed effect and different subjects on which treatments are tested are considered as random effect models. It can be justified as follows: Treatments are only those treatments we want to tests, whereas subjects are random selection from a population that has certain distribution. For example in the case of effects of exercises, diets on pulses over time subjects (people) were selected randomly from all people. But exercises and diets are only those exercises and diets we want to test.

Before going into application of mixed effect models to repeated measures ANOVA let us consider mixed effect models.

Various forms of models and relation between them

LM: Assumptions 1) independent 2) Normal distribution, 3) constant unknown coefficients

GLM: assumption 2) Exponential family

LMM: Assumptions 1) and 3) are modified

GLMM: Assumption 2) Exponential family and assumptions 1) and 3) are modified

Repeated measures: Assumptions 1) and 3) is modified

Time series

Maximum likelihood: All assumptions can be modified

Classical statistics (Observations are random, parameters are unknown constants)

NLM: Can be applied to all

Bayesian statistics: Coefficients as well as observations are random

Conceptually different approachLM - Linear model

GLM - Generalised linear model

LMM - Linear mixed model

GLMM - Generalised linear mixed model

NLM - Non-linear model

Mixed effect models: motivationIn linear and generalised linear models we assumed ( a little bit different formulation) that 1)

observations are independent on each other and have the same variances 2) Distribution is normal; 3) Parameters are constant (in linear model case):

y = X+; has Normal distribution N(0,2I); is a vector unknown constants. This type of model is called fixed effect models.

What happens if we remove assumption 1) and 3). Then problem becomes more complicated and in general we need nx(n+1)/2 number of parameters to describe covariance structure of observations. Mixed effect models deal with these type of problems. In general this type of models bring classical statistics to a new level and allows to tackle such problems as: clustered data, repeated measures, hierarchical data.

Mixed effect models: ExampleLet us assume that we have a clinical trial. There is a drug. We want to test the effect of the

different doses of the drug. We are interested only these dose levels. We randomly take n person and give to each of them one of the doses. Then the result of the experiment could be written:

yij=+i+ij

Where i is i-th dose, j is j-th person, is average effect of the drug and is effect of the drug specific to this particular dose, is error. Our interest lies on effects of these doses and these doses alone. This type of model is fixed effect model.

Now let us assume these doses were tested in 20 different clinics. Clinics were chosen randomly. Then we can write the model:

yijk=+i+bj +cij ijk

i is i-th dose, j is j-th clinic, k is the k-th patient. Since doses are only those doses we are interested in, they are fixed, 20 clinics have been chosen randomly form the population of all clinics, they are random. We can not guarantee that effect of clinic and effect of dose is additive that is why we add c - interaction between clinics and doses. Since clinics are random then c must be random also. This is an example of mixed effect model. To solve this problem we need to find estimations overall effect (), effects of dose () and distribution of clinics (distributions of b and c).

Mixed or fixed

It is often a challenging problem to decide if we should use fixed or mixed effect models. For example in drug and clinics case if we are going to use these drugs in all clinics (in case of successful results) then we should consider clinics as random but if drugs are very expensive and specialised and they are going to be used only in those clinics then we cannot consider these clinics as random. Then they should be considered as a fixed.

Sometimes choice between random and fixed could be dictated by the amount of the data and information we have. If we have enough data to make inference about the population then we can use mixed effect models. If we do not have enough data then we can make inference only about different levels (e.g. doses of drugs, different clincis) of the variable of interest.

Mixed effect models: Simple model

Let us consider a model:

yij=+ai+xj+ij

μ is overall intercept constant coefficient on x (describes dependence of y on x), a is random intercept specific to i and is a random error. Let us assume that distribution of is N(0, ) and all ai-s are identically and independently distributed (i.i.d.) random variables with N(0, a). Now we can write the distribution of y:

E(yij) = +xi

Var(yij) = a2+2

Cov(yij,yij’) = a2

Cov(yi’j,yij’) =0 for i’i

We see that only two parameters are sufficient to describe the whole covariance structure of the observations. Now we can write multivariate normal distribution for joint probability distribution of the observations.

Mixed effect models: Simple model

If we use notation V as covariance of the observation then we can write of the distribution of the observation and therefore for likelihood:

L(y|m,b,, a) = N(+x,V)

Now the problem is to estimate parameters by maximising this likelihood function. The problem is usually solved iteratively: 1) estimate parameters involved in mean assuming V constant; 2) then estimate parameters involved in V and 3) repeat 1) and 2) until convergence.

General linear mixed effect models

General mixed effect models can be written:

y=X+Zu+Where u is random variable with distribution N(0,D), has distribution N(0, ), a is fixed.

Then we can write:

E(y)=XV(y)=Z DZT+2 I

So if the distribution is the normal distribution then we can build joint probability distribution of all observations and therefore the likelihood function. Note that fixed effects are involved only in mean values (just like in linear model), random effects modify the covariance matrix of the observations, it is no longer diagonal and it means that observations are dependent on each other.

Above equations are general form of the linear mixed effect models.

Simpler forms of linear mixed effect models

If the structure of the data is known then it is possible to simplify covariance of the above described model. For example if we have two group of variables that are not dependent on each other. For example: let us assume we want to analyse performances of pupils in maths. We take n schools, in each school k classes and in each class l boys and m girls. In the model we would include one constant parameter for boys and one for girls (since these are only two options), then we would take random effect of schools (we are interested in all schools) and classes in these schools (we are interested in all classes in this school). Now it is reasonable to assume that there is no correlation between classes and schools. If class does not belong to the school then I do not know where correlation could come from, if class is in the school then since school is considered as a random effect then correlation between classes and this school would be absorbed by the covariance of the school. So we have variance-covariance of schools and that of classes.

Predicting random effects

In mixed models we estimate parameters of fixed effects and distribution for random effects. Sometimes it is interesting to predict random effects. The expressions for fixed effect coefficients and for so called best linear unbiased prediction (BLUP) is

est=(XT V-1X)-1XTV-1y

upredict=DZTV-1(y-Xest)= DZTV-1 (I- (XT V-1X)-1XTV-1)y

var(upredict)=DZTV-1(I- (XT V-1X)-1XTV-1)ZD

Using these facts one can design tests of hypotheses, confidence intervals about u.

Analysis of repeated measures using mixed effect modelsLet us consider the data on effects of exercises and diets on pulses over time. Since exercises

and diets are those we want to find out effects of, they should be considered as fixed effects, subjects on which these exercises are tested should be considered random effects.

For simplicity let us consider exercise type and time only. For covariance of observations we have many different choices. The simplest of them is unstructured covarance matrix. I.e. all covariances are different. In this case the number of parameters are very big and in general is equal to the nrandom*(nrandom+1)/2, where nrandom is the number of random effect variables. We can also assume different type of covariance structure, e.g. autoregressive. Each covariance structure will result in different model. So apart from model fitting problem we also have model selection problem. As we already know, one of the techniques for model selection is using information criterion (AIC or BIC).

This type of analyses can be performed using gls (generalised least-squares) from nlme package of R.

Analysis of repeated measures using mixed effect modelsLet us consider several covariance structures and compare them.

We will try 3 different covariance structures: Compound symmetry (all diagonal elements of covariance matrix are equal to each other and all non-diagonal terms are different from diagonal but equal to each other), unstructured covaraince matrix and autoregressive model (i.e. every next time period is correlated with the previous one with the same correlation coefficient). Only constraint on covariance matrix is that it should be symmetric (we need also group data using the command groupedData. Exercise and time are grouped using persons’ id):

require(nlme)

longg <- groupedData(pulse ~ exertype*time | id, data=exer)

fit1 = gls(pulse ~ factor(exertype)*factor(time), data=longg, corr=corSymm(form = ~ 1 | id), weights = varIdent(form = ~ 1 | time))

fit2 = gls(pulse ~ factor(exertype)*factor(time), data=longg, corr=corSymm(form = ~ 1 | id), weights = varIdent(form = ~ 1 | time))

fit3 = gls(pulse ~ factor(exertype)*factor(time), data=longg, corr=corAR1( form= ~ 1 | id), weight=varIdent(form = ~ 1 | time))

Repeated measures using mixed modelLet us compare three models using AIC (and BIC). This can be done using anova

command (in this case anova compares three models using likelihood ratio test as well as AIC and BIC)

anova(fit1,fit2,fit3)

Produces the following: Model df AIC BIC logLik Test L.Ratio p-value

fit1 1 15 607.7365 643.6532 -288.8682

fit2 2 11 612.8316 639.1706 -295.4158 1 vs 2 13.09512 0.0108

fit3 3 13 605.7693 636.8971 -289.8846 2 vs 3 11.06236 0.0040

According to this table we can reject null-hypothesis that model 1 and model 2 are same and model 2 and model 3 are same. According to AIC and BIC the best model is model 3. If we would compare model 1 and model 2 using AIC then model 1 should be chosen, if we use BIC then model 2 should be chosen. In general in this case it seems to be reasonable to choose model 3 as we are dealing with effect vs time and autoregressive model may be better choice for this type of cases.

R commands for linear mixed modelsCommands for linear mixed models are in the library nlme:

library(nlme)

data(Orthodont)

lm1 = lme(distance~age+Sex,data=Orthodont)

lm1

summary(lm1)

Other commands

aov

gls – general least squares. It allows specification of covariance structure

nlme – general non-linear mixed effect model. It is very general function allows to solve very general class of problems

Useful plot commands

coplot

require(lattice)

xyplot(pulse~time, groups=id, type="o", exer)

References

1) Demidenko E (2004) Mixed Models: Theory and applications

2) McCullagh CE, Searle SR, (2001) Generalized, linear and mixed models

repeated measures and mixed effect models repeated measures analysis using aov function analysis...

Documents