
STATISTICS IN MEDICINE, VOL. 13, 1211-1231 (1994)

ANALYSIS OF LONGITUDINAL DATA: RANDOM COEFFICIENT REGRESSION MODELLING

CAROLYN M. RUTTER
Department of Health Care Policy, Harvard Medical School, 25 Shattuck Street, Parcel B, First Floor, Boston, MA 02115, U.S.A.

AND

ROBERT M. ELASHOFF
Department of Biomathematics, University of California, Los Angeles, 10833 Le Conte Avenue, AV-417 CHS, Los Angeles, California 90024-1766, U.S.A.

SUMMARY

We review random coefficient regression (RCR) models and methods for fitting these models from an applications perspective. Methods for data with exponential family distributions are presented with the Gaussian distribution as a special case. Attention is given to interpretation of fixed effects and the correlation structures implied by RCR models. Estimation methods are presented with computational approaches. Problems associated with testing fixed effects include accurate variance estimation and robustness to misspecification of the covariance structure. Methods for model selection and assessment are presented. An example is used to demonstrate recommended approaches.

1. INTRODUCTION

This paper is motivated by the analysis of longitudinal data from a weight loss study. The goals of the data analysis are the description of weight loss over time and the estimation of the effects of covariates on weight loss. Our interest is in estimating population average effects. We are not concerned with prediction. Time trend and covariate effects need to be consistently estimated and reliable estimates of variability are required. The outcome variable of interest is per cent over ideal weight. (Ideal weight is based on the lower limit of the range in height-weight tables compiled by the Metropolitan Life Insurance Company). The outcome variable is approximately normally distributed. Missing data are assumed to be missing completely at random.

Statistical models must balance simplicity with accuracy. When data are normally distributed, consistent estimation of parameters associated with the mean structure depends only on correct specification of the mean structure. Ordinary least squares (OLS) models consistently estimate population average (or overall) effects. However, when within-individual observations are correlated, OLS estimation procedures do not consistently estimate the variability of estimated regression parameters.

Under the random coefficient regression (RCR) model, observations within individuals are assumed to be correlated, with correlation arising in part from an individual's deviation from overall effects. When the RCR variance model is nearer to the true variance structure than complete independence, RCR estimation methods provide more efficient estimates of overall effects than OLS, and better estimates of their variances.

CCC 0277-6715/94/121211-21 © 1994 by John Wiley & Sons, Ltd.

Received October 1992
Revised August 1993

In this article we present RCR modelling approaches, strengths of RCR models and gaps in current knowledge. We focus on longitudinal data applications of RCR models. RCR models for both Gaussian and non-Gaussian data are defined in Section 2. In Section 3 we outline methods for estimating RCR model parameters and discuss problems associated with variance estimation. Computational methods, including iterative algorithms for maximizing likelihoods and Gibbs sampling, are outlined in Section 4. We discuss ways to check RCR model assumptions in Section 5. A general approach to data analysis is given in Section 6, followed by analysis of a weight loss data set in Section 7. Concluding remarks are made in Section 8.

2. RANDOM COEFFICIENT REGRESSION MODELS

Random coefficient regression (RCR) models include both overall and within-subject effects. Data from different individuals (indexed by $i$) are assumed to be independent, although repeated measurements within individuals (indexed by $j = 1, \ldots, n_i$) may be correlated. Let $y_i$, $i = 1, \ldots, m$, be $n_i \times 1$ data vectors, and let $X_i$ and $Z_i$ be $n_i \times p$ and $n_i \times q$ matrices of regressors. Fixed effects, associated with the columns of $X_i$, are modelled by the $p \times 1$ vector $\alpha$. Within-individual or random effects, $\beta_i$, associated with the columns of $Z_i$, are assumed to be unobservable random $q \times 1$ vectors that vary across, but not within, individuals.

RCR models are two-stage or hierarchical models. The first stage describes the distribution of the outcome within individuals. This stage of modelling is conditional on random effects. A known function of the conditional mean of the outcome is assumed to be a linear function of unknown parameters and covariates:

$$h\{E(y_i \mid \beta_i)\} = X_i\alpha + Z_i\beta_i.$$

The conditional covariance matrix of $y_i$, $\mathrm{var}(y_i \mid \beta_i) = R_i$, is assumed not to depend on the individual, except possibly through the number of observations and the time between them.

Most work has focused on models that assume a multivariate normal outcome:

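$$y_i \mid \beta_i \sim \mathrm{MVN}(X_i\alpha + Z_i\beta_i,\ R_i).$$

The exponential family of distributions offers a range of alternatives to the multivariate normal distribution. The generalized linear RCR model assumes that the conditional distribution of $y_{ij} \mid \beta_i$ is in the exponential family:

$$f(y_{ij} \mid \beta_i) = \exp\{(y_{ij}\omega_{ij} - b(\omega_{ij}))/a(\phi_{ij}) + c(y_{ij}, \phi_{ij})\}$$

$$E(y_{ij} \mid \beta_i) = b'(\omega_{ij}) = h^{-1}(x_{ij}\alpha + z_{ij}\beta_i)$$

$$\mathrm{var}(y_{ij} \mid \beta_i) = b''(\omega_{ij})\,a(\phi_{ij})$$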

The functions $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known, the parameter $\omega_{ij}$ is a function of $x_{ij}\alpha + z_{ij}\beta_i$, and $\phi_{ij}$ is a scale parameter. For example, when $y_{ij}$ is normally distributed, $a(\phi_{ij}) = \phi_{ij}$, $b(\omega_{ij}) = \omega_{ij}^2/2$ and $c(y_{ij}, \phi_{ij}) = -\tfrac{1}{2}\{y_{ij}^2/\phi_{ij} + \log(2\pi\phi_{ij})\}$, so that $\omega_{ij} = E(y_{ij} \mid \beta_i) = x_{ij}\alpha + z_{ij}\beta_i$ and $\phi_{ij} = \mathrm{var}(y_{ij} \mid \beta_i) = \sigma^2$.

The second stage of RCR modelling describes variability across individuals by specifying a distribution for the random effects, $\pi(\beta_i)$. Random effects are assumed to have expectation zero and variance $\mathrm{var}(\beta_i) = D$. Usually, random effects are assumed to be normally distributed,

$$\beta_i \sim \mathrm{MVN}(0, D).$$

If both $f(y \mid \beta)$ and $\pi(\beta)$ are MVN, then the marginal distribution of $y$ is easily obtained:

$$y_i \sim \mathrm{MVN}(X_i\alpha,\ Z_i D Z_i' + R_i). \qquad (1)$$

Models based on multivariate normal conditional and random coefficient distributions (1) are called linear RCR models. Models based on the exponential family of distributions are called generalized linear RCR (GL-RCR) models. Except for special cases, the marginal distribution of $y_i$ under the GL-RCR model is intractable.
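To make (1) concrete, here is a minimal Python/NumPy sketch. It assumes a hypothetical design with a random intercept and a random slope on time, with $D = \mathrm{diag}(4, 0.5)$ and $R_i = I$. None of these values come from the paper. The sketch computes $Z_i D Z_i' + R_i$ and shows the marginal variance growing with the time covariate.

```python
import numpy as np

def marginal_cov(Z, D, R):
    """Marginal covariance of y_i under the linear RCR model (1): Z_i D Z_i' + R_i."""
    return Z @ D @ Z.T + R

# Hypothetical design: random intercept and random slope on time, 5 visits.
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Z = np.column_stack([np.ones_like(times), times])  # n_i x q with q = 2
D = np.diag([4.0, 0.5])                            # assumed var(intercept), var(slope)
R = 1.0 * np.eye(len(times))                       # conditional independence, sigma^2 = 1

V = marginal_cov(Z, D, R)
print(np.diag(V))  # variances 5, 5.5, 7, 9.5, 13: they grow with t
```

Because the two random effects are uncorrelated here, each diagonal element is $4 + 0.5t^2 + 1$.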

2.1. Interpretation of fixed effects

Under the RCR model, fixed effects ($\alpha$) model the conditional expected value of $y$ at $\beta = E_\pi(\beta) = 0$. This conditional expectation, based on the population expected random effect, is called the subject specific (SS) effect. SS effects model the expected within-individual effect. Population averaged (PA) effects model expected average effects across individuals, based on the marginal expectation of $y$, $E_\pi\{E(y_i \mid \beta_i)\}$. Under the linear RCR model, the vector $\alpha$ represents both SS and PA effects.

Zeger et al.6 give the form of the marginal mean for some standard link functions when random effects are assumed to be distributed $N(0, D)$. For example, under the probit link $E(y_{ij}) = \Phi(|D z_{ij}' z_{ij} + I|^{-1/2} x_{ij}\alpha)$, where $x_{ij}$ and $z_{ij}$ are the $j$th rows of the $X_i$ and $Z_i$ design matrices. Under the logit link the marginal mean does not have closed form, but can be approximated by $E(y_{ij}) \approx \mathrm{logit}^{-1}(|c^2 D z_{ij}' z_{ij} + I|^{-1/2} x_{ij}\alpha)$, where $c = 16\sqrt{3}/(15\pi)$. Under the logistic and probit models, PA effects are closer to zero than SS effects. This is easily seen by noting that $|D z_{ij}' z_{ij} + I| > 1$ since $D$ is positive definite. Evidence of attenuation in fixed effect (marginal) models can also be found in results from studies of omitted covariates. This attenuation does not affect the statistical significance of tests associated with fixed effects when standard errors are attenuated by the same factor as point estimates. Zeger et al.6 argue that when a logit or probit link is used and random effects have a Gaussian distribution, tests associated with PA and SS effects are approximately equivalent.
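The attenuation can be checked numerically. The sketch below assumes a single random intercept, so $z_{ij} = 1$, $D = \tau^2$ and $|D z_{ij}' z_{ij} + I| = \tau^2 + 1$. The function names are illustrative only.

```python
import math

# Constant c from the logit approximation quoted above.
C_LOGIT = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)

def pa_probit(alpha_ss, tau2):
    """Exact PA coefficient for a probit link with a N(0, tau2) random intercept."""
    return alpha_ss / math.sqrt(tau2 + 1.0)

def pa_logit(alpha_ss, tau2):
    """Approximate PA coefficient for a logit link with a N(0, tau2) random intercept."""
    return alpha_ss / math.sqrt(C_LOGIT ** 2 * tau2 + 1.0)

print(pa_probit(1.0, 3.0))  # 0.5
print(pa_logit(1.0, 1.0))   # about 0.862
```

In both cases the PA coefficient shrinks towards zero as $\tau^2$ grows, and it equals the SS coefficient only when $\tau^2 = 0$.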

Interpretation of SS effects associated with the GL-RCR model may be difficult. PA effects estimate expected differences between individuals with different covariate values, while SS effects estimate expected differences within individuals for different covariate values. When fixed effect covariates do not change within individuals (for example, gender, ethnicity) or do not vary during the study (for example, educational level), SS estimates of within-individual changes are based entirely on differences between individuals.

2.2. Variance structure

RCR models involve specific covariance structures. There are two components of variance. The random effects variance matrix, $D$, models variability across individuals. The conditional covariance of $y_i \mid \beta_i$, $R_i$, models variability within individuals. The variance matrices $R_i$ and $D$ are assumed to be known functions of a parameter vector $\theta$. The marginal variance of $y$ is given by

$$\mathrm{var}(y) = \mathrm{var}_\pi\{E_f(y \mid \beta)\} + E_\pi\{\mathrm{var}_f(y \mid \beta)\}.$$

In the multivariate normal case (1), $\mathrm{var}(y_i) = Z_i D Z_i' + R_i$, so that the marginal variance of $y$ increases with increasing magnitude of random effect covariates. When there is more than one random effect, the covariance between observations may also increase with increasing magnitude of the random effect covariates, depending on the range of the covariate and the correlation between random effects.

In the GL-RCR context, the marginal variance structure is less clear. The structure implied by random effects, $\mathrm{var}_\pi\{E_f(y \mid \beta)\}$, is related to random effects through the non-linear inverse link function, $h^{-1}(\cdot)$. The conditional variance of $y \mid \beta$, $\mathrm{var}_f(y \mid \beta)$, may also be a function of the fixed effects (for example, in Poisson and logistic models). A Taylor series approximation of the marginal variance of $y$ provides some insight regarding the marginal variance structure implied by GL-RCR models:6

$$\mathrm{var}(y_i) \approx L_i Z_i D Z_i' L_i + \phi A_i,$$

where $L_i = \mathrm{diag}(dh^{-1}(u)/du \text{ at } u = x_{ij}\alpha,\ j = 1, \ldots, n_i)$ and $A_i = \mathrm{diag}(b''(\omega_{ij}),\ j = 1, \ldots, n_i)$. Although the accuracy of this approximation may not be adequate for estimation, it suggests that when fixed effect covariates are constant over time the GL-RCR model implies increasing variability with increasing magnitude of random effect covariates.

The conditional covariance, $R_i$, can model covariance not explained by the random effects structure. The conditional independence model, $R_i = \phi\,\mathrm{diag}\{b''(\omega_{ij})\}$, is a common assumption. GL-RCR models have been defined as conditional independence models in recent literature (for example, Zeger and Karim). Under the linear RCR model, conditional independence implies homoscedastic variance. More general structures for $R_i$ have been used with the linear RCR model.

Because any data set has finite variability to be divided between these two components of variance, inclusion of random effects and parameterization of $R_i$ must be considered together.

The use of linear RCR models is limited by these implied variance structures. Linear RCR models that include random time effects may not be applicable to data collected over long periods of time because of the implied increase in variability with increasing magnitude of the time covariate.
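The Taylor approximation can be checked in a special case. The sketch below assumes a log-link Poisson model with a random intercept only, so $\phi = 1$ and $b''(\omega) = e^\omega$. For this model the exact marginal variance has closed form, and the sketch compares the two. The design and parameter values are hypothetical.

```python
import numpy as np

def taylor_var_poisson(X, alpha, Z, D):
    """Approximate var(y_i) = L Z D Z' L + phi*A for a log-link Poisson GL-RCR model,
    expanding about beta_i = 0 (phi = 1 for the Poisson)."""
    mu = np.exp(X @ alpha)   # h^{-1}(u) = exp(u), so dh^{-1}(u)/du = exp(u) too
    L = np.diag(mu)
    A = np.diag(mu)          # b''(omega) = exp(omega) = mu for the Poisson
    return L @ Z @ D @ Z.T @ L + A

def exact_var_poisson_intercept(eta, tau2):
    """Exact var(y_i) for the same model with only a N(0, tau2) random intercept."""
    mean = np.exp(eta + tau2 / 2.0)                     # E(mu | b) averaged over b
    between = np.exp(np.add.outer(eta, eta)) * (np.exp(2.0 * tau2) - np.exp(tau2))
    return between + np.diag(mean)

# Hypothetical example: intercept and time trend, random intercept only.
t = np.arange(4.0)
X = np.column_stack([np.ones_like(t), t])
alpha = np.array([1.0, 0.1])
Z = np.ones((4, 1))
D = np.array([[0.01]])

approx = taylor_var_poisson(X, alpha, Z, D)
exact = exact_var_poisson_intercept(X @ alpha, 0.01)
print(np.round(approx, 3))
print(np.round(exact, 3))
```

With a small $\tau^2$ the two agree to within about 2 per cent. For larger $\tau^2$ the first-order expansion deteriorates, which is consistent with the caution above about using it for estimation.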

2.3. Other approaches

RCR models imply that at least part of the within-individual correlation is due to the random effects. In contrast, the correlation of repeated observations within individuals could be modelled directly through the parameterization of the marginal variance matrix. The general linear regression model is based on $y_i \sim \mathrm{MVN}(X_i\alpha, V_i)$ with $V_i$ unrestricted. General linear regression models are limited by the number of covariance parameters to be estimated, since an unrestricted $n \times n$ covariance matrix has $n(n+1)/2$ parameters. Therefore, they are of limited use when there are many observations per subject or data are unequally spaced.

In another approach, generalized estimating equations model the marginal mean and variance of $y_i$, and can be used with both Gaussian and non-Gaussian data.14 The marginal distribution of the outcome is assumed to be in the exponential family, and the correlation of observations within individuals is directly modelled through parameterization of the covariance matrix. This marginal approach estimates population averaged effects rather than subject specific effects.

Under the RCR model, within-individual effects are assumed to be fixed over time. This can be a serious limitation. Because RCR models assume that random effects are fixed, individuals ‘track’, or remain on the same trajectory, indefinitely. To overcome this, Taylor et al.’ combine random effects with an integrated Ornstein-Uhlenbeck (OU) process that allows individuals to track over short periods, with the amount of tracking estimated from the data.

3. ESTIMATION METHODS

We begin by discussing point estimation, followed by the more difficult problem of estimating the variability of the point estimates.


3.1. Point estimates, linear RCR model

When the outcome, $y$, is multivariate normal (1) with known variance matrices of arbitrary form, the generalized least squares estimator,

$$\hat\alpha = \Big(\sum_{i=1}^m X_i' V_i^{-1} X_i\Big)^{-1} \sum_{i=1}^m X_i' V_i^{-1} y_i,$$

is the uniformly minimum variance unbiased estimator of $\alpha$. If $V_i = Z_i D Z_i' + R_i$ and both $R_i$ and $D$ are known, then the best (uniformly smallest mean squared error) linear unbiased estimator of random effects is given by the Gauss-Markov extension:16

$$\hat\beta_i = D Z_i' V_i^{-1}(y_i - X_i\hat\alpha).$$

Standard errors of these estimators are easily calculated since they are linear combinations of Gaussian random variables.
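A minimal NumPy sketch of the two estimators above, assuming $D$ and $R_i$ are known. The simulated data and the function name `gls_blup` are hypothetical. The function returns $\hat\alpha$, the $\hat\beta_i$, and $(\sum_i X_i' V_i^{-1} X_i)^{-1}$, which is the variance of $\hat\alpha$ when $V_i$ is known.

```python
import numpy as np

def gls_blup(ys, Xs, Zs, D, Rs):
    """GLS estimate of alpha, BLUPs of beta_i, and var(alpha_hat), with D and R_i known."""
    Vinvs = [np.linalg.inv(Z @ D @ Z.T + R) for Z, R in zip(Zs, Rs)]
    XtVX = sum(X.T @ Vi @ X for X, Vi in zip(Xs, Vinvs))
    XtVy = sum(X.T @ Vi @ y for X, Vi, y in zip(Xs, Vinvs, ys))
    alpha_hat = np.linalg.solve(XtVX, XtVy)
    # Gauss-Markov extension: beta_i = D Z_i' V_i^{-1} (y_i - X_i alpha_hat)
    blups = [D @ Z.T @ Vi @ (y - X @ alpha_hat)
             for y, X, Z, Vi in zip(ys, Xs, Zs, Vinvs)]
    return alpha_hat, blups, np.linalg.inv(XtVX)

# Hypothetical data: 30 subjects, 4 visits, random intercept and slope.
rng = np.random.default_rng(1)
t = np.arange(4.0)
X = np.column_stack([np.ones(4), t])
D = np.diag([2.0, 0.2])
Xs, Zs, Rs, ys = [], [], [], []
for _ in range(30):
    beta = rng.multivariate_normal(np.zeros(2), D)
    ys.append(X @ np.array([10.0, -1.0]) + X @ beta + rng.normal(0.0, 1.0, 4))
    Xs.append(X)
    Zs.append(X)
    Rs.append(np.eye(4))

alpha_hat, blups, var_alpha = gls_blup(ys, Xs, Zs, D, Rs)
print(alpha_hat, np.sqrt(np.diag(var_alpha)))
```

Setting $D = 0$ and $R_i = I$ reduces $\hat\alpha$ to the OLS estimate, which is a convenient sanity check.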

Estimation methods for linear RCR models focus on finding methods for estimating variance parameters, with these parameters subsequently used as if they were fixed and known. These methods are based on replacing unknown parameters by estimates in formulae for posterior means, for example, $E(\beta_i \mid y, \alpha = \hat\alpha, \theta = \hat\theta)$. This parametric empirical Bayes (PEB) approach yields consistent estimates as long as the mean structure is correctly specified.

Approximations to generalized least squares can be used to estimate fixed effects, though these methods do not estimate the particular variance structures implied by the RCR model. The usefulness of these methods is limited by the data structure. In particular, these methods may not work well when the design is unbalanced.

Maximum likelihood (ML) estimation uses the observed data likelihood, $L(\alpha, \theta; y) = \prod_i \int f(y_i; \alpha, \theta \mid \beta_i)\,\pi(\beta_i; \theta)\,d\beta_i$, to estimate fixed effect and variance parameters. The EM algorithm can be used to maximize the likelihood by treating random effects and errors as missing data. The EM algorithm is discussed further in Section 4.2. Although ML estimators of $\theta$ are asymptotically unbiased, they are downwardly biased in small samples. Restricted maximum likelihood (REML) estimation accounts for estimation of $\alpha$ by using an uninformative prior on $\alpha$, $\alpha \sim N(0, \Gamma)$ with $\Gamma^{-1} = 0$. REML estimation can also be given a frequentist interpretation, based on independent error contrasts, $u'y$ such that $E(u'y) = 0$. Kackar and Harville show that, under the assumption of correctly specified distributions, estimates of linear RCR fixed and random effects ($\alpha$ and $\beta_i$) obtained by substituting estimated variance parameters into equations derived for known $\theta$ are unbiased when ML or REML estimates of $\theta$ are used.
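For the linear RCR model, the integral in $L(\alpha, \theta; y)$ is the Gaussian density (1), so the ML and REML criteria can be written directly. The sketch below assumes a random intercept model with $\sigma^2$ fixed at 1 and profiles over $\tau^2$ on a grid purely for illustration; real software maximizes numerically. The REML form shown is one common version, and other versions differ from it only by a constant that does not depend on $V_i$. All data and names are hypothetical.

```python
import numpy as np

def loglik_ml(alpha, ys, Xs, Vs):
    """Gaussian marginal log-likelihood of the linear RCR model (1) for given V_i."""
    total = 0.0
    for y, X, V in zip(ys, Xs, Vs):
        r = y - X @ alpha
        _, logdet = np.linalg.slogdet(V)
        total -= 0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + r @ np.linalg.solve(V, r))
    return total

def gls(ys, Xs, Vs):
    """GLS estimate of alpha and the matrix sum X_i' V_i^{-1} X_i."""
    XtVX = sum(X.T @ np.linalg.solve(V, X) for X, V in zip(Xs, Vs))
    XtVy = sum(X.T @ np.linalg.solve(V, y) for X, V, y in zip(Xs, Vs, ys))
    return np.linalg.solve(XtVX, XtVy), XtVX

def loglik_reml(ys, Xs, Vs):
    """REML log-likelihood: ML profile at the GLS alpha minus 0.5*log|sum X'V^{-1}X|."""
    alpha_hat, XtVX = gls(ys, Xs, Vs)
    p = len(alpha_hat)
    _, logdet_xtvx = np.linalg.slogdet(XtVX)
    return loglik_ml(alpha_hat, ys, Xs, Vs) - 0.5 * logdet_xtvx + 0.5 * p * np.log(2.0 * np.pi)

# Hypothetical data: 40 subjects, 5 visits, random intercept with tau = 1.5.
rng = np.random.default_rng(2)
t = np.arange(5.0)
X = np.column_stack([np.ones(5), t])
Xs = [X] * 40
ys = [X @ np.array([1.0, 0.5]) + rng.normal(0.0, 1.5) + rng.normal(0.0, 1.0, 5)
      for _ in range(40)]

def V_of(tau2, sigma2=1.0):
    return [sigma2 * np.eye(5) + tau2 * np.ones((5, 5))] * 40

grid = np.linspace(0.1, 5.0, 50)
ml = [loglik_ml(gls(ys, Xs, V_of(g))[0], ys, Xs, V_of(g)) for g in grid]
reml = [loglik_reml(ys, Xs, V_of(g)) for g in grid]
print(grid[np.argmax(ml)], grid[np.argmax(reml)])
```

For fixed $V_i$, the GLS estimate maximizes the ML criterion over $\alpha$ exactly, because the log-likelihood is quadratic in $\alpha$.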

Properties of likelihood-based estimators follow from Bayesian and maximum likelihood estimation theory. Consistency and efficiency of maximum likelihood estimators depend on correct specification of the marginal distribution of $y$, $f(y) = \int f(y \mid \beta)\,\pi(\beta)\,d\beta$. If the conditional distribution, $f(y \mid \beta)$, is misspecified, either in distribution or regression function, then the posterior distribution of $\beta \mid y$ is not necessarily consistent.24,25 When $f(y \mid \beta)$ is correctly specified, the choice of the random coefficient distribution, $\pi(\beta_i)$, will be most important when the likelihood is flat, for example when there is collinearity among regressors. The influence of this distribution is reduced when the likelihood is concentrated near its centre.

3.2. Point estimates, GL-RCR model

The likelihood function for GL-RCR models is generally intractable, so that closed form expressions for fixed effect estimates cannot be obtained, even when variance matrices are known. To reduce computation, fixed and random effects can be estimated by posterior modes rather than posterior means. This is equivalent to using an MVN approximation to the posterior distribution of parameters.26 The MVN approximation to the posterior distribution may not work well in small samples. Numeric integration or Gibbs sampling may be necessary. These methods are discussed in Section 4.


3.3. Estimated variance of fixed effects

The underlying variance of the fixed effects estimators depends on the estimation approach taken. When using the ML method, the asymptotic variance of fixed effects parameter estimates is given by the inverse information matrix $I^{-1}(\alpha) = [-E\{\partial^2 \log L(\alpha, \theta; y)/\partial\alpha\,\partial\alpha'\}]^{-1}$. When REML estimation is used, the variance estimated is the posterior variance $\mathrm{var}(\alpha \mid y, \theta)$.

When the link function, $h(\cdot)$, is non-linear, the generalized linear RCR estimates $\hat\alpha$ and $\hat\theta$ are asymptotically correlated. Therefore, when using the GL-RCR model, it is important to take into account uncertainty in $\hat\theta$ when estimating $\mathrm{var}(\hat\alpha)$. Under the linear RCR model, $\hat\alpha$ and $\hat\theta$ are asymptotically uncorrelated. Although there are no studies examining the relationship between sample size and estimation of $\mathrm{var}(\hat\alpha)$ when $\theta$ is estimated, problems with estimation of $\mathrm{var}(\hat\alpha)$ are expected to be most severe in small samples.

There are three general approaches to estimation of the variance of fixed effects: substitution estimators, approximation to exact estimators, and resampling estimators.

3.3.1. Substitution variance estimators

Substitution estimators are based on replacing parameters in variance formulae with estimated values, assuming that closed form variance formulae are available. The observed information, $-\partial^2 \log L(\alpha, \theta; y)/\partial\alpha\,\partial\alpha'$ evaluated at maximum likelihood estimates, is a substitution estimate for the true information. PEB estimators are also substitution estimators; the posterior variance $\mathrm{var}(\beta \mid y, \theta)$ is estimated by $\mathrm{var}(\beta \mid y, \hat\theta)$. Substitution estimators do not take into account the variability in $\hat\theta$, and therefore provide lower bounds for the variance of $\hat\alpha$.28,4

3.3.2. Approximations to exact estimators

Approximate methods are based on the Taylor series expansion. Kackar and Harville present an approximation that takes into account the variability of $\hat\theta$ and the dependence of $\hat\alpha$ and $\hat\theta$ in the linear RCR model. Kass and Steffey4 present first- and second-order variance estimators for PEB estimators based on Laplace's method that have bias of order $m^{-1}$ and $m^{-2}$, respectively. Approximations proposed by Kass and Steffey are most easily applied when the model is conjugate and the variance function can be obtained analytically. If the model is not conjugate and the dimension of random effects, $\beta_i$, is not large, numeric integration can be used to obtain estimates. Further approximation is necessary if the variance function is not analytic. These methods depend on correct specification of the model, including the variance structure.

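White and Royall independently present a variance estimator that is robust to misspecification of the distribution of the errors $\epsilon_i = y_i - X_i\alpha$. The marginal expectation, $E(y_i)$, must be correctly specified. Zeger et al.6 suggested using this variance estimator in the longitudinal data context. This robust variance estimator is based on a Taylor series expansion of the gradient of the log-likelihood about the true value, $\alpha_0$:

$$V(\hat\alpha) = I(\hat\alpha)^{-1}\Big\{\sum_{i=1}^m U_i(\hat\alpha)\,U_i(\hat\alpha)'\Big\} I(\hat\alpha)^{-1}, \qquad (2)$$

where $U_i(\alpha) = \partial \log L_i(\alpha, \hat\theta; y_i)/\partial\alpha$ is the contribution of individual $i$ to the gradient and $I(\hat\alpha)$ is the information matrix evaluated at $\hat\alpha$.

The outer term, $I(\hat\alpha)^{-1}$, is the model-based substitution estimator. The inner term captures the deviation of the true variance from the model-based estimate. Under the linear RCR model (1) there are simple formulae for the robust variance estimator,

$$V(\hat\alpha) = \Big(\sum_i X_i' \hat V_i^{-1} X_i\Big)^{-1}\Big\{\sum_i X_i' \hat V_i^{-1} \hat\epsilon_i \hat\epsilon_i' \hat V_i^{-1} X_i\Big\}\Big(\sum_i X_i' \hat V_i^{-1} X_i\Big)^{-1},$$

where $\hat\epsilon_i = y_i - X_i\hat\alpha$ and $\hat V_i = Z_i \hat D Z_i' + \hat R_i$.

Consistency of the robust variance estimator depends neither on normality assumptions nor on the assumed variance of $y$. $V(\hat\alpha)$ is a consistent estimator of the asymptotic variance of $\hat\alpha$ if the unconditional mean is correctly specified.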
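The following sketch computes the linear-model sandwich alongside the model-based estimate. It uses hypothetical data with a random intercept that are deliberately fitted with working independence ($V_i = I$, that is, OLS). In this example the naive substitution variance understates the variability of the intercept and overstates that of the slope, while the sandwich tracks both. The function name is illustrative.

```python
import numpy as np

def gls_sandwich(ys, Xs, Vs):
    """Fit alpha by GLS under working covariances V_i; return alpha_hat, the
    model-based variance (sum X'V^-1 X)^-1 and the robust sandwich variance."""
    Vinvs = [np.linalg.inv(V) for V in Vs]
    bread = sum(X.T @ Vi @ X for X, Vi in zip(Xs, Vinvs))
    alpha_hat = np.linalg.solve(bread, sum(X.T @ Vi @ y for X, Vi, y in zip(Xs, Vinvs, ys)))
    meat = np.zeros_like(bread)
    for y, X, Vi in zip(ys, Xs, Vinvs):
        u = X.T @ Vi @ (y - X @ alpha_hat)   # subject i's contribution to the score
        meat += np.outer(u, u)
    bread_inv = np.linalg.inv(bread)
    return alpha_hat, bread_inv, bread_inv @ meat @ bread_inv

# Hypothetical data with a random intercept (tau2 = 1, sigma2 = 1), fitted with
# a deliberately misspecified working independence model (V_i = I, i.e. OLS).
rng = np.random.default_rng(3)
m, t = 500, np.arange(5.0)
X = np.column_stack([np.ones(5), t])
ys = [X @ np.array([2.0, 0.3]) + rng.normal() + rng.normal(size=5) for _ in range(m)]
Xs, Vs = [X] * m, [np.eye(5)] * m

alpha_hat, model_var, robust_var = gls_sandwich(ys, Xs, Vs)
resid = np.concatenate([y - X @ alpha_hat for y in ys])
naive_var = resid @ resid / (len(resid) - 2) * model_var   # OLS substitution estimate
print(np.sqrt(np.diag(naive_var)), np.sqrt(np.diag(robust_var)))
```

Grouping the score by subject is what allows the sandwich to reflect within-subject correlation that the working model ignores.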

Gibbs sampling is another approximate method for estimating the posterior distribution of parameter estimates. Zeger and Karim use the Gibbs sampler to estimate logistic RCR parameters when random effects are MVN, although their technique is not restricted to MVN random effects. A flat prior is specified for the fixed effects, $\alpha$, and a non-informative prior is used for the random effects variance matrix, $p(D) \propto |D|^{-(q+1)/2}$. There may be problems with this prior because it has infinite mass at $|D| = 0$. Zeger and Karim point out potential problems with convergence of the Gibbs sampler when the variance of random effects is near zero. They recommend restarting the algorithm when any of the diagonal elements of $D$ becomes small. Zeger and Karim use a simulation study based on 100 replicated data sets to evaluate coverage probabilities of confidence intervals based on their method for generating the posterior distribution of fixed effects in a logistic RCR model. Two types of correctly specified models are examined: one has a random intercept, the other has random intercept and random time effects. They find that the estimated fixed effects, based on the mean of samples from the posterior distribution, are approximately unbiased. Their estimated 90 per cent Bayes confidence intervals have actual coverage probabilities within plus or minus 6 per cent of the nominal level. Estimates of the variance of the random effects based on the mean of the sampled posterior are positively biased; posterior modes are better estimates of these variance parameters.

Gibbs sampling can also be used to estimate linear RCR parameters. Gelfand et al. compare Gibbs sampling estimates to PEB estimates of fixed effects in the linear RCR model. Their Gibbs estimates of the variance of estimated linear RCR fixed effects are consistently larger than empirical Bayes variance estimates and, they suggest, more accurate. Gelfand et al. generally prefer Gibbs sampling estimates to PEB estimates, especially in small samples. This preference seems to be based on accurate estimation of the variance of fixed effects and flexibility in the choice of statistics. However, estimates obtained from the Gibbs sampler are based on stronger model assumptions than PEB estimates.
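As an illustration of Gibbs sampling in the linear case, here is a sketch for the simplest linear RCR model, $y_{ij} = \alpha + b_i + \epsilon_{ij}$. It assumes a flat prior on $\alpha$, $p(\tau^2) \propto 1/\tau^2$ (the prior $|D|^{-(q+1)/2}$ with $q = 1$) and $p(\sigma^2) \propto 1/\sigma^2$. Under these priors every full conditional is normal or inverse gamma. This is not Zeger and Karim's or Gelfand et al.'s code, and it omits convergence diagnostics and the restarting rule described for small variances.

```python
import numpy as np

def gibbs_random_intercept(y, subj, n_iter=2000, burn=500, seed=0):
    """Gibbs sampler for y_ij = alpha + b_i + e_ij with b_i ~ N(0, tau2),
    e_ij ~ N(0, sigma2); assumed priors: flat on alpha, 1/tau2, 1/sigma2."""
    rng = np.random.default_rng(seed)
    m, N = subj.max() + 1, len(y)
    n_i = np.bincount(subj, minlength=m)
    alpha, tau2, sigma2 = y.mean(), 1.0, 1.0
    draws = []
    for it in range(n_iter):
        # b_i | rest ~ normal with precision n_i/sigma2 + 1/tau2
        prec = n_i / sigma2 + 1.0 / tau2
        s = np.bincount(subj, weights=y - alpha, minlength=m)
        b = rng.normal(s / sigma2 / prec, np.sqrt(1.0 / prec))
        # alpha | rest ~ N(mean(y - b), sigma2 / N) under the flat prior
        alpha = rng.normal(np.mean(y - b[subj]), np.sqrt(sigma2 / N))
        # sigma2 | rest ~ inverse gamma(N/2, SSE/2)
        sse = np.sum((y - alpha - b[subj]) ** 2)
        sigma2 = 0.5 * sse / rng.gamma(N / 2.0)
        # tau2 | rest ~ inverse gamma(m/2, sum(b^2)/2)
        tau2 = 0.5 * np.sum(b ** 2) / rng.gamma(m / 2.0)
        if it >= burn:
            draws.append((alpha, tau2, sigma2))
    return np.array(draws)

# Hypothetical data: 200 subjects x 5 visits, alpha = 10, tau2 = sigma2 = 1.
rng = np.random.default_rng(4)
subj = np.repeat(np.arange(200), 5)
b_true = rng.normal(0.0, 1.0, 200)
y = 10.0 + b_true[subj] + rng.normal(0.0, 1.0, subj.size)
draws = gibbs_random_intercept(y, subj)
print(draws.mean(axis=0))  # posterior means of alpha, tau2, sigma2
```

The posterior standard deviation of $\alpha$ from `draws[:, 0]` automatically reflects uncertainty in $\tau^2$ and $\sigma^2$, which is the property Gelfand et al. emphasize.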

In summary, three of the four approximate methods presented depend on correct specification of the model. Approximations derived by Kackar and Harville28 and Kass and Steffey4 adjust for estimation of unknown parameters. Gibbs sampling is entirely model based, and there have not yet been published studies examining its robustness to model misspecification. The robust variance estimator (2) does not depend on full model specification, requiring only correct specification of the marginal mean.

3.3.3. Resampling methods for variance estimation

Resampling techniques are another approach to variance estimation. Unweighted leave-one-out jackknife variance estimators of linear estimators are consistent provided that the marginal mean structure is correctly specified. In the repeated measures context, individuals' sets of observations or residuals are kept together and resampled as a group.
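A sketch of the leave-one-out jackknife in which each subject's observations are deleted as a group. It assumes OLS as the (linear) estimator and uses hypothetical data. Any function that maps a list of subjects to a parameter vector can be passed instead.

```python
import numpy as np

def ols(groups):
    """OLS estimate of alpha from a list of (X_i, y_i) pairs."""
    X = np.vstack([g[0] for g in groups])
    y = np.concatenate([g[1] for g in groups])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def jackknife_by_subject(estimator, groups):
    """Leave-one-subject-out jackknife covariance of estimator(groups);
    all of a subject's observations are deleted together."""
    m = len(groups)
    loo = np.array([estimator(groups[:k] + groups[k + 1:]) for k in range(m)])
    centred = loo - loo.mean(axis=0)
    return (m - 1) / m * centred.T @ centred

# Hypothetical random-intercept data: 60 subjects x 4 visits.
rng = np.random.default_rng(5)
t = np.arange(4.0)
X = np.column_stack([np.ones(4), t])
groups = [(X, X @ np.array([1.0, 0.2]) + rng.normal() + rng.normal(size=4))
          for _ in range(60)]
print(np.sqrt(np.diag(jackknife_by_subject(ols, groups))))
```

When subjects have equal numbers of observations and only an intercept is fitted, this reduces to the familiar jackknife for a mean of subject means, $s^2/m$.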


There has been relatively little published research examining resampling techniques for variance estimation when data are longitudinal. Two studies use simulation to compare bootstrap and substitution variance estimates of general linear regression coefficients when data are Gaussian. Both studies examine small samples ($m < 50$). In these studies, substitution variance estimates are shown to underestimate true variability, and although bootstrap variance estimates are less biased than substitution estimates, they also underestimate variability. Lee32 compares substitution, delta-method and jackknife variance estimators of the analysis of variance estimated mean in a one-way random effects model using a Monte Carlo simulation study. Lee also examines small samples. The substitution variance estimator generally underestimates the variability of the mean. When the variance structure is misspecified, the delta-method variance estimator also underestimates the variability of the mean. The standard jackknife has the most consistently 'good' behaviour, rarely underestimating variability.

Resampling methods are used with longitudinal non-Gaussian data in two recent articles.33,34 Both articles examine marginal generalized linear models rather than random coefficient models. Currently, the practical use of resampling techniques for GL-RCR models is limited by the computationally intensive methods required to fit these models. When the link function is non-linear, the estimator $\hat\alpha$ may be non-linear. Bootstrap or delete-d jackknife variance estimates can be used to calculate consistent variance estimates of non-linear parameters.
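A corresponding sketch of the subject-level (cluster) bootstrap, in which subjects, not single observations, are resampled with replacement. The estimator, data and number of replicates are assumptions for illustration. The same function accepts non-linear estimators.

```python
import numpy as np

def ols(groups):
    """OLS estimate of alpha from a list of (X_i, y_i) pairs."""
    X = np.vstack([g[0] for g in groups])
    y = np.concatenate([g[1] for g in groups])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cluster_bootstrap_cov(estimator, groups, n_boot=2000, seed=0):
    """Bootstrap covariance of estimator(groups), resampling whole subjects."""
    rng = np.random.default_rng(seed)
    m = len(groups)
    reps = np.array([estimator([groups[k] for k in rng.integers(0, m, size=m)])
                     for _ in range(n_boot)])
    return np.atleast_2d(np.cov(reps, rowvar=False))

# Hypothetical data: 100 subjects x 3 visits, intercept-only model.
rng = np.random.default_rng(6)
X1 = np.ones((3, 1))
groups = [(X1, rng.normal(0.0, 1.0) + rng.normal(size=3)) for _ in range(100)]
boot = cluster_bootstrap_cov(ols, groups)
cluster_means = np.array([g[1].mean() for g in groups])
print(boot[0, 0], cluster_means.var(ddof=1) / len(groups))
```

For this intercept-only case the bootstrap variance should be close to $s^2/m$ computed from the subject means, up to Monte Carlo error.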

3.4. Implications for data analysis

Our data are approximately normally distributed and we have a relatively large sample size ($m = 251$). We need reasonable and consistent estimates of the overall effects and their variance. If the mean function is correctly specified, then generalized least squares estimators provide unbiased estimates of fixed effects. If the marginal variance is also correctly specified, then plug-in estimates of the variance of estimated fixed effects may be adequate. If within-individual observations have correlation beyond that induced by random effects (that is, $R_i \neq \sigma^2 I$), then substitution variance estimators will not be consistent. Both the robust variance estimator (2) and the leave-one-out jackknife method produce consistent variance estimates of fixed effect estimates as long as the marginal mean is correctly specified.

4. COMPUTATION

In this section we discuss methods for computing estimates when solutions are non-trivial. When data are assumed to be non-Gaussian, generalized estimating equations14 can be used to estimate fixed effects (Section 4.1). When the likelihood is analytic, iterative methods for maximizing the likelihood can be used (Section 4.2). Several published programs can be used to calculate maximum likelihood estimates under the linear RCR model (for example, BMDP5V, SAS PROC MIXED). Numeric integration (Section 4.3) is another method for calculating maximum likelihood estimates. Gibbs sampling (Section 4.4) is a recently developed tool for estimating RCR model parameters under a fully Bayesian model.

4.1. Generalized estimating equations

The generalized estimating equation (GEE) approach was derived under the assumption that the marginal distribution of the outcome, $f(y_{ij})$, is in the exponential family. The correlation of observations within individuals is directly modelled through parameterization of the covariance matrix, $R(\theta)$. Population averaged effects, $\alpha$, are estimated using generalized estimating equations (GEEs):

$$\sum_{i=1}^m \Big(\frac{\partial E(y_i)}{\partial \alpha'}\Big)' V_i^{-1}\{y_i - E(y_i)\} = 0,$$

where $V_i$ is the modelled marginal covariance matrix, $\mathrm{cov}(y_i)$.

Zeger et al.6 outline an approximate method for estimating subject specific effects associated with the GL-RCR model when a multivariate normal distribution is assumed for random effects. They recommend iterating between GEE estimation of fixed effects and method of moments estimation of variance parameters. Marginal expectations and variances used in GEEs are approximated. Gilmour et al. found problems with similar variance estimators in the probit model. We are not aware of studies examining the properties of GEE estimators of GL-RCR fixed effects and variance parameters.
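A minimal sketch of solving the GEE above by Fisher scoring. It assumes a logit marginal mean and an exchangeable working correlation whose parameter $\rho$ is held fixed. Standard GEE software also estimates $\rho$ by moments, which is omitted here. With $\rho = 0$ the equations reduce to those of ordinary logistic regression. The data are simulated from a hypothetical random-intercept logistic model.

```python
import numpy as np

def gee_terms(alpha, ys, Xs, rho):
    """Estimating function U = sum D_i' V_i^{-1} (y_i - mu_i) and H = sum D_i' V_i^{-1} D_i."""
    p = len(alpha)
    U, H = np.zeros(p), np.zeros((p, p))
    for y, X in zip(ys, Xs):
        n = len(y)
        mu = 1.0 / (1.0 + np.exp(-X @ alpha))               # logit marginal mean
        a = mu * (1.0 - mu)                                  # binomial variance function
        Dmat = a[:, None] * X                                # d mu_i / d alpha'
        R = (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))  # exchangeable working corr.
        V = np.sqrt(a)[:, None] * R * np.sqrt(a)[None, :]    # A^{1/2} R A^{1/2}
        VinvD = np.linalg.solve(V, Dmat)
        U += VinvD.T @ (y - mu)
        H += Dmat.T @ VinvD
    return U, H

def gee_logit(ys, Xs, rho=0.0, max_iter=100, tol=1e-10):
    """Fisher scoring for the GEE, with rho held fixed (a simplification)."""
    alpha = np.zeros(Xs[0].shape[1])
    for _ in range(max_iter):
        U, H = gee_terms(alpha, ys, Xs, rho)
        step = np.linalg.solve(H, U)
        alpha = alpha + step
        if np.max(np.abs(step)) < tol:
            break
    return alpha

# Hypothetical binary data: 300 subjects x 4 visits with subject heterogeneity.
rng = np.random.default_rng(7)
t = np.arange(4.0)
X = np.column_stack([np.ones(4), t])
ys, Xs = [], []
for _ in range(300):
    b = rng.normal(0.0, 1.0)
    p = 1.0 / (1.0 + np.exp(-(-0.5 + 0.4 * t + b)))
    ys.append((rng.random(4) < p).astype(float))
    Xs.append(X)
alpha_hat = gee_logit(ys, Xs, rho=0.3)
print(alpha_hat)  # PA effects; expected to be attenuated relative to (-0.5, 0.4)
```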

4.2. Likelihood methods

Likelihood estimators of RCR model parameters (REML or ML) are not of closed form when the covariance parameters, 8, are unknown. Maximization of the full data or restricted data likelihood requires iterative solution.

Laird and Ware3 and Stiratelli et al. use the expectation-maximization (EM) algorithm36 to estimate variance parameters associated with conditional independence RCR models. The E-step estimates statistics based on the unobservable vectors $\beta_i$ and $\epsilon_i = y_i - X_i\alpha - Z_i\beta_i$. Laird and Ware's3 algorithm alternates between generalized least squares estimation of fixed effects and PEB estimation of random effects, with EM estimation of variance parameters. Stiratelli et al. take a similar approach to the logistic RCR model; estimation alternates between EM estimation of variance parameters and PEB estimation of fixed and random effects.

There are published computational formulae and programs for estimating parameters of the linear RCR conditional independence model. Laird et al.37 give computational formulae for calculating ML and REML estimates using the EM algorithm. Jennrich and Schluchter38 propose a generalized EM (GEM) algorithm that can be applied to RCR models. Their GEM algorithm uses scoring for the maximization step.

The EM algorithm increases the likelihood monotonically and is guaranteed to converge eventually when the maximum occurs at an interior point of the parameter space.36 The EM algorithm is generally slow to converge (its convergence near the solution is only linear), and it may converge to a local, rather than global, maximum. The choice of starting values is important. If the solution for $D$ is on or near the boundary of the parameter space, the EM algorithm may not converge.
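To make the E- and M-steps concrete, here is a minimal Python sketch for the simplest linear RCR model, a random intercept only ($q = 1$), fitted by ML. It is our own illustration, not the Laird and Ware or Laird et al. programs; the function names, the simulated data and the starting values are all hypothetical. Because the intercepts $b_i$ are treated as missing data, each M-step has a closed form, and the recorded log-likelihood should never decrease from one iteration to the next.

```python
import math
import random


def loglik(groups, mu, d, s2):
    """Marginal Gaussian log-likelihood of y_ij = mu + b_i + e_ij,
    b_i ~ N(0, d), e_ij ~ N(0, s2), using the closed-form determinant and
    inverse of the compound-symmetric V_i = s2*I + d*J."""
    ll = 0.0
    for y in groups:
        n = len(y)
        r = [v - mu for v in y]
        ssq = sum(v * v for v in r)
        tot = sum(r)
        logdet = (n - 1) * math.log(s2) + math.log(s2 + n * d)
        quad = (ssq - d * tot * tot / (s2 + n * d)) / s2
        ll -= 0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)
    return ll


def em_step(groups, mu, d, s2):
    """One ML EM iteration that treats the random intercepts b_i as missing data."""
    m = len(groups)
    n_total = sum(len(y) for y in groups)
    # E-step: posterior mean and variance of b_i given y_i at the current parameters
    post = []
    for y in groups:
        v = 1.0 / (len(y) / s2 + 1.0 / d)
        post.append((v * sum(val - mu for val in y) / s2, v))
    # M-step: closed-form maximisers of the expected complete-data log-likelihood
    mu_new = sum(val - b for y, (b, _) in zip(groups, post) for val in y) / n_total
    d_new = sum(b * b + v for b, v in post) / m
    s2_new = sum((val - mu_new - b) ** 2 + v
                 for y, (b, v) in zip(groups, post) for val in y) / n_total
    return mu_new, d_new, s2_new


# Hypothetical unbalanced data: 40 subjects with 3 to 8 observations each
rng = random.Random(1)
groups = []
for _ in range(40):
    b = rng.gauss(0.0, 1.0)
    groups.append([5.0 + b + rng.gauss(0.0, 0.7) for _ in range(rng.randint(3, 8))])

params = (0.0, 1.0, 1.0)  # starting values for (mu, d, s2)
trace = [loglik(groups, *params)]
for _ in range(200):
    params = em_step(groups, *params)
    trace.append(loglik(groups, *params))
```

Printing `trace` shows the linear convergence described above: large gains in the first few iterations, then increments that shrink roughly geometrically.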

There have been several efforts to speed up the EM algorithm. Laird et al.37 and Lindstrom and Bates39 use the Aitken accelerator40 to speed convergence of the EM algorithm near the solution. Jamshidian and Jennrich41 cast the EM algorithm as a generalized gradient algorithm and use a generalized conjugate gradient acceleration. Jamshidian and Jennrich41 use a 'complete' EM algorithm; EM is used to estimate all parameters, rather than only variance parameters. In a data example, they find that their accelerated EM algorithm requires significantly fewer iterations and floating point operations than either the complete EM or Laird and Ware's algorithm. Lange42 takes a quasi-Newton approach based on improved estimation of the Hessian of the log-likelihood function.
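The idea behind Aitken acceleration can be shown on a scalar sequence. The published EM accelerators work with vector-valued parameter sequences and are more elaborate; the Python sketch below is only the classical Aitken $\Delta^2$ formula applied to three successive iterates of a linearly convergent fixed-point iteration, and the names are hypothetical.

```python
import math


def aitken(x0, x1, x2):
    """Aitken delta-squared extrapolation from three successive iterates."""
    denom = x2 - 2.0 * x1 + x0
    if denom == 0.0:
        return x2  # no curvature left to exploit; the sequence has (numerically) converged
    return x2 - (x2 - x1) ** 2 / denom


# A linearly convergent fixed-point iteration, x_{k+1} = cos(x_k)
x_star = 0.7390851332151607  # fixed point of cos(x)
x0 = 1.0
x1 = math.cos(x0)
x2 = math.cos(x1)
accelerated = aitken(x0, x1, x2)

# For an exactly geometric sequence x_k = 3 + 2 * 0.9**k the extrapolation is exact
geometric = aitken(5.0, 4.8, 4.62)
```

Three plain iterates leave an error of about 0.12, whereas the extrapolated value is within about 0.01 of the fixed point; applied repeatedly near the solution, this is what shortens the slow tail of a linearly convergent EM run.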

When the linear RCR model is used with an AR(1) model for within-individual errors, the EM algorithm is unattractive because the M-step is not in closed form. One possible approach to this problem is the expectation conditional maximization (ECM) algorithm.43 Jones and Boadi-Boateng13 propose using the Kalman filter to obtain estimates when data have AR(1) structure.

Gradient algorithms, such as Newton-Raphson and Fisher scoring, can also be used to maximize likelihoods. Jennrich and Schluchter38 apply Newton-Raphson and Fisher scoring to repeated measures models, including linear RCR models. They modify the Newton-Raphson method, using the scoring method for the first few steps to take advantage of its robustness to starting values. Chi and Reinsel12 outline a scoring algorithm to obtain ML or REML estimates when within-individual errors are autocorrelated in addition to the random effect structure.

Advantages of the Newton-Raphson and Fisher scoring algorithms include faster convergence than the EM algorithm, objective convergence criteria, and calculation of the observed information matrix.39 Jennrich and Schluchter38 recommend using their GEM algorithm only when the number of random effects is large; otherwise they prefer gradient algorithms. Lindstrom and Bates39 compare EM and Newton-Raphson estimation of linear RCR model parameters for two data sets. They also prefer the Newton-Raphson method when the number of random effects is small. However, gradient methods are not guaranteed to converge.

4.3. Numeric integration

Numeric integration can be used for estimation when likelihood functions are not analytic. Numeric integration is a 'brute force' technique. Theoretically it can always be used, although there are computational limits. Numeric integration can be used with either ML or REML approaches. ML estimation requires m q-dimensional integrations for each evaluation of the likelihood. Evaluation of the integrals required for PEB estimation of fixed and random effects requires numeric integration in p and mq dimensions, respectively.
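For a random-intercept model ($q = 1$), each individual's likelihood contribution is a one-dimensional integral over the random effect. The Python sketch below approximates it with a simple midpoint rule (our choice; Gauss-Hermite quadrature is a common, more efficient alternative) and checks the result against the closed-form Gaussian marginal before applying the same routine to binary outcomes, where no closed form exists. Names, grid settings and data values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def marginal_lik_numeric(cond_lik, d, half_width=8.0, n_points=2001):
    """Approximate the integral of cond_lik(b) * N(b; 0, d) over b with the
    midpoint rule on [-half_width*sd, half_width*sd]."""
    sd = math.sqrt(d)
    lo = -half_width * sd
    h = 2.0 * half_width * sd / n_points
    total = 0.0
    for k in range(n_points):
        b = lo + (k + 0.5) * h
        total += cond_lik(b) * normal_pdf(b, 0.0, d)
    return total * h


# Gaussian check: y_j | b ~ N(mu + b, s2), for which the exact marginal is known
y, mu, s2, d = [1.2, 0.7, 1.9], 1.0, 0.5, 0.8


def gauss_cond(b):
    return math.prod(normal_pdf(v, mu + b, s2) for v in y)


n = len(y)
r = [v - mu for v in y]
exact = math.exp(-0.5 * (n * math.log(2.0 * math.pi)
                         + (n - 1) * math.log(s2) + math.log(s2 + n * d)
                         + (sum(v * v for v in r) - d * sum(r) ** 2 / (s2 + n * d)) / s2))
approx = marginal_lik_numeric(gauss_cond, d)

# Binary outcomes with a random intercept on the logit scale (no closed form)
y_binary, eta = [1, 0, 1, 1], 0.3


def logit_cond(b):
    p = 1.0 / (1.0 + math.exp(-(eta + b)))
    out = 1.0
    for v in y_binary:
        out *= p if v == 1 else 1.0 - p
    return out


lik_binary = marginal_lik_numeric(logit_cond, d=1.5)
```

With $m$ individuals an ML likelihood evaluation repeats this integral $m$ times, and with $q$ random effects each integral becomes $q$-dimensional, which is where the computational limits mentioned above arise.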

4.4. Gibbs sampler

Gibbs sampling is a flexible approach to estimation of GL-RCR model parameters. The model fit using the Gibbs sampler is fully Bayesian, and therefore requires statement of a prior distribution for the variance of random effects, D. Gibbs sampling is based on the hierarchical Bayes model

$$p(y_i, \beta_i, \alpha, D) = p(y_i \mid \alpha, D, \beta_i)\, p(\beta_i \mid \alpha, D)\, p(\alpha, D)$$

where $p(\cdot)$ denotes a probability density function (p.d.f.).

Gibbs sampling uses draws from the conditional p.d.f.s $p(\alpha \mid D, \beta, y)$, $p(D \mid \alpha, \beta, y)$ and $p(\beta \mid \alpha, D, y)$ to simulate draws from the joint p.d.f. $p(\alpha, \beta, D \mid y)$. A single iteration of the Gibbs sampler generates one draw from each conditional distribution using the most recent values (for example, $\alpha^{(k+1)}$ is drawn from $p(\alpha \mid D^{(k)}, \beta^{(k)}, y)$, then $D^{(k+1)}$ is drawn from $p(D \mid \alpha^{(k+1)}, \beta^{(k)}, y)$ and $\beta^{(k+1)}$ is drawn from $p(\beta \mid \alpha^{(k+1)}, D^{(k+1)}, y)$). After many iterations, the distribution of $(\alpha^{(k)}, \beta^{(k)}, D^{(k)})$ converges to the joint posterior p.d.f. Methods for determining when the Gibbs sampler has converged, and optimal ways to simulate draws from the posterior (for example, using a single starting point or several starting points), are open research areas.44-47
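A minimal Python sketch of these updates for the Gaussian random-intercept model, assuming conjugate priors that the text does not specify: a normal prior for $\alpha$, and inverse-gamma priors for $D$ (here a scalar $d$) and for the within-individual variance $\sigma^2$. It samples the individual means $\theta_i = \alpha + \beta_i$ rather than $\beta_i$ directly (a centred parameterization, our choice, which usually mixes better when individuals have several observations). Each sweep follows the order described above, $\alpha$, then $D$, then the random effects, plus a draw for $\sigma^2$. Prior settings, names and data are hypothetical, and no convergence diagnostics are included.

```python
import random


def inv_gamma(shape, rate, rng):
    """Draw from inverse-gamma(shape, rate) as 1 / Gamma(shape, scale=1/rate)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def gibbs_random_intercept(groups, n_iter=3000, burn_in=500, seed=0,
                           a0=0.01, b0=0.01, prior_var_alpha=1e4):
    """Gibbs sampler for y_ij = theta_i + e_ij, theta_i = alpha + beta_i,
    beta_i ~ N(0, d), e_ij ~ N(0, s2); returns post-burn-in (alpha, d, s2) draws."""
    rng = random.Random(seed)
    m = len(groups)
    n_total = sum(len(y) for y in groups)
    theta = [sum(y) / len(y) for y in groups]  # start at the subject means
    alpha, d, s2 = sum(theta) / m, 1.0, 1.0
    draws = []
    for it in range(n_iter):
        # alpha | d, theta  (normal prior with mean 0 and large variance)
        prec = m / d + 1.0 / prior_var_alpha
        alpha = rng.gauss(sum(theta) / d / prec, (1.0 / prec) ** 0.5)
        # d | alpha, theta  (inverse-gamma prior)
        ss_b = sum((t - alpha) ** 2 for t in theta)
        d = inv_gamma(a0 + m / 2.0, b0 + ss_b / 2.0, rng)
        # theta_i | alpha, d, s2, y_i  (one normal draw per individual)
        for i, y in enumerate(groups):
            prec_i = len(y) / s2 + 1.0 / d
            mean_i = (sum(y) / s2 + alpha / d) / prec_i
            theta[i] = rng.gauss(mean_i, (1.0 / prec_i) ** 0.5)
        # s2 | theta, y  (inverse-gamma prior)
        ss_e = sum((v - t) ** 2 for y, t in zip(groups, theta) for v in y)
        s2 = inv_gamma(a0 + n_total / 2.0, b0 + ss_e / 2.0, rng)
        if it >= burn_in:
            draws.append((alpha, d, s2))
    return draws


# Hypothetical balanced data: 50 individuals, 6 observations each
data_rng = random.Random(42)
groups = []
for _ in range(50):
    beta = data_rng.gauss(0.0, 1.0)
    groups.append([5.0 + beta + data_rng.gauss(0.0, 0.7) for _ in range(6)])

draws = gibbs_random_intercept(groups)
post_alpha = sum(a for a, _, _ in draws) / len(draws)
grand_mean = sum(v for y in groups for v in y) / sum(len(y) for y in groups)
```

With balanced data and a diffuse prior, the posterior mean of $\alpha$ should sit very close to the grand mean of the observations, which gives a simple sanity check on an implementation.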

Gibbs sampling depends on correct specification of the density functions that define the hierarchical Bayes models, and the ability to simulate independent draws from resulting conditional densities. This technique's robustness to model misspecification is not known.

5. MODEL ASSESSMENT

Published work examining RCR models has focused on estimation. In this section we discuss methods for checking RCR model assumptions, model selection, and residual analysis. Model assessment is relatively simple for linear RCR models and difficult for GL-RCR models. We assume that data are longitudinal and use the natural ordering of observations.

5.1. Checking RCR model assumptions

Sample distributions across individuals, graphical techniques, and within-individual analyses can be used to assess whether data satisfy RCR model assumptions.

Because observations across individuals are independent, the distribution of the outcome across individuals can be used to check assumptions about the marginal distribution. The increase in variance implied by random effects under the linear RCR model can be checked by calculating the sample variance at several levels of the random effect covariate.
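For a linear RCR model with a random intercept and a random slope in time, $\mathrm{var}(y_{ij}) = d_{00} + 2d_{01}t + d_{11}t^2 + \sigma^2$, so the sample variance at each measurement time should trace this curve. The Python sketch below simulates hypothetical data with $d_{01} = 0$ and compares the two; all parameter values and names are illustrative assumptions.

```python
import random
import statistics


def implied_variance(t, d00, d01, d11, s2):
    """var(y_ij) at time t for y_ij = a0 + a1*t + b0i + b1i*t + e_ij, with
    var(b0i) = d00, cov(b0i, b1i) = d01, var(b1i) = d11 and var(e_ij) = s2."""
    return d00 + 2.0 * d01 * t + d11 * t * t + s2


# Hypothetical parameter values (d01 = 0, so intercepts and slopes are independent)
d00, d01, d11, s2 = 4.0, 0.0, 0.25, 1.0
times = [0, 2, 4, 6, 8, 10]
rng = random.Random(7)
subjects = []
for _ in range(500):
    b0, b1 = rng.gauss(0.0, d00 ** 0.5), rng.gauss(0.0, d11 ** 0.5)
    subjects.append([30.0 - t + b0 + b1 * t + rng.gauss(0.0, s2 ** 0.5) for t in times])

# Sample variance across individuals at each scheduled time
sample_var = [statistics.variance([y[j] for y in subjects]) for j in range(len(times))]
theory = [implied_variance(t, d00, d01, d11, s2) for t in times]
for t, sv, tv in zip(times, sample_var, theory):
    print(f"t = {t:2d}: sample variance {sv:6.2f}, implied {tv:6.2f}")
```

A steady rise in the sample variances, here from about 5 to about 30, is the numerical counterpart of the fanning seen in parallel plots.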

Graphical model assessment techniques for cross-sectional data are complicated by repeated measurements. Links between observations within individuals usually need to be maintained to give plots meaning. There is some simplification when data are longitudinal because there is a natural ordering of observations. 'Parallel plots' plot repeated measures against time, with straight lines connecting observations within individuals. Weiss and Lazaro48 discuss the use of parallel plots in the linear RCR context.

When data are approximately normally distributed, parallel plots of repeated measurements can be used to suggest both mean structures and random coefficients in time. Fanning in parallel plots indicates variability increasing over time, which is characteristic of a random time effect. As the number of random effects increases, it may become harder to gain insight from parallel plots. For example, when there are both random intercept and random slope effects, fanning may be more easily seen when the baseline measurement is subtracted from all subsequent outcomes, because this removes the variation due to the random intercepts.

When data are non-Gaussian, parallel plots of the outcome may not be helpful. Certain types of data, for example binary data, will not provide useful raw data plots. Even when raw data plots are potentially informative, both the marginal mean and variance are required to know what could be expected in parallel plots. Because parallel plots show both the mean and variance structure these plots may be difficult to interpret when the variance is a function of the mean.

If there are enough observations within subjects, within-individual analyses can be used to check model assumptions and guide model selection. This method is useful for both linear and generalized linear RCR models. Modelling outcomes within individuals estimates subject specific effects. If the RCR model approach is appropriate and the mean structure has been correctly specified, within-individual models using the same mean structure should generally fit well. The sample distribution of within-subject regression coefficients can be used to check distributional assumptions placed on random effects. Probability plots of estimated random effects can also be used to assess their assumed normality. Lange and Ryan49 show that the limiting distribution of the estimated empirical c.d.f. of standardized random effects is standard normal when ML or REML is used for estimation.
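A probability plot compares sorted standardized estimates with normal quantiles. The Python sketch below summarizes such a plot by the correlation between the two, which should be close to 1 under normality. This is only a descriptive summary, not the limiting result of Lange and Ryan; Blom's plotting positions, the function names and the simulated "slopes" are our own assumptions.

```python
import random
import statistics


def normal_scores(n):
    """Approximate expected N(0,1) order statistics using Blom's plotting positions."""
    nd = statistics.NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_correlation(estimates):
    """Pearson correlation between sorted standardized estimates and normal
    scores; values close to 1 are consistent with a normal distribution."""
    x = sorted(estimates)
    mean, sd = statistics.mean(x), statistics.stdev(x)
    z = [(v - mean) / sd for v in x]
    q = normal_scores(len(z))
    qbar = statistics.mean(q)
    num = sum((a - qbar) * b for a, b in zip(q, z))  # z already has mean 0
    den = (sum((a - qbar) ** 2 for a in q) * sum(b * b for b in z)) ** 0.5
    return num / den


# Hypothetical within-individual slope estimates: one normal, one skewed
rng = random.Random(3)
slopes_normal = [rng.gauss(-0.9, 0.4) for _ in range(200)]
slopes_skewed = [rng.expovariate(1.0) for _ in range(200)]
r_normal = probability_plot_correlation(slopes_normal)
r_skewed = probability_plot_correlation(slopes_skewed)
```

Plotting the pairs `(normal_scores(n), sorted z)` gives the usual probability plot; the correlation is just a compact way to compare several sets of estimated coefficients.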

5.2. Model selection

The choice of random and fixed effects in an RCR model can be aided by a combination of graphical examination of the data, within-individual analysis results and statistical testing.

The distribution of estimated regression coefficients based on within-individual analysis can be used to guide selection of fixed and random effects. One can use the distribution of estimated intercept effects across covariates to investigate possible main effects and the distribution of estimated slopes to investigate interaction effects. Waternaux et al.50 plot random effects estimated from linear RCR models against excluded variables, or against included variables that are suspected to interact with random effects. This is analogous to examining the joint distributions of estimated coefficients from within-individual analyses. One potential disadvantage of their approach is that it requires fitting the RCR model.

Statistical tests associated with RCR models include Wald-type tests of the fixed effects and likelihood ratio tests for nested models. When parameters are estimated using ML, likelihood ratio tests are useful for selecting both random and fixed effects. Schluchter51 warns that likelihood ratio tests for fixed effects are not appropriate when REML estimation is used, because the restricted likelihood being maximized does not explicitly include the fixed effects. Tests of the fixed effects based on the asymptotic standard normality of $\hat{\alpha}/\sqrt{\mathrm{var}(\hat{\alpha})}$ are biased if they use biased variance estimators.
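When models are fitted by ML, the likelihood ratio statistic for nested models is $2(\ell_{\text{full}} - \ell_{\text{reduced}})$, referred to a chi-square distribution whose degrees of freedom equal the number of parameters dropped. The Python sketch below computes the p-value from the series expansion of the regularized incomplete gamma function, so it needs only the standard library. The log-likelihood values are hypothetical; following the caution above about REML, they should come from ML fits when the models differ in their fixed effects.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df/2 and z = x/2; adequate for moderate x."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Likelihood ratio statistic for nested ML fits and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# Hypothetical maximized ML log-likelihoods; the reduced model drops 2 fixed effects
stat, p_value = likelihood_ratio_test(-1200.4, -1203.1, df=2)
```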

5.3. Model verification

In the section below we outline residual analysis and detection of outlying and influential observations.

5.3.1. Residual analysis

Parallel residual plots provide useful information about fitted linear RCR models. There are two types of errors in RCR models. Marginal errors, $\epsilon_i^{*} = y_i - E(y_i)$, are estimated by residuals $\hat{r}_i = y_i - \hat{E}(y_i) = y_i - X_i\hat{\alpha}$. Marginal residuals remove only the estimated fixed effects from observed values. Conditional errors, $e_i = y_i - E(y_i \mid \beta_i)$, are estimated by residuals $\hat{e}_i = y_i - \hat{E}(y_i \mid \hat{\beta}_i) = y_i - X_i\hat{\alpha} - Z_i\hat{\beta}_i$. Conditional residuals remove both the estimated fixed and random effects from observed values.
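For a model with a random intercept only ($Z_i$ a column of ones, $D = d$, $R_i = \sigma^2 I$), the standard empirical Bayes estimate $\hat{\beta}_i = D Z_i' V_i^{-1}(y_i - X_i\hat{\alpha})$ reduces to a shrunken sum of the marginal residuals, which makes both kinds of residuals easy to compute. The Python sketch below assumes the fitted values $X_i\hat{\alpha}$ and the variance parameters are already available; the function name and the numbers are hypothetical.

```python
def residuals_random_intercept(y, fixed_fit, d, s2):
    """Marginal and conditional residuals for one individual under a
    random-intercept model: Z_i is a column of ones, D = d, R_i = s2 * I."""
    n = len(y)
    r = [obs - fit for obs, fit in zip(y, fixed_fit)]  # marginal: y_i - X_i alpha_hat
    # PEB estimate beta_hat_i = D Z_i' V_i^{-1} r_i, where 1' V_i^{-1} = 1' / (s2 + n d)
    beta_hat = d * sum(r) / (s2 + n * d)
    e = [ri - beta_hat for ri in r]  # conditional: y_i - X_i alpha_hat - Z_i beta_hat_i
    return r, beta_hat, e


# Hypothetical values for one individual: observed outcomes and fitted X_i alpha_hat
y_i = [31.0, 29.5, 28.2, 27.9]
fixed_fit_i = [30.0, 29.0, 28.0, 27.0]
r, beta_hat, e = residuals_random_intercept(y_i, fixed_fit_i, d=2.0, s2=1.0)
```

Because of the shrinkage, $\hat{\beta}_i$ is smaller in magnitude than the mean marginal residual, and the conditional residuals of an individual sum to $\sigma^2/(\sigma^2 + n_i d)$ times the sum of the marginal residuals.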

Residual analysis is complex for GL-RCR models. A large part of the problem is that the marginal distribution $f(y)$ is generally not analytic. If the marginal expectation can be approximated, then marginal residuals can be calculated. However, marginal residuals are uninformative when their expected distribution cannot be estimated. In the GL-RCR context, conditional residuals make sense only when they are examined within cluster, because the variance may depend on the cluster mean. These complications are superimposed on the difficulties associated with residual analysis for cross-sectional generalized linear models.5

The underlying distribution of residuals can be derived when multivariate normal data are fit using the linear RCR model. Both marginal and conditional residuals have zero expectation. The variance of marginal residuals is given by

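$$\mathrm{var}(\hat{r}_i) = V_i + X_i\,\mathrm{var}(\hat{\alpha})\,X_i' - X_i\,\mathrm{cov}(\hat{\alpha}, y_i) - \mathrm{cov}(y_i, \hat{\alpha})\,X_i'.$$

If $\mathrm{var}(y_i) = V_i$ is known, then $\hat{r}_i$ is multivariate normal and its variance simplifies to

$$\mathrm{var}(\hat{r}_i) = V_i - X_i\,\mathrm{var}(\hat{\alpha})\,X_i'.$$

As the estimate $\hat{\alpha}$ becomes more precise, $\mathrm{var}(\hat{\alpha})$ tends to 0 and $\mathrm{var}(\hat{r}_i)$ converges to $V_i$.

Conditional residuals are a linear function of the marginal residuals, $\hat{e}_i = (I - Z_i D Z_i' V_i^{-1})\,\hat{r}_i$. The variance of conditional residuals is complicated:

$$\mathrm{var}(\hat{e}_i) = \mathrm{var}(\hat{r}_i) + Z_i\,\mathrm{var}(\hat{\beta}_i)\,Z_i' - \mathrm{cov}(\hat{r}_i, \hat{\beta}_i)\,Z_i' - Z_i\,\mathrm{cov}(\hat{\beta}_i, \hat{r}_i).$$

If $V_i$, $D$ and $R_i$ are known, then $\hat{e}_i$ is multivariate normal with

$$\mathrm{var}(\hat{e}_i) = (I - Z_i D Z_i' V_i^{-1})\,\mathrm{var}(\hat{r}_i)\,(I - V_i^{-1} Z_i D Z_i').$$

As estimates of both $\alpha$ and $\beta_i$ become more precise, $\mathrm{var}(\hat{\alpha})$ tends to 0 and $\mathrm{var}(\hat{\beta}_i)$ tends to $D$ for all $i$, so that $\mathrm{var}(\hat{e}_i)$ converges to $R_i$.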
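The simplification of $\mathrm{var}(\hat{r}_i)$ when $V_i$ is known can be verified directly, assuming $\hat{\alpha}$ is the generalized least squares estimator computed with the true $V_i$ and that individuals are independent:

$$
\begin{aligned}
\hat{\alpha} &= \Big(\sum_{k=1}^{m} X_k' V_k^{-1} X_k\Big)^{-1} \sum_{k=1}^{m} X_k' V_k^{-1} y_k,
\qquad \mathrm{var}(\hat{\alpha}) = \Big(\sum_{k=1}^{m} X_k' V_k^{-1} X_k\Big)^{-1},\\
\mathrm{cov}(\hat{\alpha}, y_i) &= \mathrm{var}(\hat{\alpha})\, X_i' V_i^{-1}\, \mathrm{var}(y_i) = \mathrm{var}(\hat{\alpha})\, X_i',\\
\mathrm{var}(\hat{r}_i) &= V_i + X_i\,\mathrm{var}(\hat{\alpha})\,X_i' - X_i\,\mathrm{var}(\hat{\alpha})\,X_i' - X_i\,\mathrm{var}(\hat{\alpha})\,X_i' = V_i - X_i\,\mathrm{var}(\hat{\alpha})\,X_i'.
\end{aligned}
$$

Because $X_i\,\mathrm{var}(\hat{\alpha})\,X_i'$ is non-negative definite, this also shows why marginal residuals are less variable than the observations in finite samples.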


When the variance parameters are estimated, the distribution of residuals may be non-Gaussian. Louis52 suggests transforming residuals to (approximate) independence using estimated covariance matrices and then applying the usual residual analysis methods. Plots of untransformed residuals that preserve time ordering and the connections within subjects are also informative. Intuitively, parallel plots of linear RCR marginal residuals show the marginal covariance structure, $V_i$, induced by the random effects. For example, if there is a single random effect, time, then the spread of the parallel plots of marginal residuals should increase with time. In finite samples, the variance of $\hat{r}_i$ will be less than the variance of the original observations. The formulae for the conditional residuals, $\hat{e}_i$, provide fewer clues about the sample behaviour of plots. Intuitively, parallel plots of $\hat{e}_i$ should fluctuate randomly about zero.

5.3.2. Outlying observations

In longitudinal data there are two general types of outliers: outlying individuals and outlying observations within individuals. Outlying individuals may show patterns that are different from other subjects under study. When data are approximately normal, these individuals can be identified in parallel plots. Outlying individuals can also be found by examining within-individual analysis results. They may have fitted regression parameters that are very different from most others, or the regression model may not fit. Outliers within individuals are observations that are unusual within a given individual’s set of data. If the within-individual outlier is not influential, within-individual analysis results will not help to find it. When data are fit using a linear RCR model, within-individual outliers may be found by examining parallel plots of RCR conditional residuals.

5.3.3. Influential observations

Influential observations are in many ways more important than residuals. Liski53 derives influence statistics for a mixed-model analysis of variance (ANOVA) based on the Lawley-Hotelling trace statistic. Liski considers both influential cases and influential observations. Beckman et al.54 also discuss influence measures for the mixed-model ANOVA. De Gruttola et al.55 derive approximate deletion influence measures and leverage based on a three-step method for fitting linear RCR models. These measures assess influence arising from each step: OLS regression, estimation of the covariance matrix from the OLS residuals, and GLS estimation given that estimated covariance matrix. McCullagh and Nelder5 recommend deletion methods for detecting outlying and influential observations in generalized linear models. At present, deletion influence measures are not practical for GL-RCR models because of the computational effort required for estimation. Observations in RCR models may be influential in the estimation of the mean function, the covariance function, or both.

6. AN APPROACH TO DATA ANALYSIS

The first step in data analysis is to choose a model, or set of models that are appropriate to both the data and the research question. Random coefficient regression can be an intuitively appealing way to model longitudinal data. A basic assumption of RCR models is that part of the within-individual correlation is due to random within-individual effects. The validity of this assumption should strongly influence the inclusion of random effects.


Once candidate models have been chosen, the data are used to check model assumptions (Section 5.1). Linear RCR model assumptions, including a common regression model for all subjects and increasing variance with increasing values of random effects covariates, are relatively easy to check. Assumptions implied by GL-RCR models are difficult to assess unless there are enough observations within individuals to fit individual generalized linear regression models.

After initial screening of model assumptions, candidate models are fit and assessed. Consistent estimation of linear RCR fixed effects requires only correct specification of the mean function. When data are non-Gaussian, consistent estimates of PA effects can be obtained using GEEs when the marginal mean structure is correctly specified. In both cases, accurate models for the marginal variance improve the efficiency of PA effect estimates. Two methods for consistent estimation of the variance of fixed effects also require only correct specification of the marginal mean: the leave-one-out jackknife and a robust variance estimator.
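The two variance estimators named above can be sketched for the simplest case: an OLS (working independence) fit of a straight-line time trend to clustered data. This is a simplification under our own assumptions; in the paper they are applied to the fitted RCR model, which may weight by the estimated covariance. The robust (sandwich) form sums score contributions over individuals, and the jackknife refits the model leaving out one individual at a time. All names and data are hypothetical.

```python
import random


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def ols_line(clusters):
    """Working-independence (OLS) fit of y = a + b*t to the pooled (t, y) pairs.
    Returns the coefficients and (X'X)^{-1}."""
    pairs = [p for c in clusters for p in c]
    n = len(pairs)
    st = sum(t for t, _ in pairs)
    stt = sum(t * t for t, _ in pairs)
    sy = sum(y for _, y in pairs)
    sty = sum(t * y for t, y in pairs)
    det = n * stt - st * st
    inv = [[stt / det, -st / det], [-st / det, n / det]]
    coef = (inv[0][0] * sy + inv[0][1] * sty, inv[1][0] * sy + inv[1][1] * sty)
    return coef, inv


def sandwich_var(clusters):
    """Robust variance (X'X)^{-1} [sum_i X_i' r_i r_i' X_i] (X'X)^{-1}."""
    (a, b), inv = ols_line(clusters)
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for c in clusters:
        u = (sum(y - a - b * t for t, y in c),        # X_i' r_i, intercept component
             sum(t * (y - a - b * t) for t, y in c))  # X_i' r_i, slope component
        for j in range(2):
            for k in range(2):
                meat[j][k] += u[j] * u[k]
    return matmul2(matmul2(inv, meat), inv)


def jackknife_var(clusters):
    """Leave-one-individual-out jackknife variance of the OLS coefficients."""
    m = len(clusters)
    est = [ols_line(clusters[:i] + clusters[i + 1:])[0] for i in range(m)]
    avg = [sum(e[j] for e in est) / m for j in range(2)]
    return [[(m - 1) / m * sum((e[j] - avg[j]) * (e[k] - avg[k]) for e in est)
             for k in range(2)] for j in range(2)]


# Hypothetical data: 20 individuals measured at weeks 0..5, with individual-specific
# intercepts and slopes, so observations within an individual are correlated
rng = random.Random(11)
weeks = [0, 1, 2, 3, 4, 5]
clusters = []
for _ in range(20):
    b0, b1 = rng.gauss(0.0, 2.0), rng.gauss(0.0, 0.5)
    clusters.append([(t, 30.0 - t + b0 + b1 * t + rng.gauss(0.0, 1.0)) for t in weeks])

v_robust = sandwich_var(clusters)
v_jack = jackknife_var(clusters)
```

With identical measurement times for every individual, as here, the jackknife estimate equals the sandwich estimate multiplied by $m/(m-1)$, a convenient check on an implementation; with unbalanced designs the two typically differ slightly.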

The final stage of data analysis is interpretation of results. In the linear RCR model, fixed effects model both SS and PA effects. Random effects affect the model primarily through the covariance of the estimated fixed effects. In the GL-RCR model, fixed effects model SS effects. Estimated random effects can be used to describe the data. Approximate methods proposed by Kass and Steffey4 or bootstrap methods suggested by Laird and Louis56 can be used to estimate the variance of PEB estimates of random effects in the linear RCR model.

7. EXAMPLE

We return now to the data analysis problem. Data come from six clinics. Each administered a meal replacement plan to at least 50 individuals. By design these data have two distinct phases, weight loss and maintenance. The weight loss phase occurs during the first twelve weeks of study. During weight loss, subjects replace two meals a day, and their weight is scheduled to be measured once a week. The maintenance phase follows for up to two years after the weight loss phase, with weight scheduled to be measured once every two weeks. Because of clear differences in both the mean structure and covariance structure during the two study phases, we restrict our analyses to the weight loss phase.

Both the week numbers and dates of measurements are recorded. Week numbers are used when looking at observations across subjects; dates are used in analyses describing weight loss over time.

There are 71 men (23 per cent) and 230 women in the sample. The age of participants ranges from 21 to 60 years, with a median age of 38 years. Men tend to be slightly more overweight than women in terms of pounds over ideal weight (medians are 41 versus 37.5 pounds), and slightly less overweight than women in terms of per cent over ideal weight (medians are 26 per cent versus 29 per cent of ideal weight). Ninety per cent of subjects have the complete set of 12 weekly measurements over the weight loss phase. At the last measurement in the weight loss study phase, average weight lost is 14.1 pounds.

Per cent over ideal weight is approximately normally distributed across subjects at each time point. Each normal probability plot is near the 45° line. The variability of per cent over ideal weight tends to increase with time through the weight loss phase and stabilizes during the maintenance phase.

Parallel plots of the per cent over ideal weight suggest that a linear RCR model, with random time effects would be appropriate. Figure 1 shows parallel plots of per cent over ideal weight for the weight loss data from clinic 2.

Figure 1. Observed per cent over ideal weight by days in study, clinic 2 (plot not reproduced; horizontal axis 0 to 100 days)

Within-individual ordinary least squares models are fit using the linear regression model:

$$y_{ij} = b_{0i} + b_{1i} t_{ij} + b_{2i} t_{ij}^2 + \epsilon_{ij}, \qquad i = 1, \ldots, m; \quad j = 1, \ldots, n_i \le 12$$

where $y_{ij}$ is per cent over ideal weight for the $i$th individual on the $j$th occasion and $t_{ij}$ is the time (in days, rescaled to weeks) from baseline corresponding to the $j$th occasion. Only subjects with at least four observations are included in these analyses ($m = 298$).
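The within-individual fits can be sketched in Python without external libraries by solving the $3 \times 3$ normal equations for each subject. The helper names are hypothetical, the requirement of at least four observations mirrors the inclusion rule above, and $R^2$ is computed as $1 - \mathrm{SS}_{\text{res}}/\mathrm{SS}_{\text{tot}}$.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def fit_quadratic(t, y):
    """Within-individual OLS fit of y = b0 + b1*t + b2*t^2; returns (coefficients, R^2)."""
    if len(t) < 4:
        raise ValueError("need at least four observations per individual")
    X = [[1.0, ti, ti * ti] for ti in t]
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(3)] for j in range(3)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(3)]
    coef = solve3(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


# Hypothetical noise-free profile over weeks 0..11: the fit should recover it exactly
weeks = list(range(12))
profile = [28.0 - 0.9 * w + 0.03 * w * w for w in weeks]
coef, r2 = fit_quadratic(weeks, profile)
```

Applying `fit_quadratic` to each subject and collecting the coefficients gives the sample distributions of $b_{0i}$, $b_{1i}$ and $b_{2i}$ examined below.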

Within-individual OLS models fit most individuals' data well, as measured by the coefficient of multiple determination ($R^2$ statistic). These results show clear differences between clinic 1 and the remaining five clinics. Individuals from clinic 1 have a median $R^2$ of 99.1; among the remaining clinics the median $R^2$ statistics range from 84.13 to 88.30. The quadratic time effect model appears to fit well for most individuals' data, with the primary differences in the values of regression coefficients rather than in the form of the time effect.

OLS regression coefficients are approximately normally distributed, although the distribution of the quadratic week effect is more sharply peaked than would be expected for a Gaussian random variable. Again, there are differences between clinic 1 and the remaining five clinics, particularly in the estimated rates of weight loss. Clinic 1 has average values of $\bar{b}_1 = -2.02$ and $\bar{b}_2 = 0.07$, compared to $\bar{b}_1 = -0.96$ and $\bar{b}_2 = 0.25$ in the other clinics. These differences are not attributable to differences in the sample distribution of covariates. Average values of within-individual regression coefficients are similar among the remaining five clinics.

Data from clinic 1 are unusual. In addition to the within-subject OLS results, profile plots of clinic 1 data are unusually smooth. With the exception of clinic 1, we do not find any unusual observations. The parallel plots did not reveal any clear outliers, and none of the estimated regression coefficients were particularly unusual. Clinic 1 is currently under investigation for protocol violation. This clinic is dropped from all further analyses, leaving $m = 251$ individuals.

Table I. Estimated standard errors of fixed effect parameters under the linear RCR model for weight loss data

                                   Estimated standard error
Term         Estimate (α̂)    substitution    robust    jackknife
intercept       -2.281           0.665        0.658      0.680
t               -0.821           0.060        0.061      0.062
t²               0.015           0.004        0.004      0.004
y0               1.025           0.022        0.022      0.022
male             0.444           0.290        0.301      0.306
male-t          -0.436           0.127        0.120      0.123
male-t²          0.024           0.008        0.007      0.007

There are differences in estimated within-individual regression coefficients by gender. Average OLS estimates among women are $\bar{b}_0 = 28.26$, $\bar{b}_1 = -0.87$, $\bar{b}_2 = 0.02$; among men they are $\bar{b}_0 = 25.34$, $\bar{b}_1 = -1.27$, $\bar{b}_2 = 0.04$. Within-individual regression coefficients do not show correlation with age or baseline proportion over ideal weight ($y_0$), with one exception. The intercept term, $b_0$, is correlated with $y_0$, suggesting the inclusion of a fixed $y_0$ effect in RCR models.

The data analysis thus far indicates that an RCR model is appropriate for these data. We use the RCR conditional independence model. RCR model parameters are estimated using the Laird, Lange and Stram37 algorithm implemented by Cook and Stram.57

The initial RCR model includes an intercept, time ($t_i$), time-squared ($t_i^2$), baseline per cent over ideal weight ($y_0$), clinic, and age as fixed effects, with covariates (age, clinic and $y_0$) affecting only the intercept term. There is evidence that the rate of weight loss differs between men and women. These differences are modelled by including fixed gender-time interaction effects in the initial RCR model. We include random intercept, time and time-squared effects in the initial model. We fit the full parameter model, without gender effects, separately for men and women. From these models we find that the estimated random effect variance matrices and the marginal variance are similar for men and women.

Model selection is guided by examining changes in the likelihood ratio and Wald-type tests using the robust variance estimates (2). In the full data model, only one of the four clinic effects is significantly different from zero, and the age effect is also not significantly different from zero. Dropping age and clinic effects causes no change in the likelihood ratio. In the gender-specific models, we found that age is a significant effect among men. Because age affects men only through the intercept, we chose not to include age and age-gender fixed effects.

Parameter estimates of the final model are given in Table I with estimated standard errors. Standard errors of estimated fixed effects are estimated using three different variance estimators: the substitution variance estimator, the robust variance estimator (2), and a leave-one-out jackknife variance estimator.

Substitution, jackknife and robust variance estimates are nearly identical. It is not surprising that substitution estimates are reasonable in this linear RCR model because of the relatively large sample size. The similarity between these variance estimators suggests that this is an appropriate model for these data.
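As an illustration of the Wald-type tests of Section 5.2, the Python sketch below computes $z = \hat{\alpha}/\sqrt{\widehat{\mathrm{var}}(\hat{\alpha})}$ for the male-t term in Table I with each of the three standard errors, referring $z$ to the standard normal distribution. The variable names are ours; because the standard errors are so similar, all three statistics lead to the same conclusion.

```python
from statistics import NormalDist

# Male-t estimate and its three standard errors, copied from Table I
estimate = -0.436
standard_errors = {"substitution": 0.127, "robust": 0.120, "jackknife": 0.123}

results = {}
for name, se in standard_errors.items():
    z = estimate / se                            # Wald-type statistic
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))   # two-sided normal p-value
    results[name] = (z, p)
    print(f"{name:>12}: z = {z:6.2f}, p = {p:.4f}")
```

The statistics range from about -3.4 to -3.6, and every p-value is below 0.001.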


The estimated marginal variance is $\hat{\sigma}^2 = 0.951$ and the estimated random effects covariance matrix is

$$\hat{D} = \begin{pmatrix} 2.362 & -0.347 & 0.025 \\ -0.347 & 0.564 & -0.028 \\ 0.025 & -0.028 & 0.002 \end{pmatrix}.$$
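The reported estimates imply a variance for per cent over ideal weight at week $t$ of $z_t'\hat{D}z_t + \hat{\sigma}^2$ with $z_t = (1, t, t^2)'$, assuming $\hat{\sigma}^2$ is the within-individual error variance of the conditional independence model, that $\hat{D}$ is symmetric, and that $t$ is measured in weeks. The Python sketch below evaluates this for weeks 0 to 12. Because $\hat{D}$ is reported to three decimals and $\hat{d}_{22} = 0.002$ is multiplied by $t^4 = 20{,}736$ at week 12, the numbers are only rough, but they rise from about 3.3 at baseline to about 28 at week 12, consistent with the increasing variability described earlier.

```python
# Estimates reported above (time in weeks); D_hat is taken to be symmetric
D_hat = [[2.362, -0.347, 0.025],
         [-0.347, 0.564, -0.028],
         [0.025, -0.028, 0.002]]
sigma2_hat = 0.951


def implied_variance(t):
    """z' D_hat z + sigma2_hat with z = (1, t, t^2): implied var(y_ij) at week t."""
    z = [1.0, t, t * t]
    quad = sum(z[j] * D_hat[j][k] * z[k] for j in range(3) for k in range(3))
    return quad + sigma2_hat


variances = {week: implied_variance(week) for week in range(13)}
for week, v in variances.items():
    print(f"week {week:2d}: implied variance {v:6.2f}")
```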

The final model is verified using graphics. We examine a series of three sets of plots. Figure 2 shows these profile plots for clinic 2. Figure 2(a) shows parallel plots of fitted values. Fitted values appear to represent the observed data accurately. The parallel plot of marginal residuals is shown in Figure 2(b). Within-individual time effects vary about zero, with variability increasing with time. Figure 2(c) shows conditional residual parallel plots. Errors vary randomly about zero. There appear to be some large residuals, although the fitted value plots suggest that these have not been influential.

Deletion influence analysis was also performed at this stage. Although deletion of some individuals from the sample causes large changes in the value of the maximized likelihood or variance parameters, no individuals cause large changes in the estimated fixed effects. We do not delete any individuals based on these findings.

8. DISCUSSION

Random coefficient regression models can be a useful method for modelling longitudinal data. There is growing interest in linear RCR models that incorporate autocorrelation models for within-subject errors. Both random effects and autoregressive error models can be used to model the marginal covariance. Jones58 and Jones and Boadi-Boateng13 note that in small samples it may not be possible to distinguish between serial correlation arising from an AR(1) structure and correlation arising from random effects. Random effects and autocorrelation models are conceptually very different ways to model correlation between observations.

The AR(1) assumption cannot be checked easily. Chi and Reinsel12 derived a score test for comparing the conditional independence model to an AR(1) model for within-individual errors in linear RCR models. This test may have some problems because it is based exclusively on measurements separated by one standard time lag. The appropriateness of a linear RCR model, by contrast, can be easily checked. When the linear RCR model is appropriate, use of the conditional independence model brings us closer to the underlying covariance matrix, increasing the efficiency of fixed effects estimates relative to models that assume complete independence of observations.

The amount of data required to support RCR models is not known. Additional research is required to determine relationships between the number of random and fixed effects and the number of subjects and observations within subject. It is intuitive that the number of random effects the data can support depends on the number of observations within subjects as well as the number of subjects. It is sensible to have enough observations within individuals to permit within-individual analyses, especially when fitting GL-RCR models. The number of observations required for linear RCR models is further complicated when AR(1) structures are used for the marginal variance of the outcome.

Accurate estimation of the variance of estimated effects is a problem when the covariance structure is not known. Two methods for consistent estimation of the variance of PA effects are the leave-one-out jackknife and the robust variance estimator. These methods require only the correct specification of the marginal mean.

Figure 2. Final model, clinic 2: (a) fitted values, $X_i\hat{\alpha} + Z_i\hat{\beta}_i$, of per cent over ideal weight by days in study; (b) marginal residuals, $y_i - X_i\hat{\alpha}$, by day; (c) conditional residuals, $y_i - X_i\hat{\alpha} - Z_i\hat{\beta}_i$, by day. (Plots not reproduced; horizontal axis in each panel: days from baseline, 0 to 100.)

Generalized linear RCR models are much more difficult to fit, interpret, and assess than linear RCR models. GL-RCR models are only useful when SS effects are required. Because $\hat{\alpha}$ and $\hat{\theta}$ are asymptotically correlated in the GL-RCR model, accurate estimation of $\mathrm{var}(\hat{\alpha})$ must take into account the estimation of $\theta$. Gibbs sampling has made estimation of GL-RCR model parameters possible. Further research is needed to determine clearly the benefits and drawbacks of fitting GL-RCR models, and the robustness of Gibbs sampling to model assumptions.

REFERENCES

1. Metropolitan Insurance Companies ‘1983 Metropolitan Height and Weight Tables’, Statistical Bulletin, 64, 2-9 (1983).

2. Harville, D. A. ‘Maximum likelihood approaches to variance component estimation and to related problems’, Journal of the American Statistical Association, 72, 320-338 (1977).

3. Laird, N. M. and Ware, J. H. ‘Random-effects models for longitudinal data’, Biometrics, 38, 963-974 (1982).

4. Kass, R. E. and Steffey, D. ‘Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models)’, Journal of the American Statistical Association, 84, 717-726 (1989).

5. McCullagh, P. and Nelder, J. A. Generalized Linear Models, second edition, Chapman and Hall, New York, 1989.

6. Zeger, S. L., Liang, K. and Albert, P. S. ‘Models for longitudinal data: a generalized estimating equation approach’, Biometrics, 44, 1049-1060 (1988).

7. Neuhaus, J. M., Kalbfleisch, J. D. and Hauck, W. W. ‘A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data’, International Statistical Review, 59, 25-35 (1991).

8. Galbraith, J. I. ‘The interpretation of a regression coefficient’, Biometrics, 47, 1593-1596 (1991).

9. Gilmour, A. R., Anderson, R. D. and Rae, A. L. ‘The analysis of binomial data by a generalized linear mixed model’, Biometrika, 72, 503-509 (1985).

10. Zeger, S. L. and Karim, M. R. ‘Generalized linear models with random effects; a Gibbs sampling approach’, Journal of the American Statistical Association, 86, 79-86 (1991).

11. Diggle, P. J. ‘An approach to the analysis of repeated measurements’, Biometrics, 44, 959-971 (1988).

12. Chi, E. M. and Reinsel, G. C. ‘Models for longitudinal data with random effects and AR(1) errors’, Journal of the American Statistical Association, 84, 452-459 (1989).

13. Jones, R. H. and Boadi-Boateng, F. ‘Unequally spaced longitudinal data with AR(1) serial correlation’, Biometrics, 47, 161-175 (1991).

14. Liang, K. and Zeger, S. L. ‘Longitudinal data analysis using generalized linear models’, Biometrika, 73, 13-22 (1986).

15. Taylor, J. M. G., Cumberland, W. G. and Sy, J. P. ‘A stochastic model for analysis of longitudinal AIDS data’, UCLA Technical Report, 1992.

16. Harville, D. A. ‘Extension of the Gauss-Markov theorem to include the estimation of random effects’, The Annals of Statistics, 4, 384-395 (1976).

17. Morris, C. N. ‘Parametric empirical Bayes inference: theory and applications’, with discussion, Journal of the American Statistical Association, 78, 47-65 (1983).

18. Freedman, D. A. and Peters, S. C. ‘Bootstrapping a regression equation: some empirical results’, Journal of the American Statistical Association, 79, 97-106 (1984).

19. Ware, J. H. and De Gruttola, V. ‘Multivariate linear models for longitudinal data: a bootstrap study of the GLS estimator’, in Sen, P. K. (ed.), Biostatistics - Statistics in Biomedical, Public Health and Environmental Science (the Bernard Greenberg Volume), Elsevier, Amsterdam, 1985.

20. Vonesh, E. F. and Carter, R. L. ‘Efficient inference for random-coefficient growth curve models with unbalanced data’, Biometrics, 43, 617-628 (1987).

21. Reinsel, G. C. ‘Mean squared error properties of empirical Bayes estimators in a multivariate random effects general model’, Journal of the American Statistical Association, 80, 642-650 (1985).

22. Stiratelli, R., Laird, N. and Ware, J. H. ‘Random-effects models for serial observations with binary response’, Biometrics, 40, 961-971 (1984).

23. Kackar, R. N. and Harville, D. A. ‘Unbiasedness of two-stage estimation and prediction procedures for mixed linear models’, Communications in Statistics - Theory and Methods, 10, 1249-1261 (1981).

24. Berk, R. H. ‘Limiting behavior of posterior distributions when the model is incorrect’, The Annals of Mathematical Statistics, 37, 51-58 (1966).

25. Berk, R. H. ‘Consistency a posteriori’, The Annals of Mathematical Statistics, 41, 894-906 (1970).

26. Berger, J. O. Statistical Decision Theory and Bayesian Analysis, second edition, Springer-Verlag, New York, 1985.

27. Gelfand, A. E., Hills, S. E., Racine-Poon, A. and Smith, A. F. M. ‘Illustration of Bayesian inference in normal data models using Gibbs sampling’, Journal of the American Statistical Association, 85, 972-985 (1990).

28. Kackar, R. N. and Harville, D. A. ‘Approximations for standard errors of estimators of fixed and random effects in mixed linear models’, Journal of the American Statistical Association, 79, 853-862 (1984).

29. White, H. ‘Maximum likelihood estimation of misspecified models’, Econometrica, 50, 1-25 (1982).

30. Royall, R. M. ‘Model robust confidence intervals using maximum likelihood estimators’, International Statistical Review, 54, 221-226 (1986).

31. Shao, J. and Wu, C. F. J. ‘A general theory for jackknife variance estimation’, The Annals of Statistics, 17,

32. Lee, Y. ‘Jackknife variance estimators of the location estimator in the one-way random-effects model’, Annals of the Institute of Statistical Mathematics, 43, 707-714 (1991).

33. Moulton, L. H. and Zeger, S. L. ‘Analyzing repeated measures on generalized linear models via the bootstrap’, Biometrics, 45, 381-394 (1989).

34. Lipsitz, S. R., Laird, N. M. and Harrington, D. P. ‘Using the jackknife to estimate the variance of regression estimators from repeated measures studies’, Communications in Statistics - Theory and Methods, 19, 821-845 (1990).

35. Wu, C. F. J. ‘Jackknife, bootstrap and other resampling methods in regression analysis’, The Annals of Statistics, 14, 1261-1295 (1986).

36. Dempster, A. P., Laird, N. M. and Rubin, D. B. ‘Maximum likelihood from incomplete data via the EM algorithm’, Journal of the Royal Statistical Society, Series B, 39, 1-22 (1977).

1176-1197 (1989).

37. Laird, N., Lange, N. and Stram, D. ‘Maximum likelihood computations with repeated measures: application of the EM algorithm’, Journal of the American Statistical Association, 82, 97-105 (1987).

38. Jennrich, R. I. and Schluchter, M. D. ‘Unbalanced repeated-measures models with structured covariance matrices’, Biometrics, 42, 805-820 (1986).

39. Lindstrom, M. J. and Bates, D. M. ‘Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data’, Journal of the American Statistical Association, 83, 1014-1022 (1988).

40. Gerald, C. F. Applied Numerical Analysis, Addison-Wesley, Reading MA, 1970.

41. Jamshidian, M. and Jennrich, R. I. ‘Conjugate gradient acceleration of the EM algorithm’, UCLA Statistics Series, 48 (1990).

42. Lange, K. ‘A quasi-Newton acceleration of the EM algorithm’, unpublished manuscript, 1993.

43. Meng, X. and Rubin, D. B. ‘Maximum likelihood via the ECM algorithm’, Biometrika, (1993) in press.

44. Gelfand, A. E. and Smith, A. F. M. ‘Sampling-based approaches to calculating marginal densities’, Journal of the American Statistical Association, 85, 398-409 (1990).

45. Tierney, L. ‘Exploring posterior distributions using Markov chains’, Computing and Statistical Science: 23rd Symposium on the Interface, Keramidas, E. (ed.), 1991, pp. 563-570.

46. Gelman, A. and Rubin, D. B. ‘Inference from iterative simulation using multiple sequences’, Statistical Science, 7, 457-472 (1992).

47. Geyer, C. J. ‘Practical Markov chain Monte Carlo’, with comment, Statistical Science, 7, 473-511 (1992).

48. Weiss, R. E. and Lazaro, C. G. ‘Residual plots for repeated measures’, Statistics in Medicine, 11, 115-124 (1992).

49. Lange, N. and Ryan, L. ‘Assessing normality in random effects models’, The Annals of Statistics, 17, 624-642 (1989).

50. Waternaux, C., Laird, N. M. and Ware, J. H. ‘Methods for analysis of longitudinal data: blood-lead concentrations and cognitive development’, Journal of the American Statistical Association, 84, 33-41 (1989).

51. Schluchter, M. D. ‘Likelihood-based methods for the analysis of continuous longitudinal data’, presented at the 1992 Meetings of the Western North American Region of the Biometrics Society and Institute of Mathematical Statistics, Corvallis, Oregon, 1992.

52. Louis, T. A. ‘General methods for analysing repeated measures’, Statistics in Medicine, 7, 29-45 (1988).

53. Liski, E. P. ‘Detecting influential measurements in a growth curves model’, Biometrics, 47, 659-668 (1991).

54. Beckman, R. J., Nachtsheim, C. J. and Cook, R. D. ‘Diagnostics for mixed-model analysis of variance’, Technometrics, 29, 413-426 (1987).

55. De Gruttola, V., Ware, J. H. and Louis, T. A. ‘Influence analysis of generalized least squares estimators’, Journal of the American Statistical Association, 82, 911-917 (1987).

56. Laird, N. M. and Louis, T. A. ‘Empirical Bayes confidence intervals based on bootstrap samples’, Journal of the American Statistical Association, 82, 739-757 (1987).

57. Cook, N. and Stram, D. O. ‘Program REML’, Department of Biostatistics, Harvard School of Public Health, 1986.

58. Jones, R. H. ‘Serial correlation or random subject effects?’, Communications in Statistics - Simulation, 19, 1105-1123 (1990).