1 Date
Name, department
Statistical Analysis of Repeated Measures Data
Ziad Taib
Biostatistics, AZ
MV, CTH
March 2009
Linear Mixed Models for Longitudinal Data, Spring 2009
Time: Fridays 9:00-11:00 (lectures) and Tuesdays 15:00-17:00 (computer exercises)
Location: MVF21
Instructors: Ziad Taib (0707655471), [email protected] or [email protected], and Malin Östensson (031 772 53 16), [email protected]
Course Home Page: http://www.chalmers.se/math/SV/utbildning/grundutbildning/fristaende-kurser-ms/msa650
Textbook: Linear Mixed Models for Longitudinal Data, by Geert Verbeke and Geert Molenberghs (Springer-Verlag, New York), plus some handouts.
Exams: Computer assignments 20%, final exam 80%. More information on the final exam will be given later.
Policy on Homework: Computer Assignments are to be presented in a report, with programs, output, and statistical notation integrated. The quality of writing and organization, as well as content, will influence the grade. Homework may be turned in directly to Malin Östensson.
Tentative Schedule and Outline of Lectures:
March 27 Introduction to Linear Mixed Models (Chap 1-4)
April 3 Estimation and Inference for the marginal model (Chap 5-6)
April 24 Inference for the random effects, Software issues (Chap 7-8)
May 4 Model building and serial correlation (Chap 9-10)
May 8 Non-linear and Generalized Linear Mixed Models (Handouts)
May 15 Incomplete data (Chap 15-16)
May 20 Design and sample size issues (Chap 23)
May 26 Exam
Why longitudinal data?
Longitudinal data are very useful in their own right.
They also let us understand what mixed models are about in a relatively simple yet sufficiently rich context.
A good reference is the book "Designing Experiments and Analyzing Data" by Maxwell & Delaney (2004).
A motivating example
Consider a randomized clinical trial with two treatment groups and repeated measurements at baseline and at 3 and 6 months. As it turned out, some of the data were missing. Moreover, patients did not always comply with the scheduled measurement times. Our first reaction is to try to compensate for the missing values by some kind of imputation, or to use list-wise deletion.
Both "methods" have their shortcomings, so wouldn't it be nice to be able to use something else? There is in fact an alternative: mixed models.
With mixed models,
1. we can use all our data, with the attitude that "what is missing is missing";
2. we can account for the dependencies resulting from measurements made on the same individuals at different times;
3. we don't need the measurements to be taken at the same times for every patient.
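To make the cost of list-wise deletion concrete, here is a minimal Python sketch. The patient identifiers and measurement values are invented for illustration and do not come from any trial. It compares how many measurements survive list-wise deletion with how many are available in the "long" layout (one row per observed measurement) that mixed model software typically works with.

```python
# Hypothetical measurements for four patients at baseline, 3 and 6 months.
# None marks a missing value; all numbers are invented for illustration.
wide = {
    "p1": [5.1, 4.8, 4.5],
    "p2": [6.0, None, 5.2],
    "p3": [5.5, 5.3, None],
    "p4": [4.9, 4.7, 4.6],
}
months = [0, 3, 6]


def listwise_deletion(data):
    """Keep only patients with no missing measurement."""
    return {pid: ys for pid, ys in data.items() if None not in ys}


def to_long(data, times):
    """One row (patient, time, response) per observed measurement."""
    return [
        (pid, t, y)
        for pid, ys in data.items()
        for t, y in zip(times, ys)
        if y is not None
    ]


complete = listwise_deletion(wide)
long_rows = to_long(wide, months)
n_used_listwise = sum(len(ys) for ys in complete.values())
print(n_used_listwise, "measurements kept by list-wise deletion")
print(len(long_rows), "measurements available in long format")
```

In this toy data set list-wise deletion throws away two patients entirely, whereas the long layout keeps every observed value, which is the sense of "what is missing is missing".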
Outline of lecture 1
1. Two examples
2. Principles of inference
3. Modelling continuous longitudinal data
Mixed effects models
The ordinary fixed effects linear model usually assumes that
1) the observations are independent with the same variance,
2) the observations are normally distributed,
3) the parameters are constant:
$$Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I), \quad \beta \text{ constant}.$$
If we modify assumptions 1) and 3), the problem becomes more complicated, and in general we need a large number of parameters just to describe the covariance structure of the observations. Mixed effects models deal with this type of problem.
In general, models of this type allow us to tackle problems such as clustered data, repeated measures and hierarchical data.
Longitudinal Data
Repeated measures are obtained when a response is measured repeatedly on a set of units.
• Units:
• Subjects, patients, participants, . . .
• Individuals, plants, . . .
• Clusters: nests, families, towns, . . .
• . . .
• Special case: longitudinal data
Note: it is possible to handle several levels.
Various forms of models and relation between them
LM (linear model): Assumptions:
1. independence,
2. normality,
3. constant parameters
GLM (generalised linear model): assumption 2) is replaced by an exponential family distribution
LMM (linear mixed model): assumptions 1) and 3) are modified
GLMM (generalised linear mixed model): assumption 2) is replaced by an exponential family distribution, and assumptions 1) and 3) are modified
Non-linear models
Repeated measures: assumptions 1) and 3) are modified
Longitudinal data
Approaches to inference:
Maximum likelihood
Classical statistics (observations are random, parameters are unknown constants)
Bayesian statistics
Example 1: Rat Data (Verbeke et al.)
Research question: How does craniofacial growth in the Wistar rat depend on testosterone production?
• Randomized experiment in which 50 male Wistar rats are randomized to:
• Control (15 rats)
• Low dose of Decapeptyl (18 rats)
• High dose of Decapeptyl (17 rats)
Decapeptyl prevents the production of testosterone.
Treatment starts at the age of 45 days. Measurements are taken every 10 days, from day 50 on.
The responses are distances (in pixels) between two well-defined points on X-ray pictures of the skull of each rat. Here, we consider only one response, reflecting the height of the skull.
[Figure: time axis in days, marked at 45, 50, 60, 70 and 80]
Individual profiles:
1. Connected profiles are better than scatter plots.
2. Growth is expected, but is it linear?
3. Of interest: change over time (i.e. the relationship between response and age).
Complication: many dropouts due to anaesthesia. These imply less power but no bias.
Without dropouts the problem would be easier, because the data would be balanced.
Remarks:
• Much variability between rats
• Much less variability within rats
• Fixed number of measurements scheduled per subject, but not all measurements available due to dropout, for a known reason
• Measurements taken at fixed time points
Research question: How does craniofacial growth in the Wistar rat depend on testosterone production?
Example 2: The BLSA Prostate Data (Pearson et al., Statistics in Medicine, 1994).
Prostate disease is one of the most common and most costly medical problems in the world. It is important to look for biomarkers which can detect the disease at an early stage.
Prostate-Specific Antigen (PSA) is an enzyme produced by both normal and cancerous prostate cells. It is believed that the PSA level is related to the volume of prostate tissue.
Problem: Patients with Benign Prostatic Hyperplasia (BPH) also have an increased PSA level.
The overlap in PSA distribution for cancer and BPH cases seriously complicates the detection of prostate cancer.
Prostate-specific antigen (PSA) is a glycoprotein in the cytoplasm of prostatic epithelial cells. It can be detected in the blood of all adult men. The PSA level is increased in men with prostate cancer but can also be increased somewhat in other disorders of the prostate.
Research question: Can longitudinal PSA profiles be used to detect prostate cancer at an early stage?
A retrospective case-control study based on frozen serum samples:
• 16 control patients
• 20 BPH cases
• 14 local cancer cases
• 4 metastatic cancer cases
Remarks:
• Much variability between subjects
• Little variability within subjects
• Highly unbalanced data
Research question: Can longitudinal PSA profiles be used to detect prostate cancer at an early stage?
Fisher likelihood
Inference for observable y and fixed parameter θ.
Data generation: Given a stochastic model $f_\theta(y)$, generate data, y, from $f_\theta(y)$.
Parameter estimation: Given the data y, make inference about θ by using the likelihood $L(\theta \mid y)$.
Connection between the two processes:
$$L(\theta \mid y) = f_\theta(y)$$
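The two processes can be sketched in a few lines of Python. The Bernoulli model, the true value θ = 0.3, the sample size and the grid are all assumptions chosen for illustration. The same function $f_\theta(y)$ is used once to generate data for a fixed θ and once, read as a function of θ for fixed y, as the likelihood.

```python
import random


def simulate(theta, n, rng):
    """Data generation: draw n independent Bernoulli(theta) observations."""
    return [1 if rng.random() < theta else 0 for _ in range(n)]


def likelihood(theta, y):
    """L(theta | y) = f_theta(y) for independent Bernoulli observations."""
    successes = sum(y)
    return theta ** successes * (1 - theta) ** (len(y) - successes)


rng = random.Random(2009)           # fixed seed so the run is reproducible
y = simulate(theta=0.3, n=50, rng=rng)

# Parameter estimation: maximise the likelihood over a grid of theta values.
grid = [k / 1000 for k in range(1, 1000)]
theta_hat = max(grid, key=lambda th: likelihood(th, y))

print("sample proportion:", sum(y) / len(y))
print("grid maximiser   :", theta_hat)
```

For this model the maximiser of the likelihood is the sample proportion, so the two printed values should agree up to the grid resolution.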
(Classical) Likelihood Principle
Birnbaum (1962): All the evidence or information about the parameters in the data is in the likelihood.
Conditionality principle & sufficiency principle ⇒ likelihood principle
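Bayesian inference
Inference for observable y and unobservable ν.
Data generation: Generate data according to
1. ν, from the prior $f(\nu)$;
2. for ν fixed, generate y from $f(y \mid \nu)$.
Combine into the joint density $f(\nu) f(y \mid \nu)$.
Parameter estimation: Given the data y, make inference about ν by using the posterior $f(\nu \mid y)$.
The connection between the two processes:
$$f(\nu) f(y \mid \nu) = f(y) f(\nu \mid y)$$
Compare with $L(\theta \mid y)$. The connection follows from the definition of a conditional density:
$$f(\nu \mid y) = \frac{f(y, \nu)}{f(y)} \;\Rightarrow\; f(y, \nu) = f(y) f(\nu \mid y) = f(\nu) f(y \mid \nu)$$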
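A small discrete example may help to fix the roles of prior, joint, marginal and posterior. The three candidate values of ν and their prior weights below are invented for illustration; the observable is a single Bernoulli outcome.

```python
# Prior f(nu) on three hypothetical values of a success probability nu.
prior = {0.2: 0.5, 0.5: 0.3, 0.8: 0.2}


def lik(y, nu):
    """f(y | nu) for a single Bernoulli observation y in {0, 1}."""
    return nu if y == 1 else 1 - nu


y = 1                                                      # observed data
joint = {nu: p * lik(y, nu) for nu, p in prior.items()}    # f(nu) f(y | nu)
marginal = sum(joint.values())                             # f(y)
posterior = {nu: j / marginal for nu, j in joint.items()}  # f(nu | y)

for nu in prior:
    # Check f(nu) f(y | nu) = f(y) f(nu | y) up to rounding error.
    assert abs(joint[nu] - marginal * posterior[nu]) < 1e-12

print("f(y)       =", marginal)
print("f(nu | y)  =", posterior)
```

Observing a success shifts posterior weight towards the larger values of ν, while the identity between the two factorizations of the joint density holds for every ν.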
Extended likelihood inference (Lee and Nelder): for observable y, fixed parameter θ and unobservable ν
Extended Likelihood Principle
Björnstad (1996): All information in the data about the unobservables and the parameters is in the "likelihood".
Conditionality principle & sufficiency principle ⇒ likelihood principle
Bayesian Predictive Inference
Given ν, the observations y are assumed to be independent. How do we predict the next value, Y, of the observable? In a Bayesian setting we may determine the posterior $f(\nu \mid y)$ and define the predictive density of Y given y as:
$$f_Y(x \mid y) = \int f(x \mid \nu)\, f(\nu \mid y)\, d\nu$$
Note: Jeffreys' priors.
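As a numerical sketch of the predictive density, assume Bernoulli observations with success probability ν and a uniform prior on (0, 1); both choices, and the data (7 successes in 10 trials), are assumptions made for illustration. The posterior is approximated on a grid and the predictive probability of a success is the posterior-weighted average of $f(1 \mid \nu) = \nu$.

```python
def posterior_on_grid(successes, n, m=2000):
    """Posterior f(nu | y) on a midpoint grid, assuming a uniform prior."""
    grid = [(k + 0.5) / m for k in range(m)]
    weights = [nu ** successes * (1 - nu) ** (n - successes) for nu in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def predictive_success(successes, n):
    """P(next Y = 1 | y) = sum over nu of f(1 | nu) f(nu | y)."""
    grid, post = posterior_on_grid(successes, n)
    return sum(nu * p for nu, p in zip(grid, post))


p_next = predictive_success(successes=7, n=10)
print("predictive probability of a success:", p_next)
```

Under these assumptions the exact answer is (7 + 1) / (10 + 2), which differs from the plug-in estimate 7/10 because the predictive density averages over the remaining uncertainty about ν.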
Introduction
In practice, data are often unbalanced due to
(i) unequal numbers of measurements per subject,
(ii) measurements not taken at fixed time points.
Therefore, ordinary multivariate regression techniques are often not applicable.
Often, subject-specific longitudinal profiles can be well approximated by linear regression functions. This leads to a 2-stage model formulation:
Stage 1: A linear regression model for each subject separately
Stage 2: Explain variability in the subject-specific regression coefficients using known covariates
A 2-stage Model Formulation: Stage 1
• Response $Y_{ij}$ for the ith subject, measured at time $t_{ij}$, i = 1, . . . , N, j = 1, . . . , $n_i$ (possibly after some convenient transformation)
• Response vector $Y_i$ for the ith subject:
$$Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})'$$
• Stage 1 model:
$$Y_i = Z_i \beta_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \Sigma_i), \quad \text{often } \Sigma_i = \sigma^2 I_{n_i}$$
$Z_i$ is an ($n_i \times q$) matrix of known covariates and $\beta_i$ is a q-dimensional vector of subject-specific regression coefficients.
Note that the above model describes the observed variability within subjects.
Stage 2
Between-subject variability can now be studied by relating the parameters $\beta_i$ to known covariates:
$$\beta_i = K_i \beta + b_i$$
$K_i$ is a (q x p) matrix of known covariates and β is a p-dimensional vector of unknown regression parameters.
Finally,
$$b_i \sim N(0, D),$$
where D is the (q x q) covariance matrix of the random effects.
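The General Linear Mixed-effects Model
The two stages of the 2-stage approach can now be combined into one model:
$$Y_i = Z_i K_i \beta + Z_i b_i + \varepsilon_i = X_i \beta + Z_i b_i + \varepsilon_i, \quad X_i = Z_i K_i$$
Here $X_i \beta$ describes the average evolution and $Z_i b_i$ the subject-specific deviation from it.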
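The combined model can be simulated directly. The sketch below assumes the simplest case, a random intercept and a random slope with $X_i = Z_i$ having rows $(1, t_{ij})$ and $\Sigma_i = \sigma^2 I$; the function name `simulate_subject` and all numerical values are hypothetical and chosen only for illustration.

```python
import random


def simulate_subject(times, beta, chol_D, sigma, rng):
    """Draw Y_i = X_i beta + Z_i b_i + eps_i with X_i = Z_i = rows (1, t_ij)."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    # b_i ~ N(0, D), built from a lower-triangular factor chol_D of D.
    b1 = chol_D[0][0] * z1
    b2 = chol_D[1][0] * z1 + chol_D[1][1] * z2
    y = []
    for t in times:
        average = beta[0] + beta[1] * t      # X_i beta: average evolution
        deviation = b1 + b2 * t              # Z_i b_i: subject-specific part
        y.append(average + deviation + rng.gauss(0, sigma))  # plus eps_ij
    return y


rng = random.Random(1)
times = [0.0, 0.5, 1.0, 1.5]
profiles = [
    simulate_subject(times, beta=[68.0, 7.0],
                     chol_D=[[1.8, 0.0], [0.1, 0.4]], sigma=1.2, rng=rng)
    for _ in range(5)
]
for profile in profiles:
    print([round(v, 1) for v in profile])
```

Each printed row is one subject's profile: all subjects share the same average line, but each is shifted and tilted by its own draw of $b_i$.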
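The multivariate normal distribution makes this convenient; it is very difficult with other distributions.
The general mixed effects model can be summarized by:
$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \quad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i),$$
with $b_1, \ldots, b_N, \varepsilon_1, \ldots, \varepsilon_N$ independent.
Terminology:
• Fixed effects: β
• Random effects: $b_i$
• Variance components: elements in D and $\Sigma_i$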
Remarks
1. It is occasionally unclear whether we should treat an effect as fixed or random. For example, in clinical trials with treatment and clinic as "factors", should we consider clinics as random?
2. Considering the general form of a mixed effects model,
$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i,$$
notice that the fixed effects are involved only in the mean values (just as in ordinary linear models), while the random effects modify the covariance matrix of the observations.
Transformation of the time scale to linearize the profiles:
$$\text{Age}_{ij} \rightarrow t_{ij} = \ln\left[1 + \frac{\text{Age}_{ij} - 45}{10}\right]$$
Note that t = 0 corresponds to the start of the treatment (moment of randomization).
• Stage 1 model:
$$Y_{ij} = \beta_{1i} + \beta_{2i} t_{ij} + \varepsilon_{ij}, \quad j = 1, \ldots, n_i$$
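The transformation is easy to check numerically. In the sketch below the function name `transformed_time` is introduced for illustration, and the list of ages simply follows the stated schedule (every 10 days from day 50).

```python
import math


def transformed_time(age_days):
    """t = ln(1 + (Age - 45) / 10); t = 0 at the start of treatment (day 45)."""
    return math.log(1 + (age_days - 45) / 10)


ages = [50, 60, 70, 80]
times = [transformed_time(a) for a in ages]
for age, t in zip(ages, times):
    print(f"age {age} days -> t = {t:.3f}")
```

Because of the logarithm, equal 10-day steps in age correspond to shrinking steps in t, which is what straightens out profiles that flatten with age.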
Stage 2 model:
In the second stage, the subject-specific intercepts and time effects are related to the treatment of the rats
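A minimal sketch of such a stage 2 model, assuming the parameterization used by Verbeke and Molenberghs (2000) for these data. The indicators $L_i$, $H_i$, $C_i$ (low dose, high dose, control) and the coefficient labels $\beta_0, \ldots, \beta_3$ are notation assumed here for illustration:

```latex
\begin{aligned}
\beta_{1i} &= \beta_0 + b_{1i}, \\
\beta_{2i} &= \beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i}
\end{aligned}
```

Under this form all groups share the average intercept $\beta_0$, which is plausible because treatment starts at t = 0 after randomization, while each group has its own average slope.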
The hierarchical versus the marginal Model
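The general mixed model is given by
$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \quad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i).$$
It can be written as
$$Y_i \mid b_i \sim N(X_i \beta + Z_i b_i, \Sigma_i), \quad b_i \sim N(0, D).$$
It is therefore also called a hierarchical model.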
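The marginal counterpart is obtained by integrating the random effects out of the hierarchical formulation. As a sketch, assuming $b_i$ and $\varepsilon_i$ are independent and normally distributed, and writing $V_i$ (a symbol introduced here) for the marginal covariance matrix:

```latex
Y_i \sim N(X_i \beta,\; V_i), \qquad V_i = Z_i D Z_i' + \Sigma_i
```

The mean involves only the fixed effects, whereas the random effects enter only through the covariance matrix $V_i$.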
Example: The Rat Data
Linear model where each rat has its own intercept and its own slope
Can be negative or positive, reflecting individual deviation from the average
Notice that the model assumes that the variance function is quadratic over time.
Comments:
• Linear average evolution in each group
• Equal average intercepts
• Different average slopes
Moreover, taking
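$$\begin{aligned}
\operatorname{Cov}(Y_i(t_1), Y_i(t_2))
&= \operatorname{Cov}(\beta_{1i} + \beta_{2i} t_1 + \varepsilon_{i1},\; \beta_{1i} + \beta_{2i} t_2 + \varepsilon_{i2}) \\
&= \begin{pmatrix} 1 & t_1 \end{pmatrix} \operatorname{cov}\begin{pmatrix} \beta_{1i} \\ \beta_{2i} \end{pmatrix} \begin{pmatrix} 1 \\ t_2 \end{pmatrix} + \operatorname{cov}(\varepsilon_{i1}, \varepsilon_{i2}) \\
&= \begin{pmatrix} 1 & t_1 \end{pmatrix} \begin{pmatrix} d_{11} & d_{12} \\ d_{12} & d_{22} \end{pmatrix} \begin{pmatrix} 1 \\ t_2 \end{pmatrix} + \operatorname{cov}(\varepsilon_{i1}, \varepsilon_{i2}) \\
&= d_{22} t_1 t_2 + d_{12} t_1 + d_{12} t_2 + d_{11} + \operatorname{cov}(\varepsilon_{i1}, \varepsilon_{i2}) \\
&= d_{22} t_1 t_2 + d_{12} (t_1 + t_2) + d_{11} + \operatorname{cov}(\varepsilon_{i1}, \varepsilon_{i2})
\end{aligned}$$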
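The closed form can be evaluated directly. The sketch below additionally assumes $\Sigma_i = \sigma^2 I$, so that $\operatorname{cov}(\varepsilon_{i1}, \varepsilon_{i2})$ equals $\sigma^2$ when $t_1 = t_2$ and 0 otherwise; the values of D and $\sigma^2$ are hypothetical.

```python
def cov_y(t1, t2, D, sigma2):
    """Cov(Y_i(t1), Y_i(t2)) = (1, t1) D (1, t2)' + sigma2 * [t1 == t2]."""
    d11, d12, d22 = D[0][0], D[0][1], D[1][1]
    random_part = d22 * t1 * t2 + d12 * (t1 + t2) + d11
    return random_part + (sigma2 if t1 == t2 else 0.0)


def var_y(t, D, sigma2):
    """Variance function: d22 t^2 + 2 d12 t + d11 + sigma2, quadratic in t."""
    return cov_y(t, t, D, sigma2)


D = [[4.0, 0.5], [0.5, 1.0]]   # hypothetical covariance matrix of the random effects
sigma2 = 2.0                    # hypothetical residual variance

for t in [0.0, 1.0, 2.0]:
    print(f"Var(Y(t={t})) = {var_y(t, D, sigma2)}")
print("Cov(Y(1), Y(2)) =", cov_y(1.0, 2.0, D, sigma2))
```

Setting $t_1 = t_2 = t$ gives a variance of the form $d_{22} t^2 + 2 d_{12} t + d_{11} + \sigma^2$, which is the quadratic variance function that the random intercept and slope model implies.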
The prostate data
A model for the prostate cancer: Stage 1
$$Y_{ij} = \ln(\text{PSA}_{ij} + 1) = \beta_{1i} + \beta_{2i} t_{ij} + \beta_{3i} t_{ij}^2 + \varepsilon_{ij}, \quad j = 1, \ldots, n_i$$
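In matrix form the stage 1 model reads $Y_i = Z_i \beta_i + \varepsilon_i$ with rows $(1, t_{ij}, t_{ij}^2)$ in $Z_i$. A short sketch of the transformation and the design matrix follows; the function names and the numerical inputs are hypothetical and serve only as illustration.

```python
import math


def response(psa):
    """Transformed response Y = ln(PSA + 1)."""
    return math.log(psa + 1)


def design_matrix(times):
    """Rows (1, t, t^2): the Z_i matrix of the quadratic stage 1 model."""
    return [[1.0, t, t * t] for t in times]


def mean_profile(beta_i, times):
    """beta_1i + beta_2i t + beta_3i t^2 at each time point."""
    return [sum(z * b for z, b in zip(row, beta_i))
            for row in design_matrix(times)]


times = [0.0, 1.0, 2.0]        # hypothetical measurement times
beta_i = [1.0, 2.0, 3.0]       # hypothetical subject-specific coefficients
print(design_matrix(times))
print(mean_profile(beta_i, times))
print(response(0.0))
```

Because each subject has its own times, the number of rows of $Z_i$ may differ from subject to subject, which is how the highly unbalanced prostate data fit into the same model.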
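The prostate data
A model for the prostate cancer: Stage 2
Age could not be matched.
$$\begin{aligned}
\beta_{1i} &= \beta_1 \text{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i + b_{1i} \\
\beta_{2i} &= \beta_6 \text{Age}_i + \beta_7 C_i + \beta_8 B_i + \beta_9 L_i + \beta_{10} M_i + b_{2i} \\
\beta_{3i} &= \beta_{11} \text{Age}_i + \beta_{12} C_i + \beta_{13} B_i + \beta_{14} L_i + \beta_{15} M_i + b_{3i}
\end{aligned}$$
$C_i$, $B_i$, $L_i$, $M_i$ are indicators of the classes: control, BPH, local or metastatic cancer. $\text{Age}_i$ is the subject's age at diagnosis. The parameters in the first row are the average intercepts for the different classes.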
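The three stage 2 equations can be collected into the form $\beta_i = K_i \beta + b_i$ with a (3 x 15) matrix $K_i$. The sketch below builds such a matrix; the function name `stage2_K`, the group labels and the example age are assumptions made for illustration.

```python
GROUPS = ["control", "BPH", "local", "metastatic"]


def stage2_K(age, group):
    """Build the 3 x 15 matrix K_i such that beta_i = K_i beta + b_i."""
    if group not in GROUPS:
        raise ValueError(f"unknown group: {group}")
    # (Age_i, C_i, B_i, L_i, M_i): age followed by the four class indicators.
    block = [float(age)] + [1.0 if group == g else 0.0 for g in GROUPS]
    K = [[0.0] * 15 for _ in range(3)]
    for row in range(3):
        # Row 1 uses beta_1..beta_5, row 2 beta_6..beta_10, row 3 beta_11..beta_15.
        K[row][5 * row: 5 * row + 5] = block
    return K


K = stage2_K(age=60, group="BPH")   # hypothetical subject
for row in K:
    print(row)
```

Each row of $K_i$ repeats the same covariate block in a different position, so the intercept, the linear time effect and the quadratic time effect each get their own age and class parameters.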
Stochastic components in general linear mixed model
[Figure: response plotted against time, showing the average evolution and the individual profiles of subject 1 and subject 2]
References
Aerts, M., Geys, H., Molenberghs, G., and Ryan, L.M. (2002). Topics in Modelling of Clustered Data. London: Chapman and Hall.
Brown, H. and Prescott, R. (1999). Applied Mixed Models in Medicine. New York: John Wiley & Sons.
Crowder, M.J. and Hand, D.J. (1990). Analysis of Repeated Measures. London: Chapman and Hall.
Davidian, M. and Giltinan, D.M. (1995). Nonlinear Models For Repeated Measurement Data. London: Chapman and Hall.
Davis, C.S. (2002). Statistical Methods for the Analysis of Repeated Measurements. New York: Springer-Verlag.
Diggle, P.J., Heagerty, P.J., Liang, K.Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data. (2nd edition). Oxford: Oxford University Press.
Fahrmeir, L. and Tutz, G. (2002). Multivariate Statistical Modelling Based on Generalized Linear Models (2nd edition). Springer Series in Statistics. New York: Springer-Verlag.
Goldstein, H. (1979). The Design and Analysis of Longitudinal Studies. London: Academic Press.
Goldstein, H. (1995). Multilevel Statistical Models. London: Edward Arnold.
Hand, D.J. and Crowder, M.J. (1995). Practical Longitudinal Data Analysis. London: Chapman and Hall.
Jones, B. and Kenward, M.G. (1989). Design and Analysis of Crossover Trials. London: Chapman and Hall.
Kshirsagar, A.M. and Smith, W.B. (1995). Growth Curves. New York: Marcel Dekker.
Lindsey, J.K. (1993). Models for Repeated Measurements. Oxford: Oxford University Press.
Longford, N.T. (1993). Random Coefficient Models. Oxford: Oxford University Press.
Pinheiro, J.C. and Bates, D.M. (2000). Mixed Effects Models in S and S-Plus. Springer Series in Statistics and Computing. New York: Springer-Verlag.
Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York: Wiley.
Senn, S.J. (1993). Cross-over Trials in Clinical Research. Chichester: Wiley.
Verbeke, G. and Molenberghs, G. (1997). Linear Mixed Models In Practice: A SAS Oriented Approach. Lecture Notes in Statistics 126. New York: Springer-Verlag.
Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer Series in Statistics. New York: Springer-Verlag.
Vonesh, E.F. and Chinchilli, V.M. (1997). Linear and Non-linear Models for the Analysis of Repeated Measurements. Basel: Marcel Dekker.