1 Date

Name, department

Statistical Analysis of Repeated Measures Data

Ziad Taib

Biostatistics, AZ

MV, CTH

March 2009

Linear Mixed Models for Longitudinal Data, Spring 2009

Time: Fridays 9:00-11:00 lectures and Tuesdays 15:00-17:00 computer exercises

Location: MVF21

Instructors: Ziad Taib (0707655471), [email protected] or [email protected], and Malin Östensson (031 772 53 16), [email protected]

Course Home Page: http://www.chalmers.se/math/SV/utbildning/grundutbildning/fristaende-kurser-ms/msa650

Textbook: Linear Mixed Models for Longitudinal Data, Geert Verbeke and Geert Molenberghs, Springer-Verlag, New York, plus some handouts.

Exams: Computer Assignments: 20%, Final Exam: 80%. More information on the Final Exam will be given later.

Policy on Homework: Computer Assignments are to be presented in a report, with programs, output, and statistical notation integrated. The quality of writing and organization, as well as content, will influence the grade. Homework may be turned in directly to Malin Östensson.

Tentative Schedule and Outline of Lectures:

March 27 Introduction to Linear Mixed Models (Chap 1-4)

April 3 Estimation and Inference for the marginal model (Chap 5-6)

April 24 Inference for the random effects, Software issues (Chap 7-8)

May 4 Model building and serial correlation (Chap 9-10)

May 8 Non-linear and Generalized Linear Mixed Models (Handouts)

May 15 Incomplete data (Chap 15-16)

May 20 Design and sample size issues (Chap 23)

May 26 Exam

Why longitudinal data?

Very useful for their own sake.

With longitudinal data, we can understand what mixed models are about in a context that is relatively simple yet rich enough.

___________________________________

A good reference is the book ”Designing Experiments and Analyzing Data” by Maxwell & Delaney (2004).

A motivating example

Consider a randomized clinical trial with two treatment groups and repeated measurements at baseline and 3 and 6 months later. As it turned out, some of the data were missing. Moreover, patients did not always comply with the time requirements. Our first reaction is to try to compensate for the missing values by some kind of imputation, or to use list-wise deletion.

Since both ”methods” have their shortcomings, wouldn't it be nice to be able to use something else? There is in fact an alternative: mixed models.

With mixed models,

1. we can use all our data, with the attitude that ”what is missing is missing”.

2. we can even account for the dependencies resulting from measurements made on the same individuals at different times.

3. we don’t need to be consistent about time.
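To make the contrast with list-wise deletion concrete, here is a minimal sketch. The records are invented for illustration (they are not data from the trial), and the tuple layout (subject, treatment, month, response) is only an assumption about how such data might be stored in "long" format, one row per measurement.

```python
# Hypothetical trial records: (subject, treatment, month, response).
# None marks a missing response; months need not be exactly 0, 3, 6.
records = [
    (1, "A", 0.0, 10.2), (1, "A", 3.0, 9.1), (1, "A", 6.0, 8.4),
    (2, "A", 0.0, 11.0), (2, "A", 3.5, 10.3), (2, "A", 6.0, None),
    (3, "B", 0.0, 10.7), (3, "B", 2.8, None), (3, "B", 6.4, 9.9),
    (4, "B", 0.0, 9.8), (4, "B", 3.0, 9.6), (4, "B", 6.0, 9.5),
]

# List-wise deletion: drop every subject with at least one missing response.
incomplete = {subject for subject, _, _, y in records if y is None}
listwise = [r for r in records if r[0] not in incomplete]

# Long format for a mixed model: keep every observed row with its actual time.
long_format = [r for r in records if r[3] is not None]

print(len(listwise), "rows left after list-wise deletion")   # 6
print(len(long_format), "rows usable in long format")        # 10
```

In this toy example list-wise deletion discards two whole subjects, while the long format keeps their observed measurements together with the actual measurement times.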

Outline of lecture 1

1. Two examples

2. Principles of Inference

3. Modelling continuous longitudinal data

Mixed effects models

An ordinary fixed effects linear model usually assumes:

1) independent observations with the same variance

2) normally distributed errors

3) constant parameters

$Y = X\beta + \varepsilon$, where $\varepsilon$ is $N(0, \sigma^2 I)$ and $\beta$ is constant.

If we modify assumptions 1) and 3), the problem becomes more complicated, and in general we need a large number of parameters just to describe the covariance structure of the observations. Mixed effects models deal with this type of problem.

In general, this type of model allows us to tackle problems such as clustered data, repeated measures, and hierarchical data.
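As a rough illustration of the "large number of parameters" remark, the sketch below counts covariance parameters. It assumes that the alternative to a mixed model is a fully unstructured covariance matrix for n repeated measurements, and that the mixed model has q random effects plus a single residual variance. The function names are made up for this example.

```python
def unstructured_cov_params(n):
    """Free parameters in an unstructured n x n covariance matrix."""
    return n * (n + 1) // 2


def mixed_model_cov_params(q, residual_params=1):
    """Parameters in a symmetric q x q random-effects covariance matrix,
    plus the residual variance parameter(s)."""
    return q * (q + 1) // 2 + residual_params


# Random intercept and slope (q = 2) versus an unstructured matrix.
for n in (3, 7, 20):
    print(n, unstructured_cov_params(n), mixed_model_cov_params(q=2))
```

With seven measurement occasions the unstructured matrix needs 28 parameters, whereas the random intercept and slope structure needs 4 regardless of n.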

Longitudinal Data

Repeated measures are obtained when a response is measured repeatedly on a set of units

• Units:

• Subjects, patients, participants, . . .

• Individuals, plants, . . .

• Clusters: nests, families, towns, . . .

• . . .

• Special case: Longitudinal data

Note: it is possible to handle several levels.

Various forms of models and relation between them

LM: Assumptions:

1. independence,

2. normality,

3. constant parameters

GLM: assumption 2) Exponential family

LMM: Assumptions 1) and 3) are modified

GLMM: Assumption 2) Exponential family and assumptions 1) and 3) are modified

Repeated measures: Assumptions 1) and 3) are modified

Longitudinal data

Maximum likelihood

Classical statistics (Observations are random, parameters are unknown constants)

Bayesian statistics

LM - Linear model

GLM - Generalised linear model

LMM - Linear mixed model

GLMM - Generalised linear mixed model

Non-linear models

Part 1: Two examples

Rat data

Prostate data

Example 1: Rat Data (Verbeke et al.)

Research question: How does craniofacial growth in the Wistar rat depend on testosterone production?

Simplified (univariate) response

• Randomized experiment in which 50 male Wistar rats are randomized to:

Control (15 rats)

Low dose of Decapeptyl (18 rats)

High dose of Decapeptyl (17 rats)

Decapeptyl prevents the production of testosterone.

Treatment starts at the age of 45 days. Measurements are taken every 10 days, from day 50 on. The responses are distances (pixels) between two well-defined points on x-ray pictures of the skull of each rat. Here, we consider only one response, reflecting the height of the skull.

Figure: time axis in days, with marks at 45, 50, 60, 70 and 80.

Individual profiles:

1. Connected profiles are better than scatter plots

2. Growth is expected, but is it linear?

3. Of interest: change over time (i.e. the relationship between response and age)

Complication: Many dropouts due to anaesthesia, which imply less power but no bias.

Without dropouts the problem would be easier, because the data would be balanced.

Remarks:

Much variability between rats

Much less variability within rats

Fixed number of measurements scheduled per subject, but not all measurements available due to dropout, for a known reason

Measurements taken at fixed time points

Research question: How does craniofacial growth in the Wistar rat depend on testosterone production?

Example 2: The BLSA Prostate Data

Example 2: The BLSA Prostate Data (Pearson et al., Statistics in Medicine, 1994).

Prostate disease is one of the most common and most costly medical problems in the world. It is important to look for biomarkers that can detect the disease at an early stage.

Prostate-Specific Antigen (PSA) is an enzyme produced by both normal and cancerous prostate cells. It is believed that the PSA level is related to the volume of prostate tissue.

Problem: Patients with Benign Prostatic Hyperplasia (BPH) also have an increased PSA level.

The overlap in PSA distribution for cancer and BPH cases seriously complicates the detection of prostate cancer.

Prostate-specific antigen (PSA) is a glycoprotein in the cytoplasm of prostatic epithelial cells. It can be detected in the blood of all adult men. The PSA level is increased in men with prostate cancer but can also be increased somewhat in other disorders of the prostate.

Research question: Can longitudinal PSA profiles be used to detect prostate cancer at an early stage?

A retrospective case-control study based on frozen serum samples:

16 control patients

20 BPH cases

14 local cancer cases

4 metastatic cancer cases

Individual profiles:

Remarks:

Much variability between subjects

Little variability within subjects

Highly unbalanced data

Research question: Can longitudinal PSA profiles be used to detect prostate cancer at an early stage?

Part 2: Principles of Inference

Fisher likelihood: inference for observable y and fixed parameter θ

Data generation: given a stochastic model $f_\theta(y)$, generate data y from $f_\theta(y)$.

Parameter estimation: given the data y, make inference about θ by using the likelihood $L(\theta \mid y)$.

Connection between the two processes: $L(\theta \mid y) = f_\theta(y)$
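The relation $L(\theta \mid y) = f_\theta(y)$ can be tried out numerically. The sketch below is only an illustration: it assumes independent Poisson counts with rate θ (the slides do not fix a particular model here), uses invented data, and maximizes the log-likelihood over a simple grid.

```python
import math


def poisson_log_likelihood(theta, y):
    """log L(theta | y) = sum_i log f_theta(y_i) for independent Poisson counts."""
    return sum(k * math.log(theta) - theta - math.lgamma(k + 1) for k in y)


y = [3, 5, 2, 4, 6]                       # hypothetical observed counts
grid = [0.1 * k for k in range(1, 101)]   # candidate theta values 0.1, ..., 10.0

# The same function f_theta(y), now read as a function of theta for fixed y.
theta_hat = max(grid, key=lambda theta: poisson_log_likelihood(theta, y))

print("grid maximizer:", round(theta_hat, 1))
print("sample mean:   ", sum(y) / len(y))
```

For the Poisson model the maximizer coincides with the sample mean, which is what the grid search recovers.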

(Classical) Likelihood Principle

Birnbaum (1962): All the evidence or information about the parameters in the data is in the likelihood.

Conditionality principle & Sufficiency principle ⇒ Likelihood principle

Bayesian inference for observable y and unobservable ν

Data generation: generate data according to

1. ν from the prior $f(\nu)$

2. For ν fixed, generate y from $f(y \mid \nu)$

Combine into $f(\nu) f(y \mid \nu)$.

Parameter estimation: given the data y, make inference about ν by using the posterior $f(\nu \mid y)$.

The connection between the two processes:

$f(\nu) f(y \mid \nu) = f(y) f(\nu \mid y)$

that is,

$f(y, \nu) = f(\nu) f(y \mid \nu) = f(y) f(\nu \mid y) \;\Rightarrow\; f(\nu \mid y) = \dfrac{f(y, \nu)}{f(y)}$

Compare with $L(\theta \mid y)$.
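The identity $f(\nu \mid y) = f(\nu) f(y \mid \nu) / f(y)$ can be evaluated on a grid. The following sketch is an illustration under assumed ingredients that are not taken from the slides: a Gamma(2, 1) prior for ν, independent Poisson counts given ν, and invented data. For this conjugate pair the posterior is known to be Gamma(a + Σy, b + n), which gives a check on the grid result.

```python
import math


def gamma_pdf(v, shape, rate):
    """Density of a Gamma(shape, rate) distribution at v > 0."""
    return math.exp(shape * math.log(rate) + (shape - 1) * math.log(v)
                    - rate * v - math.lgamma(shape))


def poisson_pmf(k, v):
    """Probability of count k under a Poisson distribution with mean v."""
    return math.exp(k * math.log(v) - v - math.lgamma(k + 1))


y = [3, 5, 2, 4, 6]        # hypothetical observed counts
a, b = 2.0, 1.0            # assumed prior: nu ~ Gamma(shape=a, rate=b)
step = 0.01
grid = [step * k for k in range(1, 3001)]   # nu values 0.01, ..., 30.0

# Joint f(nu) f(y | nu) on the grid, then f(y) by a Riemann sum.
joint = [gamma_pdf(v, a, b) * math.prod(poisson_pmf(k, v) for k in y)
         for v in grid]
f_y = sum(joint) * step
posterior = [j / f_y for j in joint]        # f(nu | y)

post_mean_grid = sum(v * p for v, p in zip(grid, posterior)) * step
post_mean_exact = (a + sum(y)) / (b + len(y))   # mean of Gamma(a + sum y, b + n)

print(round(post_mean_grid, 4), round(post_mean_exact, 4))
```

The grid-based posterior mean agrees with the conjugate closed form up to the discretization error.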

Extended likelihood inference: (Lee and Nelder) for observable y, fixed parameter θ and unobservable ν

Parameter estimation: $L(\theta \mid y) = f_\theta(y)$

Extended Likelihood Principle

Björnstad (1996): All information in the data about the unobservables and the parameters is in the “likelihood”.

Conditionality principle & Sufficiency principle ⇒ Likelihood principle

Prediction: predict the number of seizures during the next week

Bayesian Predictive Inference

Given ν, the observations y are assumed to be independent. How do we predict the next value, Y, of the observable? In a Bayesian setting we may determine the posterior $f(\nu \mid y)$ and define the predictive density of Y given y as:

$f_Y(x \mid y) = \int f(x \mid \nu)\, f(\nu \mid y)\, d\nu$

Note: Jeffreys priors
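A predictive density of this kind, the conditional density of the next observation averaged over the posterior of ν, can be approximated by numerical integration. The sketch below assumes, purely for illustration, that the next weekly count is Poisson given ν and that the posterior of ν is Gamma(22, 6). Neither choice comes from the slides.

```python
import math


def gamma_pdf(v, shape, rate):
    """Density of a Gamma(shape, rate) distribution at v > 0."""
    return math.exp(shape * math.log(rate) + (shape - 1) * math.log(v)
                    - rate * v - math.lgamma(shape))


def poisson_pmf(k, v):
    """Probability of count k under a Poisson distribution with mean v."""
    return math.exp(k * math.log(v) - v - math.lgamma(k + 1))


shape, rate = 22.0, 6.0    # assumed posterior: nu | y ~ Gamma(22, 6)
step = 0.01
grid = [step * k for k in range(1, 3001)]   # nu values 0.01, ..., 30.0


def predictive(x):
    """Riemann-sum approximation of the integral of f(x | nu) f(nu | y) d nu."""
    return sum(poisson_pmf(x, v) * gamma_pdf(v, shape, rate) for v in grid) * step


pred = [predictive(x) for x in range(40)]   # predictive probabilities for x = 0..39
pred_mean = sum(x * p for x, p in enumerate(pred))

print("total probability:", round(sum(pred), 4))
print("predictive mean:  ", round(pred_mean, 4))
```

The predictive mean equals the posterior mean of ν (here 22/6), while the predictive distribution is more spread out than a Poisson with that mean because it also carries the uncertainty about ν.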

Bayesian inference (Pearson, 1920)

Nelder and Lee (1996)

$L(\theta \mid y) = f_\theta(y)$

$f_\theta(\nu \mid y)$ ?

Part 3: A Model for Longitudinal Data

Introduction

In practice, data are often unbalanced due to

(i) unequal numbers of measurements per subject

(ii) measurements not taken at fixed time points.

Therefore, ordinary multivariate regression techniques are often not applicable.

Often, subject-specific longitudinal profiles can be well approximated by linear regression functions. This leads to a 2-stage model formulation:

Stage 1: A linear regression model for each subject separately

Stage 2: Explain variability in the subject-specific regression coefficients using known covariates

A 2-stage Model Formulation: Stage 1

Response $Y_{ij}$ for the ith subject, measured at time $t_{ij}$, i = 1, . . . , N, j = 1, . . . , $n_i$

• Response vector $Y_i$ for the ith subject, possibly after some convenient transformation:

$Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})'$

$Y_i = Z_i \beta_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \Sigma_i), \quad \text{often } \Sigma_i = \sigma^2 I_{n_i}$

$Z_i$ is an ($n_i$ x q) matrix of known covariates and $\beta_i$ is a q-dimensional vector of subject-specific parameters.

Note that the above model describes the observed variability within subjects.

Stage 2

Between-subject variability can now be studied by relating the parameters $\beta_i$ to known covariates:

$\beta_i = K_i \beta + b_i$

$K_i$ is a (q x p) matrix of known covariates and β is a p-dimensional vector of unknown regression parameters.

Finally,

$b_i \sim N(0, D)$
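The two stages can be mimicked literally: first fit a separate regression line to each subject, then relate the fitted coefficients to a known covariate. The sketch below uses invented, unbalanced data and a hypothetical two-level group covariate. It illustrates the 2-stage idea only and is not the estimation method used for the mixed model itself.

```python
def ols_line(t, y):
    """Least-squares intercept and slope for one subject (stage 1)."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


# Hypothetical unbalanced data: subject -> (group, times, responses).
subjects = {
    1: ("control", [0.0, 1.0, 2.0, 3.0], [68.0, 70.1, 71.9, 74.0]),
    2: ("control", [0.0, 1.5, 3.0], [66.0, 69.2, 71.8]),
    3: ("treated", [0.0, 1.0, 2.0], [67.0, 68.1, 68.9]),
    4: ("treated", [0.5, 2.0, 3.0, 4.0], [69.5, 71.0, 72.1, 72.9]),
}

# Stage 1: subject-specific (group, intercept, slope).
stage1 = {i: (g, *ols_line(t, y)) for i, (g, t, y) in subjects.items()}


# Stage 2: summarize the subject-specific coefficients by the known covariate.
def group_mean(group, index):
    """Average of coefficient `index` (1 = intercept, 2 = slope) within a group."""
    values = [coef[index] for coef in stage1.values() if coef[0] == group]
    return sum(values) / len(values)


for g in ("control", "treated"):
    print(g, round(group_mean(g, 1), 2), round(group_mean(g, 2), 2))
```

Note that each subject contributes its own number of measurements and its own time points, which is exactly the situation where ordinary multivariate regression breaks down.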

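The General Linear Mixed-effects Model

The two stages of the 2-stage approach can now be combined into one model:

$Y_i = Z_i K_i \beta + Z_i b_i + \varepsilon_i = X_i \beta + Z_i b_i + \varepsilon_i, \quad X_i = Z_i K_i$

Here $X_i \beta$ is the average evolution and $Z_i b_i$ is the subject-specific part.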

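Convenient when using the multivariate normal distribution. Very difficult with other distributions.

The general mixed effects model can be summarized by:

$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \quad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i)$

with $b_1, \ldots, b_N, \varepsilon_1, \ldots, \varepsilon_N$ independent.

Terminology:

• Fixed effects: β

• Random effects: $b_i$

• Variance components: elements in D and $\Sigma_i$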

Remarks

1. It is occasionally unclear whether we should treat an effect as fixed or random. For example, in clinical trials with treatment and clinic as “factors”, should we consider clinics as random?

2. Considering the general form of a mixed effects model

$Y_i = X_i \beta + Z_i b_i + \varepsilon_i$

notice that the fixed effects are involved only in the mean values (just as in ordinary linear models), while the random effects modify the covariance matrix of the observations.

Example: The Rat Data

Transformation of the time scale to linearize the profiles:

$Age_{ij} \rightarrow t_{ij} = \ln[1 + (Age_{ij} - 45)/10]$

Note that t = 0 corresponds to the start of the treatment (moment of randomization).

• Stage 1 model:

$Y_{ij} = \beta_{1i} + \beta_{2i} t_{ij} + \varepsilon_{ij}, \quad j = 1, \ldots, n_i$
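The time transformation t = ln[1 + (Age - 45)/10] is easy to tabulate. The short sketch below evaluates it at the treatment start (day 45) and at a few of the 10-day measurement ages. The function name and the particular ages listed are chosen for this illustration only.

```python
import math


def transformed_time(age_days, start=45, spacing=10):
    """t = ln(1 + (Age - start) / spacing); t = 0 at the start of treatment."""
    return math.log(1 + (age_days - start) / spacing)


for age in (45, 50, 60, 70, 80):
    print(age, round(transformed_time(age), 3))
```

On this scale equal steps in age correspond to shrinking steps in t, which is what straightens out growth curves that flatten with age.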

Stage 1

$\beta_i = \begin{pmatrix} \beta_{1i} \\ \beta_{2i} \end{pmatrix}$

Stage 2 model:

In the second stage, the subject-specific intercepts and time effects are related to the treatment of the rats

The hierarchical versus the marginal Model

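The general mixed model is given by

$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \quad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i)$

It can be written as

$Y_i \mid b_i \sim N(X_i \beta + Z_i b_i, \Sigma_i), \quad b_i \sim N(0, D)$

It is therefore also called a hierarchical model.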

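The hierarchical model specifies $f(y_i \mid b_i)$ and $f(b_i)$; the marginal density is

$f(y_i) = \int f(y_i \mid b_i)\, f(b_i)\, db_i$

Marginally we have that $Y_i$ is distributed as

$Y_i \sim N(X_i \beta, \; Z_i D Z_i' + \Sigma_i)$

Hence the implied mean is $X_i \beta$ and the implied covariance is $V_i = Z_i D Z_i' + \Sigma_i$.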

Example: The Rat Data

Linear model where each rat has its own intercept and its own slope

Can be negative or positive, reflecting individual deviation from the average

Notice that the model assumes that the variance function is quadratic over time.

Comments:

• Linear average evolution in each group

• Equal average intercepts

• Different average slopes

Moreover, taking

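$$
\begin{aligned}
\mathrm{Cov}(Y_i(t_1), Y_i(t_2))
&= \mathrm{Cov}(\beta_{1i} + \beta_{2i} t_1 + \varepsilon_{i1},\; \beta_{1i} + \beta_{2i} t_2 + \varepsilon_{i2}) \\
&= (1, t_1)\, \mathrm{cov}(\beta_i) \begin{pmatrix} 1 \\ t_2 \end{pmatrix} + \mathrm{cov}(\varepsilon_{i1}, \varepsilon_{i2}) \\
&= (1, t_1) \begin{pmatrix} d_{11} & d_{12} \\ d_{12} & d_{22} \end{pmatrix} \begin{pmatrix} 1 \\ t_2 \end{pmatrix} + \mathrm{cov}(\varepsilon_{i1}, \varepsilon_{i2}) \\
&= d_{11} + d_{12} t_2 + d_{12} t_1 + d_{22} t_1 t_2 + \mathrm{cov}(\varepsilon_{i1}, \varepsilon_{i2}) \\
&= d_{22} t_1 t_2 + d_{12}(t_1 + t_2) + d_{11} + \mathrm{cov}(\varepsilon_{i1}, \varepsilon_{i2})
\end{aligned}
$$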
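The covariance of two measurements on the same rat, $(1, t_1) D (1, t_2)' + \mathrm{cov}(\varepsilon_{i1}, \varepsilon_{i2})$, can be checked numerically against its expanded form $d_{22} t_1 t_2 + d_{12}(t_1 + t_2) + d_{11} + \mathrm{cov}(\varepsilon_{i1}, \varepsilon_{i2})$. The sketch below assumes independent errors with common variance σ², so that the error covariance is σ² when t1 = t2 and 0 otherwise. The numerical values of D and σ² are invented for the illustration.

```python
def marginal_cov(t1, t2, d11, d12, d22, sigma2):
    """(1, t1) D (1, t2)' plus sigma2 when both time points coincide."""
    z1, z2 = (1.0, t1), (1.0, t2)
    D = ((d11, d12), (d12, d22))
    quad = sum(z1[a] * D[a][b] * z2[b] for a in range(2) for b in range(2))
    return quad + (sigma2 if t1 == t2 else 0.0)


def closed_form(t1, t2, d11, d12, d22, sigma2):
    """d22*t1*t2 + d12*(t1 + t2) + d11, plus sigma2 when t1 == t2."""
    return d22 * t1 * t2 + d12 * (t1 + t2) + d11 + (sigma2 if t1 == t2 else 0.0)


params = dict(d11=3.4, d12=0.1, d22=0.2, sigma2=1.4)   # hypothetical values
times = [0.0, 0.5, 1.0, 1.5, 2.0]

for t in times:
    # With t1 = t2 = t this is the variance function, quadratic in t.
    print(t, round(marginal_cov(t, t, **params), 3))
```

Setting t1 = t2 = t gives the variance d22 t² + 2 d12 t + d11 + σ², which is the quadratic variance function implied by the random intercept and slope model.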

The prostate data

A model for the prostate cancer: Stage 1

$Y_{ij} = \ln(PSA_{ij} + 1) = \beta_{1i} + \beta_{2i} t_{ij} + \beta_{3i} t_{ij}^2 + \varepsilon_{ij}, \quad j = 1, \ldots, n_i$
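In the stage 1 model for the prostate data the response is ln(PSA + 1) and the subject-specific profile is quadratic in time, so the stage 1 covariate matrix has the columns 1, t and t². A minimal sketch of both steps, with invented PSA values and time points:

```python
import math


def transform_psa(psa_values):
    """Response Y = ln(PSA + 1) for each measurement."""
    return [math.log(p + 1.0) for p in psa_values]


def stage1_design(times):
    """Rows (1, t, t^2) of the stage 1 covariate matrix for one subject."""
    return [[1.0, t, t * t] for t in times]


times = [0.0, 2.0, 5.5, 9.0]    # hypothetical measurement times for one subject
psa = [0.0, 1.1, 2.4, 6.3]      # hypothetical PSA values

Y_i = transform_psa(psa)
Z_i = stage1_design(times)
for row, y in zip(Z_i, Y_i):
    print(row, round(y, 3))
```

Adding 1 before taking the logarithm keeps the response defined when a PSA value is 0.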

The prostate data

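A model for the prostate cancer: Stage 2

Age could not be matched

$$
\begin{aligned}
\beta_{1i} &= \beta_1 Age_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i + b_{1i} \\
\beta_{2i} &= \beta_6 Age_i + \beta_7 C_i + \beta_8 B_i + \beta_9 L_i + \beta_{10} M_i + b_{2i} \\
\beta_{3i} &= \beta_{11} Age_i + \beta_{12} C_i + \beta_{13} B_i + \beta_{14} L_i + \beta_{15} M_i + b_{3i}
\end{aligned}
$$

$C_i$, $B_i$, $L_i$, $M_i$ are indicators of the classes: control, BPH, local or metastatic cancer. $Age_i$ is the subject’s age at diagnosis. The parameters in the first row are the average intercepts for the different classes.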
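For the prostate model each of the three subject-specific coefficients is regressed on age at diagnosis and the four class indicators, which gives 15 fixed effects. Written as a matrix, each row of the stage 2 covariate matrix holds (Age, C, B, L, M) in its own block of five columns, and multiplying the stage 1 matrix with rows (1, t, t²) by it gives the fixed-effects design matrix. The sketch below builds both for one hypothetical subject. The helper names and the subject's values are invented.

```python
def stage2_matrix(age, group):
    """K_i (3 x 15): row r holds (Age, C, B, L, M) in columns 5r .. 5r+4."""
    groups = ("control", "BPH", "local", "metastatic")
    covariates = [float(age)] + [1.0 if group == g else 0.0 for g in groups]
    K = [[0.0] * 15 for _ in range(3)]
    for row in range(3):
        K[row][5 * row: 5 * row + 5] = covariates
    return K


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


Z_i = [[1.0, t, t * t] for t in (1.0, 3.0)]    # stage 1 rows (1, t, t^2)
K_i = stage2_matrix(age=65, group="BPH")        # hypothetical subject
X_i = matmul(Z_i, K_i)                          # X_i = Z_i K_i, here 2 x 15

print(len(X_i), "x", len(X_i[0]))
print(X_i[1])
```

Each row of X_i contains the subject's covariates multiplied by 1, t and t² in turn, so the 15 fixed effects describe how the average intercept, linear and quadratic time effects depend on age and class.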

The prostate data

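This gives the following model

$$
\begin{aligned}
Y_{ij} ={}& (\beta_1 Age_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i + b_{1i}) \\
&+ (\beta_6 Age_i + \beta_7 C_i + \beta_8 B_i + \beta_9 L_i + \beta_{10} M_i + b_{2i})\, t_{ij} \\
&+ (\beta_{11} Age_i + \beta_{12} C_i + \beta_{13} B_i + \beta_{14} L_i + \beta_{15} M_i + b_{3i})\, t_{ij}^2 + \varepsilon_{ij}
\end{aligned}
$$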

Stochastic components in general linear mixed model

Figure: Response versus Time, showing the average evolution and the profiles of Subject 1 and Subject 2.

References

Aerts, M., Geys, H., Molenberghs, G., and Ryan, L.M. (2002). Topics in Modelling of Clustered Data. London: Chapman and Hall.

Brown, H. and Prescott, R. (1999). Applied Mixed Models in Medicine. New York: John Wiley & Sons.

Crowder, M.J. and Hand, D.J. (1990). Analysis of Repeated Measures. London: Chapman and Hall.

Davidian, M. and Giltinan, D.M. (1995). Nonlinear Models for Repeated Measurement Data. London: Chapman and Hall.

Davis, C.S. (2002). Statistical Methods for the Analysis of Repeated Measurements. New York: Springer-Verlag.

Diggle, P.J., Heagerty, P.J., Liang, K.Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data (2nd edition). Oxford: Oxford University Press.

Fahrmeir, L. and Tutz, G. (2002). Multivariate Statistical Modelling Based on Generalized Linear Models (2nd edition). Springer Series in Statistics. New York: Springer-Verlag.

Goldstein, H. (1979). The Design and Analysis of Longitudinal Studies. London: Academic Press.

Goldstein, H. (1995). Multilevel Statistical Models. London: Edward Arnold.

Hand, D.J. and Crowder, M.J. (1995). Practical Longitudinal Data Analysis. London: Chapman and Hall.

Jones, B. and Kenward, M.G. (1989). Design and Analysis of Crossover Trials. London: Chapman and Hall.

Kshirsagar, A.M. and Smith, W.B. (1995). Growth Curves. New York: Marcel Dekker.

Lindsey, J.K. (1993). Models for Repeated Measurements. Oxford: Oxford University Press.

Longford, N.T. (1993). Random Coefficient Models. Oxford: Oxford University Press.

Pinheiro, J.C. and Bates, D.M. (2000). Mixed Effects Models in S and S-Plus. Springer Series in Statistics and Computing. New York: Springer-Verlag.

Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York: Wiley.

Senn, S.J. (1993). Cross-over Trials in Clinical Research. Chichester: Wiley.

Verbeke, G. and Molenberghs, G. (1997). Linear Mixed Models in Practice: A SAS Oriented Approach. Lecture Notes in Statistics 126. New York: Springer-Verlag.

Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer Series in Statistics. New York: Springer-Verlag.

Vonesh, E.F. and Chinchilli, V.M. (1997). Linear and Non-linear Models for the Analysis of Repeated Measurements. Basel: Marcel Dekker.

Any Questions?