Mixed Models, theory and applications


Post on 18-Oct-2021


Page 1: Mixed Models, theory and applications

Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models

Contents

1 Models with fixed effects
    1 Modelling preambles and linear (fixed effects) models (LM)
    2 Generalized linear (fixed effects) models (GLM)

2 Linear Mixed Models (LMM)
    1 Definition and notations
    2 Estimation and algorithms
    3 Model selection

3 Generalized Linear Mixed Models (GLMM)
    1 Definition and notations
    2 Estimation and algorithms
    3 Model selection

4 Longitudinal, overdispersion and non linear mixed models examples

5 Experimentation using R

C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications


Basics for model construction

1 a set of observations:

y, a response variable (i.e. dependent variable)
x1, x2, ... explanatory variables

with matrix notation:

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad X = \begin{pmatrix} x_{1,1} & \cdots & x_{1,K} \\ \vdots & & \vdots \\ x_{n,1} & \cdots & x_{n,K} \end{pmatrix}$$

y is a realization of the n × 1 response random vector Y
X is the observed n × K design matrix


2 an assumption on the distribution of Y

Definition (Parametric density family)

A parametric density family is a collection of density functions fθ(y), described by a finite-dimensional parameter θ. The set of all allowable values for the parameter is denoted Θ ⊆ R^K, and the model itself is written as

F = {fθ(y) | θ ∈ Θ}

Example

Gaussian parametric model

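$$\mathcal{F} = \left\{ f_{\mu,\sigma}(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}} \;\middle|\; \mu \in \mathbb{R},\ \sigma > 0 \right\}$$

Exponential parametric model

$$\mathcal{F} = \left\{ f_{\lambda}(y) = \lambda e^{-\lambda y} \;\middle|\; \lambda > 0 \right\}$$

Binomial parametric model

$$\mathcal{F} = \left\{ f_{p}(y) = C_n^y\, p^y (1-p)^{n-y} \;\middle|\; 0 \le p \le 1 \right\}$$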


3 a link between Y and the X ’s

For 'linear models', this link is assumed between:

the mean (i.e. expected value)

µ = E(Y)

and a linear combination of the X's through the linear predictor

η = Xβ

by

g(µ) = η

with g(µ) = µ (identity link) under the normality assumption.

Remark

Linearity means that the predictor is a linear combination of the parameters:

η = µ + β1 x + β2 x²

is still a linear model, although the relation between y and x is no longer linear.
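The remark can be illustrated with a minimal R sketch. The data are simulated and purely hypothetical (the true coefficients 1, 0.5 and 2 are assumptions made for the illustration); the point is only that `lm` fits a quadratic curve in x because the model stays linear in its parameters.

```r
# Hypothetical simulated data: y is a quadratic function of x plus noise.
set.seed(1)
x <- seq(-2, 2, length.out = 50)
y <- 1 + 0.5 * x + 2 * x^2 + rnorm(50, sd = 0.3)

# eta = mu + beta1 * x + beta2 * x^2 is linear in (mu, beta1, beta2),
# so ordinary least squares applies; I() protects the arithmetic power.
fit_quad <- lm(y ~ x + I(x^2))
coef(fit_quad)
```

The design matrix simply contains x and x² as two columns, which is why no nonlinear fitting routine is needed.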


About explanatory variables

2 types of explanatory variables :

1 factors
→ interest is in attributing variability in y to the various categories of the factor

Example: patients classified by gender (M/F) and age group (A/B/C)

ηij = µ + αi + βj,   i = 1, 2,   j = 1, 2, 3

→ parameter values give the impact of the factor's levels on the response variable

factors may be crossed or nested
factors may have main effects and interaction effects


2 regressors
→ interest is in attributing variability in y to changes in the values of a continuous covariate

Example: changes due to weight x

η = β0 + β1 x

→ parameter values give the impact of an increase in x on the response variable


Fixed vs Random effects

Example

4 loaves of bread are taken from each of 6 batches of bread baked at 3 different temperatures

interest lies, of course, in each particular baking temperature used

no interest in each individual batch, which depends heavily on the circumstances

the batch effects can be viewed as a sample from a random batch effect (levels are chosen at random from an infinite set of batch levels)

interest in estimating the variance of the batch effect as a source of random variation in the data


a fixed effect (factor) is defined by a finite set of levels, and is used when interest lies in the estimation of each particular level effect

a random effect (factor) is defined by an infinite set of levels (with only a finite subset present in the data collection), and is used when interest lies more in the variance induced by these levels than in the estimation of the levels themselves

→ Consequence: data collected within each level of the random effect factor are linked to the same realization of a random variable. This introduces dependency between these data.

Example

data collected on the 4 loaves in each batch share something linked to the batch itself!


Example (Placebo and drug)

Clinical trial of treating epileptics with a drug.
Patients are randomly allocated to either drug or placebo.
Response: number of seizures for patient k receiving treatment i

E(Yik) = µi = µ + αi

The 2 treatments are the only 2 being used and no other treatment is under consideration. Each level is of interest.

↪→ Treatment is a fixed effect factor.


Example (cont.)

Suppose the clinical trial is repeated at 20 different clinics in the city.
Response: number of seizures for patient k receiving treatment i in clinic j

E(Yijk | bj) = µij = µ + αi + bj

The clinics can be viewed as a random sample of clinics.
Inference will be made on the population of clinic effects, and in particular on its variance (magnitude of the variation among clinics).

↪→ Clinics is a random effect factor.


Example (cont.)

Assume the procedure is very specialized, so that the trial is conducted in only a small number of referral hospitals.
We can no longer consider the selected clinics as a sample from a larger group of clinics.
We want to make inference only about the clinics in the study.

→ Clinics is a fixed effect factor.

The context determines whether the clinic effect is to be considered fixed or random.


Help for deciding: random or fixed?

Are the levels of the factor going to be considered as a random sample from a population of effects?

Is there enough information about a factor to decide that its levels in the data are like a random sample?

Does each level of the factor have a constant effect we want to estimate ?

Do we want to identify the levels variation as a source of variability ?


Mathematical notation for fixed effects

E(Yik) = µ + αi

Mathematical notation and properties for random effects

E(Yik | ai) = µ + ai

where the ai are the levels of a random effect; the ai's are treated as random variables.

2 usual assumptions:

1 the ai's are independently and identically distributed

2 the ai's have zero mean and all the same variance $\sigma_a^2$.

$$a_i \sim \text{i.i.d.}(0, \sigma_a^2)$$


Does height depend on sex and weight?

Observations

(y, s, w): the height, the sex and the weight of 100 individuals sampled from a population

[Figure: histogram of height; x-axis: Height in cm, y-axis: Frequency]


Model construction

yij is a realization, for individual i of sex j, of Yij from a parametric density family F

We assume:

F is the Gaussian parametric family,

[Figure: histogram of height (x-axis: Height in cm) and histogram of the residuals (x-axis: Residuals); y-axis: Frequency]

Yij are independent
E(Yij) = µ + αj + β wij, where µ is the intercept, αj the sex fixed effects (male and female), and β the fixed effect of weight
V(Yij) = σ², ∀ i, j

this implies

1 Yij ∼ N(µ + αj + β wij, σ²)
2 µ, αj, β and σ² are the unknown parameters.
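A model of this form can be fitted with `lm`. The sketch below uses simulated data, since the original height data are not reproduced here; the variable names and the true parameter values (150, 5, 0.25, σ = 2) are assumptions chosen only for illustration.

```r
# Hypothetical data: 100 individuals with sex, weight (kg) and height (cm).
set.seed(42)
n <- 100
sex    <- factor(sample(c("F", "M"), n, replace = TRUE))
weight <- rnorm(n, mean = 65, sd = 8)
height <- 150 + 5 * (sex == "M") + 0.25 * weight + rnorm(n, sd = 2)
dat <- data.frame(height, sex, weight)

# E(Y_ij) = mu + alpha_j + beta * w_ij with constant variance sigma^2.
# With R's default treatment contrasts, the intercept is mu + alpha_F and
# the coefficient 'sexM' is the difference alpha_M - alpha_F.
fit_height <- lm(height ~ sex + weight, data = dat)
summary(fit_height)$coefficients
sigma(fit_height)^2   # unbiased estimate of sigma^2
```

Only the difference between the two sex effects is identifiable alongside an intercept, which is why R reports a single `sexM` coefficient rather than one αj per sex.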


Does the number of foliar blotches depend on clone?

Observations

(y, c): number of foliar blotches and clone identifier, sampled from a population. 25 individuals for each of 4 clones have been measured.

[Figure: frequency plot of the number of foliar blotches (values from 0 to 14)]


Does the number of foliar blotches depend on clone?

Model construction

yij is a realization, for individual i of clone j, of Yij from the parametric Poisson family F

$$\mathcal{F} = \left\{ f_{\lambda}(y) = \frac{e^{-\lambda}\lambda^{y}}{y!} \;\middle|\; \lambda > 0 \right\} \quad \text{and} \quad E(Y) = \lambda.$$

We assume:

Yij are independent
ηij = µ + αj, where µ is the intercept and αj the clone fixed effects
log link: log[E(Yij)] = log(λij) = ηij

this implies

1 V(Yij) = E(Yij)
2 µ and the αj's are unknown parameters.
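As a hedged sketch of how such a model could be fitted in R, the code below simulates counts for 4 clones with 25 individuals each. The clone means in `lambda` are hypothetical, not the values behind the slides.

```r
# Hypothetical Poisson counts: 25 individuals for each of 4 clones.
set.seed(7)
clone  <- factor(rep(paste0("C", 1:4), each = 25))
lambda <- c(C1 = 0.5, C2 = 1, C3 = 3, C4 = 6)   # assumed clone means
blotches <- rpois(100, lambda[as.character(clone)])

# log(E(Y_ij)) = mu + alpha_j, fitted by maximum likelihood.
fit_pois <- glm(blotches ~ clone, family = poisson(link = "log"))

# With a single factor, exp(intercept) is the fitted mean of the reference
# clone, which coincides with that clone's sample mean.
exp(coef(fit_pois))
tapply(blotches, clone, mean)
```

The remaining exponentiated coefficients are ratios of each clone's mean to the mean of the reference clone C1.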


Does the number of foliar blotches depend on clone?

Observations

(y, c): number of foliar blotches and clone identifier, sampled from a population. 25 individuals per clone have been measured.

z = 1 if the tree is not infected
z = 0 if the tree is infected

number of non-infected trees per clone

C1   C2   C3   C4
19   17    4    0


Model construction

zij is a realization, for individual i of clone j, of Zij from the parametric Bernoulli family F

$$\mathcal{F} = \left\{ f_{p}(z) = p^{z}(1-p)^{1-z} \;\middle|\; 0 \le p \le 1 \right\} \quad \text{and} \quad E(Z) = p.$$

We assume:

Zij's are independent
ηij = µ + αj, where µ is the intercept and αj the clone fixed effects
logit link: $\text{logit}(p_{ij}) = \log\left(\frac{p_{ij}}{1-p_{ij}}\right) = \eta_{ij}$

this implies

1 V(Zij) = pij (1 − pij)
2 µ and the αj's are unknown parameters.
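The logit link and the Bernoulli variance function can be checked numerically on the counts of non-infected trees given earlier (19, 17, 4 and 0 out of 25). This is a small sketch using only the empirical proportions, not a fitted model.

```r
# Counts of non-infected trees per clone (from the slides), 25 trees each.
non_infected <- c(C1 = 19, C2 = 17, C3 = 4, C4 = 0)
p_hat <- non_infected / 25

# Empirical logits: log(p / (1 - p)). Clone C4 has p_hat = 0, so its
# logit is -Inf: no finite value of eta reproduces a proportion of 0.
eta_hat <- qlogis(p_hat)
eta_hat

# Inverse link and Bernoulli variance p (1 - p).
plogis(eta_hat)
p_hat * (1 - p_hat)
```

The infinite logit for C4 is a reminder that a clone with no successes (or no failures) does not have a finite maximum likelihood estimate on the logit scale.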


Does height depend on family?

Observations

30 families have been studied and 10 individuals per family have been measured. Let (y, f) be the height and family of the 300 individuals sampled from a population.

[Figure: plot of the data; no caption or axis labels are recoverable]


Model construction 1: select the best families

Estimate the value of each family j

We assume:

1 Yij comes from the Gaussian parametric density family
2 Yij are independent
3 E(Yij) = µ + αj, where µ is the intercept and αj the family fixed effects
4 V(Yij) = σ², ∀ i, j

this implies

1 family fixed effects
2 individuals belonging to the same family are assumed to be independent


Model construction 2: genetic breeding program

Estimate the family variability in the whole population

We assume:

1 Yij comes from the Gaussian parametric density family
2 E(Yij | αj) = µ + αj, where µ is the intercept and αj the family random effects
3 conditionally on αj, Yij are independent
4 V(Yij | αj) = σ², ∀ i, j
5 $\alpha_j \sim \mathcal{N}(0, \sigma_\alpha^2)$

this implies

1 marginal expectation E(Yij) = µ
2 marginal correlation, for two distinct observations:

$$\rho(Y_{ij}, Y_{i'j'}) = \begin{cases} \dfrac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma^2} & \text{if } j = j' \\[1ex] 0 & \text{if } j \neq j' \end{cases}$$

3 individuals belonging to the same family are not independent
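The marginal correlation can be derived in a few lines. The sketch below writes the model in its equivalent additive form with an error term $\varepsilon_{ij}$, a notation introduced here for the derivation and assumed independent of the $\alpha_j$.

```latex
% Additive form of the random-effects model (epsilon_{ij} is notation
% introduced for this derivation):
%   Y_{ij} = \mu + \alpha_j + \varepsilon_{ij},
%   \alpha_j \sim N(0, \sigma_\alpha^2), \ \varepsilon_{ij} \sim N(0, \sigma^2), all independent.
\begin{align*}
V(Y_{ij}) &= V(\alpha_j) + V(\varepsilon_{ij}) = \sigma_\alpha^2 + \sigma^2, \\
\mathrm{Cov}(Y_{ij}, Y_{i'j}) &= \mathrm{Cov}(\alpha_j + \varepsilon_{ij},\ \alpha_j + \varepsilon_{i'j})
  = V(\alpha_j) = \sigma_\alpha^2 \quad (i \neq i'), \\
\mathrm{Cov}(Y_{ij}, Y_{i'j'}) &= 0 \quad (j \neq j'), \\
\rho(Y_{ij}, Y_{i'j}) &= \frac{\mathrm{Cov}(Y_{ij}, Y_{i'j})}{\sqrt{V(Y_{ij})\,V(Y_{i'j})}}
  = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma^2}.
\end{align*}
```

The shared term $\alpha_j$ is the only source of covariance, so the correlation grows with the share of the total variance due to families.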


Linear Models definition

Let $(y_i, x_{i1}, \dots, x_{iK})_{i=1}^{n}$ denote K + 1 observations for each of n individuals, with $(y_i)_{i=1}^{n}$ realizations of n random variables $(Y_i)_{i=1}^{n}$. The Yi are modeled by a linear model when

1 Yi comes from the Gaussian parametric density family

2 Yi are independent

3 predictor: $\eta_i = \beta_0 + \sum_{k=1}^{K} x_{ik}\beta_k$ (linear combination of parameters)

4 expectation with identity link: E(Yi) = ηi

5 variance: V(Yi) = σ², ∀ i

$$Y_i = \underbrace{\beta_0 + \sum_{k=1}^{K} x_{ik}\beta_k}_{\text{fixed part}} + \underbrace{\varepsilon_i}_{\text{error part}}, \qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)$$


Matrix formulation

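Whatever the nature of the explanatory variables (regressors, factors), a linear model can always be written as:

$$Y = X\beta + \varepsilon$$

with $\beta = (\beta_0, \beta_1, \dots, \beta_K)'$

6 observations, 3 regressors

$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_6 \end{pmatrix} = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,3} \\ \vdots & \vdots & & \vdots \\ 1 & x_{6,1} & \cdots & x_{6,3} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} + \varepsilon$$

1 factor, 2 levels

$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_6 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} + \varepsilon$$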
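The one-factor design matrix can be built and inspected in R. The sketch below uses a hypothetical factor `f` with levels "a" and "b"; it shows that the matrix with an intercept and one indicator per level has rank 2 rather than 3, and how R's default treatment contrasts handle this.

```r
# Hypothetical factor with 2 levels and 3 observations per level.
f <- factor(rep(c("a", "b"), each = 3))

# Intercept plus one indicator column per level, as in the 6 x 3 matrix above.
X_over <- cbind(1, f == "a", f == "b")
qr(X_over)$rank    # 2: the intercept column is the sum of the two indicators

# R's default coding (treatment contrasts) drops the first level's indicator,
# giving a full-rank 6 x 2 matrix.
X_r <- model.matrix(~ f)
X_r
qr(X_r)$rank
```

A constraint of this kind (here, setting the effect of the first level to zero) is what makes the parameters of a factor model identifiable.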


Terminology :

Multiple Linear Regression/ANOVA/ANCOVA

if the matrix X contains only regressors, the models are called multiple regression models

if the matrix X contains only factors, the models are called Analysis of Variance (ANOVA) models (X is a matrix composed of 0s and 1s)

if the matrix X contains both regressors and factors, the models are called Analysis of Covariance (ANCOVA) models


R procedure

> res.lm <- lm(formula = y ~ x + f + x:f, data = data, offset = rep(0, n), weights = rep(1, n))

> summary(res.lm)

> anova(res.lm)

> names(res.lm)

> res.lm$coefficients


Gaussian hypothesis

Graphical

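histogram, QQ-plot

> hist(residuals(res.lm))

> par(mfrow = c(2, 2))

> plot(res.lm)

Statistical test

Kolmogorov-Smirnov, applied to the standardized residuals because "pnorm" refers to the standard normal distribution N(0, 1)

> ks.test(rstandard(res.lm), "pnorm")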


Homoscedasticity hypothesis

Graphical

residuals vs fitted values

> par(mfrow = c(1, 2))

> plot(res.lm$fitted.values, res.lm$residuals)

> plot(res.lm$residuals)

> abline(h = 0)


Independence hypothesis

Difficult to test!

have a look at the plot of the residuals
for time correlation: Durbin-Watson test ...


Estimation

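Let's assume a linear model:

$$Y = X\beta + \varepsilon \qquad \text{with} \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 \mathrm{Id})$$

The parameters to be estimated are β and σ.
In all the following, X is assumed to be of full rank: rank(X) = K

Least squares approach: $\min_{\beta} \|Y - X\beta\|^2$

$$\hat\beta_{ls} = (X'X)^{-1}X'Y$$

best linear unbiased estimator of β

$$\hat\beta_{ls} \sim \mathcal{N}\left(\beta, \sigma^2 (X'X)^{-1}\right)$$

best quadratic unbiased estimator of σ²

$$\hat\sigma^2_{ls} = \frac{1}{n-K}(Y - X\hat\beta)'(Y - X\hat\beta) \qquad \text{and} \qquad \hat\sigma^2_{ls} \sim \frac{\sigma^2}{n-K}\,\chi^2(n-K)$$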
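The closed-form least squares estimator can be checked against `lm` on simulated data. The data-generating values below (2, 1.5, -0.7) are hypothetical, and K is taken as the number of columns of X, including the intercept column.

```r
# Hypothetical data with two regressors.
set.seed(3)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(n)

# Design matrix with an intercept column; K = number of columns of X.
X <- cbind(1, x1, x2)
K <- ncol(X)

# beta_ls = (X'X)^{-1} X'y, solved as a linear system rather than by
# forming the inverse explicitly.
beta_ls <- solve(crossprod(X), crossprod(X, y))

# Unbiased variance estimate and estimated covariance of beta_ls.
res       <- y - X %*% beta_ls
sigma2_ls <- sum(res^2) / (n - K)
cov_beta  <- sigma2_ls * solve(crossprod(X))

fit_ls <- lm(y ~ x1 + x2)
cbind(manual = as.vector(beta_ls), lm = coef(fit_ls))
```

Internally `lm` uses a QR decomposition instead of the normal equations, which is numerically more stable but gives the same estimates on well-conditioned data.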


Preambles for estimation

Definition (Likelihood function)

The likelihood function is a function of the parameters of a statistical model and is defined as

$$\Theta \to \mathbb{R}, \qquad \theta \mapsto f_\theta(y),$$

and will be denoted L(θ; y)


Maximum likelihood approach

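Likelihood

$$L(\beta, \sigma, y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_i - x_i'\beta)'(y_i - x_i'\beta)}$$

Log-likelihood

$$\ell(\beta, \sigma, y) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$$

Maximum log-likelihood

$$\partial_{\beta,\sigma}\,\ell(\beta, \sigma, y) = 0 \;\Rightarrow\; \hat\beta_{ml} = (X'X)^{-1}X'Y, \qquad \hat\sigma^2_{ml} = \frac{1}{n}(Y - X\hat\beta)'(Y - X\hat\beta)$$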


$\hat\beta_{ls} = \hat\beta_{ml}$ but $\hat\sigma^2_{ls} \neq \hat\sigma^2_{ml}$

$\hat\sigma^2_{ls}$:

is unbiased
is calculated on the orthogonal complement of the column space of X
takes into account the difference between Y and its projection on X: $X\hat\beta$

$\hat\sigma^2_{ml}$:

is biased
estimation of σ² and β are made jointly
does not take into account the loss in degrees of freedom due to the estimation of β
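The difference between the two variance estimators is easy to see numerically. The following sketch uses a small hypothetical sample (n = 15) so that the gap between the divisors n and n − K is visible.

```r
# Small hypothetical sample, so the degrees-of-freedom correction matters.
set.seed(11)
n <- 15
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

fit_var <- lm(y ~ x)
K   <- length(coef(fit_var))       # number of estimated coefficients
rss <- sum(residuals(fit_var)^2)   # residual sum of squares

sigma2_ls <- rss / (n - K)   # unbiased, divides by the residual df
sigma2_ml <- rss / n         # maximum likelihood, biased downwards

c(ls = sigma2_ls, ml = sigma2_ml, lm = sigma(fit_var)^2)
```

The value reported by `sigma()` (squared) matches the least squares estimator, and the ML estimator is smaller by the factor (n − K)/n.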


Goodness of fit criterion

Adjusted R-square

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - x_i'\hat\beta)^2/(n-K)}{\sum_{i=1}^{n}(y_i - \bar y)^2/(n-1)}$$

Akaike's Information Criterion

$$AIC = -2\log L(\hat\beta_{ml}, \hat\sigma_{ml}, y) + 2K$$

Bayesian Information Criterion

$$BIC = -2\log L(\hat\beta_{ml}, \hat\sigma_{ml}, y) + K\log(n)$$
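These criteria can be recomputed by hand from the maximized log-likelihood and compared with R's `AIC` and `BIC`. The data below are simulated for illustration. One caveat on the parameter count: R counts every estimated parameter, including the variance, so the penalty uses the `df` attribute of `logLik` (number of coefficients plus one), which may differ from the K of the formulas above depending on how K is defined.

```r
# Hypothetical simple regression.
set.seed(5)
n <- 50
x <- rnorm(n)
y <- 1 + x + rnorm(n)
fit_ic <- lm(y ~ x)

# Maximized Gaussian log-likelihood, evaluated at the ML variance RSS / n.
s2_ml     <- sum(residuals(fit_ic)^2) / n
ll_manual <- -n / 2 * log(2 * pi * s2_ml) - n / 2

ll <- logLik(fit_ic)
p  <- attr(ll, "df")   # parameters counted by R: 2 coefficients + sigma = 3

aic_manual <- -2 * ll_manual + 2 * p
bic_manual <- -2 * ll_manual + log(n) * p
c(AIC = aic_manual, BIC = bic_manual)
c(AIC = AIC(fit_ic), BIC = BIC(fit_ic))
```

Since the same count is applied to every candidate model, the choice of convention shifts all criteria by a constant when comparing models with the same error structure, but it matters when reproducing R's printed values.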


Variable selection in multiple regression

The main approaches

Forward selection, which involves starting with no variables in the model, trying out the variables one by one and including them if they are statistically significant.

Backward elimination, which involves starting with all candidate variables and testing them one by one for statistical significance, deleting any that are not significant.

Stepwise methods, which combine the two approaches above, testing at each stage for variables to be included or excluded.

step function

For AIC: > step(model, direction = "both", k = 2, trace = 1)

direction is one of "both", "backward" or "forward"; k = 2 is the default

For BIC: > step(model, direction = "both", k = log(nrow(data)), trace = 1)


Generalized Linear Models – Introduction

Agricultural Science - different types of data (responses):

continuous: weight, height, diameter
discrete: count, proportion

Model choice - an important part of the research: search for a simple model which explains the data well.

All models involve:

a systematic component - related to the explanatory variables (regression model, analysis of variance model, analysis of covariance model);
a random component - related to the distributions followed by the response variables;
a link between systematic and random components.


Motivating example – Carnation meristem culture

BAP dose:
    0.0      |     0.1      |     0.3      |     0.5      |     1.0      |     2.0
 s    l   v  |  s    l   v  |  s    l   v  |  s    l   v  |  s    l   v  |  s    l   v
 1   2.5  0  |  3   5.5  1  |  5   4.8  1  |  9   2.8  0  | 10   2.0  1  | 12   1.7  1
 2   2.5  0  |  2   4.3  1  |  5   3.0  1  | 10   2.3  1  |  8   2.3  1  | 15   2.5  1
 1   3.0  0  |  6   3.3  0  |  4   2.7  0  |  8   2.7  1  | 12   2.0  1  | 15   2.3  1
 2   2.5  1  |  3   4.3  0  |  4   3.1  1  | 11   3.2  0  | 13   1.0  1  | 12   1.5  1
 1   4.0  0  |  4   5.4  0  |  5   2.9  0  |  8   2.9  1  | 14   2.8  1  | 13   1.7  1
 1   4.0  0  |  3   3.8  1  |  6   3.3  1  |  8   1.5  1  | 14   2.0  1  | 16   2.0  1
 2   3.0  0  |  3   4.3  1  |  6   2.1  1  |  8   2.5  0  | 14   2.7  1  | 17   1.7  1
 1   3.0  0  |  4   6.0  1  |  5   3.7  1  |  8   2.8  0  |  9   1.8  1  | 15   2.0  1
 1   5.0  0  |  3   5.0  1  |  4   3.8  1  |  8   1.8  1  | 13   1.8  1  | 17   2.0  1
 1   4.0  0  |  2   5.0  0  |  5   3.8  1  | 11   2.0  0  |  9   2.1  1  | 14   2.3  1
 1   2.0  1  |  3   4.5  0  |  6   3.3  0  |  9   2.7  1  | 15   1.3  1  | 16   2.5  1
 1   4.0  0  |  3   4.0  1  |  6   2.6  1  | 12   1.8  1  | 15   1.2  1  | 21   1.3  1
 2   3.0  0  |  4   3.3  0  |  5   2.3  0  | 12   2.3  1  | 16   1.2  1  | 18   1.3  1
 2   3.5  1  |  3   4.3  1  |  4   3.6  1  | 10   1.5  1  |  9   1.0  1  | 16   1.8  1
 1   3.0  0  |  3   4.5  1  |  3   4.8  1  | 10   1.5  1  | 13   1.7  1  | 18   1.0  1
 2   3.0  0  |  2   3.8  0  |  4   2.0  0  |  7   1.0  1  | 14   1.7  1  | 20   1.3  1
 2   5.5  0  |  3   4.7  1  |  6   1.7  0  |  8   3.0  1  | 16   1.3  1  | 22   1.5  1
 1   3.0  0  |  4   2.2  0  |  5   2.5  0  | 12   2.0  1  | 13   1.8  1  | 20   1.3  1
 1   2.5  0  |  2   3.8  1  |  5   2.0  0  |  9   3.0  1  |              |
 1   2.0  0  |  3   5.0  0  |  5   2.0  0  |              |              |

Response variables: number of shoots (s), average length of shoots (l), vitrification (v)

Distributions: ??

Systematic component: regression model, completely randomized experiment.

Links: ??


[Figure: three scatter plots against BAP dose (0.0 to 2.0): number of shoots, average length of shoots, and vitrification]


Motivating example – Rotenone toxicity

Dose (di)   mi   yi
  0.0       49    0
  2.6       50    6
  3.8       48   16
  5.1       46   24
  7.7       49   42
 10.2       50   44

Response variable: Yi – number of dead insects out of mi insects (Martin, 1942).

Distribution: Binomial.

Systematic component: regression model, completely randomized experiment.

Aim: Lethal doses.


[Figure: observed proportions of dead insects plotted against dose]

the curve of the observed proportions against doses has an S shape


Motivation for the link function

Bioassay or biological assay – the measurement of the potency of a stimulus by means of the reactions it produces in living matter

The stimulus may be physiological, chemical, biological, etc.

insects exposed to chemical stimuli (insecticides) or biological stimuli (e.g. a virus)
other agricultural bioassays may involve large animals, fungi, plants or plant parts such as leaves.

The response here is quantal or binary – just 2 possible responses (success or failure).

an insect dies or survives
a seed germinates or fails to germinate
a cutting roots or fails to root

The rotenone data example


The tolerance of an individual (e.g. an insect) is the dose or concentration or intensity of the stimulus above which the individual responds (e.g. dies) and below which it does not respond.

An individual dies if its tolerance is smaller than the given dose.

Tolerance varies between individuals – it is a random variable.

Suppose
Z: tolerance of a randomly chosen individual
fZ(z): the probability density function of Z
x: dose of the stimulus

[Figure: two examples of tolerance densities f(z) plotted against z]


Then, for an individual chosen at random from the population

$$P(\text{death} \mid x) = P(Z < x) = \int_{-\infty}^{x} f(z)\,dz$$

[Figure: left, tolerance density f(z) with the probability π marked; right, proportion of dead insects against z, with the LD50 marked]

It is often reasonable to assume that the tolerance Z has a normal distribution, that is, Z ∼ N(µ, σ²). Then the probability of death is given by the cumulative normal distribution:

$$\pi = P(\text{death} \mid x) = P(Z < x) = P\left(\frac{Z-\mu}{\sigma} < \frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right)$$

The inverse of the cumulative normal function,

$$\text{probit}(\pi) = \Phi^{-1}(\pi) = \alpha + \beta x, \qquad \text{where } \alpha = -\frac{\mu}{\sigma} \text{ and } \beta = \frac{1}{\sigma},$$

is called the probit transformation.


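Alternatively, we can assume that the tolerance $Z$ has a logistic distribution with mean $E(Z) = \mu$, variance $\mathrm{Var}(Z) = \pi^2\tau^2/3$ (in this variance formula $\pi$ is the constant 3.14159…, not a probability) and pdf

$$f_Z(z;\mu,\tau) = \frac{1}{\tau}\,\frac{\exp\left(\frac{z-\mu}{\tau}\right)}{\left[1+\exp\left(\frac{z-\mu}{\tau}\right)\right]^2}, \quad \mu \in \mathbb{R},\ \tau > 0.$$

Then the cumulative logistic distribution is

$$\pi = P(Z \le x) = F(x) = \int_{-\infty}^{x} \frac{\beta e^{\alpha+\beta z}}{(1+e^{\alpha+\beta z})^2}\,dz = \frac{e^{\alpha+\beta x}}{1+e^{\alpha+\beta x}}$$

where $\alpha = -\mu/\tau$ and $\beta = 1/\tau$.

The inverse of the cumulative distribution function of the logistic distribution is called the logit transformation,

$$\text{logit}(\pi) = \log\frac{\pi}{1-\pi} = \alpha + \beta x$$

The normal and logistic distributions are symmetric about the mean.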
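The closed form of the logistic integral can be checked numerically. The sketch below assumes illustrative values µ = 20 and τ = 2 (not from the slides), integrates the density written in terms of α and β, and compares the result with the closed form and with R's built-in `plogis`.

```r
# Hypothetical logistic tolerance distribution; values are illustrative only
mu <- 20
tau <- 2
alpha <- -mu / tau
beta <- 1 / tau

# Density written in terms of alpha and beta, as in the integrand
f <- function(z) beta * exp(alpha + beta * z) / (1 + exp(alpha + beta * z))^2

x <- 23
F_num <- integrate(f, lower = -Inf, upper = x)$value             # numerical integral
F_closed <- exp(alpha + beta * x) / (1 + exp(alpha + beta * x))  # closed form

# The logit of the probability recovers the linear predictor
logit_p <- log(F_closed / (1 - F_closed))

print(c(numerical = F_num, closed = F_closed, plogis = plogis(x, mu, tau)))
print(c(logit = logit_p, linear = alpha + beta * x))
```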

Page 45: Mixed Models, theory and applications

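A different, asymmetric distribution for the tolerance $Z$ is the extreme value (Gumbel) distribution with location $\mu$ and scale $\tau$, with mean $E(Z) = \mu - \gamma\tau$ and variance $\mathrm{Var}(Z) = \pi^2\tau^2/6$ (with $\pi$ the constant 3.14159…), where $\gamma \approx 0.577216$ is Euler's constant, defined by $\gamma = -\psi(1) = \lim_{n\to\infty}\left(\sum_{i=1}^{n} i^{-1} - \log n\right)$, $\psi(p) = d\log\Gamma(p)/dp$ is the digamma function, and pdf

$$f_Z(z;\mu,\tau) = \frac{1}{\tau}\exp\left(\frac{z-\mu}{\tau}\right)\exp\left[-\exp\left(\frac{z-\mu}{\tau}\right)\right], \quad \mu \in \mathbb{R},\ \tau > 0.$$

Then the cumulative Gumbel distribution is

$$\pi = F(x) = \int_{-\infty}^{x} \beta\exp\left(\alpha + \beta z - e^{\alpha+\beta z}\right)dz = 1 - \exp\left[-\exp(\alpha + \beta x)\right]$$

where $\alpha = -\mu/\tau$ and $\beta = 1/\tau$.

The inverse of the cumulative Gumbel distribution is called the complementary log-log transformation,

$$\text{c-loglog}(\pi) = \log(-\log(1-\pi)) = \alpha + \beta x$$

The theory presented above gives one type of biological justification for the choice of link function.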
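To make the asymmetry of the complementary log-log link concrete, the following sketch assumes illustrative values α = −10 and β = 0.5 (hypothetical, not from the slides), evaluates the Gumbel response curve and checks that the c-log-log transformation recovers the linear predictor. It also compares the response at η = 0 with the value 0.5 obtained from the symmetric probit and logit links.

```r
# Hypothetical intercept and slope; values are illustrative only
alpha <- -10
beta <- 0.5

x <- seq(14, 22, by = 2)                 # doses, giving eta = -3, ..., 1
eta <- alpha + beta * x
p <- 1 - exp(-exp(eta))                  # cumulative Gumbel response

cloglog_p <- log(-log(1 - p))            # complementary log-log transformation

# At eta = 0 the symmetric links give 0.5; the c-log-log link does not
p_at_zero <- c(cloglog = 1 - exp(-1), logit = plogis(0), probit = pnorm(0))

print(cbind(x, eta, p, cloglog_p))
print(p_at_zero)
```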

Page 46: Mixed Models, theory and applications

[Figure, six panels: two tolerance densities f(z) against z; two curves of the proportion of dead insects against z; probit(proportion) and c-log-log(proportion) against z.]

Page 47: Mixed Models, theory and applications

Generalized Linear Models – Definition

The three components of a generalized linear model (Nelder and Wedderburn, 1972; McCullagh and Nelder, 1989) are:

- independent random variables $Y_i$, $i = 1, \dots, n$, from an exponential family distribution with means $\mu_i$ and constant dispersion (scale) parameter $\phi$,

$$f(y) = \exp\left\{\frac{y\theta - b(\theta)}{\phi} + c(y,\phi)\right\}$$

  where $\mu = E[Y] = b'(\theta)$ and $\mathrm{Var}(Y) = \phi\,b''(\theta) = \phi V(\mu)$; $V(\mu)$ is called the variance function;

- a linear predictor vector $\eta$ given by

$$\eta = X\beta$$

  where $\beta$ is a vector of $p$ unknown parameters and $X = [x_1, \dots, x_n]'$ is the $n \times p$ design matrix;

- a link function $g(\cdot)$ relating the mean to the linear predictor, i.e.

$$g(\mu_i) = \eta_i = x_i'\beta$$
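The relations µ = b′(θ) and V(µ) = b″(θ) can be checked numerically. The sketch below does this for the Poisson case, assuming b(θ) = exp(θ), and compares the result with the variance and link functions stored in R's `poisson()` family object. The finite-difference step `h` and the test means are arbitrary choices made for illustration.

```r
fam <- poisson(link = "log")

mu <- c(0.5, 2, 10)               # illustrative means
theta <- log(mu)                  # canonical parameter of the Poisson
b <- function(theta) exp(theta)   # cumulant function b(theta)

# Central finite differences for b'(theta) and b''(theta)
h <- 1e-4
b1 <- (b(theta + h) - b(theta - h)) / (2 * h)
b2 <- (b(theta + h) - 2 * b(theta) + b(theta - h)) / h^2

# b1 should reproduce mu, b2 should reproduce V(mu) = mu
print(cbind(mu, b1, V = fam$variance(mu), b2))

# Link and inverse link: g(mu) = log(mu), g^{-1}(eta) = exp(eta)
eta <- fam$linkfun(mu)
print(cbind(eta, mu_back = fam$linkinv(eta)))
```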

Page 48: Mixed Models, theory and applications

Normal Models

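$Y_i$, $i = 1, \dots, n$, a continuous response variable, $Y_i \sim N(\mu_i, \sigma^2)$ with mean $\mu_i$ and constant variance $\sigma^2$.

We model the mean $\mu_i$ in terms of the explanatory variables $x_i$. As a GLM:

Random component:

$$f(y_i;\mu_i,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y_i-\mu_i)^2}{2\sigma^2}\right] = \exp\left[\frac{1}{\sigma^2}\left(y_i\mu_i - \frac{\mu_i^2}{2}\right) - \frac{1}{2}\log(2\pi\sigma^2) - \frac{y_i^2}{2\sigma^2}\right]$$

where

$$\theta_i = \mu_i,\quad \phi = \sigma^2,\quad b(\theta_i) = \frac{\mu_i^2}{2} = \frac{\theta_i^2}{2},\quad c(y_i,\phi) = -\frac{1}{2}\left[\frac{y_i^2}{\sigma^2} + \log(2\pi\sigma^2)\right],\quad V(\mu_i) = 1$$

Systematic component: $\eta_i = x_i'\beta$

Link function: $\eta_i = \mu_i$ (identity link, canonical link)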

Page 49: Mixed Models, theory and applications

Binomial regression model

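$Y_i$, count of successes out of a sample of size $m_i$, $i = 1, \dots, n$.

$Y_i \sim \text{Bin}(m_i, \pi_i)$ with mean $E[Y_i] = \mu_i = m_i\pi_i$ and $\mathrm{Var}(Y_i) = m_i\pi_i(1-\pi_i)$.

We model the expected proportions $\pi_i \in [0, 1]$ in terms of the explanatory variables $x_i$. As a GLM:

Random component:

$$f(y_i;\pi_i) = \binom{m_i}{y_i}\pi_i^{y_i}(1-\pi_i)^{m_i-y_i} = \exp\left[y_i\log\left(\frac{\pi_i}{1-\pi_i}\right) + m_i\log(1-\pi_i) + \log\binom{m_i}{y_i}\right],$$

where $y_i = 0, 1, \dots, m_i$,

$$\phi = 1,\quad \theta_i = \log\left(\frac{\pi_i}{1-\pi_i}\right) = \log\left(\frac{\mu_i}{m_i-\mu_i}\right) \Rightarrow \mu_i = \frac{m_i e^{\theta_i}}{1+e^{\theta_i}},$$

$$b(\theta_i) = -m_i\log(1-\pi_i) = m_i\log(1+e^{\theta_i}),\quad c(y_i,\phi) = \log\binom{m_i}{y_i},\quad V(\mu_i) = \frac{\mu_i(m_i-\mu_i)}{m_i}$$

Systematic component:

$$\eta_i = x_i'\beta$$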
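The exponential-family form of the binomial density can be verified against R's `dbinom`. The sketch below uses illustrative values m = 20 and π = 0.3 (hypothetical) and also checks the variance function through the identity Var(Y) = φ V(µ) with φ = 1.

```r
m <- 20        # illustrative sample size
prob <- 0.3    # illustrative success probability
y <- 0:m

theta <- log(prob / (1 - prob))      # canonical parameter (logit)
b_theta <- m * log(1 + exp(theta))   # b(theta)
c_y <- lchoose(m, y)                 # c(y, phi) = log choose(m, y)

# Density in exponential-family form, with phi = 1
dens_expfam <- exp(y * theta - b_theta + c_y)

# Mean and variance function
mu <- m * exp(theta) / (1 + exp(theta))
V_mu <- mu * (m - mu) / m

print(max(abs(dens_expfam - dbinom(y, size = m, prob = prob))))
print(c(mu = mu, V = V_mu, var_binomial = m * prob * (1 - prob)))
```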

Page 50: Mixed Models, theory and applications

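Link function:

The canonical link function is the logit

$$\eta_i = g(\pi_i) = g\left(\frac{\mu_i}{m_i}\right) = \log\left(\frac{\mu_i}{m_i-\mu_i}\right) = \log\left(\frac{\pi_i}{1-\pi_i}\right)$$

Other common choices are

- probit: $\eta_i = g(\pi_i) = g\left(\frac{\mu_i}{m_i}\right) = \Phi^{-1}(\mu_i/m_i) = \Phi^{-1}(\pi_i)$
- complementary log-log link: $\eta_i = g(\pi_i) = g\left(\frac{\mu_i}{m_i}\right) = \log\{-\log(1-\pi_i)\}$.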

Page 51: Mixed Models, theory and applications

Poisson regression models

$Y_i$, $i = 1, \dots, n$, are counts with means $\mu_i$.

$Y_i \sim \text{Pois}(\mu_i)$ with mean $\mu_i$ and variance $\mathrm{Var}(Y_i) = \mu_i$.

As a GLM:

Random component:

$$f(y_i;\mu_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} = \exp\left[y_i\log(\mu_i) - \mu_i - \log(y_i!)\right]$$

where

$$\phi = 1,\quad \theta_i = \log(\mu_i) \Rightarrow \mu_i = \exp(\theta_i),\quad b(\theta_i) = \mu_i = \exp(\theta_i),\quad c(y_i,\phi) = -\log(y_i!),\quad V(\mu_i) = \mu_i$$

Systematic component:

$$\eta_i = x_i'\beta$$

Link function:

The canonical link function is the log

$$\eta_i = g(\mu_i) = \log(\mu_i)$$

Page 52: Mixed Models, theory and applications

For different observation periods/areas/volumes $t_i$:

$$Y_i \sim \text{Pois}(t_i\lambda_i)$$

Taking a log-linear model for the rates,

$$\log(\lambda_i) = x_i'\beta$$

results in the following log-linear model for the Poisson means

$$\log(\mu_i) = \log(t_i\lambda_i) = \log(t_i) + x_i'\beta,$$

where $\log(t_i)$ is included as a fixed term, or offset, in the model.
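A minimal sketch of a rate model with an offset, on simulated data. The exposure variable `t_obs`, the covariate `x` and the true coefficients (0.5 and 1.2) are all hypothetical and chosen only for illustration.

```r
set.seed(1)
n <- 200
x <- runif(n)
t_obs <- runif(n, min = 1, max = 10)   # hypothetical exposure (period/area/volume)
lambda <- exp(0.5 + 1.2 * x)           # true rate per unit of exposure
y <- rpois(n, t_obs * lambda)          # counts with mean t_i * lambda_i

# log(t_obs) enters the linear predictor with its coefficient fixed at 1
fit <- glm(y ~ x + offset(log(t_obs)), family = poisson(link = "log"))

# Only the intercept and the slope of the log-rate model are estimated
print(coef(fit))
```

The estimated coefficients describe the rate λ, not the mean count, which is why they can be compared with the values used in the simulation.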

Page 53: Mixed Models, theory and applications

Components of some distributions from exponential family

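Distribution | $\phi$ | $\theta$ | $b(\theta)$ | $c(y,\phi)$ | $\mu(\theta)$ | $V(\mu)$
Normal: $N(\mu,\sigma^2)$ | $\sigma^2$ | $\mu$ | $\theta^2/2$ | $-\frac{1}{2}\left[\frac{y^2}{\sigma^2} + \log(2\pi\sigma^2)\right]$ | $\theta$ | $1$
Poisson: $P(\mu)$ | $1$ | $\log(\mu)$ | $e^{\theta}$ | $-\log(y!)$ | $e^{\theta}$ | $\mu$
Binomial: $B(m,\pi)$ | $1$ | $\log\left(\frac{\mu}{m-\mu}\right)$ | $m\log(1+e^{\theta})$ | $\log\binom{m}{y}$ | $\frac{m e^{\theta}}{1+e^{\theta}}$ | $\frac{\mu}{m}(m-\mu)$
Negative Binomial: $BN(\mu,k)$ | $1$ | $\log\left(\frac{\mu}{\mu+k}\right)$ | $-k\log(1-e^{\theta})$ | $\log\left[\frac{\Gamma(k+y)}{\Gamma(k)\,y!}\right]$ | $\frac{k e^{\theta}}{1-e^{\theta}}$ | $\mu\left(\frac{\mu}{k}+1\right)$
Gamma: $G(\mu,\nu)$ | $\nu^{-1}$ | $-\frac{1}{\mu}$ | $-\log(-\theta)$ | $\nu\log(\nu y) - \log(y) - \log\Gamma(\nu)$ | $-\frac{1}{\theta}$ | $\mu^2$
Inverse Gaussian: $IG(\mu,\sigma^2)$ | $\sigma^2$ | $-\frac{1}{2\mu^2}$ | $-(-2\theta)^{1/2}$ | $-\frac{1}{2}\left[\log(2\pi\sigma^2y^3) + \frac{1}{\sigma^2 y}\right]$ | $(-2\theta)^{-1/2}$ | $\mu^3$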

Page 54: Mixed Models, theory and applications

Canonical link functions for some distributions

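Distribution | Canonical link function
Normal | Identity: $\eta = \mu$
Poisson | Logarithmic: $\eta = \log(\mu)$
Binomial | Logistic: $\eta = \log\left(\frac{\pi}{1-\pi}\right) = \log\left(\frac{\mu}{m-\mu}\right)$
Gamma | Reciprocal: $\eta = \frac{1}{\mu}$
Inverse Gaussian | Reciprocal squared: $\eta = \frac{1}{\mu^2}$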

Page 55: Mixed Models, theory and applications

Estimation

Maximum likelihood estimation.

Estimation algorithm (Nelder and Wedderburn, 1972) – Iteratively weighted least squares (IWLS)

$$X'W^{[t]}X\,\beta^{[t+1]} = X'W^{[t]}z^{[t]}$$

where

- $t$ denotes the iteration index
- $X = [x_1, x_2, \dots, x_n]'$ is the $n \times p$ design matrix
- $W = \text{diag}\{W_i\}$ depends on the known (prior) weights $w_i$, the variance function (distribution) and the link function:

$$W_i = \frac{w_i}{V(\mu_i)}\left(\frac{d\mu_i}{d\eta_i}\right)^2$$

- $\beta$ is the $p \times 1$ parameter vector
- $z$ is an $n \times 1$ vector (the adjusted response variable), which depends on $y$ and the link function:

$$z_i = \eta_i + (y_i - \mu_i)\frac{d\eta_i}{d\mu_i}$$
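The IWLS iteration can be written out directly. The sketch below is a hand-rolled illustration for a Poisson model with log link on simulated data (the data, the starting value `y + 0.1` and the convergence tolerance are assumptions made for this example; it is not the internal code of `glm`). For the log link, dµ/dη = µ and V(µ) = µ, so with unit prior weights W_i = µ_i.

```r
set.seed(42)
n <- 100
x <- rnorm(n)
y <- rpois(n, exp(0.3 + 0.7 * x))   # simulated counts (illustrative)
X <- cbind(1, x)                    # design matrix with intercept

mu <- y + 0.1                       # starting values, shifted to avoid log(0)
eta <- log(mu)
beta <- rep(0, ncol(X))

for (iter in 1:25) {
  W <- mu                           # (1 / V(mu)) * (d mu / d eta)^2 = mu
  z <- eta + (y - mu) / mu          # adjusted response: d eta / d mu = 1 / mu
  # Solve X' W X beta = X' W z
  beta_new <- as.vector(solve(t(X) %*% (W * X), t(X) %*% (W * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  eta <- as.vector(X %*% beta)
  mu <- exp(eta)
  if (converged) break
}

fit <- glm(y ~ x, family = poisson)
print(rbind(iwls = beta, glm = unname(coef(fit))))
```

The hand-written loop and `glm` should agree to several decimal places; small differences come only from the different stopping rules.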

Page 56: Mixed Models, theory and applications

When the dispersion parameter is unknown, it may be estimated by the Pearson estimator

$$\hat\phi = \frac{1}{n-p}\sum_{i=1}^{n}\frac{w_i(y_i-\hat\mu_i)^2}{V(\hat\mu_i)}$$

where $\hat\mu_i = g^{-1}(x_i'\hat\beta)$ is the $i$th fitted value.

Some computer packages estimate $\phi$ by the deviance estimator $D(\hat\beta)/(n-p)$, but it cannot be recommended because of problems with bias and inconsistency in the case of a non-constant variance function.

For positive data, the deviance may also be sensitive to rounding errors for small values of $y_i$.

The asymptotic variance of $\hat\beta$ is estimated by the inverse (Fisher) information matrix, giving

$$\widehat{\mathrm{Var}}(\hat\beta) = K = \phi(X'WX)^{-1},$$

where $W$ is calculated from $\hat\beta$.

The standard error $se(\hat\beta_j)$ is calculated as the square root of the $j$th diagonal element of this matrix, for $j = 1, \dots, p$.
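The Pearson estimator and the standard errors can be computed by hand and compared with what `summary()` reports for a quasi-Poisson fit. The data below are simulated from a negative binomial distribution purely to obtain overdispersed counts; all numerical settings are assumptions for illustration.

```r
set.seed(7)
n <- 150
x <- runif(n)
y <- rnbinom(n, mu = exp(1 + x), size = 2)   # overdispersed counts (illustrative)

fit <- glm(y ~ x, family = quasipoisson)
mu_hat <- fitted(fit)
p <- length(coef(fit))

# Pearson estimator of phi with V(mu) = mu and unit prior weights
phi_hat <- sum((y - mu_hat)^2 / mu_hat) / (n - p)

# K = phi * (X' W X)^{-1}, with W_i = mu_i for the log link
X <- model.matrix(fit)
W <- as.vector(mu_hat)
K <- phi_hat * solve(t(X) %*% (W * X))
se <- sqrt(diag(K))

print(c(by_hand = phi_hat, summary = summary(fit)$dispersion))
print(cbind(by_hand = se, summary = summary(fit)$coefficients[, "Std. Error"]))
```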

Page 57: Mixed Models, theory and applications

When $\phi$ is known, a $1-\alpha$ confidence interval for $\beta_j$ is defined by the endpoints

$$\hat\beta_j \pm se(\hat\beta_j)\,z_{1-\alpha/2}$$

where $z_{1-\alpha/2}$ is the $1-\alpha/2$ standard normal quantile.

For $\phi$ unknown, we replace $\phi$ by $\hat\phi$ in $K$, and a $1-\alpha$ confidence interval for $\beta_j$ is defined by the endpoints

$$\hat\beta_j \pm se(\hat\beta_j)\,t_{(1-\alpha/2)}(n-p)$$

where $t_{(1-\alpha/2)}(n-p)$ is the $1-\alpha/2$ quantile of Student's $t$ distribution with $n-p$ degrees of freedom.
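Both kinds of interval can be computed from a fitted model. The sketch below uses simulated Poisson data (an assumption for illustration): the normal-quantile interval treats φ = 1 as known, while the t-based interval uses a quasi-Poisson fit in which φ is estimated.

```r
set.seed(3)
n <- 80
x <- runif(n)
y <- rpois(n, exp(0.5 + x))   # simulated counts (illustrative)
alpha <- 0.05

# phi known (= 1): normal quantile
fit <- glm(y ~ x, family = poisson)
est <- coef(fit)
se <- sqrt(diag(vcov(fit)))
z <- qnorm(1 - alpha / 2)
ci_z <- cbind(lower = est - z * se, upper = est + z * se)

# phi estimated: Student t quantile with n - p degrees of freedom
fit_q <- glm(y ~ x, family = quasipoisson)
se_q <- sqrt(diag(vcov(fit_q)))
tq <- qt(1 - alpha / 2, df = df.residual(fit_q))
ci_t <- cbind(lower = est - tq * se_q, upper = est + tq * se_q)

print(ci_z)
print(ci_t)
```

The first interval is the same Wald interval that `confint.default(fit)` returns.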

Page 58: Mixed Models, theory and applications

Analysis of Deviance – Goodness of fit and model selection

Analysis of deviance is the method of parameter inference for generalized linear models based on the deviance, generalizing ideas from ANOVA, and first introduced by Nelder and Wedderburn (1972).

The situation is similar to regression analysis, in the sense that model terms must be eliminated sequentially, and the significance of a term may depend on which other terms are in the model.

The deviance $D$ measures the distance between $y$ and $\hat\mu$; the scaled deviance is given by

$$S = \frac{D(\hat\beta)}{\phi} = -2\left[\log L(\hat\mu, y) - \log L(y, y)\right] = 2\phi^{-1}\sum_{i=1}^{n} w_i\left[y_i(\tilde\theta_i - \hat\theta_i) + b(\hat\theta_i) - b(\tilde\theta_i)\right]$$

where $L(\hat\mu, y)$ and $L(y, y)$ are the likelihood function values for the current and saturated models, $\tilde\theta_i = \theta(y_i)$, $\hat\theta_i = \theta(\hat\mu_i)$ and $D(\hat\beta) = \sum_{i=1}^{n} w_i\,d(y_i;\hat\mu_i)$.

Page 59: Mixed Models, theory and applications

Deviance for some models

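Model | Deviance
Normal | $D_p = \sum_{i=1}^{n}(y_i-\hat\mu_i)^2$
Binomial | $D_p = 2\sum_{i=1}^{n}\left[y_i\log\left(\frac{y_i}{\hat\mu_i}\right) + (m_i-y_i)\log\left(\frac{m_i-y_i}{m_i-\hat\mu_i}\right)\right]$
Poisson | $D_p = 2\sum_{i=1}^{n}\left[y_i\log\left(\frac{y_i}{\hat\mu_i}\right) + (\hat\mu_i-y_i)\right]$
Negative Binomial | $D_p = 2\sum_{i=1}^{n}\left[y_i\log\left(\frac{y_i}{\hat\mu_i}\right) + (y_i+k)\log\left(\frac{\hat\mu_i+k}{y_i+k}\right)\right]$
Gamma | $D_p = 2\sum_{i=1}^{n}\left[\log\left(\frac{\hat\mu_i}{y_i}\right) + \frac{y_i-\hat\mu_i}{\hat\mu_i}\right]$
Inverse Gaussian | $D_p = \sum_{i=1}^{n}\frac{(y_i-\hat\mu_i)^2}{y_i\hat\mu_i^2}$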
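As a check on the table, the Poisson deviance can be computed from its formula and compared with the value returned by `deviance()`. The data are simulated for illustration; the term y log(y/µ̂) is taken as 0 when y = 0, which is its limiting value.

```r
set.seed(11)
n <- 60
x <- runif(n)
y <- rpois(n, exp(1 + x))   # simulated counts (illustrative)

fit <- glm(y ~ x, family = poisson)
mu_hat <- fitted(fit)

# y * log(y / mu) is defined as 0 for y = 0
log_term <- ifelse(y == 0, 0, y * log(y / mu_hat))
D_poisson <- 2 * sum(log_term + (mu_hat - y))

print(c(by_hand = D_poisson, glm = deviance(fit)))
```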

Page 60: Mixed Models, theory and applications

We consider separately the cases where $\phi$ is known and unknown, but first we introduce some notation.

Let $M_1$ denote a model with $p$ parameters, and let $D_1 = D(\hat\beta)$ denote the minimized deviance under $M_1$.

Let $M_2$ denote a sub-model of $M_1$ with $q < p$ parameters, and let $D_2$ denote the corresponding minimized deviance, where $D_2 \ge D_1$.

Page 61: Mixed Models, theory and applications

Known dispersion parameter $\phi$

Mainly relevant for discrete data, for which, in general, $\phi = 1$.

The deviance $D_1$ is a measure of goodness-of-fit of the model $M_1$, and is also known as the $G^2$ statistic in discrete data analysis.

A more traditional goodness-of-fit statistic is Pearson's $X^2$ statistic

$$X^2 = \sum_{i}\frac{w_i(y_i-\hat\mu_i)^2}{V(\hat\mu_i)}$$

Asymptotically, for large $w$, the statistics $D_1$ and $X^2$ are equivalent and distributed as $\chi^2(n-p)$ under $M_1$.

Various numerical and analytical investigations have shown that the limiting $\chi^2$ distribution is approached faster for the $X^2$ statistic than for $D_1$, at least for discrete data.

A formal level $\alpha$ goodness-of-fit test for $M_1$ is obtained by rejecting $M_1$ if $X^2 > \chi^2_{(1-\alpha)}(n-p)$.

This test may be interpreted as a test for overdispersion.

The fit of a model is a complex question that cannot be summarized in a single number – supplement with an inspection of residuals.

Page 62: Mixed Models, theory and applications

To test the sub-model $M_2$ with $q < p$ parameters we use the log likelihood ratio statistic

$$D_2 - D_1 \sim \chi^2(p-q)$$

$M_2$ is rejected at level $\alpha$ if $D_2 - D_1 > \chi^2_{(1-\alpha)}(p-q)$.

In the case where $\phi \ne 1$ we use the scaled deviance $D/\phi$ instead of $D$, the scaled Pearson statistic $X^2/\phi$ instead of $X^2$, and so on.
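A short sketch of the deviance-difference test for nested Poisson models (φ = 1), on simulated data in which the extra covariate `x2` has no effect. The data and variable names are hypothetical; the hand computation is compared with the table produced by `anova()`.

```r
set.seed(5)
n <- 120
x1 <- runif(n)
x2 <- runif(n)
y <- rpois(n, exp(0.5 + 1.0 * x1))   # x2 plays no role in the simulation

m2 <- glm(y ~ x1, family = poisson)        # sub-model M2, q = 2 parameters
m1 <- glm(y ~ x1 + x2, family = poisson)   # model M1, p = 3 parameters

lr <- deviance(m2) - deviance(m1)          # D2 - D1
df_diff <- df.residual(m2) - df.residual(m1)   # p - q
p_value <- pchisq(lr, df = df_diff, lower.tail = FALSE)

tab <- anova(m2, m1, test = "Chisq")
print(tab)
print(c(D2_minus_D1 = lr, df = df_diff, p_value = p_value))
```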

Page 63: Mixed Models, theory and applications

Unknown dispersion parameter $\phi$

The dispersion parameter is usually unknown for continuous data.

In the discrete case we may prefer to work with an unknown dispersion parameter if evidence of overdispersion has been found in the data.

There is no formal goodness-of-fit test available based on $X^2$ – the fit of the model $M_1$ to the data must be checked by residual analysis.

$X^2$ is used to estimate the dispersion parameter

$$\hat\phi = \frac{1}{n-p}\sum_{i}\frac{w_i(y_i-\hat\mu_i)^2}{V(\hat\mu_i)}$$

where $\hat\mu_i = g^{-1}(\hat\beta'x_i)$ is the $i$th fitted value.

To test the sub-model $M_2$ with $q < p$ parameters, inference may be based on the $F$-statistic,

$$F = \frac{(D_2-D_1)/(p-q)}{\hat\phi} \sim F(p-q,\, n-p)$$

We reject $M_2$ at level $\alpha$ if $F > F_{1-\alpha}(p-q,\, n-p)$.
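The F-statistic can be assembled from two quasi-Poisson fits. In the sketch below the data are simulated negative binomial counts (an assumption, used only to produce overdispersion), φ̂ is the Pearson estimate from the larger model, and the result is compared with `anova(..., test = "F")`.

```r
set.seed(8)
n <- 120
x1 <- runif(n)
x2 <- runif(n)
y <- rnbinom(n, mu = exp(1 + x1), size = 2)   # overdispersed counts (illustrative)

m2 <- glm(y ~ x1, family = quasipoisson)        # sub-model M2
m1 <- glm(y ~ x1 + x2, family = quasipoisson)   # model M1
p <- length(coef(m1))
q <- length(coef(m2))

# Pearson estimate of phi from the larger model M1
phi_hat <- sum(residuals(m1, type = "pearson")^2) / (n - p)

F_stat <- ((deviance(m2) - deviance(m1)) / (p - q)) / phi_hat
p_value <- pf(F_stat, df1 = p - q, df2 = n - p, lower.tail = FALSE)

tab <- anova(m2, m1, test = "F")
print(tab)
print(c(F = F_stat, p_value = p_value))
```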

Page 64: Mixed Models, theory and applications

Deviance Table – An example

Suppose a completely randomized experiment with two factors A (with $a$ levels) and B (with $b$ levels) and $r$ replications.

Model | DF | Deviance | Deviance Diff. | DF Diff. | Meaning
Null | $rab-1$ | $D_1$ | | |
 | | | $D_1 - D_A$ | $a-1$ | A ignoring B
A | $a(rb-1)$ | $D_A$ | | |
 | | | $D_A - D_{A+B}$ | $b-1$ | B including A
A+B | $a(rb-1)-(b-1)$ | $D_{A+B}$ | | |
 | | | $D_{A+B} - D_{A*B}$ | $(a-1)(b-1)$ | Interaction AB, included A and B
A+B+A.B | $ab(r-1)$ | $D_{A*B}$ | | |
 | | | $D_{A*B}$ | $ab(r-1)$ | Residual
Saturated | 0 | 0 | | |

Page 65: Mixed Models, theory and applications

Residual analysis

Pearson residual

$$r_{Pi} = \frac{y_i-\hat\mu_i}{\sqrt{V(\hat\mu_i)}}$$

which reflects the skewness of the underlying distribution.

Deviance residual

$$r_{Di} = \text{sign}(y_i-\hat\mu_i)\sqrt{d(y_i;\hat\mu_i)}$$

which is much closer to being normal than the Pearson residual, but has a bias (Jorgensen, 2011).

Modified deviance residual (Jorgensen, 1997)

$$r^*_{Di} = r_{Di} + \frac{\phi}{r_{Di}}\log\frac{r_{Wi}}{r_{Di}}$$

where $r_{Wi}$ is the Wald residual defined by

$$r_{Wi} = \left[g_0(y_i) - g_0(\hat\mu_i)\right]\sqrt{V(y_i)}$$

where $g_0$ is the canonical link.

Page 66: Mixed Models, theory and applications

All those residuals have approximately mean zero and variance $\phi(1-h_i)$, where $h_i$ is the $i$th diagonal element of the matrix $H = W^{1/2}X(X'WX)^{-1}X'W^{1/2}$.

Use standardized residuals such as $r^*_{Di}(1-h_i)^{-1/2}$, which are nearly normal with variance $\phi$.

Plot residuals against the fitted values – to check the proposed variance function.

Normal Q-Q plot (or normal Q-Q plot with simulated envelopes) for the residuals – to check the correctness of the distributional assumption.
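Pearson and deviance residuals, the leverages h_i and the standardized versions are all available from a fitted `glm` object. The sketch below uses simulated Poisson data (an assumption for illustration) and divides each residual by (1 − h_i)^{1/2}; with φ = 1 this matches what `rstandard()` returns.

```r
set.seed(21)
n <- 100
x <- runif(n)
y <- rpois(n, exp(0.5 + 1.5 * x))   # simulated counts (illustrative)
fit <- glm(y ~ x, family = poisson)

r_pearson <- residuals(fit, type = "pearson")
r_deviance <- residuals(fit, type = "deviance")
h <- hatvalues(fit)                 # diagonal of H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}

# Standardized residuals: divide by sqrt(1 - h_i)
r_pearson_std <- r_pearson / sqrt(1 - h)
r_deviance_std <- r_deviance / sqrt(1 - h)

print(summary(cbind(r_pearson_std, r_deviance_std)))
```

The diagnostic plots mentioned above could then be drawn with, for example, `plot(fitted(fit), r_deviance_std)` and `qqnorm(r_deviance_std)`.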

Page 67: Mixed Models, theory and applications

Checking for the link function

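Suppose the link function $g_0(\mu) = g(\mu,\lambda_0) = X\beta$ is nested in a parametric family $g(\mu,\lambda)$, indexed by the parameter $\lambda$, for example,

$$g(\mu,\lambda) = \begin{cases}\dfrac{\mu^{\lambda}-1}{\lambda} & \lambda \ne 0\\[2mm] \log(\mu) & \lambda = 0\end{cases}$$

which includes the identity and logarithmic links, or the Aranda-Ordaz family,

$$\mu = \begin{cases}1-(1+\lambda e^{\eta})^{-1/\lambda} & \lambda e^{\eta} > -1\\ 1 & \text{otherwise}\end{cases}$$

which includes the logit ($\lambda = 1$) and complementary log-log ($\lambda \to 0$) links.

The Taylor expansion for $g(\mu,\lambda)$ around $\lambda_0$ gives

$$g(\mu,\lambda) \simeq g(\mu,\lambda_0) + (\lambda-\lambda_0)\,u(\lambda_0) = X\beta + \gamma\,u(\lambda_0)$$

where $\gamma = \lambda - \lambda_0$ and $u(\lambda_0) = \left.\dfrac{\partial g(\mu,\lambda)}{\partial\lambda}\right|_{\lambda=\lambda_0}$.

In general, $u(\lambda_0) = \hat\eta^2$ is used.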

Page 68: Mixed Models, theory and applications


Justification

Suppose the link function used was $\eta = g(\mu)$ and that the true link is $g^*(\mu)$. Then,

$$g(\mu) = g[g^{*-1}(\eta)] = h(\eta)$$

The null hypothesis is $H_0: h(\eta) = \eta$, against $H_a: h(\eta)$ non-linear. Using a Taylor expansion for $g(\mu)$ we have:

$$g(\mu) \simeq h(0) + \eta\,h'(0) + \eta^2\,\frac{h''(0)}{2}$$

so the added variable is $\eta^2$, assuming that the model has terms for which the mean is marginal.

Formal tests
- Likelihood ratio test
- Score test
- Wald test

Graphics
- Added variable plot
- Partial residual plot


Page 69: Mixed Models, theory and applications

R commands for GLM

glm(resp ~ linear predictor + offset(of), weights = w,

family=familyname(link ="linkname" ))

Here resp is the response variable y. For a binomial regression model it is necessary to create a two-column response (successes, failures):

resp<-cbind(y,n-y)

The possible families (with their default links) are:

binomial(link = "logit")

gaussian(link = "identity")

Gamma(link = "inverse")

inverse.gaussian(link = "1/mu^2")

poisson(link = "log")

quasi(link = "identity", variance = "constant")

quasibinomial(link = "logit")

quasipoisson(link = "log")

The default family is gaussian and the default links are the canonical links (they do not need to be declared). Other possible links are "probit", "cloglog", "cauchit", "sqrt", etc. To see more, type

?glm

Page 70: Mixed Models, theory and applications


Normal models – Transform or link

Example

Average daily fat yields (kg/day) from milk from a single cow for each of 35 weeks (McCulloch, 2001)

0.31 0.39 0.50 0.58 0.59 0.64
0.68 0.66 0.67 0.70 0.72 0.68
0.65 0.64 0.57 0.48 0.46 0.45
0.31 0.33 0.36 0.30 0.26 0.34
0.29 0.31 0.29 0.20 0.15 0.18
0.11 0.07 0.06 0.01 0.01

Page 71: Mixed Models, theory and applications

Fat yield example

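A typical model: Fat yield $\approx \alpha t^{\beta}e^{\gamma t}$, where $t$ = week.

Transform

$$Y_i = \alpha t_i^{\beta}e^{\gamma t_i}e^{\varepsilon_i}$$

$$\log(Y_i) = \mu_i + \varepsilon_i = \log\alpha + \beta\log(t_i) + \gamma t_i + \varepsilon_i$$

$$\log(Y_i) \sim N(\log\alpha + \beta\log(t_i) + \gamma t_i,\ \sigma^2)$$

$$E[\log(Y_i)] = \log\alpha + \beta\log(t_i) + \gamma t_i$$

Link

$$Y_i = \mu_i + \xi_i = \alpha t_i^{\beta}e^{\gamma t_i} + \xi_i$$

$$Y_i \sim N(\alpha t_i^{\beta}e^{\gamma t_i},\ \tau^2)$$

$$E[Y_i] = \alpha t_i^{\beta}e^{\gamma t_i}$$

$$\log(E[Y_i]) = \log\alpha + \beta\log(t_i) + \gamma t_i$$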

Page 72: Mixed Models, theory and applications

Plot of fat yield for each week – Observed values and fitted curve

[Figure: fat yield (kg/day) against weeks; observed values with the fitted curves from the log transformation and from the log link.]


Page 73: Mixed Models, theory and applications

R program

# Average daily fat yields (kg/day) from milk

# from a single cow for each of 35 weeks

# McCulloch (2001)

# Ruppert, Cressie, Carroll (1989), Biometrics, 45:637-656

# data entered with c() so that the script also runs non-interactively
fatyield.dat <- data.frame(yield = c(
  0.31, 0.39, 0.50, 0.58, 0.59, 0.64,
  0.68, 0.66, 0.67, 0.70, 0.72, 0.68,
  0.65, 0.64, 0.57, 0.48, 0.46, 0.45,
  0.31, 0.33, 0.36, 0.30, 0.26, 0.34,
  0.29, 0.31, 0.29, 0.20, 0.15, 0.18,
  0.11, 0.07, 0.06, 0.01, 0.01))

fatyield.dat$week <- 1:35

attach(fatyield.dat)

lweek<-log(week)

## Normal model for log(fat)

lyield<-log(yield)

mod1<-lm(lyield~week+lweek)

summary(mod1)

anova(mod1)

Page 74: Mixed Models, theory and applications

R program (cont.)

## Normal model with log link

mod2<-glm(yield~week+lweek, family=gaussian(link="log"))

fit2<-fitted(mod2)

summary(mod2)

anova(mod2)

# Plotting

plot(c(0,35), c(0,0.9), type="n", xlab="Weeks",

ylab="Fat yield (kg/day)")

points(week,yield, pch="*")

w<-seq(1,35,0.1)

lines(w, predict(mod2,data.frame(week=w, lweek=log(w)),

type="response"),

col="green", lty=1)

lines(w, exp(predict(mod1,data.frame(week=w, lweek=log(w)),

type="response")),

col="red", lty=2)

legend(20,0.9,c("Observed","Log transformation", "Log link"),

lty=c(-1,2,1),

pch=c("*"," "," "), col=c("black","red","green"),cex=.6)

title(sub="Figure 1. Fat yield (kg/day) for each week")

Page 75: Mixed Models, theory and applications


Binomial regression model

Example

Batches of 20 pyrethroid-resistant moths (Heliothis virescens, a cotton crop pest) of each sex were exposed to a range of doses of cypermethrin two days after emergence from pupation. The number of moths which were either knocked down or dead was recorded after 72 h.

Doses ($d_i$) | Males | Females
1.0 | 1 | 0
2.0 | 4 | 2
4.0 | 9 | 6
8.0 | 13 | 10
16.0 | 18 | 12
32.0 | 20 | 16

Response variable: $Y_i$ – number of dead insects out of 20 insects.
Distribution: Binomial.
Systematic component: regression model, completely randomized experiment.
Aim: Lethal doses.

Page 76: Mixed Models, theory and applications

Cypermethrin toxicity example

Residual deviances

Terms fitted in model | d.f. | Deviance | $X^2$
Constant | 11 | 124.9 | 101.4
Sex | 10 | 118.8 | 97.4
Dose | 6 | 15.2 | 12.9
Sex + Dose | 5 | 5.0 | 3.7

Analysis of deviance

Source | d.f. | Deviance | p-value
Sex | 1 | 6.1 | 0.0144
Sex|Dose | 1 | 10.2 | 0.0002
Dose | 5 | 109.7 | < 0.0001
Dose|Sex | 5 | 113.8 | < 0.0001
Residual | 5 | 5.0 | 0.5841
Total | 11 | 124.9 |

Page 77: Mixed Models, theory and applications

Cypermethrin toxicity example

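Models
- $\text{logit}(p) = \alpha_j + \beta_j\log_2(\text{dose})$ – different logistic regression lines
- $\text{logit}(p) = \alpha_j + \beta\log_2(\text{dose})$ – common slope
- $\text{logit}(p) = \alpha + \beta_j\log_2(\text{dose})$ – common intercept
- $\text{logit}(p) = \alpha + \beta\log_2(\text{dose})$ – same logistic regression line.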

Page 78: Mixed Models, theory and applications

Cypermethrin toxicity example

Residual Deviances

Terms fitted in model | d.f. | Deviance | $X^2$
Constant | 11 | 124.9 | 101.4
Sex + Sex log2(dose) | 8 | 4.99 | 3.51
Sex + log2(dose) | 9 | 6.75 | 5.31
Const. + Sex log2(dose) | 9 | 5.04 | 3.50
Const. + log2(dose) | 10 | 16.98 | 14.76

Analysis of deviance

Source | d.f. | Deviance | p-value
Sex | 1 | 6.1 | 0.0144
Linear Regression | 1 | 112.0 | < 0.0001
Residual | 9 | 6.8 | 0.7473
Total | 11 | 124.9 |

Page 79: Mixed Models, theory and applications

Cypermethrin toxicity example

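Regression equations

males

$$\log\frac{p}{1-p} = -2.372 + 1.535\log_2(\text{dose})$$

females

$$\log\frac{p}{1-p} = -3.473 + 1.535\log_2(\text{dose})$$

Lethal Doses

males

$$\log_2(LD_{50}) = \frac{2.372}{1.535} = 1.55 \Rightarrow LD_{50} = 2^{1.55} \approx 2.92$$

females

$$\log_2(LD_{50}) = \frac{3.473}{1.535} = 2.26 \Rightarrow LD_{50} = 2^{2.26} \approx 4.80$$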
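The lethal doses follow directly from the fitted coefficients: the LD50 is the dose at which the linear predictor is zero. The sketch below plugs in the rounded estimates quoted above. Because the regressor is log2(dose), the result is converted to the dose scale with base 2; exponentiating with base e instead would give 4.69 and 9.61.

```r
# Rounded estimates from the common-slope model, as quoted in the regression equations
beta <- 1.535
alpha <- c(males = -2.372, females = -3.473)

# logit(p) = 0 at log2(dose) = -alpha / beta
log2_ld50 <- -alpha / beta
ld50 <- 2^log2_ld50            # back-transform from the log2 scale

print(round(log2_ld50, 2))
print(round(ld50, 2))
```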

Page 80: Mixed Models, theory and applications

Cypermethrin toxicity example

Plot of observed proportions and fitted curves

[Figure: observed proportions (symbols * and +, one per sex) and fitted curves against log(dose).]


Page 81: Mixed Models, theory and applications

R program

y <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)

sex<-factor(rep(c("M","F"), c(6,6)))

ldose<-rep(0:5,2)

dose<-2**ldose

dose<-factor(dose)

Cyper.dat <- data.frame(sex, dose, ldose, y)

attach(Cyper.dat)

plot(ldose,y/20, pch=c(rep("*",6),rep("+",6)),

col=c(rep("green",6), rep("red",6)),

xlab="log(dose)", ylab="Proportion killed")

resp<-cbind(y,20-y)

mod1<-glm(resp~1, family=binomial)

mod2<-glm(resp~dose, family=binomial)

mod3<-glm(resp~sex, family=binomial)

mod4<-glm(resp~dose+sex, family=binomial)

anova(mod1, mod2, mod4, test="Chisq")

anova(mod1, mod3, mod4, test="Chisq")

Page 82: Mixed Models, theory and applications

R program (cont.)

mod5<-glm(resp~ldose, family=binomial)

mod6<-glm(resp~sex+ldose-1, family=binomial)

mod7<-glm(resp~ldose/sex, family=binomial)

mod8<-glm(resp~ldose*sex, family=binomial)

anova(mod1, mod5, mod6, mod8, test="Chisq")

anova(mod1, mod5, mod7, mod8, test="Chisq")

summary(mod6)

## Plotting

plot(c(1,32), c(0,1), type="n", xlab="log(dose)",

ylab="Proportions", log="x")

points(2**ldose,y/20, pch=c(rep("*",6),rep("+",6)),

col=c(rep("green",6),rep("red",6)))

ld<-seq(0,5,0.1)

lines(2**ld, predict(mod6,data.frame(ldose=ld,

sex=factor(rep("M",length(ld)),levels=levels(sex))),

type="response"), col="green")

lines(2**ld, predict(mod6,data.frame(ldose=ld,

sex=factor(rep("F",length(ld)),levels=levels(sex))),

type="response"), col="red")

Page 83: Mixed Models, theory and applications


CS2 toxicity

Mortality of adult beetles after five hours' exposure to gaseous carbon disulphide (Bliss, 1935).

Log dosage | Number exposed | Number killed
1.6907 | 59 | 6
1.7242 | 60 | 13
1.7552 | 62 | 18
1.7842 | 56 | 28
1.8113 | 63 | 52
1.8369 | 59 | 53
1.8610 | 62 | 61
1.8839 | 60 | 60

Considerations

Response variable: $Y_i$ – number of killed beetles out of $m_i$ exposed beetles (Pregibon, 1980).

Distribution: Binomial.

Systematic component: regression model, completely randomized experiment.

Aim: Lethal doses.

Page 84: Mixed Models, theory and applications


Binomial logistic model with $\eta = \beta_0 + \beta_1\log(\text{dose})$ gives deviance 11.23 on 6 d.f. ⇒ lack of fit

Deviance for link test 8.037 on 1 d.f. ⇒ need for different link

Added variable plot for $U = \hat\eta^2$ ⇒ evidence that $\gamma \ne 0$, spread throughout data

[Figure: CS2 - Added variable plot (Pearson residuals against ordinary residuals).]

C-loglog link with a linear lp gives good fit

Logistic link with a quadratic lp gives good fit

Page 85: Mixed Models, theory and applications

[Figure: Beetle mortality - Observed proportions and fitted curves (logit and cloglog); proportion of beetles killed against log(dose).]

Page 86: Mixed Models, theory and applications

R program

# *** Mortality of adult beetle Example ***
# Pregibon (1980) - Applied Statistics, 29: 20

y <- c(6, 13, 18, 28, 52, 53, 61, 60)
m <- c(59, 60, 62, 56, 63, 59, 62, 60)
ldose <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)
beetle.dat <- data.frame(ldose, m, y)
attach(beetle.dat)
resp<-cbind(y,m-y)

## Logistic link function
modl<-glm(resp ~ poly(I(ldose))+poly(I(ldose),2), family=binomial(link="logit"))
anova(modl, test="Chisq")

# Null model: residual deviance p-value and Pearson X^2
mod1l<-glm(resp ~ 1, family=binomial(link="logit"))
1-pchisq(deviance(mod1l), df.residual(mod1l))
print(sum(residuals(mod1l, 'pearson')^2))
# Linear predictor in log dose
mod2l<-glm(resp ~ I(ldose), family=binomial(link="logit"))
1-pchisq(deviance(mod2l), df.residual(mod2l))
print(sum(residuals(mod2l, 'pearson')^2))

## Link function test
LP2l <- (predict(mod2l,type = c("link")))^2
mod3l<-update(mod2l , .~. +LP2l, family=binomial(link="logit"))
anova(mod3l, test="Chisq")

# Quadratic predictor in log dose
mod4l<-glm(resp ~ poly(I(ldose),2), family=binomial(link="logit"))
1-pchisq(deviance(mod4l), df.residual(mod4l))
print(sum(residuals(mod4l, 'pearson')^2))

Page 87: Mixed Models, theory and applications

R program

## Link function test
LP2q <- (predict(mod4l,type = c("link")))^2
mod5l<-update(mod4l , .~. +LP2q, family=binomial(link="logit"))
anova(mod5l, test="Chisq")

## Added variable plot
rp <- residuals(mod2l, 'pearson')
W <- mod2l$fitted*(1- mod2l$fitted)*m
U <- LP2l
mod1n <- glm(U~I(ldose), family=gaussian, weights=W)
rU <- (U - mod1n$fitted)
plot(rU,rp, xlab="Ordinary residuals",ylab="Pearson residuals")

# Partial residual plot
resparc <- residuals(mod3l, 'pearson') + mod3l$coef[3]*U
plot(U,resparc, xlab="U",ylab="Residual + component")

summary(mod6l<-glm(resp ~ I(ldose)+I(ldose^2), family=binomial(link="logit")))
anova(mod6l, test="Chisq")
library(MASS)
# variance-covariance matrix of estimated parameters
vcov(mod6l)

Page 88: Mixed Models, theory and applications

R program

## Complementary log-log link function
modc<-glm(resp ~ poly(I(ldose))+poly(I(ldose),2), family=binomial(link="cloglog"))
anova(modc, test="Chisq")

mod1c<-glm(resp ~ 1, family=binomial(link="cloglog"))
1-pchisq(deviance(mod1c), df.residual(mod1c))
print(sum(residuals(mod1c, 'pearson')^2))
mod2c<-glm(resp ~ I(ldose), family=binomial(link="cloglog"))
1-pchisq(deviance(mod2c), df.residual(mod2c))
print(sum(residuals(mod2c, 'pearson')^2))

## Link function test
LP2l <- (predict(mod2c,type = c("link")))^2
mod3c<-update(mod2c , .~. +LP2l, family=binomial(link="cloglog"))
anova(mod3c, test="Chisq")

## Added variable plot
rp <- residuals(mod2c, 'pearson')
W <- mod2c$fitted*(1- mod2c$fitted)*m
U <- LP2l
mod1n <- glm(U~I(ldose), family=gaussian, weights=W)
rU <- (U - mod1n$fitted)
plot(rU,rp, xlab="Ordinary residuals",ylab="Pearson residuals")

# Partial residual plot
resparc <- residuals(mod3c, 'pearson') + mod3c$coef[3]*U
plot(U,resparc, xlab="U", ylab="Residual + component")

Page 89: Mixed Models, theory and applications

R program

summary(mod5c<-glm(resp ~ I(ldose), family=binomial(link="cloglog")))
library(MASS)
# variance-covariance matrix of estimated parameters
vcov(mod5c)

# Doses LD50 and LD90
dose.p(mod5c); dose.p(mod5c, p=0.9)

# Doses LD25, LD50, LD75
dose.p(mod5c, p = 1:3/4)

# Plotting
plot(c(1.69,1.89), c(0,1), type="n", xlab="Log(dose)",
     ylab="Proportion of beetles killed")
points(ldose, y/m, pch="*")
ld<-seq(1.69,1.89,0.01)
lines(ld, predict(mod4l,data.frame(ldose=ld),type="response"), col="blue", lty=1)
lines(ld, predict(mod2c,data.frame(ldose=ld),type="response"), col="red", lty=2)
legend(1.7,1.0,c("Observed","Logit", "Cloglog"), lty=c(-1,1,2),
       pch=c("*"," "," "), col=c("black","blue", "red"))
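The lethal doses reported by dose.p can also be recomputed by hand. For a binomial GLM with link g and predictor β0 + β1 log(dose), the log-dose that kills a proportion p is (g(p) − β0)/β1. The following is a minimal sketch under that assumption, refitting the complementary log-log model on the beetle data; the helper name `ld` is hypothetical and not part of the original program.

```r
# Beetle mortality data (Pregibon, 1980), as in the program above
y <- c(6, 13, 18, 28, 52, 53, 61, 60)
m <- c(59, 60, 62, 56, 63, 59, 62, 60)
ldose <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)

fit <- glm(cbind(y, m - y) ~ ldose, family = binomial(link = "cloglog"))

# Hypothetical helper: log-dose at which a proportion p is killed,
# obtained by solving g(p) = b0 + b1 * ldose for ldose
ld <- function(fit, p) {
  b <- coef(fit)
  unname((family(fit)$linkfun(p) - b[1]) / b[2])
}

ld50 <- ld(fit, 0.5)
ld90 <- ld(fit, 0.9)
print(c(LD50 = ld50, LD90 = ld90))
```

Plugging the computed log-dose back into predict(fit, type = "response") should return the target proportion, which is a convenient sanity check.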

Page 90: Mixed Models, theory and applications


Germination of Orobanche seed

Example

O. aegyptiaca 75        O. aegyptiaca 73
Bean     Cucumber       Bean     Cucumber
10/39    5/6            8/16     3/12
23/62    53/74          10/30    22/41
23/81    55/72          8/28     15/30
26/51    32/51          23/45    32/51
17/39    46/79          0/4      3/7
         10/13

Response variable: Yi – number of germinated seeds out of mi seeds.

Distribution: Binomial.

Systematic component: factorial 2 × 2 (2 species, 2 extracts), completely randomized experiment (Crowder, 1978).

Aim: to see how germination is affected by species and extracts.

Problem: overdispersion.

Page 91: Mixed Models, theory and applications

Germination of Orobanche seed

Simple Binomial Model

No of seeds germinating yi as y-variable

Binomial logit model

Species * Extract interaction model

Analysis of deviance

Source     d.f.  Deviance  p-value
E           1     55.97
E | S       1     56.49
S           1      2.54
S | E       1      3.06
S.E         1      6.41
Residual   17     33.28    0.0096
Total      20     98.72

Interaction significant

Some evidence of overdispersion?

Page 92: Mixed Models, theory and applications

Germination of Orobanche seed

Normal plot with simulated envelope

Figure: normal plot of the deviance residuals with simulated envelope (x-axis: quantile; y-axis: deviance residuals)

Page 93: Mixed Models, theory and applications

R program

Species <- factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

2, 2, 2, 2, 2, 2, 2, 2, 2, 2))

Extract <- factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,

1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

y <- c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8,

10, 8, 23, 0, 3, 22, 15, 32, 3)

m <- c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16,

30, 28, 45, 4, 12, 41, 30, 51, 7)

orobanch <- data.frame(Species, Extract, m, y)

attach(orobanch)

resp<-cbind(y,m-y)

p<-y/m

# Binomial fit

orobanchB.fit<-glm(resp~Species*Extract, family=binomial)

summary(orobanchB.fit)

anova(orobanchB.fit, test="Chisq")

orobanchB.fit<-glm(resp~Extract*Species, family=binomial)

summary(orobanchB.fit)

anova(orobanchB.fit, test="Chisq")
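One way to quantify the suspected overdispersion is to compare the residual deviance and the Pearson X² with the residual degrees of freedom. The sketch below refits the interaction model on the same data; the ratio `phi` is the usual moment estimate of a dispersion parameter and is shown only as an illustration, not as part of the original program.

```r
# Orobanche germination data (Crowder, 1978), as in the program above
Species <- factor(rep(c(1, 2), times = c(11, 10)))
Extract <- factor(rep(c(1, 2, 1, 2), times = c(5, 6, 5, 5)))
y <- c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8,
       10, 8, 23, 0, 3, 22, 15, 32, 3)
m <- c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16,
       30, 28, 45, 4, 12, 41, 30, 51, 7)

fit <- glm(cbind(y, m - y) ~ Species * Extract, family = binomial)

dev <- deviance(fit)                               # residual deviance
X2  <- sum(residuals(fit, type = "pearson")^2)     # Pearson statistic
df  <- df.residual(fit)
phi <- X2 / df          # moment estimate of the dispersion parameter

print(c(deviance = dev, X2 = X2, df = df, phi = phi))
print(pchisq(dev, df, lower.tail = FALSE))  # tail probability of the deviance
```

A value of `phi` clearly above 1 points to more variability than the binomial model allows.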

Page 94: Mixed Models, theory and applications


Poisson regression model

Example

Storing of micro-organisms

Bacterial concentrations (counts per fixed area) measured at initial freezing (−70 °C) and at 1, 2, 6, 12 months.

Time   0   1   2   6  12
Count 31  26  19  15  20

Hypothesized that count decays over time, i.e.

$\text{average count} \propto \dfrac{1}{(\text{Time})^{\gamma}}$

Model

Count ∼ Pois(µ)

logµ = β0 + β1 log(Time)

Avoid problems at time 0 by using

logµ = β0 + β1 log(Time + 0.1)

Page 95: Mixed Models, theory and applications

Storing of micro-organisms example

Residual deviances

Model       d.f.  Deviance  X²
Constant     4    7.0672    7.1532
log(Time)    3    1.8338    1.8203

Analysis of deviance

Source              d.f.  Deviance  p-value
Linear Regression    1    5.2334    0.0222
Error                3    1.8338
Total                4    7.0672

Fitted model: log(µ̂) = 3.149 − 0.1261 log(Time + 0.1)
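The analysis of deviance table can be reproduced from two fits: the drop in deviance between the constant model and the model in log(Time + 0.1) is referred to a chi-square distribution on 1 d.f. A minimal sketch, assuming the data and the 0.1 offset given above (object names are illustrative):

```r
tim   <- c(0, 1, 2, 6, 12)
count <- c(31, 26, 19, 15, 20)

mod0 <- glm(count ~ 1, family = poisson)               # constant model
mod2 <- glm(count ~ log(tim + 0.1), family = poisson)  # log-linear in time

# Drop in deviance due to log(Time + 0.1), compared with chi-square(1)
drop <- deviance(mod0) - deviance(mod2)
pval <- pchisq(drop, df = 1, lower.tail = FALSE)

print(c(null = deviance(mod0), residual = deviance(mod2),
        drop = drop, p.value = pval))
print(coef(mod2))
```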

Page 96: Mixed Models, theory and applications

Storing of micro-organisms example

Plot of bacterial concentration: observed values and fitted curve

Figure 1. Data and fitted curve (x-axis: Time in months; y-axis: Counts)

Page 97: Mixed Models, theory and applications

R program

ltime <- log((tim <- c(0, 1, 2, 6, 12))+ 0.1)

lcount <- log(count <- c(31, 26, 19, 15, 20))

bacteria.dat <- data.frame(tim, count, ltime, lcount)

with(bacteria.dat,{

par(mfrow=c(1,2))

plot(tim, count, xlab="Time in months", ylab="Counts")

plot(ltime,lcount, xlab="Log(time in months)", ylab="Log(counts)")

par(mfrow=c(1,1))

})

mod1<-glm(count ~ tim, family=poisson, bacteria.dat)

anova(mod1, test="Chisq")

mod2<-glm(count ~ ltime, family=poisson)

anova(mod2, test="Chisq")

plot(c(0,12), c(15,31), type="n", xlab="Time in months",

ylab="Counts")

points(tim,count,pch="*")

x<-seq(0,12,0.1)

lp<-predict(mod2,data.frame(ltime=log(x+0.1)), type="response")

lines(x,lp,lty=1)

title(sub="Figure 1. Data and Fitted curve")

Page 98: Mixed Models, theory and applications


Exercise Relative potency - Toxicity of insecticide to flour beetles

Insecticide Deposit of Insecticide

2.00 2.64 3.48 4.59 6.06 8.00

DDT 3/50 5/49 19/47 19/50 24/49 35/50

γ-BHC 2/50 14/49 20/50 27/50 41/50 40/50

DDT + γ-BHC 28/50 37/50 46/50 48/50 48/50 50/50

Considerations

Response variable: Yi – number of dead insects out of mi insects.

Distribution: Binomial.

Systematic component: ANOVA with regression model, completely randomized experiment.

Aim: Lethal doses and comparison of insecticides.

Page 99: Mixed Models, theory and applications

Exercise

Count of the number of plant species on plots that differ in biomass (a continuous explanatory variable) and in soil pH (a factor with three levels: high, mid and low) (Venables and Ripley, 1994)

Table: Number of plant species (Y), quantity of biomass (X) and levels of pH of the soil.

pH level   Y  X        Y  X        Y  X        Y  X        Y  X
Low       18  0.1008  15  2.6292  13  0.6526   8  3.6787   9  1.5079
          19  0.1385   9  3.2522   9  1.5553   2  4.8315   8  2.3259
          15  0.8635   3  4.4172   8  1.6716  17  0.2897  12  2.9957
          19  1.2929   2  4.7808  14  2.8700  14  0.0775  14  3.5381
          12  2.4691  18  0.0501  13  2.5107  15  1.4290   7  4.3645
          11  2.3665  19  0.4828   4  3.4976  17  1.1207   3  4.8705
Mid       29  0.1757  30  1.3767  21  2.5510  18  3.0002  13  4.9056
          13  5.3433   9  7.7000  24  0.5536  26  1.9902  26  2.9126
          20  3.2164  21  4.9798  15  5.6587   8  8.1000  31  0.7395
          28  1.5269  18  2.2321  16  3.8852  19  4.6265  20  5.1209
           6  8.3000  25  0.5112  23  1.4782  25  2.9345  22  3.5054
          15  4.6179  11  5.6969  17  6.0930  24  0.7300  27  1.1580
High      30  0.4692  39  1.7308  44  2.0897  35  3.9257  25  4.2667
          29  5.4819  23  6.6846  18  7.5116  19  8.1322  12  9.5721
          39  0.0866  35  1.2369  30  2.5320  30  3.4079  33  4.6050
          20  5.3677  26  6.5608  36  7.2420  18  8.5036   7  9.3909
          39  0.7648  39  1.1764  34  2.3251  31  3.2228  24  4.1361
          25  5.1371  20  6.4219  21  7.0655  12  8.7459  11  9.9817

Page 100: Mixed Models, theory and applications

Linear Mixed Models

Violation of independence in Linear Models

sampling process

cluster trials (randomized units: families, countries, factories)

multilevel (classes into schools into towns, repetitions into blocks into trials)

. . .

longitudinal or repeated data

spatial data

Then Y comes from the multivariate Gaussian parametric family of densities:

$\mathcal{F} = \left\{ \dfrac{1}{(2\pi)^{n/2}} \det(V)^{-1/2} \exp\left[ -\tfrac{1}{2} (Y - X\beta)' V^{-1} (Y - X\beta) \right] \right\}$

V has n(n + 1)/2 distinct elements

these elements are difficult to specify directly

⇒ with random effects we can deal with variances and covariances.

Page 101: Mixed Models, theory and applications

Eucalyptus example

Example

Breeding program of Eucalypt.
Ten families of full-sib trees have been measured each year. There are three blocks, and 36 individuals per family in each block.
Let us assume that

$Y_{ijk} = \mu + B_k + F_j + \varepsilon_{ijk}$

with $B_k$, k = 1, 2, 3, random block effects and $F_j$, j = 1, . . . , 10, random family effects. Then

1 variances: $\mathrm{Var}(Y_{ijk}) = \sigma_B^2 + \sigma_F^2 + \sigma_\varepsilon^2$

2 covariances:

$\mathrm{cov}(Y_{ijk}, Y_{i'jk}) = \sigma_B^2 + \sigma_F^2$

$\mathrm{cov}(Y_{ijk}, Y_{i'j'k}) = \sigma_B^2$ and $\mathrm{cov}(Y_{ijk}, Y_{i'jk'}) = \sigma_F^2$

$\mathrm{cov}(Y_{ijk}, Y_{i'j'k'}) = 0$
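These variances and covariances can be checked numerically by building the marginal covariance matrix from the incidence matrices of the two random effects. The sketch below uses a deliberately small layout (2 blocks, 2 families, 2 trees per cell) and arbitrary variance components; both are assumptions chosen for readability, not the design of the example.

```r
# Small illustrative layout (assumed): 2 blocks x 2 families x 2 trees
d <- expand.grid(tree = 1:2, family = factor(1:2), block = factor(1:2))

# Incidence matrices of the random block and family effects
ZB <- model.matrix(~ block - 1, d)
ZF <- model.matrix(~ family - 1, d)

# Arbitrary variance components (assumed values)
s2B <- 2; s2F <- 3; s2e <- 1

# Marginal covariance: block part + family part + residual part
V <- s2B * tcrossprod(ZB) + s2F * tcrossprod(ZF) + s2e * diag(nrow(d))
rownames(V) <- colnames(V) <-
  paste0("B", d$block, "F", d$family, "t", d$tree)
print(V)
```

In the printed matrix, two trees sharing block and family have covariance s2B + s2F, trees sharing only the block have s2B, trees sharing only the family have s2F, and trees sharing neither are uncorrelated.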

Page 102: Mixed Models, theory and applications

Definition

Let y denote the observation vector, for n individuals (y a realization of random vector Y).
Let β be a vector of fixed unknown parameters and u be the unobserved realized vector of the random effect vector U.
Let X and Z be two known model matrices.

Y is modeled by a linear mixed model when

1 Y comes from the multivariate gaussian parametric density family

2 the conditional expectation E(Y|U = u) is related to the predictor η = Xβ + Zu through the identity link: E(Y|U = u) = η

3 the conditional variance: V(Y |U = u) = R

4 U ∼ N (0,D) and D is an unknown covariance matrix

This is commonly written:

Y = Xβ + ZU + ε

with ε ∼ N (0, R), and U and ε independent.

Page 103: Mixed Models, theory and applications

Marginal moments

marginal expectation

E (Y) = EU [E(Y|U)]

= EU (Xβ + ZU)

= Xβ

marginal variance

V (Y) = EU(V(Y |U)) + VU(E(Y |U))

= R + VU(Xβ + ZU)

= R + ZDZ ′

marginal distribution of Y

Y ∼ N (Xβ,ZDZ ′ + R)
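The marginal moments can be illustrated by simulation: draw U and ε independently, form Y, and compare the empirical mean and covariance with Xβ and ZDZ′ + R. The toy model below (two groups of two observations with a random intercept per group) and all numerical values are assumptions made for the illustration.

```r
set.seed(1)

# Assumed toy model: 2 groups of 2 observations, random intercept per group
X    <- cbind(1, c(0, 1, 0, 1))
Z    <- kronecker(diag(2), matrix(1, 2, 1))
beta <- c(10, 2)
D    <- 2 * diag(2)   # Var(U)
R    <- 1 * diag(4)   # Var(Y | U)

V <- Z %*% D %*% t(Z) + R   # theoretical marginal variance

nsim <- 20000
U   <- matrix(rnorm(2 * nsim, sd = sqrt(2)), nrow = 2)
eps <- matrix(rnorm(4 * nsim), nrow = 4)
Y   <- as.vector(X %*% beta) + Z %*% U + eps   # one simulated Y per column

print(round(rowMeans(Y), 2))   # close to X %*% beta
print(round(cov(t(Y)), 2))     # close to V
```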

Page 104: Mixed Models, theory and applications

Estimation for V known

Coming back to the very general case Y ∼ N (Xβ, V)
Maximum likelihood estimator of β?

Log-likelihood

$\ell(\beta; y) = -\tfrac{1}{2} (y - X\beta)' V^{-1} (y - X\beta) - \tfrac{1}{2} \log[\det(V)] - \tfrac{n}{2} \log(2\pi)$

Score function

$U(\beta) = \dfrac{\partial \ell(\beta; y)}{\partial \beta} = X' V^{-1} (y - X\beta)$

Solution of U(β) = 0

$\hat{\beta}_0 = (X' V^{-1} X)^{-} X' V^{-1} Y$

where $A^{-}$ is the generalized inverse of A

Theorem (Kruskal):

$\hat{\beta}_0 = (X'X)^{-} X'Y$ iff $\mathrm{Vec}(VX) \subseteq \mathrm{Vec}(X)$
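The generalized least squares solution and Kruskal's condition can be tried out numerically. The sketch below assumes a full-rank X (so an ordinary inverse replaces the generalized inverse) and uses made-up data; `gls_beta` and `ols_beta` are hypothetical helper names. With a compound-symmetry V and an intercept in X, the columns of VX stay in the column space of X and the two estimators coincide; with a heteroscedastic diagonal V they generally differ.

```r
# Hypothetical helpers, full-rank X assumed
gls_beta <- function(X, V, y) {
  Vi <- solve(V)
  solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% y)
}
ols_beta <- function(X, y) solve(crossprod(X), crossprod(X, y))

# Made-up data
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 1.9, 3.2, 3.8, 5.1, 6.3)
X <- cbind(1, x)
n <- length(y)

V_cs  <- diag(n) + 0.5 * matrix(1, n, n)  # compound symmetry
V_het <- diag(x)                          # variance proportional to x

b_ols <- ols_beta(X, y)
b_cs  <- gls_beta(X, V_cs, y)
b_het <- gls_beta(X, V_het, y)
print(cbind(OLS = b_ols[, 1], GLS_cs = b_cs[, 1], GLS_het = b_het[, 1]))
```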

Page 105: Mixed Models, theory and applications

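Properties of $\hat{\beta}_0$ and tests

Properties

expectation: $E(\hat{\beta}_0) = \beta$

variance: $V(\hat{\beta}_0) = (X' V^{-1} X)^{-}$

multivariate Gaussian distribution: $\hat{\beta}_0 \sim N_K\left[ \beta, (X' V^{-1} X)^{-} \right]$

Tests

H0 : Lβ = m

Test statistic: $X^2 = (L\hat{\beta}_0 - m)' [L (X' V^{-1} X)^{-} L']^{-1} (L\hat{\beta}_0 - m)$

Distribution under H0: $X^2 \sim \chi^2(r(L))$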

Page 106: Mixed Models, theory and applications

Estimation for V known up to a constant

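Let V = σ²W, W known, σ² unknown
Maximum likelihood estimator of β and σ²?

Log-likelihood

$\ell(\beta, \sigma^2; y) = -\dfrac{1}{2\sigma^2} (y - X\beta)' W^{-1} (y - X\beta) - \dfrac{n}{2} \log(2\pi\sigma^2) - \dfrac{1}{2} \log[\det(W)]$

Score function for β

$U(\beta; \sigma^2) = \dfrac{\partial \ell(\beta, \sigma^2; y)}{\partial \beta} = X' (\sigma^2 W)^{-1} (y - X\beta)$

Score function for σ²

$U(\sigma^2; \beta) = \dfrac{\partial \ell(\beta, \sigma^2; y)}{\partial \sigma^2} = \dfrac{1}{2(\sigma^2)^2} (y - X\beta)' W^{-1} (y - X\beta) - \dfrac{n}{2\sigma^2}$

Solution of U(β; σ²) = 0 and U(σ²; β) = 0

$\hat{\beta}_0 = (X' W^{-1} X)^{-} X' W^{-1} Y$

$\hat{\sigma}^2_{ml} = \dfrac{1}{n} (Y - X\hat{\beta}_0)' W^{-1} (Y - X\hat{\beta}_0)$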

Page 107: Mixed Models, theory and applications

Properties and tests

Properties

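$\hat{\beta}_0 \sim N_K\left[ \beta, \sigma^2 (X' W^{-1} X)^{-} \right]$

$E(\hat{\sigma}^2_{ml}) = \dfrac{\sigma^2 (n - r(X))}{n}$

$\hat{\sigma}^2_{ml}$ is biased but asymptotically unbiased

$\hat{\sigma}^2_{ls} = \dfrac{n}{n - r(X)} \hat{\sigma}^2_{ml}$ is unbiased

$\hat{\sigma}^2_{ls} \sim \dfrac{\sigma^2}{n - r(X)} \chi^2(n - r(X))$

Tests

H0 : Lβ = m

Test statistic: $F = \dfrac{(L\hat{\beta}_0 - m)' [L (X' W^{-1} X)^{-} L']^{-1} (L\hat{\beta}_0 - m)}{r(L)\, \hat{\sigma}^2_{ls}}$

Distribution under H0: $F \sim F(r(L), n - r(X))$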
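As an illustration, the F statistic can be computed directly from its matrix expression. The sketch below takes W = Id and made-up data, both assumptions, so that the result can be compared with the F statistic reported by lm for the same regression; the hypothesis tested is that the slope is zero.

```r
# Made-up data
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 1.9, 3.2, 3.8, 5.1, 6.3)
X <- cbind(1, x)
n <- nrow(X); rX <- qr(X)$rank
W <- diag(n)                       # assumed known W (identity here)

Wi    <- solve(W)
XtWXi <- solve(t(X) %*% Wi %*% X)  # full-rank X assumed
b0    <- XtWXi %*% t(X) %*% Wi %*% y
res   <- y - X %*% b0
s2_ls <- as.numeric(t(res) %*% Wi %*% res) / (n - rX)

# H0: L beta = m, here "slope = 0"
L  <- matrix(c(0, 1), nrow = 1); m <- 0
rL <- qr(L)$rank
d  <- L %*% b0 - m
Fstat <- as.numeric(t(d) %*% solve(L %*% XtWXi %*% t(L)) %*% d) / (rL * s2_ls)
pval  <- pf(Fstat, rL, n - rX, lower.tail = FALSE)
print(c(F = Fstat, p.value = pval))
```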

Page 108: Mixed Models, theory and applications

Geometrical interpretation with W = Id

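Figure: Y, its projection $\hat{Y} = X\hat{\beta}$ on Vec(X), and the residual $\hat{\varepsilon}$ in Vec⊥(X)

$\hat{Y} = X\hat{\beta} \in \mathrm{Vec}(X)$, dim(Vec(X)) = r(X); $\hat{Y}$ is the projection of Y on Vec(X)

$\hat{\varepsilon} = (Y - \hat{Y}) \in \mathrm{Vec}^{\perp}(X)$, dim(Vec⊥(X)) = n − r(X), $P = \mathrm{Id} - X (X'X)^{-1} X'$

$\hat{\sigma}^2_{ls} = \dfrac{\hat{\varepsilon}'\hat{\varepsilon}}{n - r(X)}$

$\hat{\sigma}^2_{ml} = \dfrac{Y'PY}{n} = \dfrac{\hat{\varepsilon}'\hat{\varepsilon}}{n}$

↪→ ML does not take into account dim(Vec⊥(X))
↪→ REML estimation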

Page 109: Mixed Models, theory and applications

Restricted likelihood

Let L be such that LX = 0

Restricted log-likelihood of y (log-likelihood of Ly)

$-2\ell_{reml}(\sigma^2; y) = y'L'(L\sigma^2 W L')^{-1} L y + \log[\det(L\sigma^2 W L')] + (n - r(X)) \log(2\pi)$

$= C_1 + y'Py + \log[\det(\sigma^2 W)] + \log\{\det[X'(\sigma^2 W)^{-1} X]\}$

$= C_2 + y'Py + n \log(\sigma^2) - r(X) \log(\sigma^2)$

$= C_2 + y'Py + (n - r(X)) \log(\sigma^2)$

$= C_2 + \dfrac{1}{\sigma^2}\, y'Py + (n - r(X)) \log(\sigma^2)$

where

$C_1 = (n - r(X)) \log(2\pi) - \log\det(X'X) + \log\det(LL')$

$C_2 = C_1 + \log\det(W) + \log\{\det[X' W^{-1} X]\}$

$P = V^{-1} - V^{-1} X (X' V^{-1} X)^{-1} X' V^{-1}$ with V = σ²W (all lines but the last)

$P = W^{-1} - W^{-1} X (X' W^{-1} X)^{-1} X' W^{-1}$ (last line)

Page 110: Mixed Models, theory and applications

Score function

$U_{reml}(\sigma^2) = \dfrac{1}{2(\sigma^2)^2}\, y'Py - \dfrac{n - r(X)}{2\sigma^2}$

Solution

$\hat{\sigma}^2_{reml} = \dfrac{y'Py}{n - r(X)}$
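The difference between the ML and REML estimators of σ² is only the divisor. A minimal sketch with W = Id and made-up data (both assumptions), in which the REML value can be compared with the residual variance reported by lm:

```r
# Made-up data
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 1.9, 3.2, 3.8, 5.1, 6.3)
X <- cbind(1, x)
n <- nrow(X); rX <- qr(X)$rank
W <- diag(n)      # assumed known W (identity here)

Wi  <- solve(W)
# P built from W, as in the last line of the restricted likelihood
P   <- Wi - Wi %*% X %*% solve(t(X) %*% Wi %*% X) %*% t(X) %*% Wi
yPy <- as.numeric(t(y) %*% P %*% y)

s2_ml   <- yPy / n          # ML: divides by n
s2_reml <- yPy / (n - rX)   # REML: divides by n - r(X)
print(c(ML = s2_ml, REML = s2_reml))
```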

Page 111: Mixed Models, theory and applications

ML estimation for V unknown

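Let Y ∼ N (Xβ, V)
Let us assume that V depends on a parameter vector φ: V = V(φ)
Maximum likelihood estimator of β and φ?

Score functions

$U(\beta) = \dfrac{\partial \ell(\beta; \phi, y)}{\partial \beta} = X' V^{-1} (y - X\beta)$, where V = V(φ)

$-2U(\phi) = -2 \dfrac{\partial \ell(\phi; \beta, y)}{\partial \phi} = \mathrm{tr}\left( V^{-1} \dfrac{\partial V}{\partial \phi} \right) - (y - X\beta)' V^{-1} \dfrac{\partial V}{\partial \phi} V^{-1} (y - X\beta)$

Solutions

1 $\hat{\beta} = (X' \hat{V}^{-1} X)^{-} X' \hat{V}^{-1} Y$

2 $\mathrm{tr}\left( \hat{V}^{-1} \dfrac{\partial V}{\partial \phi} \right) = y' \hat{P} \dfrac{\partial V}{\partial \phi} \hat{P} y$

where $P = V^{-1} - V^{-1} X (X' V^{-1} X)^{-} X' V^{-1}$.
↪→ not a closed form; use iterative algorithms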

Page 112: Mixed Models, theory and applications

Properties and tests

Properties

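$(X' \hat{V}^{-1} X)^{-}$ is not an estimator of $\mathrm{Var}(\hat{\beta})$: we should take into account variability in $\hat{V}$

$\mathrm{Var}(\hat{\beta} - \beta) = \mathrm{Var}(\hat{\beta}_0 - \beta) + \mathrm{Var}(\hat{\beta} - \hat{\beta}_0) \geq \mathrm{Var}(\hat{\beta}_0 - \beta)$

$\mathrm{Var}_\infty(\hat{\beta}, \hat{\phi}) = I^{-1}$

with

$I = \begin{pmatrix} X' V^{-1} X & 0 \\ 0 & \left( \tfrac{1}{2} \mathrm{tr}\left( V^{-1} \dfrac{\partial V}{\partial \phi_m} V^{-1} \dfrac{\partial V}{\partial \phi_{m'}} \right) \right)_{m,m'=1,\dots,M} \end{pmatrix}$

the Fisher information matrix $I = E\left( -\dfrac{\partial^2 \ell(\beta, \phi; y)}{\partial \gamma\, \partial \gamma'} \right)$ with γ = (β, φ)′

and $\hat{\beta}$ asymptotically Gaussian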

Page 113: Mixed Models, theory and applications

Test

H0 : Lβ = m

Test statistic: $F = \dfrac{(L\hat{\beta}_0 - m)' [L (X' V^{-1} X)^{-} L']^{-1} (L\hat{\beta}_0 - m)}{r(L)}$

Approximate distribution F ∼ F(r(L), df )

df = n − r(X ) if V known up to a constant

many corrections of degrees of freedom df have been proposed

Satterthwaite (1941)
Kenward and Roger (1997)
...

Page 114: Mixed Models, theory and applications

REML estimation for V unknown

Restricted maximum likelihood estimator of φ?

Restricted likelihood

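$-2\ell_{reml}(\phi; y) = y'Py + \log(\det(V)) + \log(\det(X' V^{-1} X))$

Score function

$-2U_{reml}(\phi) = -2 \dfrac{\partial \ell_{reml}(\phi; y)}{\partial \phi} = \mathrm{tr}\left( P \dfrac{\partial V}{\partial \phi} \right) - y' P \dfrac{\partial V}{\partial \phi} P y$

Solution

$\mathrm{tr}\left( \hat{P} \dfrac{\partial V}{\partial \phi} \right) = y' \hat{P} \dfrac{\partial V}{\partial \phi} \hat{P} y$

where $P = V^{-1} - V^{-1} X (X' V^{-1} X)^{-} X' V^{-1}$.
↪→ not a closed form; use iterative algorithms

Remark 1 : $E(U_{reml}(\phi; Y)) = 0$

Remark 2 : $\hat{\beta} = (X' \hat{V}_{reml}^{-1} X)^{-} X' \hat{V}_{reml}^{-1} y$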

Page 115: Mixed Models, theory and applications

Properties

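$\mathrm{Var}_\infty(\hat{\beta}, \hat{\phi}) = I^{-1}$

with

$I = \begin{pmatrix} X' V^{-1} X & 0 \\ 0 & \left( \tfrac{1}{2} \mathrm{tr}\left( P \dfrac{\partial V}{\partial \phi_m} P \dfrac{\partial V}{\partial \phi_{m'}} \right) \right)_{m,m'=1,\dots,M} \end{pmatrix}$

and $\hat{\beta}$ asymptotically Gaussian.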

Page 116: Mixed Models, theory and applications

Estimation in the linear mixed model

Coming back to the linear mixed model definition:

$Y = X\beta + \sum_{l=1}^{L} Z_l u_l + \varepsilon$

with L different random effects $u_l \sim N(0, \sigma_l^2 I_{q_l})$

and $\varepsilon \sim N(0, \sigma_0^2 I_n)$

Parameters to be estimated : β and $\sigma_0^2, \sigma_1^2, \dots, \sigma_L^2$.

Conditional moments

$E(Y | u_1, \dots, u_L) = X\beta + \sum_{l=1}^{L} Z_l u_l$

$V(Y | u_1, \dots, u_L) = \sigma_0^2 I_n$

Page 117: Mixed Models, theory and applications

Marginal moments

$E(Y) = X\beta$

$V(Y) = \sum_{l=1}^{L} \sigma_l^2 Z_l Z_l' + \sigma_0^2 I_n = \sum_{l=0}^{L} \sigma_l^2 V_l$ with $V_0 = I_n$ and $V_l = Z_l Z_l'$

$\sigma^2 = (\sigma_0^2, \sigma_1^2, \dots, \sigma_L^2)$ are called variance components

↪→ V is a linear combination of variance components

↪→ $\dfrac{\partial V}{\partial \sigma_l^2} = V_l$ for all l ∈ {0, . . . , L}.

Page 118: Mixed Models, theory and applications

ML estimation

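1 $\hat{\beta}_{ml} = (X' V^{-1} X)^{-} X' V^{-1} Y$

2 $\mathrm{tr}(V^{-1} V_l) = y' P V_l P y$ for each l, i.e. $\sum_{m=0}^{L} \mathrm{tr}(V^{-1} V_l V^{-1} V_m)\, \sigma_m^2 = y' P V_l P y$

↪→ iterative linear system of equations :

$\left( \mathrm{tr}(V^{-1} V_l V^{-1} V_m) \right)_{l,m=0,\dots,L} \left( \sigma_m^2 \right)_{m=0,\dots,L} = \left( y' P V_l P y \right)_{l=0,\dots,L}$

REML estimation

1 $\hat{\beta}_{reml} = (X' V^{-1} X)^{-} X' V^{-1} Y$

2 $\mathrm{tr}(P V_l) = y' P V_l P y$ for each l, i.e. $\sum_{m=0}^{L} \mathrm{tr}(P V_l P V_m)\, \sigma_m^2 = y' P V_l P y$

↪→ iterative linear system of equations :

$\left( \mathrm{tr}(P V_l P V_m) \right)_{l,m=0,\dots,L} \left( \sigma_m^2 \right)_{m=0,\dots,L} = \left( y' P V_l P y \right)_{l=0,\dots,L}$

In both cases V and P are evaluated at the current values of the variance components, and the system is solved repeatedly until convergence.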

Page 119: Mixed Models, theory and applications

Properties

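ML estimation

$\mathrm{Var}_\infty(\hat{\beta}_{ml}, \hat{\sigma}^2_{ml}) = I^{-1}$

with

$I = \begin{pmatrix} X' V^{-1} X & 0 \\ 0 & \left( \tfrac{1}{2} \mathrm{tr}(V^{-1} V_l V^{-1} V_m) \right)_{l,m=0,\dots,L} \end{pmatrix}$

REML estimation

$\mathrm{Var}_\infty(\hat{\beta}_{reml}, \hat{\sigma}^2_{reml}) = I^{-1}$

with

$I = \begin{pmatrix} X' V^{-1} X & 0 \\ 0 & \left( \tfrac{1}{2} \mathrm{tr}(P V_l P V_m) \right)_{l,m=0,\dots,L} \end{pmatrix}$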

Page 120: Mixed Models, theory and applications

Likelihood ratio test for fixed effects

Test of model M0 nested in M1

(M0 and M1 have the same variance structure)

with m0 = dim(Vec(X0)) and m1 = dim(Vec(X1)) (m0 < m1).
Let $\ell_{M_0}$ and $\ell_{M_1}$ denote the associated log-likelihoods calculated at the ML-estimated parameter values (not REML).
Then, asymptotically under H0,

$X^2 = -2\ell_{M_0} + 2\ell_{M_1} \sim \chi^2_{m_1 - m_0}$

Page 121: Mixed Models, theory and applications

Comparison of models

Akaike’s Information Criterion

$AIC = -2 \log L(\hat{\beta}_{ml}, \hat{\sigma}^2_{ml}; y) + 2K$

Bayesian Information Criterion

$BIC = -2 \log L(\hat{\beta}_{ml}, \hat{\sigma}^2_{ml}; y) + K \log(n)$

where K is the number of estimated parameters

Other criteria with other penalty terms

Danger !!! Software often gives AIC or BIC calculated at REML-estimated parameter values.

Page 122: Mixed Models, theory and applications

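$\sigma^2 = (\sigma_0^2, \dots, \sigma_L^2)$ is the solution of the non-linear equations:

$F(\sigma^2)\, \sigma^2 = g(\sigma^2)$

where $F(\sigma^2)$ is the (L + 1) × (L + 1) matrix with entries

$F_{l'l}(\sigma^2) = \mathrm{tr}(V^{-1} V_{l'} V^{-1} V_l), \quad l, l' = 0, \dots, L$

and $g(\sigma^2)$ is the vector with entries $g_{l'}(\sigma^2) = y' P V_{l'} P y$

An iterative algorithm should be employed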

Page 123: Mixed Models, theory and applications

Prediction of random effects

β are unknown but fixed parameters to be estimated
u are unobserved realized values of the latent random vector U
↪→ estimating U makes no sense, but predicting it does!

As the joint distribution of (Y, U) is Gaussian,

$\tilde{U} = E(U | Y) = D Z' V^{-1} (Y - X\beta)$

is the best linear predictor (BLP) in the sense of minimizing the mean square error

if V is known, $\hat{U}_0 = D Z' V^{-1} (Y - X\hat{\beta}_0)$ is the best linear unbiased predictor (BLUP)

if V is unknown, $\hat{U} = \hat{D} Z' \hat{V}^{-1} (Y - X\hat{\beta})$ is the estimate of the best linear predictor
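The predictor formula can be evaluated directly once the variance components are fixed. The sketch below assumes a made-up balanced one-way layout with one random intercept per group and treats the variance components as known (illustrative values). In this layout the predictor reduces to a shrunken group-mean deviation, with shrinkage factor 3 σ²u / (σ²e + 3 σ²u) for groups of size 3, which the last line displays next to the matrix result.

```r
# Made-up balanced one-way layout: 4 groups of 3 observations
y <- c(1, 2, 3,  4, 6, 5,  9, 8, 10,  12, 14, 13)
g <- factor(rep(1:4, each = 3))
n <- length(y)
X <- matrix(1, n, 1)
Z <- model.matrix(~ g - 1)

s2e <- 1; s2u <- 4                  # variance components, assumed known
D  <- s2u * diag(ncol(Z))
V  <- Z %*% D %*% t(Z) + s2e * diag(n)
Vi <- solve(V)

b0 <- solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% y)   # GLS estimate of beta
u0 <- as.vector(D %*% t(Z) %*% Vi %*% (y - X %*% b0))  # predictor of U

# Shrinkage form, valid for this balanced layout
k      <- 3 * s2u / (s2e + 3 * s2u)
shrunk <- k * (as.vector(tapply(y, g, mean)) - mean(y))
print(cbind(BLUP = u0, shrunk = shrunk))
```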

Page 124: Mixed Models, theory and applications

ANOVA with random effects

Example

2 varieties (Resistant/Susceptible)
4 pesticide doses
5 blocks
40 observations of a response Y
Yijk: response for the i-th dose, j-th variety and k-th block.
Evaluation of the influence of varieties, pesticide doses and their interaction on the response.
Take into account the variability between blocks and between block × dose interactions.

Page 125: Mixed Models, theory and applications

Example

Yijk = µ+ δi + γj + (δγ)ij + rk + wik + εijk

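$(r_k) \sim N(0, \sigma_r^2 I_5)$, $(w_{ik}) \sim N(0, \sigma_w^2 I_{20})$ and $(\varepsilon_{ijk}) \sim N(0, \sigma^2 I_{40})$ mutually independent.

$\mathrm{cov}(Y_{ijk}, Y_{i'j'k'}) = \begin{cases} \sigma_r^2 + \sigma_w^2 + \sigma^2 & \text{if } k = k',\ i = i' \text{ and } j = j' \\ \sigma_r^2 + \sigma_w^2 & \text{if } k = k',\ i = i' \text{ and } j \neq j' \\ \sigma_r^2 & \text{if } k = k' \text{ and } i \neq i' \\ 0 & \text{otherwise} \end{cases}$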

Page 126: Mixed Models, theory and applications

> variety <- read.table("DataSets/Variety_eval.csv",sep=";",

+ header=T, colClasses=c(rep("factor",3),rep("numeric",2)))

> interaction.plot(variety$dose,variety$type,variety$y)

> plot(variety$y~I(variety$dose:variety$type))

Figure: interaction plot (x-axis: variety$dose, levels 1, 2, 4, 8; y-axis: mean of variety$y; lines: variety$type, s and r) and plot of variety$y against I(variety$dose:variety$type)

Page 127: Mixed Models, theory and applications

Basic R code

> res.reml <- lmer(y ~ dose*type + (1|block) + (1|block:dose),

+ data=variety,REML = TRUE)

res.reml is a mer object

methods are available (e.g. fixef(res.reml), VarCorr(res.reml))

slots are available (e.g. res.reml@ranef, res.reml@mu)

Summary

> res.sum <- summary(res.reml)

> print(c(slotNames(res.sum)[1:5],"...."))

[1] "methTitle" "logLik" "ngrps" "sigma" "coefs" "...."

Page 128: Mixed Models, theory and applications

Summary

> res.sum@coefs

Estimate Std. Error t value

(Intercept) 20.00 1.476828 13.5425397

dose2 7.84 1.879586 4.1711322

dose4 8.18 1.879586 4.3520231

dose8 4.80 1.879586 2.5537544

types -2.94 1.314363 -2.2368242

dose2:types 0.30 1.858791 0.1613953

dose4:types 3.14 1.858791 1.6892704

dose8:types 3.94 1.858791 2.1196577

> data.frame(res.sum@REmat)

Groups Name Variance Std.Dev.

1 block:dose (Intercept) 4.5132 2.1244

2 block (Intercept) 2.0735 1.4400

3 Residual 4.3189 2.0782

Page 129: Mixed Models, theory and applications

ML versus REML

Random effects

  Groups      var.ML  std.ML  var.REML  std.REML
1 block:dose  3.6106  1.9002  4.5132    2.1244
2 block       1.6588  1.2879  2.0735    1.4400
3 Residual    3.4551  1.8588  4.3189    2.0782

Fixed effects

             Est.ML  t.value.ML  Est.REML  t.value.REML
(Intercept)  20.00   15.14       20.00     13.54
dose2         7.84    4.66        7.84      4.17
dose4         8.18    4.87        8.18      4.35
dose8         4.80    2.86        4.80      2.55
types        -2.94   -2.50       -2.94     -2.24
dose2:types   0.30    0.18        0.30      0.16
dose4:types   3.14    1.89        3.14      1.69
dose8:types   3.94    2.37        3.94      2.12

Page 130: Mixed Models, theory and applications

Tests for fixed effects: each parameter

Use maximum likelihood : REML=FALSE

t-test approach, with n-p degrees of freedom (40-8)

             Est.ML  t.value.ML  p.val
(Intercept)  20.00   15.14       0.0000
dose2         7.84    4.66       0.0001
dose4         8.18    4.87       0.0000
dose8         4.80    2.86       0.0075
types        -2.94   -2.50       0.0177
dose2:types   0.30    0.18       0.8579
dose4:types   3.14    1.89       0.0680
dose8:types   3.94    2.37       0.0240

Page 131: Mixed Models, theory and applications

Tests for fixed effects: each parameter

resampling approach : available for simple linear mixed models

> ic <- mcmcsamp(res.ml,n = 100)

> ic <- ic@fixef

> ic <- t(apply(ic,1,function(x) quantile(x,prob=c(0.025,0.975))))

> xtable(ic)

              2.5%   97.5%
(Intercept)  16.83   23.12
dose2         4.12   11.44
dose4         4.30   11.64
dose8         0.77    8.50
types        -5.78    1.14
dose2:types  -4.16    5.12
dose4:types  -1.73    7.88
dose8:types  -1.03    8.52

Page 132: Mixed Models, theory and applications

Tests for fixed effects: global test

F-test approach, with n-p degrees of freedom (40-8), using the anova(res.reml) R command

           Df  Sum.Sq  Mean.Sq  F.value   p.val
dose        3  176.54    58.85   13.625  0.0000
type        1   11.99    11.99    2.776  0.1054
dose:type   3   29.64     9.88    2.288  0.0974

balanced case: exact F test

Error: block
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4 119.73  29.933

Error: block:dose
          Df Sum Sq Mean Sq F value    Pr(>F)
dose       3 545.50 181.835  13.625 0.0003595 ***
Residuals 12 160.14  13.345
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: block:dose:type
          Df Sum Sq Mean Sq F value Pr(>F)
type       1 11.990 11.9903  2.7762 0.1151
dose:type  3 29.643  9.8809  2.2878 0.1176
Residuals 16 69.102  4.3189
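The exact F tests of the balanced case use a different error term in each stratum, which is why their p-values differ from the n-p approximation. A short sketch of the arithmetic, using only the mean squares and degrees of freedom printed above (variable names are illustrative):

```r
# Mean squares copied from the stratified ANOVA output
ms_dose       <- 181.835   # tested in the block:dose stratum
ms_block_dose <- 13.345    # error mean square, 12 df
ms_type       <- 11.9903   # tested in the block:dose:type stratum
ms_dose_type  <- 9.8809
ms_resid      <- 4.3189    # error mean square, 16 df

# dose is compared with the block:dose error
F_dose <- ms_dose / ms_block_dose
p_dose <- pf(F_dose, df1 = 3, df2 = 12, lower.tail = FALSE)

# type and dose:type are compared with the block:dose:type error
F_type <- ms_type / ms_resid
p_type <- pf(F_type, df1 = 1, df2 = 16, lower.tail = FALSE)
F_int  <- ms_dose_type / ms_resid
p_int  <- pf(F_int, df1 = 3, df2 = 16, lower.tail = FALSE)

print(c(F_dose = F_dose, F_type = F_type, F_int = F_int))
print(c(p_dose = p_dose, p_type = p_type, p_int = p_int))
```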


Model Comparisons

we want to compare different nested models : use ML (not REML)

full model: dose*type

additive model ( dose+type) versus full model

an approximate test using the chi-square distribution:

> compar <- anova(res.ml,additif)

        Df    AIC    BIC logLik Chisq Chi Df Pr(>Chisq)
additif  8 212.85 226.36 -98.43
res.ml  11 211.71 230.29 -94.86  7.14      3     0.0676

AIC or BIC criterion

> AICTab <-rbind(summary(additif)@AICtab,summary(res.ml)@AICtab)

> rownames(AICTab) <- c("additif", "res.ml")
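The quantities in the comparison table follow directly from the two log-likelihoods. The sketch below recomputes them in base R from the rounded values shown on the slide, assuming n = 40 observations for the BIC penalty; small discrepancies in the last digit come from that rounding.

```r
# Log-likelihoods and parameter counts copied from the anova() table
loglik <- c(additif = -98.43, res.ml = -94.86)
n_par  <- c(additif = 8, res.ml = 11)
n_obs  <- 40

# Likelihood ratio statistic and its chi-square approximation
chisq <- unname(2 * (loglik["res.ml"] - loglik["additif"]))
df    <- unname(n_par["res.ml"] - n_par["additif"])
p_lrt <- pchisq(chisq, df = df, lower.tail = FALSE)

# Information criteria: smaller is better
aic <- -2 * loglik + 2 * n_par
bic <- -2 * loglik + log(n_obs) * n_par

print(c(Chisq = chisq, Df = df, p = p_lrt))
print(rbind(AIC = aic, BIC = bic))
```

Here AIC slightly favours the full model while BIC favours the additive one, which matches the borderline p-value of the likelihood ratio test.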


Predictions

conditional: E (Y |U) = Xβ + Zu

> pred.cond <- res.sum@mu

marginal: E (Y ) = Xβ

> pred.mar <- res.sum@X%*%res.sum@coefs[,1]
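The difference between the two kinds of prediction is only the term Zu. The toy example below makes this explicit with small hand-written matrices; all numbers (`beta`, `u`, the design) are hypothetical and are not taken from the fitted model.

```r
# Toy design: 4 observations, 2 groups, one binary covariate (hypothetical)
X <- cbind(1, c(0, 1, 0, 1))              # fixed-effects design matrix
Z <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))  # random-intercept indicator matrix

beta <- c(20, -3)      # hypothetical fixed-effect estimates
u    <- c(1.5, -1.5)   # hypothetical predicted random effects

# Marginal prediction: averages over the random effects
pred_mar  <- drop(X %*% beta)

# Conditional prediction: adds the predicted group effect
pred_cond <- drop(X %*% beta + Z %*% u)

print(cbind(marginal = pred_mar, conditional = pred_cond))
```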


ANCOVA with random effects

Example

32 steers dispatched into 8 farms

4 diet treatments

each steer in a farm received one treatment

influence of the initial weight of steers

32 observations of a response Y

Yij: response for the i-th diet treatment in the j-th farm.

Determine the optimal level of feed additive to maximize the average daily gain of steers.


ANCOVA with random farm effects

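$$Y_{ij} = \mathrm{dtrt}_i + \gamma\, w_{ij} + \mathrm{farm}_j + \varepsilon_{ij}$$

$\mathrm{farm} \sim \mathcal{N}(0, \sigma_f^2 I_8)$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_{32})$, mutually independent.

$$\operatorname{cov}(Y_{ij}, Y_{i'j'}) = \begin{cases} \sigma_f^2 + \sigma^2 & \text{if } i = i' \text{ and } j = j' \\ \sigma_f^2 & \text{if } i \neq i' \text{ and } j = j' \\ 0 & \text{otherwise} \end{cases}$$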
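This covariance structure can be built explicitly as V = σf² ZZ' + σ² I, where Z is the farm indicator matrix. A small base-R sketch follows; the two variance values are hypothetical and the observations are assumed to be sorted by farm.

```r
n_farm <- 8
n_trt  <- 4

# Farm label of each of the 32 observations (sorted by farm)
farm <- factor(rep(seq_len(n_farm), each = n_trt))

# Indicator matrix of the random farm effect (32 x 8)
Z <- model.matrix(~ 0 + farm)

sigma2_f <- 0.5   # hypothetical between-farm variance
sigma2   <- 1.0   # hypothetical residual variance

# Marginal covariance matrix of Y
V <- sigma2_f * tcrossprod(Z) + sigma2 * diag(n_farm * n_trt)

# Observations 1-4 share a farm, observation 5 belongs to the next farm
print(V[1:5, 1:5])
```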


Pairwise comparison of treatments via Tukey

> steers.ml <- lmer(adg~dtrt+iwt + (1|farm), data=steers,REML=FALSE)

> library(multcomp)

> summary(glht(steers.ml, linfct = mcp(dtrt = "Tukey")))

Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lmer(formula = adg ~ dtrt + iwt + (1 | farm), data = steers,

REML = FALSE)

Linear Hypotheses:

Estimate Std. Error z value Pr(>|z|)

10 - 0 == 0 0.48366 0.10308 4.692 <1e-04 ***

20 - 0 == 0 0.46401 0.10235 4.534 <1e-04 ***

30 - 0 == 0 0.55180 0.10487 5.262 <1e-04 ***

20 - 10 == 0 -0.01966 0.10251 -0.192 0.998

30 - 10 == 0 0.06814 0.10867 0.627 0.923

30 - 20 == 0 0.08779 0.10622 0.827 0.842

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Adjusted p values reported -- single-step method)


ANCOVA with random farm effects and specific fixed weight slope

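$$Y_{ij} = \mathrm{dtrt}_i + \gamma_i\, w_{ij} + \mathrm{farm}_j + \varepsilon_{ij}$$

$\mathrm{farm} \sim \mathcal{N}(0, \sigma_f^2 I_8)$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_{32})$, mutually independent.

$$\operatorname{cov}(Y_{ij}, Y_{i'j'}) = \begin{cases} \sigma_f^2 + \sigma^2 & \text{if } i = i' \text{ and } j = j' \\ \sigma_f^2 & \text{if } i \neq i' \text{ and } j = j' \\ 0 & \text{otherwise} \end{cases}$$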


R code

> steers2.ml <- lmer(adg~dtrt*iwt + (1|farm), data=steers, REML=FALSE)

> anova(steers.ml,steers2.ml)

Data: steers

Models:

steers.ml: adg ~ dtrt + iwt + (1 | farm)

steers2.ml: adg ~ dtrt * iwt + (1 | farm)

Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)

steers.ml 7 27.617 37.877 -6.8086

steers2.ml 10 29.971 44.628 -4.9855 3.6463 3 0.3023

> library(ggplot2)

> datapred <- data.frame(iwt= steers$iwt,pred=

+ steers.ml@X%*%summary(steers.ml)@coefs[,1],

+ dtrt=steers$dtrt)

> steers.plot <- ggplot(datapred, aes(y=pred,x=iwt,group=dtrt,

+ colour=dtrt))+geom_line()


[Figure: marginal predictions (pred, about 1.2 to 2.0) against initial weight (iwt, about 350 to 450), one line per diet treatment (dtrt: 0, 10, 20, 30).]


Random coefficient model

Example

10 varieties of winter wheat sampled in population

6 clones per variety

60 combinations completely randomized in the field

pre-planting moisture has been measured at each plot

60 observations of a response Y of wheat yield (in bushels)

Yij: response for the i-th variety, j-th clone

moisture influences the wheat yield.


Plot

> library(ggplot2)

> wheat <- read.table("DataSets/wheat.csv", sep=";",

+ colClasses=c("factor","numeric","numeric"),header=T)

> wheat.plot <- ggplot(wheat, aes(y=yield,x=moisture,group=wheat,

+ colour=wheat))+geom_line()

[Figure: yield (about 30 to 70) against moisture (about 10 to 60), one line per wheat variety (1 to 10).]


Models

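1 fixed coefficients

$$Y_{ij} = \mu + \alpha\, w_{ij} + v_i + \gamma_i\, w_{ij} + \varepsilon_{ij}$$

but varieties are sampled from a population,

2 random variety effects

$$Y_{ij} = \mu + \alpha\, w_{ij} + v_i + \gamma_i\, w_{ij} + \varepsilon_{ij}$$

where

$$v \sim \mathcal{N}(0, \sigma_v^2 I_{10}), \qquad \gamma \sim \mathcal{N}(0, \sigma_\gamma^2 I_{10})$$

↪→ are γ and v independent?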
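The independence question matters because it changes the implied covariance of the observations. As a sketch, write $\sigma_{v\gamma} = \operatorname{cov}(v_i, \gamma_i)$ (a symbol introduced here for illustration, not used on the slides) and assume the residuals are independent of the random effects with variance $\sigma^2$:

```latex
\begin{aligned}
\operatorname{V}(Y_{ij})
  &= \sigma_v^2 + 2\, w_{ij}\, \sigma_{v\gamma} + w_{ij}^2\, \sigma_\gamma^2 + \sigma^2, \\
\operatorname{cov}(Y_{ij}, Y_{ij'})
  &= \sigma_v^2 + (w_{ij} + w_{ij'})\, \sigma_{v\gamma} + w_{ij}\, w_{ij'}\, \sigma_\gamma^2
  \qquad (j \neq j'), \\
\operatorname{cov}(Y_{ij}, Y_{i'j'}) &= 0 \qquad (i \neq i').
\end{aligned}
```

The independent case corresponds to $\sigma_{v\gamma} = 0$. In both cases the variance of the yield depends on the moisture level, unlike in a random-intercept model.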


R code: independent case

> wheat.ind <- lmer(yield~moisture + (1|wheat)

+ + (0+moisture|wheat), data=wheat, REML=FALSE)

> xtable(data.frame(summary(wheat.ind)@REmat))

  Groups   Name        Variance   Std.Dev.
1 wheat    (Intercept) 16.3594335 4.044680
2 wheat    moisture     0.0019991 0.044711
3 Residual              0.3551692 0.595961

R code: dependent case

> wheat.dep <- lmer(yield~moisture + (1+moisture|wheat),

+ data=wheat, REML=FALSE)

  Groups   Name        Variance   Std.Dev. Corr
1 wheat    (Intercept) 16.9590713 4.118139
2          moisture     0.0021012 0.045839 -0.340
3 Residual              0.3524533 0.593678

The fixed part is the same for both models, so the REML=TRUE option could also be used.


Comparing two models

> xtable(anova(wheat.ind,wheat.dep))

          Df    AIC    BIC logLik Chisq Chi Df Pr(>Chisq)
wheat.ind  5 193.10 203.57 -91.55
wheat.dep  6 194.06 206.62 -91.03  1.04      1     0.3076


Is random slope to be included?

> xtable(data.frame(summary(wheat.ind)@REmat))

  Groups   Name        Variance   Std.Dev.
1 wheat    (Intercept) 16.3594335 4.044680
2 wheat    moisture     0.0019991 0.044711
3 Residual              0.3551692 0.595961

> wheat.int <- lmer(yield~moisture + (1|wheat),

+ data=wheat, REML=FALSE)

> xtable(anova(wheat.int,wheat.ind))

          Df    AIC    BIC  logLik Chisq Chi Df Pr(>Chisq)
wheat.int  4 208.26 216.64 -100.13
wheat.ind  5 193.10 203.57  -91.55 17.16      1     0.0000


GLMM construction

Linear models: LM

LM → (Mixed) → LMM
LM → (Generalized) → GLM
LMM → (Generalized) → GLMM
GLM → (Mixed) → GLMM

Flexibility to account for:

categorical, discrete ... non gaussian data ... distributions in the exponential family
↪→ mean non linearly related to the model parameters
↪→ variances of the data related to the mean

random effect factors

correlation structure in the data ... repeated data

over-dispersion


Example

8 clinics sampled from a larger population

At clinic j:
- $n_{j1}$ patients receive treatment 1
- $n_{j2}$ patients receive treatment 2

The response $Y_{jk}$ for treatment k at clinic j is the number of 'favorable' responses.

$f_{jk} = Y_{jk} / n_{jk}$ is the proportion of 'favorable' responses.

↪→ binomial data in a multi-center clinical trial.


Model definition

. Two random sources in the data:

• $u_j$, $j = 1, \dots, q$ ($q$ clusters): $j$-th random effect

• $Y_{ijk}$, $k = 1, \dots, n_{ij}$: $k$-th observation for subject $i$ in cluster $j$

. Conditional building of the model:

• conditional distribution of $Y$ given the random effects:

$$Y_i \mid U \sim \text{indep. } \mathrm{ExpFam}(\mu_i, \phi)$$

with, for all $i$:

$$E(Y_i \mid U) = \mu_i = g^{-1}(\eta_i) = g^{-1}(x_i'\beta + z_i'u), \qquad V(Y_i \mid U) = \phi\, v(\mu_i)$$

with $g$ the link function and $v$ the variance function

• random effects distribution $U$:

$$U_j \sim \mathcal{N}_{q_j}(0, \sigma_j^2\, \mathrm{Id}_{q_j})$$


Example (cont.)

Yjk |U ∼ indep. BinomFam(njk , pjk )

E(Yjk |U) = µjk = njk × pjk = njk × g−1(ηjk )

V(Yjk |U) = φv(µjk )

U ∼ N8(0, σ2Id8)


Marginal moments

$$E(Y_i) = E(E(Y_i \mid U)) = E(g^{-1}(x_i'\beta + z_i'u)) = \;?$$

↪→ cannot in general be simplified due to the non linear function $g^{-1}$.

$$V(Y_i) = V(E(Y_i \mid U)) + E(V(Y_i \mid U)) = V(g^{-1}(x_i'\beta + z_i'u)) + E(\phi\, v(g^{-1}(x_i'\beta + z_i'u))) = \;?$$

↪→ cannot in general be simplified.

$$\operatorname{cov}(Y_i, Y_j) = \operatorname{cov}(E(Y_i \mid U), E(Y_j \mid U)) + E(\operatorname{cov}(Y_i, Y_j \mid U)) = \operatorname{cov}(g^{-1}(x_i'\beta + z_i'u),\, g^{-1}(x_j'\beta + z_j'u)) + 0 = \;?$$

↪→ cannot in general be simplified.
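Although the marginal mean has no closed form, it can be evaluated numerically in simple cases. The sketch below does this for a logit link with a single Gaussian random intercept, using `integrate()` from base R; the values of `eta` and `sigma_u` are hypothetical.

```r
eta     <- -0.5   # hypothetical fixed part x'beta of the linear predictor
sigma_u <- 1.4    # hypothetical standard deviation of the random intercept

# E[g^{-1}(eta + U)] with U ~ N(0, sigma_u^2) and g the logit link
marginal_mean <- function(eta, sigma) {
  integrate(function(u) plogis(eta + u) * dnorm(u, sd = sigma),
            lower = -Inf, upper = Inf)$value
}

cond_mean <- plogis(eta)                  # mean for a "typical" cluster (u = 0)
marg_mean <- marginal_mean(eta, sigma_u)  # mean averaged over clusters

print(c(conditional = cond_mean, marginal = marg_mean))
```

The two values differ: averaging over the random effect pulls the probability towards 0.5, so g⁻¹(x'β) should not be read as a population-averaged mean.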


Likelihood definition

parameters to be estimated: $\beta$, $\sigma^2$

observed data: $y$

$$L(\beta, \sigma^2; y) = \int_{\mathbb{R}^q} f_{Y|U}(y)\, f_U(u)\, du = \int_{\mathbb{R}^q} \prod_i f_{Y_i|U}(y_i)\, f_U(u)\, du$$

Problem: this integral cannot be expressed in closed form, with some exceptions, e.g. gaussian distribution, beta-binomial model, Poisson-gamma model ...

↪→ no analytic expression for the marginal distribution of Y
↪→ need for approximation


Estimation

2 main estimation approaches:

numerical integration for calculating the likelihood and hence numerical maximisation of the likelihood

linearization methods using Taylor expansions to approximate the model


Estimation by numerical integration

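$$L(\beta, \sigma^2; y) = \int_{\mathbb{R}^q} \prod_i f_{Y_i|U}(y_i)\, f_U(u)\, du$$

. Gaussian quadrature: integral approximated by a weighted sum, cf. Abramowitz & Stegun (1964)

If the $U_j$ are independent:

$$\begin{aligned}
L(\beta, \sigma^2; y) &= \int_{\mathbb{R}^q} \prod_i f_{Y_i|U}(y_i)\, f_U(u)\, du \\
&= \prod_j \int_{\mathbb{R}} \prod_i f_{Y_{ij}|U_j}(y_{ij})\, f_{U_j}(u_j)\, du_j \\
&= \prod_j \int_{\mathbb{R}} \prod_i f_{Y_{ij}|U_j^*}(y_{ij})\, f_{U_j^*}(u_j^*)\, du_j^*
\end{aligned}$$

$$\ell(\beta, \sigma^2; y) \simeq \sum_j \ln\Big[\sum_l \Big(\prod_i f_{Y_{ij}|U_j^*}(y_{ij}, u^{*c}_{j,l})\Big)\, w_{j,l}\Big]$$

with
- quadrature points: $u^{*c}_{j,l}$
- weights: $w_{j,l}$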
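To make the weighted-sum approximation concrete, here is a minimal base-R sketch of non-adaptive Gauss-Hermite quadrature for the likelihood contribution of a single cluster with two binomial observations and a Gaussian random intercept. The helper `gauss_hermite()` (nodes and weights from the Golub-Welsch eigenvalue method) and all numerical values are assumptions made for illustration; this is not the routine used by any particular package.

```r
# Gauss-Hermite nodes and weights for the weight function exp(-x^2),
# obtained from the eigen-decomposition of the Jacobi matrix (Golub-Welsch)
gauss_hermite <- function(n) {
  k <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(k, k + 1)] <- sqrt(k / 2)
  J[cbind(k + 1, k)] <- sqrt(k / 2)
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

# One cluster: favorable counts y out of n trials (illustrative numbers)
y <- c(1, 1)
n <- c(5, 9)
x_beta  <- c(-0.5, -1.2)  # hypothetical fixed part of the linear predictor
sigma_u <- 1.4            # hypothetical random-intercept standard deviation

# Conditional likelihood of the cluster given its random effect u
cond_lik <- function(u) prod(dbinom(y, n, plogis(x_beta + u)))

# Change of variable u = sqrt(2) * sigma_u * x maps N(0, sigma_u^2) to exp(-x^2)
gh <- gauss_hermite(30)
lik_gh <- sum(gh$weights * sapply(sqrt(2) * sigma_u * gh$nodes, cond_lik)) / sqrt(pi)

# Reference value from adaptive numerical integration
lik_ref <- integrate(function(u) sapply(u, cond_lik) * dnorm(u, sd = sigma_u),
                     lower = -Inf, upper = Inf, rel.tol = 1e-8)$value

print(c(gauss_hermite = lik_gh, integrate = lik_ref))
```

The full log-likelihood is the sum over clusters of the logarithm of such terms. With large or very informative clusters the integrand becomes sharply peaked, and many more points (or adaptive centring of the points) may be needed.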


. Difficulties:

choice of the quadrature points

low dimension of random effects

a single random effect or two, no more

nested: OK
crossed: impossible

↪→ works only in very simple situations

↪→ implemented in the SAS procedure NLMIXED, devoted to non linear mixed models

. Other numerical integration techniques: MCMC, EM, MCEM ... suffer from nearly the same difficulties


Estimation by linearization

. Simple approach, cf. Schall (1991)

2 steps method:

⇒ conditionally on U: the GLMM is a GLM

recall estimation of β in a GLM: weighted least squares

$$X' W_\beta^{-1} X \beta = X' W_\beta^{-1} z_\beta$$

- on pseudo data (or working variates): $z_\beta = \eta + (y - \mu_\beta)\, g'(\mu_\beta)$
↪→ seen as an expansion of the link function around the mean
- with covariance matrix: $W_\beta = \mathrm{diag}\{g'(\mu_\beta)^2\, v(\mu_\beta)\}$

linearized model $M^{[t]}$ (iterative procedure):

$$Z^{[t]} = X\beta + \varepsilon \quad \text{where} \quad V(\varepsilon) = W_{\beta^{[t]}}$$


given U, linearized model $M_u^{[t]}$:

$$Z^{[t]} = X\beta + ZU + \varepsilon \quad \text{where} \quad V(\varepsilon) = W_{\beta^{[t]}, u^{[t]}} = W^{[t]}$$

⇒ estimation in the LMM

iterative estimation of parameters in the LMM (random effects recovering their random nature)

↪→ double iterative algorithm

Other justifications to similar computational algorithms:

. Penalized quasi-likelihood cf. Breslow & Clayton (1993)

. Laplace approximation cf. Wolfinger (1993)
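The GLM building block of this scheme, weighted least squares on working variates, can be written in a few lines. The sketch below covers only the fixed-effects step (no random effects) for a binomial model with logit link, on simulated data, and compares the result with `glm()`; the data and variable names are hypothetical.

```r
set.seed(1)

# Simulated grouped binomial data (hypothetical)
n_trials <- rep(20, 12)
x <- rep(c(0, 1), each = 6)
y <- rbinom(12, n_trials, plogis(-0.5 + 0.8 * x))
X <- cbind(1, x)
f <- y / n_trials                        # observed proportions

beta <- c(0, 0)
for (iter in 1:50) {
  eta    <- drop(X %*% beta)
  mu     <- plogis(eta)
  gprime <- 1 / (mu * (1 - mu))          # g'(mu) for the logit link
  z      <- eta + (f - mu) * gprime      # working variate
  # Diagonal of W^{-1}, with V(f) = mu (1 - mu) / n_trials
  w_inv  <- 1 / (gprime^2 * mu * (1 - mu) / n_trials)
  # Weighted least squares: X' W^{-1} X beta = X' W^{-1} z
  beta_new <- drop(solve(crossprod(X, w_inv * X), crossprod(X, w_inv * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(cbind(y, n_trials - y) ~ x, family = binomial)
print(rbind(irls = unname(beta), glm = unname(coef(fit))))
```

In the mixed-model version, the weighted least squares step is replaced by the fit of a linear mixed model to the working variate, which yields the double iterative algorithm mentioned above.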


. Advantages:
- easy to compute
- numerically fast
- no limit on the number or type of random effects

. Drawbacks:
- small bias (correction proposed by Lin & Breslow (1996))
- normality assumption of the random effects
- small value of variance components

↪→ implemented in SAS macro glimmix


Multi-center clinical trial

    clinic  trt     fav  unfav
 1       1  drug  11.00  25.00
 2       1  cntl  10.00  27.00
 3       2  drug  16.00   4.00
 4       2  cntl  22.00  10.00
 5       3  drug  14.00   5.00
 6       3  cntl   7.00  12.00
 7       4  drug   2.00  14.00
 8       4  cntl   1.00  16.00
 9       5  drug   6.00  11.00
10       5  cntl   0.00  12.00
11       6  drug   1.00  10.00
12       6  cntl   0.00  10.00
13       7  drug   1.00   4.00
14       7  cntl   1.00   8.00
15       8  drug   4.00   2.00
16       8  cntl   6.00   1.00

Do drug and clinic have an effect on the probability of favorable response?

↪→ each clinic is measured twice, for drug and control: dependency between the 2 responses for each clinic

↪→ clinics sampled from a target population: between-clinics variability


Modeling approach

Denote $p_{jk}$ the probability of favorable response for treatment k and clinic j.

A linear model would be written:

$$f_{jk} = p_{jk} + \varepsilon_{jk} \quad \text{with} \quad p_{jk} = \mu + \alpha_k + c_j + (\gamma)_{jk}$$

Problems:

non normal distribution of the data

values of linear combinations of effects can fall outside the range for probabilities

interaction γ and error ε are confounded in the model
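The range problem can be seen directly on the clinic data. The sketch below types in the counts from the multi-center table and fits an ordinary linear model to the observed proportions, treating clinic as a fixed factor and dropping the interaction term (which is confounded with the error); this simplified fit is an illustration, not a model proposed in the slides.

```r
# Multi-center clinical trial counts, copied from the data table
clinics <- data.frame(
  clinic = factor(rep(1:8, each = 2)),
  trt    = factor(rep(c("drug", "cntl"), times = 8)),
  fav    = c(11, 10, 16, 22, 14, 7, 2, 1, 6, 0, 1, 0, 1, 1, 4, 6),
  unfav  = c(25, 27, 4, 10, 5, 12, 14, 16, 11, 12, 10, 10, 4, 8, 2, 1)
)

# Observed proportion of favorable responses f_jk
clinics$prop <- with(clinics, fav / (fav + unfav))

# Additive linear model on the proportions
lin_fit    <- lm(prop ~ trt + clinic, data = clinics)
fitted_lin <- fitted(lin_fit)

print(range(fitted_lin))
# Rows whose fitted "probability" is not a valid probability
print(clinics[fitted_lin < 0 | fitted_lin > 1, ])
```

At least one fitted value is negative, which a model on the logit scale avoids by construction.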


Binomial model with canonical logit link

$$\eta_{jk} = \log\frac{p_{jk}}{1 - p_{jk}} = \mu + \alpha_k + c_j + (\gamma)_{jk}$$

with $c_j \sim \mathcal{N}(0, \sigma_c^2)$ and $(\gamma)_{jk} \sim \mathcal{N}(0, \sigma_\gamma^2)$.

> clinics <- read.table("DataSets/clinics.txt", header=T,

+ colClasses = c(rep("factor",2),rep("numeric",2)))

> clinics$trt <- relevel(clinics$trt,"drug")#change reference level

> clinics.glmm <- glmer(cbind(fav,unfav)~ trt + (1|clinic/trt),

+ data=clinics, family=binomial(link="logit"))


Binomial model with canonical logit link

> summary(clinics.glmm)

Generalized linear mixed model fit by the Laplace approximation
Formula: cbind(fav, unfav) ~ trt + (1 | clinic/trt)
   Data: clinics
   AIC   BIC logLik deviance
 43.81  46.9  -17.9    35.81
Random effects:
 Groups     Name        Variance  Std.Dev.
 trt:clinic (Intercept) 0.0057396 0.07576
 clinic     (Intercept) 1.9317511 1.38987
Number of obs: 16, groups: trt:clinic, 16; clinic, 8

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.4573     0.5442  -0.840   0.4008
trtcntl      -0.7423     0.3001  -2.473   0.0134 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
        (Intr)
trtcntl -0.263


Binomial model with other link functions

> clinicsProb.glmm <- glmer(cbind(fav,unfav)~ trt + (1|clinic/trt),
+ data=clinics, family=binomial(link="probit"))
> summary(clinicsProb.glmm)

Generalized linear mixed model fit by the Laplace approximation
Formula: cbind(fav, unfav) ~ trt + (1 | clinic/trt)
   Data: clinics
   AIC   BIC logLik deviance
 44.12 47.21 -18.06    36.12
Random effects:
 Groups     Name        Variance   Std.Dev.
 trt:clinic (Intercept) 0.00045787 0.021398
 clinic     (Intercept) 0.66611291 0.816157
Number of obs: 16, groups: trt:clinic, 16; clinic, 8

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.2680     0.3181  -0.842   0.3996
trtcntl      -0.4425     0.1735  -2.550   0.0108 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
        (Intr)
trtcntl -0.266


You can get predicted values:
- on the predictor scale / on the inverse link scale (here probability scale)
- with BLUP of random effects or not.

           BLUP                  NOBLUP
Inv.pred   $g^{-1}(x'\beta + z'u)$   $g^{-1}(x'\beta)$
pred       $x'\beta + z'u$           $x'\beta$

different scale

> xtable(data.frame(logits=clinics.glmm@eta[1:5], ilogits=clinics.glmm@mu[1:5],
+ ilogit2 = exp(clinics.glmm@eta[1:5])/(1+exp(clinics.glmm@eta[1:5]))))

   logits  ilogits  ilogit2
1   -0.57     0.36     0.36
2   -1.29     0.22     0.22
3    1.39     0.80     0.80
4    0.65     0.66     0.66
5    0.54     0.63     0.63
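The change of scale does not require the fitted object: `plogis()` in base R is the inverse logit. The sketch below applies it to the five linear predictors printed above (rounded values typed in by hand) and checks it against the explicit formula.

```r
# Conditional linear predictors (logit scale), rounded values from the table
eta <- c(-0.57, -1.29, 1.39, 0.65, 0.54)

p_plogis <- plogis(eta)                 # built-in inverse logit
p_manual <- exp(eta) / (1 + exp(eta))   # explicit formula used on the slide

print(round(cbind(logits = eta, ilogits = p_plogis, ilogit2 = p_manual), 2))
```

For a probit fit, `pnorm()` plays the same role.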


Longitudinal data

Longitudinal (or repeated) data are successive observations on each of a collection of observational units.

Example

traits are measured at different times
- tree or animal growth
- change in survivorship over time among populations

individuals are exposed to different levels of the same treatment
- same plants exposed to varying nitrogen dose (discrete or continuous)
- patients receiving several treatments along time periods

General objectives

→ Modeling the evolution among repetitions of traits depending on co-variables (factor or regressor), taking into account dependencies within subjects
→ Modeling the between subjects variability


Advantages:

- a more efficient experiment
- helps keep the variability low
- helps keep the validity of the results higher

Drawbacks:

- may not be possible for each participant to be in all conditions of the experiment
- unbalanced design
- missing data


Main question

observations on same subject

dependencies within subjects

but consecutive observations can be more correlated than more distant observations

How to model within subject dependencies?


Simple treatment×time model with repeated measures

Example

Pharmaceutical company tests a new drug on respiratory ability

A=standard, C=new, P = placebo

24 patients

each patient will receive each drug in a randomly assigned order

respiratory ability (FEV1) measured hourly for 8 hours following treatment


Hypothesis

How does the response mean differ among treatments?

↪→ Is there a treatment main effect?

How does the response mean change over time?

↪→ Is there a time main effect?

How do response differences among treatments change over time?

↪→ Is there a treatment×time interaction?

Treatment is called the between-subjects factor

Time is called within-subjects factor


Data and reshape

> respir <- read.table("DataSets/respiratory.csv",

+ colClasses=c("factor", rep("numeric",9), "factor"),

+ header=TRUE, sep=";")

> colnames(respir) <-c("patient", 0:8,"drug")

> library(reshape)

> respir.lon <- melt(respir,id=c("patient","drug") )

> respir.lon$time <- as.numeric(respir.lon[,3])-1

> colnames(respir.lon)[4] <- "ability"


Data and reshape

> qplot(time,ability,data=respir.lon, facets =~drug,

+ group=patient, geom="line", colour=patient)

[Figure: respiratory ability (about 1 to 4) against time (0 to 8 hours), one line per patient, in three panels by drug (a, c, p).]


Fixed model

gaussian family: $Y = \mu + \varepsilon$

expectation for individual i in the treatment j at time k

$$E[Y_{ijk}] = \mu + \alpha_j + \gamma_k + (\alpha\gamma)_{jk}$$

where µ is the intercept, α the treatment effect, γ the time effect and (αγ) the time × treatment interaction

variance: $V(Y) = \sigma^2\, \mathrm{Id}$


Simple ANCOVA model

> Subrespir.lon <- respir.lon[respir.lon$variable!=0,]

> respir.lm <- lm(ability~drug*variable,data=Subrespir.lon)

> anova(respir.lm)

Analysis of Variance Table

Response: ability

Df Sum Sq Mean Sq F value Pr(>F)

drug 2 25.783 12.8913 25.6060 2.320e-11 ***

variable 7 17.170 2.4529 4.8722 2.386e-05 ***

drug:variable 14 6.280 0.4486 0.8910 0.5687

Residuals 552 277.903 0.5034

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Simple plots

> residus <- matrix(respir.lm$res,ncol=8)

> pairs(residus[,1:5])

[Figure: scatter-plot matrix of the linear-model residuals at the first five time points (var 1 to var 5).]


Linear mixed model

> respir1.lmm <- lmer(ability~drug*variable+(1|patient),

+ data=Subrespir.lon, REML=FALSE)

> respir2.lmm <- lmer(ability~drug*variable+(1|patient/drug),

+ data=Subrespir.lon, REML=FALSE)

> anova(respir1.lmm,respir2.lmm)

Data: Subrespir.lon

Models:

respir1.lmm: ability ~ drug * variable + (1 | patient)

respir2.lmm: ability ~ drug * variable + (1 | patient/drug)

Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)

respir1.lmm 26 456.37 569.63 -202.19

respir2.lmm 27 294.02 411.63 -120.01 164.35 1 < 2.2e-16 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> data.frame(summary(respir2.lmm)@REmat)

Groups Name Variance Std.Dev.

1 drug:patient (Intercept) 0.053485 0.23127

2 patient (Intercept) 0.368488 0.60703

3 Residual 0.060497 0.24596


Comparing fixed parameters estimation

> aic.lm <- c(AIC(respir.lm),BIC(respir.lm))

> aic.lm

[1] 1264.807 1373.710

> aic.lmm <- summary(respir2.lmm)@AICtab

> aic.lmm

AIC BIC logLik deviance REMLdev

294.0151 411.63 -120.0075 240.0151 329.7807


Comparing fixed part given covariance structure

> respir3.lmm <- lmer(ability~drug+variable+(1|patient/drug),

+ data=Subrespir.lon, REML=FALSE)

> anova(respir2.lmm,respir3.lmm)

Data: Subrespir.lon

Models:

respir3.lmm: ability ~ drug + variable + (1 | patient/drug)

respir2.lmm: ability ~ drug * variable + (1 | patient/drug)

Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)

respir3.lmm 13 360.41 417.04 -167.20

respir2.lmm 27 294.02 411.63 -120.01 94.391 14 5.584e-14 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Comments

Balanced experimental design

↪→ estimators of fixed effects do not depend on variance estimates
↪→ estimates of fixed effects are equal in both cases

> coefs

[,1] [,2]

(Intercept) 3.48875000 3.48875000

drugc 0.19625000 0.19625000

drugp -0.67375000 -0.67375000

variable2 -0.07666667 -0.07666667

variable3 -0.28958333 -0.28958333

Residual variances are different

0.5 without random effects and 0.06 when random effects are taken into account

→ the interaction becomes "significant".


Modelling covariance function

Are residual variances constant over time?

Do covariances between time points depend on the lag?

Do covariances depend on the treatment?

...


From here, we consider:

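$$
V = \begin{pmatrix}
V_{11} & 0 & \cdots & 0 \\
0 & V_{12} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & V_{83}
\end{pmatrix}
$$

We assume no patient dependency from one drug to the other.

We work on the 8 × 8 covariance sub-block matrix.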


Studying relation between estimated covariance and time

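Assume an unstructured covariance model, the most heavily parameterized model (symmetric; only the upper triangle is shown):

$$
V(Y_{ij}) = \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} & \cdots \\
 & \sigma_2^2 & \sigma_{23} & \cdots \\
 & & \ddots & \vdots \\
 & & & \sigma_8^2
\end{pmatrix}
$$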

> Subrespir.lon <- respir.lon[respir.lon$variable!=0,]
> Subrespir.lon$indic <- I(Subrespir.lon$patient:Subrespir.lon$drug)
> respir.un <- gls(ability~drug*variable, data=Subrespir.lon,
+   correlation = corSymm(form = ~1 | indic),
+   weights = varIdent(form = ~1 | time), method="ML")
> VarCov
          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]
[1,] 0.4351987 0.4395978 0.4256334 0.3981156 0.4168162 0.3769659 0.3413459 0.3679788
[2,] 0.4395978 0.4947652 0.4607254 0.4492399 0.4737156 0.4077123 0.3825432 0.4079240
[3,] 0.4256334 0.4607254 0.4717923 0.4491765 0.4641391 0.4085257 0.3853456 0.4078965
[4,] 0.3981156 0.4492399 0.4491765 0.4732017 0.4635553 0.4004393 0.3855756 0.4073768
[5,] 0.4168162 0.4737156 0.4641391 0.4635553 0.5538420 0.4738645 0.4449387 0.4744100
[6,] 0.3769659 0.4077123 0.4085257 0.4004393 0.4738645 0.4701743 0.4267930 0.4438868
[7,] 0.3413459 0.3825432 0.3853456 0.3855756 0.4449387 0.4267930 0.4786082 0.4308936
[8,] 0.3679788 0.4079240 0.4078965 0.4073768 0.4744100 0.4438868 0.4308936 0.4821444


[Figure: estimated unstructured covariances (un) plotted against times (0 to 7), with one series per level of factor(gp) (1 to 8).]


Autoregressive structure R

Assume an autoregressive covariance matrix:

$$ V(Y_{ij}) = \sigma^2 R, \qquad r_{kk'} = \mathrm{cor}(Y_{ijk}, Y_{ijk'}) = \rho^{|k-k'|} $$

> respir.ar <- gls(ability~drug*variable, data = Subrespir.lon,

+ correlation = corAR1(0.2, form= ~time|indic), method="ML")

Estimates: $\hat\rho = 0.9219$ (reported as Phi in the gls output) and $\hat\sigma^2 = 0.4685$
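To see what this fitted structure implies, the AR(1) covariance matrix can be rebuilt from the two estimates reported above. This is a small base-R sketch; `rho_hat`, `sigma2_hat` and `V_ar` are names chosen here for illustration.

```r
# Estimates reported above for the AR(1) gls fit
rho_hat    <- 0.9219
sigma2_hat <- 0.4685
n_times    <- 8   # eight measurement occasions per patient-by-drug block

# |k - k'| for every pair of occasions, then r_kk' = rho^|k - k'|
lag  <- abs(outer(seq_len(n_times), seq_len(n_times), "-"))
R    <- rho_hat^lag
V_ar <- sigma2_hat * R

print(round(V_ar, 4))
```

The diagonal is constant at 0.4685 and the covariances decay geometrically with the lag, which is the pattern to compare against the unstructured estimates.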


Plot of estimated unstructured against AR(1) covariances

[Figure: unstructured covariance estimates (un) plotted against AR(1) covariance estimates (ar), points grouped by factor(gp) (1 to 8).]


Random effect added to autoregressive structure

Autoregressive conditional covariance matrix for 1 subject within drug:

$$ V(Y_{ij} \mid u) = \sigma^2 R, \qquad r_{kk'} = \mathrm{cor}(Y_{ijk}, Y_{ijk'} \mid u) = \rho^{|k-k'|} $$

Marginal covariance matrix for 1 subject within drug:

$$
V_{ij} = \begin{pmatrix}
\sigma^2 + \sigma_u^2 & \sigma^2\rho^{|k-k'|} + \sigma_u^2 \\
\sigma^2\rho^{|k-k'|} + \sigma_u^2 & \sigma^2 + \sigma_u^2
\end{pmatrix}
$$

> respir.lmear <- lme(ability~drug*variable, data=Subrespir.lon,
+   random = ~1|indic, correlation = corAR1(0.2, form = ~time|indic), method="ML")
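The marginal form follows in one step from the conditional one. As a sketch, write the model for occasion $k$ as a mean plus a random intercept plus an AR(1) error; the symbols $\mu_{ijk}$, $u_{ij}$ and $\varepsilon_{ijk}$ are introduced here for illustration, and $u_{ij}$ is assumed independent of the errors:

```latex
\begin{aligned}
Y_{ijk} &= \mu_{ijk} + u_{ij} + \varepsilon_{ijk}, \qquad
u_{ij} \sim N(0, \sigma_u^2), \quad
\mathrm{Cov}(\varepsilon_{ijk}, \varepsilon_{ijk'}) = \sigma^2 \rho^{|k-k'|}, \\
\mathrm{Cov}(Y_{ijk}, Y_{ijk'}) &= \mathrm{Var}(u_{ij}) + \mathrm{Cov}(\varepsilon_{ijk}, \varepsilon_{ijk'})
 = \sigma_u^2 + \sigma^2 \rho^{|k-k'|}, \\
\mathrm{Var}(Y_{ijk}) &= \sigma_u^2 + \sigma^2 .
\end{aligned}
```

Under these assumptions the covariance no longer decays to zero with the lag: it levels off at $\sigma_u^2$.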


Plot of estimated unstructured against AR(1)+RE covariances

[Figure: unstructured covariance estimates (un) plotted against AR(1) plus random effect covariance estimates (lmear), points grouped by factor(gp) (1 to 8).]


Impact of the patient RE

> respir2.lmear <- lme(ability~drug*variable, data=Subrespir.lon,
+   random = ~1|patient/drug,
+   correlation = corAR1(0.2, form = ~time|patient/drug), method="ML")
> anova(respir.lmear,respir2.lmear)
              Model df      AIC      BIC    logLik   Test  L.Ratio p-value
respir.lmear      1 27 258.8141 376.4290 -102.4070
respir2.lmear     2 28 190.0226 311.9936  -67.0113 1 vs 2 70.79148  <.0001


Some remarks

taking within-subject dependencies into account can increase the sensitivity of tests for fixed effects

random effects are one way to model within-subject dependencies but are not necessarily sufficient

defining an appropriate form for the residual correlation is a difficult task

plots are useful tools but are not sufficient

different models should be compared, but:

→ conclusions may differ depending on the criterion used


Overdispersion in glms (Hinde and Demetrio, 1998a,b)

Residual Deviance ≈ Residual d.f.

What if Residual Deviance ≫ Residual d.f.?

1 Badly fitting model

    omitted terms/variables
    incorrect relationship (link)
    outliers

2 Variation greater than predicted by the model ⇒ Overdispersion

    count data: Var(Y) > µ
    counted proportion data: Var(Y) > nπ(1 − π)
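As a rough illustration of this diagnostic, the sketch below simulates counts whose variance exceeds the mean, fits a Poisson GLM, and compares the residual deviance and the Pearson statistic with the residual degrees of freedom. The simulated data and all object names are invented for this example.

```r
set.seed(1)
n  <- 500
x  <- runif(n)
mu <- exp(1 + 0.5 * x)

# Negative binomial counts: Var(Y) = mu + mu^2 / size, so Var(Y) > mu
y <- rnbinom(n, mu = mu, size = 1)

fit <- glm(y ~ x, family = poisson)

# Both ratios should be close to 1 for a well-fitting Poisson model
dev_ratio     <- deviance(fit) / df.residual(fit)
pearson_ratio <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

print(c(deviance = dev_ratio, pearson = pearson_ratio))
```

Here the mean model is correct by construction, so ratios well above 1 point to overdispersion rather than to a badly fitting linear predictor.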


Overdispersion in glms (Hinde and Demetrio, 1998a,b)

Causes of Overdispersion

variability of experimental material: individual level variability

correlation between individual responses, e.g. litters of rats

cluster sampling, e.g. areas; schools; classes; children

aggregate level data

omitted unobserved variables

excess zero counts (structural and sampling zeros)

Consequences

With a correct mean model we have consistent estimates of β but:

    incorrect standard errors
    selection of overly complex models


Worldwide Airline Fatalities, 1976-85

Example

Year   Fatal       Passenger   Passenger miles
       accidents   deaths      (100 million)
1976   24           734        3863
1977   25           516        4300
1978   31           754        5027
1979   31           877        5481
1980   22           814        5814
1981   21           362        6033
1982   26           764        5877
1983   20           809        6223
1984   16           223        7433
1985   22          1066        7107


Worldwide Airline Fatalities, 1976-85

Simple Models

Passenger miles (mi ) as exposure variable

Poisson log-linear model

Linear time trend

$$ Y_i \sim \mathrm{Pois}(m_i \lambda_i) \qquad (1) $$

$$ \log \lambda_i = \beta_0 + \beta_1\,\mathrm{year}_i $$

Fatal accidents:
    Deviance(time trend) = 20.68
    Residual Deviance = 5.46 on 8 d.f.

Passenger deaths:
    Deviance(time trend) = 202.1
    Residual Deviance = 1051.5 on 8 d.f.

⇒ compounding with aircraft size


Models for Overdispersion (Hinde and Demetrio, 1998a,b)

Two broad categories

assume some more general form for the variance function, possibly with additional parameters.
    Estimation using moment methods, quasi-likelihood, extended quasi-likelihood, pseudo-likelihood, . . .

assume a two-stage model for the response, with the response model parameter following some distribution.
    Maximum likelihood estimation (conjugate distribution models) or approximate methods (e.g. using first two moments as above)
    Full hierarchical model: Bayesian methods


Mean-variance Models (Hinde and Demetrio, 1998a,b)

Overdispersed Proportion Data

$Y_i$ successes out of $m_i$ trials, $i = 1, \dots, n$.

Model expected proportions $\pi_i$ with link function $g$ and

$$ g(\pi_i) = \beta' x_i $$

Constant overdispersion

$$ \mathrm{Var}(Y_i) = \phi\, m_i \pi_i (1 - \pi_i) $$

A general variance function: overdispersion allowed to depend upon both $m_i$ and $\pi_i$.

$$ \mathrm{Var}(Y_i) = m_i \pi_i (1 - \pi_i) \left[ 1 + \phi (m_i - 1)^{\delta_1} \{\pi_i (1 - \pi_i)\}^{\delta_2} \right] $$


Mean-variance Models (Hinde and Demetrio, 1998a,b)

Overdispersed Count data

Random variables $Y_i$ represent counts with means $\mu_i$.

Constant overdispersion

$$ \mathrm{Var}(Y_i) = \phi \mu_i $$

can arise through a simple compounding process. Suppose that $N \sim \mathrm{Pois}(\mu_N)$ and $T = \sum_{i=1}^{N} X_i$, where the $X_i$ are iid random variables.

$$ E[T] = \mu_T = E_N(E[T \mid N]) = \mu_N \mu_X $$

$$ \mathrm{Var}(T) = E_N[V(T \mid N)] + V_N(E[T \mid N]) = \mu_T \left( \frac{\sigma_X^2}{\mu_X} + \mu_X \right) = \mu_T \frac{E[X^2]}{E[X]} $$

A general variance function

$$ \mathrm{Var}(Y_i) = \mu_i \left\{ 1 + \phi \mu_i^{\delta} \right\} $$
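The step from the conditional moments to the compounding variance on this slide can be written out in full. This is a worked version of the variance decomposition, using only the quantities defined on the slide, the independence of $N$ and the $X_i$, and $\mathrm{Var}(N) = \mu_N$ for a Poisson count:

```latex
\begin{aligned}
E[T \mid N] &= N\mu_X, \qquad V(T \mid N) = N\sigma_X^2, \\
\mathrm{Var}(T) &= E_N[N\sigma_X^2] + V_N(N\mu_X)
 = \mu_N\sigma_X^2 + \mu_X^2\mu_N \\
&= \mu_N\mu_X\left(\frac{\sigma_X^2}{\mu_X} + \mu_X\right)
 = \mu_T\,\frac{\sigma_X^2 + \mu_X^2}{\mu_X}
 = \mu_T\,\frac{E[X^2]}{E[X]} .
\end{aligned}
```

In the constant-overdispersion form this corresponds to $\phi = E[X^2]/E[X]$.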


Two-stage Models – Binomial (Hinde and Demetrio, 1998a,b)

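Beta-Binomial

$$ Y_i \mid P_i \sim \mathrm{Bin}(m_i, P_i), \qquad E(P_i) = \pi_i, \quad \mathrm{Var}(P_i) = \phi \pi_i (1 - \pi_i) $$

Unconditionally, $E(Y_i) = m_i \pi_i$ and

$$ \mathrm{Var}(Y_i) = m_i \pi_i (1 - \pi_i) [1 + (m_i - 1)\phi] $$

Taking $P_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$, with $\alpha_i + \beta_i$ fixed, gives a beta-binomial distribution for $Y_i$ with the same variance function.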
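The unconditional variance stated above follows from the law of total variance. A short worked version, using only the moments of $P_i$ given on this slide:

```latex
\begin{aligned}
\mathrm{Var}(Y_i) &= E[\mathrm{Var}(Y_i \mid P_i)] + \mathrm{Var}(E[Y_i \mid P_i]) \\
&= m_i\,E[P_i(1-P_i)] + m_i^2\,\mathrm{Var}(P_i) \\
&= m_i\{\pi_i(1-\pi_i) - \mathrm{Var}(P_i)\} + m_i^2\,\mathrm{Var}(P_i) \\
&= m_i\pi_i(1-\pi_i) + m_i(m_i-1)\,\phi\,\pi_i(1-\pi_i) \\
&= m_i\pi_i(1-\pi_i)\,[1 + (m_i-1)\phi] .
\end{aligned}
```

The third line uses $E[P_i^2] = \mathrm{Var}(P_i) + \pi_i^2$.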


Two-stage Models – Binomial (Hinde and Demetrio, 1998a,b)

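The same variance function results from assuming that individual binary responses are not independent but have a constant correlation. Writing $Y_i = \sum_{j=1}^{m_i} R_{ij}$, where the $R_{ij}$ are Bernoulli random variables with

$$ E[R_{ij}] = \pi_i \quad \text{and} \quad \mathrm{Var}(R_{ij}) = \pi_i (1 - \pi_i), $$

then, assuming a constant correlation $\rho$ between $R_{ij}$ and $R_{ik}$ for $j \neq k$, we have

$$ \mathrm{Cov}(R_{ij}, R_{ik}) = \rho \pi_i (1 - \pi_i) $$

and

$$ E[Y_i] = m_i \pi_i $$

$$
\begin{aligned}
\mathrm{Var}(Y_i) &= \sum_{j=1}^{m_i} \mathrm{Var}(R_{ij}) + \sum_{j=1}^{m_i} \sum_{k \neq j} \mathrm{Cov}(R_{ij}, R_{ik}) \\
&= m_i \pi_i (1 - \pi_i) + m_i (m_i - 1) [\rho \pi_i (1 - \pi_i)] \\
&= m_i \pi_i (1 - \pi_i) [1 + \rho (m_i - 1)].
\end{aligned}
$$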


Two-stage Models – Binomial (Hinde and Demetrio, 1998a,b)

Logistic-normal and related models

Random effect in the linear predictor

$$ \eta_i = \beta' x_i + \sigma z_i $$

assume $z_i \sim N(0, 1)$

    Probit-normal model: a convenient interpretation as a threshold model for a normally distributed latent variable (McCulloch, 1994).
    Logistic-normal using EM algorithm with Gaussian quadrature.
    Approximate approach using a Williams type III model with

    $$ \mathrm{Var}(Y_i) = m_i \pi_i (1 - \pi_i) [1 + \phi (m_i - 1) \pi_i (1 - \pi_i)] $$

make no specific distributional assumption about $z$: estimate a discrete mixing distribution by non-parametric maximum likelihood (NPML).


Two-stage Models – Binomial (Hinde and Demetrio, 1998a,b)

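Considered as a two-stage model, the $\mathrm{logit}(P_i)$ have a normal distribution with variance $\sigma^2$, i.e. $\mathrm{logit}(P_i) \sim N(x_i'\beta, \sigma^2)$. Writing

$$ U_i = \mathrm{logit}(P_i) = \log \frac{P_i}{1 - P_i} \;\Rightarrow\; P_i = \frac{e^{U_i}}{1 + e^{U_i}} $$

and using a Taylor series for $P_i$ around $U_i = E[U_i] = x_i'\beta$, we have

$$ P_i = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}} + \frac{e^{x_i'\beta}}{(1 + e^{x_i'\beta})^2} (U_i - x_i'\beta) + o(U_i - x_i'\beta). $$

Then

$$ E(P_i) \approx \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}} := \pi_i $$

and

$$ \mathrm{Var}(P_i) \approx \left[ \frac{e^{x_i'\beta}}{(1 + e^{x_i'\beta})^2} \right]^2 \mathrm{Var}(U_i) = \sigma^2 \pi_i^2 (1 - \pi_i)^2 $$

Consequently the variance function for the logistic-normal model can be approximated by

$$ \mathrm{Var}(Y_i) \approx m_i \pi_i (1 - \pi_i) [1 + \sigma^2 (m_i - 1) \pi_i (1 - \pi_i)] $$

which Williams (1982) refers to as a type III variance function.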
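The last step combines the approximation for $\mathrm{Var}(P_i)$ with the general two-stage variance for binomial data. A short worked version, reusing only quantities defined above and treating $E(P_i) \approx \pi_i$:

```latex
\begin{aligned}
\mathrm{Var}(Y_i) &= m_i\,E[P_i(1-P_i)] + m_i^2\,\mathrm{Var}(P_i)
 \approx m_i\pi_i(1-\pi_i) + m_i(m_i-1)\,\mathrm{Var}(P_i) \\
&\approx m_i\pi_i(1-\pi_i) + m_i(m_i-1)\,\sigma^2\pi_i^2(1-\pi_i)^2 \\
&= m_i\pi_i(1-\pi_i)\,[\,1 + \sigma^2(m_i-1)\pi_i(1-\pi_i)\,] .
\end{aligned}
```

Compared with the beta-binomial form, the overdispersion term carries an extra factor $\pi_i(1-\pi_i)$, so it is largest for proportions near one half.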


Two-stage models – Count data (Hinde and Demetrio, 1998a,b)

Negative Binomial

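Variation in the Poisson rate parameter:

$$ Y_i \mid \theta_i \sim \mathrm{Pois}(\theta_i), \qquad \theta_i \sim \mathrm{Gamma}(k, \lambda_i) $$

leads to a negative binomial distribution with

$$ E[Y_i] = \mu_i = k/\lambda_i \quad \text{and} \quad \mathrm{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{k} $$

For known $k$, the distribution is in the 1-parameter exponential family, so we are still in the glm framework.

Different assumptions for the Γ-distribution lead to different parameterizations with different overdispersed variance functions, e.g. $\theta_i \sim \Gamma(k_i, \lambda)$ gives

$$ \mathrm{Var}(Y_i) = \mu_i \left( 1 + \frac{1}{\lambda} \right) = \phi \mu_i $$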


Two-stage models – Count data (Hinde and Demetrio, 1998a,b)

Negative binomial distribution

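$$ Y_i \mid \theta_i \sim \mathrm{Pois}(\theta_i), \qquad \theta_i \sim \mathrm{Gamma}(k, \lambda_i), \quad i = 1, \dots, n $$

This leads to a negative binomial distribution for the $Y_i$ with

$$ f_{Y_i}(y_i; \mu_i, k) = \frac{\Gamma(k + y_i)}{\Gamma(k)\, y_i!} \, \frac{\mu_i^{y_i} k^k}{(\mu_i + k)^{k + y_i}}, \qquad y_i = 0, 1, \dots $$

and

$$ E(Y_i) = k/\lambda_i = \mu_i $$

$$
\begin{aligned}
\mathrm{Var}(Y_i) &= E_{\theta_i}[\mathrm{Var}(Y_i \mid \theta_i)] + \mathrm{Var}_{\theta_i}(E[Y_i \mid \theta_i]) \\
&= E[\theta_i] + V(\theta_i) = \frac{k}{\lambda_i} + \frac{k}{\lambda_i^2} \\
&= \mu_i + \frac{\mu_i^2}{k}
\end{aligned}
$$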
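The two-stage construction can be checked numerically by simulating the gamma mixing step and the Poisson step separately. The sketch below uses base R only; the values of `k` and `lambda` are hypothetical, chosen just for the illustration.

```r
set.seed(42)
n      <- 200000
k      <- 2     # gamma shape (hypothetical value)
lambda <- 0.5   # gamma rate (hypothetical value), so mu = k / lambda = 4

theta <- rgamma(n, shape = k, rate = lambda)   # theta_i ~ Gamma(k, lambda)
y     <- rpois(n, theta)                       # Y_i | theta_i ~ Pois(theta_i)

mu          <- k / lambda
var_formula <- mu + mu^2 / k                   # variance implied by the formula

print(c(mean = mean(y), var = var(y), var_formula = var_formula))
```

The sample mean and variance of `y` should sit close to `mu` and `var_formula`, up to simulation noise.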


Two-stage models – Count data (Hinde and Demetrio, 1998a,b)

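Poisson-normal and related models

Individual level random effect in the linear predictor

$$ \eta_i = \beta' x_i + \sigma Z_i $$

assume $Z_i \sim N(0, 1)$, so

$$ Y_i \mid Z_i \sim \mathrm{Pois}(\lambda_i) \quad \text{with} \quad \log \lambda_i = x_i'\beta + \sigma Z_i $$

which gives

$$ E[Y_i] = E_{Z_i}(E[Y_i \mid Z_i]) = E_{Z_i}[e^{x_i'\beta + \sigma Z_i}] = e^{x_i'\beta + \frac{1}{2}\sigma^2} := \mu_i $$

$$
\begin{aligned}
\mathrm{Var}(Y_i) &= E_{Z_i}[\mathrm{Var}(Y_i \mid Z_i)] + \mathrm{Var}_{Z_i}(E[Y_i \mid Z_i]) \\
&= e^{x_i'\beta + \frac{1}{2}\sigma^2} + \mathrm{Var}_{Z_i}(e^{x_i'\beta + \sigma Z_i}) \\
&= e^{x_i'\beta + \frac{1}{2}\sigma^2} + e^{2x_i'\beta + \sigma^2}(e^{\sigma^2} - 1),
\end{aligned}
$$

i.e. a variance function of the form

$$ \mathrm{Var}(Y_i) = \mu_i + k' \mu_i^2 $$

make no specific distributional assumption about $Z$: estimate a discrete mixing distribution by NPML.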
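The constant $k'$ in the Poisson-normal variance function can be read off by comparing the last line of the derivation with the definition of $\mu_i$:

```latex
\mu_i^2 = e^{2x_i'\beta + \sigma^2}
\;\Longrightarrow\;
\mathrm{Var}(Y_i) = \mu_i + \mu_i^2\,(e^{\sigma^2} - 1),
\qquad k' = e^{\sigma^2} - 1 .
```

This is the same quadratic form as the negative binomial variance $\mu_i + \mu_i^2/k$, with $1/k$ replaced by $e^{\sigma^2} - 1$.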


Germination of Orobanche seed

Example

O. aegyptiaca 75          O. aegyptiaca 73
Bean      Cucumber        Bean      Cucumber
10/39     5/6             8/16      3/12
23/62     53/74           10/30     22/41
23/81     55/72           8/28      15/30
26/51     32/51           23/45     32/51
17/39     46/79           0/4       3/7
          10/13

Response variable: $Y_i$, the number of germinated seeds out of $m_i$ seeds.

Distribution: Binomial.

Systematic component: factorial 2 × 2 (2 species, 2 extracts), completely randomized experiment (Crowder, 1978).

Aim: to see how germination is affected by species and extracts.

Problem: overdispersion.


Models for Orobanche Data (Hinde and Demetrio, 1998a,b)

Binomial:

$$ \mathrm{Var}(Y_i) = m_i \pi_i (1 - \pi_i) $$

• residual deviance for the interaction model is 33.28 on 17 d.f.: overdispersion

Quasi-likelihood:

$$ \mathrm{Var}(Y_i) = \phi\, m_i \pi_i (1 - \pi_i) $$

• constant overdispersion, $\hat\phi = 1.862$
• only marginal evidence of interaction
• extract is the only important factor

Williams:

$$ \mathrm{Var}(Y_i) = m_i \pi_i (1 - \pi_i) [1 + \phi (m_i - 1)] $$

• moment estimate $\hat\phi = 0.0249$
• only marginal evidence of interaction
• extract is the only important factor
• note the similarity to QL, even though the $m_i$ are not equal.


Table: Deviances with overdispersion estimated from maximal model.

Source   Binomial   Constant   Beta-Binomial
         ML         QL         ML
E | S    56.49      30.34      32.69
S | E     3.06       1.64       2.88
S.E       6.41       3.44       4.45
φ                    1.862      0.012

Table: Deviances with overdispersion re-estimated for each model.

Source   Beta-Binomial   Logistic-Normal
         ML              ML
E | S    15.44           15.40
S | E     2.73            2.71
S.E       4.13            4.17


Half-normal plots with simulation envelopes (Hinde and Demetrio, 1998a,b)

[Figure: four half-normal plots of deviance residuals against half-normal scores, with simulation envelopes. Panels: Binomial: Logit; Constant Overdispersion (QL); Williams Type II; Williams Type III.]


R program

# Orobanche germination data
Species <- factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    2, 2, 2, 2, 2, 2, 2, 2, 2, 2))
Extract <- factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
                    1, 1, 1, 1, 1, 2, 2, 2, 2, 2))
y <- c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10,
       8, 10, 8, 23, 0, 3, 22, 15, 32, 3)
m <- c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13,
       16, 30, 28, 45, 4, 12, 41, 30, 51, 7)

orobanch <- data.frame(Species, Extract, m, y)
rm(Species, Extract, m, y)
attach(orobanch)
resp <- cbind(y, m - y)   # germinated, not germinated

# Binomial fit
oro.Bin <- glm(resp ~ Species*Extract, family=binomial)
anova(oro.Bin, test="Chisq")
summary(oro.Bin, cor=FALSE)

# Same model with the factors entered in the other order
oro.Bin <- glm(resp ~ Extract*Species, family=binomial)
anova(oro.Bin, test="Chisq")
summary(oro.Bin, cor=FALSE)

## Pearson estimate of phi
(X2 <- sum(residuals(oro.Bin, 'pearson')^2))
(phi <- X2/df.residual(oro.Bin))


R program (cont)

# Quasi-likelihood fit
oro.QL <- glm(resp ~ Species*Extract, family=quasibinomial)
summary(oro.QL, cor=FALSE)
summary(oro.QL)$dispersion
anova(oro.QL, test="F")

##### Beta-binomial model
library(aod)
# re-estimating phi
oro.BBin4 <- betabin(cbind(y,m-y) ~ Extract, ~1, data=orobanch)
oro.BBin3 <- betabin(cbind(y,m-y) ~ Species, ~1, data=orobanch)
oro.BBin2 <- betabin(cbind(y,m-y) ~ Extract+Species, ~1, data=orobanch)
oro.BBin1 <- betabin(cbind(y,m-y) ~ Extract+Species+Extract:Species, ~1, data=orobanch)
anova(oro.BBin4, oro.BBin2, oro.BBin1)
anova(oro.BBin3, oro.BBin2, oro.BBin1)
summary(oro.BBin2)

# fixing phi at the value estimated from the maximal model
f1 <- betabin(cbind(y, m - y) ~ Extract+Species+Extract:Species, ~ 1, data=orobanch,
              fixpar = list(5, 0.01238))
f2 <- betabin(cbind(y, m - y) ~ Extract+Species, ~ 1, data=orobanch,
              fixpar = list(4, 0.01238))
f3 <- betabin(cbind(y, m - y) ~ Species, ~ 1, data=orobanch,
              fixpar = list(3, 0.01238))
f4 <- betabin(cbind(y, m - y) ~ Extract, ~ 1, data=orobanch,
              fixpar = list(3, 0.01238))

anova(f4, f2, f1)
anova(f3, f2, f1)


R program (cont)

# logistic normal
library(lme4)
ind <- (1:length(y))   # observation-level grouping factor
# re-estimating phi
oro.LN1 <- glmer(resp ~ Extract*Species + (1|ind), family=binomial(link="logit"))
oro.LN1
coef(summary(oro.LN1))   # fixed effects; older lme4 versions: summary(oro.LN1)@coefs
VarCorr(oro.LN1)         # random-effect variance; older lme4 versions: summary(oro.LN1)@REmat
oro.LN2 <- glmer(resp ~ Extract+Species + (1|ind), family=binomial(link="logit"))
oro.LN2
oro.LN4 <- glmer(resp ~ Extract + (1|ind), family=binomial(link="logit"))
oro.LN4
oro.LN3 <- glmer(resp ~ Species + (1|ind), family=binomial(link="logit"))
oro.LN3

anova(oro.LN4, oro.LN2, oro.LN1)
anova(oro.LN3, oro.LN2, oro.LN1)


Confidence Intervals – airline data

[Figure: number of deaths against year (1976 to 1985), showing observed values (*) and confidence limits from the Poisson, negative binomial and quasi-Poisson fits.]


R program

airline.dat <- scan(what=list(acc=0, death=0, miles=0))
24 734 3863
25 516 4300
31 754 5027
31 877 5481
22 814 5814
21 362 6033
26 764 5877
20 809 6223
16 223 7433
22 1066 7107

airline <- data.frame(airline.dat)
airline.dat$year <- seq(1976, 1985, by=1)
attach(airline.dat)
plot(year, acc, main="Number of Fatal Accidents vs Year")

# Poisson fit for fatal accidents, passenger miles as exposure
acc1.fit <- glm(acc ~ year + offset(log(miles)), family=poisson)
summary(acc1.fit)
anova(acc1.fit, test="Chisq")

plot(c(1976,1985), c(15,35), type="n", xlab="Year", ylab="Number of Accidents")
points(year, acc)
x <- seq(1976, 1985, 1)
lp <- predict(acc1.fit, data.frame(year=x))
fv <- exp(lp)
lines(x, fv, lty=1)
title(sub="Observed values and fitted curve")

qqnorm(resid(acc1.fit), ylab="Residuals")
qqline(resid(acc1.fit))


R program (cont)

# Number of deaths
# Poisson fit
plot(year, death, main="Number of Passenger Deaths vs Year")

death1.fit <- glm(death ~ year + offset(log(miles)), family=poisson)
summary(death1.fit)
anova(death1.fit, test="Chisq")
# approximate 95% limits use the normal quantile qnorm(0.975)
inflim <- with(predict(death1.fit, se = TRUE, type="response"), fit - qnorm(0.975)*se.fit)
suplim <- with(predict(death1.fit, se = TRUE, type="response"), fit + qnorm(0.975)*se.fit)

plot(c(1976,1985), c(350,1200), type="n", xlab="Year", ylab="Number of deaths")
points(year, death)
x <- seq(1976, 1985, 1)
fv1 <- predict(death1.fit, data.frame(year=x), type="response")
lines(x, fv1, lty=1)
lines(x, inflim, lty=2)
lines(x, suplim, lty=2)

# Negative binomial fit
library(MASS)
death2.fit <- glm.nb(death ~ year + offset(log(miles)), link = log)
summary(death2.fit)
anova(death2.fit, test="F")
inflim <- with(predict(death2.fit, se = TRUE, type="response"), fit - qt(0.975,8)*se.fit)
suplim <- with(predict(death2.fit, se = TRUE, type="response"), fit + qt(0.975,8)*se.fit)


R program (cont)

fv2 <- predict(death2.fit, data.frame(year=x), type="response")
#lines(x,fv2,lty=1)
#lines(x,fv1,lty=1,col="blue")
lines(x, inflim, lty=3, col="blue")
lines(x, suplim, lty=3, col="blue")

# Quasi-likelihood fit
death3.fit <- glm(death ~ year + offset(log(miles)), family=quasipoisson)
summary(death3.fit)
anova(death3.fit, test="F")
inflim <- with(predict(death3.fit, se = TRUE, type="response"), fit - qt(0.975,8)*se.fit)
suplim <- with(predict(death3.fit, se = TRUE, type="response"), fit + qt(0.975,8)*se.fit)

fv3 <- predict(death3.fit, data.frame(year=x), type="response")
#lines(x,fv3,lty=1,col="red")
lines(x, inflim, lty=4, col="red")
lines(x, suplim, lty=4, col="red")
legend(1976, 1200.0, c("Observed","Poisson","Negative binomial","Quasi-Poisson"),
       lty=c(-1,2,3,4), pch=c("*"," "," "," "), col=c("black","black","blue","red"))

# Poisson log-normal
library(lme4)
ind <- (1:length(death))   # observation-level grouping factor
death.LN1 <- glmer(death ~ year + offset(log(miles)) + (1|ind), family=poisson(link="log"))
death.LN1
coef(summary(death.LN1))   # fixed effects; older lme4 versions: summary(death.LN1)@coefs
VarCorr(death.LN1)         # random-effect variance; older lme4 versions: summary(death.LN1)@REmat


Orange data (Pinheiro & Bates, 2000; Molenberghs & Verbeke, 2008)

Example

The data are the trunk circumference (mm) of each of five orange trees at seven different occasions.

        Response
Day     Tree 1   Tree 2   Tree 3   Tree 4   Tree 5
 118      30       33       30       32       30
 484      58       69       51       62       49
 664      87      111       75      112       81
1004     115      156      108      167      125
1231     120      172      115      179      142
1372     142      203      139      209      174
1582     145      203      140      214      177

All trees measured on the same occasions – balanced, longitudinal data.


[Figure: trunk circumference (mm) against time since December 31, 1968 (days), one panel per tree (order 3, 1, 5, 2, 4) and all trees overlaid in a single panel.]

Much variability between trees

Much less variability within trees

Non-linear trend

Possible correlation along the time


Models for Orange data

Model 1: Logistic model for tree $i$ at age $x_{ij}$, $i = 1, \dots, 5$, $j = 1, \dots, 7$, without random effects (4 parameters)

$$ Y_{ij} = \frac{\phi_1}{1 + \exp[-(x_{ij} - \phi_2)/\phi_3]} + \varepsilon_{ij} $$

$\phi_1$ is the asymptotic trunk circumference,
$\phi_2$ is the age at which the tree attains half of its asymptotic trunk circumference,
$\phi_3$ represents the growth scale,
the error term $\varepsilon_{ij} \sim N(0, \sigma^2)$

Model 2: Separate logistic curve for each tree (16 parameters)

$$ Y_{ij} = \frac{\phi_{1i}}{1 + \exp[-(x_{ij} - \phi_{2i})/\phi_{3i}]} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2) $$
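As a concrete check of how the three parameters act, the sketch below writes the logistic curve as a plain R function and fits Model 1 with `nls`, using the `Orange` data frame that ships with R and the self-starting `SSlogis` model from the stats package. The helper name `logistic_growth` and the parameter values in the first call are made up for this illustration, and the fit is not claimed to be the authors' exact code.

```r
# Logistic growth curve: phi1 = asymptote, phi2 = age at half the asymptote, phi3 = scale
logistic_growth <- function(x, phi1, phi2, phi3) {
  phi1 / (1 + exp(-(x - phi2) / phi3))
}

# At x = phi2 the curve equals half the asymptote (illustrative parameter values)
half_value <- logistic_growth(700, phi1 = 200, phi2 = 700, phi3 = 350)

# Model 1: one common curve for all trees (Orange has columns Tree, age, circumference)
fit1 <- nls(circumference ~ SSlogis(age, Asym, xmid, scal), data = Orange)
est  <- coef(fit1)

print(half_value)
print(round(est, 2))
```

In the `SSlogis` parameterization, `Asym`, `xmid` and `scal` play the roles of $\phi_1$, $\phi_2$ and $\phi_3$.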


Models for Orange data (cont.)

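Model 3: Logistic model with random effects (10 parameters)

$$ Y_{ij} = \frac{\phi_{1i}}{1 + \exp[-(x_{ij} - \phi_{2i})/\phi_{3i}]} + \varepsilon_{ij} $$

$$
\phi_i = \begin{pmatrix} \phi_{1i} \\ \phi_{2i} \\ \phi_{3i} \end{pmatrix}
= \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
+ \begin{pmatrix} b_{1i} \\ b_{2i} \\ b_{3i} \end{pmatrix}
= \beta + b_i, \qquad
\Psi = \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} \\
\sigma_{12} & \sigma_2^2 & \sigma_{23} \\
\sigma_{13} & \sigma_{23} & \sigma_3^2
\end{pmatrix}
$$

$$ b_i \sim N(0, \Psi), \qquad \varepsilon_{ij} \sim N(0, \sigma^2) $$

Model 4: Logistic model with random effect on the asymptote (5 parameters)

$$ Y_{ij} = \frac{\phi_{1i}}{1 + \exp[-(x_{ij} - \phi_2)/\phi_3]} + \varepsilon_{ij} $$

$$ \phi_{1i} = \beta_1 + b_i, \qquad b_i \sim N(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim N(0, \sigma^2) $$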


Model 1: Fixed logistic model

[Figure: standardized residuals against fitted values, and residuals by tree (order 3, 1, 5, 2, 4).]

The variability in the residuals increases with the fitted values, due to correlation among observations in the same tree.

The residuals are mostly negative for trees 1 and 3 and mostly positive for trees 2 and 4.


Model 2: Separate logistic curve for each tree

[Figure: interval estimates of Asym, xmid and scal for each tree (order 3, 1, 5, 2, 4), and residuals (mm) by tree.]

The estimates of all the parameters vary with tree, but there appears to be relatively more variability in the Asym estimates.

Asym is the only parameter for which the confidence intervals do not all overlap, suggesting that it is the only parameter for which random effects are needed to account for variation among trees.


Models 3 and 4: Random logistic models

Model  df  AIC     BIC     logLik   Test    L.Ratio  p-value
3      10  279.98  295.53  -129.99
4       5  273.17  280.95  -131.58  3 vs 4  3.1896   0.6708

Note: The default estimation method in nlme in R is maximum likelihood (ML), while in lme it is restricted maximum likelihood (REML).

The large p-value for the likelihood ratio test confirms that the simpler model (random effect only for the asymptote) is to be preferred.
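The reported p-value can be reproduced from the likelihood ratio statistic and the difference in the number of parameters. This is a sketch using only the figures printed in the table. The variable names are hypothetical, and the parameter counts in the comments are our reading of the two models.

```r
## Likelihood ratio test of Model 4 (simpler) against Model 3,
## using the df and L.Ratio values reported in the anova table.
df_model3 <- 10   # 3 fixed effects + 6 random-effect (co)variances + 1 residual variance
df_model4 <- 5    # 3 fixed effects + 1 random-effect variance + 1 residual variance
lr_stat <- 3.1896 # approximately 2 * (logLik of Model 3 - logLik of Model 4)

p_value <- pchisq(lr_stat, df = df_model3 - df_model4, lower.tail = FALSE)
round(p_value, 4)
```

The statistic is compared with a chi-squared distribution on 10 - 5 = 5 degrees of freedom, matching what `anova` reports for the two nlme fits.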


          Model 1                              Model 4
Par.      Value   Std. Error  DF  t-value      Value   Std. Error  DF  t-value
Asym      192.68   20.24      32   9.52        191.05  16.15       28  11.83
xmid      728.71  107.27      32   6.79        722.56  35.15       28  20.56
scal      353.49   81.46      32   4.34        344.17  27.15       28  12.68
Residual  546.26                                61.56

The fixed-effects estimates are similar, but the standard errors are much smaller in the nlme fit.

The estimated within-group standard error is also considerably smaller in the non-linear mixed model.

This is because the between-group variability is not incorporated in the non-linear fixed model, being absorbed in the standard error.

This pattern is generally observed when comparing mixed-effects versus fixed-effects fits.
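To make the comparison concrete, the sketch below recomputes the t-values and the ratio of standard errors from the estimates reported for Model 1 and Model 4. The data frame `est` and its column names are hypothetical. The numbers are copied from the table, not refitted.

```r
## Reported estimates and standard errors for Model 1 (nls) and Model 4 (nlme).
est <- data.frame(
  par        = c("Asym", "xmid", "scal"),
  value_nls  = c(192.68, 728.71, 353.49),
  se_nls     = c(20.24, 107.27, 81.46),
  value_nlme = c(191.05, 722.56, 344.17),
  se_nlme    = c(16.15, 35.15, 27.15)
)

est$t_nls    <- est$value_nls / est$se_nls      # t-value = estimate / standard error
est$t_nlme   <- est$value_nlme / est$se_nlme
est$se_ratio <- est$se_nlme / est$se_nls        # below 1: smaller standard error in nlme
est
```

The ratio is below 1 for all three parameters, and smallest for xmid and scal.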


[Figure: left, standardized residuals versus fitted values (mm); right, trunk circumference (mm) versus time since December 31, 1968 (days), one panel per tree (3, 1, 5, 2, 4), with "fixed" and "Tree" predicted curves.]


R program

library(nlme)
data()                             # lists all datasets
data(Orange, package='datasets')
str(Orange)
head(Orange)
summary(Orange)
par(mfrow=c(1,2))
plot(Orange)                       ## five curves in five different panels
plot(Orange, outer=~1)             ## all five curves in one panel

## no random effects - Model 1
(fm1Oran.nls <- nls(circumference ~ SSlogis(age,Asym,xmid,scal),
                    data = Orange))
summary(fm1Oran.nls)
plot(fm1Oran.nls)
plot(fm1Oran.nls, Tree ~ resid(.), abline=0)

## separate logistic curve to each tree - Model 2
(fm1Oran.lis <- nlsList(circumference ~ SSlogis(age, Asym, xmid, scal)|Tree,
                        data = Orange))
summary(fm1Oran.lis)
plot(intervals(fm1Oran.lis), layout = c(3,1))
plot(fm1Oran.lis, Tree ~ resid(.), abline = 0)


R program (cont.)

## random effect model - Model 3
## no need to specify groups, as Orange is a groupedData object
## random is omitted - by default it is equal to fixed
(fm1Oran.nlme <- nlme(circumference ~ SSlogis(age,Asym,xmid,scal),
                      data = Orange,
                      fixed = Asym+xmid+scal~1,
                      start = fixef(fm1Oran.lis)))
## or simply
fm1Oran.nlme <- nlme(fm1Oran.lis)
summary(fm1Oran.nlme)   ## compare with next (look at the standard errors)
summary(fm1Oran.nls)

## random effect model - Model 4
fm2Oran.nlme <- update(fm1Oran.nlme, random = Asym ~ 1)
anova(fm1Oran.nlme,fm2Oran.nlme)   ## compares Model 3 and 4
summary(fm2Oran.nlme)
plot(fm1Oran.nlme)                 ## plots the standardized residuals versus the fitted values
plot(augPred(fm2Oran.nlme, level = 0:1), layout = c(5,1))
qqnorm(fm2Oran.nlme, abline = c(0,1),cex=0.7)


References I

Bates, D.M. (2010). lme4: Mixed-effects modeling with R. Springer. http://lme4.r-forge.r-project.org/book/

Cornillon, P.A. and Matzner-Lober, E. (2011). Regression avec R. Springer, Collection Pratique R, 1st edition.

Faraway, J.J. (2006). Extending the Linear Model with R. Texts in Statistical Science. Chapman & Hall.

Hinde, J. and Demetrio, C.G.B. (1998a). Overdispersion: models and estimation. Computational Statistics and Data Analysis, 27, 151–170.

Hinde, J. and Demetrio, C.G.B. (1998b). Overdispersion: models and estimation. Sao Paulo: ABE.

Jorgensen, B. (1997). The Theory of Dispersion Models. Chapman and Hall, London.

Jorgensen, B. (2011). Generalized linear models. Encyclopedia of Environmetrics, to appear.

Littell, R.C.; Milliken, G.A.; Stroup, W.W.; Wolfinger, R.D. and Schabenberger, O. (2006). SAS for Mixed Models. SAS Institute.

McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.

McCulloch, C.E. and Searle, S.R. (2001). Generalized, Linear, and Mixed Models. Wiley Series in Probability and Statistics.

Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models. Journal of the Royal Statistical Society, A, 135, 370–384.

Pinheiro, J.C. and Bates, D. (2000). Mixed Effects Models in S and S-PLUS. New York: Springer.

Searle, S.R.; Casella, G. and McCulloch, C.E. (1992). Variance components. Wiley Series in Probability and Statistics.

Vazquez, A.I.; Bates, D.M.; Rosa, G.J.M.; Gianola, D.; Weigel, K.A. (2010). Technical note: An R package for fitting generalized linear mixed models in animal breeding. J. Anim. Sci., 88:497-504.


References II

Venables, W.N. and Ripley, B.D. (1994). Modern applied statistics with S-Plus. Springer-Verlag, New York.

Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. New York: Springer-Verlag.

Verbeke, G. and Molenberghs, G. (2008). Models for Longitudinal and Incomplete Data. Slides of a short course.

West, B.T.; Welch, K.B.; Galecki, A.T. (2007). Linear Mixed Models. A Practical Guide Using Statistical Software. New York: Chapman & Hall/CRC.

Journal of Societe Francaise de Statistique (2002). Special Number on Modeles Mixtes et Biometrie. Vol. 143, No. 1-2.
