Longitudinal Data Analysis: Why and How to Do It With Multi-Level Modeling (MLM)?

Oi-man Kwok

Texas A&M University

Road Map

• Why do we want to analyze longitudinal data under the multilevel modeling (MLM) framework?
  – Dependency issue
  – Advantages of using MLM over traditional methods (e.g., univariate ANOVA, multivariate ANOVA)
  – Review of important parameters in MLM

• How can we do it in SPSS?

• Regression model, e.g.:

DV: Test scores in a first-year graduate-level statistics course
IV: GRE_M (GRE Math test score)
150 students (i = 1, …, 150)

$Stat_i = \beta_0 + \beta_1 GRE\_M_i + e_i$

One of the important assumptions of OLS regression? (Observations are independent of each other.)

Ignoring the clustered structure (or dependency between observations) in the analyses can result in:

• Bias in the standard errors

• Bias in the tests of significance and confidence intervals (Type I errors: inflated alpha level, e.g., nominal α = .05 but actual α = .10)

• Non-replicable results

Advantages of MLM over traditional methods for analyzing longitudinal data

• Univariate ANOVA—Restriction on the error structure: Compound Symmetry (CS) type error structure (higher statistical power but not likely to be met in longitudinal data)

• Multivariate ANOVA—No restriction on the error structure: Unstructured (UN) type error structure (often too conservative, lower statistical power); can only handle completely balanced data (Listwise deletion)

• More…

Analyzing Longitudinal Data:

• Example (based on actual data; variable names changed for ease of presentation): Compare two different teaching methods on achievement over time

• Teaching methods: 78 students are randomly assigned to either:
  A. Lecture (control group; 39 students), or
  B. Computer (treatment group; 39 students)

• 4 achievement (Ach) scores (right after the course, and 1, 2, and 3 years after) were collected from each student after treatment (i.e., the statistics course)

[Figure: Achievement (y-axis) against Time in years (x-axis, 0 to 3) for the Computer and Lecture groups. Time = 0: immediate posttest measure.]

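Multi-Level Model (MLM)

• Note: Start with a simple growth model

A simple regression model for ONE student (student 36):

$Ach_t = \beta_0 + \beta_1 Time_t + e_t$   (t = 0, 1, 2, 3)

[Figure: Ach_t against Time_t (0 to 3) for student 36, showing the fitted line with intercept β0 and slope β1 and the residuals e0, e1, e2, e3.]

e_t: Captures the variation of the individual achievement scores around the fitted regression model WITHIN student 36

$V(e_{ti}) = \sigma^2$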

Compare

$Ach_t = \beta_0 + \beta_1 Time_t + e_t$

to

$Ach_{ti} = \beta_{0i} + \beta_{1i} Time_{ti} + e_{ti}$   (Micro Level Model; i = 1, 2, 3, …, 78)

[Figure: Ach_ti against Time_ti (0 to 3) with separate fitted lines for students 27, 36, and 52, each with its own intercept (β0) and slope (β1).]

Student ID   β0i    β1i
12           13.5   1.25
15           10.5   2.75
23           12.6    .23
27           15.6    .28
28           22.3   1.64
33           36.4   3.27
37           25.2   1.22

[Figure: Ach_ti against Time_ti (0 to 3) with separate fitted lines for students 27, 36, and 52, each with its own intercept (β0) and slope (β1).]

γ00: Grand intercept
γ10: Grand slope
τ00: Variance of the intercepts
τ11: Variance of the slopes

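[Figure: Ach against Time, showing the overall model and the fitted lines for students 27, 36, and 52, all starting from the same intercept γ00.]

No variation among the 78 intercepts:

$G = \begin{pmatrix} 0 & 0 \\ 0 & \tau_{11} \end{pmatrix}$ rather than $G = \begin{pmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{pmatrix}$

• τ00 captures the deviations of the 78 intercepts from the grand intercept γ00
• τ11 captures the deviations of the 78 slopes from the grand slope γ10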

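[Figure: Ach against Time, showing the overall model and the fitted lines for students 27, 36, and 52, all with the same slope γ10.]

No variation among the 78 slopes:

$G = \begin{pmatrix} \tau_{00} & 0 \\ 0 & 0 \end{pmatrix}$ rather than $G = \begin{pmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{pmatrix}$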

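[Figure: Ach against Time, showing the overall model.]

$G = \begin{pmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{pmatrix}$ or, with a nonzero covariance between the intercepts and slopes, $G = \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix}$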

Summary

• G: Captures between-student differences

• R: Captures within-student random errors

$G = \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix}$

γ00: Grand intercept
γ10: Grand slope
τ00: Variance of the intercepts
τ11: Variance of the slopes
τ01: Covariance between intercepts and slopes

$V(e_{ti}) = \sigma^2$

MACRO vs. MICRO

• UNITS:

         Educational study   Family study    Longitudinal study
MACRO    School/Class        Family          Individual
MICRO    Student             Family member   Repeated observations

MACRO vs. MICRO (Cont.)

• MODELS:

MICRO level model: a regression model fitted to the observations within each MACRO unit

MACRO level model: a model that captures the differences between the overall model and the individual regression models from different MACRO units

• Dependent variable: Math Achievement (Achieve; repeated measures / MICRO level)

• Predictors:
  – Repeated-measure (MICRO) level predictor: Time (and any time-varying covariates)
  – Student (MACRO) level predictor: Computer (different teaching methods) (and any time-invariant variables such as gender)

Data format under MANOVA approaches (SPSS data format):

Student  Treat  T0  T1  T2  T3
S1       0      5   3   2   3
S2       1      5   25  --  33
S3       1      --  19  17  26

• S1 has responses at all time points
• S2 has a missing response at time 2 (indicated by "--")
• S3 has a missing response at time 0
• MANOVA: only retains S1 in the analysis

Data format for MANOVA:

Student  Treat  T0  T1  T2  T3
S1       0      5   3   2   3
S2       1      5   25  --  33
S3       1      --  19  17  26

Data format for Multilevel Model:

Student  Treat  Time  DV
S1       0      0     5
S1       0      1     3
S1       0      2     2
S1       0      3     3
S2       1      0     5
S2       1      1     25
S2       1      3     33
S3       1      1     19
S3       1      2     17
S3       1      3     26

(All 3 students are included in the analyses)
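The restructuring from the MANOVA layout to the multilevel layout can be sketched in a few lines of Python. This is only an illustration using the three example students; the function names `wide_to_long` and `complete_cases` are hypothetical helpers, not part of SPSS or of the original analysis.

```python
# Wide (MANOVA-style) records from the example; None marks a missing response ("--").
wide = [
    {"Student": "S1", "Treat": 0, "T0": 5, "T1": 3, "T2": 2, "T3": 3},
    {"Student": "S2", "Treat": 1, "T0": 5, "T1": 25, "T2": None, "T3": 33},
    {"Student": "S3", "Treat": 1, "T0": None, "T1": 19, "T2": 17, "T3": 26},
]

TIME_COLUMNS = ("T0", "T1", "T2", "T3")


def wide_to_long(rows, time_columns=TIME_COLUMNS):
    """Stack one row per student into one row per observed (student, time) pair."""
    long_rows = []
    for row in rows:
        for time, column in enumerate(time_columns):
            if row[column] is None:
                continue  # drop only the missing occasion, not the whole student
            long_rows.append((row["Student"], row["Treat"], time, row[column]))
    return long_rows


def complete_cases(rows, time_columns=TIME_COLUMNS):
    """Listwise deletion: keep only students observed at every time point."""
    return [r["Student"] for r in rows if all(r[c] is not None for c in time_columns)]


long_data = wide_to_long(wide)
for record in long_data:
    print(record)  # (Student, Treat, Time, DV)
print("students kept under listwise deletion:", complete_cases(wide))
```

The long format keeps every observed score from all three students, whereas listwise deletion on the wide format keeps S1 only.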

Student  Treat  Time  DV
S1       0      0     5
S1       0      7     3
S1       0      12    2
S1       0      13    3
S2       1      1     5
S2       1      3     9
S2       1      4     5
S2       1      6     25
S3       1      3     18
S3       1      15    19
S3       1      28    17
S3       1      31    26

Can you transform this dataset back into multivariate format?
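One way to see why the answer is essentially "no" is to attempt the transformation. The sketch below, a plain-Python illustration rather than anything from the original analysis, builds one column per distinct measurement time and counts how many cells are left empty.

```python
# Long-format records from the unequally spaced example: (Student, Treat, Time, DV).
long_data = [
    ("S1", 0, 0, 5), ("S1", 0, 7, 3), ("S1", 0, 12, 2), ("S1", 0, 13, 3),
    ("S2", 1, 1, 5), ("S2", 1, 3, 9), ("S2", 1, 4, 5), ("S2", 1, 6, 25),
    ("S3", 1, 3, 18), ("S3", 1, 15, 19), ("S3", 1, 28, 17), ("S3", 1, 31, 26),
]

# A multivariate layout needs one column for every distinct time point.
time_points = sorted({time for _, _, time, _ in long_data})
students = sorted({student for student, _, _, _ in long_data})
observed = {(student, time): dv for student, _, time, dv in long_data}

# Cells with no observation at that time point stay None.
wide = {s: [observed.get((s, t)) for t in time_points] for s in students}

n_cells = len(students) * len(time_points)
n_missing = sum(value is None for row in wide.values() for value in row)
complete = [s for s, row in wide.items() if None not in row]

print("columns needed:", time_points)
print("missing cells:", n_missing, "of", n_cells)
print("students with complete data:", complete)
```

Because the students were measured at different times, the wide layout is mostly empty and no student has complete data, so a MANOVA with listwise deletion would have nothing left to analyze. The long format handles this spacing without any change.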

Questions

• 1. On average, is there any trend of the math achievement over time?

• 2. Are there any differences between students on the trend of math achievement over time? (Do all students have the same trend of math achievement over time?)

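Micro Level (Level 1):

$Ach_{ti} = \beta_{0i} + \beta_{1i} Time_{ti} + e_{ti}$

Macro Level (Level 2):

$\beta_{0i} = \gamma_{00} + U_{0i}$   (γ00: grand intercept; $Var(U_{0i}) = \tau_{00}$)

$\beta_{1i} = \gamma_{10} + U_{1i}$   (γ10: grand slope; $Var(U_{1i}) = \tau_{11}$)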

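Micro Level:

$Mathach_{ti} = \beta_{0i} + \beta_{1i} Time_{ti} + e_{ti}$

Macro Level:

$\beta_{0i} = \gamma_{00} + U_{0i}$

$\beta_{1i} = \gamma_{10} + U_{1i}$

Combined Model:

$Mathach_{ti} = \gamma_{00} + \gamma_{10} Time_{ti} + U_{0i} + U_{1i} Time_{ti} + e_{ti}$

• γ00: grand intercept; γ10: grand slope
• $U_{0i} + U_{1i} Time_{ti}$: between-student differences
• $e_{ti}$: within-student errors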
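As a worked consequence of the combined model, the variance of a student's score can be written as a function of time. The expressions below are a sketch of the standard result, assuming the random effects (U0i, U1i) are independent of the errors e_ti, the errors are independent across occasions with constant variance σ², and Time is treated as fixed.

```latex
\begin{aligned}
\mathrm{Var}(Mathach_{ti}) &= \tau_{00} + 2\,\tau_{01}\,Time_{ti} + \tau_{11}\,Time_{ti}^{2} + \sigma^{2} \\
\mathrm{Cov}(Mathach_{ti},\, Mathach_{t'i}) &= \tau_{00} + \tau_{01}\,(Time_{ti} + Time_{t'i}) + \tau_{11}\,Time_{ti}\,Time_{t'i}, \qquad t \neq t'
\end{aligned}
```

With τ11 > 0 the variance changes over time, which is the sense in which a growth model with random slopes models the variance function rather than imposing a single pattern such as compound symmetry.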

[Figure: ACH (y-axis, 20 to 120) against TIME (x-axis) for selected students (SUBID 4, 6, 11, 14, 15, 18, 32, 53). Red: Computer; Blue: Lecture.]

$MA_{ti} = \gamma_{00} + \gamma_{10} Time_{ti} + U_{0i} + U_{1i} Time_{ti} + e_{ti}$

SPSS MIXED syntax:

MIXED mathach WITH Time
  /METHOD = REML
  /FIXED = INTERCEPT Time
  /RANDOM = INTERCEPT Time | SUBJECT(Subid) COVTYPE(UN)
  /PRINT = G SOLUTION TESTCOV.
EXECUTE.

Notes on the syntax:

• MIXED mathach WITH Time: the DV comes first; categorical IVs follow BY and continuous IVs follow WITH
• /METHOD: the default is REML (Restricted Maximum Likelihood); the other option is ML (Maximum Likelihood)
• /FIXED: captures the overall model
• /RANDOM: specifies the random effects, i.e., the effects that capture the between-student differences
• SUBJECT(Subid): the identity variable for the macro level units (e.g., Subid)
• COVTYPE(UN): the structure of the G matrix (unstructured)
• /PRINT: G prints the G matrix; SOLUTION requests the regression coefficients; TESTCOV produces asymptotic standard errors and Wald Z-tests for the covariance parameter estimates

SPSS Output

Basic Information

Model Dimension (b)

                                       Number of  Covariance    Number of   Subject
                                       Levels     Structure     Parameters  Variables
Fixed Effects   Intercept              1                        1
                time                   1                        1
Random Effects  Intercept + time (a)   2          Unstructured  3           subid
Residual                                                        1
Total                                  4                        6

a. As of version 11.5, the syntax rules for the RANDOM subcommand have changed. Your command syntax may yield results that differ from those produced by prior versions. If you are using SPSS 11 syntax, please consult the current syntax reference guide for more information.
b. Dependent Variable: Achieve.

Information Criteria (a)

-2 Restricted Log Likelihood            2509.873
Akaike's Information Criterion (AIC)    2517.873
Hurvich and Tsai's Criterion (AICC)     2518.004
Bozdogan's Criterion (CAIC)             2536.819
Schwarz's Bayesian Criterion (BIC)      2532.819

The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: Achieve.

Type III Tests of Fixed Effects

Source     Numerator df  Denominator df  F        Sig.
Intercept  1             77              871.772  .000
time       1             77              13.701   .000

Estimates of Fixed Effects
(Requested by the "SOLUTION" keyword in the PRINT statement, Line 5)

                                                          95% Confidence Interval
Parameter  Estimate   Std. Error  df  t       Sig.  Lower Bound  Upper Bound
Intercept  54.25609   1.8375833   77  29.526  .000  50.5969939   57.9151856
time       2.3760897  .6419278    77  3.701   .000  1.0978482    3.6543313

Intercept (γ00): Average MA score at Time = 0
time (γ10): Average trend of the MA score

Estimates of Covariance Parameters
(Asymptotic standard errors and Wald Z-tests; requested by the "TESTCOV" keyword in the PRINT statement, Line 5)

                                                                       95% Confidence Interval
Parameter                             Estimate   Std. Error  Wald Z  Sig.  Lower Bound   Upper Bound
Residual (σ2)                         87.75982   9.9368430   8.832   .000  70.2936565    109.5658788
Intercept + time [subject = subid]
  UN (1,1) (τ00)                      201.9517   43.01424    4.695   .000  133.0294456   306.5824032
  UN (2,1) (τ10)                      -.1513755  11.31083    -.013   .989  -22.3201972   22.0174463
  UN (2,2) (τ11)                      14.58960   5.5482320   2.630   .009  6.9237677     30.7428445

Random Effect Covariance Structure (G)
(Requested by the "G" keyword in the PRINT statement, Line 5)

                   Intercept | subid  time | subid
Intercept | subid  201.9517           -.1513755
time | subid       -.1513755          14.5895961

Unstructured

$G = \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix}$

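• Compare

$G = \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix}$   (-2LL: 2509.873)

with

$G = \begin{pmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{pmatrix}$   (-2LL: ?)

Can I have a simpler G matrix (i.e., τ01 = τ10 = 0)?

Likelihood Ratio Test!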

Syntax for fitting the simpler G

SPSS syntax:

  /RANDOM = INTERCEPT Time | SUBJECT(Subid) COVTYPE(DIAG)

$G = \begin{pmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{pmatrix}$

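Model with τ01 = τ10 = 0: -2 Res Log Likelihood (or Deviance) = 2509.873 (Choose this)

Model with τ01 = τ10 ≠ 0: -2 Res Log Likelihood (or Deviance) = 2509.873

χ2(1) = .00018, p = .99 (the two deviances differ only beyond the third decimal place)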

Compare to the model with τ11 = 0

SPSS syntax:

  /RANDOM = INTERCEPT | SUBJECT(Subid) COVTYPE(DIAG)

$G = \begin{pmatrix} \tau_{00} & 0 \\ 0 & 0 \end{pmatrix}$

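Model with τ01 = τ10 = 0, τ11 ≠ 0: $G = \begin{pmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{pmatrix}$ (Choose this)

Model with τ11 = τ01 = τ10 = 0: $G = \begin{pmatrix} \tau_{00} & 0 \\ 0 & 0 \end{pmatrix}$

χ2(1) = 14.51, p < .001 (halved p-value)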
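The p-values for the two likelihood ratio tests can be checked with a short calculation. This is a minimal sketch using only the reported χ2 values. The helper names are hypothetical, and the halving is applied only to the variance test, on the assumption that the p-value is halved because τ11 = 0 lies on the boundary of the parameter space.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    # For df = 1, P(X > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def lrt_p_value(chi_square, on_boundary=False):
    """p-value for a 1-df likelihood ratio test; halved when the null is on the boundary."""
    p = chi2_sf_df1(chi_square)
    return p / 2.0 if on_boundary else p


# Covariance test (tau01 = tau10 = 0): a covariance can be negative, so no halving.
p_covariance = lrt_p_value(0.00018)

# Slope variance test (tau11 = 0): a variance cannot be negative, so the p-value is halved.
p_slope_variance = lrt_p_value(14.51, on_boundary=True)

print(f"covariance test:     p = {p_covariance:.2f}")
print(f"slope variance test: p = {p_slope_variance:.5f}")
```

The first test gives no reason to keep the covariance, and the second supports keeping the slope variance, which matches the choice of the diagonal G with both τ00 and τ11.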

Result of the final model

Estimates of Fixed Effects

                                                             95% Confidence Interval
Parameter        Estimate   Std. Error  df      t       Sig.  Lower Bound  Upper Bound
Intercept (γ00)  54.256090  1.836838    89.672  29.538  .000  50.606708    57.905472
time (γ10)       2.376090   .641668     89.672  3.703   .000  1.101242     3.650938

Estimates of Covariance Parameters

                                                                    95% Confidence Interval
Parameter                            Estimate   Std. Error  Wald Z  Sig.  Lower Bound  Upper Bound
Residual (σ2)                        87.794973  9.591118    9.154   .000  70.872958    108.757380
Intercept + time [subject = subid]
  Var: Intercept (τ00)               201.7136   39.133631   5.154   .000  137.910425   295.034860
  Var: time (τ11)                    14.556515  4.964819    2.932   .003  7.459959     28.403928

Random Effect Covariance Structure (G)

                   Intercept | subid  time | subid
Intercept | subid  201.7136           0
time | subid       0                  14.556515

Diagonal

• 1. On average, is there any trend in math achievement over time?

$\widehat{Mathach}_{ti} = 54.26 + 2.38\,Time_{ti}$

• 2. Are there any differences between students in the trend of math achievement over time? (Or, do all students have the same trend of math achievement over time?)

τ00 = 201.71, τ11 = 14.56

• Q3. If yes to Q2, what causes the differences?

• Micro Level (Level 1):

$MA_{ti} = \beta_{0i} + \beta_{1i} Time_{ti} + e_{ti}$

(Variance of e_ti = σ2)

• Macro Level (Level 2):

$\beta_{0i} = \gamma_{00} + \gamma_{01} Comp_i + U_{0i}$

$\beta_{1i} = \gamma_{10} + \gamma_{11} Comp_i + U_{1i}$

(Variance of U0i = τ00; Variance of U1i = τ11)

• Combined Model:

$MA_{ti} = \gamma_{00} + \gamma_{01} Comp_i + \gamma_{10} Time_{ti} + \gamma_{11} Time_{ti} * Comp_i + U_{0i} + U_{1i} Time_{ti} + e_{ti}$

Null hypothesis: Different teaching methods have the SAME effect on achievement over time

(H0: γ11 = 0)

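$MA_{ti} = \gamma_{00} + \gamma_{01} Comp_i + \gamma_{10} Time_{ti} + \gamma_{11} Time_{ti} * Comp_i + U_{0i} + U_{1i} Time_{ti} + e_{ti}$

• SPSS MIXED syntax:

MIXED mathach WITH Time Comp
  /METHOD = REML
  /FIXED = INTERCEPT Comp Time Time*Comp
  /RANDOM = INTERCEPT Time | SUBJECT(Subid) COVTYPE(DIAG)
  /PRINT = G SOLUTION TESTCOV.
EXECUTE.

(Comp is a 0/1 indicator, so it is listed after WITH alongside Time; every predictor used in /FIXED must appear in the variable list.)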

Without Comp in the Macro models:

                   Intercept | subid  time | subid
Intercept | subid  201.7136           0
time | subid       0                  14.556515

Diagonal

With Comp in the Macro models:

                   Intercept | subid  time | subid
Intercept | subid  176.1636           0
time | subid       0                  9.813461

Diagonal

$G = \begin{pmatrix} 201.71 & 0 \\ 0 & 14.56 \end{pmatrix}$ (WITHOUT "Comp" in the model)   $G = \begin{pmatrix} 176.16 & 0 \\ 0 & 9.81 \end{pmatrix}$ (WITH "Comp" in the model)

Proportion of variance in the intercept (τ00) explained by "Comp" = (201.71 - 176.16)/201.71 = .13 (or 13%)

Proportion of variance in the slope (τ11) explained by "Comp" = (14.56 - 9.81)/14.56 = .33 (or 33%)
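The two proportions can be reproduced directly from the reported G matrices. The sketch below is plain arithmetic on those estimates; the function name `proportion_explained` is a hypothetical helper.

```python
def proportion_explained(variance_without, variance_with):
    """Proportional reduction in a variance component after adding a predictor."""
    return (variance_without - variance_with) / variance_without


# Variance components reported by SPSS without and with Comp in the macro models.
tau00_without, tau00_with = 201.7136, 176.1636   # intercept variance
tau11_without, tau11_with = 14.556515, 9.813461  # slope variance

intercept_share = proportion_explained(tau00_without, tau00_with)
slope_share = proportion_explained(tau11_without, tau11_with)

print(f"intercept variance explained by Comp: {intercept_share:.2f}")
print(f"slope variance explained by Comp:     {slope_share:.2f}")
```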

Solution for Fixed Effects

Effect     Estimate  Standard Error  DF   t Value  Pr > |t|
Intercept  50.3769   2.4764          76   20.34    <.0001
time       0.5756    0.8445          232  0.68     0.4962
computer   7.7583    3.5021          76   2.22     0.0297
time*comp  3.6009    1.1943          232  3.02     0.0029

$\widehat{Ach}_{ti} = 50.38 + 7.76\,Comp_i + .58\,Time_{ti} + 3.60\,Comp_i * Time_{ti}$

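$\widehat{Ach}_{ti} = 50.38 + 7.76\,Comp_i + .58\,Time_{ti} + 3.60\,Comp_i * Time_{ti}$

Overall model for students in the Lecture method group:

$\widehat{Mathach}_{ti} = 50.38 + .58\,Time_{ti}$

Overall model for students in the Computer method group:

$\widehat{Mathach}_{ti} = 58.14 + 4.18\,Time_{ti}$

Random effects:

$G = \begin{pmatrix} 176.16 & 0 \\ 0 & 9.81 \end{pmatrix}$

$V(e_{ti}) = \sigma^2 = 90.00$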
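The group-specific lines follow from setting Comp to 0 (Lecture) or 1 (Computer) in the fitted equation. The sketch below redoes that arithmetic from the reported fixed effects; `group_line` and `predict` are hypothetical helper names used only for illustration.

```python
# Fixed-effect estimates from the "Solution for Fixed Effects" table.
fixed = {"intercept": 50.3769, "comp": 7.7583, "time": 0.5756, "time_x_comp": 3.6009}


def group_line(comp):
    """Intercept and slope of the overall model for one group (comp = 0 or 1)."""
    intercept = fixed["intercept"] + fixed["comp"] * comp
    slope = fixed["time"] + fixed["time_x_comp"] * comp
    return intercept, slope


def predict(comp, time):
    """Predicted achievement for a group at a given time (years after the course)."""
    intercept, slope = group_line(comp)
    return intercept + slope * time


for label, comp in (("Lecture", 0), ("Computer", 1)):
    intercept, slope = group_line(comp)
    predictions = [round(predict(comp, t), 2) for t in range(4)]
    print(f"{label}: {intercept:.2f} + {slope:.2f} * Time -> {predictions}")
```

The gap between the two groups is 7.76 points at Time = 0 and widens by 3.60 points per year, which is what the significant time*comp interaction expresses.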

[Figure: Achievement (y-axis) against Time in years (x-axis) for the Computer and Lecture groups. Time = 0: immediate posttest measure.]

Conclusion

• Advantages of using MLM over traditional ANOVA approaches for analyzing longitudinal data:
  – 1. Can flexibly model the variance function
  – 2. Retains the meaning of the random effects (τ00, τ11)
  – 3. Can explore factors that predict individual differences in change over time (e.g., treatment effect)
  – 4. Takes both unequal spacing and missing data into account

Take Home Exercise

A clinical psychologist wants to examine the impact of the stress level of each family member (STRESS) on his/her level of symptomatology (SYMPTOM). There are 100 families, and families vary in size from three to eight members. The total number of participants is 400.

a) Can you write out the model? (Hint: What is in the micro model? What is in the macro model?)

b) Can you write out the syntax (SPSS) to analyze this model?

c) In designing the study, what possible macro predictors do you think the clinical psychologist should include in her study? (e.g., family size?)

d) In designing the study, what possible micro predictors do you think the clinical psychologist should include in her study? (e.g., participant's neuroticism?)

e) Can you write out the model? (Hint: What is in the micro model? What is in the macro model?)

f) Can you write out the syntax (SPSS) to analyze this model?

a) Micro-level model:

$SYMPTOM_{ij} = \beta_{0j} + \beta_{1j} STRESS_{ij} + e_{ij}$

Macro-level model:

$\beta_{0j} = \gamma_{00} + U_{0j}$

$\beta_{1j} = \gamma_{10} + U_{1j}$

Combined model:

$SYMPTOM_{ij} = \gamma_{00} + \gamma_{10} STRESS_{ij} + U_{0j} + U_{1j} STRESS_{ij} + e_{ij}$

b) SPSS syntax for the combined model:

MIXED Symptom WITH Stress
  /FIXED = INTERCEPT Stress
  /RANDOM = INTERCEPT Stress | SUBJECT(Family) COVTYPE(UN)
  /PRINT = G SOLUTION TESTCOV.
EXECUTE.

THE END!

THANK YOU!
