Longitudinal Data Analysis: Why and How to Do It With Multi-Level Modeling (MLM)?
Oi-man Kwok
Texas A & M University
Road Map
• Why do we want to analyze longitudinal data under the multilevel modeling (MLM) framework?
  – Dependency issue
  – Advantages of using MLM over traditional methods (e.g., univariate ANOVA, multivariate ANOVA)
  – Review of important parameters in MLM
• How can we do it in SPSS?
• Regression model, e.g.:
  DV: test scores in a first-year graduate-level statistics course (Stat)
  IV: GRE_M (GRE math test score)
  150 students (i = 1, …, 150)
  $Stat_i = \beta_0 + \beta_1\,GRE\_M_i + e_i$
• One of the important assumptions of OLS regression: observations are independent of each other.
Ignoring the clustered structure (i.e., the dependency between observations) in the analyses can result in:
• Biased standard errors
• Biased significance tests and confidence intervals (inflated Type I error rate, e.g., nominal α = .05 but actual α = .10)
• Non-replicable results
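The following minimal Python sketch makes the inflated-α point concrete. It assumes the textbook design-effect approximation for a cluster-level predictor with equal cluster sizes, deff = 1 + (m − 1)·ICC. Under that approximation the naive standard error is too small by a factor of √deff. The function names, the cluster size of 4 and the ICC values are illustrative assumptions, not values estimated in this study.

```python
import math


def design_effect(cluster_size: int, icc: float) -> float:
    """Kish design effect for equal-sized clusters (an assumed approximation)."""
    return 1.0 + (cluster_size - 1) * icc


def actual_alpha(deff: float, z_crit: float = 1.959964) -> float:
    """Actual two-sided Type I error rate when the SE ignores clustering.

    The naive test rejects when |est / SE_naive| > z_crit, and
    SE_naive = SE_true / sqrt(deff), so the correctly scaled statistic
    only needs to exceed z_crit / sqrt(deff).
    """
    z_eff = z_crit / math.sqrt(deff)
    # Two-sided normal tail probability: P(|Z| > z_eff) = erfc(z_eff / sqrt(2))
    return math.erfc(z_eff / math.sqrt(2.0))


if __name__ == "__main__":
    for icc in (0.0, 0.05, 0.2):
        d = design_effect(cluster_size=4, icc=icc)
        print(f"ICC={icc:.2f}  deff={d:.2f}  actual alpha={actual_alpha(d):.3f}")
```

With no clustering (ICC = 0) the actual α equals the nominal .05. Even modest within-cluster correlation pushes it well above .05.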
Advantages of MLM over the traditional methods for analyzing longitudinal data
• Univariate ANOVA: restricts the error structure to compound symmetry (CS); higher statistical power, but the CS assumption is unlikely to be met in longitudinal data
• Multivariate ANOVA: places no restriction on the error structure (unstructured, UN); often too conservative (lower statistical power); can only handle completely balanced data (listwise deletion)
• More…
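The contrast between CS and UN comes down to how many covariance parameters each structure estimates. The Python sketch below builds a CS covariance matrix and counts parameters for T = 4 occasions, matching the four achievement scores in the example. The variance values passed in are hypothetical.

```python
def compound_symmetry(n_times: int, var_between: float, var_within: float) -> list[list[float]]:
    """Covariance matrix of n_times repeated measures under compound symmetry (CS).

    Every variance equals var_between + var_within and every covariance equals
    var_between, so all pairs of time points are equally correlated.
    """
    return [
        [var_between + (var_within if r == c else 0.0) for c in range(n_times)]
        for r in range(n_times)
    ]


def n_params_cs() -> int:
    # CS needs only two parameters, regardless of the number of time points.
    return 2


def n_params_unstructured(n_times: int) -> int:
    # UN estimates every distinct variance and covariance: T(T + 1) / 2.
    return n_times * (n_times + 1) // 2


if __name__ == "__main__":
    T = 4  # four achievement measurements per student
    for row in compound_symmetry(T, var_between=2.0, var_within=1.0):  # hypothetical values
        print(row)
    print("CS parameters:", n_params_cs())
    print("UN parameters:", n_params_unstructured(T))
```

The UN count grows quadratically with the number of occasions. Estimating all those extra parameters is one plausible reason the unrestricted MANOVA approach tends to lose power.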
Analyzing Longitudinal Data
• Example (based on actual data; variable names changed for ease of presentation): compare two different teaching methods on achievement over time
• Teaching methods: 78 students are randomly assigned to either:
  A. Lecture (control group; 39 students), or
  B. Computer (treatment group; 39 students)
• Four achievement (Ach) scores (immediately after the course, and 1, 2, and 3 years after) were collected from each student after the treatment (i.e., the statistics course)
[Figure: achievement over time (Time in years, 0 to 3) for the Computer and Lecture groups; Time = 0 is the immediate posttest measure]
Multi-Level Model (MLM)
• Note: start with a simple growth model.
A simple regression model for ONE student (student 36):
$Ach_t = \beta_0 + \beta_1\,Time_t + e_t, \quad t = 0, 1, 2, 3$
[Figure: student 36's achievement scores plotted against Time, with the fitted line (intercept β0, slope β1) and residuals e0 to e3]
• $e_t$ captures the variation of the individual achievement scores around the fitted regression line WITHIN student 36: $V(e_{ti}) = \sigma^2$
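The single-student model is an ordinary least-squares line through four points. The Python sketch below fits β0, β1 and the residuals e0 to e3 for one student. The four scores are hypothetical, not student 36's actual data.

```python
def fit_line(times: list[float], scores: list[float]) -> tuple[float, float, list[float]]:
    """Ordinary least-squares intercept, slope, and residuals for one student."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(scores) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, scores))
    sxx = sum((t - mean_t) ** 2 for t in times)
    b1 = sxy / sxx                 # slope (beta_1)
    b0 = mean_y - b1 * mean_t      # intercept (beta_0): predicted score at Time = 0
    residuals = [y - (b0 + b1 * t) for t, y in zip(times, scores)]  # e_0 ... e_3
    return b0, b1, residuals


if __name__ == "__main__":
    times = [0, 1, 2, 3]
    scores = [40.0, 44.0, 45.0, 49.0]  # hypothetical scores for one student
    b0, b1, e = fit_line(times, scores)
    print(f"beta0={b0:.2f} beta1={b1:.2f} residuals={[round(r, 2) for r in e]}")
```

The spread of these residuals is what σ² summarizes once the same model is fitted within every student.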
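Compare this single-student model, $Ach_t = \beta_0 + \beta_1\,Time_t + e_t$, with the micro-level model for all students:
$Ach_{ti} = \beta_{0i} + \beta_{1i}\,Time_{ti} + e_{ti}, \quad i = 1, 2, 3, \ldots, 78$
[Figure: separately fitted lines for students 27, 36, and 52, each with its own intercept (β0i) and slope (β1i)]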
Student ID   β0i    β1i
12           13.5   1.25
15           10.5   2.75
23           12.6   .23
27           15.6   .28
28           22.3   1.64
33           36.4   3.27
37           25.2   1.22
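The table lists only seven of the 78 students, so the Python sketch below is purely descriptive. It computes the mean and sample variance of the listed β0i and β1i as rough analogues of the grand intercept and slope (γ00, γ10) and of τ00 and τ11. This "fit each student, then summarize" approach is an illustrative assumption, not how MLM estimates these quantities. MLM estimates them jointly and separates true between-student variance from within-student sampling error, so its numbers will differ.

```python
import statistics

# (student ID, beta_0i, beta_1i) for the seven students listed in the table above
rows = [
    (12, 13.5, 1.25),
    (15, 10.5, 2.75),
    (23, 12.6, 0.23),
    (27, 15.6, 0.28),
    (28, 22.3, 1.64),
    (33, 36.4, 3.27),
    (37, 25.2, 1.22),
]

intercepts = [b0 for _, b0, _ in rows]
slopes = [b1 for _, _, b1 in rows]

# Descriptive analogues of gamma_00 / gamma_10 (means) and tau_00 / tau_11 (variances)
print("mean intercept:", round(statistics.mean(intercepts), 2))
print("mean slope:", round(statistics.mean(slopes), 2))
print("variance of intercepts:", round(statistics.variance(intercepts), 2))
print("variance of slopes:", round(statistics.variance(slopes), 2))
```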
[Figure: fitted lines for students 27, 36, and 52 together with the overall line]
• γ00: grand intercept
• γ10: grand slope
• τ00: variance of the intercepts
• τ11: variance of the slopes
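Overall model
[Figure: overall line and the lines for students 27, 36, and 52 when there is no variation among the 78 intercepts; every line starts at γ00]
No variation among the 78 intercepts: $G = \begin{bmatrix} 0 & 0 \\ 0 & \tau_{11} \end{bmatrix}$; compare with $G = \begin{bmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{bmatrix}$
• τ00 captures the deviations of the 78 intercepts from the grand intercept γ00
• τ11 captures the deviations of the 78 slopes from the grand slope γ10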
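Overall model
[Figure: overall line and the lines for students 27, 36, and 52 when there is no variation among the 78 slopes; every line has slope γ10]
No variation among the 78 slopes: $G = \begin{bmatrix} \tau_{00} & 0 \\ 0 & 0 \end{bmatrix}$; compare with $G = \begin{bmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{bmatrix}$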
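Overall model
[Figure: overall line with both intercepts and slopes varying across students]
$G = \begin{bmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{bmatrix}$ versus $G = \begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{bmatrix}$, where $\tau_{01} = \tau_{10}$ is the covariance between the intercepts and the slopes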
Summary
• G: captures between-student differences
• R: captures within-student random errors, with $V(e_{ti}) = \sigma^2$
$G = \begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{bmatrix}$
• γ00: grand intercept; γ10: grand slope
• τ00: variance of the intercepts; τ11: variance of the slopes
• τ01 (= τ10): covariance between intercepts and slopes
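G and R together determine the covariance of each student's repeated measures. The minimal Python sketch below computes the implied marginal covariance V = Z G Zᵀ + σ²I. It assumes independent within-student errors (R = σ²I), a linear time coding of 0 to 3, and the unstructured-G estimates reported in the SPSS output for this example. Unlike compound symmetry, the implied variances change over time whenever τ11 > 0. This is one concrete sense in which MLM can "flexibly model the variance function."

```python
def implied_covariance(times, tau00, tau01, tau11, sigma2):
    """Marginal covariance of one student's repeated measures: V = Z G Z' + sigma2 * I.

    Z has one row [1, t] per time point; G = [[tau00, tau01], [tau01, tau11]].
    """
    n = len(times)
    v = [[0.0] * n for _ in range(n)]
    for r, s in enumerate(times):
        for c, t in enumerate(times):
            # [1, s] G [1, t]' = tau00 + (s + t) * tau01 + s * t * tau11
            v[r][c] = tau00 + (s + t) * tau01 + s * t * tau11
            if r == c:
                v[r][c] += sigma2  # within-student error variance (R = sigma2 * I)
    return v


if __name__ == "__main__":
    # Estimates from the "Estimates of Covariance Parameters" table (unstructured G)
    V = implied_covariance([0, 1, 2, 3], tau00=201.9517, tau01=-0.1513755,
                           tau11=14.5896, sigma2=87.75982)
    for row in V:
        print([round(x, 1) for x in row])
```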
MACRO vs. MICRO
• UNITS:
            Educational study   Family study    Longitudinal study
  MACRO     School/Class        Family          Individual
  MICRO     Student             Family member   Repeated observations
MACRO vs. MICRO (Cont.)
• MODELS:
  – MICRO-level model: a regression model fitted to the observations within each MACRO unit
  – MACRO-level model: captures the differences between the overall model and the individual regression models of the different MACRO units
• Dependent variable: math achievement (Achieve; repeated measures, MICRO level)
• Predictors:
  – Repeated-measure (MICRO) level predictor: Time (and any time-varying covariates)
  – Student (MACRO) level predictor: Computer (teaching method), and any time-invariant variables such as gender
Data format under MANOVA approaches (SPSS data format):
Student  Treat  T0  T1  T2  T3
S1       0      5   3   2   3
S2       1      5   25  --  33
S3       1      --  19  17  26
• S1 has responses at all time points
• S2 has a missing response at time 2 (indicated by "--")
• S3 has a missing response at time 0
• MANOVA retains only S1 in the analysis (listwise deletion)
Data format for MANOVA: the wide layout above (one row per student).
Data format for multilevel model (one row per observed time point):
Student  Treat  Time  DV
S1       0      0     5
S1       0      1     3
S1       0      2     2
S1       0      3     3
S2       1      0     5
S2       1      1     25
S2       1      3     33
S3       1      1     19
S3       1      2     17
S3       1      3     26
(All 3 students are included in the analyses.)
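The restructuring above can be sketched in Python. Each observed occasion becomes one row, so a missing score drops a single row instead of the whole student. The data are the three example students, with None standing in for "--". In SPSS itself this kind of wide-to-long reshape is typically done with the VARSTOCASES command. The Python function names here are hypothetical.

```python
# Wide (MANOVA) layout: one row per student; None marks a missing score ("--").
wide = [
    {"student": "S1", "treat": 0, "scores": [5, 3, 2, 3]},
    {"student": "S2", "treat": 1, "scores": [5, 25, None, 33]},
    {"student": "S3", "treat": 1, "scores": [None, 19, 17, 26]},
]


def to_long(rows):
    """Reshape to long (multilevel) format: one (student, treat, time, dv) row per observed occasion."""
    long_rows = []
    for row in rows:
        for time, dv in enumerate(row["scores"]):
            if dv is not None:  # a missing occasion drops one row, not the whole student
                long_rows.append((row["student"], row["treat"], time, dv))
    return long_rows


def listwise_complete(rows):
    """Students that MANOVA would keep: those with no missing scores."""
    return [row["student"] for row in rows if None not in row["scores"]]


if __name__ == "__main__":
    for rec in to_long(wide):
        print(*rec)
    print("Kept by listwise deletion:", listwise_complete(wide))
```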
Multilevel (long) format with a different set of measurement times for each student:
Student  Treat  Time  DV
S1       0      0     5
S1       0      7     3
S1       0      12    2
S1       0      13    3
S2       1      1     5
S2       1      3     9
S2       1      4     5
S2       1      6     25
S3       1      3     18
S3       1      15    19
S3       1      28    17
S3       1      31    26
Can you transform this dataset back into multivariate format?
Questions
• 1. On average, is there any trend in math achievement over time?
• 2. Are there any differences between students in the trend of math achievement over time? (Do all students have the same trend of math achievement over time?)
Micro level (Level 1): $Ach_{ti} = \beta_{0i} + \beta_{1i}\,Time_{ti} + e_{ti}$
Macro level (Level 2): $\beta_{0i} = \gamma_{00} + U_{0i}$, $\beta_{1i} = \gamma_{10} + U_{1i}$, with $Var(U_{0i}) = \tau_{00}$ and $Var(U_{1i}) = \tau_{11}$
γ00: grand intercept; γ10: grand slope
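Micro level: $Mathach_{ti} = \beta_{0i} + \beta_{1i}\,Time_{ti} + e_{ti}$
Macro level: $\beta_{0i} = \gamma_{00} + U_{0i}$, $\beta_{1i} = \gamma_{10} + U_{1i}$
Combined model:
$Mathach_{ti} = \gamma_{00} + \gamma_{10}\,Time_{ti} + U_{0i} + U_{1i}\,Time_{ti} + e_{ti}$
• $\gamma_{00} + \gamma_{10}\,Time_{ti}$: grand intercept and grand slope
• $U_{0i} + U_{1i}\,Time_{ti}$: between-student differences
• $e_{ti}$: within-student errors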
[Figure: individual achievement (ACH) trajectories over TIME for selected students (SUBID); red: Computer, blue: Lecture]
$MA_{ti} = \gamma_{00} + \gamma_{10}\,Time_{ti} + U_{0i} + U_{1i}\,Time_{ti} + e_{ti}$
SPSS MIXED syntax:
MIXED mathach WITH Time
  /METHOD = REML
  /FIXED = intercept Time
  /RANDOM = intercept Time | SUBJECT(Subid) COVTYPE(UN)
  /PRINT = G SOLUTION TESTCOV.
EXECUTE.
• MIXED line: the DV comes first; continuous IVs follow the keyword WITH, and categorical IVs follow BY
• /METHOD: REML (restricted maximum likelihood) is the default; the other option is ML (maximum likelihood)
• /FIXED: captures the overall model
• /RANDOM: specifies the random effects, which capture the between-student differences; SUBJECT(Subid) names the identity variable for the macro-level units; COVTYPE(UN) sets the structure of the G matrix (unstructured)
• /PRINT: G prints the G matrix; SOLUTION requests the regression coefficients; TESTCOV produces asymptotic standard errors and Wald Z-tests for the covariance parameter estimates
SPSS Output
Basic Information
Model Dimension(b)
                                      Number of  Covariance    Number of   Subject
                                      Levels     Structure     Parameters  Variables
Fixed Effects    Intercept            1                        1
                 time                 1                        1
Random Effects   Intercept + time(a)  2          Unstructured  3           subid
Residual                                                       1
Total                                 4                        6
a. As of version 11.5, the syntax rules for the RANDOM subcommand have changed. Your command syntax may yield results that differ from those produced by prior versions. If you are using SPSS 11 syntax, please consult the current syntax reference guide for more information.
b. Dependent Variable: Achieve.
Information Criteria(a)
-2 Restricted Log Likelihood          2509.873
Akaike's Information Criterion (AIC)  2517.873
Hurvich and Tsai's Criterion (AICC)   2518.004
Bozdogan's Criterion (CAIC)           2536.819
Schwarz's Bayesian Criterion (BIC)    2532.819
The information criteria are displayed in smaller-is-better forms.
a. Dependent Variable: Achieve.
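All four criteria in this table can be reproduced from −2RLL alone. The Python sketch below rests on several assumptions:
• 312 observations (78 students × 4 occasions)
• 2 fixed effects (intercept, time)
• 4 covariance parameters (τ00, τ01, τ11, σ²)
• under REML, an effective sample size of n = 312 − 2 = 310
With these assumptions the standard penalty formulas match the displayed values to three decimals.

```python
import math


def reml_information_criteria(neg2_rll: float, n_obs: int, n_fixed: int, n_cov_params: int) -> dict:
    """Smaller-is-better information criteria for a REML fit.

    Assumes (consistent with the output above) that the REML effective sample
    size is n_obs - n_fixed and that only covariance parameters are penalized.
    """
    n = n_obs - n_fixed
    d = n_cov_params
    return {
        "AIC": neg2_rll + 2 * d,
        "AICC": neg2_rll + 2 * d * n / (n - d - 1),
        "CAIC": neg2_rll + d * (math.log(n) + 1),
        "BIC": neg2_rll + d * math.log(n),
    }


if __name__ == "__main__":
    # 78 students x 4 occasions; fixed: intercept, time; covariance: tau00, tau01, tau11, sigma2
    ic = reml_information_criteria(2509.873, n_obs=312, n_fixed=2, n_cov_params=4)
    for name, value in ic.items():
        print(f"{name}: {value:.3f}")
```

Because the REML likelihood depends on the fixed-effects specification, these criteria are only comparable across models that share the same fixed effects.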
Type III Tests of Fixed Effects(a)
Source      Numerator df   Denominator df   F         Sig.
Intercept   1              77               871.772   .000
time        1              77               13.701    .000
a. Dependent Variable: Achieve.

Estimates of Fixed Effects(a)
                                                                95% Confidence Interval
Parameter        Estimate    Std. Error  df   t       Sig.   Lower Bound   Upper Bound
Intercept (γ00)  54.25609    1.8375833   77   29.526  .000   50.5969939    57.9151856
time (γ10)       2.3760897   .6419278    77   3.701   .000   1.0978482     3.6543313
a. Dependent Variable: Achieve.
Requested by the SOLUTION keyword in the /PRINT subcommand (line 5 of the syntax).
• γ00: average MA score at Time = 0
• γ10: average trend of the MA score (change per year)
Estimates of Covariance Parameters(a)
                                                                           95% Confidence Interval
Parameter                             Estimate   Std. Error  Wald Z  Sig.  Lower Bound  Upper Bound
Residual (σ²)                         87.75982   9.9368430   8.832   .000  70.2936565   109.5658788
Intercept + time   UN (1,1) (τ00)     201.9517   43.01424    4.695   .000  133.0294456  306.5824032
[subject = subid]  UN (2,1) (τ10)     -.1513755  11.31083    -.013   .989  -22.3201972  22.0174463
                   UN (2,2) (τ11)     14.58960   5.5482320   2.630   .009  6.9237677    30.7428445
a. Dependent Variable: Achieve.
Asymptotic standard errors and Wald Z-tests are requested by the TESTCOV keyword in the /PRINT subcommand (line 5).

Random Effect Covariance Structure (G)(a)
                   Intercept | subid   time | subid
Intercept | subid  201.9517 (τ00)      -.1513755 (τ01)
time | subid       -.1513755 (τ10)     14.5895961 (τ11)
Unstructured
a. Dependent Variable: Achieve.
Requested by the G keyword in the /PRINT subcommand (line 5).
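The Wald Z column is simply Estimate / Std. Error, referred to a standard normal. The Python sketch below reproduces it from the table's values. Wald tests for variances are widely regarded as rough, because variances are bounded at zero and their sampling distributions are skewed. The likelihood ratio comparisons in the following slides are generally the preferred check.

```python
import math


def wald_z(estimate: float, std_error: float) -> tuple[float, float]:
    """Wald Z statistic and two-sided normal p-value for a covariance parameter."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| > |z|) under N(0, 1)
    return z, p


if __name__ == "__main__":
    # (estimate, std. error) pairs from the "Estimates of Covariance Parameters" table
    params = {
        "Residual (sigma2)": (87.75982, 9.9368430),
        "UN(1,1) (tau00)": (201.9517, 43.01424),
        "UN(2,1) (tau01)": (-0.1513755, 11.31083),
        "UN(2,2) (tau11)": (14.58960, 5.5482320),
    }
    for name, (est, se) in params.items():
        z, p = wald_z(est, se)
        print(f"{name}: Z = {z:.3f}, p = {p:.3f}")
```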
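• Compare: can I have a simpler G matrix (i.e., τ01 = τ10 = 0)?
$G = \begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{bmatrix}$ (−2LL: 2509.873) with $G = \begin{bmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{bmatrix}$ (−2LL: ?)
• Likelihood ratio test!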
Syntax for fitting the simpler G
SPSS syntax:
/RANDOM = intercept Time | SUBJECT(Subid) COVTYPE(DIAG)
which gives $G = \begin{bmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{bmatrix}$
Model with τ01 = τ10 = 0: −2 Res Log Likelihood (or deviance) 2509.873
Model with τ01 = τ10 ≠ 0: −2 Res Log Likelihood (or deviance) 2509.873
χ²(1) = .00018, p = .99 (the two deviances agree to the three decimals shown) → choose the simpler model (τ01 = τ10 = 0)
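Compare to the model with τ11 = 0
SPSS syntax:
/RANDOM = intercept | SUBJECT(Subid) COVTYPE(DIAG)
which gives $G = \begin{bmatrix} \tau_{00} & 0 \\ 0 & 0 \end{bmatrix}$
Model with τ01 = τ10 = 0, τ11 ≠ 0: −2 Res Log Likelihood 2509.873
Model with τ11 = τ01 = τ10 = 0: −2 Res Log Likelihood 2524.387
χ²(1) = 2524.387 − 2509.873 = 14.51, p < .001 (halved p-value) → choose the model with τ11 ≠ 0, i.e., $G = \begin{bmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{bmatrix}$
The p-value is halved because the null value τ11 = 0 lies on the boundary of the parameter space (a variance cannot be negative), so the reference distribution is a 50:50 mixture of χ²(0) and χ²(1).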
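Both likelihood ratio tests can be checked with the deviances on the slides. Comparing REML deviances is appropriate here because the compared models share the same fixed effects. The Python sketch below uses the identity that a χ²(1) upper tail equals erfc(√(x/2)). It halves the p-value only for the boundary case (a variance tested against 0), not for the covariance τ01, which can take either sign.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def lr_test(neg2ll_reduced: float, neg2ll_full: float, boundary: bool = False) -> tuple[float, float]:
    """Likelihood ratio (deviance difference) test for one constrained parameter.

    boundary=True halves the p-value, as when testing a variance such as tau11 = 0
    on the edge of its parameter space (50:50 mixture of chi2(0) and chi2(1)).
    """
    stat = max(neg2ll_reduced - neg2ll_full, 0.0)  # guard against tiny negative differences
    p = chi2_sf_df1(stat)
    return stat, (p / 2.0 if boundary else p)


if __name__ == "__main__":
    # Random slope variance: tau11 = 0 (2524.387) vs. tau11 free (2509.873)
    stat, p = lr_test(2524.387, 2509.873, boundary=True)
    print(f"chi2(1) = {stat:.2f}, halved p = {p:.6f}")
    # Intercept-slope covariance: the slides report chi2(1) = .00018
    print(f"p for chi2(1) = .00018: {chi2_sf_df1(0.00018):.2f}")
```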
Result of the Final Model
Estimates of Covariance Parameters(a)
                                                                          95% Confidence Interval
Parameter                               Estimate    Std. Error  Wald Z  Sig.  Lower Bound  Upper Bound
Residual (σ²)                           87.794973   9.591118    9.154   .000  70.872958    108.757380
Intercept + time   Var: Intercept (τ00) 201.7136    39.133631   5.154   .000  137.910425   295.034860
[subject = subid]  Var: time (τ11)      14.556515   4.964819    2.932   .003  7.459959     28.403928
a. Dependent Variable: Achieve.

Random Effect Covariance Structure (G)(a)
                   Intercept | subid   time | subid
Intercept | subid  201.7136 (τ00)      0
time | subid       0                   14.556515 (τ11)
Diagonal
a. Dependent Variable: Achieve.

Estimates of Fixed Effects(a)
                                                                95% Confidence Interval
Parameter        Estimate   Std. Error  df      t       Sig.  Lower Bound  Upper Bound
Intercept (γ00)  54.256090  1.836838    89.672  29.538  .000  50.606708    57.905472
time (γ10)       2.376090   .641668     89.672  3.703   .000  1.101242     3.650938
a. Dependent Variable: Achieve.
• 1. On average, is there any trend in math achievement over time?
  $\widehat{MathAch}_{ti} = 54.26 + 2.38\,Time_{ti}$
• 2. Are there any differences between students in the trend of math achievement over time? (Or, do all students have the same trend of math achievement over time?)
  τ00 = 201.71, τ11 = 14.56
• 3. If yes to Q2, what causes the differences?
• Micro level (Level 1):
  $MA_{ti} = \beta_{0i} + \beta_{1i}\,Time_{ti} + e_{ti}$ (variance of $e_{ti}$ = σ²)
• Macro level (Level 2):
  $\beta_{0i} = \gamma_{00} + \gamma_{01}\,Comp_i + U_{0i}$
  $\beta_{1i} = \gamma_{10} + \gamma_{11}\,Comp_i + U_{1i}$
  (variance of $U_{0i}$ = τ00; variance of $U_{1i}$ = τ11)
• Combined model:
  $MA_{ti} = \gamma_{00} + \gamma_{01}\,Comp_i + \gamma_{10}\,Time_{ti} + \gamma_{11}\,Time_{ti} \times Comp_i + U_{0i} + U_{1i}\,Time_{ti} + e_{ti}$
Null hypothesis: the different teaching methods have the SAME effect on achievement over time (H0: γ11 = 0)
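Substituting the macro-level equations into the micro-level equation shows where each term of the combined model comes from. It also shows why the cross-level interaction Time × Comp carries the treatment-by-time effect tested by H0: γ11 = 0.

```latex
\begin{aligned}
MA_{ti} &= \beta_{0i} + \beta_{1i}\,Time_{ti} + e_{ti} \\
        &= (\gamma_{00} + \gamma_{01}\,Comp_i + U_{0i})
           + (\gamma_{10} + \gamma_{11}\,Comp_i + U_{1i})\,Time_{ti} + e_{ti} \\
        &= \underbrace{\gamma_{00} + \gamma_{01}\,Comp_i + \gamma_{10}\,Time_{ti}
           + \gamma_{11}\,Time_{ti} \times Comp_i}_{\text{fixed part}}
         + \underbrace{U_{0i} + U_{1i}\,Time_{ti} + e_{ti}}_{\text{random part}}
\end{aligned}
```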
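$MA_{ti} = \gamma_{00} + \gamma_{01}\,Comp_i + \gamma_{10}\,Time_{ti} + \gamma_{11}\,Time_{ti} \times Comp_i + U_{0i} + U_{1i}\,Time_{ti} + e_{ti}$
• SPSS MIXED syntax (Comp must also appear on the MIXED line so that it can be used in /FIXED):
MIXED mathach WITH Time Comp
  /METHOD = REML
  /FIXED = intercept Comp Time Time*Comp
  /RANDOM = intercept Time | SUBJECT(Subid) COVTYPE(DIAG)
  /PRINT = G SOLUTION TESTCOV.
EXECUTE.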
Random Effect Covariance Structure (G)(a), WITHOUT "Comp" in the macro models (Diagonal):
                   Intercept | subid   time | subid
Intercept | subid  201.7136            0
time | subid       0                   14.556515

Random Effect Covariance Structure (G)(a), WITH "Comp" in the macro models (Diagonal):
                   Intercept | subid   time | subid
Intercept | subid  176.1636            0
time | subid       0                   9.813461
a. Dependent Variable: Achieve.

Proportion of variance in the intercepts (τ00) explained by "Comp" = (201.71 − 176.16)/201.71 = .13 (or 13%)
Proportion of variance in the slopes (τ11) explained by "Comp" = (14.56 − 9.81)/14.56 = .33 (or 33%)
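The two proportions above are the same calculation applied to each diagonal element of G. The Python sketch below reproduces them from the unrounded estimates. One caveat: this proportional-reduction measure is not guaranteed to be positive. A variance component can occasionally grow when a predictor is added.

```python
def proportion_explained(tau_without: float, tau_with: float) -> float:
    """Proportional reduction in a random-effect variance after adding a macro-level predictor."""
    return (tau_without - tau_with) / tau_without


if __name__ == "__main__":
    # G-matrix diagonals without and with Comp in the macro models
    print(f"intercept (tau00): {proportion_explained(201.7136, 176.1636):.2f}")
    print(f"slope (tau11):     {proportion_explained(14.556515, 9.813461):.2f}")
```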
Solution for Fixed Effects
                          Standard
Effect       Estimate     Error      DF    t Value   Pr > |t|
Intercept    50.3769      2.4764     76    20.34     <.0001
time         0.5756       0.8445     232   0.68      0.4962
computer     7.7583       3.5021     76    2.22      0.0297
time*comp    3.6009       1.1943     232   3.02      0.0029

$\widehat{Ach}_{ti} = 50.38 + 7.76\,Comp_i + .58\,Time_{ti} + 3.60\,Time_{ti} \times Comp_i$
Overall model for students in the Lecture method group (Comp = 0): $\widehat{MathAch}_{ti} = 50.38 + .58\,Time_{ti}$
Overall model for students in the Computer method group (Comp = 1): $\widehat{MathAch}_{ti} = 58.14 + 4.18\,Time_{ti}$
Random effects: $G = \begin{bmatrix} 176.16 & 0 \\ 0 & 9.81 \end{bmatrix}$, $V(e_{ti}) = \sigma^2 = 90.00$
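The two group-specific lines follow from setting Comp to 0 or 1 in the fitted combined model. The Python sketch below makes that explicit with the estimates from the "Solution for Fixed Effects" table. It sets the random effects to zero, so it gives the average trajectory of each group rather than any individual student's line.

```python
# Fixed-effect estimates from the "Solution for Fixed Effects" table
GAMMA_00 = 50.3769  # intercept: Lecture group at Time = 0
GAMMA_01 = 7.7583   # computer: intercept difference, Computer vs. Lecture
GAMMA_10 = 0.5756   # time: yearly slope in the Lecture group
GAMMA_11 = 3.6009   # time*comp: additional yearly slope in the Computer group


def predicted_ach(time: float, comp: int) -> float:
    """Model-implied average achievement (random effects set to 0)."""
    return GAMMA_00 + GAMMA_01 * comp + GAMMA_10 * time + GAMMA_11 * time * comp


if __name__ == "__main__":
    for comp, label in ((0, "Lecture"), (1, "Computer")):
        intercept = predicted_ach(0, comp)
        slope = predicted_ach(1, comp) - intercept
        preds = [round(predicted_ach(t, comp), 2) for t in range(4)]
        print(f"{label}: {intercept:.2f} + {slope:.2f} * Time -> {preds}")
```

The growing gap between the two lists of predictions is the γ11 interaction at work: 3.60 additional points per year for the Computer group.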
[Figure: achievement over time (Time in years) for the Computer and Lecture groups; Time = 0 is the immediate posttest measure]
Conclusion
• Advantages of using MLM over traditional ANOVA approaches for analyzing longitudinal data:
  – 1. Can flexibly model the variance function
  – 2. Retains the meaning of the random effects
  – 3. Can explore factors that predict individual differences in change over time (e.g., treatment effect)
  – 4. Takes both unequal spacing and missing data into account
Take-Home Exercise
A clinical psychologist wants to examine the impact of each family member's stress level (STRESS) on his/her level of symptomatology (SYMPTOM). There are 100 families, which vary in size from three to eight members; the total number of participants is 400.
a) Can you write out the model? (Hint: What is in the micro model? What is in the macro model?)
b) Can you write out the SPSS syntax to analyze this model?
c) In designing the study, what possible macro predictors do you think the clinical psychologist should include in her study? (e.g., family size?)
d) In designing the study, what possible micro predictors do you think the clinical psychologist should include in her study? (e.g., the participant's neuroticism?)
e) Can you write out the model? (Hint: What is in the micro model? What is in the macro model?)
f) Can you write out the SPSS syntax to analyze this model?
a) (i indexes family members; j indexes families)
   Micro-level model:
   $SYMPTOM_{ij} = \beta_{0j} + \beta_{1j}\,STRESS_{ij} + e_{ij}$
   Macro-level model:
   $\beta_{0j} = \gamma_{00} + U_{0j}$
   $\beta_{1j} = \gamma_{10} + U_{1j}$
   Combined model:
   $SYMPTOM_{ij} = \gamma_{00} + \gamma_{10}\,STRESS_{ij} + U_{0j} + U_{1j}\,STRESS_{ij} + e_{ij}$
b) SPSS syntax:
MIXED Symptom WITH Stress
  /FIXED = intercept Stress
  /RANDOM = intercept Stress | SUBJECT(Family) COVTYPE(UN)
  /PRINT = G SOLUTION TESTCOV.
EXECUTE.
THE END!
THANK YOU!