a brief introduction to multilevel models...2015/02/27 · a brief introduction to multilevel...

A BRIEF INTRODUCTION TO MULTILEVEL MODELS

Leslie Rutkowski, PhD
Assistant Professor of Inquiry Methodology
Counseling & Educational Psychology
School of Education, Indiana University
WIM Seminar, February 2015

Multilevel data
• Often participants of studies are nested within specific contexts
  • Patients treated in hospitals
  • Firms operate within countries
  • Families live in neighborhoods
  • Students learn in classes within schools
• Data stemming from such research designs have a multilevel or hierarchical structure.


Terminology
• HLM
• Multilevel modeling (MLM)
• Random effects models
• Variance components
• Mixed effects modeling


More terminology
• Macro
  • Macro-level units
  • Macro units
  • Primary units
  • Clusters
  • Level 2 units
• Micro
  • Micro-level units
  • Micro units
  • Secondary units
  • Elementary units
  • Level 1 units

HLM: Simple model
• One L1 predictor with a random intercept & a random slope:
  • HLM form:
  • Linear mixed model (LMM) form:
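In the notation this deck uses for the random parts (U0j and Rij), a model with one level-1 predictor, a random intercept, and a random slope is conventionally written as below. The β and γ symbols are assumed from that convention rather than quoted from the slide.

```latex
% HLM form: separate level-1 and level-2 equations
\[
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + R_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + U_{0j} \\
                     & \beta_{1j} = \gamma_{10} + U_{1j}
\end{aligned}
\]

% LMM form: substitute the level-2 equations into level 1
\[
Y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + U_{0j} + U_{1j} x_{ij} + R_{ij}
\]
```

The two forms describe the same model; the LMM form simply collects the fixed part (the γ terms) and the random part (the U and R terms) in one equation.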


Notation, notation, notation

Notation, notation, notation, II


Group Level Similarities • If we use traditional linear regression, we assume:


Group Level Similarities • But more often we have:


Group Level Similarities
• Students within a school are somewhat similar
• Students between schools are different
• Why? (absolutely not comprehensive)
  • Teacher factors
    • Pedagogical approaches
    • Training
  • School factors
    • Public vs. Private
    • Safety
  • Community factors
    • Parental involvement
    • Average SES

Implications?
• It is often (but not always) important to take into account the group level dependencies in analyses.
• Why?
  • Most (traditional) assumptions are violated.
  • We might miss some very important group effects.
  • The level of group dependency is important on its own.
• When do we care about group-level dependencies?

Single vs. multilevel regression

How different are the relationships?
School 452:   MATH = 448.28 + 38.57*books
School 517:   MATH = 518.99 + 4.78*books
School 529:   MATH = 487.81 + 10.09*books
School 548:   MATH = 461.95 + 16.55*books
School 577:   MATH = 393.50 - 7.82*books
School 604:   MATH = 493.21 + 17.81*books
School 619:   MATH = 498.76 - 1.93*books
School 622:   MATH = 465.68 + 5.35*books
School 677:   MATH = 506.50 + 1.24*books
School 1479:  MATH = 480.86 + 29.54*books
School 1673:  MATH = 605.29 + 11.16*books

Ignoring data structure

• We can easily have such a situation:

[Figure: Y (axis 0 to 18) plotted against X (axis 0 to 25) with four fitted lines: y = 0.5x + 6, y = 0.5x + 1, y = 0.5x + 8, and y = -1x + 21.619.]

Within vs. Between

Null/Empty/Baseline Model
• Useful first step in the model building / hypothesis testing process.
• With the empty model we learn whether there are between-group differences.
  • Yes? A multilevel approach is warranted. (between-group variance > 0)
  • No? A standard, plain vanilla regression is sufficient. (between-group variance = 0)

Simplest Multilevel Model: Null
• Null model, empty model, fully unconditional model:
• Where
• There are no predictors
• Linear mixed model:
• Yij is random because U0j & Rij are random
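With U0j and Rij as the random parts named above, the null model is conventionally written as follows. The symbols β0j and γ00 are assumed from the usual convention, not quoted from the slide.

```latex
% Null (empty) model, HLM form
\[
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + R_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + U_{0j}
\end{aligned}
\]

% Null model, linear mixed model form
\[
Y_{ij} = \gamma_{00} + U_{0j} + R_{ij}
\]
```

Here γ00 is the only fixed effect (the overall mean), so all variation in Yij comes from the two random terms.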

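Deconstructed
• Intercept is composed of two parts:
  • Overall (fixed) mean
  • Random group effect: U0j
    • This is the random difference for group j from the overall mean.
• Individual deviation / residual deviation: Rij
  • This is the random difference for student i from the mean of that student's group j.
• Variance components:
  • The variance of Yij is decomposed into two parts: the between-group variance of U0j and the within-group variance of Rij.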
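A compact statement of that decomposition, assuming U0j and Rij are independent with mean zero and using the conventional (assumed) symbols τ0² and σ² for their variances:

```latex
\[
\operatorname{var}(U_{0j}) = \tau_0^2, \qquad \operatorname{var}(R_{ij}) = \sigma^2
\]
\[
\operatorname{var}(Y_{ij}) = \operatorname{var}(U_{0j}) + \operatorname{var}(R_{ij}) = \tau_0^2 + \sigma^2
\]
```

The first term is the between-group part and the second is the within-group part.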

In words
• Groups (j) are regarded as a sample from a population of groups.
• We can say that the intercept coefficient depends on j.
• This is an important first step because it provides a basic partition of the variance.
• If the intercept does not depend on j, then it is usually best to take an OLS approach to analyzing the data.
  • Usually, but not always: it's possible that there are no intercept differences but there are slope differences. When could this happen?

What are we modeling?
• We know that we have a sample from a larger population of schools and students.
• We see that the means are quite different.
• Do we have sufficient evidence to support a multilevel approach?

[Figure: Math_Score (axis 300 to 800) by SCHOOL ID for schools 452, 517, 529, 548, 577, 604, 619, 622, 677, 1479, and 1673.]

How similar/different?: Intraclass correlation
• Measure of similarity between two randomly chosen level one units within a randomly chosen level two unit
• Proportion of variance in the outcome that is between groups
• Proportion of variance in outcome explained by group differences.
• Provides justification for a multilevel modeling approach
• Computed as the population between-group variance divided by the sum of the population between-group variance and the population within-group variance
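Written as a formula, with τ0² standing for the population between-group variance and σ² for the population within-group variance (symbol choices assumed from the usual convention):

```latex
\[
\rho_I = \frac{\tau_0^2}{\tau_0^2 + \sigma^2}
\]
```

The ratio is 0 when groups do not differ at all and approaches 1 when nearly all variation lies between groups.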

Null Model ICC
• Estimated parameters:

  Parameter                              Value
  Between-school (intercept) variance    3170.88
  Within-school (residual) variance      4680.16

• ICC from this example: 3170.88 / (3170.88 + 4680.16) = 0.4039
• 40% of the variance in mathematics scores can be attributed to between-school differences. The remainder lies within schools.

Intercepts as outcomes models
• These are also referred to as random intercepts models.
• A couple of possibilities:
  • Some level one predictors, no level two predictors;
  • Some level one predictors, some level two predictors;
  • Some level two predictors, no level one predictors.
• But what do these look like?

Add a level-1 predictor
• Add to our empty model a predictor for number of books in the home:
• Number of books in the home is an historic proxy for SES (goes back to FIMS - http://www.iea.nl/fims.html)
• An additional variable may help to explain variance in achievement; this is at the heart of what we are trying to do.

RI model with L1 Books
• Adding a predictor to the model gives us:
• As a linear mixed model:

Specifically:

Abstractly:

HLM or MLM

Based on the linear mixed model:
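A sketch of the random intercept model with books as the single level-1 predictor, in both forms. The β and γ symbols are assumed from the usual convention; U0j and Rij are the random parts already named in the deck.

```latex
% HLM form: the slope is fixed, only the intercept varies across schools
\[
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + R_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + U_{0j} \\
                     & \beta_{1j} = \gamma_{10}
\end{aligned}
\]

% Linear mixed model, abstractly
\[
Y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + U_{0j} + R_{ij}
\]

% Linear mixed model, specifically for this example
\[
\text{math}_{ij} = \gamma_{00} + \gamma_{10}\,\text{books}_{ij} + U_{0j} + R_{ij}
\]
```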

Estimated parameters
• We are estimating 4 parameters:
  • Intercept
  • Books effect
  • Between-groups intercept variance
  • Within-groups variance

Results
• Edited output from SAS:

                     Null Model            Add Books
                     Value      SE         Value      SE
  Fixed Effects
    Intercept        514.03     17.62      515.62     14.59
    Books            --         --         17.99      3.70
  Random Effects
    Intercept        3170.88               2115.35
    Residual         4680.16               4336.00

• And the residual ICC:
• Notice this is a bit smaller than the empty model. Why?
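The residual ICC can be worked out from the random-effects estimates reported for the model with books (intercept variance 2115.35, residual variance 4336.00). This is a recomputation from those reported values, using the same ratio as for the null model:

```latex
\[
\hat{\rho}_I = \frac{2115.35}{2115.35 + 4336.00} = \frac{2115.35}{6451.35} \approx 0.328
\]
```

For comparison, the null model value was 0.4039.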

Adding Level 2 Effects
• Including only the level 1 variable forces the between-group and within-group regressions for a particular effect to be equal.
• Does this seem reasonable? (1.0 = .25?)

[Figure: Y (axis 0 to 8) plotted against X (axis 0 to 10).]

And With Our Data

Adding Level 2 Effects
• If we add the group mean for books, the between-group and within-group relationships can differ.
• Hierarchical linear model (HLM):
• Linear mixed model (LMM):
• If the coefficient on the group mean is zero, then the between and within relationships are not different.
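One conventional way to write the model with the group mean added as a level-2 predictor of the intercept. The symbols γ01 (coefficient on the group mean) and x̄.j (mean of x in group j) are assumptions following the usual notation, and x is taken to enter uncentered.

```latex
% HLM form
\[
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + R_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} \bar{x}_{.j} + U_{0j} \\
                     & \beta_{1j} = \gamma_{10}
\end{aligned}
\]

% LMM form
\[
Y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + \gamma_{01} \bar{x}_{.j} + U_{0j} + R_{ij}
\]

% Within-group and between-group regression coefficients
\[
\text{within} = \gamma_{10}, \qquad \text{between} = \gamma_{10} + \gamma_{01}
\]
```

Under this parameterization the two coefficients coincide exactly when γ01 = 0.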

Estimated parameters
• Now we are estimating 5 parameters:
  • Intercept
  • Books effect
  • School average effect of books
  • Intercept variance
  • Within variance component

Models compared

                   Null Model            Add Books             Add Mbooks
                   Value      SE         Value      SE         Value      SE
  Fixed Effects
    Intercept      514.03     17.62      515.62     14.59      365.39     26.67
    books          --         --         17.99      3.70       16.30      3.75
    Mbooks                                                     76.13      12.89
  Random Effects
    Intercept      3170.88    1460.23    2115.35    1021.63    574.51     363.05
    Residual       4680.16    432.74     4336.00    402.16     4345.33    403.97

• What changed?
• And the ICC:
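The residual ICC for the model that adds Mbooks can be recomputed from its reported random-effects estimates (intercept variance 574.51, residual variance 4345.33). This is a worked calculation from those reported values, not a figure quoted from the slide:

```latex
\[
\hat{\rho}_I = \frac{574.51}{574.51 + 4345.33} = \frac{574.51}{4919.84} \approx 0.117
\]
```

Most of the drop comes from the intercept variance, which falls from 2115.35 to 574.51 once the school mean of books is in the model, while the residual variance barely moves.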

Random slopes

Forcing slopes to be equal across groups

Allowing slopes to be unequal across groups

How does the model look?

• With one micro-variable:

• Where

• And


What are all these parts?
First, what’s new?
1. A random component for the slope:
2. Variance components
   – Before:
   – Now, add slope variance:
   – And intercept/slope covariance:
How many more parameters are we estimating?
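A sketch of the new pieces in conventional notation. Only U0j and Rij are named explicitly in this deck; U1j and the τ symbols are assumed from the usual convention.

```latex
% 1. Random component for the slope
\[
\beta_{1j} = \gamma_{10} + U_{1j}
\]

% 2. Variance components
% Before (random intercept model):
\[
\operatorname{var}(U_{0j}) = \tau_0^2, \qquad \operatorname{var}(R_{ij}) = \sigma^2
\]
% Now, add the slope variance and the intercept/slope covariance:
\[
\operatorname{var}(U_{1j}) = \tau_1^2, \qquad \operatorname{cov}(U_{0j}, U_{1j}) = \tau_{01}
\]
```

Relative to the random intercept model, that is two additional parameters: the slope variance and the intercept/slope covariance.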


Putting it all together…
• Linear Mixed Model:
• And:


And how do we interpret ?

• If this were a representative situation:

• What do you think?

[Figure: "2 Groups"; Y (axis 0 to 12) plotted against X (axis 0 to 1).]

Intra-class correlation
• In a random intercept model, recall the ICC:
• Now, variance and covariance of Yij
• So, the ICC is also dependent on the value of x
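The dependence on x can be made explicit for a model with one level-1 predictor, a random intercept U0j, and a random slope U1j. The τ symbols (intercept variance τ0², slope variance τ1², covariance τ01) and σ² for the residual variance are assumed conventional notation.

```latex
% Variance of one observation, given its x value
\[
\operatorname{var}(Y_{ij} \mid x_{ij}) = \tau_0^2 + 2\tau_{01} x_{ij} + \tau_1^2 x_{ij}^2 + \sigma^2
\]

% Covariance of two different individuals i and i' in the same group j
\[
\operatorname{cov}(Y_{ij}, Y_{i'j} \mid x_{ij}, x_{i'j})
  = \tau_0^2 + \tau_{01}\,(x_{ij} + x_{i'j}) + \tau_1^2 x_{ij} x_{i'j}
\]

% ICC for two individuals in the same group who share the value x
\[
\rho_I(x) = \frac{\tau_0^2 + 2\tau_{01} x + \tau_1^2 x^2}
                 {\tau_0^2 + 2\tau_{01} x + \tau_1^2 x^2 + \sigma^2}
\]
```

At x = 0 this reduces to the random intercept formula, τ0² / (τ0² + σ²).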

Ex 1: Random slopes, 1 predictor
• Going back to our earlier example
• Y_ij = math_ij & x_ij = books_ij
• Level 1:
• Level 2:

Results

                   Null Model            Random Intercept      Random Slope
                   Value      SE         Value      SE         Value      SE
  Fixed Effects
    Intercept      514.03     17.62      515.62     14.59      505.46     3.10
    books          --         --         17.99      3.70       12.71      0.73
  Random Effects
    Intercept      3170.88               2115.35               2157.87
    Int/Slope                                                  36.33
    Slope                                                      11.29
    Residual       4680.16               4336.00               3426.69

Plot of Groups

And the ICC?
• Depends on books, so let’s choose a couple of reasonable values:
• Most students fall between (-2.36, 2.36)
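As a worked illustration, the ICC can be evaluated at books = -2.36, 0, and 2.36 using the random slope estimates reported in the results (intercept variance 2157.87, intercept/slope covariance 36.33, slope variance 11.29, residual variance 3426.69). The choice of these three points is an assumption made here for illustration, and the values are recomputed from the reported estimates.

```latex
% Between-group part of the variance at a given value x of books
\[
b(x) = 2157.87 + 2(36.33)\,x + 11.29\,x^2,
\qquad
\hat{\rho}_I(x) = \frac{b(x)}{b(x) + 3426.69}
\]

\[
\begin{aligned}
x = -2.36:\quad & b = 2157.87 - 171.48 + 62.88 = 2049.27, & \hat{\rho}_I &\approx 0.374 \\
x = 0:\quad     & b = 2157.87,                            & \hat{\rho}_I &\approx 0.386 \\
x = 2.36:\quad  & b = 2157.87 + 171.48 + 62.88 = 2392.23, & \hat{\rho}_I &\approx 0.411
\end{aligned}
\]
```

Because the intercept/slope covariance is positive, the between-school share of the variance rises with books over this range.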

L2 Predictors & Cross Level Interactions
• As in random intercept models, we can include level-2 predictors
  • Means of level one variables
    • School SES
    • School attitude toward math
  • Natural level two predictors
    • School resources (books, facilities)
    • Principal reports of student behavior problems
• These effects can predict the intercept, the slope or a combination


Simple CLI (cross-level interaction) Model
• Adding a bit to what we had before:
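A minimal sketch of what is added, assuming a single level-2 predictor written z_j (a hypothetical symbol) that enters both the intercept and the slope equations. The γ and β symbols follow the usual convention and are likewise assumed.

```latex
\[
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + R_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} z_j + U_{0j} \\
                     & \beta_{1j} = \gamma_{10} + \gamma_{11} z_j + U_{1j}
\end{aligned}
\]
```

Here γ01 describes how z_j shifts the group intercept and γ11 describes how z_j shifts the group slope.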


Linear Mixed Model
• Combined:
• Notice the term that multiplies the level-1 predictor by the level-2 predictor.
  • This is a “cross-level” interaction
  • We are trying to explain the slope
• Does the slope for group j depend on some level 2 predictor?
  • Example: does the effect of student language use depend on the average SES of the school?
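In combined form, assuming one level-1 predictor x_ij and one level-2 predictor z_j (a hypothetical symbol) that predicts both the intercept and the slope, the model reads:

```latex
\[
Y_{ij} = \gamma_{00} + \gamma_{01} z_j + \gamma_{10} x_{ij}
       + \gamma_{11} z_j x_{ij}
       + U_{0j} + U_{1j} x_{ij} + R_{ij}
\]
```

The coefficient γ11 on the product z_j x_ij is the cross-level interaction: the slope for group j is γ10 + γ11 z_j + U1j, so a nonzero γ11 means the slope depends on the level-2 predictor.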


Variance components
• Notice that our variance components do not change
• With all of the usual assumptions


Many other possibilities for L2
• We could also have:

Cross-Level Example
• Level 1:
• Level 2:
• Combined:

Results
• Interpretation?

                     Null Model            Random Slope          CLI Model
                     Value      SE         Value      SE         Value      SE
  Fixed Effects
    Intercept        514.03     17.62      505.46     3.10       412.91     3.10
    books            --         --         12.71      0.73       14.08      2.53
    Mbooks                                                       47.68      3.36
    books*Mbooks                                                 -0.71      1.24
  Random Effects
    Intercept        3170.88               2157.87               836.09
    Int/Slope                              36.33                 73.56
    Slope                                  11.29                 39.51
    Residual         4680.16               3426.69               3426.69

Multilevel models
• Just a VERY brief overview
• A major methodological advance
  • Allows for the possibility of randomly varying intercepts
  • Randomly varying slopes
  • We can decompose the variance within and between higher level units.
  • We can fit cross level interactions
• We can test rich and complex theories

What do we gain?
• Statistically efficient estimates of regression coefficients.
• Correct standard errors, confidence intervals, and significance tests.
• Can use covariates measured at any of the levels of the hierarchy.
• We can test hypotheses about homogeneity or heterogeneity of groups.

Lots of other topics
• Centering: what and why
• Testing variance components
• Longitudinal analysis
• Cross-classified models

Now, on to SAS!

PROC MIXED
• Basic Syntax:
  1. PROC MIXED options
  2. CLASS statement
  3. MODEL statement
  4. RANDOM statement
  5. TITLE statement

PROC MIXED
PROC MIXED <options go here> ;
<various commands on the following lines>
Options:
• DATA=<name of sas data set>: what SAS data set to use.
• NOCLPRINT: don’t print classification levels/information.
• COVTEST: hypothesis tests for variances (& covariances).
• METHOD=: estimation method to use.
  • ML = maximum likelihood
  • REML = restricted maximum likelihood (REML is the default)
• EMPIRICAL: empirical / Huber-White / sandwich estimators.

CLASS, MODEL, BY statements
• CLASS statement indicates variables that are class / categorical / nominal / discrete
• MODEL is of the form
  • MODEL response = fixed predictors / options
  • Option SOLUTION requests a solution for the fixed effects
• BY statement produces a separate analysis for each value of the “by” variables (e.g. country)

RANDOM & TITLE statement
• RANDOM specifies the random effects: RANDOM random predictors / options
  • Option SOLUTION produces solution for random effects
  • Option TYPE=UN gives variance and covariance components
  • Option SUBJECT= specifies the grouping variable
• TITLE adds a title to the procedure: TITLE 'title here';

ODS OUTPUT
• Writes a data file based on your requests.
• Requires certain statements in the PROC MIXED syntax.
  • See TABLE 56.22 in SAS Help file for details.
• In general: ods output <table name> = <data set name>;
• Example: ods output solutionf=f_full covparms=c_full;
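A minimal sketch of how the example request fits into a complete call. The SolutionF table exists only when the MODEL statement carries the SOLUTION option, while CovParms is produced by default. The data set name `timss` and the variables `math`, `books`, and `idschool` are hypothetical stand-ins, not names taken from the seminar data.

```sas
/* Hypothetical names: data set timss; variables math, books, idschool. */
proc mixed data=timss noclprint covtest method=reml;
  class idschool;
  model math = books / solution;   /* SOLUTION is required for the SolutionF table */
  random intercept / subject=idschool type=un;
  /* Save fixed-effect estimates to f_full and variance components to c_full. */
  ods output solutionf=f_full covparms=c_full;
run;

/* Inspect the saved tables. */
proc print data=f_full; run;
proc print data=c_full; run;
```

Saving the tables as data sets makes it straightforward to compare estimates across models or to compute quantities such as the ICC in a later DATA step.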

SAS Examples:
• Null model:
• RI with L1 predictor:
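The two models named above might be specified as follows. This is a sketch under stated assumptions rather than the seminar's own code: the data set name `timss` and the variables `math` (outcome), `books` (level-1 predictor), and `idschool` (school identifier) are hypothetical, and REML is used because it is the PROC MIXED default.

```sas
/* Hypothetical names: data set timss; variables math, books, idschool. */

/* Null model: random intercept for schools, no predictors. */
proc mixed data=timss noclprint covtest method=reml;
  class idschool;
  model math = / solution;
  random intercept / subject=idschool type=un;
  title 'Null model';
run;

/* Random intercept (RI) model with the level-1 predictor books. */
proc mixed data=timss noclprint covtest method=reml;
  class idschool;
  model math = books / solution;
  random intercept / subject=idschool type=un;
  title 'RI model with L1 predictor';
run;
```

In both calls the covariance parameter estimates table reports the intercept variance (between schools) and the residual variance (within schools), which are the two quantities needed for the ICC.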
