Department of Data Analysis Ghent University

The R package lavaan

Yves Rosseel
Department of Data Analysis
Ghent University – Belgium

RBelgium meeting 5 @ UGent
13 September 2013

Yves Rosseel The R package lavaan 1 / 42

lavaan: latent variable analysis

• large family of statistical models, exploiting the concept of ‘latent variables’

• latent variables are:

– unobserved constructs (e.g. ‘level of depression’)
– random effects (as in mixed models)
– missing data

• most well-known subclass: ‘structural equation models (SEM)’


SEM examples: path analysis (no latent variables)

[Figure: path diagram with observed variables y1 to y7]


SEM examples: confirmatory factor analysis (CFA)

[Figure: CFA diagram with observed variables y1 to y9 and latent variables η1, η2, η3]


SEM examples: ‘full’ structural equation modeling

[Figure: structural equation model diagram with observed variables y1 to y12 and latent variables η1 to η4]


software for SEM: commercial – closed-source

• the big four:

– LISREL
– EQS
– AMOS
– Mplus

• SAS/Stat: proc CALIS, proc TCALIS

• SEPATH (Statistica), RAMONA (Systat), Stata 12

• Mx (free, closed-source)


software for SEM: non-commercial – open-source

• outside the R ecosystem: gllamm (Stata module), . . .

• R packages:

– sem
– OpenMx
– lavaan
– lava

• interfaces between R and commercial packages:

– REQS
– MplusAutomation


the lavaan project

1. lavaan subproject: the lavaan package/program

• lavaan is an R package for latent variable analysis

• the long-term goal of lavaan is to implement all the state-of-the-art capabilities that are currently available in commercial packages

2. lavaan subproject: Rosetta

• collection of tools for reading/parsing and writing legacy syntax (e.g. classic LISREL syntax)

• intermediate representation: the lavaan parameter table

• currently (0.5-14) only: Mplus/LISREL to lavaan, lavaan to Mplus

3. lavaan subproject: Chameleon

• mimic legacy software

• reproducibility


the lavaan package (1)

• lavaan is an R package for latent variable analysis

• the long-term goal of lavaan is to implement all the state-of-the-art capabilities that are currently available in commercial packages


the lavaan package (2)

• the lavaan source code is hosted on github:

https://github.com/yrosseel/lavaan

• more information about lavaan:

http://lavaan.org

• the lavaan paper:

Rosseel (2012). lavaan: an R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.

• lavaan discussion group (mailing list)

https://groups.google.com/d/forum/lavaan


how big is lavaan?

• > 25K lines of code (currently, 0.5-15, R code only)

• how many people are using lavaan?

– no idea
– the lavaan paper has been downloaded > 12,600 times (since April 2012)
– lavaan.org gets around 60–120 hits per day


where are the lavaan users?


why do we need lavaan?

1. lavaan is for statisticians working in the field of SEM

• it seems unfortunate that new developments in this field are hindered by the lack of open source software that researchers can use to implement their newest ideas

2. lavaan is for teachers

• teaching these techniques to students was often complicated by the forced choice for one of the commercial packages

3. lavaan is for applied researchers

• keep it simple, provide all the features they need


features of lavaan

• lavaan is well-tested

• user-friendly fitting functions (cfa, sem, growth)

• power-user fitting function (lavaan)

• support for non-normal continuous data:

– robust standard errors, Satorra-Bentler correction, ADF estimation, bootstrapping

• support for categorical (binary/ordinal) data

– lavaan has implemented the three-stage WLS approach as developed by Bengt Muthén (1984), including robust variants (aka WLSMV)

• full support for missing data, meanstructures, and multiple groups

• linear and non-linear equality and inequality constraints


unique features

• default model specification: lavaan model syntax

– Mplus2lavaan (Michael Hallquist)
– lisrel2lavaan (Corbin Quick)
– graphical (via Onyx)
– . . .

• mimic the (numerical) results of commercial packages:

– mimic="Mplus"
– mimic="EQS"

• new technical features:

– informative hypothesis testing (Leonard Vanbrabant)
– pairwise ML for binary/ordinal data (Myrsini Katsikatsou)
– fraction of missing information (Mijke Rhemtulla)
– . . .


features NOT in lavaan (yet)

• multilevel SEM

• mixture (latent class) SEM

• . . .

features we are working on

• Bayesian SEM (BUGS interface, stan interface, native)

• small-sample corrections

• causal inference

• standard errors for standardized parameters

• ML estimation for categorical data (IRT)

• . . .

• better (technical) documentation


the lavaan ecosystem

• lavaan.survey (Daniel Oberski)

survey weights, clustering, strata, and finite sampling corrections in SEM

• Onyx (Timo von Oertzen, Andreas M. Brandmaier, Siny Tsang)

interactive graphical interface for SEM (written in Java)

• semTools (Sunthud Pornprasertmanit and many others)

collection of useful functions for SEM

• simsem (Sunthud Pornprasertmanit and many others)

simulation of SEM models

• semPlot (Sacha Epskamp)

visualizations of SEM models


semPlot

[Figure: semPlot path diagram with observed variables y1 to y6 and x1 to x3, and latent variables f1 and f2]


a simple regression analysis in R

[Figure: path diagram with predictors x1 to x4 pointing to the outcome y]

# read in your data
myData
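
A minimal sketch of the kind of regression this slide introduces, using base R's lm(). The data frame below is simulated as a hypothetical stand-in for `myData` (the slide's own data set is not available here), so the numbers it produces will not match the output shown in the talk.

```r
# Hypothetical stand-in for the slide's data: four predictors, one outcome, N = 100
set.seed(1)
N <- 100
myData <- data.frame(x1 = rnorm(N), x2 = rnorm(N), x3 = rnorm(N), x4 = rnorm(N))
myData$y <- 100 + 5 * myData$x1 - 1 * myData$x2 + 1 * myData$x3 + rnorm(N, sd = 45)

# Ordinary least-squares regression of y on x1..x4
fit.lm <- lm(y ~ x1 + x2 + x3 + x4, data = myData)
summary(fit.lm)
```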

lm() output, artificial data (N=100)

Call:
lm(formula = y ~ x1 + x2 + x3 + x4, data = myData)

Residuals:
     Min       1Q   Median       3Q      Max
-102.372  -29.458   -3.658   27.275  148.404

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  97.7210     4.7200  20.704

the lavaan model syntax – a simple regression

[Figure: path diagram with predictors x1 to x4 pointing to the outcome y]

library(lavaan)
myData
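
A minimal sketch of how such a regression could be written in lavaan model syntax and fitted with sem(). The simulated `myData` is a hypothetical stand-in, not the data behind the slides; the regression operator `~` in the model string works as in an R formula.

```r
library(lavaan)

# Hypothetical stand-in data (assumption): four predictors, one outcome
set.seed(1)
N <- 100
myData <- data.frame(x1 = rnorm(N), x2 = rnorm(N), x3 = rnorm(N), x4 = rnorm(N))
myData$y <- 5 * myData$x1 - 1 * myData$x2 + 1 * myData$x3 + rnorm(N)

# The model is a string; '~' means "is regressed on"
myModel <- ' y ~ x1 + x2 + x3 + x4 '

# Fit the model and print the parameter estimates
fit <- sem(model = myModel, data = myData)
summary(fit)
```

With a single outcome and all predictors included, the model is saturated, so the test statistic has zero degrees of freedom.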

output (artificial data, N=100)

lavaan (0.5-13) converged normally after 1 iterations

  Number of observations                           100

  Estimator                                         ML
  Minimum Function Test Statistic                0.000
  Degrees of freedom                                 0
  P-value (Chi-square)                           1.000

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)
Regressions:
  y ~
    x1                5.773    0.511   11.309    0.000
    x2               -1.321    0.479   -2.757    0.006
    x3                1.135    0.446    2.545    0.011
    x4                0.271    0.466    0.581    0.561

Variances:
    y              2075.100  293.463


the lavaan model syntax – multivariate regression

[Figure: path diagram with predictors x1 to x4 and outcomes y1 and y2]

myModel
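
A minimal sketch of a multivariate regression in lavaan syntax: each outcome gets its own regression formula inside the same model string. The simulated data and the particular set of predictors per outcome are assumptions made for illustration only.

```r
library(lavaan)

# Hypothetical stand-in data (assumption): four predictors, two outcomes
set.seed(2)
N <- 200
myData <- data.frame(x1 = rnorm(N), x2 = rnorm(N), x3 = rnorm(N), x4 = rnorm(N))
myData$y1 <- 0.5 * myData$x1 + 0.3 * myData$x2 + rnorm(N)
myData$y2 <- 0.4 * myData$x3 - 0.2 * myData$x4 + rnorm(N)

# One regression formula per outcome, all in one model string
myModel <- ' y1 ~ x1 + x2 + x3 + x4
             y2 ~ x1 + x2 + x3 + x4 '

fit <- sem(model = myModel, data = myData)
summary(fit)
```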

the lavaan model syntax – path analysis

[Figure: path diagram with observed variables x1 to x7]

myModel
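
A minimal sketch of a path analysis in lavaan syntax. The arrows of the slide's diagram are not recoverable from this transcript, so the structure below (which variables predict which) is purely hypothetical, as is the simulated data; the point is only that a variable can appear as an outcome in one formula and as a predictor in another.

```r
library(lavaan)

# Hypothetical data generated from an assumed chain of effects
set.seed(42)
N <- 200
x1 <- rnorm(N); x2 <- rnorm(N); x3 <- rnorm(N); x4 <- rnorm(N)
x5 <- 0.5 * x1 + 0.3 * x2 + 0.2 * x3 + rnorm(N)
x6 <- 0.4 * x4 + 0.6 * x5 + rnorm(N)
x7 <- 0.5 * x6 + rnorm(N)
myData <- data.frame(x1, x2, x3, x4, x5, x6, x7)

# Assumed path structure: x5 and x6 are both outcomes and predictors
myModel <- ' x5 ~ x1 + x2 + x3
             x6 ~ x4 + x5
             x7 ~ x6 '

fit <- sem(model = myModel, data = myData)
summary(fit)
```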

the lavaan model syntax – mediation analysis

[Figure: mediation diagram with paths X → M (a), M → Y (b), and X → Y (c)]

model
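
A minimal sketch of a mediation model in lavaan syntax, assuming simulated data (the values below are hypothetical). The labels a, b and c and the defined parameters `indirect` and `total` follow the names that appear in the diagram and in the output of the talk; the `:=` operator defines a new parameter as a function of labelled parameters.

```r
library(lavaan)

# Hypothetical data with a true indirect effect of X on Y through M
set.seed(1234)
N <- 100
X <- rnorm(N)
M <- 0.5 * X + rnorm(N)
Y <- 0.7 * M + 0.3 * X + rnorm(N)
Data <- data.frame(X = X, Y = Y, M = M)

model <- ' # direct effect
             Y ~ c*X
           # mediator
             M ~ a*X
             Y ~ b*M
           # defined parameters
             indirect := a*b
             total    := c + (a*b)
         '

fit <- sem(model, data = Data)
summary(fit)
```

Bootstrap standard errors can be requested by passing `se = "bootstrap"` and `bootstrap = 1000` to sem(); this is slower, so the sketch uses the default standard errors.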

output

...

Parameter estimates:

  Information                                 Observed
  Standard Errors                            Bootstrap
  Number of requested bootstrap draws             1000
  Number of successful bootstrap draws            1000

                   Estimate  Std.err  Z-value  P(>|z|)
Regressions:
  Y ~
    M         (b)     0.597    0.098    6.068    0.000
    X         (c)     2.594    1.210    2.145    0.032
  M ~
    X         (a)     2.739    0.999    2.741    0.006

Variances:
    Y               108.700   17.747
    M               105.408   16.556

Defined parameters:
    indirect          1.636    0.645    2.535    0.011
    total             4.230    1.383    3.059    0.002


the lavaan model syntax – using cfa() or sem()

[Figure: CFA diagram with latent variables visual (x1 to x3), textual (x4 to x6), and speed (x7 to x9)]

HS.model
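
A minimal sketch of the three-factor CFA using cfa() and the `HolzingerSwineford1939` data set that ships with lavaan. The factor structure is taken from the output shown later in the talk; the exact layout of the model string is an assumption. The `=~` operator reads "is measured by".

```r
library(lavaan)

# Three latent variables, each measured by three indicators
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# cfa() fixes the first loading of each factor to 1 and adds
# residual variances and factor (co)variances automatically
fit <- cfa(HS.model, data = HolzingerSwineford1939)
summary(fit, fit.measures = TRUE, standardized = TRUE)
```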

the lavaan model syntax – using lavaan()

[Figure: CFA diagram with latent variables visual (x1 to x3), textual (x4 to x6), and speed (x7 to x9)]

HS.model
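
A minimal sketch of the same CFA fitted with the power-user function lavaan(). Unlike cfa(), lavaan() adds nothing automatically by default, so every parameter is written out. The model name `HS.model.full` is a hypothetical label chosen here, not a name from the talk.

```r
library(lavaan)

# Full specification: nothing is added or fixed behind the scenes
HS.model.full <- ' # latent variables, first loading fixed to 1
                     visual  =~ 1*x1 + x2 + x3
                     textual =~ 1*x4 + x5 + x6
                     speed   =~ 1*x7 + x8 + x9
                   # residual variances of the observed variables
                     x1 ~~ x1
                     x2 ~~ x2
                     x3 ~~ x3
                     x4 ~~ x4
                     x5 ~~ x5
                     x6 ~~ x6
                     x7 ~~ x7
                     x8 ~~ x8
                     x9 ~~ x9
                   # factor variances
                     visual  ~~ visual
                     textual ~~ textual
                     speed   ~~ speed
                   # factor covariances
                     visual  ~~ textual + speed
                     textual ~~ speed '

fit <- lavaan(HS.model.full, data = HolzingerSwineford1939)
summary(fit)
```

An alternative is to keep the short model string and switch on the relevant `auto.*` arguments of lavaan(), such as `auto.var = TRUE`, `auto.fix.first = TRUE` and `auto.cov.lv.x = TRUE`.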

output

lavaan (0.5-12) converged normally after 41 iterations

  Number of observations                           301

  Estimator                                         ML
  Minimum Function Chi-square                   85.306
  Degrees of freedom                                24
  P-value                                        0.000

Chi-square test baseline model:

  Minimum Function Chi-square                  918.852
  Degrees of freedom                                36
  P-value                                        0.000

Full model versus baseline model:

  Comparative Fit Index (CFI)                    0.931
  Tucker-Lewis Index (TLI)                       0.896

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3737.745
  Loglikelihood unrestricted model (H1)      -3695.092

  Number of free parameters                         21


  Akaike (AIC)                                7517.490
  Bayesian (BIC)                              7595.339
  Sample-size adjusted Bayesian (BIC)         7528.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.092
  90 Percent Confidence Interval          0.071  0.114
  P-value RMSEA

                   Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
Latent variables:
  visual =~
    x1                1.000                               0.900    0.772
    x2                0.553    0.100    5.554    0.000    0.498    0.424
    x3                0.729    0.109    6.685    0.000    0.656    0.581
  textual =~
    x4                1.000                               0.990    0.852
    x5                1.113    0.065   17.014    0.000    1.102    0.855


    x6                0.926    0.055   16.703    0.000    0.917    0.838
  speed =~
    x7                1.000                               0.619    0.570
    x8                1.180    0.165    7.152    0.000    0.731    0.723
    x9                1.082    0.151    7.155    0.000    0.670    0.665

Covariances:
  visual ~~
    textual           0.408    0.074    5.552    0.000    0.459    0.459
    speed             0.262    0.056    4.660    0.000    0.471    0.471
  textual ~~
    speed             0.173    0.049    3.518    0.000    0.283    0.283

Variances:
    x1                0.549    0.114                      0.549    0.404
    x2                1.134    0.102                      1.134    0.821
    x3                0.844    0.091                      0.844    0.662
    x4                0.371    0.048                      0.371    0.275
    x5                0.446    0.058                      0.446    0.269
    x6                0.356    0.043                      0.356    0.298
    x7                0.799    0.081                      0.799    0.676
    x8                0.488    0.074                      0.488    0.477
    x9                0.566    0.071                      0.566    0.558
    visual            0.809    0.145                      1.000    1.000
    textual           0.979    0.112                      1.000    1.000
    speed             0.384    0.086                      1.000    1.000


testing for measurement invariance

# model 1: configural invariance
fit1
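
A minimal sketch of a measurement invariance sequence, assuming the Holzinger and Swineford CFA with `school` as the grouping variable. Only the configural step and the name `fit1` come from the slide; the later steps and the names `fit2` and `fit3` are assumptions about how such a sequence is commonly continued, using the `group.equal` argument to constrain parameters across groups.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# model 1: configural invariance (same structure, all parameters free per group)
fit1 <- cfa(HS.model, data = HolzingerSwineford1939, group = "school")

# model 2 (assumed next step): equal factor loadings across groups
fit2 <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
            group.equal = "loadings")

# model 3 (assumed next step): equal loadings and intercepts across groups
fit3 <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
            group.equal = c("loadings", "intercepts"))

# likelihood ratio tests between the nested models
lavTestLRT(fit1, fit2, fit3)
```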

lavaan model syntax: full sem

[Figure: SEM diagram with latent variables ind60, dem60, dem65 and observed variables x1 to x3 and y1 to y8]

myModel
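
A minimal sketch of a full structural equation model, assuming the `PoliticalDemocracy` data set that ships with lavaan, whose variable names match the diagram. The specific regressions and residual covariances below follow the well-known industrialization and democracy example and are an assumption about what the slide's truncated `myModel` contained.

```r
library(lavaan)

myModel <- ' # measurement model
               ind60 =~ x1 + x2 + x3
               dem60 =~ y1 + y2 + y3 + y4
               dem65 =~ y5 + y6 + y7 + y8
             # regressions among the latent variables
               dem60 ~ ind60
               dem65 ~ ind60 + dem60
             # residual covariances
               y1 ~~ y5
               y2 ~~ y4 + y6
               y3 ~~ y7
               y4 ~~ y8
               y6 ~~ y8 '

fit <- sem(myModel, data = PoliticalDemocracy)
summary(fit, standardized = TRUE)
```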

a simple growth curve model with time-varying covariates

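[Figure: growth curve diagram with intercept i and slope s, repeated measures t1 to t4, time-varying covariates c1 to c4, and predictors x1 and x2]

model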
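
A minimal sketch of a linear growth curve model with time-varying covariates, assuming the `Demo.growth` data set that ships with lavaan (its variable names match the diagram). The fixed loadings 0, 1, 2, 3 for the slope are an assumption of equally spaced time points.

```r
library(lavaan)

model <- ' # intercept and slope with fixed loadings
             i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
             s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
           # time-invariant predictors of intercept and slope
             i ~ x1 + x2
             s ~ x1 + x2
           # time-varying covariates
             t1 ~ c1
             t2 ~ c2
             t3 ~ c3
             t4 ~ c4 '

# growth() adds the mean structure needed for growth models
fit <- growth(model, data = Demo.growth)
summary(fit)
```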

further syntax

• fixing parameters, and overriding auto-fixed parameters

HS.model.bis
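
A minimal sketch of fixing parameters and overriding auto-fixed ones. The name `HS.model.bis` is from the slide, but its contents are not recoverable here, so the model below is an assumed illustration: `NA*` frees a loading that cfa() would otherwise fix to 1, and a numeric pre-multiplier such as `1*` fixes a parameter to that value.

```r
library(lavaan)

# Assumed illustration: free all loadings, fix the factor variances to 1
HS.model.bis <- ' visual  =~ NA*x1 + x2 + x3
                  textual =~ NA*x4 + x5 + x6
                  speed   =~ NA*x7 + x8 + x9
                  visual  ~~ 1*visual
                  textual ~~ 1*textual
                  speed   ~~ 1*speed '

fit <- cfa(HS.model.bis, data = HolzingerSwineford1939)
summary(fit)
```

This is only a different way of setting the scale of the latent variables, so the model has the same degrees of freedom as the default parameterization.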

the parameter table (Holzinger & Swineford CFA example)

> parTable(fit)
   id     lhs op     rhs user group free ustart exo label eq.id unco
1   1  visual =~      x1    1     1    0      1   0           0    0
2   2  visual =~      x2    1     1    1     NA   0           0    1
3   3  visual =~      x3    1     1    2     NA   0           0    2
4   4 textual =~      x4    1     1    0      1   0           0    0
5   5 textual =~      x5    1     1    3     NA   0           0    3
6   6 textual =~      x6    1     1    4     NA   0           0    4
7   7   speed =~      x7    1     1    0      1   0           0    0
8   8   speed =~      x8    1     1    5     NA   0           0    5
9   9   speed =~      x9    1     1    6     NA   0           0    6
10 10      x1 ~~      x1    0     1    7     NA   0           0    7
11 11      x2 ~~      x2    0     1    8     NA   0           0    8
12 12      x3 ~~      x3    0     1    9     NA   0           0    9
13 13      x4 ~~      x4    0     1   10     NA   0           0   10
14 14      x5 ~~      x5    0     1   11     NA   0           0   11
15 15      x6 ~~      x6    0     1   12     NA   0           0   12
16 16      x7 ~~      x7    0     1   13     NA   0           0   13
17 17      x8 ~~      x8    0     1   14     NA   0           0   14
18 18      x9 ~~      x9    0     1   15     NA   0           0   15
19 19  visual ~~  visual    0     1   16     NA   0           0   16
20 20 textual ~~ textual    0     1   17     NA   0           0   17
21 21   speed ~~   speed    0     1   18     NA   0           0   18
22 22  visual ~~ textual    0     1   19     NA   0           0   19
23 23  visual ~~   speed    0     1   20     NA   0           0   20
24 24 textual ~~   speed    0     1   21     NA   0           0   21


the parameter table (2)

> PT
> lavNames(fit, "ov")
[1] "x1" "x2" "x3" "x4" "x5" "x6" "x7" "x8" "x9"

> lavNames(fit, "lv")
[1] "visual"  "textual" "speed"

> lavNames(fit, "ov.x")
character(0)

> lavNames(fit, "lv.x")
[1] "visual"  "textual" "speed"

> lavaan:::getDF(PT)
[1] 24

> lMR
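
A minimal sketch of inspecting the parameter table and variable names of a fitted model, assuming the Holzinger and Swineford CFA as `fit`. It uses only exported functions (parTable(), lavNames(), fitMeasures()); the internal helpers shown on the slide may differ between lavaan versions, and the columns of the parameter table in a current version need not match the 0.5-series output above.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)

# The parameter table: one row per model parameter
PT <- parTable(fit)
head(PT)

# Free parameters are numbered 1, 2, ... in the 'free' column; 0 means fixed
n.free <- sum(PT$free > 0)

# Names of observed and latent variables
ov.names <- lavNames(fit, "ov")
lv.names <- lavNames(fit, "lv")

# Degrees of freedom of the model
df <- as.numeric(fitMeasures(fit, "df"))
```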

> lMR[,c("id","lhs","op","rhs","mat","row","col")]
   id     lhs op     rhs    mat row col
1   1  visual =~      x1 lambda   1   1
2   2  visual =~      x2 lambda   2   1
3   3  visual =~      x3 lambda   3   1
4   4 textual =~      x4 lambda   4   2
5   5 textual =~      x5 lambda   5   2
6   6 textual =~      x6 lambda   6   2
7   7   speed =~      x7 lambda   7   3
8   8   speed =~      x8 lambda   8   3
9   9   speed =~      x9 lambda   9   3
10 10      x1 ~~      x1  theta   1   1
11 11      x2 ~~      x2  theta   2   2
12 12      x3 ~~      x3  theta   3   3
13 13      x4 ~~      x4  theta   4   4
14 14      x5 ~~      x5  theta   5   5
15 15      x6 ~~      x6  theta   6   6
16 16      x7 ~~      x7  theta   7   7
17 17      x8 ~~      x8  theta   8   8
18 18      x9 ~~      x9  theta   9   9
19 19  visual ~~  visual    psi   1   1
20 20 textual ~~ textual    psi   2   2
21 21   speed ~~   speed    psi   3   3
22 22  visual ~~ textual    psi   1   2
23 23  visual ~~   speed    psi   1   3
24 24 textual ~~   speed    psi   2   3


future plans

• S4 classes: nice, but clumsy and ridiculously slow

• newer code relies on ‘Reference Classes’

• for large-scale simulation studies: lavaan (or R) is too slow

• eventually, I will rewrite everything in C++ (using the Eigen library)

– ideally, only a thin layer is written in R
– to Rcpp or not to Rcpp?
– the python/MATLAB/. . . communities also need a high-quality package for latent variable analysis


why you should not create an R package

• I get 5–20 emails per day (lavaan related)

• contributed code is usually of low quality

• . . .

• R core is not a democracy

• CRAN is not a democracy

• . . .

• dependency hell: packages are (sometimes) removed from CRAN

• I shall not break any packages that depend on lavaan


R is (not so) great

• R, as a language, is not perfect

– the copy-by-value semantics
– a lot of unnecessary internal copying; this affects speed
– big data, parallelization: can be done, but not easily
– no native support for many basic matrix operations (sparse, ginv, . . . )
– optimizers are of medium quality (optim, nlminb, . . . )
– vectorized code is relatively fast, but not always possible
– computing p-values under a multivariate normal distribution (package mvtnorm) is NOT vectorized (and hence slow)
– . . .

• I do not expect any spectacular changes in the future

• future alternatives? http://julialang.org/


Thank you!

http://lavaan.org
