Statistical Issues in Structural Equation Models - Seoul National University...

[Social Science Research Methodology Forum, 2016. 11. 17]

[2016 Social Science Research Methodology Forum]

Statistical Issues in Structural Equation Models

2016. 11. 17 (Thu)

Kyu-Seong Kim (김규성)

(Department of Statistics, University of Seoul, [email protected])


Contents

1. Introduction

2. Basics of Structural Equation Modeling

2.1 Model Specification

2.2 Model Identification

2.3 Parameter Estimation

2.4 Testing

3. An Example Using SAS

4. Criticisms in SEM

5. SEM with Survey Data


1. Introduction

Structural Equation Modeling (SEM)

⚪ (Kaplan, 2001) a class of methodologies that seeks to represent hypotheses about the means, variances, and covariances of observed data in terms of a smaller number of ‘structural’ parameters defined by a hypothesized underlying conceptual or theoretical model.

⚪ Also known as:

- Analysis of Covariance Structures

- Causal Modeling

- LISREL (Linear Structural RELations) Modeling


Developments of SEM (1)

1st Stage : Early disciplinary-specific developments

⚪ Psychology : Factor Analysis

- Spearman (1904)

⚪ Human Genetics : Regression Analysis

- Galton (1889)

⚪ Biology : Path Modeling

- Wright (1934)

⚪ Economics : Simultaneous Equation Modeling

- Koopmans (1953), Wold (1954)



2nd stage : Unification

◦ Cross-disciplinary fertilization between economics, sociology, and psychology leading to an explosion of empirical applications of SEM

- Jöreskog (1970, 1973), Goldberger and Duncan (1973)

3rd stage : Extensions

◦ A period of developing methods for handling discrete, ordinal, and limited dependent variables



4th stage : Recently

◦ A recent period of incorporating statistical advances into the SEM

framework, including

- generalized linear models,

- mixed effects models,

- mixture regression models,

- Bayesian methods,

- graphical models, and

- methods for identifying causal effects.

⚪ The recent period is substantially integrating SEM with the broader statistical literature, which is making SEM an ever more exciting tool.

Developments of SEM (4) : Karimi et al. (2014)

Software Package for SEM

⚪ LISREL (Linear Structural RELations)

⚪ AMOS (Analysis of MOment Structures)

⚪ EQS (Equations)

⚪ Mplus

⚪ CALIS (a procedure in SAS)

⚪ SEPATH (a module of Statistica)

⚪ SPSS

⚪ lavaan, sem (modules in R)


Some References

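◦ 김청택. 2016. “탐색적 요인분석의 오·남용 문제와 교정” [Misuse and abuse of exploratory factor analysis and their correction]. 조사연구 (Survey Research) 17: 1-29.

◦ 김상욱. 2016. “공분산구조분석의 모형추정 절차: 방법론적 진단 및 처방” [Model estimation procedures in covariance structure analysis: methodological diagnosis and prescription]. 조사연구 (Survey Research) 17: 55-70.

◦ 이기종. 2016. “구조방정식모형의 모형평가 오·남용과 교정” [Misuse and abuse of model evaluation in structural equation models and their correction]. 조사연구 (Survey Research) 17: 71-83.

◦ Bollen, K.A. and Pearl, J. 2013. “Eight myths about causality and SEMs.” In S.L. Morgan (ed.), Handbook of Causal Analysis for Social Research, 301-328.

◦ Fabrigar, L.R., Porter, R.D. and Norris, M.E. 2010. “Some things you should know about structural equation modeling but never thought to ask.” Journal of Consumer Psychology, 20: 221-225.

◦ Iacobucci, D. 2009. “Everything you always wanted to know about SEM but were afraid to ask.” Journal of Consumer Psychology, 19: 673-680.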


2. Basics of Structural Equation Modeling

2.1 Model Specification

2.2 Model Identification

2.3 Parameter Estimation

2.4 Testing


2.1 Model Specification

⚪ The general structural equation model consists of two parts:

(a) the structural part, linking latent variables to each other via systems of simultaneous equations

(b) the measurement part, which links latent variables to observed variables via a restricted (confirmatory) factor model


Structural Part

⚪ Model

$$\eta = B\eta + \Gamma\xi + \zeta$$

where

- $\eta$ : a vector of endogenous (criterion) latent variables

- $\xi$ : a vector of exogenous (predictor) latent variables

- $B$ : a matrix of regression coefficients relating the latent endogenous variables to each other

- $\Gamma$ : a matrix of regression coefficients relating the latent endogenous variables to exogenous variables

- $\zeta$ : a vector of disturbance terms


Example : Structural Part

[Figure: path diagram, Self-esteem → Job satisfaction]


Structural Part

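⚪ In the structural part, we assume that

$E(\zeta) = 0$, $\mathrm{Cov}(\xi, \zeta) = 0$, $(I - B)$ : non-singular

⚪ Then, with $\Phi = \mathrm{Cov}(\xi)$ and $\Psi = \mathrm{Cov}(\zeta)$,

$$\eta = (I - B)^{-1}(\Gamma\xi + \zeta), \qquad \mathrm{Cov}(\eta) = (I - B)^{-1}(\Gamma\Phi\Gamma' + \Psi)(I - B)'^{-1}$$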


Measurement Part

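⚪ Model

$$y = \Lambda_y \eta + \varepsilon, \qquad x = \Lambda_x \xi + \delta$$

where

- $y$ : a vector of dependent variables

- $x$ : a vector of independent variables

- $\Lambda_y$, $\Lambda_x$ : matrices of factor loadings

- $\varepsilon$, $\delta$ : vectors of errors

⚪ In the measurement part, assume that

$\zeta$, $\varepsilon$, $\delta$ : mutually uncorrelated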


Example : Measurement Part

[Figure: path diagram of the measurement part for Self-esteem and Job satisfaction]


Example : SEM - Combination

[Figure: path diagram combining the structural and measurement parts, Self-esteem → Job satisfaction]


Covariance Structure Model

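⚪ Combining the structural part and measurement part leads to the $(p+q) \times (p+q)$ covariance matrix

$$\Sigma = \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix}$$

where

- $\Sigma_{yy} = \Lambda_y (I - B)^{-1}(\Gamma\Phi\Gamma' + \Psi)(I - B)'^{-1}\Lambda_y' + \Theta_\varepsilon$

- $\Sigma_{yx} = \Lambda_y (I - B)^{-1}\Gamma\Phi\Lambda_x'$, $\Sigma_{xy} = \Sigma_{yx}'$

- $\Sigma_{xx} = \Lambda_x\Phi\Lambda_x' + \Theta_\delta$

Here $p$ and $q$ are the numbers of $y$ and $x$ variables, $\Phi = \mathrm{Cov}(\xi)$, $\Psi = \mathrm{Cov}(\zeta)$, $\Theta_\varepsilon = \mathrm{Cov}(\varepsilon)$, and $\Theta_\delta = \mathrm{Cov}(\delta)$.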


Covariance Structure Model

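⚪ Elements of the parameter matrices ($B$, $\Gamma$, $\Lambda_y$, $\Lambda_x$, $\Phi$, $\Psi$, $\Theta_\varepsilon$, $\Theta_\delta$)

- Some are fixed to zero by hypothesis.

- The remaining parameters are free to be estimated.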

⚪ The pattern of fixed and free parameters implies a specific structure for the covariance matrix of the observed variables.

⚪ Let $\theta$ : the parameter vector containing all of the parameters of the model.

⇨ A structural equation model is a special case of the general covariance structure with $\Sigma = \Sigma(\theta)$


2.2 Model Identification

⚪ The elements in $\theta$ are identified if they can be expressed uniquely in terms of the elements of the covariance matrix $\Sigma$.

⚪ Previous Example: under-identified

- # of distinct elements in $\Sigma$ = 10

- # of parameters in $\theta$ = 11
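The counting behind the example can be sketched as follows. This is a minimal Python illustration, not part of the original slides: the function names are hypothetical, and it assumes four observed variables, which gives the 10 distinct covariance elements quoted above. The counting rule is only a necessary condition for identification, not a sufficient one.

```python
def n_moments(p):
    """Number of distinct elements in a p x p covariance matrix."""
    return p * (p + 1) // 2


def degrees_of_freedom(p, n_free_params):
    """Distinct covariance elements minus free parameters (counting rule)."""
    return n_moments(p) - n_free_params


# Assumed setting: 4 observed variables and 11 free parameters.
df_unrestricted = degrees_of_freedom(4, 11)

# Fixing two parameters leaves 9 free parameters.
df_restricted = degrees_of_freedom(4, 9)

print(n_moments(4), df_unrestricted, df_restricted)
```

A negative value signals that the model cannot be identified without further restrictions; a positive value means the counting rule is satisfied (over-identification is then possible but not guaranteed).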


2.3 Parameter Estimation (1)

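⚪ Parameter estimation means estimating the parameter vector $\theta$ of the covariance matrix $\Sigma(\theta)$, subject to the constraints.

⚪ Let $z_1, \ldots, z_n$ be random vectors from the model with covariance matrix $\Sigma(\theta)$, and let $S$ be the sample covariance matrix :

$$S = \frac{1}{n-1}\sum_{i=1}^{n}(z_i - \bar{z})(z_i - \bar{z})'$$

⚪ In case of over-identification, estimates $\hat\theta$ are obtained such that the discrepancy between $S$ and $\Sigma(\hat\theta)$ is minimized.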
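As a small illustration of the sample covariance matrix that enters the estimation step, the sketch below computes it from raw observations in plain Python. The function name and the toy data are hypothetical, and the divisor n - 1 is an assumption (some texts use n instead).

```python
def sample_covariance(rows):
    """Sample covariance matrix (divisor n - 1) of a list of observations."""
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    S = [[0.0] * k for _ in range(k)]
    for r in rows:
        d = [r[j] - means[j] for j in range(k)]  # deviations from the mean
        for a in range(k):
            for b in range(k):
                S[a][b] += d[a] * d[b] / (n - 1)
    return S


# Hypothetical toy data: three observations on two variables.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 9.0)]
S = sample_covariance(data)
for row in S:
    print(row)
```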


Parameter Estimation (2) : Fitting Function

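⚪ ML method :

$$F_{ML}(\theta) = \log|\Sigma(\theta)| + \mathrm{tr}\{S\Sigma(\theta)^{-1}\} - \log|S| - (p+q)$$

⚪ GLS method :

$$F_{GLS}(\theta) = \frac{1}{2}\,\mathrm{tr}\{[I - S^{-1}\Sigma(\theta)]^2\}$$

⚪ OLS method :

$$F_{OLS}(\theta) = \frac{1}{2}\,\mathrm{tr}\{[S - \Sigma(\theta)]^2\}$$

where $p+q$ is the number of observed variables.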
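To make the three fitting functions concrete, here is a minimal sketch for the 2 x 2 case in plain Python. It assumes the standard forms F_ML = log|Sigma| + tr(S Sigma^-1) - log|S| - p, F_GLS = (1/2) tr[(I - S^-1 Sigma)^2], and F_OLS = (1/2) tr[(S - Sigma)^2]; the helper names are hypothetical. The example matrix S is the x1, x2 block of the covariance data in Section 3, and the candidate Sigma is an illustrative diagonal (zero-covariance) model, not a model from the slides.

```python
import math


def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]


def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace2(A):
    return A[0][0] + A[1][1]


def f_ml(S, Sigma):
    """ML discrepancy: log|Sigma| + tr(S Sigma^-1) - log|S| - p, with p = 2."""
    return (math.log(det2(Sigma)) + trace2(matmul2(S, inv2(Sigma)))
            - math.log(det2(S)) - 2)


def f_gls(S, Sigma):
    """GLS discrepancy: (1/2) tr[(I - S^-1 Sigma)^2]."""
    M = matmul2(inv2(S), Sigma)
    R = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(2)]
         for i in range(2)]
    return 0.5 * trace2(matmul2(R, R))


def f_ols(S, Sigma):
    """OLS (unweighted) discrepancy: (1/2) tr[(S - Sigma)^2]."""
    R = [[S[i][j] - Sigma[i][j] for j in range(2)] for i in range(2)]
    return 0.5 * trace2(matmul2(R, R))


S = [[4.6656, 2.4383], [2.4383, 4.2436]]      # x1, x2 block of the data
Sigma0 = [[4.6656, 0.0], [0.0, 4.2436]]       # illustrative diagonal model

print(f_ml(S, Sigma0), f_gls(S, Sigma0), f_ols(S, Sigma0))
print(f_ml(S, S), f_gls(S, S), f_ols(S, S))   # all (numerically) zero
```

All three functions are zero when the model reproduces S exactly and positive otherwise, which is what makes them usable as discrepancy measures.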


2.4 Testing : Testing of Model

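⚪ Hypothesis :

$H_0: \Sigma = \Sigma(\theta)$ v.s. $H_1: \Sigma \neq \Sigma(\theta)$

⚪ Likelihood Ratio Test :

$$-2\log\lambda \rightarrow \chi^2(df)$$

where

- $\lambda = L_0 / L_1$ : Likelihood ratio (maximized likelihood under $H_0$ over that under $H_1$)

- $df = \frac{1}{2}(p+q)(p+q+1) - $ (# of elements in $\theta$)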


Testing of Individual Parameter

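⚪ Hypothesis:

$H_0: \theta_j = 0$ v.s. $H_1: \theta_j \neq 0$

where $\theta_j$ is the $j$th element in $\theta$.

⚪ Let

$\theta_{(-j)}$ = $\theta$ without the $j$th element

then

$$-2\log\frac{L(\hat\theta_{(-j)})}{L(\hat\theta)} \rightarrow \chi^2(1)$$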


3. Example Using SAS

◦ Salesperson’s Job-satisfaction (Bagozzi & Aaker, 1979)

(Ref: Analyzing Multivariate Data (2003), p.355 )

◦ In the study, they constructed a simple model of salesperson job satisfaction based on covariance data.


Covariance Data

/* Lower triangle of the sample covariance matrix, n = 106 */
data job (type=cov);
  input _type_ $ _name_ $ y1 y2 x1 x2;
  cards;
n   .  106     106     106     106
cov y1 11.7649 .       .       .
cov y2 6.2359  7.8961  .       .
cov x1 2.2004  1.7480  4.6656  .
cov x2 1.7947  1.6439  2.4383  4.2436
;
run;


Measurement Equation (1)

Independent Construct

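◦ One independent construct is ‘self-esteem’, a latent variable ($\xi$) measured by two different measures ($x_1$, $x_2$) :

$x_1 = \lambda_{x1}\xi + \delta_1$, $x_2 = \lambda_{x2}\xi + \delta_2$

◦ We assume that

$\mathrm{Var}(\xi) \equiv \phi$, $\mathrm{Var}(\delta_1) \equiv \theta_{\delta 1}$, $\mathrm{Var}(\delta_2) \equiv \theta_{\delta 2}$,

$\mathrm{Cov}(\delta_1, \delta_2) = 0$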



Dependent Construct

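◦ One dependent construct was salesperson “satisfaction” ($\eta$), which is measured by $y_1$, $y_2$ :

$y_1 = \lambda_{y1}\eta + \varepsilon_1$, $y_2 = \lambda_{y2}\eta + \varepsilon_2$

◦ We assume that

$\mathrm{Var}(\varepsilon_1) \equiv \theta_{\varepsilon 1}$, $\mathrm{Var}(\varepsilon_2) \equiv \theta_{\varepsilon 2}$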

[Figure: path diagram of the measurement equations for Self-esteem and Job satisfaction]

Structural Equation

◦ The structural equations describe the dependence relationships between the dependent latent variables and the independent latent variable :

$\eta = \gamma\xi + \zeta$ with $\mathrm{Var}(\zeta) \equiv \psi$

[Figure: path diagram, Self-esteem → Job satisfaction]


Structural Equations with Latent Variables

[Figure: full path diagram with latent variables, Self-esteem → Job satisfaction]


SAS Code

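proc calis data=job cov edf=106 method=ml;
  /* measurement equations (f1: self-esteem, f2: job satisfaction) */
  /* and the structural equation f2 = GAM1 f1 + d1 */
  LINEQS
    x1 = Lx1 f1 + ex1,
    x2 = Lx2 f1 + ex2,
    y1 = Ly1 f2 + ey1,
    y2 = Ly2 f2 + ey2,
    f2 = GAM1 f1 + d1;
  /* variances of the error and disturbance terms */
  STD
    ex1 ex2 = vx1 vx2,
    ey1 ey2 = vy1 vy2,
    d1 = vd1;
run;
quit;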


Warning

WARNING: The estimation problem is not identified: There are more parameters to estimate ( 11 ) than the total number of mean and covariance elements ( 10 ).

NOTE: The means of one or more variables in the input data set WORK.JOB are missing and are assumed to be 0.

NOTE: Convergence criterion (ABSGCONV=0.00001) satisfied.

NOTE: The Moore-Penrose inverse is used in computing the covariance matrix for parameter estimates.

WARNING: Standard errors and t values might not be accurate with the use of the Moore-Penrose inverse.

WARNING: Critical N is not computable for df= -1.


Additional Assumption

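◦ Assume that $\lambda_{y1} = 1$, $\mathrm{Var}(\xi) = 1$

proc calis data=job cov edf=106 method=ml;
  /* the loading of y1 on f2 is fixed at 1 */
  LINEQS
    x1 = Lx1 f1 + ex1,
    x2 = Lx2 f1 + ex2,
    y1 = f2 + ey1,
    y2 = Ly2 f2 + ey2,
    f2 = GAM1 f1 + d1;
  /* the variance of f1 is fixed at 1 */
  STD
    ex1 ex2 = vx1 vx2,
    ey1 ey2 = vy1 vy2,
    d1 = vd1,
    f1 = 1.0;
run;
quit;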


Fit Summary

Fit Function                        0.0025
Chi-Square                          0.2626
Chi-Square DF                       1
Pr > Chi-Square                     0.6084
Root Mean Square Residual (RMR)     0.0422
Standardized RMR (SRMR)             0.0063
Goodness of Fit Index (GFI)         0.9988

◦ We cannot reject the null hypothesis that our model is the same as the underlying model that generated these data.

◦ The result is corroborated by a goodness-of-fit index (GFI) of 0.9988.
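The reported p-value can be checked from the chi-square statistic and its single degree of freedom. The sketch below is a plain-Python check, not SAS output; it uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(x / 2)). The function name is hypothetical, and the formula holds for df = 1 only.

```python
import math


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Chi-square value taken from the fit summary above.
p = p_value_df1(0.2626)
print(round(p, 3))
```

The result should be close to the reported Pr > Chi-Square of 0.6084; a small difference in the fourth decimal is expected because the chi-square value itself is rounded in the output.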


Parameter Estimates

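Effects in Linear Equations

Variable  Predictor  Parameter  Estimate  Standard Error  t Value  Pr > |t|
x1        f1         Lx1        1.66349   0.29421         5.6540   <.0001
x2        f1         Lx2        1.46578   0.27031         5.4226   <.0001
y1        f2                    1.00000
y2        f2         Ly2        0.84238   0.21493         3.9192   <.0001
f2        f1         GAM1       1.28212   0.38106         3.3647   0.0008

◦ $x_1 = 1.66349 \times \xi + \delta_1$, $x_2 = 1.46578 \times \xi + \delta_2$

◦ $y_1 = \eta + \varepsilon_1$, $y_2 = 0.84238 \times \eta + \varepsilon_2$

◦ $\eta = 1.28212 \times \xi + \zeta$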
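The t values in the table are ratios of each estimate to its standard error. The sketch below recomputes them in plain Python, together with two-sided p-values. The function names are hypothetical, and the normal reference distribution is an assumption; it is not documented on the slides but reproduces the reported 0.0008 for GAM1. These ratio tests are a different, asymptotically equivalent, procedure from the likelihood ratio test for an individual parameter described in Section 2.4.

```python
import math


def wald_t(estimate, std_error):
    """Ratio of an estimate to its standard error."""
    return estimate / std_error


def two_sided_p(t_value):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(t_value) / math.sqrt(2.0))


# (estimate, standard error) pairs copied from the table above.
estimates = {
    "Lx1": (1.66349, 0.29421),
    "Lx2": (1.46578, 0.27031),
    "Ly2": (0.84238, 0.21493),
    "GAM1": (1.28212, 0.38106),
}

for name, (est, se) in estimates.items():
    t = wald_t(est, se)
    print(name, round(t, 3), round(two_sided_p(t), 4))
```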


Variance Estimates

Estimates for Variances of Exogenous Variables

Variable Type  Variable  Parameter  Estimate  Standard Error  t Value  Pr > |t|
Error          ex1       vx1        1.89841   0.82669         2.2964   0.0217
               ex2       vx2        2.09510   0.67365         3.1101   0.0019
               ey1       vy1        4.36214   1.87488         2.3266   0.0200
               ey2       vy2        2.64313   1.31188         2.0148   0.0439
Disturbance    d1        vd1        5.75893   1.88868         3.0492   0.0023
Latent         f1                   1.00000
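With the loading, path, and variance estimates in hand, the model-implied covariance matrix can be rebuilt and compared with the input covariance data. The following is a minimal plain-Python sketch, assuming the one-factor-per-construct model of this example (f1 measured by x1, x2; f2 measured by y1, y2; f2 regressed on f1; uncorrelated errors). The variable names mirror the SAS parameter names; the dictionary layout is an illustrative choice.

```python
import math

# Estimates copied from the SAS output on the slides.
lx1, lx2 = 1.66349, 1.46578   # loadings of x1, x2 on f1
ly1, ly2 = 1.0, 0.84238       # loadings of y1 (fixed at 1) and y2 on f2
gam1 = 1.28212                # path coefficient f1 -> f2
phi = 1.0                     # Var(f1), fixed at 1
vd1 = 5.75893                 # Var(d1)
vx1, vx2 = 1.89841, 2.09510   # error variances of x1, x2
vy1, vy2 = 4.36214, 2.64313   # error variances of y1, y2

var_f2 = gam1 ** 2 * phi + vd1   # Var(f2)
cov_f1_f2 = gam1 * phi           # Cov(f1, f2)

# Model-implied covariances (lower triangle).
implied = {
    ("y1", "y1"): ly1 ** 2 * var_f2 + vy1,
    ("y2", "y1"): ly2 * ly1 * var_f2,
    ("y2", "y2"): ly2 ** 2 * var_f2 + vy2,
    ("x1", "y1"): lx1 * cov_f1_f2 * ly1,
    ("x1", "y2"): lx1 * cov_f1_f2 * ly2,
    ("x1", "x1"): lx1 ** 2 * phi + vx1,
    ("x2", "y1"): lx2 * cov_f1_f2 * ly1,
    ("x2", "y2"): lx2 * cov_f1_f2 * ly2,
    ("x2", "x1"): lx2 * lx1 * phi,
    ("x2", "x2"): lx2 ** 2 * phi + vx2,
}

# Sample covariances from the input data set.
sample = {
    ("y1", "y1"): 11.7649,
    ("y2", "y1"): 6.2359,
    ("y2", "y2"): 7.8961,
    ("x1", "y1"): 2.2004,
    ("x1", "y2"): 1.7480,
    ("x1", "x1"): 4.6656,
    ("x2", "y1"): 1.7947,
    ("x2", "y2"): 1.6439,
    ("x2", "x1"): 2.4383,
    ("x2", "x2"): 4.2436,
}

residuals = {k: sample[k] - implied[k] for k in sample}

# Root mean square residual over the 10 distinct elements.
rmr = math.sqrt(sum(r ** 2 for r in residuals.values()) / len(residuals))

for key in sample:
    print(key, round(implied[key], 4), sample[key], round(residuals[key], 4))
print("RMR:", round(rmr, 4))
```

The within-construct variances and covariances are reproduced almost exactly, so the misfit sits in the four x-y covariances; the resulting root mean square residual should come out close to the 0.0422 reported in the fit summary.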


4. Criticisms in SEM

◦ Statistical Viewpoint : Concern about Overly Strong Assumptions

1. Data Generating Mechanism

2. Various Measurements

3. Causality

4. Latent Variables

5. Parameter Estimation

6. Model Testing

◦ Practical Viewpoint : Enjoys Abundant Results


5. SEM with Survey Data

⚪ Much of the data subject to analysis by SEMs comes from surveys

collected using complex samples.

⚪ But the majority of SEM analyses ignore the sample design and report results that implicitly assume simple random sampling.

⚪ The consequences of this practice depend on the degree of

departure from simple random sampling.


Pseudo-Maximum Likelihood Method

⚪ (Muthén & Satorra, 1995) Pseudo-maximum likelihood (PML) with linearization estimation of asymptotic covariance matrices is frequently advocated for estimating SEM models with complex survey data.

⚪ (Skinner, Holt, & Smith, 1989) PML consists of two parts:

(1) replacing sample covariances by weighted sample

covariances, and

(2) replacing inverse Fisher information with a sandwich

estimator of variance.
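The two parts can be written out in a generic form. The block below is a sketch under stated assumptions rather than the notation of the cited authors: $w_i$ are the survey weights, $z_i$ the observed vectors, $F_{ML}$ the ML fitting function, and $\hat{A}$, $\hat{B}$ are generic names for the two ingredients of the sandwich. Normalizing constants differ between references and are omitted here.

```latex
% Part (1): weighted sample covariance replaces the unweighted one
\bar{z}_w = \frac{\sum_{i} w_i z_i}{\sum_{i} w_i}, \qquad
S_w = \frac{\sum_{i} w_i (z_i - \bar{z}_w)(z_i - \bar{z}_w)'}{\sum_{i} w_i}, \qquad
\hat{\theta}_{PML} = \arg\min_{\theta} F_{ML}\bigl(S_w, \Sigma(\theta)\bigr)

% Part (2): sandwich estimator replaces the inverse Fisher information
\widehat{\mathrm{Var}}(\hat{\theta}_{PML}) = \hat{A}^{-1}\, \hat{B}\, \hat{A}^{-1}

% \hat{A}: second-derivative (Hessian) matrix of the weighted objective at \hat{\theta}_{PML}
% \hat{B}: design-based estimate of the covariance matrix of the weighted score
```

Under simple random sampling with equal weights, $\hat{B}$ is proportional to $\hat{A}$ and the sandwich collapses to the usual inverse-information form, which is why the correction matters only when the design departs from simple random sampling.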


Example of Weighted Sample Mean

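⚪ $y_1, \ldots, y_n$ : survey data

⚪ $w_1, \ldots, w_n$ : corresponding weights

⚪ Sample mean : $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

⚪ Weighted sample mean : $\bar{y}_w = \sum_{i=1}^{n} w_i y_i \big/ \sum_{i=1}^{n} w_i$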
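A short numerical illustration of the two means, in plain Python. The data values and weights are hypothetical, chosen only to show that unequal weights shift the estimate and that equal weights give back the ordinary sample mean.

```python
def sample_mean(y):
    """Unweighted sample mean."""
    return sum(y) / len(y)


def weighted_mean(y, w):
    """Weighted sample mean: sum(w_i * y_i) / sum(w_i)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Hypothetical survey values and weights.
y = [2.0, 4.0, 10.0]
w = [1.0, 1.0, 2.0]

print(sample_mean(y))                    # unweighted
print(weighted_mean(y, w))               # third unit counts twice
print(weighted_mean(y, [1.0, 1.0, 1.0])) # equal weights: same as sample mean
```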


Pseudo-Maximum Likelihood Method

⚪ (Stapleton, 2006; Asparouhov & Muthén, 2006; Asparouhov, 2005)

⚫ Not replacing the estimates by weighted estimates leads to

bias.

⚫ Not replacing Fisher information variance estimator with

sandwich estimator leads to wrong standard errors.


Thank You!
