Statistical Issues in Structural Equation Models - Seoul National University...
[Social Science Research Methodology Forum, 2016. 11. 17]
[2016 Social Science Research Methodology Forum]
Statistical Issues in Structural Equation Models
2016. 11. 17 (Thu)
Kyu-Seong Kim (김규성)
(Department of Statistics, University of Seoul, [email protected])
Contents
1. Introduction
2. Basics of Structural Equation Modeling
2.1 Model Specification
2.2 Model Identification
2.3 Parameter Estimation
2.4 Testing
3. An Example Using SAS
4. Criticisms in SEM
5. SEM with Survey Data
1. Introduction
Structural Equation Modeling (SEM)
⚪ (Kaplan, 2001) a class of methodologies that seeks to represent
hypotheses about the means, variances, and covariances of
observed data in terms of a smaller number of ‘structural’
parameters defined by a hypothesized underlying conceptual or
theoretical model.
⚪ Also known as:
Analysis of Covariance Structures
Causal Modeling
LISREL (Linear Structural RELations) Modeling
Developments of SEM (1)
1st stage : Early discipline-specific developments
⚪ Psychology : Factor Analysis
- Spearman (1904)
⚪ Human Genetics : Regression Analysis
- Galton (1889)
⚪ Biology : Path Modeling
- Wright (1934)
⚪ Economics : Simultaneous Equation Modeling
- Koopmans (1953), Wold (1954)
Developments of SEM (2)
2nd stage : Unification
◦ Cross-disciplinary fertilization between economics, sociology, and
psychology leading to an explosion of empirical applications of
SEM
- Jöreskog (1970, 1973), Goldberger and Duncan (1973)
3rd stage : Extensions
◦ A period of developing methods for handling discrete, ordinal, and
limited dependent variables
Developments of SEM (3)
4th stage : Recent developments
◦ A recent period of incorporating statistical advances into the SEM
framework, including
- generalized linear models,
- mixed effects models,
- mixture regression models,
- Bayesian methods,
- graphical models, and
- methods for identifying causal effects.
⚪ The recent period is substantially integrating SEM with the broader
statistical literature, which is making SEM an ever more exciting tool.
Developments of SEM (4) : Karimi et al. (2014)
Software Packages for SEM
⚪ LISREL (Linear Structural RELations)
⚪ AMOS (Analysis of MOment Structures)
⚪ EQS (Equations)
⚪ Mplus
⚪ CALIS (a procedure in SAS)
⚪ SEPATH (a module of Statistica)
⚪ SPSS
⚪ lavaan, sem (packages in R)
Some References
◦ 김청택. 2016. “Misuse and abuse of exploratory factor analysis and their correction” [in Korean]. 조사연구 (Survey Research) 17: 1-29.
◦ 김상욱. 2016. “Model estimation procedures in covariance structure analysis: methodological diagnosis and prescription” [in Korean]. 조사연구 (Survey Research) 17: 55-70.
◦ 이기종. 2016. “Misuse and abuse of model evaluation in structural equation models and their correction” [in Korean]. 조사연구 (Survey Research) 17: 71-83.
◦ Bollen, K.A. and Pearl, J. 2013. “Eight myths about causality and SEMs.” In S.L.
Morgan (ed.), Handbook of Causal Analysis for Social Research, 301-328.
◦ Fabrigar, L.R., Porter, R.D. and Norris, M.E. 2010. “Some things you should know
about structural equation modeling but never thought to ask.” Journal of
Consumer Psychology, 20: 221-225.
◦ Iacobucci, D. 2009. “Everything you always wanted to know about SEM but were
afraid to ask.” Journal of Consumer Psychology, 19: 673-680.
2. Basics of Structural Equation Modeling
2.1 Model Specification
2.2 Model Identification
2.3 Parameter Estimation
2.4 Testing
2.1 Model Specification
⚪ The general structural equation model consists of two parts:
(a) the structural part, which links latent variables to
each other via systems of simultaneous equations
(b) the measurement part, which links latent variables
to observed variables via a restricted (confirmatory) factor
model
Structural Part
⚪ Model
$\eta = B\eta + \Gamma\xi + \zeta$
where
- $\eta$ : a vector of endogenous (criterion) latent variables
- $\xi$ : a vector of exogenous (predictor) latent variables
- $B$ : a matrix of regression coefficients relating the latent
endogenous variables to each other
- $\Gamma$ : a matrix of regression coefficients relating the latent
endogenous variables to the exogenous variables
- $\zeta$ : a vector of disturbance terms
Example : Structural Part
[Path diagram: Self-esteem → Job satisfaction]
Structural Part
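⚪ In the structural part,
we assume that
$E(\zeta) = 0$, $\mathrm{Cov}(\xi, \zeta) = 0$, $(I - B)$ : non-singular
⚪ Then
$\eta = (I - B)^{-1}(\Gamma\xi + \zeta)$ and $\mathrm{Cov}(\eta) = (I - B)^{-1}(\Gamma\Phi\Gamma' + \Psi)[(I - B)^{-1}]'$,
where $\Phi = \mathrm{Cov}(\xi)$ and $\Psi = \mathrm{Cov}(\zeta)$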
Measurement Part
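⚪ Model
$y = \Lambda_y\eta + \varepsilon$, $x = \Lambda_x\xi + \delta$
where
- $y$ : a vector of observed dependent variables (indicators of $\eta$)
- $x$ : a vector of observed independent variables (indicators of $\xi$)
- $\Lambda_y$, $\Lambda_x$ : matrices of factor loadings
- $\varepsilon$, $\delta$ : vectors of errors
⚪ In the measurement part, assume that
$\zeta$, $\varepsilon$, $\delta$ : mutually uncorrelated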
Example : Measurement Part
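[Path diagram: Self-esteem ($\xi$) measured by $x_1$, $x_2$; Job satisfaction ($\eta$) measured by $y_1$, $y_2$]
$x_1 = \lambda_{x1}\xi + \delta_1$, $x_2 = \lambda_{x2}\xi + \delta_2$, $y_1 = \lambda_{y1}\eta + \varepsilon_1$, $y_2 = \lambda_{y2}\eta + \varepsilon_2$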
Example : SEM - Combination
[Path diagram: measurement and structural parts combined; Self-esteem → Job satisfaction]
Covariance Structure Model
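⚪ Combining the structural part and the measurement part leads to
$\Sigma = \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix}$
where
- $\Sigma_{yy} = \Lambda_y (I - B)^{-1}(\Gamma\Phi\Gamma' + \Psi)[(I - B)^{-1}]'\Lambda_y' + \Theta_\varepsilon$
- $\Sigma_{yx} = \Sigma_{xy}' = \Lambda_y (I - B)^{-1}\Gamma\Phi\Lambda_x'$
- $\Sigma_{xx} = \Lambda_x\Phi\Lambda_x' + \Theta_\delta$
with $\Phi = \mathrm{Cov}(\xi)$, $\Psi = \mathrm{Cov}(\zeta)$, $\Theta_\varepsilon = \mathrm{Cov}(\varepsilon)$, $\Theta_\delta = \mathrm{Cov}(\delta)$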
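The block structure of the implied covariance matrix can be sketched numerically. The following is a minimal illustration, assuming the standard LISREL-type forms $\Sigma_{yy} = \Lambda_y (I-B)^{-1}(\Gamma\Phi\Gamma' + \Psi)[(I-B)^{-1}]'\Lambda_y' + \Theta_\varepsilon$, $\Sigma_{yx} = \Lambda_y (I-B)^{-1}\Gamma\Phi\Lambda_x'$ and $\Sigma_{xx} = \Lambda_x\Phi\Lambda_x' + \Theta_\delta$. The function name `implied_covariance` and all numeric values are hypothetical and chosen only for illustration; they are not from the talk.

```python
import numpy as np

def implied_covariance(B, Gamma, Lambda_y, Lambda_x, Phi, Psi, Theta_eps, Theta_delta):
    """Model-implied covariance of z = (y', x')' for a general SEM (illustrative helper)."""
    m = B.shape[0]
    A = np.linalg.inv(np.eye(m) - B)                    # (I - B)^{-1}
    cov_eta = A @ (Gamma @ Phi @ Gamma.T + Psi) @ A.T   # Cov(eta)
    S_yy = Lambda_y @ cov_eta @ Lambda_y.T + Theta_eps
    S_yx = Lambda_y @ A @ Gamma @ Phi @ Lambda_x.T
    S_xx = Lambda_x @ Phi @ Lambda_x.T + Theta_delta
    return np.block([[S_yy, S_yx], [S_yx.T, S_xx]])

# Hypothetical toy model: one eta with two indicators, one xi with two indicators.
B = np.zeros((1, 1))
Gamma = np.array([[0.5]])
Lambda_y = np.array([[1.0], [0.8]])
Lambda_x = np.array([[1.0], [1.2]])
Phi = np.array([[2.0]])
Psi = np.array([[1.0]])
Theta_eps = np.diag([0.5, 0.5])
Theta_delta = np.diag([0.3, 0.3])

Sigma = implied_covariance(B, Gamma, Lambda_y, Lambda_x, Phi, Psi, Theta_eps, Theta_delta)
print(np.round(Sigma, 4))   # 4 x 4 matrix, variables ordered (y1, y2, x1, x2)
```

Every entry of the resulting matrix is a function of the model parameters only, which is what makes the pattern of fixed and free parameters testable against a sample covariance matrix.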
Covariance Structure Model
⚪ Elements of the parameter matrices ($B$, $\Gamma$, $\Lambda_y$, $\Lambda_x$, and the covariance matrices of $\xi$, $\zeta$, $\varepsilon$, $\delta$)
- Some are fixed to zero by hypothesis.
- The remaining parameters are free to be estimated.
⚪ The pattern of fixed and free parameters implies a specific
structure for the covariance matrix of the observed variables.
⚪ Let $\theta$ : the parameter vector containing all of the free parameters of
the model.
⇨ A structural equation model is a special case of the general
covariance structure with $\Sigma = \Sigma(\theta)$
2.2 Model Identification
⚪ The elements in $\theta$ are identified if they can be expressed uniquely
in terms of the elements of the covariance matrix $\Sigma$
⚪ Previous Example: under-identified
- # of distinct elements in $\Sigma$ = 10
- # of parameters in $\theta$ = 11
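The counting argument behind this slide can be written out in a few lines. This is a minimal sketch of the usual counting rule (a necessary, not a sufficient, condition for identification); the function names are hypothetical.

```python
def distinct_moments(p):
    """Number of distinct variances and covariances among p observed variables."""
    return p * (p + 1) // 2

def degrees_of_freedom(p, n_free_params):
    """Distinct covariance elements minus free parameters; negative means under-identified."""
    return distinct_moments(p) - n_free_params

p = 4                                  # observed variables y1, y2, x1, x2
print(distinct_moments(p))             # 10
print(degrees_of_freedom(p, 11))       # -1
```

With four observed variables there are 10 distinct covariance elements, so a model with 11 free parameters has -1 degrees of freedom and cannot be identified without additional constraints.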
2.3 Parameter Estimation (1)
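⚪ Parameter estimation means estimating the parameter vector $\theta$ of the covariance matrix
$\Sigma(\theta)$, subject to the constraints.
⚪ Let $z_1, \dots, z_n$ be random vectors from the model with $\mathrm{Cov}(z_i) = \Sigma(\theta)$, where $z = (y', x')'$,
and let $S$ be the sample covariance matrix :
$S = \frac{1}{n-1}\sum_{i=1}^{n}(z_i - \bar{z})(z_i - \bar{z})'$
⚪ In case of over-identification, estimates are obtained such that
the discrepancy between $S$ and $\Sigma(\theta)$ is minimized.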
Parameter Estimation (2) : Fitting Function
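⚪ ML method :
$F_{ML}(\theta) = \log|\Sigma(\theta)| + \mathrm{tr}\{S\Sigma(\theta)^{-1}\} - \log|S| - p$, where $p$ is the number of observed variables
⚪ GLS method :
$F_{GLS}(\theta) = \frac{1}{2}\mathrm{tr}\{[I - S^{-1}\Sigma(\theta)]^2\}$
⚪ OLS method :
$F_{OLS}(\theta) = \frac{1}{2}\mathrm{tr}\{[S - \Sigma(\theta)]^2\}$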
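To make the three discrepancy measures concrete, here is a minimal sketch that evaluates them for a given pair of matrices. It assumes the standard textbook forms $F_{ML} = \log|\Sigma| + \mathrm{tr}(S\Sigma^{-1}) - \log|S| - p$, $F_{GLS} = \frac{1}{2}\mathrm{tr}\{[I - S^{-1}\Sigma]^2\}$ and $F_{OLS} = \frac{1}{2}\mathrm{tr}\{[S - \Sigma]^2\}$; the function names and the two small matrices are hypothetical.

```python
import numpy as np

def f_ml(S, Sigma):
    """Maximum likelihood discrepancy between sample S and implied Sigma."""
    p = S.shape[0]
    logdet_sigma = np.linalg.slogdet(Sigma)[1]
    logdet_s = np.linalg.slogdet(S)[1]
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

def f_gls(S, Sigma):
    """Generalized least squares discrepancy, weighting residuals by S^{-1}."""
    R = np.eye(S.shape[0]) - np.linalg.solve(S, Sigma)   # I - S^{-1} Sigma
    return 0.5 * np.trace(R @ R)

def f_ols(S, Sigma):
    """Ordinary (unweighted) least squares discrepancy."""
    R = S - Sigma
    return 0.5 * np.trace(R @ R)

S = np.array([[2.0, 0.6], [0.6, 1.0]])       # hypothetical sample covariance
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical model-implied covariance

for name, f in [("ML", f_ml), ("GLS", f_gls), ("OLS", f_ols)]:
    print(name, f(S, Sigma))
```

All three functions return zero when $\Sigma(\theta) = S$ and grow as the implied matrix moves away from the sample matrix; in practice an optimizer searches over $\theta$ to minimize the chosen function.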
2.4 Testing : Testing of Model
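⚪ Hypothesis :
$H_0 : \Sigma = \Sigma(\theta)$ v.s. $H_1 : \Sigma \neq \Sigma(\theta)$
⚪ Likelihood Ratio Test :
$-2\log\lambda \to \chi^2_{df}$
where
- $\lambda = \max_{H_0} L / \max_{H_1} L$ : Likelihood ratio
- $df = \frac{1}{2}p(p+1) -$ (# of elements in $\theta$), with $p$ the number of observed variables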
Testing of Individual Parameter
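⚪ Hypothesis:
$H_0 : \theta_j = 0$ v.s. $H_1 : \theta_j \neq 0$
where $\theta_j$ is the $j$th element in $\theta$.
⚪ Let
$\theta_{(j)}$ = $\theta$ without the $j$th element
then
$-2\log\{L(\hat\theta_{(j)}) / L(\hat\theta)\} \to \chi^2_1$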
3. Example Using SAS
◦ Salesperson’s Job-satisfaction (Bagozzi & Aaker, 1979)
(Ref: Analyzing Multivariate Data (2003), p. 355)
◦ In the study, they constructed a simple model of salesperson job
satisfaction based on covariance data.
Covariance Data
data job (type=cov);
  input _type_ $ _name_ $ y1 y2 x1 x2;
  cards;
n   .  106     106     106     106
cov y1 11.7649 .       .       .
cov y2 6.2359  7.8961  .       .
cov x1 2.2004  1.7480  4.6656  .
cov x2 1.7947  1.6439  2.4383  4.2436
;
run;
Measurement Equation (1)
Independent Construct
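◦ One independent construct is ‘self-esteem’ as a latent variable ($\xi$),
for which two different measures ($x_1$, $x_2$) are used :
$x_1 = \lambda_{x1}\xi + \delta_1$, $x_2 = \lambda_{x2}\xi + \delta_2$
◦ We assume that
$\mathrm{Var}(\xi) \equiv \phi$, $\mathrm{Var}(\delta_1) \equiv \theta_{\delta 1}$, $\mathrm{Var}(\delta_2) \equiv \theta_{\delta 2}$,
and $\mathrm{Cov}(\delta_1, \delta_2) = 0$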
Measurement Equation (2)
Dependent Construct
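◦ One dependent construct was salesperson “satisfaction” ($\eta$), which
is measured by $y_1$, $y_2$ :
$y_1 = \lambda_{y1}\eta + \varepsilon_1$, $y_2 = \lambda_{y2}\eta + \varepsilon_2$
◦ We assume that
$\mathrm{Var}(\varepsilon_1) \equiv \theta_{\varepsilon 1}$, $\mathrm{Var}(\varepsilon_2) \equiv \theta_{\varepsilon 2}$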
Measurement Equation (3)
[Path diagram: Self-esteem ($\xi$) with indicators $x_1$, $x_2$; Job satisfaction ($\eta$) with indicators $y_1$, $y_2$]
Structural Equation
◦ The structural equation describes the dependence relationship
between the dependent latent variable and the independent
latent variable :
$\eta = \gamma\xi + \zeta$ with $\mathrm{Var}(\zeta) \equiv \psi$
[Path diagram: Self-esteem → Job satisfaction]
Structural Equations with Latent Variables
[Path diagram: full model combining the measurement and structural equations; Self-esteem → Job satisfaction]
SAS Code
proc calis data=job cov edf=106 method=ml;
  LINEQS
    x1 = Lx1 f1 + ex1,
    x2 = Lx2 f1 + ex2,
    y1 = Ly1 f2 + ey1,
    y2 = Ly2 f2 + ey2,
    f2 = GAM1 f1 + d1;
  STD
    ex1 ex2 = vx1 vx2,
    ey1 ey2 = vy1 vy2,
    d1 = vd1;
run;
quit;
Warning
WARNING: The estimation problem is not identified: There are more parameters to
estimate ( 11 ) than the total number of mean and covariance elements ( 10 ).
NOTE: The means of one or more variables in the input data set WORK.JOB are missing
and are assumed to be 0.
NOTE: Convergence criterion (ABSGCONV=0.00001) satisfied.
NOTE: The Moore-Penrose inverse is used in computing the covariance matrix for
parameter estimates.
WARNING: Standard errors and t values might not be accurate with the use of the
Moore-Penrose inverse.
WARNING: Critical N is not computable for df= -1.
Additional Assumption
◦ Assume that $\lambda_{y1} = 1$, $\mathrm{Var}(\xi) = 1$
proc calis data=job cov edf=106 method=ml;
  LINEQS
    x1 = Lx1 f1 + ex1,
    x2 = Lx2 f1 + ex2,
    y1 = f2 + ey1,
    y2 = Ly2 f2 + ey2,
    f2 = GAM1 f1 + d1;
  STD
    ex1 ex2 = vx1 vx2,
    ey1 ey2 = vy1 vy2,
    d1 = vd1,
    f1 = 1.0;
run;
quit;
Fit Summary
Fit Function 0.0025
Chi-Square 0.2626
Chi-Square DF 1
Pr > Chi-Square 0.6084
Root Mean Square Residual (RMR) 0.0422
Standardized RMR (SRMR) 0.0063
Goodness of Fit Index (GFI) 0.9988
◦ We cannot reject the null hypothesis that our model is the same
as the underlying model that generated these data.
◦ The result is corroborated by a goodness-of-fit index (GFI)
of 0.9988.
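The reported p-value can be checked directly from the chi-square statistic. The sketch below uses the fact that, for 1 degree of freedom, the upper-tail chi-square probability equals `erfc(sqrt(x / 2))`; the helper name `chi2_sf_df1` is hypothetical, and the input is the rounded value from the fit summary, so the result should only be expected to agree with the reported 0.6084 up to rounding.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

chi_square = 0.2626                 # Chi-Square from the fit summary
p_value = chi2_sf_df1(chi_square)
print(f"Pr > Chi-Square is approximately {p_value:.4f}")
```

A p-value this large means the data give no evidence against the hypothesized covariance structure, which is the conclusion stated on the slide.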
Parameter Estimates
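Effects in Linear Equations
Variable  Predictor  Parameter  Estimate  Standard Error  t Value  Pr > |t|
x1        f1         Lx1        1.66349   0.29421         5.6540   <.0001
x2        f1         Lx2        1.46578   0.27031         5.4226   <.0001
y1        f2                    1.00000
y2        f2         Ly2        0.84238   0.21493         3.9192   <.0001
f2        f1         GAM1       1.28212   0.38106         3.3647   0.0008
◦ $x_1 = 1.66349 \times \xi + \delta_1$, $x_2 = 1.46578 \times \xi + \delta_2$
◦ $y_1 = \eta + \varepsilon_1$, $y_2 = 0.84238 \times \eta + \varepsilon_2$
◦ $\eta = 1.28212 \times \xi + \zeta$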
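The t values and p-values in the table can be reproduced from the estimates and standard errors. This is a minimal sketch assuming the t value is the estimate divided by its standard error and the p-value is two-sided under a standard normal reference distribution (a Wald-type check, not the likelihood ratio test for an individual parameter described earlier). The function name `wald_test` is hypothetical; the inputs are the rounded values printed in the table, so small last-digit differences are expected.

```python
import math

def wald_test(estimate, std_error):
    """Return the t value and a two-sided normal p-value for H0: parameter = 0."""
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p

# (estimate, standard error) as printed in the table
reported = {
    "Lx1": (1.66349, 0.29421),
    "Lx2": (1.46578, 0.27031),
    "Ly2": (0.84238, 0.21493),
    "GAM1": (1.28212, 0.38106),
}

for name, (est, se) in reported.items():
    t, p = wald_test(est, se)
    print(f"{name}: t = {t:.4f}, p = {p:.4f}")
```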
Variance Estimates
Estimates for Variances of Exogenous Variables
Variable Type  Variable  Parameter  Estimate  Standard Error  t Value  Pr > |t|
Error          ex1       vx1        1.89841   0.82669         2.2964   0.0217
               ex2       vx2        2.09510   0.67365         3.1101   0.0019
               ey1       vy1        4.36214   1.87488         2.3266   0.0200
               ey2       vy2        2.64313   1.31188         2.0148   0.0439
Disturbance    d1        vd1        5.75893   1.88868         3.0492   0.0023
Latent         f1                   1.00000
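As a consistency check, the reported estimates can be plugged back into the model to see how closely the implied covariance matrix reproduces the input covariances. The sketch below assumes the model from the SAS code (each observed variable loads on one factor, uncorrelated errors, the loading of y1 and the variance of f1 fixed at 1) and the standard ML discrepancy $\log|\Sigma| + \mathrm{tr}(S\Sigma^{-1}) - \log|S| - p$. Variable names are hypothetical, and because the inputs are rounded the results should agree with the SAS output only approximately.

```python
import numpy as np

# Sample covariance matrix from the SAS data step, ordered (y1, y2, x1, x2)
S = np.array([
    [11.7649, 6.2359, 2.2004, 1.7947],
    [ 6.2359, 7.8961, 1.7480, 1.6439],
    [ 2.2004, 1.7480, 4.6656, 2.4383],
    [ 1.7947, 1.6439, 2.4383, 4.2436],
])

# Reported estimates
ly = [1.0, 0.84238]                    # loadings of y1, y2 on f2 (y1 fixed at 1)
lx = [1.66349, 1.46578]                # loadings of x1, x2 on f1
gam, phi, psi = 1.28212, 1.0, 5.75893  # GAM1, Var(f1), Var(d1)
theta = np.diag([4.36214, 2.64313, 1.89841, 2.09510])  # vy1, vy2, vx1, vx2

# Covariance of the latent variables (f2, f1), with f2 = gam * f1 + d1
var_f2 = gam**2 * phi + psi
cov_latent = np.array([[var_f2, gam * phi], [gam * phi, phi]])

# Loading matrix: rows (y1, y2, x1, x2), columns (f2, f1)
Lambda = np.zeros((4, 2))
Lambda[:2, 0] = ly
Lambda[2:, 1] = lx

Sigma = Lambda @ cov_latent @ Lambda.T + theta   # model-implied covariance
residual = S - Sigma

p = S.shape[0]
f_ml = (np.linalg.slogdet(Sigma)[1] + np.trace(S @ np.linalg.inv(Sigma))
        - np.linalg.slogdet(S)[1] - p)

print(np.round(Sigma, 4))
print(np.round(residual, 4))
print("ML fit function:", f_ml)
```

The variances and the within-construct covariances are reproduced almost exactly, while the four covariances between the x and y indicators carry the small residuals; the ML discrepancy should come out close to the Fit Function value of 0.0025 in the fit summary.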
4. Criticisms in SEM
◦ Statistical Viewpoint : Concern about overly strong assumptions
1. Data Generating Mechanism
2. Various Measurements
3. Causality
4. Latent Variables
5. Parameter Estimation
6. Model Testing
◦ Practical Viewpoint : Enjoys abundant results
5. SEM with Survey Data
⚪ Much of the data subject to analysis by SEMs comes from surveys
collected using complex samples.
⚪ But the majority of SEM analyses ignore the sample design and
report results that implicitly assume simple random sampling.
⚪ The consequences of this practice depend on the degree of
departure from simple random sampling.
Pseudo-Maximum Likelihood Method
⚪ (Muthén & Satorra, 1995) Pseudo-maximum likelihood (PML) for
linearization estimation of asymptotic covariance matrices is
frequently advocated for estimating SEMs with complex
survey data
⚪ (Skinner, Holt, & Smith, 1989) PML consists of two parts:
(1) replacing sample covariances by weighted sample
covariances, and
(2) replacing inverse Fisher information with a sandwich
estimator of variance.
Example of Weighted Sample Mean
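⚪ $y_1, \dots, y_n$ : survey data
⚪ $w_1, \dots, w_n$ : corresponding weights
⚪ Sample mean : $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
⚪ Weighted sample mean : $\bar{y}_w = \sum_{i=1}^{n} w_i y_i / \sum_{i=1}^{n} w_i$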
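The difference between the two means, and the analogous weighted covariance used in the first PML step, can be sketched as follows. The function names and the toy data are hypothetical, and dividing the weighted covariance by the sum of the weights is one common convention assumed here, not necessarily the one used in a given SEM package.

```python
def weighted_mean(y, w):
    """Weighted sample mean: sum(w_i * y_i) / sum(w_i)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def weighted_cov(x, y, w):
    """Weighted sample covariance, normalized by the sum of the weights."""
    mx, my = weighted_mean(x, w), weighted_mean(y, w)
    return sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sum(w)

y = [2.0, 4.0, 6.0]      # hypothetical survey responses
w = [1.0, 1.0, 2.0]      # hypothetical survey weights

print(sum(y) / len(y))        # unweighted mean: 4.0
print(weighted_mean(y, w))    # weighted mean: 4.5
print(weighted_cov(y, y, w))  # weighted variance of y
```

With equal weights the weighted quantities reduce to their unweighted counterparts, so any difference between the two reflects the unequal selection probabilities encoded in the weights.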
Pseudo-Maximum Likelihood Method
⚪ (Stapleton, 2006; Asparouhov & Muthén, 2006; Asparouhov, 2005).
⚫ Not replacing the estimates by weighted estimates leads to
bias.
⚫ Not replacing Fisher information variance estimator with
sandwich estimator leads to wrong standard errors.
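The sandwich idea can be illustrated on the simplest possible case, the weighted mean, rather than on a full SEM. The sketch below treats the weighted mean as the solution of the estimating equation $\sum_i w_i (y_i - \mu) = 0$ and forms the variance as bread$^{-1}$ × meat × bread$^{-1}$. It assumes independent observations and ignores stratification, clustering, and finite population corrections, so it is only a toy version of the linearization estimators used with complex survey data; the function name and data are hypothetical.

```python
import math

def weighted_mean_sandwich(y, w):
    """Weighted mean with a sandwich (linearization) standard error."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    bread = total_w                                              # minus the derivative of the estimating function
    meat = sum((wi * (yi - mean)) ** 2 for wi, yi in zip(w, y))  # sum of squared score contributions
    variance = meat / bread ** 2
    return mean, math.sqrt(variance)

y = [2.0, 4.0, 6.0, 8.0]   # hypothetical survey responses
w = [1.0, 1.0, 3.0, 3.0]   # hypothetical survey weights

mean, se = weighted_mean_sandwich(y, w)
print(mean, se)
```

Because the meat term depends on the squared weighted residuals, unequal weights change the standard error as well as the point estimate, which is why a variance formula that ignores the weights can give wrong standard errors.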
Thank You!