Ridge Structural Equation Modeling with Correlation Matrices
for Ordinal and Continuous Data∗
Ke-Hai Yuan
University of Notre Dame
Rui Wu
Beihang University
Peter M. Bentler
University of California, Los Angeles
August 27, 2009
∗This research was supported by Grants DA00017 and DA01070 from the National In-
stitute on Drug Abuse and a grant from the National Natural Science Foundation of China
(30870784). The third author acknowledges a financial interest in EQS and its distribu-
tor, Multivariate Software. Correspondence concerning this article should be addressed to
Ke-Hai Yuan ([email protected]).
Abstract
This paper develops a ridge procedure for structural equation modeling (SEM) with
ordinal and continuous data by modeling polychoric/polyserial/product-moment correlation
matrix R. Rather than directly fitting R, the procedure fits a structural model to Ra =
R + aI by minimizing the normal-distribution-based discrepancy function, where a > 0.
Statistical properties of the parameter estimates are obtained. Four statistics for overall
model evaluation are proposed. Empirical results indicate that the ridge procedure for SEM
with ordinal data has better convergence rate, smaller bias, smaller mean square error and
better overall model evaluation than the widely used maximum likelihood procedure.
Keywords: Polychoric correlation, bias, efficiency, convergence, mean square error, overall
model evaluation.
1. Introduction
In social science research data are typically obtained by questionnaires in which respon-
dents are asked to choose one of a few categories on each of many items. Measurements are
obtained by coding the categories using 0 and 1 for dichotomized items or 1 to m for items
with m categories. Because the difference between 1 and 2 cannot be regarded as equivalent
to the difference between m − 1 and m, such obtained measurements only possess ordinal
properties. Pearson product-moment correlations cannot reflect the proper association for
items with ordinal data. Polychoric correlations are more appropriate if the ordinal variables
can be regarded as categorization from an underlying continuous normal distribution. Under
such an assumption, each observed ordinal random variable x is related to an underlying
continuous random variable z according to
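$$
x = \begin{cases}
1 & \text{when } z \in (-\infty, \tau_1], \\
2 & \text{when } z \in (\tau_1, \tau_2], \\
\vdots & \\
m & \text{when } z \in (\tau_{m-1}, \infty),
\end{cases}
$$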
where τ0 = −∞ < τ1 < . . . < τm−1 < τm = ∞ are thresholds. All the continuous variables
together form a vector z = (z1, z2, . . . , zp)′ that follows a multivariate normal distribution
Np(µ,Σ), where µ = 0 and Σ = (ρij) is a correlation matrix due to identification considera-
tions. When such an assumption holds, polychoric correlations are consistent, asymptotically
normally distributed and their standard errors (SE) can also be consistently estimated (Ols-
son, 1979; Poon & Lee, 1987). On the other hand, the Pearson product-moment correlation
is generally biased, especially when the number of categories is small and the observed fre-
quencies of the marginal distributions are skewed. Simulation studies imply that polychoric
correlations also possess certain robust properties when the underlying continuous distribu-
tion departs from normality (see e.g., Quiroga, 1992).
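As an illustration only (it is not part of the original analysis), the threshold model above can be sketched in Python with NumPy. The function name `categorize`, the thresholds 0.5 and −0.5, the latent correlation 0.6 and the sample size are all hypothetical choices made for this sketch. It shows how cutting a latent bivariate normal sample at thresholds attenuates the Pearson correlation of the resulting ordinal codes.

```python
import numpy as np

def categorize(z, thresholds):
    """Map latent scores z to ordinal codes 1..m given cut points tau_1 < ... < tau_{m-1}."""
    # side="left" counts the thresholds strictly below z, so z == tau_k stays in (tau_{k-1}, tau_k]
    return np.searchsorted(np.asarray(thresholds), z, side="left") + 1

rng = np.random.default_rng(0)
rho = 0.6  # hypothetical latent correlation
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=20000)

# Dichotomize each latent variable at its own (hypothetical) threshold
x = np.column_stack([categorize(z[:, 0], [0.5]), categorize(z[:, 1], [-0.5])])

pearson_latent = np.corrcoef(z.T)[0, 1]   # close to rho
pearson_ordinal = np.corrcoef(x.T)[0, 1]  # attenuated by the categorization
```

In such a sketch the product-moment correlation of the dichotomized scores is noticeably smaller than the latent correlation, which is the bias that polychoric correlations are meant to avoid.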
Because item level data in social sciences are typically ordinal, structural equation mod-
eling (SEM) for such data has long been developed. Bock and Lieberman (1970) developed
a maximum likelihood (ML) approach to factor analysis with dichotomous data and a single
factor. Lee, Poon and Bentler (1990) extended this approach to general SEM with poly-
tomous variables. Because the ML approach involves the evaluation of multiple integrals,
it is computationally intensive. Instead of ML, Christoffersson (1975) and Muthen (1978)
proposed procedures of fitting a multiple-factor model using pairwise frequencies for dichoto-
mous data by generalized least squares with an asymptotically correct weight matrix (AGLS).
Muthen (1984) further formulated a general procedure for SEM with ordinal and continuous
data using AGLS, which forms the basis of LISCOMP (an early version of Mplus, Muthen &
Muthen, 2007). An AGLS approach for SEM with ordinal data was developed in Lee, Poon
and Bentler (1992), where thresholds, polychoric and polyserial variances-covariances were
estimated by ML. Lee, Poon and Bentler (1995) further formulated another AGLS approach
in which thresholds, polychoric and polyserial correlations are estimated by partition ML.
The approach of Lee et al. (1995) has been implemented in EQS (Bentler, 1995) with various
extensions for better statistical inference. Joreskog (1994) also gave the technical details to
SEM with ordinal variables, where thresholds are estimated using marginal frequencies and
followed by the estimation of polychoric correlations using pairwise frequencies and holding
the estimated thresholds constant. Joreskog’s development formed the basis for ordinal data
in LISREL1 (see e.g., Joreskog 1990; Joreskog & Sorbom, 1996). Technical details of the de-
velopment in Muthen (1984) were provided by Muthen and Satorra (1995). Recently, Bollen
and Maydeu-Olivares (2007) proposed a procedure using polychoric instrumental variables.
In summary, various technical developments have been made for SEM with ordinal data,
emphasizing AGLS.
Methods currently available in software are two-stage procedures where polychoric and
polyserial correlations are obtained first. This correlation matrix is then modeled with
SEM using ML, AGLS, the normal-distribution-based GLS (NGLS), least squares (LS), and
diagonally-weighted least squares (DWLS), as implemented in EQS, LISREL and Mplus.
We need to note that the ML procedure in software fits a structural model to a poly-
choric/polyserial correlation matrix by minimizing the normal distribution based discrep-
ancy function, treating the polychoric/polyserial correlation matrix as a sample covariance
matrix from a normally distributed sample, which is totally different from the ML procedure
considered by Lee et al. (1990). We will refer to this ML method in SEM software as the ML
method from now on. The main purpose of this paper is to develop a ridge procedure for
1 In LISREL, the AGLS procedure is called weighted least squares (WLS) while GLS is reserved for the GLS procedure when the weight matrix is obtained using the normal distribution assumption as in covariance structure analysis (Browne, 1974). We will further discuss the normal-distribution-based GLS in the concluding section.
SEM with ordinal data that has a better convergence rate, smaller bias, smaller mean square
error (MSE) and better overall model evaluation than ML. We next briefly review studies on
the empirical behavior of several procedures to identify the limitation and strength of each
method and to motivate our study and development.
Babakus, Ferguson and Joreskog (1987) studied the procedure of ML with modeling four
different kinds of correlations (product-moment, polychoric, Spearman’s rho, and Kendall’s
tau-b) and found that ML with polychoric correlations provides the most accurate estimates
of parameters with respect to bias and MSE, but it is also associated with most noncon-
vergences. Rigdon and Ferguson (1991) studied modeling polychoric correlation matrices
with several discrepancy functions and found that distribution shape of the ordinal data,
sample size and fitting function all affect convergence rate. In particular, AGLS has the
most serious problem of convergence and improper solutions, especially when the sample
size is small. ML generates the most accurate estimates at sample size n = 500 and nearly
the most accurate parameter estimates at n = 300, as reported in Table 2 of the
paper. Potthast (1993) only studied the AGLS estimator and found that the resulting SEs
are substantially underestimated while the associated chi-square statistic is substantially in-
flated. Potthast also found that the AGLS estimators contain positive biases. Dolan (1994)
studied ML and AGLS with polychoric correlations and found that ML with polychoric cor-
relations produces the least biased parameter estimates while the AGLS estimator contains
substantial biases. DiStefano (2002) studied the performance of AGLS and also found that
the resulting SEs are substantially underestimated while the associated chi-square statistic
is substantially inflated. There also exist many nonconvergence problems and improper solutions. The literature also shows that, for ML with polychoric correlations, the resulting
SEs and test statistics behave badly because, without correction, the formulas for SEs and
test statistics are not correct. Currently, when modeling polychoric/polyserial correlation
matrices, software has the option of calculating SEs based on a sandwich-type covariance
matrix and using rescaled or adjusted statistics for overall model evaluation. These corrected
versions are often called robust procedures in the literature. A recent study by Lei (2009)
on ML in EQS and DWLS in Mplus found that DWLS has a better convergence rate than
ML. She also found that relative biases of ML and DWLS parameter estimates were similar
conditioned on the study factors. She concluded (p. 505)
ML performed slightly better in standard error estimation (at smaller sample
sizes before it started to over-correct) while robust WLS provided slightly better
overall Type I error control and higher power in detecting omitted paths. In cases
when sample size is adequately large for the model size, especially when the model
is also correctly specified, it would matter little whether ML of EQS6 or robust
WLS of Mplus3.1 is chosen. However, when sample sizes are very small for the
model and the ordinal variables are moderately skewed, ML with Satorra-Bentler
scaled statistics may be recommended if proper solutions are obtainable.
In summary, ML and AGLS are the most widely studied procedures. The latter cannot be
trusted when the sample size is not large enough, although conditional on the correlation matrix it is
asymptotically the best procedure. These studies indicate that robust ML and robust DWLS
are promising procedures for SEM with ordinal data. Comparing robust ML and robust
DWLS, the former uses a weight that is determined by the normal distribution assumption
while the latter uses a diagonal weight matrix that treats all the correlations as independent.
The ridge procedure to be studied can be regarded as a combination of ML and LS.
One problem with ML is its convergence rate. This is partially because the polychoric
correlations are obtained from different marginals, and the resulting correlation matrix may
not be positive definite, especially when the sample size is not large enough and there are
many items. As a matter of fact, the normal-distribution-based discrepancy function can-
not take a polychoric correlation matrix that is not positive definite because the involved
logarithm function cannot take non-positive values. When the correlation matrix is near
singular and is still positive definite, the model implied matrix will need to mimic the near
singular correlation (data) matrix so that the estimation problem becomes ill-conditioned
(see e.g., Kelley, 1995), which results not only in slower or nonconvergence but also unstable
parameter estimates and unstable test statistics (Yuan & Chan, 2008). Such a phenomenon
can also happen to the sample covariance matrix when the sample size is small or when
the elements of the covariance matrix are obtained by ad-hoc procedures (see e.g., Wothke,
1993). Although smoothing the eigenvalues and imposing a constraint of positive definiteness
is possible (e.g., Knol & ten Berge, 1989), the statistical consequences have not been
worked out and remain unknown.
When a covariance matrix S is near singular, the matrix Sa = S + aI with a positive
scalar a will be positive definite and well-conditioned. Yuan and Chan (2008) proposed to
model Sa rather than S, using the normal distribution based discrepancy function. They
showed that the procedure results in consistent parameter estimates. Empirical results indicate
that the procedure not only converges better but also, at small sample sizes, yields
parameter estimates that are more accurate than the ML estimator (MLE) even when data are
normally distributed. Compared to modeling sample covariance matrices, modeling correlations
typically encounters more problems of convergence with smaller sample sizes, especially
for ordinal data that are skewed. Actually, both EQS and LISREL contain warnings
about proper application of modeling polychoric correlations when sample size is small.
Thus, the methodology in Yuan and Chan (2008) may be even more relevant to the analysis
of polychoric correlations than to sample covariance matrices. The aim of this paper is to
extend the procedure in Yuan and Chan (2008) to SEM with ordinal and continuous data
by modeling polychoric/polyserial/product-moment correlation matrices.
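The conditioning argument can be made concrete with a small numerical sketch in Python. The 3 × 3 matrix below is a hypothetical near-singular correlation matrix chosen for illustration, and `ridge` is a helper name introduced here, not a function from any SEM package.

```python
import numpy as np

# Hypothetical near-singular correlation matrix (smallest eigenvalue is 0.01)
R = np.array([[1.00, 0.99, 0.70],
              [0.99, 1.00, 0.70],
              [0.70, 0.70, 1.00]])

def ridge(R, a):
    """Return R_a = R + a I."""
    return R + a * np.eye(R.shape[0])

cond_R = np.linalg.cond(R)               # ratio of largest to smallest eigenvalue of R
cond_Ra = np.linalg.cond(ridge(R, 0.2))  # every eigenvalue is shifted up by a = 0.2
```

Because adding aI shifts every eigenvalue by a, the smallest eigenvalue moves away from zero and the condition number drops sharply in this example.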
When a p × p sample covariance matrix S = (sij) is singular, the program LISREL
provides an option of modeling S + a diag(s11, · · · , spp), which is called the ridge option
(Joreskog & Sorbom, 1996, p. 24). With a correlation matrix R, the ridge option in LISREL
fits the structural model to R + aI, which is the same as extending the procedure in Yuan
and Chan (2008) to correlation matrices. Thus, ridge SEM with ordinal data has already
been implemented in LISREL. However, it is not clear how to properly apply this procedure
in practice, due to lack of studies of its properties. Actually, McQuitty (1997) conducted
empirical studies on the ridge option in LISREL 8 and concluded that (p. 251) “there appears
to be ample evidence that structural equation models should not be estimated with LISREL’s
ridge option unless the estimation of unstandardized factor loadings is the only goal.” One of
the contributions of this paper is to obtain statistical properties of ridge SEM with ordinal
data and to make it a statistically sound procedure. We will show that ridge SEM with
ordinal data enjoys consistent parameter estimates and consistent SEs. We will also propose
four statistics for overall model evaluation. Because ridge SEM is most useful when the
polychoric/polyserial correlation matrix is near singular, which tends to occur with smaller
sample sizes, we will conduct a Monte Carlo study to see how ridge SEM performs with
respect to bias and efficiency of parameter estimates. We will also empirically identify the
most reliable statistics for overall model evaluation and evaluate the performance of formula-
based SEs.
Section 2 provides the details of the development for model inference, including consistent
parameter estimates and SEs as well as rescaled and adjusted statistics for overall model
evaluation. Monte Carlo results are presented in section 3. Section 4 contains a real data
example. Conclusion and discussion are offered at the end of the paper.
2. Model Inference
Let R be a p×p correlation matrix, including polychoric, polyserial and Pearson product-
moment correlations for ordinal and continuous variables. Let r be the vector of all the
correlations formed by the below-diagonal elements of R and ρ be the population counterpart
of r. Then it follows from Joreskog (1994), Lee et al. (1995) or Muthen and Satorra (1995)
that
$$\sqrt{n}\,(r - \rho) \xrightarrow{\mathcal{L}} N(0, \Upsilon), \tag{1}$$
where $\xrightarrow{\mathcal{L}}$ denotes convergence in distribution and Υ is the asymptotic covariance matrix
of $\sqrt{n}\,r$ that can be consistently estimated. For SEM with ordinal data, we will have a
correlation structure Σ(θ). As mentioned in the introduction, popular SEM software has
the option of modeling R by minimizing
$$F_{ML}(\theta) = \mathrm{tr}[R\Sigma^{-1}(\theta)] - \log|R\Sigma^{-1}(\theta)| - p \tag{2}$$
for parameter estimates θ. Such a procedure is just to replace the sample covariance matrix
S by the correlation matrix R in the most commonly used ML procedure for SEM. Equations
(1) and (2) can be compared to covariance structure analysis when S is based on a sample
from an unknown distribution. Actually, the same amount of information is provided in both
cases, where Υ’s need to be estimated using fourth-order moments. Similar to modeling
covariance matrices, (2) needs R to be positive definite. Otherwise, the term log |RΣ−1(θ)| is not defined.
For a positive a, let Ra = R + aI. Instead of minimizing (2), ridge SEM minimizes
$$F_{MLa}(\theta_a) = \mathrm{tr}[R_a\Sigma_a^{-1}(\theta_a)] - \log|R_a\Sigma_a^{-1}(\theta_a)| - p \tag{3}$$
for parameter estimates θa, where Σa(θa) = Σ(θa) + aI. We will show that θa is consistent
and asymptotically normally distributed. Notice that corresponding to Ra is a population
covariance matrix Σa = Σ+ aI, which has identical off-diagonal elements with Σ. Although
minimizing (2) for θ with a = 0 for categorical data is available in software, we cannot find
any documentation of its statistical properties. Our development will be for an arbitrary
positive a, including the ML procedure when a = 0. Parallel to Yuan and Chan (2008), the
following technical development will be within the context of LISREL models.
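To make the two discrepancy functions concrete, the following is a minimal Python sketch of (2) and (3). The function names `f_ml` and `f_ml_ridge`, the example matrices and the choice a = 0.2 are assumptions made for illustration; this is not the code used in EQS, LISREL or Mplus.

```python
import numpy as np

def f_ml(R, Sigma):
    """Normal-theory discrepancy tr(R Sigma^-1) - log|R Sigma^-1| - p, as in (2)."""
    p = R.shape[0]
    if np.linalg.eigvalsh(R).min() <= 0:
        raise ValueError("R is not positive definite, so the log term is undefined")
    # tr(Sigma^-1 R) equals tr(R Sigma^-1); solve avoids forming the inverse explicitly
    trace_term = np.trace(np.linalg.solve(Sigma, R))
    logdet_term = np.linalg.slogdet(R)[1] - np.linalg.slogdet(Sigma)[1]
    return trace_term - logdet_term - p

def f_ml_ridge(R, Sigma, a):
    """Ridge discrepancy (3): fit Sigma + aI to R + aI."""
    I = np.eye(R.shape[0])
    return f_ml(R + a * I, Sigma + a * I)

# Hypothetical indefinite "correlation" matrix with eigenvalues -0.1, 1.55, 1.55
r = 0.55
R_bad = np.array([[1.0, r, -r],
                  [r, 1.0, r],
                  [-r, r, 1.0]])
Sigma = np.eye(3)  # a deliberately simple model-implied matrix

value_ridge = f_ml_ridge(R_bad, Sigma, 0.2)  # defined, since R_bad + 0.2 I is positive definite
```

With these example values, `f_ml(R_bad, Sigma)` raises an error while the ridge version returns a finite number, which mirrors the positive-definiteness requirement discussed above.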
2.1 Consistency
Let z = (x∗′,y∗′)′ be the underlying standardized population. Using LISREL notation,
the “measurement model2” is given by
x∗ = µx + Λxξ + δ, y∗ = µy + Λyη + ε,
where µx = E(x∗), µy = E(y∗), Λx and Λy are factor loading matrices; ξ and η are vectors
of latent constructs with E(ξ) = 0 and E(η) = 0; and δ and ε are vectors of measurement
errors with E(δ) = 0, E(ε) = 0, Θδ = E(δδ′), Θε = E(εε′). The structural model that
describes interrelations of η and ξ is
η = Bη + Γξ + ζ,
where ζ is a vector of prediction errors with E(ζ) = 0 and Ψ = E(ζζ′). Let Φ = E(ξξ′),
the resulting covariance structure of z is (see Joreskog & Sorbom, 1996, pp. 1–3)
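$$
\Sigma(\theta) = \begin{pmatrix}
\Lambda_x\Phi\Lambda_x' + \Theta_\delta & \Lambda_x\Phi\Gamma'(I - B')^{-1}\Lambda_y' \\
\Lambda_y(I - B)^{-1}\Gamma\Phi\Lambda_x' & \Lambda_y(I - B)^{-1}(\Gamma\Phi\Gamma' + \Psi)(I - B')^{-1}\Lambda_y' + \Theta_\varepsilon
\end{pmatrix}.
$$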
Recall that µ = E(z) = 0 and that Σ = Cov(z) is a correlation matrix when modeling
polychoric/polyserial/product-moment correlations. We have µx = 0, µy = 0,
diag(Θδ) = Iδ − diag(ΛxΦΛ′x)
and
diag(Θε) = Iε − diag[Λy(I − B)−1(ΓΦΓ′ + Ψ)(I − B′)−1Λ′y],
2 The model presented here is the standard LISREL model in which δ and ε are not correlated. Results in this paper still hold when δ and ε correlate.
where diag(A) means the diagonal matrix formed by the diagonal elements of A, and Iδ
and Iε are identity matrices with the same dimension as Θδ and Θε, respectively. Thus, the
diagonal elements of Θδ and Θε are not part of the free parameters but part of the model
of Σ(θ) through the functions of free parameters in Λx, Λy, B, Γ, Φ, Ψ, offdiag(Θδ) and
offdiag(Θε), where offdiag(A) implies the off-diagonal elements of A.
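When Σ(θ) is a correct model for Σ, there exist matrices $\Lambda_x^{(0)}$, $\Lambda_y^{(0)}$, $B^{(0)}$, $\Gamma^{(0)}$, $\Phi^{(0)}$,
$\Psi^{(0)}$, offdiag($\Theta_\delta^{(0)}$) and offdiag($\Theta_\varepsilon^{(0)}$) such that Σ = Σ(θ0), where θ0 is the vector containing
the population values of all the free parameters in θ. Let θa be the corresponding vector for
$\Lambda_x^{(a)} = \Lambda_x^{(0)}$, $\Lambda_y^{(a)} = \Lambda_y^{(0)}$, $B^{(a)} = B^{(0)}$, $\Gamma^{(a)} = \Gamma^{(0)}$, $\Phi^{(a)} = \Phi^{(0)}$, $\Psi^{(a)} = \Psi^{(0)}$,
offdiag($\Theta_\delta^{(a)}$) = offdiag($\Theta_\delta^{(0)}$) and offdiag($\Theta_\varepsilon^{(a)}$) = offdiag($\Theta_\varepsilon^{(0)}$). In addition, let
$$\mathrm{diag}(\Theta_\delta^{(a)}) = \mathrm{diag}(\Theta_\delta^{(0)}) + aI_\delta \tag{4a}$$
and
$$\mathrm{diag}(\Theta_\varepsilon^{(a)}) = \mathrm{diag}(\Theta_\varepsilon^{(0)}) + aI_\varepsilon. \tag{4b}$$
Thus, $\theta_a = \theta_0$, and $\Theta_\delta^{(a)}$ and $\Theta_\varepsilon^{(a)}$ are functions of $\theta_a = \theta_0$ and a. Let the functions in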
(4a) and (4b) be part of the model of Σa(θa). Then we have Σa = Σa(θa). Notice that Σa(θ)
is uniquely determined by Σ(θ) and a. This implies that whenever Σ(θ) is a correct model
for modeling Σ, Σa(θ) will be a correct model for modeling Σa. The above result also implies
that for a correctly specified Σ(θ), except for sampling errors, the parameter estimates for
Λx, Λy, B, Γ, Φ, Ψ and the non-diagonal elements of Θδ and Θε when modeling Ra will be
the same as those when modeling R. The parameter estimates for the diagonals of Θδ and
Θε of modeling Ra are different from those of modeling R by the constant a, which is up to
our choice. Traditionally, the diagonal elements of Θδ and Θε are estimates of the variances
of measurement errors/uniquenesses. When modeling Ra, these can be obtained by
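$$\mathrm{diag}(\hat\Theta_\delta) = \mathrm{diag}(\hat\Theta_\delta^{(a)}) - aI_\delta \tag{5a}$$
and
$$\mathrm{diag}(\hat\Theta_\varepsilon) = \mathrm{diag}(\hat\Theta_\varepsilon^{(a)}) - aI_\varepsilon. \tag{5b}$$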
The above discussion implies that θa of modeling Ra may be different from θ of modeling R
due to sampling error, but their population counterparts are identical. We have the following
formal result.
Theorem 1. Under the conditions (I) Σ(θ) is correctly specified and identified and (II)
θ ∈ Θ and Θ is a compact subset of the Euclidean space Rq, θa is consistent for θ0 regardless
of the value of a.
The proof of the theorem is essentially the same as that for Theorem 1 in Yuan and Chan
(2008) when replacing the sample covariance matrix there by the correlation matrix R. Yuan
and Chan (2008) also discussed the benefit of modeling Sa = S + aI from a computational
perspective using the concept of condition number; the same benefit holds for modeling Ra.
One advantage of estimation with a better condition number is that a small change in the
sample will only cause a small change in θa while a small change in the sample can cause a
great change in θ if R is near singular. Readers who are interested in the details are referred
to Yuan and Chan (2008).
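The relations (4a) and (5a) can be checked numerically. The sketch below uses a hypothetical one-factor model with three indicators; the loadings and a = 0.2 are illustrative values, not quantities from this paper.

```python
import numpy as np

lam = np.array([[0.6], [0.7], [0.8]])  # hypothetical loadings on a single factor
phi = np.array([[1.0]])                # factor variance fixed at 1
a = 0.2

common = lam @ phi @ lam.T
theta = np.diag(1.0 - np.diag(common))  # diag(Theta) = I - diag(Lambda Phi Lambda')
sigma = common + theta                  # correlation structure Sigma(theta)

theta_a = theta + a * np.eye(3)         # (4a): only the uniquenesses shift by a
sigma_a = common + theta_a              # Sigma_a(theta) = Sigma(theta) + a I
theta_back = theta_a - a * np.eye(3)    # (5a): recover the usual uniquenesses
```

The off-diagonal elements of `sigma_a` and `sigma` coincide, so the loadings and factor correlations carry the same population values under either parameterization.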
2.2 Asymptotic normality
We will obtain the asymptotic distribution of θa, which allows us to obtain its consistent
SEs. For such a purpose we need to introduce some notation first.
For a symmetric matrix A, let vech(A) be the vector formed by stacking the columns of A while
leaving out the elements above the diagonal. We define sa = vech(Ra), σa(θ) = vech[Σa(θ)],
s = vech(R), and σ(θ) = vech[Σ(θ)]. Notice that the difference between r and sa is that
sa also contains the p elements of a + 1 on the diagonal of Ra. Let Dp be the duplication
matrix defined by Magnus and Neudecker (1999, p. 49) and
$$W_a(\theta) = 2^{-1}D_p'[\Sigma_a^{-1}(\theta) \otimes \Sigma_a^{-1}(\theta)]D_p.$$
We will use a dot on top of a function to denote the first derivative or the gradient. For
example, if θ contains q unknown parameters, $\dot\sigma_a(\theta) = \partial\sigma_a(\theta)/\partial\theta'$ is a $p^* \times q$ matrix, with
$p^* = p(p+1)/2$. Under standard regularity conditions, including that θ0 is an interior point
of Θ, $\hat\theta_a$ satisfies
$$g_a(\hat\theta_a) = 0, \tag{6}$$
where
$$g_a(\theta) = \dot\sigma_a'(\theta)W_a(\theta)[s_a - \sigma_a(\theta)].$$
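For readers who want to experiment with the notation, here is a minimal Python sketch of vech, the duplication matrix and the weight matrix Wa. The helper names `vech`, `duplication_matrix` and `weight_matrix` are introduced for this sketch, and the construction of Dp follows the standard definition Dp vech(A) = vec(A) for symmetric A with column-major vec.

```python
import numpy as np

def vech(A):
    """Stack the columns of the lower triangle of A, diagonal included."""
    p = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(p)])

def duplication_matrix(p):
    """D_p such that D_p @ vech(A) == vec(A) for symmetric A (column-major vec)."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    col = 0
    for j in range(p):
        for i in range(j, p):
            D[j * p + i, col] = 1.0  # position of a_ij in vec(A)
            D[i * p + j, col] = 1.0  # position of a_ji in vec(A)
            col += 1
    return D

def weight_matrix(Sigma, a):
    """W_a = (1/2) D_p' [Sigma_a^-1 kron Sigma_a^-1] D_p with Sigma_a = Sigma + a I."""
    p = Sigma.shape[0]
    Sinv = np.linalg.inv(Sigma + a * np.eye(p))
    D = duplication_matrix(p)
    return 0.5 * D.T @ np.kron(Sinv, Sinv) @ D
```

For p = 3 the weight matrix is 6 × 6, matching p(p + 1)/2 elements in vech.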
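In equation (6), because σa(θ) and σ(θ) only differ by a constant a, $\dot\sigma_a(\theta) = \dot\sigma(\theta)$ and
$s_a - \sigma_a(\theta) = s - \sigma(\theta)$. So the effect of a on $\hat\theta_a$ in (6) is only through
$$W_a(\theta) = 2^{-1}D_p'\{[\Sigma(\theta) + aI]^{-1} \otimes [\Sigma(\theta) + aI]^{-1}\}D_p = \frac{1}{2(a+1)^2}U_a, \tag{7}$$
where
$$U_a = D_p'\left\{\left[\frac{1}{a+1}\Sigma(\theta) + \frac{a}{a+1}I\right]^{-1} \otimes \left[\frac{1}{a+1}\Sigma(\theta) + \frac{a}{a+1}I\right]^{-1}\right\}D_p.$$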
Notice that the constant coefficient $1/[2(a+1)^2]$ in (7) does not have any effect on $\hat\theta_a$. It
is $U_a$ that makes a difference. When a = 0, $\hat\theta_a = \hat\theta$ is the ML parameter estimate. When
a = ∞,
Ua = D′pDp
is the weight matrix corresponding to modeling R by least squares. Thus, ridge SEM can
be regarded as a combination of ML and LS. We would expect that it has the merits of
both procedures. That is, ridge SEM will have a better convergence rate and more accurate
parameter estimates than ML and more efficient estimates than LS.
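The factorization in (7) and the least squares limit can be verified numerically. The sketch below drops the common factors D′p and Dp and compares only the Kronecker products; the 2 × 2 matrix `Sigma` and the values of a are hypothetical.

```python
import numpy as np

Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])  # hypothetical correlation structure
I = np.eye(2)

def kron_inverse(M):
    """Return M^-1 kron M^-1."""
    Minv = np.linalg.inv(M)
    return np.kron(Minv, Minv)

def inner(a):
    """The matrix inside U_a: Sigma/(a+1) + (a/(a+1)) I."""
    return Sigma / (a + 1.0) + a / (a + 1.0) * I

a = 0.2
lhs = kron_inverse(Sigma + a * I)               # middle factor of W_a (up to 1/2)
rhs = kron_inverse(inner(a)) / (a + 1.0) ** 2   # middle factor of U_a, rescaled

large_a = kron_inverse(inner(1e8))              # approaches the identity as a grows
```

As a grows, the inner matrix tends to I, so Ua tends to D′pDp, the least squares weight; at a = 0 it reduces to the ML weight.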
It follows from (6) and a Taylor expansion of ga(θa) at θ0 that
$$\sqrt{n}(\hat\theta_a - \theta_0) = -[\dot g_a(\bar\theta)]^{-1}\sqrt{n}\,g_a(\theta_0) = (\dot\sigma_a'W_a\dot\sigma_a)^{-1}\dot\sigma_a'W_a\sqrt{n}(s_a - \sigma_a) + o_p(1), \tag{8}$$
where $\dot g_a(\bar\theta)$ is a q × q matrix and each of its rows is evaluated at a vector $\bar\theta$ that is between θ0
and $\hat\theta_a$, and op(1) represents a quantity that converges to zero in probability as n increases.
We also omitted the argument of the functions in (8) when evaluated at the population value
θ0. Notice that the rows of $\dot\sigma_a$ corresponding to the diagonal elements of Σa are zeros and
the vector (sa − σa) consists of (r − ρ) plus p zeros. It follows from (1) that
$$\sqrt{n}(s_a - \sigma_a) \xrightarrow{\mathcal{L}} N(0, \Upsilon^*), \tag{9}$$
where Υ∗ is a p∗ × p∗ matrix consisting of Υ and p rows and p columns of zeros. We may
understand (9) by the general definition of a random variable, which is just a constant when
its variance is zero. It follows from (8) and (9) that
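$$\sqrt{n}(\hat\theta_a - \theta_0) \xrightarrow{\mathcal{L}} N(0, \Omega), \tag{10a}$$
where
$$\Omega = (\dot\sigma_a'W_a\dot\sigma_a)^{-1}\dot\sigma_a'W_a\Upsilon^*W_a\dot\sigma_a(\dot\sigma_a'W_a\dot\sigma_a)^{-1}. \tag{10b}$$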
A consistent estimate $\hat\Omega = (\hat\omega_{ij})$ of Ω can be obtained when replacing the unknown parameters
in (10) by $\hat\theta_a$ and $\Upsilon^*$ by $\hat\Upsilon^*$. Notice that the Ω in (10) is the asymptotic covariance
matrix. We will compare the formula-based SEs $\hat\omega_{jj}^{1/2}/\sqrt{n}$ against empirical SEs at smaller
sample sizes using Monte Carlo.
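The sandwich form of Ω and the resulting SEs can be sketched in a few lines of Python. The function names `sandwich_cov` and `standard_errors` are hypothetical, and the inputs (the Jacobian, the weight matrix and the estimate of Υ∗) are assumed to have been computed elsewhere.

```python
import numpy as np

def sandwich_cov(sigma_dot, W, upsilon_star):
    """Sandwich covariance: bread @ meat @ bread, with bread = (sd' W sd)^-1."""
    bread = np.linalg.inv(sigma_dot.T @ W @ sigma_dot)
    meat = sigma_dot.T @ W @ upsilon_star @ W @ sigma_dot
    return bread @ meat @ bread

def standard_errors(omega, n):
    """Formula-based SEs: square roots of the diagonal of Omega / n."""
    return np.sqrt(np.diag(omega) / n)
```

A quick sanity check on this form: if Υ∗ happened to equal the inverse of Wa, the sandwich would collapse to the inverse of σ̇′aWaσ̇a, the familiar information-matrix expression.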
2.3 Statistics for overall model evaluation
This subsection presents four statistics for overall model evaluation. When minimizing
(3) for parameter estimates, we automatically get a measure of discrepancy between data
and model, i.e., FMLa(θa). However, the popular statistic TMLa = nFMLa(θa) does not
asymptotically follow a chi-square distribution even when a = 0. Let
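$$F_{RLSa}(\hat\theta_a) = [s_a - \sigma_a(\hat\theta_a)]'W_a(\hat\theta_a)[s_a - \sigma_a(\hat\theta_a)] = \frac{1}{2}\mathrm{tr}\{[R_a\Sigma_a^{-1}(\hat\theta_a) - I]^2\}$$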
and
TRLSa = nFRLSa(θa) (11)
be the so-called reweighted LS statistic, which is in the default output of EQS when modeling
the covariance matrix. Under the assumption of a correct model structure, we have
$$T_{MLa} = T_{RLSa} + o_p(1). \tag{12}$$
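The equality between the quadratic form and the trace form of the reweighted LS function can be checked numerically. Because Dp vech(A) = vec(A) for symmetric A, the sketch below works directly with vec and omits Dp; the function name `rls_forms` and the example matrices are hypothetical.

```python
import numpy as np

def rls_forms(Ra, Sigma_a):
    """Return the quadratic form and the trace form of F_RLSa for given matrices."""
    p = Ra.shape[0]
    Sinv = np.linalg.inv(Sigma_a)
    e = (Ra - Sigma_a).flatten(order="F")      # vec(R_a - Sigma_a)
    quad = 0.5 * e @ np.kron(Sinv, Sinv) @ e   # (1/2) vec' (S^-1 kron S^-1) vec
    M = Ra @ Sinv - np.eye(p)
    trace_form = 0.5 * np.trace(M @ M)         # (1/2) tr[(R_a S^-1 - I)^2]
    return quad, trace_form

a = 0.2
R = np.array([[1.0, 0.45, 0.30],
              [0.45, 1.0, 0.50],
              [0.30, 0.50, 1.0]])      # hypothetical sample correlations
Sigma = np.array([[1.0, 0.42, 0.48],
                  [0.42, 1.0, 0.56],
                  [0.48, 0.56, 1.0]])  # hypothetical model-implied correlations
quad, trace_form = rls_forms(R + a * np.eye(3), Sigma + a * np.eye(3))
```

Multiplying either value by n gives the statistic in (11).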
It follows from (8) that
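$$
\begin{aligned}
\sqrt{n}[s_a - \sigma_a(\hat\theta_a)] &= \sqrt{n}(s_a - \sigma_a) - \sqrt{n}[\sigma_a(\hat\theta_a) - \sigma_a(\theta_0)] \\
&= \sqrt{n}(s_a - \sigma_a) - \dot\sigma_a\sqrt{n}(\hat\theta_a - \theta_0) + o_p(1) \\
&= P_a\sqrt{n}(s_a - \sigma_a) + o_p(1),
\end{aligned} \tag{13}
$$
where
$$P_a = I - \dot\sigma_a(\dot\sigma_a'W_a\dot\sigma_a)^{-1}\dot\sigma_a'W_a.$$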
Combining (12) and (13) leads to
$$T_{MLa} = n(s_a - \sigma_a)'P_a'W_aP_a(s_a - \sigma_a) + o_p(1). \tag{14}$$
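Notice that $\Upsilon^*$ in (9) has rank $p_* = p(p-1)/2$, which is smaller than its dimension $p^* = p(p+1)/2$, so there exists a $p^* \times p_*$ matrix A such
that $AA' = \Upsilon^*$. Let $u \sim N_{p_*}(0, I)$; then it follows from (9) that
$$\sqrt{n}(s_a - \sigma_a) = Au + o_p(1). \tag{15}$$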
Combining (14) and (15) yields
TMLa = u′(PaA)′Wa(PaA)u + op(1). (16)
Notice that $(P_aA)'W_a(P_aA)$ is nonnegative definite and its rank is $p_* - q$. Let $0 < \kappa_1 \le \kappa_2 \le \ldots \le \kappa_{p_*-q}$ be the nonzero eigenvalues of $(P_aA)'W_a(P_aA)$ or equivalently of $P_a'W_aP_a\Upsilon^*$.
It follows from (16) that
$$T_{MLa} = \sum_{j=1}^{p_*-q}\kappa_ju_j^2 + o_p(1), \tag{17}$$
where the $u_j^2$ are independent and each follows $\chi^2_1$. Unless all the κj's are 1.0, the distribution of
TMLa will not be $\chi^2_{p_*-q}$. However, the behavior of TMLa might be approximately described
by a chi-square distribution with the same mean. Let $\hat\Upsilon^*$ be a consistent estimator of $\Upsilon^*$,
which can be obtained from $\hat\Upsilon$ plus p rows and p columns of zeros, and
$$\hat m = \mathrm{tr}(\hat\Upsilon^*\hat P_a'\hat W_a\hat P_a)/(p_* - q).$$
Then, as n→ ∞,
$$T_{RMLa} = T_{MLa}/\hat m$$
approaches a distribution whose mean equals $p_* - q$. Thus, we may approximate the distribution
of TMLa by
$$T_{RMLa} \sim \chi^2_{p_*-q}, \tag{18}$$
parallel to the Satorra and Bentler (1988) rescaled statistic when modeling the sample covariance
matrix. Again, the approximation in (18) is motivated by asymptotics; we will use
Monte Carlo to study its performance with smaller sample sizes.
Notice that the systematic part of TRMLa is the quadratic form
$$Q_{RMLa} = (p_* - q)\sum_{j=1}^{p_*-q}\kappa_ju_j^2 \Big/ \Big(\sum_{j=1}^{p_*-q}\kappa_j\Big),$$
which agrees with $\chi^2_{p_*-q}$ in the first moment. Allowing the degrees of freedom to be estimated
rather than p∗ − q, a statistic that agrees with the chi-square distribution in both the first
and second moments was studied by Satterthwaite (1941) and Box (1954), and applied
to covariance structure models by Satorra and Bentler (1988). It can also be applied to
approximate the distribution of TMLa. Let
$$m_1 = \sum_{j=1}^{p_*-q}\kappa_j^2 \Big/ \sum_{j=1}^{p_*-q}\kappa_j, \qquad m_2 = \Big(\sum_{j=1}^{p_*-q}\kappa_j\Big)^2 \Big/ \sum_{j=1}^{p_*-q}\kappa_j^2.$$
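Then $T_{MLa}/m_1$ asymptotically agrees with $\chi^2_{m_2}$ in the first two moments. Consistent estimates
of m1 and m2 are given by
$$\hat m_1 = \frac{\mathrm{tr}[(\hat\Upsilon^*\hat P_a'\hat W_a\hat P_a)^2]}{\mathrm{tr}(\hat\Upsilon^*\hat P_a'\hat W_a\hat P_a)}, \qquad \hat m_2 = \frac{[\mathrm{tr}(\hat\Upsilon^*\hat P_a'\hat W_a\hat P_a)]^2}{\mathrm{tr}[(\hat\Upsilon^*\hat P_a'\hat W_a\hat P_a)^2]}. \tag{19}$$
Thus, using the approximation
$$T_{AMLa} = T_{MLa}/\hat m_1 \sim \chi^2_{\hat m_2} \tag{20}$$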
might lead to a better description of TMLa than (18). We will also study the performance of
(20) using Monte Carlo in the next section.
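The scaling constants behind the rescaled and adjusted statistics depend only on two traces, which the following Python sketch computes. The function names are hypothetical, the argument `M` stands for the estimated product of Υ∗ with P′aWaPa, and `df` stands for the degrees of freedom, p(p − 1)/2 minus q; tail probabilities are not computed here.

```python
import numpy as np

def scaling_constants(M, df):
    """Return (m, m1, m2) from the traces of M and M @ M."""
    tr1 = np.trace(M)
    tr2 = np.trace(M @ M)
    m = tr1 / df          # mean-scaling constant for the rescaled statistic
    m1 = tr2 / tr1        # scaling constant for the adjusted statistic
    m2 = tr1 ** 2 / tr2   # estimated degrees of freedom for the adjusted statistic
    return m, m1, m2

def rescaled_and_adjusted(T, M, df):
    """Return the rescaled statistic and the pair (adjusted statistic, adjusted df)."""
    m, m1, m2 = scaling_constants(M, df)
    return T / m, (T / m1, m2)

# Hypothetical example: nonzero eigenvalues 1 and 3, so df = 2
M = np.diag([1.0, 3.0, 0.0])
t_rescaled, (t_adjusted, df_adjusted) = rescaled_and_adjusted(10.0, M, 2)
```

When all nonzero eigenvalues equal a common value, m and m1 coincide and m2 equals the nominal degrees of freedom, so the two corrections agree.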
In addition to TMLa, TRLSa can also be used to construct statistics for overall model
evaluation, as printed out in EQS when modeling the sample covariance matrix. Like TMLa,
when modeling Ra, TRLSa does not asymptotically follow a chi-square distribution even when
a = 0. It follows from (12) that the distribution of TRLSa can be approximated using
$$T_{RRLSa} = T_{RLSa}/\hat m \sim \chi^2_{p_*-q} \tag{21}$$
or
$$T_{ARLSa} = T_{RLSa}/\hat m_1 \sim \chi^2_{\hat m_2}. \tag{22}$$
The rationales for the approximations in (21) and (22) are the same as those for (18) and
(20), respectively. Again, we will study the performance of TRRLSa and TARLSa using Monte
Carlo in the next section.
We would like to note that the op(1) in equation (12) goes to zero as n → ∞. But
statistical theory does not tell how close TMLa and TRLSa are at a finite n. The Monte Carlo
study in the next section will allow us to compare their performances and to identify the
best statistic for overall model evaluation at smaller sample sizes.
We also would like to note that the ridge procedure developed here is different from
the ridge procedure for modeling covariance matrices developed in Yuan and Chan (2008).
When treating Ra as a covariance matrix in the analysis, we will get identical TMLa and
TRLSa as defined here if all the diagonal elements of Σa(θa) happen to be a + 1, which is
true for many commonly used SEM models. However, the rescaled or adjusted statistics
will be different. Similarly, we may also get identical estimates for factor loadings, but their
SEs will be different when based on either the commonly used information matrix or the
sandwich-type covariance matrix constructed using Υ∗.
3. Monte Carlo Results
The population z contains 15 normally distributed random variables with mean zero and
covariance matrix
Σ = ΛΦΛ′ + Ψ,
where
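$$\Lambda = \begin{pmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix}$$
with λ = (.60, .70, .75, .80, .90)′ and 0 being a vector of five zeros;
$$\Phi = \begin{pmatrix} 1.0 & .30 & .40 \\ .30 & 1.0 & .50 \\ .40 & .50 & 1.0 \end{pmatrix};$$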
and Ψ is a diagonal matrix such that all the diagonal elements of Σ are 1.0. Thus, z can be
regarded as generated by a 3-factor model, and each factor has 5 unidimensional indicators.
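The population matrix just described can be assembled directly from its definition. The following Python sketch uses the loadings and factor correlations stated above; the variable names are choices made for this illustration.

```python
import numpy as np

lam = np.array([0.60, 0.70, 0.75, 0.80, 0.90])
# 15 x 3 loading matrix: each factor loads on its own block of five indicators
Lambda = np.kron(np.eye(3), lam.reshape(-1, 1))
Phi = np.array([[1.0, 0.3, 0.4],
                [0.3, 1.0, 0.5],
                [0.4, 0.5, 1.0]])

common = Lambda @ Phi @ Lambda.T
Psi = np.diag(1.0 - np.diag(common))  # uniquenesses chosen so that diag(Sigma) = 1
Sigma = common + Psi                  # population correlation matrix of z
```

For instance, the correlation between the first two indicators of the first factor is .60 × .70 = .42, and between the first indicators of factors 1 and 2 it is .60 × .30 × .60 = .108.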
The observed variables in x are dichotomous and are obtained using thresholds
τ = (−0.52,−0.25, 0, 0.25, 0.52,−0.39,−0.25,−0.13, 0, 0.13, 0, 0.13, 0.25, 0.39, 0.52)′ ,
which corresponds to 30%, 40%, 50%, 60%, 70%, 35%, 40%, 45%, 50%, 55%, 50%, 55%, 60%,
65%, and 70% of zeros at the population level for the 15 variables, respectively. Because it is
with smaller sample sizes that ML encounters problems, we choose sample sizes3 n = 100, 200,
300 and 400. One thousand replications are used at each sample size. For each sample, we
model Ra with a = 0, .1 and .2, which are denoted by ML, ML.1 and ML.2, respectively. Our
evaluation includes the number of converged replications (the convergence rate), the speed of convergence,
biases and SEs as well as mean square errors (MSE) of the parameter estimates, and the
performance of the four statistics given in the previous section. We also compare SEs based
on the covariance matrix Ω in (10) against empirical SEs. Because not all of the N = 1000
replications converge in all the conditions, all the empirical results are based on Nc converged
3 Actually, at sample size 400, we found that all replications converged with ML.
replications for each estimation method. For each parameter estimate, we report
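$$\text{Bias} = \frac{1}{N_c}\sum_{i=1}^{N_c}\hat\theta_i - \theta_0,$$
$$\text{Var} = \frac{1}{N_c - 1}\sum_{i=1}^{N_c}(\hat\theta_i - \bar\theta)^2,$$
with $\bar\theta = \sum_{i=1}^{N_c}\hat\theta_i/N_c$, and
$$\text{MSE} = \frac{1}{N_c}\sum_{i=1}^{N_c}(\hat\theta_i - \theta_0)^2.$$
For the quality of the formula-based SE, we report
$$SE_F = \frac{1}{N_c}\sum_{i=1}^{N_c}SE_i$$
with $SE_i$ being the square root of the diagonal elements of $\hat\Omega/n$ in the ith converged replication,
and the empirical SE, $SE_E$, which is just the square root of Var.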
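These summaries are straightforward to compute from the estimates of a single parameter across converged replications. The Python sketch below is illustrative; the function name `summarize` and the example numbers are hypothetical and are not results from the study.

```python
import numpy as np

def summarize(estimates, theta0):
    """Bias, Var, MSE and empirical SE of one parameter over converged replications."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - theta0
    var = est.var(ddof=1)                # divisor N_c - 1, as in the definition of Var
    mse = np.mean((est - theta0) ** 2)   # divisor N_c
    return {"bias": bias, "var": var, "mse": mse, "se_e": np.sqrt(var)}

result = summarize([0.5, 0.7, 0.6], theta0=0.6)  # three made-up estimates
```

Note that Var uses the divisor Nc − 1 while MSE uses Nc, so MSE is not exactly Var plus squared bias in finite samples.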
The thresholds are estimated using the default probit function in SAS. Fisher scoring
algorithms are used in estimating both the tetrachoric correlations R and the structural model
parameters θa (Lee & Jennrich, 1979; Olsson, 1979). The convergence criterion in estimating the
tetrachoric correlation matrix is set as $|r^{(j+1)} - r^{(j)}| < 10^{-4}$, where $r^{(j)}$ is the
value of r after the jth iteration; the convergence criterion for obtaining θa is set as
$\max_{1 \le i \le q} |\theta_i^{(j+1)} - \theta_i^{(j)}| < 10^{-4}$, where $\theta_i^{(j)}$ is the
ith parameter after the jth iteration. True population values are set as the initial values in
both estimation processes. We record an estimation as having failed to converge if the
convergence criterion is not met within 100 iterations. For each R, the Υ in (1) is estimated
using the procedure given in Jöreskog (1994).
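The stopping rule for θa can be written as a short loop. The following is a sketch only: `iterate_until_converged` and the toy update `newton_sqrt2` are hypothetical names, and the update function merely stands in for one Fisher scoring step, which is not reproduced here.

```python
def iterate_until_converged(update, theta0, tol=1e-4, max_iter=100):
    """Apply `update` until the largest parameter change falls below `tol`.

    Returns (theta, iterations, converged). A run that has not met the
    criterion after `max_iter` iterations is recorded as not converged.
    """
    theta = list(theta0)
    for j in range(1, max_iter + 1):
        new_theta = update(theta)
        change = max(abs(new - old) for new, old in zip(new_theta, theta))
        theta = new_theta
        if change < tol:
            return theta, j, True
    return theta, max_iter, False


def newton_sqrt2(theta):
    """Toy update (assumption): Newton step for solving x**2 = 2."""
    return [(x + 2.0 / x) / 2.0 for x in theta]


theta, n_iter, ok = iterate_until_converged(newton_sqrt2, [1.0])
```

The average number of iterations in the lower panel of Table 1 corresponds to averaging the second returned value over the converged runs.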
All replications converged when estimating R. However, more than half of the replications
fail to converge when solving (6) with a = 0 and .1 at n = 100, as reported in the upper
panel of Table 1. When a = .2, the number of converged replications at n = 100 is more than
double that when a = 0. When n = 200, about one third of the replications still cannot reach
convergence at a = 0, while all reach convergence at a = .2. Ridge SEM not only results in
more convergences but also converges faster, as reported in the lower panel of Table 1, where
each entry is the average number of iterations over the Nc converged replications.
Insert Table 1 about here
Tables 2 to 5 contain the bias, empirical variance and MSE for n = 100 to 400, respectively.
Most biases are at the 3rd decimal place, although they tend to be negative. Estimates of
smaller loadings tend to have greater variances and MSEs. Because the Nc's for the three
different a's in Table 2 are so different, it is hard to compare the results between
different estimation methods. In Table 5, where all three methods converged for all the
replications, the bias, variance and MSE all become smaller as a changes from 0 to .2, as
reflected by the average of absolute values (AA) reported in the last row of the table. In
Table 4, where both ML.1 and ML.2 converged for all the replications, the bias, variance and
MSE corresponding to ML.2 are also smaller. These results indicate that the ridge procedure
with a proper a leads to less biased, more efficient and more accurate parameter estimates
than ML.
Insert Tables 2 to 5 about here
Table 6 contains the empirical mean, standard deviation (SD), number of rejections and
rejection ratio of the statistics TRMLa and TAMLa, based on the converged replications. Each
statistic is compared to the 95th percentile of the reference distribution in (18) or (20).
For reference, the mean and SD of TMLa are also reported. Both TRMLa and TAMLa over-reject
the correct model, although they tend to improve as n or a increases. Because the three Nc's
at n = 100 are very different, the rejection rates, means and SDs of the three estimation
methods are not comparable for this sample size. Table 6 also suggests that TMLa cannot be
used for model inference because its empirical means and SDs are far away from those of the
nominal $\chi^2_{87}$.
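As a point of reference, a chi-square variate with 87 degrees of freedom has mean 87 and SD equal to the square root of 174, about 13.19, whereas the means of TMLa in Table 6 range from 219.34 to 708.30. The sketch below computes these nominal moments and reproduces two reject ratios as RN divided by Nc, with RN taken from Table 6 and Nc from Table 1; the function name `reject_ratio` is hypothetical.

```python
import math

df = 87
nominal_mean = df                 # mean of a chi-square variate equals its df
nominal_sd = math.sqrt(2 * df)    # its variance equals 2 * df


def reject_ratio(reject_number, n_converged):
    """Percentage of converged replications in which the model is rejected."""
    return 100.0 * reject_number / n_converged


# T_RMLa at n = 100: RN from Table 6, Nc from Table 1
rr_ml = reject_ratio(92, 438)     # ML
rr_ml2 = reject_ratio(572, 961)   # ML.2
```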
Insert Tables 6 and 7 about here
Table 7 contains the results of TRRLSa and TARLSa, parallel to those in Table 6. For
n = 200, 300 and 400, the statistic TRRLSa performed very well in mean, SD and rejection rate,
although there is a slight over-rejection at smaller n. While TRLSa decreases monotonically
as a increases, TRRLSa is very stable when a changes. The statistic TARLSa also performed
well, with slight under-rejection.
Insert Tables 8 to 11 about here
Tables 8 to 11 compare formula-based SEs against empirical ones. Overall, formula-based SEs
tend to slightly under-predict empirical SEs when n is small, and the under-prediction is
between 1% and 2% with ML.2 at n = 400. In Table 11, where all three methods converged for
all the replications, formula-based SEs under ML.1 and ML.2 predict their corresponding
empirical SEs better on average, as reflected by the average of absolute differences (AAD)
between SE_F and SE_E. In Table 10, where both ML.1 and ML.2 converged on all the
replications, SE_F under ML.2 predicts the corresponding SE_E better than that under ML.1.
Actually, the AAD under ML.2 in Table 11 is also smaller than that under ML.1, although they
are the same at the 3rd decimal place.
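The AAD can be recomputed from the tabled values. The sketch below does so for ML at n = 100 using the entries of Table 8, assuming that AAD is the mean of the absolute differences between the formula-based and empirical SEs over the 18 parameters; the function name is hypothetical. Because the tabled SEs are rounded to three decimals, a recomputed AAD can differ slightly from the one obtained with unrounded values.

```python
def average_absolute_difference(se_e, se_f):
    """Mean of |SE_E - SE_F| across parameters."""
    return sum(abs(e - f) for e, f in zip(se_e, se_f)) / len(se_e)


# Empirical and formula-based SEs under ML at n = 100, copied from Table 8
se_e = [0.135, 0.119, 0.107, 0.093, 0.083, 0.122, 0.103, 0.105, 0.093,
        0.083, 0.119, 0.103, 0.102, 0.095, 0.075, 0.153, 0.141, 0.127]
se_f = [0.122, 0.104, 0.093, 0.085, 0.077, 0.115, 0.098, 0.088, 0.080,
        0.068, 0.114, 0.099, 0.093, 0.084, 0.070, 0.138, 0.135, 0.123]

aad_ml = average_absolute_difference(se_e, se_f)
```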
4. An Empirical Example
The empirical results in the previous section indicate that, on average, a proper a in MLa
results in a better convergence rate, faster convergence, and more accurate and more
efficient parameter estimates. This section further illustrates the effect of a on individual
parameter estimates and test statistics using a real data example where ML fails.
Eysenck and Eysenck (1975) developed a Personality Questionnaire. The Chinese version
of it was available through Gong (1983). This questionnaire was administered to 117
first-year graduate students in a Chinese university. There are four subscales in this
questionnaire (Extraversion/Introversion, Neuroticism/Stability, Psychoticism/Socialisation,
Lie), and each subscale consists of 20 to 24 items with two categories. We have access to the
dichotomized data of the Extraversion/Introversion subscale, which has 21 items. According
to the manual of the questionnaire, answers to the 21 items reflect a respondent’s latent
trait of Extraversion/Introversion. Thus, we may want to fit the dichotomized data by a
one-factor model. The tetrachoric correlation matrix R (see footnote 4) was first obtained
together with Υ. However, R is not positive definite: its smallest eigenvalue is −.473. Thus,
the ML procedure cannot be used to analyze R.
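Because the eigenvalues of R + aI are those of R shifted by a, the smallest eigenvalue of Ra in this example is −.473 + a, which is positive for each of a = .5, .6 and .7. The sketch below illustrates the same effect on a small hypothetical matrix (not the 21 × 21 matrix of the example), using the existence of a Cholesky factor as the test of positive definiteness; the function names are assumptions made for illustration.

```python
import math


def is_positive_definite(matrix):
    """Return True if the symmetric matrix admits a Cholesky factorization."""
    p = len(matrix)
    chol = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    return False          # a non-positive pivot: not PD
                chol[i][j] = math.sqrt(pivot)
            else:
                chol[i][j] = (matrix[i][j] - s) / chol[j][j]
    return True


def ridge(matrix, a):
    """Return matrix + a * I."""
    return [[x + (a if i == j else 0.0) for j, x in enumerate(row)]
            for i, row in enumerate(matrix)]


# Hypothetical matrix with unit diagonal and eigenvalues 1.9, 1.9 and -0.8
R = [[1.0, 0.9, 0.9],
     [0.9, 1.0, -0.9],
     [0.9, -0.9, 1.0]]

pd_flags = [is_positive_definite(ridge(R, a)) for a in (0.0, 0.5, 0.9)]
```

For this toy matrix, a must exceed 0.8 before the ridge matrix becomes positive definite, just as a must exceed .473 in the example.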
Insert Table 12 about here
Table 12(a) contains the parameter estimates and their SEs when modeling Ra = R + aI
with a = .5, .6 and .7. To be more informative, estimates of error variances (ψ11 to ψ21,21)
are also reported. The results imply that both parameter estimates and their SEs change
little when a changes from .5 to .7. Table 12(b) contains the statistics TRMLa, TAMLa, TRRLSa
and TARLSa as well as the associated p-values. The estimated degrees of freedom m2 for
the adjusted statistics are also reported. All statistics indicate that the model does not fit
the data well, which is common when fitting practical data with a substantive model. The
statistics TRMLa and TAMLa decrease as a increases, while TRRLSa and TARLSa as well as m2
are barely affected by a. Comparing the statistics implies that statistics derived from TMLa
may not be as reliable as those derived from TRLSa. The p-values associated with TRRLSa
are smaller than those associated with TARLSa, which agrees with the results in the previous
section, where TARLSa tends to slightly under-reject the correct model.
Footnote 4: Four of the 21×20/2 = 210 contingency tables contain a cell with zero
observations, which was replaced by .1 to facilitate the estimation of R.
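The entries of Table 12(b) are consistent with each adjusted statistic being the corresponding rescaled statistic multiplied by m2/df, with df = 189. This relation is inferred here from the tabled numbers only; the defining equations (18) to (22) are not reproduced in this section, so the sketch below should be read as a numerical check rather than as the definition of the statistics. The function name is hypothetical.

```python
def adjusted_from_rescaled(t_rescaled, m2, df):
    """Adjusted statistic implied by a rescaled one (relation inferred from Table 12(b))."""
    return t_rescaled * m2 / df


df = 189
rows = [  # a, T_RMLa, T_AMLa, m2, T_RRLSa, T_ARLSa, copied from Table 12(b)
    (0.5, 480.691, 110.491, 43.443, 292.675, 67.274),
    (0.6, 381.939, 87.791, 43.443, 293.880, 67.550),
    (0.7, 350.703, 80.597, 43.435, 294.895, 67.771),
]

# Largest gap between the tabled adjusted statistics and the implied ones
max_gap = max(
    max(abs(adjusted_from_rescaled(t_rml, m2, df) - t_aml),
        abs(adjusted_from_rescaled(t_rrls, m2, df) - t_arls))
    for _, t_rml, t_aml, m2, t_rrls, t_arls in rows
)
```

The gaps are within the rounding error of the three-decimal table entries.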
5. Conclusion and Discussion
Procedures for SEM with ordinal data have been implemented in major software. However,
there exist problems of convergence in parameter estimation and a lack of reliable
statistics for overall model evaluation, especially when the sample size is small and the
observed frequencies are skewed in distribution. In this paper we studied a ridge procedure
paired with the ML estimation method. We have shown that the parameter estimates are
consistent and asymptotically normally distributed, and that their SEs can be consistently
estimated. We also proposed four statistics for overall model evaluation. Empirical results
imply that the ridge procedure performs better than ML in convergence rate, convergence
speed, and accuracy and efficiency of parameter estimates. Empirical results also imply that
the rescaled statistic TRRLSa performs best at smaller sample sizes and TARLSa also performs
well.
For SEM with covariance matrices, Yuan and Chan (2008) suggested choosing a = p/n.
Because p/n → 0 as n → ∞, the resulting estimator is asymptotically equivalent to
the ML estimator. Unlike a covariance matrix, which is always nonnegative definite, the
polychoric/polyserial correlation matrix may have negative eigenvalues that are greater than
p/n in absolute value. Choosing a = p/n may then not lead to a positive definite Ra, as is
the case with the example in the previous section. In practice, one should choose an a that
leads to a proper convergence when estimating θa. Once converged, a greater a makes little
difference to the parameter estimates and the test statistics TRRLSa and TARLSa, as
illustrated by the example in the previous section and the Monte Carlo results in section 3.
If the estimation cannot converge for an a that makes the smallest eigenvalue of Ra greater
than, say, 1.0, then one needs either to choose a different set of starting values or to
reformulate the model.
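For the example in section 4, the arithmetic behind this remark is short. With p = 21 items and n = 117 respondents, p/n is about .18, which is smaller than .473 in absolute value, so R + (p/n)I still has a negative eigenvalue. The sketch below spells this out; the variable and function names are hypothetical.

```python
p, n = 21, 117                  # items and sample size in the example of section 4
smallest_eigenvalue = -0.473    # smallest eigenvalue of R reported for the example


def smallest_eigenvalue_after_ridge(lam_min, a):
    """Smallest eigenvalue of R + a * I, given the smallest eigenvalue of R."""
    return lam_min + a


a_cov = p / n                   # a = p/n, as suggested for covariance matrices
with_cov_rule = smallest_eigenvalue_after_ridge(smallest_eigenvalue, a_cov)
with_half = smallest_eigenvalue_after_ridge(smallest_eigenvalue, 0.5)
```

Here `with_cov_rule` is negative while `with_half` is positive, so a = .5 yields a positive definite Ra whereas a = p/n does not.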
We have only studied ridge ML in this paper, mainly because ML is the most popular
and most widely used procedure in SEM. In addition to ML, the normal-distribution-based
NGLS procedure has also been implemented in essentially all SEM software. The ridge
procedure developed in section 2 can be easily extended to NGLS. Actually, the asymptotic
distribution in (10) also holds for the NGLS estimator after changing Σ(θa) in the definition
of Wa to Sa; the rescaled and adjusted statistics parallel to those in (21) and (22) for the
NGLS procedure can be similarly obtained. A further Monte Carlo study of such an extension
would be valuable.
Another issue with modeling the polychoric/polyserial correlation matrix is the occurrence
of negative estimates of error variances. Such a problem can be caused by model
misspecification and/or a small sample together with true error variances being small (see,
e.g., van Driel, 1978). Negative estimates of error variances can also occur with the ridge
estimate, although it is more efficient than the ML estimator. This is because Σa(θ) will
also be misspecified if Σ(θ) is misspecified, and true error variances corresponding to the
estimator in (5) continue to be small when modeling Ra. For correctly specified models,
negative estimates of error variances are purely due to sampling error, which should be
counted when evaluating empirical efficiency and bias, as is done in this paper.
We have only considered the situation when z ∼ N(µ,Σ) and when Σ(θ) is correctly
specified. Monte Carlo results in Lee et al. (1995) and Flora and Curran (2004) imply
that SEM by analyzing the polychoric correlation matrix with ordinal data has certain
robust properties when z ∼ N(µ,Σ) is violated, which is a direct consequence of the robust
properties possessed by R, as reported in Quiroga (1992). These robust properties should
equally hold for ridge SEM because θa is a continuous function of R. When both Σ(θ) is
misspecified and z ∼ N(µ,Σ) does not hold, the two misspecifications might be confounded.
Any development to segregate the two would be valuable.
As a final note, the developed procedure can be easily implemented in software that
already has the option of modeling the polychoric/polyserial/product-moment correlation
matrix R by robust ML. With R being replaced by Ra, one only needs to set the diagonal of the
fitted model to a + 1 instead of 1.0 in the iteration process. The resulting statistics TML,
TRML, TAML and TRLS will automatically become TMLa, TRMLa, TAMLa and TRLSa, respectively. To
our knowledge, no software currently generates TRRLSa and TARLSa. However, with TRLSa,
m1 and m2, these two statistics can be easily calculated. Although Ra is literally a
covariance matrix, treating Ra as a covariance matrix will not generate a correct analysis,
except for the unstandardized θa when diag(Ra) = Σ(θa) happens to hold (see e.g., McQuitty,
1997).
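The effect of setting the diagonal of the fitted model to a + 1 can be illustrated with a one-factor model. The sketch below assumes a unit factor variance and that the ridge model equals the correlation structure plus aI, so that each error variance is one minus the squared loading; both assumptions are made here for illustration, and the function names are hypothetical. The three loadings are the first three estimates at a = .5 in Table 12(a).

```python
def one_factor_implied(loadings, a=0.0):
    """Model-implied matrix of a one-factor correlation structure plus a ridge.

    Off-diagonal elements are lambda_i * lambda_j; every diagonal element is
    1 + a (assumes unit factor variance and a correlation structure).
    """
    p = len(loadings)
    return [[(1.0 + a) if i == j else loadings[i] * loadings[j]
             for j in range(p)] for i in range(p)]


def error_variances(loadings):
    """Error variances implied by a unit diagonal: psi_ii = 1 - lambda_i**2."""
    return [1.0 - lam ** 2 for lam in loadings]


loadings = [0.664, 0.741, 0.827]   # first three loadings at a = .5, Table 12(a)
sigma_a = one_factor_implied(loadings, a=0.5)
psi = error_variances(loadings)
```

Rounded to three decimals, the implied error variances agree with the values of ψ11, ψ22 and ψ33 reported in Table 12(a) for a = .5.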
References
Babakus, E., Ferguson, C. E., & Jöreskog, K. G. (1987). The sensitivity of confirmatory
maximum likelihood factor analysis to violations of measurement scale and distributional
assumptions. Journal of Marketing Research, 24, 222–229.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate
Software.
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored
items. Psychometrika, 35, 179–197.
Bollen, K. A., & Maydeu-Olivares, A. (2007). A polychoric instrumental variable (PIV)
estimator for structural equation models with categorical variables. Psychometrika, 72,
309–326.
Box, G. E. P. (1954). Some theorems on quadratic forms applied in the study of analysis of
variance problems. I. Effect of inequality of variance in the one-way classification. Annals
of Mathematical Statistics, 25, 290–302.
Browne, M. W. (1974). Generalized least-squares estimators in the analysis of covariance
structures. South African Statistical Journal, 8, 1–24.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40,
5–32.
DiStefano, C. (2002). The impact of categorization with confirmatory factor analysis. Struc-
tural Equation Modeling, 9, 327–346.
Dolan, C. V. (1994). Factor analysis of variables with 2, 3, 5 and 7 response categories: A
comparison of categorical variable estimators using simulated data. British Journal of
Mathematical and Statistical Psychology, 47, 309–326.
Eysenck, H. J., & Eysenck, S. B. G. (1975). Manual of the Eysenck personality questionnaire.
San Diego: Education and Industrial Testing Service.
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of
estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9,
466–491.
Gong, Y. (1983). Manual of EPQ Chinese revised version. Changsha, China: Hunan Medical
Institute.
Jöreskog, K. G. (1990). New development in LISREL: Analysis of ordinal variables using
polychoric correlations and weighted least squares. Quality and Quantity, 24, 387–404.
Jöreskog, K. G. (1994). On the estimation of polychoric correlations and their asymptotic
covariance matrix. Psychometrika, 59, 381–390.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8 user’s reference guide. Chicago: Scientific
Software International.
Kelley, C. T. (1995). Iterative methods for linear and nonlinear equations. Philadelphia:
SIAM.
Knol, D. L., & ten Berge, J. M. F. (1989). Least squares approximation of an improper
correlation matrix by a proper one. Psychometrika, 54, 53–61.
Lee, S.-Y., & Jennrich, R. I. (1979). A study of algorithms for covariance structure analysis
with specific comparisons using factor analysis. Psychometrika, 44, 99–114.
Lee, S.-Y., Poon, W.-Y., & Bentler, P. M. (1990). Full maximum likelihood analysis of
structural equation models with polytomous variables. Statistics & Probability Letters, 9,
91–97.
Lee, S.-Y., Poon, W.-Y., & Bentler, P. M. (1992). Structural equation models with contin-
uous and polytomous variables. Psychometrika, 57, 89–105.
Lee, S.-Y., Poon, W.-Y., & Bentler, P. M. (1995). A two-stage estimation of structural
equation models with continuous and polytomous variables. British Journal of Mathematical
and Statistical Psychology, 48, 339–358.
Lei, P.-W. (2009). Evaluating estimation methods for ordinal data in structural equation
modeling. Quality and Quantity, 43, 495–507.
Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus with applications in
statistics and econometrics. New York: Wiley.
McQuitty, S. (1997). Effects of employing ridge regression in structural equation models.
Structural Equation Modeling, 4, 244–252.
Muthén, B. (1978). Contributions to factor analysis of dichotomous variables.
Psychometrika, 43, 551–560.
Muthén, B. (1984). A general structural equation model with dichotomous, ordered
categorical, and continuous latent variable indicators. Psychometrika, 49, 115–132.
Muthén, L. K., & Muthén, B. O. (2007). Mplus user’s guide (5th ed). Los Angeles, CA:
Muthén & Muthén.
Muthén, B., & Satorra, A. (1995). Technical aspects of Muthén’s LISCOMP approach to
estimation of latent variable relations with a comprehensive measurement model.
Psychometrika, 60, 489–503.
Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient.
Psychometrika, 44, 443–460.
Poon, W.-Y., & Lee, S.-Y. (1987). Maximum likelihood estimation of multivariate polyserial
and polychoric correlation coefficient. Psychometrika, 52, 409–430.
Potthast, M. J. (1993). Confirmatory factor analysis of ordered categorical variables with
large models. British Journal of Mathematical and Statistical Psychology, 46, 273–286.
Quiroga, A. M. (1992). Studies of the polychoric correlation and other correlation measures
for ordinal variables. Unpublished doctoral dissertation, Acta Universitatis Upsaliensis.
Rigdon, E. E., & Ferguson, C. E. (1991). The performance of polychoric correlation coefficient
and selected fitting functions in confirmatory factor analysis with ordinal data. Journal
of Marketing Research, 28, 491–497.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistic in covariance
structure analysis. In Proceedings of the American Statistical Association, 308–313.
Satterthwaite, F. E. (1941). Synthesis of variance. Psychometrika, 6, 309–316.
van Driel, O. P. (1978). On various causes of improper solutions of maximum likelihood
factor analysis. Psychometrika, 43, 225–243.
Wothke, W. (1993). Nonpositive definite matrices in structural modeling. In K. A. Bollen
& J. S. Long (Eds.), Testing structural equation models (pp. 256–293). Newbury Park,
CA: Sage.
Yuan, K.-H., & Chan, W. (2008). Structural equation modeling with near singular covariance
matrices. Computational Statistics & Data Analysis, 52, 4842–4858.
Table 1. Number of convergences and average number of iterations, with 1000 replications.
                                  n        ML      ML.1      ML.2
number of convergences          100       438       491       961
                                200       688       995      1000
                                300       988      1000      1000
                                400      1000      1000      1000
average number of iterations    100    18.767    12.112    10.286
                                200     9.416     8.085     7.405
                                300     7.886     6.949     6.480
                                400     7.046     6.348     5.996
Table 2. Bias×10^3, variance×10^3 and MSE×10^3, based on converged replications (n = 100).
                  ML                         ML.1                        ML.2
θ          Bias      Var      MSE     Bias      Var      MSE     Bias      Var      MSE
λ11       5.185   18.143   18.170   -3.988   17.470   17.486    1.530   17.529   17.532
λ21     -11.727   14.180   14.318  -15.649   12.559   12.804   -6.255   12.499   12.538
λ31      -1.890   11.550   11.553   -5.339   11.444   11.473   -7.392    9.760    9.814
λ41      -0.368    8.699    8.699  -14.127    8.798    8.998   -8.690    8.260    8.335
λ51       0.657    6.947    6.947   -6.941    8.158    8.206    2.834    7.409    7.417
λ62      -5.241   14.978   15.006   -8.417   13.622   13.693   -2.261   13.870   13.875
λ72      -1.215   10.516   10.517   -8.112   10.938   11.004   -3.656    9.958    9.971
λ82       3.788   10.932   10.946  -13.898   10.771   10.964   -4.154    9.790    9.807
λ92      -1.361    8.740    8.742   -2.445    7.719    7.725   -4.716    7.455    7.477
λ102     -0.004    6.824    6.824    0.890    6.722    6.723   -0.270    6.041    6.041
λ113     -1.719   14.050   14.053   -8.167   12.416   12.483   -5.362   13.243   13.271
λ123      0.779   10.636   10.636  -11.077   11.081   11.203   -5.846   10.399   10.434
λ133    -10.898   10.471   10.590   -9.974   10.251   10.350   -8.264   10.128   10.196
λ143     -2.369    8.945    8.950   -9.352    8.377    8.465   -6.214    8.093    8.132
λ153      8.096    5.594    5.660    5.678    6.040    6.073    5.352    5.702    5.731
φ21      -4.425   23.276   23.296   -6.583   19.610   19.653    1.626   19.017   19.019
φ31     -15.483   19.907   20.147  -12.415   18.177   18.331   -6.745   17.842   17.888
φ32     -18.726   16.089   16.439   -7.180   13.267   13.318   -3.908   13.664   13.679
AA        5.218   12.249   12.305    8.346   11.523   11.608    4.726   11.148   11.175
Table 3. Bias×10^3, variance×10^3 and MSE×10^3, based on converged replications (n = 200).
                  ML                         ML.1                        ML.2
θ          Bias      Var      MSE     Bias      Var      MSE     Bias      Var      MSE
λ11      -6.274    8.541    8.580   -1.294    8.432    8.434   -0.903    8.194    8.195
λ21      -7.503    5.817    5.873   -4.820    5.759    5.783   -3.902    5.618    5.633
λ31      -6.789    4.921    4.967   -2.795    4.713    4.720   -2.057    4.624    4.628
λ41      -3.583    4.411    4.424   -3.113    4.213    4.222   -2.567    4.165    4.171
λ51       0.006    3.421    3.421    6.049    3.498    3.535    4.879    3.484    3.508
λ62      -5.104    7.036    7.062   -0.611    6.686    6.686   -0.009    6.576    6.576
λ72      -5.427    5.664    5.694   -2.001    5.172    5.176   -1.670    5.120    5.123
λ82      -9.126    4.696    4.779   -3.397    4.453    4.465   -2.962    4.352    4.361
λ92      -3.384    3.844    3.855   -3.905    3.654    3.669   -3.656    3.576    3.590
λ102      1.446    3.041    3.043    3.100    2.746    2.755    1.882    2.687    2.690
λ113     -2.467    7.303    7.309    1.211    6.472    6.473    1.997    6.332    6.336
λ123     -7.934    5.449    5.512   -1.303    5.035    5.036   -0.974    4.901    4.902
λ133     -6.691    4.974    5.019   -1.950    4.791    4.795   -1.344    4.743    4.745
λ143     -4.782    3.882    3.905   -1.579    3.748    3.750   -0.976    3.670    3.671
λ153     -1.121    3.019    3.021    2.049    2.988    2.992    1.182    2.936    2.938
φ21      -6.746    9.078    9.124   -2.565    8.182    8.188   -1.256    8.076    8.077
φ31     -10.248    9.997   10.102   -6.108    8.938    8.976   -4.708    8.687    8.710
φ32      -7.119    7.275    7.325   -4.886    6.865    6.889   -2.882    6.751    6.760
AA        5.319    5.687    5.723    2.930    5.353    5.364    2.211    5.250    5.256
Table 4. Bias×10^3, variance×10^3 and MSE×10^3, based on converged replications (n = 300).
                  ML                         ML.1                        ML.2
θ          Bias      Var      MSE     Bias      Var      MSE     Bias      Var      MSE
λ11      -1.083    5.365    5.366   -0.656    5.118    5.118   -0.657    5.025    5.025
λ21      -1.849    3.904    3.907   -1.250    3.758    3.759   -1.077    3.691    3.692
λ31      -4.751    3.272    3.295   -3.146    3.056    3.066   -2.444    2.984    2.990
λ41      -2.907    2.805    2.814   -1.724    2.654    2.657   -1.208    2.601    2.602
λ51       5.079    2.342    2.367    3.634    2.244    2.257    2.800    2.199    2.207
λ62      -1.757    4.611    4.614   -0.707    4.446    4.446   -0.350    4.386    4.386
λ72      -2.055    3.570    3.574   -1.489    3.444    3.446   -1.213    3.392    3.393
λ82      -3.369    2.838    2.849   -2.820    2.740    2.748   -2.534    2.706    2.712
λ92      -2.993    2.374    2.382   -2.424    2.293    2.299   -2.282    2.266    2.271
λ102      3.584    1.753    1.766    2.440    1.687    1.693    1.828    1.663    1.666
λ113     -1.719    4.670    4.673   -0.902    4.514    4.515   -0.548    4.455    4.456
λ123      0.788    3.362    3.362    1.315    3.221    3.222    1.667    3.173    3.176
λ133     -0.354    3.314    3.314    0.436    3.213    3.213    0.528    3.175    3.176
λ143     -1.745    2.529    2.532   -1.026    2.434    2.435   -0.761    2.402    2.403
λ153      3.050    2.136    2.145    1.937    2.081    2.084    1.253    2.051    2.053
φ21      -6.235    6.290    6.329   -3.692    5.799    5.813   -2.512    5.651    5.657
φ31      -6.226    5.907    5.946   -3.485    5.507    5.519   -2.147    5.377    5.382
φ32      -5.390    4.606    4.635   -3.285    4.440    4.451   -2.149    4.371    4.375
AA        3.052    3.647    3.659    2.020    3.481    3.486    1.553    3.420    3.423
Table 5. Bias×10^3, variance×10^3 and MSE×10^3, based on converged replications (n = 400).
                  ML                         ML.1                        ML.2
θ          Bias      Var      MSE     Bias      Var      MSE     Bias      Var      MSE
λ11      -1.647    3.975    3.978   -1.524    3.821    3.823   -1.500    3.759    3.761
λ21      -0.748    2.901    2.902   -0.490    2.805    2.805   -0.447    2.766    2.766
λ31      -5.050    2.480    2.506   -3.967    2.362    2.378   -3.456    2.315    2.327
λ41      -3.670    2.078    2.091   -2.821    2.005    2.013   -2.456    1.976    1.982
λ51       3.891    1.753    1.769    2.762    1.702    1.709    2.159    1.682    1.687
λ62      -1.300    3.209    3.211   -0.826    3.156    3.157   -0.597    3.143    3.143
λ72      -1.675    2.626    2.628   -1.220    2.546    2.548   -1.014    2.515    2.516
λ82      -2.582    2.212    2.218   -2.170    2.146    2.150   -2.020    2.119    2.123
λ92      -1.892    1.847    1.850   -1.449    1.792    1.794   -1.331    1.773    1.775
λ102      2.656    1.263    1.270    1.837    1.225    1.228    1.405    1.210    1.212
λ113     -3.076    3.393    3.402   -2.578    3.288    3.295   -2.329    3.248    3.254
λ123     -0.097    2.518    2.518    0.525    2.443    2.443    0.801    2.414    2.415
λ133      0.956    2.403    2.404    1.277    2.336    2.337    1.349    2.310    2.312
λ143     -0.632    1.873    1.874   -0.085    1.808    1.808    0.136    1.780    1.780
λ153      3.404    1.472    1.484    2.450    1.434    1.440    1.912    1.420    1.424
φ21      -4.142    4.822    4.839   -2.642    4.499    4.506   -1.837    4.389    4.392
φ31      -5.185    4.855    4.882   -3.193    4.534    4.544   -2.191    4.420    4.425
φ32      -3.733    3.536    3.550   -2.350    3.376    3.381   -1.555    3.322    3.324
AA        2.574    2.734    2.743    1.898    2.627    2.631    1.583    2.587    2.590
Table 6. Performance of the statistics TRMLa and TAMLa, based on converged replications
(df = p∗ − q = 87, RN = reject number, RR = reject ratio).
                   TMLa                 TRMLa                          TAMLa
n    Method     Mean      SD     Mean      SD    RN      RR     Mean      SD    RN      RR
100  ML       497.31  162.63    87.59   32.53    92   21.0%    32.15   13.63    52   11.9%
     ML.1     468.81  140.42   137.47   39.73   363   73.9%    56.23   15.99   284   57.8%
     ML.2     290.74   80.33   121.35   32.71   572   59.5%    51.20   13.47   375   39.0%
200  ML       708.30  228.50   133.40   41.16   474   68.9%    71.03   21.73   388   56.4%
     ML.1     371.14   89.05   109.59   25.27   441   44.3%    62.57   13.85   308   31.0%
     ML.2     236.76   43.25    99.24   17.78   248   24.8%    58.64   10.07   130   13.0%
300  ML       619.56  189.86   116.80   33.29   505   51.1%    71.41   19.72   410   41.5%
     ML.1     335.20   62.25    99.71   18.34   262   26.2%    65.79   11.75   177   17.7%
     ML.2     225.60   37.47    94.77   15.77   156   15.6%    64.52   10.34    95    9.5%
400  ML       549.61  126.47   105.00   23.18   365   36.5%    69.97   15.07   274   27.4%
     ML.1     320.01   55.37    95.55   16.45   189   18.9%    68.30   11.44   123   12.3%
     ML.2     219.34   35.16    92.24   14.85   114   11.4%    67.96   10.62    70    7.0%
Table 7. Performance of the statistics TRRLSa and TARLSa, based on converged replications
(df = p∗ − q = 87, RN = reject number, RR = reject ratio).
                   TRLSa                TRRLSa                         TARLSa
n    Method     Mean      SD     Mean      SD    RN      RR     Mean      SD    RN      RR
100  ML       523.60   88.61    90.45   13.47    37    8.4%    32.71    6.17     6    1.4%
     ML.1     280.69   45.27    82.46   12.81    14    2.9%    33.76    5.19     1    0.2%
     ML.2     211.03   32.72    88.19   13.51    64    6.7%    37.24    5.47     5    0.5%
200  ML       443.05   69.13    83.66   12.52    16    2.3%    44.63    7.05     6    0.9%
     ML.1     296.88   48.23    87.74   13.80    65    6.5%    50.13    7.59    23    2.3%
     ML.2     209.50   32.92    87.83   13.61    65    6.5%    51.91    7.67    24    2.4%
300  ML       462.72   77.62    87.51   13.96    66    6.7%    53.63    8.75    29    2.9%
     ML.1     295.39   46.19    87.88   13.67    67    6.7%    57.99    8.74    37    3.7%
     ML.2     209.27   31.91    87.91   13.46    64    6.4%    59.86    8.79    37    3.7%
400  ML       457.14   75.88    87.41   13.94    59    5.9%    58.29    9.18    31    3.1%
     ML.1     292.46   45.10    87.33   13.44    48    4.8%    62.43    9.32    31    3.1%
     ML.2     207.57   31.31    87.29   13.25    46    4.6%    64.32    9.45    29    2.9%
Table 8. Empirical standard errors versus average formula-based standard errors, based on
converged replications (n = 100).
              ML              ML.1             ML.2
θ         SE_E    SE_F    SE_E    SE_F    SE_E    SE_F
λ11      0.135   0.122   0.132   0.121   0.132   0.118
λ21      0.119   0.104   0.112   0.103   0.112   0.100
λ31      0.107   0.093   0.107   0.092   0.099   0.091
λ41      0.093   0.085   0.094   0.087   0.091   0.084
λ51      0.083   0.077   0.090   0.080   0.086   0.078
λ62      0.122   0.115   0.117   0.113   0.118   0.112
λ72      0.103   0.098   0.105   0.098   0.100   0.097
λ82      0.105   0.088   0.104   0.090   0.099   0.088
λ92      0.093   0.080   0.088   0.080   0.086   0.080
λ102     0.083   0.068   0.082   0.068   0.078   0.068
λ113     0.119   0.114   0.111   0.112   0.115   0.111
λ123     0.103   0.099   0.105   0.099   0.102   0.097
λ133     0.102   0.093   0.101   0.091   0.101   0.090
λ143     0.095   0.084   0.092   0.084   0.090   0.083
λ153     0.075   0.070   0.078   0.071   0.076   0.070
φ21      0.153   0.138   0.140   0.131   0.138   0.128
φ31      0.141   0.135   0.135   0.127   0.134   0.124
φ32      0.127   0.123   0.115   0.117   0.117   0.115
AAD              0.010           0.008           0.008
Table 9. Empirical standard errors versus average formula-based standard errors, based on
converged replications (n = 200).
              ML              ML.1             ML.2
θ         SE_E    SE_F    SE_E    SE_F    SE_E    SE_F
λ11      0.092   0.088   0.092   0.086   0.091   0.085
λ21      0.076   0.074   0.076   0.072   0.075   0.072
λ31      0.070   0.067   0.069   0.065   0.068   0.065
λ41      0.066   0.061   0.065   0.060   0.065   0.060
λ51      0.058   0.056   0.059   0.055   0.059   0.055
λ62      0.084   0.082   0.082   0.080   0.081   0.080
λ72      0.075   0.071   0.072   0.069   0.072   0.069
λ82      0.069   0.065   0.067   0.063   0.066   0.063
λ92      0.062   0.058   0.060   0.058   0.060   0.058
λ102     0.055   0.049   0.052   0.049   0.052   0.049
λ113     0.085   0.081   0.080   0.079   0.080   0.079
λ123     0.074   0.071   0.071   0.069   0.070   0.069
λ133     0.071   0.066   0.069   0.064   0.069   0.064
λ143     0.062   0.061   0.061   0.059   0.061   0.059
λ153     0.055   0.051   0.055   0.050   0.054   0.050
φ21      0.095   0.096   0.090   0.093   0.090   0.092
φ31      0.100   0.093   0.095   0.090   0.093   0.089
φ32      0.085   0.086   0.083   0.083   0.082   0.082
AAD              0.003           0.003           0.003
Table 10. Empirical standard errors versus average formula-based standard errors, based
on converged replications (n = 300).
              ML              ML.1             ML.2
θ         SE_E    SE_F    SE_E    SE_F    SE_E    SE_F
λ11      0.073   0.072   0.072   0.070   0.071   0.070
λ21      0.062   0.060   0.061   0.059   0.061   0.059
λ31      0.057   0.054   0.055   0.053   0.055   0.053
λ41      0.053   0.050   0.052   0.049   0.051   0.049
λ51      0.048   0.045   0.047   0.045   0.047   0.045
λ62      0.068   0.067   0.067   0.066   0.066   0.066
λ72      0.060   0.057   0.059   0.057   0.058   0.057
λ82      0.053   0.052   0.052   0.052   0.052   0.052
λ92      0.049   0.048   0.048   0.047   0.048   0.047
λ102     0.042   0.040   0.041   0.040   0.041   0.040
λ113     0.068   0.066   0.067   0.065   0.067   0.064
λ123     0.058   0.057   0.057   0.057   0.056   0.056
λ133     0.058   0.053   0.057   0.052   0.056   0.052
λ143     0.050   0.049   0.049   0.048   0.049   0.048
λ153     0.046   0.041   0.046   0.041   0.045   0.041
φ21      0.079   0.078   0.076   0.076   0.075   0.075
φ31      0.077   0.076   0.074   0.074   0.073   0.073
φ32      0.068   0.070   0.067   0.068   0.066   0.067
AAD              0.002           0.002           0.001
Table 11. Empirical standard errors versus average formula-based standard errors, based
on converged replications (n = 400).
              ML              ML.1             ML.2
θ         SE_E    SE_F    SE_E    SE_F    SE_E    SE_F
λ11      0.063   0.062   0.062   0.061   0.061   0.060
λ21      0.054   0.052   0.053   0.051   0.053   0.051
λ31      0.050   0.047   0.049   0.046   0.048   0.046
λ41      0.046   0.043   0.045   0.043   0.044   0.043
λ51      0.042   0.040   0.041   0.039   0.041   0.040
λ62      0.057   0.058   0.056   0.057   0.056   0.057
λ72      0.051   0.050   0.050   0.049   0.050   0.049
λ82      0.047   0.045   0.046   0.045   0.046   0.045
λ92      0.043   0.041   0.042   0.041   0.042   0.041
λ102     0.036   0.035   0.035   0.035   0.035   0.035
λ113     0.058   0.057   0.057   0.056   0.057   0.056
λ123     0.050   0.050   0.049   0.049   0.049   0.049
λ133     0.049   0.046   0.048   0.045   0.048   0.045
λ143     0.043   0.042   0.043   0.042   0.042   0.042
λ153     0.038   0.036   0.038   0.036   0.038   0.036
φ21      0.069   0.068   0.067   0.066   0.066   0.065
φ31      0.070   0.066   0.067   0.064   0.066   0.063
φ32      0.059   0.060   0.058   0.059   0.058   0.058
AAD              0.002           0.001           0.001
Table 12(a). Parameter estimates θa and their standard errors for Example 1.
                     θa                       SE
a            0.5     0.6     0.7     0.5     0.6     0.7
λ1         0.664   0.663   0.663   0.119   0.119   0.119
λ2         0.741   0.741   0.742   0.069   0.070   0.070
λ3         0.827   0.826   0.826   0.063   0.063   0.064
λ4         0.308   0.310   0.311   0.121   0.121   0.121
λ5         0.711   0.713   0.714   0.079   0.079   0.079
λ6         0.551   0.552   0.554   0.089   0.088   0.088
λ7        -0.653  -0.654  -0.655   0.086   0.085   0.085
λ8         0.694   0.695   0.695   0.072   0.072   0.072
λ9        -0.641  -0.643  -0.644   0.078   0.078   0.078
λ10        0.835   0.837   0.838   0.080   0.080   0.080
λ11        0.549   0.549   0.550   0.098   0.098   0.098
λ12        0.595   0.596   0.597   0.095   0.095   0.094
λ13       -0.658  -0.657  -0.657   0.099   0.099   0.099
λ14        0.810   0.809   0.807   0.063   0.064   0.064
λ15        0.651   0.649   0.647   0.160   0.160   0.160
λ16        0.324   0.325   0.325   0.117   0.117   0.117
λ17        0.444   0.443   0.442   0.116   0.116   0.116
λ18        0.234   0.235   0.237   0.138   0.138   0.138
λ19        0.809   0.808   0.806   0.097   0.097   0.098
λ20        0.327   0.329   0.330   0.124   0.124   0.124
λ21        0.812   0.811   0.810   0.075   0.075   0.075
ψ11        0.559   0.560   0.560
ψ22        0.451   0.451   0.450
ψ33        0.316   0.317   0.318
ψ44        0.905   0.904   0.903
ψ55        0.494   0.492   0.490
ψ66        0.697   0.695   0.694
ψ77        0.574   0.573   0.572
ψ88        0.518   0.518   0.517
ψ99        0.589   0.587   0.585
ψ10,10     0.302   0.300   0.298
ψ11,11     0.699   0.698   0.698
ψ12,12     0.646   0.645   0.644
ψ13,13     0.567   0.568   0.569
ψ14,14     0.343   0.346   0.348
ψ15,15     0.576   0.579   0.581
ψ16,16     0.895   0.895   0.894
ψ17,17     0.803   0.804   0.805
ψ18,18     0.945   0.945   0.944
ψ19,19     0.346   0.348   0.350
ψ20,20     0.893   0.892   0.891
ψ21,21     0.340   0.342   0.343
Table 12(b). Statistics for overall model evaluation for Example 1 (df = p∗ − q = 189).
a       TRMLa      p      TAMLa      p       m2     TRRLSa      p     TARLSa      p
.5    480.691   .000    110.491   .000   43.443    292.675   .000     67.274  0.012
.6    381.939   .000     87.791   .000   43.443    293.880   .000     67.550  0.011
.7    350.703   .000     80.597   .001   43.435    294.895   .000     67.771  0.011